public inbox for gcc-patches@gcc.gnu.org
* [nios2] [0/7] Support for Nios II R2
@ 2015-07-14 22:29 Sandra Loosemore
  2015-07-14 22:35 ` [nios2] [1/7] Add -march=, -mbmx, -mcdx flags Sandra Loosemore
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Sandra Loosemore @ 2015-07-14 22:29 UTC (permalink / raw)
  To: GCC Patches

I will shortly begin committing a patch series to add GCC support for
Nios II R2, a revision of the original Nios II instruction set.  I
previously wrote up some notes on the technical changes from R1 to R2
when I posted the corresponding binutils patches, here:

https://sourceware.org/ml/binutils/2015-07/msg00014.html

The patch series is in 7 parts.  Parts 1-3 add support for the R2
re-encodings of the base R1 instruction set.  Parts 4-7 include
support for generating the new R2-specific instructions.

[1] Add -march=, -mbmx, -mcdx flags
[2] Adjust for reduced offsets in R2 load/store IO insns
[3] Correct nested function trampolines for R2 encodings
[4] Support new R2 instructions
[5] Support R2 CDX load/store multiple instructions
[6] Update function prologues/epilogues for R2 CDX
[7] Add new intrinsics

The patches are self-contained enough to build individually when applied
in sequence, but I've only tested them together as a group.  Locally,
we have been building and testing three multilibs for nios2-elf: the
default R1, plain R2, and R2 with CDX and BMX extensions enabled.  For
now we are leaving R1 as the only multilib being built by default.

Presently there is no support for R2 on nios2-linux-gnu targets.  This
isn't a fundamental limitation of the architecture; we just don't have
kernel or glibc/dynamic linker support yet.  I've regression-tested
the patches on the default R1 nios2-linux-gnu target to ensure they
don't break anything, though.

-Sandra


* [nios2] [1/7] Add -march=, -mbmx, -mcdx flags
  2015-07-14 22:29 [nios2] [0/7] Support for Nios II R2 Sandra Loosemore
@ 2015-07-14 22:35 ` Sandra Loosemore
  2015-07-14 23:01 ` [nios2] [2/7] Adjust for reduced offsets in R2 load/store IO insns Sandra Loosemore
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Sandra Loosemore @ 2015-07-14 22:35 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 791 bytes --]

This patch adds command-line options to select the new R2 variants of
the Nios II architecture.  Aside from checking for conflicting
options, the only thing that this patch does with the new flags is pass
-march= through to the assembler.
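
As an illustration only (this is not code from the patch), the conflict
checks added to nios2_option_override can be modeled as a standalone
function.  The function name and its scalar parameters are hypothetical
stand-ins for the option state; the three rules themselves are the ones
in the nios2.c hunk of this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the new checks in nios2_option_override.
   ARCH is 1 for -march=r1 and 2 for -march=r2; the flags correspond to
   -mbmx, -mcdx and big-endian mode.  Returns the number of errors the
   real code would report.  */
int
count_option_conflicts (int arch, bool bmx, bool cdx, bool big_endian)
{
  int errors = 0;

  assert (arch == 1 || arch == 2);

  /* Optional BMX and CDX instructions only make sense for R2.  */
  if (arch != 2)
    {
      if (bmx)
        errors++;
      if (cdx)
        errors++;
    }

  /* R2 is little-endian only.  */
  if (arch == 2 && big_endian)
    errors++;

  return errors;
}
```

Under this model, R1 with both extensions requested reports two errors,
while R2 with both extensions in little-endian mode reports none.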

The -mbmx and -mcdx options will enable code generation for the
optional BMX and CDX R2 extensions in subsequent patches in this
series.  The instructions added by the third optional R2 extension,
MPX (multiprocessor), are only emitted by explicit intrinsics added in
part 7 of the series, so there doesn't need to be an option for that
one.

This patch is sufficient to compile programs correctly for R2 provided
that they don't use load/store IO instructions (fixed in part 2) or
nested functions (fixed in part 3).
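
The patch also defines a __nios2_arch__ preprocessor symbol with value
1 or 2.  A minimal sketch of how a program might test it follows; the
helper names are hypothetical, and the fallback value 0 is an
assumption made only so that the sketch also builds with a compiler
that does not define the symbol.

```c
#include <assert.h>

/* __nios2_arch__ is 1 for R1 and 2 for R2 when compiling for Nios II.
   The 0 fallback is an assumption for non-Nios II hosts, not something
   the patch provides.  */
#if defined (__nios2_arch__)
# define ISA_LEVEL __nios2_arch__
#else
# define ISA_LEVEL 0
#endif

/* Hypothetical helper: the ISA level this file was compiled for.  */
int
nios2_isa_level (void)
{
  assert (ISA_LEVEL >= 0 && ISA_LEVEL <= 2);
  return ISA_LEVEL;
}

/* Hypothetical helper: nonzero when R2 encodings are in effect.  */
int
nios2_is_r2 (void)
{
  return nios2_isa_level () == 2;
}
```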

Committed as r225791.

-Sandra


[-- Attachment #2: r2-1.log --]
[-- Type: text/x-log, Size: 744 bytes --]

2015-07-14  Sandra Loosemore  <sandra@codesourcery.com>
	    Cesar Philippidis  <cesar@codesourcery.com>
	    Chung-Lin Tang  <cltang@codesourcery.com>

	gcc/
	* config/nios2/nios2.opt (march, mbmx, mcdx): New options.
	* config/nios2/nios2-opts.h (enum nios2_arch_type): New enum for
	Nios II architecture level.
	* config/nios2/nios2.h (TARGET_ARCH_R2): New define.
	(TARGET_CPU_CPP_BUILTINS): Add definition of __nios2_arch__ symbol.
	(OPTION_DEFAULT_SPECS): Define.
	(ASM_SPEC): Add -march= spec strings.
	* config/nios2/nios2.c (nios2_option_override): Check for
	conflicts involving new options.
	* config.gcc (nios2*-*-*): Support --with-arch=.
	* doc/invoke.texi (Option Summary, Nios II Options): Document
	-march=, -mbmx, and -mcdx.


[-- Attachment #3: r2-1.patch --]
[-- Type: text/x-patch, Size: 6009 bytes --]

Index: gcc/config/nios2/nios2.opt
===================================================================
--- gcc/config/nios2/nios2.opt	(revision 225786)
+++ gcc/config/nios2/nios2.opt	(working copy)
@@ -565,4 +565,24 @@ mcustom-round=
 Target Report RejectNegative Joined UInteger Var(nios2_custom_round) Init(-1)
 Integer id (N) of round custom instruction
 
+march=
+Target RejectNegative Joined Enum(nios2_arch_type) Var(nios2_arch_option) Init(ARCH_R1)
+Specify the name of the target architecture.
 
+Enum
+Name(nios2_arch_type) Type(enum nios2_arch_type)
+Valid Nios II ISA levels (for -march):
+
+EnumValue
+Enum(nios2_arch_type) String(r1) Value(ARCH_R1)
+
+EnumValue
+Enum(nios2_arch_type) String(r2) Value(ARCH_R2)
+
+mbmx
+Target Report Mask(HAS_BMX)
+Enable generation of R2 BMX instructions
+
+mcdx
+Target Report Mask(HAS_CDX)
+Enable generation of R2 CDX instructions
Index: gcc/config/nios2/nios2-opts.h
===================================================================
--- gcc/config/nios2/nios2-opts.h	(revision 225786)
+++ gcc/config/nios2/nios2-opts.h	(working copy)
@@ -77,5 +77,12 @@ enum nios2_ccs_code
   CCS_BUILTIN_CALL
 };
 
+/* Supported Nios II Architectures.  */
+enum nios2_arch_type
+{
+  ARCH_R1=1,
+  ARCH_R2
+};
+
 #endif
 
Index: gcc/config/nios2/nios2.h
===================================================================
--- gcc/config/nios2/nios2.h	(revision 225786)
+++ gcc/config/nios2/nios2.h	(working copy)
@@ -23,6 +23,9 @@
 #ifndef GCC_NIOS2_H
 #define GCC_NIOS2_H
 
+/* Indicate R2 ISA level support.  */
+#define TARGET_ARCH_R2 (nios2_arch_option == ARCH_R2)
+
 /* FPU insn codes declared here.  */
 #include "config/nios2/nios2-opts.h"
 
@@ -36,7 +39,9 @@
         builtin_define_std ("nios2_big_endian");    \
       else                                          \
         builtin_define_std ("nios2_little_endian"); \
-    }                                               \
+      builtin_define_with_int_value (		    \
+        "__nios2_arch__", (int) nios2_arch_option); \
+    }						    \
   while (0)
 
 /* We're little endian, unless otherwise specified by defining
@@ -50,14 +55,17 @@
 # define TARGET_DEFAULT (MASK_HAS_MUL | TARGET_ENDIAN_DEFAULT)
 #endif
 
+#define OPTION_DEFAULT_SPECS \
+  {"arch", "%{!march=*:%{!mcpu=*:-march=%(VALUE)}}" }
+
 #define CC1_SPEC "%{G*}"
 
 #if TARGET_ENDIAN_DEFAULT == 0
-# define ASM_SPEC "%{!meb:-EL} %{meb:-EB}"
+# define ASM_SPEC "%{!meb:-EL} %{meb:-EB} %{march=*:-march=%*}"
 # define LINK_SPEC_ENDIAN "%{!meb:-EL} %{meb:-EB}"
 # define MULTILIB_DEFAULTS { "EL" }
 #else
-# define ASM_SPEC "%{!mel:-EB} %{mel:-EL}"
+# define ASM_SPEC "%{!mel:-EB} %{mel:-EL} %{march=*:-march=%*}"
 # define LINK_SPEC_ENDIAN "%{!mel:-EB} %{mel:-EL}"
 # define MULTILIB_DEFAULTS { "EB" }
 #endif
Index: gcc/config/nios2/nios2.c
===================================================================
--- gcc/config/nios2/nios2.c	(revision 225787)
+++ gcc/config/nios2/nios2.c	(working copy)
@@ -1078,6 +1078,19 @@ nios2_option_override (void)
   if (!TARGET_HAS_MUL && TARGET_HAS_MULX)
     target_flags &= ~MASK_HAS_MULX;
 
+  /* Optional BMX and CDX instructions only make sense for R2.  */
+  if (!TARGET_ARCH_R2)
+    {
+      if (TARGET_HAS_BMX)
+	error ("BMX instructions are only supported with R2 architecture");
+      if (TARGET_HAS_CDX)
+	error ("CDX instructions are only supported with R2 architecture");
+    }
+
+  /* R2 is little-endian only.  */
+  if (TARGET_ARCH_R2 && TARGET_BIG_ENDIAN)
+    error ("R2 architecture is little-endian only");
+
   /* Initialize default FPU configurations.  */
   nios2_init_fpu_configs ();
 
Index: gcc/config.gcc
===================================================================
--- gcc/config.gcc	(revision 225786)
+++ gcc/config.gcc	(working copy)
@@ -4052,6 +4052,19 @@ case "${target}" in
 		esac
 		;;
 
+	nios2*-*-*)
+		supported_defaults="arch"
+			case "$with_arch" in
+			"" | r1 | r2)
+				# OK
+				;;
+			*)
+				echo "Unknown arch used in --with-arch=$with_arch" 1>&2
+				exit 1
+				;;
+			esac
+		;;
+
 	powerpc*-*-* | rs6000-*-*)
 		supported_defaults="abi cpu cpu_32 cpu_64 float tune tune_32 tune_64 advance_toolchain"
 
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 225786)
+++ gcc/doc/invoke.texi	(working copy)
@@ -857,7 +857,8 @@ Objective-C and Objective-C++ Dialects}.
 -mhw-mul -mno-hw-mul -mhw-mulx -mno-hw-mulx -mno-hw-div -mhw-div @gol
 -mcustom-@var{insn}=@var{N} -mno-custom-@var{insn} @gol
 -mcustom-fpu-cfg=@var{name} @gol
--mhal -msmallc -msys-crt0=@var{name} -msys-lib=@var{name}}
+-mhal -msmallc -msys-crt0=@var{name} -msys-lib=@var{name} @gol
+-march=@var{arch} -mbmx -mno-bmx -mcdx -mno-cdx}
 
 @emph{Nvidia PTX Options}
 @gccoptlist{-m32 -m64 -mmainkernel}
@@ -18500,6 +18501,15 @@ small data section.
 Generate little-endian (default) or big-endian (experimental) code,
 respectively.
 
+@item -march=@var{arch}
+@opindex march
+This specifies the name of the target Nios II architecture.  GCC uses this
+name to determine what kind of instructions it can emit when generating
+assembly code.  Permissible names are: @samp{r1}, @samp{r2}.
+
+The preprocessor macro @code{__nios2_arch__} is available to programs,
+with value 1 or 2, indicating the targeted ISA level.
+
 @item -mbypass-cache
 @itemx -mno-bypass-cache
 @opindex mno-bypass-cache
@@ -18538,6 +18548,15 @@ Enable or disable emitting @code{mul}, @
 instructions by the compiler. The default is to emit @code{mul}
 and not emit @code{div} and @code{mulx}.
 
+@item -mbmx
+@itemx -mno-bmx
+@itemx -mcdx
+@itemx -mno-cdx
+Enable or disable generation of Nios II R2 BMX (bit manipulation) and
+CDX (code density) instructions.  Enabling these instructions also
+requires @option{-march=r2}.  Since these instructions are optional
+extensions to the R2 architecture, the default is not to emit them.
+
 @item -mcustom-@var{insn}=@var{N}
 @itemx -mno-custom-@var{insn}
 @opindex mcustom-@var{insn}


* [nios2] [2/7] Adjust for reduced offsets in R2 load/store IO insns
  2015-07-14 22:29 [nios2] [0/7] Support for Nios II R2 Sandra Loosemore
  2015-07-14 22:35 ` [nios2] [1/7] Add -march=, -mbmx, -mcdx flags Sandra Loosemore
@ 2015-07-14 23:01 ` Sandra Loosemore
  2015-07-14 23:18 ` [nios2] [3/7] Correct nested function trampolines for R2 encodings Sandra Loosemore
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Sandra Loosemore @ 2015-07-14 23:01 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 876 bytes --]

Nios II has a group of load/store IO instructions that bypass the
normal memory cache; they're intended to be used for accessing
memory-mapped IO peripherals.  In the R2 re-encoding of the Nios II
instruction set, the constant offset field for these instructions has
been reduced from 16 bits to 12, so GCC needs a new constraint for
memory addresses for these instructions.
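
To make the reduced range concrete, here is a small sketch of the
12-bit test.  fits_signed12 uses the same arithmetic as the SMALL_INT12
macro in the patch; io_offset_ok and its parameters are hypothetical
and only show how an element index turns into a byte offset.

```c
#include <assert.h>
#include <stdbool.h>

/* Same test as SMALL_INT12: true when X fits in a signed 12-bit
   field, i.e. -2048 <= X <= 2047.  */
bool
fits_signed12 (long x)
{
  return (unsigned long) x + 0x800 < 0x1000;
}

/* Hypothetical helper: does element INDEX of an array of ELEM_SIZE-byte
   objects have a byte offset that an R2 IO instruction can encode?  */
bool
io_offset_ok (long index, long elem_size)
{
  assert (elem_size == 1 || elem_size == 2 || elem_size == 4);
  return fits_signed12 (index * elem_size);
}
```

This is why the new tests probe p2 + 511 and p2 + 512 for word
accesses: the byte offsets are 2044 (encodable) and 2048 (not
encodable).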

A "gotcha" here is that the new encodings don't play nicely with
GP-relative addressing.  %gprel is a 16-bit relocation, and adding a
12-bit equivalent didn't seem very useful as it would restrict the
size of the small data area to only 4K.  Moreover, we'd expect IO
peripherals to be mapped somewhere other than the normal small data
section.  So, we just tell GCC not to emit GP-relative addresses
for anything that might be used in an R2 load/store IO instruction.
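
The arithmetic and the resulting rule can be sketched as follows.  Both
function names are hypothetical.  The second function models only the
check this patch adds to nios2_symbol_ref_in_small_data_p; the real
function applies further conditions that are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of bytes spanned by a signed offset field of BITS bits.  */
long
offset_field_span (int bits)
{
  assert (bits > 0 && bits < 31);
  return 1L << bits;
}

/* Hypothetical distillation of the new check: on R2, GP-relative
   addressing is ruled out whenever IO instructions may be generated
   implicitly (TARGET_BYPASS_CACHE or TARGET_BYPASS_CACHE_VOLATILE).
   A true result only means this particular check does not object.  */
bool
r2_io_permits_gprel (bool arch_r2, bool bypass_cache,
                     bool bypass_cache_volatile)
{
  if (arch_r2 && (bypass_cache || bypass_cache_volatile))
    return false;
  return true;
}
```

A 12-bit field spans 4096 bytes against 65536 for the 16-bit %gprel
relocation, which is the 4K limit mentioned above.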

Committed as r225792.

-Sandra


[-- Attachment #2: r2-2.log --]
[-- Type: text/x-log, Size: 1012 bytes --]

2015-07-14  Sandra Loosemore  <sandra@codesourcery.com>
	    Cesar Philippidis  <cesar@codesourcery.com>
	    Chung-Lin Tang  <cltang@codesourcery.com>

	gcc/
	* config/nios2/nios2.h (SMALL_INT12): New macro.
	* config/nios2/nios2.c (nios2_valid_addr_offset_p): New function.
	(nios2_valid_addr_expr_p): Use it.
	(nios2_symbol_ref_in_small_data_p): Disallow GP-relative addressing
	with implicit "io" instructions on R2.
	* config/nios2/constraints.md (w): New constraint.
	* config/nios2/predicates.md (ldstio_memory_operand): New.
	* config/nios2/nios2.md (ld<bhw_uns>io, ld<bh>io): Update memory
	operand predicate and constraint.
	(ld<bh>io_signed, st<bhw>io): Likewise.
	* doc/md.texi (Machine Constraints): Document w constraint.

	gcc/testsuite/
	* gcc.target/nios2/r2-io-range.c: New.
	* gcc.target/nios2/r2-stio-1.c: New.
	* gcc.target/nios2/r2-stio-2.c: New.
	* gcc.target/nios2/nios2-ldxio.c: New.
	* gcc.target/nios2/nios2-stxio.c: Change to assemble test instead
	of just compile.  Add more tests.

[-- Attachment #3: r2-2.patch --]
[-- Type: text/x-patch, Size: 11165 bytes --]

Index: gcc/config/nios2/nios2.h
===================================================================
--- gcc/config/nios2/nios2.h	(revision 225791)
+++ gcc/config/nios2/nios2.h	(working copy)
@@ -216,6 +216,7 @@ enum reg_class
 /* Tests for various kinds of constants used in the Nios II port.  */
 
 #define SMALL_INT(X) ((unsigned HOST_WIDE_INT)(X) + 0x8000 < 0x10000)
+#define SMALL_INT12(X) ((unsigned HOST_WIDE_INT)(X) + 0x800 < 0x1000)
 #define SMALL_INT_UNSIGNED(X) ((X) >= 0 && (X) < 0x10000)
 #define UPPER16_INT(X) (((X) & 0xffff) == 0)
 #define SHIFT_INT(X) ((X) >= 0 && (X) <= 31)
Index: gcc/config/nios2/nios2.c
===================================================================
--- gcc/config/nios2/nios2.c	(revision 225791)
+++ gcc/config/nios2/nios2.c	(working copy)
@@ -1627,6 +1627,21 @@ nios2_regno_ok_for_base_p (int regno, bo
 	  || regno == ARG_POINTER_REGNUM);
 }
 
+/* Return true if OFFSET is permitted in a load/store address expression.
+   Normally any 16-bit value is permitted, but on R2 if we may be emitting
+   the IO forms of these instructions we must restrict the offset to fit
+   in a 12-bit field instead.  */
+
+static bool
+nios2_valid_addr_offset_p (rtx offset)
+{
+  return (CONST_INT_P (offset)
+	  && ((TARGET_ARCH_R2 && (TARGET_BYPASS_CACHE
+				  || TARGET_BYPASS_CACHE_VOLATILE))
+	      ? SMALL_INT12 (INTVAL (offset))
+	      : SMALL_INT (INTVAL (offset))));
+}
+
 /* Return true if the address expression formed by BASE + OFFSET is
    valid.  */
 static bool
@@ -1637,7 +1652,7 @@ nios2_valid_addr_expr_p (rtx base, rtx o
   return (REG_P (base)
 	  && nios2_regno_ok_for_base_p (REGNO (base), strict_p)
 	  && (offset == NULL_RTX
-	      || const_arith_operand (offset, Pmode)
+	      || nios2_valid_addr_offset_p (offset)
 	      || nios2_unspec_reloc_p (offset)));
 }
 
@@ -1739,6 +1754,13 @@ nios2_symbol_ref_in_small_data_p (rtx sy
   if (SYMBOL_REF_TLS_MODEL (sym) != 0)
     return false;
 
+  /* On Nios II R2, there is no GP-relative relocation that can be
+     used with "io" instructions.  So, if we are implicitly generating
+     those instructions, we cannot emit GP-relative accesses.  */
+  if (TARGET_ARCH_R2
+      && (TARGET_BYPASS_CACHE || TARGET_BYPASS_CACHE_VOLATILE))
+    return false;
+
   /* If the user has explicitly placed the symbol in a small data section
      via an attribute, generate gp-relative addressing even if the symbol
      is external, weak, or larger than we'd automatically put in the
Index: gcc/config/nios2/constraints.md
===================================================================
--- gcc/config/nios2/constraints.md	(revision 225790)
+++ gcc/config/nios2/constraints.md	(working copy)
@@ -28,6 +28,10 @@
 ;;  N: 0 to 255 (for custom instruction numbers)
 ;;  O: 0 to 31 (for control register numbers)
 ;;
+;; We use the following constraint letters for memory constraints
+;;
+;;  w: memory operands for load/store IO and cache instructions
+;;
 ;; We use the following built-in register classes:
 ;;
 ;;  r: general purpose register (r0..r31)
@@ -89,3 +93,7 @@
 (define_constraint "T"
   "A constant unspec offset representing a relocation."
   (match_test "nios2_unspec_reloc_p (op)"))
+
+(define_memory_constraint "w"
+  "A memory operand suitable for load/store IO and cache instructions."
+  (match_operand 0 "ldstio_memory_operand"))
Index: gcc/config/nios2/predicates.md
===================================================================
--- gcc/config/nios2/predicates.md	(revision 225790)
+++ gcc/config/nios2/predicates.md	(working copy)
@@ -83,3 +83,20 @@
                                          &XEXP (op, 0), &XEXP (op, 1),
                                          false));
 })
+
+(define_predicate "ldstio_memory_operand"
+  (match_code "mem")
+{
+  if (TARGET_ARCH_R2)
+    {
+      rtx addr = XEXP (op, 0);
+      if (REG_P (addr))
+        return true;
+      else if (GET_CODE (addr) == PLUS)
+        return (REG_P (XEXP (addr, 0))
+                && CONST_INT_P (XEXP (addr, 1))
+                && SMALL_INT12 (INTVAL (XEXP (addr, 1))));
+      return false;
+    }
+  return memory_operand (op, mode);
+})
Index: gcc/config/nios2/nios2.md
===================================================================
--- gcc/config/nios2/nios2.md	(revision 225790)
+++ gcc/config/nios2/nios2.md	(working copy)
@@ -221,14 +221,14 @@
 (define_insn "ld<bhw_uns>io"
   [(set (match_operand:BHW 0 "register_operand" "=r")
         (unspec_volatile:BHW
-          [(match_operand:BHW 1 "memory_operand" "m")] UNSPECV_LDXIO))]
+          [(match_operand:BHW 1 "ldstio_memory_operand" "w")] UNSPECV_LDXIO))]
   ""
   "ld<bhw_uns>io\\t%0, %1"
   [(set_attr "type" "ld")])
 
 (define_expand "ld<bh>io"
   [(set (match_operand:BH 0 "register_operand" "=r")
-        (match_operand:BH 1 "memory_operand"    "m"))]
+        (match_operand:BH 1 "ldstio_memory_operand" "w"))]
   ""
 {
   rtx tmp = gen_reg_rtx (SImode);
@@ -241,13 +241,13 @@
   [(set (match_operand:SI 0 "register_operand" "=r")
         (sign_extend:SI
           (unspec_volatile:BH
-            [(match_operand:BH 1 "memory_operand" "m")] UNSPECV_LDXIO)))]
+            [(match_operand:BH 1 "ldstio_memory_operand" "w")] UNSPECV_LDXIO)))]
   ""
   "ld<bh>io\\t%0, %1"
   [(set_attr "type" "ld")])
 
 (define_insn "st<bhw>io"
-  [(set (match_operand:BHW 0 "memory_operand" "=m")
+  [(set (match_operand:BHW 0 "ldstio_memory_operand" "=w")
         (unspec_volatile:BHW
           [(match_operand:BHW 1 "reg_or_0_operand" "rM")] UNSPECV_STXIO))]
   ""
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	(revision 225790)
+++ gcc/doc/md.texi	(working copy)
@@ -2996,6 +2996,10 @@ Matches immediates which are addresses i
 data section and therefore can be added to @code{gp}
 as a 16-bit immediate to re-create their 32-bit value.
 
+@item w
+A memory operand suitable for load/store IO and cache
+instructions.
+
 @ifset INTERNALS
 @item T
 A @code{const} wrapped @code{UNSPEC} expression,
Index: gcc/testsuite/gcc.target/nios2/r2-io-range.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/r2-io-range.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/r2-io-range.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=r2 -mbypass-cache" } */
+
+/* Check that the compiler is aware of the reduced offset range for ldio/stio
+   instructions in the Nios II R2 encoding.  */
+
+unsigned int too_big (unsigned int *p)
+{
+  return *(p + 0x400);
+}
+
+unsigned int small_enough (unsigned int *p)
+{
+  return *(p + 0x100);
+}
+
+/* { dg-final { scan-assembler-not "\tldwio\t.*, 4096\\(r.*\\)" } }  */
+/* { dg-final { scan-assembler "\tldwio\t.*, 1024\\(r.*\\)" } }  */
Index: gcc/testsuite/gcc.target/nios2/r2-stio-1.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/r2-stio-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/r2-stio-1.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O -mgpopt -march=r2" } */
+
+/* The ldio/stio builtins must not use GP-relative addresses for
+   small data objects in R2.  This is because the address offset field
+   has been reduced to 12 bits in R2, and %gprel is a 16-bit relocation.  */
+
+extern volatile unsigned int frob;
+
+volatile unsigned int frob = 0;
+
+void foo (unsigned int val)
+{
+  __builtin_stwio (&frob, val);
+}
+
+/* { dg-final { scan-assembler "stwio\\t" } } */
+/* { dg-final { scan-assembler-not "stwio\\t.*%gprel(frob)" } } */
+
Index: gcc/testsuite/gcc.target/nios2/r2-stio-2.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/r2-stio-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/r2-stio-2.c	(revision 0)
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O -mgpopt -march=r2 -mbypass-cache" } */
+
+/* Implicit ldio/stio operations must not use GP-relative addresses for
+   small data objects in R2.  This is because the address offset field
+   has been reduced to 12 bits in R2, and %gprel is a 16-bit relocation.  */
+
+extern volatile unsigned int frob;
+
+volatile unsigned int frob = 0;
+
+void foo (unsigned int val)
+{
+  frob = val;
+}
+
+/* { dg-final { scan-assembler "stwio\\t" } } */
+/* { dg-final { scan-assembler-not "stwio\\t.*%gprel(frob)" } } */
+
Index: gcc/testsuite/gcc.target/nios2/nios2-ldxio.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/nios2-ldxio.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/nios2-ldxio.c	(revision 0)
@@ -0,0 +1,52 @@
+/* { dg-do assemble } */
+/* { dg-options "-O" } */
+
+void test_ldbio (unsigned char* p1, unsigned char* p2)
+{
+  __builtin_ldbio (p1);
+  __builtin_ldbio (p2);
+  __builtin_ldbio (p2 + 1);
+  __builtin_ldbio (p2 + 2);
+  __builtin_ldbio (p2 + 2047);
+  __builtin_ldbio (p2 + 2048);
+}
+
+void test_ldbuio (unsigned char* p1, unsigned char* p2)
+{
+  __builtin_ldbuio (p1);
+  __builtin_ldbuio (p2);
+  __builtin_ldbuio (p2 + 1);
+  __builtin_ldbuio (p2 + 2);
+  __builtin_ldbuio (p2 + 2047);
+  __builtin_ldbuio (p2 + 2048);
+}
+
+void test_ldhio (unsigned short* p1, unsigned short* p2)
+{
+  __builtin_ldhio (p1);
+  __builtin_ldhio (p2);
+  __builtin_ldhio (p2 + 1);
+  __builtin_ldhio (p2 + 2);
+  __builtin_ldhio (p2 + 1023);
+  __builtin_ldhio (p2 + 1024);
+}
+
+void test_ldhuio (unsigned short* p1, unsigned short* p2)
+{
+  __builtin_ldhuio (p1);
+  __builtin_ldhuio (p2);
+  __builtin_ldhuio (p2 + 1);
+  __builtin_ldhuio (p2 + 2);
+  __builtin_ldhuio (p2 + 1023);
+  __builtin_ldhuio (p2 + 1024);
+}
+
+void test_ldwio (unsigned int* p1, unsigned int* p2)
+{
+  __builtin_ldwio (p1);
+  __builtin_ldwio (p2);
+  __builtin_ldwio (p2 + 1);
+  __builtin_ldwio (p2 + 2);
+  __builtin_ldwio (p2 + 511);
+  __builtin_ldwio (p2 + 512);
+}
Index: gcc/testsuite/gcc.target/nios2/nios2-stxio.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/nios2-stxio.c	(revision 225790)
+++ gcc/testsuite/gcc.target/nios2/nios2-stxio.c	(working copy)
@@ -1,4 +1,5 @@
-/* { dg-do compile } */
+/* { dg-do assemble } */
+/* { dg-options "-O" } */
 
 void test_stbio (unsigned char* p1, unsigned char* p2)
 {
@@ -6,6 +7,8 @@ void test_stbio (unsigned char* p1, unsi
   __builtin_stbio (p2, 0);
   __builtin_stbio (p2 + 1, 0x80);
   __builtin_stbio (p2 + 2, 0x7f);
+  __builtin_stbio (p2 + 2047, 0x80);
+  __builtin_stbio (p2 + 2048, 0x7f);
 }
 
 void test_sthio (unsigned short* p1, unsigned short* p2)
@@ -14,6 +17,8 @@ void test_sthio (unsigned short* p1, uns
   __builtin_sthio (p2, 0);
   __builtin_sthio (p2 + 1, 0x8000);
   __builtin_sthio (p2 + 2, 0x7fff);
+  __builtin_sthio (p2 + 1023, 0x8000);
+  __builtin_sthio (p2 + 1024, 0x7fff);
 }
 
 void test_stwio (unsigned int* p1, unsigned int* p2)
@@ -22,4 +27,7 @@ void test_stwio (unsigned int* p1, unsig
   __builtin_stwio (p2, 0);
   __builtin_stwio (p2 + 1, 0x80000000);
   __builtin_stwio (p2 + 2, 0x7fffffff);
+  __builtin_stwio (p2 + 511, 5);
+  __builtin_stwio (p2 + 512, 5);
 }
+


* [nios2] [3/7] Correct nested function trampolines for R2 encodings
  2015-07-14 22:29 [nios2] [0/7] Support for Nios II R2 Sandra Loosemore
  2015-07-14 22:35 ` [nios2] [1/7] Add -march=, -mbmx, -mcdx flags Sandra Loosemore
  2015-07-14 23:01 ` [nios2] [2/7] Adjust for reduced offsets in R2 load/store IO insns Sandra Loosemore
@ 2015-07-14 23:18 ` Sandra Loosemore
  2015-07-14 23:29 ` [nios2] [4/7] Support new R2 instructions Sandra Loosemore
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Sandra Loosemore @ 2015-07-14 23:18 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 130 bytes --]

This patch adds R2 encodings for the instructions used in nested
function trampolines in libgcc.
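
For readers comparing the two encodings, the MOVHI, ORI and JMP macros
from the patch can be transcribed into functions over uint32_t.  The
function names are hypothetical; the shift amounts and opcode constants
are copied from the tramp.c hunk, and unsigned arithmetic is used here
so that the shifts cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* R1 encodings.  */
uint32_t
movhi_r1 (uint32_t reg, uint32_t imm16)
{
  assert (reg < 32 && imm16 <= 0xffff);
  return (reg << 22) | (imm16 << 6) | 0x34;
}

uint32_t
ori_r1 (uint32_t reg, uint32_t imm16)
{
  assert (reg < 32 && imm16 <= 0xffff);
  return (reg << 27) | (reg << 22) | (imm16 << 6) | 0x14;
}

uint32_t
jmp_r1 (uint32_t reg)
{
  assert (reg < 32);
  return (reg << 27) | (0x0dU << 11) | 0x3a;
}

/* R2 encodings: the register and immediate fields move, and JMP uses
   a different opcode layout.  */
uint32_t
movhi_r2 (uint32_t reg, uint32_t imm16)
{
  assert (reg < 32 && imm16 <= 0xffff);
  return (reg << 11) | (imm16 << 16) | 0x34;
}

uint32_t
ori_r2 (uint32_t reg, uint32_t imm16)
{
  assert (reg < 32 && imm16 <= 0xffff);
  return (reg << 11) | (reg << 6) | (imm16 << 16) | 0x14;
}

uint32_t
jmp_r2 (uint32_t reg)
{
  assert (reg < 32);
  return (reg << 6) | (0x0dU << 26) | 0x20;
}
```

With register 12 (SC_REGNO in tramp.c) and immediate 0x1234, MOVHI
encodes as 0x03048d34 on R1 and 0x12346034 on R2.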

Committed as r225794.

-Sandra


[-- Attachment #2: r2-3.log --]
[-- Type: text/x-log, Size: 247 bytes --]

2015-07-14  Sandra Loosemore  <sandra@codesourcery.com>
	    Cesar Philippidis  <cesar@codesourcery.com>
	    Chung-Lin Tang  <cltang@codesourcery.com>

	libgcc/
	* config/nios2/tramp.c (MOVHI, ORI, JMP): Conditionalize
	for __nios2_arch__ level.

[-- Attachment #3: r2-3.patch --]
[-- Type: text/x-patch, Size: 1066 bytes --]

Index: libgcc/config/nios2/tramp.c
===================================================================
--- libgcc/config/nios2/tramp.c	(revision 225791)
+++ libgcc/config/nios2/tramp.c	(working copy)
@@ -33,13 +33,27 @@ see the files COPYING3 and COPYING.RUNTI
 
 #define SC_REGNO 12
 
-#define MOVHI(reg,imm16) \
+/* Instruction encodings depend on the ISA level.  */
+#if __nios2_arch__ == 2
+#define MOVHI(reg,imm16)			\
+  (((reg) << 11) | ((imm16) << 16) | 0x34)
+#define ORI(reg,imm16)						\
+  (((reg) << 11) | ((reg) << 6) | ((imm16) << 16) | 0x14)
+#define JMP(reg)				\
+  (((reg) << 6) | (0x0d << 26) | 0x20)
+
+#elif __nios2_arch__ == 1
+#define MOVHI(reg,imm16)			\
   (((reg) << 22) | ((imm16) << 6) | 0x34)
-#define ORI(reg,imm16) \
+#define ORI(reg,imm16)						\
   (((reg) << 27) | ((reg) << 22) | ((imm16) << 6) | 0x14)
-#define JMP(reg) \
+#define JMP(reg)				\
   (((reg) << 27) | (0x0d << 11) | 0x3a)
 
+#else
+#error "Unknown Nios II architecture level"
+#endif
+
 void
 __trampoline_setup (unsigned int *addr, void *fnptr, void *chainptr)
 {


* [nios2] [4/7] Support new R2 instructions
  2015-07-14 22:29 [nios2] [0/7] Support for Nios II R2 Sandra Loosemore
                   ` (2 preceding siblings ...)
  2015-07-14 23:18 ` [nios2] [3/7] Correct nested function trampolines for R2 encodings Sandra Loosemore
@ 2015-07-14 23:29 ` Sandra Loosemore
  2015-07-14 23:33 ` [nios2] [5/7] Support R2 CDX load/store multiple instructions Sandra Loosemore
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Sandra Loosemore @ 2015-07-14 23:29 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 975 bytes --]

This patch adds GCC support for the bulk of the new R2 instructions --
everything except the new CDX load/store multiple instructions (part
5) and the instructions that are only emitted by new builtins (part
7).

CDX adds a group of 16-bit instructions similar to existing 32-bit
instructions, but with restrictions on which registers can be used,
smaller constant ranges, and the like to achieve the compression.
These instructions have an explicit ".n" suffix in the assembly
language syntax, rather than the encoding being selected automatically
by the assembler.  Instead of adding new insns with a "TARGET_HAS_CDX"
condition, or using the "enabled" attribute to control new
alternatives to the existing insn patterns, Chung-Lin came up with the
clever idea of moving the decision-making to the computation of the
"length" attribute in nios2_cdx_narrow_form_p.  Then
nios2_print_operand maps "%." onto ".n" or not depending on the
length.
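
A rough model of that scheme, for illustration only: cdx_reg_p repeats
the CDX_REG_P macro from this patch, while format_mnemonic is a
hypothetical stand-in for the "%." handling in nios2_print_operand.
The actual conditions tested by nios2_cdx_narrow_form_p are not shown
in this excerpt and are not modeled here; the sketch assumes only that
a narrow insn has length 2 and a normal one has length 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Same register test as CDX_REG_P: r2..r7, r16 and r17.  */
bool
cdx_reg_p (int regno)
{
  return regno == 16 || regno == 17 || (2 <= regno && regno <= 7);
}

/* Hypothetical model of "%.": write MNEMONIC into BUF, followed by
   ".n" when the insn LENGTH is 2 bytes and by nothing when it is 4.
   Returns what snprintf returns.  */
int
format_mnemonic (char *buf, size_t size, const char *mnemonic, int length)
{
  assert (length == 2 || length == 4);
  return snprintf (buf, size, "%s%s", mnemonic, length == 2 ? ".n" : "");
}
```

The point of the design is that one asm template serves both forms; the
length attribute alone decides which spelling is printed.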

Committed as r225796.

-Sandra


[-- Attachment #2: r2-4.log --]
[-- Type: text/x-log, Size: 3661 bytes --]

2015-07-14  Sandra Loosemore  <sandra@codesourcery.com>
	    Cesar Philippidis  <cesar@codesourcery.com>
	    Chung-Lin Tang  <cltang@codesourcery.com>

	gcc/
	* config/nios2/nios2.h (LABEL_ALIGN): Define.
	(REG_ALLOC_ORDER): Define.
	(ADJUST_REG_ALLOC_ORDER): Define.
	(HONOR_REG_ALLOC_ORDER): Define.
	(CDX_REG_P): Define.
	(ANDCLEAR_INT): Define.
	* config/nios2/nios2-protos.h (nios2_add_insn_asm): Declare.
	(nios2_label_align): Declare.
	(nios2_cdx_narrow_form_p): Declare.
	(nios2_adjust_reg_alloc_order): Declare.
	* config/nios2/nios2.c (nios2_rtx_costs): Adjust for BMX zero-extract
	operation.
	(nios2_large_unspec_reloc_p): New function, split from...
	(nios2_legitimate_pic_operand_p): ...here.
	(nios2_emit_move_sequence): Add *high/*lo_sum constant expand code.
	(nios2_print_operand_punct_valid_p): New.
	(nios2_print_operand): Add %., %!, %x, %y, %A.  Remove %U.
	(split_mem_address): New.
	(split_alu_insn): New.
	(cdxreg): New.
	(cdx_add_immed, cdx_and_immed, cdx_mov_immed, cdx_shift_immed): New.
	(enum nios2_add_insn_kind): New.
	(nios2_add_insn_names, nios2_add_insn_narrow): New.
	(nios2_add_insn_classify): New.
	(nios2_add_insn_asm): New.
	(nios2_cdx_narrow_form_p): New.
	(label_align, min_labelno, max_labelno): New.
	(nios2_reorg): New.
	(nios2_label_align): New.
	(nios2_adjust_reg_alloc_order): New.
	(TARGET_PRINT_OPERAND_PUNCT_VALID_P): Define.
	(TARGET_MACHINE_DEPENDENT_REORG): Define.
	* config/nios2/constraints.md (P): New constraint.
	* config/nios2/predicates.md (const_and_operand): New.
	(and_operand): New.
	(stack_memory_operand): New.
	* config/nios2/nios2.md (SP_REGNO): Define stack pointer regno.
	(length): Update to use nios2_cdx_narrow_form_p().
	(type): Add new insn type values.
	(control, alu, st, ld, shift): Update insn reservations with
	new insn type values.
	(*high, *lo_sum): Define new insn patterns for constant generation.
	(movqi_internal, movhi_internal, movsi_internal): Reduce
	alternatives, update asm template to handle CDX variants, update
	type attributes.
	(zero_extendhisi2, zero_extendqi<mode>2): Add CDX variants to asm
	template, update type attributes.
	(extendhisi2, extendqi<mode>2): Likewise.
	(addsi3): Change to use function for asm string.
	(subsi3): Add CDX notation to asm template, update type attributes.
	(negsi3, one_cmplsi3): Likewise.
	(andsi3): New pattern, specialized from logical patterns.
	(<code>si3): Remove and case, combine alternatives, update asm
	template.
	(<shift_op>si3): Add CDX notation, update type attributes.
	(rotrsi3): Update type attribute.
	(*merge, extzv, insv): New insn patterns.
	(return): Change to define_expand.
	(simple_return): Add CDX notation, update type attributes.
	(indirect_jump): Add CDX notation.
	(jump): Update asm cases, update length attribute expression.
	(*call, *call_value, *sibcall, *sibcall_value): Add CDX variant.
	(nios2_cbranch): Update asm cases and length attribute expression
	to handle CDX variants.
	(nios2_cmp<code>): Update asm template.
	(nop): Add CDX notation, update type attributes.
	(trap): Add CDX notation.
	(ctrapsi4): Update asm cases and length attribute expression to
	handle CDX variant.
	* doc/md.texi (Machine Constraints): Document P constraint.

	gcc/testsuite/
	* gcc.target/nios2/andci.c: New.
	* gcc.target/nios2/bmx.c: New.
	* gcc.target/nios2/cdx-add.c: New.
	* gcc.target/nios2/cdx-branch.c: New.
	* gcc.target/nios2/cdx-callret.c: New.
	* gcc.target/nios2/cdx-loadstore.c: New.
	* gcc.target/nios2/cdx-logical.c: New.
	* gcc.target/nios2/cdx-mov.c: New.
	* gcc.target/nios2/cdx-shift.c: New.
	* gcc.target/nios2/cdx-sub.c: New.
	* gcc.target/nios2/nios2-trap-insn.c: Adjust pattern.

[-- Attachment #3: r2-4.patch --]
[-- Type: text/x-patch, Size: 56315 bytes --]

Index: gcc/config/nios2/nios2.h
===================================================================
--- gcc/config/nios2/nios2.h	(revision 225793)
+++ gcc/config/nios2/nios2.h	(working copy)
@@ -96,6 +96,8 @@
   ((TREE_CODE (EXP) == STRING_CST)                              \
    && (ALIGN) < BITS_PER_WORD ? BITS_PER_WORD : (ALIGN))
 
+#define LABEL_ALIGN(LABEL) nios2_label_align (LABEL)
+
 /* Layout of source language data types.  */
 
 #define INT_TYPE_SIZE 32
@@ -175,6 +177,20 @@
 #define HARD_REGNO_NREGS(REGNO, MODE)            \
   ((GET_MODE_SIZE (MODE) + UNITS_PER_WORD - 1) / UNITS_PER_WORD)
 
+/* Order in which to allocate registers.  Each register must be
+   listed once.  This is the default ordering for R1 and non-CDX R2
+   code.  For CDX, we overwrite this in ADJUST_REG_ALLOC_ORDER.  */
+#define REG_ALLOC_ORDER							\
+  { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, \
+      20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, \
+      37, 38, 39 }
+
+#define ADJUST_REG_ALLOC_ORDER nios2_adjust_reg_alloc_order ()
+
+/* Caller-save costs can be less emphasized under R2 CDX, where we can
+   use push.n/pop.n.  */
+#define HONOR_REG_ALLOC_ORDER (TARGET_HAS_CDX)
+
 /* Register Classes.  */
 
 enum reg_class
@@ -213,6 +229,9 @@ enum reg_class
 #define CLASS_MAX_NREGS(CLASS, MODE)					\
   ((GET_MODE_SIZE (MODE) + UNITS_PER_WORD - 1) / UNITS_PER_WORD)
 
+#define CDX_REG_P(REGNO)						\
+  ((REGNO) == 16 || (REGNO) == 17 || (2 <= (REGNO) && (REGNO) <= 7))
+
 /* Tests for various kinds of constants used in the Nios II port.  */
 
 #define SMALL_INT(X) ((unsigned HOST_WIDE_INT)(X) + 0x8000 < 0x10000)
@@ -222,6 +241,8 @@ enum reg_class
 #define SHIFT_INT(X) ((X) >= 0 && (X) <= 31)
 #define RDWRCTL_INT(X) ((X) >= 0 && (X) <= 31)
 #define CUSTOM_INSN_OPCODE(X) ((X) >= 0 && (X) <= 255)
+#define ANDCLEAR_INT(X) \
+  (((X) & 0xffff) == 0xffff || (((X) >> 16) & 0xffff) == 0xffff)
 
 /* Say that the epilogue uses the return address register.  Note that
    in the case of sibcalls, the values "used by the epilogue" are
Index: gcc/config/nios2/nios2-protos.h
===================================================================
--- gcc/config/nios2/nios2-protos.h	(revision 225793)
+++ gcc/config/nios2/nios2-protos.h	(working copy)
@@ -42,12 +42,18 @@ extern bool nios2_validate_fpu_compare (
 
 extern bool nios2_fpu_insn_enabled (enum n2fpu_code);
 extern const char * nios2_fpu_insn_asm (enum n2fpu_code);
+extern const char * nios2_add_insn_asm (rtx_insn *, rtx *);
 
 extern bool nios2_legitimate_pic_operand_p (rtx);
 extern bool gprel_constant_p (rtx);
 extern bool nios2_regno_ok_for_base_p (int, bool);
 extern bool nios2_unspec_reloc_p (rtx);
 
+extern int nios2_label_align (rtx);
+extern bool nios2_cdx_narrow_form_p (rtx_insn *);
+
+extern void nios2_adjust_reg_alloc_order (void);
+
 #ifdef TREE_CODE
 #ifdef ARGS_SIZE_RTX
 /* expr.h defines both ARGS_SIZE_RTX and `enum direction' */
Index: gcc/config/nios2/nios2.c
===================================================================
--- gcc/config/nios2/nios2.c	(revision 225793)
+++ gcc/config/nios2/nios2.c	(working copy)
@@ -1195,6 +1195,13 @@ nios2_rtx_costs (rtx x, machine_mode mod
           return false;
         }
 
+    case ZERO_EXTRACT:
+      if (TARGET_HAS_BMX)
+	{
+          *total = COSTS_N_INSNS (1);
+          return true;
+	}
+
       default:
         return false;
     }
@@ -1262,6 +1269,14 @@ nios2_unspec_reloc_p (rtx op)
 	  && ! nios2_large_offset_p (XINT (XEXP (op, 0), 1)));
 }
 
+static bool
+nios2_large_unspec_reloc_p (rtx op)
+{
+  return (GET_CODE (op) == CONST
+	  && GET_CODE (XEXP (op, 0)) == UNSPEC
+	  && nios2_large_offset_p (XINT (XEXP (op, 0), 1)));
+}
+
 /* Helper to generate unspec constant.  */
 static rtx
 nios2_unspec_offset (rtx loc, int unspec)
@@ -1871,9 +1886,7 @@ nios2_load_pic_address (rtx sym, int uns
 bool
 nios2_legitimate_pic_operand_p (rtx x)
 {
-  if (GET_CODE (x) == CONST
-      && GET_CODE (XEXP (x, 0)) == UNSPEC
-      && nios2_large_offset_p (XINT (XEXP (x, 0), 1)))
+  if (nios2_large_unspec_reloc_p (x))
     return true;
 
   return ! (GET_CODE (x) == SYMBOL_REF
@@ -2001,10 +2014,37 @@ nios2_emit_move_sequence (rtx *operands,
       from = copy_to_mode_reg (mode, from);
     }
 
-  if (GET_CODE (from) == SYMBOL_REF || GET_CODE (from) == LABEL_REF
-      || (GET_CODE (from) == CONST
-	  && GET_CODE (XEXP (from, 0)) != UNSPEC))
-    from = nios2_legitimize_constant_address (from);
+  if (CONSTANT_P (from))
+    {
+      if (CONST_INT_P (from))
+	{
+	  if (!SMALL_INT (INTVAL (from))
+	      && !SMALL_INT_UNSIGNED (INTVAL (from))
+	      && !UPPER16_INT (INTVAL (from)))
+	    {
+	      HOST_WIDE_INT high = (INTVAL (from) + 0x8000) & ~0xffff;
+	      HOST_WIDE_INT low = INTVAL (from) & 0xffff;
+	      emit_move_insn (to, gen_int_mode (high, SImode));
+	      emit_insn (gen_add2_insn (to, gen_int_mode (low, HImode)));
+	      set_unique_reg_note (get_last_insn (), REG_EQUAL,
+				   copy_rtx (from));
+	      return true;
+	    }
+	}
+      else if (!gprel_constant_p (from))
+	{
+	  if (!nios2_large_unspec_reloc_p (from))
+	    from = nios2_legitimize_constant_address (from);
+	  if (CONSTANT_P (from))
+	    {
+	      emit_insn (gen_rtx_SET (to, gen_rtx_HIGH (Pmode, from)));
+	      emit_insn (gen_rtx_SET (to, gen_rtx_LO_SUM (Pmode, to, from)));
+	      set_unique_reg_note (get_last_insn (), REG_EQUAL,
+				   copy_rtx (operands[1]));
+	      return true;
+	    }
+	}
+    }
 
   operands[0] = to;
   operands[1] = from;
@@ -2037,25 +2077,106 @@ nios2_adjust_call_address (rtx *call_op,
 \f
 /* Output assembly language related definitions.  */
 
+/* Implement TARGET_PRINT_OPERAND_PUNCT_VALID_P.  */
+static bool
+nios2_print_operand_punct_valid_p (unsigned char code)
+{
+  return (code == '.' || code == '!');
+}
+
+
 /* Print the operand OP to file stream FILE modified by LETTER.
    LETTER can be one of:
 
-     i: print "i" if OP is an immediate, except 0
-     o: print "io" if OP is volatile
-     z: for const0_rtx print $0 instead of 0
+     i: print i/hi/ui suffixes (used for mov instruction variants),
+        when OP is the appropriate immediate operand.
+
+     u: like 'i', except without "ui" suffix case (used for cmpgeu/cmpltu)
+
+     o: print "io" if OP needs volatile access (due to TARGET_BYPASS_CACHE
+        or TARGET_BYPASS_CACHE_VOLATILE).
+
+     x: print i/hi/ci/chi suffixes for the and instruction,
+        when OP is the appropriate immediate operand.
+
+     z: prints the third register immediate operand in assembly
+        instructions.  Outputs const0_rtx as the 'zero' register
+	instead of '0'.
+	
+     y: same as 'z', but specifically for logical instructions,
+        where the processing of immediates is slightly different.
+
      H: for %hiadj
      L: for %lo
-     U: for upper half of 32 bit value
      D: for the upper 32-bits of a 64-bit double value
      R: prints reverse condition.
+     A: prints (reg) operand for ld[s]ex and st[s]ex.
+
+     .: print .n suffix for 16-bit instructions.
+     !: print r.n suffix for 16-bit instructions.  Used for jmpr.n.
 */
 static void
 nios2_print_operand (FILE *file, rtx op, int letter)
 {
 
+  /* First take care of the format letters that just insert a string
+     into the output stream.  */
   switch (letter)
     {
+    case '.':
+      if (current_output_insn && get_attr_length (current_output_insn) == 2)
+	fprintf (file, ".n");
+      return;
+
+    case '!':
+      if (current_output_insn && get_attr_length (current_output_insn) == 2)
+	fprintf (file, "r.n");
+      return;
+
+    case 'x':
+      if (CONST_INT_P (op))
+	{
+	  HOST_WIDE_INT val = INTVAL (op);
+	  HOST_WIDE_INT low = val & 0xffff;
+	  HOST_WIDE_INT high = (val >> 16) & 0xffff;
+
+	  if (val != 0)
+	    {
+	      if (high != 0)
+		{
+		  if (low != 0)
+		    {
+		      gcc_assert (TARGET_ARCH_R2);
+		      if (high == 0xffff)
+			fprintf (file, "c");
+		      else if (low == 0xffff)
+			fprintf (file, "ch");
+		      else
+			gcc_unreachable ();
+		    }
+		  else
+		    fprintf (file, "h");
+		}
+	      fprintf (file, "i");
+	    }
+	}
+      return;
+
+    case 'u':
     case 'i':
+      if (CONST_INT_P (op))
+	{
+	  HOST_WIDE_INT val = INTVAL (op);
+	  HOST_WIDE_INT low = val & 0xffff;
+	  HOST_WIDE_INT high = (val >> 16) & 0xffff;
+	  if (val != 0)
+	    {
+	      if (low == 0 && high != 0)
+		fprintf (file, "h");
+	      else if (high == 0 && (low & 0x8000) != 0 && letter != 'u')
+		fprintf (file, "u");
+	    }
+	}
       if (CONSTANT_P (op) && op != const0_rtx)
         fprintf (file, "i");
       return;
@@ -2064,13 +2185,18 @@ nios2_print_operand (FILE *file, rtx op,
       if (GET_CODE (op) == MEM
 	  && ((MEM_VOLATILE_P (op) && TARGET_BYPASS_CACHE_VOLATILE)
 	      || TARGET_BYPASS_CACHE))
-        fprintf (file, "io");
+	{
+	  gcc_assert (current_output_insn
+		      && get_attr_length (current_output_insn) == 4);
+	  fprintf (file, "io");
+	}
       return;
 
     default:
       break;
     }
 
+  /* Handle comparison operator names.  */
   if (comparison_operator (op, VOIDmode))
     {
       enum rtx_code cond = GET_CODE (op);
@@ -2086,10 +2212,11 @@ nios2_print_operand (FILE *file, rtx op,
 	}
     }
 
+  /* Now handle the cases where we actually need to format an operand.  */
   switch (GET_CODE (op))
     {
     case REG:
-      if (letter == 0 || letter == 'z')
+      if (letter == 0 || letter == 'z' || letter == 'y')
         {
           fprintf (file, "%s", reg_names[REGNO (op)]);
           return;
@@ -2102,19 +2229,64 @@ nios2_print_operand (FILE *file, rtx op,
       break;
 
     case CONST_INT:
-      if (INTVAL (op) == 0 && letter == 'z')
-        {
-          fprintf (file, "zero");
-          return;
-        }
+      {
+	rtx int_rtx = op;
+	HOST_WIDE_INT val = INTVAL (int_rtx);
+	HOST_WIDE_INT low = val & 0xffff;
+	HOST_WIDE_INT high = (val >> 16) & 0xffff;
+
+	if (letter == 'y')
+	  {
+	    if (val == 0)
+	      fprintf (file, "zero");
+	    else
+	      {
+		if (high != 0)
+		  {
+		    if (low != 0)
+		      {
+			gcc_assert (TARGET_ARCH_R2);
+			if (high == 0xffff)
+			  /* andci.  */
+			  int_rtx = gen_int_mode (low, SImode);
+			else if (low == 0xffff)
+			  /* andchi.  */
+			  int_rtx = gen_int_mode (high, SImode);
+			else
+			  gcc_unreachable ();
+		      }
+		    else
+		      /* andhi.  */
+		      int_rtx = gen_int_mode (high, SImode);
+		  }
+		else
+		  /* andi.  */
+		  int_rtx = gen_int_mode (low, SImode);
+		output_addr_const (file, int_rtx);
+	      }
+	    return;
+	  }
+	else if (letter == 'z')
+	  {
+	    if (val == 0)
+	      fprintf (file, "zero");
+	    else
+	      {
+		if (low == 0 && high != 0)
+		  int_rtx = gen_int_mode (high, SImode);
+		else if (low != 0)
+		  {
+		    gcc_assert (high == 0 || high == 0xffff);
+		    int_rtx = gen_int_mode (low, high == 0 ? SImode : HImode);
+		  }
+		else
+		  gcc_unreachable ();
+		output_addr_const (file, int_rtx);
+	      }
+	    return;
+	  }
+      }
 
-      if (letter == 'U')
-        {
-          HOST_WIDE_INT val = INTVAL (op);
-	  val = (val >> 16) & 0xFFFF;
-	  output_addr_const (file, gen_int_mode (val, SImode));
-          return;
-        }
       /* Else, fall through.  */
 
     case CONST:
@@ -2147,6 +2319,12 @@ nios2_print_operand (FILE *file, rtx op,
 
     case SUBREG:
     case MEM:
+      if (letter == 'A')
+	{
+	  /* Address of '(reg)' form, with no index.  */
+	  fprintf (file, "(%s)", reg_names[REGNO (XEXP (op, 0))]);
+	  return;
+	}
       if (letter == 0)
         {
           output_address (op);
@@ -3462,6 +3640,489 @@ nios2_asm_output_mi_thunk (FILE *file, t
   reload_completed = 0;
 }
 
+
+/* Utility function to break a memory address into
+   base register + constant offset.  Return false if something
+   unexpected is seen.  */
+static bool
+split_mem_address (rtx addr, rtx *base_reg, rtx *offset)
+{
+  if (REG_P (addr))
+    {
+      *base_reg = addr;
+      *offset = const0_rtx;
+      return true;
+    }
+  else if (GET_CODE (addr) == PLUS)
+    {
+      *base_reg = XEXP (addr, 0);
+      *offset = XEXP (addr, 1);
+      return true;
+    }
+  return false;
+}
+
+/* Splits out the operands of an ALU insn, places them in *LHS, *RHS1, *RHS2.  */
+static void
+split_alu_insn (rtx_insn *insn, rtx *lhs, rtx *rhs1, rtx *rhs2)
+{
+  rtx pat = PATTERN (insn);
+  gcc_assert (GET_CODE (pat) == SET);
+  *lhs = SET_DEST (pat);
+  *rhs1 = XEXP (SET_SRC (pat), 0);
+  if (GET_RTX_CLASS (GET_CODE (SET_SRC (pat))) != RTX_UNARY)
+    *rhs2 = XEXP (SET_SRC (pat), 1);
+  return;
+}
+
+/* Returns true if OP is a REG and, after reload, is assigned a CDX reg.  */
+static bool
+cdxreg (rtx op)
+{
+  return REG_P (op) && (!reload_completed || CDX_REG_P (REGNO (op)));
+}
+
+/* Returns true if OP is within range of CDX addi.n immediates.  */
+static bool
+cdx_add_immed (rtx op)
+{
+  if (CONST_INT_P (op))
+    {
+      HOST_WIDE_INT ival = INTVAL (op);
+      return ival <= 128 && ival > 0 && (ival & (ival - 1)) == 0;
+    }
+  return false;
+}
+
+/* Returns true if OP is within range of CDX andi.n immediates.  */
+static bool
+cdx_and_immed (rtx op)
+{
+  if (CONST_INT_P (op))
+    {
+      HOST_WIDE_INT ival = INTVAL (op);
+      return (ival == 1 || ival == 2 || ival == 3 || ival == 4
+	      || ival == 8 || ival == 0xf || ival == 0x10
+	      || ival == 0x1f || ival == 0x20
+	      || ival == 0x3f || ival == 0x7f
+	      || ival == 0x80 || ival == 0xff || ival == 0x7ff
+	      || ival == 0xff00 || ival == 0xffff);
+    }
+  return false;
+}
+
+/* Returns true if OP is within range of CDX movi.n immediates.  */
+static bool
+cdx_mov_immed (rtx op)
+{
+  if (CONST_INT_P (op))
+    {
+      HOST_WIDE_INT ival = INTVAL (op);
+      return ((ival >= 0 && ival <= 124)
+	      || ival == 0xff || ival == -2 || ival == -1);
+    }
+  return false;
+}
+
+/* Returns true if OP is within range of CDX slli.n/srli.n immediates.  */
+static bool
+cdx_shift_immed (rtx op)
+{
+  if (CONST_INT_P (op))
+    {
+      HOST_WIDE_INT ival = INTVAL (op);
+      return (ival == 1 || ival == 2 || ival == 3 || ival == 8
+	      || ival == 12 || ival == 16 || ival == 24
+	      || ival == 31);
+    }
+  return false;
+}
+
+
+
+/* Classification of different kinds of add instructions.  */
+enum nios2_add_insn_kind {
+  nios2_add_n_kind,
+  nios2_addi_n_kind,
+  nios2_subi_n_kind,
+  nios2_spaddi_n_kind,
+  nios2_spinci_n_kind,
+  nios2_spdeci_n_kind,
+  nios2_add_kind,
+  nios2_addi_kind
+};
+
+static const char *nios2_add_insn_names[] = {
+  "add.n", "addi.n", "subi.n", "spaddi.n",  "spinci.n", "spdeci.n",
+  "add", "addi" };
+static bool nios2_add_insn_narrow[] = {
+  true, true, true, true, true, true,
+  false, false};
+
+/* Function to classify kinds of add instruction patterns.  */
+static enum nios2_add_insn_kind
+nios2_add_insn_classify (rtx_insn *insn ATTRIBUTE_UNUSED,
+			 rtx lhs, rtx rhs1, rtx rhs2)
+{
+  if (TARGET_HAS_CDX)
+    {
+      if (cdxreg (lhs) && cdxreg (rhs1))
+	{
+	  if (cdxreg (rhs2))
+	    return nios2_add_n_kind;
+	  if (CONST_INT_P (rhs2))
+	    {
+	      HOST_WIDE_INT ival = INTVAL (rhs2);
+	      if (ival > 0 && cdx_add_immed (rhs2))
+		return nios2_addi_n_kind;
+	      if (ival < 0 && cdx_add_immed (GEN_INT (-ival)))
+		return nios2_subi_n_kind;
+	    }
+	}
+      else if (rhs1 == stack_pointer_rtx
+	       && CONST_INT_P (rhs2))
+	{
+	  HOST_WIDE_INT imm7 = INTVAL (rhs2) >> 2;
+	  HOST_WIDE_INT rem = INTVAL (rhs2) & 3;
+	  if (rem == 0 && (imm7 & ~0x7f) == 0)
+	    {
+	      if (cdxreg (lhs))
+		return nios2_spaddi_n_kind;
+	      if (lhs == stack_pointer_rtx)
+		return nios2_spinci_n_kind;
+	    }
+	  imm7 = -INTVAL (rhs2) >> 2;
+	  rem = -INTVAL (rhs2) & 3;
+	  if (lhs == stack_pointer_rtx
+	      && rem == 0 && (imm7 & ~0x7f) == 0)
+	    return nios2_spdeci_n_kind;
+	}
+    }
+  return ((REG_P (rhs2) || rhs2 == const0_rtx)
+	  ? nios2_add_kind : nios2_addi_kind);
+}
+
+/* Emit assembly language for the different kinds of add instructions.  */
+const char *
+nios2_add_insn_asm (rtx_insn *insn, rtx *operands)
+{
+  static char buf[256];
+  int ln = 256;
+  enum nios2_add_insn_kind kind
+    = nios2_add_insn_classify (insn, operands[0], operands[1], operands[2]);
+  if (kind == nios2_subi_n_kind)
+    snprintf (buf, ln, "subi.n\t%%0, %%1, %d", (int) -INTVAL (operands[2]));
+  else if (kind == nios2_spaddi_n_kind)
+    snprintf (buf, ln, "spaddi.n\t%%0, %%2");
+  else if (kind == nios2_spinci_n_kind)
+    snprintf (buf, ln, "spinci.n\t%%2");
+  else if (kind == nios2_spdeci_n_kind)
+    snprintf (buf, ln, "spdeci.n\t%d", (int) -INTVAL (operands[2]));
+  else
+    snprintf (buf, ln, "%s\t%%0, %%1, %%z2", nios2_add_insn_names[(int)kind]);
+  return buf;
+}
+
+/* This routine, which the default "length" attribute computation is
+   based on, encapsulates information about all the cases where CDX
+   provides a narrow 2-byte instruction form.  */
+bool
+nios2_cdx_narrow_form_p (rtx_insn *insn)
+{
+  rtx pat, lhs, rhs1, rhs2;
+  enum attr_type type;
+  if (!TARGET_HAS_CDX)
+    return false;
+  type = get_attr_type (insn);
+  pat = PATTERN (insn);
+  gcc_assert (reload_completed);
+  switch (type)
+    {
+    case TYPE_CONTROL:
+      if (GET_CODE (pat) == SIMPLE_RETURN)
+	return true;
+      if (GET_CODE (pat) == PARALLEL)
+	pat = XVECEXP (pat, 0, 0);
+      if (GET_CODE (pat) == SET)
+	pat = SET_SRC (pat);
+      if (GET_CODE (pat) == IF_THEN_ELSE)
+	{
+	  /* Conditional branch patterns; for these we
+	     only check the comparison to find beqz.n/bnez.n cases.
+	     For the 'nios2_cbranch' pattern, we cannot also check
+	     the branch range here.  That will be done in the md
+	     pattern "length" attribute computation.  */
+	  rtx cmp = XEXP (pat, 0);
+	  return ((GET_CODE (cmp) == EQ || GET_CODE (cmp) == NE)
+		  && cdxreg (XEXP (cmp, 0))
+		  && XEXP (cmp, 1) == const0_rtx);
+	}
+      if (GET_CODE (pat) == TRAP_IF)
+	/* trap.n is always usable.  */
+	return true;
+      if (GET_CODE (pat) == CALL)
+	pat = XEXP (XEXP (pat, 0), 0);
+      if (REG_P (pat))
+	/* Control instructions taking a register operand are indirect
+	   jumps and calls.  The CDX instructions have a 5-bit register
+	   field so any reg is valid.  */
+	return true;
+      else
+	{
+	  gcc_assert (!insn_variable_length_p (insn));
+	  return false;
+	}
+    case TYPE_ADD:
+      {
+	enum nios2_add_insn_kind kind;
+	split_alu_insn (insn, &lhs, &rhs1, &rhs2);
+	kind = nios2_add_insn_classify (insn, lhs, rhs1, rhs2);
+	return nios2_add_insn_narrow[(int)kind];
+      }
+    case TYPE_LD:
+      {
+	bool ret;
+	HOST_WIDE_INT offset, rem = 0;
+	rtx addr, reg = SET_DEST (pat), mem = SET_SRC (pat);
+	if (GET_CODE (mem) == SIGN_EXTEND)
+	  /* No CDX form for sign-extended load.  */
+	  return false;
+	if (GET_CODE (mem) == ZERO_EXTEND)
+	  /* The load alternatives in the zero_extend* patterns.  */
+	  mem = XEXP (mem, 0);
+	if (MEM_P (mem))
+	  {
+	    /* ldxio.  */
+	    if ((MEM_VOLATILE_P (mem) && TARGET_BYPASS_CACHE_VOLATILE)
+		|| TARGET_BYPASS_CACHE)
+	      return false;
+	    addr = XEXP (mem, 0);
+	    /* GP-based references are never narrow.  */
+	    if (gprel_constant_p (addr))
+		return false;
+	    ret = split_mem_address (addr, &rhs1, &rhs2);
+	    gcc_assert (ret);
+	  }
+	else
+	  return false;
+
+	offset = INTVAL (rhs2);
+	if (GET_MODE (mem) == SImode)
+	  {
+	    rem = offset & 3;
+	    offset >>= 2;
+	    /* ldwsp.n case.  */
+	    if (rtx_equal_p (rhs1, stack_pointer_rtx)
+		&& rem == 0 && (offset & ~0x1f) == 0)
+	      return true;
+	  }
+	else if (GET_MODE (mem) == HImode)
+	  {
+	    rem = offset & 1;
+	    offset >>= 1;
+	  }
+	/* ldbu.n, ldhu.n, ldw.n cases.  */
+	return (cdxreg (reg) && cdxreg (rhs1)
+		&& rem == 0 && (offset & ~0xf) == 0);
+      }
+    case TYPE_ST:
+      if (GET_CODE (pat) == PARALLEL)
+	/* stex, stsex.  */
+	return false;
+      else
+	{
+	  bool ret;
+	  HOST_WIDE_INT offset, rem = 0;
+	  rtx addr, reg = SET_SRC (pat), mem = SET_DEST (pat);
+	  if (!MEM_P (mem))
+	    return false;
+	  /* stxio.  */
+	  if ((MEM_VOLATILE_P (mem) && TARGET_BYPASS_CACHE_VOLATILE)
+	      || TARGET_BYPASS_CACHE)
+	    return false;
+	  addr = XEXP (mem, 0);
+	  /* GP-based references are never narrow.  */
+	  if (gprel_constant_p (addr))
+	    return false;
+	  ret = split_mem_address (addr, &rhs1, &rhs2);
+	  gcc_assert (ret);
+	  offset = INTVAL (rhs2);
+	  if (GET_MODE (mem) == SImode)
+	    {
+	      rem = offset & 3;
+	      offset >>= 2;
+	      /* stwsp.n case.  */
+	      if (rtx_equal_p (rhs1, stack_pointer_rtx)
+		  && rem == 0 && (offset & ~0x1f) == 0)
+		return true;
+	      /* stwz.n case.  */
+	      else if (reg == const0_rtx && cdxreg (rhs1)
+		       && rem == 0 && (offset & ~0x3f) == 0)
+		return true;
+	    }
+	  else if (GET_MODE (mem) == HImode)
+	    {
+	      rem = offset & 1;
+	      offset >>= 1;
+	    }
+	  else
+	    {
+	      gcc_assert (GET_MODE (mem) == QImode);
+	      /* stbz.n case.  */
+	      if (reg == const0_rtx && cdxreg (rhs1)
+		  && (offset & ~0x3f) == 0)
+		return true;
+	    }
+
+	  /* stb.n, sth.n, stw.n cases.  */
+	  return (cdxreg (reg) && cdxreg (rhs1)
+		  && rem == 0 && (offset & ~0xf) == 0);
+	}
+    case TYPE_MOV:
+      lhs = SET_DEST (pat);
+      rhs1 = SET_SRC (pat);
+      if (CONST_INT_P (rhs1))
+	return (cdxreg (lhs) && cdx_mov_immed (rhs1));
+      gcc_assert (REG_P (lhs) && REG_P (rhs1));
+      return true;
+
+    case TYPE_AND:
+      /* Some zero_extend* alternatives are and insns.  */
+      if (GET_CODE (SET_SRC (pat)) == ZERO_EXTEND)
+	return (cdxreg (SET_DEST (pat))
+		&& cdxreg (XEXP (SET_SRC (pat), 0)));
+      split_alu_insn (insn, &lhs, &rhs1, &rhs2);
+      if (CONST_INT_P (rhs2))
+	return (cdxreg (lhs) && cdxreg (rhs1) && cdx_and_immed (rhs2));
+      return (cdxreg (lhs) && cdxreg (rhs2)
+	      && (!reload_completed || rtx_equal_p (lhs, rhs1)));
+
+    case TYPE_OR:
+    case TYPE_XOR:
+      /* Note the two-address limitation for CDX form.  */
+      split_alu_insn (insn, &lhs, &rhs1, &rhs2);
+      return (cdxreg (lhs) && cdxreg (rhs2)
+	      && (!reload_completed || rtx_equal_p (lhs, rhs1)));
+
+    case TYPE_SUB:
+      split_alu_insn (insn, &lhs, &rhs1, &rhs2);
+      return (cdxreg (lhs) && cdxreg (rhs1) && cdxreg (rhs2));
+
+    case TYPE_NEG:
+    case TYPE_NOT:
+      split_alu_insn (insn, &lhs, &rhs1, NULL);
+      return (cdxreg (lhs) && cdxreg (rhs1));
+
+    case TYPE_SLL:
+    case TYPE_SRL:
+      split_alu_insn (insn, &lhs, &rhs1, &rhs2);
+      return (cdxreg (lhs)
+	      && ((cdxreg (rhs1) && cdx_shift_immed (rhs2))
+		  || (cdxreg (rhs2)
+		      && (!reload_completed || rtx_equal_p (lhs, rhs1)))));
+    case TYPE_NOP:
+    case TYPE_PUSH:
+    case TYPE_POP:
+      return true;
+    default:
+      break;
+    }
+  return false;
+}
+
+/* Implement TARGET_MACHINE_DEPENDENT_REORG:
+   We use this hook when emitting CDX code to enforce the 4-byte
+   alignment requirement for labels that are used as the targets of
+   jmpi instructions.  CDX code can otherwise contain a mix of 16-bit
+   and 32-bit instructions aligned on any 16-bit boundary, but functions
+   and jmpi labels have to be 32-bit aligned because of the way the address
+   is encoded in the instruction.  */
+
+static unsigned char *label_align;
+static int min_labelno, max_labelno;
+
+static void
+nios2_reorg (void)
+{
+  bool changed = true;
+  rtx_insn *insn;
+
+  if (!TARGET_HAS_CDX)
+    return;
+
+  /* Initialize the data structures.  */
+  if (label_align)
+    free (label_align);
+  max_labelno = max_label_num ();
+  min_labelno = get_first_label_num ();
+  label_align = XCNEWVEC (unsigned char, max_labelno - min_labelno + 1);
+  
+  /* Iterate on inserting alignment and adjusting branch lengths until
+     no more changes.  */
+  while (changed)
+    {
+      changed = false;
+      shorten_branches (get_insns ());
+
+      for (insn = get_insns (); insn != 0; insn = NEXT_INSN (insn))
+	if (JUMP_P (insn) && insn_variable_length_p (insn))
+	  {
+	    rtx label = JUMP_LABEL (insn);
+	    /* We rely on the current fact that all cases in the
+	       machine description where 'jmpi' does the actual branch
+	       have a computed length of 6 or 8.  Lengths 4 and below
+	       are all PC-relative 'br' branches without the jump-align
+	       problem.  */
+	    if (label && LABEL_P (label) && get_attr_length (insn) > 4)
+	      {
+		int index = CODE_LABEL_NUMBER (label) - min_labelno;
+		if (label_align[index] != 2)
+		  {
+		    label_align[index] = 2;
+		    changed = true;
+		  }
+	      }
+	  }
+    }
+}
+
+/* Implement LABEL_ALIGN, using the information gathered in nios2_reorg.  */
+int
+nios2_label_align (rtx label)
+{
+  int n = CODE_LABEL_NUMBER (label);
+
+  if (label_align && n >= min_labelno && n <= max_labelno)
+    return MAX (label_align[n - min_labelno], align_labels_log);
+  return align_labels_log;
+}
+
+/* Implement ADJUST_REG_ALLOC_ORDER.  We use the default ordering
+   for R1 and non-CDX R2 code; for CDX we tweak things to prefer
+   the registers that can be used as operands to instructions that
+   have 3-bit register fields.  */
+void
+nios2_adjust_reg_alloc_order (void)
+{
+  const int cdx_reg_alloc_order[] =
+    {
+      /* Call-clobbered GPRs within CDX 3-bit encoded range.  */
+      2, 3, 4, 5, 6, 7, 
+      /* Call-saved GPRs within CDX 3-bit encoded range.  */
+      16, 17,
+      /* Other call-clobbered GPRs.  */
+      8, 9, 10, 11, 12, 13, 14, 15,
+      /* Other call-saved GPRs. RA placed first since it is always saved.  */
+      31, 18, 19, 20, 21, 22, 23, 28,
+      /* Fixed GPRs, not used by the register allocator.  */
+      0, 1, 24, 25, 26, 27, 29, 30, 32, 33, 34, 35, 36, 37, 38, 39
+   };
+
+  if (TARGET_HAS_CDX)
+    memcpy (reg_alloc_order, cdx_reg_alloc_order,
+	    sizeof (int) * FIRST_PSEUDO_REGISTER);
+}
+
 \f
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_FUNCTION_PROLOGUE
@@ -3549,6 +4210,9 @@ nios2_asm_output_mi_thunk (FILE *file, t
 #undef TARGET_ASM_OUTPUT_DWARF_DTPREL
 #define TARGET_ASM_OUTPUT_DWARF_DTPREL nios2_output_dwarf_dtprel
 
+#undef TARGET_PRINT_OPERAND_PUNCT_VALID_P
+#define TARGET_PRINT_OPERAND_PUNCT_VALID_P nios2_print_operand_punct_valid_p
+
 #undef TARGET_PRINT_OPERAND
 #define TARGET_PRINT_OPERAND nios2_print_operand
 
@@ -3589,6 +4253,9 @@ nios2_asm_output_mi_thunk (FILE *file, t
 #undef  TARGET_ASM_OUTPUT_MI_THUNK
 #define TARGET_ASM_OUTPUT_MI_THUNK nios2_asm_output_mi_thunk
 
+#undef TARGET_MACHINE_DEPENDENT_REORG
+#define TARGET_MACHINE_DEPENDENT_REORG nios2_reorg
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-nios2.h"
Index: gcc/config/nios2/constraints.md
===================================================================
--- gcc/config/nios2/constraints.md	(revision 225793)
+++ gcc/config/nios2/constraints.md	(working copy)
@@ -20,9 +20,10 @@
 
 ;; We use the following constraint letters for constants
 ;;
-;;  I: -32768 to -32767
+;;  I: -32768 to 32767
 ;;  J: 0 to 65535
 ;;  K: $nnnn0000 for some nnnn
+;;  P: Under R2, $nnnnffff or $ffffnnnn for some nnnn
 ;;  L: 0 to 31 (for shift counts)
 ;;  M: 0
 ;;  N: 0 to 255 (for custom instruction numbers)
@@ -86,6 +87,11 @@
   (and (match_code "const_int")
        (match_test "ival >= 0 && ival <= 31")))
 
+(define_constraint "P"
+  "An immediate operand for R2 andchi/andci instructions."
+  (and (match_code "const_int")
+       (match_test "TARGET_ARCH_R2 && ANDCLEAR_INT (ival)")))
+
 (define_constraint "S"
   "An immediate stored in small data, accessible by GP."
   (match_test "gprel_constant_p (op)"))
Index: gcc/config/nios2/predicates.md
===================================================================
--- gcc/config/nios2/predicates.md	(revision 225793)
+++ gcc/config/nios2/predicates.md	(working copy)
@@ -55,6 +55,16 @@
   (ior (match_operand 0 "const_logical_operand")
        (match_operand 0 "register_operand")))
 
+(define_predicate "const_and_operand"
+  (and (match_code "const_int")
+       (match_test "SMALL_INT_UNSIGNED (INTVAL (op))
+                    || UPPER16_INT (INTVAL (op))
+                    || (TARGET_ARCH_R2 && ANDCLEAR_INT (INTVAL (op)))")))
+
+(define_predicate "and_operand"
+  (ior (match_operand 0 "const_and_operand")
+       (match_operand 0 "register_operand")))
+
 (define_predicate "const_shift_operand"
   (and (match_code "const_int")
        (match_test "SHIFT_INT (INTVAL (op))")))
@@ -84,6 +94,16 @@
                                          false));
 })
 
+(define_predicate "stack_memory_operand"
+  (match_code "mem")
+{
+  rtx addr = XEXP (op, 0);
+  return ((REG_P (addr) && REGNO (addr) == SP_REGNO)
+          || (GET_CODE (addr) == PLUS
+              && REG_P (XEXP (addr, 0)) && REGNO (XEXP (addr, 0)) == SP_REGNO
+              && CONST_INT_P (XEXP (addr, 1))));
+})
+
 (define_predicate "ldstio_memory_operand"
   (match_code "mem")
 {
Index: gcc/config/nios2/nios2.md
===================================================================
--- gcc/config/nios2/nios2.md	(revision 225793)
+++ gcc/config/nios2/nios2.md	(working copy)
@@ -30,6 +30,7 @@
 
    (TP_REGNO              23)	; Thread pointer register
    (GP_REGNO	          26)	; Global pointer register
+   (SP_REGNO	          27)	; Stack pointer register
    (FP_REGNO	          28)	; Frame pointer register
    (EA_REGNO	          29)	; Exception return address register
    (RA_REGNO              31)	; Return address register
@@ -92,9 +93,14 @@
 ; incuring a stall.
 
 ; length of an instruction (in bytes)
-(define_attr "length" "" (const_int 4))
+(define_attr "length" ""
+  (if_then_else (match_test "nios2_cdx_narrow_form_p (insn)")
+    (const_int 2)
+    (const_int 4)))
+
 (define_attr "type" 
-  "unknown,complex,control,alu,cond_alu,st,ld,shift,mul,div,custom" 
+  "unknown,complex,control,alu,cond_alu,st,ld,stwm,ldwm,push,pop,mul,div,\
+   custom,add,sub,mov,and,or,xor,neg,not,sll,srl,sra,rol,ror,nop"
   (const_string "complex"))
 
 (define_asm_attributes
@@ -118,11 +124,11 @@
   "cpu")
 
 (define_insn_reservation "control" 1
-  (eq_attr "type" "control")
+  (eq_attr "type" "control,pop")
   "cpu")
 
 (define_insn_reservation "alu" 1
-  (eq_attr "type" "alu")
+  (eq_attr "type" "alu,add,sub,mov,and,or,xor,neg,not")
   "cpu")
 
 (define_insn_reservation "cond_alu" 1
@@ -130,7 +136,7 @@
   "cpu")
 
 (define_insn_reservation "st" 1
-  (eq_attr "type" "st")
+  (eq_attr "type" "st,stwm,push")
   "cpu")
   
 (define_insn_reservation "custom" 1
@@ -139,11 +145,11 @@
 
 ; shifts, muls and lds have three cycle latency
 (define_insn_reservation "ld" 3
-  (eq_attr "type" "ld")
+  (eq_attr "type" "ld,ldwm")
   "cpu")
 
 (define_insn_reservation "shift" 3
-  (eq_attr "type" "shift")
+  (eq_attr "type" "sll,srl,sra,rol,ror")
   "cpu")
 
 (define_insn_reservation "mul" 3
@@ -171,46 +177,90 @@
     DONE;
 })
 
+(define_insn "*high"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (high:SI (match_operand:SI 1 "immediate_operand" "i")))]
+  ""
+  "movhi\\t%0, %H1"
+  [(set_attr "type" "alu")])
+
+(define_insn "*lo_sum"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (lo_sum:SI (match_operand:SI 1 "register_operand"  "r")
+                   (match_operand:SI 2 "immediate_operand" "i")))]
+  ""
+  "addi\\t%0, %1, %L2"
+  [(set_attr "type" "alu")])
+
 (define_insn "movqi_internal"
-  [(set (match_operand:QI 0 "nonimmediate_operand" "=m, r,r, r")
-        (match_operand:QI 1 "general_operand"       "rM,m,rM,I"))]
+  [(set (match_operand:QI 0 "nonimmediate_operand" "=m, r,r")
+        (match_operand:QI 1 "general_operand"       "rM,m,rI"))]
   "(register_operand (operands[0], QImode)
     || reg_or_0_operand (operands[1], QImode))"
-  "@
-    stb%o0\\t%z1, %0
-    ldbu%o1\\t%0, %1
-    mov\\t%0, %z1
-    movi\\t%0, %1"
-  [(set_attr "type" "st,ld,alu,alu")])
+  {
+    switch (which_alternative)
+      {
+      case 0:
+	if (get_attr_length (insn) != 2)
+	  return "stb%o0\\t%z1, %0";
+	else if (const_0_operand (operands[1], QImode))
+	  return "stbz.n\\t%z1, %0";
+	else
+	  return "stb.n\\t%z1, %0";
+      case 1:
+	return "ldbu%o1%.\\t%0, %1";
+      case 2:
+	return "mov%i1%.\\t%0, %z1";
+      default:
+	gcc_unreachable ();
+      }
+  }
+  [(set_attr "type" "st,ld,mov")])
 
 (define_insn "movhi_internal"
-  [(set (match_operand:HI 0 "nonimmediate_operand" "=m, r,r, r")
-        (match_operand:HI 1 "general_operand"       "rM,m,rM,I"))]
+  [(set (match_operand:HI 0 "nonimmediate_operand" "=m, r,r")
+        (match_operand:HI 1 "general_operand"       "rM,m,rI"))]
   "(register_operand (operands[0], HImode)
     || reg_or_0_operand (operands[1], HImode))"
   "@
-    sth%o0\\t%z1, %0
-    ldhu%o1\\t%0, %1
-    mov\\t%0, %z1
-    movi\\t%0, %1"
-  [(set_attr "type" "st,ld,alu,alu")])
+    sth%o0%.\\t%z1, %0
+    ldhu%o1%.\\t%0, %1
+    mov%i1%.\\t%0, %z1"
+  [(set_attr "type" "st,ld,mov")])
 
 (define_insn "movsi_internal"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=m, r,r, r,r,r,r,r")
-        (match_operand:SI 1 "general_operand"       "rM,m,rM,I,J,K,S,i"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=m, r,r,   r")
+        (match_operand:SI 1 "general_operand"       "rM,m,rIJK,S"))]
   "(register_operand (operands[0], SImode)
     || reg_or_0_operand (operands[1], SImode))"
-  "@
-    stw%o0\\t%z1, %0
-    ldw%o1\\t%0, %1
-    mov\\t%0, %z1
-    movi\\t%0, %1
-    movui\\t%0, %1
-    movhi\\t%0, %H1
-    addi\\t%0, gp, %%gprel(%1)
-    movhi\\t%0, %H1\;addi\\t%0, %0, %L1"
-  [(set_attr "type" "st,ld,alu,alu,alu,alu,alu,alu")
-   (set_attr "length" "4,4,4,4,4,4,4,8")])
+  {
+    switch (which_alternative)
+      {
+      case 0:
+	if (get_attr_length (insn) != 2)
+	  return "stw%o0\\t%z1, %0";
+	else if (stack_memory_operand (operands[0], SImode))
+	  return "stwsp.n\\t%z1, %0";
+	else if (const_0_operand (operands[1], SImode))
+	  return "stwz.n\\t%z1, %0";
+	else
+	  return "stw.n\\t%z1, %0";
+      case 1:
+	if (get_attr_length (insn) != 2)
+	  return "ldw%o1\\t%0, %1";
+	else if (stack_memory_operand (operands[1], SImode))
+	  return "ldwsp.n\\t%0, %1";
+	else
+	  return "ldw.n\\t%0, %1";
+      case 2:
+	return "mov%i1%.\\t%0, %z1";
+      case 3:
+	return "addi\\t%0, gp, %%gprel(%1)";
+      default:
+	gcc_unreachable ();
+      }
+  }
+  [(set_attr "type" "st,ld,mov,alu")])
 
 (define_mode_iterator BH [QI HI])
 (define_mode_iterator BHW [QI HI SI])
@@ -264,18 +314,18 @@
         (zero_extend:SI (match_operand:HI 1 "nonimmediate_operand" "r,m")))]
   ""
   "@
-    andi\\t%0, %1, 0xffff
-    ldhu%o1\\t%0, %1"
-  [(set_attr "type"     "alu,ld")])
+    andi%.\\t%0, %1, 0xffff
+    ldhu%o1%.\\t%0, %1"
+  [(set_attr "type"     "and,ld")])
 
 (define_insn "zero_extendqi<mode>2"
   [(set (match_operand:QX 0 "register_operand" "=r,r")
         (zero_extend:QX (match_operand:QI 1 "nonimmediate_operand" "r,m")))]
   ""
   "@
-    andi\\t%0, %1, 0xff
-    ldbu%o1\\t%0, %1"
-  [(set_attr "type"     "alu,ld")])
+    andi%.\\t%0, %1, 0xff
+    ldbu%o1%.\\t%0, %1"
+  [(set_attr "type"     "and,ld")])
 
 ;; Sign extension patterns
 
@@ -285,7 +335,7 @@
   ""
   "@
    #
-   ldh%o1\\t%0, %1"
+   ldh%o1%.\\t%0, %1"
   [(set_attr "type" "alu,ld")])
 
 (define_insn "extendqi<mode>2"
@@ -294,7 +344,7 @@
   ""
   "@
    #
-   ldb%o1\\t%0, %1"
+   ldb%o1%.\\t%0, %1"
   [(set_attr "type" "alu,ld")])
 
 ;; Split patterns for register alternative cases.
@@ -331,16 +381,18 @@
         (plus:SI (match_operand:SI 1 "register_operand"   "%r")
                  (match_operand:SI 2 "add_regimm_operand" "rIT")))]
   ""
-  "add%i2\\t%0, %1, %z2"
-  [(set_attr "type" "alu")])
+{
+  return nios2_add_insn_asm (insn, operands);
+}
+  [(set_attr "type" "add")])
 
 (define_insn "subsi3"
   [(set (match_operand:SI 0 "register_operand"           "=r")
         (minus:SI (match_operand:SI 1 "reg_or_0_operand" "rM")
                   (match_operand:SI 2 "register_operand" "r")))]
   ""
-  "sub\\t%0, %z1, %2"
-  [(set_attr "type" "alu")])
+  "sub%.\\t%0, %z1, %2"
+  [(set_attr "type" "sub")])
 
 (define_insn "mulsi3"
   [(set (match_operand:SI 0 "register_operand"          "=r")
@@ -422,32 +474,47 @@
   [(set (match_operand:SI 0 "register_operand"        "=r")
         (neg:SI (match_operand:SI 1 "register_operand" "r")))]
   ""
-  "sub\\t%0, zero, %1"
-  [(set_attr "type" "alu")])
+{
+  if (get_attr_length (insn) == 2)
+    return "neg.n\\t%0, %1";
+  else
+    return "sub\\t%0, zero, %1";
+}
+  [(set_attr "type" "neg")])
 
 (define_insn "one_cmplsi2"
   [(set (match_operand:SI 0 "register_operand"        "=r")
         (not:SI (match_operand:SI 1 "register_operand" "r")))]
   ""
-  "nor\\t%0, zero, %1"
-  [(set_attr "type" "alu")])
+{
+  if (get_attr_length (insn) == 2)
+    return "not.n\\t%0, %1";
+  else
+    return "nor\\t%0, zero, %1";
+}
+  [(set_attr "type" "not")])
 
 \f
 ;;  Integer logical Operations
 
-(define_code_iterator LOGICAL [and ior xor])
-(define_code_attr logical_asm [(and "and") (ior "or") (xor "xor")])
+(define_insn "andsi3"
+  [(set (match_operand:SI 0 "register_operand"          "=r")
+        (and:SI (match_operand:SI 1 "register_operand"  "%r")
+                (match_operand:SI 2 "and_operand"     "rJKP")))]
+  ""
+  "and%x2%.\\t%0, %1, %y2"
+  [(set_attr "type" "and")])
+
+(define_code_iterator LOGICAL [ior xor])
+(define_code_attr logical_asm [(ior "or") (xor "xor")])
 
 (define_insn "<code>si3"
-  [(set (match_operand:SI 0 "register_operand"             "=r,r,r")
-        (LOGICAL:SI (match_operand:SI 1 "register_operand" "%r,r,r")
-                    (match_operand:SI 2 "logical_operand"  "rM,J,K")))]
+  [(set (match_operand:SI 0 "register_operand"             "=r")
+        (LOGICAL:SI (match_operand:SI 1 "register_operand" "%r")
+                    (match_operand:SI 2 "logical_operand" "rJK")))]
   ""
-  "@
-    <logical_asm>\\t%0, %1, %z2
-    <logical_asm>%i2\\t%0, %1, %2
-    <logical_asm>h%i2\\t%0, %1, %U2"
-  [(set_attr "type" "alu")])
+  "<logical_asm>%x2%.\\t%0, %1, %y2"
+  [(set_attr "type" "<logical_asm>")])
 
 (define_insn "*norsi3"
   [(set (match_operand:SI 0 "register_operand"                 "=r")
@@ -471,8 +538,8 @@
         (SHIFT:SI (match_operand:SI 1 "register_operand" "r")
                   (match_operand:SI 2 "shift_operand"    "rL")))]
   ""
-  "<shift_asm>%i2\\t%0, %1, %z2"
-  [(set_attr "type" "shift")])
+  "<shift_asm>%i2%.\\t%0, %1, %z2"
+  [(set_attr "type" "<shift_asm>")])
 
 (define_insn "rotrsi3"
   [(set (match_operand:SI 0 "register_operand"             "=r")
@@ -480,7 +547,48 @@
                      (match_operand:SI 2 "register_operand" "r")))]
   ""
   "ror\\t%0, %1, %2"
-  [(set_attr "type" "shift")])
+  [(set_attr "type" "ror")])
+
+;; Nios II R2 Bit Manipulation Extension (BMX), provides
+;; bit merge/insertion/extraction instructions.
+
+(define_insn "*merge"
+  [(set (zero_extract:SI (match_operand:SI 0 "register_operand"   "+r")
+			 (match_operand:SI 1 "const_shift_operand" "L")
+			 (match_operand:SI 2 "const_shift_operand" "L"))
+        (zero_extract:SI (match_operand:SI 3 "register_operand"    "r")
+                         (match_dup 1) (match_dup 2)))]
+  "TARGET_HAS_BMX"
+{
+  operands[4] = GEN_INT (INTVAL (operands[1]) + INTVAL (operands[2]) - 1);
+  return "merge\\t%0, %3, %4, %2";
+}
+  [(set_attr "type" "alu")])
+
+(define_insn "extzv"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (zero_extract:SI (match_operand:SI 1 "register_operand"    "r")
+                         (match_operand:SI 2 "const_shift_operand" "L")
+                         (match_operand:SI 3 "const_shift_operand" "L")))]
+  "TARGET_HAS_BMX"
+{
+  operands[4] = GEN_INT (INTVAL (operands[2]) + INTVAL (operands[3]) - 1);
+  return "extract\\t%0, %1, %4, %3";
+}
+  [(set_attr "type" "alu")])
+
+(define_insn "insv"
+  [(set (zero_extract:SI (match_operand:SI 0 "register_operand"   "+r")
+			 (match_operand:SI 1 "const_shift_operand" "L")
+			 (match_operand:SI 2 "const_shift_operand" "L"))
+	(match_operand:SI 3 "reg_or_0_operand" "rM"))]
+  "TARGET_HAS_BMX"
+{
+  operands[4] = GEN_INT (INTVAL (operands[1]) + INTVAL (operands[2]) - 1);
+  return "insert\\t%0, %z3, %4, %2";
+}
+  [(set_attr "type" "alu")])
+
 
 \f
 ;; Floating point instructions
@@ -635,15 +743,16 @@
   DONE;
 })
 
-(define_insn "return"
+(define_expand "return"
   [(simple_return)]
   "nios2_can_use_return_insn ()"
-  "ret")
+  "")
 
 (define_insn "simple_return"
   [(simple_return)]
   ""
-  "ret")
+  "ret%."
+  [(set_attr "type" "control")])
 
 ;; Block any insns from being moved before this point, since the
 ;; profiling call to mcount can use various registers that aren't
@@ -699,7 +808,7 @@
 (define_insn "indirect_jump"
   [(set (pc) (match_operand:SI 0 "register_operand" "c"))]
   ""
-  "jmp\\t%0"
+  "jmp%!\\t%0"
   [(set_attr "type" "control")])
 
 (define_insn "jump"
@@ -707,7 +816,9 @@
         (label_ref (match_operand 0 "" "")))]
   ""
   {
-    if (flag_pic || get_attr_length (insn) == 4)
+    if (get_attr_length (insn) == 2)
+      return "br.n\\t%0";
+    else if (get_attr_length (insn) == 4)
       return "br\\t%0";
     else
       return "jmpi\\t%0";
@@ -715,11 +826,16 @@
   [(set_attr "type" "control")
    (set (attr "length") 
         (if_then_else
-	    (and (ge (minus (match_dup 0) (pc)) (const_int -32768))
-	         (le (minus (match_dup 0) (pc)) (const_int 32764)))
-	    (const_int 4)
-	    (const_int 8)))])
-
+	    (and (match_test "TARGET_HAS_CDX")
+	         (and (ge (minus (match_dup 0) (pc)) (const_int -1022))
+	              (le (minus (match_dup 0) (pc)) (const_int 1022))))
+	    (const_int 2)
+	    (if_then_else
+	        (ior (match_test "flag_pic")
+	             (and (ge (minus (match_dup 0) (pc)) (const_int -32764))
+	                  (le (minus (match_dup 0) (pc)) (const_int 32764))))
+	        (const_int 4)
+	        (const_int 8))))])
 
 (define_expand "call"
   [(parallel [(call (match_operand 0 "" "")
@@ -743,7 +859,7 @@
   ""
   "@
    call\\t%0
-   callr\\t%0"
+   callr%.\\t%0"
   [(set_attr "type" "control")])
 
 (define_insn "*call_value"
@@ -754,7 +870,7 @@
   ""
   "@
    call\\t%1
-   callr\\t%1"
+   callr%.\\t%1"
   [(set_attr "type" "control")])
 
 (define_expand "sibcall"
@@ -779,7 +895,7 @@
   ""
   "@
    jmpi\\t%0
-   jmp\\t%0"
+   jmp%!\\t%0"
   [(set_attr "type" "control")])
 
 (define_insn "sibcall_value_internal"
@@ -790,7 +906,7 @@
   ""
   "@
    jmpi\\t%1
-   jmp\\t%1"
+   jmp%!\\t%1"
   [(set_attr "type" "control")])
 
 (define_expand "tablejump"
@@ -814,7 +930,7 @@
         (match_operand:SI 0 "register_operand" "c"))
    (use (label_ref (match_operand 1 "" "")))]
   ""
-  "jmp\\t%0"
+  "jmp%!\\t%0"
   [(set_attr "type" "control")])
 
 \f
@@ -868,18 +984,30 @@
        (label_ref (match_operand 3 "" ""))
        (pc)))]
   ""
-  {
-    if (flag_pic || get_attr_length (insn) == 4)
-      return "b%0\t%z1, %z2, %l3";
-    else
-      return "b%R0\t%z1, %z2, .+8;jmpi\t%l3";
-  }
+{
+  if (get_attr_length (insn) == 2)
+    return "b%0z.n\t%z1, %l3";
+  else if (get_attr_length (insn) == 4)
+    return "b%0\t%z1, %z2, %l3";
+  else if (get_attr_length (insn) == 6)
+    return "b%R0z.n\t%z1, .+6;jmpi\t%l3";
+  else
+    return "b%R0\t%z1, %z2, .+8;jmpi\t%l3";
+}
   [(set_attr "type" "control")
    (set (attr "length") 
-        (if_then_else
-	    (and (ge (minus (match_dup 3) (pc)) (const_int -32768))
-	         (le (minus (match_dup 3) (pc)) (const_int 32764)))
-	    (const_int 4) (const_int 8)))])
+        (cond
+         [(and (match_test "nios2_cdx_narrow_form_p (insn)")
+               (ge (minus (match_dup 3) (pc)) (const_int -126))
+               (le (minus (match_dup 3) (pc)) (const_int 126)))
+          (const_int 2)
+          (ior (match_test "flag_pic")
+               (and (ge (minus (match_dup 3) (pc)) (const_int -32764))
+                    (le (minus (match_dup 3) (pc)) (const_int 32764))))
+          (const_int 4)
+          (match_test "nios2_cdx_narrow_form_p (insn)")
+          (const_int 6)]
+         (const_int 8)))])
 
 ;; Floating point comparisons
 (define_code_iterator FCMP [eq ne gt ge le lt])
@@ -917,7 +1045,7 @@
         (UCMP:SI (match_operand:SI 1 "reg_or_0_operand"  "rM")
                  (match_operand:SI 2 "uns_arith_operand" "rJ")))]
   ""
-  "cmp<code>%i2\\t%0, %z1, %z2"
+  "cmp<code>%u2\\t%0, %z1, %z2"
   [(set_attr "type" "alu")])
 
 
@@ -951,8 +1079,8 @@
 (define_insn "nop"
   [(const_int 0)]
   ""
-  "nop"
-  [(set_attr "type" "alu")])
+  "nop%."
+  [(set_attr "type" "nop")])
 
 ;; Connect 'sync' to 'memory_barrier' standard expand name
 (define_expand "memory_barrier"
@@ -1000,7 +1128,7 @@
 (define_insn "trap"
   [(trap_if (const_int 1) (const_int 3))]
   ""
-  "trap\\t3"
+  "trap%.\\t3"
   [(set_attr "type" "control")])
 
 (define_insn "ctrapsi4"
@@ -1009,9 +1137,16 @@
                (match_operand:SI 2 "reg_or_0_operand" "rM")])
             (match_operand 3 "const_int_operand" "i"))]
   ""
-  "b%R0\\t%z1, %z2, 1f\;trap\\t%3\;1:"
+{
+  if (get_attr_length (insn) == 6)
+    return "b%R0\\t%z1, %z2, 1f\;trap.n\\t%3\;1:";
+  else
+    return "b%R0\\t%z1, %z2, 1f\;trap\\t%3\;1:";
+}
   [(set_attr "type" "control")
-   (set_attr "length" "8")])
+   (set (attr "length")
+        (if_then_else (match_test "nios2_cdx_narrow_form_p (insn)")
+                      (const_int 6) (const_int 8)))])
   
 ;; Load the GOT register.
 (define_insn "load_got_register"
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	(revision 225793)
+++ gcc/doc/md.texi	(working copy)
@@ -2991,6 +2991,9 @@ instead of @code{0} in the assembly outp
 Integer that is valid as an immediate operand for
 a custom instruction opcode. Range 0 to 255.
 
+@item P
+An immediate operand for R2 andchi/andci instructions. 
+
 @item S
 Matches immediates which are addresses in the small
 data section and therefore can be added to @code{gp}
Index: gcc/testsuite/gcc.target/nios2/andci.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/andci.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/andci.c	(revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=r2" } */
+
+/* Test generation of Nios II R2 "andci" and "andchi" instructions.  */
+
+unsigned int f (unsigned int a)
+{
+  return a & 0xfffffff0;
+}
+
+unsigned int g (unsigned int b)
+{
+  return b & 0xfff0ffff;
+}
+
+/* { dg-final { scan-assembler "\tandci\t.*" } }  */
+/* { dg-final { scan-assembler "\tandchi\t.*" } }  */
+
Index: gcc/testsuite/gcc.target/nios2/bmx.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/bmx.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/bmx.c	(revision 0)
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=r2 -mbmx" } */
+
+/* Test generation of Nios II R2 BMX instructions.  */
+
+struct s {
+  unsigned int pad1 : 3;
+  unsigned int bitfield : 20;
+  unsigned int intfield;
+};
+
+void f (struct s *a, struct s *b)
+{
+  a->bitfield = b->bitfield;
+}
+
+void g (struct s *a, struct s *b)
+{
+  a->bitfield = b->intfield;
+}
+
+void h (struct s *a, struct s *b)
+{
+  a->intfield = b->bitfield;
+}
+
+/* { dg-final { scan-assembler "\tmerge\t.*, 22, 3" } }  */
+/* { dg-final { scan-assembler "\tinsert\t.*, 22, 3" } }  */
+/* { dg-final { scan-assembler "\textract\t.*, 22, 3" } }  */
Index: gcc/testsuite/gcc.target/nios2/cdx-add.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/cdx-add.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/cdx-add.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=r2 -mcdx" } */
+
+/* Check generation of R2 CDX add.n and addi.n instructions.  */
+
+int f (int a, int b)
+{
+  return a + b;
+}
+
+int g (int a)
+{
+  return a + 32;
+}
+
+int h (int a)
+{
+  return a + 33;
+}
+
+/* { dg-final { scan-assembler "\tadd\\.n\t.*" } } */
+/* { dg-final { scan-assembler "\taddi\\.n\t.*, 32" } } */
+/* { dg-final { scan-assembler "\taddi\t.*, 33" } } */
+
Index: gcc/testsuite/gcc.target/nios2/cdx-branch.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/cdx-branch.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/cdx-branch.c	(revision 0)
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-Os -march=r2 -mcdx" } */
+
+/* Check generation of R2 CDX br.n, beqz.n, bnez.n instructions.  */
+
+int f (int a, int b, int c)
+{
+  if (a == 0)
+    return b;
+  else
+    return c;
+}
+
+int g (int a, int b, int c)
+{
+  if (a != 0)
+    return b;
+  else
+    return c;
+}
+
+extern int i (int);
+extern int j (int);
+extern int k (int);
+
+int h (int a)
+{
+  int x;
+
+  /* As well as the conditional branch for the "if", there has to be
+     an unconditional branch from one branch of the "if" to
+     the return statement.  We compile this testcase with -Os to
+     avoid insertion of a duplicate epilogue in place of the branch.  */
+  if (a == 1)
+    x = i (37);
+  else
+    x = j (42);
+  return x + a + k (x);
+}
+
+/* { dg-final { scan-assembler "\tbeqz\\.n\t.*" } } */
+/* { dg-final { scan-assembler "\tbnez\\.n\t.*" } } */
+/* { dg-final { scan-assembler "\tbeq\t|\tbne\t" } } */
+/* { dg-final { scan-assembler "\tbr\\.n\t.*" } } */
Index: gcc/testsuite/gcc.target/nios2/cdx-callret.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/cdx-callret.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/cdx-callret.c	(revision 0)
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=r2 -mcdx" } */
+
+/* Check generation of R2 CDX callr.n, jmpr.n, ret.n instructions.  */
+
+typedef int (*F) (void);
+
+int x (F f)
+{
+  f ();
+
+  /* Note that the compiler might generate a return via pop.n or ldwm;
+     the test below is to make sure that it doesn't generate a 32-bit
+     return instruction.  */
+  return 3;
+}
+
+int y (F f)
+{
+  return f ();
+}
+
+/* { dg-final { scan-assembler "\tcallr\\.n\t.*" } } */
+/* { dg-final { scan-assembler-not "\tret$" } } */
+/* { dg-final { scan-assembler "\tjmpr\\.n\t.*" } } */
Index: gcc/testsuite/gcc.target/nios2/cdx-loadstore.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/cdx-loadstore.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/cdx-loadstore.c	(revision 0)
@@ -0,0 +1,61 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=r2 -mcdx" } */
+
+/* Check generation of R2 CDX load/store instructions.  */
+
+unsigned char ldb (unsigned char *p)
+{
+  return p[7];
+}
+
+unsigned short ldh (unsigned short *p)
+{
+  return p[7];
+}
+
+unsigned int ldw (unsigned int *p)
+{
+  return p[7];
+}
+
+void stb (unsigned char *p, unsigned char x)
+{
+  p[15] = x;
+}
+
+void sth (unsigned short *p, unsigned short x)
+{
+  p[15] = x;
+}
+
+void stw (unsigned int *p, unsigned int x)
+{
+  p[15] = x;
+}
+
+void no_cdx_stb (unsigned char *p, unsigned char x)
+{
+  p[16] = x;
+}
+
+void no_cdx_sth (unsigned short *p, unsigned short x)
+{
+  p[16] = x;
+}
+
+void no_cdx_stw (unsigned int *p, unsigned int x)
+{
+  p[16] = x;
+}
+
+/* { dg-final { scan-assembler "\tldbu\\.n\t.*, 7\\(.*\\)" } } */
+/* { dg-final { scan-assembler "\tldhu\\.n\t.*, 14\\(.*\\)" } } */
+/* { dg-final { scan-assembler "\tldw\\.n\t.*, 28\\(.*\\)" } } */
+
+/* { dg-final { scan-assembler "\tstb\\.n\t.*, 15\\(.*\\)" } } */
+/* { dg-final { scan-assembler "\tsth\\.n\t.*, 30\\(.*\\)" } } */
+/* { dg-final { scan-assembler "\tstw\\.n\t.*, 60\\(.*\\)" } } */
+
+/* { dg-final { scan-assembler "\tstb\t.*, 16\\(.*\\)" } } */
+/* { dg-final { scan-assembler "\tsth\t.*, 32\\(.*\\)" } } */
+/* { dg-final { scan-assembler "\tstw\t.*, 64\\(.*\\)" } } */
Index: gcc/testsuite/gcc.target/nios2/cdx-logical.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/cdx-logical.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/cdx-logical.c	(revision 0)
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=r2 -mcdx" } */
+
+/* Check generation of R2 CDX and.n, andi.n, or.n, xor.n, and not.n
+   instructions.
+
+   and.n, or.n, and xor.n require one of the input registers to be the same
+   as the output register.  Since the tests below want to put the result
+   in the return value register, they use this function to make sure that
+   one of the input operands is also already in the return register.  */
+
+extern unsigned int x (unsigned int a);
+
+unsigned int f (unsigned int a, unsigned int b)
+{
+  return x (a) & b;
+}
+
+unsigned int g (unsigned int a)
+{
+  return a & 31;
+}
+
+unsigned int h (unsigned int a, unsigned int b)
+{
+  return x (a) | b;
+}
+
+unsigned int i (unsigned int a, unsigned int b)
+{
+  return x (a) ^ b;
+}
+
+unsigned int j (unsigned int a)
+{
+  return ~a;
+}
+
+/* { dg-final { scan-assembler "\tand\\.n\t.*" } } */
+/* { dg-final { scan-assembler "\tandi\\.n\t.*, 31" } } */
+/* { dg-final { scan-assembler "\tor\\.n\t.*" } } */
+/* { dg-final { scan-assembler "\txor\\.n\t.*" } } */
+/* { dg-final { scan-assembler "\tnot\\.n\t.*" } } */
Index: gcc/testsuite/gcc.target/nios2/cdx-mov.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/cdx-mov.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/cdx-mov.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=r2 -mcdx" } */
+
+/* Check generation of R2 CDX mov.n and movi.n instructions.  */
+
+extern void f (int a, int b, int c, int d);
+
+int g (int x, int y, int z)
+{
+  f (100, x, y, z);
+  return -1;
+}
+
+/* We should always get mov.n and never mov when compiling with -mcdx.  */
+/* { dg-final { scan-assembler "\tmov\\.n\t.*" } } */
+/* { dg-final { scan-assembler-not "\tmov\t.*" } } */
+
+/* Both of the constant loads are expressible with movi.n.  */
+/* { dg-final { scan-assembler "\tmovi\\.n\t.*, 100" } } */
+/* { dg-final { scan-assembler "\tmovi\\.n\t.*, -1" } } */
Index: gcc/testsuite/gcc.target/nios2/cdx-shift.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/cdx-shift.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/cdx-shift.c	(revision 0)
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=r2 -mcdx" } */
+
+/* Check generation of R2 CDX sll.n, slli.n, srl.n, and srli.n
+   instructions.  */
+
+extern unsigned int x (unsigned int a);
+
+unsigned int f (unsigned int a, unsigned int b)
+{
+  return x (a) << b;
+}
+
+unsigned int g (unsigned int a)
+{
+  return x (a) << 24;
+}
+
+unsigned int h (unsigned int a, unsigned int b)
+{
+  return x (a) >> b;
+}
+
+unsigned int i (unsigned int a, unsigned int b)
+{
+  return x (a) >> 24;
+}
+
+/* { dg-final { scan-assembler "\tsll\\.n\t.*" } } */
+/* { dg-final { scan-assembler "\tslli\\.n\t.*, 24" } } */
+/* { dg-final { scan-assembler "\tsrl\\.n\t.*" } } */
+/* { dg-final { scan-assembler "\tsrli\\.n\t.*, 24" } } */
Index: gcc/testsuite/gcc.target/nios2/cdx-sub.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/cdx-sub.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/cdx-sub.c	(revision 0)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=r2 -mcdx" } */
+
+/* Check generation of R2 CDX sub.n, subi.n, and neg.n instructions.  */
+
+int f (int a, int b)
+{
+  return a - b;
+}
+
+int g (int a)
+{
+  return a - 32;
+}
+
+int h (int a)
+{
+  return -a;
+}
+
+/* { dg-final { scan-assembler "\tsub\\.n\t.*" } } */
+/* { dg-final { scan-assembler "\tsubi\\.n\t.*, 32" } } */
+/* { dg-final { scan-assembler "\tneg\\.n\t.*" } } */
Index: gcc/testsuite/gcc.target/nios2/nios2-trap-insn.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/nios2-trap-insn.c	(revision 225793)
+++ gcc/testsuite/gcc.target/nios2/nios2-trap-insn.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-final { scan-assembler "trap\\t3" } } */
+/* { dg-final { scan-assembler "trap\\t3|trap.n\\t3" } } */
 
 /* Test the nios2 trap instruction */
 void foo(void){


* [nios2] [5/7] Support R2 CDX load/store multiple instructions
  2015-07-14 22:29 [nios2] [0/7] Support for Nios II R2 Sandra Loosemore
                   ` (3 preceding siblings ...)
  2015-07-14 23:29 ` [nios2] [4/7] Support new R2 instructions Sandra Loosemore
@ 2015-07-14 23:33 ` Sandra Loosemore
  2015-07-14 23:49 ` [nios2] [6/7] Update function prologues/epilogues for R2 CDX Sandra Loosemore
  2015-07-15  0:03 ` [nios2] [7/7] Add new intrinsics Sandra Loosemore
  6 siblings, 0 replies; 8+ messages in thread
From: Sandra Loosemore @ 2015-07-14 23:33 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1126 bytes --]

This installment of the Nios II R2 patch series adds support for the
new CDX load/store multiple instructions (ldwm, stwm, pop.n, push.n).

The implementation approach we used here is similar to that in the ARM
backend, with all the insn patterns and peephole optimizers generated
by a Standard ML program.  These instructions have quite complicated
restrictions on register numbering and ordering, which are handled by
pop_operation_p and ldstwm_operation_p in nios2.c.
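
As a rough illustration of the numbering and ordering restrictions for
ldwm/stwm, here is a standalone C sketch.  The two masks are the
CDX_LDSTWM_VALID_REGS_0/1 values from the patch below; the function
name ldstwm_regs_ok and its array-based interface are hypothetical,
chosen only for this example, and it works on plain register numbers
rather than RTL.  It is not the code in nios2.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Masks from the patch: r2-r13, or r14-r23 plus fp (r28) and ra (r31).  */
#define CDX_LDSTWM_VALID_REGS_0 0x00003ffcu
#define CDX_LDSTWM_VALID_REGS_1 0x90ffc000u

/* Hypothetical helper, not part of the patch.  Return true if the N
   register numbers in REGS all belong to a single encodable set and
   are strictly ascending (INC true) or strictly descending (INC false).  */
bool
ldstwm_regs_ok (const unsigned *regs, size_t n, bool inc)
{
  unsigned regset = 0;

  assert (regs != NULL);
  for (size_t i = 0; i < n; i++)
    {
      if (regs[i] > 31)
        return false;
      unsigned bit = 1u << regs[i];

      /* The first register picks the set; the rest must stay in it.  */
      if (regset == 0)
        {
          if (CDX_LDSTWM_VALID_REGS_0 & bit)
            regset = CDX_LDSTWM_VALID_REGS_0;
          else if (CDX_LDSTWM_VALID_REGS_1 & bit)
            regset = CDX_LDSTWM_VALID_REGS_1;
          else
            return false;
        }
      else if ((regset & bit) == 0)
        return false;

      /* Register numbers must follow the address direction.  */
      if (i > 0 && (inc ? regs[i - 1] >= regs[i] : regs[i - 1] <= regs[i]))
        return false;
    }
  return true;
}
```

For example, {r2, r3, r4} in ascending order passes, while {r13, r14}
fails because the two registers come from different sets.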

When testing this functionality, we ran into the regrename bug
addressed by this patch:
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01214.html
I have adapted a couple of the test cases that were failing with
assembler errors into regression tests for that bug, and included them
in the committed patch.

Our experiments with expanding "load_multiple" and "store_multiple"
patterns into these instructions were not promising.  It might be
worth revisiting that later, but presently the only things that
generate them are the peephole optimizers and the prologue/epilogue
changes coming along in part 6 of the patch series.

Committed as r225798.

-Sandra

[-- Attachment #2: r2-5.log --]
[-- Type: text/x-log, Size: 1028 bytes --]

2015-07-14  Sandra Loosemore  <sandra@codesourcery.com>
	    Cesar Philippidis  <cesar@codesourcery.com>
	    Chung-Lin Tang  <cltang@codesourcery.com>

	gcc/
	* config/nios2/predicates.md (pop_operation): New.
	(ldwm_operation, stwm_operation): New.
	(nios2_hard_register_operand): New.
	* config/nios2/nios2-protos.h (pop_operation_p): Declare.
	(ldstwm_operation_p): Declare.
	(gen_ldstwm_peep): Declare.
	* config/nios2/nios2.c (nios2_ldst_parallel): Declare.
	(base_reg_adjustment_p): New.
	(pop_operation_p): New.
	(CDX_LDSTWM_VALID_REGS_0, CDX_LDSTWM_VALID_REGS_1): Define.
	(nios2_ldstwm_regset_p): New.
	(ldstwm_operation_p): New.
	(gen_ldst): New.
	(nios2_ldst_parallel): New.
	(struct ldstwm_operand): Declare.
	(compare_ldstwm_operands): New.
	(can_use_cdx_ldstw): New.
	(gen_ldstwm_peep): New.
	* config/nios2/nios2-ldstwm.sml: New.
	* config/nios2/nios2.md: Include ldstwm.md.
	* config/nios2/ldstwm.md: Generated.

	gcc/testsuite/
	* gcc.target/nios2/cdx-ldstwm-1.c: New.
	* gcc.target/nios2/cdx-ldstwm-2.c: New.

[-- Attachment #3: r2-5.patch --]
[-- Type: text/x-patch, Size: 31031 bytes --]

Index: gcc/config/nios2/predicates.md
===================================================================
--- gcc/config/nios2/predicates.md	(revision 225796)
+++ gcc/config/nios2/predicates.md	(working copy)
@@ -94,6 +94,30 @@
                                          false));
 })
 
+(define_special_predicate "pop_operation"
+  (match_code "parallel")
+{
+  return pop_operation_p (op);
+})
+
+(define_special_predicate "ldwm_operation"
+  (match_code "parallel")
+{
+  return ldstwm_operation_p (op, /*load_p=*/true);
+})
+
+(define_special_predicate "stwm_operation"
+  (match_code "parallel")
+{
+  return ldstwm_operation_p (op, /*load_p=*/false);
+})
+
+(define_predicate "nios2_hard_register_operand"
+  (match_code "reg")
+{
+  return GP_REG_P (REGNO (op));
+})
+
 (define_predicate "stack_memory_operand"
   (match_code "mem")
 {
Index: gcc/config/nios2/nios2-protos.h
===================================================================
--- gcc/config/nios2/nios2-protos.h	(revision 225796)
+++ gcc/config/nios2/nios2-protos.h	(working copy)
@@ -52,6 +52,10 @@ extern bool nios2_unspec_reloc_p (rtx);
 extern int nios2_label_align (rtx);
 extern bool nios2_cdx_narrow_form_p (rtx_insn *);
 
+extern bool pop_operation_p (rtx);
+extern bool ldstwm_operation_p (rtx, bool);
+extern bool gen_ldstwm_peep (bool, int, rtx, rtx *);
+
 extern void nios2_adjust_reg_alloc_order (void);
 
 #ifdef TREE_CODE
Index: gcc/config/nios2/nios2.c
===================================================================
--- gcc/config/nios2/nios2.c	(revision 225796)
+++ gcc/config/nios2/nios2.c	(working copy)
@@ -71,6 +71,8 @@ static void nios2_load_pic_register (voi
 static void nios2_register_custom_code (unsigned int, enum nios2_ccs_code, int);
 static const char *nios2_unspec_reloc_name (int);
 static void nios2_register_builtin_fndecl (unsigned, tree);
+static rtx nios2_ldst_parallel (bool, bool, bool, rtx, int,
+				unsigned HOST_WIDE_INT, bool);
 
 /* Threshold for data being put into the small data/bss area, instead
    of the normal data area (references to the small data/bss area take
@@ -456,6 +458,25 @@ restore_reg (int regno, unsigned offset)
   RTX_FRAME_RELATED_P (insn) = 1;
 }
 
+/* This routine tests for the base register update SET in load/store
+   multiple RTL insns, used in pop_operation_p and ldstwm_operation_p.  */
+static bool
+base_reg_adjustment_p (rtx set, rtx *base_reg, rtx *offset)
+{
+  if (GET_CODE (set) == SET
+      && REG_P (SET_DEST (set))
+      && GET_CODE (SET_SRC (set)) == PLUS
+      && REG_P (XEXP (SET_SRC (set), 0))
+      && rtx_equal_p (SET_DEST (set), XEXP (SET_SRC (set), 0))
+      && CONST_INT_P (XEXP (SET_SRC (set), 1)))
+    {
+      *base_reg = XEXP (SET_SRC (set), 0);
+      *offset = XEXP (SET_SRC (set), 1);
+      return true;
+    }
+  return false;
+}
+
 /* Temp regno used inside prologue/epilogue.  */
 #define TEMP_REG_NUM 8
 
@@ -4030,6 +4051,432 @@ nios2_cdx_narrow_form_p (rtx_insn *insn)
   return false;
 }
 
+/* Main function to implement the pop_operation predicate that
+   checks pop.n insn pattern integrity.  The CDX pop.n patterns mostly
+   hardcode the restored registers, so the main checking is for the
+   SP offsets.  */
+bool
+pop_operation_p (rtx op)
+{
+  int i;
+  HOST_WIDE_INT last_offset = -1, len = XVECLEN (op, 0);
+  rtx base_reg, offset;
+
+  if (len < 3 /* At least has a return, SP-update, and RA restore.  */
+      || GET_CODE (XVECEXP (op, 0, 0)) != RETURN
+      || !base_reg_adjustment_p (XVECEXP (op, 0, 1), &base_reg, &offset)
+      || !rtx_equal_p (base_reg, stack_pointer_rtx)
+      || !CONST_INT_P (offset)
+      || (INTVAL (offset) & 3) != 0)
+    return false;
+
+  for (i = len - 1; i > 1; i--)
+    {
+      rtx set = XVECEXP (op, 0, i);
+      rtx curr_base_reg, curr_offset;
+
+      if (GET_CODE (set) != SET || !MEM_P (SET_SRC (set))
+	  || !split_mem_address (XEXP (SET_SRC (set), 0),
+				 &curr_base_reg, &curr_offset)
+	  || !rtx_equal_p (base_reg, curr_base_reg)
+	  || !CONST_INT_P (curr_offset))
+	return false;
+      if (i == len - 1)
+	{
+	  last_offset = INTVAL (curr_offset);
+	  if ((last_offset & 3) != 0 || last_offset > 60)
+	    return false;
+	}
+      else
+	{
+	  last_offset += 4;
+	  if (INTVAL (curr_offset) != last_offset)
+	    return false;
+	}
+    }
+  if (last_offset < 0 || last_offset + 4 != INTVAL (offset))
+    return false;
+
+  return true;
+}
+
+
+/* Masks of registers that are valid for CDX ldwm/stwm instructions.
+   The instruction can encode subsets drawn from either R2-R13 or
+   R14-R23 + FP + RA.  */
+#define CDX_LDSTWM_VALID_REGS_0 0x00003ffc
+#define CDX_LDSTWM_VALID_REGS_1 0x90ffc000
+
+static bool
+nios2_ldstwm_regset_p (unsigned int regno, unsigned int *regset)
+{
+  if (*regset == 0)
+    {
+      if (CDX_LDSTWM_VALID_REGS_0 & (1 << regno))
+	*regset = CDX_LDSTWM_VALID_REGS_0;
+      else if (CDX_LDSTWM_VALID_REGS_1 & (1 << regno))
+	*regset = CDX_LDSTWM_VALID_REGS_1;
+      else
+	return false;
+      return true;
+    }
+  else
+    return (*regset & (1 << regno)) != 0;
+}
+
+/* Main function to implement ldwm_operation/stwm_operation
+   predicates that check ldwm/stwm insn pattern integrity.  */
+bool
+ldstwm_operation_p (rtx op, bool load_p)
+{
+  int start, i, end = XVECLEN (op, 0) - 1, last_regno = -1;
+  unsigned int regset = 0;
+  rtx base_reg, offset;  
+  rtx first_elt = XVECEXP (op, 0, 0);
+  bool inc_p = true;
+  bool wb_p = base_reg_adjustment_p (first_elt, &base_reg, &offset);
+  if (GET_CODE (XVECEXP (op, 0, end)) == RETURN)
+    end--;
+  start = wb_p ? 1 : 0;
+  for (i = start; i <= end; i++)
+    {
+      int regno;
+      rtx reg, mem, elt = XVECEXP (op, 0, i);
+      /* Return early if not a SET at all.  */
+      if (GET_CODE (elt) != SET)
+	return false;
+      reg = load_p ? SET_DEST (elt) : SET_SRC (elt);
+      mem = load_p ? SET_SRC (elt) : SET_DEST (elt);
+      if (!REG_P (reg) || !MEM_P (mem))
+	return false;
+      regno = REGNO (reg);
+      if (!nios2_ldstwm_regset_p (regno, &regset))
+	return false;
+      /* If no writeback to determine direction, use offset of first MEM.  */
+      if (wb_p)
+	inc_p = INTVAL (offset) > 0;
+      else if (i == start)
+	{
+	  rtx first_base, first_offset;
+	  if (!split_mem_address (XEXP (mem, 0),
+				  &first_base, &first_offset))
+	    return false;
+	  base_reg = first_base;
+	  inc_p = INTVAL (first_offset) >= 0;
+	}
+      /* Ensure that the base register is not loaded into.  */
+      if (load_p && regno == (int) REGNO (base_reg))
+	return false;
+      /* Check for register order inc/dec integrity.  */
+      if (last_regno >= 0)
+	{
+	  if (inc_p && last_regno >= regno)
+	    return false;
+	  if (!inc_p && last_regno <= regno)
+	    return false;
+	}
+      last_regno = regno;
+    }
+  return true;
+}
+
+/* Helper for nios2_ldst_parallel, for generating a parallel vector
+   SET element.  */
+static rtx
+gen_ldst (bool load_p, int regno, rtx base_mem, int offset)
+{
+  rtx reg = gen_rtx_REG (SImode, regno);
+  rtx mem = adjust_address_nv (base_mem, SImode, offset);
+  return gen_rtx_SET (load_p ? reg : mem,
+		      load_p ? mem : reg);
+}
+
+/* A general routine for creating the body RTL pattern of
+   ldwm/stwm/push.n/pop.n insns.
+   LOAD_P: true/false for load/store direction.
+   REG_INC_P: whether registers are incrementing/decrementing in the
+   *RTL vector* (not necessarily the order defined in the ISA specification).
+   OFFSET_INC_P: Same as REG_INC_P, but for the memory offset order.
+   BASE_MEM: starting MEM.
+   BASE_UPDATE: amount to update base register; zero means no writeback.
+   REGMASK: register mask to load/store.
+   RET_P: true to tag on a (return) element, placed first in the vector.
+
+   Note that this routine does not do any checking. It's the job of the
+   caller to do the right thing, and the insn patterns to do the
+   safe-guarding.  */
+static rtx
+nios2_ldst_parallel (bool load_p, bool reg_inc_p, bool offset_inc_p,
+		     rtx base_mem, int base_update,
+		     unsigned HOST_WIDE_INT regmask, bool ret_p)
+{
+  rtvec p;
+  int regno, b = 0, i = 0, n = 0, len = popcount_hwi (regmask);
+  if (ret_p) len++, i++, b++;
+  if (base_update != 0) len++, i++;
+  p = rtvec_alloc (len);
+  for (regno = (reg_inc_p ? 0 : 31);
+       regno != (reg_inc_p ? 32 : -1);
+       regno += (reg_inc_p ? 1 : -1))
+    if ((regmask & (1 << regno)) != 0)
+      {
+	int offset = (offset_inc_p ? 4 : -4) * n++;
+	RTVEC_ELT (p, i++) = gen_ldst (load_p, regno, base_mem, offset);
+      }
+  if (ret_p)
+    RTVEC_ELT (p, 0) = ret_rtx;
+  if (base_update != 0)
+    {
+      rtx reg, offset;
+      if (!split_mem_address (XEXP (base_mem, 0), &reg, &offset))
+	gcc_unreachable ();
+      RTVEC_ELT (p, b) =
+	gen_rtx_SET (reg, plus_constant (Pmode, reg, base_update));
+    }
+  return gen_rtx_PARALLEL (VOIDmode, p);
+}
+
+/* CDX ldwm/stwm peephole optimization pattern related routines.  */
+
+/* Data structure and sorting function for ldwm/stwm peephole optimizers.  */
+struct ldstwm_operand
+{
+  int offset;	/* Offset from base register.  */
+  rtx reg;	/* Register to store at this offset.  */
+  rtx mem;	/* Original mem.  */
+  bool bad;	/* True if this load/store can't be combined.  */
+  bool rewrite; /* True if we should rewrite using scratch.  */
+};
+
+static int
+compare_ldstwm_operands (const void *arg1, const void *arg2)
+{
+  const struct ldstwm_operand *op1 = (const struct ldstwm_operand *) arg1;
+  const struct ldstwm_operand *op2 = (const struct ldstwm_operand *) arg2;
+  if (op1->bad)
+    return op2->bad ? 0 : 1;
+  else if (op2->bad)
+    return -1;
+  else
+    return op1->offset - op2->offset;
+}
+
+/* Helper function: return true if a load/store using REGNO with address
+   BASEREG and offset OFFSET meets the constraints for a 2-byte CDX ldw.n,
+   stw.n, ldwsp.n, or stwsp.n instruction.  */
+static bool
+can_use_cdx_ldstw (int regno, int basereg, int offset)
+{
+  if (CDX_REG_P (regno) && CDX_REG_P (basereg)
+      && (offset & 0x3) == 0 && 0 <= offset && offset < 0x40)
+    return true;
+  else if (basereg == SP_REGNO
+	   && offset >= 0 && offset < 0x80 && (offset & 0x3) == 0)
+    return true;
+  return false;
+}
+
+/* This function is called from peephole2 optimizers to try to merge
+   a series of individual loads and stores into a ldwm or stwm.  It
+   can also rewrite addresses inside the individual loads and stores
+   to use a common base held in a scratch register, with smaller
+   offsets, if that allows them to use CDX ldw.n or stw.n instructions
+   instead of 4-byte loads or stores.
+   N is the number of insns we are trying to merge.  SCRATCH is non-null
+   if there is a scratch register available.  The OPERANDS array contains
+   the N REG operands followed by the N corresponding MEM operands.  */
+bool
+gen_ldstwm_peep (bool load_p, int n, rtx scratch, rtx *operands)
+{
+  /* CDX ldwm/stwm instructions allow a maximum of 12 registers to be
+     specified.  */
+#define MAX_LDSTWM_OPS 12
+  struct ldstwm_operand sort[MAX_LDSTWM_OPS];
+  int basereg = -1;
+  int baseoffset;
+  int i, m, lastoffset, lastreg;
+  unsigned int regmask = 0, usemask = 0, regset;
+  bool needscratch;
+  int newbasereg;
+  int nbytes;
+
+  if (!TARGET_HAS_CDX)
+    return false;
+  if (n < 2 || n > MAX_LDSTWM_OPS)
+    return false;
+
+  /* Check all the operands for validity and initialize the sort array.
+     The places where we return false here are all situations that aren't
+     expected to ever happen -- invalid patterns, invalid registers, etc.  */
+  for (i = 0; i < n; i++)
+    {
+      rtx base, offset;
+      rtx reg = operands[i];
+      rtx mem = operands[i + n];
+      int r, o, regno;
+      bool bad = false;
+
+      if (!REG_P (reg) || !MEM_P (mem))
+	return false;
+
+      regno = REGNO (reg);
+      if (regno > 31)
+	return false;
+      if (load_p && (regmask & (1 << regno)) != 0)
+	return false;
+      regmask |= 1 << regno;
+
+      if (!split_mem_address (XEXP (mem, 0), &base, &offset))
+	return false;
+      r = REGNO (base);
+      o = INTVAL (offset);
+
+      if (basereg == -1)
+	basereg = r;
+      else if (r != basereg)
+	bad = true;
+      usemask |= 1 << r;
+
+      sort[i].bad = bad;
+      sort[i].rewrite = false;
+      sort[i].offset = o;
+      sort[i].reg = reg;
+      sort[i].mem = mem;
+    }
+
+  /* If we are doing a series of register loads, we can't safely reorder
+     them if any of the regs used in addr expressions are also being set.  */
+  if (load_p && (regmask & usemask))
+    return false;
+
+  /* Sort the array by increasing mem offset order, then check that
+     offsets are valid and register order matches mem order.  At the
+     end of this loop, m is the number of loads/stores we will try to
+     combine; the rest are leftovers.  */
+  qsort (sort, n, sizeof (struct ldstwm_operand), compare_ldstwm_operands);
+
+  baseoffset = sort[0].offset;
+  needscratch = baseoffset != 0;
+  if (needscratch && !scratch)
+    return false;
+
+  lastreg = regmask = regset = 0;
+  lastoffset = baseoffset;
+  for (m = 0; m < n && !sort[m].bad; m++)
+    {
+      int thisreg = REGNO (sort[m].reg);
+      if (sort[m].offset != lastoffset
+	  || (m > 0 && lastreg >= thisreg)
+	  || !nios2_ldstwm_regset_p (thisreg, &regset))
+	break;
+      lastoffset += 4;
+      lastreg = thisreg;
+      regmask |= (1 << thisreg);
+    }
+
+  /* For loads, make sure we are not overwriting the scratch reg.
+     The peephole2 pattern isn't supposed to match unless the register is
+     unused all the way through, so this isn't supposed to happen anyway.  */
+  if (load_p
+      && needscratch
+      && ((1 << REGNO (scratch)) & regmask) != 0)
+    return false;
+  newbasereg = needscratch ? (int) REGNO (scratch) : basereg;
+
+  /* We may be able to combine only the first m of the n total loads/stores
+     into a single instruction.  If m < 2, there's no point in emitting
+     a ldwm/stwm at all, but we might be able to do further optimizations
+     if we have a scratch.  We will count the instruction lengths of the
+     old and new patterns and store the savings in nbytes.  */
+  if (m < 2)
+    {
+      if (!needscratch)
+	return false;
+      m = 0;
+      nbytes = 0;
+    }
+  else
+    nbytes = -4;  /* Size of ldwm/stwm.  */
+  if (needscratch)
+    {
+      int bo = baseoffset > 0 ? baseoffset : -baseoffset;
+      if (CDX_REG_P (newbasereg)
+	  && CDX_REG_P (basereg)
+	  && bo <= 128 && bo > 0 && (bo & (bo - 1)) == 0)
+	nbytes -= 2;  /* Size of addi.n/subi.n.  */
+      else
+	nbytes -= 4;  /* Size of non-CDX addi.  */
+    }
+
+  /* Count the size of the input load/store instructions being replaced.  */
+  for (i = 0; i < m; i++)
+    if (can_use_cdx_ldstw (REGNO (sort[i].reg), basereg, sort[i].offset))
+      nbytes += 2;
+    else
+      nbytes += 4;
+
+  /* We may also be able to save a bit if we can rewrite non-CDX
+     load/stores that can't be combined into the ldwm/stwm into CDX
+     load/stores using the scratch reg.  For example, this might happen
+     if baseoffset is large, by bringing in the offsets in the load/store
+     instructions within the range that fits in the CDX instruction.  */
+  if (needscratch && CDX_REG_P (newbasereg))
+    for (i = m; i < n && !sort[i].bad; i++)
+      if (!can_use_cdx_ldstw (REGNO (sort[i].reg), basereg, sort[i].offset)
+	  && can_use_cdx_ldstw (REGNO (sort[i].reg), newbasereg,
+				sort[i].offset - baseoffset))
+	{
+	  sort[i].rewrite = true;
+	  nbytes += 2;
+	}
+
+  /* Are we good to go?  */
+  if (nbytes <= 0)
+    return false;
+
+  /* Emit the scratch load.  */
+  if (needscratch)
+    emit_insn (gen_rtx_SET (scratch, XEXP (sort[0].mem, 0)));
+
+  /* Emit the ldwm/stwm insn.  */
+  if (m > 0)
+    {
+      rtvec p = rtvec_alloc (m);
+      for (i = 0; i < m; i++)
+	{
+	  int offset = sort[i].offset;
+	  rtx mem, reg = sort[i].reg;
+	  rtx base_reg = gen_rtx_REG (Pmode, newbasereg);
+	  if (needscratch)
+	    offset -= baseoffset;
+	  mem = gen_rtx_MEM (SImode, plus_constant (Pmode, base_reg, offset));
+	  if (load_p)
+	    RTVEC_ELT (p, i) = gen_rtx_SET (reg, mem);
+	  else
+	    RTVEC_ELT (p, i) = gen_rtx_SET (mem, reg);
+	}
+      emit_insn (gen_rtx_PARALLEL (VOIDmode, p));
+    }
+
+  /* Emit any leftover load/stores as individual instructions, doing
+     the previously-noted rewrites to use the scratch reg.  */
+  for (i = m; i < n; i++)
+    {
+      rtx reg = sort[i].reg;
+      rtx mem = sort[i].mem;
+      if (sort[i].rewrite)
+	{
+	  int offset = sort[i].offset - baseoffset;
+	  mem = gen_rtx_MEM (SImode, plus_constant (Pmode, scratch, offset));
+	}
+      if (load_p)
+	emit_move_insn (reg, mem);
+      else
+	emit_move_insn (mem, reg);
+    }
+  return true;
+}
+
 /* Implement TARGET_MACHINE_DEPENDENT_REORG:
    We use this hook when emitting CDX code to enforce the 4-byte
    alignment requirement for labels that are used as the targets of
Index: gcc/config/nios2/nios2-ldstwm.sml
===================================================================
--- gcc/config/nios2/nios2-ldstwm.sml	(revision 0)
+++ gcc/config/nios2/nios2-ldstwm.sml	(revision 0)
@@ -0,0 +1,277 @@
+(* Auto-generate Nios II R2 CDX ldwm/stwm/push.n/pop.n patterns
+   Copyright (C) 2014-2015 Free Software Foundation, Inc.
+   Contributed by Mentor Graphics.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it under
+   the terms of the GNU General Public License as published by the Free
+   Software Foundation; either version 3, or (at your option) any later
+   version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or
+   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+   for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.
+
+   This is a Standard ML program.  There are multiple Standard ML
+   implementations widely available.  We recommend the MLton optimizing
+   SML compiler, due to its ease of creating a standalone executable.
+
+     http://www.mlton.org/
+
+   Or from your favourite OS's friendly packaging system. Tested with
+   MLton Release 20130715, though other versions will probably work too.
+
+   Run with:
+     mlton -output a.out /path/to/gcc/config/nios2/nios2-ldstwm.sml
+     ./a.out >/path/to/gcc/config/nios2/ldstwm.md
+*)
+
+datatype ld_st = ld | st;    
+datatype push_pop = push | pop;
+datatype inc_dec = inc | dec;
+
+fun for ls f = map f ls;
+fun conds cond str = if cond then str else "";
+fun ints n = if n>=0 then (Int.toString n) else ("-" ^ (Int.toString (~n)));
+
+fun pushpop_pattern pptype n fp =
+    let 
+	val sp_reg = "(reg:SI SP_REGNO)";
+	val ra_reg = "(reg:SI RA_REGNO)";
+	val fp_reg = "(reg:SI FP_REGNO)";
+
+	fun sets lhs rhs = "(set " ^ lhs ^
+			   (if pptype=push then " "
+			    else " ") ^ rhs ^ ")";
+	val sp_adj =
+	    "(set " ^ sp_reg ^ "\n          " ^
+	    "(plus:SI " ^ sp_reg ^
+	    " (match_operand 1 \"const_int_operand\" \"\")))";
+
+	fun reg i regi = "(reg:SI " ^ (ints regi) ^ ")";
+	fun mem i opndi =
+	    if pptype=push then
+		"(mem:SI (plus:SI (reg:SI SP_REGNO) (const_int " ^ (ints (~4*i)) ^ ")))"
+	    else
+		"(match_operand:SI " ^
+		(ints opndi) ^ " \"stack_memory_operand\" \"\")";
+
+	val start = 1 + (if fp then 2 else 1);
+	val lim = n + (if fp then 2 else 1);
+	fun set_elt i regi opndi =
+	    if pptype=push then (sets (mem i opndi) (reg i regi))
+	    else (sets (reg i regi) (mem i opndi));
+	fun get_elt_list (i, regi, opndi) =
+	    if i > lim then []
+	    else (set_elt i regi opndi) :: get_elt_list (i+1, regi-1, opndi+1);
+
+	val set_elements = get_elt_list (start, 16+n-1, start+1);
+
+	val ra_set = if pptype=push then sets (mem 1 2) ra_reg
+		     else sets ra_reg (mem 1 2);
+	val fp_set = (conds fp (if pptype=push then sets (mem 2 3) fp_reg
+				else sets fp_reg (mem 2 3)));
+	val ret = (conds (pptype=pop) "(return)");
+	val element_list =
+	    List.filter (fn x => x<>"")
+			([ret, sp_adj, ra_set, fp_set] @ set_elements);
+
+	fun reg_index i = 16 + n - i;
+	fun pop_opnds 0 spl = (conds fp ("fp" ^ spl)) ^ "ra"
+	  | pop_opnds n spl = "r" ^ (ints (reg_index n)) ^ spl ^ (pop_opnds (n-1) spl);
+	fun push_opnds 0 spl = "ra" ^ (conds fp (spl ^ "fp"))
+	  | push_opnds n spl = (push_opnds (n-1) spl) ^ spl ^ "r" ^ (ints (reg_index n));
+
+	val spadj_opnd = if pptype=push then 2 else (start+n);
+	val spadj = ints spadj_opnd;
+	val regsave_num = n + (if fp then 2 else 1);
+
+	val ppname = if pptype=push then "push" else "pop";
+	val name = if pptype=push then "push" ^ "_" ^ (push_opnds n "_")
+		   else "pop" ^ "_" ^ (pop_opnds n "_");
+    in
+	"(define_insn \"*cdx_" ^ name ^ "\"\n" ^
+	"  [(match_parallel 0 \"" ^
+	(conds (pptype=pop) "pop_operation") ^ "\"\n" ^
+	"    [" ^ (String.concatWith ("\n     ") element_list) ^ "])]\n" ^
+	"   \"TARGET_HAS_CDX && XVECLEN (operands[0], 0) == " ^
+	(ints (length element_list)) ^
+	(conds (pptype=push)
+	       ("\n    && (-INTVAL (operands[1]) & 3) == 0\n" ^
+		"    && (-INTVAL (operands[1]) - " ^
+		(ints (4*regsave_num)) ^ ") <= 60")) ^
+	"\"\n" ^
+	(if pptype=pop then
+	     "{\n" ^
+	     "  rtx x = XEXP (operands[" ^ spadj ^ "], 0);\n" ^
+	     "  operands[" ^ spadj ^ "] = REG_P (x) ? const0_rtx : XEXP (x, 1);\n" ^
+	     "  return \"pop.n\\\\t{" ^ (pop_opnds n ", ") ^ "}, %" ^ spadj ^ "\";\n" ^
+	     "}\n"
+	 else
+	     "{\n" ^
+	     "  operands[" ^ spadj ^ "] = " ^
+	     "GEN_INT (-INTVAL (operands[1]) - " ^ (ints (4*regsave_num)) ^ ");\n" ^
+	     "  return \"push.n\\\\t{" ^ (push_opnds n ", ") ^ "}, %" ^ spadj ^ "\";\n" ^
+	     "}\n") ^
+	"  [(set_attr \"type\" \"" ^ ppname ^ "\")])\n\n"
+    end;
+
+fun ldstwm_pattern ldst n id wb pc =
+    let
+	val ldstwm = (if ldst=ld then "ldwm" else "stwm");
+	val name = "*cdx_" ^ ldstwm ^ (Int.toString n) ^
+		   (if id=inc then "_inc" else "_dec") ^
+		   (conds wb "_wb") ^ (conds pc "_ret");
+	val base_reg_referenced_p = ref false;
+	val base_regno = ints (n+1);
+	fun plus_addr base offset =
+	    "(plus:SI " ^ base ^ " (const_int " ^ (ints offset) ^ "))";
+	fun base_reg () =
+	    if !base_reg_referenced_p then
+		"(match_dup " ^ base_regno ^ ")"
+	    else (base_reg_referenced_p := true;
+		  "(match_operand:SI " ^ base_regno ^
+		  " \"register_operand\" \"" ^ (conds wb "+&") ^ "r\")");
+	fun reg i = "(match_operand:SI " ^ (ints i) ^
+		    " \"nios2_hard_register_operand\" \"" ^
+		    (conds (ldst=ld) "") ^ "\")";
+
+	fun addr 1 = if id=inc then base_reg ()
+		     else plus_addr (base_reg ()) (~4)
+	  | addr i = let val offset = if id=inc then (i-1)*4 else (~i*4)
+		     in plus_addr (base_reg ()) offset end;
+
+	fun mem i = "(mem:SI " ^ (addr i) ^ ")";
+	fun lhs i = if ldst=ld then reg i else mem i;
+	fun rhs i = if ldst=st then reg i else mem i;
+	fun sets lhs rhs = "(set " ^ lhs ^ "\n          " ^ rhs ^ ")";
+	fun set_elements i =
+	    if i > n then []
+	    else (sets (lhs i) (rhs i)) :: (set_elements (i+1));
+
+	fun opnds 1 = "%1"
+	  | opnds n = opnds(n-1) ^ ", %" ^ (Int.toString n);
+
+	val asm_template = ldstwm ^ "\\\\t{" ^ (opnds n) ^ "}" ^
+			   (if id=inc
+			    then ", (%" ^ base_regno ^ ")++"
+			    else ", --(%" ^ base_regno ^ ")") ^
+			   (conds wb ", writeback") ^
+			   (conds pc ", ret");
+	val wbtmp =
+	    if wb then
+		(sets (base_reg ())
+		      (plus_addr (base_reg ())
+				 ((if id=inc then n else ~n)*4)))
+	    else "";
+	val pctmp = conds pc "(return)";
+	val set_list = List.filter (fn x => x<>"")
+				   ([pctmp, wbtmp] @ (set_elements 1));
+    in
+	if ldst=st andalso pc then ""
+	else
+	    "(define_insn \"" ^ name ^ "\"\n" ^
+	    "  [(match_parallel 0 \"" ^ ldstwm ^  "_operation\"\n" ^
+	    "    [" ^ (String.concatWith ("\n     ") set_list) ^ "])]\n" ^
+	    "   \"TARGET_HAS_CDX && XVECLEN (operands[0], 0) == " ^
+	    (ints (length set_list)) ^ "\"\n" ^
+	    "   \"" ^ asm_template ^ "\"\n" ^
+	    "  [(set_attr \"type\" \"" ^ ldstwm ^ "\")])\n\n"
+    end;
+
+fun peephole_pattern ldst n scratch_p =
+    let
+	fun sets lhs rhs = "(set " ^ lhs ^ "\n        " ^ rhs ^ ")";
+	fun single_set i indent =
+	    let val reg = "(match_operand:SI " ^ (ints i) ^
+			  " \"register_operand\" \"\")";
+		val mem = "(match_operand:SI " ^ (ints (i+n)) ^
+			  " \"memory_operand\" \"\")";
+	    in
+		if ldst=ld then sets reg mem
+		else sets mem reg
+	    end;
+
+	fun single_sets i =
+	    if i=n then []
+	    else (single_set i "   ") :: (single_sets (i+1));
+
+	val scratch = ints (2*n);
+	val peephole_elements =
+	    let val tmp = single_sets 0 in
+		if scratch_p
+		then (["(match_scratch:SI " ^ scratch ^ " \"r\")"] @
+		      tmp @
+		      ["(match_dup " ^ scratch ^ ")"])
+		else tmp
+	    end;
+    in
+	"(define_peephole2\n" ^
+	"  [" ^ (String.concatWith ("\n   ") peephole_elements) ^ "]\n" ^
+	"  \"TARGET_HAS_CDX\"\n" ^
+	"  [(const_int 0)]\n" ^
+	"{\n" ^
+	"  if (gen_ldstwm_peep (" ^
+	(if ldst=st then "false" else "true") ^ ", " ^ (ints n) ^ ", " ^ 
+	(if scratch_p then ("operands[" ^ scratch ^ "]") else "NULL_RTX") ^
+	", operands))\n" ^
+	"    DONE;\n" ^
+	"  else\n" ^
+	"    FAIL;\n" ^
+	"})\n\n"
+    end;
+
+
+print
+("/* Nios II R2 CDX ldwm/stwm/push.n/pop.n instruction patterns.\n" ^
+ "   This file was automatically generated using nios2-ldstwm.sml.\n" ^
+ "   Please do not edit manually.\n" ^
+ "\n" ^
+ "   Copyright (C) 2014-2015 Free Software Foundation, Inc.\n" ^
+ "   Contributed by Mentor Graphics.\n" ^
+ "\n" ^
+ "   This file is part of GCC.\n" ^
+ "\n" ^
+ "   GCC is free software; you can redistribute it and/or modify it\n" ^
+ "   under the terms of the GNU General Public License as published\n" ^
+ "   by the Free Software Foundation; either version 3, or (at your\n" ^
+ "   option) any later version.\n" ^
+ "\n" ^
+ "   GCC is distributed in the hope that it will be useful, but WITHOUT\n" ^
+ "   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY\n" ^
+ "   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public\n" ^
+ "   License for more details.\n" ^
+ "\n" ^
+ "   You should have received a copy of the GNU General Public License and\n" ^
+ "   a copy of the GCC Runtime Library Exception along with this program;\n" ^
+ "   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see\n" ^
+ "   <http://www.gnu.org/licenses/>.  */\n\n");
+
+fun seq a b = if a=b then [b]
+	      else a :: (seq (if a<b then a+1 else a-1) b);
+
+(* push/pop patterns *)
+for (seq 0 8) (fn n =>
+  for [push, pop] (fn p =>
+    for [true, false] (fn fp =>
+       print (pushpop_pattern p n fp))));
+
+(* ldwm/stwm patterns *)
+for [ld, st] (fn l =>
+  for (seq 1 12) (fn n =>
+    for [inc, dec] (fn id =>
+      for [true, false] (fn wb =>
+        for [true, false] (fn pc =>
+          print (ldstwm_pattern l n id wb pc))))));
+
+(* peephole patterns *)
+for [ld, st] (fn l =>
+  for (seq 12 2) (fn n =>
+    print (peephole_pattern l n true)));
+
Index: gcc/config/nios2/nios2.md
===================================================================
--- gcc/config/nios2/nios2.md	(revision 225796)
+++ gcc/config/nios2/nios2.md	(working copy)
@@ -1169,3 +1169,6 @@
   emit_move_insn (operands[0], gen_rtx_REG (Pmode, TP_REGNO));
   DONE;
 })
+;; Include the ldwm/stwm/push.n/pop.n patterns and peepholes.
+(include "ldstwm.md")
+
Index: gcc/testsuite/gcc.target/nios2/cdx-ldstwm-1.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/cdx-ldstwm-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/cdx-ldstwm-1.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do assemble } */
+/* { dg-options "-O3 -fomit-frame-pointer -funroll-all-loops -finline-functions -march=r2 -mcdx -w" } */
+
+/* Based on gcc.c-torture/compile/920501-23.c.
+   This test used to result in assembler errors with R2 CDX because of
+   a bug in regrename; it wasn't re-validating insns after renaming, so
+   ldwm/stwm instructions with incorrect registers were being emitted.  */
+
+typedef unsigned char qi;
+typedef unsigned short hi;
+typedef unsigned long si;
+typedef unsigned long long di;
+subi(a){return 100-a;}
+add(a,b){return a+b;}
+mul(a){return 85*a;}
+memshift(p)unsigned*p;{unsigned x;for(;;){x=*p++>>16;if(x)return x;}}
+ldw(xp)si*xp;{return xp[4];}
+ldws_m(xp)si*xp;{si x;do{x=xp[3];xp+=3;}while(x);}
+postinc_si(p)si*p;{si x;for(;;){x=*p++;if(x)return x;}}
+preinc_si(p)si*p;{si x;for(;;){x=*++p;if(x)return x;}}
+postinc_di(p)di*p;{di x;for(;;){x=*p++;if(x)return x;}}
+preinc_di(p)di*p;{di x;for(;;){x=*++p;if(x)return x;}}
+inc_overlap(p,a)di*p;{do{p=*(di**)p;p=(di*)((int)p+4);}while(*p);}
+di move_di(p,p2)di*p,*p2;{di x=p;p2=((di*)x)[1];return p2[1];}
Index: gcc/testsuite/gcc.target/nios2/cdx-ldstwm-2.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/cdx-ldstwm-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/cdx-ldstwm-2.c	(revision 0)
@@ -0,0 +1,66 @@
+/* { dg-do assemble } */
+/* { dg-options "-O3 -fomit-frame-pointer -funroll-loops -march=r2 -mcdx -w" } */
+
+/* Based on gcc.c-torture/execute/20021120-1.c.
+   This test used to result in assembler errors with R2 CDX because of
+   a bug in regrename; it wasn't re-validating insns after renaming, so
+   ldwm/stwm instructions with incorrect registers were being emitted.  */
+
+/* Macros to emit "L Nxx R" for each octal number xx between 000 and 037.  */
+#define OP1(L, N, R, I, J) L N##I##J R
+#define OP2(L, N, R, I) \
+    OP1(L, N, R, 0, I), OP1(L, N, R, 1, I), \
+    OP1(L, N, R, 2, I), OP1(L, N, R, 3, I)
+#define OP(L, N, R) \
+    OP2(L, N, R, 0), OP2(L, N, R, 1), OP2(L, N, R, 2), OP2(L, N, R, 3), \
+    OP2(L, N, R, 4), OP2(L, N, R, 5), OP2(L, N, R, 6), OP2(L, N, R, 7)
+
+/* Declare 32 unique variables with prefix N.  */
+#define DECLARE(N) OP (, N,)
+
+/* Copy 32 variables with prefix N from the array at ADDR.
+   Leave ADDR pointing to the end of the array.  */
+#define COPYIN(N, ADDR) OP (, N, = *(ADDR++))
+
+/* Likewise, but copy the other way.  */
+#define COPYOUT(N, ADDR) OP (*(ADDR++) =, N,)
+
+/* Add the contents of the array at ADDR to 32 variables with prefix N.
+   Leave ADDR pointing to the end of the array.  */
+#define ADD(N, ADDR) OP (, N, += *(ADDR++))
+
+volatile double gd[32];
+volatile float gf[32];
+
+void foo (int n)
+{
+  double DECLARE(d);
+  float DECLARE(f);
+  volatile double *pd;
+  volatile float *pf;
+  int i;
+
+  pd = gd; COPYIN (d, pd);
+  for (i = 0; i < n; i++)
+    {
+      pf = gf; COPYIN (f, pf);
+      pd = gd; ADD (d, pd);
+      pd = gd; ADD (d, pd);
+      pd = gd; ADD (d, pd);
+      pf = gf; COPYOUT (f, pf);
+    }
+  pd = gd; COPYOUT (d, pd);
+}
+
+int main ()
+{
+  int i;
+
+  for (i = 0; i < 32; i++)
+    gd[i] = i, gf[i] = i;
+  foo (1);
+  for (i = 0; i < 32; i++)
+    if (gd[i] != i * 4 || gf[i] != i)
+      abort ();
+  exit (0);
+}

* [nios2] [6/7] Update function prologues/epilogues for R2 CDX
  2015-07-14 22:29 [nios2] [0/7] Support for Nios II R2 Sandra Loosemore
                   ` (4 preceding siblings ...)
  2015-07-14 23:33 ` [nios2] [5/7] Support R2 CDX load/store multiple instructions Sandra Loosemore
@ 2015-07-14 23:49 ` Sandra Loosemore
  2015-07-15  0:03 ` [nios2] [7/7] Add new intrinsics Sandra Loosemore
  6 siblings, 0 replies; 8+ messages in thread
From: Sandra Loosemore @ 2015-07-14 23:49 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 633 bytes --]

This patch re-works the function prologue and epilogue generation code
to emit CDX push.n/stwm and pop.n/ldwm instructions, respectively.

A CDX function prologue always uses push.n to push the callee-saved
registers -- the stack frame will be padded if necessary to ensure a
contiguous register range.  It may also use stwm instructions to save
the register arguments on the stack for va_arg processing, and to save
the EH data registers.  In the case of a sibcall, CDX function
epilogues use ldwm instead of pop.n to restore the callee-saved
registers because pop.n always does an implicit return.
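
The padding rule described above can be sketched in isolation.  This is
only an illustration of the idea, not the code in the patch: the helper
name pad_save_mask_for_push is hypothetical, and the register numbers
(ra = r31, callee-saved range r16-r23) are taken from the masks and
loops used elsewhere in this series.

```c
#define RA_REGNO 31          /* ra is r31 (bit 31 in the CDX masks).  */
#define FIRST_CALLEE_REG 16  /* Assumed low end of the push.n range.  */
#define LAST_CALLEE_REG 23   /* Assumed high end of the push.n range.  */

/* Hypothetical helper, not part of the patch: widen SAVE_MASK so that
   ra and every register from r16 up to the highest callee-saved
   register already in the mask are included.  */
unsigned int
pad_save_mask_for_push (unsigned int save_mask)
{
  int regno, i;

  /* Nothing to save, so no push.n would be emitted at all.  */
  if (save_mask == 0)
    return 0;

  /* push.n always stores ra.  */
  save_mask |= 1u << RA_REGNO;

  /* Find the highest callee-saved register in use and fill in every
     register below it, down to r16.  */
  for (regno = LAST_CALLEE_REG; regno >= FIRST_CALLEE_REG; regno--)
    if (save_mask & (1u << regno))
      {
	for (i = regno - 1; i >= FIRST_CALLEE_REG; i--)
	  save_mask |= 1u << i;
	break;
      }
  return save_mask;
}
```

For instance, if only r18 needs saving (mask 0x00040000), the sketch
returns 0x80070000: ra is added, and r16 and r17 are saved as padding
so that the range r16-r18 is contiguous.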

Committed as r225799.

-Sandra


[-- Attachment #2: r2-6.log --]
[-- Type: text/x-log, Size: 910 bytes --]

2015-07-14  Sandra Loosemore  <sandra@codesourcery.com>
	    Cesar Philippidis  <cesar@codesourcery.com>
	    Chung-Lin Tang  <cltang@codesourcery.com>

	gcc/
	* config/nios2/nios2-protos.h (nios2_expand_return): Declare.
	* config/nios2/nios2.c (struct GTY (()) machine_function): Add
	callee_save_reg_size and uses_anonymous_args fields.
	(nios2_compute_frame_layout): Update for CDX push.n/pop.n usage.
	(nios2_create_cfa_notes): New function.
	(nios2_adjust_stack): New function for adjusting stack.
	(nios2_expand_prologue): Update for CDX push.n/pop.n usage.
	Use nios2_adjust_stack.
	(nios2_expand_epilogue): Likewise.
	(nios2_expand_return): New function.
	(nios2_can_use_return_insn): Update for CDX pop.n usage.
	(nios2_setup_incoming_varargs): Set uses_anonymous_args flag.
	If TARGET_HAS_CDX, defer pushing regs to nios2_expand_prologue.
	* config/nios2/nios2.md (return): Use nios2_expand_return.

[-- Attachment #3: r2-6.patch --]
[-- Type: text/x-patch, Size: 20165 bytes --]

Index: gcc/config/nios2/nios2-protos.h
===================================================================
--- gcc/config/nios2/nios2-protos.h	(revision 225798)
+++ gcc/config/nios2/nios2-protos.h	(working copy)
@@ -26,6 +26,7 @@ extern int nios2_initial_elimination_off
 extern int nios2_can_use_return_insn (void);
 extern void nios2_expand_prologue (void);
 extern void nios2_expand_epilogue (bool);
+extern bool nios2_expand_return (void);
 extern void nios2_function_profiler (FILE *, int);
 
 #ifdef RTX_CODE
Index: gcc/config/nios2/nios2.c
===================================================================
--- gcc/config/nios2/nios2.c	(revision 225798)
+++ gcc/config/nios2/nios2.c	(working copy)
@@ -95,10 +95,14 @@ struct GTY (()) machine_function
   int args_size;
   /* Number of bytes needed to store registers in frame.  */
   int save_reg_size;
+  /* Number of bytes used to store callee-saved registers.  */
+  int callee_save_reg_size;
   /* Offset from new stack pointer to store registers.  */
   int save_regs_offset;
   /* Offset from save_regs_offset to store frame pointer register.  */
   int fp_save_offset;
+  /* != 0 if function has a variable argument list.  */
+  int uses_anonymous_args;
   /* != 0 if frame layout already calculated.  */
   int initialized;
 };
@@ -378,14 +382,11 @@ nios2_compute_frame_layout (void)
   int var_size;
   int out_args_size;
   int save_reg_size;
+  int callee_save_reg_size;
 
   if (cfun->machine->initialized)
     return cfun->machine->total_size;
   
-  var_size = NIOS2_STACK_ALIGN (get_frame_size ());
-  out_args_size = NIOS2_STACK_ALIGN (crtl->outgoing_args_size);
-  total_size = var_size + out_args_size;
-
   /* Calculate space needed for gp registers.  */
   save_reg_size = 0;
   for (regno = 0; regno <= LAST_GP_REG; regno++)
@@ -395,6 +396,37 @@ nios2_compute_frame_layout (void)
 	save_reg_size += 4;
       }
 
+  /* If we are saving any callee-save register, then assume
+     push.n/pop.n should be used.  Make sure RA is saved, and that
+     the contiguous registers starting from r16 are all saved.  */
+  if (TARGET_HAS_CDX && save_reg_size != 0)
+    {
+      if ((save_mask & (1 << RA_REGNO)) == 0)
+	{
+	  save_mask |= 1 << RA_REGNO;
+	  save_reg_size += 4;
+	}
+
+      for (regno = 23; regno >= 16; regno--)
+	if ((save_mask & (1 << regno)) != 0)
+	  {
+	    /* Starting from highest numbered callee-saved
+	       register that is used, make sure all regs down
+	       to r16 are saved, to maintain contiguous range
+	       for push.n/pop.n.  */
+	    unsigned int i;
+	    for (i = regno - 1; i >= 16; i--)
+	      if ((save_mask & (1 << i)) == 0)
+		{
+		  save_mask |= 1 << i;
+		  save_reg_size += 4;
+		}
+	    break;
+	  }
+    }
+
+  callee_save_reg_size = save_reg_size;
+
   /* If we call eh_return, we need to save the EH data registers.  */
   if (crtl->calls_eh_return)
     {
@@ -420,6 +452,10 @@ nios2_compute_frame_layout (void)
       cfun->machine->fp_save_offset = fp_save_offset;
     }
 
+  var_size = NIOS2_STACK_ALIGN (get_frame_size ());
+  out_args_size = NIOS2_STACK_ALIGN (crtl->outgoing_args_size);
+  total_size = var_size + out_args_size;
+
   save_reg_size = NIOS2_STACK_ALIGN (save_reg_size);
   total_size += save_reg_size;
   total_size += NIOS2_STACK_ALIGN (crtl->args.pretend_args_size);
@@ -430,6 +466,7 @@ nios2_compute_frame_layout (void)
   cfun->machine->var_size = var_size;
   cfun->machine->args_size = out_args_size;
   cfun->machine->save_reg_size = save_reg_size;
+  cfun->machine->callee_save_reg_size = callee_save_reg_size;
   cfun->machine->initialized = reload_completed;
   cfun->machine->save_regs_offset = out_args_size + var_size;
 
@@ -477,6 +514,38 @@ base_reg_adjustment_p (rtx set, rtx *bas
   return false;
 }
 
+/* Create the CFA notes for push/pop prologue/epilogue instructions.  */
+static void
+nios2_create_cfa_notes (rtx_insn *insn, bool epilogue_p)
+{
+  int i = 0;
+  rtx base_reg, offset, elt, pat = PATTERN (insn);
+  if (epilogue_p)
+    {
+      elt = XVECEXP (pat, 0, 0);
+      if (GET_CODE (elt) == RETURN)
+	i++;
+      elt = XVECEXP (pat, 0, i);
+      if (base_reg_adjustment_p (elt, &base_reg, &offset))
+	{
+	  add_reg_note (insn, REG_CFA_ADJUST_CFA, copy_rtx (elt));
+	  i++;
+	}
+      for (; i < XVECLEN (pat, 0); i++)
+	{
+	  elt = SET_DEST (XVECEXP (pat, 0, i));
+	  gcc_assert (REG_P (elt));
+	  add_reg_note (insn, REG_CFA_RESTORE, elt);
+	}
+    }
+  else
+    {
+      /* Tag each of the prologue sets.  */
+      for (i = 0; i < XVECLEN (pat, 0); i++)
+	RTX_FRAME_RELATED_P (XVECEXP (pat, 0, i)) = 1;
+    }
+}
+
 /* Temp regno used inside prologue/epilogue.  */
 #define TEMP_REG_NUM 8
 
@@ -534,6 +603,39 @@ nios2_emit_add_constant (rtx reg, HOST_W
   return insn;
 }
 
+static rtx_insn *
+nios2_adjust_stack (int sp_adjust, bool epilogue_p)
+{
+  enum reg_note note_kind = REG_NOTE_MAX;
+  rtx_insn *insn = NULL;
+  if (sp_adjust)
+    {
+      if (SMALL_INT (sp_adjust))
+	insn = emit_insn (gen_add2_insn (stack_pointer_rtx,
+					 gen_int_mode (sp_adjust, Pmode)));
+      else
+	{
+	  rtx tmp = gen_rtx_REG (Pmode, TEMP_REG_NUM);
+	  emit_move_insn (tmp, gen_int_mode (sp_adjust, Pmode));
+	  insn = emit_insn (gen_add2_insn (stack_pointer_rtx, tmp));
+	  /* Attach a note indicating what happened.  */
+	  if (!epilogue_p)
+	    note_kind = REG_FRAME_RELATED_EXPR;
+	}
+      if (epilogue_p)
+	note_kind = REG_CFA_ADJUST_CFA;
+      if (note_kind != REG_NOTE_MAX)
+	{
+	  rtx cfa_adj = gen_rtx_SET (stack_pointer_rtx,
+				     plus_constant (Pmode, stack_pointer_rtx,
+						    sp_adjust));
+	  add_reg_note (insn, note_kind, cfa_adj);
+	}
+      RTX_FRAME_RELATED_P (insn) = 1;
+    }
+  return insn;
+}
+
 void
 nios2_expand_prologue (void)
 {
@@ -548,15 +650,97 @@ nios2_expand_prologue (void)
   if (flag_stack_usage_info)
     current_function_static_stack_size = total_frame_size;
 
-  /* Decrement the stack pointer.  */
-  if (!SMALL_INT (total_frame_size))
+  /* When R2 CDX push.n/stwm is available, arrange for stack frame to be built
+     using them.  */
+  if (TARGET_HAS_CDX
+      && (cfun->machine->save_reg_size != 0
+	  || cfun->machine->uses_anonymous_args))
+    {
+      unsigned int regmask = cfun->machine->save_mask;
+      unsigned int callee_save_regs = regmask & 0xffff0000;
+      unsigned int caller_save_regs = regmask & 0x0000ffff;
+      int push_immed = 0;
+      int pretend_args_size = NIOS2_STACK_ALIGN (crtl->args.pretend_args_size);
+      rtx stack_mem =
+	gen_frame_mem (SImode, plus_constant (Pmode, stack_pointer_rtx, -4));
+
+      /* Check that there is room for the entire stack frame before doing
+	 any SP adjustments or pushes.  */
+      if (crtl->limit_stack)
+	nios2_emit_stack_limit_check (total_frame_size);
+
+      if (pretend_args_size)
+	{
+	  if (cfun->machine->uses_anonymous_args)
+	    {
+	      /* Emit a stwm to push copy of argument registers onto
+	         the stack for va_arg processing.  */
+	      unsigned int r, mask = 0, n = pretend_args_size / 4;
+	      for (r = LAST_ARG_REGNO - n + 1; r <= LAST_ARG_REGNO; r++)
+		mask |= (1 << r);
+	      insn = emit_insn (nios2_ldst_parallel
+				(false, false, false, stack_mem,
+				 -pretend_args_size, mask, false));
+	      /* Tag first SP adjustment as frame-related.  */
+	      RTX_FRAME_RELATED_P (XVECEXP (PATTERN (insn), 0, 0)) = 1;
+	      RTX_FRAME_RELATED_P (insn) = 1;
+	    }
+	  else
+	    nios2_adjust_stack (-pretend_args_size, false);
+	}
+      if (callee_save_regs)
+	{
+	  /* Emit a push.n to save registers and optionally allocate
+	     push_immed extra bytes on the stack.  */
+	  int sp_adjust;
+	  if (caller_save_regs)
+	    /* Can't allocate extra stack space yet.  */
+	    push_immed = 0;
+	  else if (cfun->machine->save_regs_offset <= 60)
+	    /* Stack adjustment fits entirely in the push.n.  */
+	    push_immed = cfun->machine->save_regs_offset;
+	  else if (frame_pointer_needed
+		   && cfun->machine->fp_save_offset == 0)
+	    /* Deferring the entire stack adjustment until later
+	       allows us to use a mov.n instead of a 32-bit addi
+	       instruction to set the frame pointer.  */
+	    push_immed = 0;
+	  else
+	    /* Splitting the stack adjustment between the push.n
+	       and an explicit adjustment makes it more likely that
+	       we can use spdeci.n for the explicit part.  */
+	    push_immed = 60;
+	  sp_adjust = -(cfun->machine->callee_save_reg_size + push_immed);
+	  insn = emit_insn (nios2_ldst_parallel (false, false, false,
+						 stack_mem, sp_adjust,
+						 callee_save_regs, false));
+	  nios2_create_cfa_notes (insn, false);
+	  RTX_FRAME_RELATED_P (insn) = 1;
+	}
+
+      if (caller_save_regs)
+	{
+	  /* Emit a stwm to save the EH data regs, r4-r7.  */
+	  int caller_save_size = (cfun->machine->save_reg_size
+				  - cfun->machine->callee_save_reg_size);
+	  gcc_assert ((caller_save_regs & ~0xf0) == 0);
+	  insn = emit_insn (nios2_ldst_parallel
+			    (false, false, false, stack_mem,
+			     -caller_save_size, caller_save_regs, false));
+	  nios2_create_cfa_notes (insn, false);
+	  RTX_FRAME_RELATED_P (insn) = 1;
+	}
+
+      save_regs_base = push_immed;
+      sp_offset = -(cfun->machine->save_regs_offset - push_immed);
+    }
+  /* The non-CDX cases decrement the stack pointer, to prepare for individual
+     register saves to the stack.  */
+  else if (!SMALL_INT (total_frame_size))
     {
       /* We need an intermediary point, this will point at the spill block.  */
-      insn = emit_insn
-	(gen_add2_insn (stack_pointer_rtx,
-			gen_int_mode (cfun->machine->save_regs_offset
-				      - total_frame_size, Pmode)));
-      RTX_FRAME_RELATED_P (insn) = 1;
+      nios2_adjust_stack (cfun->machine->save_regs_offset - total_frame_size,
+			  false);
       save_regs_base = 0;
       sp_offset = -cfun->machine->save_regs_offset;
       if (crtl->limit_stack)
@@ -564,10 +748,7 @@ nios2_expand_prologue (void)
     }
   else if (total_frame_size)
     {
-      insn = emit_insn (gen_add2_insn (stack_pointer_rtx,
-				       gen_int_mode (-total_frame_size,
-						     Pmode)));
-      RTX_FRAME_RELATED_P (insn) = 1;
+      nios2_adjust_stack (-total_frame_size, false);
       save_regs_base = cfun->machine->save_regs_offset;
       sp_offset = 0;
       if (crtl->limit_stack)
@@ -576,41 +757,34 @@ nios2_expand_prologue (void)
   else
     save_regs_base = sp_offset = 0;
 
-  save_offset = save_regs_base + cfun->machine->save_reg_size;
+  /* Save the registers individually in the non-CDX case.  */
+  if (!TARGET_HAS_CDX)
+    {
+      save_offset = save_regs_base + cfun->machine->save_reg_size;
 
-  for (regno = LAST_GP_REG; regno > 0; regno--)
-    if (cfun->machine->save_mask & (1 << regno))
-      {
-	save_offset -= 4;
-	save_reg (regno, save_offset);
-      }
+      for (regno = LAST_GP_REG; regno > 0; regno--)
+	if (cfun->machine->save_mask & (1 << regno))
+	  {
+	    save_offset -= 4;
+	    save_reg (regno, save_offset);
+	  }
+    }
 
+  /* Set the hard frame pointer.  */
   if (frame_pointer_needed)
     {
       int fp_save_offset = save_regs_base + cfun->machine->fp_save_offset;
-      insn = emit_insn (gen_add3_insn (hard_frame_pointer_rtx,
-				       stack_pointer_rtx,
-				       gen_int_mode (fp_save_offset, Pmode)));
+      insn =
+	(fp_save_offset == 0
+	 ? emit_move_insn (hard_frame_pointer_rtx, stack_pointer_rtx)
+	 : emit_insn (gen_add3_insn (hard_frame_pointer_rtx,
+				     stack_pointer_rtx,
+				     gen_int_mode (fp_save_offset, Pmode))));
       RTX_FRAME_RELATED_P (insn) = 1;
     }
 
-  if (sp_offset)
-    {
-      rtx sp_adjust
-	= gen_rtx_SET (stack_pointer_rtx,
-		       plus_constant (Pmode, stack_pointer_rtx, sp_offset));
-      if (SMALL_INT (sp_offset))
-	insn = emit_insn (sp_adjust);
-      else
-	{
-	  rtx tmp = gen_rtx_REG (Pmode, TEMP_REG_NUM);
-	  emit_move_insn (tmp, gen_int_mode (sp_offset, Pmode));
-	  insn = emit_insn (gen_add2_insn (stack_pointer_rtx, tmp));
-	  /* Attach the sp_adjust as a note indicating what happened.  */
-	  add_reg_note (insn, REG_FRAME_RELATED_EXPR, sp_adjust);
-	}
-      RTX_FRAME_RELATED_P (insn) = 1;
-    }
+  /* Allocate sp_offset more bytes in the stack frame.  */
+  nios2_adjust_stack (sp_offset, false);
 
   /* Load the PIC register if needed.  */
   if (crtl->uses_pic_offset_table)
@@ -643,9 +817,12 @@ nios2_expand_epilogue (bool sibcall_p)
   if (frame_pointer_needed)
     {
       /* Recover the stack pointer.  */
-      insn = emit_insn (gen_add3_insn
-			(stack_pointer_rtx, hard_frame_pointer_rtx,
-			 gen_int_mode (-cfun->machine->fp_save_offset, Pmode)));
+      insn =
+	(cfun->machine->fp_save_offset == 0
+	 ? emit_move_insn (stack_pointer_rtx, hard_frame_pointer_rtx)
+	 : emit_insn (gen_add3_insn
+		      (stack_pointer_rtx, hard_frame_pointer_rtx,
+		       gen_int_mode (-cfun->machine->fp_save_offset, Pmode))));
       cfa_adj = plus_constant (Pmode, stack_pointer_rtx,
 			       (total_frame_size
 				- cfun->machine->save_regs_offset));
@@ -657,15 +834,7 @@ nios2_expand_epilogue (bool sibcall_p)
     }
   else if (!SMALL_INT (total_frame_size))
     {
-      rtx tmp = gen_rtx_REG (Pmode, TEMP_REG_NUM);
-      emit_move_insn (tmp, gen_int_mode (cfun->machine->save_regs_offset,
-					 Pmode));
-      insn = emit_insn (gen_add2_insn (stack_pointer_rtx, tmp));
-      cfa_adj = gen_rtx_SET (stack_pointer_rtx,
-			     plus_constant (Pmode, stack_pointer_rtx,
-					    cfun->machine->save_regs_offset));
-      add_reg_note (insn, REG_CFA_ADJUST_CFA, cfa_adj);
-      RTX_FRAME_RELATED_P (insn) = 1;
+      nios2_adjust_stack (cfun->machine->save_regs_offset, true);
       save_offset = 0;
       sp_adjust = total_frame_size - cfun->machine->save_regs_offset;
     }
@@ -674,25 +843,93 @@ nios2_expand_epilogue (bool sibcall_p)
       save_offset = cfun->machine->save_regs_offset;
       sp_adjust = total_frame_size;
     }
-  
-  save_offset += cfun->machine->save_reg_size;
 
-  for (regno = LAST_GP_REG; regno > 0; regno--)
-    if (cfun->machine->save_mask & (1 << regno))
-      {
-	save_offset -= 4;
-	restore_reg (regno, save_offset);
-      }
+  if (!TARGET_HAS_CDX)
+    {
+      /* Generate individual register restores.  */
+      save_offset += cfun->machine->save_reg_size;
 
-  if (sp_adjust)
+      for (regno = LAST_GP_REG; regno > 0; regno--)
+	if (cfun->machine->save_mask & (1 << regno))
+	  {
+	    save_offset -= 4;
+	    restore_reg (regno, save_offset);
+	  }
+      nios2_adjust_stack (sp_adjust, true);
+    }
+  else if (cfun->machine->save_reg_size == 0)
     {
-      insn = emit_insn (gen_add2_insn (stack_pointer_rtx,
-				       gen_int_mode (sp_adjust, Pmode)));
-      cfa_adj = gen_rtx_SET (stack_pointer_rtx,
-			     plus_constant (Pmode, stack_pointer_rtx,
-					    sp_adjust));
-      add_reg_note (insn, REG_CFA_ADJUST_CFA, cfa_adj);
-      RTX_FRAME_RELATED_P (insn) = 1;
+      /* Nothing to restore, just recover the stack position.  */
+      nios2_adjust_stack (sp_adjust, true);
+    }
+  else
+    {
+      /* Emit CDX pop.n/ldwm to restore registers and optionally return.  */
+      unsigned int regmask = cfun->machine->save_mask;
+      unsigned int callee_save_regs = regmask & 0xffff0000;
+      unsigned int caller_save_regs = regmask & 0x0000ffff;
+      int callee_save_size = cfun->machine->callee_save_reg_size;
+      int caller_save_size = cfun->machine->save_reg_size - callee_save_size;
+      int pretend_args_size = NIOS2_STACK_ALIGN (crtl->args.pretend_args_size);
+      bool ret_p = (!pretend_args_size && !crtl->calls_eh_return
+		    && !sibcall_p);
+
+      if (!ret_p || caller_save_size > 0)
+	sp_adjust = save_offset;
+      else
+	sp_adjust = (save_offset > 60 ? save_offset - 60 : 0);
+
+      save_offset -= sp_adjust;
+
+      nios2_adjust_stack (sp_adjust, true);
+
+      if (caller_save_regs)
+	{
+	  /* Emit a ldwm to restore EH data regs.  */
+	  rtx stack_mem = gen_frame_mem (SImode, stack_pointer_rtx);
+	  insn = emit_insn (nios2_ldst_parallel
+			    (true, true, true, stack_mem,
+			     caller_save_size, caller_save_regs, false));
+	  RTX_FRAME_RELATED_P (insn) = 1;
+	  nios2_create_cfa_notes (insn, true);
+	}
+
+      if (callee_save_regs)
+	{
+	  int sp_adjust = save_offset + callee_save_size;
+	  rtx stack_mem;
+	  if (ret_p)
+	    {
+	      /* Emit a pop.n to restore regs and return.  */
+	      stack_mem =
+		gen_frame_mem (SImode,
+			       gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+					     gen_int_mode (sp_adjust - 4,
+							   Pmode)));
+	      insn =
+		emit_jump_insn (nios2_ldst_parallel (true, false, false,
+						     stack_mem, sp_adjust,
+						     callee_save_regs, ret_p));
+	      RTX_FRAME_RELATED_P (insn) = 1;
+	      /* No need to attach CFA notes since we cannot step over
+		 a return.  */
+	      return;
+	    }
+	  else
+	    {
+	      /* If no return, we have to use the ldwm form.  */
+	      stack_mem = gen_frame_mem (SImode, stack_pointer_rtx);
+	      insn =
+		emit_insn (nios2_ldst_parallel (true, true, true,
+						stack_mem, sp_adjust,
+						callee_save_regs, ret_p));
+	      RTX_FRAME_RELATED_P (insn) = 1;
+	      nios2_create_cfa_notes (insn, true);
+	    }
+	}
+
+      if (pretend_args_size)
+	nios2_adjust_stack (pretend_args_size, true);
     }
 
   /* Add in the __builtin_eh_return stack adjustment.  */
@@ -703,6 +940,37 @@ nios2_expand_epilogue (bool sibcall_p)
     emit_jump_insn (gen_simple_return ());
 }
 
+bool
+nios2_expand_return (void)
+{
+  /* If CDX is available, generate a pop.n instruction to do both
+     the stack pop and return.  */
+  if (TARGET_HAS_CDX)
+    {
+      int total_frame_size = nios2_compute_frame_layout ();
+      int sp_adjust = (cfun->machine->save_regs_offset
+		       + cfun->machine->callee_save_reg_size);
+      gcc_assert (sp_adjust == total_frame_size);
+      if (sp_adjust != 0)
+	{
+	  rtx mem =
+	    gen_frame_mem (SImode,
+			   plus_constant (Pmode, stack_pointer_rtx,
+					  sp_adjust - 4, false));
+	  rtx_insn *insn =
+	    emit_jump_insn (nios2_ldst_parallel (true, false, false,
+						 mem, sp_adjust,
+						 cfun->machine->save_mask,
+						 true));
+	  RTX_FRAME_RELATED_P (insn) = 1;
+	  /* No need to create CFA notes since we can't step over
+	     a return.  */
+	  return true;
+	}
+    }
+  return false;
+}
+
 /* Implement RETURN_ADDR_RTX.  Note, we do not support moving
    back to a previous frame.  */
 rtx
@@ -874,10 +1142,24 @@ nios2_initial_elimination_offset (int fr
 int
 nios2_can_use_return_insn (void)
 {
+  int total_frame_size;
+
   if (!reload_completed || crtl->profile)
     return 0;
 
-  return nios2_compute_frame_layout () == 0;
+  total_frame_size = nios2_compute_frame_layout ();
+
+  /* If CDX is available, check if we can return using a
+     single pop.n instruction.  */
+  if (TARGET_HAS_CDX
+      && !frame_pointer_needed
+      && cfun->machine->save_regs_offset <= 60
+      && (cfun->machine->save_mask & 0x80000000) != 0
+      && (cfun->machine->save_mask & 0xffff) == 0
+      && crtl->args.pretend_args_size == 0)
+    return true;
+
+  return total_frame_size == 0;
 }
 
 \f
@@ -2785,12 +3067,15 @@ nios2_setup_incoming_varargs (cumulative
   int regs_to_push;
   int pret_size;
 
+  cfun->machine->uses_anonymous_args = 1;
   local_cum = *cum;
-  nios2_function_arg_advance (local_cum_v, mode, type, 1);
+  nios2_function_arg_advance (local_cum_v, mode, type, true);
 
   regs_to_push = NUM_ARG_REGS - local_cum.regs_used;
 
-  if (!second_time && regs_to_push > 0)
+  /* If we can use CDX stwm to push the arguments on the stack,
+     nios2_expand_prologue will do that instead.  */
+  if (!TARGET_HAS_CDX && !second_time && regs_to_push > 0)
     {
       rtx ptr = virtual_incoming_args_rtx;
       rtx mem = gen_rtx_MEM (BLKmode, ptr);
Index: gcc/config/nios2/nios2.md
===================================================================
--- gcc/config/nios2/nios2.md	(revision 225798)
+++ gcc/config/nios2/nios2.md	(working copy)
@@ -746,7 +746,10 @@
 (define_expand "return"
   [(simple_return)]
   "nios2_can_use_return_insn ()"
-  "")
+{
+  if (nios2_expand_return ())
+    DONE;
+})
 
 (define_insn "simple_return"
   [(simple_return)]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [nios2] [7/7] Add new intrinsics
  2015-07-14 22:29 [nios2] [0/7] Support for Nios II R2 Sandra Loosemore
                   ` (5 preceding siblings ...)
  2015-07-14 23:49 ` [nios2] [6/7] Update function prologues/epilogues for R2 CDX Sandra Loosemore
@ 2015-07-15  0:03 ` Sandra Loosemore
  6 siblings, 0 replies; 8+ messages in thread
From: Sandra Loosemore @ 2015-07-15  0:03 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 324 bytes --]

This patch adds a set of new built-in functions to the Nios II back
end.  Some of these are specific to R2 (wrpie, eni, and the MPX
load/store exclusive instructions) and others (rdprs, flushd, flushda)
correspond to instructions also present in R1 but not previously
exposed as intrinsics.
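
As a rough illustration of the R1/R2 split, the sketch below models the
per-builtin architecture requirement that the patch records in the
N2_BUILTINS table.  It is host-side C, not code from the patch: the enum,
struct, and helper names are made up for the example, and only the
builtin names and their R1/R2 assignments come from the table.

```c
#include <stddef.h>
#include <string.h>

/* Architecture revisions; illustrative stand-ins for the back end's
   own arch enumeration.  */
enum arch_rev { REV_R1 = 1, REV_R2 = 2 };

/* One entry per new builtin: its name and the minimum revision it
   needs, following the N2_BUILTINS table in the patch.  */
struct builtin_req
{
  const char *name;
  enum arch_rev min_rev;
};

static const struct builtin_req new_builtins[] = {
  { "rdprs",   REV_R1 },
  { "flushd",  REV_R1 },
  { "flushda", REV_R1 },
  { "wrpie",   REV_R2 },
  { "eni",     REV_R2 },
  { "ldex",    REV_R2 },
  { "ldsex",   REV_R2 },
  { "stex",    REV_R2 },
  { "stsex",   REV_R2 },
};

/* Return 1 if NAME is usable when compiling for TARGET, 0 if it needs
   a newer revision, and -1 if NAME is not one of the new builtins.  */
static int
builtin_usable_p (const char *name, enum arch_rev target)
{
  size_t i;
  for (i = 0; i < sizeof new_builtins / sizeof new_builtins[0]; i++)
    if (strcmp (new_builtins[i].name, name) == 0)
      return new_builtins[i].min_rev <= target;
  return -1;
}
```

In the patch itself the corresponding check lives in nios2_expand_builtin,
which reports an error when a builtin's arch field is newer than the
selected architecture.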

Committed as r225800.
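
The immediate accepted by rdprs depends on the architecture level: the
new U constraint allows -32768 to 32767 under R1 and -2048 to 2047 under
R2.  A minimal host-side sketch of that range check follows; the helper
names are hypothetical, and is_r2 merely stands in for TARGET_ARCH_R2.

```c
#include <stdbool.h>

/* Signed 16-bit range, the R1 limit given for the U constraint.  */
static bool
fits_signed16 (long v)
{
  return v >= -32768 && v <= 32767;
}

/* Signed 12-bit range, the R2 limit given for the U constraint.  */
static bool
fits_signed12 (long v)
{
  return v >= -2048 && v <= 2047;
}

/* Model of the U constraint: pick the 12-bit range when targeting R2,
   otherwise the 16-bit range.  */
static bool
rdprs_imm_ok (long v, bool is_r2)
{
  return is_r2 ? fits_signed12 (v) : fits_signed16 (v);
}
```

So an offset of 2048 would be accepted when building for R1 but rejected
for R2.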

-Sandra


[-- Attachment #2: r2-7.log --]
[-- Type: text/x-log, Size: 1447 bytes --]

2015-07-14  Sandra Loosemore  <sandra@codesourcery.com>
	    Cesar Philippidis  <cesar@codesourcery.com>
	    Chung-Lin Tang  <cltang@codesourcery.com>

	gcc/
	* config/nios2/constraints.md (U, v): New constraints.
	* config/nios2/predicates.md (rdprs_dcache_operand): New.
	(ldstex_memory_operand): New.
	* config/nios2/sync.md: New file.
	* config/nios2/nios2.md (unspecv): Add new builtin function
	UNSPECV codes.
	(rdprs, flushd, flushda, wrpie, eni): New patterns.
	(top-level): Include sync.md.
	* config/nios2/nios2.c (N2_FTYPES): Add function types for
	new builtins.
	(N2_BUILTINS): Add arch field setting, add new builtins.
	(enum nios2_builtin_code, nios2_builtins): Update N2_BUILTIN_DEF
	for arch field.
	(nios2_expand_ldst_builtin): Rename from nios2_expand_ldstio_builtin.
	Also handle ldex/stex/ldsex/stsex builtins.
	(nios2_expand_rdprs_builtin): New function.
	(nios2_expand_cache_builtin): New function.
	(nios2_expand_wrpie_builtin): New function.
	(nios2_expand_eni_builtin): New function.
	(nios2_expand_builtin): Add arch field handling and new builtin
	cases.
	* doc/extend.texi (Altera Nios II Built-in Functions): Document
	new builtins.
	* doc/md.texi (Machine Constraints): Document U and v constraints.

	gcc/testsuite/
	* gcc.target/nios2/nios2-flushd.c: New.
	* gcc.target/nios2/nios2-rdprs.c: New.
	* gcc.target/nios2/r2-atomic.c: New.
	* gcc.target/nios2/r2-eni.c: New.
	* gcc.target/nios2/r2-wrpie.c: New.

[-- Attachment #3: r2-7.patch --]
[-- Type: text/x-patch, Size: 19907 bytes --]

Index: gcc/config/nios2/constraints.md
===================================================================
--- gcc/config/nios2/constraints.md	(revision 225798)
+++ gcc/config/nios2/constraints.md	(working copy)
@@ -28,9 +28,11 @@
 ;;  M: 0
 ;;  N: 0 to 255 (for custom instruction numbers)
 ;;  O: 0 to 31 (for control register numbers)
+;;  U: -32768 to 32767 under R1, -2048 to 2047 under R2
 ;;
 ;; We use the following constraint letters for memory constraints
 ;;
+;;  v: memory operands for R2 load/store exclusive instructions
 ;;  w: memory operands for load/store IO and cache instructions
 ;;
 ;; We use the following built-in register classes:
@@ -100,6 +102,17 @@
   "A constant unspec offset representing a relocation."
   (match_test "nios2_unspec_reloc_p (op)"))
 
+(define_constraint "U"
+  "A 12-bit or 16-bit constant (for RDPRS and DCACHE)."
+  (and (match_code "const_int")
+       (if_then_else (match_test "TARGET_ARCH_R2")
+                     (match_test "SMALL_INT12 (ival)")
+                     (match_test "SMALL_INT (ival)"))))
+
+(define_memory_constraint "v"
+  "A memory operand suitable for R2 load/store exclusive instructions."
+  (match_operand 0 "ldstex_memory_operand"))
+
 (define_memory_constraint "w"
   "A memory operand suitable for load/store IO and cache instructions."
   (match_operand 0 "ldstio_memory_operand"))
Index: gcc/config/nios2/predicates.md
===================================================================
--- gcc/config/nios2/predicates.md	(revision 225798)
+++ gcc/config/nios2/predicates.md	(working copy)
@@ -81,6 +81,12 @@
   (and (match_code "const_int")
        (match_test "RDWRCTL_INT (INTVAL (op))")))
 
+(define_predicate "rdprs_dcache_operand"
+  (and (match_code "const_int")
+       (if_then_else (match_test "TARGET_ARCH_R2")
+                     (match_test "SMALL_INT12 (INTVAL (op))")
+                     (match_test "SMALL_INT (INTVAL (op))"))))
+
 (define_predicate "custom_insn_opcode"
   (and (match_code "const_int")
        (match_test "CUSTOM_INSN_OPCODE (INTVAL (op))")))
@@ -144,3 +150,10 @@
     }
   return memory_operand (op, mode);
 })
+
+(define_predicate "ldstex_memory_operand"
+  (match_code "mem")
+{
+  /* ldex/ldsex/stex/stsex cannot handle memory addresses with offsets.  */
+  return GET_CODE (XEXP (op, 0)) == REG;
+})
Index: gcc/config/nios2/sync.md
===================================================================
--- gcc/config/nios2/sync.md	(revision 0)
+++ gcc/config/nios2/sync.md	(revision 0)
@@ -0,0 +1,45 @@
+;; Machine Description for Altera Nios II synchronization primitives.
+;; Copyright (C) 2014-2015 Free Software Foundation, Inc.
+;; Contributed by Mentor Graphics, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_int_iterator UNSPECV_LOAD_EXCLUSIVE [UNSPECV_LDEX UNSPECV_LDSEX])
+(define_int_attr load_exclusive [(UNSPECV_LDEX  "ldex")
+                                 (UNSPECV_LDSEX "ldsex")])
+(define_insn "<load_exclusive>"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (unspec_volatile:SI
+          [(match_operand:SI 1 "ldstex_memory_operand" "v")]
+          UNSPECV_LOAD_EXCLUSIVE))]
+  "TARGET_ARCH_R2"
+  "<load_exclusive>\\t%0, %A1"
+  [(set_attr "type" "ld")])
+
+(define_int_iterator UNSPECV_STORE_EXCLUSIVE [UNSPECV_STEX UNSPECV_STSEX])
+(define_int_attr store_exclusive [(UNSPECV_STEX  "stex")
+                                  (UNSPECV_STSEX "stsex")])
+(define_insn "<store_exclusive>"
+  [(set (match_operand:SI 2 "register_operand" "=r")
+        (unspec_volatile:SI [(const_int 0)] UNSPECV_STORE_EXCLUSIVE))
+   (set (match_operand:SI 0 "ldstex_memory_operand" "=v")
+        (unspec_volatile:SI
+          [(match_operand:SI 1 "reg_or_0_operand" "rM")]
+          UNSPECV_STORE_EXCLUSIVE))]
+  "TARGET_ARCH_R2"
+  "<store_exclusive>\\t%2, %z1, %A0"
+  [(set_attr "type" "st")])
Index: gcc/config/nios2/nios2.md
===================================================================
--- gcc/config/nios2/nios2.md	(revision 225799)
+++ gcc/config/nios2/nios2.md	(working copy)
@@ -62,6 +62,15 @@
   UNSPECV_CUSTOM_XNXX
   UNSPECV_LDXIO
   UNSPECV_STXIO
+  UNSPECV_RDPRS
+  UNSPECV_FLUSHD
+  UNSPECV_FLUSHDA
+  UNSPECV_WRPIE
+  UNSPECV_ENI
+  UNSPECV_LDEX
+  UNSPECV_LDSEX
+  UNSPECV_STEX
+  UNSPECV_STSEX
 ])
 
 (define_c_enum "unspec" [
@@ -1127,6 +1136,48 @@
   "wrctl\\tctl%0, %z1"
   [(set_attr "type" "control")])
 
+(define_insn "rdprs"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (unspec_volatile:SI [(match_operand:SI 1 "rdwrctl_operand" "O")
+                             (match_operand:SI 2 "arith_operand"   "U")]
+         UNSPECV_RDPRS))]
+  ""
+  "rdprs\\t%0, %1, %2"
+  [(set_attr "type" "control")])
+
+;; Cache Instructions
+
+(define_insn "flushd"
+  [(unspec_volatile:SI [(match_operand:SI 0 "ldstio_memory_operand" "w")]
+  		        UNSPECV_FLUSHD)]
+  ""
+  "flushd\\t%0"
+  [(set_attr "type" "control")])
+
+(define_insn "flushda"
+  [(unspec_volatile:SI [(match_operand:SI 0 "ldstio_memory_operand" "w")]
+  		        UNSPECV_FLUSHDA)]
+  ""
+  "flushda\\t%0"
+  [(set_attr "type" "control")])
+
+;; R2 Instructions
+
+(define_insn "wrpie"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (unspec_volatile:SI [(match_operand:SI 1 "register_operand" "r")]
+		 	     UNSPECV_WRPIE))]
+  "TARGET_ARCH_R2"
+  "wrpie\\t%0, %1"
+  [(set_attr "type" "control")])
+
+(define_insn "eni"
+  [(unspec:VOID [(match_operand 0 "const_int_operand" "i")]
+  		 UNSPECV_ENI)]
+  "TARGET_ARCH_R2"
+  "eni\\t%0"
+  [(set_attr "type" "control")])
+
 ;; Trap patterns
 (define_insn "trap"
   [(trap_if (const_int 1) (const_int 3))]
@@ -1172,6 +1223,10 @@
   emit_move_insn (operands[0], gen_rtx_REG (Pmode, TP_REGNO));
   DONE;
 })
+
+;; Synchronization Primitives
+(include "sync.md")
+
 ;; Include the ldwm/stwm/push.n/pop.n patterns and peepholes.
 (include "ldstwm.md")
 
Index: gcc/config/nios2/nios2.c
===================================================================
--- gcc/config/nios2/nios2.c	(revision 225799)
+++ gcc/config/nios2/nios2.c	(working copy)
@@ -135,12 +135,16 @@ static bool custom_code_conflict = false
   N2_FTYPE(2, (SI, SF))				\
   N2_FTYPE(3, (SI, SF, SF))			\
   N2_FTYPE(2, (SI, SI))				\
+  N2_FTYPE(3, (SI, SI, SI))			\
+  N2_FTYPE(3, (SI, VPTR, SI))			\
   N2_FTYPE(2, (UI, CVPTR))			\
   N2_FTYPE(2, (UI, DF))				\
   N2_FTYPE(2, (UI, SF))				\
   N2_FTYPE(2, (VOID, DF))			\
   N2_FTYPE(2, (VOID, SF))			\
+  N2_FTYPE(2, (VOID, SI))			\
   N2_FTYPE(3, (VOID, SI, SI))			\
+  N2_FTYPE(2, (VOID, VPTR))			\
   N2_FTYPE(3, (VOID, VPTR, SI))
 
 #define N2_FTYPE_OP1(R)         N2_FTYPE_ ## R ## _VOID
@@ -3266,33 +3270,43 @@ nios2_expand_custom_builtin (tree exp, u
 struct nios2_builtin_desc
 {
   enum insn_code icode;
+  enum nios2_arch_type arch;
   enum nios2_ftcode ftype;
   const char *name;
 };
 
 #define N2_BUILTINS					\
-  N2_BUILTIN_DEF (sync,   N2_FTYPE_VOID_VOID)		\
-  N2_BUILTIN_DEF (ldbio,  N2_FTYPE_SI_CVPTR)		\
-  N2_BUILTIN_DEF (ldbuio, N2_FTYPE_UI_CVPTR)		\
-  N2_BUILTIN_DEF (ldhio,  N2_FTYPE_SI_CVPTR)		\
-  N2_BUILTIN_DEF (ldhuio, N2_FTYPE_UI_CVPTR)		\
-  N2_BUILTIN_DEF (ldwio,  N2_FTYPE_SI_CVPTR)		\
-  N2_BUILTIN_DEF (stbio,  N2_FTYPE_VOID_VPTR_SI)	\
-  N2_BUILTIN_DEF (sthio,  N2_FTYPE_VOID_VPTR_SI)	\
-  N2_BUILTIN_DEF (stwio,  N2_FTYPE_VOID_VPTR_SI)	\
-  N2_BUILTIN_DEF (rdctl,  N2_FTYPE_SI_SI)		\
-  N2_BUILTIN_DEF (wrctl,  N2_FTYPE_VOID_SI_SI)
+  N2_BUILTIN_DEF (sync,    R1, N2_FTYPE_VOID_VOID)	\
+  N2_BUILTIN_DEF (ldbio,   R1, N2_FTYPE_SI_CVPTR)	\
+  N2_BUILTIN_DEF (ldbuio,  R1, N2_FTYPE_UI_CVPTR)	\
+  N2_BUILTIN_DEF (ldhio,   R1, N2_FTYPE_SI_CVPTR)	\
+  N2_BUILTIN_DEF (ldhuio,  R1, N2_FTYPE_UI_CVPTR)	\
+  N2_BUILTIN_DEF (ldwio,   R1, N2_FTYPE_SI_CVPTR)	\
+  N2_BUILTIN_DEF (stbio,   R1, N2_FTYPE_VOID_VPTR_SI)	\
+  N2_BUILTIN_DEF (sthio,   R1, N2_FTYPE_VOID_VPTR_SI)	\
+  N2_BUILTIN_DEF (stwio,   R1, N2_FTYPE_VOID_VPTR_SI)	\
+  N2_BUILTIN_DEF (rdctl,   R1, N2_FTYPE_SI_SI)		\
+  N2_BUILTIN_DEF (wrctl,   R1, N2_FTYPE_VOID_SI_SI)	\
+  N2_BUILTIN_DEF (rdprs,   R1, N2_FTYPE_SI_SI_SI)	\
+  N2_BUILTIN_DEF (flushd,  R1, N2_FTYPE_VOID_VPTR)	\
+  N2_BUILTIN_DEF (flushda, R1, N2_FTYPE_VOID_VPTR)	\
+  N2_BUILTIN_DEF (wrpie,   R2, N2_FTYPE_SI_SI)		\
+  N2_BUILTIN_DEF (eni,     R2, N2_FTYPE_VOID_SI)	\
+  N2_BUILTIN_DEF (ldex,    R2, N2_FTYPE_SI_CVPTR)	\
+  N2_BUILTIN_DEF (ldsex,   R2, N2_FTYPE_SI_CVPTR)	\
+  N2_BUILTIN_DEF (stex,    R2, N2_FTYPE_SI_VPTR_SI)	\
+  N2_BUILTIN_DEF (stsex,   R2, N2_FTYPE_SI_VPTR_SI)
 
 enum nios2_builtin_code {
-#define N2_BUILTIN_DEF(name, ftype) NIOS2_BUILTIN_ ## name,
+#define N2_BUILTIN_DEF(name, arch, ftype) NIOS2_BUILTIN_ ## name,
   N2_BUILTINS
 #undef N2_BUILTIN_DEF
   NUM_FIXED_NIOS2_BUILTINS
 };
 
 static const struct nios2_builtin_desc nios2_builtins[] = {
-#define N2_BUILTIN_DEF(name, ftype)			\
-  { CODE_FOR_ ## name, ftype, "__builtin_" #name },
+#define N2_BUILTIN_DEF(name, arch, ftype)		\
+  { CODE_FOR_ ## name, ARCH_ ## arch, ftype, "__builtin_" #name },
   N2_BUILTINS
 #undef N2_BUILTIN_DEF
 };
@@ -3373,10 +3387,11 @@ nios2_expand_builtin_insn (const struct 
     } 
 }
 
-/* Expand ldio/stio form load-store instruction builtins.  */
+/* Expand ldio/stio and ldex/ldsex/stex/stsex form load-store
+   instruction builtins.  */
 static rtx
-nios2_expand_ldstio_builtin (tree exp, rtx target,
-			     const struct nios2_builtin_desc *d)
+nios2_expand_ldst_builtin (tree exp, rtx target,
+			   const struct nios2_builtin_desc *d)
 {
   bool has_target_p;
   rtx addr, mem, val;
@@ -3388,14 +3403,21 @@ nios2_expand_ldstio_builtin (tree exp, r
 
   if (insn_data[d->icode].operand[0].allows_mem)
     {
-      /* stxio.  */
+      /* stxio/stex/stsex.  */
       val = expand_normal (CALL_EXPR_ARG (exp, 1));
       if (CONST_INT_P (val))
 	val = force_reg (mode, gen_int_mode (INTVAL (val), mode));
       val = simplify_gen_subreg (mode, val, GET_MODE (val), 0);
       create_output_operand (&ops[0], mem, mode);
       create_input_operand (&ops[1], val, mode);
-      has_target_p = false;
+      if (insn_data[d->icode].n_operands == 3)
+	{
+	  /* stex/stsex status value, returned as result of function.  */
+	  create_output_operand (&ops[2], target, mode);
+	  has_target_p = true;
+	}
+      else
+	has_target_p = false;
     }
   else
     {
@@ -3404,7 +3426,8 @@ nios2_expand_ldstio_builtin (tree exp, r
       create_input_operand (&ops[1], mem, mode);
       has_target_p = true;
     }
-  return nios2_expand_builtin_insn (d, 2, ops, has_target_p);
+  return nios2_expand_builtin_insn (d, insn_data[d->icode].n_operands, ops,
+				    has_target_p);
 }
 
 /* Expand rdctl/wrctl builtins.  */
@@ -3436,6 +3459,81 @@ nios2_expand_rdwrctl_builtin (tree exp, 
   return nios2_expand_builtin_insn (d, 2, ops, has_target_p);
 }
 
+static rtx
+nios2_expand_rdprs_builtin (tree exp, rtx target,
+			    const struct nios2_builtin_desc *d)
+{
+  rtx reg = expand_normal (CALL_EXPR_ARG (exp, 0));
+  rtx imm = expand_normal (CALL_EXPR_ARG (exp, 1));
+  struct expand_operand ops[MAX_RECOG_OPERANDS];
+
+  if (!rdwrctl_operand (reg, VOIDmode))
+    {
+      error ("Register number must be in range 0-31 for %s",
+	     d->name);
+      return gen_reg_rtx (SImode);
+    }
+
+  if (!rdprs_dcache_operand (imm, VOIDmode))
+    {
+      error ("The immediate value must fit into a %d-bit integer for %s",
+	     (TARGET_ARCH_R2) ? 12 : 16, d->name);
+      return gen_reg_rtx (SImode);
+    }
+
+  create_output_operand (&ops[0], target, SImode);
+  create_input_operand (&ops[1], reg, SImode);
+  create_integer_operand (&ops[2], INTVAL (imm));
+
+  return nios2_expand_builtin_insn (d, 3, ops, true);
+}
+
+static rtx
+nios2_expand_cache_builtin (tree exp, rtx target ATTRIBUTE_UNUSED,
+			    const struct nios2_builtin_desc *d)
+{
+  rtx mem, addr;
+  struct expand_operand ops[MAX_RECOG_OPERANDS];
+
+  addr = expand_normal (CALL_EXPR_ARG (exp, 0));
+  mem = gen_rtx_MEM (SImode, addr);
+
+  create_input_operand (&ops[0], mem, SImode);
+ 
+  return nios2_expand_builtin_insn (d, 1, ops, false);
+}
+
+static rtx
+nios2_expand_wrpie_builtin (tree exp, rtx target,
+			    const struct nios2_builtin_desc *d)
+{
+  rtx val;
+  struct expand_operand ops[MAX_RECOG_OPERANDS];
+
+  val = expand_normal (CALL_EXPR_ARG (exp, 0));
+  create_input_operand (&ops[1], val, SImode);
+  create_output_operand (&ops[0], target, SImode);
+
+  return nios2_expand_builtin_insn (d, 2, ops, true);
+}
+
+static rtx
+nios2_expand_eni_builtin (tree exp, rtx target ATTRIBUTE_UNUSED,
+			  const struct nios2_builtin_desc *d)
+{
+  rtx imm = expand_normal (CALL_EXPR_ARG (exp, 0));
+  struct expand_operand ops[MAX_RECOG_OPERANDS];
+
+  if (!CONST_INT_P (imm) || (INTVAL (imm) != 0 && INTVAL (imm) != 1))
+    {
+      error ("the ENI instruction operand must be either 0 or 1");
+      return const0_rtx;
+    }
+  create_integer_operand (&ops[0], INTVAL (imm));
+
+  return nios2_expand_builtin_insn (d, 1, ops, false);
+}
+
 /* Implement TARGET_EXPAND_BUILTIN.  Expand an expression EXP that calls
    a built-in function, with result going to TARGET if that's convenient
    (and in mode MODE if that's convenient).
@@ -3454,6 +3552,14 @@ nios2_expand_builtin (tree exp, rtx targ
     {
       const struct nios2_builtin_desc *d = &nios2_builtins[fcode];
 
+      if (d->arch > nios2_arch_option)
+	{
+	  error ("built-in function %s requires Nios II R%d",
+		 d->name, (int) d->arch);
+	  /* Given it is invalid, just generate a normal call.  */
+	  return expand_call (exp, target, ignore);
+	}
+
       switch (fcode)
 	{
 	case NIOS2_BUILTIN_sync:
@@ -3468,12 +3574,29 @@ nios2_expand_builtin (tree exp, rtx targ
 	case NIOS2_BUILTIN_stbio:
 	case NIOS2_BUILTIN_sthio:
 	case NIOS2_BUILTIN_stwio:
-	  return nios2_expand_ldstio_builtin (exp, target, d);
+	case NIOS2_BUILTIN_ldex:
+	case NIOS2_BUILTIN_ldsex:
+	case NIOS2_BUILTIN_stex:
+	case NIOS2_BUILTIN_stsex:
+	  return nios2_expand_ldst_builtin (exp, target, d);
 
 	case NIOS2_BUILTIN_rdctl:
 	case NIOS2_BUILTIN_wrctl:
 	  return nios2_expand_rdwrctl_builtin (exp, target, d);
 
+	case NIOS2_BUILTIN_rdprs:
+	  return nios2_expand_rdprs_builtin (exp, target, d);
+
+	case NIOS2_BUILTIN_flushd:
+	case NIOS2_BUILTIN_flushda:
+	  return nios2_expand_cache_builtin (exp, target, d);
+
+	case NIOS2_BUILTIN_wrpie:
+	  return nios2_expand_wrpie_builtin (exp, target, d);
+
+	case NIOS2_BUILTIN_eni:
+	  return nios2_expand_eni_builtin (exp, target, d);
+
 	default:
 	  gcc_unreachable ();
 	}
Index: gcc/doc/extend.texi
===================================================================
--- gcc/doc/extend.texi	(revision 225798)
+++ gcc/doc/extend.texi	(working copy)
@@ -11045,7 +11045,16 @@ void __builtin_sthio (volatile void *, i
 void __builtin_stwio (volatile void *, int)
 void __builtin_sync (void)
 int __builtin_rdctl (int) 
+int __builtin_rdprs (int, int)
 void __builtin_wrctl (int, int)
+void __builtin_flushd (volatile void *)
+void __builtin_flushda (volatile void *)
+int __builtin_wrpie (int)
+void __builtin_eni (int)
+int __builtin_ldex (volatile const void *)
+int __builtin_stex (volatile void *, int)
+int __builtin_ldsex (volatile const void *)
+int __builtin_stsex (volatile void *, int)
 @end example
 
 The following built-in functions are always available.  They
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	(revision 225798)
+++ gcc/doc/md.texi	(working copy)
@@ -2999,6 +2999,14 @@ Matches immediates which are addresses i
 data section and therefore can be added to @code{gp}
 as a 16-bit immediate to re-create their 32-bit value.
 
+@item U
+Matches constants suitable as an operand for the rdprs and
+cache instructions.
+
+@item v
+A memory operand suitable for Nios II R2 load/store
+exclusive instructions.
+
 @item w
 A memory operand suitable for load/store IO and cache
 instructions.
Index: gcc/testsuite/gcc.target/nios2/nios2-flushd.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/nios2-flushd.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/nios2-flushd.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do assemble } */
+/* { dg-options "-O" } */
+
+void test_flushd (unsigned char* p1, unsigned char* p2)
+{
+  __builtin_flushd (p1);
+  __builtin_flushd (p2);
+  __builtin_flushd (p2 + 1);
+  __builtin_flushd (p2 + 2);
+  __builtin_flushd (p2 + 2047);
+  __builtin_flushd (p2 + 2048);
+}
+
+void test_flushda (unsigned char* p1, unsigned char* p2)
+{
+  __builtin_flushda (p1);
+  __builtin_flushda (p2);
+  __builtin_flushda (p2 + 1);
+  __builtin_flushda (p2 + 2);
+  __builtin_flushda (p2 + 2047);
+  __builtin_flushda (p2 + 2048);
+}
Index: gcc/testsuite/gcc.target/nios2/nios2-rdprs.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/nios2-rdprs.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/nios2-rdprs.c	(revision 0)
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-final { scan-assembler "rdprs" } } */
+
+int x ()
+{
+  __builtin_rdprs (3,934);
+  return 0;
+} 
Index: gcc/testsuite/gcc.target/nios2/r2-atomic.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/r2-atomic.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/r2-atomic.c	(revision 0)
@@ -0,0 +1,49 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -march=r2" } */
+
+int test_stex (unsigned char* p1, unsigned char* p2)
+{
+  int a, b, c, d;
+  a = __builtin_stex (p1, *p2);
+  b = __builtin_stex (p2, 0);
+  c = __builtin_stex (p2 + 1, 0x80);
+  d = __builtin_stex (p2 + 2, 0x7f);
+
+  return a + b + c + d;
+}
+
+int test_stsex (unsigned short* p1, unsigned short* p2)
+{
+  int a, b, c, d;
+  
+  a = __builtin_stsex (p1, *p2);
+  b = __builtin_stsex (p2, 0);
+  c = __builtin_stsex (p2 + 1, 0x8000);
+  d = __builtin_stsex (p2 + 2, 0x7fff);
+
+  return a + b + c + d;
+}
+
+int test_ldex (unsigned char* p1, unsigned char* p2)
+{
+  int a, b, c, d;
+  
+  a = __builtin_ldex (p1);
+  b = __builtin_ldex (p2);
+  c = __builtin_ldex (p2 + 1);
+  d = __builtin_ldex (p2 + 2);
+
+  return a + b + c + d;
+}
+
+int test_ldsex (unsigned char* p1, unsigned char* p2)
+{
+  int a, b, c, d;
+  
+  a = __builtin_ldsex (p1);
+  b = __builtin_ldsex (p2);
+  c = __builtin_ldsex (p2 + 1);
+  d = __builtin_ldsex (p2 + 2);
+
+  return a + b + c + d;
+}
Index: gcc/testsuite/gcc.target/nios2/r2-eni.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/r2-eni.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/r2-eni.c	(revision 0)
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=r2" } */
+/* { dg-final { scan-assembler "eni" } } */
+
+void
+foo (void)
+{
+  __builtin_eni (0);
+  __builtin_eni (1);
+}
Index: gcc/testsuite/gcc.target/nios2/r2-wrpie.c
===================================================================
--- gcc/testsuite/gcc.target/nios2/r2-wrpie.c	(revision 0)
+++ gcc/testsuite/gcc.target/nios2/r2-wrpie.c	(revision 0)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=r2" } */
+/* { dg-final { scan-assembler "wrpie" } } */
+
+int
+foo (int a)
+{
+  int b;
+
+  b = __builtin_wrpie (a);
+  a = __builtin_wrpie (b);
+
+  return a + b;
+}
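
The architecture gate added to nios2_expand_builtin (the
"d->arch > nios2_arch_option" check) can be pictured in isolation.
What follows is a minimal standalone sketch, not GCC code.  The names
arch_level, builtin_desc, find_builtin, builtin_available and
count_unavailable are hypothetical stand-ins, and the per-builtin
levels in the table are only inferred from which of the new tests
pass -march=r2.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the port's architecture enum.  The values are
   chosen so that a plain integer comparison orders R1 before R2, which is
   what the "d->arch > nios2_arch_option" test in the patch relies on.  */
enum arch_level { ARCH_R1 = 1, ARCH_R2 = 2 };

/* Hypothetical, much-reduced analogue of struct nios2_builtin_desc.  */
struct builtin_desc
{
  const char *name;
  enum arch_level arch;		/* Minimum architecture level required.  */
};

/* Assumed levels: the tests for wrpie, eni and ldex use -march=r2,
   the tests for rdprs and flushd do not.  */
static const struct builtin_desc builtins[] = {
  { "rdprs",  ARCH_R1 },
  { "flushd", ARCH_R1 },
  { "wrpie",  ARCH_R2 },
  { "eni",    ARCH_R2 },
  { "ldex",   ARCH_R2 },
};

#define N_BUILTINS (sizeof builtins / sizeof builtins[0])

/* Look up a descriptor by name; return NULL if there is none.  */
static const struct builtin_desc *
find_builtin (const char *name)
{
  for (size_t i = 0; i < N_BUILTINS; i++)
    if (strcmp (builtins[i].name, name) == 0)
      return &builtins[i];
  return NULL;
}

/* Return nonzero if D may be expanded when compiling for SELECTED.
   This is the negation of the rejection test in the patch.  */
static int
builtin_available (const struct builtin_desc *d, enum arch_level selected)
{
  return d->arch <= selected;
}

/* Count the table entries that would be rejected for SELECTED.  */
static size_t
count_unavailable (enum arch_level selected)
{
  size_t n = 0;
  for (size_t i = 0; i < N_BUILTINS; i++)
    if (!builtin_available (&builtins[i], selected))
      n++;
  return n;
}
```

In this sketch, selecting ARCH_R1 rejects the three entries marked
ARCH_R2, and selecting ARCH_R2 rejects none.  In the patch itself a
rejected builtin produces an error and is then expanded as an
ordinary call.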

Thread overview: 8+ messages
2015-07-14 22:29 [nios2] [0/7] Support for Nios II R2 Sandra Loosemore
2015-07-14 22:35 ` [nios2] [1/7] Add -march=, -mbmx, -mcdx flags Sandra Loosemore
2015-07-14 23:01 ` [nios2] [2/7] Adjust for reduced offsets in R2 load/store IO insns Sandra Loosemore
2015-07-14 23:18 ` [nios2] [3/7] Correct nested function trampolines for R2 encodings Sandra Loosemore
2015-07-14 23:29 ` [nios2] [4/7] Support new R2 instructions Sandra Loosemore
2015-07-14 23:33 ` [nios2] [5/7] Support R2 CDX load/store multiple instructions Sandra Loosemore
2015-07-14 23:49 ` [nios2] [6/7] Update function prologues/epilogues for R2 CDX Sandra Loosemore
2015-07-15  0:03 ` [nios2] [7/7] Add new intrinsics Sandra Loosemore
