public inbox for gcc-patches@gcc.gnu.org
* ARM ldm/stm peepholes
@ 2010-04-20 11:37 Bernd Schmidt
  2010-04-23  8:24 ` Ramana Radhakrishnan
  2010-07-07 11:57 ` Resubmit/Ping^n " Bernd Schmidt
  0 siblings, 2 replies; 13+ messages in thread
From: Bernd Schmidt @ 2010-04-20 11:37 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2072 bytes --]

This is a new version of the ldm/stm peepholes patch I posted earlier as a
work in progress.  It is now in a state where I think it's an improvement
over the existing code and suitable for review.

The motivation here is PR40457, which notes that we cannot convert
multiple ldr/str instructions into ldm/stm for Thumb-1.  To do that, we
need to be able to show that the base register dies, as Thumb-1 only
supports the updating version of these instructions.  This means we have
to use peephole2 instead of peephole.

In addition to converting the existing peepholes, I've also added new
classes which try to optimize special situations requested in the PR.

1. Constant to memory moves: if all the input registers of an stm are
dead after the instruction, it doesn't much matter which value goes into
which register.  This can also use peephole2's ability to allocate free
registers:
        mov     ip, #1
-       str     ip, [sp, #0]
-       mov     ip, #0
-       str     ip, [sp, #4]
+       mov     lr, #0
+       stmia   sp, {ip, lr}

2. When loading registers for use in a commutative operation, their
order also doesn't matter if they are dead afterwards.
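As an illustration (register choices here are hypothetical, in the same
spirit as the example above), two loads feeding an add can be combined even
though the registers are loaded in descending order, since swapping the
operands of a commutative operation makes no difference when both inputs
die at the operation:
-       ldr     r1, [sp, #0]
-       ldr     r0, [sp, #4]
+       ldmia   sp, {r0, r1}
        add     r2, r0, r1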

I don't know whether to include the generator program as documentation
or as the master copy, or whether to leave it out altogether.  It might
make sense to keep it if we want to increase the limit of 4 instructions
per pattern.
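For reference, a generated two-register load peephole would look roughly
like the following sketch (not the exact generator output; it assumes the
gen_ldm_seq interface from this patch):

(define_peephole2
  [(set (match_operand:SI 0 "s_register_operand" "")
        (match_operand:SI 2 "memory_operand" ""))
   (set (match_operand:SI 1 "s_register_operand" "")
        (match_operand:SI 3 "memory_operand" ""))]
  ""
  [(const_int 0)]
{
  if (gen_ldm_seq (operands, 2, false))
    DONE;
  FAIL;
})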

The patch may need more tuning to decide when to use these instructions;
I don't think I have enough information available to make these decisions
myself, so suggestions are welcome.  For now, it uses the same heuristics
as before, but may trigger in slightly more situations.

I've tested this quite often with arm-none-linux-gnueabi QEMU testsuite
runs.  Yesterday, after fixing an aliasing issue, I managed to get a
successful SPEC2000 run on a Cortex-A9 board (with a 4.4-based compiler;
overall performance minimally higher than before); since then I've only
made cleanups and added comments.  Once approved, I'll rerun all tests
before committing.


Bernd

[-- Attachment #2: ldmstm-v6e.diff --]
[-- Type: text/plain, Size: 119739 bytes --]

	PR target/40457
	* config/arm/arm.h (MAX_LDM_STM_OPS): New macro.
	(arm_regs_in_sequence): Declare.
	* config/arm/arm-protos.h (emit_ldm_seq, emit_stm_seq,
	load_multiple_sequence, store_multiple_sequence): Delete
	declarations.
	(arm_gen_load_multiple, arm_gen_store_multiple): Adjust
	prototypes.
	* config/arm/ldmstm.md: New file.
	* config/arm/arm.c (arm_regs_in_sequence): New array.
	(compute_offset_order): New static function.
	(load_multiple_sequence): Now static.  New args SAVED_ORDER,
	CHECK_REGS.  All callers changed.
	Replace constant 4 with MAX_LDM_STM_OPS throughout.
	Use compute_offset_order.  If SAVED_ORDER is nonnull, copy the computed
	order into it.  If CHECK_REGS is false, don't sort REGS.
	Handle Thumb mode.
	(store_multiple_sequence): Now static.  New args NOPS_TOTAL,
	SAVED_ORDER, REG_RTXS and CHECK_REGS.  All callers changed.
	Replace constant 4 with MAX_LDM_STM_OPS throughout.
	Use compute_offset_order.  If SAVED_ORDER is nonnull, copy the computed
	order into it.  If CHECK_REGS is false, don't sort REGS.  Set up
	REG_RTXS just like REGS.
	Handle Thumb mode.
	(arm_gen_load_multiple_1): New function, broken out of
	arm_gen_load_multiple.
	(arm_gen_store_multiple_1): New function, broken out of
	arm_gen_store_multiple.
	(arm_gen_multiple_op): New function, with code from
	arm_gen_load_multiple and arm_gen_store_multiple moved here.
	(arm_gen_load_multiple, arm_gen_store_multiple): Now just
	wrappers around arm_gen_multiple_op.  Remove argument UP, all callers
	changed.
	(gen_ldm_seq, gen_stm_seq, gen_const_stm_seq): New functions.
	* config/arm/predicates.md (commutative_binary_operator): New.
	(load_multiple_operation, store_multiple_operation): Handle more
	variants of these patterns with different starting offsets.  Handle
	Thumb-1.
	* config/arm/arm.md: Include "ldmstm.md".
	(ldmsi_postinc4, ldmsi_postinc4_thumb1, ldmsi_postinc3, ldmsi_postinc2,
	ldmsi4, ldmsi3, ldmsi2, stmsi_postinc4, stmsi_postinc4_thumb1,
	stmsi_postinc3, stmsi_postinc2, stmsi4, stmsi3, stmsi2 and related
	peepholes): Delete.
	* config/arm/arm-ldmstm.ml: New file.

Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c	(revision 158513)
+++ config/arm/arm.c	(working copy)
@@ -724,6 +724,12 @@ static const char * const arm_condition_
   "hi", "ls", "ge", "lt", "gt", "le", "al", "nv"
 };
 
+/* The register numbers in sequence, for passing to arm_gen_load_multiple.  */
+int arm_regs_in_sequence[] =
+{
+  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
+};
+
 #define ARM_LSL_NAME (TARGET_UNIFIED_ASM ? "lsl" : "asl")
 #define streq(string1, string2) (strcmp (string1, string2) == 0)
 
@@ -9074,21 +9080,76 @@ adjacent_mem_locations (rtx a, rtx b)
   return 0;
 }
 
-int
-load_multiple_sequence (rtx *operands, int nops, int *regs, int *base,
-			HOST_WIDE_INT *load_offset)
+/* Subroutine of load_multiple_sequence and store_multiple_sequence.
+   Given an array of UNSORTED_OFFSETS, of which there are NOPS, compute
+   an array ORDER which describes the sequence to use when accessing the
+   offsets that produces an ascending order.  In this sequence, each
+   offset must be larger by exactly 4 than the previous one.  ORDER[0]
+   must have been filled in with the lowest offset by the caller.
+   If UNSORTED_REGS is nonnull, it is an array of register numbers that
+   we use to verify that ORDER produces an ascending order of registers.
+   Return true if it was possible to construct such an order, false if
+   not.  */
+
+static bool
+compute_offset_order (int nops, HOST_WIDE_INT *unsorted_offsets, int *order,
+		      int *unsorted_regs)
 {
-  int unsorted_regs[4];
-  HOST_WIDE_INT unsorted_offsets[4];
-  int order[4];
+  int i;
+  for (i = 1; i < nops; i++)
+    {
+      int j;
+
+      order[i] = order[i - 1];
+      for (j = 0; j < nops; j++)
+	if (unsorted_offsets[j] == unsorted_offsets[order[i - 1]] + 4)
+	  {
+	    /* We must find exactly one offset that is higher than the
+	       previous one by 4.  */
+	    if (order[i] != order[i - 1])
+	      return false;
+	    order[i] = j;
+	  }
+      if (order[i] == order[i - 1])
+	return false;
+      /* The register numbers must be ascending.  */
+      if (unsorted_regs != NULL
+	  && unsorted_regs[order[i]] <= unsorted_regs[order[i - 1]])
+	return false;
+    }
+  return true;
+}
+
+/* Used to determine in a peephole whether a sequence of load instructions can
+   be changed into a load-multiple instruction.
+   NOPS is the number of separate load instructions we are examining.
+   The first NOPS entries in OPERANDS are the destination registers, the next
+   NOPS entries are memory operands.  If this function is successful, *BASE is
+   set to the common base register of the memory accesses; *LOAD_OFFSET is set
+   to the first memory location's offset from that base register.  REGS is an
+   array filled in with the destination register numbers.
+   SAVED_ORDER (if nonnull) is an array filled in with an order that maps insn
+   numbers to an ascending order of loads.
+   If CHECK_REGS is true, the sequence of registers in REGS matches the loads
+   from ascending memory locations, and the function verifies that the register
+   numbers are themselves ascending.  If CHECK_REGS is false, the register
+   numbers are stored in the order they are found in the operands.  */
+static int
+load_multiple_sequence (rtx *operands, int nops, int *regs, int *saved_order,
+			int *base, HOST_WIDE_INT *load_offset, bool check_regs)
+{
+  int unsorted_regs[MAX_LDM_STM_OPS];
+  HOST_WIDE_INT unsorted_offsets[MAX_LDM_STM_OPS];
+  int order[MAX_LDM_STM_OPS];
+  rtx base_reg_rtx;
   int base_reg = -1;
   int i;
 
-  /* Can only handle 2, 3, or 4 insns at present,
+  /* Can only handle between 2 and MAX_LDM_STM_OPS insns at present,
      though could be easily extended if required.  */
-  gcc_assert (nops >= 2 && nops <= 4);
+  gcc_assert (nops >= 2 && nops <= MAX_LDM_STM_OPS);
 
-  memset (order, 0, 4 * sizeof (int));
+  memset (order, 0, MAX_LDM_STM_OPS * sizeof (int));
 
   /* Loop over the operands and check that the memory references are
      suitable (i.e. immediate offsets from the same base register).  At
@@ -9126,32 +9187,32 @@ load_multiple_sequence (rtx *operands, i
 	  if (i == 0)
 	    {
 	      base_reg = REGNO (reg);
-	      unsorted_regs[0] = (GET_CODE (operands[i]) == REG
-				  ? REGNO (operands[i])
-				  : REGNO (SUBREG_REG (operands[i])));
-	      order[0] = 0;
+	      base_reg_rtx = reg;
+	      if (TARGET_THUMB1 && base_reg > LAST_LO_REGNUM)
+		return 0;
 	    }
 	  else
 	    {
 	      if (base_reg != (int) REGNO (reg))
 		/* Not addressed from the same base register.  */
 		return 0;
-
-	      unsorted_regs[i] = (GET_CODE (operands[i]) == REG
-				  ? REGNO (operands[i])
-				  : REGNO (SUBREG_REG (operands[i])));
-	      if (unsorted_regs[i] < unsorted_regs[order[0]])
-		order[0] = i;
 	    }
+	  unsorted_regs[i] = (GET_CODE (operands[i]) == REG
+			      ? REGNO (operands[i])
+			      : REGNO (SUBREG_REG (operands[i])));
 
 	  /* If it isn't an integer register, or if it overwrites the
 	     base register but isn't the last insn in the list, then
 	     we can't do this.  */
-	  if (unsorted_regs[i] < 0 || unsorted_regs[i] > 14
+	  if (unsorted_regs[i] < 0
+	      || (TARGET_THUMB1 && unsorted_regs[i] > LAST_LO_REGNUM)
+	      || unsorted_regs[i] > 14
 	      || (i != nops - 1 && unsorted_regs[i] == base_reg))
 	    return 0;
 
 	  unsorted_offsets[i] = INTVAL (offset);
+	  if (i == 0 || unsorted_offsets[i] < unsorted_offsets[order[0]])
+	    order[0] = i;
 	}
       else
 	/* Not a suitable memory address.  */
@@ -9160,41 +9221,31 @@ load_multiple_sequence (rtx *operands, i
 
   /* All the useful information has now been extracted from the
      operands into unsorted_regs and unsorted_offsets; additionally,
-     order[0] has been set to the lowest numbered register in the
-     list.  Sort the registers into order, and check that the memory
-     offsets are ascending and adjacent.  */
-
-  for (i = 1; i < nops; i++)
-    {
-      int j;
-
-      order[i] = order[i - 1];
-      for (j = 0; j < nops; j++)
-	if (unsorted_regs[j] > unsorted_regs[order[i - 1]]
-	    && (order[i] == order[i - 1]
-		|| unsorted_regs[j] < unsorted_regs[order[i]]))
-	  order[i] = j;
+     order[0] has been set to the lowest offset in the list.  Sort
+     the offsets into order, verifying that they are adjacent, and
+     check that the register numbers are ascending.  */
+  if (!compute_offset_order (nops, unsorted_offsets, order,
+			     check_regs ? unsorted_regs : NULL))
+    return 0;
 
-      /* Have we found a suitable register? if not, one must be used more
-	 than once.  */
-      if (order[i] == order[i - 1])
-	return 0;
 
-      /* Is the memory address adjacent and ascending? */
-      if (unsorted_offsets[order[i]] != unsorted_offsets[order[i - 1]] + 4)
-	return 0;
-    }
+  if (saved_order)
+    memcpy (saved_order, order, sizeof order);
 
   if (base)
     {
       *base = base_reg;
 
       for (i = 0; i < nops; i++)
-	regs[i] = unsorted_regs[order[i]];
+	regs[i] = unsorted_regs[check_regs ? order[i] : i];
 
       *load_offset = unsorted_offsets[order[0]];
     }
 
+  if (TARGET_THUMB1
+      && !peep2_reg_dead_p (nops, base_reg_rtx))
+    return 0;
+
   if (unsorted_offsets[order[0]] == 0)
     return 1; /* ldmia */
 
@@ -9204,7 +9255,7 @@ load_multiple_sequence (rtx *operands, i
   if (TARGET_ARM && unsorted_offsets[order[nops - 1]] == 0)
     return 3; /* ldmda */
 
-  if (unsorted_offsets[order[nops - 1]] == -4)
+  if (TARGET_32BIT && unsorted_offsets[order[nops - 1]] == -4)
     return 4; /* ldmdb */
 
   /* For ARM8,9 & StrongARM, 2 ldr instructions are faster than an ldm
@@ -9245,79 +9296,42 @@ load_multiple_sequence (rtx *operands, i
 	  || const_ok_for_arm (-unsorted_offsets[order[0]])) ? 5 : 0;
 }
 
-const char *
-emit_ldm_seq (rtx *operands, int nops)
-{
-  int regs[4];
-  int base_reg;
-  HOST_WIDE_INT offset;
-  char buf[100];
-  int i;
-
-  switch (load_multiple_sequence (operands, nops, regs, &base_reg, &offset))
-    {
-    case 1:
-      strcpy (buf, "ldm%(ia%)\t");
-      break;
-
-    case 2:
-      strcpy (buf, "ldm%(ib%)\t");
-      break;
-
-    case 3:
-      strcpy (buf, "ldm%(da%)\t");
-      break;
-
-    case 4:
-      strcpy (buf, "ldm%(db%)\t");
-      break;
-
-    case 5:
-      if (offset >= 0)
-	sprintf (buf, "add%%?\t%s%s, %s%s, #%ld", REGISTER_PREFIX,
-		 reg_names[regs[0]], REGISTER_PREFIX, reg_names[base_reg],
-		 (long) offset);
-      else
-	sprintf (buf, "sub%%?\t%s%s, %s%s, #%ld", REGISTER_PREFIX,
-		 reg_names[regs[0]], REGISTER_PREFIX, reg_names[base_reg],
-		 (long) -offset);
-      output_asm_insn (buf, operands);
-      base_reg = regs[0];
-      strcpy (buf, "ldm%(ia%)\t");
-      break;
-
-    default:
-      gcc_unreachable ();
-    }
-
-  sprintf (buf + strlen (buf), "%s%s, {%s%s", REGISTER_PREFIX,
-	   reg_names[base_reg], REGISTER_PREFIX, reg_names[regs[0]]);
-
-  for (i = 1; i < nops; i++)
-    sprintf (buf + strlen (buf), ", %s%s", REGISTER_PREFIX,
-	     reg_names[regs[i]]);
-
-  strcat (buf, "}\t%@ phole ldm");
-
-  output_asm_insn (buf, operands);
-  return "";
-}
+/* Used to determine in a peephole whether a sequence of store instructions can
+   be changed into a store-multiple instruction.
+   NOPS is the number of separate store instructions we are examining.
+   NOPS_TOTAL is the total number of instructions recognized by the peephole
+   pattern.
+   The first NOPS entries in OPERANDS are the source registers, the next
+   NOPS entries are memory operands.  If this function is successful, *BASE is
+   set to the common base register of the memory accesses; *LOAD_OFFSET is set
+   to the first memory location's offset from that base register.  REGS is an
+   array filled in with the source register numbers, REG_RTXS (if nonnull) is
+   likewise filled with the corresponding rtx's.
+   SAVED_ORDER (if nonnull) is an array filled in with an order that maps insn
+   numbers to an ascending order of stores.
+   If CHECK_REGS is true, the sequence of registers in *REGS matches the stores
+   from ascending memory locations, and the function verifies that the register
+   numbers are themselves ascending.  If CHECK_REGS is false, the register
+   numbers are stored in the order they are found in the operands.  */
 
-int
-store_multiple_sequence (rtx *operands, int nops, int *regs, int *base,
-			 HOST_WIDE_INT * load_offset)
+static int
+store_multiple_sequence (rtx *operands, int nops, int nops_total,
+			 int *regs, rtx *reg_rtxs, int *saved_order, int *base,
+			 HOST_WIDE_INT *load_offset, bool check_regs)
 {
-  int unsorted_regs[4];
-  HOST_WIDE_INT unsorted_offsets[4];
-  int order[4];
+  int unsorted_regs[MAX_LDM_STM_OPS];
+  rtx unsorted_reg_rtxs[MAX_LDM_STM_OPS];
+  HOST_WIDE_INT unsorted_offsets[MAX_LDM_STM_OPS];
+  int order[MAX_LDM_STM_OPS];
   int base_reg = -1;
+  rtx base_reg_rtx;
   int i;
 
-  /* Can only handle 2, 3, or 4 insns at present, though could be easily
-     extended if required.  */
-  gcc_assert (nops >= 2 && nops <= 4);
+  /* Can only handle between 2 and MAX_LDM_STM_OPS insns at present,
+     though could be easily extended if required.  */
+  gcc_assert (nops >= 2 && nops <= MAX_LDM_STM_OPS);
 
-  memset (order, 0, 4 * sizeof (int));
+  memset (order, 0, MAX_LDM_STM_OPS * sizeof (int));
 
   /* Loop over the operands and check that the memory references are
      suitable (i.e. immediate offsets from the same base register).  At
@@ -9355,29 +9369,31 @@ store_multiple_sequence (rtx *operands, 
 	  if (i == 0)
 	    {
 	      base_reg = REGNO (reg);
-	      unsorted_regs[0] = (GET_CODE (operands[i]) == REG
-				  ? REGNO (operands[i])
-				  : REGNO (SUBREG_REG (operands[i])));
-	      order[0] = 0;
+	      base_reg_rtx = reg;
+	      if (TARGET_THUMB1 && base_reg > LAST_LO_REGNUM)
+		return 0;
 	    }
 	  else
 	    {
 	      if (base_reg != (int) REGNO (reg))
 		/* Not addressed from the same base register.  */
 		return 0;
-
-	      unsorted_regs[i] = (GET_CODE (operands[i]) == REG
-				  ? REGNO (operands[i])
-				  : REGNO (SUBREG_REG (operands[i])));
-	      if (unsorted_regs[i] < unsorted_regs[order[0]])
-		order[0] = i;
 	    }
+	  unsorted_reg_rtxs[i] = (GET_CODE (operands[i]) == REG
+				  ? operands[i] : SUBREG_REG (operands[i]));
+	  unsorted_regs[i] = REGNO (unsorted_reg_rtxs[i]);
 
 	  /* If it isn't an integer register, then we can't do this.  */
-	  if (unsorted_regs[i] < 0 || unsorted_regs[i] > 14)
+	  if (unsorted_regs[i] < 0
+	      || (TARGET_THUMB1 && unsorted_regs[i] > LAST_LO_REGNUM)
+	      || (TARGET_THUMB2 && unsorted_regs[i] == base_reg)
+	      || (TARGET_THUMB2 && unsorted_regs[i] == SP_REGNUM)
+	      || unsorted_regs[i] > 14)
 	    return 0;
 
 	  unsorted_offsets[i] = INTVAL (offset);
+	  if (i == 0 || unsorted_offsets[i] < unsorted_offsets[order[0]])
+	    order[0] = i;
 	}
       else
 	/* Not a suitable memory address.  */
@@ -9386,111 +9402,71 @@ store_multiple_sequence (rtx *operands, 
 
   /* All the useful information has now been extracted from the
      operands into unsorted_regs and unsorted_offsets; additionally,
-     order[0] has been set to the lowest numbered register in the
-     list.  Sort the registers into order, and check that the memory
-     offsets are ascending and adjacent.  */
-
-  for (i = 1; i < nops; i++)
-    {
-      int j;
-
-      order[i] = order[i - 1];
-      for (j = 0; j < nops; j++)
-	if (unsorted_regs[j] > unsorted_regs[order[i - 1]]
-	    && (order[i] == order[i - 1]
-		|| unsorted_regs[j] < unsorted_regs[order[i]]))
-	  order[i] = j;
+     order[0] has been set to the lowest offset in the list.  Sort
+     the offsets into order, verifying that they are adjacent, and
+     check that the register numbers are ascending.  */
+  if (!compute_offset_order (nops, unsorted_offsets, order,
+			     check_regs ? unsorted_regs : NULL))
+    return 0;
 
-      /* Have we found a suitable register? if not, one must be used more
-	 than once.  */
-      if (order[i] == order[i - 1])
-	return 0;
 
-      /* Is the memory address adjacent and ascending? */
-      if (unsorted_offsets[order[i]] != unsorted_offsets[order[i - 1]] + 4)
-	return 0;
-    }
+  if (saved_order)
+    memcpy (saved_order, order, sizeof order);
 
   if (base)
     {
       *base = base_reg;
 
       for (i = 0; i < nops; i++)
-	regs[i] = unsorted_regs[order[i]];
+	{
+	  regs[i] = unsorted_regs[check_regs ? order[i] : i];
+	  if (reg_rtxs)
+	    reg_rtxs[i] = unsorted_reg_rtxs[check_regs ? order[i] : i];
+	}
 
       *load_offset = unsorted_offsets[order[0]];
     }
 
+  if (TARGET_THUMB1
+      && !peep2_reg_dead_p (nops_total, base_reg_rtx))
+    return 0;
+
   if (unsorted_offsets[order[0]] == 0)
     return 1; /* stmia */
 
-  if (unsorted_offsets[order[0]] == 4)
+  if (TARGET_ARM && unsorted_offsets[order[0]] == 4)
     return 2; /* stmib */
 
-  if (unsorted_offsets[order[nops - 1]] == 0)
+  if (TARGET_ARM && unsorted_offsets[order[nops - 1]] == 0)
     return 3; /* stmda */
 
-  if (unsorted_offsets[order[nops - 1]] == -4)
+  if (TARGET_ARM && unsorted_offsets[order[nops - 1]] == -4)
     return 4; /* stmdb */
 
-  return 0;
-}
-
-const char *
-emit_stm_seq (rtx *operands, int nops)
-{
-  int regs[4];
-  int base_reg;
-  HOST_WIDE_INT offset;
-  char buf[100];
-  int i;
-
-  switch (store_multiple_sequence (operands, nops, regs, &base_reg, &offset))
-    {
-    case 1:
-      strcpy (buf, "stm%(ia%)\t");
-      break;
-
-    case 2:
-      strcpy (buf, "stm%(ib%)\t");
-      break;
-
-    case 3:
-      strcpy (buf, "stm%(da%)\t");
-      break;
-
-    case 4:
-      strcpy (buf, "stm%(db%)\t");
-      break;
-
-    default:
-      gcc_unreachable ();
-    }
-
-  sprintf (buf + strlen (buf), "%s%s, {%s%s", REGISTER_PREFIX,
-	   reg_names[base_reg], REGISTER_PREFIX, reg_names[regs[0]]);
-
-  for (i = 1; i < nops; i++)
-    sprintf (buf + strlen (buf), ", %s%s", REGISTER_PREFIX,
-	     reg_names[regs[i]]);
-
-  strcat (buf, "}\t%@ phole stm");
+  if (!peep2_reg_dead_p (nops_total, base_reg_rtx) || nops == 2)
+    return 0;
 
-  output_asm_insn (buf, operands);
-  return "";
+  /* Can't do it without setting up the offset, only do this if it takes
+     no more than one insn.  */
+  return (const_ok_for_arm (unsorted_offsets[order[0]])
+	  || const_ok_for_arm (-unsorted_offsets[order[0]])) ? 5 : 0;
+  return 0;
 }
 \f
 /* Routines for use in generating RTL.  */
 
-rtx
-arm_gen_load_multiple (int base_regno, int count, rtx from, int up,
-		       int write_back, rtx basemem, HOST_WIDE_INT *offsetp)
+/* Generate a load-multiple instruction.  COUNT is the number of loads in
+   the instruction; REGS and MEMS are arrays containing the operands.
+   BASEREG is the base register to be used in addressing the memory operands.
+   WBACK_OFFSET is nonzero if the instruction should update the base
+   register.  */
+
+static rtx
+arm_gen_load_multiple_1 (int count, int *regs, rtx *mems, rtx basereg,
+			 HOST_WIDE_INT wback_offset)
 {
-  HOST_WIDE_INT offset = *offsetp;
   int i = 0, j;
   rtx result;
-  int sign = up ? 1 : -1;
-  rtx mem, addr;
 
   /* XScale has load-store double instructions, but they have stricter
      alignment requirements than load-store multiple, so we cannot
@@ -9527,18 +9503,10 @@ arm_gen_load_multiple (int base_regno, i
       start_sequence ();
 
       for (i = 0; i < count; i++)
-	{
-	  addr = plus_constant (from, i * 4 * sign);
-	  mem = adjust_automodify_address (basemem, SImode, addr, offset);
-	  emit_move_insn (gen_rtx_REG (SImode, base_regno + i), mem);
-	  offset += 4 * sign;
-	}
+	emit_move_insn (gen_rtx_REG (SImode, regs[i]), mems[i]);
 
-      if (write_back)
-	{
-	  emit_move_insn (from, plus_constant (from, count * 4 * sign));
-	  *offsetp = offset;
-	}
+      if (wback_offset != 0)
+	emit_move_insn (basereg, plus_constant (basereg, wback_offset));
 
       seq = get_insns ();
       end_sequence ();
@@ -9547,41 +9515,40 @@ arm_gen_load_multiple (int base_regno, i
     }
 
   result = gen_rtx_PARALLEL (VOIDmode,
-			     rtvec_alloc (count + (write_back ? 1 : 0)));
-  if (write_back)
+			     rtvec_alloc (count + (wback_offset != 0 ? 1 : 0)));
+  if (wback_offset != 0)
     {
       XVECEXP (result, 0, 0)
-	= gen_rtx_SET (VOIDmode, from, plus_constant (from, count * 4 * sign));
+	= gen_rtx_SET (VOIDmode, basereg,
+		       plus_constant (basereg, wback_offset));
       i = 1;
       count++;
     }
 
   for (j = 0; i < count; i++, j++)
-    {
-      addr = plus_constant (from, j * 4 * sign);
-      mem = adjust_automodify_address_nv (basemem, SImode, addr, offset);
-      XVECEXP (result, 0, i)
-	= gen_rtx_SET (VOIDmode, gen_rtx_REG (SImode, base_regno + j), mem);
-      offset += 4 * sign;
-    }
-
-  if (write_back)
-    *offsetp = offset;
+    XVECEXP (result, 0, i)
+      = gen_rtx_SET (VOIDmode, gen_rtx_REG (SImode, regs[j]), mems[j]);
 
   return result;
 }
 
-rtx
-arm_gen_store_multiple (int base_regno, int count, rtx to, int up,
-			int write_back, rtx basemem, HOST_WIDE_INT *offsetp)
+/* Generate a store-multiple instruction.  COUNT is the number of stores in
+   the instruction; REGS and MEMS are arrays containing the operands.
+   BASEREG is the base register to be used in addressing the memory operands.
+   WBACK_OFFSET is nonzero if the instruction should update the base
+   register.  */
+
+static rtx
+arm_gen_store_multiple_1 (int count, int *regs, rtx *mems, rtx basereg,
+			  HOST_WIDE_INT wback_offset)
 {
-  HOST_WIDE_INT offset = *offsetp;
   int i = 0, j;
   rtx result;
-  int sign = up ? 1 : -1;
-  rtx mem, addr;
 
-  /* See arm_gen_load_multiple for discussion of
+  if (GET_CODE (basereg) == PLUS)
+    basereg = XEXP (basereg, 0);
+
+  /* See arm_gen_load_multiple_1 for discussion of
      the pros/cons of ldm/stm usage for XScale.  */
   if (arm_tune_xscale && count <= 2 && ! optimize_size)
     {
@@ -9590,18 +9557,10 @@ arm_gen_store_multiple (int base_regno, 
       start_sequence ();
 
       for (i = 0; i < count; i++)
-	{
-	  addr = plus_constant (to, i * 4 * sign);
-	  mem = adjust_automodify_address (basemem, SImode, addr, offset);
-	  emit_move_insn (mem, gen_rtx_REG (SImode, base_regno + i));
-	  offset += 4 * sign;
-	}
+	emit_move_insn (mems[i], gen_rtx_REG (SImode, regs[i]));
 
-      if (write_back)
-	{
-	  emit_move_insn (to, plus_constant (to, count * 4 * sign));
-	  *offsetp = offset;
-	}
+      if (wback_offset != 0)
+	emit_move_insn (basereg, plus_constant (basereg, wback_offset));
 
       seq = get_insns ();
       end_sequence ();
@@ -9610,29 +9569,319 @@ arm_gen_store_multiple (int base_regno, 
     }
 
   result = gen_rtx_PARALLEL (VOIDmode,
-			     rtvec_alloc (count + (write_back ? 1 : 0)));
-  if (write_back)
+			     rtvec_alloc (count + (wback_offset != 0 ? 1 : 0)));
+  if (wback_offset != 0)
     {
       XVECEXP (result, 0, 0)
-	= gen_rtx_SET (VOIDmode, to,
-		       plus_constant (to, count * 4 * sign));
+	= gen_rtx_SET (VOIDmode, basereg,
+		       plus_constant (basereg, wback_offset));
       i = 1;
       count++;
     }
 
   for (j = 0; i < count; i++, j++)
+    XVECEXP (result, 0, i)
+      = gen_rtx_SET (VOIDmode, mems[j], gen_rtx_REG (SImode, regs[j]));
+
+  return result;
+}
+
+/* Generate either a load-multiple or a store-multiple instruction.  This
+   function can be used in situations where we can start with a single MEM
+   rtx and adjust its address upwards.
+   COUNT is the number of operations in the instruction, not counting a
+   possible update of the base register.  REGS is an array containing the
+   register operands.
+   BASEREG is the base register to be used in addressing the memory operands,
+   which are constructed from BASEMEM.
+   WRITE_BACK specifies whether the generated instruction should include an
+   update of the base register.
+   OFFSETP is used to pass an offset to and from this function; this offset
+   is not used when constructing the address (instead BASEMEM should have an
+   appropriate offset in its address), it is used only for setting
+   MEM_OFFSET.  It is updated only if WRITE_BACK is true.  */
+
+static rtx
+arm_gen_multiple_op (bool is_load, int *regs, int count, rtx basereg,
+		     bool write_back, rtx basemem, HOST_WIDE_INT *offsetp)
+{
+  rtx mems[MAX_LDM_STM_OPS];
+  HOST_WIDE_INT offset = *offsetp;
+  int i;
+
+  gcc_assert (count <= MAX_LDM_STM_OPS);
+
+  if (GET_CODE (basereg) == PLUS)
+    basereg = XEXP (basereg, 0);
+
+  for (i = 0; i < count; i++)
     {
-      addr = plus_constant (to, j * 4 * sign);
-      mem = adjust_automodify_address_nv (basemem, SImode, addr, offset);
-      XVECEXP (result, 0, i)
-	= gen_rtx_SET (VOIDmode, mem, gen_rtx_REG (SImode, base_regno + j));
-      offset += 4 * sign;
+      rtx addr = plus_constant (basereg, i * 4);
+      mems[i] = adjust_automodify_address_nv (basemem, SImode, addr, offset);
+      offset += 4;
     }
 
   if (write_back)
     *offsetp = offset;
 
-  return result;
+  if (is_load)
+    return arm_gen_load_multiple_1 (count, regs, mems, basereg,
+				    write_back ? 4 * count : 0);
+  else
+    return arm_gen_store_multiple_1 (count, regs, mems, basereg,
+				     write_back ? 4 * count : 0);
+}
+
+rtx
+arm_gen_load_multiple (int *regs, int count, rtx basereg, int write_back,
+		       rtx basemem, HOST_WIDE_INT *offsetp)
+{
+  return arm_gen_multiple_op (TRUE, regs, count, basereg, write_back, basemem,
+			      offsetp);
+}
+
+rtx
+arm_gen_store_multiple (int *regs, int count, rtx basereg, int write_back,
+			rtx basemem, HOST_WIDE_INT *offsetp)
+{
+  return arm_gen_multiple_op (FALSE, regs, count, basereg, write_back, basemem,
+			      offsetp);
+}
+
+/* Called from a peephole2 expander to turn a sequence of loads into an
+   LDM instruction.  OPERANDS are the operands found by the peephole matcher;
+   NOPS indicates how many separate loads we are trying to combine.  SORT_REGS
+   is true if we can reorder the registers because they are used commutatively
+   subsequently.
+   Returns true iff we could generate a new instruction.  */
+
+bool
+gen_ldm_seq (rtx *operands, int nops, bool sort_regs)
+{
+  int regs[MAX_LDM_STM_OPS], mem_order[MAX_LDM_STM_OPS];
+  rtx mems[MAX_LDM_STM_OPS];
+  int i, j, base_reg;
+  rtx base_reg_rtx;
+  HOST_WIDE_INT offset;
+  int write_back = FALSE;
+  int ldm_case;
+  rtx addr;
+
+  ldm_case = load_multiple_sequence (operands, nops, regs, mem_order,
+				     &base_reg, &offset, !sort_regs);
+
+  if (ldm_case == 0)
+    return false;
+
+  if (sort_regs)
+    for (i = 0; i < nops - 1; i++)
+      for (j = i + 1; j < nops; j++)
+	if (regs[i] > regs[j])
+	  {
+	    int t = regs[i];
+	    regs[i] = regs[j];
+	    regs[j] = t;
+	  }
+  base_reg_rtx = gen_rtx_REG (Pmode, base_reg);
+
+  if (TARGET_THUMB1)
+    {
+      gcc_assert (peep2_reg_dead_p (nops, base_reg_rtx));
+      gcc_assert (ldm_case == 1 || ldm_case == 5);
+      write_back = TRUE;
+    }
+
+  if (ldm_case == 5)
+    {
+      rtx newbase = TARGET_THUMB1 ? base_reg_rtx : gen_rtx_REG (SImode, regs[0]);
+      emit_insn (gen_addsi3 (newbase, base_reg_rtx, GEN_INT (offset)));
+      offset = 0;
+      if (!TARGET_THUMB1)
+	{
+	  base_reg = regs[0];
+	  base_reg_rtx = newbase;
+	}
+    }
+
+  for (i = 0; i < nops; i++)
+    {
+      addr = plus_constant (base_reg_rtx, offset + i * 4);
+      mems[i] = adjust_automodify_address_nv (operands[nops + mem_order[i]],
+					      SImode, addr, 0);
+    }
+  emit_insn (arm_gen_load_multiple_1 (nops, regs, mems, base_reg_rtx,
+				      write_back ? offset + i * 4 : 0));
+  return true;
+}
+
+/* Called from a peephole2 expander to turn a sequence of stores into an
+   STM instruction.  OPERANDS are the operands found by the peephole matcher;
+   NOPS indicates how many separate stores we are trying to combine.
+   Returns true iff we could generate a new instruction.  */
+
+bool
+gen_stm_seq (rtx *operands, int nops)
+{
+  int i;
+  int regs[MAX_LDM_STM_OPS], mem_order[MAX_LDM_STM_OPS];
+  rtx mems[MAX_LDM_STM_OPS];
+  int base_reg;
+  rtx base_reg_rtx;
+  HOST_WIDE_INT offset;
+  int write_back = FALSE;
+  int stm_case;
+  rtx addr;
+  bool base_reg_dies;
+
+  stm_case = store_multiple_sequence (operands, nops, nops, regs, NULL,
+				      mem_order, &base_reg, &offset, true);
+
+  if (stm_case == 0)
+    return false;
+
+  base_reg_rtx = gen_rtx_REG (Pmode, base_reg);
+
+  base_reg_dies = peep2_reg_dead_p (nops, base_reg_rtx);
+  if (TARGET_THUMB1)
+    {
+      gcc_assert (base_reg_dies);
+      write_back = TRUE;
+    }
+
+  if (stm_case == 5)
+    {
+      gcc_assert (base_reg_dies);
+      emit_insn (gen_addsi3 (base_reg_rtx, base_reg_rtx, GEN_INT (offset)));
+      offset = 0;
+    }
+
+  addr = plus_constant (base_reg_rtx, offset);
+
+  for (i = 0; i < nops; i++)
+    {
+      addr = plus_constant (base_reg_rtx, offset + i * 4);
+      mems[i] = adjust_automodify_address_nv (operands[nops + mem_order[i]],
+					      SImode, addr, 0);
+    }
+  emit_insn (arm_gen_store_multiple_1 (nops, regs, mems, base_reg_rtx,
+				       write_back ? offset + i * 4 : 0));
+  return true;
+}
+
+/* Called from a peephole2 expander to turn a sequence of stores that are
+   preceded by constant loads into an STM instruction.  OPERANDS are the
+   operands found by the peephole matcher; NOPS indicates how many
+   separate stores we are trying to combine; there are 2 * NOPS
+   instructions in the peephole.
+   Returns true iff we could generate a new instruction.  */
+
+bool
+gen_const_stm_seq (rtx *operands, int nops)
+{
+  int regs[MAX_LDM_STM_OPS], sorted_regs[MAX_LDM_STM_OPS];
+  int reg_order[MAX_LDM_STM_OPS], mem_order[MAX_LDM_STM_OPS];
+  rtx reg_rtxs[MAX_LDM_STM_OPS], orig_reg_rtxs[MAX_LDM_STM_OPS];
+  rtx mems[MAX_LDM_STM_OPS];
+  int base_reg;
+  rtx base_reg_rtx;
+  HOST_WIDE_INT offset;
+  int write_back = FALSE;
+  int stm_case;
+  rtx addr;
+  bool base_reg_dies;
+  int i, j;
+  HARD_REG_SET allocated;
+
+  stm_case = store_multiple_sequence (operands, nops, 2 * nops, regs, reg_rtxs,
+				      mem_order, &base_reg, &offset, false);
+
+  if (stm_case == 0)
+    return false;
+
+  memcpy (orig_reg_rtxs, reg_rtxs, sizeof orig_reg_rtxs);
+
+  /* If the same register is used more than once, try to find a free
+     register.  */
+  CLEAR_HARD_REG_SET (allocated);
+  for (i = 0; i < nops; i++)
+    {
+      for (j = i + 1; j < nops; j++)
+	if (regs[i] == regs[j])
+	  {
+	    rtx t = peep2_find_free_register (0, nops * 2,
+					      TARGET_THUMB1 ? "l" : "r",
+					      SImode, &allocated);
+	    if (t == NULL_RTX)
+	      return false;
+	    reg_rtxs[i] = t;
+	    regs[i] = REGNO (t);
+	  }
+    }
+
+  /* Compute an ordering that maps the register numbers to an ascending
+     sequence.  */
+  reg_order[0] = 0;
+  for (i = 0; i < nops; i++)
+    if (regs[i] < regs[reg_order[0]])
+      reg_order[0] = i;
+
+  for (i = 1; i < nops; i++)
+    {
+      int this_order = reg_order[i - 1];
+      for (j = 0; j < nops; j++)
+	if (regs[j] > regs[reg_order[i - 1]]
+	    && (this_order == reg_order[i - 1]
+		|| regs[j] < regs[this_order]))
+	  this_order = j;
+      reg_order[i] = this_order;
+    }
+
+  /* Bail out if a register that must remain live after the instruction
+     would not end up with the correct value.  */
+  for (i = 0; i < nops; i++)
+    {
+      int this_order = reg_order[i];
+      if ((this_order != mem_order[i]
+	   || orig_reg_rtxs[this_order] != reg_rtxs[this_order])
+	  && !peep2_reg_dead_p (nops * 2, orig_reg_rtxs[this_order]))
+	return false;
+    }
+
+  /* Load the constants.  */
+  for (i = 0; i < nops; i++)
+    {
+      rtx op = operands[2 * nops + mem_order[i]];
+      sorted_regs[i] = regs[reg_order[i]];
+      emit_move_insn (reg_rtxs[reg_order[i]], op);
+    }
+
+  base_reg_rtx = gen_rtx_REG (Pmode, base_reg);
+
+  base_reg_dies = peep2_reg_dead_p (nops * 2, base_reg_rtx);
+  if (TARGET_THUMB1)
+    {
+      gcc_assert (base_reg_dies);
+      write_back = TRUE;
+    }
+
+  if (stm_case == 5)
+    {
+      gcc_assert (base_reg_dies);
+      emit_insn (gen_addsi3 (base_reg_rtx, base_reg_rtx, GEN_INT (offset)));
+      offset = 0;
+    }
+
+  addr = plus_constant (base_reg_rtx, offset);
+
+  for (i = 0; i < nops; i++)
+    {
+      addr = plus_constant (base_reg_rtx, offset + i * 4);
+      mems[i] = adjust_automodify_address_nv (operands[nops + mem_order[i]],
+					      SImode, addr, 0);
+    }
+  emit_insn (arm_gen_store_multiple_1 (nops, sorted_regs, mems, base_reg_rtx,
+				       write_back ? offset + i * 4 : 0));
+  return true;
 }
 
 int
@@ -9668,20 +9917,21 @@ arm_gen_movmemqi (rtx *operands)
   for (i = 0; in_words_to_go >= 2; i+=4)
     {
       if (in_words_to_go > 4)
-	emit_insn (arm_gen_load_multiple (0, 4, src, TRUE, TRUE,
-					  srcbase, &srcoffset));
+	emit_insn (arm_gen_load_multiple (arm_regs_in_sequence, 4, src,
+					  TRUE, srcbase, &srcoffset));
       else
-	emit_insn (arm_gen_load_multiple (0, in_words_to_go, src, TRUE,
-					  FALSE, srcbase, &srcoffset));
+	emit_insn (arm_gen_load_multiple (arm_regs_in_sequence, in_words_to_go,
+					  src, FALSE, srcbase,
+					  &srcoffset));
 
       if (out_words_to_go)
 	{
 	  if (out_words_to_go > 4)
-	    emit_insn (arm_gen_store_multiple (0, 4, dst, TRUE, TRUE,
-					       dstbase, &dstoffset));
+	    emit_insn (arm_gen_store_multiple (arm_regs_in_sequence, 4, dst,
+					       TRUE, dstbase, &dstoffset));
 	  else if (out_words_to_go != 1)
-	    emit_insn (arm_gen_store_multiple (0, out_words_to_go,
-					       dst, TRUE,
+	    emit_insn (arm_gen_store_multiple (arm_regs_in_sequence,
+					       out_words_to_go, dst,
 					       (last_bytes == 0
 						? FALSE : TRUE),
 					       dstbase, &dstoffset));
Index: config/arm/arm.h
===================================================================
--- config/arm/arm.h	(revision 158513)
+++ config/arm/arm.h	(working copy)
@@ -1087,6 +1087,13 @@ extern int arm_structure_size_boundary;
   ((MODE) == TImode || (MODE) == EImode || (MODE) == OImode \
    || (MODE) == CImode || (MODE) == XImode)
 
+/* The maximum number of parallel loads or stores we support in an ldm/stm
+   instruction.  */
+#define MAX_LDM_STM_OPS 4
+
+/* The register numbers in sequence, for passing to arm_gen_load_multiple.  */
+extern int arm_regs_in_sequence[];
+
 /* The order in which register should be allocated.  It is good to use ip
    since no saving is required (though calls clobber it) and it never contains
    function parameters.  It is quite good to use lr since other calls may
Index: config/arm/arm-protos.h
===================================================================
--- config/arm/arm-protos.h	(revision 158513)
+++ config/arm/arm-protos.h	(working copy)
@@ -98,14 +98,11 @@ extern int symbol_mentioned_p (rtx);
 extern int label_mentioned_p (rtx);
 extern RTX_CODE minmax_code (rtx);
 extern int adjacent_mem_locations (rtx, rtx);
-extern int load_multiple_sequence (rtx *, int, int *, int *, HOST_WIDE_INT *);
-extern const char *emit_ldm_seq (rtx *, int);
-extern int store_multiple_sequence (rtx *, int, int *, int *, HOST_WIDE_INT *);
-extern const char * emit_stm_seq (rtx *, int);
-extern rtx arm_gen_load_multiple (int, int, rtx, int, int,
-				  rtx, HOST_WIDE_INT *);
-extern rtx arm_gen_store_multiple (int, int, rtx, int, int,
-				   rtx, HOST_WIDE_INT *);
+extern bool gen_ldm_seq (rtx *, int, bool);
+extern bool gen_stm_seq (rtx *, int);
+extern bool gen_const_stm_seq (rtx *, int);
+extern rtx arm_gen_load_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
+extern rtx arm_gen_store_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
 extern int arm_gen_movmemqi (rtx *);
 extern enum machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx);
 extern enum machine_mode arm_select_dominance_cc_mode (rtx, rtx,
Index: config/arm/ldmstm.md
===================================================================
--- config/arm/ldmstm.md	(revision 0)
+++ config/arm/ldmstm.md	(revision 0)
@@ -0,0 +1,1191 @@
+/* ARM ldm/stm instruction patterns.  This file was automatically generated
+   using arm-ldmstm.ml.  Please do not edit manually.
+
+   Copyright (C) 2010 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+(define_insn "*ldm4_ia"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_operand:SI 5 "s_register_operand" "rk")))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 8))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 12))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
+  "ldm%(ia%)\t%5, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb_ldm4_ia"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_operand:SI 5 "s_register_operand" "l")))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 8))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 12))))])]
+  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 4"
+  "ldm%(ia%)\t%5, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")])
+
+(define_insn "*ldm4_ia_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 5) (const_int 16)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 5)))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 8))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 12))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 5"
+  "ldm%(ia%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb_ldm4_ia_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&l")
+          (plus:SI (match_dup 5) (const_int 16)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 5)))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 8))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 12))))])]
+  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 5"
+  "ldm%(ia%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")])
+
+(define_insn "*stm4_ia"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (match_operand:SI 5 "s_register_operand" "rk"))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 8)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 12)))
+          (match_operand:SI 4 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
+  "stm%(ia%)\t%5, {%1, %2, %3, %4}"
+  [(set_attr "type" "store4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm4_ia_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 5) (const_int 16)))
+     (set (mem:SI (match_dup 5))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 8)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 12)))
+          (match_operand:SI 4 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 5"
+  "stm%(ia%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "store4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb_stm4_ia_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&l")
+          (plus:SI (match_dup 5) (const_int 16)))
+     (set (mem:SI (match_dup 5))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 8)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 12)))
+          (match_operand:SI 4 "arm_hard_register_operand" ""))])]
+  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 5"
+  "stm%(ia%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "store4")])
+
+(define_insn "*ldm4_ib"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_operand:SI 5 "s_register_operand" "rk")
+                  (const_int 4))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 8))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 12))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 16))))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 4"
+  "ldm%(ib%)\t%5, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm4_ib_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 5) (const_int 16)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 4))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 8))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 12))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 16))))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 5"
+  "ldm%(ib%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm4_ib"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (plus:SI (match_operand:SI 5 "s_register_operand" "rk") (const_int 4)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 12)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 16)))
+          (match_operand:SI 4 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 4"
+  "stm%(ib%)\t%5, {%1, %2, %3, %4}"
+  [(set_attr "type" "store4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm4_ib_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 5) (const_int 16)))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 4)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 12)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 16)))
+          (match_operand:SI 4 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 5"
+  "stm%(ib%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "store4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm4_da"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_operand:SI 5 "s_register_operand" "rk")
+                  (const_int -12))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -8))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -4))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 5)))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 4"
+  "ldm%(da%)\t%5, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm4_da_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 5) (const_int -16)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -12))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -8))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -4))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 5)))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 5"
+  "ldm%(da%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm4_da"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (plus:SI (match_operand:SI 5 "s_register_operand" "rk") (const_int -12)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -4)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))
+     (set (mem:SI (match_dup 5))
+          (match_operand:SI 4 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 4"
+  "stm%(da%)\t%5, {%1, %2, %3, %4}"
+  [(set_attr "type" "store4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm4_da_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 5) (const_int -16)))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -12)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -4)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))
+     (set (mem:SI (match_dup 5))
+          (match_operand:SI 4 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 5"
+  "stm%(da%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "store4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm4_db"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_operand:SI 5 "s_register_operand" "rk")
+                  (const_int -16))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -12))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -8))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -4))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
+  "ldm%(db%)\t%5, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm4_db_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 5) (const_int -16)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -16))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -12))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -8))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -4))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 5"
+  "ldm%(db%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm4_db"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (plus:SI (match_operand:SI 5 "s_register_operand" "rk") (const_int -16)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -12)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -8)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -4)))
+          (match_operand:SI 4 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
+  "stm%(db%)\t%5, {%1, %2, %3, %4}"
+  [(set_attr "type" "store4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm4_db_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 5) (const_int -16)))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -16)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -12)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -8)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -4)))
+          (match_operand:SI 4 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 5"
+  "stm%(db%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "store4")
+   (set_attr "predicable" "yes")])
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 4 "memory_operand" ""))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 5 "memory_operand" ""))
+   (set (match_operand:SI 2 "s_register_operand" "")
+        (match_operand:SI 6 "memory_operand" ""))
+   (set (match_operand:SI 3 "s_register_operand" "")
+        (match_operand:SI 7 "memory_operand" ""))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_ldm_seq (operands, 4, false))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 4 "memory_operand" ""))
+   (parallel
+    [(set (match_operand:SI 1 "s_register_operand" "")
+          (match_operand:SI 5 "memory_operand" ""))
+     (set (match_operand:SI 2 "s_register_operand" "")
+          (match_operand:SI 6 "memory_operand" ""))
+     (set (match_operand:SI 3 "s_register_operand" "")
+          (match_operand:SI 7 "memory_operand" ""))])]
+  ""
+  [(const_int 0)]
+{
+  if (gen_ldm_seq (operands, 4, false))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 8 "const_int_operand" ""))
+   (set (match_operand:SI 4 "memory_operand" "")
+        (match_dup 0))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 9 "const_int_operand" ""))
+   (set (match_operand:SI 5 "memory_operand" "")
+        (match_dup 1))
+   (set (match_operand:SI 2 "s_register_operand" "")
+        (match_operand:SI 10 "const_int_operand" ""))
+   (set (match_operand:SI 6 "memory_operand" "")
+        (match_dup 2))
+   (set (match_operand:SI 3 "s_register_operand" "")
+        (match_operand:SI 11 "const_int_operand" ""))
+   (set (match_operand:SI 7 "memory_operand" "")
+        (match_dup 3))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_const_stm_seq (operands, 4))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 8 "const_int_operand" ""))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 9 "const_int_operand" ""))
+   (set (match_operand:SI 2 "s_register_operand" "")
+        (match_operand:SI 10 "const_int_operand" ""))
+   (set (match_operand:SI 3 "s_register_operand" "")
+        (match_operand:SI 11 "const_int_operand" ""))
+   (set (match_operand:SI 4 "memory_operand" "")
+        (match_dup 0))
+   (set (match_operand:SI 5 "memory_operand" "")
+        (match_dup 1))
+   (set (match_operand:SI 6 "memory_operand" "")
+        (match_dup 2))
+   (set (match_operand:SI 7 "memory_operand" "")
+        (match_dup 3))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_const_stm_seq (operands, 4))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 4 "memory_operand" "")
+        (match_operand:SI 0 "s_register_operand" ""))
+   (set (match_operand:SI 5 "memory_operand" "")
+        (match_operand:SI 1 "s_register_operand" ""))
+   (set (match_operand:SI 6 "memory_operand" "")
+        (match_operand:SI 2 "s_register_operand" ""))
+   (set (match_operand:SI 7 "memory_operand" "")
+        (match_operand:SI 3 "s_register_operand" ""))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_stm_seq (operands, 4))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_insn "*ldm3_ia"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_operand:SI 4 "s_register_operand" "rk")))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 8))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
+  "ldm%(ia%)\t%4, {%1, %2, %3}"
+  [(set_attr "type" "load3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb_ldm3_ia"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_operand:SI 4 "s_register_operand" "l")))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 8))))])]
+  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 3"
+  "ldm%(ia%)\t%4, {%1, %2, %3}"
+  [(set_attr "type" "load3")])
+
+(define_insn "*ldm3_ia_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 4) (const_int 12)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 4)))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 8))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
+  "ldm%(ia%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "load3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb_ldm3_ia_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&l")
+          (plus:SI (match_dup 4) (const_int 12)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 4)))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 8))))])]
+  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 4"
+  "ldm%(ia%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "load3")])
+
+(define_insn "*stm3_ia"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (match_operand:SI 4 "s_register_operand" "rk"))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 8)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
+  "stm%(ia%)\t%4, {%1, %2, %3}"
+  [(set_attr "type" "store3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm3_ia_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 4) (const_int 12)))
+     (set (mem:SI (match_dup 4))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 8)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
+  "stm%(ia%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "store3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb_stm3_ia_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&l")
+          (plus:SI (match_dup 4) (const_int 12)))
+     (set (mem:SI (match_dup 4))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 8)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))])]
+  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 4"
+  "stm%(ia%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "store3")])
+
+(define_insn "*ldm3_ib"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_operand:SI 4 "s_register_operand" "rk")
+                  (const_int 4))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 8))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 12))))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 3"
+  "ldm%(ib%)\t%4, {%1, %2, %3}"
+  [(set_attr "type" "load3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm3_ib_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 4) (const_int 12)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 4))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 8))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 12))))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 4"
+  "ldm%(ib%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "load3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm3_ib"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (plus:SI (match_operand:SI 4 "s_register_operand" "rk") (const_int 4)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 12)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 3"
+  "stm%(ib%)\t%4, {%1, %2, %3}"
+  [(set_attr "type" "store3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm3_ib_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 4) (const_int 12)))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 4)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 12)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 4"
+  "stm%(ib%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "store3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm3_da"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_operand:SI 4 "s_register_operand" "rk")
+                  (const_int -8))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int -4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 4)))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 3"
+  "ldm%(da%)\t%4, {%1, %2, %3}"
+  [(set_attr "type" "load3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm3_da_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 4) (const_int -12)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int -8))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int -4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 4)))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 4"
+  "ldm%(da%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "load3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm3_da"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (plus:SI (match_operand:SI 4 "s_register_operand" "rk") (const_int -8)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int -4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (match_dup 4))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 3"
+  "stm%(da%)\t%4, {%1, %2, %3}"
+  [(set_attr "type" "store3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm3_da_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 4) (const_int -12)))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int -8)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int -4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (match_dup 4))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 4"
+  "stm%(da%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "store3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm3_db"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_operand:SI 4 "s_register_operand" "rk")
+                  (const_int -12))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int -8))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int -4))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
+  "ldm%(db%)\t%4, {%1, %2, %3}"
+  [(set_attr "type" "load3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm3_db_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 4) (const_int -12)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int -12))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int -8))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int -4))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
+  "ldm%(db%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "load3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm3_db"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (plus:SI (match_operand:SI 4 "s_register_operand" "rk") (const_int -12)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int -8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int -4)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
+  "stm%(db%)\t%4, {%1, %2, %3}"
+  [(set_attr "type" "store3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm3_db_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 4) (const_int -12)))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int -12)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int -8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int -4)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
+  "stm%(db%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "store3")
+   (set_attr "predicable" "yes")])
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 3 "memory_operand" ""))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 4 "memory_operand" ""))
+   (set (match_operand:SI 2 "s_register_operand" "")
+        (match_operand:SI 5 "memory_operand" ""))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_ldm_seq (operands, 3, false))
+    DONE;
+  else
+    FAIL;
+})
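A note for review: these peepholes delegate the real decision to gen_ldm_seq (in arm.c, part of this patch series), which checks whether a run of loads can legally become one ldm. The sketch below is a rough standalone illustration of the legality rules it must enforce — single base register, consecutive word offsets, ascending destination registers; all struct and function names here are invented for illustration, and the real routine works on RTL and additionally handles the Thumb-1 base-update requirement discussed in the cover letter.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical flattened view of one load: dest_reg <- [base_reg + offset].  */
struct mem_op { int base_reg; int offset; int dest_reg; };

static int cmp_by_offset (const void *a, const void *b)
{
  return ((const struct mem_op *) a)->offset
	 - ((const struct mem_op *) b)->offset;
}

/* Return true if OPS[0..N-1], once sorted by address, satisfy the ldm
   constraints: one base register, consecutive word slots, and destination
   registers in strictly ascending order.  */
bool can_merge_to_ldm (struct mem_op *ops, int n)
{
  qsort (ops, n, sizeof *ops, cmp_by_offset);
  for (int i = 0; i < n; i++)
    {
      if (ops[i].base_reg != ops[0].base_reg)
	return false;			/* one base register only */
      if (ops[i].offset != ops[0].offset + 4 * i)
	return false;			/* consecutive word slots */
      if (i > 0 && ops[i].dest_reg <= ops[i - 1].dest_reg)
	return false;			/* register list must ascend */
    }
  return true;
}
```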
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 3 "memory_operand" ""))
+   (parallel
+    [(set (match_operand:SI 1 "s_register_operand" "")
+          (match_operand:SI 4 "memory_operand" ""))
+     (set (match_operand:SI 2 "s_register_operand" "")
+          (match_operand:SI 5 "memory_operand" ""))])]
+  ""
+  [(const_int 0)]
+{
+  if (gen_ldm_seq (operands, 3, false))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 6 "const_int_operand" ""))
+   (set (match_operand:SI 3 "memory_operand" "")
+        (match_dup 0))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 7 "const_int_operand" ""))
+   (set (match_operand:SI 4 "memory_operand" "")
+        (match_dup 1))
+   (set (match_operand:SI 2 "s_register_operand" "")
+        (match_operand:SI 8 "const_int_operand" ""))
+   (set (match_operand:SI 5 "memory_operand" "")
+        (match_dup 2))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_const_stm_seq (operands, 3))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 6 "const_int_operand" ""))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 7 "const_int_operand" ""))
+   (set (match_operand:SI 2 "s_register_operand" "")
+        (match_operand:SI 8 "const_int_operand" ""))
+   (set (match_operand:SI 3 "memory_operand" "")
+        (match_dup 0))
+   (set (match_operand:SI 4 "memory_operand" "")
+        (match_dup 1))
+   (set (match_operand:SI 5 "memory_operand" "")
+        (match_dup 2))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_const_stm_seq (operands, 3))
+    DONE;
+  else
+    FAIL;
+})
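For reviewers: these two const-store peepholes hand the matched insns to gen_const_stm_seq, which is free to renumber the scratch registers because the stored values are interchangeable once the registers are dead after the stores. At the source level the shape being targeted is simply several constants written to adjacent words; a hypothetical C illustration follows (the transformation itself of course happens on RTL, not on C source).

```c
/* Two constants stored to adjacent words.  Without the peephole this
   compiles to mov/str, mov/str; with it, both constants can be loaded
   into scratch registers and written back with a single stm, as in the
   "stmia sp, {ip, lr}" example earlier in this mail.  */
struct two_words { int first; int second; };

void store_consts (struct two_words *p)
{
  p->first = 1;
  p->second = 0;
}
```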
+
+(define_peephole2
+  [(set (match_operand:SI 3 "memory_operand" "")
+        (match_operand:SI 0 "s_register_operand" ""))
+   (set (match_operand:SI 4 "memory_operand" "")
+        (match_operand:SI 1 "s_register_operand" ""))
+   (set (match_operand:SI 5 "memory_operand" "")
+        (match_operand:SI 2 "s_register_operand" ""))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_stm_seq (operands, 3))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_insn "*ldm2_ia"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_operand:SI 3 "s_register_operand" "rk")))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int 4))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
+  "ldm%(ia%)\t%3, {%1, %2}"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb_ldm2_ia"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_operand:SI 3 "s_register_operand" "l")))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int 4))))])]
+  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 2"
+  "ldm%(ia%)\t%3, {%1, %2}"
+  [(set_attr "type" "load2")])
+
+(define_insn "*ldm2_ia_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 3) (const_int 8)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 3)))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int 4))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
+  "ldm%(ia%)\t%3!, {%1, %2}"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb_ldm2_ia_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&l")
+          (plus:SI (match_dup 3) (const_int 8)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 3)))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int 4))))])]
+  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 3"
+  "ldm%(ia%)\t%3!, {%1, %2}"
+  [(set_attr "type" "load2")])
+
+(define_insn "*stm2_ia"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (match_operand:SI 3 "s_register_operand" "rk"))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
+  "stm%(ia%)\t%3, {%1, %2}"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm2_ia_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 3) (const_int 8)))
+     (set (mem:SI (match_dup 3))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
+  "stm%(ia%)\t%3!, {%1, %2}"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb_stm2_ia_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&l")
+          (plus:SI (match_dup 3) (const_int 8)))
+     (set (mem:SI (match_dup 3))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 3"
+  "stm%(ia%)\t%3!, {%1, %2}"
+  [(set_attr "type" "store2")])
+
+(define_insn "*ldm2_ib"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_operand:SI 3 "s_register_operand" "rk")
+                  (const_int 4))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int 8))))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
+  "ldm%(ib%)\t%3, {%1, %2}"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm2_ib_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 3) (const_int 8)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int 4))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int 8))))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 3"
+  "ldm%(ib%)\t%3!, {%1, %2}"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm2_ib"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (plus:SI (match_operand:SI 3 "s_register_operand" "rk") (const_int 4)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int 8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
+  "stm%(ib%)\t%3, {%1, %2}"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm2_ib_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 3) (const_int 8)))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int 8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 3"
+  "stm%(ib%)\t%3!, {%1, %2}"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm2_da"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_operand:SI 3 "s_register_operand" "rk")
+                  (const_int -4))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 3)))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
+  "ldm%(da%)\t%3, {%1, %2}"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm2_da_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 3) (const_int -8)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int -4))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 3)))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 3"
+  "ldm%(da%)\t%3!, {%1, %2}"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm2_da"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (plus:SI (match_operand:SI 3 "s_register_operand" "rk") (const_int -4)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (match_dup 3))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
+  "stm%(da%)\t%3, {%1, %2}"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm2_da_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 3) (const_int -8)))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (match_dup 3))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 3"
+  "stm%(da%)\t%3!, {%1, %2}"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm2_db"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_operand:SI 3 "s_register_operand" "rk")
+                  (const_int -8))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int -4))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
+  "ldm%(db%)\t%3, {%1, %2}"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm2_db_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 3) (const_int -8)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int -8))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int -4))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
+  "ldm%(db%)\t%3!, {%1, %2}"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm2_db"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (plus:SI (match_operand:SI 3 "s_register_operand" "rk") (const_int -8)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
+  "stm%(db%)\t%3, {%1, %2}"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm2_db_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 3) (const_int -8)))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int -8)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
+  "stm%(db%)\t%3!, {%1, %2}"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 2 "memory_operand" ""))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 3 "memory_operand" ""))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_ldm_seq (operands, 2, false))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 4 "const_int_operand" ""))
+   (set (match_operand:SI 2 "memory_operand" "")
+        (match_dup 0))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 5 "const_int_operand" ""))
+   (set (match_operand:SI 3 "memory_operand" "")
+        (match_dup 1))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_const_stm_seq (operands, 2))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 4 "const_int_operand" ""))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 5 "const_int_operand" ""))
+   (set (match_operand:SI 2 "memory_operand" "")
+        (match_dup 0))
+   (set (match_operand:SI 3 "memory_operand" "")
+        (match_dup 1))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_const_stm_seq (operands, 2))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 2 "memory_operand" "")
+        (match_operand:SI 0 "s_register_operand" ""))
+   (set (match_operand:SI 3 "memory_operand" "")
+        (match_operand:SI 1 "s_register_operand" ""))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_stm_seq (operands, 2))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 2 "memory_operand" ""))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 3 "memory_operand" ""))
+   (parallel
+     [(set (match_operand:SI 4 "s_register_operand" "")
+           (match_operator:SI 5 "commutative_binary_operator"
+            [(match_operand:SI 6 "s_register_operand" "")
+             (match_operand:SI 7 "s_register_operand" "")]))
+      (clobber (reg:CC CC_REGNUM))])]
+  "(((operands[6] == operands[0] && operands[7] == operands[1])
+     || (operands[7] == operands[0] && operands[6] == operands[1]))
+    && peep2_reg_dead_p (3, operands[0]) && peep2_reg_dead_p (3, operands[1]))"
+  [(parallel
+    [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)]))
+     (clobber (reg:CC CC_REGNUM))])]
+{
+  if (!gen_ldm_seq (operands, 2, true))
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 2 "memory_operand" ""))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 3 "memory_operand" ""))
+   (set (match_operand:SI 4 "s_register_operand" "")
+        (match_operator:SI 5 "commutative_binary_operator"
+         [(match_operand:SI 6 "s_register_operand" "")
+          (match_operand:SI 7 "s_register_operand" "")]))]
+  "(((operands[6] == operands[0] && operands[7] == operands[1])
+     || (operands[7] == operands[0] && operands[6] == operands[1]))
+    && peep2_reg_dead_p (3, operands[0]) && peep2_reg_dead_p (3, operands[1]))"
+  [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)]))]
+{
+  if (!gen_ldm_seq (operands, 2, true))
+    FAIL;
+})
+
Index: config/arm/predicates.md
===================================================================
--- config/arm/predicates.md	(revision 158513)
+++ config/arm/predicates.md	(working copy)
@@ -181,6 +181,11 @@ (define_special_predicate "logical_binar
   (and (match_code "ior,xor,and")
        (match_test "mode == GET_MODE (op)")))
 
+;; True for commutative operators
+(define_special_predicate "commutative_binary_operator"
+  (and (match_code "ior,xor,and,plus")
+       (match_test "mode == GET_MODE (op)")))
+
 ;; True for shift operators.
 (define_special_predicate "shift_operator"
   (and (ior (ior (and (match_code "mult")
@@ -305,13 +310,17 @@ (define_special_predicate "load_multiple
   (match_code "parallel")
 {
   HOST_WIDE_INT count = XVECLEN (op, 0);
-  int dest_regno;
+  unsigned dest_regno;
   rtx src_addr;
   HOST_WIDE_INT i = 1, base = 0;
+  HOST_WIDE_INT offset = 0;
   rtx elt;
+  bool addr_reg_loaded = false;
+  bool update = false;
 
   if (count <= 1
-      || GET_CODE (XVECEXP (op, 0, 0)) != SET)
+      || GET_CODE (XVECEXP (op, 0, 0)) != SET
+      || !REG_P (SET_DEST (XVECEXP (op, 0, 0))))
     return false;
 
   /* Check to see if this might be a write-back.  */
@@ -319,6 +328,7 @@ (define_special_predicate "load_multiple
     {
       i++;
       base = 1;
+      update = true;
 
       /* Now check it more carefully.  */
       if (GET_CODE (SET_DEST (elt)) != REG
@@ -337,6 +347,15 @@ (define_special_predicate "load_multiple
 
   dest_regno = REGNO (SET_DEST (XVECEXP (op, 0, i - 1)));
   src_addr = XEXP (SET_SRC (XVECEXP (op, 0, i - 1)), 0);
+  if (GET_CODE (src_addr) == PLUS)
+    {
+      if (GET_CODE (XEXP (src_addr, 1)) != CONST_INT)
+	return false;
+      offset = INTVAL (XEXP (src_addr, 1));
+      src_addr = XEXP (src_addr, 0);
+    }
+  if (!REG_P (src_addr))
+    return false;
 
   for (; i < count; i++)
     {
@@ -345,16 +364,28 @@ (define_special_predicate "load_multiple
       if (GET_CODE (elt) != SET
           || GET_CODE (SET_DEST (elt)) != REG
           || GET_MODE (SET_DEST (elt)) != SImode
-          || REGNO (SET_DEST (elt)) != (unsigned int)(dest_regno + i - base)
+          || REGNO (SET_DEST (elt)) <= dest_regno
           || GET_CODE (SET_SRC (elt)) != MEM
           || GET_MODE (SET_SRC (elt)) != SImode
-          || GET_CODE (XEXP (SET_SRC (elt), 0)) != PLUS
-          || !rtx_equal_p (XEXP (XEXP (SET_SRC (elt), 0), 0), src_addr)
-          || GET_CODE (XEXP (XEXP (SET_SRC (elt), 0), 1)) != CONST_INT
-          || INTVAL (XEXP (XEXP (SET_SRC (elt), 0), 1)) != (i - base) * 4)
+          || ((GET_CODE (XEXP (SET_SRC (elt), 0)) != PLUS
+	       || !rtx_equal_p (XEXP (XEXP (SET_SRC (elt), 0), 0), src_addr)
+	       || GET_CODE (XEXP (XEXP (SET_SRC (elt), 0), 1)) != CONST_INT
+	       || INTVAL (XEXP (XEXP (SET_SRC (elt), 0), 1)) != offset + (i - base) * 4)
+	      && (!REG_P (XEXP (SET_SRC (elt), 0))
+		  || offset + (i - base) * 4 != 0)))
         return false;
+      dest_regno = REGNO (SET_DEST (elt));
+      if (dest_regno == REGNO (src_addr))
+        addr_reg_loaded = true;
     }
-
+  /* For Thumb, we only have updating instructions.  If the pattern does
+     not describe an update, it must be because the address register is
+     in the list of loaded registers - on the hardware, this has the effect
+     of overriding the update.  */
+  if (update && addr_reg_loaded)
+    return false;
+  if (TARGET_THUMB1)
+    return update || addr_reg_loaded;
   return true;
 })
 
@@ -362,9 +393,9 @@ (define_special_predicate "store_multipl
   (match_code "parallel")
 {
   HOST_WIDE_INT count = XVECLEN (op, 0);
-  int src_regno;
+  unsigned src_regno;
   rtx dest_addr;
-  HOST_WIDE_INT i = 1, base = 0;
+  HOST_WIDE_INT i = 1, base = 0, offset = 0;
   rtx elt;
 
   if (count <= 1
@@ -395,6 +426,16 @@ (define_special_predicate "store_multipl
   src_regno = REGNO (SET_SRC (XVECEXP (op, 0, i - 1)));
   dest_addr = XEXP (SET_DEST (XVECEXP (op, 0, i - 1)), 0);
 
+  if (GET_CODE (dest_addr) == PLUS)
+    {
+      if (GET_CODE (XEXP (dest_addr, 1)) != CONST_INT)
+	return false;
+      offset = INTVAL (XEXP (dest_addr, 1));
+      dest_addr = XEXP (dest_addr, 0);
+    }
+  if (!REG_P (dest_addr))
+    return false;
+
   for (; i < count; i++)
     {
       elt = XVECEXP (op, 0, i);
@@ -402,14 +443,17 @@ (define_special_predicate "store_multipl
       if (GET_CODE (elt) != SET
           || GET_CODE (SET_SRC (elt)) != REG
           || GET_MODE (SET_SRC (elt)) != SImode
-          || REGNO (SET_SRC (elt)) != (unsigned int)(src_regno + i - base)
+          || REGNO (SET_SRC (elt)) <= src_regno
           || GET_CODE (SET_DEST (elt)) != MEM
           || GET_MODE (SET_DEST (elt)) != SImode
-          || GET_CODE (XEXP (SET_DEST (elt), 0)) != PLUS
-          || !rtx_equal_p (XEXP (XEXP (SET_DEST (elt), 0), 0), dest_addr)
-          || GET_CODE (XEXP (XEXP (SET_DEST (elt), 0), 1)) != CONST_INT
-          || INTVAL (XEXP (XEXP (SET_DEST (elt), 0), 1)) != (i - base) * 4)
+          || ((GET_CODE (XEXP (SET_DEST (elt), 0)) != PLUS
+	       || !rtx_equal_p (XEXP (XEXP (SET_DEST (elt), 0), 0), dest_addr)
+	       || GET_CODE (XEXP (XEXP (SET_DEST (elt), 0), 1)) != CONST_INT
+               || INTVAL (XEXP (XEXP (SET_DEST (elt), 0), 1)) != offset + (i - base) * 4)
+	      && (!REG_P (XEXP (SET_DEST (elt), 0))
+		  || offset + (i - base) * 4 != 0)))
         return false;
+      src_regno = REGNO (SET_SRC (elt));
     }
 
   return true;
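Before the generator itself: the four addressing modes handled by the patterns above differ only in the offset of the first transfer and in how the base register moves. As a cross-check of the generated offsets, here is a C mirror of the initial_offset/final_offset helpers from arm-ldmstm.ml below; the enum and function names follow the OCaml, the 4-byte word size is assumed, and the code is illustrative rather than part of the patch.

```c
/* ARM load/store-multiple addressing modes, as in arm-ldmstm.ml:
   ia = increment after, ib = increment before,
   da = decrement after, db = decrement before.  */
enum amode { IA, IB, DA, DB };

/* Offset of the first transferred word relative to the base register.  */
int initial_offset (enum amode m, int nregs)
{
  switch (m)
    {
    case IA: return 0;
    case IB: return 4;
    case DA: return -4 * nregs + 4;
    default: return -4 * nregs;		/* DB */
    }
}

/* Net base-register adjustment applied by the updating ("!") forms.  */
int final_offset (enum amode m, int nregs)
{
  return (m == IA || m == IB) ? 4 * nregs : -4 * nregs;
}
```

For three registers this reproduces the offsets in the patterns above: ib uses 4/8/12, da uses -8/-4/0, db uses -12/-8/-4.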
Index: config/arm/arm-ldmstm.ml
===================================================================
--- config/arm/arm-ldmstm.ml	(revision 0)
+++ config/arm/arm-ldmstm.ml	(revision 0)
@@ -0,0 +1,332 @@
+(* Auto-generate ARM ldm/stm patterns
+   Copyright (C) 2010 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it under
+   the terms of the GNU General Public License as published by the Free
+   Software Foundation; either version 3, or (at your option) any later
+   version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or
+   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+   for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.
+
+   This is an O'Caml program.  The O'Caml compiler is available from:
+
+     http://caml.inria.fr/
+
+   Or from your favourite OS's friendly packaging system. Tested with version
+   3.09.2, though other versions will probably work too.
+
+   Run with:
+     ocaml arm-ldmstm.ml >/path/to/gcc/config/arm/ldmstm.md
+*)
+
+type amode = IA | IB | DA | DB
+
+type optype = IN | OUT | INOUT
+
+let rec string_of_addrmode addrmode =
+  match addrmode with
+    IA -> "ia" | IB -> "ib" | DA -> "da" | DB -> "db"
+
+let rec initial_offset addrmode nregs =
+  match addrmode with
+    IA -> 0
+  | IB -> 4
+  | DA -> -4 * nregs + 4
+  | DB -> -4 * nregs
+
+let rec final_offset addrmode nregs =
+  match addrmode with
+    IA -> nregs * 4
+  | IB -> nregs * 4
+  | DA -> -4 * nregs
+  | DB -> -4 * nregs
+
+let constr thumb =
+  if thumb then "l" else "rk"
+
+let inout_constr op_type =
+  match op_type with
+  OUT -> "=&"
+  | INOUT -> "+&"
+  | IN -> ""
+
+let destreg nregs first op_type thumb =
+  if not first then
+    Printf.sprintf "(match_dup %d)" (nregs + 1)
+  else
+    Printf.sprintf ("(match_operand:SI %d \"s_register_operand\" \"%s%s\")")
+      (nregs + 1) (inout_constr op_type) (constr thumb)
+
+let write_ldm_set thumb nregs offset opnr first =
+  let indent = "     " in
+  Printf.printf "%s" (if first then "    [" else indent);
+  Printf.printf "(set (match_operand:SI %d \"arm_hard_register_operand\" \"\")\n" opnr;
+  Printf.printf "%s     (mem:SI " indent;
+  begin if offset != 0 then Printf.printf "(plus:SI " end;
+  Printf.printf "%s" (destreg nregs first IN thumb);
+  begin if offset != 0 then Printf.printf "\n%s             (const_int %d))" indent offset end;
+  Printf.printf "))"
+
+let write_stm_set thumb nregs offset opnr first =
+  let indent = "     " in
+  Printf.printf "%s" (if first then "    [" else indent);
+  Printf.printf "(set (mem:SI ";
+  begin if offset != 0 then Printf.printf "(plus:SI " end;
+  Printf.printf "%s" (destreg nregs first IN thumb);
+  begin if offset != 0 then Printf.printf " (const_int %d))" offset end;
+  Printf.printf ")\n%s     (match_operand:SI %d \"arm_hard_register_operand\" \"\"))" indent opnr 
+
+let write_ldm_peep_set extra_indent nregs opnr first =
+  let indent = "   " ^ extra_indent in
+  Printf.printf "%s" (if first then extra_indent ^ "  [" else indent);
+  Printf.printf "(set (match_operand:SI %d \"s_register_operand\" \"\")\n" opnr;
+  Printf.printf "%s     (match_operand:SI %d \"memory_operand\" \"\"))" indent (nregs + opnr)
+
+let write_stm_peep_set extra_indent nregs opnr first =
+  let indent = "   " ^ extra_indent in
+  Printf.printf "%s" (if first then extra_indent ^ "  [" else indent);
+  Printf.printf "(set (match_operand:SI %d \"memory_operand\" \"\")\n" (nregs + opnr);
+  Printf.printf "%s     (match_operand:SI %d \"s_register_operand\" \"\"))" indent opnr
+
+let write_any_load optype nregs opnr first =
+  let indent = "   " in
+  Printf.printf "%s" (if first then "  [" else indent);
+  Printf.printf "(set (match_operand:SI %d \"s_register_operand\" \"\")\n" opnr;
+  Printf.printf "%s     (match_operand:SI %d \"%s\" \"\"))" indent (nregs * 2 + opnr) optype
+
+let write_const_store nregs opnr first =
+  let indent = "   " in
+  Printf.printf "%s(set (match_operand:SI %d \"memory_operand\" \"\")\n" indent (nregs + opnr);
+  Printf.printf "%s     (match_dup %d))" indent opnr
+
+let write_const_stm_peep_set nregs opnr first =
+  write_any_load "const_int_operand" nregs opnr first;
+  Printf.printf "\n";
+  write_const_store nregs opnr false
+
+let rec write_pat_sets func opnr offset first n_left =
+  func offset opnr first;
+  begin
+    if n_left > 1 then begin
+      Printf.printf "\n";
+      write_pat_sets func (opnr + 1) (offset + 4) false (n_left - 1);
+    end else
+      Printf.printf "]"
+  end
+
+let rec write_peep_sets func opnr first n_left =
+  func opnr first;
+  begin
+    if n_left > 1 then begin
+      Printf.printf "\n";
+      write_peep_sets func (opnr + 1) false (n_left - 1);
+    end
+  end
+
+let can_thumb addrmode update is_store =
+  match addrmode, update, is_store with
+    (* Thumb1 mode only supports IA with update.  However, for LDMIA,
+       if the address register also appears in the list of loaded
+       registers, the loaded value is stored, hence the RTL pattern
+       to describe such an insn does not have an update.  We check
+       in the match_parallel predicate that the condition described
+       above is met.  *)
+    IA, _, false -> true
+  | IA, true, true -> true
+  | _ -> false
+
+let target addrmode thumb =
+  match addrmode, thumb with
+    IA, true -> "TARGET_THUMB1"
+  | IA, false -> "TARGET_32BIT"
+  | DB, false -> "TARGET_32BIT"
+  | _, false -> "TARGET_ARM"
+
+let write_pattern_1 name ls addrmode nregs write_set_fn update thumb =
+  let astr = string_of_addrmode addrmode in
+  Printf.printf "(define_insn \"*%s%s%d_%s%s\"\n"
+    (if thumb then "thumb_" else "") name nregs astr
+    (if update then "_update" else "");
+  Printf.printf "  [(match_parallel 0 \"%s_multiple_operation\"\n" ls;
+  begin
+    if update then begin
+      Printf.printf "    [(set %s\n          (plus:SI %s"
+	(destreg nregs true INOUT thumb) (destreg nregs false IN thumb);
+      Printf.printf " (const_int %d)))\n"
+	(final_offset addrmode nregs)
+    end
+  end;
+  write_pat_sets
+    (write_set_fn thumb nregs) 1
+    (initial_offset addrmode nregs)
+    (not update) nregs;
+  Printf.printf ")]\n  \"%s && XVECLEN (operands[0], 0) == %d\"\n"
+    (target addrmode thumb)
+    (if update then nregs + 1 else nregs);
+  Printf.printf "  \"%s%%(%s%%)\\t%%%d%s, {"
+    name astr (nregs + 1) (if update then "!" else "");
+  for n = 1 to nregs; do
+    Printf.printf "%%%d%s" n (if n < nregs then ", " else "")
+  done;
+  Printf.printf "}\"\n";
+  Printf.printf "  [(set_attr \"type\" \"%s%d\")" ls nregs;
+  begin if not thumb then
+    Printf.printf "\n   (set_attr \"predicable\" \"yes\")";
+  end;
+  Printf.printf "])\n\n"
+
+let write_ldm_pattern addrmode nregs update =
+  write_pattern_1 "ldm" "load" addrmode nregs write_ldm_set update false;
+  begin if can_thumb addrmode update false then
+    write_pattern_1 "ldm" "load" addrmode nregs write_ldm_set update true;
+  end
+
+let write_stm_pattern addrmode nregs update =
+  write_pattern_1 "stm" "store" addrmode nregs write_stm_set update false;
+  begin if can_thumb addrmode update true then
+    write_pattern_1 "stm" "store" addrmode nregs write_stm_set update true;
+  end
+
+let write_ldm_commutative_peephole thumb =
+  let nregs = 2 in
+  Printf.printf "(define_peephole2\n";
+  write_peep_sets (write_ldm_peep_set "" nregs) 0 true nregs;
+  let indent = "   " in
+  if thumb then begin
+    Printf.printf "\n%s(set (match_operand:SI %d \"s_register_operand\" \"\")\n" indent (nregs * 2);
+    Printf.printf "%s     (match_operator:SI %d \"commutative_binary_operator\"\n" indent (nregs * 2 + 1);
+    Printf.printf "%s      [(match_operand:SI %d \"s_register_operand\" \"\")\n" indent (nregs * 2 + 2);
+    Printf.printf "%s       (match_operand:SI %d \"s_register_operand\" \"\")]))]\n" indent (nregs * 2 + 3)
+  end else begin
+    Printf.printf "\n%s(parallel\n" indent;
+    Printf.printf "%s  [(set (match_operand:SI %d \"s_register_operand\" \"\")\n" indent (nregs * 2);
+    Printf.printf "%s        (match_operator:SI %d \"commutative_binary_operator\"\n" indent (nregs * 2 + 1);
+    Printf.printf "%s         [(match_operand:SI %d \"s_register_operand\" \"\")\n" indent (nregs * 2 + 2);
+    Printf.printf "%s          (match_operand:SI %d \"s_register_operand\" \"\")]))\n" indent (nregs * 2 + 3);
+    Printf.printf "%s   (clobber (reg:CC CC_REGNUM))])]\n" indent
+  end;
+  Printf.printf "  \"(((operands[%d] == operands[0] && operands[%d] == operands[1])\n" (nregs * 2 + 2) (nregs * 2 + 3);
+  Printf.printf "     || (operands[%d] == operands[0] && operands[%d] == operands[1]))\n" (nregs * 2 + 3) (nregs * 2 + 2);
+  Printf.printf "    && peep2_reg_dead_p (%d, operands[0]) && peep2_reg_dead_p (%d, operands[1]))\"\n" (nregs + 1) (nregs + 1);
+  begin
+    if thumb then
+      Printf.printf "  [(set (match_dup %d) (match_op_dup %d [(match_dup %d) (match_dup %d)]))]\n"
+	(nregs * 2) (nregs * 2 + 1) (nregs * 2 + 2) (nregs * 2 + 3)
+    else begin
+      Printf.printf "  [(parallel\n";
+      Printf.printf "    [(set (match_dup %d) (match_op_dup %d [(match_dup %d) (match_dup %d)]))\n"
+	(nregs * 2) (nregs * 2 + 1) (nregs * 2 + 2) (nregs * 2 + 3);
+      Printf.printf "     (clobber (reg:CC CC_REGNUM))])]\n"
+    end
+  end;
+  Printf.printf "{\n  if (!gen_ldm_seq (operands, %d, true))\n    FAIL;\n" nregs;
+  Printf.printf "})\n\n"
+
+let write_ldm_peephole nregs =
+  Printf.printf "(define_peephole2\n";
+  write_peep_sets (write_ldm_peep_set "" nregs) 0 true nregs;
+  Printf.printf "]\n  \"\"\n  [(const_int 0)]\n{\n";
+  Printf.printf "  if (gen_ldm_seq (operands, %d, false))\n    DONE;\n  else\n    FAIL;\n})\n\n" nregs
+
+let write_ldm_peephole_b nregs =
+  if nregs > 2 then begin
+    Printf.printf "(define_peephole2\n";
+    write_ldm_peep_set "" nregs 0 true;
+    Printf.printf "\n   (parallel\n";
+    write_peep_sets (write_ldm_peep_set "  " nregs) 1 true (nregs - 1);
+    Printf.printf "])]\n  \"\"\n  [(const_int 0)]\n{\n";
+    Printf.printf "  if (gen_ldm_seq (operands, %d, false))\n    DONE;\n  else\n    FAIL;\n})\n\n" nregs
+  end
+
+let write_stm_peephole nregs =
+  Printf.printf "(define_peephole2\n";
+  write_peep_sets (write_stm_peep_set "" nregs) 0 true nregs;
+  Printf.printf "]\n  \"\"\n  [(const_int 0)]\n{\n";
+  Printf.printf "  if (gen_stm_seq (operands, %d))\n    DONE;\n  else\n    FAIL;\n})\n\n" nregs
+
+let write_stm_peephole_b nregs =
+  if nregs > 2 then begin
+    Printf.printf "(define_peephole2\n";
+    write_stm_peep_set "" nregs 0 true;
+    Printf.printf "\n   (parallel\n";
+    write_peep_sets (write_stm_peep_set "" nregs) 1 true (nregs - 1);
+    Printf.printf "]\n  \"\"\n  [(const_int 0)]\n{\n";
+    Printf.printf "  if (gen_stm_seq (operands, %d))\n    DONE;\n  else\n    FAIL;\n})\n\n" nregs
+  end
+
+let write_const_stm_peephole_a nregs =
+  Printf.printf "(define_peephole2\n";
+  write_peep_sets (write_const_stm_peep_set nregs) 0 true nregs;
+  Printf.printf "]\n  \"\"\n  [(const_int 0)]\n{\n";
+  Printf.printf "  if (gen_const_stm_seq (operands, %d))\n    DONE;\n  else\n    FAIL;\n})\n\n" nregs
+
+let write_const_stm_peephole_b nregs =
+  Printf.printf "(define_peephole2\n";
+  write_peep_sets (write_any_load "const_int_operand" nregs) 0 true nregs;
+  Printf.printf "\n";
+  write_peep_sets (write_const_store nregs) 0 false nregs;
+  Printf.printf "]\n  \"\"\n  [(const_int 0)]\n{\n";
+  Printf.printf "  if (gen_const_stm_seq (operands, %d))\n    DONE;\n  else\n    FAIL;\n})\n\n" nregs
+
+let patterns () =
+  let addrmodes = [ IA; IB; DA; DB ]  in
+  let sizes = [ 4; 3; 2] in
+  List.iter
+    (fun n ->
+      List.iter
+	(fun addrmode ->
+	  write_ldm_pattern addrmode n false;
+	  write_ldm_pattern addrmode n true;
+	  write_stm_pattern addrmode n false;
+	  write_stm_pattern addrmode n true)
+	addrmodes;
+      write_ldm_peephole n;
+      write_ldm_peephole_b n;
+      write_const_stm_peephole_a n;
+      write_const_stm_peephole_b n;
+      write_stm_peephole n;)
+    sizes;
+  write_ldm_commutative_peephole false;
+  write_ldm_commutative_peephole true
+
+let print_lines = List.iter (fun s -> Format.printf "%s@\n" s)
+
+(* Do it.  *)
+
+let _ = 
+  print_lines [
+"/* ARM ldm/stm instruction patterns.  This file was automatically generated";
+"   using arm-ldmstm.ml.  Please do not edit manually.";
+"";
+"   Copyright (C) 2010 Free Software Foundation, Inc.";
+"   Contributed by CodeSourcery.";
+"";
+"   This file is part of GCC.";
+"";
+"   GCC is free software; you can redistribute it and/or modify it";
+"   under the terms of the GNU General Public License as published";
+"   by the Free Software Foundation; either version 3, or (at your";
+"   option) any later version.";
+"";
+"   GCC is distributed in the hope that it will be useful, but WITHOUT";
+"   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY";
+"   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public";
+"   License for more details.";
+"";
+"   You should have received a copy of the GNU General Public License and";
+"   a copy of the GCC Runtime Library Exception along with this program;";
+"   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see";
+"   <http://www.gnu.org/licenses/>.  */";
+""];
+  patterns ();
Index: config/arm/arm.md
===================================================================
--- config/arm/arm.md	(revision 158513)
+++ config/arm/arm.md	(working copy)
@@ -6257,7 +6257,7 @@ (define_expand "movxf"
 
 ;; load- and store-multiple insns
 ;; The arm can load/store any set of registers, provided that they are in
-;; ascending order; but that is beyond GCC so stick with what it knows.
+;; ascending order, but these expanders assume a contiguous set.
 
 (define_expand "load_multiple"
   [(match_par_dup 3 [(set (match_operand:SI 0 "" "")
@@ -6278,126 +6278,12 @@ (define_expand "load_multiple"
     FAIL;
 
   operands[3]
-    = arm_gen_load_multiple (REGNO (operands[0]), INTVAL (operands[2]),
+    = arm_gen_load_multiple (arm_regs_in_sequence + REGNO (operands[0]),
+			     INTVAL (operands[2]),
 			     force_reg (SImode, XEXP (operands[1], 0)),
-			     TRUE, FALSE, operands[1], &offset);
+			     FALSE, operands[1], &offset);
 })
 
-;; Load multiple with write-back
-
-(define_insn "*ldmsi_postinc4"
-  [(match_parallel 0 "load_multiple_operation"
-    [(set (match_operand:SI 1 "s_register_operand" "=r")
-	  (plus:SI (match_operand:SI 2 "s_register_operand" "1")
-		   (const_int 16)))
-     (set (match_operand:SI 3 "arm_hard_register_operand" "")
-	  (mem:SI (match_dup 2)))
-     (set (match_operand:SI 4 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 2) (const_int 4))))
-     (set (match_operand:SI 5 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 2) (const_int 8))))
-     (set (match_operand:SI 6 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 2) (const_int 12))))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 5"
-  "ldm%(ia%)\\t%1!, {%3, %4, %5, %6}"
-  [(set_attr "type" "load4")
-   (set_attr "predicable" "yes")]
-)
-
-(define_insn "*ldmsi_postinc4_thumb1"
-  [(match_parallel 0 "load_multiple_operation"
-    [(set (match_operand:SI 1 "s_register_operand" "=l")
-	  (plus:SI (match_operand:SI 2 "s_register_operand" "1")
-		   (const_int 16)))
-     (set (match_operand:SI 3 "arm_hard_register_operand" "")
-	  (mem:SI (match_dup 2)))
-     (set (match_operand:SI 4 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 2) (const_int 4))))
-     (set (match_operand:SI 5 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 2) (const_int 8))))
-     (set (match_operand:SI 6 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 2) (const_int 12))))])]
-  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 5"
-  "ldmia\\t%1!, {%3, %4, %5, %6}"
-  [(set_attr "type" "load4")]
-)
-
-(define_insn "*ldmsi_postinc3"
-  [(match_parallel 0 "load_multiple_operation"
-    [(set (match_operand:SI 1 "s_register_operand" "=r")
-	  (plus:SI (match_operand:SI 2 "s_register_operand" "1")
-		   (const_int 12)))
-     (set (match_operand:SI 3 "arm_hard_register_operand" "")
-	  (mem:SI (match_dup 2)))
-     (set (match_operand:SI 4 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 2) (const_int 4))))
-     (set (match_operand:SI 5 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 2) (const_int 8))))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
-  "ldm%(ia%)\\t%1!, {%3, %4, %5}"
-  [(set_attr "type" "load3")
-   (set_attr "predicable" "yes")]
-)
-
-(define_insn "*ldmsi_postinc2"
-  [(match_parallel 0 "load_multiple_operation"
-    [(set (match_operand:SI 1 "s_register_operand" "=r")
-	  (plus:SI (match_operand:SI 2 "s_register_operand" "1")
-		   (const_int 8)))
-     (set (match_operand:SI 3 "arm_hard_register_operand" "")
-	  (mem:SI (match_dup 2)))
-     (set (match_operand:SI 4 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 2) (const_int 4))))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
-  "ldm%(ia%)\\t%1!, {%3, %4}"
-  [(set_attr "type" "load2")
-   (set_attr "predicable" "yes")]
-)
-
-;; Ordinary load multiple
-
-(define_insn "*ldmsi4"
-  [(match_parallel 0 "load_multiple_operation"
-    [(set (match_operand:SI 2 "arm_hard_register_operand" "")
-	  (mem:SI (match_operand:SI 1 "s_register_operand" "r")))
-     (set (match_operand:SI 3 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 1) (const_int 4))))
-     (set (match_operand:SI 4 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 1) (const_int 8))))
-     (set (match_operand:SI 5 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 1) (const_int 12))))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
-  "ldm%(ia%)\\t%1, {%2, %3, %4, %5}"
-  [(set_attr "type" "load4")
-   (set_attr "predicable" "yes")]
-)
-
-(define_insn "*ldmsi3"
-  [(match_parallel 0 "load_multiple_operation"
-    [(set (match_operand:SI 2 "arm_hard_register_operand" "")
-	  (mem:SI (match_operand:SI 1 "s_register_operand" "r")))
-     (set (match_operand:SI 3 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 1) (const_int 4))))
-     (set (match_operand:SI 4 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 1) (const_int 8))))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
-  "ldm%(ia%)\\t%1, {%2, %3, %4}"
-  [(set_attr "type" "load3")
-   (set_attr "predicable" "yes")]
-)
-
-(define_insn "*ldmsi2"
-  [(match_parallel 0 "load_multiple_operation"
-    [(set (match_operand:SI 2 "arm_hard_register_operand" "")
-	  (mem:SI (match_operand:SI 1 "s_register_operand" "r")))
-     (set (match_operand:SI 3 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 1) (const_int 4))))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
-  "ldm%(ia%)\\t%1, {%2, %3}"
-  [(set_attr "type" "load2")
-   (set_attr "predicable" "yes")]
-)
-
 (define_expand "store_multiple"
   [(match_par_dup 3 [(set (match_operand:SI 0 "" "")
                           (match_operand:SI 1 "" ""))
@@ -6417,125 +6303,12 @@ (define_expand "store_multiple"
     FAIL;
 
   operands[3]
-    = arm_gen_store_multiple (REGNO (operands[1]), INTVAL (operands[2]),
+    = arm_gen_store_multiple (arm_regs_in_sequence + REGNO (operands[1]),
+			      INTVAL (operands[2]),
 			      force_reg (SImode, XEXP (operands[0], 0)),
-			      TRUE, FALSE, operands[0], &offset);
+			      FALSE, operands[0], &offset);
 })
 
-;; Store multiple with write-back
-
-(define_insn "*stmsi_postinc4"
-  [(match_parallel 0 "store_multiple_operation"
-    [(set (match_operand:SI 1 "s_register_operand" "=r")
-	  (plus:SI (match_operand:SI 2 "s_register_operand" "1")
-		   (const_int 16)))
-     (set (mem:SI (match_dup 2))
-	  (match_operand:SI 3 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 2) (const_int 4)))
-	  (match_operand:SI 4 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 2) (const_int 8)))
-	  (match_operand:SI 5 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 2) (const_int 12)))
-	  (match_operand:SI 6 "arm_hard_register_operand" ""))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 5"
-  "stm%(ia%)\\t%1!, {%3, %4, %5, %6}"
-  [(set_attr "predicable" "yes")
-   (set_attr "type" "store4")]
-)
-
-(define_insn "*stmsi_postinc4_thumb1"
-  [(match_parallel 0 "store_multiple_operation"
-    [(set (match_operand:SI 1 "s_register_operand" "=l")
-	  (plus:SI (match_operand:SI 2 "s_register_operand" "1")
-		   (const_int 16)))
-     (set (mem:SI (match_dup 2))
-	  (match_operand:SI 3 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 2) (const_int 4)))
-	  (match_operand:SI 4 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 2) (const_int 8)))
-	  (match_operand:SI 5 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 2) (const_int 12)))
-	  (match_operand:SI 6 "arm_hard_register_operand" ""))])]
-  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 5"
-  "stmia\\t%1!, {%3, %4, %5, %6}"
-  [(set_attr "type" "store4")]
-)
-
-(define_insn "*stmsi_postinc3"
-  [(match_parallel 0 "store_multiple_operation"
-    [(set (match_operand:SI 1 "s_register_operand" "=r")
-	  (plus:SI (match_operand:SI 2 "s_register_operand" "1")
-		   (const_int 12)))
-     (set (mem:SI (match_dup 2))
-	  (match_operand:SI 3 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 2) (const_int 4)))
-	  (match_operand:SI 4 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 2) (const_int 8)))
-	  (match_operand:SI 5 "arm_hard_register_operand" ""))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
-  "stm%(ia%)\\t%1!, {%3, %4, %5}"
-  [(set_attr "predicable" "yes")
-   (set_attr "type" "store3")]
-)
-
-(define_insn "*stmsi_postinc2"
-  [(match_parallel 0 "store_multiple_operation"
-    [(set (match_operand:SI 1 "s_register_operand" "=r")
-	  (plus:SI (match_operand:SI 2 "s_register_operand" "1")
-		   (const_int 8)))
-     (set (mem:SI (match_dup 2))
-	  (match_operand:SI 3 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 2) (const_int 4)))
-	  (match_operand:SI 4 "arm_hard_register_operand" ""))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
-  "stm%(ia%)\\t%1!, {%3, %4}"
-  [(set_attr "predicable" "yes")
-   (set_attr "type" "store2")]
-)
-
-;; Ordinary store multiple
-
-(define_insn "*stmsi4"
-  [(match_parallel 0 "store_multiple_operation"
-    [(set (mem:SI (match_operand:SI 1 "s_register_operand" "r"))
-	  (match_operand:SI 2 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 1) (const_int 4)))
-	  (match_operand:SI 3 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 1) (const_int 8)))
-	  (match_operand:SI 4 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 1) (const_int 12)))
-	  (match_operand:SI 5 "arm_hard_register_operand" ""))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
-  "stm%(ia%)\\t%1, {%2, %3, %4, %5}"
-  [(set_attr "predicable" "yes")
-   (set_attr "type" "store4")]
-)
-
-(define_insn "*stmsi3"
-  [(match_parallel 0 "store_multiple_operation"
-    [(set (mem:SI (match_operand:SI 1 "s_register_operand" "r"))
-	  (match_operand:SI 2 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 1) (const_int 4)))
-	  (match_operand:SI 3 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 1) (const_int 8)))
-	  (match_operand:SI 4 "arm_hard_register_operand" ""))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
-  "stm%(ia%)\\t%1, {%2, %3, %4}"
-  [(set_attr "predicable" "yes")
-   (set_attr "type" "store3")]
-)
-
-(define_insn "*stmsi2"
-  [(match_parallel 0 "store_multiple_operation"
-    [(set (mem:SI (match_operand:SI 1 "s_register_operand" "r"))
-	  (match_operand:SI 2 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 1) (const_int 4)))
-	  (match_operand:SI 3 "arm_hard_register_operand" ""))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
-  "stm%(ia%)\\t%1, {%2, %3}"
-  [(set_attr "predicable" "yes")
-   (set_attr "type" "store2")]
-)
 
 ;; Move a block of memory if it is word aligned and MORE than 2 words long.
 ;; We could let this apply for blocks of less than this, but it clobbers so
@@ -8894,8 +8668,8 @@ (define_expand "untyped_call"
 	if (REGNO (reg) == R0_REGNUM)
 	  {
 	    /* On thumb we have to use a write-back instruction.  */
-	    emit_insn (arm_gen_store_multiple (R0_REGNUM, 4, addr, TRUE,
-			TARGET_THUMB ? TRUE : FALSE, mem, &offset));
+	    emit_insn (arm_gen_store_multiple (arm_regs_in_sequence, 4, addr,
+ 		       TARGET_THUMB ? TRUE : FALSE, mem, &offset));
 	    size = TARGET_ARM ? 16 : 0;
 	  }
 	else
@@ -8941,8 +8715,8 @@ (define_expand "untyped_return"
 	if (REGNO (reg) == R0_REGNUM)
 	  {
 	    /* On thumb we have to use a write-back instruction.  */
-	    emit_insn (arm_gen_load_multiple (R0_REGNUM, 4, addr, TRUE,
-			TARGET_THUMB ? TRUE : FALSE, mem, &offset));
+	    emit_insn (arm_gen_load_multiple (arm_regs_in_sequence, 4, addr,
+ 		       TARGET_THUMB ? TRUE : FALSE, mem, &offset));
 	    size = TARGET_ARM ? 16 : 0;
 	  }
 	else
@@ -10459,87 +10233,6 @@ (define_peephole2
   ""
 )
 
-; Peepholes to spot possible load- and store-multiples, if the ordering is
-; reversed, check that the memory references aren't volatile.
-
-(define_peephole
-  [(set (match_operand:SI 0 "s_register_operand" "=rk")
-        (match_operand:SI 4 "memory_operand" "m"))
-   (set (match_operand:SI 1 "s_register_operand" "=rk")
-        (match_operand:SI 5 "memory_operand" "m"))
-   (set (match_operand:SI 2 "s_register_operand" "=rk")
-        (match_operand:SI 6 "memory_operand" "m"))
-   (set (match_operand:SI 3 "s_register_operand" "=rk")
-        (match_operand:SI 7 "memory_operand" "m"))]
-  "TARGET_ARM && load_multiple_sequence (operands, 4, NULL, NULL, NULL)"
-  "*
-  return emit_ldm_seq (operands, 4);
-  "
-)
-
-(define_peephole
-  [(set (match_operand:SI 0 "s_register_operand" "=rk")
-        (match_operand:SI 3 "memory_operand" "m"))
-   (set (match_operand:SI 1 "s_register_operand" "=rk")
-        (match_operand:SI 4 "memory_operand" "m"))
-   (set (match_operand:SI 2 "s_register_operand" "=rk")
-        (match_operand:SI 5 "memory_operand" "m"))]
-  "TARGET_ARM && load_multiple_sequence (operands, 3, NULL, NULL, NULL)"
-  "*
-  return emit_ldm_seq (operands, 3);
-  "
-)
-
-(define_peephole
-  [(set (match_operand:SI 0 "s_register_operand" "=rk")
-        (match_operand:SI 2 "memory_operand" "m"))
-   (set (match_operand:SI 1 "s_register_operand" "=rk")
-        (match_operand:SI 3 "memory_operand" "m"))]
-  "TARGET_ARM && load_multiple_sequence (operands, 2, NULL, NULL, NULL)"
-  "*
-  return emit_ldm_seq (operands, 2);
-  "
-)
-
-(define_peephole
-  [(set (match_operand:SI 4 "memory_operand" "=m")
-        (match_operand:SI 0 "s_register_operand" "rk"))
-   (set (match_operand:SI 5 "memory_operand" "=m")
-        (match_operand:SI 1 "s_register_operand" "rk"))
-   (set (match_operand:SI 6 "memory_operand" "=m")
-        (match_operand:SI 2 "s_register_operand" "rk"))
-   (set (match_operand:SI 7 "memory_operand" "=m")
-        (match_operand:SI 3 "s_register_operand" "rk"))]
-  "TARGET_ARM && store_multiple_sequence (operands, 4, NULL, NULL, NULL)"
-  "*
-  return emit_stm_seq (operands, 4);
-  "
-)
-
-(define_peephole
-  [(set (match_operand:SI 3 "memory_operand" "=m")
-        (match_operand:SI 0 "s_register_operand" "rk"))
-   (set (match_operand:SI 4 "memory_operand" "=m")
-        (match_operand:SI 1 "s_register_operand" "rk"))
-   (set (match_operand:SI 5 "memory_operand" "=m")
-        (match_operand:SI 2 "s_register_operand" "rk"))]
-  "TARGET_ARM && store_multiple_sequence (operands, 3, NULL, NULL, NULL)"
-  "*
-  return emit_stm_seq (operands, 3);
-  "
-)
-
-(define_peephole
-  [(set (match_operand:SI 2 "memory_operand" "=m")
-        (match_operand:SI 0 "s_register_operand" "rk"))
-   (set (match_operand:SI 3 "memory_operand" "=m")
-        (match_operand:SI 1 "s_register_operand" "rk"))]
-  "TARGET_ARM && store_multiple_sequence (operands, 2, NULL, NULL, NULL)"
-  "*
-  return emit_stm_seq (operands, 2);
-  "
-)
-
 (define_split
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(and:SI (ge:SI (match_operand:SI 1 "s_register_operand" "")
@@ -11323,6 +11016,8 @@ (define_expand "bswapsi2"
   "
 )
 
+;; Load the load/store multiple patterns
+(include "ldmstm.md")
 ;; Load the FPA co-processor patterns
 (include "fpa.md")
 ;; Load the Maverick co-processor patterns


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ARM ldm/stm peepholes
  2010-04-20 11:37 ARM ldm/stm peepholes Bernd Schmidt
@ 2010-04-23  8:24 ` Ramana Radhakrishnan
  2010-04-23 13:11   ` Richard Earnshaw
  2010-07-07 11:57 ` Resubmit/Ping^n " Bernd Schmidt
  1 sibling, 1 reply; 13+ messages in thread
From: Ramana Radhakrishnan @ 2010-04-23  8:24 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches

Hi Bernd,

On Tue, 2010-04-20 at 13:41 +0200, Bernd Schmidt wrote:

> The patch may need more tuning to decide when to use these instructions;
> I'll need suggestions as I don't think I have enough information
> available to make these decisions.  For now, this uses the same
> heuristics as previously, but it may trigger in slightly more
> situations.  I'm open to suggestions.

Apologies for the delayed response; I've been away from my desk this week
and will be away for more of today.

It would be useful to be able to control ldm/stm generation on a
per-core basis, i.e. on some cores it might be useful to have ldms and
stms as a size optimization rather than as a performance optimization.
For instance, the A8 and A9 have a good ldm and stm setup, but there
might be circumstances where this is not the right thing to do.

An instance of where you don't want to use ldm/stm is the patch
suggested here:
  http://gcc.gnu.org/ml/gcc-patches/2009-11/msg01553.html

Having a hook that acted as a gate for matching this peephole would be
useful. Then we could change this hook to work on a per-core basis or
put in any heuristics we want to change or tweak later.


> I've tested this quite often with arm-none-linux-gnueabi QEMU testsuite
> runs.  Yesterday, after fixing an aliasing issue, I managed to get a
> successful SPEC2000 run on a Cortex-A9 board (with a 4.4-based compiler;
> overall performance minimally higher than before); since then I've only
> made cleanups and added comments.  Once approved, I'll rerun all tests
> before committing.

Out of curiosity, was the aliasing issue PR42509 on a 4.4 branch?  If
so, do you have a testcase that necessitates backporting this fix to
the upstream 4.4 branch as well?

cheers
Ramana

> 
> 
> Bernd

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ARM ldm/stm peepholes
  2010-04-23  8:24 ` Ramana Radhakrishnan
@ 2010-04-23 13:11   ` Richard Earnshaw
  2010-05-05 11:45     ` Bernd Schmidt
  2010-06-08 15:57     ` Bernd Schmidt
  0 siblings, 2 replies; 13+ messages in thread
From: Richard Earnshaw @ 2010-04-23 13:11 UTC (permalink / raw)
  To: ramana.radhakrishnan; +Cc: Bernd Schmidt, GCC Patches


On Fri, 2010-04-23 at 09:13 +0100, Ramana Radhakrishnan wrote:
> Hi Bernd,
> 
> On Tue, 2010-04-20 at 13:41 +0200, Bernd Schmidt wrote:
> 
> > The patch may need more tuning to decide when to use these instructions;
> > I'll need suggestions as I don't think I have enough information
> > available to make these decisions.  For now, this uses the same
> > heuristics as previously, but it may trigger in slightly more
> > situations.  I'm open to suggestions.
> 
> Apologies for the delayed response, been away from my desk this week and
> will be away for more today.
> 
> It would be useful to be able to control ldm /stm generation on a
> per-core basis i.e. on some cores it might be useful to have ldm's and
> stm's as a size optimization rather than as a performance optimization.
> For instance the A8 and A9 have a good ldm and stm setup but there might
> be circumstances where this might not be the right thing to do.
> 

This shouldn't be done on an explicit 'per-core' basis, but on a
cost-of-alternatives basis.  That is, there should be a cost function that
returns the cost of an ldm with N registers; this can then be compared
with the cost of (N+1)/2 ldrd instructions (if valid) or N ldr
instructions and the cheapest alternative selected.  The new costing
framework that I committed last week should easily support adding such
functions.  It may be necessary to separate out the costs of LDM and STM
as they could, in theory be different.


R.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ARM ldm/stm peepholes
  2010-04-23 13:11   ` Richard Earnshaw
@ 2010-05-05 11:45     ` Bernd Schmidt
  2010-05-05 12:11       ` Richard Earnshaw
  2010-06-08 15:57     ` Bernd Schmidt
  1 sibling, 1 reply; 13+ messages in thread
From: Bernd Schmidt @ 2010-05-05 11:45 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: ramana.radhakrishnan, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 307 bytes --]

In the hope of moving this along, here's a smaller patch with pieces
that do not introduce functional changes, but reorganize the code a
little in preparation for the full patch.  I hope this will make review
easier.

Tested on
arm-linux-gnueabi (qemu-system-armv7 {arch=armv7-a/thumb,thumb,}).  Ok?


Bernd


[-- Attachment #2: ldmstm-pre1b.diff --]
[-- Type: text/plain, Size: 14970 bytes --]

	* config/arm/arm.h (MAX_LDM_STM_OPS): New macro.
	* config/arm/arm.c (multiple_operation_profitable_p,
	compute_offset_order): New static functions.
	(load_multiple_sequence, store_multiple_sequence): Use them.
	Replace constant 4 with MAX_LDM_STM_OPS.  Compute order[0] from
	memory offsets, not register numbers.
	(emit_ldm_seq, emit_stm_seq): Replace constant 4 with MAX_LDM_STM_OPS.

Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c	(revision 158771)
+++ config/arm/arm.c	(working copy)
@@ -9074,21 +9074,105 @@ adjacent_mem_locations (rtx a, rtx b)
   return 0;
 }
 
+/* Return true iff it would be profitable to turn a sequence of NOPS loads
+   or stores (depending on IS_STORE) into a load-multiple or store-multiple
+   instruction.  NEED_ADD is true if the base address register needs to be
+   modified with an add instruction before we can use it.  */
+
+static bool
+multiple_operation_profitable_p (bool is_store ATTRIBUTE_UNUSED,
+				 int nops, bool need_add)
+ {
+  /* For ARM8,9 & StrongARM, 2 ldr instructions are faster than an ldm
+     if the offset isn't small enough.  The reason 2 ldrs are faster
+     is because these ARMs are able to do more than one cache access
+     in a single cycle.  The ARM9 and StrongARM have Harvard caches,
+     whilst the ARM8 has a double bandwidth cache.  This means that
+     these cores can do both an instruction fetch and a data fetch in
+     a single cycle, so the trick of calculating the address into a
+     scratch register (one of the result regs) and then doing a load
+     multiple actually becomes slower (and no smaller in code size).
+     That is the transformation
+
+ 	ldr	rd1, [rbase + offset]
+ 	ldr	rd2, [rbase + offset + 4]
+
+     to
+
+ 	add	rd1, rbase, offset
+ 	ldmia	rd1, {rd1, rd2}
+
+     produces worse code -- '3 cycles + any stalls on rd2' instead of
+     '2 cycles + any stalls on rd2'.  On ARMs with only one cache
+     access per cycle, the first sequence could never complete in less
+     than 6 cycles, whereas the ldm sequence would only take 5 and
+     would make better use of sequential accesses if not hitting the
+     cache.
+
+     We cheat here and test 'arm_ld_sched' which we currently know to
+     only be true for the ARM8, ARM9 and StrongARM.  If this ever
+     changes, then the test below needs to be reworked.  */
+  if (nops == 2 && arm_ld_sched && need_add)
+    return false;
+
+  return true;
+}
+
+/* Subroutine of load_multiple_sequence and store_multiple_sequence.
+   Given an array of UNSORTED_OFFSETS, of which there are NOPS, compute
+   an array ORDER which describes the sequence to use when accessing the
+   offsets that produces an ascending order.  In this sequence, each
+   offset must be larger by exactly 4 than the previous one.  ORDER[0]
+   must have been filled in with the lowest offset by the caller.
+   If UNSORTED_REGS is nonnull, it is an array of register numbers that
+   we use to verify that ORDER produces an ascending order of registers.
+   Return true if it was possible to construct such an order, false if
+   not.  */
+
+static bool
+compute_offset_order (int nops, HOST_WIDE_INT *unsorted_offsets, int *order,
+		      int *unsorted_regs)
+{
+  int i;
+  for (i = 1; i < nops; i++)
+    {
+      int j;
+
+      order[i] = order[i - 1];
+      for (j = 0; j < nops; j++)
+	if (unsorted_offsets[j] == unsorted_offsets[order[i - 1]] + 4)
+	  {
+	    /* We must find exactly one offset that is higher than the
+	       previous one by 4.  */
+	    if (order[i] != order[i - 1])
+	      return false;
+	    order[i] = j;
+	  }
+      if (order[i] == order[i - 1])
+	return false;
+      /* The register numbers must be ascending.  */
+      if (unsorted_regs != NULL
+	  && unsorted_regs[order[i]] <= unsorted_regs[order[i - 1]])
+	return false;
+    }
+  return true;
+}
+
 int
 load_multiple_sequence (rtx *operands, int nops, int *regs, int *base,
 			HOST_WIDE_INT *load_offset)
 {
-  int unsorted_regs[4];
-  HOST_WIDE_INT unsorted_offsets[4];
-  int order[4];
+  int unsorted_regs[MAX_LDM_STM_OPS];
+  HOST_WIDE_INT unsorted_offsets[MAX_LDM_STM_OPS];
+  int order[MAX_LDM_STM_OPS];
   int base_reg = -1;
-  int i;
+  int i, ldm_case;
 
-  /* Can only handle 2, 3, or 4 insns at present,
-     though could be easily extended if required.  */
-  gcc_assert (nops >= 2 && nops <= 4);
+  /* Can only handle up to MAX_LDM_STM_OPS insns at present, though could be
+     easily extended if required.  */
+  gcc_assert (nops >= 2 && nops <= MAX_LDM_STM_OPS);
 
-  memset (order, 0, 4 * sizeof (int));
+  memset (order, 0, MAX_LDM_STM_OPS * sizeof (int));
 
   /* Loop over the operands and check that the memory references are
      suitable (i.e. immediate offsets from the same base register).  At
@@ -9124,25 +9208,16 @@ load_multiple_sequence (rtx *operands, i
 		  == CONST_INT)))
 	{
 	  if (i == 0)
-	    {
-	      base_reg = REGNO (reg);
-	      unsorted_regs[0] = (GET_CODE (operands[i]) == REG
-				  ? REGNO (operands[i])
-				  : REGNO (SUBREG_REG (operands[i])));
-	      order[0] = 0;
-	    }
+	    base_reg = REGNO (reg);
 	  else
 	    {
 	      if (base_reg != (int) REGNO (reg))
 		/* Not addressed from the same base register.  */
 		return 0;
-
-	      unsorted_regs[i] = (GET_CODE (operands[i]) == REG
-				  ? REGNO (operands[i])
-				  : REGNO (SUBREG_REG (operands[i])));
-	      if (unsorted_regs[i] < unsorted_regs[order[0]])
-		order[0] = i;
 	    }
+	  unsorted_regs[i] = (GET_CODE (operands[i]) == REG
+			      ? REGNO (operands[i])
+			      : REGNO (SUBREG_REG (operands[i])));
 
 	  /* If it isn't an integer register, or if it overwrites the
 	     base register but isn't the last insn in the list, then
@@ -9152,6 +9227,8 @@ load_multiple_sequence (rtx *operands, i
 	    return 0;
 
 	  unsorted_offsets[i] = INTVAL (offset);
+	  if (i == 0 || unsorted_offsets[i] < unsorted_offsets[order[0]])
+	    order[0] = i;
 	}
       else
 	/* Not a suitable memory address.  */
@@ -9160,30 +9237,11 @@ load_multiple_sequence (rtx *operands, i
 
   /* All the useful information has now been extracted from the
      operands into unsorted_regs and unsorted_offsets; additionally,
-     order[0] has been set to the lowest numbered register in the
-     list.  Sort the registers into order, and check that the memory
-     offsets are ascending and adjacent.  */
-
-  for (i = 1; i < nops; i++)
-    {
-      int j;
-
-      order[i] = order[i - 1];
-      for (j = 0; j < nops; j++)
-	if (unsorted_regs[j] > unsorted_regs[order[i - 1]]
-	    && (order[i] == order[i - 1]
-		|| unsorted_regs[j] < unsorted_regs[order[i]]))
-	  order[i] = j;
-
-      /* Have we found a suitable register? if not, one must be used more
-	 than once.  */
-      if (order[i] == order[i - 1])
-	return 0;
-
-      /* Is the memory address adjacent and ascending? */
-      if (unsorted_offsets[order[i]] != unsorted_offsets[order[i - 1]] + 4)
-	return 0;
-    }
+     order[0] has been set to the lowest offset in the list.  Sort
+     the offsets into order, verifying that they are adjacent, and
+     check that the register numbers are ascending.  */
+  if (!compute_offset_order (nops, unsorted_offsets, order, unsorted_regs))
+    return 0;
 
   if (base)
     {
@@ -9196,59 +9254,29 @@ load_multiple_sequence (rtx *operands, i
     }
 
   if (unsorted_offsets[order[0]] == 0)
-    return 1; /* ldmia */
-
-  if (TARGET_ARM && unsorted_offsets[order[0]] == 4)
-    return 2; /* ldmib */
-
-  if (TARGET_ARM && unsorted_offsets[order[nops - 1]] == 0)
-    return 3; /* ldmda */
-
-  if (unsorted_offsets[order[nops - 1]] == -4)
-    return 4; /* ldmdb */
-
-  /* For ARM8,9 & StrongARM, 2 ldr instructions are faster than an ldm
-     if the offset isn't small enough.  The reason 2 ldrs are faster
-     is because these ARMs are able to do more than one cache access
-     in a single cycle.  The ARM9 and StrongARM have Harvard caches,
-     whilst the ARM8 has a double bandwidth cache.  This means that
-     these cores can do both an instruction fetch and a data fetch in
-     a single cycle, so the trick of calculating the address into a
-     scratch register (one of the result regs) and then doing a load
-     multiple actually becomes slower (and no smaller in code size).
-     That is the transformation
-
- 	ldr	rd1, [rbase + offset]
- 	ldr	rd2, [rbase + offset + 4]
-
-     to
-
- 	add	rd1, rbase, offset
- 	ldmia	rd1, {rd1, rd2}
-
-     produces worse code -- '3 cycles + any stalls on rd2' instead of
-     '2 cycles + any stalls on rd2'.  On ARMs with only one cache
-     access per cycle, the first sequence could never complete in less
-     than 6 cycles, whereas the ldm sequence would only take 5 and
-     would make better use of sequential accesses if not hitting the
-     cache.
+    ldm_case = 1; /* ldmia */
+  else if (TARGET_ARM && unsorted_offsets[order[0]] == 4)
+    ldm_case = 2; /* ldmib */
+  else if (TARGET_ARM && unsorted_offsets[order[nops - 1]] == 0)
+    ldm_case = 3; /* ldmda */
+  else if (unsorted_offsets[order[nops - 1]] == -4)
+    ldm_case = 4; /* ldmdb */
+  else if (const_ok_for_arm (unsorted_offsets[order[0]])
+	   || const_ok_for_arm (-unsorted_offsets[order[0]]))
+    ldm_case = 5;
+  else
+    return 0;
 
-     We cheat here and test 'arm_ld_sched' which we currently know to
-     only be true for the ARM8, ARM9 and StrongARM.  If this ever
-     changes, then the test below needs to be reworked.  */
-  if (nops == 2 && arm_ld_sched)
+  if (!multiple_operation_profitable_p (false, nops, ldm_case == 5))
     return 0;
 
-  /* Can't do it without setting up the offset, only do this if it takes
-     no more than one insn.  */
-  return (const_ok_for_arm (unsorted_offsets[order[0]])
-	  || const_ok_for_arm (-unsorted_offsets[order[0]])) ? 5 : 0;
+  return ldm_case;
 }
 
 const char *
 emit_ldm_seq (rtx *operands, int nops)
 {
-  int regs[4];
+  int regs[MAX_LDM_STM_OPS];
   int base_reg;
   HOST_WIDE_INT offset;
   char buf[100];
@@ -9307,17 +9335,17 @@ int
 store_multiple_sequence (rtx *operands, int nops, int *regs, int *base,
 			 HOST_WIDE_INT * load_offset)
 {
-  int unsorted_regs[4];
-  HOST_WIDE_INT unsorted_offsets[4];
-  int order[4];
+  int unsorted_regs[MAX_LDM_STM_OPS];
+  HOST_WIDE_INT unsorted_offsets[MAX_LDM_STM_OPS];
+  int order[MAX_LDM_STM_OPS];
   int base_reg = -1;
-  int i;
+  int i, stm_case;
 
-  /* Can only handle 2, 3, or 4 insns at present, though could be easily
-     extended if required.  */
-  gcc_assert (nops >= 2 && nops <= 4);
+  /* Can only handle up to MAX_LDM_STM_OPS insns at present, though could be
+     easily extended if required.  */
+  gcc_assert (nops >= 2 && nops <= MAX_LDM_STM_OPS);
 
-  memset (order, 0, 4 * sizeof (int));
+  memset (order, 0, MAX_LDM_STM_OPS * sizeof (int));
 
   /* Loop over the operands and check that the memory references are
      suitable (i.e. immediate offsets from the same base register).  At
@@ -9352,32 +9380,22 @@ store_multiple_sequence (rtx *operands, 
 	      && (GET_CODE (offset = XEXP (XEXP (operands[nops + i], 0), 1))
 		  == CONST_INT)))
 	{
+	  unsorted_regs[i] = (GET_CODE (operands[i]) == REG
+			      ? REGNO (operands[i])
+			      : REGNO (SUBREG_REG (operands[i])));
 	  if (i == 0)
-	    {
-	      base_reg = REGNO (reg);
-	      unsorted_regs[0] = (GET_CODE (operands[i]) == REG
-				  ? REGNO (operands[i])
-				  : REGNO (SUBREG_REG (operands[i])));
-	      order[0] = 0;
-	    }
-	  else
-	    {
-	      if (base_reg != (int) REGNO (reg))
-		/* Not addressed from the same base register.  */
-		return 0;
-
-	      unsorted_regs[i] = (GET_CODE (operands[i]) == REG
-				  ? REGNO (operands[i])
-				  : REGNO (SUBREG_REG (operands[i])));
-	      if (unsorted_regs[i] < unsorted_regs[order[0]])
-		order[0] = i;
-	    }
+	    base_reg = REGNO (reg);
+	  else if (base_reg != (int) REGNO (reg))
+	    /* Not addressed from the same base register.  */
+	    return 0;
 
 	  /* If it isn't an integer register, then we can't do this.  */
 	  if (unsorted_regs[i] < 0 || unsorted_regs[i] > 14)
 	    return 0;
 
 	  unsorted_offsets[i] = INTVAL (offset);
+	  if (i == 0 || unsorted_offsets[i] < unsorted_offsets[order[0]])
+	    order[0] = i;
 	}
       else
 	/* Not a suitable memory address.  */
@@ -9386,30 +9404,11 @@ store_multiple_sequence (rtx *operands, 
 
   /* All the useful information has now been extracted from the
      operands into unsorted_regs and unsorted_offsets; additionally,
-     order[0] has been set to the lowest numbered register in the
-     list.  Sort the registers into order, and check that the memory
-     offsets are ascending and adjacent.  */
-
-  for (i = 1; i < nops; i++)
-    {
-      int j;
-
-      order[i] = order[i - 1];
-      for (j = 0; j < nops; j++)
-	if (unsorted_regs[j] > unsorted_regs[order[i - 1]]
-	    && (order[i] == order[i - 1]
-		|| unsorted_regs[j] < unsorted_regs[order[i]]))
-	  order[i] = j;
-
-      /* Have we found a suitable register? if not, one must be used more
-	 than once.  */
-      if (order[i] == order[i - 1])
-	return 0;
-
-      /* Is the memory address adjacent and ascending? */
-      if (unsorted_offsets[order[i]] != unsorted_offsets[order[i - 1]] + 4)
-	return 0;
-    }
+     order[0] has been set to the lowest offset in the list.  Sort
+     the offsets into order, verifying that they are adjacent, and
+     check that the register numbers are ascending.  */
+  if (!compute_offset_order (nops, unsorted_offsets, order, unsorted_regs))
+    return 0;
 
   if (base)
     {
@@ -9422,24 +9421,26 @@ store_multiple_sequence (rtx *operands, 
     }
 
   if (unsorted_offsets[order[0]] == 0)
-    return 1; /* stmia */
-
-  if (unsorted_offsets[order[0]] == 4)
-    return 2; /* stmib */
-
-  if (unsorted_offsets[order[nops - 1]] == 0)
-    return 3; /* stmda */
+    stm_case = 1; /* stmia */
+  else if (TARGET_ARM && unsorted_offsets[order[0]] == 4)
+    stm_case = 2; /* stmib */
+  else if (TARGET_ARM && unsorted_offsets[order[nops - 1]] == 0)
+    stm_case = 3; /* stmda */
+  else if (unsorted_offsets[order[nops - 1]] == -4)
+    stm_case = 4; /* stmdb */
+  else
+    return 0;
 
-  if (unsorted_offsets[order[nops - 1]] == -4)
-    return 4; /* stmdb */
+  if (!multiple_operation_profitable_p (false, nops, false))
+    return 0;
 
-  return 0;
+  return stm_case;
 }
 
 const char *
 emit_stm_seq (rtx *operands, int nops)
 {
-  int regs[4];
+  int regs[MAX_LDM_STM_OPS];
   int base_reg;
   HOST_WIDE_INT offset;
   char buf[100];
Index: config/arm/arm.h
===================================================================
--- config/arm/arm.h	(revision 158911)
+++ config/arm/arm.h	(working copy)
@@ -2768,4 +2768,8 @@ enum arm_builtins
 #define NEED_INDICATE_EXEC_STACK	0
 #endif
 
+/* The maximum number of parallel loads or stores we support in an ldm/stm
+   instruction.  */
+#define MAX_LDM_STM_OPS 4
+
 #endif /* ! GCC_ARM_H */
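[Editorial note: the ordering check introduced by the patch can be exercised outside GCC.  The sketch below reproduces the logic of compute_offset_order from the diff above, with HOST_WIDE_INT replaced by long and no other GCC dependencies; it is an extraction for illustration, not a substitute for the committed code.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-alone copy of compute_offset_order from the patch above.
   ORDER[0] must already hold the index of the lowest offset; on
   success ORDER visits the offsets in ascending order, each exactly
   4 larger than the previous, with register numbers ascending when
   UNSORTED_REGS is nonnull.  */
static bool
compute_offset_order (int nops, long *unsorted_offsets, int *order,
		      int *unsorted_regs)
{
  int i;
  for (i = 1; i < nops; i++)
    {
      int j;

      order[i] = order[i - 1];
      for (j = 0; j < nops; j++)
	if (unsorted_offsets[j] == unsorted_offsets[order[i - 1]] + 4)
	  {
	    /* We must find exactly one offset that is higher than the
	       previous one by 4.  */
	    if (order[i] != order[i - 1])
	      return false;
	    order[i] = j;
	  }
      if (order[i] == order[i - 1])
	return false;
      /* The register numbers must be ascending.  */
      if (unsorted_regs != NULL
	  && unsorted_regs[order[i]] <= unsorted_regs[order[i - 1]])
	return false;
    }
  return true;
}
```

For example, offsets {4, 0, 8} with registers {2, 1, 3} and order[0] preset to index 1 (the offset 0) yield the order 1, 0, 2; non-adjacent offsets such as {0, 12} are rejected.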

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ARM ldm/stm peepholes
  2010-05-05 11:45     ` Bernd Schmidt
@ 2010-05-05 12:11       ` Richard Earnshaw
  2010-05-05 22:47         ` Bernd Schmidt
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Earnshaw @ 2010-05-05 12:11 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: ramana.radhakrishnan, GCC Patches


On Wed, 2010-05-05 at 13:44 +0200, Bernd Schmidt wrote:
> In the hope of moving this along, here's a smaller patch with pieces
> that do not introduce functional changes, but reorganize the code a
> little in preparation for the full patch.  I hope this will make review
> easier.
> 
> Tested on
> arm-linux-gnueabi(qemu-system-armv7{arch=armv7-a/thumb,thumb,}).  Ok?
> 
> 
> Bernd
> 

Rather than need_add, I think it would be better to pass the offset.
There are times when the offset can't be done with a single add insn,
and we can't tell that if this is just a boolean (but it might still be
profitable in those cases).
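[Editorial note: the constraint being alluded to is that an ARM data-processing immediate is an 8-bit value rotated right by an even amount, so some offsets cannot be materialized by a single add.  The helper below is a rough illustrative stand-in for that check, not GCC's actual const_ok_for_arm implementation.]

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative check: can V be encoded as an ARM data-processing
   immediate, i.e. an 8-bit value rotated right by an even amount?
   Equivalently, some even left-rotation of V fits in 8 bits.  */
static bool
const_ok_for_arm_sketch (unsigned long v)
{
  int r;
  v &= 0xffffffffUL;
  for (r = 0; r < 32; r += 2)
    {
      unsigned long rot = ((v << r) | (v >> ((32 - r) & 31)))
			  & 0xffffffffUL;
      if (rot < 256)
	return true;
    }
  return false;
}
```

For instance, 0x104 is encodable (0x41 rotated right by 30), while 0x101 is not, since its set bits span 9 bits under any rotation.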

Ok apart from that.

R.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ARM ldm/stm peepholes
  2010-05-05 12:11       ` Richard Earnshaw
@ 2010-05-05 22:47         ` Bernd Schmidt
  0 siblings, 0 replies; 13+ messages in thread
From: Bernd Schmidt @ 2010-05-05 22:47 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: ramana.radhakrishnan, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 788 bytes --]

On 05/05/2010 02:11 PM, Richard Earnshaw wrote:
> 
> On Wed, 2010-05-05 at 13:44 +0200, Bernd Schmidt wrote:
>> In the hope of moving this along, here's a smaller patch with pieces
>> that do not introduce functional changes, but reorganize the code a
>> little in preparation for the full patch.  I hope this will make review
>> easier.
>>
>> Tested on
>> arm-linux-gnueabi(qemu-system-armv7{arch=armv7-a/thumb,thumb,}).  Ok?
>>
>>
>> Bernd
>>
> 
> Rather than need_add, I think it would be better to pass the offset.
> There are times when the offset can't be done with a single add insn,
> and we can't tell that if this is just a boolean (but it might still be
> profitable in those cases).
> 
> Ok apart from that.

Thanks.  Here's what I committed after some sanity tests.


Bernd


[-- Attachment #2: ldmstm-committed.diff --]
[-- Type: text/plain, Size: 15368 bytes --]

Index: ChangeLog
===================================================================
--- ChangeLog	(revision 159088)
+++ ChangeLog	(working copy)
@@ -1,3 +1,13 @@
+2010-05-06  Bernd Schmidt  <bernds@codesourcery.com>
+
+	* config/arm/arm.h (MAX_LDM_STM_OPS): New macro.
+	* config/arm/arm.c (multiple_operation_profitable_p,
+	compute_offset_order): New static functions.
+	(load_multiple_sequence, store_multiple_sequence): Use them.
+	Replace constant 4 with MAX_LDM_STM_OPS.  Compute order[0] from
+	memory offsets, not register numbers.
+	(emit_ldm_seq, emit_stm_seq): Replace constant 4 with MAX_LDM_STM_OPS.
+
 2010-05-05  Steven Bosscher  <steven@gcc.gnu.org>
 
 	* stor-layout.c (pending_sizes): Change the type to
Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c	(revision 158771)
+++ config/arm/arm.c	(working copy)
@@ -9074,21 +9074,105 @@ adjacent_mem_locations (rtx a, rtx b)
   return 0;
 }
 
+/* Return true iff it would be profitable to turn a sequence of NOPS loads
+   or stores (depending on IS_STORE) into a load-multiple or store-multiple
+   instruction.  ADD_OFFSET is nonzero if the base address register needs
+   to be modified with an add instruction before we can use it.  */
+
+static bool
+multiple_operation_profitable_p (bool is_store ATTRIBUTE_UNUSED,
+				 int nops, HOST_WIDE_INT add_offset)
+ {
+  /* For ARM8,9 & StrongARM, 2 ldr instructions are faster than an ldm
+     if the offset isn't small enough.  The reason 2 ldrs are faster
+     is because these ARMs are able to do more than one cache access
+     in a single cycle.  The ARM9 and StrongARM have Harvard caches,
+     whilst the ARM8 has a double bandwidth cache.  This means that
+     these cores can do both an instruction fetch and a data fetch in
+     a single cycle, so the trick of calculating the address into a
+     scratch register (one of the result regs) and then doing a load
+     multiple actually becomes slower (and no smaller in code size).
+     That is the transformation
+
+ 	ldr	rd1, [rbase + offset]
+ 	ldr	rd2, [rbase + offset + 4]
+
+     to
+
+ 	add	rd1, rbase, offset
+ 	ldmia	rd1, {rd1, rd2}
+
+     produces worse code -- '3 cycles + any stalls on rd2' instead of
+     '2 cycles + any stalls on rd2'.  On ARMs with only one cache
+     access per cycle, the first sequence could never complete in less
+     than 6 cycles, whereas the ldm sequence would only take 5 and
+     would make better use of sequential accesses if not hitting the
+     cache.
+
+     We cheat here and test 'arm_ld_sched' which we currently know to
+     only be true for the ARM8, ARM9 and StrongARM.  If this ever
+     changes, then the test below needs to be reworked.  */
+  if (nops == 2 && arm_ld_sched && add_offset != 0)
+    return false;
+
+  return true;
+}
+
+/* Subroutine of load_multiple_sequence and store_multiple_sequence.
+   Given an array of UNSORTED_OFFSETS, of which there are NOPS, compute
+   an array ORDER which describes the sequence to use when accessing the
+   offsets that produces an ascending order.  In this sequence, each
+   offset must be larger by exactly 4 than the previous one.  ORDER[0]
+   must have been filled in with the lowest offset by the caller.
+   If UNSORTED_REGS is nonnull, it is an array of register numbers that
+   we use to verify that ORDER produces an ascending order of registers.
+   Return true if it was possible to construct such an order, false if
+   not.  */
+
+static bool
+compute_offset_order (int nops, HOST_WIDE_INT *unsorted_offsets, int *order,
+		      int *unsorted_regs)
+{
+  int i;
+  for (i = 1; i < nops; i++)
+    {
+      int j;
+
+      order[i] = order[i - 1];
+      for (j = 0; j < nops; j++)
+	if (unsorted_offsets[j] == unsorted_offsets[order[i - 1]] + 4)
+	  {
+	    /* We must find exactly one offset that is higher than the
+	       previous one by 4.  */
+	    if (order[i] != order[i - 1])
+	      return false;
+	    order[i] = j;
+	  }
+      if (order[i] == order[i - 1])
+	return false;
+      /* The register numbers must be ascending.  */
+      if (unsorted_regs != NULL
+	  && unsorted_regs[order[i]] <= unsorted_regs[order[i - 1]])
+	return false;
+    }
+  return true;
+}
+
 int
 load_multiple_sequence (rtx *operands, int nops, int *regs, int *base,
 			HOST_WIDE_INT *load_offset)
 {
-  int unsorted_regs[4];
-  HOST_WIDE_INT unsorted_offsets[4];
-  int order[4];
+  int unsorted_regs[MAX_LDM_STM_OPS];
+  HOST_WIDE_INT unsorted_offsets[MAX_LDM_STM_OPS];
+  int order[MAX_LDM_STM_OPS];
   int base_reg = -1;
-  int i;
+  int i, ldm_case;
 
-  /* Can only handle 2, 3, or 4 insns at present,
-     though could be easily extended if required.  */
-  gcc_assert (nops >= 2 && nops <= 4);
+  /* Can only handle up to MAX_LDM_STM_OPS insns at present, though could be
+     easily extended if required.  */
+  gcc_assert (nops >= 2 && nops <= MAX_LDM_STM_OPS);
 
-  memset (order, 0, 4 * sizeof (int));
+  memset (order, 0, MAX_LDM_STM_OPS * sizeof (int));
 
   /* Loop over the operands and check that the memory references are
      suitable (i.e. immediate offsets from the same base register).  At
@@ -9124,25 +9208,16 @@ load_multiple_sequence (rtx *operands, i
 		  == CONST_INT)))
 	{
 	  if (i == 0)
-	    {
-	      base_reg = REGNO (reg);
-	      unsorted_regs[0] = (GET_CODE (operands[i]) == REG
-				  ? REGNO (operands[i])
-				  : REGNO (SUBREG_REG (operands[i])));
-	      order[0] = 0;
-	    }
+	    base_reg = REGNO (reg);
 	  else
 	    {
 	      if (base_reg != (int) REGNO (reg))
 		/* Not addressed from the same base register.  */
 		return 0;
-
-	      unsorted_regs[i] = (GET_CODE (operands[i]) == REG
-				  ? REGNO (operands[i])
-				  : REGNO (SUBREG_REG (operands[i])));
-	      if (unsorted_regs[i] < unsorted_regs[order[0]])
-		order[0] = i;
 	    }
+	  unsorted_regs[i] = (GET_CODE (operands[i]) == REG
+			      ? REGNO (operands[i])
+			      : REGNO (SUBREG_REG (operands[i])));
 
 	  /* If it isn't an integer register, or if it overwrites the
 	     base register but isn't the last insn in the list, then
@@ -9152,6 +9227,8 @@ load_multiple_sequence (rtx *operands, i
 	    return 0;
 
 	  unsorted_offsets[i] = INTVAL (offset);
+	  if (i == 0 || unsorted_offsets[i] < unsorted_offsets[order[0]])
+	    order[0] = i;
 	}
       else
 	/* Not a suitable memory address.  */
@@ -9160,30 +9237,11 @@ load_multiple_sequence (rtx *operands, i
 
   /* All the useful information has now been extracted from the
      operands into unsorted_regs and unsorted_offsets; additionally,
-     order[0] has been set to the lowest numbered register in the
-     list.  Sort the registers into order, and check that the memory
-     offsets are ascending and adjacent.  */
-
-  for (i = 1; i < nops; i++)
-    {
-      int j;
-
-      order[i] = order[i - 1];
-      for (j = 0; j < nops; j++)
-	if (unsorted_regs[j] > unsorted_regs[order[i - 1]]
-	    && (order[i] == order[i - 1]
-		|| unsorted_regs[j] < unsorted_regs[order[i]]))
-	  order[i] = j;
-
-      /* Have we found a suitable register? if not, one must be used more
-	 than once.  */
-      if (order[i] == order[i - 1])
-	return 0;
-
-      /* Is the memory address adjacent and ascending? */
-      if (unsorted_offsets[order[i]] != unsorted_offsets[order[i - 1]] + 4)
-	return 0;
-    }
+     order[0] has been set to the lowest offset in the list.  Sort
+     the offsets into order, verifying that they are adjacent, and
+     check that the register numbers are ascending.  */
+  if (!compute_offset_order (nops, unsorted_offsets, order, unsorted_regs))
+    return 0;
 
   if (base)
     {
@@ -9196,59 +9254,31 @@ load_multiple_sequence (rtx *operands, i
     }
 
   if (unsorted_offsets[order[0]] == 0)
-    return 1; /* ldmia */
-
-  if (TARGET_ARM && unsorted_offsets[order[0]] == 4)
-    return 2; /* ldmib */
-
-  if (TARGET_ARM && unsorted_offsets[order[nops - 1]] == 0)
-    return 3; /* ldmda */
-
-  if (unsorted_offsets[order[nops - 1]] == -4)
-    return 4; /* ldmdb */
-
-  /* For ARM8,9 & StrongARM, 2 ldr instructions are faster than an ldm
-     if the offset isn't small enough.  The reason 2 ldrs are faster
-     is because these ARMs are able to do more than one cache access
-     in a single cycle.  The ARM9 and StrongARM have Harvard caches,
-     whilst the ARM8 has a double bandwidth cache.  This means that
-     these cores can do both an instruction fetch and a data fetch in
-     a single cycle, so the trick of calculating the address into a
-     scratch register (one of the result regs) and then doing a load
-     multiple actually becomes slower (and no smaller in code size).
-     That is the transformation
-
- 	ldr	rd1, [rbase + offset]
- 	ldr	rd2, [rbase + offset + 4]
-
-     to
-
- 	add	rd1, rbase, offset
- 	ldmia	rd1, {rd1, rd2}
-
-     produces worse code -- '3 cycles + any stalls on rd2' instead of
-     '2 cycles + any stalls on rd2'.  On ARMs with only one cache
-     access per cycle, the first sequence could never complete in less
-     than 6 cycles, whereas the ldm sequence would only take 5 and
-     would make better use of sequential accesses if not hitting the
-     cache.
+    ldm_case = 1; /* ldmia */
+  else if (TARGET_ARM && unsorted_offsets[order[0]] == 4)
+    ldm_case = 2; /* ldmib */
+  else if (TARGET_ARM && unsorted_offsets[order[nops - 1]] == 0)
+    ldm_case = 3; /* ldmda */
+  else if (unsorted_offsets[order[nops - 1]] == -4)
+    ldm_case = 4; /* ldmdb */
+  else if (const_ok_for_arm (unsorted_offsets[order[0]])
+	   || const_ok_for_arm (-unsorted_offsets[order[0]]))
+    ldm_case = 5;
+  else
+    return 0;
 
-     We cheat here and test 'arm_ld_sched' which we currently know to
-     only be true for the ARM8, ARM9 and StrongARM.  If this ever
-     changes, then the test below needs to be reworked.  */
-  if (nops == 2 && arm_ld_sched)
+  if (!multiple_operation_profitable_p (false, nops,
+					ldm_case == 5
+					? unsorted_offsets[order[0]] : 0))
     return 0;
 
-  /* Can't do it without setting up the offset, only do this if it takes
-     no more than one insn.  */
-  return (const_ok_for_arm (unsorted_offsets[order[0]])
-	  || const_ok_for_arm (-unsorted_offsets[order[0]])) ? 5 : 0;
+  return ldm_case;
 }
 
 const char *
 emit_ldm_seq (rtx *operands, int nops)
 {
-  int regs[4];
+  int regs[MAX_LDM_STM_OPS];
   int base_reg;
   HOST_WIDE_INT offset;
   char buf[100];
@@ -9307,17 +9337,17 @@ int
 store_multiple_sequence (rtx *operands, int nops, int *regs, int *base,
 			 HOST_WIDE_INT * load_offset)
 {
-  int unsorted_regs[4];
-  HOST_WIDE_INT unsorted_offsets[4];
-  int order[4];
+  int unsorted_regs[MAX_LDM_STM_OPS];
+  HOST_WIDE_INT unsorted_offsets[MAX_LDM_STM_OPS];
+  int order[MAX_LDM_STM_OPS];
   int base_reg = -1;
-  int i;
+  int i, stm_case;
 
-  /* Can only handle 2, 3, or 4 insns at present, though could be easily
-     extended if required.  */
-  gcc_assert (nops >= 2 && nops <= 4);
+  /* Can only handle up to MAX_LDM_STM_OPS insns at present, though could be
+     easily extended if required.  */
+  gcc_assert (nops >= 2 && nops <= MAX_LDM_STM_OPS);
 
-  memset (order, 0, 4 * sizeof (int));
+  memset (order, 0, MAX_LDM_STM_OPS * sizeof (int));
 
   /* Loop over the operands and check that the memory references are
      suitable (i.e. immediate offsets from the same base register).  At
@@ -9352,32 +9382,22 @@ store_multiple_sequence (rtx *operands, 
 	      && (GET_CODE (offset = XEXP (XEXP (operands[nops + i], 0), 1))
 		  == CONST_INT)))
 	{
+	  unsorted_regs[i] = (GET_CODE (operands[i]) == REG
+			      ? REGNO (operands[i])
+			      : REGNO (SUBREG_REG (operands[i])));
 	  if (i == 0)
-	    {
-	      base_reg = REGNO (reg);
-	      unsorted_regs[0] = (GET_CODE (operands[i]) == REG
-				  ? REGNO (operands[i])
-				  : REGNO (SUBREG_REG (operands[i])));
-	      order[0] = 0;
-	    }
-	  else
-	    {
-	      if (base_reg != (int) REGNO (reg))
-		/* Not addressed from the same base register.  */
-		return 0;
-
-	      unsorted_regs[i] = (GET_CODE (operands[i]) == REG
-				  ? REGNO (operands[i])
-				  : REGNO (SUBREG_REG (operands[i])));
-	      if (unsorted_regs[i] < unsorted_regs[order[0]])
-		order[0] = i;
-	    }
+	    base_reg = REGNO (reg);
+	  else if (base_reg != (int) REGNO (reg))
+	    /* Not addressed from the same base register.  */
+	    return 0;
 
 	  /* If it isn't an integer register, then we can't do this.  */
 	  if (unsorted_regs[i] < 0 || unsorted_regs[i] > 14)
 	    return 0;
 
 	  unsorted_offsets[i] = INTVAL (offset);
+	  if (i == 0 || unsorted_offsets[i] < unsorted_offsets[order[0]])
+	    order[0] = i;
 	}
       else
 	/* Not a suitable memory address.  */
@@ -9386,30 +9406,11 @@ store_multiple_sequence (rtx *operands, 
 
   /* All the useful information has now been extracted from the
      operands into unsorted_regs and unsorted_offsets; additionally,
-     order[0] has been set to the lowest numbered register in the
-     list.  Sort the registers into order, and check that the memory
-     offsets are ascending and adjacent.  */
-
-  for (i = 1; i < nops; i++)
-    {
-      int j;
-
-      order[i] = order[i - 1];
-      for (j = 0; j < nops; j++)
-	if (unsorted_regs[j] > unsorted_regs[order[i - 1]]
-	    && (order[i] == order[i - 1]
-		|| unsorted_regs[j] < unsorted_regs[order[i]]))
-	  order[i] = j;
-
-      /* Have we found a suitable register? if not, one must be used more
-	 than once.  */
-      if (order[i] == order[i - 1])
-	return 0;
-
-      /* Is the memory address adjacent and ascending? */
-      if (unsorted_offsets[order[i]] != unsorted_offsets[order[i - 1]] + 4)
-	return 0;
-    }
+     order[0] has been set to the lowest offset in the list.  Sort
+     the offsets into order, verifying that they are adjacent, and
+     check that the register numbers are ascending.  */
+  if (!compute_offset_order (nops, unsorted_offsets, order, unsorted_regs))
+    return 0;
 
   if (base)
     {
@@ -9422,24 +9423,26 @@ store_multiple_sequence (rtx *operands, 
     }
 
   if (unsorted_offsets[order[0]] == 0)
-    return 1; /* stmia */
-
-  if (unsorted_offsets[order[0]] == 4)
-    return 2; /* stmib */
-
-  if (unsorted_offsets[order[nops - 1]] == 0)
-    return 3; /* stmda */
+    stm_case = 1; /* stmia */
+  else if (TARGET_ARM && unsorted_offsets[order[0]] == 4)
+    stm_case = 2; /* stmib */
+  else if (TARGET_ARM && unsorted_offsets[order[nops - 1]] == 0)
+    stm_case = 3; /* stmda */
+  else if (unsorted_offsets[order[nops - 1]] == -4)
+    stm_case = 4; /* stmdb */
+  else
+    return 0;
 
-  if (unsorted_offsets[order[nops - 1]] == -4)
-    return 4; /* stmdb */
+  if (!multiple_operation_profitable_p (false, nops, 0))
+    return 0;
 
-  return 0;
+  return stm_case;
 }
 
 const char *
 emit_stm_seq (rtx *operands, int nops)
 {
-  int regs[4];
+  int regs[MAX_LDM_STM_OPS];
   int base_reg;
   HOST_WIDE_INT offset;
   char buf[100];
Index: config/arm/arm.h
===================================================================
--- config/arm/arm.h	(revision 158911)
+++ config/arm/arm.h	(working copy)
@@ -2768,4 +2768,8 @@ enum arm_builtins
 #define NEED_INDICATE_EXEC_STACK	0
 #endif
 
+/* The maximum number of parallel loads or stores we support in an ldm/stm
+   instruction.  */
+#define MAX_LDM_STM_OPS 4
+
 #endif /* ! GCC_ARM_H */

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: ARM ldm/stm peepholes
  2010-04-23 13:11   ` Richard Earnshaw
  2010-05-05 11:45     ` Bernd Schmidt
@ 2010-06-08 15:57     ` Bernd Schmidt
  2010-06-28 12:27       ` Ping: " Bernd Schmidt
  1 sibling, 1 reply; 13+ messages in thread
From: Bernd Schmidt @ 2010-06-08 15:57 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: ramana.radhakrishnan, GCC Patches

With "official" performance numbers apparently unavailable, Ramana asked
me to run some benchmarks, with a focus on when to enable generation of
ldm insns on Cortex-A9 (the assumption being that on other cores, ldm
should always be a win, and so should stm everywhere).

I've tested three variants of the patch against SPEC2k on an A9 board -
first the full patch, then a variant which restricts generation of ldm
insns to sequences of 3 or more, and one variant which only allows ldm
for 4 insn sequences.

Both the full patch and the 3/4-only variant are quite close to an
unpatched compiler.  The 3/4-only variant is 3.5% better on 164.gzip vs.
the full patch, which translates into a 0.3% improvement in overall
score.  The 4-only variant is significantly worse, with large drops in
164.gzip and 253.perlbmk, for an overall 1% lower score.

Given these results, what would you like me to do with the patch?  Leave
it as-is?  Modify it to disallow 2-insn ldms on Cortex-A9?  Something else?


Bernd


* Ping: ARM ldm/stm peepholes
  2010-06-08 15:57     ` Bernd Schmidt
@ 2010-06-28 12:27       ` Bernd Schmidt
  0 siblings, 0 replies; 13+ messages in thread
From: Bernd Schmidt @ 2010-06-28 12:27 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: ramana.radhakrishnan, GCC Patches

On 06/08/2010 05:57 PM, Bernd Schmidt wrote:
> With "official" performance numbers apparently unavailable, Ramana asked
> me to run some benchmarks, with a focus on when to enable generation of
> ldm insns on Cortex-A9 (the assumption being that on other cores, ldm
> should always be a win, and so should stm everywhere).
> 
> I've tested three variants of the patch against SPEC2k on an A9 board -
> first the full patch, then a variant which restricts generation of ldm
> insns to sequences of 3 or more, and one variant which only allows ldm
> for 4 insn sequences.
> 
> Both the full patch and the 3/4-only variant are quite close to an
> unpatched compiler.  The 3/4-only variant is 3.5% better on 164.gzip vs.
> the full patch, which translates into a 0.3% improvement in overall
> score.  The 4-only variant is significantly worse, with large drops in
> 164.gzip and 253.perlbmk, for an overall 1% lower score.
> 
> Given these results, what would you like me to do with the patch?  Leave
> it as-is?  Modify it to disallow 2-insn ldms on Cortex-A9?  Something else?

Ping.


Bernd


* Resubmit/Ping^n ARM ldm/stm peepholes
  2010-04-20 11:37 ARM ldm/stm peepholes Bernd Schmidt
  2010-04-23  8:24 ` Ramana Radhakrishnan
@ 2010-07-07 11:57 ` Bernd Schmidt
  2010-07-16 11:19   ` Ping^(n+1) " Bernd Schmidt
       [not found]   ` <1280580386.32159.25.camel@e102346-lin.cambridge.arm.com>
  1 sibling, 2 replies; 13+ messages in thread
From: Bernd Schmidt @ 2010-07-07 11:57 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Earnshaw, ramana.radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 1605 bytes --]

Here's an updated version of my ldm/stm peepholes patch for current
trunk.  The main goal of this is to enable ldm/stm generation for Thumb
by using define_peephole2 and peep2_find_free_reg rather than
define_peephole; there are one or two new peepholes to recognize
additional opportunities.

I've rerun Cortex-A9 SPEC2000 benchmarks on our 4.4-based tree, where it
still causes a tiny performance improvement.  Please disregard the
previous set of benchmark results (for limiting this to 3/4- or
4-operation sequences only), I think those results were invalid.  I've
retested these, and limiting the transformation in such a way seems to
cause performance drops.

Previously there were requests to modify performance tuning, but there
were no answers to my questions about how exactly I should go about it,
and no information has been forthcoming about actual processor behaviour
which could be used to implement meaningful tuning.  As requested, I ran
some benchmarks and posted results, which were also ignored
(fortunately, see above).  Since the patch isn't primarily intended to
change code generation significantly on ARM/Thumb-2 code anyway (and
given the performance results mentioned above), I feel it is
unreasonable to hold this up any further.  Additional improvements may
be possible on top of it, but IMO it's a self-contained improvement
as-is.  An earlier patch already introduced the
multiple_operation_profitable_p function which can be used for tuning.

Tested with my usual arm-linux/qemu configuration.  Earlier, I posted a
fix for the PR44404 problem which showed up.

Ok?


Bernd

[-- Attachment #2: ldmstm-v10.diff --]
[-- Type: text/plain, Size: 114401 bytes --]

	PR target/40457
	* config/arm/arm.h (arm_regs_in_sequence): Declare.
	* config/arm/arm-protos.h (emit_ldm_seq, emit_stm_seq,
	load_multiple_sequence, store_multiple_sequence): Delete
	declarations.
	(arm_gen_load_multiple, arm_gen_store_multiple): Adjust
	declarations.
	* config/arm/ldmstm.md: New file.
	* config/arm/arm.c (arm_regs_in_sequence): New array.
	(load_multiple_sequence): Now static.  New args SAVED_ORDER,
	CHECK_REGS.  All callers changed.
	If SAVED_ORDER is nonnull, copy the computed order into it.
	If CHECK_REGS is false, don't sort REGS.  Handle Thumb mode.
	(store_multiple_sequence): Now static.  New args NOPS_TOTAL,
	SAVED_ORDER, REG_RTXS and CHECK_REGS.  All callers changed.
	If SAVED_ORDER is nonnull, copy the computed order into it.
	If CHECK_REGS is false, don't sort REGS.  Set up REG_RTXS just
	like REGS.  Handle Thumb mode.
	(arm_gen_load_multiple_1): New function, broken out of
	arm_gen_load_multiple.
	(arm_gen_store_multiple_1): New function, broken out of
	arm_gen_store_multiple.
	(arm_gen_multiple_op): New function, with code from
	arm_gen_load_multiple and arm_gen_store_multiple moved here.
	(arm_gen_load_multiple, arm_gen_store_multiple): Now just
	wrappers around arm_gen_multiple_op.  Remove argument UP, all callers
	changed.
	(gen_ldm_seq, gen_stm_seq, gen_const_stm_seq): New functions.
	* config/arm/predicates.md (commutative_binary_operator): New.
	(load_multiple_operation, store_multiple_operation): Handle more
	variants of these patterns with different starting offsets.  Handle
	Thumb-1.
	* config/arm/arm.md: Include "ldmstm.md".
	(ldmsi_postinc4, ldmsi_postinc4_thumb1, ldmsi_postinc3, ldmsi_postinc2,
	ldmsi4, ldmsi3, ldmsi2, stmsi_postinc4, stmsi_postinc4_thumb1,
	stmsi_postinc3, stmsi_postinc2, stmsi4, stmsi3, stmsi2 and related
	peepholes): Delete.
	* config/arm/arm-ldmstm.ml: New file.

Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c	(revision 161831)
+++ config/arm/arm.c	(working copy)
@@ -733,6 +733,12 @@ static const char * const arm_condition_
   "hi", "ls", "ge", "lt", "gt", "le", "al", "nv"
 };
 
+/* The register numbers in sequence, for passing to arm_gen_load_multiple.  */
+int arm_regs_in_sequence[] =
+{
+  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
+};
+
 #define ARM_LSL_NAME (TARGET_UNIFIED_ASM ? "lsl" : "asl")
 #define streq(string1, string2) (strcmp (string1, string2) == 0)
 
@@ -9182,13 +9188,29 @@ compute_offset_order (int nops, HOST_WID
   return true;
 }
 
-int
-load_multiple_sequence (rtx *operands, int nops, int *regs, int *base,
-			HOST_WIDE_INT *load_offset)
+/* Used to determine in a peephole whether a sequence of load
+   instructions can be changed into a load-multiple instruction.
+   NOPS is the number of separate load instructions we are examining.  The
+   first NOPS entries in OPERANDS are the destination registers, the
+   next NOPS entries are memory operands.  If this function is
+   successful, *BASE is set to the common base register of the memory
+   accesses; *LOAD_OFFSET is set to the first memory location's offset
+   from that base register.
+   REGS is an array filled in with the destination register numbers.
+   SAVED_ORDER (if nonnull) is an array filled in with an order that maps
+   insn numbers to an ascending order of loads.  If CHECK_REGS is true,
+   the sequence of registers in REGS matches the loads from ascending memory
+   locations, and the function verifies that the register numbers are
+   themselves ascending.  If CHECK_REGS is false, the register numbers
+   are stored in the order they are found in the operands.  */
+static int
+load_multiple_sequence (rtx *operands, int nops, int *regs, int *saved_order,
+			int *base, HOST_WIDE_INT *load_offset, bool check_regs)
 {
   int unsorted_regs[MAX_LDM_STM_OPS];
   HOST_WIDE_INT unsorted_offsets[MAX_LDM_STM_OPS];
   int order[MAX_LDM_STM_OPS];
+  rtx base_reg_rtx = NULL;
   int base_reg = -1;
   int i, ldm_case;
 
@@ -9232,13 +9254,16 @@ load_multiple_sequence (rtx *operands, i
 		  == CONST_INT)))
 	{
 	  if (i == 0)
-	    base_reg = REGNO (reg);
-	  else
 	    {
-	      if (base_reg != (int) REGNO (reg))
-		/* Not addressed from the same base register.  */
+	      base_reg = REGNO (reg);
+	      base_reg_rtx = reg;
+	      if (TARGET_THUMB1 && base_reg > LAST_LO_REGNUM)
 		return 0;
 	    }
+	  else if (base_reg != (int) REGNO (reg))
+	    /* Not addressed from the same base register.  */
+	    return 0;
+
 	  unsorted_regs[i] = (GET_CODE (operands[i]) == REG
 			      ? REGNO (operands[i])
 			      : REGNO (SUBREG_REG (operands[i])));
@@ -9246,7 +9271,9 @@ load_multiple_sequence (rtx *operands, i
 	  /* If it isn't an integer register, or if it overwrites the
 	     base register but isn't the last insn in the list, then
 	     we can't do this.  */
-	  if (unsorted_regs[i] < 0 || unsorted_regs[i] > 14
+	  if (unsorted_regs[i] < 0
+	      || (TARGET_THUMB1 && unsorted_regs[i] > LAST_LO_REGNUM)
+	      || unsorted_regs[i] > 14
 	      || (i != nops - 1 && unsorted_regs[i] == base_reg))
 	    return 0;
 
@@ -9264,26 +9291,34 @@ load_multiple_sequence (rtx *operands, i
      order[0] has been set to the lowest offset in the list.  Sort
      the offsets into order, verifying that they are adjacent, and
      check that the register numbers are ascending.  */
-  if (!compute_offset_order (nops, unsorted_offsets, order, unsorted_regs))
+  if (!compute_offset_order (nops, unsorted_offsets, order,
+			     check_regs ? unsorted_regs : NULL))
     return 0;
 
+  if (saved_order)
+    memcpy (saved_order, order, sizeof order);
+
   if (base)
     {
       *base = base_reg;
 
       for (i = 0; i < nops; i++)
-	regs[i] = unsorted_regs[order[i]];
+	regs[i] = unsorted_regs[check_regs ? order[i] : i];
 
       *load_offset = unsorted_offsets[order[0]];
     }
 
+  if (TARGET_THUMB1
+      && !peep2_reg_dead_p (nops, base_reg_rtx))
+    return 0;
+
   if (unsorted_offsets[order[0]] == 0)
     ldm_case = 1; /* ldmia */
   else if (TARGET_ARM && unsorted_offsets[order[0]] == 4)
     ldm_case = 2; /* ldmib */
   else if (TARGET_ARM && unsorted_offsets[order[nops - 1]] == 0)
     ldm_case = 3; /* ldmda */
-  else if (unsorted_offsets[order[nops - 1]] == -4)
+  else if (TARGET_32BIT && unsorted_offsets[order[nops - 1]] == -4)
     ldm_case = 4; /* ldmdb */
   else if (const_ok_for_arm (unsorted_offsets[order[0]])
 	   || const_ok_for_arm (-unsorted_offsets[order[0]]))
@@ -9299,72 +9334,34 @@ load_multiple_sequence (rtx *operands, i
   return ldm_case;
 }
 
-const char *
-emit_ldm_seq (rtx *operands, int nops)
-{
-  int regs[MAX_LDM_STM_OPS];
-  int base_reg;
-  HOST_WIDE_INT offset;
-  char buf[100];
-  int i;
-
-  switch (load_multiple_sequence (operands, nops, regs, &base_reg, &offset))
-    {
-    case 1:
-      strcpy (buf, "ldm%(ia%)\t");
-      break;
-
-    case 2:
-      strcpy (buf, "ldm%(ib%)\t");
-      break;
-
-    case 3:
-      strcpy (buf, "ldm%(da%)\t");
-      break;
-
-    case 4:
-      strcpy (buf, "ldm%(db%)\t");
-      break;
-
-    case 5:
-      if (offset >= 0)
-	sprintf (buf, "add%%?\t%s%s, %s%s, #%ld", REGISTER_PREFIX,
-		 reg_names[regs[0]], REGISTER_PREFIX, reg_names[base_reg],
-		 (long) offset);
-      else
-	sprintf (buf, "sub%%?\t%s%s, %s%s, #%ld", REGISTER_PREFIX,
-		 reg_names[regs[0]], REGISTER_PREFIX, reg_names[base_reg],
-		 (long) -offset);
-      output_asm_insn (buf, operands);
-      base_reg = regs[0];
-      strcpy (buf, "ldm%(ia%)\t");
-      break;
-
-    default:
-      gcc_unreachable ();
-    }
-
-  sprintf (buf + strlen (buf), "%s%s, {%s%s", REGISTER_PREFIX,
-	   reg_names[base_reg], REGISTER_PREFIX, reg_names[regs[0]]);
-
-  for (i = 1; i < nops; i++)
-    sprintf (buf + strlen (buf), ", %s%s", REGISTER_PREFIX,
-	     reg_names[regs[i]]);
-
-  strcat (buf, "}\t%@ phole ldm");
-
-  output_asm_insn (buf, operands);
-  return "";
-}
-
-int
-store_multiple_sequence (rtx *operands, int nops, int *regs, int *base,
-			 HOST_WIDE_INT * load_offset)
+/* Used to determine in a peephole whether a sequence of store instructions can
+   be changed into a store-multiple instruction.
+   NOPS is the number of separate store instructions we are examining.
+   NOPS_TOTAL is the total number of instructions recognized by the peephole
+   pattern.
+   The first NOPS entries in OPERANDS are the source registers, the next
+   NOPS entries are memory operands.  If this function is successful, *BASE is
+   set to the common base register of the memory accesses; *LOAD_OFFSET is set
+   to the first memory location's offset from that base register.  REGS is an
+   array filled in with the source register numbers, REG_RTXS (if nonnull) is
+   likewise filled with the corresponding rtx's.
+   SAVED_ORDER (if nonnull) is an array filled in with an order that maps insn
+   numbers to an ascending order of stores.
+   If CHECK_REGS is true, the sequence of registers in *REGS matches the stores
+   from ascending memory locations, and the function verifies that the register
+   numbers are themselves ascending.  If CHECK_REGS is false, the register
+   numbers are stored in the order they are found in the operands.  */
+static int
+store_multiple_sequence (rtx *operands, int nops, int nops_total,
+			 int *regs, rtx *reg_rtxs, int *saved_order, int *base,
+			 HOST_WIDE_INT *load_offset, bool check_regs)
 {
   int unsorted_regs[MAX_LDM_STM_OPS];
+  rtx unsorted_reg_rtxs[MAX_LDM_STM_OPS];
   HOST_WIDE_INT unsorted_offsets[MAX_LDM_STM_OPS];
   int order[MAX_LDM_STM_OPS];
   int base_reg = -1;
+  rtx base_reg_rtx = NULL;
   int i, stm_case;
 
   /* Can only handle up to MAX_LDM_STM_OPS insns at present, though could be
@@ -9406,17 +9403,27 @@ store_multiple_sequence (rtx *operands, 
 	      && (GET_CODE (offset = XEXP (XEXP (operands[nops + i], 0), 1))
 		  == CONST_INT)))
 	{
-	  unsorted_regs[i] = (GET_CODE (operands[i]) == REG
-			      ? REGNO (operands[i])
-			      : REGNO (SUBREG_REG (operands[i])));
+	  unsorted_reg_rtxs[i] = (GET_CODE (operands[i]) == REG
+				  ? operands[i] : SUBREG_REG (operands[i]));
+	  unsorted_regs[i] = REGNO (unsorted_reg_rtxs[i]);
+
 	  if (i == 0)
-	    base_reg = REGNO (reg);
+	    {
+	      base_reg = REGNO (reg);
+	      base_reg_rtx = reg;
+	      if (TARGET_THUMB1 && base_reg > LAST_LO_REGNUM)
+		return 0;
+	    }
 	  else if (base_reg != (int) REGNO (reg))
 	    /* Not addressed from the same base register.  */
 	    return 0;
 
 	  /* If it isn't an integer register, then we can't do this.  */
-	  if (unsorted_regs[i] < 0 || unsorted_regs[i] > 14)
+	  if (unsorted_regs[i] < 0
+	      || (TARGET_THUMB1 && unsorted_regs[i] > LAST_LO_REGNUM)
+	      || (TARGET_THUMB2 && unsorted_regs[i] == base_reg)
+	      || (TARGET_THUMB2 && unsorted_regs[i] == SP_REGNUM)
+	      || unsorted_regs[i] > 14)
 	    return 0;
 
 	  unsorted_offsets[i] = INTVAL (offset);
@@ -9433,26 +9440,38 @@ store_multiple_sequence (rtx *operands, 
      order[0] has been set to the lowest offset in the list.  Sort
      the offsets into order, verifying that they are adjacent, and
      check that the register numbers are ascending.  */
-  if (!compute_offset_order (nops, unsorted_offsets, order, unsorted_regs))
+  if (!compute_offset_order (nops, unsorted_offsets, order,
+			     check_regs ? unsorted_regs : NULL))
     return 0;
 
+  if (saved_order)
+    memcpy (saved_order, order, sizeof order);
+
   if (base)
     {
       *base = base_reg;
 
       for (i = 0; i < nops; i++)
-	regs[i] = unsorted_regs[order[i]];
+	{
+	  regs[i] = unsorted_regs[check_regs ? order[i] : i];
+	  if (reg_rtxs)
+	    reg_rtxs[i] = unsorted_reg_rtxs[check_regs ? order[i] : i];
+	}
 
       *load_offset = unsorted_offsets[order[0]];
     }
 
+  if (TARGET_THUMB1
+      && !peep2_reg_dead_p (nops_total, base_reg_rtx))
+    return 0;
+
   if (unsorted_offsets[order[0]] == 0)
     stm_case = 1; /* stmia */
   else if (TARGET_ARM && unsorted_offsets[order[0]] == 4)
     stm_case = 2; /* stmib */
   else if (TARGET_ARM && unsorted_offsets[order[nops - 1]] == 0)
     stm_case = 3; /* stmda */
-  else if (unsorted_offsets[order[nops - 1]] == -4)
+  else if (TARGET_32BIT && unsorted_offsets[order[nops - 1]] == -4)
     stm_case = 4; /* stmdb */
   else
     return 0;
@@ -9462,62 +9481,21 @@ store_multiple_sequence (rtx *operands, 
 
   return stm_case;
 }
-
-const char *
-emit_stm_seq (rtx *operands, int nops)
-{
-  int regs[MAX_LDM_STM_OPS];
-  int base_reg;
-  HOST_WIDE_INT offset;
-  char buf[100];
-  int i;
-
-  switch (store_multiple_sequence (operands, nops, regs, &base_reg, &offset))
-    {
-    case 1:
-      strcpy (buf, "stm%(ia%)\t");
-      break;
-
-    case 2:
-      strcpy (buf, "stm%(ib%)\t");
-      break;
-
-    case 3:
-      strcpy (buf, "stm%(da%)\t");
-      break;
-
-    case 4:
-      strcpy (buf, "stm%(db%)\t");
-      break;
-
-    default:
-      gcc_unreachable ();
-    }
-
-  sprintf (buf + strlen (buf), "%s%s, {%s%s", REGISTER_PREFIX,
-	   reg_names[base_reg], REGISTER_PREFIX, reg_names[regs[0]]);
-
-  for (i = 1; i < nops; i++)
-    sprintf (buf + strlen (buf), ", %s%s", REGISTER_PREFIX,
-	     reg_names[regs[i]]);
-
-  strcat (buf, "}\t%@ phole stm");
-
-  output_asm_insn (buf, operands);
-  return "";
-}
 \f
 /* Routines for use in generating RTL.  */
 
-rtx
-arm_gen_load_multiple (int base_regno, int count, rtx from, int up,
-		       int write_back, rtx basemem, HOST_WIDE_INT *offsetp)
+/* Generate a load-multiple instruction.  COUNT is the number of loads in
+   the instruction; REGS and MEMS are arrays containing the operands.
+   BASEREG is the base register to be used in addressing the memory operands.
+   WBACK_OFFSET is nonzero if the instruction should update the base
+   register.  */
+
+static rtx
+arm_gen_load_multiple_1 (int count, int *regs, rtx *mems, rtx basereg,
+			 HOST_WIDE_INT wback_offset)
 {
-  HOST_WIDE_INT offset = *offsetp;
   int i = 0, j;
   rtx result;
-  int sign = up ? 1 : -1;
-  rtx mem, addr;
 
   /* XScale has load-store double instructions, but they have stricter
      alignment requirements than load-store multiple, so we cannot
@@ -9554,18 +9532,10 @@ arm_gen_load_multiple (int base_regno, i
       start_sequence ();
 
       for (i = 0; i < count; i++)
-	{
-	  addr = plus_constant (from, i * 4 * sign);
-	  mem = adjust_automodify_address (basemem, SImode, addr, offset);
-	  emit_move_insn (gen_rtx_REG (SImode, base_regno + i), mem);
-	  offset += 4 * sign;
-	}
+	emit_move_insn (gen_rtx_REG (SImode, regs[i]), mems[i]);
 
-      if (write_back)
-	{
-	  emit_move_insn (from, plus_constant (from, count * 4 * sign));
-	  *offsetp = offset;
-	}
+      if (wback_offset != 0)
+	emit_move_insn (basereg, plus_constant (basereg, wback_offset));
 
       seq = get_insns ();
       end_sequence ();
@@ -9574,41 +9544,40 @@ arm_gen_load_multiple (int base_regno, i
     }
 
   result = gen_rtx_PARALLEL (VOIDmode,
-			     rtvec_alloc (count + (write_back ? 1 : 0)));
-  if (write_back)
+			     rtvec_alloc (count + (wback_offset != 0 ? 1 : 0)));
+  if (wback_offset != 0)
     {
       XVECEXP (result, 0, 0)
-	= gen_rtx_SET (VOIDmode, from, plus_constant (from, count * 4 * sign));
+	= gen_rtx_SET (VOIDmode, basereg,
+		       plus_constant (basereg, wback_offset));
       i = 1;
       count++;
     }
 
   for (j = 0; i < count; i++, j++)
-    {
-      addr = plus_constant (from, j * 4 * sign);
-      mem = adjust_automodify_address_nv (basemem, SImode, addr, offset);
-      XVECEXP (result, 0, i)
-	= gen_rtx_SET (VOIDmode, gen_rtx_REG (SImode, base_regno + j), mem);
-      offset += 4 * sign;
-    }
-
-  if (write_back)
-    *offsetp = offset;
+    XVECEXP (result, 0, i)
+      = gen_rtx_SET (VOIDmode, gen_rtx_REG (SImode, regs[j]), mems[j]);
 
   return result;
 }
 
-rtx
-arm_gen_store_multiple (int base_regno, int count, rtx to, int up,
-			int write_back, rtx basemem, HOST_WIDE_INT *offsetp)
+/* Generate a store-multiple instruction.  COUNT is the number of stores in
+   the instruction; REGS and MEMS are arrays containing the operands.
+   BASEREG is the base register to be used in addressing the memory operands.
+   WBACK_OFFSET is nonzero if the instruction should update the base
+   register.  */
+
+static rtx
+arm_gen_store_multiple_1 (int count, int *regs, rtx *mems, rtx basereg,
+			  HOST_WIDE_INT wback_offset)
 {
-  HOST_WIDE_INT offset = *offsetp;
   int i = 0, j;
   rtx result;
-  int sign = up ? 1 : -1;
-  rtx mem, addr;
 
-  /* See arm_gen_load_multiple for discussion of
+  if (GET_CODE (basereg) == PLUS)
+    basereg = XEXP (basereg, 0);
+
+  /* See arm_gen_load_multiple_1 for discussion of
      the pros/cons of ldm/stm usage for XScale.  */
   if (arm_tune_xscale && count <= 2 && ! optimize_size)
     {
@@ -9617,18 +9586,10 @@ arm_gen_store_multiple (int base_regno, 
       start_sequence ();
 
       for (i = 0; i < count; i++)
-	{
-	  addr = plus_constant (to, i * 4 * sign);
-	  mem = adjust_automodify_address (basemem, SImode, addr, offset);
-	  emit_move_insn (mem, gen_rtx_REG (SImode, base_regno + i));
-	  offset += 4 * sign;
-	}
+	emit_move_insn (mems[i], gen_rtx_REG (SImode, regs[i]));
 
-      if (write_back)
-	{
-	  emit_move_insn (to, plus_constant (to, count * 4 * sign));
-	  *offsetp = offset;
-	}
+      if (wback_offset != 0)
+	emit_move_insn (basereg, plus_constant (basereg, wback_offset));
 
       seq = get_insns ();
       end_sequence ();
@@ -9637,29 +9598,319 @@ arm_gen_store_multiple (int base_regno, 
     }
 
   result = gen_rtx_PARALLEL (VOIDmode,
-			     rtvec_alloc (count + (write_back ? 1 : 0)));
-  if (write_back)
+			     rtvec_alloc (count + (wback_offset != 0 ? 1 : 0)));
+  if (wback_offset != 0)
     {
       XVECEXP (result, 0, 0)
-	= gen_rtx_SET (VOIDmode, to,
-		       plus_constant (to, count * 4 * sign));
+	= gen_rtx_SET (VOIDmode, basereg,
+		       plus_constant (basereg, wback_offset));
       i = 1;
       count++;
     }
 
   for (j = 0; i < count; i++, j++)
+    XVECEXP (result, 0, i)
+      = gen_rtx_SET (VOIDmode, mems[j], gen_rtx_REG (SImode, regs[j]));
+
+  return result;
+}
+
+/* Generate either a load-multiple or a store-multiple instruction.  This
+   function can be used in situations where we can start with a single MEM
+   rtx and adjust its address upwards.
+   COUNT is the number of operations in the instruction, not counting a
+   possible update of the base register.  REGS is an array containing the
+   register operands.
+   BASEREG is the base register to be used in addressing the memory operands,
+   which are constructed from BASEMEM.
+   WRITE_BACK specifies whether the generated instruction should include an
+   update of the base register.
+   OFFSETP is used to pass an offset to and from this function; this offset
+   is not used when constructing the address (instead BASEMEM should have an
+   appropriate offset in its address), it is used only for setting
+   MEM_OFFSET.  It is updated only if WRITE_BACK is true.  */
+
+static rtx
+arm_gen_multiple_op (bool is_load, int *regs, int count, rtx basereg,
+		     bool write_back, rtx basemem, HOST_WIDE_INT *offsetp)
+{
+  rtx mems[MAX_LDM_STM_OPS];
+  HOST_WIDE_INT offset = *offsetp;
+  int i;
+
+  gcc_assert (count <= MAX_LDM_STM_OPS);
+
+  if (GET_CODE (basereg) == PLUS)
+    basereg = XEXP (basereg, 0);
+
+  for (i = 0; i < count; i++)
     {
-      addr = plus_constant (to, j * 4 * sign);
-      mem = adjust_automodify_address_nv (basemem, SImode, addr, offset);
-      XVECEXP (result, 0, i)
-	= gen_rtx_SET (VOIDmode, mem, gen_rtx_REG (SImode, base_regno + j));
-      offset += 4 * sign;
+      rtx addr = plus_constant (basereg, i * 4);
+      mems[i] = adjust_automodify_address_nv (basemem, SImode, addr, offset);
+      offset += 4;
     }
 
   if (write_back)
     *offsetp = offset;
 
-  return result;
+  if (is_load)
+    return arm_gen_load_multiple_1 (count, regs, mems, basereg,
+				    write_back ? 4 * count : 0);
+  else
+    return arm_gen_store_multiple_1 (count, regs, mems, basereg,
+				     write_back ? 4 * count : 0);
+}
+
+rtx
+arm_gen_load_multiple (int *regs, int count, rtx basereg, int write_back,
+		       rtx basemem, HOST_WIDE_INT *offsetp)
+{
+  return arm_gen_multiple_op (TRUE, regs, count, basereg, write_back, basemem,
+			      offsetp);
+}
+
+rtx
+arm_gen_store_multiple (int *regs, int count, rtx basereg, int write_back,
+			rtx basemem, HOST_WIDE_INT *offsetp)
+{
+  return arm_gen_multiple_op (FALSE, regs, count, basereg, write_back, basemem,
+			      offsetp);
+}
+
+/* Called from a peephole2 expander to turn a sequence of loads into an
+   LDM instruction.  OPERANDS are the operands found by the peephole matcher;
+   NOPS indicates how many separate loads we are trying to combine.  SORT_REGS
+   is true if we can reorder the registers because they are used commutatively
+   subsequently.
+   Returns true iff we could generate a new instruction.  */
+
+bool
+gen_ldm_seq (rtx *operands, int nops, bool sort_regs)
+{
+  int regs[MAX_LDM_STM_OPS], mem_order[MAX_LDM_STM_OPS];
+  rtx mems[MAX_LDM_STM_OPS];
+  int i, j, base_reg;
+  rtx base_reg_rtx;
+  HOST_WIDE_INT offset;
+  int write_back = FALSE;
+  int ldm_case;
+  rtx addr;
+
+  ldm_case = load_multiple_sequence (operands, nops, regs, mem_order,
+				     &base_reg, &offset, !sort_regs);
+
+  if (ldm_case == 0)
+    return false;
+
+  if (sort_regs)
+    for (i = 0; i < nops - 1; i++)
+      for (j = i + 1; j < nops; j++)
+	if (regs[i] > regs[j])
+	  {
+	    int t = regs[i];
+	    regs[i] = regs[j];
+	    regs[j] = t;
+	  }
+  base_reg_rtx = gen_rtx_REG (Pmode, base_reg);
+
+  if (TARGET_THUMB1)
+    {
+      gcc_assert (peep2_reg_dead_p (nops, base_reg_rtx));
+      gcc_assert (ldm_case == 1 || ldm_case == 5);
+      write_back = TRUE;
+    }
+
+  if (ldm_case == 5)
+    {
+      rtx newbase = TARGET_THUMB1 ? base_reg_rtx : gen_rtx_REG (SImode, regs[0]);
+      emit_insn (gen_addsi3 (newbase, base_reg_rtx, GEN_INT (offset)));
+      offset = 0;
+      if (!TARGET_THUMB1)
+	{
+	  base_reg = regs[0];
+	  base_reg_rtx = newbase;
+	}
+    }
+
+  for (i = 0; i < nops; i++)
+    {
+      addr = plus_constant (base_reg_rtx, offset + i * 4);
+      mems[i] = adjust_automodify_address_nv (operands[nops + mem_order[i]],
+					      SImode, addr, 0);
+    }
+  emit_insn (arm_gen_load_multiple_1 (nops, regs, mems, base_reg_rtx,
+				      write_back ? offset + i * 4 : 0));
+  return true;
+}
+
+/* Called from a peephole2 expander to turn a sequence of stores into an
+   STM instruction.  OPERANDS are the operands found by the peephole matcher;
+   NOPS indicates how many separate stores we are trying to combine.
+   Returns true iff we could generate a new instruction.  */
+
+bool
+gen_stm_seq (rtx *operands, int nops)
+{
+  int i;
+  int regs[MAX_LDM_STM_OPS], mem_order[MAX_LDM_STM_OPS];
+  rtx mems[MAX_LDM_STM_OPS];
+  int base_reg;
+  rtx base_reg_rtx;
+  HOST_WIDE_INT offset;
+  int write_back = FALSE;
+  int stm_case;
+  rtx addr;
+  bool base_reg_dies;
+
+  stm_case = store_multiple_sequence (operands, nops, nops, regs, NULL,
+				      mem_order, &base_reg, &offset, true);
+
+  if (stm_case == 0)
+    return false;
+
+  base_reg_rtx = gen_rtx_REG (Pmode, base_reg);
+
+  base_reg_dies = peep2_reg_dead_p (nops, base_reg_rtx);
+  if (TARGET_THUMB1)
+    {
+      gcc_assert (base_reg_dies);
+      write_back = TRUE;
+    }
+
+  if (stm_case == 5)
+    {
+      gcc_assert (base_reg_dies);
+      emit_insn (gen_addsi3 (base_reg_rtx, base_reg_rtx, GEN_INT (offset)));
+      offset = 0;
+    }
+
+  addr = plus_constant (base_reg_rtx, offset);
+
+  for (i = 0; i < nops; i++)
+    {
+      addr = plus_constant (base_reg_rtx, offset + i * 4);
+      mems[i] = adjust_automodify_address_nv (operands[nops + mem_order[i]],
+					      SImode, addr, 0);
+    }
+  emit_insn (arm_gen_store_multiple_1 (nops, regs, mems, base_reg_rtx,
+				       write_back ? offset + i * 4 : 0));
+  return true;
+}
+
+/* Called from a peephole2 expander to turn a sequence of stores that are
+   preceded by constant loads into an STM instruction.  OPERANDS are the
+   operands found by the peephole matcher; NOPS indicates how many
+   separate stores we are trying to combine; there are 2 * NOPS
+   instructions in the peephole.
+   Returns true iff we could generate a new instruction.  */
+
+bool
+gen_const_stm_seq (rtx *operands, int nops)
+{
+  int regs[MAX_LDM_STM_OPS], sorted_regs[MAX_LDM_STM_OPS];
+  int reg_order[MAX_LDM_STM_OPS], mem_order[MAX_LDM_STM_OPS];
+  rtx reg_rtxs[MAX_LDM_STM_OPS], orig_reg_rtxs[MAX_LDM_STM_OPS];
+  rtx mems[MAX_LDM_STM_OPS];
+  int base_reg;
+  rtx base_reg_rtx;
+  HOST_WIDE_INT offset;
+  int write_back = FALSE;
+  int stm_case;
+  rtx addr;
+  bool base_reg_dies;
+  int i, j;
+  HARD_REG_SET allocated;
+
+  stm_case = store_multiple_sequence (operands, nops, 2 * nops, regs, reg_rtxs,
+				      mem_order, &base_reg, &offset, false);
+
+  if (stm_case == 0)
+    return false;
+
+  memcpy (orig_reg_rtxs, reg_rtxs, sizeof orig_reg_rtxs);
+
+  /* If the same register is used more than once, try to find a free
+     register.  */
+  CLEAR_HARD_REG_SET (allocated);
+  for (i = 0; i < nops; i++)
+    {
+      for (j = i + 1; j < nops; j++)
+	if (regs[i] == regs[j])
+	  {
+	    rtx t = peep2_find_free_register (0, nops * 2,
+					      TARGET_THUMB1 ? "l" : "r",
+					      SImode, &allocated);
+	    if (t == NULL_RTX)
+	      return false;
+	    reg_rtxs[i] = t;
+	    regs[i] = REGNO (t);
+	  }
+    }
+
+  /* Compute an ordering that maps the register numbers to an ascending
+     sequence.  */
+  reg_order[0] = 0;
+  for (i = 0; i < nops; i++)
+    if (regs[i] < regs[reg_order[0]])
+      reg_order[0] = i;
+
+  for (i = 1; i < nops; i++)
+    {
+      int this_order = reg_order[i - 1];
+      for (j = 0; j < nops; j++)
+	if (regs[j] > regs[reg_order[i - 1]]
+	    && (this_order == reg_order[i - 1]
+		|| regs[j] < regs[this_order]))
+	  this_order = j;
+      reg_order[i] = this_order;
+    }
+
+  /* Ensure that registers that must be live after the instruction end
+     up with the correct value.  */
+  for (i = 0; i < nops; i++)
+    {
+      int this_order = reg_order[i];
+      if ((this_order != mem_order[i]
+	   || orig_reg_rtxs[this_order] != reg_rtxs[this_order])
+	  && !peep2_reg_dead_p (nops * 2, orig_reg_rtxs[this_order]))
+	return false;
+    }
+
+  /* Load the constants.  */
+  for (i = 0; i < nops; i++)
+    {
+      rtx op = operands[2 * nops + mem_order[i]];
+      sorted_regs[i] = regs[reg_order[i]];
+      emit_move_insn (reg_rtxs[reg_order[i]], op);
+    }
+
+  base_reg_rtx = gen_rtx_REG (Pmode, base_reg);
+
+  base_reg_dies = peep2_reg_dead_p (nops * 2, base_reg_rtx);
+  if (TARGET_THUMB1)
+    {
+      gcc_assert (base_reg_dies);
+      write_back = TRUE;
+    }
+
+  if (stm_case == 5)
+    {
+      gcc_assert (base_reg_dies);
+      emit_insn (gen_addsi3 (base_reg_rtx, base_reg_rtx, GEN_INT (offset)));
+      offset = 0;
+    }
+
+  for (i = 0; i < nops; i++)
+    {
+      addr = plus_constant (base_reg_rtx, offset + i * 4);
+      mems[i] = adjust_automodify_address_nv (operands[nops + mem_order[i]],
+					      SImode, addr, 0);
+    }
+  emit_insn (arm_gen_store_multiple_1 (nops, sorted_regs, mems, base_reg_rtx,
+				       write_back ? offset + i * 4 : 0));
+  return true;
 }
 
 int
@@ -9695,20 +9946,21 @@ arm_gen_movmemqi (rtx *operands)
   for (i = 0; in_words_to_go >= 2; i+=4)
     {
       if (in_words_to_go > 4)
-	emit_insn (arm_gen_load_multiple (0, 4, src, TRUE, TRUE,
-					  srcbase, &srcoffset));
+	emit_insn (arm_gen_load_multiple (arm_regs_in_sequence, 4, src,
+					  TRUE, srcbase, &srcoffset));
       else
-	emit_insn (arm_gen_load_multiple (0, in_words_to_go, src, TRUE,
-					  FALSE, srcbase, &srcoffset));
+	emit_insn (arm_gen_load_multiple (arm_regs_in_sequence, in_words_to_go,
+					  src, FALSE, srcbase,
+					  &srcoffset));
 
       if (out_words_to_go)
 	{
 	  if (out_words_to_go > 4)
-	    emit_insn (arm_gen_store_multiple (0, 4, dst, TRUE, TRUE,
-					       dstbase, &dstoffset));
+	    emit_insn (arm_gen_store_multiple (arm_regs_in_sequence, 4, dst,
+					       TRUE, dstbase, &dstoffset));
 	  else if (out_words_to_go != 1)
-	    emit_insn (arm_gen_store_multiple (0, out_words_to_go,
-					       dst, TRUE,
+	    emit_insn (arm_gen_store_multiple (arm_regs_in_sequence,
+					       out_words_to_go, dst,
 					       (last_bytes == 0
 						? FALSE : TRUE),
 					       dstbase, &dstoffset));
Index: config/arm/arm.h
===================================================================
--- config/arm/arm.h	(revision 161824)
+++ config/arm/arm.h	(working copy)
@@ -1087,6 +1087,9 @@ extern int arm_structure_size_boundary;
   ((MODE) == TImode || (MODE) == EImode || (MODE) == OImode \
    || (MODE) == CImode || (MODE) == XImode)
 
+/* The register numbers in sequence, for passing to arm_gen_load_multiple.  */
+extern int arm_regs_in_sequence[];
+
 /* The order in which register should be allocated.  It is good to use ip
    since no saving is required (though calls clobber it) and it never contains
    function parameters.  It is quite good to use lr since other calls may
Index: config/arm/arm-protos.h
===================================================================
--- config/arm/arm-protos.h	(revision 161824)
+++ config/arm/arm-protos.h	(working copy)
@@ -98,14 +98,11 @@ extern int symbol_mentioned_p (rtx);
 extern int label_mentioned_p (rtx);
 extern RTX_CODE minmax_code (rtx);
 extern int adjacent_mem_locations (rtx, rtx);
-extern int load_multiple_sequence (rtx *, int, int *, int *, HOST_WIDE_INT *);
-extern const char *emit_ldm_seq (rtx *, int);
-extern int store_multiple_sequence (rtx *, int, int *, int *, HOST_WIDE_INT *);
-extern const char * emit_stm_seq (rtx *, int);
-extern rtx arm_gen_load_multiple (int, int, rtx, int, int,
-				  rtx, HOST_WIDE_INT *);
-extern rtx arm_gen_store_multiple (int, int, rtx, int, int,
-				   rtx, HOST_WIDE_INT *);
+extern bool gen_ldm_seq (rtx *, int, bool);
+extern bool gen_stm_seq (rtx *, int);
+extern bool gen_const_stm_seq (rtx *, int);
+extern rtx arm_gen_load_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
+extern rtx arm_gen_store_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
 extern int arm_gen_movmemqi (rtx *);
 extern enum machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx);
 extern enum machine_mode arm_select_dominance_cc_mode (rtx, rtx,
Index: config/arm/ldmstm.md
===================================================================
--- config/arm/ldmstm.md	(revision 0)
+++ config/arm/ldmstm.md	(revision 0)
@@ -0,0 +1,1191 @@
+/* ARM ldm/stm instruction patterns.  This file was automatically generated
+   using arm-ldmstm.ml.  Please do not edit manually.
+
+   Copyright (C) 2010 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+(define_insn "*ldm4_ia"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_operand:SI 5 "s_register_operand" "rk")))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 8))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 12))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
+  "ldm%(ia%)\t%5, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb_ldm4_ia"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_operand:SI 5 "s_register_operand" "l")))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 8))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 12))))])]
+  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 4"
+  "ldm%(ia%)\t%5, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")])
+
+(define_insn "*ldm4_ia_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 5) (const_int 16)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 5)))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 8))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 12))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 5"
+  "ldm%(ia%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb_ldm4_ia_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&l")
+          (plus:SI (match_dup 5) (const_int 16)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 5)))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 8))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 12))))])]
+  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 5"
+  "ldm%(ia%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")])
+
+(define_insn "*stm4_ia"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (match_operand:SI 5 "s_register_operand" "rk"))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 8)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 12)))
+          (match_operand:SI 4 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
+  "stm%(ia%)\t%5, {%1, %2, %3, %4}"
+  [(set_attr "type" "store4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm4_ia_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 5) (const_int 16)))
+     (set (mem:SI (match_dup 5))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 8)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 12)))
+          (match_operand:SI 4 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 5"
+  "stm%(ia%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "store4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb_stm4_ia_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&l")
+          (plus:SI (match_dup 5) (const_int 16)))
+     (set (mem:SI (match_dup 5))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 8)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 12)))
+          (match_operand:SI 4 "arm_hard_register_operand" ""))])]
+  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 5"
+  "stm%(ia%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "store4")])
+
+(define_insn "*ldm4_ib"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_operand:SI 5 "s_register_operand" "rk")
+                  (const_int 4))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 8))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 12))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 16))))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 4"
+  "ldm%(ib%)\t%5, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm4_ib_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 5) (const_int 16)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 4))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 8))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 12))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int 16))))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 5"
+  "ldm%(ib%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm4_ib"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (plus:SI (match_operand:SI 5 "s_register_operand" "rk") (const_int 4)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 12)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 16)))
+          (match_operand:SI 4 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 4"
+  "stm%(ib%)\t%5, {%1, %2, %3, %4}"
+  [(set_attr "type" "store4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm4_ib_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 5) (const_int 16)))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 4)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 12)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int 16)))
+          (match_operand:SI 4 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 5"
+  "stm%(ib%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "store4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm4_da"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_operand:SI 5 "s_register_operand" "rk")
+                  (const_int -12))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -8))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -4))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 5)))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 4"
+  "ldm%(da%)\t%5, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm4_da_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 5) (const_int -16)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -12))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -8))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -4))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 5)))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 5"
+  "ldm%(da%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm4_da"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (plus:SI (match_operand:SI 5 "s_register_operand" "rk") (const_int -12)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -4)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))
+     (set (mem:SI (match_dup 5))
+          (match_operand:SI 4 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 4"
+  "stm%(da%)\t%5, {%1, %2, %3, %4}"
+  [(set_attr "type" "store4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm4_da_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 5) (const_int -16)))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -12)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -4)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))
+     (set (mem:SI (match_dup 5))
+          (match_operand:SI 4 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 5"
+  "stm%(da%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "store4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm4_db"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_operand:SI 5 "s_register_operand" "rk")
+                  (const_int -16))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -12))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -8))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -4))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
+  "ldm%(db%)\t%5, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm4_db_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 5) (const_int -16)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -16))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -12))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -8))))
+     (set (match_operand:SI 4 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 5)
+                  (const_int -4))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 5"
+  "ldm%(db%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "load4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm4_db"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (plus:SI (match_operand:SI 5 "s_register_operand" "rk") (const_int -16)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -12)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -8)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -4)))
+          (match_operand:SI 4 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
+  "stm%(db%)\t%5, {%1, %2, %3, %4}"
+  [(set_attr "type" "store4")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm4_db_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 5 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 5) (const_int -16)))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -16)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -12)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -8)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 5) (const_int -4)))
+          (match_operand:SI 4 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 5"
+  "stm%(db%)\t%5!, {%1, %2, %3, %4}"
+  [(set_attr "type" "store4")
+   (set_attr "predicable" "yes")])
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 4 "memory_operand" ""))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 5 "memory_operand" ""))
+   (set (match_operand:SI 2 "s_register_operand" "")
+        (match_operand:SI 6 "memory_operand" ""))
+   (set (match_operand:SI 3 "s_register_operand" "")
+        (match_operand:SI 7 "memory_operand" ""))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_ldm_seq (operands, 4, false))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 4 "memory_operand" ""))
+   (parallel
+    [(set (match_operand:SI 1 "s_register_operand" "")
+          (match_operand:SI 5 "memory_operand" ""))
+     (set (match_operand:SI 2 "s_register_operand" "")
+          (match_operand:SI 6 "memory_operand" ""))
+     (set (match_operand:SI 3 "s_register_operand" "")
+          (match_operand:SI 7 "memory_operand" ""))])]
+  ""
+  [(const_int 0)]
+{
+  if (gen_ldm_seq (operands, 4, false))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 8 "const_int_operand" ""))
+   (set (match_operand:SI 4 "memory_operand" "")
+        (match_dup 0))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 9 "const_int_operand" ""))
+   (set (match_operand:SI 5 "memory_operand" "")
+        (match_dup 1))
+   (set (match_operand:SI 2 "s_register_operand" "")
+        (match_operand:SI 10 "const_int_operand" ""))
+   (set (match_operand:SI 6 "memory_operand" "")
+        (match_dup 2))
+   (set (match_operand:SI 3 "s_register_operand" "")
+        (match_operand:SI 11 "const_int_operand" ""))
+   (set (match_operand:SI 7 "memory_operand" "")
+        (match_dup 3))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_const_stm_seq (operands, 4))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 8 "const_int_operand" ""))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 9 "const_int_operand" ""))
+   (set (match_operand:SI 2 "s_register_operand" "")
+        (match_operand:SI 10 "const_int_operand" ""))
+   (set (match_operand:SI 3 "s_register_operand" "")
+        (match_operand:SI 11 "const_int_operand" ""))
+   (set (match_operand:SI 4 "memory_operand" "")
+        (match_dup 0))
+   (set (match_operand:SI 5 "memory_operand" "")
+        (match_dup 1))
+   (set (match_operand:SI 6 "memory_operand" "")
+        (match_dup 2))
+   (set (match_operand:SI 7 "memory_operand" "")
+        (match_dup 3))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_const_stm_seq (operands, 4))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 4 "memory_operand" "")
+        (match_operand:SI 0 "s_register_operand" ""))
+   (set (match_operand:SI 5 "memory_operand" "")
+        (match_operand:SI 1 "s_register_operand" ""))
+   (set (match_operand:SI 6 "memory_operand" "")
+        (match_operand:SI 2 "s_register_operand" ""))
+   (set (match_operand:SI 7 "memory_operand" "")
+        (match_operand:SI 3 "s_register_operand" ""))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_stm_seq (operands, 4))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_insn "*ldm3_ia"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_operand:SI 4 "s_register_operand" "rk")))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 8))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
+  "ldm%(ia%)\t%4, {%1, %2, %3}"
+  [(set_attr "type" "load3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb_ldm3_ia"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_operand:SI 4 "s_register_operand" "l")))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 8))))])]
+  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 3"
+  "ldm%(ia%)\t%4, {%1, %2, %3}"
+  [(set_attr "type" "load3")])
+
+(define_insn "*ldm3_ia_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 4) (const_int 12)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 4)))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 8))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
+  "ldm%(ia%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "load3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb_ldm3_ia_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&l")
+          (plus:SI (match_dup 4) (const_int 12)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 4)))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 8))))])]
+  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 4"
+  "ldm%(ia%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "load3")])
+
+(define_insn "*stm3_ia"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (match_operand:SI 4 "s_register_operand" "rk"))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 8)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
+  "stm%(ia%)\t%4, {%1, %2, %3}"
+  [(set_attr "type" "store3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm3_ia_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 4) (const_int 12)))
+     (set (mem:SI (match_dup 4))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 8)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
+  "stm%(ia%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "store3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb_stm3_ia_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&l")
+          (plus:SI (match_dup 4) (const_int 12)))
+     (set (mem:SI (match_dup 4))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 8)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))])]
+  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 4"
+  "stm%(ia%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "store3")])
+
+(define_insn "*ldm3_ib"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_operand:SI 4 "s_register_operand" "rk")
+                  (const_int 4))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 8))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 12))))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 3"
+  "ldm%(ib%)\t%4, {%1, %2, %3}"
+  [(set_attr "type" "load3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm3_ib_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 4) (const_int 12)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 4))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 8))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int 12))))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 4"
+  "ldm%(ib%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "load3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm3_ib"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (plus:SI (match_operand:SI 4 "s_register_operand" "rk") (const_int 4)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 12)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 3"
+  "stm%(ib%)\t%4, {%1, %2, %3}"
+  [(set_attr "type" "store3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm3_ib_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 4) (const_int 12)))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 4)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int 12)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 4"
+  "stm%(ib%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "store3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm3_da"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_operand:SI 4 "s_register_operand" "rk")
+                  (const_int -8))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int -4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 4)))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 3"
+  "ldm%(da%)\t%4, {%1, %2, %3}"
+  [(set_attr "type" "load3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm3_da_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 4) (const_int -12)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int -8))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int -4))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 4)))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 4"
+  "ldm%(da%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "load3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm3_da"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (plus:SI (match_operand:SI 4 "s_register_operand" "rk") (const_int -8)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int -4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (match_dup 4))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 3"
+  "stm%(da%)\t%4, {%1, %2, %3}"
+  [(set_attr "type" "store3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm3_da_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 4) (const_int -12)))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int -8)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int -4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (match_dup 4))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 4"
+  "stm%(da%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "store3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm3_db"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_operand:SI 4 "s_register_operand" "rk")
+                  (const_int -12))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int -8))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int -4))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
+  "ldm%(db%)\t%4, {%1, %2, %3}"
+  [(set_attr "type" "load3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm3_db_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 4) (const_int -12)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int -12))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int -8))))
+     (set (match_operand:SI 3 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 4)
+                  (const_int -4))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
+  "ldm%(db%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "load3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm3_db"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (plus:SI (match_operand:SI 4 "s_register_operand" "rk") (const_int -12)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int -8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int -4)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
+  "stm%(db%)\t%4, {%1, %2, %3}"
+  [(set_attr "type" "store3")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm3_db_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 4 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 4) (const_int -12)))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int -12)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int -8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 4) (const_int -4)))
+          (match_operand:SI 3 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
+  "stm%(db%)\t%4!, {%1, %2, %3}"
+  [(set_attr "type" "store3")
+   (set_attr "predicable" "yes")])
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 3 "memory_operand" ""))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 4 "memory_operand" ""))
+   (set (match_operand:SI 2 "s_register_operand" "")
+        (match_operand:SI 5 "memory_operand" ""))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_ldm_seq (operands, 3, false))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 3 "memory_operand" ""))
+   (parallel
+    [(set (match_operand:SI 1 "s_register_operand" "")
+          (match_operand:SI 4 "memory_operand" ""))
+     (set (match_operand:SI 2 "s_register_operand" "")
+          (match_operand:SI 5 "memory_operand" ""))])]
+  ""
+  [(const_int 0)]
+{
+  if (gen_ldm_seq (operands, 3, false))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 6 "const_int_operand" ""))
+   (set (match_operand:SI 3 "memory_operand" "")
+        (match_dup 0))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 7 "const_int_operand" ""))
+   (set (match_operand:SI 4 "memory_operand" "")
+        (match_dup 1))
+   (set (match_operand:SI 2 "s_register_operand" "")
+        (match_operand:SI 8 "const_int_operand" ""))
+   (set (match_operand:SI 5 "memory_operand" "")
+        (match_dup 2))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_const_stm_seq (operands, 3))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 6 "const_int_operand" ""))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 7 "const_int_operand" ""))
+   (set (match_operand:SI 2 "s_register_operand" "")
+        (match_operand:SI 8 "const_int_operand" ""))
+   (set (match_operand:SI 3 "memory_operand" "")
+        (match_dup 0))
+   (set (match_operand:SI 4 "memory_operand" "")
+        (match_dup 1))
+   (set (match_operand:SI 5 "memory_operand" "")
+        (match_dup 2))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_const_stm_seq (operands, 3))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 3 "memory_operand" "")
+        (match_operand:SI 0 "s_register_operand" ""))
+   (set (match_operand:SI 4 "memory_operand" "")
+        (match_operand:SI 1 "s_register_operand" ""))
+   (set (match_operand:SI 5 "memory_operand" "")
+        (match_operand:SI 2 "s_register_operand" ""))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_stm_seq (operands, 3))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_insn "*ldm2_ia"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_operand:SI 3 "s_register_operand" "rk")))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int 4))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
+  "ldm%(ia%)\t%3, {%1, %2}"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb_ldm2_ia"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_operand:SI 3 "s_register_operand" "l")))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int 4))))])]
+  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 2"
+  "ldm%(ia%)\t%3, {%1, %2}"
+  [(set_attr "type" "load2")])
+
+(define_insn "*ldm2_ia_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 3) (const_int 8)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 3)))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int 4))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
+  "ldm%(ia%)\t%3!, {%1, %2}"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb_ldm2_ia_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&l")
+          (plus:SI (match_dup 3) (const_int 8)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 3)))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int 4))))])]
+  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 3"
+  "ldm%(ia%)\t%3!, {%1, %2}"
+  [(set_attr "type" "load2")])
+
+(define_insn "*stm2_ia"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (match_operand:SI 3 "s_register_operand" "rk"))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
+  "stm%(ia%)\t%3, {%1, %2}"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm2_ia_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 3) (const_int 8)))
+     (set (mem:SI (match_dup 3))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
+  "stm%(ia%)\t%3!, {%1, %2}"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb_stm2_ia_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&l")
+          (plus:SI (match_dup 3) (const_int 8)))
+     (set (mem:SI (match_dup 3))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 3"
+  "stm%(ia%)\t%3!, {%1, %2}"
+  [(set_attr "type" "store2")])
+
+(define_insn "*ldm2_ib"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_operand:SI 3 "s_register_operand" "rk")
+                  (const_int 4))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int 8))))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
+  "ldm%(ib%)\t%3, {%1, %2}"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm2_ib_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 3) (const_int 8)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int 4))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int 8))))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 3"
+  "ldm%(ib%)\t%3!, {%1, %2}"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm2_ib"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (plus:SI (match_operand:SI 3 "s_register_operand" "rk") (const_int 4)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int 8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
+  "stm%(ib%)\t%3, {%1, %2}"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm2_ib_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 3) (const_int 8)))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int 4)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int 8)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 3"
+  "stm%(ib%)\t%3!, {%1, %2}"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm2_da"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_operand:SI 3 "s_register_operand" "rk")
+                  (const_int -4))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 3)))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
+  "ldm%(da%)\t%3, {%1, %2}"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm2_da_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 3) (const_int -8)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int -4))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 3)))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 3"
+  "ldm%(da%)\t%3!, {%1, %2}"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm2_da"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (plus:SI (match_operand:SI 3 "s_register_operand" "rk") (const_int -4)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (match_dup 3))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 2"
+  "stm%(da%)\t%3, {%1, %2}"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm2_da_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 3) (const_int -8)))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (match_dup 3))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "TARGET_ARM && XVECLEN (operands[0], 0) == 3"
+  "stm%(da%)\t%3!, {%1, %2}"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm2_db"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_operand:SI 3 "s_register_operand" "rk")
+                  (const_int -8))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int -4))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
+  "ldm%(db%)\t%3, {%1, %2}"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*ldm2_db_update"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 3) (const_int -8)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int -8))))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 3)
+                  (const_int -4))))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
+  "ldm%(db%)\t%3!, {%1, %2}"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm2_db"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (mem:SI (plus:SI (match_operand:SI 3 "s_register_operand" "rk") (const_int -8)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
+  "stm%(db%)\t%3, {%1, %2}"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*stm2_db_update"
+  [(match_parallel 0 "store_multiple_operation"
+    [(set (match_operand:SI 3 "s_register_operand" "+&rk")
+          (plus:SI (match_dup 3) (const_int -8)))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int -8)))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 3) (const_int -4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
+  "stm%(db%)\t%3!, {%1, %2}"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 2 "memory_operand" ""))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 3 "memory_operand" ""))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_ldm_seq (operands, 2, false))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 4 "const_int_operand" ""))
+   (set (match_operand:SI 2 "memory_operand" "")
+        (match_dup 0))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 5 "const_int_operand" ""))
+   (set (match_operand:SI 3 "memory_operand" "")
+        (match_dup 1))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_const_stm_seq (operands, 2))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 4 "const_int_operand" ""))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 5 "const_int_operand" ""))
+   (set (match_operand:SI 2 "memory_operand" "")
+        (match_dup 0))
+   (set (match_operand:SI 3 "memory_operand" "")
+        (match_dup 1))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_const_stm_seq (operands, 2))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 2 "memory_operand" "")
+        (match_operand:SI 0 "s_register_operand" ""))
+   (set (match_operand:SI 3 "memory_operand" "")
+        (match_operand:SI 1 "s_register_operand" ""))]
+  ""
+  [(const_int 0)]
+{
+  if (gen_stm_seq (operands, 2))
+    DONE;
+  else
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 2 "memory_operand" ""))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 3 "memory_operand" ""))
+   (parallel
+     [(set (match_operand:SI 4 "s_register_operand" "")
+           (match_operator:SI 5 "commutative_binary_operator"
+            [(match_operand:SI 6 "s_register_operand" "")
+             (match_operand:SI 7 "s_register_operand" "")]))
+      (clobber (reg:CC CC_REGNUM))])]
+  "(((operands[6] == operands[0] && operands[7] == operands[1])
+     || (operands[7] == operands[0] && operands[6] == operands[1]))
+    && peep2_reg_dead_p (3, operands[0]) && peep2_reg_dead_p (3, operands[1]))"
+  [(parallel
+    [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)]))
+     (clobber (reg:CC CC_REGNUM))])]
+{
+  if (!gen_ldm_seq (operands, 2, true))
+    FAIL;
+})
+
+(define_peephole2
+  [(set (match_operand:SI 0 "s_register_operand" "")
+        (match_operand:SI 2 "memory_operand" ""))
+   (set (match_operand:SI 1 "s_register_operand" "")
+        (match_operand:SI 3 "memory_operand" ""))
+   (set (match_operand:SI 4 "s_register_operand" "")
+        (match_operator:SI 5 "commutative_binary_operator"
+         [(match_operand:SI 6 "s_register_operand" "")
+          (match_operand:SI 7 "s_register_operand" "")]))]
+  "(((operands[6] == operands[0] && operands[7] == operands[1])
+     || (operands[7] == operands[0] && operands[6] == operands[1]))
+    && peep2_reg_dead_p (3, operands[0]) && peep2_reg_dead_p (3, operands[1]))"
+  [(set (match_dup 4) (match_op_dup 5 [(match_dup 6) (match_dup 7)]))]
+{
+  if (!gen_ldm_seq (operands, 2, true))
+    FAIL;
+})
+
Index: config/arm/predicates.md
===================================================================
--- config/arm/predicates.md	(revision 161824)
+++ config/arm/predicates.md	(working copy)
@@ -206,6 +206,11 @@ (define_special_predicate "logical_binar
   (and (match_code "ior,xor,and")
        (match_test "mode == GET_MODE (op)")))
 
+;; True for commutative operators.
+(define_special_predicate "commutative_binary_operator"
+  (and (match_code "ior,xor,and,plus")
+       (match_test "mode == GET_MODE (op)")))
+
 ;; True for shift operators.
 (define_special_predicate "shift_operator"
   (and (ior (ior (and (match_code "mult")
@@ -330,13 +335,17 @@ (define_special_predicate "load_multiple
   (match_code "parallel")
 {
   HOST_WIDE_INT count = XVECLEN (op, 0);
-  int dest_regno;
+  unsigned dest_regno;
   rtx src_addr;
   HOST_WIDE_INT i = 1, base = 0;
+  HOST_WIDE_INT offset = 0;
   rtx elt;
+  bool addr_reg_loaded = false;
+  bool update = false;
 
   if (count <= 1
-      || GET_CODE (XVECEXP (op, 0, 0)) != SET)
+      || GET_CODE (XVECEXP (op, 0, 0)) != SET
+      || !REG_P (SET_DEST (XVECEXP (op, 0, 0))))
     return false;
 
   /* Check to see if this might be a write-back.  */
@@ -344,6 +353,7 @@ (define_special_predicate "load_multiple
     {
       i++;
       base = 1;
+      update = true;
 
       /* Now check it more carefully.  */
       if (GET_CODE (SET_DEST (elt)) != REG
@@ -362,6 +372,15 @@ (define_special_predicate "load_multiple
 
   dest_regno = REGNO (SET_DEST (XVECEXP (op, 0, i - 1)));
   src_addr = XEXP (SET_SRC (XVECEXP (op, 0, i - 1)), 0);
+  if (GET_CODE (src_addr) == PLUS)
+    {
+      if (GET_CODE (XEXP (src_addr, 1)) != CONST_INT)
+	return false;
+      offset = INTVAL (XEXP (src_addr, 1));
+      src_addr = XEXP (src_addr, 0);
+    }
+  if (!REG_P (src_addr))
+    return false;
 
   for (; i < count; i++)
     {
@@ -370,16 +389,28 @@ (define_special_predicate "load_multiple
       if (GET_CODE (elt) != SET
           || GET_CODE (SET_DEST (elt)) != REG
           || GET_MODE (SET_DEST (elt)) != SImode
-          || REGNO (SET_DEST (elt)) != (unsigned int)(dest_regno + i - base)
+          || REGNO (SET_DEST (elt)) <= dest_regno
           || GET_CODE (SET_SRC (elt)) != MEM
           || GET_MODE (SET_SRC (elt)) != SImode
-          || GET_CODE (XEXP (SET_SRC (elt), 0)) != PLUS
-          || !rtx_equal_p (XEXP (XEXP (SET_SRC (elt), 0), 0), src_addr)
-          || GET_CODE (XEXP (XEXP (SET_SRC (elt), 0), 1)) != CONST_INT
-          || INTVAL (XEXP (XEXP (SET_SRC (elt), 0), 1)) != (i - base) * 4)
+          || ((GET_CODE (XEXP (SET_SRC (elt), 0)) != PLUS
+	       || !rtx_equal_p (XEXP (XEXP (SET_SRC (elt), 0), 0), src_addr)
+	       || GET_CODE (XEXP (XEXP (SET_SRC (elt), 0), 1)) != CONST_INT
+	       || INTVAL (XEXP (XEXP (SET_SRC (elt), 0), 1)) != offset + (i - base) * 4)
+	      && (!REG_P (XEXP (SET_SRC (elt), 0))
+		  || offset + (i - base) * 4 != 0)))
         return false;
+      dest_regno = REGNO (SET_DEST (elt));
+      if (dest_regno == REGNO (src_addr))
+        addr_reg_loaded = true;
     }
-
+  /* For Thumb, we only have updating instructions.  If the pattern does
+     not describe an update, it must be because the address register is
+     in the list of loaded registers - on the hardware, this has the effect
+     of overriding the update.  */
+  if (update && addr_reg_loaded)
+    return false;
+  if (TARGET_THUMB1)
+    return update || addr_reg_loaded;
   return true;
 })
 
@@ -387,9 +418,9 @@ (define_special_predicate "store_multipl
   (match_code "parallel")
 {
   HOST_WIDE_INT count = XVECLEN (op, 0);
-  int src_regno;
+  unsigned src_regno;
   rtx dest_addr;
-  HOST_WIDE_INT i = 1, base = 0;
+  HOST_WIDE_INT i = 1, base = 0, offset = 0;
   rtx elt;
 
   if (count <= 1
@@ -420,6 +451,16 @@ (define_special_predicate "store_multipl
   src_regno = REGNO (SET_SRC (XVECEXP (op, 0, i - 1)));
   dest_addr = XEXP (SET_DEST (XVECEXP (op, 0, i - 1)), 0);
 
+  if (GET_CODE (dest_addr) == PLUS)
+    {
+      if (GET_CODE (XEXP (dest_addr, 1)) != CONST_INT)
+	return false;
+      offset = INTVAL (XEXP (dest_addr, 1));
+      dest_addr = XEXP (dest_addr, 0);
+    }
+  if (!REG_P (dest_addr))
+    return false;
+
   for (; i < count; i++)
     {
       elt = XVECEXP (op, 0, i);
@@ -427,14 +468,17 @@ (define_special_predicate "store_multipl
       if (GET_CODE (elt) != SET
           || GET_CODE (SET_SRC (elt)) != REG
           || GET_MODE (SET_SRC (elt)) != SImode
-          || REGNO (SET_SRC (elt)) != (unsigned int)(src_regno + i - base)
+          || REGNO (SET_SRC (elt)) <= src_regno
           || GET_CODE (SET_DEST (elt)) != MEM
           || GET_MODE (SET_DEST (elt)) != SImode
-          || GET_CODE (XEXP (SET_DEST (elt), 0)) != PLUS
-          || !rtx_equal_p (XEXP (XEXP (SET_DEST (elt), 0), 0), dest_addr)
-          || GET_CODE (XEXP (XEXP (SET_DEST (elt), 0), 1)) != CONST_INT
-          || INTVAL (XEXP (XEXP (SET_DEST (elt), 0), 1)) != (i - base) * 4)
+          || ((GET_CODE (XEXP (SET_DEST (elt), 0)) != PLUS
+	       || !rtx_equal_p (XEXP (XEXP (SET_DEST (elt), 0), 0), dest_addr)
+	       || GET_CODE (XEXP (XEXP (SET_DEST (elt), 0), 1)) != CONST_INT
+               || INTVAL (XEXP (XEXP (SET_DEST (elt), 0), 1)) != offset + (i - base) * 4)
+	      && (!REG_P (XEXP (SET_DEST (elt), 0))
+		  || offset + (i - base) * 4 != 0)))
         return false;
+      src_regno = REGNO (SET_SRC (elt));
     }
 
   return true;
Index: config/arm/arm-ldmstm.ml
===================================================================
--- config/arm/arm-ldmstm.ml	(revision 0)
+++ config/arm/arm-ldmstm.ml	(revision 0)
@@ -0,0 +1,332 @@
+(* Auto-generate ARM ldm/stm patterns
+   Copyright (C) 2010 Free Software Foundation, Inc.
+   Contributed by CodeSourcery.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it under
+   the terms of the GNU General Public License as published by the Free
+   Software Foundation; either version 3, or (at your option) any later
+   version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or
+   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+   for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.
+
+   This is an O'Caml program.  The O'Caml compiler is available from:
+
+     http://caml.inria.fr/
+
+   Or from your favourite OS's friendly packaging system. Tested with version
+   3.09.2, though other versions will probably work too.
+
+   Run with:
+     ocaml arm-ldmstm.ml >/path/to/gcc/config/arm/ldmstm.md
+*)
+
+type amode = IA | IB | DA | DB
+
+type optype = IN | OUT | INOUT
+
+let rec string_of_addrmode addrmode =
+  match addrmode with
+    IA -> "ia" | IB -> "ib" | DA -> "da" | DB -> "db"
+
+let rec initial_offset addrmode nregs =
+  match addrmode with
+    IA -> 0
+  | IB -> 4
+  | DA -> -4 * nregs + 4
+  | DB -> -4 * nregs
+
+let rec final_offset addrmode nregs =
+  match addrmode with
+    IA -> nregs * 4
+  | IB -> nregs * 4
+  | DA -> -4 * nregs
+  | DB -> -4 * nregs
+
+let constr thumb =
+  if thumb then "l" else "rk"
+
+let inout_constr op_type =
+  match op_type with
+  OUT -> "=&"
+  | INOUT -> "+&"
+  | IN -> ""
+
+let destreg nregs first op_type thumb =
+  if not first then
+    Printf.sprintf "(match_dup %d)" (nregs + 1)
+  else
+    Printf.sprintf ("(match_operand:SI %d \"s_register_operand\" \"%s%s\")")
+      (nregs + 1) (inout_constr op_type) (constr thumb)
+
+let write_ldm_set thumb nregs offset opnr first =
+  let indent = "     " in
+  Printf.printf "%s" (if first then "    [" else indent);
+  Printf.printf "(set (match_operand:SI %d \"arm_hard_register_operand\" \"\")\n" opnr;
+  Printf.printf "%s     (mem:SI " indent;
+  begin if offset != 0 then Printf.printf "(plus:SI " end;
+  Printf.printf "%s" (destreg nregs first IN thumb);
+  begin if offset != 0 then Printf.printf "\n%s             (const_int %d))" indent offset end;
+  Printf.printf "))"
+
+let write_stm_set thumb nregs offset opnr first =
+  let indent = "     " in
+  Printf.printf "%s" (if first then "    [" else indent);
+  Printf.printf "(set (mem:SI ";
+  begin if offset != 0 then Printf.printf "(plus:SI " end;
+  Printf.printf "%s" (destreg nregs first IN thumb);
+  begin if offset != 0 then Printf.printf " (const_int %d))" offset end;
+  Printf.printf ")\n%s     (match_operand:SI %d \"arm_hard_register_operand\" \"\"))" indent opnr 
+
+let write_ldm_peep_set extra_indent nregs opnr first =
+  let indent = "   " ^ extra_indent in
+  Printf.printf "%s" (if first then extra_indent ^ "  [" else indent);
+  Printf.printf "(set (match_operand:SI %d \"s_register_operand\" \"\")\n" opnr;
+  Printf.printf "%s     (match_operand:SI %d \"memory_operand\" \"\"))" indent (nregs + opnr)
+
+let write_stm_peep_set extra_indent nregs opnr first =
+  let indent = "   " ^ extra_indent in
+  Printf.printf "%s" (if first then extra_indent ^ "  [" else indent);
+  Printf.printf "(set (match_operand:SI %d \"memory_operand\" \"\")\n" (nregs + opnr);
+  Printf.printf "%s     (match_operand:SI %d \"s_register_operand\" \"\"))" indent opnr
+
+let write_any_load optype nregs opnr first =
+  let indent = "   " in
+  Printf.printf "%s" (if first then "  [" else indent);
+  Printf.printf "(set (match_operand:SI %d \"s_register_operand\" \"\")\n" opnr;
+  Printf.printf "%s     (match_operand:SI %d \"%s\" \"\"))" indent (nregs * 2 + opnr) optype
+
+let write_const_store nregs opnr first =
+  let indent = "   " in
+  Printf.printf "%s(set (match_operand:SI %d \"memory_operand\" \"\")\n" indent (nregs + opnr);
+  Printf.printf "%s     (match_dup %d))" indent opnr
+
+let write_const_stm_peep_set nregs opnr first =
+  write_any_load "const_int_operand" nregs opnr first;
+  Printf.printf "\n";
+  write_const_store nregs opnr false
+
+
+let rec write_pat_sets func opnr offset first n_left =
+  func offset opnr first;
+  begin
+    if n_left > 1 then begin
+      Printf.printf "\n";
+      write_pat_sets func (opnr + 1) (offset + 4) false (n_left - 1);
+    end else
+      Printf.printf "]"
+  end
+
+let rec write_peep_sets func opnr first n_left =
+  func opnr first;
+  begin
+    if n_left > 1 then begin
+      Printf.printf "\n";
+      write_peep_sets func (opnr + 1) false (n_left - 1);
+    end
+  end
+
+let can_thumb addrmode update is_store =
+  match addrmode, update, is_store with
+    (* Thumb1 mode only supports IA with update.  However, for LDMIA,
+       if the address register also appears in the list of loaded
+       registers, the loaded value is stored, hence the RTL pattern
+       to describe such an insn does not have an update.  We check
+       in the match_parallel predicate that the condition described
+       above is met.  *)
+    IA, _, false -> true
+  | IA, true, true -> true
+  | _ -> false
+
+let target addrmode thumb =
+  match addrmode, thumb with
+    IA, true -> "TARGET_THUMB1"
+  | IA, false -> "TARGET_32BIT"
+  | DB, false -> "TARGET_32BIT"
+  | _, false -> "TARGET_ARM"
+
+let write_pattern_1 name ls addrmode nregs write_set_fn update thumb =
+  let astr = string_of_addrmode addrmode in
+  Printf.printf "(define_insn \"*%s%s%d_%s%s\"\n"
+    (if thumb then "thumb_" else "") name nregs astr
+    (if update then "_update" else "");
+  Printf.printf "  [(match_parallel 0 \"%s_multiple_operation\"\n" ls;
+  begin
+    if update then begin
+      Printf.printf "    [(set %s\n          (plus:SI %s"
+	(destreg nregs true INOUT thumb) (destreg nregs false IN thumb);
+      Printf.printf " (const_int %d)))\n"
+	(final_offset addrmode nregs)
+    end
+  end;
+  write_pat_sets
+    (write_set_fn thumb nregs) 1
+    (initial_offset addrmode nregs)
+    (not update) nregs;
+  Printf.printf ")]\n  \"%s && XVECLEN (operands[0], 0) == %d\"\n"
+    (target addrmode thumb)
+    (if update then nregs + 1 else nregs);
+  Printf.printf "  \"%s%%(%s%%)\\t%%%d%s, {"
+    name astr (nregs + 1) (if update then "!" else "");
+  for n = 1 to nregs do
+    Printf.printf "%%%d%s" n (if n < nregs then ", " else "")
+  done;
+  Printf.printf "}\"\n";
+  Printf.printf "  [(set_attr \"type\" \"%s%d\")" ls nregs;
+  begin if not thumb then
+    Printf.printf "\n   (set_attr \"predicable\" \"yes\")";
+  end;
+  Printf.printf "])\n\n"
+
+let write_ldm_pattern addrmode nregs update =
+  write_pattern_1 "ldm" "load" addrmode nregs write_ldm_set update false;
+  begin if can_thumb addrmode update false then
+    write_pattern_1 "ldm" "load" addrmode nregs write_ldm_set update true;
+  end
+
+let write_stm_pattern addrmode nregs update =
+  write_pattern_1 "stm" "store" addrmode nregs write_stm_set update false;
+  begin if can_thumb addrmode update true then
+    write_pattern_1 "stm" "store" addrmode nregs write_stm_set update true;
+  end
+
+let write_ldm_commutative_peephole thumb =
+  let nregs = 2 in
+  Printf.printf "(define_peephole2\n";
+  write_peep_sets (write_ldm_peep_set "" nregs) 0 true nregs;
+  let indent = "   " in
+  if thumb then begin
+    Printf.printf "\n%s(set (match_operand:SI %d \"s_register_operand\" \"\")\n" indent (nregs * 2);
+    Printf.printf "%s     (match_operator:SI %d \"commutative_binary_operator\"\n" indent (nregs * 2 + 1);
+    Printf.printf "%s      [(match_operand:SI %d \"s_register_operand\" \"\")\n" indent (nregs * 2 + 2);
+    Printf.printf "%s       (match_operand:SI %d \"s_register_operand\" \"\")]))]\n" indent (nregs * 2 + 3)
+  end else begin
+    Printf.printf "\n%s(parallel\n" indent;
+    Printf.printf "%s  [(set (match_operand:SI %d \"s_register_operand\" \"\")\n" indent (nregs * 2);
+    Printf.printf "%s        (match_operator:SI %d \"commutative_binary_operator\"\n" indent (nregs * 2 + 1);
+    Printf.printf "%s         [(match_operand:SI %d \"s_register_operand\" \"\")\n" indent (nregs * 2 + 2);
+    Printf.printf "%s          (match_operand:SI %d \"s_register_operand\" \"\")]))\n" indent (nregs * 2 + 3);
+    Printf.printf "%s   (clobber (reg:CC CC_REGNUM))])]\n" indent
+  end;
+  Printf.printf "  \"(((operands[%d] == operands[0] && operands[%d] == operands[1])\n" (nregs * 2 + 2) (nregs * 2 + 3);
+  Printf.printf "     || (operands[%d] == operands[0] && operands[%d] == operands[1]))\n" (nregs * 2 + 3) (nregs * 2 + 2);
+  Printf.printf "    && peep2_reg_dead_p (%d, operands[0]) && peep2_reg_dead_p (%d, operands[1]))\"\n" (nregs + 1) (nregs + 1);
+  begin
+    if thumb then
+      Printf.printf "  [(set (match_dup %d) (match_op_dup %d [(match_dup %d) (match_dup %d)]))]\n"
+	(nregs * 2) (nregs * 2 + 1) (nregs * 2 + 2) (nregs * 2 + 3)
+    else begin
+      Printf.printf "  [(parallel\n";
+      Printf.printf "    [(set (match_dup %d) (match_op_dup %d [(match_dup %d) (match_dup %d)]))\n"
+	(nregs * 2) (nregs * 2 + 1) (nregs * 2 + 2) (nregs * 2 + 3);
+      Printf.printf "     (clobber (reg:CC CC_REGNUM))])]\n"
+    end
+  end;
+  Printf.printf "{\n  if (!gen_ldm_seq (operands, %d, true))\n    FAIL;\n" nregs;
+  Printf.printf "})\n\n"
+
+let write_ldm_peephole nregs =
+  Printf.printf "(define_peephole2\n";
+  write_peep_sets (write_ldm_peep_set "" nregs) 0 true nregs;
+  Printf.printf "]\n  \"\"\n  [(const_int 0)]\n{\n";
+  Printf.printf "  if (gen_ldm_seq (operands, %d, false))\n    DONE;\n  else\n    FAIL;\n})\n\n" nregs
+
+let write_ldm_peephole_b nregs =
+  if nregs > 2 then begin
+    Printf.printf "(define_peephole2\n";
+    write_ldm_peep_set "" nregs 0 true;
+    Printf.printf "\n   (parallel\n";
+    write_peep_sets (write_ldm_peep_set "  " nregs) 1 true (nregs - 1);
+    Printf.printf "])]\n  \"\"\n  [(const_int 0)]\n{\n";
+    Printf.printf "  if (gen_ldm_seq (operands, %d, false))\n    DONE;\n  else\n    FAIL;\n})\n\n" nregs
+  end
+
+let write_stm_peephole nregs =
+  Printf.printf "(define_peephole2\n";
+  write_peep_sets (write_stm_peep_set "" nregs) 0 true nregs;
+  Printf.printf "]\n  \"\"\n  [(const_int 0)]\n{\n";
+  Printf.printf "  if (gen_stm_seq (operands, %d))\n    DONE;\n  else\n    FAIL;\n})\n\n" nregs
+
+let write_stm_peephole_b nregs =
+  if nregs > 2 then begin
+    Printf.printf "(define_peephole2\n";
+    write_stm_peep_set "" nregs 0 true;
+    Printf.printf "\n   (parallel\n";
+    write_peep_sets (write_stm_peep_set "  " nregs) 1 true (nregs - 1);
+    Printf.printf "])]\n  \"\"\n  [(const_int 0)]\n{\n";
+    Printf.printf "  if (gen_stm_seq (operands, %d))\n    DONE;\n  else\n    FAIL;\n})\n\n" nregs
+  end
+
+let write_const_stm_peephole_a nregs =
+  Printf.printf "(define_peephole2\n";
+  write_peep_sets (write_const_stm_peep_set nregs) 0 true nregs;
+  Printf.printf "]\n  \"\"\n  [(const_int 0)]\n{\n";
+  Printf.printf "  if (gen_const_stm_seq (operands, %d))\n    DONE;\n  else\n    FAIL;\n})\n\n" nregs
+
+let write_const_stm_peephole_b nregs =
+  Printf.printf "(define_peephole2\n";
+  write_peep_sets (write_any_load "const_int_operand" nregs) 0 true nregs;
+  Printf.printf "\n";
+  write_peep_sets (write_const_store nregs) 0 false nregs;
+  Printf.printf "]\n  \"\"\n  [(const_int 0)]\n{\n";
+  Printf.printf "  if (gen_const_stm_seq (operands, %d))\n    DONE;\n  else\n    FAIL;\n})\n\n" nregs
+
+let patterns () =
+  let addrmodes = [ IA; IB; DA; DB ]  in
+  let sizes = [ 4; 3; 2] in
+  List.iter
+    (fun n ->
+      List.iter
+	(fun addrmode ->
+	  write_ldm_pattern addrmode n false;
+	  write_ldm_pattern addrmode n true;
+	  write_stm_pattern addrmode n false;
+	  write_stm_pattern addrmode n true)
+	addrmodes;
+      write_ldm_peephole n;
+      write_ldm_peephole_b n;
+      write_const_stm_peephole_a n;
+      write_const_stm_peephole_b n;
+      write_stm_peephole n)
+    sizes;
+  write_ldm_commutative_peephole false;
+  write_ldm_commutative_peephole true
+
+let print_lines = List.iter (fun s -> Format.printf "%s@\n" s)
+
+(* Do it.  *)
+
+let _ =
+  print_lines [
+"/* ARM ldm/stm instruction patterns.  This file was automatically generated";
+"   using arm-ldmstm.ml.  Please do not edit manually.";
+"";
+"   Copyright (C) 2010 Free Software Foundation, Inc.";
+"   Contributed by CodeSourcery.";
+"";
+"   This file is part of GCC.";
+"";
+"   GCC is free software; you can redistribute it and/or modify it";
+"   under the terms of the GNU General Public License as published";
+"   by the Free Software Foundation; either version 3, or (at your";
+"   option) any later version.";
+"";
+"   GCC is distributed in the hope that it will be useful, but WITHOUT";
+"   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY";
+"   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public";
+"   License for more details.";
+"";
+"   You should have received a copy of the GNU General Public License and";
+"   a copy of the GCC Runtime Library Exception along with this program;";
+"   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see";
+"   <http://www.gnu.org/licenses/>.  */";
+""];
+  patterns ();
Index: config/arm/arm.md
===================================================================
--- config/arm/arm.md	(revision 161831)
+++ config/arm/arm.md	(working copy)
@@ -6134,7 +6134,7 @@ (define_expand "movxf"
 
 ;; load- and store-multiple insns
 ;; The arm can load/store any set of registers, provided that they are in
-;; ascending order; but that is beyond GCC so stick with what it knows.
+;; ascending order, but these expanders assume a contiguous set.
 
 (define_expand "load_multiple"
   [(match_par_dup 3 [(set (match_operand:SI 0 "" "")
@@ -6155,126 +6155,12 @@ (define_expand "load_multiple"
     FAIL;
 
   operands[3]
-    = arm_gen_load_multiple (REGNO (operands[0]), INTVAL (operands[2]),
+    = arm_gen_load_multiple (arm_regs_in_sequence + REGNO (operands[0]),
+			     INTVAL (operands[2]),
 			     force_reg (SImode, XEXP (operands[1], 0)),
-			     TRUE, FALSE, operands[1], &offset);
+			     FALSE, operands[1], &offset);
 })
 
-;; Load multiple with write-back
-
-(define_insn "*ldmsi_postinc4"
-  [(match_parallel 0 "load_multiple_operation"
-    [(set (match_operand:SI 1 "s_register_operand" "=r")
-	  (plus:SI (match_operand:SI 2 "s_register_operand" "1")
-		   (const_int 16)))
-     (set (match_operand:SI 3 "arm_hard_register_operand" "")
-	  (mem:SI (match_dup 2)))
-     (set (match_operand:SI 4 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 2) (const_int 4))))
-     (set (match_operand:SI 5 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 2) (const_int 8))))
-     (set (match_operand:SI 6 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 2) (const_int 12))))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 5"
-  "ldm%(ia%)\\t%1!, {%3, %4, %5, %6}"
-  [(set_attr "type" "load4")
-   (set_attr "predicable" "yes")]
-)
-
-(define_insn "*ldmsi_postinc4_thumb1"
-  [(match_parallel 0 "load_multiple_operation"
-    [(set (match_operand:SI 1 "s_register_operand" "=l")
-	  (plus:SI (match_operand:SI 2 "s_register_operand" "1")
-		   (const_int 16)))
-     (set (match_operand:SI 3 "arm_hard_register_operand" "")
-	  (mem:SI (match_dup 2)))
-     (set (match_operand:SI 4 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 2) (const_int 4))))
-     (set (match_operand:SI 5 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 2) (const_int 8))))
-     (set (match_operand:SI 6 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 2) (const_int 12))))])]
-  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 5"
-  "ldmia\\t%1!, {%3, %4, %5, %6}"
-  [(set_attr "type" "load4")]
-)
-
-(define_insn "*ldmsi_postinc3"
-  [(match_parallel 0 "load_multiple_operation"
-    [(set (match_operand:SI 1 "s_register_operand" "=r")
-	  (plus:SI (match_operand:SI 2 "s_register_operand" "1")
-		   (const_int 12)))
-     (set (match_operand:SI 3 "arm_hard_register_operand" "")
-	  (mem:SI (match_dup 2)))
-     (set (match_operand:SI 4 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 2) (const_int 4))))
-     (set (match_operand:SI 5 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 2) (const_int 8))))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
-  "ldm%(ia%)\\t%1!, {%3, %4, %5}"
-  [(set_attr "type" "load3")
-   (set_attr "predicable" "yes")]
-)
-
-(define_insn "*ldmsi_postinc2"
-  [(match_parallel 0 "load_multiple_operation"
-    [(set (match_operand:SI 1 "s_register_operand" "=r")
-	  (plus:SI (match_operand:SI 2 "s_register_operand" "1")
-		   (const_int 8)))
-     (set (match_operand:SI 3 "arm_hard_register_operand" "")
-	  (mem:SI (match_dup 2)))
-     (set (match_operand:SI 4 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 2) (const_int 4))))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
-  "ldm%(ia%)\\t%1!, {%3, %4}"
-  [(set_attr "type" "load2")
-   (set_attr "predicable" "yes")]
-)
-
-;; Ordinary load multiple
-
-(define_insn "*ldmsi4"
-  [(match_parallel 0 "load_multiple_operation"
-    [(set (match_operand:SI 2 "arm_hard_register_operand" "")
-	  (mem:SI (match_operand:SI 1 "s_register_operand" "r")))
-     (set (match_operand:SI 3 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 1) (const_int 4))))
-     (set (match_operand:SI 4 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 1) (const_int 8))))
-     (set (match_operand:SI 5 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 1) (const_int 12))))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
-  "ldm%(ia%)\\t%1, {%2, %3, %4, %5}"
-  [(set_attr "type" "load4")
-   (set_attr "predicable" "yes")]
-)
-
-(define_insn "*ldmsi3"
-  [(match_parallel 0 "load_multiple_operation"
-    [(set (match_operand:SI 2 "arm_hard_register_operand" "")
-	  (mem:SI (match_operand:SI 1 "s_register_operand" "r")))
-     (set (match_operand:SI 3 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 1) (const_int 4))))
-     (set (match_operand:SI 4 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 1) (const_int 8))))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
-  "ldm%(ia%)\\t%1, {%2, %3, %4}"
-  [(set_attr "type" "load3")
-   (set_attr "predicable" "yes")]
-)
-
-(define_insn "*ldmsi2"
-  [(match_parallel 0 "load_multiple_operation"
-    [(set (match_operand:SI 2 "arm_hard_register_operand" "")
-	  (mem:SI (match_operand:SI 1 "s_register_operand" "r")))
-     (set (match_operand:SI 3 "arm_hard_register_operand" "")
-	  (mem:SI (plus:SI (match_dup 1) (const_int 4))))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
-  "ldm%(ia%)\\t%1, {%2, %3}"
-  [(set_attr "type" "load2")
-   (set_attr "predicable" "yes")]
-)
-
 (define_expand "store_multiple"
   [(match_par_dup 3 [(set (match_operand:SI 0 "" "")
                           (match_operand:SI 1 "" ""))
@@ -6294,125 +6180,12 @@ (define_expand "store_multiple"
     FAIL;
 
   operands[3]
-    = arm_gen_store_multiple (REGNO (operands[1]), INTVAL (operands[2]),
+    = arm_gen_store_multiple (arm_regs_in_sequence + REGNO (operands[1]),
+			      INTVAL (operands[2]),
 			      force_reg (SImode, XEXP (operands[0], 0)),
-			      TRUE, FALSE, operands[0], &offset);
+			      FALSE, operands[0], &offset);
 })
 
-;; Store multiple with write-back
-
-(define_insn "*stmsi_postinc4"
-  [(match_parallel 0 "store_multiple_operation"
-    [(set (match_operand:SI 1 "s_register_operand" "=r")
-	  (plus:SI (match_operand:SI 2 "s_register_operand" "1")
-		   (const_int 16)))
-     (set (mem:SI (match_dup 2))
-	  (match_operand:SI 3 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 2) (const_int 4)))
-	  (match_operand:SI 4 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 2) (const_int 8)))
-	  (match_operand:SI 5 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 2) (const_int 12)))
-	  (match_operand:SI 6 "arm_hard_register_operand" ""))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 5"
-  "stm%(ia%)\\t%1!, {%3, %4, %5, %6}"
-  [(set_attr "predicable" "yes")
-   (set_attr "type" "store4")]
-)
-
-(define_insn "*stmsi_postinc4_thumb1"
-  [(match_parallel 0 "store_multiple_operation"
-    [(set (match_operand:SI 1 "s_register_operand" "=l")
-	  (plus:SI (match_operand:SI 2 "s_register_operand" "1")
-		   (const_int 16)))
-     (set (mem:SI (match_dup 2))
-	  (match_operand:SI 3 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 2) (const_int 4)))
-	  (match_operand:SI 4 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 2) (const_int 8)))
-	  (match_operand:SI 5 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 2) (const_int 12)))
-	  (match_operand:SI 6 "arm_hard_register_operand" ""))])]
-  "TARGET_THUMB1 && XVECLEN (operands[0], 0) == 5"
-  "stmia\\t%1!, {%3, %4, %5, %6}"
-  [(set_attr "type" "store4")]
-)
-
-(define_insn "*stmsi_postinc3"
-  [(match_parallel 0 "store_multiple_operation"
-    [(set (match_operand:SI 1 "s_register_operand" "=r")
-	  (plus:SI (match_operand:SI 2 "s_register_operand" "1")
-		   (const_int 12)))
-     (set (mem:SI (match_dup 2))
-	  (match_operand:SI 3 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 2) (const_int 4)))
-	  (match_operand:SI 4 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 2) (const_int 8)))
-	  (match_operand:SI 5 "arm_hard_register_operand" ""))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
-  "stm%(ia%)\\t%1!, {%3, %4, %5}"
-  [(set_attr "predicable" "yes")
-   (set_attr "type" "store3")]
-)
-
-(define_insn "*stmsi_postinc2"
-  [(match_parallel 0 "store_multiple_operation"
-    [(set (match_operand:SI 1 "s_register_operand" "=r")
-	  (plus:SI (match_operand:SI 2 "s_register_operand" "1")
-		   (const_int 8)))
-     (set (mem:SI (match_dup 2))
-	  (match_operand:SI 3 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 2) (const_int 4)))
-	  (match_operand:SI 4 "arm_hard_register_operand" ""))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
-  "stm%(ia%)\\t%1!, {%3, %4}"
-  [(set_attr "predicable" "yes")
-   (set_attr "type" "store2")]
-)
-
-;; Ordinary store multiple
-
-(define_insn "*stmsi4"
-  [(match_parallel 0 "store_multiple_operation"
-    [(set (mem:SI (match_operand:SI 1 "s_register_operand" "r"))
-	  (match_operand:SI 2 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 1) (const_int 4)))
-	  (match_operand:SI 3 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 1) (const_int 8)))
-	  (match_operand:SI 4 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 1) (const_int 12)))
-	  (match_operand:SI 5 "arm_hard_register_operand" ""))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 4"
-  "stm%(ia%)\\t%1, {%2, %3, %4, %5}"
-  [(set_attr "predicable" "yes")
-   (set_attr "type" "store4")]
-)
-
-(define_insn "*stmsi3"
-  [(match_parallel 0 "store_multiple_operation"
-    [(set (mem:SI (match_operand:SI 1 "s_register_operand" "r"))
-	  (match_operand:SI 2 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 1) (const_int 4)))
-	  (match_operand:SI 3 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 1) (const_int 8)))
-	  (match_operand:SI 4 "arm_hard_register_operand" ""))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 3"
-  "stm%(ia%)\\t%1, {%2, %3, %4}"
-  [(set_attr "predicable" "yes")
-   (set_attr "type" "store3")]
-)
-
-(define_insn "*stmsi2"
-  [(match_parallel 0 "store_multiple_operation"
-    [(set (mem:SI (match_operand:SI 1 "s_register_operand" "r"))
-	  (match_operand:SI 2 "arm_hard_register_operand" ""))
-     (set (mem:SI (plus:SI (match_dup 1) (const_int 4)))
-	  (match_operand:SI 3 "arm_hard_register_operand" ""))])]
-  "TARGET_32BIT && XVECLEN (operands[0], 0) == 2"
-  "stm%(ia%)\\t%1, {%2, %3}"
-  [(set_attr "predicable" "yes")
-   (set_attr "type" "store2")]
-)
 
 ;; Move a block of memory if it is word aligned and MORE than 2 words long.
 ;; We could let this apply for blocks of less than this, but it clobbers so
@@ -8877,8 +8650,8 @@ (define_expand "untyped_call"
 	if (REGNO (reg) == R0_REGNUM)
 	  {
 	    /* On thumb we have to use a write-back instruction.  */
-	    emit_insn (arm_gen_store_multiple (R0_REGNUM, 4, addr, TRUE,
-			TARGET_THUMB ? TRUE : FALSE, mem, &offset));
+	    emit_insn (arm_gen_store_multiple (arm_regs_in_sequence, 4, addr,
+ 		       TARGET_THUMB ? TRUE : FALSE, mem, &offset));
 	    size = TARGET_ARM ? 16 : 0;
 	  }
 	else
@@ -8924,8 +8697,8 @@ (define_expand "untyped_return"
 	if (REGNO (reg) == R0_REGNUM)
 	  {
 	    /* On thumb we have to use a write-back instruction.  */
-	    emit_insn (arm_gen_load_multiple (R0_REGNUM, 4, addr, TRUE,
-			TARGET_THUMB ? TRUE : FALSE, mem, &offset));
+	    emit_insn (arm_gen_load_multiple (arm_regs_in_sequence, 4, addr,
+ 		       TARGET_THUMB ? TRUE : FALSE, mem, &offset));
 	    size = TARGET_ARM ? 16 : 0;
 	  }
 	else
@@ -10518,87 +10291,6 @@ (define_peephole2
   ""
 )
 
-; Peepholes to spot possible load- and store-multiples, if the ordering is
-; reversed, check that the memory references aren't volatile.
-
-(define_peephole
-  [(set (match_operand:SI 0 "s_register_operand" "=rk")
-        (match_operand:SI 4 "memory_operand" "m"))
-   (set (match_operand:SI 1 "s_register_operand" "=rk")
-        (match_operand:SI 5 "memory_operand" "m"))
-   (set (match_operand:SI 2 "s_register_operand" "=rk")
-        (match_operand:SI 6 "memory_operand" "m"))
-   (set (match_operand:SI 3 "s_register_operand" "=rk")
-        (match_operand:SI 7 "memory_operand" "m"))]
-  "TARGET_ARM && load_multiple_sequence (operands, 4, NULL, NULL, NULL)"
-  "*
-  return emit_ldm_seq (operands, 4);
-  "
-)
-
-(define_peephole
-  [(set (match_operand:SI 0 "s_register_operand" "=rk")
-        (match_operand:SI 3 "memory_operand" "m"))
-   (set (match_operand:SI 1 "s_register_operand" "=rk")
-        (match_operand:SI 4 "memory_operand" "m"))
-   (set (match_operand:SI 2 "s_register_operand" "=rk")
-        (match_operand:SI 5 "memory_operand" "m"))]
-  "TARGET_ARM && load_multiple_sequence (operands, 3, NULL, NULL, NULL)"
-  "*
-  return emit_ldm_seq (operands, 3);
-  "
-)
-
-(define_peephole
-  [(set (match_operand:SI 0 "s_register_operand" "=rk")
-        (match_operand:SI 2 "memory_operand" "m"))
-   (set (match_operand:SI 1 "s_register_operand" "=rk")
-        (match_operand:SI 3 "memory_operand" "m"))]
-  "TARGET_ARM && load_multiple_sequence (operands, 2, NULL, NULL, NULL)"
-  "*
-  return emit_ldm_seq (operands, 2);
-  "
-)
-
-(define_peephole
-  [(set (match_operand:SI 4 "memory_operand" "=m")
-        (match_operand:SI 0 "s_register_operand" "rk"))
-   (set (match_operand:SI 5 "memory_operand" "=m")
-        (match_operand:SI 1 "s_register_operand" "rk"))
-   (set (match_operand:SI 6 "memory_operand" "=m")
-        (match_operand:SI 2 "s_register_operand" "rk"))
-   (set (match_operand:SI 7 "memory_operand" "=m")
-        (match_operand:SI 3 "s_register_operand" "rk"))]
-  "TARGET_ARM && store_multiple_sequence (operands, 4, NULL, NULL, NULL)"
-  "*
-  return emit_stm_seq (operands, 4);
-  "
-)
-
-(define_peephole
-  [(set (match_operand:SI 3 "memory_operand" "=m")
-        (match_operand:SI 0 "s_register_operand" "rk"))
-   (set (match_operand:SI 4 "memory_operand" "=m")
-        (match_operand:SI 1 "s_register_operand" "rk"))
-   (set (match_operand:SI 5 "memory_operand" "=m")
-        (match_operand:SI 2 "s_register_operand" "rk"))]
-  "TARGET_ARM && store_multiple_sequence (operands, 3, NULL, NULL, NULL)"
-  "*
-  return emit_stm_seq (operands, 3);
-  "
-)
-
-(define_peephole
-  [(set (match_operand:SI 2 "memory_operand" "=m")
-        (match_operand:SI 0 "s_register_operand" "rk"))
-   (set (match_operand:SI 3 "memory_operand" "=m")
-        (match_operand:SI 1 "s_register_operand" "rk"))]
-  "TARGET_ARM && store_multiple_sequence (operands, 2, NULL, NULL, NULL)"
-  "*
-  return emit_stm_seq (operands, 2);
-  "
-)
-
 (define_split
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(and:SI (ge:SI (match_operand:SI 1 "s_register_operand" "")
@@ -11382,6 +11074,8 @@ (define_expand "bswapsi2"
   "
 )
 
+;; Load the load/store multiple patterns
+(include "ldmstm.md")
 ;; Load the FPA co-processor patterns
 (include "fpa.md")
 ;; Load the Maverick co-processor patterns

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Ping^(n+1) ARM ldm/stm peepholes
  2010-07-07 11:57 ` Resubmit/Ping^n " Bernd Schmidt
@ 2010-07-16 11:19   ` Bernd Schmidt
  2010-07-23 10:23     ` Ping^(n+2) " Bernd Schmidt
  2010-07-23 21:49     ` Ping^(n+1) " Steven Bosscher
       [not found]   ` <1280580386.32159.25.camel@e102346-lin.cambridge.arm.com>
  1 sibling, 2 replies; 13+ messages in thread
From: Bernd Schmidt @ 2010-07-16 11:19 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Earnshaw, ramana.radhakrishnan

On 07/07/2010 01:55 PM, Bernd Schmidt wrote:
> Here's an updated version of my ldm/stm peepholes patch for current
> trunk.  The main goal of this is to enable ldm/stm generation for Thumb
> by using define_peephole2 and peep2_find_free_reg rather than
> define_peephole; there are one or two new peepholes to recognize
> additional opportunities.

http://gcc.gnu.org/ml/gcc-patches/2010-07/msg00512.html


Bernd

* Re: Ping^(n+2) ARM ldm/stm peepholes
  2010-07-16 11:19   ` Ping^(n+1) " Bernd Schmidt
@ 2010-07-23 10:23     ` Bernd Schmidt
  2010-07-23 21:49     ` Ping^(n+1) " Steven Bosscher
  1 sibling, 0 replies; 13+ messages in thread
From: Bernd Schmidt @ 2010-07-23 10:23 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Earnshaw, ramana.radhakrishnan

On 07/07/2010 01:55 PM, Bernd Schmidt wrote:
> Here's an updated version of my ldm/stm peepholes patch for current
> trunk.  The main goal of this is to enable ldm/stm generation for Thumb
> by using define_peephole2 and peep2_find_free_reg rather than
> define_peephole; there are one or two new peepholes to recognize
> additional opportunities.

http://gcc.gnu.org/ml/gcc-patches/2010-07/msg00512.html


Bernd

* Re: Ping^(n+1) ARM ldm/stm peepholes
  2010-07-16 11:19   ` Ping^(n+1) " Bernd Schmidt
  2010-07-23 10:23     ` Ping^(n+2) " Bernd Schmidt
@ 2010-07-23 21:49     ` Steven Bosscher
  1 sibling, 0 replies; 13+ messages in thread
From: Steven Bosscher @ 2010-07-23 21:49 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: GCC Patches, Richard Earnshaw, ramana.radhakrishnan

On Fri, Jul 16, 2010 at 1:18 PM, Bernd Schmidt <bernds@codesourcery.com> wrote:
> On 07/07/2010 01:55 PM, Bernd Schmidt wrote:
>> Here's an updated version of my ldm/stm peepholes patch for current
>> trunk.  The main goal of this is to enable ldm/stm generation for Thumb
>> by using define_peephole2 and peep2_find_free_reg rather than
>> define_peephole; there are one or two new peepholes to recognize
>> additional opportunities.
>
> http://gcc.gnu.org/ml/gcc-patches/2010-07/msg00512.html

Hope no one minds if I ping this too. It'd be nice to see this go in
soon; it's a big step towards eliminating the text peepholes
(define_peephole & friends).

Ciao!
Steven

* Re: Resubmit/Ping^n ARM ldm/stm peepholes
       [not found]   ` <1280580386.32159.25.camel@e102346-lin.cambridge.arm.com>
@ 2010-08-02 10:09     ` Bernd Schmidt
  0 siblings, 0 replies; 13+ messages in thread
From: Bernd Schmidt @ 2010-08-02 10:09 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 530 bytes --]

(note gcc-patches Cc re-added)

On 07/31/2010 02:46 PM, Richard Earnshaw wrote:
> On Wed, 2010-07-07 at 12:55 +0100, Bernd Schmidt wrote:
>> Here's an updated version of my ldm/stm peepholes patch for current
>> trunk.
>> Ok?
>>
> 
> OK,

Thanks!

I've taken the liberty of also committing two testcases for it which
were missing from the patch submission.  They pass my arm-linux tests
with -march=armv7-a/-mthumb, -mthumb, and plain ARM.

We can still decide whether arm-ldmstm.ml is the master copy or just
documentation.


Bernd

[-- Attachment #2: ldm-tc.diff --]
[-- Type: text/plain, Size: 790 bytes --]

Index: testsuite/gcc.target/arm/pr40457-1.c
===================================================================
--- testsuite/gcc.target/arm/pr40457-1.c	(revision 0)
+++ testsuite/gcc.target/arm/pr40457-1.c	(revision 0)
@@ -0,0 +1,10 @@
+/* { dg-options "-Os" }  */
+/* { dg-do compile } */
+
+int bar(int* p)
+{
+  int x = p[0] + p[1];
+  return x;
+}
+
+/* { dg-final { scan-assembler "ldm" } } */
Index: testsuite/gcc.target/arm/pr40457-2.c
===================================================================
--- testsuite/gcc.target/arm/pr40457-2.c	(revision 0)
+++ testsuite/gcc.target/arm/pr40457-2.c	(revision 0)
@@ -0,0 +1,10 @@
+/* { dg-options "-O2" }  */
+/* { dg-do compile } */
+
+void foo(int* p)
+{
+  p[0] = 1;
+  p[1] = 0;
+}
+
+/* { dg-final { scan-assembler "stm" } } */


Thread overview: 13+ messages
2010-04-20 11:37 ARM ldm/stm peepholes Bernd Schmidt
2010-04-23  8:24 ` Ramana Radhakrishnan
2010-04-23 13:11   ` Richard Earnshaw
2010-05-05 11:45     ` Bernd Schmidt
2010-05-05 12:11       ` Richard Earnshaw
2010-05-05 22:47         ` Bernd Schmidt
2010-06-08 15:57     ` Bernd Schmidt
2010-06-28 12:27       ` Ping: " Bernd Schmidt
2010-07-07 11:57 ` Resubmit/Ping^n " Bernd Schmidt
2010-07-16 11:19   ` Ping^(n+1) " Bernd Schmidt
2010-07-23 10:23     ` Ping^(n+2) " Bernd Schmidt
2010-07-23 21:49     ` Ping^(n+1) " Steven Bosscher
     [not found]   ` <1280580386.32159.25.camel@e102346-lin.cambridge.arm.com>
2010-08-02 10:09     ` Resubmit/Ping^n " Bernd Schmidt
