public inbox for gcc-patches@gcc.gnu.org
* [PATCH, ARM][0/4] Prologue/epilogue using STRD/LDRD in Thumb mode
@ 2012-10-10 14:48 Greta Yorsh
  2012-10-10 15:03 ` [PATCH, ARM][1/4] New RTL patterns for LDRD/STRD " Greta Yorsh
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Greta Yorsh @ 2012-10-10 14:48 UTC (permalink / raw)
  To: GCC Patches
  Cc: Ramana Radhakrishnan, Richard Earnshaw, nickc, paul, Greta Yorsh

Generate the prologue and epilogue using STRD/LDRD in Thumb mode when the
prefer_ldrd_strd tuning flag is set, as it is for Cortex-A15.

[1/4] New RTL patterns for LDRD/STRD in Thumb mode
[2/4] Prologue using STRD in Thumb mode
[3/4] Epilogue using LDRD in Thumb mode
[4/4] Adjust tests gcc.target/arm/pr40457-*.c

Testing and benchmarking:
* No regression on qemu for arm-none-eabi cortex-a15 neon softfp thumb.
* Successful bootstrap on Cortex-A15.
* 3% performance improvement in a popular embedded benchmark.

Ok for trunk?

Thanks,
Greta




* [PATCH, ARM][2/4] Prologue using STRD in Thumb mode
  2012-10-10 14:48 [PATCH, ARM][0/4] Prologue/epilogue using STRD/LDRD in Thumb mode Greta Yorsh
  2012-10-10 15:03 ` [PATCH, ARM][1/4] New RTL patterns for LDRD/STRD " Greta Yorsh
@ 2012-10-10 15:03 ` Greta Yorsh
  2012-10-18 14:41   ` Richard Earnshaw
  2012-10-10 15:04 ` [PATCH, ARM][3/4] Epilogue using LDRD " Greta Yorsh
  2012-10-10 15:13 ` [PATCH, ARM][4/4] Adjust tests gcc.target/arm/pr40457-*.c Greta Yorsh
  3 siblings, 1 reply; 13+ messages in thread
From: Greta Yorsh @ 2012-10-10 15:03 UTC (permalink / raw)
  To: Greta Yorsh, GCC Patches
  Cc: Ramana Radhakrishnan, Richard Earnshaw, nickc, paul

[-- Attachment #1: Type: text/plain, Size: 326 bytes --]

Generate prologue using STRD when prefer_ldrd_strd is set in tune_params.
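
For illustration only (this is not part of the patch): a minimal standalone
sketch of the register pairing that thumb2_emit_strd_push performs in the
attached diff. It assumes the stack pointer has already been decremented by
4 * num_regs. The names plan_strd_push, struct store_plan and LAST_REGNUM are
hypothetical, and the model only records which registers would share an STRD
instead of emitting RTL.

```c
#include <assert.h>

#define LAST_REGNUM 15  /* hypothetical stand-in for LAST_ARM_REGNUM */

/* One planned store.  reg2 < 0 means a single STR, otherwise an STRD.  */
struct store_plan
{
  int reg1;
  int reg2;
  int offset;  /* byte offset from the already-decremented SP */
};

/* Fill PLAN with the stores for MASK, in emission order, and return how
   many were planned.  PLAN needs room for 8 entries (16 registers give at
   most 8 pairs; 15 registers give 7 pairs plus one STR).  */
static int
plan_strd_push (unsigned long mask, struct store_plan *plan)
{
  int regs[LAST_REGNUM + 1];
  int num_regs = 0;
  int count = 0;
  int odd, j, k;

  /* Slot k of the frame holds the k-th lowest-numbered saved register.  */
  for (j = 0; j <= LAST_REGNUM; j++)
    if (mask & (1UL << j))
      regs[num_regs++] = j;

  assert (num_regs > 0);
  odd = num_regs % 2;

  /* Pairs are formed from the top of the frame downwards.  */
  for (k = num_regs - 2; k >= odd; k -= 2)
    {
      plan[count].reg1 = regs[k];
      plan[count].reg2 = regs[k + 1];
      plan[count].offset = 4 * k;
      count++;
    }

  /* With an odd count, the lowest register is left over and uses STR.  */
  if (odd)
    {
      plan[count].reg1 = regs[0];
      plan[count].reg2 = -1;
      plan[count].offset = 0;
      count++;
    }

  return count;
}
```

Under these assumptions, pushing {r4, r5, r6, lr} plans STRD r6, lr at
[sp, #8] followed by STRD r4, r5 at [sp], while {r4, r5, lr} plans
STRD r5, lr at [sp, #4] followed by STR r4 at [sp].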

ChangeLog

gcc/

2012-09-13  Sameera Deshpande  <sameera.deshpande@arm.com>
            Greta Yorsh  <Greta.Yorsh@arm.com>

        * config/arm/arm.c (thumb2_emit_strd_push): New function.
        (arm_expand_prologue): Use the new function.

[-- Attachment #2: 2-thumb-prolog-strd.patch.txt --]
[-- Type: text/plain, Size: 6132 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index b3a3774..1212a93 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15737,6 +15737,126 @@ arm_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
     }
 }
 
+/* Generate and emit patterns that will be recognized as STRD.  If an even
+   number of registers is pushed, STRD patterns are created for all register
+   pairs.  If an odd number of registers is pushed, emit a combination of
+   STRDs and one STR for the prologue saves.  */
+static void
+thumb2_emit_strd_push (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg, tmp1;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  /* Pre-decrement the stack pointer, based on there being num_regs 4-byte
+     registers to push.  */
+  tmp = gen_rtx_SET (VOIDmode,
+                     stack_pointer_rtx,
+                     plus_constant (Pmode, stack_pointer_rtx, -4 * num_regs));
+  RTX_FRAME_RELATED_P (tmp) = 1;
+  insn = emit_insn (tmp);
+
+  /* Create sequence for DWARF info.  */
+  dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (num_regs + 1));
+
+  /* RTLs cannot be shared, hence create new copy for dwarf.  */
+  tmp1 = gen_rtx_SET (VOIDmode,
+                     stack_pointer_rtx,
+                     plus_constant (Pmode, stack_pointer_rtx, -4 * num_regs));
+  RTX_FRAME_RELATED_P (tmp1) = 1;
+  XVECEXP (dwarf, 0, 0) = tmp1;
+
+  /* Var j iterates over all the registers to gather all the registers in
+     saved_regs_mask.  Var i gives index of register R_j in stack frame.
+     A PARALLEL RTX of register-pair is created here, so that pattern for
+     STRD can be matched.  If num_regs is odd, 1st register will be pushed
+     using STR and remaining registers will be pushed with STRD in pairs.
+     If num_regs is even, all registers are pushed with STRD in pairs.
+     Hence, skip first element for odd num_regs.  */
+  for (i = num_regs - 1, j = LAST_ARM_REGNUM; i >= (num_regs % 2); j--)
+    if (saved_regs_mask & (1 << j))
+      {
+        gcc_assert (j != SP_REGNUM);
+        gcc_assert (j != PC_REGNUM);
+
+        /* Create RTX for store.  New RTX is created for dwarf as
+           they are not sharable.  */
+        reg = gen_rtx_REG (SImode, j);
+        tmp = gen_rtx_SET (SImode,
+                           gen_frame_mem
+                           (SImode,
+                            plus_constant (Pmode, stack_pointer_rtx, 4 * i)),
+                           reg);
+
+        tmp1 = gen_rtx_SET (SImode,
+                           gen_frame_mem
+                           (SImode,
+                            plus_constant (Pmode, stack_pointer_rtx, 4 * i)),
+                           reg);
+        RTX_FRAME_RELATED_P (tmp) = 1;
+        RTX_FRAME_RELATED_P (tmp1) = 1;
+
+        if (((i - (num_regs % 2)) % 2) == 1)
+          /* When (i - (num_regs % 2)) is odd, the RTX to be emitted is yet to
+             be created.  Hence create it first.  The STRD pattern we are
+             generating is :
+             [ (SET (MEM (PLUS (SP) (NUM))) (reg_t1))
+               (SET (MEM (PLUS (SP) (NUM + 4))) (reg_t2)) ]
+             where target registers need not be consecutive.  */
+          par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+
+        /* Register R_j is added in PARALLEL RTX.  If (i - (num_regs % 2)) is
+           even, reg_j is added as 0th element and if it is odd, reg_j is
+           added as 1st element of STRD pattern shown above.  */
+        XVECEXP (par, 0, ((i - (num_regs % 2)) % 2)) = tmp;
+        XVECEXP (dwarf, 0, (i + 1)) = tmp1;
+
+        if (((i - (num_regs % 2)) % 2) == 0)
+          /* When (i - (num_regs % 2)) is even, RTXs for both the registers
+             to be stored are generated in above given STRD pattern, and the
+             pattern can be emitted now.  */
+          emit_insn (par);
+
+        i--;
+      }
+
+  if ((num_regs % 2) == 1)
+    {
+      /* If odd number of registers are pushed, generate STR pattern to store
+         lone register.  */
+      for (; (saved_regs_mask & (1 << j)) == 0; j--);
+
+      tmp1 = gen_frame_mem (SImode, plus_constant (Pmode,
+                                                   stack_pointer_rtx, 4 * i));
+      reg = gen_rtx_REG (SImode, j);
+      tmp = gen_rtx_SET (SImode, tmp1, reg);
+      RTX_FRAME_RELATED_P (tmp) = 1;
+
+      emit_insn (tmp);
+
+      tmp1 = gen_rtx_SET (SImode,
+                         gen_frame_mem
+                         (SImode,
+                          plus_constant (Pmode, stack_pointer_rtx, 4 * i)),
+                          reg);
+      RTX_FRAME_RELATED_P (tmp1) = 1;
+      XVECEXP (dwarf, 0, (i + 1)) = tmp1;
+    }
+
+  add_reg_note (insn, REG_FRAME_RELATED_EXPR, dwarf);
+  RTX_FRAME_RELATED_P (insn) = 1;
+  return;
+}
+
 /* Generate and emit an insn that we will recognize as a push_multi.
    Unfortunately, since this insn does not reflect very well the actual
    semantics of the operation, we need to annotate the insn for the benefit
@@ -16661,8 +16781,25 @@ arm_expand_prologue (void)
 	      saved_regs += frame;
 	    }
 	}
-      insn = emit_multi_reg_push (live_regs_mask);
-      RTX_FRAME_RELATED_P (insn) = 1;
+
+      if (current_tune->prefer_ldrd_strd
+          && !optimize_function_for_size_p (cfun))
+        {
+          if (TARGET_THUMB2)
+            {
+              thumb2_emit_strd_push (live_regs_mask);
+            }
+          else
+            {
+              insn = emit_multi_reg_push (live_regs_mask);
+              RTX_FRAME_RELATED_P (insn) = 1;
+            }
+        }
+      else
+        {
+          insn = emit_multi_reg_push (live_regs_mask);
+          RTX_FRAME_RELATED_P (insn) = 1;
+        }
     }
 
   if (! IS_VOLATILE (func_type))


* [PATCH, ARM][1/4] New RTL patterns for LDRD/STRD in Thumb mode
  2012-10-10 14:48 [PATCH, ARM][0/4] Prologue/epilogue using STRD/LDRD in Thumb mode Greta Yorsh
@ 2012-10-10 15:03 ` Greta Yorsh
  2012-10-18 13:54   ` Richard Earnshaw
  2012-10-10 15:03 ` [PATCH, ARM][2/4] Prologue using STRD " Greta Yorsh
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Greta Yorsh @ 2012-10-10 15:03 UTC (permalink / raw)
  To: Greta Yorsh, GCC Patches
  Cc: Ramana Radhakrishnan, Richard Earnshaw, nickc, paul

[-- Attachment #1: Type: text/plain, Size: 758 bytes --]

This patch adds define_insn patterns for LDRD and STRD in Thumb mode.
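
For illustration only (this is not part of the patch): a minimal standalone
model of the offset check that offset_ok_for_ldrd_strd performs in the
attached diff. The function name offset_ok_model and its thumb2 parameter are
hypothetical; the parameter stands in for GCC's TARGET_THUMB2 and TARGET_ARM
macros so that the range rules can be tried outside the compiler.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if OFFSET could be encoded in an LDRD/STRD immediate.
   THUMB2 selects the Thumb-2 rules; otherwise the ARM-state rules apply.
   This mirrors the limits used in the patch (1020 and 255).  */
static bool
offset_ok_model (long offset, bool thumb2)
{
  /* Thumb-2 encodes an 8-bit immediate scaled by 4, so the offset must be
     a multiple of 4 within +/-1020.  ARM state encodes an unscaled 8-bit
     immediate, so the offset must lie within +/-255.  */
  long max_offset = thumb2 ? 1020 : 255;

  if (thumb2 && (offset & 3) != 0)
    return false;

  return offset <= max_offset && offset >= -max_offset;
}
```

With this model, 1020 and -4 are accepted in Thumb-2 mode while 6 and 1024
are rejected; in ARM state 6 and 255 are accepted and 256 is rejected.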

ChangeLog

gcc/

2012-09-13  Sameera Deshpande  <sameera.deshpande@arm.com>
            Greta Yorsh  <Greta.Yorsh@arm.com>

        * config/arm/arm-protos.h (offset_ok_for_ldrd_strd): New
        declaration.
        (operands_ok_ldrd_strd): Likewise.
        * config/arm/arm.c (offset_ok_for_ldrd_strd): New function.
        (operands_ok_ldrd_strd): Likewise.
        * config/arm/arm.md (thumb2_ldrd, thumb2_ldrd_base): New patterns.
        (thumb2_ldrd_base_neg): Likewise.
        (thumb2_strd, thumb2_strd_base, thumb2_strd_base_neg): Likewise.
        * config/arm/predicates.md (ldrd_strd_offset_operand): New predicate.
        * config/arm/constraints.md (Do): New constraint.

[-- Attachment #2: 1-thumb-patterns.patch.txt --]
[-- Type: text/plain, Size: 10108 bytes --]

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index c590ef4..317bca7 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -116,6 +116,8 @@ extern bool gen_stm_seq (rtx *, int);
 extern bool gen_const_stm_seq (rtx *, int);
 extern rtx arm_gen_load_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
 extern rtx arm_gen_store_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
+extern bool offset_ok_for_ldrd_strd (HOST_WIDE_INT);
+extern bool operands_ok_ldrd_strd (rtx, rtx, rtx, HOST_WIDE_INT, bool, bool);
 extern int arm_gen_movmemqi (rtx *);
 extern enum machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx);
 extern enum machine_mode arm_select_dominance_cc_mode (rtx, rtx,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3fce8c4..b3a3774 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12123,6 +12123,75 @@ arm_pad_reg_upward (enum machine_mode mode,
   return !BYTES_BIG_ENDIAN;
 }
 
+/* Returns true iff OFFSET is valid for use in an LDRD/STRD instruction,
+   assuming that the address in the base register is word aligned.  */
+bool
+offset_ok_for_ldrd_strd (HOST_WIDE_INT offset)
+{
+  HOST_WIDE_INT max_offset;
+
+  /* Offset must be a multiple of 4 in Thumb mode.  */
+  if (TARGET_THUMB2 && ((offset & 3) != 0))
+    return false;
+
+  if (TARGET_THUMB2)
+    max_offset = 1020;
+  else if (TARGET_ARM)
+    max_offset = 255;
+  else
+    gcc_unreachable ();
+
+  return ((offset <= max_offset) && (offset >= -max_offset));
+}
+
+/* Checks whether the operands are valid for use in an LDRD/STRD instruction.
+   Assumes that RT, RT2, and RTN are REG.  This is guaranteed by the patterns.
+   Assumes that the address in the base register RTN is word aligned.  Pattern
+   guarantees that both memory accesses use the same base register,
+   the offsets are constants within the range, and the gap between the offsets is 4.
+   If reload has completed, check that the registers are legal.  WBACK indicates whether
+   address is updated.  LOAD indicates whether memory access is load or store.  */
+bool
+operands_ok_ldrd_strd (rtx rt, rtx rt2, rtx rtn, HOST_WIDE_INT offset,
+                       bool wback, bool load)
+{
+  unsigned int t, t2, n;
+
+  if (!reload_completed)
+    return true;
+
+  if (!offset_ok_for_ldrd_strd (offset))
+    return false;
+
+  t = REGNO (rt);
+  t2 = REGNO (rt2);
+  n = REGNO (rtn);
+
+  if ((TARGET_THUMB2)
+      && ((wback && (n == t || n == t2))
+          || (t == SP_REGNUM)
+          || (t == PC_REGNUM)
+          || (t2 == SP_REGNUM)
+          || (t2 == PC_REGNUM)
+          || (!load && (n == PC_REGNUM))
+          || (load && (t == t2))
+          /* Triggers Cortex-M3 LDRD errata.  */
+          || (!wback && load && fix_cm3_ldrd && (n == t))))
+    return false;
+
+  if ((TARGET_ARM)
+      && ((wback && (n == t || n == t2))
+          || (t2 == PC_REGNUM)
+          || (t % 2 != 0)   /* First destination register is not even.  */
+          || (t2 != t + 1)
+          /* PC can be used as base register (for offset addressing only),
+             but it is deprecated.  */
+          || (n == PC_REGNUM)))
+    return false;
+
+  return true;
+}
+
 \f
 /* Print a symbolic form of X to the debug file, F.  */
 static void
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index e9da56d..ed82634 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -11472,6 +11472,99 @@
   "
 )
 
+;; Patterns for LDRD/STRD in Thumb2 mode
+
+(define_insn "*thumb2_ldrd"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+        (mem:SI (plus:SI (match_operand:SI 1 "s_register_operand" "rk")
+                         (match_operand:SI 2 "ldrd_strd_offset_operand" "Do"))))
+   (set (match_operand:SI 3 "s_register_operand" "=r")
+        (mem:SI (plus:SI (match_dup 1)
+                         (match_operand:SI 4 "const_int_operand" ""))))]
+  "TARGET_LDRD && TARGET_THUMB2
+     && (current_tune->prefer_ldrd_strd && !optimize_function_for_size_p (cfun))
+     && ((INTVAL (operands[2]) + 4) == INTVAL (operands[4]))
+     && (operands_ok_ldrd_strd (operands[0], operands[3],
+                                  operands[1], INTVAL (operands[2]),
+                                  false, true))"
+  "ldrd%?\t%0, %3, [%1, %2]"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_ldrd_base"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+        (mem:SI (match_operand:SI 1 "s_register_operand" "rk")))
+   (set (match_operand:SI 2 "s_register_operand" "=r")
+        (mem:SI (plus:SI (match_dup 1)
+                         (const_int 4))))]
+  "TARGET_LDRD && TARGET_THUMB2
+     && (current_tune->prefer_ldrd_strd && !optimize_function_for_size_p (cfun))
+     && (operands_ok_ldrd_strd (operands[0], operands[2],
+                                  operands[1], 0, false, true))"
+  "ldrd%?\t%0, %2, [%1]"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_ldrd_base_neg"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(mem:SI (plus:SI (match_operand:SI 1 "s_register_operand" "rk")
+                         (const_int -4))))
+   (set (match_operand:SI 2 "s_register_operand" "=r")
+        (mem:SI (match_dup 1)))]
+  "TARGET_LDRD && TARGET_THUMB2
+     && (current_tune->prefer_ldrd_strd && !optimize_function_for_size_p (cfun))
+     && (operands_ok_ldrd_strd (operands[0], operands[2],
+                                  operands[1], -4, false, true))"
+  "ldrd%?\t%0, %2, [%1, #-4]"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_strd"
+  [(set (mem:SI (plus:SI (match_operand:SI 0 "s_register_operand" "rk")
+                         (match_operand:SI 1 "ldrd_strd_offset_operand" "Do")))
+        (match_operand:SI 2 "s_register_operand" "r"))
+   (set (mem:SI (plus:SI (match_dup 0)
+                         (match_operand:SI 3 "const_int_operand" "")))
+        (match_operand:SI 4 "s_register_operand" "r"))]
+  "TARGET_LDRD && TARGET_THUMB2
+     && (current_tune->prefer_ldrd_strd && !optimize_function_for_size_p (cfun))
+     && ((INTVAL (operands[1]) + 4) == INTVAL (operands[3]))
+     && (operands_ok_ldrd_strd (operands[2], operands[4],
+                                  operands[0], INTVAL (operands[1]),
+                                  false, false))"
+  "strd%?\t%2, %4, [%0, %1]"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_strd_base"
+  [(set (mem:SI (match_operand:SI 0 "s_register_operand" "rk"))
+        (match_operand:SI 1 "s_register_operand" "r"))
+   (set (mem:SI (plus:SI (match_dup 0)
+                         (const_int 4)))
+        (match_operand:SI 2 "s_register_operand" "r"))]
+  "TARGET_LDRD && TARGET_THUMB2
+     && (current_tune->prefer_ldrd_strd && !optimize_function_for_size_p (cfun))
+     && (operands_ok_ldrd_strd (operands[1], operands[2],
+                                  operands[0], 0, false, false))"
+  "strd%?\t%1, %2, [%0]"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_strd_base_neg"
+  [(set (mem:SI (plus:SI (match_operand:SI 0 "s_register_operand" "rk")
+                         (const_int -4)))
+        (match_operand:SI 1 "s_register_operand" "r"))
+   (set (mem:SI (match_dup 0))
+        (match_operand:SI 2 "s_register_operand" "r"))]
+  "TARGET_LDRD && TARGET_THUMB2
+     && (current_tune->prefer_ldrd_strd && !optimize_function_for_size_p (cfun))
+     && (operands_ok_ldrd_strd (operands[1], operands[2],
+                                  operands[0], -4, false, false))"
+  "strd%?\t%1, %2, [%0, #-4]"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+
 ;; Load the load/store multiple patterns
 (include "ldmstm.md")
 
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index b67df55..231d910 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -31,7 +31,7 @@
 ;; 'H' was previously used for FPA.
 
 ;; The following multi-letter normal constraints have been used:
-;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
+;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, Dl, DL, Do, Dv, Dy, Di, Dt, Dz
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py
 
@@ -279,6 +279,12 @@
       (match_test "TARGET_32BIT
 		   && imm_for_neon_inv_logic_operand (op, GET_MODE (op))")))
 
+(define_constraint "Do"
+ "@internal
+  In ARM/Thumb2 state valid offset for an ldrd/strd instruction."
+ (and (match_code "const_int")
+      (match_test "offset_ok_for_ldrd_strd (ival)")))
+
 (define_constraint "Dv"
  "@internal
   In ARM/Thumb-2 state a const_double which can be used with a VFP fconsts
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 8ae26ca..badb68b 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -137,6 +137,10 @@
        (match_test "((unsigned HOST_WIDE_INT) INTVAL (op)) <= GET_MODE_BITSIZE (mode)
 	&& ((unsigned HOST_WIDE_INT) INTVAL (op)) > 0")))
 
+(define_predicate "ldrd_strd_offset_operand"
+  (and (match_operand 0 "const_int_operand")
+       (match_test "offset_ok_for_ldrd_strd (INTVAL (op))")))
+
 (define_predicate "arm_add_operand"
   (ior (match_operand 0 "arm_rhs_operand")
        (match_operand 0 "arm_neg_immediate_operand")))
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f330da3..21d1aa8 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12130,6 +12130,9 @@ offset_ok_for_ldrd_strd (HOST_WIDE_INT offset)
 {
   HOST_WIDE_INT max_offset;
 
+  if (!TARGET_LDRD)
+    return false;
+
   /* Offset must be a multiple of 4 in Thumb mode.  */
   if (TARGET_THUMB2 && ((offset & 3) != 0))
     return false;


* [PATCH, ARM][3/4] Epilogue using LDRD in Thumb mode
  2012-10-10 14:48 [PATCH, ARM][0/4] Prologue/epilogue using STRD/LDRD in Thumb mode Greta Yorsh
  2012-10-10 15:03 ` [PATCH, ARM][1/4] New RTL patterns for LDRD/STRD " Greta Yorsh
  2012-10-10 15:03 ` [PATCH, ARM][2/4] Prologue using STRD " Greta Yorsh
@ 2012-10-10 15:04 ` Greta Yorsh
  2012-10-19 15:03   ` Richard Earnshaw
  2012-10-10 15:13 ` [PATCH, ARM][4/4] Adjust tests gcc.target/arm/pr40457-*.c Greta Yorsh
  3 siblings, 1 reply; 13+ messages in thread
From: Greta Yorsh @ 2012-10-10 15:04 UTC (permalink / raw)
  To: Greta Yorsh, GCC Patches
  Cc: Ramana Radhakrishnan, Richard Earnshaw, nickc, paul

[-- Attachment #1: Type: text/plain, Size: 335 bytes --]

Generate epilogue using LDRD in Thumb mode when prefer_ldrd_strd is set in
tune_params.
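
For illustration only (this is not part of the patch): a minimal standalone
sketch of how thumb2_emit_ldrd_pop in the attached diff splits a register
mask into LDRD pairs plus a final instruction. The names plan_ldrd_pop,
struct pop_plan, enum pop_tail and PC_REGNUM_MODEL are hypothetical, and the
model only classifies the sequence instead of emitting RTL.

```c
#include <assert.h>
#include <stdbool.h>

#define PC_REGNUM_MODEL 15  /* hypothetical stand-in for PC_REGNUM */

/* How the registers left over after the LDRD pairs are restored.  */
enum pop_tail
{
  TAIL_NONE,           /* even count, no PC: nothing left */
  TAIL_LDR,            /* one register left: LDR with post-increment */
  TAIL_LDR_PC_RETURN,  /* only PC left: LDR into PC, which returns */
  TAIL_POP_WITH_PC     /* one register plus PC left: pop both and return */
};

struct pop_plan
{
  int ldrd_pairs;      /* number of LDRD instructions */
  int sp_adjust;       /* bytes added to SP after the LDRDs */
  enum pop_tail tail;
};

static struct pop_plan
plan_ldrd_pop (unsigned long mask)
{
  struct pop_plan plan;
  bool return_in_pc = (mask & (1UL << PC_REGNUM_MODEL)) != 0;
  int num_regs = 0;
  int j;

  for (j = 0; j <= PC_REGNUM_MODEL; j++)
    if (mask & (1UL << j))
      num_regs++;

  assert (num_regs > 0);

  /* PC cannot be loaded by LDRD, so it is excluded from the pairing.  */
  if (return_in_pc)
    num_regs--;

  plan.ldrd_pairs = num_regs / 2;
  plan.sp_adjust = 8 * plan.ldrd_pairs;

  if (num_regs % 2 == 1)
    plan.tail = return_in_pc ? TAIL_POP_WITH_PC : TAIL_LDR;
  else
    plan.tail = return_in_pc ? TAIL_LDR_PC_RETURN : TAIL_NONE;

  return plan;
}
```

Under these assumptions, popping {r4, r5, r6, pc} uses one LDRD for r4 and
r5, adds 8 to SP, and then pops r6 together with PC; popping {r4, r5, pc}
uses one LDRD and then loads PC with a post-incrementing LDR.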

ChangeLog

gcc/

2012-09-13  Sameera Deshpande  <sameera.deshpande@arm.com>
            Greta Yorsh  <Greta.Yorsh@arm.com>

        * config/arm/arm.c (thumb2_emit_ldrd_pop): New function.
        (arm_expand_epilogue): Use the new function.

[-- Attachment #2: 3-thumb-epilog-ldrd.patch.txt --]
[-- Type: text/plain, Size: 6330 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1212a93..f330da3 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -16150,6 +16150,143 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num_regs, rtx base_reg)
   REG_NOTES (par) = dwarf;
 }
 
+/* Generate and emit patterns that will be recognized as LDRD.  If an even
+   number of registers is popped, LDRD patterns are created for all register
+   pairs.  If an odd number of registers is popped, the last register is
+   loaded using an LDR pattern.  */
+static void
+thumb2_emit_ldrd_pop (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg, tmp1;
+  bool return_in_pc;
+
+  return_in_pc = (saved_regs_mask & (1 << PC_REGNUM)) ? true : false;
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  /* We cannot generate ldrd for PC.  Hence, reduce the count if PC is
+     to be popped.  So, if num_regs is even, now it will become odd,
+     and we can generate pop with PC.  If num_regs is odd, it will be
+     even now, and ldr with return can be generated for PC.  */
+  if (return_in_pc)
+    num_regs--;
+
+  /* Var j iterates over all the registers to gather all the registers in
+     saved_regs_mask.  Var i gives index of saved registers in stack frame.
+     A PARALLEL RTX of register-pair is created here, so that pattern for
+     LDRD can be matched.  As PC is always the last register to be popped,
+     and we have already decremented num_regs if PC is in the mask, we don't
+     have to worry about PC in this loop.  */
+  for (i = 0, j = 0; i < (num_regs - (num_regs % 2)); j++)
+    if (saved_regs_mask & (1 << j))
+      {
+        gcc_assert (j != SP_REGNUM);
+
+        /* Create RTX for memory load.  */
+        reg = gen_rtx_REG (SImode, j);
+        tmp = gen_rtx_SET (SImode,
+                           reg,
+                           gen_frame_mem (SImode,
+                               plus_constant (Pmode,
+                                              stack_pointer_rtx, 4 * i)));
+        RTX_FRAME_RELATED_P (tmp) = 1;
+
+        if (i % 2 == 0)
+          {
+            /* When saved-register index (i) is even, the RTX to be emitted is
+               yet to be created.  Hence create it first.  The LDRD pattern we
+               are generating is :
+               [ (SET (reg_t0) (MEM (PLUS (SP) (NUM))))
+                 (SET (reg_t1) (MEM (PLUS (SP) (NUM + 4)))) ]
+               where target registers need not be consecutive.  */
+            par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+            dwarf = NULL_RTX;
+          }
+
+        /* ith register is added in PARALLEL RTX.  If i is even, the reg_i is
+           added as 0th element and if i is odd, reg_i is added as 1st element
+           of LDRD pattern shown above.  */
+        XVECEXP (par, 0, (i % 2)) = tmp;
+        dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+
+        if ((i % 2) == 1)
+          {
+            /* When saved-register index (i) is odd, RTXs for both the registers
+               to be loaded are generated in above given LDRD pattern, and the
+               pattern can be emitted now.  */
+            par = emit_insn (par);
+            REG_NOTES (par) = dwarf;
+          }
+
+        i++;
+      }
+
+  /* If the number of registers pushed is odd AND return_in_pc is false OR
+     number of registers are even AND return_in_pc is true, last register is
+     popped using LDR.  It can be PC as well.  Hence, adjust the stack first and
+     then LDR with post increment.  */
+
+  /* Increment the stack pointer past the i 4-byte registers restored by
+     the LDRD patterns above.  */
+  tmp = gen_rtx_SET (VOIDmode,
+                     stack_pointer_rtx,
+                     plus_constant (Pmode, stack_pointer_rtx, 4 * i));
+  RTX_FRAME_RELATED_P (tmp) = 1;
+  emit_insn (tmp);
+
+  dwarf = NULL_RTX;
+
+  if (((num_regs % 2) == 1 && !return_in_pc)
+      || ((num_regs % 2) == 0 && return_in_pc))
+    {
+      /* Scan for the single register to be popped.  Skip until the saved
+         register is found.  */
+      for (; (saved_regs_mask & (1 << j)) == 0; j++);
+
+      /* Gen LDR with post increment here.  */
+      tmp1 = gen_rtx_MEM (SImode,
+                          gen_rtx_POST_INC (SImode,
+                                            stack_pointer_rtx));
+      set_mem_alias_set (tmp1, get_frame_alias_set ());
+
+      reg = gen_rtx_REG (SImode, j);
+      tmp = gen_rtx_SET (SImode, reg, tmp1);
+      RTX_FRAME_RELATED_P (tmp) = 1;
+      dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+
+      if (return_in_pc)
+        {
+          /* If return_in_pc, j must be PC_REGNUM.  */
+          gcc_assert (j == PC_REGNUM);
+          par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+          XVECEXP (par, 0, 0) = ret_rtx;
+          XVECEXP (par, 0, 1) = tmp;
+          par = emit_jump_insn (par);
+        }
+      else
+        {
+          par = emit_insn (tmp);
+        }
+
+      REG_NOTES (par) = dwarf;
+    }
+  else if ((num_regs % 2) == 1 && return_in_pc)
+    {
+      /* There are 2 registers to be popped.  So, generate the pattern
+         pop_multiple_with_stack_update_and_return to pop in PC.  */
+      arm_emit_multi_reg_pop (saved_regs_mask & (~((1 << j) - 1)));
+    }
+
+  return;
+}
+
 /* Calculate the size of the return value that is passed in registers.  */
 static unsigned
 arm_size_return_regs (void)
@@ -23102,7 +23239,16 @@ arm_expand_epilogue (bool really_return)
         }
       else
         {
-          arm_emit_multi_reg_pop (saved_regs_mask);
+          if (current_tune->prefer_ldrd_strd
+              && !optimize_function_for_size_p (cfun))
+            {
+              if (TARGET_THUMB2)
+                thumb2_emit_ldrd_pop (saved_regs_mask);
+              else
+                arm_emit_multi_reg_pop (saved_regs_mask);
+            }
+          else
+            arm_emit_multi_reg_pop (saved_regs_mask);
         }
 
       if (return_in_pc == true)


* [PATCH, ARM][4/4] Adjust tests gcc.target/arm/pr40457-*.c
  2012-10-10 14:48 [PATCH, ARM][0/4] Prologue/epilogue using STRD/LDRD in Thumb mode Greta Yorsh
                   ` (2 preceding siblings ...)
  2012-10-10 15:04 ` [PATCH, ARM][3/4] Epilogue using LDRD " Greta Yorsh
@ 2012-10-10 15:13 ` Greta Yorsh
  2012-10-19 15:10   ` Richard Earnshaw
  3 siblings, 1 reply; 13+ messages in thread
From: Greta Yorsh @ 2012-10-10 15:13 UTC (permalink / raw)
  To: Greta Yorsh, GCC Patches
  Cc: Ramana Radhakrishnan, Richard Earnshaw, nickc, paul

[-- Attachment #1: Type: text/plain, Size: 505 bytes --]

As a result of adding LDRD/STRD patterns in Thumb mode, the compiler
generates LDRD/STRD instead of LDM/STM in some cases. This patch adjusts
existing tests to accept LDRD/STRD in addition to LDM/STM.

ChangeLog

gcc/testsuite

2012-09-13  Sameera Deshpande  <sameera.deshpande@arm.com>
            Greta Yorsh  <Greta.Yorsh@arm.com>

        * gcc.target/arm/pr40457-1.c: Adjust expected output.
        * gcc.target/arm/pr40457-2.c: Likewise.
        * gcc.target/arm/pr40457-3.c: Likewise.

[-- Attachment #2: 4-thumb-adjust-tests.v2.patch.txt --]
[-- Type: text/plain, Size: 1130 bytes --]

diff --git a/gcc/testsuite/gcc.target/arm/pr40457-1.c b/gcc/testsuite/gcc.target/arm/pr40457-1.c
index 815fd38..8895659 100644
--- a/gcc/testsuite/gcc.target/arm/pr40457-1.c
+++ b/gcc/testsuite/gcc.target/arm/pr40457-1.c
@@ -7,4 +7,4 @@ int bar(int* p)
   return x;
 }
 
-/* { dg-final { scan-assembler "ldm" } } */
+/* { dg-final { scan-assembler "ldrd|ldm" } } */
diff --git a/gcc/testsuite/gcc.target/arm/pr40457-2.c b/gcc/testsuite/gcc.target/arm/pr40457-2.c
index 187f7bf..5079939 100644
--- a/gcc/testsuite/gcc.target/arm/pr40457-2.c
+++ b/gcc/testsuite/gcc.target/arm/pr40457-2.c
@@ -7,4 +7,4 @@ void foo(int* p)
   p[1] = 0;
 }
 
-/* { dg-final { scan-assembler "stm" } } */
+/* { dg-final { scan-assembler "strd|stm" } } */
diff --git a/gcc/testsuite/gcc.target/arm/pr40457-3.c b/gcc/testsuite/gcc.target/arm/pr40457-3.c
index 9bd5a17..8823a80 100644
--- a/gcc/testsuite/gcc.target/arm/pr40457-3.c
+++ b/gcc/testsuite/gcc.target/arm/pr40457-3.c
@@ -7,4 +7,4 @@ void foo(int* p)
   p[1] = 0;
 }
 
-/* { dg-final { scan-assembler "stm" } } */
+/* { dg-final { scan-assembler "strd|stm" } } */


* Re: [PATCH, ARM][1/4] New RTL patterns for LDRD/STRD in Thumb mode
  2012-10-10 15:03 ` [PATCH, ARM][1/4] New RTL patterns for LDRD/STRD " Greta Yorsh
@ 2012-10-18 13:54   ` Richard Earnshaw
  2012-10-19 15:44     ` Greta Yorsh
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Earnshaw @ 2012-10-18 13:54 UTC (permalink / raw)
  To: Greta Yorsh; +Cc: GCC Patches, Ramana Radhakrishnan, nickc, paul

On 10/10/12 16:03, Greta Yorsh wrote:
> This patch adds define_insn patterns for LDRD and STRD in Thumb mode.
>
> ChangeLog
>
> gcc/
>
> 2012-09-13  Sameera Deshpande  <sameera.deshpande@arm.com>
>              Greta Yorsh  <Greta.Yorsh@arm.com>
>
>          * config/arm/arm-protos.h (offset_ok_for_ldrd_strd): New
>          declaration.
>          (operands_ok_ldrd_strd): Likewise.
>          * config/arm/arm.c (offset_ok_for_ldrd_strd): New function.
>          (operands_ok_ldrd_strd): Likewise.
>          * config/arm/arm.md (thumb2_ldrd, thumb2_ldrd_base): New patterns.
>          (thumb2_ldrd_base_neg): Likewise.
>          (thumb2_strd, thumb2_strd_base, thumb2_strd_base_neg): Likewise.
>          * config/arm/predicates.md (ldrd_strd_offset_operand): New predicate.
>          * config/arm/constraints.md (Do): New constraint.
>
>
> 1-thumb-patterns.patch.txt
>
>
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index c590ef4..317bca7 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -116,6 +116,8 @@ extern bool gen_stm_seq (rtx *, int);
>   extern bool gen_const_stm_seq (rtx *, int);
>   extern rtx arm_gen_load_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
>   extern rtx arm_gen_store_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
> +extern bool offset_ok_for_ldrd_strd (HOST_WIDE_INT);
> +extern bool operands_ok_ldrd_strd (rtx, rtx, rtx, HOST_WIDE_INT, bool, bool);
>   extern int arm_gen_movmemqi (rtx *);
>   extern enum machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx);
>   extern enum machine_mode arm_select_dominance_cc_mode (rtx, rtx,
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 3fce8c4..b3a3774 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -12123,6 +12123,75 @@ arm_pad_reg_upward (enum machine_mode mode,
>     return !BYTES_BIG_ENDIAN;
>   }
>
> +/* Returns true iff OFFSET is valid for use in an LDRD/STRD instruction,
> +   assuming that the address in the base register is word aligned.  */
> +bool
> +offset_ok_for_ldrd_strd (HOST_WIDE_INT offset)
> +{
> +  HOST_WIDE_INT max_offset;
> +
> +  /* Offset must be a multiple of 4 in Thumb mode.  */
> +  if (TARGET_THUMB2 && ((offset & 3) != 0))
> +    return false;
> +
> +  if (TARGET_THUMB2)
> +    max_offset = 1020;
> +  else if (TARGET_ARM)
> +    max_offset = 255;
> +  else
> +    gcc_unreachable ();
> +
> +  return ((offset <= max_offset) && (offset >= -max_offset));
> +}
> +
> +/* Checks whether the operands are valid for use in an LDRD/STRD instruction.
> +   Assumes that RT, RT2, and RTN are REG.  This is guaranteed by the patterns.
> +   Assumes that the address in the base register RTN is word aligned.  Pattern
> +   guarantees that both memory accesses use the same base register,
> +   the offsets are constants within the range, and the gap between the offsets is 4.
> +   If preload complete then check that registers are legal.  WBACK indicates whether
> +   address is updated.  LOAD indicates whether memory access is load or store.  */

ARM ARM terminology uses Rn for the base reg, so:

s/RTN/RN/

> +bool
> +operands_ok_ldrd_strd (rtx rt, rtx rt2, rtx rtn, HOST_WIDE_INT offset,

s/rtn/rn/


> +                       bool wback, bool load)
> +{
> +  unsigned int t, t2, n;
> +
> +  if (!reload_completed)
> +    return true;
> +
> +  if (!offset_ok_for_ldrd_strd (offset))
> +    return false;
> +
> +  t = REGNO (rt);
> +  t2 = REGNO (rt2);
> +  n = REGNO (rtn);
> +
> +  if ((TARGET_THUMB2)
> +      && ((wback && (n == t || n == t2))
> +          || (t == SP_REGNUM)
> +          || (t == PC_REGNUM)
> +          || (t2 == SP_REGNUM)
> +          || (t2 == PC_REGNUM)
> +          || (!load && (n == PC_REGNUM))
> +          || (load && (t == t2))
> +          /* Triggers Cortex-M3 LDRD errata.  */
> +          || (!wback && load && fix_cm3_ldrd && (n == t))))
> +    return false;
> +
> +  if ((TARGET_ARM)
> +      && ((wback && (n == t || n == t2))
> +          || (t2 == PC_REGNUM)
> +          || (t % 2 != 0)   /* First destination register is not even.  */
> +          || (t2 != t + 1)
> +          /* PC can be used as base register (for offset addressing only),
> +             but it is depricated.  */
> +          || (n == PC_REGNUM)))
> +    return false;
> +
> +  return true;
> +}
> +
>   \f
>   /* Print a symbolic form of X to the debug file, F.  */
>   static void
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index e9da56d..ed82634 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -11472,6 +11472,99 @@
>     "
>   )
>
> +;; Patterns for LDRD/STRD in Thumb2 mode
> +
> +(define_insn "*thumb2_ldrd"
> +  [(set (match_operand:SI 0 "s_register_operand" "=r")
> +        (mem:SI (plus:SI (match_operand:SI 1 "s_register_operand" "rk")
> +                         (match_operand:SI 2 "ldrd_strd_offset_operand" "Do"))))
> +   (set (match_operand:SI 3 "s_register_operand" "=r")
> +        (mem:SI (plus:SI (match_dup 1)
> +                         (match_operand:SI 4 "const_int_operand" ""))))]
> +  "TARGET_LDRD && TARGET_THUMB2
> +     && (current_tune->prefer_ldrd_strd && !optimize_function_for_size_p (cfun))

All these should be gated on "reload_completed" and not on the tune or 
size optimization.


> +     && ((INTVAL (operands[2]) + 4) == INTVAL (operands[4]))
> +     && (operands_ok_ldrd_strd (operands[0], operands[3],
> +                                  operands[1], INTVAL (operands[2]),
> +                                  false, true))"
> +  "ldrd%?\t%0, %3, [%1, %2]"
> +  [(set_attr "type" "load2")
> +   (set_attr "predicable" "yes")])
> +
> +(define_insn "*thumb2_ldrd_base"
> +  [(set (match_operand:SI 0 "s_register_operand" "=r")
> +        (mem:SI (match_operand:SI 1 "s_register_operand" "rk")))
> +   (set (match_operand:SI 2 "s_register_operand" "=r")
> +        (mem:SI (plus:SI (match_dup 1)
> +                         (const_int 4))))]
> +  "TARGET_LDRD && TARGET_THUMB2
> +     && (current_tune->prefer_ldrd_strd && !optimize_function_for_size_p (cfun))
> +     && (operands_ok_ldrd_strd (operands[0], operands[2],
> +                                  operands[1], 0, false, true))"
> +  "ldrd%?\t%0, %2, [%1]"
> +  [(set_attr "type" "load2")
> +   (set_attr "predicable" "yes")])
> +
> +(define_insn "*thumb2_ldrd_base_neg"
> +  [(set (match_operand:SI 0 "s_register_operand" "=r")
> +	(mem:SI (plus:SI (match_operand:SI 1 "s_register_operand" "rk")
> +                         (const_int -4))))
> +   (set (match_operand:SI 2 "s_register_operand" "=r")
> +        (mem:SI (match_dup 1)))]
> +  "TARGET_LDRD && TARGET_THUMB2
> +     && (current_tune->prefer_ldrd_strd && !optimize_function_for_size_p (cfun))
> +     && (operands_ok_ldrd_strd (operands[0], operands[2],
> +                                  operands[1], -4, false, true))"
> +  "ldrd%?\t%0, %2, [%1, #-4]"
> +  [(set_attr "type" "load2")
> +   (set_attr "predicable" "yes")])
> +
> +(define_insn "*thumb2_strd"
> +  [(set (mem:SI (plus:SI (match_operand:SI 0 "s_register_operand" "rk")
> +                         (match_operand:SI 1 "ldrd_strd_offset_operand" "Do")))
> +        (match_operand:SI 2 "s_register_operand" "r"))
> +   (set (mem:SI (plus:SI (match_dup 0)
> +                         (match_operand:SI 3 "const_int_operand" "")))
> +        (match_operand:SI 4 "s_register_operand" "r"))]
> +  "TARGET_LDRD && TARGET_THUMB2
> +     && (current_tune->prefer_ldrd_strd && !optimize_function_for_size_p (cfun))
> +     && ((INTVAL (operands[1]) + 4) == INTVAL (operands[3]))
> +     && (operands_ok_ldrd_strd (operands[2], operands[4],
> +                                  operands[0], INTVAL (operands[1]),
> +                                  false, false))"
> +  "strd%?\t%2, %4, [%0, %1]"
> +  [(set_attr "type" "store2")
> +   (set_attr "predicable" "yes")])
> +
> +(define_insn "*thumb2_strd_base"
> +  [(set (mem:SI (match_operand:SI 0 "s_register_operand" "rk"))
> +        (match_operand:SI 1 "s_register_operand" "r"))
> +   (set (mem:SI (plus:SI (match_dup 0)
> +                         (const_int 4)))
> +        (match_operand:SI 2 "s_register_operand" "r"))]
> +  "TARGET_LDRD && TARGET_THUMB2
> +     && (current_tune->prefer_ldrd_strd && !optimize_function_for_size_p (cfun))
> +     && (operands_ok_ldrd_strd (operands[1], operands[2],
> +                                  operands[0], 0, false, false))"
> +  "strd%?\t%1, %2, [%0]"
> +  [(set_attr "type" "store2")
> +   (set_attr "predicable" "yes")])
> +
> +(define_insn "*thumb2_strd_base_neg"
> +  [(set (mem:SI (plus:SI (match_operand:SI 0 "s_register_operand" "rk")
> +                         (const_int -4)))
> +        (match_operand:SI 1 "s_register_operand" "r"))
> +   (set (mem:SI (match_dup 0))
> +        (match_operand:SI 2 "s_register_operand" "r"))]
> +  "TARGET_LDRD && TARGET_THUMB2
> +     && (current_tune->prefer_ldrd_strd && !optimize_function_for_size_p (cfun))
> +     && (operands_ok_ldrd_strd (operands[1], operands[2],
> +                                  operands[0], -4, false, false))"
> +  "strd%?\t%1, %2, [%0, #-4]"
> +  [(set_attr "type" "store2")
> +   (set_attr "predicable" "yes")])
> +
> +
>   ;; Load the load/store multiple patterns
>   (include "ldmstm.md")
>
> diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
> index b67df55..231d910 100644
> --- a/gcc/config/arm/constraints.md
> +++ b/gcc/config/arm/constraints.md
> @@ -31,7 +31,7 @@
>   ;; 'H' was previously used for FPA.
>
>   ;; The following multi-letter normal constraints have been used:
> -;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
> +;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, Dl, DL, Do, Dv, Dy, Di, Dt, Dz
>   ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
>   ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py
>
> @@ -279,6 +279,12 @@
>         (match_test "TARGET_32BIT
>   		   && imm_for_neon_inv_logic_operand (op, GET_MODE (op))")))
>
> +(define_constraint "Do"
> + "@internal
> +  In ARM/Thumb2 state valid offset for an ldrd/strd instruction."
> + (and (match_code "const_int")
> +      (match_test "offset_ok_for_ldrd_strd (ival)")))
> +
>   (define_constraint "Dv"
>    "@internal
>     In ARM/Thumb-2 state a const_double which can be used with a VFP fconsts
> diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
> index 8ae26ca..badb68b 100644
> --- a/gcc/config/arm/predicates.md
> +++ b/gcc/config/arm/predicates.md
> @@ -137,6 +137,10 @@
>          (match_test "((unsigned HOST_WIDE_INT) INTVAL (op)) <= GET_MODE_BITSIZE (mode)
>   	&& ((unsigned HOST_WIDE_INT) INTVAL (op)) > 0")))
>
> +(define_predicate "ldrd_strd_offset_operand"
> +  (and (match_operand 0 "const_int_operand")
> +       (match_test "offset_ok_for_ldrd_strd (INTVAL (op))")))
> +
>   (define_predicate "arm_add_operand"
>     (ior (match_operand 0 "arm_rhs_operand")
>          (match_operand 0 "arm_neg_immediate_operand")))
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index f330da3..21d1aa8 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -12130,6 +12130,9 @@ offset_ok_for_ldrd_strd (HOST_WIDE_INT offset)
>   {
>     HOST_WIDE_INT max_offset;
>
> +  if (!TARGET_LDRD)
> +    return false;
> +

This seems to be in the wrong place.  If we don't have ldrd then the 
question as to what is a valid offset is irrelevant.
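
The separation being asked for can be sketched outside GCC as follows. This
is a minimal standalone model, not the patch itself: enum isa_mode and both
function names are hypothetical stand-ins for TARGET_ARM/TARGET_THUMB2,
offset_ok_for_ldrd_strd and its callers. The range check answers only the
range question; whether LDRD/STRD exists at all is tested by the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's TARGET_ARM / TARGET_THUMB2 macros.  */
enum isa_mode { MODE_ARM, MODE_THUMB2 };

/* Pure range check: answers only "is OFFSET encodable in MODE?".  */
static bool
offset_in_ldrd_strd_range (enum isa_mode mode, long offset)
{
  long max_offset;

  /* Mirrors the gcc_unreachable () in the patch: no other mode is valid.  */
  assert (mode == MODE_ARM || mode == MODE_THUMB2);

  /* Offset must be a multiple of 4 in Thumb mode.  */
  if (mode == MODE_THUMB2 && (offset & 3) != 0)
    return false;

  max_offset = (mode == MODE_THUMB2) ? 1020 : 255;
  return offset <= max_offset && offset >= -max_offset;
}

/* Caller-side guard: availability of LDRD/STRD is tested first, so the
   range check is never asked a question that has no meaning.  */
static bool
ldrd_strd_offset_usable (bool have_ldrd, enum isa_mode mode, long offset)
{
  return have_ldrd && offset_in_ldrd_strd_range (mode, offset);
}
```

With this split, a predicate or constraint would call the guarded form, while
code that already knows LDRD is available can call the range check directly.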

>     /* Offset must be a multiple of 4 in Thumb mode.  */
>     if (TARGET_THUMB2 && ((offset & 3) != 0))
>       return false;
>



* Re: [PATCH, ARM][2/4] Prologue using STRD in Thumb mode
  2012-10-10 15:03 ` [PATCH, ARM][2/4] Prologue using STRD " Greta Yorsh
@ 2012-10-18 14:41   ` Richard Earnshaw
  0 siblings, 0 replies; 13+ messages in thread
From: Richard Earnshaw @ 2012-10-18 14:41 UTC (permalink / raw)
  To: Greta Yorsh; +Cc: GCC Patches, Ramana Radhakrishnan, nickc, paul

On 10/10/12 16:03, Greta Yorsh wrote:
> Generate prologue using STRD when prefer_ldrd_strd is set in tune_params.
>
> ChangeLog
>
> gcc/
>
> 2012-09-13  Sameera Deshpande  <sameera.deshpande@arm.com>
>              Greta Yorsh  <Greta.Yorsh@arm.com>
>
>          * config/arm/arm.c (thumb2_emit_strd_push): New function.
>         (arm_expand_prologue): Use the new function.
>
>
> 2-thumb-prolog-strd.patch.txt
>
>
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index b3a3774..1212a93 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -15737,6 +15737,126 @@ arm_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
>       }
>   }
>
> +/* Generate and emit a pattern that will be recognized as STRD pattern.  If even
> +   number of registers are being pushed, multiple STRD patterns are created for
> +   all register pairs.  If odd number of registers are pushed, emit a
> +   combination of STRDs and STR for the prologue saves.  */
> +static void
> +thumb2_emit_strd_push (unsigned long saved_regs_mask)
> +{
> +  int num_regs = 0;
> +  int i, j;
> +  rtx par = NULL_RTX;
> +  rtx insn = NULL_RTX;
> +  rtx dwarf = NULL_RTX;
> +  rtx tmp, reg, tmp1;
> +
> +  for (i = 0; i <= LAST_ARM_REGNUM; i++)
> +    if (saved_regs_mask & (1 << i))
> +      num_regs++;
> +
> +  gcc_assert (num_regs && num_regs <= 16);
> +
> +  /* Pre-decrement the stack pointer, based on there being num_regs 4-byte
> +     registers to push.  */
> +  tmp = gen_rtx_SET (VOIDmode,
> +                     stack_pointer_rtx,
> +                     plus_constant (Pmode, stack_pointer_rtx, -4 * num_regs));
> +  RTX_FRAME_RELATED_P (tmp) = 1;
> +  insn = emit_insn (tmp);
> +
> +  /* Create sequence for DWARF info.  */
> +  dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (num_regs + 1));
> +
> +  /* RTLs cannot be shared, hence create new copy for dwarf.  */
> +  tmp1 = gen_rtx_SET (VOIDmode,
> +                     stack_pointer_rtx,
> +                     plus_constant (Pmode, stack_pointer_rtx, -4 * num_regs));
> +  RTX_FRAME_RELATED_P (tmp1) = 1;
> +  XVECEXP (dwarf, 0, 0) = tmp1;
> +
> +  /* Var j iterates over all the registers to gather all the registers in
> +     saved_regs_mask.  Var i gives index of register R_j in stack frame.
> +     A PARALLEL RTX of register-pair is created here, so that pattern for
> +     STRD can be matched.  If num_regs is odd, 1st register will be pushed
> +     using STR and remaining registers will be pushed with STRD in pairs.
> +     If num_regs is even, all registers are pushed with STRD in pairs.
> +     Hence, skip first element for odd num_regs.  */
> +  for (i = num_regs - 1, j = LAST_ARM_REGNUM; i >= (num_regs % 2); j--)
> +    if (saved_regs_mask & (1 << j))
> +      {
> +        gcc_assert (j != SP_REGNUM);
> +        gcc_assert (j != PC_REGNUM);

It would be better to assert at the head of the function that 
saved_regs_mask does not contain SP or PC, rather than checking every 
iteration of the loop.
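
As a rough illustration of the suggestion (a standalone sketch, not the patch
code: count_pushed_regs is a hypothetical helper, and the register numbers
are defined locally to match the ARM numbering used in the patch), the mask
can be validated once before any loop runs:

```c
#include <assert.h>

#define SP_REGNUM 13
#define PC_REGNUM 15
#define LAST_ARM_REGNUM 15

/* Hypothetical helper, not part of the patch: counts the registers in
   SAVED_REGS_MASK after validating the mask once, up front.  */
static int
count_pushed_regs (unsigned long saved_regs_mask)
{
  int num_regs = 0;
  int i;

  /* One check at the head replaces an assert per loop iteration.  */
  assert ((saved_regs_mask & ((1UL << SP_REGNUM) | (1UL << PC_REGNUM))) == 0);

  for (i = 0; i <= LAST_ARM_REGNUM; i++)
    if (saved_regs_mask & (1UL << i))
      num_regs++;

  return num_regs;
}
```

The loop body then only has to deal with registers that are known to be legal.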

> +
> +        /* Create RTX for store.  New RTX is created for dwarf as
> +           they are not sharable.  */
> +        reg = gen_rtx_REG (SImode, j);
> +        tmp = gen_rtx_SET (SImode,
> +                           gen_frame_mem
> +                           (SImode,
> +                            plus_constant (Pmode, stack_pointer_rtx, 4 * i)),
> +                           reg);
> +
> +        tmp1 = gen_rtx_SET (SImode,
> +                           gen_frame_mem
> +                           (SImode,
> +                            plus_constant (Pmode, stack_pointer_rtx, 4 * i)),
> +                           reg);
> +        RTX_FRAME_RELATED_P (tmp) = 1;
> +        RTX_FRAME_RELATED_P (tmp1) = 1;
> +
> +        if (((i - (num_regs % 2)) % 2) == 1)
> +          /* When (i - (num_regs % 2)) is odd, the RTX to be emitted is yet to
> +             be created.  Hence create it first.  The STRD pattern we are
> +             generating is :
> +             [ (SET (MEM (PLUS (SP) (NUM))) (reg_t1))
> +               (SET (MEM (PLUS (SP) (NUM + 4))) (reg_t2)) ]
> +             were target registers need not be consecutive.  */

s/were/where the/

> +          par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
> +
> +        /* Register R_j is added in PARALLEL RTX.  If (i - (num_regs % 2)) is
> +           even, the reg_j is added as 0th element and if it is odd, reg_i is
> +           added as 1st element of STRD pattern shown above.  */
> +        XVECEXP (par, 0, ((i - (num_regs % 2)) % 2)) = tmp;
> +        XVECEXP (dwarf, 0, (i + 1)) = tmp1;
> +
> +        if (((i - (num_regs % 2)) % 2) == 0)
> +          /* When (i - (num_regs % 2)) is even, RTXs for both the registers
> +             to be loaded are generated in above given STRD pattern, and the
> +             pattern can be emitted now.  */
> +          emit_insn (par);
> +
> +        i--;
> +      }
> +
> +  if ((num_regs % 2) == 1)
> +    {
> +      /* If odd number of registers are pushed, generate STR pattern to store
> +         lone register.  */
> +      for (; (saved_regs_mask & (1 << j)) == 0; j--);
> +
> +      tmp1 = gen_frame_mem (SImode, plus_constant (Pmode,
> +                                                   stack_pointer_rtx, 4 * i));
> +      reg = gen_rtx_REG (SImode, j);
> +      tmp = gen_rtx_SET (SImode, tmp1, reg);
> +      RTX_FRAME_RELATED_P (tmp) = 1;
> +
> +      emit_insn (tmp);
> +
> +      tmp1 = gen_rtx_SET (SImode,
> +                         gen_frame_mem
> +                         (SImode,
> +                          plus_constant (Pmode, stack_pointer_rtx, 4 * i)),
> +                          reg);
> +      RTX_FRAME_RELATED_P (tmp1) = 1;
> +      XVECEXP (dwarf, 0, (i + 1)) = tmp1;
> +    }
> +
> +  add_reg_note (insn, REG_FRAME_RELATED_EXPR, dwarf);
> +  RTX_FRAME_RELATED_P (insn) = 1;
> +  return;
> +}
> +
>   /* Generate and emit an insn that we will recognize as a push_multi.
>      Unfortunately, since this insn does not reflect very well the actual
>      semantics of the operation, we need to annotate the insn for the benefit
> @@ -16661,8 +16781,25 @@ arm_expand_prologue (void)
>   	      saved_regs += frame;
>   	    }
>   	}
> -      insn = emit_multi_reg_push (live_regs_mask);
> -      RTX_FRAME_RELATED_P (insn) = 1;
> +
> +      if (current_tune->prefer_ldrd_strd
> +          && !optimize_function_for_size_p (cfun))
> +        {
> +          if (TARGET_THUMB2)
> +            {
> +              thumb2_emit_strd_push (live_regs_mask);
> +            }
> +          else
> +            {
> +              insn = emit_multi_reg_push (live_regs_mask);
> +              RTX_FRAME_RELATED_P (insn) = 1;
> +            }
> +        }
> +      else
> +        {
> +          insn = emit_multi_reg_push (live_regs_mask);
> +          RTX_FRAME_RELATED_P (insn) = 1;
> +        }
>       }
>
>     if (! IS_VOLATILE (func_type))
>

Otherwise OK.

R.
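
For readers following the pairing scheme described in the comments of
thumb2_emit_strd_push, here is a small standalone model of it. It is only a
sketch under stated assumptions: struct strd_store and plan_strd_push are
hypothetical names, no RTL or DWARF notes are built, and the stack pointer is
assumed to have been decremented by 4 * num_regs already. Slots are assigned
in increasing register order; pairs are planned from the highest slot down,
and with an odd count the lowest register gets a single STR at offset 0.

```c
#include <assert.h>

#define LAST_ARM_REGNUM 15

/* One planned store.  reg2 is -1 for a single-register STR.  */
struct strd_store
{
  int reg1;
  int reg2;
  int offset;   /* Byte offset from the already-decremented SP.  */
};

/* Hypothetical model of the pairing in thumb2_emit_strd_push.  OPS must
   have room for 9 entries (8 pairs, or 7 pairs plus one STR).  Returns
   the number of planned stores.  */
static int
plan_strd_push (unsigned long mask, struct strd_store *ops)
{
  int regs[LAST_ARM_REGNUM + 1];
  int n = 0, nops = 0, first, i, r;

  /* Only core registers may be named in the mask.  */
  assert ((mask >> (LAST_ARM_REGNUM + 1)) == 0);

  /* Slot k of the frame holds the k-th lowest register in the mask.  */
  for (r = 0; r <= LAST_ARM_REGNUM; r++)
    if (mask & (1UL << r))
      regs[n++] = r;

  /* With an odd count, slot 0 is left for a lone STR.  */
  first = n % 2;

  /* Pair the remaining slots, highest pair first.  */
  for (i = n - 2; i >= first; i -= 2)
    {
      ops[nops].reg1 = regs[i];
      ops[nops].reg2 = regs[i + 1];
      ops[nops].offset = 4 * i;
      nops++;
    }

  if (first)
    {
      ops[nops].reg1 = regs[0];
      ops[nops].reg2 = -1;
      ops[nops].offset = 0;
      nops++;
    }

  return nops;
}
```

For a mask of {r4, r5, lr} this plans "strd r5, lr, [sp, #4]" followed by
"str r4, [sp]", after SP has been lowered by 12 bytes.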


* Re: [PATCH, ARM][3/4] Epilogue using LDRD in Thumb mode
  2012-10-10 15:04 ` [PATCH, ARM][3/4] Epilogue using LDRD " Greta Yorsh
@ 2012-10-19 15:03   ` Richard Earnshaw
  0 siblings, 0 replies; 13+ messages in thread
From: Richard Earnshaw @ 2012-10-19 15:03 UTC (permalink / raw)
  To: Greta Yorsh; +Cc: GCC Patches, Ramana Radhakrishnan, nickc, paul

On 10/10/12 16:03, Greta Yorsh wrote:
> Generate epilogue using LDRD in Thumb mode when prefer_ldrd_strd is set in
> tune_params.
>
> ChangeLog
>
> gcc/
>
> 2012-09-13  Sameera Deshpande  <sameera.deshpande@arm.com>
>              Greta Yorsh  <Greta.Yorsh@arm.com>
>
>      * config/arm/arm.c (thumb2_emit_ldrd_pop): New function.
>        (arm_expand_epilogue): Use the new function.
>

This is OK, apart from:


>
> +
> +  /* Var j iterates over all the registers to gather all the registers in
> +     saved_regs_mask.  Var i gives index of saved registers in stack frame.
> +     A PARALLEL RTX of register-pair is created here, so that pattern for
> +     LDRD can be matched.  As PC is always last register to be popped, and
> +     we have already decremented num_regs if PC, we don't have to worry
> +     about PC in this loop.  */
> +  for (i = 0, j = 0; i < (num_regs - (num_regs % 2)); j++)
> +    if (saved_regs_mask & (1 << j))
> +      {
> +        gcc_assert (j != SP_REGNUM);

Please move the assert outside of the loop.

R.



* Re: [PATCH, ARM][4/4] Adjust tests gcc.target/arm/pr40457-*.c
  2012-10-10 15:13 ` [PATCH, ARM][4/4] Adjust tests gcc.target/arm/pr40457-*.c Greta Yorsh
@ 2012-10-19 15:10   ` Richard Earnshaw
  0 siblings, 0 replies; 13+ messages in thread
From: Richard Earnshaw @ 2012-10-19 15:10 UTC (permalink / raw)
  To: Greta Yorsh; +Cc: GCC Patches, Ramana Radhakrishnan, nickc, paul

On 10/10/12 16:03, Greta Yorsh wrote:
> As a result of adding LDRD/STRD patterns in Thumb mode, the compiler
> generates LDRD/STRD instead of LDM/STM in some cases. This patch adjusts
> existing tests to accept LDRD/STRD in addition to LDM/STM.
>
> ChangeLog
>
> gcc/testsuite
>
> 2012-09-13  Sameera Deshpande  <sameera.deshpande@arm.com>
>              Greta Yorsh  <Greta.Yorsh@arm.com>
>
>          * gcc.target/arm/pr40457-1.c: Adjust expected output.
>          * gcc.target/arm/pr40457-2.c: Likewise.
>          * gcc.target/arm/pr40457-3.c: Likewise.
>

OK.

R.



* RE: [PATCH, ARM][1/4] New RTL patterns for LDRD/STRD in Thumb mode
  2012-10-18 13:54   ` Richard Earnshaw
@ 2012-10-19 15:44     ` Greta Yorsh
  2012-10-19 15:52       ` Richard Earnshaw
  0 siblings, 1 reply; 13+ messages in thread
From: Greta Yorsh @ 2012-10-19 15:44 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: GCC Patches, Ramana Radhakrishnan, nickc, paul


On 18 October 2012 14:41, Richard Earnshaw wrote:
> > +/* Checks whether the operands are valid for use in an LDRD/STRD instruction.
> > +   Assumes that RT, RT2, and RTN are REG.  This is guaranteed by the patterns.
> > +   Assumes that the address in the base register RTN is word aligned.  Pattern
> > +   guarantees that both memory accesses use the same base register,
> > +   the offsets are constants within the range, and the gap between the offsets is 4.
> > +   If preload complete then check that registers are legal.  WBACK indicates whether
> > +   address is updated.  LOAD indicates whether memory access is load or store.  */
> 
> ARM ARM terminology uses Rn for the base reg, so:
> 
> s/RTN/RN/

Fixed.

> 
> > +bool
> > +operands_ok_ldrd_strd (rtx rt, rtx rt2, rtx rtn, HOST_WIDE_INT offset,
> 
> s/rtn/rn/

Fixed.

> > +;; Patterns for LDRD/STRD in Thumb2 mode
> > +
> > +(define_insn "*thumb2_ldrd"
> > +  [(set (match_operand:SI 0 "s_register_operand" "=r")
> > +        (mem:SI (plus:SI (match_operand:SI 1 "s_register_operand" "rk")
> > +                         (match_operand:SI 2 "ldrd_strd_offset_operand" "Do"))))
> > +   (set (match_operand:SI 3 "s_register_operand" "=r")
> > +        (mem:SI (plus:SI (match_dup 1)
> > +                         (match_operand:SI 4 "const_int_operand" ""))))]
> > +  "TARGET_LDRD && TARGET_THUMB2
> > +     && (current_tune->prefer_ldrd_strd && !optimize_function_for_size_p (cfun))
> 
> All these should be gated on "reload_completed" and not on the tune or 
> size optimization.

Removed the condition "!optimize_function_for_size_p (cfun)".

The condition "current_tune->prefer_ldrd_strd" is needed because the patterns
for LDRD/STRD appear before the patterns for LDM/STM that can match the same
RTL (two registers in the list). The condition "reload_completed" does not
help here, because peephole optimizations in ldmstm.md may (after reload)
create new RTL insns that match this pattern.

> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > index f330da3..21d1aa8 100644
> > --- a/gcc/config/arm/arm.c
> > +++ b/gcc/config/arm/arm.c
> > @@ -12130,6 +12130,9 @@ offset_ok_for_ldrd_strd (HOST_WIDE_INT offset)
> >   {
> >     HOST_WIDE_INT max_offset;
> >
> > +  if (!TARGET_LDRD)
> > +    return false;
> > +
> 
> This seems to be in the wrong place.  If we don't have ldrd then the 
> question as to what is a valid offset is irrelevant.

Moved this condition to predicates.md and constraints.md.

Other uses of offset_ok_for_ldrd_strd are already guarded by the conditions.

I am attaching a new version of this patch. 

No regression on qemu for arm-none-eabi with cpu cortex-m4 and cortex-a15.

Ok for trunk?

Thank you,
Greta

ChangeLog


gcc/

2012-10-19  Sameera Deshpande  <sameera.deshpande@arm.com>
            Greta Yorsh  <Greta.Yorsh@arm.com>

        * config/arm/arm-protos.h (offset_ok_for_ldrd_strd): New
        declaration.
        (operands_ok_ldrd_strd): Likewise.
        * config/arm/arm.c (offset_ok_for_ldrd_strd): New function.
        (operands_ok_ldrd_strd): Likewise.
        * config/arm/arm.md (thumb2_ldrd, thumb2_ldrd_base): New patterns.
        (thumb2_ldrd_base_neg): Likewise.
        (thumb2_strd, thumb2_strd_base, thumb2_strd_base_neg): Likewise.
        * config/arm/predicates.md (ldrd_strd_offset_operand): New predicate.
        * config/arm/constraints.md (Do): New constraint.


[-- Attachment #2: 1-thumb-patterns.patch.txt --]
[-- Type: text/plain, Size: 9459 bytes --]

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 010e7fc..bfe96ea 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -116,6 +116,8 @@ extern bool gen_stm_seq (rtx *, int);
 extern bool gen_const_stm_seq (rtx *, int);
 extern rtx arm_gen_load_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
 extern rtx arm_gen_store_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
+extern bool offset_ok_for_ldrd_strd (HOST_WIDE_INT);
+extern bool operands_ok_ldrd_strd (rtx, rtx, rtx, HOST_WIDE_INT, bool, bool);
 extern int arm_gen_movmemqi (rtx *);
 extern enum machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx);
 extern enum machine_mode arm_select_dominance_cc_mode (rtx, rtx,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index fc3a508..c60e62f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12185,6 +12185,75 @@ arm_pad_reg_upward (enum machine_mode mode,
   return !BYTES_BIG_ENDIAN;
 }
 
+/* Returns true iff OFFSET is valid for use in an LDRD/STRD instruction,
+   assuming that the address in the base register is word aligned.  */
+bool
+offset_ok_for_ldrd_strd (HOST_WIDE_INT offset)
+{
+  HOST_WIDE_INT max_offset;
+
+  /* Offset must be a multiple of 4 in Thumb mode.  */
+  if (TARGET_THUMB2 && ((offset & 3) != 0))
+    return false;
+
+  if (TARGET_THUMB2)
+    max_offset = 1020;
+  else if (TARGET_ARM)
+    max_offset = 255;
+  else
+    gcc_unreachable ();
+
+  return ((offset <= max_offset) && (offset >= -max_offset));
+}
+
+/* Checks whether the operands are valid for use in an LDRD/STRD instruction.
+   Assumes that RT, RT2, and RN are REG.  This is guaranteed by the patterns.
+   Assumes that the address in the base register RN is word aligned.  Pattern
+   guarantees that both memory accesses use the same base register,
+   the offsets are constants within the range, and the gap between the offsets is 4.
+   If reload has completed then check that registers are legal.  WBACK indicates whether
+   address is updated.  LOAD indicates whether memory access is load or store.  */
+bool
+operands_ok_ldrd_strd (rtx rt, rtx rt2, rtx rn, HOST_WIDE_INT offset,
+                       bool wback, bool load)
+{
+  unsigned int t, t2, n;
+
+  if (!reload_completed)
+    return true;
+
+  if (!offset_ok_for_ldrd_strd (offset))
+    return false;
+
+  t = REGNO (rt);
+  t2 = REGNO (rt2);
+  n = REGNO (rn);
+
+  if ((TARGET_THUMB2)
+      && ((wback && (n == t || n == t2))
+          || (t == SP_REGNUM)
+          || (t == PC_REGNUM)
+          || (t2 == SP_REGNUM)
+          || (t2 == PC_REGNUM)
+          || (!load && (n == PC_REGNUM))
+          || (load && (t == t2))
+          /* Triggers Cortex-M3 LDRD errata.  */
+          || (!wback && load && fix_cm3_ldrd && (n == t))))
+    return false;
+
+  if ((TARGET_ARM)
+      && ((wback && (n == t || n == t2))
+          || (t2 == PC_REGNUM)
+          || (t % 2 != 0)   /* First destination register is not even.  */
+          || (t2 != t + 1)
+          /* PC can be used as base register (for offset addressing only),
+             but it is depricated.  */
+          || (n == PC_REGNUM)))
+    return false;
+
+  return true;
+}
+
 \f
 /* Print a symbolic form of X to the debug file, F.  */
 static void
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 7c80f91..3277561 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -11511,6 +11511,99 @@
 ""
 )
 
+;; Patterns for LDRD/STRD in Thumb2 mode
+
+(define_insn "*thumb2_ldrd"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+        (mem:SI (plus:SI (match_operand:SI 1 "s_register_operand" "rk")
+                         (match_operand:SI 2 "ldrd_strd_offset_operand" "Do"))))
+   (set (match_operand:SI 3 "s_register_operand" "=r")
+        (mem:SI (plus:SI (match_dup 1)
+                         (match_operand:SI 4 "const_int_operand" ""))))]
+  "TARGET_LDRD && TARGET_THUMB2
+     && current_tune->prefer_ldrd_strd
+     && ((INTVAL (operands[2]) + 4) == INTVAL (operands[4]))
+     && (operands_ok_ldrd_strd (operands[0], operands[3],
+                                  operands[1], INTVAL (operands[2]),
+                                  false, true))"
+  "ldrd%?\t%0, %3, [%1, %2]"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_ldrd_base"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+        (mem:SI (match_operand:SI 1 "s_register_operand" "rk")))
+   (set (match_operand:SI 2 "s_register_operand" "=r")
+        (mem:SI (plus:SI (match_dup 1)
+                         (const_int 4))))]
+  "TARGET_LDRD && TARGET_THUMB2
+     && current_tune->prefer_ldrd_strd
+     && (operands_ok_ldrd_strd (operands[0], operands[2],
+                                  operands[1], 0, false, true))"
+  "ldrd%?\t%0, %2, [%1]"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_ldrd_base_neg"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(mem:SI (plus:SI (match_operand:SI 1 "s_register_operand" "rk")
+                         (const_int -4))))
+   (set (match_operand:SI 2 "s_register_operand" "=r")
+        (mem:SI (match_dup 1)))]
+  "TARGET_LDRD && TARGET_THUMB2
+     && current_tune->prefer_ldrd_strd
+     && (operands_ok_ldrd_strd (operands[0], operands[2],
+                                  operands[1], -4, false, true))"
+  "ldrd%?\t%0, %2, [%1, #-4]"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_strd"
+  [(set (mem:SI (plus:SI (match_operand:SI 0 "s_register_operand" "rk")
+                         (match_operand:SI 1 "ldrd_strd_offset_operand" "Do")))
+        (match_operand:SI 2 "s_register_operand" "r"))
+   (set (mem:SI (plus:SI (match_dup 0)
+                         (match_operand:SI 3 "const_int_operand" "")))
+        (match_operand:SI 4 "s_register_operand" "r"))]
+  "TARGET_LDRD && TARGET_THUMB2
+     && current_tune->prefer_ldrd_strd
+     && ((INTVAL (operands[1]) + 4) == INTVAL (operands[3]))
+     && (operands_ok_ldrd_strd (operands[2], operands[4],
+                                  operands[0], INTVAL (operands[1]),
+                                  false, false))"
+  "strd%?\t%2, %4, [%0, %1]"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_strd_base"
+  [(set (mem:SI (match_operand:SI 0 "s_register_operand" "rk"))
+        (match_operand:SI 1 "s_register_operand" "r"))
+   (set (mem:SI (plus:SI (match_dup 0)
+                         (const_int 4)))
+        (match_operand:SI 2 "s_register_operand" "r"))]
+  "TARGET_LDRD && TARGET_THUMB2
+     && current_tune->prefer_ldrd_strd
+     && (operands_ok_ldrd_strd (operands[1], operands[2],
+                                  operands[0], 0, false, false))"
+  "strd%?\t%1, %2, [%0]"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_strd_base_neg"
+  [(set (mem:SI (plus:SI (match_operand:SI 0 "s_register_operand" "rk")
+                         (const_int -4)))
+        (match_operand:SI 1 "s_register_operand" "r"))
+   (set (mem:SI (match_dup 0))
+        (match_operand:SI 2 "s_register_operand" "r"))]
+  "TARGET_LDRD && TARGET_THUMB2
+     && current_tune->prefer_ldrd_strd
+     && (operands_ok_ldrd_strd (operands[1], operands[2],
+                                  operands[0], -4, false, false))"
+  "strd%?\t%1, %2, [%0, #-4]"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+
 ;; Load the load/store multiple patterns
 (include "ldmstm.md")
 
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index b67df55..1b4167e 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -31,7 +31,7 @@
 ;; 'H' was previously used for FPA.
 
 ;; The following multi-letter normal constraints have been used:
-;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
+;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, Dl, DL, Do, Dv, Dy, Di, Dt, Dz
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py
 
@@ -279,6 +279,12 @@
       (match_test "TARGET_32BIT
 		   && imm_for_neon_inv_logic_operand (op, GET_MODE (op))")))
 
+(define_constraint "Do"
+ "@internal
+  In ARM/Thumb2 state valid offset for an ldrd/strd instruction."
+ (and (match_code "const_int")
+      (match_test "TARGET_LDRD && offset_ok_for_ldrd_strd (ival)")))
+
 (define_constraint "Dv"
  "@internal
   In ARM/Thumb-2 state a const_double which can be used with a VFP fconsts
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index f55acbf..8f49450 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -137,6 +137,10 @@
        (match_test "((unsigned HOST_WIDE_INT) INTVAL (op)) <= GET_MODE_BITSIZE (mode)
 	&& ((unsigned HOST_WIDE_INT) INTVAL (op)) > 0")))
 
+(define_predicate "ldrd_strd_offset_operand"
+  (and (match_operand 0 "const_int_operand")
+       (match_test "TARGET_LDRD && offset_ok_for_ldrd_strd (INTVAL (op))")))
+
 (define_predicate "arm_add_operand"
   (ior (match_operand 0 "arm_rhs_operand")
        (match_operand 0 "arm_neg_immediate_operand")))


* Re: [PATCH, ARM][1/4] New RTL patterns for LDRD/STRD in Thumb mode
  2012-10-19 15:44     ` Greta Yorsh
@ 2012-10-19 15:52       ` Richard Earnshaw
  2012-10-19 16:54         ` Greta Yorsh
  0 siblings, 1 reply; 13+ messages in thread
From: Richard Earnshaw @ 2012-10-19 15:52 UTC (permalink / raw)
  To: Greta Yorsh; +Cc: GCC Patches, Ramana Radhakrishnan, nickc, paul

On 19/10/12 16:20, Greta Yorsh wrote:

> Removed the condition "!optimize_function_for_size_p (cfun)".
>
> The condition "current_tune->prefer_ldrd_strd" is needed because the
> patterns for LDRD/STRD appear before the patterns for LDM/STM that can
> match the same RTL (two registers in the list). Condition
> "reload_completed" does not help with it because peephole optimizations
> in ldmstm.md may (after reload) create new RTL insns that match this
> pattern.
>

The point of the reload_completed is that these patterns have the 
potential to cause some problems if they somehow matched during earlier 
passes and the address base was an eliminable register.
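
To make the concern concrete, here is a minimal sketch in C. It assumes that
eliminating a register rewrites an address such as [fp, #offset] into
[sp, #offset + delta] once the frame layout is known. The names
thumb2_ldrd_offset_ok, offset_after_elimination and the delta of 64 are
hypothetical and exist only for this illustration; the range rule itself
(multiple of 4, within +/-1020) is the Thumb-2 rule from the patch in this
thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Thumb-2 LDRD/STRD offset rule as stated in the patch: a multiple of 4
   within [-1020, 1020].  Standalone restatement, not the GCC function.  */
static bool
thumb2_ldrd_offset_ok (long offset)
{
  if ((offset & 3) != 0)
    return false;
  return offset <= 1020 && offset >= -1020;
}

/* Hypothetical model of register elimination: the base register is
   replaced by another one and the offset is adjusted by FP_TO_SP.  */
static long
offset_after_elimination (long offset, long fp_to_sp)
{
  return offset + fp_to_sp;
}

/* An offset that is valid when the pattern matches can become invalid
   after the base register has been eliminated.  */
static void
check_elimination_example (void)
{
  long before = 1000;
  long after = offset_after_elimination (before, 64);

  assert (thumb2_ldrd_offset_ok (before));
  assert (!thumb2_ldrd_offset_ok (after));
}
```

If the pattern only matches once reload_completed is set, the offset it
checks is already the final one, so this situation should not arise.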




* RE: [PATCH, ARM][1/4] New RTL patterns for LDRD/STRD in Thumb mode
  2012-10-19 15:52       ` Richard Earnshaw
@ 2012-10-19 16:54         ` Greta Yorsh
  2012-10-19 17:16           ` Richard Earnshaw
  0 siblings, 1 reply; 13+ messages in thread
From: Greta Yorsh @ 2012-10-19 16:54 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: GCC Patches, Ramana Radhakrishnan, nickc, paul

[-- Attachment #1: Type: text/plain, Size: 1950 bytes --]

> -----Original Message-----
> From: Richard Earnshaw
> Sent: 19 October 2012 16:44
> To: Greta Yorsh
> Cc: GCC Patches; Ramana Radhakrishnan; nickc@redhat.com;
> paul@codesourcery.com
> Subject: Re: [PATCH, ARM][1/4] New RTL patterns for LDRD/STRD in Thumb
> mode
> 
> On 19/10/12 16:20, Greta Yorsh wrote:
> 
> > Removed the condition "!optimize_function_for_size_p (cfun)".
> >
> > The condition "current_tune->prefer_ldrd_strd" is needed because the
> > patterns for LDRD/STRD appear before the patterns for LDM/STM that
> > can match the same RTL (two registers in the list). Condition
> > "reload_completed" does not help with it because peephole
> > optimizations in ldmstm.md may (after reload) create new RTL insns
> > that match this pattern.
> >
> 
> The point of the reload_completed is that these patterns have the
> potential to cause some problems if they somehow matched during earlier
> passes and the address base was an eliminable register.
> 

Thank you for the explanation. Here is an updated patch.

Regression tests and bootstrap in progress for the entire sequence, after
addressing all other comments as well. 

OK for trunk, if bootstrap successful?

Thanks,
Greta


ChangeLog


gcc/

2012-10-19  Sameera Deshpande  <sameera.deshpande@arm.com>
            Greta Yorsh  <Greta.Yorsh@arm.com>

        * config/arm/arm-protos.h (offset_ok_for_ldrd_strd): New
        declaration.
        (operands_ok_ldrd_strd): Likewise.
        * config/arm/arm.c (offset_ok_for_ldrd_strd): New function.
        (operands_ok_ldrd_strd): Likewise.
        * config/arm/arm.md (thumb2_ldrd, thumb2_ldrd_base): New patterns.
        (thumb2_ldrd_base_neg): Likewise.
        (thumb2_strd, thumb2_strd_base, thumb2_strd_base_neg): Likewise.
        * config/arm/predicates.md (ldrd_strd_offset_operand): New predicate.
        * config/arm/constraints.md (Do): New constraint.
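
As a reading aid for the offset_ok_for_ldrd_strd entry, the range check can
be restated as a standalone C sketch. The enum insn_mode and the function
name ldrd_strd_offset_ok are hypothetical stand-ins for the TARGET_ARM /
TARGET_THUMB2 macros and the real function; the limits (255 for ARM, 1020
and a multiple of 4 for Thumb-2) are taken from the attached patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the TARGET_ARM / TARGET_THUMB2 macros.  */
enum insn_mode { MODE_ARM, MODE_THUMB2 };

/* Standalone restatement of the offset check in the patch.  */
static bool
ldrd_strd_offset_ok (enum insn_mode mode, long offset)
{
  long max_offset = (mode == MODE_THUMB2) ? 1020 : 255;

  /* The patch only requires a multiple of 4 in Thumb-2 mode.  */
  if (mode == MODE_THUMB2 && (offset & 3) != 0)
    return false;

  return offset <= max_offset && offset >= -max_offset;
}

static void
check_offset_examples (void)
{
  assert (ldrd_strd_offset_ok (MODE_THUMB2, 1020));
  assert (!ldrd_strd_offset_ok (MODE_THUMB2, 1024));
  assert (!ldrd_strd_offset_ok (MODE_THUMB2, 6));
  assert (ldrd_strd_offset_ok (MODE_ARM, 6));
  assert (ldrd_strd_offset_ok (MODE_ARM, -255));
  assert (!ldrd_strd_offset_ok (MODE_ARM, 256));
}
```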

[-- Attachment #2: 1-thumb-patterns.v2.patch.txt --]
[-- Type: text/plain, Size: 9579 bytes --]

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 010e7fc..bfe96ea 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -116,6 +116,8 @@ extern bool gen_stm_seq (rtx *, int);
 extern bool gen_const_stm_seq (rtx *, int);
 extern rtx arm_gen_load_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
 extern rtx arm_gen_store_multiple (int *, int, rtx, int, rtx, HOST_WIDE_INT *);
+extern bool offset_ok_for_ldrd_strd (HOST_WIDE_INT);
+extern bool operands_ok_ldrd_strd (rtx, rtx, rtx, HOST_WIDE_INT, bool, bool);
 extern int arm_gen_movmemqi (rtx *);
 extern enum machine_mode arm_select_cc_mode (RTX_CODE, rtx, rtx);
 extern enum machine_mode arm_select_dominance_cc_mode (rtx, rtx,
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index fc3a508..c60e62f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -12185,6 +12185,75 @@ arm_pad_reg_upward (enum machine_mode mode,
   return !BYTES_BIG_ENDIAN;
 }
 
+/* Returns true iff OFFSET is valid for use in an LDRD/STRD instruction,
+   assuming that the address in the base register is word aligned.  */
+bool
+offset_ok_for_ldrd_strd (HOST_WIDE_INT offset)
+{
+  HOST_WIDE_INT max_offset;
+
+  /* Offset must be a multiple of 4 in Thumb mode.  */
+  if (TARGET_THUMB2 && ((offset & 3) != 0))
+    return false;
+
+  if (TARGET_THUMB2)
+    max_offset = 1020;
+  else if (TARGET_ARM)
+    max_offset = 255;
+  else
+    gcc_unreachable ();
+
+  return ((offset <= max_offset) && (offset >= -max_offset));
+}
+
+/* Checks whether the operands are valid for use in an LDRD/STRD instruction.
+   Assumes that RT, RT2, and RN are REG.  This is guaranteed by the patterns.
+   Assumes that the address in the base register RN is word aligned.  Pattern
+   guarantees that both memory accesses use the same base register,
+   the offsets are constants within the range, and the gap between the offsets is 4.
+   If reload complete then check that registers are legal.  WBACK indicates whether
+   address is updated.  LOAD indicates whether memory access is load or store.  */
+bool
+operands_ok_ldrd_strd (rtx rt, rtx rt2, rtx rn, HOST_WIDE_INT offset,
+                       bool wback, bool load)
+{
+  unsigned int t, t2, n;
+
+  if (!reload_completed)
+    return true;
+
+  if (!offset_ok_for_ldrd_strd (offset))
+    return false;
+
+  t = REGNO (rt);
+  t2 = REGNO (rt2);
+  n = REGNO (rn);
+
+  if ((TARGET_THUMB2)
+      && ((wback && (n == t || n == t2))
+          || (t == SP_REGNUM)
+          || (t == PC_REGNUM)
+          || (t2 == SP_REGNUM)
+          || (t2 == PC_REGNUM)
+          || (!load && (n == PC_REGNUM))
+          || (load && (t == t2))
+          /* Triggers Cortex-M3 LDRD errata.  */
+          || (!wback && load && fix_cm3_ldrd && (n == t))))
+    return false;
+
+  if ((TARGET_ARM)
+      && ((wback && (n == t || n == t2))
+          || (t2 == PC_REGNUM)
+          || (t % 2 != 0)   /* First destination register is not even.  */
+          || (t2 != t + 1)
+          /* PC can be used as base register (for offset addressing only),
+             but it is deprecated.  */
+          || (n == PC_REGNUM)))
+    return false;
+
+  return true;
+}
+
 \f
 /* Print a symbolic form of X to the debug file, F.  */
 static void
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 7c80f91..3277561 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -11511,6 +11511,99 @@
 ""
 )
 
+;; Patterns for LDRD/STRD in Thumb2 mode
+
+(define_insn "*thumb2_ldrd"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+        (mem:SI (plus:SI (match_operand:SI 1 "s_register_operand" "rk")
+                         (match_operand:SI 2 "ldrd_strd_offset_operand" "Do"))))
+   (set (match_operand:SI 3 "s_register_operand" "=r")
+        (mem:SI (plus:SI (match_dup 1)
+                         (match_operand:SI 4 "const_int_operand" ""))))]
+  "TARGET_LDRD && TARGET_THUMB2 && reload_completed
+     && current_tune->prefer_ldrd_strd
+     && ((INTVAL (operands[2]) + 4) == INTVAL (operands[4]))
+     && (operands_ok_ldrd_strd (operands[0], operands[3],
+                                  operands[1], INTVAL (operands[2]),
+                                  false, true))"
+  "ldrd%?\t%0, %3, [%1, %2]"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_ldrd_base"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+        (mem:SI (match_operand:SI 1 "s_register_operand" "rk")))
+   (set (match_operand:SI 2 "s_register_operand" "=r")
+        (mem:SI (plus:SI (match_dup 1)
+                         (const_int 4))))]
+  "TARGET_LDRD && TARGET_THUMB2 && reload_completed
+     && current_tune->prefer_ldrd_strd
+     && (operands_ok_ldrd_strd (operands[0], operands[2],
+                                  operands[1], 0, false, true))"
+  "ldrd%?\t%0, %2, [%1]"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_ldrd_base_neg"
+  [(set (match_operand:SI 0 "s_register_operand" "=r")
+	(mem:SI (plus:SI (match_operand:SI 1 "s_register_operand" "rk")
+                         (const_int -4))))
+   (set (match_operand:SI 2 "s_register_operand" "=r")
+        (mem:SI (match_dup 1)))]
+  "TARGET_LDRD && TARGET_THUMB2 && reload_completed
+     && current_tune->prefer_ldrd_strd
+     && (operands_ok_ldrd_strd (operands[0], operands[2],
+                                  operands[1], -4, false, true))"
+  "ldrd%?\t%0, %2, [%1, #-4]"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_strd"
+  [(set (mem:SI (plus:SI (match_operand:SI 0 "s_register_operand" "rk")
+                         (match_operand:SI 1 "ldrd_strd_offset_operand" "Do")))
+        (match_operand:SI 2 "s_register_operand" "r"))
+   (set (mem:SI (plus:SI (match_dup 0)
+                         (match_operand:SI 3 "const_int_operand" "")))
+        (match_operand:SI 4 "s_register_operand" "r"))]
+  "TARGET_LDRD && TARGET_THUMB2 && reload_completed
+     && current_tune->prefer_ldrd_strd
+     && ((INTVAL (operands[1]) + 4) == INTVAL (operands[3]))
+     && (operands_ok_ldrd_strd (operands[2], operands[4],
+                                  operands[0], INTVAL (operands[1]),
+                                  false, false))"
+  "strd%?\t%2, %4, [%0, %1]"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_strd_base"
+  [(set (mem:SI (match_operand:SI 0 "s_register_operand" "rk"))
+        (match_operand:SI 1 "s_register_operand" "r"))
+   (set (mem:SI (plus:SI (match_dup 0)
+                         (const_int 4)))
+        (match_operand:SI 2 "s_register_operand" "r"))]
+  "TARGET_LDRD && TARGET_THUMB2 && reload_completed
+     && current_tune->prefer_ldrd_strd
+     && (operands_ok_ldrd_strd (operands[1], operands[2],
+                                  operands[0], 0, false, false))"
+  "strd%?\t%1, %2, [%0]"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_strd_base_neg"
+  [(set (mem:SI (plus:SI (match_operand:SI 0 "s_register_operand" "rk")
+                         (const_int -4)))
+        (match_operand:SI 1 "s_register_operand" "r"))
+   (set (mem:SI (match_dup 0))
+        (match_operand:SI 2 "s_register_operand" "r"))]
+  "TARGET_LDRD && TARGET_THUMB2 && reload_completed
+     && current_tune->prefer_ldrd_strd
+     && (operands_ok_ldrd_strd (operands[1], operands[2],
+                                  operands[0], -4, false, false))"
+  "strd%?\t%1, %2, [%0, #-4]"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+
 ;; Load the load/store multiple patterns
 (include "ldmstm.md")
 
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index b67df55..1b4167e 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -31,7 +31,7 @@
 ;; 'H' was previously used for FPA.
 
 ;; The following multi-letter normal constraints have been used:
-;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
+;; in ARM/Thumb-2 state: Da, Db, Dc, Dd, Dn, Dl, DL, Do, Dv, Dy, Di, Dt, Dz
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd, Pe
 ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py
 
@@ -279,6 +279,12 @@
       (match_test "TARGET_32BIT
 		   && imm_for_neon_inv_logic_operand (op, GET_MODE (op))")))
 
+(define_constraint "Do"
+ "@internal
+  In ARM/Thumb2 state valid offset for an ldrd/strd instruction."
+ (and (match_code "const_int")
+      (match_test "TARGET_LDRD && offset_ok_for_ldrd_strd (ival)")))
+
 (define_constraint "Dv"
  "@internal
   In ARM/Thumb-2 state a const_double which can be used with a VFP fconsts
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index f55acbf..8f49450 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -137,6 +137,10 @@
        (match_test "((unsigned HOST_WIDE_INT) INTVAL (op)) <= GET_MODE_BITSIZE (mode)
 	&& ((unsigned HOST_WIDE_INT) INTVAL (op)) > 0")))
 
+(define_predicate "ldrd_strd_offset_operand"
+  (and (match_operand 0 "const_int_operand")
+       (match_test "TARGET_LDRD && offset_ok_for_ldrd_strd (INTVAL (op))")))
+
 (define_predicate "arm_add_operand"
   (ior (match_operand 0 "arm_rhs_operand")
        (match_operand 0 "arm_neg_immediate_operand")))
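
For readers who want to experiment with the Thumb-2 register rules outside
of GCC, the following C sketch restates them over plain register numbers.
It is an illustration under assumptions, not the code from the patch:
thumb2_regs_ok, offsets_adjacent and check_operand_examples are hypothetical
names, the Cortex-M3 erratum check (fix_cm3_ldrd) is deliberately left out,
and the register numbers 13 and 15 are the ARM core numbers for SP and PC.

```c
#include <assert.h>
#include <stdbool.h>

enum { SP_REGNUM = 13, PC_REGNUM = 15 };  /* ARM core register numbers.  */

/* Thumb-2 register rules from operands_ok_ldrd_strd, restated over plain
   register numbers.  The Cortex-M3 erratum check is left out.  */
static bool
thumb2_regs_ok (unsigned t, unsigned t2, unsigned n, bool wback, bool load)
{
  if (wback && (n == t || n == t2))
    return false;
  if (t == SP_REGNUM || t == PC_REGNUM)
    return false;
  if (t2 == SP_REGNUM || t2 == PC_REGNUM)
    return false;
  if (!load && n == PC_REGNUM)
    return false;
  if (load && t == t2)
    return false;
  return true;
}

/* The *thumb2_ldrd and *thumb2_strd conditions also require the second
   access to sit exactly one word above the first.  */
static bool
offsets_adjacent (long first, long second)
{
  return first + 4 == second;
}

static void
check_operand_examples (void)
{
  assert (thumb2_regs_ok (4, 5, 13, false, false));  /* strd r4, r5, [sp] */
  assert (thumb2_regs_ok (7, 4, 0, false, true));    /* any two distinct regs */
  assert (!thumb2_regs_ok (4, 4, 0, false, true));   /* ldrd with t == t2 */
  assert (!thumb2_regs_ok (4, 15, 0, false, true));  /* PC as destination */
  assert (offsets_adjacent (-4, 0));
  assert (!offsets_adjacent (0, 8));
}
```

Unlike the ARM-mode branch in the patch, the Thumb-2 branch does not ask for
an even/odd consecutive register pair, which is why the second example above
is accepted.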


* Re: [PATCH, ARM][1/4] New RTL patterns for LDRD/STRD in Thumb mode
  2012-10-19 16:54         ` Greta Yorsh
@ 2012-10-19 17:16           ` Richard Earnshaw
  0 siblings, 0 replies; 13+ messages in thread
From: Richard Earnshaw @ 2012-10-19 17:16 UTC (permalink / raw)
  To: Greta Yorsh; +Cc: GCC Patches, Ramana Radhakrishnan, nickc, paul

On 19/10/12 17:51, Greta Yorsh wrote:
>> -----Original Message-----
>> From: Richard Earnshaw
>> Sent: 19 October 2012 16:44
>> To: Greta Yorsh
>> Cc: GCC Patches; Ramana Radhakrishnan; nickc@redhat.com;
>> paul@codesourcery.com
>> Subject: Re: [PATCH, ARM][1/4] New RTL patterns for LDRD/STRD in Thumb
>> mode
>>
>> On 19/10/12 16:20, Greta Yorsh wrote:
>>
>>> Removed the condition "!optimize_function_for_size_p (cfun)".
>>>
>>> The condition "current_tune->prefer_ldrd_strd" is needed because the
>>> patterns for LDRD/STRD appear before the patterns for LDM/STM that
>>> can match the same RTL (two registers in the list). Condition
>>> "reload_completed" does not help with it because peephole
>>> optimizations in ldmstm.md may (after reload) create new RTL insns
>>> that match this pattern.
>>>
>>
>> The point of the reload_completed is that these patterns have the
>> potential to cause some problems if they somehow matched during earlier
>> passes and the address base was an eliminable register.
>>
>
> Thank you for the explanation. Here is an updated patch.
>
> Regression tests and bootstrap in progress for the entire sequence, after
> addressing all other comments as well.
>
> OK for trunk, if bootstrap successful?
>
> Thanks,
> Greta
>
>
> ChangeLog
>
>
> gcc/
>
> 2012-10-19  Sameera Deshpande  <sameera.deshpande@arm.com>
>              Greta Yorsh  <Greta.Yorsh@arm.com>
>
>          * config/arm/arm-protos.h (offset_ok_for_ldrd_strd): New
>          declaration.
>          (operands_ok_ldrd_strd): Likewise.
>          * config/arm/arm.c (offset_ok_for_ldrd_strd): New function.
>          (operands_ok_ldrd_strd): Likewise.
>          * config/arm/arm.md (thumb2_ldrd, thumb2_ldrd_base): New patterns.
>          (thumb2_ldrd_base_neg): Likewise.
>          (thumb2_strd, thumb2_strd_base, thumb2_strd_base_neg): Likewise.
>          * config/arm/predicates.md (ldrd_strd_offset_operand): New predicate.
>          * config/arm/constraints.md (Do): New constraint.
>

OK.

R.



Thread overview: 13+ messages
2012-10-10 14:48 [PATCH, ARM][0/4] Prologue/epilogue using STRD/LDRD in Thumb mode Greta Yorsh
2012-10-10 15:03 ` [PATCH, ARM][1/4] New RTL patterns for LDRD/STRD " Greta Yorsh
2012-10-18 13:54   ` Richard Earnshaw
2012-10-19 15:44     ` Greta Yorsh
2012-10-19 15:52       ` Richard Earnshaw
2012-10-19 16:54         ` Greta Yorsh
2012-10-19 17:16           ` Richard Earnshaw
2012-10-10 15:03 ` [PATCH, ARM][2/4] Prologue using STRD " Greta Yorsh
2012-10-18 14:41   ` Richard Earnshaw
2012-10-10 15:04 ` [PATCH, ARM][3/4] Epilogue using LDRD " Greta Yorsh
2012-10-19 15:03   ` Richard Earnshaw
2012-10-10 15:13 ` [PATCH, ARM][4/4] Adjust tests gcc.target/arm/pr40457-*.c Greta Yorsh
2012-10-19 15:10   ` Richard Earnshaw
