public inbox for gcc-patches@gcc.gnu.org
* [RFA/ARM][Patch 00/05]: Introduction - Generate LDRD/STRD in prologue/epilogue instead of PUSH/POP.
@ 2011-10-11  9:22 Sameera Deshpande
  2011-10-11  9:31 ` [RFA/ARM][Patch 01/05]: Create tune for Cortex-A15 Sameera Deshpande
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Sameera Deshpande @ 2011-10-11  9:22 UTC (permalink / raw)
  To: gcc-patches; +Cc: nickc, Richard Earnshaw, paul, Ramana Radhakrishnan

This series of 5 patches generates LDRD/STRD instead of POP/PUSH in the
epilogue/prologue for ARM and Thumb-2 modes on Cortex-A15.

Patch [1/5] introduces a new field in the tune parameters which indicates
whether LDRD/STRD are preferred over POP/PUSH by a specific core.

Patches [2-5/5] use this field to decide whether LDRD/STRD should be
generated instead of PUSH/POP in ARM and Thumb-2 mode.

Patch [2/5] generates LDRD instead of POP for the Thumb-2 epilogue on A15.
This patch depends on patch [1/5].

Patch [3/5] generates STRD instead of PUSH for the Thumb-2 prologue on
A15. This patch depends on variables, functions and patterns defined in
[1/5] and [2/5].

Patch [4/5] generates STRD instead of PUSH for the ARM prologue on A15.
This patch depends on [1/5].

Patch [5/5] generates LDRD instead of POP for the ARM epilogue on A15.
This patch depends on variables, functions and patterns defined in [1/5]
and [4/5].

All of these patches depend on the Thumb-2/ARM RTL epilogue patches
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01854.html and
http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01855.html, which have been
submitted for review.

The patches were applied in the given order and tested with check-gcc,
check-gdb and bootstrap, with no regressions.

In ARM mode, a significant performance improvement (~26%) is seen on
some parts of a popular embedded consumer benchmark. In most other
cases, however, the effect on performance is small (~3% improvement).

In Thumb-2 mode, the improvement observed on the same parts of the
benchmark is ~11% (and ~2.5% in other cases).


* [RFA/ARM][Patch 01/05]: Create tune for Cortex-A15.
  2011-10-11  9:22 [RFA/ARM][Patch 00/05]: Introduction - Generate LDRD/STRD in prologue/epilogue instead of PUSH/POP Sameera Deshpande
@ 2011-10-11  9:31 ` Sameera Deshpande
  2011-10-21 12:56   ` Ramana Radhakrishnan
  2011-10-11  9:38 ` [RFA/ARM][Patch 02/05]: LDRD generation instead of POP in A15 Thumb2 epilogue Sameera Deshpande
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Sameera Deshpande @ 2011-10-11  9:31 UTC (permalink / raw)
  To: gcc-patches; +Cc: nickc, Richard Earnshaw, paul, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 2856 bytes --]

Hi!

This patch adds a new field to tune_params to indicate whether LDRD/STRD
are preferred over PUSH/POP in the prologue/epilogue of a specific core.
It also creates a new tune for Cortex-A15 and updates the tunes for other
cores to set the new field to its default value (false).
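For illustration only, a minimal C sketch of how a per-core boolean like this can drive the choice. The struct tune_sketch, its table and the helpers lookup_tune and use_ldrd_strd are hypothetical stand-ins for GCC's struct tune_params and current_tune; the optimize_size check mirrors the condition used by the later patches in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for GCC's struct tune_params: only the
   field added by this patch is modelled, keyed by core name.  */
struct tune_sketch
{
  const char *core;        /* Core name as given to -mcpu/-mtune.  */
  bool prefer_ldrd_strd;   /* Prefer LDRD/STRD over PUSH/POP.  */
};

/* Only Cortex-A15 opts in; the other tunes keep the default (false).  */
static const struct tune_sketch tunes[] =
{
  { "cortex-a5",  false },
  { "cortex-a8",  false },
  { "cortex-a9",  false },
  { "cortex-a15", true  },
};

/* Return the entry for CORE, or NULL if CORE is not in the table.  */
static const struct tune_sketch *
lookup_tune (const char *core)
{
  assert (core != NULL);
  for (size_t i = 0; i < sizeof tunes / sizeof tunes[0]; i++)
    if (strcmp (tunes[i].core, core) == 0)
      return &tunes[i];
  return NULL;
}

/* Gate used by the later patches: use LDRD/STRD only when the core
   prefers them and we are not optimizing for size.  */
static bool
use_ldrd_strd (const struct tune_sketch *tune, bool optimize_size)
{
  return tune != NULL && tune->prefer_ldrd_strd && !optimize_size;
}
```

Keeping the preference in the tune table (rather than testing the core name at each use) means a future core can opt in by flipping one initializer.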

ChangeLog entry for the patch to create the Cortex-A15 tune:

2011-10-11  Sameera Deshpande  <sameera.deshpande@arm.com>

        * config/arm/arm-cores.def (cortex_a15): Update.
        * config/arm/arm-protos.h (struct tune_params): Add new field...
          (prefer_ldrd_strd): ... this.
        * config/arm/arm.c (arm_slowmul_tune): Add
          prefer_ldrd_strd field settings.
          (arm_fastmul_tune): Likewise.
          (arm_strongarm_tune): Likewise.
          (arm_xscale_tune): Likewise.
          (arm_9e_tune): Likewise.
          (arm_v6t2_tune): Likewise.
          (arm_cortex_tune): Likewise.
          (arm_cortex_a5_tune): Likewise.
          (arm_cortex_a9_tune): Likewise.
          (arm_fa726te_tune): Likewise.
          (arm_cortex_a15_tune): New variable.

[-- Attachment #2: a15_tune_setup-4Oct.patch --]
[-- Type: text/x-patch, Size: 5642 bytes --]

diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 742b5e8..1b42713 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -128,7 +128,7 @@ ARM_CORE("generic-armv7-a", genericv7a,	7A,				 FL_LDSCHED, cortex)
 ARM_CORE("cortex-a5",	  cortexa5,	7A,				 FL_LDSCHED, cortex_a5)
 ARM_CORE("cortex-a8",	  cortexa8,	7A,				 FL_LDSCHED, cortex)
 ARM_CORE("cortex-a9",	  cortexa9,	7A,				 FL_LDSCHED, cortex_a9)
-ARM_CORE("cortex-a15",	  cortexa15,	7A,				 FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex)
+ARM_CORE("cortex-a15",	  cortexa15,	7A,				 FL_LDSCHED | FL_THUMB_DIV | FL_ARM_DIV, cortex_a15)
 ARM_CORE("cortex-r4",	  cortexr4,	7R,				 FL_LDSCHED, cortex)
 ARM_CORE("cortex-r4f",	  cortexr4f,	7R,				 FL_LDSCHED, cortex)
 ARM_CORE("cortex-r5",	  cortexr5,	7R,				 FL_LDSCHED | FL_ARM_DIV, cortex)
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f69bc42..c6b8f71 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -243,6 +243,9 @@ struct tune_params
   int l1_cache_line_size;
   bool prefer_constant_pool;
   int (*branch_cost) (bool, bool);
+  /* This flag indicates if STRD/LDRD instructions are preferred
+     over PUSH/POP in epilogue/prologue.  */
+  bool prefer_ldrd_strd;
 };
 
 extern const struct tune_params *current_tune;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6c09267..d709375 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -850,7 +850,8 @@ const struct tune_params arm_slowmul_tune =
   5,						/* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,						/* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false                                         /* Prefer LDRD/STRD.  */
 };
 
 const struct tune_params arm_fastmul_tune =
@@ -861,7 +862,8 @@ const struct tune_params arm_fastmul_tune =
   5,						/* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,						/* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false                                         /* Prefer LDRD/STRD.  */
 };
 
 /* StrongARM has early execution of branches, so a sequence that is worth
@@ -875,7 +877,8 @@ const struct tune_params arm_strongarm_tune =
   3,						/* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,						/* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false                                         /* Prefer LDRD/STRD.  */
 };
 
 const struct tune_params arm_xscale_tune =
@@ -886,7 +889,8 @@ const struct tune_params arm_xscale_tune =
   3,						/* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,						/* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false                                         /* Prefer LDRD/STRD.  */
 };
 
 const struct tune_params arm_9e_tune =
@@ -897,7 +901,8 @@ const struct tune_params arm_9e_tune =
   5,						/* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,						/* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false                                         /* Prefer LDRD/STRD.  */
 };
 
 const struct tune_params arm_v6t2_tune =
@@ -908,7 +913,8 @@ const struct tune_params arm_v6t2_tune =
   5,						/* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   false,					/* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false                                         /* Prefer LDRD/STRD.  */
 };
 
 /* Generic Cortex tuning.  Use more specific tunings if appropriate.  */
@@ -920,7 +926,20 @@ const struct tune_params arm_cortex_tune =
   5,						/* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   false,					/* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false                                         /* Prefer LDRD/STRD.  */
+};
+
+const struct tune_params arm_cortex_a15_tune =
+{
+  arm_9e_rtx_costs,
+  NULL,
+  1,                                            /* Constant limit.  */
+  5,                                            /* Max cond insns.  */
+  ARM_PREFETCH_NOT_BENEFICIAL,
+  false,                                        /* Prefer constant pool.  */
+  arm_default_branch_cost,
+  true                                          /* Prefer LDRD/STRD.  */
 };
 
 /* Branches can be dual-issued on Cortex-A5, so conditional execution is
@@ -934,7 +953,8 @@ const struct tune_params arm_cortex_a5_tune =
   1,						/* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   false,					/* Prefer constant pool.  */
-  arm_cortex_a5_branch_cost
+  arm_cortex_a5_branch_cost,
+  false                                         /* Prefer LDRD/STRD.  */
 };
 
 const struct tune_params arm_cortex_a9_tune =
@@ -945,7 +965,8 @@ const struct tune_params arm_cortex_a9_tune =
   5,						/* Max cond insns.  */
   ARM_PREFETCH_BENEFICIAL(4,32,32),
   false,					/* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false                                         /* Prefer LDRD/STRD.  */
 };
 
 const struct tune_params arm_fa726te_tune =
@@ -956,7 +977,8 @@ const struct tune_params arm_fa726te_tune =
   5,						/* Max cond insns.  */
   ARM_PREFETCH_NOT_BENEFICIAL,
   true,						/* Prefer constant pool.  */
-  arm_default_branch_cost
+  arm_default_branch_cost,
+  false                                         /* Prefer LDRD/STRD.  */
 };
 
 


* [RFA/ARM][Patch 02/05]: LDRD generation instead of POP in A15 Thumb2 epilogue.
  2011-10-11  9:22 [RFA/ARM][Patch 00/05]: Introduction - Generate LDRD/STRD in prologue/epilogue instead of PUSH/POP Sameera Deshpande
  2011-10-11  9:31 ` [RFA/ARM][Patch 01/05]: Create tune for Cortex-A15 Sameera Deshpande
@ 2011-10-11  9:38 ` Sameera Deshpande
  2011-10-13 18:14   ` Richard Henderson
  2011-10-11  9:53 ` [RFA/ARM][Patch 03/05]: STRD generation instead of PUSH in A15 Thumb2 prologue Sameera Deshpande
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Sameera Deshpande @ 2011-10-11  9:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: nickc, Richard Earnshaw, paul, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 1425 bytes --]

Hi!

This patch generates LDRD instead of POP for the Thumb-2 epilogue on A15.

When optimizing for size (optimize_size), the original POP-based epilogue
is still generated for A15. The work involves defining new functions,
predicates and patterns.

Since LDRD cannot load PC, when PC is in the register list LDRD is
generated for all the other registers in the list that can form register
pairs. Then, if PC is the only register left to be popped, an LDR with
return is generated; otherwise a POP with return is generated.
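A rough C model of this decomposition, assuming the saved-register mask layout used by thumb2_emit_ldrd_pop (bit N = rN, PC = r15) and that PC is in the mask exactly when really_return is true. The function plan_ldrd_pop and its helpers are hypothetical: they only print the instruction sequence the patch is expected to produce instead of building RTL and DWARF notes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SP_REGNUM 13
#define PC_REGNUM 15
#define LAST_ARM_REGNUM 15

static const char *const reg_names[LAST_ARM_REGNUM + 1] =
{
  "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
  "r8", "r9", "r10", "r11", "r12", "sp", "lr", "pc"
};

/* Append TEXT to BUF (capacity CAP), separating instructions with "; ".  */
static void
append_insn (char *buf, size_t cap, const char *text)
{
  assert (strlen (buf) + strlen (text) + 3 <= cap);
  if (buf[0] != '\0')
    strcat (buf, "; ");
  strcat (buf, text);
}

/* Hypothetical model of thumb2_emit_ldrd_pop: write into BUF the epilogue
   sequence for MASK (bit N = rN).  REALLY_RETURN means LR was replaced by
   PC, so the final load also returns.  */
static void
plan_ldrd_pop (unsigned long mask, bool really_return, char *buf, size_t cap)
{
  char insn[64];
  int num_regs = 0, i, j, first = -1;

  buf[0] = '\0';
  for (i = 0; i <= LAST_ARM_REGNUM; i++)
    if (mask & (1UL << i))
      num_regs++;
  assert (num_regs > 0 && (mask & (1UL << SP_REGNUM)) == 0);
  /* Model assumption: PC is in MASK exactly when REALLY_RETURN.  */
  assert (really_return == ((mask & (1UL << PC_REGNUM)) != 0));

  /* LDRD cannot load PC, so PC never joins a pair.  */
  if (really_return)
    num_regs--;

  /* Pair the lowest registers; saved register number i is at [sp, #4*i].  */
  for (i = 0, j = 0; i < num_regs - (num_regs % 2); j++)
    if (mask & (1UL << j))
      {
        if (i % 2 == 0)
          first = j;
        else
          {
            snprintf (insn, sizeof insn, "ldrd %s, %s, [sp, #%d]",
                      reg_names[first], reg_names[j], 4 * (i - 1));
            append_insn (buf, cap, insn);
          }
        i++;
      }

  /* Release the slots read by the LDRDs.  (The patch emits this SP update
     unconditionally; the sketch omits a no-op "add sp, sp, #0".)  */
  if (i > 0)
    {
      snprintf (insn, sizeof insn, "add sp, sp, #%d", 4 * i);
      append_insn (buf, cap, insn);
    }

  /* Find the first register that has not been restored yet, if any.  */
  while (j <= LAST_ARM_REGNUM && (mask & (1UL << j)) == 0)
    j++;

  if ((num_regs % 2 == 1 && !really_return)
      || (num_regs % 2 == 0 && really_return))
    {
      /* One register left (PC when returning): LDR with post-increment.  */
      assert (j <= LAST_ARM_REGNUM && (!really_return || j == PC_REGNUM));
      snprintf (insn, sizeof insn, "ldr %s, [sp], #4", reg_names[j]);
      append_insn (buf, cap, insn);
    }
  else if (num_regs % 2 == 1 && really_return)
    {
      /* One ordinary register plus PC left: POP with return.  */
      assert (j < PC_REGNUM);
      snprintf (insn, sizeof insn, "pop {%s, pc}", reg_names[j]);
      append_insn (buf, cap, insn);
    }
}
```

For example, {r4-r7, pc} becomes two LDRDs, an SP adjustment and "ldr pc, [sp], #4", while {r4-r6, pc} leaves r6 to be popped together with PC by "pop {r6, pc}".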

The patch was tested with check-gcc, check-gdb and bootstrap with no
regressions.

ChangeLog entry for the patch to emit LDRD in the Thumb-2 epilogue on A15:

2011-10-11  Sameera Deshpande  <sameera.deshpande@arm.com>

        * config/arm/arm-protos.h (bad_reg_pair_for_thumb_ldrd_strd): New
          declaration.
        * config/arm/arm.c (bad_reg_pair_for_thumb_ldrd_strd): New helper
          function.
          (thumb2_emit_ldrd_pop): New static function.
          (thumb2_expand_epilogue): Update.
        * config/arm/constraints.md (Pz): New constraint.
        * config/arm/ldmstm.md (thumb2_ldrd_base): New pattern.
          (thumb2_ldrd): Likewise.
        * config/arm/predicates.md (ldrd_immediate_operand): New
          predicate.
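The operand checks behind the new Thumb-2 LDRD patterns can be sketched in plain C. This is an illustration under stated assumptions: registers are plain numbers instead of RTL, and the helper names are hypothetical. bad_reg_pair mirrors bad_reg_pair_for_thumb_ldrd_strd; ldrd_offset_ok mirrors the Pz constraint (a multiple of 4 in the range -1020 to 1020, matching the scaled 8-bit immediate of Thumb-2 LDRD/STRD); the "second word at offset + 4" test mirrors the insn condition of *thumb2_ldrd.

```c
#include <assert.h>
#include <stdbool.h>

#define SP_REGNUM 13
#define PC_REGNUM 15

/* Mirrors bad_reg_pair_for_thumb_ldrd_strd on plain register numbers:
   the two transfer registers need not be consecutive, but neither may
   be SP or PC, and they must be different.  */
static bool
bad_reg_pair (int r1, int r2)
{
  assert (r1 >= 0 && r1 <= 15 && r2 >= 0 && r2 <= 15);
  return r1 == SP_REGNUM || r1 == PC_REGNUM
         || r2 == SP_REGNUM || r2 == PC_REGNUM
         || r1 == r2;
}

/* Mirrors the Pz constraint: a multiple of 4 in [-1020, 1020].  */
static bool
ldrd_offset_ok (long off)
{
  return off >= -1020 && off <= 1020 && off % 4 == 0;
}

/* Hypothetical combination of the operand checks of *thumb2_ldrd: a valid
   register pair, a valid first offset, and the second word at +4.  */
static bool
thumb2_ldrd_strd_ok (int r1, int r2, long off1, long off2)
{
  return !bad_reg_pair (r1, r2) && ldrd_offset_ok (off1)
         && off2 == off1 + 4;
}
```

Unlike ARM-state LDRD, no even/odd consecutive-register rule appears here, which is why the epilogue code can pair any two saved registers.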


[-- Attachment #2: a15_thumb2_ldrd_epilogue-5Oct.patch --]
[-- Type: text/x-patch, Size: 10476 bytes --]

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index c6b8f71..06a67b5 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -202,6 +202,7 @@ extern void thumb_reload_in_hi (rtx *);
 extern void thumb_set_return_address (rtx, rtx);
 extern const char *thumb1_output_casesi (rtx *);
 extern const char *thumb2_output_casesi (rtx *);
+extern bool bad_reg_pair_for_thumb_ldrd_strd (rtx, rtx);
 #endif
 
 /* Defined in pe.c.  */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d709375..3eba510 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15410,6 +15410,155 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num_regs, rtx base_reg)
   par = emit_insn (par);
   add_reg_note (par, REG_FRAME_RELATED_EXPR, dwarf);
 }
+bool
+bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
+{
+  return (GET_CODE (src1) != REG
+          || GET_CODE (src2) != REG
+          || (REGNO (src1) == PC_REGNUM)
+          || (REGNO (src1) == SP_REGNUM)
+          || (REGNO (src1) == REGNO (src2))
+          || (REGNO (src2) == PC_REGNUM)
+          || (REGNO (src2) == SP_REGNUM));
+}
+
+/* Generate and emit patterns that will be recognized as LDRD.  If an even
+   number of registers is being popped, an LDRD pattern is created for each
+   register pair.  If an odd number of registers is popped, the last register
+   is loaded using an LDR pattern.  */
+static bool
+thumb2_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg, tmp1;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+  gcc_assert (really_return || ((saved_regs_mask & (1 << PC_REGNUM)) == 0));
+
+  if (really_return && (saved_regs_mask & (1 << PC_REGNUM)))
+    /* We cannot generate ldrd for PC.  Hence, reduce the count if PC is
+       to be popped.  So, if num_regs is even, now it will become odd,
+       and we can generate pop with PC.  If num_regs is odd, it will be
+       even now, and ldr with return can be generated for PC.  */
+    num_regs--;
+
+  for (i = 0, j = 0; i < (num_regs - (num_regs % 2)); j++)
+    /* Var j iterates over all the registers to gather all the registers in
+       saved_regs_mask.  Var i gives index of saved registers in stack frame.
+       A PARALLEL RTX of register-pair is created here, so that pattern for
+       LDRD can be matched.  As PC is always last register to be popped, and
+       we have already decremented num_regs if PC, we don't have to worry
+       about PC in this loop.  */
+    if (saved_regs_mask & (1 << j))
+      {
+        gcc_assert (j != SP_REGNUM);
+
+        /* Create RTX for memory load.  New RTX is created for dwarf as
+           they are not sharable.  */
+        reg = gen_rtx_REG (SImode, j);
+        tmp = gen_rtx_SET (SImode,
+                           reg,
+                           gen_frame_mem (SImode,
+                               plus_constant (stack_pointer_rtx, 4 * i)));
+
+        tmp1 = gen_rtx_SET (SImode,
+                           reg,
+                           gen_frame_mem (SImode,
+                               plus_constant (stack_pointer_rtx, 4 * i)));
+        RTX_FRAME_RELATED_P (tmp) = 1;
+        RTX_FRAME_RELATED_P (tmp1) = 1;
+
+        if (i % 2 == 0)
+          {
+            /* When saved-register index (i) is even, the RTX to be emitted is
+               yet to be created.  Hence create it first.  The LDRD pattern we
+               are generating is :
+               [ (SET (reg_t0) (MEM (PLUS (SP) (NUM))))
+                 (SET (reg_t1) (MEM (PLUS (SP) (NUM + 4)))) ]
+               where target registers need not be consecutive.  */
+            par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+
+            /* We need to maintain a sequence for DWARF info too.  */
+            dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (2));
+          }
+
+        /* ith register is added in PARALLEL RTX.  If i is even, the reg_i is
+           added as 0th element and if i is odd, reg_i is added as 1st element
+           of LDRD pattern shown above.  */
+        XVECEXP (par, 0, (i % 2)) = tmp;
+        XVECEXP (dwarf, 0, (i % 2)) = tmp1;
+
+        if ((i % 2) == 1)
+          {
+            /* When saved-register index (i) is odd, RTXs for both the registers
+               to be loaded are generated in above given LDRD pattern, and the
+               pattern can be emitted now.  */
+            par = emit_insn (par);
+            add_reg_note (par, REG_FRAME_RELATED_EXPR, dwarf);
+          }
+
+        i++;
+      }
+
+  /* If the number of registers popped is odd and really_return is false, or
+     the number is even and really_return is true, the last register (which
+     may be PC) is popped using LDR.  Hence, adjust the stack first and then
+     use LDR with post-increment.  */
+
+  /* Increment the stack pointer, based on there being
+     num_regs 4-byte registers to restore.  */
+  tmp = gen_rtx_SET (VOIDmode,
+                     stack_pointer_rtx,
+                     plus_constant (stack_pointer_rtx, 4 * i));
+  RTX_FRAME_RELATED_P (tmp) = 1;
+  emit_insn (tmp);
+
+  if (((num_regs % 2) == 1 && !really_return)
+      || ((num_regs % 2) == 0 && really_return))
+    {
+      /* Gen LDR with post increment here.  */
+      for (; (saved_regs_mask & (1 << j)) == 0; j++);
+
+      tmp1 = gen_rtx_MEM (SImode,
+                          gen_rtx_POST_INC (SImode,
+                                            stack_pointer_rtx));
+      set_mem_alias_set (tmp1, get_frame_alias_set ());
+
+      reg = gen_rtx_REG (SImode, j);
+      tmp = gen_rtx_SET (SImode, reg, tmp1);
+      RTX_FRAME_RELATED_P (tmp) = 1;
+
+      if (really_return)
+        {
+          /* If really_return, j must be PC_REGNUM.  */
+          gcc_assert (j == PC_REGNUM);
+          par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+          XVECEXP (par, 0, 0) = ret_rtx;
+          XVECEXP (par, 0, 1) = tmp;
+          emit_jump_insn (par);
+        }
+      else
+        {
+          emit_insn (tmp);
+        }
+    }
+  else if ((num_regs % 2) == 1 && really_return)
+    {
+      /* There are 2 registers to be popped.  So, generate the pattern
+         pop_multiple_with_stack_update_and_return to pop in PC.  */
+      arm_emit_multi_reg_pop (saved_regs_mask & (~((1 << j) - 1)),
+                              really_return);
+    }
+
+  return really_return;
+}
 
 /* Calculate the size of the return value that is passed in registers.  */
 static unsigned
@@ -22236,7 +22385,13 @@ thumb2_expand_epilogue (bool is_sibling)
               really_return = true;
             }
 
-          arm_emit_multi_reg_pop (saved_regs_mask, really_return);
+          if (!current_tune->prefer_ldrd_strd || optimize_size)
+            arm_emit_multi_reg_pop (saved_regs_mask, really_return);
+          else
+            /* Generate LDRD pattern instead of POP pattern.  */
+            really_return = thumb2_emit_ldrd_pop (saved_regs_mask,
+                                                  really_return);
+
           if (really_return == true)
             return;
         }
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index d8ce982..3c55699 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -207,6 +207,12 @@
   (and (match_code "const_int")
        (match_test "TARGET_THUMB2 && ival >= 0 && ival <= 255")))
 
+(define_constraint "Pz"
+  "@internal In Thumb-2 state a multiple of 4 in the range -1020 to 1020"
+  (and (match_code "const_int")
+       (match_test "TARGET_THUMB2 && ival >= -1020 && ival <= 1020
+                    && ival % 4 == 0")))
+
 (define_constraint "G"
  "In ARM/Thumb-2 state a valid FPA immediate constant."
  (and (match_code "const_double")
diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
index 5db4a32..21d2815 100644
--- a/gcc/config/arm/ldmstm.md
+++ b/gcc/config/arm/ldmstm.md
@@ -21,6 +21,32 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+(define_insn "*thumb2_ldrd_base"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (mem:SI (match_operand:SI 1 "s_register_operand" "rk")))
+   (set (match_operand:SI 2 "register_operand" "=r")
+        (mem:SI (plus:SI (match_dup 1)
+                         (const_int 4))))]
+  "(TARGET_THUMB2 && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_thumb_ldrd_strd (operands[0], operands[2])))"
+  "ldrd%?\t%0, %2, [%1]"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_ldrd"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (mem:SI (plus:SI (match_operand:SI 1 "s_register_operand" "rk")
+                         (match_operand:SI 2 "ldrd_immediate_operand" "Pz"))))
+   (set (match_operand:SI 3 "register_operand" "=r")
+        (mem:SI (plus:SI (match_dup 1)
+                         (match_operand:SI 4 "const_int_operand" ""))))]
+  "(TARGET_THUMB2 && current_tune->prefer_ldrd_strd
+     && ((INTVAL (operands[2]) + 4) == INTVAL (operands[4]))
+     && (!bad_reg_pair_for_thumb_ldrd_strd (operands[0], operands[3])))"
+  "ldrd%?\t%0, %3, [%1, %2]"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
 (define_insn "*ldm4_ia"
   [(match_parallel 0 "load_multiple_operation"
     [(set (match_operand:SI 1 "arm_hard_register_operand" "")
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 79e65fe..e074425 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -203,6 +203,10 @@
 	    (match_test "(GET_CODE (op) != CONST_INT
 			  || (INTVAL (op) < 4096 && INTVAL (op) > -4096))"))))
 
+(define_predicate "ldrd_immediate_operand"
+  (and (match_operand 0 "const_int_operand")
+  (match_test "(INTVAL (op) <= 1020 && INTVAL (op) >= -1020 && INTVAL (op) % 4 == 0)")))
+
 ;; True for operators that can be combined with a shift in ARM state.
 (define_special_predicate "shiftable_operator"
   (and (match_code "plus,minus,ior,xor,and")


* [RFA/ARM][Patch 03/05]: STRD generation instead of PUSH in A15 Thumb2 prologue.
  2011-10-11  9:22 [RFA/ARM][Patch 00/05]: Introduction - Generate LDRD/STRD in prologue/epilogue instead of PUSH/POP Sameera Deshpande
  2011-10-11  9:31 ` [RFA/ARM][Patch 01/05]: Create tune for Cortex-A15 Sameera Deshpande
  2011-10-11  9:38 ` [RFA/ARM][Patch 02/05]: LDRD generation instead of POP in A15 Thumb2 epilogue Sameera Deshpande
@ 2011-10-11  9:53 ` Sameera Deshpande
  2011-10-21 13:00   ` Ramana Radhakrishnan
  2011-10-11 10:12 ` [RFA/ARM][Patch 04/05]: STRD generation instead of PUSH in A15 ARM prologue Sameera Deshpande
  2011-10-11 10:19 ` [RFA/ARM][Patch 05/05]: LDRD generation instead of POP in A15 ARM epilogue Sameera Deshpande
  4 siblings, 1 reply; 18+ messages in thread
From: Sameera Deshpande @ 2011-10-11  9:53 UTC (permalink / raw)
  To: gcc-patches; +Cc: nickc, Richard Earnshaw, paul, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 815 bytes --]

Hi!

This patch generates STRD instead of PUSH in the Thumb-2 prologue on
A15.

When optimizing for size (optimize_size), the original PUSH-based prologue
is still generated for A15. The work involves defining a new function and
new patterns.
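A hedged C model of the frame layout that thumb2_emit_strd_push builds: SP is decremented once for the whole frame, registers are stored in pairs from the top of the frame downwards, and for an odd count the lowest register is stored alone at [sp, #0], giving the same layout as PUSH. plan_strd_push and its textual output are hypothetical, for illustration only; the real function builds RTL plus a DWARF note attached to the SP update.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SP_REGNUM 13
#define PC_REGNUM 15
#define LAST_ARM_REGNUM 15

static const char *const reg_names[LAST_ARM_REGNUM + 1] =
{
  "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
  "r8", "r9", "r10", "r11", "r12", "sp", "lr", "pc"
};

/* Append TEXT to BUF (capacity CAP), separating instructions with "; ".  */
static void
append_insn (char *buf, size_t cap, const char *text)
{
  assert (strlen (buf) + strlen (text) + 3 <= cap);
  if (buf[0] != '\0')
    strcat (buf, "; ");
  strcat (buf, text);
}

/* Hypothetical model of thumb2_emit_strd_push: write into BUF the
   instruction sequence used to save the registers in MASK (bit N = rN).  */
static void
plan_strd_push (unsigned long mask, char *buf, size_t cap)
{
  char insn[64];
  int num_regs = 0, i, j, high = -1, odd;

  buf[0] = '\0';
  for (i = 0; i <= LAST_ARM_REGNUM; i++)
    if (mask & (1UL << i))
      num_regs++;
  assert (num_regs > 0);
  /* Neither SP nor PC is ever pushed here.  */
  assert ((mask & ((1UL << SP_REGNUM) | (1UL << PC_REGNUM))) == 0);
  odd = num_regs % 2;

  /* One pre-decrement of SP for the whole frame.  */
  snprintf (insn, sizeof insn, "sub sp, sp, #%d", 4 * num_regs);
  append_insn (buf, cap, insn);

  /* Walk from the highest register down; saved register number i lives
     at [sp, #4*i].  When num_regs is odd, slot 0 is left for one STR.  */
  for (i = num_regs - 1, j = LAST_ARM_REGNUM; i >= odd; j--)
    if (mask & (1UL << j))
      {
        if ((i - odd) % 2 == 1)
          high = j;             /* Upper word of the next STRD.  */
        else
          {
            snprintf (insn, sizeof insn, "strd %s, %s, [sp, #%d]",
                      reg_names[j], reg_names[high], 4 * i);
            append_insn (buf, cap, insn);
          }
        i--;
      }

  if (odd)
    {
      /* The lowest saved register goes alone into slot 0.  */
      while (j >= 0 && (mask & (1UL << j)) == 0)
        j--;
      assert (j >= 0);
      snprintf (insn, sizeof insn, "str %s, [sp, #0]", reg_names[j]);
      append_insn (buf, cap, insn);
    }
}
```

For {r4-r7, lr} this gives "sub sp, sp, #20; strd r7, lr, [sp, #12]; strd r5, r6, [sp, #4]; str r4, [sp, #0]", i.e. r4 at the lowest address and lr at [sp, #16], as PUSH would place them.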

The patch was tested with check-gcc, check-gdb and bootstrap with no
regressions.

ChangeLog entries for the patch to generate STRD in the Thumb-2 prologue on A15:

2011-10-11  Sameera Deshpande  <sameera.deshpande@arm.com>

        * config/arm/arm.c (thumb2_emit_strd_push): New static function.
          (arm_expand_prologue): Update.
        * config/arm/ldmstm.md (thumb2_strd): New pattern.
          (thumb2_strd_base): Likewise.

[-- Attachment #2: a15_thumb2_strd_prologue-6Oct.patch --]
[-- Type: text/x-patch, Size: 7412 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3eba510..fd8c31d 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15095,6 +15095,125 @@ arm_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
     }
 }
 
+/* Generate and emit patterns that will be recognized as STRD.  If an even
+   number of registers is being pushed, an STRD pattern is created for each
+   register pair.  If an odd number of registers is pushed, the first register
+   is stored using an STR pattern.  */
+static void
+thumb2_emit_strd_push (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg, tmp1;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  /* Pre-decrement the stack pointer, based on there being num_regs 4-byte
+     registers to push.  */
+  tmp = gen_rtx_SET (VOIDmode,
+                     stack_pointer_rtx,
+                     plus_constant (stack_pointer_rtx, -4 * num_regs));
+  RTX_FRAME_RELATED_P (tmp) = 1;
+  insn = emit_insn (tmp);
+
+  /* Create sequence for DWARF info.  */
+  dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (num_regs + 1));
+
+  /* RTLs cannot be shared, hence create new copy for dwarf.  */
+  tmp1 = gen_rtx_SET (VOIDmode,
+                     stack_pointer_rtx,
+                     plus_constant (stack_pointer_rtx, -4 * num_regs));
+  RTX_FRAME_RELATED_P (tmp1) = 1;
+  XVECEXP (dwarf, 0, 0) = tmp1;
+
+  for (i = num_regs - 1, j = LAST_ARM_REGNUM; i >= (num_regs % 2); j--)
+    /* Var j iterates over all the registers to gather all the registers in
+       saved_regs_mask.  Var i gives index of register R_j in stack frame.
+       A PARALLEL RTX of register-pair is created here, so that pattern for
+       STRD can be matched.  If num_regs is odd, 1st register will be pushed
+       using STR and remaining registers will be pushed with STRD in pairs.
+       If num_regs is even, all registers are pushed with STRD in pairs.
+       Hence, skip first element for odd num_regs.  */
+    if (saved_regs_mask & (1 << j))
+      {
+        gcc_assert (j != SP_REGNUM);
+        gcc_assert (j != PC_REGNUM);
+
+        /* Create RTX for store.  New RTX is created for dwarf as
+           they are not sharable.  */
+        reg = gen_rtx_REG (SImode, j);
+        tmp = gen_rtx_SET (SImode,
+                           gen_frame_mem
+                           (SImode,
+                            plus_constant (stack_pointer_rtx, 4 * i)),
+                           reg);
+
+        tmp1 = gen_rtx_SET (SImode,
+                           gen_frame_mem
+                           (SImode,
+                            plus_constant (stack_pointer_rtx, 4 * i)),
+                           reg);
+        RTX_FRAME_RELATED_P (tmp) = 1;
+        RTX_FRAME_RELATED_P (tmp1) = 1;
+
+        if (((i - (num_regs % 2)) % 2) == 1)
+          /* When (i - (num_regs % 2)) is odd, the RTX to be emitted is yet to
+             be created.  Hence create it first.  The STRD pattern we are
+             generating is :
+             [ (SET (MEM (PLUS (SP) (NUM))) (reg_t1))
+               (SET (MEM (PLUS (SP) (NUM + 4))) (reg_t2)) ]
+             where target registers need not be consecutive.  */
+          par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+
+        /* Register R_j is added to the PARALLEL RTX.  If (i - (num_regs % 2))
+           is even, reg_j is added as the 0th element; if it is odd, reg_j is
+           added as the 1st element of the STRD pattern shown above.  */
+        XVECEXP (par, 0, ((i - (num_regs % 2)) % 2)) = tmp;
+        XVECEXP (dwarf, 0, (i + 1)) = tmp1;
+
+        if (((i - (num_regs % 2)) % 2) == 0)
+          /* When (i - (num_regs % 2)) is even, RTXs for both the registers
+             to be stored are generated in the STRD pattern given above, and
+             the pattern can be emitted now.  */
+          emit_insn (par);
+
+        i--;
+      }
+
+  if ((num_regs % 2) == 1)
+    {
+      /* If odd number of registers are pushed, generate STR pattern to store
+         lone register.  */
+      for (; (saved_regs_mask & (1 << j)) == 0; j--);
+
+      tmp1 = gen_frame_mem (SImode, plus_constant (stack_pointer_rtx, 4 * i));
+      reg = gen_rtx_REG (SImode, j);
+      tmp = gen_rtx_SET (SImode, tmp1, reg);
+      RTX_FRAME_RELATED_P (tmp) = 1;
+
+      emit_insn (tmp);
+
+      tmp1 = gen_rtx_SET (SImode,
+                         gen_frame_mem
+                         (SImode,
+                          plus_constant (stack_pointer_rtx, 4 * i)),
+                         reg);
+      RTX_FRAME_RELATED_P (tmp1) = 1;
+      XVECEXP (dwarf, 0, (i + 1)) = tmp1;
+    }
+
+  add_reg_note (insn, REG_FRAME_RELATED_EXPR, dwarf);
+  RTX_FRAME_RELATED_P (insn) = 1;
+  return;
+}
+
 /* Generate and emit an insn that we will recognize as a push_multi.
    Unfortunately, since this insn does not reflect very well the actual
    semantics of the operation, we need to annotate the insn for the benefit
@@ -16307,8 +16426,16 @@ arm_expand_prologue (void)
 	      saved_regs += frame;
 	    }
 	}
-      insn = emit_multi_reg_push (live_regs_mask);
-      RTX_FRAME_RELATED_P (insn) = 1;
+
+      if (TARGET_THUMB2 && current_tune->prefer_ldrd_strd && !optimize_size)
+        {
+          thumb2_emit_strd_push (live_regs_mask);
+        }
+      else
+        {
+          insn = emit_multi_reg_push (live_regs_mask);
+          RTX_FRAME_RELATED_P (insn) = 1;
+        }
     }
 
   if (! IS_VOLATILE (func_type))
diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
index 21d2815..e3dcd4f 100644
--- a/gcc/config/arm/ldmstm.md
+++ b/gcc/config/arm/ldmstm.md
@@ -47,6 +47,32 @@
   [(set_attr "type" "load2")
    (set_attr "predicable" "yes")])
 
+(define_insn "*thumb2_strd_base"
+  [(set (mem:SI (match_operand:SI 0 "s_register_operand" "rk"))
+        (match_operand:SI 1 "register_operand" "r"))
+   (set (mem:SI (plus:SI (match_dup 0)
+                         (const_int 4)))
+        (match_operand:SI 2 "register_operand" "r"))]
+  "(TARGET_THUMB2 && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_thumb_ldrd_strd (operands[1], operands[2])))"
+  "strd%?\t%1, %2, [%0]"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_strd"
+  [(set (mem:SI (plus:SI (match_operand:SI 0 "s_register_operand" "rk")
+                         (match_operand:SI 1 "ldrd_immediate_operand" "Pz")))
+        (match_operand:SI 2 "register_operand" "r"))
+   (set (mem:SI (plus:SI (match_dup 0)
+                         (match_operand:SI 3 "const_int_operand" "")))
+        (match_operand:SI 4 "register_operand" "r"))]
+  "(TARGET_THUMB2 && current_tune->prefer_ldrd_strd
+     && ((INTVAL (operands[1]) + 4) == INTVAL (operands[3]))
+     && (!bad_reg_pair_for_thumb_ldrd_strd (operands[2], operands[4])))"
+  "strd%?\t%2, %4, [%0, %1]"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
 (define_insn "*ldm4_ia"
   [(match_parallel 0 "load_multiple_operation"
     [(set (match_operand:SI 1 "arm_hard_register_operand" "")

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFA/ARM][Patch 04/05]: STRD generation instead of PUSH in A15 ARM prologue.
  2011-10-11  9:22 [RFA/ARM][Patch 00/05]: Introduction - Generate LDRD/STRD in prologue/epilogue instead of PUSH/POP Sameera Deshpande
                   ` (2 preceding siblings ...)
  2011-10-11  9:53 ` [RFA/ARM][Patch 03/05]: STRD generation instead of PUSH in A15 Thumb2 prologue Sameera Deshpande
@ 2011-10-11 10:12 ` Sameera Deshpande
  2011-10-21 13:21   ` Ramana Radhakrishnan
  2011-10-11 10:19 ` [RFA/ARM][Patch 05/05]: LDRD generation instead of POP in A15 ARM epilogue Sameera Deshpande
  4 siblings, 1 reply; 18+ messages in thread
From: Sameera Deshpande @ 2011-10-11 10:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: nickc, Richard Earnshaw, paul, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 1641 bytes --]

Hi!

This patch generates STRD instead of PUSH in the prologue for A15 in ARM
mode.

When optimizing for size (optimize_size), the original prologue is still
generated for A15.
The work involves defining new functions, predicates and patterns, along
with minor changes to existing code:
* STRD in ARM mode needs its two registers to be consecutive, with the
first one even-numbered. The performance of the generated code degrades
greatly if R3 is pushed for stack alignment, because R3 then usually has
no partner register and has to be saved and restored with a
single-register STR/LDR. Using a SUB instruction to do the stack
adjustment is more efficient. Hence, the condition in
arm_get_frame_offsets () is changed to disable push-in-R3 when
prefer_ldrd_strd is set in ARM mode.
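The modified push-in-R3 test can be sketched as a standalone C predicate. This is a simplified model, not GCC code. The parameters `thumb2`, `prefer_ldrd_strd`, `sibcall_uses_r3` and `return_regs_size` are hypothetical stand-ins for TARGET_THUMB2, current_tune->prefer_ldrd_strd, any_sibcall_uses_r3 () and arm_size_return_regs (). The rest of the alignment logic in arm_get_frame_offsets () is left out.

```c
#include <stdbool.h>

/* Hypothetical model of the push-in-R3 test in arm_get_frame_offsets ()
   after this patch.  Returns true when R3 may be added to the saved
   registers as the 4-byte alignment filler.  */
static bool
can_push_r3_for_alignment (bool thumb2, bool prefer_ldrd_strd,
                           bool sibcall_uses_r3, int return_regs_size,
                           unsigned long saved_regs_mask)
{
  return !sibcall_uses_r3                      /* any_sibcall_uses_r3 ()  */
         && return_regs_size <= 12             /* arm_size_return_regs ()  */
         && (saved_regs_mask & (1UL << 3)) == 0
         /* New clause: in ARM mode with prefer_ldrd_strd, a lone R3 would
            need its own single-register store, so do not use it.  */
         && (thumb2 || !prefer_ldrd_strd);
}
```

With prefer_ldrd_strd set, the predicate now returns false in ARM mode, while Thumb-2 keeps the old behaviour. When it returns false, the remaining alignment logic in arm_get_frame_offsets () (not modelled here) takes over.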

In this patch, non-consecutive registers are accumulated until a register
pair that can be stored with STRD is found. All the accumulated registers
are then pushed with a multi-register PUSH, followed by an STRD with
pre-decrement write-back for the register pair. This is repeated until
all the registers in the register list have been pushed.
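The accumulate-then-pair loop can be modelled outside GCC. In the following hypothetical C sketch, `plan_strd_push ()` does not emit RTL. Instead, it writes the assembly that `arm_emit_strd_push ()` is expected to produce into a string, walking the mask from the highest register down as the patch does. The helper names, the `reg_names` table and the output format are illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SP_REGNUM 13
#define PC_REGNUM 15

static const char *const reg_names[16] = {
  "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
  "r8", "r9", "r10", "r11", "r12", "sp", "lr", "pc"
};

/* Append one instruction to BUF, separating instructions with "; ".  */
static void
emit_text (char *buf, size_t size, const char *text)
{
  size_t len = strlen (buf);
  snprintf (buf + len, size - len, "%s%s", len ? "; " : "", text);
}

/* Append "OP {rA, rB, ...}" for the registers in MASK, lowest first.  */
static void
emit_reg_list (char *buf, size_t size, const char *op, unsigned long mask)
{
  char insn[128];
  int n = snprintf (insn, sizeof insn, "%s {", op);
  int first = 1;

  for (int r = 0; r < 16; r++)
    if (mask & (1UL << r))
      {
        n += snprintf (insn + n, sizeof insn - n, "%s%s",
                       first ? "" : ", ", reg_names[r]);
        first = 0;
      }
  snprintf (insn + n, sizeof insn - n, "}");
  emit_text (buf, size, insn);
}

/* Hypothetical model of arm_emit_strd_push ().  As in the patch, the mask
   must not contain SP or PC.  */
static void
plan_strd_push (unsigned long saved_regs_mask, char *buf, size_t size)
{
  unsigned long pending = 0;   /* Registers waiting for a multi-reg PUSH.  */

  assert ((saved_regs_mask & ((1UL << SP_REGNUM) | (1UL << PC_REGNUM))) == 0);
  buf[0] = '\0';

  for (int j = PC_REGNUM; j >= 0; j--)
    {
      if (!(saved_regs_mask & (1UL << j)))
        continue;

      /* R<j-1>, R<j> form a pair: flush the pending registers first.  */
      if (j % 2 == 1 && (saved_regs_mask & (1UL << (j - 1))) && pending)
        {
          emit_reg_list (buf, size, "push", pending);
          pending = 0;
          continue;
        }

      pending |= 1UL << j;

      /* Even register whose odd partner is saved: store the pair.  */
      if (j % 2 == 0 && (saved_regs_mask & (1UL << (j + 1))))
        {
          char insn[64];
          snprintf (insn, sizeof insn, "strd %s, %s, [sp, #-8]!",
                    reg_names[j], reg_names[j + 1]);
          emit_text (buf, size, insn);
          pending = 0;
        }
    }

  /* Lone registers left over go out in one final PUSH.  */
  if (pending)
    emit_reg_list (buf, size, "push", pending);
}
```

For example, with r4, r5, r7 and lr saved (mask 0x40b0), the sketch yields `push {r7, lr}; strd r4, r5, [sp, #-8]!`. With r4-r11 and lr saved, it yields `push {lr}` followed by four STRDs, from the r10/r11 pair down to r4/r5.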

The patch has been tested with check-gcc, check-gdb and bootstrap with no
regressions.

Changelog entry for Patch to emit STRD for ARM prologue in A15:

2011-10-11  Sameera Deshpande  <sameera.deshpande@arm.com>

        * config/arm/arm-protos.h (bad_reg_pair_for_arm_ldrd_strd): New
          declaration.
        * config/arm/arm.c (arm_emit_strd_push): New static function.
          (bad_reg_pair_for_arm_ldrd_strd): New helper function.
          (arm_expand_prologue): Update.
          (arm_get_frame_offsets): Update.
        * config/arm/ldmstm.md (arm_strd_base): New pattern.
-- 



[-- Attachment #2: a15_arm_strd_prologue-6Oct.patch --]
[-- Type: text/x-patch, Size: 9520 bytes --]

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 06a67b5..d5287ad 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -162,6 +162,7 @@ extern const char *arm_output_memory_barrier (rtx *);
 extern const char *arm_output_sync_insn (rtx, rtx *);
 extern unsigned int arm_sync_loop_insns (rtx , rtx *);
 extern int arm_attr_length_push_multi(rtx, rtx);
+extern bool bad_reg_pair_for_arm_ldrd_strd (rtx, rtx);
 
 #if defined TREE_CODE
 extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index fd8c31d..08fa0d5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -93,6 +93,7 @@ static bool arm_assemble_integer (rtx, unsigned int, int);
 static void arm_print_operand (FILE *, rtx, int);
 static void arm_print_operand_address (FILE *, rtx);
 static bool arm_print_operand_punct_valid_p (unsigned char code);
+static rtx emit_multi_reg_push (unsigned long);
 static const char *fp_const_from_val (REAL_VALUE_TYPE *);
 static arm_cc get_arm_condition_code (rtx);
 static HOST_WIDE_INT int_log2 (HOST_WIDE_INT);
@@ -15095,6 +15096,116 @@ arm_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
     }
 }
 
+/* STRD in ARM mode needs consecutive registers to be stored.  This function
+   keeps accumulating non-consecutive registers until first consecutive register
+   pair is found.  It then generates multi-reg PUSH for all accumulated
+   registers, and then generates STRD with write-back for consecutive register
+   pair.  This process is repeated until all the registers are stored on stack.
+   multi-reg PUSH takes care of lone registers as well.  */
+static void
+arm_emit_strd_push (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx tmp, tmp1;
+  unsigned long regs_to_be_pushed_mask;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  for (i=0, j = LAST_ARM_REGNUM, regs_to_be_pushed_mask = 0; i < num_regs; j--)
+    /* Var j iterates over all registers to gather all registers in
+       saved_regs_mask.  Var i is used to count number of registers stored on
+       stack.  regs_to_be_pushed_mask accumulates non-consecutive registers
+       that can be pushed using multi-reg PUSH before STRD is generated.  */
+    if (saved_regs_mask & (1 << j))
+      {
+        gcc_assert (j != SP_REGNUM);
+        gcc_assert (j != PC_REGNUM);
+        i++;
+
+        if ((j % 2 == 1)
+            && (saved_regs_mask & (1 << (j - 1)))
+            && regs_to_be_pushed_mask)
+          {
+            /* Current register and previous register form register pair for
+               which STRD can be generated.  Hence, emit PUSH for accumulated
+               registers and reset regs_to_be_pushed_mask.  */
+            insn = emit_multi_reg_push (regs_to_be_pushed_mask);
+            regs_to_be_pushed_mask = 0;
+            RTX_FRAME_RELATED_P (insn) = 1;
+            continue;
+          }
+
+        regs_to_be_pushed_mask |= (1 << j);
+
+        if ((j % 2) == 0 && (saved_regs_mask & (1 << (j + 1))))
+          {
+            /* We have found 2 consecutive registers, for which STRD can be
+               generated.  Generate pattern to emit STRD as accumulated
+               registers have already been pushed.  */
+            par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+            dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (3));
+
+            tmp = gen_rtx_SET (VOIDmode,
+                               stack_pointer_rtx,
+                               plus_constant (stack_pointer_rtx, -8));
+            tmp1 = gen_rtx_SET (VOIDmode,
+                                stack_pointer_rtx,
+                                plus_constant (stack_pointer_rtx, -8));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            RTX_FRAME_RELATED_P (tmp1) = 1;
+            XVECEXP (par, 0, 0) = tmp;
+            XVECEXP (dwarf, 0, 0) = tmp1;
+
+            tmp = gen_rtx_SET (SImode,
+                               gen_frame_mem (SImode, stack_pointer_rtx),
+                               gen_rtx_REG (SImode, j));
+            tmp1 = gen_rtx_SET (SImode,
+                                gen_frame_mem (SImode, stack_pointer_rtx),
+                                gen_rtx_REG (SImode, j));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            RTX_FRAME_RELATED_P (tmp1) = 1;
+            XVECEXP (par, 0, 1) = tmp;
+            XVECEXP (dwarf, 0, 1) = tmp1;
+
+            tmp = gen_rtx_SET (SImode,
+                          gen_frame_mem (SImode,
+                                    plus_constant (stack_pointer_rtx, 4)),
+                          gen_rtx_REG (SImode, j + 1));
+            tmp1 = gen_rtx_SET (SImode,
+                           gen_frame_mem (SImode,
+                                     plus_constant (stack_pointer_rtx, 4)),
+                           gen_rtx_REG (SImode, j + 1));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            RTX_FRAME_RELATED_P (tmp1) = 1;
+            XVECEXP (par, 0, 2) = tmp;
+            XVECEXP (dwarf, 0, 2) = tmp1;
+
+            insn = emit_insn (par);
+            add_reg_note (insn, REG_FRAME_RELATED_EXPR, dwarf);
+            RTX_FRAME_RELATED_P (insn) = 1;
+            regs_to_be_pushed_mask = 0;
+          }
+      }
+
+  /* Check if any accumulated registers are yet to be pushed, and generate
+     multi-reg PUSH for them.  */
+  if (regs_to_be_pushed_mask)
+    {
+      insn = emit_multi_reg_push (regs_to_be_pushed_mask);
+      RTX_FRAME_RELATED_P (insn) = 1;
+    }
+
+  return;
+}
+
 /* Generate and emit a pattern that will be recognized as STRD pattern.  If even
    number of registers are being pushed, multiple STRD patterns are created for
    all register pairs.  If odd number of registers are pushed, first register is
@@ -15529,6 +15640,18 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num_regs, rtx base_reg)
   par = emit_insn (par);
   add_reg_note (par, REG_FRAME_RELATED_EXPR, dwarf);
 }
+
+bool
+bad_reg_pair_for_arm_ldrd_strd (rtx src1, rtx src2)
+{
+  return (GET_CODE (src1) != REG
+          || GET_CODE (src2) != REG
+          || ((REGNO (src1) + 1) != REGNO (src2))
+          || ((REGNO (src1) % 2) != 0)
+          || (REGNO (src2) == PC_REGNUM)
+          || (REGNO (src2) == SP_REGNUM));
+}
+
 bool
 bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
 {
@@ -15958,7 +16081,8 @@ arm_get_frame_offsets (void)
 	     use 32-bit push/pop instructions.  */
  	  if (! any_sibcall_uses_r3 ()
 	      && arm_size_return_regs () <= 12
-	      && (offsets->saved_regs_mask & (1 << 3)) == 0)
+	      && (offsets->saved_regs_mask & (1 << 3)) == 0
+              && (TARGET_THUMB2 || !current_tune->prefer_ldrd_strd))
 	    {
 	      reg = 3;
 	    }
@@ -16427,9 +16551,12 @@ arm_expand_prologue (void)
 	    }
 	}
 
-      if (TARGET_THUMB2 && current_tune->prefer_ldrd_strd && !optimize_size)
+      if (current_tune->prefer_ldrd_strd && !optimize_size)
         {
-          thumb2_emit_strd_push (live_regs_mask);
+          if (TARGET_THUMB2)
+            thumb2_emit_strd_push (live_regs_mask);
+          else
+            arm_emit_strd_push (live_regs_mask);
         }
       else
         {
diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
index e3dcd4f..3c729bb 100644
--- a/gcc/config/arm/ldmstm.md
+++ b/gcc/config/arm/ldmstm.md
@@ -73,6 +73,42 @@
   [(set_attr "type" "store2")
    (set_attr "predicable" "yes")])
 
+(define_insn "*arm_strd_base"
+  [(set (match_operand:SI 0 "arm_hard_register_operand" "+rk")
+        (plus:SI (match_dup 0)
+                 (const_int -8)))
+   (set (mem:SI (match_dup 0))
+        (match_operand:SI 1 "arm_hard_register_operand" "r"))
+   (set (mem:SI (plus:SI (match_dup 0)
+                         (const_int 4)))
+        (match_operand:SI 2 "arm_hard_register_operand" "r"))]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_arm_ldrd_strd (operands[1], operands[2]))
+     && (REGNO (operands[1]) != REGNO (operands[0]))
+     && (REGNO (operands[2]) != REGNO (operands[0])))"
+  "str%(d%)\t%1, %2, [%0, #-8]!"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_peephole2
+  [(parallel
+    [(set (match_operand:SI 0 "arm_hard_register_operand" "")
+        (plus:SI (match_dup 0)
+                 (const_int -8)))
+     (set (mem:SI (match_dup 0))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 0)
+                           (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_arm_ldrd_strd (operands[1], operands[2]))
+     && (REGNO (operands[1]) != REGNO (operands[0]))
+     && (REGNO (operands[2]) != REGNO (operands[0])))"
+  [(set (mem:DI (pre_dec:SI (match_dup 0)))
+        (match_dup 1))]
+  "operands[1] = gen_rtx_REG (DImode, REGNO (operands[1]));"
+)
+
 (define_insn "*ldm4_ia"
   [(match_parallel 0 "load_multiple_operation"
     [(set (match_operand:SI 1 "arm_hard_register_operand" "")

* [RFA/ARM][Patch 05/05]: LDRD generation instead of POP in A15 ARM epilogue.
  2011-10-11  9:22 [RFA/ARM][Patch 00/05]: Introduction - Generate LDRD/STRD in prologue/epilogue instead of PUSH/POP Sameera Deshpande
                   ` (3 preceding siblings ...)
  2011-10-11 10:12 ` [RFA/ARM][Patch 04/05]: STRD generation instead of PUSH in A15 ARM prologue Sameera Deshpande
@ 2011-10-11 10:19 ` Sameera Deshpande
  2011-10-21 13:30   ` Ramana Radhakrishnan
  4 siblings, 1 reply; 18+ messages in thread
From: Sameera Deshpande @ 2011-10-11 10:19 UTC (permalink / raw)
  To: gcc-patches; +Cc: nickc, Richard Earnshaw, paul, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 1062 bytes --]

Hi!

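This patch generates LDRD instead of POP in the epilogue for A15 in ARM
mode.

When optimizing for size (optimize_size), the original epilogue is still
generated for A15.
The work involves defining new functions, predicates and patterns.

In this patch, non-consecutive registers are accumulated until a register
pair that can be loaded with LDRD is found. All the accumulated registers
are then popped with a multi-register POP, followed by an LDRD with
post-increment write-back for the register pair. This is repeated until
all the registers in the register list have been popped. LDRD is never
used for PC: it is restored either by the final POP or, when it is the
only register left, by a single LDR that also returns.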
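The epilogue side mirrors the prologue loop but walks the mask upwards, and PC is never part of an LDRD pair. In the following hypothetical C sketch, `plan_ldrd_pop ()` writes the assembly that `arm_emit_ldrd_pop ()` in this patch is expected to produce, instead of emitting RTL. The helper names, the output format and the assumption that SP is not in the mask are illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SP_REGNUM 13
#define PC_REGNUM 15

static const char *const reg_names[16] = {
  "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
  "r8", "r9", "r10", "r11", "r12", "sp", "lr", "pc"
};

/* Append one instruction to BUF, separating instructions with "; ".  */
static void
emit_text (char *buf, size_t size, const char *text)
{
  size_t len = strlen (buf);
  snprintf (buf + len, size - len, "%s%s", len ? "; " : "", text);
}

/* Append "OP {rA, rB, ...}" for the registers in MASK, lowest first.  */
static void
emit_reg_list (char *buf, size_t size, const char *op, unsigned long mask)
{
  char insn[128];
  int n = snprintf (insn, sizeof insn, "%s {", op);
  int first = 1;

  for (int r = 0; r < 16; r++)
    if (mask & (1UL << r))
      {
        n += snprintf (insn + n, sizeof insn - n, "%s%s",
                       first ? "" : ", ", reg_names[r]);
        first = 0;
      }
  snprintf (insn + n, sizeof insn - n, "}");
  emit_text (buf, size, insn);
}

/* Hypothetical model of arm_emit_ldrd_pop (); assumes SP is not saved.  */
static void
plan_ldrd_pop (unsigned long saved_regs_mask, char *buf, size_t size)
{
  unsigned long pending = 0;   /* Registers waiting for a multi-reg POP.  */

  assert ((saved_regs_mask & (1UL << SP_REGNUM)) == 0);
  buf[0] = '\0';

  for (int j = 0; j <= PC_REGNUM; j++)
    {
      if (!(saved_regs_mask & (1UL << j)))
        continue;

      /* R<j>, R<j+1> form a pair (never with SP or PC): flush first.  */
      if (j % 2 == 0 && (saved_regs_mask & (1UL << (j + 1)))
          && j + 1 != SP_REGNUM && j + 1 != PC_REGNUM && pending)
        {
          emit_reg_list (buf, size, "pop", pending);
          pending = 0;
          continue;
        }

      pending |= 1UL << j;

      /* Odd register whose even partner is saved: load the pair.  */
      if (j % 2 == 1 && (saved_regs_mask & (1UL << (j - 1)))
          && j != SP_REGNUM && j != PC_REGNUM)
        {
          char insn[64];
          snprintf (insn, sizeof insn, "ldrd %s, %s, [sp], #8",
                    reg_names[j - 1], reg_names[j]);
          emit_text (buf, size, insn);
          pending = 0;
        }
    }

  if (pending == (1UL << PC_REGNUM))
    emit_text (buf, size, "ldr pc, [sp], #4");   /* Lone PC: load + return.  */
  else if (pending)
    emit_reg_list (buf, size, "pop", pending);
}
```

With r4-r11 and PC saved (mask 0x8ff0), the sketch yields four LDRDs followed by `ldr pc, [sp], #4`. With r3, r4, r5 and PC saved, it yields `pop {r3}; ldrd r4, r5, [sp], #8; ldr pc, [sp], #4`.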

The patch has been tested with check-gcc, check-gdb and bootstrap with no
regressions.

Changelog entry for Patch to emit LDRD for ARM epilogue in A15:

2011-10-11  Sameera Deshpande  <sameera.deshpande@arm.com>

        * config/arm/arm.c (arm_emit_ldrd_pop): New static function.
          (arm_expand_epilogue): Update.
        * config/arm/ldmstm.md (arm_ldrd_base): New pattern.
          (arm_ldr_with_update): Likewise.
-- 



[-- Attachment #2: a15_arm_ldrd_epilogue-6Oct.patch --]
[-- Type: text/x-patch, Size: 9751 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 08fa0d5..0b9fd93 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -967,7 +967,7 @@ const struct tune_params arm_cortex_a9_tune =
   ARM_PREFETCH_BENEFICIAL(4,32,32),
   false,					/* Prefer constant pool.  */
   arm_default_branch_cost,
-  false                                         /* Prefer LDRD/STRD.  */
+  true                                          /* Prefer LDRD/STRD.  */
 };
 
 const struct tune_params arm_fa726te_tune =
@@ -15664,6 +15664,145 @@ bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
           || (REGNO (src2) == SP_REGNUM));
 }
 
+/* LDRD in ARM mode needs consecutive registers to be loaded.  This function
+   keeps accumulating non-consecutive registers until first consecutive register
+   pair is found.  It then generates multi-reg POP for all accumulated
+   registers, and then generates LDRD with write-back for consecutive register
+   pair.  This process is repeated until all the registers are loaded from
+   stack.  multi-reg POP takes care of lone registers as well.  However, LDRD
+   cannot be generated for PC, as results are unpredictable.  Hence, if PC is
+   in SAVED_REGS_MASK, generate multi-reg POP with RETURN or LDR with RETURN
+   depending upon number of registers in REGS_TO_BE_POPPED_MASK.  */
+static void
+arm_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, tmp1;
+  unsigned long regs_to_be_popped_mask = 0;
+  bool pc_in_list = false;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  for (i = 0, j = 0; i < num_regs; j++)
+    if (saved_regs_mask & (1 << j))
+      {
+        i++;
+        if ((j % 2) == 0
+            && (saved_regs_mask & (1 << (j + 1)))
+            && (j + 1) != SP_REGNUM
+            && (j + 1) != PC_REGNUM
+            && regs_to_be_popped_mask)
+          {
+            /* Current register and next register form register pair for which
+               LDRD can be generated.  Generate POP for accumulated registers
+               and reset regs_to_be_popped_mask.  SP should be handled here as
+               the results are unpredictable if the register being loaded is
+               the same as the index register (in this case, SP).  PC is
+               always the last register being popped.  Hence, we don't have
+               to worry about PC here.  */
+            arm_emit_multi_reg_pop (regs_to_be_popped_mask, pc_in_list);
+            pc_in_list = false;
+            regs_to_be_popped_mask = 0;
+            continue;
+          }
+
+        if (j == PC_REGNUM)
+          {
+            gcc_assert (really_return);
+            pc_in_list = 1;
+          }
+
+        regs_to_be_popped_mask |= (1 << j);
+
+        if ((j % 2) == 1
+            && (saved_regs_mask & (1 << (j - 1)))
+            && j != SP_REGNUM
+            && j != PC_REGNUM)
+          {
+             /* Generate an LDRD for register pair R_<j-1>, R_<j>.  The pattern
+                generated here is
+                [(SET SP, (PLUS SP, 8))
+                 (SET R_<j-1>, (MEM SP))
+                 (SET R_<j>, (MEM (PLUS SP, 4)))].  */
+             par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+             dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (3));
+
+             tmp = gen_rtx_SET (VOIDmode,
+                                stack_pointer_rtx,
+                                plus_constant (stack_pointer_rtx, 8));
+             tmp1 = gen_rtx_SET (VOIDmode,
+                                 stack_pointer_rtx,
+                                 plus_constant (stack_pointer_rtx, 8));
+             RTX_FRAME_RELATED_P (tmp) = 1;
+             RTX_FRAME_RELATED_P (tmp1) = 1;
+             XVECEXP (par, 0, 0) = tmp;
+             XVECEXP (dwarf, 0, 0) = tmp1;
+
+             tmp = gen_rtx_SET (SImode,
+                                gen_rtx_REG (SImode, j - 1),
+                                gen_frame_mem (SImode, stack_pointer_rtx));
+             RTX_FRAME_RELATED_P (tmp) = 1;
+             tmp1 = gen_rtx_SET (SImode,
+                                gen_rtx_REG (SImode, j - 1),
+                                gen_frame_mem (SImode, stack_pointer_rtx));
+             RTX_FRAME_RELATED_P (tmp1) = 1;
+             XVECEXP (par, 0, 1) = tmp;
+             XVECEXP (dwarf, 0, 1) = tmp1;
+
+             tmp = gen_rtx_SET (SImode,
+                                 gen_rtx_REG (SImode, j),
+                                 gen_frame_mem (SImode,
+                                       plus_constant (stack_pointer_rtx, 4)));
+             RTX_FRAME_RELATED_P (tmp) = 1;
+             tmp1 = gen_rtx_SET (SImode,
+                                 gen_rtx_REG (SImode, j),
+                                 gen_frame_mem (SImode,
+                                       plus_constant (stack_pointer_rtx, 4)));
+             RTX_FRAME_RELATED_P (tmp1) = 1;
+             XVECEXP (par, 0, 2) = tmp;
+             XVECEXP (dwarf, 0, 2) = tmp1;
+
+             insn = emit_insn (par);
+             add_reg_note (insn, REG_FRAME_RELATED_EXPR, dwarf);
+             pc_in_list = false;
+             regs_to_be_popped_mask = 0;
+          }
+      }
+
+  if (regs_to_be_popped_mask)
+    {
+      /* single PC pop can happen here.  Take care of that.  */
+      if (pc_in_list && (regs_to_be_popped_mask == (1 << PC_REGNUM)))
+        {
+          /* Only PC is to be popped.  */
+          par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+          XVECEXP (par, 0, 0) = ret_rtx;
+          tmp = gen_rtx_SET (SImode,
+                             gen_rtx_REG (SImode, PC_REGNUM),
+                             gen_frame_mem (SImode,
+                                            gen_rtx_POST_INC (SImode,
+                                                         stack_pointer_rtx)));
+          RTX_FRAME_RELATED_P (tmp) = 1;
+          XVECEXP (par, 0, 1) = tmp;
+          emit_jump_insn (par);
+        }
+      else
+        {
+          arm_emit_multi_reg_pop (regs_to_be_popped_mask, pc_in_list);
+        }
+    }
+
+  return;
+}
+
 /* Generate and emit a pattern that will be recognized as LDRD pattern.  If even
    number of registers are being popped, multiple LDRD patterns are created for
    all register pairs.  If odd number of registers are popped, last register is
@@ -22488,8 +22627,13 @@ arm_expand_epilogue (bool really_return)
                     saved_regs_mask |=   (1 << PC_REGNUM);
                     return_in_pc = true;
                   }
-        
-                arm_emit_multi_reg_pop (saved_regs_mask, return_in_pc);
+
+                if (!current_tune->prefer_ldrd_strd || optimize_size)
+                  arm_emit_multi_reg_pop (saved_regs_mask, return_in_pc);
+                else
+                  /* Generate LDRD pattern instead of POP pattern.  */
+                  arm_emit_ldrd_pop (saved_regs_mask, return_in_pc);
+
                 if (return_in_pc == true)
                   return;
               }
diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
index 3c729bb..7d69b0b 100644
--- a/gcc/config/arm/ldmstm.md
+++ b/gcc/config/arm/ldmstm.md
@@ -109,6 +109,54 @@
   "operands[1] = gen_rtx_REG (DImode, REGNO (operands[1]));"
 )
 
+(define_insn "*arm_ldrd_base"
+  [(set (match_operand:SI 0 "arm_hard_register_operand" "+rk")
+        (plus:SI (match_dup 0)
+                 (const_int 8)))
+   (set (match_operand:SI 1 "arm_hard_register_operand" "=r")
+        (mem:SI (match_dup 0)))
+   (set (match_operand:SI 2 "arm_hard_register_operand" "=r")
+        (mem:SI (plus:SI (match_dup 0)
+                         (const_int 4))))]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_arm_ldrd_strd (operands[1], operands[2]))
+     && (REGNO (operands[1]) != REGNO (operands[0]))
+     && (REGNO (operands[2]) != REGNO (operands[0])))"
+  "ldr%(d%)\t%1, %2, [%0], #8"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_peephole2
+  [(parallel
+    [(set (match_operand:SI 0 "arm_hard_register_operand" "")
+        (plus:SI (match_dup 0)
+                 (const_int 8)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 0)))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 0)
+                           (const_int 4))))])]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_arm_ldrd_strd (operands[1], operands[2]))
+     && (REGNO (operands[1]) != REGNO (operands[0]))
+     && (REGNO (operands[2]) != REGNO (operands[0])))"
+  [(set (match_dup 1)
+        (mem:DI (post_inc:SI (match_dup 0))))]
+  "operands[1] = gen_rtx_REG (DImode, REGNO (operands[1]));"
+)
+
+(define_insn "*arm_ldr_with_update"
+  [(parallel
+    [(set (match_operand:SI 0 "arm_hard_register_operand" "")
+        (plus:SI (match_dup 0)
+                 (const_int 4)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 0)))])]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd)"
+  "ldr%?\t%1, [%0], #4"
+  [(set_attr "type" "load1")
+  (set_attr "predicable" "yes")])
+
 (define_insn "*ldm4_ia"
   [(match_parallel 0 "load_multiple_operation"
     [(set (match_operand:SI 1 "arm_hard_register_operand" "")

* Re: [RFA/ARM][Patch 02/05]: LDRD generation instead of POP in A15 Thumb2 epilogue.
  2011-10-11  9:38 ` [RFA/ARM][Patch 02/05]: LDRD generation instead of POP in A15 Thumb2 epilogue Sameera Deshpande
@ 2011-10-13 18:14   ` Richard Henderson
  2011-11-07  9:54     ` Sameera Deshpande
  0 siblings, 1 reply; 18+ messages in thread
From: Richard Henderson @ 2011-10-13 18:14 UTC (permalink / raw)
  To: Sameera Deshpande
  Cc: gcc-patches, nickc, Richard Earnshaw, paul, Ramana Radhakrishnan

On 10/11/2011 02:21 AM, Sameera Deshpande wrote:
> +            /* When saved-register index (i) is odd, RTXs for both the registers
> +               to be loaded are generated in above given LDRD pattern, and the
> +               pattern can be emitted now.  */
> +            par = emit_insn (par);
> +            add_reg_note (par, REG_FRAME_RELATED_EXPR, dwarf);

I don't believe REG_FRAME_RELATED_EXPR does the right thing for 
anything besides prologues.  You need to emit REG_CFA_RESTORE
for the pop inside an epilogue.


r~

* Re: [RFA/ARM][Patch 01/05]: Create tune for Cortex-A15.
  2011-10-11  9:31 ` [RFA/ARM][Patch 01/05]: Create tune for Cortex-A15 Sameera Deshpande
@ 2011-10-21 12:56   ` Ramana Radhakrishnan
  0 siblings, 0 replies; 18+ messages in thread
From: Ramana Radhakrishnan @ 2011-10-21 12:56 UTC (permalink / raw)
  To: Sameera Deshpande
  Cc: gcc-patches, nickc, Richard Earnshaw, paul, Ramana Radhakrishnan

> 2011-10-11  Sameera Deshpande  <sameera.deshpande@arm.com>
>
>        * config/arm/arm-cores.def (cortex_a15): Update.
>        * config/arm/arm-protos.h (struct tune_params): Add new field...
>          (arm_gen_ldrd_strd): ... this.
>        * config/arm/arm.c (arm_slowmul_tune): Add
>          arm_gen_ldrd_strd field settings.
>          (arm_fastmul_tune): Likewise.
>          (arm_strongarm_tune): Likewise.
>          (arm_xscale_tune): Likewise.
>          (arm_9e_tune): Likewise.
>          (arm_v6t2_tune): Likewise.
>          (arm_cortex_tune): Likewise.
>          (arm_cortex_a5_tune): Likewise.
>          (arm_cortex_a9_tune): Likewise.
>          (arm_fa726te_tune): Likewise.
>          (arm_cortex_a15_tune): New variable.
> --

OK.

Ramana

>
>

* Re: [RFA/ARM][Patch 03/05]: STRD generation instead of PUSH in A15 Thumb2 prologue.
  2011-10-11  9:53 ` [RFA/ARM][Patch 03/05]: STRD generation instead of PUSH in A15 Thumb2 prologue Sameera Deshpande
@ 2011-10-21 13:00   ` Ramana Radhakrishnan
  2011-11-07  9:55     ` Sameera Deshpande
  0 siblings, 1 reply; 18+ messages in thread
From: Ramana Radhakrishnan @ 2011-10-21 13:00 UTC (permalink / raw)
  To: Sameera Deshpande
  Cc: gcc-patches, nickc, Richard Earnshaw, paul, Ramana Radhakrishnan

On 11 October 2011 10:27, Sameera Deshpande <sameera.deshpande@arm.com> wrote:
> Hi!
>
> This patch generates STRD instruction instead of PUSH in thumb2 mode for
> A15.
>
> For optimize_size, original prologue is generated for A15.
> The work involves defining new functions, predicates and patterns.
>


> +/* Generate and emit a pattern that will be recognized as STRD pattern.  If even
> +   number of registers are being pushed, multiple STRD patterns are created for
> +   all register pairs.  If odd number of registers are pushed, first register is

numchar > 80

> +   stored by using STR pattern.  */

s/stored/Stored.
A better comment would be

"Emit a combination of strd and str's for the prologue saves.  "

> +static void
> +thumb2_emit_strd_push (unsigned long saved_regs_mask)
> +{
> +  int num_regs = 0;
> +  int i, j;
> +  rtx par = NULL_RTX;
> +  rtx insn = NULL_RTX;
> +  rtx dwarf = NULL_RTX;
> +  rtx tmp, reg, tmp1;
> +
> +  for (i = 0; i <= LAST_ARM_REGNUM; i++)
> +    if (saved_regs_mask & (1 << i))
> +      num_regs++;
> +
> +  gcc_assert (num_regs && num_regs <= 16);
> +
> +  /* Pre-decrement the stack pointer, based on there being num_regs 4-byte
> +     registers to push.  */
> +  tmp = gen_rtx_SET (VOIDmode,
> +                     stack_pointer_rtx,
> +                     plus_constant (stack_pointer_rtx, -4 * num_regs));
> +  RTX_FRAME_RELATED_P (tmp) = 1;
> +  insn = emit_insn (tmp);
> +
> +  /* Create sequence for DWARF info.  */
> +  dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (num_regs + 1));
> +
> +  /* RTLs cannot be shared, hence create new copy for dwarf.  */
> +  tmp1 = gen_rtx_SET (VOIDmode,
> +                     stack_pointer_rtx,
> +                     plus_constant (stack_pointer_rtx, -4 * num_regs));
> +  RTX_FRAME_RELATED_P (tmp1) = 1;
> +  XVECEXP (dwarf, 0, 0) = tmp1;
> +
> +  for (i = num_regs - 1, j = LAST_ARM_REGNUM; i >= (num_regs % 2); j--)
> +    /* Var j iterates over all the registers to gather all the registers in
> +       saved_regs_mask.  Var i gives index of register R_j in stack frame.
> +       A PARALLEL RTX of register-pair is created here, so that pattern for
> +       STRD can be matched.  If num_regs is odd, 1st register will be pushed
> +       using STR and remaining registers will be pushed with STRD in pairs.
> +       If num_regs is even, all registers are pushed with STRD in pairs.
> +       Hence, skip first element for odd num_regs.  */

Comment before the loop please.

> +    if (saved_regs_mask & (1 << j))
> +      {
> +        gcc_assert (j != SP_REGNUM);
> +        gcc_assert (j != PC_REGNUM);
> +
> +        /* Create RTX for store.  New RTX is created for dwarf as
> +           they are not sharable.  */
> +        reg = gen_rtx_REG (SImode, j);
> +        tmp = gen_rtx_SET (SImode,
> +                           gen_frame_mem
> +                           (SImode,
> +                            plus_constant (stack_pointer_rtx, 4 * i)),
> +                           reg);
> +
> +        tmp1 = gen_rtx_SET (SImode,
> +                           gen_frame_mem
> +                           (SImode,
> +                            plus_constant (stack_pointer_rtx, 4 * i)),
> +                           reg);
> +        RTX_FRAME_RELATED_P (tmp) = 1;
> +        RTX_FRAME_RELATED_P (tmp1) = 1;
> +
> +        if (((i - (num_regs % 2)) % 2) == 1)
> +          /* When (i - (num_regs % 2)) is odd, the RTX to be emitted is yet to
> +             be created.  Hence create it first.  The STRD pattern we are
> +             generating is :
> +             [ (SET (MEM (PLUS (SP) (NUM))) (reg_t1))
> +               (SET (MEM (PLUS (SP) (NUM + 4))) (reg_t2)) ]
> +             where target registers need not be consecutive.  */
> +          par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
> +
> +        /* Register R_j is added in PARALLEL RTX.  If (i - (num_regs % 2)) is
> +           even, the reg_j is added as 0th element and if it is odd, reg_i is
> +           added as 1st element of STRD pattern shown above.  */
> +        XVECEXP (par, 0, ((i - (num_regs % 2)) % 2)) = tmp;
> +        XVECEXP (dwarf, 0, (i + 1)) = tmp1;
> +
> +        if (((i - (num_regs % 2)) % 2) == 0)
> +          /* When (i - (num_regs % 2)) is even, RTXs for both the registers
> +             to be stored are generated in above given STRD pattern, and the
> +             pattern can be emitted now.  */
> +          emit_insn (par);
> +
> +        i--;
> +      }
> +
> +  if ((num_regs % 2) == 1)
> +    {
> +      /* If odd number of registers are pushed, generate STR pattern to store
> +         lone register.  */
> +      for (; (saved_regs_mask & (1 << j)) == 0; j--);
> +
> +      tmp1 = gen_frame_mem (SImode, plus_constant (stack_pointer_rtx, 4 * i));
> +      reg = gen_rtx_REG (SImode, j);
> +      tmp = gen_rtx_SET (SImode, tmp1, reg);
> +      RTX_FRAME_RELATED_P (tmp) = 1;
> +
> +      emit_insn (tmp);
> +
> +      tmp1 = gen_rtx_SET (SImode,
> +                         gen_frame_mem
> +                         (SImode,
> +                          plus_constant (stack_pointer_rtx, 4 * i)),
> +                         reg);
> +      RTX_FRAME_RELATED_P (tmp1) = 1;
> +      XVECEXP (dwarf, 0, (i + 1)) = tmp1;
> +    }
> +
> +  add_reg_note (insn, REG_FRAME_RELATED_EXPR, dwarf);
> +  RTX_FRAME_RELATED_P (insn) = 1;
> +  return;
> +}
> +
>  /* Generate and emit an insn that we will recognize as a push_multi.
>     Unfortunately, since this insn does not reflect very well the actual
>     semantics of the operation, we need to annotate the insn for the benefit
> @@ -16307,8 +16426,16 @@ arm_expand_prologue (void)
>  	      saved_regs += frame;
>  	    }
>  	}
> -      insn = emit_multi_reg_push (live_regs_mask);
> -      RTX_FRAME_RELATED_P (insn) = 1;
> +
> +      if (TARGET_THUMB2 && current_tune->prefer_ldrd_strd && !optimize_size)

Replace optimize_size with optimize_function_for_size_p (cfun).

OK with those changes.

ramana


* Re: [RFA/ARM][Patch 04/05]: STRD generation instead of PUSH in A15 ARM prologue.
  2011-10-11 10:12 ` [RFA/ARM][Patch 04/05]: STRD generation instead of PUSH in A15 ARM prologue Sameera Deshpande
@ 2011-10-21 13:21   ` Ramana Radhakrishnan
  2011-11-08 11:14     ` Sameera Deshpande
  0 siblings, 1 reply; 18+ messages in thread
From: Ramana Radhakrishnan @ 2011-10-21 13:21 UTC (permalink / raw)
  To: Sameera Deshpande
  Cc: gcc-patches, nickc, Richard Earnshaw, paul, Ramana Radhakrishnan

>+/* STRD in ARM mode needs consecutive registers to be stored.  This function
>+   keeps accumulating non-consecutive registers until first consecutive register

This line is longer than 80 characters.

>+   pair is found.  It then generates multi-reg PUSH for all accumulated
>+   registers, and then generates STRD with write-back for consecutive register
>+   pair.  This process is repeated until all the registers are stored on stack.

And again.

>+   multi-reg PUSH takes care of lone registers as well.  */

s/multi-reg/Multi register

>+static void
>+arm_emit_strd_push (unsigned long saved_regs_mask)

How different is this from the thumb2 version you sent out in Patch 03/05 ?

>+{
>+  int num_regs = 0;
>+  int i, j;
>+  rtx par = NULL_RTX;
>+  rtx dwarf = NULL_RTX;
>+  rtx insn = NULL_RTX;
>+  rtx tmp, tmp1;
>+  unsigned long regs_to_be_pushed_mask;
>+
>+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
>+    if (saved_regs_mask & (1 << i))
>+      num_regs++;
>+
>+  gcc_assert (num_regs && num_regs <= 16);
>+
>+  for (i=0, j = LAST_ARM_REGNUM, regs_to_be_pushed_mask = 0; i < num_regs; j--)
>+    /* Var j iterates over all registers to gather all registers in
>+       saved_regs_mask.  Var i is used to count number of registers stored on
>+       stack.  regs_to_be_pushed_mask accumulates non-consecutive registers
>+       that can be pushed using multi-reg PUSH before STRD is generated.  */

Please move this comment above the loop.

<...snip...>

>@@ -15958,7 +16081,8 @@ arm_get_frame_offsets (void)
> 	     use 32-bit push/pop instructions.  */
>  	  if (! any_sibcall_uses_r3 ()
> 	      && arm_size_return_regs () <= 12
>-	      && (offsets->saved_regs_mask & (1 << 3)) == 0)
>+	      && (offsets->saved_regs_mask & (1 << 3)) == 0
>+              && (TARGET_THUMB2 || !current_tune->prefer_ldrd_strd))

Not sure I completely follow this change yet.

>@@ -16427,9 +16551,12 @@ arm_expand_prologue (void)
> 	    }
> 	}
>
>-      if (TARGET_THUMB2 && current_tune->prefer_ldrd_strd && !optimize_size)
>+      if (current_tune->prefer_ldrd_strd && !optimize_size)

s/optimize_size/optimize_function_for_size_p (cfun)/

>         {
>-          thumb2_emit_strd_push (live_regs_mask);
>+          if (TARGET_THUMB2)
>+            thumb2_emit_strd_push (live_regs_mask);
>+          else
>+            arm_emit_strd_push (live_regs_mask);
>         }
>       else
>         {
>diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
>index e3dcd4f..3c729bb 100644
>--- a/gcc/config/arm/ldmstm.md
>+++ b/gcc/config/arm/ldmstm.md
>@@ -73,6 +73,42 @@
>   [(set_attr "type" "store2")
>    (set_attr "predicable" "yes")])
>
>+(define_insn "*arm_strd_base"
>+  [(set (match_operand:SI 0 "arm_hard_register_operand" "+rk")
>+        (plus:SI (match_dup 0)
>+                 (const_int -8)))
>+   (set (mem:SI (match_dup 0))
>+        (match_operand:SI 1 "arm_hard_register_operand" "r"))
>+   (set (mem:SI (plus:SI (match_dup 0)
>+                         (const_int 4)))
>+        (match_operand:SI 2 "arm_hard_register_operand" "r"))]
>+  "(TARGET_ARM && current_tune->prefer_ldrd_strd
>+     && (!bad_reg_pair_for_arm_ldrd_strd (operands[1], operands[2]))
>+     && (REGNO (operands[1]) != REGNO (operands[0]))
>+     && (REGNO (operands[2]) != REGNO (operands[0])))"
>+  "str%(d%)\t%1, %2, [%0, #-8]!"
>+  [(set_attr "type" "store2")
>+   (set_attr "predicable" "yes")])


Hmmm, the question remains whether we want to put these into ldmstm.md,
since it was theoretically auto-generated from ldmstm.ml. If these have
to be kept separate, then I'd like to regenerate ldmstm.md from
ldmstm.ml and differentiate between the bits that can be auto-generated
and the bits that have been added since.

Otherwise OK.

Ramana


* Re: [RFA/ARM][Patch 05/05]: LDRD generation instead of POP in A15 ARM epilogue.
  2011-10-11 10:19 ` [RFA/ARM][Patch 05/05]: LDRD generation instead of POP in A15 ARM epilogue Sameera Deshpande
@ 2011-10-21 13:30   ` Ramana Radhakrishnan
  2011-11-08 11:15     ` Sameera Deshpande
  0 siblings, 1 reply; 18+ messages in thread
From: Ramana Radhakrishnan @ 2011-10-21 13:30 UTC (permalink / raw)
  To: Sameera Deshpande
  Cc: gcc-patches, nickc, Richard Earnshaw, paul, Ramana Radhakrishnan

> 2011-10-11  Sameera Deshpande
> <sameera.deshpande@arm.com>
>
>        * config/arm/arm.c (arm_emit_ldrd_pop): New static function.
>          (arm_expand_epilogue): Update.
>        * config/arm/ldmstm.md (arm_ldrd_base): New pattern.
>          (arm_ldr_with_update): Likewise.

rth's comment about REG_CFA_RESTORE applies here as well. Please
change that. Other than that, this patch looks OK; please watch out
for the stylistic issues raised on the previous patch.

Ramana



* Re: [RFA/ARM][Patch 02/05]: LDRD generation instead of POP in A15 Thumb2 epilogue.
  2011-10-13 18:14   ` Richard Henderson
@ 2011-11-07  9:54     ` Sameera Deshpande
  2011-11-07 16:59       ` Richard Henderson
  2011-12-30 12:42       ` Sameera Deshpande
  0 siblings, 2 replies; 18+ messages in thread
From: Sameera Deshpande @ 2011-11-07  9:54 UTC (permalink / raw)
  To: Richard Henderson
  Cc: gcc-patches, nickc, Richard Earnshaw, paul, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 412 bytes --]


> 
> 
> I don't believe REG_FRAME_RELATED_EXPR does the right thing for 
> anything besides prologues.  You need to emit REG_CFA_RESTORE
> for the pop inside an epilogue.

Richard, here is the updated patch, which uses REG_CFA_RESTORE instead of
REG_FRAME_RELATED_EXPR.


The patch has been tested with check-gcc, check-gdb and bootstrap with no
regressions.

Ok for trunk?

- Thanks and regards,
  Sameera

[-- Attachment #2: a15_thumb2_ldrd_epilogue-4Nov.patch --]
[-- Type: text/x-patch, Size: 10172 bytes --]

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 37113f5..e71ead5 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -203,6 +203,7 @@ extern void thumb_reload_in_hi (rtx *);
 extern void thumb_set_return_address (rtx, rtx);
 extern const char *thumb1_output_casesi (rtx *);
 extern const char *thumb2_output_casesi (rtx *);
+extern bool bad_reg_pair_for_thumb_ldrd_strd (rtx, rtx);
 #endif
 
 /* Defined in pe.c.  */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 429b644..05c9368 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15706,6 +15706,151 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num_regs, rtx base_reg)
   REG_NOTES (par) = dwarf;
 }
 
+bool
+bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
+{
+  return (GET_CODE (src1) != REG
+          || GET_CODE (src2) != REG
+          || (REGNO (src1) == PC_REGNUM)
+          || (REGNO (src1) == SP_REGNUM)
+          || (REGNO (src1) == REGNO (src2))
+          || (REGNO (src2) == PC_REGNUM)
+          || (REGNO (src2) == SP_REGNUM));
+}
+
+/* Generate and emit a pattern that will be recognized as LDRD pattern.  If even
+   number of registers are being popped, multiple LDRD patterns are created for
+   all register pairs.  If odd number of registers are popped, last register is
+   loaded by using LDR pattern.  */
+static bool
+thumb2_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg, tmp1;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+  gcc_assert (really_return || ((saved_regs_mask & (1 << PC_REGNUM)) == 0));
+
+  /* We cannot generate ldrd for PC.  Hence, reduce the count if PC is
+     to be popped.  So, if num_regs is even, now it will become odd,
+     and we can generate pop with PC.  If num_regs is odd, it will be
+     even now, and ldr with return can be generated for PC.  */
+  if (really_return && (saved_regs_mask & (1 << PC_REGNUM)))
+    num_regs--;
+
+  /* Var j iterates over all the registers to gather all the registers in
+     saved_regs_mask.  Var i gives index of saved registers in stack frame.
+     A PARALLEL RTX of register-pair is created here, so that pattern for
+     LDRD can be matched.  As PC is always last register to be popped, and
+     we have already decremented num_regs if PC, we don't have to worry
+     about PC in this loop.  */
+  for (i = 0, j = 0; i < (num_regs - (num_regs % 2)); j++)
+    if (saved_regs_mask & (1 << j))
+      {
+        gcc_assert (j != SP_REGNUM);
+
+        /* Create RTX for memory load.  */
+        reg = gen_rtx_REG (SImode, j);
+        tmp = gen_rtx_SET (SImode,
+                           reg,
+                           gen_frame_mem (SImode,
+                               plus_constant (stack_pointer_rtx, 4 * i)));
+        RTX_FRAME_RELATED_P (tmp) = 1;
+
+        if (i % 2 == 0)
+          {
+            /* When saved-register index (i) is even, the RTX to be emitted is
+               yet to be created.  Hence create it first.  The LDRD pattern we
+               are generating is :
+               [ (SET (reg_t0) (MEM (PLUS (SP) (NUM))))
+                 (SET (reg_t1) (MEM (PLUS (SP) (NUM + 4)))) ]
+               where target registers need not be consecutive.  */
+            par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+            dwarf = NULL_RTX;
+          }
+
+        /* ith register is added in PARALLEL RTX.  If i is even, the reg_i is
+           added as 0th element and if i is odd, reg_i is added as 1st element
+           of LDRD pattern shown above.  */
+        XVECEXP (par, 0, (i % 2)) = tmp;
+        dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+
+        if ((i % 2) == 1)
+          {
+            /* When saved-register index (i) is odd, RTXs for both the registers
+               to be loaded are generated in above given LDRD pattern, and the
+               pattern can be emitted now.  */
+            par = emit_insn (par);
+            REG_NOTES (par) = dwarf;
+          }
+
+        i++;
+      }
+
+  /* If the number of registers pushed is odd AND really_return is false OR
+     number of registers are even AND really_return is true, last register is
+     popped using LDR.  It can be PC as well.  Hence, adjust the stack first and
+     then LDR with post increment.  */
+
+  /* Increment the stack pointer, based on there being
+     num_regs 4-byte registers to restore.  */
+  tmp = gen_rtx_SET (VOIDmode,
+                     stack_pointer_rtx,
+                     plus_constant (stack_pointer_rtx, 4 * i));
+  RTX_FRAME_RELATED_P (tmp) = 1;
+  emit_insn (tmp);
+
+  dwarf = NULL_RTX;
+
+  if (((num_regs % 2) == 1 && !really_return)
+      || ((num_regs % 2) == 0 && really_return))
+    {
+      /* Gen LDR with post increment here.  */
+      for (; (saved_regs_mask & (1 << j)) == 0; j++);
+
+      tmp1 = gen_rtx_MEM (SImode,
+                          gen_rtx_POST_INC (SImode,
+                                            stack_pointer_rtx));
+      set_mem_alias_set (tmp1, get_frame_alias_set ());
+
+      reg = gen_rtx_REG (SImode, j);
+      tmp = gen_rtx_SET (SImode, reg, tmp1);
+      RTX_FRAME_RELATED_P (tmp) = 1;
+      dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+
+      if (really_return)
+        {
+          /* If really_return, j must be PC_REGNUM.  */
+          gcc_assert (j == PC_REGNUM);
+          par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+          XVECEXP (par, 0, 0) = ret_rtx;
+          XVECEXP (par, 0, 1) = tmp;
+          par = emit_jump_insn (par);
+        }
+      else
+        {
+          par = emit_insn (tmp);
+        }
+      REG_NOTES (par) = dwarf;
+    }
+  else if ((num_regs % 2) == 1 && really_return)
+    {
+      /* There are 2 registers to be popped.  So, generate the pattern
+         pop_multiple_with_stack_update_and_return to pop in PC.  */
+      arm_emit_multi_reg_pop (saved_regs_mask & (~((1 << j) - 1)),
+                              really_return);
+    }
+
+  return really_return;
+}
+
 /* Calculate the size of the return value that is passed in registers.  */
 static unsigned
 arm_size_return_regs (void)
@@ -22557,7 +22702,14 @@ thumb2_expand_epilogue (bool is_sibling)
               really_return = true;
             }
 
-          arm_emit_multi_reg_pop (saved_regs_mask, really_return);
+          if (!current_tune->prefer_ldrd_strd
+              || optimize_function_for_size_p (cfun))
+            arm_emit_multi_reg_pop (saved_regs_mask, really_return);
+          else
+            /* Generate LDRD pattern instead of POP pattern.  */
+            really_return = thumb2_emit_ldrd_pop (saved_regs_mask,
+                                                  really_return);
+
           if (really_return == true)
             return;
         }
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index d8ce982..3c55699 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -207,6 +207,12 @@
   (and (match_code "const_int")
        (match_test "TARGET_THUMB2 && ival >= 0 && ival <= 255")))
 
+(define_constraint "Pz"
+  "@internal In Thumb-2 state a multiple of 4 in the range -1020 to 1020"
+  (and (match_code "const_int")
+       (match_test "TARGET_THUMB2 && ival >= -1020 && ival <= 1020
+                    && ival % 4 == 0")))
+
 (define_constraint "G"
  "In ARM/Thumb-2 state a valid FPA immediate constant."
  (and (match_code "const_double")
diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
index 5db4a32..21d2815 100644
--- a/gcc/config/arm/ldmstm.md
+++ b/gcc/config/arm/ldmstm.md
@@ -21,6 +21,32 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+(define_insn "*thumb2_ldrd_base"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (mem:SI (match_operand:SI 1 "s_register_operand" "rk")))
+   (set (match_operand:SI 2 "register_operand" "=r")
+        (mem:SI (plus:SI (match_dup 1)
+                         (const_int 4))))]
+  "(TARGET_THUMB2 && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_thumb_ldrd_strd (operands[0], operands[2])))"
+  "ldrd%?\t%0, %2, [%1]"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_ldrd"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (mem:SI (plus:SI (match_operand:SI 1 "s_register_operand" "rk")
+                         (match_operand:SI 2 "ldrd_immediate_operand" "Pz"))))
+   (set (match_operand:SI 3 "register_operand" "=r")
+        (mem:SI (plus:SI (match_dup 1)
+                         (match_operand:SI 4 "const_int_operand" ""))))]
+  "(TARGET_THUMB2 && current_tune->prefer_ldrd_strd
+     && ((INTVAL (operands[2]) + 4) == INTVAL (operands[4]))
+     && (!bad_reg_pair_for_thumb_ldrd_strd (operands[0], operands[3])))"
+  "ldrd%?\t%0, %3, [%1, %2]"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
 (define_insn "*ldm4_ia"
   [(match_parallel 0 "load_multiple_operation"
     [(set (match_operand:SI 1 "arm_hard_register_operand" "")
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 7e2203d..60ee008 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -209,6 +209,10 @@
 	    (match_test "(GET_CODE (op) != CONST_INT
 			  || (INTVAL (op) < 4096 && INTVAL (op) > -4096))"))))
 
+(define_predicate "ldrd_immediate_operand"
+  (and (match_operand 0 "const_int_operand")
+  (match_test "(INTVAL (op) < 1020 && INTVAL (op) > -1020)")))
+
 ;; True for operators that can be combined with a shift in ARM state.
 (define_special_predicate "shiftable_operator"
   (and (match_code "plus,minus,ior,xor,and")


* Re: [RFA/ARM][Patch 03/05]: STRD generation instead of PUSH in A15 Thumb2 prologue.
  2011-10-21 13:00   ` Ramana Radhakrishnan
@ 2011-11-07  9:55     ` Sameera Deshpande
  0 siblings, 0 replies; 18+ messages in thread
From: Sameera Deshpande @ 2011-11-07  9:55 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: gcc-patches, nickc, Richard Earnshaw, paul, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 177 bytes --]

Hi Ramana,

Please find attached the reworked patch. It has been tested with
check-gcc, check-gdb and bootstrap with no regressions.

Ok?

- Thanks and regards,
  Sameera D.

[-- Attachment #2: a15_thumb2_strd_prologue-4Nov.patch --]
[-- Type: text/x-patch, Size: 7457 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 05c9368..334a25f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15438,6 +15438,125 @@ arm_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
     }
 }
 
+/* Generate and emit a pattern that will be recognized as STRD pattern.  If even
+   number of registers are being pushed, multiple STRD patterns are created for
+   all register pairs.  If odd number of registers are pushed, emit a
+   combination of STRDs and STR for the prologue saves.  */
+static void
+thumb2_emit_strd_push (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg, tmp1;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  /* Pre-decrement the stack pointer, based on there being num_regs 4-byte
+     registers to push.  */
+  tmp = gen_rtx_SET (VOIDmode,
+                     stack_pointer_rtx,
+                     plus_constant (stack_pointer_rtx, -4 * num_regs));
+  RTX_FRAME_RELATED_P (tmp) = 1;
+  insn = emit_insn (tmp);
+
+  /* Create sequence for DWARF info.  */
+  dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (num_regs + 1));
+
+  /* RTLs cannot be shared, hence create new copy for dwarf.  */
+  tmp1 = gen_rtx_SET (VOIDmode,
+                     stack_pointer_rtx,
+                     plus_constant (stack_pointer_rtx, -4 * num_regs));
+  RTX_FRAME_RELATED_P (tmp1) = 1;
+  XVECEXP (dwarf, 0, 0) = tmp1;
+
+  /* Var j iterates over all the registers to gather all the registers in
+     saved_regs_mask.  Var i gives index of register R_j in stack frame.
+     A PARALLEL RTX of register-pair is created here, so that pattern for
+     STRD can be matched.  If num_regs is odd, 1st register will be pushed
+     using STR and remaining registers will be pushed with STRD in pairs.
+     If num_regs is even, all registers are pushed with STRD in pairs.
+     Hence, skip first element for odd num_regs.  */
+  for (i = num_regs - 1, j = LAST_ARM_REGNUM; i >= (num_regs % 2); j--)
+    if (saved_regs_mask & (1 << j))
+      {
+        gcc_assert (j != SP_REGNUM);
+        gcc_assert (j != PC_REGNUM);
+
+        /* Create RTX for store.  New RTX is created for dwarf as
+           they are not sharable.  */
+        reg = gen_rtx_REG (SImode, j);
+        tmp = gen_rtx_SET (SImode,
+                           gen_frame_mem
+                           (SImode,
+                            plus_constant (stack_pointer_rtx, 4 * i)),
+                           reg);
+
+        tmp1 = gen_rtx_SET (SImode,
+                           gen_frame_mem
+                           (SImode,
+                            plus_constant (stack_pointer_rtx, 4 * i)),
+                           reg);
+        RTX_FRAME_RELATED_P (tmp) = 1;
+        RTX_FRAME_RELATED_P (tmp1) = 1;
+
+        if (((i - (num_regs % 2)) % 2) == 1)
+          /* When (i - (num_regs % 2)) is odd, the RTX to be emitted is yet to
+             be created.  Hence create it first.  The STRD pattern we are
+             generating is :
+             [ (SET (MEM (PLUS (SP) (NUM))) (reg_t1))
+               (SET (MEM (PLUS (SP) (NUM + 4))) (reg_t2)) ]
+             where target registers need not be consecutive.  */
+          par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+
+        /* Register R_j is added in PARALLEL RTX.  If (i - (num_regs % 2)) is
+           even, R_j is added as 0th element and if it is odd, R_j is
+           added as 1st element of STRD pattern shown above.  */
+        XVECEXP (par, 0, ((i - (num_regs % 2)) % 2)) = tmp;
+        XVECEXP (dwarf, 0, (i + 1)) = tmp1;
+
+        if (((i - (num_regs % 2)) % 2) == 0)
+          /* When (i - (num_regs % 2)) is even, RTXs for both the registers
+             to be stored are generated in above given STRD pattern, and the
+             pattern can be emitted now.  */
+          emit_insn (par);
+
+        i--;
+      }
+
+  if ((num_regs % 2) == 1)
+    {
+      /* If odd number of registers are pushed, generate STR pattern to store
+         lone register.  */
+      for (; (saved_regs_mask & (1 << j)) == 0; j--);
+
+      tmp1 = gen_frame_mem (SImode, plus_constant (stack_pointer_rtx, 4 * i));
+      reg = gen_rtx_REG (SImode, j);
+      tmp = gen_rtx_SET (SImode, tmp1, reg);
+      RTX_FRAME_RELATED_P (tmp) = 1;
+
+      emit_insn (tmp);
+
+      tmp1 = gen_rtx_SET (SImode,
+                         gen_frame_mem
+                         (SImode,
+                          plus_constant (stack_pointer_rtx, 4 * i)),
+                         reg);
+      RTX_FRAME_RELATED_P (tmp1) = 1;
+      XVECEXP (dwarf, 0, (i + 1)) = tmp1;
+    }
+
+  add_reg_note (insn, REG_FRAME_RELATED_EXPR, dwarf);
+  RTX_FRAME_RELATED_P (insn) = 1;
+  return;
+}
+
 /* Generate and emit an insn that we will recognize as a push_multi.
    Unfortunately, since this insn does not reflect very well the actual
    semantics of the operation, we need to annotate the insn for the benefit
@@ -16598,8 +16717,18 @@ arm_expand_prologue (void)
 	      saved_regs += frame;
 	    }
 	}
-      insn = emit_multi_reg_push (live_regs_mask);
-      RTX_FRAME_RELATED_P (insn) = 1;
+
+      if (TARGET_THUMB2
+          && current_tune->prefer_ldrd_strd
+          && !optimize_function_for_size_p (cfun))
+        {
+          thumb2_emit_strd_push (live_regs_mask);
+        }
+      else
+        {
+          insn = emit_multi_reg_push (live_regs_mask);
+          RTX_FRAME_RELATED_P (insn) = 1;
+        }
     }
 
   if (! IS_VOLATILE (func_type))
diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
index 21d2815..e3dcd4f 100644
--- a/gcc/config/arm/ldmstm.md
+++ b/gcc/config/arm/ldmstm.md
@@ -47,6 +47,32 @@
   [(set_attr "type" "load2")
    (set_attr "predicable" "yes")])
 
+(define_insn "*thumb2_strd_base"
+  [(set (mem:SI (match_operand:SI 0 "s_register_operand" "rk"))
+        (match_operand:SI 1 "register_operand" "r"))
+   (set (mem:SI (plus:SI (match_dup 0)
+                         (const_int 4)))
+        (match_operand:SI 2 "register_operand" "r"))]
+  "(TARGET_THUMB2 && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_thumb_ldrd_strd (operands[1], operands[2])))"
+  "strd%?\t%1, %2, [%0]"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_strd"
+  [(set (mem:SI (plus:SI (match_operand:SI 0 "s_register_operand" "rk")
+                         (match_operand:SI 1 "ldrd_immediate_operand" "Pz")))
+        (match_operand:SI 2 "register_operand" "r"))
+   (set (mem:SI (plus:SI (match_dup 0)
+                         (match_operand:SI 3 "const_int_operand" "")))
+        (match_operand:SI 4 "register_operand" "r"))]
+  "(TARGET_THUMB2 && current_tune->prefer_ldrd_strd
+     && ((INTVAL (operands[1]) + 4) == INTVAL (operands[3]))
+     && (!bad_reg_pair_for_thumb_ldrd_strd (operands[2], operands[4])))"
+  "strd%?\t%2, %4, [%0, %1]"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
 (define_insn "*ldm4_ia"
   [(match_parallel 0 "load_multiple_operation"
     [(set (match_operand:SI 1 "arm_hard_register_operand" "")


* Re: [RFA/ARM][Patch 02/05]: LDRD generation instead of POP in A15 Thumb2 epilogue.
  2011-11-07  9:54     ` Sameera Deshpande
@ 2011-11-07 16:59       ` Richard Henderson
  2011-12-30 12:42       ` Sameera Deshpande
  1 sibling, 0 replies; 18+ messages in thread
From: Richard Henderson @ 2011-11-07 16:59 UTC (permalink / raw)
  To: Sameera Deshpande
  Cc: gcc-patches, nickc, Richard Earnshaw, paul, Ramana Radhakrishnan

On 11/07/2011 01:48 AM, Sameera Deshpande wrote:
> 
>>
>>
>> I don't believe REG_FRAME_RELATED_EXPR does the right thing for 
>> anything besides prologues.  You need to emit REG_CFA_RESTORE
>> for the pop inside an epilogue.
> 
> Richard, here is updated patch that uses REG_CFA_RESTORE instead of
> REG_FRAME_RELATED_EXPR. 
> 
> 
> The patch is tested with check-gcc, check-gdb and bootstrap with no
> regression.
> 
> Ok for trunk?

Ok by me re unwind info.


r~


* Re: [RFA/ARM][Patch 04/05]: STRD generation instead of PUSH in A15 ARM prologue.
  2011-10-21 13:21   ` Ramana Radhakrishnan
@ 2011-11-08 11:14     ` Sameera Deshpande
  0 siblings, 0 replies; 18+ messages in thread
From: Sameera Deshpande @ 2011-11-08 11:14 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: gcc-patches, nickc, Richard Earnshaw, paul, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 2881 bytes --]

On Fri, 2011-10-21 at 13:45 +0100, Ramana Radhakrishnan wrote: 
> >+arm_emit_strd_push (unsigned long saved_regs_mask)
> 
> How different is this from the thumb2 version you sent out in Patch 03/05 ?
> 
Thumb-2 STRD can handle non-consecutive registers; ARM STRD cannot.
Because of this, in ARM mode we accumulate the registers that do not
form a consecutive pair and emit an STM instruction for them. For
consecutive register pairs, STRD is generated.

> >@@ -15958,7 +16081,8 @@ arm_get_frame_offsets (void)
> > 	     use 32-bit push/pop instructions.  */
> >  	  if (! any_sibcall_uses_r3 ()
> > 	      && arm_size_return_regs () <= 12
> >-	      && (offsets->saved_regs_mask & (1 << 3)) == 0)
> >+	      && (offsets->saved_regs_mask & (1 << 3)) == 0
> >+              && (TARGET_THUMB2 || !current_tune->prefer_ldrd_strd))
> 
> Not sure I completely follow this change yet.
> 
If the stack is not aligned, we need to adjust the stack in the prologue.
Here, instead of adjusting the stack, we PUSH register R3 onto the
stack, so that no additional ADD instruction is needed for stack
adjustment. This works fine when we generate multi-reg load/store
instructions.

However, when we generate STRD in ARM mode, registers that do not form a
consecutive pair are stored using STR/STM instructions. As the pair
register of R3 (R2) is never pushed on the stack, we always end up
generating a separate STR instruction to PUSH R3 onto the stack. This is
more expensive than doing ADD SP, SP, #4 for stack adjustment.

For example, if we are PUSHing {R4, R5, R6}, the stack would not be
aligned, so we PUSH {R3, R4, R5, R6} instead.
The instructions generated are:
STR R6, [sp, #4]
STRD R4, R5, [sp, #12]
STR R3, [sp, #16]

However, if a free callee-saved register (here R7) is PUSHed instead of
R3, we push {R4, R5, R6, R7} and generate:
STRD R6, R7, [sp, #8]
STRD R4, R5, [sp, #16]

If no callee-saved register is available, we generate an ADD
instruction, which is still better than generating an STR.
> 
> Hmmm the question remains if we want to put these into ldmstm.md since
> it was theoretically auto-generated from ldmstm.ml. If this has to be
> marked to be separate then I'd like to regenerate ldmstm.md from
> ldmstm.ml and differentiate between the bits that can be auto-generated
> and the bits that have been added since.
> 
The current patterns are quite different from the patterns generated by
arm-ldmstm.ml. I will submit an updated arm-ldmstm.ml that generates
the ldrd/strd patterns as a separate patch. Is that fine?

The patch has been tested with check-gcc, check-gdb and bootstrap.

I see a regression in gcc:
FAIL: gcc.c-torture/execute/vector-compare-1.c compilation,  -O3 -fomit-frame-pointer -funroll-loops
with error message
/tmp/ccC13odV.s: Assembler messages:
/tmp/ccC13odV.s:544: Error: co-processor offset out of range

This seems to be a latent bug that this patch has uncovered; I am
looking into it.

- Thanks and regards,
  Sameera D.

[-- Attachment #2: a15_arm_strd_prologue-4Nov.patch --]
[-- Type: text/x-patch, Size: 9517 bytes --]

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index e71ead5..ccf05c7 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -163,6 +163,7 @@ extern const char *arm_output_memory_barrier (rtx *);
 extern const char *arm_output_sync_insn (rtx, rtx *);
 extern unsigned int arm_sync_loop_insns (rtx , rtx *);
 extern int arm_attr_length_push_multi(rtx, rtx);
+extern bool bad_reg_pair_for_arm_ldrd_strd (rtx, rtx);
 
 #if defined TREE_CODE
 extern void arm_init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 334a25f..deee78b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -93,6 +93,7 @@ static bool arm_assemble_integer (rtx, unsigned int, int);
 static void arm_print_operand (FILE *, rtx, int);
 static void arm_print_operand_address (FILE *, rtx);
 static bool arm_print_operand_punct_valid_p (unsigned char code);
+static rtx emit_multi_reg_push (unsigned long);
 static const char *fp_const_from_val (REAL_VALUE_TYPE *);
 static arm_cc get_arm_condition_code (rtx);
 static HOST_WIDE_INT int_log2 (HOST_WIDE_INT);
@@ -15438,6 +15439,117 @@ arm_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
     }
 }
 
+/* STRD in ARM mode needs consecutive registers to be stored.  This function
+   keeps accumulating non-consecutive registers until first consecutive register
+   pair is found.  It then generates multi register PUSH for all accumulated
+   registers, and then generates STRD with write-back for consecutive register
+   pair.  This process is repeated until all the registers are stored on stack.
+   multi register PUSH takes care of lone registers as well.  */
+static void
+arm_emit_strd_push (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx tmp, tmp1;
+  unsigned long regs_to_be_pushed_mask;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  /* Var j iterates over all registers to gather all registers in
+     saved_regs_mask.  Var i is used to count number of registers stored on
+     stack.  regs_to_be_pushed_mask accumulates non-consecutive registers
+     that can be pushed using multi register PUSH before STRD is
+     generated.  */
+  for (i = 0, j = LAST_ARM_REGNUM, regs_to_be_pushed_mask = 0;
+       i < num_regs; j--)
+    if (saved_regs_mask & (1 << j))
+      {
+        gcc_assert (j != SP_REGNUM);
+        gcc_assert (j != PC_REGNUM);
+        i++;
+
+        if ((j % 2 == 1)
+            && (saved_regs_mask & (1 << (j - 1)))
+            && regs_to_be_pushed_mask)
+          {
+            /* Current register and previous register form register pair for
+               which STRD can be generated.  Hence, emit PUSH for accumulated
+               registers and reset regs_to_be_pushed_mask.  */
+            insn = emit_multi_reg_push (regs_to_be_pushed_mask);
+            regs_to_be_pushed_mask = 0;
+            RTX_FRAME_RELATED_P (insn) = 1;
+            continue;
+          }
+
+        regs_to_be_pushed_mask |= (1 << j);
+
+        if ((j % 2) == 0 && (saved_regs_mask & (1 << (j + 1))))
+          {
+            /* We have found 2 consecutive registers, for which STRD can be
+               generated.  Generate pattern to emit STRD as accumulated
+               registers have already been pushed.  */
+            par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+            dwarf = gen_rtx_SEQUENCE (VOIDmode, rtvec_alloc (3));
+
+            tmp = gen_rtx_SET (VOIDmode,
+                               stack_pointer_rtx,
+                               plus_constant (stack_pointer_rtx, -8));
+            tmp1 = gen_rtx_SET (VOIDmode,
+                                stack_pointer_rtx,
+                                plus_constant (stack_pointer_rtx, -8));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            RTX_FRAME_RELATED_P (tmp1) = 1;
+            XVECEXP (par, 0, 0) = tmp;
+            XVECEXP (dwarf, 0, 0) = tmp1;
+
+            tmp = gen_rtx_SET (SImode,
+                               gen_frame_mem (SImode, stack_pointer_rtx),
+                               gen_rtx_REG (SImode, j));
+            tmp1 = gen_rtx_SET (SImode,
+                                gen_frame_mem (SImode, stack_pointer_rtx),
+                                gen_rtx_REG (SImode, j));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            RTX_FRAME_RELATED_P (tmp1) = 1;
+            XVECEXP (par, 0, 1) = tmp;
+            XVECEXP (dwarf, 0, 1) = tmp1;
+
+            tmp = gen_rtx_SET (SImode,
+                          gen_frame_mem (SImode,
+                                    plus_constant (stack_pointer_rtx, 4)),
+                          gen_rtx_REG (SImode, j + 1));
+            tmp1 = gen_rtx_SET (SImode,
+                           gen_frame_mem (SImode,
+                                     plus_constant (stack_pointer_rtx, 4)),
+                           gen_rtx_REG (SImode, j + 1));
+            RTX_FRAME_RELATED_P (tmp) = 1;
+            RTX_FRAME_RELATED_P (tmp1) = 1;
+            XVECEXP (par, 0, 2) = tmp;
+            XVECEXP (dwarf, 0, 2) = tmp1;
+
+            insn = emit_insn (par);
+            add_reg_note (insn, REG_FRAME_RELATED_EXPR, dwarf);
+            RTX_FRAME_RELATED_P (insn) = 1;
+            regs_to_be_pushed_mask = 0;
+          }
+      }
+
+  /* Check if any accumulated registers are yet to be pushed, and generate
+     multi register PUSH for them.  */
+  if (regs_to_be_pushed_mask)
+    {
+      insn = emit_multi_reg_push (regs_to_be_pushed_mask);
+      RTX_FRAME_RELATED_P (insn) = 1;
+    }
+
+  return;
+}
+
 /* Generate and emit a pattern that will be recognized as STRD pattern.  If even
    number of registers are being pushed, multiple STRD patterns are created for
    all register pairs.  If odd number of registers are pushed, emit a
@@ -15826,6 +15938,17 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num_regs, rtx base_reg)
 }
 
 bool
+bad_reg_pair_for_arm_ldrd_strd (rtx src1, rtx src2)
+{
+  return (GET_CODE (src1) != REG
+          || GET_CODE (src2) != REG
+          || ((REGNO (src1) + 1) != REGNO (src2))
+          || ((REGNO (src1) % 2) != 0)
+          || (REGNO (src2) == PC_REGNUM)
+          || (REGNO (src2) == SP_REGNUM));
+}
+
+bool
 bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
 {
   return (GET_CODE (src1) != REG
@@ -16249,7 +16372,8 @@ arm_get_frame_offsets (void)
 	     use 32-bit push/pop instructions.  */
  	  if (! any_sibcall_uses_r3 ()
 	      && arm_size_return_regs () <= 12
-	      && (offsets->saved_regs_mask & (1 << 3)) == 0)
+	      && (offsets->saved_regs_mask & (1 << 3)) == 0
+              && (TARGET_THUMB2 || !current_tune->prefer_ldrd_strd))
 	    {
 	      reg = 3;
 	    }
@@ -16718,11 +16842,13 @@ arm_expand_prologue (void)
 	    }
 	}
 
-      if (TARGET_THUMB2
-          && current_tune->prefer_ldrd_strd
+      if (current_tune->prefer_ldrd_strd
           && !optimize_function_for_size_p (cfun))
         {
-          thumb2_emit_strd_push (live_regs_mask);
+          if (TARGET_THUMB2)
+            thumb2_emit_strd_push (live_regs_mask);
+          else
+            arm_emit_strd_push (live_regs_mask);
         }
       else
         {
diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
index e3dcd4f..ffa675d 100644
--- a/gcc/config/arm/ldmstm.md
+++ b/gcc/config/arm/ldmstm.md
@@ -73,6 +73,42 @@
   [(set_attr "type" "store2")
    (set_attr "predicable" "yes")])
 
+(define_insn "*arm_strd_base_update"
+  [(set (match_operand:SI 0 "arm_hard_register_operand" "+&rk")
+        (plus:SI (match_dup 0)
+                 (const_int -8)))
+   (set (mem:SI (match_dup 0))
+        (match_operand:SI 1 "arm_hard_register_operand" "r"))
+   (set (mem:SI (plus:SI (match_dup 0)
+                         (const_int 4)))
+        (match_operand:SI 2 "arm_hard_register_operand" "r"))]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_arm_ldrd_strd (operands[1], operands[2]))
+     && (REGNO (operands[1]) != REGNO (operands[0]))
+     && (REGNO (operands[2]) != REGNO (operands[0])))"
+  "str%(d%)\t%1, %2, [%0, #-8]!"
+  [(set_attr "type" "store2")
+   (set_attr "predicable" "yes")])
+
+(define_peephole2
+  [(parallel
+    [(set (match_operand:SI 0 "arm_hard_register_operand" "")
+        (plus:SI (match_dup 0)
+                 (const_int -8)))
+     (set (mem:SI (match_dup 0))
+          (match_operand:SI 1 "arm_hard_register_operand" ""))
+     (set (mem:SI (plus:SI (match_dup 0)
+                           (const_int 4)))
+          (match_operand:SI 2 "arm_hard_register_operand" ""))])]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_arm_ldrd_strd (operands[1], operands[2]))
+     && (REGNO (operands[1]) != REGNO (operands[0]))
+     && (REGNO (operands[2]) != REGNO (operands[0])))"
+  [(set (mem:DI (pre_dec:SI (match_dup 0)))
+        (match_dup 1))]
+  "operands[1] = gen_rtx_REG (DImode, REGNO (operands[1]));"
+)
+
 (define_insn "*ldm4_ia"
   [(match_parallel 0 "load_multiple_operation"
     [(set (match_operand:SI 1 "arm_hard_register_operand" "")
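To make the register walk in `arm_emit_strd_push` easier to follow, here is a minimal standalone C sketch of it. It is not GCC code: it assumes ARM register numbering (r13 = SP, r14 = LR, r15 = PC), records each planned instruction in a hypothetical `struct strd_op` instead of emitting RTL, and replays the plan on a simulated downward-growing stack so the result can be compared with a single multi-register PUSH. The names `plan_strd_push`, `replay` and `same_layout_as_push` are illustrative only.

```c
#include <assert.h>
#include <string.h>

#define LAST_ARM_REGNUM 15

/* Hypothetical plan entry: a multi-register PUSH of PUSH_MASK when
   STRD_LO == -1, otherwise "STRD r<lo>, r<lo+1>, [sp, #-8]!".  */
struct strd_op
{
  unsigned long push_mask;
  int strd_lo;
};

/* Mirror of the arm_emit_strd_push walk: scan from the highest register
   down, accumulate registers that cannot be paired, and flush them with a
   PUSH just before each even/odd pair is stored with STRD.  SAVED must not
   contain SP or PC, as asserted in the patch.  Returns the number of ops.  */
static int
plan_strd_push (unsigned long saved, struct strd_op *ops)
{
  unsigned long pending = 0;
  int j, n = 0;

  for (j = LAST_ARM_REGNUM; j >= 0; j--)
    {
      if (!(saved & (1UL << j)))
        continue;
      if (j % 2 == 1 && (saved & (1UL << (j - 1))) && pending)
        {
          /* r<j-1>:r<j> is a pair: push what is pending first.  r<j> is
             stored by the STRD planned on the next iteration.  */
          ops[n].push_mask = pending;
          ops[n++].strd_lo = -1;
          pending = 0;
          continue;
        }
      pending |= 1UL << j;
      if (j % 2 == 0 && (saved & (1UL << (j + 1))))
        {
          ops[n].push_mask = 0;
          ops[n++].strd_lo = j;
          pending = 0;
        }
    }
  if (pending)
    {
      ops[n].push_mask = pending;
      ops[n++].strd_lo = -1;
    }
  return n;
}

/* Replay OPS on a stack of words growing down from index SP; register r
   is represented by the value r.  Returns the final stack index.  */
static int
replay (const struct strd_op *ops, int n, int *stack, int sp)
{
  int k, r;

  for (k = 0; k < n; k++)
    if (ops[k].strd_lo >= 0)
      {
        /* STRD with pre-decrement: r<lo> at [sp-8], r<lo+1> at [sp-4].  */
        sp -= 2;
        stack[sp] = ops[k].strd_lo;
        stack[sp + 1] = ops[k].strd_lo + 1;
      }
    else
      /* PUSH stores the highest-numbered register at the highest address.  */
      for (r = LAST_ARM_REGNUM; r >= 0; r--)
        if (ops[k].push_mask & (1UL << r))
          stack[--sp] = r;
  return sp;
}

/* Return nonzero if the PUSH/STRD plan for SAVED leaves the same stack
   contents as a single "PUSH {SAVED}".  */
static int
same_layout_as_push (unsigned long saved)
{
  struct strd_op plan[16], single;
  int a[16], b[16];
  int n = plan_strd_push (saved, plan);
  int sp_a, sp_b;

  single.push_mask = saved;
  single.strd_lo = -1;
  sp_a = replay (plan, n, a, 16);
  sp_b = replay (&single, 1, b, 16);
  return sp_a == sp_b
         && memcmp (a + sp_a, b + sp_b, (size_t) (16 - sp_a) * sizeof (int)) == 0;
}
```

For a mask of r3, r4, r5 and lr, the sketch plans `PUSH {lr}`, `STRD r4, r5, [sp, #-8]!`, `PUSH {r3}`, and `same_layout_as_push` confirms that this leaves the saved words in the same order as `PUSH {r3, r4, r5, lr}`. The replay works on whole words, so it only checks the order of the saved registers.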


* Re: [RFA/ARM][Patch 05/05]: LDRD generation instead of POP in A15 ARM epilogue.
  2011-10-21 13:30   ` Ramana Radhakrishnan
@ 2011-11-08 11:15     ` Sameera Deshpande
  2011-12-30 13:29       ` Sameera Deshpande
  0 siblings, 1 reply; 18+ messages in thread
From: Sameera Deshpande @ 2011-11-08 11:15 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: gcc-patches, nickc, Richard Earnshaw, paul, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 353 bytes --]

On Fri, 2011-10-21 at 13:45 +0100, Ramana Radhakrishnan wrote: 
> change that. Other than that this patch looks OK and please watch out
> for stylistic issues from the previous patch.

Ramana, please find attached the reworked patch. It has been tested with
check-gcc, check-gdb and bootstrap with no regressions.

- Thanks and regards,
  Sameera D.

[-- Attachment #2: a15_arm_ldrd_epilogue-4Nov.patch --]
[-- Type: text/x-patch, Size: 8748 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index deee78b..4a86749 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15960,6 +15960,135 @@ bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
           || (REGNO (src2) == SP_REGNUM));
 }
 
+/* In ARM mode, LDRD can only load a pair of consecutive registers whose
+   first register is even-numbered.  This function accumulates registers that
+   cannot be paired until the first such pair is found.  It then emits a
+   multi-register POP for the accumulated registers, followed by an LDRD with
+   write-back for the pair.  This is repeated until all registers have been
+   loaded from the stack.  The multi-register POP also handles lone
+   registers.  However, LDRD cannot be used to load PC, as the results are
+   unpredictable.  Hence, if PC is in SAVED_REGS_MASK, emit a multi-register
+   POP with RETURN or an LDR with RETURN, depending on the number of
+   registers in REGS_TO_BE_POPPED_MASK.  */
+static void
+arm_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp;
+  unsigned long regs_to_be_popped_mask = 0;
+  bool pc_in_list = false;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  for (i = 0, j = 0; i < num_regs; j++)
+    if (saved_regs_mask & (1 << j))
+      {
+        i++;
+        if ((j % 2) == 0
+            && (saved_regs_mask & (1 << (j + 1)))
+            && (j + 1) != SP_REGNUM
+            && (j + 1) != PC_REGNUM
+            && regs_to_be_popped_mask)
+          {
+            /* The current and next registers form a pair for which LDRD can
+               be generated.  Emit a POP for the accumulated registers and
+               reset regs_to_be_popped_mask.  SP must be excluded here, as the
+               results are unpredictable if a loaded register is the same as
+               the base register (here, SP).  PC is always the last register
+               to be popped, so we do not have to worry about PC here.  */
+            arm_emit_multi_reg_pop (regs_to_be_popped_mask, pc_in_list);
+            pc_in_list = false;
+            regs_to_be_popped_mask = 0;
+            continue;
+          }
+
+        if (j == PC_REGNUM)
+          {
+            gcc_assert (really_return);
+            pc_in_list = true;
+          }
+
+        regs_to_be_popped_mask |= (1 << j);
+
+        if ((j % 2) == 1
+            && (saved_regs_mask & (1 << (j - 1)))
+            && j != SP_REGNUM
+            && j != PC_REGNUM)
+          {
+             /* Generate an LDRD for the register pair R_<j-1>, R_<j>.  The
+                pattern generated here is
+                [(SET SP, (PLUS SP, 8))
+                 (SET R_<j-1>, (MEM SP))
+                 (SET R_<j>, (MEM (PLUS SP, 4)))].  */
+             par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+
+             tmp = gen_rtx_SET (VOIDmode,
+                                stack_pointer_rtx,
+                                plus_constant (stack_pointer_rtx, 8));
+             RTX_FRAME_RELATED_P (tmp) = 1;
+             XVECEXP (par, 0, 0) = tmp;
+
+             tmp = gen_rtx_SET (SImode,
+                                gen_rtx_REG (SImode, j - 1),
+                                gen_frame_mem (SImode, stack_pointer_rtx));
+             RTX_FRAME_RELATED_P (tmp) = 1;
+             XVECEXP (par, 0, 1) = tmp;
+             dwarf = alloc_reg_note (REG_CFA_RESTORE,
+                                     gen_rtx_REG (SImode, j - 1),
+                                     dwarf);
+
+             tmp = gen_rtx_SET (SImode,
+                                 gen_rtx_REG (SImode, j),
+                                 gen_frame_mem (SImode,
+                                       plus_constant (stack_pointer_rtx, 4)));
+             RTX_FRAME_RELATED_P (tmp) = 1;
+             XVECEXP (par, 0, 2) = tmp;
+             dwarf = alloc_reg_note (REG_CFA_RESTORE,
+                                     gen_rtx_REG (SImode, j),
+                                     dwarf);
+
+             insn = emit_insn (par);
+             REG_NOTES (insn) = dwarf;
+             pc_in_list = false;
+             regs_to_be_popped_mask = 0;
+             dwarf = NULL_RTX;
+          }
+      }
+
+  if (regs_to_be_popped_mask)
+    {
+      /* A lone PC pop can happen here.  Take care of that.  */
+      if (pc_in_list && (regs_to_be_popped_mask == (1 << PC_REGNUM)))
+        {
+          /* Only PC is to be popped.  */
+          par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+          XVECEXP (par, 0, 0) = ret_rtx;
+          tmp = gen_rtx_SET (SImode,
+                             gen_rtx_REG (SImode, PC_REGNUM),
+                             gen_frame_mem (SImode,
+                                            gen_rtx_POST_INC (SImode,
+                                                         stack_pointer_rtx)));
+          RTX_FRAME_RELATED_P (tmp) = 1;
+          XVECEXP (par, 0, 1) = tmp;
+          emit_jump_insn (par);
+        }
+      else
+        {
+          arm_emit_multi_reg_pop (regs_to_be_popped_mask, pc_in_list);
+        }
+    }
+
+  return;
+}
+
 /* Generate and emit a pattern that will be recognized as LDRD pattern.  If even
    number of registers are being popped, multiple LDRD patterns are created for
    all register pairs.  If odd number of registers are popped, last register is
@@ -22807,7 +22936,13 @@ arm_expand_epilogue (bool really_return)
                     return_in_pc = true;
                   }
 
-                arm_emit_multi_reg_pop (saved_regs_mask, return_in_pc);
+                if (!current_tune->prefer_ldrd_strd
+                    || optimize_function_for_size_p (cfun))
+                  arm_emit_multi_reg_pop (saved_regs_mask, return_in_pc);
+                else
+                  /* Generate LDRD pattern instead of POP pattern.  */
+                  arm_emit_ldrd_pop (saved_regs_mask, return_in_pc);
+
                 if (return_in_pc == true)
                   return;
               }
diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
index ffa675d..149fd8b 100644
--- a/gcc/config/arm/ldmstm.md
+++ b/gcc/config/arm/ldmstm.md
@@ -109,6 +109,54 @@
   "operands[1] = gen_rtx_REG (DImode, REGNO (operands[1]));"
 )
 
+(define_insn "*arm_ldrd_base_update"
+  [(set (match_operand:SI 0 "arm_hard_register_operand" "+rk")
+        (plus:SI (match_dup 0)
+                 (const_int 8)))
+   (set (match_operand:SI 1 "arm_hard_register_operand" "=r")
+        (mem:SI (match_dup 0)))
+   (set (match_operand:SI 2 "arm_hard_register_operand" "=r")
+        (mem:SI (plus:SI (match_dup 0)
+                         (const_int 4))))]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_arm_ldrd_strd (operands[1], operands[2]))
+     && (REGNO (operands[1]) != REGNO (operands[0]))
+     && (REGNO (operands[2]) != REGNO (operands[0])))"
+  "ldr%(d%)\t%1, %2, [%0], #8"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_peephole2
+  [(parallel
+    [(set (match_operand:SI 0 "arm_hard_register_operand" "")
+        (plus:SI (match_dup 0)
+                 (const_int 8)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 0)))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 0)
+                           (const_int 4))))])]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_arm_ldrd_strd (operands[1], operands[2]))
+     && (REGNO (operands[1]) != REGNO (operands[0]))
+     && (REGNO (operands[2]) != REGNO (operands[0])))"
+  [(set (match_dup 1)
+        (mem:DI (post_inc:SI (match_dup 0))))]
+  "operands[1] = gen_rtx_REG (DImode, REGNO (operands[1]));"
+)
+
+(define_insn "*arm_ldr_with_update"
+  [(parallel
+    [(set (match_operand:SI 0 "arm_hard_register_operand" "")
+        (plus:SI (match_dup 0)
+                 (const_int 4)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 0)))])]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd)"
+  "ldr%?\t%1, [%0], #4"
+  [(set_attr "type" "load1")
+   (set_attr "predicable" "yes")])
+
 (define_insn "*ldm4_ia"
   [(match_parallel 0 "load_multiple_operation"
     [(set (match_operand:SI 1 "arm_hard_register_operand" "")
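The epilogue walk in `arm_emit_ldrd_pop` can be sketched the same way. This standalone C model is an illustration under stated assumptions, not the patch itself: it uses ARM register numbering, records a hypothetical `struct pop_op` plan (multi-register POP, LDRD with post-increment, or a lone `LDR pc, [sp], #4`), and replays the plan against the stack image a single `PUSH {saved}` would have produced, checking that every register is restored from its own slot. `plan_ldrd_pop`, `replay_pop` and `round_trips` are illustrative names.

```c
#include <assert.h>

#define SP_REGNUM 13
#define PC_REGNUM 15
#define LAST_ARM_REGNUM 15

/* Hypothetical plan entry: POP of MASK (which may include PC), an LDRD of
   r<ldrd_lo>, r<ldrd_lo+1> with post-increment, or "LDR pc, [sp], #4".  */
enum pop_kind { POP_MULTI, POP_LDRD, POP_LDR_PC };
struct pop_op
{
  enum pop_kind kind;
  unsigned long mask;
  int ldrd_lo;
};

/* Mirror of the arm_emit_ldrd_pop walk: scan upwards, flush pending
   registers with a POP when an even/odd pair not involving SP or PC
   starts, and load each such pair with LDRD.  Returns the number of ops.  */
static int
plan_ldrd_pop (unsigned long saved, struct pop_op *ops)
{
  unsigned long pending = 0;
  int j, n = 0;

  for (j = 0; j <= LAST_ARM_REGNUM; j++)
    {
      if (!(saved & (1UL << j)))
        continue;
      if (j % 2 == 0 && (saved & (1UL << (j + 1)))
          && j + 1 != SP_REGNUM && j + 1 != PC_REGNUM && pending)
        {
          /* r<j> is loaded by the LDRD planned on the next iteration.  */
          ops[n].kind = POP_MULTI;
          ops[n++].mask = pending;
          pending = 0;
          continue;
        }
      pending |= 1UL << j;
      if (j % 2 == 1 && (saved & (1UL << (j - 1)))
          && j != SP_REGNUM && j != PC_REGNUM)
        {
          ops[n].kind = POP_LDRD;
          ops[n++].ldrd_lo = j - 1;
          pending = 0;
        }
    }
  if (pending == (1UL << PC_REGNUM))
    ops[n++].kind = POP_LDR_PC;      /* Only PC is left: LDR with return.  */
  else if (pending)
    {
      ops[n].kind = POP_MULTI;
      ops[n++].mask = pending;
    }
  return n;
}

/* Replay OPS against STACK starting at index SP, storing loaded values in
   REGS.  Returns the final stack index.  */
static int
replay_pop (const struct pop_op *ops, int n, const int *stack, int sp,
            int *regs)
{
  int k, r;

  for (k = 0; k < n; k++)
    switch (ops[k].kind)
      {
      case POP_MULTI:
        /* POP loads the lowest-numbered register from the lowest address.  */
        for (r = 0; r <= LAST_ARM_REGNUM; r++)
          if (ops[k].mask & (1UL << r))
            regs[r] = stack[sp++];
        break;
      case POP_LDRD:
        /* LDRD with post-increment: r<lo> from [sp], r<lo+1> from [sp+4].  */
        regs[ops[k].ldrd_lo] = stack[sp];
        regs[ops[k].ldrd_lo + 1] = stack[sp + 1];
        sp += 2;
        break;
      case POP_LDR_PC:
        regs[PC_REGNUM] = stack[sp++];
        break;
      }
  return sp;
}

/* Return nonzero if the plan for SAVED restores every register from the
   slot that a single "PUSH {SAVED}" would have used (register r saved as
   value r) and consumes exactly those slots.  */
static int
round_trips (unsigned long saved)
{
  struct pop_op ops[16];
  int stack[16], regs[16], count = 0, r, sp;
  int n = plan_ldrd_pop (saved, ops);

  for (r = 0; r <= LAST_ARM_REGNUM; r++)
    {
      regs[r] = -1;
      if (saved & (1UL << r))
        stack[count++] = r;
    }
  sp = replay_pop (ops, n, stack, 0, regs);
  if (sp != count)
    return 0;
  for (r = 0; r <= LAST_ARM_REGNUM; r++)
    if ((saved & (1UL << r)) && regs[r] != r)
      return 0;
  return 1;
}
```

For r3, r4, r5 and PC the sketch plans `POP {r3}`, `LDRD r4, r5, [sp], #8` and `LDR pc, [sp], #4`, which corresponds to the lone-PC case at the end of `arm_emit_ldrd_pop`.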


* Re: [RFA/ARM][Patch 02/05]: LDRD generation instead of POP in A15 Thumb2 epilogue.
  2011-11-07  9:54     ` Sameera Deshpande
  2011-11-07 16:59       ` Richard Henderson
@ 2011-12-30 12:42       ` Sameera Deshpande
  1 sibling, 0 replies; 18+ messages in thread
From: Sameera Deshpande @ 2011-12-30 12:42 UTC (permalink / raw)
  To: nickc, Richard Earnshaw, paul, Ramana Radhakrishnan; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 299 bytes --]

Hi!

Please find attached the revised LDRD generation patch for A15 Thumb-2 mode.

Because of the major rework of the ARM and Thumb-2 RTL epilogue patches,
this patch has undergone some changes.

The patch has been tested with check-gcc, bootstrap and check-gdb without
regressions.

Ok for trunk?

-- 

[-- Attachment #2: a15_thumb2_ldrd_epilogue_final.patch --]
[-- Type: text/x-patch, Size: 10261 bytes --]

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 64d5993..49aae52 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -201,6 +201,7 @@ extern void thumb_reload_in_hi (rtx *);
 extern void thumb_set_return_address (rtx, rtx);
 extern const char *thumb1_output_casesi (rtx *);
 extern const char *thumb2_output_casesi (rtx *);
+extern bool bad_reg_pair_for_thumb_ldrd_strd (rtx, rtx);
 #endif
 
 /* Defined in pe.c.  */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d671281..6d008c5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15847,6 +15847,154 @@ arm_emit_vfp_multi_reg_pop (int first_reg, int num_regs, rtx base_reg)
   REG_NOTES (par) = dwarf;
 }
 
+bool
+bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
+{
+  return (GET_CODE (src1) != REG
+          || GET_CODE (src2) != REG
+          || (REGNO (src1) == PC_REGNUM)
+          || (REGNO (src1) == SP_REGNUM)
+          || (REGNO (src1) == REGNO (src2))
+          || (REGNO (src2) == PC_REGNUM)
+          || (REGNO (src2) == SP_REGNUM));
+}
+
+/* Generate and emit a pattern that will be recognized as LDRD pattern.  If even
+   number of registers are being popped, multiple LDRD patterns are created for
+   all register pairs.  If odd number of registers are popped, last register is
+   loaded by using LDR pattern.  */
+static void
+thumb2_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg, tmp1;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+  gcc_assert (really_return || ((saved_regs_mask & (1 << PC_REGNUM)) == 0));
+
+  /* We cannot generate ldrd for PC.  Hence, reduce the count if PC is
+     to be popped.  So, if num_regs is even, now it will become odd,
+     and we can generate pop with PC.  If num_regs is odd, it will be
+     even now, and ldr with return can be generated for PC.  */
+  if (really_return && (saved_regs_mask & (1 << PC_REGNUM)))
+    num_regs--;
+
+  /* Var j iterates over all the registers to gather all the registers in
+     saved_regs_mask.  Var i gives index of saved registers in stack frame.
+     A PARALLEL RTX of register-pair is created here, so that pattern for
+     LDRD can be matched.  As PC is always last register to be popped, and
+     we have already decremented num_regs if PC, we don't have to worry
+     about PC in this loop.  */
+  for (i = 0, j = 0; i < (num_regs - (num_regs % 2)); j++)
+    if (saved_regs_mask & (1 << j))
+      {
+        gcc_assert (j != SP_REGNUM);
+
+        /* Create RTX for memory load.  */
+        reg = gen_rtx_REG (SImode, j);
+        tmp = gen_rtx_SET (SImode,
+                           reg,
+                           gen_frame_mem (SImode,
+                               plus_constant (stack_pointer_rtx, 4 * i)));
+        RTX_FRAME_RELATED_P (tmp) = 1;
+
+        if (i % 2 == 0)
+          {
+            /* When saved-register index (i) is even, the RTX to be emitted is
+               yet to be created.  Hence create it first.  The LDRD pattern we
+               are generating is :
+               [ (SET (reg_t0) (MEM (PLUS (SP) (NUM))))
+                 (SET (reg_t1) (MEM (PLUS (SP) (NUM + 4)))) ]
+               where target registers need not be consecutive.  */
+            par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+            dwarf = NULL_RTX;
+          }
+
+        /* ith register is added in PARALLEL RTX.  If i is even, the reg_i is
+           added as 0th element and if i is odd, reg_i is added as 1st element
+           of LDRD pattern shown above.  */
+        XVECEXP (par, 0, (i % 2)) = tmp;
+        dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+
+        if ((i % 2) == 1)
+          {
+            /* When saved-register index (i) is odd, RTXs for both the registers
+               to be loaded are generated in above given LDRD pattern, and the
+               pattern can be emitted now.  */
+            par = emit_insn (par);
+            REG_NOTES (par) = dwarf;
+          }
+
+        i++;
+      }
+
+  /* If the number of registers popped is odd and really_return is false, or
+     the number of registers is even and really_return is true, the last
+     register is popped using LDR.  It can be PC as well.  Hence, adjust the
+     stack first and then emit LDR with post-increment.  */
+
+  /* Increment the stack pointer past the i 4-byte registers already
+     restored by the LDRD instructions above.  */
+  tmp = gen_rtx_SET (VOIDmode,
+                     stack_pointer_rtx,
+                     plus_constant (stack_pointer_rtx, 4 * i));
+  RTX_FRAME_RELATED_P (tmp) = 1;
+  emit_insn (tmp);
+
+  dwarf = NULL_RTX;
+
+  if (((num_regs % 2) == 1 && !really_return)
+      || ((num_regs % 2) == 0 && really_return))
+    {
+      /* Scan for the single register to be popped.  Skip until the saved
+         register is found.  */
+      for (; (saved_regs_mask & (1 << j)) == 0; j++);
+
+      /* Gen LDR with post increment here.  */
+      tmp1 = gen_rtx_MEM (SImode,
+                          gen_rtx_POST_INC (SImode,
+                                            stack_pointer_rtx));
+      set_mem_alias_set (tmp1, get_frame_alias_set ());
+
+      reg = gen_rtx_REG (SImode, j);
+      tmp = gen_rtx_SET (SImode, reg, tmp1);
+      RTX_FRAME_RELATED_P (tmp) = 1;
+      dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+
+      if (really_return)
+        {
+          /* If really_return, j must be PC_REGNUM.  */
+          gcc_assert (j == PC_REGNUM);
+          par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+          XVECEXP (par, 0, 0) = ret_rtx;
+          XVECEXP (par, 0, 1) = tmp;
+          par = emit_jump_insn (par);
+        }
+      else
+        {
+          par = emit_insn (tmp);
+        }
+
+      REG_NOTES (par) = dwarf;
+    }
+  else if ((num_regs % 2) == 1 && really_return)
+    {
+      /* There are 2 registers left to be popped.  So, generate the pattern
+         pop_multiple_with_stack_update_and_return to pop into PC.  */
+      arm_emit_multi_reg_pop (saved_regs_mask & (~((1 << j) - 1)),
+                              really_return);
+    }
+
+  return;
+}
+
 /* Calculate the size of the return value that is passed in registers.  */
 static unsigned
 arm_size_return_regs (void)
@@ -22615,7 +22763,13 @@ arm_expand_epilogue (bool really_return)
               }
             else
               {
-                arm_emit_multi_reg_pop (saved_regs_mask, return_in_pc);
+                if (!current_tune->prefer_ldrd_strd
+                    || optimize_function_for_size_p (cfun)
+                    || TARGET_ARM)
+                  arm_emit_multi_reg_pop (saved_regs_mask, return_in_pc);
+                else
+                  /* Generate LDRD pattern instead of POP pattern.  */
+                  thumb2_emit_ldrd_pop (saved_regs_mask, return_in_pc);
               }
 
             if (return_in_pc == true)
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 7d0269a..e33eff2 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -207,6 +207,12 @@
   (and (match_code "const_int")
        (match_test "TARGET_THUMB2 && ival >= 0 && ival <= 255")))
 
+(define_constraint "Pz"
+  "@internal In Thumb-2 state a multiple of 4 in the range -1020 to 1020"
+  (and (match_code "const_int")
+       (match_test "TARGET_THUMB2 && ival >= -1020 && ival <= 1020
+                    && ival % 4 == 0")))
+
 (define_constraint "G"
  "In ARM/Thumb-2 state a valid FPA immediate constant."
  (and (match_code "const_double")
diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
index 5db4a32..21d2815 100644
--- a/gcc/config/arm/ldmstm.md
+++ b/gcc/config/arm/ldmstm.md
@@ -21,6 +21,32 @@
    see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    <http://www.gnu.org/licenses/>.  */
 
+(define_insn "*thumb2_ldrd_base"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (mem:SI (match_operand:SI 1 "s_register_operand" "rk")))
+   (set (match_operand:SI 2 "register_operand" "=r")
+        (mem:SI (plus:SI (match_dup 1)
+                         (const_int 4))))]
+  "(TARGET_THUMB2 && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_thumb_ldrd_strd (operands[0], operands[2])))"
+  "ldrd%?\t%0, %2, [%1]"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_insn "*thumb2_ldrd"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+        (mem:SI (plus:SI (match_operand:SI 1 "s_register_operand" "rk")
+                         (match_operand:SI 2 "ldrd_immediate_operand" "Pz"))))
+   (set (match_operand:SI 3 "register_operand" "=r")
+        (mem:SI (plus:SI (match_dup 1)
+                         (match_operand:SI 4 "const_int_operand" ""))))]
+  "(TARGET_THUMB2 && current_tune->prefer_ldrd_strd
+     && ((INTVAL (operands[2]) + 4) == INTVAL (operands[4]))
+     && (!bad_reg_pair_for_thumb_ldrd_strd (operands[0], operands[3])))"
+  "ldrd%?\t%0, %3, [%1, %2]"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
 (define_insn "*ldm4_ia"
   [(match_parallel 0 "load_multiple_operation"
     [(set (match_operand:SI 1 "arm_hard_register_operand" "")
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 85a112e..881c0b0 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -209,6 +209,10 @@
 	    (match_test "(GET_CODE (op) != CONST_INT
 			  || (INTVAL (op) < 4096 && INTVAL (op) > -4096))"))))
 
+(define_predicate "ldrd_immediate_operand"
+  (and (match_operand 0 "const_int_operand")
+  (match_test "(INTVAL (op) < 1020 && INTVAL (op) > -1020)")))
+
 ;; True for operators that can be combined with a shift in ARM state.
 (define_special_predicate "shiftable_operator"
   (and (match_code "plus,minus,ior,xor,and")
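The Thumb-2 variant differs mainly in that LDRD may load any two distinct registers (other than SP and PC) from `[sp, #offset]`, so saved registers are paired in ascending order rather than as even/odd pairs. The minimal C sketch below models `thumb2_emit_ldrd_pop` under the assumption, not checked by the sketch, that `really_return` is set only when PC is in the mask; `struct t2_op` and the function names are hypothetical. It also encodes the `Pz` constraint and the `ldrd_immediate_operand` predicate as written in the patch, which shows that the predicate, unlike `Pz`, rejects offsets of exactly 1020 or -1020 and accepts offsets that are not multiples of 4.

```c
#include <assert.h>

#define PC_REGNUM 15
#define LAST_ARM_REGNUM 15

/* Hypothetical plan entry for the Thumb-2 epilogue.  */
enum t2_kind { T2_LDRD, T2_ADD_SP, T2_LDR_POSTINC, T2_POP };
struct t2_op
{
  enum t2_kind kind;
  int rt, rt2;         /* T2_LDRD: both targets; T2_LDR_POSTINC: rt only.  */
  int offset;          /* T2_LDRD: [sp, #offset]; T2_ADD_SP: byte count.  */
  unsigned long mask;  /* T2_POP: registers popped, including PC.  */
};

/* The "Pz" constraint: a multiple of 4 in [-1020, 1020].  */
static int
pz_ok (long v)
{
  return v >= -1020 && v <= 1020 && v % 4 == 0;
}

/* The ldrd_immediate_operand predicate, exactly as written in the patch.  */
static int
ldrd_immediate_ok (long v)
{
  return v < 1020 && v > -1020;
}

/* Mirror of thumb2_emit_ldrd_pop.  Assumes REALLY_RETURN implies that PC
   is in SAVED.  Returns the number of ops.  */
static int
plan_thumb2_ldrd_pop (unsigned long saved, int really_return,
                      struct t2_op *ops)
{
  int num_regs = 0, i, j, n = 0;

  for (j = 0; j <= LAST_ARM_REGNUM; j++)
    if (saved & (1UL << j))
      num_regs++;
  if (really_return)
    num_regs--;                  /* PC cannot be loaded with LDRD.  */

  /* Pair saved registers in ascending order; they need not be adjacent.  */
  for (i = 0, j = 0; i < num_regs - num_regs % 2; j++)
    if (saved & (1UL << j))
      {
        if (i % 2 == 0)
          {
            ops[n].kind = T2_LDRD;
            ops[n].rt = j;
            ops[n].offset = 4 * i;
          }
        else
          ops[n++].rt2 = j;
        i++;
      }

  /* Like the patch, adjust SP unconditionally, even by 0.  */
  ops[n].kind = T2_ADD_SP;
  ops[n++].offset = 4 * i;

  if ((num_regs % 2 == 1 && !really_return)
      || (num_regs % 2 == 0 && really_return))
    {
      /* One register (PC when returning) remains: LDR with post-increment.  */
      while (!(saved & (1UL << j)))
        j++;
      ops[n].kind = T2_LDR_POSTINC;
      ops[n++].rt = j;
    }
  else if (num_regs % 2 == 1 && really_return)
    {
      /* The last non-PC register and PC remain: POP {rX, pc}.  */
      ops[n].kind = T2_POP;
      ops[n++].mask = saved & ~((1UL << j) - 1);
    }
  return n;
}
```

For r4, r7 and PC with `really_return` set, the sketch plans `LDRD r4, r7, [sp, #0]`, an 8-byte SP adjustment and `LDR pc, [sp], #4`; with r4, r5, r6 and PC it ends with `POP {r6, pc}` instead.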


* Re: [RFA/ARM][Patch 05/05]: LDRD generation instead of POP in A15 ARM epilogue.
  2011-11-08 11:15     ` Sameera Deshpande
@ 2011-12-30 13:29       ` Sameera Deshpande
  0 siblings, 0 replies; 18+ messages in thread
From: Sameera Deshpande @ 2011-12-30 13:29 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: gcc-patches, nickc, Richard Earnshaw, paul, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 288 bytes --]

Hi Ramana,

Please find attached the revised LDRD generation patch for A15 ARM mode.

Because of the major rework of the ARM RTL epilogue patch, this patch has
undergone some changes.

The patch has been tested with check-gcc, bootstrap and check-gdb without
regressions.

Ok for trunk?

-- 

[-- Attachment #2: a15_arm_ldrd_epilogue_final.patch --]
[-- Type: text/x-patch, Size: 8926 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d5c651c..46becfb 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -16101,6 +16101,135 @@ bad_reg_pair_for_thumb_ldrd_strd (rtx src1, rtx src2)
           || (REGNO (src2) == SP_REGNUM));
 }
 
+/* In ARM mode, LDRD can only load a pair of consecutive registers whose
+   first register is even-numbered.  This function accumulates registers that
+   cannot be paired until the first such pair is found.  It then emits a
+   multi-register POP for the accumulated registers, followed by an LDRD with
+   write-back for the pair.  This is repeated until all registers have been
+   loaded from the stack.  The multi-register POP also handles lone
+   registers.  However, LDRD cannot be used to load PC, as the results are
+   unpredictable.  Hence, if PC is in SAVED_REGS_MASK, emit a multi-register
+   POP with RETURN or an LDR with RETURN, depending on the number of
+   registers in REGS_TO_BE_POPPED_MASK.  */
+static void
+arm_emit_ldrd_pop (unsigned long saved_regs_mask, bool really_return)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par = NULL_RTX;
+  rtx insn = NULL_RTX;
+  rtx dwarf = NULL_RTX;
+  rtx tmp;
+  unsigned long regs_to_be_popped_mask = 0;
+  bool pc_in_list = false;
+
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  for (i = 0, j = 0; i < num_regs; j++)
+    if (saved_regs_mask & (1 << j))
+      {
+        i++;
+        if ((j % 2) == 0
+            && (saved_regs_mask & (1 << (j + 1)))
+            && (j + 1) != SP_REGNUM
+            && (j + 1) != PC_REGNUM
+            && regs_to_be_popped_mask)
+          {
+            /* The current and next registers form a pair for which LDRD can
+               be generated.  Emit a POP for the accumulated registers and
+               reset regs_to_be_popped_mask.  SP must be excluded here, as the
+               results are unpredictable if a loaded register is the same as
+               the base register (here, SP).  PC is always the last register
+               to be popped, so we do not have to worry about PC here.  */
+            arm_emit_multi_reg_pop (regs_to_be_popped_mask, pc_in_list);
+            pc_in_list = false;
+            regs_to_be_popped_mask = 0;
+            continue;
+          }
+
+        if (j == PC_REGNUM)
+          {
+            gcc_assert (really_return);
+            pc_in_list = true;
+          }
+
+        regs_to_be_popped_mask |= (1 << j);
+
+        if ((j % 2) == 1
+            && (saved_regs_mask & (1 << (j - 1)))
+            && j != SP_REGNUM
+            && j != PC_REGNUM)
+          {
+             /* Generate an LDRD for the register pair R_<j-1>, R_<j>.  The
+                pattern generated here is
+                [(SET SP, (PLUS SP, 8))
+                 (SET R_<j-1>, (MEM SP))
+                 (SET R_<j>, (MEM (PLUS SP, 4)))].  */
+             par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+
+             tmp = gen_rtx_SET (VOIDmode,
+                                stack_pointer_rtx,
+                                plus_constant (stack_pointer_rtx, 8));
+             RTX_FRAME_RELATED_P (tmp) = 1;
+             XVECEXP (par, 0, 0) = tmp;
+
+             tmp = gen_rtx_SET (SImode,
+                                gen_rtx_REG (SImode, j - 1),
+                                gen_frame_mem (SImode, stack_pointer_rtx));
+             RTX_FRAME_RELATED_P (tmp) = 1;
+             XVECEXP (par, 0, 1) = tmp;
+             dwarf = alloc_reg_note (REG_CFA_RESTORE,
+                                     gen_rtx_REG (SImode, j - 1),
+                                     dwarf);
+
+             tmp = gen_rtx_SET (SImode,
+                                gen_rtx_REG (SImode, j),
+                                gen_frame_mem (SImode,
+                                       plus_constant (stack_pointer_rtx, 4)));
+             RTX_FRAME_RELATED_P (tmp) = 1;
+             XVECEXP (par, 0, 2) = tmp;
+             dwarf = alloc_reg_note (REG_CFA_RESTORE,
+                                     gen_rtx_REG (SImode, j),
+                                     dwarf);
+
+             insn = emit_insn (par);
+             REG_NOTES (insn) = dwarf;
+             pc_in_list = false;
+             regs_to_be_popped_mask = 0;
+             dwarf = NULL_RTX;
+          }
+      }
+
+  if (regs_to_be_popped_mask)
+    {
+      /* A lone PC pop can happen here; handle it separately.  */
+      if (pc_in_list && (regs_to_be_popped_mask == (1 << PC_REGNUM)))
+        {
+          /* Only PC is to be popped.  */
+          par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+          XVECEXP (par, 0, 0) = ret_rtx;
+          tmp = gen_rtx_SET (SImode,
+                             gen_rtx_REG (SImode, PC_REGNUM),
+                             gen_frame_mem (SImode,
+                                            gen_rtx_POST_INC (SImode,
+                                                         stack_pointer_rtx)));
+          RTX_FRAME_RELATED_P (tmp) = 1;
+          XVECEXP (par, 0, 1) = tmp;
+          emit_jump_insn (par);
+        }
+      else
+        {
+          arm_emit_multi_reg_pop (regs_to_be_popped_mask, pc_in_list);
+        }
+    }
+
+  return;
+}
+
 /* Generate and emit a pattern that will be recognized as LDRD pattern.  If even
    number of registers are being popped, multiple LDRD patterns are created for
    all register pairs.  If odd number of registers are popped, last register is
@@ -23019,12 +23148,14 @@ arm_expand_epilogue (bool really_return)
             else
               {
                 if (!current_tune->prefer_ldrd_strd
-                    || optimize_function_for_size_p (cfun)
-                    || TARGET_ARM)
+                    || optimize_function_for_size_p (cfun))
                   arm_emit_multi_reg_pop (saved_regs_mask, return_in_pc);
                 else
                   /* Generate LDRD pattern instead of POP pattern.  */
-                  thumb2_emit_ldrd_pop (saved_regs_mask, return_in_pc);
+                  if (TARGET_THUMB2)
+                    thumb2_emit_ldrd_pop (saved_regs_mask, return_in_pc);
+                  else
+                    arm_emit_ldrd_pop (saved_regs_mask, return_in_pc);
               }
 
             if (return_in_pc == true)
diff --git a/gcc/config/arm/ldmstm.md b/gcc/config/arm/ldmstm.md
index ffa675d..149fd8b 100644
--- a/gcc/config/arm/ldmstm.md
+++ b/gcc/config/arm/ldmstm.md
@@ -109,6 +109,54 @@
   "operands[1] = gen_rtx_REG (DImode, REGNO (operands[1]));"
 )
 
+(define_insn "*arm_ldrd_base_update"
+  [(set (match_operand:SI 0 "arm_hard_register_operand" "+rk")
+        (plus:SI (match_dup 0)
+                 (const_int 8)))
+   (set (match_operand:SI 1 "arm_hard_register_operand" "=r")
+        (mem:SI (match_dup 0)))
+   (set (match_operand:SI 2 "arm_hard_register_operand" "=r")
+        (mem:SI (plus:SI (match_dup 0)
+                         (const_int 4))))]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_arm_ldrd_strd (operands[1], operands[2]))
+     && (REGNO (operands[1]) != REGNO (operands[0]))
+     && (REGNO (operands[2]) != REGNO (operands[0])))"
+  "ldr%(d%)\t%1, %2, [%0], #8"
+  [(set_attr "type" "load2")
+   (set_attr "predicable" "yes")])
+
+(define_peephole2
+  [(parallel
+    [(set (match_operand:SI 0 "arm_hard_register_operand" "")
+          (plus:SI (match_dup 0)
+                   (const_int 8)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 0)))
+     (set (match_operand:SI 2 "arm_hard_register_operand" "")
+          (mem:SI (plus:SI (match_dup 0)
+                           (const_int 4))))])]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd
+     && (!bad_reg_pair_for_arm_ldrd_strd (operands[1], operands[2]))
+     && (REGNO (operands[1]) != REGNO (operands[0]))
+     && (REGNO (operands[2]) != REGNO (operands[0])))"
+  [(set (match_dup 1)
+        (mem:DI (post_inc:SI (match_dup 0))))]
+  "operands[1] = gen_rtx_REG (DImode, REGNO (operands[1]));"
+)
+
+(define_insn "*arm_ldr_with_update"
+  [(parallel
+    [(set (match_operand:SI 0 "arm_hard_register_operand" "")
+          (plus:SI (match_dup 0)
+                   (const_int 4)))
+     (set (match_operand:SI 1 "arm_hard_register_operand" "")
+          (mem:SI (match_dup 0)))])]
+  "(TARGET_ARM && current_tune->prefer_ldrd_strd)"
+  "ldr%?\t%1, [%0], #4"
+  [(set_attr "type" "load1")
+   (set_attr "predicable" "yes")])
+
 (define_insn "*ldm4_ia"
   [(match_parallel 0 "load_multiple_operation"
     [(set (match_operand:SI 1 "arm_hard_register_operand" "")

Thread overview: 18+ messages
2011-10-11  9:22 [RFA/ARM][Patch 00/05]: Introduction - Generate LDRD/STRD in prologue/epilogue instead of PUSH/POP Sameera Deshpande
2011-10-11  9:31 ` [RFA/ARM][Patch 01/05]: Create tune for Cortex-A15 Sameera Deshpande
2011-10-21 12:56   ` Ramana Radhakrishnan
2011-10-11  9:38 ` [RFA/ARM][Patch 02/05]: LDRD generation instead of POP in A15 Thumb2 epilogue Sameera Deshpande
2011-10-13 18:14   ` Richard Henderson
2011-11-07  9:54     ` Sameera Deshpande
2011-11-07 16:59       ` Richard Henderson
2011-12-30 12:42       ` Sameera Deshpande
2011-10-11  9:53 ` [RFA/ARM][Patch 03/05]: STRD generation instead of PUSH in A15 Thumb2 prologue Sameera Deshpande
2011-10-21 13:00   ` Ramana Radhakrishnan
2011-11-07  9:55     ` Sameera Deshpande
2011-10-11 10:12 ` [RFA/ARM][Patch 04/05]: STRD generation instead of PUSH in A15 ARM prologue Sameera Deshpande
2011-10-21 13:21   ` Ramana Radhakrishnan
2011-11-08 11:14     ` Sameera Deshpande
2011-10-11 10:19 ` [RFA/ARM][Patch 05/05]: LDRD generation instead of POP in A15 ARM epilogue Sameera Deshpande
2011-10-21 13:30   ` Ramana Radhakrishnan
2011-11-08 11:15     ` Sameera Deshpande
2011-12-30 13:29       ` Sameera Deshpande
