public inbox for gcc-patches@gcc.gnu.org
* [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I)
@ 2012-05-31 13:44 Greta Yorsh
  2012-05-31 13:51 ` [Patch, ARM][1/8] Epilogue in RTL: update ldm_stm_operation_p Greta Yorsh
                   ` (8 more replies)
  0 siblings, 9 replies; 21+ messages in thread
From: Greta Yorsh @ 2012-05-31 13:44 UTC (permalink / raw)
  To: GCC Patches
  Cc: joseph, Richard Earnshaw, sameera.deshpande,
	Ramana Radhakrishnan, paul, nickc

This sequence of patches adds support for epilogue generation in RTL.

This is the first part of Sameera's work on ARM prologue/epilogue. Sameera
Deshpande posted it for review in December 2011, having addressed all
previous comments: http://gcc.gnu.org/ml/gcc-patches/2011-12/msg00049.html.
The latest version hasn't been approved yet. Originally, it was split into
two patches:
[1/2]: Thumb2 epilogue in RTL
[2/2]: ARM epilogue in RTL
I rebased Sameera's patches, made small changes in the patterns and fixed
RTL epilogue generated for -mapcs-frame. To make reviewing easier, I split
the patches into smaller steps:
* Reorganization - already committed upstream.
* New insn and expand patterns - main functionality change.
* Cleanup of dead code.

Here is the list of patches:
 1-update-predicate.patch
 2-patterns.patch
 3-patterns-vfp.patch
 4-expand-epilog-apcs-frame.patch
 5-expand-epilog.patch
 6-simple-return.patch
 7-expand-thumb2-return.patch
 8-remove-dead-code.patch

Testing:
* Crossbuild for target arm-none-eabi with cpu cortex-a9 neon softfp and
tested in three configurations: -marm (default), -mthumb, -mapcs-frame. No
regression on qemu.
* Crossbuild for target arm-none-eabi thumb2 with cpu cortex-m3. No
regression on qemu.
* Crossbuild for target arm-none-eabi thumb1 with cpu arm7tdmi and
arm1136jf-s. No regression on qemu.
* Crossbuild for target arm-linux-gnueabi with cpu cortex-a9 with eglibc and
used this compiler to build the AEL Linux kernel. It boots successfully.
* Bootstrap the compiler on cortex-a8 successfully for
--languages=c,c++,fortran and used this compiler to build gdb. No regression
with check-gcc and check-gdb.

Notes:
* The patches are to be applied in the above order.
* The patches are not intended to be used individually.
* The patches have been tested only as a sequence.
* The patches have been tested on gcc from 20 March 2012 (fsf trunk r185582
of gcc-4.8 stage 1). The patches apply cleanly to current trunk (r188056).
* The patches have not been explicitly tested with any FPA variants (which
are deprecated in 4.7 and expected to become obsolete in 4.8).

Ok for trunk?

Thank you,
Greta




* [Patch, ARM][1/8] Epilogue in RTL: update ldm_stm_operation_p
  2012-05-31 13:44 [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I) Greta Yorsh
@ 2012-05-31 13:51 ` Greta Yorsh
  2012-06-15  9:20   ` Richard Earnshaw
  2012-05-31 13:54 ` [Patch, ARM][2/8] Epilogue in RTL: new patterns for int regs Greta Yorsh
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Greta Yorsh @ 2012-05-31 13:51 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 450 bytes --]

This patch updates ldm_stm_operation_p to require, for loads, that the base
register is SP whenever SP is in the register list. This guarantees that SP
is reset correctly when an LDM instruction is interrupted. Otherwise, we
might end up with a corrupt stack.
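
As a rough illustration of the rule (not code from the patch), the check can
be written over a plain register bitmask instead of RTL. The function name
ldm_reglist_ok and the mask representation are hypothetical; only the SP
register number (r13) comes from the ARM architecture.

```c
#include <stdbool.h>

#define SP_REGNUM 13  /* ARM stack pointer, r13.  */

/* Hypothetical helper, not part of GCC: return true if a load or store
   multiple with base register BASE and register list MASK (bit N set
   means rN is in the list) satisfies the rule described above.  */
bool
ldm_reglist_ok (unsigned int mask, unsigned int base, bool load)
{
  bool sp_in_list = (mask & (1u << SP_REGNUM)) != 0;

  /* Only loads are restricted: SP may be loaded only when the base
     register is SP as well.  */
  if (load && sp_in_list && base != SP_REGNUM)
    return false;

  return true;
}
```

With this sketch, a load of {r4, sp} based on SP is accepted, the same load
based on FP (r11) is rejected, and stores are never rejected by this rule.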

ChangeLog:

gcc

2012-05-31  Greta Yorsh  <greta.yorsh@arm.com>

	* config/arm/arm.c (ldm_stm_operation_p): Require SP
      as base register for loads if SP is in the register list.

[-- Attachment #2: 1-update-predicate.patch.txt --]
[-- Type: text/plain, Size: 1344 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e3290e2..4717725 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -10247,6 +10247,12 @@ ldm_stm_operation_p (rtx op, bool load, enum machine_mode mode,
   if (!REG_P (addr))
     return false;
 
+  /* Don't allow SP to be loaded unless it is also the base register. It
+     guarantees that SP is reset correctly when an LDM instruction
+     is interrupted. Otherwise, we might end up with a corrupt stack.  */
+  if (load && (REGNO (reg) == SP_REGNUM) && (REGNO (addr) != SP_REGNUM))
+    return false;
+
   for (; i < count; i++)
     {
       elt = XVECEXP (op, 0, i);
@@ -10270,6 +10276,10 @@ ldm_stm_operation_p (rtx op, bool load, enum machine_mode mode,
           || (consecutive
               && (REGNO (reg) !=
                   (unsigned int) (first_regno + regs_per_val * (i - base))))
+          /* Don't allow SP to be loaded unless it is also the base register. It
+             guarantees that SP is reset correctly when an LDM instruction
+             is interrupted. Otherwise, we might end up with a corrupt stack.  */
+          || (load && (REGNO (reg) == SP_REGNUM) && (REGNO (addr) != SP_REGNUM))
           || !MEM_P (mem)
           || GET_MODE (mem) != mode
           || ((GET_CODE (XEXP (mem, 0)) != PLUS


* [Patch, ARM][2/8] Epilogue in RTL: new patterns for int regs
  2012-05-31 13:44 [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I) Greta Yorsh
  2012-05-31 13:51 ` [Patch, ARM][1/8] Epilogue in RTL: update ldm_stm_operation_p Greta Yorsh
@ 2012-05-31 13:54 ` Greta Yorsh
  2012-06-15  9:22   ` Richard Earnshaw
  2012-05-31 13:55 ` [Patch, ARM][3/8] Epilogue in RTL: new patterns for vfp regs Greta Yorsh
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Greta Yorsh @ 2012-05-31 13:54 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1348 bytes --]

This patch adds new define_insn patterns for epilogue with integer
registers.

The patterns can handle pop multiple with writeback and return (loading into
PC directly). To handle return, the patterns use a new special predicate,
pop_multiple_return, that uses the ldm_stm_operation_p function from a
previous patch. To output assembly, the patterns use a new function,
arm_output_multireg_pop.
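
To make the choice of mnemonic concrete, here is a simplified standalone
sketch of the kind of string such an output function builds. It is not the
function from the patch: format_multireg_pop is a hypothetical name, it takes
a register bitmask instead of RTL operands, it ignores condition codes and
the "^" suffix for interrupt handlers, and it assumes unified assembler
syntax.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SP_REGNUM 13

/* Register names as they appear in ARM assembly output.  */
static const char *const reg_name[16] = {
  "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
  "r8", "r9", "sl", "fp", "ip", "sp", "lr", "pc"
};

/* Hypothetical simplification: format a load multiple into BUF (SIZE
   bytes, SIZE > 0).  BASE is the base register number (0..15), MASK has
   bit N set for each rN to load, UPDATE requests base writeback.
   Returns BUF.  */
char *
format_multireg_pop (char *buf, size_t size, unsigned int base,
                     unsigned int mask, bool update)
{
  bool first = true;
  size_t len;

  if (base == SP_REGNUM && update)
    /* "pop" is the shorter spelling of "ldmfd sp!".  */
    snprintf (buf, size, "pop\t{");
  else
    snprintf (buf, size, "%s\t%s%s, {",
              base == SP_REGNUM ? "ldmfd" : "ldmia",
              reg_name[base], update ? "!" : "");

  /* Append the register list in ascending register order.  */
  for (unsigned int i = 0; i < 16; i++)
    if (mask & (1u << i))
      {
        len = strlen (buf);
        snprintf (buf + len, size - len, "%s%s",
                  first ? "" : ", ", reg_name[i]);
        first = false;
      }

  len = strlen (buf);
  snprintf (buf + len, size - len, "}");
  return buf;
}
```

For base SP with writeback and the list {r4, fp, pc}, this produces
"pop {r4, fp, pc}" (with a tab after the mnemonic); for a base other than SP
it falls back to ldmia.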

This patch also adds a new function arm_emit_multi_reg_pop
that emits RTL that matches the new pop patterns for integer registers.
This is a helper function for epilogue expansion. It is used by a later
patch.
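
The shape of the RTL emitted for a pop can be summarised without RTL at all.
The sketch below is only an illustration under that assumption: struct
pop_layout and compute_pop_layout are hypothetical names. It mirrors the
layout used in the attached patch: an optional (return) element when PC is
in the mask, an optional SP update when SP is not in the mask, and one load
per register at increasing 4-byte offsets from SP.

```c
#include <stdbool.h>

#define SP_REGNUM 13
#define PC_REGNUM 15

/* Hypothetical summary of the PARALLEL built for an integer pop.  */
struct pop_layout
{
  bool has_return;      /* A (return) element is present.  */
  bool has_update;      /* An explicit SP = SP + 4 * num_regs element.  */
  int num_regs;         /* Number of register loads.  */
  int total_elts;       /* Length of the PARALLEL vector.  */
  int slot_offset[16];  /* Byte offset from SP for rN, or -1 if unused.  */
};

struct pop_layout
compute_pop_layout (unsigned int saved_regs_mask)
{
  struct pop_layout l;
  int j = 0;

  /* Loading PC doubles as the function return.  */
  l.has_return = (saved_regs_mask & (1u << PC_REGNUM)) != 0;
  /* If SP itself is loaded, no separate SP update is emitted.  */
  l.has_update = (saved_regs_mask & (1u << SP_REGNUM)) == 0;

  /* Registers are restored in ascending order from ascending addresses.  */
  for (int i = 0; i < 16; i++)
    if (saved_regs_mask & (1u << i))
      l.slot_offset[i] = 4 * j++;
    else
      l.slot_offset[i] = -1;

  l.num_regs = j;
  l.total_elts = j + (l.has_return ? 1 : 0) + (l.has_update ? 1 : 0);
  return l;
}
```

For the mask {r4, fp, pc} this gives five elements: the return, the SP
update by 12, and loads from SP+0, SP+4 and SP+8.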

ChangeLog:

gcc

2012-05-31  Ian Bolton  <ian.bolton@arm.com>
            Sameera Deshpande  <sameera.deshpande@arm.com>
            Greta Yorsh  <greta.yorsh@arm.com>

        * config/arm/arm.md (load_multiple_with_writeback): New define_insn.
        (load_multiple, pop_multiple_with_writeback_and_return): Likewise.
        (pop_multiple_with_return, ldr_with_return): Likewise.
        * config/arm/predicates.md (pop_multiple_return): New special
        predicate.
        * config/arm/arm-protos.h (arm_output_multireg_pop): New declaration.
        * config/arm/arm.c (arm_output_multireg_pop): New function.
        (arm_emit_multi_reg_pop): New function.
        (ldm_stm_operation_p): Check SP in the register list.

[-- Attachment #2: 2-patterns.patch.txt --]
[-- Type: text/plain, Size: 11100 bytes --]

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 53c2aef..7b25e37 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -156,6 +156,7 @@ extern int    arm_emit_vector_const (FILE *, rtx);
 extern void arm_emit_fp16_const (rtx c);
 extern const char * arm_output_load_gr (rtx *);
 extern const char *vfp_output_fstmd (rtx *);
+extern void arm_output_multireg_pop (rtx *, bool, rtx, bool, bool);
 extern void arm_set_return_address (rtx, rtx);
 extern int arm_eliminable_register (rtx);
 extern const char *arm_output_shift(rtx *, int);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 4717725..9093801 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13815,6 +13815,84 @@ vfp_output_fldmd (FILE * stream, unsigned int base, int reg, int count)
 }
 
 
+/* OPERANDS[0] is the entire list of insns that constitute pop,
+   OPERANDS[1] is the base register, RETURN_PC is true iff return insn
+   is in the list, UPDATE is true iff the list contains explicit
+   update of base register.
+ */
+void
+arm_output_multireg_pop (rtx *operands, bool return_pc, rtx cond, bool reverse,
+                         bool update)
+{
+  int i;
+  char pattern[100];
+  int offset;
+  const char *conditional;
+  int num_saves = XVECLEN (operands[0], 0);
+  unsigned int regno;
+  unsigned int regno_base = REGNO (operands[1]);
+
+  offset = 0;
+  offset += update ? 1 : 0;
+  offset += return_pc ? 1 : 0;
+
+  /* Is the base register in the list? */
+  for (i = offset; i < num_saves; i++)
+    {
+      regno = REGNO (XEXP (XVECEXP (operands[0], 0, i), 0));
+      /* If SP is in the list, then the base register must be SP. */
+      gcc_assert ((regno != SP_REGNUM) || (regno_base == SP_REGNUM));
+      /* If base register is in the list, there must be no explicit update.  */
+      if (regno == regno_base)
+        gcc_assert (!update);
+    }
+
+  conditional = reverse ? "%?%D0" : "%?%d0";
+  if ((regno_base == SP_REGNUM) && TARGET_UNIFIED_ASM)
+    {
+      /* Output pop (not ldmfd) because it has a shorter encoding. */
+      gcc_assert (update);
+      sprintf (pattern, "pop%s\t{", conditional);
+    }
+  else
+    {
+      /* Output ldmfd when the base register is SP, otherwise output ldmia.
+         It's just a convention, their semantics are identical.  */
+      if (regno_base == SP_REGNUM)
+        sprintf (pattern, "ldm%sfd\t", conditional);
+      else if (TARGET_UNIFIED_ASM)
+        sprintf (pattern, "ldmia%s\t", conditional);
+      else
+        sprintf (pattern, "ldm%sia\t", conditional);
+
+      strcat (pattern, reg_names[regno_base]);
+      if (update)
+        strcat (pattern, "!, {");
+      else
+        strcat (pattern, ", {");
+    }
+
+  /* Output the first destination register. */
+  strcat (pattern,
+          reg_names[REGNO (XEXP (XVECEXP (operands[0], 0, offset), 0))]);
+
+  /* Output the rest of the destination registers.  */
+  for (i = offset + 1; i < num_saves; i++)
+    {
+      strcat (pattern, ", ");
+      strcat (pattern,
+              reg_names[REGNO (XEXP (XVECEXP (operands[0], 0, i), 0))]);
+    }
+
+  strcat (pattern, "}");
+
+  if (IS_INTERRUPT (arm_current_func_type ()) && return_pc)
+    strcat (pattern, "^");
+
+  output_asm_insn (pattern, &cond);
+}
+
+
 /* Output the assembly for a store multiple.  */
 
 const char *
@@ -16461,6 +16539,85 @@ emit_multi_reg_push (unsigned long mask)
   return par;
 }
 
+/* Generate and emit an insn pattern that we will recognize as a pop_multi.
+   SAVED_REGS_MASK shows which registers need to be restored.
+
+   Unfortunately, since this insn does not reflect very well the actual
+   semantics of the operation, we need to annotate the insn for the benefit
+   of DWARF2 frame unwind information.  */
+static void
+arm_emit_multi_reg_pop (unsigned long saved_regs_mask)
+{
+  int num_regs = 0;
+  int i, j;
+  rtx par;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg;
+  bool return_in_pc;
+  int offset_adj;
+  int emit_update;
+
+  return_in_pc = (saved_regs_mask & (1 << PC_REGNUM)) ? true : false;
+  offset_adj = return_in_pc ? 1 : 0;
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  gcc_assert (num_regs && num_regs <= 16);
+
+  /* If SP is in reglist, then we don't emit SP update insn.  */
+  emit_update = (saved_regs_mask & (1 << SP_REGNUM)) ? 0 : 1;
+
+  /* The parallel needs to hold num_regs SETs
+     and one SET for the stack update.  */
+  par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (num_regs + emit_update + offset_adj));
+
+  if (return_in_pc)
+    {
+      tmp = ret_rtx;
+      XVECEXP (par, 0, 0) = tmp;
+    }
+
+  if (emit_update)
+    {
+      /* Increment the stack pointer, based on there being
+         num_regs 4-byte registers to restore.  */
+      tmp = gen_rtx_SET (VOIDmode,
+                         stack_pointer_rtx,
+                         plus_constant (stack_pointer_rtx, 4 * num_regs));
+      RTX_FRAME_RELATED_P (tmp) = 1;
+      XVECEXP (par, 0, offset_adj) = tmp;
+    }
+
+  /* Now restore every reg, which may include PC.  */
+  for (j = 0, i = 0; j < num_regs; i++)
+    if (saved_regs_mask & (1 << i))
+      {
+        reg = gen_rtx_REG (SImode, i);
+        tmp = gen_rtx_SET (VOIDmode,
+                           reg,
+                           gen_frame_mem
+                           (SImode,
+                            plus_constant (stack_pointer_rtx, 4 * j)));
+        RTX_FRAME_RELATED_P (tmp) = 1;
+        XVECEXP (par, 0, j + emit_update + offset_adj) = tmp;
+
+        /* We need to maintain a sequence for DWARF info too.  As dwarf info
+           should not have PC, skip PC.  */
+        if (i != PC_REGNUM)
+          dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+
+        j++;
+      }
+
+  if (return_in_pc)
+    par = emit_jump_insn (par);
+  else
+    par = emit_insn (par);
+
+  REG_NOTES (par) = dwarf;
+}
+
 /* Calculate the size of the return value that is passed in registers.  */
 static unsigned
 arm_size_return_regs (void)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index ed33c9b..862ccf4 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -10959,6 +10959,89 @@
   [(set_attr "type" "f_fpa_store")]
 )
 
+;; Pop (as used in epilogue RTL)
+;;
+(define_insn "*load_multiple_with_writeback"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 1 "s_register_operand" "+rk")
+          (plus:SI (match_dup 1)
+                   (match_operand:SI 2 "const_int_operand" "I")))
+     (set (match_operand:SI 3 "s_register_operand" "=rk")
+          (mem:SI (match_dup 1)))
+        ])]
+  "TARGET_32BIT && (reload_in_progress || reload_completed)"
+  "*
+  {
+    arm_output_multireg_pop (operands, /*return_pc=*/FALSE,
+                                       /*cond=*/const_true_rtx,
+                                       /*reverse=*/FALSE,
+                                       /*update=*/TRUE);
+    return \"\";
+  }
+  "
+  [(set_attr "type" "load4")
+   (set_attr "predicable" "yes")]
+)
+
+;; Pop with return (as used in epilogue RTL)
+;;
+;; This instruction is generated when the registers are popped at the end of
+;; epilogue.  Here, instead of popping the value into LR and then generating
+;; jump to LR, value is popped into PC directly.  Hence, the pattern is combined
+;;  with (return).
+(define_insn "*pop_multiple_with_writeback_and_return"
+  [(match_parallel 0 "pop_multiple_return"
+    [(return)
+     (set (match_operand:SI 1 "s_register_operand" "+rk")
+          (plus:SI (match_dup 1)
+                   (match_operand:SI 2 "const_int_operand" "I")))
+     (set (match_operand:SI 3 "s_register_operand" "=rk")
+          (mem:SI (match_dup 1)))
+        ])]
+  "TARGET_32BIT && (reload_in_progress || reload_completed)"
+  "*
+  {
+    arm_output_multireg_pop (operands, /*return_pc=*/TRUE,
+                                       /*cond=*/const_true_rtx,
+                                       /*reverse=*/FALSE,
+                                       /*update=*/TRUE);
+    return \"\";
+  }
+  "
+  [(set_attr "type" "load4")
+   (set_attr "predicable" "yes")]
+)
+
+(define_insn "*pop_multiple_with_return"
+  [(match_parallel 0 "pop_multiple_return"
+    [(return)
+     (set (match_operand:SI 2 "s_register_operand" "=rk")
+          (mem:SI (match_operand:SI 1 "s_register_operand" "rk")))
+        ])]
+  "TARGET_32BIT && (reload_in_progress || reload_completed)"
+  "*
+  {
+    arm_output_multireg_pop (operands, /*return_pc=*/TRUE,
+                                       /*cond=*/const_true_rtx,
+                                       /*reverse=*/FALSE,
+                                       /*update=*/FALSE);
+    return \"\";
+  }
+  "
+  [(set_attr "type" "load4")
+   (set_attr "predicable" "yes")]
+)
+
+;; Load into PC and return
+(define_insn "*ldr_with_return"
+  [(return)
+   (set (reg:SI PC_REGNUM)
+        (mem:SI (post_inc:SI (match_operand:SI 0 "s_register_operand" "+rk"))))]
+  "TARGET_32BIT && (reload_in_progress || reload_completed)"
+  "ldr%?\t%|pc, [%0], #4"
+  [(set_attr "type" "load1")
+   (set_attr "predicable" "yes")]
+)
 ;; Special patterns for dealing with the constant pool
 
 (define_insn "align_4"
@@ -11390,6 +11473,27 @@
 
 ;; Load the load/store multiple patterns
 (include "ldmstm.md")
+
+;; Patterns in ldmstm.md don't cover more than 4 registers. This pattern covers
+;; large lists without explicit writeback generated for APCS_FRAME epilogue.
+(define_insn "*load_multiple"
+  [(match_parallel 0 "load_multiple_operation"
+    [(set (match_operand:SI 2 "s_register_operand" "=rk")
+          (mem:SI (match_operand:SI 1 "s_register_operand" "rk")))
+        ])]
+  "TARGET_32BIT"
+  "*
+  {
+    arm_output_multireg_pop (operands, /*return_pc=*/FALSE,
+                                       /*cond=*/const_true_rtx,
+                                       /*reverse=*/FALSE,
+                                       /*update=*/FALSE);
+    return \"\";
+  }
+  "
+  [(set_attr "predicable" "yes")]
+)
+
 ;; Load the FPA co-processor patterns
 (include "fpa.md")
 ;; Load the Maverick co-processor patterns
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 428f9e0..24dd4ea 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -393,6 +393,14 @@
                                  /*return_pc=*/false);
 })
 
+(define_special_predicate "pop_multiple_return"
+  (match_code "parallel")
+{
+ return ldm_stm_operation_p (op, /*load=*/true, SImode,
+                                 /*consecutive=*/false,
+                                 /*return_pc=*/true);
+})
+
 (define_special_predicate "multi_register_push"
   (match_code "parallel")
 {


* [Patch, ARM][3/8] Epilogue in RTL: new patterns for vfp regs
  2012-05-31 13:44 [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I) Greta Yorsh
  2012-05-31 13:51 ` [Patch, ARM][1/8] Epilogue in RTL: update ldm_stm_operation_p Greta Yorsh
  2012-05-31 13:54 ` [Patch, ARM][2/8] Epilogue in RTL: new patterns for int regs Greta Yorsh
@ 2012-05-31 13:55 ` Greta Yorsh
  2012-06-15 10:44   ` Richard Earnshaw
  2012-05-31 13:59 ` [Patch, ARM][4/8] Epilogue in RTL: expand epilogue for apcs frame Greta Yorsh
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Greta Yorsh @ 2012-05-31 13:55 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 617 bytes --]

This patch adds a new define_insn pattern for epilogue with floating point
registers (DFmode) and a new function that emits RTL for this pattern. This
function is a helper for epilogue expansion. It is used by a later patch.
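
As an illustration of the 16-register limit handled by the new function, the
sketch below splits a run of D-registers into chunks that each fit one pop
instruction and formats one chunk as assembly. It is not the patch code: the
names are hypothetical, it works on D-register indices (d0..d31) rather than
GCC's internal register numbers, and it leaves out the ARM10 VFPr1
workaround.

```c
#include <stdio.h>

#define MAX_D_REGS_PER_POP 16

/* Hypothetical description of one VFP pop instruction.  */
struct vfp_chunk
{
  int first_dreg;  /* Index of the first D-register, e.g. 8 for d8.  */
  int num_dregs;   /* Number of consecutive D-registers.  */
};

/* Split NUM_DREGS consecutive D-registers starting at FIRST_DREG into
   chunks of at most 16.  Stores up to MAX_CHUNKS entries in CHUNKS and
   returns the number of chunks stored.  */
int
split_vfp_pop (int first_dreg, int num_dregs,
               struct vfp_chunk *chunks, int max_chunks)
{
  int n = 0;

  while (num_dregs > 0 && n < max_chunks)
    {
      int count = num_dregs > MAX_D_REGS_PER_POP
                  ? MAX_D_REGS_PER_POP : num_dregs;

      chunks[n].first_dreg = first_dreg;
      chunks[n].num_dregs = count;
      n++;

      first_dreg += count;
      num_dregs -= count;
    }

  return n;
}

/* Format one chunk as a pop with writeback on base register BASE.
   Returns the snprintf result.  */
int
format_vfp_pop (char *buf, size_t size, const char *base,
                struct vfp_chunk chunk)
{
  if (chunk.num_dregs == 1)
    return snprintf (buf, size, "fldmfdd\t%s!, {d%d}",
                     base, chunk.first_dreg);

  return snprintf (buf, size, "fldmfdd\t%s!, {d%d-d%d}", base,
                   chunk.first_dreg,
                   chunk.first_dreg + chunk.num_dregs - 1);
}
```

Restoring d0..d31 yields two chunks of 16 registers each, and a single chunk
covering d8..d15 with base ip is formatted as "fldmfdd ip!, {d8-d15}" (with
a tab after the mnemonic).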

ChangeLog:

gcc

2012-05-31  Ian Bolton  <ian.bolton@arm.com>
            Sameera Deshpande  <sameera.deshpande@arm.com>
            Greta Yorsh  <greta.yorsh@arm.com>

        * config/arm/arm.md (vfp_pop_multiple_with_writeback): New
        define_insn.
        * config/arm/predicates.md (pop_multiple_fp): New special predicate.
        * config/arm/arm.c (arm_emit_vfp_multi_reg_pop): New function.

[-- Attachment #2: 3-patterns-vfp.patch.txt --]
[-- Type: text/plain, Size: 4807 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9093801..491ffea 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -16618,6 +16618,76 @@ arm_emit_multi_reg_pop (unsigned long saved_regs_mask)
   REG_NOTES (par) = dwarf;
 }
 
+/* Generate and emit an insn pattern that we will recognize as a pop_multi
+   of NUM_REGS consecutive VFP regs, starting at FIRST_REG.
+
+   Unfortunately, since this insn does not reflect very well the actual
+   semantics of the operation, we need to annotate the insn for the benefit
+   of DWARF2 frame unwind information.  */
+static void
+arm_emit_vfp_multi_reg_pop (int first_reg, int num_regs, rtx base_reg)
+{
+  int i, j;
+  rtx par;
+  rtx dwarf = NULL_RTX;
+  rtx tmp, reg;
+
+  gcc_assert (num_regs && num_regs <= 32);
+
+  /* Workaround ARM10 VFPr1 bug.  */
+  if (num_regs == 2 && !arm_arch6)
+    {
+      if (first_reg == 15)
+        first_reg--;
+
+      num_regs++;
+    }
+
+  /* We can emit at most 16 D-registers in a single pop_multi instruction, and
+     there could be up to 32 D-registers to restore.
+     If there are more than 16 D-registers, make two recursive calls,
+     each of which emits one pop_multi instruction.  */
+  if (num_regs > 16)
+    {
+      arm_emit_vfp_multi_reg_pop (first_reg, 16, base_reg);
+      arm_emit_vfp_multi_reg_pop (first_reg + 16, num_regs - 16, base_reg);
+      return;
+    }
+
+  /* The parallel needs to hold num_regs SETs
+     and one SET for the stack update.  */
+  par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (num_regs + 1));
+
+  /* Increment the stack pointer, based on there being
+     num_regs 8-byte registers to restore.  */
+  tmp = gen_rtx_SET (VOIDmode,
+                     base_reg,
+                     plus_constant (base_reg, 8 * num_regs));
+  RTX_FRAME_RELATED_P (tmp) = 1;
+  XVECEXP (par, 0, 0) = tmp;
+
+  /* Now show every reg that will be restored, using a SET for each.  */
+  for (j = 0, i=first_reg; j < num_regs; i += 2)
+    {
+      reg = gen_rtx_REG (DFmode, i);
+
+      tmp = gen_rtx_SET (VOIDmode,
+                         reg,
+                         gen_frame_mem
+                         (DFmode,
+                          plus_constant (base_reg, 8 * j)));
+      RTX_FRAME_RELATED_P (tmp) = 1;
+      XVECEXP (par, 0, j + 1) = tmp;
+
+      dwarf = alloc_reg_note (REG_CFA_RESTORE, reg, dwarf);
+
+      j++;
+    }
+
+  par = emit_insn (par);
+  REG_NOTES (par) = dwarf;
+}
+
 /* Calculate the size of the return value that is passed in registers.  */
 static unsigned
 arm_size_return_regs (void)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 862ccf4..98387fa 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -11042,6 +11042,41 @@
   [(set_attr "type" "load1")
    (set_attr "predicable" "yes")]
 )
+;; Pop for floating point registers (as used in epilogue RTL)
+(define_insn "*vfp_pop_multiple_with_writeback"
+  [(match_parallel 0 "pop_multiple_fp"
+    [(set (match_operand:SI 1 "s_register_operand" "+rk")
+          (plus:SI (match_dup 1)
+                   (match_operand:SI 2 "const_int_operand" "I")))
+     (set (match_operand:DF 3 "arm_hard_register_operand" "")
+          (mem:DF (match_dup 1)))])]
+  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP"
+  "*
+  {
+    int num_regs = XVECLEN (operands[0], 0);
+    char pattern[100];
+    rtx op_list[2];
+    strcpy (pattern, \"fldmfdd\\t\");
+    strcat (pattern, reg_names[REGNO (SET_DEST (XVECEXP (operands[0], 0, 0)))]);
+    strcat (pattern, \"!, {\");
+    op_list[0] = XEXP (XVECEXP (operands[0], 0, 1), 0);
+    strcat (pattern, \"%P0\");
+    if ((num_regs - 1) > 1)
+      {
+        strcat (pattern, \"-%P1\");
+        op_list [1] = XEXP (XVECEXP (operands[0], 0, num_regs - 1), 0);
+      }
+
+    strcat (pattern, \"}\");
+    output_asm_insn (pattern, op_list);
+    return \"\";
+  }
+  "
+  [(set_attr "type" "load4")
+   (set_attr "conds" "unconditional")
+   (set_attr "predicable" "no")]
+)
+
 ;; Special patterns for dealing with the constant pool
 
 (define_insn "align_4"
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 24dd4ea..92114bd 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -401,6 +401,14 @@
                                  /*return_pc=*/true);
 })
 
+(define_special_predicate "pop_multiple_fp"
+  (match_code "parallel")
+{
+ return ldm_stm_operation_p (op, /*load=*/true, DFmode,
+                                 /*consecutive=*/true,
+                                 /*return_pc=*/false);
+})
+
 (define_special_predicate "multi_register_push"
   (match_code "parallel")
 {


* [Patch, ARM][4/8] Epilogue in RTL: expand epilogue for apcs frame
  2012-05-31 13:44 [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I) Greta Yorsh
                   ` (2 preceding siblings ...)
  2012-05-31 13:55 ` [Patch, ARM][3/8] Epilogue in RTL: new patterns for vfp regs Greta Yorsh
@ 2012-05-31 13:59 ` Greta Yorsh
  2012-06-15 10:46   ` Richard Earnshaw
  2012-05-31 14:01 ` [Patch, ARM][5/8] Epilogue in RTL: expand Greta Yorsh
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Greta Yorsh @ 2012-05-31 13:59 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1128 bytes --]

Helper function for epilogue expansion. Emit RTL for APCS frame epilogue
(when -mapcs-frame command line option is specified).
This function is used by a later patch.

For APCS frame epilogue, the compiler currently generates LDM with SP as
both the base register and one of the destination registers. For example:

@ APCS_FRAME epilogue
ldmfd   sp, {r4, fp, sp, pc}

@ non-APCS_FRAME epilogue
ldmfd     sp!, {r4, fp, pc}

The use of SP in LDM register list is deprecated, but this patch does not
address the problem.
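
The register list in the APCS_FRAME example above follows from a small
adjustment of the saved-register mask, which can be sketched on its own. The
function name apcs_epilogue_pop_mask is hypothetical; the logic follows the
mask handling in arm_expand_epilogue_apcs_frame in the attached patch, under
the assumption that the prologue saved IP (holding the caller's SP), LR and
PC.

```c
#include <stdbool.h>

#define IP_REGNUM 12
#define SP_REGNUM 13
#define LR_REGNUM 14
#define PC_REGNUM 15

/* Hypothetical helper: derive the register list popped by an APCS frame
   epilogue from the mask of registers saved by the prologue.  */
unsigned int
apcs_epilogue_pop_mask (unsigned int saved_regs_mask, bool return_via_pc)
{
  /* The slot saved from IP holds the old stack pointer, so load it
     straight into SP instead of IP.  */
  saved_regs_mask &= ~(1u << IP_REGNUM);
  saved_regs_mask |= 1u << SP_REGNUM;

  if (return_via_pc)
    /* Drop LR so that the saved return address is loaded into PC.  */
    saved_regs_mask &= ~(1u << LR_REGNUM);
  else
    /* Not returning here: restore LR and leave PC alone.  */
    saved_regs_mask &= ~(1u << PC_REGNUM);

  return saved_regs_mask;
}
```

Starting from the saved set {r4, fp, ip, lr, pc} and returning through PC,
the result is {r4, fp, sp, pc}, which is the list in the ldmfd above.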

To generate the epilogue for APCS frame in RTL, this patch adds a new
alternative to the arm_addsi3 insn, in ARM mode only, to generate "sub sp,
fp, #imm". Previously, there was no pattern to generate sub with SP as the
destination register and a register other than SP as the source operand.


ChangeLog:

gcc

2012-05-31  Ian Bolton  <ian.bolton@arm.com>
            Sameera Deshpande  <sameera.deshpande@arm.com>
            Greta Yorsh  <greta.yorsh@arm.com>

        * config/arm/arm.c (arm_expand_epilogue_apcs_frame): New function.
        * config/arm/arm.md (arm_addsi3): Add an alternative.

[-- Attachment #2: 4-expand-epilog-apcs-frame.patch.txt --]
[-- Type: text/plain, Size: 10775 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 491ffea..d6b4c2e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22896,6 +22896,232 @@ thumb1_expand_epilogue (void)
     emit_use (gen_rtx_REG (SImode, LR_REGNUM));
 }
 
+/* Epilogue code for APCS frame.  */
+static void
+arm_expand_epilogue_apcs_frame (bool really_return)
+{
+  unsigned long func_type;
+  unsigned long saved_regs_mask;
+  int num_regs = 0;
+  int i;
+  int floats_from_frame = 0;
+  arm_stack_offsets *offsets;
+
+  gcc_assert (TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM);
+  func_type = arm_current_func_type ();
+
+  /* Get frame offsets for ARM.  */
+  offsets = arm_get_frame_offsets ();
+  saved_regs_mask = offsets->saved_regs_mask;
+
+  /* Find the offset of the floating-point save area in the frame.  */
+  floats_from_frame = offsets->saved_args - offsets->frame;
+
+  /* Compute how many core registers saved and how far away the floats are.  */
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      {
+        num_regs++;
+        floats_from_frame += 4;
+      }
+
+  if (TARGET_HARD_FLOAT && TARGET_VFP)
+    {
+      int start_reg;
+
+      /* The offset is from IP_REGNUM.  */
+      int saved_size = arm_get_vfp_saved_size ();
+      if (saved_size > 0)
+        {
+          floats_from_frame += saved_size;
+          emit_insn (gen_addsi3 (gen_rtx_REG (SImode, IP_REGNUM),
+                                 hard_frame_pointer_rtx,
+                                 GEN_INT (-floats_from_frame)));
+        }
+
+      /* Generate VFP register multi-pop.  */
+      start_reg = FIRST_VFP_REGNUM;
+
+      for (i = FIRST_VFP_REGNUM; i < LAST_VFP_REGNUM; i += 2)
+        /* Look for a case where a reg does not need restoring.  */
+        if ((!df_regs_ever_live_p (i) || call_used_regs[i])
+            && (!df_regs_ever_live_p (i + 1)
+                || call_used_regs[i + 1]))
+          {
+            if (start_reg != i)
+              arm_emit_vfp_multi_reg_pop (start_reg,
+                                          (i - start_reg) / 2,
+                                          gen_rtx_REG (SImode,
+                                                       IP_REGNUM));
+            start_reg = i + 2;
+          }
+
+      /* Restore the remaining regs that we have discovered (or possibly
+         even all of them, if the conditional in the for loop never
+         fired).  */
+      if (start_reg != i)
+        arm_emit_vfp_multi_reg_pop (start_reg,
+                                    (i - start_reg) / 2,
+                                    gen_rtx_REG (SImode, IP_REGNUM));
+    }
+  else if (TARGET_FPA_EMU2)
+    {
+      for (i = LAST_FPA_REGNUM; i >= FIRST_FPA_REGNUM; i--)
+        if (df_regs_ever_live_p (i) && !call_used_regs[i])
+          {
+            rtx addr;
+            rtx insn;
+            floats_from_frame += 12;
+            addr = gen_rtx_MEM (XFmode,
+                                gen_rtx_PLUS (SImode,
+                                              hard_frame_pointer_rtx,
+                                              GEN_INT (- floats_from_frame)));
+            set_mem_alias_set (addr, get_frame_alias_set ());
+            insn = emit_insn (gen_rtx_SET (XFmode,
+                                           gen_rtx_REG (XFmode, i),
+                                           addr));
+            REG_NOTES (insn) = alloc_reg_note (REG_CFA_RESTORE,
+                                               gen_rtx_REG (XFmode, i),
+                                               NULL_RTX);
+          }
+    }
+  else
+    {
+      int idx = 0;
+      rtx load_seq[4];
+      rtx dwarf = NULL_RTX;
+      rtx par;
+      rtx frame_mem;
+
+      for (i = LAST_FPA_REGNUM; i >= FIRST_FPA_REGNUM; i--)
+        {
+          /* We can't unstack more than four registers at once.  */
+          if (idx == 4)
+            {
+              par = emit_insn (gen_rtx_PARALLEL (VOIDmode,
+                                                 gen_rtvec_v (idx, load_seq)));
+              REG_NOTES (par) = dwarf;
+              dwarf = NULL_RTX;
+              idx = 0;
+            }
+
+          if (df_regs_ever_live_p (i) && !call_used_regs[i])
+            {
+              floats_from_frame += 12;
+
+              frame_mem = gen_frame_mem (XFmode,
+                                         plus_constant (hard_frame_pointer_rtx,
+                                                        - floats_from_frame));
+              load_seq[idx] = gen_rtx_SET (VOIDmode, gen_rtx_REG (XFmode, i),
+                                           frame_mem);
+              dwarf = alloc_reg_note (REG_CFA_RESTORE, gen_rtx_REG (XFmode, i),
+                                      dwarf);
+              idx++;
+            }
+          else if (idx)
+            {
+              /* Registers must be consecutive.  */
+              par = emit_insn (gen_rtx_PARALLEL (VOIDmode,
+                                                 gen_rtvec_v (idx, load_seq)));
+              REG_NOTES (par) = dwarf;
+              dwarf = NULL_RTX;
+              idx = 0;
+            }
+        }
+
+      /* Pop the last registers.  */
+      if (idx)
+        {
+          par = emit_insn (gen_rtx_PARALLEL (VOIDmode,
+                                             gen_rtvec_v (idx, load_seq)));
+          REG_NOTES (par) = dwarf;
+        }
+    }
+
+  if (TARGET_IWMMXT)
+    {
+      /* The frame pointer is guaranteed to be non-double-word aligned, as
+         it is set to double-word-aligned old_stack_pointer - 4.  */
+      rtx insn;
+      int lrm_count = (num_regs % 2) ? (num_regs + 2) : (num_regs + 1);
+
+      for (i = LAST_IWMMXT_REGNUM; i >= FIRST_IWMMXT_REGNUM; i--)
+        if (df_regs_ever_live_p (i) && !call_used_regs[i])
+          {
+            rtx addr = gen_frame_mem (V2SImode,
+                                 plus_constant (hard_frame_pointer_rtx,
+                                                - lrm_count * 4));
+            insn = emit_insn (gen_movsi (gen_rtx_REG (V2SImode, i), addr));
+            REG_NOTES (insn) = alloc_reg_note (REG_CFA_RESTORE,
+                                               gen_rtx_REG (V2SImode, i),
+                                               NULL_RTX);
+            lrm_count += 2;
+          }
+    }
+
+  /* saved_regs_mask should contain IP which contains old stack pointer
+     at the time of activation creation.  Since SP and IP are adjacent registers,
+     we can restore the value directly into SP.  */
+  gcc_assert (saved_regs_mask & (1 << IP_REGNUM));
+  saved_regs_mask &= ~(1 << IP_REGNUM);
+  saved_regs_mask |= (1 << SP_REGNUM);
+
+  /* There are two registers left in saved_regs_mask - LR and PC.  We
+     only need to restore LR (the return address), but to
+     save time we can load it directly into PC, unless we need a
+     special function exit sequence, or we are not really returning.  */
+  if (really_return
+      && ARM_FUNC_TYPE (func_type) == ARM_FT_NORMAL
+      && !crtl->calls_eh_return)
+    /* Delete LR from the register mask, so that the LR value saved on
+       the stack is loaded directly into PC.  */
+    saved_regs_mask &= ~(1 << LR_REGNUM);
+  else
+    saved_regs_mask &= ~(1 << PC_REGNUM);
+
+  num_regs = bit_count (saved_regs_mask);
+  if ((offsets->outgoing_args != (1 + num_regs)) || cfun->calls_alloca)
+    {
+      /* Unwind the stack to just below the saved registers.  */
+      emit_insn (gen_addsi3 (stack_pointer_rtx,
+                             hard_frame_pointer_rtx,
+                             GEN_INT (- 4 * num_regs)));
+    }
+
+  arm_emit_multi_reg_pop (saved_regs_mask);
+
+  if (IS_INTERRUPT (func_type))
+    {
+      /* Interrupt handlers will have pushed the
+         IP onto the stack, so restore it now.  */
+      rtx insn;
+      rtx addr = gen_rtx_MEM (SImode,
+                              gen_rtx_POST_INC (SImode,
+                              stack_pointer_rtx));
+      set_mem_alias_set (addr, get_frame_alias_set ());
+      insn = emit_insn (gen_movsi (gen_rtx_REG (SImode, IP_REGNUM), addr));
+      REG_NOTES (insn) = alloc_reg_note (REG_CFA_RESTORE,
+                                         gen_rtx_REG (SImode, IP_REGNUM),
+                                         NULL_RTX);
+    }
+
+  if (!really_return || (saved_regs_mask & (1 << PC_REGNUM)))
+    return;
+
+  if (crtl->calls_eh_return)
+    emit_insn (gen_addsi3 (stack_pointer_rtx,
+                           stack_pointer_rtx,
+                           gen_rtx_REG (SImode, ARM_EH_STACKADJ_REGNUM)));
+
+  if (IS_STACKALIGN (func_type))
+    /* Restore the original stack pointer.  Before prologue, the stack was
+       realigned and the original stack pointer saved in r0.  For details,
+       see comment in arm_expand_prologue.  */
+    emit_insn (gen_movsi (stack_pointer_rtx, gen_rtx_REG (SImode, 0)));
+
+  emit_jump_insn (simple_return_rtx);
+}
+
 /* Implementation of insn prologue_thumb1_interwork.  This is the first
    "instruction" of a function called in ARM mode.  Swap to thumb mode.  */
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 98387fa..3a237c8 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -718,9 +718,9 @@
 ;;  (plus (reg rN) (reg sp)) into (reg rN).  In this case reload will
 ;; put the duplicated register first, and not try the commutative version.
 (define_insn_and_split "*arm_addsi3"
-  [(set (match_operand:SI          0 "s_register_operand" "=r, k,r,r, k, r, k,r, k, r")
-	(plus:SI (match_operand:SI 1 "s_register_operand" "%rk,k,r,rk,k, rk,k,rk,k, rk")
-		 (match_operand:SI 2 "reg_or_int_operand" "rI,rI,k,Pj,Pj,L, L,PJ,PJ,?n")))]
+  [(set (match_operand:SI          0 "s_register_operand" "=r, k,r,r, k, r, k,k,r, k, r")
+	(plus:SI (match_operand:SI 1 "s_register_operand" "%rk,k,r,rk,k, rk,k,r,rk,k, rk")
+		 (match_operand:SI 2 "reg_or_int_operand" "rI,rI,k,Pj,Pj,L, L,L,PJ,PJ,?n")))]
   "TARGET_32BIT"
   "@
    add%?\\t%0, %1, %2
@@ -730,6 +730,7 @@
    addw%?\\t%0, %1, %2
    sub%?\\t%0, %1, #%n2
    sub%?\\t%0, %1, #%n2
+   sub%?\\t%0, %1, #%n2
    subw%?\\t%0, %1, #%n2
    subw%?\\t%0, %1, #%n2
    #"
@@ -744,9 +745,9 @@
 		      operands[1], 0);
   DONE;
   "
-  [(set_attr "length" "4,4,4,4,4,4,4,4,4,16")
+  [(set_attr "length" "4,4,4,4,4,4,4,4,4,4,16")
    (set_attr "predicable" "yes")
-   (set_attr "arch" "*,*,*,t2,t2,*,*,t2,t2,*")]
+   (set_attr "arch" "*,*,*,t2,t2,*,*,a,t2,t2,*")]
 )
 
 (define_insn_and_split "*thumb1_addsi3"


* [Patch, ARM][5/8] Epilogue in RTL: expand
  2012-05-31 13:44 [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I) Greta Yorsh
                   ` (3 preceding siblings ...)
  2012-05-31 13:59 ` [Patch, ARM][4/8] Epilogue in RTL: expand epilogue for apcs frame Greta Yorsh
@ 2012-05-31 14:01 ` Greta Yorsh
  2012-06-15 10:47   ` Richard Earnshaw
  2012-05-31 14:03 ` [Patch, ARM][6/8] Epilogue in RTL: simple return Greta Yorsh
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Greta Yorsh @ 2012-05-31 14:01 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 509 bytes --]

This patch adds arm_expand_epilogue, the main function for epilogue RTL
generation. It is used by the epilogue and sibcall_epilogue expand patterns.
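
As a rough illustration of the return-in-PC step in arm_expand_epilogue, the
standalone C sketch below models only the register-mask rewrite. It is not
code from the patch: the helper name fold_return_into_pop and the
can_return_in_pc flag are made up for this example, and the flag stands in
for the function-type, sibcall, pretend-args and eh_return checks that the
real function performs.

```c
#include <stdbool.h>

/* ARM core register numbers for the link register and program counter.  */
enum { LR_REGNUM = 14, PC_REGNUM = 15 };

/* Hypothetical helper, for illustration only.  If the caller says a
   return-in-PC is allowed and LR was saved, replace LR by PC in the pop
   mask and report that the pop itself performs the return.  */
static unsigned long
fold_return_into_pop (unsigned long saved_regs_mask, bool can_return_in_pc,
                      bool *return_in_pc)
{
  *return_in_pc = false;

  if (can_return_in_pc && (saved_regs_mask & (1UL << LR_REGNUM)))
    {
      saved_regs_mask &= ~(1UL << LR_REGNUM);
      saved_regs_mask |= 1UL << PC_REGNUM;
      *return_in_pc = true;
    }

  return saved_regs_mask;
}
```

With LR replaced by PC in the mask, one multi-register pop both restores the
callee-saved registers and returns, which is why arm_expand_epilogue can stop
right after the pop in that case.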

ChangeLog:

gcc

2012-05-31  Ian Bolton  <ian.bolton@arm.com>
            Sameera Deshpande  <sameera.deshpande@arm.com>
            Greta Yorsh  <greta.yorsh@arm.com>

        * config/arm/arm-protos.h (arm_expand_epilogue): New declaration.
        * config/arm/arm.c (arm_expand_epilogue): New function.
        * config/arm/arm.md (epilogue): Update condition and code.
        (sibcall_epilogue): Likewise.

[-- Attachment #2: 5-expand-epilog.patch.txt --]
[-- Type: text/plain, Size: 16081 bytes --]

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 7b25e37..f61feef 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -30,6 +30,7 @@ extern void arm_load_pic_register (unsigned long);
 extern int arm_volatile_func (void);
 extern const char *arm_output_epilogue (rtx);
 extern void arm_expand_prologue (void);
+extern void arm_expand_epilogue (bool);
 extern const char *arm_strip_name_encoding (const char *);
 extern void arm_asm_output_labelref (FILE *, const char *);
 extern void thumb2_asm_output_opcode (FILE *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d6b4c2e..c8642e2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -23122,6 +23122,326 @@ arm_expand_epilogue_apcs_frame (bool really_return)
   emit_jump_insn (simple_return_rtx);
 }
 
+/* Generate RTL to represent ARM epilogue.  Really_return is true if the
+   function is not a sibcall.  */
+void
+arm_expand_epilogue (bool really_return)
+{
+  unsigned long func_type;
+  unsigned long saved_regs_mask;
+  int num_regs = 0;
+  int i;
+  int amount;
+  int floats_from_frame = 0;
+  arm_stack_offsets *offsets;
+
+  func_type = arm_current_func_type ();
+
+  /* Naked functions have no epilogue.  Hence, generate a return pattern, and
+     let output_return_instruction take care of instruction emission if any.  */
+  if (IS_NAKED (func_type)
+      || (IS_VOLATILE (func_type) && TARGET_ABORT_NORETURN))
+    {
+      emit_jump_insn (simple_return_rtx);
+      return;
+    }
+
+  /* If we are throwing an exception, then we really must be doing a
+     return, so we can't tail-call.  */
+  gcc_assert (!crtl->calls_eh_return || really_return);
+
+  if (TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM)
+    {
+      arm_expand_epilogue_apcs_frame (really_return);
+      return;
+    }
+
+  /* Get frame offsets for ARM.  */
+  offsets = arm_get_frame_offsets ();
+  saved_regs_mask = offsets->saved_regs_mask;
+
+  /* Find offset of floating point register from frame pointer.
+     The initialization is done in this way to take care of frame pointer
+     and static-chain register, if stored.  */
+  floats_from_frame = offsets->saved_args - offsets->frame;
+  /* Count the saved core registers and how far away the floats will be.  */
+  for (i = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      {
+        num_regs++;
+        floats_from_frame += 4;
+      }
+
+  if (frame_pointer_needed)
+    {
+      /* Restore stack pointer if necessary.  */
+      if (TARGET_ARM)
+        {
+          /* In ARM mode, frame pointer points to first saved register.
+             Restore stack pointer to last saved register.  */
+          amount = offsets->frame - offsets->saved_regs;
+
+          /* Force out any pending memory operations that reference stacked data
+             before stack de-allocation occurs.  */
+          emit_insn (gen_blockage ());
+          emit_insn (gen_addsi3 (stack_pointer_rtx,
+                                 hard_frame_pointer_rtx,
+                                 GEN_INT (amount)));
+
+          /* Emit USE(stack_pointer_rtx) to ensure that stack adjustment is not
+             deleted.  */
+          emit_insn (gen_prologue_use (stack_pointer_rtx));
+        }
+      else
+        {
+          /* In Thumb-2 mode, the frame pointer points to the last saved
+             register.  */
+          amount = offsets->locals_base - offsets->saved_regs;
+          if (amount)
+            emit_insn (gen_addsi3 (hard_frame_pointer_rtx,
+                                   hard_frame_pointer_rtx,
+                                   GEN_INT (amount)));
+
+          /* Force out any pending memory operations that reference stacked data
+             before stack de-allocation occurs.  */
+          emit_insn (gen_blockage ());
+          emit_insn (gen_movsi (stack_pointer_rtx, hard_frame_pointer_rtx));
+          /* Emit USE(stack_pointer_rtx) to ensure that stack adjustment is not
+             deleted.  */
+          emit_insn (gen_prologue_use (stack_pointer_rtx));
+        }
+    }
+  else
+    {
+      /* Pop off outgoing args and local frame to adjust stack pointer to
+         last saved register.  */
+      amount = offsets->outgoing_args - offsets->saved_regs;
+      if (amount)
+        {
+          /* Force out any pending memory operations that reference stacked data
+             before stack de-allocation occurs.  */
+          emit_insn (gen_blockage ());
+          emit_insn (gen_addsi3 (stack_pointer_rtx,
+                                 stack_pointer_rtx,
+                                 GEN_INT (amount)));
+          /* Emit USE(stack_pointer_rtx) to ensure that stack adjustment is
+             not deleted.  */
+          emit_insn (gen_prologue_use (stack_pointer_rtx));
+        }
+    }
+
+  if (TARGET_HARD_FLOAT && TARGET_VFP)
+    {
+      /* Generate VFP register multi-pop.  */
+      int end_reg = LAST_VFP_REGNUM + 1;
+
+      /* Scan the registers in reverse order.  We need to match
+         any groupings made in the prologue and generate matching
+         vldm operations.  The need to match groups is because,
+         unlike pop, vldm can only do consecutive regs.  */
+      for (i = LAST_VFP_REGNUM - 1; i >= FIRST_VFP_REGNUM; i -= 2)
+        /* Look for a case where a reg does not need restoring.  */
+        if ((!df_regs_ever_live_p (i) || call_used_regs[i])
+            && (!df_regs_ever_live_p (i + 1)
+                || call_used_regs[i + 1]))
+          {
+            /* Restore the regs discovered so far (from reg+2 to
+               end_reg).  */
+            if (end_reg > i + 2)
+              arm_emit_vfp_multi_reg_pop (i + 2,
+                                          (end_reg - (i + 2)) / 2,
+                                          stack_pointer_rtx);
+            end_reg = i;
+          }
+
+      /* Restore the remaining regs that we have discovered (or possibly
+         even all of them, if the conditional in the for loop never
+         fired).  */
+      if (end_reg > i + 2)
+        arm_emit_vfp_multi_reg_pop (i + 2,
+                                    (end_reg - (i + 2)) / 2,
+                                    stack_pointer_rtx);
+    }
+  else if (TARGET_FPA_EMU2)
+    {
+      for (i = FIRST_FPA_REGNUM; i <= LAST_FPA_REGNUM; i++)
+        if (df_regs_ever_live_p (i) && !call_used_regs[i])
+          {
+            /* Generate memory reference with write-back to SP.  */
+            rtx insn;
+            rtx addr = gen_rtx_MEM (XFmode,
+                                    gen_rtx_POST_INC (SImode,
+                                                      stack_pointer_rtx));
+            set_mem_alias_set (addr, get_frame_alias_set ());
+            insn = emit_insn (gen_movxf (gen_rtx_REG (XFmode, i), addr));
+            REG_NOTES (insn) = alloc_reg_note (REG_CFA_RESTORE,
+                                               gen_rtx_REG (XFmode, i),
+                                               NULL_RTX);
+          }
+    }
+  else
+    {
+      int idx = 0;
+      rtx load_seq[5];
+      rtx par;
+      rtx tmp;
+      rtx dwarf = NULL_RTX;
+
+      for (i = FIRST_FPA_REGNUM; i <= LAST_FPA_REGNUM; i++)
+        {
+          if (idx == 4)
+            {
+              load_seq[0] = gen_rtx_SET (VOIDmode,
+                                         stack_pointer_rtx,
+                                         plus_constant (stack_pointer_rtx,
+                                                        12 * idx));
+              tmp = gen_rtx_PARALLEL (VOIDmode,
+                                      gen_rtvec_v (idx + 1, load_seq));
+              par = emit_insn (tmp);
+              REG_NOTES (par) = dwarf;
+              dwarf = NULL_RTX;
+              idx = 0;
+            }
+
+          if (df_regs_ever_live_p (i) && !call_used_regs[i])
+            {
+              tmp = gen_frame_mem (XFmode,
+                                   plus_constant (stack_pointer_rtx, 12 * idx));
+              load_seq[idx + 1] = gen_rtx_SET (VOIDmode,
+                                               gen_rtx_REG (XFmode, i),
+                                               tmp);
+              dwarf = alloc_reg_note (REG_CFA_RESTORE,
+                                      gen_rtx_REG (XFmode, i),
+                                      dwarf);
+              idx++;
+            }
+          else
+            {
+              if (idx)
+                {
+                  /* Create parallel and emit.  */
+                  load_seq[0] = gen_rtx_SET (VOIDmode,
+                                             stack_pointer_rtx,
+                                             plus_constant (stack_pointer_rtx,
+                                                            12 * idx));
+                  par = emit_insn (gen_rtx_PARALLEL (VOIDmode,
+                                                     gen_rtvec_v (idx + 1,
+                                                                  load_seq)));
+                  REG_NOTES (par) = dwarf;
+                  dwarf = NULL_RTX;
+                  idx = 0;
+                }
+            }
+        }
+
+      if (idx)
+        {
+          load_seq[0] = gen_rtx_SET (VOIDmode,
+                                     stack_pointer_rtx,
+                                     plus_constant (stack_pointer_rtx,
+                                                    12 * idx));
+          par = emit_insn (gen_rtx_PARALLEL (VOIDmode,
+                                             gen_rtvec_v (idx + 1, load_seq)));
+          REG_NOTES (par) = dwarf;
+          dwarf = NULL_RTX;
+          idx = 0;
+        }
+    }
+
+  if (TARGET_IWMMXT)
+    for (i = FIRST_IWMMXT_REGNUM; i <= LAST_IWMMXT_REGNUM; i++)
+      if (df_regs_ever_live_p (i) && !call_used_regs[i])
+        {
+          rtx insn;
+          rtx addr = gen_rtx_MEM (V2SImode,
+                                  gen_rtx_POST_INC (SImode,
+                                                    stack_pointer_rtx));
+          set_mem_alias_set (addr, get_frame_alias_set ());
+          insn = emit_insn (gen_movsi (gen_rtx_REG (V2SImode, i), addr));
+          REG_NOTES (insn) = alloc_reg_note (REG_CFA_RESTORE,
+                                             gen_rtx_REG (V2SImode, i),
+                                             NULL_RTX);
+        }
+
+  if (saved_regs_mask)
+    {
+      rtx insn;
+      bool return_in_pc = false;
+
+      if (ARM_FUNC_TYPE (func_type) != ARM_FT_INTERWORKED
+          && (TARGET_ARM || ARM_FUNC_TYPE (func_type) == ARM_FT_NORMAL)
+          && !IS_STACKALIGN (func_type)
+          && really_return
+          && crtl->args.pretend_args_size == 0
+          && saved_regs_mask & (1 << LR_REGNUM)
+          && !crtl->calls_eh_return)
+        {
+          saved_regs_mask &= ~(1 << LR_REGNUM);
+          saved_regs_mask |= (1 << PC_REGNUM);
+          return_in_pc = true;
+        }
+
+      if (num_regs == 1 && (!IS_INTERRUPT (func_type) || !return_in_pc))
+        {
+          for (i = 0; i <= LAST_ARM_REGNUM; i++)
+            if (saved_regs_mask & (1 << i))
+              {
+                rtx addr = gen_rtx_MEM (SImode,
+                                        gen_rtx_POST_INC (SImode,
+                                                          stack_pointer_rtx));
+                set_mem_alias_set (addr, get_frame_alias_set ());
+
+                if (i == PC_REGNUM)
+                  {
+                    insn = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+                    XVECEXP (insn, 0, 0) = ret_rtx;
+                    XVECEXP (insn, 0, 1) = gen_rtx_SET (SImode,
+                                                        gen_rtx_REG (SImode, i),
+                                                        addr);
+                    RTX_FRAME_RELATED_P (XVECEXP (insn, 0, 1)) = 1;
+                    insn = emit_jump_insn (insn);
+                  }
+                else
+                  {
+                    insn = emit_insn (gen_movsi (gen_rtx_REG (SImode, i),
+                                                 addr));
+                    REG_NOTES (insn) = alloc_reg_note (REG_CFA_RESTORE,
+                                                       gen_rtx_REG (SImode, i),
+                                                       NULL_RTX);
+                  }
+              }
+        }
+      else
+        {
+          arm_emit_multi_reg_pop (saved_regs_mask);
+        }
+
+      if (return_in_pc == true)
+        return;
+    }
+
+  if (crtl->args.pretend_args_size)
+    emit_insn (gen_addsi3 (stack_pointer_rtx,
+                           stack_pointer_rtx,
+                           GEN_INT (crtl->args.pretend_args_size)));
+
+  if (!really_return)
+    return;
+
+  if (crtl->calls_eh_return)
+    emit_insn (gen_addsi3 (stack_pointer_rtx,
+                           stack_pointer_rtx,
+                           gen_rtx_REG (SImode, ARM_EH_STACKADJ_REGNUM)));
+
+  if (IS_STACKALIGN (func_type))
+    /* Restore the original stack pointer.  Before prologue, the stack was
+       realigned and the original stack pointer saved in r0.  For details,
+       see comment in arm_expand_prologue.  */
+    emit_insn (gen_movsi (stack_pointer_rtx, gen_rtx_REG (SImode, 0)));
+
+  emit_jump_insn (simple_return_rtx);
+}
+
 /* Implementation of insn prologue_thumb1_interwork.  This is the first
    "instruction" of a function called in ARM mode.  Swap to thumb mode.  */
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 3a237c8..d1c1894 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -10625,14 +10625,21 @@
   if (crtl->calls_eh_return)
     emit_insn (gen_prologue_use (gen_rtx_REG (Pmode, 2)));
   if (TARGET_THUMB1)
-    thumb1_expand_epilogue ();
-  else if (USE_RETURN_INSN (FALSE))
-    {
-      emit_jump_insn (gen_return ());
-      DONE;
-    }
-  emit_jump_insn (gen_rtx_UNSPEC_VOLATILE (VOIDmode,
-	gen_rtvec (1, ret_rtx), VUNSPEC_EPILOGUE));
+   {
+     thumb1_expand_epilogue ();
+     emit_jump_insn (gen_rtx_UNSPEC_VOLATILE (VOIDmode,
+                     gen_rtvec (1, ret_rtx), VUNSPEC_EPILOGUE));
+   }
+  else if (HAVE_return)
+   {
+     /* HAVE_return already tests USE_RETURN_INSN (FALSE), so there is
+        no need to test it again explicitly.  */
+     emit_jump_insn (gen_return ());
+   }
+  else if (TARGET_32BIT)
+   {
+    arm_expand_epilogue (true);
+   }
   DONE;
   "
 )
@@ -10649,22 +10656,14 @@
 ;; to add an unspec of the link register to ensure that flow
 ;; does not think that it is unused by the sibcall branch that
 ;; will replace the standard function epilogue.
-(define_insn "sibcall_epilogue"
-  [(parallel [(unspec:SI [(reg:SI LR_REGNUM)] UNSPEC_PROLOGUE_USE)
-              (unspec_volatile [(return)] VUNSPEC_EPILOGUE)])]
-  "TARGET_32BIT"
-  "*
-  if (use_return_insn (FALSE, next_nonnote_insn (insn)))
-    return output_return_instruction (const_true_rtx, FALSE, FALSE);
-  return arm_output_epilogue (next_nonnote_insn (insn));
-  "
-;; Length is absolute worst case
-  [(set_attr "length" "44")
-   (set_attr "type" "block")
-   ;; We don't clobber the conditions, but the potential length of this
-   ;; operation is sufficient to make conditionalizing the sequence 
-   ;; unlikely to be profitable.
-   (set_attr "conds" "clob")]
+(define_expand "sibcall_epilogue"
+   [(parallel [(unspec:SI [(reg:SI LR_REGNUM)] UNSPEC_PROLOGUE_USE)
+               (unspec_volatile [(return)] VUNSPEC_EPILOGUE)])]
+   "TARGET_32BIT"
+   "
+   arm_expand_epilogue (false);
+   DONE;
+   "
 )
 
 (define_insn "*epilogue_insns"


* [Patch, ARM][6/8] Epilogue in RTL: simple return
  2012-05-31 13:44 [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I) Greta Yorsh
                   ` (4 preceding siblings ...)
  2012-05-31 14:01 ` [Patch, ARM][5/8] Epilogue in RTL: expand Greta Yorsh
@ 2012-05-31 14:03 ` Greta Yorsh
  2012-06-15 10:49   ` Richard Earnshaw
  2012-05-31 14:05 ` [Patch, ARM][7/8] Epilogue in RTL: expand thumb2 return Greta Yorsh
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 21+ messages in thread
From: Greta Yorsh @ 2012-05-31 14:03 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 686 bytes --]

Add a new parameter to the function output_return_instruction to handle
simple cases of return when no epilogue needs to be printed out.
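
To make the intent of the new parameter concrete, here is a toy model in C.
It is only a sketch: the enum and the function toy_return_kind_for are
invented for this note and do not exist in GCC. It mirrors the one condition
the patch changes, namely that saved registers are restored by the printer
only when the epilogue has not already been expanded to RTL.

```c
/* Toy model, not GCC code: what the return printer still has to emit.  */
enum toy_return_kind
{
  TOY_POP_AND_RETURN,   /* Restore saved registers, then return.  */
  TOY_PLAIN_RETURN      /* Only the return itself.  */
};

/* SIMPLE_RETURN nonzero means the epilogue was already expanded to RTL,
   so the saved registers must not be restored a second time.  */
static enum toy_return_kind
toy_return_kind_for (unsigned long live_regs_mask, int simple_return)
{
  if (!simple_return && live_regs_mask)
    return TOY_POP_AND_RETURN;

  return TOY_PLAIN_RETURN;
}
```

Existing callers pass FALSE for the new argument and keep their old
behaviour. Only the simple_return patterns pass TRUE.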

ChangeLog:

gcc

2012-05-31  Ian Bolton  <ian.bolton@arm.com>
            Sameera Deshpande  <sameera.deshpande@arm.com>
            Greta Yorsh  <greta.yorsh@arm.com>

        * config/arm/arm-protos.h (output_return_instruction): New
        parameter.
        * config/arm/arm.c (output_return_instruction): New parameter.
        * config/arm/arm.md (arm_simple_return): New pattern.
        (arm_return, cond_return, cond_return_inverted): Add new arguments.
        * config/arm/thumb2.md (thumb2_return): Update condition and code.

[-- Attachment #2: 6-simple-return.patch.txt --]
[-- Type: text/plain, Size: 4430 bytes --]

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f61feef..01cd794 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -148,7 +148,7 @@ extern int arm_address_offset_is_imm (rtx);
 extern const char *output_add_immediate (rtx *);
 extern const char *arithmetic_instr (rtx, int);
 extern void output_ascii_pseudo_op (FILE *, const unsigned char *, int);
-extern const char *output_return_instruction (rtx, int, int);
+extern const char *output_return_instruction (rtx, int, int, int);
 extern void arm_poke_function_name (FILE *, const char *);
 extern void arm_final_prescan_insn (rtx);
 extern int arm_debugger_arg_offset (int, rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c8642e2..e7a74e0 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -15592,9 +15592,11 @@ arm_get_vfp_saved_size (void)
 
 
 /* Generate a function exit sequence.  If REALLY_RETURN is false, then do
-   everything bar the final return instruction.  */
+   everything bar the final return instruction.  If SIMPLE_RETURN is true,
+   then do not output the epilogue; it has already been emitted in RTL.  */
 const char *
-output_return_instruction (rtx operand, int really_return, int reverse)
+output_return_instruction (rtx operand, int really_return, int reverse,
+                           int simple_return)
 {
   char conditional[10];
   char instr[100];
@@ -15637,7 +15639,7 @@ output_return_instruction (rtx operand, int really_return, int reverse)
   offsets = arm_get_frame_offsets ();
   live_regs_mask = offsets->saved_regs_mask;
 
-  if (live_regs_mask)
+  if (!simple_return && live_regs_mask)
     {
       const char * return_reg;
 
@@ -15765,7 +15767,7 @@ output_return_instruction (rtx operand, int really_return, int reverse)
 	{
 	  /* The return has already been handled
 	     by loading the LR into the PC.  */
-	  really_return = 0;
+          return "";
 	}
     }
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index d1c1894..867dcbe 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -8597,7 +8597,7 @@
         arm_ccfsm_state += 2;
         return \"\";
       }
-    return output_return_instruction (const_true_rtx, TRUE, FALSE);
+    return output_return_instruction (const_true_rtx, TRUE, FALSE, FALSE);
   }"
   [(set_attr "type" "load1")
    (set_attr "length" "12")
@@ -8618,7 +8618,7 @@
         arm_ccfsm_state += 2;
         return \"\";
       }
-    return output_return_instruction (operands[0], TRUE, FALSE);
+    return output_return_instruction (operands[0], TRUE, FALSE, FALSE);
   }"
   [(set_attr "conds" "use")
    (set_attr "length" "12")
@@ -8639,13 +8639,30 @@
         arm_ccfsm_state += 2;
         return \"\";
       }
-    return output_return_instruction (operands[0], TRUE, TRUE);
+    return output_return_instruction (operands[0], TRUE, TRUE, FALSE);
   }"
   [(set_attr "conds" "use")
    (set_attr "length" "12")
    (set_attr "type" "load1")]
 )
 
+(define_insn "*arm_simple_return"
+  [(simple_return)]
+  "TARGET_ARM"
+  "*
+  {
+    if (arm_ccfsm_state == 2)
+      {
+        arm_ccfsm_state += 2;
+        return \"\";
+      }
+    return output_return_instruction (const_true_rtx, TRUE, FALSE, TRUE);
+  }"
+  [(set_attr "type" "branch")
+   (set_attr "length" "4")
+   (set_attr "predicable" "yes")]
+)
+
 ;; Generate a sequence of instructions to determine if the processor is
 ;; in 26-bit or 32-bit mode, and return the appropriate return address
 ;; mask.
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 39a2138..b7a8423 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -635,17 +635,12 @@
    (set_attr "length" "20")]
 )
 
-;; Note: this is not predicable, to avoid issues with linker-generated
-;; interworking stubs.
 (define_insn "*thumb2_return"
-  [(return)]
-  "TARGET_THUMB2 && USE_RETURN_INSN (FALSE)"
-  "*
-  {
-    return output_return_instruction (const_true_rtx, TRUE, FALSE);
-  }"
-  [(set_attr "type" "load1")
-   (set_attr "length" "12")]
+  [(simple_return)]
+  "TARGET_THUMB2"
+  "* return output_return_instruction (const_true_rtx, TRUE, FALSE, TRUE);"
+  [(set_attr "type" "branch")
+   (set_attr "length" "4")]
 )
 
 (define_insn_and_split "thumb2_eh_return"


* [Patch, ARM][7/8] Epilogue in RTL: expand thumb2 return
  2012-05-31 13:44 [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I) Greta Yorsh
                   ` (5 preceding siblings ...)
  2012-05-31 14:03 ` [Patch, ARM][6/8] Epilogue in RTL: simple return Greta Yorsh
@ 2012-05-31 14:05 ` Greta Yorsh
  2012-06-15 10:58   ` Richard Earnshaw
  2012-05-31 14:10 ` [Patch, ARM][8/8] Epilogue in RTL: remove dead code Greta Yorsh
  2012-05-31 18:18 ` [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I) Paul Brook
  8 siblings, 1 reply; 21+ messages in thread
From: Greta Yorsh @ 2012-05-31 14:05 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 457 bytes --]

Generate RTL for the return in Thumb2 mode. The new function
thumb2_expand_return is called from the expander of the return pattern.
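
The three-way choice made by thumb2_expand_return can be sketched in
standalone C as follows. This is an illustration, not the patch itself: the
names toy_return_shape and toy_thumb2_return_shape are hypothetical, and the
sketch assumes that LR is among the saved registers whenever any register is
saved, which the real code leaves to the USE_RETURN_INSN checks.

```c
/* ARM core register numbers used by the sketch.  */
enum { LR_REGNUM = 14, PC_REGNUM = 15, LAST_ARM_REGNUM = 15 };

/* Hypothetical classification of the return sequence to generate.  */
enum toy_return_shape
{
  TOY_SIMPLE_RETURN,    /* Nothing was saved: plain return.  */
  TOY_SINGLE_POP_PC,    /* One saved register: post-increment load to PC.  */
  TOY_MULTI_POP_PC      /* Several saved registers: multi-register pop.  */
};

/* Count the saved core registers and pick the return shape.  *POP_MASK
   receives the registers to pop, with LR replaced by PC.  */
static enum toy_return_shape
toy_thumb2_return_shape (unsigned long saved_regs_mask,
                         unsigned long *pop_mask)
{
  int i, num_regs = 0;

  for (i = 0; i <= LAST_ARM_REGNUM; i++)
    if (saved_regs_mask & (1UL << i))
      num_regs++;

  if (num_regs == 0)
    {
      *pop_mask = 0;
      return TOY_SIMPLE_RETURN;
    }

  /* In both pop cases the saved LR slot is loaded into PC.  */
  *pop_mask = (saved_regs_mask & ~(1UL << LR_REGNUM)) | (1UL << PC_REGNUM);
  return num_regs == 1 ? TOY_SINGLE_POP_PC : TOY_MULTI_POP_PC;
}
```

The single-register case is kept separate because the patch emits it as a
PARALLEL of a return and one post-increment load, rather than going through
arm_emit_multi_reg_pop.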

ChangeLog:

gcc

2012-05-31  Ian Bolton  <ian.bolton@arm.com>
            Sameera Deshpande  <sameera.deshpande@arm.com>
            Greta Yorsh  <greta.yorsh@arm.com>

        * config/arm/arm-protos.h (thumb2_expand_return): New declaration.
        * config/arm/arm.c (thumb2_expand_return): New function.
        * config/arm/arm.md (return): Update condition and code.

[-- Attachment #2: 7-expand-thumb2-epilog.patch.txt --]
[-- Type: text/plain, Size: 3205 bytes --]

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 01cd794..2fef0f2 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -31,6 +31,7 @@ extern int arm_volatile_func (void);
 extern const char *arm_output_epilogue (rtx);
 extern void arm_expand_prologue (void);
 extern void arm_expand_epilogue (bool);
+extern void thumb2_expand_return (void);
 extern const char *arm_strip_name_encoding (const char *);
 extern void arm_asm_output_labelref (FILE *, const char *);
 extern void thumb2_asm_output_opcode (FILE *);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e7a74e0..8bc6dcc 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -22841,6 +22841,52 @@ thumb1_expand_prologue (void)
     cfun->machine->lr_save_eliminated = 0;
 }
 
+/* Generate pattern *pop_multiple_with_stack_update_and_return if a single
+   POP instruction can be generated.  LR should be replaced by PC.  All
+   the checks required are already done by USE_RETURN_INSN ().  Hence,
+   all we really need to check here is whether a single register or
+   multiple registers are to be popped on return.  */
+void
+thumb2_expand_return (void)
+{
+  int i, num_regs;
+  unsigned long saved_regs_mask;
+  arm_stack_offsets *offsets;
+
+  offsets = arm_get_frame_offsets ();
+  saved_regs_mask = offsets->saved_regs_mask;
+
+  for (i = 0, num_regs = 0; i <= LAST_ARM_REGNUM; i++)
+    if (saved_regs_mask & (1 << i))
+      num_regs++;
+
+  if (saved_regs_mask)
+    {
+      if (num_regs == 1)
+        {
+          rtx par = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (2));
+          rtx reg = gen_rtx_REG (SImode, PC_REGNUM);
+          rtx addr = gen_rtx_MEM (SImode,
+                                  gen_rtx_POST_INC (SImode,
+                                                    stack_pointer_rtx));
+          set_mem_alias_set (addr, get_frame_alias_set ());
+          XVECEXP (par, 0, 0) = ret_rtx;
+          XVECEXP (par, 0, 1) = gen_rtx_SET (SImode, reg, addr);
+          RTX_FRAME_RELATED_P (XVECEXP (par, 0, 1)) = 1;
+          emit_jump_insn (par);
+        }
+      else
+        {
+          saved_regs_mask &= ~ (1 << LR_REGNUM);
+          saved_regs_mask |=   (1 << PC_REGNUM);
+          arm_emit_multi_reg_pop (saved_regs_mask);
+        }
+    }
+  else
+    {
+      emit_jump_insn (simple_return_rtx);
+    }
+}
 
 void
 thumb1_expand_epilogue (void)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 867dcbe..387ca15 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -8583,8 +8583,20 @@
 
 (define_expand "return"
   [(return)]
-  "TARGET_32BIT && USE_RETURN_INSN (FALSE)"
-  "")
+  "(TARGET_ARM || (TARGET_THUMB2
+                   && ARM_FUNC_TYPE (arm_current_func_type ()) == ARM_FT_NORMAL
+                   && !IS_STACKALIGN (arm_current_func_type ())))
+    && USE_RETURN_INSN (FALSE)"
+  "
+  {
+    if (TARGET_THUMB2)
+      {
+        thumb2_expand_return ();
+        DONE;
+      }
+  }
+  "
+)
 
 ;; Often the return insn will be the same as loading from memory, so set attr
 (define_insn "*arm_return"


* [Patch, ARM][8/8] Epilogue in RTL: remove dead code
  2012-05-31 13:44 [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I) Greta Yorsh
                   ` (6 preceding siblings ...)
  2012-05-31 14:05 ` [Patch, ARM][7/8] Epilogue in RTL: expand thumb2 return Greta Yorsh
@ 2012-05-31 14:10 ` Greta Yorsh
  2012-06-15 11:05   ` Richard Earnshaw
  2012-05-31 18:18 ` [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I) Paul Brook
  8 siblings, 1 reply; 21+ messages in thread
From: Greta Yorsh @ 2012-05-31 14:10 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 725 bytes --]

As a result of the previous changes, epilogue_insns pattern can only be
generated in Thumb1. After removing other cases in define_insn for
epilogue_insns, the function arm_output_epilogue becomes dead code and can
be eliminated, along with all its helper functions.


ChangeLog:

gcc

2012-05-31  Ian Bolton  <ian.bolton@arm.com>
            Sameera Deshpande  <sameera.deshpande@arm.com>
            Greta Yorsh  <greta.yorsh@arm.com>

        * config/arm/arm-protos.h (arm_output_epilogue): Remove.
        * config/arm/arm.c (print_multi_reg): Remove.
        (vfp_output_fldmd): Likewise.
        (arm_output_epilogue): Likewise.
        * config/arm/arm.md (epilogue_insns): Update condition and code.

[-- Attachment #2: 8-remove-dead-code.patch.txt --]
[-- Type: text/plain, Size: 19083 bytes --]

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 34de513..b97773b 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -28,7 +28,6 @@ extern int use_return_insn (int, rtx);
 extern enum reg_class arm_regno_class (int);
 extern void arm_load_pic_register (unsigned long);
 extern int arm_volatile_func (void);
-extern const char *arm_output_epilogue (rtx);
 extern void arm_expand_prologue (void);
 extern void arm_expand_epilogue (bool);
 extern void thumb2_expand_return (void);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 903517d..712e38f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -13886,86 +13886,6 @@ fp_const_from_val (REAL_VALUE_TYPE *r)
   gcc_unreachable ();
 }
 
-/* Output the operands of a LDM/STM instruction to STREAM.
-   MASK is the ARM register set mask of which only bits 0-15 are important.
-   REG is the base register, either the frame pointer or the stack pointer,
-   INSTR is the possibly suffixed load or store instruction.
-   RFE is nonzero if the instruction should also copy spsr to cpsr.  */
-
-static void
-print_multi_reg (FILE *stream, const char *instr, unsigned reg,
-		 unsigned long mask, int rfe)
-{
-  unsigned i;
-  bool not_first = FALSE;
-
-  gcc_assert (!rfe || (mask & (1 << PC_REGNUM)));
-  fputc ('\t', stream);
-  asm_fprintf (stream, instr, reg);
-  fputc ('{', stream);
-
-  for (i = 0; i <= LAST_ARM_REGNUM; i++)
-    if (mask & (1 << i))
-      {
-	if (not_first)
-	  fprintf (stream, ", ");
-
-	asm_fprintf (stream, "%r", i);
-	not_first = TRUE;
-      }
-
-  if (rfe)
-    fprintf (stream, "}^\n");
-  else
-    fprintf (stream, "}\n");
-}
-
-
-/* Output a FLDMD instruction to STREAM.
-   BASE if the register containing the address.
-   REG and COUNT specify the register range.
-   Extra registers may be added to avoid hardware bugs.
-
-   We output FLDMD even for ARMv5 VFP implementations.  Although
-   FLDMD is technically not supported until ARMv6, it is believed
-   that all VFP implementations support its use in this context.  */
-
-static void
-vfp_output_fldmd (FILE * stream, unsigned int base, int reg, int count)
-{
-  int i;
-
-  /* Workaround ARM10 VFPr1 bug.  */
-  if (count == 2 && !arm_arch6)
-    {
-      if (reg == 15)
-	reg--;
-      count++;
-    }
-
-  /* FLDMD may not load more than 16 doubleword registers at a time. Split the
-     load into multiple parts if we have to handle more than 16 registers.  */
-  if (count > 16)
-    {
-      vfp_output_fldmd (stream, base, reg, 16);
-      vfp_output_fldmd (stream, base, reg + 16, count - 16);
-      return;
-    }
-
-  fputc ('\t', stream);
-  asm_fprintf (stream, "fldmfdd\t%r!, {", base);
-
-  for (i = reg; i < reg + count; i++)
-    {
-      if (i > reg)
-	fputs (", ", stream);
-      asm_fprintf (stream, "d%d", i);
-    }
-  fputs ("}\n", stream);
-
-}
-
-
 /* OPERANDS[0] is the entire list of insns that constitute pop,
    OPERANDS[1] is the base register, RETURN_PC is true iff return insn
    is in the list, UPDATE is true iff the list contains explicit
@@ -16061,451 +15981,6 @@ arm_output_function_prologue (FILE *f, HOST_WIDE_INT frame_size)
 
 }
 
-const char *
-arm_output_epilogue (rtx sibling)
-{
-  int reg;
-  unsigned long saved_regs_mask;
-  unsigned long func_type;
-  /* Floats_offset is the offset from the "virtual" frame.  In an APCS
-     frame that is $fp + 4 for a non-variadic function.  */
-  int floats_offset = 0;
-  rtx operands[3];
-  FILE * f = asm_out_file;
-  unsigned int lrm_count = 0;
-  int really_return = (sibling == NULL);
-  int start_reg;
-  arm_stack_offsets *offsets;
-
-  /* If we have already generated the return instruction
-     then it is futile to generate anything else.  */
-  if (use_return_insn (FALSE, sibling) &&
-      (cfun->machine->return_used_this_function != 0))
-    return "";
-
-  func_type = arm_current_func_type ();
-
-  if (IS_NAKED (func_type))
-    /* Naked functions don't have epilogues.  */
-    return "";
-
-  if (IS_VOLATILE (func_type) && TARGET_ABORT_NORETURN)
-    {
-      rtx op;
-
-      /* A volatile function should never return.  Call abort.  */
-      op = gen_rtx_SYMBOL_REF (Pmode, NEED_PLT_RELOC ? "abort(PLT)" : "abort");
-      assemble_external_libcall (op);
-      output_asm_insn ("bl\t%a0", &op);
-
-      return "";
-    }
-
-  /* If we are throwing an exception, then we really must be doing a
-     return, so we can't tail-call.  */
-  gcc_assert (!crtl->calls_eh_return || really_return);
-
-  offsets = arm_get_frame_offsets ();
-  saved_regs_mask = offsets->saved_regs_mask;
-
-  if (TARGET_IWMMXT)
-    lrm_count = bit_count (saved_regs_mask);
-
-  floats_offset = offsets->saved_args;
-  /* Compute how far away the floats will be.  */
-  for (reg = 0; reg <= LAST_ARM_REGNUM; reg++)
-    if (saved_regs_mask & (1 << reg))
-      floats_offset += 4;
-
-  if (TARGET_APCS_FRAME && frame_pointer_needed && TARGET_ARM)
-    {
-      /* This variable is for the Virtual Frame Pointer, not VFP regs.  */
-      int vfp_offset = offsets->frame;
-
-      if (TARGET_FPA_EMU2)
-	{
-	  for (reg = LAST_FPA_REGNUM; reg >= FIRST_FPA_REGNUM; reg--)
-	    if (df_regs_ever_live_p (reg) && !call_used_regs[reg])
-	      {
-		floats_offset += 12;
-		asm_fprintf (f, "\tldfe\t%r, [%r, #-%d]\n",
-			     reg, FP_REGNUM, floats_offset - vfp_offset);
-	      }
-	}
-      else
-	{
-	  start_reg = LAST_FPA_REGNUM;
-
-	  for (reg = LAST_FPA_REGNUM; reg >= FIRST_FPA_REGNUM; reg--)
-	    {
-	      if (df_regs_ever_live_p (reg) && !call_used_regs[reg])
-		{
-		  floats_offset += 12;
-
-		  /* We can't unstack more than four registers at once.  */
-		  if (start_reg - reg == 3)
-		    {
-		      asm_fprintf (f, "\tlfm\t%r, 4, [%r, #-%d]\n",
-			           reg, FP_REGNUM, floats_offset - vfp_offset);
-		      start_reg = reg - 1;
-		    }
-		}
-	      else
-		{
-		  if (reg != start_reg)
-		    asm_fprintf (f, "\tlfm\t%r, %d, [%r, #-%d]\n",
-				 reg + 1, start_reg - reg,
-				 FP_REGNUM, floats_offset - vfp_offset);
-		  start_reg = reg - 1;
-		}
-	    }
-
-	  /* Just in case the last register checked also needs unstacking.  */
-	  if (reg != start_reg)
-	    asm_fprintf (f, "\tlfm\t%r, %d, [%r, #-%d]\n",
-			 reg + 1, start_reg - reg,
-			 FP_REGNUM, floats_offset - vfp_offset);
-	}
-
-      if (TARGET_HARD_FLOAT && TARGET_VFP)
-	{
-	  int saved_size;
-
-	  /* The fldmd insns do not have base+offset addressing
-             modes, so we use IP to hold the address.  */
-	  saved_size = arm_get_vfp_saved_size ();
-
-	  if (saved_size > 0)
-	    {
-	      floats_offset += saved_size;
-	      asm_fprintf (f, "\tsub\t%r, %r, #%d\n", IP_REGNUM,
-			   FP_REGNUM, floats_offset - vfp_offset);
-	    }
-	  start_reg = FIRST_VFP_REGNUM;
-	  for (reg = FIRST_VFP_REGNUM; reg < LAST_VFP_REGNUM; reg += 2)
-	    {
-	      if ((!df_regs_ever_live_p (reg) || call_used_regs[reg])
-		  && (!df_regs_ever_live_p (reg + 1) || call_used_regs[reg + 1]))
-		{
-		  if (start_reg != reg)
-		    vfp_output_fldmd (f, IP_REGNUM,
-				      (start_reg - FIRST_VFP_REGNUM) / 2,
-				      (reg - start_reg) / 2);
-		  start_reg = reg + 2;
-		}
-	    }
-	  if (start_reg != reg)
-	    vfp_output_fldmd (f, IP_REGNUM,
-			      (start_reg - FIRST_VFP_REGNUM) / 2,
-			      (reg - start_reg) / 2);
-	}
-
-      if (TARGET_IWMMXT)
-	{
-	  /* The frame pointer is guaranteed to be non-double-word aligned.
-	     This is because it is set to (old_stack_pointer - 4) and the
-	     old_stack_pointer was double word aligned.  Thus the offset to
-	     the iWMMXt registers to be loaded must also be non-double-word
-	     sized, so that the resultant address *is* double-word aligned.
-	     We can ignore floats_offset since that was already included in
-	     the live_regs_mask.  */
-	  lrm_count += (lrm_count % 2 ? 2 : 1);
-
-	  for (reg = LAST_IWMMXT_REGNUM; reg >= FIRST_IWMMXT_REGNUM; reg--)
-	    if (df_regs_ever_live_p (reg) && !call_used_regs[reg])
-	      {
-		asm_fprintf (f, "\twldrd\t%r, [%r, #-%d]\n",
-			     reg, FP_REGNUM, lrm_count * 4);
-		lrm_count += 2;
-	      }
-	}
-
-      /* saved_regs_mask should contain the IP, which at the time of stack
-	 frame generation actually contains the old stack pointer.  So a
-	 quick way to unwind the stack is just pop the IP register directly
-	 into the stack pointer.  */
-      gcc_assert (saved_regs_mask & (1 << IP_REGNUM));
-      saved_regs_mask &= ~ (1 << IP_REGNUM);
-      saved_regs_mask |=   (1 << SP_REGNUM);
-
-      /* There are two registers left in saved_regs_mask - LR and PC.  We
-	 only need to restore the LR register (the return address), but to
-	 save time we can load it directly into the PC, unless we need a
-	 special function exit sequence, or we are not really returning.  */
-      if (really_return
-	  && ARM_FUNC_TYPE (func_type) == ARM_FT_NORMAL
-	  && !crtl->calls_eh_return)
-	/* Delete the LR from the register mask, so that the LR on
-	   the stack is loaded into the PC in the register mask.  */
-	saved_regs_mask &= ~ (1 << LR_REGNUM);
-      else
-	saved_regs_mask &= ~ (1 << PC_REGNUM);
-
-      /* We must use SP as the base register, because SP is one of the
-         registers being restored.  If an interrupt or page fault
-         happens in the ldm instruction, the SP might or might not
-         have been restored.  That would be bad, as then SP will no
-         longer indicate the safe area of stack, and we can get stack
-         corruption.  Using SP as the base register means that it will
-         be reset correctly to the original value, should an interrupt
-         occur.  If the stack pointer already points at the right
-         place, then omit the subtraction.  */
-      if (offsets->outgoing_args != (1 + (int) bit_count (saved_regs_mask))
-	  || cfun->calls_alloca)
-	asm_fprintf (f, "\tsub\t%r, %r, #%d\n", SP_REGNUM, FP_REGNUM,
-		     4 * bit_count (saved_regs_mask));
-      print_multi_reg (f, "ldmfd\t%r, ", SP_REGNUM, saved_regs_mask, 0);
-
-      if (IS_INTERRUPT (func_type))
-	/* Interrupt handlers will have pushed the
-	   IP onto the stack, so restore it now.  */
-	print_multi_reg (f, "ldmfd\t%r!, ", SP_REGNUM, 1 << IP_REGNUM, 0);
-    }
-  else
-    {
-      /* This branch is executed for ARM mode (non-apcs frames) and
-	 Thumb-2 mode. Frame layout is essentially the same for those
-	 cases, except that in ARM mode frame pointer points to the
-	 first saved register, while in Thumb-2 mode the frame pointer points
-	 to the last saved register.
-
-	 It is possible to make frame pointer point to last saved
-	 register in both cases, and remove some conditionals below.
-	 That means that fp setup in prologue would be just "mov fp, sp"
-	 and sp restore in epilogue would be just "mov sp, fp", whereas
-	 now we have to use add/sub in those cases. However, the value
-	 of that would be marginal, as both mov and add/sub are 32-bit
-	 in ARM mode, and it would require extra conditionals
-	 in arm_expand_prologue to distinguish ARM-apcs-frame case
-	 (where frame pointer is required to point at first register)
-	 and ARM-non-apcs-frame. Therefore, such change is postponed
-	 until real need arise.  */
-      unsigned HOST_WIDE_INT amount;
-      int rfe;
-      /* Restore stack pointer if necessary.  */
-      if (TARGET_ARM && frame_pointer_needed)
-	{
-	  operands[0] = stack_pointer_rtx;
-	  operands[1] = hard_frame_pointer_rtx;
-
-	  operands[2] = GEN_INT (offsets->frame - offsets->saved_regs);
-	  output_add_immediate (operands);
-	}
-      else
-	{
-	  if (frame_pointer_needed)
-	    {
-	      /* For Thumb-2 restore sp from the frame pointer.
-		 Operand restrictions mean we have to incrememnt FP, then copy
-		 to SP.  */
-	      amount = offsets->locals_base - offsets->saved_regs;
-	      operands[0] = hard_frame_pointer_rtx;
-	    }
-	  else
-	    {
-	      unsigned long count;
-	      operands[0] = stack_pointer_rtx;
-	      amount = offsets->outgoing_args - offsets->saved_regs;
-	      /* pop call clobbered registers if it avoids a
-	         separate stack adjustment.  */
-	      count = offsets->saved_regs - offsets->saved_args;
-	      if (optimize_size
-		  && count != 0
-		  && !crtl->calls_eh_return
-		  && bit_count(saved_regs_mask) * 4 == count
-		  && !IS_INTERRUPT (func_type)
-		  && !IS_STACKALIGN (func_type)
-		  && !crtl->tail_call_emit)
-		{
-		  unsigned long mask;
-                  /* Preserve return values, of any size.  */
-		  mask = (1 << ((arm_size_return_regs() + 3) / 4)) - 1;
-		  mask ^= 0xf;
-		  mask &= ~saved_regs_mask;
-		  reg = 0;
-		  while (bit_count (mask) * 4 > amount)
-		    {
-		      while ((mask & (1 << reg)) == 0)
-			reg++;
-		      mask &= ~(1 << reg);
-		    }
-		  if (bit_count (mask) * 4 == amount) {
-		      amount = 0;
-		      saved_regs_mask |= mask;
-		  }
-		}
-	    }
-
-	  if (amount)
-	    {
-	      operands[1] = operands[0];
-	      operands[2] = GEN_INT (amount);
-	      output_add_immediate (operands);
-	    }
-	  if (frame_pointer_needed)
-	    asm_fprintf (f, "\tmov\t%r, %r\n",
-			 SP_REGNUM, HARD_FRAME_POINTER_REGNUM);
-	}
-
-      if (TARGET_FPA_EMU2)
-	{
-	  for (reg = FIRST_FPA_REGNUM; reg <= LAST_FPA_REGNUM; reg++)
-	    if (df_regs_ever_live_p (reg) && !call_used_regs[reg])
-	      asm_fprintf (f, "\tldfe\t%r, [%r], #12\n",
-			   reg, SP_REGNUM);
-	}
-      else
-	{
-	  start_reg = FIRST_FPA_REGNUM;
-
-	  for (reg = FIRST_FPA_REGNUM; reg <= LAST_FPA_REGNUM; reg++)
-	    {
-	      if (df_regs_ever_live_p (reg) && !call_used_regs[reg])
-		{
-		  if (reg - start_reg == 3)
-		    {
-		      asm_fprintf (f, "\tlfmfd\t%r, 4, [%r]!\n",
-				   start_reg, SP_REGNUM);
-		      start_reg = reg + 1;
-		    }
-		}
-	      else
-		{
-		  if (reg != start_reg)
-		    asm_fprintf (f, "\tlfmfd\t%r, %d, [%r]!\n",
-				 start_reg, reg - start_reg,
-				 SP_REGNUM);
-
-		  start_reg = reg + 1;
-		}
-	    }
-
-	  /* Just in case the last register checked also needs unstacking.  */
-	  if (reg != start_reg)
-	    asm_fprintf (f, "\tlfmfd\t%r, %d, [%r]!\n",
-			 start_reg, reg - start_reg, SP_REGNUM);
-	}
-
-      if (TARGET_HARD_FLOAT && TARGET_VFP)
-	{
-	  int end_reg = LAST_VFP_REGNUM + 1;
-
-	  /* Scan the registers in reverse order.  We need to match
-	     any groupings made in the prologue and generate matching
-	     pop operations.  */
-	  for (reg = LAST_VFP_REGNUM - 1; reg >= FIRST_VFP_REGNUM; reg -= 2)
-	    {
-	      if ((!df_regs_ever_live_p (reg) || call_used_regs[reg])
-		  && (!df_regs_ever_live_p (reg + 1)
-		      || call_used_regs[reg + 1]))
-		{
-		  if (end_reg > reg + 2)
-		    vfp_output_fldmd (f, SP_REGNUM,
-				      (reg + 2 - FIRST_VFP_REGNUM) / 2,
-				      (end_reg - (reg + 2)) / 2);
-		  end_reg = reg;
-		}
-	    }
-	  if (end_reg > reg + 2)
-	    vfp_output_fldmd (f, SP_REGNUM, 0,
-			      (end_reg - (reg + 2)) / 2);
-	}
-
-      if (TARGET_IWMMXT)
-	for (reg = FIRST_IWMMXT_REGNUM; reg <= LAST_IWMMXT_REGNUM; reg++)
-	  if (df_regs_ever_live_p (reg) && !call_used_regs[reg])
-	    asm_fprintf (f, "\twldrd\t%r, [%r], #8\n", reg, SP_REGNUM);
-
-      /* If we can, restore the LR into the PC.  */
-      if (ARM_FUNC_TYPE (func_type) != ARM_FT_INTERWORKED
-	  && (TARGET_ARM || ARM_FUNC_TYPE (func_type) == ARM_FT_NORMAL)
-	  && !IS_STACKALIGN (func_type)
-	  && really_return
-	  && crtl->args.pretend_args_size == 0
-	  && saved_regs_mask & (1 << LR_REGNUM)
-	  && !crtl->calls_eh_return)
-	{
-	  saved_regs_mask &= ~ (1 << LR_REGNUM);
-	  saved_regs_mask |=   (1 << PC_REGNUM);
-	  rfe = IS_INTERRUPT (func_type);
-	}
-      else
-	rfe = 0;
-
-      /* Load the registers off the stack.  If we only have one register
-	 to load use the LDR instruction - it is faster.  For Thumb-2
-	 always use pop and the assembler will pick the best instruction.*/
-      if (TARGET_ARM && saved_regs_mask == (1 << LR_REGNUM)
-	  && !IS_INTERRUPT(func_type))
-	{
-	  asm_fprintf (f, "\tldr\t%r, [%r], #4\n", LR_REGNUM, SP_REGNUM);
-	}
-      else if (saved_regs_mask)
-	{
-	  if (saved_regs_mask & (1 << SP_REGNUM))
-	    /* Note - write back to the stack register is not enabled
-	       (i.e. "ldmfd sp!...").  We know that the stack pointer is
-	       in the list of registers and if we add writeback the
-	       instruction becomes UNPREDICTABLE.  */
-	    print_multi_reg (f, "ldmfd\t%r, ", SP_REGNUM, saved_regs_mask,
-			     rfe);
-	  else if (TARGET_ARM)
-	    print_multi_reg (f, "ldmfd\t%r!, ", SP_REGNUM, saved_regs_mask,
-			     rfe);
-	  else
-	    print_multi_reg (f, "pop\t", SP_REGNUM, saved_regs_mask, 0);
-	}
-
-      if (crtl->args.pretend_args_size)
-	{
-	  /* Unwind the pre-pushed regs.  */
-	  operands[0] = operands[1] = stack_pointer_rtx;
-	  operands[2] = GEN_INT (crtl->args.pretend_args_size);
-	  output_add_immediate (operands);
-	}
-    }
-
-  /* We may have already restored PC directly from the stack.  */
-  if (!really_return || saved_regs_mask & (1 << PC_REGNUM))
-    return "";
-
-  /* Stack adjustment for exception handler.  */
-  if (crtl->calls_eh_return)
-    asm_fprintf (f, "\tadd\t%r, %r, %r\n", SP_REGNUM, SP_REGNUM,
-		 ARM_EH_STACKADJ_REGNUM);
-
-  /* Generate the return instruction.  */
-  switch ((int) ARM_FUNC_TYPE (func_type))
-    {
-    case ARM_FT_ISR:
-    case ARM_FT_FIQ:
-      asm_fprintf (f, "\tsubs\t%r, %r, #4\n", PC_REGNUM, LR_REGNUM);
-      break;
-
-    case ARM_FT_EXCEPTION:
-      asm_fprintf (f, "\tmovs\t%r, %r\n", PC_REGNUM, LR_REGNUM);
-      break;
-
-    case ARM_FT_INTERWORKED:
-      asm_fprintf (f, "\tbx\t%r\n", LR_REGNUM);
-      break;
-
-    default:
-      if (IS_STACKALIGN (func_type))
-	{
-	  /* See comment in arm_expand_prologue.  */
-	  asm_fprintf (f, "\tmov\t%r, %r\n", SP_REGNUM, 0);
-	}
-      if (arm_arch5 || arm_arch4t)
-	asm_fprintf (f, "\tbx\t%r\n", LR_REGNUM);
-      else
-	asm_fprintf (f, "\tmov\t%r, %r\n", PC_REGNUM, LR_REGNUM);
-      break;
-    }
-
-  return "";
-}
-
 static void
 arm_output_function_epilogue (FILE *file ATTRIBUTE_UNUSED,
 			      HOST_WIDE_INT frame_size ATTRIBUTE_UNUSED)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 40abfde..1cddedd 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -10748,11 +10748,8 @@
 
 (define_insn "*epilogue_insns"
   [(unspec_volatile [(return)] VUNSPEC_EPILOGUE)]
-  "TARGET_EITHER"
+  "TARGET_THUMB1"
   "*
-  if (TARGET_32BIT)
-    return arm_output_epilogue (NULL);
-  else /* TARGET_THUMB1 */
     return thumb1_unexpanded_epilogue ();
   "
   ; Length is absolute worst case

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I)
  2012-05-31 13:44 [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I) Greta Yorsh
                   ` (7 preceding siblings ...)
  2012-05-31 14:10 ` [Patch, ARM][8/8] Epilogue in RTL: remove dead code Greta Yorsh
@ 2012-05-31 18:18 ` Paul Brook
  2012-06-01 11:58   ` Greta Yorsh
                     ` (2 more replies)
  8 siblings, 3 replies; 21+ messages in thread
From: Paul Brook @ 2012-05-31 18:18 UTC (permalink / raw)
  To: Greta Yorsh
  Cc: GCC Patches, joseph, Richard Earnshaw, sameera.deshpande,
	Ramana Radhakrishnan, nickc

> Testing:
> * Crossbuild for target arm-none-eabi with cpu cortex-a9 neon softfp and
> tested in three configuration: -marm (default), -mthumb, -mapcs-frame. No
> regression on qemu.
> * Crossbuild for target arm-none-eabi thumb2 with cpu cortex-m3. No
> regression on qemu.
> * Crossbuild for target arm-none-eabi thumb1 with cpu arm7tdmi and
> arm1136jf-s. No regression on qemu.
> * Crossbuild for target arm-linux-gnueabi with cpu cortex-a9 with eglibc
> and used this compiler to build AEL linux kernel. It boots successfully. *
> Bootstrap the compiler on cortex-a8 successfully for
> --languages=c,c++,fortran and used this compiler to build gdb. No
> regression with check-gcc and check-gdb.

What other testing have you done?  There's a good number of combinations not
covered by your above list.  In particular:
- Coverage of old cores looks pretty thin.  In particular ARMv4t has different 
interworking requirements.  
- iWMMXT has special alignment requirements.
- Interrupt functions with special prologue/epilogue.  Both traditional ARM 
and Cortex-M3.
- -mtpcs-frame and -mtpcs-leaf-frame

Some of these options are orthogonal.

As you've proved with -mapcs-frame it's near impossible to get these right 
without actually testing them.    I'm not saying you have to do a full testrun 
in every combination, but it's worth testing a representative selection of 
functions (large and small frame, leaf or not, with and without frame pointer, 
uses alloca, etc).  Also worth explicitly clobbering a selection (both odd and 
even numbers) of callee saved registers to make sure we get that right.  Any 
difference in the output should be manually verified (ideally the assembly 
output would be identical).
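
A minimal sketch of the kind of representative functions described above,
intended to be compiled before and after the patches so the assembly can be
compared. All function names are hypothetical and not taken from the GCC
testsuite; the registers in the clobber lists (r4, r5, r6, r8) are an
assumption about which callee-saved registers are free to clobber, and the
inline asm is guarded so the file still builds on non-ARM hosts.

```c
#include <alloca.h>  /* Non-standard, but provided by glibc and newlib.  */
#include <string.h>

#define NOINLINE __attribute__ ((noinline))

/* Leaf function, small frame.  */
NOINLINE int
leaf_small (int a, int b)
{
  return a * 3 + b;
}

/* Non-leaf function with a large local frame.  */
NOINLINE int
nonleaf_large (int seed)
{
  volatile int buf[1024];
  int i;

  for (i = 0; i < 1024; i++)
    buf[i] = seed + i;
  return leaf_small (buf[0], buf[1023]);
}

/* Dynamic allocation, which normally forces a frame pointer.  */
NOINLINE int
uses_alloca (unsigned n)
{
  unsigned char *p = alloca (n);
  unsigned i;
  int sum = 0;

  memset (p, 1, n);
  for (i = 0; i < n; i++)
    sum += p[i];
  return sum;
}

/* Clobber an odd number of callee-saved registers.  */
NOINLINE int
clobber_odd (int x)
{
#if defined (__arm__)
  __asm__ volatile ("" : : : "r4", "r5", "r6");
#endif
  return leaf_small (x, 1);
}

/* Clobber an even number of callee-saved registers.  */
NOINLINE int
clobber_even (int x)
{
#if defined (__arm__)
  __asm__ volatile ("" : : : "r4", "r5", "r6", "r8");
#endif
  return leaf_small (x, 2);
}
```

Compiling such a file with each option set of interest (for instance -marm,
-mthumb, -mapcs-frame) and diffing the generated assembly against the
unpatched compiler gives a quick check of the epilogues.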

> * The patches have not been explicitly tested with any FPA variants (which
> are deprecated in 4.7 and expected to become obsolete in 4.8).

I'm not keen on breaking these without actually removing them.

Paul

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I)
  2012-05-31 18:18 ` [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I) Paul Brook
@ 2012-06-01 11:58   ` Greta Yorsh
  2012-06-12 15:34   ` Richard Earnshaw
  2012-06-18 16:37   ` Greta Yorsh
  2 siblings, 0 replies; 21+ messages in thread
From: Greta Yorsh @ 2012-06-01 11:58 UTC (permalink / raw)
  To: 'Paul Brook'
  Cc: GCC Patches, joseph, Richard Earnshaw, sameera.deshpande,
	Ramana Radhakrishnan, nickc


On 31 May 2012 19:18, Paul Brook wrote:
> > Testing:
> > * Crossbuild for target arm-none-eabi with cpu cortex-a9 neon softfp
> and
> > tested in three configuration: -marm (default), -mthumb, -mapcs-
> frame. No
> > regression on qemu.
> > * Crossbuild for target arm-none-eabi thumb2 with cpu cortex-m3. No
> > regression on qemu.
> > * Crossbuild for target arm-none-eabi thumb1 with cpu arm7tdmi and
> > arm1136jf-s. No regression on qemu.
> > * Crossbuild for target arm-linux-gnueabi with cpu cortex-a9 with
> eglibc
> > and used this compiler to build AEL linux kernel. It boots
> successfully. *
> > Bootstrap the compiler on cortex-a8 successfully for
> > --languages=c,c++,fortran and used this compiler to build gdb. No
> > regression with check-gcc and check-gdb.
> 
> What other testing have you done?  There's a good number of
> combinations not
> covered by your above list.  In particular:
> - Coverage of old cores looks pretty thin.  In particular ARMv4t has
> different
> interworking requirements.

I ran a full regression test of gcc configured with cpu arm7tdmi on qemu. Is
there another ARMv4t configuration that should be tested?

> - iWMMXT has special alignment requirements.
> - Interrupt functions with special prologue/epilogue.  Both traditional
> ARM
> and Cortex-M3.

A few tests for interrupt functions are included in gcc's regression suite.
Specifically, the test gcc.target/arm/handler-align.c checks that the stack
pointer is handled correctly in prologue/epilogue of Cortex-M interrupt
handlers. I have a patch (not yet posted) to make this test more effective. 

> - -mtpcs-frame and -mtpcs-leaf-frame
> 
> Some of these options are orthogonal.
> 
> As you've proved with -mapcs-frame it's near impossible to get these
> right
> without actually testing them.    I'm not saying you have to do a full
> testrun
> in every combination, but it's worth testing a representative selection
> of
> functions (large and small frame, leaf or not, with and without frame
> pointer,
> uses alloca, etc).  
> Also worth explicitly clobbering a selection (both
> odd and
> even numbers) of callee saved registers to make sure we get that right.
> Any
> difference in the output should be manually verified (ideally the
> assembly
> output would be identical).

For interrupt-related tests, interworking, and several other tests, I've
compared the assembly outputs before and after the patch (and caught a
couple of bugs this way).
In most cases now, the assembly outputs before and after the patch are
identical. The few differences I have seen are due to successful compiler
optimizations, where we benefit from having generated epilogues in RTL. For
example, replacing "sub sp, fp, #0" with "mov sp, fp" in epilogue. Also,
explicit write to callee-saved registers to restore them in epilogue allows
the data flow analysis pass to deduce that registers are dead and enables
peephole optimizations that were not possible before. 

> 
> > * The patches have not been explicitly tested with any FPA variants
> (which
> > are deprecated in 4.7 and expected to become obsolete in 4.8).
> 
> I'm not keen on breaking these without actually removing them.

Thanks for pointing out additional configurations to test. I will test
-mtpcs-frame and -mtpcs-leaf-frame as you suggested and run regression tests
for iWMMXT. 

Properly testing FPA variants at this point is a lot of work, especially
considering the fact that these variants are obsolete. What minimal
configurations would be sufficient to test?

Thank you,
Greta



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I)
  2012-05-31 18:18 ` [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I) Paul Brook
  2012-06-01 11:58   ` Greta Yorsh
@ 2012-06-12 15:34   ` Richard Earnshaw
  2012-06-18 16:37   ` Greta Yorsh
  2 siblings, 0 replies; 21+ messages in thread
From: Richard Earnshaw @ 2012-06-12 15:34 UTC (permalink / raw)
  To: Paul Brook
  Cc: Greta Yorsh, GCC Patches, joseph, sameera.deshpande,
	Ramana Radhakrishnan, nickc

On 31/05/12 19:18, Paul Brook wrote:

>> * The patches have not been explicitly tested with any FPA variants (which
>> are deprecated in 4.7 and expected to become obsolete in 4.8).
> 
> I'm not keen on breaking these without actually removing them.
> 
> Paul
> 

OK, so let's remove them...

R.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch, ARM][1/8] Epilogue in RTL: update ldm_stm_operation_p
  2012-05-31 13:51 ` [Patch, ARM][1/8] Epilogue in RTL: update ldm_stm_operation_p Greta Yorsh
@ 2012-06-15  9:20   ` Richard Earnshaw
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Earnshaw @ 2012-06-15  9:20 UTC (permalink / raw)
  To: Greta Yorsh; +Cc: GCC Patches

On 31/05/12 14:50, Greta Yorsh wrote:
> This patch updates ldm_stm_operation_p to check for loads that if SP is in
> the register list, then the base register is SP. It guarantees that SP is
> reset correctly when an LDM instruction is interrupted. Otherwise, we might
> end up with a corrupt stack. 
> 
> ChangeLog:
> 
> gcc
> 
> 2012-05-31  Greta Yorsh  <greta.yorsh@arm.com>
> 
> 	* config/arm/arm.c (ldm_stm_operation_p): Require SP
>       as base register for loads if SP is in the register list.
> 
> 
> 1-update-predicate.patch.txt
> 
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index e3290e2..4717725 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -10247,6 +10247,12 @@ ldm_stm_operation_p (rtx op, bool load, enum machine_mode mode,
>    if (!REG_P (addr))
>      return false;
>  
> +  /* Don't allow SP to be loaded unless it is also the base register. It
> +     guarantees that SP is reset correctly when an LDM instruction
> +     is interrupted. Otherwise, we might end up with a corrupt stack.  */
> +  if (load && (REGNO (reg) == SP_REGNUM) && (REGNO (addr) != SP_REGNUM))
> +    return false;
> +
>    for (; i < count; i++)
>      {
>        elt = XVECEXP (op, 0, i);
> @@ -10270,6 +10276,10 @@ ldm_stm_operation_p (rtx op, bool load, enum machine_mode mode,
>            || (consecutive
>                && (REGNO (reg) !=
>                    (unsigned int) (first_regno + regs_per_val * (i - base))))
> +          /* Don't allow SP to be loaded unless it is also the base register. It
> +             guarantees that SP is reset correctly when an LDM instruction
> +             is interrupted. Otherwise, we might end up with a corrupt stack.  */
> +          || (load && (REGNO (reg) == SP_REGNUM) && (REGNO (addr) != SP_REGNUM))
>            || !MEM_P (mem)
>            || GET_MODE (mem) != mode
>            || ((GET_CODE (XEXP (mem, 0)) != PLUS

OK.

R.
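
The rule added to ldm_stm_operation_p in the patch quoted above can be
restated as a standalone check. This is a sketch only: ldm_sp_rule_ok is a
hypothetical name, and it works on a plain register bitmask rather than on
the PARALLEL rtx that the real predicate inspects.

```c
#include <stdbool.h>

/* Hypothetical model, not GCC code: SP is r13 on ARM.  */
enum { SP_REGNUM = 13 };

/* Return true if a multiple load/store with base register BASE_REGNO and
   register list REG_MASK satisfies the rule: a load may restore SP only
   when SP is also the base register.  Stores are not restricted.  */
bool
ldm_sp_rule_ok (bool load, unsigned base_regno, unsigned long reg_mask)
{
  bool sp_in_list = (reg_mask & (1UL << SP_REGNUM)) != 0;

  if (load && sp_in_list && base_regno != SP_REGNUM)
    return false;
  return true;
}
```

Under this model, a load based on SP with register list {r4, sp, pc} is
accepted, while a load based on fp (r11) with register list {fp, sp, pc} is
rejected.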

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Patch, ARM][2/8] Epilogue in RTL: new patterns for int regs
  2012-05-31 13:54 ` [Patch, ARM][2/8] Epilogue in RTL: new patterns for int regs Greta Yorsh
@ 2012-06-15  9:22   ` Richard Earnshaw
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Earnshaw @ 2012-06-15  9:22 UTC (permalink / raw)
  To: Greta Yorsh; +Cc: GCC Patches

On 31/05/12 14:53, Greta Yorsh wrote:
> This patch adds new define_insn patterns for epilogue with integer
> registers.
> 
> The patterns can handle pop multiple with writeback and return (loading into
> PC directly).
> To handle return, the patterns use a new special predicate
> pop_multiple_return, that uses ldm_stm_operation_p function from a previous
> patch. To output assembly, the patterns use a new function
> arm_output_multireg_pop.
> 
> This patch also adds a new function arm_emit_multi_reg_pop
> that emits RTL that matches the new pop patterns for integer registers.
> This is a helper function for epilogue expansion. It is used by a later
> patch.
> 
> ChangeLog:
> 
> gcc
> 
> 2012-05-31  Ian Bolton  <ian.bolton@arm.com>
>             Sameera Deshpande  <sameera.deshpande@arm.com>
>             Greta Yorsh  <greta.yorsh@arm.com>
> 
>         * config/arm/arm.md (load_multiple_with_writeback) New define_insn.
>         (load_multiple, pop_multiple_with_writeback_and_return) Likewise.
>         (pop_multiple_with_return, ldr_with_return) Likewise.
>         * config/arm/predicates.md (pop_multiple_return) New special
> predicate.
>         * config/arm/arm-protos.h (arm_output_multireg_pop) New declaration.
>         * config/arm/arm.c (arm_output_multireg_pop) New function.
>         (arm_emit_multi_reg_pop): New function.
>         (ldm_stm_operation_p): Check SP in the register list.
> 
> 
> 2-patterns.patch.txt
> 
> 


> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 4717725..9093801 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -13815,6 +13815,84 @@ vfp_output_fldmd (FILE * stream, unsigned int base, int reg, int count)
>  }
>  
>  
> +/* OPERANDS[0] is the entire list of insns that constitute pop,
> +   OPERANDS[1] is the base register, RETURN_PC is true iff return insn
> +   is in the list, UPDATE is true iff the list contains explicit
> +   update of base register.
> + */

Close of comment should not be on a separate line.

> +void
> +arm_output_multireg_pop (rtx *operands, bool return_pc, rtx cond, bool reverse,
> +                         bool update)

> +  offset += return_pc ? 1 : 0;
> +
> +  /* Is the base register in the list? */

Two spaces at end of comment before */.

> +  for (i = offset; i < num_saves; i++)
> +    {
> +      regno = REGNO (XEXP (XVECEXP (operands[0], 0, i), 0));
> +      /* If SP is in the list, then the base register must be SP. */

And here.

> +      gcc_assert ((regno != SP_REGNUM) || (regno_base == SP_REGNUM));
> +      /* If base register is in the list, there must be no explicit update.  */
> +      if (regno == regno_base)
> +        gcc_assert (!update);
> +    }
> +
> +  conditional = reverse ? "%?%D0" : "%?%d0";
> +  if ((regno_base == SP_REGNUM) && TARGET_UNIFIED_ASM)
> +    {
> +      /* Output pop (not stmfd) because it has a shorter encoding. */

And here.

> +      gcc_assert (update);
> +      sprintf (pattern, "pop%s\t{", conditional);
> +    }
> +  else
> +    {
> +      /* Output ldmfd when the base register is SP, otherwise output ldmia.
> +         It's just a convention, their semantics are identical.  */
> +      if (regno_base == SP_REGNUM)
> +        sprintf (pattern, "ldm%sfd\t", conditional);
> +      else if (TARGET_UNIFIED_ASM)
> +        sprintf (pattern, "ldmia%s\t", conditional);
> +      else
> +        sprintf (pattern, "ldm%sia\t", conditional);
> +
> +      strcat (pattern, reg_names[regno_base]);
> +      if (update)
> +        strcat (pattern, "!, {");
> +      else
> +        strcat (pattern, ", {");
> +    }
> +
> +  /* Output the first destination register. */

And here.

> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index ed33c9b..862ccf4 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -10959,6 +10959,89 @@
>    [(set_attr "type" "f_fpa_store")]
>  )
>  
> +;; Pop (as used in epilogue RTL)
> +;;
> +(define_insn "*load_multiple_with_writeback"
> +  [(match_parallel 0 "load_multiple_operation"
> +    [(set (match_operand:SI 1 "s_register_operand" "+rk")
> +          (plus:SI (match_dup 1)
> +                   (match_operand:SI 2 "const_int_operand" "I")))
> +     (set (match_operand:SI 3 "s_register_operand" "=rk")
> +          (mem:SI (match_dup 1)))
> +        ])]
> +  "TARGET_32BIT && (reload_in_progress || reload_completed)"
> +  "*
> +  {
> +    arm_output_multireg_pop (operands, /*return_pc=*/FALSE,
> +                                       /*cond=*/const_true_rtx,
> +                                       /*reverse=*/FALSE,
> +                                       /*update=*/TRUE);

Use lower case for TRUE and FALSE.  Several instances later on as well.

OK with those changes.

R.
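
For readers following the quoted hunk, the mnemonic selection in arm_output_multireg_pop can be modelled outside GCC roughly as below. This is only a sketch: build_pop_prefix, its parameters and the local reg_names table are hypothetical stand-ins (unified_asm models TARGET_UNIFIED_ASM, conditional models the "%?%d0" string), and assert replaces gcc_assert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SP_REGNUM 13

/* Hypothetical stand-in for GCC's reg_names[] table.  */
static const char *const reg_names[16] = {
  "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
  "r8", "r9", "sl", "fp", "ip", "sp", "lr", "pc"
};

/* Write the start of a pop/ldm template into PATTERN, up to and
   including the opening brace of the register list.  */
static void
build_pop_prefix (char *pattern, size_t size, int regno_base,
                  bool update, bool unified_asm, const char *conditional)
{
  if (regno_base == SP_REGNUM && unified_asm)
    {
      /* pop always writes back SP, so UPDATE must be set.  */
      assert (update);
      snprintf (pattern, size, "pop%s\t{", conditional);
    }
  else
    {
      /* ldmfd for an SP base, ldmia otherwise; same semantics.  */
      if (regno_base == SP_REGNUM)
        snprintf (pattern, size, "ldm%sfd\t", conditional);
      else if (unified_asm)
        snprintf (pattern, size, "ldmia%s\t", conditional);
      else
        snprintf (pattern, size, "ldm%sia\t", conditional);

      strncat (pattern, reg_names[regno_base],
               size - strlen (pattern) - 1);
      strncat (pattern, update ? "!, {" : ", {",
               size - strlen (pattern) - 1);
    }
}
```

With an empty condition string, an SP base under unified syntax gives "pop\t{", while a non-SP base such as r4 with writeback gives "ldmia\tr4!, {".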


* Re: [Patch, ARM][3/8] Epilogue in RTL: new patterns for vfp regs
  2012-05-31 13:55 ` [Patch, ARM][3/8] Epilogue in RTL: new patterns for vfp regs Greta Yorsh
@ 2012-06-15 10:44   ` Richard Earnshaw
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Earnshaw @ 2012-06-15 10:44 UTC (permalink / raw)
  To: Greta Yorsh; +Cc: GCC Patches

On 31/05/12 14:55, Greta Yorsh wrote:
> New define_insn pattern for epilogue with floating point registers (DFmode)
> and a new function that emits RTL for this pattern. This function is a
> helper for epilogue expansion. It is used by a later patch.
> 
> ChangeLog:
> 
> gcc
> 
> 2012-05-31  Ian Bolton  <ian.bolton@arm.com>
>             Sameera Deshpande  <sameera.deshpande@arm.com>
>             Greta Yorsh  <greta.yorsh@arm.com>
> 
>         * config/arm/arm.md (vfp_pop_multiple_with_writeback) New
> define_insn.
>         * config/arm/predicates.md (pop_multiple_fp) New special predicate.
>         * config/arm/arm.c (arm_emit_vfp_multi_reg_pop): New function.
> 
> 

OK.

R.


* Re: [Patch, ARM][4/8] Epilogue in RTL: expand epilogue for apcs frame
  2012-05-31 13:59 ` [Patch, ARM][4/8] Epilogue in RTL: expand epilogue for apcs frame Greta Yorsh
@ 2012-06-15 10:46   ` Richard Earnshaw
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Earnshaw @ 2012-06-15 10:46 UTC (permalink / raw)
  To: Greta Yorsh; +Cc: GCC Patches

On 31/05/12 14:58, Greta Yorsh wrote:
> Helper function for epilogue expansion. Emit RTL for APCS frame epilogue
> (when -mapcs-frame command line option is specified).
> This function is used by a later patch.
> 
> For APCS frame epilogue, the compiler currently generates LDM with SP as
> both the base register
> and one of the destination registers. For example:
> 
> @ APCS_FRAME epilogue
> ldmfd   sp, {r4, fp, sp, pc}
> 
> @ non-APCS_FRAME epilogue
> ldmfd     sp!, {r4, fp, pc}
> 
> The use of SP in LDM register list is deprecated, but this patch does not
> address the problem.
> 
> To generate the epilogue for APCS frame in RTL, this patch adds a new
> alternative to the arm_addsi3 insn, in ARM mode only, to generate "sub sp,
> fp, #imm". Previously, there was no pattern to generate sub with SP as the
> destination register and a register other than SP as the source operand.
> 
> 
> ChangeLog:
> 
> gcc
> 
> 2012-05-31  Ian Bolton  <ian.bolton@arm.com>
>             Sameera Deshpande  <sameera.deshpande@arm.com>
>             Greta Yorsh  <greta.yorsh@arm.com>
> 
>         * config/arm/arm.c (arm_expand_epilogue_apcs_frame): New function.
>         * config/arm/arm.md (arm_addsi3) Add an alternative.
> 

The FPA support is now obsolete.  Please remove that.

OK with that change.

R.
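
The difference between the two LDM forms quoted above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not how GCC or the hardware represents anything: ldm_model and the enum names are hypothetical, and addresses are word indices into a small array rather than byte addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { R4 = 4, FP = 11, SP = 13, PC = 15, NUM_REGS = 16 };

/* Toy model of LDMFD (increment after): load each register set in
   MASK from consecutive words starting at regs[base], lowest register
   number first.  With WRITEBACK, the base is left pointing past the
   loaded words.  */
static void
ldm_model (uint32_t regs[NUM_REGS], const uint32_t *mem, int base,
           uint16_t mask, bool writeback)
{
  uint32_t addr = regs[base];

  /* As in the patch: base register in the list means no writeback.  */
  assert (!(writeback && (mask & (1u << base))));

  for (int r = 0; r < NUM_REGS; r++)
    if (mask & (1u << r))
      regs[r] = mem[addr++];

  if (writeback)
    regs[base] = addr;
}
```

In the APCS form, SP takes whatever value was saved on the stack; in the non-APCS form, SP simply advances past the three popped words.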



* Re: [Patch, ARM][5/8] Epilogue in RTL: expand
  2012-05-31 14:01 ` [Patch, ARM][5/8] Epilogue in RTL: expand Greta Yorsh
@ 2012-06-15 10:47   ` Richard Earnshaw
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Earnshaw @ 2012-06-15 10:47 UTC (permalink / raw)
  To: Greta Yorsh; +Cc: GCC Patches

On 31/05/12 14:59, Greta Yorsh wrote:
> The main function for epilogue RTL generation, used by expand epilogue
> patterns.
> 
> ChangeLog:
> 
> gcc
> 
> 2012-05-31  Ian Bolton  <ian.bolton@arm.com>
>             Sameera Deshpande  <sameera.deshpande@arm.com>
>             Greta Yorsh  <greta.yorsh@arm.com>
> 
>         * config/arm/arm-protos.h (arm_expand_epilogue): New declaration.
>         * config/arm/arm.c (arm_expand_epilogue): New function.
>         * config/arm/arm.md (epilogue): Update condition and code.
>         (sibcall_epilogue): Likewise.
> 

Same as last patch, OK once the FPA support has been stripped out.

R.


* Re: [Patch, ARM][6/8] Epilogue in RTL: simple return
  2012-05-31 14:03 ` [Patch, ARM][6/8] Epilogue in RTL: simple return Greta Yorsh
@ 2012-06-15 10:49   ` Richard Earnshaw
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Earnshaw @ 2012-06-15 10:49 UTC (permalink / raw)
  To: Greta Yorsh; +Cc: GCC Patches

On 31/05/12 15:02, Greta Yorsh wrote:
> Add a new parameter to the function output_return_instruction to handle
> simple cases of return when no epilogue needs to be printed out.
> 
> ChangeLog:
> 
> gcc
> 
> 2012-05-31  Ian Bolton  <ian.bolton@arm.com>
>             Sameera Deshpande  <sameera.deshpande@arm.com>
>             Greta Yorsh  <greta.yorsh@arm.com>
> 
>         * config/arm/arm-protos.h (output_return_instruction): New
> parameter.
>         * config/arm/arm.c (output_return_instruction): New parameter.
>         * config/arm/arm.md (arm_simple_return): New pattern.
>         (arm_return, cond_return, cond_return_inverted): Add new arguments.
>         * config/arm/thumb2.md (thumb2_return): Update condition and code.
> 

Since you're changing output_return_instruction, please update it to use
the bool type for the flags; then modify the callers to use 'true' and
'false' rather than 'TRUE' and 'FALSE'.

OK with that change.

R.
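
The convention requested here, bool-typed flags with lower-case true and false at the call sites, might look like the sketch below. describe_return is a hypothetical toy, not the real output_return_instruction; only the parameter style is the point.

```c
#include <stdbool.h>

/* Hypothetical toy with bool flags instead of int flags.  It only
   returns a description; the real function emits assembly.  */
static const char *
describe_return (bool really_return, bool simple_return)
{
  if (!really_return)
    return "no return";
  return simple_return ? "simple return" : "return with epilogue";
}

/* Callers name each flag in a comment and pass true/false rather
   than the TRUE/FALSE macros.  */
static const char *
example_caller (void)
{
  return describe_return (/*really_return=*/true,
                          /*simple_return=*/false);
}
```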


* Re: [Patch, ARM][7/8] Epilogue in RTL: expand thumb2 return
  2012-05-31 14:05 ` [Patch, ARM][7/8] Epilogue in RTL: expand thumb2 return Greta Yorsh
@ 2012-06-15 10:58   ` Richard Earnshaw
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Earnshaw @ 2012-06-15 10:58 UTC (permalink / raw)
  To: Greta Yorsh; +Cc: GCC Patches

On 31/05/12 15:04, Greta Yorsh wrote:
> Generate RTL for return in Thumb2 mode. Used by expand of return insn.
> 
> ChangeLog:
> 
> gcc
> 
> 2012-05-31  Ian Bolton  <ian.bolton@arm.com>
>             Sameera Deshpande  <sameera.deshpande@arm.com>
>             Greta Yorsh  <greta.yorsh@arm.com>
> 
>         * config/arm/arm-protos.h (thumb2_expand_return): New declaration.
>         * config/arm/arm.c (thumb2_expand_return): New function.
>         * config/arm/arm.md (return): Update condition and code.
> 

OK.

R.


* Re: [Patch, ARM][8/8] Epilogue in RTL: remove dead code
  2012-05-31 14:10 ` [Patch, ARM][8/8] Epilogue in RTL: remove dead code Greta Yorsh
@ 2012-06-15 11:05   ` Richard Earnshaw
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Earnshaw @ 2012-06-15 11:05 UTC (permalink / raw)
  To: Greta Yorsh; +Cc: GCC Patches

On 31/05/12 15:09, Greta Yorsh wrote:
> As a result of the previous changes, epilogue_insns pattern can only be
> generated in Thumb1. After removing other cases in define_insn for
> epilogue_insns, the function arm_output_epilogue becomes dead code and can
> be eliminated, along with all its helper functions.
> 
> 
> ChangeLog:
> 
> gcc
> 
> 2012-05-31  Ian Bolton  <ian.bolton@arm.com>
>             Sameera Deshpande  <sameera.deshpande@arm.com>
>             Greta Yorsh  <greta.yorsh@arm.com>
> 
>         * config/arm/arm-protos.h (arm_output_epilogue): Remove.
>         * config/arm/arm.c (print_multi_reg): Remove.
>         (vfp_output_fldmd): Likewise.
>         (arm_output_epilogue): Likewise.
>         * config/arm/arm.md (epilogue_insns): Update condition and code.
> 

OK.

R.


* RE: [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I)
  2012-05-31 18:18 ` [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's patches, Part I) Paul Brook
  2012-06-01 11:58   ` Greta Yorsh
  2012-06-12 15:34   ` Richard Earnshaw
@ 2012-06-18 16:37   ` Greta Yorsh
  2 siblings, 0 replies; 21+ messages in thread
From: Greta Yorsh @ 2012-06-18 16:37 UTC (permalink / raw)
  To: 'Paul Brook'
  Cc: GCC Patches, joseph, Richard Earnshaw, sameera.deshpande,
	Ramana Radhakrishnan, nickc

Paul, 

I did additional testing of the patches, as you suggested. 

For iwmmxt, no regression on qemu (using -cpu pxa270) for arm-none-eabi
target configured --with-cpu iwmmxt --with-float soft --with-arch iwmmxt
--with-abi iwmmxt --disable-multilib. There is already a test for mmx stack
alignment in gcc.target/arm/mmx-1.c.  I have also tested a few other options
(including -mtpcs-frame and -mtpcs-leaf-frame) on several examples and
haven't found any problems with the patches (at least, not yet :)

Separately, I submitted a couple of testsuite patches related to RTL
epilogue:
http://gcc.gnu.org/ml/gcc-patches/2012-06/msg01175.html
http://gcc.gnu.org/ml/gcc-patches/2012-06/msg01176.html

FPA support is in the process of being removed from ARM backend trunk:
http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00825.html

I hope it addresses your concerns. 

Following Richard's comments, I removed FPA support from RTL epilogue
patches, rebased patches to trunk, and fixed some formatting problems. I'll
go ahead and apply individual patches that have already been approved. 

Thank you,
Greta

> -----Original Message-----
> From: Paul Brook [mailto:paul@codesourcery.com]
> Sent: 31 May 2012 19:18
> To: Greta Yorsh
> Cc: GCC Patches; joseph@codesourcery.com; Richard Earnshaw;
> sameera.deshpande@gmail.com; Ramana Radhakrishnan; nickc@redhat.com
> Subject: Re: [Patch, ARM][0/8] Epilogue in RTL: introduction (Sameera's
> patches, Part I)
> 
> > Testing:
> > * Crossbuild for target arm-none-eabi with cpu cortex-a9 neon softfp and
> > tested in three configurations: -marm (default), -mthumb, -mapcs-frame. No
> > regression on qemu.
> > * Crossbuild for target arm-none-eabi thumb2 with cpu cortex-m3. No
> > regression on qemu.
> > * Crossbuild for target arm-none-eabi thumb1 with cpu arm7tdmi and
> > arm1136jf-s. No regression on qemu.
> > * Crossbuild for target arm-linux-gnueabi with cpu cortex-a9 with eglibc
> > and used this compiler to build AEL linux kernel. It boots successfully.
> > * Bootstrap the compiler on cortex-a8 successfully for
> > --languages=c,c++,fortran and used this compiler to build gdb. No
> > regression with check-gcc and check-gdb.
> 
> What other testing have you done?  There's a good number of combinations
> not covered by your above list.  In particular:
> - Coverage of old cores looks pretty thin.  In particular ARMv4t has
>   different interworking requirements.
> - iWMMXT has special alignment requirements.
> - Interrupt functions with special prologue/epilogue.  Both traditional
>   ARM and Cortex-M3.
> - -mtpcs-frame and -mtpcs-leaf-frame
> 
> Some of these options are orthogonal.
> 
> As you've proved with -mapcs-frame it's near impossible to get these right
> without actually testing them.  I'm not saying you have to do a full
> testrun in every combination, but it's worth testing a representative
> selection of functions (large and small frame, leaf or not, with and
> without frame pointer, uses alloca, etc).  Also worth explicitly clobbering
> a selection (both odd and even numbers) of callee saved registers to make
> sure we get that right.  Any difference in the output should be manually
> verified (ideally the assembly output would be identical).
> 
> > * The patches have not been explicitly tested with any FPA variants
> > (which are deprecated in 4.7 and expected to become obsolete in 4.8).
> 
> I'm not keen on breaking these without actually removing them.
> 
> Paul
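
A test function of the kind Paul suggests, clobbering an odd and an even number of callee-saved registers in a non-leaf function, could be sketched as follows. The function names are hypothetical and this is not one of the submitted testsuite patches; the clobbers use GCC extended asm and only take effect when compiling for ARM, so on other targets the functions just compute their result.

```c
/* Non-inlined callee, so the callers below are not leaf functions
   and must save LR as well.  */
__attribute__ ((noinline)) static int
helper (int x)
{
  return x * 2;
}

/* Forces an even number (two) of callee-saved registers to be
   saved and restored around the body on ARM.  */
__attribute__ ((noinline)) int
clobber_even (int x)
{
#if defined (__arm__)
  __asm__ volatile ("" : : : "r4", "r5");
#endif
  return helper (x) + 1;
}

/* Same, with an odd number (three) of callee-saved registers.  */
__attribute__ ((noinline)) int
clobber_odd (int x)
{
#if defined (__arm__)
  __asm__ volatile ("" : : : "r4", "r5", "r6");
#endif
  return helper (x) + 2;
}
```

Compiling this with and without the patches, under -marm, -mthumb and -mapcs-frame, and diffing the assembly is one way to do the manual comparison described above.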




