public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Add a new target hook to compute the frame layout
@ 2016-06-16 14:47 Bernd Edlinger
  2016-06-21 21:30 ` Jeff Law
  0 siblings, 1 reply; 15+ messages in thread
From: Bernd Edlinger @ 2016-06-16 14:47 UTC (permalink / raw)
  To: gcc-patches
  Cc: Richard Biener, Jakub Jelinek, Richard Sandiford,
	Ramana Radhakrishnan, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 1158 bytes --]

Hi!


By the design of the target hook INITIAL_ELIMINATION_OFFSET
it is necessary to call this function several times with
different register combinations.
Most targets use a cached data structure that describes the
exact frame layout of the current function.

It is safe to skip the computation when reload_completed = true,
and most targets do that already.

However while reload is doing its work, it is not clear when to
do the computation and when not.  This results in unnecessary
work.  Computing the frame layout can be a simple function or an
arbitrarily complex one, that walks all instructions of the current
function for instance, which is more or less the common case.


This patch adds a new optional target hook that can be used
by the target to factor the INITIAL_ELIMINATION_OFFSET-hook
into a O(n) computation part, and a O(1) result function.

The patch implements a compute_frame_layout target hook just
for ARM in the moment, to show the principle.
Other targets may also implement that hook, if it seems appropriate.


Boot-strapped and reg-tested on arm-linux-gnueabihf.
OK for trunk?


Thanks
Bernd.

[-- Attachment #2: changelog-frame-layout.txt --]
[-- Type: text/plain, Size: 610 bytes --]

2016-06-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* target.def (compute_frame_layout): New optional target hook.
	* doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
	* doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
	* lra-eliminations.c (update_reg_eliminate): Call compute_frame_layout
	target hook.
	* reload1.c (verify_initial_elim_offsets): Likewise.
	* config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
	(use_simple_return_p): Call arm_compute_frame_layout if needed.
	(arm_get_frame_offsets): Split up into this ...
	(arm_compute_frame_layout): ... and this function.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: patch-frame-layout.diff --]
[-- Type: text/x-patch; name="patch-frame-layout.diff", Size: 7404 bytes --]

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(Revision 233176)
+++ gcc/config/arm/arm.c	(Arbeitskopie)
@@ -81,6 +81,7 @@ static bool arm_const_not_ok_for_debug_p (rtx);
 static bool arm_needs_doubleword_align (machine_mode, const_tree);
 static int arm_compute_static_chain_stack_bytes (void);
 static arm_stack_offsets *arm_get_frame_offsets (void);
+static void arm_compute_frame_layout (void);
 static void arm_add_gc_roots (void);
 static int arm_gen_constant (enum rtx_code, machine_mode, rtx,
 			     unsigned HOST_WIDE_INT, rtx, rtx, int, int);
@@ -669,6 +670,9 @@ static const struct attribute_spec arm_attribute_t
 #undef TARGET_SCALAR_MODE_SUPPORTED_P
 #define TARGET_SCALAR_MODE_SUPPORTED_P arm_scalar_mode_supported_p
 
+#undef TARGET_COMPUTE_FRAME_LAYOUT
+#define TARGET_COMPUTE_FRAME_LAYOUT arm_compute_frame_layout
+
 #undef TARGET_FRAME_POINTER_REQUIRED
 #define TARGET_FRAME_POINTER_REQUIRED arm_frame_pointer_required
 
@@ -3813,6 +3817,10 @@ use_simple_return_p (void)
 {
   arm_stack_offsets *offsets;
 
+  /* Note this function can be called before or after reload.  */
+  if (!reload_completed)
+    arm_compute_frame_layout ();
+
   offsets = arm_get_frame_offsets ();
   return offsets->outgoing_args != 0;
 }
@@ -19238,7 +19246,7 @@ arm_compute_static_chain_stack_bytes (void)
 
 /* Compute a bit mask of which registers need to be
    saved on the stack for the current function.
-   This is used by arm_get_frame_offsets, which may add extra registers.  */
+   This is used by arm_compute_frame_layout, which may add extra registers.  */
 
 static unsigned long
 arm_compute_save_reg_mask (void)
@@ -20789,12 +20797,25 @@ any_sibcall_could_use_r3 (void)
   alignment.  */
 
 
+/* Return cached stack offsets.  */
+
+static arm_stack_offsets *
+arm_get_frame_offsets (void)
+{
+  struct arm_stack_offsets *offsets;
+
+  offsets = &cfun->machine->stack_offsets;
+
+  return offsets;
+}
+
+
 /* Calculate stack offsets.  These are used to calculate register elimination
    offsets and in prologue/epilogue code.  Also calculates which registers
    should be saved.  */
 
-static arm_stack_offsets *
-arm_get_frame_offsets (void)
+static void
+arm_compute_frame_layout (void)
 {
   struct arm_stack_offsets *offsets;
   unsigned long func_type;
@@ -20806,19 +20827,6 @@ any_sibcall_could_use_r3 (void)
 
   offsets = &cfun->machine->stack_offsets;
 
-  /* We need to know if we are a leaf function.  Unfortunately, it
-     is possible to be called after start_sequence has been called,
-     which causes get_insns to return the insns for the sequence,
-     not the function, which will cause leaf_function_p to return
-     the incorrect result.
-
-     to know about leaf functions once reload has completed, and the
-     frame size cannot be changed after that time, so we can safely
-     use the cached value.  */
-
-  if (reload_completed)
-    return offsets;
-
   /* Initially this is the size of the local variables.  It will translated
      into an offset once we have determined the size of preceding data.  */
   frame_size = ROUND_UP_WORD (get_frame_size ());
@@ -20885,7 +20893,7 @@ any_sibcall_could_use_r3 (void)
     {
       offsets->outgoing_args = offsets->soft_frame;
       offsets->locals_base = offsets->soft_frame;
-      return offsets;
+      return;
     }
 
   /* Ensure SFP has the correct alignment.  */
@@ -20961,8 +20969,6 @@ any_sibcall_could_use_r3 (void)
 	offsets->outgoing_args += 4;
       gcc_assert (!(offsets->outgoing_args & 7));
     }
-
-  return offsets;
 }
 
 
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(Revision 233176)
+++ gcc/doc/tm.texi	(Arbeitskopie)
@@ -3693,6 +3693,14 @@ registers.  This macro must be defined if @code{EL
 defined.
 @end defmac
 
+@deftypefn {Target Hook} void TARGET_COMPUTE_FRAME_LAYOUT (void)
+This target hook is called immediately before reload wants to call
+@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the frame
+layout instead of re-computing it on every invocation.  This is particularly
+useful for targets that have an O(n) frame layout function.  Implementing
+this callback is optional.
+@end deftypefn
+
 @node Stack Arguments
 @subsection Passing Function Arguments on the Stack
 @cindex arguments on stack
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(Revision 233176)
+++ gcc/doc/tm.texi.in	(Arbeitskopie)
@@ -3227,6 +3227,8 @@ registers.  This macro must be defined if @code{EL
 defined.
 @end defmac
 
+@hook TARGET_COMPUTE_FRAME_LAYOUT
+
 @node Stack Arguments
 @subsection Passing Function Arguments on the Stack
 @cindex arguments on stack
Index: gcc/lra-eliminations.c
===================================================================
--- gcc/lra-eliminations.c	(Revision 233176)
+++ gcc/lra-eliminations.c	(Arbeitskopie)
@@ -1202,6 +1202,10 @@ update_reg_eliminate (bitmap insns_with_changed_of
   struct lra_elim_table *ep, *ep1;
   HARD_REG_SET temp_hard_reg_set;
 
+#ifdef ELIMINABLE_REGS
+  targetm.compute_frame_layout ();
+#endif
+
   /* Clear self elimination offsets.  */
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
     self_elim_offsets[ep->from] = 0;
Index: gcc/reload1.c
===================================================================
--- gcc/reload1.c	(Revision 233176)
+++ gcc/reload1.c	(Arbeitskopie)
@@ -3856,6 +3856,7 @@ verify_initial_elim_offsets (void)
   {
    struct elim_table *ep;
 
+   targetm.compute_frame_layout ();
    for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
      {
        INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, t);
@@ -3880,6 +3881,7 @@ set_initial_elim_offsets (void)
   struct elim_table *ep = reg_eliminate;
 
 #ifdef ELIMINABLE_REGS
+  targetm.compute_frame_layout ();
   for (; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
     {
       INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, ep->initial_offset);
Index: gcc/target.def
===================================================================
--- gcc/target.def	(Revision 233176)
+++ gcc/target.def	(Arbeitskopie)
@@ -5245,8 +5245,19 @@ five otherwise.  This is best for most machines.",
  unsigned int, (void),
  default_case_values_threshold)
 
-/* Retutn true if a function must have and use a frame pointer.  */
+/* Optional callback to advise the target to compute the frame layout.  */
 DEFHOOK
+(compute_frame_layout,
+ "This target hook is called immediately before reload wants to call\n\
+@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the frame\n\
+layout instead of re-computing it on every invocation.  This is particularly\n\
+useful for targets that have an O(n) frame layout function.  Implementing\n\
+this callback is optional.",
+ void, (void),
+ hook_void_void)
+
+/* Return true if a function must have and use a frame pointer.  */
+DEFHOOK
 (frame_pointer_required,
  "This target hook should return @code{true} if a function must have and use\n\
 a frame pointer.  This target hook is called in the reload pass.  If its return\n\

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add a new target hook to compute the frame layout
  2016-06-16 14:47 [PATCH] Add a new target hook to compute the frame layout Bernd Edlinger
@ 2016-06-21 21:30 ` Jeff Law
  2016-06-22  5:12   ` Bernd Edlinger
  2016-08-04 21:16   ` [PATCH, ARM] " Bernd Edlinger
  0 siblings, 2 replies; 15+ messages in thread
From: Jeff Law @ 2016-06-21 21:30 UTC (permalink / raw)
  To: Bernd Edlinger, gcc-patches
  Cc: Richard Biener, Jakub Jelinek, Richard Sandiford, Ramana Radhakrishnan

On 06/16/2016 08:47 AM, Bernd Edlinger wrote:
> Hi!
>
>
> By the design of the target hook INITIAL_ELIMINATION_OFFSET
> it is necessary to call this function several times with
> different register combinations.
> Most targets use a cached data structure that describes the
> exact frame layout of the current function.
>
> It is safe to skip the computation when reload_completed = true,
> and most targets do that already.
>
> However while reload is doing its work, it is not clear when to
> do the computation and when not.  This results in unnecessary
> work.  Computing the frame layout can be a simple function or an
> arbitrarily complex one, that walks all instructions of the current
> function for instance, which is more or less the common case.
>
>
> This patch adds a new optional target hook that can be used
> by the target to factor the INITIAL_ELIMINATION_OFFSET-hook
> into a O(n) computation part, and a O(1) result function.
>
> The patch implements a compute_frame_layout target hook just
> for ARM in the moment, to show the principle.
> Other targets may also implement that hook, if it seems appropriate.
>
>
> Boot-strapped and reg-tested on arm-linux-gnueabihf.
> OK for trunk?
>
>
> Thanks
> Bernd.
>
>
> changelog-frame-layout.txt
>
>
> 2016-06-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>
> 	* target.def (compute_frame_layout): New optional target hook.
> 	* doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
> 	* doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
> 	* lra-eliminations.c (update_reg_eliminate): Call compute_frame_layout
> 	target hook.
> 	* reload1.c (verify_initial_elim_offsets): Likewise.
> 	* config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
> 	(use_simple_return_p): Call arm_compute_frame_layout if needed.
> 	(arm_get_frame_offsets): Split up into this ...
> 	(arm_compute_frame_layout): ... and this function.
The ARM maintainers would need to chime in on the ARM specific changes 
though.



> Index: gcc/target.def
> ===================================================================
> --- gcc/target.def	(Revision 233176)
> +++ gcc/target.def	(Arbeitskopie)
> @@ -5245,8 +5245,19 @@ five otherwise.  This is best for most machines.",
>   unsigned int, (void),
>   default_case_values_threshold)
>
> -/* Retutn true if a function must have and use a frame pointer.  */
s/Retutn/Return

> +/* Optional callback to advise the target to compute the frame layout.  */
>  DEFHOOK
> +(compute_frame_layout,
> + "This target hook is called immediately before reload wants to call\n\
> +@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the frame\n\
> +layout instead of re-computing it on every invocation.  This is particularly\n\
> +useful for targets that have an O(n) frame layout function.  Implementing\n\
> +this callback is optional.",
> + void, (void),
> + hook_void_void)
So the docs say "immediately before", but that's not actually reality in 
lra-eliminations.  I think you can just say "This target hook is called 
before reload or lra-eliminations calls 
@code{INITIAL_ELIMINATION_OFFSET} and allows ..."


How does this macro interact with INITIAL_FRAME_POINTER_OFFSET?

I'm OK with this conceptually.  I think you need a minor doc update and 
OK from the ARM maintainers before it can be installed though.

jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add a new target hook to compute the frame layout
  2016-06-21 21:30 ` Jeff Law
@ 2016-06-22  5:12   ` Bernd Edlinger
  2016-06-22  7:21     ` AW: " Bernd Edlinger
  2016-06-22 18:49     ` Jeff Law
  2016-08-04 21:16   ` [PATCH, ARM] " Bernd Edlinger
  1 sibling, 2 replies; 15+ messages in thread
From: Bernd Edlinger @ 2016-06-22  5:12 UTC (permalink / raw)
  To: Jeff Law, gcc-patches
  Cc: Richard Biener, Jakub Jelinek, Richard Sandiford, Ramana Radhakrishnan

On 06/21/16 23:29, Jeff Law wrote:
> On 06/16/2016 08:47 AM, Bernd Edlinger wrote:
>> Hi!
>>
>>
>> By the design of the target hook INITIAL_ELIMINATION_OFFSET
>> it is necessary to call this function several times with
>> different register combinations.
>> Most targets use a cached data structure that describes the
>> exact frame layout of the current function.
>>
>> It is safe to skip the computation when reload_completed = true,
>> and most targets do that already.
>>
>> However while reload is doing its work, it is not clear when to
>> do the computation and when not.  This results in unnecessary
>> work.  Computing the frame layout can be a simple function or an
>> arbitrarily complex one, that walks all instructions of the current
>> function for instance, which is more or less the common case.
>>
>>
>> This patch adds a new optional target hook that can be used
>> by the target to factor the INITIAL_ELIMINATION_OFFSET-hook
>> into a O(n) computation part, and a O(1) result function.
>>
>> The patch implements a compute_frame_layout target hook just
>> for ARM in the moment, to show the principle.
>> Other targets may also implement that hook, if it seems appropriate.
>>
>>
>> Boot-strapped and reg-tested on arm-linux-gnueabihf.
>> OK for trunk?
>>
>>
>> Thanks
>> Bernd.
>>
>>
>> changelog-frame-layout.txt
>>
>>
>> 2016-06-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>
>>     * target.def (compute_frame_layout): New optional target hook.
>>     * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>>     * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>>     * lra-eliminations.c (update_reg_eliminate): Call
>> compute_frame_layout
>>     target hook.
>>     * reload1.c (verify_initial_elim_offsets): Likewise.
>>     * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>>     (use_simple_return_p): Call arm_compute_frame_layout if needed.
>>     (arm_get_frame_offsets): Split up into this ...
>>     (arm_compute_frame_layout): ... and this function.
> The ARM maintainers would need to chime in on the ARM specific changes
> though.
>
>
>
>> Index: gcc/target.def
>> ===================================================================
>> --- gcc/target.def    (Revision 233176)
>> +++ gcc/target.def    (Arbeitskopie)
>> @@ -5245,8 +5245,19 @@ five otherwise.  This is best for most machines.",
>>   unsigned int, (void),
>>   default_case_values_threshold)
>>
>> -/* Retutn true if a function must have and use a frame pointer.  */
> s/Retutn/Return
>
yes, a line "+ /* Return ..." is a few lines below.
>> +/* Optional callback to advise the target to compute the frame
>> layout.  */
>>  DEFHOOK
>> +(compute_frame_layout,
>> + "This target hook is called immediately before reload wants to call\n\
>> +@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the
>> frame\n\
>> +layout instead of re-computing it on every invocation.  This is
>> particularly\n\
>> +useful for targets that have an O(n) frame layout function.
>> Implementing\n\
>> +this callback is optional.",
>> + void, (void),
>> + hook_void_void)
> So the docs say "immediately before", but that's not actually reality in
> lra-eliminations.  I think you can just say "This target hook is called
> before reload or lra-eliminations calls
> @code{INITIAL_ELIMINATION_OFFSET} and allows ..."
>
>
> How does this macro interact with INITIAL_FRAME_POINTER_OFFSET?

What I wanted to say here, is that lra goes thru several iterations,
changes something in the register allocation that has an impact on the
frame layout, typically 4-5 times, and calls INITIAL_ELIMINATION_OFFSET 
3-4 times in a row, and in the results must be consistent in each
iteration to be usable.

So I am open to suggestions, how would you explain this idea in the doc?


Thanks
Bernd.

>
> I'm OK with this conceptually.  I think you need a minor doc update and
> OK from the ARM maintainers before it can be installed though.
>
> jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* AW: [PATCH] Add a new target hook to compute the frame layout
  2016-06-22  5:12   ` Bernd Edlinger
@ 2016-06-22  7:21     ` Bernd Edlinger
  2016-06-22 18:34       ` Jeff Law
  2016-06-22 18:49     ` Jeff Law
  1 sibling, 1 reply; 15+ messages in thread
From: Bernd Edlinger @ 2016-06-22  7:21 UTC (permalink / raw)
  To: Jeff Law, gcc-patches
  Cc: Richard Biener, Jakub Jelinek, Richard Sandiford, Ramana Radhakrishnan

On 06/21/16 23:29, Jeff Law wrote:
>
> How does this macro interact with INITIAL_FRAME_POINTER_OFFSET?

That I forgot to mention:  INITIAL_FRAME_POINTER_OFFSET is just
a single call, so whenever it is called from lra/reload the frame layout
is really expected to change, and so it does not make a difference if the target
computes the frame layout in TARGET_COMPUTE_FRAME_LAYOUT or in
INITIAL_FRAME_POINTER_OFFSET.

But I do not know of any targets that still use INITIAL_FRAME_POINTER_OFFSET,
and maybe support for this target hook could be discontinued as a follow-up patch.

What do you think?


Bernd.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: AW: [PATCH] Add a new target hook to compute the frame layout
  2016-06-22  7:21     ` AW: " Bernd Edlinger
@ 2016-06-22 18:34       ` Jeff Law
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff Law @ 2016-06-22 18:34 UTC (permalink / raw)
  To: Bernd Edlinger, gcc-patches
  Cc: Richard Biener, Jakub Jelinek, Richard Sandiford, Ramana Radhakrishnan

On 06/22/2016 01:20 AM, Bernd Edlinger wrote:
> On 06/21/16 23:29, Jeff Law wrote:
>>
>> How does this macro interact with INITIAL_FRAME_POINTER_OFFSET?
>
> That I forgot to mention:  INITIAL_FRAME_POINTER_OFFSET is just
> a single call, so whenever it is called from lra/reload the frame layout
> is really expected to change, and so it does not make a difference if the target
> computes the frame layout in TARGET_COMPUTE_FRAME_LAYOUT or in
> INITIAL_FRAME_POINTER_OFFSET.
>
> But I do not know of any targets that still use INITIAL_FRAME_POINTER_OFFSET,
> and maybe support for this target hook could be discontinued as a follow-up patch.
INITIAL_FRAME_POINTER_OFFSET is only defined in 4 ports:

./ft32/ft32.h:#define INITIAL_FRAME_POINTER_OFFSET(DEPTH) (DEPTH) = 0
./m32r/m32r.h:#define INITIAL_FRAME_POINTER_OFFSET(VAR) \
./moxie/moxie.h:#define INITIAL_FRAME_POINTER_OFFSET(DEPTH) (DEPTH) = 0
./vax/vax.h:#define INITIAL_FRAME_POINTER_OFFSET(DEPTH) (DEPTH) = 0;

However, the m32r version is actually #if 0'd out.  So it's really only 
defined in 3 ports and always to "0".  So yea, it'd be a good candidate 
to collapse away.

Jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add a new target hook to compute the frame layout
  2016-06-22  5:12   ` Bernd Edlinger
  2016-06-22  7:21     ` AW: " Bernd Edlinger
@ 2016-06-22 18:49     ` Jeff Law
  2016-06-22 20:25       ` Bernd Edlinger
  1 sibling, 1 reply; 15+ messages in thread
From: Jeff Law @ 2016-06-22 18:49 UTC (permalink / raw)
  To: Bernd Edlinger, gcc-patches
  Cc: Richard Biener, Jakub Jelinek, Richard Sandiford, Ramana Radhakrishnan

On 06/21/2016 11:12 PM, Bernd Edlinger wrote:

>
> What I wanted to say here, is that lra goes thru several iterations,
> changes something in the register allocation that has an impact on the
> frame layout, typically 4-5 times, and calls INITIAL_ELIMINATION_OFFSET
> 3-4 times in a row, and in the results must be consistent in each
> iteration to be usable.
>
> So I am open to suggestions, how would you explain this idea in the doc?
I'm not sure :(  The goal is still the same, you're trying to separate 
the O(n) from the O(1) operations.  So you want the COMPUTE_FRAME_LAYOUT 
hook to be called once for things which don't vary and 
INITIAL_ELIMINATION_OFFSET multiple times for things that do vary.

Thinking more about this, which port has has a particularly expensive 
INITIAL_ELIMINATION_OFFSET?

Jeff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] Add a new target hook to compute the frame layout
  2016-06-22 18:49     ` Jeff Law
@ 2016-06-22 20:25       ` Bernd Edlinger
  0 siblings, 0 replies; 15+ messages in thread
From: Bernd Edlinger @ 2016-06-22 20:25 UTC (permalink / raw)
  To: Jeff Law, gcc-patches
  Cc: Richard Biener, Jakub Jelinek, Richard Sandiford, Ramana Radhakrishnan

On 06/22/16 20:49, Jeff Law wrote:
> On 06/21/2016 11:12 PM, Bernd Edlinger wrote:
>
>>
>> What I wanted to say here, is that lra goes thru several iterations,
>> changes something in the register allocation that has an impact on the
>> frame layout, typically 4-5 times, and calls INITIAL_ELIMINATION_OFFSET
>> 3-4 times in a row, and in the results must be consistent in each
>> iteration to be usable.
>>
>> So I am open to suggestions, how would you explain this idea in the doc?
> I'm not sure :(  The goal is still the same, you're trying to separate
> the O(n) from the O(1) operations.  So you want the COMPUTE_FRAME_LAYOUT
> hook to be called once for things which don't vary and
> INITIAL_ELIMINATION_OFFSET multiple times for things that do vary.
>
> Thinking more about this, which port has has a particularly expensive
> INITIAL_ELIMINATION_OFFSET?
>

I'd bet on mips for instance.


Bernd.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH, ARM] Add a new target hook to compute the frame layout
  2016-06-21 21:30 ` Jeff Law
  2016-06-22  5:12   ` Bernd Edlinger
@ 2016-08-04 21:16   ` Bernd Edlinger
  2016-08-05  9:29     ` Richard Earnshaw (lists)
  1 sibling, 1 reply; 15+ messages in thread
From: Bernd Edlinger @ 2016-08-04 21:16 UTC (permalink / raw)
  To: Jeff Law, gcc-patches
  Cc: Richard Biener, Jakub Jelinek, Richard Sandiford,
	Ramana Radhakrishnan, Richard Earnshaw, Nick Clifton

[-- Attachment #1: Type: text/plain, Size: 4016 bytes --]

Hi,

this patch introduces a new target hook that allows the target's
INITIAL_ELIMINATION_OFFSET function to use cached values instead of 
re-computing the frame layout every time.

I have updated the documentation a bit and hope it is clearer this time.

It still needs a review by ARM port maintainers.

If the ARM port maintainers find this patch useful, that would be fine.


Thanks
Bernd.

On 06/21/16 23:29, Jeff Law wrote:
> On 06/16/2016 08:47 AM, Bernd Edlinger wrote:
>> Hi!
>>
>>
>> By the design of the target hook INITIAL_ELIMINATION_OFFSET
>> it is necessary to call this function several times with
>> different register combinations.
>> Most targets use a cached data structure that describes the
>> exact frame layout of the current function.
>>
>> It is safe to skip the computation when reload_completed = true,
>> and most targets do that already.
>>
>> However while reload is doing its work, it is not clear when to
>> do the computation and when not.  This results in unnecessary
>> work.  Computing the frame layout can be a simple function or an
>> arbitrarily complex one, that walks all instructions of the current
>> function for instance, which is more or less the common case.
>>
>>
>> This patch adds a new optional target hook that can be used
>> by the target to factor the INITIAL_ELIMINATION_OFFSET-hook
>> into a O(n) computation part, and a O(1) result function.
>>
>> The patch implements a compute_frame_layout target hook just
>> for ARM in the moment, to show the principle.
>> Other targets may also implement that hook, if it seems appropriate.
>>
>>
>> Boot-strapped and reg-tested on arm-linux-gnueabihf.
>> OK for trunk?
>>
>>
>> Thanks
>> Bernd.
>>
>>
>> changelog-frame-layout.txt
>>
>>
>> 2016-06-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>
>>     * target.def (compute_frame_layout): New optional target hook.
>>     * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>>     * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>>     * lra-eliminations.c (update_reg_eliminate): Call
>> compute_frame_layout
>>     target hook.
>>     * reload1.c (verify_initial_elim_offsets): Likewise.
>>     * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>>     (use_simple_return_p): Call arm_compute_frame_layout if needed.
>>     (arm_get_frame_offsets): Split up into this ...
>>     (arm_compute_frame_layout): ... and this function.
> The ARM maintainers would need to chime in on the ARM specific changes
> though.
>
>
>
>> Index: gcc/target.def
>> ===================================================================
>> --- gcc/target.def    (Revision 233176)
>> +++ gcc/target.def    (Arbeitskopie)
>> @@ -5245,8 +5245,19 @@ five otherwise.  This is best for most machines.",
>>   unsigned int, (void),
>>   default_case_values_threshold)
>>
>> -/* Retutn true if a function must have and use a frame pointer.  */
> s/Retutn/Return
>
>> +/* Optional callback to advise the target to compute the frame
>> layout.  */
>>  DEFHOOK
>> +(compute_frame_layout,
>> + "This target hook is called immediately before reload wants to call\n\
>> +@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the
>> frame\n\
>> +layout instead of re-computing it on every invocation.  This is
>> particularly\n\
>> +useful for targets that have an O(n) frame layout function.
>> Implementing\n\
>> +this callback is optional.",
>> + void, (void),
>> + hook_void_void)
> So the docs say "immediately before", but that's not actually reality in
> lra-eliminations.  I think you can just say "This target hook is called
> before reload or lra-eliminations calls
> @code{INITIAL_ELIMINATION_OFFSET} and allows ..."
>
>
> How does this macro interact with INITIAL_FRAME_POINTER_OFFSET?
>
> I'm OK with this conceptually.  I think you need a minor doc update and
> OK from the ARM maintainers before it can be installed though.
>
> jeff

[-- Attachment #2: changelog-frame-layout.txt --]
[-- Type: text/plain, Size: 610 bytes --]

2016-06-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* target.def (compute_frame_layout): New optional target hook.
	* doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
	* doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
	* lra-eliminations.c (update_reg_eliminate): Call compute_frame_layout
	target hook.
	* reload1.c (verify_initial_elim_offsets): Likewise.
	* config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
	(use_simple_return_p): Call arm_compute_frame_layout if needed.
	(arm_get_frame_offsets): Split up into this ...
	(arm_compute_frame_layout): ... and this function.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: patch-frame-layout.diff --]
[-- Type: text/x-patch; name="patch-frame-layout.diff", Size: 7404 bytes --]

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 239144)
+++ gcc/config/arm/arm.c	(working copy)
@@ -81,6 +81,7 @@ static bool arm_const_not_ok_for_debug_p (rtx);
 static bool arm_needs_doubleword_align (machine_mode, const_tree);
 static int arm_compute_static_chain_stack_bytes (void);
 static arm_stack_offsets *arm_get_frame_offsets (void);
+static void arm_compute_frame_layout (void);
 static void arm_add_gc_roots (void);
 static int arm_gen_constant (enum rtx_code, machine_mode, rtx,
 			     unsigned HOST_WIDE_INT, rtx, rtx, int, int);
@@ -663,6 +664,9 @@ static const struct attribute_spec arm_attribute_t
 #undef TARGET_SCALAR_MODE_SUPPORTED_P
 #define TARGET_SCALAR_MODE_SUPPORTED_P arm_scalar_mode_supported_p
 
+#undef TARGET_COMPUTE_FRAME_LAYOUT
+#define TARGET_COMPUTE_FRAME_LAYOUT arm_compute_frame_layout
+
 #undef TARGET_FRAME_POINTER_REQUIRED
 #define TARGET_FRAME_POINTER_REQUIRED arm_frame_pointer_required
 
@@ -3880,6 +3884,10 @@ use_simple_return_p (void)
 {
   arm_stack_offsets *offsets;
 
+  /* Note this function can be called before or after reload.  */
+  if (!reload_completed)
+    arm_compute_frame_layout ();
+
   offsets = arm_get_frame_offsets ();
   return offsets->outgoing_args != 0;
 }
@@ -19370,7 +19378,7 @@ arm_compute_static_chain_stack_bytes (void)
 
 /* Compute a bit mask of which registers need to be
    saved on the stack for the current function.
-   This is used by arm_get_frame_offsets, which may add extra registers.  */
+   This is used by arm_compute_frame_layout, which may add extra registers.  */
 
 static unsigned long
 arm_compute_save_reg_mask (void)
@@ -20926,12 +20934,25 @@ any_sibcall_could_use_r3 (void)
   alignment.  */
 
 
+/* Return cached stack offsets.  */
+
+static arm_stack_offsets *
+arm_get_frame_offsets (void)
+{
+  struct arm_stack_offsets *offsets;
+
+  offsets = &cfun->machine->stack_offsets;
+
+  return offsets;
+}
+
+
 /* Calculate stack offsets.  These are used to calculate register elimination
    offsets and in prologue/epilogue code.  Also calculates which registers
    should be saved.  */
 
-static arm_stack_offsets *
-arm_get_frame_offsets (void)
+static void
+arm_compute_frame_layout (void)
 {
   struct arm_stack_offsets *offsets;
   unsigned long func_type;
@@ -20943,19 +20964,6 @@ any_sibcall_could_use_r3 (void)
 
   offsets = &cfun->machine->stack_offsets;
 
-  /* We need to know if we are a leaf function.  Unfortunately, it
-     is possible to be called after start_sequence has been called,
-     which causes get_insns to return the insns for the sequence,
-     not the function, which will cause leaf_function_p to return
-     the incorrect result.
-
-     to know about leaf functions once reload has completed, and the
-     frame size cannot be changed after that time, so we can safely
-     use the cached value.  */
-
-  if (reload_completed)
-    return offsets;
-
   /* Initially this is the size of the local variables.  It will translated
      into an offset once we have determined the size of preceding data.  */
   frame_size = ROUND_UP_WORD (get_frame_size ());
@@ -21022,7 +21030,7 @@ any_sibcall_could_use_r3 (void)
     {
       offsets->outgoing_args = offsets->soft_frame;
       offsets->locals_base = offsets->soft_frame;
-      return offsets;
+      return;
     }
 
   /* Ensure SFP has the correct alignment.  */
@@ -21098,8 +21106,6 @@ any_sibcall_could_use_r3 (void)
 	offsets->outgoing_args += 4;
       gcc_assert (!(offsets->outgoing_args & 7));
     }
-
-  return offsets;
 }
 
 
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 239144)
+++ gcc/doc/tm.texi	(working copy)
@@ -3693,6 +3693,14 @@ registers.  This macro must be defined if @code{EL
 defined.
 @end defmac
 
+@deftypefn {Target Hook} void TARGET_COMPUTE_FRAME_LAYOUT (void)
+This target hook allows the target to compute the frame layout once and
+make use of the cached frame layout in @code{INITIAL_ELIMINATION_OFFSET}
+instead of re-computing it on every invocation.  This is particularly
+useful for targets that have an expensive frame layout function.
+Implementing this callback is optional.
+@end deftypefn
+
 @node Stack Arguments
 @subsection Passing Function Arguments on the Stack
 @cindex arguments on stack
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 239144)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -3227,6 +3227,8 @@ registers.  This macro must be defined if @code{EL
 defined.
 @end defmac
 
+@hook TARGET_COMPUTE_FRAME_LAYOUT
+
 @node Stack Arguments
 @subsection Passing Function Arguments on the Stack
 @cindex arguments on stack
Index: gcc/lra-eliminations.c
===================================================================
--- gcc/lra-eliminations.c	(revision 239144)
+++ gcc/lra-eliminations.c	(working copy)
@@ -1203,6 +1203,10 @@ update_reg_eliminate (bitmap insns_with_changed_of
   struct lra_elim_table *ep, *ep1;
   HARD_REG_SET temp_hard_reg_set;
 
+#ifdef ELIMINABLE_REGS
+  targetm.compute_frame_layout ();
+#endif
+
   /* Clear self elimination offsets.  */
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
     self_elim_offsets[ep->from] = 0;
Index: gcc/reload1.c
===================================================================
--- gcc/reload1.c	(revision 239144)
+++ gcc/reload1.c	(working copy)
@@ -3831,6 +3831,7 @@ verify_initial_elim_offsets (void)
   {
    struct elim_table *ep;
 
+   targetm.compute_frame_layout ();
    for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
      {
        INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, t);
@@ -3855,6 +3856,7 @@ set_initial_elim_offsets (void)
   struct elim_table *ep = reg_eliminate;
 
 #ifdef ELIMINABLE_REGS
+  targetm.compute_frame_layout ();
   for (; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
     {
       INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, ep->initial_offset);
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 239144)
+++ gcc/target.def	(working copy)
@@ -5269,8 +5269,19 @@ five otherwise.  This is best for most machines.",
  unsigned int, (void),
  default_case_values_threshold)
 
-/* Retutn true if a function must have and use a frame pointer.  */
+/* Optional callback to advise the target to compute the frame layout.  */
 DEFHOOK
+(compute_frame_layout,
+ "This target hook allows the target to compute the frame layout once and\n\
+make use of the cached frame layout in @code{INITIAL_ELIMINATION_OFFSET}\n\
+instead of re-computing it on every invocation.  This is particularly\n\
+useful for targets that have an expensive frame layout function.\n\
+Implementing this callback is optional.",
+ void, (void),
+ hook_void_void)
+
+/* Return true if a function must have and use a frame pointer.  */
+DEFHOOK
 (frame_pointer_required,
  "This target hook should return @code{true} if a function must have and use\n\
 a frame pointer.  This target hook is called in the reload pass.  If its return\n\

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH, ARM] Add a new target hook to compute the frame layout
  2016-08-04 21:16   ` [PATCH, ARM] " Bernd Edlinger
@ 2016-08-05  9:29     ` Richard Earnshaw (lists)
  2016-08-05 12:49       ` Bernd Edlinger
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Earnshaw (lists) @ 2016-08-05  9:29 UTC (permalink / raw)
  To: Bernd Edlinger, Jeff Law, gcc-patches
  Cc: Richard Biener, Jakub Jelinek, Richard Sandiford,
	Ramana Radhakrishnan, Nick Clifton

On 04/08/16 22:16, Bernd Edlinger wrote:
> Hi,
> 
> this patch introduces a new target hook that allows the target's
> INITIAL_ELIMINATION_OFFSET function to use cached values instead of 
> re-computing the frame layout every time.
> 
> I have updated the documentation a bit and hope it is clearer this time.
> 
> It still needs a review by ARM port maintainers.
> 
> If the ARM port maintainers find this patch useful, that would be fine.
> 

I need to look into this more, but my first thought was that the
documentation is confusing, or there is a real problem in this patch.

As I understand it the frame has to be re-laid out each time the
contents of the frame changes (an extra register becomes live or another
spill slot is needed).  So saying that it is laid out once can't be
right if (as I read it initially) you mean 'once per function' since I
think it needs to be 'once for each time the frame contents changes'.

Of course, things might be a bit different with LRA compared with
reload, but I strongly suspect this is never going to be an 'exactly
once per function' operation.

Can you clarify your meaning in the documentation please?

R.

> 
> Thanks
> Bernd.
> 
> On 06/21/16 23:29, Jeff Law wrote:
>> On 06/16/2016 08:47 AM, Bernd Edlinger wrote:
>>> Hi!
>>>
>>>
>>> By the design of the target hook INITIAL_ELIMINATION_OFFSET
>>> it is necessary to call this function several times with
>>> different register combinations.
>>> Most targets use a cached data structure that describes the
>>> exact frame layout of the current function.
>>>
>>> It is safe to skip the computation when reload_completed = true,
>>> and most targets do that already.
>>>
>>> However while reload is doing its work, it is not clear when to
>>> do the computation and when not.  This results in unnecessary
>>> work.  Computing the frame layout can be a simple function or an
>>> arbitrarily complex one, that walks all instructions of the current
>>> function for instance, which is more or less the common case.
>>>
>>>
>>> This patch adds a new optional target hook that can be used
>>> by the target to factor the INITIAL_ELIMINATION_OFFSET-hook
>>> into a O(n) computation part, and a O(1) result function.
>>>
>>> The patch implements a compute_frame_layout target hook just
>>> for ARM in the moment, to show the principle.
>>> Other targets may also implement that hook, if it seems appropriate.
>>>
>>>
>>> Boot-strapped and reg-tested on arm-linux-gnueabihf.
>>> OK for trunk?
>>>
>>>
>>> Thanks
>>> Bernd.
>>>
>>>
>>> changelog-frame-layout.txt
>>>
>>>
>>> 2016-06-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>>
>>>     * target.def (compute_frame_layout): New optional target hook.
>>>     * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>>>     * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>>>     * lra-eliminations.c (update_reg_eliminate): Call
>>> compute_frame_layout
>>>     target hook.
>>>     * reload1.c (verify_initial_elim_offsets): Likewise.
>>>     * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>>>     (use_simple_return_p): Call arm_compute_frame_layout if needed.
>>>     (arm_get_frame_offsets): Split up into this ...
>>>     (arm_compute_frame_layout): ... and this function.
>> The ARM maintainers would need to chime in on the ARM specific changes
>> though.
>>
>>
>>
>>> Index: gcc/target.def
>>> ===================================================================
>>> --- gcc/target.def    (Revision 233176)
>>> +++ gcc/target.def    (Arbeitskopie)
>>> @@ -5245,8 +5245,19 @@ five otherwise.  This is best for most machines.",
>>>   unsigned int, (void),
>>>   default_case_values_threshold)
>>>
>>> -/* Retutn true if a function must have and use a frame pointer.  */
>> s/Retutn/Return
>>
>>> +/* Optional callback to advise the target to compute the frame
>>> layout.  */
>>>  DEFHOOK
>>> +(compute_frame_layout,
>>> + "This target hook is called immediately before reload wants to call\n\
>>> +@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the
>>> frame\n\
>>> +layout instead of re-computing it on every invocation.  This is
>>> particularly\n\
>>> +useful for targets that have an O(n) frame layout function.
>>> Implementing\n\
>>> +this callback is optional.",
>>> + void, (void),
>>> + hook_void_void)
>> So the docs say "immediately before", but that's not actually reality in
>> lra-eliminations.  I think you can just say "This target hook is called
>> before reload or lra-eliminations calls
>> @code{INITIAL_ELIMINATION_OFFSET} and allows ..."
>>
>>
>> How does this macro interact with INITIAL_FRAME_POINTER_OFFSET?
>>
>> I'm OK with this conceptually.  I think you need a minor doc update and
>> OK from the ARM maintainers before it can be installed though.
>>
>> jeff
>>
>> changelog-frame-layout.txt
>>
>>
>> 2016-06-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>
>> 	* target.def (compute_frame_layout): New optional target hook.
>> 	* doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>> 	* doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>> 	* lra-eliminations.c (update_reg_eliminate): Call compute_frame_layout
>> 	target hook.
>> 	* reload1.c (verify_initial_elim_offsets): Likewise.
>> 	* config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>> 	(use_simple_return_p): Call arm_compute_frame_layout if needed.
>> 	(arm_get_frame_offsets): Split up into this ...
>> 	(arm_compute_frame_layout): ... and this function.
>>
>> patch-frame-layout.diff
>>
>>
>> Index: gcc/config/arm/arm.c
>> ===================================================================
>> --- gcc/config/arm/arm.c	(revision 239144)
>> +++ gcc/config/arm/arm.c	(working copy)
>> @@ -81,6 +81,7 @@ static bool arm_const_not_ok_for_debug_p (rtx);
>>  static bool arm_needs_doubleword_align (machine_mode, const_tree);
>>  static int arm_compute_static_chain_stack_bytes (void);
>>  static arm_stack_offsets *arm_get_frame_offsets (void);
>> +static void arm_compute_frame_layout (void);
>>  static void arm_add_gc_roots (void);
>>  static int arm_gen_constant (enum rtx_code, machine_mode, rtx,
>>  			     unsigned HOST_WIDE_INT, rtx, rtx, int, int);
>> @@ -663,6 +664,9 @@ static const struct attribute_spec arm_attribute_t
>>  #undef TARGET_SCALAR_MODE_SUPPORTED_P
>>  #define TARGET_SCALAR_MODE_SUPPORTED_P arm_scalar_mode_supported_p
>>  
>> +#undef TARGET_COMPUTE_FRAME_LAYOUT
>> +#define TARGET_COMPUTE_FRAME_LAYOUT arm_compute_frame_layout
>> +
>>  #undef TARGET_FRAME_POINTER_REQUIRED
>>  #define TARGET_FRAME_POINTER_REQUIRED arm_frame_pointer_required
>>  
>> @@ -3880,6 +3884,10 @@ use_simple_return_p (void)
>>  {
>>    arm_stack_offsets *offsets;
>>  
>> +  /* Note this function can be called before or after reload.  */
>> +  if (!reload_completed)
>> +    arm_compute_frame_layout ();
>> +
>>    offsets = arm_get_frame_offsets ();
>>    return offsets->outgoing_args != 0;
>>  }
>> @@ -19370,7 +19378,7 @@ arm_compute_static_chain_stack_bytes (void)
>>  
>>  /* Compute a bit mask of which registers need to be
>>     saved on the stack for the current function.
>> -   This is used by arm_get_frame_offsets, which may add extra registers.  */
>> +   This is used by arm_compute_frame_layout, which may add extra registers.  */
>>  
>>  static unsigned long
>>  arm_compute_save_reg_mask (void)
>> @@ -20926,12 +20934,25 @@ any_sibcall_could_use_r3 (void)
>>    alignment.  */
>>  
>>  
>> +/* Return cached stack offsets.  */
>> +
>> +static arm_stack_offsets *
>> +arm_get_frame_offsets (void)
>> +{
>> +  struct arm_stack_offsets *offsets;
>> +
>> +  offsets = &cfun->machine->stack_offsets;
>> +
>> +  return offsets;
>> +}
>> +
>> +
>>  /* Calculate stack offsets.  These are used to calculate register elimination
>>     offsets and in prologue/epilogue code.  Also calculates which registers
>>     should be saved.  */
>>  
>> -static arm_stack_offsets *
>> -arm_get_frame_offsets (void)
>> +static void
>> +arm_compute_frame_layout (void)
>>  {
>>    struct arm_stack_offsets *offsets;
>>    unsigned long func_type;
>> @@ -20943,19 +20964,6 @@ any_sibcall_could_use_r3 (void)
>>  
>>    offsets = &cfun->machine->stack_offsets;
>>  
>> -  /* We need to know if we are a leaf function.  Unfortunately, it
>> -     is possible to be called after start_sequence has been called,
>> -     which causes get_insns to return the insns for the sequence,
>> -     not the function, which will cause leaf_function_p to return
>> -     the incorrect result.
>> -
>> -     to know about leaf functions once reload has completed, and the
>> -     frame size cannot be changed after that time, so we can safely
>> -     use the cached value.  */
>> -
>> -  if (reload_completed)
>> -    return offsets;
>> -
>>    /* Initially this is the size of the local variables.  It will translated
>>       into an offset once we have determined the size of preceding data.  */
>>    frame_size = ROUND_UP_WORD (get_frame_size ());
>> @@ -21022,7 +21030,7 @@ any_sibcall_could_use_r3 (void)
>>      {
>>        offsets->outgoing_args = offsets->soft_frame;
>>        offsets->locals_base = offsets->soft_frame;
>> -      return offsets;
>> +      return;
>>      }
>>  
>>    /* Ensure SFP has the correct alignment.  */
>> @@ -21098,8 +21106,6 @@ any_sibcall_could_use_r3 (void)
>>  	offsets->outgoing_args += 4;
>>        gcc_assert (!(offsets->outgoing_args & 7));
>>      }
>> -
>> -  return offsets;
>>  }
>>  
>>  
>> Index: gcc/doc/tm.texi
>> ===================================================================
>> --- gcc/doc/tm.texi	(revision 239144)
>> +++ gcc/doc/tm.texi	(working copy)
>> @@ -3693,6 +3693,14 @@ registers.  This macro must be defined if @code{EL
>>  defined.
>>  @end defmac
>>  
>> +@deftypefn {Target Hook} void TARGET_COMPUTE_FRAME_LAYOUT (void)
>> +This target hook allows the target to compute the frame layout once and
>> +make use of the cached frame layout in @code{INITIAL_ELIMINATION_OFFSET}
>> +instead of re-computing it on every invocation.  This is particularly
>> +useful for targets that have an expensive frame layout function.
>> +Implementing this callback is optional.
>> +@end deftypefn
>> +
>>  @node Stack Arguments
>>  @subsection Passing Function Arguments on the Stack
>>  @cindex arguments on stack
>> Index: gcc/doc/tm.texi.in
>> ===================================================================
>> --- gcc/doc/tm.texi.in	(revision 239144)
>> +++ gcc/doc/tm.texi.in	(working copy)
>> @@ -3227,6 +3227,8 @@ registers.  This macro must be defined if @code{EL
>>  defined.
>>  @end defmac
>>  
>> +@hook TARGET_COMPUTE_FRAME_LAYOUT
>> +
>>  @node Stack Arguments
>>  @subsection Passing Function Arguments on the Stack
>>  @cindex arguments on stack
>> Index: gcc/lra-eliminations.c
>> ===================================================================
>> --- gcc/lra-eliminations.c	(revision 239144)
>> +++ gcc/lra-eliminations.c	(working copy)
>> @@ -1203,6 +1203,10 @@ update_reg_eliminate (bitmap insns_with_changed_of
>>    struct lra_elim_table *ep, *ep1;
>>    HARD_REG_SET temp_hard_reg_set;
>>  
>> +#ifdef ELIMINABLE_REGS
>> +  targetm.compute_frame_layout ();
>> +#endif
>> +
>>    /* Clear self elimination offsets.  */
>>    for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>      self_elim_offsets[ep->from] = 0;
>> Index: gcc/reload1.c
>> ===================================================================
>> --- gcc/reload1.c	(revision 239144)
>> +++ gcc/reload1.c	(working copy)
>> @@ -3831,6 +3831,7 @@ verify_initial_elim_offsets (void)
>>    {
>>     struct elim_table *ep;
>>  
>> +   targetm.compute_frame_layout ();
>>     for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>       {
>>         INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, t);
>> @@ -3855,6 +3856,7 @@ set_initial_elim_offsets (void)
>>    struct elim_table *ep = reg_eliminate;
>>  
>>  #ifdef ELIMINABLE_REGS
>> +  targetm.compute_frame_layout ();
>>    for (; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>      {
>>        INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, ep->initial_offset);
>> Index: gcc/target.def
>> ===================================================================
>> --- gcc/target.def	(revision 239144)
>> +++ gcc/target.def	(working copy)
>> @@ -5269,8 +5269,19 @@ five otherwise.  This is best for most machines.",
>>   unsigned int, (void),
>>   default_case_values_threshold)
>>  
>> -/* Retutn true if a function must have and use a frame pointer.  */
>> +/* Optional callback to advise the target to compute the frame layout.  */
>>  DEFHOOK
>> +(compute_frame_layout,
>> + "This target hook allows the target to compute the frame layout once and\n\
>> +make use of the cached frame layout in @code{INITIAL_ELIMINATION_OFFSET}\n\
>> +instead of re-computing it on every invocation.  This is particularly\n\
>> +useful for targets that have an expensive frame layout function.\n\
>> +Implementing this callback is optional.",
>> + void, (void),
>> + hook_void_void)
>> +
>> +/* Return true if a function must have and use a frame pointer.  */
>> +DEFHOOK
>>  (frame_pointer_required,
>>   "This target hook should return @code{true} if a function must have and use\n\
>>  a frame pointer.  This target hook is called in the reload pass.  If its return\n\

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH, ARM] Add a new target hook to compute the frame layout
  2016-08-05  9:29     ` Richard Earnshaw (lists)
@ 2016-08-05 12:49       ` Bernd Edlinger
  2016-08-05 14:06         ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 15+ messages in thread
From: Bernd Edlinger @ 2016-08-05 12:49 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Jeff Law, gcc-patches
  Cc: Richard Biener, Jakub Jelinek, Richard Sandiford,
	Ramana Radhakrishnan, Nick Clifton

On 08/05/16 11:29, Richard Earnshaw (lists) wrote:
> On 04/08/16 22:16, Bernd Edlinger wrote:
>> Hi,
>>
>> this patch introduces a new target hook that allows the target's
>> INITIAL_ELIMINATION_OFFSET function to use cached values instead of
>> re-computing the frame layout every time.
>>
>> I have updated the documentation a bit and hope it is clearer this time.
>>
>> It still needs a review by ARM port maintainers.
>>
>> If the ARM port maintainers find this patch useful, that would be fine.
>>
>
> I need to look into this more, but my first thought was that the
> documentation is confusing, or there is a real problem in this patch.
>

Thanks for your quick response.

The documentation is actually the most difficult part for me.

> As I understand it the frame has to be re-laid out each time the
> contents of the frame changes (an extra register becomes live or another
> spill slot is needed).  So saying that it is laid out once can't be
> right if (as I read it initially) you mean 'once per function' since I
> think it needs to be 'once for each time the frame contents changes'.
>
> Of course, things might be a bit different with LRA compared with
> reload, but I strongly suspect this is never going to be an 'exactly
> once per function' operation.
>

Right.  It will be done 2 or 3 times for each function.
LRA and reload behave identical in that respect.

But each time reload changes something in the input data the
INITIAL_EMIMINATION_OFFSET is called several times, and the results
have to be consistent in each iteration.

The frame layout function has no way to know if the frame layout
is expected to change or not.

Many targets use reload_completed as an indication when the frame layout
may not change at all, but that is only an approximation.

> Can you clarify your meaning in the documentation please?
>

I meant 'once' in the sense of 'once for each time the frame contents 
change'.

Thus I'd change that sentence to:

"This target hook allows the target to compute the frame layout once for
each time the frame contents change and make use of the cached frame
layout in @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing it
on every invocation.  This is particularly useful for targets that have
an expensive frame layout function.  Implementing this callback is
optional."


Thanks
Bernd.


> R.
>
>>
>> Thanks
>> Bernd.
>>
>> On 06/21/16 23:29, Jeff Law wrote:
>>> On 06/16/2016 08:47 AM, Bernd Edlinger wrote:
>>>> Hi!
>>>>
>>>>
>>>> By the design of the target hook INITIAL_ELIMINATION_OFFSET
>>>> it is necessary to call this function several times with
>>>> different register combinations.
>>>> Most targets use a cached data structure that describes the
>>>> exact frame layout of the current function.
>>>>
>>>> It is safe to skip the computation when reload_completed = true,
>>>> and most targets do that already.
>>>>
>>>> However while reload is doing its work, it is not clear when to
>>>> do the computation and when not.  This results in unnecessary
>>>> work.  Computing the frame layout can be a simple function or an
>>>> arbitrarily complex one, that walks all instructions of the current
>>>> function for instance, which is more or less the common case.
>>>>
>>>>
>>>> This patch adds a new optional target hook that can be used
>>>> by the target to factor the INITIAL_ELIMINATION_OFFSET-hook
>>>> into a O(n) computation part, and a O(1) result function.
>>>>
>>>> The patch implements a compute_frame_layout target hook just
>>>> for ARM in the moment, to show the principle.
>>>> Other targets may also implement that hook, if it seems appropriate.
>>>>
>>>>
>>>> Boot-strapped and reg-tested on arm-linux-gnueabihf.
>>>> OK for trunk?
>>>>
>>>>
>>>> Thanks
>>>> Bernd.
>>>>
>>>>
>>>> changelog-frame-layout.txt
>>>>
>>>>
>>>> 2016-06-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>>>
>>>>      * target.def (compute_frame_layout): New optional target hook.
>>>>      * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>>>>      * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>>>>      * lra-eliminations.c (update_reg_eliminate): Call
>>>> compute_frame_layout
>>>>      target hook.
>>>>      * reload1.c (verify_initial_elim_offsets): Likewise.
>>>>      * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>>>>      (use_simple_return_p): Call arm_compute_frame_layout if needed.
>>>>      (arm_get_frame_offsets): Split up into this ...
>>>>      (arm_compute_frame_layout): ... and this function.
>>> The ARM maintainers would need to chime in on the ARM specific changes
>>> though.
>>>
>>>
>>>
>>>> Index: gcc/target.def
>>>> ===================================================================
>>>> --- gcc/target.def    (Revision 233176)
>>>> +++ gcc/target.def    (Arbeitskopie)
>>>> @@ -5245,8 +5245,19 @@ five otherwise.  This is best for most machines.",
>>>>    unsigned int, (void),
>>>>    default_case_values_threshold)
>>>>
>>>> -/* Retutn true if a function must have and use a frame pointer.  */
>>> s/Retutn/Return
>>>
>>>> +/* Optional callback to advise the target to compute the frame
>>>> layout.  */
>>>>   DEFHOOK
>>>> +(compute_frame_layout,
>>>> + "This target hook is called immediately before reload wants to call\n\
>>>> +@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the
>>>> frame\n\
>>>> +layout instead of re-computing it on every invocation.  This is
>>>> particularly\n\
>>>> +useful for targets that have an O(n) frame layout function.
>>>> Implementing\n\
>>>> +this callback is optional.",
>>>> + void, (void),
>>>> + hook_void_void)
>>> So the docs say "immediately before", but that's not actually reality in
>>> lra-eliminations.  I think you can just say "This target hook is called
>>> before reload or lra-eliminations calls
>>> @code{INITIAL_ELIMINATION_OFFSET} and allows ..."
>>>
>>>
>>> How does this macro interact with INITIAL_FRAME_POINTER_OFFSET?
>>>
>>> I'm OK with this conceptually.  I think you need a minor doc update and
>>> OK from the ARM maintainers before it can be installed though.
>>>
>>> jeff
>>>
>>> changelog-frame-layout.txt
>>>
>>>
>>> 2016-06-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>>
>>> 	* target.def (compute_frame_layout): New optional target hook.
>>> 	* doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>>> 	* doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>>> 	* lra-eliminations.c (update_reg_eliminate): Call compute_frame_layout
>>> 	target hook.
>>> 	* reload1.c (verify_initial_elim_offsets): Likewise.
>>> 	* config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>>> 	(use_simple_return_p): Call arm_compute_frame_layout if needed.
>>> 	(arm_get_frame_offsets): Split up into this ...
>>> 	(arm_compute_frame_layout): ... and this function.
>>>
>>> patch-frame-layout.diff
>>>
>>>
>>> Index: gcc/config/arm/arm.c
>>> ===================================================================
>>> --- gcc/config/arm/arm.c	(revision 239144)
>>> +++ gcc/config/arm/arm.c	(working copy)
>>> @@ -81,6 +81,7 @@ static bool arm_const_not_ok_for_debug_p (rtx);
>>>   static bool arm_needs_doubleword_align (machine_mode, const_tree);
>>>   static int arm_compute_static_chain_stack_bytes (void);
>>>   static arm_stack_offsets *arm_get_frame_offsets (void);
>>> +static void arm_compute_frame_layout (void);
>>>   static void arm_add_gc_roots (void);
>>>   static int arm_gen_constant (enum rtx_code, machine_mode, rtx,
>>>   			     unsigned HOST_WIDE_INT, rtx, rtx, int, int);
>>> @@ -663,6 +664,9 @@ static const struct attribute_spec arm_attribute_t
>>>   #undef TARGET_SCALAR_MODE_SUPPORTED_P
>>>   #define TARGET_SCALAR_MODE_SUPPORTED_P arm_scalar_mode_supported_p
>>>
>>> +#undef TARGET_COMPUTE_FRAME_LAYOUT
>>> +#define TARGET_COMPUTE_FRAME_LAYOUT arm_compute_frame_layout
>>> +
>>>   #undef TARGET_FRAME_POINTER_REQUIRED
>>>   #define TARGET_FRAME_POINTER_REQUIRED arm_frame_pointer_required
>>>
>>> @@ -3880,6 +3884,10 @@ use_simple_return_p (void)
>>>   {
>>>     arm_stack_offsets *offsets;
>>>
>>> +  /* Note this function can be called before or after reload.  */
>>> +  if (!reload_completed)
>>> +    arm_compute_frame_layout ();
>>> +
>>>     offsets = arm_get_frame_offsets ();
>>>     return offsets->outgoing_args != 0;
>>>   }
>>> @@ -19370,7 +19378,7 @@ arm_compute_static_chain_stack_bytes (void)
>>>
>>>   /* Compute a bit mask of which registers need to be
>>>      saved on the stack for the current function.
>>> -   This is used by arm_get_frame_offsets, which may add extra registers.  */
>>> +   This is used by arm_compute_frame_layout, which may add extra registers.  */
>>>
>>>   static unsigned long
>>>   arm_compute_save_reg_mask (void)
>>> @@ -20926,12 +20934,25 @@ any_sibcall_could_use_r3 (void)
>>>     alignment.  */
>>>
>>>
>>> +/* Return cached stack offsets.  */
>>> +
>>> +static arm_stack_offsets *
>>> +arm_get_frame_offsets (void)
>>> +{
>>> +  struct arm_stack_offsets *offsets;
>>> +
>>> +  offsets = &cfun->machine->stack_offsets;
>>> +
>>> +  return offsets;
>>> +}
>>> +
>>> +
>>>   /* Calculate stack offsets.  These are used to calculate register elimination
>>>      offsets and in prologue/epilogue code.  Also calculates which registers
>>>      should be saved.  */
>>>
>>> -static arm_stack_offsets *
>>> -arm_get_frame_offsets (void)
>>> +static void
>>> +arm_compute_frame_layout (void)
>>>   {
>>>     struct arm_stack_offsets *offsets;
>>>     unsigned long func_type;
>>> @@ -20943,19 +20964,6 @@ any_sibcall_could_use_r3 (void)
>>>
>>>     offsets = &cfun->machine->stack_offsets;
>>>
>>> -  /* We need to know if we are a leaf function.  Unfortunately, it
>>> -     is possible to be called after start_sequence has been called,
>>> -     which causes get_insns to return the insns for the sequence,
>>> -     not the function, which will cause leaf_function_p to return
>>> -     the incorrect result.
>>> -
>>> -     to know about leaf functions once reload has completed, and the
>>> -     frame size cannot be changed after that time, so we can safely
>>> -     use the cached value.  */
>>> -
>>> -  if (reload_completed)
>>> -    return offsets;
>>> -
>>>     /* Initially this is the size of the local variables.  It will translated
>>>        into an offset once we have determined the size of preceding data.  */
>>>     frame_size = ROUND_UP_WORD (get_frame_size ());
>>> @@ -21022,7 +21030,7 @@ any_sibcall_could_use_r3 (void)
>>>       {
>>>         offsets->outgoing_args = offsets->soft_frame;
>>>         offsets->locals_base = offsets->soft_frame;
>>> -      return offsets;
>>> +      return;
>>>       }
>>>
>>>     /* Ensure SFP has the correct alignment.  */
>>> @@ -21098,8 +21106,6 @@ any_sibcall_could_use_r3 (void)
>>>   	offsets->outgoing_args += 4;
>>>         gcc_assert (!(offsets->outgoing_args & 7));
>>>       }
>>> -
>>> -  return offsets;
>>>   }
>>>
>>>
>>> Index: gcc/doc/tm.texi
>>> ===================================================================
>>> --- gcc/doc/tm.texi	(revision 239144)
>>> +++ gcc/doc/tm.texi	(working copy)
>>> @@ -3693,6 +3693,14 @@ registers.  This macro must be defined if @code{EL
>>>   defined.
>>>   @end defmac
>>>
>>> +@deftypefn {Target Hook} void TARGET_COMPUTE_FRAME_LAYOUT (void)
>>> +This target hook allows the target to compute the frame layout once and
>>> +make use of the cached frame layout in @code{INITIAL_ELIMINATION_OFFSET}
>>> +instead of re-computing it on every invocation.  This is particularly
>>> +useful for targets that have an expensive frame layout function.
>>> +Implementing this callback is optional.
>>> +@end deftypefn
>>> +
>>>   @node Stack Arguments
>>>   @subsection Passing Function Arguments on the Stack
>>>   @cindex arguments on stack
>>> Index: gcc/doc/tm.texi.in
>>> ===================================================================
>>> --- gcc/doc/tm.texi.in	(revision 239144)
>>> +++ gcc/doc/tm.texi.in	(working copy)
>>> @@ -3227,6 +3227,8 @@ registers.  This macro must be defined if @code{EL
>>>   defined.
>>>   @end defmac
>>>
>>> +@hook TARGET_COMPUTE_FRAME_LAYOUT
>>> +
>>>   @node Stack Arguments
>>>   @subsection Passing Function Arguments on the Stack
>>>   @cindex arguments on stack
>>> Index: gcc/lra-eliminations.c
>>> ===================================================================
>>> --- gcc/lra-eliminations.c	(revision 239144)
>>> +++ gcc/lra-eliminations.c	(working copy)
>>> @@ -1203,6 +1203,10 @@ update_reg_eliminate (bitmap insns_with_changed_of
>>>     struct lra_elim_table *ep, *ep1;
>>>     HARD_REG_SET temp_hard_reg_set;
>>>
>>> +#ifdef ELIMINABLE_REGS
>>> +  targetm.compute_frame_layout ();
>>> +#endif
>>> +
>>>     /* Clear self elimination offsets.  */
>>>     for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>>       self_elim_offsets[ep->from] = 0;
>>> Index: gcc/reload1.c
>>> ===================================================================
>>> --- gcc/reload1.c	(revision 239144)
>>> +++ gcc/reload1.c	(working copy)
>>> @@ -3831,6 +3831,7 @@ verify_initial_elim_offsets (void)
>>>     {
>>>      struct elim_table *ep;
>>>
>>> +   targetm.compute_frame_layout ();
>>>      for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>>        {
>>>          INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, t);
>>> @@ -3855,6 +3856,7 @@ set_initial_elim_offsets (void)
>>>     struct elim_table *ep = reg_eliminate;
>>>
>>>   #ifdef ELIMINABLE_REGS
>>> +  targetm.compute_frame_layout ();
>>>     for (; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>>       {
>>>         INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, ep->initial_offset);
>>> Index: gcc/target.def
>>> ===================================================================
>>> --- gcc/target.def	(revision 239144)
>>> +++ gcc/target.def	(working copy)
>>> @@ -5269,8 +5269,19 @@ five otherwise.  This is best for most machines.",
>>>    unsigned int, (void),
>>>    default_case_values_threshold)
>>>
>>> -/* Retutn true if a function must have and use a frame pointer.  */
>>> +/* Optional callback to advise the target to compute the frame layout.  */
>>>   DEFHOOK
>>> +(compute_frame_layout,
>>> + "This target hook allows the target to compute the frame layout once and\n\
>>> +make use of the cached frame layout in @code{INITIAL_ELIMINATION_OFFSET}\n\
>>> +instead of re-computing it on every invocation.  This is particularly\n\
>>> +useful for targets that have an expensive frame layout function.\n\
>>> +Implementing this callback is optional.",
>>> + void, (void),
>>> + hook_void_void)
>>> +
>>> +/* Return true if a function must have and use a frame pointer.  */
>>> +DEFHOOK
>>>   (frame_pointer_required,
>>>    "This target hook should return @code{true} if a function must have and use\n\
>>>   a frame pointer.  This target hook is called in the reload pass.  If its return\n\
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH, ARM] Add a new target hook to compute the frame layout
  2016-08-05 12:49       ` Bernd Edlinger
@ 2016-08-05 14:06         ` Richard Earnshaw (lists)
  2016-08-05 15:23           ` Bernd Edlinger
  2016-09-05 16:45           ` Bernd Edlinger
  0 siblings, 2 replies; 15+ messages in thread
From: Richard Earnshaw (lists) @ 2016-08-05 14:06 UTC (permalink / raw)
  To: Bernd Edlinger, Jeff Law, gcc-patches
  Cc: Richard Biener, Jakub Jelinek, Richard Sandiford,
	Ramana Radhakrishnan, Nick Clifton

On 05/08/16 13:49, Bernd Edlinger wrote:
> On 08/05/16 11:29, Richard Earnshaw (lists) wrote:
>> On 04/08/16 22:16, Bernd Edlinger wrote:
>>> Hi,
>>>
>>> this patch introduces a new target hook that allows the target's
>>> INITIAL_ELIMINATION_OFFSET function to use cached values instead of
>>> re-computing the frame layout every time.
>>>
>>> I have updated the documentation a bit and hope it is clearer this time.
>>>
>>> It still needs a review by ARM port maintainers.
>>>
>>> If the ARM port maintainers find this patch useful, that would be fine.
>>>
>>
>> I need to look into this more, but my first thought was that the
>> documentation is confusing, or there is a real problem in this patch.
>>
> 
> Thanks for your quick response.
> 
> The documentation is actually the most difficult part for me.
> 
>> As I understand it the frame has to be re-laid out each time the
>> contents of the frame changes (an extra register becomes live or another
>> spill slot is needed).  So saying that it is laid out once can't be
>> right if (as I read it initially) you mean 'once per function' since I
>> think it needs to be 'once for each time the frame contents changes'.
>>
>> Of course, things might be a bit different with LRA compared with
>> reload, but I strongly suspect this is never going to be an 'exactly
>> once per function' operation.
>>
> 
> Right.  It will be done 2 or 3 times for each function.
> LRA and reload behave identical in that respect.
> 
> But each time reload changes something in the input data the
> INITIAL_EMIMINATION_OFFSET is called several times, and the results
> have to be consistent in each iteration.
> 
> The frame layout function has no way to know if the frame layout
> is expected to change or not.
> 
> Many targets use reload_completed as an indication when the frame layout
> may not change at all, but that is only an approximation.
> 
>> Can you clarify your meaning in the documentation please?
>>
> 
> I meant 'once' in the sense of 'once for each time the frame contents 
> change'.
> 
> Thus I'd change that sentence to:
> 
> "This target hook allows the target to compute the frame layout once for
> each time the frame contents change and make use of the cached frame
> layout in @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing it
> on every invocation.  This is particularly useful for targets that have
> an expensive frame layout function.  Implementing this callback is
> optional."
> 

Thanks, that's pretty much what I expected would be the case.

Could I suggest:

This target hook is called once each time the frame layout needs to be
recalculated.  The calculations can be cached by the target and can then
be used by @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing the
layout on every invocation of that hook.  This is particularly useful
for targets that have an expensive frame layout function.  Implementing
this callback is optional.

R.

> 
> Thanks
> Bernd.
> 
> 
>> R.
>>
>>>
>>> Thanks
>>> Bernd.
>>>
>>> On 06/21/16 23:29, Jeff Law wrote:
>>>> On 06/16/2016 08:47 AM, Bernd Edlinger wrote:
>>>>> Hi!
>>>>>
>>>>>
>>>>> By the design of the target hook INITIAL_ELIMINATION_OFFSET
>>>>> it is necessary to call this function several times with
>>>>> different register combinations.
>>>>> Most targets use a cached data structure that describes the
>>>>> exact frame layout of the current function.
>>>>>
>>>>> It is safe to skip the computation when reload_completed = true,
>>>>> and most targets do that already.
>>>>>
>>>>> However while reload is doing its work, it is not clear when to
>>>>> do the computation and when not.  This results in unnecessary
>>>>> work.  Computing the frame layout can be a simple function or an
>>>>> arbitrarily complex one, that walks all instructions of the current
>>>>> function for instance, which is more or less the common case.
>>>>>
>>>>>
>>>>> This patch adds a new optional target hook that can be used
>>>>> by the target to factor the INITIAL_ELIMINATION_OFFSET-hook
>>>>> into a O(n) computation part, and a O(1) result function.
>>>>>
>>>>> The patch implements a compute_frame_layout target hook just
>>>>> for ARM in the moment, to show the principle.
>>>>> Other targets may also implement that hook, if it seems appropriate.
>>>>>
>>>>>
>>>>> Boot-strapped and reg-tested on arm-linux-gnueabihf.
>>>>> OK for trunk?
>>>>>
>>>>>
>>>>> Thanks
>>>>> Bernd.
>>>>>
>>>>>
>>>>> changelog-frame-layout.txt
>>>>>
>>>>>
>>>>> 2016-06-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>>>>
>>>>>      * target.def (compute_frame_layout): New optional target hook.
>>>>>      * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>>>>>      * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>>>>>      * lra-eliminations.c (update_reg_eliminate): Call
>>>>> compute_frame_layout
>>>>>      target hook.
>>>>>      * reload1.c (verify_initial_elim_offsets): Likewise.
>>>>>      * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>>>>>      (use_simple_return_p): Call arm_compute_frame_layout if needed.
>>>>>      (arm_get_frame_offsets): Split up into this ...
>>>>>      (arm_compute_frame_layout): ... and this function.
>>>> The ARM maintainers would need to chime in on the ARM specific changes
>>>> though.
>>>>
>>>>
>>>>
>>>>> Index: gcc/target.def
>>>>> ===================================================================
>>>>> --- gcc/target.def    (Revision 233176)
>>>>> +++ gcc/target.def    (Arbeitskopie)
>>>>> @@ -5245,8 +5245,19 @@ five otherwise.  This is best for most machines.",
>>>>>    unsigned int, (void),
>>>>>    default_case_values_threshold)
>>>>>
>>>>> -/* Retutn true if a function must have and use a frame pointer.  */
>>>> s/Retutn/Return
>>>>
>>>>> +/* Optional callback to advise the target to compute the frame
>>>>> layout.  */
>>>>>   DEFHOOK
>>>>> +(compute_frame_layout,
>>>>> + "This target hook is called immediately before reload wants to call\n\
>>>>> +@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the
>>>>> frame\n\
>>>>> +layout instead of re-computing it on every invocation.  This is
>>>>> particularly\n\
>>>>> +useful for targets that have an O(n) frame layout function.
>>>>> Implementing\n\
>>>>> +this callback is optional.",
>>>>> + void, (void),
>>>>> + hook_void_void)
>>>> So the docs say "immediately before", but that's not actually reality in
>>>> lra-eliminations.  I think you can just say "This target hook is called
>>>> before reload or lra-eliminations calls
>>>> @code{INITIAL_ELIMINATION_OFFSET} and allows ..."
>>>>
>>>>
>>>> How does this macro interact with INITIAL_FRAME_POINTER_OFFSET?
>>>>
>>>> I'm OK with this conceptually.  I think you need a minor doc update and
>>>> OK from the ARM maintainers before it can be installed though.
>>>>
>>>> jeff
>>>>
>>>> changelog-frame-layout.txt
>>>>
>>>>
>>>> 2016-06-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>>>
>>>> 	* target.def (compute_frame_layout): New optional target hook.
>>>> 	* doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>>>> 	* doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>>>> 	* lra-eliminations.c (update_reg_eliminate): Call compute_frame_layout
>>>> 	target hook.
>>>> 	* reload1.c (verify_initial_elim_offsets): Likewise.
>>>> 	* config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>>>> 	(use_simple_return_p): Call arm_compute_frame_layout if needed.
>>>> 	(arm_get_frame_offsets): Split up into this ...
>>>> 	(arm_compute_frame_layout): ... and this function.
>>>>
>>>> patch-frame-layout.diff
>>>>
>>>>
>>>> Index: gcc/config/arm/arm.c
>>>> ===================================================================
>>>> --- gcc/config/arm/arm.c	(revision 239144)
>>>> +++ gcc/config/arm/arm.c	(working copy)
>>>> @@ -81,6 +81,7 @@ static bool arm_const_not_ok_for_debug_p (rtx);
>>>>   static bool arm_needs_doubleword_align (machine_mode, const_tree);
>>>>   static int arm_compute_static_chain_stack_bytes (void);
>>>>   static arm_stack_offsets *arm_get_frame_offsets (void);
>>>> +static void arm_compute_frame_layout (void);
>>>>   static void arm_add_gc_roots (void);
>>>>   static int arm_gen_constant (enum rtx_code, machine_mode, rtx,
>>>>   			     unsigned HOST_WIDE_INT, rtx, rtx, int, int);
>>>> @@ -663,6 +664,9 @@ static const struct attribute_spec arm_attribute_t
>>>>   #undef TARGET_SCALAR_MODE_SUPPORTED_P
>>>>   #define TARGET_SCALAR_MODE_SUPPORTED_P arm_scalar_mode_supported_p
>>>>
>>>> +#undef TARGET_COMPUTE_FRAME_LAYOUT
>>>> +#define TARGET_COMPUTE_FRAME_LAYOUT arm_compute_frame_layout
>>>> +
>>>>   #undef TARGET_FRAME_POINTER_REQUIRED
>>>>   #define TARGET_FRAME_POINTER_REQUIRED arm_frame_pointer_required
>>>>
>>>> @@ -3880,6 +3884,10 @@ use_simple_return_p (void)
>>>>   {
>>>>     arm_stack_offsets *offsets;
>>>>
>>>> +  /* Note this function can be called before or after reload.  */
>>>> +  if (!reload_completed)
>>>> +    arm_compute_frame_layout ();
>>>> +
>>>>     offsets = arm_get_frame_offsets ();
>>>>     return offsets->outgoing_args != 0;
>>>>   }
>>>> @@ -19370,7 +19378,7 @@ arm_compute_static_chain_stack_bytes (void)
>>>>
>>>>   /* Compute a bit mask of which registers need to be
>>>>      saved on the stack for the current function.
>>>> -   This is used by arm_get_frame_offsets, which may add extra registers.  */
>>>> +   This is used by arm_compute_frame_layout, which may add extra registers.  */
>>>>
>>>>   static unsigned long
>>>>   arm_compute_save_reg_mask (void)
>>>> @@ -20926,12 +20934,25 @@ any_sibcall_could_use_r3 (void)
>>>>     alignment.  */
>>>>
>>>>
>>>> +/* Return cached stack offsets.  */
>>>> +
>>>> +static arm_stack_offsets *
>>>> +arm_get_frame_offsets (void)
>>>> +{
>>>> +  struct arm_stack_offsets *offsets;
>>>> +
>>>> +  offsets = &cfun->machine->stack_offsets;
>>>> +
>>>> +  return offsets;
>>>> +}
>>>> +
>>>> +
>>>>   /* Calculate stack offsets.  These are used to calculate register elimination
>>>>      offsets and in prologue/epilogue code.  Also calculates which registers
>>>>      should be saved.  */
>>>>
>>>> -static arm_stack_offsets *
>>>> -arm_get_frame_offsets (void)
>>>> +static void
>>>> +arm_compute_frame_layout (void)
>>>>   {
>>>>     struct arm_stack_offsets *offsets;
>>>>     unsigned long func_type;
>>>> @@ -20943,19 +20964,6 @@ any_sibcall_could_use_r3 (void)
>>>>
>>>>     offsets = &cfun->machine->stack_offsets;
>>>>
>>>> -  /* We need to know if we are a leaf function.  Unfortunately, it
>>>> -     is possible to be called after start_sequence has been called,
>>>> -     which causes get_insns to return the insns for the sequence,
>>>> -     not the function, which will cause leaf_function_p to return
>>>> -     the incorrect result.
>>>> -
>>>> -     to know about leaf functions once reload has completed, and the
>>>> -     frame size cannot be changed after that time, so we can safely
>>>> -     use the cached value.  */
>>>> -
>>>> -  if (reload_completed)
>>>> -    return offsets;
>>>> -
>>>>     /* Initially this is the size of the local variables.  It will translated
>>>>        into an offset once we have determined the size of preceding data.  */
>>>>     frame_size = ROUND_UP_WORD (get_frame_size ());
>>>> @@ -21022,7 +21030,7 @@ any_sibcall_could_use_r3 (void)
>>>>       {
>>>>         offsets->outgoing_args = offsets->soft_frame;
>>>>         offsets->locals_base = offsets->soft_frame;
>>>> -      return offsets;
>>>> +      return;
>>>>       }
>>>>
>>>>     /* Ensure SFP has the correct alignment.  */
>>>> @@ -21098,8 +21106,6 @@ any_sibcall_could_use_r3 (void)
>>>>   	offsets->outgoing_args += 4;
>>>>         gcc_assert (!(offsets->outgoing_args & 7));
>>>>       }
>>>> -
>>>> -  return offsets;
>>>>   }
>>>>
>>>>
>>>> Index: gcc/doc/tm.texi
>>>> ===================================================================
>>>> --- gcc/doc/tm.texi	(revision 239144)
>>>> +++ gcc/doc/tm.texi	(working copy)
>>>> @@ -3693,6 +3693,14 @@ registers.  This macro must be defined if @code{EL
>>>>   defined.
>>>>   @end defmac
>>>>
>>>> +@deftypefn {Target Hook} void TARGET_COMPUTE_FRAME_LAYOUT (void)
>>>> +This target hook allows the target to compute the frame layout once and
>>>> +make use of the cached frame layout in @code{INITIAL_ELIMINATION_OFFSET}
>>>> +instead of re-computing it on every invocation.  This is particularly
>>>> +useful for targets that have an expensive frame layout function.
>>>> +Implementing this callback is optional.
>>>> +@end deftypefn
>>>> +
>>>>   @node Stack Arguments
>>>>   @subsection Passing Function Arguments on the Stack
>>>>   @cindex arguments on stack
>>>> Index: gcc/doc/tm.texi.in
>>>> ===================================================================
>>>> --- gcc/doc/tm.texi.in	(revision 239144)
>>>> +++ gcc/doc/tm.texi.in	(working copy)
>>>> @@ -3227,6 +3227,8 @@ registers.  This macro must be defined if @code{EL
>>>>   defined.
>>>>   @end defmac
>>>>
>>>> +@hook TARGET_COMPUTE_FRAME_LAYOUT
>>>> +
>>>>   @node Stack Arguments
>>>>   @subsection Passing Function Arguments on the Stack
>>>>   @cindex arguments on stack
>>>> Index: gcc/lra-eliminations.c
>>>> ===================================================================
>>>> --- gcc/lra-eliminations.c	(revision 239144)
>>>> +++ gcc/lra-eliminations.c	(working copy)
>>>> @@ -1203,6 +1203,10 @@ update_reg_eliminate (bitmap insns_with_changed_of
>>>>     struct lra_elim_table *ep, *ep1;
>>>>     HARD_REG_SET temp_hard_reg_set;
>>>>
>>>> +#ifdef ELIMINABLE_REGS
>>>> +  targetm.compute_frame_layout ();
>>>> +#endif
>>>> +
>>>>     /* Clear self elimination offsets.  */
>>>>     for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>>>       self_elim_offsets[ep->from] = 0;
>>>> Index: gcc/reload1.c
>>>> ===================================================================
>>>> --- gcc/reload1.c	(revision 239144)
>>>> +++ gcc/reload1.c	(working copy)
>>>> @@ -3831,6 +3831,7 @@ verify_initial_elim_offsets (void)
>>>>     {
>>>>      struct elim_table *ep;
>>>>
>>>> +   targetm.compute_frame_layout ();
>>>>      for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>>>        {
>>>>          INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, t);
>>>> @@ -3855,6 +3856,7 @@ set_initial_elim_offsets (void)
>>>>     struct elim_table *ep = reg_eliminate;
>>>>
>>>>   #ifdef ELIMINABLE_REGS
>>>> +  targetm.compute_frame_layout ();
>>>>     for (; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>>>       {
>>>>         INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, ep->initial_offset);
>>>> Index: gcc/target.def
>>>> ===================================================================
>>>> --- gcc/target.def	(revision 239144)
>>>> +++ gcc/target.def	(working copy)
>>>> @@ -5269,8 +5269,19 @@ five otherwise.  This is best for most machines.",
>>>>    unsigned int, (void),
>>>>    default_case_values_threshold)
>>>>
>>>> -/* Retutn true if a function must have and use a frame pointer.  */
>>>> +/* Optional callback to advise the target to compute the frame layout.  */
>>>>   DEFHOOK
>>>> +(compute_frame_layout,
>>>> + "This target hook allows the target to compute the frame layout once and\n\
>>>> +make use of the cached frame layout in @code{INITIAL_ELIMINATION_OFFSET}\n\
>>>> +instead of re-computing it on every invocation.  This is particularly\n\
>>>> +useful for targets that have an expensive frame layout function.\n\
>>>> +Implementing this callback is optional.",
>>>> + void, (void),
>>>> + hook_void_void)
>>>> +
>>>> +/* Return true if a function must have and use a frame pointer.  */
>>>> +DEFHOOK
>>>>   (frame_pointer_required,
>>>>    "This target hook should return @code{true} if a function must have and use\n\
>>>>   a frame pointer.  This target hook is called in the reload pass.  If its return\n\
>>
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH, ARM] Add a new target hook to compute the frame layout
  2016-08-05 14:06         ` Richard Earnshaw (lists)
@ 2016-08-05 15:23           ` Bernd Edlinger
  2016-09-05 16:45           ` Bernd Edlinger
  1 sibling, 0 replies; 15+ messages in thread
From: Bernd Edlinger @ 2016-08-05 15:23 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Jeff Law, gcc-patches
  Cc: Richard Biener, Jakub Jelinek, Richard Sandiford,
	Ramana Radhakrishnan, Nick Clifton

On 08/05/16 16:06, Richard Earnshaw (lists) wrote:
>
> Thanks, that's pretty much what I expected would be the case.
>
> Could I suggest:
>
> This target hook is called once each time the frame layout needs to be
> recalculated.  The calculations can be cached by the target and can then
> be used by @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing the
> layout on every invocation of that hook.  This is particularly useful
> for targets that have an expensive frame layout function.  Implementing
> this callback is optional.
>
> R.
>

Excellent!  I like this wording very much.


Thanks
Bernd.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH, ARM] Add a new target hook to compute the frame layout
  2016-08-05 14:06         ` Richard Earnshaw (lists)
  2016-08-05 15:23           ` Bernd Edlinger
@ 2016-09-05 16:45           ` Bernd Edlinger
  2017-05-05 15:02             ` Richard Earnshaw (lists)
  1 sibling, 1 reply; 15+ messages in thread
From: Bernd Edlinger @ 2016-09-05 16:45 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Jeff Law, gcc-patches
  Cc: Richard Biener, Jakub Jelinek, Richard Sandiford,
	Ramana Radhakrishnan, Nick Clifton

Hi Richard,

what do you think of this patch, is it OK (with the suggested wording)?


Thanks
Bernd.

On 08/05/16 16:06, Richard Earnshaw (lists) wrote:
> On 05/08/16 13:49, Bernd Edlinger wrote:
>> On 08/05/16 11:29, Richard Earnshaw (lists) wrote:
>>> On 04/08/16 22:16, Bernd Edlinger wrote:
>>>> Hi,
>>>>
>>>> this patch introduces a new target hook that allows the target's
>>>> INITIAL_ELIMINATION_OFFSET function to use cached values instead of
>>>> re-computing the frame layout every time.
>>>>
>>>> I have updated the documentation a bit and hope it is clearer this time.
>>>>
>>>> It still needs a review by ARM port maintainers.
>>>>
>>>> If the ARM port maintainers find this patch useful, that would be fine.
>>>>
>>>
>>> I need to look into this more, but my first thought was that the
>>> documentation is confusing, or there is a real problem in this patch.
>>>
>>
>> Thanks for your quick response.
>>
>> The documentation is actually the most difficult part for me.
>>
>>> As I understand it the frame has to be re-laid out each time the
>>> contents of the frame changes (an extra register becomes live or another
>>> spill slot is needed).  So saying that it is laid out once can't be
>>> right if (as I read it initially) you mean 'once per function' since I
>>> think it needs to be 'once for each time the frame contents changes'.
>>>
>>> Of course, things might be a bit different with LRA compared with
>>> reload, but I strongly suspect this is never going to be an 'exactly
>>> once per function' operation.
>>>
>>
>> Right.  It will be done 2 or 3 times for each function.
>> LRA and reload behave identical in that respect.
>>
>> But each time reload changes something in the input data the
>> INITIAL_EMIMINATION_OFFSET is called several times, and the results
>> have to be consistent in each iteration.
>>
>> The frame layout function has no way to know if the frame layout
>> is expected to change or not.
>>
>> Many targets use reload_completed as an indication when the frame layout
>> may not change at all, but that is only an approximation.
>>
>>> Can you clarify your meaning in the documentation please?
>>>
>>
>> I meant 'once' in the sense of 'once for each time the frame contents
>> change'.
>>
>> Thus I'd change that sentence to:
>>
>> "This target hook allows the target to compute the frame layout once for
>> each time the frame contents change and make use of the cached frame
>> layout in @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing it
>> on every invocation.  This is particularly useful for targets that have
>> an expensive frame layout function.  Implementing this callback is
>> optional."
>>
>
> Thanks, that's pretty much what I expected would be the case.
>
> Could I suggest:
>
> This target hook is called once each time the frame layout needs to be
> recalculated.  The calculations can be cached by the target and can then
> be used by @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing the
> layout on every invocation of that hook.  This is particularly useful
> for targets that have an expensive frame layout function.  Implementing
> this callback is optional.
>
> R.
>
>>
>> Thanks
>> Bernd.
>>
>>
>>> R.
>>>
>>>>
>>>> Thanks
>>>> Bernd.
>>>>
>>>> On 06/21/16 23:29, Jeff Law wrote:
>>>>> On 06/16/2016 08:47 AM, Bernd Edlinger wrote:
>>>>>> Hi!
>>>>>>
>>>>>>
>>>>>> By the design of the target hook INITIAL_ELIMINATION_OFFSET
>>>>>> it is necessary to call this function several times with
>>>>>> different register combinations.
>>>>>> Most targets use a cached data structure that describes the
>>>>>> exact frame layout of the current function.
>>>>>>
>>>>>> It is safe to skip the computation when reload_completed = true,
>>>>>> and most targets do that already.
>>>>>>
>>>>>> However while reload is doing its work, it is not clear when to
>>>>>> do the computation and when not.  This results in unnecessary
>>>>>> work.  Computing the frame layout can be a simple function or an
>>>>>> arbitrarily complex one, that walks all instructions of the current
>>>>>> function for instance, which is more or less the common case.
>>>>>>
>>>>>>
>>>>>> This patch adds a new optional target hook that can be used
>>>>>> by the target to factor the INITIAL_ELIMINATION_OFFSET-hook
>>>>>> into a O(n) computation part, and a O(1) result function.
>>>>>>
>>>>>> The patch implements a compute_frame_layout target hook just
>>>>>> for ARM in the moment, to show the principle.
>>>>>> Other targets may also implement that hook, if it seems appropriate.
>>>>>>
>>>>>>
>>>>>> Boot-strapped and reg-tested on arm-linux-gnueabihf.
>>>>>> OK for trunk?
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Bernd.
>>>>>>
>>>>>>
>>>>>> changelog-frame-layout.txt
>>>>>>
>>>>>>
>>>>>> 2016-06-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>>>>>
>>>>>>       * target.def (compute_frame_layout): New optional target hook.
>>>>>>       * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>>>>>>       * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>>>>>>       * lra-eliminations.c (update_reg_eliminate): Call
>>>>>> compute_frame_layout
>>>>>>       target hook.
>>>>>>       * reload1.c (verify_initial_elim_offsets): Likewise.
>>>>>>       * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>>>>>>       (use_simple_return_p): Call arm_compute_frame_layout if needed.
>>>>>>       (arm_get_frame_offsets): Split up into this ...
>>>>>>       (arm_compute_frame_layout): ... and this function.
>>>>> The ARM maintainers would need to chime in on the ARM specific changes
>>>>> though.
>>>>>
>>>>>
>>>>>
>>>>>> Index: gcc/target.def
>>>>>> ===================================================================
>>>>>> --- gcc/target.def    (Revision 233176)
>>>>>> +++ gcc/target.def    (Arbeitskopie)
>>>>>> @@ -5245,8 +5245,19 @@ five otherwise.  This is best for most machines.",
>>>>>>     unsigned int, (void),
>>>>>>     default_case_values_threshold)
>>>>>>
>>>>>> -/* Retutn true if a function must have and use a frame pointer.  */
>>>>> s/Retutn/Return
>>>>>
>>>>>> +/* Optional callback to advise the target to compute the frame
>>>>>> layout.  */
>>>>>>    DEFHOOK
>>>>>> +(compute_frame_layout,
>>>>>> + "This target hook is called immediately before reload wants to call\n\
>>>>>> +@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the
>>>>>> frame\n\
>>>>>> +layout instead of re-computing it on every invocation.  This is
>>>>>> particularly\n\
>>>>>> +useful for targets that have an O(n) frame layout function.
>>>>>> Implementing\n\
>>>>>> +this callback is optional.",
>>>>>> + void, (void),
>>>>>> + hook_void_void)
>>>>> So the docs say "immediately before", but that's not actually reality in
>>>>> lra-eliminations.  I think you can just say "This target hook is called
>>>>> before reload or lra-eliminations calls
>>>>> @code{INITIAL_ELIMINATION_OFFSET} and allows ..."
>>>>>
>>>>>
>>>>> How does this macro interact with INITIAL_FRAME_POINTER_OFFSET?
>>>>>
>>>>> I'm OK with this conceptually.  I think you need a minor doc update and
>>>>> OK from the ARM maintainers before it can be installed though.
>>>>>
>>>>> jeff
>>>>>
>>>>> changelog-frame-layout.txt
>>>>>
>>>>>
>>>>> 2016-06-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>>>>
>>>>> 	* target.def (compute_frame_layout): New optional target hook.
>>>>> 	* doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>>>>> 	* doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>>>>> 	* lra-eliminations.c (update_reg_eliminate): Call compute_frame_layout
>>>>> 	target hook.
>>>>> 	* reload1.c (verify_initial_elim_offsets): Likewise.
>>>>> 	* config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>>>>> 	(use_simple_return_p): Call arm_compute_frame_layout if needed.
>>>>> 	(arm_get_frame_offsets): Split up into this ...
>>>>> 	(arm_compute_frame_layout): ... and this function.
>>>>>
>>>>> patch-frame-layout.diff
>>>>>
>>>>>
>>>>> Index: gcc/config/arm/arm.c
>>>>> ===================================================================
>>>>> --- gcc/config/arm/arm.c	(revision 239144)
>>>>> +++ gcc/config/arm/arm.c	(working copy)
>>>>> @@ -81,6 +81,7 @@ static bool arm_const_not_ok_for_debug_p (rtx);
>>>>>    static bool arm_needs_doubleword_align (machine_mode, const_tree);
>>>>>    static int arm_compute_static_chain_stack_bytes (void);
>>>>>    static arm_stack_offsets *arm_get_frame_offsets (void);
>>>>> +static void arm_compute_frame_layout (void);
>>>>>    static void arm_add_gc_roots (void);
>>>>>    static int arm_gen_constant (enum rtx_code, machine_mode, rtx,
>>>>>    			     unsigned HOST_WIDE_INT, rtx, rtx, int, int);
>>>>> @@ -663,6 +664,9 @@ static const struct attribute_spec arm_attribute_t
>>>>>    #undef TARGET_SCALAR_MODE_SUPPORTED_P
>>>>>    #define TARGET_SCALAR_MODE_SUPPORTED_P arm_scalar_mode_supported_p
>>>>>
>>>>> +#undef TARGET_COMPUTE_FRAME_LAYOUT
>>>>> +#define TARGET_COMPUTE_FRAME_LAYOUT arm_compute_frame_layout
>>>>> +
>>>>>    #undef TARGET_FRAME_POINTER_REQUIRED
>>>>>    #define TARGET_FRAME_POINTER_REQUIRED arm_frame_pointer_required
>>>>>
>>>>> @@ -3880,6 +3884,10 @@ use_simple_return_p (void)
>>>>>    {
>>>>>      arm_stack_offsets *offsets;
>>>>>
>>>>> +  /* Note this function can be called before or after reload.  */
>>>>> +  if (!reload_completed)
>>>>> +    arm_compute_frame_layout ();
>>>>> +
>>>>>      offsets = arm_get_frame_offsets ();
>>>>>      return offsets->outgoing_args != 0;
>>>>>    }
>>>>> @@ -19370,7 +19378,7 @@ arm_compute_static_chain_stack_bytes (void)
>>>>>
>>>>>    /* Compute a bit mask of which registers need to be
>>>>>       saved on the stack for the current function.
>>>>> -   This is used by arm_get_frame_offsets, which may add extra registers.  */
>>>>> +   This is used by arm_compute_frame_layout, which may add extra registers.  */
>>>>>
>>>>>    static unsigned long
>>>>>    arm_compute_save_reg_mask (void)
>>>>> @@ -20926,12 +20934,25 @@ any_sibcall_could_use_r3 (void)
>>>>>      alignment.  */
>>>>>
>>>>>
>>>>> +/* Return cached stack offsets.  */
>>>>> +
>>>>> +static arm_stack_offsets *
>>>>> +arm_get_frame_offsets (void)
>>>>> +{
>>>>> +  struct arm_stack_offsets *offsets;
>>>>> +
>>>>> +  offsets = &cfun->machine->stack_offsets;
>>>>> +
>>>>> +  return offsets;
>>>>> +}
>>>>> +
>>>>> +
>>>>>    /* Calculate stack offsets.  These are used to calculate register elimination
>>>>>       offsets and in prologue/epilogue code.  Also calculates which registers
>>>>>       should be saved.  */
>>>>>
>>>>> -static arm_stack_offsets *
>>>>> -arm_get_frame_offsets (void)
>>>>> +static void
>>>>> +arm_compute_frame_layout (void)
>>>>>    {
>>>>>      struct arm_stack_offsets *offsets;
>>>>>      unsigned long func_type;
>>>>> @@ -20943,19 +20964,6 @@ any_sibcall_could_use_r3 (void)
>>>>>
>>>>>      offsets = &cfun->machine->stack_offsets;
>>>>>
>>>>> -  /* We need to know if we are a leaf function.  Unfortunately, it
>>>>> -     is possible to be called after start_sequence has been called,
>>>>> -     which causes get_insns to return the insns for the sequence,
>>>>> -     not the function, which will cause leaf_function_p to return
>>>>> -     the incorrect result.
>>>>> -
>>>>> -     to know about leaf functions once reload has completed, and the
>>>>> -     frame size cannot be changed after that time, so we can safely
>>>>> -     use the cached value.  */
>>>>> -
>>>>> -  if (reload_completed)
>>>>> -    return offsets;
>>>>> -
>>>>>      /* Initially this is the size of the local variables.  It will translated
>>>>>         into an offset once we have determined the size of preceding data.  */
>>>>>      frame_size = ROUND_UP_WORD (get_frame_size ());
>>>>> @@ -21022,7 +21030,7 @@ any_sibcall_could_use_r3 (void)
>>>>>        {
>>>>>          offsets->outgoing_args = offsets->soft_frame;
>>>>>          offsets->locals_base = offsets->soft_frame;
>>>>> -      return offsets;
>>>>> +      return;
>>>>>        }
>>>>>
>>>>>      /* Ensure SFP has the correct alignment.  */
>>>>> @@ -21098,8 +21106,6 @@ any_sibcall_could_use_r3 (void)
>>>>>    	offsets->outgoing_args += 4;
>>>>>          gcc_assert (!(offsets->outgoing_args & 7));
>>>>>        }
>>>>> -
>>>>> -  return offsets;
>>>>>    }
>>>>>
>>>>>
>>>>> Index: gcc/doc/tm.texi
>>>>> ===================================================================
>>>>> --- gcc/doc/tm.texi	(revision 239144)
>>>>> +++ gcc/doc/tm.texi	(working copy)
>>>>> @@ -3693,6 +3693,14 @@ registers.  This macro must be defined if @code{EL
>>>>>    defined.
>>>>>    @end defmac
>>>>>
>>>>> +@deftypefn {Target Hook} void TARGET_COMPUTE_FRAME_LAYOUT (void)
>>>>> +This target hook allows the target to compute the frame layout once and
>>>>> +make use of the cached frame layout in @code{INITIAL_ELIMINATION_OFFSET}
>>>>> +instead of re-computing it on every invocation.  This is particularly
>>>>> +useful for targets that have an expensive frame layout function.
>>>>> +Implementing this callback is optional.
>>>>> +@end deftypefn
>>>>> +
>>>>>    @node Stack Arguments
>>>>>    @subsection Passing Function Arguments on the Stack
>>>>>    @cindex arguments on stack
>>>>> Index: gcc/doc/tm.texi.in
>>>>> ===================================================================
>>>>> --- gcc/doc/tm.texi.in	(revision 239144)
>>>>> +++ gcc/doc/tm.texi.in	(working copy)
>>>>> @@ -3227,6 +3227,8 @@ registers.  This macro must be defined if @code{EL
>>>>>    defined.
>>>>>    @end defmac
>>>>>
>>>>> +@hook TARGET_COMPUTE_FRAME_LAYOUT
>>>>> +
>>>>>    @node Stack Arguments
>>>>>    @subsection Passing Function Arguments on the Stack
>>>>>    @cindex arguments on stack
>>>>> Index: gcc/lra-eliminations.c
>>>>> ===================================================================
>>>>> --- gcc/lra-eliminations.c	(revision 239144)
>>>>> +++ gcc/lra-eliminations.c	(working copy)
>>>>> @@ -1203,6 +1203,10 @@ update_reg_eliminate (bitmap insns_with_changed_of
>>>>>      struct lra_elim_table *ep, *ep1;
>>>>>      HARD_REG_SET temp_hard_reg_set;
>>>>>
>>>>> +#ifdef ELIMINABLE_REGS
>>>>> +  targetm.compute_frame_layout ();
>>>>> +#endif
>>>>> +
>>>>>      /* Clear self elimination offsets.  */
>>>>>      for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>>>>        self_elim_offsets[ep->from] = 0;
>>>>> Index: gcc/reload1.c
>>>>> ===================================================================
>>>>> --- gcc/reload1.c	(revision 239144)
>>>>> +++ gcc/reload1.c	(working copy)
>>>>> @@ -3831,6 +3831,7 @@ verify_initial_elim_offsets (void)
>>>>>      {
>>>>>       struct elim_table *ep;
>>>>>
>>>>> +   targetm.compute_frame_layout ();
>>>>>       for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>>>>         {
>>>>>           INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, t);
>>>>> @@ -3855,6 +3856,7 @@ set_initial_elim_offsets (void)
>>>>>      struct elim_table *ep = reg_eliminate;
>>>>>
>>>>>    #ifdef ELIMINABLE_REGS
>>>>> +  targetm.compute_frame_layout ();
>>>>>      for (; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>>>>        {
>>>>>          INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, ep->initial_offset);
>>>>> Index: gcc/target.def
>>>>> ===================================================================
>>>>> --- gcc/target.def	(revision 239144)
>>>>> +++ gcc/target.def	(working copy)
>>>>> @@ -5269,8 +5269,19 @@ five otherwise.  This is best for most machines.",
>>>>>     unsigned int, (void),
>>>>>     default_case_values_threshold)
>>>>>
>>>>> -/* Retutn true if a function must have and use a frame pointer.  */
>>>>> +/* Optional callback to advise the target to compute the frame layout.  */
>>>>>    DEFHOOK
>>>>> +(compute_frame_layout,
>>>>> + "This target hook allows the target to compute the frame layout once and\n\
>>>>> +make use of the cached frame layout in @code{INITIAL_ELIMINATION_OFFSET}\n\
>>>>> +instead of re-computing it on every invocation.  This is particularly\n\
>>>>> +useful for targets that have an expensive frame layout function.\n\
>>>>> +Implementing this callback is optional.",
>>>>> + void, (void),
>>>>> + hook_void_void)
>>>>> +
>>>>> +/* Return true if a function must have and use a frame pointer.  */
>>>>> +DEFHOOK
>>>>>    (frame_pointer_required,
>>>>>     "This target hook should return @code{true} if a function must have and use\n\
>>>>>    a frame pointer.  This target hook is called in the reload pass.  If its return\n\
>>>
>>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH, ARM] Add a new target hook to compute the frame layout
  2016-09-05 16:45           ` Bernd Edlinger
@ 2017-05-05 15:02             ` Richard Earnshaw (lists)
  2017-05-06 18:16               ` Bernd Edlinger
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Earnshaw (lists) @ 2017-05-05 15:02 UTC (permalink / raw)
  To: Bernd Edlinger, Jeff Law, gcc-patches
  Cc: Richard Biener, Jakub Jelinek, Richard Sandiford,
	Ramana Radhakrishnan, Nick Clifton

On 05/09/16 17:43, Bernd Edlinger wrote:
> Hi Richard,
> 
> what do you think of this patch, is it OK (with the suggested wording)?
> 

Bernd,

Apologies, this seems to have fallen through a crack.

I'm happy with this.  Does it still apply?

If so, I suggest applying it after a 24-hour cooling off period for any
final comments.

R.

> 
> Thanks
> Bernd.
> 
> On 08/05/16 16:06, Richard Earnshaw (lists) wrote:
>> On 05/08/16 13:49, Bernd Edlinger wrote:
>>> On 08/05/16 11:29, Richard Earnshaw (lists) wrote:
>>>> On 04/08/16 22:16, Bernd Edlinger wrote:
>>>>> Hi,
>>>>>
>>>>> this patch introduces a new target hook that allows the target's
>>>>> INITIAL_ELIMINATION_OFFSET function to use cached values instead of
>>>>> re-computing the frame layout every time.
>>>>>
>>>>> I have updated the documentation a bit and hope it is clearer this time.
>>>>>
>>>>> It still needs a review by ARM port maintainers.
>>>>>
>>>>> If the ARM port maintainers find this patch useful, that would be fine.
>>>>>
>>>>
>>>> I need to look into this more, but my first thought was that the
>>>> documentation is confusing, or there is a real problem in this patch.
>>>>
>>>
>>> Thanks for your quick response.
>>>
>>> The documentation is actually the most difficult part for me.
>>>
>>>> As I understand it the frame has to be re-laid out each time the
>>>> contents of the frame changes (an extra register becomes live or another
>>>> spill slot is needed).  So saying that it is laid out once can't be
>>>> right if (as I read it initially) you mean 'once per function' since I
>>>> think it needs to be 'once for each time the frame contents changes'.
>>>>
>>>> Of course, things might be a bit different with LRA compared with
>>>> reload, but I strongly suspect this is never going to be an 'exactly
>>>> once per function' operation.
>>>>
>>>
>>> Right.  It will be done 2 or 3 times for each function.
>>> LRA and reload behave identical in that respect.
>>>
>>> But each time reload changes something in the input data the
>>> INITIAL_EMIMINATION_OFFSET is called several times, and the results
>>> have to be consistent in each iteration.
>>>
>>> The frame layout function has no way to know if the frame layout
>>> is expected to change or not.
>>>
>>> Many targets use reload_completed as an indication when the frame layout
>>> may not change at all, but that is only an approximation.
>>>
>>>> Can you clarify your meaning in the documentation please?
>>>>
>>>
>>> I meant 'once' in the sense of 'once for each time the frame contents
>>> change'.
>>>
>>> Thus I'd change that sentence to:
>>>
>>> "This target hook allows the target to compute the frame layout once for
>>> each time the frame contents change and make use of the cached frame
>>> layout in @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing it
>>> on every invocation.  This is particularly useful for targets that have
>>> an expensive frame layout function.  Implementing this callback is
>>> optional."
>>>
>>
>> Thanks, that's pretty much what I expected would be the case.
>>
>> Could I suggest:
>>
>> This target hook is called once each time the frame layout needs to be
>> recalculated.  The calculations can be cached by the target and can then
>> be used by @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing the
>> layout on every invocation of that hook.  This is particularly useful
>> for targets that have an expensive frame layout function.  Implementing
>> this callback is optional.
>>
>> R.
>>
>>>
>>> Thanks
>>> Bernd.
>>>
>>>
>>>> R.
>>>>
>>>>>
>>>>> Thanks
>>>>> Bernd.
>>>>>
>>>>> On 06/21/16 23:29, Jeff Law wrote:
>>>>>> On 06/16/2016 08:47 AM, Bernd Edlinger wrote:
>>>>>>> Hi!
>>>>>>>
>>>>>>>
>>>>>>> By the design of the target hook INITIAL_ELIMINATION_OFFSET
>>>>>>> it is necessary to call this function several times with
>>>>>>> different register combinations.
>>>>>>> Most targets use a cached data structure that describes the
>>>>>>> exact frame layout of the current function.
>>>>>>>
>>>>>>> It is safe to skip the computation when reload_completed = true,
>>>>>>> and most targets do that already.
>>>>>>>
>>>>>>> However while reload is doing its work, it is not clear when to
>>>>>>> do the computation and when not.  This results in unnecessary
>>>>>>> work.  Computing the frame layout can be a simple function or an
>>>>>>> arbitrarily complex one, that walks all instructions of the current
>>>>>>> function for instance, which is more or less the common case.
>>>>>>>
>>>>>>>
>>>>>>> This patch adds a new optional target hook that can be used
>>>>>>> by the target to factor the INITIAL_ELIMINATION_OFFSET-hook
>>>>>>> into a O(n) computation part, and a O(1) result function.
>>>>>>>
>>>>>>> The patch implements a compute_frame_layout target hook just
>>>>>>> for ARM in the moment, to show the principle.
>>>>>>> Other targets may also implement that hook, if it seems appropriate.
>>>>>>>
>>>>>>>
>>>>>>> Boot-strapped and reg-tested on arm-linux-gnueabihf.
>>>>>>> OK for trunk?
>>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Bernd.
>>>>>>>
>>>>>>>
>>>>>>> changelog-frame-layout.txt
>>>>>>>
>>>>>>>
>>>>>>> 2016-06-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>>>>>>
>>>>>>>       * target.def (compute_frame_layout): New optional target hook.
>>>>>>>       * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>>>>>>>       * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>>>>>>>       * lra-eliminations.c (update_reg_eliminate): Call
>>>>>>> compute_frame_layout
>>>>>>>       target hook.
>>>>>>>       * reload1.c (verify_initial_elim_offsets): Likewise.
>>>>>>>       * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>>>>>>>       (use_simple_return_p): Call arm_compute_frame_layout if needed.
>>>>>>>       (arm_get_frame_offsets): Split up into this ...
>>>>>>>       (arm_compute_frame_layout): ... and this function.
>>>>>> The ARM maintainers would need to chime in on the ARM specific changes
>>>>>> though.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Index: gcc/target.def
>>>>>>> ===================================================================
>>>>>>> --- gcc/target.def    (Revision 233176)
>>>>>>> +++ gcc/target.def    (Arbeitskopie)
>>>>>>> @@ -5245,8 +5245,19 @@ five otherwise.  This is best for most machines.",
>>>>>>>     unsigned int, (void),
>>>>>>>     default_case_values_threshold)
>>>>>>>
>>>>>>> -/* Retutn true if a function must have and use a frame pointer.  */
>>>>>> s/Retutn/Return
>>>>>>
>>>>>>> +/* Optional callback to advise the target to compute the frame
>>>>>>> layout.  */
>>>>>>>    DEFHOOK
>>>>>>> +(compute_frame_layout,
>>>>>>> + "This target hook is called immediately before reload wants to call\n\
>>>>>>> +@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the
>>>>>>> frame\n\
>>>>>>> +layout instead of re-computing it on every invocation.  This is
>>>>>>> particularly\n\
>>>>>>> +useful for targets that have an O(n) frame layout function.
>>>>>>> Implementing\n\
>>>>>>> +this callback is optional.",
>>>>>>> + void, (void),
>>>>>>> + hook_void_void)
>>>>>> So the docs say "immediately before", but that's not actually reality in
>>>>>> lra-eliminations.  I think you can just say "This target hook is called
>>>>>> before reload or lra-eliminations calls
>>>>>> @code{INITIAL_ELIMINATION_OFFSET} and allows ..."
>>>>>>
>>>>>>
>>>>>> How does this macro interact with INITIAL_FRAME_POINTER_OFFSET?
>>>>>>
>>>>>> I'm OK with this conceptually.  I think you need a minor doc update and
>>>>>> OK from the ARM maintainers before it can be installed though.
>>>>>>
>>>>>> jeff
>>>>>>
>>>>>> changelog-frame-layout.txt
>>>>>>
>>>>>>
>>>>>> 2016-06-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>>>>>
>>>>>> 	* target.def (compute_frame_layout): New optional target hook.
>>>>>> 	* doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>>>>>> 	* doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>>>>>> 	* lra-eliminations.c (update_reg_eliminate): Call compute_frame_layout
>>>>>> 	target hook.
>>>>>> 	* reload1.c (verify_initial_elim_offsets): Likewise.
>>>>>> 	* config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>>>>>> 	(use_simple_return_p): Call arm_compute_frame_layout if needed.
>>>>>> 	(arm_get_frame_offsets): Split up into this ...
>>>>>> 	(arm_compute_frame_layout): ... and this function.
>>>>>>
>>>>>> patch-frame-layout.diff
>>>>>>
>>>>>>
>>>>>> Index: gcc/config/arm/arm.c
>>>>>> ===================================================================
>>>>>> --- gcc/config/arm/arm.c	(revision 239144)
>>>>>> +++ gcc/config/arm/arm.c	(working copy)
>>>>>> @@ -81,6 +81,7 @@ static bool arm_const_not_ok_for_debug_p (rtx);
>>>>>>    static bool arm_needs_doubleword_align (machine_mode, const_tree);
>>>>>>    static int arm_compute_static_chain_stack_bytes (void);
>>>>>>    static arm_stack_offsets *arm_get_frame_offsets (void);
>>>>>> +static void arm_compute_frame_layout (void);
>>>>>>    static void arm_add_gc_roots (void);
>>>>>>    static int arm_gen_constant (enum rtx_code, machine_mode, rtx,
>>>>>>    			     unsigned HOST_WIDE_INT, rtx, rtx, int, int);
>>>>>> @@ -663,6 +664,9 @@ static const struct attribute_spec arm_attribute_t
>>>>>>    #undef TARGET_SCALAR_MODE_SUPPORTED_P
>>>>>>    #define TARGET_SCALAR_MODE_SUPPORTED_P arm_scalar_mode_supported_p
>>>>>>
>>>>>> +#undef TARGET_COMPUTE_FRAME_LAYOUT
>>>>>> +#define TARGET_COMPUTE_FRAME_LAYOUT arm_compute_frame_layout
>>>>>> +
>>>>>>    #undef TARGET_FRAME_POINTER_REQUIRED
>>>>>>    #define TARGET_FRAME_POINTER_REQUIRED arm_frame_pointer_required
>>>>>>
>>>>>> @@ -3880,6 +3884,10 @@ use_simple_return_p (void)
>>>>>>    {
>>>>>>      arm_stack_offsets *offsets;
>>>>>>
>>>>>> +  /* Note this function can be called before or after reload.  */
>>>>>> +  if (!reload_completed)
>>>>>> +    arm_compute_frame_layout ();
>>>>>> +
>>>>>>      offsets = arm_get_frame_offsets ();
>>>>>>      return offsets->outgoing_args != 0;
>>>>>>    }
>>>>>> @@ -19370,7 +19378,7 @@ arm_compute_static_chain_stack_bytes (void)
>>>>>>
>>>>>>    /* Compute a bit mask of which registers need to be
>>>>>>       saved on the stack for the current function.
>>>>>> -   This is used by arm_get_frame_offsets, which may add extra registers.  */
>>>>>> +   This is used by arm_compute_frame_layout, which may add extra registers.  */
>>>>>>
>>>>>>    static unsigned long
>>>>>>    arm_compute_save_reg_mask (void)
>>>>>> @@ -20926,12 +20934,25 @@ any_sibcall_could_use_r3 (void)
>>>>>>      alignment.  */
>>>>>>
>>>>>>
>>>>>> +/* Return cached stack offsets.  */
>>>>>> +
>>>>>> +static arm_stack_offsets *
>>>>>> +arm_get_frame_offsets (void)
>>>>>> +{
>>>>>> +  struct arm_stack_offsets *offsets;
>>>>>> +
>>>>>> +  offsets = &cfun->machine->stack_offsets;
>>>>>> +
>>>>>> +  return offsets;
>>>>>> +}
>>>>>> +
>>>>>> +
>>>>>>    /* Calculate stack offsets.  These are used to calculate register elimination
>>>>>>       offsets and in prologue/epilogue code.  Also calculates which registers
>>>>>>       should be saved.  */
>>>>>>
>>>>>> -static arm_stack_offsets *
>>>>>> -arm_get_frame_offsets (void)
>>>>>> +static void
>>>>>> +arm_compute_frame_layout (void)
>>>>>>    {
>>>>>>      struct arm_stack_offsets *offsets;
>>>>>>      unsigned long func_type;
>>>>>> @@ -20943,19 +20964,6 @@ any_sibcall_could_use_r3 (void)
>>>>>>
>>>>>>      offsets = &cfun->machine->stack_offsets;
>>>>>>
>>>>>> -  /* We need to know if we are a leaf function.  Unfortunately, it
>>>>>> -     is possible to be called after start_sequence has been called,
>>>>>> -     which causes get_insns to return the insns for the sequence,
>>>>>> -     not the function, which will cause leaf_function_p to return
>>>>>> -     the incorrect result.
>>>>>> -
>>>>>> -     to know about leaf functions once reload has completed, and the
>>>>>> -     frame size cannot be changed after that time, so we can safely
>>>>>> -     use the cached value.  */
>>>>>> -
>>>>>> -  if (reload_completed)
>>>>>> -    return offsets;
>>>>>> -
>>>>>>      /* Initially this is the size of the local variables.  It will translated
>>>>>>         into an offset once we have determined the size of preceding data.  */
>>>>>>      frame_size = ROUND_UP_WORD (get_frame_size ());
>>>>>> @@ -21022,7 +21030,7 @@ any_sibcall_could_use_r3 (void)
>>>>>>        {
>>>>>>          offsets->outgoing_args = offsets->soft_frame;
>>>>>>          offsets->locals_base = offsets->soft_frame;
>>>>>> -      return offsets;
>>>>>> +      return;
>>>>>>        }
>>>>>>
>>>>>>      /* Ensure SFP has the correct alignment.  */
>>>>>> @@ -21098,8 +21106,6 @@ any_sibcall_could_use_r3 (void)
>>>>>>    	offsets->outgoing_args += 4;
>>>>>>          gcc_assert (!(offsets->outgoing_args & 7));
>>>>>>        }
>>>>>> -
>>>>>> -  return offsets;
>>>>>>    }
>>>>>>
>>>>>>
>>>>>> Index: gcc/doc/tm.texi
>>>>>> ===================================================================
>>>>>> --- gcc/doc/tm.texi	(revision 239144)
>>>>>> +++ gcc/doc/tm.texi	(working copy)
>>>>>> @@ -3693,6 +3693,14 @@ registers.  This macro must be defined if @code{EL
>>>>>>    defined.
>>>>>>    @end defmac
>>>>>>
>>>>>> +@deftypefn {Target Hook} void TARGET_COMPUTE_FRAME_LAYOUT (void)
>>>>>> +This target hook allows the target to compute the frame layout once and
>>>>>> +make use of the cached frame layout in @code{INITIAL_ELIMINATION_OFFSET}
>>>>>> +instead of re-computing it on every invocation.  This is particularly
>>>>>> +useful for targets that have an expensive frame layout function.
>>>>>> +Implementing this callback is optional.
>>>>>> +@end deftypefn
>>>>>> +
>>>>>>    @node Stack Arguments
>>>>>>    @subsection Passing Function Arguments on the Stack
>>>>>>    @cindex arguments on stack
>>>>>> Index: gcc/doc/tm.texi.in
>>>>>> ===================================================================
>>>>>> --- gcc/doc/tm.texi.in	(revision 239144)
>>>>>> +++ gcc/doc/tm.texi.in	(working copy)
>>>>>> @@ -3227,6 +3227,8 @@ registers.  This macro must be defined if @code{EL
>>>>>>    defined.
>>>>>>    @end defmac
>>>>>>
>>>>>> +@hook TARGET_COMPUTE_FRAME_LAYOUT
>>>>>> +
>>>>>>    @node Stack Arguments
>>>>>>    @subsection Passing Function Arguments on the Stack
>>>>>>    @cindex arguments on stack
>>>>>> Index: gcc/lra-eliminations.c
>>>>>> ===================================================================
>>>>>> --- gcc/lra-eliminations.c	(revision 239144)
>>>>>> +++ gcc/lra-eliminations.c	(working copy)
>>>>>> @@ -1203,6 +1203,10 @@ update_reg_eliminate (bitmap insns_with_changed_of
>>>>>>      struct lra_elim_table *ep, *ep1;
>>>>>>      HARD_REG_SET temp_hard_reg_set;
>>>>>>
>>>>>> +#ifdef ELIMINABLE_REGS
>>>>>> +  targetm.compute_frame_layout ();
>>>>>> +#endif
>>>>>> +
>>>>>>      /* Clear self elimination offsets.  */
>>>>>>      for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>>>>>        self_elim_offsets[ep->from] = 0;
>>>>>> Index: gcc/reload1.c
>>>>>> ===================================================================
>>>>>> --- gcc/reload1.c	(revision 239144)
>>>>>> +++ gcc/reload1.c	(working copy)
>>>>>> @@ -3831,6 +3831,7 @@ verify_initial_elim_offsets (void)
>>>>>>      {
>>>>>>       struct elim_table *ep;
>>>>>>
>>>>>> +   targetm.compute_frame_layout ();
>>>>>>       for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>>>>>         {
>>>>>>           INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, t);
>>>>>> @@ -3855,6 +3856,7 @@ set_initial_elim_offsets (void)
>>>>>>      struct elim_table *ep = reg_eliminate;
>>>>>>
>>>>>>    #ifdef ELIMINABLE_REGS
>>>>>> +  targetm.compute_frame_layout ();
>>>>>>      for (; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>>>>>        {
>>>>>>          INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, ep->initial_offset);
>>>>>> Index: gcc/target.def
>>>>>> ===================================================================
>>>>>> --- gcc/target.def	(revision 239144)
>>>>>> +++ gcc/target.def	(working copy)
>>>>>> @@ -5269,8 +5269,19 @@ five otherwise.  This is best for most machines.",
>>>>>>     unsigned int, (void),
>>>>>>     default_case_values_threshold)
>>>>>>
>>>>>> -/* Retutn true if a function must have and use a frame pointer.  */
>>>>>> +/* Optional callback to advise the target to compute the frame layout.  */
>>>>>>    DEFHOOK
>>>>>> +(compute_frame_layout,
>>>>>> + "This target hook allows the target to compute the frame layout once and\n\
>>>>>> +make use of the cached frame layout in @code{INITIAL_ELIMINATION_OFFSET}\n\
>>>>>> +instead of re-computing it on every invocation.  This is particularly\n\
>>>>>> +useful for targets that have an expensive frame layout function.\n\
>>>>>> +Implementing this callback is optional.",
>>>>>> + void, (void),
>>>>>> + hook_void_void)
>>>>>> +
>>>>>> +/* Return true if a function must have and use a frame pointer.  */
>>>>>> +DEFHOOK
>>>>>>    (frame_pointer_required,
>>>>>>     "This target hook should return @code{true} if a function must have and use\n\
>>>>>>    a frame pointer.  This target hook is called in the reload pass.  If its return\n\
>>>>
>>>
>>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH, ARM] Add a new target hook to compute the frame layout
  2017-05-05 15:02             ` Richard Earnshaw (lists)
@ 2017-05-06 18:16               ` Bernd Edlinger
  0 siblings, 0 replies; 15+ messages in thread
From: Bernd Edlinger @ 2017-05-06 18:16 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Jeff Law, gcc-patches
  Cc: Richard Biener, Jakub Jelinek, Richard Sandiford,
	Ramana Radhakrishnan, Nick Clifton

[-- Attachment #1: Type: text/plain, Size: 18247 bytes --]

On 05/05/17 16:59, Richard Earnshaw (lists) wrote:
> On 05/09/16 17:43, Bernd Edlinger wrote:
>> Hi Richard,
>>
>> what do you think of this patch, is it OK (with the suggested wording)?
>>
>
> Bernd,
>
> Apologies, this seems to have fallen through a crack.
>
> I'm happy with this.  Does it still apply?
>

Yes, only in a few places the context lines changed slightly.
So I attached a refreshed patch for your reference.

> If so, I suggest applying it after a 24-hour cooling off period for any
> final comments.
>

Fine, then I will apply it on monday evening, unless someone objects.


Thanks
Bernd.

> R.
>
>>
>> Thanks
>> Bernd.
>>
>> On 08/05/16 16:06, Richard Earnshaw (lists) wrote:
>>> On 05/08/16 13:49, Bernd Edlinger wrote:
>>>> On 08/05/16 11:29, Richard Earnshaw (lists) wrote:
>>>>> On 04/08/16 22:16, Bernd Edlinger wrote:
>>>>>> Hi,
>>>>>>
>>>>>> this patch introduces a new target hook that allows the target's
>>>>>> INITIAL_ELIMINATION_OFFSET function to use cached values instead of
>>>>>> re-computing the frame layout every time.
>>>>>>
>>>>>> I have updated the documentation a bit and hope it is clearer this time.
>>>>>>
>>>>>> It still needs a review by ARM port maintainers.
>>>>>>
>>>>>> If the ARM port maintainers find this patch useful, that would be fine.
>>>>>>
>>>>>
>>>>> I need to look into this more, but my first thought was that the
>>>>> documentation is confusing, or there is a real problem in this patch.
>>>>>
>>>>
>>>> Thanks for your quick response.
>>>>
>>>> The documentation is actually the most difficult part for me.
>>>>
>>>>> As I understand it the frame has to be re-laid out each time the
>>>>> contents of the frame changes (an extra register becomes live or another
>>>>> spill slot is needed).  So saying that it is laid out once can't be
>>>>> right if (as I read it initially) you mean 'once per function' since I
>>>>> think it needs to be 'once for each time the frame contents changes'.
>>>>>
>>>>> Of course, things might be a bit different with LRA compared with
>>>>> reload, but I strongly suspect this is never going to be an 'exactly
>>>>> once per function' operation.
>>>>>
>>>>
>>>> Right.  It will be done 2 or 3 times for each function.
>>>> LRA and reload behave identical in that respect.
>>>>
>>>> But each time reload changes something in the input data the
>>>> INITIAL_EMIMINATION_OFFSET is called several times, and the results
>>>> have to be consistent in each iteration.
>>>>
>>>> The frame layout function has no way to know if the frame layout
>>>> is expected to change or not.
>>>>
>>>> Many targets use reload_completed as an indication when the frame layout
>>>> may not change at all, but that is only an approximation.
>>>>
>>>>> Can you clarify your meaning in the documentation please?
>>>>>
>>>>
>>>> I meant 'once' in the sense of 'once for each time the frame contents
>>>> change'.
>>>>
>>>> Thus I'd change that sentence to:
>>>>
>>>> "This target hook allows the target to compute the frame layout once for
>>>> each time the frame contents change and make use of the cached frame
>>>> layout in @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing it
>>>> on every invocation.  This is particularly useful for targets that have
>>>> an expensive frame layout function.  Implementing this callback is
>>>> optional."
>>>>
>>>
>>> Thanks, that's pretty much what I expected would be the case.
>>>
>>> Could I suggest:
>>>
>>> This target hook is called once each time the frame layout needs to be
>>> recalculated.  The calculations can be cached by the target and can then
>>> be used by @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing the
>>> layout on every invocation of that hook.  This is particularly useful
>>> for targets that have an expensive frame layout function.  Implementing
>>> this callback is optional.
>>>
>>> R.
>>>
>>>>
>>>> Thanks
>>>> Bernd.
>>>>
>>>>
>>>>> R.
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Bernd.
>>>>>>
>>>>>> On 06/21/16 23:29, Jeff Law wrote:
>>>>>>> On 06/16/2016 08:47 AM, Bernd Edlinger wrote:
>>>>>>>> Hi!
>>>>>>>>
>>>>>>>>
>>>>>>>> By the design of the target hook INITIAL_ELIMINATION_OFFSET
>>>>>>>> it is necessary to call this function several times with
>>>>>>>> different register combinations.
>>>>>>>> Most targets use a cached data structure that describes the
>>>>>>>> exact frame layout of the current function.
>>>>>>>>
>>>>>>>> It is safe to skip the computation when reload_completed = true,
>>>>>>>> and most targets do that already.
>>>>>>>>
>>>>>>>> However while reload is doing its work, it is not clear when to
>>>>>>>> do the computation and when not.  This results in unnecessary
>>>>>>>> work.  Computing the frame layout can be a simple function or an
>>>>>>>> arbitrarily complex one, that walks all instructions of the current
>>>>>>>> function for instance, which is more or less the common case.
>>>>>>>>
>>>>>>>>
>>>>>>>> This patch adds a new optional target hook that can be used
>>>>>>>> by the target to factor the INITIAL_ELIMINATION_OFFSET-hook
>>>>>>>> into a O(n) computation part, and a O(1) result function.
>>>>>>>>
>>>>>>>> The patch implements a compute_frame_layout target hook just
>>>>>>>> for ARM in the moment, to show the principle.
>>>>>>>> Other targets may also implement that hook, if it seems appropriate.
>>>>>>>>
>>>>>>>>
>>>>>>>> Boot-strapped and reg-tested on arm-linux-gnueabihf.
>>>>>>>> OK for trunk?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Bernd.
>>>>>>>>
>>>>>>>>
>>>>>>>> changelog-frame-layout.txt
>>>>>>>>
>>>>>>>>
>>>>>>>> 2016-06-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>>>>>>>
>>>>>>>>       * target.def (compute_frame_layout): New optional target hook.
>>>>>>>>       * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>>>>>>>>       * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>>>>>>>>       * lra-eliminations.c (update_reg_eliminate): Call
>>>>>>>> compute_frame_layout
>>>>>>>>       target hook.
>>>>>>>>       * reload1.c (verify_initial_elim_offsets): Likewise.
>>>>>>>>       * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>>>>>>>>       (use_simple_return_p): Call arm_compute_frame_layout if needed.
>>>>>>>>       (arm_get_frame_offsets): Split up into this ...
>>>>>>>>       (arm_compute_frame_layout): ... and this function.
>>>>>>> The ARM maintainers would need to chime in on the ARM specific changes
>>>>>>> though.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Index: gcc/target.def
>>>>>>>> ===================================================================
>>>>>>>> --- gcc/target.def    (Revision 233176)
>>>>>>>> +++ gcc/target.def    (Arbeitskopie)
>>>>>>>> @@ -5245,8 +5245,19 @@ five otherwise.  This is best for most machines.",
>>>>>>>>     unsigned int, (void),
>>>>>>>>     default_case_values_threshold)
>>>>>>>>
>>>>>>>> -/* Retutn true if a function must have and use a frame pointer.  */
>>>>>>> s/Retutn/Return
>>>>>>>
>>>>>>>> +/* Optional callback to advise the target to compute the frame
>>>>>>>> layout.  */
>>>>>>>>    DEFHOOK
>>>>>>>> +(compute_frame_layout,
>>>>>>>> + "This target hook is called immediately before reload wants to call\n\
>>>>>>>> +@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the
>>>>>>>> frame\n\
>>>>>>>> +layout instead of re-computing it on every invocation.  This is
>>>>>>>> particularly\n\
>>>>>>>> +useful for targets that have an O(n) frame layout function.
>>>>>>>> Implementing\n\
>>>>>>>> +this callback is optional.",
>>>>>>>> + void, (void),
>>>>>>>> + hook_void_void)
>>>>>>> So the docs say "immediately before", but that's not actually reality in
>>>>>>> lra-eliminations.  I think you can just say "This target hook is called
>>>>>>> before reload or lra-eliminations calls
>>>>>>> @code{INITIAL_ELIMINATION_OFFSET} and allows ..."
>>>>>>>
>>>>>>>
>>>>>>> How does this macro interact with INITIAL_FRAME_POINTER_OFFSET?
>>>>>>>
>>>>>>> I'm OK with this conceptually.  I think you need a minor doc update and
>>>>>>> OK from the ARM maintainers before it can be installed though.
>>>>>>>
>>>>>>> jeff
>>>>>>>
>>>>>>> changelog-frame-layout.txt
>>>>>>>
>>>>>>>
>>>>>>> 2016-06-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>
>>>>>>>
>>>>>>> 	* target.def (compute_frame_layout): New optional target hook.
>>>>>>> 	* doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>>>>>>> 	* doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>>>>>>> 	* lra-eliminations.c (update_reg_eliminate): Call compute_frame_layout
>>>>>>> 	target hook.
>>>>>>> 	* reload1.c (verify_initial_elim_offsets): Likewise.
>>>>>>> 	* config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>>>>>>> 	(use_simple_return_p): Call arm_compute_frame_layout if needed.
>>>>>>> 	(arm_get_frame_offsets): Split up into this ...
>>>>>>> 	(arm_compute_frame_layout): ... and this function.
>>>>>>>
>>>>>>> patch-frame-layout.diff
>>>>>>>
>>>>>>>
>>>>>>> Index: gcc/config/arm/arm.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/config/arm/arm.c	(revision 239144)
>>>>>>> +++ gcc/config/arm/arm.c	(working copy)
>>>>>>> @@ -81,6 +81,7 @@ static bool arm_const_not_ok_for_debug_p (rtx);
>>>>>>>    static bool arm_needs_doubleword_align (machine_mode, const_tree);
>>>>>>>    static int arm_compute_static_chain_stack_bytes (void);
>>>>>>>    static arm_stack_offsets *arm_get_frame_offsets (void);
>>>>>>> +static void arm_compute_frame_layout (void);
>>>>>>>    static void arm_add_gc_roots (void);
>>>>>>>    static int arm_gen_constant (enum rtx_code, machine_mode, rtx,
>>>>>>>    			     unsigned HOST_WIDE_INT, rtx, rtx, int, int);
>>>>>>> @@ -663,6 +664,9 @@ static const struct attribute_spec arm_attribute_t
>>>>>>>    #undef TARGET_SCALAR_MODE_SUPPORTED_P
>>>>>>>    #define TARGET_SCALAR_MODE_SUPPORTED_P arm_scalar_mode_supported_p
>>>>>>>
>>>>>>> +#undef TARGET_COMPUTE_FRAME_LAYOUT
>>>>>>> +#define TARGET_COMPUTE_FRAME_LAYOUT arm_compute_frame_layout
>>>>>>> +
>>>>>>>    #undef TARGET_FRAME_POINTER_REQUIRED
>>>>>>>    #define TARGET_FRAME_POINTER_REQUIRED arm_frame_pointer_required
>>>>>>>
>>>>>>> @@ -3880,6 +3884,10 @@ use_simple_return_p (void)
>>>>>>>    {
>>>>>>>      arm_stack_offsets *offsets;
>>>>>>>
>>>>>>> +  /* Note this function can be called before or after reload.  */
>>>>>>> +  if (!reload_completed)
>>>>>>> +    arm_compute_frame_layout ();
>>>>>>> +
>>>>>>>      offsets = arm_get_frame_offsets ();
>>>>>>>      return offsets->outgoing_args != 0;
>>>>>>>    }
>>>>>>> @@ -19370,7 +19378,7 @@ arm_compute_static_chain_stack_bytes (void)
>>>>>>>
>>>>>>>    /* Compute a bit mask of which registers need to be
>>>>>>>       saved on the stack for the current function.
>>>>>>> -   This is used by arm_get_frame_offsets, which may add extra registers.  */
>>>>>>> +   This is used by arm_compute_frame_layout, which may add extra registers.  */
>>>>>>>
>>>>>>>    static unsigned long
>>>>>>>    arm_compute_save_reg_mask (void)
>>>>>>> @@ -20926,12 +20934,25 @@ any_sibcall_could_use_r3 (void)
>>>>>>>      alignment.  */
>>>>>>>
>>>>>>>
>>>>>>> +/* Return cached stack offsets.  */
>>>>>>> +
>>>>>>> +static arm_stack_offsets *
>>>>>>> +arm_get_frame_offsets (void)
>>>>>>> +{
>>>>>>> +  struct arm_stack_offsets *offsets;
>>>>>>> +
>>>>>>> +  offsets = &cfun->machine->stack_offsets;
>>>>>>> +
>>>>>>> +  return offsets;
>>>>>>> +}
>>>>>>> +
>>>>>>> +
>>>>>>>    /* Calculate stack offsets.  These are used to calculate register elimination
>>>>>>>       offsets and in prologue/epilogue code.  Also calculates which registers
>>>>>>>       should be saved.  */
>>>>>>>
>>>>>>> -static arm_stack_offsets *
>>>>>>> -arm_get_frame_offsets (void)
>>>>>>> +static void
>>>>>>> +arm_compute_frame_layout (void)
>>>>>>>    {
>>>>>>>      struct arm_stack_offsets *offsets;
>>>>>>>      unsigned long func_type;
>>>>>>> @@ -20943,19 +20964,6 @@ any_sibcall_could_use_r3 (void)
>>>>>>>
>>>>>>>      offsets = &cfun->machine->stack_offsets;
>>>>>>>
>>>>>>> -  /* We need to know if we are a leaf function.  Unfortunately, it
>>>>>>> -     is possible to be called after start_sequence has been called,
>>>>>>> -     which causes get_insns to return the insns for the sequence,
>>>>>>> -     not the function, which will cause leaf_function_p to return
>>>>>>> -     the incorrect result.
>>>>>>> -
>>>>>>> -     to know about leaf functions once reload has completed, and the
>>>>>>> -     frame size cannot be changed after that time, so we can safely
>>>>>>> -     use the cached value.  */
>>>>>>> -
>>>>>>> -  if (reload_completed)
>>>>>>> -    return offsets;
>>>>>>> -
>>>>>>>      /* Initially this is the size of the local variables.  It will translated
>>>>>>>         into an offset once we have determined the size of preceding data.  */
>>>>>>>      frame_size = ROUND_UP_WORD (get_frame_size ());
>>>>>>> @@ -21022,7 +21030,7 @@ any_sibcall_could_use_r3 (void)
>>>>>>>        {
>>>>>>>          offsets->outgoing_args = offsets->soft_frame;
>>>>>>>          offsets->locals_base = offsets->soft_frame;
>>>>>>> -      return offsets;
>>>>>>> +      return;
>>>>>>>        }
>>>>>>>
>>>>>>>      /* Ensure SFP has the correct alignment.  */
>>>>>>> @@ -21098,8 +21106,6 @@ any_sibcall_could_use_r3 (void)
>>>>>>>    	offsets->outgoing_args += 4;
>>>>>>>          gcc_assert (!(offsets->outgoing_args & 7));
>>>>>>>        }
>>>>>>> -
>>>>>>> -  return offsets;
>>>>>>>    }
>>>>>>>
>>>>>>>
>>>>>>> Index: gcc/doc/tm.texi
>>>>>>> ===================================================================
>>>>>>> --- gcc/doc/tm.texi	(revision 239144)
>>>>>>> +++ gcc/doc/tm.texi	(working copy)
>>>>>>> @@ -3693,6 +3693,14 @@ registers.  This macro must be defined if @code{EL
>>>>>>>    defined.
>>>>>>>    @end defmac
>>>>>>>
>>>>>>> +@deftypefn {Target Hook} void TARGET_COMPUTE_FRAME_LAYOUT (void)
>>>>>>> +This target hook allows the target to compute the frame layout once and
>>>>>>> +make use of the cached frame layout in @code{INITIAL_ELIMINATION_OFFSET}
>>>>>>> +instead of re-computing it on every invocation.  This is particularly
>>>>>>> +useful for targets that have an expensive frame layout function.
>>>>>>> +Implementing this callback is optional.
>>>>>>> +@end deftypefn
>>>>>>> +
>>>>>>>    @node Stack Arguments
>>>>>>>    @subsection Passing Function Arguments on the Stack
>>>>>>>    @cindex arguments on stack
>>>>>>> Index: gcc/doc/tm.texi.in
>>>>>>> ===================================================================
>>>>>>> --- gcc/doc/tm.texi.in	(revision 239144)
>>>>>>> +++ gcc/doc/tm.texi.in	(working copy)
>>>>>>> @@ -3227,6 +3227,8 @@ registers.  This macro must be defined if @code{EL
>>>>>>>    defined.
>>>>>>>    @end defmac
>>>>>>>
>>>>>>> +@hook TARGET_COMPUTE_FRAME_LAYOUT
>>>>>>> +
>>>>>>>    @node Stack Arguments
>>>>>>>    @subsection Passing Function Arguments on the Stack
>>>>>>>    @cindex arguments on stack
>>>>>>> Index: gcc/lra-eliminations.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/lra-eliminations.c	(revision 239144)
>>>>>>> +++ gcc/lra-eliminations.c	(working copy)
>>>>>>> @@ -1203,6 +1203,10 @@ update_reg_eliminate (bitmap insns_with_changed_of
>>>>>>>      struct lra_elim_table *ep, *ep1;
>>>>>>>      HARD_REG_SET temp_hard_reg_set;
>>>>>>>
>>>>>>> +#ifdef ELIMINABLE_REGS
>>>>>>> +  targetm.compute_frame_layout ();
>>>>>>> +#endif
>>>>>>> +
>>>>>>>      /* Clear self elimination offsets.  */
>>>>>>>      for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>>>>>>        self_elim_offsets[ep->from] = 0;
>>>>>>> Index: gcc/reload1.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/reload1.c	(revision 239144)
>>>>>>> +++ gcc/reload1.c	(working copy)
>>>>>>> @@ -3831,6 +3831,7 @@ verify_initial_elim_offsets (void)
>>>>>>>      {
>>>>>>>       struct elim_table *ep;
>>>>>>>
>>>>>>> +   targetm.compute_frame_layout ();
>>>>>>>       for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>>>>>>         {
>>>>>>>           INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, t);
>>>>>>> @@ -3855,6 +3856,7 @@ set_initial_elim_offsets (void)
>>>>>>>      struct elim_table *ep = reg_eliminate;
>>>>>>>
>>>>>>>    #ifdef ELIMINABLE_REGS
>>>>>>> +  targetm.compute_frame_layout ();
>>>>>>>      for (; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
>>>>>>>        {
>>>>>>>          INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, ep->initial_offset);
>>>>>>> Index: gcc/target.def
>>>>>>> ===================================================================
>>>>>>> --- gcc/target.def	(revision 239144)
>>>>>>> +++ gcc/target.def	(working copy)
>>>>>>> @@ -5269,8 +5269,19 @@ five otherwise.  This is best for most machines.",
>>>>>>>     unsigned int, (void),
>>>>>>>     default_case_values_threshold)
>>>>>>>
>>>>>>> -/* Retutn true if a function must have and use a frame pointer.  */
>>>>>>> +/* Optional callback to advise the target to compute the frame layout.  */
>>>>>>>    DEFHOOK
>>>>>>> +(compute_frame_layout,
>>>>>>> + "This target hook allows the target to compute the frame layout once and\n\
>>>>>>> +make use of the cached frame layout in @code{INITIAL_ELIMINATION_OFFSET}\n\
>>>>>>> +instead of re-computing it on every invocation.  This is particularly\n\
>>>>>>> +useful for targets that have an expensive frame layout function.\n\
>>>>>>> +Implementing this callback is optional.",
>>>>>>> + void, (void),
>>>>>>> + hook_void_void)
>>>>>>> +
>>>>>>> +/* Return true if a function must have and use a frame pointer.  */
>>>>>>> +DEFHOOK
>>>>>>>    (frame_pointer_required,
>>>>>>>     "This target hook should return @code{true} if a function must have and use\n\
>>>>>>>    a frame pointer.  This target hook is called in the reload pass.  If its return\n\
>>>>>
>>>>
>>>
>

[-- Attachment #2: changelog-frame-layout.txt --]
[-- Type: text/plain, Size: 622 bytes --]

2016-06-16  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* target.def (compute_frame_layout): New optional target hook.
	* doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
	* doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
	* lra-eliminations.c (update_reg_eliminate): Call compute_frame_layout
	target hook.
	* reload1.c (verify_initial_elim_offsets): Likewise.
	* config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
	(use_simple_return_p): Call arm_compute_frame_layout if needed.
	(arm_get_frame_offsets): Split up into this ...
	(arm_compute_frame_layout): ... and this function.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: patch-frame-layout.diff --]
[-- Type: text/x-patch; name="patch-frame-layout.diff", Size: 6915 bytes --]

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 247710)
+++ gcc/config/arm/arm.c	(working copy)
@@ -85,6 +85,7 @@ static bool arm_const_not_ok_for_debug_p (rtx);
 static int arm_needs_doubleword_align (machine_mode, const_tree);
 static int arm_compute_static_chain_stack_bytes (void);
 static arm_stack_offsets *arm_get_frame_offsets (void);
+static void arm_compute_frame_layout (void);
 static void arm_add_gc_roots (void);
 static int arm_gen_constant (enum rtx_code, machine_mode, rtx,
 			     unsigned HOST_WIDE_INT, rtx, rtx, int, int);
@@ -680,6 +681,9 @@ static const struct attribute_spec arm_attribute_t
 #undef TARGET_SCALAR_MODE_SUPPORTED_P
 #define TARGET_SCALAR_MODE_SUPPORTED_P arm_scalar_mode_supported_p
 
+#undef TARGET_COMPUTE_FRAME_LAYOUT
+#define TARGET_COMPUTE_FRAME_LAYOUT arm_compute_frame_layout
+
 #undef TARGET_FRAME_POINTER_REQUIRED
 #define TARGET_FRAME_POINTER_REQUIRED arm_frame_pointer_required
 
@@ -4031,6 +4035,10 @@ use_simple_return_p (void)
 {
   arm_stack_offsets *offsets;
 
+  /* Note this function can be called before or after reload.  */
+  if (!reload_completed)
+    arm_compute_frame_layout ();
+
   offsets = arm_get_frame_offsets ();
   return offsets->outgoing_args != 0;
 }
@@ -19138,7 +19146,7 @@ arm_compute_static_chain_stack_bytes (void)
 
 /* Compute a bit mask of which registers need to be
    saved on the stack for the current function.
-   This is used by arm_get_frame_offsets, which may add extra registers.  */
+   This is used by arm_compute_frame_layout, which may add extra registers.  */
 
 static unsigned long
 arm_compute_save_reg_mask (void)
@@ -20772,12 +20780,25 @@ any_sibcall_could_use_r3 (void)
   alignment.  */
 
 
+/* Return cached stack offsets.  */
+
+static arm_stack_offsets *
+arm_get_frame_offsets (void)
+{
+  struct arm_stack_offsets *offsets;
+
+  offsets = &cfun->machine->stack_offsets;
+
+  return offsets;
+}
+
+
 /* Calculate stack offsets.  These are used to calculate register elimination
    offsets and in prologue/epilogue code.  Also calculates which registers
    should be saved.  */
 
-static arm_stack_offsets *
-arm_get_frame_offsets (void)
+static void
+arm_compute_frame_layout (void)
 {
   struct arm_stack_offsets *offsets;
   unsigned long func_type;
@@ -20788,9 +20809,6 @@ any_sibcall_could_use_r3 (void)
 
   offsets = &cfun->machine->stack_offsets;
 
-  if (reload_completed)
-    return offsets;
-
   /* Initially this is the size of the local variables.  It will translated
      into an offset once we have determined the size of preceding data.  */
   frame_size = ROUND_UP_WORD (get_frame_size ());
@@ -20855,7 +20873,7 @@ any_sibcall_could_use_r3 (void)
     {
       offsets->outgoing_args = offsets->soft_frame;
       offsets->locals_base = offsets->soft_frame;
-      return offsets;
+      return;
     }
 
   /* Ensure SFP has the correct alignment.  */
@@ -20931,8 +20949,6 @@ any_sibcall_could_use_r3 (void)
 	offsets->outgoing_args += 4;
       gcc_assert (!(offsets->outgoing_args & 7));
     }
-
-  return offsets;
 }
 
 
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 247710)
+++ gcc/doc/tm.texi	(working copy)
@@ -3684,6 +3684,15 @@ such as the result of @code{get_frame_size ()} and
 registers @code{df_regs_ever_live_p} and @code{call_used_regs}.
 @end defmac
 
+@deftypefn {Target Hook} void TARGET_COMPUTE_FRAME_LAYOUT (void)
+This target hook is called once each time the frame layout needs to be
+recalculated.  The calculations can be cached by the target and can then
+be used by @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing the
+layout on every invocation of that hook.  This is particularly useful
+for targets that have an expensive frame layout function.  Implementing
+this callback is optional.
+@end deftypefn
+
 @node Stack Arguments
 @subsection Passing Function Arguments on the Stack
 @cindex arguments on stack
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 247710)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -3213,6 +3213,8 @@ such as the result of @code{get_frame_size ()} and
 registers @code{df_regs_ever_live_p} and @code{call_used_regs}.
 @end defmac
 
+@hook TARGET_COMPUTE_FRAME_LAYOUT
+
 @node Stack Arguments
 @subsection Passing Function Arguments on the Stack
 @cindex arguments on stack
Index: gcc/lra-eliminations.c
===================================================================
--- gcc/lra-eliminations.c	(revision 247710)
+++ gcc/lra-eliminations.c	(working copy)
@@ -1196,6 +1196,10 @@ update_reg_eliminate (bitmap insns_with_changed_of
   struct lra_elim_table *ep, *ep1;
   HARD_REG_SET temp_hard_reg_set;
 
+#ifdef ELIMINABLE_REGS
+  targetm.compute_frame_layout ();
+#endif
+
   /* Clear self elimination offsets.  */
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
     self_elim_offsets[ep->from] = 0;
Index: gcc/reload1.c
===================================================================
--- gcc/reload1.c	(revision 247710)
+++ gcc/reload1.c	(working copy)
@@ -3821,6 +3821,7 @@ verify_initial_elim_offsets (void)
   if (!num_eliminable)
     return true;
 
+  targetm.compute_frame_layout ();
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
     {
       INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, t);
@@ -3838,6 +3839,7 @@ set_initial_elim_offsets (void)
 {
   struct elim_table *ep = reg_eliminate;
 
+  targetm.compute_frame_layout ();
   for (; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
     {
       INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, ep->initial_offset);
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 247710)
+++ gcc/target.def	(working copy)
@@ -5395,6 +5395,18 @@ five otherwise.  This is best for most machines.",
  unsigned int, (void),
  default_case_values_threshold)
 
+/* Optional callback to advise the target to compute the frame layout.  */
+DEFHOOK
+(compute_frame_layout,
+ "This target hook is called once each time the frame layout needs to be\n\
+recalculated.  The calculations can be cached by the target and can then\n\
+be used by @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing the\n\
+layout on every invocation of that hook.  This is particularly useful\n\
+for targets that have an expensive frame layout function.  Implementing\n\
+this callback is optional.",
+ void, (void),
+ hook_void_void)
+
 /* Return true if a function must have and use a frame pointer.  */
 DEFHOOK
 (frame_pointer_required,

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-05-06 18:07 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-16 14:47 [PATCH] Add a new target hook to compute the frame layout Bernd Edlinger
2016-06-21 21:30 ` Jeff Law
2016-06-22  5:12   ` Bernd Edlinger
2016-06-22  7:21     ` AW: " Bernd Edlinger
2016-06-22 18:34       ` Jeff Law
2016-06-22 18:49     ` Jeff Law
2016-06-22 20:25       ` Bernd Edlinger
2016-08-04 21:16   ` [PATCH, ARM] " Bernd Edlinger
2016-08-05  9:29     ` Richard Earnshaw (lists)
2016-08-05 12:49       ` Bernd Edlinger
2016-08-05 14:06         ` Richard Earnshaw (lists)
2016-08-05 15:23           ` Bernd Edlinger
2016-09-05 16:45           ` Bernd Edlinger
2017-05-05 15:02             ` Richard Earnshaw (lists)
2017-05-06 18:16               ` Bernd Edlinger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).