From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 112768 invoked by alias); 5 Aug 2016 14:06:28 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 112757 invoked by uid 89); 5 Aug 2016 14:06:27 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=BAYES_00,KAM_ASCII_DIVIDERS,RP_MATCHES_RCVD,SPF_PASS autolearn=ham version=3.3.2 spammy=five, SFP X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 05 Aug 2016 14:06:17 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 2FE6828; Fri, 5 Aug 2016 07:07:39 -0700 (PDT) Received: from e105689-lin.cambridge.arm.com (e105689-lin.cambridge.arm.com [10.2.207.32]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 0519A3F215; Fri, 5 Aug 2016 07:06:13 -0700 (PDT) Subject: Re: [PATCH, ARM] Add a new target hook to compute the frame layout To: Bernd Edlinger , Jeff Law , "gcc-patches@gcc.gnu.org" References: <538a413b-ef74-4a9a-1665-f3b84a8d9035@redhat.com> <67767f62-f973-546b-1e18-1d3bf472d748@arm.com> Cc: Richard Biener , Jakub Jelinek , Richard Sandiford , Ramana Radhakrishnan , Nick Clifton From: "Richard Earnshaw (lists)" Message-ID: <4b971f30-b5e2-7960-aab4-a873f3ed433e@arm.com> Date: Fri, 05 Aug 2016 14:06:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-SW-Source: 2016-08/txt/msg00464.txt.bz2 On 05/08/16 13:49, Bernd Edlinger wrote: > On 08/05/16 11:29, Richard Earnshaw (lists) wrote: >> On 04/08/16 22:16, Bernd Edlinger wrote: >>> Hi, >>> >>> this patch introduces a new target hook that allows the target's >>> INITIAL_ELIMINATION_OFFSET function to use cached values instead of >>> re-computing the frame layout every time. >>> >>> I have updated the documentation a bit and hope it is clearer this time. >>> >>> It still needs a review by ARM port maintainers. >>> >>> If the ARM port maintainers find this patch useful, that would be fine. >>> >> >> I need to look into this more, but my first thought was that the >> documentation is confusing, or there is a real problem in this patch. >> > > Thanks for your quick response. > > The documentation is actually the most difficult part for me. > >> As I understand it the frame has to be re-laid out each time the >> contents of the frame changes (an extra register becomes live or another >> spill slot is needed). So saying that it is laid out once can't be >> right if (as I read it initially) you mean 'once per function' since I >> think it needs to be 'once for each time the frame contents changes'. >> >> Of course, things might be a bit different with LRA compared with >> reload, but I strongly suspect this is never going to be an 'exactly >> once per function' operation. >> > > Right. It will be done 2 or 3 times for each function. > LRA and reload behave identical in that respect. > > But each time reload changes something in the input data the > INITIAL_EMIMINATION_OFFSET is called several times, and the results > have to be consistent in each iteration. > > The frame layout function has no way to know if the frame layout > is expected to change or not. > > Many targets use reload_completed as an indication when the frame layout > may not change at all, but that is only an approximation. > >> Can you clarify your meaning in the documentation please? >> > > I meant 'once' in the sense of 'once for each time the frame contents > change'. > > Thus I'd change that sentence to: > > "This target hook allows the target to compute the frame layout once for > each time the frame contents change and make use of the cached frame > layout in @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing it > on every invocation. This is particularly useful for targets that have > an expensive frame layout function. Implementing this callback is > optional." > Thanks, that's pretty much what I expected would be the case. Could I suggest: This target hook is called once each time the frame layout needs to be recalculated. The calculations can be cached by the target and can then be used by @code{INITIAL_ELIMINATION_OFFSET} instead of re-computing the layout on every invocation of that hook. This is particularly useful for targets that have an expensive frame layout function. Implementing this callback is optional. R. > > Thanks > Bernd. > > >> R. >> >>> >>> Thanks >>> Bernd. >>> >>> On 06/21/16 23:29, Jeff Law wrote: >>>> On 06/16/2016 08:47 AM, Bernd Edlinger wrote: >>>>> Hi! >>>>> >>>>> >>>>> By the design of the target hook INITIAL_ELIMINATION_OFFSET >>>>> it is necessary to call this function several times with >>>>> different register combinations. >>>>> Most targets use a cached data structure that describes the >>>>> exact frame layout of the current function. >>>>> >>>>> It is safe to skip the computation when reload_completed = true, >>>>> and most targets do that already. >>>>> >>>>> However while reload is doing its work, it is not clear when to >>>>> do the computation and when not. This results in unnecessary >>>>> work. Computing the frame layout can be a simple function or an >>>>> arbitrarily complex one, that walks all instructions of the current >>>>> function for instance, which is more or less the common case. >>>>> >>>>> >>>>> This patch adds a new optional target hook that can be used >>>>> by the target to factor the INITIAL_ELIMINATION_OFFSET-hook >>>>> into a O(n) computation part, and a O(1) result function. >>>>> >>>>> The patch implements a compute_frame_layout target hook just >>>>> for ARM in the moment, to show the principle. >>>>> Other targets may also implement that hook, if it seems appropriate. >>>>> >>>>> >>>>> Boot-strapped and reg-tested on arm-linux-gnueabihf. >>>>> OK for trunk? >>>>> >>>>> >>>>> Thanks >>>>> Bernd. >>>>> >>>>> >>>>> changelog-frame-layout.txt >>>>> >>>>> >>>>> 2016-06-16 Bernd Edlinger >>>>> >>>>> * target.def (compute_frame_layout): New optional target hook. >>>>> * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook. >>>>> * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation. >>>>> * lra-eliminations.c (update_reg_eliminate): Call >>>>> compute_frame_layout >>>>> target hook. >>>>> * reload1.c (verify_initial_elim_offsets): Likewise. >>>>> * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define. >>>>> (use_simple_return_p): Call arm_compute_frame_layout if needed. >>>>> (arm_get_frame_offsets): Split up into this ... >>>>> (arm_compute_frame_layout): ... and this function. >>>> The ARM maintainers would need to chime in on the ARM specific changes >>>> though. >>>> >>>> >>>> >>>>> Index: gcc/target.def >>>>> =================================================================== >>>>> --- gcc/target.def (Revision 233176) >>>>> +++ gcc/target.def (Arbeitskopie) >>>>> @@ -5245,8 +5245,19 @@ five otherwise. This is best for most machines.", >>>>> unsigned int, (void), >>>>> default_case_values_threshold) >>>>> >>>>> -/* Retutn true if a function must have and use a frame pointer. */ >>>> s/Retutn/Return >>>> >>>>> +/* Optional callback to advise the target to compute the frame >>>>> layout. */ >>>>> DEFHOOK >>>>> +(compute_frame_layout, >>>>> + "This target hook is called immediately before reload wants to call\n\ >>>>> +@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the >>>>> frame\n\ >>>>> +layout instead of re-computing it on every invocation. This is >>>>> particularly\n\ >>>>> +useful for targets that have an O(n) frame layout function. >>>>> Implementing\n\ >>>>> +this callback is optional.", >>>>> + void, (void), >>>>> + hook_void_void) >>>> So the docs say "immediately before", but that's not actually reality in >>>> lra-eliminations. I think you can just say "This target hook is called >>>> before reload or lra-eliminations calls >>>> @code{INITIAL_ELIMINATION_OFFSET} and allows ..." >>>> >>>> >>>> How does this macro interact with INITIAL_FRAME_POINTER_OFFSET? >>>> >>>> I'm OK with this conceptually. I think you need a minor doc update and >>>> OK from the ARM maintainers before it can be installed though. >>>> >>>> jeff >>>> >>>> changelog-frame-layout.txt >>>> >>>> >>>> 2016-06-16 Bernd Edlinger >>>> >>>> * target.def (compute_frame_layout): New optional target hook. >>>> * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook. >>>> * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation. >>>> * lra-eliminations.c (update_reg_eliminate): Call compute_frame_layout >>>> target hook. >>>> * reload1.c (verify_initial_elim_offsets): Likewise. >>>> * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define. >>>> (use_simple_return_p): Call arm_compute_frame_layout if needed. >>>> (arm_get_frame_offsets): Split up into this ... >>>> (arm_compute_frame_layout): ... and this function. >>>> >>>> patch-frame-layout.diff >>>> >>>> >>>> Index: gcc/config/arm/arm.c >>>> =================================================================== >>>> --- gcc/config/arm/arm.c (revision 239144) >>>> +++ gcc/config/arm/arm.c (working copy) >>>> @@ -81,6 +81,7 @@ static bool arm_const_not_ok_for_debug_p (rtx); >>>> static bool arm_needs_doubleword_align (machine_mode, const_tree); >>>> static int arm_compute_static_chain_stack_bytes (void); >>>> static arm_stack_offsets *arm_get_frame_offsets (void); >>>> +static void arm_compute_frame_layout (void); >>>> static void arm_add_gc_roots (void); >>>> static int arm_gen_constant (enum rtx_code, machine_mode, rtx, >>>> unsigned HOST_WIDE_INT, rtx, rtx, int, int); >>>> @@ -663,6 +664,9 @@ static const struct attribute_spec arm_attribute_t >>>> #undef TARGET_SCALAR_MODE_SUPPORTED_P >>>> #define TARGET_SCALAR_MODE_SUPPORTED_P arm_scalar_mode_supported_p >>>> >>>> +#undef TARGET_COMPUTE_FRAME_LAYOUT >>>> +#define TARGET_COMPUTE_FRAME_LAYOUT arm_compute_frame_layout >>>> + >>>> #undef TARGET_FRAME_POINTER_REQUIRED >>>> #define TARGET_FRAME_POINTER_REQUIRED arm_frame_pointer_required >>>> >>>> @@ -3880,6 +3884,10 @@ use_simple_return_p (void) >>>> { >>>> arm_stack_offsets *offsets; >>>> >>>> + /* Note this function can be called before or after reload. */ >>>> + if (!reload_completed) >>>> + arm_compute_frame_layout (); >>>> + >>>> offsets = arm_get_frame_offsets (); >>>> return offsets->outgoing_args != 0; >>>> } >>>> @@ -19370,7 +19378,7 @@ arm_compute_static_chain_stack_bytes (void) >>>> >>>> /* Compute a bit mask of which registers need to be >>>> saved on the stack for the current function. >>>> - This is used by arm_get_frame_offsets, which may add extra registers. */ >>>> + This is used by arm_compute_frame_layout, which may add extra registers. */ >>>> >>>> static unsigned long >>>> arm_compute_save_reg_mask (void) >>>> @@ -20926,12 +20934,25 @@ any_sibcall_could_use_r3 (void) >>>> alignment. */ >>>> >>>> >>>> +/* Return cached stack offsets. */ >>>> + >>>> +static arm_stack_offsets * >>>> +arm_get_frame_offsets (void) >>>> +{ >>>> + struct arm_stack_offsets *offsets; >>>> + >>>> + offsets = &cfun->machine->stack_offsets; >>>> + >>>> + return offsets; >>>> +} >>>> + >>>> + >>>> /* Calculate stack offsets. These are used to calculate register elimination >>>> offsets and in prologue/epilogue code. Also calculates which registers >>>> should be saved. */ >>>> >>>> -static arm_stack_offsets * >>>> -arm_get_frame_offsets (void) >>>> +static void >>>> +arm_compute_frame_layout (void) >>>> { >>>> struct arm_stack_offsets *offsets; >>>> unsigned long func_type; >>>> @@ -20943,19 +20964,6 @@ any_sibcall_could_use_r3 (void) >>>> >>>> offsets = &cfun->machine->stack_offsets; >>>> >>>> - /* We need to know if we are a leaf function. Unfortunately, it >>>> - is possible to be called after start_sequence has been called, >>>> - which causes get_insns to return the insns for the sequence, >>>> - not the function, which will cause leaf_function_p to return >>>> - the incorrect result. >>>> - >>>> - to know about leaf functions once reload has completed, and the >>>> - frame size cannot be changed after that time, so we can safely >>>> - use the cached value. */ >>>> - >>>> - if (reload_completed) >>>> - return offsets; >>>> - >>>> /* Initially this is the size of the local variables. It will translated >>>> into an offset once we have determined the size of preceding data. */ >>>> frame_size = ROUND_UP_WORD (get_frame_size ()); >>>> @@ -21022,7 +21030,7 @@ any_sibcall_could_use_r3 (void) >>>> { >>>> offsets->outgoing_args = offsets->soft_frame; >>>> offsets->locals_base = offsets->soft_frame; >>>> - return offsets; >>>> + return; >>>> } >>>> >>>> /* Ensure SFP has the correct alignment. */ >>>> @@ -21098,8 +21106,6 @@ any_sibcall_could_use_r3 (void) >>>> offsets->outgoing_args += 4; >>>> gcc_assert (!(offsets->outgoing_args & 7)); >>>> } >>>> - >>>> - return offsets; >>>> } >>>> >>>> >>>> Index: gcc/doc/tm.texi >>>> =================================================================== >>>> --- gcc/doc/tm.texi (revision 239144) >>>> +++ gcc/doc/tm.texi (working copy) >>>> @@ -3693,6 +3693,14 @@ registers. This macro must be defined if @code{EL >>>> defined. >>>> @end defmac >>>> >>>> +@deftypefn {Target Hook} void TARGET_COMPUTE_FRAME_LAYOUT (void) >>>> +This target hook allows the target to compute the frame layout once and >>>> +make use of the cached frame layout in @code{INITIAL_ELIMINATION_OFFSET} >>>> +instead of re-computing it on every invocation. This is particularly >>>> +useful for targets that have an expensive frame layout function. >>>> +Implementing this callback is optional. >>>> +@end deftypefn >>>> + >>>> @node Stack Arguments >>>> @subsection Passing Function Arguments on the Stack >>>> @cindex arguments on stack >>>> Index: gcc/doc/tm.texi.in >>>> =================================================================== >>>> --- gcc/doc/tm.texi.in (revision 239144) >>>> +++ gcc/doc/tm.texi.in (working copy) >>>> @@ -3227,6 +3227,8 @@ registers. This macro must be defined if @code{EL >>>> defined. >>>> @end defmac >>>> >>>> +@hook TARGET_COMPUTE_FRAME_LAYOUT >>>> + >>>> @node Stack Arguments >>>> @subsection Passing Function Arguments on the Stack >>>> @cindex arguments on stack >>>> Index: gcc/lra-eliminations.c >>>> =================================================================== >>>> --- gcc/lra-eliminations.c (revision 239144) >>>> +++ gcc/lra-eliminations.c (working copy) >>>> @@ -1203,6 +1203,10 @@ update_reg_eliminate (bitmap insns_with_changed_of >>>> struct lra_elim_table *ep, *ep1; >>>> HARD_REG_SET temp_hard_reg_set; >>>> >>>> +#ifdef ELIMINABLE_REGS >>>> + targetm.compute_frame_layout (); >>>> +#endif >>>> + >>>> /* Clear self elimination offsets. */ >>>> for (ep = reg_eliminate; ep < ®_eliminate[NUM_ELIMINABLE_REGS]; ep++) >>>> self_elim_offsets[ep->from] = 0; >>>> Index: gcc/reload1.c >>>> =================================================================== >>>> --- gcc/reload1.c (revision 239144) >>>> +++ gcc/reload1.c (working copy) >>>> @@ -3831,6 +3831,7 @@ verify_initial_elim_offsets (void) >>>> { >>>> struct elim_table *ep; >>>> >>>> + targetm.compute_frame_layout (); >>>> for (ep = reg_eliminate; ep < ®_eliminate[NUM_ELIMINABLE_REGS]; ep++) >>>> { >>>> INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, t); >>>> @@ -3855,6 +3856,7 @@ set_initial_elim_offsets (void) >>>> struct elim_table *ep = reg_eliminate; >>>> >>>> #ifdef ELIMINABLE_REGS >>>> + targetm.compute_frame_layout (); >>>> for (; ep < ®_eliminate[NUM_ELIMINABLE_REGS]; ep++) >>>> { >>>> INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, ep->initial_offset); >>>> Index: gcc/target.def >>>> =================================================================== >>>> --- gcc/target.def (revision 239144) >>>> +++ gcc/target.def (working copy) >>>> @@ -5269,8 +5269,19 @@ five otherwise. This is best for most machines.", >>>> unsigned int, (void), >>>> default_case_values_threshold) >>>> >>>> -/* Retutn true if a function must have and use a frame pointer. */ >>>> +/* Optional callback to advise the target to compute the frame layout. */ >>>> DEFHOOK >>>> +(compute_frame_layout, >>>> + "This target hook allows the target to compute the frame layout once and\n\ >>>> +make use of the cached frame layout in @code{INITIAL_ELIMINATION_OFFSET}\n\ >>>> +instead of re-computing it on every invocation. This is particularly\n\ >>>> +useful for targets that have an expensive frame layout function.\n\ >>>> +Implementing this callback is optional.", >>>> + void, (void), >>>> + hook_void_void) >>>> + >>>> +/* Return true if a function must have and use a frame pointer. */ >>>> +DEFHOOK >>>> (frame_pointer_required, >>>> "This target hook should return @code{true} if a function must have and use\n\ >>>> a frame pointer. This target hook is called in the reload pass. If its return\n\ >> >