Re: [PATCH] Enable shrink wrapping for the RISC-V target.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Manolis Tsamis <manolis.tsamis@vrull.eu>
To: Palmer Dabbelt <palmer@dabbelt.com>
Cc: jlaw@ventanamicro.com, gcc-patches@gcc.gnu.org,
	 Vineet Gupta <vineetg@rivosinc.com>,
	Kito Cheng <kito.cheng@gmail.com>,
	philipp.tomsich@vrull.eu
Subject: Re: [PATCH] Enable shrink wrapping for the RISC-V target.
Date: Wed, 2 Nov 2022 15:54:47 +0200	[thread overview]
Message-ID: <CAM3yNXpanKFLFz3JgVkmhxzvN4OVhPZ37dsHb6guuJLVM8HCjQ@mail.gmail.com> (raw)
In-Reply-To: <mhng-1ef6a2eb-a41e-4eb7-931a-04efc949db61@palmer-ri-x1c9a>

On Tue, Oct 18, 2022 at 8:35 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Tue, 18 Oct 2022 08:57:37 PDT (-0700), jlaw@ventanamicro.com wrote:
> >
> > Just a couple more comments in-line.
> >
> > On 10/18/22 09:18, Manolis Tsamis wrote:
> >>
> >>>> +/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
> >>>> +
> >>>> +static sbitmap
> >>>> +riscv_get_separate_components (void)
> >>>> +{
> >>>> +  HOST_WIDE_INT offset;
> >>>> +  sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
> >>>> +  bitmap_clear (components);
> >>>> +
> >>>> +  if (riscv_use_save_libcall (&cfun->machine->frame)
> >>>> +      || cfun->machine->interrupt_handler_p)
> >>> riscv_use_save_libcall() already checks interrupt_handler_p, so that's
> >>> redundant.  That said, I'm not sure riscv_use_save_libcall() is the
> >>> right check here as unless I'm missing something we don't have all those
> >>> other constraints when shrink-wrapping.
> >>>
> >> riscv_use_save_libcall returns false when interrupt_handler_p is true, so the
> >> check for interrupt_handler_p in the branch is not redundant in this case.
> >>
> >> I encountered some issues when shrink wrapping and libcall was used in the same
> >> function. Thinking that libcall replaces the prologue/epilogue I didn't see a
> >> reason to have both at the same time and hence I opted to disable
> >> shrink wrapping in that case. From my understanding this should be harmless?
> >
> > I would have expected things to work fine with libcalls, perhaps with
> > the exception of the save/restore libcalls.  So that needs deeper
> > investigation.
>
> The save/restore libcalls only support saving/restoring a handful of
> register configurations (just the saved X registers in the order they're
> usually saved in by GCC).  It should be OK for correctness to over-save
> registers, but it kind of just un-does the shrink wrapping so not sure
> it's worth worrying about at that point.
>
> There's also some oddness around the save/restore libcall ABI, it's not
> the standard function ABI but instead a GCC-internal one.  IIRC it just
> uses the alternate link register (ie, t0 instead of ra) but I may have
> forgotten something else.
>
> >>> It seems kind of clunky to have two copies of all these loops (and we'll
> >>> need a third to make this work with the V stuff), but we've got that
> >>> issue elsewhere in the port so I don't think you need to fix it here
> >>> (though the V stuff will be there for the v2, so you'll need the third
> >>> copy of each loop).
> >>>
> >> Indeed, I was following the other ports here. Do you think it would be
> >> better to refactor this when the code for the V extension is added?
> >> By taking into account what code will be needed for V, a proper refactored
> >> function could be made to handle all cases.
> >
> > I think refactoring when V gets added would be fine.  While we could
> > probably refactor it correctly now (it isn't terribly complex code after
> > all), but we're more likely to get it right with the least amount of
> > work if we do it when V is submitted.
>
> Some of the V register blocks are already there, but ya I agree we can
> just wait.  There's going to be a bunch of V-related churn for a bit,
> juggling those patches is already enough of a headache ;)
>
> >>> Either way, this deserves a test case.  I think it should be possible to
> >>> write one by introducing some register pressure around a
> >>> shrink-wrappable block that needs a long stack offset and making sure
> >>> in-flight registers don't get trashed.
> >>>
> >> I tried to think of some way to introduce a test like that but couldn't and
> >> I don't see how it would be done. Shrink wrapping only affects saved registers
> >> so there are always available temporaries that are not affected by
> >> shrink wrapping.
> >> (Register pressure should be irrelevant in this case if I understand correctly).
> >> Also the implementation checks for SMALL_OPERAND (offset) shrink wrapping
> >> should be unaffected from long stack offsets. If you see some way to write
> >> a test for that based on what I explained please explain how I could do that.
> >
> > I think the register pressure was just to ensure that some saves were
> > needed to trigger an attempt to shrink wrap something.  You'd also need
> > something to eat stack space (local array which gets referenced as an
> > asm operand, but where the asm doesn't generate any code perhaps)?
> > Whether or not that works depends on stack layout though which I don't
> > know well enough for riscv.
>
> Sorry for being a bit vague, but it's because I always find it takes a
> bit of time to write up tests like this.  I think something like this
> might do it, but that almost certainly won't work as-is:
>
>     // Some extern bits to try and trip up the optimizer.
>     extern long helper(long *sa, long a, long b, long c, ...);
>     extern long glob_array[1024];
>
>     // The function takes a bunch of arguments to fill up the A
>     // registers.
>     long func(long a, long b, long c, ...) {
>       long stack_array[1024]; // should be big enough to make SP offsets
>                               // take a temporary
>
>       // Here I'm loading up all the T registers with something that
>       // must be saved to the stack, since those helper calls below may
>       // change the value of glob_array.  We probably need to fill the S
>       // registers too?
>       long t0 = glob_array[arg + 0];
>       long t1 = glob_array[arg + 1];
>       long t2 = ...
>
>       // This should trigger shrink-wrapping, as the function calls will
>       // need a bunch of register state saved.  The idea is to make sure
>       // all the other registers are somehow in flight at this point and
>       // could only be recovered via their saved copies, but due to the
>       // large stack we'll need a temporary to construct the slot
>       // addresses.
>       if (a) {
>         t0 = helper(stack_array, a, b, ...);
>       }
>
>       // Go re-use those temporaries, again indirect to avoid
>       // optimization.
>       a += glob_array[t0];
>       b += glob_array[t1];
>       c ...;
>       return helper(stack_array, a, b, c, ...);
>     }
>
> At that point you should end up with the save/restore code in that if
> block needing to do something like
>
>     lui  TMP, IMM_HI
>     addi TMP, sp, IMM_LO
>     sd REG0, 0(t0)
>     sd REG1, 8(t0)
>     ...
>
> and if one of those registers it's trying to save is TMP then we're in
> trouble.
>

I've revisited this testcase and I think it's not possible to make it
work with the current implementation.
It's not possible to trigger shrink wrapping in this case since the
wrapping of registers is guarded by
 if (SMALL_OPERAND (offset)) { bitmap_set_bit (components, regno); }
Hence if a long stack is generated we get no shrink wrapping.

I also tried to remove that restriction but it looks like it can't
work because we can't create
pseudo-registers during shrink wrapping and shrink wrapping can't work either.

I believe this means that shrink wrapping cannot interfere with a long
stack frame
so there is nothing to test against in this case?

Manolis

> > Generally looks pretty good though.
>
> Ya, and a useful optimization so thanks for the help ;)
>
> >
> >
> > Jeff

next prev parent reply	other threads:[~2022-11-02 13:55 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-06 10:39 mtsamis
2022-10-02 20:32 ` Palmer Dabbelt
2022-10-18 15:18   ` Manolis Tsamis
2022-10-18 15:57     ` Jeff Law
2022-10-18 17:35       ` Palmer Dabbelt
2022-10-19 17:15         ` Jeff Law
2022-10-20  7:42           ` Manolis Tsamis
2022-11-02 14:12           ` Manolis Tsamis
2022-11-02 15:02             ` Jeff Law
2022-10-20  7:35         ` Manolis Tsamis
2022-11-02 13:54         ` Manolis Tsamis [this message]
2022-11-02 15:06           ` Jeff Law
2022-11-03  0:26             ` Palmer Dabbelt
2022-11-03 22:23               ` Jeff Law
2022-11-07 22:07                 ` Palmer Dabbelt
2022-11-13  1:32                   ` Jeff Law
2022-11-16 10:26                     ` Manolis Tsamis
2022-11-17  2:09                       ` Jeff Law
2022-11-17 10:54                         ` Manolis Tsamis
2022-11-17 11:59                           ` Philipp Tomsich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAM3yNXpanKFLFz3JgVkmhxzvN4OVhPZ37dsHb6guuJLVM8HCjQ@mail.gmail.com \
    --to=manolis.tsamis@vrull.eu \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jlaw@ventanamicro.com \
    --cc=kito.cheng@gmail.com \
    --cc=palmer@dabbelt.com \
    --cc=philipp.tomsich@vrull.eu \
    --cc=vineetg@rivosinc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).