From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <manolis.tsamis@vrull.eu>
Received: from mail-lj1-x234.google.com (mail-lj1-x234.google.com [IPv6:2a00:1450:4864:20::234])
	by sourceware.org (Postfix) with ESMTPS id 6DDA83858C52
	for <gcc-patches@gcc.gnu.org>; Wed,  2 Nov 2022 13:55:25 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6DDA83858C52
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=vrull.eu
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=vrull.eu
Received: by mail-lj1-x234.google.com with SMTP id k19so24850178lji.2
        for <gcc-patches@gcc.gnu.org>; Wed, 02 Nov 2022 06:55:25 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=vrull.eu; s=google;
        h=cc:to:subject:message-id:date:from:in-reply-to:references
         :mime-version:from:to:cc:subject:date:message-id:reply-to;
        bh=H+nbMIo8RwxsBiySSaWlBYpiY9BBcKGzq0mgIyFZyk8=;
        b=pDEMEKP7SmSodwdTrRqJXlT7KprlC7PtfRJ7IJTHlUSbsuxe5vwyX3RNP4Da4AhK7a
         YnaNByph6uSbMmXUrJhQnKoKoyG8ocAGx4Hmm0Ql/YSB2mLP1c13qGvUr0U8aABwBPdx
         ttKlWL7duJ3KYic8x0TxNFcrb7tsXq05vgp827T+R/7HcZgxcoCCfwkB30mOLaUaanGo
         0W/bD2nZue8o8lPpRuGtgMF8TMwHZrtTAy1HZawdT8MlBoic0ybv+JZJy0ESqUSwr0kn
         h9H8Vk7OvScs39dGUrNhdh7WJq+YUD7rcumn0Uxfazgdqsxbs/x813JIG5NkVbIlFlBr
         LWkw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:subject:message-id:date:from:in-reply-to:references
         :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
         :reply-to;
        bh=H+nbMIo8RwxsBiySSaWlBYpiY9BBcKGzq0mgIyFZyk8=;
        b=UufKZg/B/Q3VkaB3FdoV3TqZYEe/qjDjgRCdLiDHVvt+JjP6aOf5O2sbY0NhpKGVCq
         HSKu6/6NWabj4bsAGTp3MrWhCtWk9N0XQxFqdmt7SDELO9bGpeuV6aOXSukRU8CismXL
         6cHVT0zY0Lqj7tXToAY10oIVryJHIu/+azSSbTgYp3Apd1v9T8+IV+heh0uctV1LeT82
         mgfotivgWDUwhS6oiUn6fCqcR88b3zlq+CW+2o9Ks6h3OG/PfX+wRBGh36H9iL0p0Qco
         CAuInMF67TjM5e4gYBesFnvZJf5W1OxwFsp2njRpBPZrlV7IWN5/49LUW0kMcBhJgNkz
         gSIA==
X-Gm-Message-State: ACrzQf2wUXtdwUSqDhz5OP/8LxuQPD9Fqeh6IJkQ+9avQyLV/+zVIVoi
	uNwOwMSWdEnCekZHBS8oMwdvS8yg8iiOXRqpWeKUHw==
X-Google-Smtp-Source: AMsMyM6WmIu/zgkXxDydP+7AJpQXEFdGNqG9UjuwtTG8OVfH9fiRovSMdm8CgfEONQS5kLlF1/kIqYQmFKUVYsR4GWw=
X-Received: by 2002:a2e:a810:0:b0:26f:b417:d122 with SMTP id
 l16-20020a2ea810000000b0026fb417d122mr9657069ljq.317.1667397323844; Wed, 02
 Nov 2022 06:55:23 -0700 (PDT)
MIME-Version: 1.0
References: <2e2ca1dc-902e-23ca-babf-11a4978530ef@ventanamicro.com> <mhng-1ef6a2eb-a41e-4eb7-931a-04efc949db61@palmer-ri-x1c9a>
In-Reply-To: <mhng-1ef6a2eb-a41e-4eb7-931a-04efc949db61@palmer-ri-x1c9a>
From: Manolis Tsamis <manolis.tsamis@vrull.eu>
Date: Wed, 2 Nov 2022 15:54:47 +0200
Message-ID: <CAM3yNXpanKFLFz3JgVkmhxzvN4OVhPZ37dsHb6guuJLVM8HCjQ@mail.gmail.com>
Subject: Re: [PATCH] Enable shrink wrapping for the RISC-V target.
To: Palmer Dabbelt <palmer@dabbelt.com>
Cc: jlaw@ventanamicro.com, gcc-patches@gcc.gnu.org, 
	Vineet Gupta <vineetg@rivosinc.com>, Kito Cheng <kito.cheng@gmail.com>, philipp.tomsich@vrull.eu
Content-Type: text/plain; charset="UTF-8"
X-Spam-Status: No, score=-2.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,JMQ_SPF_NEUTRAL,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Tue, Oct 18, 2022 at 8:35 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Tue, 18 Oct 2022 08:57:37 PDT (-0700), jlaw@ventanamicro.com wrote:
> >
> > Just a couple more comments in-line.
> >
> > On 10/18/22 09:18, Manolis Tsamis wrote:
> >>
> >>>> +/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
> >>>> +
> >>>> +static sbitmap
> >>>> +riscv_get_separate_components (void)
> >>>> +{
> >>>> +  HOST_WIDE_INT offset;
> >>>> +  sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
> >>>> +  bitmap_clear (components);
> >>>> +
> >>>> +  if (riscv_use_save_libcall (&cfun->machine->frame)
> >>>> +      || cfun->machine->interrupt_handler_p)
> >>> riscv_use_save_libcall() already checks interrupt_handler_p, so that's
> >>> redundant.  That said, I'm not sure riscv_use_save_libcall() is the
> >>> right check here as unless I'm missing something we don't have all those
> >>> other constraints when shrink-wrapping.
> >>>
> >> riscv_use_save_libcall returns false when interrupt_handler_p is true, so the
> >> check for interrupt_handler_p in the branch is not redundant in this case.
> >>
> >> I encountered some issues when shrink wrapping and libcall was used in the same
> >> function. Thinking that libcall replaces the prologue/epilogue I didn't see a
> >> reason to have both at the same time and hence I opted to disable
> >> shrink wrapping in that case. From my understanding this should be harmless?
> >
> > I would have expected things to work fine with libcalls, perhaps with
> > the exception of the save/restore libcalls.  So that needs deeper
> > investigation.
>
> The save/restore libcalls only support saving/restoring a handful of
> register configurations (just the saved X registers in the order they're
> usually saved in by GCC).  It should be OK for correctness to over-save
> registers, but it kind of just un-does the shrink wrapping so not sure
> it's worth worrying about at that point.
>
> There's also some oddness around the save/restore libcall ABI, it's not
> the standard function ABI but instead a GCC-internal one.  IIRC it just
> uses the alternate link register (ie, t0 instead of ra) but I may have
> forgotten something else.
>
> >>> It seems kind of clunky to have two copies of all these loops (and we'll
> >>> need a third to make this work with the V stuff), but we've got that
> >>> issue elsewhere in the port so I don't think you need to fix it here
> >>> (though the V stuff will be there for the v2, so you'll need the third
> >>> copy of each loop).
> >>>
> >> Indeed, I was following the other ports here. Do you think it would be
> >> better to refactor this when the code for the V extension is added?
> >> By taking into account what code will be needed for V, a proper refactored
> >> function could be made to handle all cases.
> >
> > I think refactoring when V gets added would be fine.  While we could
> > probably refactor it correctly now (it isn't terribly complex code after
> > all), but we're more likely to get it right with the least amount of
> > work if we do it when V is submitted.
>
> Some of the V register blocks are already there, but ya I agree we can
> just wait.  There's going to be a bunch of V-related churn for a bit,
> juggling those patches is already enough of a headache ;)
>
> >>> Either way, this deserves a test case.  I think it should be possible to
> >>> write one by introducing some register pressure around a
> >>> shrink-wrappable block that needs a long stack offset and making sure
> >>> in-flight registers don't get trashed.
> >>>
> >> I tried to think of some way to introduce a test like that but couldn't and
> >> I don't see how it would be done. Shrink wrapping only affects saved registers
> >> so there are always available temporaries that are not affected by
> >> shrink wrapping.
> >> (Register pressure should be irrelevant in this case if I understand correctly).
> >> Also the implementation checks for SMALL_OPERAND (offset) shrink wrapping
> >> should be unaffected from long stack offsets. If you see some way to write
> >> a test for that based on what I explained please explain how I could do that.
> >
> > I think the register pressure was just to ensure that some saves were
> > needed to trigger an attempt to shrink wrap something.  You'd also need
> > something to eat stack space (local array which gets referenced as an
> > asm operand, but where the asm doesn't generate any code perhaps)?
> > Whether or not that works depends on stack layout though which I don't
> > know well enough for riscv.
>
> Sorry for being a bit vague, but it's because I always find it takes a
> bit of time to write up tests like this.  I think something like this
> might do it, but that almost certainly won't work as-is:
>
>     // Some extern bits to try and trip up the optimizer.
>     extern long helper(long *sa, long a, long b, long c, ...);
>     extern long glob_array[1024];
>
>     // The function takes a bunch of arguments to fill up the A
>     // registers.
>     long func(long a, long b, long c, ...) {
>       long stack_array[1024]; // should be big enough to make SP offsets
>                               // take a temporary
>
>       // Here I'm loading up all the T registers with something that
>       // must be saved to the stack, since those helper calls below may
>       // change the value of glob_array.  We probably need to fill the S
>       // registers too?
>       long t0 = glob_array[arg + 0];
>       long t1 = glob_array[arg + 1];
>       long t2 = ...
>
>       // This should trigger shrink-wrapping, as the function calls will
>       // need a bunch of register state saved.  The idea is to make sure
>       // all the other registers are somehow in flight at this point and
>       // could only be recovered via their saved copies, but due to the
>       // large stack we'll need a temporary to construct the slot
>       // addresses.
>       if (a) {
>         t0 = helper(stack_array, a, b, ...);
>       }
>
>       // Go re-use those temporaries, again indirect to avoid
>       // optimization.
>       a += glob_array[t0];
>       b += glob_array[t1];
>       c ...;
>       return helper(stack_array, a, b, c, ...);
>     }
>
> At that point you should end up with the save/restore code in that if
> block needing to do something like
>
>     lui  TMP, IMM_HI
>     addi TMP, sp, IMM_LO
>     sd REG0, 0(t0)
>     sd REG1, 8(t0)
>     ...
>
> and if one of those registers it's trying to save is TMP then we're in
> trouble.
>

I've revisited this testcase and I think it's not possible to make it
work with the current implementation.
It's not possible to trigger shrink wrapping in this case since the
wrapping of registers is guarded by
 if (SMALL_OPERAND (offset)) { bitmap_set_bit (components, regno); }
Hence if a long stack is generated we get no shrink wrapping.

I also tried to remove that restriction but it looks like it can't
work because we can't create
pseudo-registers during shrink wrapping and shrink wrapping can't work either.

I believe this means that shrink wrapping cannot interfere with a long
stack frame
so there is nothing to test against in this case?

Manolis

> > Generally looks pretty good though.
>
> Ya, and a useful optimization so thanks for the help ;)
>
> >
> >
> > Jeff