From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <manolis.tsamis@vrull.eu>
Received: from mail-lj1-x22a.google.com (mail-lj1-x22a.google.com [IPv6:2a00:1450:4864:20::22a])
	by sourceware.org (Postfix) with ESMTPS id 9EE4A3853564
	for <gcc-patches@gcc.gnu.org>; Thu, 20 Oct 2022 07:35:54 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9EE4A3853564
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=vrull.eu
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=vrull.eu
Received: by mail-lj1-x22a.google.com with SMTP id x18so25205306ljm.1
        for <gcc-patches@gcc.gnu.org>; Thu, 20 Oct 2022 00:35:54 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=vrull.eu; s=google;
        h=cc:to:subject:message-id:date:from:in-reply-to:references
         :mime-version:from:to:cc:subject:date:message-id:reply-to;
        bh=bVXFI1zV4AlCRb8K99cJqdxqnwpnUZiYpo7EHNkMn/I=;
        b=PDruhVIhMJ8USTdFkcg2Pit8reGHQzdpcNcdgTVs2s9fJuURjlGPiLmREEXkWnpI0t
         kbLXs3SzjBLn5AzGtMI3KztDmYlIs/wpEdPwyL5YNgaJPE1mH1ypz/t+9HQ7z3qBNZq6
         nVKeAyjHO6YObaygbCwj/pUFH3Fkq1zsk/pzLBdEIOz3WRJMeELOh/U6KsMRElAyibK+
         h6BeSRQAvNOj1c1L18/UAUub/LCvLFKFNXR60XLxCEmX0F6t2DPomE0Xi+tfRU7ztCL0
         EAeWZZW9KrtEyMRCkebBofwcnGG3jN0m56Uy0X91DEb7XG0wnrfMylxfhZMGdUrrCUwL
         CEfg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:subject:message-id:date:from:in-reply-to:references
         :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id
         :reply-to;
        bh=bVXFI1zV4AlCRb8K99cJqdxqnwpnUZiYpo7EHNkMn/I=;
        b=sKy8jdbpSec+fTAxF5kU603IiKIHrkmCdcu8bQTJ0SG+DhLM/xzyNrWd/li/Ry++3L
         TE//QjELEhekPQhdhJNrh+uhYkGYySbB4JvbjbW2bNC/pYMvABKQ++wOOYQzY0Yr1b80
         TPA9o5fW3ZCq7GFEnhKB3DfagCU2g7SmcOJKcsCwewxV76Q3R18DFvUgsFAAOqmjrnhb
         sz5pwRwf4Qj11rtCgHu1v7HfzA7AYRhD+US7VvzogcaC4swt+bi6dCMeTLeUOSkzJJTG
         21/k6qiSqpTKFNVc01HDMCOCSMXsDHjSEnjPUSrKQ+yzVAeIm8RPtMdug75FByjFm/pQ
         2kIg==
X-Gm-Message-State: ACrzQf3ujDPfsODxwKhAOQubYss/4daJ8h6I9oAhhtY6P41qKQ5Nu+Re
	5dCKf2JN4zrPqQOFL/s+frDpg9vCyoS6YElS3kXKfQ==
X-Google-Smtp-Source: AMsMyM4Ef2T7lVcJXDeHpbYRiZLAbUFA97pwUjo2iiRMz+i6H4ZCIhnf/ej4Sih7S2xzcbyMaVdZIEUIXI7RUAs/CNg=
X-Received: by 2002:a05:651c:2103:b0:25d:6478:2a57 with SMTP id
 a3-20020a05651c210300b0025d64782a57mr4211952ljq.496.1666251353066; Thu, 20
 Oct 2022 00:35:53 -0700 (PDT)
MIME-Version: 1.0
References: <2e2ca1dc-902e-23ca-babf-11a4978530ef@ventanamicro.com> <mhng-1ef6a2eb-a41e-4eb7-931a-04efc949db61@palmer-ri-x1c9a>
In-Reply-To: <mhng-1ef6a2eb-a41e-4eb7-931a-04efc949db61@palmer-ri-x1c9a>
From: Manolis Tsamis <manolis.tsamis@vrull.eu>
Date: Thu, 20 Oct 2022 10:35:16 +0300
Message-ID: <CAM3yNXoyHAOUj_Xj8L+vhF9M8oU4C61t3=_Ju-6yb43Ey==NXA@mail.gmail.com>
Subject: Re: [PATCH] Enable shrink wrapping for the RISC-V target.
To: Palmer Dabbelt <palmer@dabbelt.com>
Cc: jlaw@ventanamicro.com, gcc-patches@gcc.gnu.org, 
	Vineet Gupta <vineetg@rivosinc.com>, Kito Cheng <kito.cheng@gmail.com>, philipp.tomsich@vrull.eu
Content-Type: text/plain; charset="UTF-8"
X-Spam-Status: No, score=-2.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,JMQ_SPF_NEUTRAL,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Tue, Oct 18, 2022 at 8:35 PM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Tue, 18 Oct 2022 08:57:37 PDT (-0700), jlaw@ventanamicro.com wrote:
> >
> > Just a couple more comments in-line.
> >
> > On 10/18/22 09:18, Manolis Tsamis wrote:
> >>
> >>>> +/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
> >>>> +
> >>>> +static sbitmap
> >>>> +riscv_get_separate_components (void)
> >>>> +{
> >>>> +  HOST_WIDE_INT offset;
> >>>> +  sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER);
> >>>> +  bitmap_clear (components);
> >>>> +
> >>>> +  if (riscv_use_save_libcall (&cfun->machine->frame)
> >>>> +      || cfun->machine->interrupt_handler_p)
> >>> riscv_use_save_libcall() already checks interrupt_handler_p, so that's
> >>> redundant.  That said, I'm not sure riscv_use_save_libcall() is the
> >>> right check here as unless I'm missing something we don't have all those
> >>> other constraints when shrink-wrapping.
> >>>
> >> riscv_use_save_libcall returns false when interrupt_handler_p is true, so the
> >> check for interrupt_handler_p in the branch is not redundant in this case.
> >>
> >> I encountered some issues when shrink wrapping and libcall was used in the same
> >> function. Thinking that libcall replaces the prologue/epilogue I didn't see a
> >> reason to have both at the same time and hence I opted to disable
> >> shrink wrapping in that case. From my understanding this should be harmless?
> >
> > I would have expected things to work fine with libcalls, perhaps with
> > the exception of the save/restore libcalls.  So that needs deeper
> > investigation.
>
> The save/restore libcalls only support saving/restoring a handful of
> register configurations (just the saved X registers in the order they're
> usually saved in by GCC).  It should be OK for correctness to over-save
> registers, but it kind of just un-does the shrink wrapping so not sure
> it's worth worrying about at that point.
>
> There's also some oddness around the save/restore libcall ABI, it's not
> the standard function ABI but instead a GCC-internal one.  IIRC it just
> uses the alternate link register (ie, t0 instead of ra) but I may have
> forgotten something else.
>
> >>> It seems kind of clunky to have two copies of all these loops (and we'll
> >>> need a third to make this work with the V stuff), but we've got that
> >>> issue elsewhere in the port so I don't think you need to fix it here
> >>> (though the V stuff will be there for the v2, so you'll need the third
> >>> copy of each loop).
> >>>
> >> Indeed, I was following the other ports here. Do you think it would be
> >> better to refactor this when the code for the V extension is added?
> >> By taking into account what code will be needed for V, a proper refactored
> >> function could be made to handle all cases.
> >
> > I think refactoring when V gets added would be fine.  While we could
> > probably refactor it correctly now (it isn't terribly complex code after
> > all), but we're more likely to get it right with the least amount of
> > work if we do it when V is submitted.
>
> Some of the V register blocks are already there, but ya I agree we can
> just wait.  There's going to be a bunch of V-related churn for a bit,
> juggling those patches is already enough of a headache ;)
>
> >>> Either way, this deserves a test case.  I think it should be possible to
> >>> write one by introducing some register pressure around a
> >>> shrink-wrappable block that needs a long stack offset and making sure
> >>> in-flight registers don't get trashed.
> >>>
> >> I tried to think of some way to introduce a test like that but couldn't and
> >> I don't see how it would be done. Shrink wrapping only affects saved registers
> >> so there are always available temporaries that are not affected by
> >> shrink wrapping.
> >> (Register pressure should be irrelevant in this case if I understand correctly).
> >> Also the implementation checks for SMALL_OPERAND (offset) shrink wrapping
> >> should be unaffected from long stack offsets. If you see some way to write
> >> a test for that based on what I explained please explain how I could do that.
> >
> > I think the register pressure was just to ensure that some saves were
> > needed to trigger an attempt to shrink wrap something.  You'd also need
> > something to eat stack space (local array which gets referenced as an
> > asm operand, but where the asm doesn't generate any code perhaps)?
> > Whether or not that works depends on stack layout though which I don't
> > know well enough for riscv.
>
> Sorry for being a bit vague, but it's because I always find it takes a
> bit of time to write up tests like this.  I think something like this
> might do it, but that almost certainly won't work as-is:
>
>     // Some extern bits to try and trip up the optimizer.
>     extern long helper(long *sa, long a, long b, long c, ...);
>     extern long glob_array[1024];
>
>     // The function takes a bunch of arguments to fill up the A
>     // registers.
>     long func(long a, long b, long c, ...) {
>       long stack_array[1024]; // should be big enough to make SP offsets
>                               // take a temporary
>
>       // Here I'm loading up all the T registers with something that
>       // must be saved to the stack, since those helper calls below may
>       // change the value of glob_array.  We probably need to fill the S
>       // registers too?
>       long t0 = glob_array[arg + 0];
>       long t1 = glob_array[arg + 1];
>       long t2 = ...
>
>       // This should trigger shrink-wrapping, as the function calls will
>       // need a bunch of register state saved.  The idea is to make sure
>       // all the other registers are somehow in flight at this point and
>       // could only be recovered via their saved copies, but due to the
>       // large stack we'll need a temporary to construct the slot
>       // addresses.
>       if (a) {
>         t0 = helper(stack_array, a, b, ...);
>       }
>
>       // Go re-use those temporaries, again indirect to avoid
>       // optimization.
>       a += glob_array[t0];
>       b += glob_array[t1];
>       c ...;
>       return helper(stack_array, a, b, c, ...);
>     }
>
> At that point you should end up with the save/restore code in that if
> block needing to do something like
>
>     lui  TMP, IMM_HI
>     addi TMP, sp, IMM_LO
>     sd REG0, 0(t0)
>     sd REG1, 8(t0)
>     ...
>
> and if one of those registers it's trying to save is TMP then we're in
> trouble.
>

Thanks for the additional details and rough sketch of the testcase.
I'll try to make something based on that.

Manolis

> > Generally looks pretty good though.
>
> Ya, and a useful optimization so thanks for the help ;)
>
> >
> >
> > Jeff