From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lj1-x22a.google.com (mail-lj1-x22a.google.com [IPv6:2a00:1450:4864:20::22a]) by sourceware.org (Postfix) with ESMTPS id 9EE4A3853564 for ; Thu, 20 Oct 2022 07:35:54 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9EE4A3853564 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=vrull.eu Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=vrull.eu Received: by mail-lj1-x22a.google.com with SMTP id x18so25205306ljm.1 for ; Thu, 20 Oct 2022 00:35:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vrull.eu; s=google; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=bVXFI1zV4AlCRb8K99cJqdxqnwpnUZiYpo7EHNkMn/I=; b=PDruhVIhMJ8USTdFkcg2Pit8reGHQzdpcNcdgTVs2s9fJuURjlGPiLmREEXkWnpI0t kbLXs3SzjBLn5AzGtMI3KztDmYlIs/wpEdPwyL5YNgaJPE1mH1ypz/t+9HQ7z3qBNZq6 nVKeAyjHO6YObaygbCwj/pUFH3Fkq1zsk/pzLBdEIOz3WRJMeELOh/U6KsMRElAyibK+ h6BeSRQAvNOj1c1L18/UAUub/LCvLFKFNXR60XLxCEmX0F6t2DPomE0Xi+tfRU7ztCL0 EAeWZZW9KrtEyMRCkebBofwcnGG3jN0m56Uy0X91DEb7XG0wnrfMylxfhZMGdUrrCUwL CEfg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=bVXFI1zV4AlCRb8K99cJqdxqnwpnUZiYpo7EHNkMn/I=; b=sKy8jdbpSec+fTAxF5kU603IiKIHrkmCdcu8bQTJ0SG+DhLM/xzyNrWd/li/Ry++3L TE//QjELEhekPQhdhJNrh+uhYkGYySbB4JvbjbW2bNC/pYMvABKQ++wOOYQzY0Yr1b80 TPA9o5fW3ZCq7GFEnhKB3DfagCU2g7SmcOJKcsCwewxV76Q3R18DFvUgsFAAOqmjrnhb sz5pwRwf4Qj11rtCgHu1v7HfzA7AYRhD+US7VvzogcaC4swt+bi6dCMeTLeUOSkzJJTG 21/k6qiSqpTKFNVc01HDMCOCSMXsDHjSEnjPUSrKQ+yzVAeIm8RPtMdug75FByjFm/pQ 2kIg== X-Gm-Message-State: ACrzQf3ujDPfsODxwKhAOQubYss/4daJ8h6I9oAhhtY6P41qKQ5Nu+Re 5dCKf2JN4zrPqQOFL/s+frDpg9vCyoS6YElS3kXKfQ== X-Google-Smtp-Source: AMsMyM4Ef2T7lVcJXDeHpbYRiZLAbUFA97pwUjo2iiRMz+i6H4ZCIhnf/ej4Sih7S2xzcbyMaVdZIEUIXI7RUAs/CNg= X-Received: by 2002:a05:651c:2103:b0:25d:6478:2a57 with SMTP id a3-20020a05651c210300b0025d64782a57mr4211952ljq.496.1666251353066; Thu, 20 Oct 2022 00:35:53 -0700 (PDT) MIME-Version: 1.0 References: <2e2ca1dc-902e-23ca-babf-11a4978530ef@ventanamicro.com> In-Reply-To: From: Manolis Tsamis Date: Thu, 20 Oct 2022 10:35:16 +0300 Message-ID: Subject: Re: [PATCH] Enable shrink wrapping for the RISC-V target. To: Palmer Dabbelt Cc: jlaw@ventanamicro.com, gcc-patches@gcc.gnu.org, Vineet Gupta , Kito Cheng , philipp.tomsich@vrull.eu Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-2.9 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,JMQ_SPF_NEUTRAL,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, Oct 18, 2022 at 8:35 PM Palmer Dabbelt wrote: > > On Tue, 18 Oct 2022 08:57:37 PDT (-0700), jlaw@ventanamicro.com wrote: > > > > Just a couple more comments in-line. > > > > On 10/18/22 09:18, Manolis Tsamis wrote: > >> > >>>> +/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS. */ > >>>> + > >>>> +static sbitmap > >>>> +riscv_get_separate_components (void) > >>>> +{ > >>>> + HOST_WIDE_INT offset; > >>>> + sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER); > >>>> + bitmap_clear (components); > >>>> + > >>>> + if (riscv_use_save_libcall (&cfun->machine->frame) > >>>> + || cfun->machine->interrupt_handler_p) > >>> riscv_use_save_libcall() already checks interrupt_handler_p, so that's > >>> redundant. That said, I'm not sure riscv_use_save_libcall() is the > >>> right check here as unless I'm missing something we don't have all those > >>> other constraints when shrink-wrapping. > >>> > >> riscv_use_save_libcall returns false when interrupt_handler_p is true, so the > >> check for interrupt_handler_p in the branch is not redundant in this case. > >> > >> I encountered some issues when shrink wrapping and libcall was used in the same > >> function. Thinking that libcall replaces the prologue/epilogue I didn't see a > >> reason to have both at the same time and hence I opted to disable > >> shrink wrapping in that case. From my understanding this should be harmless? > > > > I would have expected things to work fine with libcalls, perhaps with > > the exception of the save/restore libcalls. So that needs deeper > > investigation. > > The save/restore libcalls only support saving/restoring a handful of > register configurations (just the saved X registers in the order they're > usually saved in by GCC). It should be OK for correctness to over-save > registers, but it kind of just un-does the shrink wrapping so not sure > it's worth worrying about at that point. > > There's also some oddness around the save/restore libcall ABI, it's not > the standard function ABI but instead a GCC-internal one. IIRC it just > uses the alternate link register (ie, t0 instead of ra) but I may have > forgotten something else. > > >>> It seems kind of clunky to have two copies of all these loops (and we'll > >>> need a third to make this work with the V stuff), but we've got that > >>> issue elsewhere in the port so I don't think you need to fix it here > >>> (though the V stuff will be there for the v2, so you'll need the third > >>> copy of each loop). > >>> > >> Indeed, I was following the other ports here. Do you think it would be > >> better to refactor this when the code for the V extension is added? > >> By taking into account what code will be needed for V, a proper refactored > >> function could be made to handle all cases. > > > > I think refactoring when V gets added would be fine. While we could > > probably refactor it correctly now (it isn't terribly complex code after > > all), but we're more likely to get it right with the least amount of > > work if we do it when V is submitted. > > Some of the V register blocks are already there, but ya I agree we can > just wait. There's going to be a bunch of V-related churn for a bit, > juggling those patches is already enough of a headache ;) > > >>> Either way, this deserves a test case. I think it should be possible to > >>> write one by introducing some register pressure around a > >>> shrink-wrappable block that needs a long stack offset and making sure > >>> in-flight registers don't get trashed. > >>> > >> I tried to think of some way to introduce a test like that but couldn't and > >> I don't see how it would be done. Shrink wrapping only affects saved registers > >> so there are always available temporaries that are not affected by > >> shrink wrapping. > >> (Register pressure should be irrelevant in this case if I understand correctly). > >> Also the implementation checks for SMALL_OPERAND (offset) shrink wrapping > >> should be unaffected from long stack offsets. If you see some way to write > >> a test for that based on what I explained please explain how I could do that. > > > > I think the register pressure was just to ensure that some saves were > > needed to trigger an attempt to shrink wrap something. You'd also need > > something to eat stack space (local array which gets referenced as an > > asm operand, but where the asm doesn't generate any code perhaps)? > > Whether or not that works depends on stack layout though which I don't > > know well enough for riscv. > > Sorry for being a bit vague, but it's because I always find it takes a > bit of time to write up tests like this. I think something like this > might do it, but that almost certainly won't work as-is: > > // Some extern bits to try and trip up the optimizer. > extern long helper(long *sa, long a, long b, long c, ...); > extern long glob_array[1024]; > > // The function takes a bunch of arguments to fill up the A > // registers. > long func(long a, long b, long c, ...) { > long stack_array[1024]; // should be big enough to make SP offsets > // take a temporary > > // Here I'm loading up all the T registers with something that > // must be saved to the stack, since those helper calls below may > // change the value of glob_array. We probably need to fill the S > // registers too? > long t0 = glob_array[arg + 0]; > long t1 = glob_array[arg + 1]; > long t2 = ... > > // This should trigger shrink-wrapping, as the function calls will > // need a bunch of register state saved. The idea is to make sure > // all the other registers are somehow in flight at this point and > // could only be recovered via their saved copies, but due to the > // large stack we'll need a temporary to construct the slot > // addresses. > if (a) { > t0 = helper(stack_array, a, b, ...); > } > > // Go re-use those temporaries, again indirect to avoid > // optimization. > a += glob_array[t0]; > b += glob_array[t1]; > c ...; > return helper(stack_array, a, b, c, ...); > } > > At that point you should end up with the save/restore code in that if > block needing to do something like > > lui TMP, IMM_HI > addi TMP, sp, IMM_LO > sd REG0, 0(t0) > sd REG1, 8(t0) > ... > > and if one of those registers it's trying to save is TMP then we're in > trouble. > Thanks for the additional details and rough sketch of the testcase. I'll try to make something based on that. Manolis > > Generally looks pretty good though. > > Ya, and a useful optimization so thanks for the help ;) > > > > > > > Jeff