From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pl1-x62f.google.com (mail-pl1-x62f.google.com [IPv6:2607:f8b0:4864:20::62f]) by sourceware.org (Postfix) with ESMTPS id A2BB63858D32 for ; Tue, 18 Oct 2022 17:35:44 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A2BB63858D32 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=dabbelt.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=dabbelt.com Received: by mail-pl1-x62f.google.com with SMTP id l4so14504253plb.8 for ; Tue, 18 Oct 2022 10:35:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dabbelt-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:mime-version:message-id:to:from:cc :in-reply-to:subject:date:from:to:cc:subject:date:message-id :reply-to; bh=IbOrUs4Xdir3UJ1LySiNFqa/CpKVl8iNLPM5Q0zH5iE=; b=htA8kWJ4RRzwd3HgzmACmy6K9eBCCGkGBMFS7jTz39jwyS1IEuwhBtsku2TsRz6qbs nVtCEP5kgcazIEsJn72U406g6TmR8si7R1wA+D2YwGFRdrLOXfyy82D2091sYEHqc/d1 Y05LvxL0qzCXRMtXtKuPw121gP/tJpzXeVW9PnA3xIdBYawr60LG/vvYWULa/qfOjvUX OZNf3fyoAWimio4V2LYHMDrpQh0lyioS98EF+mOqCXotWcqvhv+vnSqhnKVwY+fY/5+3 qOSYFsjh9086kM5+Hr38aF1ID8layeVYKECGpUBt56LRAS1DkQ8eMhLSXfikNYnVpUIE YJIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:message-id:to:from:cc :in-reply-to:subject:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=IbOrUs4Xdir3UJ1LySiNFqa/CpKVl8iNLPM5Q0zH5iE=; b=GfBtMzSM6EEYOOdF5LdiHUbYM3szkQ/jcUb6MgfI5nWwQkmsRTTS9O7YJXwo76zEUg gNU3jDoxFXBVMCnjCd+a52Zv94AKKVpFUbwgGSYbkNs5sfvpQhwp6lzNxfoeE6gAELXN 8jUWJ/wWerUpYbVED40asiXaI0p2Qq15LNOdhxN3awJvoFOuouFb74b2GEeBhYXQOLQ6 nUPpCDtrBgr3GKYOYOImYADYTNS7MCNrMMu4Eyl4dzQLHdc97hteLynEBc97vg7XDKjB MTGYWpYdF5L724jQ9ShvNvztKwe7+pPMBlfTLisjtClzWf6o2Z99WrWOUfPnmhjrtKYq MFPQ== X-Gm-Message-State: ACrzQf3SZCUUUgESfv0BXdjsBOVgySsP4hh8UK0eElWRgTlZM+a3ubgl 20gK2L8CAj958u8/C9BWDKrS8Q== X-Google-Smtp-Source: AMsMyM7rucjk7oW7/Ve0krbyYgzKCVTOx+lZmRhDfYttZytD6OudhE1uj60tNo3hB3BwSl4VuARhEw== X-Received: by 2002:a17:90b:1b06:b0:20d:3b10:3817 with SMTP id nu6-20020a17090b1b0600b0020d3b103817mr4873175pjb.43.1666114543335; Tue, 18 Oct 2022 10:35:43 -0700 (PDT) Received: from localhost ([50.221.140.188]) by smtp.gmail.com with ESMTPSA id u15-20020a1709026e0f00b0017508d2665dsm9038639plk.34.2022.10.18.10.35.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 18 Oct 2022 10:35:42 -0700 (PDT) Date: Tue, 18 Oct 2022 10:35:42 -0700 (PDT) X-Google-Original-Date: Tue, 18 Oct 2022 10:35:46 PDT (-0700) Subject: Re: [PATCH] Enable shrink wrapping for the RISC-V target. In-Reply-To: <2e2ca1dc-902e-23ca-babf-11a4978530ef@ventanamicro.com> CC: manolis.tsamis@vrull.eu, gcc-patches@gcc.gnu.org, Vineet Gupta , Kito Cheng , philipp.tomsich@vrull.eu From: Palmer Dabbelt To: jlaw@ventanamicro.com Message-ID: Mime-Version: 1.0 (MHng) Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-4.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Tue, 18 Oct 2022 08:57:37 PDT (-0700), jlaw@ventanamicro.com wrote: > > Just a couple more comments in-line. > > On 10/18/22 09:18, Manolis Tsamis wrote: >> >>>> +/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS. */ >>>> + >>>> +static sbitmap >>>> +riscv_get_separate_components (void) >>>> +{ >>>> + HOST_WIDE_INT offset; >>>> + sbitmap components = sbitmap_alloc (FIRST_PSEUDO_REGISTER); >>>> + bitmap_clear (components); >>>> + >>>> + if (riscv_use_save_libcall (&cfun->machine->frame) >>>> + || cfun->machine->interrupt_handler_p) >>> riscv_use_save_libcall() already checks interrupt_handler_p, so that's >>> redundant. That said, I'm not sure riscv_use_save_libcall() is the >>> right check here as unless I'm missing something we don't have all those >>> other constraints when shrink-wrapping. >>> >> riscv_use_save_libcall returns false when interrupt_handler_p is true, so the >> check for interrupt_handler_p in the branch is not redundant in this case. >> >> I encountered some issues when shrink wrapping and libcall was used in the same >> function. Thinking that libcall replaces the prologue/epilogue I didn't see a >> reason to have both at the same time and hence I opted to disable >> shrink wrapping in that case. From my understanding this should be harmless? > > I would have expected things to work fine with libcalls, perhaps with > the exception of the save/restore libcalls.  So that needs deeper > investigation. The save/restore libcalls only support saving/restoring a handful of register configurations (just the saved X registers in the order they're usually saved in by GCC). It should be OK for correctness to over-save registers, but it kind of just un-does the shrink wrapping so not sure it's worth worrying about at that point. There's also some oddness around the save/restore libcall ABI, it's not the standard function ABI but instead a GCC-internal one. IIRC it just uses the alternate link register (ie, t0 instead of ra) but I may have forgotten something else. >>> It seems kind of clunky to have two copies of all these loops (and we'll >>> need a third to make this work with the V stuff), but we've got that >>> issue elsewhere in the port so I don't think you need to fix it here >>> (though the V stuff will be there for the v2, so you'll need the third >>> copy of each loop). >>> >> Indeed, I was following the other ports here. Do you think it would be >> better to refactor this when the code for the V extension is added? >> By taking into account what code will be needed for V, a proper refactored >> function could be made to handle all cases. > > I think refactoring when V gets added would be fine.  While we could > probably refactor it correctly now (it isn't terribly complex code after > all), but we're more likely to get it right with the least amount of > work if we do it when V is submitted. Some of the V register blocks are already there, but ya I agree we can just wait. There's going to be a bunch of V-related churn for a bit, juggling those patches is already enough of a headache ;) >>> Either way, this deserves a test case. I think it should be possible to >>> write one by introducing some register pressure around a >>> shrink-wrappable block that needs a long stack offset and making sure >>> in-flight registers don't get trashed. >>> >> I tried to think of some way to introduce a test like that but couldn't and >> I don't see how it would be done. Shrink wrapping only affects saved registers >> so there are always available temporaries that are not affected by >> shrink wrapping. >> (Register pressure should be irrelevant in this case if I understand correctly). >> Also the implementation checks for SMALL_OPERAND (offset) shrink wrapping >> should be unaffected from long stack offsets. If you see some way to write >> a test for that based on what I explained please explain how I could do that. > > I think the register pressure was just to ensure that some saves were > needed to trigger an attempt to shrink wrap something.  You'd also need > something to eat stack space (local array which gets referenced as an > asm operand, but where the asm doesn't generate any code perhaps)?  > Whether or not that works depends on stack layout though which I don't > know well enough for riscv. Sorry for being a bit vague, but it's because I always find it takes a bit of time to write up tests like this. I think something like this might do it, but that almost certainly won't work as-is: // Some extern bits to try and trip up the optimizer. extern long helper(long *sa, long a, long b, long c, ...); extern long glob_array[1024]; // The function takes a bunch of arguments to fill up the A // registers. long func(long a, long b, long c, ...) { long stack_array[1024]; // should be big enough to make SP offsets // take a temporary // Here I'm loading up all the T registers with something that // must be saved to the stack, since those helper calls below may // change the value of glob_array. We probably need to fill the S // registers too? long t0 = glob_array[arg + 0]; long t1 = glob_array[arg + 1]; long t2 = ... // This should trigger shrink-wrapping, as the function calls will // need a bunch of register state saved. The idea is to make sure // all the other registers are somehow in flight at this point and // could only be recovered via their saved copies, but due to the // large stack we'll need a temporary to construct the slot // addresses. if (a) { t0 = helper(stack_array, a, b, ...); } // Go re-use those temporaries, again indirect to avoid // optimization. a += glob_array[t0]; b += glob_array[t1]; c ...; return helper(stack_array, a, b, c, ...); } At that point you should end up with the save/restore code in that if block needing to do something like lui TMP, IMM_HI addi TMP, sp, IMM_LO sd REG0, 0(t0) sd REG1, 8(t0) ... and if one of those registers it's trying to save is TMP then we're in trouble. > Generally looks pretty good though. Ya, and a useful optimization so thanks for the help ;) > > > Jeff