From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id DDF903857363 for ; Mon, 16 May 2022 13:24:07 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DDF903857363 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 6968D1063; Mon, 16 May 2022 06:24:07 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.37]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 8F8083F73D; Mon, 16 May 2022 06:24:06 -0700 (PDT) From: Richard Sandiford To: Tamar Christina Mail-Followup-To: Tamar Christina , "gcc-patches\@gcc.gnu.org" , nd , "rguenther\@suse.de" , "jeffreyalaw\@gmail.com" , richard.sandiford@arm.com Cc: "gcc-patches\@gcc.gnu.org" , nd , "rguenther\@suse.de" , "jeffreyalaw\@gmail.com" Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions. References: Date: Mon, 16 May 2022 14:24:05 +0100 In-Reply-To: (Tamar Christina's message of "Mon, 16 May 2022 13:02:55 +0000") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-6.3 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 16 May 2022 13:24:10 -0000 Tamar Christina writes: >> -----Original Message----- >> From: Richard Sandiford >> Sent: Monday, May 16, 2022 1:18 PM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; rguenther@suse.de; >> jeffreyalaw@gmail.com >> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the target de= cide >> the method of argument promotions. >>=20 >> Richard Sandiford via Gcc-patches writes: >> > Tamar Christina writes: >> >>> -----Original Message----- >> >>> From: Richard Sandiford >> >>> Sent: Monday, May 16, 2022 12:36 PM >> >>> To: Tamar Christina >> >>> Cc: gcc-patches@gcc.gnu.org; nd ; rguenther@suse.de; >> >>> jeffreyalaw@gmail.com >> >>> Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the >> >>> target decide the method of argument promotions. >> >>> >> >>> Tamar Christina writes: >> >>> > Hi All, >> >>> > >> >>> > Some targets require function parameters to be promoted to a >> >>> > different type on expand time because the target may not have >> >>> > native instructions to work on such types. As an example the >> >>> > AArch64 port does not have native instructions working on integer >> >>> > 8- or 16-bit values. As such it promotes every parameter of these >> types to 32-bits. >> >>> >> >>> This doesn't seem specific to parameters though. It applies to any >> >>> 8- or 16-bit variable. E.g.: >> >>> >> >>> #include >> >>> uint8_t foo(uint32_t x, uint32_t y) { >> >>> uint8_t z =3D x !=3D 0 ? x : y; >> >>> return z + 1; >> >>> } >> >>> >> >>> generates: >> >>> >> >>> foo: >> >>> cmp w0, 0 >> >>> and w1, w1, 255 >> >>> and w0, w0, 255 >> >>> csel w0, w1, w0, eq >> >>> add w0, w0, 1 >> >>> ret >> >>> >> >>> So I think the new behaviour is really a modification of the >> >>> PROMOTE_MODE behaviour rather than the >> PROMOTE_FUNCTION_MODE behaviour. >> >>> >> >>> FWIW, I agree with Richard that it would be better not to add a new >> hook. >> >>> I think we're really making PROMOTE_MODE choose between >> SIGN_EXTEND, >> >>> ZERO_EXTEND or SUBREG (what LLVM would call =E2=80=9Cany >> >>> extend=E2=80=9D) rather than the current SIGN_EXTEND vs. ZERO_EXTEND >> choice. >> >> >> >> Ah, I hadn't realized this also applied to locals.. ok I can modify >> >> PROMOTE_MODE then, but I also need the actual SSA_NAME and not just >> the type so will have to pass this along. >> >> >> >> From a practical point of view.. the actual hook however is >> >> implemented by 34 targets, would I need to CC maintainers for each of >> >> them or would global maintainer approval suffice for these mostly >> mechanical changes? >> > >> > Yeah, single approval should be enough mechanical changes. It would >> > be good to do the interface change and mechanical target changes as a >> > separate prepatch if possible though. >> > >> > I'm not sure about passing the SSA name to the target though, or the >> > way that the aarch64 hook uses the info. It looks like a single cold >> > comparison could defeat the optimisation for hot code. > > I'm not sure I follow why the likelihood of the comparison matters in thi= s case.. > I'll expand on it below.. I meant the likelihood that the comparison is executed at all, not which outcome is more likely. E.g. suppose the only comparison occurs on a failure path that eventually calls abort, and that there are other paths (without comparisons of the same value) that would benefit from the any-extend optimisation. We'd prioritise the cold comparison over optimising the other (hot) code. I'm just suspicious of heuristics along the lines of =E2=80=9Cdon't do X if there is a single instance of Y=E2=80=9D. :-) >> > If we do try to make the decision based on uses at expand time, it >> > might be better for the analysis to be in target-independent code, >> > with help from the target to decide where extensions are cheap. It >> > still feels a bit hacky though. > > I thought about it but can't see most target having this problem. I did go > with an optimistic heuristics. There are of course various ways to defeat= it > but looking through the corpus of code I don't see any but the simple cas= es > in practice. (more below). > >> > >> > What stops us from forming cbz/cbnz when the extension is done close >> > to the comparison (from the comment in 2/3)? If we can solve that, >> > could we simply do an any-extend all the time, and treat removing >> > redundant extensions as a global availability problem? >>=20 > > In such cases there's no gain from doing the extension at all, e.g. > and w0, w0, 255 > cmp w0, 0 > b.eq .Lfoo > > will be optimized to > > tst w0, 0xff > b.ne .Lfoo > > already. > > In RTL the problem occurs when you have nested control flow like nested i= f and switch statements > The example in 2/3 was intended to show that before what we'd do is > > and w22, w0, 255 > .... > > cbz w22, .Lfoo1 > .... > > cbz w22, .Lfoo2 > > If we have a single comparison we already sink the zero_extend today. > > Now if we instead any-extend w0 we end up with: > > mov w22, w0 > .... > > tst w22, 0xff > b.ne .Lfoo1 > .... > > tst w22, 0xff > b.ne .Lfoo2 > > So you get an additional tst before each branch. You also can't perform t= he tst higher since CC is clobbered. > The cbz is nice because the zero extend doesn't use CC of course and so h= aving the value live allows us to optimize > The branch. Once the cbz has been formed (in combine), where does the optimisation of it happen? > I don't think branch likeliness matters here as you must keep w22 live in= both cases somehow. In the SPECCPU 2017 > Benchmark perlbench (which uses a lot of nested switches) this pattern is= responsible for an extra 0.3% codesize increase > which the approach in 2/3 prevents. > >> (which would run after combine) >>=20 >> > >> > What kind of code do we emit when do an extension just before an >> > operation? If the DECL_RTL is (subreg:QI (reg:SI R) 0), say, then it >> > should be safe to do the extension directly into R: >> > >> > (set (reg:SI X) (zero_extend:SI (subreg:QI (reg:SI X)))) >>=20 >> Oops, that should of course be: >>=20 >> (set (reg:SI R) (zero_extend:SI (subreg:QI (reg:SI R)))) >>=20 >> > which avoids the problem of having two values live at once (the >> > zero-extended value and the any-extended value). > > I'm not sure it does, as the any-extended value must remain live. i.e. ab= ove you can't get rid of w22, > you can only choose between having it be zero of any extended. But I am = not sure how you would do > that after expand. These per-operation extends are emitted during expand. The question is whether we do them into fresh registers: (set (reg:SI Rtmp) (zero_extend:SI (subreg:QI (reg:SI R)))) which leaves both R and Rtmp live at the same time, or whether we do them in-situ: (set (reg:SI R) (zero_extend:SI (subreg:QI (reg:SI R)))) Expand should know that the latter is valid, given the DECL_RTL. Thanks, Richard