From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id B229C385736B for ; Tue, 17 May 2022 09:03:59 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org B229C385736B Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 579A41042; Tue, 17 May 2022 02:03:59 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.37]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 7CAA13F73D; Tue, 17 May 2022 02:03:58 -0700 (PDT) From: Richard Sandiford To: Tamar Christina Mail-Followup-To: Tamar Christina , "gcc-patches\@gcc.gnu.org" , nd , "rguenther\@suse.de" , "jeffreyalaw\@gmail.com" , richard.sandiford@arm.com Cc: "gcc-patches\@gcc.gnu.org" , nd , "rguenther\@suse.de" , "jeffreyalaw\@gmail.com" Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions. References: Date: Tue, 17 May 2022 10:03:57 +0100 In-Reply-To: (Tamar Christina's message of "Tue, 17 May 2022 07:55:20 +0000") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-6.3 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 17 May 2022 09:04:01 -0000 Tamar Christina writes: [=E2=80=A6] >> E.g. does the patch avoid the AND in: >>=20 >> #include >> uint8_t foo(uint8_t x, int y) { >> if (y) { >> printf("Foo %d\n", x ? 1 : 2); >> __builtin_abort (); >> } >> return x + 1; >> } >>=20 >> ? > > Morning, > > It does actually, it generates: > > foo: > cbnz w1, .L9 > add w0, w0, 1 > ret > .L9: > tst w0, 255 > stp x29, x30, [sp, -16]! > cset w1, ne > add w1, w1, 1 > mov x29, sp > adrp x0, .LC0 > add x0, x0, :lo12:.LC0 > bl printf > bl abort > .size foo, .-foo Ah, nice. > Now I will admit that this isn't because of a grand master design, but > purely because the patch works around the cases seen in SPEC. In those > cases the comparisons in question were floated out of the if statement. > > The heuristic in patch 2/3 allows this because it only looks for compares= in > gimple assigns whereas in this case the compare is in the Gimple cond > directly. OK. [=E2=80=A6] >> > We generate for e.g.: >> > >> > #include >> > >> > uint16_t f8 (uint8_t xr, uint8_t xc){ >> > return (uint8_t)(xr * xc); >> > } >> > >> > (insn 9 6 10 2 (set (reg:HI 101) >> (zero_extend:HI (reg/v:QI 96 [ xr ]))) "prom.c":4:16 -1 >> (nil)) >> (insn 10 9 11 2 (set (reg:HI 102) >> (zero_extend:HI (reg/v:QI 98 [ xc ]))) "prom.c":4:16 -1 >> (nil)) >> (insn 11 10 12 2 (set (reg:SI 103) >> (mult:SI (subreg:SI (reg:HI 101) 0) >> (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1 >> (nil)) >> > >> > Out of expand. The paradoxical subreg isn't generated at all out of >> > expand unless it's needed. It does keep the original params around as >> unused: >> > >> > (insn 2 7 4 2 (set (reg:QI 97) >> (reg:QI 0 x0 [ xr ])) "prom.c":3:37 -1 >> (nil)) >> (insn 4 2 3 2 (set (reg:QI 99) >> (reg:QI 1 x1 [ xc ])) "prom.c":3:37 -1 >> (nil)) >> > >> > And the paradoxical subreg is moved into the first operation requiring= it: >> > >> > (insn 11 10 12 2 (set (reg:SI 103) >> (mult:SI (subreg:SI (reg:HI 101) 0) >> (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1 >> (nil)) >>=20 >> Ah, OK, this isn't what I'd imaagined. I thought the xr and xc register= s would >> be SIs and the DECL_RTLs would be QI subregs of those SI regs. >> I think that might work better, for the reasons above. (That is, whenev= er we >> need the register in extended form, we can simply extend the existing reg >> rather than create a new one.) > > Ah, I see, no, I explicitly avoid this. When doing the type promotions I = tell it that > size of the copies of xr and xc is still the original size, e.g. QI (i.e.= I don't change 97 and 99). > This is different from what we do with extends where 97 and 99 *would* be= changed. > > The reason is that if I make this SI the compiler thinks it knows the val= ue of all the bits > in the register which led to various miscompares as it thinks it can use = the SI value directly. > > This happens because again the xr and xc are hard regs. So having 97 be > > (set (reg:SI 97) (subreg:SI (reg:QI 0 x0 [ xr ]) 0)) > > gets folded to an incorrect > > (set (reg:SI 97) (reg:SI 0 x0 [ xr ])) This part I would expect (and hope for :-)). > And now 97 is free to be used without any zero extension, as 97 on it's o= wn is an invalid RTX. But the way I'd imagined it working, expand would need to insert an extension before any operation that needs the upper 24 bits to be defined (e.g. comparisons, right shifts). If the DECL_RTL is (subreg:QI (reg:SI x) 0) then the upper bits are not defined, since SUBREG_PROMOTED_VAR_P would/should be false for the subreg. E.g. for: int8_t foo(int8_t x) { return x >> 1; } x would have a DECL_RTL of (subreg:QI (reg:SI x) 0), the parameter assignment would be expanded as: (set (reg:SI x) (reg:SI x0)) the shift would be expanded as: (set (reg:SI x) (zero_extend:SI (subreg:QI (reg:SI x) 0))) (set (reg:SI x) (ashiftrt:SI (reg:SI x) (const_int 1))) and the return assignment would be expanded as: (set (reg:SI x0) (reg:SI x)) x + 1 would instead be expanded to just: (set (reg:SI x) (plus:SI (reg:SI x) (const_int 1))) (without an extension). I realised later though that, although reusing the DECL_RTL reg for the extension has the nice RA property of avoiding multiple live values, it would make it harder to combine the extension into the operation if the variable is still live afterwards. So I guess we lose something both ways. Maybe we need a different approach, not based on changing PROMOTE_MODE. I wonder how easy it would be to do the promotion in gimple, then reuse backprop to determine when a sign/zero-extension (i.e. a normal gimple cast) can be converted into an =E2=80=9Cany extend=E2= =80=9D (probably represented as a new ifn). > So I have to keep the intermediate copy QI mode, after which the RTX opti= mizations > being done during expand generates the forms above. > >>=20 >> I think that's where confusion was coming from. >>=20 >> > In any case, I'm still not totally sure what the objection here is. >> > Afaik, compares need to be treated specially because in GIMPLE they >> > already are. Afaik, C integer promotion rules state that in the >> > comparison 0 should have been promoted to an integer constant of rank >> > int and so the comparison itself should have been done as integer. >> > i.e. extended. And most of our patterns are based around this. >> > >> > Gimple however doesn't do this, the comparison is done in the rank of >> > the variable and there is no explicit conversion. This happened to be >> > fixed up before during the forced promotion. So to me the heuristic >> > doesn't seem to be that crazy.. >>=20 >> I guess I still don't see the distinction. C says the same thing about >> +, -, >>, etc. And gimple is free to do those operations in narrow >> +types >> if it wants to, and if that doesn't change the semantics. (Not that gim= ple >> always does them in narrow types. But it is valid gimple.) >>=20 >> The optimisation problem doesn't come from C or gimple semantics, but >> from the fact that AArch64 (unlike x86 say) doesn't have byte add, byte >> compare, byte right shift, etc. We therefore need to promote 8-bit and = 16- >> bit operations to 32 bits first. >>=20 >> For add, subtract, multiply, left shift, and logic ops, we can avoid def= ining the >> upper bits of the inputs when we do these extensions, because the upper >> bits of the inputs don't affect the useful bits of the result. But for >> comparisons, right shifts, and divides, etc., we do need to extend. >>=20 >> AIUI, the comparison case is special because (for AArch64-specific reaso= ns), >> we prefer extend + cbz to tst + branch, especially when the extend can be >> shared. > > Agreed, so I think we agree on this =F0=9F=98=8A I guess the disagreement= is where this > should be done. I'll admit that the testcase above works by coincidence. = But if > we don't do it during expand time, the only place I can think of to intro= duce > the zero extends is to add various patterns to do an early split of any-e= xtend > + cmp. But wouldn't that be more fragile? At least at expand time all co= mparisons > are tcc_comparisons.=20 I guess one question is: is the patch without the comparison handling just exacerbating an existing problem? Do we already make similar bad choices between extend+cbz and tst+branch in cases where the variables aren't short, but where intermediate calculations involve &s? If so, it might be something worth tackling in its own right, regardless of where the &s or extensions come from. But yeah, I'm not sure how easily that would fit into existing passes. Thanks, Richard