From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 3FA903858D3C for ; Wed, 18 May 2022 07:49:03 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 3FA903858D3C Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A00C823A; Wed, 18 May 2022 00:49:02 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.37]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id C22013F73D; Wed, 18 May 2022 00:49:01 -0700 (PDT) From: Richard Sandiford To: Tamar Christina Mail-Followup-To: Tamar Christina , "gcc-patches\@gcc.gnu.org" , nd , "rguenther\@suse.de" , "jeffreyalaw\@gmail.com" , richard.sandiford@arm.com Cc: "gcc-patches\@gcc.gnu.org" , nd , "rguenther\@suse.de" , "jeffreyalaw\@gmail.com" Subject: Re: [PATCH 1/3]middle-end: Add the ability to let the target decide the method of argument promotions. References: Date: Wed, 18 May 2022 08:49:00 +0100 In-Reply-To: (Tamar Christina's message of "Tue, 17 May 2022 17:45:50 +0000") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-6.3 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 18 May 2022 07:49:05 -0000 Tamar Christina writes: > [=E2=80=A6] >> >> > We generate for e.g.: >> >> > >> >> > #include >> >> > >> >> > uint16_t f8 (uint8_t xr, uint8_t xc){ >> >> > return (uint8_t)(xr * xc); >> >> > } >> >> > >> >> > (insn 9 6 10 2 (set (reg:HI 101) >> >> (zero_extend:HI (reg/v:QI 96 [ xr ]))) "prom.c":4:16 -1 >> >> (nil)) >> >> (insn 10 9 11 2 (set (reg:HI 102) >> >> (zero_extend:HI (reg/v:QI 98 [ xc ]))) "prom.c":4:16 -1 >> >> (nil)) >> >> (insn 11 10 12 2 (set (reg:SI 103) >> >> (mult:SI (subreg:SI (reg:HI 101) 0) >> >> (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1 >> >> (nil)) >> >> > >> >> > Out of expand. The paradoxical subreg isn't generated at all out of >> >> > expand unless it's needed. It does keep the original params around >> >> > as >> >> unused: >> >> > >> >> > (insn 2 7 4 2 (set (reg:QI 97) >> >> (reg:QI 0 x0 [ xr ])) "prom.c":3:37 -1 >> >> (nil)) >> >> (insn 4 2 3 2 (set (reg:QI 99) >> >> (reg:QI 1 x1 [ xc ])) "prom.c":3:37 -1 >> >> (nil)) >> >> > >> >> > And the paradoxical subreg is moved into the first operation requir= ing it: >> >> > >> >> > (insn 11 10 12 2 (set (reg:SI 103) >> >> (mult:SI (subreg:SI (reg:HI 101) 0) >> >> (subreg:SI (reg:HI 102) 0))) "prom.c":4:16 -1 >> >> (nil)) >> >> >> >> Ah, OK, this isn't what I'd imaagined. I thought the xr and xc >> >> registers would be SIs and the DECL_RTLs would be QI subregs of those= SI >> regs. >> >> I think that might work better, for the reasons above. (That is, >> >> whenever we need the register in extended form, we can simply extend >> >> the existing reg rather than create a new one.) >> > >> > Ah, I see, no, I explicitly avoid this. When doing the type promotions >> > I tell it that size of the copies of xr and xc is still the original s= ize, e.g. QI (i.e. I >> don't change 97 and 99). >> > This is different from what we do with extends where 97 and 99 *would* >> be changed. >> > >> > The reason is that if I make this SI the compiler thinks it knows the >> > value of all the bits in the register which led to various miscompares= as it >> thinks it can use the SI value directly. >> > >> > This happens because again the xr and xc are hard regs. So having 97 >> > be >> > >> > (set (reg:SI 97) (subreg:SI (reg:QI 0 x0 [ xr ]) 0)) >> > >> > gets folded to an incorrect >> > >> > (set (reg:SI 97) (reg:SI 0 x0 [ xr ])) >>=20 >> This part I would expect (and hope for :-)). >>=20 >> > And now 97 is free to be used without any zero extension, as 97 on it'= s own >> is an invalid RTX. >>=20 >> But the way I'd imagined it working, expand would need to insert an >> extension before any operation that needs the upper 24 bits to be defined >> (e.g. comparisons, right shifts). If the DECL_RTL is (subreg:QI (reg:SI= x) 0) >> then the upper bits are not defined, since SUBREG_PROMOTED_VAR_P >> would/should be false for the subreg. > > Ah I see, my fear here was that if we have a pattern which splits out the= zero-extend for whatever reason > that if it gets folded it would be invalid. But I think I understand wha= t you meant. In your case > we'd never again use the hardreg, but that everything goes through 97. Go= t it. Yeah. The expand code is supposed to move the hard register into a pseudo at the earliest opportunity (at the head of the function) and then everything else should use the pseudo. Using the hard register later could lead to spill failures, or to attempts to keep the register live across calls. >> E.g. for: >>=20 >> int8_t foo(int8_t x) { return x >> 1; } >>=20 >> x would have a DECL_RTL of (subreg:QI (reg:SI x) 0), the parameter >> assignment would be expanded as: >>=20 >> (set (reg:SI x) (reg:SI x0)) >>=20 >> the shift would be expanded as: >>=20 >> (set (reg:SI x) (zero_extend:SI (subreg:QI (reg:SI x) 0))) >> (set (reg:SI x) (ashiftrt:SI (reg:SI x) (const_int 1))) >>=20 >> and the return assignment would be expanded as: >>=20 >> (set (reg:SI x0) (reg:SI x)) >>=20 >> x + 1 would instead be expanded to just: >>=20 >> (set (reg:SI x) (plus:SI (reg:SI x) (const_int 1))) >>=20 >> (without an extension). >>=20 >> I realised later though that, although reusing the DECL_RTL reg for the >> extension has the nice RA property of avoiding multiple live values, it = would >> make it harder to combine the extension into the operation if the variab= le is >> still live afterwards. So I guess we lose something both ways. >>=20 >> Maybe we need a different approach, not based on changing >> PROMOTE_MODE. >>=20 >> I wonder how easy it would be to do the promotion in gimple, then reuse >> backprop to determine when a sign/zero-extension (i.e. a normal gimple c= ast) >> can be converted into an =E2=80=9Cany extend=E2=80=9D >> (probably represented as a new ifn). > > Do you mean without changing the hook implementation but keeping the curr= ent promotion? Yeah, keep the hook definitions as they are now, but do the promotion in gimple by widening types where necessary. > I guess the problem here is that it's the inverse cases that's the proble= m isn't it? It's not that in > gimple there are unneeded extends, it's that some operations require an a= ny-extend no? > > like in gimple ~a where a is an 8-bit quantity requires an any-extend, bu= t no cast would be there > in gimple. > > So for instance > > #include > > uint8_t f (uint8_t a) > { > return ~a; > } > > Is just simply: > > f (uint8_t a) > { > uint8_t _2; > > [local count: 1073741824]: > _2 =3D ~a_1(D); > return _2; > > } Right. But the idea was to run a late-ish isel-like pass that converts this to: f (uint8_t a) { uint8_t _2; uint32_t _3; uint8_t _4; [local count: 1073741824]: _2 =3D .ANY_EXTEND(a_1(D), (uint32_t)0); _3 =3D ~_2; _4 =3D (uint8_t)_3; return _4; } We'd need to experiment with the best placing of the pass. > In gimple. I'm also slightly worried about interfering with phi opts. Bac= kprop runs > before ifcombine and pihops for instance and there are various phi opts l= ike ifcombine_ifandif > that rely on the BB containing only the phi node. Adding an any-extend i= n between would break this. Yeah, the current backprop pass runs quite early. But I meant that we could reuse the code (or simply run another instance of the pass, extended to handle this case) at the appropriate point. > I also wonder if the IFN won't interfere with range analysis of expressio= ns. Unless we manage to strategically > Insert the IFNs on entire expressions and not in the intermediate SSA com= ponents. This would be part of the trade-off in placing the promotion pass. Thanks, Richard