From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by sourceware.org (Postfix) with ESMTPS id 55DED3858D1E for ; Fri, 30 Sep 2022 14:33:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 55DED3858D1E Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 2306E2189A; Fri, 30 Sep 2022 14:33:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1664548395; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CrdCK7JKq+0CHkR9neATE4del3HV+wfwTkrbNm0eeeM=; b=gMiRLX4WGkHP5HiXOokJavIrWCVsdSr7etmxQSr+5DyajvlYjWwYN/c6g8M5QEWvosuDSo d8QDoywuHzcn7tr2EduYjDwbowPGwxh27sclJvL7EwXi5m3irUPHsUgGRrXKQQWMw5S75/ bm6LpGotrIOogK3VyP67nuLrHaOIDH4= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1664548395; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CrdCK7JKq+0CHkR9neATE4del3HV+wfwTkrbNm0eeeM=; b=HrPMonButiZ4aeozNCLE1+8Pz0gVqQXEey2uX8+XKQVnk/8rDew2yaXLPqGcdo0PIY4QMb fDAOk9R1aSAptXBg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id F29C513677; Fri, 30 Sep 2022 14:33:14 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id o2MyOyr+NmOfWwAAMHmgww (envelope-from ); Fri, 30 Sep 2022 14:33:14 +0000 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable From: Richard Biener Mime-Version: 1.0 (1.0) Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional branches, give hint if argument is a truth type to backend Date: Fri, 30 Sep 2022 16:33:14 +0200 Message-Id: References: Cc: Tamar Christina , nd In-Reply-To: To: Richard Sandiford via Gcc-patches X-Mailer: iPhone Mail (19H12) X-Spam-Status: No, score=-5.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,MIME_QP_LONG_LINE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: > Am 30.09.2022 um 16:29 schrieb Richard Sandiford via Gcc-patches : >=20 > =EF=BB=BFTamar Christina writes: >>> -----Original Message----- >>> From: Richard Biener >>> Sent: Friday, September 30, 2022 12:53 PM >>> To: Tamar Christina >>> Cc: Richard Sandiford ; Tamar Christina via >>> Gcc-patches ; nd ; Jeff Law >>> >>> Subject: RE: [PATCH 1/2]middle-end: RFC: On expansion of conditional >>> branches, give hint if argument is a truth type to backend >>>=20 >>>> On Fri, 30 Sep 2022, Tamar Christina wrote: >>>=20 >>>>> -----Original Message----- >>>>> From: Richard Biener >>>>> Sent: Friday, September 30, 2022 11:17 AM >>>>> To: Tamar Christina >>>>> Cc: Richard Sandiford ; Tamar Christina >>>>> via Gcc-patches ; nd ; Jeff >>> Law >>>>> >>>>> Subject: RE: [PATCH 1/2]middle-end: RFC: On expansion of conditional >>>>> branches, give hint if argument is a truth type to backend >>>>>=20 >>>>> On Fri, 30 Sep 2022, Tamar Christina wrote: >>>>>=20 >>>>>>=20 >>>>>>=20 >>>>>>> -----Original Message----- >>>>>>> From: Richard Sandiford >>>>>>> Sent: Friday, September 30, 2022 9:49 AM >>>>>>> To: Tamar Christina >>>>>>> Cc: Richard Biener ; Tamar Christina via >>>>>>> Gcc-patches ; nd ; Jeff >>> Law >>>>>>> >>>>>>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of >>>>>>> conditional branches, give hint if argument is a truth type to >>>>>>> backend >>>>>>>=20 >>>>>>> Tamar Christina writes: >>>>>>>>> -----Original Message----- >>>>>>>>> From: Richard Sandiford >>>>>>>>> Sent: Friday, September 30, 2022 9:29 AM >>>>>>>>> To: Tamar Christina >>>>>>>>> Cc: Richard Biener ; Tamar Christina via >>>>>>>>> Gcc-patches ; nd ; Jeff >>>>> Law >>>>>>>>> >>>>>>>>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of >>>>>>>>> conditional branches, give hint if argument is a truth type >>>>>>>>> to backend >>>>>>>>>=20 >>>>>>>>> Tamar Christina writes: >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: Gcc-patches >>>>>>>>>> bounces+tamar.christina=3Darm.com@gcc.gnu.org> On Behalf Of >>>>>>> Richard >>>>>>>>>>> Biener via Gcc-patches >>>>>>>>>>> Sent: Thursday, September 29, 2022 12:09 PM >>>>>>>>>>> To: Tamar Christina via Gcc-patches >>>>>>>>>>> >>>>>>>>>>> Cc: Richard Sandiford ; nd >>>>>>> >>>>>>>>>>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of >>>>>>>>>>> conditional branches, give hint if argument is a truth >>>>>>>>>>> type to backend >>>>>>>>>>>=20 >>>>>>>>>>>=20 >>>>>>>>>>>=20 >>>>>>>>>>>> Am 29.09.2022 um 12:23 schrieb Tamar Christina via >>>>>>>>>>>> Gcc-patches >>>>>>>>>>>> >>>>>>>>>> patches@gcc.gnu.org>: >>>>>>>>>>>>=20 >>>>>>>>>>>>=20 >>>>>>>>>>>>>=20 >>>>>>>>>>>>> -----Original Message----- >>>>>>>>>>>>> From: Richard Biener >>>>>>>>>>>>> Sent: Thursday, September 29, 2022 10:41 AM >>>>>>>>>>>>> To: Richard Sandiford >>>>>>>>>>>>> Cc: Jeff Law ; Tamar Christina >>>>>>>>>>>>> ; gcc-patches@gcc.gnu.org; nd >>>>>>>>>>> >>>>>>>>>>>>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion >>>>>>>>>>>>> of conditional branches, give hint if argument is a >>>>>>>>>>>>> truth type to backend >>>>>>>>>>>>>=20 >>>>>>>>>>>>>> On Thu, 29 Sep 2022, Richard Sandiford wrote: >>>>>>>>>>>>>>=20 >>>>>>>>>>>>>> Jeff Law writes: >>>>>>>>>>>>>>> On 9/28/22 09:04, Richard Sandiford wrote: >>>>>>>>>>>>>>>> Tamar Christina writes: >>>>>>>>>>>>>>>>>> Maybe the target could use (subreg:SI (reg:BI >>>>>>>>>>>>>>>>>> ...)) as >>>>>>> argument. >>>>>>>>>>> Heh. >>>>>>>>>>>>>>>>> But then I'd still need to change the expansion >>>>>>>>>>>>>>>>> code. I suppose this could prevent the issue with >>>>>>>>>>>>>>>>> changes to code on >>>>>>>>> other targets. >>>>>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>>>>>>> We have undocumented addcc, negcc, etc. >>>>>>>>>>>>>>>>>>>> patterns, >>>>>>> should >>>>>>>>> we >>>>>>>>>>>>>>>>>>>> have aandcc >>>>>>>>>>>>>>>>> pattern for this indicating support for andcc + >>>>>>>>>>>>>>>>> jump as opposedto >>>>>>>>>>>>> cmpcc + jump? >>>>>>>>>>>>>>>>>>> This could work yeah. I didn't know these existed. >>>>>>>>>>>>>>>>>> Ah, so they are conditional add, not add setting >>>>>>>>>>>>>>>>>> CC, so andcc wouldn't be appropriate. >>>>>>>>>>>>>>>>>> So I'm not sure how we'd handle such situation - >>>>>>>>>>>>>>>>>> maybe looking at REG_DECL and recognizing a _Bool >>>>>>>>>>>>>>>>>> PARM_DECL is >>>>>>> OK? >>>>>>>>>>>>>>>>> I have a slight suspicion that Richard Sandiford >>>>>>>>>>>>>>>>> would likely reject this though.. >>>>>>>>>>>>>>>> Good guess :-P We shouldn't rely on something like >>>>>>>>>>>>>>>> that for >>>>>>>>>>>>> correctness. >>>>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>>> Would it help if we promoted the test-and-branch >>>>>>>>>>>>>>>> instructions to optabs, alongside cbranch? The jump >>>>>>>>>>>>>>>> expanders could then target it >>>>>>>>>>>>> directly. >>>>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>>> IMO that'd be a reasonable thing to do if it does help. >>>>>>>>>>>>>>>> It's a relatively common operation, especially on >>>>>>>>>>>>>>>> CISCy >>>>> targets. >>>>>>>>>>>>>>>=20 >>>>>>>>>>>>>>> But don't we represent these single bit tests using >>>>>>>>>>>>>>> zero_extract as the condition of the branch? I guess >>>>>>>>>>>>>>> if we can generate them directly rather than waiting >>>>>>>>>>>>>>> for combine to deduce that we're dealing with a >>>>>>>>>>>>>>> single bit test and constructing the zero_extract >>>>>>>>>>>>>>> form would be an improvement and might help aarch at >>>>>>>>>>>>>>> the same >>>>>>>>> time. >>>>>>>>>>>>>>=20 >>>>>>>>>>>>>> Do you mean that the promote_mode stuff should use >>>>>>>>>>>>>> ext(z)v rather than zero_extend to promote a bool, where >>> available? >>>>>>>>>>>>>> If so, I agree that might help. But it sounds like it >>>>>>>>>>>>>> would have downsides >>>>>>>>> too. >>>>>>>>>>>>>> Currently a bool memory can be zero-extended on the >>>>>>>>>>>>>> fly using a load, but if we used the zero_extract form >>>>>>>>>>>>>> instead, we'd have to extract the bit after the load. >>>>>>>>>>>>>> And (as an >>>>>>>>>>>>>> alternative) choosing different behaviour based on >>>>>>>>>>>>>> whether expand sees a REG or a MEM sounds like it >>>>>>>>>>>>>> could still cause problems, since REGs could be >>>>>>>>>>>>>> replaced by MEMs (or vice versa) >>>>>>> later in the RTL passes. >>>>>>>>>>>>>>=20 >>>>>>>>>>>>>> ISTM that the original patch was inserting an extra >>>>>>>>>>>>>> operation in the branch expansion in order to target a >>>>>>>>>>>>>> specific >>>>> instruction. >>>>>>>>>>>>>> Targeting the instruction in expand seems good, but >>>>>>>>>>>>>> IMO we should do it directly, based on knowledge of >>>>>>>>>>>>>> whether the instruction actually >>>>>>>>>>> exists. >>>>>>>>>>>>>=20 >>>>>>>>>>>>> Yes, I think a compare-and-branch pattern is the best fit >>> here. >>>>>>>>>>>>> Note on GIMPLE we'd rely on the fact this is a >>>>>>>>>>>>> BOOLEAN_TYPE (so even 8 bit precision bools only have 1 >>>>>>>>>>>>> and 0 as meaningful >>>>> values). >>>>>>>>>>>>> So the 'compare-' bit in compare-and-branch would be >>>>>>>>>>>>> interpreting a BOOLEAN_TYPE, not so much a general >>> compare. >>>>>>>>>>>>=20 >>>>>>>>>>>> Oh, I was thinking of adding a constant argument >>>>>>>>>>>> representing the precision that is relevant for the >>>>>>>>>>>> compare in order to make this a bit more >>>>>>>>>>> general/future proof. >>>>>>>>>>>>=20 >>>>>>>>>>>> Are you thinking I should instead just make the optab >>>>>>>>>>>> implicitly only work for 1-bit precision comparisons? >>>>>>>>>>>=20 >>>>>>>>>>> What?s the optab you propose (cite also the documentation >>> part)? >>>>>>>>>>=20 >>>>>>>>>> tbranchmode5 >>>>>>>>>> Conditional branch instruction combined with a bit test >>> instruction. >>>>>>>>> Operand 0 is a comparison operator. >>>>>>>>>> Operand 1 and Operand 2 are the first and second operands >>>>>>>>>> of the >>>>>>>>> comparison, respectively. >>>>>>>>>> Operand 3 is the number of low-order bits that are >>>>>>>>>> relevant for the >>>>>>>>> comparison. >>>>>>>>>> Operand 4 is the code_label to jump to. >>>>>>>>>=20 >>>>>>>>> For the TB instructions (and for other similar instructions >>>>>>>>> that I've seen on other architectures) it would be more >>>>>>>>> useful to have a single-bit test, with operand 4 specifying the bi= t >>> position. >>>>>>>>> Arguably it might then be better to have separate eq and ne >>>>>>>>> optabs, to avoid the awkward doubling of the operands >>>>>>>>> (operand 1 contains >>>>>>> operands 2 and 3). >>>>>>>>>=20 >>>>>>>>> I guess a more general way of achieving the same thing would >>>>>>>>> be to make operand 4 in the optab above a mask rather than a bit >>> count. >>>>>>>>> But that might be overly general, if there are no known >>>>>>>>> architectures that have such an instruction. >>>>>>>>=20 >>>>>>>> One of the reasons I wanted a range rather than a single bit >>>>>>>> is that I can the use this to generate cbz/cbnz early on as well. >>>>>>>=20 >>>>>>> We already have the opportunity to do that via cbranch4. >>>>>>> But at the moment aarch64.md always forces the separate >>>>>>> comparison instead. (Not sure why TBH. Does it enable more >>>>>>> ifcvt >>>>>>> opportunities?) >>>>>>>=20 >>>>>>> If we change the body of cbranch4 to: >>>>>>>=20 >>>>>>> if ((GET_CODE (operands[0]) !=3D EQ && GET_CODE (operands[0]) !=3D >>> NE) >>>>>>> || operands[2] !=3D const0_rtx) >>>>>>> { >>>>>>> operands[1] =3D aarch64_gen_compare_reg (GET_CODE >>> (operands[0]), >>>>>>> operands[1], operands[2]);= >>>>>>> operands[2] =3D const0_rtx; >>>>>>> } >>>>>>>=20 >>>>>>> then we generate the cbz/cbnz directly. >>>>>>>=20 >>>>>>=20 >>>>>> Ah ok, then if Richi agrees, bitpos it is then instead of bit count. >>>>>=20 >>>>> Somehow I understood that cbranch<>4 is already fully capable of the >>>>> optimization? >>>>>=20 >>>>> On your earlier proposal I'd have commented that if it wouldn't make >>>>> sense to instead have a CCmode setter instead of an expander with a >>> branch label? >>>>> That would be a bit test, like {sign,zero}_extract compared against >>>>> zero which can then be combined with a branch? >>>>>=20 >>>>=20 >>>> I missed that part, that could work too. >>>>=20 >>>>> Of course if the ISA is really bit-test-and-branch then cbranch<>4 >>>>> itself or a variant of it might be more useful. Maybe >>>>> cbranchbi4 would be "abused" for this? >>>>=20 >>>> The instruction is an actual bit-test-and-branch with any arbitrary bit= pos. >>>> Yes we can abuse cbranchbi4 for this, but then it also means we can't e= .g. >>>> use this to optimize a < 0 where a is a signed value. With the new >>>> optab this would just be a bit-test-and-branch of the sign bit. >>>>=20 >>>> But also I'm not entirely convinced that using `BImode` and assuming a >>>> single bit is safe here. What happens if I compile my source with -std=3D= c89? >>>>=20 >>>> So I personally think the new optab makes more sense here. The CC sette= r >>> would work too. >>>>=20 >>>> I guess my question is, between you folks, which approach would you >>>> like. It seems that Richi You'd like a CC setter. Richard do you have a= >>> preference of one over the other? >>>=20 >>> My order of preference is >>>=20 >>> a) an existing pattern, if possible >>> b) something usable by N > 1 targets we know of, even if it requires som= e >>> combine magic >>> c) something that fits the actual ISA >>>=20 >>> For b), x86 has BEXTR which performs a zero_extract from reg/mem and set= s >>> ZF when the result is zero. For a branch on sign bit there's easier way= s, so it's >>> probably not very useful for general compare and branch optimization and= if >>> (a & 0x30) is probably handled by combine(?). >>=20 >> Agreed, I was more giving an example of other uses. But ok. >>=20 >>> It also seems that if (a & 1) is handled for aarch64 already and it's ju= st a lack of >>> an optab that allows RTL expansion to expand if (bool) as if (bool & 1)?= >>=20 >> Yes this was what I was originally after. Your original suggestion was t= o abuse BImode >> and cbranch for this. That could work, but I worry about introducing BIm= ode in the RTL, >> as I don't think we currently use it at all and wonder about new missed o= ptimizations. >>=20 >> Richard Sandiford, would this be OK with you? I also don't want to emit a= n extract here as I think >> that's gonna have a bigger effect on other targets than & 1 did. >>=20 >> I would rather have something I can explicitly check for target support f= or. I also wonder if just a >> target hook asking the target how to expand a given tree Boolean argument= would work. >=20 > Personally I'd prefer the test-and-branch optab. We used to have > separate compare optabs and branch optabs in the cc0 days, but having > the combined pattern turned out to be much easier. We don't currently > expose CC registers at the optabs level and I think that's a good thing. >=20 > We could have a test-bit-and-set-pseudo optab. But using that optab > here would reintroduce the problem that I had with the original patch, > which is: >=20 > - Currently, the branch on the boolean value expands to a single optab > (cbranch4). That optab could easily expand to a single cb(n)z > instruction on aarch64 (even though the port chooses not to do that > currently). So taken on its own, the branch is already maximally > efficient. >=20 > The patch instead adds an extra instruction to the branch expansion, > so that it uses two optabs rather than one. On the face of it, > this extra instruction is entirely redundant. But emitting it can > sometimes allow profitable combines. That is, rather than expand to: >=20 > and reg, reg, 0xFF (from promote_mode) > cbranch on reg (from the branch expansion) >=20 > the patch expanded to: >=20 > and reg, reg, 0xFF (from promote_mode) > and reg, reg, 0x1 (from the branch expansion) > cbranch on reg (from the branch expansion) >=20 > The first two ANDs come from separate sources. When both ANDs exist, > combine can get rid of the redundancy and can then see that the > retained AND 0x1 + cbranch optimise to a test-and-branch instruction. >=20 > But if the target can't in fact use a test-and-branch in a particular > situation, we'd be left with the AND 0x1 and the cbranch. That's not > too bad if the AND from promote_mode and the AND 0x1 can be combined. > But if the AND with 0xff happens "too far" from the AND with 0x1 > (e.g. because it's in a loop preheader), or if there's intermediate > logic, we might end up keeping all three instructions. >=20 > Emitting an extra bit test instruction as part of the branch expansion > IMO has the same problem. We're relying on combine combining the > (redundant) bit test with the cbranch. If combine doesn't do that > (e.g. because the target doesn't support the combination in certain > cases) then we could be left with three instructions rather than the > original two. >=20 > That's why I think the fix is either for promote_mode to use a different > expansion for booleans (which could also have knock-on effects) or for > the branch expanders to target the test-and-branch instruction directly. >=20 > The optab wouldn't be an aarch64 special. arc, avr, and h8300sx have > similar instructions. There are problem others too: h8300sx was from > memory and I stopped looking after avr. :-) Fair enough, let=E2=80=99s go with that then. Richard=20 > Thanks, > Richard