From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 4CE9438A8149 for ; Tue, 15 Nov 2022 17:39:49 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 4CE9438A8149 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 534D913D5; Tue, 15 Nov 2022 09:39:55 -0800 (PST) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.62]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 37F903F587; Tue, 15 Nov 2022 09:39:48 -0800 (PST) From: Richard Sandiford To: Tamar Christina Mail-Followup-To: Tamar Christina ,Hongtao Liu , Tamar Christina via Gcc-patches , nd , "rguenther\@suse.de" , richard.sandiford@arm.com Cc: Hongtao Liu , Tamar Christina via Gcc-patches , nd , "rguenther\@suse.de" Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors from arbitrary element position inside a vector References: Date: Tue, 15 Nov 2022 17:39:46 +0000 In-Reply-To: (Tamar Christina's message of "Tue, 15 Nov 2022 10:00:13 +0000") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-40.8 required=5.0 tests=BAYES_00,GIT_PATCH_0,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Tamar Christina writes: >> -----Original Message----- >> From: Hongtao Liu >> Sent: Tuesday, November 15, 2022 9:37 AM >> To: Tamar Christina >> Cc: Richard Sandiford ; Tamar Christina via >> Gcc-patches ; nd ; >> rguenther@suse.de >> Subject: Re: [PATCH 3/8]middle-end: Support extractions of subvectors from >> arbitrary element position inside a vector >> >> On Tue, Nov 15, 2022 at 4:51 PM Tamar Christina >> wrote: >> > >> > > -----Original Message----- >> > > From: Hongtao Liu >> > > Sent: Tuesday, November 15, 2022 8:36 AM >> > > To: Tamar Christina >> > > Cc: Richard Sandiford ; Tamar Christina >> > > via Gcc-patches ; nd ; >> > > rguenther@suse.de >> > > Subject: Re: [PATCH 3/8]middle-end: Support extractions of >> > > subvectors from arbitrary element position inside a vector >> > > >> > > Hi: >> > > I'm from https://gcc.gnu.org/pipermail/gcc-patches/2022- >> > > November/606040.html. >> > > > } >> > > > >> > > > /* See if we can get a better vector mode before extracting. >> > > > */ diff --git a/gcc/optabs.cc b/gcc/optabs.cc index >> > > > >> > > >> cff37ccb0dfc3dd79b97d0abfd872f340855dc96..f338df410265dfe55b68961600 >> > > 9 >> > > 0 >> > > > a453cc6a28d9 100644 >> > > > --- a/gcc/optabs.cc >> > > > +++ b/gcc/optabs.cc >> > > > @@ -6267,6 +6267,7 @@ expand_vec_perm_const (machine_mode >> mode, >> > > rtx v0, rtx v1, >> > > > v0_qi = gen_lowpart (qimode, v0); >> > > > v1_qi = gen_lowpart (qimode, v1); >> > > > if (targetm.vectorize.vec_perm_const != NULL >> > > > + && targetm.can_change_mode_class (mode, qimode, >> > > > + ALL_REGS) >> > > It looks like you want to guard gen_lowpart, shouldn't it be better >> > > to use validate_subreg or (tmp = gen_lowpart_if_possible (mode, >> target_qi)). >> > > IMHO, targetm.can_change_mode_class is mostly used for RA, but not >> > > to guard gen_lowpart. >> > >> > Hmm I don't think this is quite true, there are existing usages in >> > expr.cc and rtanal.cc That do this and aren't part of RA. As I >> > mentioned before for instance the canoncalization of vec_select to subreg >> in rtlanal for instances uses this. >> In theory, we need to iterate through all reg classes that can be assigned for >> both qimode and mode, if any regclass returns true for >> targetm.can_change_mode_class, the bitcast(validate_subreg) should be ok. >> Here we just passed ALL_REGS. > > Yes, and most targets where this transformation is valid return true here. > > I've checked: > * alpha > * arm > * aarch64 > * rs6000 > * s390 > * sparc > * pa > * mips > > And even the default example that other targets use from the documentation > would return true as the size of the modes are the same. > > X86 and RISCV are the only two targets that I found (but didn't check all) that > blankly return a result based on just the register classes. > > That is to say, there are more targets that adhere to the interpretation that > rclass here means "should be possible in some class in rclass" rather than > "should be possible in ALL classes of rclass". Yeah, I agree. A query "can something stored in ALL_REGS change from mode M1 to mode M2?" is meaningful if at least one register R in ALL_REGS can hold both M1 and M2. It's then the target's job to answer conservatively so that the result covers all such R. In principle it's OK for a target to err on the side of caution and forbid things that are actually OK. But that's going to risk losing performance in some cases, and sometimes that loss of performance will be unacceptable. IMO that's what's happening here. The target is applying x87 rules to things that (AIUI) are never stored in x87 registers, and so losing performance as a result. Note that the RA also uses ALL_REGS for some things, so this usage isn't specific to non-RA code. IMO it's not the job of target-independent code to iterate through individual classes and aggregate the result. One of the reasons for having union classes is to avoid the need to do that. And ALL_REGS is the ultimate union class. :-) The patch looks correct to me. Thanks, Richard >> > >> > So there are already existing precedence for this. And the >> > documentation for the hook says: >> > >> > "This hook returns true if it is possible to bitcast values held in registers of >> class rclass from mode from to mode to and if doing so preserves the low- >> order bits that are common to both modes. The result is only meaningful if >> rclass has registers that can hold both from and to. The default >> implementation returns true" >> > >> > So it looks like it's use outside of RA is perfectly valid.. and the >> > documentation also mentions in the example the use from the mid-end as >> an example. >> > >> > But if the mid-end maintainers are happy I'll use something else. >> > >> > Tamar >> > >> > > I did similar things in >> > > https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579296.html >> > > (and ALL_REGS doesn't cover all cases for registers which are both >> > > available for qimode and mode, ALL_REGS fail doesn't mean it can't >> > > be subreg, it just means parts of ALL_REGS can't be subreg. but with >> > > a subset of ALL_REGS, there could be a reg class which return true >> > > for >> > > targetm.can_change_mode_class) >> > > > && targetm.vectorize.vec_perm_const (qimode, qimode, >> > > > target_qi, >> > > v0_qi, >> > > > v1_qi, qimode_indices)) >> > > > return gen_lowpart (mode, target_qi); @@ -6311,7 +6312,8 >> > > > @@ expand_vec_perm_const (machine_mode mode, rtx v0, rtx v1, >> > > > } >> > > > >> > > > if (qimode != VOIDmode >> > > > - && selector_fits_mode_p (qimode, qimode_indices)) >> > > > + && selector_fits_mode_p (qimode, qimode_indices) >> > > > + && targetm.can_change_mode_class (mode, qimode, ALL_REGS)) >> > > > { >> > > > icode = direct_optab_handler (vec_perm_optab, qimode); >> > > > if (icode != CODE_FOR_nothing) diff --git >> > > > a/gcc/testsuite/gcc.target/aarch64/ext_1.c >> > > > b/gcc/testsuite/gcc.target/aarch64/ext_1.c >> > > > new file mode 100644 >> > > > index >> > > > >> > > >> 0000000000000000000000000000000000000000..18a10a14f1161584267a8472e5 >> > > 71 >> > > > b3bc2ddf887a >> > > >> > > >> > > >> > > >> > > -- >> > > BR, >> > > Hongtao >> >> >> >> -- >> BR, >> Hongtao