From: Richard Sandiford
To: Richard Biener
Cc: Liwei Xu, gcc-patches@gcc.gnu.org, wilson@tuliptree.org, admin@levyhsu.com
Subject: Re: [PATCH] Optimize nested permutation to single VEC_PERM_EXPR [PR54346]
References: <20220926065604.783193-1-liwei.xu@intel.com>
Date: Fri, 30 Sep 2022 10:17:10 +0100
In-Reply-To: (Richard Biener's message of "Mon, 26 Sep 2022 10:21:07 +0200")

Richard Biener writes:
> On Mon, Sep 26, 2022 at 8:58 AM Liwei Xu wrote:
>>
>> This patch implemented the optimization in PR 54346, which Merges
>>
>> c = VEC_PERM_EXPR ;
>> d = VEC_PERM_EXPR ;
>> to
>> d = VEC_PERM_EXPR ;
>>
>> Bootstrapped and regtested on
>> x86_64-pc-linux-gnu{-m32,}
>> tree-ssa/forwprop-19.c fail to pass but I'm not sure whether it
>> is ok to removed it.
>
> Looks good, but leave Richard a chance to ask for VLA vector support which
> might be trivial to do.

Sorry for the slow reply.  It might be tricky to handle the general case,
so I'd suggest going with this for now and dealing with VLA as a follow-up.
(Probably after Prathamesh's changes to fold_vec_perm_expr.)

Thanks,
Richard

> Btw, doesn't this handle the VEC_PERM + VEC_PERM case in
> tree-ssa-forwprop.cc:simplify_permutation as well?  Note _that_ does
> seem to handle VLA vectors.
>
> Thanks,
> Richard.
>
>> gcc/ChangeLog:
>>
>>         PR target/54346
>>         * match.pd: Merge the index of VCST then generates the new vec_perm.
>>
>> gcc/testsuite/ChangeLog:
>>
>>         PR target/54346
>>         * gcc.dg/pr54346.c: New test.
>>
>> Co-authored-by: liuhongt
>> ---
>>  gcc/match.pd                   | 41 ++++++++++++++++++++++++++++++++++
>>  gcc/testsuite/gcc.dg/pr54346.c | 13 +++++++++++
>>  2 files changed, 54 insertions(+)
>>  create mode 100755 gcc/testsuite/gcc.dg/pr54346.c
>>
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 345bcb701a5..9219b0a10e1 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -8086,6 +8086,47 @@ and,
>>      (minus (mult (vec_perm @1 @1 @3) @2) @4)))
>>
>>
>> +/* (PR54346) Merge
>> +   c = VEC_PERM_EXPR ;
>> +   d = VEC_PERM_EXPR ;
>> +   to
>> +   d = VEC_PERM_EXPR ; */
>> +
>> +(simplify
>> + (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
>> + (with
>> +  {
>> +    if(!TYPE_VECTOR_SUBPARTS (type).is_constant())
>> +      return NULL_TREE;
>> +
>> +    tree op0;
>> +    machine_mode result_mode = TYPE_MODE (type);
>> +    machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
>> +    int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant();
>> +    vec_perm_builder builder0;
>> +    vec_perm_builder builder1;
>> +    vec_perm_builder builder2 (nelts, nelts, 1);
>> +
>> +    if (!tree_to_vec_perm_builder (&builder0, @3)
>> +        || !tree_to_vec_perm_builder (&builder1, @4))
>> +      return NULL_TREE;
>> +
>> +    vec_perm_indices sel0 (builder0, 2, nelts);
>> +    vec_perm_indices sel1 (builder1, 1, nelts);
>> +
>> +    for (int i = 0; i < nelts; i++)
>> +      builder2.quick_push (sel0[sel1[i].to_constant()]);
>> +
>> +    vec_perm_indices sel2 (builder2, 2, nelts);
>> +
>> +    if (!can_vec_perm_const_p (result_mode, op_mode, sel2, false))
>> +      return NULL_TREE;
>> +
>> +    op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
>> +  }
>> +  (vec_perm @1 @2 { op0; })))
>> +
>> +
>>  /* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
>>     The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
>>     constant which when multiplied by a power of 2 contains a unique value
>> diff --git a/gcc/testsuite/gcc.dg/pr54346.c b/gcc/testsuite/gcc.dg/pr54346.c
>> new file mode 100755
>> index 00000000000..d87dc3a79a5
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr54346.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O -fdump-tree-dse1" } */
>> +
>> +typedef int veci __attribute__ ((vector_size (4 * sizeof (int))));
>> +
>> +void fun (veci a, veci b, veci *i)
>> +{
>> +  veci c = __builtin_shuffle (a, b, __extension__ (veci) {1, 4, 2, 7});
>> +  *i = __builtin_shuffle (c, __extension__ (veci) { 7, 2, 1, 5 });
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR.*{ 3, 6, 0, 0 }" "dse1" } } */
>> +/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 1 "dse1" } } */
>> \ No newline at end of file
>> --
>> 2.18.2
>>
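
For anyone following the index arithmetic, the merge done by the
sel0[sel1[i]] loop in the quoted pattern can be sketched outside GCC
roughly as below.  This is only an illustration under stated assumptions:
plain unsigned integers stand in for vec_perm_indices, and compose_perm
and swap_inputs are hypothetical helper names, not GCC functions.  The
outer selector reads a single input, so its indices wrap modulo nelts;
the inner selector reads two concatenated inputs, so its indices wrap
modulo 2 * nelts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of GCC): merge an inner two-input
// selector with an outer selector that reads only the inner result.
std::vector<unsigned> compose_perm(const std::vector<unsigned> &inner,
                                   const std::vector<unsigned> &outer)
{
  assert(inner.size() == outer.size() && !inner.empty());
  const std::size_t nelts = inner.size();
  std::vector<unsigned> merged;
  merged.reserve(nelts);
  for (std::size_t i = 0; i < nelts; ++i)
    {
      // Outer index wraps within the single nelts-element input.
      const std::size_t from_inner = outer[i] % nelts;
      // Inner index wraps within the two concatenated inputs.
      merged.push_back(inner[from_inner] % (2 * nelts));
    }
  return merged;
}

// Hypothetical helper (not part of GCC): rewrite a two-input selector
// for the case where the two operands are exchanged.  An index that
// pointed into the first input now points into the second, and vice
// versa.
std::vector<unsigned> swap_inputs(const std::vector<unsigned> &sel)
{
  const std::size_t nelts = sel.size();
  std::vector<unsigned> swapped;
  swapped.reserve(nelts);
  for (unsigned idx : sel)
    swapped.push_back((idx + nelts) % (2 * nelts));
  return swapped;
}
```

With the selectors from the quoted test, compose_perm ({1, 4, 2, 7},
{7, 2, 1, 5}) gives {7, 2, 4, 4} relative to operands (a, b), and
swap_inputs of that gives {3, 6, 0, 0} relative to (b, a).  My reading,
which I have not checked against the dump, is that the { 3, 6, 0, 0 }
the test scans for comes from the operands being swapped during later
canonicalization rather than from the merge itself.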