From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-x329.google.com (mail-wm1-x329.google.com [IPv6:2a00:1450:4864:20::329]) by sourceware.org (Postfix) with ESMTPS id 9B5E93858C50 for ; Sun, 23 Apr 2023 06:11:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 9B5E93858C50 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org Received: by mail-wm1-x329.google.com with SMTP id 5b1f17b1804b1-3f19b9d5358so7970835e9.1 for ; Sat, 22 Apr 2023 23:11:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1682230316; x=1684822316; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=JuYHHp9Wt6rX13SHh8M6IRO6+ab0Wp2gp+J2IhUh/X8=; b=BsRIgq3TQxmcsGip3yzuUGQFkNlXpxphvBiruyilotyy59yDaPr6YQMOzH/KI8IO97 rLOArTFc4DgU1QiDcVrYSw0tbdj3zam7UMMDAK75mUKnz9dXF+c3t0WrCDzhn79FxXRk 3c88le+SPdl8n8zYyvez216S32YMPaU7c+X2El3Hd5bokG/kxpkzxrkb/+aSkcoBbFRk AOFtKgxMly1E1iuKjNld09phrAUAgIDJu1mAfXHdm2nMycpvePioGzocpaMMOwXxLCly N3q/ql8V4aIais+aqnA7/vOem5L1DlitqslMaODkg7BNvyJvHGqv1Hxeu4GrqHC5hQVQ LKWQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682230316; x=1684822316; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=JuYHHp9Wt6rX13SHh8M6IRO6+ab0Wp2gp+J2IhUh/X8=; b=WSoGhgZvUlILaBqfUUdgou/hJvwe46bwWEl88bN7VWUDSBPfU1EeKy3npdJxhHCT0f SbYiVtIgQT6tljwR3FrcowE+bevGP3K4iKgor6BsOEA1SlKLHw0HkMJbrTf14psoC7uA Sd/aFA9AWC9dDYUCSsYcdoEEs5hM41vuKOSB0ZX4jDxeCGsTz13OeqxPh6xWw/ByKv/W G44rUP/A/Nb7zXPzvPs3o7ybAj/8DFsbIv+imyriCVP6T0buYeUbsLPZdXs8YPtClaRc W40jlaFbfo95YmeR4nbsxBcp+7NhpxsQRrQCLCMmd+t20lgiMXtqf0n1qKJjkvqIAmpX CoMg== X-Gm-Message-State: AAQBX9cIG4SoR/PWkq/Nyp1/hVRVogvxgPyK0tjVgeDwzANUrEfYMCg3 34/V2DRQ9DpWahZ3ULwJ3Jw8oq0EYE3JgyU8fMmiQg== X-Google-Smtp-Source: AKy350aJtISoVKUaqIn720VHIGSWp+Toq2YGNud8wRo78kManFLka0sdSuR5kRcC2AJx+A8FV0if0DU8uk8qgFB5cAk= X-Received: by 2002:a05:600c:d6:b0:3f1:75b0:dc47 with SMTP id u22-20020a05600c00d600b003f175b0dc47mr5507261wmm.15.1682230316230; Sat, 22 Apr 2023 23:11:56 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Prathamesh Kulkarni Date: Sun, 23 Apr 2023 11:41:19 +0530 Message-ID: Subject: Re: [match.pd] [SVE] Add pattern to transform svrev(svrev(v)) --> v To: Prathamesh Kulkarni , Richard Biener , gcc Patches , richard.sandiford@arm.com Content-Type: multipart/mixed; boundary="0000000000006e4e8c05f9fac41e" X-Spam-Status: No, score=-9.7 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: --0000000000006e4e8c05f9fac41e Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, 21 Apr 2023 at 21:57, Richard Sandiford wrote: > > Prathamesh Kulkarni writes: > > On Wed, 19 Apr 2023 at 16:17, Richard Biener wrote: > >> > >> On Wed, Apr 19, 2023 at 11:21=E2=80=AFAM Prathamesh Kulkarni > >> wrote: > >> > > >> > On Tue, 11 Apr 2023 at 19:36, Prathamesh Kulkarni > >> > wrote: > >> > > > >> > > On Tue, 11 Apr 2023 at 14:17, Richard Biener wrote: > >> > > > > >> > > > On Wed, Apr 5, 2023 at 10:39=E2=80=AFAM Prathamesh Kulkarni via = Gcc-patches > >> > > > wrote: > >> > > > > > >> > > > > Hi, > >> > > > > For the following test: > >> > > > > > >> > > > > svint32_t f(svint32_t v) > >> > > > > { > >> > > > > return svrev_s32 (svrev_s32 (v)); > >> > > > > } > >> > > > > > >> > > > > We generate 2 rev instructions instead of nop: > >> > > > > f: > >> > > > > rev z0.s, z0.s > >> > > > > rev z0.s, z0.s > >> > > > > ret > >> > > > > > >> > > > > The attached patch tries to fix that by trying to recognize th= e following > >> > > > > pattern in match.pd: > >> > > > > v1 =3D VEC_PERM_EXPR (v0, v0, mask) > >> > > > > v2 =3D VEC_PERM_EXPR (v1, v1, mask) > >> > > > > --> > >> > > > > v2 =3D v0 > >> > > > > if mask is { nelts - 1, nelts - 2, nelts - 3, ... } > >> > > > > > >> > > > > Code-gen with patch: > >> > > > > f: > >> > > > > ret > >> > > > > > >> > > > > Bootstrap+test passes on aarch64-linux-gnu, and SVE bootstrap = in progress. > >> > > > > Does it look OK for stage-1 ? > >> > > > > >> > > > I didn't look at the patch but tree-ssa-forwprop.cc:simplify_per= mutation should > >> > > > handle two consecutive permutes with the is_combined_permutation= _identity > >> > > > which might need tweaking for VLA vectors > >> > > Hi Richard, > >> > > Thanks for the suggestions. The attached patch modifies > >> > > is_combined_permutation_identity > >> > > to recognize the above pattern. > >> > > Does it look OK ? > >> > > Bootstrap+test in progress on aarch64-linux-gnu and x86_64-linux-g= nu. > >> > Hi, > >> > ping https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615502.htm= l > >> > >> Can you instead of def_stmt pass in a bool whether rhs1 is equal to rh= s2 > >> and amend the function comment accordingly, say, > >> > >> tem =3D VEC_PERM ; > >> res =3D VEC_PERM ; > >> > >> SAME_P specifies whether op0 and op1 compare equal. */ > >> > >> + if (def_stmt) > >> + gcc_checking_assert (is_gimple_assign (def_stmt) > >> + && gimple_assign_rhs_code (def_stmt) =3D=3D V= EC_PERM_EXPR); > >> this is then unnecessary > >> > >> mask =3D fold_ternary (VEC_PERM_EXPR, TREE_TYPE (mask1), mask1, mas= k1, mask2); > >> + > >> + /* For VLA masks, check for the following pattern: > >> + v1 =3D VEC_PERM_EXPR (v0, v0, mask) > >> + v2 =3D VEC_PERM_EXPR (v1, v1, mask) > >> + --> > >> + v2 =3D v0 > >> > >> you are not using 'mask' so please defer fold_ternary until after your > >> special-case. > >> > >> + if (operand_equal_p (mask1, mask2, 0) > >> + && !VECTOR_CST_NELTS (mask1).is_constant () > >> + && def_stmt > >> + && operand_equal_p (gimple_assign_rhs1 (def_stmt), > >> + gimple_assign_rhs2 (def_stmt), 0)) > >> + { > >> + vec_perm_builder builder; > >> + if (tree_to_vec_perm_builder (&builder, mask1)) > >> + { > >> + poly_uint64 nelts =3D TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask1= )); > >> + vec_perm_indices sel (builder, 1, nelts); > >> + if (sel.series_p (0, 1, nelts - 1, -1)) > >> + return 1; > >> + } > >> + return 0; > >> > >> I'm defering to Richard whether this is the correct way to check for a= vector > >> reversing mask (I wonder how constructing such mask is even possible) > > Hi Richard, > > Thanks for the suggestions, I have updated the patch accordingly. > > > > The following hunk from svrev_impl::fold() constructs mask in reverse: > > /* Permute as { nelts - 1, nelts - 2, nelts - 3, ... }. */ > > poly_int64 nelts =3D TYPE_VECTOR_SUBPARTS (TREE_TYPE (f.lhs)); > > vec_perm_builder builder (nelts, 1, 3); > > for (int i =3D 0; i < 3; ++i) > > builder.quick_push (nelts - i - 1); > > return fold_permute (f, builder); > > > > To see if mask chooses elements in reverse, I borrowed it from function= comment > > for series_p in vec-perm-indices.cc: > > /* Return true if index OUT_BASE + I * OUT_STEP selects input > > element IN_BASE + I * IN_STEP. For example, the call to test > > whether a permute reverses a vector of N elements would be: > > > > series_p (0, 1, N - 1, -1) > > > > which would return true for { N - 1, N - 2, N - 3, ... }. */ > > > > Thanks, > > Prathamesh > >> > >> Richard. > >> > >> > Thanks, > >> > Prathamesh > >> > > > >> > > Thanks, > >> > > Prathamesh > >> > > > > >> > > > Richard. > >> > > > > >> > > > > > >> > > > > Thanks, > >> > > > > Prathamesh > > > > gcc/ChangeLog: > > * tree-ssa-forwprop.cc (is_combined_permutation_identity): > > New parameter same_p. > > Try to simplify two successive VEC_PERM_EXPRs with single operand > > and same mask, where mask chooses elements in reverse order. > > > > gcc/testesuite/ChangeLog: > > * gcc.target/aarch64/sve/acle/general/rev-1.c: New test. > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/rev-1.c = b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/rev-1.c > > new file mode 100644 > > index 00000000000..e57ee67d716 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/rev-1.c > > @@ -0,0 +1,12 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O3 -fdump-tree-optimized" } */ > > + > > +#include > > + > > +svint32_t f(svint32_t v) > > +{ > > + return svrev_s32 (svrev_s32 (v)); > > +} > > + > > +/* { dg-final { scan-tree-dump "return v_1\\(D\\)" "optimized" } } */ > > +/* { dg-final { scan-tree-dump-not "VEC_PERM_EXPR" "optimized" } } */ > > diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc > > index 9b567440ba4..ebd4a368ae9 100644 > > --- a/gcc/tree-ssa-forwprop.cc > > +++ b/gcc/tree-ssa-forwprop.cc > > @@ -2528,11 +2528,16 @@ simplify_bitfield_ref (gimple_stmt_iterator *gs= i) > > return true; > > } > > > > -/* Determine whether applying the 2 permutations (mask1 then mask2) > > - gives back one of the input. */ > > +/* For the following sequence: > > + tem =3D VEC_PERM_EXPR > > + res =3D VEC_PERM_EXPR > > + > > + Determine whether applying the 2 permutations (mask1 then mask2) > > + gives back one of the input. SAME_P specifies whether op0 > > + and op1 compare equal. */ > > > > static int > > -is_combined_permutation_identity (tree mask1, tree mask2) > > +is_combined_permutation_identity (tree mask1, tree mask2, bool same_p) > > { > > tree mask; > > unsigned HOST_WIDE_INT nelts, i, j; > > @@ -2541,6 +2546,29 @@ is_combined_permutation_identity (tree mask1, tr= ee mask2) > > > > gcc_checking_assert (TREE_CODE (mask1) =3D=3D VECTOR_CST > > && TREE_CODE (mask2) =3D=3D VECTOR_CST); > > + > > + /* For VLA masks, check for the following pattern: > > + v1 =3D VEC_PERM_EXPR (v0, v0, mask1) > > + v2 =3D VEC_PERM_EXPR (v1, v1, mask2) > > + --> > > + v2 =3D v0 > > + if mask1 =3D=3D mask2 =3D=3D {nelts - 1, nelts - 2, ...}. */ > > + > > + if (operand_equal_p (mask1, mask2, 0) > > + && !VECTOR_CST_NELTS (mask1).is_constant () > > + && same_p) > > The same_p isn't needed, since a reverse mask only picks from the > first operand. Canonicalisation rules should ensure that op0 and op1 > end up being equal in that case, but this fold doesn't depend on that. > > > + { > > + vec_perm_builder builder; > > + if (tree_to_vec_perm_builder (&builder, mask1)) > > + { > > + poly_uint64 nelts =3D TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask1)); > > + vec_perm_indices sel (builder, 1, nelts); > > + if (sel.series_p (0, 1, nelts - 1, -1)) > > + return 1; > > + } > > I have conflicting feelings about this. > > (1) It seems quite special-purpose. I'm a bit worried that we'll end > up with a list of highly specific folds (which could grow quite long) > rather than something more general. > > (2) Constructing a vec_perm_indices seems quite heavyweight for this > single check. We could go directly from the mask instead. > > But =E2=80=9Cfixing=E2=80=9D (2) reduces the generality of the structure = and so leans > more heavily into (1). > > I agree the check is correct though, so let's go with the patch > in this form (without the same_p thing). However: > > > + return 0; > > + } > > + > > I don't think we want this early exit. It won't be used for all VLA case= s > due the operand_equal_p check. And in future, more VLA-compatible stuff > might be added further down. Hi Richard, Thanks for the suggestions, does the attached patch look OK ? Bootstrapped+tested on aarch64-linux-gnu. Thanks, Prathamesh > > Thanks, > Richard > > > mask =3D fold_ternary (VEC_PERM_EXPR, TREE_TYPE (mask1), mask1, mask= 1, mask2); > > if (mask =3D=3D NULL_TREE || TREE_CODE (mask) !=3D VECTOR_CST) > > return 0; > > @@ -2629,7 +2657,9 @@ simplify_permutation (gimple_stmt_iterator *gsi) > > op3 =3D gimple_assign_rhs3 (def_stmt); > > if (TREE_CODE (op3) !=3D VECTOR_CST) > > return 0; > > - ident =3D is_combined_permutation_identity (op3, op2); > > + bool same_p =3D operand_equal_p (gimple_assign_rhs1 (def_stmt), > > + gimple_assign_rhs2 (def_stmt), 0); > > + ident =3D is_combined_permutation_identity (op3, op2, same_p); > > if (!ident) > > return 0; > > orig =3D (ident =3D=3D 1) ? gimple_assign_rhs1 (def_stmt) --0000000000006e4e8c05f9fac41e Content-Type: text/plain; charset="US-ASCII"; name="gnu-830-5.txt" Content-Disposition: attachment; filename="gnu-830-5.txt" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_lgt0eam40 Z2NjL0NoYW5nZUxvZzoKCSogdHJlZS1zc2EtZm9yd3Byb3AuY2MgKGlzX2NvbWJpbmVkX3Blcm11 dGF0aW9uX2lkZW50aXR5KTogVHJ5IHRvCglzaW1wbGlmeSB0d28gc3VjY2Vzc2l2ZSBWRUNfUEVS TV9FWFBScyB3aXRoIHNpbmdsZSBvcGVyYW5kIGFuZCBzYW1lCgltYXNrLCB3aGVyZSBtYXNrIGNo b29zZXMgZWxlbWVudHMgaW4gcmV2ZXJzZSBvcmRlci4KCmdjYy90ZXN0ZXN1aXRlL0NoYW5nZUxv ZzoKCSogZ2NjLnRhcmdldC9hYXJjaDY0L3N2ZS9hY2xlL2dlbmVyYWwvcmV2LTEuYzogTmV3IHRl c3QuCgpkaWZmIC0tZ2l0IGEvZ2NjL3Rlc3RzdWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQvc3ZlL2Fj bGUvZ2VuZXJhbC9yZXYtMS5jIGIvZ2NjL3Rlc3RzdWl0ZS9nY2MudGFyZ2V0L2FhcmNoNjQvc3Zl L2FjbGUvZ2VuZXJhbC9yZXYtMS5jCm5ldyBmaWxlIG1vZGUgMTAwNjQ0CmluZGV4IDAwMDAwMDAw MDAwLi5lNTdlZTY3ZDcxNgotLS0gL2Rldi9udWxsCisrKyBiL2djYy90ZXN0c3VpdGUvZ2NjLnRh cmdldC9hYXJjaDY0L3N2ZS9hY2xlL2dlbmVyYWwvcmV2LTEuYwpAQCAtMCwwICsxLDEyIEBACisv KiB7IGRnLWRvIGNvbXBpbGUgfSAqLworLyogeyBkZy1vcHRpb25zICItTzMgLWZkdW1wLXRyZWUt b3B0aW1pemVkIiB9ICovCisKKyNpbmNsdWRlIDxhcm1fc3ZlLmg+CisKK3N2aW50MzJfdCBmKHN2 aW50MzJfdCB2KQoreworICByZXR1cm4gc3ZyZXZfczMyIChzdnJldl9zMzIgKHYpKTsKK30KKwor LyogeyBkZy1maW5hbCB7IHNjYW4tdHJlZS1kdW1wICJyZXR1cm4gdl8xXFwoRFxcKSIgIm9wdGlt aXplZCIgfSB9ICovCisvKiB7IGRnLWZpbmFsIHsgc2Nhbi10cmVlLWR1bXAtbm90ICJWRUNfUEVS TV9FWFBSIiAib3B0aW1pemVkIiB9IH0gKi8KZGlmZiAtLWdpdCBhL2djYy90cmVlLXNzYS1mb3J3 cHJvcC5jYyBiL2djYy90cmVlLXNzYS1mb3J3cHJvcC5jYwppbmRleCA5YjU2NzQ0MGJhNC4uNjFk ZjdlZmU4MmMgMTAwNjQ0Ci0tLSBhL2djYy90cmVlLXNzYS1mb3J3cHJvcC5jYworKysgYi9nY2Mv dHJlZS1zc2EtZm9yd3Byb3AuY2MKQEAgLTI1NDEsNiArMjU0MSwyNyBAQCBpc19jb21iaW5lZF9w ZXJtdXRhdGlvbl9pZGVudGl0eSAodHJlZSBtYXNrMSwgdHJlZSBtYXNrMikKIAogICBnY2NfY2hl Y2tpbmdfYXNzZXJ0IChUUkVFX0NPREUgKG1hc2sxKSA9PSBWRUNUT1JfQ1NUCiAJCSAgICAgICAm JiBUUkVFX0NPREUgKG1hc2syKSA9PSBWRUNUT1JfQ1NUKTsKKworICAvKiBGb3IgVkxBIG1hc2tz LCBjaGVjayBmb3IgdGhlIGZvbGxvd2luZyBwYXR0ZXJuOgorICAgICB2MSA9IFZFQ19QRVJNX0VY UFIgKHYwLCB2MCwgbWFzazEpCisgICAgIHYyID0gVkVDX1BFUk1fRVhQUiAodjEsIHYxLCBtYXNr MikKKyAgICAgLS0+CisgICAgIHYyID0gdjAKKyAgICAgaWYgbWFzazEgPT0gbWFzazIgPT0ge25l bHRzIC0gMSwgbmVsdHMgLSAyLCAuLi59LiAgKi8KKworICBpZiAob3BlcmFuZF9lcXVhbF9wICht YXNrMSwgbWFzazIsIDApCisgICAgICAmJiAhVkVDVE9SX0NTVF9ORUxUUyAobWFzazEpLmlzX2Nv bnN0YW50ICgpKQorICAgIHsKKyAgICAgIHZlY19wZXJtX2J1aWxkZXIgYnVpbGRlcjsKKyAgICAg IGlmICh0cmVlX3RvX3ZlY19wZXJtX2J1aWxkZXIgKCZidWlsZGVyLCBtYXNrMSkpCisJeworCSAg cG9seV91aW50NjQgbmVsdHMgPSBUWVBFX1ZFQ1RPUl9TVUJQQVJUUyAoVFJFRV9UWVBFIChtYXNr MSkpOworCSAgdmVjX3Blcm1faW5kaWNlcyBzZWwgKGJ1aWxkZXIsIDEsIG5lbHRzKTsKKwkgIGlm IChzZWwuc2VyaWVzX3AgKDAsIDEsIG5lbHRzIC0gMSwgLTEpKQorCSAgICByZXR1cm4gMTsKKwl9 CisgICAgfQorCiAgIG1hc2sgPSBmb2xkX3Rlcm5hcnkgKFZFQ19QRVJNX0VYUFIsIFRSRUVfVFlQ RSAobWFzazEpLCBtYXNrMSwgbWFzazEsIG1hc2syKTsKICAgaWYgKG1hc2sgPT0gTlVMTF9UUkVF IHx8IFRSRUVfQ09ERSAobWFzaykgIT0gVkVDVE9SX0NTVCkKICAgICByZXR1cm4gMDsK --0000000000006e4e8c05f9fac41e--