From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id CCECA3836F03 for ; Tue, 16 May 2023 08:45:43 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CCECA3836F03 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 36DDD2F4; Tue, 16 May 2023 01:46:28 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 049A93F7BD; Tue, 16 May 2023 01:45:42 -0700 (PDT) From: Richard Sandiford To: Tejas Belagod Mail-Followup-To: Tejas Belagod ,"gcc-patches\@gcc.gnu.org" , richard.sandiford@arm.com Cc: "gcc-patches\@gcc.gnu.org" Subject: Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab] References: <20230316113927.4967-1-tejas.belagod@arm.com> Date: Tue, 16 May 2023 09:45:41 +0100 In-Reply-To: (Tejas Belagod's message of "Tue, 16 May 2023 09:03:06 +0100") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-23.2 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Tejas Belagod writes: >> + { >> + int i; >> + int nelts =3D vector_cst_encoded_nelts (v); >> + int first_el =3D 0; >> + >> + for (i =3D first_el; i < nelts; i +=3D step) >> + if (VECTOR_CST_ENCODED_ELT (v, i) !=3D VECTOR_CST_ENCODED_ELT (v, > first_el)) > > I think this should use !operand_equal_p (..., ..., 0). > > > Oops! I wonder why I thought VECTOR_CST_ENCODED_ELT returned a constant! = Thanks > for spotting that. It does only return a constant. But there can be multiple trees with the same constant value, through things like TREE_OVERFLOW (not sure where things stand on expunging that from gimple) and the fact that gimple does not maintain a distinction between different types that have the same mode and signedness. (E.g. on ILP32 hosts, gimple does not maintain a distinction between int and long, even though int 0 and long 0 are different trees.) > Also, should the flags here be OEP_ONLY_CONST ? Nah, just 0 should be fine. >> + return false; >> + >> + return true; >> + } >> + >> + /* Fold a svlast{a/b} call with constant predicate to a BIT_FIELD_REF. >> + BIT_FIELD_REF lowers to a NEON element extract, so we have to make= sure >> + the index of the element being accessed is in the range of a NEON > vector >> + width. */ > > s/NEON/Advanced SIMD/. Same in later comments > >> + gimple *fold (gimple_folder & f) const override >> + { >> + tree pred =3D gimple_call_arg (f.call, 0); >> + tree val =3D gimple_call_arg (f.call, 1); >> + >> + if (TREE_CODE (pred) =3D=3D VECTOR_CST) >> + { >> + HOST_WIDE_INT pos; >> + unsigned int const_vg; >> + int i =3D 0; >> + int step =3D f.type_suffix (0).element_bytes; >> + int step_1 =3D gcd (step, VECTOR_CST_NPATTERNS (pred)); >> + int npats =3D VECTOR_CST_NPATTERNS (pred); >> + unsigned HOST_WIDE_INT nelts =3D vector_cst_encoded_nelts (pred); >> + tree b =3D NULL_TREE; >> + bool const_vl =3D aarch64_sve_vg.is_constant (&const_vg); > > I think this might be left over from previous versions, but: > const_vg isn't used and const_vl is only used once, so I think it > would be better to remove them. > >> + >> + /* We can optimize 2 cases common to variable and fixed-length cas= es >> + without a linear search of the predicate vector: >> + 1. LASTA if predicate is all true, return element 0. >> + 2. LASTA if predicate all false, return element 0. */ >> + if (is_lasta () && vect_all_same (pred, step_1)) >> + { >> + b =3D build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs), val, >> + bitsize_int (step * BITS_PER_UNIT), bitsize_int (0= )); >> + return gimple_build_assign (f.lhs, b); >> + } >> + >> + /* Handle the all-false case for LASTB where SVE VL =3D=3D 128b - >> + return the highest numbered element. */ >> + if (is_lastb () && known_eq (BYTES_PER_SVE_VECTOR, 16) >> + && vect_all_same (pred, step_1) >> + && integer_zerop (VECTOR_CST_ENCODED_ELT (pred, 0))) > > Formatting nit: one condition per line once one line isn't enough. > >> + { >> + b =3D build3 (BIT_FIELD_REF, TREE_TYPE (f.lhs), val, >> + bitsize_int (step * BITS_PER_UNIT), >> + bitsize_int ((16 - step) * BITS_PER_UNIT)); >> + >> + return gimple_build_assign (f.lhs, b); >> + } >> + >> + /* If VECTOR_CST_NELTS_PER_PATTERN (pred) =3D=3D 2 and every multi= ple of >> + 'step_1' in >> + [VECTOR_CST_NPATTERNS .. VECTOR_CST_ENCODED_NELTS - 1] >> + is zero, then we can treat the vector as VECTOR_CST_NPATTERNS >> + elements followed by all inactive elements. */ >> + if (!const_vl && VECTOR_CST_NELTS_PER_PATTERN (pred) =3D=3D 2) > > Following on from the above, maybe use: > > !VECTOR_CST_NELTS (pred).is_constant () > > instead of !const_vl here. > > I have a horrible suspicion that I'm contradicting our earlier discussion > here, sorry, but: I think we have to return null if NELTS_PER_PATTERN != =3D 2. > >=20=20 > > IIUC, the NPATTERNS .. ENCODED_ELTS represent the repeated part of the en= coded > constant. This means the repetition occurs if NELTS_PER_PATTERN =3D=3D 2,= IOW the > base1 repeats in the encoding. This loop is checking this condition and l= ooks > for a 1 in the repeated part of the NELTS_PER_PATTERN =3D=3D 2 in a VL ve= ctor. > Please correct me if I=E2=80=99m misunderstanding here. NELTS_PER_PATTERN =3D=3D 1 is also a repeating pattern: it means that the entire sequence is repeated to fill a vector. So if an NELTS_PER_PATTERN =3D=3D 1 constant has elements {0, 1, 0, 0}, the vector is: {0, 1, 0, 0, 0, 1, 0, 0, ...} and the optimisation can't handle that. NELTS_PER_PATTERN =3D=3D 3 isn't likely to occur for predicates, but in principle it has the same problem. Thanks, Richard