From mboxrd@z Thu Jan 1 00:00:00 1970
From: Richard Sandiford
To: Prathamesh Kulkarni
Cc: gcc Patches
Subject: Re: [1/2] PR96463 - aarch64 specific changes
Date: Fri, 17 Dec 2021 11:33:34 +0000
List-Id: Gcc-patches mailing list

Prathamesh Kulkarni writes:
> Hi,
> The patch folds:
> lhs = svld1rq ({-1, -1, -1, ...}, &v[0])
> into:
> lhs = vec_perm_expr
> and expands above vec_perm_expr using aarch64_expand_sve_dupq.
>
> With patch, for following test:
> #include <arm_neon.h>
> #include <arm_sve.h>
>
> svint32_t
> foo (int32x4_t x)
> {
>   return svld1rq (svptrue_b8 (), &x[0]);
> }
>
> it generates following code:
> foo:
> .LFB4350:
>         dup     z0.q, z0.q[0]
>         ret
>
> and passes bootstrap+test on aarch64-linux-gnu.
> But I am not sure if the changes to aarch64_evpc_sve_tbl
> are correct.

Just in case: I was only using int32x4_t in the PR as an example.
The same thing should work for all element types.

> Thanks,
> Prathamesh
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 02e42a71e5e..e21bbec360c 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -1207,6 +1207,56 @@ public:
>      insn_code icode = code_for_aarch64_sve_ld1rq (e.vector_mode (0));
>      return e.use_contiguous_load_insn (icode);
>    }
> +
> +  gimple *
> +  fold (gimple_folder &f) const OVERRIDE
> +  {
> +    tree arg0 = gimple_call_arg (f.call, 0);
> +    tree arg1 = gimple_call_arg (f.call, 1);
> +
> +    /* Transform:
> +       lhs = svld1rq ({-1, -1, ... }, &v[0])
> +       into:
> +       lhs = vec_perm_expr.
> +       on little endian target.  */
> +
> +    if (!BYTES_BIG_ENDIAN
> +        && integer_all_onesp (arg0)
> +        && TREE_CODE (arg1) == ADDR_EXPR)
> +      {
> +        tree t = TREE_OPERAND (arg1, 0);
> +        if (TREE_CODE (t) == ARRAY_REF)
> +          {
> +            tree index = TREE_OPERAND (t, 1);
> +            t = TREE_OPERAND (t, 0);
> +            if (integer_zerop (index) && TREE_CODE (t) == VIEW_CONVERT_EXPR)
> +              {
> +                t = TREE_OPERAND (t, 0);
> +                tree vectype = TREE_TYPE (t);
> +                if (VECTOR_TYPE_P (vectype)
> +                    && known_eq (TYPE_VECTOR_SUBPARTS (vectype), 4u)
> +                    && wi::to_wide (TYPE_SIZE (vectype)) == 128)
> +                  {

Since this is quite a specific pattern match, and since we now lower
arm_neon.h vld1* to normal gimple accesses, I think we should try the
“more generally” approach mentioned in the PR and see what the
fallout is.
That is, keep:

  if (!BYTES_BIG_ENDIAN
      && integer_all_onesp (arg0)

If those conditions pass, create an Advanced SIMD access at address
arg1, using similar code to the handling of:

  BUILTIN_VALL_F16 (LOAD1, ld1, 0, LOAD)
  BUILTIN_VDQ_I (LOAD1_U, ld1, 0, LOAD)
  BUILTIN_VALLP_NO_DI (LOAD1_P, ld1, 0, LOAD)

in aarch64_general_gimple_fold_builtin.  (Would be good to move the
common code to aarch64.c so that both files can use it.)

> +                    tree lhs = gimple_call_lhs (f.call);
> +                    tree lhs_type = TREE_TYPE (lhs);
> +                    int source_nelts = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> +                    vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (lhs_type), source_nelts, 1);
> +                    for (int i = 0; i < source_nelts; i++)
> +                      sel.quick_push (i);
> +
> +                    vec_perm_indices indices (sel, 1, source_nelts);
> +                    if (!can_vec_perm_const_p (TYPE_MODE (lhs_type), indices))
> +                      return NULL;

I don't think we need to check this: it should always be true.
Probably worth keeping as a gcc_checking_assert though.

> +
> +                    tree mask = vec_perm_indices_to_tree (lhs_type, indices);
> +                    return gimple_build_assign (lhs, VEC_PERM_EXPR, t, t, mask);
> +                  }
> +              }
> +          }
> +      }
> +
> +    return NULL;
> +  }
> };
>
> class svld1ro_impl : public load_replicate
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index f07330cff4f..af27f550be3 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -23002,8 +23002,32 @@ aarch64_evpc_sve_tbl (struct expand_vec_perm_d *d)
>
>    machine_mode sel_mode = related_int_vector_mode (d->vmode).require ();
>    rtx sel = vec_perm_indices_to_rtx (sel_mode, d->perm);
> +
>    if (d->one_vector_p)
> -    emit_unspec2 (d->target, UNSPEC_TBL, d->op0, force_reg (sel_mode, sel));
> +    {
> +      bool use_dupq = false;
> +      /* Check if sel is dup vector with encoded elements {0, 1, 2, ... nelts} */
> +      if (GET_CODE (sel) == CONST_VECTOR
> +          && !GET_MODE_NUNITS (GET_MODE (sel)).is_constant ()
> +          && CONST_VECTOR_DUPLICATE_P (sel))
> +        {
> +          unsigned nelts = const_vector_encoded_nelts (sel);
> +          unsigned i;
> +          for (i = 0; i < nelts; i++)
> +            {
> +              rtx elem = CONST_VECTOR_ENCODED_ELT (sel, i);
> +              if (!(CONST_INT_P (elem) && INTVAL (elem) == i))
> +                break;
> +            }
> +          if (i == nelts)
> +            use_dupq = true;
> +        }
> +
> +      if (use_dupq)
> +        aarch64_expand_sve_dupq (d->target, GET_MODE (d->target), d->op0);
> +      else
> +        emit_unspec2 (d->target, UNSPEC_TBL, d->op0, force_reg (sel_mode, sel));
> +    }

This shouldn't be a TBL but a new operation, handled by its own
aarch64_evpc_sve_* routine.  The check for the mask should then be
done on d->perm, to detect whether the permutation is one that the
new routine supports.

I think the requirements are:

- !BYTES_BIG_ENDIAN
- the source must be an Advanced SIMD vector
- the destination must be an SVE vector
- the permutation must be a duplicate (tested in the code above)
- the number of “patterns” in the permutation must equal the number
  of source elements
- element X of the permutation must equal X (tested in the code above)

The existing aarch64_evpc_* routines expect the source and target modes
to be the same, so we should only call them when that's true.

Thanks,
Richard