On Mon, 27 Dec 2021 at 15:54, Prathamesh Kulkarni wrote:
>
> On Fri, 17 Dec 2021 at 17:03, Richard Sandiford
> wrote:
> >
> > Prathamesh Kulkarni writes:
> > > Hi,
> > > The patch folds:
> > > lhs = svld1rq ({-1, -1, -1, ...}, &v[0])
> > > into:
> > > lhs = vec_perm_expr<v, v, {0, 1, 2, 3}>
> > > and expands the above vec_perm_expr using aarch64_expand_sve_dupq.
> > >
> > > With the patch, for the following test:
> > > #include <arm_neon.h>
> > > #include <arm_sve.h>
> > >
> > > svint32_t
> > > foo (int32x4_t x)
> > > {
> > >   return svld1rq (svptrue_b8 (), &x[0]);
> > > }
> > >
> > > it generates the following code:
> > > foo:
> > > .LFB4350:
> > >         dup     z0.q, z0.q[0]
> > >         ret
> > >
> > > and passes bootstrap+test on aarch64-linux-gnu.
> > > But I am not sure if the changes to aarch64_evpc_sve_tbl
> > > are correct.
> >
> > Just in case: I was only using int32x4_t in the PR as an example.
> > The same thing should work for all element types.
> >
> > > Thanks,
> > > Prathamesh
> > >
> > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > > index 02e42a71e5e..e21bbec360c 100644
> > > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > > @@ -1207,6 +1207,56 @@ public:
> > >      insn_code icode = code_for_aarch64_sve_ld1rq (e.vector_mode (0));
> > >      return e.use_contiguous_load_insn (icode);
> > >    }
> > > +
> > > +  gimple *
> > > +  fold (gimple_folder &f) const OVERRIDE
> > > +  {
> > > +    tree arg0 = gimple_call_arg (f.call, 0);
> > > +    tree arg1 = gimple_call_arg (f.call, 1);
> > > +
> > > +    /* Transform:
> > > +       lhs = svld1rq ({-1, -1, ... }, &v[0])
> > > +       into:
> > > +       lhs = vec_perm_expr<v, v, {0, 1, 2, 3}>
> > > +       on little-endian targets.  */
> > > +
> > > +    if (!BYTES_BIG_ENDIAN
> > > +        && integer_all_onesp (arg0)
> > > +        && TREE_CODE (arg1) == ADDR_EXPR)
> > > +      {
> > > +        tree t = TREE_OPERAND (arg1, 0);
> > > +        if (TREE_CODE (t) == ARRAY_REF)
> > > +          {
> > > +            tree index = TREE_OPERAND (t, 1);
> > > +            t = TREE_OPERAND (t, 0);
> > > +            if (integer_zerop (index) && TREE_CODE (t) == VIEW_CONVERT_EXPR)
> > > +              {
> > > +                t = TREE_OPERAND (t, 0);
> > > +                tree vectype = TREE_TYPE (t);
> > > +                if (VECTOR_TYPE_P (vectype)
> > > +                    && known_eq (TYPE_VECTOR_SUBPARTS (vectype), 4u)
> > > +                    && wi::to_wide (TYPE_SIZE (vectype)) == 128)
> > > +                  {
> >
> > Since this is quite a specific pattern match, and since we now lower
> > arm_neon.h vld1* to normal gimple accesses, I think we should try the
> > “more generally” approach mentioned in the PR and see what the fallout
> > is.  That is, keep:
> >
> >   if (!BYTES_BIG_ENDIAN
> >       && integer_all_onesp (arg0)
> >
> > If those conditions pass, create an Advanced SIMD access at address arg1,
> > using similar code to the handling of:
> >
> >   BUILTIN_VALL_F16 (LOAD1, ld1, 0, LOAD)
> >   BUILTIN_VDQ_I (LOAD1_U, ld1, 0, LOAD)
> >   BUILTIN_VALLP_NO_DI (LOAD1_P, ld1, 0, LOAD)
> >
> > in aarch64_general_gimple_fold_builtin.  (Would be good to move the
> > common code to aarch64.c so that both files can use it.)
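For concreteness, here is a minimal sketch of the fold being suggested:
drop the ADDR_EXPR pattern match, load one 128-bit Advanced SIMD vector
directly from arg1, and feed it to the VEC_PERM_EXPR.  It is an
illustration only, not the committed patch; the use of aarch64_vq_mode
to pick the Advanced SIMD mode and the element-aligned, may-alias
access type are assumptions about how such a fold could be written.

  gimple *
  fold (gimple_folder &f) const OVERRIDE
  {
    tree arg0 = gimple_call_arg (f.call, 0);
    tree arg1 = gimple_call_arg (f.call, 1);

    if (BYTES_BIG_ENDIAN || !integer_all_onesp (arg0))
      return NULL;

    tree lhs = gimple_call_lhs (f.call);
    tree lhs_type = TREE_TYPE (lhs);
    poly_uint64 lhs_len = TYPE_VECTOR_SUBPARTS (lhs_type);
    tree eltype = TREE_TYPE (lhs_type);

    /* The 128-bit Advanced SIMD type with the same element type as
       the SVE result, e.g. int32x4_t for an svint32_t lhs.  */
    scalar_mode elmode = GET_MODE_INNER (TYPE_MODE (lhs_type));
    machine_mode vq_mode = aarch64_vq_mode (elmode).require ();
    tree vectype = build_vector_type_for_mode (eltype, vq_mode);

    /* Emit tmp = MEM_REF <vectype> [(eltype *)arg1] before the call.
       Use a may-alias pointer type and element alignment, since
       svld1rq only assumes an element-aligned source.  */
    tree elt_ptr_type = build_pointer_type_for_mode (eltype, VOIDmode,
                                                     true);
    tree zero = build_zero_cst (elt_ptr_type);
    tree access_type = build_aligned_type (vectype, TYPE_ALIGN (eltype));
    tree tmp = make_ssa_name (access_type);
    gimple *load
      = gimple_build_assign (tmp, fold_build2 (MEM_REF, access_type,
                                               arg1, zero));
    gsi_insert_before (f.gsi, load, GSI_SAME_STMT);

    /* lhs = VEC_PERM_EXPR <tmp, tmp, {0, 1, ..., source_nelts - 1}>,
       i.e. a broadcast of the Advanced SIMD vector.  */
    int source_nelts = TYPE_VECTOR_SUBPARTS (access_type).to_constant ();
    vec_perm_builder sel (lhs_len, source_nelts, 1);
    for (int i = 0; i < source_nelts; i++)
      sel.quick_push (i);
    vec_perm_indices indices (sel, 1, source_nelts);
    gcc_checking_assert (can_vec_perm_const_p (TYPE_MODE (lhs_type),
                                               indices));
    tree mask_type = build_vector_type (ssizetype, lhs_len);
    tree mask = vec_perm_indices_to_tree (mask_type, indices);
    return gimple_build_assign (lhs, VEC_PERM_EXPR, tmp, tmp, mask);
  }

The can_vec_perm_const_p test appears here as the gcc_checking_assert
suggested below, on the expectation that the target always handles
this dup permutation once a dedicated expander exists.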
> >
> > > +                    tree lhs = gimple_call_lhs (f.call);
> > > +                    tree lhs_type = TREE_TYPE (lhs);
> > > +                    int source_nelts = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> > > +                    vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (lhs_type), source_nelts, 1);
> > > +                    for (int i = 0; i < source_nelts; i++)
> > > +                      sel.quick_push (i);
> > > +
> > > +                    vec_perm_indices indices (sel, 1, source_nelts);
> > > +                    if (!can_vec_perm_const_p (TYPE_MODE (lhs_type), indices))
> > > +                      return NULL;
> >
> > I don't think we need to check this: it should always be true.
> > Probably worth keeping as a gcc_checking_assert though.
> >
> > > +
> > > +                    tree mask = vec_perm_indices_to_tree (lhs_type, indices);
> > > +                    return gimple_build_assign (lhs, VEC_PERM_EXPR, t, t, mask);
> > > +                  }
> > > +              }
> > > +          }
> > > +      }
> > > +
> > > +    return NULL;
> > > +  }
> > > };
> > >
> > > class svld1ro_impl : public load_replicate
> > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > > index f07330cff4f..af27f550be3 100644
> > > --- a/gcc/config/aarch64/aarch64.c
> > > +++ b/gcc/config/aarch64/aarch64.c
> > > @@ -23002,8 +23002,32 @@ aarch64_evpc_sve_tbl (struct expand_vec_perm_d *d)
> > >
> > >    machine_mode sel_mode = related_int_vector_mode (d->vmode).require ();
> > >    rtx sel = vec_perm_indices_to_rtx (sel_mode, d->perm);
> > > +
> > >    if (d->one_vector_p)
> > > -    emit_unspec2 (d->target, UNSPEC_TBL, d->op0, force_reg (sel_mode, sel));
> > > +    {
> > > +      bool use_dupq = false;
> > > +      /* Check if sel is a dup vector with encoded elements
> > > +         {0, 1, 2, ..., nelts - 1}.  */
> > > +      if (GET_CODE (sel) == CONST_VECTOR
> > > +          && !GET_MODE_NUNITS (GET_MODE (sel)).is_constant ()
> > > +          && CONST_VECTOR_DUPLICATE_P (sel))
> > > +        {
> > > +          unsigned nelts = const_vector_encoded_nelts (sel);
> > > +          unsigned i;
> > > +          for (i = 0; i < nelts; i++)
> > > +            {
> > > +              rtx elem = CONST_VECTOR_ENCODED_ELT (sel, i);
> > > +              if (!(CONST_INT_P (elem) && INTVAL (elem) == i))
> > > +                break;
> > > +            }
> > > +          if (i == nelts)
> > > +            use_dupq = true;
> > > +        }
> > > +
> > > +      if (use_dupq)
> > > +        aarch64_expand_sve_dupq (d->target, GET_MODE (d->target), d->op0);
> > > +      else
> > > +        emit_unspec2 (d->target, UNSPEC_TBL, d->op0, force_reg (sel_mode, sel));
> > > +    }
> >
> > This shouldn't be a TBL but a new operation, handled by its own
> > aarch64_evpc_sve_* routine.  The check for the mask should then
> > be done on d->perm, to detect whether the permutation is one
> > that the new routine supports.
> >
> > I think the requirements are:
> >
> > - !BYTES_BIG_ENDIAN
> > - the source must be an Advanced SIMD vector
> > - the destination must be an SVE vector
> > - the permutation must be a duplicate (tested in the code above)
> > - the number of “patterns” in the permutation must equal the number of
> >   source elements
> > - element X of the permutation must equal X (tested in the code above)
> >
> > The existing aarch64_evpc_* routines expect the source and target modes
> > to be the same, so we should only call them when that's true.
> Hi Richard,
> Thanks for the suggestions, and sorry for the late reply.
> Does the following patch look OK (sans the refactoring of building the
> mem_ref)?
> Passes bootstrap+test on aarch64-linux-gnu.
Hi Richard,
Since stage-1 has reopened, does the attached patch look OK to commit?

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Richard
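To make the checklist above concrete, a new routine along the lines of
the sketch below could sit alongside the other aarch64_evpc_* helpers.
This is an illustration, not the committed patch: the d->op_mode and
d->op_vec_flags fields describing the source operand are assumptions,
since today's expand_vec_perm_d has no such fields (the existing
routines require matching source and target modes), so the struct and
its callers would need extending first.

/* Sketch: recognize a permutation that duplicates a 128-bit Advanced
   SIMD operand across an SVE vector and expand it with DUP Zd.Q.
   d->op_mode and d->op_vec_flags are assumed extensions of
   expand_vec_perm_d recording the source operand's mode.  */

static bool
aarch64_evpc_sve_dup (struct expand_vec_perm_d *d)
{
  if (BYTES_BIG_ENDIAN
      || !d->one_vector_p
      || d->vec_flags != VEC_SVE_DATA      /* Destination is SVE.  */
      || d->op_vec_flags != VEC_ADVSIMD    /* Source is Advanced SIMD.  */
      || d->perm.encoding ().nelts_per_pattern () != 1
      /* The number of patterns must equal the number of source
         elements, so the permutation as a whole reads
         {0, 1, ..., n-1, 0, 1, ...}.  */
      || !known_eq (d->perm.encoding ().npatterns (),
                    GET_MODE_NUNITS (d->op_mode)))
    return false;

  /* Element X of the permutation must equal X.  */
  int npatterns = d->perm.encoding ().npatterns ();
  for (int i = 0; i < npatterns; i++)
    if (!known_eq (d->perm[i], i))
      return false;

  if (d->testing_p)
    return true;

  aarch64_expand_sve_dupq (d->target, GET_MODE (d->target), d->op0);
  return true;
}

aarch64_expand_vec_perm_const_1 would then try this routine before the
generic TBL fallback, leaving aarch64_evpc_sve_tbl itself unchanged.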