From mboxrd@z Thu Jan 1 00:00:00 1970
From: Richard Sandiford
To: Prathamesh Kulkarni
Cc: gcc Patches
Subject: Re: [1/2] PR96463 - aarch64 specific changes
Date: Fri, 17 Dec 2021 11:33:34 +0000
List-Id: Gcc-patches mailing list

Prathamesh Kulkarni writes:
> Hi,
> The patch folds:
> lhs = svld1rq ({-1, -1, -1, ...}, &v[0])
> into:
> lhs = vec_perm_expr
> and expands above vec_perm_expr using aarch64_expand_sve_dupq.
>
> With patch, for following test:
> #include <arm_neon.h>
> #include <arm_sve.h>
>
> svint32_t
> foo (int32x4_t x)
> {
>   return svld1rq (svptrue_b8 (), &x[0]);
> }
>
> it generates following code:
> foo:
> .LFB4350:
>         dup     z0.q, z0.q[0]
>         ret
>
> and passes bootstrap+test on aarch64-linux-gnu.
> But I am not sure if the changes to aarch64_evpc_sve_tbl
> are correct.

Just in case: I was only using int32x4_t in the PR as an example.
The same thing should work for all element types.

> Thanks,
> Prathamesh
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 02e42a71e5e..e21bbec360c 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -1207,6 +1207,56 @@ public:
>      insn_code icode = code_for_aarch64_sve_ld1rq (e.vector_mode (0));
>      return e.use_contiguous_load_insn (icode);
>    }
> +
> +  gimple *
> +  fold (gimple_folder &f) const OVERRIDE
> +  {
> +    tree arg0 = gimple_call_arg (f.call, 0);
> +    tree arg1 = gimple_call_arg (f.call, 1);
> +
> +    /* Transform:
> +       lhs = svld1rq ({-1, -1, ... }, &v[0])
> +       into:
> +       lhs = vec_perm_expr.
> +       on little endian target.  */
> +
> +    if (!BYTES_BIG_ENDIAN
> +        && integer_all_onesp (arg0)
> +        && TREE_CODE (arg1) == ADDR_EXPR)
> +      {
> +        tree t = TREE_OPERAND (arg1, 0);
> +        if (TREE_CODE (t) == ARRAY_REF)
> +          {
> +            tree index = TREE_OPERAND (t, 1);
> +            t = TREE_OPERAND (t, 0);
> +            if (integer_zerop (index) && TREE_CODE (t) == VIEW_CONVERT_EXPR)
> +              {
> +                t = TREE_OPERAND (t, 0);
> +                tree vectype = TREE_TYPE (t);
> +                if (VECTOR_TYPE_P (vectype)
> +                    && known_eq (TYPE_VECTOR_SUBPARTS (vectype), 4u)
> +                    && wi::to_wide (TYPE_SIZE (vectype)) == 128)
> +                  {

Since this is quite a specific pattern match, and since we now lower
arm_neon.h vld1* to normal gimple accesses, I think we should try the
“more generally” approach mentioned in the PR and see what the
fallout is.
That is, keep:

  if (!BYTES_BIG_ENDIAN
      && integer_all_onesp (arg0)

If those conditions pass, create an Advanced SIMD access at address
arg1, using similar code to the handling of:

  BUILTIN_VALL_F16 (LOAD1, ld1, 0, LOAD)
  BUILTIN_VDQ_I (LOAD1_U, ld1, 0, LOAD)
  BUILTIN_VALLP_NO_DI (LOAD1_P, ld1, 0, LOAD)

in aarch64_general_gimple_fold_builtin.  (Would be good to move the
common code to aarch64.c so that both files can use it.)

> +                    tree lhs = gimple_call_lhs (f.call);
> +                    tree lhs_type = TREE_TYPE (lhs);
> +                    int source_nelts = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
> +                    vec_perm_builder sel (TYPE_VECTOR_SUBPARTS (lhs_type), source_nelts, 1);
> +                    for (int i = 0; i < source_nelts; i++)
> +                      sel.quick_push (i);
> +
> +                    vec_perm_indices indices (sel, 1, source_nelts);
> +                    if (!can_vec_perm_const_p (TYPE_MODE (lhs_type), indices))
> +                      return NULL;

I don't think we need to check this: it should always be true.
Probably worth keeping as a gcc_checking_assert though.

> +
> +                    tree mask = vec_perm_indices_to_tree (lhs_type, indices);
> +                    return gimple_build_assign (lhs, VEC_PERM_EXPR, t, t, mask);
> +                  }
> +              }
> +          }
> +      }
> +
> +    return NULL;
> +  }
> };
>
> class svld1ro_impl : public load_replicate
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index f07330cff4f..af27f550be3 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -23002,8 +23002,32 @@ aarch64_evpc_sve_tbl (struct expand_vec_perm_d *d)
>
>    machine_mode sel_mode = related_int_vector_mode (d->vmode).require ();
>    rtx sel = vec_perm_indices_to_rtx (sel_mode, d->perm);
> +
>    if (d->one_vector_p)
> -    emit_unspec2 (d->target, UNSPEC_TBL, d->op0, force_reg (sel_mode, sel));
> +    {
> +      bool use_dupq = false;
> +      /* Check if sel is dup vector with encoded elements {0, 1, 2, ... nelts} */
> +      if (GET_CODE (sel) == CONST_VECTOR
> +          && !GET_MODE_NUNITS (GET_MODE (sel)).is_constant ()
> +          && CONST_VECTOR_DUPLICATE_P (sel))
> +        {
> +          unsigned nelts = const_vector_encoded_nelts (sel);
> +          unsigned i;
> +          for (i = 0; i < nelts; i++)
> +            {
> +              rtx elem = CONST_VECTOR_ENCODED_ELT (sel, i);
> +              if (!(CONST_INT_P (elem) && INTVAL (elem) == i))
> +                break;
> +            }
> +          if (i == nelts)
> +            use_dupq = true;
> +        }
> +
> +      if (use_dupq)
> +        aarch64_expand_sve_dupq (d->target, GET_MODE (d->target), d->op0);
> +      else
> +        emit_unspec2 (d->target, UNSPEC_TBL, d->op0, force_reg (sel_mode, sel));
> +    }

This shouldn't be a TBL but a new operation, handled by its own
aarch64_evpc_sve_* routine.  The check for the mask should then be
done on d->perm, to detect whether the permutation is one that the
new routine supports.

I think the requirements are:

- !BYTES_BIG_ENDIAN
- the source must be an Advanced SIMD vector
- the destination must be an SVE vector
- the permutation must be a duplicate (tested in the code above)
- the number of “patterns” in the permutation must equal the number
  of source elements
- element X of the permutation must equal X (tested in the code above)

The existing aarch64_evpc_* routines expect the source and target modes
to be the same, so we should only call them when that's true.

Thanks,
Richard