From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <richard.sandiford@arm.com>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
 by sourceware.org (Postfix) with ESMTP id 48DA93858D28
 for <gcc-patches@gcc.gnu.org>; Fri, 17 Dec 2021 12:44:28 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 48DA93858D28
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id CFD351435;
 Fri, 17 Dec 2021 04:44:27 -0800 (PST)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.88])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 3CA223F5A1;
 Fri, 17 Dec 2021 04:44:27 -0800 (PST)
From: Richard Sandiford <richard.sandiford@arm.com>
To: "Andre Vieira \(lists\) via Gcc-patches" <gcc-patches@gcc.gnu.org>
Mail-Followup-To: "Andre Vieira \(lists\) via Gcc-patches"
 <gcc-patches@gcc.gnu.org>, Richard Biener <rguenther@suse.de>,
 "Andre Vieira \(lists\)" <andre.simoesdiasvieira@arm.com>,
 richard.sandiford@arm.com
Cc: Richard Biener <rguenther@suse.de>,
 "Andre Vieira \(lists\)" <andre.simoesdiasvieira@arm.com>
Subject: Re: [AArch64] Enable generation of FRINTNZ instructions
References: <8225375c-eb9e-f9b3-6bcd-9fbccf2fc87b@arm.com>
 <70s9nn94-452-5rrr-4458-q6n3qp563652@fhfr.qr>
 <36e3469a-3922-d49e-4006-0088eac29157@arm.com>
 <653o8886-3p5n-sr82-9n83-71q33o8824@fhfr.qr>
 <6c730f35-10b1-2843-cffc-4ed0851380be@arm.com>
 <85sr96q-o3s-602o-3436-40713n68pp84@fhfr.qr>
 <8d593d5f-41a0-6051-0ce0-d72834ecfa25@arm.com>
 <s96q8088-262q-n93-ro5p-759n51837n8@fhfr.qr>
 <f5312e85-b175-fbc9-8858-de611b3d99ac@arm.com>
Date: Fri, 17 Dec 2021 12:44:25 +0000
In-Reply-To: <f5312e85-b175-fbc9-8858-de611b3d99ac@arm.com> (Andre Vieira via
 Gcc-patches's message of "Thu, 25 Nov 2021 13:53:08 +0000")
Message-ID: <mptee6b8kpi.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_STATUS, KAM_LOTSOFHASH, KAM_SHORT, SPF_HELO_NONE, SPF_PASS,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Fri, 17 Dec 2021 12:44:32 -0000

"Andre Vieira (lists) via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> On 22/11/2021 11:41, Richard Biener wrote:
>>
>>> On 18/11/2021 11:05, Richard Biener wrote:
>>>> This is a good shout and made me think about something I hadn't before=
... I
>>>> thought I could handle the vector forms later, but the problem is if I=
 add
>>>> support for the scalar, it will stop the vectorizer. It seems
>>>> vectorizable_call expects all arguments to have the same type, which d=
oesn't
>>>> work with passing the integer type as an operand work around.
>> We already special case some IFNs there (masked load/store and gather)
>> to ignore some args, so that would just add to this set.
>>
>> Richard.
> Hi,
>
> Reworked it to add support of the new IFN to the vectorizer. Was=20
> initially trying to make vectorizable_call and=20
> vectorizable_internal_function handle IFNs with different inputs more=20
> generically, using the information we have in the <IFN>_direct structs=20
> regarding what operands to get the modes from. Unfortunately, that=20
> wasn't straightforward because of how vectorizable_call assumes operands=
=20
> have the same type and uses the type of the DEF_STMT_INFO of the=20
> non-constant operands (either output operand or non-constant inputs) to=20
> determine the type of constants. I assume there is some reason why we=20
> use the DEF_STMT_INFO and not always use get_vectype_for_scalar_type on=20
> the argument types. That is why I ended up with this sort of half-way=20
> mix of both, which still allows room to add more IFNs that don't take=20
> inputs of the same type, but require adding a bit of special casing=20
> similar to the IFN_FTRUNC_INT and masking ones.
>
> Bootstrapped on aarch64-none-linux.

Still leaving the match.pd stuff to Richard, but otherwise:

> index bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..51f00344b02d0d1d4adf97463=
f6a46f9fd0fb43f 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -160,7 +160,11 @@ (define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD=
_F16INST")
>  				  SF DF])
>=20=20
>  ;; Scalar and vetor modes for SF, DF.
> -(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF])
> +(define_mode_iterator VSFDF [ (V2SF "TARGET_SIMD")

Nit: excess space between [ and (.

> +			      (V4SF "TARGET_SIMD")
> +			      (V2DF "TARGET_SIMD")
> +			      (DF "TARGET_FLOAT")
> +			      (SF "TARGET_FLOAT")])
>=20=20
>  ;; Advanced SIMD single Float modes.
>  (define_mode_iterator VDQSF [V2SF V4SF])
> [=E2=80=A6]
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 41f1850bf6e95005647ca97a495a97d7e184d137..d50d09b0ae60d98537b9aece4=
396a490f33f174c 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6175,6 +6175,15 @@ operands; otherwise, it may not.
>=20=20
>  This pattern is not allowed to @code{FAIL}.
>=20=20
> +@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
> +@item @samp{ftrunc@var{m}@var{n}2}
> +Truncate operand 1 to a @var{n} mode signed integer, towards zero, and s=
tore
> +the result in operand 0. Both operands have mode @var{m}, which is a sca=
lar or
> +vector floating-point mode.  Exception must be thrown if operand 1 does =
not fit

Maybe =E2=80=9CAn exception must be raised=E2=80=9D?  =E2=80=9Cthrown=E2=80=
=9D makes it sound like a
signal must be raised or C++ exception thrown.

> +in a @var{n} mode signed integer as it would have if the truncation happ=
ened
> +through separate floating point to integer conversion.
> +
> +
>  @cindex @code{round@var{m}2} instruction pattern
>  @item @samp{round@var{m}2}
>  Round operand 1 to the nearest integer, rounding away from zero in the
> [=E2=80=A6]
> @@ -3688,6 +3712,15 @@ multi_vector_optab_supported_p (convert_optab opta=
b, tree_pair types,
>  	  !=3D CODE_FOR_nothing);
>  }
>=20=20
> +static bool direct_ftrunc_int_optab_supported_p (convert_optab optab,
> +						 tree_pair types,
> +						 optimization_type opt_type)

Formatting nit: should be a line break after =E2=80=9Cbool=E2=80=9D

> +{
> +  return (convert_optab_handler (optab, TYPE_MODE (types.first),
> +				TYPE_MODE (element_type (types.second)),
> +				opt_type) !=3D CODE_FOR_nothing);
> +}
> +
>  #define direct_unary_optab_supported_p direct_optab_supported_p
>  #define direct_binary_optab_supported_p direct_optab_supported_p
>  #define direct_ternary_optab_supported_p direct_optab_supported_p
> [=E2=80=A6]
> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c b/gcc/testsui=
te/gcc.target/aarch64/frintnz_vec.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b93304eb2acb3d3d954eebee5=
1d77ff23fee68ac
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=3Darmv8.5-a" } */
> +/* { dg-require-effective-target aarch64_frintnzx_ok } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#define TEST(name,float_type,int_type)					\
> +void									\
> +name (float_type * __restrict__ x, float_type * __restrict__ y, int n)  \
> +{									\
> +  for (int i =3D 0; i < n; ++i)					      \
> +    {								      \
> +      int_type x_i =3D x[i];					      \
> +      y[i] =3D (float_type) x_i;					      \
> +    }								      \
> +}
> +
> +/*
> +** f1:
> +**	...
> +**	frint32z	v0.4s, v0.4s

I don't think we can rely on v0 being used here.  v[0-9]+\.4s would
be safer.

> +**	...
> +*/
> +TEST(f1, float, int)
> +
> +/*
> +** f2:
> +**	...
> +**	frint64z	v0.4s, v0.4s
> +**	...
> +*/
> +TEST(f2, float, long long)
> +
> +/*
> +** f3:
> +**	...
> +**	frint32z	v0.2d, v0.2d
> +**	...
> +*/
> +TEST(f3, double, int)
> +
> +/*
> +** f4:
> +**	...
> +**	frint64z	v0.2d, v0.2d
> +**	...
> +*/
> +TEST(f4, double, long long)
> [=E2=80=A6]
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 03cc7267cf80d4ce73c0d89ab86b07e84752456a..35bb1f70f7b173ad0d1e9f70c=
e0ac9da891dbe62 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1625,7 +1625,8 @@ vect_finish_stmt_generation (vec_info *vinfo,
>=20=20
>  static internal_fn
>  vectorizable_internal_function (combined_fn cfn, tree fndecl,
> -				tree vectype_out, tree vectype_in)
> +				tree vectype_out, tree vectype_in,
> +				tree *vectypes)
>  {
>    internal_fn ifn;
>    if (internal_fn_p (cfn))
> @@ -1637,8 +1638,12 @@ vectorizable_internal_function (combined_fn cfn, t=
ree fndecl,
>        const direct_internal_fn_info &info =3D direct_internal_fn (ifn);
>        if (info.vectorizable)
>  	{
> -	  tree type0 =3D (info.type0 < 0 ? vectype_out : vectype_in);
> -	  tree type1 =3D (info.type1 < 0 ? vectype_out : vectype_in);
> +	  tree type0 =3D (info.type0 < 0 ? vectype_out : vectypes[info.type0]);
> +	  if (!type0)
> +	    type0 =3D vectype_in;
> +	  tree type1 =3D (info.type1 < 0 ? vectype_out : vectypes[info.type1]);
> +	  if (!type1)
> +	    type1 =3D vectype_in;
>  	  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
>  					      OPTIMIZE_FOR_SPEED))
>  	    return ifn;
> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
>        rhs_type =3D unsigned_type_node;
>      }
>=20=20
> -  int mask_opno =3D -1;
> +  /* The argument that is not of the same type as the others.  */
> +  int diff_opno =3D -1;
> +  bool masked =3D false;
>    if (internal_fn_p (cfn))
> -    mask_opno =3D internal_fn_mask_index (as_internal_fn (cfn));
> +    {
> +      if (cfn =3D=3D CFN_FTRUNC_INT)
> +	/* For FTRUNC this represents the argument that carries the type of the
> +	   intermediate signed integer.  */
> +	diff_opno =3D 1;
> +      else
> +	{
> +	  /* For masked operations this represents the argument that carries the
> +	     mask.  */
> +	  diff_opno =3D internal_fn_mask_index (as_internal_fn (cfn));
> +	  masked =3D diff_opno >=3D  0;
> +	}
> +    }

I think it would be cleaner not to process argument 1 at all for
CFN_FTRUNC_INT.  There's no particular need to vectorise it.

>    for (i =3D 0; i < nargs; i++)
>      {
> -      if ((int) i =3D=3D mask_opno)
> +      if ((int) i =3D=3D diff_opno && masked)
>  	{
> -	  if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_opno,
> -				       &op, &slp_op[i], &dt[i], &vectypes[i]))
> +	  if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node,
> +				       diff_opno, &op, &slp_op[i], &dt[i],
> +				       &vectypes[i]))
>  	    return false;
>  	  continue;
>  	}
> [=E2=80=A6]
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e=
2480ed5522f78c1 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, cst_size_=
error *perr /* =3D NULL */)
>    return true;
>  }
>=20=20
> -/* Return the precision of the type, or for a complex or vector type the
> -   precision of the type of its elements.  */
> +/* Return the type, or for a complex or vector type the type of its
> +   elements.  */
>=20=20
> -unsigned int
> -element_precision (const_tree type)
> +tree
> +element_type (const_tree type)
>  {
>    if (!TYPE_P (type))
>      type =3D TREE_TYPE (type);
> @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
>    if (code =3D=3D COMPLEX_TYPE || code =3D=3D VECTOR_TYPE)
>      type =3D TREE_TYPE (type);
>=20=20
> -  return TYPE_PRECISION (type);
> +  return (tree) type;

I think we should stick a const_cast in element_precision and make
element_type take a plain =E2=80=9Ctype=E2=80=9D.  As it stands element_typ=
e is an
implicit const_cast for many cases.

Thanks,
Richard

> +}
> +
> +/* Return the precision of the type, or for a complex or vector type the
> +   precision of the type of its elements.  */
> +
> +unsigned int
> +element_precision (const_tree type)
> +{
> +  return TYPE_PRECISION (element_type (type));
>  }
>=20=20
>  /* Return true if CODE represents an associative tree code.  Otherwise