From: Richard Sandiford
To: "Andre Vieira (lists)"
Cc: Richard Biener, "Andre Vieira (lists) via Gcc-patches"
Subject: Re: [AArch64] Enable generation of FRINTNZ instructions
Date: Fri, 14 Jan 2022 10:37:35 +0000
In-Reply-To: <5d7bb7af-b09e-cb91-b457-c6148f65028e@arm.com> (Andre Vieira's message of "Mon, 10 Jan 2022 14:09:04 +0000")
"Andre Vieira (lists)" writes:

> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 19e89ae502bc2f51db64667b236c1cb669718b02..3b0e4e0875b4392ab6833568b207580ef597a98f 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6191,6 +6191,15 @@ operands; otherwise, it may not.
>
>  This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
> +@item @samp{ftrunc@var{m}@var{n}2}
> +Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
> +the result in operand 0.  Both operands have mode @var{m}, which is a scalar or
> +vector floating-point mode.  An exception must be raised if operand 1 does not
> +fit in a @var{n} mode signed integer as it would have if the truncation
> +happened through separate floating point to integer conversion.
> +
> +

Nit: just one blank line here.

> @cindex @code{round@var{m}2} instruction pattern
> @item @samp{round@var{m}2}
> Round operand 1 to the nearest integer, rounding away from zero in the
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 6095a35cd4565fdb7d758104e80fe6411230f758..a56bbb775572fa72379854f90a01ad543557e29a 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -2286,6 +2286,10 @@ Like @code{aarch64_sve_hw}, but also test for an exact hardware vector length.
>  @item aarch64_fjcvtzs_hw
>  AArch64 target that is able to generate and execute armv8.3-a FJCVTZS
>  instruction.
> +
> +@item aarch64_frintzx_ok
> +AArch64 target that is able to generate the Armv8.5-a FRINT32Z, FRINT64Z,
> +FRINT32X and FRINT64X instructions.
> @end table
>
> @subsubsection MIPS-specific attributes
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index b24102a5990bea4cbb102069f7a6df497fc81ebf..9047b486f41948059a7a7f1ccc4032410a369139 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -130,6 +130,7 @@ init_internal_fns ()
>  #define fold_left_direct { 1, 1, false }
>  #define mask_fold_left_direct { 1, 1, false }
>  #define check_ptrs_direct { 0, 0, false }
> +#define ftrunc_int_direct { 0, 1, true }
>
>  const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
>  #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
> @@ -156,6 +157,29 @@ get_multi_vector_move (tree array_type, convert_optab optab)
>    return convert_optab_handler (optab, imode, vmode);
>  }
>
> +/* Expand FTRUNC_INT call STMT using optab OPTAB.  */
> +
> +static void
> +expand_ftrunc_int_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +{
> +  class expand_operand ops[2];
> +  tree lhs, float_type, int_type;
> +  rtx target, op;
> +
> +  lhs = gimple_call_lhs (stmt);
> +  target = expand_normal (lhs);
> +  op = expand_normal (gimple_call_arg (stmt, 0));
> +
> +  float_type = TREE_TYPE (lhs);
> +  int_type = element_type (gimple_call_arg (stmt, 1));

Sorry for the run-around, but now that we don't (need to) vectorise
the second argument, I think we can drop this element_type.  That in
turn means that…

> +
> +  create_output_operand (&ops[0], target, TYPE_MODE (float_type));
> +  create_input_operand (&ops[1], op, TYPE_MODE (float_type));
> +
> +  expand_insn (convert_optab_handler (optab, TYPE_MODE (float_type),
> +				      TYPE_MODE (int_type)), 2, ops);
> +}
> +
> /* Expand LOAD_LANES call STMT using optab OPTAB.  */
>
>  static void
> @@ -3747,6 +3771,15 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
> 	  != CODE_FOR_nothing);
>  }
>
> +static bool
> +direct_ftrunc_int_optab_supported_p (convert_optab optab, tree_pair types,
> +				     optimization_type opt_type)
> +{
> +  return (convert_optab_handler (optab, TYPE_MODE (types.first),
> +				 TYPE_MODE (element_type (types.second)),
> +				 opt_type) != CODE_FOR_nothing);
> +}
> +

…this can use convert_optab_supported_p.

> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..008e1cf9f4a1b0148128c65c9ea0d1bb111467b7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> @@ -0,0 +1,91 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=armv8.5-a" } */
> +/* { dg-require-effective-target aarch64_frintnzx_ok } */

Is this just a cut-&-pasto from a run test?  If not, why do we need
both this and the dg-options?  It feels like one on its own should be
enough, with the dg-options being better.

The test looks OK without this line.
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +/*
> +** f1:
> +**	frint32z	s0, s0
> +**	ret
> +*/
> +float
> +f1 (float x)
> +{
> +  int y = x;
> +  return (float) y;
> +}
> +
> +/*
> +** f2:
> +**	frint64z	s0, s0
> +**	ret
> +*/
> +float
> +f2 (float x)
> +{
> +  long long int y = x;
> +  return (float) y;
> +}
> +
> +/*
> +** f3:
> +**	frint32z	d0, d0
> +**	ret
> +*/
> +double
> +f3 (double x)
> +{
> +  int y = x;
> +  return (double) y;
> +}
> +
> +/*
> +** f4:
> +**	frint64z	d0, d0
> +**	ret
> +*/
> +double
> +f4 (double x)
> +{
> +  long long int y = x;
> +  return (double) y;
> +}
> +
> +float
> +f1_dont (float x)
> +{
> +  unsigned int y = x;
> +  return (float) y;
> +}
> +
> +float
> +f2_dont (float x)
> +{
> +  unsigned long long int y = x;
> +  return (float) y;
> +}
> +
> +double
> +f3_dont (double x)
> +{
> +  unsigned int y = x;
> +  return (double) y;
> +}
> +
> +double
> +f4_dont (double x)
> +{
> +  unsigned long long int y = x;
> +  return (double) y;
> +}
> +
> +double
> +f5_dont (double x)
> +{
> +  signed short y = x;
> +  return (double) y;
> +}
> +
> +/* Make sure the 'dont's don't generate any frintNz.  */
> +/* { dg-final { scan-assembler-times {frint32z} 2 } } */
> +/* { dg-final { scan-assembler-times {frint64z} 2 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..801d65ea8325cb680691286aab42747f43b90687
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.5-a" } */
> +/* { dg-require-effective-target aarch64_frintnzx_ok } */

Same here.
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#define TEST(name,float_type,int_type)					\
> +void									\
> +name (float_type * __restrict__ x, float_type * __restrict__ y, int n)	\
> +{									\
> +  for (int i = 0; i < n; ++i)						\
> +    {									\
> +      int_type x_i = x[i];						\
> +      y[i] = (float_type) x_i;						\
> +    }									\
> +}
> +
> +/*
> +** f1:
> +**	...
> +**	frint32z	v[0-9]+\.4s, v[0-9]+\.4s
> +**	...
> +*/
> +TEST(f1, float, int)
> +
> +/*
> +** f2:
> +**	...
> +**	frint64z	v[0-9]+\.4s, v[0-9]+\.4s
> +**	...
> +*/
> +TEST(f2, float, long long)
> +
> +/*
> +** f3:
> +**	...
> +**	frint32z	v[0-9]+\.2d, v[0-9]+\.2d
> +**	...
> +*/
> +TEST(f3, double, int)
> +
> +/*
> +** f4:
> +**	...
> +**	frint64z	v[0-9]+\.2d, v[0-9]+\.2d
> +**	...
> +*/
> +TEST(f4, double, long long)
> […]
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index f2625a2ff4089739326ce11785f1b68678c07f0e..435f2f4f5aeb2ed4c503c7b6a97d375634ae4514 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1625,7 +1625,8 @@ vect_finish_stmt_generation (vec_info *vinfo,
>
>  static internal_fn
>  vectorizable_internal_function (combined_fn cfn, tree fndecl,
> -				tree vectype_out, tree vectype_in)
> +				tree vectype_out, tree vectype_in,
> +				tree *vectypes)

Should be described in the comment above the function.

> {
>   internal_fn ifn;
>   if (internal_fn_p (cfn))
> @@ -1637,8 +1638,12 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
>        const direct_internal_fn_info &info = direct_internal_fn (ifn);
>        if (info.vectorizable)
> 	{
> -	  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
> -	  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
> +	  tree type0 = (info.type0 < 0 ? vectype_out : vectypes[info.type0]);
> +	  if (!type0)
> +	    type0 = vectype_in;
> +	  tree type1 = (info.type1 < 0 ? vectype_out : vectypes[info.type1]);
> +	  if (!type1)
> +	    type1 = vectype_in;
> 	  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
> 					      OPTIMIZE_FOR_SPEED))
> 	    return ifn;
> @@ -3263,18 +3268,40 @@ vectorizable_call (vec_info *vinfo,
> 	  rhs_type = unsigned_type_node;
> 	}
>
> -  int mask_opno = -1;
> +  /* The argument that is not of the same type as the others.  */
> +  int diff_opno = -1;
> +  bool masked = false;
>    if (internal_fn_p (cfn))
> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +    {
> +      if (cfn == CFN_FTRUNC_INT)
> +	/* For FTRUNC this represents the argument that carries the type of the
> +	   intermediate signed integer.  */
> +	diff_opno = 1;
> +      else
> +	{
> +	  /* For masked operations this represents the argument that carries the
> +	     mask.  */
> +	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +	  masked = diff_opno >=  0;

Nit: excess space after “>=”.

> +	}
> +    }

I think it would be better to add a new flag to direct_internal_fn_info
to say whether type1 is scalar, rather than check based on function code.
type1 would then provide the value of diff_opno above.

Also, I think diff_opno should be separate from mask_opno.
Maybe scalar_opno would be a better name.

This would probably be simpler if we used:

  internal_fn ifn = associated_internal_fn (cfn, lhs_type);

before the loop (with lhs_type being new), then used ifn to get the
direct_internal_fn_info and passed ifn to vectorizable_internal_function.
>    for (i = 0; i < nargs; i++)
>      {
> -      if ((int) i == mask_opno)
> +      if ((int) i == diff_opno)
> 	{
> -	  if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_opno,
> -				       &op, &slp_op[i], &dt[i], &vectypes[i]))
> -	    return false;
> -	  continue;
> +	  if (masked)
> +	    {
> +	      if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node,
> +					   diff_opno, &op, &slp_op[i], &dt[i],
> +					   &vectypes[i]))
> +		return false;
> +	    }
> +	  else
> +	    {
> +	      vectypes[i] = TREE_TYPE (gimple_call_arg (stmt, i));
> +	      continue;
> +	    }
> 	}
>
>        if (!vect_is_simple_use (vinfo, stmt_info, slp_node,
> @@ -3286,27 +3313,30 @@ vectorizable_call (vec_info *vinfo,
> 	  return false;
> 	}
>
> -      /* We can only handle calls with arguments of the same type.  */
> -      if (rhs_type
> -	  && !types_compatible_p (rhs_type, TREE_TYPE (op)))
> +      if ((int) i != diff_opno)

Is this ever false?  It looks like the continue above handles
the other case.

> 	{
> -	  if (dump_enabled_p ())
> -	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -			     "argument types differ.\n");
> -	  return false;
> -	}
> -      if (!rhs_type)
> -	rhs_type = TREE_TYPE (op);
> +	  /* We can only handle calls with arguments of the same type.  */
> +	  if (rhs_type
> +	      && !types_compatible_p (rhs_type, TREE_TYPE (op)))
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				 "argument types differ.\n");
> +	      return false;
> +	    }
> +	  if (!rhs_type)
> +	    rhs_type = TREE_TYPE (op);
>
> -      if (!vectype_in)
> -	vectype_in = vectypes[i];
> -      else if (vectypes[i]
> -	       && !types_compatible_p (vectypes[i], vectype_in))
> -	{
> -	  if (dump_enabled_p ())
> -	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -			     "argument vector types differ.\n");
> -	  return false;
> +	  if (!vectype_in)
> +	    vectype_in = vectypes[i];
> +	  else if (vectypes[i]
> +		   && !types_compatible_p (vectypes[i], vectype_in))
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				 "argument vector types differ.\n");
> +	      return false;
> +	    }
> 	}
>      }
>    /* If all arguments are external or constant defs, infer the vector type
> @@ -3382,8 +3412,8 @@ vectorizable_call (vec_info *vinfo,
> 	  || (modifier == NARROW
> 	      && simple_integer_narrowing (vectype_out, vectype_in,
> 					   &convert_code))))
> -    ifn = vectorizable_internal_function (cfn, callee, vectype_out,
> -					  vectype_in);
> +    ifn = vectorizable_internal_function (cfn, callee, vectype_out, vectype_in,
> +					  &vectypes[0]);
>
>    /* If that fails, try asking for a target-specific built-in function.  */
>    if (ifn == IFN_LAST)
> @@ -3461,7 +3491,7 @@ vectorizable_call (vec_info *vinfo,
>
>    if (loop_vinfo
>        && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> -      && (reduc_idx >= 0 || mask_opno >= 0))
> +      && (reduc_idx >= 0 || masked))
>      {
>        if (reduc_idx >= 0
> 	  && (cond_fn == IFN_LAST
> @@ -3481,8 +3511,8 @@ vectorizable_call (vec_info *vinfo,
> 		       ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node)
> 		       : ncopies);
>        tree scalar_mask = NULL_TREE;
> -      if (mask_opno >= 0)
> -	scalar_mask = gimple_call_arg (stmt_info->stmt, mask_opno);
> +      if (masked)
> +	scalar_mask = gimple_call_arg (stmt_info->stmt, diff_opno);
>        vect_record_loop_mask (loop_vinfo, masks, nvectors,
> 			     vectype_out, scalar_mask);
>      }
> @@ -3547,7 +3577,7 @@ vectorizable_call (vec_info *vinfo,
> 	    {
> 	      /* We don't define any narrowing conditional functions
> 		 at present.  */
> -	      gcc_assert (mask_opno < 0);
> +	      gcc_assert (!masked);
> 	      tree half_res = make_ssa_name (vectype_in);
> 	      gcall *call
> 		= gimple_build_call_internal_vec (ifn, vargs);
> @@ -3567,16 +3597,16 @@ vectorizable_call (vec_info *vinfo,
> 	    }
> 	  else
> 	    {
> -	      if (mask_opno >= 0 && masked_loop_p)
> +	      if (masked && masked_loop_p)
> 		{
> 		  unsigned int vec_num = vec_oprnds0.length ();
> 		  /* Always true for SLP.  */
> 		  gcc_assert (ncopies == 1);
> 		  tree mask = vect_get_loop_mask (gsi, masks, vec_num,
> 						  vectype_out, i);
> -		  vargs[mask_opno] = prepare_vec_mask
> +		  vargs[diff_opno] = prepare_vec_mask
> 		    (loop_vinfo, TREE_TYPE (mask), mask,
> -		     vargs[mask_opno], gsi);
> +		     vargs[diff_opno], gsi);
> 		}
>
> 	      gcall *call;
> @@ -3614,13 +3644,13 @@ vectorizable_call (vec_info *vinfo,
> 	  if (masked_loop_p && reduc_idx >= 0)
> 	    vargs[varg++] = vargs[reduc_idx + 1];
>
> -	  if (mask_opno >= 0 && masked_loop_p)
> +	  if (masked && masked_loop_p)
> 	    {
> 	      tree mask = vect_get_loop_mask (gsi, masks, ncopies,
> 					      vectype_out, j);
> -	      vargs[mask_opno]
> +	      vargs[diff_opno]
> 		= prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask,
> -				    vargs[mask_opno], gsi);
> +				    vargs[diff_opno], gsi);
> 	    }
>
> 	  gimple *new_stmt;
> @@ -3639,7 +3669,7 @@ vectorizable_call (vec_info *vinfo,
> 	    {
> 	      /* We don't define any narrowing conditional functions at
> 		 present.  */
> -	      gcc_assert (mask_opno < 0);
> +	      gcc_assert (!masked);
> 	      tree half_res = make_ssa_name (vectype_in);
> 	      gcall *call = gimple_build_call_internal_vec (ifn, vargs);
> 	      gimple_call_set_lhs (call, half_res);
> @@ -3683,7 +3713,7 @@ vectorizable_call (vec_info *vinfo,
>      {
>        auto_vec<vec<tree>> vec_defs (nargs);
>        /* We don't define any narrowing conditional functions at present.  */
> -      gcc_assert (mask_opno < 0);
> +      gcc_assert (!masked);
>        for (j = 0; j < ncopies; ++j)
> 	{
> 	  /* Build argument list for the vectorized call.  */
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 318019c4dc5373271551f5d9a48dadb57a29d4a7..770d0ddfcc9a7acda01ed2fafa61eab0f1ba4cfa 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -6558,4 +6558,12 @@ extern unsigned fndecl_dealloc_argno (tree);
>    object or pointer.  Otherwise return null.  */
>  extern tree get_attr_nonstring_decl (tree, tree * = NULL);
>
> +/* Return the type, or for a complex or vector type the type of its
> +   elements.  */
> +extern tree element_type (tree);
> +
> +/* Return the precision of the type, or for a complex or vector type the
> +   precision of the type of its elements.  */
> +extern unsigned int element_precision (const_tree);
> +
>  #endif /* GCC_TREE_H */
> diff --git a/gcc/tree.c b/gcc/tree.c
> index d98b77db50b29b22dc9af1f98cd86044f62af019..81e66dd710ce6bc237f508655cfb437b40ec0bfa 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -6646,11 +6646,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
>    return true;
>  }
>
> -/* Return the precision of the type, or for a complex or vector type the
> -   precision of the type of its elements.  */
> +/* Return the type, or for a complex or vector type the type of its
> +   elements.  */
>
> -unsigned int
> -element_precision (const_tree type)
> +tree
> +element_type (tree type)
> {
>   if (!TYPE_P (type))
>     type = TREE_TYPE (type);
> @@ -6658,7 +6658,16 @@ element_precision (const_tree type)
>   if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
>     type = TREE_TYPE (type);
>
> -  return TYPE_PRECISION (type);
> +  return const_cast<tree> (type);

The const_cast<> is redundant.

Sorry for not thinking about it before, but we should probably have
a test for the SLP case.  E.g.:

  for (int i = 0; i < n; i += 2)
    {
      int_type x_i0 = x[i];
      int_type x_i1 = x[i + 1];
      y[i] = (float_type) x_i1;
      y[i + 1] = (float_type) x_i0;
    }

(with a permute thrown in for good measure).  This will make sure that
the (separate) SLP group matching code handles the call correctly.

Thanks,
Richard