From: Richard Sandiford <richard.sandiford@arm.com>
To: "Andre Vieira \(lists\)" <andre.simoesdiasvieira@arm.com>
Cc: Richard Biener <rguenther@suse.de>,
"Andre Vieira \(lists\) via Gcc-patches"
<gcc-patches@gcc.gnu.org>
Subject: Re: [AArch64] Enable generation of FRINTNZ instructions
Date: Fri, 14 Jan 2022 10:37:35 +0000 [thread overview]
Message-ID: <mptmtjyr4b4.fsf@arm.com> (raw)
In-Reply-To: <5d7bb7af-b09e-cb91-b457-c6148f65028e@arm.com> (Andre Vieira's message of "Mon, 10 Jan 2022 14:09:04 +0000")
"Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com> writes:
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 19e89ae502bc2f51db64667b236c1cb669718b02..3b0e4e0875b4392ab6833568b207580ef597a98f 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6191,6 +6191,15 @@ operands; otherwise, it may not.
>
> This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
> +@item @samp{ftrunc@var{m}@var{n}2}
> +Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
> +the result in operand 0. Both operands have mode @var{m}, which is a scalar or
> +vector floating-point mode. An exception must be raised if operand 1 does not
> +fit in a @var{n} mode signed integer, as it would if the truncation were
> +performed via a separate floating-point to integer conversion.
> +
> +
Nit: just one blank line here.
> @cindex @code{round@var{m}2} instruction pattern
> @item @samp{round@var{m}2}
> Round operand 1 to the nearest integer, rounding away from zero in the
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 6095a35cd4565fdb7d758104e80fe6411230f758..a56bbb775572fa72379854f90a01ad543557e29a 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -2286,6 +2286,10 @@ Like @code{aarch64_sve_hw}, but also test for an exact hardware vector length.
> @item aarch64_fjcvtzs_hw
> AArch64 target that is able to generate and execute armv8.3-a FJCVTZS
> instruction.
> +
> +@item aarch64_frintnzx_ok
> +AArch64 target that is able to generate the Armv8.5-a FRINT32Z, FRINT64Z,
> +FRINT32X and FRINT64X instructions.
> @end table
>
> @subsubsection MIPS-specific attributes
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index b24102a5990bea4cbb102069f7a6df497fc81ebf..9047b486f41948059a7a7f1ccc4032410a369139 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -130,6 +130,7 @@ init_internal_fns ()
> #define fold_left_direct { 1, 1, false }
> #define mask_fold_left_direct { 1, 1, false }
> #define check_ptrs_direct { 0, 0, false }
> +#define ftrunc_int_direct { 0, 1, true }
>
> const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
> #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
> @@ -156,6 +157,29 @@ get_multi_vector_move (tree array_type, convert_optab optab)
> return convert_optab_handler (optab, imode, vmode);
> }
>
> +/* Expand FTRUNC_INT call STMT using optab OPTAB. */
> +
> +static void
> +expand_ftrunc_int_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +{
> + class expand_operand ops[2];
> + tree lhs, float_type, int_type;
> + rtx target, op;
> +
> + lhs = gimple_call_lhs (stmt);
> + target = expand_normal (lhs);
> + op = expand_normal (gimple_call_arg (stmt, 0));
> +
> + float_type = TREE_TYPE (lhs);
> + int_type = element_type (gimple_call_arg (stmt, 1));
Sorry for the run-around, but now that we don't (need to) vectorise
the second argument, I think we can drop this element_type. That in
turn means that…
> +
> + create_output_operand (&ops[0], target, TYPE_MODE (float_type));
> + create_input_operand (&ops[1], op, TYPE_MODE (float_type));
> +
> + expand_insn (convert_optab_handler (optab, TYPE_MODE (float_type),
> + TYPE_MODE (int_type)), 2, ops);
> +}
> +
> /* Expand LOAD_LANES call STMT using optab OPTAB. */
>
> static void
> @@ -3747,6 +3771,15 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
> != CODE_FOR_nothing);
> }
>
> +static bool
> +direct_ftrunc_int_optab_supported_p (convert_optab optab, tree_pair types,
> + optimization_type opt_type)
> +{
> + return (convert_optab_handler (optab, TYPE_MODE (types.first),
> + TYPE_MODE (element_type (types.second)),
> + opt_type) != CODE_FOR_nothing);
> +}
> +
…this can use convert_optab_supported_p.
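i.e. something like this (sketch, following the existing
*_optab_supported_p defines in internal-fn.c):

```c
/* ftrunc_int is a convert optab keyed off the floating-point mode and
   the scalar integer mode, so once the element_type call is gone the
   generic helper already does the right thing.  */
#define direct_ftrunc_int_optab_supported_p convert_optab_supported_p
```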
> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..008e1cf9f4a1b0148128c65c9ea0d1bb111467b7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> @@ -0,0 +1,91 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=armv8.5-a" } */
> +/* { dg-require-effective-target aarch64_frintnzx_ok } */
Is this just a cut-&-pasto from a run test? If not, why do we need both
this and the dg-options? It feels like one on its own should be enough,
with the dg-options being better.
The test looks OK without this line.
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +/*
> +** f1:
> +** frint32z s0, s0
> +** ret
> +*/
> +float
> +f1 (float x)
> +{
> + int y = x;
> + return (float) y;
> +}
> +
> +/*
> +** f2:
> +** frint64z s0, s0
> +** ret
> +*/
> +float
> +f2 (float x)
> +{
> + long long int y = x;
> + return (float) y;
> +}
> +
> +/*
> +** f3:
> +** frint32z d0, d0
> +** ret
> +*/
> +double
> +f3 (double x)
> +{
> + int y = x;
> + return (double) y;
> +}
> +
> +/*
> +** f4:
> +** frint64z d0, d0
> +** ret
> +*/
> +double
> +f4 (double x)
> +{
> + long long int y = x;
> + return (double) y;
> +}
> +
> +float
> +f1_dont (float x)
> +{
> + unsigned int y = x;
> + return (float) y;
> +}
> +
> +float
> +f2_dont (float x)
> +{
> + unsigned long long int y = x;
> + return (float) y;
> +}
> +
> +double
> +f3_dont (double x)
> +{
> + unsigned int y = x;
> + return (double) y;
> +}
> +
> +double
> +f4_dont (double x)
> +{
> + unsigned long long int y = x;
> + return (double) y;
> +}
> +
> +double
> +f5_dont (double x)
> +{
> + signed short y = x;
> + return (double) y;
> +}
> +
> +/* Make sure the 'dont's don't generate any frintNz. */
> +/* { dg-final { scan-assembler-times {frint32z} 2 } } */
> +/* { dg-final { scan-assembler-times {frint64z} 2 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..801d65ea8325cb680691286aab42747f43b90687
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.5-a" } */
> +/* { dg-require-effective-target aarch64_frintnzx_ok } */
Same here.
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#define TEST(name,float_type,int_type) \
> +void \
> +name (float_type * __restrict__ x, float_type * __restrict__ y, int n) \
> +{ \
> + for (int i = 0; i < n; ++i) \
> + { \
> + int_type x_i = x[i]; \
> + y[i] = (float_type) x_i; \
> + } \
> +}
> +
> +/*
> +** f1:
> +** ...
> +** frint32z v[0-9]+\.4s, v[0-9]+\.4s
> +** ...
> +*/
> +TEST(f1, float, int)
> +
> +/*
> +** f2:
> +** ...
> +** frint64z v[0-9]+\.4s, v[0-9]+\.4s
> +** ...
> +*/
> +TEST(f2, float, long long)
> +
> +/*
> +** f3:
> +** ...
> +** frint32z v[0-9]+\.2d, v[0-9]+\.2d
> +** ...
> +*/
> +TEST(f3, double, int)
> +
> +/*
> +** f4:
> +** ...
> +** frint64z v[0-9]+\.2d, v[0-9]+\.2d
> +** ...
> +*/
> +TEST(f4, double, long long)
> […]
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index f2625a2ff4089739326ce11785f1b68678c07f0e..435f2f4f5aeb2ed4c503c7b6a97d375634ae4514 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1625,7 +1625,8 @@ vect_finish_stmt_generation (vec_info *vinfo,
>
> static internal_fn
> vectorizable_internal_function (combined_fn cfn, tree fndecl,
> - tree vectype_out, tree vectype_in)
> + tree vectype_out, tree vectype_in,
> + tree *vectypes)
The new parameter should be described in the comment above the function.
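E.g. something along these lines (exact wording is just a suggestion):

```c
/* ...  VECTYPES, if nonnull, gives the vector types already chosen for
   each call argument; null entries default to VECTYPE_IN.  */
```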
> {
> internal_fn ifn;
> if (internal_fn_p (cfn))
> @@ -1637,8 +1638,12 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
> const direct_internal_fn_info &info = direct_internal_fn (ifn);
> if (info.vectorizable)
> {
> - tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
> - tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
> + tree type0 = (info.type0 < 0 ? vectype_out : vectypes[info.type0]);
> + if (!type0)
> + type0 = vectype_in;
> + tree type1 = (info.type1 < 0 ? vectype_out : vectypes[info.type1]);
> + if (!type1)
> + type1 = vectype_in;
> if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
> OPTIMIZE_FOR_SPEED))
> return ifn;
> @@ -3263,18 +3268,40 @@ vectorizable_call (vec_info *vinfo,
> rhs_type = unsigned_type_node;
> }
>
> - int mask_opno = -1;
> + /* The argument that is not of the same type as the others. */
> + int diff_opno = -1;
> + bool masked = false;
> if (internal_fn_p (cfn))
> - mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> + {
> + if (cfn == CFN_FTRUNC_INT)
> + /* For FTRUNC this represents the argument that carries the type of the
> + intermediate signed integer. */
> + diff_opno = 1;
> + else
> + {
> + /* For masked operations this represents the argument that carries the
> + mask. */
> + diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
> + masked = diff_opno >= 0;
Nit: excess space after “>=”.
> + }
> + }
I think it would be better to add a new flag to direct_internal_fn_info
to say whether type1 is scalar, rather than check based on function code.
type1 would then provide the value of diff_opno above.
Also, I think diff_opno should be separate from mask_opno.
Maybe scalar_opno would be a better name.
This would probably be simpler if we used:
internal_fn ifn = associated_internal_fn (cfn, lhs_type);
before the loop (with lhs_type being new), then used ifn to get the
direct_internal_fn_info and passed ifn to vectorizable_internal_function.
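Very roughly (untested, with type1_is_scalar_p standing in for whatever
name the new flag ends up with):

```c
  internal_fn ifn = associated_internal_fn (cfn, lhs_type);
  int mask_opno = -1, scalar_opno = -1;
  if (ifn != IFN_LAST && direct_internal_fn_p (ifn))
    {
      const direct_internal_fn_info &info = direct_internal_fn (ifn);
      if (info.type1_is_scalar_p)
	/* type1 then identifies the argument that stays scalar.  */
	scalar_opno = info.type1;
      mask_opno = internal_fn_mask_index (ifn);
    }
```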
> for (i = 0; i < nargs; i++)
> {
> - if ((int) i == mask_opno)
> + if ((int) i == diff_opno)
> {
> - if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_opno,
> - &op, &slp_op[i], &dt[i], &vectypes[i]))
> - return false;
> - continue;
> + if (masked)
> + {
> + if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node,
> + diff_opno, &op, &slp_op[i], &dt[i],
> + &vectypes[i]))
> + return false;
> + }
> + else
> + {
> + vectypes[i] = TREE_TYPE (gimple_call_arg (stmt, i));
> + continue;
> + }
> }
>
> if (!vect_is_simple_use (vinfo, stmt_info, slp_node,
> @@ -3286,27 +3313,30 @@ vectorizable_call (vec_info *vinfo,
> return false;
> }
>
> - /* We can only handle calls with arguments of the same type. */
> - if (rhs_type
> - && !types_compatible_p (rhs_type, TREE_TYPE (op)))
> + if ((int) i != diff_opno)
Is this ever false?  It looks like the continue above handles the other case.
> {
> - if (dump_enabled_p ())
> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> - "argument types differ.\n");
> - return false;
> - }
> - if (!rhs_type)
> - rhs_type = TREE_TYPE (op);
> + /* We can only handle calls with arguments of the same type. */
> + if (rhs_type
> + && !types_compatible_p (rhs_type, TREE_TYPE (op)))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "argument types differ.\n");
> + return false;
> + }
> + if (!rhs_type)
> + rhs_type = TREE_TYPE (op);
>
> - if (!vectype_in)
> - vectype_in = vectypes[i];
> - else if (vectypes[i]
> - && !types_compatible_p (vectypes[i], vectype_in))
> - {
> - if (dump_enabled_p ())
> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> - "argument vector types differ.\n");
> - return false;
> + if (!vectype_in)
> + vectype_in = vectypes[i];
> + else if (vectypes[i]
> + && !types_compatible_p (vectypes[i], vectype_in))
> + {
> + if (dump_enabled_p ())
> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> + "argument vector types differ.\n");
> + return false;
> + }
> }
> }
> /* If all arguments are external or constant defs, infer the vector type
> @@ -3382,8 +3412,8 @@ vectorizable_call (vec_info *vinfo,
> || (modifier == NARROW
> && simple_integer_narrowing (vectype_out, vectype_in,
> &convert_code))))
> - ifn = vectorizable_internal_function (cfn, callee, vectype_out,
> - vectype_in);
> + ifn = vectorizable_internal_function (cfn, callee, vectype_out, vectype_in,
> + &vectypes[0]);
>
> /* If that fails, try asking for a target-specific built-in function. */
> if (ifn == IFN_LAST)
> @@ -3461,7 +3491,7 @@ vectorizable_call (vec_info *vinfo,
>
> if (loop_vinfo
> && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> - && (reduc_idx >= 0 || mask_opno >= 0))
> + && (reduc_idx >= 0 || masked))
> {
> if (reduc_idx >= 0
> && (cond_fn == IFN_LAST
> @@ -3481,8 +3511,8 @@ vectorizable_call (vec_info *vinfo,
> ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node)
> : ncopies);
> tree scalar_mask = NULL_TREE;
> - if (mask_opno >= 0)
> - scalar_mask = gimple_call_arg (stmt_info->stmt, mask_opno);
> + if (masked)
> + scalar_mask = gimple_call_arg (stmt_info->stmt, diff_opno);
> vect_record_loop_mask (loop_vinfo, masks, nvectors,
> vectype_out, scalar_mask);
> }
> @@ -3547,7 +3577,7 @@ vectorizable_call (vec_info *vinfo,
> {
> /* We don't define any narrowing conditional functions
> at present. */
> - gcc_assert (mask_opno < 0);
> + gcc_assert (!masked);
> tree half_res = make_ssa_name (vectype_in);
> gcall *call
> = gimple_build_call_internal_vec (ifn, vargs);
> @@ -3567,16 +3597,16 @@ vectorizable_call (vec_info *vinfo,
> }
> else
> {
> - if (mask_opno >= 0 && masked_loop_p)
> + if (masked && masked_loop_p)
> {
> unsigned int vec_num = vec_oprnds0.length ();
> /* Always true for SLP. */
> gcc_assert (ncopies == 1);
> tree mask = vect_get_loop_mask (gsi, masks, vec_num,
> vectype_out, i);
> - vargs[mask_opno] = prepare_vec_mask
> + vargs[diff_opno] = prepare_vec_mask
> (loop_vinfo, TREE_TYPE (mask), mask,
> - vargs[mask_opno], gsi);
> + vargs[diff_opno], gsi);
> }
>
> gcall *call;
> @@ -3614,13 +3644,13 @@ vectorizable_call (vec_info *vinfo,
> if (masked_loop_p && reduc_idx >= 0)
> vargs[varg++] = vargs[reduc_idx + 1];
>
> - if (mask_opno >= 0 && masked_loop_p)
> + if (masked && masked_loop_p)
> {
> tree mask = vect_get_loop_mask (gsi, masks, ncopies,
> vectype_out, j);
> - vargs[mask_opno]
> + vargs[diff_opno]
> = prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask,
> - vargs[mask_opno], gsi);
> + vargs[diff_opno], gsi);
> }
>
> gimple *new_stmt;
> @@ -3639,7 +3669,7 @@ vectorizable_call (vec_info *vinfo,
> {
> /* We don't define any narrowing conditional functions at
> present. */
> - gcc_assert (mask_opno < 0);
> + gcc_assert (!masked);
> tree half_res = make_ssa_name (vectype_in);
> gcall *call = gimple_build_call_internal_vec (ifn, vargs);
> gimple_call_set_lhs (call, half_res);
> @@ -3683,7 +3713,7 @@ vectorizable_call (vec_info *vinfo,
> {
> auto_vec<vec<tree> > vec_defs (nargs);
> /* We don't define any narrowing conditional functions at present. */
> - gcc_assert (mask_opno < 0);
> + gcc_assert (!masked);
> for (j = 0; j < ncopies; ++j)
> {
> /* Build argument list for the vectorized call. */
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 318019c4dc5373271551f5d9a48dadb57a29d4a7..770d0ddfcc9a7acda01ed2fafa61eab0f1ba4cfa 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -6558,4 +6558,12 @@ extern unsigned fndecl_dealloc_argno (tree);
> object or pointer. Otherwise return null. */
> extern tree get_attr_nonstring_decl (tree, tree * = NULL);
>
> +/* Return the type, or for a complex or vector type the type of its
> + elements. */
> +extern tree element_type (tree);
> +
> +/* Return the precision of the type, or for a complex or vector type the
> + precision of the type of its elements. */
> +extern unsigned int element_precision (const_tree);
> +
> #endif /* GCC_TREE_H */
> diff --git a/gcc/tree.c b/gcc/tree.c
> index d98b77db50b29b22dc9af1f98cd86044f62af019..81e66dd710ce6bc237f508655cfb437b40ec0bfa 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -6646,11 +6646,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
> return true;
> }
>
> -/* Return the precision of the type, or for a complex or vector type the
> - precision of the type of its elements. */
> +/* Return the type, or for a complex or vector type the type of its
> + elements. */
>
> -unsigned int
> -element_precision (const_tree type)
> +tree
> +element_type (tree type)
> {
> if (!TYPE_P (type))
> type = TREE_TYPE (type);
> @@ -6658,7 +6658,16 @@ element_precision (const_tree type)
> if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
> type = TREE_TYPE (type);
>
> - return TYPE_PRECISION (type);
> + return const_cast<tree> (type);
The const_cast<> is redundant.
Sorry for not thinking about it before, but we should probably have
a test for the SLP case. E.g.:
for (int i = 0; i < n; i += 2)
{
int_type x_i0 = x[i];
int_type x_i1 = x[i + 1];
y[i] = (float_type) x_i1;
y[i + 1] = (float_type) x_i0;
}
(with a permute thrown in for good measure). This will make sure
that the (separate) SLP group matching code handles the call correctly.
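Something like the following (untested) would do, reusing the TEST macro
style from frintnz_vec.c with the two lanes swapped:

```c
#define TEST_REV(name, float_type, int_type)				\
void									\
name (float_type *__restrict__ x, float_type *__restrict__ y, int n)	\
{									\
  for (int i = 0; i < n; i += 2)					\
    {									\
      int_type x_i0 = x[i];						\
      int_type x_i1 = x[i + 1];						\
      y[i] = (float_type) x_i1;						\
      y[i + 1] = (float_type) x_i0;					\
    }									\
}

/* The swapped lanes force the vectoriser down the SLP path.  */
TEST_REV (f1_rev, float, int)
```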
Thanks,
Richard