From: Richard Sandiford
To: "Andre Vieira (lists)"
Cc: Richard Biener, "Andre Vieira (lists) via Gcc-patches"
Subject: Re: [AArch64] Enable generation of FRINTNZ instructions
Date: Fri, 14 Jan 2022 10:37:35 +0000
In-Reply-To: <5d7bb7af-b09e-cb91-b457-c6148f65028e@arm.com> (Andre Vieira's message of "Mon, 10 Jan 2022 14:09:04 +0000")
"Andre Vieira (lists)" writes:

> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 19e89ae502bc2f51db64667b236c1cb669718b02..3b0e4e0875b4392ab6833568b207580ef597a98f 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6191,6 +6191,15 @@ operands; otherwise, it may not.
>
>  This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
> +@item @samp{ftrunc@var{m}@var{n}2}
> +Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
> +the result in operand 0.  Both operands have mode @var{m}, which is a scalar or
> +vector floating-point mode.  An exception must be raised if operand 1 does not
> +fit in a @var{n} mode signed integer as it would have if the truncation
> +happened through separate floating point to integer conversion.
> +
> +

Nit: just one blank line here.

> @cindex @code{round@var{m}2} instruction pattern
> @item @samp{round@var{m}2}
> Round operand 1 to the nearest integer, rounding away from zero in the
> diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
> index 6095a35cd4565fdb7d758104e80fe6411230f758..a56bbb775572fa72379854f90a01ad543557e29a 100644
> --- a/gcc/doc/sourcebuild.texi
> +++ b/gcc/doc/sourcebuild.texi
> @@ -2286,6 +2286,10 @@ Like @code{aarch64_sve_hw}, but also test for an exact hardware vector length.
>  @item aarch64_fjcvtzs_hw
>  AArch64 target that is able to generate and execute armv8.3-a FJCVTZS
>  instruction.
> +
> +@item aarch64_frintzx_ok
> +AArch64 target that is able to generate the Armv8.5-a FRINT32Z, FRINT64Z,
> +FRINT32X and FRINT64X instructions.
> @end table
>
> @subsubsection MIPS-specific attributes
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index b24102a5990bea4cbb102069f7a6df497fc81ebf..9047b486f41948059a7a7f1ccc4032410a369139 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -130,6 +130,7 @@ init_internal_fns ()
>  #define fold_left_direct { 1, 1, false }
>  #define mask_fold_left_direct { 1, 1, false }
>  #define check_ptrs_direct { 0, 0, false }
> +#define ftrunc_int_direct { 0, 1, true }
>
>  const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
>  #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
> @@ -156,6 +157,29 @@ get_multi_vector_move (tree array_type, convert_optab optab)
>    return convert_optab_handler (optab, imode, vmode);
>  }
>
> +/* Expand FTRUNC_INT call STMT using optab OPTAB.  */
> +
> +static void
> +expand_ftrunc_int_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
> +{
> +  class expand_operand ops[2];
> +  tree lhs, float_type, int_type;
> +  rtx target, op;
> +
> +  lhs = gimple_call_lhs (stmt);
> +  target = expand_normal (lhs);
> +  op = expand_normal (gimple_call_arg (stmt, 0));
> +
> +  float_type = TREE_TYPE (lhs);
> +  int_type = element_type (gimple_call_arg (stmt, 1));

Sorry for the run-around, but now that we don't (need to) vectorise
the second argument, I think we can drop this element_type.  That in
turn means that…

> +
> +  create_output_operand (&ops[0], target, TYPE_MODE (float_type));
> +  create_input_operand (&ops[1], op, TYPE_MODE (float_type));
> +
> +  expand_insn (convert_optab_handler (optab, TYPE_MODE (float_type),
> +				      TYPE_MODE (int_type)), 2, ops);
> +}
> +
> /* Expand LOAD_LANES call STMT using optab OPTAB.  */
>
>  static void
> @@ -3747,6 +3771,15 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
> 	  != CODE_FOR_nothing);
>  }
>
> +static bool
> +direct_ftrunc_int_optab_supported_p (convert_optab optab, tree_pair types,
> +				     optimization_type opt_type)
> +{
> +  return (convert_optab_handler (optab, TYPE_MODE (types.first),
> +				 TYPE_MODE (element_type (types.second)),
> +				 opt_type) != CODE_FOR_nothing);
> +}
> +

…this can use convert_optab_supported_p.

> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz.c b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..008e1cf9f4a1b0148128c65c9ea0d1bb111467b7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz.c
> @@ -0,0 +1,91 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=armv8.5-a" } */
> +/* { dg-require-effective-target aarch64_frintnzx_ok } */

Is this just a cut-&-pasto from a run test?  If not, why do we need
both this and the dg-options?  It feels like one on its own should be
enough, with the dg-options being better.

The test looks OK without this line.
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +/*
> +** f1:
> +**	frint32z	s0, s0
> +**	ret
> +*/
> +float
> +f1 (float x)
> +{
> +  int y = x;
> +  return (float) y;
> +}
> +
> +/*
> +** f2:
> +**	frint64z	s0, s0
> +**	ret
> +*/
> +float
> +f2 (float x)
> +{
> +  long long int y = x;
> +  return (float) y;
> +}
> +
> +/*
> +** f3:
> +**	frint32z	d0, d0
> +**	ret
> +*/
> +double
> +f3 (double x)
> +{
> +  int y = x;
> +  return (double) y;
> +}
> +
> +/*
> +** f4:
> +**	frint64z	d0, d0
> +**	ret
> +*/
> +double
> +f4 (double x)
> +{
> +  long long int y = x;
> +  return (double) y;
> +}
> +
> +float
> +f1_dont (float x)
> +{
> +  unsigned int y = x;
> +  return (float) y;
> +}
> +
> +float
> +f2_dont (float x)
> +{
> +  unsigned long long int y = x;
> +  return (float) y;
> +}
> +
> +double
> +f3_dont (double x)
> +{
> +  unsigned int y = x;
> +  return (double) y;
> +}
> +
> +double
> +f4_dont (double x)
> +{
> +  unsigned long long int y = x;
> +  return (double) y;
> +}
> +
> +double
> +f5_dont (double x)
> +{
> +  signed short y = x;
> +  return (double) y;
> +}
> +
> +/* Make sure the 'dont's don't generate any frintNz.  */
> +/* { dg-final { scan-assembler-times {frint32z} 2 } } */
> +/* { dg-final { scan-assembler-times {frint64z} 2 } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..801d65ea8325cb680691286aab42747f43b90687
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.5-a" } */
> +/* { dg-require-effective-target aarch64_frintnzx_ok } */

Same here.
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#define TEST(name,float_type,int_type)					\
> +void									\
> +name (float_type * __restrict__ x, float_type * __restrict__ y, int n)	\
> +{									\
> +  for (int i = 0; i < n; ++i)						\
> +    {									\
> +      int_type x_i = x[i];						\
> +      y[i] = (float_type) x_i;						\
> +    }									\
> +}
> +
> +/*
> +** f1:
> +**	...
> +**	frint32z	v[0-9]+\.4s, v[0-9]+\.4s
> +**	...
> +*/
> +TEST(f1, float, int)
> +
> +/*
> +** f2:
> +**	...
> +**	frint64z	v[0-9]+\.4s, v[0-9]+\.4s
> +**	...
> +*/
> +TEST(f2, float, long long)
> +
> +/*
> +** f3:
> +**	...
> +**	frint32z	v[0-9]+\.2d, v[0-9]+\.2d
> +**	...
> +*/
> +TEST(f3, double, int)
> +
> +/*
> +** f4:
> +**	...
> +**	frint64z	v[0-9]+\.2d, v[0-9]+\.2d
> +**	...
> +*/
> +TEST(f4, double, long long)
> […]
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index f2625a2ff4089739326ce11785f1b68678c07f0e..435f2f4f5aeb2ed4c503c7b6a97d375634ae4514 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1625,7 +1625,8 @@ vect_finish_stmt_generation (vec_info *vinfo,
>
>  static internal_fn
>  vectorizable_internal_function (combined_fn cfn, tree fndecl,
> -				tree vectype_out, tree vectype_in)
> +				tree vectype_out, tree vectype_in,
> +				tree *vectypes)

Should be described in the comment above the function.

> {
>   internal_fn ifn;
>   if (internal_fn_p (cfn))
> @@ -1637,8 +1638,12 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
>        const direct_internal_fn_info &info = direct_internal_fn (ifn);
>        if (info.vectorizable)
> 	{
> -	  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
> -	  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
> +	  tree type0 = (info.type0 < 0 ? vectype_out : vectypes[info.type0]);
> +	  if (!type0)
> +	    type0 = vectype_in;
> +	  tree type1 = (info.type1 < 0 ? vectype_out : vectypes[info.type1]);
> +	  if (!type1)
> +	    type1 = vectype_in;
> 	  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
> 					      OPTIMIZE_FOR_SPEED))
> 	    return ifn;
> @@ -3263,18 +3268,40 @@ vectorizable_call (vec_info *vinfo,
> 	  rhs_type = unsigned_type_node;
> 	}
>
> -  int mask_opno = -1;
> +  /* The argument that is not of the same type as the others.  */
> +  int diff_opno = -1;
> +  bool masked = false;
>    if (internal_fn_p (cfn))
> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +    {
> +      if (cfn == CFN_FTRUNC_INT)
> +	/* For FTRUNC this represents the argument that carries the type of the
> +	   intermediate signed integer.  */
> +	diff_opno = 1;
> +      else
> +	{
> +	  /* For masked operations this represents the argument that carries the
> +	     mask.  */
> +	  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +	  masked = diff_opno >=  0;

Nit: excess space after “>=”.

> +	}
> +    }

I think it would be better to add a new flag to direct_internal_fn_info
to say whether type1 is scalar, rather than check based on function code.
type1 would then provide the value of diff_opno above.

Also, I think diff_opno should be separate from mask_opno.
Maybe scalar_opno would be a better name.

This would probably be simpler if we used:

  internal_fn ifn = associated_internal_fn (cfn, lhs_type);

before the loop (with lhs_type being new), then used ifn to get the
direct_internal_fn_info and passed ifn to vectorizable_internal_function.
>    for (i = 0; i < nargs; i++)
>      {
> -      if ((int) i == mask_opno)
> +      if ((int) i == diff_opno)
> 	{
> -	  if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_opno,
> -				       &op, &slp_op[i], &dt[i], &vectypes[i]))
> -	    return false;
> -	  continue;
> +	  if (masked)
> +	    {
> +	      if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node,
> +					   diff_opno, &op, &slp_op[i], &dt[i],
> +					   &vectypes[i]))
> +		return false;
> +	    }
> +	  else
> +	    {
> +	      vectypes[i] = TREE_TYPE (gimple_call_arg (stmt, i));
> +	      continue;
> +	    }
> 	}
>
>        if (!vect_is_simple_use (vinfo, stmt_info, slp_node,
> @@ -3286,27 +3313,30 @@ vectorizable_call (vec_info *vinfo,
> 	  return false;
> 	}
>
> -      /* We can only handle calls with arguments of the same type.  */
> -      if (rhs_type
> -	  && !types_compatible_p (rhs_type, TREE_TYPE (op)))
> +      if ((int) i != diff_opno)

Is this ever false?  It looks like the continue above handles
the other case.

> 	{
> -	  if (dump_enabled_p ())
> -	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -			     "argument types differ.\n");
> -	  return false;
> -	}
> -      if (!rhs_type)
> -	rhs_type = TREE_TYPE (op);
> +	  /* We can only handle calls with arguments of the same type.  */
> +	  if (rhs_type
> +	      && !types_compatible_p (rhs_type, TREE_TYPE (op)))
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				 "argument types differ.\n");
> +	      return false;
> +	    }
> +	  if (!rhs_type)
> +	    rhs_type = TREE_TYPE (op);
>
> -      if (!vectype_in)
> -	vectype_in = vectypes[i];
> -      else if (vectypes[i]
> -	       && !types_compatible_p (vectypes[i], vectype_in))
> -	{
> -	  if (dump_enabled_p ())
> -	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -			     "argument vector types differ.\n");
> -	  return false;
> +	  if (!vectype_in)
> +	    vectype_in = vectypes[i];
> +	  else if (vectypes[i]
> +		   && !types_compatible_p (vectypes[i], vectype_in))
> +	    {
> +	      if (dump_enabled_p ())
> +		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +				 "argument vector types differ.\n");
> +	      return false;
> +	    }
> 	}
>      }
>    /* If all arguments are external or constant defs, infer the vector type
> @@ -3382,8 +3412,8 @@ vectorizable_call (vec_info *vinfo,
> 	  || (modifier == NARROW
> 	      && simple_integer_narrowing (vectype_out, vectype_in,
> 					   &convert_code))))
> -    ifn = vectorizable_internal_function (cfn, callee, vectype_out,
> -					  vectype_in);
> +    ifn = vectorizable_internal_function (cfn, callee, vectype_out, vectype_in,
> +					  &vectypes[0]);
>
>    /* If that fails, try asking for a target-specific built-in function.  */
>    if (ifn == IFN_LAST)
> @@ -3461,7 +3491,7 @@ vectorizable_call (vec_info *vinfo,
>
>    if (loop_vinfo
>        && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> -      && (reduc_idx >= 0 || mask_opno >= 0))
> +      && (reduc_idx >= 0 || masked))
>      {
>        if (reduc_idx >= 0
> 	  && (cond_fn == IFN_LAST
> @@ -3481,8 +3511,8 @@ vectorizable_call (vec_info *vinfo,
> 		       ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node)
> 		       : ncopies);
>        tree scalar_mask = NULL_TREE;
> -      if (mask_opno >= 0)
> -	scalar_mask = gimple_call_arg (stmt_info->stmt, mask_opno);
> +      if (masked)
> +	scalar_mask = gimple_call_arg (stmt_info->stmt, diff_opno);
>        vect_record_loop_mask (loop_vinfo, masks, nvectors,
> 			     vectype_out, scalar_mask);
>      }
> @@ -3547,7 +3577,7 @@ vectorizable_call (vec_info *vinfo,
> 	    {
> 	      /* We don't define any narrowing conditional functions
> 		 at present.  */
> -	      gcc_assert (mask_opno < 0);
> +	      gcc_assert (!masked);
> 	      tree half_res = make_ssa_name (vectype_in);
> 	      gcall *call
> 		= gimple_build_call_internal_vec (ifn, vargs);
> @@ -3567,16 +3597,16 @@ vectorizable_call (vec_info *vinfo,
> 	    }
> 	  else
> 	    {
> -	      if (mask_opno >= 0 && masked_loop_p)
> +	      if (masked && masked_loop_p)
> 		{
> 		  unsigned int vec_num = vec_oprnds0.length ();
> 		  /* Always true for SLP.  */
> 		  gcc_assert (ncopies == 1);
> 		  tree mask = vect_get_loop_mask (gsi, masks, vec_num,
> 						  vectype_out, i);
> -		  vargs[mask_opno] = prepare_vec_mask
> +		  vargs[diff_opno] = prepare_vec_mask
> 		    (loop_vinfo, TREE_TYPE (mask), mask,
> -		     vargs[mask_opno], gsi);
> +		     vargs[diff_opno], gsi);
> 		}
>
> 	      gcall *call;
> @@ -3614,13 +3644,13 @@ vectorizable_call (vec_info *vinfo,
> 	  if (masked_loop_p && reduc_idx >= 0)
> 	    vargs[varg++] = vargs[reduc_idx + 1];
>
> -	  if (mask_opno >= 0 && masked_loop_p)
> +	  if (masked && masked_loop_p)
> 	    {
> 	      tree mask = vect_get_loop_mask (gsi, masks, ncopies,
> 					      vectype_out, j);
> -	      vargs[mask_opno]
> +	      vargs[diff_opno]
> 		= prepare_vec_mask (loop_vinfo, TREE_TYPE (mask), mask,
> -				    vargs[mask_opno], gsi);
> +				    vargs[diff_opno], gsi);
> 	    }
>
> 	  gimple *new_stmt;
> @@ -3639,7 +3669,7 @@ vectorizable_call (vec_info *vinfo,
> 	    {
> 	      /* We don't define any narrowing conditional functions at
> 		 present.  */
> -	      gcc_assert (mask_opno < 0);
> +	      gcc_assert (!masked);
> 	      tree half_res = make_ssa_name (vectype_in);
> 	      gcall *call = gimple_build_call_internal_vec (ifn, vargs);
> 	      gimple_call_set_lhs (call, half_res);
> @@ -3683,7 +3713,7 @@ vectorizable_call (vec_info *vinfo,
>      {
>        auto_vec<vec<tree>> vec_defs (nargs);
>        /* We don't define any narrowing conditional functions at present.  */
> -      gcc_assert (mask_opno < 0);
> +      gcc_assert (!masked);
>        for (j = 0; j < ncopies; ++j)
> 	{
> 	  /* Build argument list for the vectorized call.  */
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 318019c4dc5373271551f5d9a48dadb57a29d4a7..770d0ddfcc9a7acda01ed2fafa61eab0f1ba4cfa 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -6558,4 +6558,12 @@ extern unsigned fndecl_dealloc_argno (tree);
>    object or pointer.  Otherwise return null.  */
>  extern tree get_attr_nonstring_decl (tree, tree * = NULL);
>
> +/* Return the type, or for a complex or vector type the type of its
> +   elements.  */
> +extern tree element_type (tree);
> +
> +/* Return the precision of the type, or for a complex or vector type the
> +   precision of the type of its elements.  */
> +extern unsigned int element_precision (const_tree);
> +
>  #endif /* GCC_TREE_H */
> diff --git a/gcc/tree.c b/gcc/tree.c
> index d98b77db50b29b22dc9af1f98cd86044f62af019..81e66dd710ce6bc237f508655cfb437b40ec0bfa 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -6646,11 +6646,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
>    return true;
>  }
>
> -/* Return the precision of the type, or for a complex or vector type the
> -   precision of the type of its elements.  */
> +/* Return the type, or for a complex or vector type the type of its
> +   elements.  */
>
> -unsigned int
> -element_precision (const_tree type)
> +tree
> +element_type (tree type)
> {
>   if (!TYPE_P (type))
>     type = TREE_TYPE (type);
> @@ -6658,7 +6658,16 @@ element_precision (const_tree type)
>   if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
>     type = TREE_TYPE (type);
>
> -  return TYPE_PRECISION (type);
> +  return const_cast<tree> (type);

The const_cast<> is redundant.

Sorry for not thinking about it before, but we should probably have
a test for the SLP case.  E.g.:

  for (int i = 0; i < n; i += 2)
    {
      int_type x_i0 = x[i];
      int_type x_i1 = x[i + 1];
      y[i] = (float_type) x_i1;
      y[i + 1] = (float_type) x_i0;
    }

(with a permute thrown in for good measure).  This will make sure that
the (separate) SLP group matching code handles the call correctly.

Thanks,
Richard