From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 48DA93858D28 for ; Fri, 17 Dec 2021 12:44:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 48DA93858D28 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id CFD351435; Fri, 17 Dec 2021 04:44:27 -0800 (PST) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.88]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 3CA223F5A1; Fri, 17 Dec 2021 04:44:27 -0800 (PST) From: Richard Sandiford To: "Andre Vieira \(lists\) via Gcc-patches" Mail-Followup-To: "Andre Vieira \(lists\) via Gcc-patches" , Richard Biener , "Andre Vieira \(lists\)" , richard.sandiford@arm.com Cc: Richard Biener , "Andre Vieira \(lists\)" Subject: Re: [AArch64] Enable generation of FRINTNZ instructions References: <8225375c-eb9e-f9b3-6bcd-9fbccf2fc87b@arm.com> <70s9nn94-452-5rrr-4458-q6n3qp563652@fhfr.qr> <36e3469a-3922-d49e-4006-0088eac29157@arm.com> <653o8886-3p5n-sr82-9n83-71q33o8824@fhfr.qr> <6c730f35-10b1-2843-cffc-4ed0851380be@arm.com> <85sr96q-o3s-602o-3436-40713n68pp84@fhfr.qr> <8d593d5f-41a0-6051-0ce0-d72834ecfa25@arm.com> Date: Fri, 17 Dec 2021 12:44:25 +0000 In-Reply-To: (Andre Vieira via Gcc-patches's message of "Thu, 25 Nov 2021 13:53:08 +0000") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_LOTSOFHASH, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 17 Dec 2021 12:44:32 -0000 "Andre Vieira (lists) via Gcc-patches" writes: > On 22/11/2021 11:41, Richard Biener wrote: >> >>> On 18/11/2021 11:05, Richard Biener wrote: >>>> This is a good shout and made me think about something I hadn't before= ... I >>>> thought I could handle the vector forms later, but the problem is if I= add >>>> support for the scalar, it will stop the vectorizer. It seems >>>> vectorizable_call expects all arguments to have the same type, which d= oesn't >>>> work with passing the integer type as an operand work around. >> We already special case some IFNs there (masked load/store and gather) >> to ignore some args, so that would just add to this set. >> >> Richard. > Hi, > > Reworked it to add support of the new IFN to the vectorizer. Was=20 > initially trying to make vectorizable_call and=20 > vectorizable_internal_function handle IFNs with different inputs more=20 > generically, using the information we have in the _direct structs=20 > regarding what operands to get the modes from. Unfortunately, that=20 > wasn't straightforward because of how vectorizable_call assumes operands= =20 > have the same type and uses the type of the DEF_STMT_INFO of the=20 > non-constant operands (either output operand or non-constant inputs) to=20 > determine the type of constants. I assume there is some reason why we=20 > use the DEF_STMT_INFO and not always use get_vectype_for_scalar_type on=20 > the argument types. That is why I ended up with this sort of half-way=20 > mix of both, which still allows room to add more IFNs that don't take=20 > inputs of the same type, but require adding a bit of special casing=20 > similar to the IFN_FTRUNC_INT and masking ones. > > Bootstrapped on aarch64-none-linux. Still leaving the match.pd stuff to Richard, but otherwise: > index bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..51f00344b02d0d1d4adf97463= f6a46f9fd0fb43f 100644 > --- a/gcc/config/aarch64/iterators.md > +++ b/gcc/config/aarch64/iterators.md > @@ -160,7 +160,11 @@ (define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD= _F16INST") > SF DF]) >=20=20 > ;; Scalar and vetor modes for SF, DF. > -(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF]) > +(define_mode_iterator VSFDF [ (V2SF "TARGET_SIMD") Nit: excess space between [ and (. > + (V4SF "TARGET_SIMD") > + (V2DF "TARGET_SIMD") > + (DF "TARGET_FLOAT") > + (SF "TARGET_FLOAT")]) >=20=20 > ;; Advanced SIMD single Float modes. > (define_mode_iterator VDQSF [V2SF V4SF]) > [=E2=80=A6] > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi > index 41f1850bf6e95005647ca97a495a97d7e184d137..d50d09b0ae60d98537b9aece4= 396a490f33f174c 100644 > --- a/gcc/doc/md.texi > +++ b/gcc/doc/md.texi > @@ -6175,6 +6175,15 @@ operands; otherwise, it may not. >=20=20 > This pattern is not allowed to @code{FAIL}. >=20=20 > +@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern > +@item @samp{ftrunc@var{m}@var{n}2} > +Truncate operand 1 to a @var{n} mode signed integer, towards zero, and s= tore > +the result in operand 0. Both operands have mode @var{m}, which is a sca= lar or > +vector floating-point mode. Exception must be thrown if operand 1 does = not fit Maybe =E2=80=9CAn exception must be raised=E2=80=9D? =E2=80=9Cthrown=E2=80= =9D makes it sound like a signal must be raised or C++ exception thrown. > +in a @var{n} mode signed integer as it would have if the truncation happ= ened > +through separate floating point to integer conversion. > + > + > @cindex @code{round@var{m}2} instruction pattern > @item @samp{round@var{m}2} > Round operand 1 to the nearest integer, rounding away from zero in the > [=E2=80=A6] > @@ -3688,6 +3712,15 @@ multi_vector_optab_supported_p (convert_optab opta= b, tree_pair types, > !=3D CODE_FOR_nothing); > } >=20=20 > +static bool direct_ftrunc_int_optab_supported_p (convert_optab optab, > + tree_pair types, > + optimization_type opt_type) Formatting nit: should be a line break after =E2=80=9Cbool=E2=80=9D > +{ > + return (convert_optab_handler (optab, TYPE_MODE (types.first), > + TYPE_MODE (element_type (types.second)), > + opt_type) !=3D CODE_FOR_nothing); > +} > + > #define direct_unary_optab_supported_p direct_optab_supported_p > #define direct_binary_optab_supported_p direct_optab_supported_p > #define direct_ternary_optab_supported_p direct_optab_supported_p > [=E2=80=A6] > diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c b/gcc/testsui= te/gcc.target/aarch64/frintnz_vec.c > new file mode 100644 > index 0000000000000000000000000000000000000000..b93304eb2acb3d3d954eebee5= 1d77ff23fee68ac > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c > @@ -0,0 +1,47 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -march=3Darmv8.5-a" } */ > +/* { dg-require-effective-target aarch64_frintnzx_ok } */ > +/* { dg-final { check-function-bodies "**" "" } } */ > + > +#define TEST(name,float_type,int_type) \ > +void \ > +name (float_type * __restrict__ x, float_type * __restrict__ y, int n) \ > +{ \ > + for (int i =3D 0; i < n; ++i) \ > + { \ > + int_type x_i =3D x[i]; \ > + y[i] =3D (float_type) x_i; \ > + } \ > +} > + > +/* > +** f1: > +** ... > +** frint32z v0.4s, v0.4s I don't think we can rely on v0 being used here. v[0-9]+\.4s would be safer. > +** ... > +*/ > +TEST(f1, float, int) > + > +/* > +** f2: > +** ... > +** frint64z v0.4s, v0.4s > +** ... > +*/ > +TEST(f2, float, long long) > + > +/* > +** f3: > +** ... > +** frint32z v0.2d, v0.2d > +** ... > +*/ > +TEST(f3, double, int) > + > +/* > +** f4: > +** ... > +** frint64z v0.2d, v0.2d > +** ... > +*/ > +TEST(f4, double, long long) > [=E2=80=A6] > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c > index 03cc7267cf80d4ce73c0d89ab86b07e84752456a..35bb1f70f7b173ad0d1e9f70c= e0ac9da891dbe62 100644 > --- a/gcc/tree-vect-stmts.c > +++ b/gcc/tree-vect-stmts.c > @@ -1625,7 +1625,8 @@ vect_finish_stmt_generation (vec_info *vinfo, >=20=20 > static internal_fn > vectorizable_internal_function (combined_fn cfn, tree fndecl, > - tree vectype_out, tree vectype_in) > + tree vectype_out, tree vectype_in, > + tree *vectypes) > { > internal_fn ifn; > if (internal_fn_p (cfn)) > @@ -1637,8 +1638,12 @@ vectorizable_internal_function (combined_fn cfn, t= ree fndecl, > const direct_internal_fn_info &info =3D direct_internal_fn (ifn); > if (info.vectorizable) > { > - tree type0 =3D (info.type0 < 0 ? vectype_out : vectype_in); > - tree type1 =3D (info.type1 < 0 ? vectype_out : vectype_in); > + tree type0 =3D (info.type0 < 0 ? vectype_out : vectypes[info.type0]); > + if (!type0) > + type0 =3D vectype_in; > + tree type1 =3D (info.type1 < 0 ? vectype_out : vectypes[info.type1]); > + if (!type1) > + type1 =3D vectype_in; > if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1), > OPTIMIZE_FOR_SPEED)) > return ifn; > @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo, > rhs_type =3D unsigned_type_node; > } >=20=20 > - int mask_opno =3D -1; > + /* The argument that is not of the same type as the others. */ > + int diff_opno =3D -1; > + bool masked =3D false; > if (internal_fn_p (cfn)) > - mask_opno =3D internal_fn_mask_index (as_internal_fn (cfn)); > + { > + if (cfn =3D=3D CFN_FTRUNC_INT) > + /* For FTRUNC this represents the argument that carries the type of the > + intermediate signed integer. */ > + diff_opno =3D 1; > + else > + { > + /* For masked operations this represents the argument that carries the > + mask. */ > + diff_opno =3D internal_fn_mask_index (as_internal_fn (cfn)); > + masked =3D diff_opno >=3D 0; > + } > + } I think it would be cleaner not to process argument 1 at all for CFN_FTRUNC_INT. There's no particular need to vectorise it. > for (i =3D 0; i < nargs; i++) > { > - if ((int) i =3D=3D mask_opno) > + if ((int) i =3D=3D diff_opno && masked) > { > - if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_opno, > - &op, &slp_op[i], &dt[i], &vectypes[i])) > + if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node, > + diff_opno, &op, &slp_op[i], &dt[i], > + &vectypes[i])) > return false; > continue; > } > [=E2=80=A6] > diff --git a/gcc/tree.c b/gcc/tree.c > index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e= 2480ed5522f78c1 100644 > --- a/gcc/tree.c > +++ b/gcc/tree.c > @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, cst_size_= error *perr /* =3D NULL */) > return true; > } >=20=20 > -/* Return the precision of the type, or for a complex or vector type the > - precision of the type of its elements. */ > +/* Return the type, or for a complex or vector type the type of its > + elements. */ >=20=20 > -unsigned int > -element_precision (const_tree type) > +tree > +element_type (const_tree type) > { > if (!TYPE_P (type)) > type =3D TREE_TYPE (type); > @@ -6657,7 +6657,16 @@ element_precision (const_tree type) > if (code =3D=3D COMPLEX_TYPE || code =3D=3D VECTOR_TYPE) > type =3D TREE_TYPE (type); >=20=20 > - return TYPE_PRECISION (type); > + return (tree) type; I think we should stick a const_cast in element_precision and make element_type take a plain =E2=80=9Ctype=E2=80=9D. As it stands element_typ= e is an implicit const_cast for many cases. Thanks, Richard > +} > + > +/* Return the precision of the type, or for a complex or vector type the > + precision of the type of its elements. */ > + > +unsigned int > +element_precision (const_tree type) > +{ > + return TYPE_PRECISION (element_type (type)); > } >=20=20 > /* Return true if CODE represents an associative tree code. Otherwise