From: Richard Sandiford <richard.sandiford@arm.com>
To: "Andre Vieira (lists)"
Cc: Richard Biener, "Andre Vieira (lists) via Gcc-patches"
Subject: Re: [AArch64] Enable generation of FRINTNZ instructions
Date: Tue, 15 Nov 2022 18:24:18 +0000
In-Reply-To: <1cdffce8-e0e5-3304-52b9-3b736a4d380d@arm.com> (Andre Vieira's
 message of "Mon, 7 Nov 2022 14:19:09 +0000")

"Andre Vieira (lists)" writes:
> On 07/11/2022 11:05, Richard Biener wrote:
>> On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:
>>
>>> Sorry for the delay, I've just been reminded I still had this patch
>>> outstanding from last stage 1.  Hopefully, since it has been mostly
>>> reviewed, it could go in for this stage 1?
>>>
>>> I addressed the comments and gave the slp part of vectorizable_call
>>> some TLC to make it work.
>>>
>>> I also changed vect_get_slp_defs as I noticed that the call from
>>> vectorizable_call was creating an auto_vec with 'nargs' that might
>>> be less than the number of children in the slp_node
>> how so?  Please fix that in the caller.  It looks like it probably
>> should use vect_nargs instead?
> Well, that was my first intuition too, but when I looked at it
> further, the variant it's calling:
>
>   void vect_get_slp_defs (vec_info *, slp_tree slp_node,
>                           vec<vec<tree> > *vec_oprnds, unsigned n)
>
> is actually creating a vector of vectors of slp defs.  So for each
> child of slp_node it calls:
>
>   void vect_get_slp_defs (slp_tree slp_node, vec<tree> *vec_defs)
>
> which returns a vector of vectorized defs.  So vect_nargs would be
> the right size for the inner vec<tree> of vec_defs, but the outer vec
> should have the same number of elements as the original slp_node has
> children.
>
> However, at the call site (vectorizable_call), the operand we pass to
> vect_get_slp_defs, 'vec_defs', is initialized before the code path is
> specialized for slp_node.  I'll go and see if I can change the call
> site so it doesn't have to do that; given the continue at the end of
> the if (slp_node) BB, I don't think it needs to use vec_defs after
> it, but it may require some massaging to be able to define it
> separately for each code path.
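(For reference, a minimal untested sketch of that call-site change,
assuming the outer vector should simply be created with one slot per
SLP child; the variable names here are only illustrative:)

  /* Hypothetical fix in vectorizable_call: size the outer vector by
     the number of SLP children, so that the quick_push calls inside
     vect_get_slp_defs stay within the reservation.  */
  unsigned int nchildren = SLP_TREE_CHILDREN (slp_node).length ();
  vec<vec<tree> > vec_defs = vNULL;
  vec_defs.create (nchildren);
  vect_get_slp_defs (vinfo, slp_node, &vec_defs, nchildren);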
>
>>
>>> , so that quick_push might not be safe as is, so I added the
>>> reserve (n) to ensure it's safe to push.  I didn't actually come
>>> across any failure because of it though.  Happy to split this into
>>> a separate patch if needed.
>>>
>>> Bootstrapped and regression tested on aarch64-none-linux-gnu and
>>> x86_64-pc-linux-gnu.
>>>
>>> OK for trunk?
>> I'll leave final approval to Richard but
>>
>> -     This only needs 1 bit, but occupies the full 16 to ensure a nice
>> +     This only needs 1 bit, but occupies the full 15 to ensure a nice
>>        layout.  */
>>     unsigned int vectorizable : 16;
>>
>> you don't actually change the width of the bitfield.  I would find
>> it more natural to have
>>
>>   signed int type0 : 7;
>>   signed int type0_vtrans : 1;
>>   signed int type1 : 7;
>>   signed int type1_vtrans : 1;
>>
>> with typeN_vtrans specifying how the types transform when vectorized.
>> I would imagine another variant we could need is narrow/widen
>> according to either the result or the other argument type?  That
>> said, just your flag would then be
>>
>>   signed int type0 : 7;
>>   signed int pad : 1;
>>   signed int type1 : 7;
>>   signed int type1_vect_as_scalar : 1;
>>
>> ?
> That's a cool idea!  I'll leave it as a single bit for now like that;
> if we want to re-use it for multiple transformations we will
> obviously need to rename it & give it more bits.

I think we should steal bits from vectorizable rather than shrink
type0 and type1 though.  Then add a 14-bit padding field to show how
many bits are left.

> @@ -3340,9 +3364,20 @@ vectorizable_call (vec_info *vinfo,
>        rhs_type = unsigned_type_node;
>      }
>
> +  /* The argument that is not of the same type as the others.  */
>    int mask_opno = -1;
> +  int scalar_opno = -1;
>    if (internal_fn_p (cfn))
> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +    {
> +      internal_fn ifn = as_internal_fn (cfn);
> +      if (direct_internal_fn_p (ifn)
> +	  && direct_internal_fn (ifn).type1_is_scalar_p)
> +	scalar_opno = direct_internal_fn (ifn).type1;
> +      else
> +	/* For masked operations this represents the argument that
> +	   carries the mask.  */
> +	mask_opno = internal_fn_mask_index (as_internal_fn (cfn));

This doesn't seem logically like an else.  We should do both.
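Something like the following (an untested sketch, reusing the names
from the hunk above) is what I mean by doing both:

  int mask_opno = -1;
  int scalar_opno = -1;
  if (internal_fn_p (cfn))
    {
      internal_fn ifn = as_internal_fn (cfn);
      /* For masked operations this is the argument that carries the
	 mask; internal_fn_mask_index returns -1 if there is none.  */
      mask_opno = internal_fn_mask_index (ifn);
      /* Independently of the mask, record any argument that stays
	 scalar when the call is vectorized.  */
      if (direct_internal_fn_p (ifn)
	  && direct_internal_fn (ifn).type1_is_scalar_p)
	scalar_opno = direct_internal_fn (ifn).type1;
    }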
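And to illustrate the direct_internal_fn_info layout suggested
earlier, an untested sketch, keeping the flag name from your patch:

  struct direct_internal_fn_info
  {
    signed int type0 : 8;
    signed int type1 : 8;
    /* This only needs 1 bit; the explicit padding field shows how
       many bits are left while preserving the nice 32-bit layout.  */
    unsigned int vectorizable : 1;
    unsigned int type1_is_scalar_p : 1;
    unsigned int padding : 14;
  };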
LGTM otherwise for the bits outside match.pd.  If Richard's happy
with the match.pd bits then I think the patch is OK with those
changes and without the vect_get_slp_defs thing (as you mentioned
downthread).

Thanks,
Richard

>>
>>> gcc/ChangeLog:
>>>
>>> 	* config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New
>>> 	pattern.
>>> 	* config/aarch64/iterators.md (FRINTNZ): New iterator.
>>> 	(frintnz_mode): New int attribute.
>>> 	(VSFDF): Make iterator conditional.
>>> 	* internal-fn.def (FTRUNC_INT): New IFN.
>>> 	* internal-fn.cc (ftrunc_int_direct): New define.
>>> 	(expand_ftrunc_int_optab_fn): New custom expander.
>>> 	(direct_ftrunc_int_optab_supported_p): New supported_p.
>>> 	* internal-fn.h (direct_internal_fn_info): Add new member
>>> 	type1_is_scalar_p.
>>> 	* match.pd: Add to the existing TRUNC pattern match.
>>> 	* optabs.def (ftrunc_int): New entry.
>>> 	* stor-layout.h (element_precision): Moved from here...
>>> 	* tree.h (element_precision): ... to here.
>>> 	(element_type): New declaration.
>>> 	* tree.cc (element_type): New function.
>>> 	(element_precision): Changed to use element_type.
>>> 	* tree-vect-stmts.cc (vectorizable_internal_function): Add
>>> 	support for IFNs with different input types.
>>> 	(vect_get_scalar_oprnds): New function.
>>> 	(vectorizable_call): Teach to handle IFN_FTRUNC_INT.
>>> 	* tree-vect-slp.cc (check_scalar_arg_ok): New function.
>>> 	(vect_slp_analyze_node_operations): Use check_scalar_arg_ok.
>>> 	(vect_get_slp_defs): Ensure vec_oprnds has enough slots to
>>> 	push.
>>> 	* doc/md.texi: New entry for ftrunc pattern name.
>>> 	* doc/sourcebuild.texi (aarch64_frintnzx_ok): New target.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 	* gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintnz
>>> 	instructions are available.
>>> 	* lib/target-supports.exp: Added aarch64_frintnzx_ok target and
>>> 	aarch64_frintz options.
>>> 	* gcc.target/aarch64/frintnz.c: New test.
>>> 	* gcc.target/aarch64/frintnz_vec.c: New test.
>>> 	* gcc.target/aarch64/frintnz_slp.c: New test.