From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 16 Nov 2022 12:25:06 +0000 (UTC)
From: Richard Biener
To: Richard Sandiford
cc: "Andre Vieira (lists)", "Andre Vieira (lists) via Gcc-patches"
Subject: Re: [AArch64] Enable generation of FRINTNZ instructions
In-Reply-To:
Message-ID:
References: <8225375c-eb9e-f9b3-6bcd-9fbccf2fc87b@arm.com>
	<70s9nn94-452-5rrr-4458-q6n3qp563652@fhfr.qr>
	<36e3469a-3922-d49e-4006-0088eac29157@arm.com>
	<653o8886-3p5n-sr82-9n83-71q33o8824@fhfr.qr>
	<6c730f35-10b1-2843-cffc-4ed0851380be@arm.com>
	<85sr96q-o3s-602o-3436-40713n68pp84@fhfr.qr>
	<8d593d5f-41a0-6051-0ce0-d72834ecfa25@arm.com>
	<9d3755df-6c41-20e4-31fb-811e5cd9182a@arm.com>
	<231396s0-2756-q51s-q55-o8npqo91on32@fhfr.qr>
	<5d7bb7af-b09e-cb91-b457-c6148f65028e@arm.com>
	<051c70ff-7c59-bbfe-7780-4cb2380d9a93@arm.com>
	<1cdffce8-e0e5-3304-52b9-3b736a4d380d@arm.com>
User-Agent: Alpine 2.22 (LSU 394 2020-01-19)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="-1609957120-1461803241-1668601506=:3995"
List-Id:

This message is in MIME format. The first part should be readable text,
while the remaining parts are likely unreadable without MIME-aware tools.

---1609957120-1461803241-1668601506=:3995
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8BIT

On Tue, 15 Nov 2022, Richard Sandiford wrote:

> "Andre Vieira (lists)" writes:
> > On 07/11/2022 11:05, Richard Biener wrote:
> >> On Fri, 4 Nov 2022, Andre Vieira (lists) wrote:
> >>
> >>> Sorry for the delay, just been reminded I still had this patch outstanding
> >>> from last stage 1. Hopefully since it has been mostly reviewed it could go in
> >>> for this stage 1?
> >>>
> >>> I addressed the comments and gave the slp-part of vectorizable_call some TLC
> >>> to make it work.
> >>>
> >>> I also changed vect_get_slp_defs as I noticed that the call from
> >>> vectorizable_call was creating an auto_vec with 'nargs' that might be less
> >>> than the number of children in the slp_node
> >> how so?
> >> Please fix that in the caller. It looks like it probably
> >> should use vect_nargs instead?
> > Well that was my first intuition, but when I looked at it further the
> > variant it's calling:
> > void vect_get_slp_defs (vec_info *, slp_tree slp_node, vec<vec<tree> >
> > *vec_oprnds, unsigned n)
> >
> > Is actually creating a vector of vectors of slp defs. So for each child
> > of slp_node it calls:
> > void vect_get_slp_defs (slp_tree slp_node, vec<tree> *vec_defs)
> >
> > Which returns a vector of vectorized defs. So vect_nargs would be the
> > right size for the inner vec of vec_defs, but the outer should
> > have the same number of elements as the original slp_node has children.
> >
> > However, at the call site (vectorizable_call), the operand we pass to
> > vect_get_slp_defs 'vec_defs', is initialized before the code-path is
> > specialized for slp_node. I'll go see if I can change the call site to
> > not have to do that, given the continue at the end of the if (slp_node)
> > BB I don't think it needs to use vec_defs after it, but it may require
> > some massaging to be able to define it separately for each code-path.
> >
> >>
> >>> , so that quick_push might not be
> >>> safe as is, so I added the reserve (n) to ensure it's safe to push. I didn't
> >>> actually come across any failure because of it though. Happy to split this
> >>> into a separate patch if needed.
> >>>
> >>> Bootstrapped and regression tested on aarch64-none-linux-gnu and
> >>> x86_64-pc-linux-gnu.
> >>>
> >>> OK for trunk?
> >> I'll leave final approval to Richard but
> >>
> >> -     This only needs 1 bit, but occupies the full 16 to ensure a nice
> >> +     This only needs 1 bit, but occupies the full 15 to ensure a nice
> >>       layout. */
> >>    unsigned int vectorizable : 16;
> >>
> >> you don't actually change the width of the bitfield.
> >> I would find
> >> it more natural to have
> >>
> >>   signed int type0 : 7;
> >>   signed int type0_vtrans : 1;
> >>   signed int type1 : 7;
> >>   signed int type1_vtrans : 1;
> >>
> >> with typeN_vtrans specifying how the types transform when vectorized.
> >> I would imagine another variant we could need is narrow/widen
> >> according to either result or other argument type? That said,
> >> just your flag would then be
> >>
> >>   signed int type0 : 7;
> >>   signed int pad : 1;
> >>   signed int type1 : 7;
> >>   signed int type1_vect_as_scalar : 1;
> >>
> >> ?
> > That's a cool idea! I'll leave it as a single bit for now like that, if
> > we want to re-use it for multiple transformations we will obviously need
> > to rename & give it more bits.
>
> I think we should steal bits from vectorizable rather than shrink
> type0 and type1 though. Then add a 14-bit padding field to show
> how many bits are left.
>
> > @@ -3340,9 +3364,20 @@ vectorizable_call (vec_info *vinfo,
> >        rhs_type = unsigned_type_node;
> >      }
> >
> > +  /* The argument that is not of the same type as the others. */
> >    int mask_opno = -1;
> > +  int scalar_opno = -1;
> >    if (internal_fn_p (cfn))
> > -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> > +    {
> > +      internal_fn ifn = as_internal_fn (cfn);
> > +      if (direct_internal_fn_p (ifn)
> > +          && direct_internal_fn (ifn).type1_is_scalar_p)
> > +        scalar_opno = direct_internal_fn (ifn).type1;
> > +      else
> > +        /* For masked operations this represents the argument that carries the
> > +           mask. */
> > +        mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
>
> This doesn't seem logically like an else. We should do both.
>
> LGTM otherwise for the bits outside match.pd. If Richard's happy with
> the match.pd bits then I think the patch is OK with those changes and
> without the vect_get_slp_defs thing (as you mentioned downthread).

Yes, the match.pd part looked OK.
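
For anyone following along, here is a minimal, self-contained sketch of how
the two points quoted above might fit together: taking the new flag out of
the bits of vectorizable (with a 14-bit padding field), and computing
scalar_opno and mask_opno independently instead of in an if/else. This is
not the patch. The names direct_fn_info_sketch, operand_indices,
find_special_operands and sketch_self_check are hypothetical, the mask index
is passed in as a plain int in place of a call to internal_fn_mask_index,
and the 4-byte size assumes the usual bit-field packing of common ABIs.

```cpp
#include <cassert>

// Hypothetical stand-in for direct_internal_fn_info; the real struct in
// internal-fn.h may differ.  The flag and padding take their bits from the
// 16 that vectorizable used to occupy, so type0 and type1 keep 8 bits each.
struct direct_fn_info_sketch
{
  signed int type0 : 8;
  signed int type1 : 8;
  unsigned int vectorizable : 1;
  unsigned int type1_is_scalar_p : 1;
  unsigned int padding : 14;	// Shows how many bits are left.
};

// Assumption: adjacent bit-fields are packed into one 32-bit unit.
static_assert (sizeof (direct_fn_info_sketch) == 4,
	       "sketch assumes the fields pack into 32 bits");

// Hypothetical result type; -1 means "no such operand".
struct operand_indices
{
  int mask_opno;
  int scalar_opno;
};

// MASK_INDEX stands in for the result of internal_fn_mask_index
// (-1 when the function has no mask operand).
operand_indices
find_special_operands (const direct_fn_info_sketch &info, int mask_index)
{
  operand_indices result = { -1, -1 };

  // The operand that stays scalar when the call is vectorized.
  if (info.type1_is_scalar_p)
    result.scalar_opno = info.type1;

  // Deliberately not an else: the mask operand is looked up regardless.
  result.mask_opno = mask_index;

  return result;
}

// Small self-check exercising both lookups at once.
void
sketch_self_check ()
{
  direct_fn_info_sketch info = { 0, 1, 1, 1, 0 };
  operand_indices r = find_special_operands (info, 2);
  assert (r.scalar_opno == 1);
  assert (r.mask_opno == 2);
}
```

With the two lookups decoupled, a function that had both a scalar operand
and a mask operand would get both indices set, which the if/else form in the
quoted hunk cannot express.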
> Thanks,
> Richard
>
> >
> >>
> >>> gcc/ChangeLog:
> >>>
> >>>         * config/aarch64/aarch64.md (ftrunc2): New
> >>> pattern.
> >>>         * config/aarch64/iterators.md (FRINTNZ): New iterator.
> >>>         (frintnz_mode): New int attribute.
> >>>         (VSFDF): Make iterator conditional.
> >>>         * internal-fn.def (FTRUNC_INT): New IFN.
> >>>         * internal-fn.cc (ftrunc_int_direct): New define.
> >>>         (expand_ftrunc_int_optab_fn): New custom expander.
> >>>         (direct_ftrunc_int_optab_supported_p): New supported_p.
> >>>         * internal-fn.h (direct_internal_fn_info): Add new member
> >>>         type1_is_scalar_p.
> >>>         * match.pd: Add to the existing TRUNC pattern match.
> >>>         * optabs.def (ftrunc_int): New entry.
> >>>         * stor-layout.h (element_precision): Moved from here...
> >>>         * tree.h (element_precision): ... to here.
> >>>         (element_type): New declaration.
> >>>         * tree.cc (element_type): New function.
> >>>         (element_precision): Changed to use element_type.
> >>>         * tree-vect-stmts.cc (vectorizable_internal_function): Add
> >>> support for
> >>>         IFNs with different input types.
> >>>         (vect_get_scalar_oprnds): New function.
> >>>         (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
> >>>         * tree-vect-slp.cc (check_scalar_arg_ok): New function.
> >>>         (vect_slp_analyze_node_operations): Use check_scalar_arg_ok.
> >>>         (vect_get_slp_defs): Ensure vec_oprnds has enough slots to push.
> >>>         * doc/md.texi: New entry for ftrunc pattern name.
> >>>         * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.
> >>>
> >>> gcc/testsuite/ChangeLog:
> >>>
> >>>         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintnz
> >>> instructions available.
> >>>         * lib/target-supports.exp: Added aarch64_frintnzx_ok target and
> >>> aarch64_frintz options.
> >>>         * gcc.target/aarch64/frintnz.c: New test.
> >>>         * gcc.target/aarch64/frintnz_vec.c: New test.
> >>>         * gcc.target/aarch64/frintnz_slp.c: New test.
> >>>
>

-- 
Richard Biener
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
---1609957120-1461803241-1668601506=:3995--