From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 5FD8F3858401 for ; Mon, 7 Nov 2022 14:19:15 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5FD8F3858401 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 134A91FB; Mon, 7 Nov 2022 06:19:21 -0800 (PST) Received: from [10.1.33.159] (E121495.Arm.com [10.1.33.159]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 74FE33F73D; Mon, 7 Nov 2022 06:19:14 -0800 (PST) Message-ID: <1cdffce8-e0e5-3304-52b9-3b736a4d380d@arm.com> Date: Mon, 7 Nov 2022 14:19:09 +0000 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.4.0 Subject: Re: [AArch64] Enable generation of FRINTNZ instructions Content-Language: en-US To: Richard Biener Cc: "Andre Vieira (lists) via Gcc-patches" , richard.sandiford@arm.com References: <8225375c-eb9e-f9b3-6bcd-9fbccf2fc87b@arm.com> <70s9nn94-452-5rrr-4458-q6n3qp563652@fhfr.qr> <36e3469a-3922-d49e-4006-0088eac29157@arm.com> <653o8886-3p5n-sr82-9n83-71q33o8824@fhfr.qr> <6c730f35-10b1-2843-cffc-4ed0851380be@arm.com> <85sr96q-o3s-602o-3436-40713n68pp84@fhfr.qr> <8d593d5f-41a0-6051-0ce0-d72834ecfa25@arm.com> <9d3755df-6c41-20e4-31fb-811e5cd9182a@arm.com> <231396s0-2756-q51s-q55-o8npqo91on32@fhfr.qr> <5d7bb7af-b09e-cb91-b457-c6148f65028e@arm.com> <051c70ff-7c59-bbfe-7780-4cb2380d9a93@arm.com> From: "Andre Vieira (lists)" In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00,BODY_8BITS,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,NICE_REPLY_A,SPF_HELO_NONE,SPF_NONE,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 07/11/2022 11:05, Richard Biener wrote: > On Fri, 4 Nov 2022, Andre Vieira (lists) wrote: > >> Sorry for the delay, just been reminded I still had this patch outstanding >> from last stage 1. Hopefully since it has been mostly reviewed it could go in >> for this stage 1? >> >> I addressed the comments and gave the slp-part of vectorizable_call some TLC >> to make it work. >> >> I also changed vect_get_slp_defs as I noticed that the call from >> vectorizable_call was creating an auto_vec with 'nargs' that might be less >> than the number of children in the slp_node > how so? Please fix that in the caller. It looks like it probably > shoud use vect_nargs instead? Well that was my first intuition, but when I looked at it further the variant it's calling: void vect_get_slp_defs (vec_info *, slp_tree slp_node, vec > *vec_oprnds, unsigned n) Is actually creating a vector of vectors of slp defs. So for each child of slp_node it calls: void vect_get_slp_defs (slp_tree slp_node, vec *vec_defs) Which returns a vector of vectorized defs. So vect_nargs would be the right size for the inner vec of vec_defs, but the outer should have the same number of elements as the original slp_node has children. However, at the call site (vectorizable_call), the operand we pass to vect_get_slp_defs 'vec_defs', is initialized before the code-path is specialized for slp_node. I'll go see if I can change the call site to not have to do that, given the continue at the end of the if (slp_node) BB I don't think it needs to use vec_defs after it, but it may require some massaging to be able to define it separately for each code-path. > >> , so that quick_push might not be >> safe as is, so I added the reserve (n) to ensure it's safe to push. I didn't >> actually come across any failure because of it though. Happy to split this >> into a separate patch if needed. >> >> Bootstrapped and regression tested on aarch64-none-linux-gnu and >> x86_64-pc-linux-gnu. >> >> OK for trunk? > I'll leave final approval to Richard but > > - This only needs 1 bit, but occupies the full 16 to ensure a nice > + This only needs 1 bit, but occupies the full 15 to ensure a nice > layout. */ > unsigned int vectorizable : 16; > > you don't actually change the width of the bitfield. I would find > it more natural to have > > signed int type0 : 7; > signed int type0_vtrans : 1; > signed int type1 : 7; > signed int type1_vtrans : 1; > > with typeN_vtrans specifying how the types transform when vectorized. > I would imagine another variant we could need is narrow/widen > according to either result or other argument type? That said, > just your flag would then be > > signed int type0 : 7; > signed int pad : 1; > signed int type1 : 7; > signed int type1_vect_as_scalar : 1; > > ? That's a cool idea! I'll leave it as a single bit for now like that, if we want to re-use it for multiple transformations we will obviously need to rename & give it more bits. > >> gcc/ChangeLog: >> >>         * config/aarch64/aarch64.md (ftrunc2): New >> pattern. >>         * config/aarch64/iterators.md (FRINTNZ): New iterator. >>         (frintnz_mode): New int attribute. >>         (VSFDF): Make iterator conditional. >>         * internal-fn.def (FTRUNC_INT): New IFN. >>         * internal-fn.cc (ftrunc_int_direct): New define. >>         (expand_ftrunc_int_optab_fn): New custom expander. >>         (direct_ftrunc_int_optab_supported_p): New supported_p. >>         * internal-fn.h (direct_internal_fn_info): Add new member >>         type1_is_scalar_p. >>         * match.pd: Add to the existing TRUNC pattern match. >>         * optabs.def (ftrunc_int): New entry. >>         * stor-layout.h (element_precision): Moved from here... >>         * tree.h (element_precision): ... to here. >>         (element_type): New declaration. >>         * tree.cc (element_type): New function. >>         (element_precision): Changed to use element_type. >>         * tree-vect-stmts.cc (vectorizable_internal_function): Add >> support for >>         IFNs with different input types. >>         (vect_get_scalar_oprnds): New function. >>         (vectorizable_call): Teach to handle IFN_FTRUNC_INT. >>         * tree-vect-slp.cc (check_scalar_arg_ok): New function. >>         (vect_slp_analyze_node_operations): Use check_scalar_arg_ok. >>         (vect_get_slp_defs): Ensure vec_oprnds has enough slots to push. >>         * doc/md.texi: New entry for ftrunc pattern name. >>         * doc/sourcebuild.texi (aarch64_frintzx_ok): New target. >> >> gcc/testsuite/ChangeLog: >> >>         * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintnz >> instructions available. >>         * lib/target-supports.exp: Added aarch64_frintnzx_ok target and >> aarch64_frintz options. >>         * gcc.target/aarch64/frintnz.c: New test. >>         * gcc.target/aarch64/frintnz_vec.c: New test. >>         * gcc.target/aarch64/frintnz_slp.c: New test. >>