From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id E048C3858C53 for ; Mon, 15 May 2023 10:47:17 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E048C3858C53 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 388292F4; Mon, 15 May 2023 03:48:02 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id D00B73F67D; Mon, 15 May 2023 03:47:16 -0700 (PDT) From: Richard Sandiford To: Richard Biener Mail-Followup-To: Richard Biener ,"Andre Vieira \(lists\)" , Richard Biener , "gcc-patches\@gcc.gnu.org" , richard.sandiford@arm.com Cc: "Andre Vieira \(lists\)" , Richard Biener , "gcc-patches\@gcc.gnu.org" Subject: Re: [PATCH 2/3] Refactor widen_plus as internal_fn References: <51ce8969-3130-452e-092e-f9d91eff2dad@arm.com> <4df83136-82f9-de0b-7e66-007b9047174d@arm.com> Date: Mon, 15 May 2023 11:47:15 +0100 In-Reply-To: (Richard Biener's message of "Mon, 15 May 2023 10:20:54 +0000 (UTC)") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Spam-Status: No, score=-23.3 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Richard Biener writes: > On Fri, 12 May 2023, Richard Sandiford wrote: > >> Richard Biener writes: >> > On Fri, 12 May 2023, Andre Vieira (lists) wrote: >> > >> >> I have dealt with, I think..., most of your comments. There's quite a few >> >> changes, I think it's all a bit simpler now. I made some other changes to the >> >> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve >> >> the same behaviour as we had with the tree codes before. Also added some extra >> >> checks to tree-cfg.cc that made sense to me. >> >> >> >> I am still regression testing the gimple-range-op change, as that was a last >> >> minute change, but the rest survived a bootstrap and regression test on >> >> aarch64-unknown-linux-gnu. >> >> >> >> cover letter: >> >> >> >> This patch replaces the existing tree_code widen_plus and widen_minus >> >> patterns with internal_fn versions. >> >> >> >> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN >> >> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively >> >> except they provide convenience wrappers for defining conversions that require >> >> a hi/lo split. Each definition for will require optabs for _hi and _lo >> >> and each of those will also require a signed and unsigned version in the case >> >> of widening. The hi/lo pair is necessary because the widening and narrowing >> >> operations take n narrow elements as inputs and return n/2 wide elements as >> >> outputs. The 'lo' operation operates on the first n/2 elements of input. The >> >> 'hi' operation operates on the second n/2 elements of input. Defining an >> >> internal_fn along with hi/lo variations allows a single internal function to >> >> be returned from a vect_recog function that will later be expanded to hi/lo. >> >> >> >> >> >> For example: >> >> IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO >> >> for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_add_hi_ -> >> >> (u/s)addl2 >> >> IFN_VEC_WIDEN_PLUS_LO -> vec_widen_add_lo_ >> >> -> (u/s)addl >> >> >> >> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree >> >> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI. >> > >> > What I still don't understand is how we are so narrowly focused on >> > HI/LO? We need a combined scalar IFN for pattern selection (not >> > sure why that's now called _HILO, I expected no suffix). Then there's >> > three possibilities the target can implement this: >> > >> > 1) with a widen_[su]add instruction - I _think_ that's what >> > RISCV is going to offer since it is a target where vector modes >> > have "padding" (aka you cannot subreg a V2SI to get V4HI). Instead >> > RVV can do a V4HI to V4SI widening and widening add/subtract >> > using vwadd[u] and vwsub[u] (the HI->SI widening is actually >> > done with a widening add of zero - eh). >> > IIRC GCN is the same here. >> >> SVE currently does this too, but the addition and widening are >> separate operations. E.g. in principle there's no reason why >> you can't sign-extend one operand, zero-extend the other, and >> then add the result together. Or you could extend them from >> different sizes (QI and HI). All of those are supported >> (if the costing allows them). > > I see. So why does the target the expose widen_[su]add at all? It shouldn't (need to) do that. I don't think we should have an optab for the unsplit operation. At least on SVE, we really want the extensions to be fused with loads (where possible) rather than with arithmetic. We can still do the widening arithmetic in one go. It's just that fusing with the loads works for the mixed-sign and mixed-size cases, and can handle more than just doubling the element size. >> If the target has operations to do combined extending and adding (or >> whatever), then at the moment we rely on combine to generate them. >> >> So I think this case is separate from Andre's work. The addition >> itself is just an ordinary addition, and any widening happens by >> vectorising a CONVERT/NOP_EXPR. >> >> > 2) with a widen_[su]add{_lo,_hi} combo - that's what the tree >> > codes currently support (exclusively) >> > 3) similar, but widen_[su]add{_even,_odd} >> > >> > that said, things like decomposes_to_hilo_fn_p look to paint us into >> > a 2) corner without good reason. >> >> I suppose one question is: how much of the patch is really specific >> to HI/LO, and how much is just grouping two halves together? > > Yep, that I don't know for sure. > >> The nice >> thing about the internal-fn grouping macros is that, if (3) is >> implemented in future, the structure will strongly encourage even/odd >> pairs to be supported for all operations that support hi/lo. That is, >> I would expect the grouping macros to be extended to define even/odd >> ifns alongside hi/lo ones, rather than adding separate definitions >> for even/odd functions. >> >> If so, at least from the internal-fn.* side of things, I think the question >> is whether it's OK to stick with hilo names for now, or whether we should >> use more forward-looking names. > > I think for parts that are independent we could use a more > forward-looking name. Maybe _halves? Using _halves for the ifn macros sounds good to me FWIW. > But I'm also not sure > how much of that is really needed (it seems to be tied around > optimizing optabs space?) Not sure what you mean by "this". Optabs space shouldn't be a problem though. The optab encoding gives us a full int to play with, and it could easily go up to 64 bits if necessary/convenient. At least on the internal-fn.* side, the aim is really just to establish a regular structure, so that we don't have arbitrary differences between different widening operations, or too much cut-&-paste. Thanks, Richard