From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by sourceware.org (Postfix) with ESMTPS id 1E9C63858C31 for ; Mon, 15 May 2023 12:21:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 1E9C63858C31 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out2.suse.de (Postfix) with ESMTP id 29D5F1F8BA; Mon, 15 May 2023 12:21:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1684153315; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=0oIvx1cti+IxiTlrhs8/Q68knJ4XyNMNTtDEkQgMRTo=; b=V+OGn6fzzIFm2YCJxZj+GtZ3BrpgW/fn81xgva+PPBDb1VAPE1qcosH8Xvid5ssbkGXJEK bDabKO6I9g0fIjPlGaV6T4pbdxwIC+T2TM6g5VG3ygtP1y0Q/0QLG8ZBM1eSjSONH/4PkF khQVRKDsq77dG9iGUCGFq9iPz5lz48U= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1684153315; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=0oIvx1cti+IxiTlrhs8/Q68knJ4XyNMNTtDEkQgMRTo=; b=wpcGJKbxz7n6NoQ1JvdjYiz7ZeeKHCzVmR/RUaXlrevqFoyY7+yA3SvSPT2A0TryqXq6hd Qf6GyOyu3wto8EBQ== Received: from wotan.suse.de (wotan.suse.de [10.160.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id 167D82C141; Mon, 15 May 2023 12:21:54 +0000 (UTC) Date: Mon, 15 May 2023 12:21:54 +0000 (UTC) From: Richard Biener To: "Andre Vieira (lists)" cc: Richard Sandiford , Richard Biener , "gcc-patches@gcc.gnu.org" Subject: Re: [PATCH 2/3] Refactor widen_plus as internal_fn In-Reply-To: <37abc128-776c-0f04-f755-30e514909e15@arm.com> Message-ID: References: <51ce8969-3130-452e-092e-f9d91eff2dad@arm.com> <4df83136-82f9-de0b-7e66-007b9047174d@arm.com> <37abc128-776c-0f04-f755-30e514909e15@arm.com> User-Agent: Alpine 2.22 (LSU 394 2020-01-19) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-Spam-Status: No, score=-5.0 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Mon, 15 May 2023, Andre Vieira (lists) wrote: > > > On 15/05/2023 12:01, Richard Biener wrote: > > On Mon, 15 May 2023, Richard Sandiford wrote: > > > >> Richard Biener writes: > >>> On Fri, 12 May 2023, Richard Sandiford wrote: > >>> > >>>> Richard Biener writes: > >>>>> On Fri, 12 May 2023, Andre Vieira (lists) wrote: > >>>>> > >>>>>> I have dealt with, I think..., most of your comments. There's quite a > >>>>>> few > >>>>>> changes, I think it's all a bit simpler now. I made some other changes > >>>>>> to the > >>>>>> costing in tree-inline.cc and gimple-range-op.cc in which I try to > >>>>>> preserve > >>>>>> the same behaviour as we had with the tree codes before. Also added > >>>>>> some extra > >>>>>> checks to tree-cfg.cc that made sense to me. > >>>>>> > >>>>>> I am still regression testing the gimple-range-op change, as that was a > >>>>>> last > >>>>>> minute change, but the rest survived a bootstrap and regression test on > >>>>>> aarch64-unknown-linux-gnu. > >>>>>> > >>>>>> cover letter: > >>>>>> > >>>>>> This patch replaces the existing tree_code widen_plus and widen_minus > >>>>>> patterns with internal_fn versions. > >>>>>> > >>>>>> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and > >>>>>> DEF_INTERNAL_OPTAB_NARROWING_HILO_FN > >>>>>> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN > >>>>>> respectively > >>>>>> except they provide convenience wrappers for defining conversions that > >>>>>> require > >>>>>> a hi/lo split. Each definition for will require optabs for _hi > >>>>>> and _lo > >>>>>> and each of those will also require a signed and unsigned version in > >>>>>> the case > >>>>>> of widening. The hi/lo pair is necessary because the widening and > >>>>>> narrowing > >>>>>> operations take n narrow elements as inputs and return n/2 wide > >>>>>> elements as > >>>>>> outputs. The 'lo' operation operates on the first n/2 elements of > >>>>>> input. The > >>>>>> 'hi' operation operates on the second n/2 elements of input. Defining > >>>>>> an > >>>>>> internal_fn along with hi/lo variations allows a single internal > >>>>>> function to > >>>>>> be returned from a vect_recog function that will later be expanded to > >>>>>> hi/lo. > >>>>>> > >>>>>> > >>>>>> For example: > >>>>>> IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO > >>>>>> for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_add_hi_ -> > >>>>>> (u/s)addl2 > >>>>>> IFN_VEC_WIDEN_PLUS_LO -> > >>>>>> vec_widen_add_lo_ > >>>>>> -> (u/s)addl > >>>>>> > >>>>>> This gives the same functionality as the previous > >>>>>> WIDEN_PLUS/WIDEN_MINUS tree > >>>>>> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI. > >>>>> > >>>>> What I still don't understand is how we are so narrowly focused on > >>>>> HI/LO? We need a combined scalar IFN for pattern selection (not > >>>>> sure why that's now called _HILO, I expected no suffix). Then there's > >>>>> three possibilities the target can implement this: > >>>>> > >>>>> 1) with a widen_[su]add instruction - I _think_ that's what > >>>>> RISCV is going to offer since it is a target where vector modes > >>>>> have "padding" (aka you cannot subreg a V2SI to get V4HI). Instead > >>>>> RVV can do a V4HI to V4SI widening and widening add/subtract > >>>>> using vwadd[u] and vwsub[u] (the HI->SI widening is actually > >>>>> done with a widening add of zero - eh). > >>>>> IIRC GCN is the same here. > >>>> > >>>> SVE currently does this too, but the addition and widening are > >>>> separate operations. E.g. in principle there's no reason why > >>>> you can't sign-extend one operand, zero-extend the other, and > >>>> then add the result together. Or you could extend them from > >>>> different sizes (QI and HI). All of those are supported > >>>> (if the costing allows them). > >>> > >>> I see. So why does the target the expose widen_[su]add at all? > >> > >> It shouldn't (need to) do that. I don't think we should have an optab > >> for the unsplit operation. > >> > >> At least on SVE, we really want the extensions to be fused with loads > >> (where possible) rather than with arithmetic. > >> > >> We can still do the widening arithmetic in one go. It's just that > >> fusing with the loads works for the mixed-sign and mixed-size cases, > >> and can handle more than just doubling the element size. > >> > >>>> If the target has operations to do combined extending and adding (or > >>>> whatever), then at the moment we rely on combine to generate them. > >>>> > >>>> So I think this case is separate from Andre's work. The addition > >>>> itself is just an ordinary addition, and any widening happens by > >>>> vectorising a CONVERT/NOP_EXPR. > >>>> > >>>>> 2) with a widen_[su]add{_lo,_hi} combo - that's what the tree > >>>>> codes currently support (exclusively) > >>>>> 3) similar, but widen_[su]add{_even,_odd} > >>>>> > >>>>> that said, things like decomposes_to_hilo_fn_p look to paint us into > >>>>> a 2) corner without good reason. > >>>> > >>>> I suppose one question is: how much of the patch is really specific > >>>> to HI/LO, and how much is just grouping two halves together? > >>> > >>> Yep, that I don't know for sure. > >>> > >>>> The nice > >>>> thing about the internal-fn grouping macros is that, if (3) is > >>>> implemented in future, the structure will strongly encourage even/odd > >>>> pairs to be supported for all operations that support hi/lo. That is, > >>>> I would expect the grouping macros to be extended to define even/odd > >>>> ifns alongside hi/lo ones, rather than adding separate definitions > >>>> for even/odd functions. > >>>> > >>>> If so, at least from the internal-fn.* side of things, I think the > >>>> question > >>>> is whether it's OK to stick with hilo names for now, or whether we should > >>>> use more forward-looking names. > >>> > >>> I think for parts that are independent we could use a more > >>> forward-looking name. Maybe _halves? > >> > >> Using _halves for the ifn macros sounds good to me FWIW. > >> > >>> But I'm also not sure > >>> how much of that is really needed (it seems to be tied around > >>> optimizing optabs space?) > >> > >> Not sure what you mean by "this". Optabs space shouldn't be a problem > >> though. The optab encoding gives us a full int to play with, and it > >> could easily go up to 64 bits if necessary/convenient. > >> > >> At least on the internal-fn.* side, the aim is really just to establish > >> a regular structure, so that we don't have arbitrary differences between > >> different widening operations, or too much cut-&-paste. > > > > Hmm, I'm looking at the need for the std::map and > > internal_fn_hilo_keys_array and internal_fn_hilo_values_array. > > The vectorizer pieces contain > > > > + if (code.is_fn_code ()) > > + { > > + internal_fn ifn = as_internal_fn ((combined_fn) code); > > + gcc_assert (decomposes_to_hilo_fn_p (ifn)); > > + > > + internal_fn lo, hi; > > + lookup_hilo_internal_fn (ifn, &lo, &hi); > > + *code1 = as_combined_fn (lo); > > + *code2 = as_combined_fn (hi); > > + optab1 = lookup_hilo_ifn_optab (lo, !TYPE_UNSIGNED (vectype)); > > + optab2 = lookup_hilo_ifn_optab (hi, !TYPE_UNSIGNED (vectype)); > > > > so that tries to automatically associate the scalar widening IFN > > with the set(s) of IFN pairs we can split to. But then this > > list should be static and there's no need to create a std::map? > > Maybe gencfn-macros.cc can be enhanced to output these static > > cases? Or the vectorizer could (as it did previously) simply > > open-code the handled cases (I guess since we deal with two > > cases only now I'd prefer that). > > > > Thanks, > > Richard. > > > > > >> Thanks, > >> Richard > >> > > > The patch I uploaded last no longer has std::map nor > internal_fn_hilo_keys_array and internal_fn_hilo_values_array. (I've attached > it again) Whoops, too many patches ... > I'm not sure I understand the _halves, do you mean that for the case where I > had _hilo or _HILO before we rename that to _halves/_HALVES such that it later > represents both _hi/_lo separation and _even/_odd? I don't see much shared stuff, but I guess we'd see when we add a case for EVEN/ODD. The verifier contains + else if (decomposes_to_hilo_fn_p (ifn)) + { + /* Non decomposed HILO stmts should not appear in IL, these are + merely used as an internal representation to the auto-vectorizer + pass and should have been expanded to their _LO _HI variants. */ + error ("gimple call has an non decomposed HILO IFN"); + debug_generic_stmt (fn); + return true; I think to support case 1) that's not wanted. Instead what you could check is that the types involved are vector types, so a subset of what you check for IFN_VEC_WIDEN_PLUS_LO etc. (but oddly it's not verified those are all operating on vector types only?) +/* Given an internal_fn IFN that is a HILO function, return its corresponding + LO and HI internal_fns. */ + +extern void +lookup_hilo_internal_fn (internal_fn ifn, internal_fn *lo, internal_fn *hi) +{ + gcc_assert (decomposes_to_hilo_fn_p (ifn)); + + *lo = internal_fn (ifn + 1); + *hi = internal_fn (ifn + 2); that might become fragile if we add EVEN/ODD besides HI/LO unless we merge those with a DEF_INTERNAL_OPTAB_WIDENING_HILO_EVENODD_FN case, right? > And am I correct to assume we are just giving up on having a INTERNAL_OPTAB_FN > idea for 1)? Well, I think we want all of them in the end (or at least support them if target need arises). full vector, hi/lo and even/odd. Richard.