On 15/05/2023 12:01, Richard Biener wrote: > On Mon, 15 May 2023, Richard Sandiford wrote: > >> Richard Biener writes: >>> On Fri, 12 May 2023, Richard Sandiford wrote: >>> >>>> Richard Biener writes: >>>>> On Fri, 12 May 2023, Andre Vieira (lists) wrote: >>>>> >>>>>> I have dealt with, I think..., most of your comments. There's quite a few >>>>>> changes, I think it's all a bit simpler now. I made some other changes to the >>>>>> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve >>>>>> the same behaviour as we had with the tree codes before. Also added some extra >>>>>> checks to tree-cfg.cc that made sense to me. >>>>>> >>>>>> I am still regression testing the gimple-range-op change, as that was a last >>>>>> minute change, but the rest survived a bootstrap and regression test on >>>>>> aarch64-unknown-linux-gnu. >>>>>> >>>>>> cover letter: >>>>>> >>>>>> This patch replaces the existing tree_code widen_plus and widen_minus >>>>>> patterns with internal_fn versions. >>>>>> >>>>>> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN >>>>>> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively >>>>>> except they provide convenience wrappers for defining conversions that require >>>>>> a hi/lo split. Each definition for will require optabs for _hi and _lo >>>>>> and each of those will also require a signed and unsigned version in the case >>>>>> of widening. The hi/lo pair is necessary because the widening and narrowing >>>>>> operations take n narrow elements as inputs and return n/2 wide elements as >>>>>> outputs. The 'lo' operation operates on the first n/2 elements of input. The >>>>>> 'hi' operation operates on the second n/2 elements of input. Defining an >>>>>> internal_fn along with hi/lo variations allows a single internal function to >>>>>> be returned from a vect_recog function that will later be expanded to hi/lo. >>>>>> >>>>>> >>>>>> For example: >>>>>> IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO >>>>>> for aarch64: IFN_VEC_WIDEN_PLUS_HI -> vec_widen_add_hi_ -> >>>>>> (u/s)addl2 >>>>>> IFN_VEC_WIDEN_PLUS_LO -> vec_widen_add_lo_ >>>>>> -> (u/s)addl >>>>>> >>>>>> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree >>>>>> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI. >>>>> >>>>> What I still don't understand is how we are so narrowly focused on >>>>> HI/LO? We need a combined scalar IFN for pattern selection (not >>>>> sure why that's now called _HILO, I expected no suffix). Then there's >>>>> three possibilities the target can implement this: >>>>> >>>>> 1) with a widen_[su]add instruction - I _think_ that's what >>>>> RISCV is going to offer since it is a target where vector modes >>>>> have "padding" (aka you cannot subreg a V2SI to get V4HI). Instead >>>>> RVV can do a V4HI to V4SI widening and widening add/subtract >>>>> using vwadd[u] and vwsub[u] (the HI->SI widening is actually >>>>> done with a widening add of zero - eh). >>>>> IIRC GCN is the same here. >>>> >>>> SVE currently does this too, but the addition and widening are >>>> separate operations. E.g. in principle there's no reason why >>>> you can't sign-extend one operand, zero-extend the other, and >>>> then add the result together. Or you could extend them from >>>> different sizes (QI and HI). All of those are supported >>>> (if the costing allows them). >>> >>> I see. So why does the target the expose widen_[su]add at all? >> >> It shouldn't (need to) do that. I don't think we should have an optab >> for the unsplit operation. >> >> At least on SVE, we really want the extensions to be fused with loads >> (where possible) rather than with arithmetic. >> >> We can still do the widening arithmetic in one go. It's just that >> fusing with the loads works for the mixed-sign and mixed-size cases, >> and can handle more than just doubling the element size. >> >>>> If the target has operations to do combined extending and adding (or >>>> whatever), then at the moment we rely on combine to generate them. >>>> >>>> So I think this case is separate from Andre's work. The addition >>>> itself is just an ordinary addition, and any widening happens by >>>> vectorising a CONVERT/NOP_EXPR. >>>> >>>>> 2) with a widen_[su]add{_lo,_hi} combo - that's what the tree >>>>> codes currently support (exclusively) >>>>> 3) similar, but widen_[su]add{_even,_odd} >>>>> >>>>> that said, things like decomposes_to_hilo_fn_p look to paint us into >>>>> a 2) corner without good reason. >>>> >>>> I suppose one question is: how much of the patch is really specific >>>> to HI/LO, and how much is just grouping two halves together? >>> >>> Yep, that I don't know for sure. >>> >>>> The nice >>>> thing about the internal-fn grouping macros is that, if (3) is >>>> implemented in future, the structure will strongly encourage even/odd >>>> pairs to be supported for all operations that support hi/lo. That is, >>>> I would expect the grouping macros to be extended to define even/odd >>>> ifns alongside hi/lo ones, rather than adding separate definitions >>>> for even/odd functions. >>>> >>>> If so, at least from the internal-fn.* side of things, I think the question >>>> is whether it's OK to stick with hilo names for now, or whether we should >>>> use more forward-looking names. >>> >>> I think for parts that are independent we could use a more >>> forward-looking name. Maybe _halves? >> >> Using _halves for the ifn macros sounds good to me FWIW. >> >>> But I'm also not sure >>> how much of that is really needed (it seems to be tied around >>> optimizing optabs space?) >> >> Not sure what you mean by "this". Optabs space shouldn't be a problem >> though. The optab encoding gives us a full int to play with, and it >> could easily go up to 64 bits if necessary/convenient. >> >> At least on the internal-fn.* side, the aim is really just to establish >> a regular structure, so that we don't have arbitrary differences between >> different widening operations, or too much cut-&-paste. > > Hmm, I'm looking at the need for the std::map and > internal_fn_hilo_keys_array and internal_fn_hilo_values_array. > The vectorizer pieces contain > > + if (code.is_fn_code ()) > + { > + internal_fn ifn = as_internal_fn ((combined_fn) code); > + gcc_assert (decomposes_to_hilo_fn_p (ifn)); > + > + internal_fn lo, hi; > + lookup_hilo_internal_fn (ifn, &lo, &hi); > + *code1 = as_combined_fn (lo); > + *code2 = as_combined_fn (hi); > + optab1 = lookup_hilo_ifn_optab (lo, !TYPE_UNSIGNED (vectype)); > + optab2 = lookup_hilo_ifn_optab (hi, !TYPE_UNSIGNED (vectype)); > > so that tries to automatically associate the scalar widening IFN > with the set(s) of IFN pairs we can split to. But then this > list should be static and there's no need to create a std::map? > Maybe gencfn-macros.cc can be enhanced to output these static > cases? Or the vectorizer could (as it did previously) simply > open-code the handled cases (I guess since we deal with two > cases only now I'd prefer that). > > Thanks, > Richard. > > >> Thanks, >> Richard >> > The patch I uploaded last no longer has std::map nor internal_fn_hilo_keys_array and internal_fn_hilo_values_array. (I've attached it again) I'm not sure I understand the _halves, do you mean that for the case where I had _hilo or _HILO before we rename that to _halves/_HALVES such that it later represents both _hi/_lo separation and _even/_odd? And am I correct to assume we are just giving up on having a INTERNAL_OPTAB_FN idea for 1)? Kind regards, Andre