On 15/05/2023 12:01, Richard Biener wrote:
> On Mon, 15 May 2023, Richard Sandiford wrote:
> 
>> Richard Biener <rguenther@suse.de> writes:
>>> On Fri, 12 May 2023, Richard Sandiford wrote:
>>>
>>>> Richard Biener <rguenther@suse.de> writes:
>>>>> On Fri, 12 May 2023, Andre Vieira (lists) wrote:
>>>>>
>>>>>> I have dealt with, I think..., most of your comments. There's quite a few
>>>>>> changes, I think it's all a bit simpler now. I made some other changes to the
>>>>>> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve
>>>>>> the same behaviour as we had with the tree codes before. Also added some extra
>>>>>> checks to tree-cfg.cc that made sense to me.
>>>>>>
>>>>>> I am still regression testing the gimple-range-op change, as that was a last
>>>>>> minute change, but the rest survived a bootstrap and regression test on
>>>>>> aarch64-unknown-linux-gnu.
>>>>>>
>>>>>> cover letter:
>>>>>>
>>>>>> This patch replaces the existing tree_code widen_plus and widen_minus
>>>>>> patterns with internal_fn versions.
>>>>>>
>>>>>> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
>>>>>> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively
>>>>>> except they provide convenience wrappers for defining conversions that require
>>>>>> a hi/lo split.  Each definition for <NAME> will require optabs for _hi and _lo
>>>>>> and each of those will also require a signed and unsigned version in the case
>>>>>> of widening. The hi/lo pair is necessary because the widening and narrowing
>>>>>> operations take n narrow elements as inputs and return n/2 wide elements as
>>>>>> outputs. The 'lo' operation operates on the first n/2 elements of input. The
>>>>>> 'hi' operation operates on the second n/2 elements of input. Defining an
>>>>>> internal_fn along with hi/lo variations allows a single internal function to
>>>>>> be returned from a vect_recog function that will later be expanded to hi/lo.
>>>>>>
>>>>>>
>>>>>>   For example:
>>>>>>   IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
>>>>>> for aarch64: IFN_VEC_WIDEN_PLUS_HI   -> vec_widen_<su>add_hi_<mode> ->
>>>>>> (u/s)addl2
>>>>>>                         IFN_VEC_WIDEN_PLUS_LO  -> vec_widen_<su>add_lo_<mode>
>>>>>> -> (u/s)addl
>>>>>>
>>>>>> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
>>>>>> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
>>>>>
>>>>> What I still don't understand is how we are so narrowly focused on
>>>>> HI/LO?  We need a combined scalar IFN for pattern selection (not
>>>>> sure why that's now called _HILO, I expected no suffix).  Then there's
>>>>> three possibilities the target can implement this:
>>>>>
>>>>>   1) with a widen_[su]add<mode> instruction - I _think_ that's what
>>>>>      RISCV is going to offer since it is a target where vector modes
>>>>>      have "padding" (aka you cannot subreg a V2SI to get V4HI).  Instead
>>>>>      RVV can do a V4HI to V4SI widening and widening add/subtract
>>>>>      using vwadd[u] and vwsub[u] (the HI->SI widening is actually
>>>>>      done with a widening add of zero - eh).
>>>>>      IIRC GCN is the same here.
>>>>
>>>> SVE currently does this too, but the addition and widening are
>>>> separate operations.  E.g. in principle there's no reason why
>>>> you can't sign-extend one operand, zero-extend the other, and
>>>> then add the result together.  Or you could extend them from
>>>> different sizes (QI and HI).  All of those are supported
>>>> (if the costing allows them).
>>>
>>> I see.  So why does the target the expose widen_[su]add<mode> at all?
>>
>> It shouldn't (need to) do that.  I don't think we should have an optab
>> for the unsplit operation.
>>
>> At least on SVE, we really want the extensions to be fused with loads
>> (where possible) rather than with arithmetic.
>>
>> We can still do the widening arithmetic in one go.  It's just that
>> fusing with the loads works for the mixed-sign and mixed-size cases,
>> and can handle more than just doubling the element size.
>>
>>>> If the target has operations to do combined extending and adding (or
>>>> whatever), then at the moment we rely on combine to generate them.
>>>>
>>>> So I think this case is separate from Andre's work.  The addition
>>>> itself is just an ordinary addition, and any widening happens by
>>>> vectorising a CONVERT/NOP_EXPR.
>>>>
>>>>>   2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree
>>>>>      codes currently support (exclusively)
>>>>>   3) similar, but widen_[su]add{_even,_odd}<mode>
>>>>>
>>>>> that said, things like decomposes_to_hilo_fn_p look to paint us into
>>>>> a 2) corner without good reason.
>>>>
>>>> I suppose one question is: how much of the patch is really specific
>>>> to HI/LO, and how much is just grouping two halves together?
>>>
>>> Yep, that I don't know for sure.
>>>
>>>>   The nice
>>>> thing about the internal-fn grouping macros is that, if (3) is
>>>> implemented in future, the structure will strongly encourage even/odd
>>>> pairs to be supported for all operations that support hi/lo.  That is,
>>>> I would expect the grouping macros to be extended to define even/odd
>>>> ifns alongside hi/lo ones, rather than adding separate definitions
>>>> for even/odd functions.
>>>>
>>>> If so, at least from the internal-fn.* side of things, I think the question
>>>> is whether it's OK to stick with hilo names for now, or whether we should
>>>> use more forward-looking names.
>>>
>>> I think for parts that are independent we could use a more
>>> forward-looking name.  Maybe _halves?
>>
>> Using _halves for the ifn macros sounds good to me FWIW.
>>
>>> But I'm also not sure
>>> how much of that is really needed (it seems to be tied around
>>> optimizing optabs space?)
>>
>> Not sure what you mean by "this".  Optabs space shouldn't be a problem
>> though.  The optab encoding gives us a full int to play with, and it
>> could easily go up to 64 bits if necessary/convenient.
>>
>> At least on the internal-fn.* side, the aim is really just to establish
>> a regular structure, so that we don't have arbitrary differences between
>> different widening operations, or too much cut-&-paste.
> 
> Hmm, I'm looking at the need for the std::map and
> internal_fn_hilo_keys_array and internal_fn_hilo_values_array.
> The vectorizer pieces contain
> 
> +  if (code.is_fn_code ())
> +     {
> +      internal_fn ifn = as_internal_fn ((combined_fn) code);
> +      gcc_assert (decomposes_to_hilo_fn_p (ifn));
> +
> +      internal_fn lo, hi;
> +      lookup_hilo_internal_fn (ifn, &lo, &hi);
> +      *code1 = as_combined_fn (lo);
> +      *code2 = as_combined_fn (hi);
> +      optab1 = lookup_hilo_ifn_optab (lo, !TYPE_UNSIGNED (vectype));
> +      optab2 = lookup_hilo_ifn_optab (hi, !TYPE_UNSIGNED (vectype));
> 
> so that tries to automatically associate the scalar widening IFN
> with the set(s) of IFN pairs we can split to.  But then this
> list should be static and there's no need to create a std::map?
> Maybe gencfn-macros.cc can be enhanced to output these static
> cases?  Or the vectorizer could (as it did previously) simply
> open-code the handled cases (I guess since we deal with two
> cases only now I'd prefer that).
> 
> Thanks,
> Richard.
> 
> 
>> Thanks,
>> Richard
>>
> 
The patch I uploaded last no longer has std::map nor 
internal_fn_hilo_keys_array and internal_fn_hilo_values_array. (I've 
attached it again)

I'm not sure I understand the _halves, do you mean that for the case 
where I had _hilo or _HILO before we rename that to _halves/_HALVES such 
that it later represents both _hi/_lo separation and _even/_odd?

And am I correct to assume we are just giving up on having a 
INTERNAL_OPTAB_FN idea for 1)?

Kind regards,
Andre