From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=Wag9=BB=arm.com=andre.simoesdiasvieira@sourceware.org>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id 554BE3854170
	for <gcc-patches@gcc.gnu.org>; Fri, 12 May 2023 13:56:01 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 554BE3854170
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 986B3C14;
	Fri, 12 May 2023 06:56:45 -0700 (PDT)
Received: from [10.57.73.43] (unknown [10.57.73.43])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 579E23F663;
	Fri, 12 May 2023 06:56:00 -0700 (PDT)
Message-ID: <feadf50f-03f4-1f55-f63e-746d866c6744@arm.com>
Date: Fri, 12 May 2023 14:55:55 +0100
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
 Thunderbird/102.10.0
Subject: Re: [PATCH 2/3] Refactor widen_plus as internal_fn
Content-Language: en-US
To: Richard Biener <rguenther@suse.de>
Cc: Richard Biener <richard.guenther@gmail.com>,
 "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
 richard.sandiford@arm.com
References: <DB9PR08MB660348E126B0DBFBBC11325EF5D69@DB9PR08MB6603.eurprd08.prod.outlook.com>
 <DB9PR08MB6603F9C50FFDAF4B972EC317F5A29@DB9PR08MB6603.eurprd08.prod.outlook.com>
 <mpt5ylckhtf.fsf@arm.com>
 <AM8PR08MB6596506A7B7EE54B128D35A3F5A59@AM8PR08MB6596.eurprd08.prod.outlook.com>
 <sp30q1n6-orn4-p4ss-q36s-734854o1ss4@fhfr.qr>
 <DB9PR08MB6603D656FD05B9D562033809F5BA9@DB9PR08MB6603.eurprd08.prod.outlook.com>
 <nycvar.YFH.7.77.849.2207121045100.14950@jbgna.fhfr.qr>
 <51ce8969-3130-452e-092e-f9d91eff2dad@arm.com>
 <nycvar.YFH.7.77.849.2303171151340.18795@jbgna.fhfr.qr>
 <ff69685d-059b-15a6-c2fe-80d9f09c87e3@arm.com>
 <CAFiYyc3ULC=1P7Mr8m9ObJhpywSC0q6vb_GtqHeQr-h-UvB_nQ@mail.gmail.com>
 <ba68e2e8-9670-7e9b-1467-7bc6238ecf0d@arm.com>
 <a9c739df-eba4-e0e6-b59e-4d6ecc7511e9@arm.com>
 <nycvar.YFH.7.77.849.2305031200420.4466@jbgna.fhfr.qr>
 <mptfs8ds07m.fsf@arm.com> <4df83136-82f9-de0b-7e66-007b9047174d@arm.com>
 <nycvar.YFH.7.77.849.2305121318050.4723@jbgna.fhfr.qr>
From: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
In-Reply-To: <nycvar.YFH.7.77.849.2305121318050.4723@jbgna.fhfr.qr>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-9.7 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,NICE_REPLY_A,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>



On 12/05/2023 14:28, Richard Biener wrote:
> On Fri, 12 May 2023, Andre Vieira (lists) wrote:
> 
>> I have dealt with, I think..., most of your comments. There's quite a few
>> changes, I think it's all a bit simpler now. I made some other changes to the
>> costing in tree-inline.cc and gimple-range-op.cc in which I try to preserve
>> the same behaviour as we had with the tree codes before. Also added some extra
>> checks to tree-cfg.cc that made sense to me.
>>
>> I am still regression testing the gimple-range-op change, as that was a last
>> minute change, but the rest survived a bootstrap and regression test on
>> aarch64-unknown-linux-gnu.
>>
>> cover letter:
>>
>> This patch replaces the existing tree_code widen_plus and widen_minus
>> patterns with internal_fn versions.
>>
>> DEF_INTERNAL_OPTAB_WIDENING_HILO_FN and DEF_INTERNAL_OPTAB_NARROWING_HILO_FN
>> are like DEF_INTERNAL_SIGNED_OPTAB_FN and DEF_INTERNAL_OPTAB_FN respectively
>> except they provide convenience wrappers for defining conversions that require
>> a hi/lo split.  Each definition for <NAME> will require optabs for _hi and _lo
>> and each of those will also require a signed and unsigned version in the case
>> of widening. The hi/lo pair is necessary because the widening and narrowing
>> operations take n narrow elements as inputs and return n/2 wide elements as
>> outputs. The 'lo' operation operates on the first n/2 elements of input. The
>> 'hi' operation operates on the second n/2 elements of input. Defining an
>> internal_fn along with hi/lo variations allows a single internal function to
>> be returned from a vect_recog function that will later be expanded to hi/lo.
>>
>>
>>   For example:
>>   IFN_VEC_WIDEN_PLUS -> IFN_VEC_WIDEN_PLUS_HI, IFN_VEC_WIDEN_PLUS_LO
>> for aarch64: IFN_VEC_WIDEN_PLUS_HI   -> vec_widen_<su>add_hi_<mode> ->
>> (u/s)addl2
>>                         IFN_VEC_WIDEN_PLUS_LO  -> vec_widen_<su>add_lo_<mode>
>> -> (u/s)addl
>>
>> This gives the same functionality as the previous WIDEN_PLUS/WIDEN_MINUS tree
>> codes which are expanded into VEC_WIDEN_PLUS_LO, VEC_WIDEN_PLUS_HI.
> 
> What I still don't understand is how we are so narrowly focused on
> HI/LO?  We need a combined scalar IFN for pattern selection (not
> sure why that's now called _HILO, I expected no suffix).  Then there's
> three possibilities the target can implement this:
> 
>   1) with a widen_[su]add<mode> instruction - I _think_ that's what
>      RISCV is going to offer since it is a target where vector modes
>      have "padding" (aka you cannot subreg a V2SI to get V4HI).  Instead
>      RVV can do a V4HI to V4SI widening and widening add/subtract
>      using vwadd[u] and vwsub[u] (the HI->SI widening is actually
>      done with a widening add of zero - eh).
>      IIRC GCN is the same here.
>   2) with a widen_[su]add{_lo,_hi}<mode> combo - that's what the tree
>      codes currently support (exclusively)
>   3) similar, but widen_[su]add{_even,_odd}<mode>
> 
> that said, things like decomposes_to_hilo_fn_p look to paint us into
> a 2) corner without good reason.

I was kind of just keeping the naming, I had forgotten to mention I was 
also going to add _EVENODD but you are right, the pattern selection IFN 
does not need to be restrictive.

And then at supportable_widening_operation we could check what the 
target offers support for (either 1, 2 or 3). We can then actually just 
get rid of decomposes_to_hilo_fn_p and just assume that for all 
narrowing or widening IFN's there are optabs (that may or may not be 
implemented by a target) for all three variants

Having said that, that means we should have an optab to cover 1, which 
should probably just have the original name. Let me write it out...

Say we have a IFN_VEC_WIDEN_PLUS pattern and assume its signed, 
supportable_widening_operation would then first check if the target 
supported vec_widen_sadd_optab for say V8HI -> V8SI? Risc-V would take 
this path I guess?

If the target doesn't then it could check for support for:
vec_widen_sadd_lo_optab V4HI -> V4SI
vec_widen_sadd_hi_optab V4HI -> V4SI

AArch64 Advanced SIMD would implement this.

If the target still didn't support this it would check for (not sure 
about the modes here):
vec_widen_sadd_even_optab VNx8HI -> VNx4SI
vec_widen_sadd_odd_optab VNx8HI -> VNx4SI

This is one SVE would implement.


So that would mean that I'd probably end up rewriting
#define DEF_INTERNAL_OPTAB_WIDENING_FN (NAME, FLAGS, SELECTOR, SOPTAB, 
UOPTAB, TYPE)
as:
for1)
   DEF_INTERNAL_SIGNED_OPTAB_FN (NAME, FLAGS, SELECTOR, SOPTAB, UOPTAB, 
TYPE)

for 2)
   DEF_INTERNAL_SIGNED_OPTAB_FN (NAME##_LO, FLAGS, SELECTOR, SOPTAB, 
UOPTAB, TYPE)
   DEF_INTERNAL_SIGNED_OPTAB_FN (NAME##_HI, FLAGS, SELECTOR, SOPTAB, 
UOPTAB, TYPE)

for 3)
   DEF_INTERNAL_SIGNED_OPTAB_FN (NAME##_EVEN, FLAGS, SELECTOR, SOPTAB, 
UOPTAB, TYPE)
   DEF_INTERNAL_SIGNED_OPTAB_FN (NAME##_ODD, FLAGS, SELECTOR, SOPTAB, 
UOPTAB, TYPE)

And the same for narrowing (but with DEF_INTERNAL_OPTAB_FN instead of 
SIGNED_OPTAB).

So each widening and narrowing IFN would have optabs for all its 
variants and each target implements the ones it supports.

I'm happy to do this, but implementing support to handle the 1 and 3 
variants without having optabs for them right now seems a bit odd and it 
would delay this patch, so I suggest I add the framework and the optabs 
but leave adding the vectorizer support for later? I can add comments to 
where I think that should go.

> Richard.
> 
>> gcc/ChangeLog:
>>
>> 2023-05-12  Andre Vieira  <andre.simoesdiasvieira@arm.com>
>>              Joel Hutton  <joel.hutton@arm.com>
>>              Tamar Christina  <tamar.christina@arm.com>
>>
>>          * config/aarch64/aarch64-simd.md (vec_widen_<su>addl_lo_<mode>):
>> Rename
>>          this ...
>>          (vec_widen_<su>add_lo_<mode>): ... to this.
>>          (vec_widen_<su>addl_hi_<mode>): Rename this ...
>>          (vec_widen_<su>add_hi_<mode>): ... to this.
>>          (vec_widen_<su>subl_lo_<mode>): Rename this ...
>>          (vec_widen_<su>sub_lo_<mode>): ... to this.
>>          (vec_widen_<su>subl_hi_<mode>): Rename this ...
>>          (vec_widen_<su>sub_hi_<mode>): ...to this.
>>          * doc/generic.texi: Document new IFN codes.
>> 	* internal-fn.cc (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Macro to
>> 	define an
>>          internal_fn that expands into multiple internal_fns for widening.
>>          (DEF_INTERNAL_OPTAB_NARROWING_HILO_FN): Likewise but for narrowing.
>>   	(ifn_cmp): Function to compare ifn's for sorting/searching.
>> 	(lookup_hilo_internal_fn): Add lookup function.
>> 	(commutative_binary_fn_p): Add widen_plus fn's.
>> 	(widening_fn_p): New function.
>> 	(narrowing_fn_p): New function.
>> 	(decomposes_to_hilo_fn_p): New function.
>> 	         (direct_internal_fn_optab): Change visibility.
>>      	* internal-fn.def (DEF_INTERNAL_OPTAB_WIDENING_HILO_FN): Define
>>      widening
>>      plus,minus functions.
>> 	(VEC_WIDEN_PLUS): Replacement for VEC_WIDEN_PLUS_EXPR tree code.
>> 	(VEC_WIDEN_MINUS): Replacement for VEC_WIDEN_MINUS_EXPR tree code.
>> 	* internal-fn.h (GCC_INTERNAL_FN_H): Add headers.
>> 	         (direct_internal_fn_optab): Declare new prototype.
>> 	(lookup_hilo_internal_fn): Likewise.
>> 	(widening_fn_p): Likewise.
>> 	(Narrowing_fn_p): Likewise.
>> 	(decomposes_to_hilo_fn_p): Likewise.
>> 	* optabs.cc (commutative_optab_p): Add widening plus optabs.
>> 	* optabs.def (OPTAB_D): Define widen add, sub optabs.
>>          * tree-cfg.cc (verify_gimple_call): Add checks for new widen
>>          add and sub IFNs.
>>          * tree-inline.cc (estimate_num_insns): Return same
>>          cost for widen add and sub IFNs as previous tree_codes.
>>      	* tree-vect-patterns.cc (vect_recog_widen_op_pattern): Support
>>      patterns with a hi/lo split.
>>          (vect_recog_sad_pattern): Refactor to use new IFN codes.
>>          (vect_recog_widen_plus_pattern): Likewise.
>>          (vect_recog_widen_minus_pattern): Likewise.
>>          (vect_recog_average_pattern): Likewise.
>> 	* tree-vect-stmts.cc (vectorizable_conversion): Add support for
>> 	         _HILO IFNs.
>> 	(supportable_widening_operation): Likewise.
>>          * tree.def (WIDEN_SUM_EXPR): Update example to use new IFNs.
>>
>> gcc/testsuite/ChangeLog:
>>
>>      	* gcc.target/aarch64/vect-widen-add.c: Test that new
>>      IFN_VEC_WIDEN_PLUS is being used.
>>      	* gcc.target/aarch64/vect-widen-sub.c: Test that new
>>      IFN_VEC_WIDEN_MINUS is being used.
>>
>