From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=IYwX=JJ=arm.com=andre.simoesdiasvieira@sourceware.org>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id 363BE38582B0
	for <gcc-patches@gcc.gnu.org>; Wed, 31 Jan 2024 16:36:45 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 363BE38582B0
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 363BE38582B0
Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1706719007; cv=none;
	b=mXJSz69CIBx8Hgw44n9tSztRwyiFPCg1FXISX55NZH0nXiynoFd4SDS1pHe89FlMgfidPyp6ucr5M2Gh1XwA7Jof3vSGGNAVtu6sHtOrnXN9tjdHF7zZlCtAsoVuskzFvJLQtCLQTUxoMHcZ3Tl2lnZRhABz98vWpn01OIbNaW0=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
	t=1706719007; c=relaxed/simple;
	bh=DK+ErOnOZs9ctbr8/9MmxP2wmATJcmq1h8068/yqvBA=;
	h=Message-ID:Date:MIME-Version:Subject:To:From; b=jtcr107TejkMZu49beHGz1zEZqS1rlV7TCt9hUpXiL94CZCt3DuyFAkFOsHLw5XdxhKyDKEipWTLlqdTPF1l8NAe9xwVM/6+wcMrWjJPxhSSqdn0QL7XZsSBhM1Km/MqcpU0WPged7dWxsjce0q5h1gZLydKErywxChiEZKrq9Y=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 0BE1BDA7;
	Wed, 31 Jan 2024 08:37:28 -0800 (PST)
Received: from [10.57.78.243] (unknown [10.57.78.243])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 334253F762;
	Wed, 31 Jan 2024 08:36:44 -0800 (PST)
Message-ID: <e0448c57-8071-49b0-a551-c3831cf68d63@arm.com>
Date: Wed, 31 Jan 2024 16:36:42 +0000
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
Content-Language: en-US
To: Richard Biener <rguenther@suse.de>
Cc: gcc-patches@gcc.gnu.org, Richard.Sandiford@arm.com
References: <20240130143132.9575-1-andre.simoesdiasvieira@arm.com>
 <20240130143132.9575-2-andre.simoesdiasvieira@arm.com>
 <s94q838q-35s2-r96r-n2oo-s9op84r49s42@fhfr.qr>
 <ro1os72r-6q35-03q4-nrno-5n405oro645p@fhfr.qr>
 <47e1aeb2-94ac-4733-b49f-ea97932cc49f@arm.com>
 <545r8s73-675p-4o48-sr66-q6956nqp6r6p@fhfr.qr>
 <f6fd7aba-9f8c-4515-929f-6e6ce96e42e9@arm.com>
 <3rq8sn71-8188-o4rq-9spp-q9spn98163q5@fhfr.qr>
From: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
In-Reply-To: <3rq8sn71-8188-o4rq-9spp-q9spn98163q5@fhfr.qr>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>


On 31/01/2024 14:35, Richard Biener wrote:
> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
> 
>>
>>
>> On 31/01/2024 13:58, Richard Biener wrote:
>>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
>>>
>>>>
>>>>
>>>> On 31/01/2024 12:13, Richard Biener wrote:
>>>>> On Wed, 31 Jan 2024, Richard Biener wrote:
>>>>>
>>>>>> On Tue, 30 Jan 2024, Andre Vieira wrote:
>>>>>>
>>>>>>>
>>>>>>> This patch adds stmt_vec_info to TARGET_SIMD_CLONE_USABLE to make sure
>>>>>>> the
>>>>>>> target can reject a simd_clone based on the vector mode it is using.
>>>>>>> This is needed because for VLS SVE vectorization the vectorizer accepts
>>>>>>> Advanced SIMD simd clones when vectorizing using SVE types because the
>>>>>>> simdlens
>>>>>>> might match.  This will cause type errors later on.
>>>>>>>
>>>>>>> Other targets do not currently need to use this argument.
>>>>>>
>>>>>> Can you instead pass down the mode?
>>>>>
>>>>> Thinking about that again the cgraph_simd_clone info in the clone
>>>>> should have sufficient information to disambiguate.  If it doesn't
>>>>> then we should amend it.
>>>>>
>>>>> Richard.
>>>>
>>>> Hi Richard,
>>>>
>>>> Thanks for the review, I don't think cgraph_simd_clone_info is the right
>>>> place
>>>> to pass down this information, since this is information about the caller
>>>> rather than the simdclone itself. What we are trying to achieve here is
>>>> making
>>>> the vectorizer being able to accept or reject simdclones based on the ISA
>>>> we
>>>> are vectorizing for. To distinguish between SVE and Advanced SIMD ISAs we
>>>> use
>>>> modes, I am also not sure that's ideal but it is what we currently use. So
>>>> to
>>>> answer your earlier question, yes I can also pass down mode if that's
>>>> preferable.
>>>
>>> Note cgraph_simd_clone_info has simdlen and we seem to check elsewhere
>>> whether that's POLY or constant.  I wonder how aarch64_sve_mode_p
>>> comes into play here which in the end classifies VLS SVE modes as
>>> non-SVE?
>>>
>>
>> Using -msve-vector-bits=128
>> (gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))
>> $4 = E_VNx4SImode
>> (gdb) p  TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo))
>> $5 = (tree) 0xfffff741c1b0
>> (gdb) p debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo)))
>> 128
>> (gdb) p aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo)))
>> $5 = true
>>
>> and for reference without vls codegen:
>> (gdb) p TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))
>> $1 = E_VNx4SImode
>> (gdb) p  debug (TYPE_SIZE (STMT_VINFO_VECTYPE (stmt_vinfo)))
>> POLY_INT_CST [128, 128]
>>
>> Having said that I believe that the USABLE targethook implementation for
>> aarch64 should also block other uses, like an Advanced SIMD mode being used as
>> input for a SVE VLS SIMDCLONE. The reason being that for instance 'half'
>> registers like VNx2SI are packed differently from V2SI.
>>
>> We could teach the vectorizer to support these of course, but that requires
>> more work and is not extremely useful just yet. I'll add the extra check that
>> to the patch once we agree on how to pass down the information we need. Happy
>> to use either mode, or stmt_vec_info and extract the mode from it like it does
>> now.
> 
> As said, please pass down 'mode'.  But I wonder how to document it,
> which mode is that supposed to be?  Any of result or any argument
> mode that happens to be a vector?  I think that we might be able
> to mix Advanced SIMD modes and SVE modes with -msve-vector-bits=128
> in the same loop?
> 
> Are the simd clones you don't want to use with -msve-vector-bits=128
> having constant simdlen?  If so why do you generate them in the first
> place?

So this is where things get a bit confusing and I will write up some 
text for these cases to put in our ABI document (currently in Beta and 
in need of some tlc).

Our intended behaviour is for a 'declare simd' without a simdlen to 
generate simdclones for:
* Advanced SIMD 128 and 64-bit vectors, where possible (we don't allow 
for simdlen 1, Tamar fixed that in gcc recently),
* SVE VLA vectors.

Let me illustrate this with an example:

__attribute__ ((simd (notinbranch), const)) float cosf(float);

Should tell the compiler the following simd clones are available:
__ZGVnN4v_cosf 128-bit 4x4 float Advanced SIMD clone
__ZGVnN2v_cosf 64-bit  4x2 float Advanced SIMD clone
__ZGVsMxv_cosf [128, 128]-bit 4x4xN SVE SIMD clone

[To save you looking into the abi let me break this down, _ZGV is 
prefix, then 'n' or 's' picks between Advanced SIMD and SVE, 'N' or 'M' 
picks between Not Masked and Masked (SVE is always masked even if we ask 
for notinbranch), then a digit or 'x' picks between Vector Length or 
VLA, and after that you get a letter per argument, where v = vector mapped]

Regardless of -msve-vector-bits, however, the vectorizer (and any other 
part of the compiler) may assume that the VL of the VLA SVE clone is 
that specified by -msve-vector-bits, which if the clone is written in a 
VLA way will still work.

If the attribute is used with a function definition rather than 
declaration, so:

__attribute__ ((simd (notinbranch), const)) float fn0(float a)
{
   return a + 1.0f;
}

the compiler should again generate the three simd clones:
__ZGVnN4v_fn0 128-bit 4x4 float Advanced SIMD clone
__ZGVnN2v_fn0 64-bit  4x2 float Advanced SIMD clone
__ZGVsMxv_fn0 [128, 128]-bit 4x4xN SVE SIMD clone

However, in the last one it may assume a VL for the codegen of the body 
and it's the user's responsibility to only use it for targets with that 
length , much like any other code produced this way.

So that's what we tell the compiler is available and what the compiler 
generates depending on where we use the attribute. The question at hand 
here is, what can the vectorizer use for a specific loop. If we are 
using Advanced SIMD modes then it needs to call an Advanced SIMD clone, 
and if we are using SVE modes then it needs to call an SVE clone. At 
least until we support the ABI conversion, because like I said for an 
unpacked argument they behave differently.

PS: In the future OpenMP may add specifications that allow us to define 
a specific VLA simdlen... in other words, whether we want [128, 128] or 
[256, 256], [512, 512] ... etc, but that still needs agreement on the 
OpenMP Spec, which is why for now we piggy back on the simdlen-less 
definition to provide us a VLA SVE simdclone with [128, 128] VL.

Hopefully this makes things a bit clearer :/
> 
> That said, I wonder how we end up mixing things up in the first place.
> 
> Richard.