Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: "Andre Vieira (lists)" <andre.simoesdiasvieira@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: gcc-patches@gcc.gnu.org, Richard.Sandiford@arm.com
Subject: Re: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
Date: Wed, 28 Feb 2024 17:25:30 +0000	[thread overview]
Message-ID: <0a69b67c-7fa3-4a79-86c4-c6311cc56ea7@arm.com> (raw)
In-Reply-To: <q59108o7-5ps1-q5oo-o5rq-oqp450456038@fhfr.qr>

[-- Attachment #1: Type: text/plain, Size: 5591 bytes --]



On 27/02/2024 08:47, Richard Biener wrote:
> On Mon, 26 Feb 2024, Andre Vieira (lists) wrote:
> 
>>
>>
>> On 05/02/2024 09:56, Richard Biener wrote:
>>> On Thu, 1 Feb 2024, Andre Vieira (lists) wrote:
>>>
>>>>
>>>>
>>>> On 01/02/2024 07:19, Richard Biener wrote:
>>>>> On Wed, 31 Jan 2024, Andre Vieira (lists) wrote:
>>>>>
>>>>>
>>>>> The patch didn't come with a testcase so it's really hard to tell
>>>>> what goes wrong now and how it is fixed ...
>>>>
>>>> My bad! I had a testcase locally but never added it...
>>>>
>>>> However... now I look at it and ran it past Richard S, the codegen isn't
>>>> 'wrong', but it does have the potential to lead to some pretty slow
>>>> codegen,
>>>> especially for inbranch simdclones where it transforms the SVE predicate
>>>> into
>>>> an Advanced SIMD vector by inserting the elements one at a time...
>>>>
>>>> An example of which can be seen if you do:
>>>>
>>>> gcc -O3 -march=armv8-a+sve -msve-vector-bits=128  -fopenmp-simd t.c -S
>>>>
>>>> with the following t.c:
>>>> #pragma omp declare simd simdlen(4) inbranch
>>>> int __attribute__ ((const)) fn5(int);
>>>>
>>>> void fn4 (int *a, int *b, int n)
>>>> {
>>>>       for (int i = 0; i < n; ++i)
>>>>           b[i] = fn5(a[i]);
>>>> }
>>>>
>>>> Now I do have to say, for our main usecase of libmvec we won't have any
>>>> 'inbranch' Advanced SIMD clones, so we avoid that issue... But of course
>>>> that
>>>> doesn't mean user-code will.
>>>
>>> It seems to use SVE masks with vector(4) <signed-boolean:4> and the
>>> ABI says the mask is vector(4) int.  You say that's because we choose
>>> a Adv SIMD clone for the SVE VLS vector code (it calls _ZGVnM4v_fn5).
>>>
>>> The vectorizer creates
>>>
>>>     _44 = VEC_COND_EXPR <loop_mask_41, { 1, 1, 1, 1 }, { 0, 0, 0, 0 }>;
>>>
>>> and then vector lowering decomposes this.  That means the vectorizer
>>> lacks a check that the target handles this VEC_COND_EXPR.
>>>
>>> Of course I would expect that SVE with VLS vectors is able to
>>> code generate this operation, so it's missing patterns in the end.
>>>
>>> Richard.
>>>
>>
>> What should we do for GCC-14? Going forward I think the right thing to do is
>> to add these patterns. But I am not even going to try to do that right now and
>> even though we can codegen for this, the result doesn't feel like it would
>> ever be profitable which means I'd rather not vectorize, or well pick a
>> different vector mode if possible.
>>
>> This would be achieved with the change to the targethook. If I change the hook
>> to take modes, using STMT_VINFO_VECTYPE (stmt_vinfo), is that OK for now?
> 
> Passing in a mode is OK.  I'm still not fully understanding why the
> clone isn't fully specifying 'mode' and if it does not why the
> vectorizer itself can not disregard it.


We could check that the modes of the parameters & return type are the 
same as the vector operands & result in the vectorizer. But then we'd 
also want to make sure we don't reject cases where we have simdclones 
with compatible modes, aka same element type, but a multiple element 
count.  Which is where'd we get in trouble again I think, because we'd 
want to accept V8SI -> 2x V4SI, but not V8SI -> 2x VNx4SI (with VLS and 
aarch64_sve_vg = 2), not because it's invalid, but because right now the 
codegen is bad. And it's easier to do this in the targethook, which we 
can technically also use to 'rank' simdclones by setting a 
target_badness value, so in the future we could decide to assign some 
'badness' to influence the rank a SVE simdclone for Advanced SIMD loops 
vs an Advanced SIMD clone for Advanced SIMD loops.

This does touch another issue of simdclone costing, which is a larger 
issue in general and one we (arm) might want to approach in the future. 
It's a complex issue, because the vectorizer doesn't know the 
performance impact of a simdclone, we assume (as we should) that its 
faster than the original scalar, though we currently don't record costs 
for either, but we don't know by how much or how much impact it has, so 
the vectorizer can't reason whether it's beneficial to use a simdclone 
if it has to do a lot of operand preparation, we can merely tell it to 
use it, or not and all the other operations in the loop will determine 
costing.


>  From the past discussion I understood the existing situation isn't
> as bad as initially thought and no bad things happen right now?
Nope, I thought they compiler would fall apart, but it seems to be able 
to transform the operands from one mode into the other, so without the 
targethook it just generates slower loops in certain cases, which we'd 
rather avoid given the usecase for simdclones is to speed things up ;)


Attached reworked patch.


This patch adds a machine_mode argument to TARGET_SIMD_CLONE_USABLE to 
make sure the target can reject a simd_clone based on the vector mode it 
is using.  This is needed because for VLS SVE vectorization the 
vectorizer accepts Advanced SIMD simd clones when vectorizing using SVE 
types because the simdlens might match, this currently leads to 
suboptimal codegen.

Other targets do not currently need to use this argument.

gcc/ChangeLog:

	* target.def (TARGET_SIMD_CLONE_USABLE): Add argument.
	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass vector_mode
	to call TARGET_SIMD_CLONE_USABLE.
	* config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add argument
	and use it to reject the use of SVE simd clones with Advanced SIMD
	modes.
	* config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused argument.
	* config/i386/i386.cc (ix86_simd_clone_usable): Likewise.

[-- Attachment #2: target_simd_clone_usable.patch --]
[-- Type: text/plain, Size: 4672 bytes --]

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 16318bf925883ecedf9345e53fc0824a553b2747..6ee77f61235219b477d1f622fceb752d54c58b87 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -28769,12 +28769,12 @@ aarch64_simd_clone_adjust (struct cgraph_node *node)
 /* Implement TARGET_SIMD_CLONE_USABLE.  */
 
 static int
-aarch64_simd_clone_usable (struct cgraph_node *node)
+aarch64_simd_clone_usable (struct cgraph_node *node, machine_mode vector_mode)
 {
   switch (node->simdclone->vecsize_mangle)
     {
     case 'n':
-      if (!TARGET_SIMD)
+      if (!TARGET_SIMD || aarch64_sve_mode_p (vector_mode))
 	return -1;
       return 0;
     default:
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index bc076d1120d9e7d03c9bed23b8df215ae35e442c..9624b7c1aab29665c52f7b82d8b437af2e8e1ea1 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5667,7 +5667,8 @@ gcn_simd_clone_adjust (struct cgraph_node *ARG_UNUSED (node))
 /* Implement TARGET_SIMD_CLONE_USABLE.  */
 
 static int
-gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node))
+gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node),
+		       machine_mode ARG_UNUSED (vector_mode))
 {
   /* We don't need to do anything here because
      gcn_simd_clone_compute_vecsize_and_simdlen currently only returns one
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index fc5068539c11748a5adf70ec77b2f1cae1a1e231..c54f66543fdd4103d58c2f9390a3c91060597b94 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -25249,7 +25249,8 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
    slightly less desirable, etc.).  */
 
 static int
-ix86_simd_clone_usable (struct cgraph_node *node)
+ix86_simd_clone_usable (struct cgraph_node *node,
+			machine_mode ARG_UNUSED (vector_mode))
 {
   switch (node->simdclone->vecsize_mangle)
     {
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index c8b8b126b2424b6552f824ba42ac329cfaf84d84..03f7d72a429204a584253dc5c6e8fa1b3074795d 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6498,11 +6498,11 @@ This hook should add implicit @code{attribute(target("..."))} attribute
 to SIMD clone @var{node} if needed.
 @end deftypefn
 
-@deftypefn {Target Hook} int TARGET_SIMD_CLONE_USABLE (struct cgraph_node *@var{})
+@deftypefn {Target Hook} int TARGET_SIMD_CLONE_USABLE (struct cgraph_node *@var{}, @var{machine_mode})
 This hook should return -1 if SIMD clone @var{node} shouldn't be used
-in vectorized loops in current function, or non-negative number if it is
-usable.  In that case, the smaller the number is, the more desirable it is
-to use it.
+in vectorized loops in current function with @var{vector_mode}, or
+non-negative number if it is usable.  In that case, the smaller the number
+is, the more desirable it is to use it.
 @end deftypefn
 
 @deftypefn {Target Hook} int TARGET_SIMT_VF (void)
diff --git a/gcc/target.def b/gcc/target.def
index fdad7bbc93e2ad8aea30336d5cd4af67801e9c74..7e8921b6bd4078770268819a38595fdce612b548 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1645,10 +1645,10 @@ void, (struct cgraph_node *), NULL)
 DEFHOOK
 (usable,
 "This hook should return -1 if SIMD clone @var{node} shouldn't be used\n\
-in vectorized loops in current function, or non-negative number if it is\n\
-usable.  In that case, the smaller the number is, the more desirable it is\n\
-to use it.",
-int, (struct cgraph_node *), NULL)
+in vectorized loops in current function with @var{vector_mode}, or\n\
+non-negative number if it is usable.  In that case, the smaller the number\n\
+is, the more desirable it is to use it.",
+int, (struct cgraph_node *, machine_mode), NULL)
 
 HOOK_VECTOR_END (simd_clone)
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 1dbe1115da4d7dd4fc590e5830a9c7f05be6945a..f06a53d37ee05737e00e80d9c265192bede6aa18 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4074,7 +4074,14 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
 	  this_badness += floor_log2 (num_calls) * 4096;
 	if (n->simdclone->inbranch)
 	  this_badness += 8192;
-	int target_badness = targetm.simd_clone.usable (n);
+
+	/* If STMT_VINFO_VECTYPE has not been set yet pass the general vector
+	   mode,  which for targets that use it will determine what ISA we can
+	   vectorize this code with.  */
+	machine_mode vector_mode = vinfo->vector_mode;
+	if (vectype)
+	  vector_mode = TYPE_MODE (vectype);
+	int target_badness = targetm.simd_clone.usable (n, vector_mode);
 	if (target_badness < 0)
 	  continue;
 	this_badness += target_badness * 512;

next prev parent reply	other threads:[~2024-02-28 17:25 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-30 14:31 [PATCH 0/3] vect, aarch64: Add SVE support for simdclones Andre Vieira
2024-01-30 14:31 ` [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE Andre Vieira
2024-01-31 12:11   ` Richard Biener
2024-01-31 12:13     ` Richard Biener
2024-01-31 13:52       ` Andre Vieira (lists)
2024-01-31 13:58         ` Richard Biener
2024-01-31 14:03           ` Richard Biener
2024-01-31 16:13             ` Andre Vieira (lists)
2024-01-31 14:35           ` Andre Vieira (lists)
2024-01-31 14:35             ` Richard Biener
2024-01-31 16:36               ` Andre Vieira (lists)
2024-02-01  7:19                 ` Richard Biener
2024-02-01 17:01                   ` Andre Vieira (lists)
2024-02-05  9:56                     ` Richard Biener
2024-02-26 16:56                       ` Andre Vieira (lists)
2024-02-27  8:47                         ` Richard Biener
2024-02-28 17:25                           ` Andre Vieira (lists) [this message]
2024-02-29  7:26                             ` Richard Biener
2024-02-01  7:59                 ` Richard Sandiford
2024-01-30 14:31 ` [PATCH 2/3] vect: disable multiple calls of poly simdclones Andre Vieira
2024-01-31 12:13   ` Richard Biener
2024-01-30 14:31 ` [PATCH 3/3] aarch64: Add SVE support for simd clones [PR 96342] Andre Vieira
2024-02-01 21:59   ` Richard Sandiford

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0a69b67c-7fa3-4a79-86c4-c6311cc56ea7@arm.com \
    --to=andre.simoesdiasvieira@arm.com \
    --cc=Richard.Sandiford@arm.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=rguenther@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).