public inbox for gcc-patches@gcc.gnu.org
From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Biener <rguenther@suse.de>
Cc: gcc-patches@gcc.gnu.org,  tamar.christina@arm.com
Subject: Re: [PATCH 4/4] Testsuite updates
Date: Wed, 22 May 2024 12:37:23 +0100
Message-ID: <mptjzjmjcwc.fsf@arm.com>
In-Reply-To: <p4q0o88p-6800-n13o-9o40-nr41r60sp683@fhfr.qr> (Richard Biener's message of "Wed, 22 May 2024 12:58:58 +0200 (CEST)")

Richard Biener <rguenther@suse.de> writes:
> On Tue, 21 May 2024, Richard Biener wrote:
>
>> The gcc.dg/vect/slp-12a.c case is interesting as we currently split
>> the 8 store group into lanes 0-5 which we SLP with an unroll factor
>> of two (on x86-64 with SSE) and the remaining two lanes are using
>> interleaving vectorization with a final unroll factor of four.  Thus
>> we're using hybrid SLP within a single store group.  After the change
>> we discover the same 0-5 lane SLP part as well as two single-lane
>> parts feeding the full store group.  But that results in a load
>> permutation that isn't supported (I have WIP patches to rectify that).
>> So we end up cancelling SLP and vectorizing the whole loop with
>> interleaving which is IMO good and results in better code.
>> 
>> This is similar for gcc.target/i386/pr52252-atom.c where interleaving
>> generates much better code than hybrid SLP.  I'm unsure how to update
>> the testcase though.
>> 
>> gcc.dg/vect/slp-21.c runs into similar situations.  Note that when
>> we discard an instance while analyzing SLP operations, we currently
>> force the full loop to have no SLP because hybrid detection is
>> broken.  It's probably not worth fixing this at this moment.
>> 
>> For gcc.dg/vect/pr97428.c we are not splitting the 16 store group
>> into two but merge the two 8 lane loads into one before doing the
>> store and thus have only a single SLP instance.  A similar situation
>> happens in gcc.dg/vect/slp-11c.c but the branches feeding the
>> single SLP store only have a single lane.  Likewise for
>> gcc.dg/vect/vect-complex-5.c and gcc.dg/vect/vect-gather-2.c.
>> 
>> gcc.dg/vect/slp-cond-1.c has an additional SLP vectorization
>> with a SLP store group of size two but two single-lane branches.
>> 
>> gcc.target/i386/pr98928.c ICEs in SLP permute optimization
>> because we don't expect a constant and internal branch to be
>> merged with a permute node in
>> vect_optimize_slp_pass::change_vec_perm_layout:4859 (the only
>> permutes merging two SLP nodes are two-operator nodes right now).
>> This still requires fixing.
>> 
>> The whole series has been bootstrapped and tested on 
>> x86_64-unknown-linux-gnu with the gcc.target/i386/pr98928.c FAIL
>> unfixed.
>> 
>> Comments welcome (and hello ARM CI), RISC-V and other arch
>> testing appreciated.  Unless there are comments to the contrary
>> I plan to push patch 1 and 2 tomorrow.
>
> RISC-V CI didn't trigger (not sure what magic is required).  Both
> ARM and AARCH64 show that the "vectorizing stmts using SLP" scans are a
> bit fragile because we sometimes cancel SLP because we want to use
> load/store-lanes.
>
> I have locally scrapped the SLP scanning for gcc.dg/vect/slp-21.c where
> it doesn't really matter (and if we are finished with all-SLP it will
> matter nowhere).  I've conditionalized the outcome based on
> vect_load_lanes for gcc.dg/vect/slp-11c.c and
> gcc.dg/vect/slp-cond-1.c.
>
> On AARCH64 additionally gcc.target/aarch64/sve/mask_struct_store_4.c
> ICEs; I have a fix for that.
>
> gcc.target/aarch64/pr99873_2.c FAILs because with a single
> SLP store group merged from two two-lane load groups we cancel
> the SLP and want to use load/store-lanes.  I'll leave this
> FAILing or shall I XFAIL it?

Yeah, agree it's probably worth leaving it FAILing for now, since it
is something we should try to fix for GCC 15.

Thanks,
Richard

>
> Thanks,
> Richard.
>
>> Thanks,
>> Richard.
>> 
>> 	* gcc.dg/vect/pr97428.c: Expect a single store SLP group.
>> 	* gcc.dg/vect/slp-11c.c: Likewise.
>> 	* gcc.dg/vect/vect-complex-5.c: Likewise.
>> 	* gcc.dg/vect/slp-12a.c: Do not expect SLP.
>> 	* gcc.dg/vect/slp-21.c: Likewise.
>> 	* gcc.dg/vect/slp-cond-1.c: Expect one more SLP.
>> 	* gcc.dg/vect/vect-gather-2.c: Expect SLP to be used.
>> 	* gcc.target/i386/pr52252-atom.c: XFAIL test for palignr.
>> ---
>>  gcc/testsuite/gcc.dg/vect/pr97428.c          |  2 +-
>>  gcc/testsuite/gcc.dg/vect/slp-11c.c          |  5 +++--
>>  gcc/testsuite/gcc.dg/vect/slp-12a.c          |  6 +++++-
>>  gcc/testsuite/gcc.dg/vect/slp-21.c           | 19 +++++--------------
>>  gcc/testsuite/gcc.dg/vect/slp-cond-1.c       |  2 +-
>>  gcc/testsuite/gcc.dg/vect/vect-complex-5.c   |  2 +-
>>  gcc/testsuite/gcc.dg/vect/vect-gather-2.c    |  1 -
>>  gcc/testsuite/gcc.target/i386/pr52252-atom.c |  3 ++-
>>  8 files changed, 18 insertions(+), 22 deletions(-)
>> 
>> diff --git a/gcc/testsuite/gcc.dg/vect/pr97428.c b/gcc/testsuite/gcc.dg/vect/pr97428.c
>> index 60dd984cfd3..3cc9976c00c 100644
>> --- a/gcc/testsuite/gcc.dg/vect/pr97428.c
>> +++ b/gcc/testsuite/gcc.dg/vect/pr97428.c
>> @@ -44,5 +44,5 @@ void foo_i2(dcmlx4_t dst[], const dcmlx_t src[], int n)
>>  /* { dg-final { scan-tree-dump "Detected interleaving store of size 16" "vect" } } */
>>  /* We're not able to peel & apply re-aligning to make accesses well-aligned for !vect_hw_misalign,
>>     but we could by peeling the stores for alignment and applying re-aligning loads.  */
>> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail { ! vect_hw_misalign } } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { ! vect_hw_misalign } } } } */
>>  /* { dg-final { scan-tree-dump-not "gap of 6 elements" "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/slp-11c.c b/gcc/testsuite/gcc.dg/vect/slp-11c.c
>> index 0f680cd4e60..169b0d10eec 100644
>> --- a/gcc/testsuite/gcc.dg/vect/slp-11c.c
>> +++ b/gcc/testsuite/gcc.dg/vect/slp-11c.c
>> @@ -13,7 +13,8 @@ main1 ()
>>    unsigned int in[N*8] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63};
>>    float out[N*8];
>>  
>> -  /* Different operations - not SLPable.  */
>> +  /* Different operations - we SLP the store and split the group to two
>> +     single-lane branches.  */
>>    for (i = 0; i < N*4; i++)
>>      {
>>        out[i*2] = ((float) in[i*2] * 2 + 6) ;
>> @@ -44,4 +45,4 @@ int main (void)
>>  
>>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { { vect_uintfloat_cvt && vect_strided2 } && vect_int_mult } } } } */
>>  /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { { vect_uintfloat_cvt && vect_strided2 } && vect_int_mult } } } } } */
>> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0  "vect"  } } */
>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1  "vect"  } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/slp-12a.c b/gcc/testsuite/gcc.dg/vect/slp-12a.c
>> index 973de6ada21..2f98dc9da0b 100644
>> --- a/gcc/testsuite/gcc.dg/vect/slp-12a.c
>> +++ b/gcc/testsuite/gcc.dg/vect/slp-12a.c
>> @@ -40,6 +40,10 @@ main1 ()
>>        out[i*8 + 3] = b3 - 1;
>>        out[i*8 + 4] = b4 - 8;
>>        out[i*8 + 5] = b5 - 7;
>> +      /* Due to the use in the ia[i] store we keep the feeding expression
>> +         in the form ((in[i*8 + 6] + 11) * 3 - 3) while other expressions
>> +	 got associated as for example (in[i*5 + 5] * 4 + 33).  That
>> +	 causes SLP discovery to fail.  */
>>        out[i*8 + 6] = b6 - 3;
>>        out[i*8 + 7] = b7 - 7;
>>  
>> @@ -76,5 +80,5 @@ int main (void)
>>  
>>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
>>  /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
>> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { { vect_strided8 && {! vect_load_lanes } } && vect_int_mult } } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { { vect_strided8 && {! vect_load_lanes } } && vect_int_mult } } } } */
>>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/slp-21.c b/gcc/testsuite/gcc.dg/vect/slp-21.c
>> index 58751688414..dc153a53b47 100644
>> --- a/gcc/testsuite/gcc.dg/vect/slp-21.c
>> +++ b/gcc/testsuite/gcc.dg/vect/slp-21.c
>> @@ -12,6 +12,7 @@ main1 ()
>>    unsigned short out[N*8], out2[N*8], b0, b1, b2, b3, b4, a0, a1, a2, a3, b5;
>>    unsigned short in[N*8];
>>  
>> +#pragma GCC novector
>>    for (i = 0; i < N*8; i++)
>>      {
>>        in[i] = i;
>> @@ -202,18 +203,8 @@ int main (void)
>>    return 0;
>>  }
>>  
>> -/* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  { target { vect_strided4 || vect_extract_even_odd } } } } */
>> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target  { ! { vect_strided4 || vect_extract_even_odd } } } } } */
>> -/* Some targets can vectorize the second of the three main loops using
>> -   hybrid SLP.  For 128-bit vectors, the required 4->3 permutations are:
>> -
>> -   { 0, 1, 2, 4, 5, 6, 8, 9 }
>> -   { 2, 4, 5, 6, 8, 9, 10, 12 }
>> -   { 5, 6, 8, 9, 10, 12, 13, 14 }
>> -
>> -   Not all vect_perm targets support that, and it's a bit too specific to have
>> -   its own effective-target selector, so we just test targets directly.  */
>> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target { powerpc64*-*-* s390*-*-* loongarch*-*-* } } } } */
>> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { vect_strided4 && { ! { powerpc64*-*-* s390*-*-* loongarch*-*-* } } } } } } */
>> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect"  { target { ! { vect_strided4 } } } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect"  { target { vect_strided4 || vect_extract_even_odd } } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  { target  { ! { vect_strided4 || vect_extract_even_odd } } } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 6 "vect" { xfail *-*-* } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" } } */
>>    
>> diff --git a/gcc/testsuite/gcc.dg/vect/slp-cond-1.c b/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
>> index 450c7141c96..16ab0cc7605 100644
>> --- a/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
>> +++ b/gcc/testsuite/gcc.dg/vect/slp-cond-1.c
>> @@ -125,4 +125,4 @@ main ()
>>    return 0;
>>  }
>>  
>> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" } } */
>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-complex-5.c b/gcc/testsuite/gcc.dg/vect/vect-complex-5.c
>> index addcf60438c..ac562dc475c 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-complex-5.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-complex-5.c
>> @@ -41,4 +41,4 @@ main (void)
>>  }
>>  
>>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target vect_load_lanes } } } */
>> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! vect_load_lanes } xfail { ! vect_hw_misalign } } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! vect_load_lanes } xfail { ! vect_hw_misalign } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-2.c b/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
>> index 4c23b808333..10e64e64d47 100644
>> --- a/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
>> +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-2.c
>> @@ -36,6 +36,5 @@ f3 (int *restrict y, int *restrict x, int *restrict indices)
>>      }
>>  }
>>  
>> -/* { dg-final { scan-tree-dump-not "Loop contains only SLP stmts" vect } } */
>>  /* { dg-final { scan-tree-dump "different gather base" vect { target { ! vect_gather_load_ifn } } } } */
>>  /* { dg-final { scan-tree-dump "different gather scale" vect { target { ! vect_gather_load_ifn } } } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/pr52252-atom.c b/gcc/testsuite/gcc.target/i386/pr52252-atom.c
>> index 11f94411575..02736d56d31 100644
>> --- a/gcc/testsuite/gcc.target/i386/pr52252-atom.c
>> +++ b/gcc/testsuite/gcc.target/i386/pr52252-atom.c
>> @@ -25,4 +25,5 @@ matrix_mul (byte *in, byte *out, int size)
>>      }
>>  }
>>  
>> -/* { dg-final { scan-assembler "palignr" } } */
>> +/* We are no longer using hybrid SLP.  */
>> +/* { dg-final { scan-assembler "palignr" { xfail *-*-* } } } */
>> 
