* [PATCH] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
@ 2024-06-13  9:23 Alexandre Oliva
  2024-06-19 14:57 ` Richard Earnshaw (lists)
From: Alexandre Oliva @ 2024-06-13  9:23 UTC
  To: gcc-patches
  Cc: Rainer Orth, Mike Stump, Nick Clifton, Richard Earnshaw,
	Ramana Radhakrishnan


The test was too optimistic, alas.  We used to vectorize shifts
involving 8-bit and 16-bit integral types by clamping the shift count
at the highest in-range shift count, but that was not correct: such
narrow shifts expect integral promotion, so larger shift counts should
be accepted.  (int16_t)32768 >> (int16_t)16 must yield 0, not 1 (as
before the fix).

Unfortunately, in the gimple model of vector units, such large shift
counts wouldn't be well-defined, so we won't vectorize such shifts any
more, unless we can tell they're in range or undefined.

So the test that expected the incorrect clamping we no longer perform
needs to be adjusted.

Tested on x86_64-linux-gnu-x-arm-eabi.  Also tested with gcc-13
x-arm-vx7r2.  Ok to install?


for  gcc/testsuite/ChangeLog

	PR tree-optimization/113281
	* gcc.target/arm/simd/mve-vshr.c: Adjust expectations.
---
 gcc/testsuite/gcc.target/arm/simd/mve-vshr.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
index 8c7adef9ed8f1..8253427db6ef6 100644
--- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
@@ -56,9 +56,9 @@ FUNC_IMM(u, uint, 8, 16, >>, vshrimm)
 /* MVE has only 128-bit vectors, so we can vectorize only half of the
    functions above.  */
 /* Vector right shifts use vneg and left shifts.  */
-/* { dg-final { scan-assembler-times {vshl.s[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */
-/* { dg-final { scan-assembler-times {vshl.u[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */
-/* { dg-final { scan-assembler-times {vneg.s[0-9]+\tq[0-9]+, q[0-9]+} 6 } } */
+/* { dg-final { scan-assembler-times {vshl.s[0-9]+\tq[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vshl.u[0-9]+\tq[0-9]+, q[0-9]+} 1 } } */
+/* { dg-final { scan-assembler-times {vneg.s[0-9]+\tq[0-9]+, q[0-9]+} 2 } } */
 
 
 /* Shift by immediate.  */


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


* Re: [PATCH] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
  2024-06-13  9:23 [PATCH] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281] Alexandre Oliva
@ 2024-06-19 14:57 ` Richard Earnshaw (lists)
  2024-06-20 10:50   ` [PATCH v2] " Alexandre Oliva
From: Richard Earnshaw (lists) @ 2024-06-19 14:57 UTC
  To: Alexandre Oliva, gcc-patches
  Cc: Rainer Orth, Mike Stump, Nick Clifton, Ramana Radhakrishnan

On 13/06/2024 10:23, Alexandre Oliva wrote:
> 
> The test was too optimistic, alas.  We used to vectorize shifts
> involving 8-bit and 16-bit integral types by clamping the shift count
> at the highest in-range shift count, but that was not correct: such
> narrow shifts expect integral promotion, so larger shift counts should
> be accepted.  (int16_t)32768 >> (int16_t)16 must yield 0, not 1 (as
> before the fix).
> 
> Unfortunately, in the gimple model of vector units, such large shift
> counts wouldn't be well-defined, so we won't vectorize such shifts any
> more, unless we can tell they're in range or undefined.
> 
> So the test that expected the incorrect clamping we no longer perform
> needs to be adjusted.
> 
> Tested on x86_64-linux-gnu-x-arm-eabi.  Also tested with gcc-13
> x-arm-vx7r2.  Ok to install?
> 
> 
> for  gcc/testsuite/ChangeLog
> 
> 	PR tree-optimization/113281
> 	* gcc.target/arm/simd/mve-vshr.c: Adjust expectations.
> ---
>  gcc/testsuite/gcc.target/arm/simd/mve-vshr.c |    6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> index 8c7adef9ed8f1..8253427db6ef6 100644
> --- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> @@ -56,9 +56,9 @@ FUNC_IMM(u, uint, 8, 16, >>, vshrimm)
>  /* MVE has only 128-bit vectors, so we can vectorize only half of the
>     functions above.  */
>  /* Vector right shifts use vneg and left shifts.  */
> -/* { dg-final { scan-assembler-times {vshl.s[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */
> -/* { dg-final { scan-assembler-times {vshl.u[0-9]+\tq[0-9]+, q[0-9]+} 3 } } */
> -/* { dg-final { scan-assembler-times {vneg.s[0-9]+\tq[0-9]+, q[0-9]+} 6 } } */
> +/* { dg-final { scan-assembler-times {vshl.s[0-9]+\tq[0-9]+, q[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times {vshl.u[0-9]+\tq[0-9]+, q[0-9]+} 1 } } */
> +/* { dg-final { scan-assembler-times {vneg.s[0-9]+\tq[0-9]+, q[0-9]+} 2 } } */
>  
>  
>  /* Shift by immediate.  */
> 
> 

We know the range of the shift's LHS operand (it comes from a T* array), so that isn't the issue.  The problem comes from the RHS of the shift, where the range can legitimately be 0..31 (since the shift is conceptually performed at int precision).  That can't be handled correctly, as gimple only supports shift counts up to the number of bits in the type itself.  But we can inform the compiler that it needn't care about the larger range with a __builtin_unreachable().

It looks like adding

      if ((unsigned)b[i] >= 8*sizeof (TYPE##BITS##_t)) \
	__builtin_unreachable();		       \
 
to the shift-by-variable case is enough to tell the vectorizer that it's safe to vectorize this code without needing to handle any additional clamping.
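
For one instantiation, the hint would look roughly like this (an
illustrative expansion; the function name and lane count below are
hypothetical, for the signed 16-bit case):

      void test_vshr_s16x8 (int16_t * __restrict__ dest,
                            int16_t *a, int16_t *b)
      {
        int i;
        for (i = 0; i < 8; i++) {
          if ((unsigned) b[i] >= 8 * sizeof (int16_t)) /* i.e. >= 16 */
            __builtin_unreachable ();  /* counts are in [0,15] */
          dest[i] = a[i] >> b[i];
        }
      }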

Since this test is primarily about testing the MVE vector operations, I think I'd rather go with a solution along those lines rather than nobbling the test.

Thoughts?

R.


* [PATCH v2] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
  2024-06-19 14:57 ` Richard Earnshaw (lists)
@ 2024-06-20 10:50   ` Alexandre Oliva
  2024-06-20 11:57     ` Christophe Lyon
From: Alexandre Oliva @ 2024-06-20 10:50 UTC
  To: Richard Earnshaw (lists)
  Cc: gcc-patches, Rainer Orth, Mike Stump, Nick Clifton, Ramana Radhakrishnan

On Jun 19, 2024, "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> wrote:

> It looks like adding

>       if ((unsigned)b[i] >= 8*sizeof (TYPE##BITS##_t)) \
> 	__builtin_unreachable();		       \
 
Ah, yes, nice, good idea, thanks!

Here's the patch that implements that, co-attributed to you, as IMHO it
should be; please let me know if you find otherwise.  The commit message
is mostly unchanged from v1, except for the very end.  Tested on
x86_64-linux-gnu-x-arm-eabi.  Ok to install?


The test was too optimistic, alas.  We used to vectorize shifts
involving 8-bit and 16-bit integral types by clamping the shift count
at the highest in-range shift count, but that was not correct: such
narrow shifts expect integral promotion, so larger shift counts should
be accepted.  (int16_t)32768 >> (int16_t)16 must yield 0, not 1 (as
before the fix).

Unfortunately, in the gimple model of vector units, such large shift
counts wouldn't be well-defined, so we won't vectorize such shifts any
more, unless we can tell they're in range or undefined.

So the test that expected the incorrect clamping we no longer perform
needs to be adjusted.  Instead of nobbling the test, Richard Earnshaw
suggested annotating the test with the expected ranges so as to enable
the optimization.


Co-Authored-By: Richard Earnshaw <Richard.Earnshaw@arm.com>

for  gcc/testsuite/ChangeLog

	PR tree-optimization/113281
	* gcc.target/arm/simd/mve-vshr.c: Add expected ranges.
---
 gcc/testsuite/gcc.target/arm/simd/mve-vshr.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
index 8c7adef9ed8f1..35cd0e75be5dc 100644
--- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
@@ -9,6 +9,8 @@
   void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
     int i;								\
     for (i=0; i<NB; i++) {						\
+      if ((unsigned)b[i] >= __CHAR_BIT__ * sizeof (TYPE##BITS##_t))	\
+	__builtin_unreachable();					\
       dest[i] = a[i] OP b[i];						\
     }									\
 }


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


* Re: [PATCH v2] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
  2024-06-20 10:50   ` [PATCH v2] " Alexandre Oliva
@ 2024-06-20 11:57     ` Christophe Lyon
  2024-06-21  7:57       ` [PATCH v3] " Alexandre Oliva
From: Christophe Lyon @ 2024-06-20 11:57 UTC
  To: Alexandre Oliva
  Cc: Richard Earnshaw (lists),
	gcc-patches, Rainer Orth, Mike Stump, Nick Clifton,
	Ramana Radhakrishnan

Hi,


On Thu, 20 Jun 2024 at 12:51, Alexandre Oliva <oliva@adacore.com> wrote:
>
> On Jun 19, 2024, "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> wrote:
>
> > It looks like adding
>
> >       if ((unsigned)b[i] >= 8*sizeof (TYPE##BITS##_t)) \
> >       __builtin_unreachable();                       \
>
> Ah, yes, nice, good idea, thanks!
>
> Here's the patch that implements that, co-attributed to you, as IMHO it
> should be; please let me know if you find otherwise.  The commit message
> is mostly unchanged from v1, except for the very end.  Tested on
> x86_64-linux-gnu-x-arm-eabi.  Ok to install?
>
>
> The test was too optimistic, alas.  We used to vectorize shifts
> involving 8-bit and 16-bit integral types by clamping the shift count
> at the highest in-range shift count, but that was not correct: such
> narrow shifts expect integral promotion, so larger shift counts should
> be accepted.  (int16_t)32768 >> (int16_t)16 must yield 0, not 1 (as
> before the fix).
>
> Unfortunately, in the gimple model of vector units, such large shift
> counts wouldn't be well-defined, so we won't vectorize such shifts any
> more, unless we can tell they're in range or undefined.
>
> So the test that expected the incorrect clamping we no longer perform
> needs to be adjusted.  Instead of nobbling the test, Richard Earnshaw
> suggested annotating the test with the expected ranges so as to enable
> the optimization.
>
>
> Co-Authored-By: Richard Earnshaw <Richard.Earnshaw@arm.com>
>
> for  gcc/testsuite/ChangeLog
>
>         PR tree-optimization/113281
>         * gcc.target/arm/simd/mve-vshr.c: Add expected ranges.
> ---
>  gcc/testsuite/gcc.target/arm/simd/mve-vshr.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> index 8c7adef9ed8f1..35cd0e75be5dc 100644
> --- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> @@ -9,6 +9,8 @@
>    void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
>      int i;                                                             \
>      for (i=0; i<NB; i++) {                                             \
> +      if ((unsigned)b[i] >= __CHAR_BIT__ * sizeof (TYPE##BITS##_t))    \

Maybe using
if ((unsigned)b[i] >= BITS) \
would be clearer?

Thanks,

Christophe

> +       __builtin_unreachable();                                        \
>        dest[i] = a[i] OP b[i];                                          \
>      }                                                                  \
>  }
>
>
> --
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>    Free Software Activist                   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive


* [PATCH v3] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
  2024-06-20 11:57     ` Christophe Lyon
@ 2024-06-21  7:57       ` Alexandre Oliva
  2024-06-21 10:14         ` Richard Earnshaw (lists)
From: Alexandre Oliva @ 2024-06-21  7:57 UTC
  To: Christophe Lyon
  Cc: Richard Earnshaw (lists),
	gcc-patches, Rainer Orth, Mike Stump, Nick Clifton,
	Ramana Radhakrishnan

On Jun 20, 2024, Christophe Lyon <christophe.lyon@linaro.org> wrote:

> Maybe using
> if ((unsigned)b[i] >= BITS) \
> would be clearer?

Heh.  Why make it simpler if we can make it unreadable, right? :-D

Thanks, here's another version I've just retested on x-arm-eabi.  Ok?

I'm not sure how to credit your suggestion.  It's not like you pretty
much wrote the entire patch, as in Richard's case, but it's still a
sizable chunk of this two-liner.  Any preferences?


The test was too optimistic, alas.  We used to vectorize shifts
involving 8-bit and 16-bit integral types by clamping the shift count
at the highest in-range shift count, but that was not correct: such
narrow shifts expect integral promotion, so larger shift counts should
be accepted.  (int16_t)32768 >> (int16_t)16 must yield 0, not 1 (as
before the fix).

Unfortunately, in the gimple model of vector units, such large shift
counts wouldn't be well-defined, so we won't vectorize such shifts any
more, unless we can tell they're in range or undefined.

So the test that expected the incorrect clamping we no longer perform
needs to be adjusted.  Instead of nobbling the test, Richard Earnshaw
suggested annotating the test with the expected ranges so as to enable
the optimization.


Co-Authored-By: Richard Earnshaw <Richard.Earnshaw@arm.com>

for  gcc/testsuite/ChangeLog

	PR tree-optimization/113281
	* gcc.target/arm/simd/mve-vshr.c: Add expected ranges.
---
 gcc/testsuite/gcc.target/arm/simd/mve-vshr.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
index 8c7adef9ed8f1..03078de49c65e 100644
--- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
@@ -9,6 +9,8 @@
   void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
     int i;								\
     for (i=0; i<NB; i++) {						\
+      if ((unsigned)b[i] >= (unsigned)(BITS))				\
+	__builtin_unreachable();					\
       dest[i] = a[i] OP b[i];						\
     }									\
 }


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


* Re: [PATCH v3] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
  2024-06-21  7:57       ` [PATCH v3] " Alexandre Oliva
@ 2024-06-21 10:14         ` Richard Earnshaw (lists)
  2024-06-21 13:39           ` Christophe Lyon
From: Richard Earnshaw (lists) @ 2024-06-21 10:14 UTC
  To: Alexandre Oliva, Christophe Lyon
  Cc: gcc-patches, Rainer Orth, Mike Stump, Nick Clifton, Ramana Radhakrishnan

On 21/06/2024 08:57, Alexandre Oliva wrote:
> On Jun 20, 2024, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> 
>> Maybe using
>> if ((unsigned)b[i] >= BITS) \
>> would be clearer?
> 
> Heh.  Why make it simpler if we can make it unreadable, right? :-D
> 
> Thanks, here's another version I've just retested on x-arm-eabi.  Ok?
> 
> I'm not sure how to credit your suggestion.  It's not like you pretty
> much wrote the entire patch, as in Richard's case, but it's still a
> sizable chunk of this two-liner.  Any preferences?

How about mentioning Christophe's simplification in the commit log?
> 
> 
> The test was too optimistic, alas.  We used to vectorize shifts
> involving 8-bit and 16-bit integral types by clamping the shift count
> at the highest in-range shift count, but that was not correct: such
> narrow shifts expect integral promotion, so larger shift counts should
> be accepted.  (int16_t)32768 >> (int16_t)16 must yield 0, not 1 (as
> before the fix).

This is OK, but you might wish to revisit this statement before committing.  I think the above is a mis-summary of the original bug report, which had a test to pick between 0 and 1 as the result of a shift operation.

If I've understood what's going on here correctly, then we have 

(int16_t)32768 >> (int16_t) 16

but shift is always done at int precision, so this is (due to default promotions)

(int)(int16_t)32768 >> 16  // size/type of the shift amount does not matter.

which then simplifies to

-32768 >> 16;  // 0xffff8000 >> 16

= -1;

I think the original bug was that we were losing the cast to short (and hence the sign extension of the intermediate value), so effectively we simplified this to 

32768 >> 16; // 0x00008000 >> 16

= 0;

And the other part of the observation was that it had to be done this way (and couldn't be narrowed for vectorization) because 16 is larger than the maximum shift for a short (actually you say that just below).
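
Checked standalone (a sketch; right-shifting a negative int is
implementation-defined in C, but GCC implements it as an arithmetic
shift):

      #include <stdio.h>
      #include <stdint.h>

      int main (void)
      {
        int16_t v = (int16_t) 32768;         /* wraps to -32768 */
        printf ("%d\n", v >> 16);            /* 0xffff8000 >> 16 = -1 */
        printf ("%d\n", 0x00008000 >> 16);   /* sign extension lost: 0 */
        return 0;
      }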

R.

> 
> Unfortunately, in the gimple model of vector units, such large shift
> counts wouldn't be well-defined, so we won't vectorize such shifts any
> more, unless we can tell they're in range or undefined.
> 
> So the test that expected the incorrect clamping we no longer perform
> needs to be adjusted.  Instead of nobbling the test, Richard Earnshaw
> suggested annotating the test with the expected ranges so as to enable
> the optimization.
> 
> 
> Co-Authored-By: Richard Earnshaw <Richard.Earnshaw@arm.com>
> 
> for  gcc/testsuite/ChangeLog
> 
> 	PR tree-optimization/113281
> 	* gcc.target/arm/simd/mve-vshr.c: Add expected ranges.
> ---
>  gcc/testsuite/gcc.target/arm/simd/mve-vshr.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> index 8c7adef9ed8f1..03078de49c65e 100644
> --- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> @@ -9,6 +9,8 @@
>    void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
>      int i;								\
>      for (i=0; i<NB; i++) {						\
> +      if ((unsigned)b[i] >= (unsigned)(BITS))				\
> +	__builtin_unreachable();					\
>        dest[i] = a[i] OP b[i];						\
>      }									\
>  }
> 
> 



* Re: [PATCH v3] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
  2024-06-21 10:14         ` Richard Earnshaw (lists)
@ 2024-06-21 13:39           ` Christophe Lyon
  2024-06-24 11:35             ` Alexandre Oliva
From: Christophe Lyon @ 2024-06-21 13:39 UTC
  To: Richard Earnshaw (lists)
  Cc: Alexandre Oliva, gcc-patches, Rainer Orth, Mike Stump,
	Nick Clifton, Ramana Radhakrishnan

On Fri, 21 Jun 2024 at 12:14, Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com> wrote:
>
> On 21/06/2024 08:57, Alexandre Oliva wrote:
> > On Jun 20, 2024, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> >
> >> Maybe using
> >> if ((unsigned)b[i] >= BITS) \
> >> would be clearer?
> >
> > Heh.  Why make it simpler if we can make it unreadable, right? :-D
> >
> > Thanks, here's another version I've just retested on x-arm-eabi.  Ok?
> >
> > I'm not sure how to credit your suggestion.  It's not like you pretty
> > much wrote the entire patch, as in Richard's case, but it's still a
> > sizable chunk of this two-liner.  Any preferences?
>
> How about mentioning Christophe's simplification in the commit log?

For the avoidance of doubt: it's OK for me (but you don't need to
mention my name in fact ;-)

Thanks,

Christophe

> >
> >
> > The test was too optimistic, alas.  We used to vectorize shifts
> > involving 8-bit and 16-bit integral types by clamping the shift count
> > at the highest in-range shift count, but that was not correct: such
> > narrow shifts expect integral promotion, so larger shift counts should
> > be accepted.  (int16_t)32768 >> (int16_t)16 must yield 0, not 1 (as
> > before the fix).
>
> This is OK, but you might wish to revisit this statement before committing.  I think the above is a mis-summary of the original bug report, which had a test to pick between 0 and 1 as the result of a shift operation.
>
> If I've understood what's going on here correctly, then we have
>
> (int16_t)32768 >> (int16_t) 16
>
> but shift is always done at int precision, so this is (due to default promotions)
>
> (int)(int16_t)32768 >> 16  // size/type of the shift amount does not matter.
>
> which then simplifies to
>
> -32768 >> 16;  // 0xffff8000 >> 16
>
> = -1;
>
> I think the original bug was that we were losing the cast to short (and hence the sign extension of the intermediate value), so effectively we simplified this to
>
> 32768 >> 16; // 0x00008000 >> 16
>
> = 0;
>
> And the other part of the observation was that it had to be done this way (and couldn't be narrowed for vectorization) because 16 is larger than the maximum shift for a short (actually you say that just below).
>
> R.
>
> >
> > Unfortunately, in the gimple model of vector units, such large shift
> > counts wouldn't be well-defined, so we won't vectorize such shifts any
> > more, unless we can tell they're in range or undefined.
> >
> > So the test that expected the incorrect clamping we no longer perform
> > needs to be adjusted.  Instead of nobbling the test, Richard Earnshaw
> > suggested annotating the test with the expected ranges so as to enable
> > the optimization.
> >
> >
> > Co-Authored-By: Richard Earnshaw <Richard.Earnshaw@arm.com>
> >
> > for  gcc/testsuite/ChangeLog
> >
> >       PR tree-optimization/113281
> >       * gcc.target/arm/simd/mve-vshr.c: Add expected ranges.
> > ---
> >  gcc/testsuite/gcc.target/arm/simd/mve-vshr.c |    2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > index 8c7adef9ed8f1..03078de49c65e 100644
> > --- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> > @@ -9,6 +9,8 @@
> >    void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
> >      int i;                                                           \
> >      for (i=0; i<NB; i++) {                                           \
> > +      if ((unsigned)b[i] >= (unsigned)(BITS))                                \
> > +     __builtin_unreachable();                                        \
> >        dest[i] = a[i] OP b[i];                                                \
> >      }                                                                        \
> >  }
> >
> >
>


* Re: [PATCH v3] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
  2024-06-21 13:39           ` Christophe Lyon
@ 2024-06-24 11:35             ` Alexandre Oliva
  2024-06-24 16:20               ` Richard Earnshaw (lists)
From: Alexandre Oliva @ 2024-06-24 11:35 UTC
  To: Christophe Lyon
  Cc: Richard Earnshaw (lists),
	gcc-patches, Rainer Orth, Mike Stump, Nick Clifton,
	Ramana Radhakrishnan

On Jun 21, 2024, Christophe Lyon <christophe.lyon@linaro.org> wrote:

>> How about mentioning Christophe's simplification in the commit log?

> For the avoidance of doubt: it's OK for me (but you don't need to
> mention my name in fact ;-)

Needing or not, I added it ;-)

>> > be accepted.  (int16_t)32768 >> (int16_t)16 must yield 0, not 1 (as
>> > before the fix).

>> This is OK, but you might wish to revisit this statement before
>> committing.

Oh, right, sorry, I messed it up.  uint16_t was what I should have put
in there.  int16_t would have overflowed and invoked undefined behavior
to begin with, and I see it misled you far down the wrong path.

>> I think the original bug was that we were losing the cast to short

The problem was that the shift count saturated at 15.  AFAIK sign
extension was not relevant.  Hopefully the rewritten opening paragraph
below makes that clearer.  I will put it in later this week barring
objections or further suggestions of improvement.  Thanks,


[testsuite] [arm] [vect] adjust mve-vshr test [PR113281]

The test was too optimistic, alas.  We used to vectorize shifts by
clamping the shift counts below the bit width of the types (e.g. at 15
for 16-bit vector elements), but (uint16_t)32768 >> (uint16_t)16 is
well defined (because of promotion to 32-bit int) and must yield 0,
not 1 (as before the fix).
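
A minimal standalone check of the claim above (plain C, outside the
testsuite):

      #include <assert.h>
      #include <stdint.h>

      int main (void)
      {
        uint16_t x = 32768;       /* 0x8000 */
        uint16_t n = 16;
        assert ((x >> n) == 0);   /* both promote to int; 16 is in range */
        assert ((x >> 15) == 1);  /* what clamping the count at 15 gave */
        return 0;
      }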

Unfortunately, in the gimple model of vector units, such large shift
counts wouldn't be well-defined, so we won't vectorize such shifts any
more, unless we can tell they're in range or undefined.

So the test that expected the vectorization we no longer performed
needs to be adjusted.  Instead of nobbling the test, Richard Earnshaw
suggested annotating the test with the expected ranges so as to enable
the optimization, and Christophe Lyon suggested a further
simplification.


Co-Authored-By: Richard Earnshaw <Richard.Earnshaw@arm.com>

for  gcc/testsuite/ChangeLog

	PR tree-optimization/113281
	* gcc.target/arm/simd/mve-vshr.c: Add expected ranges.
---
 gcc/testsuite/gcc.target/arm/simd/mve-vshr.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
index 8c7adef9ed8f1..03078de49c65e 100644
--- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
+++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
@@ -9,6 +9,8 @@
   void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
     int i;								\
     for (i=0; i<NB; i++) {						\
+      if ((unsigned)b[i] >= (unsigned)(BITS))				\
+	__builtin_unreachable();					\
       dest[i] = a[i] OP b[i];						\
     }									\
 }


-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


* Re: [PATCH v3] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
  2024-06-24 11:35             ` Alexandre Oliva
@ 2024-06-24 16:20               ` Richard Earnshaw (lists)
  2024-06-24 19:01                 ` Alexandre Oliva
From: Richard Earnshaw (lists) @ 2024-06-24 16:20 UTC
  To: Alexandre Oliva, Christophe Lyon
  Cc: gcc-patches, Rainer Orth, Mike Stump, Nick Clifton, Ramana Radhakrishnan

On 24/06/2024 12:35, Alexandre Oliva wrote:
> On Jun 21, 2024, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> 
>>> How about mentioning Christophe's simplification in the commit log?
> 
>> For the avoidance of doubt: it's OK for me (but you don't need to
>> mention my name in fact ;-)
> 
> Needing or not, I added it ;-)
> 
>>>> be accepted.  (int16_t)32768 >> (int16_t)16 must yield 0, not 1 (as
>>>> before the fix).
> 
>>> This is OK, but you might wish to revisit this statement before
>>> committing.
> 
> Oh, right, sorry, I messed it up.  uint16_t was what I should have put
> in there.  int16_t would have overflowed and invoked undefined behavior
> to begin with, and I see it misled you far down the wrong path.
> 
>>> I think the original bug was that we were losing the cast to short
> 
> The problem was that the shift count saturated at 15.  AFAIK sign
> extension was not relevant.  Hopefully the rewritten opening paragraph
> below makes that clearer.  I will put it in later this week barring
> objections or further suggestions of improvement.  Thanks,

A signed shift right on a 16-bit vector element by 15 would still yield -1; but ...

> 
> 
> [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
> 
> The test was too optimistic, alas.  We used to vectorize shifts by
> clamping the shift counts below the bit width of the types (e.g. at 15
> for 16-bit vector elements), but (uint16_t)32768 >> (uint16_t)16 is
> well defined (because of promotion to 32-bit int) and must yield 0,
> not 1 (as before the fix).

That makes more sense now.  Thanks.

> 
> Unfortunately, in the gimple model of vector units, such large shift
> counts wouldn't be well-defined, so we won't vectorize such shifts any
> more, unless we can tell they're in range or undefined.
> 
> So the test that expected the vectorization we no longer performed
> needs to be adjusted.  Instead of nobbling the test, Richard Earnshaw
> suggested annotating the test with the expected ranges so as to enable
> the optimization, and Christophe Lyon suggested a further
> simplification.
> 
> 
> Co-Authored-By: Richard Earnshaw <Richard.Earnshaw@arm.com>
> 
> for  gcc/testsuite/ChangeLog
> 
> 	PR tree-optimization/113281
> 	* gcc.target/arm/simd/mve-vshr.c: Add expected ranges.

I think this is OK now.

R.

> ---
>  gcc/testsuite/gcc.target/arm/simd/mve-vshr.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> index 8c7adef9ed8f1..03078de49c65e 100644
> --- a/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> +++ b/gcc/testsuite/gcc.target/arm/simd/mve-vshr.c
> @@ -9,6 +9,8 @@
>    void test_ ## NAME ##_ ## SIGN ## BITS ## x ## NB (TYPE##BITS##_t * __restrict__ dest, TYPE##BITS##_t *a, TYPE##BITS##_t *b) { \
>      int i;								\
>      for (i=0; i<NB; i++) {						\
> +      if ((unsigned)b[i] >= (unsigned)(BITS))				\
> +	__builtin_unreachable();					\
>        dest[i] = a[i] OP b[i];						\
>      }									\
>  }
> 
> 



* Re: [PATCH v3] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
  2024-06-24 16:20               ` Richard Earnshaw (lists)
@ 2024-06-24 19:01                 ` Alexandre Oliva
  2024-06-25 17:33                   ` Richard Sandiford
From: Alexandre Oliva @ 2024-06-24 19:01 UTC
  To: Richard Earnshaw (lists), Richard Sandiford
  Cc: Christophe Lyon, gcc-patches, Rainer Orth, Mike Stump,
	Nick Clifton, Ramana Radhakrishnan

On Jun 24, 2024, "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> wrote:

> A signed shift right on a 16-bit vector element by 15 would still
> yield -1

Yeah.  Indeed, ISTM that we *could* have retained the clamping
transformation for *signed* shifts, since the clamping would only make a
difference in case of (undefined) overflow.  Only for unsigned shifts
can well-defined shifts yield different results with clamping.
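
The underlying identity is exhaustively checkable, e.g. with this quick
sketch (it relies on GCC's arithmetic right shift of negative values):

      #include <assert.h>
      #include <stdint.h>

      int main (void)
      {
        /* The promoted int is in [-32768, 32767], so v >> 15 is
           already 0 or -1, and any larger count yields the same.  */
        for (int32_t v = -32768; v <= 32767; v++)
          for (int k = 15; k < 32; k++)
            assert ((v >> k) == (v >> 15));
        return 0;
      }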

Richard (Sandiford), do you happen to recall why the IRC conversation
mentioned in the PR trail decided to drop it entirely, even for signed
types?

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


* Re: [PATCH v3] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
  2024-06-24 19:01                 ` Alexandre Oliva
@ 2024-06-25 17:33                   ` Richard Sandiford
  2024-06-26  5:15                     ` Alexandre Oliva
From: Richard Sandiford @ 2024-06-25 17:33 UTC
  To: Alexandre Oliva
  Cc: Richard Earnshaw (lists),
	Christophe Lyon, gcc-patches, Rainer Orth, Mike Stump,
	Nick Clifton, Ramana Radhakrishnan

Alexandre Oliva <oliva@adacore.com> writes:
> On Jun 24, 2024, "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com> wrote:
>
>> A signed shift right on a 16-bit vector element by 15 would still
>> yield -1
>
> Yeah.  Indeed, ISTM that we *could* have retained the clamping
> transformation for *signed* shifts, since the clamping would only make a
> difference in case of (undefined) overflow.  Only for unsigned shifts
> can well-defined shifts yield different results with clamping.
>
> Richard (Sandiford), do you happen to recall why the IRC conversation
> mentioned in the PR trail decided to drop it entirely, even for signed
> types?

In the PR, the original shift was 32768 >> x (x >= 16) on ints, which the
vectoriser was narrowing to 32768 >> x' on shorts.  The original shift is
well-defined for both signed and unsigned shifts, and no valid x' exists
for that case.
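
Concretely, a sketch of why no narrowed count can reproduce the
int-width result:

      #include <assert.h>
      #include <stdint.h>

      int main (void)
      {
        int orig = 32768 >> 16;  /* 0 at int width */
        /* In a 16-bit lane the value is 0x8000, and every in-range
           count leaves it nonzero (0x8000 >> 15 == 1 at minimum).  */
        for (int xp = 0; xp < 16; xp++)
          assert (((uint16_t) 32768 >> xp) != (unsigned) orig);
        return 0;
      }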

Thanks,
Richard



* Re: [PATCH v3] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
  2024-06-25 17:33                   ` Richard Sandiford
@ 2024-06-26  5:15                     ` Alexandre Oliva
  2024-06-26  6:36                       ` Richard Sandiford
From: Alexandre Oliva @ 2024-06-26  5:15 UTC
  To: Richard Earnshaw (lists)
  Cc: Christophe Lyon, gcc-patches, Rainer Orth, Mike Stump,
	Nick Clifton, Ramana Radhakrishnan, richard.sandiford

On Jun 25, 2024, Richard Sandiford <richard.sandiford@arm.com> wrote:

>> Richard (Sandiford), do you happen to recall why the IRC conversation
>> mentioned in the PR trail decided to drop it entirely, even for signed
>> types?

> In the PR, the original shift was 32768 >> x (x >= 16) on ints, which the
> vectoriser was narrowing to 32768 >> x' on shorts.  The original shift is
> well-defined for both signed and unsigned shifts, and no valid x' exists
> for that case.

It sounds like shifts on shorts proper, which would have benefitted from
the optimization, were not covered, and thus there may be room for
reconsidering, eh?

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


* Re: [PATCH v3] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
  2024-06-26  5:15                     ` Alexandre Oliva
@ 2024-06-26  6:36                       ` Richard Sandiford
  2024-06-27  9:33                         ` Alexandre Oliva
From: Richard Sandiford @ 2024-06-26  6:36 UTC
  To: Alexandre Oliva
  Cc: Richard Earnshaw (lists),
	Christophe Lyon, gcc-patches, Rainer Orth, Mike Stump,
	Nick Clifton, Ramana Radhakrishnan

Alexandre Oliva <oliva@adacore.com> writes:
> On Jun 25, 2024, Richard Sandiford <richard.sandiford@arm.com> wrote:
>
>>> Richard (Sandiford), do you happen to recall why the IRC conversation
>>> mentioned in the PR trail decided to drop it entirely, even for signed
>>> types?
>
>> In the PR, the original shift was 32768 >> x (x >= 16) on ints, which the
>> vectoriser was narrowing to 32768 >> x' on shorts.  The original shift is
>> well-defined for both signed and unsigned shifts, and no valid x' exists
>> for that case.
>
> It sounds like shifts on shorts proper, which would have benefitted from
> the optimization, were not covered, and thus there may be room for
> reconsidering, eh?

What kind of case are you thinking of?  If a frontend creates a true
16-bit shift then it wouldn't need to be narrowed by this optimisation.

Like you say in the commit message, the problem here is that the values
are promoted to ints and then shifted as 32-bit values, so shifts by
>= 16 are well-defined.

Richard


* Re: [PATCH v3] [testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
  2024-06-26  6:36                       ` Richard Sandiford
@ 2024-06-27  9:33                         ` Alexandre Oliva
From: Alexandre Oliva @ 2024-06-27  9:33 UTC
  To: Richard Earnshaw (lists)
  Cc: Christophe Lyon, gcc-patches, Rainer Orth, Mike Stump,
	Nick Clifton, Ramana Radhakrishnan, richard.sandiford

On Jun 26, 2024, Richard Sandiford <richard.sandiford@arm.com> wrote:

> Alexandre Oliva <oliva@adacore.com> writes:
>> On Jun 25, 2024, Richard Sandiford <richard.sandiford@arm.com> wrote:
>> 
>>>> Richard (Sandiford), do you happen to recall why the IRC conversation
>>>> mentioned in the PR trail decided to drop it entirely, even for signed
>>>> types?
>> 
>>> In the PR, the original shift was 32768 >> x (x >= 16) on ints, which the
>>> vectoriser was narrowing to 32768 >> x' on shorts.  The original shift is
>>> well-defined for both signed and unsigned shifts, and no valid x' exists
>>> for that case.
>> 
>> It sounds like shifts on shorts proper, which would have benefitted from
>> the optimization, were not covered, and thus there may be room for
>> reconsidering, eh?

> What kind of case are you thinking of?  If a frontend creates a true
> 16-bit shift then it wouldn't need to be narrowed by this optimisation.

I'm thinking of *any* shifts of *signed* shorts (in loops over arrays).
The compiler can't generally tell that the shift count is < 16, as would
now be required to use the vector instructions, but that's not
necessary: for *signed* shorts, clamping the shift count at 15 like we
used to do is enough to get the correct result for well-defined (as in
non-overflowing) operations.  ISTM we've given up a useful optimization.

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
   Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive

