[Bug tree-optimization/115304] New: gcc.dg/vect/slp-gap-1.c FAILs

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/115304] New: gcc.dg/vect/slp-gap-1.c FAILs
@ 2024-05-31  8:45 ro at gcc dot gnu.org
  2024-05-31  8:46 ` [Bug tree-optimization/115304] " ro at gcc dot gnu.org
                   ` (14 more replies)
  0 siblings, 15 replies; 16+ messages in thread
From: ro at gcc dot gnu.org @ 2024-05-31  8:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

            Bug ID: 115304
           Summary: gcc.dg/vect/slp-gap-1.c FAILs
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ro at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
  Target Milestone: ---
            Target: sparc*-sun-solaris2.11

The new gcc.dg/vect/slp-gap-1.c test FAILs on 32-bit and 64-bit Solaris/SPARC:

+FAIL: gcc.dg/vect/slp-gap-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect "{_[0-9]+, 0" 6
+FAIL: gcc.dg/vect/slp-gap-1.c scan-tree-dump-times vect "{_[0-9]+, 0" 6

The dump has

/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/slp-gap-1.c:7:21: note:  ==> examining statement: _34 = *pix1_72;
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/slp-gap-1.c:7:21: missed:   unsupported vect permute { 0 1 2 3 8 9 10 11 }
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/slp-gap-1.c:7:21: missed:   unsupported load permutation
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/slp-gap-1.c:9:29: missed:   not vectorized: relevant stmt not supported: _34 = *pix1_72;

/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/slp-gap-1.c:7:21: note:  ==> examining statement: _34 = *pix1_72;
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/slp-gap-1.c:7:21: missed:   no array mode for V8QI[16]
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/slp-gap-1.c:7:21: missed:   no array mode for V8QI[16]
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/slp-gap-1.c:7:21: missed:   extract even/odd not supported by target
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/slp-gap-1.c:7:21: missed:   not falling back to elementwise accesses
/vol/gcc/src/hg/master/local/gcc/testsuite/gcc.dg/vect/slp-gap-1.c:9:29: missed:   not vectorized: relevant stmt not supported: _34 = *pix1_72;

IIUC the test needs both vect_perm and vect_extract_even_odd.
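
For reference, the permutation the dump rejects, { 0 1 2 3 8 9 10 11 }, selects the low four bytes of each of two 8-byte vectors. A minimal scalar sketch in C follows; the function name `vec_perm_v8qi` and the plain-array representation are illustrative assumptions, not GCC internals.

```c
#include <stdint.h>

/* Scalar model of a two-input byte permute on 8-element vectors.
   Selector values 0..7 pick from a, 8..15 pick from b.
   Hypothetical helper, for illustration only.  */
void
vec_perm_v8qi (uint8_t out[8], const uint8_t a[8], const uint8_t b[8],
               const uint8_t sel[8])
{
  for (int i = 0; i < 8; i++)
    {
      unsigned idx = sel[i] & 15;   /* wrap into the 16 input lanes */
      out[i] = idx < 8 ? a[idx] : b[idx - 8];
    }
}
```

With the selector from the dump, the result is { a[0..3], b[0..3] }, i.e. the low halves of both inputs placed side by side.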


* [Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
  2024-05-31  8:45 [Bug tree-optimization/115304] New: gcc.dg/vect/slp-gap-1.c FAILs ro at gcc dot gnu.org
@ 2024-05-31  8:46 ` ro at gcc dot gnu.org
  2024-05-31  8:46 ` ro at gcc dot gnu.org
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: ro at gcc dot gnu.org @ 2024-05-31  8:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

--- Comment #1 from Rainer Orth <ro at gcc dot gnu.org> ---
Created attachment 58317
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58317&action=edit
32-bit sparc-sun-solaris2.11 slp-gap-1.c.179t.vect


* [Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
  2024-05-31  8:45 [Bug tree-optimization/115304] New: gcc.dg/vect/slp-gap-1.c FAILs ro at gcc dot gnu.org
  2024-05-31  8:46 ` [Bug tree-optimization/115304] " ro at gcc dot gnu.org
@ 2024-05-31  8:46 ` ro at gcc dot gnu.org
  2024-05-31 12:26 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: ro at gcc dot gnu.org @ 2024-05-31  8:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

Rainer Orth <ro at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |15.0


* [Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
  2024-05-31  8:45 [Bug tree-optimization/115304] New: gcc.dg/vect/slp-gap-1.c FAILs ro at gcc dot gnu.org
  2024-05-31  8:46 ` [Bug tree-optimization/115304] " ro at gcc dot gnu.org
  2024-05-31  8:46 ` ro at gcc dot gnu.org
@ 2024-05-31 12:26 ` rguenth at gcc dot gnu.org
  2024-06-03  9:18 ` ro at CeBiTec dot Uni-Bielefeld.DE
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-05-31 12:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |testsuite-fail

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
It should only need vect32 - basically I assumed the target can compose the
64bit vector from two 32bit elements.  But it might be that for this to work
the loads would need to be aligned.

What is needed is char-to-short unpacking and vector composition.  Either
composing V2SImode or V8QImode from two V4QImode vectors.

Does the following help?

diff --git a/gcc/testsuite/gcc.dg/vect/slp-gap-1.c b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
index 36463ca22c5..08942380caa 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
@@ -4,6 +4,9 @@
 typedef unsigned char uint8_t;
 typedef short int16_t;
 void pixel_sub_wxh(int16_t * __restrict diff, uint8_t *pix1, uint8_t *pix2) {
+  diff = __builtin_assume_aligned (diff, __BIGGEST_ALIGNMENT__);
+  pix1 = __builtin_assume_aligned (pix1, 4);
+  pix2 = __builtin_assume_aligned (pix2, 4);
   for (int y = 0; y < 4; y++) {
     for (int x = 0; x < 4; x++)
       diff[x + y * 4] = pix1[x] - pix2[x];
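
The patch only adds alignment hints. The composition it is meant to enable, a 64-bit vector built from two 32-bit elements, can be modeled in scalar C as below. This is only a sketch: `load_u32` and `compose_v2si` are hypothetical names, and `memcpy` stands in for a 32-bit load that may or may not be aligned.

```c
#include <stdint.h>
#include <string.h>

/* Load four consecutive bytes as one 32-bit element.  memcpy keeps the
   access well-defined even when p is not 4-byte aligned.  */
uint32_t
load_u32 (const uint8_t *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Model of building a two-element vector of 32-bit values { lo, hi }.  */
void
compose_v2si (uint32_t out[2], uint32_t lo, uint32_t hi)
{
  out[0] = lo;
  out[1] = hi;
}
```

Viewed as bytes, the composed value holds the four bytes behind `lo` followed by the four bytes behind `hi`, which is the V8QImode layout the vectorizer wants.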


* [Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
  2024-05-31  8:45 [Bug tree-optimization/115304] New: gcc.dg/vect/slp-gap-1.c FAILs ro at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2024-05-31 12:26 ` rguenth at gcc dot gnu.org
@ 2024-06-03  9:18 ` ro at CeBiTec dot Uni-Bielefeld.DE
  2024-06-03 10:10 ` tschwinge at gcc dot gnu.org
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: ro at CeBiTec dot Uni-Bielefeld.DE @ 2024-06-03  9:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

--- Comment #3 from ro at CeBiTec dot Uni-Bielefeld.DE <ro at CeBiTec dot Uni-Bielefeld.DE> ---
> --- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
> It should only need vect32 - basically I assumed the target can compose the
> 64bit vector from two 32bit elements.  But it might be that for this to work
> the loads would need to be aligned.
>
> What is needed is char-to-short unpacking and vector composition.  Either
> composing V2SImode or V8QImode from two V4QImode vectors.
>
> Does the following help?

Unfortunately not: makes no difference AFAICS.


* [Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
  2024-05-31  8:45 [Bug tree-optimization/115304] New: gcc.dg/vect/slp-gap-1.c FAILs ro at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2024-06-03  9:18 ` ro at CeBiTec dot Uni-Bielefeld.DE
@ 2024-06-03 10:10 ` tschwinge at gcc dot gnu.org
  2024-06-03 10:10 ` tschwinge at gcc dot gnu.org
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2024-06-03 10:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

Thomas Schwinge <tschwinge at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|sparc*-sun-solaris2.11      |sparc*-sun-solaris2.11 GCN
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2024-06-03
     Ever confirmed|0                           |1
                 CC|                            |ams at gcc dot gnu.org,
                   |                            |tschwinge at gcc dot gnu.org


* [Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
  2024-05-31  8:45 [Bug tree-optimization/115304] New: gcc.dg/vect/slp-gap-1.c FAILs ro at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2024-06-03 10:10 ` tschwinge at gcc dot gnu.org
@ 2024-06-03 10:10 ` tschwinge at gcc dot gnu.org
  2024-06-03 10:13 ` tschwinge at gcc dot gnu.org
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2024-06-03 10:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

--- Comment #4 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
Created attachment 58332
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58332&action=edit
GCN target ('-march=gfx908') 'slp-gap-1.c.179t.vect'

Similar (I suppose?) for GCN target (tested '-march=gfx908'):

    +PASS: gcc.dg/vect/slp-gap-1.c (test for excess errors)
    +FAIL: gcc.dg/vect/slp-gap-1.c scan-tree-dump-times vect "{_[0-9]+, 0" 6


* [Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
  2024-05-31  8:45 [Bug tree-optimization/115304] New: gcc.dg/vect/slp-gap-1.c FAILs ro at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2024-06-03 10:10 ` tschwinge at gcc dot gnu.org
@ 2024-06-03 10:13 ` tschwinge at gcc dot gnu.org
  2024-06-03 12:43 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2024-06-03 10:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

--- Comment #5 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
Created attachment 58333
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58333&action=edit
'c2' GCN target ('-march=gfx908') 'slp-gap-1.c.179t.vect'

(In reply to ro@CeBiTec.Uni-Bielefeld.DE from comment #3)
> > --- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
> > It should only need vect32 - basically I assumed the target can compose the
> > 64bit vector from two 32bit elements.  But it might be that for this to work
> > the loads would need to be aligned.
> >
> > What is needed is char-to-short unpacking and vector composition.  Either
> > composing V2SImode or V8QImode from two V4QImode vectors.
> >
> > Does the following help?
> 
> Unfortunately not: makes no difference AFAICS.

Also doesn't resolve the issue for GCN target (tested '-march=gfx908'); see
attached 'c2' GCN target ('-march=gfx908') 'slp-gap-1.c.179t.vect'.


* [Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
  2024-05-31  8:45 [Bug tree-optimization/115304] New: gcc.dg/vect/slp-gap-1.c FAILs ro at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2024-06-03 10:13 ` tschwinge at gcc dot gnu.org
@ 2024-06-03 12:43 ` rguenth at gcc dot gnu.org
  2024-06-03 12:46 ` cvs-commit at gcc dot gnu.org
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-03 12:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
For GCN the issue is that with vector(64) unsigned short we fail the permute
(but we have { target vect64 } for this reason), but we then re-try with
the same mode but with SLP disabled and that succeeds.

The best strategy for GCN would be to gather V4QImode aka SImode into the
V64QImode (or V16SImode) vector.  For pix2 we have a gap of 28 elements,
doing consecutive loads isn't a good strategy here.

On x86 we can use a small vector and use half of it (gathers would be slow).

On sparc we start with V8QImode, which is great, but sparc then doesn't seem
able to build a V8QImode vector from two V4QImode vectors, or to use
V2SImode and build it from two SImode values (loading SImode from pix1/pix2
possibly fails due to alignment).  I do see a vec_initv2sisi though.  Ah,
so we verify that we can do the load using a permutation: permute two V8QImode
vectors 'a' and 'b' to get a { a_low, b_low } V8QImode vector.  The other
part is the eliding of the gap, which ends up loading half of the vector
and padding it out as { a_low, 0 }, but then still invokes this unsupported
permutation to get { a_low, b_low }.  So in this case requiring vect_perm
would fix this.  There is sparc_vectorize_vec_perm_const and vec_perm<>
guarded with VIS2, though; with -mvis2 we get past this failure point and run into

missed:   not vectorized: relevant stmt not supported: _35 = (unsigned short) _34;

So there's no vec_upack_{hi,lo}_v4hi.  vect_unpack guards this.

Maybe I should move the test to be x86 specific.

I'll add the two dg-effective targets to fix the solaris fallout for now.
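
The unpacking step that sparc lacks widens each half of an 8-byte vector into four 16-bit elements. A scalar sketch follows; the helper names are hypothetical, zero-extension is assumed because the statement in the dump converts to unsigned short, and which half a target calls "lo" depends on endianness, which this model ignores.

```c
#include <stdint.h>

/* Zero-extend elements 0..3 of an 8 x u8 vector to 4 x u16.  */
void
unpack_lo_v8qi (uint16_t out[4], const uint8_t in[8])
{
  for (int i = 0; i < 4; i++)
    out[i] = in[i];
}

/* Zero-extend elements 4..7 of an 8 x u8 vector to 4 x u16.  */
void
unpack_hi_v8qi (uint16_t out[4], const uint8_t in[8])
{
  for (int i = 0; i < 4; i++)
    out[i] = in[i + 4];
}
```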


* [Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
  2024-05-31  8:45 [Bug tree-optimization/115304] New: gcc.dg/vect/slp-gap-1.c FAILs ro at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2024-06-03 12:43 ` rguenth at gcc dot gnu.org
@ 2024-06-03 12:46 ` cvs-commit at gcc dot gnu.org
  2024-06-03 12:47 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-06-03 12:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

--- Comment #7 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:ed8ba88074f3663f810ef2f07d79c3fcde5d9697

commit r15-991-ged8ba88074f3663f810ef2f07d79c3fcde5d9697
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Jun 3 14:43:42 2024 +0200

    testsuite/115304 - properly guard gcc.dg/vect/slp-gap-1.c

    Testing on sparc shows we need vect_unpack and vect_perm.  This
    isn't enough to resolve the GCN fail which ends up using interleaving.

            PR testsuite/115304
            * gcc.dg/vect/slp-gap-1.c: Require vect_unpack and vect_perm.
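
The commit text itself is not quoted in this thread, so the following is only a sketch of how the guarded test might look. The kernel's first lines come from the diff context in comment #2; the two pointer increments are assumptions (the pix2 stride of 32 is inferred from the "gap of 28 elements" mentioned in comment #6), and the exact selector nesting is a guess that combines the existing vect64 guard with the two newly required keywords.

```c
typedef unsigned char uint8_t;
typedef short int16_t;

void
pixel_sub_wxh (int16_t *__restrict diff, uint8_t *pix1, uint8_t *pix2)
{
  for (int y = 0; y < 4; y++)
    {
      for (int x = 0; x < 4; x++)
        diff[x + y * 4] = pix1[x] - pix2[x];
      pix1 += 16;   /* assumed row stride; not shown in this thread */
      pix2 += 32;   /* assumed; 4 used bytes plus a 28-byte gap */
    }
}

/* Assumed selector: existing vect64 guard plus vect_unpack and vect_perm.  */
/* { dg-final { scan-tree-dump-times "\{_\[0-9\]\+, 0" 6 "vect" { target { vect64 && { vect_unpack && vect_perm } } } } } */
```

Targets that lack either capability then report the scan as unsupported instead of failing it.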


* [Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
  2024-05-31  8:45 [Bug tree-optimization/115304] New: gcc.dg/vect/slp-gap-1.c FAILs ro at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2024-06-03 12:46 ` cvs-commit at gcc dot gnu.org
@ 2024-06-03 12:47 ` rguenth at gcc dot gnu.org
  2024-06-03 13:33 ` ams at gcc dot gnu.org
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-03 12:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|sparc*-sun-solaris2.11 GCN  |GCN

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
Should be fixed on sparc.


* [Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
  2024-05-31  8:45 [Bug tree-optimization/115304] New: gcc.dg/vect/slp-gap-1.c FAILs ro at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2024-06-03 12:47 ` rguenth at gcc dot gnu.org
@ 2024-06-03 13:33 ` ams at gcc dot gnu.org
  2024-06-03 13:51 ` rguenther at suse dot de
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: ams at gcc dot gnu.org @ 2024-06-03 13:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

--- Comment #9 from Andrew Stubbs <ams at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #6)
> The best strategy for GCN would be to gather V4QImode aka SImode into the
> V64QImode (or V16SImode) vector.  For pix2 we have a gap of 28 elements,
> doing consecutive loads isn't a good strategy here.

I don't fully understand what you're trying to say here, so apologies if you
knew all this already and I missed the point.....

In general, on GCN V4QImode is not in any way equivalent to SImode (when the
values are in registers). The vector registers are not one single string of
re-interpretable bits.

For the same reason, you can't load a value as V64QImode and then try to
interpret it as V16SImode. GCN vector registers just don't work like
SSE/Neon/etc.

When you load a V64QImode vector, each lane is extended to 32 bits, so what you
actually get in hardware is a V64SImode vector.

Likewise, when you load a V4QImode vector the hardware representation is
actually V4SImode (which in itself is just V64SImode with undefined values in
the unused lanes).
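
A conceptual C model of this point, assuming zero-extension and ignoring everything else about the hardware: each of the 64 lanes is a separate 32-bit slot, so a byte vector held in a register occupies one slot per byte rather than being packed four to a slot. All names here are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define LANES 64

/* One 32-bit slot per lane; a modeling assumption, not the real ISA.  */
typedef struct { uint32_t lane[LANES]; } vreg;

/* Model of a byte-vector load: lane i receives byte i, widened.  */
void
load_v64qi (vreg *r, const uint8_t *mem)
{
  for (int i = 0; i < LANES; i++)
    r->lane[i] = mem[i];   /* zero-extension assumed */
}

/* What a packed, SSE/Neon-style view would put in 32-bit element i:
   four adjacent bytes from memory.  */
uint32_t
packed_si_element (const uint8_t *mem, int i)
{
  uint32_t v;
  memcpy (&v, mem + 4 * i, sizeof v);
  return v;
}
```

In this model lane 0 holds only byte 0, while the packed view of element 0 holds bytes 0 to 3, which is why the two register contents cannot simply be reinterpreted as one another.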


* [Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
  2024-05-31  8:45 [Bug tree-optimization/115304] New: gcc.dg/vect/slp-gap-1.c FAILs ro at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2024-06-03 13:33 ` ams at gcc dot gnu.org
@ 2024-06-03 13:51 ` rguenther at suse dot de
  2024-06-03 14:11 ` ams at gcc dot gnu.org
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 16+ messages in thread
From: rguenther at suse dot de @ 2024-06-03 13:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

--- Comment #10 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 3 Jun 2024, ams at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304
> 
> --- Comment #9 from Andrew Stubbs <ams at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #6)
> > The best strategy for GCN would be to gather V4QImode aka SImode into the
> > V64QImode (or V16SImode) vector.  For pix2 we have a gap of 28 elements,
> > doing consecutive loads isn't a good strategy here.
> 
> I don't fully understand what you're trying to say here, so apologies if you
> knew all this already and I missed the point.....
> 
> In general, on GCN V4QImode is not in any way equivalent to SImode (when the
> values are in registers). The vector registers are not one single string of
> re-interpretable bits.
> 
> For the same reason, you can't load a value as V64QImode and then try to
> interpret it as V16SImode. GCN vector registers just don't work like
> SSE/Neon/etc.
> 
> When you load a V64QImode vector, each lane is extended to 32 bits, so what you
> actually get in hardware is a V64SImode vector.
> 
> Likewise, when you load a V4QImode vector the hardware representation is
> actually V4SImode (which in itself is just V64SImode with undefined values in
> the unused lanes).

I see.  I wonder if there's not one or two latent wrong-code because of
this and the vectorizer's assumptions ;)  I suppose modes_tieable_p
will tell us whether a VIEW_CONVERT_EXPR will do the right thing?
Is GET_MODE_SIZE (V64QImode) == GET_MODE_SIZE (V64SImode) btw?
And V64QImode really V64PSImode?

Still for a V64QImode load on { c[0], c[1], c[2], c[3], c[32], c[33], 
c[34], c[35], ... } it's probably best to use a single V64QImode gather 
with GCN then rather than four "consecutive" V64QImode loads and then
element swizzling.


* [Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
  2024-05-31  8:45 [Bug tree-optimization/115304] New: gcc.dg/vect/slp-gap-1.c FAILs ro at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2024-06-03 13:51 ` rguenther at suse dot de
@ 2024-06-03 14:11 ` ams at gcc dot gnu.org
  2024-06-20  5:15 ` pinskia at gcc dot gnu.org
  2024-06-20  6:45 ` rguenther at suse dot de
  14 siblings, 0 replies; 16+ messages in thread
From: ams at gcc dot gnu.org @ 2024-06-03 14:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

--- Comment #11 from Andrew Stubbs <ams at gcc dot gnu.org> ---
(In reply to rguenther@suse.de from comment #10)
> On Mon, 3 Jun 2024, ams at gcc dot gnu.org wrote:
> 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304
> > 
> > --- Comment #9 from Andrew Stubbs <ams at gcc dot gnu.org> ---
> > (In reply to Richard Biener from comment #6)
> > > The best strategy for GCN would be to gather V4QImode aka SImode into the
> > > V64QImode (or V16SImode) vector.  For pix2 we have a gap of 28 elements,
> > > doing consecutive loads isn't a good strategy here.
> > 
> > I don't fully understand what you're trying to say here, so apologies if you
> > knew all this already and I missed the point.....
> > 
> > In general, on GCN V4QImode is not in any way equivalent to SImode (when the
> > values are in registers). The vector registers are not one single string of
> > re-interpretable bits.
> > 
> > For the same reason, you can't load a value as V64QImode and then try to
> > interpret it as V16SImode. GCN vector registers just don't work like
> > SSE/Neon/etc.
> > 
> > When you load a V64QImode vector, each lane is extended to 32 bits, so what you
> > actually get in hardware is a V64SImode vector.
> > 
> > Likewise, when you load a V4QImode vector the hardware representation is
> > actually V4SImode (which in itself is just V64SImode with undefined values in
> > the unused lanes).
> 
> I see.  I wonder if there's not one or two latent wrong-code because of
> this and the vectorizer's assumptions ;)  I suppose modes_tieable_p
> will tell us whether a VIEW_CONVERT_EXPR will do the right thing?
> Is GET_MODE_SIZE (V64QImode) == GET_MODE_SIZE (V64SImode) btw?
> And V64QImode really V64PSImode?

The mode size says how big it will be when written to memory, so no, they're not
the same. I believe this matches the scalar QImode behaviour.

We don't use any PSI modes. There are (some) machine instructions for V64QImode
(and V64HImode) so we don't want to lose that information.

There may well be some bugs, but we have handling for conversions in a number
of places. There are truncate and extend patterns that operate lane-wise, and
vec_extract can take a subset of a vector, IIRC.

> Still for a V64QImode load on { c[0], c[1], c[2], c[3], c[32], c[33], 
> c[34], c[35], ... } it's probably best to use a single V64QImode gather 
> with GCN then rather than four "consecutive" V64QImode loads and then
> element swizzling.

Fewer loads are always better, and permutations are expensive operations (and
don't work with 64-lane vectors on RDNA devices because they're actually two
32-lane vectors stuck together) so it can certainly make sense to use gather
with a vector of permuted offsets (although it can be expensive to generate
that vector in the first place).
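
A scalar sketch of the gather idea, with hypothetical helper names: the offset vector encodes groups of four consecutive bytes separated by a fixed stride (32 for the c[0..3], c[32..35], ... access pattern quoted above), and each lane then reads through its own offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets for groups of 4 consecutive bytes spaced `stride` bytes apart:
   0, 1, 2, 3, stride, stride + 1, ...  */
void
make_gap_offsets (size_t *off, size_t lanes, size_t stride)
{
  for (size_t i = 0; i < lanes; i++)
    off[i] = (i / 4) * stride + (i % 4);
}

/* Scalar model of a gather: lane i reads base[off[i]].  */
void
gather_u8 (uint8_t *out, const uint8_t *base, const size_t *off, size_t lanes)
{
  for (size_t i = 0; i < lanes; i++)
    out[i] = base[off[i]];
}
```

The offset vector depends only on the lane count and stride, so in a loop it could be computed once and reused, which is where the generation cost mentioned above would be paid.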


* [Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
  2024-05-31  8:45 [Bug tree-optimization/115304] New: gcc.dg/vect/slp-gap-1.c FAILs ro at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2024-06-03 14:11 ` ams at gcc dot gnu.org
@ 2024-06-20  5:15 ` pinskia at gcc dot gnu.org
  2024-06-20  6:45 ` rguenther at suse dot de
  14 siblings, 0 replies; 16+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-06-20  5:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pinskia at gcc dot gnu.org

--- Comment #12 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note when I am adding V4QI support to the aarch64 backend (emulating it via
V8QI), I am getting a failure in slp-gap-1.c but it is different from the
others.

Without V4QI, the pattern matched `\{_\[0-9\]\+, 0` was able to match 6 times.

we got in the IR:
```
  unsigned int _50;
  vector(2) unsigned int _49;
...
  _50 = MEM <unsigned int> [(uint8_t *)vectp_pix1.5_58];
  _49 = {_50, 0};
```


But afterwards we now get:
```
  vector(4) unsigned char _50;
  vector(8) unsigned char vect__34.9;
...
  _50 = MEM <vector(4) unsigned char> [(uint8_t *)vectp_pix1.5_58];
  vect__34.9_49 = {_50, { 0, 0, 0, 0 }};
```

Which produces the exact same code. I am trying to figure out the best way to
change the testcase pattern to make sure we don't match:
  vect__37.23_6 = VEC_PERM_EXPR <vect__37.15_30, vect__37.19_13, { 0, 1, 2, 3, 8, 9, 10, 11 }>;

too.

I think `\{_\[0-9\]\+, { 0, 0` will work, but should I just add an
alternative to the scan-tree-dump-times pattern, or should I add it as a
separate one with some target selection here?


* [Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs
  2024-05-31  8:45 [Bug tree-optimization/115304] New: gcc.dg/vect/slp-gap-1.c FAILs ro at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2024-06-20  5:15 ` pinskia at gcc dot gnu.org
@ 2024-06-20  6:45 ` rguenther at suse dot de
  14 siblings, 0 replies; 16+ messages in thread
From: rguenther at suse dot de @ 2024-06-20  6:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

--- Comment #13 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 20 Jun 2024, pinskia at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304
> 
> Andrew Pinski <pinskia at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |pinskia at gcc dot gnu.org
> 
> --- Comment #12 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
> Note when I am adding V4QI support to the aarch64 backend (emulating it via
> V8QI), I am getting a failure in slp-gap-1.c but it is different from the
> others.
> 
> Without V4QI, the pattern matched `\{_\[0-9\]\+, 0` was able to match 6 times.
> 
> we got in the IR:
> ```
>   unsigned int _50;
>   vector(2) unsigned int _49;
> ...
>   _50 = MEM <unsigned int> [(uint8_t *)vectp_pix1.5_58];
>   _49 = {_50, 0};
> ```
> 
> 
> But afterwards we now get:
> ```
>   vector(4) unsigned char _50;
>   vector(8) unsigned char vect__34.9;
> ...
>   _50 = MEM <vector(4) unsigned char> [(uint8_t *)vectp_pix1.5_58];
>   vect__34.9_49 = {_50, { 0, 0, 0, 0 }};
> ```
> 
> Which produces the exact same code. I am trying to figure out the best way to
> change the testcase pattern to make sure we don't match:
>   vect__37.23_6 = VEC_PERM_EXPR <vect__37.15_30, vect__37.19_13, { 0, 1, 2, 3, 8, 9, 10, 11 }>;
> 
> too.
> 
> `\{_\[0-9\]\+, { 0, 0` I think that will work but should I just do an
> alternative for the scan-tree-dump-times or should I put it as a separate one
> with some target selection here?

Maybe match \{_\[0-9\]\+, (0\|{ 0(, 0)+ })?  (with proper quoting)
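
To sanity-check the alternation against the three IR lines quoted above, here is a small sketch using POSIX `regcomp`/`regexec`. It is an approximation: the testsuite uses Tcl regular expressions with Tcl quoting, whereas this uses POSIX extended syntax, and the function name `matches_gap_ctor` is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* POSIX ERE rendering of the suggested pattern: "{_N, " followed by
   either a scalar 0 or a brace-enclosed list of two or more zeros.  */
static const char gap_pattern[] = "\\{_[0-9]+, (0|\\{ 0(, 0)+ \\})";

/* Return 1 if line contains a match, 0 if not, -1 if the pattern
   fails to compile.  */
int
matches_gap_ctor (const char *line)
{
  regex_t re;
  if (regcomp (&re, gap_pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int rc = regexec (&re, line, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

Under these assumptions both constructor forms match and the VEC_PERM_EXPR line does not, because its selector is not preceded by an SSA name inside the same braces.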


