public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f   -O execution test
@ 2024-06-25 11:39 tschwinge at gcc dot gnu.org
  2024-06-25 11:56 ` [Bug target/115640] " tschwinge at gcc dot gnu.org
                   ` (20 more replies)
  0 siblings, 21 replies; 22+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2024-06-25 11:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

            Bug ID: 115640
           Summary: GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution
                    test
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: testsuite-fail
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tschwinge at gcc dot gnu.org
                CC: ams at gcc dot gnu.org, rguenth at gcc dot gnu.org
  Target Milestone: ---
            Target: GCN

The new PR115528 test case 'gfortran.dg/vect/pr115528.f' FAILs its execution
test for GCN target (tested '-march=gfx908'):

    +PASS: gfortran.dg/vect/pr115528.f   -O  (test for excess errors)
    +FAIL: gfortran.dg/vect/pr115528.f   -O  execution test

    spawn -ignore SIGHUP [...]/build-gcc/gcc/gcn-run ./pr115528.exe
    Memory access fault by GPU node-2 (Agent handle: 0x1834c30) on address
0x7f67e6dff000. Reason: Page not present or supervisor privilege.
    FAIL: gfortran.dg/vect/pr115528.f   -O  execution test

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
@ 2024-06-25 11:56 ` tschwinge at gcc dot gnu.org
  2024-06-25 12:30 ` rguenth at gcc dot gnu.org
                   ` (19 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2024-06-25 11:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #1 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
For GCN target (tested '-march=gfx908'), we've got identical code
('pr115528.exe') before vs. after the commit
r15-1582-g2f83ea87ee328d337f87d4430861221be9babe1e "tree-optimization/115528 -
fix vect alignment analysis for outer loop vect" code changes, so it's likely
an unrelated issue?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
  2024-06-25 11:56 ` [Bug target/115640] " tschwinge at gcc dot gnu.org
@ 2024-06-25 12:30 ` rguenth at gcc dot gnu.org
  2024-06-25 12:36 ` ams at gcc dot gnu.org
                   ` (18 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-25 12:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-06-25
             Status|UNCONFIRMED                 |NEW
                 CC|                            |rsandifo at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
It doesn't look like a misaligned access at least.  Instead it looks like an
out-of-bound access?

OTOH I probably missed the fortran equivalent of #include "tree-vect.h" and
check_vect() in main.

We do vectorize the outer loop, but with partial vectors (huh, it iterates
exactly 4 times).  I'm not exactly sure the outer loop mask is the correct
one to use in the inner loop though (but we do that).  And this might explain
the issue - we're definitely accessing excess elements of AA in the inner
loop.

In particular we disallowed grouped accesses in inner loops but this
load is basically treated as a grouped access by means of having a load
permutation.  Test coverage seems to be weak here and this restriction
should be lifted ideally (I remember issues with the IV update, but that
only was for the multiple-types case which is rejected separately).

I'm not sure how one would deal with an outer .WHILE_ULT loop mask in the
inner loop.  This is AA(I,J) with I evolving in the outer loop and J
in the inner loop.  The outer loop evolution is contiguous, the inner
loop evolution applies a stride of four.

If you force GCN to use fixed length vectors (how?), does it work?  How's
it behaving on aarch64 with SVE?  (the CI was happy, but maybe doesn't
enable SVE)

Confirmed as in, it looks like wrong generated code.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
  2024-06-25 11:56 ` [Bug target/115640] " tschwinge at gcc dot gnu.org
  2024-06-25 12:30 ` rguenth at gcc dot gnu.org
@ 2024-06-25 12:36 ` ams at gcc dot gnu.org
  2024-06-25 13:08 ` rguenther at suse dot de
                   ` (17 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: ams at gcc dot gnu.org @ 2024-06-25 12:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #3 from Andrew Stubbs <ams at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> If you force GCN to use fixed length vectors (how?), does it work?  How's
> it behaving on aarch64 with SVE?  (the CI was happy, but maybe doesn't
> enable SVE)

I believe "--param vect-partial-vector-usage=0" will disable the use of
WHILE_ULT? The default is "2" for the standalone toolchain, and last I checked
the value is inherited from the host in the offload toolchain; the default for
x86_64 was "1", meaning approximately "only use partial vectors in epilogue
loops".

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2024-06-25 12:36 ` ams at gcc dot gnu.org
@ 2024-06-25 13:08 ` rguenther at suse dot de
  2024-06-25 17:55 ` [Bug target/115640] [15 Regression] " tschwinge at gcc dot gnu.org
                   ` (16 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenther at suse dot de @ 2024-06-25 13:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 25 Jun 2024, ams at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
> 
> --- Comment #3 from Andrew Stubbs <ams at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #2)
> > If you force GCN to use fixed length vectors (how?), does it work?  How's
> > it behaving on aarch64 with SVE?  (the CI was happy, but maybe doesn't
> > enable SVE)
> 
> I believe "--param vect-partial-vector-usage=0" will disable the use of
> WHILE_ULT? The default is "2" for the standalone toolchain, and last I checked
> the value is inherited from the host in the offload toolchain; the default for
> x86_64 was "1", meaning approximately "only use partial vectors in epilogue
> loops".

For unknown reasons x86-64 refuses to use partial vectors.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] [15 Regression] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2024-06-25 13:08 ` rguenther at suse dot de
@ 2024-06-25 17:55 ` tschwinge at gcc dot gnu.org
  2024-06-26  6:38 ` rguenther at suse dot de
                   ` (15 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2024-06-25 17:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

Thomas Schwinge <tschwinge at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|GCN: FAIL:                  |[15 Regression] GCN: FAIL:
                   |gfortran.dg/vect/pr115528.f |gfortran.dg/vect/pr115528.f
                   |  -O  execution test        |  -O  execution test
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=114107

--- Comment #5 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
Turns out, this is a regression after all: before commit
r15-1238-g1fe55a1794863b5ad9eeca5062782834716016b2 "tree-optimization/114107 -
avoid peeling for gaps in more cases", this test case
'gfortran.dg/vect/pr115528.f' (which, of course, wasn't in-tree back then) does
PASS its execution test for GCN target (tested '-march=gfx908').

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] [15 Regression] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2024-06-25 17:55 ` [Bug target/115640] [15 Regression] " tschwinge at gcc dot gnu.org
@ 2024-06-26  6:38 ` rguenther at suse dot de
  2024-06-26  7:40 ` rguenth at gcc dot gnu.org
                   ` (14 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenther at suse dot de @ 2024-06-26  6:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #6 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 25 Jun 2024, tschwinge at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
> 
> Thomas Schwinge <tschwinge at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>             Summary|GCN: FAIL:                  |[15 Regression] GCN: FAIL:
>                    |gfortran.dg/vect/pr115528.f |gfortran.dg/vect/pr115528.f
>                    |  -O  execution test        |  -O  execution test
>            See Also|                            |https://gcc.gnu.org/bugzill
>                    |                            |a/show_bug.cgi?id=114107
> 
> --- Comment #5 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
> Turns out, this is a regression after all: before commit
> r15-1238-g1fe55a1794863b5ad9eeca5062782834716016b2 "tree-optimization/114107 -
> avoid peeling for gaps in more cases", this test case
> 'gfortran.dg/vect/pr115528.f' (which, of course, wasn't in-tree back then) does
> PASS its execution test for GCN target (tested '-march=gfx908').

Yes, the testcase outer loop wasn't vectorized before due to the
"gap" access in the inner loop.  With this rev. we can now vectorize
the inner loop without touching the gap.  Note I'm no longer sure
the loop mask use is what's wrong.  Instead it looks like the IV for
'aa' is off (just like I remember what happens when there is a grouped
access in the inner loop):

  # vectp_aa.21_73 = PHI <vectp_aa.21_72(8), vectp_aa.21_75(3)>
...
  vect__14.23_71 = .MASK_LOAD (vectp_aa.21_73, 64B, loop_mask_77);
  vectp_aa.21_70 = vectp_aa.21_73 + 18446744073709551360; // -256
...
  vectp_aa.21_72 = vectp_aa.21_70 + 32;

for the inner loop - but the inner loop should advance by 32.

So I think the issue is a latent one, but maybe one that couldn't be
triggered.  _Possibly_ it could be triggered with V2mode vectors.

For an inner loop with a grouped access this case is still disqualified
but IIRC it has the same issue.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] [15 Regression] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2024-06-26  6:38 ` rguenther at suse dot de
@ 2024-06-26  7:40 ` rguenth at gcc dot gnu.org
  2024-06-26 11:05 ` rguenth at gcc dot gnu.org
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-26  7:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
I will have a look (and for run validation try to reproduce with gfx1036).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] [15 Regression] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2024-06-26  7:40 ` rguenth at gcc dot gnu.org
@ 2024-06-26 11:05 ` rguenth at gcc dot gnu.org
  2024-06-26 11:06 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-26 11:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
> I will have a look (and for run validation try to reproduce with gfx1036).

OK, so with gfx1036 we end up using 16 byte vectors and the testcase
passes.  The difference with gfx908 is

/space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
note:   ==> examining statement: _14 = aa[_13];
/space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
note:   vect_model_load_cost: aligned.
/space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
note:   vect_model_load_cost: inside_cost = 2, prologue_cost = 0 .

vs.

/space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
note:   ==> examining statement: _14 = aa[_13];
/space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
missed:   unsupported vect permute { 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10
10 11 11 12 12 13 13 14 14 15 15 }
/space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
missed:   unsupported load permutation
/space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:19:72:
missed:   not vectorized: relevant stmt not supported: _14 = aa[_13];
/space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
note:   removing SLP instance operations starting from: REALPART_EXPR
<(*hadcur_24(D))[_2]> = _86;
/space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
missed:  unsupported SLP instances
/space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
note:  re-trying with SLP disabled

so gfx1036 cannot do such permutes but gfx908 can?

On aarch64 with SVE we are using non-SLP and we're doing load-lanes in the
outer loop.  The reason seems to be also the unsupported load permutation,
but that's possibly because of VLA vectors - GCN uses fixed size but
loop masking.  So the better equivalent would have been x86-64 with loop
masking.

So looking again I think the loop mask in the inner loop is wrong.  We have

      do i = 1,4
         do j = 1,4
            HADCUR(I)=
     $         HADCUR(I)+CMPLX(COEF1)*FORM1*AA(I,J)
         end do
      end do

and the vectorizer sees

  <bb 3> [local count: 214748368]:
  # i_35 = PHI <i_27(7), 1(2)>
  # ivtmp_82 = PHI <ivtmp_81(7), 4(2)>
  _1 = (integer(kind=8)) i_35;
  _2 = _1 + -1;
  hadcur__I_RE_lsm.15_8 = REALPART_EXPR <(*hadcur_24(D))[_2]>;
  hadcur__I_IM_lsm.16_9 = IMAGPART_EXPR <(*hadcur_24(D))[_2]>;

  <bb 4> [local count: 858993456]:
  # j_36 = PHI <j_26(8), 1(3)>
...
  _10 = (integer(kind=8)) j_36;
  _11 = _10 * 4;
  _12 = _1 + _11;
  _13 = _12 + -5;
  _14 = aa[_13];
...
  j_26 = j_36 + 1;

  <bb 5> [local count: 214748368]:
  # _86 = PHI <_49(4)>
  # _85 = PHI <_50(4)>
  REALPART_EXPR <(*hadcur_24(D))[_2]> = _86;
  IMAGPART_EXPR <(*hadcur_24(D))[_2]> = _85;
  i_27 = i_35 + 1;

the loop mask { -1, -1, -1, -1, -1, -1, -1, -1, 0, .... } is OK for
the outer loop grouped load

  vect_hadcur__I_RE_lsm.20_76 = .MASK_LOAD (vectp_hadcur.18_79, 64B,
loop_mask_77);

but for the inner loop we do

  vect__14.23_71 = .MASK_LOAD (vectp_aa.21_73, 64B, loop_mask_77);

with the same mask.  This fails to be pruned for the GAP which means
that my improving of gap handling relies for this case to not end up
in the masked load handling.  In fact get_group_load_store_type doesn't
seem to be prepared for outer loop vectorization.  OTOH the inner loop
isn't "unrolled" (it has a VF of 1), and this might be a mistake of
loop mask handling and bad re-use.

As was said elsewhere outer loop vectorization with inner loop datarefs
is compensating for a missed interchange.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] [15 Regression] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2024-06-26 11:05 ` rguenth at gcc dot gnu.org
@ 2024-06-26 11:06 ` rguenth at gcc dot gnu.org
  2024-06-26 11:06 ` rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-26 11:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 58519
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58519&action=edit
patch

I think this fixes it, but I cannot validate.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] [15 Regression] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2024-06-26 11:06 ` rguenth at gcc dot gnu.org
@ 2024-06-26 11:06 ` rguenth at gcc dot gnu.org
  2024-06-26 11:30 ` ams at gcc dot gnu.org
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-26 11:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |15.0

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] [15 Regression] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2024-06-26 11:06 ` rguenth at gcc dot gnu.org
@ 2024-06-26 11:30 ` ams at gcc dot gnu.org
  2024-06-26 11:54 ` tschwinge at gcc dot gnu.org
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: ams at gcc dot gnu.org @ 2024-06-26 11:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #10 from Andrew Stubbs <ams at gcc dot gnu.org> ---
On 26/06/2024 12:05, rguenth at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
> 
> --- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #7)
>> I will have a look (and for run validation try to reproduce with gfx1036).
> 
> OK, so with gfx1036 we end up using 16 byte vectors and the testcase
> passes.  The difference with gfx908 is
> 
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> note:   ==> examining statement: _14 = aa[_13];
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> note:   vect_model_load_cost: aligned.
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> note:   vect_model_load_cost: inside_cost = 2, prologue_cost = 0 .
> 
> vs.
> 
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> note:   ==> examining statement: _14 = aa[_13];
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> missed:   unsupported vect permute { 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10
> 10 11 11 12 12 13 13 14 14 15 15 }
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> missed:   unsupported load permutation
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:19:72:
> missed:   not vectorized: relevant stmt not supported: _14 = aa[_13];
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> note:   removing SLP instance operations starting from: REALPART_EXPR
> <(*hadcur_24(D))[_2]> = _86;
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> missed:  unsupported SLP instances
> /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> note:  re-trying with SLP disabled
> 
> so gfx1036 cannot do such permutes but gfx908 can?

GFX10 has more limited permutation capabilities than GFX9 because it 
only has 32-lane vectors natively, even though we're using the 64-lane 
"compatibility" mode.

However, in theory, the permutation capabilities on V32 and below should 
be the same, and some permutations on V64 are allowed, so I don't know 
why it doesn't use it. It's possible I broke the logic in 
gcn_vectorize_vec_perm_const:

   /* RDNA devices can only do permutations within each group of 32-lanes.
      Reject permutations that cross the boundary.  */
   if (TARGET_RDNA2_PLUS)
     for (unsigned int i = 0; i < nelt; i++)
       if (i < 31 ? perm[i] > 31 : perm[i] < 32)
         return false;

It looks right to me though?

The vec_extract patterns that also use permutations are likewise 
supposedly still enabled for V32 and below.

Andrew

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] [15 Regression] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2024-06-26 11:30 ` ams at gcc dot gnu.org
@ 2024-06-26 11:54 ` tschwinge at gcc dot gnu.org
  2024-06-26 12:26 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2024-06-26 11:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #11 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #9)
> Created attachment 58519 [details]
> patch
> 
> I think this fixes it, but I cannot validate.

Yes, it does, thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] [15 Regression] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2024-06-26 11:54 ` tschwinge at gcc dot gnu.org
@ 2024-06-26 12:26 ` rguenth at gcc dot gnu.org
  2024-06-26 12:34 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-26 12:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Andrew Stubbs from comment #10)
> On 26/06/2024 12:05, rguenth at gcc dot gnu.org wrote:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
> > 
> > --- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
> > (In reply to Richard Biener from comment #7)
> >> I will have a look (and for run validation try to reproduce with gfx1036).
> > 
> > OK, so with gfx1036 we end up using 16 byte vectors and the testcase
> > passes.  The difference with gfx908 is
> > 
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> > note:   ==> examining statement: _14 = aa[_13];
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> > note:   vect_model_load_cost: aligned.
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> > note:   vect_model_load_cost: inside_cost = 2, prologue_cost = 0 .
> > 
> > vs.
> > 
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> > note:   ==> examining statement: _14 = aa[_13];
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> > missed:   unsupported vect permute { 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10
> > 10 11 11 12 12 13 13 14 14 15 15 }
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> > missed:   unsupported load permutation
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:19:72:
> > missed:   not vectorized: relevant stmt not supported: _14 = aa[_13];
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> > note:   removing SLP instance operations starting from: REALPART_EXPR
> > <(*hadcur_24(D))[_2]> = _86;
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> > missed:  unsupported SLP instances
> > /space/rguenther/src/gcc-autopar_devel/gcc/testsuite/gfortran.dg/vect/pr115528.f:16:12:
> > note:  re-trying with SLP disabled
> > 
> > so gfx1036 cannot do such permutes but gfx908 can?
> 
> GFX10 has more limited permutation capabilities than GFX9 because it 
> only has 32-lane vectors natively, even though we're using the 64-lane 
> "compatibility" mode.
> 
> However, in theory, the permutation capabilities on V32 and below should 
> be the same, and some permutations on V64 are allowed, so I don't know 
> why it doesn't use it. It's possible I broke the logic in 
> gcn_vectorize_vec_perm_const:
> 
>    /* RDNA devices can only do permutations within each group of 32-lanes.
>       Reject permutations that cross the boundary.  */
>    if (TARGET_RDNA2_PLUS)
>      for (unsigned int i = 0; i < nelt; i++)
>        if (i < 31 ? perm[i] > 31 : perm[i] < 32)
>          return false;
> 
> It looks right to me though?

nelt == 32 so I think the last element has the wrong check applied?

It should be

>        if (i < 32 ? perm[i] > 31 : perm[i] < 32)

I think.  With that the vectorization happens in a similar way but the
failure still doesn't reproduce (without the patch, of course).

> The vec_extract patterns that also use permutations are likewise 
> supposedly still enabled for V32 and below.
> 
> Andrew

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] [15 Regression] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2024-06-26 12:26 ` rguenth at gcc dot gnu.org
@ 2024-06-26 12:34 ` rguenth at gcc dot gnu.org
  2024-06-26 13:34 ` ams at gcc dot gnu.org
                   ` (6 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-26 12:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #12)
> (In reply to Andrew Stubbs from comment #10)
> > GFX10 has more limited permutation capabilities than GFX9 because it 
> > only has 32-lane vectors natively, even though we're using the 64-lane 
> > "compatibility" mode.
> > 
> > However, in theory, the permutation capabilities on V32 and below should 
> > be the same, and some permutations on V64 are allowed, so I don't know 
> > why it doesn't use it. It's possible I broke the logic in 
> > gcn_vectorize_vec_perm_const:
> > 
> >    /* RDNA devices can only do permutations within each group of 32-lanes.
> >       Reject permutations that cross the boundary.  */
> >    if (TARGET_RDNA2_PLUS)
> >      for (unsigned int i = 0; i < nelt; i++)
> >        if (i < 31 ? perm[i] > 31 : perm[i] < 32)
> >          return false;
> > 
> > It looks right to me though?
> 
> nelt == 32 so I think the last element has the wrong check applied?
> 
> It should be
> 
> >        if (i < 32 ? perm[i] > 31 : perm[i] < 32)
> 
> I think.  With that the vectorization happens in a similar way but the
> failure still doesn't reproduce (without the patch, of course).

Btw, the above looks quite odd for nelt == 32 anyway - we are permuting
two vectors src0 and src1 into one 32 element dst vector (it's no longer
required that src0 and src1 line up with the dst vector size btw, they
might have different nelt).  So the loop would reject interleaving
the low parts of two 32 element vectors, a permute that would look like
{ 0, 32, 1, 33, 2, 34 ... } so does "within each group of 32-lanes"
mean you can never mix the two vector inputs?  Or does GCN not have
a two-to-one vector permute instruction?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] [15 Regression] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2024-06-26 12:34 ` rguenth at gcc dot gnu.org
@ 2024-06-26 13:34 ` ams at gcc dot gnu.org
  2024-06-26 13:41 ` rguenther at suse dot de
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: ams at gcc dot gnu.org @ 2024-06-26 13:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #14 from Andrew Stubbs <ams at gcc dot gnu.org> ---
On 26/06/2024 13:34, rguenth at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
> 
> --- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #12)
>> (In reply to Andrew Stubbs from comment #10)
>>> GFX10 has more limited permutation capabilities than GFX9 because it
>>> only has 32-lane vectors natively, even though we're using the 64-lane
>>> "compatibility" mode.
>>>
>>> However, in theory, the permutation capabilities on V32 and below should
>>> be the same, and some permutations on V64 are allowed, so I don't know
>>> why it doesn't use it. It's possible I broke the logic in
>>> gcn_vectorize_vec_perm_const:
>>>
>>>     /* RDNA devices can only do permutations within each group of 32-lanes.
>>>        Reject permutations that cross the boundary.  */
>>>     if (TARGET_RDNA2_PLUS)
>>>       for (unsigned int i = 0; i < nelt; i++)
>>>         if (i < 31 ? perm[i] > 31 : perm[i] < 32)
>>>           return false;
>>>
>>> It looks right to me though?
>>
>> nelt == 32 so I think the last element has the wrong check applied?
>>
>> It should be
>>
>>>         if (i < 32 ? perm[i] > 31 : perm[i] < 32)
>>
>> I think.  With that the vectorization happens in a similar way but the
>> failure still doesn't reproduce (without the patch, of course).

Oops, I think you're right.

> Btw, the above looks quite odd for nelt == 32 anyway - we are permuting
> two vectors src0 and src1 into one 32 element dst vector (it's no longer
> required that src0 and src1 line up with the dst vector size btw, they
> might have different nelt).  So the loop would reject interleaving
> the low parts of two 32 element vectors, a permute that would look like
> { 0, 32, 1, 33, 2, 34 ... } so does "within each group of 32-lanes"
> mean you can never mix the two vector inputs?  Or does GCN not have
> a two-to-one vector permute instruction?

GCN does not have two-to-one vector permute in hardware, so we do two 
permutes and a vec_merge to get the same effect.

GFX9 can permute all the elements within a 64 lane vector arbitrarily.

GFX10 and GFX11 can permute the low-32 and high-32 elements freely, but 
no value may cross the boundary. AFAIK there's no way to do that via any 
vector instruction (i.e. without writing to memory, or extracting values 
element-wise).

In theory, we could implement permutes with different sized inputs and 
outputs, but right now those are rejected early. The interleave example 
wouldn't work in hardware, for GFX10, but we could have it for GFX9.

However, I think you might be right about the numbering of the "perm" 
array; we probably need to be testing "(perm[i] % nelt) > 31" if we are 
to support two-to-one permutations.

Thanks for looking at this.

Andrew

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] [15 Regression] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2024-06-26 13:34 ` ams at gcc dot gnu.org
@ 2024-06-26 13:41 ` rguenther at suse dot de
  2024-06-26 14:05 ` ams at gcc dot gnu.org
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: rguenther at suse dot de @ 2024-06-26 13:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #15 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 26 Jun 2024, ams at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
> 
> --- Comment #14 from Andrew Stubbs <ams at gcc dot gnu.org> ---
> On 26/06/2024 13:34, rguenth at gcc dot gnu.org wrote:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
> > 
> > --- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
> > (In reply to Richard Biener from comment #12)
> >> (In reply to Andrew Stubbs from comment #10)
> >>> GFX10 has more limited permutation capabilities than GFX9 because it
> >>> only has 32-lane vectors natively, even though we're using the 64-lane
> >>> "compatibility" mode.
> >>>
> >>> However, in theory, the permutation capabilities on V32 and below should
> >>> be the same, and some permutations on V64 are allowed, so I don't know
> >>> why it doesn't use it. It's possible I broke the logic in
> >>> gcn_vectorize_vec_perm_const:
> >>>
> >>>     /* RDNA devices can only do permutations within each group of 32-lanes.
> >>>        Reject permutations that cross the boundary.  */
> >>>     if (TARGET_RDNA2_PLUS)
> >>>       for (unsigned int i = 0; i < nelt; i++)
> >>>         if (i < 31 ? perm[i] > 31 : perm[i] < 32)
> >>>           return false;
> >>>
> >>> It looks right to me though?
> >>
> >> nelt == 32 so I think the last element has the wrong check applied?
> >>
> >> It should be
> >>
> >>>         if (i < 32 ? perm[i] > 31 : perm[i] < 32)
> >>
> >> I think.  With that the vectorization happens in a similar way but the
> >> failure still doesn't reproduce (without the patch, of course).
> 
> Oops, I think you're right.
> 
> > Btw, the above looks quite odd for nelt == 32 anyway - we are permuting
> > two vectors src0 and src1 into one 32 element dst vector (it's no longer
> > required that src0 and src1 line up with the dst vector size btw, they
> > might have different nelt).  So the loop would reject interleaving
> > the low parts of two 32 element vectors, a permute that would look like
> > { 0, 32, 1, 33, 2, 34 ... } so does "within each group of 32-lanes"
> > mean you can never mix the two vector inputs?  Or does GCN not have
> > a two-to-one vector permute instruction?
> 
> GCN does not have two-to-one vector permute in hardware, so we do two 
> permutes and a vec_merge to get the same effect.
> 
> GFX9 can permute all the elements within a 64 lane vector arbitrarily.
> 
> GFX10 and GFX11 can permute the low-32 and high-32 elements freely, but 
> no value may cross the boundary. AFAIK there's no way to do that via any 
> vector instruction (i.e. without writing to memory, or extracting values 
> element-wise).

I see - so it cannot even swap low-32 and high-32?  I'm thinking of
what sub-part of permutes would be possible by extending the two-to-one
vec_merge trick.

OTOH we restrict GFX10/11 to 32 lane vectors so in practice this
restriction should be fine.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] [15 Regression] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2024-06-26 13:41 ` rguenther at suse dot de
@ 2024-06-26 14:05 ` ams at gcc dot gnu.org
  2024-06-28 11:08 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: ams at gcc dot gnu.org @ 2024-06-26 14:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #16 from Andrew Stubbs <ams at gcc dot gnu.org> ---
On 26/06/2024 14:41, rguenther at suse dot de wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
> 
> --- Comment #15 from rguenther at suse dot de <rguenther at suse dot de> ---
>>> Btw, the above looks quite odd for nelt == 32 anyway - we are permuting
>>> two vectors src0 and src1 into one 32 element dst vector (it's no longer
>>> required that src0 and src1 line up with the dst vector size btw, they
>>> might have different nelt).  So the loop would reject interleaving
>>> the low parts of two 32 element vectors, a permute that would look like
>>> { 0, 32, 1, 33, 2, 34 ... } so does "within each group of 32-lanes"
>>> mean you can never mix the two vector inputs?  Or does GCN not have
>>> a two-to-one vector permute instruction?
>>
>> GCN does not have two-to-one vector permute in hardware, so we do two
>> permutes and a vec_merge to get the same effect.
>>
>> GFX9 can permute all the elements within a 64 lane vector arbitrarily.
>>
>> GFX10 and GFX11 can permute the low-32 and high-32 elements freely, but
>> no value may cross the boundary. AFAIK there's no way to do that via any
>> vector instruction (i.e. without writing to memory, or extracting values
>> element-wise).
> 
> I see - so it cannot even swap low-32 and high-32?  I'm thinking of
> what sub-part of permutes would be possible by extending the two-to-one
> vec_merge trick.

No(?)

The 64-lane compatibility mode works, under the hood, by allocating 
double the number of 32-lane registers and then executing each 
instruction twice. Mostly this is invisible, but it gets exposed for 
permutations and the like. Logically, the microarchitecture could do a 
vec_merge to DTRT, but I've not found a way to express that.

It's possible I missed something when RTFM.

> OTOH we restrict GFX10/11 to 32 lane vectors so in practice this
> restriction should be fine.

Yes, with the "31" fixed it should work.

Andrew

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] [15 Regression] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2024-06-26 14:05 ` ams at gcc dot gnu.org
@ 2024-06-28 11:08 ` cvs-commit at gcc dot gnu.org
  2024-06-28 11:12 ` ams at gcc dot gnu.org
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 22+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-06-28 11:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #17 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Andrew Stubbs <ams@gcc.gnu.org>:

https://gcc.gnu.org/g:ef0b30212f7756db15d7507bfd871bf377d7d648

commit r15-1705-gef0b30212f7756db15d7507bfd871bf377d7d648
Author: Andrew Stubbs <ams@baylibre.com>
Date:   Fri Jun 28 10:47:50 2024 +0000

    amdgcn: Fix RDNA V32 permutations [PR115640]

    There was an off-by-one error in the RDNA validation check, plus I forgot
to
    allow for two-to-one permute-and-merge operations.

            PR target/115640

    gcc/ChangeLog:

            * config/gcn/gcn.cc (gcn_vectorize_vec_perm_const): Modify RDNA
checks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] [15 Regression] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2024-06-28 11:08 ` cvs-commit at gcc dot gnu.org
@ 2024-06-28 11:12 ` ams at gcc dot gnu.org
  2024-06-28 11:44 ` cvs-commit at gcc dot gnu.org
  2024-06-28 11:45 ` rguenth at gcc dot gnu.org
  20 siblings, 0 replies; 22+ messages in thread
From: ams at gcc dot gnu.org @ 2024-06-28 11:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #18 from Andrew Stubbs <ams at gcc dot gnu.org> ---
That should fix the broken validation check. All V32 permutations should work
now on RDNA GPUs, I think. V16 and smaller were already working fine.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] [15 Regression] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
                   ` (18 preceding siblings ...)
  2024-06-28 11:12 ` ams at gcc dot gnu.org
@ 2024-06-28 11:44 ` cvs-commit at gcc dot gnu.org
  2024-06-28 11:45 ` rguenth at gcc dot gnu.org
  20 siblings, 0 replies; 22+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-06-28 11:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

--- Comment #19 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:0192341a07b8ea30f631cf4afdc6fcf3fa7ce838

commit r15-1706-g0192341a07b8ea30f631cf4afdc6fcf3fa7ce838
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Jun 26 14:07:51 2024 +0200

    tree-optimization/115640 - outer loop vect with inner SLP permute

    The following fixes wrong-code when using outer loop vectorization
    and an inner loop SLP access with permutation.  A wrong adjustment
    to the IV increment is then applied on GCN.

            PR tree-optimization/115640
            * tree-vect-stmts.cc (vectorizable_load): With an inner
            loop SLP access to not apply a gap adjustment.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Bug target/115640] [15 Regression] GCN: FAIL: gfortran.dg/vect/pr115528.f   -O  execution test
  2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
                   ` (19 preceding siblings ...)
  2024-06-28 11:44 ` cvs-commit at gcc dot gnu.org
@ 2024-06-28 11:45 ` rguenth at gcc dot gnu.org
  20 siblings, 0 replies; 22+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-06-28 11:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #20 from Richard Biener <rguenth at gcc dot gnu.org> ---
The wrong-code issue should be fixed now.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2024-06-28 11:45 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-25 11:39 [Bug target/115640] New: GCN: FAIL: gfortran.dg/vect/pr115528.f -O execution test tschwinge at gcc dot gnu.org
2024-06-25 11:56 ` [Bug target/115640] " tschwinge at gcc dot gnu.org
2024-06-25 12:30 ` rguenth at gcc dot gnu.org
2024-06-25 12:36 ` ams at gcc dot gnu.org
2024-06-25 13:08 ` rguenther at suse dot de
2024-06-25 17:55 ` [Bug target/115640] [15 Regression] " tschwinge at gcc dot gnu.org
2024-06-26  6:38 ` rguenther at suse dot de
2024-06-26  7:40 ` rguenth at gcc dot gnu.org
2024-06-26 11:05 ` rguenth at gcc dot gnu.org
2024-06-26 11:06 ` rguenth at gcc dot gnu.org
2024-06-26 11:06 ` rguenth at gcc dot gnu.org
2024-06-26 11:30 ` ams at gcc dot gnu.org
2024-06-26 11:54 ` tschwinge at gcc dot gnu.org
2024-06-26 12:26 ` rguenth at gcc dot gnu.org
2024-06-26 12:34 ` rguenth at gcc dot gnu.org
2024-06-26 13:34 ` ams at gcc dot gnu.org
2024-06-26 13:41 ` rguenther at suse dot de
2024-06-26 14:05 ` ams at gcc dot gnu.org
2024-06-28 11:08 ` cvs-commit at gcc dot gnu.org
2024-06-28 11:12 ` ams at gcc dot gnu.org
2024-06-28 11:44 ` cvs-commit at gcc dot gnu.org
2024-06-28 11:45 ` rguenth at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).