public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer
@ 2021-01-05 18:50 martin@mpa-garching.mpg.de
  2021-01-05 19:36 ` [Bug tree-optimization/98544] " martin@mpa-garching.mpg.de
                   ` (22 more replies)
  0 siblings, 23 replies; 24+ messages in thread
From: martin@mpa-garching.mpg.de @ 2021-01-05 18:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

            Bug ID: 98544
           Summary: [11 regression] Wrong code generated by tree
                    vectorizer
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: martin@mpa-garching.mpg.de
  Target Milestone: ---

Created attachment 49893
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49893&action=edit
test case

The attached test case (sorry, I don't have the time to reduce this properly at
the moment ...) tests real-valued FFTs of different lengths and should not
output anything if compiled properly.

However, with current trunk and the following compiler command line, some of
the results are really wrong:

martin@debian:~/codes/ducc/bug$ g++ -std=c++17 -O1 -mavx -ftree-vectorize
bug2.cc
martin@debian:~/codes/ducc/bug$ ./a.out
problem at length 15; discrepancy is 4.40898
problem at length 20; discrepancy is 4.54691
problem at length 21; discrepancy is 3.70442
problem at length 25; discrepancy is 8.40318
problem at length 27; discrepancy is 8.44956
problem at length 28; discrepancy is 3.12203
problem at length 30; discrepancy is 8.81795

The discrepancies should be below 1e-15.

The failure goes away when removing either "-ftree-vectorize" or "-mavx". On
released versions of gcc, it runs fine.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
@ 2021-01-05 19:36 ` martin@mpa-garching.mpg.de
  2021-01-06  6:00 ` marxin at gcc dot gnu.org
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: martin@mpa-garching.mpg.de @ 2021-01-05 19:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

--- Comment #1 from Martin Reinecke <martin@mpa-garching.mpg.de> ---
Problem seems to be related to the use of __restrict__.

If I remove the DUCC0_RESTRICT from the function definitions of "radb3",
"radb4" etc., the problem goes away.

However I don't see where I'm violating the promise made by __restrict__ in
these functions ...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
  2021-01-05 19:36 ` [Bug tree-optimization/98544] " martin@mpa-garching.mpg.de
@ 2021-01-06  6:00 ` marxin at gcc dot gnu.org
  2021-01-06  6:03 ` [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87 marxin at gcc dot gnu.org
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-01-06  6:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2021-01-06
     Ever confirmed|0                           |1

--- Comment #2 from Martin Liška <marxin at gcc dot gnu.org> ---
Confirmed, one can reduce that to a single loop vectorization:

$ g++ bug2.cc  -std=c++17 -O1 -mavx -ftree-loop-vectorize
-fdbg-cnt=vect_loop:10-10 && ./a.out

but the loop is quite huge.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
  2021-01-05 19:36 ` [Bug tree-optimization/98544] " martin@mpa-garching.mpg.de
  2021-01-06  6:00 ` marxin at gcc dot gnu.org
@ 2021-01-06  6:03 ` marxin at gcc dot gnu.org
  2021-01-06  9:02 ` rguenth at gcc dot gnu.org
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-01-06  6:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[11 regression] Wrong code  |[11 regression] Wrong code
                   |generated by tree           |generated by tree
                   |vectorizer                  |vectorizer since
                   |                            |r11-3917-g28290cb50c7dbf87

--- Comment #3 from Martin Liška <marxin at gcc dot gnu.org> ---
For g++ bug2.cc  -std=c++17 -O1 -mavx -ftree-loop-vectorize, it started with
r11-3917-g28290cb50c7dbf87.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (2 preceding siblings ...)
  2021-01-06  6:03 ` [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87 marxin at gcc dot gnu.org
@ 2021-01-06  9:02 ` rguenth at gcc dot gnu.org
  2021-01-07 13:59 ` rguenth at gcc dot gnu.org
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-06  9:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |11.0
                 CC|                            |rsandifo at gcc dot gnu.org
           Priority|P3                          |P1
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
split_constant_offset?  Meh.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (3 preceding siblings ...)
  2021-01-06  9:02 ` rguenth at gcc dot gnu.org
@ 2021-01-07 13:59 ` rguenth at gcc dot gnu.org
  2021-01-07 14:26 ` marxin at gcc dot gnu.org
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-07 13:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Martin Liška from comment #3)
> For g++ bug2.cc  -std=c++17 -O1 -mavx -ftree-loop-vectorize, it started with
> r11-3917-g28290cb50c7dbf87.

Are you sure about this bisection?  Reverting this doesn't fix the issue
(reversion isn't quite easy since split_constant_offset has undergone quite
some refactoring meanwhile).

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (4 preceding siblings ...)
  2021-01-07 13:59 ` rguenth at gcc dot gnu.org
@ 2021-01-07 14:26 ` marxin at gcc dot gnu.org
  2021-01-07 14:27 ` rguenth at gcc dot gnu.org
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-01-07 14:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

--- Comment #6 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #5)
> (In reply to Martin Liška from comment #3)
> > For g++ bug2.cc  -std=c++17 -O1 -mavx -ftree-loop-vectorize, it started with
> > r11-3917-g28290cb50c7dbf87.
> 
> Are you sure about this bisection?

Yes, it's correct. I've just verified that.

>  Reverting this doesn't fix the issue
> (reversion isn't quite easy since split_constant_offset has undergone quite
> some refactoring meanwhile).

I'm gonna bisect until when the reversion worked on the trunk.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (5 preceding siblings ...)
  2021-01-07 14:26 ` marxin at gcc dot gnu.org
@ 2021-01-07 14:27 ` rguenth at gcc dot gnu.org
  2021-01-07 14:35 ` rguenth at gcc dot gnu.org
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-07 14:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #5)
> (In reply to Martin Liška from comment #3)
> > For g++ bug2.cc  -std=c++17 -O1 -mavx -ftree-loop-vectorize, it started with
> > r11-3917-g28290cb50c7dbf87.
> 
> Are you sure about this bisection?  Reverting this doesn't fix the issue
> (reversion isn't quite easy since split_constant_offset has undergone quite
> some refactoring meanwhile).

So I can confirm the bisection.  I see quite some good effect of the change
with no "obvious" mistakes done though so it might just trigger some latent
issue elsewhere.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (6 preceding siblings ...)
  2021-01-07 14:27 ` rguenth at gcc dot gnu.org
@ 2021-01-07 14:35 ` rguenth at gcc dot gnu.org
  2021-01-07 15:01 ` rguenth at gcc dot gnu.org
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-07 14:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Martin Liška from comment #2)
> Confirmed, one can reduce that to a single loop vectorization:
> 
> $ g++ bug2.cc  -std=c++17 -O1 -mavx -ftree-loop-vectorize
> -fdbg-cnt=vect_loop:10-10 && ./a.out
> 
> but the loop is quite huge.

btw, 11-11 or 12-12 or 13-13 also is enough individually to trigger a
miscompare.
The 11-11 loop looks smallest to me:

***dbgcnt: lower limit 11 reached for vect_loop.***
***dbgcnt: upper limit 11 reached for vect_loop.***
fft1d.h:1256:23: optimized: loop vectorized using 32 byte vectors
fft1d.h:1256:23: optimized:  loop versioned for vectorization because of
possible aliasing

it also only needs a single alias check (just guessing where things may go
wrong)

The source corresponds to

template<typename T> void radb2(size_t ido, size_t l1,
  const T * DUCC0_RESTRICT cc, T * DUCC0_RESTRICT ch,
  const T0 * DUCC0_RESTRICT wa) const
  {
  auto WA = [wa,ido](size_t x, size_t i) { return wa[i+x*(ido-1)]; };
  auto CC = [cc,ido](size_t a, size_t b, size_t c) -> const T&
    { return cc[a+ido*(b+2*c)]; };
  auto CH = [ch,ido,l1](size_t a, size_t b, size_t c) -> T&
    { return ch[a+ido*(b+l1*c)]; };

  for (size_t k=0; k<l1; k++)
    PM (CH(0,k,0),CH(0,k,1),CC(0,0,k),CC(ido-1,1,k));
  if ((ido&1)==0)
    for (size_t k=0; k<l1; k++)
      {
      CH(ido-1,k,0) = T0( 2)*CC(ido-1,0,k);
      CH(ido-1,k,1) = T0(-2)*CC(0    ,1,k);
      }
  if (ido<=2) return;
  for (size_t k=0; k<l1;++k)
====>  this loop
    for (size_t i=2; i<ido; i+=2)
      {
      size_t ic=ido-i;
      T ti2, tr2;
      PM (CH(i-1,k,0),tr2,CC(i-1,0,k),CC(ic-1,1,k));
      PM (ti2,CH(i  ,k,0),CC(i  ,0,k),CC(ic  ,1,k));
      MULPM (CH(i,k,1),CH(i-1,k,1),WA(0,i-2),WA(0,i-1),ti2,tr2);
      }
<====
  }

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (7 preceding siblings ...)
  2021-01-07 14:35 ` rguenth at gcc dot gnu.org
@ 2021-01-07 15:01 ` rguenth at gcc dot gnu.org
  2021-01-07 15:05 ` rguenth at gcc dot gnu.org
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-07 15:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #8)
> (In reply to Martin Liška from comment #2)
> > Confirmed, one can reduce that to a single loop vectorization:
> > 
> > $ g++ bug2.cc  -std=c++17 -O1 -mavx -ftree-loop-vectorize
> > -fdbg-cnt=vect_loop:10-10 && ./a.out
> > 
> > but the loop is quite huge.
> 
> btw, 11-11 or 12-12 or 13-13 also is enough individually to trigger a
> miscompare.
> The 11-11 loop looks smallest to me:
> 
> ***dbgcnt: lower limit 11 reached for vect_loop.***
> ***dbgcnt: upper limit 11 reached for vect_loop.***
> fft1d.h:1256:23: optimized: loop vectorized using 32 byte vectors
> fft1d.h:1256:23: optimized:  loop versioned for vectorization because of
> possible aliasing
> 
> it also only needs a single alias check (just guessing where things may go
> wrong)
> 
> The source corresponds to
> 
> template<typename T> void radb2(size_t ido, size_t l1,
>   const T * DUCC0_RESTRICT cc, T * DUCC0_RESTRICT ch,
>   const T0 * DUCC0_RESTRICT wa) const
>   {
>   auto WA = [wa,ido](size_t x, size_t i) { return wa[i+x*(ido-1)]; };
>   auto CC = [cc,ido](size_t a, size_t b, size_t c) -> const T&
>     { return cc[a+ido*(b+2*c)]; };
>   auto CH = [ch,ido,l1](size_t a, size_t b, size_t c) -> T&
>     { return ch[a+ido*(b+l1*c)]; };
> 
>   for (size_t k=0; k<l1; k++)
>     PM (CH(0,k,0),CH(0,k,1),CC(0,0,k),CC(ido-1,1,k));
>   if ((ido&1)==0)
>     for (size_t k=0; k<l1; k++)
>       {
>       CH(ido-1,k,0) = T0( 2)*CC(ido-1,0,k);
>       CH(ido-1,k,1) = T0(-2)*CC(0    ,1,k);
>       }
>   if (ido<=2) return;
>   for (size_t k=0; k<l1;++k)
> ====>  this loop
>     for (size_t i=2; i<ido; i+=2)
>       {
>       size_t ic=ido-i;
>       T ti2, tr2;
>       PM (CH(i-1,k,0),tr2,CC(i-1,0,k),CC(ic-1,1,k));
>       PM (ti2,CH(i  ,k,0),CC(i  ,0,k),CC(ic  ,1,k));
>       MULPM (CH(i,k,1),CH(i-1,k,1),WA(0,i-2),WA(0,i-1),ti2,tr2);
>       }
> <====
>   }

Notably the access functions end up with large (negative) initial values
and we have a negative step (more negative step stuff is vectorized with
GCC 11 now!)

 Creating dr for *_123
 analyze_innermost: success.
-       base_address: (const double &) cc_30(D) + (sizetype) ((((long unsigned
int) ido_29(D) + _135)
 + 18446744073709551614) * 8)
+       base_address: (const double &) cc_30(D) + (sizetype) (((long unsigned
int) ido_29(D) + _120) 
* 8)
        offset from base address: 0
-       constant offset from base address: 0
+       constant offset from base address: -24(OVF)
        step: -16(OVF)
        base alignment: 8
        base misalignment: 0
        offset alignment: 256
        step alignment: 16
-       base_object: *(const double &) cc_30(D) + (sizetype) ((((long unsigned
int) ido_29(D) + _135)
 + 18446744073709551614) * 8)
-       Access function 0: {0, +, 18446744073709551600}_4
+       base_object: *(const double &) cc_30(D) + (sizetype) (((long unsigned
int) ido_29(D) + _120) 
* 8)
+       Access function 0: {18446744073709551592, +, 18446744073709551600}_4

we now use SLP for this loop (great!) while we previously failed to vectorize
this loop at all.

We can disable epilouge vectorization with no (good) effect.

The function in question is ducc0::detail_fft::rfftp<double>::radb2<double>,
it should be possible to isolate this kernel into a testcase, I will try to do
this from the GIMPLE IL.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (8 preceding siblings ...)
  2021-01-07 15:01 ` rguenth at gcc dot gnu.org
@ 2021-01-07 15:05 ` rguenth at gcc dot gnu.org
  2021-01-07 15:19 ` rguenth at gcc dot gnu.org
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-07 15:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 49912
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49912&action=edit
GIMPLE testcase for one of the miscompiled(?) loops

It shows similar vectorized / analysis IL as for the big testcase with -O
-ftree-loop-vectorize -mavx2 -fgimple

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (9 preceding siblings ...)
  2021-01-07 15:05 ` rguenth at gcc dot gnu.org
@ 2021-01-07 15:19 ` rguenth at gcc dot gnu.org
  2021-01-07 15:29 ` rguenth at gcc dot gnu.org
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-07 15:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
Or massaging the C++ to produce a single-function testcase (driver still
missing,
I think we can remove the "unrelated" loops as well).  What kind of shape (w/o
too much guessing) is the function expecting for its input arrays?

typedef unsigned long size_t;

template<typename T> inline void PM(T &a, T &b, T c, T d)
  { a=c+d; b=c-d; }
template<typename T1, typename T2, typename T3> inline void MULPM
  (T1 &a, T1 &b, T2 c, T2 d, T3 e, T3 f)
  {  a=c*e+d*f; b=c*f-d*e; }

typedef double T;
void radb2(size_t ido, size_t l1,
           const T * __restrict cc, T * __restrict ch,
           const T * __restrict wa)
{
  auto WA = [wa,ido](size_t x, size_t i) { return wa[i+x*(ido-1)]; };
  auto CC = [cc,ido](size_t a, size_t b, size_t c) -> const T&
    { return cc[a+ido*(b+2*c)]; };
  auto CH = [ch,ido,l1](size_t a, size_t b, size_t c) -> T&
    { return ch[a+ido*(b+l1*c)]; };

  for (size_t k=0; k<l1; k++)
    PM (CH(0,k,0),CH(0,k,1),CC(0,0,k),CC(ido-1,1,k));
  if ((ido&1)==0)
    for (size_t k=0; k<l1; k++)
      {
        CH(ido-1,k,0) = T( 2)*CC(ido-1,0,k);
        CH(ido-1,k,1) = T(-2)*CC(0    ,1,k);
      }
  if (ido<=2) return;
  for (size_t k=0; k<l1;++k)
    for (size_t i=2; i<ido; i+=2)
      {
        size_t ic=ido-i;
        T ti2, tr2;
        PM (CH(i-1,k,0),tr2,CC(i-1,0,k),CC(ic-1,1,k));
        PM (ti2,CH(i  ,k,0),CC(i  ,0,k),CC(ic  ,1,k));
        MULPM (CH(i,k,1),CH(i-1,k,1),WA(0,i-2),WA(0,i-1),ti2,tr2);
      }
}

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (10 preceding siblings ...)
  2021-01-07 15:19 ` rguenth at gcc dot gnu.org
@ 2021-01-07 15:29 ` rguenth at gcc dot gnu.org
  2021-01-07 15:37 ` martin@mpa-garching.mpg.de
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-07 15:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
So at least the following doesn't produce any different answers for me
comparing -O -ftree-loop-vectorize -mavx to -O0

typedef unsigned long size_t;

template<typename T> inline void PM(T &a, T &b, T c, T d)
  { a=c+d; b=c-d; }
template<typename T1, typename T2, typename T3> inline void MULPM
  (T1 &a, T1 &b, T2 c, T2 d, T3 e, T3 f)
  {  a=c*e+d*f; b=c*f-d*e; }

typedef double T;
void __attribute__((noipa))
radb2(size_t ido, size_t l1,
      const T * __restrict cc, T * __restrict ch,
      const T * __restrict wa)
{
  auto WA = [wa,ido](size_t x, size_t i) { return wa[i+x*(ido-1)]; };
  auto CC = [cc,ido](size_t a, size_t b, size_t c) -> const T&
    { return cc[a+ido*(b+2*c)]; };
  auto CH = [ch,ido,l1](size_t a, size_t b, size_t c) -> T&
    { return ch[a+ido*(b+l1*c)]; };

  for (size_t k=0; k<l1;++k)
    for (size_t i=2; i<ido; i+=2)
      {
        size_t ic=ido-i;
        T ti2, tr2;
        PM (CH(i-1,k,0),tr2,CC(i-1,0,k),CC(ic-1,1,k));
        PM (ti2,CH(i  ,k,0),CC(i  ,0,k),CC(ic  ,1,k));
        MULPM (CH(i,k,1),CH(i-1,k,1),WA(0,i-2),WA(0,i-1),ti2,tr2);
      }
}

int main()
{
  double *cc, *ch, *wa;
  cc = new double[8*8*8];
  ch = new double[8*8*8];
  wa = new double[8*8*8];
  for (size_t i = 0; i < 8*8*8; ++i)
    cc[i] = i;
  for (size_t i = 0; i < 8*8*8; ++i)
    ch[i] = i + 0.25;
  for (size_t i = 0; i < 8*8*8; ++i)
    wa[i] = i - 0.5;
  radb2 (8, 8, cc, ch, wa);
  for (size_t i = 0; i < 8*8*8; ++i)
    __builtin_printf ("%f %f %f\n", cc[i], ch[i], wa[i]);
  return 0;
}

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (11 preceding siblings ...)
  2021-01-07 15:29 ` rguenth at gcc dot gnu.org
@ 2021-01-07 15:37 ` martin@mpa-garching.mpg.de
  2021-01-07 15:50 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: martin@mpa-garching.mpg.de @ 2021-01-07 15:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

--- Comment #13 from Martin Reinecke <martin@mpa-garching.mpg.de> ---
> What kind of shape (w/o too much guessing) is the function expecting for its input arrays?

For radb<x> the size of the cc and ch arrays is l1*ido*x.
Size of wa is (x-1)*ido.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (12 preceding siblings ...)
  2021-01-07 15:37 ` martin@mpa-garching.mpg.de
@ 2021-01-07 15:50 ` rguenth at gcc dot gnu.org
  2021-01-07 16:08 ` martin@mpa-garching.mpg.de
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-07 15:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
So that makes it sth like the following then - still not miscomparing.  What's
the "problem at length N" hinting at?  Does N somehow make ido/l1 different?

typedef unsigned long size_t;
template<typename T> inline void PM(T &a, T &b, T c, T d)
  { a=c+d; b=c-d; }
template<typename T1, typename T2, typename T3> inline void MULPM
  (T1 &a, T1 &b, T2 c, T2 d, T3 e, T3 f)
  {  a=c*e+d*f; b=c*f-d*e; }
typedef double T;
void __attribute__((noipa))
radb2(size_t ido, size_t l1,
      const T * __restrict cc, T * __restrict ch,
      const T * __restrict wa)
{
  auto WA = [wa,ido](size_t x, size_t i) { return wa[i+x*(ido-1)]; };
  auto CC = [cc,ido](size_t a, size_t b, size_t c) -> const T&
    { return cc[a+ido*(b+2*c)]; };
  auto CH = [ch,ido,l1](size_t a, size_t b, size_t c) -> T&
    { return ch[a+ido*(b+l1*c)]; };

  for (size_t k=0; k<l1;++k)
    for (size_t i=2; i<ido; i+=2)
      {
        size_t ic=ido-i;
        T ti2, tr2;
        PM (CH(i-1,k,0),tr2,CC(i-1,0,k),CC(ic-1,1,k));
        PM (ti2,CH(i  ,k,0),CC(i  ,0,k),CC(ic  ,1,k));
        MULPM (CH(i,k,1),CH(i-1,k,1),WA(0,i-2),WA(0,i-1),ti2,tr2);
      }
}
int main()
{
  double *cc, *ch, *wa;
  size_t ido = 10, l1 = 10;
  cc = new double[ido*l1*2];
  ch = new double[ido*l1*2];
  wa = new double[ido];
  for (size_t i = 0; i < 10*10*10; ++i)
    cc[i] = i;
  for (size_t i = 0; i < 10*10*10; ++i)
    wa[i] = i - 0.5;
  radb2 (ido, l1, cc, ch, wa);
  for (size_t i = 0; i < ido*l1*2; ++i)
    __builtin_printf ("%f\n", ch[i]);
  return 0;
}

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (13 preceding siblings ...)
  2021-01-07 15:50 ` rguenth at gcc dot gnu.org
@ 2021-01-07 16:08 ` martin@mpa-garching.mpg.de
  2021-01-07 16:48 ` rguenth at gcc dot gnu.org
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: martin@mpa-garching.mpg.de @ 2021-01-07 16:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

--- Comment #15 from Martin Reinecke <martin@mpa-garching.mpg.de> ---
"Problem at length N" means that the FFT of length N is computed incorrectly.
Also, N==l1*ido*x.

For an FFT of length N, the computation is broken down into several passes.
Let's take N=15.
First the prome factors of N are computed, in this case 3 and 5.

The (simplified) procedure is then:

l1=1;

// first pass with x=3
x=3;
ido=N/(x*l1);
radb3(ido, l1, p1, p2, <twiddle factors>);
swap (p1,p2);

l1*=x;
x=5;
ido=N/(x*l1);
radb5(ido, l1, p1, p2, <twiddle factors>);
swap (p1,p2);

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (14 preceding siblings ...)
  2021-01-07 16:08 ` martin@mpa-garching.mpg.de
@ 2021-01-07 16:48 ` rguenth at gcc dot gnu.org
  2021-01-08 12:14 ` rguenth at gcc dot gnu.org
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-07 16:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
Interestingly trunk doesn't exhibit the issue with radb2 but radb4 and radb3
and radb5 still do ... (radb3 being the next "smallest") but a similarly
stripped down testcase doesn't exhibit problems either (at length 15 as
reported from the full testcase).  *sigh*

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (15 preceding siblings ...)
  2021-01-07 16:48 ` rguenth at gcc dot gnu.org
@ 2021-01-08 12:14 ` rguenth at gcc dot gnu.org
  2021-01-08 12:16 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-08 12:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
/* { dg-do run } */

double a[4], b[2];

void __attribute__((noipa))
foo ()
{
  double a0 = a[0];
  double a1 = a[1];
  double a2 = a[2];
  double a3 = a[3];
  b[0] = a1 - a3;
  b[1] = a0 + a2;
}

int main()
{
  a[0] = 1.;
  a[1] = 2.;
  a[2] = 3.;
  a[3] = 4.;
  foo ();
  if (b[0] != -2 || b[1] != 4)
    __builtin_abort ();
  return 0;
}

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (16 preceding siblings ...)
  2021-01-08 12:14 ` rguenth at gcc dot gnu.org
@ 2021-01-08 12:16 ` rguenth at gcc dot gnu.org
  2021-01-08 12:20 ` marxin at gcc dot gnu.org
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-08 12:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #17)
> /* { dg-do run } */
> 
> double a[4], b[2];
> 
> void __attribute__((noipa))
> foo ()
> {
>   double a0 = a[0];
>   double a1 = a[1];
>   double a2 = a[2];
>   double a3 = a[3];
>   b[0] = a1 - a3;
>   b[1] = a0 + a2;
> }
> 
> int main()
> {
>   a[0] = 1.;
>   a[1] = 2.;
>   a[2] = 3.;
>   a[3] = 4.;
>   foo ();
>   if (b[0] != -2 || b[1] != 4)
>     __builtin_abort ();
>   return 0;
> }

err

/* { dg-do run } */

double a[2], b[2], c[2], d[2];

void __attribute__((noipa))
foo()
{
  double a0 = a[0];
  double a1 = a[1];
  double b0 = b[0];
  double b1 = b[1];
  double c0 = c[0];
  double c1 = c[1];
  double tem1 = a1 - b1;
  double tem2 = a0 + b0;
  d[0] = tem1 * c1;
  d[1] = tem2 * c0;
}

int main()
{
  a[0] = 1.;
  a[1] = 2.;
  b[0] = 3.;
  b[1] = 4.;
  c[0] = 2.;
  c[1] = 3.;
  foo ();
  if (d[0] != -6. || d[1] != 8.)
    __builtin_abort ();
  return 0;
}

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (17 preceding siblings ...)
  2021-01-08 12:16 ` rguenth at gcc dot gnu.org
@ 2021-01-08 12:20 ` marxin at gcc dot gnu.org
  2021-01-08 13:09 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: marxin at gcc dot gnu.org @ 2021-01-08 12:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

--- Comment #19 from Martin Liška <marxin at gcc dot gnu.org> ---
> err
> 
> /* { dg-do run } */
> 
> double a[2], b[2], c[2], d[2];
> 
> void __attribute__((noipa))
> foo()
> {
>   double a0 = a[0];
>   double a1 = a[1];
>   double b0 = b[0];
>   double b1 = b[1];
>   double c0 = c[0];
>   double c1 = c[1];
>   double tem1 = a1 - b1;
>   double tem2 = a0 + b0;
>   d[0] = tem1 * c1;
>   d[1] = tem2 * c0;
> }
> 
> int main()
> {
>   a[0] = 1.;
>   a[1] = 2.;
>   b[0] = 3.;
>   b[1] = 4.;
>   c[0] = 2.;
>   c[1] = 3.;
>   foo ();
>   if (d[0] != -6. || d[1] != 8.)
>     __builtin_abort ();
>   return 0;
> }

This started to fail with -O3 since r11-3823-g126ed72b9f48f853 if it helps.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (18 preceding siblings ...)
  2021-01-08 12:20 ` marxin at gcc dot gnu.org
@ 2021-01-08 13:09 ` cvs-commit at gcc dot gnu.org
  2021-01-08 13:09 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-01-08 13:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

--- Comment #20 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:bdcde1504502719504a7a63ab10059e171694dc2

commit r11-6549-gbdcde1504502719504a7a63ab10059e171694dc2
Author: Richard Biener <rguenther@suse.de>
Date:   Fri Jan 8 13:17:18 2021 +0100

    tree-optimization/98544 - more permute optimization fixes

    Permute nodes are not transparent to the permute of their children.
    Instead we have to materialize child permutes always and in future
    may treat permute nodes as the source of arbitrary permutes as
    we can permute the lane permutation vector at will (as the target
    supports in the end).

    2021-01-08  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/98544
            * tree-vect-slp.c (vect_optimize_slp): Always materialize
            permutes at a permute node.

            * gcc.dg/vect/bb-slp-pr98544.c: New testcase.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (19 preceding siblings ...)
  2021-01-08 13:09 ` cvs-commit at gcc dot gnu.org
@ 2021-01-08 13:09 ` rguenth at gcc dot gnu.org
  2021-01-08 14:07 ` martin@mpa-garching.mpg.de
  2021-01-08 14:18 ` rguenther at suse dot de
  22 siblings, 0 replies; 24+ messages in thread
From: rguenth at gcc dot gnu.org @ 2021-01-08 13:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #21 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (20 preceding siblings ...)
  2021-01-08 13:09 ` rguenth at gcc dot gnu.org
@ 2021-01-08 14:07 ` martin@mpa-garching.mpg.de
  2021-01-08 14:18 ` rguenther at suse dot de
  22 siblings, 0 replies; 24+ messages in thread
From: martin@mpa-garching.mpg.de @ 2021-01-08 14:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

--- Comment #22 from Martin Reinecke <martin@mpa-garching.mpg.de> ---
Brilliant, thank you very much for tracking this one down!
My FFT library now works correctly again with all optimizations enabled, which
is a great relief. The scipy maintainers will be happy that they won't need to
fiddle with flags depending on gcc versions :)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87
  2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
                   ` (21 preceding siblings ...)
  2021-01-08 14:07 ` martin@mpa-garching.mpg.de
@ 2021-01-08 14:18 ` rguenther at suse dot de
  22 siblings, 0 replies; 24+ messages in thread
From: rguenther at suse dot de @ 2021-01-08 14:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544

--- Comment #23 from rguenther at suse dot de <rguenther at suse dot de> ---
On January 8, 2021 3:07:48 PM GMT+01:00, "martin@mpa-garching.mpg.de"
<gcc-bugzilla@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98544
>
>--- Comment #22 from Martin Reinecke <martin@mpa-garching.mpg.de> ---
>Brilliant, thank you very much for tracking this one down!
>My FFT library now works correctly again with all optimizations
>enabled, which
>is a great relief. The scipy maintainers will be happy that they won't
>need to
>fiddle with flags depending on gcc versions :)

Thanks again for testing and reporting issues before we released GCC 11!

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2021-01-08 14:18 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-05 18:50 [Bug c++/98544] New: [11 regression] Wrong code generated by tree vectorizer martin@mpa-garching.mpg.de
2021-01-05 19:36 ` [Bug tree-optimization/98544] " martin@mpa-garching.mpg.de
2021-01-06  6:00 ` marxin at gcc dot gnu.org
2021-01-06  6:03 ` [Bug tree-optimization/98544] [11 regression] Wrong code generated by tree vectorizer since r11-3917-g28290cb50c7dbf87 marxin at gcc dot gnu.org
2021-01-06  9:02 ` rguenth at gcc dot gnu.org
2021-01-07 13:59 ` rguenth at gcc dot gnu.org
2021-01-07 14:26 ` marxin at gcc dot gnu.org
2021-01-07 14:27 ` rguenth at gcc dot gnu.org
2021-01-07 14:35 ` rguenth at gcc dot gnu.org
2021-01-07 15:01 ` rguenth at gcc dot gnu.org
2021-01-07 15:05 ` rguenth at gcc dot gnu.org
2021-01-07 15:19 ` rguenth at gcc dot gnu.org
2021-01-07 15:29 ` rguenth at gcc dot gnu.org
2021-01-07 15:37 ` martin@mpa-garching.mpg.de
2021-01-07 15:50 ` rguenth at gcc dot gnu.org
2021-01-07 16:08 ` martin@mpa-garching.mpg.de
2021-01-07 16:48 ` rguenth at gcc dot gnu.org
2021-01-08 12:14 ` rguenth at gcc dot gnu.org
2021-01-08 12:16 ` rguenth at gcc dot gnu.org
2021-01-08 12:20 ` marxin at gcc dot gnu.org
2021-01-08 13:09 ` cvs-commit at gcc dot gnu.org
2021-01-08 13:09 ` rguenth at gcc dot gnu.org
2021-01-08 14:07 ` martin@mpa-garching.mpg.de
2021-01-08 14:18 ` rguenther at suse dot de

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).