public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/110148] New: TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
@ 2023-06-06 20:33 hubicka at gcc dot gnu.org
  2023-06-06 21:02 ` [Bug middle-end/110148] " pinskia at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-06-06 20:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148

            Bug ID: 110148
           Summary: TSVC s242 regression between
                    g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and
                    g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

Seen here:
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=171.854.0 (benzen)
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=463.854.0 (helene)
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=283.854.0 (lntzen3)

So it seems to affect both Intel (helene) and the Zen machines.
TSVC/s1244 also regressed on the same day, but only on the Zen machines.


* [Bug middle-end/110148] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
From: pinskia at gcc dot gnu.org @ 2023-06-06 21:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Only g:e5405f065bace0685cb3b8878d1dfc7a6e7ef409 sticks out but there is no FMA
possible in s242 as far as I can tell ....


* [Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
From: rguenth at gcc dot gnu.org @ 2023-06-09  6:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |53947
             Target|                            |x86_64-*-*
            Version|13.1.0                      |14.0
            Summary|TSVC s242 regression        |[14 Regression] TSVC s242
                   |between                     |regression between
                   |g:c0df96b3cda5738afbba3a65b |g:c0df96b3cda5738afbba3a65b
                   |b054183c5cd5530 and         |b054183c5cd5530 and
                   |g:e4c986fde56a6248f8fbe6cf0 |g:e4c986fde56a6248f8fbe6cf0
                   |704e1da34b055d8             |704e1da34b055d8
   Target Milestone|---                         |14.0
           Keywords|                            |needs-bisection


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations


* [Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
From: lili.cui at intel dot com @ 2023-06-09 11:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148

cuilili <lili.cui at intel dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lili.cui at intel dot com

--- Comment #2 from cuilili <lili.cui at intel dot com> ---

The commit changed the function that breaks dependency chains, in order to
generate more FMAs. s242 has a chain that needs to be broken: it sits in a
small loop and involves the loop-carried value a[i-1].


Src code:

for (int i = 1; i < LEN_1D; ++i) 
   {
     a[i] = a[i - 1] + s1 + s2 + b[i] + c[i] + d[i];
   }

------------------------------------------------------
Base version:

SSA tree
ssa1 = (s1+s2) + b[i];
ssa2 = c[i] + d[i];
ssa3 = ssa1+ssa2;
ssa4 = ssa3 + a[i-1];

a[i-1] is kept in xmm1; 2 instructions using xmm1 have dependencies across
iterations.

Assembler
Loop1:
vmovsd 0x60c400(%rax),%xmm0              
vaddsd 0x60b000(%rax),%xmm3,%xmm2        
add    $0x8,%rax                                 
vaddsd 0x60b9f8(%rax),%xmm0,%xmm0        
vaddsd %xmm2,%xmm0,%xmm0                         
vaddsd %xmm0,%xmm1,%xmm1     ---> 1                   
vmovsd %xmm1,0x60cdf8(%rax)  ---> 2
cmp    $0xa00,%rdx
jne    Loop1

--------------------------------------------------------------
Base + commit g:e5405f065bace0685cb3b8878d1dfc7a6e7ef409 version:

a[i-1] is kept in xmm0; 4 instructions using xmm0 have dependencies across
iterations.

SSA tree
ssa1 = (s1+s2) + b[i];
ssa2 = c[i] + d[i];
ssa3 = ssa1 + a[i-1];
ssa4 = ssa2 + ssa3;

Assembler
Loop1:
vaddsdq  0x60b000(%rax), %xmm0, %xmm0  ---> 1
vmovsdq  0x60c400(%rax), %xmm1
add $0x8, %rax                                                           
vaddsdq  0x60b9f8(%rax), %xmm1, %xmm1
vaddsd %xmm2, %xmm0, %xmm0             ---> 2
vaddsd %xmm1, %xmm0, %xmm0             ---> 3
vmovsdq  %xmm0, 0x60cdf8(%rax)         ---> 4
cmp    $0xa00,%rdx
jne    Loop1
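
A minimal C sketch of the two association orders described in the SSA trees above. The function names and the explicit temporaries are illustrative assumptions, not taken from the GCC sources or from TSVC. Both functions compute the same sum in exact arithmetic. They differ only in where the loop-carried a[i - 1] joins the chain, which determines how many adds must wait for the previous iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Base order: combine the terms that do not depend on the previous
   iteration first, then add the loop-carried a[i - 1] last.  Only the
   final add sits on the cross-iteration dependency chain. */
void s242_carried_last(double *a, const double *b, const double *c,
                       const double *d, double s1, double s2, size_t n)
{
    assert(n >= 1);
    for (size_t i = 1; i < n; ++i) {
        double t1 = (s1 + s2) + b[i];
        double t2 = c[i] + d[i];
        double t3 = t1 + t2;
        a[i] = t3 + a[i - 1];
    }
}

/* Regressed order: a[i - 1] enters the chain one step earlier, so two
   adds per iteration depend on the previous iteration's result. */
void s242_carried_early(double *a, const double *b, const double *c,
                        const double *d, double s1, double s2, size_t n)
{
    assert(n >= 1);
    for (size_t i = 1; i < n; ++i) {
        double t1 = (s1 + s2) + b[i];
        double t2 = c[i] + d[i];
        double t3 = t1 + a[i - 1];
        a[i] = t2 + t3;
    }
}
```

With general floating-point inputs the two orders may round differently, so a compiler is only expected to switch between them when reassociation is permitted (for example under -ffast-math).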


* [Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
From: lili.cui at intel dot com @ 2023-06-25  5:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148

--- Comment #3 from cuilili <lili.cui at intel dot com> ---
I reproduced the s1244 regression on znver3.

Src code:

for (int i = 0; i < LEN_1D-1; i++)
  {
    a[i] = b[i] + c[i] * c[i] + b[i] * b[i] + c[i];
    d[i] = a[i] + a[i+1];
  }
--------------------------------------------------------
Base version:                     Base + commit version:            

Assembler                         Assembler                         
Loop1:                            Loop1:                            
vmovsd 0x60c400(%rax),%xmm2       vmovsd 0x60ba00(%rax),%xmm2       
vmovsd 0x60ba00(%rax),%xmm1       vmovsd 0x60c400(%rax),%xmm1       
add    $0x8,%rax                  add    $0x8,%rax                  
--------------------------------------------------------------------
vaddsd %xmm1,%xmm2,%xmm0          vmovsd %xmm2,%xmm2,%xmm0          
vmulsd %xmm2,%xmm2,%xmm2          vfmadd132sd %xmm2,%xmm1,%xmm0     
vfmadd132sd %xmm1,%xmm2,%xmm1     vfmadd132sd %xmm1,%xmm2,%xmm1     
--------------------------------------------------------------------
vaddsd %xmm1,%xmm0,%xmm0          vaddsd %xmm1,%xmm0,%xmm0          
vmovsd %xmm0,0x60cdf8(%rax)       vmovsd %xmm0,0x60cdf8(%rax)       
vaddsd 0x60ce00(%rax),%xmm0,%xmm0 vaddsd 0x60ce00(%rax),%xmm0,%xmm0 
vmovsd %xmm0,0x60aff8(%rax)       vmovsd %xmm0,0x60aff8(%rax)       
cmp    $0x9f8,%rax                cmp    $0x9f8,%rax                
jne    Loop1                      jne    Loop1


In the Base version, the mult feeds the FMA, which increases the latency of
the critical dependency chain; the two FMAs in the Base + commit version are
independent of each other. I have not found out why znver3 regresses. The same
binary running on ICX shows an 11% gain (with #define iterations 100000000).
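
A small C sketch of the two shapes visible in the listings above. The helper names are hypothetical, and the products are written as plain multiply and add. Whether a compiler contracts them into vfmadd instructions depends on the target and on the floating-point contraction setting, so the comments mark them only as candidates. The expression is symmetric in b and c, so the sketch does not try to say which register holds which array.

```c
#include <assert.h>
#include <stddef.h>

/* Base shape: (b + c) + (b * b + (c * c)).
   The standalone multiply feeds the multiply-add, so the two products
   are serialized. */
double s1244_term_base(double b, double c)
{
    double sum = b + c;
    double cc = c * c;            /* separate multiply */
    double fused = b * b + cc;    /* FMA candidate, waits on cc */
    return sum + fused;
}

/* Base + commit shape: (c * c + b) + (b * b + c).
   Two multiply-adds that do not depend on each other. */
double s1244_term_two_fma(double b, double c)
{
    double left = c * c + b;      /* FMA candidate */
    double right = b * b + c;     /* FMA candidate, independent of left */
    return left + right;
}

/* The s1244 loop from the comment above, using the two-FMA shape.
   a must have n elements because a[i + 1] is read. */
void s1244_two_fma(double *a, const double *b, const double *c,
                   double *d, size_t n)
{
    assert(n >= 2);
    for (size_t i = 0; i < n - 1; i++) {
        a[i] = s1244_term_two_fma(b[i], c[i]);
        d[i] = a[i] + a[i + 1];
    }
}
```

Neither shape has a loop-carried dependency here, which is consistent with the gain seen on ICX; the sketch does not explain the znver3 result.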


* [Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
From: hubicka at gcc dot gnu.org @ 2023-06-25 20:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148

--- Comment #4 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
On zen3, an FMA requires all inputs to be ready before it starts executing,
whereas a separate multiply+add can start the multiplication earlier. Not sure
if that explains the difference.


* [Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
From: cvs-commit at gcc dot gnu.org @ 2023-06-29  9:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Lili Cui <cuilili@gcc.gnu.org>:

https://gcc.gnu.org/g:4633e38cd22c5e51fac984124c7627be912d0999

commit r14-2185-g4633e38cd22c5e51fac984124c7627be912d0999
Author: Lili Cui <lili.cui@intel.com>
Date:   Thu Jun 29 06:51:56 2023 +0000

    Avoid adding loop-carried ops to long chains

    Avoid adding loop-carried ops to long chains, otherwise the whole chain will
    have dependencies across the loop iteration. Just keep loop-carried ops in a
    separate chain.
       E.g.
       x_1 = phi(x_0, x_2)
       y_1 = phi(y_0, y_2)

       a + b + c + d + e + x1 + y1

       SSA1 = a + b;
       SSA2 = c + d;
       SSA3 = SSA1 + e;
       SSA4 = SSA3 + SSA2;
       SSA5 = x1 + y1;
       SSA6 = SSA4 + SSA5;

    With the patch applied, these test cases improved by 32%~100%.

    S242:
    for (int i = 1; i < LEN_1D; ++i) {
        a[i] = a[i - 1] + s1 + s2 + b[i] + c[i] + d[i];}

    Case 1:
    for (int i = 1; i < LEN_1D; ++i) {
        a[i] = a[i - 1] + s1 + s2 + b[i] + c[i] + d[i] + e[i];}

    Case 2:
    for (int i = 1; i < LEN_1D; ++i) {
        a[i] = a[i - 1] + b[i - 1] + s1 + s2 + b[i] + c[i] + d[i] + e[i];}

    The value is the execution time
    A: original version
    B: with FMA patch g:e5405f065bace0685cb3b8878d1dfc7a6e7ef409(base on A)
    C: with current patch(base on B)

              A       B       C     B/A             C/A
    s242    2.859   5.152   2.859   1.802028681     1
    case 1  5.489   5.488   3.511   0.999818        0.64
    case 2  7.216   7.499   4.885   1.039218        0.68

    gcc/ChangeLog:

            PR tree-optimization/110148
            * tree-ssa-reassoc.cc (rewrite_expr_tree_parallel): Handle
            loop-carried ops in this function.
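
The SSA1..SSA6 example in the commit message can be sketched in C as follows. This is an illustration under assumptions, not code from tree-ssa-reassoc.cc: the function names, the array arguments, and the recurrence that makes x and y loop-carried (x takes the old y, y takes the new sum) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Serial form: one left-to-right chain, with the loop-carried x and y
   added at the end of it. */
double chain_serial(const double *a, const double *b, const double *c,
                    const double *d, const double *e, size_t n,
                    double x0, double y0)
{
    assert(a && b && c && d && e);
    double x = x0, y = y0;
    for (size_t i = 0; i < n; ++i) {
        double next = a[i] + b[i] + c[i] + d[i] + e[i] + x + y;
        x = y;
        y = next;
    }
    return y;
}

/* Form from the commit message: the loop-carried operands get their own
   short chain (ssa5), joined to the independent chain by a single add. */
double chain_carried_separate(const double *a, const double *b,
                              const double *c, const double *d,
                              const double *e, size_t n,
                              double x0, double y0)
{
    assert(a && b && c && d && e);
    double x = x0, y = y0;
    for (size_t i = 0; i < n; ++i) {
        double ssa1 = a[i] + b[i];
        double ssa2 = c[i] + d[i];
        double ssa3 = ssa1 + e[i];
        double ssa4 = ssa3 + ssa2;   /* independent of x and y */
        double ssa5 = x + y;         /* loop-carried operands only */
        double ssa6 = ssa4 + ssa5;
        x = y;
        y = ssa6;
    }
    return y;
}
```

In the second form, ssa1 through ssa4 for a later iteration can be computed without waiting for the previous iteration; only ssa5 and ssa6 sit on the cross-iteration path.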


* [Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
From: jamborm at gcc dot gnu.org @ 2023-09-23 10:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148

--- Comment #6 from Martin Jambor <jamborm at gcc dot gnu.org> ---
I believe this has been fixed?


* [Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
From: lili.cui at intel dot com @ 2023-09-26  1:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148

--- Comment #7 from cuilili <lili.cui at intel dot com> ---
(In reply to Martin Jambor from comment #6)
> I believe this has been fixed?

Yes.


* [Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
From: jamborm at gcc dot gnu.org @ 2023-09-26 15:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148

Martin Jambor <jamborm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu.org
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #8 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to cuilili from comment #7)
> (In reply to Martin Jambor from comment #6)
> > I believe this has been fixed?
> 
> Yes.

Closing the bug then.
