public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/110148] New: TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
From: hubicka at gcc dot gnu.org @ 2023-06-06 20:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148
Bug ID: 110148
Summary: TSVC s242 regression between
g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and
g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
Product: gcc
Version: 13.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
Seen here:
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=171.854.0 (benzen)
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=463.854.0 (helene)
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=283.854.0 (lntzen3)
So it seems to affect both Intel (helene) and the Zen machines.
TSVC/s1244 also regressed on the same day, but only on the Zen machines.
* [Bug middle-end/110148] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
From: pinskia at gcc dot gnu.org @ 2023-06-06 21:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords| |missed-optimization
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Only g:e5405f065bace0685cb3b8878d1dfc7a6e7ef409 sticks out, but there is no FMA
possible in s242 as far as I can tell.
* [Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
From: rguenth at gcc dot gnu.org @ 2023-06-09 6:36 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Blocks| |53947
Target| |x86_64-*-*
Version|13.1.0 |14.0
Summary|TSVC s242 regression |[14 Regression] TSVC s242
|between |regression between
|g:c0df96b3cda5738afbba3a65b |g:c0df96b3cda5738afbba3a65b
|b054183c5cd5530 and |b054183c5cd5530 and
|g:e4c986fde56a6248f8fbe6cf0 |g:e4c986fde56a6248f8fbe6cf0
|704e1da34b055d8 |704e1da34b055d8
Target Milestone|--- |14.0
Keywords| |needs-bisection
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
* [Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
From: lili.cui at intel dot com @ 2023-06-09 11:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148
cuilili <lili.cui at intel dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |lili.cui at intel dot com
--- Comment #2 from cuilili <lili.cui at intel dot com> ---
The commit changed the function that breaks dependency chains, in order to
generate more FMAs. s242 has a chain that needs to be broken: it sits in a
small loop and involves the loop-carried value a[i-1].
Src code:
for (int i = 1; i < LEN_1D; ++i)
{
a[i] = a[i - 1] + s1 + s2 + b[i] + c[i] + d[i];
}
------------------------------------------------------
Base version:
SSA tree
ssa1 = (s1+s2) + b[i];
ssa2 = c[i] + d[i];
ssa3 = ssa1+ssa2;
ssa4 = ssa3 + a[i-1];
a[i-1] lives in xmm1; only 2 instructions, both using xmm1, carry a dependency
across iterations.
Assembler
Loop1:
vmovsd 0x60c400(%rax),%xmm0
vaddsd 0x60b000(%rax),%xmm3,%xmm2
add $0x8,%rax
vaddsd 0x60b9f8(%rax),%xmm0,%xmm0
vaddsd %xmm2,%xmm0,%xmm0
vaddsd %xmm0,%xmm1,%xmm1 ---> 1
vmovsd %xmm1,0x60cdf8(%rax) ---> 2
cmp $0xa00,%rdx
jne Loop1
--------------------------------------------------------------
Base + commit g:e5405f065bace0685cb3b8878d1dfc7a6e7ef409 version:
a[i-1] lives in xmm0; 4 instructions using xmm0 now carry a dependency across
iterations.
SSA tree
ssa1 = (s1+s2) + b[i];
ssa2 = c[i] + d[i];
ssa3 = ssa1 + a[i-1];
ssa4 = ssa2 + ssa3;
Assembler
Loop1:
vaddsdq 0x60b000(%rax), %xmm0, %xmm0 ---> 1
vmovsdq 0x60c400(%rax), %xmm1
add $0x8, %rax
vaddsdq 0x60b9f8(%rax), %xmm1, %xmm1
vaddsd %xmm2, %xmm0, %xmm0 ---> 2
vaddsd %xmm1, %xmm0, %xmm0 ---> 3
vmovsdq %xmm0, 0x60cdf8(%rax) ---> 4
cmp $0xa00,%rdx
jne Loop1
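The difference between the two SSA shapes above can be sketched in plain C. This is only an illustration under stated assumptions, not GCC output: the function names `s242_base` and `s242_carried_early` and the explicit length parameter `n` are made up for the example, and the temporaries simply mirror the ssa1..ssa3 names in the listings. Both functions compute the same sum; they differ in how many adds have to wait for a[i-1] from the previous iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Base association: everything that does not depend on a[i-1] is summed
   first, so only the final add sits on the loop-carried chain. */
void s242_base(double *a, const double *b, const double *c, const double *d,
               double s1, double s2, size_t n)
{
    assert(n >= 1);  /* a[0] must exist as the starting value */
    for (size_t i = 1; i < n; ++i) {
        double ssa1 = (s1 + s2) + b[i];
        double ssa2 = c[i] + d[i];
        double ssa3 = ssa1 + ssa2;
        a[i] = ssa3 + a[i - 1];          /* the only add waiting on a[i-1] */
    }
}

/* Association from the second listing: a[i-1] enters the chain early, so
   every later add in the chain also waits on the previous iteration. */
void s242_carried_early(double *a, const double *b, const double *c,
                        const double *d, double s1, double s2, size_t n)
{
    assert(n >= 1);
    for (size_t i = 1; i < n; ++i) {
        double ssa1 = (s1 + s2) + b[i];
        double ssa2 = c[i] + d[i];
        double ssa3 = ssa1 + a[i - 1];   /* waits on a[i-1] */
        a[i] = ssa2 + ssa3;              /* waits on ssa3, hence on a[i-1] */
    }
}
```

For inputs that are exactly representable (small integers, say) the two functions return identical arrays. In general the two associations may round differently, which is why GCC only reassociates floating-point adds when options such as -ffast-math permit it.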
* [Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
From: lili.cui at intel dot com @ 2023-06-25 5:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148
--- Comment #3 from cuilili <lili.cui at intel dot com> ---
I reproduced the s1244 regression on znver3.
Src code:
for (int i = 0; i < LEN_1D-1; i++)
{
a[i] = b[i] + c[i] * c[i] + b[i] * b[i] + c[i];
d[i] = a[i] + a[i+1];
}
--------------------------------------------------------
Base version:                           Base + commit version:
Assembler                               Assembler
Loop1:                                  Loop1:
vmovsd 0x60c400(%rax),%xmm2             vmovsd 0x60ba00(%rax),%xmm2
vmovsd 0x60ba00(%rax),%xmm1             vmovsd 0x60c400(%rax),%xmm1
add $0x8,%rax                           add $0x8,%rax
--------------------------------------------------------------------
vaddsd %xmm1,%xmm2,%xmm0                vmovsd %xmm2,%xmm2,%xmm0
vmulsd %xmm2,%xmm2,%xmm2                vfmadd132sd %xmm2,%xmm1,%xmm0
vfmadd132sd %xmm1,%xmm2,%xmm1           vfmadd132sd %xmm1,%xmm2,%xmm1
--------------------------------------------------------------------
vaddsd %xmm1,%xmm0,%xmm0                vaddsd %xmm1,%xmm0,%xmm0
vmovsd %xmm0,0x60cdf8(%rax)             vmovsd %xmm0,0x60cdf8(%rax)
vaddsd 0x60ce00(%rax),%xmm0,%xmm0       vaddsd 0x60ce00(%rax),%xmm0,%xmm0
vmovsd %xmm0,0x60aff8(%rax)             vmovsd %xmm0,0x60aff8(%rax)
cmp $0x9f8,%rax                         cmp $0x9f8,%rax
jne Loop1                               jne Loop1
In the base version the multiply and the FMA depend on each other, which
increases the latency of the critical dependency chain. I have not found out
why znver3 regresses. The same binary running on ICX shows an 11% gain (with
#define iterations 100000000).
* [Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
From: hubicka at gcc dot gnu.org @ 2023-06-25 20:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148
--- Comment #4 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
On Zen 3 an FMA requires all of its inputs to be ready before it starts
executing, whereas a separate multiply+add can start the multiplication
earlier. Not sure if that explains the difference.
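A toy timing model makes this scheduling argument concrete. It is a sketch only: the latencies below are hypothetical round numbers chosen for illustration, not measured Zen 3 figures, and the function names `fma_done` and `mul_add_done` are invented for the example. Each function returns the cycle at which a*b + c becomes available, given the cycles at which a, b and c are ready.

```c
#include <assert.h>

/* Hypothetical latencies in cycles; illustrative, not measured values. */
enum { LAT_MUL = 3, LAT_ADD = 3, LAT_FMA = 4 };

static int max_int(int x, int y) { return x > y ? x : y; }

/* One FMA: cannot start until a, b and c are all ready. */
int fma_done(int ready_a, int ready_b, int ready_c)
{
    assert(ready_a >= 0 && ready_b >= 0 && ready_c >= 0);
    return max_int(max_int(ready_a, ready_b), ready_c) + LAT_FMA;
}

/* Separate multiply and add: the multiply starts as soon as a and b are
   ready; only the add has to wait for c. */
int mul_add_done(int ready_a, int ready_b, int ready_c)
{
    assert(ready_a >= 0 && ready_b >= 0 && ready_c >= 0);
    int product_ready = max_int(ready_a, ready_b) + LAT_MUL;
    return max_int(product_ready, ready_c) + LAT_ADD;
}
```

Under these assumed latencies, when all three inputs are ready at cycle 0 the FMA finishes first (cycle 4 against cycle 6). When c arrives late, for example at cycle 5, the separate multiply has already completed in the meantime and the result is ready at cycle 8 against cycle 9 for the FMA. Whether this effect accounts for the s1244 numbers is not established in this thread.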
* [Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
From: cvs-commit at gcc dot gnu.org @ 2023-06-29 9:31 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148
--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Lili Cui <cuilili@gcc.gnu.org>:
https://gcc.gnu.org/g:4633e38cd22c5e51fac984124c7627be912d0999
commit r14-2185-g4633e38cd22c5e51fac984124c7627be912d0999
Author: Lili Cui <lili.cui@intel.com>
Date: Thu Jun 29 06:51:56 2023 +0000
Avoid adding loop-carried ops to long chains
Avoid adding loop-carried ops to long chains, otherwise the whole chain will
have dependencies across the loop iteration. Just keep loop-carried ops in a
separate chain.
E.g.
x_1 = phi(x_0, x_2)
y_1 = phi(y_0, y_2)
a + b + c + d + e + x1 + y1
SSA1 = a + b;
SSA2 = c + d;
SSA3 = SSA1 + e;
SSA4 = SSA3 + SSA2;
SSA5 = x1 + y1;
SSA6 = SSA4 + SSA5;
With the patch applied, these test cases improved by 32%~100%.
S242:
for (int i = 1; i < LEN_1D; ++i) {
a[i] = a[i - 1] + s1 + s2 + b[i] + c[i] + d[i];}
Case 1:
for (int i = 1; i < LEN_1D; ++i) {
a[i] = a[i - 1] + s1 + s2 + b[i] + c[i] + d[i] + e[i];}
Case 2:
for (int i = 1; i < LEN_1D; ++i) {
a[i] = a[i - 1] + b[i - 1] + s1 + s2 + b[i] + c[i] + d[i] + e[i];}
The values are execution times.
A: original version
B: with FMA patch g:e5405f065bace0685cb3b8878d1dfc7a6e7ef409 (based on A)
C: with current patch (based on B)
          A       B       C       B/A           C/A
s242      2.859   5.152   2.859   1.802028681   1
case 1    5.489   5.488   3.511   0.999818      0.64
case 2    7.216   7.499   4.885   1.039218      0.68
gcc/ChangeLog:
PR tree-optimization/110148
* tree-ssa-reassoc.cc (rewrite_expr_tree_parallel): Handle loop-carried
ops in this function.
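The idea in the commit message, keeping loop-carried operands in their own chain, can be sketched as follows. This is not the code in tree-ssa-reassoc.cc: the `struct operand` type, its `loop_carried` flag and both function names are hypothetical, and the sketch sums each chain linearly rather than building the parallel tree shown in the example.

```c
#include <assert.h>
#include <stddef.h>

/* One operand of an addition chain. loop_carried marks a value produced by
   the previous iteration (a PHI result, in the commit message's terms). */
struct operand {
    double value;
    int loop_carried;
};

/* Sum the operands in two separate chains and combine them at the end, so
   the independent chain never waits on the previous iteration. */
double sum_split_chains(const struct operand *ops, size_t n)
{
    assert(ops != NULL || n == 0);
    double independent = 0.0;
    double carried = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (ops[i].loop_carried)
            carried += ops[i].value;
        else
            independent += ops[i].value;
    }
    return independent + carried;
}

/* Number of adds whose result depends on a loop-carried operand when the
   chains are split: the adds inside the carried chain, plus the final
   combining add if there is an independent chain to combine with. */
size_t adds_on_carried_chain(const struct operand *ops, size_t n)
{
    assert(ops != NULL || n == 0);
    size_t carried = 0;
    for (size_t i = 0; i < n; ++i)
        if (ops[i].loop_carried)
            ++carried;
    if (carried == 0)
        return 0;
    return (carried - 1) + (carried < n ? 1 : 0);
}
```

For the seven-operand example in the commit message (a, b, c, d, e plus the two PHI results) the split leaves two adds on the carried path, matching SSA5 and SSA6. A single left-to-right chain that starts from a carried operand would put all six adds on it.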
* [Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
From: jamborm at gcc dot gnu.org @ 2023-09-23 10:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148
--- Comment #6 from Martin Jambor <jamborm at gcc dot gnu.org> ---
I believe this has been fixed?
* [Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
From: lili.cui at intel dot com @ 2023-09-26 1:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148
--- Comment #7 from cuilili <lili.cui at intel dot com> ---
(In reply to Martin Jambor from comment #6)
> I believe this has been fixed?
Yes.
* [Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
From: jamborm at gcc dot gnu.org @ 2023-09-26 15:04 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148
Martin Jambor <jamborm at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jamborm at gcc dot gnu.org
Status|UNCONFIRMED |RESOLVED
Resolution|--- |FIXED
--- Comment #8 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to cuilili from comment #7)
> (In reply to Martin Jambor from comment #6)
> > I believe this has been fixed?
>
> Yes.
Closing the bug then.