public inbox for gcc-bugs@sourceware.org
* [Bug middle-end/110586] New: 10% fatigue2 regression on zen between g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d
From: hubicka at gcc dot gnu.org @ 2023-07-07 9:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110586
Bug ID: 110586
Summary: 10% fatigue2 regression on zen between
g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and
g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d
Product: gcc
Version: 13.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: hubicka at gcc dot gnu.org
Target Milestone: ---
seen here:
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=283.767.0
on zen3 and
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=171.767.0
on zen2
https://lnt.opensuse.org/db_default/v4/CPP/graph?plot.0=171.767.0
on zen1
* [Bug middle-end/110586] [13/14 Regression] 10% fatigue2 regression on zen between g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d
From: rguenth at gcc dot gnu.org @ 2023-07-07 10:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110586
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization,
                   |                            |needs-bisection
            Summary|10% fatigue2 regression on  |[13/14 Regression] 10%
                   |zen between                 |fatigue2 regression on zen
                   |g:8377cf1bf41a0a9d9d49de807 |between
                   |b2341f0bf5d30cf and         |g:8377cf1bf41a0a9d9d49de807
                   |g:3a61ca1b9256535e1bfb19b2d |b2341f0bf5d30cf and
                   |46cde21f3908a5d             |g:3a61ca1b9256535e1bfb19b2d
                   |                            |46cde21f3908a5d
   Target Milestone|---                         |13.2
* [Bug middle-end/110586] [13/14 Regression] 10% fatigue2 regression on zen since r14-2369-g3a61ca1b925653
From: jamborm at gcc dot gnu.org @ 2023-07-15 19:26 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110586
Martin Jambor <jamborm at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu.org
            Summary|[13/14 Regression] 10%      |[13/14 Regression] 10%
                   |fatigue2 regression on zen  |fatigue2 regression on zen
                   |between                     |since
                   |g:8377cf1bf41a0a9d9d49de807 |r14-2369-g3a61ca1b925653
                   |b2341f0bf5d30cf and         |
                   |g:3a61ca1b9256535e1bfb19b2d |
                   |46cde21f3908a5d             |
--- Comment #1 from Martin Jambor <jamborm at gcc dot gnu.org> ---
This unfortunately also bisects to:
3a61ca1b9256535e1bfb19b2d46cde21f3908a5d is the first bad commit
commit 3a61ca1b9256535e1bfb19b2d46cde21f3908a5d
Author: Jan Hubicka <jh@suse.cz>
Date: Thu Jul 6 18:56:22 2023 +0200
Improve profile updates after loop-ch and cunroll
* [Bug middle-end/110586] [14 Regression] 10% fatigue2 regression on zen since r14-2369-g3a61ca1b925653
From: pinskia at gcc dot gnu.org @ 2023-07-15 19:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110586
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs-bisection             |
            Summary|[13/14 Regression] 10%      |[14 Regression] 10%
                   |fatigue2 regression on zen  |fatigue2 regression on zen
                   |since                       |since
                   |r14-2369-g3a61ca1b925653    |r14-2369-g3a61ca1b925653
   Target Milestone|13.2                        |14.0
            Version|13.1.0                      |14.0
* [Bug middle-end/110586] [14 Regression] 10% fatigue2 regression on zen since r14-2369-g3a61ca1b925653
From: hubicka at gcc dot gnu.org @ 2023-07-17 9:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110586
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-07-17
--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Do we have other PRs reducing to this change?
The patch makes cunroll scale down previously incoherent profiles when a loop
that does not iterate is predicted to iterate.
A common source of such loops is vectorized epilogues, which I fixed yesterday.
With some luck this may fix fatigue.
* [Bug middle-end/110586] [14 Regression] 10% fatigue2 regression on zen since r14-2369-g3a61ca1b925653
From: jamborm at gcc dot gnu.org @ 2023-07-17 10:02 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110586
--- Comment #3 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #2)
> Do we have other PRs reducing to this change?
>
I thought the recent sphinx regression was also because of this? But if I am
wrong, there may be none.
* [Bug middle-end/110586] [14 Regression] 10% fatigue2 regression on zen since r14-2369-g3a61ca1b925653 (bad LRA&scheduling)
From: hubicka at gcc dot gnu.org @ 2023-07-18 10:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110586
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[14 Regression] 10%         |[14 Regression] 10%
                   |fatigue2 regression on zen  |fatigue2 regression on zen
                   |since                       |since
                   |r14-2369-g3a61ca1b925653    |r14-2369-g3a61ca1b925653
                   |                            |(bad LRA&scheduling)
--- Comment #4 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Aha, sphinx3 is indeed caused by the same patch.
The patch corrects the profile here. It is the LRA/scheduler interaction that
causes the difference.
With older trunk I get:
 Performance counter stats for './b.out':
         28,536.75 msec task-clock:u              #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
               138      page-faults:u             #    4.836 /sec
   134,747,380,473      cycles:u                  #    4.722 GHz                      (83.33%)
       714,193,718      stalled-cycles-frontend:u #    0.53% frontend cycles idle     (83.33%)
         3,510,378      stalled-cycles-backend:u  #    0.00% backend cycles idle      (83.33%)
   243,176,910,654      instructions:u            #    1.80  insn per cycle
                                                  #    0.00  stalled cycles per insn  (83.33%)
    13,541,807,472      branches:u                #  474.539 M/sec                    (83.33%)
        13,829,858      branch-misses:u           #    0.10% of all branches          (83.33%)
      28.537620889 seconds time elapsed
      28.536941000 seconds user
       0.000000000 seconds sys
and with current trunk:
 Performance counter stats for './a.out':
          31933.51 msec task-clock:u              #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
               138      page-faults:u             #    4.321 /sec
      150448312691      cycles:u                  #    4.711 GHz                      (83.33%)
         760763745      stalled-cycles-frontend:u #    0.51% frontend cycles idle     (83.33%)
           1918238      stalled-cycles-backend:u  #    0.00% backend cycles idle      (83.33%)
      242823668283      instructions:u            #    1.61  insn per cycle
                                                  #    0.00  stalled cycles per insn  (83.34%)
       13541981288      branches:u                #  424.068 M/sec                    (83.34%)
          14583703      branch-misses:u           #    0.11% of all branches          (83.33%)
      31.933986770 seconds time elapsed
      31.933701000 seconds user
       0.000000000 seconds sys
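The two runs above can be compared by recomputing the derived ratios from the raw counters. The following is only a rough sketch: the dictionary keys and the helper names `ipc` and `ratio` are made up for illustration, and the numbers are copied by hand from the two `perf stat` outputs.

```python
# Raw counters copied from the two `perf stat` runs above
# (b.out = older trunk, a.out = current trunk).
before = {
    "task_clock_ms": 28_536.75,
    "cycles": 134_747_380_473,
    "instructions": 243_176_910_654,
    "branches": 13_541_807_472,
}
after = {
    "task_clock_ms": 31_933.51,
    "cycles": 150_448_312_691,
    "instructions": 242_823_668_283,
    "branches": 13_541_981_288,
}


def ipc(run):
    """Instructions retired per cycle for one run."""
    return run["instructions"] / run["cycles"]


def ratio(key):
    """How much a counter changed: after / before."""
    return after[key] / before[key]


print(f"IPC before: {ipc(before):.2f}, after: {ipc(after):.2f}")
for key in before:
    print(f"{key}: after/before = {ratio(key):.4f}")
```

Instruction and branch ratios close to 1.0 together with a cycle ratio well above 1.0 point at the same work being executed less efficiently rather than at extra work being done.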
So same instruction and branch count, but they execute slower. IPC goes down
from 1.8 to 1.6. Perf thinks the difference is
__perdida_m_MOD_generalized_hookes_law.constprop.0.
27.45% b.out b.out     [.] MAIN__
27.07% a.out a.out     [.] MAIN__
21.72% a.out a.out     [.] __perdida_m_MOD_generalized_hookes_law.constprop.0.
16.60% b.out b.out     [.] __perdida_m_MOD_generalized_hookes_law.constprop.0.
 2.22% a.out a.out     [.] __perdida_m_MOD_generalized_hookes_law.constprop.1.
 1.64% b.out b.out     [.] __perdida_m_MOD_generalized_hookes_law.constprop.1.
 1.55% b.out libc.so.6 [.] __memset_avx2_unaligned_erms
 1.54% a.out libc.so.6 [.] __memset_avx2_unaligned_erms
 0.06% a.out libm.so.6 [.] __sincos_fma
 0.04% b.out libm.so.6 [.] __sincos_fma
b.out is before the patch and a.out is after. The difference seems to be a
relocated load. Before patch:
Percent│ 0000000000401860 <__perdida_m_MOD_generalized_hookes_
       │ __perdida_m_MOD_generalized_hookes_law.constprop.0.is
  0.10 │ push %rbp
  0.02 │ mov %r8,%rax
       │ vmovddup %xmm0,%xmm5
       │ mov %rsp,%rbp
  1.22 │ push %r15
  0.04 │ push %r14
  0.03 │ push %r13
  0.09 │ push %r12
  0.05 │ push %rbx
  0.03 │ not %rax
  0.00 │ mov %rdi,%rbx
       │ and $0xffffffffffffffe0,%rsp
  1.11 │ mov %rdx,%r12
       │ sub $0x180,%rsp
  0.04 │ vmovapd %xmm5,0x20(%rsp)
         ^^^^ this load
  0.04 │ mov %rax,0x30(%rsp)
  0.02 │ test %rsi,%rsi
       │ ↓ je 210
       │ mov %rsi,%rax
       │ mov %rsi,%r13
  1.16 │ lea (%rsi,%rsi,1),%r10
  0.01 │ mov %rsi,%r15
       │ shl $0x4,%rax
  0.06 │ neg %r13
       │ lea (%r10,%rsi,1),%r14
  0.03 │ mov %rax,0x18(%rsp)
  0.02 │ lea 0x0(,%rsi,8),%rax
       │ mov %rax,0x10(%rsp)
  1.23 │ 66: mov $0x120,%edx
  0.01 │ xor %esi,%esi
       │ lea 0x60(%rsp),%rdi
  0.07 │ vmovsd %xmm1,0x38(%rsp)
  0.03 │ vmovsd %xmm0,0x40(%rsp)
  0.12 │ mov %r8,0x48(%rsp)
  0.05 │ mov %rcx,0x50(%rsp)
  0.06 │ sub %r12,%r13
  1.16 │ mov %r10,0x58(%rsp)
  0.04 │ → call memset@plt
After patch:
       │ 0000000000401870 <__perdida_m_MOD_generalized_hookes_
       │ __perdida_m_MOD_generalized_hookes_law.constprop.0.is
  0.07 │ push %rbp
  0.01 │ mov %r8,%rax
       │ vmovddup %xmm0,%xmm3
       │ mov %rsp,%rbp
  0.87 │ push %r15
  0.04 │ push %r14
  0.02 │ push %r13
  0.07 │ push %r12
  0.02 │ push %rbx
  0.02 │ not %rax
  0.00 │ mov %rdi,%rbx
       │ and $0xffffffffffffffe0,%rsp
  0.87 │ mov %rdx,%r12
       │ sub $0x180,%rsp
  0.04 │ mov %rax,0x58(%rsp)
  0.03 │ test %rsi,%rsi
       │ je 210
       │ mov %rsi,%rax
  0.00 │ mov %rsi,%r13
       │ lea (%rsi,%rsi,1),%r10
  0.95 │ mov %rsi,%r15
  0.01 │ shl $0x4,%rax
       │ neg %r13
  0.04 │ lea (%r10,%rsi,1),%r14
  0.04 │ mov %rax,0x18(%rsp)
  0.01 │ lea 0x0(,%rsi,8),%rax
       │ mov %rax,0x10(%rsp)
  0.02 │ 60: mov $0x120,%edx
  0.89 │ xor %esi,%esi
  0.01 │ lea 0x60(%rsp),%rdi
       │ vmovsd %xmm1,0x20(%rsp)
         ^^^^ is now here
  0.08 │ vmovsd %xmm0,0x28(%rsp)
  0.04 │ mov %r8,0x30(%rsp)
  0.01 │ mov %rcx,0x38(%rsp)
  0.05 │ sub %r12,%r13
       │ mov %r10,0x50(%rsp)
  1.04 │ vmovapd %xmm3,0x40(%rsp)
Later on, the scheduling also differs slightly. Before patch:
  0.12 │ vmovsd %xmm1,0x108(%rsp)
  1.22 │ vmovsd %xmm0,0x70(%rsp)
  0.38 │ vmovapd %xmm4,0xc0(%rsp)
  1.27 │ vmovsd %xmm0,0xa0(%rsp)
  0.20 │ vmovsd %xmm1,0x140(%rsp)
  2.41 │ vmovsd %xmm1,0x178(%rsp)
  2.05 │ vbroadcastsd 0x10(%rcx,%rax,8),%ymm1
  0.10 │ vunpcklpd %xmm0,%xmm2,%xmm3
       │ vmovsd %xmm2,0xd0(%rsp)
  0.34 │ vmovapd %xmm3,0x60(%rsp)
  2.25 │ vunpcklpd %xmm2,%xmm0,%xmm3
       │ vbroadcastsd -0x8(%rcx,%rdx,8),%ymm2
  0.01 │ vmovapd %xmm3,0x90(%rsp)
  0.28 │ vbroadcastsd (%rcx,%rdx,8),%ymm3
  0.01 │ vmulpd 0xc0(%rsp),%ymm3,%ymm3
 52.87 │ vmulpd 0xf0(%rsp),%ymm2,%ymm2
  0.06 │ vbroadcastsd (%rcx),%ymm0
       │ vfmadd132pd 0x90(%rsp),%ymm3,%ymm1
  1.77 │ vfmadd132pd 0x60(%rsp),%ymm2,%ymm0
  0.10 │ vmovddup 0x8(%rcx,%rax,8),%xmm2
       │ lea 0x0(%r13,%r12,2),%rax
After:
  0.28 │ vmovsd %xmm1,0x108(%rsp)
  0.98 │ vmovsd %xmm0,0x70(%rsp)
  0.04 │ vmovapd %xmm3,0xc0(%rsp)
  0.99 │ vmovsd %xmm0,0xa0(%rsp)
  0.26 │ vmovsd %xmm1,0x140(%rsp)
  1.80 │ vmovsd %xmm1,0x178(%rsp)
  0.91 │ vbroadcastsd (%rcx,%rdx,8),%ymm3
  0.08 │ vbroadcastsd 0x10(%rcx,%rax,8),%ymm1
  0.07 │ vunpcklpd %xmm0,%xmm2,%xmm4
  0.02 │ vmovsd %xmm2,0xd0(%rsp)
  0.93 │ vmulpd 0xc0(%rsp),%ymm3,%ymm3
 42.18 │ vmovapd %xmm4,0x60(%rsp)
       │ vunpcklpd %xmm2,%xmm0,%xmm4
       │ vbroadcastsd -0x8(%rcx,%rdx,8),%ymm2
       │ vmulpd 0xf0(%rsp),%ymm2,%ymm2
  0.09 │ vmovapd %xmm4,0x90(%rsp)
       │ vbroadcastsd (%rcx),%ymm0
       │ vfmadd132pd 0x90(%rsp),%ymm3,%ymm1
 23.48 │ vfmadd132pd 0x60(%rsp),%ymm2,%ymm0
  0.77 │ vmovddup 0x8(%rcx,%rax,8),%xmm2
       │ lea 0x0(%r13,%r12,2),%rax
Perdida is loopless with only 3 BBs in the optimized dump. With the old build we get:
<bb 2> [local count: 25581901]:
_60 = {ISRA.929_118(D), ISRA.929_118(D)};
offset.162_6 = ~ISRA.928_112(D);
if (ISRA.925_113(D) != 0)
goto <bb 3>; [50.00%]
else
goto <bb 4>; [50.00%]
<bb 3> [local count: 12790951]:
_226 = -ISRA.925_113(D);
_228 = ISRA.925_113(D) * 2;
_230 = ISRA.925_113(D) * 3;
_232 = ISRA.925_113(D) * 16;
_234 = (sizetype) _232;
_236 = ISRA.925_113(D) * 8;
_238 = (sizetype) _236;
<bb 4> [local count: 51163802]:
# iftmp.499_11 = PHI <ISRA.925_113(D)(3), 1(2)>
# prephitmp_227 = PHI <_226(3), -1(2)>
# prephitmp_229 = PHI <_228(3), 2(2)>
# prephitmp_231 = PHI <_230(3), 3(2)>
# prephitmp_235 = PHI <_234(3), 16(2)>
# prephitmp_239 = PHI <_238(3), 8(2)>
offset.166_13 = prephitmp_227 - ISRA.926_115(D);
generalized_constitutive_tensor = {};
_17 = .FMA (ISRA.930_119(D), 2.0e+0, ISRA.929_118(D));
_157 = {ISRA.929_118(D), _17};
_177 = {_17, ISRA.929_118(D)};
The count of BB4 should be the same as the count of BB2, but it is twice as much.
This originally comes from the vectorizer creating a vectorized epilogue that
never iterates but giving it a 50% chance of iterating.
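The inconsistency can be checked by hand with simple flow conservation: in an if-then diamond, everything that leaves bb 2 arrives in bb 4, either directly or through bb 3. The sketch below is plain arithmetic on the numbers from the dumps; it does not use GCC's own profile data structures, and the function names are hypothetical.

```python
# Counts and probabilities copied from the GIMPLE dumps in this comment.
BB2_COUNT = 25_581_901
P_THEN = 0.5  # both edges out of bb 2 are annotated [50.00%]


def expected_join_count(entry_count, p_then):
    """Count that should reach the join block of an if-then diamond.

    bb 3 falls through into bb 4, so bb 4 receives the flow of both
    edges leaving bb 2.
    """
    then_count = entry_count * p_then          # path 2 -> 3 -> 4
    skip_count = entry_count * (1.0 - p_then)  # path 2 -> 4
    return then_count + skip_count


def is_coherent(entry_count, join_count, p_then, tolerance=1):
    """True if the join block count matches the incoming flow.

    The tolerance of 1 allows for integer rounding of counts.
    """
    return abs(expected_join_count(entry_count, p_then) - join_count) <= tolerance


old_bb4 = 51_163_802  # dump from the old build
new_bb4 = 25_581_901  # dump after the patch

print("old build coherent:", is_coherent(BB2_COUNT, old_bb4, P_THEN))
print("after patch coherent:", is_coherent(BB2_COUNT, new_bb4, P_THEN))
```

With these inputs the old count for bb 4 is exactly twice the count of bb 2, while the patched dump satisfies the conservation check.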
After patch this is corrected:
<bb 2> [local count: 25581901]:
_60 = {ISRA.929_118(D), ISRA.929_118(D)};
offset.162_6 = ~ISRA.928_112(D);
if (ISRA.925_113(D) != 0)
goto <bb 3>; [50.00%]
else
goto <bb 4>; [50.00%]
<bb 3> [local count: 12790951]:
_226 = -ISRA.925_113(D);
_228 = ISRA.925_113(D) * 2;
_230 = ISRA.925_113(D) * 3;
_232 = ISRA.925_113(D) * 16;
_234 = (sizetype) _232;
_236 = ISRA.925_113(D) * 8;
_238 = (sizetype) _236;
<bb 4> [local count: 25581901]:
# iftmp.499_11 = PHI <ISRA.925_113(D)(3), 1(2)>
# prephitmp_227 = PHI <_226(3), -1(2)>
# prephitmp_229 = PHI <_228(3), 2(2)>
# prephitmp_231 = PHI <_230(3), 3(2)>
# prephitmp_235 = PHI <_234(3), 16(2)>
# prephitmp_239 = PHI <_238(3), 8(2)>
offset.166_13 = prephitmp_227 - ISRA.926_115(D);
generalized_constitutive_tensor = {};
_17 = .FMA (ISRA.930_119(D), 2.0e+0, ISRA.929_118(D));
_157 = {ISRA.929_118(D), _17};
_177 = {_17, ISRA.929_118(D)};
MEM <vector(2) real(kind=8)> [(real(kind=8)
*)&generalized_constitutive_tensor] = _177;
MEM <vector(2) real(kind=8)> [(real(kind=8)
*)&generalized_constitutive_tensor + 48B] = _157;
MEM <vector(2) real(kind=8)> [(real(kind=8)
*)&generalized_constitutive_tensor + 96B] = _60;
So it seems the RTL backend produces a worse schedule due to different memory
allocations.
The memset is a bit unfortunate here since it requires a lot of spilling. With
-minline-all-stringops I get before patch:
 Performance counter stats for './b.out':
         27,928.16 msec task-clock:u              #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
               138      page-faults:u             #    4.941 /sec
   133,992,554,723      cycles:u                  #    4.798 GHz                      (83.33%)
        17,113,198      stalled-cycles-frontend:u #    0.01% frontend cycles idle     (83.33%)
        10,144,634      stalled-cycles-backend:u  #    0.01% backend cycles idle      (83.33%)
   205,237,551,965      instructions:u            #    1.53  insn per cycle
                                                  #    0.00  stalled cycles per insn  (83.33%)
     7,665,052,125      branches:u                #  274.456 M/sec                    (83.34%)
        13,596,346      branch-misses:u           #    0.18% of all branches          (83.34%)
      27.933007797 seconds time elapsed
      27.928356000 seconds user
       0.000000000 seconds sys
and after patch:
          30791.26 msec task-clock:u              #    1.000 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
               138      page-faults:u             #    4.482 /sec
      148093969122      cycles:u                  #    4.810 GHz                      (83.33%)
          13660157      stalled-cycles-frontend:u #    0.01% frontend cycles idle     (83.33%)
            411233      stalled-cycles-backend:u  #    0.00% backend cycles idle      (83.33%)
      204951193376      instructions:u            #    1.38  insn per cycle
                                                  #    0.00  stalled cycles per insn  (83.33%)
        7664856101      branches:u                #  248.930 M/sec                    (83.33%)
          12960525      branch-misses:u           #    0.17% of all branches          (83.34%)
      30.791579163 seconds time elapsed
      30.791441000 seconds user
       0.000000000 seconds sys
So this may be
* [Bug middle-end/110586] [14 Regression] 10% fatigue2 regression on zen since r14-2369-g3a61ca1b925653 (bad LRA&scheduling)
From: law at gcc dot gnu.org @ 2024-03-07 23:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110586
Jeffrey A. Law <law at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
                 CC|                            |law at gcc dot gnu.org
* [Bug middle-end/110586] [14/15 Regression] 10% fatigue2 regression on zen since r14-2369-g3a61ca1b925653 (bad LRA&scheduling)
From: rguenth at gcc dot gnu.org @ 2024-05-07 7:41 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110586
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|14.0                        |14.2
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 14.1 is being released, retargeting bugs to GCC 14.2.
Thread overview: 9+ messages
2023-07-07 9:46 [Bug middle-end/110586] New: 10% fatigue2 regression on zen between g:8377cf1bf41a0a9d9d49de807b2341f0bf5d30cf and g:3a61ca1b9256535e1bfb19b2d46cde21f3908a5d hubicka at gcc dot gnu.org
2023-07-07 10:26 ` [Bug middle-end/110586] [13/14 Regression] " rguenth at gcc dot gnu.org
2023-07-15 19:26 ` [Bug middle-end/110586] [13/14 Regression] 10% fatigue2 regression on zen since r14-2369-g3a61ca1b925653 jamborm at gcc dot gnu.org
2023-07-15 19:37 ` [Bug middle-end/110586] [14 " pinskia at gcc dot gnu.org
2023-07-17 9:02 ` hubicka at gcc dot gnu.org
2023-07-17 10:02 ` jamborm at gcc dot gnu.org
2023-07-18 10:27 ` [Bug middle-end/110586] [14 Regression] 10% fatigue2 regression on zen since r14-2369-g3a61ca1b925653 (bad LRA&scheduling) hubicka at gcc dot gnu.org
2024-03-07 23:29 ` law at gcc dot gnu.org
2024-05-07 7:41 ` [Bug middle-end/110586] [14/15 " rguenth at gcc dot gnu.org