* [Bug middle-end/112697] [14 Regression] 30-40% exec time regression of 433.milc on zen2
2023-11-24 9:03 [Bug middle-end/112697] New: [14 Regression] 30-40% exec time regression of 433.milc on zen2 fkastl at suse dot cz
@ 2023-11-24 9:07 ` fkastl at suse dot cz
2023-11-24 9:51 ` rguenth at gcc dot gnu.org
` (10 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: fkastl at suse dot cz @ 2023-11-24 9:07 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697
--- Comment #1 from Filip Kastl <fkastl at suse dot cz> ---
Around the same time there was also a 6% slowdown with
-Ofast -march=native -g and PGO
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=299.70.0
and an 11% slowdown with
-O2 -g -flto=128 and PGO
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=685.70.0
From: rguenth at gcc dot gnu.org @ 2023-11-24 9:51 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |14.0
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
A jump this big has been seen in the past; I wonder if it's one of those
micro-arch hazards regarding alignment. The only "generic" change with
possible ripple-down effects is r14-4965-ga5e69e94591ae2.
From: jamborm at gcc dot gnu.org @ 2023-11-24 17:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697
Martin Jambor <jamborm at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jamborm at gcc dot gnu.org,
| |vmakarov at gcc dot gnu.org
--- Comment #3 from Martin Jambor <jamborm at gcc dot gnu.org> ---
I can reliably bisect this to r14-4972-g8aa47713701b1f (Vladimir's [RA]: Add
cost calculation for reg equivalence invariants) on a similar zen2 machine.
But it seems zen2-specific: I did not see any performance difference (this is
with generic march/tuning) on znver4, for example. So it may be quite hard to
analyze and fix, even though the regression is big :-/
From: sjames at gcc dot gnu.org @ 2023-11-24 17:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697
Sam James <sjames at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|[14 Regression] 30-40% exec |[14 Regression] 30-40% exec
|time regression of 433.milc |time regression of 433.milc
|on zen2 |on zen2 since
| |r14-4972-g8aa47713701b1f
--- Comment #4 from Sam James <sjames at gcc dot gnu.org> ---
I can probably find a znver2 machine for someone to work on if it's needed, but
that's obviously not going to be the hardest part here...
From: amonakov at gcc dot gnu.org @ 2023-11-27 11:58 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697
Alexander Monakov <amonakov at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |amonakov at gcc dot gnu.org
--- Comment #5 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Martin, if you still have the binaries, would you mind sharing perf profiles?
You can produce plain-text reports with 'perf report --stdio' and 'perf
annotate --stdio'.
From: jamborm at gcc dot gnu.org @ 2023-11-29 13:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697
--- Comment #6 from Martin Jambor <jamborm at gcc dot gnu.org> ---
Created attachment 56719
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56719&action=edit
Perf annotate of milc built with r14-4971-g0beb1611754742
commit r14-4971-g0beb1611754742:
$ perf stat taskset -c 0 specinvoke
Performance counter stats for 'taskset -c 0 specinvoke':
     216908.59 msec task-clock:u              #    1.000 CPUs utilized
             0      context-switches:u        #    0.000 /sec
             0      cpu-migrations:u          #    0.000 /sec
        889694      page-faults:u             #    4.102 K/sec
  697007650237      cycles:u                  #    3.213 GHz                      (83.33%)
   31999772966      stalled-cycles-frontend:u #    4.59% frontend cycles idle     (83.33%)
  540485725923      stalled-cycles-backend:u  #   77.54% backend cycles idle      (83.33%)
 1061256162815      instructions:u            #    1.52  insn per cycle
                                              #    0.51  stalled cycles per insn  (83.33%)
   58760648879      branches:u                #  270.901 M/sec                    (83.34%)
      11890202      branch-misses:u           #    0.02% of all branches          (83.33%)
216.935387643 seconds time elapsed
211.436079000 seconds user
5.472459000 seconds sys
$ perf record taskset -c 0 specinvoke
[ perf record: Woken up 132 times to write data ]
[ perf record: Captured and wrote 32.901 MB perf.data (862286 samples) ]
$ perf report -n --percent-limit=1 --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 862K of event 'cycles:Pu'
# Event count (approx.): 695776598661
#
# Overhead       Samples  Command          Shared Object           Symbol
# ........  ............  ...............  ......................  ......................................
#
    22.68%        197003  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_na
    20.99%        177912  milc_base.mine-  milc_base.mine-lto-gen  [.] u_shift_fermion
    19.04%        163787  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_nn
     6.85%         58509  milc_base.mine-  milc_base.mine-lto-gen  [.] scalar_mult_add_su3_matrix
     5.51%         50953  milc_base.mine-  milc_base.mine-lto-gen  [.] path_product
     5.40%         46083  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_an
     4.22%         35853  milc_base.mine-  milc_base.mine-lto-gen  [.] add_force_to_mom
     3.77%         32446  milc_base.mine-  milc_base.mine-lto-gen  [.] imp_gauge_force.constprop.0
     1.98%         16848  milc_base.mine-  milc_base.mine-lto-gen  [.] compute_gen_staple
     1.94%         16462  milc_base.mine-  milc_base.mine-lto-gen  [.] make_anti_hermitian
     1.73%         14655  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_mat_vec_sum_4dir
     1.35%         11472  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_adj_su3_mat_4vec
     1.27%         10801  milc_base.mine-  libc.so.6               [.] __memset_avx2_unaligned_erms
$ perf annotate -n --percent-limit=1 > ~/tmp/milc-perf-annotate-0beb1611754
(gzipped and attached)
From: jamborm at gcc dot gnu.org @ 2023-11-29 13:53 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697
--- Comment #7 from Martin Jambor <jamborm at gcc dot gnu.org> ---
Created attachment 56720
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56720&action=edit
Perf annotate of milc built with r14-4972-g8aa47713701b1f
commit r14-4972-g8aa47713701b1f:
$ perf stat taskset -c 0 specinvoke
Performance counter stats for 'taskset -c 0 specinvoke':
     272931.43 msec task-clock:u              #    1.000 CPUs utilized
             0      context-switches:u        #    0.000 /sec
             0      cpu-migrations:u          #    0.000 /sec
        472353      page-faults:u             #    1.731 K/sec
  886165387570      cycles:u                  #    3.247 GHz                      (83.33%)
   31546898034      stalled-cycles-frontend:u #    3.56% frontend cycles idle     (83.33%)
  729878095777      stalled-cycles-backend:u  #   82.36% backend cycles idle      (83.33%)
 1061779557370      instructions:u            #    1.20  insn per cycle
                                              #    0.69  stalled cycles per insn  (83.33%)
   58797121078      branches:u                #  215.428 M/sec                    (83.33%)
       6960852      branch-misses:u           #    0.01% of all branches          (83.33%)
272.967381843 seconds time elapsed
268.718335000 seconds user
4.212584000 seconds sys
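For reference, the two perf stat runs (this one and the r14-4971 run in comment #6) can be compared directly. The following is a minimal sketch, not part of the original report: the counter values are copied from the outputs above, while the dictionary layout and the helper name `ipc` are illustrative assumptions.

```python
# Counter values copied from the two `perf stat` outputs
# (comment #6: r14-4971, comment #7: r14-4972).
before = {
    "task_clock_ms": 216908.59,
    "cycles": 697_007_650_237,
    "instructions": 1_061_256_162_815,
    "stalled_backend": 540_485_725_923,
}
after = {
    "task_clock_ms": 272931.43,
    "cycles": 886_165_387_570,
    "instructions": 1_061_779_557_370,
    "stalled_backend": 729_878_095_777,
}


def ipc(run):
    """Instructions retired per cycle for one run (hypothetical helper)."""
    return run["instructions"] / run["cycles"]


time_ratio = after["task_clock_ms"] / before["task_clock_ms"]
cycle_ratio = after["cycles"] / before["cycles"]
insn_ratio = after["instructions"] / before["instructions"]
backend_before = before["stalled_backend"] / before["cycles"]
backend_after = after["stalled_backend"] / after["cycles"]

print(f"task-clock ratio:   {time_ratio:.3f}")
print(f"cycles ratio:       {cycle_ratio:.3f}")
print(f"instructions ratio: {insn_ratio:.4f}")
print(f"IPC:                {ipc(before):.2f} -> {ipc(after):.2f}")
print(f"backend stalls:     {backend_before:.1%} -> {backend_after:.1%}")
```

The instruction counts are essentially identical while cycles grow by roughly a quarter, so on these numbers the slowdown shows up as lower IPC and more backend stall cycles rather than as extra instructions executed.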
$ perf record taskset -c 0 specinvoke
[ perf record: Woken up 167 times to write data ]
[ perf record: Captured and wrote 41.549 MB perf.data (1088982 samples) ]
$ perf report -n --percent-limit=1 --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1M of event 'cycles:Pu'
# Event count (approx.): 883903400858
#
# Overhead       Samples  Command          Shared Object           Symbol
# ........  ............  ...............  ......................  ......................................
#
    24.34%        260907  milc_base.mine-  milc_base.mine-lto-gen  [.] add_force_to_mom
    18.01%        198287  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_na
    17.45%        187529  milc_base.mine-  milc_base.mine-lto-gen  [.] u_shift_fermion
    14.22%        155596  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_nn
     5.61%         60601  milc_base.mine-  milc_base.mine-lto-gen  [.] scalar_mult_add_su3_matrix
     4.35%         51034  milc_base.mine-  milc_base.mine-lto-gen  [.] path_product
     4.24%         46032  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_an
     2.99%         32624  milc_base.mine-  milc_base.mine-lto-gen  [.] imp_gauge_force.constprop.0
     1.50%         16242  milc_base.mine-  milc_base.mine-lto-gen  [.] compute_gen_staple
     1.35%         14580  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_su3_mat_vec_sum_4dir
     1.21%         12922  milc_base.mine-  milc_base.mine-lto-gen  [.] make_anti_hermitian
     1.06%         11469  milc_base.mine-  milc_base.mine-lto-gen  [.] mult_adj_su3_mat_4vec
     1.03%         11111  milc_base.mine-  libc.so.6               [.] __memset_avx2_unaligned_erms
$ perf annotate -n --percent-limit=1 > ~/tmp/milc-perf-annotate-8aa47713701
(gzipped and attached)
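The per-symbol sample counts in the two perf reports (comments #6 and #7) show where the extra time goes. This is a minimal sketch, not part of the original report: the sample counts are copied from the reports above, the variable names are illustrative, and the comparison assumes both `perf record` runs used the same sampling rate, which the thread does not state explicitly.

```python
# Sample counts per symbol, copied from the `perf report` tables
# (before: r14-4971, after: r14-4972). Only symbols above the 1% limit appear.
before = {
    "mult_su3_na": 197003,
    "u_shift_fermion": 177912,
    "mult_su3_nn": 163787,
    "scalar_mult_add_su3_matrix": 58509,
    "path_product": 50953,
    "mult_su3_an": 46083,
    "add_force_to_mom": 35853,
    "imp_gauge_force.constprop.0": 32446,
    "compute_gen_staple": 16848,
    "make_anti_hermitian": 16462,
    "mult_su3_mat_vec_sum_4dir": 14655,
    "mult_adj_su3_mat_4vec": 11472,
    "__memset_avx2_unaligned_erms": 10801,
}
after = {
    "add_force_to_mom": 260907,
    "mult_su3_na": 198287,
    "u_shift_fermion": 187529,
    "mult_su3_nn": 155596,
    "scalar_mult_add_su3_matrix": 60601,
    "path_product": 51034,
    "mult_su3_an": 46032,
    "imp_gauge_force.constprop.0": 32624,
    "compute_gen_staple": 16242,
    "mult_su3_mat_vec_sum_4dir": 14580,
    "make_anti_hermitian": 12922,
    "mult_adj_su3_mat_4vec": 11469,
    "__memset_avx2_unaligned_erms": 11111,
}

# Total sample counts reported by `perf record` for each run.
total_before = 862286
total_after = 1088982

# Change in samples for every symbol, and for the run as a whole.
delta = {sym: after[sym] - before[sym] for sym in before}
total_delta = total_after - total_before

# Symbol with the largest increase and its share of the net increase.
worst = max(delta, key=delta.get)
share = delta[worst] / total_delta

for sym, d in sorted(delta.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{sym:30s} {before[sym]:>7d} -> {after[sym]:>7d} ({d:+d})")
print(f"net increase: {total_delta} samples; {worst} accounts for {share:.1%}")
```

On these figures `add_force_to_mom` grows about sevenfold and accounts for nearly all of the net increase in samples, which matches the function targeted by the assembly experiments later in the thread.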
From: amonakov at gcc dot gnu.org @ 2023-12-01 15:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697
--- Comment #8 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Thanks, I can reproduce it. It is pretty tricky though. For instance, just
swapping the mov and the compare is enough to make it fast:
--- d.out.ltrans0.ltrans.slow.s 2023-12-01 18:32:54.255841611 +0300
+++ d.out.ltrans0.ltrans.fast.s 2023-12-01 18:32:20.318668991 +0300
@@ -743,8 +743,8 @@ add_force_to_mom:
         .p2align 4,,10
         .p2align 3
 .L58:
-        cmpb    $1, -680(%r11,%r12)
         movapd  %xmm5, %xmm7
+        cmpb    $1, -680(%r11,%r12)
         jne     .L54
         xorpd   %xmm6, %xmm7
 .L54:
From: amonakov at gcc dot gnu.org @ 2023-12-01 16:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697
--- Comment #9 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
... as does inserting a nop before the compare ¯\_(ツ)_/¯
--- d.out.ltrans0.ltrans.slow.s 2023-12-01 18:32:54.255841611 +0300
+++ d.out.ltrans0.ltrans.s 2023-12-01 18:53:04.909438690 +0300
@@ -743,6 +743,7 @@ add_force_to_mom:
         .p2align 4,,10
         .p2align 3
 .L58:
+        nop
         cmpb    $1, -680(%r11,%r12)
         movapd  %xmm5, %xmm7
         jne     .L54
From: law at gcc dot gnu.org @ 2024-03-07 21:01 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697
Jeffrey A. Law <law at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P2
CC| |law at gcc dot gnu.org
From: pheeck at gcc dot gnu.org @ 2024-03-21 9:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697
Filip Kastl <pheeck at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Keywords|needs-bisection |
--- Comment #10 from Filip Kastl <pheeck at gcc dot gnu.org> ---
I see that the benchmark's exec time has returned to its original value. If
there are no objections, I'll mark this bug as fixed.
From: law at gcc dot gnu.org @ 2024-03-22 12:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112697
Jeffrey A. Law <law at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|UNCONFIRMED |RESOLVED
--- Comment #11 from Jeffrey A. Law <law at gcc dot gnu.org> ---
Per c#10.