* [Bug target/110649] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)
2023-07-12 22:35 [Bug target/110649] New: 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23) hubicka at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2023-07-12 22:39 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649
--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
3 profile changes:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=3a61ca1b9256535e1bfb19b2d46cde21f3908a5d
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d4c2e34deef8cbd81ba2ef3389fdbaf95c70e225
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=2e406f0753e8d78d320437189211e3094c33b7e4
1 vectorizer change:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=224fd59b2dc8a5fa78a309a09863afe9b3cf2111
^ permalink raw reply [flat|nested] 19+ messages in thread
* [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)
From: pinskia at gcc dot gnu.org @ 2023-07-12 22:39 UTC (permalink / raw)
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |14.0
Keywords| |missed-optimization
Target| |x86_64-linux-gnu
Version|13.1.0 |14.0
Summary|25% sphinx3 spec2006 |[14 Regression] 25% sphinx3
|regression on Ice Lake and |spec2006 regression on Ice
|zen between |Lake and zen between
|g:acaa441a98bebc52 |g:acaa441a98bebc52
|(2023-07-06 11:36) and |(2023-07-06 11:36) and
|g:55900189ab517906 |g:55900189ab517906
|(2023-07-07 00:23) |(2023-07-07 00:23)
From: hliu at amperecomputing dot com @ 2023-07-14 7:16 UTC (permalink / raw)
--- Comment #2 from Hao Liu <hliu at amperecomputing dot com> ---
Hi, I bisected the following 3 commits (sequential):
[v3] 3a61ca1b925 - Improve profile updates after loop-ch and cunroll
(2023-07-06) <Jan Hubicka>
[v2] d4c2e34deef - Improve scale_loop_profile (2023-07-06) <Jan Hubicka>
[v1] 224fd59b2dc - Vect: use a small step to calculate induction for the
unrolled loop (PR tree-optimization/110449) (2023-07-06) <Hao Liu OS>
Tested the time in seconds of the 1-copy performance run of 482.sphinx3 on zen2:
v3: 261s
v2: 231s
v1: 231s
So the regression should be caused by 3a61ca1b925, i.e.
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=3a61ca1b9256535e1bfb19b2d46cde21f3908a5d
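For scale, the zen2 timings above correspond to roughly a 13% slowdown (261 s / 231 s ≈ 1.13). Below is a minimal C sketch of that arithmetic. The helper name relative_slowdown is hypothetical; it is not part of SPEC or GCC tooling.

```c
#include <assert.h>

/* Hypothetical helper: relative slowdown of a run taking after_s seconds
   compared with a baseline taking before_s seconds (0.13 means 13% slower). */
double relative_slowdown(double before_s, double after_s)
{
    assert(before_s > 0.0);
    return after_s / before_s - 1.0;
}
```

relative_slowdown(231.0, 261.0) evaluates to about 0.13 for the v2 and v3 timings.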
From: hubicka at gcc dot gnu.org @ 2023-07-14 14:39 UTC (permalink / raw)
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
See Also| |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10647
Ever confirmed|0 |1
Last reconfirmed| |2023-07-14
Status|UNCONFIRMED |NEW
--- Comment #3 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Thanks for bisecting this! We also have PR10647 which is tracked to this
change.
The change corrects the loop profile after header copying. The loop in
void test2 (void);
void test (void)
{
  for (int i = 0; i < 10; i++)
    test2 ();
}
has an exit conditional that loops back with probability 90.9% (10/11) before
loop header copying, since the test is technically evaluated 11 times and
branches back 10 times. After loop header copying and optimizing out the
constant "if (0<10)" test, it has only a 90% (9/10) loopback probability.
So fixing the bug above probably triggers something else.
I will first look at PR10647 and see if I can figure out what is going on
there.
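The loopback probabilities above follow from counting how often the exit test runs. Below is a minimal C sketch that assumes a single-exit loop and uses hypothetical helper names; it is not GCC's profile code.

Before header copying, the test is evaluated 11 times per entry and branches back 10 times. After header copying and folding the constant entry test, it is evaluated 10 times and branches back 9 times.

```c
#include <assert.h>

/* Hypothetical helper: probability that a single-exit loop's exit test
   branches back into the loop, given how many times the test is evaluated
   per loop entry. The test leaves the loop exactly once per entry. */
double loopback_probability(unsigned evaluations)
{
    assert(evaluations >= 1);
    return (double)(evaluations - 1) / (double)evaluations;
}

/* Hypothetical helper: average header executions per loop entry for
   loopback probability p, i.e. 1 + p + p^2 + ... = 1 / (1 - p). */
double expected_header_runs(double p)
{
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}
```

loopback_probability(11) gives about 0.909 and loopback_probability(10) gives 0.9. Passing those to expected_header_runs gives 11 and 10 header executions per entry, up to floating-point rounding.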
From: hubicka at gcc dot gnu.org @ 2023-07-14 14:41 UTC (permalink / raw)
--- Comment #4 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
We also have PR98782 that is about sphinx being sensitive to LRA decisions.
Reducing loopback probability might trigger LRA adding a spill to the loop.
From: hubicka at gcc dot gnu.org @ 2023-07-16 16:25 UTC (permalink / raw)
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
See Also|https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10647 |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110647
--- Comment #5 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
In comment 3 I got the PR number wrong. It is PR110647.
From: hubicka at gcc dot gnu.org @ 2023-07-16 17:39 UTC (permalink / raw)
--- Comment #6 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
I tried zen3 with -march=native -Ofast:
Samples: 1M of event 'cycles:u', Event count (approx.): 2309002237334, DSO: s
Overhead Command Symbol
42.51% sphinx_livepret [.] mgau_eval
24.36% sphinx_livepret [.] vector_gautbl_eval_logs3
 6.81% sphinx_livepret [.] subvq_mgau_shortlist
 6.43% sphinx_livepret [.] logs3_add
 4.91% sphinx_livepret [.] approx_cont_mgau_frame_eval
 4.32% sphinx_livepret [.] mdef_sseq2sen_active
 2.62% sphinx_livepret [.] dict2pid_comsenscr
 1.50% sphinx_livepret [.] hmm_vit_eval_3st
 0.84% sphinx_livepret [.] lextree_hmm_eval
 0.67% sphinx_livepret [.] lextree_hmm_propagate
 0.64% sphinx_livepret [.] lextree_enter
 0.61% sphinx_livepret [.] fe_fft
 0.45% sphinx_livepret [.] dict2pid_comsseq2sen_active
 0.32% sphinx_livepret [.] lextree_ssid_active
 0.18% sphinx_livepret [.] vithist_rescore
 0.14% sphinx_livepret [.] utt_decode_block
 0.12% sphinx_livepret [.] fe_mel_cep
Prior to vectorization there is no invalid profile in mgau_eval.
The loop is:
for (c = 0; c < mgau->n_comp-1; c += 2) {   /* Interleave 2 components for speed */
    m1 = mgau->mean[c];
    m2 = mgau->mean[c+1];
    v1 = mgau->var[c];
    v2 = mgau->var[c+1];
    dval1 = mgau->lrd[c];
    dval2 = mgau->lrd[c+1];
    for (i = 0; i < veclen; i++) {
        diff1 = x[i] - m1[i];
        dval1 -= diff1 * diff1 * v1[i];
        diff2 = x[i] - m2[i];
        dval2 -= diff2 * diff2 * v2[i];
        /* E_INFO("x %10f m1 %10f m2 %10f v1 %10f, v2 %10f\n",x[i],m1[i],m2[i],v1[i],v2[i]);
           E_INFO("diff1 %10f,dval1 %10f, diff2 %10f, dval2 %10f\n",diff1,dval1,diff2,dval2);*/
    }
    if (dval1 < g->distfloor)   /* Floor */
        dval1 = g->distfloor;
    if (dval2 < g->distfloor)
        dval2 = g->distfloor;
    score = logs3_add (score, (int32)(f * dval1) + mgau->mixw[c]);
    score = logs3_add (score, (int32)(f * dval2) + mgau->mixw[c+1]);
}
and the inner loop iterates 47 times on average. The vectorizer has a profitability
threshold of 8 and vectorizes with 32-byte vectors.
The epilogue has a threshold of 4 and is vectorized with 16-byte vectors.
There is a second, similar loop nest in the function:
for (j = 0; active[j] >= 0; j++) {
#ifdef SPEC_CPU
    considered++;
#endif
    c = active[j];
    m1 = mgau->mean[c];
    v1 = mgau->var[c];
    dval1 = mgau->lrd[c];
    for (i = 0; i < veclen; i++) {
        diff1 = x[i] - m1[i];
        dval1 -= diff1 * diff1 * v1[i];
    }
    if (dval1 < g->distfloor)
        dval1 = g->distfloor;
    score = logs3_add (score, (int32)(f * dval1) + mgau->mixw[c]);
}
which is executed 10% of the time and is also vectorized twice.
We then believe that the inner loop iterates 5 times (I would expect 47/4
times).
In the cunroll pass we then see:
Loop 4 iterates at most 2147483647 times.
Loop 4 likely iterates at most 2147483647 times.
Not unrolling loop 4 (--param max-completely-peel-times limit reached).
This is the outer loop.
Loop 7 iterates at most 2 times.
Loop 7 likely iterates at most 2 times.
Loop size: 22
Estimated size after unrolling: 42
cont_mgau.c:604:20: optimized: loop with 2 iterations completely unrolled (header execution count 1065258)
This is the scalar epilogue loop.
Loop 6 iterates at most 0 times.
Loop 6 likely iterates at most 0 times.
cont_mgau.c:575:7: optimized: loop turned into non-loop; it never loops
This is the vectorized epilogue loop (really a non-loop).
So this looks OK, but it introduced one mismatch in the profile. Before the pass we
had:
;; basic block 14, loop depth 2, count 171249098 (guessed, freq 23.9461), maybe hot
;; prev block 51, next block 66, flags: (NEW, VISITED)
;; pred: 24 [always] count:142707582 (guessed, freq 19.9550) (FALLTHRU,DFS_BACK,EXECUTABLE)
;;       51 [always] count:28541516 (guessed, freq 3.9910) (FALLTHRU,EXECUTABLE)
and now we get:
;; basic block 14, loop depth 2, count 13764235 (guessed, freq 1.9247), maybe hot
;; Invalid sum of incoming counts 25234431 (guessed, freq 3.5286), should be 13764235 (guessed, freq 1.9247)
;; prev block 83, next block 66, flags: (NEW, VISITED)
;; pred: 24 [always] count:11470196 (guessed, freq 1.6039) (FALLTHRU,DFS_BACK,EXECUTABLE)
;;       83 [always] count:13764235 (guessed, freq 1.9247) (FALLTHRU,EXECUTABLE)
This does look wrong: the loop was not unrolled, yet its profile was reduced
significantly.
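The "Invalid sum of incoming counts" note is a flow-conservation check: a block's count should equal the sum of the counts on its incoming edges. Below is a minimal C sketch using a hypothetical bb_profile struct with plain integers (not GCC's profile_count type), applied to basic block 14 before and after cunroll.

- Before: 142707582 + 28541516 = 171249098, which matches the block count.
- After: 11470196 + 13764235 = 25234431, which does not match 13764235.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one basic block's profile: its own
   execution count plus the counts on its incoming edges. */
struct bb_profile {
    int64_t count;
    const int64_t *in_counts;
    size_t n_in;
};

/* Total flow entering the block. */
int64_t incoming_sum(const struct bb_profile *bb)
{
    int64_t sum = 0;
    for (size_t i = 0; i < bb->n_in; i++)
        sum += bb->in_counts[i];
    return sum;
}

/* Flow conservation: what enters the block must equal how often it runs. */
bool incoming_consistent(const struct bb_profile *bb)
{
    return incoming_sum(bb) == bb->count;
}
```

The sketch compares exactly. Guessed profiles involve rounding, so a practical check would likely allow some tolerance.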
I also noticed that in another (not hot) function we get the following BB with
nonsensical exit edges:
;; basic block 74, loop depth 3, count 258660 (guessed, freq 258660.0000), maybe hot
;; Invalid sum of outgoing probabilities 120.0%
;; prev block 155, next block 175, flags: (NEW, REACHABLE, VISITED)
;; pred: 97 [always] count:215550 (guessed, freq 215550.0000) (FALLTHRU,DFS_BACK,EXECUTABLE)
;;       155 [always] count:43110 (guessed, freq 43110.0000) (FALLTHRU,EXECUTABLE)
# i_212 = PHI <i_232(97), 0(155)>
# n_94 = PHI <_453(97), n_244(155)>
# vect_n_94.158_583 = PHI <vect__453.169_601(97), { 0, 0, 0, 0, 0, 0, 0, 0 }(155)>
# vectp.159_584 = PHI <vectp.159_585(97), _222(155)>
# vectp.165_594 = PHI <vectp.165_595(97), _222(155)>
# ivtmp_612 = PHI <ivtmp_613(97), 0(155)>
# DEBUG BEGIN_STMT
_224 = (long unsigned int) i_212;
_225 = _224 * 4;
_226 = _222 + _225;
vect__227.161_586 = MEM <vector(8) float> [(float32 *)vectp.159_584];
_227 = *_226;
vect__228.162_587 = [vec_unpack_lo_expr] vect__227.161_586;
vect__228.162_588 = [vec_unpack_hi_expr] vect__227.161_586;
_228 = (double) _227;
mask__470.163_590 = vect_cst__589 > vect__228.162_587;
mask__470.163_591 = vect_cst__589 > vect__228.162_588;
_470 = varfloor_23(D) > _228;
# DEBUG BEGIN_STMT
mask_patt_538.164_592 = VEC_PACK_TRUNC_EXPR <mask__470.163_590, mask__470.163_591>;
if (mask_patt_538.164_592 == { 0, 0, 0, 0, 0, 0, 0, 0 })
goto <bb 174>; [100.00%]
else
goto <bb 175>; [20.00%]
The edge to 174 seems just wrong:
;; basic block 174, loop depth 3, count 258660 (guessed, freq 258660.0000), maybe hot
;; Invalid sum of incoming counts 310392 (guessed, freq 310392.0000), should be 258660 (guessed, freq 258660.0000)
;; prev block 175, next block 97, flags: (NEW, VISITED)
;; pred: 74 [always] count:258660 (guessed, freq 258660.0000) (TRUE_VALUE,EXECUTABLE)
;;       175 [always] count:51732 (guessed, freq 51732.0000) (FALLTHRU,EXECUTABLE)
# DEBUG BEGIN_STMT
#
So if the probability was 80% it would be almost right.
This problem repeats twice.
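Below is a quick arithmetic check of the numbers above, as a C sketch using integer percentages and hypothetical helper names (not GCC's profile_probability API).

- As dumped, the two edges out of bb 74 sum to 100% + 20% = 120%. bb 174 then receives 258660 + 51732 = 310392, which is the invalid sum reported.
- With 80% on the edge to 174, bb 174 would receive 206928 + 51732 = 258660, exactly its own count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sum of the outgoing branch probabilities of a
   two-way branch, in percent; a consistent profile gives exactly 100. */
int outgoing_percent_sum(int p_true, int p_false)
{
    return p_true + p_false;
}

/* Hypothetical helper: count carried by an edge taken with probability
   `percent` out of a block executed `bb_count` times (truncating). */
int64_t edge_count(int64_t bb_count, int percent)
{
    assert(percent >= 0 && percent <= 100);
    return bb_count * percent / 100;
}
```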
From: hubicka at gcc dot gnu.org @ 2023-07-16 20:06 UTC (permalink / raw)
--- Comment #7 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
I found why the vectorizer gets the profile scale of the vectorized epilogue
wrong. It is scale_profile_for_vect_loop, which uses niter_for_unrolled_loop,
which does not understand that if the iteration count is not divisible by the
vectorization factor, the epilogue (unless the loop is masked) will take the
remaining iterations.
The upper bound computation in the update of loop_info is actually right, so we
can just use it directly instead of relying on niter_for_unrolled_loop.
The wrong profile in:
;; basic block 14, loop depth 2, count 13764235 (guessed, freq 1.9247), maybe hot
;; Invalid sum of incoming counts 25234431 (guessed, freq 3.5286), should be 13764235 (guessed, freq 1.9247)
is caused by loop peeling. The unrolled loop is peeled 4 times, which seems
like a reasonable idea, but I am not sure why the profile is not updated
correctly here.
From: hubicka at gcc dot gnu.org @ 2023-07-16 20:14 UTC (permalink / raw)
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
See Also| |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110692
--- Comment #8 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
So in mgau_eval the inner loop is vectorized and peeled, and the epilogues are
vectorized and fully unrolled. The resulting code seems a bit more complicated
than it needs to be.
I do not think the problems in profile updates are very important, nor that
they should affect overall performance much.
vector_gautbl_eval_logs3 seems similar, but there we run out of registers, so
the profile may be more relevant there.
I added an oversimplified example of this pattern to PR110692. I think we could
get better overall codegen...
From: cvs-commit at gcc dot gnu.org @ 2023-07-16 21:54 UTC (permalink / raw)
--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:
https://gcc.gnu.org/g:1d203d4c90adb064edfa9680768d1f83a41f17e0
commit r14-2544-g1d203d4c90adb064edfa9680768d1f83a41f17e0
Author: Jan Hubicka <jh@suse.cz>
Date: Sun Jul 16 23:53:56 2023 +0200
Avoid double profile udpate in try_peel_loop
try_peel_loop uses gimple_duplicate_loop_body_to_header_edge, which
subtracts the profile from the original loop. However, it then tries to
scale the profile in a wrong way (it forces the header count to be the entry count).
This eliminates two profile misupdates in the internal loop of sphinx3.
gcc/ChangeLog:
PR middle-end/110649
* tree-ssa-loop-ivcanon.cc (try_peel_loop): Avoid double profile
update.
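Consider an idealized single-exit loop with entry count E and constant loopback probability p. Under that model:

- The header runs E / (1 - p) times.
- After peeling k iterations, peeled copy i runs E * p^i times.
- The remaining loop is entered E * p^k times.

A single update therefore conserves the total header count, and applying a second rescaling on top of it would break this balance. Below is a minimal C sketch of the model with hypothetical helper names; it is not the try_peel_loop implementation.

```c
#include <assert.h>

/* Idealized single-exit loop: entered `entry` times, latch branches back
   with probability p. All helper names are hypothetical, not GCC functions. */
static double power(double p, int k)
{
    double r = 1.0;
    for (int i = 0; i < k; i++)
        r *= p;
    return r;
}

/* Header executions of the unpeeled loop: entry * (1 + p + p^2 + ...). */
double header_count(double entry, double p)
{
    assert(p >= 0.0 && p < 1.0);
    return entry / (1.0 - p);
}

/* Count of peeled copy i (0-based): it runs only if i earlier iterations
   looped back. */
double peeled_copy_count(double entry, double p, int i)
{
    return entry * power(p, i);
}

/* Header count of the loop left after peeling k iterations. */
double remaining_header_count(double entry, double p, int k)
{
    return header_count(entry * power(p, k), p);
}
```

For E = 1000, p = 0.9 and k = 4:

- The peeled copies get 1000, 900, 810 and 729.
- The remaining loop header gets 6561.
- The total is the original 10000, up to rounding.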
From: cvs-commit at gcc dot gnu.org @ 2023-07-16 21:55 UTC (permalink / raw)
--- Comment #10 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:
https://gcc.gnu.org/g:c62791fa413a49fc6476ce186b324250f8ae6d40
commit r14-2545-gc62791fa413a49fc6476ce186b324250f8ae6d40
Author: Jan Hubicka <jh@suse.cz>
Date: Sun Jul 16 23:55:14 2023 +0200
Fix optimize_mask_stores profile update
While looking into sphinx3 regression I noticed that vectorizer produces
BBs with overall probability count 120%. This patch fixes it.
Richi, I don't know how to create a testcase, but having one would
be nice.
Bootstrapped/regtested x86_64-linux, will commit it shortly.
gcc/ChangeLog:
PR tree-optimization/110649
* tree-vect-loop.cc (optimize_mask_stores): Set correctly
probability of the if-then-else construct.
From: cvs-commit at gcc dot gnu.org @ 2023-07-16 21:57 UTC (permalink / raw)
--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:
https://gcc.gnu.org/g:061f74c06735e1fa35b910ae0bcf01b61a74ec23
commit r14-2546-g061f74c06735e1fa35b910ae0bcf01b61a74ec23
Author: Jan Hubicka <jh@suse.cz>
Date: Sun Jul 16 23:56:59 2023 +0200
Fix profile update in scale_profile_for_vect_loop
When vectorizing 4 times, we sometimes do
for
<4x vectorized body>
for
<2x vectorized body>
for
<1x vectorized body>
Here the last two fors, handling the epilogue, never iterate.
Currently the vectorizer thinks that the middle for iterates twice.
This turns out to be scale_profile_for_vect_loop, which uses
niter_for_unrolled_loop.
At that time we know the epilogue will iterate at most 2 times,
but niter_for_unrolled_loop does not know that the last iteration
will be taken by the epilogue-of-epilogue, and thus it thinks
that the loop may iterate once and exit in the middle of the second
iteration.
We already do a correct job updating niter bounds and this is
just an ordering issue. This patch makes us first update
the bounds and then update the loop. I re-implemented
the function more correctly and precisely.
The loop reducing the iteration factor for overly flat profiles is a bit
funny, but the only other method I can think of is to compute an sreal scale,
which would have similar overhead, I think.
Bootstrapped/regtested x86_64-linux, will commit it shortly.
gcc/ChangeLog:
PR middle-end/110649
* tree-vect-loop.cc (scale_profile_for_vect_loop): Rewrite.
(vect_transform_loop): Move scale_profile_for_vect_loop after
upper bound updates.
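Below is a minimal C sketch of how n scalar iterations are split among the three loops above. It assumes no masking and vectorization factors of 4 and 2 as in the commit message. The struct and function names are hypothetical, not GCC's. Since the remainder after the main loop is below 4, the vectorized epilogue body runs at most once, and so does the scalar (1x) epilogue-of-epilogue.

```c
#include <assert.h>

/* Hypothetical model: iterations of a main vector loop (vf_main elements per
   iteration), a vectorized epilogue (vf_epi elements per iteration) and a
   scalar epilogue-of-epilogue. Assumes no masking and vf_epi | vf_main. */
struct vect_split {
    unsigned main_iters;
    unsigned epi_iters;
    unsigned scalar_iters;
};

struct vect_split split_iterations(unsigned n, unsigned vf_main, unsigned vf_epi)
{
    assert(vf_main > 0 && vf_epi > 0 && vf_main % vf_epi == 0);
    struct vect_split s;
    unsigned rest = n % vf_main;      /* always < vf_main */
    s.main_iters = n / vf_main;
    s.epi_iters = rest / vf_epi;      /* < vf_main / vf_epi */
    s.scalar_iters = rest % vf_epi;   /* < vf_epi */
    return s;
}
```

As an example input, take 47, the average trip count reported for mgau_eval's inner loop in comment 6. split_iterations(47, 4, 2) gives 11 main-loop iterations, 1 vectorized-epilogue iteration and 1 scalar iteration.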
From: roger at nextmovesoftware dot com @ 2023-07-17 10:48 UTC (permalink / raw)
Roger Sayle <roger at nextmovesoftware dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |roger at nextmovesoftware dot com
--- Comment #12 from Roger Sayle <roger at nextmovesoftware dot com> ---
Hi Jan,
I believe you also need to remove the
profile_count entry_count = profile_count::zero ();
from tree-ssa-loop-ivcanon.cc's try_peel_loop to avoid a
bootstrap issue with -Werror "variable entry_count set but unused".
From: jamborm at gcc dot gnu.org @ 2023-07-17 11:53 UTC (permalink / raw)
Martin Jambor <jamborm at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jamborm at gcc dot gnu.org
--- Comment #13 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Roger Sayle from comment #12)
> Hi Jan,
> I believe you also need to remove the
> profile_count entry_count = profile_count::zero ();
> from tree-ssa-loop-ivcanon.cc's try_peel_loop to avoid a
> bootstrap issue with -Werror "variable entry_count set but unused".
I'm bootstrapping that change right now.
From: hubicka at gcc dot gnu.org @ 2023-07-18 14:49 UTC (permalink / raw)
--- Comment #14 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Chasing profile update bugs out of the hottest two functions did not solve the
regression. Moreover the weekly testers confirm it was not noise on zens
either.
Before the change we get:
34.58% sphinx_livepret [.] mgau_eval
26.61% sphinx_livepret [.] vector_gautbl_eval_logs3
 8.94% sphinx_livepret [.] subvq_mgau_shortlist
 7.36% sphinx_livepret [.] logs3_add
 5.66% sphinx_livepret [.] approx_cont_mgau_frame_eval
 4.68% sphinx_livepret [.] mdef_sseq2sen_active
 3.38% sphinx_livepret [.] dict2pid_comsenscr
 1.66% sphinx_livepret [.] hmm_vit_eval_3st
 0.90% sphinx_livepret [.] lextree_hmm_eval
 0.73% sphinx_livepret [.] lextree_hmm_propagate
 0.71% sphinx_livepret [.] lextree_enter
 0.68% sphinx_livepret [.] fe_fft
 0.49% sphinx_livepret [.] dict2pid_comsseq2sen_active
 0.35% sphinx_livepret [.] lextree_ssid_active
 0.20% sphinx_livepret [.] vithist_rescore
So the difference seems to be in mgau_eval.
Both versions of mgau_eval have almost the same code layout. The main difference
is register allocation. In the old version we do more spilling around the call:
0.01 │ and $0xffffffffffffffe0,%rsp
0.14 │ mov %rcx,%rbx
0.00 │ sub $0xa0,%rsp
0.04 │ mov 0x10(%rdi),%rax
0.13 │ mov 0x8(%rdi),%r15d
0.01 │ vmovaps %xmm3,0x80(%rsp)
0.22 │ vmovaps %xmm2,0x90(%rsp)
0.03 │ mov %rdi,0x70(%rsp)
0.05 │ lea (%rax,%rdx,8),%r14
0.01 │ call log_to_logs3_factor
1.00 │ test %r13,%r13
0.00 │ vxorps %xmm4,%xmm4,%xmm4
0.02 │ vmovsd %xmm0,0x78(%rsp)
0.00 │ je 433
0.01 │ movslq 0x0(%r13),%rax
0.02 │ mov $0xc8000000,%edi
0.01 │ vmovaps 0x90(%rsp),%xmm2
0.23 │ vmovaps 0x80(%rsp),%xmm3
0.09 │ test %eax,%eax
0.00 │ js 3f9
The new version is missing the spill of xmm2/3:
0.02 │ and $0xffffffffffffffe0,%rsp
0.03 │ mov %rcx,%rbx
0.01 │ add $0xffffffffffffff80,%rsp
0.03 │ mov 0x10(%rdi),%rax
0.16 │ mov 0x8(%rdi),%r15d
0.06 │ mov %rdi,0x50(%rsp)
0.12 │ lea (%rax,%rdx,8),%r14
0.01 │ call log_to_logs3_factor
0.75 │ test %r12,%r12
0.00 │ vxorps %xmm3,%xmm3,%xmm3
0.01 │ vmovsd %xmm0,0x58(%rsp)
0.01 │ je 3f2
0.01 │ movslq (%r12),%rcx
0.00 │ mov $0xc8000000,%edi
     │ test %ecx,%ecx
0.14 │ js 3b8
This looks better. log_to_logs3_factor just returns a constant:
Percent│ vmovsd invlogB,%xmm0
│ ret
I wonder why we no longer need to spill. log_to_logs3_factor is from another
translation unit and this is a non-LTO build. Maybe there are undefined
variables.
The new version does:
0.29 │ vmovhps %xmm4,0x70(%rsp)
0.11 │ vmovaps 0x70(%rsp),%xmm7
and this looks odd.
* [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen since g:r14-2369-g3a61ca1b925653 (2023-07-06)
From: hubicka at gcc dot gnu.org @ 2023-07-18 15:56 UTC (permalink / raw)
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
See Also| |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110586
--- Comment #15 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
It seems that both this PR and PR110586 boil down to worse IRA and scheduling
due to the corrected profile. I wonder if the artificially increased frequency
of the former bodies of vectorized loops does not suggest that IRA may take into
account that spilling in code with long-latency instructions is worse than
spilling elsewhere.
From: law at gcc dot gnu.org @ 2024-03-07 23:29 UTC (permalink / raw)
Jeffrey A. Law <law at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Priority|P3 |P2
CC| |law at gcc dot gnu.org
* [Bug target/110649] [14/15 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen since g:r14-2369-g3a61ca1b925653 (2023-07-06)
From: rguenth at gcc dot gnu.org @ 2024-05-07 7:41 UTC (permalink / raw)
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|14.0 |14.2
--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 14.1 is being released, retargeting bugs to GCC 14.2.