public inbox for gcc-bugs@sourceware.org
* [Bug target/110649] New: 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)
@ 2023-07-12 22:35 hubicka at gcc dot gnu.org
  2023-07-12 22:39 ` [Bug target/110649] " pinskia at gcc dot gnu.org
                   ` (17 more replies)
  0 siblings, 18 replies; 19+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-07-12 22:35 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649

            Bug ID: 110649
           Summary: 25% sphinx3 spec2006 regression on Ice Lake and zen
                    between g:acaa441a98bebc52 (2023-07-06 11:36) and
                    g:55900189ab517906 (2023-07-07 00:23)
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

It is seen here:
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=798.280.0
with -Ofast -march=native

Weekly testers seem to show regressions too (all with just 1 run so far, so it
may be noise)

Intel
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=790.280.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=785.280.0

zen1
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=301.280.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=299.280.0

zen3
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=467.280.0
https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=468.280.0


* [Bug target/110649] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)
From: pinskia at gcc dot gnu.org @ 2023-07-12 22:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
3 profile changes:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=3a61ca1b9256535e1bfb19b2d46cde21f3908a5d
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d4c2e34deef8cbd81ba2ef3389fdbaf95c70e225
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=2e406f0753e8d78d320437189211e3094c33b7e4

1 vectorizer change:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=224fd59b2dc8a5fa78a309a09863afe9b3cf2111


* [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)
From: pinskia at gcc dot gnu.org @ 2023-07-12 22:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0
           Keywords|                            |missed-optimization
             Target|                            |x86_64-linux-gnu
            Version|13.1.0                      |14.0
            Summary|25% sphinx3 spec2006        |[14 Regression] 25% sphinx3
                   |regression on Ice Lake and  |spec2006 regression on Ice
                   |zen between                 |Lake and zen between
                   |g:acaa441a98bebc52          |g:acaa441a98bebc52
                   |(2023-07-06 11:36) and      |(2023-07-06 11:36) and
                   |g:55900189ab517906          |g:55900189ab517906
                   |(2023-07-07 00:23)          |(2023-07-07 00:23)


* [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)
From: hliu at amperecomputing dot com @ 2023-07-14  7:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649

--- Comment #2 from Hao Liu <hliu at amperecomputing dot com> ---
Hi, I bisected the following 3 commits (sequential):

  [v3] 3a61ca1b925 - Improve profile updates after loop-ch and cunroll
(2023-07-06) <Jan Hubicka>
  [v2] d4c2e34deef - Improve scale_loop_profile (2023-07-06) <Jan Hubicka>
  [v1] 224fd59b2dc - Vect: use a small step to calculate induction for the
unrolled loop (PR tree-optimization/110449) (2023-07-06) <Hao Liu OS>

Time in seconds for the 1-copy run of 482.sphinx3 on zen2:
  v3: 261s
  v2: 231s
  v1: 231s

So the regression should be caused by 3a61ca1b925, i.e.
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=3a61ca1b9256535e1bfb19b2d46cde21f3908a5d


* [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)
From: hubicka at gcc dot gnu.org @ 2023-07-14 14:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=10647
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-07-14
             Status|UNCONFIRMED                 |NEW

--- Comment #3 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Thanks for bisecting this! We also have PR10647 which is tracked to this
change.
The change corrects the loop profile after header copying:

test()
{
        for (int i = 0; i < 10; i++)
                test2();
}

has a 90.9% probability on the exit conditional before loop header copying
(the loop technically iterates 10 times, so the test is evaluated 11 times and
continues 10 of them), while after loop header copying and optimizing out the
constant "if (0<10)" test it has only a 90% loopback probability.
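The two percentages can be reproduced with a small sketch. This is only the arithmetic, not GCC code, and the function names are made up for illustration: with the test at the top of the loop it is evaluated n + 1 times and continues n times; with the test moved to the bottom and the first check folded away it is evaluated n times and loops back n - 1 times.

```c
#include <assert.h>

/* Fraction of evaluations of the loop test that keep the loop running.
   Hypothetical helper, not part of GCC.  */
static double
continue_probability (unsigned continues, unsigned evaluations)
{
  assert (evaluations > 0 && continues <= evaluations);
  return (double) continues / (double) evaluations;
}

/* Test at the top of the loop: evaluated n + 1 times, enters the body
   n times.  */
static double
prob_before_header_copying (unsigned n)
{
  return continue_probability (n, n + 1);
}

/* Test at the bottom, first check optimized out: evaluated n times,
   loops back n - 1 times.  */
static double
prob_after_header_copying (unsigned n)
{
  assert (n > 0);
  return continue_probability (n - 1, n);
}
```

For n = 10 these give 10/11 (about 90.9%) and 9/10 (90%), matching the numbers above.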

So probably fixing the bug above triggers something else.
I will first look at PR10647 and see if I can figure out what is going on
there.


* [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)
From: hubicka at gcc dot gnu.org @ 2023-07-14 14:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649

--- Comment #4 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
We also have PR98782 that is about sphinx being sensitive to LRA decisions. 
Reducing loopback probability might trigger LRA adding a spill to the loop.


* [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)
From: hubicka at gcc dot gnu.org @ 2023-07-16 16:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|https://gcc.gnu.org/bugzill |https://gcc.gnu.org/bugzill
                   |a/show_bug.cgi?id=10647     |a/show_bug.cgi?id=110647

--- Comment #5 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
In comment 3 I got the wrong PR number. It is PR110647.


* [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)
From: hubicka at gcc dot gnu.org @ 2023-07-16 17:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649

--- Comment #6 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
I tried zen3 with -march=native -Ofast 

Samples: 1M of event 'cycles:u', Event count (approx.): 2309002237334, DSO: s
Overhead  Command          Symbol
  42.51%  sphinx_livepret  [.] mgau_eval
  24.36%  sphinx_livepret  [.] vector_gautbl_eval_logs3
   6.81%  sphinx_livepret  [.] subvq_mgau_shortlist
   6.43%  sphinx_livepret  [.] logs3_add
   4.91%  sphinx_livepret  [.] approx_cont_mgau_frame_eval
   4.32%  sphinx_livepret  [.] mdef_sseq2sen_active
   2.62%  sphinx_livepret  [.] dict2pid_comsenscr
   1.50%  sphinx_livepret  [.] hmm_vit_eval_3st
   0.84%  sphinx_livepret  [.] lextree_hmm_eval
   0.67%  sphinx_livepret  [.] lextree_hmm_propagate
   0.64%  sphinx_livepret  [.] lextree_enter
   0.61%  sphinx_livepret  [.] fe_fft
   0.45%  sphinx_livepret  [.] dict2pid_comsseq2sen_active
   0.32%  sphinx_livepret  [.] lextree_ssid_active
   0.18%  sphinx_livepret  [.] vithist_rescore
   0.14%  sphinx_livepret  [.] utt_decode_block
   0.12%  sphinx_livepret  [.] fe_mel_cep

Prior to vectorizing there is no invalid profile in mgau_eval.
The loop is
        for (c = 0; c < mgau->n_comp-1; c += 2) {       /* Interleave 2 components for speed */
            m1 = mgau->mean[c];
            m2 = mgau->mean[c+1];
            v1 = mgau->var[c];
            v2 = mgau->var[c+1];
            dval1 = mgau->lrd[c];
            dval2 = mgau->lrd[c+1];

            for (i = 0; i < veclen; i++) {
                diff1 = x[i] - m1[i];
                dval1 -= diff1 * diff1 * v1[i];
                diff2 = x[i] - m2[i];
                dval2 -= diff2 * diff2 * v2[i];
                /*              E_INFO("x %10f m1 %10f m2 %10f v1 %10f, v2 %10f\n",x[i],m1[i],m2[i],v1[i],v2[i]);
                                E_INFO("diff1 %10f,dval1 %10f, diff2 %10f, dval2 %10f\n",diff1,dval1,diff2,dval2);*/
            }

            if (dval1 < g->distfloor)   /* Floor */
                dval1 = g->distfloor;
            if (dval2 < g->distfloor)
                dval2 = g->distfloor;

            score = logs3_add (score, (int32)(f * dval1) + mgau->mixw[c]);
            score = logs3_add (score, (int32)(f * dval2) + mgau->mixw[c+1]);
        }
and the inner loop iterates 47 times on average. The vectorizer has a
profitability threshold of 8 and vectorizes to 32-byte vectors.
The epilogue has a threshold of 4 and is vectorized with a 16-byte vector.
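To get a feel for how 47 scalar iterations are spread across the three loops, here is a rough sketch. It assumes vectorization factors of 8 for the main loop and 4 for the vector epilogue (eight and four float lanes), ignores the profitability thresholds, and uses hypothetical names; it is not what the vectorizer itself computes.

```c
#include <assert.h>

/* How many times each loop body runs for a given scalar trip count.
   Hypothetical illustration type, not a GCC structure.  */
struct vect_split
{
  unsigned main_iters;      /* iterations of the main vector loop */
  unsigned epilogue_iters;  /* iterations of the vectorized epilogue */
  unsigned scalar_iters;    /* iterations left for the scalar epilogue */
};

/* Split NITERS scalar iterations between a main vector loop with factor
   MAIN_VF, a vector epilogue with factor EPILOGUE_VF and a scalar tail.
   Assumes MAIN_VF is a multiple of EPILOGUE_VF.  */
static struct vect_split
split_trip_count (unsigned niters, unsigned main_vf, unsigned epilogue_vf)
{
  struct vect_split s;
  unsigned rest;

  assert (main_vf > 0 && epilogue_vf > 0 && main_vf % epilogue_vf == 0);
  s.main_iters = niters / main_vf;
  rest = niters % main_vf;
  s.epilogue_iters = rest / epilogue_vf;
  s.scalar_iters = rest % epilogue_vf;
  return s;
}
```

Under these assumptions 47 iterations split into 5 main-loop iterations, 1 vector-epilogue iteration and 3 scalar iterations. With a main factor of 8 and an epilogue factor of 4, the epilogue can never run more than once and the scalar tail never more than 3 times.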

There is second similar loop nest in the function:
        for (j = 0; active[j] >= 0; j++) {
#ifdef SPEC_CPU
            considered++;
#endif
            c = active[j];

            m1 = mgau->mean[c];
            v1 = mgau->var[c];
            dval1 = mgau->lrd[c];

            for (i = 0; i < veclen; i++) {
                diff1 = x[i] - m1[i];
                dval1 -= diff1 * diff1 * v1[i];
            }

            if (dval1 < g->distfloor)
                dval1 = g->distfloor;

            score = logs3_add (score, (int32)(f * dval1) + mgau->mixw[c]);
        }
which is executed 10% of the time and also vectorized twice.

We then believe that the inner loop iterates 5 times (I would expect 47/4
times).

In cunroll pass we then see:
   Loop 4 iterates at most 2147483647 times. 
   Loop 4 likely iterates at most 2147483647 times.
   Not unrolling loop 4 (--param max-completely-peel-times limit reached).

This is the outer loop

   Loop 7 iterates at most 2 times.
   Loop 7 likely iterates at most 2 times.
  Loop size: 22
  Estimated size after unrolling: 42
  cont_mgau.c:604:20: optimized: loop with 2 iterations completely unrolled (header execution count 1065258)

this is the scalar epilogue loop.

   Loop 6 iterates at most 0 times.
   Loop 6 likely iterates at most 0 times.
   cont_mgau.c:575:7: optimized: loop turned into non-loop; it never loops

This is the vectorized epilogue loop (really non-loop).

So this looks OK, but it introduced one mismatch in the profile. Before the
pass we had:
;;   basic block 14, loop depth 2, count 171249098 (guessed, freq 23.9461), maybe hot
;;    prev block 51, next block 66, flags: (NEW, VISITED)
;;    pred:       24 [always]  count:142707582 (guessed, freq 19.9550) (FALLTHRU,DFS_BACK,EXECUTABLE)
;;                51 [always]  count:28541516 (guessed, freq 3.9910) (FALLTHRU,EXECUTABLE)

and now we get:
;;   basic block 14, loop depth 2, count 13764235 (guessed, freq 1.9247), maybe hot
;;   Invalid sum of incoming counts 25234431 (guessed, freq 3.5286), should be 13764235 (guessed, freq 1.9247)
;;    prev block 83, next block 66, flags: (NEW, VISITED)
;;    pred:       24 [always]  count:11470196 (guessed, freq 1.6039) (FALLTHRU,DFS_BACK,EXECUTABLE)
;;                83 [always]  count:13764235 (guessed, freq 1.9247) (FALLTHRU,EXECUTABLE)

This does look wrong: the loop was not unrolled, yet its profile was reduced
significantly.
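The "Invalid sum of incoming counts" diagnostic boils down to comparing a block's count with the sum of the counts on its incoming edges. A minimal sketch of that check, with hypothetical helper names and plain integers standing in for GCC's profile counts:

```c
#include <assert.h>
#include <stddef.h>

/* Sum the counts on all incoming edges of a basic block.  */
static long long
sum_incoming (const long long *edge_counts, size_t n)
{
  long long sum = 0;
  size_t i;

  assert (edge_counts != NULL || n == 0);
  for (i = 0; i < n; i++)
    sum += edge_counts[i];
  return sum;
}

/* Nonzero when the block count equals the sum of its incoming edge counts.  */
static int
incoming_counts_consistent (long long bb_count, const long long *edge_counts,
                            size_t n)
{
  return sum_incoming (edge_counts, n) == bb_count;
}
```

Feeding in the numbers from the two dumps: before the pass 142707582 + 28541516 equals the block count 171249098, while afterwards 11470196 + 13764235 gives 25234431 against a block count of 13764235.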

I also noticed that in another (not hot) function we get the following BB with
nonsensical exit edges:

;;   basic block 74, loop depth 3, count 258660 (guessed, freq 258660.0000), maybe hot
;;   Invalid sum of outgoing probabilities 120.0%
;;    prev block 155, next block 175, flags: (NEW, REACHABLE, VISITED)
;;    pred:       97 [always]  count:215550 (guessed, freq 215550.0000) (FALLTHRU,DFS_BACK,EXECUTABLE)
;;                155 [always]  count:43110 (guessed, freq 43110.0000) (FALLTHRU,EXECUTABLE)
  # i_212 = PHI <i_232(97), 0(155)>
  # n_94 = PHI <_453(97), n_244(155)>
  # vect_n_94.158_583 = PHI <vect__453.169_601(97), { 0, 0, 0, 0, 0, 0, 0, 0 }(155)>
  # vectp.159_584 = PHI <vectp.159_585(97), _222(155)>
  # vectp.165_594 = PHI <vectp.165_595(97), _222(155)>
  # ivtmp_612 = PHI <ivtmp_613(97), 0(155)>
  # DEBUG BEGIN_STMT
  _224 = (long unsigned int) i_212;
  _225 = _224 * 4;
  _226 = _222 + _225;
  vect__227.161_586 = MEM <vector(8) float> [(float32 *)vectp.159_584];
  _227 = *_226;
  vect__228.162_587 = [vec_unpack_lo_expr] vect__227.161_586;
  vect__228.162_588 = [vec_unpack_hi_expr] vect__227.161_586;
  _228 = (double) _227;
  mask__470.163_590 = vect_cst__589 > vect__228.162_587;
  mask__470.163_591 = vect_cst__589 > vect__228.162_588;
  _470 = varfloor_23(D) > _228;
  # DEBUG BEGIN_STMT
  mask_patt_538.164_592 = VEC_PACK_TRUNC_EXPR <mask__470.163_590, mask__470.163_591>;
  if (mask_patt_538.164_592 == { 0, 0, 0, 0, 0, 0, 0, 0 })
    goto <bb 174>; [100.00%]
  else
    goto <bb 175>; [20.00%]

The edge to 174 seems just wrong:

;;   basic block 174, loop depth 3, count 258660 (guessed, freq 258660.0000), maybe hot
;;   Invalid sum of incoming counts 310392 (guessed, freq 310392.0000), should be 258660 (guessed, freq 258660.0000)
;;    prev block 175, next block 97, flags: (NEW, VISITED)
;;    pred:       74 [always]  count:258660 (guessed, freq 258660.0000) (TRUE_VALUE,EXECUTABLE)
;;                175 [always]  count:51732 (guessed, freq 51732.0000) (FALLTHRU,EXECUTABLE)
  # DEBUG BEGIN_STMT
  #

So if the probability was 80% it would be almost right.
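A quick way to check the 80% remark is to redo the arithmetic with the numbers from the dumps. The sketch below uses whole percentages and hypothetical helper names, which is a simplification of how GCC represents probabilities, and it assumes an edge count is simply the block count scaled by the edge probability.

```c
#include <assert.h>
#include <stddef.h>

/* Sum of the outgoing edge probabilities of a block, in percent.
   A consistent block sums to exactly 100.  */
static int
outgoing_percent_sum (const int *percent, size_t n)
{
  int sum = 0;
  size_t i;

  assert (percent != NULL || n == 0);
  for (i = 0; i < n; i++)
    sum += percent[i];
  return sum;
}

/* Count carried by an edge taken PERCENT of the time out of a block
   executed COUNT times.  */
static long long
edge_count (long long count, int percent)
{
  assert (percent >= 0 && percent <= 100);
  return count * percent / 100;
}
```

With 100% and 20% the sum is the reported 120%. Replacing 100% by 80% makes the sum 100, and the edge from block 74 then carries 206928, which together with the 51732 coming from block 175 gives 258660, the count recorded for block 174.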

This problem repeats twice.


* [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)
From: hubicka at gcc dot gnu.org @ 2023-07-16 20:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649

--- Comment #7 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
I found why the vectorizer gets the profile scale of the vectorized epilogue
wrong. scale_profile_for_vect_loop uses niter_for_unrolled_loop, which does not
understand that if the iteration count is not divisible, the epilogue (unless
the loop is masked) will take the remaining iterations.

The upper bound computation is actually right in the update of loop_info, so we
can just use it directly instead of relying on niter_for_unrolled_loop.
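One way to picture the difference, as a sketch only: the helpers below are hypothetical and are not the logic of niter_for_unrolled_loop itself. A generic "unrolled loop" view rounds the leftover iterations up, because the last copy of the body may exit in the middle, while a vectorized epilogue rounds down, because whatever does not fill a whole vector is handed to the next epilogue.

```c
#include <assert.h>

/* Largest number of scalar iterations that can remain after a loop
   vectorized with factor PREV_VF.  */
static unsigned
max_leftover (unsigned prev_vf)
{
  assert (prev_vf > 0);
  return prev_vf - 1;
}

/* Generic unrolled-loop view: a partial last iteration still counts,
   so round up.  */
static unsigned
naive_epilogue_iters (unsigned leftover, unsigned vf)
{
  assert (vf > 0);
  return (leftover + vf - 1) / vf;
}

/* Epilogue-aware view: iterations that do not fill a whole vector go to
   the following epilogue, so round down.  */
static unsigned
epilogue_aware_iters (unsigned leftover, unsigned vf)
{
  assert (vf > 0);
  return leftover / vf;
}
```

For a main loop vectorized 4 times followed by an epilogue vectorized 2 times, at most 3 iterations remain; the rounding-up view yields 2 epilogue iterations, the rounding-down view yields 1.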

Wrong profile in:

;;   basic block 14, loop depth 2, count 13764235 (guessed, freq 1.9247), maybe hot
;;   Invalid sum of incoming counts 25234431 (guessed, freq 3.5286), should be 13764235 (guessed, freq 1.9247)

Is caused by loop peeling.  The unrolled loop is peeled 4 times which seems
like a reasonable idea, but I am not sure why profile is not updated correctly
here.


* [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)
From: hubicka at gcc dot gnu.org @ 2023-07-16 20:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=110692

--- Comment #8 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
So in mgau_eval the inner loop is vectorized and peeled, and the epilogues are
vectorized and fully unrolled. The resulting code seems a bit more complicated
than it needs to be.
I do not think the problems in profile updates are very important, or that they
actually affect overall performance much.

vector_gautbl_eval_logs3 seems similar, but we run out of registers, so there
the profile may be more relevant.

I added an oversimplified example of this pattern to PR110692.  I think we could
get overall codegen better...


* [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)
From: cvs-commit at gcc dot gnu.org @ 2023-07-16 21:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649

--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:1d203d4c90adb064edfa9680768d1f83a41f17e0

commit r14-2544-g1d203d4c90adb064edfa9680768d1f83a41f17e0
Author: Jan Hubicka <jh@suse.cz>
Date:   Sun Jul 16 23:53:56 2023 +0200

    Avoid double profile update in try_peel_loop

    try_peel_loop uses gimple_duplicate_loop_body_to_header_edge which
    subtracts the profile from the original loop. However then it tries to
    scale the profile in a wrong way (it forces header count to be entry count).

    This eliminates two profile misupdates in the internal loop of sphinx3.

    gcc/ChangeLog:

            PR middle-end/110649
            * tree-ssa-loop-ivcanon.cc (try_peel_loop): Avoid double profile
            update.


* [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)
From: cvs-commit at gcc dot gnu.org @ 2023-07-16 21:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649

--- Comment #10 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:c62791fa413a49fc6476ce186b324250f8ae6d40

commit r14-2545-gc62791fa413a49fc6476ce186b324250f8ae6d40
Author: Jan Hubicka <jh@suse.cz>
Date:   Sun Jul 16 23:55:14 2023 +0200

    Fix optimize_mask_stores profile update

    While looking into sphinx3 regression I noticed that vectorizer produces
    BBs with overall probability count 120%.  This patch fixes it.
    Richi, I don't know how to create a testcase, but having one would
    be nice.

    Bootstrapped/regtested x86_64-linux, will commit it shortly.

    gcc/ChangeLog:

            PR tree-optimization/110649
            * tree-vect-loop.cc (optimize_mask_stores): Set correctly
            probability of the if-then-else construct.


* [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)
From: cvs-commit at gcc dot gnu.org @ 2023-07-16 21:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649

--- Comment #11 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:061f74c06735e1fa35b910ae0bcf01b61a74ec23

commit r14-2546-g061f74c06735e1fa35b910ae0bcf01b61a74ec23
Author: Jan Hubicka <jh@suse.cz>
Date:   Sun Jul 16 23:56:59 2023 +0200

    Fix profile update in scale_profile_for_vect_loop

    When vectorizing 4 times, we sometimes do
      for
        <4x vectorized body>
      for
        <2x vectorized body>
      for
        <1x vectorized body>

    Here the last two fors, handling the epilogue, never iterate.
    Currently the vectorizer thinks that the middle for iterates twice.
    This turns out to be caused by scale_profile_for_vect_loop, which uses
    niter_for_unrolled_loop.

    At that time we know the epilogue will iterate at most 2 times,
    but niter_for_unrolled_loop does not know that the last iteration
    will be taken by the epilogue-of-epilogue, and thus it thinks
    that the loop may iterate once and exit in the middle of the second
    iteration.

    We already do a correct job updating niter bounds and this is
    just an ordering issue.  This patch makes us first update
    the bounds and then do the updating of the loop.  I re-implemented
    the function more correctly and precisely.

    The loop reducing iteration factor for overly flat profiles is a bit funny,
    but the only other method I can think of is to compute an sreal scale that
    would have similar overhead, I think.

    Bootstrapped/regtested x86_64-linux, will commit it shortly.

    gcc/ChangeLog:

            PR middle-end/110649
            * tree-vect-loop.cc (scale_profile_for_vect_loop): Rewrite.
            (vect_transform_loop): Move scale_profile_for_vect_loop after
            upper bound updates.


* [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)
From: roger at nextmovesoftware dot com @ 2023-07-17 10:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |roger at nextmovesoftware dot com

--- Comment #12 from Roger Sayle <roger at nextmovesoftware dot com> ---
Hi Jan,
I believe you also need to remove the
   profile_count entry_count = profile_count::zero ();
from tree-ssa-loop-ivcanon.cc's try_peel_loop to avoid a
bootstrap issue with -Werror "variable entry_count set but unused".


* [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)
From: jamborm at gcc dot gnu.org @ 2023-07-17 11:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649

Martin Jambor <jamborm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jamborm at gcc dot gnu.org

--- Comment #13 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Roger Sayle from comment #12)
> Hi Jan,
> I believe you also need to remove the
>    profile_count entry_count = profile_count::zero ();
> from tree-ssa-loop-ivcanon.cc's try_peel_loop to avoid a
> bootstrap issue with -Werror "variable entry_count set but unused".

I'm bootstrapping that change right now.


* [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23)
  2023-07-12 22:35 [Bug target/110649] New: 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23) hubicka at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2023-07-17 11:53 ` jamborm at gcc dot gnu.org
@ 2023-07-18 14:49 ` hubicka at gcc dot gnu.org
  2023-07-18 15:56 ` [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen since g:r14-2369-g3a61ca1b925653 (2023-07-06) hubicka at gcc dot gnu.org
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-07-18 14:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649

--- Comment #14 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Chasing profile update bugs out of the hottest two functions did not solve the
regression. Moreover the weekly testers confirm it was not noise on zens
either.

Before the change we get:

  34.58%  sphinx_livepret  [.] mgau_eval                              ◆
  26.61%  sphinx_livepret  [.] vector_gautbl_eval_logs3               ▒
   8.94%  sphinx_livepret  [.] subvq_mgau_shortlist                   ▒
   7.36%  sphinx_livepret  [.] logs3_add                              ▒
   5.66%  sphinx_livepret  [.] approx_cont_mgau_frame_eval            ▒
   4.68%  sphinx_livepret  [.] mdef_sseq2sen_active                   ▒
   3.38%  sphinx_livepret  [.] dict2pid_comsenscr                     ▒
   1.66%  sphinx_livepret  [.] hmm_vit_eval_3st                       ▒
   0.90%  sphinx_livepret  [.] lextree_hmm_eval                       ▒
   0.73%  sphinx_livepret  [.] lextree_hmm_propagate                  ▒
   0.71%  sphinx_livepret  [.] lextree_enter                          ▒
   0.68%  sphinx_livepret  [.] fe_fft                                 ▒
   0.49%  sphinx_livepret  [.] dict2pid_comsseq2sen_active            ▒
   0.35%  sphinx_livepret  [.] lextree_ssid_active                    ▒
   0.20%  sphinx_livepret  [.] vithist_rescore                        ▒

So the difference seems to be in mgau_eval.
Both versions of mgau_eval have almost the same code layout. The main difference
is register allocation.  In the old version we do more spilling around the call:

  0.01 │       and          $0xffffffffffffffe0,%rsp                  ▒
  0.14 │       mov          %rcx,%rbx                                 ▒
  0.00 │       sub          $0xa0,%rsp                                ▒
  0.04 │       mov          0x10(%rdi),%rax                           ▒
  0.13 │       mov          0x8(%rdi),%r15d                           ▒
  0.01 │       vmovaps      %xmm3,0x80(%rsp)                          ▒
  0.22 │       vmovaps      %xmm2,0x90(%rsp)                          ▒
  0.03 │       mov          %rdi,0x70(%rsp)                           ▒
  0.05 │       lea          (%rax,%rdx,8),%r14                        ▒
  0.01 │       call         log_to_logs3_factor                       ▒
  1.00 │       test         %r13,%r13                                 ▒
  0.00 │       vxorps       %xmm4,%xmm4,%xmm4                         ▒
  0.02 │       vmovsd       %xmm0,0x78(%rsp)                          ▒
  0.00 │       je           433                                       ▒
  0.01 │       movslq       0x0(%r13),%rax                            ▒
  0.02 │       mov          $0xc8000000,%edi                          ▒
  0.01 │       vmovaps      0x90(%rsp),%xmm2                          ▒
  0.23 │       vmovaps      0x80(%rsp),%xmm3                          ▒
  0.09 │       test         %eax,%eax                                 ▒
  0.00 │       js           3f9                                       ▒

The new version is missing the spill of xmm2/xmm3:

  0.02 │       and          $0xffffffffffffffe0,%rsp                  ▒
  0.03 │       mov          %rcx,%rbx                                 ▒
  0.01 │       add          $0xffffffffffffff80,%rsp                  ▒
  0.03 │       mov          0x10(%rdi),%rax                           ▒
  0.16 │       mov          0x8(%rdi),%r15d                           ▒
  0.06 │       mov          %rdi,0x50(%rsp)                           ▒
  0.12 │       lea          (%rax,%rdx,8),%r14                        ▒
  0.01 │       call         log_to_logs3_factor                       ▒
  0.75 │       test         %r12,%r12                                 ▒
  0.00 │       vxorps       %xmm3,%xmm3,%xmm3                         ▒
  0.01 │       vmovsd       %xmm0,0x58(%rsp)                          ▒
  0.01 │       je           3f2                                       ▒
  0.01 │       movslq       (%r12),%rcx                               ▒
  0.00 │       mov          $0xc8000000,%edi                          ▒
       │       test         %ecx,%ecx                                 ▒
  0.14 │       js           3b8                                       ▒

Which looks better. log_to_logs3_factor just returns a constant:

Percent│     vmovsd invlogB,%xmm0                                      
       │     ret                                                       

I wonder why we no longer need to spill. log_to_logs3_factor is from another
translation unit and this is a non-LTO build. Maybe there are undefined
variables.

The new version does:
  0.29 │       vmovhps      %xmm4,0x70(%rsp)                          ▒
  0.11 │       vmovaps      0x70(%rsp),%xmm7                          ▒
and this looks odd.
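
A portable sketch of what this instruction pair does to memory, for readers
who want to see why it stands out. It is a model, not the compiler's output:
Vec4, store_high_half and load_full are hypothetical names, and the reading
that follows is an assumption rather than something established in this
report. The store writes only 8 bytes (the upper half of the register),
while the load that follows reads 16 bytes from the same address, so at most
half of the loaded bytes can come from that store.

```cpp
#include <array>
#include <cassert>
#include <cstring>

// Models one 128-bit xmm register as four float lanes.
using Vec4 = std::array<float, 4>;

// Models `vmovhps %xmmN, mem`: only lanes 2 and 3 (the upper 64 bits)
// are written, 8 bytes in total.
void store_high_half(const Vec4& reg, unsigned char* slot) {
    assert(slot != nullptr);
    std::memcpy(slot, &reg[2], 2 * sizeof(float));
}

// Models `vmovaps mem, %xmmN`: a full 16-byte read from the same address.
Vec4 load_full(const unsigned char* slot) {
    assert(slot != nullptr);
    Vec4 out{};
    std::memcpy(out.data(), slot, 4 * sizeof(float));
    return out;
}
```

In this model the reloaded value has the stored upper lanes in its lower
half, and whatever was previously in the stack slot in its upper half. A
load that is wider than the store immediately before it is a pattern that
store-to-load forwarding on many x86 cores may not handle well, which would
be one possible reason to look at this pair more closely.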

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen since g:r14-2369-g3a61ca1b925653  (2023-07-06)
  2023-07-12 22:35 [Bug target/110649] New: 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23) hubicka at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2023-07-18 14:49 ` hubicka at gcc dot gnu.org
@ 2023-07-18 15:56 ` hubicka at gcc dot gnu.org
  2024-03-07 23:29 ` law at gcc dot gnu.org
  2024-05-07  7:41 ` [Bug target/110649] [14/15 " rguenth at gcc dot gnu.org
  17 siblings, 0 replies; 19+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-07-18 15:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=110586

--- Comment #15 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
It seems that both this PR and PR110586 boil down to worse IRA and scheduling
due to the corrected profile.  I wonder if the artificially increased frequency
of the former bodies of vectorized loops does not suggest that IRA should take
into account that spilling in code with long-latency instructions is worse than
spilling elsewhere.
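
A toy model of the idea, purely as a sketch. It is not IRA's cost function
and none of these names exist in GCC: SpillSite, spill_cost,
prefer_spill_in and the latency_penalty term are all hypothetical. It only
shows how a frequency-weighted cost can move a spill into a vectorized body
once that body's frequency is corrected downwards, and how an additional
latency term could push it back out.

```cpp
#include <cassert>

// Toy model only; not GCC code.
struct SpillSite {
    double block_frequency;   // estimated executions of the block
    double memory_cost;       // cost of the spill store plus reload
    double latency_penalty;   // hypothetical extra cost when the reload
                              // feeds long-latency instructions
};

// Frequency-weighted cost of spilling at this site.
double spill_cost(const SpillSite& site, bool latency_aware) {
    assert(site.block_frequency >= 0.0);
    const double per_execution =
        site.memory_cost + (latency_aware ? site.latency_penalty : 0.0);
    return site.block_frequency * per_execution;
}

// True if spilling in `a` is estimated to be cheaper than in `b`.
bool prefer_spill_in(const SpillSite& a, const SpillSite& b,
                     bool latency_aware) {
    return spill_cost(a, latency_aware) < spill_cost(b, latency_aware);
}
```

With made-up numbers, a vector body at frequency 25 with memory cost 2 and
penalty 6, against a scalar block at frequency 40 with memory cost 2 and no
penalty, gives 50 versus 80 on frequency alone, so the spill lands in the
vector body. With the penalty included it is 200 versus 80 and the spill
moves to the scalar block. An inflated vector-body frequency of 100 has the
same effect without any penalty (200 versus 80).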

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen since g:r14-2369-g3a61ca1b925653  (2023-07-06)
  2023-07-12 22:35 [Bug target/110649] New: 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23) hubicka at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2023-07-18 15:56 ` [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen since g:r14-2369-g3a61ca1b925653 (2023-07-06) hubicka at gcc dot gnu.org
@ 2024-03-07 23:29 ` law at gcc dot gnu.org
  2024-05-07  7:41 ` [Bug target/110649] [14/15 " rguenth at gcc dot gnu.org
  17 siblings, 0 replies; 19+ messages in thread
From: law at gcc dot gnu.org @ 2024-03-07 23:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649

Jeffrey A. Law <law at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
                 CC|                            |law at gcc dot gnu.org

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Bug target/110649] [14/15 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen since g:r14-2369-g3a61ca1b925653  (2023-07-06)
  2023-07-12 22:35 [Bug target/110649] New: 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23) hubicka at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2024-03-07 23:29 ` law at gcc dot gnu.org
@ 2024-05-07  7:41 ` rguenth at gcc dot gnu.org
  17 siblings, 0 replies; 19+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-05-07  7:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110649

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|14.0                        |14.2

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 14.1 is being released, retargeting bugs to GCC 14.2.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2024-05-07  7:41 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-12 22:35 [Bug target/110649] New: 25% sphinx3 spec2006 regression on Ice Lake and zen between g:acaa441a98bebc52 (2023-07-06 11:36) and g:55900189ab517906 (2023-07-07 00:23) hubicka at gcc dot gnu.org
2023-07-12 22:39 ` [Bug target/110649] " pinskia at gcc dot gnu.org
2023-07-12 22:39 ` [Bug target/110649] [14 Regression] " pinskia at gcc dot gnu.org
2023-07-14  7:16 ` hliu at amperecomputing dot com
2023-07-14 14:39 ` hubicka at gcc dot gnu.org
2023-07-14 14:41 ` hubicka at gcc dot gnu.org
2023-07-16 16:25 ` hubicka at gcc dot gnu.org
2023-07-16 17:39 ` hubicka at gcc dot gnu.org
2023-07-16 20:06 ` hubicka at gcc dot gnu.org
2023-07-16 20:14 ` hubicka at gcc dot gnu.org
2023-07-16 21:54 ` cvs-commit at gcc dot gnu.org
2023-07-16 21:55 ` cvs-commit at gcc dot gnu.org
2023-07-16 21:57 ` cvs-commit at gcc dot gnu.org
2023-07-17 10:48 ` roger at nextmovesoftware dot com
2023-07-17 11:53 ` jamborm at gcc dot gnu.org
2023-07-18 14:49 ` hubicka at gcc dot gnu.org
2023-07-18 15:56 ` [Bug target/110649] [14 Regression] 25% sphinx3 spec2006 regression on Ice Lake and zen since g:r14-2369-g3a61ca1b925653 (2023-07-06) hubicka at gcc dot gnu.org
2024-03-07 23:29 ` law at gcc dot gnu.org
2024-05-07  7:41 ` [Bug target/110649] [14/15 " rguenth at gcc dot gnu.org
