public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
@ 2022-07-14  9:08 jamborm at gcc dot gnu.org
  2022-07-14  9:22 ` [Bug tree-optimization/106293] [13 Regression] " rguenth at gcc dot gnu.org
                   ` (28 more replies)
  0 siblings, 29 replies; 31+ messages in thread
From: jamborm at gcc dot gnu.org @ 2022-07-14  9:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

            Bug ID: 106293
           Summary: 456.hmmer at -Ofast -march=native regressed by 19% on
                    zen2 and zen3 in July 2022
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jamborm at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
            Blocks: 26163
  Target Milestone: ---

The benchmark 456.hmmer from SPECINT 2006 suite has regressed on zen2 when
compiled with -Ofast -march=native, with or without LTO. See:

  https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=301.180.0
  https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=289.180.0

On zen3, LNT only reported a similar regression with LTO:

  https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=476.180.0


There may also be some effect on Kabylake:
  https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=2.180.0

On Zen2 (with LTO), I have manually bisected the regression to:
  d2a898666609452ef79a14feae1cadc3538e4b45 is the first bad commit
  commit d2a898666609452ef79a14feae1cadc3538e4b45
  Author: Richard Biener <rguenther@suse.de>
  Date:   Tue Jun 21 16:17:58 2022 +0200

    Put virtual operands into loop-closed SSA

    When attempting to manually update SSA form after high-level loop
    transforms such as loop versioning it is helpful when the loop-closed
    SSA form includes virtual operands.  While we have the special
    rewrite_virtuals_into_loop_closed_ssa function, it doesn't
    presently scale, invoking update_ssa by itself.  So the following
    makes the regular loop-closed SSA form also cover virtual operands.
    For users of loop_version this allows using the cheaper
    TODO_update_ssa_no_phi, skipping the dominance frontier compute
    (for the whole function) and iterated dominance frontiers for each
    copied def.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
@ 2022-07-14  9:22 ` rguenth at gcc dot gnu.org
  2022-07-14 12:10 ` rguenth at gcc dot gnu.org
                   ` (27 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-07-14  9:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |13.0
   Last reconfirmed|                            |2022-07-14
           Keywords|                            |missed-optimization
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
            Version|12.0                        |13.0
             Status|UNCONFIRMED                 |ASSIGNED
     Ever confirmed|0                           |1
            Summary|456.hmmer at -Ofast         |[13 Regression] 456.hmmer
                   |-march=native regressed by  |at -Ofast -march=native
                   |19% on zen2 and zen3 in     |regressed by 19% on zen2
                   |July 2022                   |and zen3 in July 2022

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I will have a look - the change should not have resulted in (major) code
generation changes.  It can have some effects that result in a slightly
different CFG in some cases (somewhat mitigated by r13-1503-gc3d2600cfb476e,
which I will cherry-pick for investigating).

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
  2022-07-14  9:22 ` [Bug tree-optimization/106293] [13 Regression] " rguenth at gcc dot gnu.org
@ 2022-07-14 12:10 ` rguenth at gcc dot gnu.org
  2022-07-14 12:22 ` rguenth at gcc dot gnu.org
                   ` (26 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-07-14 12:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org,
                   |                            |luoxhu at gcc dot gnu.org

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
I can reproduce a regression with -Ofast -march=znver2 running on Haswell as
well.  -fopt-info doesn't reveal anything interesting besides

-fast_algorithms.c:133:19: optimized: loop with 2 iterations completely unrolled (header execution count 32987933)
+fast_algorithms.c:133:19: optimized: loop with 2 iterations completely unrolled (header execution count 129072791)

Obviously the slowdown is in P7Viterbi.  There are only minimal changes on the
GIMPLE side, one notable being:

  niters_vector_mult_vf.205_2406 = niters.203_442 & 429496729 |   _2041 = niters.203_438 & 3;
  _2408 = (int) niters_vector_mult_vf.205_2406;               |   if (_2041 == 0)
  tmp.206_2407 = k_384 + _2408;                               |     goto <bb 66>; [25.00%]
  _2300 = niters.203_442 & 3;                                 <
  if (_2300 == 0)                                             <
    goto <bb 65>; [25.00%]                                    <
  else                                                            else
    goto <bb 36>; [75.00%]                                          goto <bb 36>; [75.00%]

  <bb 36> [local count: 41646173]:                            |   <bb 36> [local count: 177683003]:
  # k_2403 = PHI <tmp.206_2407(35), tmp.239_2637(34)>         |   niters_vector_mult_vf.205_2409 = niters.203_438 & 429496729
  # DEBUG k => k_2403                                         |   _2411 = (int) niters_vector_mult_vf.205_2409;
                                                              >   tmp.206_2410 = k_382 + _2411;
                                                              >
                                                              >   <bb 37> [local count: 162950122]:
                                                              >   # k_2406 = PHI <tmp.206_2410(36), tmp.239_2639(34)>

The sink pass now does the transform where it did not do so before.

That's apparently because of

  /* If BEST_BB is at the same nesting level, then require it to have
     significantly lower execution frequency to avoid gratuitous movement.  */
  if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
      /* If result of comparsion is unknown, prefer EARLY_BB.
         Thus use !(...>=..) rather than (...<...)  */
      && !(best_bb->count * 100 >= early_bb->count * threshold))
    return best_bb;

  /* No better block found, so return EARLY_BB, which happens to be the
     statement's original block.  */
  return early_bb;

where the source count is 96726596 before and 236910671 after, and the
destination count is 72544947 before and 177683003 after.  The edge
probabilities are 75% vs 25% and param_sink_frequency_threshold is exactly 75
as well.  Since 236910671 * 0.75 is rounded down, the new counts pass the
test, while the previous state has an exact match defeating it.
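
Concretely, plugging the quoted counts into that test (a worked check added
here for illustration, not part of the original comment):

  before:  72544947 * 100 = 7254494700  >=  96726596 * 75 = 7254494700
           holds exactly, so early_bb is kept and nothing is sunk
  after:  177683003 * 100 = 17768300300 >= 236910671 * 75 = 17768300325
           does not hold, so best_bb is returned and the sinking happens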

It's a little bit of an arbitrary choice,

diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index 2e744d6ae50..9b368e13463 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -230,7 +230,7 @@ select_best_block (basic_block early_bb,
   if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
       /* If result of comparsion is unknown, prefer EARLY_BB.
         Thus use !(...>=..) rather than (...<...)  */
-      && !(best_bb->count * 100 >= early_bb->count * threshold))
+      && !(best_bb->count * 100 > early_bb->count * threshold))
     return best_bb;

   /* No better block found, so return EARLY_BB, which happens to be the

fixes the missed sinking but not the regression :/

The count differences start to appear when LC PHI blocks are added only for
virtuals, and then pre-existing 'Invalid sum of incoming counts' issues
eventually lead to mismatches.  The 'Invalid sum of incoming counts'
start with the loop splitting pass.

fast_algorithms.c:145:10: optimized: loop split

Xionghu Luo did profile count updates there; not sure if that made things
worse in this case.

At least with broken BB counts splitting/unsplitting an edge can propagate
bogus counts elsewhere it seems.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
  2022-07-14  9:22 ` [Bug tree-optimization/106293] [13 Regression] " rguenth at gcc dot gnu.org
  2022-07-14 12:10 ` rguenth at gcc dot gnu.org
@ 2022-07-14 12:22 ` rguenth at gcc dot gnu.org
  2022-07-25  9:44 ` luoxhu at gcc dot gnu.org
                   ` (25 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-07-14 12:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|rguenth at gcc dot gnu.org         |unassigned at gcc dot gnu.org
             Status|ASSIGNED                    |NEW

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
-fno-split-loops makes both variants run at the same speed (but slower than
with loop splitting enabled).  There are still some bogus profile
counts/probabilities with that, notably after ifcvt and vectorization :/

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2022-07-14 12:22 ` rguenth at gcc dot gnu.org
@ 2022-07-25  9:44 ` luoxhu at gcc dot gnu.org
  2022-07-25  9:46 ` luoxhu at gcc dot gnu.org
                   ` (24 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2022-07-25  9:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #4 from luoxhu at gcc dot gnu.org ---
(In reply to Richard Biener from comment #2)
> I can reproduce a regression with -Ofast -march=znver2 running on Haswell as
> well.  -fopt-info doesn't reveal anything interesting besides
> 
> -fast_algorithms.c:133:19: optimized: loop with 2 iterations completely
> unrolled (header execution count 32987933)
> +fast_algorithms.c:133:19: optimized: loop with 2 iterations completely
> unrolled (header execution count 129072791)
> 
> obviously the slowdown is in P7Viterbi.  There's only minimal changes on the
> GIMPLE side, one notable:
> 
>   niters_vector_mult_vf.205_2406 = niters.203_442 & 429496729 |   _2041 =
> niters.203_438 & 3;
>   _2408 = (int) niters_vector_mult_vf.205_2406;               |   if (_2041
> == 0)
>   tmp.206_2407 = k_384 + _2408;                               |     goto <bb
> 66>; [25.00%]
>   _2300 = niters.203_442 & 3;                                 <
>   if (_2300 == 0)                                             <
>     goto <bb 65>; [25.00%]                                    <
>   else                                                            else
>     goto <bb 36>; [75.00%]                                          goto <bb
> 36>; [75.00%]
> 
>   <bb 36> [local count: 41646173]:                            |   <bb 36>
> [local count: 177683003]:
>   # k_2403 = PHI <tmp.206_2407(35), tmp.239_2637(34)>         |  
> niters_vector_mult_vf.205_2409 = niters.203_438 & 429496729
>   # DEBUG k => k_2403                                         |   _2411 =
> (int) niters_vector_mult_vf.205_2409;
>                                                               >  
> tmp.206_2410 = k_382 + _2411;
>                                                               >
>                                                               >   <bb 37>
> [local count: 162950122]:
>                                                               >   # k_2406 =
> PHI <tmp.206_2410(36), tmp.239_2639(34)>
> 
> the sink pass now does the transform where it did not do so before.
> 
> That's appearantly because of
> 
>   /* If BEST_BB is at the same nesting level, then require it to have
>      significantly lower execution frequency to avoid gratuitous movement. 
> */
>   if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
>       /* If result of comparsion is unknown, prefer EARLY_BB.
>          Thus use !(...>=..) rather than (...<...)  */
>       && !(best_bb->count * 100 >= early_bb->count * threshold))
>     return best_bb;
> 
>   /* No better block found, so return EARLY_BB, which happens to be the
>      statement's original block.  */
>   return early_bb;
> 
> where the SRC count is 96726596 before, 236910671 after and the
> destination count is 72544947 before, 177683003 at the destination after.
> The edge probabilities are 75% vs 25% and param_sink_frequency_threshold
> is exactly 75 as well.  Since 236910671*0.75
> is rounded down it passes the test while the previous state has an exact
> match defeating it.
> 
> It's a little bit of an arbitrary choice,
> 
> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> index 2e744d6ae50..9b368e13463 100644
> --- a/gcc/tree-ssa-sink.cc
> +++ b/gcc/tree-ssa-sink.cc
> @@ -230,7 +230,7 @@ select_best_block (basic_block early_bb,
>    if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
>        /* If result of comparsion is unknown, prefer EARLY_BB.
>          Thus use !(...>=..) rather than (...<...)  */
> -      && !(best_bb->count * 100 >= early_bb->count * threshold))
> +      && !(best_bb->count * 100 > early_bb->count * threshold))
>      return best_bb;
>  
>    /* No better block found, so return EARLY_BB, which happens to be the
> 
> fixes the missed sinking but not the regression :/
> 
> The count differences start to appear in when LC PHI blocks are added
> only for virtuals and then pre-existing 'Invalid sum of incoming counts'
> eventually lead to mismatches.  The 'Invalid sum of incoming counts'
> start with the loop splitting pass.
> 
> fast_algorithms.c:145:10: optimized: loop split
> 
> Xionghu Lou did profile count updates there, not sure if that made things
> worse in this case.
> 
> At least with broken BB counts splitting/unsplitting an edge can propagate
> bogus counts elsewhere it seems.

:(  Could you please try reverting cd5ae148c47c6dee05adb19acd6a523f7187be7f
and see whether performance is back?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2022-07-25  9:44 ` luoxhu at gcc dot gnu.org
@ 2022-07-25  9:46 ` luoxhu at gcc dot gnu.org
  2023-01-10 12:12 ` yann at ywg dot ch
                   ` (23 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: luoxhu at gcc dot gnu.org @ 2022-07-25  9:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #5 from luoxhu at gcc dot gnu.org ---
r12-6086

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2022-07-25  9:46 ` luoxhu at gcc dot gnu.org
@ 2023-01-10 12:12 ` yann at ywg dot ch
  2023-01-10 12:45 ` rguenth at gcc dot gnu.org
                   ` (22 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: yann at ywg dot ch @ 2023-01-10 12:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

Yann Girsberger <yann at ywg dot ch> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |yann at ywg dot ch

--- Comment #6 from Yann Girsberger <yann at ywg dot ch> ---
I found a DCE regression in trunk that bisects to the same commit as in the
original report.
It seems to be present at both -Ofast and -O2.
Let me know if this should be a new report.

cat case.c #14
int a;
static long b = 4073709551612, d;
short c;
void foo();
static char e(int **f) {
  **f = 0;
  if (a) {
    unsigned long *g = &b;
    unsigned long **h = &g;
    for (; d;) {
      foo();
      for (; c;) {
        unsigned long ***i = &h;
      }
    }
  }
  return 1;
}
int main() {
  int *j = &a;
  e(&j);
  if (b <= 0)
    foo();
}

`gcc-cb93c5f8008b95743b741d6f1842f9be50c6985c (trunk) -O2` cannot eliminate
the call to `foo` but `gcc-releases/gcc-12.2.0 -O2` can.  (Since a stays 0, the
block storing through g never executes, so b keeps its positive initializer
and the `b <= 0` test can never call `foo`.)

`gcc-cb93c5f8008b95743b741d6f1842f9be50c6985c (trunk) -O2 -S -o /dev/stdout
case.c`
--------- OUTPUT ---------
main:
.LFB1:
        .cfi_startproc
        cmpq    $0, b(%rip)
        movl    $0, a(%rip)
        jle     .L8
        xorl    %eax, %eax
        ret
.L8:
        pushq   %rax
        .cfi_def_cfa_offset 16
        xorl    %eax, %eax
        call    foo
        xorl    %eax, %eax
        popq    %rdx
        .cfi_def_cfa_offset 8
        ret
---------- END OUTPUT ---------


`gcc-releases/gcc-12.2.0 -O2 -S -o /dev/stdout case.c`
--------- OUTPUT ---------
main:
.LFB1:
        .cfi_startproc
        movl    $0, a(%rip)
        xorl    %eax, %eax
        ret
---------- END OUTPUT ---------


Bisects to:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d2a898666609452ef79a14feae1cadc3538e4b45

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2023-01-10 12:12 ` yann at ywg dot ch
@ 2023-01-10 12:45 ` rguenth at gcc dot gnu.org
  2023-01-10 15:53 ` cvs-commit at gcc dot gnu.org
                   ` (21 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-01-10 12:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Interesting.  It seems that early DSE is hindered by the extra PHI, possibly
not reaching the CLOBBER.

@@ -24,6 +24,8 @@
     goto <bb 8>; [INV]

   <bb 3> :
+  # .MEM_10 = VDEF <.MEM_9>
+  g = &b;
   goto <bb 6>; [INV]

   <bb 4> :
@@ -39,7 +41,7 @@
     goto <bb 6>; [INV]

   <bb 6> :
-  # .MEM_5 = PHI <.MEM_9(3), .MEM_14(5)>
+  # .MEM_5 = PHI <.MEM_10(3), .MEM_14(5)>
   # VUSE <.MEM_5>
   d.3_4 = d;
   if (d.3_4 != 0)
@@ -48,7 +50,8 @@
     goto <bb 7>; [INV]

   <bb 7> :
-  # .MEM_12 = VDEF <.MEM_5>
+  # .MEM_17 = PHI <.MEM_5(6)>
+  # .MEM_12 = VDEF <.MEM_17>
   g ={v} {CLOBBER(eol)};

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2023-01-10 12:45 ` rguenth at gcc dot gnu.org
@ 2023-01-10 15:53 ` cvs-commit at gcc dot gnu.org
  2023-01-10 15:54 ` rguenth at gcc dot gnu.org
                   ` (20 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-01-10 15:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:4e0b504f26f78ff02e80ad98ebbf8ded3aa6ffa1

commit r13-5092-g4e0b504f26f78ff02e80ad98ebbf8ded3aa6ffa1
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Jan 10 13:48:51 2023 +0100

    tree-optimization/106293 - missed DSE with virtual LC PHI

    Degenerate virtual PHIs can break DSE's fragile heuristic as to what
    defs it can handle for further processing.  The following enhances
    it to look through degenerate PHIs by means of a worklist, adding
    the degenerate PHI defs' uses to the defs array.  The rewrite of
    virtuals into loop-closed SSA caused this issue to appear more often.
    The patch itself is mostly re-indenting the new loop body.

            PR tree-optimization/106293
            * tree-ssa-dse.cc (dse_classify_store): Use a worklist to
            process degenerate PHI defs.

            * gcc.dg/tree-ssa/ssa-dse-46.c: New testcase.
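
For illustration, a self-contained sketch of the worklist idea (not GCC code;
the types and names below are made up):

/* Model of the idea in r13-5092: when classifying a store, walk the uses of
   its virtual def and look through degenerate (single-argument) PHIs via a
   worklist instead of giving up on them.  Illustrative only.  */
#include <stdbool.h>
#include <stdio.h>

struct vdef
{
  const char *name;
  bool is_degenerate_phi;        /* single-argument virtual PHI */
  bool is_killing_store;         /* later store that overwrites the value */
  struct vdef *uses[4];          /* statements using this virtual def */
  int n_uses;
};

/* Return true if every transitive use of DEF is a killing store,
   looking through degenerate PHIs.  */
static bool
store_is_dead (struct vdef *def)
{
  struct vdef *worklist[16];
  int n = 0;
  worklist[n++] = def;
  while (n > 0)
    {
      struct vdef *d = worklist[--n];
      for (int i = 0; i < d->n_uses; i++)
        {
          struct vdef *use = d->uses[i];
          if (use->is_degenerate_phi)
            worklist[n++] = use;       /* look through the PHI */
          else if (!use->is_killing_store)
            return false;              /* a real read of the value escapes */
        }
    }
  return true;
}

int
main (void)
{
  /* Mimics comment #7: store, then a degenerate LC PHI, then the clobber.  */
  struct vdef clobber = { "g ={v} {CLOBBER}", false, true, { 0 }, 0 };
  struct vdef lc_phi = { ".MEM_17 = PHI <.MEM_5>", true, false, { &clobber }, 1 };
  struct vdef store = { "g = &b", false, false, { &lc_phi }, 1 };
  printf ("store is dead: %s\n", store_is_dead (&store) ? "yes" : "no");
  return 0;
}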

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2023-01-10 15:53 ` cvs-commit at gcc dot gnu.org
@ 2023-01-10 15:54 ` rguenth at gcc dot gnu.org
  2023-01-11  7:04 ` cvs-commit at gcc dot gnu.org
                   ` (19 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-01-10 15:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
           Assignee|rguenth at gcc dot gnu.org         |unassigned at gcc dot gnu.org

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
The testcase in comment#6 is now fixed.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2023-01-10 15:54 ` rguenth at gcc dot gnu.org
@ 2023-01-11  7:04 ` cvs-commit at gcc dot gnu.org
  2023-04-17 15:11 ` [Bug tree-optimization/106293] [13/14 " jakub at gcc dot gnu.org
                   ` (18 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-01-11  7:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #10 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:8d96a7fc27f3561f984e50feb316d3e472ed9d14

commit r13-5099-g8d96a7fc27f3561f984e50feb316d3e472ed9d14
Author: Richard Biener <rguenther@suse.de>
Date:   Wed Jan 11 08:02:52 2023 +0100

    tree-optimization/106293 - fix testcase

    The following removes a problematic initializer which causes
    excess diagnostics with -m32 and isn't actually required.

            PR tree-optimization/106293
            * gcc.dg/tree-ssa/ssa-dse-46.c: Remove long initializer.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2023-01-11  7:04 ` cvs-commit at gcc dot gnu.org
@ 2023-04-17 15:11 ` jakub at gcc dot gnu.org
  2023-04-17 16:15 ` jamborm at gcc dot gnu.org
                   ` (17 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-04-17 15:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, what is left in this PR?  Shall we retarget it for GCC 14?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2023-04-17 15:11 ` [Bug tree-optimization/106293] [13/14 " jakub at gcc dot gnu.org
@ 2023-04-17 16:15 ` jamborm at gcc dot gnu.org
  2023-04-26  6:56 ` rguenth at gcc dot gnu.org
                   ` (16 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: jamborm at gcc dot gnu.org @ 2023-04-17 16:15 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #12 from Martin Jambor <jamborm at gcc dot gnu.org> ---
My understanding of comments #2 and #3 is that we end up with what are very
likely bogus BB counts that we should check and perhaps attempt to fix.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2023-04-17 16:15 ` jamborm at gcc dot gnu.org
@ 2023-04-26  6:56 ` rguenth at gcc dot gnu.org
  2023-07-27  9:23 ` rguenth at gcc dot gnu.org
                   ` (15 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-04-26  6:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|13.0                        |13.2

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 13.1 is being released, retargeting bugs to GCC 13.2.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2023-04-26  6:56 ` rguenth at gcc dot gnu.org
@ 2023-07-27  9:23 ` rguenth at gcc dot gnu.org
  2023-07-27 18:01 ` hubicka at gcc dot gnu.org
                   ` (14 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-07-27  9:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|13.2                        |13.3

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 13.2 is being released, retargeting bugs to GCC 13.3.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2023-07-27  9:23 ` rguenth at gcc dot gnu.org
@ 2023-07-27 18:01 ` hubicka at gcc dot gnu.org
  2023-07-27 21:38 ` hubicka at gcc dot gnu.org
                   ` (13 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-07-27 18:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #15 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
   if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
       /* If result of comparsion is unknown, prefer EARLY_BB.
         Thus use !(...>=..) rather than (...<...)  */
-      && !(best_bb->count * 100 >= early_bb->count * threshold))
+      && !(best_bb->count * 100 > early_bb->count * threshold))
     return best_bb;

Comparing loop depths certainly seems odd.
If we want to test that best_bb and early_bb are in the same loop, we want to
test loop_father.  What is the benefit of testing across loop nests?
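
For illustration, a made-up example of that concern: both loop bodies below
have bb_loop_depth () == 1, yet they belong to different loop nests with very
different iteration counts, so equal depth says little about relative counts.

void g (int *a, int *b)
{
  for (int i = 0; i < 10; i++)        /* depth 1, runs 10 times */
    a[i] = 0;
  for (int j = 0; j < 1000000; j++)   /* depth 1, runs 1000000 times */
    b[j] = 0;
}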

Profile report here claims:
dump id |static mismat|dynamic mismatch                                     |   
        |in count     |in count                  |time                      |   
lsplit  |      5    +5|   8151850567  +8151850567| 531506481006       +57.9%| 
ldist   |      9    +4|  15345493501  +7193642934| 606848841056       +14.2%| 
ifcvt   |     10    +1|  15487514871   +142021370| 689469797790       +13.6%| 
vect    |     35   +25|  17558425961  +2070911090| 517375405715       -25.0%| 
cunroll |     42    +7|  16898736178   -659689783| 452445796198        -4.9%|  
loopdone|     33    -9|   2678017188 -14220718990| 330969127663             |   
tracer  |     34    +1|   2678018710        +1522| 330613415364        +0.0%|  
fre     |     33    -1|   2676980249     -1038461| 330465677073        -0.0%|  
expand  |     28    -5|   2497468467   -179511782|--------------------------|

so it looks like loop splitting, distribution and the vectorizer do disturb the
profile significantly.
(Ifcvt does so by design and the damage is undone later.)
Not sure if that is the real problem though.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2023-07-27 18:01 ` hubicka at gcc dot gnu.org
@ 2023-07-27 21:38 ` hubicka at gcc dot gnu.org
  2023-07-28  7:22 ` rguenther at suse dot de
                   ` (12 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-07-27 21:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #16 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
It is really hard to make loop splitting do something.
It does not like canonicalized invariant variables, since the loop exit
condition should not be NE_EXPR, and it does not like it when VRP turns
LT/GT into NE.

This is what happens in hmmer.  There is a loop iterating 100 times and
splitting happens just before the last BB:
int M = 100;

void
__attribute__ ((noinline,noipa))
do_something()
{
}
void
__attribute__ ((noinline,noipa))
do_something2()
{
}

__attribute__ ((noinline,noipa))
void test1 (int n)
{
  if (n <= 0 || n > 100000)
        return; 
  for (int i = 0; i <= n; i++)
          if (i < n)
                  do_something ();
          else
                  do_something2 ();
}
int
main(int, char **)
{
        for (int i = 0 ; i < 1000; i++)
          test1(M);
        return 0;
}
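
For reference, a hand-written sketch of the shape the split is expected to
take for test1 above (illustrative only, not compiler output): peel off the
i < n part so each loop body is branch-free.

extern void do_something (void);
extern void do_something2 (void);

void test1_split (int n)
{
  if (n <= 0 || n > 100000)
    return;
  int i = 0;
  for (; i < n; i++)        /* first loop: the i < n branch is always true */
    do_something ();
  for (; i <= n; i++)       /* second loop: only the last iteration remains */
    do_something2 ();
}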

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2023-07-27 21:38 ` hubicka at gcc dot gnu.org
@ 2023-07-28  7:22 ` rguenther at suse dot de
  2023-07-28  8:01   ` Jan Hubicka
  2023-07-28  7:51 ` cvs-commit at gcc dot gnu.org
                   ` (11 subsequent siblings)
  28 siblings, 1 reply; 31+ messages in thread
From: rguenther at suse dot de @ 2023-07-28  7:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #17 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 27 Jul 2023, hubicka at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293
> 
> --- Comment #15 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
>    if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
>        /* If result of comparsion is unknown, prefer EARLY_BB.
>          Thus use !(...>=..) rather than (...<...)  */
> -      && !(best_bb->count * 100 >= early_bb->count * threshold))
> +      && !(best_bb->count * 100 > early_bb->count * threshold))
>      return best_bb;
> 
> Comparing loop depths seems ceartainly odd.  
> If we want to test best_bb and early_bb to be in same loop, we want to test
> loop_father.  What is a benefit of testing across loop nests?

This heuristic wants to catch

  <sink stmt>
  if (foo) abort ();
  <place to sink>

and avoid sinking "too far" across a path with "similar enough"
execution count (I think the original motivation was to fix some
spilling / register pressure issue).  The loop depth test
should be !(bb_loop_depth (best_bb) < bb_loop_depth (early_bb))
so we shouldn't limit sinking to a more outer nest.  As we rule
out > before this becomes ==.

It looks tempting to sink to the earliest place with the same
execution count rather than the latest but the above doesn't
really achieve that (it doesn't look "upwards" but simply fails).
With a guessed profile it's also going to be hard.

And it in no way implements register pressure / spilling sensitivity
(see also Ajit's attempts at producing a patch that avoids sinking
across a call).  All these are ultimately doomed unless we at least
consider a group of stmts together.
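
For illustration (a made-up example, not taken from the PR), the pattern the
heuristic is about looks like this; both the early block and the use run
essentially equally often, so the count threshold decides whether sinking
past the unlikely trap is considered worthwhile:

extern void abort (void);

int f (int a, int b, int foo)
{
  int t = a * b;      /* <sink stmt> */
  if (foo)
    abort ();         /* unlikely path */
  return t + 1;       /* <place to sink>: the only use of t */
}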

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2023-07-28  7:22 ` rguenther at suse dot de
@ 2023-07-28  7:51 ` cvs-commit at gcc dot gnu.org
  2023-07-28  8:01 ` hubicka at ucw dot cz
                   ` (10 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-07-28  7:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #18 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:b24acae8f4d315a5b071ffc2574ce91c7a0800ca

commit r14-2850-gb24acae8f4d315a5b071ffc2574ce91c7a0800ca
Author: Jan Hubicka <jh@suse.cz>
Date:   Fri Jul 28 09:48:34 2023 +0200

    loop-split improvements, part 2

    this patch fixes profile update in the first case of loop splitting.
    The pass still gives up on very basic testcases:

    __attribute__ ((noinline,noipa))
    void test1 (int n)
    {
      if (n <= 0 || n > 100000)
        return;
      for (int i = 0; i <= n; i++)
        {
          if (i < n)
            do_something ();
          if (a[i])
            do_something2();
        }
    }

    Here I needed to add the conditional that enforces a sane value range of n.
    The reason is that it gives up on:
          !number_of_iterations_exit (loop1, exit1, &niter, false, true)
    and without the conditional we get the assumption that n >= 0 and not
    INT_MAX.  I think from overflow we should derive that the INT_MAX test is
    not needed, and since the loop does nothing for n < 0 it is also just
    paranoia.

    I am not sure how to fix this though :(.  In general the pass does not
    really need to compute the iteration count.  It only needs to know what
    direction the IVs go so it can detect tests that fire in the first part of
    the iteration space.

    Rich, any idea what the correct test should be?

    In testcase:
      for (int i = 0; i < 200; i++)
        if (i < 150)
          do_something ();
        else
          do_something2 ();
    the old code did a wrong update of the exit condition probabilities.
    We know that the first loop iterates 150 times and the second loop 50 times,
    and we get that by simply scaling the loop body by the probability of the
    inner test.

    With the patch we now get:

      <bb 2> [count: 1000]:

      <bb 3> [count: 150000]:    <- loop 1 correctly iterates 149 times
      # i_10 = PHI <i_7(8), 0(2)>
      do_something ();
      i_7 = i_10 + 1;
      if (i_7 <= 149)
        goto <bb 8>; [99.33%]
      else
        goto <bb 17>; [0.67%]

      <bb 8> [count: 149000]:
      goto <bb 3>; [100.00%]

      <bb 16> [count: 1000]:
      # i_15 = PHI <i_18(17)>

      <bb 9> [count: 49975]:    <- loop 2 should iterate 50 times but
                                   we are slightly wrong
      # i_3 = PHI <i_15(16), i_14(13)>
      do_something2 ();
      i_14 = i_3 + 1;
      if (i_14 != 200)
        goto <bb 13>; [98.00%]
      else
        goto <bb 7>; [2.00%]

      <bb 13> [count: 48975]:
      goto <bb 9>; [100.00%]

      <bb 17> [count: 1000]:   <- this test is always true because it is
                                  reached from bb 3
      # i_18 = PHI <i_7(3)>
      if (i_18 != 200)
        goto <bb 16>; [99.95%]
      else
        goto <bb 7>; [0.05%]

      <bb 7> [count: 1000]:
      return;

    The reason why we are slightly wrong is the condition in bb 17 that
    is always true but the pass does not know it.

    Rich, any idea how to do that?  I think connect_loops should work out
    the case where the loop exit condition is never satisfied at the time
    the split condition fails for the first time.

    Before patch on hmmer we get a lot of mismatches:
    Profile report here claims:
    dump id |static mismat|dynamic mismatch                                     |
            |in count     |in count                  |time                      |
    lsplit  |      5    +5|   8151850567  +8151850567| 531506481006       +57.9%|
    ldist   |      9    +4|  15345493501  +7193642934| 606848841056       +14.2%|
    ifcvt   |     10    +1|  15487514871   +142021370| 689469797790       +13.6%|
    vect    |     35   +25|  17558425961  +2070911090| 517375405715       -25.0%|
    cunroll |     42    +7|  16898736178   -659689783| 452445796198        -4.9%|
    loopdone|     33    -9|   2678017188 -14220718990| 330969127663             |
    tracer  |     34    +1|   2678018710        +1522| 330613415364        +0.0%|
    fre     |     33    -1|   2676980249     -1038461| 330465677073        -0.0%|
    expand  |     28    -5|   2497468467   -179511782|--------------------------|

    With patch

    lsplit  |      0      |            0             | 328723360744        -2.3%|
    ldist   |      0      |            0             | 396193562452       +20.6%|
    ifcvt   |      1    +1|     71010686    +71010686| 478743508522       +20.8%|
    vect    |     14   +13|    697518955   +626508269| 299398068323       -37.5%|
    cunroll |     13    -1|    489349408   -208169547| 257777839725       -10.5%|
    loopdone|     11    -2|    402558559    -86790849| 201010712702             |
    tracer  |     13    +2|    402977200      +418641| 200651036623        +0.0%|
    fre     |     13      |    402622146      -355054| 200344398654        -0.2%|
    expand  |     11    -2|    333608636    -69013510|--------------------------|

    So no mismatches for lsplit and ldist, and lsplit also thinks it improves
    speed by 2.3% rather than regressing it by 57%.

    Update is still not perfect since we do not work out that the second loop
    never iterates.

    Ifcvt wrecks the profile by design since it inserts conditionals with both
    arms 100% that will be eliminated later after vect.  It is not clear to me
    what happens in vect though.

    Bootstrapped/regtested x86_64-linux, committed.

    gcc/ChangeLog:

            PR middle-end/106923
            * tree-ssa-loop-split.cc (connect_loops): Change probability
            of the test preconditioning second loop to very_likely.
            (fix_loop_bb_probability): Correctly handle the case where
            one of the arms of the conditional is empty.
            (split_loop): Fold the test guarding the first condition to
            see if it is constant true; set correct entry block
            probabilities of the split loops; determine correct loop
            exit probabilities.

    gcc/testsuite/ChangeLog:

            PR middle-end/106293
            * gcc.dg/tree-prof/loop-split-1.c: New test.
            * gcc.dg/tree-prof/loop-split-2.c: New test.
            * gcc.dg/tree-prof/loop-split-3.c: New test.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2023-07-28  7:22 ` rguenther at suse dot de
@ 2023-07-28  8:01   ` Jan Hubicka
  0 siblings, 0 replies; 31+ messages in thread
From: Jan Hubicka @ 2023-07-28  8:01 UTC (permalink / raw)
  To: rguenther at suse dot de; +Cc: gcc-bugs

> This heuristic wants to catch
> 
>   <sink stmt>
>   if (foo) abort ();
>   <place to sink>
> 
> and avoid sinking "too far" across a path with "similar enough"
> execution count (I think the original motivation was to fix some
> spilling / register pressure issue).  The loop depth test
> should be !(bb_loop_depth (best_bb) < bb_loop_depth (early_bb))

I am still concerned that loop_depth (bb1) < loop_depth (bb2)
does not really imply that bb1 is not in a different loop nest with a
loop with a significantly higher iteration count than bb2...
> so we shouldn't limit sinking to a more outer nest.  As we rule
> out > before this becomes ==.
> 
> It looks tempting to sink to the earliest place with the same
> execution count rather than the latest but the above doesn't
> really achive that (it doesn't look "upwards" but simply fails).
> With a guessed profile it's also going to be hard.

The statically guessed profile works quite well for things like placement
of spills in IRA (not perfectly of course) and this looks like a similar
kind of thing.  So perhaps it could work reasonably well...
> 
> And it in no way implements register pressure / spilling sensitivity
> (see also Ajits attempts at producing a patch that avoids sinking
> across a call).  All these are ultimatively doomed unless we at least
> consider a group of stmts together.

hmm, life is hard :)
Honza

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2023-07-28  7:51 ` cvs-commit at gcc dot gnu.org
@ 2023-07-28  8:01 ` hubicka at ucw dot cz
  2023-07-28 12:09 ` rguenther at suse dot de
                   ` (9 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: hubicka at ucw dot cz @ 2023-07-28  8:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #19 from Jan Hubicka <hubicka at ucw dot cz> ---
> This heuristic wants to catch
> 
>   <sink stmt>
>   if (foo) abort ();
>   <place to sink>
> 
> and avoid sinking "too far" across a path with "similar enough"
> execution count (I think the original motivation was to fix some
> spilling / register pressure issue).  The loop depth test
> should be !(bb_loop_depth (best_bb) < bb_loop_depth (early_bb))

I am still concerned that loop_depth (bb1) < loop_depth (bb2)
does not really imply that bb1 is not in a different loop nest with a
loop with a significantly higher iteration count than bb2...
> so we shouldn't limit sinking to a more outer nest.  As we rule
> out > before this becomes ==.
> 
> It looks tempting to sink to the earliest place with the same
> execution count rather than the latest but the above doesn't
> really achive that (it doesn't look "upwards" but simply fails).
> With a guessed profile it's also going to be hard.

The statically guessed profile works quite well for things like placement
of spills in IRA (not perfectly of course) and this looks like a similar
kind of thing.  So perhaps it could work reasonably well...
> 
> And it in no way implements register pressure / spilling sensitivity
> (see also Ajits attempts at producing a patch that avoids sinking
> across a call).  All these are ultimatively doomed unless we at least
> consider a group of stmts together.

hmm, life is hard :)
Honza

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (18 preceding siblings ...)
  2023-07-28  8:01 ` hubicka at ucw dot cz
@ 2023-07-28 12:09 ` rguenther at suse dot de
  2023-07-31  7:44 ` hubicka at gcc dot gnu.org
                   ` (8 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: rguenther at suse dot de @ 2023-07-28 12:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #20 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 28 Jul 2023, hubicka at ucw dot cz wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293
> 
> --- Comment #19 from Jan Hubicka <hubicka at ucw dot cz> ---
> > This heuristic wants to catch
> > 
> >   <sink stmt>
> >   if (foo) abort ();
> >   <place to sink>
> > 
> > and avoid sinking "too far" across a path with "similar enough"
> > execution count (I think the original motivation was to fix some
> > spilling / register pressure issue).  The loop depth test
> > should be !(bb_loop_depth (best_bb) < bb_loop_depth (early_bb))
> 
> I am still concenred that loop_depth (bb1) < loop_depth (bb2)
> does not really imply that bb1 is not in different loop nest with
> loop with significantly higher iteration count than bb2...

True, so it probably should be instead

  !flow_loop_nested_p (early_bb->loop_father, best_bb->loop_father)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (19 preceding siblings ...)
  2023-07-28 12:09 ` rguenther at suse dot de
@ 2023-07-31  7:44 ` hubicka at gcc dot gnu.org
  2023-07-31 15:39 ` jamborm at gcc dot gnu.org
                   ` (7 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-07-31  7:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #21 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Fixing the loop distribution and vectorizer profile updates seems to do the
trick with profile feedback.  Without it we are still worse than in July last
year on the zen2 tester (zen3 and Ice Lake seem to behave differently, perhaps
due to different vectorization decisions).

https://lnt.opensuse.org/db_default/v4/SPEC/graph?highlight_run=38536&plot.0=476.180.0

shows two jumps last year.
g:d489ec082ea21410 (2022-06-30 16:46) and 3731dd0bea8994c3 (2022-07-04 00:16)
g:3731dd0bea8994c3 (2022-07-04 00:16) and 07dd0f7ba27d1fe9 (2022-07-05 14:05)

Which seems both different from the patch listed (which is even older).
Optically it seems that second jump is gone, but it is hard to tell a year
later.
Martin, it would be great to bisect these two.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (20 preceding siblings ...)
  2023-07-31  7:44 ` hubicka at gcc dot gnu.org
@ 2023-07-31 15:39 ` jamborm at gcc dot gnu.org
  2023-08-01 10:40 ` hubicka at gcc dot gnu.org
                   ` (6 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: jamborm at gcc dot gnu.org @ 2023-07-31 15:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #22 from Martin Jambor <jamborm at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #21)
> Fixing loop distribution and vectorizer profile update seems to do the trick
> with profile feedback. Without we are still worse than in July last year on
> zen2 tester (zen3 and ice lake seems to behave differently perhaps due to
> different vectorization decisions)
> 
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?highlight_run=38536&plot.
> 0=476.180.0
> 
> shows two jumps last year.
> g:d489ec082ea21410 (2022-06-30 16:46) and 3731dd0bea8994c3 (2022-07-04 00:16)

On a machine very similar to lntzen3, the hmmer binary built with these
two revisions ran for pretty much the same time.

> g:3731dd0bea8994c3 (2022-07-04 00:16) and 07dd0f7ba27d1fe9 (2022-07-05 14:05)

Bisecting in this range led to g:d2a89866660 but that is the commit
referenced in the summary of this bug.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (21 preceding siblings ...)
  2023-07-31 15:39 ` jamborm at gcc dot gnu.org
@ 2023-08-01 10:40 ` hubicka at gcc dot gnu.org
  2023-08-02  8:48 ` hubicka at gcc dot gnu.org
                   ` (5 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-08-01 10:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #23 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Thanks,
I think I will need to work out the remaining vectorizer problems.  One issue
seems to be the interaction with loop distribution.  Loop distribution seems to
introduce alias checks that are later removed by the vectorizer, but I suspect
the profile is not compensated back.

Another problem is that lsplit produces a loop iterating once (for the last
iteration) and does not update the loop info accordingly (since it really lacks
the analysis to discover this).  These loops seem to survive into ivopts:

fast_algorithms.c.182t.ivopts:;;  iterations by profile: 5.312499 (unreliable, maybe flat)
fast_algorithms.c.182t.ivopts:;;  iterations by profile: 0.009495 (unreliable, maybe flat)
fast_algorithms.c.182t.ivopts:;;  iterations by profile: 0.009495 (unreliable, maybe flat)
fast_algorithms.c.182t.ivopts:;;  iterations by profile: 0.009495 (unreliable, maybe flat)
fast_algorithms.c.182t.ivopts:;;  iterations by profile: 0.009495 (unreliable, maybe flat)
fast_algorithms.c.182t.ivopts:;;  iterations by profile: 100.000008 (unreliable)
fast_algorithms.c.182t.ivopts:;;  iterations by profile: 100.000000 (unreliable)
fast_algorithms.c.182t.ivopts:;;  iterations by profile: 9.662853 (unreliable, maybe flat)
fast_algorithms.c.182t.ivopts:;;  iterations by profile: 4.646072 (unreliable)
fast_algorithms.c.182t.ivopts:;;  iterations by profile: 100.000007 (unreliable)
fast_algorithms.c.182t.ivopts:;;  iterations by profile: 5.312500 (unreliable)
fast_algorithms.c.182t.ivopts:;;  iterations by profile: 473.497707 (reliable)
fast_algorithms.c.182t.ivopts:;;  iterations by profile: 100.999596 (reliable)

which is obviously a bad idea.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (22 preceding siblings ...)
  2023-08-01 10:40 ` hubicka at gcc dot gnu.org
@ 2023-08-02  8:48 ` hubicka at gcc dot gnu.org
  2023-08-02  9:42 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-08-02  8:48 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #24 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
g:2e93b92c1ec5fbbbe10765c6e059c3c90d564245 fixes the profile update after
cancelled distribution.  However it does not help hmmer since we actually
vectorize that loop iterating 0 times.  We need to figure out the proper
iteration count for it and convince loop distribution and the vectorizer to
do nothing on it.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13/14 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (23 preceding siblings ...)
  2023-08-02  8:48 ` hubicka at gcc dot gnu.org
@ 2023-08-02  9:42 ` rguenth at gcc dot gnu.org
  2023-08-04 10:09 ` [Bug tree-optimization/106293] [13 regression] " hubicka at gcc dot gnu.org
                   ` (3 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-08-02  9:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #25 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jan Hubicka from comment #24)
> g:2e93b92c1ec5fbbbe10765c6e059c3c90d564245 fixes the profile update after
> cancelled distribution. However it does not help hmmer since we actually
> vectorize that loop iterating 0 times.  We need to figure out proper
> iteration count for that and convince loop distribution and vectorizer to do
> nothing on it.

or use a smaller VF?  I doubt the scalar loop iterates zero times?

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13 regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (24 preceding siblings ...)
  2023-08-02  9:42 ` rguenth at gcc dot gnu.org
@ 2023-08-04 10:09 ` hubicka at gcc dot gnu.org
  2023-08-07  8:56 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  28 siblings, 0 replies; 31+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-08-04 10:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[13/14 Regression]          |[13 regression] 456.hmmer
                   |456.hmmer at -Ofast         |at -Ofast -march=native
                   |-march=native regressed by  |regressed by 19% on zen2
                   |19% on zen2 and zen3 in     |and zen3 in July 2022
                   |July 2022                   |

--- Comment #26 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
We are finally out of the regression, but there are still several things to fix:
 1) the vectorizer produces a corrupt profile
 2) loop-split is not able to work out that it splits off the last iteration
 3) we work way too hard optimizing loops iterating 0 times.

The loop in question really iterates zero times.  It is created by loop split
from the internal loop:

        for (k = 1; k <= M; k++) {
          mc[k] = mpp[k-1]   + tpmm[k-1];
          if ((sc = ip[k-1]  + tpim[k-1]) > mc[k])  mc[k] = sc;
          if ((sc = dpp[k-1] + tpdm[k-1]) > mc[k])  mc[k] = sc;
          if ((sc = xmb  + bp[k])         > mc[k])  mc[k] = sc;
          mc[k] += ms[k];
          if (mc[k] < -INFTY) mc[k] = -INFTY;

          dc[k] = dc[k-1] + tpdd[k-1];
          if ((sc = mc[k-1] + tpmd[k-1]) > dc[k]) dc[k] = sc;
          if (dc[k] < -INFTY) dc[k] = -INFTY;

          if (k < M) {
            ic[k] = mpp[k] + tpmi[k];
            if ((sc = ip[k] + tpii[k]) > ic[k]) ic[k] = sc;
            ic[k] += is[k];
            if (ic[k] < -INFTY) ic[k] = -INFTY;
          }

It peels off the last iteration.  The for condition is
 if (k <= M)
while we split on
 if (k < M)
M is a variable and nothing seems to be able to optimize out the second loop
after splitting.
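
Roughly, the shape loop splitting produces here is the following (a
hand-written sketch of the control structure, not the actual GIMPLE):

        /* First loop: original bound k <= M combined with the split
           condition k < M; the ic[] update no longer needs its guard,
           so this copy can be vectorized.  */
        for (k = 1; k < M; k++) {
          /* ... mc[k], dc[k] updates as above ... */
          ic[k] = mpp[k] + tpmi[k];
        }
        /* Second loop: only k == M is left, so it runs at most once,
           but loop split does not record that upper bound.  */
        for (; k <= M; k++) {
          /* ... mc[k], dc[k] updates as above, without the ic[] part ... */
        }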

My plan is to add the pattern match so loop split gets this right and records
an upper bound on the iteration count, but first I want to show the other bugs
exposed by this scenario.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13 regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (25 preceding siblings ...)
  2023-08-04 10:09 ` [Bug tree-optimization/106293] [13 regression] " hubicka at gcc dot gnu.org
@ 2023-08-07  8:56 ` cvs-commit at gcc dot gnu.org
  2023-08-10 16:01 ` hubicka at gcc dot gnu.org
  2024-05-21  9:11 ` jakub at gcc dot gnu.org
  28 siblings, 0 replies; 31+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-08-07  8:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #27 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jan Hubicka <hubicka@gcc.gnu.org>:

https://gcc.gnu.org/g:73c14db6d1a8c1267137b94c41f2e2c9410dcbb1

commit r14-3015-g73c14db6d1a8c1267137b94c41f2e2c9410dcbb1
Author: Jan Hubicka <jh@suse.cz>
Date:   Mon Aug 7 10:55:58 2023 +0200

    Fix profile update after versioning ifconverted loop

    If a loop is if-converted and later versioned by the vectorizer, the
    vectorizer will reuse the scalar loop produced by if-conversion.  Curiously
    enough, it does not seem to do so for versions produced by loop
    distribution, even though for loop distribution this matters (since both
    ldist versions survive to the final code), while after ifcvt it does not
    (since we remove the non-vectorized path).

    This patch fixes the associated profile update.  Here it is necessary to
    scale both arms of the conditional according to the runtime checks
    inserted.  We got the loop body partly right, but not the preheader block
    and the block after the exit.  The first is particularly bad since it
    changes the loop iteration estimates.
    So we now turn 4 original loops:
      loop 1: iterations by profile: 473.497707 (reliable) entry count:84821 (precise, freq 0.9979)
      loop 2: iterations by profile: 100.000000 (reliable) entry count:39848881 (precise, freq 468.8104)
      loop 3: iterations by profile: 100.000000 (reliable) entry count:39848881 (precise, freq 468.8104)
      loop 4: iterations by profile: 100.999596 (reliable) entry count:84167 (precise, freq 0.9902)

    into the following loops:
      iterations by profile: 5.312499 (unreliable, maybe flat) entry count:12742188 (guessed, freq 149.9081)
         vectorized and split loop 1, peeled
      iterations by profile: 0.009496 (unreliable, maybe flat) entry count:374798 (guessed, freq 4.4094)
         split loop 1 (last iteration), peeled
      iterations by profile: 100.000008 (unreliable) entry count:3945039 (guessed, freq 46.4122)
         scalar version of loop 1
      iterations by profile: 100.000007 (unreliable) entry count:7101070 (guessed, freq 83.5420)
         redundant scalar version of loop 1, which we could eliminate if the vectorizer understood ldist
      iterations by profile: 100.000000 (unreliable) entry count:35505353 (guessed, freq 417.7100)
         unvectorized loop 2
      iterations by profile: 5.312500 (unreliable) entry count:25563855 (guessed, freq 300.7512)
         vectorized loop 2, not peeled (hits max-peel-insns)
      iterations by profile: 100.000007 (unreliable) entry count:7101070 (guessed, freq 83.5420)
         unvectorized loop 3
      iterations by profile: 5.312500 (unreliable) entry count:25563855 (guessed, freq 300.7512)
         vectorized loop 3, not peeled (hits max-peel-insns)
      iterations by profile: 473.497707 (reliable) entry count:84821 (precise, freq 0.9979)
         loop 1
      iterations by profile: 100.999596 (reliable) entry count:84167 (precise, freq 0.9902)
         loop 4

    With this change we are at 0 profile errors on the hmmer benchmark:

    Pass dump id |dynamic mismatch          |overall                             |
                 |in count                  |size            |time               |
    172t ch_vect |            0             |      996       | 385812023346      |
    173t ifcvt   |     71010686    +71010686|     1021  +2.5%| 468361969416 +21.4%|
    174t vect    |    210830784   +139820098|     1497 +46.6%| 216073467874 -53.9%|
    175t dce     |    210830784             |     1387  -7.3%| 205273170281  -5.0%|
    176t pcom    |    210830784             |     1387       | 201722634966  -1.7%|
    177t cunroll |            0   -210830784|     1443  +4.0%| 180441501289 -10.5%|
    182t ivopts  |            0             |     1385  -4.0%| 136412345683 -24.4%|
    183t lim     |            0             |     1389  +0.3%| 135093950836  -1.0%|
    192t reassoc |            0             |     1381  -0.6%| 134778347700  -0.2%|
    193t slsr    |            0             |     1380  -0.1%| 134738100330  -0.0%|
    195t tracer  |            0             |     1521 +10.2%| 134738179146  +0.0%|
    196t fre     |      2680654     +2680654|     1489  -2.1%| 134659672725  -0.1%|
    198t dom     |      5361308     +2680654|     1473  -1.1%| 134449553658  -0.2%|
    201t vrp     |      5361308             |     1474  +0.1%| 134489004050  +0.0%|
    202t ccp     |      5361308             |     1472  -0.1%| 134440752274  -0.0%|
    204t dse     |      5361308             |     1444  -1.9%| 133802300525  -0.5%|
    206t forwprop|      5361308             |     1433  -0.8%| 133542828370  -0.2%|
    207t sink    |      5361308             |     1431  -0.1%| 133542658728  -0.0%|
    211t store-me|      5361308             |     1430  -0.1%| 133542573728  -0.0%|
    212t cddce   |      5361308             |     1428  -0.1%| 133541776728  -0.0%|
    258r expand  |      5361308             |----------------|-------------------|
    260r into_cfg|      5361308             |     9334  -0.8%| 885820707913  -0.6%|
    261r jump    |      5361308             |     9330  -0.0%| 885820367913  -0.0%|
    265r fwprop1 |      5361308             |     9206  -1.3%| 876756504385  -1.0%|
    267r rtl pre |      5361308             |     9210  +0.0%| 876914305953  +0.0%|
    269r cprop   |      5361308             |     9202  -0.1%| 876756165101  -0.0%|
    271r cse_loca|      5361308             |     9198  -0.0%| 876727760821  -0.0%|
    272r ce1     |      5361308             |     9126  -0.8%| 875726815885  -0.1%|
    276r loop2_in|      5361308             |     9167  +0.4%| 873573110570  -0.2%|
    282r cprop   |      5361308             |     9095  -0.8%| 871937317262  -0.2%|
    284r cse2    |      5361308             |     9091  -0.0%| 871936977978  -0.0%|
    285r dse1    |      5361308             |     9067  -0.3%| 871437031602  -0.1%|
    290r combine |      5361308             |     9071  +0.0%| 869206278202  -0.3%|
    292r stv     |      5361308             |    17157 +89.1%| 2111071925708 +142.9%|
    295r bbpart  |      5361308             |    17161  +0.0%| 2111071925708       |
    296r outof_cf|      5361308             |    17233  +0.4%| 2111655121000  +0.0%|
    297r split1  |      5361308             |    17245  +0.1%| 2111656138852  +0.0%|
    306r ira     |      5361308             |    19189 +11.3%| 2136098398308  +1.2%|
    307r reload  |      5361308             |    12101 -36.9%| 981091222830  -54.1%|
    309r postrelo|      5361308             |    12019  -0.7%| 978750345475   -0.2%|
    310r gcse2   |      5361308             |    12027  +0.1%| 978329108320   -0.0%|
    311r split2  |      5361308             |    12023  -0.0%| 978507631352   +0.0%|
    312r ree     |      5361308             |    12027  +0.0%| 978505414244   -0.0%|
    313r cmpelim |      5361308             |    11979  -0.4%| 977531601988   -0.1%|
    314r pro_and_|      5361308             |    12091  +0.9%| 977541801988   +0.0%|
    315r dse2    |      5361308             |    12091       | 977541801988        |
    316r csa     |      5361308             |    12087  -0.0%| 977541461988   -0.0%|
    317r jump2   |      5361308             |    12039  -0.4%| 977683176572   +0.0%|
    318r compgoto|      5361308             |    12039       | 977683176572        |
    320r peephole|      5361308             |    12047  +0.1%| 977362727612   -0.0%|
    321r ce3     |      5361308             |    12047       | 977362727612        |
    323r cprop_ha|      5361308             |    11907  -1.2%| 968751076676   -0.9%|
    324r rtl_dce |      5361308             |    11903  -0.0%| 968593274820   -0.0%|
    325r bbro    |      5361308             |    11883  -0.2%| 967964046644   -0.1%|

    Bootstrapped/regtested on x86_64-linux; I plan to commit it tomorrow if
    there are no complaints.

    gcc/ChangeLog:

            PR tree-optimization/106293
            * tree-vect-loop-manip.cc (vect_loop_versioning): Fix profile update.
            * tree-vect-loop.cc (vect_transform_loop): Likewise.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/106293
            * gcc.dg/vect/vect-cond-11.c: Check profile consistency.
            * gcc.dg/vect/vect-widen-mult-extern-1.c: Check profile consistency.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13 regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (26 preceding siblings ...)
  2023-08-07  8:56 ` cvs-commit at gcc dot gnu.org
@ 2023-08-10 16:01 ` hubicka at gcc dot gnu.org
  2024-05-21  9:11 ` jakub at gcc dot gnu.org
  28 siblings, 0 replies; 31+ messages in thread
From: hubicka at gcc dot gnu.org @ 2023-08-10 16:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=110975

--- Comment #28 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Added a PR for the missed iteration count.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Bug tree-optimization/106293] [13 regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
  2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
                   ` (27 preceding siblings ...)
  2023-08-10 16:01 ` hubicka at gcc dot gnu.org
@ 2024-05-21  9:11 ` jakub at gcc dot gnu.org
  28 siblings, 0 replies; 31+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-05-21  9:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|13.3                        |13.4

--- Comment #29 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
GCC 13.3 is being released, retargeting bugs to GCC 13.4.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2024-05-21  9:11 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-14  9:08 [Bug tree-optimization/106293] New: 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022 jamborm at gcc dot gnu.org
2022-07-14  9:22 ` [Bug tree-optimization/106293] [13 Regression] " rguenth at gcc dot gnu.org
2022-07-14 12:10 ` rguenth at gcc dot gnu.org
2022-07-14 12:22 ` rguenth at gcc dot gnu.org
2022-07-25  9:44 ` luoxhu at gcc dot gnu.org
2022-07-25  9:46 ` luoxhu at gcc dot gnu.org
2023-01-10 12:12 ` yann at ywg dot ch
2023-01-10 12:45 ` rguenth at gcc dot gnu.org
2023-01-10 15:53 ` cvs-commit at gcc dot gnu.org
2023-01-10 15:54 ` rguenth at gcc dot gnu.org
2023-01-11  7:04 ` cvs-commit at gcc dot gnu.org
2023-04-17 15:11 ` [Bug tree-optimization/106293] [13/14 " jakub at gcc dot gnu.org
2023-04-17 16:15 ` jamborm at gcc dot gnu.org
2023-04-26  6:56 ` rguenth at gcc dot gnu.org
2023-07-27  9:23 ` rguenth at gcc dot gnu.org
2023-07-27 18:01 ` hubicka at gcc dot gnu.org
2023-07-27 21:38 ` hubicka at gcc dot gnu.org
2023-07-28  7:22 ` rguenther at suse dot de
2023-07-28  8:01   ` Jan Hubicka
2023-07-28  7:51 ` cvs-commit at gcc dot gnu.org
2023-07-28  8:01 ` hubicka at ucw dot cz
2023-07-28 12:09 ` rguenther at suse dot de
2023-07-31  7:44 ` hubicka at gcc dot gnu.org
2023-07-31 15:39 ` jamborm at gcc dot gnu.org
2023-08-01 10:40 ` hubicka at gcc dot gnu.org
2023-08-02  8:48 ` hubicka at gcc dot gnu.org
2023-08-02  9:42 ` rguenth at gcc dot gnu.org
2023-08-04 10:09 ` [Bug tree-optimization/106293] [13 regression] " hubicka at gcc dot gnu.org
2023-08-07  8:56 ` cvs-commit at gcc dot gnu.org
2023-08-10 16:01 ` hubicka at gcc dot gnu.org
2024-05-21  9:11 ` jakub at gcc dot gnu.org
