[Bug tree-optimization/106293] [13 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022


From: "luoxhu at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/106293] [13 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
Date: Mon, 25 Jul 2022 09:44:25 +0000
Message-ID: <bug-106293-4-W7NmBnOIdt@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-106293-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #4 from luoxhu at gcc dot gnu.org ---
(In reply to Richard Biener from comment #2)
> I can reproduce a regression with -Ofast -march=znver2 running on Haswell as
> well.  -fopt-info doesn't reveal anything interesting besides
> 
> -fast_algorithms.c:133:19: optimized: loop with 2 iterations completely unrolled (header execution count 32987933)
> +fast_algorithms.c:133:19: optimized: loop with 2 iterations completely unrolled (header execution count 129072791)
> 
> Obviously the slowdown is in P7Viterbi.  There are only minimal changes on
> the GIMPLE side, one notable:
> 
> Before:
>
>   niters_vector_mult_vf.205_2406 = niters.203_442 & 429496729
>   _2408 = (int) niters_vector_mult_vf.205_2406;
>   tmp.206_2407 = k_384 + _2408;
>   _2300 = niters.203_442 & 3;
>   if (_2300 == 0)
>     goto <bb 65>; [25.00%]
>   else
>     goto <bb 36>; [75.00%]
>
>   <bb 36> [local count: 41646173]:
>   # k_2403 = PHI <tmp.206_2407(35), tmp.239_2637(34)>
>   # DEBUG k => k_2403
>
> After:
>
>   _2041 = niters.203_438 & 3;
>   if (_2041 == 0)
>     goto <bb 66>; [25.00%]
>   else
>     goto <bb 36>; [75.00%]
>
>   <bb 36> [local count: 177683003]:
>   niters_vector_mult_vf.205_2409 = niters.203_438 & 429496729
>   _2411 = (int) niters_vector_mult_vf.205_2409;
>   tmp.206_2410 = k_382 + _2411;
>
>   <bb 37> [local count: 162950122]:
>   # k_2406 = PHI <tmp.206_2410(36), tmp.239_2639(34)>
> 
> the sink pass now does the transform where it did not do so before.
> 
> That's apparently because of
> 
>   /* If BEST_BB is at the same nesting level, then require it to have
>      significantly lower execution frequency to avoid gratuitous movement.  */
>   if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
>       /* If result of comparsion is unknown, prefer EARLY_BB.
>          Thus use !(...>=..) rather than (...<...)  */
>       && !(best_bb->count * 100 >= early_bb->count * threshold))
>     return best_bb;
> 
>   /* No better block found, so return EARLY_BB, which happens to be the
>      statement's original block.  */
>   return early_bb;
> 
> where the source count is 96726596 before and 236910671 after, and the
> destination count is 72544947 before and 177683003 after.
> The edge probabilities are 75% vs 25% and param_sink_frequency_threshold
> is exactly 75 as well.  Since 236910671*0.75 is rounded down, the new
> state passes the test, while the previous state has an exact match,
> which defeats it.
> 
> It's a little bit of an arbitrary choice,
> 
> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> index 2e744d6ae50..9b368e13463 100644
> --- a/gcc/tree-ssa-sink.cc
> +++ b/gcc/tree-ssa-sink.cc
> @@ -230,7 +230,7 @@ select_best_block (basic_block early_bb,
>    if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
>        /* If result of comparsion is unknown, prefer EARLY_BB.
>          Thus use !(...>=..) rather than (...<...)  */
> -      && !(best_bb->count * 100 >= early_bb->count * threshold))
> +      && !(best_bb->count * 100 > early_bb->count * threshold))
>      return best_bb;
>  
>    /* No better block found, so return EARLY_BB, which happens to be the
> 
> fixes the missed sinking but not the regression :/
> 
> The count differences start to appear when LC PHI blocks are added
> only for virtuals, and then pre-existing 'Invalid sum of incoming counts'
> eventually lead to mismatches.  The 'Invalid sum of incoming counts'
> start with the loop splitting pass.
> 
> fast_algorithms.c:145:10: optimized: loop split
> 
> Xionghu Lou did profile count updates there, not sure if that made things
> worse in this case.
> 
> At least with broken BB counts splitting/unsplitting an edge can propagate
> bogus counts elsewhere it seems.

:( Could you please try reverting cd5ae148c47c6dee05adb19acd6a523f7187be7f and
see whether performance is back?
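
For reference, the threshold arithmetic from comment #2 can be checked with a small stand-alone sketch. It is not GCC code: the helper names sinks_to_best_bb and scale_count are made up for illustration, and plain 64-bit integers stand in for GCC's profile_count, so the "result of comparison is unknown" case mentioned in the source comment is not modelled. The counts and the threshold of 75 are the ones quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the test in select_best_block.  Returns true
   when the statement would be sunk to BEST_BB.
   strict == false models the existing test, !(best * 100 >= early * threshold).
   strict == true models the proposed test,  !(best * 100 >  early * threshold).  */
static bool
sinks_to_best_bb (uint64_t best_count, uint64_t early_count,
                  uint64_t threshold, bool strict)
{
  uint64_t lhs = best_count * 100;
  uint64_t rhs = early_count * threshold;
  return strict ? !(lhs > rhs) : !(lhs >= rhs);
}

/* Assumed model of how the destination count follows from the source count
   and an integer edge percentage: integer division, so it rounds down.  */
static uint64_t
scale_count (uint64_t count, uint64_t percent)
{
  return count * percent / 100;
}
```

With these helpers, scale_count (96726596, 75) is exactly 72544947, so the existing test sees 7254494700 >= 7254494700 and does not sink. scale_count (236910671, 75) rounds 177683003.25 down to 177683003, so the test sees 17768300300 against 17768300325 and sinks. With the strict variant, both states sink.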


