From mboxrd@z Thu Jan  1 00:00:00 1970
From: "luoxhu at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/106293] [13 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
Date: Mon, 25 Jul 2022 09:44:25 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #4 from luoxhu at gcc dot gnu.org ---
(In reply to Richard Biener from comment #2)
> I can reproduce a regression with -Ofast -march=znver2 running on Haswell
> as well.  -fopt-info doesn't reveal anything interesting besides
>
> -fast_algorithms.c:133:19: optimized: loop with 2 iterations completely
> unrolled (header execution count 32987933)
> +fast_algorithms.c:133:19: optimized: loop with 2 iterations completely
> unrolled (header execution count 129072791)
>
> obviously the slowdown is in P7Viterbi.  There are only minimal changes on
> the GIMPLE side, one notable:
>
>   niters_vector_mult_vf.205_2406 = niters.203_442 & 429496729  |    _2041 = niters.203_438 & 3;
>   _2408 = (int) niters_vector_mult_vf.205_2406;                |    if (_2041 == 0)
>   tmp.206_2407 = k_384 + _2408;                                |      goto <bb 66>; [25.00%]
>   _2300 = niters.203_442 & 3;                                  <
>   if (_2300 == 0)                                              <
>     goto ; [25.00%]                                            <
>   else                                                              else
>     goto <bb 36>; [75.00%]                                            goto <bb 36>; [75.00%]
>
>   [local count: 41646173]:                                     |    [local count: 177683003]:
>   # k_2403 = PHI                                               |    niters_vector_mult_vf.205_2409 = niters.203_438 & 429496729
>   # DEBUG k => k_2403                                          |    _2411 = (int) niters_vector_mult_vf.205_2409;
>                                                                >    tmp.206_2410 = k_382 + _2411;
>                                                                >
>                                                                >    [local count: 162950122]:
>                                                                >    # k_2406 = PHI
>
> the sink pass now does the transform where it did not do so before.
>
> That's apparently because of
>
>   /* If BEST_BB is at the same nesting level, then require it to have
>      significantly lower execution frequency to avoid gratuitous movement.  */
>   if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
>       /* If result of comparsion is unknown, prefer EARLY_BB.
>          Thus use !(...>=..) rather than (...<...) */
>       && !(best_bb->count * 100 >= early_bb->count * threshold))
>     return best_bb;
>
>   /* No better block found, so return EARLY_BB, which happens to be the
>      statement's original block.  */
>   return early_bb;
>
> where the SRC count is 96726596 before and 236910671 after, and the
> destination count is 72544947 before and 177683003 after.  The edge
> probabilities are 75% vs 25% and param_sink_frequency_threshold is exactly
> 75 as well.  Since 236910671*0.75 is rounded down (to 177683003) it passes
> the test, while the previous state has an exact match defeating it.
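Plugging those counts into the quoted check makes the flip easy to see.  Below
is a small standalone sketch (plain int64_t values stand in for GCC's
profile_count type, and the function/variable names are only labels for the
numbers above, not GCC API):

  /* Standalone sketch of the threshold test quoted above, fed with the
     block counts from the dump.  */
  #include <cstdint>
  #include <cstdio>

  /* Mirrors !(best_bb->count * 100 >= early_bb->count * threshold).  */
  static bool
  sink_allowed (int64_t best_count, int64_t early_count, int64_t threshold)
  {
    return !(best_count * 100 >= early_count * threshold);
  }

  int
  main ()
  {
    const int64_t threshold = 75;  /* param_sink_frequency_threshold */

    /* Before: 72544947 * 100 == 96726596 * 75 == 7254494700 exactly,
       so the >= test holds and the statement is not sunk.  */
    std::printf ("before: %s\n",
                 sink_allowed (72544947, 96726596, threshold) ? "sink" : "keep");

    /* After: 177683003 * 100 = 17768300300 < 236910671 * 75 = 17768300325
       (236910671 * 0.75 = 177683003.25, truncated), so sinking happens.  */
    std::printf ("after:  %s\n",
                 sink_allowed (177683003, 236910671, threshold) ? "sink" : "keep");
    return 0;
  }

The "before" pair is an exact tie, so the statement stays put; the "after"
pair misses the tie by 25 units of the scaled count, so sinking goes ahead.
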
> It's a little bit of an arbitrary choice,
>
> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> index 2e744d6ae50..9b368e13463 100644
> --- a/gcc/tree-ssa-sink.cc
> +++ b/gcc/tree-ssa-sink.cc
> @@ -230,7 +230,7 @@ select_best_block (basic_block early_bb,
>    if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
>        /* If result of comparsion is unknown, prefer EARLY_BB.
>           Thus use !(...>=..) rather than (...<...) */
> -      && !(best_bb->count * 100 >= early_bb->count * threshold))
> +      && !(best_bb->count * 100 > early_bb->count * threshold))
>      return best_bb;
>
>    /* No better block found, so return EARLY_BB, which happens to be the
>
> fixes the missed sinking but not the regression :/
>
> The count differences start to appear when LC PHI blocks are added only
> for virtuals, and then pre-existing 'Invalid sum of incoming counts'
> issues eventually lead to mismatches.  The 'Invalid sum of incoming
> counts' issues start with the loop splitting pass.
>
> fast_algorithms.c:145:10: optimized: loop split
>
> Xionghu Lou did profile count updates there, not sure if that made things
> worse in this case.
>
> At least with broken BB counts, splitting/unsplitting an edge can
> propagate bogus counts elsewhere, it seems.

:(  Could you please try reverting cd5ae148c47c6dee05adb19acd6a523f7187be7f
and see whether performance is back?
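
For what it's worth, the quoted one-character tree-ssa-sink.cc change only
makes a difference when the two scaled counts are exactly equal, which is
precisely the "before" profile above.  A minimal check of that boundary
(again with plain int64_t standing in for profile_count; the names are
illustrative only):

  /* Sketch of how ">=" vs ">" flips the decision only at an exact tie,
     using the "before" counts quoted from comment #2.  */
  #include <cstdint>
  #include <cstdio>

  int
  main ()
  {
    const int64_t best = 72544947, early = 96726596, threshold = 75;

    /* Current code: a tie satisfies >=, so !(>=) is false and the
       statement stays in EARLY_BB (no sinking).  */
    bool keep_with_ge = (best * 100 >= early * threshold);
    /* Patched code: a tie fails >, so !(>) is true and BEST_BB is
       returned (sinking allowed).  */
    bool keep_with_gt = (best * 100 > early * threshold);

    std::printf ("best*100 = %lld, early*threshold = %lld\n",
                 (long long) (best * 100), (long long) (early * threshold));
    std::printf ("keep with >=: %d, keep with >: %d\n",
                 (int) keep_with_ge, (int) keep_with_gt);
    return 0;
  }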