From mboxrd@z Thu Jan  1 00:00:00 1970
From: "luoxhu at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/106293] [13 Regression] 456.hmmer at -Ofast -march=native regressed by 19% on zen2 and zen3 in July 2022
Date: Mon, 25 Jul 2022 09:44:25 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106293

--- Comment #4 from luoxhu at gcc dot gnu.org ---
(In reply to Richard Biener from comment #2)
> I can reproduce a regression with -Ofast -march=znver2 running on Haswell
> as well.  -fopt-info doesn't reveal anything interesting besides
>
> -fast_algorithms.c:133:19: optimized: loop with 2 iterations completely
> unrolled (header execution count 32987933)
> +fast_algorithms.c:133:19: optimized: loop with 2 iterations completely
> unrolled (header execution count 129072791)
>
> obviously the slowdown is in P7Viterbi.  There are only minimal changes on
> the GIMPLE side, one notable:
>
>   niters_vector_mult_vf.205_2406 = niters.203_442 & 429496729  |    _2041 = niters.203_438 & 3;
>   _2408 = (int) niters_vector_mult_vf.205_2406;                |    if (_2041 == 0)
>   tmp.206_2407 = k_384 + _2408;                                |      goto <bb 66>; [25.00%]
>   _2300 = niters.203_442 & 3;                                  <
>   if (_2300 == 0)                                              <
>     goto ; [25.00%]                                            <
>   else                                                              else
>     goto <bb 36>; [75.00%]                                            goto <bb 36>; [75.00%]
>
>   [local count: 41646173]:                                     |    [local count: 177683003]:
>   # k_2403 = PHI                                               |    niters_vector_mult_vf.205_2409 = niters.203_438 & 429496729
>   # DEBUG k => k_2403                                          |    _2411 = (int) niters_vector_mult_vf.205_2409;
>                                                                >    tmp.206_2410 = k_382 + _2411;
>                                                                >
>                                                                >    [local count: 162950122]:
>                                                                >    # k_2406 = PHI
>
> the sink pass now does the transform where it did not do so before.
>
> That's apparently because of
>
>   /* If BEST_BB is at the same nesting level, then require it to have
>      significantly lower execution frequency to avoid gratuitous movement.  */
>   if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
>       /* If result of comparsion is unknown, prefer EARLY_BB.
>          Thus use !(...>=..) rather than (...<...) */
>       && !(best_bb->count * 100 >= early_bb->count * threshold))
>     return best_bb;
>
>   /* No better block found, so return EARLY_BB, which happens to be the
>      statement's original block.  */
>   return early_bb;
>
> where the SRC count is 96726596 before and 236910671 after, and the
> destination count is 72544947 before and 177683003 after.  The edge
> probabilities are 75% vs 25% and param_sink_frequency_threshold is exactly
> 75 as well.  Since 236910671*0.75 is rounded down (to 177683003) it passes
> the test, while the previous state has an exact match defeating it.
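Plugging those counts into the quoted check makes the flip easy to see.  Below
is a small standalone sketch (plain int64_t values stand in for GCC's
profile_count type, and the function/variable names are only labels for the
numbers above, not GCC API):

  /* Standalone sketch of the threshold test quoted above, fed with the
     block counts from the dump.  */
  #include <cstdint>
  #include <cstdio>

  /* Mirrors !(best_bb->count * 100 >= early_bb->count * threshold).  */
  static bool
  sink_allowed (int64_t best_count, int64_t early_count, int64_t threshold)
  {
    return !(best_count * 100 >= early_count * threshold);
  }

  int
  main ()
  {
    const int64_t threshold = 75;  /* param_sink_frequency_threshold */

    /* Before: 72544947 * 100 == 96726596 * 75 == 7254494700 exactly,
       so the >= test holds and the statement is not sunk.  */
    std::printf ("before: %s\n",
                 sink_allowed (72544947, 96726596, threshold) ? "sink" : "keep");

    /* After: 177683003 * 100 = 17768300300 < 236910671 * 75 = 17768300325
       (236910671 * 0.75 = 177683003.25, truncated), so sinking happens.  */
    std::printf ("after:  %s\n",
                 sink_allowed (177683003, 236910671, threshold) ? "sink" : "keep");
    return 0;
  }

The "before" pair is an exact tie, so the statement stays put; the "after"
pair misses the tie by 25 units of the scaled count, so sinking goes ahead.
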
> It's a little bit of an arbitrary choice,
>
> diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
> index 2e744d6ae50..9b368e13463 100644
> --- a/gcc/tree-ssa-sink.cc
> +++ b/gcc/tree-ssa-sink.cc
> @@ -230,7 +230,7 @@ select_best_block (basic_block early_bb,
>    if (bb_loop_depth (best_bb) == bb_loop_depth (early_bb)
>        /* If result of comparsion is unknown, prefer EARLY_BB.
>           Thus use !(...>=..) rather than (...<...) */
> -      && !(best_bb->count * 100 >= early_bb->count * threshold))
> +      && !(best_bb->count * 100 > early_bb->count * threshold))
>      return best_bb;
>
>    /* No better block found, so return EARLY_BB, which happens to be the
>
> fixes the missed sinking but not the regression :/
>
> The count differences start to appear when LC PHI blocks are added only
> for virtuals, and then pre-existing 'Invalid sum of incoming counts'
> issues eventually lead to mismatches.  The 'Invalid sum of incoming
> counts' issues start with the loop splitting pass.
>
> fast_algorithms.c:145:10: optimized: loop split
>
> Xionghu Lou did profile count updates there, not sure if that made things
> worse in this case.
>
> At least with broken BB counts, splitting/unsplitting an edge can
> propagate bogus counts elsewhere, it seems.

:(  Could you please try reverting cd5ae148c47c6dee05adb19acd6a523f7187be7f
and see whether performance is back?
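
For what it's worth, the quoted one-character tree-ssa-sink.cc change only
makes a difference when the two scaled counts are exactly equal, which is
precisely the "before" profile above.  A minimal check of that boundary
(again with plain int64_t standing in for profile_count; the names are
illustrative only):

  /* Sketch of how ">=" vs ">" flips the decision only at an exact tie,
     using the "before" counts quoted from comment #2.  */
  #include <cstdint>
  #include <cstdio>

  int
  main ()
  {
    const int64_t best = 72544947, early = 96726596, threshold = 75;

    /* Current code: a tie satisfies >=, so !(>=) is false and the
       statement stays in EARLY_BB (no sinking).  */
    bool keep_with_ge = (best * 100 >= early * threshold);
    /* Patched code: a tie fails >, so !(>) is true and BEST_BB is
       returned (sinking allowed).  */
    bool keep_with_gt = (best * 100 > early * threshold);

    std::printf ("best*100 = %lld, early*threshold = %lld\n",
                 (long long) (best * 100), (long long) (early * threshold));
    std::printf ("keep with >=: %d, keep with >: %d\n",
                 (int) keep_with_ge, (int) keep_with_gt);
    return 0;
  }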