From mboxrd@z Thu Jan  1 00:00:00 1970
Received: by sourceware.org (Postfix, from userid 48) id BFE2A3858C74; Mon, 7 Mar 2022 08:22:01 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org BFE2A3858C74
From: "crazylht at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/101929] [12 Regression] r12-7319 regress x264_r by 4% on CLX.
Date: Mon, 07 Mar 2022 08:22:01 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: crazylht at gmail dot com
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 12.0
Content-Type: text/plain; charset="UTF-8"
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
X-List-Received-Date: Mon, 07 Mar 2022 08:22:01 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101929

--- Comment #8 from Hongtao.liu ---
(In reply to Richard Biener from comment #7)
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 9188d727e33..7f1f12fb6c6 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -2374,7 +2375,7 @@ fail:
>               n_vector_builds++;
>             }
>         }
> -      if (all_uniform_p
> +      if ((all_uniform_p && !two_operators)
>           || n_vector_builds > 1
>           || (n_vector_builds == children.length ()
>               && is_a (stmt_info->stmt)))
>
>
> will re-enable the vectorization - it evades the vect_construct cost bump
> by instead using scalar_to_vec (aka splat) which has not yet been fixed to
> account for a possible gpr to xmm move (so it would be a temporary "solution"
> at best).
>
> Another change to mute the effect somewhat (but not fixing x264) that was
> mentioned is
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index b2bf90576d5..acf2cc977b4 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22595,7 +22595,7 @@ ix86_builtin_vectorization_cost (enum vect_cost_for_stmt type_of_cost,
>        case vec_construct:
>         {
>           /* N element inserts into SSE vectors.  */
> -         int cost = TYPE_VECTOR_SUBPARTS (vectype) * ix86_cost->sse_op;
> +         int cost = (TYPE_VECTOR_SUBPARTS (vectype) - 1) * ix86_cost->sse_op;

n - 1 is right for a 128-bit vector, but for a 256-bit vector, shouldn't it be
n - 2, since we have a separate cost for vinserti128, and n - 4 for a 512-bit
one?
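
To spell out the arithmetic behind that question, here is a minimal sketch in C.
It is not GCC code: `element_inserts` is a hypothetical helper, and it assumes
that each 128-bit lane is built separately, that the first element of a lane
needs no insert, and that the inserts combining the lanes (e.g. vinserti128) are
costed elsewhere.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC.  Returns the number of
   per-element inserts needed to build a vector of NELTS elements that
   is VECTOR_BITS wide, assuming each 128-bit lane is built on its own
   and the first element of every lane is free.  The inserts that
   combine lanes are assumed to be costed separately.  */
int
element_inserts (int nelts, int vector_bits)
{
  assert (vector_bits >= 128 && vector_bits % 128 == 0);

  int lanes = vector_bits / 128;   /* 1, 2 or 4 for 128/256/512 bits.  */
  assert (nelts >= lanes && nelts % lanes == 0);

  /* Each lane needs (nelts / lanes - 1) inserts, so nelts - lanes total.  */
  return nelts - lanes;
}
```

Under those assumptions, `element_inserts (4, 128)` gives 3 (n - 1),
`element_inserts (8, 256)` gives 6 (n - 2) and `element_inserts (16, 512)`
gives 12 (n - 4); the actual cost would still multiply that count by
ix86_cost->sse_op as in the patch.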