From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/104912] [12 Regression] 416.gamess regression after r12-7612-g69619acd8d9b58
Date: Thu, 17 Mar 2022 12:31:32 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104912

--- Comment #6 from Richard Biener ---
Created attachment 52640
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52640&action=edit
patch

Like this - this counts the number of vector stmts and the number of
strided loads/stores and then, when finishing up:

+void
+ix86_vector_costs::finish_cost (const vector_costs *scalar_costs)
+{
+  m_finished = true;
+  if (m_costing_for_scalar)
+    return;
+
+  /* When we have more than one strided load or store and the
+     number of strided stores is high compared to all vector
+     stmts in the body we require at least an estimated
+     improvement due to the vectorization of a factor of two.  */
+  if (m_n_body_strided_load_store > 1
+      && m_n_body_stmts / m_n_body_strided_load_store < 4)
+    {
+      unsigned vf = 1;
+      if (is_a <loop_vec_info> (m_vinfo))
+        vf = vect_vf_for_cost (as_a <loop_vec_info> (m_vinfo));
+      if (scalar_costs->prologue_cost () * vf < 2 * body_cost ())
+        m_costs[vect_body] *= 2;
+    }
+}

The scaling of m_costs[vect_body] will make the vectorization unprofitable.
Instead of a hard limit like this we could also scale the strided load
cost based on the overall number of them, for example by adding
m_n_body_strided_load_store squared to the cost. Note that the "true"
cost would only be visible with a scheduling model that takes
dependences into account.

Note that for this particular case this is all hand-waving, since the
true cost is the versioning/branching overhead, not the vectorized loop
body, and the low number of iterations makes this particularly visible.
So for 416.gamess it will be all a hack...
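
To make the difference between the two options concrete, here is a rough
standalone sketch of the hard-limit check next to the "add the squared
count" alternative. This is not GCC code: ToyBodyCost, hard_limit_cost,
graded_cost and the plain scalar_cost parameter are hypothetical
stand-ins for the members used in the patch, and the numbers are made up.

```cpp
#include <cassert>

// Hypothetical toy model of the counters kept by the patch; it is not
// GCC's ix86_vector_costs class.
struct ToyBodyCost
{
  unsigned n_stmts;    // stands in for m_n_body_stmts
  unsigned n_strided;  // stands in for m_n_body_strided_load_store
  unsigned body_cost;  // stands in for m_costs[vect_body]
};

// Hard-limit variant: when strided accesses make up a large share of the
// body and the estimated gain is below a factor of two, double the body
// cost.  scalar_cost is an assumed plain number for the scalar estimate.
unsigned
hard_limit_cost (const ToyBodyCost &c, unsigned scalar_cost, unsigned vf)
{
  unsigned cost = c.body_cost;
  if (c.n_strided > 1
      && c.n_stmts / c.n_strided < 4
      && scalar_cost * vf < 2 * cost)
    cost *= 2;
  return cost;
}

// Graded variant: no threshold, the penalty grows with the square of the
// number of strided loads/stores.
unsigned
graded_cost (const ToyBodyCost &c)
{
  return c.body_cost + c.n_strided * c.n_strided;
}
```

With 10 stmts, 4 of them strided, and a body cost of 40, the hard limit
either leaves the cost alone or doubles it, while the graded variant
always adds 16. The graded form avoids the cliff at the threshold but
still says nothing about versioning overhead.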