From mboxrd@z Thu Jan 1 00:00:00 1970
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/114057] [14 Regression] 435.gromacs fails verification with -Ofast -march={znver2,znver4} and PGO after r14-7272-g57f611604e8bab
Date: Wed, 27 Mar 2024 10:36:58 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 14.0
X-Bugzilla-Keywords: wrong-code
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 14.0
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114057

--- Comment #12 from Richard Biener ---
OK, so I think the change is that we get to "correctly" notice

-vec.h:380:9: note: node (external) 0x6a2e9d8 (max_nunits=2, refcnt=1) vector(2) float
-vec.h:380:9: note: stmt 0 _164 = MEM[(const real *)_27 + 8B];
-vec.h:380:9: note: stmt 1 _158 = MEM[(const real *)_27];
+vec.h:380:9: note: node (external) 0x5a823a8 (max_nunits=2, refcnt=1) vector(2) float
+vec.h:380:9: note: [l] stmt 0 _164 = MEM[(const real *)_27 + 8B];
+vec.h:380:9: note: [l] stmt 1 _158 = MEM[(const real *)_27];

for the loads that we do not handle because of gaps and that are therefore
promoted to external.  That leads to extra costs.  But also

+vec.h:380:9: note: node 0x5a81770 (max_nunits=2, refcnt=2) vector(2) float
 vec.h:380:9: note: op template: x_160 = _158 - _159;
 vec.h:380:9: note: stmt 0 x_160 = _158 - _159;
-vec.h:380:9: note: [l] stmt 1 y_163 = _161 - _162;
+vec.h:380:9: note: stmt 1 y_163 = _161 - _162;

so y_163 isn't considered live for some reason.  We find

  _123 = _117 * y_163;

is vectorized as part of a reduction.  On the costing side we then see

-_161 - _162 1 times scalar_stmt costs 12 in body
-MEM[(const real *)_27 + 4B] 1 times scalar_load costs 12 in body
-MEM[(const real *)_24 + 4B] 1 times scalar_load costs 12 in body

which are the live (and dependent) stmts that are no longer costed on the
scalar side, but also

+MEM[(const real *)_27 + 8B] 1 times vec_to_scalar costs 4 in epilogue
+MEM[(const real *)_24 + 8B] 1 times vec_to_scalar costs 4 in epilogue

costed in the vector epilog.  This is because we're conservative, as we
don't really know whether we'll be able to code-generate the live
operation.  The costing side here is also not in sync, as can be seen
from the _161 - _162 op being removed.

I should also note that PURE_SLP is set a bit too early, before we
analyze operations and eventually throw away instances or prune them by
promoting ops to external.  For reductions we also falsely claim that all
root stmts are vectorized, even though we do have remain ops.  Fixing
this restores the LIVE flag on them and in some way restores
vectorization.

I'm going to test this as a fix for now.
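
As a rough illustration of the costing delta quoted above, the removed scalar entries and the added vector epilogue entries can be tallied with a small script. This is only arithmetic on the numbers from the dump, not the vectorizer's cost model; the variable names and the idea of summing both deltas into one "shift" are assumptions made for illustration.

```python
# Cost entries copied from the dump diff quoted in this comment:
# (statement, count, kind, cost per occurrence).
removed_scalar_body = [
    ("_161 - _162", 1, "scalar_stmt", 12),
    ("MEM[(const real *)_27 + 4B]", 1, "scalar_load", 12),
    ("MEM[(const real *)_24 + 4B]", 1, "scalar_load", 12),
]
added_vector_epilogue = [
    ("MEM[(const real *)_27 + 8B]", 1, "vec_to_scalar", 4),
    ("MEM[(const real *)_24 + 8B]", 1, "vec_to_scalar", 4),
]


def total(entries):
    """Sum count * cost over a list of dump entries."""
    return sum(count * cost for _stmt, count, _kind, cost in entries)


# Scalar-side cost that is no longer accounted for.
scalar_drop = total(removed_scalar_body)
# Vector-side cost newly accounted for in the epilogue.
vector_rise = total(added_vector_epilogue)
# Hypothetical combined figure: both deltas make the scalar code look
# relatively cheaper when the two sides are compared.
shift = scalar_drop + vector_rise

print(f"scalar side drops by {scalar_drop}")
print(f"vector side rises by {vector_rise}")
print(f"relative shift in favour of scalar code: {shift}")
```

Both deltas move in the same direction: the scalar side is charged less and the vector side more, which fits the observation that the costing is out of sync with what is actually vectorized.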