From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id D0491386EC52; Mon, 14 Mar 2022 13:49:13 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D0491386EC52
From: "rguenth at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/104912] [12 Regression] 416.gamess regression after r12-7612-g69619acd8d9b58
Date: Mon, 14 Mar 2022 13:49:13 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 12.0
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Mon, 14 Mar 2022 13:49:13 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104912

--- Comment #4 from Richard Biener ---
I think for the case at hand no runtime alias checking is needed, since we have

      DO 30 MK=1,NOC
      DO 30 ML=1,MK
        MKL = MKL+1
        XPQKL(MPQ,MKL) = XPQKL(MPQ,MKL) +
     *        VAL1*(CO(MS,MK)*CO(MR,ML)+CO(MS,ML)*CO(MR,MK))
        XPQKL(MRS,MKL) = XPQKL(MRS,MKL) +
     *        VAL3*(CO(MQ,MK)*CO(MP,ML)+CO(MQ,ML)*CO(MP,MK))
   30 CONTINUE

so we're dealing with reductions which we can interleave (with -Ofast).
Editing the source with

!GCC$ ivdep

reduces the vectorization penalty to 5% (we still need the niter/epilogue
checks).
It also shows that only fixing PR89755 isn't the solution we're looking for.

In the end the vectorization is unlikely to pay off, since what V2DF offers is
usually handled well by the dual-issue capabilities for DFmode arithmetic on
modern archs. The only mitigation I can think of is realizing that the inner
loop niter evolves in the outer loop as 0, 1, 2, .., NOC - 1, and thus inner
loop vectorization is not profitable for the first outer iterations. But the
question is what to do with this (not knowing the actual runtime values of
NOC).

As PR87561 says

"Note for 416.gamess it looks like NOC is just 5 but MPQ and MRS are so
that there is no runtime aliasing between iterations most of the time
(sometimes they are indeed equal). The cost model check skips the
vector loop for MK == 2 and 3 and only will execute it for MK == 4 and 5.
An alternative for this kind of loop nest would be to cost-model for
MK % 2 == 0, thus requiring no epilogue loop."

In general, applying no vectorization to this kind of loop looks wrong.
Also versioning the outer loop, in addition to the inner loop, in case the
inner number of iterations evolves in the outer loop looks excessive (but
would eventually help 416.gamess). Implementation-wise it's also non-trivial.
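
To make the trip-count argument concrete, here is a minimal sketch in C of how
the inner iterations of such a triangular nest split between vector and scalar
code. It is an illustrative model only, not GCC code: the names split_inner
and split_nest and the min_iters threshold are hypothetical, and min_iters = 4
with a vectorization factor of 2 is merely an assumption chosen to match the
behaviour quoted from PR87561 (vector loop skipped for MK == 2 and 3, executed
for MK == 4 and 5).

```c
/* Illustrative model only, not GCC code.  For the triangular nest
   DO MK=1,NOC / DO ML=1,MK the inner loop runs MK iterations.  */

struct split
{
  int vector_iters;   /* iterations of the vectorized loop body */
  int scalar_iters;   /* iterations run as scalar code (vector loop
                         skipped, or epilogue after the vector loop) */
};

/* Split one inner-loop execution of N scalar iterations.  VF is the
   vectorization factor; MIN_ITERS is a hypothetical cost-model
   threshold below which the vector loop is skipped entirely.  */
struct split
split_inner (int n, int vf, int min_iters)
{
  struct split s;
  if (n < min_iters)
    {
      s.vector_iters = 0;
      s.scalar_iters = n;
    }
  else
    {
      s.vector_iters = n / vf;
      s.scalar_iters = n % vf;   /* epilogue iterations */
    }
  return s;
}

/* Accumulate the split over all outer iterations MK = 1..NOC.  */
struct split
split_nest (int noc, int vf, int min_iters)
{
  struct split total = { 0, 0 };
  for (int mk = 1; mk <= noc; mk++)
    {
      struct split s = split_inner (mk, vf, min_iters);
      total.vector_iters += s.vector_iters;
      total.scalar_iters += s.scalar_iters;
    }
  return total;
}
```

Under these assumptions split_nest (5, 2, 4) yields 4 vector iterations and 7
scalar iterations, so only 8 of the 15 inner iterations go through vector code
while every inner-loop entry still pays for the niter/epilogue checks.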