From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id D0491386EC52; Mon, 14 Mar 2022 13:49:13 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D0491386EC52
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/104912] [12 Regression] 416.gamess regression after
 r12-7612-g69619acd8d9b58
Date: Mon, 14 Mar 2022 13:49:13 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: rguenth at gcc dot gnu.org
X-Bugzilla-Target-Milestone: 12.0
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-104912-4-CafyyBLbYA@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-104912-4@http.gcc.gnu.org/bugzilla/>
References: <bug-104912-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Mon, 14 Mar 2022 13:49:13 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104912
--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think for the case at hand no runtime alias checking is needed, since we
have

            DO 30 MK=1,NOC
            DO 30 ML=1,MK
               MKL = MKL+1
               XPQKL(MPQ,MKL) = XPQKL(MPQ,MKL) +
     *               VAL1*(CO(MS,MK)*CO(MR,ML)+CO(MS,ML)*CO(MR,MK))
               XPQKL(MRS,MKL) = XPQKL(MRS,MKL) +
     *               VAL3*(CO(MQ,MK)*CO(MP,ML)+CO(MQ,ML)*CO(MP,MK))
   30       CONTINUE

so we're dealing with reductions which we can interleave (with -Ofast).
Editing the source with !GCC$ ivdep reduces the vectorization penalty to 5%
(we still need the niter/epilogue checks).  It also shows that only fixing
PR89755 isn't the solution we're looking for.
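
For reference, a minimal sketch of what such an edit could look like.  The
subroutine name UPDXPQ, its argument list and the array shapes are
hypothetical and only make the excerpt self-contained; the placement of the
directive directly before the inner loop is also an assumption.  Each inner
iteration increments MKL, so different iterations update different elements
of XPQKL, which is why asserting the absence of loop-carried dependences
looks reasonable here.

```fortran
      SUBROUTINE UPDXPQ(XPQKL, CO, N, NKL, NOC, MPQ, MRS,
     *                  MP, MQ, MR, MS, VAL1, VAL3)
C     Hypothetical wrapper around the loop nest quoted above; the
C     argument list and array shapes are assumptions.
      IMPLICIT NONE
      INTEGER N, NKL, NOC, MPQ, MRS, MP, MQ, MR, MS
      DOUBLE PRECISION XPQKL(N,NKL), CO(N,NOC), VAL1, VAL3
      INTEGER MK, ML, MKL
      MKL = 0
      DO MK = 1, NOC
C        Ask the vectorizer to ignore assumed vector dependences
C        in the loop that immediately follows the directive.
!GCC$ ivdep
         DO ML = 1, MK
            MKL = MKL+1
            XPQKL(MPQ,MKL) = XPQKL(MPQ,MKL) +
     *         VAL1*(CO(MS,MK)*CO(MR,ML)+CO(MS,ML)*CO(MR,MK))
            XPQKL(MRS,MKL) = XPQKL(MRS,MKL) +
     *         VAL3*(CO(MQ,MK)*CO(MP,ML)+CO(MQ,ML)*CO(MP,MK))
         END DO
      END DO
      END
```

The directive only applies to the DO loop that follows it, so the outer loop
is unaffected, and the niter/epilogue checks mentioned above remain.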

In the end the vectorization is unlikely to pay off, since V2DF is
usually handled well by the dual-issue capabilities for DFmode arithmetic
on modern archs.

The only mitigation I can think of is realizing that the inner loop niter
evolves in the outer loop as 0, 1, 2, .., NOC - 1, and thus inner loop
vectorization will not be profitable for the first outer iterations.  But
the question is what to do with this (not knowing the actual runtime values
of NOC).  As PR87561 says

"Note for 416.gamess it looks like NOC is just 5 but MPQ and MRS are so
that there is no runtime aliasing between iterations most of the time
(sometimes they are indeed equal).  The cost model check skips the
vector loop for MK == 2 and 3 and only will execute it for MK == 4 and 5.
An alternative for this kind of loop nest would be to cost-model for
MK % 2 == 0, thus requiring no epilogue loop."
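
To make the numbers in that quote concrete, here is a small sketch that
tabulates, per outer iteration, how the inner loop iterations would split
between the vector loop and the scalar/epilogue loop.  NOC = 5 is taken from
the quote; VF = 2 assumes V2DF vectors; MINIT = 4 is an assumed cost-model
threshold inferred from "only will execute it for MK == 4 and 5", not a
value read out of the vectorizer.

```fortran
      PROGRAM NITERS
C     Sketch only: NOC comes from the PR87561 quote, VF = 2 assumes
C     V2DF, and MINIT = 4 is an assumed cost-model threshold.
      IMPLICIT NONE
      INTEGER NOC, VF, MINIT
      PARAMETER (NOC = 5, VF = 2, MINIT = 4)
      INTEGER MK, NVEC, NSCAL
      DO MK = 1, NOC
C        The inner loop runs ML = 1..MK, i.e. MK iterations.
         IF (MK .GE. MINIT) THEN
C           Vector loop taken: full vectors plus an epilogue.
            NVEC = MK / VF
            NSCAL = MOD(MK, VF)
         ELSE
C           Cost model skips the vector loop: all scalar.
            NVEC = 0
            NSCAL = MK
         END IF
         PRINT *, 'MK=', MK, ' vector=', NVEC, ' scalar=', NSCAL
      END DO
      END
```

Under these assumptions only the last two outer iterations reach the vector
loop, and odd MK still leaves one epilogue iteration, which is what the
MK % 2 == 0 alternative would avoid.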

In general applying no vectorization to this kind of loop looks wrong.
Also versioning the outer loop, in addition to the inner loop, in case the
number of iterations evolves in the outer loop looks excessive (but would
eventually help 416.gamess).  Implementation-wise it's also non-trivial.