From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id CFD6C3858C27; Mon,  4 Jan 2021 09:02:40 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org CFD6C3858C27
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/88767] 'unroll and jam' not optimizing some
 loops
Date: Mon, 04 Jan 2021 09:02:40 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 9.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: RESOLVED
X-Bugzilla-Resolution: DUPLICATE
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-88767-4-rXxRMAHTTQ@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-88767-4@http.gcc.gnu.org/bugzilla/>
References: <bug-88767-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Mon, 04 Jan 2021 09:02:40 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767
--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jiu Fu Guo from comment #13)
> Hi Richard,
>
> As checking the changed code as in comment 9, it seems there is another
> opportunity to improve the performance:  By improving locality of array A
> usage.
>
> Unroll and jam loop1 into loop4 (or unroll and jam loop1 into loop3 after
> loop2/loop4 are unrolled completely), this would reduce memory access by
> reusing elements of array A.
>
> It seems not hard to implement this improvement from the source code aspect
> (as the example code shown in comment 9).
> While I'm thinking about how to implement this in GCC.
>
> Some concerns are here.  It is not a `perfect nest` for these loops: there
> are stmts/instructions that belong to the outer loop (loop1) but outside the
> inner loop(loop4).
> And even delete loop2 (or distribute loop2 out) and unroll loop4, 'store to
> array C: C[(l_n*10)+l_m] +=xx` is moved out of the inner loop (loop3), but
> still inside the outer loop(loop1).  This is not in favor of 'unroll and
> jam'.
>
> Thanks for any comments!
>
> BR.
> Jiufu Guo

I've only quickly tried to understand what you are proposing, but I think
this is out of scope for our "separate" distribution / interchange /
unroll-and-jam transforms, since it requires them to interact.  In theory,
that means the graphite-based loop nest optimization should catch this kind
of locality transform, which it surely doesn't do in its current state
(I haven't checked).
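
For illustration only, here is a minimal sketch of the kind of hand-written
unroll-and-jam on an imperfect nest discussed above.  This is not the code
from comment 9: the function names (baseline, unroll_and_jam), the arrays
a/b/c, and the sizes N and M are all hypothetical, and N is assumed to be
even so that no remainder loop is needed.  The store to c[i] belongs to the
outer loop but sits outside the inner loop, which is what makes the nest
imperfect.

```c
#define N 4   /* outer trip count; assumed even for the unrolled version */
#define M 3   /* inner trip count */

/* Original nest: a[j] is re-read from memory for every outer iteration,
   and the store to c[i] sits outside the inner loop (imperfect nest). */
static void baseline(const double a[M], double b[N][M], double c[N])
{
    for (int i = 0; i < N; i++) {
        double sum = 0.0;
        for (int j = 0; j < M; j++)
            sum += a[j] * b[i][j];
        c[i] += sum;
    }
}

/* Outer loop unrolled by 2 and the two inner-loop copies jammed into one.
   Each a[j] is now loaded once and reused for rows i and i + 1; the two
   stores to c[] stay after the jammed inner loop. */
static void unroll_and_jam(const double a[M], double b[N][M], double c[N])
{
    for (int i = 0; i < N; i += 2) {
        double sum0 = 0.0, sum1 = 0.0;
        for (int j = 0; j < M; j++) {
            double aj = a[j];          /* single load, two uses */
            sum0 += aj * b[i][j];
            sum1 += aj * b[i + 1][j];
        }
        c[i]     += sum0;
        c[i + 1] += sum1;
    }
}
```

Both functions perform the same additions in the same order for each c[i],
so they should produce identical results; only the number of loads from a[]
differs.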