public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops
From: guojiufu at gcc dot gnu.org @ 2020-12-16  5:57 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767

Jiu Fu Guo <guojiufu at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |guojiufu at gcc dot gnu.org

--- Comment #10 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
The source code is the same as in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98137, and the fix for PR98137
also improves the run time of this case (by ~20% on some machines).

* [Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops
From: guojiufu at gcc dot gnu.org @ 2020-12-16  6:04 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767

--- Comment #11 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
The patch for PR98137 also helps a lot with the code in comment 9, since
vectorization now happens.

* [Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops
From: rguenth at gcc dot gnu.org @ 2020-12-16  9:14 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
dup / fixed then.

*** This bug has been marked as a duplicate of bug 98137 ***

* [Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops
From: guojiufu at gcc dot gnu.org @ 2020-12-17  2:58 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767

--- Comment #13 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
Hi Richard,

Looking at the modified code in comment 9, there seems to be another
opportunity to improve performance: improving the locality of accesses to
array A.

Unroll-and-jam of loop1 into loop4 (or of loop1 into loop3 after loop2/loop4
are completely unrolled) would reduce memory accesses by reusing elements of
array A.
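
For illustration only, a minimal C sketch of how unroll-and-jam can reuse
loads.  This is not the code from comment 9 (which is not reproduced in this
thread); the names `DIM`, `mm_ref`, `mm_unroll_and_jam` and
`mm_variants_agree`, the loop shape, and the index expressions are all
hypothetical.  The outer loop is unrolled by 2 and the two copies of the
inner loops are fused, so each element of A is loaded once and used twice.

```c
#include <assert.h>

#define DIM 10  /* hypothetical size, echoing the `l_n*10` index in this thread */

static_assert(DIM % 2 == 0, "unroll factor 2 must divide DIM");

/* Reference nest: every outer iteration n re-reads all of A. */
void mm_ref(const double *A, const double *B, double *C)
{
    for (int n = 0; n < DIM; n++)
        for (int m = 0; m < DIM; m++)
            for (int k = 0; k < DIM; k++)
                C[n * DIM + m] += A[k * DIM + m] * B[n * DIM + k];
}

/* Outer loop unrolled by 2 and the two copies of the inner loops jammed
   (fused): each load of A now feeds two updates of C.
   Assumes C does not overlap A or B. */
void mm_unroll_and_jam(const double *A, const double *B, double *C)
{
    for (int n = 0; n < DIM; n += 2)
        for (int m = 0; m < DIM; m++)
            for (int k = 0; k < DIM; k++) {
                double a = A[k * DIM + m];  /* loaded once, used twice */
                C[n * DIM + m]       += a * B[n * DIM + k];
                C[(n + 1) * DIM + m] += a * B[(n + 1) * DIM + k];
            }
}

/* Returns 1 if both variants produce identical results on sample data. */
int mm_variants_agree(void)
{
    double A[DIM * DIM], B[DIM * DIM], C1[DIM * DIM], C2[DIM * DIM];

    for (int i = 0; i < DIM * DIM; i++) {
        A[i] = (i % 7) * 0.5;   /* small, exactly representable values */
        B[i] = (i % 5) * 0.25;
        C1[i] = C2[i] = (double)i;
    }
    mm_ref(A, B, C1);
    mm_unroll_and_jam(A, B, C2);
    for (int i = 0; i < DIM * DIM; i++)
        if (C1[i] != C2[i])
            return 0;
    return 1;
}
```

Each element of C accumulates its products in the same order in both
variants, so the results should match exactly; only the number of loads from
A differs.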

This improvement does not seem hard to implement at the source level (as the
example code in comment 9 shows), but I am still thinking about how to
implement it in GCC.

There are some concerns.  These loops do not form a `perfect nest`: there are
stmts/instructions that belong to the outer loop (loop1) but lie outside the
inner loop (loop4).
And even if loop2 is deleted (or distributed out) and loop4 is unrolled, the
store to array C, `C[(l_n*10)+l_m] += xx`, is moved out of the inner loop
(loop3) but remains inside the outer loop (loop1).  This does not favor
'unroll and jam'.
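
To make the `perfect nest` concern concrete, here is a small hypothetical C
sketch (again not the code from comment 9; `ROWS`, `COLS`, `rowdot_ref`,
`rowdot_unroll_and_jam` and `rowdot_variants_agree` are made-up names).  The
initialization of `sum` and the store to C sit in the outer loop but outside
the inner loop, so a hand-written unroll-and-jam has to duplicate them around
the fused inner loop.  Whether and how GCC's pass handles such statements is
not something this sketch claims.

```c
#include <assert.h>

#define ROWS 10  /* hypothetical sizes */
#define COLS 10

static_assert(ROWS % 2 == 0, "unroll factor 2 must divide ROWS");

/* Imperfect nest: the init of `sum` and the store to C belong to the
   outer loop only. */
void rowdot_ref(const double *A, const double *B, double *C)
{
    for (int n = 0; n < ROWS; n++) {        /* outer loop */
        double sum = 0.0;                   /* outer-only statement */
        for (int k = 0; k < COLS; k++)      /* inner loop */
            sum += A[k] * B[n * COLS + k];
        C[n] += sum;                        /* outer-only store */
    }
}

/* Hand-written unroll-and-jam by 2: the outer-only statements are
   duplicated before and after the single fused inner loop.
   Assumes C does not overlap A or B. */
void rowdot_unroll_and_jam(const double *A, const double *B, double *C)
{
    for (int n = 0; n < ROWS; n += 2) {
        double sum0 = 0.0, sum1 = 0.0;
        for (int k = 0; k < COLS; k++) {
            double a = A[k];                /* loaded once, used twice */
            sum0 += a * B[n * COLS + k];
            sum1 += a * B[(n + 1) * COLS + k];
        }
        C[n]     += sum0;
        C[n + 1] += sum1;
    }
}

/* Returns 1 if both variants produce identical results on sample data. */
int rowdot_variants_agree(void)
{
    double A[COLS], B[ROWS * COLS], C1[ROWS], C2[ROWS];

    for (int i = 0; i < COLS; i++)
        A[i] = (i % 7) * 0.5;               /* exactly representable values */
    for (int i = 0; i < ROWS * COLS; i++)
        B[i] = (i % 5) * 0.25;
    for (int i = 0; i < ROWS; i++)
        C1[i] = C2[i] = (double)i;

    rowdot_ref(A, B, C1);
    rowdot_unroll_and_jam(A, B, C2);
    for (int i = 0; i < ROWS; i++)
        if (C1[i] != C2[i])
            return 0;
    return 1;
}
```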

Thanks for any comments!

BR. 
Jiufu Guo

* [Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops
From: rguenth at gcc dot gnu.org @ 2021-01-04  9:02 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jiu Fu Guo from comment #13)
> Hi Richard,
> 
> Looking at the modified code in comment 9, there seems to be another
> opportunity to improve performance: improving the locality of accesses to
> array A.
> 
> Unroll-and-jam of loop1 into loop4 (or of loop1 into loop3 after
> loop2/loop4 are completely unrolled) would reduce memory accesses by
> reusing elements of array A.
> 
> This improvement does not seem hard to implement at the source level (as
> the example code in comment 9 shows), but I am still thinking about how to
> implement it in GCC.
> 
> There are some concerns.  These loops do not form a `perfect nest`: there
> are stmts/instructions that belong to the outer loop (loop1) but lie
> outside the inner loop (loop4).
> And even if loop2 is deleted (or distributed out) and loop4 is unrolled,
> the store to array C, `C[(l_n*10)+l_m] += xx`, is moved out of the inner
> loop (loop3) but remains inside the outer loop (loop1).  This does not
> favor 'unroll and jam'.
> 
> Thanks for any comments!
> 
> BR. 
> Jiufu Guo

I've only quickly tried to understand what you are proposing, but I think
this is out of scope for our "separate" distribution / interchange /
unroll-and-jam transforms and instead requires interaction between them.
In theory that means the graphite-based loop nest optimization should catch
this kind of locality transform, which it for sure doesn't do in its current
state (though I have not checked).

* [Bug tree-optimization/88767] 'unroll and jam' not optimizing some loops
From: guojiufu at gcc dot gnu.org @ 2021-01-08  5:34 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88767

--- Comment #15 from Jiu Fu Guo <guojiufu at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #14)
> 
> I've only quickly tried to understand what you are proposing, but I think
> this is out of scope for our "separate" distribution / interchange /
> unroll-and-jam transforms and instead requires interaction between them.
> In theory that means the graphite-based loop nest optimization should
> catch this kind of locality transform, which it for sure doesn't do in its
> current state (though I have not checked).

Hi Richard, thanks for your thoughts and great comments!
