[Bug tree-optimization/44794] pre- and post-loops should not be unrolled.

* [Bug tree-optimization/44794] pre- and post-loops should not be unrolled.
       [not found] <bug-44794-4@http.gcc.gnu.org/bugzilla/>
@ 2023-03-09  8:06 ` rguenth at gcc dot gnu.org
  2023-04-19 11:51 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-03-09  8:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2023-03-09
             Status|UNCONFIRMED                 |ASSIGNED
     Ever confirmed|0                           |1

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
On trunk we end up with a vectorized loop followed by an unrolled vectorized
epilogue and a scalar iteration, plus a scalar copy of the loop that is used
for n == 1 and for the case when 'a' and 'b' alias, so that copy has to cover
all possible 'n'.  Prefetching then (rightfully) prefetches in both of these
loops, which turns the two loops into four.

What can be improved is the upper bound on the number of iterations of the
epilogue loop created by tree_unroll_loop.  That results in the RTL unroller
seeing

-  upper bound: 2147483646
-  likely upper bound: 2147483646
+  upper bound: 14
+  likely upper bound: 14
   realistic bound: -1
 ;; Unable to prove that the loop iterates constant times
+;; Not unrolling loop, doesn't roll

...

-  upper bound: 536870910
-  likely upper bound: 536870910
+  upper bound: 2
+  likely upper bound: 2
   realistic bound: -1
 ;; Unable to prove that the loop iterates constant times
+;; Not unrolling loop, doesn't roll

and thus only unrolling the main loops again.  The unrolling done for
prefetching should of course have been enough.  I suppose the RTL unroller
could detect prefetch instructions and refrain from unrolling loops with
prefetches, on the basis that they are already tuned well (that could also be
implemented in the unroll control target hook).
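
As a rough cross-check of the new bounds in the dump above, here is a minimal
sketch.  It assumes that the reported upper bound counts executions of the
loop latch (the back edge) rather than of the loop body.  Under that
assumption a remainder loop left behind by unrolling with factor F runs its
body at most F - 1 times and its latch at most F - 2 times, so the values 14
and 2 would correspond to factors of 16 and 4.  The helper names are made up
for illustration and are not GCC functions.

```c
#include <assert.h>

/* Hypothetical helper: maximum number of body iterations of the
   remainder (epilogue) loop left after unrolling by `factor`.  */
long epilogue_max_iterations(long factor)
{
  assert(factor >= 2);   /* unrolling by less than 2 leaves no epilogue */
  return factor - 1;
}

/* Hypothetical helper: the same limit expressed as latch executions,
   assuming the latch runs once less often than the body.  */
long epilogue_latch_bound(long factor)
{
  return epilogue_max_iterations(factor) - 1;
}
```

With these definitions epilogue_latch_bound(16) is 14 and
epilogue_latch_bound(4) is 2.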

I am testing a patch to improve the situation.

* [Bug tree-optimization/44794] pre- and post-loops should not be unrolled.
       [not found] <bug-44794-4@http.gcc.gnu.org/bugzilla/>
  2023-03-09  8:06 ` [Bug tree-optimization/44794] pre- and post-loops should not be unrolled rguenth at gcc dot gnu.org
@ 2023-04-19 11:51 ` cvs-commit at gcc dot gnu.org
  2023-04-19 11:53 ` rguenth at gcc dot gnu.org
  2023-04-19 11:55 ` rguenth at gcc dot gnu.org
  3 siblings, 0 replies; 8+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-04-19 11:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:a243ce2a52a6c62bc0d6be0b756a85dd9c1bceb7

commit r14-71-ga243ce2a52a6c62bc0d6be0b756a85dd9c1bceb7
Author: Richard Biener <rguenther@suse.de>
Date:   Thu Mar 9 09:02:07 2023 +0100

    tree-optimization/44794 - avoid excessive RTL unrolling on epilogues

    The following adjusts tree_[transform_and_]unroll_loop to set an
    upper bound on the number of iterations on the epilogue loop it
    creates.  For the testcase at hand which involves array prefetching
    this avoids applying RTL unrolling to them when -funroll-loops is
    specified.

    Other users of this API include predictive commoning and
    unroll-and-jam.

            PR tree-optimization/44794
            * tree-ssa-loop-manip.cc (tree_transform_and_unroll_loop):
            If an epilogue loop is required set its iteration upper bound.

* [Bug tree-optimization/44794] pre- and post-loops should not be unrolled.
       [not found] <bug-44794-4@http.gcc.gnu.org/bugzilla/>
  2023-03-09  8:06 ` [Bug tree-optimization/44794] pre- and post-loops should not be unrolled rguenth at gcc dot gnu.org
  2023-04-19 11:51 ` cvs-commit at gcc dot gnu.org
@ 2023-04-19 11:53 ` rguenth at gcc dot gnu.org
  2023-04-19 11:55 ` rguenth at gcc dot gnu.org
  3 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-04-19 11:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
A heuristic to avoid unrolling loops that contain prefetches is still missing.
The aprefetch pass could set ->unroll to 1 in the loop structure.
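
For comparison, the user-visible way to make the same request on a single
loop is the GCC-specific '#pragma GCC unroll' directive, where a count of 1
(or 0) asks GCC not to unroll the loop that follows.  The sketch below only
illustrates that source-level analogue; it is not what the aprefetch pass
does, and the function name is made up.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the first n elements of v, asking GCC not to unroll the loop.  */
long sum_no_unroll(const int *v, size_t n)
{
  long s = 0;

  assert(v != NULL || n == 0);

  /* GCC-specific: an unroll count of 1 blocks unrolling of the loop
     that immediately follows the pragma.  */
#pragma GCC unroll 1
  for (size_t i = 0; i < n; i++)
    s += v[i];

  return s;
}
```

The pragma changes only how the loop is compiled, not its result.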

* [Bug tree-optimization/44794] pre- and post-loops should not be unrolled.
       [not found] <bug-44794-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2023-04-19 11:53 ` rguenth at gcc dot gnu.org
@ 2023-04-19 11:55 ` rguenth at gcc dot gnu.org
  3 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-04-19 11:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|rguenth at gcc dot gnu.org         |unassigned at gcc dot gnu.org
             Status|ASSIGNED                    |NEW

* [Bug tree-optimization/44794]  New: pre- and post-loops should not be unrolled.
@ 2010-07-02 23:54 changpeng dot fang at amd dot com
  2010-07-03 10:48 ` [Bug tree-optimization/44794] " rguenth at gcc dot gnu dot org
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: changpeng dot fang at amd dot com @ 2010-07-02 23:54 UTC (permalink / raw)
  To: gcc-bugs

void foo(int *a, int *b, int n)
{
  int i;
  for(i = 0; i < n; i++)
     a[i] = a[i] + b[i];
}

For this simple loop, the vectorizer does its job and peels the last few
iterations into a post-loop that is not vectorized. But the RTL loop unroller
does not know that the post-loop has only a few (at most 3 in this case)
iterations, and will unroll it anyway.
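
The shape of the code after vectorization can be sketched by hand as follows.
This is a scalar model under stated assumptions, not the code GCC emits: VF
stands for an assumed vectorization factor of 4 (four ints per 128-bit
vector), the inner loop stands in for a single vector operation, and
foo_model is a made-up name.

```c
#include <assert.h>

/* Assumed vectorization factor: four 32-bit ints per 128-bit vector.  */
enum { VF = 4 };

/* Hand-written model of foo() after vectorization.  Returns the number
   of iterations executed by the scalar post-loop.  */
int foo_model(int *a, const int *b, int n)
{
  int i = 0;
  int post_iters = 0;

  /* Main loop: each trip handles VF elements, as the vector loop would.  */
  for (; n - i >= VF; i += VF)
    for (int k = 0; k < VF; k++)
      a[i + k] = a[i + k] + b[i + k];

  /* Post-loop: handles the leftover elements, fewer than VF of them.  */
  for (; i < n; i++)
    {
      a[i] = a[i] + b[i];
      post_iters++;
    }

  assert(post_iters < VF);   /* at most 3 iterations when VF is 4 */
  return post_iters;
}
```

For n = 7 the main loop runs once and the post-loop runs 3 times, which is
the worst case for this factor.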

What is worse, if you compile it with:
  gcc -O3 -fprefetch-loop-arrays -funroll-loops

You may find that the prefetch pass will also unroll the post-loop, and
generate a new post-loop (post-post-loop) for this post-loop. Again, the RTL
loop unroller cannot recognize this post-post-loop, and will unroll it.
(The RTL loop unroller will generate yet another post-loop
(post-post-post-loop) for the post-post-loop :-))

This causes compilation time and code size to increase dramatically without
any performance benefit.

-- 
           Summary: pre- and post-loops should not be unrolled.
           Product: gcc
           Version: lno
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: changpeng dot fang at amd dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794

* [Bug tree-optimization/44794] pre- and post-loops should not be unrolled.
  2010-07-02 23:54 [Bug tree-optimization/44794] New: " changpeng dot fang at amd dot com
@ 2010-07-03 10:48 ` rguenth at gcc dot gnu dot org
  2010-07-06 17:59 ` changpeng dot fang at amd dot com
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu dot org @ 2010-07-03 10:48 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from rguenth at gcc dot gnu dot org  2010-07-03 10:48 -------
It would be interesting to know why/if number-of-iteration analysis fails
and if the code the vectorizer emits can be adjusted to fix that.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org, rakdver at gcc dot gnu
                   |                            |dot org
           Severity|major                       |enhancement
            Version|lno                         |4.6.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794


* [Bug tree-optimization/44794] pre- and post-loops should not be unrolled.
  2010-07-02 23:54 [Bug tree-optimization/44794] New: " changpeng dot fang at amd dot com
  2010-07-03 10:48 ` [Bug tree-optimization/44794] " rguenth at gcc dot gnu dot org
@ 2010-07-06 17:59 ` changpeng dot fang at amd dot com
  2010-07-06 18:36 ` changpeng dot fang at amd dot com
  2010-07-15  1:50 ` changpeng dot fang at amd dot com
  3 siblings, 0 replies; 8+ messages in thread
From: changpeng dot fang at amd dot com @ 2010-07-06 17:59 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #2 from changpeng dot fang at amd dot com  2010-07-06 17:58 -------
We also need to handle the post-loop created by unrolling. If the
unroll_factor is 16, the post-loop has at most 15 iterations.

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794

* [Bug tree-optimization/44794] pre- and post-loops should not be unrolled.
  2010-07-02 23:54 [Bug tree-optimization/44794] New: " changpeng dot fang at amd dot com
  2010-07-03 10:48 ` [Bug tree-optimization/44794] " rguenth at gcc dot gnu dot org
  2010-07-06 17:59 ` changpeng dot fang at amd dot com
@ 2010-07-06 18:36 ` changpeng dot fang at amd dot com
  2010-07-15  1:50 ` changpeng dot fang at amd dot com
  3 siblings, 0 replies; 8+ messages in thread
From: changpeng dot fang at amd dot com @ 2010-07-06 18:36 UTC (permalink / raw)
  To: gcc-bugs

------- Comment #3 from changpeng dot fang at amd dot com  2010-07-06 18:35 -------
Here is the impact of loop unrolling on the compilation time and code size
for polyhedron test_fpu.f90:

-O3 -ftree-vectorize -fno-prefetch-loop-arrays -fno-unroll-loops:
timing: 12.62s,  size: 67069 bytes
-O3 -ftree-vectorize -fprefetch-loop-arrays -funroll-loops:
timing: 51.77s,  size: 234045 bytes

I also ran an experiment with prefetching in which we do not unroll the pre-
and post-loops generated by the vectorizer:
-O3 -ftree-vectorize -fprefetch-loop-arrays:
timing: 29.32s,  size: 92541 bytes
-O3 -ftree-vectorize -fprefetch-loop-arrays (don't unroll pre- and post-loops):
timing: 18.34s,  size: 78909 bytes
-O3 -ftree-vectorize -fno-prefetch-loop-arrays:
timing: 12.62s,  size: 67069 bytes

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794

* [Bug tree-optimization/44794] pre- and post-loops should not be unrolled.
  2010-07-02 23:54 [Bug tree-optimization/44794] New: " changpeng dot fang at amd dot com
                   ` (2 preceding siblings ...)
  2010-07-06 18:36 ` changpeng dot fang at amd dot com
@ 2010-07-15  1:50 ` changpeng dot fang at amd dot com
  3 siblings, 0 replies; 8+ messages in thread
From: changpeng dot fang at amd dot com @ 2010-07-15  1:50 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from changpeng dot fang at amd dot com  2010-07-15 01:50 -------
Created an attachment (id=21205)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21205&action=view)
Do not unroll pre and post loops

I did a quick test on polyhedron before and after applying the preliminary
patch. Tests are based on -O3 -fprefetch-loop-arrays -funroll-loops.

                timing (s)        |        size (B)
          before  after   %reduc  |  before  after   %reduc
capacita  14.35   10.88   24.18   |  90715   72843   19.70
gas_dyn   34.68   21.58   37.77   |  149608  100936  32.53
nf        33.91   19.32   43.03   |  139150  83054   40.31
protein   51.35   33.23   35.29   |  163672  122808  24.97
rnflow    60.90   43.28   28.93   |  268784  169152  37.07
test_fpu  52.61   30.35   42.31   |  234045  144285  38.35
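
The percentage columns appear to be plain relative reductions, that is
100 * (before - after) / before.  A small sketch for re-checking a row,
assuming that reading; the function name is made up for illustration.

```c
#include <assert.h>

/* Relative reduction from `before` to `after`, in percent.  */
double percent_reduction(double before, double after)
{
  assert(before > 0.0);
  return 100.0 * (before - after) / before;
}
```

For the first row, percent_reduction(14.35, 10.88) is about 24.18 and
percent_reduction(90715, 72843) is about 19.70.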


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44794

