public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/55731] New: Issue with complete innermost loop unrolling (cunrolli)
@ 2012-12-18 14:13 ysrumyan at gmail dot com
  2012-12-18 14:23 ` [Bug tree-optimization/55731] " ysrumyan at gmail dot com
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: ysrumyan at gmail dot com @ 2012-12-18 14:13 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55731

             Bug #: 55731
           Summary: Issue with complete innermost loop unrolling
                    (cunrolli)
    Classification: Unclassified
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: ysrumyan@gmail.com
                CC: hubicka@ucw.cz, izamyatin@gmail.com


I attached 2 test-cases extracted from important benchmark at which clang and
icc outperform gcc for x86 target (atom). For 1st test-case (t.c) cunrolli
phase does not perform complete loop unrolling with the following message (test
was compiled with -O3 -funroll-loops options):

  Loop size: 23
  Estimated size after unrolling: 33
Not unrolling loop 1: size would grow.

but it is unrolled by cunroll phase:

  Loop size: 24
  Estimated size after unrolling: 32
Unrolled loop 1 completely (duplicated 2 times).

I wonder why this loop was not unrolled by cunrolli? We lost a lot of
optimizations for unrolled loop such as Constant (address) Propagation, Dead
code elimination etc. and got non-optimal binaries.

For comparsion I added another test (t2.c) with successfull complete unrolling
by cunrolli, at which we can see that all assignments to local array 'b' were
properly propagated and deleted but we don't have such transformations for 1st
test-case.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/55731] Issue with complete innermost loop unrolling (cunrolli)
  2012-12-18 14:13 [Bug tree-optimization/55731] New: Issue with complete innermost loop unrolling (cunrolli) ysrumyan at gmail dot com
@ 2012-12-18 14:23 ` ysrumyan at gmail dot com
  2012-12-18 14:24 ` ysrumyan at gmail dot com
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: ysrumyan at gmail dot com @ 2012-12-18 14:23 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55731

--- Comment #1 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2012-12-18 14:23:30 UTC ---
Created attachment 28997
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28997
testcase1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/55731] Issue with complete innermost loop unrolling (cunrolli)
  2012-12-18 14:13 [Bug tree-optimization/55731] New: Issue with complete innermost loop unrolling (cunrolli) ysrumyan at gmail dot com
  2012-12-18 14:23 ` [Bug tree-optimization/55731] " ysrumyan at gmail dot com
@ 2012-12-18 14:24 ` ysrumyan at gmail dot com
  2012-12-18 15:35 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: ysrumyan at gmail dot com @ 2012-12-18 14:24 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55731

--- Comment #2 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2012-12-18 14:24:05 UTC ---
Created attachment 28998
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28998
testcase2


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/55731] Issue with complete innermost loop unrolling (cunrolli)
  2012-12-18 14:13 [Bug tree-optimization/55731] New: Issue with complete innermost loop unrolling (cunrolli) ysrumyan at gmail dot com
  2012-12-18 14:23 ` [Bug tree-optimization/55731] " ysrumyan at gmail dot com
  2012-12-18 14:24 ` ysrumyan at gmail dot com
@ 2012-12-18 15:35 ` rguenth at gcc dot gnu.org
  2012-12-19  9:17 ` ysrumyan at gmail dot com
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-12-18 15:35 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55731

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-12-18
     Ever Confirmed|0                           |1

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> 2012-12-18 15:34:51 UTC ---
The reason is that unrolling early can be harmful to for example vectorization
and thus cunrolli restricts itself to "obviously" profitable cases.

In this case the loop is not an "inner" loop - it doesn't have a containing
loop and so growth is not allowed even with -O3 (we otherwise will fail
to vectorize if the unrolled body ends up as part of other basic-blocks).

It's a know issue that after cunroll there is no strong value-numbering
pass that handles memory (there is DOM which only has weak memory handling).

So, it's a trade-off we make, mostly for the sake of loop optimizations
that do not handle unrolled loops well.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/55731] Issue with complete innermost loop unrolling (cunrolli)
  2012-12-18 14:13 [Bug tree-optimization/55731] New: Issue with complete innermost loop unrolling (cunrolli) ysrumyan at gmail dot com
                   ` (2 preceding siblings ...)
  2012-12-18 15:35 ` rguenth at gcc dot gnu.org
@ 2012-12-19  9:17 ` ysrumyan at gmail dot com
  2012-12-19 10:27 ` rguenth at gcc dot gnu.org
  2021-12-12  9:45 ` pinskia at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: ysrumyan at gmail dot com @ 2012-12-19  9:17 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55731

--- Comment #4 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2012-12-19 09:17:40 UTC ---
(In reply to comment #3)
> The reason is that unrolling early can be harmful to for example vectorization
> and thus cunrolli restricts itself to "obviously" profitable cases.
> 
> In this case the loop is not an "inner" loop - it doesn't have a containing
> loop and so growth is not allowed even with -O3 (we otherwise will fail
> to vectorize if the unrolled body ends up as part of other basic-blocks).
> 
Richard,

It looks that you did not see attached testcases.
I can't agree with your statement since
1. Loop in problem (t.c) has only 3 iterations and in any case it should not be
considered as candidate for vectorization.
2. Loop contains calls of functions that do not have vectorizable counterparts.
3. Loop contains comparisons with loop control variable as
    if (i == 0) etc.
and cunrolli phase determines it:

 BB: 7, after_exit: 1
  size:   2 if (i_1 == 1)
   Constant conditional.
 BB: 5, after_exit: 1
  size:   2 foo4 (k_15(D));
  size:   2 if (i_1 == 0)
   Constant conditional.

It means that these tests will be completely eliminated by loop unroller and
some bb will become unreachable.

I also added another testcase (t2.c) for which cunrolli does correct size
estimation and completely unroll it (it has only 2 iterations).

So I assume that size estimation algorithm in unroller is not perfect and must
be re-written.

And at last if customer provides gcc with "-funroll-loop" option we should not
consider "possible size growth" as reason of unroll rejection. 

> It's a know issue that after cunroll there is no strong value-numbering
> pass that handles memory (there is DOM which only has weak memory handling).
> 
> So, it's a trade-off we make, mostly for the sake of loop optimizations
> that do not handle unrolled loops well.

Best regards.
Yuri.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/55731] Issue with complete innermost loop unrolling (cunrolli)
  2012-12-18 14:13 [Bug tree-optimization/55731] New: Issue with complete innermost loop unrolling (cunrolli) ysrumyan at gmail dot com
                   ` (3 preceding siblings ...)
  2012-12-19  9:17 ` ysrumyan at gmail dot com
@ 2012-12-19 10:27 ` rguenth at gcc dot gnu.org
  2021-12-12  9:45 ` pinskia at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: rguenth at gcc dot gnu.org @ 2012-12-19 10:27 UTC (permalink / raw)
  To: gcc-bugs


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55731

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> 2012-12-19 10:27:22 UTC ---
(In reply to comment #4)
> (In reply to comment #3)
> > The reason is that unrolling early can be harmful to for example vectorization
> > and thus cunrolli restricts itself to "obviously" profitable cases.
> > 
> > In this case the loop is not an "inner" loop - it doesn't have a containing
> > loop and so growth is not allowed even with -O3 (we otherwise will fail
> > to vectorize if the unrolled body ends up as part of other basic-blocks).
> > 
> Richard,
> 
> It looks that you did not see attached testcases.

I did - I even compiled them as you did and looked at the dump file and
the unroller source.

> I can't agree with your statement since
> 1. Loop in problem (t.c) has only 3 iterations and in any case it should not be
> considered as candidate for vectorization.

That's target dependend knowledge the unroller does not have (with
two element vectors you can produce one vectorized and one scalar iteration).

> 2. Loop contains calls of functions that do not have vectorizable counterparts.

The unroller does not have this detailed knowledge of the vectorizers
capabilities - it simply considers all loops vectorizable.

> 3. Loop contains comparisons with loop control variable as
>     if (i == 0) etc.
> and cunrolli phase determines it:
> 
>  BB: 7, after_exit: 1
>   size:   2 if (i_1 == 1)
>    Constant conditional.
>  BB: 5, after_exit: 1
>   size:   2 foo4 (k_15(D));
>   size:   2 if (i_1 == 0)
>    Constant conditional.
> 
> It means that these tests will be completely eliminated by loop unroller and
> some bb will become unreachable.

So?  Fact is:

      FOR_EACH_LOOP (li, loop, LI_FROM_INNERMOST)
        {
          struct loop *loop_father = loop_outer (loop);

          if (may_increase_size && optimize_loop_nest_for_speed_p (loop)
              /* Unroll outermost loops only if asked to do so or they do
                 not cause code growth.  */
              && (unroll_outer || loop_outer (loop_father)))
            ul = UL_ALL;
          else
            ul = UL_NO_GROWTH;

will end up with ul == UL_NO_GROWTH for t.c.  Because loop_outer (loop_father)
is NULL (and unroll_outer is false).

I stated the reason for this "heuristic" (-> this loop may no longer be
a loop after unrolling and thus not vectorizable).

> I also added another testcase (t2.c) for which cunrolli does correct size
> estimation and completely unroll it (it has only 2 iterations).

size: 14-5, last_iteration: 2-0
  Loop size: 14
  Estimated size after unrolling: 13

doesn't grow thus is ok to unroll.

> So I assume that size estimation algorithm in unroller is not perfect and must
> be re-written.

Haha ;)  Of course - it can't be "perfect" - you cannot reasonably pre-compute
the outcome of all subsequent optimizations correctly without ever pessimizing
in one or another way (either estimate a too small or a too large size).

But you are of course free to propose a patch!

> And at last if customer provides gcc with "-funroll-loop" option we should not
> consider "possible size growth" as reason of unroll rejection. 

As I said above, cunrolli is supposed to only unroll inner loops.  Your
loop isn't an inner (nested loop).  This restriction is relaxed if unrolling
does not increase size.

> > It's a know issue that after cunroll there is no strong value-numbering
> > pass that handles memory (there is DOM which only has weak memory handling).
> > 
> > So, it's a trade-off we make, mostly for the sake of loop optimizations
> > that do not handle unrolled loops well.
> 
> Best regards.
> Yuri.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Bug tree-optimization/55731] Issue with complete innermost loop unrolling (cunrolli)
  2012-12-18 14:13 [Bug tree-optimization/55731] New: Issue with complete innermost loop unrolling (cunrolli) ysrumyan at gmail dot com
                   ` (4 preceding siblings ...)
  2012-12-19 10:27 ` rguenth at gcc dot gnu.org
@ 2021-12-12  9:45 ` pinskia at gcc dot gnu.org
  5 siblings, 0 replies; 7+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-12-12  9:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55731

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
   Target Milestone|---                         |11.0
         Resolution|---                         |FIXED

--- Comment #6 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Fixed in GCC 11 by a few things, first there is now a curoll pass after
vectorization. Second is there if a fre pass added in the late stage. Third is
by handling vector extractions in FRE (VN) [if we do -O3 -fno-tree-vectorize,
this is optimized in GCC 9).

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-12-12  9:45 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-18 14:13 [Bug tree-optimization/55731] New: Issue with complete innermost loop unrolling (cunrolli) ysrumyan at gmail dot com
2012-12-18 14:23 ` [Bug tree-optimization/55731] " ysrumyan at gmail dot com
2012-12-18 14:24 ` ysrumyan at gmail dot com
2012-12-18 15:35 ` rguenth at gcc dot gnu.org
2012-12-19  9:17 ` ysrumyan at gmail dot com
2012-12-19 10:27 ` rguenth at gcc dot gnu.org
2021-12-12  9:45 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).