public inbox for gcc-bugs@sourceware.org
* [Bug c/59802] New: excessive compile time in loop unswitching
@ 2014-01-14  8:49 dcb314 at hotmail dot com
  2014-01-14 11:20 ` [Bug rtl-optimization/59802] excessive compile time in RTL optimizers (loop unswitching, CPROP) rguenth at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: dcb314 at hotmail dot com @ 2014-01-14  8:49 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59802

            Bug ID: 59802
           Summary: excessive compile time in loop unswitching
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dcb314 at hotmail dot com

Created attachment 31830
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31830&action=edit
gzipped C++ source code

I just compiled the attached code with gcc trunk 20140112 on an x86_64
box with flag -O3 and it took over eight minutes.  Using only -O2 took
a more reasonable 2 minutes 38 seconds.

For reference, the Red Hat version of gcc 4.8.2 took 2 minutes 32 seconds
for -O3 and 2 minutes 11 seconds for -O2.

I can see that for -O2, trunk is using about 30 seconds more CPU time,
which is fine, but for -O3 it is using over 5 minutes more.

I tried flag -ftime-report and here are all the times > 1%.

Execution times (seconds)
 phase opt and generate  : 465.18 (100%) usr   0.50 (57%) sys 468.04 (100%) wall  130935 kB (59%) ggc
 loop invariant motion   :  22.50 ( 5%) usr   0.01 ( 1%) sys  22.85 ( 5%) wall       2 kB ( 0%) ggc
 loop unswitching        : 302.37 (65%) usr   0.01 ( 1%) sys 303.82 (65%) wall      72 kB ( 0%) ggc
 CPROP                   :  85.02 (18%) usr   0.09 (10%) sys  85.52 (18%) wall    4445 kB ( 2%) ggc
 TOTAL                 : 466.12             0.88           469.55             221219 kB

Suggest code rework for trunk for -O3, maybe in the area of loop unswitching.

This bug may be a duplicate of bug 38518.



* [Bug rtl-optimization/59802] excessive compile time in RTL optimizers (loop unswitching, CPROP)
  2014-01-14  8:49 [Bug c/59802] New: excessive compile time in loop unswitching dcb314 at hotmail dot com
@ 2014-01-14 11:20 ` rguenth at gcc dot gnu.org
  2014-01-14 11:21 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-01-14 11:20 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59802

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |compile-time-hog
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2014-01-14
          Component|c                           |rtl-optimization
            Summary|excessive compile time in   |excessive compile time in
                   |loop unswitching            |RTL optimizers (loop
                   |                            |unswitching, CPROP)
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
GCC 4.8 shows

 CPROP                   :  45.00 (57%) usr   0.02 ( 4%) sys  45.01 (57%) wall    4016 kB ( 2%) ggc
 TOTAL                 :  78.48             0.57            79.04             213705 kB

while GCC 4.9 has

 loop invariant motion   :  10.11 (11%) usr   0.01 ( 2%) sys  10.16 (11%) wall       2 kB ( 0%) ggc
 loop unswitching        :   9.81 (11%) usr   0.00 ( 0%) sys   9.83 (11%) wall       1 kB ( 0%) ggc
 CPROP                   :  48.16 (54%) usr   0.04 ( 7%) sys  48.20 (54%) wall    4444 kB ( 2%) ggc

so I can't really confirm the unswitching slowness (this is r205857 which
is somewhat older than your test).

Generally I think we should probably consider removing RTL unswitching,
there is not a single loop unswitched by RTL for this testcase.



* [Bug rtl-optimization/59802] excessive compile time in RTL optimizers (loop unswitching, CPROP)
  2014-01-14  8:49 [Bug c/59802] New: excessive compile time in loop unswitching dcb314 at hotmail dot com
  2014-01-14 11:20 ` [Bug rtl-optimization/59802] excessive compile time in RTL optimizers (loop unswitching, CPROP) rguenth at gcc dot gnu.org
@ 2014-01-14 11:21 ` rguenth at gcc dot gnu.org
  2014-01-14 12:04 ` dcb314 at hotmail dot com
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-01-14 11:21 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59802

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Oh, did you configure with --enable-checking=release for 4.9?  (I did)



* [Bug rtl-optimization/59802] excessive compile time in RTL optimizers (loop unswitching, CPROP)
  2014-01-14  8:49 [Bug c/59802] New: excessive compile time in loop unswitching dcb314 at hotmail dot com
  2014-01-14 11:20 ` [Bug rtl-optimization/59802] excessive compile time in RTL optimizers (loop unswitching, CPROP) rguenth at gcc dot gnu.org
  2014-01-14 11:21 ` rguenth at gcc dot gnu.org
@ 2014-01-14 12:04 ` dcb314 at hotmail dot com
  2014-01-14 13:24 ` rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: dcb314 at hotmail dot com @ 2014-01-14 12:04 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59802

--- Comment #3 from David Binderman <dcb314 at hotmail dot com> ---
(In reply to Richard Biener from comment #2)
> Oh, did you configure with --enable-checking=release for 4.9?  (I did)

No, I used --enable-checking=yes.



* [Bug rtl-optimization/59802] excessive compile time in RTL optimizers (loop unswitching, CPROP)
  2014-01-14  8:49 [Bug c/59802] New: excessive compile time in loop unswitching dcb314 at hotmail dot com
                   ` (2 preceding siblings ...)
  2014-01-14 12:04 ` dcb314 at hotmail dot com
@ 2014-01-14 13:24 ` rguenth at gcc dot gnu.org
  2014-01-14 13:54 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-01-14 13:24 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59802

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |steven at gcc dot gnu.org

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to David Binderman from comment #3)
> (In reply to Richard Biener from comment #2)
> > Oh, did you configure with --enable-checking=release for 4.9?  (I did)
> 
> No, I used --enable-checking=yes.

That makes the comparison to 4.8 invalid (4.8 uses --enable-checking=release
by default).

Btw, callgrind shows that compile-time is dominated by
bitmap_intersection_of_preds (and bitmap_ior_and_compl),
called from lcm.c:compute_available.  LCM works with
sbitmaps which can be very expensive for large functions.

tree PRE uses regular bitmaps, but it seems that LCM can
end up using the full bitmap via returning bitmap_ones
from bitmap_intersection_of_preds (for a block with no preds).

It seems compute_available doesn't use optimal iteration order
and that explicitly representing the maximum set instead of
handling unvisited preds makes things more expensive (need to
use sbitmaps).

Iterating in inverted postorder gets me

 CPROP                   :   2.13 ( 5%) usr   0.06 (10%) sys   2.20 ( 5%) wall    4444 kB ( 2%) ggc

with no changes in generated code ...
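
To illustrate why the initial worklist order matters for a forward
"available" problem with an optimistic all-ones start, here is a minimal
sketch.  It is not the lcm.c code: Cfg, solve_available and
reverse_postorder are hypothetical names, sets are single 64-bit words
rather than sbitmaps, and plain reverse postorder of the forward CFG stands
in for the inverted postorder mentioned above (both visit predecessors
before successors on acyclic paths).

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical miniature CFG; block 0 is the entry block.
struct Cfg {
  std::vector<std::vector<int>> succs, preds;
  explicit Cfg(int n) : succs(n), preds(n) {}
  void add_edge(int from, int to) {
    succs[from].push_back(to);
    preds[to].push_back(from);
  }
  int size() const { return static_cast<int>(succs.size()); }
};

using Set = std::uint64_t;  // one bit per expression, at most 64 of them

struct Result {
  std::vector<Set> avout;
  int visits;  // number of blocks taken off the worklist
};

// Worklist solver: AVOUT starts as the maximum set everywhere and shrinks.
// Assumes initial_order lists every block exactly once.
Result solve_available(const Cfg& g, const std::vector<Set>& gen,
                       const std::vector<Set>& kill,
                       const std::vector<int>& initial_order) {
  const int n = g.size();
  std::vector<Set> avout(n, ~Set{0});
  std::deque<int> worklist(initial_order.begin(), initial_order.end());
  std::vector<bool> queued(n, true);
  int visits = 0;
  while (!worklist.empty()) {
    const int b = worklist.front();
    worklist.pop_front();
    queued[b] = false;
    ++visits;
    // Nothing is available on entry; otherwise intersect over predecessors.
    Set avin = (b == 0) ? Set{0} : ~Set{0};
    for (int p : g.preds[b]) avin &= avout[p];
    const Set out = gen[b] | (avin & ~kill[b]);
    if (out != avout[b]) {
      avout[b] = out;
      // Requeue successors at the end of the worklist.
      for (int s : g.succs[b]) {
        if (!queued[s]) {
          queued[s] = true;
          worklist.push_back(s);
        }
      }
    }
  }
  return {avout, visits};
}

static void dfs(const Cfg& g, int b, std::vector<bool>& seen,
                std::vector<int>& post) {
  seen[b] = true;
  for (int s : g.succs[b]) {
    if (!seen[s]) dfs(g, s, seen, post);
  }
  post.push_back(b);
}

// Reverse postorder of a depth-first walk from the entry block.
// Assumes every block is reachable from block 0.
std::vector<int> reverse_postorder(const Cfg& g) {
  std::vector<bool> seen(g.size(), false);
  std::vector<int> post;
  dfs(g, 0, seen, post);
  std::reverse(post.begin(), post.end());
  return post;
}
```

On a straight-line chain of blocks, seeding the worklist in reverse
postorder reaches the fixed point with one visit per block, while seeding
it in the opposite order first visits every block without learning
anything and then has to walk the chain again.  Both orders compute the
same sets, which matches the observation that generated code does not
change.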



* [Bug rtl-optimization/59802] excessive compile time in RTL optimizers (loop unswitching, CPROP)
  2014-01-14  8:49 [Bug c/59802] New: excessive compile time in loop unswitching dcb314 at hotmail dot com
                   ` (3 preceding siblings ...)
  2014-01-14 13:24 ` rguenth at gcc dot gnu.org
@ 2014-01-14 13:54 ` rguenth at gcc dot gnu.org
  2014-01-15 12:17 ` rguenth at gcc dot gnu.org
  2014-01-19 21:28 ` dcb314 at hotmail dot com
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-01-14 13:54 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59802

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
http://gcc.gnu.org/ml/gcc-patches/2014-01/msg00780.html

Even better would be to get rid of the explicit maximum set (just ignore
incoming edges with the maximum set, aka 'unvisited' edges during
bitmap_intersection_of_preds).  Basically follow what tree PRE does
for antic-in compute.  That would make using regular bitmaps possible
(if that is a win - at least computing the changed bit is free).  Also
queuing succs at the end of the worklist messes up iteration order for
everything but the first iteration.  PRE uses an sbitmap that records
whether a BB was changed.

Anyway, the above simple patch dramatically improves the numbers for this
testcase.
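
The idea of dropping the explicit maximum set can be sketched as follows.
This is only an illustration under stated assumptions, not the patch linked
above and not tree PRE's code: Block, intersect_visited_preds and
solve_skipping_unvisited are hypothetical names, std::set stands in for a
regular (sparse) bitmap, block 0 is the entry, and the caller passes an
order in which every non-entry block follows at least one of its
predecessors.  A single changed flag per sweep replaces the per-block
changed bitmap.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

using ExprSet = std::set<int>;  // sparse set; the universe is never built

struct Block {
  std::vector<int> preds;
  ExprSet gen, kill;
};

// Intersect AVOUT over predecessors that were computed at least once.
// An unvisited predecessor implicitly holds the maximum set, so skipping it
// gives the same result as intersecting with "all ones".
ExprSet intersect_visited_preds(const std::vector<Block>& blocks, int b,
                                const std::vector<ExprSet>& avout,
                                const std::vector<bool>& visited) {
  ExprSet result;
  bool first = true;
  for (int p : blocks[b].preds) {
    if (!visited[p]) continue;
    if (first) {
      result = avout[p];
      first = false;
      continue;
    }
    ExprSet tmp;
    std::set_intersection(result.begin(), result.end(), avout[p].begin(),
                          avout[p].end(), std::inserter(tmp, tmp.begin()));
    result.swap(tmp);
  }
  return result;
}

// Sweep the blocks in a fixed order until no block changes.
std::vector<ExprSet> solve_skipping_unvisited(const std::vector<Block>& blocks,
                                              const std::vector<int>& order) {
  const int n = static_cast<int>(blocks.size());
  std::vector<ExprSet> avout(n);
  std::vector<bool> visited(n, false);
  bool changed = true;
  while (changed) {
    changed = false;
    for (int b : order) {
      ExprSet avin;  // empty for the entry block
      if (b != 0) {
        bool any_visited = false;
        for (int p : blocks[b].preds) any_visited = any_visited || visited[p];
        if (!any_visited) continue;  // still the implicit maximum set
        avin = intersect_visited_preds(blocks, b, avout, visited);
      }
      ExprSet out = blocks[b].gen;
      for (int e : avin) {
        if (blocks[b].kill.count(e) == 0) out.insert(e);
      }
      if (!visited[b] || out != avout[b]) {
        avout[b] = std::move(out);
        visited[b] = true;
        changed = true;
      }
    }
  }
  return avout;
}
```

In a diamond-shaped CFG with a back edge into one arm, the back edge's
source is unvisited when the arm is first processed and is simply skipped;
the next sweep picks it up and confirms the fixed point.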



* [Bug rtl-optimization/59802] excessive compile time in RTL optimizers (loop unswitching, CPROP)
  2014-01-14  8:49 [Bug c/59802] New: excessive compile time in loop unswitching dcb314 at hotmail dot com
                   ` (4 preceding siblings ...)
  2014-01-14 13:54 ` rguenth at gcc dot gnu.org
@ 2014-01-15 12:17 ` rguenth at gcc dot gnu.org
  2014-01-19 21:28 ` dcb314 at hotmail dot com
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2014-01-15 12:17 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59802

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.



* [Bug rtl-optimization/59802] excessive compile time in RTL optimizers (loop unswitching, CPROP)
  2014-01-14  8:49 [Bug c/59802] New: excessive compile time in loop unswitching dcb314 at hotmail dot com
                   ` (5 preceding siblings ...)
  2014-01-15 12:17 ` rguenth at gcc dot gnu.org
@ 2014-01-19 21:28 ` dcb314 at hotmail dot com
  6 siblings, 0 replies; 8+ messages in thread
From: dcb314 at hotmail dot com @ 2014-01-19 21:28 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59802

--- Comment #8 from David Binderman <dcb314 at hotmail dot com> ---
(In reply to Richard Biener from comment #7)
> Fixed.

The results I can report are for trunk dated 20140119

[dcb@zippy4 foundBugs]$ time ../results/bin/gcc -c bug129.cc 

real    0m8.076s
user    0m5.925s
sys    0m0.131s
[dcb@zippy4 foundBugs]$ time ../results/bin/gcc -c -O2 bug129.cc 

real    1m0.706s
user    0m57.884s
sys    0m0.402s
[dcb@zippy4 foundBugs]$ time ../results/bin/gcc -c -O3 bug129.cc 

real    5m45.982s
user    5m42.793s
sys    0m0.457s

While the first time is trivial, the -O2 time is down by about
60% and the -O3 time is down by about 30%.

Good work !
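
The percentages can be checked with a small helper.  The baselines are an
assumption taken from the original report: 2 minutes 38 seconds (158 s)
for -O2, and the 468.04 s wall time from -ftime-report for -O3, since the
report only says "over eight minutes".  percent_reduction is a
hypothetical name.

```cpp
// Percentage by which after_s is smaller than before_s (both in seconds).
double percent_reduction(double before_s, double after_s) {
  return 100.0 * (before_s - after_s) / before_s;
}

// Baselines assumed from the original report, new times from this comment.
const double o2_before_s = 158.0;    // 2m38s
const double o2_after_s = 60.706;    // 1m0.706s
const double o3_before_s = 468.04;   // -ftime-report wall time
const double o3_after_s = 345.982;   // 5m45.982s
```

With those inputs the helper gives a reduction of a little under 62% for
-O2 and a little over 26% for -O3, in line with the rounded figures above.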

