public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
@ 2002-03-19 13:16 lucier
  2002-03-19 13:21 ` David Edelsohn
  0 siblings, 1 reply; 20+ messages in thread
From: lucier @ 2002-03-19 13:16 UTC (permalink / raw)
  To: gcc, mark; +Cc: feeley, lucier

I just filed

optimization/6007: cfg cleanup tremendous performance hog with -O1

http://gcc.gnu.org/ml/gcc-prs/2002-03/msg00729.html

It addresses the problem that cfg cleanup takes *way* too long to
run, and it's invoked with -O1.  This is a tremendous regression from
2.95*, and I ask that it be added to the list of 3.1 release issues.

I've been filing PRs for this since at least last August.

Right now, neither 3.1 nor 3.2 bootstraps on Solaris.  I don't see how
I and other people are going to have enough opportunity to test 3.1
on sparc/Solaris before release to ensure that things work even
minimally.  (For example, the last time I tried -fprofile-arcs
-ftest-coverage/gcov, it didn't work.)  Jakub just marked PR 5740
as fixed, and I haven't had an opportunity to test it, since sparc/Solaris
hasn't bootstrapped since he sent me the patch privately.

Blah.

Brad

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
  2002-03-19 13:16 optimization/6007: cfg cleanup tremendous performance hog with -O1 lucier
@ 2002-03-19 13:21 ` David Edelsohn
  2002-03-22  4:29   ` Jan Hubicka
  0 siblings, 1 reply; 20+ messages in thread
From: David Edelsohn @ 2002-03-19 13:21 UTC (permalink / raw)
  To: lucier; +Cc: gcc, mark, feeley

>>>>> lucier  writes:

Brad> Right now, neither 3.1 nor 3.2 bootstraps on Solaris.  I don't see how
Brad> I and other people are going to have enough opportunity to test 3.1
Brad> on sparc/Solaris before release to ensure that things work even
Brad> minimally.  (For example, the last time I tried -fprofile-arcs
Brad> -ftest-coverage/gcov, it didn't work.)  Jakub just marked PR 5740
Brad> as fixed, and I haven't had an opportunity to test it, since sparc/Solaris
Brad> hasn't bootstrapped since he sent me the patch privately.

	SPARC Solaris 2.7 is a release criteria for GCC 3.1.  GCC 3.1 will
not be released unless that configuration at least builds and works.

David

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
  2002-03-19 13:21 ` David Edelsohn
@ 2002-03-22  4:29   ` Jan Hubicka
  2002-03-24 20:03     ` Brad Lucier
  0 siblings, 1 reply; 20+ messages in thread
From: Jan Hubicka @ 2002-03-22  4:29 UTC (permalink / raw)
  To: David Edelsohn; +Cc: lucier, gcc, mark, feeley

> >>>>> lucier  writes:
> 
> Brad> Right now, neither 3.1 nor 3.2 bootstraps on Solaris.  I don't see how
> Brad> I and other people are going to have enough opportunity to test 3.1
> Brad> on sparc/Solaris before release to ensure that things work even
> Brad> minimally.  (For example, the last time I tried -fprofile-arcs
> Brad> -ftest-coverage/gcov, it didn't work.)  Jakub just marked PR 5740
> Brad> as fixed, and I haven't had an opportunity to test it, since sparc/Solaris
> Brad> hasn't bootstrapped since he sent me the patch privately.
> 
> 	SPARC Solaris 2.7 is a release criteria for GCC 3.1.  GCC 3.1 will
> not be released unless that configuration at least builds and works.

Concerning the problem itself, I think it can not be solved w/o missing
the optimization opurtunities.  I already spent quite  alot of time
trying to save redundant work in crossjumping and there is almost
none (I am aware of one case and I will try to benchmark it).

I think we can simply disable the crossjumping at -O1 as it is expensive
optimization or do the simplified form of crossjuping that will crossjump
only into fallthru edges that should limit the worst case scenarios from
quadratic to linear and will get similar strengthness/weakness as old
crossjumping did.

With 3.2, we plan avoid some of the worst case scenarios by avoidiing
the linear ordering of basic blocks, but this is definitly not 3.1.x
choice.  Similar dead ends are constructable with very huge fucntions
and dead code removal as well. They did exist in older gccs too.

I will try to check whether I can squeze out some more cycles or
find way how to limit this.

Honza
> 
> David

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
  2002-03-22  4:29   ` Jan Hubicka
@ 2002-03-24 20:03     ` Brad Lucier
  2002-03-25  2:14       ` Jan Hubicka
  2002-03-28  6:39       ` Jan Hubicka
  0 siblings, 2 replies; 20+ messages in thread
From: Brad Lucier @ 2002-03-24 20:03 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: David Edelsohn, lucier, gcc, mark, feeley

[RE: crossjumping]

> I will try to check whether I can squeze out some more cycles or
> find way how to limit this.

My code uses a lot of computed goto's, with many labels.  Is crossjumping
possibly a win in this case?  Can it ignore these jumps?

Brad

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
  2002-03-24 20:03     ` Brad Lucier
@ 2002-03-25  2:14       ` Jan Hubicka
  2002-03-25  2:32         ` Richard Henderson
  2002-03-28  6:39       ` Jan Hubicka
  1 sibling, 1 reply; 20+ messages in thread
From: Jan Hubicka @ 2002-03-25  2:14 UTC (permalink / raw)
  To: Brad Lucier; +Cc: Jan Hubicka, David Edelsohn, gcc, mark, feeley

> [RE: crossjumping]
> 
> > I will try to check whether I can squeze out some more cycles or
> > find way how to limit this.
> 
> My code uses a lot of computed goto's, with many labels.  Is crossjumping
> possibly a win in this case?  Can it ignore these jumps?

I personally don't think that the computed jumps makes some important difference
from crossjumping effectivity point of view. ONly they makes the graph very
huge making it dificult to find the crossjumping candidates.

The graph is already n^2 in size and I need to find matching outgoing edges
running tiny loop as n^3 that probably makes difference.  As there is no
particular ordering of edges, I am not sure if and how I can reduce the
complexity.  I will try to check this till end of week ( I have bit too many
things on my TODO unforutnately, so hope to get into soon, but don't know
exactly when :( )

Honza
> 
> Brad

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
  2002-03-25  2:14       ` Jan Hubicka
@ 2002-03-25  2:32         ` Richard Henderson
  2002-03-25  2:59           ` Jan Hubicka
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2002-03-25  2:32 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Brad Lucier, David Edelsohn, gcc, mark, feeley

On Mon, Mar 25, 2002 at 09:41:09AM +0100, Jan Hubicka wrote:
> The graph is already n^2 in size and I need to find matching outgoing edges
> running tiny loop as n^3 that probably makes difference.

We already look at the complexity of the cfg to decide whether or
not to run gcse.  No reason we can't do the same for crossjump.


r~

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
  2002-03-25  2:32         ` Richard Henderson
@ 2002-03-25  2:59           ` Jan Hubicka
  0 siblings, 0 replies; 20+ messages in thread
From: Jan Hubicka @ 2002-03-25  2:59 UTC (permalink / raw)
  To: Richard Henderson, Jan Hubicka, Brad Lucier, David Edelsohn, gcc,
	mark, feeley

> On Mon, Mar 25, 2002 at 09:41:09AM +0100, Jan Hubicka wrote:
> > The graph is already n^2 in size and I need to find matching outgoing edges
> > running tiny loop as n^3 that probably makes difference.
> 
> We already look at the complexity of the cfg to decide whether or
> not to run gcse.  No reason we can't do the same for crossjump.

Another posibility is to give up the search after constant amount of edges, so
we will still crossjump in trivial parts of function and ignore the expensive
bits.

I will try whether this works well.

Honza
> 
> 
> r~

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
  2002-03-24 20:03     ` Brad Lucier
  2002-03-25  2:14       ` Jan Hubicka
@ 2002-03-28  6:39       ` Jan Hubicka
  2002-03-28 15:27         ` Brad Lucier
                           ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Jan Hubicka @ 2002-03-28  6:39 UTC (permalink / raw)
  To: Brad Lucier; +Cc: Jan Hubicka, David Edelsohn, gcc, mark, feeley

> [RE: crossjumping]
> 
> > I will try to check whether I can squeze out some more cycles or
> > find way how to limit this.
> 
> My code uses a lot of computed goto's, with many labels.  Is crossjumping
> possibly a win in this case?  Can it ignore these jumps?
> 
> Brad
Hi, here is patch I made as a test.  It simply disables crossjumping
if there is moer than 100 outgoing edges.  Unfortunately I can't benchmark
your testcase as my machine runs out of space before getting there.  Can you
check if this solves your problem?  If so, I will prepare more polished
version of this patch.

Index: cfgcleanup.c
===================================================================
RCS file: /cvs/gcc/egcs/gcc/cfgcleanup.c,v
retrieving revision 1.45
diff -c -3 -p -r1.45 cfgcleanup.c
*** cfgcleanup.c	2002/03/22 11:18:33	1.45
--- cfgcleanup.c	2002/03/28 11:44:48
*************** try_crossjump_bb (mode, bb)
*** 1467,1472 ****
--- 1467,1473 ----
  {
    edge e, e2, nexte2, nexte, fallthru;
    bool changed;
+   int n = 0;
  
    /* Nothing to do if there is not at least two incoming edges.  */
    if (!bb->pred || !bb->pred->pred_next)
*************** try_crossjump_bb (mode, bb)
*** 1475,1483 ****
    /* It is always cheapest to redirect a block that ends in a branch to
       a block that falls through into BB, as that adds no branches to the
       program.  We'll try that combination first.  */
!   for (fallthru = bb->pred; fallthru; fallthru = fallthru->pred_next)
!     if (fallthru->flags & EDGE_FALLTHRU)
!       break;
  
    changed = false;
    for (e = bb->pred; e; e = nexte)
--- 1476,1488 ----
    /* It is always cheapest to redirect a block that ends in a branch to
       a block that falls through into BB, as that adds no branches to the
       program.  We'll try that combination first.  */
!   for (fallthru = bb->pred; fallthru; fallthru = fallthru->pred_next, n++)
!     {
!       if (fallthru->flags & EDGE_FALLTHRU)
! 	break;
!       if (n > 100)
! 	return false;
!     }
  
    changed = false;
    for (e = bb->pred; e; e = nexte)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
  2002-03-28  6:39       ` Jan Hubicka
@ 2002-03-28 15:27         ` Brad Lucier
  2002-03-29  8:44           ` Jan Hubicka
  2002-03-29 10:37           ` Jan Hubicka
  2002-03-29 10:44         ` Brad Lucier
  2002-03-29 14:00         ` Brad Lucier
  2 siblings, 2 replies; 20+ messages in thread
From: Brad Lucier @ 2002-03-28 15:27 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Brad Lucier, Jan Hubicka, David Edelsohn, gcc, mark, feeley

> 
> > [RE: crossjumping]
> > 
> > > I will try to check whether I can squeze out some more cycles or
> > > find way how to limit this.
> > 
> > My code uses a lot of computed goto's, with many labels.  Is crossjumping
> > possibly a win in this case?  Can it ignore these jumps?
> > 
> > Brad
> Hi, here is patch I made as a test.  It simply disables crossjumping
> if there is moer than 100 outgoing edges.  Unfortunately I can't benchmark
> your testcase as my machine runs out of space before getting there.  Can you
> check if this solves your problem?  If so, I will prepare more polished
> version of this patch.

Here is a progress report.

Today's CVS version of gcc-3.1 had a good bootstrap with your patch.
I'll run the regression tests later.

I'm running the new gcc on my test case now.  So far, it's taken
> 30 minutes of CPU time on my 500 MHz UltraSPARC.  Unfortunately,
it's also taking about 4 GB of swap space, and competing with another
400 MB job, so it's not making much progress right now.

The amount of swap necessary is a problem, even if cross-jumping is
disabled for certain blocks because the number of outgoing edges
is too large.  I've been having memory problems compiling much smaller
programs on my PII linux box at home because I have "only" 768 MB of
swap space.

Is there some way to not build these large internal structures if we don't
end up using them?

Brad

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
  2002-03-28 15:27         ` Brad Lucier
@ 2002-03-29  8:44           ` Jan Hubicka
  2002-03-29 10:37           ` Jan Hubicka
  1 sibling, 0 replies; 20+ messages in thread
From: Jan Hubicka @ 2002-03-29  8:44 UTC (permalink / raw)
  To: Brad Lucier; +Cc: Jan Hubicka, David Edelsohn, gcc, mark, feeley

> > 
> > > [RE: crossjumping]
> > > 
> > > > I will try to check whether I can squeze out some more cycles or
> > > > find way how to limit this.
> > > 
> > > My code uses a lot of computed goto's, with many labels.  Is crossjumping
> > > possibly a win in this case?  Can it ignore these jumps?
> > > 
> > > Brad
> > Hi, here is patch I made as a test.  It simply disables crossjumping
> > if there is moer than 100 outgoing edges.  Unfortunately I can't benchmark
> > your testcase as my machine runs out of space before getting there.  Can you
> > check if this solves your problem?  If so, I will prepare more polished
> > version of this patch.
> 
> Here is a progress report.
> 
> Today's CVS version of gcc-3.1 had a good bootstrap with your patch.
> I'll run the regression tests later.
> 
> I'm running the new gcc on my test case now.  So far, it's taken
> > 30 minutes of CPU time on my 500 MHz UltraSPARC.  Unfortunately,
> it's also taking about 4 GB of swap space, and competing with another
> 400 MB job, so it's not making much progress right now.
> 
> The amount of swap necessary is a problem, even if cross-jumping is
> disabled for certain blocks because the number of outgoing edges
> is too large.  I've been having memory problems compiling much smaller
> programs on my PII linux box at home because I have "only" 768 MB of
> swap space.

Do you have any idea where the memory consumption come from?
I will try to check, but I am somewhat overload at the moment :(
> 
> Is there some way to not build these large internal structures if we don't
> end up using them?

Well, this is too general.  We need to figure out what structures are getting
so gigantic.
Overall gcc should not build too much structures at -O1 level - the rtl
reprezentation of program and the cfg.  Both required to compile the program.
I am not aware of any structures built at -O1 that are not needed.  How much
memory footprint changes compared to earlier versions of gcc?

The problem of CFG is the fact that it gets quadratic in the size.  THis is
unavoidable with explicit reprezenation of edges in the cfg and presence of
computed jumps :(

Honza
> 
> Brad

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
  2002-03-28 15:27         ` Brad Lucier
  2002-03-29  8:44           ` Jan Hubicka
@ 2002-03-29 10:37           ` Jan Hubicka
  2002-04-04  7:02             ` Brad Lucier
  1 sibling, 1 reply; 20+ messages in thread
From: Jan Hubicka @ 2002-03-29 10:37 UTC (permalink / raw)
  To: Brad Lucier; +Cc: Jan Hubicka, David Edelsohn, gcc, mark, feeley

> > 
> > > [RE: crossjumping]
> > > 
> > > > I will try to check whether I can squeze out some more cycles or
> > > > find way how to limit this.
> > > 
> > > My code uses a lot of computed goto's, with many labels.  Is crossjumping
> > > possibly a win in this case?  Can it ignore these jumps?
> > > 
> > > Brad
> > Hi, here is patch I made as a test.  It simply disables crossjumping
> > if there is moer than 100 outgoing edges.  Unfortunately I can't benchmark
> > your testcase as my machine runs out of space before getting there.  Can you
> > check if this solves your problem?  If so, I will prepare more polished
> > version of this patch.
> 
> Here is a progress report.
> 
> Today's CVS version of gcc-3.1 had a good bootstrap with your patch.
> I'll run the regression tests later.
> 
> I'm running the new gcc on my test case now.  So far, it's taken
> > 30 minutes of CPU time on my 500 MHz UltraSPARC.  Unfortunately,
> it's also taking about 4 GB of swap space, and competing with another
> 400 MB job, so it's not making much progress right now.
> 
> The amount of swap necessary is a problem, even if cross-jumping is
> disabled for certain blocks because the number of outgoing edges
> is too large.  I've been having memory problems compiling much smaller

Reading closer, in case the edges are really the problem, I don't think
we can do something with it, except changing everything to support multiedges,
like some CFG implementations do, but that is nasty.

Concerning computed jumps, it may not be impossible to simply teach compiler
to generate single incarnation of computed jump and otherwise do jumps to
that blocks resulting in worse code, but faster compilation...  It is nasty
trick, but I am not able to come with something better. You can also do it
at source level easilly in order to speed up your compiler.

Honza
> programs on my PII linux box at home because I have "only" 768 MB of
> swap space.
> 
> Is there some way to not build these large internal structures if we don't
> end up using them?
> 
> Brad

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
  2002-03-28  6:39       ` Jan Hubicka
  2002-03-28 15:27         ` Brad Lucier
@ 2002-03-29 10:44         ` Brad Lucier
  2002-03-29 14:00         ` Brad Lucier
  2 siblings, 0 replies; 20+ messages in thread
From: Brad Lucier @ 2002-03-29 10:44 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Brad Lucier, Jan Hubicka, David Edelsohn, gcc, mark, feeley

> 
> > [RE: crossjumping]
> > 
> > > I will try to check whether I can squeze out some more cycles or
> > > find way how to limit this.
> > 
> > My code uses a lot of computed goto's, with many labels.  Is crossjumping
> > possibly a win in this case?  Can it ignore these jumps?
> > 
> > Brad
> Hi, here is patch I made as a test.  It simply disables crossjumping
> if there is moer than 100 outgoing edges.  Unfortunately I can't benchmark
> your testcase as my machine runs out of space before getting there.  Can you
> check if this solves your problem?  If so, I will prepare more polished
> version of this patch.

First of all, there were no regressions in the test suite on 
sparcv9-sun-solaris2.8 with your patch.

I made a slightly smaller test case that requires only 1200 MB to compile on
sparcv9.  (Perhaps it would require less on regular sparc with 32-bit pointers.)
I did not build a profiled version of cc1 yet.

The results are not so good with your patch.

banach-725% ~/programs/gcc/gcc-3.1/objdir-sparcv9/gcc/stage2/cc1 -fpreprocessed denoise3.i -mptr64 -mstack-bias -mno-v8plus -dumpbase denoise3.c -m64 -mcpu=ultrasparc -mtune=ultrasparc -O1 -Wall -W -Wno-unused -version -fPIC -fschedule-insns2 -fno-math-errno -fno-strict-aliasing -o denoise3.s
GNU CPP version 3.1 20020328 (prerelease) (cpplib) (sparc ELF)
GNU C version 3.1 20020328 (prerelease) (sparcv9-sun-solaris2.8)
        compiled by GNU C version 3.1 20020328 (prerelease).
options passed:  -fpreprocessed -mptr64 -mstack-bias -mno-v8plus -m64
 -mcpu=ultrasparc -mtune=ultrasparc -O1 -Wall -W -Wno-unused -fPIC
 -fschedule-insns2 -fno-math-errno -fno-strict-aliasing
options enabled:  -fdefer-pop -fomit-frame-pointer -fthread-jumps
 -fpeephole -ffunction-cse -fkeep-static-consts -freg-struct-return
 -fdelayed-branch -fgcse-lm -fgcse-sm -fschedule-insns2 -fsched-interblock
 -fsched-spec -fbranch-count-reg -fPIC -fcprop-registers -fcommon
 -fgnu-linker -fargument-alias -fmerge-constants -fident
 -fguess-branch-probability -ftrapping-math -mepilogue -mptr64 -m64
 -mstack-bias -mcpu=ultrasparc -mtune=ultrasparc
 ___H__20_denoise3 {GC 72981k -> 25818k} {GC 33692k -> 25034k} {GC 49841k -> 28604k} {GC 42905k -> 28468k} {GC 42063k -> 33617k} {GC 55871k -> 36690k} ___init_proc ____20_denoise3
Execution times (seconds)
 garbage collection    :   4.31 ( 0%) usr   0.02 ( 0%) sys   6.00 ( 0%) wall
 cfg construction      :  66.65 ( 2%) usr  22.01 (51%) sys  89.00 ( 2%) wall
 cfg cleanup           :3156.77 (87%) usr   0.04 ( 0%) sys3193.00 (86%) wall
 life analysis         :  79.31 ( 2%) usr   0.00 ( 0%) sys  79.00 ( 2%) wall
 life info update      :   0.80 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 preprocessing         :   0.41 ( 0%) usr   1.81 ( 4%) sys   3.00 ( 0%) wall
 lexical analysis      :   0.45 ( 0%) usr   3.63 ( 8%) sys   5.00 ( 0%) wall
 parser                :   4.22 ( 0%) usr   2.46 ( 6%) sys   5.00 ( 0%) wall
 expand                :   2.00 ( 0%) usr   0.26 ( 1%) sys   3.00 ( 0%) wall
 varconst              :   0.65 ( 0%) usr   0.03 ( 0%) sys   0.00 ( 0%) wall
 integration           :   0.93 ( 0%) usr   0.04 ( 0%) sys   1.00 ( 0%) wall
 jump                  :   0.69 ( 0%) usr   0.01 ( 0%) sys   0.00 ( 0%) wall
 CSE                   :   4.66 ( 0%) usr   0.00 ( 0%) sys   5.00 ( 0%) wall
 loop analysis         :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 flow analysis         : 138.86 ( 4%) usr  13.01 (30%) sys 152.00 ( 4%) wall
 combiner              :   8.42 ( 0%) usr   0.00 ( 0%) sys   9.00 ( 0%) wall
 if-conversion         :  11.72 ( 0%) usr   0.01 ( 0%) sys  11.00 ( 0%) wall
 local alloc           :   2.63 ( 0%) usr   0.00 ( 0%) sys   2.00 ( 0%) wall
 global alloc          :  25.46 ( 1%) usr   0.00 ( 0%) sys  26.00 ( 1%) wall
 reload CSE regs       : 106.46 ( 3%) usr   0.00 ( 0%) sys 106.00 ( 3%) wall
 flow 2                :   4.41 ( 0%) usr   0.00 ( 0%) sys   5.00 ( 0%) wall
 if-conversion 2       :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.00 ( 0%) wall
 rename registers      :   8.73 ( 0%) usr   0.00 ( 0%) sys   9.00 ( 0%) wall
 scheduling 2          :   4.88 ( 0%) usr   0.01 ( 0%) sys   5.00 ( 0%) wall
 delay branch sched    :   4.05 ( 0%) usr   0.00 ( 0%) sys   3.00 ( 0%) wall
 shorten branches      :   0.50 ( 0%) usr   0.00 ( 0%) sys   2.00 ( 0%) wall
 final                 :   1.17 ( 0%) usr   0.03 ( 0%) sys   0.00 ( 0%) wall
 rest of compilation   :   2.50 ( 0%) usr   0.02 ( 0%) sys   3.00 ( 0%) wall
 TOTAL                 :3641.89            43.42          3722.00

The file denoise3.i is at

http://www.math.purdue.edu/~lucier/GNATS/GNATS-4/denoise3.i.gz

Do you want me to build a profiled version of cc1 with your patch?

Brad

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
  2002-03-28  6:39       ` Jan Hubicka
  2002-03-28 15:27         ` Brad Lucier
  2002-03-29 10:44         ` Brad Lucier
@ 2002-03-29 14:00         ` Brad Lucier
  2 siblings, 0 replies; 20+ messages in thread
From: Brad Lucier @ 2002-03-29 14:00 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Brad Lucier, Jan Hubicka, David Edelsohn, gcc, mark, feeley

> Hi, here is patch I made as a test.  It simply disables crossjumping
> if there is moer than 100 outgoing edges.  Unfortunately I can't benchmark
> your testcase as my machine runs out of space before getting there.  Can you
> check if this solves your problem?  If so, I will prepare more polished
> version of this patch.

You patch did help substantially, but not sufficiently.  I just
finished a run on denoise3.i with pre-patch cc1; the time for cfg cleanup
before your patch is

 cfg cleanup           :9486.98 (95%) usr   0.00 ( 0%) sys9510.00 (95%) wall

the time after your patch is

 cfg cleanup           :3156.77 (87%) usr   0.04 ( 0%) sys3193.00 (86%) wall

The times for other passes were substantially the same with and without
the patch.

Brad

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
  2002-03-29 10:37           ` Jan Hubicka
@ 2002-04-04  7:02             ` Brad Lucier
  2002-04-04  8:23               ` Jan Hubicka
  2002-04-04 12:43               ` Richard Henderson
  0 siblings, 2 replies; 20+ messages in thread
From: Brad Lucier @ 2002-04-04  7:02 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Brad Lucier, Jan Hubicka, David Edelsohn, gcc, mark, feeley

Honza and I have exchanged some off-list e-mail about this problem.  Real
life has intervened, and I doubt that he will have time before the scheduled
release of April 15 to work on this, so I'll attempt to summarize what's
been happening.  I think I need some help.

All this is with the compile options -O1 -fschedule-insns2.

At first, this was the profile on my all.i test:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 37.31   1106.44  1106.44 2123667007     0.00     0.00  try_crossjump_to_edge
 11.27   1440.70   334.26                             internal_mcount
  6.85   1643.67   202.97   395788     0.00     0.00  cselib_invalidate_regno
  6.53   1837.39   193.72                             htab_traverse
  4.67   1975.95   138.56     4987     0.03     0.03  propagate_freq
  2.87   2061.08    85.13       29     2.94     2.94  find_unreachable_blocks
  2.50   2135.09    74.01       15     4.93     4.94  calc_idoms
  2.48   2208.53    73.44   468802     0.00     0.00  try_forward_edges
  2.46   2281.48    72.95 173160573     0.00     0.00  cached_make_edge
  2.41   2353.01    71.53 175996207     0.00     0.00  bitmap_operation
...

cleanup cfg took > 98% of the CPU time of 18 hours on my UltraSPARC.

Honza sent me a patch on March 28 that disabled try_crossjump_bb if there
are more than 100 outgoint edges.  This changed the profile on a slightly
smaller example (denoise3.i) to

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 63.29   1219.40  1219.40    33525     0.04     0.04  try_crossjump_to_edge
  8.19   1377.13   157.73                             internal_mcount
  5.08   1474.99    97.86                             htab_traverse
  2.52   1523.56    48.57   206655     0.00     0.00  try_forward_edges
  2.40   1569.81    46.25       31     1.49     1.49  find_unreachable_blocks
  2.16   1611.38    41.57     2905     0.01     0.01  propagate_freq
  1.40   1638.30    26.91        9     2.99     5.48  calculate_global_regs_live
  1.33   1663.98    25.68       15     1.71     1.71  calc_idoms
  1.20   1687.02    23.04       15     1.54     1.54  calc_dfs_tree_nonrec

Basically, try_crossjump_bb was no longer a problem, but try_crossjump_to_edge
is still a problem; cleanup cfg still took > 87% of the CPU time.

Honza suggested (several times) that one way to deal with this is to disable
try_crossjump_to_edge and try_crossjump_bb with unless we use -O2 or higher.
These algorithms are O(N^3) in the number of edges, which are quadratic in
the size of my programs, since I use a lot of computed gotos and label 
addresses.

I've looked at cfgcleanup.c and don't really know how to proceed.  Can
someone suggest a reasonable way to fix this?

Brad

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
  2002-04-04  7:02             ` Brad Lucier
@ 2002-04-04  8:23               ` Jan Hubicka
  2002-04-04 12:46                 ` Richard Henderson
  2002-04-04 12:43               ` Richard Henderson
  1 sibling, 1 reply; 20+ messages in thread
From: Jan Hubicka @ 2002-04-04  8:23 UTC (permalink / raw)
  To: Brad Lucier; +Cc: Jan Hubicka, David Edelsohn, gcc, mark, feeley

> Honza and I have exchanged some off-list e-mail about this problem.  Real
> life has intervened, and I doubt that he will have time before the scheduled
> release of April 15 to work on this, so I'll attempt to summarize what's
> been happening.  I think I need some help.
> 
> All this is with the compile options -O1 -fschedule-insns2.
> 
> At first, this was the profile on my all.i test:
> 
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>  37.31   1106.44  1106.44 2123667007     0.00     0.00  try_crossjump_to_edge
>  11.27   1440.70   334.26                             internal_mcount
>   6.85   1643.67   202.97   395788     0.00     0.00  cselib_invalidate_regno
>   6.53   1837.39   193.72                             htab_traverse
>   4.67   1975.95   138.56     4987     0.03     0.03  propagate_freq
>   2.87   2061.08    85.13       29     2.94     2.94  find_unreachable_blocks
>   2.50   2135.09    74.01       15     4.93     4.94  calc_idoms
>   2.48   2208.53    73.44   468802     0.00     0.00  try_forward_edges
>   2.46   2281.48    72.95 173160573     0.00     0.00  cached_make_edge
>   2.41   2353.01    71.53 175996207     0.00     0.00  bitmap_operation
> ...
> 
> cleanup cfg took > 98% of the CPU time of 18 hours on my UltraSPARC.
> 
> Honza sent me a patch on March 28 that disabled try_crossjump_bb if there
> are more than 100 outgoint edges.  This changed the profile on a slightly
> smaller example (denoise3.i) to
> 
> Each sample counts as 0.01 seconds.
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls   s/call   s/call  name
>  63.29   1219.40  1219.40    33525     0.04     0.04  try_crossjump_to_edge
>   8.19   1377.13   157.73                             internal_mcount
>   5.08   1474.99    97.86                             htab_traverse
>   2.52   1523.56    48.57   206655     0.00     0.00  try_forward_edges
>   2.40   1569.81    46.25       31     1.49     1.49  find_unreachable_blocks
>   2.16   1611.38    41.57     2905     0.01     0.01  propagate_freq
>   1.40   1638.30    26.91        9     2.99     5.48  calculate_global_regs_live
>   1.33   1663.98    25.68       15     1.71     1.71  calc_idoms
>   1.20   1687.02    23.04       15     1.54     1.54  calc_dfs_tree_nonrec
> 
> Basically, try_crossjump_bb was no longer a problem, but try_crossjump_to_edge
> is still a problem; cleanup cfg still took > 87% of the CPU time.
> 
> Honza suggested (several times) that one way to deal with this is to disable
> try_crossjump_to_edge and try_crossjump_bb with unless we use -O2 or higher.
> These algorithms are O(N^3) in the number of edges, which are quadratic in

This is not the case. Tiny loop is O(N^2) in number of edges leaving basic
block, overall maximally O(N^3) in number of basic block for complette graph.

With work limiting check they are already O(N^2).  For complette graph.
This is little bit dificult to decrease futher, as crossjumping needs to
compare all sources and all destinations.  This is not problem for normal
control flow graphs, as they are sparse (number of edges is linear to nmuber
of blocks).

Old crossjumping didn't run into this problem just because it didn't handled
computed jump.  I guess one way to proceed is to simply bypass abnormal
edges when expensive optimization is disabled. (-O1)

Does this sound sane? If so, I have enought time to prepare path
tomorrow night.
> the size of my programs, since I use a lot of computed gotos and label 
> addresses.
> 
> I've looked at cfgcleanup.c and don't really know how to proceed.  Can
> someone suggest a reasonable way to fix this?

There is already patch in mainline for -fcrossjump-optimize.  I guess all
we need is to put it into branch and set default to 0 for -O1 level.
Overall -O1 can be made about 7% faster if crossjumping and jump threading
is disabled (that is also true for old implementation).  I am not sure
if the optimizations woth the CPU time for -O1 level.

Honza
> 
> Brad

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
  2002-04-04  7:02             ` Brad Lucier
  2002-04-04  8:23               ` Jan Hubicka
@ 2002-04-04 12:43               ` Richard Henderson
  2002-04-04 12:57                 ` Jan Hubicka
  2002-04-05 22:15                 ` Brad Lucier
  1 sibling, 2 replies; 20+ messages in thread
From: Richard Henderson @ 2002-04-04 12:43 UTC (permalink / raw)
  To: Brad Lucier; +Cc: Jan Hubicka, David Edelsohn, gcc, mark, feeley

On Thu, Apr 04, 2002 at 09:56:49AM -0500, Brad Lucier wrote:
> I've looked at cfgcleanup.c and don't really know how to proceed.  Can
> someone suggest a reasonable way to fix this?

How about stealing an idea from gcse_main.


r~



Index: toplev.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/toplev.c,v
retrieving revision 1.574.2.7
diff -u -p -r1.574.2.7 toplev.c
--- toplev.c	2002/04/03 07:53:52	1.574.2.7
+++ toplev.c	2002/04/04 20:40:22
@@ -2348,6 +2348,7 @@ rest_of_compilation (decl)
   int failure = 0;
   int rebuild_label_notes_after_reload;
   int register_life_up_to_date;
+  int cleanup_crossjump;
 
   timevar_push (TV_REST_OF_COMPILATION);
 
@@ -3268,9 +3269,21 @@ rest_of_compilation (decl)
      scheduling to operate in the epilogue.  */
   thread_prologue_and_epilogue_insns (insns);
 
+  /* Cross-jumping is O(N^3) on the number of edges, thus trying to
+     perform cross-jumping on flow graphs which have a high connectivity
+     will take a long time.  This is similar to the test to disable GCSE.  */
+  cleanup_crossjump = CLEANUP_CROSSJUMP;
+  if (n_basic_blocks > 1000 && n_edges / n_basic_blocks >= 20)
+    {
+      if (optimize && warn_disabled_optimization)
+	warning ("crossjump disabled: %d > 1000 basic blocks and %d >= 20 edges/basic block",
+                 n_basic_blocks, n_edges / n_basic_blocks);
+      cleanup_crossjump = 0;
+    }
+
   if (optimize)
     {
-      cleanup_cfg (CLEANUP_EXPENSIVE | CLEANUP_CROSSJUMP);
+      cleanup_cfg (CLEANUP_EXPENSIVE | cleanup_crossjump);
       life_analysis (insns, rtl_dump_file, PROP_FINAL);
 
       /* This is kind of a heuristic.  We need to run combine_stack_adjustments
@@ -3376,7 +3389,7 @@ rest_of_compilation (decl)
       /* Last attempt to optimize CFG, as life analyzis possibly removed
 	 some instructions.  */
       cleanup_cfg (CLEANUP_EXPENSIVE | CLEANUP_POST_REGSTACK
-		   | CLEANUP_CROSSJUMP);
+		   | cleanup_crossjump);
       if (flag_reorder_blocks)
 	{
 	  reorder_basic_blocks ();

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
  2002-04-04  8:23               ` Jan Hubicka
@ 2002-04-04 12:46                 ` Richard Henderson
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2002-04-04 12:46 UTC (permalink / raw)
  To: Jan Hubicka; +Cc: Brad Lucier, David Edelsohn, gcc, mark, feeley

On Thu, Apr 04, 2002 at 06:22:03PM +0200, Jan Hubicka wrote:
> Old crossjumping didn't run into this problem just because it didn't handled
> computed jump.  I guess one way to proceed is to simply bypass abnormal
> edges when expensive optimization is disabled. (-O1)

I'd say bypass abnormal edges always.


r~

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
  2002-04-04 12:43               ` Richard Henderson
@ 2002-04-04 12:57                 ` Jan Hubicka
  2002-04-05 22:15                 ` Brad Lucier
  1 sibling, 0 replies; 20+ messages in thread
From: Jan Hubicka @ 2002-04-04 12:57 UTC (permalink / raw)
  To: Richard Henderson, Brad Lucier, Jan Hubicka, David Edelsohn, gcc,
	mark, feeley

> On Thu, Apr 04, 2002 at 09:56:49AM -0500, Brad Lucier wrote:
> > I've looked at cfgcleanup.c and don't really know how to proceed.  Can
> > someone suggest a reasonable way to fix this?
> 
> How about stealing an idea from gcse_main.

Thats what supposed to be done by limiting the constant of outgoing edges,
that should have similar effect, just locally as there is no need to disable
optimization in sparse section of the CFG.

I think with about 100 basic block functions we can be able to reproduce same
timmings of cfg_cleanup as with the check as on the brad's testcase.  This is
not new problem, I can reproduce the same with old jump implementation.

But yes, this will work best for 3.1, where for 3.2 I can try to come with
better sollution.  I unfortunately and inexpectely have to finish my diploma
and school project till 12th instead of expected 30th making by schedule very
full, so I would not be able to do much development and testing on gcc.  Hope
no major problems will be shown in "my" parts of gcc till 12th.  I will try to
keep eye on the situation and hope to do some work next week, once I will know
more closely how far I am from finishing the work.

Thanks,
Honza

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
  2002-04-04 12:43               ` Richard Henderson
  2002-04-04 12:57                 ` Jan Hubicka
@ 2002-04-05 22:15                 ` Brad Lucier
  2002-04-06  5:10                   ` Mark Mitchell
  1 sibling, 1 reply; 20+ messages in thread
From: Brad Lucier @ 2002-04-05 22:15 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Brad Lucier, Jan Hubicka, David Edelsohn, gcc, mark, feeley

> 
> On Thu, Apr 04, 2002 at 09:56:49AM -0500, Brad Lucier wrote:
> > I've looked at cfgcleanup.c and don't really know how to proceed.  Can
> > someone suggest a reasonable way to fix this?
> 
> How about stealing an idea from gcse_main.
...

Thank you.

I've tested this on sparcv9-sun-solaris2.8 (sans libgcj and ada) and
there were no regressions.  This also fixes my particular problem with
long execution times.  I suggest that it be added to the 3.1 branch because
it fixes the problem, but I think a more long-term solution (like ignoring
branches from computed gotos) be applied to the 3.2 trunk later.

Brad

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: optimization/6007: cfg cleanup tremendous performance hog with -O1
  2002-04-05 22:15                 ` Brad Lucier
@ 2002-04-06  5:10                   ` Mark Mitchell
  0 siblings, 0 replies; 20+ messages in thread
From: Mark Mitchell @ 2002-04-06  5:10 UTC (permalink / raw)
  To: Brad Lucier, Richard Henderson; +Cc: Jan Hubicka, David Edelsohn, gcc, feeley



--On Saturday, April 06, 2002 01:02:37 AM -0500 Brad Lucier 
<lucier@math.purdue.edu> wrote:

>>
>> On Thu, Apr 04, 2002 at 09:56:49AM -0500, Brad Lucier wrote:
>> > I've looked at cfgcleanup.c and don't really know how to proceed.  Can
>> > someone suggest a reasonable way to fix this?
>>
>> How about stealing an idea from gcse_main.
> ...
>
> Thank you.
>
> I've tested this on sparcv9-sun-solaris2.8 (sans libgcj and ada) and
> there were no regressions.  This also fixes my particular problem with
> long execution times.  I suggest that it be added to the 3.1 branch
> because it fixes the problem, but I think a more long-term solution (like
> ignoring branches from computed gotos) be applied to the 3.2 trunk later.

I agree that we should apply this patch.  I'd prefer it if the patch
used the configurable limits stuff, rather than hardwiring the numbers,
but we can live with the present form.

Thanks,

-- 
Mark Mitchell                mark@codesourcery.com
CodeSourcery, LLC            http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2002-04-06  6:15 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-03-19 13:16 optimization/6007: cfg cleanup tremendous performance hog with -O1 lucier
2002-03-19 13:21 ` David Edelsohn
2002-03-22  4:29   ` Jan Hubicka
2002-03-24 20:03     ` Brad Lucier
2002-03-25  2:14       ` Jan Hubicka
2002-03-25  2:32         ` Richard Henderson
2002-03-25  2:59           ` Jan Hubicka
2002-03-28  6:39       ` Jan Hubicka
2002-03-28 15:27         ` Brad Lucier
2002-03-29  8:44           ` Jan Hubicka
2002-03-29 10:37           ` Jan Hubicka
2002-04-04  7:02             ` Brad Lucier
2002-04-04  8:23               ` Jan Hubicka
2002-04-04 12:46                 ` Richard Henderson
2002-04-04 12:43               ` Richard Henderson
2002-04-04 12:57                 ` Jan Hubicka
2002-04-05 22:15                 ` Brad Lucier
2002-04-06  5:10                   ` Mark Mitchell
2002-03-29 10:44         ` Brad Lucier
2002-03-29 14:00         ` Brad Lucier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).