* Re: if-conversion a performance bottleneck
@ 2000-05-04 22:32 Mike Stump
2000-05-04 22:35 ` Richard Henderson
0 siblings, 1 reply; 24+ messages in thread
From: Mike Stump @ 2000-05-04 22:32 UTC (permalink / raw)
To: lucier, rth; +Cc: gcc, matzmich
> Date: Thu, 4 May 2000 22:16:17 -0700
> From: Richard Henderson <rth@cygnus.com>
> > make -j bootstrap
> Err.. don't do that? Try make -j3 instead.
Unfortunately -j3 gives you not three jobs, but an exponential cascade
of 3 jobs per recursive make level. Yes, I have read the paper
`Recursive make considered harmful'. :-)
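
To make the cascade concrete, here is a rough back-of-the-envelope sketch in Python. It assumes, hypothetically, that every level of recursive make really does start its full quota of children at once, which is a worst case and not a measurement of a real bootstrap. The name `worst_case_jobs` is purely illustrative and has nothing to do with make itself.

```python
def worst_case_jobs(jobs_per_make: int, depth: int) -> int:
    """Upper bound on simultaneous leaf jobs.

    Assumption: each make at every one of `depth` recursive levels
    starts `jobs_per_make` children in parallel (worst case only).
    """
    return jobs_per_make ** depth


if __name__ == "__main__":
    for depth in range(1, 5):
        print(f"-j3, {depth} level(s) of make: up to {worst_case_jobs(3, depth)} jobs")
```

Under that assumption the bound grows geometrically with the nesting depth, which is why a small -j value can still produce many more than three processes.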
I found that -j3 -l10 will at least try and limit them from expanding
too much, which is useful if you're swap limited (just how did those 2
emacen grow to be 60M each, and netscape inflate to 90M? :-().
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: if-conversion a performance bottleneck
2000-05-04 22:32 if-conversion a performance bottleneck Mike Stump
@ 2000-05-04 22:35 ` Richard Henderson
2000-05-05 6:12 ` Brad Lucier
0 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2000-05-04 22:35 UTC (permalink / raw)
To: Mike Stump; +Cc: lucier, gcc, matzmich
On Thu, May 04, 2000 at 10:31:59PM -0700, Mike Stump wrote:
> Unfortunately -j3 gives you not three jobs, but an exponential cascade
> of 3 jobs per recursive make level.
I know. But 3 or 15 is still a lot better than 1000, which might
be what you get with just -j.
> I found that -j3 -l10 will at least try and limit them from expanding
> too much, which is useful if you're swap limited (just how did those 2
> emacen grow to be 60M each, and netscape inflate to 90M? :-().
That's what gigabytes of RAM are for. ;-)
r~
* Re: if-conversion a performance bottleneck
2000-05-04 22:35 ` Richard Henderson
@ 2000-05-05 6:12 ` Brad Lucier
2000-05-05 10:37 ` Richard Henderson
2000-05-05 13:35 ` Gerald Pfeifer
0 siblings, 2 replies; 24+ messages in thread
From: Brad Lucier @ 2000-05-05 6:12 UTC (permalink / raw)
To: Richard Henderson; +Cc: Mike Stump, lucier, gcc, matzmich
> On Thu, May 04, 2000 at 10:31:59PM -0700, Mike Stump wrote:
> > Unfortunately -j3 gives you not three jobs, but an exponential cascade
> > of 3 jobs per recursive make level.
>
> I know. But 3 or 15 is still a lot better than 1000, which might
> be what you get with just -j.
>
> > I found that -j3 -l10 will at least try and limit them from expanding
> > too much, which is useful if you're swap limited (just how did those 2
> > emacen grow to be 60M each, and netscape inflate to 90M? :-().
>
> That's what gigabytes of RAM are for. ;-)
I have gigabytes of RAM; I've watched what happens when I say, e.g.,
make -j9 bootstrap
or
make -j 9 bootstrap
or whatever, and it doesn't seem to actually set off a bunch of parallel
jobs---definitely, by the time the stage1 compiler starts compiling
things, I'm down to one job at a time. That's with the 2.2.13 kernel
on alpha with make 3.77; is this a known problem? Should I upgrade
make?
BTW, the maximum load with make -j is < 80 (without libgcc) and
things seem to stay rational because make gets less time as the
load rises and can't spawn jobs at such a high rate.
Brad
* Re: if-conversion a performance bottleneck
2000-05-05 6:12 ` Brad Lucier
@ 2000-05-05 10:37 ` Richard Henderson
2000-05-05 13:35 ` Gerald Pfeifer
1 sibling, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2000-05-05 10:37 UTC (permalink / raw)
To: Brad Lucier; +Cc: Mike Stump, gcc, matzmich
On Fri, May 05, 2000 at 08:12:34AM -0500, Brad Lucier wrote:
> make -j 9 bootstrap
>
> or whatever, and it doesn't seem to actually set off a bunch of parallel
> jobs--
make MAKE='make -j4' bootstrap -- the -j doesn't normally get
passed down to submakes.
> Should I upgrade make?
Apparently both of us should. I'm interested in this "build server"
thingy...
r~
* Re: if-conversion a performance bottleneck
2000-05-05 6:12 ` Brad Lucier
2000-05-05 10:37 ` Richard Henderson
@ 2000-05-05 13:35 ` Gerald Pfeifer
1 sibling, 0 replies; 24+ messages in thread
From: Gerald Pfeifer @ 2000-05-05 13:35 UTC (permalink / raw)
To: Brad Lucier; +Cc: Richard Henderson, Mike Stump, gcc, matzmich
On Fri, 5 May 2000, Brad Lucier wrote:
> make -j 9 bootstrap
>
> or whatever, and it doesn't seem to actually set off a bunch of parallel
> jobs---definitely, by the time the stage1 compiler starts compiling
> things, I'm down to one job at a time. That's with the 2.2.13 kernel
> on alpha with make 3.77; is this a known problem? Should I upgrade
> make?
Please do so and let us know the results!
It would be very useful to document properly how to use parallel
makes for GCC -- perhaps you could come up with a patch for our
documentation at
http://gcc.gnu.org/install/build.html ?
Gerald
--
Gerald "Jerry" pfeifer@dbai.tuwien.ac.at http://www.dbai.tuwien.ac.at/~pfeifer/
* Re: if-conversion a performance bottleneck
2000-05-10 9:36 ` Joe Buck
2000-05-10 9:52 ` Jeffrey A Law
@ 2000-05-10 14:17 ` Joern Rennecke
1 sibling, 0 replies; 24+ messages in thread
From: Joern Rennecke @ 2000-05-10 14:17 UTC (permalink / raw)
To: Joe Buck
Cc: Andreas Schwab, Richard Henderson, Brad Lucier, Michael Matz, gcc
> No, -l does not work and never did. The reason is that the load value
> increases very slowly as processes are launched; as a result, if you
> say make -j -l, a very large number of processes get launched initially,
> and it may take 10 seconds or so for the load to climb to the level
> specified in -l -- but then the load climbs to 20x that figure or more.
> At this point, no more processes get launched until at least a minute
> after all of the first batch dies, and then another process storm is
> launched. This is because the number checked by -l is based on an average
> load over a full minute, not an instantaneous figure.
>
> Solution: get make 3.79 and use -j<N>. The GNU make folks should
> deprecate -l and eventually remove it.
I wouldn't go so far. It seems clear that -l is not useful to make make
react to the load it creates itself, but you might use it instead of or in
addition to nice to control how much a batch job taxes a system that is
used for interactive stuff as well.
* Re: if-conversion a performance bottleneck
2000-05-10 9:52 ` Jeffrey A Law
@ 2000-05-10 10:49 ` Joe Buck
0 siblings, 0 replies; 24+ messages in thread
From: Joe Buck @ 2000-05-10 10:49 UTC (permalink / raw)
To: law; +Cc: Andreas Schwab, Richard Henderson, Brad Lucier, Michael Matz, gcc
> In message < 200005101628.JAA25911@possibly.synopsys.com >you write:
> > Solution: get make 3.79 and use -j<N>. The GNU make folks should
> > deprecate -l and eventually remove it.
> Most definitely. I upgraded to 3.79 a few days ago and have really liked
> the way -j works in the new version. I strongly recommend anyone building
> gcc on an SMP machine upgrade to make-3.79.
Even a 1-processor machine can benefit from -j2, as the compiler will
spend a significant portion of its time reading or writing disk, leaving
idle CPU for the other process.
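
As a rough illustration of that argument, the sketch below uses the textbook multiprogramming approximation: if each job is blocked on I/O for a fraction p of the time, and the jobs block independently, the CPU is idle only when all of them are blocked at once. The 30% I/O-wait figure is a made-up example value, not a measurement from any GCC build, and `cpu_utilization` is an illustrative name.

```python
def cpu_utilization(io_wait_fraction: float, jobs: int) -> float:
    """Idealized CPU utilization with `jobs` independent processes.

    Assumption: each process is blocked on I/O for `io_wait_fraction`
    of the time, independently of the others, on a single CPU.
    """
    return 1.0 - io_wait_fraction ** jobs


if __name__ == "__main__":
    # Hypothetical: the compiler waits on disk 30% of the time.
    for jobs in (1, 2, 3):
        print(f"-j{jobs}: about {cpu_utilization(0.30, jobs):.0%} CPU busy")
```

The model ignores memory pressure and disk contention, so it only shows why a second job can help, not how much it will help on a given machine.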
* Re: if-conversion a performance bottleneck
2000-05-10 9:36 ` Joe Buck
@ 2000-05-10 9:52 ` Jeffrey A Law
2000-05-10 10:49 ` Joe Buck
2000-05-10 14:17 ` Joern Rennecke
1 sibling, 1 reply; 24+ messages in thread
From: Jeffrey A Law @ 2000-05-10 9:52 UTC (permalink / raw)
To: Joe Buck
Cc: Andreas Schwab, Richard Henderson, Brad Lucier, Michael Matz, gcc
In message < 200005101628.JAA25911@possibly.synopsys.com >you write:
> Solution: get make 3.79 and use -j<N>. The GNU make folks should
> deprecate -l and eventually remove it.
Most definitely. I upgraded to 3.79 a few days ago and have really liked
the way -j works in the new version. I strongly recommend anyone building
gcc on an SMP machine upgrade to make-3.79.
jeff
* Re: if-conversion a performance bottleneck
2000-05-10 8:30 ` Andreas Schwab
@ 2000-05-10 9:36 ` Joe Buck
2000-05-10 9:52 ` Jeffrey A Law
2000-05-10 14:17 ` Joern Rennecke
0 siblings, 2 replies; 24+ messages in thread
From: Joe Buck @ 2000-05-10 9:36 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Richard Henderson, Brad Lucier, Michael Matz, gcc
> Richard Henderson <rth@cygnus.com> writes:
>
> |> On Thu, May 04, 2000 at 09:39:36PM -0500, Brad Lucier wrote:
> |> > BTW, with the introduction of libgcc1, I can no longer
> |> >
> |> > make -j bootstrap
> |> >
> |> > on my Linux box---I run out of processes with the standard value of
> |> > NR_TASKS=512! I install a new kernel tomorrow with NR_TASKS=4096.
> |>
> |> Err.. don't do that? Try make -j3 instead.
>
> Or say make -j -l <some small value> to limit depending on the load.
No, -l does not work and never did. The reason is that the load value
increases very slowly as processes are launched; as a result, if you
say make -j -l, a very large number of processes get launched initially,
and it may take 10 seconds or so for the load to climb to the level
specified in -l -- but then the load climbs to 20x that figure or more.
At this point, no more processes get launched until at least a minute
after all of the first batch dies, and then another process storm is
launched. This is because the number checked by -l is based on an average
load over a full minute, not an instantaneous figure.
Solution: get make 3.79 and use -j<N>. The GNU make folks should
deprecate -l and eventually remove it.
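
The lag described above can be reproduced with a small simulation. This is only a sketch under stated assumptions: it models the one-minute load figure as an exponentially damped average of the runnable-process count, updated every 5 seconds, which is how Linux-style load averages are commonly described. The 200-job storm is a made-up number, and `one_minute_load` is an illustrative name.

```python
import math


def one_minute_load(runnable, interval=5.0, period=60.0, start=0.0):
    """Exponentially damped average of the runnable-process count.

    `runnable` holds one sample per `interval` seconds; the return
    value is the load figure after each sample. The 5 s / 60 s
    constants are modelling assumptions.
    """
    decay = math.exp(-interval / period)
    load = start
    history = []
    for n in runnable:
        load = load * decay + n * (1.0 - decay)
        history.append(load)
    return history


if __name__ == "__main__":
    # Hypothetical storm: 200 jobs become runnable at t=0 and stay runnable.
    for step, load in enumerate(one_minute_load([200] * 12), start=1):
        print(f"t={step * 5:3d}s  load={load:6.1f}")
```

In this model the reported load is still far below the true number of runnable jobs after ten seconds, so a limit checked against it cannot stop the initial burst.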
> Andreas.
>
> --
> Andreas Schwab "And now for something
> SuSE Labs completely different."
> Andreas.Schwab@suse.de
> SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg
>
* Re: if-conversion a performance bottleneck
2000-05-04 22:16 ` Richard Henderson
@ 2000-05-10 8:30 ` Andreas Schwab
2000-05-10 9:36 ` Joe Buck
0 siblings, 1 reply; 24+ messages in thread
From: Andreas Schwab @ 2000-05-10 8:30 UTC (permalink / raw)
To: Richard Henderson; +Cc: Brad Lucier, Michael Matz, gcc
Richard Henderson <rth@cygnus.com> writes:
|> On Thu, May 04, 2000 at 09:39:36PM -0500, Brad Lucier wrote:
|> > BTW, with the introduction of libgcc1, I can no longer
|> >
|> > make -j bootstrap
|> >
|> > on my Linux box---I run out of processes with the standard value of
|> > NR_TASKS=512! I install a new kernel tomorrow with NR_TASKS=4096.
|>
|> Err.. don't do that? Try make -j3 instead.
Or say make -j -l <some small value> to limit depending on the load.
Andreas.
--
Andreas Schwab "And now for something
SuSE Labs completely different."
Andreas.Schwab@suse.de
SuSE GmbH, Schanzäckerstr. 10, D-90443 Nürnberg
* Re: if-conversion a performance bottleneck
@ 2000-05-05 13:46 Brad Lucier
0 siblings, 0 replies; 24+ messages in thread
From: Brad Lucier @ 2000-05-05 13:46 UTC (permalink / raw)
To: lucier, pfeifer; +Cc: rth, mrs, gcc, matzmich
> On Fri, 5 May 2000, Brad Lucier wrote:
> > make -j 9 bootstrap
> >
> > or whatever, and it doesn't seem to actually set off a bunch of parallel
> > jobs---definitely, by the time the stage1 compiler starts compiling
> > things, I'm down to one job at a time. That's with the 2.2.13 kernel
> > on alpha with make 3.77; is this a known problem? Should I upgrade
> > make?
>
> Please do so and let us know the results!
>
> It would be very useful to document properly, how to use parallel
> makes for GCC -- perhaps you could come up with a patch for our
> documentation at?
>
> http://gcc.gnu.org/install/build.html
After upgrading make from version 3.77 to version 3.79,
make -j 9 bootstrap
works as I think it should---it consistently has about 9 processes
running in various subdirectories throughout the build process.
Brad
* Re: if-conversion a performance bottleneck
2000-05-02 22:58 ` Michael Matz
` (2 preceding siblings ...)
2000-05-04 11:55 ` Richard Henderson
@ 2000-05-05 12:01 ` Jeffrey A Law
3 siblings, 0 replies; 24+ messages in thread
From: Jeffrey A Law @ 2000-05-05 12:01 UTC (permalink / raw)
To: Michael Matz; +Cc: Brad Lucier, Richard Henderson, gcc
In message < Pine.SOL.4.10.10005030741450.1121-200000@platon >you write:
> Hi,
>
> On Tue, 2 May 2000, Michael Matz wrote:
> > > time seconds seconds calls ms/call ms/call name
> > > 57.51 111.67 111.67 40604 2.75 2.75 sbitmap_intersection_of_succs
> > > 12.83 136.59 24.92 15025 1.66 1.66 sbitmap_intersection_of_preds
> >
> > I once had faster versions of these two functions, if I get home I'll see
> > if they make any difference on your input data.
>
> Before fiddling with sbitmap_intersection_of_xx() I first reworked
> compute_flow_dominators() to also behave normally in calculating the
> post_doms. At least the order of work-queue initialization was wrong (in
> post_dom the changes are propagating from the _end_). I then also
> implemented a poor man's topological sort for cyclic graphs ;), which
> again gave a better performance. (I also did this once for doms, but there
> it didn't make a great difference on Brad's test cases)
>
> Please try the attached diff (against actual CVS) if they make also a
> difference for you ;)
Like Richard, I'd like to see you submit this as a function which can be
called from multiple locations in the compiler.
We've got a number of routines that _might_ benefit from this code, but
I'd also like to see some more general benchmarking. I don't want to
see us slow down the compiler for the common cases just to make Brad's
one test run faster.
jeff
* Re: if-conversion a performance bottleneck
@ 2000-05-04 23:28 Mathias Froehlich
0 siblings, 0 replies; 24+ messages in thread
From: Mathias Froehlich @ 2000-05-04 23:28 UTC (permalink / raw)
To: gcc
> On Thu, May 04, 2000 at 10:31:59PM -0700, Mike Stump wrote:
> > Unfortunately -j3 gives you not three jobs, but an exponential cascade
> > of 3 jobs per recursive make level.
> I know. But 3 or 15 is still a lot better than 1000, which might
> be what you get with just -j.
> > I found that -j3 -l10 will at least try and limit them from expanding
> > too much, which is useful if you're swap limited (just how did those 2
> > emacen grow to be 60M each, and netscape inflate to 90M? :-().
> That's what gigabytes of RAM are for. ;-)
Or use a recent version of gmake (>=3.78.1), which implements a
so-called "jobserver mode". This limits the _overall_ number of working
make jobs to the number given by -j. This also works for recursive
makes.
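
The idea of one overall limit shared by all levels can be sketched as a pool of tokens. The sketch below is an illustration only, not GNU make's code: it assumes a pipe preloaded with one byte per job slot, where a worker must read a byte before it starts and writes it back when it finishes. It uses threads inside a single process, whereas a real jobserver would share the pool between separate make processes. The name `run_with_token_pool` is hypothetical.

```python
import os
import threading
import time


def run_with_token_pool(num_jobs: int, slots: int) -> int:
    """Run `num_jobs` dummy jobs, never more than `slots` at once.

    Returns the peak number of jobs observed running simultaneously.
    """
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"+" * slots)      # one token per job slot
    lock = threading.Lock()
    running = 0
    peak = 0

    def job():
        nonlocal running, peak
        token = os.read(read_fd, 1)       # blocks until a slot is free
        try:
            with lock:
                running += 1
                peak = max(peak, running)
            time.sleep(0.01)              # stand-in for a compile step
            with lock:
                running -= 1
        finally:
            os.write(write_fd, token)     # hand the slot back

    threads = [threading.Thread(target=job) for _ in range(num_jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    os.close(read_fd)
    os.close(write_fd)
    return peak


if __name__ == "__main__":
    print("peak concurrency:", run_with_token_pool(num_jobs=12, slots=3))
```

Because every worker draws from the same pool, the limit holds no matter how many workers are created or how deeply they are nested.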
Try it out; it works great ...
Regards,
Mathias Froehlich
--
Mathias Fröhlich e-mail: frohlich@na.uni-tuebingen.de
Institut für Mathematik, Universität Tübingen, D-72076 Tübingen
* Re: if-conversion a performance bottleneck
2000-05-04 19:39 ` Brad Lucier
@ 2000-05-04 22:16 ` Richard Henderson
2000-05-10 8:30 ` Andreas Schwab
0 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2000-05-04 22:16 UTC (permalink / raw)
To: Brad Lucier; +Cc: Michael Matz, gcc
On Thu, May 04, 2000 at 09:39:36PM -0500, Brad Lucier wrote:
> BTW, with the introduction of libgcc1, I can no longer
>
> make -j bootstrap
>
> on my Linux box---I run out of processes with the standard value of
> NR_TASKS=512! I install a new kernel tomorrow with NR_TASKS=4096.
Err.. don't do that? Try make -j3 instead.
r~
* Re: if-conversion a performance bottleneck
2000-05-04 11:46 ` Michael Matz
@ 2000-05-04 19:39 ` Brad Lucier
2000-05-04 22:16 ` Richard Henderson
0 siblings, 1 reply; 24+ messages in thread
From: Brad Lucier @ 2000-05-04 19:39 UTC (permalink / raw)
To: Michael Matz; +Cc: Brad Lucier, Richard Henderson, gcc
> Btw. do you have also slightly smaller test cases (a quarter or so)? At
> home I only have a slow machine and waiting 10 minutes every time to see,
> if one change was worthwhile is, erhh... slow.
Try
http://www.math.purdue.edu/~lucier/_t-c-1.i.gz
I get
popov-1089% /export/u10/egcs-profile/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/cc1 -O1 -mcpu=ev6 -fno-math-errno -mieee _t-c-1.i
...
TOTAL : 31.78 0.36 32.13
BTW, with the introduction of libgcc1, I can no longer
make -j bootstrap
on my Linux box---I run out of processes with the standard value of
NR_TASKS=512! I install a new kernel tomorrow with NR_TASKS=4096.
Brad
* Re: if-conversion a performance bottleneck
2000-05-02 22:58 ` Michael Matz
2000-05-03 5:50 ` Brad Lucier
2000-05-03 20:05 ` Brad Lucier
@ 2000-05-04 11:55 ` Richard Henderson
2000-05-05 12:01 ` Jeffrey A Law
3 siblings, 0 replies; 24+ messages in thread
From: Richard Henderson @ 2000-05-04 11:55 UTC (permalink / raw)
To: Michael Matz; +Cc: Brad Lucier, gcc
On Wed, May 03, 2000 at 07:53:00AM +0200, Michael Matz wrote:
> I then also implemented a poor man's topological sort for cyclic graphs
> ;), which again gave a better performance. (I also did this once for doms,
> but there it didn't make a great difference on Brad's test cases)
Would you split this off into a new function? Perhaps called
order_bb_for_backward_flow or something? This could be useful elsewhere.
r~
* Re: if-conversion a performance bottleneck
2000-05-03 20:05 ` Brad Lucier
@ 2000-05-04 11:46 ` Michael Matz
2000-05-04 19:39 ` Brad Lucier
0 siblings, 1 reply; 24+ messages in thread
From: Michael Matz @ 2000-05-04 11:46 UTC (permalink / raw)
To: Brad Lucier; +Cc: Richard Henderson, gcc
Hi,
On Wed, 3 May 2000, Brad Lucier wrote:
> Your changes to flow.c have cut the number of calls to
> sbitmap_intersection_of_succs from 40604 to 24225, so they are definitely
> worthwhile.
Yes, that was the purpose of the patch ;)
Anyway, I saw some issues with that patch on one of your other test files,
where the runtime exploded, and on my HDD at home I have another version
which reduces the time even more. If I find time, I'll rewrite the sbitmap
routines for large homogeneous bitmaps, clean up all that stuff and see if
anything comes out of it ;)
Btw., do you also have slightly smaller test cases (a quarter of the size
or so)? At home I only have a slow machine, and waiting 10 minutes every
time to see whether a change was worthwhile is, erhh... slow.
> rest of compilation : 1.07 ( 0%) usr 0.00 ( 0%) sys 1.07 ( 0%) wall
> TOTAL : 226.10 1.27 227.32
God! For me that was ca. 1110 seconds ;) (Now I'm down to 580)
Ciao,
Michael.
* Re: if-conversion a performance bottleneck
2000-05-02 22:58 ` Michael Matz
2000-05-03 5:50 ` Brad Lucier
@ 2000-05-03 20:05 ` Brad Lucier
2000-05-04 11:46 ` Michael Matz
2000-05-04 11:55 ` Richard Henderson
2000-05-05 12:01 ` Jeffrey A Law
3 siblings, 1 reply; 24+ messages in thread
From: Brad Lucier @ 2000-05-03 20:05 UTC (permalink / raw)
To: Michael Matz; +Cc: Brad Lucier, Richard Henderson, gcc
> Please try the attached diff (against actual CVS) if they make also a
> difference for you ;)
Your changes to flow.c have cut the number of calls to
sbitmap_intersection_of_succs from 40604 to 24225, so they are definitely
worthwhile. Bootstrapped on alphaev6-unknown-linux-gnu.
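
For a sense of scale, the relative changes can be worked out from the figures in the flat profiles posted in this thread (before and after the patch). The sketch below only does that arithmetic; `percent_change` is an illustrative helper.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # (before, after) pairs copied from the gprof flat profiles in this thread.
    figures = {
        "calls to sbitmap_intersection_of_succs": (40604, 24225),
        "self seconds in sbitmap_intersection_of_succs": (111.67, 69.80),
    }
    for name, (before, after) in figures.items():
        print(f"{name}: {percent_change(before, after):+.1f}%")
```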
Brad
Here are the new statistics with your changes:
Execution times (seconds)
garbage collection : 1.30 ( 1%) usr 0.00 ( 0%) sys 1.30 ( 1%) wall
parser : 6.45 ( 3%) usr 0.19 (15%) sys 6.64 ( 3%) wall
varconst : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
integration : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
jump : 20.06 ( 9%) usr 0.84 (66%) sys 20.90 ( 9%) wall
CSE : 2.77 ( 1%) usr 0.00 ( 0%) sys 2.77 ( 1%) wall
global CSE : 5.51 ( 2%) usr 0.01 ( 1%) sys 5.52 ( 2%) wall
loop analysis : 0.24 ( 0%) usr 0.00 ( 0%) sys 0.24 ( 0%) wall
CSE 2 : 2.29 ( 1%) usr 0.00 ( 0%) sys 2.29 ( 1%) wall
flow analysis : 52.11 (23%) usr 0.06 ( 5%) sys 52.16 (23%) wall
combiner : 2.80 ( 1%) usr 0.00 ( 0%) sys 2.80 ( 1%) wall
if-conversion : 44.42 (20%) usr 0.02 ( 2%) sys 44.43 (20%) wall
regmove : 0.50 ( 0%) usr 0.00 ( 0%) sys 0.50 ( 0%) wall
scheduling : 6.26 ( 3%) usr 0.01 ( 1%) sys 6.27 ( 3%) wall
local alloc : 1.49 ( 1%) usr 0.00 ( 0%) sys 1.49 ( 1%) wall
global alloc : 2.78 ( 1%) usr 0.07 ( 6%) sys 2.86 ( 1%) wall
reload CSE regs : 7.92 ( 4%) usr 0.01 ( 1%) sys 7.93 ( 3%) wall
flow 2 : 19.36 ( 9%) usr 0.00 ( 0%) sys 19.35 ( 9%) wall
if-conversion 2 : 36.38 (16%) usr 0.01 ( 1%) sys 36.38 (16%) wall
peephole 2 : 0.03 ( 0%) usr 0.00 ( 0%) sys 0.03 ( 0%) wall
schedulding 2 : 8.88 ( 4%) usr 0.00 ( 0%) sys 8.88 ( 4%) wall
shorten branches : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
final : 3.22 ( 1%) usr 0.00 ( 0%) sys 3.22 ( 1%) wall
symout : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
rest of compilation : 1.07 ( 0%) usr 0.00 ( 0%) sys 1.07 ( 0%) wall
TOTAL : 226.10 1.27 227.32
Flat profile:
Each sample counts as 0.000976562 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 42.54      69.80    69.80    24225     2.88     2.88  sbitmap_intersection_of_succs
 17.24      98.09    28.29    15025     1.88     1.88  sbitmap_intersection_of_preds
  5.03     106.34     8.25       42   196.36   196.36  mark_critical_edges
  3.98     112.86     6.52 25085231     0.00     0.00  bitmap_operation
  3.56     118.70     5.84    10248     0.57     0.62  compute_block_backward_dependences
  3.44     124.35     5.65        9   627.60 11546.70  compute_flow_dominators
  2.37     128.24     3.89       21   185.36   185.93  delete_unreachable_blocks
  2.31     132.03     3.79        6   631.02  1760.19  calculate_global_regs_live
...
-----------------------------------------------
1.88 32.76 3/9 flow_loops_find [9]
3.77 65.51 6/9 if_convert [8]
[6] 63.3 5.65 98.27 9 compute_flow_dominators [6]
69.80 0.00 24225/24225 sbitmap_intersection_of_succs [7]
28.29 0.00 15025/15025 sbitmap_intersection_of_preds [10]
0.16 0.00 39259/39259 sbitmap_a_and_b [135]
0.02 0.00 9/27 sbitmap_vector_alloc [255]
0.00 0.00 9/18 sbitmap_vector_zero [771]
0.00 0.00 9/74716 sbitmap_zero [725]
0.00 0.00 9/9 sbitmap_vector_ones [1365]
-----------------------------------------------
69.80 0.00 24225/24225 compute_flow_dominators [6]
[7] 42.5 69.80 0.00 24225 sbitmap_intersection_of_succs [7]
0.00 0.00 24225/39250 sbitmap_copy [612]
-----------------------------------------------
0.00 69.40 12/12 rest_of_compilation [5]
[8] 42.3 0.00 69.40 12 if_convert [8]
3.77 65.51 6/9 compute_flow_dominators [6]
0.00 0.06 12/43 free_basic_block_vars [112]
0.03 0.00 12/48 compute_bb_for_insn [147]
0.00 0.01 20509/20509 find_if_header [391]
0.01 0.00 6/27 sbitmap_vector_alloc [255]
0.00 0.00 1/10258 update_life_info [12]
0.00 0.00 1/37 allocate_reg_info [540]
0.00 0.00 1/20500 count_or_remove_death_notes [36]
0.00 0.00 1/995 sbitmap_alloc [679]
0.00 0.00 2/145008 max_reg_num [374]
0.00 0.00 1/74716 sbitmap_zero [725]
0.00 0.00 12/5225 get_max_uid [1134]
-----------------------------------------------
2.96 36.17 3/3 rest_of_compilation [5]
[9] 23.8 2.96 36.17 3 flow_loops_find [9]
1.88 32.76 3/9 compute_flow_dominators [6]
0.76 0.00 964/964 flow_loop_exits_find [38]
0.75 0.00 1/1 flow_depth_first_order_compute [39]
0.01 0.00 3/27 sbitmap_vector_alloc [255]
0.00 0.00 964/964 flow_loop_pre_header_find [627]
0.00 0.00 966/995 sbitmap_alloc [679]
0.00 0.00 3/3 flow_loops_tree_build [747]
0.00 0.00 964/964 flow_loop_nodes_find [791]
0.00 0.00 964/964 sbitmap_last_set_bit [834]
0.00 0.00 2/74716 sbitmap_zero [725]
0.00 0.00 964/964 sbitmap_first_set_bit [1174]
0.00 0.00 3/3 flow_loops_level_compute [1435]
-----------------------------------------------
28.29 0.00 15025/15025 compute_flow_dominators [6]
[10] 17.2 28.29 0.00 15025 sbitmap_intersection_of_preds [10]
0.00 0.00 15025/39250 sbitmap_copy [612]
-----------------------------------------------
* Re: if-conversion a performance bottleneck
2000-05-02 22:58 ` Michael Matz
@ 2000-05-03 5:50 ` Brad Lucier
2000-05-03 20:05 ` Brad Lucier
` (2 subsequent siblings)
3 siblings, 0 replies; 24+ messages in thread
From: Brad Lucier @ 2000-05-03 5:50 UTC (permalink / raw)
To: Michael Matz; +Cc: Brad Lucier, Richard Henderson, gcc
> Please try the attached diff (against actual CVS) if they make also a
> difference for you ;)
The diff didn't apply cleanly, I don't know why, so I applied it by hand.
I didn't have time this morning to install a new profiled version, so
I ran make bootstrap with the usual BOOT_CFLAGS.
Your patch did help. Here is a comparison between a profiled version
without your patch and an unprofiled version with your patch:
popov-726% /export/u10/egcs-profile/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/cc1 -O1 -mcpu=ev6 -fno-math-errno -mieee _t-c-2.i
__copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20___t_2d_c_2d_2 {GC 46791k -> 9596k} {GC 14695k -> 10090k} {GC 14135k -> 10558k} ___init_proc {GC 23543k -> 2096k} ____20___t_2d_c_2d_2
Execution times (seconds)
garbage collection : 0.71 ( 0%) usr 0.00 ( 1%) sys 0.71 ( 0%) wall
parser : 6.13 ( 3%) usr 0.19 (25%) sys 6.32 ( 3%) wall
varconst : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
integration : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
jump : 6.45 ( 3%) usr 0.39 (51%) sys 6.84 ( 3%) wall
CSE : 2.20 ( 1%) usr 0.00 ( 1%) sys 2.20 ( 1%) wall
global CSE : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
loop analysis : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
CSE 2 : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
flow analysis : 63.17 (27%) usr 0.06 ( 8%) sys 63.21 (27%) wall
combiner : 2.81 ( 1%) usr 0.00 ( 0%) sys 2.81 ( 1%) wall
if-conversion : 56.39 (24%) usr 0.01 ( 2%) sys 56.39 (24%) wall
local alloc : 1.24 ( 1%) usr 0.00 ( 0%) sys 1.24 ( 1%) wall
global alloc : 2.35 ( 1%) usr 0.05 ( 8%) sys 2.41 ( 1%) wall
reload CSE regs : 3.68 ( 2%) usr 0.00 ( 0%) sys 3.68 ( 2%) wall
flow 2 : 29.86 (13%) usr 0.00 ( 1%) sys 29.86 (13%) wall
if-conversion 2 : 58.60 (25%) usr 0.01 ( 2%) sys 58.60 (25%) wall
shorten branches : 0.11 ( 0%) usr 0.00 ( 0%) sys 0.11 ( 0%) wall
final : 2.68 ( 1%) usr 0.00 ( 1%) sys 2.70 ( 1%) wall
symout : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
rest of compilation : 1.02 ( 0%) usr 0.00 ( 0%) sys 1.02 ( 0%) wall
TOTAL : 237.52 0.77 238.47
popov-727% gcc -v
Reading specs from /export/u10/gcc-2.95.1/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.95.1/specs
gcc version 2.95.1 19990816 (release)
popov-728% /export/u10/egcs-test/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/cc1 -O1 -mcpu=ev6 -fno-math-errno -mieee _t-c-2.i
__copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20___t_2d_c_2d_2 {GC 46791k -> 9596k} {GC 14695k -> 10090k} {GC 14135k -> 10558k} ___init_proc {GC 23543k -> 2096k} ____20___t_2d_c_2d_2
Execution times (seconds)
garbage collection : 0.32 ( 0%) usr 0.00 ( 1%) sys 0.32 ( 0%) wall
parser : 2.92 ( 2%) usr 0.18 (25%) sys 3.10 ( 2%) wall
varconst : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
integration : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
jump : 4.56 ( 3%) usr 0.38 (52%) sys 4.94 ( 3%) wall
CSE : 0.81 ( 1%) usr 0.00 ( 0%) sys 0.81 ( 1%) wall
global CSE : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
loop analysis : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
CSE 2 : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
flow analysis : 51.69 (32%) usr 0.06 ( 8%) sys 51.74 (32%) wall
combiner : 0.99 ( 1%) usr 0.00 ( 0%) sys 0.99 ( 1%) wall
if-conversion : 33.69 (21%) usr 0.01 ( 2%) sys 33.70 (21%) wall
local alloc : 0.54 ( 0%) usr 0.00 ( 0%) sys 0.54 ( 0%) wall
global alloc : 1.11 ( 1%) usr 0.05 ( 7%) sys 1.16 ( 1%) wall
reload CSE regs : 3.15 ( 2%) usr 0.00 ( 1%) sys 3.15 ( 2%) wall
flow 2 : 23.85 (15%) usr 0.00 ( 0%) sys 23.84 (15%) wall
if-conversion 2 : 33.45 (21%) usr 0.01 ( 2%) sys 33.45 (21%) wall
shorten branches : 0.06 ( 0%) usr 0.00 ( 0%) sys 0.06 ( 0%) wall
final : 2.46 ( 2%) usr 0.00 ( 0%) sys 2.46 ( 2%) wall
symout : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
rest of compilation : 0.43 ( 0%) usr 0.00 ( 0%) sys 0.43 ( 0%) wall
TOTAL : 160.09 0.73 160.78
I can get more information this afternoon (this evening in Germany :-).
Brad
* Re: if-conversion a performance bottleneck
2000-05-02 5:45 ` Michael Matz
@ 2000-05-02 22:58 ` Michael Matz
2000-05-03 5:50 ` Brad Lucier
` (3 more replies)
0 siblings, 4 replies; 24+ messages in thread
From: Michael Matz @ 2000-05-02 22:58 UTC (permalink / raw)
To: Brad Lucier; +Cc: Richard Henderson, gcc
Hi,
On Tue, 2 May 2000, Michael Matz wrote:
> > time seconds seconds calls ms/call ms/call name
> > 57.51 111.67 111.67 40604 2.75 2.75 sbitmap_intersection_of_succs
> > 12.83 136.59 24.92 15025 1.66 1.66 sbitmap_intersection_of_preds
>
> I once had faster versions of these two functions, if I get home I'll see
> if they make any difference on your input data.
Before fiddling with sbitmap_intersection_of_xx() I first reworked
compute_flow_dominators() so that it also behaves sensibly when calculating
the post_doms. At the very least, the order of work-queue initialization was
wrong (for post_dom the changes propagate from the _end_). I then also
implemented a poor man's topological sort for cyclic graphs ;), which
again gave better performance. (I also did this once for doms, but there
it didn't make a great difference on Brad's test cases.)
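
Why the visiting order matters for post-dominators can be seen in a toy model. The sketch below is not the code in flow.c: it assumes a simple round-robin iteration over Python sets in place of the work queue and sbitmaps, and a straight-line CFG of five hypothetical blocks. It only counts how many passes are needed when blocks are visited from the entry versus from the exit.

```python
def post_dominators(succs, exit_block, order):
    """Iterative post-dominator sets; returns (pdom, passes).

    `succs` maps each block to its successors; `order` is the
    sequence in which blocks are visited on every pass.
    """
    blocks = set(succs)
    pdom = {b: set(blocks) for b in blocks}
    pdom[exit_block] = {exit_block}
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for b in order:
            if b == exit_block:
                continue
            new = set(blocks)
            for s in succs[b]:
                new &= pdom[s]    # plays the role of intersecting over successors
            new.add(b)
            if new != pdom[b]:
                pdom[b] = new
                changed = True
    return pdom, passes


if __name__ == "__main__":
    # Hypothetical CFG: 0 -> 1 -> 2 -> 3 -> 4, with block 4 as the exit.
    chain = {0: [1], 1: [2], 2: [3], 3: [4], 4: []}
    _, from_entry = post_dominators(chain, 4, [0, 1, 2, 3, 4])
    _, from_exit = post_dominators(chain, 4, [4, 3, 2, 1, 0])
    print("passes visiting from the entry:", from_entry)
    print("passes visiting from the exit: ", from_exit)
```

Both orders reach the same sets, but starting from the exit lets each pass build on results computed earlier in the same pass, so fewer intersections are needed overall.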
Please try the attached diff (against current CVS) to see whether it also
makes a difference for you ;)
Ciao,
Michael.
* Re: if-conversion a performance bottleneck
2000-05-01 18:01 ` Brad Lucier
@ 2000-05-02 5:45 ` Michael Matz
2000-05-02 22:58 ` Michael Matz
0 siblings, 1 reply; 24+ messages in thread
From: Michael Matz @ 2000-05-02 5:45 UTC (permalink / raw)
To: Brad Lucier; +Cc: Richard Henderson, gcc
On Mon, 1 May 2000, Brad Lucier wrote:
> Each sample counts as 0.000976562 seconds.
> % cumulative self self total
> time seconds seconds calls ms/call ms/call name
> 57.51 111.67 111.67 40604 2.75 2.75 sbitmap_intersection_of_succs
> 12.83 136.59 24.92 15025 1.66 1.66 sbitmap_intersection_of_preds
I once had faster versions of these two functions; when I get home I'll see
if they make any difference on your input data.
Ciao,
Michael.
* Re: if-conversion a performance bottleneck
2000-05-01 15:29 ` Richard Henderson
@ 2000-05-01 18:01 ` Brad Lucier
2000-05-02 5:45 ` Michael Matz
0 siblings, 1 reply; 24+ messages in thread
From: Brad Lucier @ 2000-05-01 18:01 UTC (permalink / raw)
To: Richard Henderson; +Cc: Brad Lucier, gcc
> Oops. That call was supposed to be ifdef ENABLE_CHECKING.
> With that fixed, are compilation times mostly sane?
> r~
Yes, that helped a lot :-), but if-conversion still takes almost half
the CPU time to compile that file:
popov-211% /export/u10/egcs-profile/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/cc1 -O1 -mcpu=ev6 -fno-math-errno -mieee _t-c-2.i
__copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20___t_2d_c_2d_2 {GC 46791k -> 9596k} {GC 14695k -> 10090k} {GC 14135k -> 10558k} ___init_proc {GC 23543k -> 2096k} ____20___t_2d_c_2d_2
Execution times (seconds)
garbage collection : 0.70 ( 0%) usr 0.00 ( 0%) sys 0.70 ( 0%) wall
parser : 6.25 ( 3%) usr 0.18 (22%) sys 6.43 ( 3%) wall
varconst : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
integration : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
jump : 6.50 ( 3%) usr 0.38 (48%) sys 6.88 ( 3%) wall
CSE : 2.28 ( 1%) usr 0.00 ( 0%) sys 2.28 ( 1%) wall
global CSE : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
loop analysis : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
CSE 2 : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
flow analysis : 58.86 (25%) usr 0.11 (14%) sys 58.99 (25%) wall
combiner : 2.87 ( 1%) usr 0.00 ( 0%) sys 2.87 ( 1%) wall
if-conversion : 59.53 (26%) usr 0.02 ( 3%) sys 59.59 (26%) wall
local alloc : 1.24 ( 1%) usr 0.00 ( 0%) sys 1.24 ( 1%) wall
global alloc : 2.37 ( 1%) usr 0.05 ( 6%) sys 2.42 ( 1%) wall
reload CSE regs : 3.88 ( 2%) usr 0.00 ( 0%) sys 3.88 ( 2%) wall
flow 2 : 27.86 (12%) usr 0.00 ( 0%) sys 27.86 (12%) wall
if-conversion 2 : 55.02 (24%) usr 0.01 ( 2%) sys 55.02 (24%) wall
shorten branches : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.12 ( 0%) wall
final : 2.77 ( 1%) usr 0.01 ( 1%) sys 2.78 ( 1%) wall
symout : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
rest of compilation : 1.02 ( 0%) usr 0.00 ( 0%) sys 1.02 ( 0%) wall
TOTAL : 231.37 0.81 232.20
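The "almost half" figure can be checked from the table itself: the two if-conversion rows together account for 59.53 + 55.02 = 114.55 of the 231.37 user seconds. A small sketch of that arithmetic follows; share_percent is a hypothetical helper name, not anything in GCC.

```c
#include <assert.h>

/* Percentage of `total` taken by `part`.  Hypothetical helper for
   checking the timing table by hand; not part of GCC. */
double share_percent(double part, double total)
{
    assert(total > 0.0);   /* avoid dividing by zero */
    return 100.0 * part / total;
}
```

Calling share_percent(59.53 + 55.02, 231.37) gives roughly 49.5, which matches the 26% and 24% shown for the two passes once rounding is taken into account.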
Flat profile:
Each sample counts as 0.000976562 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
57.51 111.67 111.67 40604 2.75 2.75 sbitmap_intersection_of_succs
12.83 136.59 24.92 15025 1.66 1.66 sbitmap_intersection_of_preds
9.58 155.19 18.60 20981 0.89 0.89 for_each_successor_phi
3.78 162.54 7.35 25079015 0.00 0.00 bitmap_operation
1.99 166.40 3.86 6 643.72 5013.82 calculate_global_regs_live
1.71 169.73 3.33 18 184.84 184.84 mark_critical_edges
1.57 172.78 3.05 9 338.87 15538.80 compute_flow_dominators
1.49 175.67 2.89 3 963.87 16982.59 flow_loops_find
...
-----------------------------------------------
1.02 45.60 3/9 flow_loops_find [9]
2.03 91.20 6/9 if_convert [8]
[6] 72.0 3.05 136.80 9 compute_flow_dominators [6]
111.67 0.00 40604/40604 sbitmap_intersection_of_succs [7]
24.92 0.00 15025/15025 sbitmap_intersection_of_preds [12]
0.18 0.00 55638/55638 sbitmap_a_and_b [77]
0.02 0.00 9/21 sbitmap_vector_alloc [217]
0.00 0.00 9/9 sbitmap_vector_ones [744]
0.00 0.00 9/12 sbitmap_vector_zero [758]
0.00 0.00 9/31743 sbitmap_zero [752]
-----------------------------------------------
111.67 0.00 40604/40604 compute_flow_dominators [6]
[7] 57.5 111.67 0.00 40604 sbitmap_intersection_of_succs [7]
0.00 0.00 40604/55629 sbitmap_copy [470]
-----------------------------------------------
0.00 96.36 9/9 rest_of_compilation [5]
[8] 49.6 0.00 96.36 9 if_convert [8]
2.03 91.20 6/9 compute_flow_dominators [6]
0.00 3.05 1/10 update_life_info [10]
0.03 0.00 9/18 compute_bb_for_insn [190]
0.00 0.02 9/37 free_basic_block_vars [135]
0.00 0.02 15384/15384 find_if_header [297]
0.01 0.00 6/21 sbitmap_vector_alloc [217]
0.00 0.00 1/25 allocate_reg_info [481]
0.00 0.00 9/5192 get_max_uid [727]
0.00 0.00 2/64864 max_reg_num [454]
0.00 0.00 1/31743 sbitmap_zero [752]
0.00 0.00 1/974 sbitmap_alloc [1076]
0.00 0.00 1/1 count_or_remove_death_notes [1385]
-----------------------------------------------
2.89 48.06 3/3 rest_of_compilation [5]
[9] 26.2 2.89 48.06 3 flow_loops_find [9]
1.02 45.60 3/9 compute_flow_dominators [6]
0.71 0.00 964/964 flow_loop_exits_find [31]
0.71 0.00 1/1 flow_depth_first_order_compute [32]
0.01 0.00 3/21 sbitmap_vector_alloc [217]
0.00 0.00 3/3 flow_loops_tree_build [604]
0.00 0.00 964/964 flow_loop_nodes_find [698]
0.00 0.00 964/964 sbitmap_first_set_bit [739]
0.00 0.00 964/964 flow_loop_pre_header_find [738]
0.00 0.00 2/31743 sbitmap_zero [752]
0.00 0.00 966/974 sbitmap_alloc [1076]
0.00 0.00 964/964 sbitmap_last_set_bit [1078]
0.00 0.00 3/3 flow_loops_level_compute [1331]
-----------------------------------------------
0.00 3.05 1/10 if_convert [8]
0.00 9.14 3/10 rest_of_compilation [5]
0.00 18.28 6/10 life_analysis [14]
[10] 15.7 0.00 30.47 10 update_life_info [10]
3.86 26.22 6/6 calculate_global_regs_live [11]
0.02 0.35 15374/27234 propagate_block [34]
0.00 0.00 15374/27239 free_propagate_block_info [389]
0.01 0.00 15374/72751 bitmap_copy [262]
0.00 0.00 5125/5125 verify_local_live_at_start [573]
0.00 0.00 10/257366 bitmap_clear [283]
0.00 0.00 10/124375 bitmap_initialize [407]
-----------------------------------------------
3.86 26.22 6/6 update_life_info [10]
[11] 15.5 3.86 26.22 6 calculate_global_regs_live [11]
18.60 0.00 20981/20981 for_each_successor_phi [13]
7.30 0.00 24929375/25079015 bitmap_operation [15]
0.01 0.27 11855/27234 propagate_block [34]
0.01 0.00 24210/72751 bitmap_copy [262]
0.00 0.00 52694/257366 bitmap_clear [283]
0.00 0.00 11855/27239 free_propagate_block_info [389]
0.00 0.00 11855/11855 bitmap_equal_p [525]
0.00 0.00 20981/488036 bitmap_set_bit [167]
0.00 0.00 10261/124375 bitmap_initialize [407]
-----------------------------------------------
24.92 0.00 15025/15025 compute_flow_dominators [6]
[12] 12.8 24.92 0.00 15025 sbitmap_intersection_of_preds [12]
0.00 0.00 15025/55629 sbitmap_copy [470]
-----------------------------------------------
18.60 0.00 20981/20981 calculate_global_regs_live [11]
[13] 9.6 18.60 0.00 20981 for_each_successor_phi [13]
-----------------------------------------------
* Re: if-conversion a performance bottleneck
2000-05-01 10:54 Brad Lucier
@ 2000-05-01 15:29 ` Richard Henderson
2000-05-01 18:01 ` Brad Lucier
0 siblings, 1 reply; 24+ messages in thread
From: Richard Henderson @ 2000-05-01 15:29 UTC (permalink / raw)
To: Brad Lucier; +Cc: gcc
On Mon, May 01, 2000 at 12:54:35PM -0500, Brad Lucier wrote:
> Perhaps with the calls to verify_flow_info, this is a temporary effect,
> I don't know.
Oops. That call was supposed to be ifdef ENABLE_CHECKING.
With that fixed, are compilation times mostly sane?
r~
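The fix described above, compiling an expensive consistency check only when a checking macro is defined, can be sketched as follows. This is a minimal illustration and not the actual GCC change: verify_graph, run_pass, and verify_calls are hypothetical names standing in for verify_flow_info and the if-conversion pass. Only the ENABLE_CHECKING macro name comes from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often the (hypothetical) checker ran, for illustration only. */
int verify_calls = 0;

/* Hypothetical stand-in for an expensive checker such as verify_flow_info:
   walks the whole graph and aborts if any successor index is out of range. */
void verify_graph(const size_t *succ, size_t n)
{
    for (size_t i = 0; i < n; i++)
        assert(succ[i] < n);
    verify_calls++;
}

/* Hypothetical pass: counts the nodes whose only successor is themselves.
   The checker is compiled in only when ENABLE_CHECKING is defined, so a
   normal build pays nothing for it. */
size_t run_pass(const size_t *succ, size_t n)
{
    size_t self_loops = 0;

#ifdef ENABLE_CHECKING
    verify_graph(succ, n);
#endif

    for (size_t i = 0; i < n; i++)
        if (succ[i] == i)
            self_loops++;
    return self_loops;
}
```

Building with -DENABLE_CHECKING turns the check on; without it, run_pass never calls verify_graph.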
* if-conversion a performance bottleneck
@ 2000-05-01 10:54 Brad Lucier
2000-05-01 15:29 ` Richard Henderson
0 siblings, 1 reply; 24+ messages in thread
From: Brad Lucier @ 2000-05-01 10:54 UTC (permalink / raw)
To: gcc; +Cc: lucier
The patch
http://gcc.gnu.org/ml/gcc-patches/2000-04/msg00248.html
killed one of the biggest time hogs in gcc, but now Richard
Henderson's if-conversion pass has affected performance adversely.
Perhaps with the calls to verify_flow_info, this is a temporary effect,
I don't know.
On this file
http://www.math.purdue.edu/~lucier/_t-c-2.i.gz
called with these options
/export/u10/egcs-profile/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/cc1 -fno-math-errno -mieee -g -mcpu=ev6 -O1 _t-c-2.i > & cc1.out &
with this compiler
popov-41% gcc -v
Reading specs from /export/u10/egcs-profile/lib/gcc-lib/alphaev6-unknown-linux-gnu/2.96/specs
gcc version 2.96 20000430 (experimental)
I get the following performance on a 500 MHz alpha 21264:
__copysignf copysignf __copysign copysign __fabsf fabsf __fabs fabs __floorf __floor floorf floor __fdimf fdimf __fdim fdim ___H__20___t_2d_c_2d_2 {GC 47533k -> 10293k} {GC 15386k -> 10788k} {GC 14832k -> 11255k} ___init_proc {GC 24248k -> 2107k} ____20___t_2d_c_2d_2
Execution times (seconds)
garbage collection : 0.75 ( 0%) usr 0.00 ( 0%) sys 0.75 ( 0%) wall
parser : 6.55 ( 0%) usr 0.18 (15%) sys 6.73 ( 0%) wall
varconst : 0.01 ( 0%) usr 0.00 ( 0%) sys 0.01 ( 0%) wall
integration : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
jump : 6.81 ( 0%) usr 0.40 (32%) sys 7.21 ( 0%) wall
CSE : 2.29 ( 0%) usr 0.00 ( 0%) sys 2.29 ( 0%) wall
global CSE : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
loop analysis : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
CSE 2 : 0.00 ( 0%) usr 0.00 ( 0%) sys 0.00 ( 0%) wall
flow analysis : 61.00 ( 2%) usr 0.07 ( 6%) sys 61.05 ( 2%) wall
combiner : 2.82 ( 0%) usr 0.00 ( 0%) sys 2.82 ( 0%) wall
if-conversion : 2306.07 (64%) usr 0.44 (35%) sys 2306.74 (64%) wall
local alloc : 1.31 ( 0%) usr 0.00 ( 0%) sys 1.31 ( 0%) wall
global alloc : 2.42 ( 0%) usr 0.04 ( 4%) sys 2.46 ( 0%) wall
reload CSE regs : 3.62 ( 0%) usr 0.00 ( 0%) sys 3.62 ( 0%) wall
flow 2 : 30.20 ( 1%) usr 0.00 ( 0%) sys 30.20 ( 1%) wall
if-conversion 2 : 1196.27 (33%) usr 0.09 ( 7%) sys 1196.01 (33%) wall
shorten branches : 0.12 ( 0%) usr 0.00 ( 0%) sys 0.13 ( 0%) wall
final : 2.69 ( 0%) usr 0.00 ( 1%) sys 2.70 ( 0%) wall
symout : 0.02 ( 0%) usr 0.00 ( 0%) sys 0.02 ( 0%) wall
rest of compilation : 1.06 ( 0%) usr 0.00 ( 0%) sys 1.06 ( 0%) wall
TOTAL : 3624.12 1.27 3625.23
Some of the details are:
Flat profile:
Each sample counts as 0.000976562 seconds.
% cumulative self self total
time seconds seconds calls ms/call ms/call name
49.50 192.40 192.40 9 21378.26 21381.98 verify_flow_info
28.10 301.64 109.24 40604 2.69 2.69 sbitmap_intersection_of_succs
6.75 327.90 26.25 15025 1.75 1.75 sbitmap_intersection_of_preds
5.38 348.79 20.89 20981 1.00 1.00 for_each_successor_phi
1.87 356.06 7.27 25079015 0.00 0.00 bitmap_operation
...
-----------------------------------------------
0.00 288.44 9/9 rest_of_compilation [5]
[6] 74.2 0.00 288.44 9 if_convert [6]
192.40 0.03 9/9 verify_flow_info [7]
2.17 90.46 6/9 compute_flow_dominators [8]
0.00 3.27 1/10 update_life_info [11]
0.00 0.03 9/37 free_basic_block_vars [99]
0.03 0.00 9/18 compute_bb_for_insn [177]
0.00 0.02 15384/15384 find_if_header [331]
0.01 0.00 6/21 sbitmap_vector_alloc [222]
0.00 0.00 1/25 allocate_reg_info [536]
0.00 0.00 2/64864 max_reg_num [371]
0.00 0.00 9/5201 get_max_uid [1093]
0.00 0.00 1/974 sbitmap_alloc [1124]
0.00 0.00 1/31743 sbitmap_zero [1070]
0.00 0.00 1/1 count_or_remove_death_notes [1414]
-----------------------------------------------
192.40 0.03 9/9 if_convert [6]
[7] 49.5 192.40 0.03 9 verify_flow_info [7]
0.00 0.03 15008/26961 returnjump_p [183]
0.00 0.00 3/87970 condjump_p [388]
0.00 0.00 9/5201 get_max_uid [1093]
0.00 0.00 9/231 get_insns [1157]
-----------------------------------------------
1.09 45.23 3/9 flow_loops_find [10]
2.17 90.46 6/9 if_convert [6]
[8] 35.7 3.26 135.69 9 compute_flow_dominators [8]
109.24 0.01 40604/40604 sbitmap_intersection_of_succs [9]
26.25 0.00 15025/15025 sbitmap_intersection_of_preds [13]
0.17 0.00 55638/55638 sbitmap_a_and_b [88]
0.02 0.00 9/21 sbitmap_vector_alloc [222]
0.00 0.00 9/9 sbitmap_vector_ones [688]
0.00 0.00 9/12 sbitmap_vector_zero [788]
0.00 0.00 9/31743 sbitmap_zero [1070]
-----------------------------------------------
109.24 0.01 40604/40604 compute_flow_dominators [8]
[9] 28.1 109.24 0.01 40604 sbitmap_intersection_of_succs [9]
0.01 0.00 40604/55629 sbitmap_copy [396]
-----------------------------------------------
Brad Lucier
end of thread, other threads:[~2000-05-10 14:17 UTC | newest]
Thread overview: 24+ messages
2000-05-04 22:32 if-conversion a performance bottleneck Mike Stump
2000-05-04 22:35 ` Richard Henderson
2000-05-05 6:12 ` Brad Lucier
2000-05-05 10:37 ` Richard Henderson
2000-05-05 13:35 ` Gerald Pfeifer
-- strict thread matches above, loose matches on Subject: below --
2000-05-05 13:46 Brad Lucier
2000-05-04 23:28 Mathias Froehlich
2000-05-01 10:54 Brad Lucier
2000-05-01 15:29 ` Richard Henderson
2000-05-01 18:01 ` Brad Lucier
2000-05-02 5:45 ` Michael Matz
2000-05-02 22:58 ` Michael Matz
2000-05-03 5:50 ` Brad Lucier
2000-05-03 20:05 ` Brad Lucier
2000-05-04 11:46 ` Michael Matz
2000-05-04 19:39 ` Brad Lucier
2000-05-04 22:16 ` Richard Henderson
2000-05-10 8:30 ` Andreas Schwab
2000-05-10 9:36 ` Joe Buck
2000-05-10 9:52 ` Jeffrey A Law
2000-05-10 10:49 ` Joe Buck
2000-05-10 14:17 ` Joern Rennecke
2000-05-04 11:55 ` Richard Henderson
2000-05-05 12:01 ` Jeffrey A Law