Timings for copying collection vs non-copying collection

public inbox for gcc@gcc.gnu.org

* Timings for copying collection vs non-copying collection
@ 2002-12-13  9:58 Daniel Berlin
  2002-12-13 12:44 ` Matt Austern
  0 siblings, 1 reply; 2+ messages in thread
From: Daniel Berlin @ 2002-12-13  9:58 UTC (permalink / raw)
  To: gcc

Okay, after Geoff's suggestion to try the pch-branch, I rewrote the
copying collector (much easier to do on the pch-branch, *thanks*
Geoff), and have some first timings.
A few notes:
1. Ignore GC times; this is a non-optimized copying collector.
2. These times are consistent to a few *tenths* (few = 2 max) of a 
second (for each pass) over multiple runs.  So pass times < 1 second 
are probably too noisy to be useful.
3. There is a bootstrap of another tree running in the background for
this run, so ignore the wall clock time (the likely reason for 2, BTW).
4.  I'm just pasting one run as representative. The wall clock times 
obviously differed for each run.
5. The cc1's in question are not compiled with optimization.
6. Literally the only difference in cc1 between the two is that one is
linked with ggc-page, one with ggc-copy (i.e. no other files are
recompiled; they have the exact same object files being linked in).
7. The assembler output is the same for copying collection and 
non-copying collection.
8. GCC's memory usage actually shrinks after garbage collection with
the copying collector, so it's definitely doing its job.
9. Heap size for the copying collector is fixed at 64 meg.
10. This is a P4 1.7GHz computer with 768 meg of memory.
With ggc-page, compiling 20001221-1.c:


garbage collection    :   0.45 ( 0%) usr   0.01 ( 2%) sys   0.69 ( 0%) wall
cfg construction      :   0.31 ( 0%) usr   0.01 ( 2%) sys   0.84 ( 0%) wall
cfg cleanup           :   5.39 ( 5%) usr   0.01 ( 2%) sys  10.76 ( 6%) wall
trivially dead code   :   0.19 ( 0%) usr   0.00 ( 0%) sys   0.18 ( 0%) wall
life analysis         :   1.40 ( 1%) usr   0.01 ( 2%) sys   2.70 ( 1%) wall
life info update      :   0.61 ( 1%) usr   0.00 ( 0%) sys   1.21 ( 1%) wall
preprocessing         :   0.15 ( 0%) usr   0.11 (17%) sys   0.41 ( 0%) wall
lexical analysis      :   0.30 ( 0%) usr   0.23 (35%) sys   0.92 ( 0%) wall
parser                :   0.72 ( 1%) usr   0.13 (20%) sys   1.68 ( 1%) wall
expand                :   0.18 ( 0%) usr   0.00 ( 0%) sys   0.33 ( 0%) wall
integration           :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
jump                  :   0.86 ( 1%) usr   0.04 ( 6%) sys   1.95 ( 1%) wall
CSE                   :   2.77 ( 3%) usr   0.00 ( 0%) sys   5.62 ( 3%) wall
global CSE            :   0.69 ( 1%) usr   0.08 (12%) sys   1.52 ( 1%) wall
loop analysis         :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
CSE 2                 :   0.27 ( 0%) usr   0.00 ( 0%) sys   0.42 ( 0%) wall
branch prediction     :  26.96 (27%) usr   0.01 ( 2%) sys  53.45 (28%) wall
flow analysis         :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.24 ( 0%) wall
combiner              :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.29 ( 0%) wall
if-conversion         :  11.55 (12%) usr   0.00 ( 0%) sys  22.98 (12%) wall
regmove               :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.07 ( 0%) wall
mode switching        :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.31 ( 0%) wall
local alloc           :   0.22 ( 0%) usr   0.00 ( 0%) sys   0.52 ( 0%) wall
global alloc          :  19.84 (20%) usr   0.01 ( 2%) sys  37.17 (19%) wall
reload CSE regs       :   0.36 ( 0%) usr   0.00 ( 0%) sys   0.81 ( 0%) wall
flow 2                :   0.12 ( 0%) usr   0.00 ( 0%) sys   0.27 ( 0%) wall
if-conversion 2       :   5.81 ( 6%) usr   0.00 ( 0%) sys  10.38 ( 5%) wall
peephole 2            :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
rename registers      :   0.14 ( 0%) usr   0.00 ( 0%) sys   0.29 ( 0%) wall
scheduling 2          :  18.43 (19%) usr   0.01 ( 2%) sys  34.19 (18%) wall
reorder blocks        :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
shorten branches      :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
final                 :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.06 ( 0%) wall
rest of compilation   :   0.26 ( 0%) usr   0.00 ( 0%) sys   0.71 ( 0%) wall
TOTAL                 :  98.78             0.66           191.35

Total time: ~99 seconds
GC time: ~.5 seconds
So ~98.5 seconds excluding GC time.

With ggc-copy:

garbage collection    :   1.47 ( 2%) usr   0.05 ( 7%) sys   2.50 ( 1%) wall
cfg construction      :   0.33 ( 0%) usr   0.01 ( 1%) sys   0.50 ( 0%) wall
cfg cleanup           :   5.44 ( 6%) usr   0.02 ( 3%) sys   9.06 ( 5%) wall
trivially dead code   :   0.16 ( 0%) usr   0.00 ( 0%) sys   0.30 ( 0%) wall
life analysis         :   1.51 ( 2%) usr   0.02 ( 3%) sys   3.12 ( 2%) wall
life info update      :   0.58 ( 1%) usr   0.00 ( 0%) sys   1.03 ( 1%) wall
preprocessing         :   0.11 ( 0%) usr   0.07 ( 9%) sys   0.18 ( 0%) wall
lexical analysis      :   0.42 ( 0%) usr   0.20 (26%) sys   1.34 ( 1%) wall
parser                :   0.65 ( 1%) usr   0.10 (13%) sys   1.14 ( 1%) wall
expand                :   0.13 ( 0%) usr   0.02 ( 3%) sys   0.24 ( 0%) wall
integration           :   0.07 ( 0%) usr   0.00 ( 0%) sys   0.22 ( 0%) wall
jump                  :   0.96 ( 1%) usr   0.04 ( 5%) sys   1.71 ( 1%) wall
CSE                   :   2.40 ( 3%) usr   0.03 ( 4%) sys   4.64 ( 3%) wall
global CSE            :   0.68 ( 1%) usr   0.09 (12%) sys   1.59 ( 1%) wall
loop analysis         :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
CSE 2                 :   0.23 ( 0%) usr   0.00 ( 0%) sys   0.53 ( 0%) wall
branch prediction     :  24.16 (26%) usr   0.05 ( 7%) sys  46.38 (27%) wall
flow analysis         :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall
combiner              :   0.11 ( 0%) usr   0.00 ( 0%) sys   0.32 ( 0%) wall
if-conversion         :  11.68 (13%) usr   0.00 ( 0%) sys  22.72 (13%) wall
regmove               :   0.06 ( 0%) usr   0.00 ( 0%) sys   0.21 ( 0%) wall
mode switching        :   0.15 ( 0%) usr   0.00 ( 0%) sys   0.30 ( 0%) wall
local alloc           :   0.25 ( 0%) usr   0.00 ( 0%) sys   0.55 ( 0%) wall
global alloc          :  12.65 (14%) usr   0.03 ( 4%) sys  24.31 (14%) wall
reload CSE regs       :   0.33 ( 0%) usr   0.00 ( 0%) sys   0.52 ( 0%) wall
flow 2                :   0.09 ( 0%) usr   0.00 ( 0%) sys   0.09 ( 0%) wall
if-conversion 2       :   5.85 ( 6%) usr   0.00 ( 0%) sys  10.73 ( 6%) wall
peephole 2            :   0.03 ( 0%) usr   0.00 ( 0%) sys   0.03 ( 0%) wall
rename registers      :   0.10 ( 0%) usr   0.01 ( 1%) sys   0.26 ( 0%) wall
scheduling 2          :  20.56 (22%) usr   0.01 ( 1%) sys  37.06 (21%) wall
reorder blocks        :   0.05 ( 0%) usr   0.00 ( 0%) sys   0.05 ( 0%) wall
shorten branches      :   0.04 ( 0%) usr   0.00 ( 0%) sys   0.04 ( 0%) wall
final                 :   0.04 ( 0%) usr   0.01 ( 1%) sys   0.05 ( 0%) wall
rest of compilation   :   0.24 ( 0%) usr   0.00 ( 0%) sys   0.77 ( 0%) wall
TOTAL                 :  91.67             0.76           172.63
Total time: ~91.5 seconds
GC time: ~1.5 seconds
So ~90 seconds excluding GC time.

Roughly an 8% difference in overall speed.
Memory footprint when not doing collection is obviously smaller for the 
copying collector.
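
For anyone who wants to check the arithmetic behind that comparison, here is a small sketch. It uses only the usr-column TOTAL and "garbage collection" figures from the two reports above; the variable and function names are made up for illustration.

```python
# usr-column TOTAL and "garbage collection" rows from the two reports above.
page_total, page_gc = 98.78, 0.45   # cc1 linked with ggc-page
copy_total, copy_gc = 91.67, 1.47   # cc1 linked with ggc-copy

def percent_faster(old, new):
    """Reduction in time going from old to new, as a percentage of old."""
    return (old - new) / old * 100.0

# Compare once with GC time included, once with it subtracted out.
with_gc = percent_faster(page_total, copy_total)
without_gc = percent_faster(page_total - page_gc, copy_total - copy_gc)

print(f"including GC: {with_gc:.1f}% faster")
print(f"excluding GC: {without_gc:.1f}% faster")
```

On these figures the difference comes out at roughly 7% with GC time included and roughly 8% with it excluded.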

Some observations:

Global alloc takes about a third less time with the copying collector
(12.65 vs 19.84 seconds). This surprised me, but it's consistent over
multiple runs.

Branch prediction is consistently 2 to 3 seconds faster (~10%).

Locality for long lived objects isn't as good as it could be, since we 
aren't generational.  This is likely to account for the scheduling 2 
time increase.
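
The per-pass differences mentioned in these observations can be pulled out of the reports mechanically. The following is just a sketch: the regular expression is an assumption about the row layout based on the rows shown above, not something taken from GCC, and only three rows from each report are included.

```python
import re

# One report row: pass name, then usr, sys and wall seconds, each followed
# by a parenthesised percentage that is skipped here.
ROW = re.compile(
    r"^(?P<name>.+?)\s*:\s*"
    r"(?P<usr>\d+\.\d+)\s*\(\s*\d+%\)\s*usr\s+"
    r"(?P<sys>\d+\.\d+)\s*\(\s*\d+%\)\s*sys\s+"
    r"(?P<wall>\d+\.\d+)\s*\(\s*\d+%\)\s*wall$"
)

def parse_report(text):
    """Return {pass name: usr seconds} for every line that matches ROW."""
    times = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            times[m.group("name")] = float(m.group("usr"))
    return times

# Three rows copied from each report above.
page_times = parse_report("""
branch prediction     :  26.96 (27%) usr   0.01 ( 2%) sys  53.45 (28%) wall
global alloc          :  19.84 (20%) usr   0.01 ( 2%) sys  37.17 (19%) wall
scheduling 2          :  18.43 (19%) usr   0.01 ( 2%) sys  34.19 (18%) wall
""")
copy_times = parse_report("""
branch prediction     :  24.16 (26%) usr   0.05 ( 7%) sys  46.38 (27%) wall
global alloc          :  12.65 (14%) usr   0.03 ( 4%) sys  24.31 (14%) wall
scheduling 2          :  20.56 (22%) usr   0.01 ( 1%) sys  37.06 (21%) wall
""")

# A negative delta means the pass ran faster with ggc-copy.
deltas = {name: copy_times[name] - page_times[name] for name in page_times}
for name, d in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{name:20s} {d:+6.2f}s ({d / page_times[name] * 100:+.0f}%)")
```

Since this is a single run with a bootstrap going in the background, the usr column is the one compared; the wall column would need a quieter machine.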

Things that touch a lot of RTL seem to be doing better with the copying 
collector.
Whatever the memory pattern is in global alloc is likely causing
horrendous numbers of cache misses for ggc-page, due to fragmentation
or locality (no idea which). This is a guess; I'll run the VTune beta
for Linux and see if I'm right.
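
To illustrate why copying can help with fragmentation, here is a toy Cheney-style semispace copy. This is not the ggc-copy code; it is a sketch that models the heap as a Python list, with objects as dicts and references as list indices, all of which are assumptions made purely for illustration.

```python
def copy_collect(from_space, roots):
    """Cheney-style copy: return (to_space, new_roots) with only live objects.

    from_space is a list of objects; each object is a dict whose "refs" list
    holds indices of other objects in from_space.
    """
    to_space = []   # live objects end up packed here, in visit order
    forward = {}    # old index -> new index (stands in for forwarding pointers)

    def evacuate(old):
        # Copy an object on first visit; afterwards just return its new index.
        if old not in forward:
            forward[old] = len(to_space)
            to_space.append({"name": from_space[old]["name"],
                             "refs": list(from_space[old]["refs"])})
        return forward[old]

    new_roots = [evacuate(r) for r in roots]

    # The scan index chases the end of to_space: breadth-first, no recursion.
    scan = 0
    while scan < len(to_space):
        obj = to_space[scan]
        obj["refs"] = [evacuate(r) for r in obj["refs"]]
        scan += 1
    return to_space, new_roots


# A fragmented heap: live objects a -> b -> c are interleaved with garbage.
heap = [
    {"name": "a", "refs": [4]},
    {"name": "junk1", "refs": []},
    {"name": "c", "refs": []},
    {"name": "junk2", "refs": [1]},
    {"name": "b", "refs": [2]},
]
live, roots = copy_collect(heap, roots=[0])
print([obj["name"] for obj in live])
```

After the copy, the three reachable objects sit next to each other and the garbage is simply left behind, which is the compaction effect guessed at above. Whether that is what actually helps global alloc is still a guess until it is profiled.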

I haven't yet done C++ timings to see if it speeds up the parser/expand 
passes.

All in all it looks, at the start, like it might be worth it to go to
copying collection.
But these are just first timings, as I said.
The numbers look good enough that I'll keep implementing.

Would people like me to post the patch against the pch branch for 
copying collection so they can try it out themselves?
--Dan


* Re: Timings for copying collection vs non-copying collection
  2002-12-13  9:58 Timings for copying collection vs non-copying collection Daniel Berlin
@ 2002-12-13 12:44 ` Matt Austern
  0 siblings, 0 replies; 2+ messages in thread
From: Matt Austern @ 2002-12-13 12:44 UTC (permalink / raw)
  To: Daniel Berlin; +Cc: gcc

On Friday, December 13, 2002, at 09:35  AM, Daniel Berlin wrote:

> Would people like me to post the patch against the pch branch for 
> copying collection so they can try it out themselves?

I'd like to see it.  I think there are a few GC
experiments we ought to be trying, and this is
clearly one of them.

			--Matt
