* Speeding up GC
From: Martin v. Löwis <loewis@informatik.hu-berlin.de>
Date: 2002-06-03
To: gcc

While profiling cc1plus, I noticed that a large amount of time is spent
marking objects during GC. I suspect that, for each collection, the same
objects are marked over and over again. It appears that speed-ups are
possible by not traversing parts of the graph that are known not to have
changed since the previous collection.

While it is, in general, not possible to determine whether an object has
changed, I believe that for certain objects (in particular DECL nodes)
there is a point in time when they are "frozen", meaning that they won't
change anymore. I suggest recording this fact in the object (with the
opportunity to unfreeze the object).

During GC traversal, GC should detect "immutable" objects. An object can
be marked immutable if it is frozen and all contained objects are
immutable. Then, when marking objects, GC should not traverse into
immutable objects.

An exception probably needs to be made for identifiers: they are always
mutable, since the associated decls might change. However, it is not
necessary to traverse into identifiers from other tree nodes, since all
identifiers are GC roots anyway.

In my specific application, 20% of the memory is consumed by PARM_DECLs.
I believe that those would be candidates for being marked as immutable,
and that the containers immediately storing them could be marked as
immutable as well (after the list of parameters for a declaration has
been built). All those PARM_DECLs indeed live up to the end of the
compilation; they are never collected, and never changed. Thus, GC
traversal could completely ignore them.

Does this approach have a chance of speeding up GC? What do you think?

Regards,
Martin
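Martin's freeze/immutable scheme can be sketched roughly as below. This
is an illustrative sketch only -- `gc_obj`, its bit fields, and
`gc_mark` are invented names, not GCC's actual ggc interface -- and it
assumes the sweep phase treats immutable objects as live (keep an object
if it is marked *or* immutable), since an immutable object's children
are deliberately skipped during marking:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical GC object: a mark bit, the proposed "frozen" and
   "immutable" bits, and an array of child pointers.  */
struct gc_obj {
  unsigned marked : 1;
  unsigned frozen : 1;     /* front end says: will not change again */
  unsigned immutable : 1;  /* frozen AND everything reachable immutable */
  size_t nchildren;
  struct gc_obj **children;
};

static void
gc_mark (struct gc_obj *obj)
{
  if (obj == NULL || obj->marked)
    return;
  obj->marked = 1;
  /* The proposed speed-up: an immutable object's reachable set cannot
     have changed since the last collection, so skip the traversal.
     The sweep must then keep any object that is marked OR immutable.  */
  if (obj->immutable)
    return;
  for (size_t i = 0; i < obj->nchildren; i++)
    gc_mark (obj->children[i]);
}
```

Marking an immutable root therefore touches one object instead of its
whole subgraph, which is exactly where the hoped-for savings come from.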
* Re: Speeding up GC
From: David S. Miller <davem@redhat.com>
Date: 2002-06-03
To: loewis; Cc: gcc

From: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: 03 Jun 2002 15:25:05 +0200

> Does this approach have a chance of speeding up GC? What do you think?

One problem with your idea is that it would make the mark list for each
page more complex. Currently it is a single bitmap with one bit per
object allocated from that page.

A page can (currently) contain objects of various types. Trees, RTL,
and various other things can all reside within the same page. To
implement your idea would likely be more expensive than the GC walk
itself. That is, unless someone can come up with some fantastic
implementation I haven't thought of.

A more pressing problem, which I am working on, is that GCC allocates a
HUGE number of throw-away objects, i.e. things that are allocated and
then immediately never referenced again. The two major classes of such
objects are:

1) SEQUENCE rtl for holding generated insns.

   This is the part I am working on right now. We emit a ton of these
   things, especially when splitting insns. And the current
   "optimization" in gen_sequence which tries to avoid emitting the
   SEQUENCE causes the INSN to be dropped, so we end up emitting two
   INSNs instead. This stuff is so dumb, and I hope I can finish up my
   patch before I next go to sleep.

2) "Can I do this?" type RTL generation.

   Often we generate RTL just to see if creating a particular kind of
   expression is possible. We do this a lot. One example of this is
   product_cheap_p () in gcc/loop.c.

All of these are bad because they fragment GC pages, so if you look at
pages at various points in time you'll see a layout like this:

   ... some other objects ... | dead SEQUENCE | ... | dead SEQUENCE

That dead SEQUENCE keeps the whole page from being reused for other
orders.
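The per-page mark list David describes at the top -- a single bitmap
with one bit per object slot in the page -- can be pictured with a small
sketch. The names and the page geometry here are invented for
illustration, not GCC's actual ggc-page data structures:

```c
#include <assert.h>
#include <stdint.h>

#define OBJECTS_PER_PAGE 512  /* hypothetical slot count for one page */

struct gc_page {
  /* One mark bit per object slot; a flat bitmap keeps the common
     "already marked?" test to a single word load and bit test.  */
  uint32_t mark_bits[OBJECTS_PER_PAGE / 32];
};

/* Mark the object in SLOT; return nonzero if it was already marked.  */
static int
test_and_set_mark (struct gc_page *p, unsigned slot)
{
  uint32_t bit = 1u << (slot % 32);
  uint32_t old = p->mark_bits[slot / 32];
  p->mark_bits[slot / 32] = old | bit;
  return (old & bit) != 0;
}
```

The objection in the mail follows directly: any per-object "immutable"
state would have to live alongside this bitmap and be consulted on
every mark, complicating a structure that is currently one bit per
object.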
It also makes GC's working set of pages larger than it needs to be.

The next order of business is locality, as noted by Richard Earnshaw
and others. GC makes for horrible locality in the RTL.

One way to deal with this is to create an obstack-like allocator from
the GC pages. Currently each page is for allocating objects of a
certain power-of-2 size. In the obstack-like GC allocator, all objects
of a certain class (for example, long-term RTL) get carved out of a
single page. This can hopefully bring back the locality properties of
obstacks and the benefits of GC at the same time.

I have a work-in-progress patch for this, which I'll get back to after
my patch that kills off gen_sequence() as described in #1 above.
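At its core, the "carve all objects of one class out of a single page"
idea is a bump allocator over a contiguous block. A minimal sketch with
invented names, assuming a fixed region and no chaining of fresh pages
(which a real allocator would need):

```c
#include <assert.h>
#include <stddef.h>

/* A zone is just a cursor into a contiguous region; successive
   allocations land next to each other, which is the locality win.  */
struct gc_zone {
  char *next;   /* first free byte */
  char *limit;  /* one past the end of the region */
};

static void *
zone_alloc (struct gc_zone *z, size_t size)
{
  /* Round up to pointer alignment.  */
  size = (size + sizeof (void *) - 1) & ~(sizeof (void *) - 1);
  if ((size_t) (z->limit - z->next) < size)
    return NULL;  /* a real allocator would chain in a fresh page here */
  void *p = z->next;
  z->next += size;
  return p;
}
```

Because objects of one class are packed back to back, walking them
during GC (or during compilation proper) touches far fewer pages than
when they are scattered across size-ordered pages.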
* Re: Speeding up GC
From: Richard Henderson <rth@redhat.com>
Date: 2002-06-03
To: Martin v. Löwis; Cc: gcc

On Mon, Jun 03, 2002 at 03:25:05PM +0200, Martin v. Löwis wrote:
> While it is, in general, not possible to determine whether an object
> has changed, I believe that for certain objects (in particular DECL
> nodes), there is a point in time when they are "frozen", meaning that
> they won't change anymore. I suggest to record this fact in the
> object (with the opportunity to unfreeze the object).

A better way to do this is to actually compact the GC arena and mark
the pages read-only. If the objects are modified, the SIGSEGV handler
marks the page read-write again and flags the page as having been
modified.

This is called "generational GC". It has been discussed on the list
off and on. It's on hold waiting for Geoff Keating's GC rewrite, since
at the moment we can't compact the GC arena.

r~
* Re: Speeding up GC
From: Marc Espie
Date: 2002-06-04
To: rth; Cc: gcc

In article <20020603180553.A13829@redhat.com> you write:
> A better way to do this is to actually compact the GC arena
> and mark the pages read-only. If the objects are modified,
> the SIGSEGV handler marks the page read-write again and flags
> the page as having been modified.

Isn't that a bit expensive? Getting through a SEGV and a signal
handler is not especially fast on most OSes.
* Re: Speeding up GC
From: David S. Miller <davem@redhat.com>
Date: 2002-06-04
To: espie; Cc: rth, gcc

From: Marc Espie <espie@quatramaran.ens.fr>
Date: Tue, 4 Jun 2002 11:20:35 +0200

> Isn't that a bit expensive ? getting through a SEGV and a
> signal-handler is not especially fast on most OSes.

This, along with several other reasons, is why I think the compacting
allocator idea is going to be a net lose.

I think it is more worthwhile to clean up HOW we allocate objects,
especially "throw away" objects such as temporary RTL.

GC is a great way to simplify memory allocation, but it isn't an
excuse to do stupid things. Many parts of the compiler do absolutely
stupid things which result in much fragmented GC memory.

A compacting allocator is just going to paper over the real problems,
not fix them.

Now, if we could use SIGSEGV handlers to totally eliminate the GC scan
itself, that I would buy as a performance win. But you can't do that
and get the fine-grained in_use bitmap updates we get with the current
way the GC scan operates.
* Re: Speeding up GC
From: Alexandre Oliva <aoliva@redhat.com>
Date: 04 Jun 2002 15:41:10 -0300
To: David S. Miller; Cc: espie, rth, gcc

On Jun 4, 2002, "David S. Miller" <davem@redhat.com> wrote:

> I think it is more worthwhile to clean up HOW we allocate
> objects, especially "throw away" objects such as temporary RTL.

So how about going back to using obstacks for allocation and grouping
of objects, and doing garbage collection on obstacks, instead of on
individual objects?

-- 
Alexandre Oliva   Enjoy Guarana', see http://www.ic.unicamp.br/~oliva/
Red Hat GCC Developer                 aoliva@{cygnus.com, redhat.com}
CS PhD student at IC-Unicamp        oliva@{lsd.ic.unicamp.br, gnu.org}
Free Software Evangelist                Professional serial bug killer
* Re: Speeding up GC
From: Richard Henderson <rth@redhat.com>
Date: Tue, 4 Jun 2002 11:48:18 -0700
To: Alexandre Oliva; Cc: David S. Miller, espie, gcc

On Tue, Jun 04, 2002 at 03:41:10PM -0300, Alexandre Oliva wrote:
> ... and doing garbage collection on obstacks, instead of on
> individual objects?

How, pray tell, are you going to do that?

r~
* Re: Speeding up GC
From: David S. Miller <davem@redhat.com>
Date: 2002-06-04
To: rth; Cc: aoliva, espie, gcc

From: Richard Henderson <rth@redhat.com>
Date: Tue, 4 Jun 2002 11:48:18 -0700

> On Tue, Jun 04, 2002 at 03:41:10PM -0300, Alexandre Oliva wrote:
> > ... and doing garbage collection on obstacks, instead of on
> > individual objects?
>
> How, pray tell, are you going to do that?

Make obstack alloc pages from GC and make obstack_free a NOP.

I even tried this; it works.
* Re: Speeding up GC
From: David S. Miller <davem@redhat.com>
Date: 2002-06-04
To: aoliva; Cc: espie, rth, gcc

From: Alexandre Oliva <aoliva@redhat.com>
Date: 04 Jun 2002 15:41:10 -0300

> So how about going back to using obstacks for allocation and grouping
> of objects, and doing garbage collection on obstacks, instead of on
> individual objects?

I have a patch which essentially does that, but implemented within GGC
itself. I create 3 GGC pools:

1. Obstack-like GC allocation for long-term RTL
2. Obstack-like GC allocation for short-term RTL
3. Obstack-like GC allocation for Trees

I didn't enable usage of #3 in my changes.

This patch is very much a work in progress, and I'll get back to it
after I finish up my "kill gen_sequence()" patch.
* Re: Speeding up GC
From: Richard Henderson <rth@redhat.com>
Date: 2002-06-04
To: Marc Espie; Cc: gcc

On Tue, Jun 04, 2002 at 11:20:35AM +0200, Marc Espie wrote:
> Isn't that a bit expensive ? getting through a SEGV and a signal-handler
> is not especially fast on most OSes.

The idea is that in all likelihood the objects _won't_ be changed. You
don't use this all the time for every page.

r~
* Re: Speeding up GC
From: Andi Kleen <ak@suse.de>
Date: 2002-06-04
To: Richard Henderson; Cc: gcc

Richard Henderson <rth@redhat.com> writes:

> A better way to do this is to actually compact the GC arena
> and mark the pages read-only. If the objects are modified,
> the SIGSEGV handler marks the page read-write again and flags
> the page as having been modified.

Unfortunately, changing pages to read-only and back is rather
expensive. Cheaper would be to expose the hardware's page dirty bit to
user space and use it for this. Apparently some OSes like Solaris
support it already (at least that is what the relevant function in
boehm-gc suggests). I guess it could be added to Linux too.

-Andi
* Re: Speeding up GC
From: John Levon
Date: 2002-06-04
To: Andi Kleen; Cc: Richard Henderson, gcc

On Tue, Jun 04, 2002 at 12:42:06PM +0200, Andi Kleen wrote:

> Unfortunately changing pages to read only and back is rather expensive.
> Cheaper would be to expose the hardware's page dirty bit to user space
> and use it for this. Apparently some OS like Solaris support it already
> (at least that is what the relevant function in the boehm-gc suggest)
> I guess it could be added to Linux too.

To be most useful, that needs to be a read/write interface so the
dirty bit can be reset after a full GC, since the GC only cares about
dirty-since-last-GC (a variant of card marking), whereas the VM has
other purposes for the dirty bit. It looks possible on x86 at least,
by backing up the dirty bit for the VM's benefit into one of the pte
bits that seem to be unused, as far as I can tell...

regards
john

-- 
"Do you mean to tell me that "The Prince" is not the set textbook for
CS1072 Professional Issues ? What on earth do you learn in that
course ?"
	- David Lester
* Re: Speeding up GC
From: Paul Koning <pkoning@equallogic.com>
Date: Tue, 4 Jun 2002 10:05:44 -0400
To: levon; Cc: ak, rth, gcc

Excerpt of message (sent 4 June 2002) by John Levon:
> To be most useful that needs to be read/write interface so the dirty bit
> can be reset after a full GC, since the GC only cares about
> dirty-since-last-GC (variant of card marking), whereas the VM has other
> purposes for the dirty bit. It looks possible on x86 at least, by
> backing up the dirty bit for the VM's benefit into one of the pte bits
> that seem to be unused, as far as I can tell...

It also depends on having a *hardware* dirty bit mechanism. I don't
know about Sparc, but -- for example -- MIPS doesn't have any. You can
simulate it by making the page read-only and taking the page fault on
the first write; indeed, the documentation suggests this and describes
how an OS could do it. That of course wouldn't be all that cheap.

paul
* Re: Speeding up GC
From: David S. Miller <davem@redhat.com>
Date: 2002-06-04
To: pkoning; Cc: levon, ak, rth, gcc

From: Paul Koning <pkoning@equallogic.com>
Date: Tue, 4 Jun 2002 10:05:44 -0400

> It also depends on having a *hardware* dirty bit mechanism. I don't
> know about Sparc, but -- for example -- MIPS doesn't have any. You
> can simulate it by making the page read-only and taking the page
> fault on the first write, and indeed the documentation suggests this
> and describes how an OS could do it. That of course wouldn't be all
> that cheap.

Every software-TLB cpu has this problem, and as you kind of hint at,
the dirty bit is maintained in software.

I think messing around with dirty bits and OS interfaces to set/reset
them is absolutely the wrong path to go down. This is going to make GC
much more complex from a per-host standpoint. I also, for similar
reasons, think the SIGSEGV+mprotect idea is bad. I can guarantee you
we will need to disable this on half the platforms that have a working
mmap() because of either a) OS bugs in providing the correct fault
address to the SIGSEGV handler, or b) us not being able to figure out
how to make it work on a particular platform.

Really, the right answer, which I mentioned earlier, is to fix how we
allocate objects. I mean, it doesn't take a genius to figure out how
to plug all the holes on the RTL side right now.

1) Every function ever seen by gcc causes it to try to figure out
   whether it will return its value in a register or not. It does this
   via aggregate_value_p (), which generates a hard REG rtl object on
   EVERY CALL if the function will return into a register.

   FIX: Add a new target interface that says "will return in reg"
   given the function info.
2) Functions like loop.c:product_cheap_p() generate RTL to see if
   valid expressions of a given type can be created, and if so, see
   what it looks like. Then the RTL is 100% forgotten.

   FIX: Create a "short-term" mode for GC allocation. Enable this mode
   around such allocations.

3) SEQUENCE RTL objects are 99% short-term RTL and should die outside
   of reorg.c and ssa.c.

   FIX: I have a patch which does this, which I'll post for review to
   gcc-patches later today.

I could go on and on; just set breakpoints where RTL is allocated and
you'll see all the bonehead things the compiler does.

Another fun exercise is to increase the size of each GC-allocated
object by (N * sizeof(void *)) and tack on a record of N return
addresses, so you can later see where unreferenced objects originate.
It's very instructive.

To me, it seems much more prudent to fix the cause of the problem
instead of adding all of these page-protection et al. games to GC.
What I am hearing right now from rth and others is "We allocate things
stupidly in the compiler, so we're going to fix the allocator."
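The "tack on a record of return addresses" exercise can be sketched
with GCC's `__builtin_return_address`. This simplified version is
hypothetical (invented names, one caller slot instead of N, plain
malloc standing in for the GC allocator): each object is over-allocated
by a header recording the allocation site, so a dead object found later
can be traced to the code that created it:

```c
#include <stdlib.h>

/* Header prepended to every traced allocation.  */
struct alloc_header {
  void *caller;  /* return address of the allocation site */
};

static void *
traced_alloc (size_t size)
{
  struct alloc_header *h = malloc (sizeof *h + size);
  if (h == NULL)
    return NULL;
  h->caller = __builtin_return_address (0);  /* who allocated this? */
  return h + 1;  /* hand the caller the payload after the header */
}

/* Given a payload pointer, recover the recorded allocation site;
   feed it to addr2line (or similar) to name the culprit.  */
static void *
alloc_origin (void *payload)
{
  return ((struct alloc_header *) payload - 1)->caller;
}
```

Sweeping the heap and printing `alloc_origin` for every unmarked object
gives exactly the "where do unreferenced objects originate" report the
mail describes.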
* Re: Speeding up GC
From: Daniel Berlin <dberlin@dberlin.org>
Date: Tue, 4 Jun 2002 10:45:22 -0400 (EDT)
To: David S. Miller; Cc: pkoning, levon, ak, rth, gcc

> 2) Functions like loop.c:product_cheap_p() generate RTL to see
>    if valid expressions of a given type can be created and if so
>    see what it looks like. Then the RTL is %100 forgotten.
>
>    FIX: Create a "short-term" mode for GC allocation. Enable this
>    mode around such allocations.

It might make more sense, rather than even bothering to mark this
stuff, to use memory pools/arenas. Unlike obstacks, you can
realloc/free given pieces of memory in the arena easily, but you can
destroy all of the arena at the end of the pass or whatever.

For things where we *know* the lifetime to a certainty (i.e. it's only
supposed to live for this pass), why should we be making the GC try to
figure it out? This seems like the right solution for our
throwaway/fixed-lifetime objects.

The problem with GC is that it's slow because it effectively tries to
figure out when your objects are dead, so it's bad for throwaway
objects. The problem with obstacks is that they are too inflexible,
and don't let us do anything but allocate (easily, anyway). So
something in the middle, that lets you allocate/free/realloc without
penalty but still blow away everything quickly when you are done,
seems right. Which would, of course, be memory pools/arenas (whatever
you want to call them).
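A minimal version of the pool/arena Daniel describes might look like
this. The names are invented, and for brevity each allocation gets its
own chunk (a real pool would carve many objects out of each chunk), but
it shows the key property: individual pieces can be freed or kept, and
the whole pool is blown away in one call at the end of a pass:

```c
#include <stdlib.h>

/* Every allocation is linked into the owning pool so pool_destroy
   can release everything at once.  */
struct pool_chunk {
  struct pool_chunk *next;
  /* object payload follows the header */
};

struct pool {
  struct pool_chunk *chunks;
};

static void *
pool_alloc (struct pool *p, size_t size)
{
  struct pool_chunk *c = malloc (sizeof *c + size);
  if (c == NULL)
    return NULL;
  c->next = p->chunks;
  p->chunks = c;
  return c + 1;  /* payload starts right after the header */
}

/* End of pass: release every object the pool ever handed out.  */
static void
pool_destroy (struct pool *p)
{
  while (p->chunks)
    {
      struct pool_chunk *c = p->chunks;
      p->chunks = c->next;
      free (c);
    }
}
```

A pass would create a pool on entry, allocate its throwaway objects
from it, and call `pool_destroy` on exit, with no marking or sweeping
involved.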
* Re: Speeding up GC
From: David S. Miller <davem@redhat.com>
Date: 2002-06-04
To: dberlin; Cc: pkoning, levon, ak, rth, gcc

From: Daniel Berlin <dberlin@dberlin.org>
Date: Tue, 4 Jun 2002 10:45:22 -0400 (EDT)

> It might make more sense, rather than even bothering to mark this
> stuff, to use memory pools/arenas. Unlike obstacks, you can
> realloc/free given pieces of memory in the arena easily, but you can
> destroy all of the arena at the end of the pass or whatever.

This is kind of the "next level" of what I was proposing. Once you
have this "enter short-term mode" thing, you can have the "exit
short-term" call just reset the state of all the temporary GC pools as
if GC had been run on them and no references had been found.

See?
* Re: Speeding up GC
From: Andi Kleen <ak@suse.de>
Date: Tue, 4 Jun 2002 16:52:54 +0200
To: David S. Miller; Cc: dberlin, pkoning, levon, rth, gcc

> This is kind of the "next level" of what I was proposing. Once you
> have this "enter short-term mode" thing, you can have the "exit
> short-term" call just reset the state of all the temporary GC pools as
> if GC has been run on them and no references had been found.

Doesn't this just bring back all the bugs that got fixed by the
introduction of GC?

-Andi
* Re: Speeding up GC
From: David S. Miller <davem@redhat.com>
Date: 2002-06-04
To: ak; Cc: dberlin, pkoning, levon, rth, gcc

From: Andi Kleen <ak@suse.de>
Date: Tue, 4 Jun 2002 16:52:54 +0200

> Just doesn't this bring all the bugs back that got fixed by the GC
> introduction?

Do you have another fast way to handle short-term allocations?

I actually like obstacks more and more the longer I look at the
performance mess we have because of GC's introduction. Going from
3-hour to 6-hour bootstraps on some SH systems, for example. That kind
of crap isn't acceptable.

Remember in your answer to my question that the precondition is that
mprotect and the like are not considered fast. :-)

I am of the opinion that people are in general too lazy about how they
allocate RTL objects when they make changes to the compiler middle and
back ends. This is why we have this enormous problem today.
* Re: Speeding up GC
From: law@redhat.com
Date: 2002-06-04
To: David S. Miller; Cc: ak, dberlin, pkoning, levon, rth, gcc

In message <20020604.075504.56053777.davem@redhat.com>, "David S.
Miller" writes:

> Remember in your answer to my question that the precondition is that
> mprotect and the like are not considered fast. :-)

True. However, if we use it wisely it may still be helpful. For
example, if we have a large number of objects that are not expected to
ever change and that have good locality, we could use the VM system to
help us identify when they do change (which in turn would cause us to
examine them in the collecting pass).

> I am of the opinion that people are in general too lazy about how they
> allocate RTL objects when they make changes to the compiler middle and
> back ends. This is why we have this enormous problem today.

Agreed.

jeff
* Re: Speeding up GC
From: Marc Espie
Date: 2002-06-04
To: gcc

In article <20020604165254.A697@wotan.suse.de> you write:
> > This is kind of the "next level" of what I was proposing. Once you
> > have this "enter short-term mode" thing, you can have the "exit
> > short-term" call just reset the state of all the temporary GC pools as
> > if GC has been run on them and no references had been found.
>
> Just doesn't this bring all the bugs back that got fixed by the GC
> introduction?

Yes and no. The bugs may be back, but their consequences are entirely
different.

See, the obstack stuff was buggy because you had to track memory usage
to ensure you weren't using a freed address, and that was pervasive to
the whole compiler. Now, you still have the GC to take care of all the
icky parts. And you can probably design the pooling stuff so that a
liveness bug just degrades performance---not crashes outright. Plus,
there are definitely parts of the compiler where local allocation is
reasonably obvious.

One issue that comes to mind, though, is that, as far as possible, the
interface should be designed so that you can fall back to the `normal'
GC, and have checks the whole way (e.g., that temporary arenas are
indeed dead when they get freed). In the event you run into a bug, it
becomes much easier to track down, because the memory infrastructure
more or less makes the bug trivial.

Am I making sense?
* Re: Speeding up GC
From: Joe Buck
Date: 2002-06-04
To: Andi Kleen; Cc: David S. Miller, dberlin, pkoning, levon, rth, gcc

> > This is kind of the "next level" of what I was proposing. Once you
> > have this "enter short-term mode" thing, you can have the "exit
> > short-term" call just reset the state of all the temporary GC pools as
> > if GC has been run on them and no references had been found.
>
> Just doesn't this bring all the bugs back that got fixed by the GC
> introduction?

You make the short-term area just a pool of the GC, and you have two
modes, settable by a flag. In production mode, when "exit short-term"
is called you just nuke the short-term pool. In GC-debug mode, when
"exit short-term" is called you do a GC and abort if there's anything
live in the short-term pool.

You then run the full regression suite in GC-debug mode. GC-debug is
shipped, so that if any user gets a mysterious ICE you can see if this
is the cause.
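Joe's two-mode scheme can be sketched as follows. Everything here is
hypothetical (the flag, the live-object count, the names), with the
actual collection and pool reset stubbed out; the point is only the
control flow of "exit short-term" in the two modes:

```c
#include <stdlib.h>

static int gc_debug_mode;       /* set when running the regression suite */
static size_t short_term_live;  /* stub: live objects a debug GC found */

/* Stand-in for a full collection that reports how many objects in the
   short-term pool are still reachable from outside it.  */
static size_t
collect_and_count_short_term_live (void)
{
  return short_term_live;
}

/* Leave short-term mode.  In production, just nuke the pool; in
   GC-debug mode, first verify nothing in it is still live.
   Returns 1 so callers (and tests) can observe a normal exit.  */
static int
exit_short_term (void)
{
  if (gc_debug_mode && collect_and_count_short_term_live () != 0)
    abort ();  /* a live object escaped the short-term pool: a bug */
  /* ...reset the short-term pool's pages here... */
  return 1;
}
```

Running the testsuite with `gc_debug_mode` on turns any object that
wrongly escapes the short-term pool into an immediate, reproducible
abort instead of a mysterious ICE later.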
* Re: Speeding up GC
From: Marc Espie
Date: 2002-06-04
To: gcc

In article <20020604.072448.55836404.davem@redhat.com> you write:
> I think messing around with dirty bits and OS interfaces to set/reset
> them is absolutely the wrong path to go down. This is going to make
> GC much more complex from a per-host standpoint. I also, for similar
> reasons, think the SIGSEGV+mprotect idea is bad too. I can guarentee
> you we will need to disable this on half the platforms that have a
> working mmap() because of either a) OS bugs in providing the correct
> fault address to the SIGSEGV handler b) us not being able to figure
> out how to make it work on a particular platform.

Thank you! You formulated very well what I was thinking and was not
able to pinpoint precisely.

Actually, I can pinpoint it partly: going down such a path would

1/ make gcc very much linux-dependent, or at least turn it into trash
   on platforms without a `fast/non-buggy' SEGV+mprotect hack;

2/ help hide other inefficiencies in the compiler... and since the
   majority of gcc users are i*86-*-linux*, well, you can guess the
   rest.
* Re: Speeding up GC
From: David S. Miller <davem@redhat.com>
Date: 2002-06-04
To: espie; Cc: gcc

From: Marc Espie <espie@quatramaran.ens.fr>
Date: Tue, 4 Jun 2002 17:08:51 +0200

> Actually, I can pinpoint it partly: going down such a path would
> 1/ make gcc very much linux-dependent.

Actually, the dirty-bit facility in question was originally found to
be present under Solaris. Every UNIX provides the mprotect+SIGSEGV
feature, though.
* Re: Speeding up GC
From: Marc Espie <espie@nerim.net>
Date: Tue, 4 Jun 2002 18:04:40 +0200
To: David S. Miller; Cc: gcc

On Tue, Jun 04, 2002 at 08:52:07AM -0700, David S. Miller wrote:
> Actually, the dirty bit facility in question was orignally found to be
> present under Solaris. Every UNIX provides the mprotect+SIGSEGV
> feature though.

No.

1/ It may be prohibitively expensive on most OSes that implement it.

2/ Here, the documentation of mprotect says:

     Not all implementations will guarantee protection on a page basis;
     the granularity of protection changes may be as large as an entire
     region.

Even the Linux documentation says something similar:

     POSIX.1b says that mprotect can be used only on regions of memory
     obtained from mmap(2).
* Re: Speeding up GC
From: David S. Miller <davem@redhat.com>
Date: 2002-06-04
To: espie; Cc: gcc

From: Marc Espie <espie@nerim.net>
Date: Tue, 4 Jun 2002 18:04:40 +0200

> 1/ it may be prohibitively expensive on most OSes that implement it.

I don't question this.

> 2/ Here, documentation of mprotect says:
>      Not all implementations will guarantee protection on a page basis;
>      the granularity of protection changes may be as large as an entire
>      region.

In reality, 99% of UNIX systems do page-based granularity because so
many apps need it. ELF shared libraries are challenging, at best, if
you don't do page-based mprotect. :-)

It's a moot point because I think the mprotect+SIGSEGV thing is stupid
from a performance standpoint anyway.
* Re: Speeding up GC
  2002-06-04  7:37 ` David S. Miller
  2002-06-04  7:51   ` Daniel Berlin
  2002-06-04  8:10   ` Marc Espie
@ 2002-06-04  8:43 ` law
  2002-06-04  8:59   ` David S. Miller
  2002-06-04 10:15 ` Richard Henderson
  3 siblings, 1 reply; 32+ messages in thread
From: law @ 2002-06-04 8:43 UTC (permalink / raw)
To: David S. Miller; +Cc: pkoning, levon, ak, rth, gcc

In message <20020604.072448.55836404.davem@redhat.com>, "David S. Miller" writes:
> 2) Functions like loop.c:product_cheap_p() generate RTL to see
>    if valid expressions of a given type can be created and if so
>    see what it looks like.  Then the RTL is 100% forgotten.
>
>    FIX: Create a "short-term" mode for GC allocation.  Enable this
>    mode around such allocations.

This seems reasonable -- as long as we have a way for the short-term
object to turn into a long-term object.  That was one of the
fundamental problems with obstacks -- there was no reasonable way to
change the lifetime of an object after it was created.

It seems to me we have two classes of short-term needs for RTL.

1. We're going to generate various hunks of RTL, none of which will be
   live after the "short term" mode is complete.  Clearly we want these
   to use different pages in the GC allocator and not pollute the GC
   pages that are interesting long-term.

2. We're going to generate various hunks of RTL, but one or more of
   those hunks may be live after we're done with the "short term"
   mode.  This is the more interesting (and I suspect more important)
   long-term problem to solve.

jeff
* Re: Speeding up GC
  2002-06-04  8:43 ` law
@ 2002-06-04  8:59 ` David S. Miller
  0 siblings, 0 replies; 32+ messages in thread
From: David S. Miller @ 2002-06-04 8:59 UTC (permalink / raw)
To: law; +Cc: pkoning, levon, ak, rth, gcc

   From: law@redhat.com
   Date: Tue, 04 Jun 2002 09:42:28 -0600

   It seems to me we have two classes of short-term needs for RTL.

   1. We're going to generate various hunks of RTL, none of which will
      be live after the "short term" mode is complete.  Clearly we want
      these to use different pages in the GC allocator and not pollute
      the GC pages that are interesting long-term.

Right.

   2. We're going to generate various hunks of RTL, but one or more of
      those hunks may be live after we're done with the "short term"
      mode.  This is the more interesting (and I suspect more
      important) long-term problem to solve.

Use for_each_rtx() on the RTL you made to mark the pieces such that
RTL unsharing will copy it, then move back to long-term mode before
you do the unshare.

I have most of this in my head already. :-)
* Re: Speeding up GC
  2002-06-04  7:37 ` David S. Miller
    ` (2 preceding siblings ...)
  2002-06-04  8:43 ` law
@ 2002-06-04 10:15 ` Richard Henderson
  2002-06-04 10:22   ` David S. Miller
  3 siblings, 1 reply; 32+ messages in thread
From: Richard Henderson @ 2002-06-04 10:15 UTC (permalink / raw)
To: David S. Miller; +Cc: pkoning, levon, ak, gcc

On Tue, Jun 04, 2002 at 07:24:48AM -0700, David S. Miller wrote:
> What I am hearing right now from rth and others is "We allocate things
> stupidly in the compiler, so we're going to fix the allocator."

No, that's not what you're hearing.

The problem that generational GC is used to fix is C++ including a few
million lines of code and creating 100MB of templates and inline
functions and whatnot.  Which are absolutely positively not going to
change once we parse them.  And yet we re-scan those pages during each
collection pass.

You're looking at a different problem.  One that needs solving, don't
get me wrong, but rearranging how we scan or don't scan temporaries
isn't going to help the problem described.

r~
* Re: Speeding up GC
  2002-06-04 10:15 ` Richard Henderson
@ 2002-06-04 10:22 ` David S. Miller
  0 siblings, 0 replies; 32+ messages in thread
From: David S. Miller @ 2002-06-04 10:22 UTC (permalink / raw)
To: rth; +Cc: pkoning, levon, ak, gcc

   From: Richard Henderson <rth@redhat.com>
   Date: Tue, 4 Jun 2002 10:12:44 -0700

   You're looking at a different problem.  One that needs solving,
   don't get me wrong, but rearranging how we scan or don't scan
   temporaries isn't going to help the problem described.

Now I understand.
* Re: Speeding up GC
@ 2002-06-05 4:56 Robert Dewar
0 siblings, 0 replies; 32+ messages in thread
From: Robert Dewar @ 2002-06-05 4:56 UTC (permalink / raw)
To: aoliva, rth; +Cc: davem, espie, gcc
> On Tue, Jun 04, 2002 at 03:41:10PM -0300, Alexandre Oliva wrote:
> > ... and doing garbage collection on obstacks, instead of on
> > individual objects?
I would assume that a PL/I-style area approach might be used?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Speeding up GC
@ 2002-08-17  5:13 Tim Josling
  2002-08-18  0:25 ` Guillermo Ballester Valor
  0 siblings, 1 reply; 32+ messages in thread
From: Tim Josling @ 2002-08-17 5:13 UTC (permalink / raw)
To: gcc

I ran a few more tests (Pentium III, 1GHz, with 256MB RAM).

1. I set the minimum storage threshold so that gc_collect never did
anything.  This increased the build time by about 3% compared to
normal.  The largest program, insn-recog.c, just fit into my 256MB of
RAM.  If the build is 3% slower, then GCC must be about 5% slower:
phase 1 of the build is the native compiler, and various shell scripts
and other programs run as well, diluting the effect, so the change in
GCC itself is larger than the change in the bootstrap time.

2. I replaced ggc-page.c with a routine that just allocated everything
with xmalloc.  I had to add a 4-byte header for the storage length
because gc_realloc is used in half a dozen places.  This bootstrapped
about 3% faster than the normal bootstrap, meaning the compiler was
about 5% faster.

Implications from this:

1. GC overhead is high.  Allocating with xmalloc was 5% faster, even
though no storage was freed.  It was 10% faster than running the GC
allocation code but not freeing anything.  So the combined effect of
the extra path length from gc_alloc and the cache penalty from
non-locality is at least 10%.  In fact it is probably more, because I
had to allocate the extra 4 bytes, and so I used more memory;
additionally, I did not align the storage, which usually has a
performance impact.  None of this GC overhead is captured in the GCC
timing reports, which therefore greatly understate the cost of GC.

2. GC overhead at -O0 is very high.  The percentage of GC overhead in
non-optimised compiles is far greater, based on my earlier profiles.
It is reported (by gprof) as ~10-12%, so it is probably more like
20-25% (based on the ratios above), which is very large.

3. A new GCC default and option are needed.
It would in my opinion be a good idea to have a heuristic which says:

- If the file is small, do not do GC.  Instead do simple contiguous
  allocation.  We should align things properly, and we should change
  gc_realloc to require the callers to provide the old length so we
  don't have to store it.

- If the file is very large, use GC.  The default size could be a
  configure option.

- Allow the user to override and force use or non-use of GC.

I would clean up and submit this patch, but it is just too hard to get
patches into gcc if you do not have write access.

4. This all makes it appear that there are a lot of performance wins
to be found in GCC.  Future changes to GC should be made only after
testing their effect on performance.

Tim Josling
* Re: Speeding up GC
  2002-08-17  5:13 Tim Josling
@ 2002-08-18  0:25 ` Guillermo Ballester Valor
  0 siblings, 0 replies; 32+ messages in thread
From: Guillermo Ballester Valor @ 2002-08-18 0:25 UTC (permalink / raw)
To: gcc

Hi,

I'm not a gcc developer, but this mail seems very promising to me.
Are there any serious problems with Tim's allocation suggestions?

Guillermo.

-- 
Guillermo Ballester Valor
gbv@oxixares.com
Ogijares, Granada  SPAIN
end of thread, other threads:[~2002-08-18  0:25 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-06-03  6:25 Speeding up GC Martin v. Löwis
2002-06-03  6:39 ` David S. Miller
2002-06-03 18:10 ` Richard Henderson
2002-06-04  2:34 ` Marc Espie
2002-06-04  3:15 ` David S. Miller
2002-06-04 11:42 ` Alexandre Oliva
2002-06-04 12:10 ` Richard Henderson
2002-06-04 12:39 ` David S. Miller
2002-06-04 12:38 ` David S. Miller
2002-06-04 10:03 ` Richard Henderson
     [not found] <j47klg8pfy.fsf@informatik.hu-berlin.de.suse.lists.egcs>
     [not found] ` <20020603180553.A13829@redhat.com.suse.lists.egcs>
2002-06-04  3:52 ` Andi Kleen
2002-06-04  6:57 ` John Levon
2002-06-04  7:14 ` Paul Koning
2002-06-04  7:37 ` David S. Miller
2002-06-04  7:51 ` Daniel Berlin
2002-06-04  7:53 ` David S. Miller
2002-06-04  7:58 ` Andi Kleen
2002-06-04  8:08 ` David S. Miller
2002-06-04  8:55 ` law
2002-06-04  8:17 ` Marc Espie
2002-06-04  9:21 ` Joe Buck
2002-06-04  8:10 ` Marc Espie
2002-06-04  8:58 ` David S. Miller
2002-06-04  9:05 ` Marc Espie
2002-06-04  9:35 ` David S. Miller
2002-06-04  8:43 ` law
2002-06-04  8:59 ` David S. Miller
2002-06-04 10:15 ` Richard Henderson
2002-06-04 10:22 ` David S. Miller
2002-06-05  4:56 Robert Dewar
2002-08-17  5:13 Tim Josling
2002-08-18  0:25 ` Guillermo Ballester Valor