* Speeding up GC
From: Martin v. Löwis <loewis@informatik.hu-berlin.de>
Date: 2002-06-03
To: gcc

While profiling cc1plus, I noticed that a large amount of time is spent
marking objects during GC. I suspect that, for each collection, the same
objects are marked over and over again. It appears that speed-ups are
possible by not traversing parts of the graph that are known not to have
changed since the previous collection.

While it is, in general, not possible to determine whether an object has
changed, I believe that for certain objects (in particular DECL nodes)
there is a point in time when they are "frozen", meaning that they won't
change anymore. I suggest recording this fact in the object (with the
opportunity to unfreeze the object).

During GC traversal, GC should detect "immutable" objects. An object can
be marked immutable if it is frozen and all contained objects are
immutable. Then, when marking objects, GC should not traverse into
immutable objects.

An exception probably needs to be made for identifiers: they are always
mutable, since the associated decls might change. However, it is not
necessary to traverse into identifiers from other tree nodes, since all
identifiers are GC roots anyway.

In my specific application, 20% of the memory is consumed by PARM_DECLs.
I believe that those would be candidates for being marked as immutable,
and that the containers immediately storing them could be marked as
immutable as well (after the list of parameters for a declaration has
been built). All those PARM_DECLs indeed live up to the end of the
compilation; they are never collected, and never changed. Thus, GC
traversal could completely ignore them.

Does this approach have a chance of speeding up GC? What do you think?

Regards,
Martin
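Martin's freeze/immutable scheme can be sketched roughly as below. This
is an illustrative sketch only -- `gc_obj`, its bit fields, and
`gc_mark` are invented names, not GCC's actual ggc interface -- and it
assumes the sweep phase treats immutable objects as live (keep an object
if it is marked *or* immutable), since an immutable object's children
are deliberately skipped during marking:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical GC object: a mark bit, the proposed "frozen" and
   "immutable" bits, and an array of child pointers.  */
struct gc_obj {
  unsigned marked : 1;
  unsigned frozen : 1;     /* front end says: will not change again */
  unsigned immutable : 1;  /* frozen AND everything reachable immutable */
  size_t nchildren;
  struct gc_obj **children;
};

static void
gc_mark (struct gc_obj *obj)
{
  if (obj == NULL || obj->marked)
    return;
  obj->marked = 1;
  /* The proposed speed-up: an immutable object's reachable set cannot
     have changed since the last collection, so skip the traversal.
     The sweep must then keep any object that is marked OR immutable.  */
  if (obj->immutable)
    return;
  for (size_t i = 0; i < obj->nchildren; i++)
    gc_mark (obj->children[i]);
}
```

Marking an immutable root therefore touches one object instead of its
whole subgraph, which is exactly where the hoped-for savings come from.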
* Re: Speeding up GC
From: David S. Miller <davem@redhat.com>
Date: 2002-06-03
To: loewis; Cc: gcc

From: loewis@informatik.hu-berlin.de (Martin v. Löwis)
Date: 03 Jun 2002 15:25:05 +0200

> Does this approach have a chance of speeding up GC? What do you think?

One problem with your idea is that it would make the mark list for each
page more complex. Currently it is a single bitmap with one bit per
object allocated from that page.

A page can (currently) contain objects of various types. Trees, RTL,
and various other things can all reside within the same page. To
implement your idea would likely be more expensive than the GC walk
itself. That is, unless someone can come up with some fantastic
implementation I haven't thought of.

A more pressing problem, which I am working on, is that GCC allocates a
HUGE number of throw-away objects, i.e. things that are allocated and
then immediately never referenced again. The two major classes of such
objects are:

1) SEQUENCE rtl for holding generated insns.

   This is the part I am working on right now. We emit a ton of these
   things, especially when splitting insns. And the current
   "optimization" in gen_sequence which tries to avoid emitting the
   SEQUENCE causes the INSN to be dropped, so we end up emitting two
   INSNs instead. This stuff is so dumb, and I hope I can finish up my
   patch before I next go to sleep.

2) "Can I do this?" type RTL generation.

   Often we generate RTL just to see if creating a particular kind of
   expression is possible. We do this a lot. One example of this is
   product_cheap_p () in gcc/loop.c.

All of these are bad because they fragment GC pages, so if you look at
pages at various points in time you'll see a layout like this:

   ... some other objects ... | dead SEQUENCE | ... | dead SEQUENCE

That dead SEQUENCE keeps the whole page from being reused for other
orders.
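The per-page mark list David describes at the top -- a single bitmap
with one bit per object slot in the page -- can be pictured with a small
sketch. The names and the page geometry here are invented for
illustration, not GCC's actual ggc-page data structures:

```c
#include <assert.h>
#include <stdint.h>

#define OBJECTS_PER_PAGE 512  /* hypothetical slot count for one page */

struct gc_page {
  /* One mark bit per object slot; a flat bitmap keeps the common
     "already marked?" test to a single word load and bit test.  */
  uint32_t mark_bits[OBJECTS_PER_PAGE / 32];
};

/* Mark the object in SLOT; return nonzero if it was already marked.  */
static int
test_and_set_mark (struct gc_page *p, unsigned slot)
{
  uint32_t bit = 1u << (slot % 32);
  uint32_t old = p->mark_bits[slot / 32];
  p->mark_bits[slot / 32] = old | bit;
  return (old & bit) != 0;
}
```

The objection in the mail follows directly: any per-object "immutable"
state would have to live alongside this bitmap and be consulted on
every mark, complicating a structure that is currently one bit per
object.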
It also makes GC's working set of pages larger than it needs to be.

The next order of business is locality, as noted by Richard Earnshaw
and others. GC makes for horrible locality in the RTL.

One way to deal with this is to create an obstack-like allocator from
the GC pages. Currently each page is for allocating objects of a
certain power-of-2 size. In the obstack-like GC allocator, all objects
of a certain class (for example, long-term RTL) get carved out of a
single page. This can hopefully bring back the locality properties of
obstacks and the benefits of GC at the same time.

I have a work-in-progress patch for this, which I'll get back to after
my patch that kills off gen_sequence() as described in #1 above.
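At its core, the "carve all objects of one class out of a single page"
idea is a bump allocator over a contiguous block. A minimal sketch with
invented names, assuming a fixed region and no chaining of fresh pages
(which a real allocator would need):

```c
#include <assert.h>
#include <stddef.h>

/* A zone is just a cursor into a contiguous region; successive
   allocations land next to each other, which is the locality win.  */
struct gc_zone {
  char *next;   /* first free byte */
  char *limit;  /* one past the end of the region */
};

static void *
zone_alloc (struct gc_zone *z, size_t size)
{
  /* Round up to pointer alignment.  */
  size = (size + sizeof (void *) - 1) & ~(sizeof (void *) - 1);
  if ((size_t) (z->limit - z->next) < size)
    return NULL;  /* a real allocator would chain in a fresh page here */
  void *p = z->next;
  z->next += size;
  return p;
}
```

Because objects of one class are packed back to back, walking them
during GC (or during compilation proper) touches far fewer pages than
when they are scattered across size-ordered pages.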
* Re: Speeding up GC
From: Richard Henderson <rth@redhat.com>
Date: 2002-06-03
To: Martin v. Löwis; Cc: gcc

On Mon, Jun 03, 2002 at 03:25:05PM +0200, Martin v. Löwis wrote:
> While it is, in general, not possible to determine whether an object
> has changed, I believe that for certain objects (in particular DECL
> nodes), there is a point in time when they are "frozen", meaning that
> they won't change anymore. I suggest to record this fact in the
> object (with the opportunity to unfreeze the object).

A better way to do this is to actually compact the GC arena and mark
the pages read-only. If the objects are modified, the SIGSEGV handler
marks the page read-write again and flags the page as having been
modified.

This is called "generational GC". It has been discussed on the list
off and on. It's on hold waiting for Geoff Keating's GC rewrite, since
at the moment we can't compact the GC arena.

r~
* Re: Speeding up GC
From: Marc Espie
Date: 2002-06-04
To: rth; Cc: gcc

In article <20020603180553.A13829@redhat.com> you write:
> A better way to do this is to actually compact the GC arena
> and mark the pages read-only. If the objects are modified,
> the SIGSEGV handler marks the page read-write again and flags
> the page as having been modified.

Isn't that a bit expensive? Getting through a SEGV and a signal
handler is not especially fast on most OSes.
* Re: Speeding up GC
From: David S. Miller <davem@redhat.com>
Date: 2002-06-04
To: espie; Cc: rth, gcc

From: Marc Espie <espie@quatramaran.ens.fr>
Date: Tue, 4 Jun 2002 11:20:35 +0200

> Isn't that a bit expensive ? getting through a SEGV and a
> signal-handler is not especially fast on most OSes.

This, along with several other reasons, is why I think the compacting
allocator idea is going to be a net lose.

I think it is more worthwhile to clean up HOW we allocate objects,
especially "throw away" objects such as temporary RTL.

GC is a great way to simplify memory allocation, but it isn't an
excuse to do stupid things. Many parts of the compiler do absolutely
stupid things which result in much fragmented GC memory.

A compacting allocator is just going to paper over the real problems,
not fix them.

Now, if we could use SIGSEGV handlers to totally eliminate the GC scan
itself, that I would buy as a performance win. But you can't do that
and get the fine-grained in_use bitmap updates we get with the current
way the GC scan operates.
* Re: Speeding up GC
From: Alexandre Oliva <aoliva@redhat.com>
Date: 04 Jun 2002 15:41:10 -0300
To: David S. Miller; Cc: espie, rth, gcc

On Jun 4, 2002, "David S. Miller" <davem@redhat.com> wrote:

> I think it is more worthwhile to clean up HOW we allocate
> objects, especially "throw away" objects such as temporary RTL.

So how about going back to using obstacks for allocation and grouping
of objects, and doing garbage collection on obstacks, instead of on
individual objects?

-- 
Alexandre Oliva   Enjoy Guarana', see http://www.ic.unicamp.br/~oliva/
Red Hat GCC Developer                 aoliva@{cygnus.com, redhat.com}
CS PhD student at IC-Unicamp        oliva@{lsd.ic.unicamp.br, gnu.org}
Free Software Evangelist                Professional serial bug killer
* Re: Speeding up GC
From: Richard Henderson <rth@redhat.com>
Date: Tue, 4 Jun 2002 11:48:18 -0700
To: Alexandre Oliva; Cc: David S. Miller, espie, gcc

On Tue, Jun 04, 2002 at 03:41:10PM -0300, Alexandre Oliva wrote:
> ... and doing garbage collection on obstacks, instead of on
> individual objects?

How, pray tell, are you going to do that?

r~
* Re: Speeding up GC
From: David S. Miller <davem@redhat.com>
Date: 2002-06-04
To: rth; Cc: aoliva, espie, gcc

From: Richard Henderson <rth@redhat.com>
Date: Tue, 4 Jun 2002 11:48:18 -0700

> On Tue, Jun 04, 2002 at 03:41:10PM -0300, Alexandre Oliva wrote:
> > ... and doing garbage collection on obstacks, instead of on
> > individual objects?
>
> How, pray tell, are you going to do that?

Make obstack alloc pages from GC and make obstack_free a NOP.

I even tried this; it works.
* Re: Speeding up GC
From: David S. Miller <davem@redhat.com>
Date: 2002-06-04
To: aoliva; Cc: espie, rth, gcc

From: Alexandre Oliva <aoliva@redhat.com>
Date: 04 Jun 2002 15:41:10 -0300

> So how about going back to using obstacks for allocation and grouping
> of objects, and doing garbage collection on obstacks, instead of on
> individual objects?

I have a patch which essentially does that, but implemented within GGC
itself. I create 3 GGC pools:

1. Obstack-like GC allocation for long-term RTL
2. Obstack-like GC allocation for short-term RTL
3. Obstack-like GC allocation for Trees

I didn't enable usage of #3 in my changes.

This patch is very much a work in progress, and I'll get back to it
after I finish up my "kill gen_sequence()" patch.
* Re: Speeding up GC
From: Richard Henderson <rth@redhat.com>
Date: 2002-06-04
To: Marc Espie; Cc: gcc

On Tue, Jun 04, 2002 at 11:20:35AM +0200, Marc Espie wrote:
> Isn't that a bit expensive ? getting through a SEGV and a signal-handler
> is not especially fast on most OSes.

The idea is that in all likelihood the objects _won't_ be changed. You
don't use this all the time for every page.

r~
* Re: Speeding up GC
From: Andi Kleen <ak@suse.de>
Date: 2002-06-04
To: Richard Henderson; Cc: gcc

Richard Henderson <rth@redhat.com> writes:

> A better way to do this is to actually compact the GC arena
> and mark the pages read-only. If the objects are modified,
> the SIGSEGV handler marks the page read-write again and flags
> the page as having been modified.

Unfortunately, changing pages to read-only and back is rather
expensive. Cheaper would be to expose the hardware's page dirty bit to
user space and use it for this. Apparently some OSes like Solaris
support it already (at least that is what the relevant function in
boehm-gc suggests). I guess it could be added to Linux too.

-Andi
* Re: Speeding up GC
From: John Levon
Date: 2002-06-04
To: Andi Kleen; Cc: Richard Henderson, gcc

On Tue, Jun 04, 2002 at 12:42:06PM +0200, Andi Kleen wrote:

> Unfortunately changing pages to read only and back is rather expensive.
> Cheaper would be to expose the hardware's page dirty bit to user space
> and use it for this. Apparently some OS like Solaris support it already
> (at least that is what the relevant function in the boehm-gc suggest)
> I guess it could be added to Linux too.

To be most useful, that needs to be a read/write interface so the
dirty bit can be reset after a full GC, since the GC only cares about
dirty-since-last-GC (a variant of card marking), whereas the VM has
other purposes for the dirty bit. It looks possible on x86 at least,
by backing up the dirty bit for the VM's benefit into one of the pte
bits that seem to be unused, as far as I can tell...

regards
john

-- 
"Do you mean to tell me that "The Prince" is not the set textbook for
CS1072 Professional Issues ? What on earth do you learn in that
course ?"
	- David Lester
* Re: Speeding up GC
From: Paul Koning <pkoning@equallogic.com>
Date: Tue, 4 Jun 2002 10:05:44 -0400
To: levon; Cc: ak, rth, gcc

Excerpt of message (sent 4 June 2002) by John Levon:
> To be most useful that needs to be read/write interface so the dirty bit
> can be reset after a full GC, since the GC only cares about
> dirty-since-last-GC (variant of card marking), whereas the VM has other
> purposes for the dirty bit. It looks possible on x86 at least, by
> backing up the dirty bit for the VM's benefit into one of the pte bits
> that seem to be unused, as far as I can tell...

It also depends on having a *hardware* dirty bit mechanism. I don't
know about Sparc, but -- for example -- MIPS doesn't have any. You can
simulate it by making the page read-only and taking the page fault on
the first write; indeed, the documentation suggests this and describes
how an OS could do it. That of course wouldn't be all that cheap.

paul
* Re: Speeding up GC
From: David S. Miller <davem@redhat.com>
Date: 2002-06-04
To: pkoning; Cc: levon, ak, rth, gcc

From: Paul Koning <pkoning@equallogic.com>
Date: Tue, 4 Jun 2002 10:05:44 -0400

> It also depends on having a *hardware* dirty bit mechanism. I don't
> know about Sparc, but -- for example -- MIPS doesn't have any. You
> can simulate it by making the page read-only and taking the page
> fault on the first write, and indeed the documentation suggests this
> and describes how an OS could do it. That of course wouldn't be all
> that cheap.

Every software-TLB cpu has this problem, and as you kind of hint at,
the dirty bit is maintained in software.

I think messing around with dirty bits and OS interfaces to set/reset
them is absolutely the wrong path to go down. This is going to make GC
much more complex from a per-host standpoint. I also, for similar
reasons, think the SIGSEGV+mprotect idea is bad. I can guarantee you
we will need to disable this on half the platforms that have a working
mmap() because of either a) OS bugs in providing the correct fault
address to the SIGSEGV handler, or b) us not being able to figure out
how to make it work on a particular platform.

Really, the right answer, which I mentioned earlier, is to fix how we
allocate objects. I mean, it doesn't take a genius to figure out how
to plug all the holes on the RTL side right now.

1) Every function ever seen by gcc causes it to try to figure out
   whether it will return its value in a register or not. It does this
   via aggregate_value_p (), which generates a hard REG rtl object on
   EVERY CALL if the function will return into a register.

   FIX: Add a new target interface that says "will return in reg"
   given the function info.
2) Functions like loop.c:product_cheap_p() generate RTL to see if
   valid expressions of a given type can be created, and if so, see
   what it looks like. Then the RTL is 100% forgotten.

   FIX: Create a "short-term" mode for GC allocation. Enable this mode
   around such allocations.

3) SEQUENCE RTL objects are 99% short-term RTL and should die outside
   of reorg.c and ssa.c.

   FIX: I have a patch which does this, which I'll post for review to
   gcc-patches later today.

I could go on and on; just set breakpoints where RTL is allocated and
you'll see all the bonehead things the compiler does.

Another fun exercise is to increase the size of each GC-allocated
object by (N * sizeof(void *)) and tack on a record of N return
addresses, so you can later see where unreferenced objects originate.
It's very instructive.

To me, it seems much more prudent to fix the cause of the problem
instead of adding all of these page-protection et al. games to GC.
What I am hearing right now from rth and others is "We allocate things
stupidly in the compiler, so we're going to fix the allocator."
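The "tack on a record of return addresses" exercise can be sketched
with GCC's `__builtin_return_address`. This simplified version is
hypothetical (invented names, one caller slot instead of N, plain
malloc standing in for the GC allocator): each object is over-allocated
by a header recording the allocation site, so a dead object found later
can be traced to the code that created it:

```c
#include <stdlib.h>

/* Header prepended to every traced allocation.  */
struct alloc_header {
  void *caller;  /* return address of the allocation site */
};

static void *
traced_alloc (size_t size)
{
  struct alloc_header *h = malloc (sizeof *h + size);
  if (h == NULL)
    return NULL;
  h->caller = __builtin_return_address (0);  /* who allocated this? */
  return h + 1;  /* hand the caller the payload after the header */
}

/* Given a payload pointer, recover the recorded allocation site;
   feed it to addr2line (or similar) to name the culprit.  */
static void *
alloc_origin (void *payload)
{
  return ((struct alloc_header *) payload - 1)->caller;
}
```

Sweeping the heap and printing `alloc_origin` for every unmarked object
gives exactly the "where do unreferenced objects originate" report the
mail describes.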
* Re: Speeding up GC
From: Daniel Berlin <dberlin@dberlin.org>
Date: Tue, 4 Jun 2002 10:45:22 -0400 (EDT)
To: David S. Miller; Cc: pkoning, levon, ak, rth, gcc

> 2) Functions like loop.c:product_cheap_p() generate RTL to see
>    if valid expressions of a given type can be created and if so
>    see what it looks like. Then the RTL is %100 forgotten.
>
>    FIX: Create a "short-term" mode for GC allocation. Enable this
>    mode around such allocations.

It might make more sense, rather than even bothering to mark this
stuff, to use memory pools/arenas. Unlike obstacks, you can
realloc/free given pieces of memory in the arena easily, but you can
destroy all of the arena at the end of the pass or whatever.

For things where we *know* the lifetime to a certainty (i.e. it's only
supposed to live for this pass), why should we be making the GC try to
figure it out? This seems like the right solution for our
throwaway/fixed-lifetime objects.

The problem with GC is that it's slow because it effectively tries to
figure out when your objects are dead, so it's bad for throwaway
objects. The problem with obstacks is that they are too inflexible,
and don't let us do anything but allocate (easily, anyway). So
something in the middle, that lets you allocate/free/realloc without
penalty but still blow away everything quickly when you are done,
seems right. Which would, of course, be memory pools/arenas (whatever
you want to call them).
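A minimal version of the pool/arena Daniel describes might look like
this. The names are invented, and for brevity each allocation gets its
own chunk (a real pool would carve many objects out of each chunk), but
it shows the key property: individual pieces can be freed or kept, and
the whole pool is blown away in one call at the end of a pass:

```c
#include <stdlib.h>

/* Every allocation is linked into the owning pool so pool_destroy
   can release everything at once.  */
struct pool_chunk {
  struct pool_chunk *next;
  /* object payload follows the header */
};

struct pool {
  struct pool_chunk *chunks;
};

static void *
pool_alloc (struct pool *p, size_t size)
{
  struct pool_chunk *c = malloc (sizeof *c + size);
  if (c == NULL)
    return NULL;
  c->next = p->chunks;
  p->chunks = c;
  return c + 1;  /* payload starts right after the header */
}

/* End of pass: release every object the pool ever handed out.  */
static void
pool_destroy (struct pool *p)
{
  while (p->chunks)
    {
      struct pool_chunk *c = p->chunks;
      p->chunks = c->next;
      free (c);
    }
}
```

A pass would create a pool on entry, allocate its throwaway objects
from it, and call `pool_destroy` on exit, with no marking or sweeping
involved.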
* Re: Speeding up GC
From: David S. Miller <davem@redhat.com>
Date: 2002-06-04
To: dberlin; Cc: pkoning, levon, ak, rth, gcc

From: Daniel Berlin <dberlin@dberlin.org>
Date: Tue, 4 Jun 2002 10:45:22 -0400 (EDT)

> It might make more sense, rather than even bothering to mark this
> stuff, to use memory pools/arenas. Unlike obstacks, you can
> realloc/free given pieces of memory in the arena easily, but you can
> destroy all of the arena at the end of the pass or whatever.

This is kind of the "next level" of what I was proposing. Once you
have this "enter short-term mode" thing, you can have the "exit
short-term" call just reset the state of all the temporary GC pools as
if GC had been run on them and no references had been found.

See?
* Re: Speeding up GC
From: Andi Kleen <ak@suse.de>
Date: Tue, 4 Jun 2002 16:52:54 +0200
To: David S. Miller; Cc: dberlin, pkoning, levon, rth, gcc

> This is kind of the "next level" of what I was proposing. Once you
> have this "enter short-term mode" thing, you can have the "exit
> short-term" call just reset the state of all the temporary GC pools as
> if GC has been run on them and no references had been found.

Doesn't this just bring back all the bugs that got fixed by the
introduction of GC?

-Andi
* Re: Speeding up GC
From: David S. Miller <davem@redhat.com>
Date: 2002-06-04
To: ak; Cc: dberlin, pkoning, levon, rth, gcc

From: Andi Kleen <ak@suse.de>
Date: Tue, 4 Jun 2002 16:52:54 +0200

> Just doesn't this bring all the bugs back that got fixed by the GC
> introduction?

Do you have another fast way to handle short-term allocations?

I actually like obstacks more and more the longer I look at the
performance mess we have because of GC's introduction. Going from
3-hour to 6-hour bootstraps on some SH systems, for example. That kind
of crap isn't acceptable.

Remember in your answer to my question that the precondition is that
mprotect and the like are not considered fast. :-)

I am of the opinion that people are in general too lazy about how they
allocate RTL objects when they make changes to the compiler middle and
back ends. This is why we have this enormous problem today.
* Re: Speeding up GC
From: law@redhat.com
Date: 2002-06-04
To: David S. Miller; Cc: ak, dberlin, pkoning, levon, rth, gcc

In message <20020604.075504.56053777.davem@redhat.com>, "David S.
Miller" writes:

> Remember in your answer to my question that the precondition is that
> mprotect and the like are not considered fast. :-)

True. However, if we use it wisely it may still be helpful. For
example, if we have a large number of objects that are not expected to
ever change and that have good locality, we could use the VM system to
help us identify when they do change (which in turn would cause us to
examine them in the collecting pass).

> I am of the opinion that people are in general too lazy about how they
> allocate RTL objects when they make changes to the compiler middle and
> back ends. This is why we have this enormous problem today.

Agreed.

jeff
* Re: Speeding up GC
From: Marc Espie
Date: 2002-06-04
To: gcc

In article <20020604165254.A697@wotan.suse.de> you write:
> > This is kind of the "next level" of what I was proposing. Once you
> > have this "enter short-term mode" thing, you can have the "exit
> > short-term" call just reset the state of all the temporary GC pools as
> > if GC has been run on them and no references had been found.
>
> Just doesn't this bring all the bugs back that got fixed by the GC
> introduction?

Yes and no. The bugs may be back, but their consequences are entirely
different.

See, the obstack stuff was buggy because you had to track memory usage
to ensure you weren't using a freed address, and that was pervasive to
the whole compiler. Now, you still have the GC to take care of all the
icky parts. And you can probably design the pooling stuff so that a
liveness bug just degrades performance---not crashes outright. Plus,
there are definitely parts of the compiler where local allocation is
reasonably obvious.

One issue that comes to mind, though, is that, as far as possible, the
interface should be designed so that you can fall back to the `normal'
GC, and have checks the whole way (e.g., that temporary arenas are
indeed dead when they get freed). In the event you run into a bug, it
becomes much easier to track down, because the memory infrastructure
more or less makes the bug trivial.

Am I making sense?
* Re: Speeding up GC
From: Joe Buck
Date: 2002-06-04
To: Andi Kleen; Cc: David S. Miller, dberlin, pkoning, levon, rth, gcc

> > This is kind of the "next level" of what I was proposing. Once you
> > have this "enter short-term mode" thing, you can have the "exit
> > short-term" call just reset the state of all the temporary GC pools as
> > if GC has been run on them and no references had been found.
>
> Just doesn't this bring all the bugs back that got fixed by the GC
> introduction?

You make the short-term area just a pool of the GC, and you have two
modes, settable by a flag. In production mode, when "exit short-term"
is called you just nuke the short-term pool. In GC-debug mode, when
"exit short-term" is called you do a GC and abort if there's anything
live in the short-term pool.

You then run the full regression suite in GC-debug mode. GC-debug is
shipped, so that if any user gets a mysterious ICE you can see if this
is the cause.
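Joe's two-mode scheme can be sketched as follows. Everything here is
hypothetical (the flag, the live-object count, the names), with the
actual collection and pool reset stubbed out; the point is only the
control flow of "exit short-term" in the two modes:

```c
#include <stdlib.h>

static int gc_debug_mode;       /* set when running the regression suite */
static size_t short_term_live;  /* stub: live objects a debug GC found */

/* Stand-in for a full collection that reports how many objects in the
   short-term pool are still reachable from outside it.  */
static size_t
collect_and_count_short_term_live (void)
{
  return short_term_live;
}

/* Leave short-term mode.  In production, just nuke the pool; in
   GC-debug mode, first verify nothing in it is still live.
   Returns 1 so callers (and tests) can observe a normal exit.  */
static int
exit_short_term (void)
{
  if (gc_debug_mode && collect_and_count_short_term_live () != 0)
    abort ();  /* a live object escaped the short-term pool: a bug */
  /* ...reset the short-term pool's pages here... */
  return 1;
}
```

Running the testsuite with `gc_debug_mode` on turns any object that
wrongly escapes the short-term pool into an immediate, reproducible
abort instead of a mysterious ICE later.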
* Re: Speeding up GC
From: Marc Espie
Date: 2002-06-04
To: gcc

In article <20020604.072448.55836404.davem@redhat.com> you write:
> I think messing around with dirty bits and OS interfaces to set/reset
> them is absolutely the wrong path to go down. This is going to make
> GC much more complex from a per-host standpoint. I also, for similar
> reasons, think the SIGSEGV+mprotect idea is bad too. I can guarentee
> you we will need to disable this on half the platforms that have a
> working mmap() because of either a) OS bugs in providing the correct
> fault address to the SIGSEGV handler b) us not being able to figure
> out how to make it work on a particular platform.

Thank you! You formulated very well what I was thinking and was not
able to pinpoint precisely.

Actually, I can pinpoint it partly: going down such a path would

1/ make gcc very much linux-dependent, or at least turn it into trash
   on platforms without a `fast/non-buggy' SEGV+mprotect hack;

2/ help hide other inefficiencies in the compiler... and since the
   majority of gcc users are i*86-*-linux*, well, you can guess the
   rest.
* Re: Speeding up GC
From: David S. Miller <davem@redhat.com>
Date: 2002-06-04
To: espie; Cc: gcc

From: Marc Espie <espie@quatramaran.ens.fr>
Date: Tue, 4 Jun 2002 17:08:51 +0200

> Actually, I can pinpoint it partly: going down such a path would
> 1/ make gcc very much linux-dependent.

Actually, the dirty-bit facility in question was originally found to
be present under Solaris. Every UNIX provides the mprotect+SIGSEGV
feature, though.
* Re: Speeding up GC
From: Marc Espie <espie@nerim.net>
Date: Tue, 4 Jun 2002 18:04:40 +0200
To: David S. Miller; Cc: gcc

On Tue, Jun 04, 2002 at 08:52:07AM -0700, David S. Miller wrote:
> Actually, the dirty bit facility in question was orignally found to be
> present under Solaris. Every UNIX provides the mprotect+SIGSEGV
> feature though.

No.

1/ It may be prohibitively expensive on most OSes that implement it.

2/ Here, the documentation of mprotect says:

     Not all implementations will guarantee protection on a page basis;
     the granularity of protection changes may be as large as an entire
     region.

Even the Linux documentation says something similar:

     POSIX.1b says that mprotect can be used only on regions of memory
     obtained from mmap(2).
* Re: Speeding up GC
From: David S. Miller <davem@redhat.com>
Date: 2002-06-04
To: espie; Cc: gcc

From: Marc Espie <espie@nerim.net>
Date: Tue, 4 Jun 2002 18:04:40 +0200

> 1/ it may be prohibitively expensive on most OSes that implement it.

I don't question this.

> 2/ Here, documentation of mprotect says:
>      Not all implementations will guarantee protection on a page basis;
>      the granularity of protection changes may be as large as an entire
>      region.

In reality, 99% of UNIX systems do page-based granularity because so
many apps need it. ELF shared libraries are challenging, at best, if
you don't do page-based mprotect. :-)

It's a moot point because I think the mprotect+SIGSEGV thing is stupid
from a performance standpoint anyway.
* Re: Speeding up GC
  2002-06-04  7:37 ` David S. Miller
  2002-06-04  7:51   ` Daniel Berlin
  2002-06-04  8:10   ` Marc Espie
@ 2002-06-04  8:43 ` law
  2002-06-04  8:59   ` David S. Miller
  2002-06-04 10:15 ` Richard Henderson
  3 siblings, 1 reply; 32+ messages in thread
From: law @ 2002-06-04 8:43 UTC (permalink / raw)
To: David S. Miller; +Cc: pkoning, levon, ak, rth, gcc

In message <20020604.072448.55836404.davem@redhat.com>, "David S. Miller" writes:
> 2) Functions like loop.c:product_cheap_p() generate RTL to see
>    if valid expressions of a given type can be created and if so
>    see what it looks like.  Then the RTL is 100% forgotten.
>
>    FIX: Create a "short-term" mode for GC allocation.  Enable this
>    mode around such allocations.

This seems reasonable -- as long as we have a way for the short-term
object to turn into a long-term object.  That was one of the
fundamental problems with obstacks -- there was no reasonable way to
change the lifetime of an object after it was created.

It seems to me we have two classes of short-term needs for RTL.

1. We're going to generate various hunks of RTL, none of which will be
   live after the "short term" mode is complete.  Clearly we want these
   to use different pages in the GC allocator and not pollute the GC
   pages that are interesting long-term.

2. We're going to generate various hunks of RTL, but one or more of
   those hunks may be live after we're done with the "short term"
   mode.  This is the more interesting (and I suspect more important)
   long-term problem to solve.

jeff
* Re: Speeding up GC
  2002-06-04  8:43 ` law
@ 2002-06-04  8:59 ` David S. Miller
  0 siblings, 0 replies; 32+ messages in thread
From: David S. Miller @ 2002-06-04 8:59 UTC (permalink / raw)
To: law; +Cc: pkoning, levon, ak, rth, gcc

   From: law@redhat.com
   Date: Tue, 04 Jun 2002 09:42:28 -0600

   It seems to me we have two classes of short-term needs for RTL.

   1. We're going to generate various hunks of RTL, none of which will
      be live after the "short term" mode is complete.  Clearly we want
      these to use different pages in the GC allocator and not pollute
      the GC pages that are interesting long-term.

Right.

   2. We're going to generate various hunks of RTL, but one or more of
      those hunks may be live after we're done with the "short term"
      mode.  This is the more interesting (and I suspect more
      important) long-term problem to solve.

Use for_each_rtx() on the RTL you made to mark the pieces such that
RTL unsharing will copy it, then move back to long-term mode before
you do the unshare.

I have most of this in my head already. :-)
* Re: Speeding up GC
  2002-06-04  7:37 ` David S. Miller
    ` (2 preceding siblings ...)
  2002-06-04  8:43 ` law
@ 2002-06-04 10:15 ` Richard Henderson
  2002-06-04 10:22   ` David S. Miller
  3 siblings, 1 reply; 32+ messages in thread
From: Richard Henderson @ 2002-06-04 10:15 UTC (permalink / raw)
To: David S. Miller; +Cc: pkoning, levon, ak, gcc

On Tue, Jun 04, 2002 at 07:24:48AM -0700, David S. Miller wrote:
> What I am hearing right now from rth and others is "We allocate things
> stupidly in the compiler, so we're going to fix the allocator."

No, that's not what you're hearing.

The problem that generational GC is used to fix is C++ including a few
million lines of code and creating 100MB of templates and inline
functions and whatnot.  Which are absolutely positively not going to
change once we parse them.  And yet we re-scan those pages during each
collection pass.

You're looking at a different problem.  One that needs solving, don't
get me wrong, but rearranging how we scan or don't scan temporaries
isn't going to help the problem described.

r~
* Re: Speeding up GC
  2002-06-04 10:15 ` Richard Henderson
@ 2002-06-04 10:22 ` David S. Miller
  0 siblings, 0 replies; 32+ messages in thread
From: David S. Miller @ 2002-06-04 10:22 UTC (permalink / raw)
To: rth; +Cc: pkoning, levon, ak, gcc

   From: Richard Henderson <rth@redhat.com>
   Date: Tue, 4 Jun 2002 10:12:44 -0700

   You're looking at a different problem.  One that needs solving,
   don't get me wrong, but rearranging how we scan or don't scan
   temporaries isn't going to help the problem described.

Now I understand.
* Re: Speeding up GC
@ 2002-06-05 4:56 Robert Dewar
0 siblings, 0 replies; 32+ messages in thread
From: Robert Dewar @ 2002-06-05 4:56 UTC (permalink / raw)
To: aoliva, rth; +Cc: davem, espie, gcc
> On Tue, Jun 04, 2002 at 03:41:10PM -0300, Alexandre Oliva wrote:
> > ... and doing garbage collection on obstacks, instead of on
> > individual objects?
I would assume that a PL/I-style area approach might be used?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: Speeding up GC
@ 2002-08-17  5:13 Tim Josling
  2002-08-18  0:25 ` Guillermo Ballester Valor
  0 siblings, 1 reply; 32+ messages in thread
From: Tim Josling @ 2002-08-17 5:13 UTC (permalink / raw)
To: gcc

I ran a few more tests (Pentium III, 1GHz, with 256MB RAM).

1. I set the minimum storage threshold so that gc_collect never did
anything.  This increased the build time by about 3% compared to
normal.  The largest program, insn-recog.c, just fit into my 256MB of
RAM.  If the build is 3% slower, then GCC must be about 5% slower:
phase 1 of the build is the native compiler, and various shell scripts
and other programs run as well, diluting the effect, so the change in
GCC itself is larger than the change in the bootstrap time.

2. I replaced ggc-page.c with a routine that just allocated everything
with xmalloc.  I had to add a 4-byte header for the storage length
because gc_realloc is used in half a dozen places.  This bootstrapped
about 3% faster than the normal bootstrap, meaning the compiler was
about 5% faster.

Implications from this:

1. GC overhead is high.  Allocating with xmalloc was 5% faster, even
though no storage was freed.  It was 10% faster than running the GC
allocation code but not freeing anything.  So the combined effect of
the extra path length from gc_alloc and the cache penalty from
non-locality is at least 10%.  In fact it is probably more, because I
had to allocate the extra 4 bytes, and so I used more memory;
additionally, I did not align the storage, which usually has a
performance impact.  None of this GC overhead is captured in the GCC
timing reports, which therefore greatly understate the cost of GC.

2. GC overhead at -O0 is very high.  The percentage of GC overhead in
non-optimised compiles is far greater, based on my earlier profiles.
It is reported (by gprof) as ~10-12%, so it is probably more like
20-25% (based on the ratios above), which is very large.

3. A new GCC default and option are needed.
It would in my opinion be a good idea to have a heuristic which says:

- If the file is small, do not do GC.  Instead do simple contiguous
  allocation.  We should align things properly, and we should change
  gc_realloc to require the callers to provide the old length so we
  don't have to store it.

- If the file is very large, use GC.  The default size could be a
  configure option.

- Allow the user to override and force use or non-use of GC.

I would clean up and submit this patch, but it is just too hard to get
patches into gcc if you do not have write access.

4. This all makes it appear that there are a lot of performance wins
to be found in GCC.  Future changes to GC should be made only after
testing their effect on performance.

Tim Josling
* Re: Speeding up GC
  2002-08-17  5:13 Tim Josling
@ 2002-08-18  0:25 ` Guillermo Ballester Valor
  0 siblings, 0 replies; 32+ messages in thread
From: Guillermo Ballester Valor @ 2002-08-18 0:25 UTC (permalink / raw)
To: gcc

Hi,

I'm not a gcc developer, but this mail seems very promising to me.
Are there any serious problems with Tim's allocation suggestions?

Guillermo.

-- 
Guillermo Ballester Valor
gbv@oxixares.com
Ogijares, Granada  SPAIN
end of thread, other threads:[~2002-08-18  0:25 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-06-03  6:25 Speeding up GC Martin v. Löwis
2002-06-03  6:39 ` David S. Miller
2002-06-03 18:10 ` Richard Henderson
2002-06-04  2:34 ` Marc Espie
2002-06-04  3:15 ` David S. Miller
2002-06-04 11:42 ` Alexandre Oliva
2002-06-04 12:10 ` Richard Henderson
2002-06-04 12:39 ` David S. Miller
2002-06-04 12:38 ` David S. Miller
2002-06-04 10:03 ` Richard Henderson
     [not found] <j47klg8pfy.fsf@informatik.hu-berlin.de.suse.lists.egcs>
     [not found] ` <20020603180553.A13829@redhat.com.suse.lists.egcs>
2002-06-04  3:52 ` Andi Kleen
2002-06-04  6:57 ` John Levon
2002-06-04  7:14 ` Paul Koning
2002-06-04  7:37 ` David S. Miller
2002-06-04  7:51 ` Daniel Berlin
2002-06-04  7:53 ` David S. Miller
2002-06-04  7:58 ` Andi Kleen
2002-06-04  8:08 ` David S. Miller
2002-06-04  8:55 ` law
2002-06-04  8:17 ` Marc Espie
2002-06-04  9:21 ` Joe Buck
2002-06-04  8:10 ` Marc Espie
2002-06-04  8:58 ` David S. Miller
2002-06-04  9:05 ` Marc Espie
2002-06-04  9:35 ` David S. Miller
2002-06-04  8:43 ` law
2002-06-04  8:59 ` David S. Miller
2002-06-04 10:15 ` Richard Henderson
2002-06-04 10:22 ` David S. Miller
2002-06-05  4:56 Robert Dewar
2002-08-17  5:13 Tim Josling
2002-08-18  0:25 ` Guillermo Ballester Valor