public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-13  8:09 dewar
  2001-04-13  8:32 ` Jonathan P. Olson
  0 siblings, 1 reply; 36+ messages in thread
From: dewar @ 2001-04-13  8:09 UTC (permalink / raw)
  To: dewar, olson; +Cc: gcc, hans_boehm, java, jsturm, tromey

<<Seems like the compiler would have to emit a ton of debugging information
to specify the contents of each word of the stack frame at each PC
location
within the program.  Wading through all this mess in a GC would likely be
more costly than just conservatively scanning the stacks.
>>

Well there is also the concern that in a language like C, type accurate
scanning is not possible (because of free unions), so you still end up
with conservative guesses in any case, and there is some question as to
whether there is enough gain. Yes, it is definitely a gain to go to
a 100% type accurate solution (as would be possible with a language like
Java or Ada), since then you can count on no memory leaks, but if you
only go part way, it is not clear that it is worth it.

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-14  9:24 dewar
  0 siblings, 0 replies; 36+ messages in thread
From: dewar @ 2001-04-14  9:24 UTC (permalink / raw)
  To: fjh, olson; +Cc: gcc, hans_boehm, java

<<You don't necessarily need such information for each PC location;
for example, for single-threaded applications it's enough to record
such information for each call site or heap allocation.
>>

There are other approaches possible. In Spitbol/370, the garbage collector
scans and interprets the code to determine what stack locations are in use
as references.

^ permalink raw reply	[flat|nested] 36+ messages in thread
* RE: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-13 13:06 Boehm, Hans
  0 siblings, 0 replies; 36+ messages in thread
From: Boehm, Hans @ 2001-04-13 13:06 UTC (permalink / raw)
  To: 'dewar@gnat.com', olson; +Cc: gcc, Boehm, Hans, java, jsturm, tromey

I don't believe that there is a significant direct time benefit to accurate
stack scanning.  Conservative scanning could be made faster (if it mattered,
it usually doesn't, and it probably already is faster), and will certainly
result in smaller executables and less memory traffic during scanning.

As far as indirect effects are concerned, the story is a bit different.  The
best empirical comparison between type accurate and conservative collection
that I know of is

Hirzel, Diwan, "On the Type Accuracy of Garbage Collection", ISMM 2000

(See http://www.cs.colorado.edu/~diwan/ISMM-Hirzel.ps .
There is a new, related paper at
http://www.cs.colorado.edu/~diwan/ecoop01-gc-liveness.pdf )

They did a fair amount of work to measure differences in data seen as live
by the GC with various degrees of conservativism for programs they expected
to be a challenge for conservative collectors.  They didn't find any growing
leaks, but they did find significant differences (up to about 40%) in the
amount of retained memory, and the stack was often a significant part of the
cause.  As expected, this is highly architecture dependent, and the
difference is insignificant with curent applications on 64 bit plaforms.

I think the conclusion depends somewhat in what you are trying to achieve:

1) Meaningful hard space bounds.  This is hard no matter what.  You need:
 -  A definition of what constitutes "live data".  Most languages don't
define this, since it can be difficult to force optimizations to preserve
it.  Appel has a definition for ML, and I believe there is one for Haskell.
Java carefully doesn't define it.  I don't think Ada defines it either.
 -  Probably a compacting collector, since you otherwise lose a factor of
roughly log(<largest object size>/<smallest object size>) in worst case
space use.
 -  Type accurate scans of the heap and static data.
 -  Probably an accurate stack scan.  (For most applications, you otherwise
lose a factor of <number of misidentified pointers>, and in a few cases the
loss is potentially unbounded.  See
http://www.hpl.hp.com/personal/Hans_Boehm/gc/bounds.html .)  Many, though
not all, approaches here require execution overhead for programs that don't
allocate, e.g. to potentially stop execution at "safe points", or to limit
derived pointers at those "safe points".

2) Minimal surprise in space usage at minimal cost.  I'm not sure what the
right solution is.  Perhaps profile-based type info for frequently executed
call sites?  In the absence of C/C++ code, perhaps full type information, so
that you can fully compact on occasion?

3) Minimal expected running time.  The answer seems to be completely
dependent on the application and architecture.
 
Hans

> -----Original Message-----
> From: dewar@gnat.com [ mailto:dewar@gnat.com ]
> Sent: Friday, April 13, 2001 8:10 AM
> To: dewar@gnat.com; olson@mmsi.com
> Cc: gcc@gcc.gnu.org; hans_boehm@hp.com; java@gcc.gnu.org;
> jsturm@one-point.com; tromey@redhat.com
> Subject: Re: dynamic library cost (was RE: libtool, java woes)
> 
> 
> <<Seems like the compiler would have to emit a ton of 
> debugging information
> to specify the contents of each word of the stack frame at each PC
> location
> within the program.  Wading through all this mess in a GC 
> would likely be
> more costly than just conservatively scanning the stacks.
> >>
> 
> Well there is also the concern that in a language like C, 
> type accurate
> scanning is not possible (because of free unions), so you still end up
> with conservative guesses in any case, and there is some 
> question as to
> whether there is enough gain. Yes, it is definitely a gain to go to
> a 100% type accurate solution (as would be possible with a 
> language like
> Java or Ada), since then you can count on no memory leaks, but if you
> only go part way, it is not clear that it is worth it.
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread
* RE: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-13 12:31 Boehm, Hans
  0 siblings, 0 replies; 36+ messages in thread
From: Boehm, Hans @ 2001-04-13 12:31 UTC (permalink / raw)
  To: 'dewar@gnat.com', olson, tromey; +Cc: gcc, Boehm, Hans, java, jsturm

> -----Original Message-----
> From: dewar@gnat.com [ mailto:dewar@gnat.com ]
> 
> <<Read the paper "Compiler Support for Garbage Collection in a
> Statically Typed Language", by Diwan, Moss, and Hudson.  My copy says
> "To appear in SIGPLAN '92 PLDI".
> >>
Proceedings of the SIGPLAN '92 Conference on Programming Language Design and
Implementation, pp. 273-282.
> 
> Well there is a VERY extensive literature much earlier than 
> this. Remember
> that Algol-68 requires fully type accurate garbage 
> collection, and all the
> problems were worked out in the early 70's, with a general 
> conclusion that
> the costs are not that high (this includes dealing with 
> parallel incremental
> garbage collection in a multi-threaded environment, since A68 had full
> thread support). Of course there is a followon literature, but this is
> hardly a new problem.
> 
And indeed the above paper cites a 1971 paper by Branquart and Lewi in
"Algol 68 Implementation".

A good, more recent, paper is

Stichnoth, Lueh, and Cierniak, "Support for Garbage Collection at Every
Instruction in a Java Compiler", PLDI '99.

Hans

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-13 12:04 dewar
  0 siblings, 0 replies; 36+ messages in thread
From: dewar @ 2001-04-13 12:04 UTC (permalink / raw)
  To: olson, tromey; +Cc: dewar, gcc, hans_boehm, java, jsturm

<<I'll check out the paper.  However it seems like the compacting collector
would need to suspend threads for long periods of time while it's
modifying
all the memory references since all references to an object would need to
be modified atomically.  In a multi-threaded system it would be very
costly.
>>

There are well known algorithms that minimize the cost "it seems" is not
a substitute for looking at the literature here :-)

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-13 12:03 dewar
  0 siblings, 0 replies; 36+ messages in thread
From: dewar @ 2001-04-13 12:03 UTC (permalink / raw)
  To: olson, tromey; +Cc: dewar, gcc, hans_boehm, java, jsturm

<<Read the paper "Compiler Support for Garbage Collection in a
Statically Typed Language", by Diwan, Moss, and Hudson.  My copy says
"To appear in SIGPLAN '92 PLDI".
>>

Well there is a VERY extensive literature much earlier than this. Remember
that Algol-68 requires fully type accurate garbage collection, and all the
problems were worked out in the early 70's, with a general conclusion that
the costs are not that high (this includes dealing with parallel incremental
garbage collection in a multi-threaded environment, since A68 had full
thread support). Of course there is a followon literature, but this is
hardly a new problem.

^ permalink raw reply	[flat|nested] 36+ messages in thread
[parent not found: <200104131451.f3DEpoV06885@mail.redhat.com>]
[parent not found: <auto-000000600681@crewsoft.com>]
* Re: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-13  4:52 dewar
  2001-04-13  7:49 ` Jonathan P. Olson
       [not found] ` <200104131451.AAA11537@mulga.cs.mu.OZ.AU>
  0 siblings, 2 replies; 36+ messages in thread
From: dewar @ 2001-04-13  4:52 UTC (permalink / raw)
  To: hans_boehm, tromey; +Cc: gcc, java, jsturm

>Adding non-conservative stack scanning is much harder


A nice term for non-conservative GC that has been introduced by IBM is
"type accurate".

^ permalink raw reply	[flat|nested] 36+ messages in thread
* RE: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-12 14:30 Boehm, Hans
  2001-04-12 22:17 ` Tom Tromey
  0 siblings, 1 reply; 36+ messages in thread
From: Boehm, Hans @ 2001-04-12 14:30 UTC (permalink / raw)
  To: 'Jeff Sturm', Tom Tromey; +Cc: Boehm, Hans, java, gcc

> -----Original Message-----
> From: Jeff Sturm [ mailto:jsturm@one-point.com ]
> 1) If I have two shared libraries, both compiled with 
> compatible releases
> of gcj but targetted to different GC libraries, is there any hope of
> loading them both into one process image (i.e. running two GC systems
> concurrently)?
> 
> I'm thinking the answer is "no", but just thought I'd ask.
I also think the answer is "no".  If you wanted to make the answer "yes",
you'd need a fairly elaborate protocol between the two collectors to handle
pointer chains that cross from the heap manged by one GC to the other.
> 
> 2) If someone really wants pluggable GC (along with somewhat reduced
> efficiency), would it be practical to define a "generic" GC 
> interface for
> this purpose?  That would leave the choice up to the package 
> maintainer.
> 
> Suppose we had two GC libraries available today, named "boehm" and
> "copying".  You could require that these two cannot be mixed 
> freely, but
> either could be mixed with a "generic" which would use whatever GC is
> loaded at runtime.
> 
I would be inclined to make direct calls into the GC a configurable option,
so that you can still go through _Jv_AllocObject if you desire (or probably
if you want to use JVMPI).  I think that gets you this level of
compatibility, assuming both collectors can live with the same generated
code, which I suspect greatly restricts the space of possibilities.  (As far
as I know, the only way to currently get a write barrier is with virtual
memory techniques.  That means generational or incremental collection
requires virtual memory, and you only get update information at coarse
granularity.  And you're presumably limited to a conservative stack scan.)

Hans

^ permalink raw reply	[flat|nested] 36+ messages in thread
* RE: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-11 12:16 Boehm, Hans
  2001-04-11 16:46 ` Corey Minyard
                   ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Boehm, Hans @ 2001-04-11 12:16 UTC (permalink / raw)
  To: 'Bryce McKinlay', Boehm, Hans
  Cc: 'Alexandre Oliva', tromey, Jeff Sturm, java, gcc

I'd like to move in a slightly different direction.  I'd like to see gcc
generate code that calls GC_gcj_malloc directly in most cases, at least for
non-embedded applications.

In 3.0,the allocation path is _Jv_AllocObject (prims.cc) -> _Jv_AllocObj
(boehm.cc) -> GC_gcj_malloc (GC library).

In my version _Jv_AllocObj is simply an inlined call to GC_gcj_malloc, so
this path is a little shorter, but still too long.

_Jv_allocObject currently does the following:

1) Calls _Jv_InitClass.  This normally involves an inline test of a global.
It seems to me the compiler should be doing this (perhaps even in the
embedded case?) since it can often (usually?) be optimized out.  Much of the
overhead is in locating the global, and that could be moved out of loops,
etc.

2) Checks if a finalizer needs to be registered, and does so if necessary.
This is a very expensive  (5 + memory references?) test for something that I
believe is completely statically known.  The test should definitely be done
by the compiler.  (It can still call through libgcj if a finalizer needs to
be registered.  That's currently fairly expensive anyway, and it's
relatively infrequent.)

3) If object allocation notifications are requested, it does some messy
JVMPI allocation profiling stuff.  I think this is currently compiled in by
default?  I don't understand the full history or current status here, but
I'd be inclined to turn it off by default.  (As far as I undertand it, it's
advantage is that it's semistandard.  It's disadvantages are that neither
the spec nor the implementation look quite right (the stuff about turning
off GC looks wrong to me), it looks very intrusive for a profiling
mechanism, and I think there are usually better mechanisms for answering the
questions I would really want answered.)

4) It calls _Jv_AllocObj (which does nothing but call GC_gcj_malloc).

A disadvantage of this change is that it would create a direct dependency
from gcj generated code to the GC library.  But I suspect that's already
implicitly there, since another GC library would probably need a compiler
write-barrier, etc.

I'm trying to generalize the GC library, so that a single binary could be
used for Java, C, ObjC, etc.  So I would like to avoid replicating it in
libgcj.  It may still  make sense to replicate the top level routines, if
there's a clean way to do that.

Hans

> -----Original Message-----
> From: Bryce McKinlay [ mailto:bryce@albatross.co.nz ]
> Sent: Tuesday, April 10, 2001 6:16 PM
> To: Boehm, Hans
> Cc: 'Alexandre Oliva'; tromey@redhat.com; Jeff Sturm; 
> java@gcc.gnu.org;
> gcc@gcc.gnu.org
> Subject: Re: dynamic library cost (was RE: libtool, java woes)
> 
> 
> "Boehm, Hans" wrote:
> 
> > I'm seeing significant overheads as a result of dynamic 
> library calls.  On a
> > PII/300 machine, a (single-threaded) loop containing only 
> free(malloc(8))
> > runs more than 20% faster when it's linked statically.  
> This is similar on
> > an Itanium machine.  I'm not sure how significant that is 
> for typical C
> > programs, nor how frequently something like libsupc++ is 
> called.  I believe
> > it is currently very significant for libgcj and calls to the garbage
> > collector library from libgcj.
> 
> Why don't we link the garbage collector into libgcj?
> 
> regards
> 
>   [ bryce ]
> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread
* RE: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-10 16:23 Boehm, Hans
  2001-04-12  5:42 ` Alexandre Oliva
  0 siblings, 1 reply; 36+ messages in thread
From: Boehm, Hans @ 2001-04-10 16:23 UTC (permalink / raw)
  To: 'Alexandre Oliva', Boehm, Hans; +Cc: tromey, Jeff Sturm, java, gcc

Isn't PIC code in the library only part of the story?  Even if you
unconditionally compile the library as PIC, there's still the issue of
whether the client can directly branch it, or branch to an indirect branch
to the routine (X86) or branch to a more complicated trampoline (IA64).  On
IA64, much of the time seems to go into the trampoline code, though there
seems to be potential for optimization there.  The code is essentially
position independent in all cases.

I'm not sure how well typical X86 hardware handles the
branch-to-indirect-branch sequence.

Hans

> -----Original Message-----
> From: Alexandre Oliva [ mailto:aoliva@redhat.com ]
> Sent: Tuesday, April 10, 2001 1:22 PM
> To: Boehm, Hans
> Cc: tromey@redhat.com; Jeff Sturm; java@gcc.gnu.org; gcc@gcc.gnu.org
> Subject: Re: dynamic library cost (was RE: libtool, java woes)
> 
> 
> On Apr 10, 2001, "Boehm, Hans" <hans_boehm@hp.com> wrote:
> 
> >> From: Alexandre Oliva [ mailto:aoliva@redhat.com ]
> 
> >> We either need libsupc++ to be a shared library, that both 
> libstdc++
> >> and libgcj depend on, or have libgcj depend on libstdc++.
> 
> > I'm seeing significant overheads as a result of dynamic library
> > calls.
> 
> In this particular case, it would make no difference at all: libsupc++
> is already compiled as PIC, even though it's a static library.
> 
> -- 
> Alexandre Oliva   Enjoy Guarana', see http://www.ic.unicamp.br/~oliva/
> Red Hat GCC Developer                  aoliva@{cygnus.com, redhat.com}
> CS PhD student at IC-Unicamp        oliva@{lsd.ic.unicamp.br, gnu.org}
> Free Software Evangelist    *Please* write to mailing lists, not to me
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread
* dynamic library cost (was RE: libtool, java woes)
@ 2001-04-10  9:15 Boehm, Hans
  2001-04-10 13:22 ` Alexandre Oliva
                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Boehm, Hans @ 2001-04-10  9:15 UTC (permalink / raw)
  To: 'Alexandre Oliva', tromey; +Cc: Jeff Sturm, java, gcc

> From: Alexandre Oliva [ mailto:aoliva@redhat.com ]
>
> We either need libsupc++ to be a shared library, that both libstdc++
> and libgcj depend on, or have libgcj depend on libstdc++.
> 
One thing to keep in mind with all of this:

I'm seeing significant overheads as a result of dynamic library calls.  On a
PII/300 machine, a (single-threaded) loop containing only free(malloc(8))
runs more than 20% faster when it's linked statically.  This is similar on
an Itanium machine.  I'm not sure how significant that is for typical C
programs, nor how frequently something like libsupc++ is called.  I believe
it is currently very significant for libgcj and calls to the garbage
collector library from libgcj.

I doubt there are any easy solutions, but we should be careful not to make
this worse.  (Partial solutions might include sometimes expanding the
dynamic library call code inline on Itanium, having gcj generate allocation
code that often directly calls the GC innstead of passing through libgcj,
perhaps somewhow replicating a few frequently called functions like the top
level of the malloc implementation, perhaps dynamically replacing the code
at a few frequently used call sites?)

Hans

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2001-04-14  9:24 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-04-13  8:09 dynamic library cost (was RE: libtool, java woes) dewar
2001-04-13  8:32 ` Jonathan P. Olson
  -- strict thread matches above, loose matches on Subject: below --
2001-04-14  9:24 dewar
2001-04-13 13:06 Boehm, Hans
2001-04-13 12:31 Boehm, Hans
2001-04-13 12:04 dewar
2001-04-13 12:03 dewar
     [not found] <200104131451.f3DEpoV06885@mail.redhat.com>
2001-04-13  8:59 ` Tom Tromey
2001-04-13  9:07   ` Jonathan P. Olson
2001-04-14  8:07   ` Fergus Henderson
     [not found] <auto-000000600681@crewsoft.com>
2001-04-13  8:44 ` Cedric Berger
2001-04-13  8:52   ` Jonathan P. Olson
2001-04-13  4:52 dewar
2001-04-13  7:49 ` Jonathan P. Olson
     [not found] ` <200104131451.AAA11537@mulga.cs.mu.OZ.AU>
2001-04-14  7:50   ` Fergus Henderson
2001-04-12 14:30 Boehm, Hans
2001-04-12 22:17 ` Tom Tromey
2001-04-13  7:37   ` Jonathan P. Olson
2001-04-11 12:16 Boehm, Hans
2001-04-11 16:46 ` Corey Minyard
2001-04-12  8:01   ` Tom Tromey
2001-04-11 21:46 ` Bryce McKinlay
2001-04-12  7:54   ` Tom Tromey
2001-04-12  7:59 ` Tom Tromey
2001-04-12 11:47   ` Jeff Sturm
2001-04-12 22:13     ` Tom Tromey
2001-04-12 23:46       ` Jeff Sturm
2001-04-10 16:23 Boehm, Hans
2001-04-12  5:42 ` Alexandre Oliva
2001-04-10  9:15 Boehm, Hans
2001-04-10 13:22 ` Alexandre Oliva
2001-04-10 14:43 ` Joern Rennecke
2001-04-10 17:50   ` Alexandre Oliva
2001-04-10 15:00 ` Jakub Jelinek
2001-04-10 18:14 ` Bryce McKinlay
2001-04-12 22:09   ` Tom Tromey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).