public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Re: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-14  9:24 dewar
  0 siblings, 0 replies; 36+ messages in thread
From: dewar @ 2001-04-14  9:24 UTC (permalink / raw)
  To: fjh, olson; +Cc: gcc, hans_boehm, java

<<You don't necessarily need such information for each PC location;
for example, for single-threaded applications it's enough to record
such information for each call site or heap allocation.
>>

There are other approaches possible. In Spitbol/370, the garbage collector
scans and interprets the code to determine what stack locations are in use
as references.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-13  8:59 ` Tom Tromey
  2001-04-13  9:07   ` Jonathan P. Olson
@ 2001-04-14  8:07   ` Fergus Henderson
  1 sibling, 0 replies; 36+ messages in thread
From: Fergus Henderson @ 2001-04-14  8:07 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Jonathan P. Olson, hans_boehm, gcc, java

On 13-Apr-2001, Tom Tromey <tromey@redhat.com> wrote:
> >>>>> "Jon" == Jonathan P Olson <olson@mmsi.com> writes:
> 
> Jon> Does anybody have a feel for the costs of non-conservative stack
> Jon> scanning?
...
> Jon> Wading through all this mess in a GC would likely be more costly
> Jon> than just conservatively scanning the stacks.
> 
> They did this work so that they could implement a compacting collector
> but still have the compiler generate optimized code.  So what they did
> is even more ambitious -- if an object is moved they update the stack
> and registers to reflect this.
> 
> My guess is that it would be cheaper, easier, and probably not much
> worse to simply implement a mostly-copying-or-compacting collector.
> (I have no facts to back up this assertion.)

I think it depends on the application.

Having a compacting collector makes some additional optimizations
possible, at least for single-threaded applications.  For example, if you
have a function whose return type is some atomic type such as `int', and
through static analysis (or language properties, e.g. for pure functional
languages) you know that this function doesn't have any side effects,
you know that any heap objects allocated in the function will be dead
when the function returns, so you can save the heap pointer on function
entry and restore it on function exit, thus very cheaply collecting
any heap objects allocated in that function.  Prolog implementations
typically use a similar optimization to cheaply reclaim heap on backtracking.
For some applications, such optimizations can often avoid the need to
ever do conventional garbage collection.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
                                    |  of excellence is a lethal habit"
WWW: < http://www.cs.mu.oz.au/~fjh >  |     -- the last words of T. S. Garp.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
       [not found] ` <200104131451.AAA11537@mulga.cs.mu.OZ.AU>
@ 2001-04-14  7:50   ` Fergus Henderson
  0 siblings, 0 replies; 36+ messages in thread
From: Fergus Henderson @ 2001-04-14  7:50 UTC (permalink / raw)
  To: Jonathan P. Olson; +Cc: hans_boehm, gcc, java

On 13-Apr-2001, Jonathan P. Olson <olson@mmsi.com> wrote:
> Does anybody have a feel for the costs of non-conservative stack 
> scanning?
> 
> Seems like the compiler would have to emit a ton of debugging information
> to specify the contents of each word of the stack frame at each PC 
> location within the program.  Wading through all this mess in a GC
> would likely be more costly than just conservatively scanning the stacks.

You don't necessarily need such information for each PC location;
for example, for single-threaded applications it's enough to record
such information for each call site or heap allocation.

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
                                    |  of excellence is a lethal habit"
WWW: < http://www.cs.mu.oz.au/~fjh >  |     -- the last words of T. S. Garp.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-13 13:06 Boehm, Hans
  0 siblings, 0 replies; 36+ messages in thread
From: Boehm, Hans @ 2001-04-13 13:06 UTC (permalink / raw)
  To: 'dewar@gnat.com', olson; +Cc: gcc, Boehm, Hans, java, jsturm, tromey

I don't believe that there is a significant direct time benefit to accurate
stack scanning.  Conservative scanning could be made faster (if it mattered,
it usually doesn't, and it probably already is faster), and will certainly
result in smaller executables and less memory traffic during scanning.

As far as indirect effects are concerned, the story is a bit different.  The
best empirical comparison between type accurate and conservative collection
that I know of is

Hirzel, Diwan, "On the Type Accuracy of Garbage Collection", ISMM 2000

(See http://www.cs.colorado.edu/~diwan/ISMM-Hirzel.ps .
There is a new, related paper at
http://www.cs.colorado.edu/~diwan/ecoop01-gc-liveness.pdf )

They did a fair amount of work to measure differences in data seen as live
by the GC with various degrees of conservativism for programs they expected
to be a challenge for conservative collectors.  They didn't find any growing
leaks, but they did find significant differences (up to about 40%) in the
amount of retained memory, and the stack was often a significant part of the
cause.  As expected, this is highly architecture dependent, and the
difference is insignificant with curent applications on 64 bit plaforms.

I think the conclusion depends somewhat in what you are trying to achieve:

1) Meaningful hard space bounds.  This is hard no matter what.  You need:
 -  A definition of what constitutes "live data".  Most languages don't
define this, since it can be difficult to force optimizations to preserve
it.  Appel has a definition for ML, and I believe there is one for Haskell.
Java carefully doesn't define it.  I don't think Ada defines it either.
 -  Probably a compacting collector, since you otherwise lose a factor of
roughly log(<largest object size>/<smallest object size>) in worst case
space use.
 -  Type accurate scans of the heap and static data.
 -  Probably an accurate stack scan.  (For most applications, you otherwise
lose a factor of <number of misidentified pointers>, and in a few cases the
loss is potentially unbounded.  See
http://www.hpl.hp.com/personal/Hans_Boehm/gc/bounds.html .)  Many, though
not all, approaches here require execution overhead for programs that don't
allocate, e.g. to potentially stop execution at "safe points", or to limit
derived pointers at those "safe points".

2) Minimal surprise in space usage at minimal cost.  I'm not sure what the
right solution is.  Perhaps profile-based type info for frequently executed
call sites?  In the absence of C/C++ code, perhaps full type information, so
that you can fully compact on occasion?

3) Minimal expected running time.  The answer seems to be completely
dependent on the application and architecture.
 
Hans

> -----Original Message-----
> From: dewar@gnat.com [ mailto:dewar@gnat.com ]
> Sent: Friday, April 13, 2001 8:10 AM
> To: dewar@gnat.com; olson@mmsi.com
> Cc: gcc@gcc.gnu.org; hans_boehm@hp.com; java@gcc.gnu.org;
> jsturm@one-point.com; tromey@redhat.com
> Subject: Re: dynamic library cost (was RE: libtool, java woes)
> 
> 
> <<Seems like the compiler would have to emit a ton of 
> debugging information
> to specify the contents of each word of the stack frame at each PC
> location
> within the program.  Wading through all this mess in a GC 
> would likely be
> more costly than just conservatively scanning the stacks.
> >>
> 
> Well there is also the concern that in a language like C, 
> type accurate
> scanning is not possible (because of free unions), so you still end up
> with conservative guesses in any case, and there is some 
> question as to
> whether there is enough gain. Yes, it is definitely a gain to go to
> a 100% type accurate solution (as would be possible with a 
> language like
> Java or Ada), since then you can count on no memory leaks, but if you
> only go part way, it is not clear that it is worth it.
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-13 12:31 Boehm, Hans
  0 siblings, 0 replies; 36+ messages in thread
From: Boehm, Hans @ 2001-04-13 12:31 UTC (permalink / raw)
  To: 'dewar@gnat.com', olson, tromey; +Cc: gcc, Boehm, Hans, java, jsturm

> -----Original Message-----
> From: dewar@gnat.com [ mailto:dewar@gnat.com ]
> 
> <<Read the paper "Compiler Support for Garbage Collection in a
> Statically Typed Language", by Diwan, Moss, and Hudson.  My copy says
> "To appear in SIGPLAN '92 PLDI".
> >>
Proceedings of the SIGPLAN '92 Conference on Programming Language Design and
Implementation, pp. 273-282.
> 
> Well there is a VERY extensive literature much earlier than 
> this. Remember
> that Algol-68 requires fully type accurate garbage 
> collection, and all the
> problems were worked out in the early 70's, with a general 
> conclusion that
> the costs are not that high (this includes dealing with 
> parallel incremental
> garbage collection in a multi-threaded environment, since A68 had full
> thread support). Of course there is a followon literature, but this is
> hardly a new problem.
> 
And indeed the above paper cites a 1971 paper by Branquart and Lewi in
"Algol 68 Implementation".

A good, more recent, paper is

Stichnoth, Lueh, and Cierniak, "Support for Garbage Collection at Every
Instruction in a Java Compiler", PLDI '99.

Hans

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-13 12:04 dewar
  0 siblings, 0 replies; 36+ messages in thread
From: dewar @ 2001-04-13 12:04 UTC (permalink / raw)
  To: olson, tromey; +Cc: dewar, gcc, hans_boehm, java, jsturm

<<I'll check out the paper.  However it seems like the compacting collector
would need to suspend threads for long periods of time while it's
modifying
all the memory references since all references to an object would need to
be modified atomically.  In a multi-threaded system it would be very
costly.
>>

There are well known algorithms that minimize the cost "it seems" is not
a substitute for looking at the literature here :-)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-13 12:03 dewar
  0 siblings, 0 replies; 36+ messages in thread
From: dewar @ 2001-04-13 12:03 UTC (permalink / raw)
  To: olson, tromey; +Cc: dewar, gcc, hans_boehm, java, jsturm

<<Read the paper "Compiler Support for Garbage Collection in a
Statically Typed Language", by Diwan, Moss, and Hudson.  My copy says
"To appear in SIGPLAN '92 PLDI".
>>

Well there is a VERY extensive literature much earlier than this. Remember
that Algol-68 requires fully type accurate garbage collection, and all the
problems were worked out in the early 70's, with a general conclusion that
the costs are not that high (this includes dealing with parallel incremental
garbage collection in a multi-threaded environment, since A68 had full
thread support). Of course there is a followon literature, but this is
hardly a new problem.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-13  8:59 ` Tom Tromey
@ 2001-04-13  9:07   ` Jonathan P. Olson
  2001-04-14  8:07   ` Fergus Henderson
  1 sibling, 0 replies; 36+ messages in thread
From: Jonathan P. Olson @ 2001-04-13  9:07 UTC (permalink / raw)
  To: tromey; +Cc: dewar, hans_boehm, gcc, java, jsturm

I'll check out the paper.  However it seems like the compacting collector
would need to suspend threads for long periods of time while it's 
modifying
all the memory references since all references to an object would need to
be modified atomically.  In a multi-threaded system it would be very 
costly.

On Friday, April 13, 2001, at 09:09  AM, Tom Tromey wrote:

> They did this work so that they could implement a compacting collector
> but still have the compiler generate optimized code.  So what they did
> is even more ambitious -- if an object is moved they update the stack
> and registers to reflect this.
>
> My guess is that it would be cheaper, easier, and probably not much
> worse to simply implement a mostly-copying-or-compacting collector.
> (I have no facts to back up this assertion.)
>
> Anyway, for them the question wasn't about the cost of scanning the
> stack per se.  That cost is just part of a larger cost framework.  For
> some reason they wanted to use a compacting collector and this is the
> work required to do that.
>
> Tom

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
       [not found] <200104131451.f3DEpoV06885@mail.redhat.com>
@ 2001-04-13  8:59 ` Tom Tromey
  2001-04-13  9:07   ` Jonathan P. Olson
  2001-04-14  8:07   ` Fergus Henderson
  0 siblings, 2 replies; 36+ messages in thread
From: Tom Tromey @ 2001-04-13  8:59 UTC (permalink / raw)
  To: Jonathan P. Olson; +Cc: dewar, hans_boehm, gcc, java, jsturm

>>>>> "Jon" == Jonathan P Olson <olson@mmsi.com> writes:

Jon> Does anybody have a feel for the costs of non-conservative stack
Jon> scanning?

Read the paper "Compiler Support for Garbage Collection in a
Statically Typed Language", by Diwan, Moss, and Hudson.  My copy says
"To appear in SIGPLAN '92 PLDI".

Jon> Wading through all this mess in a GC would likely be more costly
Jon> than just conservatively scanning the stacks.

They did this work so that they could implement a compacting collector
but still have the compiler generate optimized code.  So what they did
is even more ambitious -- if an object is moved they update the stack
and registers to reflect this.

My guess is that it would be cheaper, easier, and probably not much
worse to simply implement a mostly-copying-or-compacting collector.
(I have no facts to back up this assertion.)

Anyway, for them the question wasn't about the cost of scanning the
stack per se.  That cost is just part of a larger cost framework.  For
some reason they wanted to use a compacting collector and this is the
work required to do that.

Tom

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-13  8:44 ` Cedric Berger
@ 2001-04-13  8:52   ` Jonathan P. Olson
  0 siblings, 0 replies; 36+ messages in thread
From: Jonathan P. Olson @ 2001-04-13  8:52 UTC (permalink / raw)
  To: Cedric Berger; +Cc: dewar, gcc, hans_boehm, java, jsturm, tromey

In my case, I let the conservative GC collect C++ objects just like it 
does
Java objects.  Instead of storing a native peer in a `long', objects 
which
require a native peer store an int which indexes into a C++ Map vector 
of native
peers.

A little less evil than storing the native peer in a `long' variable.  
Yeah,
the finalizer is still necessary to deallocate the peer int from the Map
vector but this allows the C++ object to be shared by multiple references
and collected by GC without an explicit free().

On Friday, April 13, 2001, at 08:44  AM, Cedric Berger wrote:

> "Jonathan P. Olson" wrote:
>
>> Even in Java, it's often not possible.  Consider how Java AWT
>> implementations
>> typically store native peer objects in `long' variables.  Yeah, this is
>> totally evil but unfortunately Java doesn't provide any way to declare
>> `native' object references.
>
> Ok, I've done that a lot (storing native peer objects in `long' 
> variables)
> but then I write a finalizer to free() the memory explicitely.
> (this is one of the very common use of finalizers)
> Cedric
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
       [not found] <auto-000000600681@crewsoft.com>
@ 2001-04-13  8:44 ` Cedric Berger
  2001-04-13  8:52   ` Jonathan P. Olson
  0 siblings, 1 reply; 36+ messages in thread
From: Cedric Berger @ 2001-04-13  8:44 UTC (permalink / raw)
  To: Jonathan P. Olson; +Cc: dewar, gcc, hans_boehm, java, jsturm, tromey

"Jonathan P. Olson" wrote:

> Even in Java, it's often not possible.  Consider how Java AWT
> implementations
> typically store native peer objects in `long' variables.  Yeah, this is
> totally evil but unfortunately Java doesn't provide any way to declare
> `native' object references.

Ok, I've done that a lot (storing native peer objects in `long' variables)
but then I write a finalizer to free() the memory explicitely.
(this is one of the very common use of finalizers)
Cedric


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-13  8:09 dewar
@ 2001-04-13  8:32 ` Jonathan P. Olson
  0 siblings, 0 replies; 36+ messages in thread
From: Jonathan P. Olson @ 2001-04-13  8:32 UTC (permalink / raw)
  To: dewar; +Cc: gcc, hans_boehm, java, jsturm, tromey

Even in Java, it's often not possible.  Consider how Java AWT 
implementations
typically store native peer objects in `long' variables.  Yeah, this is 
totally evil but unfortunately Java doesn't provide any way to declare 
`native' object references.

On Friday, April 13, 2001, at 08:09  AM, dewar@gnat.com wrote:

> <<Seems like the compiler would have to emit a ton of debugging 
> information
> to specify the contents of each word of the stack frame at each PC
> location
> within the program.  Wading through all this mess in a GC would likely 
> be
> more costly than just conservatively scanning the stacks.
>>>
>
> Well there is also the concern that in a language like C, type accurate
> scanning is not possible (because of free unions), so you still end up
> with conservative guesses in any case, and there is some question as to
> whether there is enough gain. Yes, it is definitely a gain to go to
> a 100% type accurate solution (as would be possible with a language like
> Java or Ada), since then you can count on no memory leaks, but if you
> only go part way, it is not clear that it is worth it.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-13  8:09 dewar
  2001-04-13  8:32 ` Jonathan P. Olson
  0 siblings, 1 reply; 36+ messages in thread
From: dewar @ 2001-04-13  8:09 UTC (permalink / raw)
  To: dewar, olson; +Cc: gcc, hans_boehm, java, jsturm, tromey

<<Seems like the compiler would have to emit a ton of debugging information
to specify the contents of each word of the stack frame at each PC
location
within the program.  Wading through all this mess in a GC would likely be
more costly than just conservatively scanning the stacks.
>>

Well there is also the concern that in a language like C, type accurate
scanning is not possible (because of free unions), so you still end up
with conservative guesses in any case, and there is some question as to
whether there is enough gain. Yes, it is definitely a gain to go to
a 100% type accurate solution (as would be possible with a language like
Java or Ada), since then you can count on no memory leaks, but if you
only go part way, it is not clear that it is worth it.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-13  4:52 dewar
@ 2001-04-13  7:49 ` Jonathan P. Olson
       [not found] ` <200104131451.AAA11537@mulga.cs.mu.OZ.AU>
  1 sibling, 0 replies; 36+ messages in thread
From: Jonathan P. Olson @ 2001-04-13  7:49 UTC (permalink / raw)
  To: dewar; +Cc: hans_boehm, tromey, gcc, java, jsturm

Does anybody have a feel for the costs of non-conservative stack 
scanning?

Seems like the compiler would have to emit a ton of debugging information
to specify the contents of each word of the stack frame at each PC 
location
within the program.  Wading through all this mess in a GC would likely be
more costly than just conservatively scanning the stacks.

In my experience, stacks are normally the smallest amount of memory 
that's
actually scanned by a collector.  Most O-O applications have an order of 
magnitude
more heap memory allocated than stack memory, so the only thing a precise
stack scan would buy you is elimination of (theoretical) memory leaks.

On Friday, April 13, 2001, at 04:52  AM, dewar@gnat.com wrote:

>> Adding non-conservative stack scanning is much harder
>
>
> A nice term for non-conservative GC that has been introduced by IBM is
> "type accurate".

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-12 22:17 ` Tom Tromey
@ 2001-04-13  7:37   ` Jonathan P. Olson
  0 siblings, 0 replies; 36+ messages in thread
From: Jonathan P. Olson @ 2001-04-13  7:37 UTC (permalink / raw)
  To: tromey; +Cc: Boehm, Hans, 'Jeff Sturm', java, gcc

Right, adding write barriers was incredibly easy.   The  following code
in `expr.c' store_expr() invokes the write barrier:

#if defined (HAVE_write_barrier)
       /*
        * Generate write barrier code if we're setting memory with
        * something that's not a constant, not a vtable, and with
        * something that is a pointer.
        */
       else if (flag_write_barriers
                && GET_CODE (target) == MEM
                && GET_MODE (target) == Pmode
#ifdef TARGET_NEEDS_WRITE_BARRIER
                && TARGET_NEEDS_WRITE_BARRIER (target)
#endif
                && !(TREE_CODE(exp) == ADDR_EXPR
                     && DECL_VIRTUAL_P (TREE_OPERAND (exp, 0)))
                && !really_constant_p (exp)
                && contains_pointers_p (TREE_TYPE (exp))) {
           temp = force_reg (Pmode, temp);
           emit_insn (gen_write_barrier (target, temp));
       }
#endif

And here's the example RTL write-barrier pattern that I added for
StrongARM:

;; Garbage collector write barrier
(define_insn "write_barrier"
   [(parallel [
         (set (match_operand:SI 0 "memory_operand"   "=m")
              (match_operand:SI 1 "register_operand" "r"))
         (unspec_volatile [(match_dup 1)] 16)
         (clobber (reg:SI 12))
         (clobber (reg:SI 14))])]
   ""
   "str%?\\t%1,%0\;bl%?\\t___write_barrier_%1"
[(set_attr "conds" "clob")
  (set_attr "length" "8")])

On StrongARM, I have a separate statically linked function for each
hardware register.  For example, bl __write_barrier_r0 tells the
garbage collector the object contained in register `r0' is live.

Note that this GC is my own conservative GC especially designed
for real-time embedded work.  The StrongARM write barrier code for 
register r0
looks like this:

         .global __write_barrier_r0
__write_barrier_r0:
         ldr     ip,SysBlocktbl
         cmp     r0,ip
         movls   pc,lr
         mov     ip,REG
         swi     #V_write_barrier
         mov     pc,lr

SysBlocktbl is a global memory location which, during GC, points to the
base of the collected heap.  When GC isn't running, SysBlocktbl contains 
ffffffff.
Thus, the write_barrier code itself requires three instructions when GC 
is not
running.  When GC is running, each write to a pointer requires a SWI.  
Obviously
it would be possible to inline these 5 instructions, but on memory 
constrained
embedded devices, I chose to save the memory.  On a system where you 
didn't
mind the extra code size, you could inline this entire sequence to:

	str	%1,%0
	ldr	ip,SysBlocktbl
	cmp	%1,ip
	movhi pc,lr
	movhi ip,%1
	swihi #V_write_barrier

This would execute an extra instruction, but avoid a couple of pipeline 
stalls
so overall would save a few cycles.  I doubt if it would be very 
noticable in most
applications, however, since the write barrier code only gets generated 
when
you write to a pointer.

On embedded devices I use a SWI since the write barrier code which grays 
a particular
object needs to be non -preemptible.  On Linux or other multiuser 
systems you may want
to replace this SWI with a call which locks an appropriate pthread 
mutex.  However
on a single user, multi-threaded embedded device the SWI is alot more 
efficient.

FYI, another processor which we use (now obsolete) is the AMD a29k.  One 
thing
it's really good for is write barriers because it has an ASSERT 
instruction.  The
RTL write barrier pattern for this processor is just a single 
instruction:

;; Garbage collector write barrier
(define_insn "write_barrier"
   [(parallel [
         (set (match_operand:SI 0 "memory_operand"   "=m")
              (match_operand:SI 1 "register_operand" "r"))
         (unspec_volatile [(match_dup 1)] 16)])]
   ""
   "store 0,0,%1,%0\;asleu V_write_barrier,%1,gr95")

The `asleu' instruction traps to the write barrier code whenever the
value of the pointer being written is higher than the heap base.  Because
the A29K has a zillion global registers, I can easily afford to use one 
to
hold the GC's SysBlocktbl pointer.

When GC is not running, `asleu' is a NOP.  Other processors which have
assert instructions could use similar code to implement ultra cheap
write barriers.

BTW, one thing about GC on pthreads implementations which I've found 
truly
evil is the whole business of suspending threads.  SInce pthreads 
doesn't provide
any way to suspend a thread, Boehm GC and I assume everybody else that
does multi-threaded GC sends Unix signals to all the threads.  Worse 
yet, pthreads
doesn't provide any way to find all the threads, so GCs must perform 
their own
bookkeeping to try to keep track of all the running threads.  If a 
thread dies with
GC not knowing about it, GC hangs forever waiting for the thread to 
catch the
signal ;^(

My guess is that putting write barriers in the compiler will be alot 
cheaper than
all the Unix signalling nonsense to suspend threads which is necessary 
when
write barriers aren't part of the compiler.

On Thursday, April 12, 2001, at 10:27  PM, Tom Tromey wrote:

>>>>>> "Hans" == Boehm, Hans <hans_boehm@hp.com> writes:
>
> Hans> (As far as I know, the only way to currently get a write barrier
> Hans> is with virtual memory techniques.  That means generational or
> Hans> incremental collection requires virtual memory, and you only get
> Hans> update information at coarse granularity.  And you're presumably
> Hans> limited to a conservative stack scan.)
>
> That is definitely the current state of affairs.
>
> Adding write barrier support doesn't appear to be hugely difficult.
> Jon Olson implemented it (I haven't seen the code).  Is it worth doing
> this with your collector?
>
> Adding non-conservative stack scanning is much harder.  It has been
> added to gcc before (in gcc 1.something -- ancient history).  It is
> definitely possible, but it touches a lot of the compiler.  There's a
> paper on this by Rick Hudson.  When I say "much harder", my guess (and
> this is really a wild guess) is that it is on the order of a man year.
>
> Tom

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-13  4:52 dewar
  2001-04-13  7:49 ` Jonathan P. Olson
       [not found] ` <200104131451.AAA11537@mulga.cs.mu.OZ.AU>
  0 siblings, 2 replies; 36+ messages in thread
From: dewar @ 2001-04-13  4:52 UTC (permalink / raw)
  To: hans_boehm, tromey; +Cc: gcc, java, jsturm

>Adding non-conservative stack scanning is much harder


A nice term for non-conservative GC that has been introduced by IBM is
"type accurate".

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-12 22:13     ` Tom Tromey
@ 2001-04-12 23:46       ` Jeff Sturm
  0 siblings, 0 replies; 36+ messages in thread
From: Jeff Sturm @ 2001-04-12 23:46 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Boehm, Hans, java, gcc

On 12 Apr 2001, Tom Tromey wrote:
> Jeff> 1) If I have two shared libraries, both compiled with compatible
> Jeff> releases of gcj but targetted to different GC libraries, is
> Jeff> there any hope of loading them both into one process image
> Jeff> (i.e. running two GC systems concurrently)?
> 
> I'm not sure I really understand the question.

I'm trying to understand possible drawbacks of tailoring the frontend for
specific GC libraries.  If gcj generates calls to GC_gcj_malloc etc. _and_
an alternative GC becomes widely used with gcj, there is a chance someday
people will be faced with object libraries that cannot be loaded together.

> Right now I think the answer is `yes', because all the GC calls are
> made through _Jv_ wrappers.  We assume a conservative GC all over the
> place, so in many other ways these two GCs would have to be very
> similar.

Right now, yes.  The scenario I brought up isn't likely today because:

a) the current GC interface is mostly generic
b) only boehm-gc is widely used with gcj
c) nobody is distributing java code in native form only, such that
it cannot be recompiled

There are good reasons to change a) and other collectors already exist.  I
hope that anyone who chooses to distribute native java objects will
include source code (or at least class files) permitting recompilation,
but since the libgcj license permits non-GPL use we cannot count on it.

> Jeff> 2) If someone really wants pluggable GC (along with somewhat
> Jeff> reduced efficiency), would it be practical to define a "generic"
> Jeff> GC interface for this purpose?
> 
> We've pretty much done this already.  The problem is that this is
> slow.

It's a tradeoff.  If speed were all that mattered, nobody would build
shared libraries anyway, due to the overhead of PIC and the PLT, etc.

> Also, suppose some day we were to implement write barriers in the
> compiler.  In this case the write barriers would be compiled into the
> code.  For efficiency these would have to be GC-specific.

I suppose they could also be done through a portable interface at some
cost in runtime efficiency.

I'm not at all opposed to GC-specific interfaces in the frontend.  A
generic interface may still be a worthwhile goal however, and there are
those who might choose it if it were a compile-time option.  (Maybe I'm
stating the obvious, I don't know.)  Someday soon libgcj will be widely
distributed with GNU/Linux distributions, and then maintaining a stable
ABI becomes important.

Jeff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-12 14:30 Boehm, Hans
@ 2001-04-12 22:17 ` Tom Tromey
  2001-04-13  7:37   ` Jonathan P. Olson
  0 siblings, 1 reply; 36+ messages in thread
From: Tom Tromey @ 2001-04-12 22:17 UTC (permalink / raw)
  To: Boehm, Hans; +Cc: 'Jeff Sturm', java, gcc

>>>>> "Hans" == Boehm, Hans <hans_boehm@hp.com> writes:

Hans> (As far as I know, the only way to currently get a write barrier
Hans> is with virtual memory techniques.  That means generational or
Hans> incremental collection requires virtual memory, and you only get
Hans> update information at coarse granularity.  And you're presumably
Hans> limited to a conservative stack scan.)

That is definitely the current state of affairs.

Adding write barrier support doesn't appear to be hugely difficult.
Jon Olson implemented it (I haven't seen the code).  Is it worth doing
this with your collector?

Adding non-conservative stack scanning is much harder.  It has been
added to gcc before (in gcc 1.something -- ancient history).  It is
definitely possible, but it touches a lot of the compiler.  There's a
paper on this by Rick Hudson.  When I say "much harder", my guess (and
this is really a wild guess) is that it is on the order of a man year.

Tom

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-12 11:47   ` Jeff Sturm
@ 2001-04-12 22:13     ` Tom Tromey
  2001-04-12 23:46       ` Jeff Sturm
  0 siblings, 1 reply; 36+ messages in thread
From: Tom Tromey @ 2001-04-12 22:13 UTC (permalink / raw)
  To: Jeff Sturm; +Cc: Boehm, Hans, java, gcc

>>>>> "Jeff" == Jeff Sturm <jsturm@one-point.com> writes:

Jeff> 1) If I have two shared libraries, both compiled with compatible
Jeff> releases of gcj but targetted to different GC libraries, is
Jeff> there any hope of loading them both into one process image
Jeff> (i.e. running two GC systems concurrently)?

I'm not sure I really understand the question.

Right now I think the answer is `yes', because all the GC calls are
made through _Jv_ wrappers.  We assume a conservative GC all over the
place, so in many other ways these two GCs would have to be very
similar.

I'm assuming you mean you want to load both the objects but only have
one GC in the process.  You can't run more than one GC in the process
and still have the libraries share data; that would require a lot of
new work.

Jeff> 2) If someone really wants pluggable GC (along with somewhat
Jeff> reduced efficiency), would it be practical to define a "generic"
Jeff> GC interface for this purpose?

We've pretty much done this already.  The problem is that this is
slow.

Also, suppose some day we were to implement write barriers in the
compiler.  In this case the write barriers would be compiled into the
code.  For efficiency these would have to be GC-specific.

Tom

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-10 18:14 ` Bryce McKinlay
@ 2001-04-12 22:09   ` Tom Tromey
  0 siblings, 0 replies; 36+ messages in thread
From: Tom Tromey @ 2001-04-12 22:09 UTC (permalink / raw)
  To: Bryce McKinlay
  Cc: Boehm, Hans, 'Alexandre Oliva', Jeff Sturm, java, gcc

>>>>> "Bryce" == Bryce McKinlay <bryce@albatross.co.nz> writes:

>> I'm seeing significant overheads as a result of dynamic library
>> calls.  On a PII/300 machine, a (single-threaded) loop containing
>> only free(malloc(8)) runs more than 20% faster when it's linked
>> statically.  This is similar on an Itanium machine.  I'm not sure
>> how significant that is for typical C programs, nor how frequently
>> something like libsupc++ is called.  I believe it is currently very
>> significant for libgcj and calls to the garbage collector library
>> from libgcj.

Bryce> Why don't we link the garbage collector into libgcj?

There's no particular reason right now.  Eventually I think the GC
will be runtime configurable so that different applications can use
the same .so.  The idea then was that it would be used as a pure
shared library.

Still I don't think that is super-compelling for us.  I'd prefer to
have measurements before changing it.

Tom

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-12 14:30 Boehm, Hans
  2001-04-12 22:17 ` Tom Tromey
  0 siblings, 1 reply; 36+ messages in thread
From: Boehm, Hans @ 2001-04-12 14:30 UTC (permalink / raw)
  To: 'Jeff Sturm', Tom Tromey; +Cc: Boehm, Hans, java, gcc

> -----Original Message-----
> From: Jeff Sturm [ mailto:jsturm@one-point.com ]
> 1) If I have two shared libraries, both compiled with 
> compatible releases
> of gcj but targetted to different GC libraries, is there any hope of
> loading them both into one process image (i.e. running two GC systems
> concurrently)?
> 
> I'm thinking the answer is "no", but just thought I'd ask.
I also think the answer is "no".  If you wanted to make the answer "yes",
you'd need a fairly elaborate protocol between the two collectors to handle
pointer chains that cross from the heap manged by one GC to the other.
> 
> 2) If someone really wants pluggable GC (along with somewhat reduced
> efficiency), would it be practical to define a "generic" GC 
> interface for
> this purpose?  That would leave the choice up to the package 
> maintainer.
> 
> Suppose we had two GC libraries available today, named "boehm" and
> "copying".  You could require that these two cannot be mixed 
> freely, but
> either could be mixed with a "generic" which would use whatever GC is
> loaded at runtime.
> 
I would be inclined to make direct calls into the GC a configurable option,
so that you can still go through _Jv_AllocObject if you desire (or probably
if you want to use JVMPI).  I think that gets you this level of
compatibility, assuming both collectors can live with the same generated
code, which I suspect greatly restricts the space of possibilities.  (As far
as I know, the only way to currently get a write barrier is with virtual
memory techniques.  That means generational or incremental collection
requires virtual memory, and you only get update information at coarse
granularity.  And you're presumably limited to a conservative stack scan.)

Hans

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-12  7:59 ` Tom Tromey
@ 2001-04-12 11:47   ` Jeff Sturm
  2001-04-12 22:13     ` Tom Tromey
  0 siblings, 1 reply; 36+ messages in thread
From: Jeff Sturm @ 2001-04-12 11:47 UTC (permalink / raw)
  To: Tom Tromey; +Cc: Boehm, Hans, java, gcc

On 12 Apr 2001, Tom Tromey wrote:
> We've already decided, long ago, that the GC would not be dynamically
> pluggable.  That is, we're allowed to require a recompile if the GC
> implementation changes.

That's probably fine for current gcj users.  But in the real world, few
users actually  recompile packages, and if gcj is successful the ABI
issues will be increasingly important.  So I'm wondering...

1) If I have two shared libraries, both compiled with compatible releases
of gcj but targetted to different GC libraries, is there any hope of
loading them both into one process image (i.e. running two GC systems
concurrently)?

I'm thinking the answer is "no", but just thought I'd ask.

2) If someone really wants pluggable GC (along with somewhat reduced
efficiency), would it be practical to define a "generic" GC interface for
this purpose?  That would leave the choice up to the package maintainer.

Suppose we had two GC libraries available today, named "boehm" and
"copying".  You could require that these two cannot be mixed freely, but
either could be mixed with a "generic" which would use whatever GC is
loaded at runtime.


Jeff

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-11 16:46 ` Corey Minyard
@ 2001-04-12  8:01   ` Tom Tromey
  0 siblings, 0 replies; 36+ messages in thread
From: Tom Tromey @ 2001-04-12  8:01 UTC (permalink / raw)
  To: minyard
  Cc: Boehm, Hans, 'Bryce McKinlay', 'Alexandre Oliva',
	Jeff Sturm, java, gcc

>>>>> "Corey" == Corey Minyard <minyard@acm.org> writes:

Corey> Remember that at least two others use different GCs with GCJ.
Corey> What might be nice would be an interface that any GC could implement
Corey> that was divorced from any particular GC.  Something like POSIX (but
Corey> hopefully a better design that POSIX).

We tried to do this with the _Jv_ functions.  I guess one could argue
that the Boehm GC could simply rename the functions it exports.

Passing the name of the allocation function to gcj via libgcj.spec
seems like a reasonable approach to me, though.  That way each GC gets
to keep its exported namespace relatively clean, but we don't really
lose much in terms of maintainability.

Tom

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-11 12:16 Boehm, Hans
  2001-04-11 16:46 ` Corey Minyard
  2001-04-11 21:46 ` Bryce McKinlay
@ 2001-04-12  7:59 ` Tom Tromey
  2001-04-12 11:47   ` Jeff Sturm
  2 siblings, 1 reply; 36+ messages in thread
From: Tom Tromey @ 2001-04-12  7:59 UTC (permalink / raw)
  To: Boehm, Hans
  Cc: 'Bryce McKinlay', 'Alexandre Oliva',
	Jeff Sturm, java, gcc

>>>>> "Hans" == Boehm, Hans <hans_boehm@hp.com> writes:

Hans> I'd like to move in a slightly different direction.  I'd like to
Hans> see gcc generate code that calls GC_gcj_malloc directly in most
Hans> cases, at least for non-embedded applications.

This won't be a problem.

Hans> 2) Checks if a finalizer needs to be registered, and does so if
Hans> necessary.  This is a very expensive (5 + memory references?)
Hans> test for something that I believe is completely statically
Hans> known.

I agree it is statically known.

Hans> A disadvantage of this change is that it would create a direct
Hans> dependency from gcj generated code to the GC library.

This isn't actually as big a problem as you might think.

We've already decided, long ago, that the GC would not be dynamically
pluggable.  That is, we're allowed to require a recompile if the GC
implementation changes.

What we can do here is make the GC function names a command-line
option to gcj.  Then we drop the appropriate option into libgcj.spec.
That lets us keep all the GC-related configury code close together in
libgcj, without having it spill over into gcj itself.

Tom

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-11 21:46 ` Bryce McKinlay
@ 2001-04-12  7:54   ` Tom Tromey
  0 siblings, 0 replies; 36+ messages in thread
From: Tom Tromey @ 2001-04-12  7:54 UTC (permalink / raw)
  To: Bryce McKinlay
  Cc: Boehm, Hans, 'Alexandre Oliva', Jeff Sturm, java, gcc

>>>>> "Bryce" == Bryce McKinlay <bryce@albatross.co.nz> writes:

Bryce> I was not aware that the JVMPI stuff is being compiled in. I
Bryce> agree: it should be turned off unless "--enable-libgcj-profile"
Bryce> or whatever is given to configure.

I definitely agree.  This should go on the branch too.  The JVMPI code
isn't complete and as far as I know nobody has ever used it.  Do you
want to write the patch or should I?

Tom

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-10 16:23 Boehm, Hans
@ 2001-04-12  5:42 ` Alexandre Oliva
  0 siblings, 0 replies; 36+ messages in thread
From: Alexandre Oliva @ 2001-04-12  5:42 UTC (permalink / raw)
  To: Boehm, Hans; +Cc: tromey, Jeff Sturm, java, gcc

On Apr 10, 2001, "Boehm, Hans" <hans_boehm@hp.com> wrote:

> Even if you unconditionally compile the library as PIC, there's
> still the issue of whether the client can directly branch it, or
> branch to an indirect branch to the routine (X86) or branch to a
> more complicated trampoline (IA64).

Indeed.  It's not just an issue of having one register put aside to
point to the GOT.  Going through the PLT can indeed get quite
more expensive than a direct call.

-- 
Alexandre Oliva   Enjoy Guarana', see http://www.ic.unicamp.br/~oliva/
Red Hat GCC Developer                  aoliva@{cygnus.com, redhat.com}
CS PhD student at IC-Unicamp        oliva@{lsd.ic.unicamp.br, gnu.org}
Free Software Evangelist    *Please* write to mailing lists, not to me

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-11 12:16 Boehm, Hans
  2001-04-11 16:46 ` Corey Minyard
@ 2001-04-11 21:46 ` Bryce McKinlay
  2001-04-12  7:54   ` Tom Tromey
  2001-04-12  7:59 ` Tom Tromey
  2 siblings, 1 reply; 36+ messages in thread
From: Bryce McKinlay @ 2001-04-11 21:46 UTC (permalink / raw)
  To: Boehm, Hans; +Cc: 'Alexandre Oliva', tromey, Jeff Sturm, java, gcc

"Boehm, Hans" wrote:

> 3) If object allocation notifications are requested, it does some messy
> JVMPI allocation profiling stuff.  I think this is currently compiled in by
> default?  I don't understand the full history or current status here, but
> I'd be inclined to turn it off by default.

I was not aware that the JVMPI stuff is being compiled in. I agree: it should
be turned off unless "--enable-libgcj-profile" or whatever is given to
configure.

In any case, this (not having _Jv_AllocObject(), at least for common cases) is
is definatly where we want to go. Actually, I think I've mentioned this before
on one of my GCJ wishlist rants ;-)

regards

  [ bryce ]


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-11 12:16 Boehm, Hans
@ 2001-04-11 16:46 ` Corey Minyard
  2001-04-12  8:01   ` Tom Tromey
  2001-04-11 21:46 ` Bryce McKinlay
  2001-04-12  7:59 ` Tom Tromey
  2 siblings, 1 reply; 36+ messages in thread
From: Corey Minyard @ 2001-04-11 16:46 UTC (permalink / raw)
  To: Boehm, Hans
  Cc: 'Bryce McKinlay', 'Alexandre Oliva',
	tromey, Jeff Sturm, java, gcc

Remember that at least two others use different GCs with GCJ.

What might be nice would be an interface that any GC could implement
that was divorced from any particular GC.  Something like POSIX (but
hopefully a better design that POSIX).

-Corey

"Boehm, Hans" <hans_boehm@hp.com> writes:

> I'd like to move in a slightly different direction.  I'd like to see gcc
> generate code that calls GC_gcj_malloc directly in most cases, at least for
> non-embedded applications.
> 
> In 3.0,the allocation path is _Jv_AllocObject (prims.cc) -> _Jv_AllocObj
> (boehm.cc) -> GC_gcj_malloc (GC library).
> 
> In my version _Jv_AllocObj is simply an inlined call to GC_gcj_malloc, so
> this path is a little shorter, but still too long.
> 
> _Jv_allocObject currently does the following:
> 
> 4) It calls _Jv_AllocObj (which does nothing but call GC_gcj_malloc).
> 
> A disadvantage of this change is that it would create a direct dependency
> from gcj generated code to the GC library.  But I suspect that's already
> implicitly there, since another GC library would probably need a compiler
> write-barrier, etc.
> 
> I'm trying to generalize the GC library, so that a single binary could be
> used for Java, C, ObjC, etc.  So I would like to avoid replicating it in
> libgcj.  It may still  make sense to replicate the top level routines, if
> there's a clean way to do that.
> 
> Hans
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-11 12:16 Boehm, Hans
  2001-04-11 16:46 ` Corey Minyard
                   ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Boehm, Hans @ 2001-04-11 12:16 UTC (permalink / raw)
  To: 'Bryce McKinlay', Boehm, Hans
  Cc: 'Alexandre Oliva', tromey, Jeff Sturm, java, gcc

I'd like to move in a slightly different direction.  I'd like to see gcc
generate code that calls GC_gcj_malloc directly in most cases, at least for
non-embedded applications.

In 3.0,the allocation path is _Jv_AllocObject (prims.cc) -> _Jv_AllocObj
(boehm.cc) -> GC_gcj_malloc (GC library).

In my version _Jv_AllocObj is simply an inlined call to GC_gcj_malloc, so
this path is a little shorter, but still too long.

_Jv_allocObject currently does the following:

1) Calls _Jv_InitClass.  This normally involves an inline test of a global.
It seems to me the compiler should be doing this (perhaps even in the
embedded case?) since it can often (usually?) be optimized out.  Much of the
overhead is in locating the global, and that could be moved out of loops,
etc.

2) Checks if a finalizer needs to be registered, and does so if necessary.
This is a very expensive  (5 + memory references?) test for something that I
believe is completely statically known.  The test should definitely be done
by the compiler.  (It can still call through libgcj if a finalizer needs to
be registered.  That's currently fairly expensive anyway, and it's
relatively infrequent.)

3) If object allocation notifications are requested, it does some messy
JVMPI allocation profiling stuff.  I think this is currently compiled in by
default?  I don't understand the full history or current status here, but
I'd be inclined to turn it off by default.  (As far as I undertand it, it's
advantage is that it's semistandard.  It's disadvantages are that neither
the spec nor the implementation look quite right (the stuff about turning
off GC looks wrong to me), it looks very intrusive for a profiling
mechanism, and I think there are usually better mechanisms for answering the
questions I would really want answered.)

4) It calls _Jv_AllocObj (which does nothing but call GC_gcj_malloc).

A disadvantage of this change is that it would create a direct dependency
from gcj generated code to the GC library.  But I suspect that's already
implicitly there, since another GC library would probably need a compiler
write-barrier, etc.

I'm trying to generalize the GC library, so that a single binary could be
used for Java, C, ObjC, etc.  So I would like to avoid replicating it in
libgcj.  It may still  make sense to replicate the top level routines, if
there's a clean way to do that.

Hans

> -----Original Message-----
> From: Bryce McKinlay [ mailto:bryce@albatross.co.nz ]
> Sent: Tuesday, April 10, 2001 6:16 PM
> To: Boehm, Hans
> Cc: 'Alexandre Oliva'; tromey@redhat.com; Jeff Sturm; 
> java@gcc.gnu.org;
> gcc@gcc.gnu.org
> Subject: Re: dynamic library cost (was RE: libtool, java woes)
> 
> 
> "Boehm, Hans" wrote:
> 
> > I'm seeing significant overheads as a result of dynamic 
> library calls.  On a
> > PII/300 machine, a (single-threaded) loop containing only 
> free(malloc(8))
> > runs more than 20% faster when it's linked statically.  
> This is similar on
> > an Itanium machine.  I'm not sure how significant that is 
> for typical C
> > programs, nor how frequently something like libsupc++ is 
> called.  I believe
> > it is currently very significant for libgcj and calls to the garbage
> > collector library from libgcj.
> 
> Why don't we link the garbage collector into libgcj?
> 
> regards
> 
>   [ bryce ]
> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-10  9:15 Boehm, Hans
                   ` (2 preceding siblings ...)
  2001-04-10 15:00 ` Jakub Jelinek
@ 2001-04-10 18:14 ` Bryce McKinlay
  2001-04-12 22:09   ` Tom Tromey
  3 siblings, 1 reply; 36+ messages in thread
From: Bryce McKinlay @ 2001-04-10 18:14 UTC (permalink / raw)
  To: Boehm, Hans; +Cc: 'Alexandre Oliva', tromey, Jeff Sturm, java, gcc

"Boehm, Hans" wrote:

> I'm seeing significant overheads as a result of dynamic library calls.  On a
> PII/300 machine, a (single-threaded) loop containing only free(malloc(8))
> runs more than 20% faster when it's linked statically.  This is similar on
> an Itanium machine.  I'm not sure how significant that is for typical C
> programs, nor how frequently something like libsupc++ is called.  I believe
> it is currently very significant for libgcj and calls to the garbage
> collector library from libgcj.

Why don't we link the garbage collector into libgcj?

regards

  [ bryce ]


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-10 14:43 ` Joern Rennecke
@ 2001-04-10 17:50   ` Alexandre Oliva
  0 siblings, 0 replies; 36+ messages in thread
From: Alexandre Oliva @ 2001-04-10 17:50 UTC (permalink / raw)
  To: Joern Rennecke; +Cc: Boehm, Hans, tromey, Jeff Sturm, java, gcc

On Apr 10, 2001, Joern Rennecke <amylaar@redhat.com> wrote:

> Well, we could also say that on AIX, libsupc++.so is a symlink to
> libstdc++.so .

Make that s/\.so/\.a/g, since it's AIX.

-- 
Alexandre Oliva   Enjoy Guarana', see http://www.ic.unicamp.br/~oliva/
Red Hat GCC Developer                  aoliva@{cygnus.com, redhat.com}
CS PhD student at IC-Unicamp        oliva@{lsd.ic.unicamp.br, gnu.org}
Free Software Evangelist    *Please* write to mailing lists, not to me

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: dynamic library cost (was RE: libtool, java woes)
@ 2001-04-10 16:23 Boehm, Hans
  2001-04-12  5:42 ` Alexandre Oliva
  0 siblings, 1 reply; 36+ messages in thread
From: Boehm, Hans @ 2001-04-10 16:23 UTC (permalink / raw)
  To: 'Alexandre Oliva', Boehm, Hans; +Cc: tromey, Jeff Sturm, java, gcc

Isn't PIC code in the library only part of the story?  Even if you
unconditionally compile the library as PIC, there's still the issue of
whether the client can directly branch it, or branch to an indirect branch
to the routine (X86) or branch to a more complicated trampoline (IA64).  On
IA64, much of the time seems to go into the trampoline code, though there
seems to be potential for optimization there.  The code is essentially
position independent in all cases.

I'm not sure how well typical X86 hardware handles the
branch-to-indirect-branch sequence.

Hans

> -----Original Message-----
> From: Alexandre Oliva [ mailto:aoliva@redhat.com ]
> Sent: Tuesday, April 10, 2001 1:22 PM
> To: Boehm, Hans
> Cc: tromey@redhat.com; Jeff Sturm; java@gcc.gnu.org; gcc@gcc.gnu.org
> Subject: Re: dynamic library cost (was RE: libtool, java woes)
> 
> 
> On Apr 10, 2001, "Boehm, Hans" <hans_boehm@hp.com> wrote:
> 
> >> From: Alexandre Oliva [ mailto:aoliva@redhat.com ]
> 
> >> We either need libsupc++ to be a shared library, that both 
> libstdc++
> >> and libgcj depend on, or have libgcj depend on libstdc++.
> 
> > I'm seeing significant overheads as a result of dynamic library
> > calls.
> 
> In this particular case, it would make no difference at all: libsupc++
> is already compiled as PIC, even though it's a static library.
> 
> -- 
> Alexandre Oliva   Enjoy Guarana', see http://www.ic.unicamp.br/~oliva/
> Red Hat GCC Developer                  aoliva@{cygnus.com, redhat.com}
> CS PhD student at IC-Unicamp        oliva@{lsd.ic.unicamp.br, gnu.org}
> Free Software Evangelist    *Please* write to mailing lists, not to me
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-10  9:15 Boehm, Hans
  2001-04-10 13:22 ` Alexandre Oliva
  2001-04-10 14:43 ` Joern Rennecke
@ 2001-04-10 15:00 ` Jakub Jelinek
  2001-04-10 18:14 ` Bryce McKinlay
  3 siblings, 0 replies; 36+ messages in thread
From: Jakub Jelinek @ 2001-04-10 15:00 UTC (permalink / raw)
  To: Boehm, Hans; +Cc: 'Alexandre Oliva', tromey, Jeff Sturm, java, gcc

On Tue, Apr 10, 2001 at 09:15:40AM -0700, Boehm, Hans wrote:
> > From: Alexandre Oliva [ mailto:aoliva@redhat.com ]
> >
> > We either need libsupc++ to be a shared library, that both libstdc++
> > and libgcj depend on, or have libgcj depend on libstdc++.
> > 
> One thing to keep in mind with all of this:
> 
> I'm seeing significant overheads as a result of dynamic library calls.  On a
> PII/300 machine, a (single-threaded) loop containing only free(malloc(8))
> runs more than 20% faster when it's linked statically.  This is similar on
> an Itanium machine.  I'm not sure how significant that is for typical C
> programs, nor how frequently something like libsupc++ is called.  I believe
> it is currently very significant for libgcj and calls to the garbage
> collector library from libgcj.

Not to mention the dynamic linking overhead at startup (or dlopen) time
which every additional shared library adds.

	Jakub

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-10  9:15 Boehm, Hans
  2001-04-10 13:22 ` Alexandre Oliva
@ 2001-04-10 14:43 ` Joern Rennecke
  2001-04-10 17:50   ` Alexandre Oliva
  2001-04-10 15:00 ` Jakub Jelinek
  2001-04-10 18:14 ` Bryce McKinlay
  3 siblings, 1 reply; 36+ messages in thread
From: Joern Rennecke @ 2001-04-10 14:43 UTC (permalink / raw)
  To: Boehm, Hans; +Cc: 'Alexandre Oliva', tromey, Jeff Sturm, java, gcc

> > From: Alexandre Oliva [ mailto:aoliva@redhat.com ]
> >
> > We either need libsupc++ to be a shared library, that both libstdc++
> > and libgcj depend on, or have libgcj depend on libstdc++.
> > 
> One thing to keep in mind with all of this:
> 
> I'm seeing significant overheads as a result of dynamic library calls.  On a
> PII/300 machine, a (single-threaded) loop containing only free(malloc(8))
> runs more than 20% faster when it's linked statically.  This is similar on
> an Itanium machine.  I'm not sure how significant that is for typical C
> programs, nor how frequently something like libsupc++ is called.  I believe
> it is currently very significant for libgcj and calls to the garbage
> collector library from libgcj.

Well, we could also say that on AIX, libsupc++.so is a symlink to
libstdc++.so .

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: dynamic library cost (was RE: libtool, java woes)
  2001-04-10  9:15 Boehm, Hans
@ 2001-04-10 13:22 ` Alexandre Oliva
  2001-04-10 14:43 ` Joern Rennecke
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 36+ messages in thread
From: Alexandre Oliva @ 2001-04-10 13:22 UTC (permalink / raw)
  To: Boehm, Hans; +Cc: tromey, Jeff Sturm, java, gcc

On Apr 10, 2001, "Boehm, Hans" <hans_boehm@hp.com> wrote:

>> From: Alexandre Oliva [ mailto:aoliva@redhat.com ]

>> We either need libsupc++ to be a shared library, that both libstdc++
>> and libgcj depend on, or have libgcj depend on libstdc++.

> I'm seeing significant overheads as a result of dynamic library
> calls.

In this particular case, it would make no difference at all: libsupc++
is already compiled as PIC, even though it's a static library.

-- 
Alexandre Oliva   Enjoy Guarana', see http://www.ic.unicamp.br/~oliva/
Red Hat GCC Developer                  aoliva@{cygnus.com, redhat.com}
CS PhD student at IC-Unicamp        oliva@{lsd.ic.unicamp.br, gnu.org}
Free Software Evangelist    *Please* write to mailing lists, not to me

^ permalink raw reply	[flat|nested] 36+ messages in thread

* dynamic library cost (was RE: libtool, java woes)
@ 2001-04-10  9:15 Boehm, Hans
  2001-04-10 13:22 ` Alexandre Oliva
                   ` (3 more replies)
  0 siblings, 4 replies; 36+ messages in thread
From: Boehm, Hans @ 2001-04-10  9:15 UTC (permalink / raw)
  To: 'Alexandre Oliva', tromey; +Cc: Jeff Sturm, java, gcc

> From: Alexandre Oliva [ mailto:aoliva@redhat.com ]
>
> We either need libsupc++ to be a shared library, that both libstdc++
> and libgcj depend on, or have libgcj depend on libstdc++.
> 
One thing to keep in mind with all of this:

I'm seeing significant overheads as a result of dynamic library calls.  On a
PII/300 machine, a (single-threaded) loop containing only free(malloc(8))
runs more than 20% faster when it's linked statically.  This is similar on
an Itanium machine.  I'm not sure how significant that is for typical C
programs, nor how frequently something like libsupc++ is called.  I believe
it is currently very significant for libgcj and calls to the garbage
collector library from libgcj.

I doubt there are any easy solutions, but we should be careful not to make
this worse.  (Partial solutions might include sometimes expanding the
dynamic library call code inline on Itanium, having gcj generate allocation
code that often directly calls the GC innstead of passing through libgcj,
perhaps somewhow replicating a few frequently called functions like the top
level of the malloc implementation, perhaps dynamically replacing the code
at a few frequently used call sites?)

Hans

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2001-04-14  9:24 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-04-14  9:24 dynamic library cost (was RE: libtool, java woes) dewar
  -- strict thread matches above, loose matches on Subject: below --
2001-04-13 13:06 Boehm, Hans
2001-04-13 12:31 Boehm, Hans
2001-04-13 12:04 dewar
2001-04-13 12:03 dewar
     [not found] <200104131451.f3DEpoV06885@mail.redhat.com>
2001-04-13  8:59 ` Tom Tromey
2001-04-13  9:07   ` Jonathan P. Olson
2001-04-14  8:07   ` Fergus Henderson
     [not found] <auto-000000600681@crewsoft.com>
2001-04-13  8:44 ` Cedric Berger
2001-04-13  8:52   ` Jonathan P. Olson
2001-04-13  8:09 dewar
2001-04-13  8:32 ` Jonathan P. Olson
2001-04-13  4:52 dewar
2001-04-13  7:49 ` Jonathan P. Olson
     [not found] ` <200104131451.AAA11537@mulga.cs.mu.OZ.AU>
2001-04-14  7:50   ` Fergus Henderson
2001-04-12 14:30 Boehm, Hans
2001-04-12 22:17 ` Tom Tromey
2001-04-13  7:37   ` Jonathan P. Olson
2001-04-11 12:16 Boehm, Hans
2001-04-11 16:46 ` Corey Minyard
2001-04-12  8:01   ` Tom Tromey
2001-04-11 21:46 ` Bryce McKinlay
2001-04-12  7:54   ` Tom Tromey
2001-04-12  7:59 ` Tom Tromey
2001-04-12 11:47   ` Jeff Sturm
2001-04-12 22:13     ` Tom Tromey
2001-04-12 23:46       ` Jeff Sturm
2001-04-10 16:23 Boehm, Hans
2001-04-12  5:42 ` Alexandre Oliva
2001-04-10  9:15 Boehm, Hans
2001-04-10 13:22 ` Alexandre Oliva
2001-04-10 14:43 ` Joern Rennecke
2001-04-10 17:50   ` Alexandre Oliva
2001-04-10 15:00 ` Jakub Jelinek
2001-04-10 18:14 ` Bryce McKinlay
2001-04-12 22:09   ` Tom Tromey

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).