public inbox for gcc@gcc.gnu.org
* Re: Minimal GCC/Linux shared lib + EH bug example
       [not found] <09b501c1f634$04747d80$6501a8c0@boostconsulting.com>
@ 2002-05-12  4:57 ` Jason Merrill
  2002-05-12  6:42   ` David Abrahams
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Merrill @ 2002-05-12  4:57 UTC (permalink / raw)
  To: David Abrahams; +Cc: python-dev, Ralf W. Grosse-Kunstleve, gcc

>>>>> "David" == David Abrahams <david.abrahams@rcn.com> writes:

> FYI, Ralf Grosse-Kunstleve has reduced the exception-handling problem
> mentioned here
> http://mail.python.org/pipermail/c++-sig/2002-May/001021.html to a minimal
> example:

>     http://cci.lbl.gov/~rwgk/tmp/gcc_dl_eh.tar.gz

> gunzip -c gcc_dl_eh.tar.gz | tar xvf -
> cd gcc_dl_eh
> more 0README

> The problem here is clearly a GCC/Linux interaction problem, *not* a Python
> bug. However, it does have an impact on anyone writing Python extension
> modules with g++ on Linux.

IMO, it is unreasonable to expect C++ to work with RTLD_LOCAL unless the
object so loaded is indeed self-contained (which precludes linking against
a common shared library, as in this case).  Too many aspects of the
language depend on being able to merge duplicates coming from different
sources.  In this case, the problem comes from std::type_info; the runtime
library expects to be able to compare type_info nodes by pointer
equivalence.  Templates and static variables in inline functions would also
have trouble.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-12  4:57 ` Minimal GCC/Linux shared lib + EH bug example Jason Merrill
@ 2002-05-12  6:42   ` David Abrahams
  2002-05-12  7:30     ` Jason Merrill
  2002-05-12  8:17     ` Martin v. Loewis
  0 siblings, 2 replies; 104+ messages in thread
From: David Abrahams @ 2002-05-12  6:42 UTC (permalink / raw)
  To: Jason Merrill; +Cc: python-dev, Ralf W. Grosse-Kunstleve, gcc, Martin v. Loewis


----- Original Message -----
From: "Jason Merrill" <jason@redhat.com>

> IMO, it is unreasonable to expect C++ to work with RTLD_LOCAL unless the
> object so loaded is indeed self-contained (which precludes linking against
> a common shared library, as in this case).  Too many aspects of the
> language depend on being able to merge duplicates coming from different
> sources.

I think there's an implicit assumption in your statement which should be
brought into the open: that there's an agreed-upon idea of what it means
for C++ to "work" with shared libraries. As you know, the language standard
doesn't define how shared libs are supposed to work, and people have
different mental models. For example, on Windows, imports and exports are
explicitly declared. Nobody expects to share static variables in inline
functions across DLLs unless the function is explicitly exported. However,
exception-handling and RTTI /are/ expected to work.

> In this case, the problem comes from std::type_info; the runtime
> library expects to be able to compare type_info nodes by pointer
> equivalence.  Templates and static variables in inline functions would also
> have trouble.

As I understand (guess) it, what happens is this:

1. lib1.so is loaded with RTLD_LOCAL. All of its symbols go into a new
"symbol space"
2. the loader notices the dependency on X.so, and loads any /new/ symbols
from the shared lib X.so into the same space, eliminating duplicates.
3. Once all dependent libs are loaded, any unresolved symbols in
lib1.so and X.so are resolved; if any fail to resolve or there are
duplicates, a runtime error results.
-----
4. lib2.so is loaded with RTLD_LOCAL. Because it's RTLD_LOCAL, the loader
again creates a new "symbol space"; no duplicates are shared with X.so.
5. The loader notices the dependency on X.so, but X.so is already loaded
6. Any unresolved symbols in lib2.so (and X.so, though there are none) are
resolved now
-----
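
The six steps above can be pictured with a small toy model. This is only a sketch of the behaviour guessed at in the list, not the real dynamic loader's algorithm; `Library`, `Bindings`, `load_local` and `simulate` are hypothetical names introduced for illustration, and it assumes (as reported elsewhere in the thread) that lib1.so, lib2.so and X.so each carry their own copy of the typeinfo for worker_error.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model: a "library" is a name plus the symbols it contains a copy of.
struct Library {
    std::string name;
    std::set<std::string> defines;
};

// bound[lib][symbol] = name of the library whose copy `lib` ends up using.
using Bindings = std::map<std::string, std::map<std::string, std::string>>;

// Load the libraries in `scope` (in order) into one fresh "symbol space".
// A library bound by an earlier load keeps its old bindings (step 5).
void load_local(const std::vector<Library>& scope, Bindings& bound) {
    assert(!scope.empty());
    for (const Library& user : scope) {
        if (bound.count(user.name)) continue;  // already loaded and resolved
        for (const Library& candidate : scope)
            for (const std::string& sym : candidate.defines)
                // emplace never overwrites, so the first definition wins.
                bound[user.name].emplace(sym, candidate.name);
    }
}

// Replays steps 1-3 and then steps 4-6.
Bindings simulate() {
    const std::string ti = "typeinfo for worker_error";
    Library lib1{"lib1.so", {ti}}, lib2{"lib2.so", {ti}}, x{"X.so", {ti}};
    Bindings bound;
    load_local({lib1, x}, bound);  // lib1.so first, then its dependency X.so
    load_local({lib2, x}, bound);  // new space; X.so is already bound
    return bound;
}
```

In this model X.so stays bound to lib1.so's copy while lib2.so uses its own, which is the mismatch a pointer-based comparison would trip over.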

What I'd prefer to happen is that in step 4, the loader would use the
existing definition for any loaded symbol which is defined in or used by
lib2's immediate dependencies. That would nicely model the concept that
lib2.so is sharing globally with X.so but not with lib1.so, and it seems
like the "right" solution.

However, for my application I'd be content if EH was just comparing the
type_info::name() strings, as Martin von Loewis stated was the case in
2.95.x and again in 3.1:
http://aspn.activestate.com/ASPN/Mail/Message/1191899 [This statement
appears to be contradicted empirically, though: Ralf reports similar
problems with GCC 3.1 - GNATS id 6629]. This would bring GCC sufficiently
close to the model of Windows compilers (and of compilers on the other *NIX
OSes he's tested on) to allow practical cross-platform authoring of plugins
in C++.

-Dave


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-12  6:42   ` David Abrahams
@ 2002-05-12  7:30     ` Jason Merrill
  2002-05-12  7:31       ` David Abrahams
                         ` (2 more replies)
  2002-05-12  8:17     ` Martin v. Loewis
  1 sibling, 3 replies; 104+ messages in thread
From: Jason Merrill @ 2002-05-12  7:30 UTC (permalink / raw)
  To: David Abrahams
  Cc: python-dev, Ralf W. Grosse-Kunstleve, gcc, Martin v. Loewis

>>>>> "David" == David Abrahams <david.abrahams@rcn.com> writes:

> I think there's an implicit assumption in your statement which should be
> brought into the open: that there's an agreed-upon idea of what it means
> for C++ to "work" with shared libraries. As you know, the language standard
> doesn't define how shared libs are supposed to work, and people have
> different mental models. For example, on Windows, imports and exports are
> explicitly declared. Nobody expects to share static variables in inline
> functions across DLLs unless the function is explicitly exported. However,
> exception-handling and RTTI /are/ expected to work.

And on Windows, we don't rely on address equivalence.

> What I'd prefer to happen is that in step 4, the loader would use the
> existing definition for any loaded symbol which is defined in or used by
> lib2's immediate dependencies. That would nicely model the concept that
> lib2.so is sharing globally with X.so but not with lib1.so, and it seems
> like the "right" solution.

I noticed that the readme says that the test passes on Solaris.  Does it
provide these semantics?  How about SCO?  Anyone?

> However, for my application I'd be content if EH was just comparing the
> type_info::name() strings, as Martin von Loewis stated was the case in
> 2.95.x and again in 3.1:
> http://aspn.activestate.com/ASPN/Mail/Message/1191899 [This statement
> appears to be contradicted empirically, though: Ralf reports similar
> problems with GCC 3.1 - GNATS id 6629].

Yes, 3.1 still relies on pointer comparison.

I find this testcase somewhat persuasive, as the offending dlopen call is
not in the C++ code.  What do others think?

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-12  7:30     ` Jason Merrill
@ 2002-05-12  7:31       ` David Abrahams
  2002-05-12  8:07         ` Jason Merrill
  2002-05-12  9:31       ` Martin v. Loewis
  2002-05-12 12:17       ` Mark Mitchell
  2 siblings, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-12  7:31 UTC (permalink / raw)
  To: Jason Merrill
  Cc: python-dev, Ralf W. Grosse-Kunstleve, gcc, Martin v. Loewis, c++std-ext

For the c++std-ext recipients, just added: previous messages in this
thread, which concerns the semantics of shared libraries w.r.t.
exception-handling and other C++ features which require "vague linkage",
can be found here:
http://gcc.gnu.org/ml/gcc/2002-05/msg00866.html
http://gcc.gnu.org/ml/gcc/2002-05/msg00869.html
http://gcc.gnu.org/ml/gcc/2002-05/msg00873.html

----- Original Message -----
From: "Jason Merrill" <jason@redhat.com>

> > What I'd prefer to happen is that in step 4, the loader would use the
> > existing definition for any loaded symbol which is defined in or used by
> > lib2's immediate dependencies. That would nicely model the concept that
> > lib2.so is sharing globally with X.so but not with lib1.so, and it seems
> > like the "right" solution.
>
> I noticed that the readme says that the test passes on Solaris.  Does it
> provide these semantics?  How about SCO?  Anyone?

The test as written doesn't really tell us the answer since it uses EH and
any implementation can make it a non-issue by comparing type_info::name()
strings instead of addresses. The test could easily be modified so that it
looks at the address of a class template's static data member, of course.
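
A minimal sketch of what such a modified probe might look like, assuming each shared library would export its own copy of a small function like this. `Probe`, `probe_address` and `expect_shared` are hypothetical names, not part of Ralf's test case.

```cpp
#include <cassert>

// One static data member per instantiation; the language requires all
// uses of Probe<T>::marker within a program to name the same object.
template <class T>
struct Probe {
    static int marker;
};

template <class T>
int Probe<T>::marker = 0;

// In a real test each library (lib1.so, lib2.so, X.so) would export a
// function like this so the driver can compare the addresses they see.
// extern "C" keeps the name unmangled for lookup with dlsym.
extern "C" const void* probe_address() {
    return &Probe<int>::marker;
}

// Aborts if two modules report different copies of the same member.
void expect_shared(const void* a, const void* b) {
    assert(a == b);
}
```

Within a single program the addresses always agree; the interesting case is comparing the values returned by copies of `probe_address` living in separately loaded libraries, which no exception-handling shortcut can paper over.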

> > However, for my application I'd be content if EH was just comparing the
> > type_info::name() strings, as Martin von Loewis stated was the case in
> > 2.95.x and again in 3.1:
> > http://aspn.activestate.com/ASPN/Mail/Message/1191899 [This statement
> > appears to be contradicted empirically, though: Ralf reports similar
> > problems with GCC 3.1 - GNATS id 6629].
>
> Yes, 3.1 still relies on pointer comparison.
>
> I find this testcase somewhat persuasive, as the offending dlopen call is
> not in the C++ code.  What do others think?

I guess you /know/ what I think: I just want it to work ;-)

I'd also like to point out what I think is a fundamental difference in the
Windows and Linux models for shared libraries: in the Windows model,
sharing has a "direction" but in the Linux model the direction is
determined by who got there first. So, for example, in the Windows model
when lib1.dll and lib2.dll are loaded there are certain symbols which come
in from their dependency, X.dll, because they're explicitly imported. It's
the fact that the libraries explicitly "pull" symbols from their
dependencies that makes the model work. In the Linux model, it appears that
lib1.so and lib2.so essentially load everything they've got and "push"
their symbols onto X.so. In the case we're looking at, the dependency may
already be loaded so the push might leave us with two copies of some
symbols in the lib<N>.so/X.so relationship.

-Dave


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-12  7:31       ` David Abrahams
@ 2002-05-12  8:07         ` Jason Merrill
  2002-05-12  9:24           ` David Abrahams
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Merrill @ 2002-05-12  8:07 UTC (permalink / raw)
  To: David Abrahams
  Cc: python-dev, Ralf W. Grosse-Kunstleve, gcc, Martin v. Loewis, c++std-ext

>>>>> "David" == David Abrahams <david.abrahams@rcn.com> writes:

>> I noticed that the readme says that the test passes on Solaris.  Does it
>> provide these semantics?  How about SCO?  Anyone?

> The test as written doesn't really tell us the answer since it uses EH and
> any implementation can make it a non-issue by comparing type_info::name()
> strings instead of addresses.

I meant using gcc 3.0.4 on Solaris.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-12  6:42   ` David Abrahams
  2002-05-12  7:30     ` Jason Merrill
@ 2002-05-12  8:17     ` Martin v. Loewis
  1 sibling, 0 replies; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-12  8:17 UTC (permalink / raw)
  To: David Abrahams; +Cc: Jason Merrill, Ralf W. Grosse-Kunstleve, gcc

"David Abrahams" <david.abrahams@rcn.com> writes:

> However, for my application I'd be content if EH was just comparing the
> type_info::name() strings, as Martin von Loewis stated was the case in
> 2.95.x and again in 3.1:

I misinterpreted the code. It reads

#if !__GXX_MERGED_TYPEINFO_NAMES

// We can't rely on common symbols being shared between shared objects.
bool std::type_info::
operator== (const std::type_info& arg) const
{
  return (&arg == this) || (__builtin_strcmp (name (), arg.name ()) == 0);
}

#endif

What I missed is that this implementation was conditional, with the
condition being

#if !__GXX_WEAK__
  // If weak symbols are not supported, typeinfo names are not merged.
  #define __GXX_MERGED_TYPEINFO_NAMES 0
#else
  // On platforms that support weak symbols, typeinfo names are merged.
  #define __GXX_MERGED_TYPEINFO_NAMES 1
#endif

So on platforms with weak symbol support, operator== is implemented as

    bool operator==(const type_info& __arg) const
    { return __name == __arg.__name; }

instead; this includes Linux.
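
The difference between the two implementations can be seen in a small stand-alone sketch. `TypeDescriptor` is a hypothetical stand-in for std::type_info, and the two name arrays play the role of unmerged copies of the same typeinfo name living in two shared objects; the string "12worker_error" is only illustrative.

```cpp
#include <cassert>
#include <cstring>

// Stand-in for std::type_info: just a pointer to the type's name.
struct TypeDescriptor {
    const char* name;
};

// Strategy used when names are assumed to be merged: compare pointers.
bool equal_by_address(const TypeDescriptor& a, const TypeDescriptor& b) {
    return a.name == b.name;
}

// Strategy used when names may be duplicated: fall back to strcmp.
bool equal_by_name(const TypeDescriptor& a, const TypeDescriptor& b) {
    assert(a.name && b.name);
    return &a == &b || std::strcmp(a.name, b.name) == 0;
}

// Two distinct objects with identical contents, as if each shared
// object had kept its own copy of the name.
static const char name_in_lib1[] = "12worker_error";
static const char name_in_lib2[] = "12worker_error";
static const TypeDescriptor copy_in_lib1{name_in_lib1};
static const TypeDescriptor copy_in_lib2{name_in_lib2};
```

With these definitions `equal_by_address(copy_in_lib1, copy_in_lib2)` is false while `equal_by_name(copy_in_lib1, copy_in_lib2)` is true. The string comparison also treats unrelated types that merely share a name as equal, which is the trade-off between the two strategies.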

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-12  8:07         ` Jason Merrill
@ 2002-05-12  9:24           ` David Abrahams
  0 siblings, 0 replies; 104+ messages in thread
From: David Abrahams @ 2002-05-12  9:24 UTC (permalink / raw)
  To: Jason Merrill
  Cc: python-dev, Ralf W. Grosse-Kunstleve, gcc, Martin v. Loewis, c++std-ext

Jason, I had to write all the following exposition to understand your
reply, but then it dawned on me what you meant ;-)

I wrote, describing existing Linux/GCC semantics:
> >>> 4. lib2.so is loaded with RTLD_LOCAL. Because it's RTLD_LOCAL, the loader
> >>> again creates a new "symbol space"; no duplicates are shared with X.so.
<snip>

And then, describing my preferred semantics:
> >>> What I'd prefer to happen is that in step 4, the loader would use the
> >>> existing definition for any loaded symbol which is defined in or used by
> >>> lib2's immediate dependencies. That would nicely model the concept that
> >>> lib2.so is sharing globally with X.so but not with lib1.so, and it seems
> >>> like the "right" solution.

Jason replied:
> >> I noticed that the readme says that the test passes on Solaris.  Does it
> >> provide these semantics?  How about SCO?  Anyone?

Assuming by "these semantics", Jason meant my preferred semantics:
> > The test as written doesn't really tell us the answer since it uses EH and
> > any implementation can make it a non-issue by comparing type_info::name()
> > strings instead of addresses.
>
> I meant using gcc 3.0.4 on Solaris.

Ah yes, GCC 3.0.4 would tell us something, since it is using address
comparison. If it worked on Solaris, that would be just as good as using a
different test that looked at addresses of template static data
members. Good question.

-Dave


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-12  7:30     ` Jason Merrill
  2002-05-12  7:31       ` David Abrahams
@ 2002-05-12  9:31       ` Martin v. Loewis
  2002-05-12  9:34         ` David Abrahams
  2002-05-12 12:17       ` Mark Mitchell
  2 siblings, 1 reply; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-12  9:31 UTC (permalink / raw)
  To: Jason Merrill; +Cc: David Abrahams, Ralf W. Grosse-Kunstleve, gcc

Jason Merrill <jason@redhat.com> writes:

> I find this testcase somewhat persuasive, as the offending dlopen call is
> not in the C++ code.  What do others think?

What I find troubling is that gcc emits the typeinfo for worker_error
into all three object files, even though the class has a non-inline
non-abstract virtual function. It correctly manages to emit the vtable
only once; it should manage to emit the typeinfo (and typeinfo name)
only once, also.

I believe if the single copy of the typeinfo was emitted together with
the vtable, this example would "work".
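
For readers unfamiliar with the distinction, here is a hedged sketch of the two kinds of exception class being contrasted. The members of `worker_error` shown here are assumptions (the thread only says it has a non-inline, non-abstract virtual function), and `pod_error` is a hypothetical counterexample with no virtual functions at all.

```cpp
#include <cassert>
#include <type_traits>

// Polymorphic class whose virtual functions are declared in the class
// but defined out of line. Under the Itanium C++ ABI the first such
// function is the "key function", and the vtable is emitted only in
// the translation unit that defines it.
struct worker_error {
    virtual ~worker_error();
    virtual const char* what() const;
};

worker_error::~worker_error() {}
const char* worker_error::what() const { return "worker_error"; }

// A plain struct has no vtable and no key function, so there is no
// single "home" translation unit for anything generated for it.
struct pod_error {
    int code;
};

// Small self-check that the two classes differ as described.
void check_kinds() {
    assert(std::is_polymorphic<worker_error>::value);
    assert(!std::is_polymorphic<pod_error>::value);
}
```

Only a class of the first kind has an obvious place for a single copy of its typeinfo; a class of the second kind would still get one copy per object file that needs it.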

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-12  9:31       ` Martin v. Loewis
@ 2002-05-12  9:34         ` David Abrahams
  0 siblings, 0 replies; 104+ messages in thread
From: David Abrahams @ 2002-05-12  9:34 UTC (permalink / raw)
  To: Jason Merrill, Martin v. Loewis; +Cc: Ralf W. Grosse-Kunstleve, gcc


----- Original Message -----
From: "Martin v. Loewis" <martin@v.loewis.de>


> Jason Merrill <jason@redhat.com> writes:
>
> > I find this testcase somewhat persuasive, as the offending dlopen call is
> > not in the C++ code.  What do others think?
>
> What I find troubling is that gcc emits the typeinfo for worker_error
> into all three object files, even though the class has a non-inline
> non-abstract virtual function. It correctly manages to emit the vtable
> only once; it should manage to emit the typeinfo (and typeinfo name)
> only once, also.
>
> I believe if the single copy of the typeinfo was emitted together with
> the vtable, this example would "work".

Hmm, that's very interesting. I agree with Martin that this change should
probably be made, but I also think it doesn't go nearly far enough. If I
understand correctly, even with the change, if worker_error were changed to
be a POD struct it would fail again. It seems unreasonable that users
should have to restrict themselves to throwing polymorphic class instances
with out-of-line virtual functions.

[all the same, this one change would make a big difference for my
application]

-Dave


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-12  7:30     ` Jason Merrill
  2002-05-12  7:31       ` David Abrahams
  2002-05-12  9:31       ` Martin v. Loewis
@ 2002-05-12 12:17       ` Mark Mitchell
  2002-05-12 12:24         ` Martin v. Loewis
  2 siblings, 1 reply; 104+ messages in thread
From: Mark Mitchell @ 2002-05-12 12:17 UTC (permalink / raw)
  To: Jason Merrill, David Abrahams
  Cc: python-dev, Ralf W. Grosse-Kunstleve, gcc, Martin v. Loewis

> I find this testcase somewhat persuasive, as the offending dlopen call is
> not in the C++ code.  What do others think?

I agree with your other statement: RTLD_LOCAL and C++ don't really make
sense.

I think we're running down a slippery slope; once EH works, people
*will* wonder why things involving inlines and templates don't.

If, for example, you have *two* Python modules in C++, each of which
uses a nice package for managing global resources, and you can load
either module just fine, but loading both causes subtle runtime
problems, ...

We will have given people a bigger bazooka, but it will be aimed at
their own feet.

-- 
Mark Mitchell                mark@codesourcery.com
CodeSourcery, LLC            http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-12 12:17       ` Mark Mitchell
@ 2002-05-12 12:24         ` Martin v. Loewis
  2002-05-12 12:29           ` Mark Mitchell
  2002-05-12 12:36           ` David Abrahams
  0 siblings, 2 replies; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-12 12:24 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Jason Merrill, David Abrahams, Ralf W. Grosse-Kunstleve, gcc

Mark Mitchell <mark@codesourcery.com> writes:

> > I find this testcase somewhat persuasive, as the offending dlopen call is
> > not in the C++ code.  What do others think?
> 
> I agree with your other statement: RTLD_LOCAL and C++ don't really make
> sense.

The issue is that Python *must* use RTLD_LOCAL to load its extension
modules, or else unrelated extension modules might crash due to
conflicting symbols.

Now, people want to use C++ for extension modules. So far, this has
worked fine - except that it now stops working with g++ 3.x, if you
want to throw exceptions in the extension module.

> We will have given people a bigger bazooka, but it will be aimed at
> their own feet.

Since the alternative is not to allow writing exceptions in C++,
people would be willing to accept restrictions, if they know what
those restrictions are. Requiring all classes used as exceptions to be
polymorphic (non-pure non-inline blabla), and not allowing static
members in templates might be acceptable; not allowing exceptions
probably isn't.

It would be nice if the compiler could warn if features are used that
require symbol uniqueness.

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-12 12:24         ` Martin v. Loewis
@ 2002-05-12 12:29           ` Mark Mitchell
  2002-05-12 12:36             ` Jason Merrill
  2002-05-12 13:41             ` David Abrahams
  2002-05-12 12:36           ` David Abrahams
  1 sibling, 2 replies; 104+ messages in thread
From: Mark Mitchell @ 2002-05-12 12:29 UTC (permalink / raw)
  To: Martin v. Loewis
  Cc: Jason Merrill, David Abrahams, Ralf W. Grosse-Kunstleve, gcc



--On Sunday, May 12, 2002 08:48:42 PM +0200 "Martin v. Loewis" 
<martin@v.loewis.de> wrote:

> Mark Mitchell <mark@codesourcery.com> writes:
>
>> > I find this testcase somewhat persuasive, as the offending dlopen call
>> > is not in the C++ code.  What do others think?
>>
>> I agree with your other statement: RTLD_LOCAL and C++ don't really make
>> sense.
>
> The issue is that Python *must* use RTLD_LOCAL to load its extension
> modules, or else unrelated extension modules might crash due to
> conflicting symbols.

I understand.  I'm actually a Python devotee. :-)

And, of course, this applies to C++ modules too: I don't want your
module catching my exceptions just because we both happen to have
a type with the same name.  Or maybe I do, but I'm not sure unless
I know what your module is, and whether it means the same thing by
that type that I do...

> Now, people want to use C++ for extension modules. So far, this has
> worked fine - except that it now stops working with g++ 3.x, if you
> want to throw exceptions in the extension module.
>
>> We will have given people a bigger bazooka, but it will be aimed at
>> their own feet.
>
> Since the alternative is not to allow writing exceptions in C++,
> people would be willing to accept restrictions, if they know what
> those restrictions are.

And here we hit the age-old debate, on which I am usually on the losing
side.

My feeling is that a user interface like this is just not worth having.
It is true that these features can be useful to some people some of the
time, and that in careful hands can be deployed appropriately.  It's
just that we have a lot of problems in GCC due to our corner-case options;
we've tacked on options to let all kinds of people do all kinds of things.
But, we tend to break those options, and we tend to not document them
right, and people tend to use them in unintended ways, and so forth and
so on.  I don't think we serve our users in this way.

All that said, I'm surprised that throwing exceptions -- without crossing
DSO boundaries -- doesn't work.  I'd expect that would work almost by
accident.

-- 
Mark Mitchell                mark@codesourcery.com
CodeSourcery, LLC            http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-12 12:24         ` Martin v. Loewis
  2002-05-12 12:29           ` Mark Mitchell
@ 2002-05-12 12:36           ` David Abrahams
  2002-05-13  1:28             ` Martin v. Loewis
  1 sibling, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-12 12:36 UTC (permalink / raw)
  To: Mark Mitchell, Martin v. Loewis
  Cc: Jason Merrill, Ralf W. Grosse-Kunstleve, gcc


----- Original Message -----
From: "Martin v. Loewis" <martin@v.loewis.de>


> > We will have given people a bigger bazooka, but it will be aimed at
> > their own feet.
>
> Since the alternative is not to allow writing exceptions in C++,
"extensions", I think---------------------------^^^^^^^^^^
> people would be willing to accept restrictions, if they know what
> those restrictions are.

Yes. All we need is a well-documented model that can be made to work.

> Requiring all classes used as exceptions to be
> polymorphic (non-pure non-inline blabla)

That's pushing it. I *might* get away with requiring that for my
application, but if my customers accept that restriction it will only be
grudgingly.

> and not allowing static
> members in templates might be acceptable

!! absolutely not acceptable !!

Do you really mean "not allowing", here? The use of static members in
templates (known to probably not be shared) is one of the primary ways I
get around the problems we're discussing. I routinely declare static
reference members and initialize them through function calls to my shared
library. Once static initialization is done, they all refer to the right
piece of data. If I couldn't do that, I'd be royally shafted.
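
A minimal sketch of the kind of workaround described above, under the assumption that the shared library exports a function returning the one real object. `Registry`, `shared_registry` and `Handle` are hypothetical names; here the "library" function is defined in the same file only so that the example is self-contained.

```cpp
#include <cassert>

struct Registry {
    int entries = 0;
};

// Stand-in for a function exported by the common shared library.
// The function-local static gives exactly one Registry object.
Registry& shared_registry() {
    static Registry instance;
    return instance;
}

// The static reference member may well be duplicated in each module
// that instantiates Handle<T>, but every copy is initialized by calling
// into the library, so all copies refer to the same Registry.
template <class T>
struct Handle {
    static Registry& registry;
};

template <class T>
Registry& Handle<T>::registry = shared_registry();

// Every instantiation ends up pointing at the library's object.
void check_handles() {
    assert(&Handle<int>::registry == &shared_registry());
    assert(&Handle<long>::registry == &shared_registry());
}
```

The point of the design is that correctness no longer depends on the static member itself being merged across modules, only on the function call reaching the single copy of the library.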

> ; not allowing exceptions
> probably isn't.

Right. Also, let me point out that exceptions and RTTI deserve special
treatment because workarounds like the one described above are not
available.

-Dave


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-12 12:29           ` Mark Mitchell
@ 2002-05-12 12:36             ` Jason Merrill
  2002-05-12 12:37               ` Mark Mitchell
  2002-05-12 16:55               ` Jason Merrill
  2002-05-12 13:41             ` David Abrahams
  1 sibling, 2 replies; 104+ messages in thread
From: Jason Merrill @ 2002-05-12 12:36 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Martin v. Loewis, David Abrahams, Ralf W. Grosse-Kunstleve, gcc

>>>>> "Mark" == Mark Mitchell <mark@codesourcery.com> writes:

> All that said, I'm surprised that throwing exceptions -- without crossing
> DSO boundaries -- doesn't work.  I'd expect that would work almost by
> accident.

The problem in this case is that both DSOs link against the same shared
library.  Both call a function in that library, which calls back into the
appropriate DSO, which throws an exception.  For the first DSO loaded, the
catch clause in the library matches.  For the second, it doesn't, as it's
checking against the typeinfo from the first.
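
The call pattern described above has roughly the following shape. This is a single-program sketch, so the catch clause always matches here; in the real test case `run_worker` would live in the common library and the worker functions in the two DSOs loaded with RTLD_LOCAL. Apart from `worker_error`, the names are hypothetical.

```cpp
#include <cassert>

struct worker_error {
    virtual ~worker_error() {}
};

// Role of the common library: call back into the caller's code and
// catch the exception type that all parties nominally share.
bool run_worker(void (*worker)()) {
    try {
        worker();
    } catch (const worker_error&) {
        return true;   // the catch clause matched
    } catch (...) {
        return false;  // something was thrown, but the types did not match
    }
    return false;      // nothing was thrown
}

// Role of a DSO: a callback that throws the shared exception type.
void failing_worker() { throw worker_error(); }

// A callback that throws an unrelated type, for contrast.
void other_worker() { throw 42; }

void check_pattern() {
    assert(run_worker(failing_worker));
    assert(!run_worker(other_worker));
}
```

When the typeinfo used at the throw site and the one used by the catch clause are different objects and the runtime compares them by address, the first catch clause is skipped even though the source-level types are identical.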

I suspect that this is really a bug in the Linux dynamic loader, that the
typeinfo references should bind separately for the two DSOs, as David has
suggested.  I've verified that this test works properly on Solaris, though
my investigation as to why must wait while I build a new gdb.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-12 12:36             ` Jason Merrill
@ 2002-05-12 12:37               ` Mark Mitchell
  2002-05-12 16:55               ` Jason Merrill
  1 sibling, 0 replies; 104+ messages in thread
From: Mark Mitchell @ 2002-05-12 12:37 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Martin v. Loewis, David Abrahams, Ralf W. Grosse-Kunstleve, gcc

> I suspect that this is really a bug in the Linux dynamic loader, that the
> typeinfo references should bind separately for the two DSOs, as David has
> suggested.  I've verified that this test works properly on Solaris, though
> my investigation as to why must wait while I build a new gdb.

Aha; if it's a dynamic loader bug we can sidestep at least part of the
debate. :-)

-- 
Mark Mitchell                mark@codesourcery.com
CodeSourcery, LLC            http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-12 12:29           ` Mark Mitchell
  2002-05-12 12:36             ` Jason Merrill
@ 2002-05-12 13:41             ` David Abrahams
  2002-05-13  1:34               ` Martin v. Loewis
  1 sibling, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-12 13:41 UTC (permalink / raw)
  To: Mark Mitchell, Martin v. Loewis
  Cc: Jason Merrill, Ralf W. Grosse-Kunstleve, gcc


----- Original Message -----
From: "Mark Mitchell" <mark@codesourcery.com>

> And, of course, this applies to C++ modules too: I don't want your
> module catching my exceptions just because we both happen to have
> a type with the same name.  Or maybe I do, but I'm not sure unless
> I know what your module is, and whether it means the same thing by
> that type that I do...

Careful, though: that's not the case we're discussing. The case, once
again, is that two libraries lib1 and lib2 loaded with RTLD_LOCAL are both
linked to a third library X in the usual global-symbol-sharing way.
Throwing exceptions between lib1 and X works, but between lib2 and X fails
to work.

Nobody expects to be able to throw exceptions between two shared libs in
general without explicitly linking them together, AFAIK. However, Windows
compilers work this way and AFAIK it does NOT cause problems. People tend
to be fairly careful about letting exceptions travel across module
boundaries where they're not intended in any case. Usually entry points in
modules loaded with RTLD_LOCAL are called by "C" code and are not allowed
to throw anyway.

> And here we hit the age-old debate, on which I am usually on the losing
> side.

I hope so <wink>

> My feeling is that a user interface like this

What do you mean by "like this"?

> is just not worth having.
> It is true that these features

Please be specific so that I can understand your position. Which are "these
features"?

> can be useful to some people some of the
> time, and that in careful hands can be deployed appropriately.  It's
> just that we have a lot of problems in GCC due to our corner-case options;
> we've tacked on options to let all kinds of people do all kinds of things.
> But, we tend to break those options, and we tend to not document them
> right, and people tend to use them in unintended ways, and so forth and
> so on.  I don't think we serve our users in this way.

There are several reasonable choices for behaviors which would make things
work. None of them would be hard to document or to understand. The
situation now is hard to get right, hard to understand, and hard to test
(because the behavior depends on module load order). Making things work
would be a vast improvement for users, and would make it possible to
program to a reasonably similar mental model for most platforms that
support dynamic linking.

> All that said, I'm surprised that throwing exceptions -- without crossing
> DSO boundaries -- doesn't work.  I'd expect that would work almost by
> accident.

Have you read the rest of the thread? The reasons it doesn't work by
accident have been pretty fully explored; if you're still surprised in
light of that explanation I'd appreciate knowing why because the rest of us
are probably missing something important.

-Dave

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-12 12:36             ` Jason Merrill
  2002-05-12 12:37               ` Mark Mitchell
@ 2002-05-12 16:55               ` Jason Merrill
  1 sibling, 0 replies; 104+ messages in thread
From: Jason Merrill @ 2002-05-12 16:55 UTC (permalink / raw)
  Cc: Martin v. Loewis, David Abrahams, Ralf W. Grosse-Kunstleve, gcc

>>>>> "Jason" == Jason Merrill <jason@redhat.com> writes:

> I suspect that this is really a bug in the Linux dynamic loader, that the
> typeinfo references should bind separately for the two DSOs, as David has
> suggested.  I've verified that this test works properly on Solaris, though
> my investigation as to why must wait while I build a new gdb.

Oddly, Solaris seems to have the same resolution semantics as Linux for the
type_info node itself, but manages to combine the name symbols; perhaps
because they are read-only.  Anyway, that's why the test passes, not
because it has David's desired semantics.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-12 12:36           ` David Abrahams
@ 2002-05-13  1:28             ` Martin v. Loewis
  2002-05-13  5:00               ` David Abrahams
  0 siblings, 1 reply; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-13  1:28 UTC (permalink / raw)
  To: David Abrahams
  Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc

"David Abrahams" <david.abrahams@rcn.com> writes:

> > and not allowing static
> > members in templates might be acceptable
> 
> !! absolutely not acceptable !!
> 
> Do you really mean "not allowing", here? The use of static members in
> templates (known to probably not be shared) is one of the primary ways I
> get around the problems we're discussing. I routinely declare static
> reference members and initialize them through function calls to my shared
> library. Once static initialization is done, they all refer to the right
> piece of data. If I couldn't do that, I'd be royally shafted.

Are you saying you rely on the fact that static members of template
classes are not shared across multiple instantiations of the same
template? The C++ standard is very clear that all instantiations
ought to refer to the same variable.

Mark's point is that, from the position of a compiler vendor, it is
unacceptable to declare a feature as "supported", when there are usage
restrictions on that feature in clear violation of the standard.

Also, how do you think you can initialize the static template member?
Through function calls of the class template? Forget it, that might
not work - you are relying here on functions being emitted together
with the instantiation of the static member, and there is no guarantee
that this will work.

> Right. Also, let me point out that exceptions and RTTI deserve special
> treatment because workarounds like the one described above are not
> available.

Ok, then what about block-local static variables in inline functions,
and in template functions?

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-12 13:41             ` David Abrahams
@ 2002-05-13  1:34               ` Martin v. Loewis
  2002-05-13  2:05                 ` Mark Mitchell
  2002-05-13  5:44                 ` David Abrahams
  0 siblings, 2 replies; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-13  1:34 UTC (permalink / raw)
  To: David Abrahams
  Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc

"David Abrahams" <david.abrahams@rcn.com> writes:

> > My feeling is that a user interface like this
> 
> What do you mean by "like this"?
>
> > is just not worth having.  It is true that these features
>
> Please be specific so that I can understand your position. Which are "these
> features"?

Interpreting Mark: "this" is an interface that sometimes works,
sometimes doesn't.

> Have you read the rest of the thread? The reasons it doesn't work by
> accident have been pretty fully explored; if you're still surprised
> in light of that explanation I'd appreciate knowing why because the
> rest of us are probably missing something important.

Mark is right that you just found a special case of a more general
problem, and that what you consider a solution (do string compares)
just solves the special case, not the general problem.

Consider:

inline void
inc_boost_counter()
{
  static int counter = 0;
  counter++;
  if(counter % 1000 == 0)
    report_counter();
}

If this was part of the libboost headers, then, again, you would end
up with three copies of the static variable (ext1, ext2, libcore).  If
ext1 is loaded first, libcore's usage is resolved to the copy in ext1,
and expansions of the inline function in ext2 would use a second copy
(the copy inside libcore would not be used). That would be equally
wrong, but impossible to fix.

Likewise for templates:

template<typename T>
class X{
  static T* singleton;
};

template<typename T> T* X<T>::singleton = new T;

You would end up with up to three copies of the singleton for any
value of T.

Your last straw is that this is a bug in the Linux dynamic loader,
since the Solaris loader behaves differently. I'm not sure exactly how
it behaves, but it probably has one of the following options:

1. resolve all weak symbols defined both in extN and libcore to
   libcore. Then, anybody linking against libcore gets the definition
   of the symbols in libcore. This probably violates the symbol
   resolution order of the gABI, so this is unlikely to happen.

2. When loading ext1, resolve weak symbols defined both in ext1 and
   libcore to the definition in ext1; this is what happens on Linux,
   too. When loading ext2, notice that some of the symbols defined
   both in ext2 and libcore have already been resolved, for those,
   use the pre-existing binding; this essentially results in ext1,
   ext2, and libcore sharing common symbols.

You could try to report this as a bug in the Linux dynamic linker. For
that, you probably have to construct an example that doesn't include
C++, but instead directly involves weak symbols. Of course, you should
make sure that your example "passes" on Solaris.

HTH,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-13  1:34               ` Martin v. Loewis
@ 2002-05-13  2:05                 ` Mark Mitchell
  2002-05-13  5:44                 ` David Abrahams
  1 sibling, 0 replies; 104+ messages in thread
From: Mark Mitchell @ 2002-05-13  2:05 UTC (permalink / raw)
  To: Martin v. Loewis, David Abrahams
  Cc: Jason Merrill, Ralf W. Grosse-Kunstleve, gcc



--On Monday, May 13, 2002 07:58:20 AM +0200 "Martin v. Loewis" 
<martin@v.loewis.de> wrote:

> "David Abrahams" <david.abrahams@rcn.com> writes:
>
>> > My feeling is that a user interface like this
>>
>> What do you mean by "like this"?
>>
>> > is just not worth having.  It is true that these features
>>
>> Please be specific so that I can understand your position. Which are
>> "these features"?
>
> Interpreting Mark: "this" is an interface that sometimes works,
> sometimes doesn't.
>
>> Have you read the rest of the thread? The reasons it doesn't work by
>> accident have been pretty fully explored; if you're still surprised
>> in light of that explanation I'd appreciate knowing why because the
>> rest of us are probably missing something important.
>
> Mark is right that you just found a special case of a more general
> problem, and that what you consider a solution (do string compares)
> just solves the special case, not the general problem.

Martin, who does not even completely agree with me, has done an excellent
job of elucidating my posting.  Thank you!

-- 
Mark Mitchell                mark@codesourcery.com
CodeSourcery, LLC            http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-13  1:28             ` Martin v. Loewis
@ 2002-05-13  5:00               ` David Abrahams
  2002-05-13 16:50                 ` Martin v. Loewis
  0 siblings, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-13  5:00 UTC (permalink / raw)
  To: Martin v. Loewis
  Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc


----- Original Message -----
From: "Martin v. Loewis" <martin@v.loewis.de>


> "David Abrahams" <david.abrahams@rcn.com> writes:
>
> > > and not allowing static
> > > members in templates might be acceptable
> >
> > !! absolutely not acceptable !!
> >
> > Do you really mean "not allowing", here? The use of static members in
> > templates (known to probably not be shared) is one of the primary ways
> > I get around the problems we're discussing. I routinely declare static
> > reference members and initialize them through function calls to my
> > shared library. Once static initialization is done, they all refer to
> > the right piece of data. If I couldn't do that, I'd be royally shafted.
>
> Are you saying you rely on the fact that static members of template
> classes are not shared across multiple instantiations of the same
> template? T

No, I'm saying that I'm relying on being *allowed* to use them, and I don't
happen to care whether they're shared in this case. For example:

// lib1.hpp
#include <typeinfo>
int& get_x(char const*);

// lib2.hpp, lib3.hpp, lib4.hpp
#include "lib1.hpp"
template <class T>
struct object
{
    static int& x;
};
template <class T> int& object<T>::x = get_x(typeid(T).name());

// lib2.cpp
...something which instantiates object<X>::x...

//lib1.cpp
#include <map>
#include "lib2.hpp"
// compare_strings: an ordering predicate that compares the pointed-to
// strings rather than the pointer values
int& get_x(char const* s)
{
    static std::map<char const*, int, compare_strings> m;
    return m[s];
}
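
For readers who want to try the pattern above, here is a minimal
single-file sketch. It makes some assumptions that are not in the
original code: std::string keys stand in for the compare_strings
predicate, and everything sits in one translation unit, whereas get_x
would really live in lib1.cpp and object<T> in the shared headers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <typeinfo>

// Registry lookup keyed by the *contents* of the type name, not by its
// address, so two copies of the same name string still map to one slot.
// (Assumption: std::string keys replace the compare_strings predicate.)
int& get_x(char const* name)
{
    assert(name != nullptr);
    // Function-local static: constructed on first call, so it is ready
    // even when called during static initialization of object<T>::x.
    static std::map<std::string, int> slots;
    return slots[name];  // references into std::map stay valid on insert
}

template <class T>
struct object
{
    // This member may be duplicated once per library; every copy is
    // bound to the same registry slot.
    static int& x;
};

template <class T>
int& object<T>::x = get_x(typeid(T).name());
```

The point of the design is that it does not matter how many copies of
object<T>::x the loader leaves unmerged, since each copy is only a
reference to data owned by the one library that defines get_x.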

> The C++ standard is very clear that all instantiations
> ought to refer to the same variable.

And it's very unclear about what to do with shared libraries. We in the
committee are just starting to tackle that one, and I am confident we are
not going to require the ODR be preserved in cases where symbols are not
shared.

> Mark's point is, that, from the position of a compiler vendor, it is
> unacceptable to declare a feature as "supported", when there are usage
> restrictions on that feature in clear violation of the standard.

From the point of view of the standard, shared libraries are not covered; a
vendor is in the territory of extensions, and any behavior is allowed. What
you can call "supported" is up to QOI.

> Also, how do you think you can initialize the static template member?
> Through function calls of the class template? Forget it, that might
> not work - you are relying here on functions being emitted together
> with the instantiation of the static member, and there is no guarantee
> that this will work.


I have no idea what the above might mean. I do not happen to be calling the
class template's own function members to initialize its data, but I can't
imagine how it would be a problem unless I were relying on sharing or
not-sharing... But I'm not.

> > Right. Also, let me point out that exceptions and RTTI deserve special
> > treatment because workarounds like the one described above are not
> > available.
>
> Ok, then what about block-local static variables in inline functions,
> and in template functions?


These /are/ susceptible to the same sort of workaround, using references
(an extra level of indirection, the classic approach!) I wouldn't expect to
see a unique copy unless all instantiating libraries share symbols globally
with one-another.

-Dave

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-13  1:34               ` Martin v. Loewis
  2002-05-13  2:05                 ` Mark Mitchell
@ 2002-05-13  5:44                 ` David Abrahams
  2002-05-13 16:58                   ` Martin v. Loewis
  1 sibling, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-13  5:44 UTC (permalink / raw)
  To: Martin v. Loewis
  Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc


----- Original Message -----
From: "Martin v. Loewis" <martin@v.loewis.de>
To: "David Abrahams" <david.abrahams@rcn.com>
Cc: "Mark Mitchell" <mark@codesourcery.com>; "Jason Merrill"
<jason@redhat.com>; "Ralf W. Grosse-Kunstleve" <rwgk@cci.lbl.gov>;
<gcc@gcc.gnu.org>
Sent: Monday, May 13, 2002 12:58 AM
Subject: Re: Minimal GCC/Linux shared lib + EH bug example


> "David Abrahams" <david.abrahams@rcn.com> writes:
>
> > > My feeling is that a user interface like this
> >
> > What do you mean by "like this"?
> >
> > > is just not worth having.  It is true that these features
> >
> > Please be specific so that I can understand your position. Which are
> > "these features"?
>
> Interpreting Mark: "this" is an interface that sometimes works,
> sometimes doesn't.
>
> > Have you read the rest of the thread? The reasons it doesn't work by
> > accident have been pretty fully explored; if you're still surprised
> > in light of that explanation I'd appreciate knowing why because the
> > rest of us are probably missing something important.
>
> Mark is right that you just found a special case of a more general
> problem, and that what you consider a solution (do string compares)
> just solves the special case, not the general problem.


I think that's a mischaracterization of my position. Have you read my other
postings (especially http://gcc.gnu.org/ml/gcc/2002-05/msg00869.html)? I
consider "do string compares" a viable workaround, not a general solution.
In a general solution the loader would resolve symbols differently.

> Consider:
>
> inline void
> inc_boost_counter()
> {
>   static int counter = 0;
>   counter++;
>   if(counter % 1000 == 0)
>     report_counter();
> }
>
> If this was part of the libboost headers, then, again, you would end
> up with three copies of the static variable (ext1, ext2, libcore).  If
> ext1 is loaded first, libcore's usage is resolved to the copy in ext1,
> and expansions of the inline function in ext2 would use a second copy
> (the copy inside libcore would not be used).

I understand that's the way it works now, of course, because the loader's
got the wrong semantics.

> That would be equally wrong,

Wrongness can't be determined until the standard describes the behavior of
shared libs.

> but impossible to fix.

In this case a redesign of the library which uses a non-inline function
fixes the problem.
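
A rough sketch of that redesign follows. It is shown as one file for
brevity; boost_counter_value is a hypothetical accessor added only for
illustration, and the sketch assumes every caller resolves
inc_boost_counter to the one library that defines it.

```cpp
#include <cassert>
#include <climits>

// Would sit in the library header: declarations only, so no translation
// unit that includes it gets its own copy of the counter.
void inc_boost_counter();
int boost_counter_value();  // hypothetical accessor, for illustration

// Would sit in exactly one source file of the shared library.
namespace {
int counter = 0;  // the single definition of the counter
}

void inc_boost_counter()
{
    assert(counter < INT_MAX);  // guard against signed overflow
    ++counter;
}

int boost_counter_value()
{
    return counter;
}
```

Because the static no longer lives in an inline function, there is no
weak per-library copy of it for the loader to bind inconsistently.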

> Likewise for templates:
>
> template<typename T>
> class X{
>   static T* singleton;
> };
>
> template<typename T> T* X<T>::singleton = new T;
>
> You would end up with up to three copies of the singleton for any
> value of T.


Two, I think (if the loader semantics stay the same). Isn't the case
perfectly analogous to the one above?

> Your last straw is that this is a bug in the Linux dynamic loader,
> since the Solaris loader behaves differently. I'm not sure exactly how
> it behaves, but it probably has one of the following options:
>
> 1. resolve all weak symbols defined both in extN and libcore to
>    libcore. Then, anybody linking against libcore gets the definition
>    of the symbols in libcore. This probably violates the symbol
>    resolution order of the gABI, so this is unlikely to happen.
>
> 2. When loading ext1, resolve weak symbols defined both in ext1 and
>    libcore to the definition in ext1; this is what happens on Linux,
>    too. When loading ext2, notice that some of the symbols defined
>    both in ext2 and libcore have already been resolved, for those,
>    use the pre-existing binding; this essentially results in ext1,
>    ext2, and libcore sharing common symbols.
>
> You could try to report this as a bug in the Linux dynamic linker. For
> that, you probably have to construct an example that doesn't include
> C++, but instead directly involves weak symbols. Of course, you should
> make sure that your example "passes" on Solaris.


I wouldn't know where to start with that one. Is there an explicit way to
mark a symbol "weak"?

-Dave

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-13  5:00               ` David Abrahams
@ 2002-05-13 16:50                 ` Martin v. Loewis
  2002-05-13 19:00                   ` David Abrahams
  0 siblings, 1 reply; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-13 16:50 UTC (permalink / raw)
  To: David Abrahams
  Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc

"David Abrahams" <david.abrahams@rcn.com> writes:

> No, I'm saying that I'm relying on being *allowed* to use them, and I don't
> happen to care whether they're shared in this case. 

Well, the "restrictions" document would say "you can use these
features, but we don't guarantee any specific semantics".

This is the same for the exception handling: you can certainly throw
any exceptions you want - but the compiler (or run-time system) won't
guarantee that you can catch them everywhere.

How could the authors of the compiler, or the authors of Python, know
which deviations from accepted well-defined behaviour (in this case,
the ISO C++ standard) are acceptable to you, and which aren't?

What good would it be if you are happy, but the next user complains
that all his counters are incorrect? (apart from the fact that I want
you happy, of course :-)

> > Also, how do you think you can initialize the static template member?
> > Through function calls of the class template? Forget it, that might
> > not work - you are relying here on functions being emitted together
> > with the instantiation of the static member, and there is no guarantee
> > that this will work.
> 
> 
> I have no idea what the above might mean. I do not happen to be calling the
> class template's own function members to initialize its data, but I can't
> imagine how it would be a problem unless I were relying on sharing or
> not-sharing... But I'm not.

Your get_x function above would fail to work if it was a member of the
template, since you can get multiple copies of the block-static variable.

Likewise, if the get_x function is implemented in a static library, it
will fail to work correctly - even if it is defined the way you wrote
it.

> These /are/ susceptible to the same sort of workaround, using references
> (an extra level of indirection, the classic approach!) I wouldn't expect to
> see a unique copy unless all instantiating libraries share symbols globally
> with one-another.

I think with RTLD_LAZY, you are in for even more surprises, e.g. when
the local variable and its initialization guard resolve inconsistently.

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-13  5:44                 ` David Abrahams
@ 2002-05-13 16:58                   ` Martin v. Loewis
  2002-05-13 21:39                     ` David Abrahams
  0 siblings, 1 reply; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-13 16:58 UTC (permalink / raw)
  To: David Abrahams
  Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc

"David Abrahams" <david.abrahams@rcn.com> writes:

> I think that's a mischaracterization of my position. Have you read my other
> postings (especially http://gcc.gnu.org/ml/gcc/2002-05/msg00869.html)? 

Why do you keep asking whether people have read important information
that you have shared? Please just assume they did, and don't hesitate
to repeat information in context, perhaps with different wording, to
make you better understood, in case you feel you weren't understood
the first time.

> > That would be equally wrong,
> 
> Wrongness can't be determined until the standard describes the behavior of
> shared libs.

In the absence of a good definition, people usually consider the compiler
"wrong" if it doesn't do what they expect it to do.

> In this case a redesign of the library which uses a non-inline function
> fixes the problem.

By redesigning the code of the library, every problem can be
solved. Just don't throw exceptions across DSO boundaries, and this
specific problem goes away.

> I wouldn't know where to start with that one. Is there an explicit way to
> mark a symbol "weak"?

On the assembler level, with the .weak directive. On C level, either
with a __asm__ statement, or (I believe) with an
__attribute__((weak)).
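
As a small sketch of the attribute form, assuming GCC or a compatible
compiler on an ELF target (which_definition and using_weak_default are
made-up names for illustration):

```cpp
#include <cassert>
#include <cstring>

// GCC extension: emit this definition as a weak symbol. A strong
// (ordinary) definition of the same function elsewhere in the link
// would take precedence instead of causing a duplicate-definition error.
extern "C" __attribute__((weak)) const char* which_definition()
{
    return "weak default";
}

// Reports whether the weak default above is the definition that was
// actually bound at link/load time.
bool using_weak_default()
{
    const char* who = which_definition();
    assert(who != nullptr);
    return std::strcmp(who, "weak default") == 0;
}
```

To build a loader test case from this, one would put differently worded
weak definitions of the same function into each shared object and check
which one each caller ends up with.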

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-13 16:50                 ` Martin v. Loewis
@ 2002-05-13 19:00                   ` David Abrahams
  2002-05-14  2:14                     ` Martin v. Loewis
  0 siblings, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-13 19:00 UTC (permalink / raw)
  To: Martin v. Loewis
  Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc, c++std-ext


----- Original Message -----
From: "Martin v. Loewis" <martin@v.loewis.de>


> "David Abrahams" <david.abrahams@rcn.com> writes:
>
> > No, I'm saying that I'm relying on being *allowed* to use them, and I
> > don't happen to care whether they're shared in this case.
>
> Well, the "restrictions" document would say "you can use these
> features, but we don't guarantee any specific semantics".

I could live with it if the standard told me that this was undefined
behavior, as long as implementors were inclined to "play nice" and give me
useful semantics I could count on.

The fact that some implementors are inclined to make this undefined
behavior is one reason I want the standard to say something stronger.

> This is the same for the exception handling: you can certainly throw
> any exceptions you want - but the compiler (or run-time system) won't
> guarantee that you can catch them everywhere.

I don't want it to guarantee that they can be caught everywhere. I /would/
like it to tell me precisely where they can and can't be caught, insofar as
it's possible. I'd also like to get you to change what you tell me about
it, but more on that below ;-)

> How could the authors of the compiler, or the authors of Python, know
> which deviations from accepted well-defined behaviour (in this case,
> the ISO C++ standard) are acceptable to you, and which aren't?

Any time you're in the domain of features (like shared libraries) which are
outside the standard, you need to make some judgements about what behaviors
would be most useful and predictable. That's what QOI is all about. I claim
to have a model which makes shared linking more useful and predictable than
what GCC/Linux currently implements. So, the short answer is "they can't
know for sure, but they can try to do the best job possible". I hope they
will.

> What good would it be if you are happy, but the next user complains
> that all his counters are incorrect? (apart from the fact that I want
> you happy, of course :-)

Don't change anything just for me. A change should be made because it's the
right thing to do. As I've said, though I could get by with an
EH/RTTI-specific patch, what I really want, and what I have in mind, is a
behavior which works better for the next user's counters as well as
everything else.

> > > Also, how do you think you can initialize the static template member?
> > > Through function calls of the class template? Forget it, that might
> > > not work - you are relying here on functions being emitted together
> > > with the instantiation of the static member, and there is no
> > > guarantee that this will work.
> >
> >
> > I have no idea what the above might mean. I do not happen to be calling
> > the class template's own function members to initialize its data, but I
> > can't imagine how it would be a problem unless I were relying on sharing
> > or not-sharing... But I'm not.
>
> Your get_x function above would fail to work if it was a member of the
> template, since you can get multiple copies of the block-static variable.

Of course. That's why I put it in a source file in lib1.

> Likewise, if the get_x function is implemented in a static library, it
> will fail to work correctly - even if it is defined the way you wrote
> it.

Again, of course. You seem to be operating on the assumption that users of
shared libraries will expect them to be semantically equivalent to
good-ol'-static linking under all circumstances, and that if we can't make
them equivalent we shouldn't give any guarantees at all. Of course, nobody
but the most naive users have that expectation. Yes, we can make shared
libs act like static libs when they're linked in the usual way, but other
arrangements are quite common, and users have a mental model for those as
well. It's not clear that the model corresponds with reality, of course,
but it's worth supporting well-defined semantics when possible.

> > These /are/ susceptible to the same sort of workaround, using
> > references (an extra level of indirection, the classic approach!) I
> > wouldn't expect to see a unique copy unless all instantiating libraries
> > share symbols globally with one-another.
>
> I think with RTLD_LAZY, you are in for even more surprises, e.g. when
> the local variable and its initialization guard resolve inconsistently.

Yes, I can see that. I think the current Linux/GCC symbol-binding semantics
lead to all sorts of trouble, and should be... dare-I-say-it... fixed.

Just to reiterate (and rephrase) what I think is the right behavior:

1. For each symbol, there is an undirected graph which determines how it is
shared.
2. Nodes of the graph correspond to shared libraries. There is also a node
for the sole executable.
3. At each boundary between nodes where there is global symbol sharing,
either via explicit linking, or via dlopen with RTLD_GLOBAL, an edge is
formed between nodes in a symbol's graph iff the symbol is used (defined or
unresolved) in BOTH nodes
4. A symbol's definition is shared between nodes A and B iff A is reachable
from B in that symbol's graph. Note that this is only true if there is a
continuous chain of global sharing between A and B, and all intermediate
nodes use the symbol as well.
5. Since neighbor nodes which share symbols are already required to have
the same definition of any symbols they have in common, there's no chance
of undesired name collision.
6. Since symbols used by both neighbors will always be shared between
neighbor nodes there's no chance of identity crises like we're seeing
today.
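
To make rules 1-4 concrete, here is a small sketch of the model as a
reachability check. This is only an illustration of the proposal, not
anything a real loader does; Module, Program, and shares are
hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Rule 2: one node per shared library, plus one for the executable.
struct Module {
    std::string name;
    std::set<std::string> symbols;  // symbols defined or unresolved here
};

struct Program {
    std::vector<Module> modules;
    // Boundaries with global symbol sharing: explicit linking, or
    // dlopen with RTLD_GLOBAL. Stored as index pairs, undirected.
    std::vector<std::pair<std::size_t, std::size_t>> global_links;

    // Rule 4: a and b share `sym` iff b is reachable from a through
    // global links whose endpoints BOTH use `sym` (rule 3).
    bool shares(std::size_t a, std::size_t b, const std::string& sym) const
    {
        assert(a < modules.size() && b < modules.size());
        if (!modules[a].symbols.count(sym) || !modules[b].symbols.count(sym))
            return false;
        std::vector<bool> seen(modules.size(), false);
        std::queue<std::size_t> todo;
        todo.push(a);
        seen[a] = true;
        while (!todo.empty()) {
            std::size_t cur = todo.front();
            todo.pop();
            if (cur == b)
                return true;
            for (const auto& link : global_links) {
                std::size_t next;
                if (link.first == cur) next = link.second;
                else if (link.second == cur) next = link.first;
                else continue;
                // Only walk into nodes that also use the symbol.
                if (!seen[next] && modules[next].symbols.count(sym)) {
                    seen[next] = true;
                    todo.push(next);
                }
            }
        }
        return false;
    }
};
```

With ext1 and ext2 each linked to libcore and all three using a given
typeinfo symbol, shares(ext1, ext2, ...) holds by way of libcore, while
a symbol used only by ext1 and ext2 stays separate in each.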

-Dave


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-13 16:58                   ` Martin v. Loewis
@ 2002-05-13 21:39                     ` David Abrahams
  2002-05-14  2:34                       ` Martin v. Loewis
  0 siblings, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-13 21:39 UTC (permalink / raw)
  To: Martin v. Loewis
  Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc


From: "Martin v. Loewis" <martin@v.loewis.de>


> "David Abrahams" <david.abrahams@rcn.com> writes:
>
> > I think that's a mischaracterization of my position. Have you read my
> > other postings (especially
> > http://gcc.gnu.org/ml/gcc/2002-05/msg00869.html)?
>
> Why do you keep asking whether people have read important information
> that you have shared?

Because I can't tell whether they read it, and it's hard to know where to
go in a conversation when you don't know how much of the background the
people you're talking with may have missed.

> Please just assume they did, and don't hesitate
> to repeat information in context, perhaps with different wording, to
> make you better understood, in case you feel you weren't understood
> the first time.

If I thought the post had left room for misinterpretation, I'd have done
that. Anyway, no offense was intended, and I don't mind trying to find
different ways to repeat myself until I think I'm understood.

> > > That would be equally wrong,
> >
> > Wrongness can't be determined until the standard describes the behavior
> > of shared libs.
>
> In the absence of a good definition, people usually consider the compiler
> "wrong" if it doesn't do what they expect it to do.

Then it's going to be wrong for lots of people until we have a good
definition. And then it's still going to be wrong ;-)

> > In this case a redesign of the library which uses a non-inline function
> > fixes the problem.
>
> By redesigning the code of the library, every problem can be
> solved. Just don't throw exceptions across DSO boundaries, and this
> specific problem goes away.

Point taken, but still the cases are very different. One looks like an
implementation detail to a library's users; the other one doesn't.

> > I wouldn't know where to start with that one. Is there an explicit way
> > to mark a symbol "weak"?
>
> On the assembler level, with the .weak directive. On C level, either
> with a __asm__ statement, or (I believe) with an
> __attribute__((weak)).

If I understand correctly, weak symbols were introduced to allow things
like users replacing malloc, free, operator new, etc. AFAIK, there are
specific symbols which are meant to work this way: the user's definition
gets priority over that in any of his dependencies. AFAICT, this model is
part of the reason for the current behavior we're seeing with C++. However,
I think the same case could be made by putting malloc/free replacements in
each of two extension modules which are linked to a common shared lib,
right? Then memory allocated by ext2 couldn't be freed by the common lib,
and vice-versa. Can you think of a more-minimal or more-compelling test
case?

-Dave

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-13 19:00                   ` David Abrahams
@ 2002-05-14  2:14                     ` Martin v. Loewis
  2002-05-14  6:07                       ` David Abrahams
  2002-05-14 13:23                       ` Sean Parent
  0 siblings, 2 replies; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-14  2:14 UTC (permalink / raw)
  To: David Abrahams
  Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc, c++std-ext

"David Abrahams" <david.abrahams@rcn.com> writes:

> Don't change anything just for me. A change should be made because it's the
> right thing to do. 

It's not obvious what the right thing to do is, here. Changing
typeinfo comparison will lose performance. It is not clear that
"traditional" (i.e. most) applications, which just compile and link a
"program" should be penalized to support more exotic applications.

That's a difficult judgement, and not one I'm willing to make.
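
To illustrate the trade-off, here is a rough sketch of the two
comparison strategies. The function names are made up, and this is not
the actual runtime library code.

```cpp
#include <cassert>
#include <cstring>
#include <typeinfo>

// Fast path: a single pointer comparison. Correct only if the loader
// has merged all copies of each type_info object into one.
bool same_type_by_address(const std::type_info& a, const std::type_info& b)
{
    return &a == &b;
}

// Slower path: fall back to comparing the name strings. This tolerates
// duplicated type_info objects, at the cost of a strcmp whenever the
// addresses differ (that is, on every genuine mismatch).
bool same_type_by_name(const std::type_info& a, const std::type_info& b)
{
    assert(a.name() != nullptr && b.name() != nullptr);
    return &a == &b || std::strcmp(a.name(), b.name()) == 0;
}
```

The extra cost falls mostly on failed matches, which are common while
the runtime searches for a matching catch clause.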

> Again, of course. You seem to be operating on the assumption that
> users of shared libraries will expect them to be semantically
> equivalent to good-ol'-static linking under all circumstances, and
> that if we can't make them equivalent we shouldn't give any
> guarantees at all. Of course, nobody but the most naive users have
> that expectation.

You might be surprised how many users have that expectation. As you
know, Ralf originally was linking a static libboost, and it never
occurred to him that something might be wrong in the build process.

I believe that it is a useful approach to keep the notion that shared
libraries are part of a "program", and that the program still ought to
implement the semantics of the relevant standards. In this specific
case, any Python extension would be a "program" (though free-standing,
since it has a different entry point).

> Yes, we can make shared libs act like static libs when they're
> linked in the usual way, but other arrangements are quite common,
> and users have a mental model for those as well. It's not clear that
> the model corresponds with reality, of course, but it's worth
> supporting well-defined semantics when possible.

Unfortunately, apart from the obvious cases, it is pretty difficult to
give a good definition; it is much easier to declare problematic cases
as undefined. You probably cannot convince compiler vendors to follow
what you consider a reasonable semantics unless you specify what that
semantics is, and contribute code or money to change their
implementations.

> 1. For each symbol, there is an undirected graph which determines how it is
> shared.
> 2. Nodes of the graph correspond to shared libraries. There is also a node
> for the sole executable.
> 3. At each boundary between nodes where there is global symbol sharing,
> either via explicit linking, or via dlopen with RTLD_GLOBAL, an edge is
> formed between nodes in a symbol's graph iff the symbol is used (defined or
> unresolved) in BOTH nodes
> 4. A symbol's definition is shared between nodes A and B iff A is reachable
> from B in that symbol's graph. Note that this is only true if there is a
> continuous chain of global sharing between A and B, and all intermediate
> nodes use the symbol as well.

That item 4 is in violation of the ELF spec. If both A and B define
the same symbol, and if the symbol is weak, and if A is reachable from
B, then the dynamic linker shall choose the definition in B, not the
definition in A.

This is necessary to allow executables to override the definition in a
shared library, e.g. for replacement functions.

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-13 21:39                     ` David Abrahams
@ 2002-05-14  2:34                       ` Martin v. Loewis
  2002-05-14 13:12                         ` David Abrahams
  0 siblings, 1 reply; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-14  2:34 UTC (permalink / raw)
  To: David Abrahams
  Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc

"David Abrahams" <david.abrahams@rcn.com> writes:

> If I understand correctly, weak symbols were introduced to allow things
> like users replacing malloc, free, operator new, etc. 

That's why they were introduced, yes. It turns out that they are
useful for a number of other things: you can emit duplicate
definitions of the same symbol, and it won't be a linker error. This
usage is particularly important for C++, to support virtual tables,
typeinfo objects, template instantiations, etc.

> I think the same case could be made by putting malloc/free replacements in
> each of two extension modules which are linked to a common shared lib,
> right? 

Not sure what "the same case" is, here.

> Then memory allocated by ext2 couldn't be freed by the common lib,
> and vice-versa.

If both ext1 and ext2 override the allocator in libcore, then all
calls in libcore would use the definition in ext1, but calls in ext2
would use their own definition, yes.

> Can you think of a more-minimal or more-compelling test case?

Think of: yes, I would just make three functions, each printing a
different message. It's not a compelling test case, but it
demonstrates the difference to Solaris, and it demonstrates that the
libcore and ext2 use different definitions, even though they are
linked together.

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-14  2:14                     ` Martin v. Loewis
@ 2002-05-14  6:07                       ` David Abrahams
  2002-05-14 13:53                         ` Martin v. Loewis
  2002-05-14 13:23                       ` Sean Parent
  1 sibling, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-14  6:07 UTC (permalink / raw)
  To: Martin v. Loewis
  Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc, c++std-ext


----- Original Message -----
From: "Martin v. Loewis" <martin@v.loewis.de>


> "David Abrahams" <david.abrahams@rcn.com> writes:
>
> > Don't change anything just for me. A change should be made because it's the
> > right thing to do.
>
> It's not obvious what the right thing to do is, here.

I normally wouldn't say this, but today I think the right answer is
obvious.

> Changing
> typeinfo comparison will lose performance. It is not clear that
> "traditional" (i.e. most) applications, which just compile and link a
> "program" should be penalized to support more exotic applications.

Speculating, of course: I doubt it would have much impact because users
typically know better than to use RTTI and EH in their inner loops; every
good C++ book I've seen warns that these features are not always fast.
Also, it could be made to cost almost nothing by storing a hash of the
string in an extended area of the type_info record and comparing that
first. However, I don't think this is "the right answer"; it's just a
workaround.
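
A minimal sketch of the hash-first comparison described above, assuming a
hypothetical `type_record` struct that stands in for an extended type_info
record. The names `type_record`, `make_record` and `same_type` are
illustrative only; this is not how GCC's std::type_info is laid out.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>

// Hypothetical stand-in for an extended type_info record: the mangled
// name plus a hash of that name, computed once when the record is built.
struct type_record {
    const char* name;
    std::size_t name_hash;
};

inline type_record make_record(const char* name) {
    return type_record{name, std::hash<std::string>{}(name)};
}

// Compare two records that may be duplicate copies living in different
// shared objects: pointer identity first, then the stored hash, and only
// then the full string comparison.
inline bool same_type(const type_record& a, const type_record& b) {
    if (&a == &b || a.name == b.name) return true;  // same copy
    if (a.name_hash != b.name_hash) return false;   // cheap rejection
    return std::strcmp(a.name, b.name) == 0;        // confirm on hash match
}
```

With this shape, the common cases (identical copy, or clearly different
types) are decided without touching the strings; the strcmp only runs
when two distinct copies carry the same hash.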

> That's a difficult judgement, and not one I'm willing to make.
>
> I believe that it is a useful approach to keep the notion that shared
> libraries are part of a "program", and that the program still ought to
> implement the semantics of the relevant standards.

I totally agree. In that case you'll have to accept my shared linking
semantics, I think. In today's model, depending on the order in which it is
loaded w.r.t. other "programs", a "program" may or may not share with its
libraries.

> In this specific
> case, any Python extension would be a "program" (though free-standing,
> since it has a different entry point).

So, what happens when multiple "programs" link to the same shared library?

> > Yes, we can make shared libs act like static libs when they're
> > linked in the usual way, but other arrangements are quite common,
> > and users have a mental model for those as well. It's not clear that
> > the model corresponds with reality, of course, but it's worth
> > supporting well-defined semantics when possible.
>
> Unfortunately, apart from the obvious cases, it is pretty difficult to
> give a good definition; it is much easier to declare problematic cases
> as undefined.

Isn't that always the way? I guess I'll just have to make the right answer
seem more obvious <0.1 wink>

> You probably cannot convince compiler vendors to follow
> what you consider a reasonable semantics unless you specify what that
> semantics is, and contribute code or money to change their
> implementations.

There's one other way: the pressure of standards. However, that takes a
long time, and seldom works without a reference implementation...

> > 1. For each symbol, there is an undirected graph which determines how it is
                                    ^^^^^^^^^^
> > shared.
> > 2. Nodes of the graph correspond to shared libraries. There is also a node
> > for the sole executable.
> > 3. At each boundary between nodes where there is global symbol sharing,
> > either via explicit linking, or via dlopen with RTLD_GLOBAL, an edge is
> > formed between nodes in a symbol's graph iff the symbol is used (defined or
> > unresolved) in BOTH nodes
> > 4. A symbol's definition is shared between nodes A and B iff A is reachable
> > from B in that symbol's graph. Note that this is only true if there is a
> > continuous chain of global sharing between A and B, and all intermediate
> > nodes use the symbol as well.
>
> That item 4 is in violation of the ELF spec. If both A and B define
> the same symbol, and if the symbol is weak, and if A is reachable from
> B, then the dynamic linker shall choose the definition in B, not the
> definition in A.

I think you misunderstand me. What you wrote doesn't contradict item 4:
since the graph is bidirectional, if A is reachable from B then B is
reachable from A. I don't care /which/ definition is chosen - they're
required to be the same anyway - I only care that they're shared.
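
The four rules can be sketched as a reachability check over a per-symbol
graph. This assumes a hypothetical in-memory description of the loaded
objects; `Link` and `shares_definition` are illustrative names, not part of
any real loader.

```cpp
#include <cstddef>
#include <queue>
#include <set>
#include <string>
#include <vector>

// A global-sharing boundary between two objects (explicit linking or
// dlopen with RTLD_GLOBAL). Objects are identified by index.
struct Link { std::size_t from, to; };

// uses[i] holds the symbols that object i defines or leaves unresolved.
// Returns true if objects a and b share one definition of `sym` under
// rules 1-4: there is a chain of links from a to b in which every
// object on the chain uses `sym`.
inline bool shares_definition(const std::vector<std::set<std::string>>& uses,
                              const std::vector<Link>& links,
                              const std::string& sym,
                              std::size_t a, std::size_t b) {
    if (!uses[a].count(sym) || !uses[b].count(sym)) return false;
    std::vector<bool> seen(uses.size(), false);
    std::queue<std::size_t> work;
    work.push(a);
    seen[a] = true;
    while (!work.empty()) {
        std::size_t cur = work.front();
        work.pop();
        if (cur == b) return true;
        for (const Link& l : links) {
            // The graph is undirected, so follow the link either way.
            std::size_t next;
            if (l.from == cur) next = l.to;
            else if (l.to == cur) next = l.from;
            else continue;
            // Rule 3: an edge exists only if both ends use the symbol.
            if (!seen[next] && uses[next].count(sym)) {
                seen[next] = true;
                work.push(next);
            }
        }
    }
    return false;
}
```

For example, with ext1 and ext2 each linked to libcore, a symbol used by all
three is shared between ext1 and ext2, while a symbol that libcore does not
use is not, regardless of the order in which the objects were loaded.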

-Dave


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-14  2:34                       ` Martin v. Loewis
@ 2002-05-14 13:12                         ` David Abrahams
  2002-05-14 14:17                           ` Martin v. Loewis
  0 siblings, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-14 13:12 UTC (permalink / raw)
  To: Martin v. Loewis
  Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc


----- Original Message -----
From: "Martin v. Loewis" <martin@v.loewis.de>
> "David Abrahams" <david.abrahams@rcn.com> writes:
>
> > If I understand correctly, weak symbols were introduced to allow things
> > like users replacing malloc, free, operator new, etc.
>
> That's why they were introduced, yes. It turns out that they are
> useful for a number of other things: you can emit duplicate
> definitions of the same symbol, and it won't be a linker error. This
> usage is particularly important for C++, to support virtual tables,
> typeinfo objects, template instantiations, etc.


Yes, thanks, I understand the reasons. However, there's a small difference
if I understand things correctly: in these C++ cases, typically /all/ of
the definitions are weak, right?

> > I think the same case could be made by putting malloc/free replacements in
> > each of two extension modules which are linked to a common shared lib,
> > right?
>
> Not sure what "the same case" is, here.


I mean that we can make the "same" (well, analogous) argument for what the
symbol sharing behavior ought to be.

> > Then memory allocated by ext2 couldn't be freed by the common lib,
> > and vice-versa.
>
> If both ext1 and ext2 override the allocator in libcore, then all
> calls in libcore would use the definition in ext1, but calls in ext2
> would use their own definition, yes.
>
> > Can you think of a more-minimal or more-compelling test case?
>
> Think of: yes, I would just make three functions, each printing a
> different message. It's not a compelling test case, but it
> demonstrates the difference to Solaris, and it demonstrates that the
> libcore and ext2 use different definitions, even though they are
> linked together.


Yes, it's minimal, but when you don't attach some connotation of real-world
semantics to these functions it's easy to miss the reasons that it should
work differently.

-Dave

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-14  2:14                     ` Martin v. Loewis
  2002-05-14  6:07                       ` David Abrahams
@ 2002-05-14 13:23                       ` Sean Parent
  2002-05-14 14:08                         ` David Abrahams
  1 sibling, 1 reply; 104+ messages in thread
From: Sean Parent @ 2002-05-14 13:23 UTC (permalink / raw)
  To: c++std-ext; +Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc

[Reading this thread gets confusing about when people are discussing
explicitly loaded (and unloaded) libraries vs. libraries which are
dynamically linked - the issues are very different so let's try to be clear
about what case is being discussed in which message.]

I'm discussing in this message the case of dynamic linking.

on 5/13/02 11:16 PM, Martin v. Loewis at martin@v.loewis.de wrote:

> I believe that it is a useful approach to keep the notion that shared
> libraries are part of a "program", and that the program still ought to
> implement the semantics of the relevant standards. In this specific
> case, any Python extension would be a "program" (though free-standing,
> since it has a different entry point).

[A small issue here - let's not introduce "shared" into the vocabulary.
Whether or not a library is shared by other processes when it is dynamically
linked usually has no relevance - and when it is relevant, for example in a
system that allows for an instantiation of the library to be shared across
processes, the added complexity only complicates this discussion.]

By "shared" I will assume you mean "dynamically linked". Although I agree
with the term "ought to" I'm unaware of any system which currently behaves
even remotely like this ideal. David's proposal, I believe, amounts to
requiring that dynamic linking is handled much more like static linking with
a resolution mechanism for duplicate symbols. This proposal would require
significantly more sophisticated loaders (than current loaders which usually
do a very simple name binding).

This opens the question - is it necessary to force a new model for loading
libraries to get reasonable semantics for C++ dynamic linking? If the answer
is "yes" then we will have a very difficult adoption going forward as the
library loader is a relatively fundamental piece of any operating system.
Beyond those issues there is the fact that dynamically linked libraries
currently provide a degree of encapsulation - and that encapsulation is a
major reason developers use dynamic linking. Forcing a broad notion of
symbol resolution potentially defeats many of the benefits.
 
> Unfortunately, apart from the obvious cases, it is pretty difficult to
> give a good definition; it is much easier to declare problematic cases
> as undefined. You probably cannot convince compiler vendors to follow
> what you consider a reasonable semantics unless you specify what that
> semantics is, and contribute code or money to change their
> implementations.

What are the obvious cases? The problematic cases in C++ include aspects
of RTTI, exception handling, inline functions, templates, static members,
and memory allocation. This currently disqualifies all of the C++ libraries,
and large portions of the language from being used in any consistent manner
in dynamically linked libraries.

Although I don't mind considering changes to the runtime to make C++ work
better I think changes of this nature should be considered a last resort.
Rather, I would like us to look in depth at each issue and determine the
following:

"Is it reasonable to require that a conforming implementation of the
language make this work with existing dynamic linking implementations?"

"If not, is there a change that can be made to the language or library
definition that would allow this?"

"If not, is this a language or library feature which is isolated enough that
it can be deemed to have undefined behavior when used with dynamically
linked libraries without "crippling" the usefulness of the language for
these purposes?"

"If not, then is there a change to the loading mechanism that would be
feasible for a platform provider to make that could enable this."

Because we don't want to mess with "our language" - I fear we are jumping to
the conclusion that everything falls into the last category.

Sean

-- 
Sean Parent
Sr. Computer Scientist II
Advanced Technology Group
Adobe Systems Incorporated
sparent@adobe.com


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-14  6:07                       ` David Abrahams
@ 2002-05-14 13:53                         ` Martin v. Loewis
  2002-05-14 14:45                           ` David Abrahams
  2002-05-14 15:28                           ` Jason Merrill
  0 siblings, 2 replies; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-14 13:53 UTC (permalink / raw)
  To: David Abrahams
  Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc, c++std-ext

"David Abrahams" <david.abrahams@rcn.com> writes:

> Speculating, of course: I doubt it would have much impact because users
> typically know better than to use RTTI and EH in their inner loops; every
> good C++ book I've seen warns that these features are not always fast.

I'm guessing also, but I see more and more users of dynamic_cast
around me, after they get told that it is safer than static_cast; I
agree with them that users should use dynamic_cast if they can't
easily guarantee that static_cast will be always correct.

So expect more usage of RTTI in the future.
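
A small sketch of the checked downcast in question, using hypothetical
`Shape`, `Circle` and `Square` classes. The point is that dynamic_cast goes
through the RTTI comparison discussed in this thread on every call.

```cpp
#include <string>

struct Shape { virtual ~Shape() {} };
struct Circle : Shape {};
struct Square : Shape {};

// dynamic_cast consults RTTI at run time and yields a null pointer when
// the object is not actually a Circle. A static_cast to Circle* would
// compile for any Shape* and give undefined behaviour on a mismatch.
inline std::string describe(Shape* s) {
    if (dynamic_cast<Circle*>(s) != nullptr)
        return "circle";
    return "not a circle";
}
```

Calling `describe` on a `Square` returns "not a circle" rather than
misbehaving, which is why the cast is recommended when the static type
cannot easily be guaranteed.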

> I totally agree. In that case you'll have to accept my shared linking
> semantics, I think. In today's model, depending on the order in which it is
> loaded w.r.t. other "programs", a "program" may or may not share with its
> libraries.

I can't see all consequences, so I cannot really bless that approach
right now. Implementing that approach will tell what new problems it
causes.

> > In this specific case, any Python extension would be a "program"
> > (though free-standing, since it has a different entry point).
>
> So, what happens when multiple "programs" link to the same shared library?

Each object in the shared library should still exist only once. That
means that some objects can change their state "spontaneously", from
the view of a "program", but apart from that, ODR and everything is
preserved.

> > You probably cannot convince compiler vendors to follow
> > what you consider a reasonable semantics unless you specify what that
> > semantics is, and contribute code or money to change their
> > implementations.
> 
> There's one other way: the pressure of standards. However, that takes a
> long time, and seldom works without a reference implementation...

I think it should never work without a reference implementation:
standardization should codify existing practice, instead of setting
new grounds. That is OT here, of course.

> I think you misunderstand me. What you wrote doesn't contradict item 4:
> since the graph is bidirectional, if A is reachable from B then B is
> reachable from A. I don't care /which/ definition is chosen - they're
> required to be the same anyway - I only care that they're shared.

I see. If that is what Solaris does, it sounds reasonable indeed. If
Solaris does something else, I'd like to see that approach confronted
with your specification.

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-14 13:23                       ` Sean Parent
@ 2002-05-14 14:08                         ` David Abrahams
  2002-05-14 18:38                           ` Sean Parent
  0 siblings, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-14 14:08 UTC (permalink / raw)
  To: c++std-ext; +Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc


----- Original Message -----
From: "Sean Parent" <sparent@adobe.com>

> on 5/13/02 11:16 PM, Martin v. Loewis at martin@v.loewis.de wrote:
>
> > I believe that it is a useful approach to keep the notion that shared
> > libraries are part of a "program", and that the program still ought to
> > implement the semantics of the relevant standards. In this specific
> > case, any Python extension would be a "program" (though free-standing,
> > since it has a different entry point).
>
> By "shared" I will assume you mean "dynamically linked". Although I agree
> with the term "ought to" I'm unaware of any system which currently behaves
> even remotely like this ideal. David's proposal, I believe, amounts to
> requiring that dynamic linking is handled much more like static linking with
> a resolution mechanism for duplicate symbols. This proposal would require
> significantly more sophisticated loaders (than current loaders which usually
> do a very simple name binding).

Could you be more specific about the differences you are mentioning? It
seems to me that the current model is identical to static linking when all
the objects are linked to one another directly (i.e. no use of dlopen with
RTLD_LOCAL).

> This opens the question - is it necessary to force a new model for loading
> libraries to get reasonable semantics for C++ dynamic linking?

It depends on your definition of "reasonable". If you want to support a
model which includes RTLD_LOCAL and doesn't add new restrictions on what
may be linked together, then I think the answer is yes.

> If the answer
> is "yes" then we will have a very difficult adoption going forward as the
> library loader is a relatively fundamental piece of any operating system.

It's not clear. I wonder whether any software that relies on the current
behavior (as opposed to what I'm proposing) can possibly work right. I
think the only legal programs that could be broken by the change I'm
proposing would be those that unload libraries, and I'm not even certain of
that: it depends on what the semantics of unloading are.

> Beyond those issues there is the fact that dynamically linked libraries
> currently provide a degree of encapsulation - and that encapsulation is a
> major reason developers use dynamic linking. Forcing a broad notion of
> symbol resolution potentially defeats many of the benefits.

I think if you look at my proposal closely, you'll see that it is fairly
conservative. It doesn't introduce any new concepts, just a small change to
the existing behavior which removes an order-dependency. Symbol sharing
across library boundaries would only happen where it would have happened
with the current semantics if the library load order were changed.

> > Unfortunately, apart from the obvious cases, it is pretty difficult to
> > give a good definition; it is much easier to declare problematic cases
> > as undefined. You probably cannot convince compiler vendors to follow
> > what you consider a reasonable semantics unless you specify what that
> > semantics is, and contribute code or money to change their
> > implementations.
>
> What are the obvious cases? The problematic cases in C++ include aspects
> of RTTI, exception handling, inline functions, templates, static members,
> and memory allocation. This currently disqualifies all of the C++ libraries,
> and large portions of the language from being used in any consistent manner
> in dynamically linked libraries.
>
> Although I don't mind considering changes to the runtime to make C++ work
> better I think changes of this nature should be considered a last resort.
> Rather, I would like us to look in depth at each issue and determine the
> following:
>
> "Is it reasonable to require that a conforming implementation of the
> language make this work with existing dynamic linking implementations?"
>
> "If not, is there a change that can be made to the language or library
> definition that would allow this?"
>
> "If not, is this a language or library feature which is isolated enough that
> it can be deemed to have undefined behavior when used with dynamically
> linked libraries without "crippling" the usefulness of the language for
> these purposes?"
>
> "If not, then is there a change to the loading mechanism that would be
> feasible for a platform provider to make that could enable this."
>
> Because we don't want to mess with "our language" - I fear we are jumping to
> the conclusion that everything falls into the last category.

Actually, I *do* want to mess with our language. At least, I'm among those
who think the language definition should say something about the semantics
of shared libraries. However, messing with the language has to happen at
both ends: you have to gain implementation experience in addition to
thinking about what the standard should mandate.

-Dave


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-14 13:12                         ` David Abrahams
@ 2002-05-14 14:17                           ` Martin v. Loewis
  0 siblings, 0 replies; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-14 14:17 UTC (permalink / raw)
  To: David Abrahams
  Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc

"David Abrahams" <david.abrahams@rcn.com> writes:

> Yes, thanks, I understand the reasons. However, there's a small difference
> if I understand things correctly: in these C++ cases, typically /all/ of
> the definitions are weak, right?

Right. There is the additional feature of "weak undefined references"
also, which the linker resolves to 0 if no definition is found at
run-time.  This is used to wrap thread libraries etc, so that the
application will link fine if no thread library is used; if the thread
library is linked, it will also be used to link the weak undefined
references.

g++ uses that to implement thread-safe exception handling, without
requiring two versions of the runtime library.
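
A minimal sketch of a weak undefined reference, assuming GCC or Clang on an
ELF platform. `hypothetical_thread_init` is an invented name used only for
illustration; it is not a real libgcc or pthread entry point.

```cpp
#include <string>

// Weak undefined reference (GCC/ELF extension): if no object in the
// link defines this hypothetical hook, its address resolves to null
// instead of producing a link error.
extern "C" void hypothetical_thread_init() __attribute__((weak));

// Report which path is taken, calling the hook only when it is present.
inline std::string init_threads_if_available() {
    if (hypothetical_thread_init) {
        hypothetical_thread_init();
        return "threaded";
    }
    return "single-threaded";
}
```

When nothing in the link provides the hook, the function returns
"single-threaded"; linking in a library that defines it switches the same
binary code to the other path without a rebuild.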

> Yes, it's minimal, but when you don't attach some connotation of
> real-world semantics to these functions it's easy to miss the
> reasons that it should work differently.

I've been processing g++ bug reports for a while, and I usually
requested that people had their report in the following form:

- what is the code being executed
- what is the behaviour you observe
- what is the behaviour you expect

Optionally, there is a fourth item

- why do you think this behaviour is desirable.

This structure allows me to understand the issue quickly, without
having to understand complicated real-world architectures first, with
loads of unrelated stuff. I hated reports where people attached their
code as-is (of course, in gcc, there are specific exceptions to this
rule, e.g. for ICEs).

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-14 13:53                         ` Martin v. Loewis
@ 2002-05-14 14:45                           ` David Abrahams
  2002-05-15  2:54                             ` Martin v. Loewis
  2002-05-14 15:28                           ` Jason Merrill
  1 sibling, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-14 14:45 UTC (permalink / raw)
  To: Martin v. Loewis
  Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc, c++std-ext


----- Original Message -----
From: "Martin v. Loewis" <martin@v.loewis.de>

> > > In this specific case, any Python extension would be a "program"
> > > (though free-standing, since it has a different entry point).
> >
> > So, what happens when multiple "programs" link to the same shared library?
>
> Each object in the shared library should still exist only once.

Good, then we agree!

> That
> means that some objects can change their state "spontaneously", from
> the view of a "program",

Hmm, what does that mean? In my proposal there is no "state change" AFAICT.

> but apart from that, ODR and everything is
> preserved.
>
> > > You probably cannot convince compiler vendors to follow
> > > what you consider a reasonable semantics unless you specify what that
> > > semantics is, and contribute code or money to change their
> > > implementations.
> >
> > There's one other way: the pressure of standards. However, that takes a
> > long time, and seldom works without a reference implementation...
>
> I think it should never work without a reference implementation:

I agree. However, in practice things have sometimes gone differently.

> > I think you misunderstand me. What you wrote doesn't contradict item 4:
> > since the graph is bidirectional, if A is reachable from B then B is
> > reachable from A. I don't care /which/ definition is chosen - they're
> > required to be the same anyway - I only care that they're shared.
>
> I see. If that is what Solaris does, it sounds reasonable indeed. If
> Solaris does something else, I'd like to see that approach confronted
> with your specification.

Sorry, could you explain? What would it mean to confront the Solaris
approach with my specification?

-Dave


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-14 13:53                         ` Martin v. Loewis
  2002-05-14 14:45                           ` David Abrahams
@ 2002-05-14 15:28                           ` Jason Merrill
  2002-05-14 18:32                             ` Daniel Jacobowitz
  1 sibling, 1 reply; 104+ messages in thread
From: Jason Merrill @ 2002-05-14 15:28 UTC (permalink / raw)
  To: Martin v. Loewis
  Cc: David Abrahams, Mark Mitchell, Ralf W. Grosse-Kunstleve, gcc, c++std-ext

>>>>> "Martin" == Martin v Loewis <martin@v.loewis.de> writes:

> I see. If that is what Solaris does, it sounds reasonable indeed.

It isn't; Solaris has the same suboptimal behavior as Linux for most
symbols.  It is unclear to me why it happens to share the typeinfo names in
such a way that the testcase passes; my attempts to create a similar C
testcase have produced the same results as on Linux.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-14 15:28                           ` Jason Merrill
@ 2002-05-14 18:32                             ` Daniel Jacobowitz
  2002-05-15  1:34                               ` Martin v. Loewis
  0 siblings, 1 reply; 104+ messages in thread
From: Daniel Jacobowitz @ 2002-05-14 18:32 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Martin v. Loewis, David Abrahams, Mark Mitchell,
	Ralf W. Grosse-Kunstleve, gcc, c++std-ext

On Tue, May 14, 2002 at 10:55:32PM +0100, Jason Merrill wrote:
> >>>>> "Martin" == Martin v Loewis <martin@v.loewis.de> writes:
> 
> > I see. If that is what Solaris does, it sounds reasonable indeed.
> 
> It isn't; Solaris has the same suboptimal behavior as Linux for most
> symbols.  It is unclear to me why it happens to share the typeinfo names in
> such a way that the testcase passes; my attempts to create a similar C
> testcase have produced the same results as on Linux.

A slightly gross possibility - is it possible that the Solaris linker
is treating typeinfo names specially, for these same reasons, and based
on what it knows generates them?  Unlikely...

-- 
Daniel Jacobowitz                           Carnegie Mellon University
MontaVista Software                         Debian GNU/Linux Developer

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-14 14:08                         ` David Abrahams
@ 2002-05-14 18:38                           ` Sean Parent
  2002-05-14 22:50                             ` David Abrahams
  0 siblings, 1 reply; 104+ messages in thread
From: Sean Parent @ 2002-05-14 18:38 UTC (permalink / raw)
  To: c++std-ext; +Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc

on 5/14/02 1:22 PM, David Abrahams at david.abrahams@rcn.com wrote:

>> By "shared" I will assume you mean "dynamically linked"...
> 
> Could you be more specific about the differences you are mentioning? It
> seems to me that the current model is identical to static linking when all
> the objects are linked to one another directly (i.e. no use of dlopen with
> RTLD_LOCAL).

Certainly, "shared" is usually used to denote that the code in a single
library file is shared by multiple applications. That sharing isn't relevant
to this thread. But other systems support other loading models of libraries
- you point out RTLD_LOCAL for Linux. Other systems have other models - CFM
on the Macintosh allows a library to be singly instantiated across multiple
processes. Palm (and I would suspect many small systems that run single
address space) have a similar notion. A "shared library" is just an overly
broad term and the canonical meaning of the code being shared has no
relevance.

To narrow the scope I wanted to focus on dynamically linked - not explicitly
loaded (which can have other semantics and runs into lifetime issues). I
would like to tackle dlopen (and dlclose) at some point but first we need a
solid notion of "dynamically linked".

>> This opens the question - is it necessary to force a new model for loading
>> libraries to get reasonable semantics for C++ dynamic linking?
> 
> It depends on your definition of "reasonable". If you want to support a
> model which includes RTLD_LOCAL and doesn't add new restrictions on what
> may be linked together, then I think the answer is yes.

I would like to start with being able to use _any_ C++ (outside of the
subset that is C) from within a dynamically linked library. That currently
isn't possible without a thorough understanding of the runtime environment,
language, standard libraries, and application so that you can hack some
workable subset. I'm perfectly fine with there being restrictions about what
can be linked together - in fact I think restrictions are probably desirable
if they contribute to the encapsulation provided by the library.
 
>> If the answer
>> is "yes" then we will have a very difficult adoption going forward as the
>> library loader is a relatively fundamental piece of any operating system.
> 
> It's not clear. I wonder whether any software that relies on the current
> behavior (as opposed to what I'm proposing) can possibly work right. I
> think the only legal programs that could be broken by the change I'm
> proposing would be those that unload libraries, and I'm not even certain of
> that: it depends on what the semantics of unloading are.

Since the C++ language doesn't currently define any behavior in regards to
how C++ works in dynamically linked code I don't know that there is any such
thing as a "legal program". However, there is a lot of code written using
aspects of C++ that makes heavy use of dynamic linking - Mat can chime in
but InDesign is a great example.

Current code frequently relies on the fact that "global" symbols _are not_
visible or shared outside the DLL. If they were, the implementation would be
tightly revision locked with the library it is linked against. Rev locking
components is something to be avoided.

>> Beyond those issues there is the fact that dynamically linked libraries
>> currently provide a degree of encapsulation - and that encapsulation is a
>> major reason developers use dynamic linking. Forcing a broad notion of
>> symbol resolution potentially defeats many of the benefits.
> 
> I think if you look at my proposal closely, you'll see that it is fairly
> conservative. It doesn't introduce any new concepts, just a small change to
> the existing behavior which removes an order-dependency. Symbol sharing
> across library boundaries would only happen where it would have happened
> with the current semantics if the library load order were changed.

I'm not familiar enough with how Linux works but your proposal seems to
change the current dynamic linking semantics to be equivalent to static
linking. That doesn't get you any of the benefits of dynamic linking other
than saving bytes on disk. I'm not sure what you mean about "current
semantics if the library load order were changed." Today, in every loader
I'm using, conflicting symbols are an error - not a load order issue. In
fact I can usually specify the load order to be anything that I want. I'm
also not certain how far you expect your proposal to go - it is targeted at
some set of symbol sharing but is it isolated to symbols defined by the
application, or does it include "implicit" symbols defined by the compiler
and runtime libraries? Exception handling tables, RTTI, overloads to
operator new and delete all fall outside the notion of user defined symbols.
Are these "symbols" somehow exported (not in the template export sense)?

> Actually, I *do* want to mess with our language. At least, I'm among those
> who think the language definition should say something about the semantics
> of shared libraries. However, messing with the language has to happen at
> both ends: you have to gain implementation experience in addition to
> thinking about what the standard should mandate.

I agree with gaining implementation experience - but I don't think we should
start with pursuing changes to loaders but should start with changes to the
language and the compiler implementations.

I'll try to start with an example (this one bit me trying to integrate some
code into InDesign - so it's "real world", and a related issue caused me
problems with Photoshop 7.).


-----
Problem: The behavior of overrides of operator new and delete within an
application is not defined with regard to dynamic libraries.

Discussion: Consider an application that globally overrides operator new and
delete as allowed by the standard, and that is also dynamically linked to a
library.

Under current compiler implementations (VC++ 6 and CodeWarrior 7), the
overrides are not visible to the library.

Under VC++ 6 this means that the application and the library are executing
out of separate memory allocators. Any memory allocation which can
"straddle" the boundary may fail. Because the std::string library is
supplied by Microsoft pre-built, and the inlines may cause items to straddle
the boundary, this means that std::strings or any objects containing them
cannot straddle the dll boundary (by straddle I mean allocated on one side
and invoked from the other).

With CodeWarrior 7 a workable solution was found by refactoring and
rebuilding the standard runtime libraries. However, even with that
workaround the standard libraries are initialized prior to _any_
initialization happening within the application. The static initializers for
std::locale call operator new and delete (which are cross linked in from the
main application) - which means that operator new and delete are invoked
prior to the exception handling tables being initialized. A careful review
of the behavior of try and catch was required to determine that this
workaround was "safe" so long as an exception is not thrown while calling
operator new, including in the constructor for the objects relied upon by
the implementation of std::locale - or by any other initializer in any
library.

Solutions:

One possible, though undesirable, solution is to simply say that overrides
of operator new and delete are not allowed from applications using dynamic
linking.

If it is allowed - what are some "reasonable semantics"? I would state the
following:

1. Any runtime initialization should happen prior to any static initializers
being executed from any library in the linkage closure.

2. A single override of operator new and delete should be allowed to exist
anywhere within the linkage closure. Duplicate overrides give undefined
behavior (or a failure could be required).

3. Overrides are visible to all libraries loaded as part of the closure.

Could something that meets these semantics be implemented given the current
implementation for most loaders? I believe so - (I'm certain I could
implement this for Metrowerks w/ CFM but it may rely on being able to cross
link libraries which I'm not certain is generally available). In this case,
all that may be required is a statement in the standard that these are the
semantics that can be relied upon.
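
To make the subject of this example concrete, here is a minimal sketch of a
global replacement of operator new and delete in an application. The counters
g_allocations and g_deallocations are hypothetical, added only so the effect
can be observed, and malloc/free stand in for whatever allocator a real
application would use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counters, present only so the effect of the replacement
// can be observed; they are not part of any real allocator.
static std::size_t g_allocations = 0;
static std::size_t g_deallocations = 0;

// Replacement global allocation function, as permitted by the standard.
void* operator new(std::size_t size)
{
    if (size == 0) size = 1;            // never hand malloc a zero request
    void* p = std::malloc(size);
    if (!p) throw std::bad_alloc();
    ++g_allocations;
    return p;
}

// Matching replacement deallocation function.
void operator delete(void* p) noexcept
{
    if (p) ++g_deallocations;
    std::free(p);
}

// Sized form (C++14 and later), forwarded so both forms agree.
void operator delete(void* p, std::size_t) noexcept
{
    ::operator delete(p);
}
```

Whether code inside a dynamically linked library ends up in these functions
or in the runtime's defaults is exactly the behavior that is not defined
today.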

Sean

-- 
Sean Parent
Sr. Computer Scientist II
Advanced Technology Group
Adobe Systems Incorporated
sparent@adobe.com



* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-14 18:38                           ` Sean Parent
@ 2002-05-14 22:50                             ` David Abrahams
  2002-05-15 11:38                               ` Sean Parent
  0 siblings, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-14 22:50 UTC (permalink / raw)
  To: c++std-ext; +Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc


----- Original Message -----
From: "Sean Parent" <sparent@adobe.com>


> To: C++ extensions mailing list
>
> on 5/14/02 1:22 PM, David Abrahams at david.abrahams@rcn.com wrote:
>
> >> By "shared" I will assume you mean "dynamically linked"...
> >
> > Could you be more specific about the differences you are mentioning? It
> > seems to me that the current model is identical to static linking when all
> > the objects are linked to one another directly (i.e. no use of dlopen with
> > RTLD_LOCAL).


This is weird. You snipped out the very thing I was asking about above,
leaving it looking as though I was asking about your use of terminology...
and then you answered a question I wasn't asking. Specifically, I was
asking about this passage:

> David's proposal, I believe amounts to requiring that dynamic linking is
> handled much more like static linking with a resolution mechanism for
> duplicate symbols. This proposal would require significantly more
> sophisticated loaders (than current loaders which usually
> do a very simple name binding).

And I was asking you to be more specific about the significant increase in
sophistication you think would be demanded of loaders.

I also went on to claim that Linux dynamic linking already provides
*precisely* the semantics of static linking in the simple case where there
is no use of RTLD_LOCAL. In other words I challenge your assertions above.

> Certainly, "shared" is usually used to denote that the code in a single
> library file is shared by multiple applications. That sharing isn't relevant
> to this thread.

No, we don't care (from a standards point-of-view) whether any code is
actually shared. However, "identity sharing" is absolutely relevant to this
thread: the observable behavior differences that can arise in the context
of shared libraries (if we ignore unloading for the moment) are ALL cases
of duplication of things which are "supposed" to arise as a single copy:
two different addresses (and thus values) for the same static member of an
inline function or function template, two different addresses for the same
function, two different run-time type identities for the same type, etc.
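
As a rough sketch, the single-copy expectations listed above can be written
down as checks. Within one translation unit they hold trivially; the point is
that a program is entitled to expect the same answers when the comparisons
are made from different libraries. Widget and the function names are
hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <typeinfo>

// An inline function with a local static: the program is supposed to
// contain exactly one 'n', however many translation units (or libraries)
// include this definition.
inline int& counter()
{
    static int n = 0;
    return n;
}

// A function template with a local static, one per instantiation.
template <class T>
int& per_type_counter()
{
    static int n = 0;
    return n;
}

struct Widget {};  // hypothetical type, for illustration only

// Returns true when the single-identity expectations hold as seen from here.
inline bool identities_are_unique()
{
    int& (*f)() = &counter;
    int& (*g)() = &counter;
    return &counter() == &counter()                                   // one static object
        && &per_type_counter<Widget>() == &per_type_counter<Widget>() // one per instantiation
        && f == g                                                     // one function address
        && typeid(Widget) == typeid(Widget);                          // one run-time type identity
}
```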

> But other systems support other loading models of libraries
> - you point out RTLD_LOCAL for Linux. Other systems have other models - CFM
> on the Macintosh allows a library to be singly instantiated across multiple
> processes. Palm (and I would suspect many small systems that run single
> address space) have a similar notion. A "shared library" is just an overly
> broad term and the canonical meaning of the code being shared has no
> relevance.

I'm not sure I agree. I think the CFM model where the library is shared
across processes is strikingly similar to the case we're discussing on
Linux if you take Martin's view that each library loaded with RTLD_LOCAL
should be viewed as a separate "program".

> To narrow the scope I wanted to focus on dynamically linked - not explicitly
> loaded (which can have other semantics and runs into lifetime issues). I
> would like to tackle dlopen (and dlclose) at some point but first we need a
> solid notion of "dynamically linked".

If you ignore dlopen and dlclose I don't think there's anything mysterious
about what it means on Unix: from a standards POV, there's nothing to
discuss because it works like static linking.

I don't think Windows can be discussed in the same breath, since it is a
different model and we're not going to be able to force them both into the
same box, if the box is going to be concrete enough to be useful.

> >> This opens the question - is it necessary to force a new model for
> >> loading libraries to get reasonable semantics for C++ dynamic linking?
> >
> > It depends on your definition of "reasonable". If you want to support a
> > model which includes RTLD_LOCAL and doesn't add new restrictions on what
> > may be linked together, then I think the answer is yes.
>
> I would like to start with being able to use _any_ C++ (outside of the
> subset that is C) from within a dynamically linked library. That currently
> isn't possible without a thorough understanding of the runtime environment,
> language, standard libraries, and application so that you can hack some
> workable subset.

Please be specific about the language features you feel you can't use on
Linux in shared libs, and why you think you can't use them... or why
special considerations about the runtime, etc., are required.

> I'm perfectly fine with there being restrictions about what
> can be linked together - in fact I think restrictions are probably desirable
> if they contribute to the encapsulation provided by the library.

Why is it better for the language designer, rather than the designer of
library X, to say "You can't link Y to X"?

> >> If the answer
> >> is "yes" then we will have a very difficult adoption going forward as the
> >> library loader is a relatively fundamental piece of any operating system.
> >
> > It's not clear. I wonder whether any software that relies on the current
> > behavior (as opposed to what I'm proposing) can possibly work right. I
> > think the only legal programs that could be broken by the change I'm
> > proposing would be those that unload libraries, and I'm not even certain of
> > that: it depends on what the semantics of unloading are.
>
> Since the C++ language doesn't currently define any behavior in regards to
> how C++ works in dynamically linked code I don't know that there is any such
> thing as a "legal program". However, there is a lot of code written using
> aspects of C++ that makes heavy use of dynamic linking - Mat can chime in
> but InDesign is a great example.

Of course; I didn't claim otherwise.

> Current code frequently relies on the fact that "global" symbols _are not_
> visible or shared outside the DLL. If they were, the implementation would be
> tightly revision locked with the library it is linked against. Rev locking
> components is something to be avoided.

The way you avoid that kind of visibility on Unix is with dlopen and
RTLD_LOCAL. Otherwise, you're discussing a windows-model concept. As a
cross-platform developer, I think it's important to be able to have this
kind of hiding, and that's one reason I don't think brushing aside dlopen
is appropriate.

> >> Beyond those issues there is the fact that dynamically linked libraries
> >> currently provide a degree of encapsulation - and that encapsulation is a
> >> major reason developers use dynamic linking. Forcing a broad notion of
> >> symbol resolution potentially defeats many of the benefits.
> >
> > I think if you look at my proposal closely, you'll see that it is fairly
> > conservative. It doesn't introduce any new concepts, just a small change to
> > the existing behavior which removes an order-dependency. Symbol sharing
> > across library boundaries would only happen where it would have happened
> > with the current semantics if the library load order were changed.
>
> I'm not familiar enough with how Linux works but your proposal seems to
> change the current dynamic linking semantics to be equivalent to static
> linking.

No, you've misinterpreted it. Furthermore, as I say above, if you don't use
dlopen it's already equivalent to static linking.

> That doesn't get you any of the benefits of dynamic linking other
> than saving bytes on disk.

Not true; it gets you component-based development.

> I'm not sure what you mean about "current
> semantics if the library load order were changed."

Let me review, then: In the case I'm talking about, the executable A opens
two libs B and C with dlopen. B and C each link dynamically to D in the
usual way. B, C, and D all contain calls to the same inline function which
has a static counter:

inline int count()
{
    static int n = 0;
    return n++;
}

D also contains the definition of:
int count2() { return count(); }

B and C each contain this definition:

namespace {
  void check_count()
  {
    int x = count2();
    assert(x + 1 == count());
  }
}

Assuming B is loaded first, calling check_count() in B always works, but in
C it always asserts. Load C first and the roles swap, so the behavior depends
on the order in which B and C were loaded. The change I'm proposing makes
check_count() work in both B and C.
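
For reference, here is the same code collapsed into a single translation
unit, which is the statically linked behavior that both B and C ought to
observe. It is only a sketch: check_count() is changed to return a bool
instead of asserting so the outcome can be inspected, and nothing here
exercises dlopen.

```cpp
#include <cassert>

// Shared inline function with a static counter (as in B, C, and D).
inline int count()
{
    static int n = 0;
    return n++;
}

// Lives in D in the scenario above.
int count2() { return count(); }

// Lives in B and in C in the scenario above. Returns the result rather
// than asserting, so a caller can see whether the two calls shared 'n'.
namespace {
  bool check_count()
  {
    int x = count2();          // reads n, then increments it
    return x + 1 == count();   // true only if count() saw the same n
  }
}
```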

> Today, in every loader
> I'm using, conflicting symbols are an error - not a load order issue.

Are you sure? The ones we want to be shared without error are usually hidden
from you: template instantiations, static variables in inline functions and
static data members of class templates, type_info, EH info, etc... Most
implementations use some sort of notion of "weak" symbols to ensure that
these things always get a single identity in the usual cases.

> In
> fact I can usually specify the load order to be anything that I want.

Yes, we're not talking about the usual cases.

> I'm
> also not certain how far you expect your proposal to go - it is targeted at
> some set of symbol sharing but is it isolated to symbols defined by the
> application, or does it include "implicit" symbols defined by the compiler
> and runtime libraries? Exception handling tables, RTTI, overloads to
> operator new and delete all fall outside the notion of user defined symbols.

Not new/delete; those can be replaced. My proposal is explicitly concerned
with those runtime-support symbols, though.

> I agree with gaining implementation experience - but I don't think we should
> start with pursuing changes to loaders but should start with changes to the
> language and the compiler implementations.

I guess I just disagree with you there. I don't think the problem on Linux
is really in the compilers. We can make the compiler do something which
works around a few of the problems (e.g. by comparing type_info::name() for
EH) but we can't really solve the problems in any meaningful way without
changing the loader.
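
A minimal sketch of that kind of workaround, assuming the runtime is free to
fall back on a string comparison when the type_info addresses differ. The
helper name same_type_by_name and the two structs are hypothetical, and this
is not a claim about what any particular runtime actually does.

```cpp
#include <cassert>
#include <cstring>
#include <typeinfo>

// Hypothetical helper: treat two type_info objects as the same type when
// they are the same object or when their implementation-defined names match.
inline bool same_type_by_name(const std::type_info& a, const std::type_info& b)
{
    return &a == &b || std::strcmp(a.name(), b.name()) == 0;
}

struct Error {};   // hypothetical types, for illustration only
struct Other {};
```

The string comparison costs more than a pointer comparison, and it only
papers over the type identity symptom; duplicated statics and function
addresses are untouched by it.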

> I'll try to start with an example (this one bit me trying to integrate some
> code into InDesign - so it's "real world", and a related issue caused me
> problems with Photoshop 7.).
>
>
> -----
> Problem: Behavior of overrides of operator new and delete within an
> application are not defined with regards to dynamic libraries.
>
> Discussion: Given an application that globally overrides operator new and
> delete as allowed by the standard. Said application is also dynamically
> linked to a library.
>
> Under current compiler implementations (VC++ 6 and CodeWarrior 7), the
> overrides are not visible to the library.

Okay, now we're in Windows land. That's a completely different domain and
may require different solutions... but I'm out of time for tonight.

-Dave



* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-14 18:32                             ` Daniel Jacobowitz
@ 2002-05-15  1:34                               ` Martin v. Loewis
  0 siblings, 0 replies; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-15  1:34 UTC (permalink / raw)
  To: Daniel Jacobowitz
  Cc: Jason Merrill, David Abrahams, Mark Mitchell,
	Ralf W. Grosse-Kunstleve, gcc, c++std-ext

Daniel Jacobowitz <drow@mvista.com> writes:

> A slightly gross possibility - is it possible that the Solaris linker
> is treating typeinfo names specially, for these same reasons, and based
> on what it knows generates them?  Unlikely...

Very unlikely - it might do so for SunPRO, but certainly not for gcc.

Regards,
Martin


* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-14 14:45                           ` David Abrahams
@ 2002-05-15  2:54                             ` Martin v. Loewis
  0 siblings, 0 replies; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-15  2:54 UTC (permalink / raw)
  To: David Abrahams
  Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc, c++std-ext

"David Abrahams" <david.abrahams@rcn.com> writes:

> > That means that some objects can change their state
> > "spontaneously", from the view of a "program",
>
> Hmm, what does that mean? In my proposal there is no "state change" AFAICT.

No, but if one such "program", e.g. ext1 invokes a method that happens
to change an object in libcore, then ext2 will observe that the object
changed its state, even though the "program" ext2 has done nothing to
cause this change of state. That's why I call the change, from the
view of ext2, spontaneously.

> > I see. If that is what Solaris does, it sounds reasonable indeed. If
> > Solaris does something else, I'd like to see that approach confronted
> > with your specification.
> 
> Sorry, could you explain? What would it mean to confront the Solaris
> approach with my specification?

Not confront, but compare. It means that both solutions are put next
to each other, side by side, and requirements are stated, and each
solution is judged with regard to these requirements.

Regards,
Martin


* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-14 22:50                             ` David Abrahams
@ 2002-05-15 11:38                               ` Sean Parent
  2002-05-15 11:50                                 ` Matthew Austern
  2002-05-15 16:36                                 ` David Abrahams
  0 siblings, 2 replies; 104+ messages in thread
From: Sean Parent @ 2002-05-15 11:38 UTC (permalink / raw)
  To: c++std-ext; +Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc

on 5/14/02 8:10 PM, David Abrahams at david.abrahams@rcn.com wrote:

> This is weird. You snipped out the very thing I was asking about above,
> leaving it looking as though I was asking about your use of terminology...

Sorry for the misunderstanding.

>> David's proposal, I believe amounts to requiring that dynamic linking is
>> handled much more like static linking with a resolution mechanism for
>> duplicate symbols. This proposal would require significantly more
>> sophisticated loaders (than current loaders which usually
>> do a very simple name binding).
> 
> And I was asking you to be more specific about the significant increase in
> sophistication you think would be demanded of loaders.
> 
> I also went on to claim that Linux dynamic linking already provides
> *precisely* the semantics of static linking in the simple case where there
> is no use of RTLD_LOCAL. In other words I challenge your assertions above.

Ahh - I don't have experience with Linux (as I had stated above). I had
assumed that RTLD_LOCAL vs. RTLD_GLOBAL simply referred to whether or not
the data section was reinstantiated. I assume that RTLD_LOCAL is only
available for dlopen and is not a mode for a dynamic link. I believe my
assertion holds for many loaders in common use.

> No, we don't care (from a standards point-of-view) whether any code is
> actually shared. However, "identity sharing" is absolutely relevant to this
> thread: the observable behavior differences that can arise in the context
> of shared libraries (if we ignore unloading for the moment) are ALL cases
> of duplication of things which are "supposed" to arise as a single copy:
> two different addresses (and thus values) for the same static member of an
> inline function or function template, two different addresses for the same
> function, two different run-time type identities for the same type, etc.

Problems also arise from two items that are supposed to be joined into a
single item - such as exception handling tables - this may not be an issue
with Linux - it is with CodeWarrior on the Mac however. Also replacements
for operator new and delete lead to two different items of the same name of
which one is supposed to be correct. You also have issues with
initialization order - the runtime environment may not be fully initialized
prior to statics being initialized.
 
> I'm not sure I agree. I think the CFM model where the library is shared
> across processes is strikingly similar to the case we're discussing on
> Linux if you take Martin's view that each library loaded with RTLD_LOCAL
> should be viewed as a separate "program".

It's a little bit reversed - it's a property of the library being linked
against rather than an option on the library being loaded - but it is
similar.

> If you ignore dlopen and dlclose I don't think there's anything mysterious
> about what it means on Unix: from a standards POV, there's nothing to
> discuss because it works like static linking.

That may be what it means on Linux - I think it's a stretch to generalize
that to UNIX. It certainly isn't what it means on Mac or Windows - and I'm
not convinced it is a desirable definition.

> I don't think Windows can be discussed in the same breath, since it is a
> different model and we're not going to be able to force them both into the
> same box, if the box is going to be concrete enough to be useful.

Oh boy - if you can't include Windows in the standard you really don't have
a standard. I hate that (like I hate VPC) but it is reality.

> Please be specific about the language features you feel you can't use on
> Linux in shared libs, and why you think you can't use them... or why
> special considerations about the runtime, etc., are required.

I can't use any of the features portably. I'm not a Linux developer - I
develop primarily on the Mac but everything I do also will have to run on
Windows. It would be nice if it could run on Unix for a couple of our
products - and Palm and PocketPC (really another Windows). Linux isn't
currently on the list. I'll probably move my primary development off CFM to
Mach-O at some point, basically I'm waiting for better tool and library
support for Metrowerks before I do that.

>> I'm perfectly fine with there being restrictions about what
>> can be linked together - in fact I think restrictions are probably desirable
>> if they contribute to the encapsulation provided by the library.
> 
> Why is it better for the language designer, rather than the designer of
> library X, to say "You can't link Y to X"?

If the language gives me the control to specify it in my library design -
great. If the language requires that all my static symbols and runtime
symbols be exported then I can't use it. I can't afford to build an economy
where all of the add-ons to my product are revision locked to my product, my
compiler, my runtime. Those become handcuffs to the adoption of my next
release. Solid encapsulation is a good thing.

> The way you avoid that kind of visibility on Unix is with dlopen and
> RTLD_LOCAL. Otherwise, you're discussing a windows-model concept. As a
> cross-platform developer, I think it's important to be able to have this
> kind of hiding, and that's one reason I don't think brushing aside dlopen
> is appropriate.

I think we should come back to dlopen - I agree it is very important. But I
think in order to make it work we first need to settle what it means to
dynamically link a C++ application. The issues of dlopen only add complexity
with regards to scoping and lifespan.

> No, you've misinterpreted it. Furthermore, as I say above, if you don't use
> dlopen it's already equivalent to static linking.

Really? You have all these issues with duplicate symbols that don't get
merged with static linking? I guess I'm running with a very different model
- all my static linking "just works" - and dynamic linking isn't even close
to an equivalent.
 
>> That doesn't get you any of the benefits of dynamic linking other
>> than saving bytes on disk.
> 
> Not true; it gets you component-based development.

What does that buy you if everything is revision locked? You can't afford to
allow separate companies to develop components and you would have to give
them your sources to make it work. Might as well statically link it and
ship an updater.
 
>> I'm not sure what you mean about "current
>> semantics if the library load order were changed."
> 
> Let me review, then: In the case I'm talking about, the executable A opens
> two libs B and C with dlopen. B and C each link dynamically to D in the
> usual way. B, C, and D all contain calls to the same inline function which
> has a static counter:
> 
> inline int count()
> {
>   static int n = 0;
>   return n++;
> }
> 
> D also contains the definition of:
> int count2() { return count(); }
> 
> B and C each contain this definition:
> 
> namespace {
> void check_count()
> {
>   int x = count2();
>   assert(x + 1 == count());
> }
> }
> 
> Calling check_count() in B always works, but in C it always asserts. That
> behavior depends on the order in which B and C were loaded. The change I'm
> proposing makes check_count() work in both B and C.

And it would also work that way with a static library? With CFM B, C, and D
would all have unique copies of count and the static so it would always
assert. Unless you exported that symbol (you would have to look at the
link map to find its name) - in which case you couldn't load because you
would always have a conflict. "Weak" linking wouldn't help - that would only
allow you to load if no copies of the static were present.

To make this work you would have to make count() not be an inlined function,
put it into D, and export it.
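
A rough sketch of that arrangement, shown as one file for brevity. D_EXPORT
is a hypothetical macro; how a symbol is actually exported is
toolchain-specific (with CFM it would be an entry in the export list rather
than anything in the source), and the attribute spellings below assume a
compiler that supports them.

```cpp
#include <cassert>

// Hypothetical export macro; the spelling of "export this symbol" differs
// from one toolchain to the next.
#if defined(_WIN32)
#  define D_EXPORT __declspec(dllexport)
#elif defined(__GNUC__)
#  define D_EXPORT __attribute__((visibility("default")))
#else
#  define D_EXPORT
#endif

// Declaration that B and C would see in D's header: no inline body, so
// neither library gets its own copy of the static.
D_EXPORT int count();

// Definition compiled into D only; the one and only 'n' lives here.
int count()
{
    static int n = 0;
    return n++;
}

// Also in D, unchanged from the original example.
int count2() { return count(); }
```

The cost is that every call to count() from B or C now crosses the library
boundary, and the symbol becomes part of D's exported interface.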

>> Today, in every loader
>> I'm using, conflicting symbols are an error - not a load order issue.
> 
> Are you sure? The ones we want to be shared without error are usually hidden
> from you: template instantiations, static variables in inline functions and
> static data members of class templates, type_info, EH info, etc... Most
> implementations use some sort of notion of "weak" symbols to ensure that
> these things always get a single identity in the usual cases.

Quite certain with regards to CFM and the Metrowerks runtime - as I had
noted I had to repackage the runtime to get any kind of dll support to work
at all - I'm quite familiar with what it does and does not support. I also
had to build the export list necessary to get things wired together by
finding mangled names in link maps. Given that CFM was based on (and I
believe is a superset of) XCOFF I'd be surprised if XCOFF worked any
differently (my knowledge of XCOFF and AIX though is about a decade out of
date). I work with Alan Lillich who created CFM though so I'll ask him when
I get a moment.

> Not new/delete; those can be replaced. My proposal is explicitly concerned
> with those runtime-support symbols, though.

Okay - except they aren't always just "symbols" to be aliased (maybe they
are in Linux). It seems rather implementation dependent on the loader
though.

> I guess I just disagree with you there. I don't think the problem on Linux
> is really in the compilers. We can make the compiler do something which
> works around a few of the problems (i.e. by comparing typeinfo::name() for
> EH) but we can't really solve the problems in any meaningful way without
> changing the loader.

I can see that - it sounds like with Linux a lot already just works - great.
But what does work isn't defined to work in the standard, and I'm not sure
it's a reasonable extension to say "because it works on Linux it could be
made to work anywhere." I'm also still not convinced that the Linux
direction is the direction the standard should be going in.

> Okay, now we're in Windows land. That's a completely different domain and
> may require different solutions... but I'm out of time for tonight.

Windows and Mac land - and most of what you are taking for granted just
doesn't work that way on these platforms. Before we jump in to solve the
last bits for Linux I think we need to step back and define what the first
bits are for the standard.

-- 
Sean Parent
Sr. Computer Scientist II
Advanced Technology Group
Adobe Systems Incorporated
sparent@adobe.com



* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-15 11:38                               ` Sean Parent
@ 2002-05-15 11:50                                 ` Matthew Austern
  2002-05-15 12:29                                   ` Joe Buck
  2002-05-15 16:36                                 ` David Abrahams
  1 sibling, 1 reply; 104+ messages in thread
From: Matthew Austern @ 2002-05-15 11:50 UTC (permalink / raw)
  To: c++std-ext; +Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc

On Wednesday, May 15, 2002, at 11:32 AM, Sean Parent wrote:

>> I guess I just disagree with you there. I don't think the problem on Linux
>> is really in the compilers. We can make the compiler do something which
>> works around a few of the problems (i.e. by comparing typeinfo::name() for
>> EH) but we can't really solve the problems in any meaningful way without
>> changing the loader.
>
> I can see that - it sounds like with Linux a lot already just works - great.
> But what does work isn't defined to work in the standard, and I'm not sure
> it's a reasonable extension to say "because it works on Linux it could be
> made to work anywhere." I'm also still not convinced that the Linux
> direction is the direction the standard should be going in.
>
>> Okay, now we're in Windows land. That's a completely different domain and
>> may require different solutions... but I'm out of time for tonight.
>
> Windows and Mac land - and most of what you are taking for granted just
> doesn't work that way on these platforms. Before we jump in to solve the
> last bits for Linux I think we need to step back and define what the first
> bits are for the standard.

I agree.

I think it's a bit unfortunate that this discussion got crossposted
between the gcc development list and the C++ standardization
reflector; I think we might be having a discussion that's not very
useful to either group.

There are at least two interesting questions we might ask:
  (1) what should a future version of the C++ standard say
      about dynamic libraries?
  (2) considering what the standard says right now, and
      recognizing that we're talking about behavior outside
      the scope of the standard, what behavior for gcc would
      best serve users on a linux/ELF platform?

I think we should disentangle those two questions, and
probably hold them in different places.

(I'm actually concerned about a third question: what should
gcc do on a system that uses MACH-O instead of ELF.)

			--Matt


* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-15 11:50                                 ` Matthew Austern
@ 2002-05-15 12:29                                   ` Joe Buck
  2002-05-15 17:26                                     ` David Abrahams
  2002-05-15 20:21                                     ` H . J . Lu
  0 siblings, 2 replies; 104+ messages in thread
From: Joe Buck @ 2002-05-15 12:29 UTC (permalink / raw)
  To: Matthew Austern
  Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc, c++std-ext

Matt Austern writes:
> There are at least two interesting questions we might ask:
>   (1) what should a future version of the C++ standard say
>       about dynamic libraries?
>   (2) considering what the standard says right now, and
>       recognizing that we're talking about behavior outside
>       the scope of the standard, what behavior for gcc would
>       best serve users on a linux/ELF platform?

There's a hybrid question as well, since both C++ and ELF have standards.
C++ has the one-definition rule, which is contradicted by the way weak
symbols work in ELF, so we have a tension between two standards.
So:

	what should a future version of the ELF standard say
	about C++ dynamic libraries?

as it seems that any compiler targeting an OS that supports ELF
should provide the same semantics.


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-15 11:38                               ` Sean Parent
  2002-05-15 11:50                                 ` Matthew Austern
@ 2002-05-15 16:36                                 ` David Abrahams
  2002-05-15 19:26                                   ` Jeff Sturm
  1 sibling, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-15 16:36 UTC (permalink / raw)
  To: c++std-ext; +Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc


----- Original Message -----
From: "Sean Parent" <sparent@adobe.com>

> > And I was asking you to be more specific about the significant increase
> > in sophistication you think would be demanded of loaders.
> >
> > I also went on to claim that Linux dynamic linking already provides
> > *precisely* the semantics of static linking in the simple case where
> > there is no use of RTLD_LOCAL. In other words I challenge your assertions
> > above.
>
> Ahh - I don't have experience with Linux (as I had stated above) I had
> assumed that RTLD_LOCAL vs. RTLD_GLOBAL simply referred to whether or not
> the data section was reinstantiated. I assume that RTLD_LOCAL is only
> available for dlopen and not a mode for a dynamic link.

As far as I know, that's the case.

> I believe my assertion holds for many loaders in common use.

Maybe. There are certainly a few different models out there. In
standardization we'll need to at least look at all of them.

> > No, we don't care (from a standards point-of-view) whether any code is
> > actually shared. However, "identity sharing" is absolutely relevant to
> > this thread: the observable behavior differences that can arise in the
> > context of shared libraries (if we ignore unloading for the moment) are
> > ALL cases of duplication of things which are "supposed" to arise as a
> > single copy: two different addresses (and thus values) for the same
> > static member of an inline function or function template, two different
> > addresses for the same function, two different run-time type identities
> > for the same type, etc.
>
> Problems also arise from two items that are supposed to be joined into a
> single item - such as exception handling tables - this may not be an
> issue with Linux - it is with CodeWarrior on the Mac however. Also
> replacements for operator new and delete lead to two different items of
> the same name of which one is supposed to be correct.

Those are all exactly the sort of things that I'm talking about also.

> You also have issues with
> initialization order - the runtime environment may not be fully
> initialized prior to statics being initialized.

What, specifically, do you mean by "the runtime environment may not be
fully initialized"?

> > I'm not sure I agree. I think the CFM model where the library is shared
> > across processes is strikingly similar to the case we're discussing on
> > Linux if you take Martin's view that each library loaded with
> > RTLD_LOCAL should be viewed as a separate "program".
>
> It's a little bit reversed - it's a property of the library being linked
> against rather than an option on the library being loaded - but it is
> similar.

It sounds like you're saying that the difference is in which entity
initiates the sharing (or lack thereof). Except in the case where sharing
is directional (as in the Windows model) I think we can ignore the question
of where it is initiated.

> > If you ignore dlopen and dlclose I don't think there's anything
> > mysterious about what it means on Unix: from a standards POV, there's
> > nothing to discuss because it works like static linking.
>
> That may be what it means on Linux - I think it's a stretch to generalize
> that to UNIX.

You might be right. It seems to be the same on Solaris. Is ELF a Posix
standard, anybody? That would mean it could be generalized to Posix.

> It certainly isn't what it means on Mac or Windows - and I'm
> not convinced it is a desirable definition.

Whether or not it's desirable, it's an important model in wide use.
Whatever is standardized for C++ has to have a place for those.

> > I don't think Windows can be discussed in the same breath, since it is
> > a different model and we're not going to be able to force them both into
> > the same box, if the box is going to be concrete enough to be useful.
>
> Oh boy - if you can't include Windows in the standard you really don't
> have a standard. I hate that (like I hate VPC) but it is reality.

I don't begin to suggest that we shouldn't include Windows in the standard.
What I /am/ saying is that the standard has to accommodate a few different
models, because Windows and Linux are very different.

In particular, Windows sharing has two properties that you don't get on
Linux:

1. Specificity - only symbols which are named explicitly (in the source
code, for practical C++ development) are resolved externally or made
available for external resolution. Some Unices (e.g. AIX) also have this
property.

2. Directionality - a symbol is explicitly marked for export from or import
to a given object file. This is unique to Windows AFAIK, but I'm not
intimate with CFM (I stopped doing Mac development years ago).

I am convinced that any C++ standard for dynamic linking needs to
accommodate at least these two axes of variability by describing what you
can expect when they are/aren't supported by the platform. We might
simplify things a bit by dealing with RTLD_LOCAL as a special case of "bulk
specificity", but I'm jumping ahead here...

> I can't use any of the features portably. I'm not a Linux developer - I
> develop primarily on the Mac but everything I do also will have to run on
> Windows. It would be nice if it could run on Unix for a couple of our
> products - and Palm and PocketPC (really another Windows). Linux isn't
> currently on the list. I'll probably move my primary development off CFM
> to Mach-O at some point, basically I'm waiting for better tool and library
> support for Metrowerks before I do that.

Good. I hope that when we're done, people like you and I will be able to
develop and standardize a reasonably portable programming model which
doesn't prevent us from using any significant portion of the C++ language
in dynamic libraries, and that allows us to take advantage of the
techniques for isolating symbol spaces on various common OSes.

> >> I'm perfectly fine with there being restrictions about what
> >> can be linked together - in fact I think restrictions are probably
> >> desirable if they contribute to the encapsulation provided by the
> >> library.
> >
> > Why is it better for the language designer, rather than the designer of
> > library X, to say "You can't link Y to X"?
>
> If the language gives me the control to specify it in my library design -
> great. If the language requires that all my static symbols and runtime
> symbols be exported then I can't use it. I can't afford to build an
> economy where all of the add-ons to my product are revision locked to my
> product, my compiler, my runtime. Those become handcuffs to the adoption
> of my next release. Solid encapsulation is a good thing.

You're not talking about restricting what can be linked together, at least
not the way I understood the phrase. What you mean (AFAICT) is that you
want some way to control symbol visibility across a shared library
boundary. IOW, you want Specificity. I support that. You may not be happy
with it, but AFAICT on Linux, visibility control is an all-or-nothing
proposal at each library boundary.

> > The way you avoid that kind of visibility on Unix is with dlopen and
> > RTLD_LOCAL. Otherwise, you're discussing a Windows-model concept. As a
> > cross-platform developer, I think it's important to be able to have
> > this kind of hiding, and that's one reason I don't think brushing aside
> > dlopen is appropriate.
>
> I think we should come back to dlopen - I agree it is very important. But
> I think in order to make it work we first need to settle what it means to
> dynamically link a C++ application. The issues of dlopen only add
> complexity with regards to scoping and lifespan.

I don't think so. dlopen() is a special case of the more-general visibility
controls you get with __declspec on Windows (you essentially get a single
visible entry point and that's all).

> > No, you've misinterpreted it. Furthermore, as I say above,
> > if you don't use dlopen it's already equivalent to static
    ^^^^^^^^^^^^^^^^^^^^^^^
> > linking.
>
> Really? You have all these issues with duplicate symbols that don't get
> merged with static linking?

No, you don't have those issues with static linking. You don't have them
with dynamic linking either when you're not using dlopen().
I don't think you're reading what I'm writing very carefully.

> I guess I'm running with a very different model
> - all my static linking "just works" - and dynamic linking isn't
> even close to an equivalent.

Yes, that's a common situation. Some people have said that we shouldn't
even talk about dynamic linking in the standard if it isn't going to work
just like static linking; I think that's shortsighted, so we need to
accommodate your model as well.

> >> That doesn't get you any of the benefits of dynamic linking other
> >> than saving bytes on disk.
> >
> > Not true; it gets you component-based development.
>
> What does that buy you if everything is revision locked? You can't afford
> to allow separate companies to develop components and you would have to
> give them your sources to make it work. Might as well statically link it
> and ship an updater.

Not so; we have namespaces. Also, I know of single development groups that
like to do CBD within a single organization. Anyway, I'm not arguing that
we shouldn't discuss models for strong isolation. I'm just saying that the
Linux shared linking model is far from useless. Lots of people use it
happily. Also, it seems to me that your insistence that isolation is
important is in direct conflict with your insistence that dlopen() is
unimportant, unless you hope to get Linux to implement a completely
different linking model.

> >> I'm not sure what you mean about "current
> >> semantics if the library load order were changed."
> >
> > Let me review, then: In the case I'm talking about, the executable A
> > opens two libs B and C with dlopen. B and C each link dynamically to D
> > in the usual way. B, C, and D all contain calls to the same inline
> > function which has a static counter:
> >
> > inline int count()
> > {
> >   static int n = 0;
> >   return n++;
> > }
> >
> > D also contains the definition of:
> > int count2() { return count(); }
> >
> > B and C each contain this definition:
> >
> > namespace {
> > void check_count()
> > {
> >   int x = count2();
> >   assert(x + 1 == count());
> > }
> > }
> >
> > Calling check_count() in B always works, but in C it always asserts.
> > That behavior depends on the order in which B and C were loaded. The
> > change I'm proposing makes check_count() work in both B and C.
>
> And it would also work that way with a static library?

No, of course not: A opens B and C with dlopen(), which is what causes this
problem. I have said repeatedly that Linux static and dynamic linking are
semantically equivalent IN THE ABSENCE OF DLOPEN.
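
The duplication in this scenario can be modeled inside one translation unit. This is only a sketch, not the dlopen() setup itself: it assumes two hypothetical namespaces, image_d and image_c, standing in for two loaded images that each ended up with a private copy of count() and its static counter. The names check_merged() and check_duplicated() are illustrative and do not appear in the original example.

```cpp
#include <cassert>

// Hypothetical stand-in for library D: it owns one copy of count()
// and exposes count2(), as in the example quoted above.
namespace image_d {
inline int count()
{
    static int n = 0;
    return n++;
}
int count2() { return count(); }
}

// Hypothetical stand-in for library C in the case where its copy of
// count() was NOT merged with D's, so it has its own counter.
namespace image_c {
inline int count()
{
    static int n = 0;
    return n++;
}
}

// Models check_count() when both calls reach the same counter.
bool check_merged()
{
    int x = image_d::count2();
    return x + 1 == image_d::count();
}

// Models check_count() when count2() and count() use different counters.
bool check_duplicated()
{
    int x = image_d::count2();
    return x + 1 == image_c::count();
}
```

In this model check_merged() holds on every call, while check_duplicated() fails because the two counters advance independently. That is the same symptom as the assertion firing in C.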

> With CFM B, C, and D
> would all have unique copies of count and the static so it would always
> assert.

That's similar to Windows.

> Unless you exported that symbol (which you would have to look at the
> link map to find the name) - in which case you couldn't load because you
> would always have a conflict.

Also similar to Windows, except that exporting is simpler.

> "Weak" linking wouldn't help - that would only
> allow you to load if no copies of the static were present.

Does CFM have a "weak" linking model at all? If so, is it different from
ELF weak links?

[Also, how relevant is CFM to a discussion of future dynamic linking
standards? Does it have a long enough future at Apple to make it worth
investigating? Matt?]

> To make this work you would have to make count() not be an inlined
> function, put it into D, and export.

Again, similar to Windows. That's nice because it means the number of
distinct models is converging.

> > Not new/delete; those can be replaced. My proposal is explicitly
> > concerned with those runtime-support symbols, though.
>
> Okay - except they aren't always just "symbols" to be aliased (maybe they
> are in Linux).

Please be specific about what you mean. What, if not "symbols" (and does it
make any difference or is it just terminology)?

> > I guess I just disagree with you there. I don't think the problem on
> > Linux is really in the compilers. We can make the compiler do something
> > which works around a few of the problems (i.e. by comparing
> > typeinfo::name() for EH) but we can't really solve the problems in any
> > meaningful way without changing the loader.
>
> I can see that - it sounds like with Linux a lot already just works -
> great. But what does work isn't defined to work in the standard, and I'm
> not sure it's a reasonable extension to say "because it works on Linux it
> could be made to work anywhere."

Nobody's claiming that it is.

> I'm also still not convinced that the Linux
> direction is the direction the standard should be going in.

The standard is going to go in a direction that accommodates existing
important platforms (including Linux) - there's nothing you or I could do
to change that. Obviously I hope that it is only /close/ to accommodating
Linux as it exists today, because I want the loader behavior fixed.

> > Okay, now we're in Windows land. That's a completely different domain
> > and may require different solutions... but I'm out of time for tonight.
>
> Windows and Mac land - and most of what you are taking for granted just
> doesn't work that way on these platforms.

Please, don't underestimate me. I'm intimately familiar with dynamic
linking on Windows and I'm not taking anything for granted that doesn't
apply there.

> Before we jump in to solve the
> last bits for Linux I think we need to step back and define what the
> first bits are for the standard.

Since several conversational paths are crossing here, I'm going to continue
to press for Linux fixes on the GNU front, though it may well be premature
for the C++ standards thread.

-Dave


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-15 12:29                                   ` Joe Buck
@ 2002-05-15 17:26                                     ` David Abrahams
  2002-05-15 20:21                                     ` H . J . Lu
  1 sibling, 0 replies; 104+ messages in thread
From: David Abrahams @ 2002-05-15 17:26 UTC (permalink / raw)
  To: c++std-ext
  Cc: Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc, c++std-ext


----- Original Message -----
From: "Joe Buck" <Joe.Buck@synopsys.com>


> To: C++ extensions mailing list
> Message c++std-ext-5022
>
> Matt Austern writes:
> > There are at least two interesting questions we might ask:
> >   (1) what should a future version of the C++ standard say
> >       about dynamic libraries?
> >   (2) considering what the standard says right now, and
> >       recognizing that we're talking about behavior outside
> >       the scope of the standard, what behavior for gcc would
> >       best serve users on a linux/ELF platform?
>
> There's a hybrid question as well, since both C++ and ELF have standards.
> C++ has the one-definition rule, which is contradicted by the way weak
> symbols work in ELF, so we have a tension between two standards.

If so it may just mean that weak symbols are the wrong mechanism for
implementing some of these C++ features. If the ELF standards people
believe that the current ELF behavior is not mis-specified, we'd need a new
kind of weakness to support C++ well.

> So:
>
> what should a future version of the ELF standard say
> about C++ dynamic libraries?

Good question. What's the right forum for asking that?

-Dave


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-15 16:36                                 ` David Abrahams
@ 2002-05-15 19:26                                   ` Jeff Sturm
  0 siblings, 0 replies; 104+ messages in thread
From: Jeff Sturm @ 2002-05-15 19:26 UTC (permalink / raw)
  To: David Abrahams
  Cc: c++std-ext, Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve, gcc

On Wed, 15 May 2002, David Abrahams wrote:
> You may not be happy
> with it, but AFAICT on Linux, visibility control is an all-or-nothing
> proposal at each library boundary.

That's not quite true.  GNU binutils support visibility directives
(e.g. .hidden, .protected) for ELF that affect an individual symbol's
linkage.  As I understand it, .hidden symbols behave as ordinary
(non-exported) symbols on win32, and .protected as dllexport.

I'm not aware of any language frontend that makes use of these however.
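
For readers coming to this thread later: GCC releases from 4.0 onward do expose this per-symbol control at the source level through a visibility attribute. It is not available in the GCC 3.1 discussed in this thread. The sketch below assumes hypothetical macro names SKETCH_EXPORT and SKETCH_LOCAL and hypothetical functions helper() and api_entry(); only the attribute itself comes from GCC.

```cpp
#include <cassert>

// Hypothetical convenience macros; they expand to nothing on
// compilers that do not understand the GCC visibility attribute.
#if defined(__GNUC__)
#  define SKETCH_EXPORT __attribute__((visibility("default")))
#  define SKETCH_LOCAL  __attribute__((visibility("hidden")))
#else
#  define SKETCH_EXPORT
#  define SKETCH_LOCAL
#endif

// Hidden: callable from anywhere inside the same shared object,
// but not offered to other modules for dynamic symbol resolution.
SKETCH_LOCAL int helper(int v)
{
    return v * 2;
}

// Default visibility: part of the library's exported interface.
SKETCH_EXPORT int api_entry(int v)
{
    return helper(v) + 1;
}
```

Built into a shared object, only api_entry() would be expected to show up among the dynamically exported symbols. Within a single program the two functions behave like ordinary functions.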

Jeff

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-15 12:29                                   ` Joe Buck
  2002-05-15 17:26                                     ` David Abrahams
@ 2002-05-15 20:21                                     ` H . J . Lu
  2002-05-15 22:35                                       ` David Abrahams
  1 sibling, 1 reply; 104+ messages in thread
From: H . J . Lu @ 2002-05-15 20:21 UTC (permalink / raw)
  To: Joe Buck
  Cc: Matthew Austern, drepper, Mark Mitchell, Jason Merrill,
	Ralf W. Grosse-Kunstleve, gcc, c++std-ext

On Wed, May 15, 2002 at 12:02:46PM -0700, Joe Buck wrote:
> Matt Austern writes:
> > There are at least two interesting questions we might ask:
> >   (1) what should a future version of the C++ standard say
> >       about dynamic libraries?
> >   (2) considering what the standard says right now, and
> >       recognizing that we're talking about behavior outside
> >       the scope of the standard, what behavior for gcc would
> >       best serve users on a linux/ELF platform?
> 
> There's a hybrid question as well, since both C++ and ELF have standards.
> C++ has the one-definition rule, which is contradicted by the way weak
> symbols work in ELF, so we have a tension between two standards.
> So:
> 
> 	what should a future version of the ELF standard say
> 	about C++ dynamic libraries?
> 
> as it seems that any compiler targeting an OS that supports ELF
> should provide the same semantics.

Please check out the current gABI for weak symbols. If gcc can provide
the detailed description how weak symbols should work for g++ and how
different it is from the gABI, I can look into it for binutils and
glibc.


H.J.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-15 20:21                                     ` H . J . Lu
@ 2002-05-15 22:35                                       ` David Abrahams
  2002-05-16 11:18                                         ` H . J . Lu
  0 siblings, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-15 22:35 UTC (permalink / raw)
  To: c++std-ext
  Cc: Matthew Austern, drepper, Mark Mitchell, Jason Merrill,
	Ralf W. Grosse-Kunstleve, gcc, c++std-ext


----- Original Message -----
From: "H . J . Lu" <hjl@lucon.org>

> Please check out the current gABI for weak symbols. If gcc can provide
> the detailed description how weak symbols should work for g++ and how
> different it is from the gABI, I can look into it for binutils and
> glibc.

Where can I find the spec? I would be happy to provide the description
you've requested.

-Dave


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-15 22:35                                       ` David Abrahams
@ 2002-05-16 11:18                                         ` H . J . Lu
  2002-05-18 16:53                                           ` David Abrahams
  0 siblings, 1 reply; 104+ messages in thread
From: H . J . Lu @ 2002-05-16 11:18 UTC (permalink / raw)
  To: David Abrahams
  Cc: c++std-ext, Matthew Austern, drepper, Mark Mitchell,
	Jason Merrill, Ralf W. Grosse-Kunstleve, gcc

On Wed, May 15, 2002 at 11:02:04PM -0500, David Abrahams wrote:
> 
> ----- Original Message -----
> From: "H . J . Lu" <hjl@lucon.org>
> 
> > Please check out the current gABI for weak symbols. If gcc can provide
> > the detailed description how weak symbols should work for g++ and how
> > different it is from the gABI, I can look into it for binutils and
> > glibc.
> 
> Where can I find the spec? I would be happy to provide the description
> you've requested.
> 

http://www.caldera.com/developers/gabi/


H.J.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-16 11:18                                         ` H . J . Lu
@ 2002-05-18 16:53                                           ` David Abrahams
  2002-05-18 17:55                                             ` Martin v. Loewis
  0 siblings, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-18 16:53 UTC (permalink / raw)
  To: H . J . Lu
  Cc: drepper, Mark Mitchell, Jason Merrill, Ralf W. Grosse-Kunstleve,
	gcc, Martin v. Loewis

[-- Attachment #1: Type: text/plain, Size: 868 bytes --]

After doing some more experiments, I am not sure I know just how weak
symbols correspond to the output of g++, though I'm sure this can be
chalked up to unfamiliarity with the tools.

The enclosed archive "template.tgz" minimally reproduces the problem we're
seeing with dlopen using a static data member of a C++ template.

I attempted to take C++ out of the picture in "weak.tgz" by using
__attribute__((weak)), but the assembler doesn't like what the compiler
outputs. Does g++ add some additional attribute to the template static data
members to make the assembler happy?

Each archive contains a script build.sh which attempts to build and run the
example (well, "weak.tgz" doesn't attempt to run, since the build fails). I
am using GCC 3.1 installed in /usr/local, which explains why /usr/local/lib
appears in the LD_LIBRARY_PATH in the scripts.

Regards,
Dave


[-- Attachment #2: weak.tgz --]
[-- Type: application/x-compressed, Size: 1241 bytes --]

[-- Attachment #3: template.tgz --]
[-- Type: application/x-compressed, Size: 1241 bytes --]

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-18 16:53                                           ` David Abrahams
@ 2002-05-18 17:55                                             ` Martin v. Loewis
  2002-05-18 19:06                                               ` David Abrahams
  2002-05-18 19:13                                               ` Minimal GCC/Linux shared lib + EH bug example David Abrahams
  0 siblings, 2 replies; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-18 17:55 UTC (permalink / raw)
  To: David Abrahams
  Cc: H . J . Lu, drepper, Mark Mitchell, Jason Merrill,
	Ralf W. Grosse-Kunstleve, gcc

"David Abrahams" <david.abrahams@rcn.com> writes:

> I attempted to take C++ out of the picture in "weak.tgz" by using
> __attribute__((weak)), but the assembler doesn't like what the compiler
> outputs. 

Please always be as specific as you can in such reports. I assume the
assembler expressed its dislike by saying

 Error: symbol `x' can not be both weak and common

That may be a bug in gcc - it should not export a symbol as "common"
when it also declares it as weak.

> Does g++ add some additional attribute to the template static data
> members to make the assembler happy?

No. For template static data, it *only* emits them as .comm, not as
.weak. For initialized data that need to be merged at run-time (such
as vtables), it emits them as weak. In your C example, you can achieve
the same effect by saying

  int x __attribute__((weak)) = 1;

I ran your example, but could not see any problems with it.

> Each archive contains a script build.sh which attempts to build and run the
> example (well, "weak.tgz" doesn't attempt to run, since the build fails). I
> am using GCC 3.1 installed in /usr/local, which explains why /usr/local/lib
> appears in the LD_LIBRARY_PATH in the scripts.

Could it be that you've attached the same example twice? I could not
find anything involving template static members (or C++, for that
matter).

For a minimal example, it would help if the directory structure were
simpler...

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-18 17:55                                             ` Martin v. Loewis
@ 2002-05-18 19:06                                               ` David Abrahams
  2002-05-19  4:18                                                 ` Duplicate data objects in shared libraries Martin v. Loewis
  2002-05-18 19:13                                               ` Minimal GCC/Linux shared lib + EH bug example David Abrahams
  1 sibling, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-18 19:06 UTC (permalink / raw)
  To: Martin v. Loewis
  Cc: H . J . Lu, drepper, Mark Mitchell, Jason Merrill,
	Ralf W. Grosse-Kunstleve, gcc

[-- Attachment #1: Type: text/plain, Size: 1836 bytes --]


From: "Martin v. Loewis" <martin@v.loewis.de>


> "David Abrahams" <david.abrahams@rcn.com> writes:
>
> > I attempted to take C++ out of the picture in "weak.tgz" by using
> > __attribute__((weak)), but the assembler doesn't like what the compiler
> > outputs.
>
> Please always be as specific as you can in such reports.

...of course you're right...

> I assume the
> assembler expressed its dislike by saying
>
>  Error: symbol `x' can not be both weak and common


Exactly.

> That may be a bug in gcc - it should not export a symbol as "common"
> when it also declares it as weak.
>
> > Does g++ add some additional attribute to the template static data
> > members to make the assembler happy?
>
> No. For template static data, it *only* emits them as .comm, not as
> .weak. For initialized data that need to be merged at run-time (such
> as vtables), it emits them as weak. In your C example, you can achieve
> the same effect by saying
>
>   int x __attribute__((weak)) = 1;
>
> I ran your example, but could not see any problems with it.

When I added the initializer as you suggest, the "C" language example
produces the same results as the C++ one.

> > Each archive contains a script build.sh which attempts to build and run
> > the example (well, "weak.tgz" doesn't attempt to run, since the build
> > fails). I am using GCC 3.1 installed in /usr/local, which explains why
> > /usr/local/lib appears in the LD_LIBRARY_PATH in the scripts.
>
> Could it be that you've attached the same example twice? I could not
> find anything involving template static members (or C++, for that
> matter).

Hmm, yes, you're right; I tar'ed the wrong directory. Please see the
enclosed.

> For a minimal example, it would help if the directory structure were
> simpler...

Indeed; the title of the thread is slightly misleading ;-)

-Dave


[-- Attachment #2: weak.tgz --]
[-- Type: application/x-compressed, Size: 1396 bytes --]

[-- Attachment #3: template.tgz --]
[-- Type: application/x-compressed, Size: 1436 bytes --]

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-18 17:55                                             ` Martin v. Loewis
  2002-05-18 19:06                                               ` David Abrahams
@ 2002-05-18 19:13                                               ` David Abrahams
  2002-05-19  4:29                                                 ` Martin v. Loewis
  1 sibling, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-18 19:13 UTC (permalink / raw)
  To: Martin v. Loewis
  Cc: H . J . Lu, drepper, Mark Mitchell, Jason Merrill,
	Ralf W. Grosse-Kunstleve, gcc

From: "Martin v. Loewis" <martin@v.loewis.de>

> No. For template static data, it *only* emits them as .comm, not as
> .weak. For initialized data that need to be merged at run-time (such
> as vtables), it emits them as weak.

Hmm; don't initialized instances of a template static data member need to
be merged at run-time?

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Duplicate data objects in shared libraries
  2002-05-18 19:06                                               ` David Abrahams
@ 2002-05-19  4:18                                                 ` Martin v. Loewis
  2002-05-19  5:00                                                   ` David Abrahams
  0 siblings, 1 reply; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-19  4:18 UTC (permalink / raw)
  To: David Abrahams
  Cc: H . J . Lu, drepper, Mark Mitchell, Jason Merrill,
	Ralf W. Grosse-Kunstleve, gcc

"David Abrahams" <david.abrahams@rcn.com> writes:

> When I added the initializer as you suggest, the "C" language example
> produces the same results as the C++ one.

And not surprisingly so. Still, to make you better understood, I
recommend that you:
a) put all object files into a single directory,
b) in the C-only case (weak.tgz), drop the path to gcc - this has
   nothing to do with a specific gcc release
c) drop the usage of C99 features in main.c (declare all variables
   at top)
d) if you want to keep the template example in the discussion:
   drop the path to g++ (it's not specific for a g++ release, either)

[it turns out that you can also drop the __attribute__((weak)) in the
C example; it does not contribute to the behaviour]

I somewhat lost track as to what your problem is, though: earlier, you
said you accept that static members of class templates might be
duplicated at run-time, and that your problem is only with things you
cannot control (such as typeinfo objects). Why are static members of
template classes suddenly a problem?

Also, where do you suspect the bug now? GCC? glibc? Python? Your own
code?

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-18 19:13                                               ` Minimal GCC/Linux shared lib + EH bug example David Abrahams
@ 2002-05-19  4:29                                                 ` Martin v. Loewis
  2002-05-19  5:10                                                   ` David Abrahams
  0 siblings, 1 reply; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-19  4:29 UTC (permalink / raw)
  To: David Abrahams
  Cc: H . J . Lu, drepper, Mark Mitchell, Jason Merrill,
	Ralf W. Grosse-Kunstleve, gcc

"David Abrahams" <david.abrahams@rcn.com> writes:

> > No. For template static data, it *only* emits them as .comm, not as
> > .weak. For initialized data that need to be merged at run-time (such
> > as vtables), it emits them as weak.
> 
> Hmm; don't initialized instances of a template static data member need to
> be merged at run-time?

Yes. It turns out that weak symbols only contribute lightly to
run-time semantics of symbol resolution: Even if a symbol is strong,
the dynamic linker will deal with multiple definitions gracefully, and
take the first one it finds. Weak symbols matter in that case only if
the first one it finds is weak: a later strong symbol may then
override the resolution.

.comm is only relevant for object files (relocatable objects): the
(static) linker will eliminate duplicates of .comm (common data).  It
will do so by finding the definition of the symbol that is largest (so
they even might have different sizes), and then allocate a global
symbol in the .bss section.

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-19  4:18                                                 ` Duplicate data objects in shared libraries Martin v. Loewis
@ 2002-05-19  5:00                                                   ` David Abrahams
  2002-05-19  5:14                                                     ` Martin v. Loewis
  0 siblings, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-19  5:00 UTC (permalink / raw)
  To: Martin v. Loewis
  Cc: H . J . Lu, drepper, Mark Mitchell, Jason Merrill,
	Ralf W. Grosse-Kunstleve, gcc


----- Original Message -----
From: "Martin v. Loewis" <martin@v.loewis.de>

> "David Abrahams" <david.abrahams@rcn.com> writes:
>
> > When I added the initializer as you suggest, the "C" language example
> > produces the same results as the C++ one.
>
> And not surprisingly so. Still, to make you better understood, I
> recommend that you:
> a) put all object files into a single directory,
> b) in the C-only case (weak.tgz), drop the path to gcc - this has
>    nothing to do with a specific gcc release
> c) drop the usage of C99 features in main.c (declare all variables
>    at top)
> d) if you want to keep the template example in the discussion:
>    drop the path to g++ (it's not specific for a g++ release, either)

Thanks, I'll take all of your suggestions and post some new examples.

> [it turns out that you can also drop the __attribute__((weak)) in the
> C example; it does not contribute to the behaviour]

Hmm, that's interesting and unexpected (to me). Wouldn't it cause an error
in C++?

> I somewhat lost track as to what your problem is, though: earlier, you
> said you accept that static members of class templates might be
> duplicated at run-time, and that your problem is only with things you
> cannot control (such as typeinfo objects).

Wait, weren't you the one who was unwilling to accept half-measures? Heh,
that appears to have been you
(http://gcc.gnu.org/ml/gcc/2002-05/msg00985.html):

    "What good would it be if you are happy, but the next user
    complains that all his counters are incorrect?"

> Why are static members of template classes suddenly a problem?

For my particular application, they are not. However, *you* convinced me
that nobody was interested in solving my particular problem, and that
developing useful and consistent shared library semantics for the whole
language was a worthwhile goal (I also have that goal for the
standardization process).

> Also, where do you suspect the bug now? GCC? glibc? Python? Your own
> code?

I never said there was a bug (the title of this thread came from Ralf).
It's pretty hard to say there's a bug anywhere in the absence of a
specification.

I *am* convinced that the current behavior of ELF shared libraries with g++
is suboptimal.
I also think I know what an optimal behavior (within the "spirit of the
current design" -- not making it look like Windows or anything) looks like.
Since Mr. Lu generously stepped forward and volunteered to look into the
implementation I'm just trying to understand enough of the details so I can
describe the optimal behavior to him.

-Dave


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-19  4:29                                                 ` Martin v. Loewis
@ 2002-05-19  5:10                                                   ` David Abrahams
  2002-05-19 14:48                                                     ` Martin v. Loewis
  0 siblings, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-19  5:10 UTC (permalink / raw)
  To: Martin v. Loewis
  Cc: H . J . Lu, drepper, Mark Mitchell, Jason Merrill,
	Ralf W. Grosse-Kunstleve, gcc

From: "Martin v. Loewis" <martin@v.loewis.de>


> "David Abrahams" <david.abrahams@rcn.com> writes:
>
> > > No. For template static data, it *only* emits them as .comm, not as
> > > .weak. For initialized data that need to be merged at run-time (such
> > > as vtables), it emits them as weak.
> >
> > Hmm; don't initialized instances of a template static data member need to
> > be merged at run-time?
>
> Yes. It turns out that weak symbols only contribute lightly to
> run-time semantics of symbol resolution: Even if a symbol is strong,
> the dynamic linker will deal with multiple definitions gracefully, and
> take the first one it finds.

That's only "graceful handling" for some kinds of definitions, but I take
your point.

> Weak symbols matter in that case only if
> the first one it finds is weak: a later strong symbol may then
> override the resolution.
>
> .comm is only relevant for object files (relocatable objects):

By this do you mean what we normally use "*.o" names for? From looking at
the ELF spec, it wasn't clear if "object" meant something else, e.g.
"*.so".

> the (static) linker will eliminate duplicates of .comm (common data). It
> will do so by finding the definition of the symbol that is largest (so
> they even might have different sizes), and then allocate a global
> symbol in the .bss section.

So, for a symbol in a C++ shared library composed of a single object file,
even .comm would not be needed?

-Dave


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-19  5:00                                                   ` David Abrahams
@ 2002-05-19  5:14                                                     ` Martin v. Loewis
  2002-05-19  5:48                                                       ` David Abrahams
  0 siblings, 1 reply; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-19  5:14 UTC (permalink / raw)
  To: David Abrahams
  Cc: H . J . Lu, drepper, Mark Mitchell, Jason Merrill,
	Ralf W. Grosse-Kunstleve, gcc

"David Abrahams" <david.abrahams@rcn.com> writes:

> Hmm, that's interesting and unexpected (to me). Wouldn't it cause an error
> in C++?

Depends on what "in C++" is. In a single program, two definitions of
the same object are an error, no diagnostics required (3.2/3;
diagnostics is required only if both definitions are in the same
translation unit).

With static libraries, it is common to have a library definition

// foo.c
void free(void*);

and allow the program to override this definition in the "main
program". The linker, when building the program, will take the
definition from the main program, and disregard the definition from
the library.

To allow the same kind of replacement in shared libraries, it is
necessary that the static linker does not reject such duplicate
definitions, either - let alone the dynamic linker.

> > I somewhat lost track as to what your problem is, though: earlier, you
> > said you accept that static members of class templates might be
> > duplicated at run-time, and that your problem is only with things you
> > cannot control (such as typeinfo objects).
> 
> Wait, weren't you the one who was unwilling to accept half-measures? 

Well, yes. Are you saying you now object because I did? That's a good
reason, of course :-)

> For my particular application, they are not. However, *you* convinced me
> that nobody was interested in solving my particular problem, and that
> developing useful and consistent shared library semantics for the whole
> language was a worthwhile goal (I also have that goal for the
> standardization process).

Good. I was just surprised by this change in mind.

> I never said there was a bug (the title of this thread came from Ralf).
> It's pretty hard to say there's a bug anywhere in the absence of a
> specification.

In that case, either everything is fine (which apparently it is not),
or there is a bug in the specification (for not defining a behaviour
for an apparently important case).

> Since Mr. Lu generously stepped forward and volunteered to look into
> the implementation I'm just trying to understand enough of the
> details so I can describe the optimal behavior to him.

I see. Not to discourage you, but I believe that the current behaviour
is quite cast in stone, so there is little chance to change it. Also,
even if a change was made today, it would take years for that change
to propagate to end users.

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-19  5:14                                                     ` Martin v. Loewis
@ 2002-05-19  5:48                                                       ` David Abrahams
  2002-05-19 15:05                                                         ` Martin v. Loewis
  0 siblings, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-19  5:48 UTC (permalink / raw)
  To: Martin v. Loewis
  Cc: H . J . Lu, drepper, Mark Mitchell, Jason Merrill,
	Ralf W. Grosse-Kunstleve, gcc


----- Original Message -----
From: "Martin v. Loewis" <martin@v.loewis.de>


> "David Abrahams" <david.abrahams@rcn.com> writes:
>
> > Hmm, that's interesting and unexpected (to me). Wouldn't it cause an error
> > in C++?
>
> Depends on what "in C++" is. In a single program, two definitions of
> the same object are an error, no diagnostics required (3.2/3;
> diagnostics is required only if both definitions are in the same
> translation unit).
>
> With static libraries, it is common to have a library definition
>
> // foo.c
> void free(void*);
>
> and allow the program to override this definition in the "main
> program". The linker, when building the program, will take the
> definition from the main program, and disregard the definition from
> the library.

...as a language extension, right. And some implementations require that a
symbol be specifically labelled as eligible for such replacement by
programs, in order to better detect errors.

> To allow the same kind of replacement in shared libraries, it is
> necessary that the static linker does not reject such duplicate
> definitions, either - let alone the dynamic linker.

...and this is expected to work on any symbol without explicit notation, I
take it, just as with static libs. Hmph; I guess that makes the behavior I
want to specify a bit simpler, though it certainly leaves the door open to
silent unreliability (though it would be considerably less-open).

> > > I somewhat lost track as to what your problem is, though: earlier, you
> > > said you accept that static members of class templates might be
> > > duplicated at run-time, and that your problem is only with things you
> > > cannot control (such as typeinfo objects).
> >
> > Wait, weren't you the one who was unwilling to accept half-measures?
>
> Well, yes. Are you saying you now object because I did? That's a good
> reason, of course :-)

Because you did what? I don't think I was objecting, but if I was, it was
to what seemed to be a strange conversational gambit on your part.

If you accepted the idea of implementing a half-measure, I would be very
happy. I need a fix, and I need it sooner than we'll see any change in the
linker/loader behavior. Also, I want to point out that despite the fact
that making just RTTI and EH work properly is a half-measure, it's a
half-measure that *many* C++ implementations seem to take. I think there's
a reason for that, and that there's great value in providing behavior
consistent with other implementations (and the expectations people will
have developed based on other implementations) when practical.

> > For my particular application, they are not. However, *you* convinced me
> > that nobody was interested in solving my particular problem, and that
> > developing useful and consistent shared library semantics for the whole
> > language was a worthwhile goal (I also have that goal for the
> > standardization process).
>
> Good. I was just surprised by this change in mind.

I'm surprised that you view it as a change. I'll refrain from pointing you
at my earlier messages like
http://gcc.gnu.org/ml/gcc/2002-05/msg00995.html<wink> which describe my
position, but I thought what I wrote was pretty unambiguous.

> > I never said there was a bug (the title of this thread came from Ralf).
> > It's pretty hard to say there's a bug anywhere in the absence of a
> > specification.
>
> In that case, either everything is fine (which apparently it is not),
> or there is a bug in the specification (for not defining a behaviour
> for an apparently important case).

OK, there's a bug in one or more specifications, here.

> > Since Mr. Lu generously stepped forward and volunteered to look into
> > the implementation I'm just trying to understand enough of the
> > details so I can describe the optimal behavior to him.
>
> I see. Not to discourage you, but I believe that the current behaviour
> is quite cast in stone, so there is little chance to change it. Also,
> even if a change was made today, it would take years for that change
> to propagate to end users.

I've heard that before ;-)
I'm willing to take the long view (as long as I can also take the short
view simultaneously).

-Dave

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-19  5:10                                                   ` David Abrahams
@ 2002-05-19 14:48                                                     ` Martin v. Loewis
  0 siblings, 0 replies; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-19 14:48 UTC (permalink / raw)
  To: David Abrahams
  Cc: H . J . Lu, drepper, Mark Mitchell, Jason Merrill,
	Ralf W. Grosse-Kunstleve, gcc

"David Abrahams" <david.abrahams@rcn.com> writes:

> > .comm is only relevant for object files (relocatable objects):
> 
> By this do you mean what we normally use "*.o" names for? 

Yes.

> From looking at the ELF spec, it wasn't clear if "object" meant
> something else, e.g.  "*.so".

There are four kinds of object files in ELF (see e_type field):
- relocatable objects (your usual .o files);
- executable files
- shared object files (.so, aka "shared libraries", aka "DSO" =
  "dynamic shared object")
- core files

I believe there is a fifth kind also, archives (aka static libraries),
but that may not be an ELF object.

> So, for a symbol in a C++ shared library composed of a single object file,
> even .comm would not be needed?

Exactly, but you would need to express allocation in the .bss section
by different means.

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-19  5:48                                                       ` David Abrahams
@ 2002-05-19 15:05                                                         ` Martin v. Loewis
  2002-05-20  1:42                                                           ` Jason Merrill
  0 siblings, 1 reply; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-19 15:05 UTC (permalink / raw)
  To: David Abrahams
  Cc: H . J . Lu, drepper, Mark Mitchell, Jason Merrill,
	Ralf W. Grosse-Kunstleve, gcc

"David Abrahams" <david.abrahams@rcn.com> writes:

> ...as a language extension, right. And some implementations require that a
> symbol be specifically labelled as eligible for such replacement by
> programs, in order to better detect errors.

For C++, it's an extension. For C, 5.1.1.1 specifies

         8.  All external object and function references are
             resolved.  Library components are linked to satisfy
             external references to functions and objects not
             defined in the current translation.  All such
             translator output is collected into a program image
             which contains information needed for execution in its
             execution environment.

So while it may be an extension to allow programs to define functions
from the standard library, a conforming implementation clearly must
support this feature for other libraries.

> ...and this is expected to work on any symbol without explicit notation, I
> take it, just as with static libs. 

At least for malloc, it is clearly necessary to support replacements
that have been defined without any explicit notation, yes.

I now recall the reason for glibc to define all library entry points
as weak, even though the dynamic linker would not reject duplicate
symbols: with that setup, it is possible to define the replacement
malloc in a shared library that happens to be searched *after* the C
library, and to replace malloc even in shared libraries that have not
been linked with the replacement at all.

> If you accepted the idea of implementing a half-measure, I would be
> very happy.

It will be very difficult to convince GCC maintainers to change RTTI
matching, unless problem reports with that semantics pile up
significantly. I believe the rationale is that you can gain speed with
the current implementation, and that giving up that advantage in
favour of providing a minority of users with the illusion of a
solution is not acceptable.

> I need a fix, and I need it sooner than we'll see any change in the
> linker/loader behavior.

Implement your own exception matching, then. Don't rely on the C++
polymorphic exception handling, instead, define a single exception
class that carries all exceptions. Then provide library functions to
perform a more specific matching.

I believe that the notational inconvenience of that approach would be
small, and it would improve the portability of your code.

> > > For my particular application, they are not. However, *you*
> > > convinced me that nobody was interested in solving my particular
> > > problem, and that developing useful and consistent shared
> > > library semantics for the whole language was a worthwhile goal
> > > (I also have that goal for the standardization process).
> >
> > Good. I was just surprised by this change in mind.
> 
> I'm surprised that you view it as a change. I'll refrain from pointing you
> at my earlier messages like
> http://gcc.gnu.org/ml/gcc/2002-05/msg00995.html<wink> which describe my
> position, but I thought what I wrote was pretty unambiguous.

I don't like these "he said, then I said" games, but in this message,
you said

   You seem to be operating on the assumption that users of shared
   libraries will expect them to be semantically equivalent to
   good-ol'-static linking under all circumstances ...
   Of course, nobody but the most naive users have that expectation.

It appears that you are now expecting that static members in templates
behave like they do in static libraries. Does that make you a most
naive user?

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-19 15:05                                                         ` Martin v. Loewis
@ 2002-05-20  1:42                                                           ` Jason Merrill
  2002-05-20  3:47                                                             ` H . J . Lu
                                                                               ` (2 more replies)
  0 siblings, 3 replies; 104+ messages in thread
From: Jason Merrill @ 2002-05-20  1:42 UTC (permalink / raw)
  To: Martin v. Loewis
  Cc: David Abrahams, H . J . Lu, drepper, Mark Mitchell,
	Ralf W. Grosse-Kunstleve, gcc, Jason Merrill

[-- Attachment #1: Type: text/plain, Size: 3162 bytes --]

Let me try to summarize the discussion:

The semantics of existing SVR4 dynamic linkers are such that:
   IF two shared objects A.so and B.so link against the same shared 
     library C.so, AND
   both override the same symbol S, AND
   A.so and B.so are loaded in that order with RTLD_LOCAL, THEN
   all references to S from A.so and C.so will resolve to the copy from
     A.so, but references from B.so will resolve to the copy from B.so.

This is true because:
   There is only one copy of C.so loaded, and its relocs are only resolved
     once, AND
   the definitions from A.so and C.so are not visible when loading B.so.

This breaks RTTI matching, which relies on all references within a
program resolving to the same copy.  Since references from B.so and C.so
differ, this premise is violated.

It is generally agreed that this is unfortunate.  Yes?

Various solutions present themselves.  Most basically, they break down to:

1) Change the dynamic linker so that B.so and C.so agree, AND/OR
2) Change the runtime so that it doesn't matter if they don't agree.

#1 seems desirable for other symbols, too; it seems broken for a shared
object to have a different idea of what a symbol means from one of its
dependencies.  Anything which uses global data is vulnerable to being
broken by this.  However, it may not be a complete solution, as we may not
be able to implement it for SVR4 platforms other than Linux/GNU; certainly
not in an interesting time frame.

#2 has the advantages of being simple to implement and applicable to all
targets.  On the other hand, as Martin has pointed out, the more
conservative comparison is slower.  I would be interested to see actual
numbers to quantify this.  Volunteer?

#2 is also not a complete solution, as it would only solve the problem for
RTTI nodes.  A template library with, say, an allocator pool referenced
through a static data member would have the same problem unless the library
author is careful to ensure that the pool is only defined in the library,
not in any client .o's.

On the other hand, as David has argued, other affected constructs can be
controlled by the user; type_info nodes are emitted everywhere.  This is
true, but is a bug.  The type_info node should only be emitted with the
vtable if there is one.  As a result, they can be controlled to about the
same degree as any other static data.  However, they are much more common
than static data such as the above, so managing them is much more of a
hassle.

Possible implementations of #1:
3) If a library needed by an RTLD_LOCAL object is already loaded, ignore it
   and map a new copy.  As an optimization, only do this if it refers to
   symbols defined by the RTLD_LOCAL object.
4) If a library needed by an RTLD_LOCAL object is already loaded, force the
   library to RTLD_GLOBAL status so that references from B.so will use the
   already-resolved definition.

I think #3 is philosophically cleaner.

Have I missed any arguments?

I am in favor of doing #1 and neutral to positive on #2.  As a possible
point for further discussion, here is an unofficial patch I whipped up a
week or so ago to do #2 iff -fpic.  YMMV.

Jason


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 4803 bytes --]

*** ./libsupc++/tinfo.cc.~1~	Sat May 18 16:11:27 2002
--- ./libsupc++/tinfo.cc	Sun May 12 13:57:20 2002
*************** std::type_info::
*** 43,58 ****
  std::bad_cast::~bad_cast() throw() { }
  std::bad_typeid::~bad_typeid() throw() { }
  
- #if !__GXX_MERGED_TYPEINFO_NAMES
- 
- // We can't rely on common symbols being shared between shared objects.
  bool std::type_info::
  operator== (const std::type_info& arg) const
  {
!   return (&arg == this) || (__builtin_strcmp (name (), arg.name ()) == 0);
! }
! 
  #endif
  
  namespace std {
  
--- 43,60 ----
  std::bad_cast::~bad_cast() throw() { }
  std::bad_typeid::~bad_typeid() throw() { }
  
  bool std::type_info::
  operator== (const std::type_info& arg) const
  {
!   bool ret = (name() == arg.name());
! #if !__GXX_MERGED_TYPEINFO_NAMES
!   // In old abi, or when weak symbols are not supported, there can
!   // be multiple instances of a type_info object for one
!   // type. Uniqueness must use the _name value, not object address.
!   ret = (ret || __builtin_strcmp (name (), arg.name ()) == 0);
  #endif
+   return ret;
+ }
  
  namespace std {
  
*** ./libsupc++/tinfo.h.~1~	Sat May 18 16:11:27 2002
--- ./libsupc++/tinfo.h	Sun May 12 13:57:34 2002
***************
*** 8,10 ****
--- 8,20 ----
  // Class declarations shared between the typeinfo implementation files.
  
  #include <cxxabi.h>
+ 
+ #if !__GXX_WEAK__ || defined(__PIC__)
+   // If weak symbols are not supported, typeinfo names are not merged.
+   // Also don't rely on this if building a shared library, as multiple
+   // clients might try to use us.
+   #define __GXX_MERGED_TYPEINFO_NAMES 0
+ #else
+   // On platforms that support weak symbols, typeinfo names are merged.
+   #define __GXX_MERGED_TYPEINFO_NAMES 1
+ #endif
*** ./libsupc++/tinfo2.cc.~1~	Sat May 18 16:11:27 2002
--- ./libsupc++/tinfo2.cc	Sun May 12 13:57:07 2002
*************** extern "C" void abort ();
*** 38,52 ****
  
  using std::type_info;
  
- #if !__GXX_MERGED_TYPEINFO_NAMES
- 
  bool
  type_info::before (const type_info &arg) const
  {
    return __builtin_strcmp (name (), arg.name ()) < 0;
- }
- 
  #endif
  
  #include <cxxabi.h>
  
--- 38,54 ----
  
  using std::type_info;
  
  bool
  type_info::before (const type_info &arg) const
  {
+ #if __GXX_MERGED_TYPEINFO_NAMES
+   // In new abi we can rely on type_info's NTBS being unique,
+   // and therefore address comparisons are sufficient.
+   return name() < arg.name();
+ #else
    return __builtin_strcmp (name (), arg.name ()) < 0;
  #endif
+ }
  
  #include <cxxabi.h>
  
*************** __pointer_catch (const __pbase_type_info
*** 164,167 ****
    return __pbase_type_info::__pointer_catch (thrown_type, thr_obj, outer);
  }
  
! } // namespace std
--- 166,169 ----
    return __pbase_type_info::__pointer_catch (thrown_type, thr_obj, outer);
  }
  
! } // namespace __cxxabiv1
*** ./libsupc++/typeinfo.~1~	Sat May 18 16:11:27 2002
--- ./libsupc++/typeinfo	Sun May 12 13:43:25 2002
*************** namespace __cxxabiv1
*** 44,57 ****
    class __class_type_info;
  } // namespace __cxxabiv1
  
- #if !__GXX_WEAK__
-   // If weak symbols are not supported, typeinfo names are not merged.
-   #define __GXX_MERGED_TYPEINFO_NAMES 0
- #else
-   // On platforms that support weak symbols, typeinfo names are merged.
-   #define __GXX_MERGED_TYPEINFO_NAMES 1
- #endif
- 
  namespace std 
  {
    /** The @c type_info class describes type information generated by
--- 44,49 ----
*************** namespace std 
*** 84,105 ****
      const char* name() const
      { return __name; }
  
- #if !__GXX_MERGED_TYPEINFO_NAMES
-     bool before(const type_info& __arg) const;
-     // In old abi, or when weak symbols are not supported, there can
-     // be multiple instances of a type_info object for one
-     // type. Uniqueness must use the _name value, not object address.
-     bool operator==(const type_info& __arg) const;
- #else
      /** Returns true if @c *this precedes @c __arg in the implementation's
       *  collation order.  */
!     // In new abi we can rely on type_info's NTBS being unique,
!     // and therefore address comparisons are sufficient.
!     bool before(const type_info& __arg) const
!     { return __name < __arg.__name; }
!     bool operator==(const type_info& __arg) const
!     { return __name == __arg.__name; }
! #endif
      bool operator!=(const type_info& __arg) const
      { return !operator==(__arg); }
      
--- 76,85 ----
      const char* name() const
      { return __name; }
  
      /** Returns true if @c *this precedes @c __arg in the implementation's
       *  collation order.  */
!     bool before(const type_info& __arg) const;
!     bool operator==(const type_info& __arg) const;
      bool operator!=(const type_info& __arg) const
      { return !operator==(__arg); }
      

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20  1:42                                                           ` Jason Merrill
@ 2002-05-20  3:47                                                             ` H . J . Lu
  2002-05-20  4:08                                                             ` Mark Mitchell
  2002-05-20  7:42                                                             ` Duplicate data objects in shared libraries David Abrahams
  2 siblings, 0 replies; 104+ messages in thread
From: H . J . Lu @ 2002-05-20  3:47 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Martin v. Loewis, David Abrahams, drepper, Mark Mitchell,
	Ralf W. Grosse-Kunstleve, gcc

On Mon, May 20, 2002 at 06:12:35AM +0100, Jason Merrill wrote:
> Let me try to summarize the discussion:
> 
> The semantics of existing SVR4 dynamic linkers are such that:
>    IF two shared objects A.so and B.so link against the same shared 
>      library C.so, AND
>    both override the same symbol S, AND
>    A.so and B.so are loaded in that order with RTLD_LOCAL, THEN
>    all references to S from A.so and C.so will resolve to the copy from
>      A.so, but references from B.so will resolve to the copy from B.so.
> 
> This is true because:
>    There is only one copy of C.so loaded, and its relocs are only resolved
>      once, AND
>    the definitions from A.so and C.so are not visible when loading B.so.
> 
> This breaks RTTI matching, which relies on all references within a
> program resolving to the same copy.  Since references from B.so and C.so
> differ, this premise is violated.
> 
> It is generally agreed that this is unfortunate.  Yes?
> 
> Various solutions present themselves.  Most basically, they break down to:
> 
> 1) Change the dynamic linker so that B.so and C.so agree, AND/OR
> 2) Change the runtime so that it doesn't matter if they don't agree.
> 
> 
> Possible implementations of #1:
> 3) If a library needed by an RTLD_LOCAL object is already loaded, ignore it
>    and map a new copy.  As an optimization, only do this if it refers to
>    symbols defined by the RTLD_LOCAL object.
> 4) If a library needed by an RTLD_LOCAL object is already loaded, force the
>    library to RTLD_GLOBAL status so that references from B.so will use the
>    already-resolved definition.
> 
> I think #3 is philosophically cleaner.
> 
> Have I missed any arguments?
> 

I saw RTLD_GROUP in Solaris 8 dlopen man page and it says the scope of
RTLD_LOCAL is for the dlopen group. I was wondering what the dlopen
group meant and if it applied to A.so, B.so and C.so here.


H.J.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20  1:42                                                           ` Jason Merrill
  2002-05-20  3:47                                                             ` H . J . Lu
@ 2002-05-20  4:08                                                             ` Mark Mitchell
  2002-05-20  9:55                                                               ` Jason Merrill
  2002-05-20  7:42                                                             ` Duplicate data objects in shared libraries David Abrahams
  2 siblings, 1 reply; 104+ messages in thread
From: Mark Mitchell @ 2002-05-20  4:08 UTC (permalink / raw)
  To: Jason Merrill, Martin v. Loewis
  Cc: David Abrahams, H . J . Lu, drepper, Ralf W. Grosse-Kunstleve, gcc



--On Monday, May 20, 2002 06:12:35 AM +0100 Jason Merrill 
<jason@redhat.com> wrote:

> Let me try to summarize the discussion:

Thanks; that's helpful.

> I am in favor of doing #1 and neutral to positive on #2.  As a possible
> point for further discussion, here is an unofficial patch I whipped up a
> week or so ago to do #2 iff -fpic.  YMMV.

What about:

static void f() { struct S { virtual void g(); }; }

There's no guarantee that the name in the RTTI for S will be different
from a similar class in another translation unit -- but it is true that
the NTBS will be at a different address since it will be allocated with
internal linkage.

In other words, is it really true that comparison by address is just
an optimization, and not a correctness issue?
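
To make the two strategies in question concrete, here is a minimal sketch.
The helper names are hypothetical, and this is not the code libstdc++
uses; it only contrasts the two notions of "same type".

```cpp
#include <cassert>
#include <cstring>
#include <typeinfo>

// Comparison by address: only correct if every type has exactly one
// type_info object in the whole process.
bool same_type_by_address(const std::type_info& a, const std::type_info& b) {
    return &a == &b;
}

// Comparison by name: still works when type_info objects are duplicated,
// but cannot tell apart two distinct types that share an RTTI name.
bool same_type_by_name(const std::type_info& a, const std::type_info& b) {
    return std::strcmp(a.name(), b.name()) == 0;
}
```

For a local class like S above, two translation units could produce equal
names at different addresses, so the two helpers would disagree; that is
the case the question is about.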

-- 
Mark Mitchell                mark@codesourcery.com
CodeSourcery, LLC            http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20  1:42                                                           ` Jason Merrill
  2002-05-20  3:47                                                             ` H . J . Lu
  2002-05-20  4:08                                                             ` Mark Mitchell
@ 2002-05-20  7:42                                                             ` David Abrahams
  2002-05-20  9:34                                                               ` Jason Merrill
  2 siblings, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-20  7:42 UTC (permalink / raw)
  To: Martin v. Loewis, Jason Merrill
  Cc: H . J . Lu, drepper, Mark Mitchell, Ralf W. Grosse-Kunstleve,
	gcc, Jason Merrill


From: "Jason Merrill" <jason@redhat.com>

> Let me try to summarize the discussion:
>
> The semantics of existing SVR4 dynamic linkers are such that:
>    IF two shared objects A.so and B.so link against the same shared
>      library C.so, AND
>    both override the same symbol S, AND
>    A.so and B.so are loaded in that order with RTLD_LOCAL, THEN
>    all references to S from A.so and C.so will resolve to the copy from
>      A.so, but references from B.so will resolve to the copy from B.so.
>
> This is true because:
>    There is only one copy of C.so loaded, and its relocs are only
>      resolved once, AND
>    the definitions from A.so and C.so are not visible when loading B.so.
>
> This breaks RTTI matching, which relies on all references within a
> program resolving to the same copy.  Since references from B.so and C.so
> differ, this premise is violated.


Nice summary.

> It is generally agreed that this is unfortunate.  Yes?


I agree; I can't speak for others.

> Various solutions present themselves.  Most basically, they break down
> to:
>
> 1) Change the dynamic linker so that B.so and C.so agree, AND/OR
> 2) Change the runtime so that it doesn't matter if they don't agree.

<snip>

> Possible implementations of #1:
> 3) If a library needed by an RTLD_LOCAL object is already loaded, ignore
>    it and map a new copy.  As an optimization, only do this if it refers to
>    symbols defined by the RTLD_LOCAL object.
> 4) If a library needed by an RTLD_LOCAL object is already loaded, force
>    the library to RTLD_GLOBAL status so that references from B.so will use
>    the already-resolved definition.
>
> I think #3 is philosophically cleaner.


#3 would be much worse for me than the status quo is. The scenario is that
my clients are writing extension modules loaded with RTLD_LOCAL. In order
to function properly, these modules must share a copy of a common library:
each module "publishes" some data through calls to the common library and
also "subscribes" to all the data in the library. It sounds like ensuring
that the library is actually shared in #3 would be next-to-impossible, and
that even if it were possible my users could easily break sharing
unintentionally by using some template, inline function or polymorphic
class which is also used by the library.

[I have managed, for the time being, to make my application immune to the
problem we're discussing at the top by arranging for certain exceptions
previously thrown by client code to be thrown by a function call into
the common library... so they are thrown and caught in the same object. I
don't think such work-arounds will be available for some planned upcoming
work]
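
A minimal sketch of that kind of workaround, with hypothetical names
(conversion_error, throw_conversion_error, run_and_report) and everything
collapsed into one file for brevity. In a real layout the two library
functions would be defined out of line in the common library's source
file, not in a header, so that the throw and the handler are both compiled
into the same shared object.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// ---- Would live in the common library's source file ----
namespace common {

struct conversion_error : std::runtime_error {
    explicit conversion_error(const std::string& what)
        : std::runtime_error(what) {}
};

// The throw-expression is compiled into the common library, so the
// type_info it references is the library's copy.
void throw_conversion_error(const std::string& what) {
    throw conversion_error(what);
}

// The handler is compiled into the common library too, so it matches
// against that same copy. Returns the message, or "" if nothing was thrown.
std::string run_and_report(void (*operation)()) {
    try {
        operation();
    } catch (const conversion_error& e) {
        return e.what();
    }
    return "";
}

}  // namespace common

// ---- Would live in an extension module ----
void client_operation() {
    // Instead of writing `throw common::conversion_error(...)` here:
    common::throw_conversion_error("cannot convert argument");
}
```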

#4, or a variation on it, makes much more sense to me.

> Have I missed any arguments?


None that I wouldn't take issue with ;-)

> I am in favor of doing #1 and neutral to positive on #2.  As a possible
> point for further discussion, here is an unofficial patch I whipped up a
> week or so ago to do #2 iff -fpic.  YMMV.


Constructive!

Thanks,
Dave

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20  7:42                                                             ` Duplicate data objects in shared libraries David Abrahams
@ 2002-05-20  9:34                                                               ` Jason Merrill
  2002-05-20  9:57                                                                 ` David Abrahams
                                                                                   ` (3 more replies)
  0 siblings, 4 replies; 104+ messages in thread
From: Jason Merrill @ 2002-05-20  9:34 UTC (permalink / raw)
  To: David Abrahams
  Cc: Martin v. Loewis, H . J . Lu, drepper, Mark Mitchell,
	Ralf W. Grosse-Kunstleve, gcc, Jason Merrill

>>>>> "David" == David Abrahams <david.abrahams@rcn.com> writes:

>> 3) If a library needed by an RTLD_LOCAL object is already loaded, ignore
>> it and map a new copy.  As an optimization, only do this if it refers to
>> symbols defined by the RTLD_LOCAL object.

>> 4) If a library needed by an RTLD_LOCAL object is already loaded, force
>> the library to RTLD_GLOBAL status so that references from B.so will use
>> the already-resolved definition.

> #3 would be much worse for me than the status quo is. The scenario is that
> my clients are writing extension modules loaded with RTLD_LOCAL. In order
> to function properly, these modules must share a copy of a common library:
> each module "publishes" some data through calls to the common library and
> also "subscribes" to all the data in the library. It sounds like ensuring
> that the library is actually shared in #3 would be next-to-impossible.

#3 comes from a conception of RTLD_LOCAL as a partitioning of the system
into independent parts ("programs", in earlier messages); if you want to
exchange information between two such objects via a common library, that
model is inadequate.

If you want RTLD_LOCAL objects to be able to share information, the
remaining option is to fix the sharing so it works properly, a la #4.

>>>>> "H" == H J Lu <hjl@lucon.org> writes:

> I saw RTLD_GROUP in Solaris 8 dlopen man page and it says the scope of
> RTLD_LOCAL is for the dlopen group. I was wondering what the dlopen
> group meant and if it applied to A.so, B.so and C.so here.

In this case, there are two dlopen groups: (A.so, C.so) and (B.so, C.so).

The Solaris dlopen man page mentions something that could be taken as
relevant precedent:

                                                              Any
     object of mode RTLD_LOCAL that is referenced as a dependency
     of  an  object  of  mode  RTLD_GLOBAL  will  be  promoted to
     RTLD_GLOBAL. In other words, the RTLD_LOCAL mode is ignored.

My #4 is an extension of this principle.

Testing indicates that explicitly loading C.so as RTLD_GLOBAL after loading
A.so doesn't currently have the desired effect; the reference in B.so is
still resolved locally.  Loading C.so before A.so works.

Interestingly, loading C.so first as RTLD_LOCAL causes both A.so and B.so
to resolve to different addresses from C.so on Linux, but on Solaris it
produces the desired result.
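
For anyone who wants to repeat the load-order experiment, a small driver
along these lines should do. It is a sketch only: the ./A.so, ./B.so and
./C.so paths and the helper names are assumptions, and the order encoded
in shared_library_first() is my reading of "loading C.so before A.so
works" (C.so as RTLD_GLOBAL, then the two plugins as RTLD_LOCAL).

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>
#include <vector>

// One dlopen call: which file to load and with which mode flags.
struct LoadStep {
    std::string path;
    int mode;
};

// Performs the steps in order and returns one handle per step
// (nullptr where dlopen failed, e.g. because the file does not exist).
std::vector<void*> load_in_order(const std::vector<LoadStep>& steps) {
    std::vector<void*> handles;
    for (const LoadStep& step : steps)
        handles.push_back(dlopen(step.path.c_str(), step.mode));
    return handles;
}

// Shared library first and global, then the plugins as RTLD_LOCAL.
std::vector<LoadStep> shared_library_first() {
    return {
        {"./C.so", RTLD_NOW | RTLD_GLOBAL},
        {"./A.so", RTLD_NOW | RTLD_LOCAL},
        {"./B.so", RTLD_NOW | RTLD_LOCAL},
    };
}
```

Changing the first step to RTLD_LOCAL, or moving it after A.so, gives the
other orderings discussed here. On older glibc this needs -ldl at link
time.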

#4 as written above could have the effect of causing B.so to refer to
a definition in A.so, which would be problematic if we try to unload A.so.
Perhaps the right approach is

5) Do not allow an object loaded with RTLD_LOCAL to override symbols from a
   dependency.

This rule is easily stated; it would cause both A.so and B.so to refer to
the definition in C.so, regardless of the order of loading.  I like it.
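
A toy model of the difference, not real ld.so behavior: the Object and
Policy types and the resolve() function are invented for illustration, and
the model only covers references made by the RTLD_LOCAL object itself (it
ignores the once-only binding of C.so's own relocations).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct Object {
    std::string name;
    std::set<std::string> defines;    // symbols this object defines
    std::vector<const Object*> deps;  // direct dependencies, in link order
};

enum class Policy {
    StatusQuo,         // the RTLD_LOCAL object's own definition wins
    NoOverrideOfDeps,  // option 5: a dependency's definition wins
};

// Returns the name of the object whose definition of `sym` a reference
// from the RTLD_LOCAL object `obj` binds to, or "" if nothing defines it.
std::string resolve(const Object& obj, const std::string& sym, Policy policy) {
    const bool in_obj = obj.defines.count(sym) != 0;
    if (policy == Policy::StatusQuo && in_obj)
        return obj.name;
    for (const Object* dep : obj.deps)
        if (dep->defines.count(sym) != 0)
            return dep->name;
    return in_obj ? obj.name : "";
}
```

With C.so defining S and both A.so and B.so defining S and depending on
C.so, StatusQuo binds A.so's and B.so's references to their own copies,
while NoOverrideOfDeps binds both to C.so whatever the load order.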

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20  4:08                                                             ` Mark Mitchell
@ 2002-05-20  9:55                                                               ` Jason Merrill
  2002-05-20 10:15                                                                 ` Mark Mitchell
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Merrill @ 2002-05-20  9:55 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Martin v. Loewis, David Abrahams, H . J . Lu, drepper,
	Ralf W. Grosse-Kunstleve, gcc

>>>>> "Mark" == Mark Mitchell <mark@codesourcery.com> writes:

> What about:

> static void f() { struct S { virtual void g(); }; }

> There's no guarantee that the name in the RTTI for S will be different
> from a similar class in another translation unit -- but it is true that
> the NTBS will be at a different address since it will be allocated with
> internal linkage.

> In other words, is it really true that comparison by address is just
> an optimization, and not a correctness issue?

A good point, though we could handle this by decorating the RTTI name for S
with the unnamed namespace qualifier.  I suppose this sort of thing is what
leads people to want to remove internal linkage entirely.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20  9:34                                                               ` Jason Merrill
@ 2002-05-20  9:57                                                                 ` David Abrahams
  2002-05-20 10:28                                                                 ` H . J . Lu
                                                                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 104+ messages in thread
From: David Abrahams @ 2002-05-20  9:57 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Martin v. Loewis, H . J . Lu, drepper, Mark Mitchell,
	Ralf W. Grosse-Kunstleve, gcc, Jason Merrill


----- Original Message -----
From: "Jason Merrill" <jason@redhat.com>

> 5) Do not allow an object loaded with RTLD_LOCAL to override symbols from
>    a dependency.
>
> This rule is easily stated; it would cause both A.so and B.so to refer to
> the definition in C.so, regardless of the order of loading.  I like it.

At first glance, I like it very much also. Very interesting.

-Dave


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20  9:55                                                               ` Jason Merrill
@ 2002-05-20 10:15                                                                 ` Mark Mitchell
  2002-05-20 12:42                                                                   ` Jason Merrill
  0 siblings, 1 reply; 104+ messages in thread
From: Mark Mitchell @ 2002-05-20 10:15 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Martin v. Loewis, David Abrahams, H . J . Lu, drepper,
	Ralf W. Grosse-Kunstleve, gcc


> A good point, though we could handle this by decorating the RTTI name for
> S with the unnamed namespace qualifier.  I suppose this sort of thing is
> what leads people to want to remove internal linkage entirely.

We're clearly in the land of corner cases, but changing the RTTI name for
S would be an incompatible ABI change.

I'm not sure what to say here overall.

I guess my top-level opinion is that this is a good discussion, but that
we should keep it as a discussion -- rather than an implementation -- for
some time to come.  We should bring in other vendors too; if we do one
thing, and HP and IBM and Sun and EDG and so forth and so on do another,
that won't be good for people.

I guess I think we have bigger fish to fry than making RTLD_LOCAL work
with C++... :-)

-- 
Mark Mitchell                mark@codesourcery.com
CodeSourcery, LLC            http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20  9:34                                                               ` Jason Merrill
  2002-05-20  9:57                                                                 ` David Abrahams
@ 2002-05-20 10:28                                                                 ` H . J . Lu
  2002-05-20 13:49                                                                   ` Jason Merrill
  2002-05-20 13:26                                                                 ` David Beazley
  2002-05-20 15:50                                                                 ` Michael Matz
  3 siblings, 1 reply; 104+ messages in thread
From: H . J . Lu @ 2002-05-20 10:28 UTC (permalink / raw)
  To: Jason Merrill
  Cc: David Abrahams, Martin v. Loewis, drepper, Mark Mitchell,
	Ralf W. Grosse-Kunstleve, gcc

On Mon, May 20, 2002 at 05:12:19PM +0100, Jason Merrill wrote:
> 
> Interestingly, loading C.so first as RTLD_LOCAL causes both A.so and B.so
> to resolve to different addresses from C.so on Linux, but on Solaris it
> produces the desired result.

It seems like a Linux bug. I will look into it if no one else does.

> 
> #4 as written above could have the effect of causing B.so to refer to
> a definition in A.so, which would be problematic if we try to unload A.so.
> Perhaps the right approach is
> 
> 5) Do not allow an object loaded with RTLD_LOCAL to override symbols from a
>    dependency.
> 
> This rule is easily stated; it would cause both A.so and B.so to refer to
> the definition in C.so, regardless of the order of loading.  I like it.

If you were saying:

1. Load C.so with RTLD_LOCAL.
2. Load A.so with RTLD_LOCAL.
3. Load B.so with RTLD_LOCAL.

both A.so and B.so should resolve to C.so, I think it makes sense.


H.J.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20 10:15                                                                 ` Mark Mitchell
@ 2002-05-20 12:42                                                                   ` Jason Merrill
  2002-05-20 12:53                                                                     ` Mark Mitchell
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Merrill @ 2002-05-20 12:42 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Martin v. Loewis, David Abrahams, H . J . Lu, drepper,
	Ralf W. Grosse-Kunstleve, gcc

>>>>> "Mark" == Mark Mitchell <mark@codesourcery.com> writes:

>> A good point, though we could handle this by decorating the RTTI name for
>> S with the unnamed namespace qualifier.  I suppose this sort of thing is
>> what leads people to want to remove internal linkage entirely.

> We're clearly in the land of corner cases, but changing the RTTI name for
> S would be an incompatible ABI change.

I don't think it would be incompatible; S is file-local, so its
compatibility with things from other files is either uninteresting or
actually undesirable.

> I guess my top-level opinion is that this is a good discussion, but that
> we should keep it as a discussion -- rather than an implementation -- for
> some time to come.  We should bring in other vendors too; if we do one
> thing, and HP and IBM and Sun and EDG and so forth and so on do another,
> that won't be good for people.

Certainly any changes to ld.so semantics should go through the ELF gABI
committee.  But I don't think that's as difficult as you make it sound.  :)

> I guess I think we have bigger fish to fry than making RTLD_LOCAL work
> with C++... :-)

I think that being able to write plugins in C++ is important, and a
reasonably common desire.  I know I talked to a customer several years ago
about writing Oracle plugins in C++, and it comes up regularly.  This isn't
like -Bsymbolic, where we can just say "don't do that".  If you say that in
this case, you're saying "don't use C++".  I'd prefer not to discourage
people from using C++.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20 12:42                                                                   ` Jason Merrill
@ 2002-05-20 12:53                                                                     ` Mark Mitchell
  2002-05-20 13:23                                                                       ` Jason Merrill
  2002-05-20 13:28                                                                       ` David Abrahams
  0 siblings, 2 replies; 104+ messages in thread
From: Mark Mitchell @ 2002-05-20 12:53 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Martin v. Loewis, David Abrahams, H . J . Lu, drepper,
	Ralf W. Grosse-Kunstleve, gcc



--On Monday, May 20, 2002 08:06:56 PM +0100 Jason Merrill 
<jason@redhat.com> wrote:

>>>>>> "Mark" == Mark Mitchell <mark@codesourcery.com> writes:
>
>>> A good point, though we could handle this by decorating the RTTI name
>>> for S with the unnamed namespace qualifier.  I suppose this sort of
>>> thing is what leads people to want to remove internal linkage entirely.
>
>> We're clearly in the land of corner cases, but changing the RTTI name for
>> S would be an incompatible ABI change.
>
> I don't think it would be incompatible; S is file-local, so its
> compatibility with things from other files is either uninteresting or
> actually undesirable.

Hmm.  I think that the ABI specifies the name -- even for the local class
in the method with static linkage -- and therefore I can write
ABI-conforming code that does:

  if (strcmp (typeid (S).name(), "<mangled name here>") != 0)
    abort ();

It would be reasonable to have an ABI that says basically nothing about
objects with internal linkage, but I don't think ours does.
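
Filled out so that it compiles, the check might look like the sketch
below. The helper names are hypothetical, g() is given an inline body only
so the example links, and the expected string is left as a parameter since
the actual mangled name is not spelled out here.

```cpp
#include <cassert>
#include <cstring>
#include <typeinfo>

// A local class inside a function with internal linkage, as in the
// earlier example. Returns the RTTI name the implementation gives it.
static const char* local_class_rtti_name() {
    struct S {
        virtual void g() {}
    };
    return typeid(S).name();
}

// True if the RTTI name of the local class equals `expected`; a caller
// relying on the ABI-specified name would pass the mangled name here.
bool local_class_rtti_name_is(const char* expected) {
    return std::strcmp(local_class_rtti_name(), expected) == 0;
}
```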

> I think that being able to write plugins in C++ is important, and a

I guess I just don't think that using RTLD_LOCAL is the only reasonable
way to do it.

But, if there's a good solution here, we should definitely do it.

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20 12:53                                                                     ` Mark Mitchell
@ 2002-05-20 13:23                                                                       ` Jason Merrill
  2002-05-20 13:28                                                                       ` David Abrahams
  1 sibling, 0 replies; 104+ messages in thread
From: Jason Merrill @ 2002-05-20 13:23 UTC (permalink / raw)
  To: Mark Mitchell
  Cc: Martin v. Loewis, David Abrahams, H . J . Lu, drepper,
	Ralf W. Grosse-Kunstleve, gcc

>>>>> "Mark" == Mark Mitchell <mark@codesourcery.com> writes:

> I think that the ABI specifies the name -- even for the local class in
> the method with static linkage

True.

>> I think that being able to write plugins in C++ is important, and a

> I guess I just don't think that using RTLD_LOCAL is the only reasonable
> way to do it.

The problem is that the use of RTLD_LOCAL is not under the control of the
plugin writer.  And in any case, I think that using RTLD_LOCAL is
appropriate for plugins; we don't want f() in one overriding f() in
another.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20  9:34                                                               ` Jason Merrill
  2002-05-20  9:57                                                                 ` David Abrahams
  2002-05-20 10:28                                                                 ` H . J . Lu
@ 2002-05-20 13:26                                                                 ` David Beazley
  2002-05-20 13:57                                                                   ` H . J . Lu
  2002-05-20 15:50                                                                 ` Michael Matz
  3 siblings, 1 reply; 104+ messages in thread
From: David Beazley @ 2002-05-20 13:26 UTC (permalink / raw)
  To: gcc

Jason Merrill writes:
 > >>>>> "David" == David Abrahams <david.abrahams@rcn.com> writes:
 > 
 > > #3 would be much worse for me than the status quo is. The scenario is that
 > > my clients are writing extension modules loaded with RTLD_LOCAL. In order
 > > to function properly, these modules must share a copy of a common library:
 > > each module "publishes" some data through calls to the common library and
 > > also "subscribes" to all the data in the library. It sounds like ensuring
 > > that the library is actually shared in #3 would be next-to-impossible.
 > 
 > #3 comes from a conception of RTLD_LOCAL as a partitioning of the system
 > into independent parts ("programs", in earlier messages); if you want to
 > exchange information between two such objects via a common library, that
 > model is inadequate.

I've been quietly sitting on the sidelines for most of this
discussion, but I'd like to reiterate David's comment above.  #3 would
definitely break a lot of big applications that rely on scripting
language extension module interfaces.  It is fairly common for an
application to be broken up into independent dynamic modules---all of
which link against a common runtime library.  This runtime library may
manage things related to networking/parallel computing (MPI, etc.) or
other shared state.  Creating independent copies of the same library
would just be a huge disaster.

 > Perhaps the right approach is
 > 
 > 5) Do not allow an object loaded with RTLD_LOCAL to override symbols from a
 >    dependency.
 > 
 > This rule is easily stated; it would cause both A.so and B.so to refer to
 > the definition in C.so, regardless of the order of loading.  I like it.

Hmmm. Interesting. As far as I know, this wouldn't break anything in
the typical use of dynamic loading. At least in Python, extension
modules aren't meant to serve as libraries nor do you normally try to
override library symbols (at least not on purpose).  However, I don't
recall any sort of ELF/dynamic linking option that would achieve this
kind of effect (it seems like it is the opposite of how libraries are
normally linked).  

-- Dave







^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20 12:53                                                                     ` Mark Mitchell
  2002-05-20 13:23                                                                       ` Jason Merrill
@ 2002-05-20 13:28                                                                       ` David Abrahams
  2002-05-22 16:35                                                                         ` Jason Merrill
  1 sibling, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-20 13:28 UTC (permalink / raw)
  To: Mark Mitchell, Jason Merrill
  Cc: Martin v. Loewis, H . J . Lu, drepper, Ralf W. Grosse-Kunstleve, gcc


From: "Mark Mitchell" <mark@codesourcery.com>

> > I think that being able to write plugins in C++ is important, and a
>
> I guess I just don't think that using RTLD_LOCAL is the only reasonable
> way to do it.

If you are writing an application which loads plugins which may be written
in "C", it is pretty-much the only reasonable way to do it. Martin details
the reasons quite nicely here:

http://aspn.activestate.com/ASPN/Mail/Message/1190320

...and although namespaces mitigate things a bit, I think the arguments
still apply to C++.

> But, if there's a good solution here, we should definitely do it.

I just spent some more time discussing Jason's #5 solution with him, and
trying to find problems with it. We weren't able to find any; it does seem
to provide the desired semantics in all cases we could imagine with a
minimum of subtlety and complication.

One thing we didn't discuss in detail was what should happen in case two of
a library's dependencies are already loaded, each with its own definition
for some shared symbol S. There are two possibilities I can imagine:

1. Pick one
2. Error

Unless we get a new symbol label which means "must be shared", I'm strongly
in favor of 1. In many cases there's no detectable difference when symbols
aren't actually shared (for example, inline functions with no static data
and which nobody takes the address of), and I don't want to make otherwise
legitimate uses fail. Even if we had a "must be shared" label, describing
how to implement suitably selective error reporting is not simple.
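
To illustrate the distinction, here is a sketch with made-up function
names. Within a single program there is only one copy of each, so this
shows the two kinds of inline function and does not reproduce the
duplication itself.

```cpp
#include <cassert>

// No static data, address never taken: if two shared objects each end up
// with their own copy, no caller can tell the difference.
inline int twice(int x) { return 2 * x; }

// Function-local static data: every copy of the function that is not
// merged carries its own counter, so callers bound to different copies
// would see independent sequences.
inline int next_id() {
    static int counter = 0;
    return ++counter;
}
```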

-Dave



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20 10:28                                                                 ` H . J . Lu
@ 2002-05-20 13:49                                                                   ` Jason Merrill
  2002-05-20 13:59                                                                     ` H . J . Lu
                                                                                       ` (3 more replies)
  0 siblings, 4 replies; 104+ messages in thread
From: Jason Merrill @ 2002-05-20 13:49 UTC (permalink / raw)
  To: H . J . Lu
  Cc: David Abrahams, Martin v. Loewis, drepper, Mark Mitchell,
	Ralf W. Grosse-Kunstleve, gcc

>>>>> "H" == H J Lu <hjl@lucon.org> writes:

> On Mon, May 20, 2002 at 05:12:19PM +0100, Jason Merrill wrote:
>> 
>> Interestingly, loading C.so first as RTLD_LOCAL causes both A.so and B.so
>> to resolve to different addresses from C.so on Linux, but on Solaris it
>> produces the desired result.

> It seems like a Linux bug. I will look into it if no one else does.

>> #4 as written above could have the effect of causing B.so to refer to
>> a definition in A.so, which would be problematic if we try to unload A.so.
>> Perhaps the right approach is
>> 
>> 5) Do not allow an object loaded with RTLD_LOCAL to override symbols from a
>> dependency.
>> 
>> This rule is easily stated; it would cause both A.so and B.so to refer to
>> the definition in C.so, regardless of the order of loading.  I like it.

> If you were saying:

> 1. Load C.so with RTLD_LOCAL.
> 2. Load A.so with RTLD_LOCAL.
> 3. Load B.so with RTLD_LOCAL.

> both A.so and B.so should resolve to C.so, I think it makes sense.

Yes, that is what I was saying.  A possible refinement would be

6) #5, but if the definition in the RTLD_LOCAL object is strong, use it in
   the object.

Which would produce the current Linux semantics described above if the
definitions in A.so and B.so are strong, and the current Solaris semantics
described above if they are weak.  This would allow a plugin writer to
override operator new for their plugin without affecting uses in
libstdc++.  Obviously, the plugin writer would need to be careful to
handle all of their own memory allocation/deallocation.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20 13:26                                                                 ` David Beazley
@ 2002-05-20 13:57                                                                   ` H . J . Lu
  2002-05-20 14:36                                                                     ` David Beazley
  0 siblings, 1 reply; 104+ messages in thread
From: H . J . Lu @ 2002-05-20 13:57 UTC (permalink / raw)
  To: David Beazley; +Cc: gcc

On Mon, May 20, 2002 at 02:40:31PM -0500, David Beazley wrote:
> Jason Merrill writes:
>  > >>>>> "David" == David Abrahams <david.abrahams@rcn.com> writes:
>  > 
>  > > #3 would be much worse for me than the status quo is. The scenario is that
>  > > my clients are writing extension modules loaded with RTLD_LOCAL. In order
>  > > to function properly, these modules must share a copy of a common library:
>  > > each module "publishes" some data through calls to the common library and
>  > > also "subscribes" to all the data in the library. It sounds like ensuring
>  > > that the library is actually shared in #3 would be next-to-impossible.
>  > 
>  > #3 comes from a conception of RTLD_LOCAL as a partitioning of the system
>  > into independent parts ("programs", in earlier messages); if you want to
>  > exchange information between two such objects via a common library, that
>  > model is inadequate.
> 
> I've been quietly sitting on the sidelines for most of this
> discussion, but I'd like to reiterate David's comment above.  #3 would
> definitely break a lot of big applications that rely on scripting
> language extension module interfaces.  It is fairly common for an
> application to be broken up into independent dynamic modules---all of
> which link against a common runtime library.  This runtime library may
> manage things related to networking/parallel computing (MPI, etc.) or
> other shared state.  Creating independent copies of the same library
> would just be a huge disaster.
> 
>  > Perhaps the right approach is
>  > 
>  > 5) Do not allow an object loaded with RTLD_LOCAL to override symbols from a
>  >    dependency.
>  > 
>  > This rule is easily stated; it would cause both A.so and B.so to refer to
>  > the definition in C.so, regardless of the order of loading.  I like it.
> 
> Hmmm. Interesting. As far as I know, this wouldn't break anything in
> the typical use of dynamic loading. At least in Python, extension
> modules aren't meant to serve as libraries nor do you normally try to
> override library symbols (at least not on purpose).  However, I don't
> recall any sort of ELF/dynamic linking option that would achieve this
> kind of effect (it seems like it is the opposite of how libraries are
> normally linked).  

Just load C.so first. It should work. I know it doesn't work on Linux.
I am working on a testcase in C to fix it.


H.J.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20 13:49                                                                   ` Jason Merrill
@ 2002-05-20 13:59                                                                     ` H . J . Lu
  2002-05-20 14:17                                                                       ` Jason Merrill
  2002-05-20 14:32                                                                       ` David Abrahams
  2002-05-20 14:32                                                                     ` David Abrahams
                                                                                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 104+ messages in thread
From: H . J . Lu @ 2002-05-20 13:59 UTC (permalink / raw)
  To: Jason Merrill
  Cc: David Abrahams, Martin v. Loewis, drepper, Mark Mitchell,
	Ralf W. Grosse-Kunstleve, gcc

On Mon, May 20, 2002 at 08:57:35PM +0100, Jason Merrill wrote:
> >>>>> "H" == H J Lu <hjl@lucon.org> writes:
> 
> > On Mon, May 20, 2002 at 05:12:19PM +0100, Jason Merrill wrote:
> >> 
> >> Interestingly, loading C.so first as RTLD_LOCAL causes both A.so and B.so
> >> to resolve to different addresses from C.so on Linux, but on Solaris it
> >> produces the desired result.
> 
> > It seems like a Linux bug. I will look into it if no one else does.
> 
> >> #4 as written above could have the effect of causing B.so to refer to
> >> a definition in A.so, which would be problematic if we try to unload A.so.
> >> Perhaps the right approach is
> >> 
> >> 5) Do not allow an object loaded with RTLD_LOCAL to override symbols from a
> >> dependency.
> >> 
> >> This rule is easily stated; it would cause both A.so and B.so to refer to
> >> the definition in C.so, regardless of the order of loading.  I like it.
> 
> > If you were saying:
> 
> > 1. Load C.so with RTLD_LOCAL.
> > 2. Load A.so with RTLD_LOCAL.
> > 3. Load B.so with RTLD_LOCAL.
> 
> > both A.so and B.so should resolve to C.so, I think it makes sense.
> 
> Yes, that is what I was saying.  A possible refinement would be
> 
> 6) #5, but if the definition in the RTLD_LOCAL object is strong, use it in
>    the object.

I believe Linux is trying to move away from special treatment of weak
symbols in ld.so.

> 
> Which would produce the current Linux semantics described above if the
> definitions in A.so and B.so are strong, and the current Solaris semantics
> described above if they are weak.  This would allow a plugin writer to

Do you have a testcase in C to show the Linux behavior? I believe it is
a Linux bug.


H.J.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20 13:59                                                                     ` H . J . Lu
@ 2002-05-20 14:17                                                                       ` Jason Merrill
  2002-05-20 18:19                                                                         ` H . J . Lu
  2002-05-20 14:32                                                                       ` David Abrahams
  1 sibling, 1 reply; 104+ messages in thread
From: Jason Merrill @ 2002-05-20 14:17 UTC (permalink / raw)
  To: H . J . Lu
  Cc: David Abrahams, Martin v. Loewis, drepper, Mark Mitchell,
	Ralf W. Grosse-Kunstleve, gcc

[-- Attachment #1: Type: text/plain, Size: 415 bytes --]

>>>>> "H" == H J Lu <hjl@lucon.org> writes:

> On Mon, May 20, 2002 at 08:57:35PM +0100, Jason Merrill wrote:

>> 6) #5, but if the definition in the RTLD_LOCAL object is strong, use it
>> in the object.

> I believe Linux is trying to move away from special treatment of weak
> symbols in ld.so.

Fair enough.

> Do you have a testcase in C to show the Linux behavior? I believe it is
> a Linux bug.

Here you go.


[-- Attachment #2: foo.tar.gz --]
[-- Type: application/x-gzip, Size: 609 bytes --]

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20 13:59                                                                     ` H . J . Lu
  2002-05-20 14:17                                                                       ` Jason Merrill
@ 2002-05-20 14:32                                                                       ` David Abrahams
  1 sibling, 0 replies; 104+ messages in thread
From: David Abrahams @ 2002-05-20 14:32 UTC (permalink / raw)
  To: H . J . Lu, Jason Merrill
  Cc: Martin v. Loewis, drepper, Mark Mitchell, Ralf W. Grosse-Kunstleve, gcc


From: "H . J . Lu" <hjl@lucon.org>
> I believe Linux is trying to move away from special treatment of weak
> symbols in ld.so.


Martin mentions a use made of weak symbols by glibc to control symbol
replaceability in this message:
http://gcc.gnu.org/ml/gcc/2002-05/msg01913.html. Is there momentum toward
removing that capability, or is some alternative way of achieving the same
thing in the works?

-Dave


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20 13:49                                                                   ` Jason Merrill
  2002-05-20 13:59                                                                     ` H . J . Lu
@ 2002-05-20 14:32                                                                     ` David Abrahams
  2002-05-20 15:31                                                                     ` Martin v. Loewis
  2002-05-21 19:07                                                                     ` H . J . Lu
  3 siblings, 0 replies; 104+ messages in thread
From: David Abrahams @ 2002-05-20 14:32 UTC (permalink / raw)
  To: H . J . Lu, Jason Merrill
  Cc: Martin v. Loewis, drepper, Mark Mitchell, Ralf W. Grosse-Kunstleve, gcc


From: "Jason Merrill" <jason@redhat.com>
To: "H . J . Lu" <hjl@lucon.org>

> > both A.so and B.so should resolve to C.so, I think it makes sense.
>
> Yes, that is what I was saying.  A possible refinement would be
>
> 6) #5, but if the definition in the RTLD_LOCAL object is strong, use it in
>    the object.
>
> Which would produce the current Linux semantics described above if the
> definitions in A.so and B.so are strong, and the current Solaris semantics
> described above if they are weak.  This would allow a plugin writer to
> override operator new for their plugin without affecting uses in
> libstdc++.  Obviously, the plugin writer would need to be careful to
> handle all of their own memory allocation/deallocation.

That's a common model for plugins; it would be very useful if we could
support it.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20 13:57                                                                   ` H . J . Lu
@ 2002-05-20 14:36                                                                     ` David Beazley
  0 siblings, 0 replies; 104+ messages in thread
From: David Beazley @ 2002-05-20 14:36 UTC (permalink / raw)
  To: H . J . Lu; +Cc: David Beazley, gcc

H . J . Lu writes:
 > 
 > Just load C.so first. It should work. I know it doesn't work on Linux.
 > I am working on a testcase in C to fix it.

Yes, I guess that's easy enough :-). The only difficulty with loading C.so
first is that I have seen a number of applications where the use of
C.so is somewhat implicit (there are a bunch of dynamic modules linked
against C.so, but C.so is never actually loaded explicitly).
Admittedly, this is a pretty minor point and I don't really feel all
that inclined to pursue it further.  If explicit loading of C.so
solves the problem, then that's easy enough to document.  Works for me.
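
A minimal sketch of the "load C.so first" workaround in C, assuming
hypothetical file names (./C.so, ./A.so, ./B.so) and hypothetical helper
names (load_local, load_common_then_plugins). It only shows the load order
with RTLD_LOCAL; it makes no claim about which definition ld.so will pick on
a given platform, and as noted above the resolution may still differ between
Linux and Solaris.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Load one shared object with RTLD_LOCAL; returns NULL on failure. */
void *load_local(const char *path)
{
    void *handle;

    assert(path != NULL);
    handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    return handle;
}

/*
 * Load the common dependency explicitly before the modules that link
 * against it.  The handles are deliberately not closed, so the objects
 * stay loaded for the life of the process.
 * Returns 0 on success, -1 as soon as one load fails.
 */
int load_common_then_plugins(const char *common,
                             const char *plugin_a,
                             const char *plugin_b)
{
    if (load_local(common) == NULL)
        return -1;
    if (load_local(plugin_a) == NULL)
        return -1;
    if (load_local(plugin_b) == NULL)
        return -1;
    return 0;
}
```

A caller would invoke load_common_then_plugins("./C.so", "./A.so", "./B.so")
once at startup. On older glibc the program needs to be linked with -ldl.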

-- Dave


^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20 13:49                                                                   ` Jason Merrill
  2002-05-20 13:59                                                                     ` H . J . Lu
  2002-05-20 14:32                                                                     ` David Abrahams
@ 2002-05-20 15:31                                                                     ` Martin v. Loewis
  2002-05-21 19:07                                                                     ` H . J . Lu
  3 siblings, 0 replies; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-20 15:31 UTC (permalink / raw)
  To: Jason Merrill
  Cc: H . J . Lu, David Abrahams, drepper, Mark Mitchell,
	Ralf W. Grosse-Kunstleve, gcc

Jason Merrill <jason@redhat.com> writes:

> Yes, that is what I was saying.  A possible refinement would be
> 
> 6) #5, but if the definition in the RTLD_LOCAL object is strong, use it in
>    the object.

That approach would probably provide better backwards compatibility,
but still make all the C++ cases work. So I'd support this
modification.

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20  9:34                                                               ` Jason Merrill
                                                                                   ` (2 preceding siblings ...)
  2002-05-20 13:26                                                                 ` David Beazley
@ 2002-05-20 15:50                                                                 ` Michael Matz
  3 siblings, 0 replies; 104+ messages in thread
From: Michael Matz @ 2002-05-20 15:50 UTC (permalink / raw)
  To: Jason Merrill; +Cc: gcc

Hi,

On Mon, 20 May 2002, Jason Merrill wrote:

> Perhaps the right approach is
>
> 5) Do not allow an object loaded with RTLD_LOCAL to override symbols from a
>    dependency.

Yep.  And indeed I always thought this is how RTLD_LOCAL should behave.  A
"local" DSO can access its own symbols and additionally those from the
global DSOs and the app.  Nothing except that local DSO can resolve to
its own symbols (except through an explicit dlsym()).  If a global and
a local DSO both define the same symbol, the one from the global DSO is
used (even from inside the local one).  This partitions the application
into one big global chunk, and many local finals.  I think this is also
how dynamically loaded DSOs are used.  If you use RTLD_LOCAL you often
have the notion of a "plugin", with a fairly small and exactly
defined interface (i.e. not interfacing through global data, depending on
sharing of symbols).

In the past I once thought that it would be ideal if we had another class
of symbol visibility: "global-damnit".  Such symbols are normal global
ones in RTLD_GLOBAL DSOs, and are also global with RTLD_LOCAL, just that
the dynamic linker magically ensures that those symbol sets are made
disjoint when loading two such local DSOs into the same process.

This visibility would be reserved for language implementation symbols,
where the user really doesn't care how exactly they are spelled (or doesn't
even know they exist), but the language runtime cares that they exist and
are different.  This would e.g. allow equally named classes in two
different local DSOs to still have different RTTI, as semantically they do
(sure, technically you are outside the standard with this and the
one-definition rule, but it should still work).

I think only this or a similar basic change to ELF (or other systems)
really would solve such problems correctly.  After all, the user wants to
write a plugin, say only exporting an "init" function returning an
instance of a (private) class deriving from a common "Plugin" class
(globally defined) which fixes the interface.  This plugin should be
loadable RTLD_LOCAL to not conflict with other plugins _in the symbols the
programmer actually wrote himself_, regardless of how the language is
implemented.  And still the user should be able to use all reasonable C++
features (i.e. throwing exceptions into the global code, using templates,
whatnot).  Expecting that static data from that local DSO is somehow
shared with other local DSOs, or even with the global part, is not
reasonable here; it's local after all, i.e. invisible from the outside.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20 14:17                                                                       ` Jason Merrill
@ 2002-05-20 18:19                                                                         ` H . J . Lu
  0 siblings, 0 replies; 104+ messages in thread
From: H . J . Lu @ 2002-05-20 18:19 UTC (permalink / raw)
  To: Jason Merrill
  Cc: David Abrahams, Martin v. Loewis, drepper, Mark Mitchell,
	Ralf W. Grosse-Kunstleve, gcc

[-- Attachment #1: Type: text/plain, Size: 327 bytes --]

On Mon, May 20, 2002 at 09:12:28PM +0100, Jason Merrill wrote:
> 
> > Do you have a testcase in C to show the Linux behavior? I believe it is
> > a Linux bug.
> 
> Here you go.
> 

Here is the simplified testcase without using weak and only 2 DSOs.
Could someone please run it on Solaris/x86 and Solaris/Sparc?

Thanks.


H.J.

[-- Attachment #2: bug.tar.gz --]
[-- Type: application/x-gzip, Size: 837 bytes --]

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20 13:49                                                                   ` Jason Merrill
                                                                                       ` (2 preceding siblings ...)
  2002-05-20 15:31                                                                     ` Martin v. Loewis
@ 2002-05-21 19:07                                                                     ` H . J . Lu
  2002-05-22  1:46                                                                       ` Martin v. Loewis
  3 siblings, 1 reply; 104+ messages in thread
From: H . J . Lu @ 2002-05-21 19:07 UTC (permalink / raw)
  To: Jason Merrill
  Cc: David Abrahams, Martin v. Loewis, drepper, Mark Mitchell,
	Ralf W. Grosse-Kunstleve, gcc

On Mon, May 20, 2002 at 08:57:35PM +0100, Jason Merrill wrote:
> >>>>> "H" == H J Lu <hjl@lucon.org> writes:
> 
> > On Mon, May 20, 2002 at 05:12:19PM +0100, Jason Merrill wrote:
> >> 
> >> Interestingly, loading C.so first as RTLD_LOCAL causes both A.so and B.so
> >> to resolve to different addresses from C.so on Linux, but on Solaris it
> >> produces the desired result.
> 
> > It seems like a Linux bug. I will look into it if no one else does.
> 
> >> #4 as written above could have the effect of causing B.so to refer to
> >> a definition in A.so, which would be problematic if we try to unload A.so.
> >> Perhaps the right approach is
> >> 
> >> 5) Do not allow an object loaded with RTLD_LOCAL to override symbols from a
> >> dependency.
> >> 
> >> This rule is easily stated; it would cause both A.so and B.so to refer to
> >> the definition in C.so, regardless of the order of loading.  I like it.
> 
> > If you were saying:
> 
> > 1. Load C.so with RTLD_LOCAL.
> > 2. Load A.so with RTLD_LOCAL.
> > 3. Load B.so with RTLD_LOCAL.
> 
> > both A.so and B.so should resolve to C.so, I think it makes sense.
> 
> Yes, that is what I was saying.  A possible refinement would be
> 

I took a look. If I move C.so in front of A.so in the scope of A.so
when C.so is on the DT_NEEDED list of A.so and is already loaded in
memory, this will work. However, there are a few testcases in
glibc which assume otherwise. I can't make both work at the same
time. Does anyone have some ideas?


H.J.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-21 19:07                                                                     ` H . J . Lu
@ 2002-05-22  1:46                                                                       ` Martin v. Loewis
  0 siblings, 0 replies; 104+ messages in thread
From: Martin v. Loewis @ 2002-05-22  1:46 UTC (permalink / raw)
  To: H . J . Lu
  Cc: Jason Merrill, David Abrahams, drepper, Mark Mitchell,
	Ralf W. Grosse-Kunstleve, gcc

"H . J . Lu" <hjl@lucon.org> writes:

> I took a look. If I move C.so in front of A.so in the scope of A.so
> when C.so is on the DT_NEEDED list of A.so and is already loaded in
> memory, this will work. However, there are a few testcases in
> glibc which assume otherwise. I can't make both work at the same
> time. Does anyone have some ideas?

Can you report what those test cases are, and judge whether they are
"reasonable"?

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-20 13:28                                                                       ` David Abrahams
@ 2002-05-22 16:35                                                                         ` Jason Merrill
  2002-05-22 21:46                                                                           ` David Abrahams
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Merrill @ 2002-05-22 16:35 UTC (permalink / raw)
  To: David Abrahams
  Cc: Mark Mitchell, Martin v. Loewis, H . J . Lu, drepper,
	Ralf W. Grosse-Kunstleve, gcc

>>>>> "David" == David Abrahams <david.abrahams@rcn.com> writes:

> One thing we didn't discuss in detail was what should happen in case two
> of a library's dependencies are already loaded, each with its own
> definition for some shared symbol S.

i.e.

A.so -> T.so
     -> U.so

B.so -> U.so

?  Here, if T and U both define S, refs in U will resolve to the copy in T,
but refs in B won't, so we get the same problem.  Hmm.

This problem would be solved if T were to link against U, or vice versa, or
if both were linked against a third library which defined S.

The problem is that RTLD_LOCAL really wants a strict delineation of
provider and user; if a DSO uses a symbol from a DSO that it doesn't
depend on, this premise is violated, and users will get confused.

I would further clarify my proposal #6 thus:

7) Resolution of a relocation in a DSO loaded with RTLD_LOCAL only
   considers definitions in the DSO itself and its dependencies.  If a
   strong definition is seen in the normal breadth-first search of these
   DSOs, it is used; otherwise, a weak definition is chosen by depth-first
   search.

Actually, I'd be inclined to adopt the second sentence for all cases, not
just RTLD_LOCAL.  If a library provides a weak definition of something, and
the executable provides a weak definition as well, it makes sense to me to
use the library version.  Doing so would improve the usefulness of
-Wl,--gc-sections (once it works).

Anyway, adopting this proposal, T and U would each use their own
definition, A would use the one from T, and B would use the one from U.  So
the problem would come when trying to, say, throw from U into A.
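
A toy model of #7 in C may make the lookup order concrete. This is a sketch
under stated assumptions, not ld.so code: every name in it (struct dso,
find_strong, find_weak, resolve, the *_so objects) is hypothetical, each
model DSO defines at most one symbol, and "depth-first" is read here as
visiting dependencies before the object that needs them, so that a library's
weak definition wins over its user's.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPS 4   /* direct dependencies per DSO, unused slots are NULL */
#define MAX_DSOS 16  /* upper bound on DSOs reachable from one root */

enum binding { BIND_NONE, BIND_WEAK, BIND_STRONG };

struct dso {
    const char *name;
    const char *sym;                    /* the one symbol it may define */
    enum binding bind;                  /* BIND_NONE if it defines nothing */
    const struct dso *deps[MAX_DEPS];
};

int defines(const struct dso *d, const char *sym, enum binding b)
{
    return d->bind == b && d->sym != NULL && strcmp(d->sym, sym) == 0;
}

/* Breadth-first: the DSO itself, then its direct dependencies, and so on. */
const struct dso *find_strong(const struct dso *root, const char *sym)
{
    const struct dso *queue[MAX_DSOS];
    size_t head = 0, tail = 0;

    queue[tail++] = root;
    while (head < tail) {
        const struct dso *d = queue[head++];
        if (defines(d, sym, BIND_STRONG))
            return d;
        for (size_t i = 0; i < MAX_DEPS && d->deps[i] != NULL; i++) {
            int seen = 0;
            for (size_t j = 0; j < tail; j++)
                if (queue[j] == d->deps[i])
                    seen = 1;
            if (!seen && tail < MAX_DSOS)
                queue[tail++] = d->deps[i];
        }
    }
    return NULL;
}

/* Depth-first, dependencies before the DSO that needs them (assumption). */
const struct dso *find_weak(const struct dso *d, const char *sym)
{
    for (size_t i = 0; i < MAX_DEPS && d->deps[i] != NULL; i++) {
        const struct dso *hit = find_weak(d->deps[i], sym);
        if (hit != NULL)
            return hit;
    }
    return defines(d, sym, BIND_WEAK) ? d : NULL;
}

/* Only the DSO itself and its dependencies are ever considered. */
const struct dso *resolve(const struct dso *from, const char *sym)
{
    const struct dso *strong;

    assert(from != NULL && sym != NULL);
    strong = find_strong(from, sym);
    return strong != NULL ? strong : find_weak(from, sym);
}

/* The example above: T and U both define S weakly; A -> T, U and B -> U. */
const struct dso T_so = { "T.so", "S", BIND_WEAK, { NULL } };
const struct dso U_so = { "U.so", "S", BIND_WEAK, { NULL } };
const struct dso A_so = { "A.so", NULL, BIND_NONE, { &T_so, &U_so } };
const struct dso B_so = { "B.so", NULL, BIND_NONE, { &U_so } };

/* A plugin with a strong definition of N keeps it over its library's weak one. */
const struct dso LIB_so = { "lib.so", "N", BIND_WEAK, { NULL } };
const struct dso PLUGIN_so = { "plugin.so", "N", BIND_STRONG, { &LIB_so } };
```

In this model resolve(&A_so, "S") yields T.so while resolve(&B_so, "S")
yields U.so, which is the mismatch described above.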

It's not difficult to imagine this sort of situation arising with
vague-linkage entities that are emitted when needed.  For example: a
library V defines a non-polymorphic class J but doesn't use its RTTI node.
T and U link against V and both throw objects of type J.  A catches the one
thrown in T, but not the one thrown in U.  We would have been fine if the
RTTI node had been emitted in V, but it wasn't needed, so it wasn't
emitted.

I don't see any way to get ld.so to just give us the semantics we want for
this subcase.

If the author of V is aware of this issue, he can avoid it by making sure
the RTTI node for J is emitted in V, either by using #pragma interface or
(if #7 is adopted) by writing a dummy function which refers to typeid(J).
The same thing is true for other static data.

Unfortunately, this is much harder for template libraries, where we can't
anticipate what parameters our templates might be instantiated with.  If V
defines a template K<X> and T and U independently decide to throw a
K<int***>, there isn't much the author of V can do about it.

The author of a template library can adjust their design to avoid relying
on static data members being combined properly.  For instance, an allocator
pool is less effective if it's partitioned, but no less correct.  But there
seems to be nothing anyone can do to fix throwing a K<int***> from U into
A, unless we adjust RTTI in the same way.  In other words, #2.

We can significantly reduce the number of cases where this situation will
cause problems, but can't eliminate them without abandoning our reliance on
pointer comparison.
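
To illustrate what relying on pointer comparison means, here is a small C
model, assuming a hypothetical struct type_node standing in for an RTTI node
and two separately emitted copies of the node for the same type.  The real
runtime's data structures differ; comparing names is shown only as one
possible alternative to comparing addresses, and the name string used is
just illustrative.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for an RTTI node: only the type's name matters here. */
struct type_node {
    const char *mangled_name;
};

/* Two copies of the node for class J, as if one was emitted in T.so and
   another in U.so and the dynamic linker did not merge them. */
const struct type_node J_in_T = { "1J" };
const struct type_node J_in_U = { "1J" };

/* Fast, but only correct when every type has exactly one node. */
int same_type_by_address(const struct type_node *a, const struct type_node *b)
{
    assert(a != NULL && b != NULL);
    return a == b;
}

/* Slower, but tolerant of duplicated nodes. */
int same_type_by_name(const struct type_node *a, const struct type_node *b)
{
    assert(a != NULL && b != NULL);
    return a == b || strcmp(a->mangled_name, b->mangled_name) == 0;
}
```

With these two copies, the address test reports different types and the name
test reports the same type, which mirrors a catch clause in A failing to
match an object thrown from U.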

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-22 16:35                                                                         ` Jason Merrill
@ 2002-05-22 21:46                                                                           ` David Abrahams
  2002-05-22 23:05                                                                             ` Jason Merrill
  0 siblings, 1 reply; 104+ messages in thread
From: David Abrahams @ 2002-05-22 21:46 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Mark Mitchell, Martin v. Loewis, H . J . Lu, drepper,
	Ralf W. Grosse-Kunstleve, gcc


From: "Jason Merrill" <jason@redhat.com>


> >>>>> "David" == David Abrahams <david.abrahams@rcn.com> writes:
>
> > One thing we didn't discuss in detail was what should happen in case two
> > of a library's dependencies are already loaded, each with its own
> > definition for some shared symbol S.
>
> i.e.
>
> A.so -> T.so
>      -> U.so
>
> B.so -> U.so
>
> ?  Here, if T and U both define S, refs in U will resolve to the copy in T,
> but refs in B won't, so we get the same problem.  Hmm.

No, that's not what I had in mind. In fact I assumed *** that B would get
the same copy of S that was currently used by U, which is to say the one in
T. Is there any reason it couldn't work that way?

> This problem would be solved if T were to link against U, or vice versa, or
> if both were linked against a third library which defined S.
>
> The problem is that RTLD_LOCAL really wants a strict delineation of
> provider and user;

At the RTLD_LOCAL boundary there already is a strict delineation.

> if a DSO uses a symbol from a DSO that it doesn't
> depend on, this premise is violated, and users will get confused.

I don't see the potential for confusion (leaving aside unloading for the
moment), since matching symbols in T and U above were required to be
identical anyway according to the ODR.

----
What I had in mind was more like this:

A.so -> T.so
B.so -> U.so

Now there are two shared spaces (A,T) and (B,U), each with its own copy of
S. Then:

C.so -> T.so, U.so

C wanted a single shared copy but ends up having to choose one or error. I
vote for the former.

---


> I would further clarify my proposal #6 thus:
>
> 7) Resolution of a relocation in a DSO loaded with RTLD_LOCAL only
>    considers definitions in the DSO itself and its dependencies.  If a
>    strong definition is seen in the normal breadth-first search of these
>    DSOs, it is used; otherwise, a weak definition is chosen by depth-first
>    search.

If you believe Martin, symbols we care about like RTTI info and template
static members are not weak... so I don't understand the relevance of 7.
Could you describe how it would play out in practice?

> Actually, I'd be inclined to adopt the second sentence for all cases, not
> just RTLD_LOCAL.  If a library provides a weak definition of something, and
> the executable provides a weak definition as well, it makes sense to me to
> use the library version.  Doing so would improve the usefulness of
> -Wl,--gc-sections (once it works).
>
> Anyway, adopting this proposal, T and U would each use their own
> definition, A would use the one from T, and B would use the one from U.  So
> the problem would come when trying to, say, throw from U into A.
>
> It's not difficult to imagine this sort of situation arising with
> vague-linkage entities that are emitted when needed.  For example: a
> library V defines a non-polymorphic class J but doesn't use its RTTI node.
> T and U link against V and both throw objects of type J.  A catches the one
> thrown in T, but not the one thrown in U.  We would have been fine if the
> RTTI node had been emitted in V, but it wasn't needed, so it wasn't
> emitted.
>
> I don't see any way to get ld.so to just give us the semantics we want for
> this subcase.

What about just following the semantics I had assumed you'd get anyway
(***)?

-Dave

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Duplicate data objects in shared libraries
  2002-05-22 21:46                                                                           ` David Abrahams
@ 2002-05-22 23:05                                                                             ` Jason Merrill
       [not found]                                                                               ` <20020529130945.A16909@lucon.org>
  0 siblings, 1 reply; 104+ messages in thread
From: Jason Merrill @ 2002-05-22 23:05 UTC (permalink / raw)
  To: David Abrahams
  Cc: Mark Mitchell, Martin v. Loewis, H . J . Lu, drepper,
	Ralf W. Grosse-Kunstleve, gcc

>>>>> "David" == David Abrahams <david.abrahams@rcn.com> writes:

> From: "Jason Merrill" <jason@redhat.com>

>> A.so -> T.so
>>      -> U.so
>> 
>> B.so -> U.so
>> 
> No, that's not what I had in mind. In fact I assumed *** that B would get
> the same copy of S that was currently used by U, which is to say the one in
> T. Is there any reason it couldn't work that way?

Because the one in T is not visible when we are loading B.  The symbols
used to satisfy relocs in U are not re-exported by U.

> What I had in mind was more like this:

> A.so -> T.so
> B.so -> U.so

> Now there are two shared spaces (A,T) and (B,U), each with its own copy of
> S. Then:

> C.so -> T.so, U.so

> C wanted a single shared copy but ends up having to choose one or error. I
> vote for the former.

In this case, T and U are already resolved when we load C, so they already
resolve to their own copies.  Under current Linux semantics, C would then
resolve to its own copies; my proposals would cause it to refer to the one
from T.

Under my proposal #7, your example and mine would have the same result.


>> I would further clarify my proposal #6 thus:
>> 
>> 7) Resolution of a relocation in a DSO loaded with RTLD_LOCAL only
>> considers definitions in the DSO itself and its dependencies.  If a
>> strong definition is seen in the normal breadth-first search of these
>> DSOs, it is used; otherwise, a weak definition is chosen by depth-first
>> search.

> If you believe Martin, symbols we care about like RTTI info and template
> static members are not weak...

I'm not sure that Martin said that; in any case, it's wrong.  Symbols with
vague linkage are emitted as weak in order to avoid multiple definition
errors in static links.

>...

>> I don't see any way to get ld.so to just give us the semantics we want
>> for this subcase.

> What about just following the semantics I had assumed you'd get anyway
> (***)?

Well, that's another possibility.

8) #7, but also consider the prior resolutions of relocs to the same symbol
   in a dependency.

This would fix my example, but not yours.  And I don't think it's as clean
a design as #7 alone.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: PATCH: Treat RTLD_LOCAL like Solaris (Re: Duplicate data objects in shared libraries)
       [not found]                                                                                     ` <1022790116.22692.205.camel@myware.mynet>
@ 2002-05-30 18:51                                                                                       ` Jason Merrill
       [not found]                                                                                       ` <wvlit54530i.fsf@prospero.cambridge.redhat.com>
  1 sibling, 0 replies; 104+ messages in thread
From: Jason Merrill @ 2002-05-30 18:51 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: David Abrahams, H . J . Lu, Martin v. Loewis,
	Ralf W.  Grosse-Kunstleve, GNU C Library, gcc

>>>>> "Ulrich" == Ulrich Drepper <drepper@redhat.com> writes:

Thanks for responding.

> On Thu, 2002-05-30 at 13:01, Jason Merrill wrote:

>> No; Solaris' behavior is no more helpful for real-world C++ examples.

> Very specialized and maybe true for gcc.

The EDG frontend, used by Intel among many others, also relies on address
comparison for typeinfos.  And, indeed, the Intel compiler produces code
that fails in exactly the same way as the gcc output.

Intel also seems to use weak symbols, and even linkonce sections, for
template instantiations.

Sun CC 4.2 fails in the same way under Solaris 5.8 (after I make the
necessary changes to accommodate that ancient compiler; fortunately, it
supports EH).

Interestingly, SGI CC 7.30 passes the test, even though it also uses the
EDG frontend.  I'll investigate why; I'm guessing dlopen works differently
on Irix.

>> Is there any kind of a standard for ld.so symbol resolution behavior?

> Most things the generic ELF ABI covers.  But the behavior of dlopen() on
> the ELF level is not covered by any standard.

>> 1) Always prefer the last weak definition if no strong definition is seen.

> Special weak symbol handling is going away.  The ELF spec didn't clearly
> state what has to happen and so a few implementations (like glibc) added
> this kind of support.  But it's not portable and it's unnecessarily
> reducing the speed.

It's not portable because, as you say, there's no standard.  That seems
like an opportunity to explore what a future standard should say.

Speed should not trump correctness.  If you have a different idea for how
we can get proper C++ semantics, I'd love to hear it.

>> 2) If a DSO A has two unrelated dependencies B and C which both define (and
>> use) the same weak symbol, add C to the dependency list of this loaded
>> copy of B.

> If I understand this correctly you mean

>    A ---> B
>      |
>      +--> C

> and B defines and uses 'foo' and C defines and uses 'foo'.

> In this case it makes no difference whether C gets added to the
> dependency list of B since B's scope comes first.

Yes, I mentioned that this was only meaningful in conjunction with #1,
which would cause the last definition to be chosen.

>> 3) When resolving a relocation from a DSO loaded with RTLD_LOCAL, start
>> looking from the DSO itself; do not consider other RTLD_LOCAL objects
>> which depend on it.

> Starting with the DSO itself is what you select with DF_SYMBOLIC.  It's
> generally a very bad idea.  Which other scopes are searched depends
> heavily on the actual situation.  There won't be any "this is how C++
> needs it and therefore this is how it's gonna be".

Of course not, I'm mostly looking for input.  But C++ places more complex
demands on the linker, leading to situations that we hadn't considered
before; we need to consider what the right thing to do is in those
situations.  I've suggested what I think the right thing is, which I
believe is appropriate for all languages, not just C++, but I'm very
interested in your opinion; you are certainly more familiar with ld.so than
I.

> I'll look at all this hopefully in two weeks from now.

Thanks.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: PATCH: Treat RTLD_LOCAL like Solaris (Re: Duplicate data objects in shared libraries)
       [not found]                                                                                       ` <wvlit54530i.fsf@prospero.cambridge.redhat.com>
@ 2002-05-31  0:28                                                                                         ` Jason Merrill
  2002-05-31  0:39                                                                                           ` Ulrich Drepper
  2003-04-10 15:31                                                                                         ` Jason Merrill
  1 sibling, 1 reply; 104+ messages in thread
From: Jason Merrill @ 2002-05-31  0:28 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: David Abrahams, H . J . Lu, Martin v. Loewis,
	Ralf W.  Grosse-Kunstleve, GNU C Library, gcc, Jason Merrill

>>>>> "Jason" == Jason Merrill <jason@redhat.com> writes:

> Interestingly, SGI CC 7.30 passes the test, even though it also uses the
> EDG frontend.  I'll investigate why; I'm guessing dlopen works differently
> on Irix.

The EDG frontend uses address comparison of common symbols, rather than
weak; for some reason, this seems to work under Irix.  This could have
something to do with the COMMON/MIPS_ACOMMON distinction in nm output
between the defs in the users and library, respectively.

Weak symbols seem to work about the same as under Linux.

Jason

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: PATCH: Treat RTLD_LOCAL like Solaris (Re: Duplicate data objects in shared libraries)
  2002-05-31  0:28                                                                                         ` Jason Merrill
@ 2002-05-31  0:39                                                                                           ` Ulrich Drepper
  0 siblings, 0 replies; 104+ messages in thread
From: Ulrich Drepper @ 2002-05-31  0:39 UTC (permalink / raw)
  To: Jason Merrill
  Cc: David Abrahams, H . J . Lu, Martin v. Loewis,
	Ralf W.    Grosse-Kunstleve, GNU C Library, gcc

[-- Attachment #1: Type: text/plain, Size: 413 bytes --]

On Thu, 2002-05-30 at 22:15, Jason Merrill wrote:

> Weak symbols seem to work about the same as under Linux.

Irix is the other system which got the handling of weak symbols wrong.

-- 
---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: PATCH: Treat RTLD_LOCAL like Solaris (Re: Duplicate data objects in shared libraries)
       [not found]                                                                                       ` <wvlit54530i.fsf@prospero.cambridge.redhat.com>
  2002-05-31  0:28                                                                                         ` Jason Merrill
@ 2003-04-10 15:31                                                                                         ` Jason Merrill
  2003-04-10 15:32                                                                                           ` H. J. Lu
  1 sibling, 1 reply; 104+ messages in thread
From: Jason Merrill @ 2003-04-10 15:31 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: GNU C Library, gcc

On Fri, 31 May 2002 00:56:29 -0400, Jason Merrill <jason@redhat.com> wrote:

>>>>>> "Ulrich" == Ulrich Drepper <drepper@redhat.com> writes:
>
> Thanks for responding.
>
>> On Thu, 2002-05-30 at 13:01, Jason Merrill wrote:
>
>>> No; Solaris' behavior is no more helpful for real-world C++ examples.
>
>> Very specialized and maybe true for gcc.
>
> The EDG frontend, used by Intel among many others, also relies on address
> comparison for typeinfos.  And, indeed, the Intel compiler produces code
> that fails in exactly the same way as the gcc output.
>
> Intel also seems to use weak symbols, and even linkonce sections, for
> template instantiations.
>
> Sun CC 4.2 fails in the same way under Solaris 5.8 (after I make the
> necessary changes to accommodate that ancient compiler; fortunately, it
> supports EH).
>
> Interestingly, SGI CC 7.30 passes the test, even though it also uses the
> EDG frontend.  I'll investigate why; I'm guessing dlopen works differently
> on Irix.
>
>>> Is there any kind of a standard for ld.so symbol resolution behavior?
>
>> Most things the generic ELF ABI covers.  But the behavior of dlopen() on
>> the ELF level is not covered by any standard.
>
>>> 1) Always prefer the last weak definition if no strong definition is seen.
>
>> Special weak symbol handling is going away.  The ELF spec didn't clearly
>> state what has to happen and so a few implementations (like glibc) added
>> this kind of support.  But it's not portable and it's unnecessarily
>> reducing the speed.
>
> It's not portable because, as you say, there's no standard.  That seems
> like an opportunity to explore what a future standard should say.
>
> Speed should not trump correctness.  If you have a different idea for how
> we can get proper C++ semantics, I'd love to hear it.
>
>>> 2) If a DSO A has two unrelated dependencies B and C which both define (and
>>> use) the same weak symbol, add C to the dependency list of this loaded
>>> copy of B.
>
>> If I understand this correctly you mean
>
>>    A ---> B
>>      |
>>      +--> C
>
>> and B defines and uses 'foo' and C defines and uses 'foo'.
>
>> In this case it makes no difference whether C gets added to the
>> dependency list of B since B's scope comes first.
>
> Yes, I mentioned that this was only meaningful in conjunction with #1,
> which would cause the last definition to be chosen.
>
>>> 3) When resolving a relocation from a DSO loaded with RTLD_LOCAL, start
>>> looking from the DSO itself; do not consider other RTLD_LOCAL objects
>>> which depend on it.
>
>> Starting with the DSO itself is what you select with DF_SYMBOLIC.  It's
>> generally a very bad idea.  Which other scopes are searched depends
>> heavily on the actual situation.  There won't be any "this is how C++
>> needs it and therefore this is how it's gonna be".
>
> Of course not, I'm mostly looking for input.  But C++ places more complex
> demands on the linker, leading to situations that we hadn't considered
> before; we need to consider what the right thing to do is in those
> situations.  I've suggested what I think the right thing is, which I
> believe is appropriate for all languages, not just C++, but I'm very
> interested in your opinion; you are certainly more familiar with ld.so than
> I.
>
>> I'll look at all this hopefully in two weeks from now.
>
> Thanks.

Ping?

Jason
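
Proposal 1 in the quoted exchange can be contrasted with a plain first-match lookup using a toy model. This is only a sketch: Definition, resolve_first and resolve_prefer_last_weak are hypothetical names, the scope is reduced to an ordered list, and the rule that the first strong definition wins is an assumption made for the model. None of this is glibc code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of one definition of a symbol found in a lookup scope.
struct Definition {
    std::string dso;  // object that provides the definition
    bool weak;        // true for a weak definition
};

// First-match lookup: the first definition in scope order wins, weak or not.
const Definition* resolve_first(const std::vector<Definition>& scope) {
    return scope.empty() ? nullptr : &scope.front();
}

// Proposal 1: a strong definition wins if one is seen (assumed here to be
// the first strong one); otherwise the last weak definition is chosen.
const Definition* resolve_prefer_last_weak(const std::vector<Definition>& scope) {
    const Definition* chosen = nullptr;
    for (const Definition& d : scope) {
        if (!d.weak) return &d;  // strong definition ends the search
        chosen = &d;             // remember the latest weak definition
    }
    return chosen;
}
```

With the scope {B (weak), C (weak)} from the A/B/C diagram, the first-match rule picks B while the proposed rule picks C, which is why proposal 2 is only meaningful in conjunction with proposal 1.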

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: PATCH: Treat RTLD_LOCAL like Solaris (Re: Duplicate data objects in shared libraries)
  2003-04-10 15:31                                                                                         ` Jason Merrill
@ 2003-04-10 15:32                                                                                           ` H. J. Lu
  2003-04-10 16:20                                                                                             ` H. J. Lu
  0 siblings, 1 reply; 104+ messages in thread
From: H. J. Lu @ 2003-04-10 15:32 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Ulrich Drepper, GNU C Library, gcc

While we are on the topic, here is a patch. At the current glibc, it
should change anything since l_searchlist.r_list[0] is always args.map.
But it is used for RTLD_LOCAL change.


H.J.
----
2002-05-30  H.J. Lu  <hjl@gnu.org>

	* elf/dl-open.c (_dl_open): Check args.map->l_opencount instead
	of args.map->l_searchlist.r_list[0]->l_opencount for error.

--- elf/dl-open.c.close	Thu May 30 08:58:36 2002
+++ elf/dl-open.c	Thu May 30 14:10:39 2002
@@ -503,7 +507,7 @@ _dl_open (const char *file, int mode, co
 
 	  /* Increment open counters for all objects since this
 	     sometimes has not happened yet.  */
-	  if (args.map->l_searchlist.r_list[0]->l_opencount == 0)
+	  if (args.map->l_opencount == 0)
 	    for (i = 0; i < args.map->l_searchlist.r_nlist; ++i)
 	      ++args.map->l_searchlist.r_list[i]->l_opencount;
 

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: PATCH: Treat RTLD_LOCAL like Solaris (Re: Duplicate data objects in shared libraries)
  2003-04-10 15:32                                                                                           ` H. J. Lu
@ 2003-04-10 16:20                                                                                             ` H. J. Lu
  0 siblings, 0 replies; 104+ messages in thread
From: H. J. Lu @ 2003-04-10 16:20 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Ulrich Drepper, GNU C Library, gcc

Oops. I meant my patch SHOULDN'T change anything.


H.J.
On Thu, Apr 10, 2003 at 08:00:09AM -0700, H. J. Lu wrote:
> While we are on the topic, here is a patch. At the current glibc, it
> should change anything since l_searchlist.r_list[0] is always args.map.
> But it is used for RTLD_LOCAL change.

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-16  0:16   ` Mark Mitchell
@ 2002-05-16  4:35     ` Martin v. Löwis
  0 siblings, 0 replies; 104+ messages in thread
From: Martin v. Löwis @ 2002-05-16  4:35 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Ralf W. Grosse-Kunstleve, c++std-ext, gcc

Mark Mitchell <mark@codesourcery.com> writes:

> It occurs to me that another way to solve these problems is to change
> the rules for loading modules, from the point of view of Python.  The
> whole point of using RTLD_LOCAL is to prevent name clashes between
> modules.
> 
> Why not define that problem away?
> 
> If the rule was that, for example, all externally visible names in the
> loaded modules had to be within namespaces that were assigned by some
> naming authority, or otherwise consistent, then you could load modules
> with RTLD_GLOBAL.
> 
> Yes, I know this is non-trivial, but if you want to use C++, it's the
> practical solution over the medium term.

That indeed is an option: with sys.setdlopenflags, the application can
override Python's default of RTLD_LOCAL.

Regards,
Martin
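
At the dlopen level, the choice that sys.setdlopenflags exposes is the mode argument passed when a module is loaded. A minimal sketch, assuming a POSIX <dlfcn.h>; module_open_mode and open_module are hypothetical helper names, not part of Python or glibc:

```cpp
#include <cassert>
#include <dlfcn.h>

// Mode bits for dlopen. RTLD_LOCAL keeps the module's symbols private;
// RTLD_GLOBAL makes them available to objects loaded afterwards.
inline int module_open_mode(bool share_symbols) {
    return RTLD_NOW | (share_symbols ? RTLD_GLOBAL : RTLD_LOCAL);
}

// Hypothetical helper: open an extension module with the chosen visibility.
// Returns nullptr on failure; dlerror() then describes the problem.
inline void* open_module(const char* path, bool share_symbols) {
    return dlopen(path, module_open_mode(share_symbols));
}
```

Loading every module with share_symbols set to true corresponds to the RTLD_GLOBAL scheme Mark describes, and it brings back the name-clash risk that RTLD_LOCAL was meant to avoid.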

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-15  5:05 ` Martin v. Löwis
@ 2002-05-16  0:16   ` Mark Mitchell
  2002-05-16  4:35     ` Martin v. Löwis
  0 siblings, 1 reply; 104+ messages in thread
From: Mark Mitchell @ 2002-05-16  0:16 UTC (permalink / raw)
  To: Martin v. Löwis, Ralf W. Grosse-Kunstleve; +Cc: c++std-ext, gcc



--On Wednesday, May 15, 2002 01:07:02 PM +0200 "Martin v. Löwis" <loewis@informatik.hu-berlin.de> wrote:

> "Ralf W. Grosse-Kunstleve" <rwgk@cci.lbl.gov> writes:
>
>> Once you guys have figured out what "the right" approach is, PLEASE
>> document it clearly and in a way that can be understood by a wider
>> audience.
>
> I think the message that can be understood by a wider audience is:
> don't use C++ for Python extension modules.

As someone knowledgeable about C++ implementations, and somewhat
knowledgeable about Python's implementation, I second this advice.

That might change if the ELF/C++/etc. semantics get better in the ways
that are being discussed, but at present, I will simply say that if
I were going to build a Python extension module, I would do it in C,
even though I generally find that I am more cost-effective when working
in C++.

It occurs to me that another way to solve these problems is to change
the rules for loading modules, from the point of view of Python.  The
whole point of using RTLD_LOCAL is to prevent name clashes between
modules.

Why not define that problem away?

If the rule was that, for example, all externally visible names in the
loaded modules had to be within namespaces that were assigned by some
naming authority, or otherwise consistent, then you could load modules
with RTLD_GLOBAL.

Yes, I know this is non-trivial, but if you want to use C++, it's the
practical solution over the medium term.

-- Mark Mitchell                mark@codesourcery.com
CodeSourcery, LLC            http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-15  8:23 Ralf W. Grosse-Kunstleve
  2002-05-15  8:32 ` David Abrahams
@ 2002-05-15 10:44 ` Martin v. Löwis
  1 sibling, 0 replies; 104+ messages in thread
From: Martin v. Löwis @ 2002-05-15 10:44 UTC (permalink / raw)
  To: Ralf W. Grosse-Kunstleve; +Cc: c++std-ext, gcc

"Ralf W. Grosse-Kunstleve" <rwgk@cci.lbl.gov> writes:

> > I think the message that can be understood by a wider audience is:
> > don't use C++ for Python extension modules.
> 
> Surely You're Joking, Mr. Loewis!

Only half-ly.

> As it stands, Linux/gcc3 is the only platform that does not do what
> we want. Are you sure it is helpful to tell people to "go away"?

For *your* application, only Linux/gcc3 fails to work. For any
compiler/platform combination, I could produce numerous cases that
won't work as many people may expect.

If you want a simple answer, you have to accept the one I gave. If
you cannot accept this, you must accept that the issues are much more
involved.

> - Is the situation different if python is compiled and linked with a C++
>   compiler (--with-cxx)?

Not on your platform. On other platforms, C++ extensions won't work at
all unless Python is compiled with C++ (AIX, with the system compiler,
is one of these platforms).

> - More generally: What are the issues when using dlopen in any C++
>   program.

Too numerous to list in this message. They roughly group into the
following categories:

- constructor/destructor execution: may or may not execute at dlopen
  time, block-static variables may or may not work correctly.
- duplicate definition of things, where things is one of
  - virtual method tables
  - typeinfo object
  - template instantiations
  - malloc heaps
  - locales
  - ...
- conflicting symbol spaces
- dynamic failures to resolve symbols
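
As a small illustration of the "duplicate definition" category, consider a block-static variable in an inline function. This is a sketch with hypothetical names (call_count, record_call); it shows the single-program guarantee that the language promises, not the failure itself.

```cpp
#include <cassert>

// An inline function with a block-static variable. The language requires a
// single `count` for the whole program; the toolchain typically emits a copy
// in every object that uses the function and relies on them being merged.
inline int& call_count() {
    static int count = 0;
    return count;
}

// Increments the shared counter and returns the new value.
inline int record_call() {
    return ++call_count();
}
```

Inside one executable every caller sees the same counter. If two modules that each contain a copy are loaded so that the copies are not merged, each module may end up updating its own counter.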

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-15  8:23 Ralf W. Grosse-Kunstleve
@ 2002-05-15  8:32 ` David Abrahams
  2002-05-15 10:44 ` Martin v. Löwis
  1 sibling, 0 replies; 104+ messages in thread
From: David Abrahams @ 2002-05-15  8:32 UTC (permalink / raw)
  To: c++std-ext; +Cc: c++std-ext, gcc, Ralf W. Grosse-Kunstleve


From: "Ralf W. Grosse-Kunstleve" <rwgk@cci.lbl.gov>


> To: C++ extensions mailing list
> Message c++std-ext-5016
>
> > I think the message that can be understood by a wider audience is:
> > don't use C++ for Python extension modules.
>
> Surely You're Joking, Mr. Loewis!
> As it stands, Linux/gcc3 is the only platform that does not do what
> we want.

Well, to be fair... Linux/gcc3 doesn't do what you and I want for
exceptions with Boost.Python, but it appears that none of the Unices have a
reasonable behavior for many other parts of the C++ language. It seems as
though a better job could (and maybe should) be done by the compiler with
the parts it can handle (EH, RTTI), but the deep problem lies elsewhere.

> Are you sure it is helpful to tell people to "go away"?

This part I agree with you on, but I think that Martin was just using a
shorthand for "If you're not prepared to think about some interesting
details of how the loader works, don't use C++ for Python extension
modules". IOW, I'm guessing Martin thinks the wider audience isn't prepared
to think about those details... and I bet he's right.

> Questions:
>
> - Is the situation different if python is compiled and linked with a C++
>   compiler (--with-cxx)?

No.

> - More generally: What are the issues when using dlopen in any C++
>   program.

It appears that if two dlopened libraries link to a common shared library,
one of them will disagree with the common library about the identities of
all weakly-linked static data. If you need more explanation, give me a call
and I can talk you through the implications.

-Dave
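
The disagreement about identities can be modeled without any shared libraries. In this sketch TypeDescriptor is a hypothetical stand-in for a compiler-emitted typeinfo object, and the two constants play the role of the copies that two dlopened modules would each keep; handler_matches models a runtime that matches exception handlers by address.

```cpp
#include <cassert>

// Hypothetical stand-in for a compiler-emitted typeinfo object.
struct TypeDescriptor {
    const char* name;
};

// Two copies of "the same" descriptor, as if each module kept its own.
static const TypeDescriptor copy_in_module_a{"my_error"};
static const TypeDescriptor copy_in_module_b{"my_error"};

// Handler matching by identity: a catch clause accepts the thrown object
// only if both sides refer to the very same descriptor.
bool handler_matches(const TypeDescriptor* thrown, const TypeDescriptor* caught) {
    return thrown == caught;
}
```

An exception thrown with module A's descriptor does not match a handler that refers to module B's copy, even though both describe the same type, so the exception propagates past the handler that was written to catch it.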



^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
@ 2002-05-15  8:23 Ralf W. Grosse-Kunstleve
  2002-05-15  8:32 ` David Abrahams
  2002-05-15 10:44 ` Martin v. Löwis
  0 siblings, 2 replies; 104+ messages in thread
From: Ralf W. Grosse-Kunstleve @ 2002-05-15  8:23 UTC (permalink / raw)
  To: loewis; +Cc: c++std-ext, gcc, rwgk

> I think the message that can be understood by a wider audience is:
> don't use C++ for Python extension modules.

Surely You're Joking, Mr. Loewis!
As it stands, Linux/gcc3 is the only platform that does not do what
we want. Are you sure it is helpful to tell people to "go away"?

Questions:

- Is the situation different if python is compiled and linked with a C++
  compiler (--with-cxx)?

- More generally: What are the issues when using dlopen in any C++
  program.

Ralf

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
  2002-05-14 15:37 Ralf W. Grosse-Kunstleve
@ 2002-05-15  5:05 ` Martin v. Löwis
  2002-05-16  0:16   ` Mark Mitchell
  0 siblings, 1 reply; 104+ messages in thread
From: Martin v. Löwis @ 2002-05-15  5:05 UTC (permalink / raw)
  To: Ralf W. Grosse-Kunstleve; +Cc: c++std-ext, gcc

"Ralf W. Grosse-Kunstleve" <rwgk@cci.lbl.gov> writes:

> Once you guys have figured out what "the right" approach is, PLEASE
> document it clearly and in a way that can be understood by a wider
> audience.

I think the message that can be understood by a wider audience is:
don't use C++ for Python extension modules.

Unless the underlying technologies change, this *is* a difficult matter,
and you really have to learn the inner workings to predict whether a
certain application will "work" or not.

Regards,
Martin

^ permalink raw reply	[flat|nested] 104+ messages in thread

* Re: Minimal GCC/Linux shared lib + EH bug example
@ 2002-05-14 15:37 Ralf W. Grosse-Kunstleve
  2002-05-15  5:05 ` Martin v. Löwis
  0 siblings, 1 reply; 104+ messages in thread
From: Ralf W. Grosse-Kunstleve @ 2002-05-14 15:37 UTC (permalink / raw)
  To: loewis; +Cc: c++std-ext, gcc, jason, mark, rwgk

Martin v. Loewis wrote:
> You might be surprised how many users have that expectation. As you
> know, Ralf originally was linking a static libboost, and it never
> occurred to him that something might be wrong in the build process.

This is not quite accurate: I really had no concept of what is going
on under the hood. Attempts to find documentation about dynamic loading
did not produce any material that I, as the average user, could easily
absorb (did I miss something?). So I just stuck to what seemed to
work until it started breaking many months later.

Once you guys have figured out what "the right" approach is, PLEASE
document it clearly and in a way that can be understood by a wider
audience.

Ralf

^ permalink raw reply	[flat|nested] 104+ messages in thread

Thread overview: 104+ messages
     [not found] <09b501c1f634$04747d80$6501a8c0@boostconsulting.com>
2002-05-12  4:57 ` Minimal GCC/Linux shared lib + EH bug example Jason Merrill
2002-05-12  6:42   ` David Abrahams
2002-05-12  7:30     ` Jason Merrill
2002-05-12  7:31       ` David Abrahams
2002-05-12  8:07         ` Jason Merrill
2002-05-12  9:24           ` David Abrahams
2002-05-12  9:31       ` Martin v. Loewis
2002-05-12  9:34         ` David Abrahams
2002-05-12 12:17       ` Mark Mitchell
2002-05-12 12:24         ` Martin v. Loewis
2002-05-12 12:29           ` Mark Mitchell
2002-05-12 12:36             ` Jason Merrill
2002-05-12 12:37               ` Mark Mitchell
2002-05-12 16:55               ` Jason Merrill
2002-05-12 13:41             ` David Abrahams
2002-05-13  1:34               ` Martin v. Loewis
2002-05-13  2:05                 ` Mark Mitchell
2002-05-13  5:44                 ` David Abrahams
2002-05-13 16:58                   ` Martin v. Loewis
2002-05-13 21:39                     ` David Abrahams
2002-05-14  2:34                       ` Martin v. Loewis
2002-05-14 13:12                         ` David Abrahams
2002-05-14 14:17                           ` Martin v. Loewis
2002-05-12 12:36           ` David Abrahams
2002-05-13  1:28             ` Martin v. Loewis
2002-05-13  5:00               ` David Abrahams
2002-05-13 16:50                 ` Martin v. Loewis
2002-05-13 19:00                   ` David Abrahams
2002-05-14  2:14                     ` Martin v. Loewis
2002-05-14  6:07                       ` David Abrahams
2002-05-14 13:53                         ` Martin v. Loewis
2002-05-14 14:45                           ` David Abrahams
2002-05-15  2:54                             ` Martin v. Loewis
2002-05-14 15:28                           ` Jason Merrill
2002-05-14 18:32                             ` Daniel Jacobowitz
2002-05-15  1:34                               ` Martin v. Loewis
2002-05-14 13:23                       ` Sean Parent
2002-05-14 14:08                         ` David Abrahams
2002-05-14 18:38                           ` Sean Parent
2002-05-14 22:50                             ` David Abrahams
2002-05-15 11:38                               ` Sean Parent
2002-05-15 11:50                                 ` Matthew Austern
2002-05-15 12:29                                   ` Joe Buck
2002-05-15 17:26                                     ` David Abrahams
2002-05-15 20:21                                     ` H . J . Lu
2002-05-15 22:35                                       ` David Abrahams
2002-05-16 11:18                                         ` H . J . Lu
2002-05-18 16:53                                           ` David Abrahams
2002-05-18 17:55                                             ` Martin v. Loewis
2002-05-18 19:06                                               ` David Abrahams
2002-05-19  4:18                                                 ` Duplicate data objects in shared libraries Martin v. Loewis
2002-05-19  5:00                                                   ` David Abrahams
2002-05-19  5:14                                                     ` Martin v. Loewis
2002-05-19  5:48                                                       ` David Abrahams
2002-05-19 15:05                                                         ` Martin v. Loewis
2002-05-20  1:42                                                           ` Jason Merrill
2002-05-20  3:47                                                             ` H . J . Lu
2002-05-20  4:08                                                             ` Mark Mitchell
2002-05-20  9:55                                                               ` Jason Merrill
2002-05-20 10:15                                                                 ` Mark Mitchell
2002-05-20 12:42                                                                   ` Jason Merrill
2002-05-20 12:53                                                                     ` Mark Mitchell
2002-05-20 13:23                                                                       ` Jason Merrill
2002-05-20 13:28                                                                       ` David Abrahams
2002-05-22 16:35                                                                         ` Jason Merrill
2002-05-22 21:46                                                                           ` David Abrahams
2002-05-22 23:05                                                                             ` Jason Merrill
     [not found]                                                                               ` <20020529130945.A16909@lucon.org>
     [not found]                                                                                 ` <039401c20759$a3ba1400$6601a8c0@boostconsulting.com>
     [not found]                                                                                   ` <wvl8z615rsz.fsf@prospero.cambridge.redhat.com>
     [not found]                                                                                     ` <1022790116.22692.205.camel@myware.mynet>
2002-05-30 18:51                                                                                       ` PATCH: Treat RTLD_LOCAL like Solaris (Re: Duplicate data objects in shared libraries) Jason Merrill
     [not found]                                                                                       ` <wvlit54530i.fsf@prospero.cambridge.redhat.com>
2002-05-31  0:28                                                                                         ` Jason Merrill
2002-05-31  0:39                                                                                           ` Ulrich Drepper
2003-04-10 15:31                                                                                         ` Jason Merrill
2003-04-10 15:32                                                                                           ` H. J. Lu
2003-04-10 16:20                                                                                             ` H. J. Lu
2002-05-20  7:42                                                             ` Duplicate data objects in shared libraries David Abrahams
2002-05-20  9:34                                                               ` Jason Merrill
2002-05-20  9:57                                                                 ` David Abrahams
2002-05-20 10:28                                                                 ` H . J . Lu
2002-05-20 13:49                                                                   ` Jason Merrill
2002-05-20 13:59                                                                     ` H . J . Lu
2002-05-20 14:17                                                                       ` Jason Merrill
2002-05-20 18:19                                                                         ` H . J . Lu
2002-05-20 14:32                                                                       ` David Abrahams
2002-05-20 14:32                                                                     ` David Abrahams
2002-05-20 15:31                                                                     ` Martin v. Loewis
2002-05-21 19:07                                                                     ` H . J . Lu
2002-05-22  1:46                                                                       ` Martin v. Loewis
2002-05-20 13:26                                                                 ` David Beazley
2002-05-20 13:57                                                                   ` H . J . Lu
2002-05-20 14:36                                                                     ` David Beazley
2002-05-20 15:50                                                                 ` Michael Matz
2002-05-18 19:13                                               ` Minimal GCC/Linux shared lib + EH bug example David Abrahams
2002-05-19  4:29                                                 ` Martin v. Loewis
2002-05-19  5:10                                                   ` David Abrahams
2002-05-19 14:48                                                     ` Martin v. Loewis
2002-05-15 16:36                                 ` David Abrahams
2002-05-15 19:26                                   ` Jeff Sturm
2002-05-12  8:17     ` Martin v. Loewis
2002-05-14 15:37 Ralf W. Grosse-Kunstleve
2002-05-15  5:05 ` Martin v. Löwis
2002-05-16  0:16   ` Mark Mitchell
2002-05-16  4:35     ` Martin v. Löwis
2002-05-15  8:23 Ralf W. Grosse-Kunstleve
2002-05-15  8:32 ` David Abrahams
2002-05-15 10:44 ` Martin v. Löwis
