GNU dlopen(3) differs from POSIX/IEEE


* GNU dlopen(3) differs from POSIX/IEEE
@ 2016-01-01  0:00 Suprateeka R Hegde
  2016-01-01  0:00 ` Carlos O'Donell
  0 siblings, 1 reply; 12+ messages in thread
From: Suprateeka R Hegde @ 2016-01-01  0:00 UTC (permalink / raw)
  To: gnu-gabi

Hi

The RTLD_GLOBAL flag of dlopen(3) under POSIX/IEEE standards says "The 
executable object file's symbols shall be made available for relocation 
processing of any other executable object file".

(http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlopen.html)

However, on a GNU/Linux system, the manpage says "The symbols defined by 
this library will be made available for symbol resolution of 
subsequently loaded libraries".

And yes, there is a difference between the two. According to the
POSIX/IEEE one, the symbols from a dlopened library are available for
symbol resolution in the executable (a.out) also. The GNU one seems to
restrict it to "subsequently" opened objects only, and not "any" object.

The following case fails on GNU/Linux, but works on other POSIX 
compliant systems.

---
$ cat main.c
#include <dlfcn.h>
extern void foo(void);
int main()
{
          dlopen("./libfoo1.so", RTLD_GLOBAL);
          foo();
          return 0;
}

$ cat libfoo.c
#include <stdio.h>
void foo(void) { printf("In foo\n"); }

$ cc -fpic -shared libfoo.c -o libfoo.so
$ cc main.c -ldl # Read Note-1 at the end

$ ./a.out
Segmentation fault (core dumped)

$ LD_PRELOAD=./libfoo1.so ./a.out
In foo
---

That means dlopen RTLD_GLOBAL was not effective. LD_PRELOAD was 
effective. Of course the entire exercise is for lazy bind mode.
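
The test above does not check the return value of dlopen(), so a failed
load cannot be told apart from a difference in symbol scope. POSIX also
requires the mode argument to include either RTLD_LAZY or RTLD_NOW, with
RTLD_GLOBAL OR-ed in. A minimal sketch of a checked call follows;
open_global is a hypothetical helper name and is not part of the original
test:

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: open a library with global visibility and report
 * why the load failed, if it did. Returns the handle or NULL. */
void *open_global(const char *path)
{
    /* A binding mode (RTLD_LAZY or RTLD_NOW) must accompany RTLD_GLOBAL. */
    void *handle = dlopen(path, RTLD_LAZY | RTLD_GLOBAL);

    if (handle == NULL)
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    return handle;
}
```

Calling open_global("./libfoo1.so") before foo() would at least show
whether the library was loaded at all before the call is attempted.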

Is there any reason why GNU differs here? Does it mean the GNU variant
is not POSIX/IEEE compliant?

-- 
Supra

Note-1:

Without providing libfoo on the link line, I could not get a JUMP_SLOT
for foo. So I provided -lfoo for the link-edit phase and then renamed 
libfoo.so to libfoo1.so and also created a dummy libfoo.so without foo. 
This way, I could get a JUMP_SLOT for foo. This hack was not necessary 
on other platforms as foo gets a PLT entry even without definition. By 
getting a JUMP_SLOT, I could verify if LD_PRELOAD works in this case.


* Re: GNU dlopen(3) differs from POSIX/IEEE
  2016-01-01  0:00 GNU dlopen(3) differs from POSIX/IEEE Suprateeka R Hegde
@ 2016-01-01  0:00 ` Carlos O'Donell
  2016-01-01  0:00   ` Suprateeka R Hegde
  0 siblings, 1 reply; 12+ messages in thread
From: Carlos O'Donell @ 2016-01-01  0:00 UTC (permalink / raw)
  To: hegdesmailbox, gnu-gabi

On 06/13/2016 10:48 AM, Suprateeka R Hegde wrote:
> Without providing libfoo on the link line, I could not get a JUMP_SLOT
> for foo. So I provided -lfoo for the link-edit phase and then renamed
> libfoo.so to libfoo1.so and also created a dummy libfoo.so without
> foo. This way, I could get a JUMP_SLOT for foo. This hack was not
> necessary on other platforms as foo gets a PLT entry even without
> definition. By getting a JUMP_SLOT, I could verify if LD_PRELOAD
> works in this case.

Correct, you don't get a PLT entry for foo unless it's in a shared
library at link-edit time.

Could you actually provide the exact steps you used on a GNU/Linux-based
system to produce the final executable?

My experience is that you will see either a failure at link-edit
time or a failure at runtime (missing libfoo.so, undefined symbol foo),
and will never get to the point where you can run the application
and get a segfault. I'm curious to see exactly how you constructed
the scenario.

Therefore, if the application's global symbol references must all be
defined before it starts, there is no possibility for dlopen with
RTLD_GLOBAL to add symbols to the global scope that can be used
to resolve such symbols, because they are already resolved.

-- 
Cheers,
Carlos.


* Re: GNU dlopen(3) differs from POSIX/IEEE
  2016-01-01  0:00 ` Carlos O'Donell
@ 2016-01-01  0:00   ` Suprateeka R Hegde
  2016-01-01  0:00     ` Carlos O'Donell
  0 siblings, 1 reply; 12+ messages in thread
From: Suprateeka R Hegde @ 2016-01-01  0:00 UTC (permalink / raw)
  To: Carlos O'Donell, gnu-gabi

(I was away from work. Apologies for the delay in responding.)

On 13-Jun-2016 11:21 PM, Carlos O'Donell wrote:
> On 06/13/2016 10:48 AM, Suprateeka R Hegde wrote:
>> Without providing libfoo on the link line, I could not get a JUMP_SLOT
>> for foo. So I provided -lfoo for the link-edit phase and then renamed
>> libfoo.so to libfoo1.so and also created a dummy libfoo.so without
>> foo. This way, I could get a JUMP_SLOT for foo. This hack was not
>> necessary on other platforms as foo gets a PLT entry even without
>> definition. By getting a JUMP_SLOT, I could verify if LD_PRELOAD
>> works in this case.
>
> Correct, you don't get a PLT entry for foo unless it's in a shared
> library at link-edit time.
>
> Could you actually provide the exact steps you used on a GNU/Linux-based
> system to produce the final executable?
>
> My experience is that you will see either a failure at link-edit
> time or a failure at runtime (missing libfoo.so, undefined symbol foo),
> and will never get to the point where you can run the application
> and get a segfault. I'm curious to see exactly how you constructed
> the scenario.

This is just to show there are ways to bring symbols to global space at
runtime. LD_PRELOAD is one way. dlopen(3) with RTLD_GLOBAL would be
another way, but on a GNU-based system it does not behave as per the
POSIX/IEEE specs.

So I tested for at least the LD_PRELOAD way. Here are the exact steps:

---
$ cat main.c
#include <dlfcn.h>
extern void foo(void);
int main()
{
           dlopen("./libfoo1.so", RTLD_GLOBAL);
           foo();
           return 0;
}

$ cat libfoo.c
#include <stdio.h>
void foo(void) { printf("In foo\n"); }

$ cat libjunk.c
#include <stdio.h>
void junk(void) { printf("Junky\n"); }

$ cc -fpic -shared libfoo.c -o libfoo.so
$ cc main.c -ldl -L. -lfoo # Gets a JUMP_SLOT for foo
$ cp libfoo.so libfoo1.so

$ # Now change libfoo.so not to contain foo. In other words
$ # not to resolve foo from startup libfoo.so. Keep it unresolved
$ # for lazy bind to happen to a runtime-brought-in global foo.

$ cc -fpic -shared libjunk.c -o libfoo.so
$ LD_PRELOAD=./libfoo1.so ./a.out
In foo
---

As you see, the program works fine and foo is lazy bound to foo from
libfoo1.so, which has been brought in at runtime without being there at 
link-edit time. The same case would have worked even without LD_PRELOAD, 
and with only dlopen-RTLD_GLOBAL if the GNU dlopen(3) matched the spec 
defined by POSIX/IEEE.

> Therefore, if the application's global symbol references must all be
> defined before it starts, there is no possibility for dlopen with
> RTLD_GLOBAL to add symbols to the global scope that can be used
> to resolve such symbols, because they are already resolved.

No possibility with the current GNU implementation. But it is possible
with a POSIX/IEEE compliant dlopen(3). The test case works fine on other
POSIX compliant systems.

All I am saying is, dlopen(3) with RTLD_GLOBAL also should bring in foo 
at runtime to be compliant with POSIX.

--
Supra


* Re: GNU dlopen(3) differs from POSIX/IEEE
  2016-01-01  0:00   ` Suprateeka R Hegde
@ 2016-01-01  0:00     ` Carlos O'Donell
  2016-01-01  0:00       ` Suprateeka R Hegde
  0 siblings, 1 reply; 12+ messages in thread
From: Carlos O'Donell @ 2016-01-01  0:00 UTC (permalink / raw)
  To: hegdesmailbox, gnu-gabi

On 06/18/2016 12:11 AM, Suprateeka R Hegde wrote:
> All I am saying is, dlopen(3) with RTLD_GLOBAL also should bring in
> foo at runtime to be compliant with POSIX.

I disagree. Nothing in POSIX says that needs to be done. The
key failure in your reasoning is that you have assumed lazy
symbol resolution must happen at the point of the first function
call.

You have read "shall be made available for relocation" and
then used implementation knowledge to decide that _today_ those
relocations have a happens-after relationship with dlopen in your
program. But because lazy symbol resolution is not an observable
event for a well-defined program, and no guarantees are made, 
you can't make a happens-after relationship, and can't expect
'foo' to resolve to the loaded 'foo' that came into the global
scope with dlopen.

Perhaps in the future you want a mode where all lazy symbol
resolution is done before the first dlopen runs. Say we want to
do this to relocate the whole PLT and mark it read-only for
safety hardening.

If you were to _require_ lazy resolution to happen at the point
of the function call, which is what you're assuming here, then
it would prevent the above implementation from being conforming.
However, because POSIX says nothing about when the lazy symbol
resolution happens, or anything at all about it, your argument
is invalid.

What you observe on other implementations is a detail of the
implementation and a non-portable one.
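
A way to call the loaded function without depending on when the
executable's own lazy relocation is performed is an explicit dlsym()
lookup. The sketch below assumes the function takes no arguments and
returns nothing, as foo does in the test case; call_by_name is a
hypothetical helper name:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: look up a void(void) function by name in an
 * already opened handle and call it. Returns 0 on success, -1 if the
 * symbol was not found. */
int call_by_name(void *handle, const char *name)
{
    void (*fn)(void) = NULL;

    dlerror();                              /* clear any stale error state */
    *(void **)(&fn) = dlsym(handle, name);  /* object-to-function pointer idiom */
    if (fn == NULL)
        return -1;

    fn();
    return 0;
}
```

With the handle returned by a successful dlopen("./libfoo1.so",
RTLD_LAZY | RTLD_GLOBAL), call_by_name(handle, "foo") does not go through
the executable's PLT entry for foo at all.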

-- 
Cheers,
Carlos.


* Re: GNU dlopen(3) differs from POSIX/IEEE
  2016-01-01  0:00     ` Carlos O'Donell
@ 2016-01-01  0:00       ` Suprateeka R Hegde
  2016-01-01  0:00         ` Carlos O'Donell
  0 siblings, 1 reply; 12+ messages in thread
From: Suprateeka R Hegde @ 2016-01-01  0:00 UTC (permalink / raw)
  To: Carlos O'Donell, gnu-gabi

On 18-Jun-2016 11:02 AM, Carlos O'Donell wrote:
> On 06/18/2016 12:11 AM, Suprateeka R Hegde wrote:
>> All I am saying is, dlopen(3) with RTLD_GLOBAL also should bring in
>> foo at runtime to be compliant with POSIX.
>
> I disagree. Nothing in POSIX says that needs to be done. The
> key failure in your reasoning is that you have assumed lazy
> symbol resolution must happen at the point of the first function
> call.

ld(1) on a GNU/Linux machine says:
---
-z lazy

When generating an executable or shared library, mark it to tell the 
dynamic linker to defer function call resolution to the point when the 
function is called (lazy binding)
---

This made me think that GNU implementation also matches with other 
implementations -- that is lazy resolution happens at the time of the 
first call.

> You have read "shall be made available for relocation" and
> then used implementation knowledge to decide that _today_ those
> relocations have a happens-after relationship with dlopen in your
> program. But because lazy symbol resolution is not an observable
> event for a well-defined program,

Yes. I agree very much. But making some massive enterprise legacy 
application to become "well-defined" now is beyond tool chain writers.

The very use of --unresolved-symbols=ignore-all for an executable link is
bad in a way.

> and no guarantees are made,
> you can't make a happens-after relationship, and can't expect
> 'foo' to resolve to the loaded 'foo' that came into the global
> scope with dlopen.
>
> Perhaps in the future you want a mode where all lazy symbol
> resolution is done before the first dlopen runs. Say we want to
> do this to relocate the whole PLT and mark it read-only for
> safety hardening.

This is going to be a "mode". Almost similar to BIND_NOW. But not 
default. Even if decided default, a non-default (lazy writable PLTs) 
mode still exists.

> If you were to _require_ lazy resolution to happen at the point
> of the function call, which is what you're assuming here, then
> it would prevent the above implementation from being conforming.

Both are mutually exclusive. In my opinion, programs either want 
immediate binding or lazy binding. Not an arbitrary mix of both.

> However, because POSIX says nothing about when the lazy symbol
> resolution happens, or anything at all about it,

It indeed says something:
---
RTLD_LAZY

Relocations shall be performed at an implementation-defined time, 
ranging from the time of the dlopen() call until the first reference to 
a given symbol occurs
---

And then based on the ld(1) manpage, I thought GNU/Linux implementation 
uses the time of first call.

What is the harm if we go by the existing documentation and under the 
option -z lazy or RTLD_LAZY, make lazy resolution happen at the point of 
function call?

(BTW, the above is already in place currently and is working as expected)

And eventually change the semantics of RTLD_GLOBAL to match the 
description mentioned in the POSIX spec -- ...relocation processing of 
any other executable object file.

--
Supra


* Re: GNU dlopen(3) differs from POSIX/IEEE
  2016-01-01  0:00       ` Suprateeka R Hegde
@ 2016-01-01  0:00         ` Carlos O'Donell
  2016-01-01  0:00           ` Suprateeka R Hegde
  2016-01-01  0:00           ` Florian Weimer
  0 siblings, 2 replies; 12+ messages in thread
From: Carlos O'Donell @ 2016-01-01  0:00 UTC (permalink / raw)
  To: hegdesmailbox, gnu-gabi

On 06/18/2016 04:01 AM, Suprateeka R Hegde wrote:
> 
> 
> On 18-Jun-2016 11:02 AM, Carlos O'Donell wrote:
>> On 06/18/2016 12:11 AM, Suprateeka R Hegde wrote:
>>> All I am saying is, dlopen(3) with RTLD_GLOBAL also should bring in
>>> foo at runtime to be compliant with POSIX.
>>
>> I disagree. Nothing in POSIX says that needs to be done. The
>> key failure in your reasoning is that you have assumed lazy
>> symbol resolution must happen at the point of the first function
>> call.
> 
> ld(1) on a GNU/Linux machine says:
> ---
> -z lazy
> 
> When generating an executable or shared library, mark it to tell the
> dynamic linker to defer function call resolution to the point when
> the function is called (lazy binding)
> ---

Note that that man page is part of the linux man pages project and
is not canonical documentation for the glibc project. Often the man
pages documentation goes too far in describing the implementation
and beyond what is guaranteed. We can work with Michael Kerrisk to
get this changed quickly to read "defer function call resolution
to an implementation-defined point in the future, possibly as late
as the point when the function is called (lazy binding)."

> This made me think that GNU implementation also matches with other
> implementations -- that is lazy resolution happens at the time of the
> first call.

That is not an assumption that developers should be making.

>> You have read "shall be made available for relocation" and
>> then used implementation knowledge to decide that _today_ those
>> relocations have a happens-after relationship with dlopen in your
>> program. But because lazy symbol resolution is not an observable
>> event for a well-defined program,
> 
> Yes. I agree very much. But making some massive enterprise legacy
> application to become "well-defined" now is beyond tool chain
> writers.

I agree that inevitably applications of a certain size end up having
dependencies on implementation details that in turn make them costly
to port to other operating systems.

I care a lot about our users, and I don't want to see implementations
constrained by standards text that might limit benefits to them in
the future. So any suggestions you have I'm going to weigh against
what I think a sensible user might expect, not a singular enterprise
application.

>> If you were to _require_ lazy resolution to happen at the point
>> of the function call, which is what you're assuming here, then
>> it would prevent the above implementation from being conforming.
> 
> Both are mutually exclusive. In my opinion, programs either want
> immediate binding or lazy binding. Not an arbitrary mix of both.

I disagree. Lazy binding provides significant performance boosts,
but in a mixed lazy/now binding environment you can bind a fixed
number of key security related symbols early to quickly determine
if the application uses say "execve" and decide if access control,
in a policy-less environment, needs to be disabled (execve disabled
unless the application needs it).

You argue that we should standardize on "bind now" which happens
immediately at startup, and "lazy binding" which always happens
at the time the function is called, ignoring any opportunistic
binding that might happen if the dynamic loader happens to prove
it knows what the binding result will be.

No, if anything, I think we should be less prescriptive about
lazy binding.

>> However, because POSIX says nothing about when the lazy symbol
>> resolution happens, or anything at all about it,
> 
> It indeed says something:

Only for dlopen...

> ---
> RTLD_LAZY
> 
> Relocations shall be performed at an implementation-defined time,
> ranging from the time of the dlopen() call until the first reference
> to a given symbol occurs
> ---

... and it says nothing really, like it should, leaving the choice
up to the implementation. This text is specifically geared towards
shared objects loaded via dlopen, not the symbols in the binary, for
which the standard says nothing.

> And then based on the ld(1) manpage, I thought GNU/Linux
> implementation uses the time of first call.

It does, but it doesn't use symbols brought into the global scope
by dlopen for this resolution.
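
For explicit lookups, as opposed to the executable's own lazy
relocations, POSIX describes the handle returned by dlopen(NULL, ...) as
covering the program, the objects loaded at startup, and objects loaded
with RTLD_GLOBAL. A minimal sketch of querying that handle follows;
in_global_scope is a hypothetical helper name:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if `name` can be found through the
 * process-wide handle returned by dlopen(NULL, ...), 0 otherwise. */
int in_global_scope(const char *name)
{
    void *self = dlopen(NULL, RTLD_LAZY);
    int found;

    if (self == NULL)
        return 0;
    found = dlsym(self, name) != NULL;
    dlclose(self);   /* drop the extra reference taken above */
    return found;
}
```

Calling in_global_scope("foo") before and after a dlopen() with
RTLD_GLOBAL is one way to check whether the symbol entered the global
scope, independently of how the PLT slot for foo gets bound.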

> What is the harm if we go by the existing documentation and under the
> option -z lazy or RTLD_LAZY, make lazy resolution happen at the point
> of function call?

You forbid a mixed binding environment, you forbid opportunistic binding,
and force the binding to be truly as late as possible.

> And eventually change the semantics of RTLD_GLOBAL to match the
> description mentioned in the POSIX spec -- ...relocation processing
> of any other executable object file.

I don't yet see the benefit in this except that you say some undisclosed
enterprise applications need these semantics because other operating
systems provided them.

That is not a good reason to be overly prescriptive in the standard.

-- 
Cheers,
Carlos.


* Re: GNU dlopen(3) differs from POSIX/IEEE
  2016-01-01  0:00         ` Carlos O'Donell
@ 2016-01-01  0:00           ` Suprateeka R Hegde
  2016-01-01  0:00             ` Carlos O'Donell
  2016-01-01  0:00           ` Florian Weimer
  1 sibling, 1 reply; 12+ messages in thread
From: Suprateeka R Hegde @ 2016-01-01  0:00 UTC (permalink / raw)
  To: Carlos O'Donell, gnu-gabi

On 19-Jun-2016 12:25 AM, Carlos O'Donell wrote:
> On 06/18/2016 04:01 AM, Suprateeka R Hegde wrote:
>>
>>
>> On 18-Jun-2016 11:02 AM, Carlos O'Donell wrote:
>>> On 06/18/2016 12:11 AM, Suprateeka R Hegde wrote:
>>>> All I am saying is, dlopen(3) with RTLD_GLOBAL also should bring in
>>>> foo at runtime to be compliant with POSIX.
>>>
>>> I disagree. Nothing in POSIX says that needs to be done. The
>>> key failure in your reasoning is that you have assumed lazy
>>> symbol resolution must happen at the point of the first function
>>> call.
>>
>> ld(1) on a GNU/Linux machine says:
>> ---
>> -z lazy
>>
>> When generating an executable or shared library, mark it to tell the
>> dynamic linker to defer function call resolution to the point when
>> the function is called (lazy binding)
>> ---
>
> Note that that man page is part of the linux man pages project and
> is not canonical documentation for the glibc project. Often the man
> pages documentation goes too far in describing the implementation
> and beyond what is guaranteed. We can work with Michael Kerrisk to
> get this changed quickly to read "defer function call resolution
> to an implementation-defined point in the future, possibly as late
> as the point when the function is called (lazy binding)."
>
>> This made me think that GNU implementation also matches with other
>> implementations -- that is lazy resolution happens at the time of the
>> first call.
>
> That is not an assumption that developers should be making.

Not as a developer. I usually read manpages as an end user. As a
developer I can clearly see what's happening currently. And what's
happening currently matches the description in the manpage too. They are
in sync now -- that is resolution at the time of first function call.

>
>>> You have read "shall be made available for relocation" and
>>> then used implementation knowledge to decide that _today_ those
>>> relocations have a happens-after relationship with dlopen in your
>>> program. But because lazy symbol resolution is not an observable
>>> event for a well-defined program,
>>
>> Yes. I agree very much. But making some massive enterprise legacy
>> application to become "well-defined" now is beyond tool chain
>> writers.
>
> I agree that inevitably applications of a certain size end up having
> dependencies on implementation details that in turn make them costly
> to port to other operating systems.
>
> I care a lot about our users, and I don't want to see implementations
> constrained by standards text that might limit benefits to them in
> the future. So any suggestions you have I'm going to weigh against
> what I think a sensible user might expect, not a singular enterprise
> application.

I too agree very much on this. But we are not changing any defaults that 
affects sensible users. We are not standardizing definition of lazy 
resolution. Read more below.

>
>>> If you were to _require_ lazy resolution to happen at the point
>>> of the function call, which is what you're assuming here, then
>>> it would prevent the above implementation from being conforming.
>>
>> Both are mutually exclusive. In my opinion, programs either want
>> immediate binding or lazy binding. Not an arbitrary mix of both.
>
> I disagree. Lazy binding provides significant performance boosts,
> but in a mixed lazy/now binding environment you can bind a fixed
> number of key security related symbols early

I meant, as an observable event, they are exclusive. For optimizations 
or security, anything can be mixed. Any heuristics can be taken to 
achieve best results.

> to quickly determine
> if the application uses say "execve" and decide if access control,
> in a policy-less environment, needs to be disabled (execve disabled
> unless the application needs it).
>
> You argue that we should standardize on "bind now" which happens
> immediately at startup, and "lazy binding" which always happens
> at the time the function is called, ignoring any opportunistic
> binding that might happen if the dynamic loader happens to prove
> it knows what the binding result will be.

No. I am not at all suggesting "binding" be standardized. As you said, 
we do need space for optimizations and improvements.

We can keep existing semantics as is. We can add, say, "-z smart"
(LD_BIND_SMART) or something like that to mean opportunistic binding
later when it gets in.

All I am proposing is to make the dlopen(3) RTLD_GLOBAL semantics to 
match that of POSIX/IEEE description.


> No, if anything, I think we should be less prescriptive about
> lazy binding.
>
>>> However, because POSIX says nothing about when the lazy symbol
>>> resolution happens, or anything at all about it,
>>
>> It indeed says something:
>
> Only for dlopen...
>
>> ---
>> RTLD_LAZY
>>
>> Relocations shall be performed at an implementation-defined time,
>> ranging from the time of the dlopen() call until the first reference
>> to a given symbol occurs
>> ---
>
> ... and it says nothing really, like it should, leaving the choice
> up to the implementation. This text is specifically geared towards
> shared objects loaded via dlopen, not the symbols in the binary, for
> which the standard says nothing.
>
>> And then based on the ld(1) manpage, I thought GNU/Linux
>> implementation uses the time of first call.
>
> It does, but it doesn't use symbols brought into the global scope
> by dlopen for this resolution.
>
>> What is the harm if we go by the existing documentation and under the
>> option -z lazy or RTLD_LAZY, make lazy resolution happen at the point
>> of function call?
>
> You forbid a mixed binding environment, you forbid opportunistic binding,
> and force the binding to be truly as late as possible.

No. As I said, I do not want to standardize binding and forbid any 
optimizations.

I am saying, we can change RTLD_GLOBAL semantics and still have all that 
you said. By changing RTLD_GLOBAL semantics, we will not break any 
existing ABI. It's an additional one.

And we can also have -z smart (or -z secure). And we can even make them 
default (in place of existing -z lazy). In that way we have everything.

>
>> And eventually change the semantics of RTLD_GLOBAL to match the
>> description mentioned in the POSIX spec -- ...relocation processing
>> of any other executable object file.
>
> I don't yet see the benefit in this except that you say some undisclosed
> enterprise applications need these semantics because other operating
> systems provided them.
>
> That is not a good reason to be overly prescriptive in the standard.

I think we have a very minor difference of opinion in the whole 
discussion. To re-iterate, I am not proposing to restrict binding 
behaviors either to be NOW or be LAZY. We can add anything  in between 
to optimize or secure. We can add them under an option as I said and 
make it default too.

IMHO, (I was discussing with H.J too on the alternate code sequence 
proposal) lazy binding or writable-PLT cannot be totally removed from a 
platform. Tools like ltrace(1) will stop working. Couple of DSU 
solutions relying on writable-PLT/lazy_bind may stop working.

All of them should co-exist is what I think. One can always use the 
option of choice to achieve desired results.

--
Supra


* Re: GNU dlopen(3) differs from POSIX/IEEE
  2016-01-01  0:00           ` Suprateeka R Hegde
@ 2016-01-01  0:00             ` Carlos O'Donell
  0 siblings, 0 replies; 12+ messages in thread
From: Carlos O'Donell @ 2016-01-01  0:00 UTC (permalink / raw)
  To: hegdesmailbox, gnu-gabi

On 06/20/2016 10:19 AM, Suprateeka R Hegde wrote:
>>> ld(1) on a GNU/Linux machine says:
>>> ---
>>> -z lazy
>>>
>>> When generating an executable or shared library, mark it to tell the
>>> dynamic linker to defer function call resolution to the point when
>>> the function is called (lazy binding)
>>> ---
>>> This made me think that GNU implementation also matches with other
>>> implementations -- that is lazy resolution happens at the time of the
>>> first call.
>>
>> That is not an assumption that developers should be making.
> 
> Not as a developer. I usually read manpages as an end user. As a
> developer I can clearly see what's happening currently. And what's
> happening currently matches the description in the manpage too. They
> are in sync now -- that is resolution at the time of first function
> call.

I have submitted a patch to correct this. First draft has been approved,
and a second draft with clarifications for STT_GNU_IFUNC has been
submitted.

> All I am proposing is to make the dlopen(3) RTLD_GLOBAL semantics to
> match that of POSIX/IEEE description.

They already match.

GNU dlopen(3) via RTLD_GLOBAL makes symbols available for relocation
processing.

POSIX/IEEE's RTLD_LAZY is the model for the executable's own lazy symbol
resolution and there the text of the standard says:
"at an implementation-defined time, ranging from the time of the dlopen()
call until the first reference to a given symbol occurs."

In the case of GNU dlopen(3) I do not wish to constrain the implementation
by saying exactly when the lazy resolution happens, and I see no strong
justification to make it "at the time of the call" and to enforce global
symbol searches from dlopen'd RTLD_GLOBAL symbols.

>> That is not a good reason to be overly prescriptive in the standard.
> 
> I think we have a very minor difference of opinion in the whole
> discussion. To re-iterate, I am not proposing to restrict binding
> behaviors either to be NOW or be LAZY. We can add anything in between
> to optimize or secure. We can add them under an option as I said and
> make it default too.

The existing implementation is conforming as far as I can see.

What you want is for the implementation-defined time to be documented
as "at the time of the call" and therefore to be required to consider
the symbols brought in by the dlopen via RTLD_GLOBAL.

I do not feel that you have provided enough technical justification
for this requirement. We don't know how this might impact the existing
GNU userspace for the sake of a singular use case you present here.

I could be convinced otherwise, but I am not yet convinced that the
existing semantics should be changed.

-- 
Cheers,
Carlos.


* Re: GNU dlopen(3) differs from POSIX/IEEE
  2016-01-01  0:00         ` Carlos O'Donell
  2016-01-01  0:00           ` Suprateeka R Hegde
@ 2016-01-01  0:00           ` Florian Weimer
  2016-01-01  0:00             ` Szabolcs Nagy
  2016-01-01  0:00             ` Carlos O'Donell
  1 sibling, 2 replies; 12+ messages in thread
From: Florian Weimer @ 2016-01-01  0:00 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: hegdesmailbox, gnu-gabi

* Carlos O'Donell:

>> ld(1) on a GNU/Linux machine says:
>> ---
>> -z lazy
>> 
>> When generating an executable or shared library, mark it to tell the
>> dynamic linker to defer function call resolution to the point when
>> the function is called (lazy binding)
>> ---
>
> Note that that man page is part of the linux man pages project and
> is not canonical documentation for the glibc project.

This particular ld manual page seems to be derived from the
ld/binutils Info documentation, which promises the same behavior.

I am not sure what the exact semantics of lazy binding should be.
With IFUNCs, lazy binding is observable, and we know from Fedora's
BIND_NOW experiment that some applications assume that undefined
functions which are never called do not cause any trouble whatsoever.


* Re: GNU dlopen(3) differs from POSIX/IEEE
  2016-01-01  0:00           ` Florian Weimer
@ 2016-01-01  0:00             ` Szabolcs Nagy
  2016-01-01  0:00               ` Florian Weimer
  2016-01-01  0:00             ` Carlos O'Donell
  1 sibling, 1 reply; 12+ messages in thread
From: Szabolcs Nagy @ 2016-01-01  0:00 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Carlos O'Donell, hegdesmailbox, gnu-gabi

* Florian Weimer <fw@deneb.enyo.de> [2016-07-01 22:46:19 +0200]:
> I am not sure what the exact semantics of lazy binding should be.
> With IFUNCs, lazy binding is observable, and we know from Fedora's
> BIND_NOW experiment that some applications assume that undefined
> functions which are never called do not cause any trouble whatsoever.

this bind now experiment made me curious but i could not
find the results and its description.

is there a list of affected packages somewhere?


* Re: GNU dlopen(3) differs from POSIX/IEEE
  2016-01-01  0:00             ` Szabolcs Nagy
@ 2016-01-01  0:00               ` Florian Weimer
  0 siblings, 0 replies; 12+ messages in thread
From: Florian Weimer @ 2016-01-01  0:00 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: Carlos O'Donell, hegdesmailbox, gnu-gabi

* Szabolcs Nagy:

> * Florian Weimer <fw@deneb.enyo.de> [2016-07-01 22:46:19 +0200]:
>> I am not sure what the exact semantics of lazy binding should be.
>> With IFUNCs, lazy binding is observable, and we know from Fedora's
>> BIND_NOW experiment that some applications assume that undefined
>> functions which are never called do not cause any trouble whatsoever.
>
> this bind now experiment made me curious but i could not
> find the results and its description.

It's ongoing:

  <https://fedoraproject.org/wiki/Changes/Harden_All_Packages>

Alpine Linux with musl runs essentially the same experiment because
musl does not support lazy binding.

> is there a list of affected packages somewhere?

I'm not sure.  I don't think Fedora keeps a tally of the exceptions.
I can generate a list of objects which use lazy binding, but I don't
know if those are accidents or the result of a deliberate choice.

One example that keeps coming up is Xorg server modules, which do not
use DT_NEEDED.  Instead, an external dependency mechanism makes sure
that functions in them are called only after all the relevant modules
have been loaded (but not necessarily in the order of their symbol
bindings).
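
The module-loading pattern described here could be sketched as follows.
The module paths are supplied by the caller and load_modules is a
hypothetical helper name. Every module is opened with
RTLD_LAZY | RTLD_GLOBAL before any function in it is called; whether a
later cross-module call then binds as hoped depends on the
implementation's lazy-binding behaviour, which is the point under
discussion in this thread:

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: open every module with global visibility before
 * any function in them is called. Handles are kept open on purpose so
 * the modules stay loaded. Returns the number of modules that loaded. */
size_t load_modules(const char *const *paths, size_t count)
{
    size_t loaded = 0;

    for (size_t i = 0; i < count; i++) {
        /* RTLD_LAZY defers function binding; RTLD_GLOBAL makes the
         * module's symbols available to objects loaded later. */
        if (dlopen(paths[i], RTLD_LAZY | RTLD_GLOBAL) != NULL)
            loaded++;
        else
            fprintf(stderr, "skipping %s: %s\n", paths[i], dlerror());
    }
    return loaded;
}
```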


* Re: GNU dlopen(3) differs from POSIX/IEEE
  2016-01-01  0:00           ` Florian Weimer
  2016-01-01  0:00             ` Szabolcs Nagy
@ 2016-01-01  0:00             ` Carlos O'Donell
  1 sibling, 0 replies; 12+ messages in thread
From: Carlos O'Donell @ 2016-01-01  0:00 UTC (permalink / raw)
  To: Florian Weimer; +Cc: hegdesmailbox, gnu-gabi

On 07/01/2016 04:46 PM, Florian Weimer wrote:
> * Carlos O'Donell:
> 
>>> ld(1) on a GNU/Linux machine says:
>>> ---
>>> -z lazy
>>>
>>> When generating an executable or shared library, mark it to tell the
>>> dynamic linker to defer function call resolution to the point when
>>> the function is called (lazy binding)
>>> ---
>>
>> Note that that man page is part of the linux man pages project and
>> is not canonical documentation for the glibc project.
> 
> This particular ld manual page seems to be derived from the
> ld/binutils Info documentation, which promises the same behavior.

The binutils manual should not dictate glibc behaviour.

Patch sent to binutils:
https://sourceware.org/ml/binutils/2016-07/msg00104.html

> I am not sure what the exact semantics of lazy binding should be.

The semantics of lazy binding are purposely vague to avoid constraining
the implementation. The reference to the symbol will be resolved at 
some point between load and call.

Do we need stricter semantics? Do the stricter semantics give us something
in return for the constraints we place on the implementation?

> With IFUNCs, lazy binding is observable, and we know from Fedora's
> BIND_NOW experiment that some applications assume that undefined
> functions which are never called do not cause any trouble whatsoever.

The IFUNC observes lazy binding only indirectly in that the resolver
is called one or more times depending on (a) number of object references
to the resolver and (b) number of threads concurrently updating GOT/PLT
entries and calling the ifunc resolver.
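
The indirect observability described above can be sketched with a GNU
IFUNC whose resolver has a side effect. This assumes GCC or a compatible
compiler on an ELF target that supports the ifunc attribute; answer,
answer_impl, resolve_answer, and resolver_calls are hypothetical names.
The counter only shows that the resolver ran. How many times it runs, and
whether that happens at startup or at the first call, is left to the
implementation:

```c
#include <stddef.h>

/* Incremented each time the dynamic loader runs the resolver. */
static int resolver_calls;

/* The implementation the resolver selects. */
static int answer_impl(void)
{
    return 42;
}

/* IFUNC resolver: returns the address that calls to answer() bind to. */
static int (*resolve_answer(void))(void)
{
    resolver_calls++;
    return answer_impl;
}

/* answer() has no body of its own; its target is chosen by the resolver. */
int answer(void) __attribute__((ifunc("resolve_answer")));
```

After any call to answer(), resolver_calls is at least 1, but a program
that depends on its exact value is depending on binding details that the
thread argues should stay unspecified.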

If there are relevant issues from Fedora's BIND_NOW testing that relate
to gnu-gabi, then we should raise them in a new thread.

-- 
Cheers,
Carlos.

