* GNU dlopen(3) differs from POSIX/IEEE
From: Suprateeka R Hegde @ 2016-01-01 0:00 UTC
To: gnu-gabi

Hi

The RTLD_GLOBAL flag of dlopen(3) under POSIX/IEEE standards says "The executable object file's symbols shall be made available for relocation processing of any other executable object file".
(http://pubs.opengroup.org/onlinepubs/9699919799/functions/dlopen.html)

However, on a GNU/Linux system, the manpage says "The symbols defined by this library will be made available for symbol resolution of subsequently loaded libraries".

And yes, there is a difference between the two. According to the POSIX/IEEE one, the symbols from a dlopened library are available for symbol resolution in the executable (a.out) also. The GNU one seems to restrict it to "subsequently" opened objects only, and not "any" object.

The following case fails on GNU/Linux, but works on other POSIX compliant systems.

---
$ cat main.c
#include <dlfcn.h>

extern void foo(void);

int main()
{
  dlopen("./libfoo1.so", RTLD_GLOBAL);
  foo();
  return 0;
}

$ cat libfoo.c
#include <stdio.h>

void foo(void)
{
  printf("In foo\n");
}

$ cc -fpic -shared libfoo.c -o libfoo.so
$ cc main.c -ldl   # Read Note-1 at the end
$ ./a.out
Segmentation fault (core dumped)
$ LD_PRELOAD=./libfoo1.so ./a.out
In foo
---

That means dlopen RTLD_GLOBAL was not effective. LD_PRELOAD was effective. Of course the entire exercise is for lazy bind mode.

Is there any reason why GNU differs here? Does it mean the GNU variant is not POSIX/IEEE compliant?

--
Supra

Note-1:
Without providing libfoo on the link line, I could not get a JUMP_SLOT for foo. So I provided -lfoo for the link-edit phase and then renamed libfoo.so to libfoo1.so and also created a dummy libfoo.so without foo. This way, I could get a JUMP_SLOT for foo. This hack was not necessary on other platforms as foo gets a PLT entry even without definition. By getting a JUMP_SLOT, I could verify if LD_PRELOAD works in this case.
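For comparison, a variant of the test program that does not depend on lazy binding of an undefined reference in a.out could look roughly like the sketch below. It checks the dlopen() result and looks the function up explicitly with dlsym(). POSIX describes the mode argument as including either RTLD_LAZY or RTLD_NOW, with RTLD_GLOBAL optionally ORed in, so the sketch passes RTLD_LAZY | RTLD_GLOBAL rather than RTLD_GLOBAL alone. The helper name call_void_symbol is hypothetical and does not come from the thread.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not from the thread): load `path` into the global
 * scope and call the void(void) function named `symbol` through dlsym(),
 * instead of relying on lazy binding of an undefined reference.
 * Returns 0 on success, -1 if the library or the symbol is unavailable. */
int call_void_symbol(const char *path, const char *symbol)
{
    /* The mode carries a binding flag; RTLD_GLOBAL is ORed in. */
    void *handle = dlopen(path, RTLD_LAZY | RTLD_GLOBAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    dlerror();                          /* clear any stale error state */
    void *addr = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL || addr == NULL) {
        fprintf(stderr, "dlsym: %s\n", err ? err : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    /* Converting the dlsym() result to a function pointer is the
     * usual idiom on POSIX systems. */
    void (*fn)(void) = (void (*)(void))addr;
    fn();
    dlclose(handle);
    return 0;
}
```

A main() that calls call_void_symbol("./libfoo1.so", "foo") would exercise the same library as the test case above; on some systems the program still needs to be linked with -ldl.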
* Re: GNU dlopen(3) differs from POSIX/IEEE
From: Carlos O'Donell @ 2016-01-01 0:00 UTC
To: hegdesmailbox, gnu-gabi

On 06/13/2016 10:48 AM, Suprateeka R Hegde wrote:
> Without providing libfoo on the link line, I could not get a JUMP_SLOT
> for foo. So I provided -lfoo for the link-edit phase and then renamed
> libfoo.so to libfoo1.so and also created a dummy libfoo.so without
> foo. This way, I could get a JUMP_SLOT for foo. This hack was not
> necessary on other platforms as foo gets a PLT entry even without
> definition. By getting a JUMP_SLOT, I could verify if LD_PRELOAD
> works in this case.

Correct, you don't get a PLT entry for foo unless it's in a shared library at link-edit time.

Could you actually provide the exact steps you used in a GNU/Linux-based system to produce the final executable?

My experience is that you will either see a failure at link-edit time, failure at runtime (missing libfoo.so, undefined symbol foo), and will never get to the point where you can run the application and get a segfault. I'm curious to see exactly the way you constructed the scenario.

Therefore if the application's global symbol references all must be defined before it starts there is no possibility for dlopen with RTLD_GLOBAL to add symbols to the global scope that can be used to resolve such symbols, because they are already resolved.

--
Cheers,
Carlos.
* Re: GNU dlopen(3) differs from POSIX/IEEE
From: Suprateeka R Hegde @ 2016-01-01 0:00 UTC
To: Carlos O'Donell, gnu-gabi

(I was away from work. Apologies for the delay in response.)

On 13-Jun-2016 11:21 PM, Carlos O'Donell wrote:
> On 06/13/2016 10:48 AM, Suprateeka R Hegde wrote:
>> Without providing libfoo on the link line, I could not get a JUMP_SLOT
>> for foo. So I provided -lfoo for the link-edit phase and then renamed
>> libfoo.so to libfoo1.so and also created a dummy libfoo.so without
>> foo. This way, I could get a JUMP_SLOT for foo. This hack was not
>> necessary on other platforms as foo gets a PLT entry even without
>> definition. By getting a JUMP_SLOT, I could verify if LD_PRELOAD
>> works in this case.
>
> Correct, you don't get a PLT entry for foo unless it's in a shared
> library at link-edit time.
>
> Could you actually provide the exact steps you used in a GNU/Linux-based
> system to produce the final executable?
>
> My experience is that you will either see a failure at link-edit
> time, failure at runtime (missing libfoo.so, undefined symbol foo),
> and will never get to the point where you can run the application
> and get a segfault. I'm curious to see exactly the way you constructed
> the scenario.

This is just to show there are ways to bring symbols to global space at runtime. LD_PRELOAD is one way. dlopen(3) with RTLD_GLOBAL would be another way, but on a GNU based system it is not as per POSIX/IEEE specs. So I tested for at least the LD_PRELOAD way.

Here are the exact steps:
---
$ cat main.c
#include <dlfcn.h>

extern void foo(void);

int main()
{
  dlopen("./libfoo1.so", RTLD_GLOBAL);
  foo();
  return 0;
}

$ cat libfoo.c
#include <stdio.h>

void foo(void)
{
  printf("In foo\n");
}

$ cat libjunk.c
#include <stdio.h>

void junk(void)
{
  printf("Junky\n");
}

$ cc -fpic -shared libfoo.c -o libfoo.so
$ cc main.c -ldl -L. -lfoo   # Gets a JUMP_SLOT for foo
$ cp libfoo.so libfoo1.so
$ # Now change libfoo.so not to contain foo. In other words
$ # not to resolve foo from startup libfoo.so. Keep it unresolved
$ # for lazy bind to happen to a runtime-brought-in global foo.
$ cc -fpic -shared libjunk.c -o libfoo.so
$ LD_PRELOAD=./libfoo1.so ./a.out
In foo
---

As you see, the program works fine and foo is lazy bound to foo from libfoo1.so, which has been brought in at runtime without being there at link-edit time.

The same case would have worked even without LD_PRELOAD, and with only dlopen-RTLD_GLOBAL if the GNU dlopen(3) matched the spec defined by POSIX/IEEE.

> Therefore if the application's global symbol references all must be
> defined before it starts there is no possibility for dlopen with
> RTLD_GLOBAL to add symbols to the global scope that can be used
> to resolve such symbols, because they are already resolved.

No possibility with the current GNU implementation. But possible with a POSIX/IEEE compliant dlopen(3). The test case works fine on other POSIX compliant systems.

All I am saying is, dlopen(3) with RTLD_GLOBAL also should bring in foo at runtime to be compliant with POSIX.

--
Supra
* Re: GNU dlopen(3) differs from POSIX/IEEE
From: Carlos O'Donell @ 2016-01-01 0:00 UTC
To: hegdesmailbox, gnu-gabi

On 06/18/2016 12:11 AM, Suprateeka R Hegde wrote:
> All I am saying is, dlopen(3) with RTLD_GLOBAL also should bring in
> foo at runtime to be compliant with POSIX.

I disagree. Nothing in POSIX says that needs to be done. The key failure in your reasoning is that you have assumed lazy symbol resolution must happen at the point of the first function call.

You have read "shall be made available for relocation" and then used implementation knowledge to decide that _today_ those relocations have a happens-after relationship with dlopen in your program. But because lazy symbol resolution is not an observable event for a well-defined program, and no guarantees are made, you can't make a happens-after relationship, and can't expect 'foo' to resolve to the loaded 'foo' that came into the global scope with dlopen.

Perhaps in the future you want a mode where all lazy symbol resolution is done before the first dlopen runs. Say we want to do this to relocate the whole PLT and mark it read-only for safety hardening.

If you were to _require_ lazy resolution to happen at the point of the function call, which is what you're assuming here, then it would prevent the above implementation from being conforming.

However, because POSIX says nothing about when the lazy symbol resolution happens, or anything at all about it, your argument is invalid. What you observe on other implementations is a detail of the implementation and a non-portable one.

--
Cheers,
Carlos.
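The timing argument in this message can be illustrated with a toy model in plain C. This is only a sketch under stated assumptions: every name in it (toy_scope, toy_slot, toy_resolve, toy_call, toy_foo) is hypothetical, and nothing here reflects how glibc's dynamic loader is actually implemented. A "slot" stands in for a PLT/GOT entry and the "scope" stands in for the set of definitions visible when the slot is resolved. A real loader would typically report an unresolved symbol as an error; the model returns -1 instead.

```c
#include <stddef.h>
#include <string.h>

typedef int (*toy_fn)(void);

/* One definition visible in the toy global scope. */
struct toy_def { const char *name; toy_fn fn; };

/* Toy global scope: a small fixed-size table of definitions. */
struct toy_scope { struct toy_def defs[8]; size_t count; };

/* Toy PLT/GOT slot: the symbol name plus the cached binding. */
struct toy_slot { const char *name; toy_fn bound; int resolved; };

/* Models an object's symbols entering the global scope. */
void toy_scope_add(struct toy_scope *s, const char *name, toy_fn fn)
{
    if (s->count < 8) {
        s->defs[s->count].name = name;
        s->defs[s->count].fn = fn;
        s->count++;
    }
}

/* Resolve the slot against whatever is in scope right now.
 * The result is cached, so later additions to the scope are not seen. */
void toy_resolve(struct toy_slot *slot, const struct toy_scope *s)
{
    if (slot->resolved)
        return;
    for (size_t i = 0; i < s->count; i++) {
        if (strcmp(s->defs[i].name, slot->name) == 0) {
            slot->bound = s->defs[i].fn;
            break;
        }
    }
    slot->resolved = 1;
}

/* Call through the slot, resolving at call time if not yet resolved.
 * Returns -1 when no definition was in scope at resolution time. */
int toy_call(struct toy_slot *slot, const struct toy_scope *s)
{
    toy_resolve(slot, s);
    return slot->bound ? slot->bound() : -1;
}

/* Stand-in for foo() from the dlopen'ed library. */
int toy_foo(void) { return 42; }
```

In this model, a slot that is first resolved inside toy_call() after toy_scope_add() has run finds toy_foo, while a slot that was passed to toy_resolve() before the definition was added stays unbound. Which of the two a real program sees depends on when the implementation chooses to resolve, which is the point being argued.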
* Re: GNU dlopen(3) differs from POSIX/IEEE
From: Suprateeka R Hegde @ 2016-01-01 0:00 UTC
To: Carlos O'Donell, gnu-gabi

On 18-Jun-2016 11:02 AM, Carlos O'Donell wrote:
> On 06/18/2016 12:11 AM, Suprateeka R Hegde wrote:
>> All I am saying is, dlopen(3) with RTLD_GLOBAL also should bring in
>> foo at runtime to be compliant with POSIX.
>
> I disagree. Nothing in POSIX says that needs to be done. The
> key failure in your reasoning is that you have assumed lazy
> symbol resolution must happen at the point of the first function
> call.

ld(1) on a GNU/Linux machine says:
---
-z lazy

When generating an executable or shared library, mark it to tell the
dynamic linker to defer function call resolution to the point when
the function is called (lazy binding)
---

This made me think that GNU implementation also matches with other implementations -- that is lazy resolution happens at the time of the first call.

> You have read "shall be made available for relocation" and
> then used implementation knowledge to decide that _today_ those
> relocations have a happens-after relationship with dlopen in your
> program. But because lazy symbol resolution is not an observable
> event for a well-defined program,

Yes. I agree very much. But making some massive enterprise legacy application to become "well-defined" now is beyond tool chain writers. The very use of --unresolved-symbols=ignore-all for an executable link is bad in a way.

> and no guarantees are made,
> you can't make a happens-after relationship, and can't expect
> 'foo' to resolve to the loaded 'foo' that came into the global
> scope with dlopen.
>
> Perhaps in the future you want a mode where all lazy symbol
> resolution is done before the first dlopen runs. Say we want to
> do this to relocate the whole PLT and mark it read-only for
> safety hardening.

This is going to be a "mode". Almost similar to BIND_NOW. But not default. Even if decided default, a non-default (lazy writable PLTs) mode still exists.

> If you were to _require_ lazy resolution to happen at the point
> of the function call, which is what you're assuming here, then
> it would prevent the above implementation from being conforming.

Both are mutually exclusive. In my opinion, programs either want immediate binding or lazy binding. Not an arbitrary mix of both.

> However, because POSIX says nothing about when the lazy symbol
> resolution happens, or anything at all about it,

It indeed says something:
---
RTLD_LAZY

Relocations shall be performed at an implementation-defined time,
ranging from the time of the dlopen() call until the first reference
to a given symbol occurs
---

And then based on the ld(1) manpage, I thought GNU/Linux implementation uses the time of first call.

What is the harm if we go by the existing documentation and under the option -z lazy or RTLD_LAZY, make lazy resolution happen at the point of function call? (BTW, the above is already in place currently and is working as expected)

And eventually change the semantics of RTLD_GLOBAL to match the description mentioned in the POSIX spec -- ...relocation processing of any other executable object file.

--
Supra
* Re: GNU dlopen(3) differs from POSIX/IEEE
From: Carlos O'Donell @ 2016-01-01 0:00 UTC
To: hegdesmailbox, gnu-gabi

On 06/18/2016 04:01 AM, Suprateeka R Hegde wrote:
>
>
> On 18-Jun-2016 11:02 AM, Carlos O'Donell wrote:
>> On 06/18/2016 12:11 AM, Suprateeka R Hegde wrote:
>>> All I am saying is, dlopen(3) with RTLD_GLOBAL also should bring in
>>> foo at runtime to be compliant with POSIX.
>>
>> I disagree. Nothing in POSIX says that needs to be done. The
>> key failure in your reasoning is that you have assumed lazy
>> symbol resolution must happen at the point of the first function
>> call.
>
> ld(1) on a GNU/Linux machine says:
> ---
> -z lazy
>
> When generating an executable or shared library, mark it to tell the
> dynamic linker to defer function call resolution to the point when
> the function is called (lazy binding)
> ---

Note that this man page is part of the linux man pages project and is not canonical documentation for the glibc project. Often the man pages documentation goes too far in describing the implementation and beyond what is guaranteed. We can work with Michael Kerrisk to get this changed quickly to read "defer function call resolution to an implementation-defined point in the future, possibly as late as the point when the function is called (lazy binding)."

> This made me think that GNU implementation also matches with other
> implementations -- that is lazy resolution happens at the time of the
> first call.

That is not an assumption that developers should be making.

>> You have read "shall be made available for relocation" and
>> then used implementation knowledge to decide that _today_ those
>> relocations have a happens-after relationship with dlopen in your
>> program. But because lazy symbol resolution is not an observable
>> event for a well-defined program,
>
> Yes. I agree very much. But making some massive enterprise legacy
> application to become "well-defined" now is beyond tool chain
> writers.

I agree that inevitably applications of a certain size end up having dependencies on implementation details that in turn make them costly to port to other operating systems.

I care a lot about our users, and I don't want to see implementations constrained by standards text that might limit benefits to them in the future. So any suggestions you have I'm going to weigh against what I think a sensible user might expect, not a singular enterprise application.

>> If you were to _require_ lazy resolution to happen at the point
>> of the function call, which is what you're assuming here, then
>> it would prevent the above implementation from being conforming.
>
> Both are mutually exclusive. In my opinion, programs either want
> immediate binding or lazy binding. Not an arbitrary mix of both.

I disagree. Lazy binding provides significant performance boosts, but in a mixed lazy/now binding environment you can bind a fixed number of key security related symbols early to quickly determine if the application uses say "execve" and decide if access control, in a policy-less environment, needs to be disabled (execve disabled unless the application needs it).

You argue that we should standardize on "bind now" which happens immediately at startup, and "lazy binding" which always happens at the time the function is called, ignoring any opportunistic binding that might happen if the dynamic loader happens to prove it knows what the binding result will be.

No, if anything, I think we should be less proscriptive about lazy binding.

>> However, because POSIX says nothing about when the lazy symbol
>> resolution happens, or anything at all about it,
>
> It indeed says something:

Only for dlopen...

> ---
> RTLD_LAZY
>
> Relocations shall be performed at an implementation-defined time,
> ranging from the time of the dlopen() call until the first reference
> to a given symbol occurs
> ---

... and it says nothing really, like it should, leaving the choice up to the implementation. This text is specifically geared towards shared objects loaded via dlopen, not the symbols in the binary, for which the standard says nothing.

> And then based on the ld(1) manpage, I thought GNU/Linux
> implementation uses the time of first call.

It does, but it doesn't use symbols brought into the global scope by dlopen for this resolution.

> What is the harm if we go by the existing documentation and under the
> option -z lazy or RTLD_LAZY, make lazy resolution happen at the point
> of function call?

You forbid a mixed binding environment, you forbid opportunistic binding, and force the binding to be truly as late as possible.

> And eventually change the semantics of RTLD_GLOBAL to match the
> description mentioned in the POSIX spec -- ...relocation processing
> of any other executable object file.

I don't yet see the benefit in this except that you say some undisclosed enterprise applications need these semantics because other operating systems provided them.

That is not a good reason to be overly prescriptive in the standard.

--
Cheers,
Carlos.
* Re: GNU dlopen(3) differs from POSIX/IEEE
From: Suprateeka R Hegde @ 2016-01-01 0:00 UTC
To: Carlos O'Donell, gnu-gabi

On 19-Jun-2016 12:25 AM, Carlos O'Donell wrote:
> On 06/18/2016 04:01 AM, Suprateeka R Hegde wrote:
>>
>>
>> On 18-Jun-2016 11:02 AM, Carlos O'Donell wrote:
>>> On 06/18/2016 12:11 AM, Suprateeka R Hegde wrote:
>>>> All I am saying is, dlopen(3) with RTLD_GLOBAL also should bring in
>>>> foo at runtime to be compliant with POSIX.
>>>
>>> I disagree. Nothing in POSIX says that needs to be done. The
>>> key failure in your reasoning is that you have assumed lazy
>>> symbol resolution must happen at the point of the first function
>>> call.
>>
>> ld(1) on a GNU/Linux machine says:
>> ---
>> -z lazy
>>
>> When generating an executable or shared library, mark it to tell the
>> dynamic linker to defer function call resolution to the point when
>> the function is called (lazy binding)
>> ---
>
> Note that this man page is part of the linux man pages project and
> is not canonical documentation for the glibc project. Often the man
> pages documentation goes too far in describing the implementation
> and beyond what is guaranteed. We can work with Michael Kerrisk to
> get this changed quickly to read "defer function call resolution
> to an implementation-defined point in the future, possibly as late
> as the point when the function is called (lazy binding)."
>
>> This made me think that GNU implementation also matches with other
>> implementations -- that is lazy resolution happens at the time of the
>> first call.
>
> That is not an assumption that developers should be making.

Not as a developer. I usually read manpages as an end user. As a developer I can clearly see what's happening currently. And what's happening currently matches the description in the manpage too. They are in sync now -- that is resolution at the time of first function call.

>
>>> You have read "shall be made available for relocation" and
>>> then used implementation knowledge to decide that _today_ those
>>> relocations have a happens-after relationship with dlopen in your
>>> program. But because lazy symbol resolution is not an observable
>>> event for a well-defined program,
>>
>> Yes. I agree very much. But making some massive enterprise legacy
>> application to become "well-defined" now is beyond tool chain
>> writers.
>
> I agree that inevitably applications of a certain size end up having
> dependencies on implementation details that in turn make them costly
> to port to other operating systems.
>
> I care a lot about our users, and I don't want to see implementations
> constrained by standards text that might limit benefits to them in
> the future. So any suggestions you have I'm going to weigh against
> what I think a sensible user might expect, not a singular enterprise
> application.

I too agree very much on this. But we are not changing any defaults that affect sensible users. We are not standardizing the definition of lazy resolution. Read more below.

>
>>> If you were to _require_ lazy resolution to happen at the point
>>> of the function call, which is what you're assuming here, then
>>> it would prevent the above implementation from being conforming.
>>
>> Both are mutually exclusive. In my opinion, programs either want
>> immediate binding or lazy binding. Not an arbitrary mix of both.
>
> I disagree. Lazy binding provides significant performance boosts,
> but in a mixed lazy/now binding environment you can bind a fixed
> number of key security related symbols early

I meant, as an observable event, they are exclusive. For optimizations or security, anything can be mixed. Any heuristics can be taken to achieve best results.

> to quickly determine
> if the application uses say "execve" and decide if access control,
> in a policy-less environment, needs to be disabled (execve disabled
> unless the application needs it).
>
> You argue that we should standardize on "bind now" which happens
> immediately at startup, and "lazy binding" which always happens
> at the time the function is called, ignoring any opportunistic
> binding that might happen if the dynamic loader happens to prove
> it knows what the binding result will be.

No. I am not at all suggesting "binding" be standardized. As you said, we do need space for optimizations and improvements. We can keep existing semantics as is. We can add say "-z smart" (LD_BIND_SMART) or something like that to mean opportunistic binding later when it gets in.

All I am proposing is to make the dlopen(3) RTLD_GLOBAL semantics to match that of POSIX/IEEE description.

> No, if anything, I think we should be less proscriptive about
> lazy binding.
>
>>> However, because POSIX says nothing about when the lazy symbol
>>> resolution happens, or anything at all about it,
>>
>> It indeed says something:
>
> Only for dlopen...
>
>> ---
>> RTLD_LAZY
>>
>> Relocations shall be performed at an implementation-defined time,
>> ranging from the time of the dlopen() call until the first reference
>> to a given symbol occurs
>> ---
>
> ... and it says nothing really, like it should, leaving the choice
> up to the implementation. This text is specifically geared towards
> shared objects loaded via dlopen, not the symbols in the binary, for
> which the standard says nothing.
>
>> And then based on the ld(1) manpage, I thought GNU/Linux
>> implementation uses the time of first call.
>
> It does, but it doesn't use symbols brought into the global scope
> by dlopen for this resolution.
>
>> What is the harm if we go by the existing documentation and under the
>> option -z lazy or RTLD_LAZY, make lazy resolution happen at the point
>> of function call?
>
> You forbid a mixed binding environment, you forbid opportunistic binding,
> and force the binding to be truly as late as possible.

No. As I said, I do not want to standardize binding and forbid any optimizations. I am saying, we can change RTLD_GLOBAL semantics and still have all that you said.

By changing RTLD_GLOBAL semantics, we will not break any existing ABI. It's an additional one. And we can also have -z smart (or -z secure). And we can even make them default (in place of existing -z lazy). In that way we have everything.

>
>> And eventually change the semantics of RTLD_GLOBAL to match the
>> description mentioned in the POSIX spec -- ...relocation processing
>> of any other executable object file.
>
> I don't yet see the benefit in this except that you say some undisclosed
> enterprise applications need these semantics because other operating
> systems provided them.
>
> That is not a good reason to be overly prescriptive in the standard.

I think we have a very minor difference of opinion in the whole discussion. To re-iterate, I am not proposing to restrict binding behaviors either to be NOW or be LAZY. We can add anything in between to optimize or secure. We can add them under an option as I said and make it default too.

IMHO, (I was discussing with H.J too on the alternate code sequence proposal) lazy binding or writable-PLT cannot be totally removed from a platform. Tools like ltrace(1) will stop working. A couple of DSU solutions relying on writable-PLT/lazy_bind may stop working.

All of them should co-exist is what I think. One can always use the option of choice to achieve desired results.

--
Supra
* Re: GNU dlopen(3) differs from POSIX/IEEE
From: Carlos O'Donell @ 2016-01-01 0:00 UTC
To: hegdesmailbox, gnu-gabi

On 06/20/2016 10:19 AM, Suprateeka R Hegde wrote:
>>> ld(1) on a GNU/Linux machine says:
>>> ---
>>> -z lazy
>>>
>>> When generating an executable or shared library, mark it to tell the
>>> dynamic linker to defer function call resolution to the point when
>>> the function is called (lazy binding)
>>> ---
>>> This made me think that GNU implementation also matches with other
>>> implementations -- that is lazy resolution happens at the time of the
>>> first call.
>>
>> That is not an assumption that developers should be making.
>
> Not as a developer. I usually read manpages as an end user. As a
> developer I can clearly see what's happening currently. And what's
> happening currently matches the description in the manpage too. They
> are in sync now -- that is resolution at the time of first function
> call.

I have submitted a patch to correct this. First draft has been approved, and a second draft with clarifications for STT_GNU_IFUNC has been submitted.

> All I am proposing is to make the dlopen(3) RTLD_GLOBAL semantics to
> match that of POSIX/IEEE description.

They already match.

GNU dlopen(3) via RTLD_GLOBAL makes symbols available for relocation processing.

POSIX/IEEE's RTLD_LAZY is the model for the executable's own lazy symbol resolution and there the text of the standard says: "at an implementation-defined time, ranging from the time of the dlopen() call until the first reference to a given symbol occurs."

In the case of GNU dlopen(3) I do not wish to constrain the implementation by saying exactly when the lazy resolution happens, and I see no strong justification to make it "at the time of the call" and to enforce global symbol searches from dlopen'd RTLD_GLOBAL symbols.

>> That is not a good reason to be overly prescriptive in the standard.
>
> I think we have a very minor difference of opinion in the whole
> discussion. To re-iterate, I am not proposing to restrict binding
> behaviors either to be NOW or be LAZY. We can add anything in between
> to optimize or secure. We can add them under an option as I said and
> make it default too.

The existing implementation is conforming as far as I can see.

What you want is for the implementation-defined time to be documented as "at the time of the call" and therefore to be required to consider the symbols brought in by the dlopen via RTLD_GLOBAL.

I do not feel that you have provided enough technical justification for this requirement. We don't know how this might impact the existing GNU userspace for the sake of a singular use case you present here.

I could be convinced otherwise, but I am not yet convinced that the existing semantics should be changed.

--
Cheers,
Carlos.
* Re: GNU dlopen(3) differs from POSIX/IEEE
From: Florian Weimer @ 2016-01-01 0:00 UTC
To: Carlos O'Donell
Cc: hegdesmailbox, gnu-gabi

* Carlos O'Donell:

>> ld(1) on a GNU/Linux machine says:
>> ---
>> -z lazy
>>
>> When generating an executable or shared library, mark it to tell the
>> dynamic linker to defer function call resolution to the point when
>> the function is called (lazy binding)
>> ---
>
> Note that this man page is part of the linux man pages project and
> is not canonical documentation for the glibc project.

This particular ld manual page seems to be derived from the ld/binutils Info documentation, which promises the same behavior.

I am not sure what the exact semantics of lazy binding should be. With IFUNCs, lazy binding is observable, and we know from Fedora's BIND_NOW experiment that some applications assume that undefined functions which are never called do not cause any trouble whatsoever.
* Re: GNU dlopen(3) differs from POSIX/IEEE
From: Szabolcs Nagy @ 2016-01-01 0:00 UTC
To: Florian Weimer
Cc: Carlos O'Donell, hegdesmailbox, gnu-gabi

* Florian Weimer <fw@deneb.enyo.de> [2016-07-01 22:46:19 +0200]:
> I am not sure what the exact semantics of lazy binding should be.
> With IFUNCs, lazy binding is observable, and we know from Fedora's
> BIND_NOW experiment that some applications assume that undefined
> functions which are never called do not cause any trouble whatsoever.

this bind now experiment made me curious but i could not find the results and its description.

is there a list of affected packages somewhere?
* Re: GNU dlopen(3) differs from POSIX/IEEE
From: Florian Weimer @ 2016-01-01 0:00 UTC
To: Szabolcs Nagy
Cc: Carlos O'Donell, hegdesmailbox, gnu-gabi

* Szabolcs Nagy:

> * Florian Weimer <fw@deneb.enyo.de> [2016-07-01 22:46:19 +0200]:
>> I am not sure what the exact semantics of lazy binding should be.
>> With IFUNCs, lazy binding is observable, and we know from Fedora's
>> BIND_NOW experiment that some applications assume that undefined
>> functions which are never called do not cause any trouble whatsoever.
>
> this bind now experiment made me curious but i could not
> find the results and its description.

It's ongoing:

<https://fedoraproject.org/wiki/Changes/Harden_All_Packages>

Alpine Linux with musl runs essentially the same experiment because musl does not support lazy binding.

> is there a list of affected packages somewhere?

I'm not sure. I don't think Fedora keeps a tally of the exceptions. I can generate a list of objects which use lazy binding, but I don't know if those are accidents or the result of a deliberate choice.

One example that keeps coming up is Xorg server modules, which do not use DT_NEEDED. Instead, an external dependency mechanism makes sure that functions in them are called only after all the relevant modules have been loaded (but not necessarily in the order of their symbol bindings).
* Re: GNU dlopen(3) differs from POSIX/IEEE
From: Carlos O'Donell @ 2016-01-01 0:00 UTC
To: Florian Weimer
Cc: hegdesmailbox, gnu-gabi

On 07/01/2016 04:46 PM, Florian Weimer wrote:
> * Carlos O'Donell:
>
>>> ld(1) on a GNU/Linux machine says:
>>> ---
>>> -z lazy
>>>
>>> When generating an executable or shared library, mark it to tell the
>>> dynamic linker to defer function call resolution to the point when
>>> the function is called (lazy binding)
>>> ---
>>
>> Note that this man page is part of the linux man pages project and
>> is not canonical documentation for the glibc project.
>
> This particular ld manual page seems to be derived from the
> ld/binutils Info documentation, which promises the same behavior.

The binutils manual should not dictate glibc behaviour.

Patch sent to binutils:
https://sourceware.org/ml/binutils/2016-07/msg00104.html

> I am not sure what the exact semantics of lazy binding should be.

The semantics of lazy binding are purposely vague to avoid constraining the implementation. The reference to the symbol will be resolved at some point between load and call.

Do we need stricter semantics? Do the stricter semantics give us something in return for the constraints we place on the implementation?

> With IFUNCs, lazy binding is observable, and we know from Fedora's
> BIND_NOW experiment that some applications assume that undefined
> functions which are never called do not cause any trouble whatsoever.

The IFUNC observes lazy binding only indirectly in that the resolver is called one or more times depending on (a) the number of object references to the resolver and (b) the number of threads concurrently updating GOT/PLT entries and calling the ifunc resolver.

If there are relevant issues from Fedora's BIND_NOW testing that relate to gnu-gabi, then we should raise them in a new thread.

--
Cheers,
Carlos.