* Problem with atexit and _dl_fini
@ 2019-05-18 21:23 Nat!
2019-05-19 16:23 ` Florian Weimer
0 siblings, 1 reply; 21+ messages in thread
From: Nat! @ 2019-05-18 21:23 UTC
To: libc-help
My problem is that I observe my atexit handlers not being executed
in the correct order.
I.e. atexit( a); atexit( b); should result in b() and then a() being
called. To quote the man page,
"the registered functions are invoked in reverse order".
When I register with `atexit`, I can see my functions being added
properly, in the correct order, within `__internal_atexit`. After my
functions, the ELF loader (?) also adds itself there, so it is called
first by `__run_exit_handlers`.
Then comes the part where it goes wrong. I registered my two functions
with `__internal_atexit`, but for some reason `_dl_fini` is calling
`__cxa_finalize`, and that calls the wrong function first.
I would like to question the wisdom of `_dl_fini` calling my destructor,
which I never registered via __attribute__((destructor)) or some such.
To me that is a bug. If `_dl_fini` left that to `__internal_atexit`,
there would be no problem. The comments in `_dl_fini` say that it does
some complicated dependency checks to call destructors in the correct
order. This seemingly interferes with the atexit order.
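To make the expected behavior concrete, here is a minimal model of the LIFO rule quoted from the man page. It is only a sketch: `lifo_atexit`, `lifo_run_all` and the logging helpers are hypothetical names, and the model deliberately ignores shared libraries, which is exactly the part where glibc behaves differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the man page rule: handlers form a stack. */
#define MAX_HANDLERS 32

typedef void (*handler_fn)(void);

static handler_fn handler_stack[MAX_HANDLERS];
static size_t handler_count;

/* Records which handlers ran, in order, e.g. "ba". */
static char run_log[MAX_HANDLERS + 1];

static void lifo_atexit(handler_fn fn)
{
    assert(handler_count < MAX_HANDLERS);
    handler_stack[handler_count++] = fn;
}

/* Runs all handlers, last registered first, as exit() is expected to. */
static void lifo_run_all(void)
{
    while (handler_count > 0)
        handler_stack[--handler_count]();
}

/* Appends one character to run_log. */
static void log_handler(char name)
{
    size_t len = strlen(run_log);

    assert(len < MAX_HANDLERS);
    run_log[len] = name;
    run_log[len + 1] = '\0';
}

static void handler_a(void) { log_handler('a'); }
static void handler_b(void) { log_handler('b'); }
```

Registering `handler_a`, then `handler_b`, and calling `lifo_run_all()` leaves `"ba"` in `run_log`: b() runs before a().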
Here is a stacktrace of the wrong call back:
```
* thread #1, name = 'noleak.debug.ex', stop reason = breakpoint 2.1
  * frame #0: 0x00007ffff7f86740 libmulle-testallocator.so`mulle_testallocator_exit at mulle-testallocator.c:451:14
    frame #1: 0x00007ffff7e131af libc.so.6`__cxa_finalize(d=0x00007ffff7f890e0) at cxa_finalize.c:83:6
    frame #2: 0x00007ffff7f85243 libmulle-testallocator.so`__do_global_dtors_aux + 35
    frame #3: 0x00007ffff7fe2ce6 ld-2.29.so`_dl_fini at dl-fini.c:138:9
    frame #4: 0x00007ffff7e12c65 libc.so.6`__run_exit_handlers(status=0, listp=0x00007ffff7f7d718, run_list_atexit=<unavailable>, run_dtors=<unavailable>) at exit.c:108:8
    frame #5: 0x00007ffff7e12d8c libc.so.6`__GI_exit(status=<unavailable>) at exit.c:139:3
    frame #6: 0x00007ffff7dfe587 libc.so.6`__libc_start_main(main=(noleak.debug.exe`main at noleak.m:18), argc=1, argv=0x00007fffffffd728, init=(noleak.debug.exe`__libc_csu_init at elf-init.c:68:1), fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffd718) at libc-start.c:342:3
    frame #7: 0x000000000040110a noleak.debug.exe`_start + 42
```
Stacktrace of the corresponding atexit call:
```
  * frame #0: 0x00007ffff7f869a0 libmulle-testallocator.so`atexit
    frame #1: 0x00007ffff7f868c7 libmulle-testallocator.so`mulle_testallocator_initialize at mulle-testallocator.c:492:7
    frame #2: 0x00007ffff7fe295a ld-2.29.so`call_init(l=<unavailable>, argc=1, argv=0x00007fffffffd728, env=0x00007fffffffd738) at dl-init.c:72:3
    frame #3: 0x00007ffff7fe2a59 ld-2.29.so`_dl_init at dl-init.c:30:6
    frame #4: 0x00007ffff7fe2a43 ld-2.29.so`_dl_init(main_map=0x00007ffff7ffe190, argc=1, argv=0x00007fffffffd728, env=0x00007fffffffd738) at dl-init.c:119
    frame #5: 0x00007ffff7fd30ca ld-2.29.so`_dl_start_user + 50
```
A tweet/screenshot of the problem in action, where you can see that
libmulle-testallocator.so is calling atexit first, but also gets called
back first:
https://twitter.com/mulle_nat/status/1129131042001043456
I tried to reproduce this with a smaller test case, but wasn't
successful. I have the problem with system glibc as well as with my
self-built debugging glibc-2.29.
```
(lldb) image list
[  0] 047FB15C 0x0000000000400000 /home/src/srcO/mulle-objc/MulleObjC/test/0_noleak/noleak.debug.exe
[  1] 7A9876B1-3173-65A8-3AC8-D6F6E88FD53F-25BEA927 0x00007ffff7fd2000 /lib/x86_64-linux-gnu/ld-2.29.so
      /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.29.so
[  2] 76B794BB-7305-068D-C4A6-B2C6330A23DE-0BC95526 0x00007ffff7fd1000 [vdso] (0x00007ffff7fd1000)
[  3] 05639C58 0x00007ffff7f8a000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libMulleObjC.so
[  4] D1F022AE-06A7-DD85-43B7-D26B34D45415-59240C1E 0x00007ffff7f84000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-testallocator.so
[  5] 0F5AACB3-8162-0D16-F936-1C8C44F8B6D6-82423DF9 0x00007ffff7dd9000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libc.so.6
[  6] 13E7B8ED-A5D0-4B2C-05BE-0926D37365CE-A861C019 0x00007ffff7d89000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-objc-runtime.so
[  7] 96F58694-13A4-8019-C8C7-47993BC8B10E-C11F8498 0x00007ffff7d84000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libdl.so.2
[  8] 8BBE3351-7A55-AB93-23D3-E05E6DBF19C4-842503AB 0x00007ffff7d7a000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-concurrent.so
[  9] 4D189D24-CA86-BCAD-32DC-2C4E15D6DB70-C51C94FB 0x00007ffff7d6c000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-aba.so
[ 10] DEB7E042-331F-B188-43B1-91C1225C51EB-DA4DAC39 0x00007ffff7d67000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-allocator.so
[ 11] 30F22D5C-1699-A1E2-C135-14F5A2199065-7DCC8A02 0x00007ffff7d62000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-thread.so
[ 12] 6AC089FD-3867-9707-F7F1-4E2B6347F639-60D504FF 0x00007ffff7d5d000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-vararg.so
[ 13] C18E6B6F-2BD8-DC81-92BE-E3CA239F4B92-D1FBB101 0x00007ffff7d56000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-stacktrace.so
[ 14] 272D4479-EC66-502F-C535-BBC9286963F6-09CA9E9D 0x00007ffff7d3d000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-container.so
[ 15] A6F82062-5433-D33D-4BC8-F81DA55BC85A-76B9D8AF 0x00007ffff7d1d000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libpthread.so.0
[ 16] FB3F5F86-E867-5910-864C-3E064CEAC7D3-4FFFD5BC 0x00007ffff7cb2000 /lib/x86_64-linux-gnu/libgcc_s.so.1
```
The LD_DEBUG output doesn't say much:
```
    13670:
    13670:   calling fini: /home/src/srcO/mulle-objc/MulleObjC/test/0_noleak/noleak.debug.exe [0]
    13670:
    13670:
    13670:   calling fini: /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libMulleObjC.so [0]
    13670:
    13670:
    13670:   calling fini: /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-testallocator.so [0]
    13670:
mulle_testallocator: exit (0x7f78f9ee873c)
```
Ciao
  Nat!
* Re: Problem with atexit and _dl_fini
From: Florian Weimer @ 2019-05-19 16:23 UTC
To: Nat!; +Cc: libc-help
* Nat!:
> So my problem is, that I observe that my atexit calls are not executed
> in the correct order.
>
> i.e. atexit( a); atexit( b); should result in b(), a() being called in
> that order. To quote the man page,
>
> "the registered functions are invoked in reverse order".
>
>
> When I register with `atexit` I can see my functions being added
> properly within `__internal_atexit` in the
>
> correct order. Finally after my functions, the elf-loader ? also adds
> itself there. So it is being called first by
>
> `__run_exit_handlers`.
>
>
> Then comes the part where it goes wrong. I registered my two function
> with `__internal_atexit`, but for some reason
>
> `_dl_fini` is calling `__cxa_finalize` and that is calling the wrong
> function first.
When atexit is called from a DSO, glibc calls the registered function
before the DSO is unloaded. This choice was made because after
unloading, the function pointer becomes invalid.
I haven't checked, but I suspect atexit still works this way even if
it doesn't have to (because the DSO is never unloaded).
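As far as I understand glibc's `stdlib/atexit.c`, `atexit` is a thin wrapper that calls `__cxa_atexit` with a NULL argument and the calling module's `__dso_handle`, and `__cxa_finalize(d)` later runs only the entries tagged with `d`. The sketch below registers a handler the same way so that the tag is explicit. `register_tagged` and `say_bye` are hypothetical names; `__cxa_atexit` and `__dso_handle` come from the Itanium C++ ABI and the GCC/glibc toolchain, and they are declared by hand here because no C header provides them. Treat this as an illustration of the mechanism, not as glibc's exact code, which varies between versions.

```c
#include <assert.h>
#include <stdio.h>

/* Itanium C++ ABI entry point exported by glibc; no C header declares it. */
extern int __cxa_atexit(void (*func)(void *), void *arg, void *dso_handle);

/* Defined by the toolchain's crtbegin object: one per executable or DSO. */
extern void *__dso_handle __attribute__((__visibility__("hidden")));

/* Hypothetical handler: prints its argument when it finally runs. */
static void say_bye(void *arg)
{
    assert(arg != NULL);
    fprintf(stderr, "bye from %s\n", (const char *)arg);
}

/* Registers func the way atexit() does, but with an explicit argument.
 * In a shared library, the third parameter tags the entry with that
 * library, which is what lets _dl_fini -> __cxa_finalize(__dso_handle)
 * run it before the library goes away. */
static int register_tagged(void (*func)(void *), void *arg)
{
    return __cxa_atexit(func, arg, __dso_handle);
}
```

Built into a shared library, a handler registered this way would be picked up by that library's `__cxa_finalize` call from `__do_global_dtors_aux`, which matches frames #1 and #2 of the stack trace in the first mail.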
* Re: Problem with atexit and _dl_fini
From: Nat! @ 2019-05-19 19:37 UTC
To: Florian Weimer; +Cc: libc-help
On 19.05.19 18:23, Florian Weimer wrote:
> When atexit is called from a DSO, glibc calls the registered function
> before the DSO is unloaded. This choice was made because after
> unloading, the function pointer becomes invalid.
>
> I haven't checked, but I suspect atexit still works this way even if
> it doesn't have to (because the DSO is never unloaded).
>
I understand, but the behavior is wrong :) The C standard (or the C++
standard, for that matter), as summarized at
http://www.cplusplus.com/reference/cstdlib/atexit/, states that
```
If more than one atexit function has been specified by different calls
to this function, they are all executed in reverse order as a stack
(i.e. the last function specified is the first to be executed at exit).
```
I think it has been shown that glibc can violate this part of the C
standard, so for me that would already settle the argument. That one
should unwind in reverse order is, I assume, not an interesting
discussion topic. Currently, atexit is broken as a reliable mechanism.
But I also don't think the way this is currently handled in glibc is of
much use to anyone.
Case 1: a regular executable linked with shared libraries. Nothing gets
unloaded at exit, so what's the point?
Case 2: someone manually unloads a shared library that contains atexit
code. The bug is either using `atexit` in a shared library that gets
unloaded, or unloading a shared library that contains atexit code. But
it's not really glibc's business IMO.
Case 3: some automatism unloads shared libraries. Then the automatism
should check whether atexit code is affected and not unload, because the
shared library is clearly still needed. It's a bug in the automatism.
If one were hellbent on supporting atexit in shared libraries that can
be unloaded, an atexit call made from a shared library should increment
the reference count of that shared library at registration time and
decrement it after the callback has executed.
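The reference-counting half of that proposal can be approximated from user code today. The sketch below is hypothetical and not something glibc does: it uses `dladdr` to find the loaded object that contains a function and `dlopen` with `RTLD_NOLOAD` to take an extra reference without loading anything new. `pin_containing_dso` is an illustrative name. It only keeps a library from being unloaded by `dlclose`; it does not change the order in which `_dl_fini` runs handlers at process exit. On glibc versions before 2.34 you may need to link with `-ldl`.

```c
#define _GNU_SOURCE /* for dladdr() and RTLD_NOLOAD */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Takes an extra reference on the loaded object that contains addr.
 * Returns a dlopen handle (release it with dlclose), or NULL if addr
 * does not belong to any loaded object. */
static void *pin_containing_dso(const void *addr)
{
    Dl_info info;

    assert(addr != NULL);
    if (dladdr(addr, &info) == 0 || info.dli_fname == NULL)
        return NULL;

    /* RTLD_NOLOAD only succeeds for an already-loaded object; on success
     * the object's reference count is incremented. */
    return dlopen(info.dli_fname, RTLD_NOW | RTLD_NOLOAD);
}
```

A library would call this with the address of its handler right before `atexit(handler)`; the "decrement after the callback" step corresponds to a `dlclose` on the returned handle once the handler has run.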
Ciao
  Nat!
* Re: Problem with atexit and _dl_fini
From: Adhemerval Zanella @ 2019-05-21 20:43 UTC
To: libc-help
On 19/05/2019 16:37, Nat! wrote:
> I understand, but the behavior is wrong :) The C standard (or the C++ standard for this matter) http://www.cplusplus.com/reference/cstdlib/atexit/ states that
>
>
> ```
>
> If more than one atexit function has been specified by different calls to this function, they are all executed in reverse order as a stack (i.e. the last function specified is the first to be executed at exit).
>
> ```
>
> I think its been shown that glibc can violate this C standard, so for me the argument would be over here already. That one should unwind in the reverse order is, I assume, not a interesting discussion topic. Currently atexit as a reliable mechanism is broken.
>
>
> But I also don't think the way this is currently handled in glibc, can't be of much use to anyone.
>
> Case 1: a regular exe linked with shared libraries, nothing gets unloaded at exit, so what's the point ?
>
> Case 2: someone manually unloads a shared library, that contains atexit code. The bug is either using `atexit` for a shared library that gets unloaded, or unloading a shared library that contains atexit code. But it's not really glibcs business IMO.
>
> Case 3: some automatism unloads shared libraries. Then the automatism should check if atexit code is affected and not unload, because the shared library is still clearly needed. It's a bug in the automatism.
>
> If one was hellbent on trying to support atexit for unloading shared libraries, an atexit contained in a shared library should up the reference count of the shared library during the atexit call and decrement after the callback has executed.
>
> Ciao
Could you provide a testcase that triggers the issue you are seeing?
glibc at least has a testcase that does pretty much what you
described, dlfcn/bug-atexit1-lib.c and dlfcn/bug-atexit1.c.
It creates a shared library with an exported symbol that registers a
lot of atexit functions; the main program dlopens and dlcloses it and
checks that the atexit handlers are indeed called in the correct order.
It does not use any C++, so there is no __cxa_finalize involved.
Also, I must confess that your debug information makes it hard to tell
what exactly you are expecting and what exactly your program is doing.
For instance, I do not understand the part 'I registered my two
functions with `__internal_atexit`'. Are you trying to call it
directly? Or are you calling __cxa_atexit? Keep in mind that
__cxa_atexit calls are generated by the compiler itself to destruct
global objects.
* Re: Problem with atexit and _dl_fini
From: Nat! @ 2019-05-22 10:22 UTC
To: libc-help
Adhemerval Zanella wrote:
>
> Could you provide a testcase that stress the issue you are seeing? At
> least on glibc it does have a testcase that does pretty much what you
> described, dlfcn/bug-atexit1-lib.c and dlfcn/bug-atexit1.c.
I tried to simplify it, but I failed and gave up (see below).
>
> It created a shared library with an exported symbol that registers at
> lot of atexit function and the main program dlopen and dclose it and
> checks if the atexit handlers are indeed called in the correct order.
> It does not use any C++, so there is no __cxa_finalize involved.
The use of __cxa_finalize is something I observed in my stacktrace
(see previous mails).
>
> Also, on your debug information I must confess it is confusing what
> exactly you are expecting and what exactly your program is doing.
> For instance I am not understanding the part 'I registered my two
> function with `__internal_atexit`'. Are you trying to calling it
> directly? Or are you calling __cxa_atexit? Keep in mind that
> __cxa_atexit calls are generated by compiler itself to destruct
> global objects.
>
I am really just using `atexit`. I was trying to explain what the
internal glibc problem is, as I observed it by setting breakpoints and
stepping through the code.
Conceptually, what I am doing is installing on-demand `atexit` handlers
while shared libraries are being loaded. These are then used for debugging.
Basically like this (typed by hand):
liba.so:

#include <stdlib.h>

void a( void)
{
}

__attribute__((constructor))
static void check( void)
{
   if( getenv( "DO_B_ON_EXIT"))
      atexit( a);
}

libb.so:

#include <stdlib.h>

void b( void)
{
}

__attribute__((constructor))
static void check( void)
{
   if( getenv( "DO_A_ON_EXIT"))
      atexit( b);
}
I tried this with a simple setup, and that doesn't cause problems as
such. But `_dl_fini` sometimes re-sorts the dependencies, and then the
atexit order is compromised; that is what I am running into.
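To illustrate how a per-library finalization pass can produce exactly this symptom, here is a toy model (not glibc's code) of a single exit-handler list whose entries carry a DSO tag. `finalize_dso` plays the role of `__cxa_finalize(d)`, and `model_exit` stands in for `_dl_fini` visiting libraries in whatever order it has computed before the remaining handlers run. All names and the integer DSO ids are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 32

/* One exit-handler entry; the label stands in for the function. */
typedef struct {
    char label;  /* character appended to run_order when it runs */
    int dso;     /* hypothetical DSO id (0 = main program) */
    int done;    /* set once the entry has run */
} exit_entry;

static exit_entry entries[MAX_ENTRIES];
static size_t entry_count;
static char run_order[MAX_ENTRIES + 1];
static size_t run_len;

/* Models atexit() called from code that belongs to DSO `dso`. */
static void register_in_dso(char label, int dso)
{
    assert(entry_count < MAX_ENTRIES);
    entries[entry_count].label = label;
    entries[entry_count].dso = dso;
    entries[entry_count].done = 0;
    entry_count++;
}

static void run_entry(exit_entry *e)
{
    e->done = 1;
    run_order[run_len++] = e->label;
    run_order[run_len] = '\0';
}

/* Like __cxa_finalize(d): newest first, only entries tagged with d. */
static void finalize_dso(int dso)
{
    for (size_t i = entry_count; i-- > 0;)
        if (!entries[i].done && entries[i].dso == dso)
            run_entry(&entries[i]);
}

/* _dl_fini stand-in: finalize DSOs in the loader's chosen order,
 * then run whatever is left, newest first. */
static void model_exit(const int *fini_order, size_t n)
{
    for (size_t i = 0; i < n; i++)
        finalize_dso(fini_order[i]);
    for (size_t i = entry_count; i-- > 0;)
        if (!entries[i].done)
            run_entry(&entries[i]);
}
```

If liba (id 1) registers `a` and then libb (id 2) registers `b`, but the loader happens to finalize liba first (`fini_order = {1, 2}`), the model records `"ab"` instead of the `"ba"` that a single global LIFO walk would give. With `{2, 1}` it records `"ba"`.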
Ciao
Nat!
* Re: Problem with atexit and _dl_fini
From: Adhemerval Zanella @ 2019-05-22 15:01 UTC
To: libc-help
On 22/05/2019 07:22, Nat! wrote:
> Adhemerval Zanella schrieb:
>
>>
>> Could you provide a testcase that stress the issue you are seeing? At
>> least on glibc it does have a testcase that does pretty much what you
>> described, dlfcn/bug-atexit1-lib.c and dlfcn/bug-atexit1.c.
>
> I tried to simplify it, but I failed and gave up (see below).
>
>>
>> It created a shared library with an exported symbol that registers at
>> lot of atexit function and the main program dlopen and dclose it and
>> checks if the atexit handlers are indeed called in the correct order.
>> It does not use any C++, so there is no __cxa_finalize involved.
>
> The use of __cxa_finalize is something I observed in my stacktrace
> (see previous mails).
>
>>
>> Also, on your debug information I must confess it is confusing what
>> exactly you are expecting and what exactly your program is doing.
>> For instance I am not understanding the part 'I registered my two
>> function with `__internal_atexit`'. Are you trying to calling it
>> directly? Or are you calling __cxa_atexit? Keep in mind that
>> __cxa_atexit calls are generated by compiler itself to destruct
>> global objects.
>>
>
> I am really just using `atexit`. I was trying to explain, what the internal glib problem is, that I observed setting breakpoints and stepping through the code.
>
> What conceptually I am doing is to install on-demand `atexit` handlers during the load of shared libraries. These are then used for debugging.
>
> Basically like this (typed by hand):
>
> liba.so:
>
> void  a( void)
> {
> }
>
> __attribute__((constructor))
> static void  check( void)
> {
>   if( getenv( "DO_B_ON_EXIT"))
>      atexit( a);
> }
>
>
> libb.so:
>
> void  b( void)
> {
> }
>
> __attribute__((constructor))
> static void  check( void)
> {
>   if( getenv( "DO_A_ON_EXIT"))
>      atexit( b);
> }
>
> I tried this with a simple setup and that doesn't create problems as such. But dl-fini re-sorts the dependencies sometimes, and then
> the atexit order is compromised and that is what I am running into.
>
Are you sure that you are calling atexit in the order you expect? With
explicit linking, the order depends on the order of the shared objects
you pass on the link command line.
Using the example (I just instrumented both 'a' and 'b' to print the function
name using write on STDERR_FILENO):
$ gcc -Wall test.c -o test -L`pwd` -Wl,--no-as-needed -lliba -llibb
$ LD_LIBRARY_PATH=. DO_A_ON_EXIT=1 DO_B_ON_EXIT=1 ./test
a
b
$ gcc -Wall test.c -o test -L`pwd` -Wl,--no-as-needed -llibb -lliba
$ LD_LIBRARY_PATH=. DO_A_ON_EXIT=1 DO_B_ON_EXIT=1 ./test
b
a
It is similar for the dlopen case:
--
#include <dlfcn.h>
#include <assert.h>
int main (void)
{
void *handle_a = dlopen ("libliba.so", RTLD_NOW);
assert (handle_a);
void *handle_b = dlopen ("liblibb.so", RTLD_NOW);
assert (handle_b);
}
--
$ gcc -Wall test-dlopen.c -o test-dlopen -ldl
$ LD_LIBRARY_PATH=. DO_A_ON_EXIT=1 DO_B_ON_EXIT=1 ./test-dlopen
b
a
What you might be seeing is an implicit dependency that forces a specific
load order even if you list the libraries explicitly on the link command.
Using the starting example, if you link liblibb.so with libliba.so as a
dependency:
$ gcc -Wall -shared -fpic liba.c -o libliba.so
$ gcc -Wall -shared -fpic libb.c -o liblibb.so -L`pwd` -Wl,--no-as-needed -lliba
then it does not matter how you link the resulting program; the atexit handlers
will be registered in the same order:
$ gcc -Wall test.c -o test -L`pwd` -Wl,--no-as-needed -llibb -lliba
$ LD_LIBRARY_PATH=. DO_A_ON_EXIT=1 DO_B_ON_EXIT=1 ./test
b
a
$ gcc -Wall test.c -o test -L`pwd` -Wl,--no-as-needed -lliba -llibb
$ LD_LIBRARY_PATH=. DO_A_ON_EXIT=1 DO_B_ON_EXIT=1 ./test
b
a
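For anyone reproducing the experiment above, the instrumented library might look roughly like this. It is a reconstruction, not the exact file used: it assumes liba checks `DO_A_ON_EXIT` (the hand-typed version earlier in the thread checks `DO_B_ON_EXIT`, which makes no difference when both variables are set), and `write_name` is a hypothetical helper. libb would be identical with `b` and `DO_B_ON_EXIT`. It builds with `gcc -Wall -shared -fpic liba.c -o libliba.so`, as in the commands above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Writes a name with write(2): unbuffered, so the output order is not
 * affected by stdio buffering at exit. */
static ssize_t write_name(const char *name)
{
    assert(name != NULL);
    return write(STDERR_FILENO, name, strlen(name));
}

void a(void)
{
    write_name("a\n");
}

/* Runs when the library is loaded; registers a() only on request. */
__attribute__((constructor))
static void check(void)
{
    if (getenv("DO_A_ON_EXIT"))
        atexit(a);
}
```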
I really think it is something related to your build, because glibc does not
actually reorder the internal __exit_funcs list. AFAIK, once an entry is
inserted in stdlib/on_exit.c by __internal_atexit, it is deregistered in
stdlib/cxa_finalize.c.
So if you can actually show the issue you are observing, please open a bug
report. Otherwise, I do not see an actual issue with atexit handlers here.
* Re: Problem with atexit and _dl_fini
From: Nat! @ 2019-05-22 15:29 UTC
To: Adhemerval Zanella; +Cc: libc-help
Did you take a look at the screenshot in the tweet
(https://twitter.com/mulle_nat/status/1129131042001043456/photo/1) I
linked? That's the best evidence I can provide that it's really
happening. I tried to reproduce it pretty much the same way you did, but
wasn't successful. It's not easy...
On a hunch, I would say the problem will turn out to have something to do
with liba and libb each having shared library dependencies, some shared
by both and some not, and _dl_fini sorting things in the wrong order.
If I get my release stuff done and find some spare time, I'll attempt to
reproduce it again.
Ciao
Nat!
* Re: Problem with atexit and _dl_fini
From: Adhemerval Zanella @ 2019-05-22 19:35 UTC
To: Nat!; +Cc: libc-help
On 22/05/2019 12:29, Nat! wrote:
> Did you take a look at the screenshot in the tweet (https://twitter.com/mulle_nat/status/1129131042001043456/photo/1) I linked ? That's the best evidence I can provide, that its really happening. I tried to reproduce it pretty much the same as you did, but wasn't successful. It's not easy...
And I can't really tell without actually debugging it. In __cxa_finalize, could
you dump the __exit_funcs values? The only thing I can think of is that _dl_fini
has to sort the maps because of an implicit dependency between the shared
libraries.
>
> On a hunch I would say the problem will turn out to have something to do with liba having shared library dependencies and libb having shared library dependecies and some are shared by both and some not and dl-fini sorts things in the wrong order.
>
> If I get my release stuff down and find some spare time, I'll attempt to reproduce it again.
Do you have an easy way to provide the environment you are testing?
* Re: Problem with atexit and _dl_fini
From: Nat! @ 2019-05-29 21:16 UTC
To: libc-help, Adhemerval Zanella
On 22.05.19 21:34, Adhemerval Zanella wrote:
>
> Do you have a easy way to provide the environment you are testing?
>
Now I can provide a way to reproduce the problem and debug it. While
this may look a little daunting at first, it's mostly just copy/paste of
commands and in the end you have a gdb session with a glibc that can be
stepped through in the debugger.
atexit-bug.md
---
This is about the simplest way to reproduce the `atexit` problem in glibc.
## Create a docker with the development environment
Get the Dockerfile:
```
curl -L -O \
   'https://raw.githubusercontent.com/MulleFoundation/foundation-developer/release/Dockerfile'
```
As we want to debug with the newest **glibc** we need to use `ubuntu:disco`
instead of `ubuntu:bionic`, so change the first line of
the `Dockerfile` to `FROM ubuntu:disco`. Now you are ready to build the
container:
```
sudo docker build -t foundation -f Dockerfile "`mktemp -d`"
sudo docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
   -i -t --rm foundation
```
## Get some tools via apt
Inside the docker get some prerequisites for debugging and for **glibc**:
```
sudo apt-get -y install vim gdb gawk bison gettext texinfo
```
## Download the project
Inside the docker get the **MulleObjC** project. This will place you in a
virtual environment subshell. With `exit` you can get out (you should).
```
mulle-sde https://github.com/mulle-objc/MulleObjC/archive/latest.tar.gz
exit
```
## Remove atexit fix
With the `atexit` fix, there are no problems, so we need to take it out:
```
cd MulleObjC/test
mulle-sde environment set MULLE_ATEXIT_URL \
   'https://github.com/mulle-core/mulle-atexit/archive/placebo.tar.gz'
```
## Build with a debug version of glibc
Build everything with a custom built version of **glibc**, so we can
debug it.
While still being in `MulleObjC/test`:
```
mulle-sourcetree -N add --nodetype 'tar' --marks 'no-all-load,no-import' \
   --userinfo 'aliases=c' \
   --url '${GLIBC_2_29_URL:-https://ftp.gnu.org/gnu/glibc/glibc-2.29.tar.xz}' \
   --branch '${GLIBC_2_29_BRANCH}' 'glibc'
mulle-sourcetree move glibc top
mulle-sde dependency craftinfo set glibc CC_DEBUG '-O1 -g'
mulle-sde dependency craftinfo set glibc SKIP_AUTOCONF YES
mulle-sde test craft
```
## Build first test and observe the error
While still being in `MulleObjC/test`:
```
mulle-sde -vvv test run --keep-exe 0_noleak/noleak.m
```
The error should appear.
Now you can use the debugger to see the wrong atexit order:
```
MULLE_TESTALLOCATOR=1 MULLE_OBJC_PEDANTIC_EXIT=YES gdb \
   /MulleObjC/test/0_noleak/noleak.debug.exe
```
> Breakpoints to set:
>
> b atexit
> b mulle_testallocator_exit
> b mulle_objc_global_atexit
---
I tried these steps and it worked out successfully for me.
Good Luck
  Nat!
* Re: Problem with atexit and _dl_fini
From: Nat! @ 2019-06-09 20:59 UTC
To: libc-help
Another data point to support my claim that _dl_fini breaks atexit. This
time it's very easy to reproduce ;)
Here's the README.md from the GitHub repo
https://github.com/mulle-nat/atexit-breakage-linux:
# Shows another breakage involving `atexit` on linux
Here the `atexit` callback is invoked mistakenly multiple times.
## Build
Build with [mulle-make](//github.com/mulle-sde/mulle-make) or alternatively :
```
(
mkdir build &&
cd build &&
cmake .. &&
make
)
```
## Run
Use `ldd` to trigger the misbehaviour:
```
LD_PRELOAD="${PWD}/build/libld-preload.so" ldd ./build/main
```
## Output
```
load
unload
unload
unload
linux-vdso.so.1 (0x00007ffd2b2bd000)
/home/src/srcO/mulle-core/mulle-testallocator/research/ld-preload/build/libld-preload.so (0x00007f83853c1000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f838518c000)
/lib64/ld-linux-x86-64.so.2 (0x00007f83853cd000)
unload
unload
```
Ciao
Nat!
* Re: Problem with atexit and _dl_fini
From: Adhemerval Zanella @ 2019-06-10 11:48 UTC
To: libc-help
On 09/06/2019 17:59, Nat! wrote:
> Another datapoint to support my claim that _dl-fini breaks atexit. This time its very easy to reproduce ;)
>
> Here 's the README.md from the Github Repo https://github.com/mulle-nat/atexit-breakage-linux
>
>
> ```
>
> # Shows another breakage involving `atexit` on linux
>
> Here the `atexit` callback is invoked mistakenly multiple times.
This 'example' does not really show the issue, because the ldd script invokes
the loader multiple times (see below). You can check exactly what ldd is
doing by running it with sh -x.
I will try to use your instructions to run it in docker to see what exactly
is happening in your environment.
>
> ## Build
>
> Build with [mulle-make](//github.com/mulle-sde/mulle-make) or alternatively :
>
> ```
> (
>   mkdir build &&
>   cd build &&
>   cmake .. &&
>   make
> )
> ```
>
> ## Run
>
> Use `ldd` to trigger the misbehaviour:
>
> ```
> LD_PRELOAD="${PWD}/build/libld-preload.so" ldd ./build/main
> ```
>
> ## Output
>
> ```
> load
> unload
The first and second times happen here (ldd script, lines 158-161):
    dummy=`$rtld 2>&1`
    if test $? = 127; then
      verify_out=`${rtld} --verify "$file"`
      ret=$?
where the rtld list, for x86_64, is /lib/ld-linux.so.2 and /lib64/ld-linux-x86-64.so.2.
> unload
> unload
This time it happens here (lines 176-179):
    0|2)
      try_trace "$RTLD" "$file" || result=1
      ;;
    *)
where it calls the loader as
    eval LD_TRACE_LOADED_OBJECTS=1 LD_WARN= LD_BIND_NOW= 'LD_LIBRARY_VERSION=$verify_out' LD_VERBOSE= '"$@"'
with the same rtld list as before.
>   linux-vdso.so.1 (0x00007ffd2b2bd000)
>   /home/src/srcO/mulle-core/mulle-testallocator/research/ld-preload/build/libld-preload.so (0x00007f83853c1000)
>   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f838518c000)
>   /lib64/ld-linux-x86-64.so.2 (0x00007f83853cd000)
> unload
> unload
> ```
>
>
> Ciao
>   Nat!
>
>
> On 19.05.19 21:37, Nat! wrote:
>>
>> On 19.05.19 18:23, Florian Weimer wrote:
>>> * Nat!:
>>>
>>>> So my problem is, that I observe that my atexit calls are not executed
>>>> in the correct order.
>>>>
>>>> i.e. atexit( a); atexit( b); should result in b(), a() being called in
>>>> that order. To quote the man page,
>>>>
>>>> "the registered functions are invoked in reverse order".
>>>>
>>>>
>>>> When I register with `atexit` I can see my functions being added
>>>> properly within `__internal_atexit` in the
>>>>
>>>> correct order. Finally after my functions, the elf-loader ? also adds
>>>> itself there. So it is being called first by
>>>>
>>>> `__run_exit_handlers`.
>>>>
>>>>
>>>> Then comes the part where it goes wrong. I registered my two function
>>>> with `__internal_atexit`, but for some reason
>>>>
>>>> `_dl_fini` is calling `__cxa_finalize` and that is calling the wrong
>>>> function first.
>>> When atexit is called from a DSO, glibc calls the registered function
>>> before the DSO is unloaded. This choice was made because after
>>> unloading, the function pointer becomes invalid.
>>>
>>> I haven't checked, but I suspect atexit still works this way even if
>>> it doesn't have to (because the DSO is never unloaded).
>>>
>>
>> I understand, but the behavior is wrong :) The C standard (or the C++ standard for this matter) http://www.cplusplus.com/reference/cstdlib/atexit/ states that
>>
>>
>> ```
>>
>> If more than one atexit function has been specified by different calls to this function, they are all executed in reverse order as a stack (i.e. the last function specified is the first to be executed at exit).
>>
>> ```
>>
>> I think it's been shown that glibc can violate this C standard, so for me the argument would be over here already. That one should unwind in the reverse order is, I assume, not an interesting discussion topic. Currently atexit as a reliable mechanism is broken.
>>
>>
>> But I also don't think the way this is currently handled in glibc can be of much use to anyone.
>>
>> Case 1: a regular exe linked with shared libraries, nothing gets unloaded at exit, so what's the point ?
>>
>> Case 2: someone manually unloads a shared library, that contains atexit code. The bug is either using `atexit` for a shared library that gets unloaded, or unloading a shared library that contains atexit code. But it's not really glibc's business IMO.
>>
>> Case 3: some automatism unloads shared libraries. Then the automatism should check if atexit code is affected and not unload, because the shared library is still clearly needed. It's a bug in the automatism.
>>
>> If one was hellbent on trying to support atexit for unloading shared libraries, an atexit contained in a shared library should up the reference count of the shared library during the atexit call and decrement after the callback has executed.
>>
>> Ciao
>>
>>   Nat!
>>
>>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Problem with atexit and _dl_fini
2019-06-10 11:48 ` Adhemerval Zanella
@ 2019-06-10 13:08 ` Nat!
2019-06-10 20:27 ` Adhemerval Zanella
0 siblings, 1 reply; 21+ messages in thread
From: Nat! @ 2019-06-10 13:08 UTC (permalink / raw)
To: libc-help; +Cc: Adhemerval Zanella
On 10.06.19 13:48, Adhemerval Zanella wrote:
>
> On 09/06/2019 17:59, Nat! wrote:
>> Another datapoint to support my claim that _dl_fini breaks atexit. This time it's very easy to reproduce ;)
>>
>> Here's the README.md from the Github Repo https://github.com/mulle-nat/atexit-breakage-linux
>>
>>
>> ```
>>
>> # Shows another breakage involving `atexit` on linux
>>
>> Here the `atexit` callback is invoked mistakenly multiple times.
> This 'example' does not really show the issue because ldd script issues
> the loader multiple times, see below. You can check exactly what ldd is
> doing by calling with sh -x.
I agree it doesn't show the same issue, but it shows that something else
is going very wrong. :) Or are you happy that atexit is called multiple
times? Who's calling exit here anyway? Check out the debugger output
too (see the updated README.md).
>
> I will try to use your instruction to run on docker to see what exactly
> is happening in your environment.
That's not necessary anymore. I managed to make it reproducible in a
much simpler form just now.
The ld-so-breakage project is basically a from-scratch recreation of the
original "docker" scenario. I try to explain in the README what is going
on, but if there are questions, hit me up (maybe as an issue?):
   https://github.com/mulle-nat/ld-so-breakage
The "another datapoint" project shows how constructors and destructors
don't pair up:
   https://github.com/mulle-nat/atexit-breakage-linux
And as a random bonus, this project indicates to me that LD_PRELOAD
doesn't do what it's supposed to either:
   https://github.com/mulle-nat/LD_PRELOAD-breakage-linux
All in all, I think the state of affairs is pretty dismal. I didn't expect
so much basic stuff not to work on Linux. In hindsight, I have probably
wasted _weeks_ on these problems.
I still maintain that the concept of having `atexit` callbacks run by
something other than `exit` is broken. An `atexit` callback is not the
same as an `__attribute__((destructor))`.
Ciao
  Nat!
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Problem with atexit and _dl_fini
2019-06-10 13:08 ` Nat!
@ 2019-06-10 20:27 ` Adhemerval Zanella
2019-06-11 18:39 ` Adhemerval Zanella
2019-06-11 18:53 ` Nat!
0 siblings, 2 replies; 21+ messages in thread
From: Adhemerval Zanella @ 2019-06-10 20:27 UTC (permalink / raw)
To: Nat!, libc-help
On 10/06/2019 10:07, Nat! wrote:
>
> On 10.06.19 13:48, Adhemerval Zanella wrote:
>>
>> On 09/06/2019 17:59, Nat! wrote:
>>> Another datapoint to support my claim that _dl_fini breaks atexit. This time it's very easy to reproduce ;)
>>>
>>> Here's the README.md from the Github Repo https://github.com/mulle-nat/atexit-breakage-linux
>>>
>>>
>>> ```
>>>
>>> # Shows another breakage involving `atexit` on linux
>>>
>>> Here the `atexit` callback is invoked mistakenly multiple times.
>> This 'example' does not really show the issue because ldd script issues
>> the loader multiple times, see below. You can check exactly what ldd is
>> doing by calling with sh -x.
>
> I agree it doesn't show the same issue, but it shows that something else is going very wrong. :) Or are you happy, that atexit is called multiple times ? Who's calling exit here anyway ? Check out the debugger output too (see updated README.md)
ldd is not a program but a shell script that invokes the target binary
through the system loader multiple times. What you are seeing is not
atexit being called multiple times, but a consequence of how the script runs.
When you set LD_PRELOAD *before* invoking ldd, the shell binary itself also
preloads the library. I instrumented the preload library to also print the
command line of the process it runs in (obtained either from
program_invocation_name or /proc/self/cmdline):
$ LD_PRELOAD=./libld-preload.so ./ldd ./main
/bin/bash: load
/bin/bash: unload
/bin/bash: unload
/bin/bash: unload
linux-vdso.so.1 (0x00007ffd445ef000)
./libld-preload.so (0x00007fa866ac5000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa8664b5000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa8668a6000)
/bin/bash: unload
/bin/bash: unload
The program itself is never run: although ldd does call the loader, it
calls it in a trace mode (LD_TRACE_LOADED_OBJECTS=1) that only reports the
dependencies and does not run the program. The first 'load' is printed by
the library when bash starts, and the multiple 'unload' lines come from
bash forking and the children exiting.
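For reference, a preload library instrumented this way might look like the sketch below. It is a hypothetical reconstruction, not the code from the repository: the file name, message format and build line are assumptions. It relies only on the GNU `program_invocation_name` variable (declared in <errno.h> under _GNU_SOURCE) and on GCC's constructor/destructor attributes. A forked child inherits the already-initialised library, so the constructor does not run again, but the destructor runs in every process that calls exit(); that is how one 'load' can be followed by several 'unload' lines.

```c
/* Hypothetical ld-preload.c; build (assumed):
     gcc -shared -fPIC -o libld-preload.so ld-preload.c */
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>   /* program_invocation_name (GNU extension) */
#include <stdio.h>

/* Writes "<argv[0] of the current process>: <event>" into buf and
   returns its length. */
static int format_event(char *buf, size_t len, const char *event)
{
  int n = snprintf(buf, len, "%s: %s", program_invocation_name, event);
  assert(n >= 0 && (size_t) n < len);   /* no truncation */
  return n;
}

/* Runs once when the library is mapped into a process, e.g. the bash
   that interprets ldd, since it inherits LD_PRELOAD. */
__attribute__((constructor)) static void on_load(void)
{
  char buf[4096];
  format_event(buf, sizeof buf, "load");
  fprintf(stderr, "%s\n", buf);
}

/* Runs when a process holding the library exits normally; a forked
   child (e.g. a bash subshell) that calls exit() runs it again even
   though the constructor ran only once, in the parent. */
__attribute__((destructor)) static void on_unload(void)
{
  char buf[4096];
  format_event(buf, sizeof buf, "unload");
  fprintf(stderr, "%s\n", buf);
}
```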
>
>
>>
>> I will try to use your instruction to run on docker to see what exactly
>> is happening in your environment.
>
> That's not necessary anymore. I managed to make it reproducible in a much simpler form just now.
>
> The ld-so-breakage project is basically a recreation of the original "docker" scenario written from scratch. I try to explain in the README , what is going on. But if there are questions hit me up (maybe as an issue ?) :
>
>    https://github.com/mulle-nat/ld-so-breakage
Thanks, this is much more useful. I now understand what is happening, and
IMHO this behaviour is required because in glibc we define that atexit/on_exit
handlers are run when a library is deregistered (as with dlclose).
Using the example in your testcase:
---
USE_A=YES ./build/main_adbc
-- install atexit_b
-- install atexit_a
-- run atexit_a
-- run atexit_b
---
The atexit handlers are called in the wrong order because they are
registered with '__cxa_atexit', which sets their internal type to
'ef_cxa'. Since _dl_fini is registered last (after all shared library
loading and constructor calls), it runs first, and it calls
'__cxa_finalize' (through the compiler-generated __do_global_dtors_aux).
'__cxa_finalize' then calls all 'ef_cxa' functions for the module passed
by __do_global_dtors_aux and marks each of them as 'ef_free', which
prevents '__run_exit_handlers' from running the handlers a second time.
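The mechanism can be sketched as a toy model. This is not glibc's implementation: only the flavor names `ef_free`/`ef_cxa` are borrowed from glibc's internal exit.h, and the module stand-ins, helper names and `_dl_fini`'s destruction order (libB before libA) are assumptions. In the scenario, libB's constructor registers `b` and then calls into libA, which registers `a`, so plain LIFO order would be `a` then `b`. Because the `_dl_fini` entry is registered last, it runs first and finalizes the modules in its own order.

```c
/* Toy model of glibc's exit-function list, NOT glibc source. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum flavor { ef_free, ef_cxa };     /* names borrowed from glibc */

struct exit_entry {
  enum flavor flavor;
  void (*fn) (void);
  const void *dso;                   /* module whose handle was recorded */
};

static struct exit_entry list[8];
static int count;
static char trace[16];

/* Stand-ins for the modules' __dso_handle values. */
static const char lib_a = 'A', lib_b = 'B', ld_so = 'L';
/* Assumed destruction order chosen by _dl_fini: libB before libA. */
static const void *const fini_order[] = { &lib_b, &lib_a };

/* atexit() modelled as __cxa_atexit (fn, NULL, dso): flavor ef_cxa. */
static void model_atexit(void (*fn) (void), const void *dso)
{
  assert(count < 8);
  list[count++] = (struct exit_entry){ ef_cxa, fn, dso };
}

/* Like __cxa_finalize (d): run d's entries newest first and mark them
   ef_free so the final exit pass skips them. */
static void model_cxa_finalize(const void *d)
{
  for (int i = count - 1; i >= 0; i--)
    if (list[i].flavor == ef_cxa && list[i].dso == d) {
      list[i].flavor = ef_free;
      list[i].fn();
    }
}

/* _dl_fini: finalize modules in dependency order, not registration order. */
static void model_dl_fini(void)
{
  for (size_t i = 0; i < sizeof fini_order / sizeof fini_order[0]; i++)
    model_cxa_finalize(fini_order[i]);
}

/* __run_exit_handlers: walk the whole list newest first. */
static void model_exit(void)
{
  for (int i = count - 1; i >= 0; i--)
    if (list[i].flavor != ef_free) {
      list[i].flavor = ef_free;
      list[i].fn();
    }
}

static void handler_a(void) { strcat(trace, "a"); }
static void handler_b(void) { strcat(trace, "b"); }

static const char *run_scenario(void)
{
  trace[0] = '\0';
  count = 0;
  model_atexit(handler_b, &lib_b);       /* libB constructor */
  model_atexit(handler_a, &lib_a);       /* libA, called from libB's ctor */
  model_atexit(model_dl_fini, &ld_so);   /* registered after constructors */
  model_exit();
  return trace;
}
```

`run_scenario()` returns "ba" rather than the LIFO order "ab": the per-module finalize pass has already consumed both handlers, so the final walk only finds `ef_free` entries.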
So you might ask: why not just use 'ef_at' for atexit handlers, skip
them in __cxa_finalize, and thus make your example run as you expect? The
issue is that glibc does not know whether your library will be dlopened
or not.
If a constructor registers an atexit handler that refers to a function
inside the shared library, and that handler is *not* run at dlclose, then
the sequence dlopen -> constructor -> dlclose -> exit will try to execute
code in an invalid mapping. This is exactly what dlfcn/bug-atexit{1,2}.c test.
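The dlopen -> constructor -> dlclose -> exit hazard can be shown with a second toy model (again not glibc code; the module ids, the `from_atexit` flag and all function names are hypothetical). `unmapped_calls(true)` mimics today's behaviour, where the dlclose-time finalize pass also runs atexit() handlers; `unmapped_calls(false)` mimics a policy that skips them without removing them, so exit() later calls into the unmapped plugin.

```c
/* Toy model, NOT glibc code: does exit() call into an unmapped plugin? */
#include <assert.h>
#include <stdbool.h>

struct exit_entry {
  bool live;
  bool from_atexit;      /* atexit() rather than compiler-generated */
  int dso;               /* hypothetical module id */
  void (*fn) (void);
};

static struct exit_entry list[8];
static int count;
static bool plugin_mapped;
static int calls_into_unmapped;

static void model_register(bool from_atexit, int dso, void (*fn) (void))
{
  assert(count < 8);
  list[count++] = (struct exit_entry){ true, from_atexit, dso, fn };
}

/* In a real process this call would fault once the text is unmapped. */
static void plugin_handler(void)
{
  if (!plugin_mapped)
    calls_into_unmapped++;
}

/* __cxa_finalize (dso)-like pass; include_atexit selects the policy. */
static void model_finalize(int dso, bool include_atexit)
{
  for (int i = count - 1; i >= 0; i--)
    if (list[i].live && list[i].dso == dso
        && (include_atexit || !list[i].from_atexit)) {
      list[i].live = false;
      list[i].fn();
    }
}

/* exit(): run whatever is still registered, newest first. */
static void model_exit(void)
{
  for (int i = count - 1; i >= 0; i--)
    if (list[i].live) {
      list[i].live = false;
      list[i].fn();
    }
}

static int unmapped_calls(bool dlclose_runs_atexit)
{
  enum { PLUGIN = 1 };
  count = 0;
  calls_into_unmapped = 0;
  plugin_mapped = true;                          /* dlopen */
  model_register(true, PLUGIN, plugin_handler);  /* constructor: atexit */
  model_finalize(PLUGIN, dlclose_runs_atexit);   /* dlclose: finalize ... */
  plugin_mapped = false;                         /* ... then unmap */
  model_exit();                                  /* exit */
  return calls_into_unmapped;
}
```

With `true` the count is 0; with `false` it is 1, which in a real process would be a jump into unmapped text.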
So the question is why exactly glibc decided that atexit handlers should
be called by dlclose. I understand that __cxa_finalize / destructors make
sense, so that the shared library can free allocated resources, but I
can't really see why this needs to be extended to 'atexit' as well.
>
>
> The "another datapoint" project shows how constructor/destructor don't pair up:
>
>    https://github.com/mulle-nat/atexit-breakage-linux
>
>
> And as a random bonus this project indicates to me that LD_PRELOAD doesn't do what its supposed to either:
>
>    https://github.com/mulle-nat/LD_PRELOAD-breakage-linux
>
>
> In total I think the state of affairs is pretty dismal. I didn't expect that much basic stuff not working on linux. With hindsight, I probably have wasted _weeks_ on these problems.
>
> I still maintain that the concept to let `atexit` callbacks not run by `exit` is broken. An `atexit` callback is not the same as an `__attribute__((destructor))__`.
>
>
> Ciao
>
>   Nat!
>
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Problem with atexit and _dl_fini
2019-06-10 20:27 ` Adhemerval Zanella
@ 2019-06-11 18:39 ` Adhemerval Zanella
2019-06-11 20:20 ` Nat!
2019-06-11 18:53 ` Nat!
1 sibling, 1 reply; 21+ messages in thread
From: Adhemerval Zanella @ 2019-06-11 18:39 UTC (permalink / raw)
To: Nat!, libc-help
On 10/06/2019 17:27, Adhemerval Zanella wrote:
>
>
> On 10/06/2019 10:07, Nat! wrote:
>>
>> On 10.06.19 13:48, Adhemerval Zanella wrote:
>>>
>>> On 09/06/2019 17:59, Nat! wrote:
>>>> Another datapoint to support my claim that _dl_fini breaks atexit. This time it's very easy to reproduce ;)
>>>>
>>>> Here's the README.md from the Github Repo https://github.com/mulle-nat/atexit-breakage-linux
>>>>
>>>>
>>>> ```
>>>>
>>>> # Shows another breakage involving `atexit` on linux
>>>>
>>>> Here the `atexit` callback is invoked mistakenly multiple times.
>>> This 'example' does not really show the issue because ldd script issues
>>> the loader multiple times, see below. You can check exactly what ldd is
>>> doing by calling with sh -x.
>>
>> I agree it doesn't show the same issue, but it shows that something else is going very wrong. :) Or are you happy, that atexit is called multiple times ? Who's calling exit here anyway ? Check out the debugger output too (see updated README.md)
>
> ldd is not a program but a shell script that invokes the target binary
> through the system loader multiple times. What you are seeing is not
> atexit being called multiple times, but a consequence of how the script runs.
>
> When you set LD_PRELOAD *before* invoking ldd, the shell binary itself also
> preloads the library. I instrumented the preload library to also print the
> command line of the process it runs in (obtained either from
> program_invocation_name or /proc/self/cmdline):
>
> $ LD_PRELOAD=./libld-preload.so ./ldd ./main
> /bin/bash: load
> /bin/bash: unload
> /bin/bash: unload
> /bin/bash: unload
> linux-vdso.so.1 (0x00007ffd445ef000)
> ./libld-preload.so (0x00007fa866ac5000)
> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa8664b5000)
> /lib64/ld-linux-x86-64.so.2 (0x00007fa8668a6000)
> /bin/bash: unload
> /bin/bash: unload
>
> The program itself is never run: although ldd does call the loader, it
> calls it in a trace mode (LD_TRACE_LOADED_OBJECTS=1) that only reports the
> dependencies and does not run the program. The first 'load' is printed by
> the library when bash starts, and the multiple 'unload' lines come from
> bash forking and the children exiting.
>
>>
>>
>>>
>>> I will try to use your instruction to run on docker to see what exactly
>>> is happening in your environment.
>>
>> That's not necessary anymore. I managed to make it reproducible in a much simpler form just now.
>>
>> The ld-so-breakage project is basically a recreation of the original "docker" scenario written from scratch. I try to explain in the README , what is going on. But if there are questions hit me up (maybe as an issue ?) :
>>
>>    https://github.com/mulle-nat/ld-so-breakage
>
> Thanks, this is much more useful. I now understand what is happening, and
> IMHO this behaviour is required because in glibc we define that atexit/on_exit
> handlers are run when a library is deregistered (as with dlclose).
>
> Using the example in your testcase:
>
> ---
> USE_A=YES ./build/main_adbc
> -- install atexit_b
> -- install atexit_a
> -- run atexit_a
> -- run atexit_b
> ---
>
> The atexit handlers are called in the wrong order because they are
> registered with '__cxa_atexit', which sets their internal type to
> 'ef_cxa'. Since _dl_fini is registered last (after all shared library
> loading and constructor calls), it runs first, and it calls
> '__cxa_finalize' (through the compiler-generated __do_global_dtors_aux).
>
> '__cxa_finalize' then calls all 'ef_cxa' functions for the module passed
> by __do_global_dtors_aux and marks each of them as 'ef_free', which
> prevents '__run_exit_handlers' from running the handlers a second time.
>
> So you might ask: why not just use 'ef_at' for atexit handlers, skip
> them in __cxa_finalize, and thus make your example run as you expect? The
> issue is that glibc does not know whether your library will be dlopened
> or not.
>
> If a constructor registers an atexit handler that refers to a function
> inside the shared library, and that handler is *not* run at dlclose, then
> the sequence dlopen -> constructor -> dlclose -> exit will try to execute
> code in an invalid mapping. This is exactly what dlfcn/bug-atexit{1,2}.c test.
>
> So the question is why exactly glibc decided that atexit handlers should
> be called by dlclose. I understand that __cxa_finalize / destructors make
> sense, so that the shared library can free allocated resources, but I
> can't really see why this needs to be extended to 'atexit' as well.
This requirement seems to come from the LSB [1], although I am not sure
which came first (the specification or the implementation). It states
that __cxa_atexit should register a function to be called by exit or when
a shared library is unloaded.
It also requires __cxa_finalize to call functions registered with atexit,
and states that __cxa_finalize should be called on dlclose.
I think this might be because old gcc versions used atexit to register C++
destructors for local static and global objects. However, gcc seems to
enable __cxa_atexit by default for glibc (which has supported __cxa_atexit
since its initial versions).
So I think nothing prevents us from no longer calling atexit handlers
from __cxa_finalize, although I am not sure how we would handle the
deviation from the LSB. I will write to libc-alpha to see what other
developers think.
[1] http://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.pdf
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Problem with atexit and _dl_fini
2019-06-10 20:27 ` Adhemerval Zanella
2019-06-11 18:39 ` Adhemerval Zanella
@ 2019-06-11 18:53 ` Nat!
1 sibling, 0 replies; 21+ messages in thread
From: Nat! @ 2019-06-11 18:53 UTC (permalink / raw)
To: libc-help; +Cc: Adhemerval Zanella
On 10.06.19 22:27, Adhemerval Zanella wrote:
>
> The program itself is never run: although ldd does call the loader, it
> calls it in a trace mode (LD_TRACE_LOADED_OBJECTS=1) that only reports the
> dependencies and does not run the program. The first 'load' is printed by
> the library when bash starts, and the multiple 'unload' lines come from
> bash forking and the children exiting.
I can understand this. Possibly the same thing happens when I run this in
a debugger.
>
>>
>>> I will try to use your instruction to run on docker to see what exactly
>>> is happening in your environment.
>> That's not necessary anymore. I managed to make it reproducible in a much simpler form just now.
>>
>> The ld-so-breakage project is basically a recreation of the original "docker" scenario written from scratch. I try to explain in the README , what is going on. But if there are questions hit me up (maybe as an issue ?) :
>>
>>    https://github.com/mulle-nat/ld-so-breakage
> Thanks, this is much more useful. I now understand what is happening, and
> IMHO this behaviour is required because in glibc we define that atexit/on_exit
> handlers are run when a library is deregistered (as with dlclose).
>
> Using the example in your testcase:
>
> ---
> USE_A=YES ./build/main_adbc
> -- install atexit_b
> -- install atexit_a
> -- run atexit_a
> -- run atexit_b
> ---
>
> The atexit handlers are called in the wrong order because they are
> registered with '__cxa_atexit', which sets their internal type to
> 'ef_cxa'. Since _dl_fini is registered last (after all shared library
> loading and constructor calls), it runs first, and it calls
> '__cxa_finalize' (through the compiler-generated __do_global_dtors_aux).
>
> '__cxa_finalize' then calls all 'ef_cxa' functions for the module passed
> by __do_global_dtors_aux and marks each of them as 'ef_free', which
> prevents '__run_exit_handlers' from running the handlers a second time.
>
> So you might ask: why not just use 'ef_at' for atexit handlers, skip
> them in __cxa_finalize, and thus make your example run as you expect? The
> issue is that glibc does not know whether your library will be dlopened
> or not.
>
> If a constructor registers an atexit handler that refers to a function
> inside the shared library, and that handler is *not* run at dlclose, then
> the sequence dlopen -> constructor -> dlclose -> exit will try to execute
> code in an invalid mapping. This is exactly what dlfcn/bug-atexit{1,2}.c test.
>
> So the question is why exactly glibc decided that atexit handlers should
> be called by dlclose. I understand that __cxa_finalize / destructors make
> sense, so that the shared library can free allocated resources, but I
> can't really see why this needs to be extended to 'atexit' as well.
>
My pet theory is this: after I posted my example, I looked at the ELF
spec (http://refspecs.linuxbase.org/elf/elf.pdf), which describes how
to implement ELF destructors. The ELF spec says to use `atexit` for
destructors. At the time of writing, the ELF spec does not seem to
consider unloading a shared object, and in that light everything written
there makes sense. Once you want to support unloading, though, atexit is
the wrong way to do it. But the code was already there and no one
wanted to change too much. Alas, that's just a theory :)
I still think this is a case of glibc trying to be too helpful, but
doing more damage (breaking correct `atexit` code) than good (supporting
programmers who unwittingly unload code that uses `atexit`).
Thanks for taking the time to look into this!
Ciao
 Nat!
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Problem with atexit and _dl_fini
2019-06-11 18:39 ` Adhemerval Zanella
@ 2019-06-11 20:20 ` Nat!
2019-06-11 22:40 ` Nat!
0 siblings, 1 reply; 21+ messages in thread
From: Nat! @ 2019-06-11 20:20 UTC (permalink / raw)
To: Adhemerval Zanella, libc-help
On 11.06.19 20:39, Adhemerval Zanella wrote:
>
> It seems that this requirement seems to come from LSB, although I am not
> sure which one came first (the specification or the implementation).
> It also states that __cxa_atexit should register a function to be called
> by exit or when a shared library is unloaded.
I don't really have much more to add to this topic, so this is just
some commentary and speculation... and I am probably repeating myself.
https://pubs.opengroup.org/onlinepubs/009695399/functions/atexit.html states:
  The atexit() function shall register the function pointed to by
  func, to be called without arguments at normal program termination.
That's "normal program termination", not any time before. dlclose happens
before that. What is happening is a violation of `atexit`.
When I read
http://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.html#BASELIB---CXA-FINALIZE
I see that it's *Intel's* version of C++ that originally dictated this
violation of the C standard. Possibly Intel was writing this with
Windows in mind?
> And __cxa_finalize requires to call atexit registers functions as well. It
> also states __cxa_finalize should be called on dlclose.
In my opinion the __cxa_finalize requirement is wrong. It is further my
opinion that a vendor's requirement for its C++ ABI does not "trump"
open standards. :)
>
> I think it might due the fact old gcc version uses atexit to register C++
> destructors for local static and global objects. However it seems to be
> enabled as default for GLIBC (since it support __cxa_atexit since initial
> versions).
>
> So I think there is no impeding reason to make atexit not be called from
> __cxa_finalize, although I am not sure how we would handle the LSB deviation.
> I will write down a libc-alpha to check what other developer think.
I think the proper solution is to rewrite __cxa_finalize and remove the
atexit functionality from it completely.
Alas, I am not hopeful that this will be resolved to my taste :)
>
> [1] http://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.pdf
>
Ciao
  Nat!
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Problem with atexit and _dl_fini
2019-06-11 20:20 ` Nat!
@ 2019-06-11 22:40 ` Nat!
2019-06-12 3:41 ` Carlos O'Donell
2019-06-13 22:53 ` Nat!
0 siblings, 2 replies; 21+ messages in thread
From: Nat! @ 2019-06-11 22:40 UTC (permalink / raw)
To: libc-help
Sorry for the spam, but I just thought of an easy fix for the situation,
with this rewording of
http://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.html#BASELIB---CXA-FINALIZE
```
The implementation shall arrange for __cxa_finalize() to be called during
early shared library unload (e.g. dlclose()) with a handle to the shared
library. The unload should fail, if the termination function list
contains any __cxa_atexit-registered functions.
When the main program calls exit, the implementation shall cause any
remaining __cxa_atexit-registered functions to be called, either by
calling __cxa_finalize(NULL), or by walking the registration list itself.
```
The effect is that atexit-"poisoned" shared objects stay until
termination, while all others get unloaded as they are now, which would
IMO be perfect and expected. As a positive side effect, it seems to need
only a minimal code change.
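One way to read the proposed wording, as a toy model (not glibc code; every name here is hypothetical). It reads "atexit-poisoned" narrowly, as entries that came from atexit() itself; a literal reading of "__cxa_atexit-registered functions" would also cover the registrations compilers emit for C++ static destructors, which would make almost every C++ DSO impossible to unload.

```c
/* Toy model, NOT glibc code: dlclose refuses to unload a module that
   still owns a live atexit() registration. */
#include <assert.h>
#include <stdbool.h>

struct exit_entry {
  bool live;
  bool from_atexit;   /* atexit() registration vs compiler-generated one */
  int dso;            /* hypothetical module id */
};

static struct exit_entry list[8];
static int count;

static void model_register(int dso, bool from_atexit)
{
  assert(count < 8);
  list[count++] = (struct exit_entry){ true, from_atexit, dso };
}

/* Returns 0 if the module was unloaded, -1 if it is "poisoned". */
static int model_dlclose(int dso)
{
  for (int i = 0; i < count; i++)
    if (list[i].live && list[i].from_atexit && list[i].dso == dso)
      return -1;                 /* keep it mapped until exit() */
  for (int i = count - 1; i >= 0; i--)
    if (list[i].live && list[i].dso == dso)
      list[i].live = false;      /* today's finalize pass (calls omitted) */
  return 0;
}
```

Registering one atexit() entry for module 1 and one compiler entry for module 2 makes `model_dlclose(1)` return -1 and `model_dlclose(2)` return 0. It also shows where the plugin objection raised in reply applies: a plugin that calls atexit() could never be reloaded.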
Ciao
  Nat!
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Problem with atexit and _dl_fini
2019-06-11 22:40 ` Nat!
@ 2019-06-12 3:41 ` Carlos O'Donell
2019-06-13 22:53 ` Nat!
1 sibling, 0 replies; 21+ messages in thread
From: Carlos O'Donell @ 2019-06-12 3:41 UTC (permalink / raw)
To: Nat!, libc-help
On 6/11/19 6:40 PM, Nat! wrote:
> Sorry for the spam, but I just thought of an easy fix for the situation, with this rewording of http://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.html#BASELIB---CXA-FINALIZE
>
> ```
> The implementation shall arrange for__cxa_finalize() to be called during early shared library unload (e.g. dlclose()) with a handle to the shared library. The unload should fail, if the termination function list contains any __cxa_atexit-registered functions.
> When the main program calls exit, the implementation shall cause any remaining __cxa_atexit-registered functions to be called, either by calling __cxa_finalize(NULL), or by walking the registration list itself.
> ```
>
> The effect is, that atexit "poisoned" shared objects stay until termination, all others get unloaded as they are now, which would be IMO perfect and expected. As a positive side effect it seems like minimal code change.
I disagree.
It would block existing plugin mechanisms from being able to reload their objects until they switched to some other mechanism like destructors.
It is a change which is not conservative and doesn't solve any real problems except an educational issue.
--
Cheers,
Carlos.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Problem with atexit and _dl_fini
2019-06-11 22:40 ` Nat!
2019-06-12 3:41 ` Carlos O'Donell
@ 2019-06-13 22:53 ` Nat!
2019-06-14 12:29 ` Manfred
2019-06-14 15:14 ` Adhemerval Zanella
1 sibling, 2 replies; 21+ messages in thread
From: Nat! @ 2019-06-13 22:53 UTC (permalink / raw)
To: libc-help
Funnily enough, if you read the Itanium C++ ABI, on which __cxa_finalize
is based, the algorithm described there does exactly the right thing.
Because the wording of __cxa_finalize is so condensed, it's hard to
pick out the original meaning. But the description is
actually fully compatible with how `atexit` is supposed to work.
The gist is this: atexit functions are stored in a distinctive way in
the termination function table (clarifications in []):
http://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.html#BASELIB---CXA-FINALIZE
```
In the latter case [atexit] the pointer to the function is the pointer
passed to atexit(), while the other pointers [operand, handle] are NULL.
```
When dlclose hits, the handle to be closed is `d` and not NULL:
```
The implementation shall arrange for __cxa_finalize() to be called during
early shared library unload (e.g. dlclose()) with a handle to the shared
library.
```
And then
```
When __cxa_finalize(d) is called, it shall walk the termination function
list, calling each in turn if d matches the handle of the termination
function entry.
```
So `atexit` entries don't match, since their stored handle is NULL. Only
when `d` is NULL (the base process terminates) will the atexit handlers be
called. Currently, though, all handlers are called at `dlclose` time, which
breaks the `atexit` specification as well as your own LSB.
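This reading of the LSB text can be modelled as follows (a toy, not glibc's `__cxa_finalize`; all names are hypothetical). atexit() entries store a NULL handle, so `__cxa_finalize(&lib)` at dlclose time only runs entries registered with that handle, and the atexit() entry waits for `__cxa_finalize(NULL)` at exit.

```c
/* Toy model, NOT glibc code: handle matching as described in the LSB. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct term_entry {
  int live;
  void (*fn) (void *);
  void *arg;
  void *handle;            /* NULL for atexit() under this reading */
};

static struct term_entry list[8];
static int count;
static char trace[16];

static void model_register(void (*fn) (void *), void *arg, void *handle)
{
  assert(count < 8);
  list[count++] = (struct term_entry){ 1, fn, arg, handle };
}

/* Walk newest first; call each entry whose handle matches d (all if NULL). */
static void model_cxa_finalize(void *d)
{
  for (int i = count - 1; i >= 0; i--)
    if (list[i].live && (d == NULL || list[i].handle == d)) {
      list[i].live = 0;
      list[i].fn(list[i].arg);
    }
}

static void append(void *s) { strcat(trace, s); }

static const char *scenario(void)
{
  static char lib;                       /* stands in for the DSO handle */
  trace[0] = '\0';
  count = 0;
  model_register(append, "a", NULL);     /* atexit() inside the library */
  model_register(append, "d", &lib);     /* __cxa_atexit() with the handle */
  model_cxa_finalize(&lib);              /* dlclose(lib) */
  strcat(trace, "|");
  model_cxa_finalize(NULL);              /* exit() */
  return trace;
}
```

`scenario()` yields "d|a". Note the catch that comes with it: if `a` lives in the library, calling it at exit after dlclose would jump into unmapped code, so under this reading it is up to user code not to dlclose such a library.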
Well, it's a goof-up, but FreeBSD and macOS aren't doing any better.
Ciao
 Nat!
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Problem with atexit and _dl_fini
2019-06-13 22:53 ` Nat!
@ 2019-06-14 12:29 ` Manfred
2019-06-14 15:14 ` Adhemerval Zanella
1 sibling, 0 replies; 21+ messages in thread
From: Manfred @ 2019-06-14 12:29 UTC (permalink / raw)
To: libc-help, libc-alpha
Interesting.
In fact, the link you posted is about the LSB, not specifically the
Itanium C++ ABI, and indeed it does the right thing.
As a side note, it leaves the following up to user code:
- do not register with atexit functions residing in DSOs that can be
  unloaded early (i.e. by dlclose() during execution, not at exit()).
- do not instantiate global/static C++ objects whose type is defined in
  the main program and derived from classes defined in DSOs that can be
  unloaded early.
Both requirements follow (implicitly?) from the C and C++ standards,
though.
I'm cross-posting to libc-alpha, in case anyone is interested.
On 6/14/2019 12:53 AM, Nat! wrote:
> Funnily enough, if you read the Itanium C++ ABI, on which __cxa_finalize
> is based, then the algorithm described
> there is doing exactly the right thing.
> Beause the wording of __cxa_finalize is so shortened, it its hard to
> pick out the original meaning. But the description is
> actually fully compatible with how `atexit` is supposed to function.
>
> The gist is this. For atexit, functions are stored in a unique way in
> the termination function table (clarifications in []):
>
> http://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.html#BASELIB---CXA-FINALIZE
>
>
> ```
> In the latter case [atexit] the pointer to the function is the pointer
> passed to atexit(), while the other pointers [operand, handle] are NULL.
> ```
>
> When dlclose hits, the handle to be closed is `d` and not NULL:
>
> ```
> The implementation shall arrange for__cxa_finalize() to be called during
> early shared library unload (e.g. dlclose()) with a handle to the shared
> library.
> ```
>
> And then
>
> ```
> When __cxa_finalize(d) is called, it shall walk the termination function
> list, calling each in turn if d matches the handle of the termination
> function entry.
> ```
>
> So `atexit`s don't match, since the handle stored is NULL. Only if `d`
> is NULL (the base process terminates), then will the atexits be called.
> Currently though at `dlclose` time all handlers are called, which breaks
> the `atexit` specification as well as your own LSB.
>
> Well it's a goof up, but FreeBSD and MacOS aren't doing any better.
>
> Ciao
>  Nat!
>
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Problem with atexit and _dl_fini
2019-06-13 22:53 ` Nat!
2019-06-14 12:29 ` Manfred
@ 2019-06-14 15:14 ` Adhemerval Zanella
1 sibling, 0 replies; 21+ messages in thread
From: Adhemerval Zanella @ 2019-06-14 15:14 UTC (permalink / raw)
To: libc-help
On 13/06/2019 19:53, Nat! wrote:
> Funnily enough, if you read the Itanium C++ ABI, on which __cxa_finalize is based, then the algorithm described
> there is doing exactly the right thing.
> Beause the wording of __cxa_finalize is so shortened, it its hard to pick out the original meaning. But the description is
> actually fully compatible with how `atexit` is supposed to function.
>
> The gist is this. For atexit, functions are stored in a unique way in the termination function table (clarifications in []):
>
> http://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.html#BASELIB---CXA-FINALIZE
>
> ```
> In the latter case [atexit] the pointer to the function is the pointer passed to atexit(), while the other pointers [operand, handle] are NULL.
> ```
>
> When dlclose hits, the handle to be closed is `d` and not NULL:
>
> ```
> The implementation shall arrange for__cxa_finalize() to be called during early shared library unload (e.g. dlclose()) with a handle to the shared library.
> ```
>
> And then
>
> ```
> When __cxa_finalize(d) is called, it shall walk the termination function list, calling each in turn if d matches the handle of the termination function entry.
> ```
>
> So `atexit`s don't match, since the handle stored is NULL. Only if `d` is NULL (the base process terminates) will the atexits be called. Currently, though, all handlers are called at `dlclose` time, which breaks the `atexit` specification as well as your own LSB.
>
> Well, it's a goof-up, but FreeBSD and macOS aren't doing any better.
>
The problem is that glibc currently implements atexit on top of __cxa_atexit:
---
/* Register FUNC to be executed by `exit'. */
int
#ifndef atexit
attribute_hidden
#endif
atexit (void (*func) (void))
{
  return __cxa_atexit ((void (*) (void *)) func, NULL, __dso_handle);
}
---
It is linked in from a static library provided by glibc (libc_nonshared.a).
The compiler then defines the __dso_handle variable with a unique
value for each shared object (in libgcc, for the GCC case), and static
linking allows the atexit registration to pick up that value.
This is by design, so that atexit works like the __cxa_atexit calls
created by the compiler itself.
What I advocated in a recent discussion on libc-alpha [1] is indeed to
follow what you described. My initial suggestion was to register atexit
handlers through a different mechanism, so that they would be distinct
from __cxa_atexit handlers. They would then not be called by
__cxa_finalize (NULL); instead, exit() would be responsible for calling
them.
It does cause a semantic change, though: dlclose will need to actually
remove the atexit handlers the shared library registered (because we
cannot issue a callback into a function whose text has been unmapped).
That's why I think we will need a different symbol to register atexit
handlers, since we will need to pass the __dso_handle value to libc to
allow __cxa_finalize to remove the handler on dlclose.
I have a WIP patch to fix this; I will push it to a user branch if you
want to check it out.
[1] https://sourceware.org/ml/libc-alpha/2019-06/msg00229.html
Thread overview: 21+ messages
2019-05-18 21:23 Problem with atexit and _dl_fini Nat!
2019-05-19 16:23 ` Florian Weimer
2019-05-19 19:37 ` Nat!
2019-05-21 20:43 ` Adhemerval Zanella
2019-05-22 10:22 ` Nat!
2019-05-22 15:01 ` Adhemerval Zanella
2019-05-22 15:29 ` Nat!
2019-05-22 19:35 ` Adhemerval Zanella
2019-05-29 21:16 ` Nat!
2019-06-09 20:59 ` Nat!
2019-06-10 11:48 ` Adhemerval Zanella
2019-06-10 13:08 ` Nat!
2019-06-10 20:27 ` Adhemerval Zanella
2019-06-11 18:39 ` Adhemerval Zanella
2019-06-11 20:20 ` Nat!
2019-06-11 22:40 ` Nat!
2019-06-12 3:41 ` Carlos O'Donell
2019-06-13 22:53 ` Nat!
2019-06-14 12:29 ` Manfred
2019-06-14 15:14 ` Adhemerval Zanella
2019-06-11 18:53 ` Nat!