public inbox for libc-help@sourceware.org
* Problem with atexit and _dl_fini
@ 2019-05-18 21:23 Nat!
  2019-05-19 16:23 ` Florian Weimer
  0 siblings, 1 reply; 21+ messages in thread
From: Nat! @ 2019-05-18 21:23 UTC (permalink / raw)
  To: libc-help

My problem is that my atexit handlers are not executed
in the correct order.

i.e. atexit( a); atexit( b);  should result in b(), a() being called in 
that order. To quote the man page,

"the registered functions are invoked in reverse order".
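
For reference, the ordering the man page promises can be checked with a small sketch. The helper name `atexit_order`, the handlers and the pipe-based logging are assumptions made for illustration. The handlers are registered in a forked child of a plain executable, so no shared library is involved here.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static int log_fd = -1;   /* write end of the pipe, used by the handlers */

/* Each handler reports its name through the pipe instead of stdout. */
static void emit(char c) { if (write(log_fd, &c, 1) != 1) _exit(1); }
static void a(void) { emit('a'); }
static void b(void) { emit('b'); }

/* Forks a child that calls atexit(a), then atexit(b), then exit(0).
   Stores what the handlers wrote in out (NUL-terminated) and returns
   the number of bytes, or -1 on error. */
static int atexit_order(char *out, size_t size)
{
    int fds[2];
    assert(out != NULL && size >= 3);
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {               /* child: register and exit normally */
        close(fds[0]);
        log_fd = fds[1];
        atexit(a);
        atexit(b);
        exit(0);
    }

    close(fds[1]);                /* parent: collect the handler output */
    size_t n = 0;
    ssize_t r;
    while (n < size - 1 && (r = read(fds[0], out + n, size - 1 - n)) > 0)
        n += (size_t) r;
    out[n] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (int) n;
}
```

Called from `main` with a small buffer, this should leave "ba" in the buffer: the handler registered last runs first.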


When I register with `atexit`, I can see my functions being added
properly, in the correct order, within `__internal_atexit`. After my
functions, the ELF loader (?) also adds itself there, so it is the first
one called by `__run_exit_handlers`.


Then comes the part where it goes wrong. I registered my two functions
with `__internal_atexit`, but for some reason `_dl_fini` is calling
`__cxa_finalize`, and that calls the wrong function first.


I would like to question the wisdom of `_dl_fini` calling my handler,
which I never registered via __attribute__((destructor)) or the like. To
me that is a bug. If `_dl_fini` left that to `__internal_atexit`, there
would be no problem. The comments in `_dl_fini` say that it does some
complicated dependency checks to call destructors in the correct order.
This seemingly interferes with the atexit order.



Here is a stack trace of the wrong callback:

```

* thread #1, name = 'noleak.debug.ex', stop reason = breakpoint 2.1
   * frame #0: 0x00007ffff7f86740 libmulle-testallocator.so`mulle_testallocator_exit at mulle-testallocator.c:451:14
     frame #1: 0x00007ffff7e131af libc.so.6`__cxa_finalize(d=0x00007ffff7f890e0) at cxa_finalize.c:83:6
     frame #2: 0x00007ffff7f85243 libmulle-testallocator.so`__do_global_dtors_aux + 35
     frame #3: 0x00007ffff7fe2ce6 ld-2.29.so`_dl_fini at dl-fini.c:138:9
     frame #4: 0x00007ffff7e12c65 libc.so.6`__run_exit_handlers(status=0, listp=0x00007ffff7f7d718, run_list_atexit=<unavailable>, run_dtors=<unavailable>) at exit.c:108:8
     frame #5: 0x00007ffff7e12d8c libc.so.6`__GI_exit(status=<unavailable>) at exit.c:139:3
     frame #6: 0x00007ffff7dfe587 libc.so.6`__libc_start_main(main=(noleak.debug.exe`main at noleak.m:18), argc=1, argv=0x00007fffffffd728, init=(noleak.debug.exe`__libc_csu_init at elf-init.c:68:1), fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffd718) at libc-start.c:342:3
     frame #7: 0x000000000040110a noleak.debug.exe`_start + 42
```


Stack trace of the corresponding atexit call:

```

   * frame #0: 0x00007ffff7f869a0 libmulle-testallocator.so`atexit
     frame #1: 0x00007ffff7f868c7 libmulle-testallocator.so`mulle_testallocator_initialize at mulle-testallocator.c:492:7
     frame #2: 0x00007ffff7fe295a ld-2.29.so`call_init(l=<unavailable>, argc=1, argv=0x00007fffffffd728, env=0x00007fffffffd738) at dl-init.c:72:3
     frame #3: 0x00007ffff7fe2a59 ld-2.29.so`_dl_init at dl-init.c:30:6
     frame #4: 0x00007ffff7fe2a43 ld-2.29.so`_dl_init(main_map=0x00007ffff7ffe190, argc=1, argv=0x00007fffffffd728, env=0x00007fffffffd738) at dl-init.c:119
     frame #5: 0x00007ffff7fd30ca ld-2.29.so`_dl_start_user + 50
```


A tweet/screenshot of the problem in action, where you can see that 
libmulle-testallocator.so is calling atexit first, but also gets called 
back first:


https://twitter.com/mulle_nat/status/1129131042001043456


I tried to reproduce this with a smaller test case, but wasn't 
successful. I have the problem with system glibc as well as with my 
self-built debugging glibc-2.29.


```

(lldb) image list
[  0] 047FB15C 0x0000000000400000 /home/src/srcO/mulle-objc/MulleObjC/test/0_noleak/noleak.debug.exe
[  1] 7A9876B1-3173-65A8-3AC8-D6F6E88FD53F-25BEA927 0x00007ffff7fd2000 /lib/x86_64-linux-gnu/ld-2.29.so
       /usr/lib/debug/lib/x86_64-linux-gnu/ld-2.29.so
[  2] 76B794BB-7305-068D-C4A6-B2C6330A23DE-0BC95526 0x00007ffff7fd1000 [vdso] (0x00007ffff7fd1000)
[  3] 05639C58 0x00007ffff7f8a000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libMulleObjC.so
[  4] D1F022AE-06A7-DD85-43B7-D26B34D45415-59240C1E 0x00007ffff7f84000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-testallocator.so
[  5] 0F5AACB3-8162-0D16-F936-1C8C44F8B6D6-82423DF9 0x00007ffff7dd9000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libc.so.6
[  6] 13E7B8ED-A5D0-4B2C-05BE-0926D37365CE-A861C019 0x00007ffff7d89000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-objc-runtime.so
[  7] 96F58694-13A4-8019-C8C7-47993BC8B10E-C11F8498 0x00007ffff7d84000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libdl.so.2
[  8] 8BBE3351-7A55-AB93-23D3-E05E6DBF19C4-842503AB 0x00007ffff7d7a000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-concurrent.so
[  9] 4D189D24-CA86-BCAD-32DC-2C4E15D6DB70-C51C94FB 0x00007ffff7d6c000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-aba.so
[ 10] DEB7E042-331F-B188-43B1-91C1225C51EB-DA4DAC39 0x00007ffff7d67000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-allocator.so
[ 11] 30F22D5C-1699-A1E2-C135-14F5A2199065-7DCC8A02 0x00007ffff7d62000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-thread.so
[ 12] 6AC089FD-3867-9707-F7F1-4E2B6347F639-60D504FF 0x00007ffff7d5d000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-vararg.so
[ 13] C18E6B6F-2BD8-DC81-92BE-E3CA239F4B92-D1FBB101 0x00007ffff7d56000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-stacktrace.so
[ 14] 272D4479-EC66-502F-C535-BBC9286963F6-09CA9E9D 0x00007ffff7d3d000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-container.so
[ 15] A6F82062-5433-D33D-4BC8-F81DA55BC85A-76B9D8AF 0x00007ffff7d1d000 /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libpthread.so.0
[ 16] FB3F5F86-E867-5910-864C-3E064CEAC7D3-4FFFD5BC 0x00007ffff7cb2000 /lib/x86_64-linux-gnu/libgcc_s.so.1
```

LD_DEBUG doesn't say much:


```

      13670:
      13670:    calling fini: /home/src/srcO/mulle-objc/MulleObjC/test/0_noleak/noleak.debug.exe [0]
      13670:
      13670:
      13670:    calling fini: /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libMulleObjC.so [0]
      13670:
      13670:
      13670:    calling fini: /home/src/srcO/mulle-objc/MulleObjC/test/dependency/lib/libmulle-testallocator.so [0]
      13670:
mulle_testallocator: exit (0x7f78f9ee873c)
```

Ciao

    Nat!



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Problem with atexit and _dl_fini
  2019-05-18 21:23 Problem with atexit and _dl_fini Nat!
@ 2019-05-19 16:23 ` Florian Weimer
  2019-05-19 19:37   ` Nat!
  0 siblings, 1 reply; 21+ messages in thread
From: Florian Weimer @ 2019-05-19 16:23 UTC (permalink / raw)
  To: Nat!; +Cc: libc-help

* Nat!:

> So my problem is, that I observe that my atexit calls are not executed 
> in the correct order.
>
> i.e. atexit( a); atexit( b);  should result in b(), a() being called in 
> that order. To quote the man page,
>
> "the registered functions are invoked in reverse order".
>
>
> When I register with `atexit` I can see my functions being added 
> properly within `__internal_atexit` in the
>
> correct order. Finally after my functions, the elf-loader ? also adds 
> itself there. So it is being called first by
>
> `__run_exit_handlers`.
>
>
> Then comes the part where it goes wrong. I registered my two function 
> with `__internal_atexit`, but for some reason
>
> `_dl_fini` is calling `__cxa_finalize` and that is calling the wrong 
> function first.

When atexit is called from a DSO, glibc calls the registered function
before the DSO is unloaded.  This choice was made because after
unloading, the function pointer becomes invalid.

I haven't checked, but I suspect atexit still works this way even if
it doesn't have to (because the DSO is never unloaded).
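
To make the consequence concrete, here is a toy model of that behavior. It is only a sketch and not glibc's code: the names `model_atexit`, `model_finalize` and `model_run_all` and the integer object ids are made up for illustration. Each handler remembers which object registered it; `model_finalize` stands in for `__cxa_finalize (d)` as seen in the stack trace, and `model_run_all` stands in for the plain newest-first walk over the exit list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HANDLERS 16

/* One registered exit handler in the toy model. */
struct handler {
    char name;   /* letter appended to the trace when the handler "runs" */
    int  dso;    /* hypothetical id of the object that called atexit */
    int  done;   /* nonzero once the handler has run */
};

static struct handler handlers[MAX_HANDLERS];
static size_t n_handlers;
static char   trace[MAX_HANDLERS + 1];   /* stays NUL-terminated */
static size_t n_trace;

static void model_reset(void)
{
    memset(handlers, 0, sizeof handlers);
    memset(trace, 0, sizeof trace);
    n_handlers = n_trace = 0;
}

/* Appends to the list, so registration order is preserved. */
static void model_atexit(char name, int dso)
{
    assert(n_handlers < MAX_HANDLERS);
    handlers[n_handlers++] = (struct handler){ name, dso, 0 };
}

/* A handler runs at most once, whoever gets to it first. */
static void run_once(struct handler *h)
{
    if (!h->done) {
        h->done = 1;
        trace[n_trace++] = h->name;
    }
}

/* Stand-in for __cxa_finalize (d): newest first, one object only. */
static void model_finalize(int dso)
{
    for (size_t i = n_handlers; i-- > 0; )
        if (handlers[i].dso == dso)
            run_once(&handlers[i]);
}

/* Stand-in for the plain exit walk: newest first, whatever is left. */
static void model_run_all(void)
{
    for (size_t i = n_handlers; i-- > 0; )
        run_once(&handlers[i]);
}
```

With `model_atexit ('a', 1); model_atexit ('b', 2);`, calling only `model_run_all ()` gives the trace "ba". If object 1 is finalized first (`model_finalize (1); model_finalize (2);`), the trace becomes "ab": the order then follows the order in which objects are finalized, not the order of the atexit calls.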


* Re: Problem with atexit and _dl_fini
  2019-05-19 16:23 ` Florian Weimer
@ 2019-05-19 19:37   ` Nat!
  2019-05-21 20:43     ` Adhemerval Zanella
  2019-06-09 20:59     ` Nat!
  0 siblings, 2 replies; 21+ messages in thread
From: Nat! @ 2019-05-19 19:37 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-help


On 19.05.19 18:23, Florian Weimer wrote:
> * Nat!:
>
>> So my problem is, that I observe that my atexit calls are not executed
>> in the correct order.
>>
>> i.e. atexit( a); atexit( b);  should result in b(), a() being called in
>> that order. To quote the man page,
>>
>> "the registered functions are invoked in reverse order".
>>
>>
>> When I register with `atexit` I can see my functions being added
>> properly within `__internal_atexit` in the
>>
>> correct order. Finally after my functions, the elf-loader ? also adds
>> itself there. So it is being called first by
>>
>> `__run_exit_handlers`.
>>
>>
>> Then comes the part where it goes wrong. I registered my two function
>> with `__internal_atexit`, but for some reason
>>
>> `_dl_fini` is calling `__cxa_finalize` and that is calling the wrong
>> function first.
> When atexit is called from a DSO, glibc calls the registered function
> before the DSO is unloaded.  This choice was made because after
> unloading, the function pointer becomes invalid.
>
> I haven't checked, but I suspect atexit still works this way even if
> it doesn't have to (because the DSO is never unloaded).
>

I understand, but the behavior is wrong :) The C standard (or the C++
standard for that matter)
http://www.cplusplus.com/reference/cstdlib/atexit/ states that


```

If more than one atexit function has been specified by different calls 
to this function, they are all executed in reverse order as a stack 
(i.e. the last function specified is the first to be executed at exit).

```

I think it's been shown that glibc can violate the C standard here, so for
me the argument would already be over. That one should unwind in
reverse order is, I assume, not an interesting discussion topic.
Currently atexit is broken as a reliable mechanism.


But I also don't think the way this is currently handled in glibc can
be of much use to anyone.

Case 1: a regular exe linked with shared libraries, nothing gets 
unloaded at exit, so what's the point ?

Case 2: someone manually unloads a shared library that contains atexit
code. The bug is either using `atexit` in a shared library that gets
unloaded, or unloading a shared library that contains atexit code. But
it's not really glibc's business IMO.

Case 3: some automatism unloads shared libraries. Then the automatism 
should check if atexit code is affected and not unload, because the 
shared library is still clearly needed. It's a bug in the automatism.

If one was hellbent on trying to support atexit for unloading shared 
libraries, an atexit contained in a shared library should up the 
reference count of the shared library during the atexit call and 
decrement after the callback has executed.

Ciao

    Nat!



* Re: Problem with atexit and _dl_fini
  2019-05-19 19:37   ` Nat!
@ 2019-05-21 20:43     ` Adhemerval Zanella
  2019-05-22 10:22       ` Nat!
  2019-06-09 20:59     ` Nat!
  1 sibling, 1 reply; 21+ messages in thread
From: Adhemerval Zanella @ 2019-05-21 20:43 UTC (permalink / raw)
  To: libc-help



On 19/05/2019 16:37, Nat! wrote:
> 
> On 19.05.19 18:23, Florian Weimer wrote:
>> * Nat!:
>>
>>> So my problem is, that I observe that my atexit calls are not executed
>>> in the correct order.
>>>
>>> i.e. atexit( a); atexit( b);  should result in b(), a() being called in
>>> that order. To quote the man page,
>>>
>>> "the registered functions are invoked in reverse order".
>>>
>>>
>>> When I register with `atexit` I can see my functions being added
>>> properly within `__internal_atexit` in the
>>>
>>> correct order. Finally after my functions, the elf-loader ? also adds
>>> itself there. So it is being called first by
>>>
>>> `__run_exit_handlers`.
>>>
>>>
>>> Then comes the part where it goes wrong. I registered my two function
>>> with `__internal_atexit`, but for some reason
>>>
>>> `_dl_fini` is calling `__cxa_finalize` and that is calling the wrong
>>> function first.
>> When atexit is called from a DSO, glibc calls the registered function
>> before the DSO is unloaded.  This choice was made because after
>> unloading, the function pointer becomes invalid.
>>
>> I haven't checked, but I suspect atexit still works this way even if
>> it doesn't have to (because the DSO is never unloaded).
>>
> 
> I understand, but the behavior is wrong :) The C standard (or the C++ standard for this matter) http://www.cplusplus.com/reference/cstdlib/atexit/ states that
> 
> 
> ```
> 
> If more than one atexit function has been specified by different calls to this function, they are all executed in reverse order as a stack (i.e. the last function specified is the first to be executed at exit).
> 
> ```
> 
> I think its been shown that glibc can violate this C standard, so for me the argument would be over here already. That one should unwind in the reverse order is, I assume, not a interesting discussion topic. Currently atexit as a reliable mechanism is broken.
> 
> 
> But I also don't think the way this is currently handled in glibc, can't be of much use to anyone.
> 
> Case 1: a regular exe linked with shared libraries, nothing gets unloaded at exit, so what's the point ?
> 
> Case 2: someone manually unloads a shared library, that contains atexit code. The bug is either using `atexit` for a shared library that gets unloaded, or unloading a shared library that contains atexit code. But it's not really glibcs business IMO.
> 
> Case 3: some automatism unloads shared libraries. Then the automatism should check if atexit code is affected and not unload, because the shared library is still clearly needed. It's a bug in the automatism.
> 
> If one was hellbent on trying to support atexit for unloading shared libraries, an atexit contained in a shared library should up the reference count of the shared library during the atexit call and decrement after the callback has executed.
> 
> Ciao

Could you provide a test case that exercises the issue you are seeing?
glibc does have a test case that does pretty much what you described,
dlfcn/bug-atexit1-lib.c and dlfcn/bug-atexit1.c.

It creates a shared library with an exported symbol that registers a
lot of atexit functions; the main program dlopens and dlcloses it and
checks that the atexit handlers are indeed called in the correct order.
It does not use any C++, so there is no __cxa_finalize involved.

Also, I must confess that your debug information leaves me confused
about what exactly you are expecting and what exactly your program is
doing. For instance, I do not understand the part 'I registered my two
function with `__internal_atexit`'. Are you trying to call it
directly? Or are you calling __cxa_atexit? Keep in mind that
__cxa_atexit calls are generated by the compiler itself to destruct
global objects.


* Re: Problem with atexit and _dl_fini
  2019-05-21 20:43     ` Adhemerval Zanella
@ 2019-05-22 10:22       ` Nat!
  2019-05-22 15:01         ` Adhemerval Zanella
  0 siblings, 1 reply; 21+ messages in thread
From: Nat! @ 2019-05-22 10:22 UTC (permalink / raw)
  To: libc-help

Adhemerval Zanella schrieb:

>
> Could you provide a testcase that stress the issue you are seeing?  At
> least on glibc it does have a testcase that does pretty much what you
> described, dlfcn/bug-atexit1-lib.c and dlfcn/bug-atexit1.c.

I tried to simplify it, but I failed and gave up (see below).

>
> It created a shared library with an exported symbol that registers at
> lot of atexit function and the main program dlopen and dclose it and
> checks if the atexit handlers are indeed called in the correct order.
> It does not use any C++, so there is no __cxa_finalize involved.

The use of __cxa_finalize is something I observed in my stacktrace
(see previous mails).

>
> Also, on your debug information I must confess it is confusing what
> exactly you are expecting and what exactly your program is doing.
> For instance I am not understanding the part 'I registered my two
> function with `__internal_atexit`'. Are you trying to calling it
> directly? Or are you calling __cxa_atexit? Keep in mind that
> __cxa_atexit calls are generated by compiler itself to destruct
> global objects.
>

I am really just using `atexit`. I was trying to explain what the
internal glibc problem is, as I observed it by setting breakpoints and
stepping through the code.

Conceptually, what I am doing is installing on-demand `atexit` handlers
during the load of shared libraries. These are then used for debugging.

Basically like this (typed by hand):

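liba.so:

#include <stdlib.h>

void   a( void)
{
}

/* runs when liba.so is loaded; registers a() only on demand */
__attribute__((constructor))
static void   check( void)
{
    if( getenv( "DO_A_ON_EXIT"))
       atexit( a);
}


libb.so:

#include <stdlib.h>

void   b( void)
{
}

/* runs when libb.so is loaded; registers b() only on demand */
__attribute__((constructor))
static void   check( void)
{
    if( getenv( "DO_B_ON_EXIT"))
       atexit( b);
}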

I tried this with a simple setup and that doesn't create problems as 
such. But dl-fini re-sorts the dependencies sometimes, and then
the atexit order is compromised and that is what I am running into.

Ciao
    Nat!




* Re: Problem with atexit and _dl_fini
  2019-05-22 10:22       ` Nat!
@ 2019-05-22 15:01         ` Adhemerval Zanella
  2019-05-22 15:29           ` Nat!
  0 siblings, 1 reply; 21+ messages in thread
From: Adhemerval Zanella @ 2019-05-22 15:01 UTC (permalink / raw)
  To: libc-help



On 22/05/2019 07:22, Nat! wrote:
> Adhemerval Zanella schrieb:
> 
>>
>> Could you provide a testcase that stress the issue you are seeing?  At
>> least on glibc it does have a testcase that does pretty much what you
>> described, dlfcn/bug-atexit1-lib.c and dlfcn/bug-atexit1.c.
> 
> I tried to simplify it, but I failed and gave up (see below).
> 
>>
>> It created a shared library with an exported symbol that registers at
>> lot of atexit function and the main program dlopen and dclose it and
>> checks if the atexit handlers are indeed called in the correct order.
>> It does not use any C++, so there is no __cxa_finalize involved.
> 
> The use of __cxa_finalize is something I observed in my stacktrace
> (see previous mails).
> 
>>
>> Also, on your debug information I must confess it is confusing what
>> exactly you are expecting and what exactly your program is doing.
>> For instance I am not understanding the part 'I registered my two
>> function with `__internal_atexit`'. Are you trying to calling it
>> directly? Or are you calling __cxa_atexit? Keep in mind that
>> __cxa_atexit calls are generated by compiler itself to destruct
>> global objects.
>>
> 
> I am really just using `atexit`. I was trying to explain, what the internal glib problem is, that I observed setting breakpoints and stepping through the code.
> 
> What conceptually I am doing is to install on-demand `atexit` handlers during the load of shared libraries. These are then used for debugging.
> 
> Basically like this (typed by hand):
> 
> liba.so:
> 
> void   a( void)
> {
> }
> 
> __attribute__((constructor))
> static void   check( void)
> {
>    if( getenv( "DO_B_ON_EXIT"))
>       atexit( a);
> }
> 
> 
> libb.so:
> 
> void   b( void)
> {
> }
> 
> __attribute__((constructor))
> static void   check( void)
> {
>    if( getenv( "DO_A_ON_EXIT"))
>       atexit( b);
> }
> 
> I tried this with a simple setup and that doesn't create problems as such. But dl-fini re-sorts the dependencies sometimes, and then
> the atexit order is compromised and that is what I am running into.
> 

Are you sure that you are calling atexit in the order you expect? With
explicit linking, it depends on the order of the shared objects you
pass on the link command line.

Using the example (I just instrumented both 'a' and 'b' to print the function
name using write on STDERR_FILENO):

$ gcc -Wall test.c -o test -L`pwd` -Wl,--no-as-needed -lliba -llibb
$ LD_LIBRARY_PATH=. DO_A_ON_EXIT=1 DO_B_ON_EXIT=1 ./test 
a
b

$ gcc -Wall test.c -o test -L`pwd` -Wl,--no-as-needed -llibb -lliba
$ LD_LIBRARY_PATH=. DO_A_ON_EXIT=1 DO_B_ON_EXIT=1 ./test 
b
a

It is similar in the dlopen case:

--
#include <dlfcn.h>
#include <assert.h>

int main (void)
{ 
  void *handle_a = dlopen ("libliba.so", RTLD_NOW);
  assert (handle_a);
  void *handle_b = dlopen ("liblibb.so", RTLD_NOW);
  assert (handle_b);
}
--

$ gcc -Wall test-dlopen.c -o test-dlopen -ldl
$ LD_LIBRARY_PATH=. DO_A_ON_EXIT=1 DO_B_ON_EXIT=1 ./test-dlopen 
b
a

What you might be seeing is an implicit dependency that forces loading
in a specific order, even if you try to order the libraries explicitly
on the link command line.  Using the starting example, if you link
liblibb.so with libliba.so as a dependency:

$ gcc -Wall -shared -fpic liba.c -o libliba.so
$ gcc -Wall -shared -fpic libb.c -o liblibb.so -L`pwd` -Wl,--no-as-needed -lliba

It does not matter how you link the resulting program, the atexit handler will
be registered in the same order:

$ gcc -Wall test.c -o test -L`pwd` -Wl,--no-as-needed -llibb -lliba
$ LD_LIBRARY_PATH=. DO_A_ON_EXIT=1 DO_B_ON_EXIT=1 ./test
b
a
$ gcc -Wall test.c -o test -L`pwd` -Wl,--no-as-needed -lliba -llibb
$ LD_LIBRARY_PATH=. DO_A_ON_EXIT=1 DO_B_ON_EXIT=1 ./test
b
a
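
The four runs above are consistent with a simple model, sketched below. It is an assumption for illustration and not glibc's actual sorting code; `struct lib` and `exit_order` are hypothetical names. The model walks the objects in reverse load order, runs the constructors of an object's dependencies before its own, and then runs the registered handlers in reverse order of registration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LIBS 8

/* A library in the toy model. */
struct lib {
    char   name;             /* one letter, also what its handler prints */
    int    deps[MAX_LIBS];   /* indices of the libraries it depends on */
    size_t n_deps;
};

/* Runs the "constructor" of library i after those of its dependencies.
   Each constructor registers one handler, recorded in init[]. */
static void visit(const struct lib *libs, size_t i, int *seen,
                  char *init, size_t *n)
{
    if (seen[i])
        return;
    seen[i] = 1;
    for (size_t k = 0; k < libs[i].n_deps; k++)
        visit(libs, (size_t) libs[i].deps[k], seen, init, n);
    init[(*n)++] = libs[i].name;
}

/* libs[] is given in load (link) order. Writes the order in which the
   handlers would run at exit into out, which needs count + 1 bytes. */
static void exit_order(const struct lib *libs, size_t count, char *out)
{
    int    seen[MAX_LIBS] = { 0 };
    char   init[MAX_LIBS];
    size_t n = 0;

    assert(count <= MAX_LIBS);
    for (size_t i = count; i-- > 0; )      /* reverse load order */
        visit(libs, i, seen, init, &n);
    for (size_t i = 0; i < n; i++)         /* handlers run newest first */
        out[i] = init[n - 1 - i];
    out[n] = '\0';
}
```

In this model, load order a, b without dependencies yields "ab" and load order b, a yields "ba". Once b depends on a, both load orders yield "ba", which matches the outputs shown.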

I really think it is something related to your build, because glibc does not
actually reorder the internal __exit_funcs list.  AFAIK once a handler is inserted at
stdlib/on_exit.c by __internal_atexit, it is deregistered at stdlib/cxa_finalize.c.

So if you can actually show the issue you are observing, please open a bug
report.  Otherwise I am not seeing an actual issue with atexit handlers
here.


* Re: Problem with atexit and _dl_fini
  2019-05-22 15:01         ` Adhemerval Zanella
@ 2019-05-22 15:29           ` Nat!
  2019-05-22 19:35             ` Adhemerval Zanella
  0 siblings, 1 reply; 21+ messages in thread
From: Nat! @ 2019-05-22 15:29 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-help

Did you take a look at the screenshot in the tweet
(https://twitter.com/mulle_nat/status/1129131042001043456/photo/1) I
linked? That's the best evidence I can provide that it's really
happening. I tried to reproduce it pretty much the same way you did, but
wasn't successful. It's not easy...

On a hunch, I would say the problem will turn out to have something to do
with liba and libb each having shared library dependencies, some of which
are shared by both and some not, and dl-fini sorting things in the wrong
order.

If I get my release stuff down and find some spare time, I'll attempt to 
reproduce it again.

Ciao
    Nat!



* Re: Problem with atexit and _dl_fini
  2019-05-22 15:29           ` Nat!
@ 2019-05-22 19:35             ` Adhemerval Zanella
  2019-05-29 21:16               ` Nat!
  0 siblings, 1 reply; 21+ messages in thread
From: Adhemerval Zanella @ 2019-05-22 19:35 UTC (permalink / raw)
  To: Nat!; +Cc: libc-help



On 22/05/2019 12:29, Nat! wrote:
> Did you take a look at the screenshot in the tweet (https://twitter.com/mulle_nat/status/1129131042001043456/photo/1) I linked ? That's the best evidence I can provide, that its really happening. I tried to reproduce it pretty much the same as you did, but wasn't successful. It's not easy...

And I can't really tell without actually debugging it. In __cxa_finalize, could
you dump the __exit_funcs values? The only thing I can think of is that _dl_fini
has to sort the maps because of an implicit dependency between the shared
libraries.

> 
> On a hunch I would say the problem will turn out to have something to do with liba having shared library dependencies and libb having shared library dependecies and some are shared by both and some not and dl-fini sorts things in the wrong order.
> 
> If I get my release stuff down and find some spare time, I'll attempt to reproduce it again.

Do you have an easy way to provide the environment you are testing in?


* Re: Problem with atexit and _dl_fini
  2019-05-22 19:35             ` Adhemerval Zanella
@ 2019-05-29 21:16               ` Nat!
  0 siblings, 0 replies; 21+ messages in thread
From: Nat! @ 2019-05-29 21:16 UTC (permalink / raw)
  To: libc-help, Adhemerval Zanella


On 22.05.19 21:34, Adhemerval Zanella wrote:
>
> Do you have a easy way to provide the environment you are testing?
>
Now I can provide a way to reproduce the problem and debug it. While 
this may look a little daunting at first, it's mostly just copy/paste of 
commands and in the end you have a gdb session with a glibc that can be 
stepped through in the debugger.


atexit-bug.md

---

This is about the simplest way to reproduce the `atexit` problem in glibc.


## Create a docker with the development environment

Get the Dockerfile:

```
curl -L -O \
   'https://raw.githubusercontent.com/MulleFoundation/foundation-developer/release/Dockerfile'
```

As we want to debug with the newest **glibc** we need to use `ubuntu:disco`
instead of `ubuntu:bionic`, so change the first line of
the `Dockerfile` to `FROM ubuntu:disco`. Now you are ready to build the
container:


```
sudo docker build -t foundation -f Dockerfile "`mktemp -d`"
sudo docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
   -i -t --rm foundation
```


## Get some tools via apt

Inside the docker get some prerequisites for debugging and for **glibc**:

```
sudo apt-get -y install vim gdb gawk bison gettext texinfo
```

## Download the project

Inside the docker get the **MulleObjC** project. This will place you in a
virtual environment subshell.  With `exit` you can get out (you should).


```
mulle-sde https://github.com/mulle-objc/MulleObjC/archive/latest.tar.gz
exit
```


## Remove atexit fix

With the `atexit` fix, there are no problems, so we need to take it out:

```
cd MulleObjC/test
mulle-sde environment set MULLE_ATEXIT_URL \
   'https://github.com/mulle-core/mulle-atexit/archive/placebo.tar.gz'
```


## Build with a debug version of glibc

Build everything with a custom built version of **glibc**, so we can 
debug it.
While still being in `MulleObjC/test`:

```
mulle-sourcetree -N add --nodetype 'tar' --marks 'no-all-load,no-import' \
   --userinfo 'aliases=c' \
   --url '${GLIBC_2_29_URL:-https://ftp.gnu.org/gnu/glibc/glibc-2.29.tar.xz}' \
   --branch '${GLIBC_2_29_BRANCH}' 'glibc'
mulle-sourcetree move glibc top
mulle-sde dependency craftinfo set glibc CC_DEBUG '-O1 -g'
mulle-sde dependency craftinfo set glibc SKIP_AUTOCONF YES
mulle-sde test craft
```

## Build first test and observe the error

While still being in `MulleObjC/test`:

```
mulle-sde -vvv test run --keep-exe 0_noleak/noleak.m
```

The error should appear.
Now you can use the debugger to see the wrong atexit order:

```
MULLE_TESTALLOCATOR=1 MULLE_OBJC_PEDANTIC_EXIT=YES gdb \
   /MulleObjC/test/0_noleak/noleak.debug.exe
```

 > Breakpoints to set:
 >
 > b atexit
 > b mulle_testallocator_exit
 > b mulle_objc_global_atexit


---


I tried these steps and it worked out successfully for me.


Good Luck

    Nat!



* Re: Problem with atexit and _dl_fini
  2019-05-19 19:37   ` Nat!
  2019-05-21 20:43     ` Adhemerval Zanella
@ 2019-06-09 20:59     ` Nat!
  2019-06-10 11:48       ` Adhemerval Zanella
  1 sibling, 1 reply; 21+ messages in thread
From: Nat! @ 2019-06-09 20:59 UTC (permalink / raw)
  To: libc-help

Another data point to support my claim that `_dl_fini` breaks atexit. This
time it's very easy to reproduce ;)

Here's the README.md from the GitHub repo
https://github.com/mulle-nat/atexit-breakage-linux


```

# Shows another breakage involving `atexit` on linux

Here the `atexit` callback is invoked mistakenly multiple times.


## Build

Build with [mulle-make](//github.com/mulle-sde/mulle-make) or alternatively :

```
(
    mkdir build &&
    cd build &&
    cmake .. &&
    make
)
```

## Run

Use `ldd` to trigger the misbehaviour:

```
LD_PRELOAD="${PWD}/build/libld-preload.so" ldd ./build/main
```

## Output

```
load
unload
unload
unload
    linux-vdso.so.1 (0x00007ffd2b2bd000)
    /home/src/srcO/mulle-core/mulle-testallocator/research/ld-preload/build/libld-preload.so (0x00007f83853c1000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f838518c000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f83853cd000)
unload
unload
```


Ciao
    Nat!




* Re: Problem with atexit and _dl_fini
  2019-06-09 20:59     ` Nat!
@ 2019-06-10 11:48       ` Adhemerval Zanella
  2019-06-10 13:08         ` Nat!
  0 siblings, 1 reply; 21+ messages in thread
From: Adhemerval Zanella @ 2019-06-10 11:48 UTC (permalink / raw)
  To: libc-help



On 09/06/2019 17:59, Nat! wrote:
> Another datapoint to support my claim that _dl-fini breaks atexit. This time its very easy to reproduce ;)
> 
> Here 's the README.md from the Github Repo https://github.com/mulle-nat/atexit-breakage-linux
> 
> 
> ```
> 
> # Shows another breakage involving `atexit` on linux
> 
> Here the `atexit` callback is invoked mistakenly multiple times.

This 'example' does not really show the issue, because the ldd script
invokes the loader multiple times, see below. You can check exactly what
ldd is doing by running it with sh -x.


I will try to use your instruction to run on docker to see what exactly
is happening in your environment.


> 
> ## Build
> 
> Build with [mulle-make](//github.com/mulle-sde/mulle-make) or alternatively :
> 
> ```
> (
>    mkdir build &&
>    cd build &&
>    cmake .. &&
>    make
> )
> ```
> 
> ## Run
> 
> Use `ldd` to trigger the misbehaviour:
> 
> ```
> LD_PRELOAD="${PWD}/build/libld-preload.so" ldd ./build/main
> ```
> 
> ## Output
> 
> ```
> load
> unload

The first and second times come from:

        dummy=`$rtld 2>&1`
        if test $? = 127; then
          verify_out=`${rtld} --verify "$file"`
          ret=$?

Where the rtld list is, for x86_64, /lib/ld-linux.so.2 and /lib64/ld-linux-x86-64.so.2

> unload
> unload

This time is done at:

    0|2)
      try_trace "$RTLD" "$file" || result=1
      ;;
    *)

where the loader is called as:

eval LD_TRACE_LOADED_OBJECTS=1 LD_WARN= LD_BIND_NOW= 'LD_LIBRARY_VERSION=$verify_out' LD_VERBOSE= '"$@"'

With same rtld list as before.

>    linux-vdso.so.1 (0x00007ffd2b2bd000)
>    /home/src/srcO/mulle-core/mulle-testallocator/research/ld-preload/build/libld-preload.so (0x00007f83853c1000)
>    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f838518c000)
>    /lib64/ld-linux-x86-64.so.2 (0x00007f83853cd000)
> unload
> unload
> ```
> 
> 
> Ciao
>    Nat!
> 
> 
> On 19.05.19 21:37, Nat! wrote:
>>
>> On 19.05.19 18:23, Florian Weimer wrote:
>>> * Nat!:
>>>
>>>> So my problem is, that I observe that my atexit calls are not executed
>>>> in the correct order.
>>>>
>>>> i.e. atexit( a); atexit( b);  should result in b(), a() being called in
>>>> that order. To quote the man page,
>>>>
>>>> "the registered functions are invoked in reverse order".
>>>>
>>>>
>>>> When I register with `atexit` I can see my functions being added
>>>> properly within `__internal_atexit` in the
>>>>
>>>> correct order. Finally after my functions, the elf-loader ? also adds
>>>> itself there. So it is being called first by
>>>>
>>>> `__run_exit_handlers`.
>>>>
>>>>
>>>> Then comes the part where it goes wrong. I registered my two function
>>>> with `__internal_atexit`, but for some reason
>>>>
>>>> `_dl_fini` is calling `__cxa_finalize` and that is calling the wrong
>>>> function first.
>>> When atexit is called from a DSO, glibc calls the registered function
>>> before the DSO is unloaded.  This choice was made because after
>>> unloading, the function pointer becomes invalid.
>>>
>>> I haven't checked, but I suspect atexit still works this way even if
>>> it doesn't have to (because the DSO is never unloaded).
>>>
>>
>> I understand, but the behavior is wrong :) The C standard (or the C++ standard for this matter) http://www.cplusplus.com/reference/cstdlib/atexit/ states that
>>
>>
>> ```
>>
>> If more than one atexit function has been specified by different calls to this function, they are all executed in reverse order as a stack (i.e. the last function specified is the first to be executed at exit).
>>
>> ```
>>
>> I think its been shown that glibc can violate this C standard, so for me the argument would be over here already. That one should unwind in the reverse order is, I assume, not a interesting discussion topic. Currently atexit as a reliable mechanism is broken.
>>
>>
>> But I also don't think the way this is currently handled in glibc, can't be of much use to anyone.
>>
>> Case 1: a regular exe linked with shared libraries, nothing gets unloaded at exit, so what's the point ?
>>
>> Case 2: someone manually unloads a shared library, that contains atexit code. The bug is either using `atexit` for a shared library that gets unloaded, or unloading a shared library that contains atexit code. But it's not really glibcs business IMO.
>>
>> Case 3: some automatism unloads shared libraries. Then the automatism should check if atexit code is affected and not unload, because the shared library is still clearly needed. It's a bug in the automatism.
>>
>> If one was hellbent on trying to support atexit for unloading shared libraries, an atexit contained in a shared library should up the reference count of the shared library during the atexit call and decrement after the callback has executed.
>>
>> Ciao
>>
>>    Nat!
>>
>>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Problem with atexit and _dl_fini
  2019-06-10 11:48       ` Adhemerval Zanella
@ 2019-06-10 13:08         ` Nat!
  2019-06-10 20:27           ` Adhemerval Zanella
  0 siblings, 1 reply; 21+ messages in thread
From: Nat! @ 2019-06-10 13:08 UTC (permalink / raw)
  To: libc-help; +Cc: Adhemerval Zanella


On 10.06.19 13:48, Adhemerval Zanella wrote:
>
> On 09/06/2019 17:59, Nat! wrote:
>> Another datapoint to support my claim that _dl-fini breaks atexit. This time its very easy to reproduce ;)
>>
>> Here 's the README.md from the Github Repo https://github.com/mulle-nat/atexit-breakage-linux
>>
>>
>> ```
>>
>> # Shows another breakage involving `atexit` on linux
>>
>> Here the `atexit` callback is invoked mistakenly multiple times.
> This 'example' does not really show the issue because ldd script issues
> the loader multiple times, see below. You can check exactly what ldd is
> doing by calling with sh -x.

I agree it doesn't show the same issue, but it shows that something else 
is going very wrong. :) Or are you happy, that atexit is called multiple 
times ? Who's calling exit here anyway ? Check out the debugger output 
too (see updated README.md)


>
> I will try to use your instruction to run on docker to see what exactly
> is happening in your environment.

That's not necessary anymore. I managed to make it reproducible in a 
much simpler form just now.

The ld-so-breakage project is basically a recreation of the original 
"docker" scenario written from scratch. I try to explain in the README , 
what is going on. But if there are questions hit me up (maybe as an 
issue ?) :

     https://github.com/mulle-nat/ld-so-breakage


The "another datapoint" project shows how constructor/destructor don't 
pair up:

     https://github.com/mulle-nat/atexit-breakage-linux


And as a random bonus this project indicates to me that LD_PRELOAD 
doesn't do what it's supposed to either:

     https://github.com/mulle-nat/LD_PRELOAD-breakage-linux


In total I think the state of affairs is pretty dismal. I didn't expect 
that much basic stuff not working on linux. With hindsight, I probably 
have wasted _weeks_ on these problems.

I still maintain that the concept of having `atexit` callbacks run by 
anything other than `exit` is broken. An `atexit` callback is not the 
same as an `__attribute__((destructor))`.


Ciao

    Nat!


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Problem with atexit and _dl_fini
  2019-06-10 13:08         ` Nat!
@ 2019-06-10 20:27           ` Adhemerval Zanella
  2019-06-11 18:39             ` Adhemerval Zanella
  2019-06-11 18:53             ` Nat!
  0 siblings, 2 replies; 21+ messages in thread
From: Adhemerval Zanella @ 2019-06-10 20:27 UTC (permalink / raw)
  To: Nat!, libc-help



On 10/06/2019 10:07, Nat! wrote:
> 
> On 10.06.19 13:48, Adhemerval Zanella wrote:
>>
>> On 09/06/2019 17:59, Nat! wrote:
>>> Another datapoint to support my claim that _dl-fini breaks atexit. This time its very easy to reproduce ;)
>>>
>>> Here 's the README.md from the Github Repo https://github.com/mulle-nat/atexit-breakage-linux
>>>
>>>
>>> ```
>>>
>>> # Shows another breakage involving `atexit` on linux
>>>
>>> Here the `atexit` callback is invoked mistakenly multiple times.
>> This 'example' does not really show the issue because ldd script issues
>> the loader multiple times, see below. You can check exactly what ldd is
>> doing by calling with sh -x.
> 
> I agree it doesn't show the same issue, but it shows that something else is going very wrong. :) Or are you happy, that atexit is called multiple times ? Who's calling exit here anyway ? Check out the debugger output too (see updated README.md)

ldd is not a program, but rather a shell script that invokes the target
binary along with the system loader multiple times. What you are seeing is
not atexit being called multiple times, but rather how the script works.

When you set LD_PRELOAD *before* invoking ldd, the shell binary also
pre-loads the library.  I instrumented the example to also print the
command line of the running binary (obtained either from
program_invocation_name or /proc/self/cmdline):

$ LD_PRELOAD=./libld-preload.so ./ldd ./main
/bin/bash: load
/bin/bash: unload
/bin/bash: unload
/bin/bash: unload
	linux-vdso.so.1 (0x00007ffd445ef000)
	./libld-preload.so (0x00007fa866ac5000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa8664b5000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fa8668a6000)
/bin/bash: unload
/bin/bash: unload

The program is not loaded: although ldd does call the loader, it calls it
in a trace mode that does not actually load any shared library.  The first
'load' is issued by the library when bash is first executed, and the later
'unload' lines appear because bash forks and then exits multiple times.
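
The same effect can be reproduced without ldd or bash. What follows is
only a minimal sketch, assuming a POSIX system; the names on_exit_report
and count_handler_runs are made up for illustration and do not come from
the test repository. One atexit registration is made in a helper
process, which then forks; every process that leaves through exit() runs
the inherited handler once.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;   /* write end of the pipe used by the handler */

/* Hypothetical stand-in for the library's "unload" message. */
static void on_exit_report(void)
{
    if (report_fd >= 0) {
        ssize_t ignored = write(report_fd, "u", 1);
        (void) ignored;
    }
}

/*
 * Registers ONE atexit handler in a helper process, then forks `forks`
 * children from it.  Each child and the helper itself leave via exit(),
 * so the single registration fires forks + 1 times.
 * Returns the number of handler runs observed, or -1 on error.
 */
int count_handler_runs(int forks)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t helper = fork();
    if (helper < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (helper == 0) {
        close(fds[0]);
        report_fd = fds[1];
        if (atexit(on_exit_report) != 0)
            _exit(1);
        for (int i = 0; i < forks; i++) {
            pid_t child = fork();
            if (child == 0)
                exit(0);               /* child inherited the registration */
            if (child > 0)
                waitpid(child, NULL, 0);
        }
        exit(0);                       /* helper runs the handler as well */
    }

    /* Parent: count one byte per handler run until all writers are gone. */
    close(fds[1]);
    int runs = 0;
    char byte;
    while (read(fds[0], &byte, 1) == 1)
        runs++;
    close(fds[0]);
    waitpid(helper, NULL, 0);
    return runs;
}
```

Calling count_handler_runs(3) should report four runs for a single
registration, which mirrors one 'load' followed by several 'unload'
lines when a forking shell has the library pre-loaded.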

> 
> 
>>
>> I will try to use your instruction to run on docker to see what exactly
>> is happening in your environment.
> 
> That's not necessary anymore. I managed to make it reproducible in a much simpler form just now.
> 
> The ld-so-breakage project is basically a recreation of the original "docker" scenario written from scratch. I try to explain in the README , what is going on. But if there are questions hit me up (maybe as an issue ?) :
> 
>     https://github.com/mulle-nat/ld-so-breakage

Thanks, this is way more useful. I now understand what is happening, and
IMHO this behaviour is required because in glibc we define that atexit/on_exit
handlers are run when a library is deregistered (as for dlclose).

Using the example in your testcase:

---
USE_A=YES ./build/main_adbc
-- install atexit_b
-- install atexit_a
-- run atexit_a
-- run atexit_b
---

The atexit handlers are called in the wrong order because they are
registered with '__cxa_atexit', which in turn sets their internal type
to 'ef_cxa'.  Since _dl_fini is registered last (after all shared library
loading and constructor calls), it is run first and in turn calls
'__cxa_finalize' (through __do_global_dtors_aux, generated by the compiler).

'__cxa_finalize' will then call all 'ef_cxa' functions for the module
passed by __do_global_dtors_aux and mark each of them as 'ef_free'. This
prevents '__run_exit_handlers' from running the handlers more than once.
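
To make the reordering concrete, here is a toy model in C. It is a
sketch under stated assumptions, not glibc's code: only the flavor names
ef_cxa and ef_free are borrowed from the description above, while the
model_* functions, the integer handler ids and the integer DSO handles
are all invented for illustration. It covers handlers registered from
library constructors, i.e. before the loader's finalizer is registered.

```c
#include <stddef.h>

enum flavor { EF_FREE, EF_CXA };      /* names borrowed from the mail above */

struct entry {
    enum flavor flavor;
    int         id;     /* stands in for the handler function */
    int         dso;    /* stands in for the registering object's handle */
};

#define MAX_ENTRIES 16

static struct entry entries[MAX_ENTRIES];
static size_t       n_entries;
static int          ran[MAX_ENTRIES];  /* ids in the order they "ran" */
static size_t       n_ran;

void model_reset(void) { n_entries = 0; n_ran = 0; }

/* Model of atexit() called from object `dso`: append an ef_cxa entry. */
int model_atexit(int id, int dso)
{
    if (n_entries == MAX_ENTRIES)
        return -1;
    entries[n_entries++] = (struct entry){ EF_CXA, id, dso };
    return 0;
}

/* Model of __cxa_finalize(d): run d's entries, newest first, mark ef_free. */
void model_cxa_finalize(int dso)
{
    for (size_t i = n_entries; i-- > 0; ) {
        if (entries[i].flavor == EF_CXA && entries[i].dso == dso) {
            entries[i].flavor = EF_FREE;
            ran[n_ran++] = entries[i].id;
        }
    }
}

/*
 * Model of exit(): the loader's finalizer goes first and finalizes each
 * object in `fini_order` (its dependency order); whatever is still
 * ef_cxa afterwards runs in plain reverse registration order.
 */
void model_exit(const int *fini_order, size_t n)
{
    for (size_t i = 0; i < n; i++)
        model_cxa_finalize(fini_order[i]);
    for (size_t i = n_entries; i-- > 0; ) {
        if (entries[i].flavor == EF_CXA) {
            entries[i].flavor = EF_FREE;
            ran[n_ran++] = entries[i].id;
        }
    }
}

size_t model_ran_count(void) { return n_ran; }
int    model_ran(size_t i)   { return i < n_ran ? ran[i] : -1; }
```

With handler 1 registered from object 10 and then handler 2 from object
20, strict reverse order would be 2, 1. If the loader finalizes object
10 before object 20, the model runs 1 and then 2, and each handler runs
only once because its entry is already marked ef_free.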

So the question you might ask is why not just use 'ef_at' for atexit
handlers, so that they are not run by __cxa_finalize and your example
runs as you expect? The issue is that glibc does not know whether your
library will be dlopened or not.

If a constructor registers an atexit handler that refers to a function
inside the shared library, and the handler is *not* run at dlclose, then
the sequence dlopen -> constructor -> dlclose -> exit will try to execute
code in an invalid mapping.  This is exactly what dlfcn/bug-atexit{1,2}.c
test.

So the question is why exactly glibc defined that atexit handlers should
be called by dlclose. I understand that __cxa_finalize / destructors make
sense so that the shared library can free allocated resources, but I
can't really see why there is a need to extend this to 'atexit' as well.

> 
> 
> The "another datapoint" project shows how constructor/destructor don't pair up:
> 
>     https://github.com/mulle-nat/atexit-breakage-linux
> 
> 
> And as a random bonus this project indicates to me that LD_PRELOAD doesn't do what its supposed to either:
> 
>     https://github.com/mulle-nat/LD_PRELOAD-breakage-linux
> 
> 
> In total I think the state of affairs is pretty dismal. I didn't expect that much basic stuff not working on linux. With hindsight, I probably have wasted _weeks_ on these problems.
> 
> I still maintain that the concept to let `atexit` callbacks not run by `exit` is broken. An `atexit` callback is not the same as an `__attribute__((destructor))__`.
> 
> 
> Ciao
> 
>    Nat!
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Problem with atexit and _dl_fini
  2019-06-10 20:27           ` Adhemerval Zanella
@ 2019-06-11 18:39             ` Adhemerval Zanella
  2019-06-11 20:20               ` Nat!
  2019-06-11 18:53             ` Nat!
  1 sibling, 1 reply; 21+ messages in thread
From: Adhemerval Zanella @ 2019-06-11 18:39 UTC (permalink / raw)
  To: Nat!, libc-help



On 10/06/2019 17:27, Adhemerval Zanella wrote:
> 
> 
> On 10/06/2019 10:07, Nat! wrote:
>>
>> On 10.06.19 13:48, Adhemerval Zanella wrote:
>>>
>>> On 09/06/2019 17:59, Nat! wrote:
>>>> Another datapoint to support my claim that _dl-fini breaks atexit. This time its very easy to reproduce ;)
>>>>
>>>> Here 's the README.md from the Github Repo https://github.com/mulle-nat/atexit-breakage-linux
>>>>
>>>>
>>>> ```
>>>>
>>>> # Shows another breakage involving `atexit` on linux
>>>>
>>>> Here the `atexit` callback is invoked mistakenly multiple times.
>>> This 'example' does not really show the issue because ldd script issues
>>> the loader multiple times, see below. You can check exactly what ldd is
>>> doing by calling with sh -x.
>>
>> I agree it doesn't show the same issue, but it shows that something else is going very wrong. :) Or are you happy, that atexit is called multiple times ? Who's calling exit here anyway ? Check out the debugger output too (see updated README.md)
> 
> The ldd is not a program, but rather a shell script that issues the target
> binary along with system loader multiple times. What you are seeing is not 
> atexit called multiple times, but rather how the script is called.
> 
> When you set LD_PRELOAD *before* issuing ldd you will make the shell binary
> to also pre-load the library.  I instrumented the binary to also print the
> output command line from the issue binary (get either by program_invocation_name
> or /proc/self/cmdline):
> 
> $ LD_PRELOAD=./libld-preload.so ./ldd ./main
> /bin/bash: load
> /bin/bash: unload
> /bin/bash: unload
> /bin/bash: unload
> 	linux-vdso.so.1 (0x00007ffd445ef000)
> 	./libld-preload.so (0x00007fa866ac5000)
> 	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa8664b5000)
> 	/lib64/ld-linux-x86-64.so.2 (0x00007fa8668a6000)
> /bin/bash: unload
> /bin/bash: unload
> 
> The program is not load since although ldd does call the loader, it calls
> in a trace mode that does not actually load any shared library.  The first
> 'load' is issued by library when bash is first executed and later multiple
> 'unload' is due bash forks and then exits multiple times.
> 
>>
>>
>>>
>>> I will try to use your instruction to run on docker to see what exactly
>>> is happening in your environment.
>>
>> That's not necessary anymore. I managed to make it reproducible in a much simpler form just now.
>>
>> The ld-so-breakage project is basically a recreation of the original "docker" scenario written from scratch. I try to explain in the README , what is going on. But if there are questions hit me up (maybe as an issue ?) :
>>
>>     https://github.com/mulle-nat/ld-so-breakage
> 
> Thanks, it is way more useful. I now I understand what is happening and
> IMHO this behaviour is a required because on glibc we set that atexit/on_exit 
> handlers are ran when deregister a library (as for dlclose).
> 
> Using the example in your testcase:
> 
> ---
> USE_A=YES ./build/main_adbc
> -- install atexit_b
> -- install atexit_a
> -- run atexit_a
> -- run atexit_b
> ---
> 
> The behaviour of atexit handlers being called in wrong order is they are
> being registered with '__cxa_atexit' which in turn sets its internal type
> as 'ef_cxa'.  Since _dl_init is registered last (after all shared library
> loading and constructors calls), it will call _dl_fini which in turn will
> call '__cxa_finalize' (through __do_global_dtors_aux generated by compiler).
> 
> The '__cxa_finalize' will then all 'ef_cxa' function for the module passed
> by __do_global_dtors_aux and set the function as 'ef_free'. It will then
> prevent '__run_exit_handlers' to run the handlers more than once.
> 
> So the question you might ask is why not just to use 'ef_at' for atexit
> handlers, make them no to run on __cxa_finalize and thus make your example
> run as you expect? The issue is glibc does not know whether your library
> would be dlopened or not.  
> 
> If you set an atfork handler by a constructor that references to a function 
> inside the shared library and if do *not* set to *not* be ran later you might, 
> a case of dlopen -> constructor -> dlclose -> exit will try to execute and
> invalid mapping.  This is exactly what dlfcn/bug-atexit{1,2}.c.
> 
> So the question is why exactly glibc defined that atexit should be called
> by dlclose. I understand that __cxa_finalize / destructor make sense to
> make it possible the shared library to free allocated resources, but I
> can't really get why there a need to extend it to 'atexit' as well.

This requirement seems to come from the LSB [1], although I am not
sure which one came first (the specification or the implementation).
It also states that __cxa_atexit should register a function to be called
by exit or when a shared library is unloaded.

And __cxa_finalize is required to call atexit-registered functions as
well. It also states that __cxa_finalize should be called on dlclose.

I think it might be due to the fact that old gcc versions used atexit to
register C++ destructors for local static and global objects. However,
using __cxa_atexit seems to be the default for glibc (since it has
supported __cxa_atexit since its initial versions).

So I think there is no compelling reason to keep atexit handlers from
being called only at exit rather than from __cxa_finalize, although I am
not sure how we would handle the LSB deviation. I will write to
libc-alpha to check what other developers think.

[1] http://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.pdf

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Problem with atexit and _dl_fini
  2019-06-10 20:27           ` Adhemerval Zanella
  2019-06-11 18:39             ` Adhemerval Zanella
@ 2019-06-11 18:53             ` Nat!
  1 sibling, 0 replies; 21+ messages in thread
From: Nat! @ 2019-06-11 18:53 UTC (permalink / raw)
  To: libc-help; +Cc: Adhemerval Zanella


On 10.06.19 22:27, Adhemerval Zanella wrote:
>
> The program is not load since although ldd does call the loader, it calls
> in a trace mode that does not actually load any shared library.  The first
> 'load' is issued by library when bash is first executed and later multiple
> 'unload' is due bash forks and then exits multiple times.

I can understand this. Possibly the same is happening when I am running 
this in a debugger.


>
>>
>>> I will try to use your instruction to run on docker to see what exactly
>>> is happening in your environment.
>> That's not necessary anymore. I managed to make it reproducible in a much simpler form just now.
>>
>> The ld-so-breakage project is basically a recreation of the original "docker" scenario written from scratch. I try to explain in the README , what is going on. But if there are questions hit me up (maybe as an issue ?) :
>>
>>      https://github.com/mulle-nat/ld-so-breakage
> Thanks, it is way more useful. I now I understand what is happening and
> IMHO this behaviour is a required because on glibc we set that atexit/on_exit
> handlers are ran when deregister a library (as for dlclose).
>
> Using the example in your testcase:
>
> ---
> USE_A=YES ./build/main_adbc
> -- install atexit_b
> -- install atexit_a
> -- run atexit_a
> -- run atexit_b
> ---
>
> The behaviour of atexit handlers being called in wrong order is they are
> being registered with '__cxa_atexit' which in turn sets its internal type
> as 'ef_cxa'.  Since _dl_init is registered last (after all shared library
> loading and constructors calls), it will call _dl_fini which in turn will
> call '__cxa_finalize' (through __do_global_dtors_aux generated by compiler).
>
> The '__cxa_finalize' will then all 'ef_cxa' function for the module passed
> by __do_global_dtors_aux and set the function as 'ef_free'. It will then
> prevent '__run_exit_handlers' to run the handlers more than once.
>
> So the question you might ask is why not just to use 'ef_at' for atexit
> handlers, make them no to run on __cxa_finalize and thus make your example
> run as you expect? The issue is glibc does not know whether your library
> would be dlopened or not.
>
> If you set an atfork handler by a constructor that references to a function
> inside the shared library and if do *not* set to *not* be ran later you might,
> a case of dlopen -> constructor -> dlclose -> exit will try to execute and
> invalid mapping.  This is exactly what dlfcn/bug-atexit{1,2}.c.
>
> So the question is why exactly glibc defined that atexit should be called
> by dlclose. I understand that __cxa_finalize / destructor make sense to
> make it possible the shared library to free allocated resources, but I
> can't really get why there a need to extend it to 'atexit' as well.
>

My pet theory is this. After I posted my example, I looked at the ELF 
spec (http://refspecs.linuxbase.org/elf/elf.pdf) . This writes about how 
to implement the ELF destructors. ELF specifies to use `atexit` for 
destructors. The ELF spec at the time of writing does not seem to 
consider the unloading of a shared object and then everything written 
there makes sense. When you want to support unloading though, atexit is 
now the wrong way to do it. But the code was already there and no one 
wanted to change too much. Alas that's just a theory :)

I still think this is a case of glibc trying to be too helpful, but 
doing more damage (breaking good `atexit` code) than good (supporting 
programmers unwittingly unloading code with `atexit`).

Thanks for taking the time to look into this!

Ciao

   Nat!


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Problem with atexit and _dl_fini
  2019-06-11 18:39             ` Adhemerval Zanella
@ 2019-06-11 20:20               ` Nat!
  2019-06-11 22:40                 ` Nat!
  0 siblings, 1 reply; 21+ messages in thread
From: Nat! @ 2019-06-11 20:20 UTC (permalink / raw)
  To: Adhemerval Zanella, libc-help


On 11.06.19 20:39, Adhemerval Zanella wrote:
>
> It seems that this requirement seems to come from LSB, although I am not
> sure which one came first (the specification or the implementation).
> It also states that __cxa_atexit should register a function to be called
> by exit or when a shared library is unloaded.

I don't really have much further to add to this topic, so this is just 
some commentary and speculation... and I am probably repeating myself.


https://pubs.opengroup.org/onlinepubs/009695399/functions/atexit.html states:


    The /atexit/() function shall register the function pointed to by 
/func/, to be called without arguments at normal program termination.

That's "normal program termination" not anytime before. dlclose is 
anytime before. What is happening is a violation of `atexit`.


When I read 
http://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.html#BASELIB---CXA-FINALIZE

I see that it's *Intel's* version of C++ that originally dictated this 
violation of the C standard. Possibly Intel was writing this with 
Windows in mind ?

> And __cxa_finalize requires to call atexit registers functions as well. It
> also states __cxa_finalize should be called on dlclose.

In my opinion the __cxa_finalize requirement is wrong. It's further my 
opinion, that a vendors requirement for its C++ ABI, does not "trump" 
open standards. :)

>
> I think it might due the fact old gcc version uses atexit to register C++
> destructors for local static and global objects. However it seems to be
> enabled as default for GLIBC (since it support __cxa_atexit since initial
> versions).
>
> So I think there is no impeding reason to make atexit not be called from
> __cxa_finalize, although I am not sure how we would handle the LSB deviation.
> I will write down a libc-alpha to check what other developer think.

I think the proper solution is to rewrite __cxa_finalize and remove 
atexit functionality completely from it.
Alas I am not hopeful, that this will be resolved to my taste :)


>
> [1] http://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.pdf
>

Ciao

    Nat!



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Problem with atexit and _dl_fini
  2019-06-11 20:20               ` Nat!
@ 2019-06-11 22:40                 ` Nat!
  2019-06-12  3:41                   ` Carlos O'Donell
  2019-06-13 22:53                   ` Nat!
  0 siblings, 2 replies; 21+ messages in thread
From: Nat! @ 2019-06-11 22:40 UTC (permalink / raw)
  To: libc-help

Sorry for the spam, but I just thought of an easy fix for the situation, 
with this rewording of 
http://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.html#BASELIB---CXA-FINALIZE

```
The implementation shall arrange for __cxa_finalize() to be called during 
early shared library unload (e.g. dlclose()) with a handle to the shared 
library. The unload should fail, if the termination function list 
contains any __cxa_atexit-registered functions.
When the main program calls exit, the implementation shall cause any 
remaining __cxa_atexit-registered functions to be called, either by 
calling __cxa_finalize(NULL), or by walking the registration list itself.
```

The effect is, that atexit "poisoned" shared objects stay until 
termination, all others get unloaded as they are now, which would be IMO 
perfect and expected. As a positive side effect it seems like minimal 
code change.

Ciao
    Nat!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Problem with atexit and _dl_fini
  2019-06-11 22:40                 ` Nat!
@ 2019-06-12  3:41                   ` Carlos O'Donell
  2019-06-13 22:53                   ` Nat!
  1 sibling, 0 replies; 21+ messages in thread
From: Carlos O'Donell @ 2019-06-12  3:41 UTC (permalink / raw)
  To: Nat!, libc-help

On 6/11/19 6:40 PM, Nat! wrote:
> Sorry for the spam, but I just thought of an easy fix for the situation, with this rewording of http://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.html#BASELIB---CXA-FINALIZE
> 
> ```
> The implementation shall arrange for__cxa_finalize() to be called during early shared library unload (e.g. dlclose()) with a handle to the shared library. The unload should fail, if the termination function list contains any __cxa_atexit-registered functions.
> When the main program calls exit, the implementation shall cause any remaining __cxa_atexit-registered functions to be called, either by calling __cxa_finalize(NULL), or by walking the registration list itself.
> ```
> 
> The effect is, that atexit "poisoned" shared objects stay until termination, all others get unloaded as they are now, which would be IMO perfect and expected. As a positive side effect it seems like minimal code change.

I disagree.

It would block existing plugin mechanisms from being able to reload their objects until they switched to some other mechanism like destructors.

It is a change which is not conservative and doesn't solve any real problems except an educational issue.

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Problem with atexit and _dl_fini
  2019-06-11 22:40                 ` Nat!
  2019-06-12  3:41                   ` Carlos O'Donell
@ 2019-06-13 22:53                   ` Nat!
  2019-06-14 12:29                     ` Manfred
  2019-06-14 15:14                     ` Adhemerval Zanella
  1 sibling, 2 replies; 21+ messages in thread
From: Nat! @ 2019-06-13 22:53 UTC (permalink / raw)
  To: libc-help

Funnily enough, if you read the Itanium C++ ABI, on which __cxa_finalize 
is based, then the algorithm described
there is doing exactly the right thing.
Because the wording of __cxa_finalize is so shortened, it is hard to 
pick out the original meaning. But the description is
actually fully compatible with how `atexit` is supposed to function.

The gist is this. For atexit, functions are stored in a unique way in 
the termination function table (clarifications in []):

http://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.html#BASELIB---CXA-FINALIZE

```
In the latter case [atexit] the pointer to the function is the pointer 
passed to atexit(), while the other pointers [operand, handle] are NULL.
```

When dlclose hits, the handle to be closed is `d` and not NULL:

```
The implementation shall arrange for __cxa_finalize() to be called during 
early shared library unload (e.g. dlclose()) with a handle to the shared 
library.
```

And then

```
When __cxa_finalize(d) is called, it shall walk the termination function 
list, calling each in turn if d matches the handle of the termination 
function entry.
```

So `atexit`s don't match, since the handle stored is NULL. Only if `d` 
is NULL (the base process terminates), then will the atexits be called. 
Currently though at `dlclose` time all handlers are called, which breaks 
the `atexit` specification as well as your own LSB.
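
The reading of the LSB text above can be sketched as a small C model.
This is only an illustration of the quoted wording, not code from glibc
or from the LSB: the term_* functions, sample_plain and sample_dtor are
invented names, and walking the list newest first is an assumption
taken from the reverse-order rule of `atexit`.

```c
#include <stddef.h>

struct term_entry {
    void (*plain)(void);     /* set for atexit() registrations */
    void (*dtor)(void *);    /* set for __cxa_atexit() registrations */
    void  *arg;              /* NULL for atexit() */
    void  *handle;           /* NULL for atexit(), per the quoted text */
    int    done;
};

#define MAX_TERM 16

static struct term_entry term_list[MAX_TERM];
static size_t            term_count;

void term_reset(void) { term_count = 0; }

/* Model of atexit(f): only the function pointer is stored. */
int term_atexit(void (*f)(void))
{
    if (term_count == MAX_TERM)
        return -1;
    term_list[term_count++] = (struct term_entry){ f, NULL, NULL, NULL, 0 };
    return 0;
}

/* Model of __cxa_atexit(f, p, d): function, operand and handle are stored. */
int term_cxa_atexit(void (*f)(void *), void *p, void *d)
{
    if (term_count == MAX_TERM)
        return -1;
    term_list[term_count++] = (struct term_entry){ NULL, f, p, d, 0 };
    return 0;
}

/*
 * Model of __cxa_finalize(d): call each pending entry whose handle
 * matches d; d == NULL (program termination) calls everything left.
 */
void term_finalize(void *d)
{
    for (size_t i = term_count; i-- > 0; ) {
        struct term_entry *e = &term_list[i];
        if (e->done || (d != NULL && e->handle != d))
            continue;
        e->done = 1;
        if (e->plain != NULL)
            e->plain();
        else if (e->dtor != NULL)
            e->dtor(e->arg);
    }
}

/* Sample handlers so the model can be exercised. */
int  plain_runs;
void sample_plain(void)     { plain_runs++; }
void sample_dtor(void *arg) { ++*(int *) arg; }
```

In this model term_finalize(&lib) runs only the entry registered with
that handle, while the atexit entry, stored with a NULL handle, waits
until term_finalize(NULL).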

Well it's a goof up, but FreeBSD and MacOS aren't doing any better.

Ciao
   Nat!


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Problem with atexit and _dl_fini
  2019-06-13 22:53                   ` Nat!
@ 2019-06-14 12:29                     ` Manfred
  2019-06-14 15:14                     ` Adhemerval Zanella
  1 sibling, 0 replies; 21+ messages in thread
From: Manfred @ 2019-06-14 12:29 UTC (permalink / raw)
  To: libc-help, libc-alpha

Interesting,

In fact the link you posted is about the LSB, not specific to the 
Itanium C++ ABI, and indeed it does the right thing.

As a side note, it leaves up to user code the following:
- do not register with atexit functions residing in dso's that can be 
unloaded early (i.e. by dlclose() during execution, not at exit()).
- do not instantiate global/static C++ objects whose type is defined in 
the main program and derived from classes defined in dso's that can be 
unloaded early.

Both requirements descend (implicitly?) from the C and C++ standards, 
though.

I'm cross-posting to alpha, in case anyone is interested.


On 6/14/2019 12:53 AM, Nat! wrote:
> Funnily enough, if you read the Itanium C++ ABI, on which __cxa_finalize 
> is based, then the algorithm described
> there is doing exactly the right thing.
> Beause the wording of __cxa_finalize is so shortened, it its hard to 
> pick out the original meaning. But the description is
> actually fully compatible with how `atexit` is supposed to function.
> 
> The gist is this. For atexit, functions are stored in a unique way in 
> the termination function table (clarifications in []):
> 
> http://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.html#BASELIB---CXA-FINALIZE 
> 
> 
> ```
> In the latter case [atexit] the pointer to the function is the pointer 
> passed to atexit(), while the other pointers [operand, handle] are NULL.
> ```
> 
> When dlclose hits, the handle to be closed is `d` and not NULL:
> 
> ```
> The implementation shall arrange for__cxa_finalize() to be called during 
> early shared library unload (e.g. dlclose()) with a handle to the shared 
> library.
> ```
> 
> And then
> 
> ```
> When __cxa_finalize(d) is called, it shall walk the termination function 
> list, calling each in turn if d matches the handle of the termination 
> function entry.
> ```
> 
> So `atexit`s don't match, since the handle stored is NULL. Only if `d` 
> is NULL (the base process terminates), then will the atexits be called. 
> Currently though at `dlclose` time all handlers are called, which breaks 
> the `atexit` specification as well as your own LSB.
> 
> Well it's a goof up, but FreeBSD and MacOS aren't doing any better.
> 
> Ciao
>    Nat!
> 
> 

* Re: Problem with atexit and _dl_fini
  2019-06-13 22:53                   ` Nat!
  2019-06-14 12:29                     ` Manfred
@ 2019-06-14 15:14                     ` Adhemerval Zanella
  1 sibling, 0 replies; 21+ messages in thread
From: Adhemerval Zanella @ 2019-06-14 15:14 UTC (permalink / raw)
  To: libc-help



On 13/06/2019 19:53, Nat! wrote:
> Funnily enough, if you read the Itanium C++ ABI, on which __cxa_finalize is based, then the algorithm described
> there is doing exactly the right thing.
> Because the wording of __cxa_finalize is so shortened, it is hard to pick out the original meaning. But the description is
> actually fully compatible with how `atexit` is supposed to function.
> 
> The gist is this. For atexit, functions are stored in a unique way in the termination function table (clarifications in []):
> 
> http://refspecs.linuxbase.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic.html#BASELIB---CXA-FINALIZE
> 
> ```
> In the latter case [atexit] the pointer to the function is the pointer passed to atexit(), while the other pointers [operand, handle] are NULL.
> ```
> 
> When dlclose hits, the handle to be closed is `d` and not NULL:
> 
> ```
> The implementation shall arrange for __cxa_finalize() to be called during early shared library unload (e.g. dlclose()) with a handle to the shared library.
> ```
> 
> And then
> 
> ```
> When __cxa_finalize(d) is called, it shall walk the termination function list, calling each in turn if d matches the handle of the termination function entry.
> ```
> 
> So `atexit`s don't match, since the handle stored is NULL. Only if `d` is NULL (the base process terminates), then will the atexits be called. Currently though at `dlclose` time all handlers are called, which breaks the `atexit` specification as well as your own LSB.
> 
> Well it's a goof up, but FreeBSD and MacOS aren't doing any better.
> 

The problem is that glibc currently implements atexit on top of __cxa_atexit:

---
/* Register FUNC to be executed by `exit'.  */
int
#ifndef atexit
attribute_hidden
#endif
atexit (void (*func) (void))
{
  return __cxa_atexit ((void (*) (void *)) func, NULL, __dso_handle);
}
---

This wrapper is linked in from a static library provided by glibc
(libc_nonshared.a). The compiler then defines the __dso_handle variable
to be a unique value for each shared object (in libgcc, in the gcc case),
and the static linking allows the atexit registration to use that value.

This is by design, to make atexit handlers behave like the __cxa_atexit
handlers created by the compiler itself.

What I advocated in a recent discussion on libc-alpha [1] is indeed to
follow what you described. My initial suggestion was to register atexit
handlers through a different mechanism, so that they would be distinct
from __cxa_atexit handlers. They would then not be called by
__cxa_finalize (NULL); rather, exit() would be responsible for actually
calling them.

It causes a semantic change though: dlclose will need to actually remove
the atexit handlers that the shared library registered (because we can't
issue a function callback whose text has potentially been unmapped).
That's why I think we will need to use another symbol to register atexit
handlers, since we will need to pass the __dso_handle value to libc to
allow __cxa_finalize to remove the handlers on dlclose.

I have a WIP patch to fix this; I will push it to a user branch if you
want to check it out.

[1] https://sourceware.org/ml/libc-alpha/2019-06/msg00229.html


Thread overview: 21+ messages
2019-05-18 21:23 Problem with atexit and _dl_fini Nat!
2019-05-19 16:23 ` Florian Weimer
2019-05-19 19:37   ` Nat!
2019-05-21 20:43     ` Adhemerval Zanella
2019-05-22 10:22       ` Nat!
2019-05-22 15:01         ` Adhemerval Zanella
2019-05-22 15:29           ` Nat!
2019-05-22 19:35             ` Adhemerval Zanella
2019-05-29 21:16               ` Nat!
2019-06-09 20:59     ` Nat!
2019-06-10 11:48       ` Adhemerval Zanella
2019-06-10 13:08         ` Nat!
2019-06-10 20:27           ` Adhemerval Zanella
2019-06-11 18:39             ` Adhemerval Zanella
2019-06-11 20:20               ` Nat!
2019-06-11 22:40                 ` Nat!
2019-06-12  3:41                   ` Carlos O'Donell
2019-06-13 22:53                   ` Nat!
2019-06-14 12:29                     ` Manfred
2019-06-14 15:14                     ` Adhemerval Zanella
2019-06-11 18:53             ` Nat!
