public inbox for libc-alpha@sourceware.org
* Calling __cxa_thread_atexit_impl directly, from C code
@ 2022-08-26  8:31 Florian Weimer
  2022-08-29 18:59 ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Weimer @ 2022-08-26  8:31 UTC (permalink / raw)
  To: libc-alpha

Do we support calling __cxa_thread_atexit_impl directly?  The function
is not documented, nor is it declared in any installed header.

I think the answer is yes, because we support different C++
implementations, and they have to call __cxa_thread_atexit_impl for
thread-local destructors, either directly or through a wrapper
(__cxa_thread_atexit for libstdc++, as proposed for the Itanium C++
ABI).

There's interest in this because it's much easier to use than
pthread_key_create if you have to avoid linking against libpthread and
you target glibc releases between 2.18 and 2.33 (although it is possible
to use weak symbols, as in the libstdc++ implementation in
libstdc++-v3/libsupc++/atexit_thread.cc).
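
A minimal sketch of the weak-symbol technique, applied here to
__cxa_thread_atexit_impl itself (this is not the libstdc++ code).  The
prototype is an assumption, since no installed header declares the
function; try_thread_atexit, bump, worker, and demo_weak_thread_atexit are
hypothetical names, and __dso_handle is assumed to be provided by the
toolchain's startup files:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Weak reference: the address is null at run time if the C library does
   not export the symbol.  The prototype is an assumption, because no
   installed header declares the function.  */
extern int __cxa_thread_atexit_impl (void (*func) (void *), void *obj,
                                     void *dso_symbol)
  __attribute__ ((__weak__));

/* Assumed to be defined by the toolchain's startup files.  */
extern void *__dso_handle __attribute__ ((__visibility__ ("hidden")));

/* Hypothetical helper: returns -1 if the symbol is missing, so that the
   caller can fall back to something else; otherwise forwards the call.  */
static int
try_thread_atexit (void (*func) (void *), void *obj)
{
  assert (func != NULL);
  if (__cxa_thread_atexit_impl == NULL)
    return -1;
  return __cxa_thread_atexit_impl (func, obj, &__dso_handle);
}

static void
bump (void *obj)
{
  ++*(int *) obj;
}

static void *
worker (void *counter)
{
  /* Ask for bump (counter) to run when this thread exits.  */
  try_thread_atexit (bump, counter);
  return NULL;
}

/* Runs one thread and returns how often the callback fired, or -1 if
   the thread could not be started.  */
int
demo_weak_thread_atexit (void)
{
  int counter = 0;
  pthread_t thr;
  if (pthread_create (&thr, NULL, worker, &counter) != 0)
    return -1;
  pthread_join (thr, NULL);
  return counter;
}
```

The demo itself uses pthread_create only to obtain a second thread, so on
glibc releases before 2.34 it would have to be built with -pthread; the
registration helper does not reference any libpthread symbol.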

Thanks,
Florian



* Re: Calling __cxa_thread_atexit_impl directly, from C code
  2022-08-26  8:31 Calling __cxa_thread_atexit_impl directly, from C code Florian Weimer
@ 2022-08-29 18:59 ` Adhemerval Zanella Netto
  2022-08-29 19:21   ` Florian Weimer
  0 siblings, 1 reply; 7+ messages in thread
From: Adhemerval Zanella Netto @ 2022-08-29 18:59 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha



On 26/08/22 05:31, Florian Weimer via Libc-alpha wrote:
> Do we support calling __cxa_thread_atexit_impl directly?  The function
> is not documented, nor is it declared in any installed header.
> 
> I think the answer is yes, because we support different C++
> implementations, and they have to call __cxa_thread_atexit_impl for
> thread-local destructors, either directly or through a wrapper
> (__cxa_thread_atexit for libstdc++, as proposed for the Itanium C++
> ABI).

Not only that, but if the C library does not provide it, the C++ runtime's
replacement will most likely have serious drawbacks, such as no handling of
dso_symbol (so no synchronization between thread_local destructors and
dlclose) and no ordering enforcement for thread_local destructors in the
main thread.
The libc++ implementation does document the possible issues with the
fallback.

> 
> There's interest in this because it's much easier to use than
> pthread_key_create if you have to avoid linking against libpthread and
> you target glibc releases between 2.18 and 2.33 (although it is possible
> to use weak symbols, as in the libstdc++ implementation in
> libstdc++-v3/libsupc++/atexit_thread.cc).

Although it seems to be used solely by C++, and the interface is generic
enough that any runtime/language might use it as well for thread-exit
deallocation cleanup, I am not sure it would be feasible to export or
support calling it from C code.

It does make sense to use __cxa_thread_atexit for C++ since it aims to have
interoperability with C, but it is not clear to me how useful it would be for
a language or runtime that does not aim for it (it would be simple to just
do all the required work on the thread wrapper or similar facility).

The interface also has two annoying peculiarities that make calling it from
C code less than straightforward:

  1. Any memory allocation failure aborts the process, which is far from
     ideal for a generic interface.

  2. Users need to declare __dso_handle correctly; using NULL (or any invalid
     value) will bind the callback to the main program (which is not correct
     if used within a shared library).

So I think it would be better to provide a different interface.


* Re: Calling __cxa_thread_atexit_impl directly, from C code
  2022-08-29 18:59 ` Adhemerval Zanella Netto
@ 2022-08-29 19:21   ` Florian Weimer
  2022-08-29 19:56     ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Weimer @ 2022-08-29 19:21 UTC (permalink / raw)
  To: Adhemerval Zanella Netto; +Cc: libc-alpha

* Adhemerval Zanella Netto:

> Although it seems to be used solely by C++, and the interface is generic
> enough that any runtime/language might use it as well for thread-exit
> deallocation cleanup, I am not sure it would be feasible to export or
> support calling it from C code.

It's an exported symbol, and usually we treat those as part of the ABI.

> The interface also has two annoying peculiarities that make calling it from
> C code less than straightforward:
>
>   1. Any memory allocation failure aborts the process, which is far from
>      ideal for a generic interface.
>
>   2. Users need to declare __dso_handle correctly; using NULL (or any invalid
>      value) will bind the callback to the main program (which is not correct
>      if used within a shared library).
>
> So I think it would be better to provide a different interface.

We could add this:

static inline int
thread_atexit (void (*__callback) (void *__data), void *__data)
{
  extern void *__dso_handle __attribute__ ((__visibility__ ("hidden")));
  extern int __cxa_thread_atexit_impl (void (*__callback) (void *__data),
                                        void *__data, void *__caller);
  return __cxa_thread_atexit_impl (__callback, __data, &__dso_handle);
}

to <stdlib.h> with some appropriate preprocessor conditionals.
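
As a sketch of how such a wrapper might be used from C: each thread below
allocates a buffer and registers a callback that frees it at thread exit.
The three-argument prototype is an assumption (the function is not declared
in any installed header), the parameter names are changed to non-reserved
ones, and release_buffer, worker, and demo_thread_atexit are hypothetical
names used only for illustration:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Assumed declarations; neither appears in an installed header.  */
extern void *__dso_handle __attribute__ ((__visibility__ ("hidden")));
extern int __cxa_thread_atexit_impl (void (*callback) (void *data),
                                     void *data, void *caller);

/* The proposed wrapper, with non-reserved parameter names.  */
static inline int
thread_atexit (void (*callback) (void *data), void *data)
{
  assert (callback != NULL);
  return __cxa_thread_atexit_impl (callback, data, &__dso_handle);
}

enum { THREAD_COUNT = 3 };

/* Counts how many per-thread buffers have been released.  */
static atomic_int released;

static void
release_buffer (void *buffer)
{
  free (buffer);
  atomic_fetch_add (&released, 1);
}

static void *
worker (void *unused)
{
  (void) unused;
  char *buffer = malloc (64);
  /* If registration reports an error, release the buffer right away.  */
  if (buffer != NULL && thread_atexit (release_buffer, buffer) != 0)
    free (buffer);
  return NULL;
}

/* Starts THREAD_COUNT threads and returns the number of buffers that
   were released by the registered callbacks.  */
int
demo_thread_atexit (void)
{
  pthread_t threads[THREAD_COUNT];
  int started = 0;
  atomic_store (&released, 0);
  while (started < THREAD_COUNT
         && pthread_create (&threads[started], NULL, worker, NULL) == 0)
    ++started;
  for (int i = 0; i < started; ++i)
    pthread_join (threads[i], NULL);
  return atomic_load (&released);
}
```

Because the wrapper passes &__dso_handle itself, the caller never has to
declare or choose the DSO handle, which addresses the second peculiarity
listed above; the abort on allocation failure is unchanged.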

Maybe libgcc_s should do the error checking in its __cxa_thread_atexit
function?  Then we could simply use the existing implementation in the
function above, remove our abort, and defer the problem to the
thread_atexit caller.

Thanks,
Florian



* Re: Calling __cxa_thread_atexit_impl directly, from C code
  2022-08-29 19:21   ` Florian Weimer
@ 2022-08-29 19:56     ` Adhemerval Zanella Netto
  2022-08-30  7:37       ` Florian Weimer
  0 siblings, 1 reply; 7+ messages in thread
From: Adhemerval Zanella Netto @ 2022-08-29 19:56 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha



On 29/08/22 16:21, Florian Weimer wrote:
> * Adhemerval Zanella Netto:
> 
>> Although it seems to be used solely by C++, and the interface is generic
>> enough that any runtime/language might use it as well for thread-exit
>> deallocation cleanup, I am not sure it would be feasible to export or
>> support calling it from C code.
> 
> It's an exported symbol, and usually we treat those as part of the ABI.

Right, but the double underscore also denotes that this is an
implementation-specific symbol, similar to the fortify ones (where, although
glibc is bound to support them indefinitely, it might not call them if, for
instance, fortify changes to a compiler-generated builtin).

> 
>> The interface also has two annoying peculiarities that make calling it from
>> C code less than straightforward:
>>
>>   1. Any memory allocation failure aborts the process, which is far from
>>      ideal for a generic interface.
>>
>>   2. Users need to declare __dso_handle correctly; using NULL (or any invalid
>>      value) will bind the callback to the main program (which is not correct
>>      if used within a shared library).
>>
>> So I think it would be better to provide a different interface.
> 
> We could add this:
> 
> static inline int
> thread_atexit (void (*__callback) (void *__data), void *__data)
> {
>   extern void *__dso_handle __attribute__ ((__visibility__ ("hidden")));
>   extern int __cxa_thread_atexit_impl (void (*__callback) (void *__data),
>                                         void *__data, void *__caller);
>   return __cxa_thread_atexit_impl (__callback, __data, &__dso_handle);
> }
> 
> to <stdlib.h> with some appropriate preprocessor conditionals.

Or make it similar to atexit and provide it with static-only-routines.
It would simplify the prototype and header definitions.

> 
> Maybe libgcc_s should do the error checking in its __cxa_thread_atexit
> function?  Then we could simply use the existing implementation in the
> function above, remove our abort, and defer the problem to the
> thread_atexit caller.

It would mean that libgcc_s would need to build and use the fallback
implementation in the case of failure, which is suboptimal (although I am
not sure it would be an improvement over aborting on failure).

But I also think that, for compat reasons, we can't really change
__cxa_thread_atexit_impl, since C++ constructors will be the ones responsible
for calling __cxa_thread_atexit, and afaik it is assumed that it cannot fail
(meaning that any failure will be ignored).


* Re: Calling __cxa_thread_atexit_impl directly, from C code
  2022-08-29 19:56     ` Adhemerval Zanella Netto
@ 2022-08-30  7:37       ` Florian Weimer
  2022-08-30 12:56         ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 7+ messages in thread
From: Florian Weimer @ 2022-08-30  7:37 UTC (permalink / raw)
  To: Adhemerval Zanella Netto; +Cc: libc-alpha

* Adhemerval Zanella Netto:

> It would mean that libgcc_s would need to build and use the fallback
> implementation in the case of failure, which is suboptimal (although I am
> not sure it would be an improvement over aborting on failure).

The fallback implementation also has to allocate memory.

The alternative would be to throw std::bad_alloc.

> But I also think that, for compat reasons, we can't really change
> __cxa_thread_atexit_impl, since C++ constructors will be the ones responsible
> for calling __cxa_thread_atexit, and afaik it is assumed that it cannot fail
> (meaning that any failure will be ignored).

Yes, there is the general problem that for registering an object for
destruction, as a matter of principle, you need to try to allocate the
data structure in the registry first, and if that is successful, create
the object.  Otherwise you may end up with an object and no way to
register its destructor.  Perhaps you should just call the destructor at
this point and throw std::bad_alloc.
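
The ordering described above (reserve the registry slot first, construct
the object only afterwards) can be sketched in plain C.  This is an
illustration under assumptions, not glibc or libstdc++ code: the registry
is a simple linked list, and dtor_entry, dtor_registry, registry_reserve,
registry_commit, registry_run, and create_registered_string are all
hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct dtor_entry
{
  void (*dtor) (void *);
  void *obj;
  struct dtor_entry *next;
};

struct dtor_registry
{
  struct dtor_entry *head;
};

/* Step 1: reserve a registry slot.  This is the only step that can fail
   inside the registry, and nothing has been constructed yet.  */
static struct dtor_entry *
registry_reserve (void)
{
  return malloc (sizeof (struct dtor_entry));
}

/* Step 2: attach a constructed object to a reserved slot.  Cannot fail.  */
static void
registry_commit (struct dtor_registry *reg, struct dtor_entry *entry,
                 void (*dtor) (void *), void *obj)
{
  assert (entry != NULL);
  entry->dtor = dtor;
  entry->obj = obj;
  entry->next = reg->head;
  reg->head = entry;
}

/* Runs the destructors, newest registration first, and returns how many
   were called.  */
static int
registry_run (struct dtor_registry *reg)
{
  int count = 0;
  while (reg->head != NULL)
    {
      struct dtor_entry *entry = reg->head;
      reg->head = entry->next;
      entry->dtor (entry->obj);
      free (entry);
      ++count;
    }
  return count;
}

/* Creates a heap string whose release is registered in REG.  Returns NULL,
   with nothing left to clean up, if either allocation fails.  */
static char *
create_registered_string (struct dtor_registry *reg, const char *text)
{
  struct dtor_entry *entry = registry_reserve ();
  if (entry == NULL)
    return NULL;                /* Fail before the object exists.  */
  char *copy = malloc (strlen (text) + 1);
  if (copy == NULL)
    {
      free (entry);             /* Give the unused slot back.  */
      return NULL;
    }
  strcpy (copy, text);
  registry_commit (reg, entry, free, copy);
  return copy;
}
```

With this split, a failed reservation is reported before any object exists,
so there is never a constructed object without a registered destructor.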

I guess we should go with the static destructor counting approach
instead. 8-/

Thanks,
Florian



* Re: Calling __cxa_thread_atexit_impl directly, from C code
  2022-08-30  7:37       ` Florian Weimer
@ 2022-08-30 12:56         ` Adhemerval Zanella Netto
  2022-09-06  6:44           ` Florian Weimer
  0 siblings, 1 reply; 7+ messages in thread
From: Adhemerval Zanella Netto @ 2022-08-30 12:56 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha



On 30/08/22 04:37, Florian Weimer wrote:
> * Adhemerval Zanella Netto:
> 
>> It would mean that libgcc_s would need to build and use the fallback
>> implementation in the case of failure, which is suboptimal (although I am
>> not sure it would be an improvement over aborting on failure).
> 
> The fallback implementation also has to allocate memory.
> 
> The alternative would be to throw std::bad_alloc.

Yeah, but the fallback is suboptimal not solely because of the memory
allocation, but also because of the missing synchronization and ordering.
But I also think moving the failure handling to the caller is still better
than the hard hammer of aborting the process (even though I agree it won't
improve things that much).

> 
>> But I also think that, for compat reasons, we can't really change
>> __cxa_thread_atexit_impl, since C++ constructors will be the ones responsible
>> for calling __cxa_thread_atexit, and afaik it is assumed that it cannot fail
>> (meaning that any failure will be ignored).
> 
> Yes, there is the general problem that for registering an object for
> destruction, as a matter of principle, you need to try to allocate the
> data structure in the registry first, and if that is successful, create
> the object.  Otherwise you may end up with an object and no way to
> register its destructor.  Perhaps you should just call the destructor at
> this point and throw std::bad_alloc.
> 
> I guess we should go with the static destructor counting approach
> instead. 8-/

Which strategy, more specifically, do you mean by the counting approach?  I
just reread the 'Counting static __cxa_atexit calls' thread and tend to agree
with you that having the number of required unique __cxa_atexit calls seems
slightly better than a failable .init_array mode.


* Re: Calling __cxa_thread_atexit_impl directly, from C code
  2022-08-30 12:56         ` Adhemerval Zanella Netto
@ 2022-09-06  6:44           ` Florian Weimer
  0 siblings, 0 replies; 7+ messages in thread
From: Florian Weimer @ 2022-09-06  6:44 UTC (permalink / raw)
  To: Adhemerval Zanella Netto; +Cc: libc-alpha

* Adhemerval Zanella Netto:

> On 30/08/22 04:37, Florian Weimer wrote:
>> * Adhemerval Zanella Netto:
>> 
>>> It would mean that libgcc_s would need to build and use the fallback
>>> implementation in the case of failure, which is suboptimal (although I am
>>> not sure it would be an improvement over aborting on failure).
>> 
>> The fallback implementation also has to allocate memory.
>> 
>> The alternative would be to throw std::bad_alloc.
>
> Yeah, but the fallback is suboptimal not solely because of the memory
> allocation, but also because of the missing synchronization and ordering.
> But I also think moving the failure handling to the caller is still better
> than the hard hammer of aborting the process (even though I agree it won't
> improve things that much).

But I think this argues indirectly for making a __cxa_thread_atexit_impl
variant callable from C code: C++ code with -fno-exceptions might want
to use this as well, and there is really no good way to handle such
fallible C++ constructs (here: implicit destructor registration) without
exceptions.

>>> But I also think that, for compat reasons, we can't really change
>>> __cxa_thread_atexit_impl, since C++ constructors will be the ones responsible
>>> for calling __cxa_thread_atexit, and afaik it is assumed that it cannot fail
>>> (meaning that any failure will be ignored).
>> 
>> Yes, there is the general problem that for registering an object for
>> destruction, as a matter of principle, you need to try to allocate the
>> data structure in the registry first, and if that is successful, create
>> the object.  Otherwise you may end up with an object and no way to
>> register its destructor.  Perhaps you should just call the destructor at
>> this point and throw std::bad_alloc.
>> 
>> I guess we should go with the static destructor counting approach
>> instead. 8-/
>
> Which strategy, more specifically, do you mean by the counting approach?

If we know how many TLS variables with d'tors there can be, we can
allocate the memory upfront during thread creation.
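
A rough sketch of that idea, assuming the number of TLS variables with
destructors is known in advance: the table is allocated once, where thread
creation can still report failure, and later registrations only fill
reserved slots.  This is not glibc code; dtor_slot, dtor_table, the
dtor_table_* functions, count_call, and demo_preallocated are hypothetical
names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct dtor_slot
{
  void (*dtor) (void *);
  void *obj;
};

struct dtor_table
{
  size_t capacity;
  size_t used;
  struct dtor_slot slots[];     /* Sized once, at thread creation.  */
};

/* Called once at (simulated) thread creation.  COUNT is the assumed,
   statically known number of TLS variables with destructors.  This is
   the only step that can fail.  */
static struct dtor_table *
dtor_table_create (size_t count)
{
  struct dtor_table *table
    = malloc (sizeof (*table) + count * sizeof (struct dtor_slot));
  if (table != NULL)
    {
      table->capacity = count;
      table->used = 0;
    }
  return table;
}

/* Registration fills a reserved slot and needs no allocation.  Running
   out of slots would mean that the static count was wrong.  */
static void
dtor_table_register (struct dtor_table *table, void (*dtor) (void *),
                     void *obj)
{
  assert (table->used < table->capacity);
  table->slots[table->used].dtor = dtor;
  table->slots[table->used].obj = obj;
  ++table->used;
}

/* Called at (simulated) thread exit: newest registration first.  */
static void
dtor_table_run_and_free (struct dtor_table *table)
{
  while (table->used > 0)
    {
      --table->used;
      table->slots[table->used].dtor (table->slots[table->used].obj);
    }
  free (table);
}

static void
count_call (void *obj)
{
  ++*(int *) obj;
}

/* Registers two destructors in a table sized for two and returns how
   many ran, or -1 if the upfront allocation failed.  */
int
demo_preallocated (void)
{
  int calls = 0;
  struct dtor_table *table = dtor_table_create (2);
  if (table == NULL)
    return -1;
  dtor_table_register (table, count_call, &calls);
  dtor_table_register (table, count_call, &calls);
  dtor_table_run_and_free (table);
  return calls;
}
```

The design choice here is to move the single fallible allocation to a point
where failure can be reported (thread creation), so that destructor
registration itself has no error path.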

> I just reread the 'Counting static __cxa_atexit calls' thread and tend
> to agree with you that having the number of required unique
> __cxa_atexit calls seems slightly better than a failable .init_array
> mode.

Please say so on the thread as well. 8-)

Thanks,
Florian

