* Counting static __cxa_atexit calls
@ 2022-08-23 11:58 Florian Weimer
2022-08-23 12:28 ` Nick Clifton
2022-08-23 13:40 ` Michael Matz
0 siblings, 2 replies; 7+ messages in thread
From: Florian Weimer @ 2022-08-23 11:58 UTC (permalink / raw)
To: binutils; +Cc: gcc, libc-alpha
We currently have a latent bug in glibc where C++ constructor calls can
fail if they have static or thread storage duration and a non-trivial
destructor. The reason is that __cxa_atexit (and
__cxa_thread_atexit_impl) may have to allocate memory. We can avoid
that if we know how many such static calls exist in an object (for C++,
the compiler will never emit these calls repeatedly in a loop). Then we
can allocate the resources beforehand, either during process and thread
start, or when dlopen is called and new objects are loaded.
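A minimal sketch of the pre-allocation idea follows. It is not glibc's
implementation; ExitRegistry, reserve, add and run_all are hypothetical
names chosen for illustration. The point it shows is that only reserve()
can run out of memory, so the failure can be reported at load time,
while each later registration just fills a slot that already exists.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical fixed-capacity registry; not glibc's actual data structure.
class ExitRegistry {
public:
    using Handler = void (*)(void*);

    // Reserve room for `count` more registrations. This is the only step
    // that allocates, so it is the only step that can fail (softly).
    bool reserve(std::size_t count) noexcept {
        std::size_t wanted = capacity_ + count;
        std::unique_ptr<Entry[]> grown(new (std::nothrow) Entry[wanted]);
        if (!grown) return false;
        for (std::size_t i = 0; i < size_; ++i) grown[i] = slots_[i];
        slots_ = std::move(grown);
        capacity_ = wanted;
        return true;
    }

    // Register a handler in a previously reserved slot; never allocates.
    bool add(Handler fn, void* arg) noexcept {
        if (size_ == capacity_) return false;
        slots_[size_++] = Entry{fn, arg};
        return true;
    }

    // Run the handlers in reverse registration order.
    void run_all() noexcept {
        while (size_ > 0) {
            Entry e = slots_[--size_];
            e.fn(e.arg);
        }
    }

    std::size_t size() const noexcept { return size_; }

private:
    struct Entry { Handler fn; void* arg; };
    std::unique_ptr<Entry[]> slots_;
    std::size_t size_ = 0;
    std::size_t capacity_ = 0;
};
```

In this sketch a loader would call reserve() with the per-object count
and fail dlopen if it returns false; the generated code would then call
add() for each object with a non-trivial destructor.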
What would be the most ELF-flavored way to implement this? After the
final link, I expect that the count (or counts, we need a separate
counter for thread-local storage) would show up under a new dynamic tag
in the dynamic segment. This is actually a very good fit because older
loaders will just ignore it. But the question remains what GCC should
emit into assembler & object files, so that the link editor can compute
the total count from that.
Thanks,
Florian
* Re: Counting static __cxa_atexit calls
2022-08-23 11:58 Counting static __cxa_atexit calls Florian Weimer
@ 2022-08-23 12:28 ` Nick Clifton
2022-08-23 13:40 ` Michael Matz
1 sibling, 0 replies; 7+ messages in thread
From: Nick Clifton @ 2022-08-23 12:28 UTC (permalink / raw)
To: Florian Weimer, binutils; +Cc: gcc, libc-alpha
Hi Florian,
> What would be the most ELF-flavored way to implement this? After the
> final link, I expect that the count (or counts, we need a separate
> counter for thread-local storage) would show up under a new dynamic tag
> in the dynamic segment. This is actually a very good fit because older
> loaders will just ignore it. But the question remains what GCC should
> emit into assembler & object files, so that the link editor can compute
> the total count from that.
(It would be worthwhile asking this question of the LLVM community too,
since ideally we would like to use the same method in both compilers).
This sounds like an opportunity to add a couple of new GNU object
attributes:
  .gnu_attribute Tag_gnu_destructor_count, <number>
  .gnu_attribute Tag_gnu_tld_count, <count>
These would then translate into GNU object attribute notes in the
object file.
Cheers
Nick
* Re: Counting static __cxa_atexit calls
2022-08-23 11:58 Counting static __cxa_atexit calls Florian Weimer
2022-08-23 12:28 ` Nick Clifton
@ 2022-08-23 13:40 ` Michael Matz
2022-08-24 12:06 ` Florian Weimer
1 sibling, 1 reply; 7+ messages in thread
From: Michael Matz @ 2022-08-23 13:40 UTC (permalink / raw)
To: Florian Weimer; +Cc: binutils, gcc, libc-alpha
Hello,
On Tue, 23 Aug 2022, Florian Weimer via Gcc wrote:
> We currently have a latent bug in glibc where C++ constructor calls can
> fail if they have static or thread storage duration and a non-trivial
> destructor. The reason is that __cxa_atexit (and
> __cxa_thread_atexit_impl) may have to allocate memory. We can avoid
> that if we know how many such static calls exist in an object (for C++,
> the compiler will never emit these calls repeatedly in a loop). Then we
> can allocate the resources beforehand, either during process and thread
> start, or when dlopen is called and new objects are loaded.
Isn't this merely moving the failure point from exception-at-ctor to
dlopen-fails? If an individual __cxa_atexit can't allocate memory anymore
for its list structure, why should pre-allocation (which is still dynamic,
based on the number of actual atexit calls) have any more luck?
> What would be the most ELF-flavored way to implement this? After the
> final link, I expect that the count (or counts, we need a separate
> counter for thread-local storage) would show up under a new dynamic tag
> in the dynamic segment. This is actually a very good fit because older
> loaders will just ignore it. But the question remains what GCC should
> emit into assembler & object files, so that the link editor can compute
> the total count from that.
Probably a note section, which the link editor could either transform into
a dynamic tag or leave as note(s) in the PT_NOTE segment. The latter
wouldn't require any specific tooling support in the link editor. But the
consumer would have to iterate through all the notes to add the
individual counts together. Might be acceptable, though.
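As a rough sketch of what "iterate through all the notes" could look
like on the consumer side: the code below walks a buffer laid out like
ELF note data (namesz, descsz, type, then name and descriptor, each
padded to 4 bytes) and sums a 32-bit count from every matching note.
The note type value 0x1234 and the use of the "GNU" name are pure
assumptions, since no such note has been assigned, and the fields are
read in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical note name and type; nothing like this has been allocated.
constexpr char kNoteName[] = "GNU";
constexpr std::uint32_t kNoteTypeAtexitCount = 0x1234;

constexpr std::size_t align4(std::size_t n) { return (n + 3) & ~std::size_t{3}; }

// Sum the 32-bit counts of all matching notes in [data, data + size).
std::uint64_t sum_atexit_counts(const unsigned char* data, std::size_t size) {
    std::uint64_t total = 0;
    std::size_t off = 0;
    while (size - off >= 12) {
        std::uint32_t namesz, descsz, type;
        std::memcpy(&namesz, data + off, 4);
        std::memcpy(&descsz, data + off + 4, 4);
        std::memcpy(&type, data + off + 8, 4);
        off += 12;
        std::size_t name_len = align4(namesz);
        std::size_t desc_len = align4(descsz);
        // Stop on a truncated note instead of reading past the buffer.
        if (name_len > size - off || desc_len > size - off - name_len) break;
        const unsigned char* name = data + off;
        const unsigned char* desc = name + name_len;
        if (type == kNoteTypeAtexitCount && namesz == sizeof kNoteName &&
            std::memcmp(name, kNoteName, namesz) == 0 && descsz == 4) {
            std::uint32_t count;
            std::memcpy(&count, desc, 4);
            total += count;
        }
        off += name_len + desc_len;
    }
    return total;
}

// Helper for experiments: append one note carrying a 32-bit count.
void append_note(std::vector<unsigned char>& buf, const char* name,
                 std::uint32_t type, std::uint32_t count) {
    std::uint32_t namesz = static_cast<std::uint32_t>(std::strlen(name) + 1);
    std::uint32_t descsz = 4;
    auto put = [&buf](const void* p, std::size_t n) {
        const unsigned char* b = static_cast<const unsigned char*>(p);
        buf.insert(buf.end(), b, b + n);
    };
    put(&namesz, 4);
    put(&descsz, 4);
    put(&type, 4);
    put(name, namesz);
    buf.resize(align4(buf.size()));  // zero-pad the name to 4 bytes
    put(&count, 4);
}
```

Summing this way needs no link editor support, but it counts every
object's contribution separately, so it cannot deduplicate anything.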
Ciao,
Michael.
* Re: Counting static __cxa_atexit calls
2022-08-23 13:40 ` Michael Matz
@ 2022-08-24 12:06 ` Florian Weimer
2022-08-24 12:53 ` Michael Matz
0 siblings, 1 reply; 7+ messages in thread
From: Florian Weimer @ 2022-08-24 12:06 UTC (permalink / raw)
To: Michael Matz; +Cc: binutils, gcc, libc-alpha
* Michael Matz:
> Hello,
>
> On Tue, 23 Aug 2022, Florian Weimer via Gcc wrote:
>
>> We currently have a latent bug in glibc where C++ constructor calls can
>> fail if they have static or thread storage duration and a non-trivial
>> destructor. The reason is that __cxa_atexit (and
>> __cxa_thread_atexit_impl) may have to allocate memory. We can avoid
>> that if we know how many such static calls exist in an object (for C++,
>> the compiler will never emit these calls repeatedly in a loop). Then we
>> can allocate the resources beforehand, either during process and thread
>> start, or when dlopen is called and new objects are loaded.
>
> Isn't this merely moving the failure point from exception-at-ctor to
> dlopen-fails?
Yes, and that is a soft error that can be handled (likewise for
pthread_create).
> If an individual __cxa_atexit can't allocate memory anymore for its
> list structure, why should pre-allocation (which is still dynamic,
> based on the number of actual atexit calls) have any more luck?
We can report the error properly, and not just terminate the process.
The existing ABI functions are mostly noexcept. For C++ constructors of
global objects, there cannot even be a handler because they are invoked
by an ELF constructor, and throwing through an ELF constructor is
undefined.
>> What would be the most ELF-flavored way to implement this? After the
>> final link, I expect that the count (or counts, we need a separate
>> counter for thread-local storage) would show up under a new dynamic tag
>> in the dynamic segment. This is actually a very good fit because older
>> loaders will just ignore it. But the question remains what GCC should
>> emit into assembler & object files, so that the link editor can compute
>> the total count from that.
>
> Probably a note section, which the link editor could either transform into
> a dynamic tag or leave as note(s) in the PT_NOTE segment. The latter
> wouldn't require any specific tooling support in the link editor. But the
> consumer would have to iterate through all the notes to add the
> individual counts together. Might be acceptable, though.
I think we need some level of link editor support to avoid drastically
over-counting multiple static calls that get merged into one
implementation as the result of vague linkage. Not sure how to express
that at the ELF level?
Thanks,
Florian
* Re: Counting static __cxa_atexit calls
2022-08-24 12:06 ` Florian Weimer
@ 2022-08-24 12:53 ` Michael Matz
2022-08-24 14:31 ` Florian Weimer
0 siblings, 1 reply; 7+ messages in thread
From: Michael Matz @ 2022-08-24 12:53 UTC (permalink / raw)
To: Florian Weimer; +Cc: binutils, gcc, libc-alpha
Hello,
On Wed, 24 Aug 2022, Florian Weimer wrote:
> > Isn't this merely moving the failure point from exception-at-ctor to
> > dlopen-fails?
>
> Yes, and that is a soft error that can be handled (likewise for
> pthread_create).
Makes sense. Though that actually hints at a design problem with ELF
static ctors/dtors: they should be able to soft-fail (leading to dlopen or
pthread_create error returns). So, maybe the _best_ way to deal with this
is to extend the definition of the various object-initialization means
in ELF to allow propagating failure.
> > Probably a note section, which the link editor could either transform into
> > a dynamic tag or leave as note(s) in the PT_NOTE segment. The latter
> > wouldn't require any specific tooling support in the link editor. But the
> > consumer would have to iterate through all the notes to add the
> > individual counts together. Might be acceptable, though.
>
> I think we need some level of link editor support to avoid drastically
> over-counting multiple static calls that get merged into one
> implementation as the result of vague linkage. Not sure how to express
> that at the ELF level?
Hmm. The __cxa_atexit calls are coming from the per-file local static
initialization_and_destruction routine which doesn't have vague linkage,
so its contribution to the overall number of cxa_atexit calls doesn't
change from .o to final-exe. Can you show an example of what you're
worried about?
A completely different way would be to not use cxa_atexit at all: allocate
memory statically for the object and dtor addresses in .rodata (instead of
in .text right now), and iterate over those at static_destruction time.
(For the thread-local ones it would need to store arguments to
__tls_get_addr).
Doing that or defining failure modes for ELF init/fini seems a better
design than hacking around the current limitation via counting static
cxa_atexit calls.
Ciao,
Michael.
* Re: Counting static __cxa_atexit calls
2022-08-24 12:53 ` Michael Matz
@ 2022-08-24 14:31 ` Florian Weimer
2022-08-24 15:25 ` Michael Matz
0 siblings, 1 reply; 7+ messages in thread
From: Florian Weimer @ 2022-08-24 14:31 UTC (permalink / raw)
To: Michael Matz; +Cc: binutils, gcc, libc-alpha
* Michael Matz:
> Hello,
>
> On Wed, 24 Aug 2022, Florian Weimer wrote:
>
>> > Isn't this merely moving the failure point from exception-at-ctor to
>> > dlopen-fails?
>>
>> Yes, and that is a soft error that can be handled (likewise for
>> pthread_create).
>
> Makes sense. Though that actually hints at a design problem with ELF
> static ctors/dtors: they should be able to soft-fail (leading to dlopen or
> pthread_create error returns). So, maybe the _best_ way to deal with this
> is to extend the definition of the various object-initialization means
> in ELF to allow propagating failure.
We could enable unwinding through the dynamic linker perhaps. But as I
said, those Itanium ABI functions tend to be noexcept, so there's work
on that front as well.
For thread-local storage, it's even more difficult because any first
access can throw even if the constructor is noexcept.
>> > Probably a note section, which the link editor could either transform into
>> > a dynamic tag or leave as note(s) in the PT_NOTE segment. The latter
>> > wouldn't require any specific tooling support in the link editor. But the
>> > consumer would have to iterate through all the notes to add the
>> > individual counts together. Might be acceptable, though.
>>
>> I think we need some level of link editor support to avoid drastically
>> over-counting multiple static calls that get merged into one
>> implementation as the result of vague linkage. Not sure how to express
>> that at the ELF level?
>
> Hmm. The __cxa_atexit calls are coming from the per-file local static
> initialization_and_destruction routine which doesn't have vague linkage,
> so its contribution to the overall number of cxa_atexit calls doesn't
> change from .o to final-exe. Can you show an example of what you're
> worried about?
Sorry if I didn't use the correct terminology.
I was thinking about this:
  #include <vector>

  template <int i>
  struct S {
    static std::vector<int *> vec;
  };

  template <int i> std::vector<int *> S<i>::vec(i);

  std::vector<int *> &
  f()
  {
    return S<1009>::vec;
  }

The initialization is deduplicated with the help of a guard variable,
and that also bounds the number of __cxa_atexit invocations to at most
one per type.
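The guard behaviour can be mimicked by hand. The sketch below is a
stand-in for compiler-generated code, not actual GCC output: all names
are hypothetical, an int plays the role of S<1009>::vec, and a vector
plays the role of the __cxa_atexit list. Every translation unit that
instantiates the member carries a copy of the initialization code, but
only the first one to run registers a destructor.

```cpp
#include <cassert>
#include <vector>

// Hand-written stand-in for generated code; all names are hypothetical.
struct Registration { void (*dtor)(void*); void* object; };
std::vector<Registration> g_registered;  // stand-in for the __cxa_atexit list

char g_guard_S1009_vec = 0;    // one guard per vague-linkage variable
int  g_storage_S1009_vec = 0;  // stand-in for the storage of S<1009>::vec

void destroy_int(void* p) { *static_cast<int*>(p) = 0; }

// Roughly what each translation unit's static-initialization code does
// for this variable; the linker keeps one guard and one storage copy.
void init_S1009_vec_from_some_tu() {
    if (g_guard_S1009_vec) return;   // another translation unit got here first
    g_guard_S1009_vec = 1;
    g_storage_S1009_vec = 1009;      // "construct" the object
    g_registered.push_back({destroy_int, &g_storage_S1009_vec});
}
```

Calling init_S1009_vec_from_some_tu() once per translation unit leaves
exactly one entry in g_registered, whereas a naive per-object count
would report one call per translation unit.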
> A completely different way would be to not use cxa_atexit at all: allocate
> memory statically for the object and dtor addresses in .rodata (instead of
> in .text right now), and iterate over those at static_destruction time.
> (For the thread-local ones it would need to store arguments to
> __tls_get_addr).
That only works if the compiler and linker can figure out the
construction order. In general, that is not possible, and that case
seems even quite common with C++. If the construction order is not
known ahead of time, it is necessary to record it somewhere, so that
destruction can happen in reverse. So I think storing things in .rodata
is out.
Thanks,
Florian
* Re: Counting static __cxa_atexit calls
2022-08-24 14:31 ` Florian Weimer
@ 2022-08-24 15:25 ` Michael Matz
0 siblings, 0 replies; 7+ messages in thread
From: Michael Matz @ 2022-08-24 15:25 UTC (permalink / raw)
To: Florian Weimer; +Cc: binutils, gcc, libc-alpha
Hello,
On Wed, 24 Aug 2022, Florian Weimer wrote:
> > On Wed, 24 Aug 2022, Florian Weimer wrote:
> >
> >> > Isn't this merely moving the failure point from exception-at-ctor to
> >> > dlopen-fails?
> >>
> >> Yes, and that is a soft error that can be handled (likewise for
> >> pthread_create).
> >
> > Makes sense. Though that actually hints at a design problem with ELF
> > static ctors/dtors: they should be able to soft-fail (leading to dlopen or
> > pthread_create error returns). So, maybe the _best_ way to deal with this
> > is to extend the definition of the various object-initialization means
> > in ELF to allow propagating failure.
>
> We could enable unwinding through the dynamic linker perhaps. But as I
> said, those Itanium ABI functions tend to be noexcept, so there's work
> on that front as well.
Yeah, my idea would have been slightly less ambitious: redefine the ABI of
.init_array functions to be able to return an int. The loader would abort
loading if any of them return non-zero. Now change GCC code emission of
those helper functions placed in .init_array to catch all exceptions and
(in case an exception happened) return non-zero. Or, even easier, don't
deal with exceptions, but rather just check if __cxa_atexit worked, and if
not return non-zero right away. That way all the exception propagation
(or cxa_atexit error handling) stays purely within the GCC generated code
and the dynamic loader only needs to deal with return values, not
exceptions and unwinding.
For backward compat we can't just change the ABI of .init_array, but we
can devise an alternative: .init_array_mayfail and the associated DT tags.
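A small sketch of the loader side of that idea, under the assumption
that entries of the hypothetical .init_array_mayfail section have type
int (*)() and return zero on success. FailableInit,
run_init_array_mayfail and the init_* functions are made-up names; the
real .init_array entries take arguments and return void.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical ABI: an initializer returns 0 on success, non-zero on failure.
using FailableInit = int (*)();

// Run the initializers in order and stop at the first failure. The index of
// the failing entry is reported so that a loader could undo earlier work and
// make dlopen return an error.
int run_init_array_mayfail(const FailableInit* begin, std::size_t count,
                           std::size_t* failed_at) {
    for (std::size_t i = 0; i < count; ++i) {
        int rc = begin[i]();
        if (rc != 0) {
            if (failed_at) *failed_at = i;
            return rc;
        }
    }
    return 0;
}

// Stand-ins for compiler-generated helpers. The failing one models a helper
// that saw a non-zero result from __cxa_atexit and passed it on.
int g_calls = 0;
int init_ok()            { ++g_calls; return 0; }
int init_atexit_failed() { ++g_calls; return -1; }
int init_not_reached()   { ++g_calls; return 0; }
```

With the table { init_ok, init_atexit_failed, init_not_reached }, the
runner stops at index 1 and never calls the third entry. No exception
crosses the loader boundary; only the return value does.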
> For thread-local storage, it's even more difficult because any first
> access can throw even if the constructor is noexcept.
That's extending the scope somewhat; pre-counting cxa_atexit wouldn't
solve this problem either, right?
> >> I think we need some level of link editor support to avoid drastically
> >> over-counting multiple static calls that get merged into one
> >> implementation as the result of vague linkage. Not sure how to express
> >> that at the ELF level?
> >
> > Hmm. The __cxa_atexit calls are coming from the per-file local static
> > initialization_and_destruction routine which doesn't have vague linkage,
> > so its contribution to the overall number of cxa_atexit calls doesn't
> > change from .o to final-exe. Can you show an example of what you're
> > worried about?
>
> Sorry if I didn't use the correct terminology.
>
> I was thinking about this:
>
>   #include <vector>
>
>   template <int i>
>   struct S {
>     static std::vector<int *> vec;
>   };
>
>   template <int i> std::vector<int *> S<i>::vec(i);
>
>   std::vector<int *> &
>   f()
>   {
>     return S<1009>::vec;
>   }
>
> The initialization is deduplicated with the help of a guard variable,
> and that also bounds the number of __cxa_atexit invocations to at most
> one per type.
Ah, right, thanks. The guard variable for class-local statics; I was
thinking of file-scope globals. Double-hmm. I don't readily see a nice way
to correctly precalculate the number of cxa_atexit calls here. A simple
problem is the following: assume a couple files each defining such class
templates, that ultimately define and initialize static members A<1>::a
and B<1>::b (assume vague linkage). Assume we have four files:
a: defines A::a
b: defines B::b
ab: defines A::a and B::b
ba: defines B::b and A::a
Now link order influences which file gets to actually initialize the
members and which ones skip it due to guard variables. But the object
files themselves don't have enough context to know which will be which.
Not even the link editor knows that, because the non-taken cxa_atexit
calls aren't in linkonce/group sections; they are all there in
object.o:.text:_Z41__static_initialization_and_destruction_0ii .
So, what would need to be emitted is for instance a list of cxa_atexit
calls plus guard variable; the link editor could then count all unguarded
cxa_atexit calls plus all guarded ones, but the latter only once per
guard. The key would be the identity of the guard variable.
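The counting rule just described can be sketched as follows. The record
format is entirely hypothetical (no such per-call metadata exists
today): each cxa_atexit call site is described by the name of its guard
symbol, or by an empty string if it is unguarded.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-call record a compiler might emit; not an existing format.
struct AtexitCallRecord {
    std::string guard;  // guard symbol name, empty if the call is unguarded
};

// Upper bound on cxa_atexit calls over all linked objects: every unguarded
// call counts, guarded calls count once per distinct guard.
std::size_t count_atexit_calls(
        const std::vector<std::vector<AtexitCallRecord>>& objects) {
    std::size_t unguarded = 0;
    std::set<std::string> guards;
    for (const auto& object : objects) {
        for (const auto& call : object) {
            if (call.guard.empty())
                ++unguarded;
            else
                guards.insert(call.guard);
        }
    }
    return unguarded + guards.size();
}
```

For the four files above, the guards of A::a and B::b each appear three
times but contribute one call each, independent of link order.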
That seems like an awful lot of complexity at the wrong level for a very
specific use case when we could also make .init_array failable, which
then might even have more use cases.
> > A completely different way would be to not use cxa_atexit at all:
> > allocate memory statically for the object and dtor addresses in
> > .rodata (instead of in .text right now), and iterate over those at
> > static_destruction time. (For the thread-local ones it would need to
> > store arguments to __tls_get_addr).
>
> That only works if the compiler and linker can figure out the
> construction order. In general, that is not possible, and that case
> seems even quite common with C++. If the construction order is not
> known ahead of time, it is necessary to record it somewhere, so that
> destruction can happen in reverse. So I think storing things in .rodata
> is out.
Hmm, right. The basic idea could be salvaged by also pre-allocating a
linked list field in .data (or .tdata), and a per-object-file entry to
such a list. But failable .init_array looks nicer to me right now.
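A sketch of the salvaged variant, with hypothetical names throughout:
each destructible object gets a statically allocated DtorNode (which
would live in .data, or .tdata for thread-locals), and the node is
linked into a list when the constructor finishes. The construction
order is thereby recorded at run time without any allocation.

```cpp
#include <cassert>

// Hypothetical layout: one statically allocated node per destructible object.
struct DtorNode {
    void (*dtor)(void*);
    void* object;
    DtorNode* next;  // filled in at construction time, so nothing is allocated
};

DtorNode* g_dtor_head = nullptr;  // most recently constructed object first

// Called right after an object's constructor has finished; cannot fail.
void link_dtor_node(DtorNode* node) noexcept {
    node->next = g_dtor_head;
    g_dtor_head = node;
}

// Walking from the head visits objects in reverse construction order.
void run_dtor_list() noexcept {
    while (g_dtor_head) {
        DtorNode* node = g_dtor_head;
        g_dtor_head = node->next;
        node->dtor(node->object);
    }
}
```

Because link_dtor_node() only writes two pointers, there is no failure
path left to report, which is the property the counting scheme was
trying to approximate.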
Ciao,
Michael.