* Counting static __cxa_atexit calls
From: Florian Weimer @ 2022-08-23 11:58 UTC
To: binutils; +Cc: gcc, libc-alpha

We currently have a latent bug in glibc where C++ constructor calls can fail if they have static or thread storage duration and a non-trivial destructor. The reason is that __cxa_atexit (and __cxa_thread_atexit_impl) may have to allocate memory. We can avoid that if we know how many such static calls exist in an object (for C++, the compiler will never emit these calls repeatedly in a loop). Then we can allocate the resources beforehand, either during process and thread start, or when dlopen is called and new objects are loaded.

What would be the most ELF-flavored way to implement this? After the final link, I expect that the count (or counts, we need a separate counter for thread-local storage) would show up under a new dynamic tag in the dynamic segment. This is actually a very good fit because older loaders will just ignore it. But the question remains what GCC should emit into assembler & object files, so that the link editor can compute the total count from that.

Thanks,
Florian
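To make the pre-allocation idea concrete, here is a minimal C++ sketch, assuming a per-object registry whose size is known in advance (for example, from the proposed dynamic tag). All names (`PreallocatedAtexitList`, `reserve`, `register_dtor`, `run_all`) are hypothetical and do not describe glibc internals; the sketch only shows that the allocation, and therefore the possible failure, moves to a point where it can be reported, while registering a destructor afterwards cannot allocate.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch, not glibc code: one entry per destructor registration.
struct DtorEntry {
    void (*fn)(void *);  // destructor thunk
    void *obj;           // object to destroy
};

class PreallocatedAtexitList {
public:
    // Reserve room for all registrations announced in advance (e.g. a count
    // read from the object's dynamic section). Returns false on allocation
    // failure, so the caller can report a soft error instead of terminating.
    bool reserve(std::size_t expected_calls) noexcept {
        try {
            entries_.reserve(expected_calls);
        } catch (...) {
            return false;
        }
        capacity_ = expected_calls;
        return true;
    }

    // Never allocates: push_back stays within the capacity reserved above.
    bool register_dtor(void (*fn)(void *), void *obj) noexcept {
        if (entries_.size() >= capacity_)
            return false;  // more registrations than were announced
        entries_.push_back(DtorEntry{fn, obj});
        return true;
    }

    // Run destructors in reverse order of registration.
    void run_all() noexcept {
        while (!entries_.empty()) {
            DtorEntry e = entries_.back();
            entries_.pop_back();
            e.fn(e.obj);
        }
    }

private:
    std::vector<DtorEntry> entries_;
    std::size_t capacity_ = 0;
};
```

In this model, `reserve` would run at process start, thread start, or `dlopen` time, and a `false` result would become an ordinary error return.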
* Re: Counting static __cxa_atexit calls
From: Nick Clifton @ 2022-08-23 12:28 UTC
To: Florian Weimer, binutils; +Cc: gcc, libc-alpha

Hi Florian,

> What would be the most ELF-flavored way to implement this? After the final link, I expect that the count (or counts, we need a separate counter for thread-local storage) would show up under a new dynamic tag in the dynamic segment. This is actually a very good fit because older loaders will just ignore it. But the question remains what GCC should emit into assembler & object files, so that the link editor can compute the total count from that.

(It would be worthwhile asking this question of the LLVM community too, since ideally we would like to use the same method in both compilers.)

This sounds like an opportunity to add a couple of new GNU object attributes:

  .gnu_attribute Tag_gnu_destructor_count, <number>
  .gnu_attribute Tag_gnu_tld_count, <count>

These would then translate into GNU object attribute notes in the object file.

Cheers
  Nick
* Re: Counting static __cxa_atexit calls
From: Michael Matz @ 2022-08-23 13:40 UTC
To: Florian Weimer; +Cc: binutils, gcc, libc-alpha

Hello,

On Tue, 23 Aug 2022, Florian Weimer via Gcc wrote:

> We currently have a latent bug in glibc where C++ constructor calls can fail if they have static or thread storage duration and a non-trivial destructor. The reason is that __cxa_atexit (and __cxa_thread_atexit_impl) may have to allocate memory. We can avoid that if we know how many such static calls exist in an object (for C++, the compiler will never emit these calls repeatedly in a loop). Then we can allocate the resources beforehand, either during process and thread start, or when dlopen is called and new objects are loaded.

Isn't this merely moving the failure point from exception-at-ctor to dlopen-fails? If an individual __cxa_atexit can't allocate memory anymore for its list structure, why should pre-allocation (which is still dynamic, based on the number of actual atexit calls) have any more luck?

> What would be the most ELF-flavored way to implement this? After the final link, I expect that the count (or counts, we need a separate counter for thread-local storage) would show up under a new dynamic tag in the dynamic segment. This is actually a very good fit because older loaders will just ignore it. But the question remains what GCC should emit into assembler & object files, so that the link editor can compute the total count from that.

Probably a note section, which the link editor could either transform into a dynamic tag or leave as note(s) in the PT_NOTE segment. The latter wouldn't require any specific tooling support in the link editor. But the consumer would have to iterate through all the notes to add the individual counts together. Might be acceptable, though.

Ciao,
Michael.
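A consumer that keeps the counts as separate notes, as suggested above, would walk the note entries of the PT_NOTE segment and add them up. The C++ sketch below assumes a hypothetical note type `kNtAtexitCount` with owner name `"GNU"` and a 4-byte count as descriptor, with 4-byte note alignment; none of this is an existing format. The header struct mirrors the standard ELF note header layout, and `append_note` exists only to build input for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical note type value; no such note is defined today.
constexpr std::uint32_t kNtAtexitCount = 0x4e545831;

struct NoteHeader {  // same layout as Elf32_Nhdr / Elf64_Nhdr
    std::uint32_t namesz;
    std::uint32_t descsz;
    std::uint32_t type;
};

constexpr std::size_t align4(std::size_t n) { return (n + 3) & ~std::size_t{3}; }

// Walk a note segment and sum every count note. Returns false if malformed.
bool sum_atexit_notes(const unsigned char *data, std::size_t size,
                      std::uint64_t &total) {
    total = 0;
    std::size_t off = 0;
    while (off + sizeof(NoteHeader) <= size) {
        NoteHeader h;
        std::memcpy(&h, data + off, sizeof h);
        off += sizeof h;
        std::size_t name_len = align4(h.namesz);
        std::size_t desc_len = align4(h.descsz);
        if (name_len > size - off || desc_len > size - off - name_len)
            return false;  // note runs past the end of the segment
        const unsigned char *name = data + off;
        const unsigned char *desc = name + name_len;
        if (h.type == kNtAtexitCount && h.namesz == 4 && h.descsz == 4 &&
            std::memcmp(name, "GNU", 4) == 0) {
            std::uint32_t count;
            std::memcpy(&count, desc, sizeof count);
            total += count;
        }
        off += name_len + desc_len;
    }
    return off == size;  // trailing garbage counts as malformed
}

// Test helper: append one note in the same (assumed) format.
void append_note(std::vector<unsigned char> &buf, std::uint32_t type,
                 const char *name, std::uint32_t count) {
    NoteHeader h{static_cast<std::uint32_t>(std::strlen(name) + 1), 4, type};
    const unsigned char *hp = reinterpret_cast<const unsigned char *>(&h);
    buf.insert(buf.end(), hp, hp + sizeof h);
    buf.insert(buf.end(), name, name + h.namesz);
    buf.resize(align4(buf.size()));  // zero padding after the name
    const unsigned char *cp = reinterpret_cast<const unsigned char *>(&count);
    buf.insert(buf.end(), cp, cp + sizeof count);
}
```

The linear walk is the "iterate through all the notes" cost mentioned above; folding the notes into a single dynamic tag at link time would remove it from the loader.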
* Re: Counting static __cxa_atexit calls
From: Florian Weimer @ 2022-08-24 12:06 UTC
To: Michael Matz; +Cc: binutils, gcc, libc-alpha

* Michael Matz:

> Hello,
>
> On Tue, 23 Aug 2022, Florian Weimer via Gcc wrote:
>
>> We currently have a latent bug in glibc where C++ constructor calls can fail if they have static or thread storage duration and a non-trivial destructor. The reason is that __cxa_atexit (and __cxa_thread_atexit_impl) may have to allocate memory. We can avoid that if we know how many such static calls exist in an object (for C++, the compiler will never emit these calls repeatedly in a loop). Then we can allocate the resources beforehand, either during process and thread start, or when dlopen is called and new objects are loaded.
>
> Isn't this merely moving the failure point from exception-at-ctor to dlopen-fails?

Yes, and that is a soft error that can be handled (likewise for pthread_create).

> If an individual __cxa_atexit can't allocate memory anymore for its list structure, why should pre-allocation (which is still dynamic, based on the number of actual atexit calls) have any more luck?

We can report the error properly, and not just terminate the process. The existing ABI functions are mostly noexcept. For C++ constructors of global objects, there cannot even be a handler because they are invoked by an ELF constructor, and throwing through an ELF constructor is undefined.

>> What would be the most ELF-flavored way to implement this? After the final link, I expect that the count (or counts, we need a separate counter for thread-local storage) would show up under a new dynamic tag in the dynamic segment. This is actually a very good fit because older loaders will just ignore it. But the question remains what GCC should emit into assembler & object files, so that the link editor can compute the total count from that.
>
> Probably a note section, which the link editor could either transform into a dynamic tag or leave as note(s) in the PT_NOTE segment. The latter wouldn't require any specific tooling support in the link editor. But the consumer would have to iterate through all the notes to add the individual counts together. Might be acceptable, though.

I think we need some level of link editor support to avoid drastically over-counting multiple static calls that get merged into one implementation as the result of vague linkage. Not sure how to express that at the ELF level?

Thanks,
Florian
* Re: Counting static __cxa_atexit calls
From: Michael Matz @ 2022-08-24 12:53 UTC
To: Florian Weimer; +Cc: binutils, gcc, libc-alpha

Hello,

On Wed, 24 Aug 2022, Florian Weimer wrote:

> > Isn't this merely moving the failure point from exception-at-ctor to dlopen-fails?
>
> Yes, and that is a soft error that can be handled (likewise for pthread_create).

Makes sense. Though that actually hints at a design problem with ELF static ctors/dtors: they should be able to soft-fail (leading to dlopen or pthread_create error returns). So, maybe the _best_ way to deal with this is to extend the definition of the various object-initialization means in ELF to allow propagating failure.

> > Probably a note section, which the link editor could either transform into a dynamic tag or leave as note(s) in the PT_NOTE segment. The latter wouldn't require any specific tooling support in the link editor. But the consumer would have to iterate through all the notes to add the individual counts together. Might be acceptable, though.
>
> I think we need some level of link editor support to avoid drastically over-counting multiple static calls that get merged into one implementation as the result of vague linkage. Not sure how to express that at the ELF level?

Hmm. The __cxa_atexit calls are coming from the per-file local static initialization_and_destruction routine which doesn't have vague linkage, so its contribution to the overall number of cxa_atexit calls doesn't change from .o to final-exe. Can you show an example of what you're worried about?

A completely different way would be to not use cxa_atexit at all: allocate memory statically for the object and dtor addresses in .rodata (instead of in .text right now), and iterate over those at static_destruction time. (For the thread-local ones it would need to store arguments to __tls_get_addr.)

Doing that or defining failure modes for ELF init/fini seems a better design than hacking around the current limitation via counting static cxa_atexit calls.

Ciao,
Michael.
* Re: Counting static __cxa_atexit calls
From: Florian Weimer @ 2022-08-24 14:31 UTC
To: Michael Matz; +Cc: binutils, gcc, libc-alpha

* Michael Matz:

> Hello,
>
> On Wed, 24 Aug 2022, Florian Weimer wrote:
>
>> > Isn't this merely moving the failure point from exception-at-ctor to dlopen-fails?
>>
>> Yes, and that is a soft error that can be handled (likewise for pthread_create).
>
> Makes sense. Though that actually hints at a design problem with ELF static ctors/dtors: they should be able to soft-fail (leading to dlopen or pthread_create error returns). So, maybe the _best_ way to deal with this is to extend the definition of the various object-initialization means in ELF to allow propagating failure.

We could enable unwinding through the dynamic linker perhaps. But as I said, those Itanium ABI functions tend to be noexcept, so there's work on that front as well. For thread-local storage, it's even more difficult because any first access can throw even if the constructor is noexcept.

>> > Probably a note section, which the link editor could either transform into a dynamic tag or leave as note(s) in the PT_NOTE segment. The latter wouldn't require any specific tooling support in the link editor. But the consumer would have to iterate through all the notes to add the individual counts together. Might be acceptable, though.
>>
>> I think we need some level of link editor support to avoid drastically over-counting multiple static calls that get merged into one implementation as the result of vague linkage. Not sure how to express that at the ELF level?
>
> Hmm. The __cxa_atexit calls are coming from the per-file local static initialization_and_destruction routine which doesn't have vague linkage, so its contribution to the overall number of cxa_atexit calls doesn't change from .o to final-exe. Can you show an example of what you're worried about?

Sorry if I didn't use the correct terminology.

I was thinking about this:

  #include <vector>

  template <int i>
  struct S {
    static std::vector<int *> vec;
  };

  template <int i> std::vector<int *> S<i>::vec(i);

  std::vector<int *> &
  f()
  {
    return S<1009>::vec;
  }

The initialization is deduplicated with the help of a guard variable, and that also bounds the number of __cxa_atexit invocations to at most one per type.

> A completely different way would be to not use cxa_atexit at all: allocate memory statically for the object and dtor addresses in .rodata (instead of in .text right now), and iterate over those at static_destruction time. (For the thread-local ones it would need to store arguments to __tls_get_addr).

That only works if the compiler and linker can figure out the construction order. In general, that is not possible, and that case seems even quite common with C++. If the construction order is not known ahead of time, it is necessary to record it somewhere, so that destruction can happen in reverse. So I think storing things in .rodata is out.

Thanks,
Florian
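A simplified C++ model of the deduplication Florian describes, assuming three translation units that each carry a copy of the initialization code for `S<1009>::vec`: every copy tests the same shared guard, so only the first one constructs the object and registers its destructor. `guard_S_1009_vec`, `fake_cxa_atexit`, and `tu_static_init_S_1009_vec` are stand-ins invented for this sketch; real compiler output differs in detail.

```cpp
// Simplified model, not literal compiler output.
static bool guard_S_1009_vec = false;  // stands in for the shared guard variable
static int atexit_registrations = 0;   // counts destructor registrations

static void fake_cxa_atexit() { ++atexit_registrations; }

// What each translation unit's static-initialization routine contains for
// S<1009>::vec. Every TU has its own copy (its own static call site), but
// all copies test the same guard.
void tu_static_init_S_1009_vec() {
    if (!guard_S_1009_vec) {
        guard_S_1009_vec = true;
        // ... construct S<1009>::vec here ...
        fake_cxa_atexit();  // register its destructor
    }
}
```

Three static call sites therefore produce a single registration at run time, which is why simply adding up call sites per object file would over-count.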
* Re: Counting static __cxa_atexit calls
From: Michael Matz @ 2022-08-24 15:25 UTC
To: Florian Weimer; +Cc: binutils, gcc, libc-alpha

Hello,

On Wed, 24 Aug 2022, Florian Weimer wrote:

> > On Wed, 24 Aug 2022, Florian Weimer wrote:
> >
> >> > Isn't this merely moving the failure point from exception-at-ctor to dlopen-fails?
> >>
> >> Yes, and that is a soft error that can be handled (likewise for pthread_create).
> >
> > Makes sense. Though that actually hints at a design problem with ELF static ctors/dtors: they should be able to soft-fail (leading to dlopen or pthread_create error returns). So, maybe the _best_ way to deal with this is to extend the definition of the various object-initialization means in ELF to allow propagating failure.
>
> We could enable unwinding through the dynamic linker perhaps. But as I said, those Itanium ABI functions tend to be noexcept, so there's work on that front as well.

Yeah, my idea would have been slightly less ambitious: redefine the ABI of .init_array functions to be able to return an int. The loader would abort loading if any of them returns non-zero. Now change GCC code emission of those helper functions placed in .init_array to catch all exceptions and (in case an exception happened) return non-zero. Or, even easier, don't deal with exceptions, but rather just check if __cxa_atexit worked, and if not, return non-zero right away. That way all the exception propagation (or cxa_atexit error handling) stays purely within the GCC-generated code, and the dynamic loader only needs to deal with return values, not exceptions and unwinding.

For backward compatibility we can't just change the ABI of .init_array, but we can devise an alternative: .init_array_mayfail and the associated DT tags.
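A rough C++ model of the failable initializer idea, assuming initializer functions return `int` and the loader stops at the first non-zero return. `MayFailInit`, `init_wrapper`, `run_init_array_mayfail`, `init_ok`, and `init_out_of_memory` are hypothetical names; only `.init_array_mayfail` comes from the message above, and nothing here reflects an existing ABI. The wrapper shows the first variant (catch all exceptions and return non-zero); the other variant would instead test the result of `__cxa_atexit`, which the Itanium C++ ABI defines to return zero on success.

```cpp
#include <cstddef>
#include <new>

// Hypothetical initializer type for a ".init_array_mayfail"-style section:
// 0 means success, non-zero means "abort loading this object".
using MayFailInit = int (*)();

// Compiler-side wrapper (hypothetical): turns any exception from the
// original initializer into a non-zero return value, so no exception ever
// has to propagate into the dynamic loader.
template <void (*Init)()>
int init_wrapper() noexcept {
    try {
        Init();
        return 0;
    } catch (...) {
        return 1;
    }
}

// Loader-side sketch: run initializers in order and stop at the first
// failure. On failure, *failed_at receives its index, and the caller
// (dlopen, say) could unload the object and return an error.
bool run_init_array_mayfail(const MayFailInit *inits, std::size_t n,
                            std::size_t *failed_at) {
    for (std::size_t i = 0; i < n; ++i) {
        if (inits[i]() != 0) {
            *failed_at = i;
            return false;
        }
    }
    return true;
}

// Example initializers: one succeeds, one fails the way an out-of-memory
// destructor registration might.
void init_ok() {}
void init_out_of_memory() { throw std::bad_alloc(); }
```

The loader only ever sees an integer, which matches the goal of keeping exception handling entirely inside compiler-generated code.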
> For thread-local storage, it's even more difficult because any first access can throw even if the constructor is noexcept.

That's extending the scope somewhat; pre-counting cxa_atexit wouldn't solve this problem either, right?

> >> I think we need some level of link editor support to avoid drastically over-counting multiple static calls that get merged into one implementation as the result of vague linkage. Not sure how to express that at the ELF level?
> >
> > Hmm. The __cxa_atexit calls are coming from the per-file local static initialization_and_destruction routine which doesn't have vague linkage, so its contribution to the overall number of cxa_atexit calls doesn't change from .o to final-exe. Can you show an example of what you're worried about?
>
> Sorry if I didn't use the correct terminology.
>
> I was thinking about this:
>
>   #include <vector>
>
>   template <int i>
>   struct S {
>     static std::vector<int *> vec;
>   };
>
>   template <int i> std::vector<int *> S<i>::vec(i);
>
>   std::vector<int *> &
>   f()
>   {
>     return S<1009>::vec;
>   }
>
> The initialization is deduplicated with the help of a guard variable, and that also bounds the number of __cxa_atexit invocations to at most one per type.

Ah, right, thanks. The guard variable for class-local statics; I was thinking of file-scope globals.

Double-hmm. I don't readily see a nice way to correctly precalculate the number of cxa_atexit calls here. A simple problem is the following: assume a couple of files, each defining such class templates, which ultimately define and initialize static members A<1>::a and B<1>::b (assume vague linkage). Assume we have four files:

  a:  defines A::a
  b:  defines B::b
  ab: defines A::a and B::b
  ba: defines B::b and A::a

Now link order influences which file gets to actually initialize the members and which ones skip it due to guard variables. But the object files themselves don't have enough context to know which will be which.
Not even the link editor knows that, because the non-taken cxa_atexit calls aren't in linkonce/group sections; they are all there in object.o:.text:_Z41__static_initialization_and_destruction_0ii.

So, what would need to be emitted is, for instance, a list of cxa_atexit calls plus guard variable; the link editor could then count all unguarded cxa_atexit calls plus all guarded ones, but the latter only once per guard. The key would be the identity of the guard variable.

That seems like an awful lot of complexity at the wrong level for a very specific use case, when we could also make .init_array failable, which might even have more use cases.

> > A completely different way would be to not use cxa_atexit at all: allocate memory statically for the object and dtor addresses in .rodata (instead of in .text right now), and iterate over those at static_destruction time. (For the thread-local ones it would need to store arguments to __tls_get_addr).
>
> That only works if the compiler and linker can figure out the construction order. In general, that is not possible, and that case seems even quite common with C++. If the construction order is not known ahead of time, it is necessary to record it somewhere, so that destruction can happen in reverse. So I think storing things in .rodata is out.

Hmm, right. The basic idea could be salvaged by also pre-allocating a linked-list field in .data (or .tdata), and a per-object-file entry to such a list. But a failable .init_array looks nicer to me right now.

Ciao,
Michael.
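The counting rule sketched in the last message (every unguarded call counts; guarded calls count once per guard variable) could look like this on the link editor side. The record format is an assumption for illustration only: each object file is modeled as a list of static __cxa_atexit call sites, each tagged with the identity of its guard variable or left empty when unguarded. `AtexitCallRecord`, `ObjectFile`, `count_atexit_calls`, and guard names such as `guard(A<1>::a)` are all made up.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical per-object-file record: one entry per static __cxa_atexit
// call site. An empty guard means the call is unguarded (a plain
// file-scope global); otherwise it names the guard variable protecting it.
struct AtexitCallRecord {
    std::string guard;
};

using ObjectFile = std::vector<AtexitCallRecord>;

// Link-editor side: count every unguarded call, but guarded calls only once
// per distinct guard variable, however many object files carry a copy.
std::uint64_t count_atexit_calls(const std::vector<ObjectFile> &objects) {
    std::uint64_t unguarded = 0;
    std::set<std::string> guards;  // identity of each guard seen so far
    for (const ObjectFile &obj : objects) {
        for (const AtexitCallRecord &rec : obj) {
            if (rec.guard.empty())
                ++unguarded;
            else
                guards.insert(rec.guard);
        }
    }
    return unguarded + guards.size();
}
```

For the four files a, b, ab, and ba from the example, a naive sum of call sites gives 6, while this rule gives 2 regardless of link order.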
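The salvaged static-storage variant from the last paragraph might look roughly like this, under the assumption that every object needing destruction gets a statically allocated list node (in .data, or .tdata for thread-local objects) and that each module keeps one list head. Recording the construction order then only links a node in and never allocates, and walking the list from the head yields reverse construction order, which addresses the ordering objection. `DtorNode`, `dtor_list_head`, `link_dtor`, and `run_dtors` are hypothetical names.

```cpp
// Hypothetical statically allocated node; one per object needing a dtor.
struct DtorNode {
    void (*fn)(void *);  // destructor thunk
    void *obj;           // object to destroy
    DtorNode *next;      // pre-allocated link field
};

// One list head per module (an assumption of this sketch).
static DtorNode *dtor_list_head = nullptr;

// Called right after a successful construction: O(1), no allocation,
// so it cannot fail.
void link_dtor(DtorNode *node) noexcept {
    node->next = dtor_list_head;
    dtor_list_head = node;
}

// At exit or unload: newest first, i.e. reverse construction order.
void run_dtors() noexcept {
    while (DtorNode *n = dtor_list_head) {
        dtor_list_head = n->next;
        n->fn(n->obj);
    }
}
```

The construction order is still recorded dynamically, as Florian requires; only the storage for recording it is fixed at link time.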