Date: Wed, 24 Aug 2022 15:25:51 +0000 (UTC)
From: Michael Matz
To: Florian Weimer
Cc: binutils@sourceware.org, gcc@gcc.gnu.org, libc-alpha@sourceware.org
Subject: Re: Counting static __cxa_atexit calls
In-Reply-To: <87y1vd67cx.fsf@oldenburg.str.redhat.com>
References: <87fshn2mu1.fsf@oldenburg.str.redhat.com> <87k06x7sme.fsf@oldenburg.str.redhat.com> <87y1vd67cx.fsf@oldenburg.str.redhat.com>

Hello,

On Wed, 24 Aug 2022, Florian Weimer wrote:

> > On Wed, 24 Aug 2022, Florian Weimer wrote:
> >
> >> > Isn't this merely moving the failure point from exception-at-ctor to
> >> > dlopen-fails?
> >>
> >> Yes, and that is a soft error that can be handled (likewise for
> >> pthread_create).
> >
> > Makes sense. Though that actually hints at a design problem with ELF
> > static ctors/dtors: they should be able to soft-fail (leading to dlopen or
> > pthread_create error returns). So, maybe the _best_ way to deal with this
> > is to extend the definition of the various object-initialization means
> > in ELF to allow propagating failure.
>
> We could enable unwinding through the dynamic linker perhaps. But as I
> said, those Itanium ABI functions tend to be noexcept, so there's work
> on that front as well.

Yeah, my idea would have been slightly less ambitious: redefine the ABI
of .init_array functions to be able to return an int. The loader would
abort loading if any of them returns non-zero. Now change GCC code
emission of those helper functions placed in .init_array to catch all
exceptions and (in case an exception happened) return non-zero.
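A minimal user-space sketch of that scheme, only to make the control flow concrete. Nothing here is an existing loader or GCC interface: `failable_init_fn`, `run_failable_inits` and `guarded_init` are hypothetical names, and the table is an ordinary array standing in for whatever section the loader would walk.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical ABI: an initializer returns 0 on success, non-zero on failure.
using failable_init_fn = int (*)();

// Stand-in for the loader side: run the initializers in order and stop at
// the first one that reports failure. Returns 0 if all succeeded, otherwise
// the failing initializer's return value. If `ran` is non-null it receives
// the number of initializers that completed successfully.
int run_failable_inits(const failable_init_fn *table, std::size_t count,
                       std::size_t *ran = nullptr)
{
    std::size_t i = 0;
    int rc = 0;
    for (; i < count; ++i) {
        rc = table[i]();
        if (rc != 0)
            break;  // a real loader would now fail the dlopen/pthread_create
    }
    if (ran)
        *ran = i;
    return rc;
}

// Stand-in for the compiler-generated helper: run the per-file
// initialization body and turn any exception into a non-zero return,
// so nothing ever unwinds into the caller.
template <void (*Body)()>
int guarded_init() noexcept
{
    try {
        Body();
        return 0;
    } catch (...) {
        return 1;
    }
}

// Two example initialization bodies (assumptions for the demo only).
void init_ok() {}
void init_throws() { throw std::runtime_error("registration failed"); }
```

With a table of `guarded_init<init_ok>`, `guarded_init<init_throws>`, `guarded_init<init_ok>`, the loop stops at the second entry and the third is never run; the caller only ever sees an integer, never an exception.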
Or, even easier, don't deal with exceptions, but rather just check if
__cxa_atexit worked, and if not return non-zero right away. That way all
the exception propagation (or cxa_atexit error handling) stays purely
within the GCC generated code and the dynamic loader only needs to deal
with return values, not exceptions and unwinding.

For backward compat we can't just change the ABI of .init_array, but we
can devise an alternative: .init_array_mayfail and the associated DT
tags.

> For thread-local storage, it's even more difficult because any first
> access can throw even if the constructor is noexcept.

That's extending the scope somewhat; pre-counting cxa_atexit wouldn't
solve this problem either, right?

> >> I think we need some level of link editor support to avoid drastically
> >> over-counting multiple static calls that get merged into one
> >> implementation as the result of vague linkage. Not sure how to express
> >> that at the ELF level?
> >
> > Hmm. The __cxa_atexit calls are coming from the per-file local static
> > initialization_and_destruction routine which doesn't have vague linkage,
> > so its contribution to the overall number of cxa_atexit calls doesn't
> > change from .o to final-exe. Can you show an example of what you're
> > worried about?
>
> Sorry if I didn't use the correct terminology.
>
> I was thinking about this:
>
> #include <vector>
>
> template <int i>
> struct S {
>   static std::vector<int> vec;
> };
>
> template <int i> std::vector<int> S<i>::vec(i);
>
> std::vector<int> &
> f()
> {
>   return S<1009>::vec;
> }
>
> The initialization is deduplicated with the help of a guard variable,
> and that also bounds the number of __cxa_atexit invocations to at most
> one per type.

Ah, right, thanks. The guard variable for class-local statics, I was
thinking file-scope globals. Double-hmm. I don't readily see a nice way
to correctly precalculate the number of cxa_atexit calls here.
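To illustrate why a per-object-file count over-counts in the guarded case, here is a small single-file simulation. It is only a sketch: the `guard` flag, the `registrations` counter and the `fileN_static_init` functions are hypothetical stand-ins for the compiler's guard variable, successful __cxa_atexit calls, and each object file's static-initialization routine.

```cpp
// Counts registrations that actually happen at run time
// (stand-in for successful __cxa_atexit calls).
static int registrations = 0;

template <int i>
struct S {
    static bool guard;  // stand-in for the compiler-emitted guard variable

    // What each object file's init routine does for this member:
    // skip if some other file got there first, otherwise construct
    // the member and register its destructor.
    static void init_from_one_file()
    {
        if (guard)
            return;
        guard = true;
        ++registrations;  // where the __cxa_atexit call would be
    }
};
template <int i> bool S<i>::guard = false;

// Each function plays the role of one object file's
// static-initialization routine.
void file1_static_init() { S<1009>::init_from_one_file(); }
void file2_static_init()
{
    S<1009>::init_from_one_file();
    S<7>::init_from_one_file();
}

// What counting call sites per object file would report: 1 + 2.
constexpr int static_call_sites = 1 + 2;
```

After running both routines, `registrations` is 2 while `static_call_sites` is 3: the call site for `S<1009>` exists in both "files", but the guard lets only one of them register.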
A simple problem is the following: assume a couple files each defining
such class templates, that ultimately define and initialize static
members A<1>::a and B<1>::b (assume vague linkage). Assume we have four
files:

  a:  defines A<1>::a
  b:  defines B<1>::b
  ab: defines A<1>::a and B<1>::b
  ba: defines B<1>::b and A<1>::a

Now link order influences which file gets to actually initialize the
members and which ones skip it due to guard variables. But the object
files themselves don't know enough context of which will be which. Not
even the link editor knows that, because the non-taken cxa_atexit calls
aren't in linkonce/group sections; they are all there in
object.o:.text:_Z41__static_initialization_and_destruction_0ii .

So, what would need to be emitted is for instance a list of cxa_atexit
calls plus guard variable; the link editor could then count all
unguarded cxa_atexit calls plus all guarded ones, but the latter only
once per guard. The key would be the identity of the guard variable.

That seems like an awful lot of complexity at the wrong level for a very
specific use case when we could also make .init_array failable, which
then even might have more use cases.

> > A completely different way would be to not use cxa_atexit at all:
> > allocate memory statically for the object and dtor addresses in
> > .rodata (instead of in .text right now), and iterate over those at
> > static_destruction time. (For the thread-local ones it would need to
> > store arguments to __tls_get_addr).
>
> That only works if the compiler and linker can figure out the
> construction order. In general, that is not possible, and that case
> seems even quite common with C++. If the construction order is not
> known ahead of time, it is necessary to record it somewhere, so that
> destruction can happen in reverse. So I think storing things in .rodata
> is out.

Hmm, right. The basic idea could be salvaged by also pre-allocating a
linked list field in .data (or .tdata), and a per-object-file entry to
such a list.
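A rough sketch of what such pre-allocated list entries could look like, under the assumption that each object needing destruction gets one statically allocated node that is linked in at construction time. `DtorNode`, `record_constructed` and `run_recorded_dtors` are invented names for illustration; the demo objects are plain ints rather than real C++ objects.

```cpp
#include <vector>

// One statically allocated record per object whose destructor must run.
struct DtorNode {
    void (*dtor)(void *);  // destructor thunk
    void *object;          // object to destroy
    DtorNode *next;        // filled in when the object is constructed
};

static DtorNode *dtor_list_head = nullptr;

// Called right after an object has been constructed. It cannot fail,
// because the node already exists and nothing is allocated here.
void record_constructed(DtorNode *node)
{
    node->next = dtor_list_head;
    dtor_list_head = node;
}

// Called at static-destruction time. The head is always the most recently
// constructed object, so walking the list yields reverse construction order.
void run_recorded_dtors()
{
    while (dtor_list_head) {
        DtorNode *node = dtor_list_head;
        dtor_list_head = node->next;
        node->dtor(node->object);
    }
}

// Demo scaffolding (assumption): "destroying" an int just logs its value.
static std::vector<int> destroyed;
void note_destroyed(void *p) { destroyed.push_back(*static_cast<int *>(p)); }

static int obj_a = 1, obj_b = 2, obj_c = 3;
static DtorNode node_a{note_destroyed, &obj_a, nullptr};
static DtorNode node_b{note_destroyed, &obj_b, nullptr};
static DtorNode node_c{note_destroyed, &obj_c, nullptr};
```

If the objects happen to be constructed in the order b, a, c (that is, `record_constructed` is called on `node_b`, `node_a`, `node_c`), `run_recorded_dtors` visits them as c, a, b. The construction order is recorded at run time, yet no step involves an allocation that could fail.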
But failable .init_array looks nicer to me right now.


Ciao,
Michael.