From: Florian Weimer
To: Jonathon Anderson
Cc: Carlos O'Donell, Ben Woodard, Adhemerval Zanella,
 "Legendre, Matthew P.", libc-alpha@sourceware.org, John Mellor-Crummey
Subject: Re: LD_AUDIT: Not enough space in static TLS block
Date: Wed, 11 May 2022 15:59:39 +0200
In-Reply-To: (Jonathon Anderson's message of "Thu, 5 May 2022 14:56:10 -0500")
Message-ID: <87fslggofo.fsf@oldenburg.str.redhat.com>

* Jonathon Anderson:

> This tunable works for us as a stopgap until a long-term solution can be
> implemented.

Good to know, thanks.

> I had a separate (email) chat with Ben Woodard bouncing ideas for a
> long-term solution. A major difficulty is that LD_AUDIT currently
> introduces a cyclic dependency:
>  - auditors must be loaded before searching for the application's
>    dependencies (since la_objsearch may modify the results), and
>  - dependency searches must complete before the static TLS auto-tuning
>    (since the TLS sizes of the initial link-map must be known), but
>  - the static TLS block must be allocated before auditors are loaded
>    (since auditors may also use initial-exec TLS).
>
> So, I'm not hopeful for a long-term solution that does not involve
> another LAV_CURRENT bump. We (me and Ben) came up with a couple of
> initial solutions: disallowing initial-exec TLS in auditors,

I'm not sure if this is feasible. It would mean we cannot use
initial-exec TLS in glibc at all, or in libstdc++ (in case auditors are
written in C++). And we don't want to build libraries twice (for auditor
usage).

> or per-auditor static TLS blocks (ie. TLS namespaces).

We already have that, but there is just one thread pointer, so that does
not solve the problem. Using a secondary thread pointer has the same
double-build problem, too.

Auditor TLS usage has a conceptually simple fix, though. (It's simple
in concept, but implementation requires some refactoring.)

Recall that for regular process startup (without auditing), we do this:

 (1) map the main executable (the kernel may do this for us)
 (2) recursively map all the dependencies
 (3) calculate static TLS usage
 (4) allocate static TLS space
 (5) perform relocation
 (6) assign TLS variables their initial values
 (7) start running user code (initializers, main)

Once auditors are in the mix, we do this instead:

 (1) guess static TLS usage
 (2) allocate static TLS space
 (3) load each audit module individually, in sequence, as if by dlmopen:
   (3.1) map the auditor
   (3.2) recursively map all its dependencies
   (3.3) perform relocation
   (3.4) calculate and allocate static TLS space (from the global area)
   (3.5) assign TLS variables their initial values
   (3.6) start running auditor code (ELF constructors, la_version)
 (4) map the main executable (the kernel may do this for us)
 (5) recursively map all the dependencies (may involve la_objsearch)
 (6) calculate and allocate static TLS space (from the global area)
 (7) perform relocation
 (8) assign TLS variables their initial values
 (9) start running user code (initializers, main)

Step (1) is the big problem here: it's just a quick hack to get things
going with TLS, but it has been around for a long time.

What we should be doing instead is this:

 (1) load each audit module individually, in sequence (no relocation here):
   (1.1) map the auditor
   (1.2) recursively map all its dependencies
   (1.3) calculate static TLS usage for this auditor namespace
 (2) map the main executable (the kernel may do this for us)
 (3) calculate static TLS usage using all TLS size information seen so far
 (4) allocate static TLS space
 (5) complete loading the auditors (relocation and startup):
   (5.1) perform relocation
   (5.2) calculate and allocate static TLS space (from the global area)
   (5.3) assign TLS variables their initial values
   (5.4) start running auditor code (ELF constructors, la_version)
 (6) recursively map all the dependencies (may involve la_objsearch)
 (7) calculate and allocate static TLS space (from the global area)
 (8) perform relocation
 (9) assign TLS variables their initial values
 (10) start running user code (initializers, main)

With this sequence, direct static TLS usage from auditors is taken into
account for the fixed-size TLS allocation at (4), eliminating the
guesswork. (Step (2) could actually come right before step (6); it
would not alter the picture.)

When no auditor defines la_objsearch, we can do even better and map the
executable and its dependencies before computing the static TLS size,
and only run step (5), complete loading the auditors, after mapping
everything (but before relocation, which needs working auditors for
la_symbind). In this case, we'd have the same level of information
regarding TLS usage as in the non-auditor case (which is still not
enough in all cases, but another incremental improvement).

With la_objsearch, we could pull a few more tricks. Auditors could
advertise that the addresses of their TLS variables do not matter, which
would enable us to relocate the TLS space as we discover more objects
that need more static TLS. Or we could unload the auditors on TLS
exhaustion and start again with a larger space estimate.
Auditors could provide their own guesses for static TLS usage that we
query upfront and take into account for the size calculation.

None of this solves the general dlopen case, though. I have some ideas
for that, which boil down to “just provide enough address space during
early startup, so that you never exceed it until the initialization
phase with dlopen is complete”. This needs a new TCB allocator, though,
so it's also quite involved to implement. It does not solve the problem
completely, but I expect that it will eliminate pretty much all
shortcomings of initial-exec TLS we have seen in practice. With that
change, we might not even need the two-phased auditor loading.

> Comments and ideas are welcome. (I would love to have a detailed
> LD_AUDIT discussion at STW in June.)

Uhm, what's STW?

> Our reproducer for the early dl* bug passes with the latest Fedora
> Rawhide, I'll look into using RTLD_DI_PHDR in HPCToolkit in the coming
> weeks.

Thanks!

Florian
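
As a rough illustration of why computing the size after mapping removes
the guesswork in the proposed sequence, here is a minimal sketch in C.
It is not glibc code: struct tls_obj, static_tls_needed and tls_fits are
hypothetical names, and the layout rule (objects packed in order, each
at its own alignment) is a simplifying assumption, not the loader's real
algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the TLS segment of one mapped object. */
struct tls_obj {
    const char *name;
    size_t size;   /* bytes of TLS data */
    size_t align;  /* required alignment, must be a power of two */
};

/* Round value up to the next multiple of align. */
static size_t round_up(size_t value, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Bytes needed to place all objects back to back (simplified layout).
   This stands in for "calculate static TLS usage using all TLS size
   information seen so far". */
size_t static_tls_needed(const struct tls_obj *objs, size_t count)
{
    size_t offset = 0;
    for (size_t i = 0; i < count; i++)
        offset = round_up(offset, objs[i].align) + objs[i].size;
    return offset;
}

/* True if a block of block_size bytes can hold all the objects.
   With an up-front guess this can fail; with a block sized by
   static_tls_needed() it holds by construction. */
bool tls_fits(size_t block_size, const struct tls_obj *objs, size_t count)
{
    return static_tls_needed(objs, count) <= block_size;
}
```

With made-up sizes of 4096, 100 and 200 bytes (alignments 64, 16 and 8),
static_tls_needed reports 4400 bytes, so a fixed 2048-byte guess does
not fit, while a block sized after mapping always does.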