public inbox for libc-alpha@sourceware.org
From: Jonathon Anderson <janderson@rice.edu>
To: Florian Weimer <fweimer@redhat.com>
Cc: Carlos O'Donell <carlos@redhat.com>,
	Ben Woodard <woodard@redhat.com>,
	Adhemerval Zanella <adhemerval.zanella@linaro.org>,
	"Legendre, Matthew P." <legendre1@llnl.gov>,
	libc-alpha@sourceware.org, John Mellor-Crummey <johnmc@rice.edu>
Subject: Re: LD_AUDIT: Not enough space in static TLS block
Date: Wed, 11 May 2022 12:31:23 -0500
Message-ID: <9d215c9b-4b70-00a9-2312-e2c5a5560fcf@rice.edu>
In-Reply-To: <87fslggofo.fsf@oldenburg.str.redhat.com>



On 5/11/22 08:59, Florian Weimer wrote:
>> I had a separate (email) chat with Ben Woodard bouncing ideas for a long-term solution. A
>> major difficulty is that LD_AUDIT currently introduces a cyclic dependency:
>>   - auditors must be loaded before searching for the application's dependencies (since
>> la_objsearch may modify the results), and
>>   - dependency searches must complete before the static TLS auto-tuning (since the TLS
>> sizes of the initial link-map must be known), but
>>   - the static TLS block must be allocated before auditors are loaded (since auditors may also
>> use initial-exec TLS).
>>
>> So, I'm not hopeful for a long-term solution that does not involve
>> another LAV_CURRENT bump. We (me and Ben) came up with a couple of
>> initial solutions: disallowing initial-exec TLS in auditors,
My half-sentence descriptions weren't very clear, so let me add some 
more detail to these ideas.
> I'm not sure if this is feasible.  It would mean we cannot use initial-exec
> TLS in glibc at all, or in libstdc++ (in case auditors are written in
> C++).
>
> And we don't want to build libraries twice (for auditor usage).
I believe this can be avoided if they're compiled with the gnu2 TLS 
dialect (TLS descriptors), which would allow ld.so to relocate so that:
  - the Glibc in the auditor namespace(s) uses a dynamic TLS segment 
(with a performance hit, of course), but
  - the Glibc in the main namespace(s) uses the static TLS segment (with 
no measurable performance degradation).

This extends to other libraries as well, libstdc++ included. On the 
ld.so side, the startup sequence would then become something like this:

   (1) allocate TCB and DTV (but not static TLS space)
   (2) load each audit module
     (2.1) map the auditor
     (2.2) recursively map all its dependencies
     (2.3) relocate, where:
       (2.3.1) TLSDESC is always relocated in dynamic form
       (2.3.2) any initial-exec relocation (eg. R_X86_64_TPOFF64) throws 
an error
     (2.4) assign TLS variables their initial values
     (2.5) start running auditor code (ELF constructors, la_version)
   (3) map the main executable
   (4) recursively map all dependencies
   (5) calculate static TLS usage
   (6) allocate static TLS + new TCB
   (7) move old TCB to new location
   (8) relocate
   (9) assign TLS variables their initial values
   (10) start running user code (initializers, main)

Steps (1)-(2) and (7) are skipped in the non-auditor case.
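
To make steps (5) and (6) a bit more concrete, here is a minimal sketch of how the static TLS usage could be summed once every module is mapped. It assumes an x86-64-style layout in which each module's block sits at a fixed distance below the thread pointer. The names `tls_module` and `layout_static_tls` are hypothetical and are not taken from ld.so.

```c
#include <stddef.h>

/* Hypothetical per-module record: size and alignment come from PT_TLS. */
struct tls_module {
    size_t size;    /* bytes of TLS data the module needs */
    size_t align;   /* required alignment, a power of two */
    size_t offset;  /* filled in: distance of the block below the TP */
};

/* Round v up to the next multiple of a (a must be a power of two). */
static size_t round_up(size_t v, size_t a)
{
    return (v + a - 1) & ~(a - 1);
}

/* Assign each module a slot below the thread pointer and return the
 * total number of bytes the static TLS block must provide. */
size_t layout_static_tls(struct tls_module *mods, size_t n)
{
    size_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        offset = round_up(offset + mods[i].size, mods[i].align);
        mods[i].offset = offset;
    }
    return offset;
}
```

The total is only correct if all modules are known when the function runs, which is why the dependency search in step (4) has to finish first.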

(Of course, this idea leaves us in the unenviable position of 
negotiating with our dependencies to compile everything with gnu2 TLS. 
Caveat emptor.)
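
For contrast, the sketch below shows how a TLS access model can be pinned per variable at compile time with the GCC/Clang `tls_model` attribute. Both variables behave identically at the C level. The difference is in the relocations the compiler emits, and an initial-exec variable in a shared object is the kind that would hit the error in (2.3.2). The variable and function names are made up for illustration. With the gnu2 dialect (on x86 GCC, `-mtls-dialect=gnu2`) the general-dynamic accesses would go through TLSDESC instead, leaving the static-vs-dynamic choice to ld.so.

```c
/* Fixed offset from the thread pointer; needs space in the static TLS block. */
__thread int ie_counter __attribute__((tls_model("initial-exec")));

/* General dynamic model; the location is resolved at run time. */
__thread int gd_counter __attribute__((tls_model("global-dynamic")));

/* Touch both variables; the C-level semantics do not depend on the model. */
int bump_both(void)
{
    ie_counter += 1;
    gd_counter += 2;
    return ie_counter + gd_counter;
}
```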

>> or per-auditor static TLS blocks (ie. TLS namespaces).
> We already have that, but there is just one thread pointer, so that does
> not solve the problem.  Using a secondary thread pointer has the second
> build problem, too.
In this idea the auditors would be responsible for "switching" between 
the multiple thread pointers, where all non-auditor namespaces share one 
"main" thread pointer. The audit API would need to be expanded with a 
few function(-like macros) to manipulate the thread pointer, such as:
  - struct tls_restore tls_switch_to_main_tp();
  - struct tls_restore tls_switch_to_caller_tp();
  - void tls_restore_tp(const struct tls_restore *);

Then auditors (and ld.so) would need to follow a set of 
restrictions/conventions to properly maintain TP. One set of rules could be:
  - All la_* notification calls occur with the auditor's thread pointer set.
  - All calls to code outside the auditor namespace (eg. user code) must 
occur with the main thread pointer set (ie. wrapped in a 
tls_switch_to_main_tp()/tls_restore_tp() pair).
  - All calls from outside the auditor namespace (eg. wrapped symbols) 
occur with the main thread pointer set (ie. the auditor should wrap the 
contents in a tls_switch_to_caller_tp()/tls_restore_tp() pair).

(Of course, any bugs in any auditor's TP management will cause very 
subtle errors in the application, and auditors will need to be careful 
that their dependencies don't naively call code loaded via 
dlmopen(LM_ID_BASE). Caveat emptor.)
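
As a rough illustration of the second rule, the sketch below models the thread pointer as an ordinary variable (`current_tp`) and two dummy blocks. The real TP is a register, and none of this exists in glibc. Only `tls_switch_to_main_tp` and `tls_restore_tp` come from the proposal above; every other name, and the contents of `struct tls_restore`, are assumptions.

```c
/* Assumed contents: just the TP value to put back afterwards. */
struct tls_restore {
    void *saved_tp;
};

/* Stand-ins for the main and auditor static TLS blocks. */
static char main_block[64];
static char auditor_block[64];

/* Model of the thread pointer register; auditor code starts with its own TP. */
static void *current_tp = auditor_block;

struct tls_restore tls_switch_to_main_tp(void)
{
    struct tls_restore r = { current_tp };
    current_tp = main_block;
    return r;
}

void tls_restore_tp(const struct tls_restore *r)
{
    current_tp = r->saved_tp;
}

/* Helpers so the sketch can be observed from outside. */
int tp_is_main(void)    { return current_tp == (void *)main_block; }
int tp_is_auditor(void) { return current_tp == (void *)auditor_block; }

/* Rule 2: any call out of the auditor namespace runs under the main TP. */
int call_user_code(int (*fn)(void))
{
    struct tls_restore saved = tls_switch_to_main_tp();
    int rc = fn();
    tls_restore_tp(&saved);
    return rc;
}
```

Calling `call_user_code(tp_is_main)` from auditor context returns 1, and the auditor's TP is back in place once the call returns.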

> Auditor TLS usage has a conceptually simple fix, though.  (It's simple
> in concept, but implementation requires some refactoring.)  Recall that
> for regular process startup (without auditing), we do this:
>
>    (1) map the main executable (the kernel may do this for us)
>    (2) recursively map all the dependencies
>    (3) calculate static TLS usage
>    (4) allocate static TLS space
>    (5) perform relocation
>    (6) assign TLS variables their initial values
>    (7) start running user code (initializers, main)
>
> Once auditors are in the mix, we do this instead:
>
>    (1) guess static TLS usage
>    (2) allocate static TLS space
>    (3) load each audit module individually, in sequence, as if per dlmopen:
>      (3.1) map the auditor
>      (3.2) recursively map all its dependencies
>      (3.3) perform relocation
>      (3.4) calculate and allocate static TLS space (from the global area)
>      (3.5) assign TLS variables their initial values
>      (3.6) start running auditor code (ELF constructors, la_version)
>    (4) map the main executable (the kernel may do this for us)
>    (5) recursively map all the dependencies (may involve la_objsearch)
>    (6) calculate and allocate static TLS space (from the global area)
>    (7) perform relocation
>    (8) assign TLS variables their initial values
>    (9) start running user code (initializers, main)
>
> Step (1) is the big problem here, it's just a quick hack to get things
> going with TLS, but it has been around for a long time.  What we should
> be doing instead is this:
>
>    (1) load each audit module individually, in sequence (no relocation here):
>      (1.1) map the auditor
>      (1.2) recursively map all its dependencies
>      (1.3) calculate static TLS usage for this auditor namespace
>    (2) map the main executable (the kernel may do this for us)
>    (3) calculate static TLS usage using all TLS size information seen so far
>    (4) allocate static TLS space
>    (5) complete loading the auditors (relocation and startup):
>      (5.1) perform relocation
>      (5.2) calculate and allocate static TLS space (from the global area)
>      (5.3) assign TLS variables their initial values
>      (5.4) start running auditor code (ELF constructors, la_version)
>    (6) recursively map all the dependencies (may involve la_objsearch)
>    (7) calculate and allocate static TLS space (from the global area)
>    (8) perform relocation
>    (9) assign TLS variables their initial values
>    (10) start running user code (initializers, main)
>
> With this sequence, direct static TLS usage from auditors is taken into
> account for the fixed-size TLS allocation at (4), eliminating the
> guesswork.  (Step (2) could actually come right before step (6), it
> would not alter the picture.)
Step (3) doesn't account for the static TLS usage of the executable's 
dependencies, so this doesn't completely solve the issue. (The cyclic 
dependency I mentioned before is roughly (3) -> (4) -> (5.4) -> (6) -> (3).)
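
The cycle can be checked mechanically. The sketch below treats the four steps as nodes with "must finish before" edges and tries to find an order that satisfies all of them. The index mapping (0 = calculate static TLS, 1 = allocate it, 2 = run auditor code, 3 = map the dependencies) and the function name are my own choices for illustration.

```c
#include <stdbool.h>

enum { N_STEPS = 4 };

/* before[a][b] is true when step a must finish before step b can start.
 * Returns true if all steps can be ordered, false if there is a cycle. */
bool has_valid_order(bool before[N_STEPS][N_STEPS])
{
    bool done[N_STEPS] = { false };

    for (int placed = 0; placed < N_STEPS; placed++) {
        int pick = -1;
        /* Find a step whose prerequisites have all been completed. */
        for (int b = 0; b < N_STEPS && pick < 0; b++) {
            if (done[b])
                continue;
            bool ready = true;
            for (int a = 0; a < N_STEPS; a++)
                if (before[a][b] && !done[a])
                    ready = false;
            if (ready)
                pick = b;
        }
        if (pick < 0)
            return false;   /* every remaining step waits on another one */
        done[pick] = true;
    }
    return true;
}
```

With the edges 0->1, 1->2, 2->3 and 3->0 set, no order exists. Dropping the 3->0 edge (i.e. accepting that the calculation does not see the dependencies) is what makes the sequence schedulable.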

> When no auditor defines la_objsearch, we can do even better and map the
> executable and its dependencies before computing the static TLS size,
> and only run step (5), complete loading the auditors, after mapping
> everything (but before relocation, which needs working auditors for
> la_symbind).  In this case, we'd have the same level of information
> regarding TLS usage as in the non-auditor case (which is still not
> enough in all cases, but another incremental improvement).
> With la_objsearch, we could pull a few more tricks.  Auditors could
> advertise that the addresses of their TLS variables do not matter, which
> would enable us to relocate the TLS space as we discover more objects
> that need more static TLS.
If I understand correctly, this optimization would require that the 
auditor and all its dependencies don't take the address of TLS 
variables. This won't be feasible for us (we have complex dependencies). 
To be usable by any reasonable auditor, this restriction would have to be 
satisfied by (at least) Glibc and libstdc++.

Compiler assistance would help here, although I highly doubt this 
restriction will often be satisfied by C++ code (eg. returning a const 
reference to a TLS variable is enough to break this restriction).
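
A small model of why an escaped address is a problem: the sketch below stands in for a relocatable TLS block with a plain struct and "relocates" it by copying to a new allocation. All names are hypothetical, and the old copy is deliberately kept alive so the example stays well-defined. In a real relocation the old location would be gone.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for one thread's TLS data. */
struct tls_image {
    int counter;
};

/* Hypothetical relocation: copy the block to a freshly allocated home. */
struct tls_image *relocate_tls(const struct tls_image *old)
{
    struct tls_image *fresh = malloc(sizeof *fresh);
    if (fresh != NULL)
        memcpy(fresh, old, sizeof *fresh);
    return fresh;
}

/* Returns 1 if a pointer taken before the move no longer tracks the
 * variable, 0 if it still does, and -1 on allocation failure. */
int escaped_pointer_goes_stale(void)
{
    struct tls_image old = { .counter = 1 };
    int *escaped = &old.counter;            /* address taken before the move */

    struct tls_image *cur = relocate_tls(&old);
    if (cur == NULL)
        return -1;

    cur->counter = 2;                       /* later writes hit the new home */
    int stale = (*escaped != cur->counter); /* old pointer still reads 1 */
    free(cur);
    return stale;
}
```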

>    Or we could unload the auditors on TLS
> exhaustion and start again with a larger space estimate.
We would need some additions to the auditor API to indicate when this is 
happening (vs. normal program termination) and to move data to the 
reloaded instance. This would also require significant violence to our 
measurement infrastructure to support save/reload like this, so this is 
not really a preferable solution.

>    Auditors could
> provide their own guesses for static TLS usage that we query upfront and
> take into account for the size calculation.
Is this different from setting the tunable (except with a fancier 
interface)?

> None of this solves the general dlopen case, though.  I have some ideas
> for that, which boil down to “just provide enough address space during
> early startup, so that you never exceed it until the initialization
> phase with dlopen is complete”.  This needs a new TCB allocator, though,
> so it's also quite involved to implement.  It does not solve the problem
> completely, but I expect that it will eliminate pretty much all
> shortcomings of initial-exec TLS we have seen in practice.
>
> With that change, we might not even need the two-phased auditor loading.
I'm not sure I understand this solution. Would this require the static 
TLS block to grow as new libraries are loaded during ELF constructors? 
Or for the entire execution? Would this then mean that any library can 
use initial-exec TLS, regardless of whether it's part of the initial 
link map?

It sounds magical, but if it's possible it would definitely solve the issue.

>> Comments and ideas are welcome. (I would love to have a detailed
>> LD_AUDIT discussion at STW in June.)
> Uhm, what's STW?
As Dr. Mellor-Crummey corrected me, it's the Scalable Tools Workshop 
(https://dyninst.github.io/scalable_tools_workshop/petascale2022/). 
Force of habit using the acronym, sorry for the confusion.

-Jonathon
