Re: LD_AUDIT: Not enough space in static TLS block

From: Florian Weimer <fweimer@redhat.com>
To: Jonathon Anderson <janderson@rice.edu>
Cc: Carlos O'Donell <carlos@redhat.com>,
	 Ben Woodard <woodard@redhat.com>,
	Adhemerval Zanella <adhemerval.zanella@linaro.org>,
	 "Legendre, Matthew P." <legendre1@llnl.gov>,
	 libc-alpha@sourceware.org,
	 John Mellor-Crummey <johnmc@rice.edu>
Subject: Re: LD_AUDIT: Not enough space in static TLS block
Date: Wed, 11 May 2022 15:59:39 +0200
Message-ID: <87fslggofo.fsf@oldenburg.str.redhat.com>
In-Reply-To: <fed70868-a629-fb6b-58d8-27876f9fb158@rice.edu> (Jonathon Anderson's message of "Thu, 5 May 2022 14:56:10 -0500")

* Jonathon Anderson:

> This tunable works for us as a stopgap until a long-term solution can be
> implemented.

Good to know, thanks.

> I had a separate (email) chat with Ben Woodard bouncing ideas for a long-term solution. A
> major difficulty is that LD_AUDIT currently introduces a cyclic dependency:
>  - auditors must be loaded before searching for the application's dependencies (since
> la_objsearch may modify the results), and
>  - dependency searches must complete before the static TLS auto-tuning (since the TLS
> sizes of the initial link-map must be known), but
>  - the static TLS block must be allocated before auditors are loaded (since auditors may also
> use initial-exec TLS).
>
> So, I'm not hopeful for a long-term solution that does not involve
> another LAV_CURRENT bump. We (me and Ben) came up with a couple of
> initial solutions: disallowing initial-exec TLS in auditors,

I'm not sure if this is feasible.  It would mean we cannot use
initial-exec TLS in glibc at all, or in libstdc++ (in case auditors are
written in C++).

And we don't want to build libraries twice (for auditor usage).

> or per-auditor static TLS blocks (ie. TLS namespaces).

We already have that, but there is just one thread pointer, so that does
not solve the problem.  Using a secondary thread pointer has the second
build problem, too.
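
For readers who have not seen one, here is a minimal sketch of an
auditor that touches initial-exec TLS.  It is an illustration only: the
file name, the variable name, and the build and run commands in the
comment are assumptions, not something taken from HPCToolkit or glibc.

```c
/* audit_tls.c: hypothetical minimal auditor that uses initial-exec TLS.
   Assumed build: gcc -shared -fPIC -o audit_tls.so audit_tls.c
   Assumed run:   LD_AUDIT=./audit_tls.so /bin/true  */
#define _GNU_SOURCE
#include <link.h>

/* Initial-exec TLS: the offset from the thread pointer is fixed at
   relocation time, so this variable must live in the static TLS block
   that has to exist before the auditor is relocated.  */
static __thread unsigned int negotiated_version
  __attribute__ ((tls_model ("initial-exec")));

/* Called once by the dynamic linker to negotiate the audit interface
   version.  Returning the passed-in version accepts whatever the
   dynamic linker offers.  */
unsigned int
la_version (unsigned int version)
{
  negotiated_version = version;   /* First access to auditor TLS.  */
  return version;
}
```

Even this tiny module consumes static TLS, and anything it links
against (libc, libstdc++) adds more.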

Auditor TLS usage has a conceptually simple fix, though.  (It's simple
in concept, but implementation requires some refactoring.)  Recall that
for regular process startup (without auditing), we do this:

  (1) map the main executable (the kernel may do this for us)
  (2) recursively map all the dependencies
  (3) calculate static TLS usage
  (4) allocate static TLS space
  (5) perform relocation
  (6) assign TLS variables their initial values
  (7) start running user code (initializers, main)

Once auditors are in the mix, we do this instead:

  (1) guess static TLS usage
  (2) allocate static TLS space
  (3) load each audit module individually, in sequence, as if per dlmopen:
    (3.1) map the auditor
    (3.2) recursively map all its dependencies
    (3.3) perform relocation
    (3.4) calculate and allocate static TLS space (from the global area)
    (3.5) assign TLS variables their initial values
    (3.6) start running auditor code (ELF constructors, la_version)
  (4) map the main executable (the kernel may do this for us)
  (5) recursively map all the dependencies (may involve la_objsearch)
  (6) calculate and allocate static TLS space (from the global area)
  (7) perform relocation
  (8) assign TLS variables their initial values
  (9) start running user code (initializers, main)

Step (1) is the big problem here.  It's just a quick hack to get things
going with TLS, but it has been around for a long time.  What we should
be doing instead is this:

  (1) load each audit module individually, in sequence (no relocation here):
    (1.1) map the auditor
    (1.2) recursively map all its dependencies
    (1.3) calculate static TLS usage for this auditor namespace
  (2) map the main executable (the kernel may do this for us)
  (3) calculate static TLS usage using all TLS size information seen so far
  (4) allocate static TLS space
  (5) complete loading the auditors (relocation and startup):
    (5.1) perform relocation
    (5.2) calculate and allocate static TLS space (from the global area)
    (5.3) assign TLS variables their initial values
    (5.4) start running auditor code (ELF constructors, la_version)
  (6) recursively map all the dependencies (may involve la_objsearch)
  (7) calculate and allocate static TLS space (from the global area)
  (8) perform relocation
  (9) assign TLS variables their initial values
  (10) start running user code (initializers, main)

With this sequence, direct static TLS usage from auditors is taken into
account for the fixed-size TLS allocation at (4), eliminating the
guesswork.  (Step (2) could actually come right before step (6), it
would not alter the picture.)
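
To make the difference between the two sequences concrete, here is a
toy model in C.  It is only a sketch: struct tls_plan and the three
functions are hypothetical names that do not mirror glibc internals,
alignment is ignored, and the "surplus" parameter stands in for
whatever extra room the dynamic linker chooses to add.

```c
#include <stddef.h>

/* Toy model of the fixed-size static TLS area.  */
struct tls_plan
{
  size_t block_size;   /* Size fixed when the block is allocated.  */
  size_t used;         /* Bytes handed out so far.  */
};

/* Current scheme: the size is a guess made before anything is mapped.  */
void
plan_from_guess (struct tls_plan *plan, size_t guess)
{
  plan->block_size = guess;
  plan->used = 0;
}

/* Proposed scheme: auditors and the main executable are mapped first,
   so their TLS sizes are known before the block is allocated.  */
void
plan_from_mapped (struct tls_plan *plan, const size_t *auditor_tls,
                  size_t n_auditors, size_t main_tls, size_t surplus)
{
  size_t total = main_tls + surplus;
  for (size_t i = 0; i < n_auditors; ++i)
    total += auditor_tls[i];
  plan->block_size = total;
  plan->used = 0;
}

/* Carve SIZE bytes from the global area.  Returns 0 on success and -1
   when the area is exhausted ("Not enough space in static TLS block").  */
int
tls_carve (struct tls_plan *plan, size_t size)
{
  if (size > plan->block_size - plan->used)
    return -1;
  plan->used += size;
  return 0;
}
```

With a 2048-byte guess, an auditor that needs 4096 bytes fails in
tls_carve.  With plan_from_mapped, the same auditor is already counted
when the block is sized, so the carve succeeds.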

When no auditor defines la_objsearch, we can do even better and map the
executable and its dependencies before computing the static TLS size,
and only run step (5), complete loading the auditors, after mapping
everything (but before relocation, which needs working auditors for
la_symbind).  In this case, we'd have the same level of information
regarding TLS usage as in the non-auditor case (which is still not
enough in all cases, but another incremental improvement).

With la_objsearch, we could pull a few more tricks.  Auditors could
advertise that the addresses of their TLS variables do not matter, which
would enable us to relocate the TLS space as we discover more objects
that need more static TLS.  Or we could unload the auditors on TLS
exhaustion and start again with a larger space estimate.  Auditors could
provide their own guesses for static TLS usage that we query upfront and
take into account for the size calculation.
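
The "unload and start again" variant could look roughly like the
following.  This is a sketch under assumptions: try_startup is a
hypothetical stand-in for a complete startup attempt, and doubling the
estimate is just one possible growth policy.

```c
#include <stddef.h>

/* Hypothetical stand-in for one full startup attempt.  Returns 0 if
   the static TLS of every object fits in ESTIMATE bytes, else -1.  */
int
try_startup (size_t estimate, const size_t *tls_sizes, size_t count)
{
  size_t used = 0;
  for (size_t i = 0; i < count; ++i)
    {
      if (tls_sizes[i] > estimate - used)
        return -1;   /* Exhausted: auditors would be unloaded here.  */
      used += tls_sizes[i];
    }
  return 0;
}

/* Restart with a doubled estimate until startup succeeds.  Returns the
   estimate that worked, or 0 if doubling would exceed MAX_ESTIMATE.  */
size_t
startup_with_retry (size_t estimate, size_t max_estimate,
                    const size_t *tls_sizes, size_t count)
{
  if (estimate == 0)
    estimate = 1;    /* Doubling zero would never make progress.  */
  for (;;)
    {
      if (try_startup (estimate, tls_sizes, count) == 0)
        return estimate;
      if (estimate > max_estimate / 2)
        return 0;    /* Give up rather than grow past the limit.  */
      estimate *= 2;
    }
}
```

The obvious cost is that auditor constructors and la_version run once
per attempt, so this only works for auditors that tolerate being
loaded more than once.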

None of this solves the general dlopen case, though.  I have some ideas
for that, which boil down to “just provide enough address space during
early startup, so that you never exceed it until the initialization
phase with dlopen is complete”.  This needs a new TCB allocator, though,
so it's also quite involved to implement.  It does not solve the problem
completely, but I expect that it will eliminate pretty much all
shortcomings of initial-exec TLS we have seen in practice.
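
As a rough illustration of "provide enough address space", the usual
building block on Linux is to reserve a large PROT_NONE mapping and
make pages accessible only as they are needed.  The sketch below is
not the TCB allocator mentioned above; struct tls_arena and both
functions are hypothetical names, and the real thing would have to do
this per thread.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical arena: address space reserved up front, made usable
   lazily.  */
struct tls_arena
{
  char *base;         /* Start of the reservation.  */
  size_t reserved;    /* Bytes of address space reserved.  */
  size_t committed;   /* Bytes made readable and writable so far.  */
};

/* Reserve SIZE bytes of address space without making them accessible.
   Returns 0 on success, -1 on failure.  */
int
tls_arena_reserve (struct tls_arena *arena, size_t size)
{
  void *p = mmap (NULL, size, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  if (p == MAP_FAILED)
    return -1;
  arena->base = p;
  arena->reserved = size;
  arena->committed = 0;
  return 0;
}

/* Make the first NEW_SIZE bytes usable.  Addresses handed out earlier
   stay valid because the mapping itself never moves.  */
int
tls_arena_grow (struct tls_arena *arena, size_t new_size)
{
  if (new_size > arena->reserved)
    return -1;
  if (new_size <= arena->committed)
    return 0;
  if (mprotect (arena->base, new_size, PROT_READ | PROT_WRITE) != 0)
    return -1;
  arena->committed = new_size;
  return 0;
}
```

Because growth never changes the base address, offsets computed for
initial-exec TLS remain valid as long as the reservation is not
exceeded.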

With that change, we might not even need the two-phased auditor loading.

> Comments and ideas are welcome. (I would love to have a detailed
> LD_AUDIT discussion at STW in June.)

Uhm, what's STW?

> Our reproducer for the early dl* bug passes with the latest Fedora
> Rawhide, I'll look into using RTLD_DI_PHDR in HPCToolkit in the coming
> weeks.

Thanks!

Florian
