From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <fweimer@redhat.com>
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [170.10.133.124])
 by sourceware.org (Postfix) with ESMTPS id 0CCBC383D83B
 for <libc-alpha@sourceware.org>; Wed, 11 May 2022 13:59:51 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0CCBC383D83B
Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com
 [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 us-mta-524-ETXHHHbjMsW9V-CwzlZF8A-1; Wed, 11 May 2022 09:59:47 -0400
X-MC-Unique: ETXHHHbjMsW9V-CwzlZF8A-1
Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com
 [10.11.54.4])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 350CD85A5A8;
 Wed, 11 May 2022 13:59:47 +0000 (UTC)
Received: from oldenburg.str.redhat.com (unknown [10.39.192.194])
 by smtp.corp.redhat.com (Postfix) with ESMTPS id EDE202026D64;
 Wed, 11 May 2022 13:59:41 +0000 (UTC)
From: Florian Weimer <fweimer@redhat.com>
To: Jonathon Anderson <janderson@rice.edu>
Cc: Carlos O'Donell <carlos@redhat.com>,  Ben Woodard <woodard@redhat.com>,
 Adhemerval Zanella <adhemerval.zanella@linaro.org>,  "Legendre, Matthew
 P." <legendre1@llnl.gov>,  libc-alpha@sourceware.org,  John Mellor-Crummey
 <johnmc@rice.edu>
Subject: Re: LD_AUDIT: Not enough space in static TLS block
References: <da6b1738-a8da-e52a-c9bc-c11cf459abb7@rice.edu>
 <87fsmiiw3x.fsf@oldenburg.str.redhat.com>
 <875ymnxepr.fsf@oldenburg.str.redhat.com>
 <87k0azlwea.fsf@oldenburg.str.redhat.com>
 <fed70868-a629-fb6b-58d8-27876f9fb158@rice.edu>
Date: Wed, 11 May 2022 15:59:39 +0200
In-Reply-To: <fed70868-a629-fb6b-58d8-27876f9fb158@rice.edu> (Jonathon
 Anderson's message of "Thu, 5 May 2022 14:56:10 -0500")
Message-ID: <87fslggofo.fsf@oldenburg.str.redhat.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-5.1 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW,
 SPF_HELO_NONE, SPF_NONE, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Wed, 11 May 2022 13:59:52 -0000

* Jonathon Anderson:

> This tunable works for us as a stopgap until a long-term solution can be
> implemented.

Good to know, thanks.

> I had a separate (email) chat with Ben Woodard bouncing ideas for a
> long-term solution. A major difficulty is that LD_AUDIT currently
> introduces a cyclic dependency:
>  - auditors must be loaded before searching for the application's
>    dependencies (since la_objsearch may modify the results), and
>  - dependency searches must complete before the static TLS auto-tuning
>    (since the TLS sizes of the initial link-map must be known), but
>  - the static TLS block must be allocated before auditors are loaded
>    (since auditors may also use initial-exec TLS).
>
> So, I'm not hopeful for a long-term solution that does not involve
> another LAV_CURRENT bump. We (me and Ben) came up with a couple of
> initial solutions: disallowing initial-exec TLS in auditors,

I'm not sure if this is feasible.  It would mean we cannot use initial-exec
TLS in glibc at all, or in libstdc++ (in case auditors are written in
C++).

And we don't want to build libraries twice (for auditor usage).
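
To make the concern concrete, here is a minimal sketch of an auditor
that uses initial-exec TLS directly.  The counter name and the choice
to count la_objopen calls are made up for illustration; only
la_version, la_objopen and LAV_CURRENT come from <link.h>.  Any
library the auditor links against could contain similar variables,
which is why banning them is hard.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread counter.  The attribute forces the
   initial-exec model, so the variable must live in static TLS.  */
static __thread unsigned long objopen_count
  __attribute__ ((tls_model ("initial-exec")));

/* Tell the dynamic linker which audit interface version we use.  */
unsigned int
la_version (unsigned int version)
{
  (void) version;
  return LAV_CURRENT;
}

/* Called for each newly loaded object; touches the TLS variable.  */
unsigned int
la_objopen (struct link_map *map, Lmid_t lmid, uintptr_t *cookie)
{
  (void) map;
  (void) lmid;
  (void) cookie;
  ++objopen_count;
  return 0;   /* No symbol binding auditing for this object.  */
}
```

Built as a shared object (for example with gcc -shared -fPIC) and
named in LD_AUDIT, a module like this takes its TLS space from the
static TLS area that was sized before the module was even mapped.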

> or per-auditor static TLS blocks (ie. TLS namespaces).

We already have that, but there is just one thread pointer, so that does
not solve the problem.  Using a secondary thread pointer would require
building the libraries twice, too.

Auditor TLS usage has a conceptually simple fix, though.  (It's simple
in concept, but implementation requires some refactoring.)  Recall that
for regular process startup (without auditing), we do this:

  (1) map the main executable (the kernel may do this for us)
  (2) recursively map all the dependencies
  (3) calculate static TLS usage
  (4) allocate static TLS space
  (5) perform relocation
  (6) assign TLS variables their initial values
  (7) start running user code (initializers, main)
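
As a rough illustration of the input to step (3), the PT_TLS segments
of the objects that are already loaded can be inspected from user code
with dl_iterate_phdr.  This is only a sketch: the struct, the function
names and demo_tls_var are hypothetical, the sum ignores alignment
padding, and as far as I know dl_iterate_phdr only reports the
caller's namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* Hypothetical TLS variable so that this program has a PT_TLS segment.  */
__thread int demo_tls_var = 1;

/* Hypothetical accumulator for the walk below.  */
struct tls_totals
{
  size_t modules;   /* Number of PT_TLS segments seen.  */
  size_t bytes;     /* Sum of their p_memsz values (no padding).  */
};

/* dl_iterate_phdr callback: add up the PT_TLS segments of one object.  */
static int
add_tls_segment (struct dl_phdr_info *info, size_t size, void *data)
{
  struct tls_totals *totals = data;
  (void) size;
  assert (totals != NULL);
  for (size_t i = 0; i < info->dlpi_phnum; ++i)
    if (info->dlpi_phdr[i].p_type == PT_TLS)
      {
        ++totals->modules;
        totals->bytes += info->dlpi_phdr[i].p_memsz;
      }
  return 0;   /* Zero means: continue with the next object.  */
}

/* Walk all objects visible to the caller and return the totals.  */
struct tls_totals
loaded_tls_totals (void)
{
  struct tls_totals totals = { 0, 0 };
  dl_iterate_phdr (add_tls_segment, &totals);
  return totals;
}
```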

Once auditors are in the mix, we do this instead:

  (1) guess static TLS usage
  (2) allocate static TLS space
  (3) load each audit module individually, in sequence, as if per dlmopen:
    (3.1) map the auditor
    (3.2) recursively map all its dependencies
    (3.3) perform relocation
    (3.4) calculate and allocate static TLS space (from the global area)
    (3.5) assign TLS variables their initial values
    (3.6) start running auditor code (ELF constructors, la_version)
  (4) map the main executable (the kernel may do this for us)
  (5) recursively map all the dependencies (may involve la_objsearch)
  (6) calculate and allocate static TLS space (from the global area)
  (7) perform relocation
  (8) assign TLS variables their initial values
  (9) start running user code (initializers, main)

Step (1) is the big problem here.  It is just a quick hack to get things
going with TLS, but it has been around for a long time.  What we should
be doing instead is this:

  (1) load each audit module individually, in sequence (no relocation here):
    (1.1) map the auditor
    (1.2) recursively map all its dependencies
    (1.3) calculate static TLS usage for this auditor namespace
  (2) map the main executable (the kernel may do this for us)
  (3) calculate static TLS usage using all TLS size information seen so far
  (4) allocate static TLS space
  (5) complete loading the auditors (relocation and startup):
    (5.1) perform relocation
    (5.2) calculate and allocate static TLS space (from the global area)
    (5.3) assign TLS variables their initial values
    (5.4) start running auditor code (ELF constructors, la_version)
  (6) recursively map all the dependencies (may involve la_objsearch)
  (7) calculate and allocate static TLS space (from the global area)
  (8) perform relocation
  (9) assign TLS variables their initial values
  (10) start running user code (initializers, main)

With this sequence, direct static TLS usage from auditors is taken into
account for the fixed-size TLS allocation at (4), eliminating the
guesswork.  (Step (2) could actually come right before step (6); it
would not alter the picture.)
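
The size calculation in steps (1.3) and (3) could look roughly like
the toy model below.  It is not glibc code: struct tls_req,
static_tls_total and the surplus parameter are hypothetical, and the
real layout depends on the TLS variant of the architecture.  The point
is only that the sizes collected from the auditor namespaces and the
main executable feed one calculation before anything is allocated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of one object's TLS segment, as collected
   while mapping (before relocation).  */
struct tls_req
{
  size_t size;    /* Size of the TLS segment in bytes; 0 if none.  */
  size_t align;   /* Required alignment, must be nonzero.  */
};

/* Round VALUE up to the next multiple of ALIGN.  */
static size_t
round_up (size_t value, size_t align)
{
  assert (align != 0);
  return (value + align - 1) / align * align;
}

/* Lay out all requests one after another and add SURPLUS bytes for
   objects that are discovered later.  */
size_t
static_tls_total (const struct tls_req *reqs, size_t count, size_t surplus)
{
  size_t offset = 0;
  for (size_t i = 0; i < count; ++i)
    {
      if (reqs[i].size == 0)
        continue;   /* Object without TLS.  */
      offset = round_up (offset, reqs[i].align) + reqs[i].size;
    }
  return offset + surplus;
}
```

In this model, the requests of every auditor namespace and of the
main executable would be appended to one array, and the result would
be the size used for the single allocation in step (4).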

When no auditor defines la_objsearch, we can do even better and map the
executable and its dependencies before computing the static TLS size,
and only run step (5), complete loading the auditors, after mapping
everything (but before relocation, which needs working auditors for
la_symbind).  In this case, we'd have the same level of information
regarding TLS usage as in the non-auditor case (which is still not
enough in all cases, but another incremental improvement).

With la_objsearch, we could pull a few more tricks.  Auditors could
advertise that the addresses of their TLS variables do not matter, which
would enable us to relocate the TLS space as we discover more objects
that need more static TLS.  Or we could unload the auditors on TLS
exhaustion and start again with a larger space estimate.  Auditors could
also provide their own guesses for static TLS usage that we query upfront
and take into account for the size calculation.

None of this solves the general dlopen case, though.  I have some ideas
for that, which boil down to “just provide enough address space during
early startup, so that you never exceed it until the initialization
phase with dlopen is complete”.  This needs a new TCB allocator, though,
so it's also quite involved to implement.  It does not solve the problem
completely, but I expect that it will eliminate pretty much all
shortcomings of initial-exec TLS we have seen in practice.

With that change, we might not even need the two-phased auditor loading.

> Comments and ideas are welcome. (I would love to have a detailed
> LD_AUDIT discussion at STW in June.)

Uhm, what's STW?

> Our reproducer for the early dl* bug passes with the latest Fedora
> Rawhide, I'll look into using RTLD_DI_PHDR in HPCToolkit in the coming
> weeks.

Thanks!

Florian