public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: John Mellor-Crummey <johnmc@rice.edu>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: John Mellor-Crummey <johnmc@rice.edu>,
	libc-alpha@sourceware.org, woodard@redhat.com,
	Florian Weimer <fweimer@redhat.com>
Subject: Re: A collection of LD_AUDIT bugs that are important for tools (with better formatting for this list)
Date: Fri, 18 Jun 2021 12:48:01 -0500	[thread overview]
Message-ID: <AE0FD6DC-48C7-444B-AF2A-16EDD1F30CDC@rice.edu> (raw)
In-Reply-To: <2fc830b9-35da-9b94-369f-4df683078a5c@linaro.org>

Adhemerval,

Thanks for your detailed response to the LD_AUDIT issues I raised.
My responses to your questions are interspersed below.

John

> On Jun 17, 2021, at 2:42 PM, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> 
> 
> 
> On 16/06/2021 14:55, John Mellor-Crummey via Libc-alpha wrote:
>> I was encouraged to notify this list about several LD_AUDIT bugs that
>> have significant impact on performance tools that we are developing
>> for Linux in general and US Department of Energy (DOE) parallel
>> supercomputers in particular.
>> 
>> My team develops the HPCToolkit performance tools
>> (https://hpctoolkit.org, https://github.com/HPCToolkit/hpctoolkit)
>> under funding from the DOE. We have been modifying our measurement
>> subsystem to interpose itself between an application and the OS using
>> glibc’s LD_AUDIT capability.
>> 
>> For such tools to succeed, we need many of LD_AUDIT’s features to
>> work. Over the last eight months, we identified six bugs on systems we
>> use. We believe that four are still problems in upstream glibc and two
>> may be fixed (as noted).
>> 
>> We would like confirmed fixes for the following bugs in upstream
>> glibc:
> 
> 
> Thank you very much to approach us about the current state of audit
> interface from the user point of view.  There are some recent discussion
> started by Ben Coyote Woodard some time ago that has stalled.
> 
> I checked the issues you listed below with master (I can't really
> comment their status regarding the usual HPC distributions).
> 
>> 
>> ----------------------------------------------------------
>> Priority   | Issue 
>> —————————————————————————————
>> VERY       | When using an auditor, there is an unacceptable
>> HIGH       | performance degradation of over 10x for PLT
>>           | calls to small procedures even when neither
>>           | la_pltenter or la_pltexit is present.
> 
> The fix provided by https://sourceware.org/legacy-ml/libc-alpha/2013-05/msg00888.html
> seems to fix, I am not sure why it has not been applied.  It also seems to not
> show any regressions.
> 
> I will respin the patch and check if we can add a testcase.

We would be delighted to have this problem resolved!

Trying to avoid the overhead of the profiling resolver in our tool (overriding
the library state set up by glibc to use the standard resolver, ensuring the GOT table 
is writable, etc.) is a complex hack. While we’ve done it right in most cases, there are 
edge cases where we still go through the profiling resolver when monitoring Firefox. 

Having glibc record the binding in the GOT as per usual for PLT calls is best.

> 
> 
>> ----------------------------------------------------------
>> HIGH       | When auditing, a dlmopen of a shared library
>>           | causes a SEGV.
> 
> This does not show on master, I will need to pinpoint exactly which are the
> commits that actually fixed it.
> 
>> ----------------------------------------------------------
>> HIGH       | la_symbind isn't always called when appropriate.
>>           | We observed that glibc 2.26 calls la_symbind
>>           | when appropriate; glibc 2.28 does not.
> 
> I am not sure how to proper fix it, maybe an option is just to disable bind-now
> if we have any audit module enabled. We will need to discuss the possible
> implications.
> 
> (btw your auditor-tests/symbind does not enable bind-now with either
> LD_BIND_NOW=1 or -Wl,-z,now).

Our test was expecting normal lazy binding. 

I would argue that there should be la_symbind callbacks even in the
LD_BIND_NOW case. For us to use la_symbind to replace an application’s use
of a routine with the use of a wrapped routine that informs our tool as needed
(e.g. a call to pthread_create, fork, MPI_Init, etc.), we need the la_symbind callbacks
even if when LD_BIND_NOW is specified for a library.

> 
>> ----------------------------------------------------------
>> HIGH       | glibc does not save all necessary registers
>>           | (e.g. X8 - the indirect result register, truncated
>>           | SIMD registers) when auditing on aarch64 since
>>           | the beginning of time.
> 
> This has been already discussed on the maillist:

I am not surprised that this has been mentioned before. We originally reported 
the problem to Ben Woodard back in September and I know from the Bugzilla records
(and from Ben) that there has been a lot of activity exploring the implications since.

In hindsight, I think our tool may not need this problem to be resolved as long as the
top issue (avoiding profiling PLT calls when it is not wanted) is resolved.
Since we have no interest in profiling PLT calls, I think that having glibc avoid the 
unwanted calls to _dl_runtime_profile by using the regular resolver instead would fix 
our issue since the broken code is in _dl_runtime_profile.

> * NEON support: we have a proposed [1] patch that addresses it by extending the 
>  export struct used. Although the patch seems to fix the issue described by 
>  BZ#26643 it is still incomplete: it would required to bump LAV_CURRENT for
>  aarch64 (and add a arch-specific way to override it), add tests, and check 
>  on how to handle possible incompatbilities. From a briefly chat with other
>  glibc maintainer, we can just set that audit version lower LAV_CURRENT are
>  just unsupported.
> 
> * SVE support: as indicated by Szabolcs SVE calls are marked with the 
>  STO_AARCH64_VARIANT_PCS and thus explicit not supported by dynamic loader. 
>  To support SVE, it would require save/restore all registers (but pass down 
>  arg and retval registers to pltentry/exit callbacks according to the base 
>  PCS). Another option would be to use different LAV_CURRENT version, one for
>  NEON and SVE with 128-bits and another for SVE larger than 256-bits.
>  This also has performance implications.
> 
> [1] https://patchwork.sourceware.org/project/glibc/patch/20200923160448.2321909-1-woodard@redhat.com/
> [2] https://sourceware.org/pipermail/libc-alpha/2020-September/117822.html
> 
>> ----------------------------------------------------------
>> LOW        | When auditing, a dlopen of a shared library
>>           | that uses R_X86_64_TLSDESC causes a SEGV. This
>>           | is reportedly fixed in glibc 2.34.
> 
> It should be fixed by BZ#27137 (8f7e09f4dbdb5c815a18b8285fbc5d5d7bc17d86),
> however it has regressed by 03e187a41d9.  We need the following fix and we
> definitely need a regression tests for this (I will probably use your
> auditor-test) as base:
> 
> diff --git a/elf/dl-open.c b/elf/dl-open.c
> index d2240d8747..d2638ebf05 100644
> --- a/elf/dl-open.c
> +++ b/elf/dl-open.c
> @@ -771,7 +771,7 @@ dl_open_worker (void *a)
>     {
>       struct link_map *libc_map = GL(dl_ns)[args->nsid].libc_map;
> #ifdef SHARED
> -      bool initial = libc_map->l_ns == LM_ID_BASE;
> +      bool initial = libc_map != NULL ? libc_map->l_ns == LM_ID_BASE : false;
> #else
>       /* In the static case, there is only one namespace, but it
>         contains a secondary libc (the primary libc is statically

A fix would be welcome. I don’t understand the glibc issues in detail for this bug
and I’m glad to have help from others that do!

> 
> 
>> ----------------------------------------------------------
>> LOW   | An auditor added to an executable at link time
>>           | with --audit=auditor.so and noted in the DT_AUDIT
>>           | entry of the dynamic section is not called at
>>           | runtime. This is reportedly fixed in glibc 2.32.
>> ----------------------------------------------------------
>> 
> 
> This has also fixed by BZ#24943 on glibc 2.32

Great!

> 
>> 
>> 
>> A repository of reproducers for these bugs can be found here:
>> https://github.com/hpctoolkit/auditor-tests.
> 
> Thanks, this is really helpful.
> 
>> 
>> A detailed writeup of everything known about each of these bugs,
>> including links to Red Hat and Sourceware Bugzilla entries, if any are
>> known to exist, can be found here:
>> https://docs.google.com/document/d/1dVaDBdzySecxQqD6hLLzDrEF18M1UtjDna9gL5BWWI0/edit?usp=sharing
>> 
> 
> And you are correct about your assessment on the document, we *really* need
> more extensible tests for audit interface.  It would be really helpful if
> we could add more tests to exercise the real world usage from tools that
> rely on audit modules.

I know that the things in our test repository aren’t ideal for automated regression
tests since they typically involve looking at the printed output or noticing a SEGV.
At the outset, we were most interested in using them to document the bugs. 
We can work with Ben Woodard to create proper auditor tests.

> 
>> Technical stakeholders for platforms that are HPCToolkit’s principal
>> targets under DOE funding, especially the exascale computing program:
>> 
>> ----------------------------------------------------------
>> Stakeholder | Why
>> ----------------------------------------------------------
>> Intel       | Prime contractor on Aurora exascale system at
>>            | Argonne National Laboratory
>> ----------------------------------------------------------
>> IBM         | Prime contractor and processor vendor for Summit
>>            | and Sierra supercomputers at Oak Ridge National
>>            | Laboratory and Lawrence Livermore  National
>>            | Laboratory.  
>> ----------------------------------------------------------
>> ARM         | Stakeholder who wants all ARM Linux platforms
>>            | to succeed, including Sandia National Laboratory's
>>            | Astra supercomputer and SUNY Stony Brook's
>>            | A64FX-based Ookami.
>> ----------------------------------------------------------
>> AMD         | Processor vendor for Frontier and El Capitan
>>            | exascale supercomputers at Oak Ridge and Lawrence
>>            | Livermore National Laboratories.  
>> ----------------------------------------------------------
>> SuSE        | Linux distribution provider for Cray systems to be
>>            | delivered to Oak Ridge and Lawrence Livermore
>>            | National Laboratories and the A64FX-based system
>>            | installed at SUNY Stony Brook.  
>> ----------------------------------------------------------
>> Red Hat     | Linux distribution provider for Oak Ridge
>>            | National Laboratory s Summit, Lawrence Livermore
>>            | National Laboratory s Sierra.
>> ----------------------------------------------------------
>> Cray        | Prime contractor and system vendor for Oak Ridge
>>            | and Lawrence Livermore National Laboratories,
>>            | and SUNY Stony Brook; system vendor for Argonne
>>            | National Laboratory.
>> ----------------------------------------------------------
>> 
>> 
>> For reference, here is a pointer to the portion of our tool that uses
>> the LD_AUDIT interface:
>> https://github.com/HPCToolkit/hpctoolkit/blob/master/src/tool/hpcrun/audit/auditor.c
>> 
>> Here are some of the capabilities of LD_AUDIT that we need to work and why:
>> 
>> -  We use LD_AUDIT’s la_objopen and la_objclose to track what objects are
>>   in an application’s address space so that our measurement subsystem
>>   can unwind the call stack when a profiling signal is
>>   received. Tracking libraries by wrapping dlopen is problematic for
>>   several reasons. For instance, a wrapper would need to implement RPATH
>>   and RUNPATH semantics because glibc does not provide an alternate
>>   dlopen interface (like _dlsym) so that a wrapper can provide the
>>   return address in the requesting library as an argument which glibc
>>   needs to determine the R_PATH and RUNPATH to use when trying to find
>>   the library and its dependencies.
> 
> It is not clear to me  what kind of extensible interface for audit you are 
> asking here or if la_objopen/la_objclose is already suffice.

The existing la_objopen/la_objclose suffice, provided there is not a blocker that 
prevents us from using LD_AUDIT. 

However, it is worth noting that there is really no other reasonable option 
(e.g. a simple wrapper for dlopen) since there is not an alternate interface 
for dlopen where one can provide the return address (which dlopen uses to 
determine the applicable RPATH or RUNPATH) as one can with _dlsym.

>> -  We want to use LD_AUDIT’s la_symbind32 and la_symbind64 to interpose
>>   wrappers around key functions, e.g. pthread_create. This enables a
>>   tool to intercept functions invoked through pointers obtained with
>>   dlsym, which preloaded wrappers can’t do. (Note: We don’t use
>>   la_symbind for interposition yet, but we plan to when it works
>>   everywhere.)
>> 
>> -  We need auditing to work when an application or a tool library (e.g.,
>>   Intel’s GT-Pin) opens a shared library with dlmopen.
>> 
>> -  We need auditing to work when opening a dynamic library with TLS
>>   dialect gnu2 relocations on x86_64 (R_X86_64_TLSDESC). We don’t have
>>   any special interest in such relocations; at present, they cause a
>>   SEGV when auditing and that must be avoided.
> 
> This should be fixed minus the regression above.
> 
>> 
>> -  We want to add an auditor to an application at link time, noted in the
>>   DT_AUDIT entry of the dynamic section. Loading the DT_AUDIT entry as a
>>   program is launched enables our profiler to be injected into an
>>   application’s address space without a wrapper script that sets
>>   LD_AUDIT and LD_PRELOAD.
> 
> So if I understood correctly you are asking something like a DT_FILTER
> for DT_AUDIT?

Here, I am just explaining how we would use DT_AUDIT. A working 
implementation of DT_AUDIT would meet our need here.

>> -  LD_AUDIT needs to work on aarch64, which is an important target for
>>   our tools. The fact that _dl_runtime_profile does not save register x8
>>   (the indirect result register) is often fatal for applications, which
>>   makes LD_AUDIT unusable for any purpose on aarch64.
>> 
>> -  LD_AUDIT needs to support auditing of inter-object calls on aarch64
>>   when SVE registers are in use.
>> 
>> As a final thing to consider: we understand that there is a tension
>> between security and auditability. We are concerned that changes being
>> considered for security may compromise observability for tools. For
>> tools, we would need a way to authorize full observability even in the
>> cases when that may theoretically reduce security. Perhaps setting
>> DT_AUDIT could be considered as authorizing full observability.
>> 
>> --
>> John Mellor-Crummey         Professor
>> Dept of Computer Science    Rice University
>> email: johnmc@rice.edu      phone: 713-348-5179
>> 


  parent reply	other threads:[~2021-06-18 17:48 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-16 17:55 John Mellor-Crummey
2021-06-17 19:42 ` Adhemerval Zanella
2021-06-17 20:09   ` Florian Weimer
2021-06-17 23:06     ` Adhemerval Zanella
2021-06-23 17:42       ` Ben Woodard
2021-07-30 14:58         ` Adhemerval Zanella
2021-07-30 18:59           ` Ben Woodard
2021-07-30 21:09             ` Adhemerval Zanella
2021-07-31  0:59               ` Ben Woodard
2021-08-04 18:11                 ` Adhemerval Zanella
2021-08-05 10:32                   ` Szabolcs Nagy
2021-08-05 19:36                     ` Ben Woodard
2021-08-06  9:04                       ` Szabolcs Nagy
2021-06-21 19:42     ` John Mellor-Crummey
2021-06-22  8:15       ` Florian Weimer
2021-06-22 15:04         ` John Mellor-Crummey
2021-06-22 15:36           ` Florian Weimer
2021-06-22 16:17             ` John Mellor-Crummey
2021-06-22 16:33               ` Adhemerval Zanella
2021-06-23  6:32                 ` Florian Weimer
2021-06-23 20:06                   ` Mark Krentel
2021-06-18 17:48   ` John Mellor-Crummey [this message]
2021-06-18 18:27     ` Adhemerval Zanella

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AE0FD6DC-48C7-444B-AF2A-16EDD1F30CDC@rice.edu \
    --to=johnmc@rice.edu \
    --cc=adhemerval.zanella@linaro.org \
    --cc=fweimer@redhat.com \
    --cc=libc-alpha@sourceware.org \
    --cc=woodard@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).