RTLD_DEEPBIND interaction with LD_PRELOAD

public inbox for libc-alpha@sourceware.org

* RTLD_DEEPBIND interaction with LD_PRELOAD
@ 2023-03-23 11:06 Matthew Parkinson
  2023-03-31 14:52 ` Jonathon Anderson
From: Matthew Parkinson @ 2023-03-23 11:06 UTC (permalink / raw)
  To: libc-alpha


When a shared library is loaded using RTLD_DEEPBIND, its symbol lookups search the library's own dependency scope before the global scope, so LD_PRELOADed libraries no longer take precedence.  This means that allocator overriding with LD_PRELOAD does not work in applications that load libraries with RTLD_DEEPBIND.  A minimal example can be found here:

    https://github.com/mjp41/deepbindexample/tree/main/problem
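
For readers who do not want to open the repository, the host side of the problem can be sketched roughly as follows. This is only an illustration, not the code from the example: the helper name `load_plugin_deepbind` and any plugin path passed to it are hypothetical.

```c
#define _GNU_SOURCE   /* RTLD_DEEPBIND is a glibc extension */
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load a plugin so that its own dependency scope is
 * searched before the global scope. Returns NULL and prints the loader's
 * error message on failure. */
void *load_plugin_deepbind(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_DEEPBIND);
    if (handle == NULL)
        fprintf(stderr, "dlopen(%s): %s\n", path, dlerror());
    return handle;
}
```

If the host is started with an allocator in LD_PRELOAD, calls to malloc made from inside a plugin loaded this way bind to the libc the plugin depends on rather than to the preloaded allocator. On glibc older than 2.34 the program also needs to be linked with -ldl.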

This causes issues for a number of allocators and for AddressSanitizer. More examples can be found on the Bugzilla issue I raised:
  Bug 30186, "RTLD_DEEPBIND interacts badly with LD_PRELOAD": https://sourceware.org/bugzilla/show_bug.cgi?id=30186
And the Twitter thread:
  https://twitter.com/ParkyMatthew/status/1630500641708683268

I am raising this on libc-alpha to discuss possible solutions, and how acceptable each would be to the community.  This is the list I have so far from discussions with colleagues and feedback from Adhemerval Zanella and Siddhesh Poyarekar:

  1.  Malloc only solutions
     *   Introduce new malloc specific symbols for LD_PRELOAD
     *   Use malloc tunables to specify the allocator
  2.  General solutions
     *   Change RTLD_DEEPBIND to look at LD_PRELOADed libraries first
     *   Introduce new environment variable LD_PRELOAD_OVERRIDE_DEEPBIND(*) that must be respected by RTLD_DEEPBIND
     *   Introduce new RTLD_DEEPBIND_RESPECT_PRELOAD(*) that looks at LD_PRELOAD first.

(*) Naming is not my strong point, just trying to be illustrative.

As an allocator person I am fine with something from the “Malloc only solutions”, but I also appreciate that anything that is added needs to be maintained.  So a quick, specific solution may be a bad long-term choice.  The “General solutions” have far more ramifications, which I personally don’t understand.

Here are some more details of the specific ideas:

1a.  This is probably the quickest solution.  Introduce a collection of internal symbols that are used to override the allocator. I have put a very minimal PoC for a single call at:
  https://github.com/mjp41/deepbindexample/tree/main/solutionopt

The core idea for something exposed would be:

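#include <stdio.h>

/* Default implementation; hidden so it cannot be interposed. */
__attribute__((visibility("hidden")))
void message_impl(void)
{
    puts("lib.c: message_impl");
}

/* Weak reference: its address is NULL unless a loaded object defines it. */
__attribute__((weak))
extern void override_message(void);

void message(void)
{
    if (override_message != NULL)
    {
        puts("lib.c: message -> override_message");
        override_message();
        return;
    }
    puts("lib.c: message -> message_impl");
    message_impl();
}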

Here `message` would be the libc function we want to be able to override.  A library that wants to override this would provide both `message` and `override_message`.  This would then work even in the presence of RTLD_DEEPBIND libraries.  The call from a library that was loaded with RTLD_DEEPBIND would call the libc `message`, which would then call the `override_message` from the preload.

This incurs a single load, compare and branch on the fast path when no LD_PRELOAD override is present.  It does not suffer from the issues of the previous malloc hooks, as the override is resolved by a relocation rather than being a code pointer in the data segment.
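
The preload side of this scheme could look roughly like the sketch below. It is an assumption about how an overriding library would be written, not code from the PoC; the `override_calls` counter is hypothetical and exists only so the behaviour can be checked.

```c
#include <stdio.h>

/* Hypothetical counter, only here to make the sketch checkable. */
int override_calls = 0;

/* Strong definition: satisfies the weak reference in lib.c, so callers that
 * were deep-bound to the library's message() are forwarded here. */
void override_message(void)
{
    override_calls++;
    puts("preload.c: override_message");
}

/* Ordinary interposition: callers that honour LD_PRELOAD bind here directly. */
void message(void)
{
    override_message();
}
```

Either route ends up in `override_message`, which is why the overriding library has to export both symbols.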

1b.  This was proposed by Siddhesh Poyarekar. I think the idea is to expose a “Tunable” parameter that specifies which malloc library to use.  This is very appealing and has a clear meaning to me.  I worry a bit about when Tunables are processed and whether any allocation occurs before then.

2a. This would be the nicest solution if RTLD_DEEPBIND didn’t already exist.  It will alter the existing semantics of programs, and hence is probably a compatibility nightmare.

2b and 2c. Both add a new feature to enable the desired behaviour.  Personally, I prefer 2b as that doesn’t require everything that currently uses RTLD_DEEPBIND to be modified.  However, I do not have enough experience to understand the consequences of either choice properly.

I am sure there are other possible approaches not outlined here, and I am sure there are consequences of each choice that I am not aware of.  However, I do believe making LD_PRELOADing an allocator more reliable is an important feature for glibc.

--
Matthew Parkinson,
Principal Researcher
Microsoft


* Re: RTLD_DEEPBIND interaction with LD_PRELOAD
  2023-03-23 11:06 RTLD_DEEPBIND interaction with LD_PRELOAD Matthew Parkinson
@ 2023-03-31 14:52 ` Jonathon Anderson
From: Jonathon Anderson @ 2023-03-31 14:52 UTC (permalink / raw)
  To: Matthew Parkinson, libc-alpha

Hello all,

Wrapping symbols is of interest to us (the HPCToolkit folks), so I thought I would jump in here and bring up a 3rd option that we are excited about: LD_AUDIT-powered symbol injection.

We currently use LD_PRELOAD + dlsym(RTLD_NEXT) to wrap critical symbols; however, we have also encountered unavoidable limitations with the approach for some applications in the wild. We haven't run into RTLD_DEEPBIND before, but we have found many other issues:
 - dlsym(): If the symbol is fetched directly with dlsym(), LD_PRELOAD does not apply. (And yes, there is code out there that does `dlsym(dlopen(libc.so.6))`. :/)
 - dlopen(RTLD_LOCAL): dlsym(RTLD_NEXT) fails if the "victim" symbol is only loaded as the dependency of a library loaded with dlopen(RTLD_LOCAL).
 - dlmopen() namespaces: LD_PRELOAD only applies to the main namespace, symbols in private dlmopen() namespaces are unaffected.
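
For context, the LD_PRELOAD + dlsym(RTLD_NEXT) pattern these limitations apply to looks roughly like the sketch below. It wraps getpid purely as a harmless stand-in for a "critical symbol"; the `wrapped_calls` counter is hypothetical.

```c
#define _GNU_SOURCE   /* RTLD_NEXT */
#include <dlfcn.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical counter showing that the wrapper was reached. */
int wrapped_calls = 0;

/* Interposes getpid: look up the next definition in the search order once,
 * then forward every call to it. */
pid_t getpid(void)
{
    static pid_t (*real_getpid)(void);
    if (real_getpid == NULL)
        real_getpid = (pid_t (*)(void))dlsym(RTLD_NEXT, "getpid");
    wrapped_calls++;
    return real_getpid != NULL ? real_getpid() : (pid_t)-1;
}
```

The wrapper only sees callers whose lookup reaches the preloaded object first, and the forwarding only works when dlsym(RTLD_NEXT) can find the original, which is where the cases listed above break down.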

The alternative we are considering uses LD_AUDIT's la_symbind hook to inject our wrappers. This hook fires *every* time a symbol gets bound or on dlsym(), avoiding the narrow application issues with LD_PRELOAD. The hook also receives the to-be-bound target as an argument, avoiding the issues with dlsym(RTLD_NEXT). The power of this approach makes it a very appealing alternative to LD_PRELOAD.
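
A minimal sketch of what such an audit module might look like, assuming a 64-bit target and wrapping a hypothetical symbol named "message" (the names `real_message` and `wrapped_message` are made up for illustration; this is not the HPCToolkit implementation):

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Original target of "message", captured when the symbol is bound. */
void (*real_message)(void) = NULL;

/* Hypothetical wrapper that runs in place of the original. */
void wrapped_message(void)
{
    puts("audit.c: wrapped_message");
    if (real_message != NULL)
        real_message();
}

/* Tell the dynamic linker which audit interface version we implement. */
unsigned int la_version(unsigned int version)
{
    (void)version;
    return LAV_CURRENT;
}

/* Ask for symbol-binding callbacks both to and from every loaded object. */
unsigned int la_objopen(struct link_map *map, Lmid_t lmid, uintptr_t *cookie)
{
    (void)map; (void)lmid; (void)cookie;
    return LA_FLG_BINDTO | LA_FLG_BINDFROM;
}

/* Called on each binding; the return value is the address actually bound. */
uintptr_t la_symbind64(Elf64_Sym *sym, unsigned int ndx,
                       uintptr_t *refcook, uintptr_t *defcook,
                       unsigned int *flags, const char *symname)
{
    (void)ndx; (void)refcook; (void)defcook; (void)flags;
    if (strcmp(symname, "message") == 0) {
        real_message = (void (*)(void))sym->st_value;  /* to-be-bound target */
        return (uintptr_t)wrapped_message;
    }
    return sym->st_value;  /* leave every other binding untouched */
}
```

Built as a shared object (for example `gcc -shared -fPIC audit.c -o libaudit.so`, file names hypothetical) and named in LD_AUDIT, the module receives the original address in `sym->st_value`, so no dlsym(RTLD_NEXT) lookup is needed.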

We have not yet tried LD_AUDIT-powered symbol injection in the wild, but I did write [a small test matrix with some possible wrapper implementations][1] for preliminary research. So far, the basic LD_AUDIT-powered implementations are very promising and avoid the issues we see with LD_PRELOAD.

[1]: https://gitlab.com/blue42u/ldaudit-power-tests/-/tree/main/symbol-wrapping

-Jonathon
