When a shared library is loaded with RTLD_DEEPBIND, symbol lookups from that library do not give preference to LD_PRELOADed libraries.  This means that overriding the allocator with LD_PRELOAD does not work in applications that load libraries with RTLD_DEEPBIND.  A minimal example can be found here:

    deepbindexample/problem at main · mjp41/deepbindexample · GitHub<https://github.com/mjp41/deepbindexample/tree/main/problem>
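
For readers who want the shape of the problem without opening the repository, here is a minimal sketch of how an application might load a library with RTLD_DEEPBIND.  The helper name `load_deepbind_symbol` is hypothetical and is not taken from the linked example; the sketch assumes glibc on Linux (older glibc versions also need `-ldl` at link time).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* RTLD_DEEPBIND is a GNU extension */
#endif
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load `path` with RTLD_DEEPBIND and return the
   address of `symbol`, or NULL on failure. */
void *load_deepbind_symbol(const char *path, const char *symbol)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_DEEPBIND);
    if (handle == NULL)
    {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }

    /* The handle is deliberately left open so that the returned
       address stays valid for the caller. */
    void *addr = dlsym(handle, symbol);
    if (addr == NULL)
    {
        fprintf(stderr, "dlsym: %s\n", dlerror());
    }
    return addr;
}
```

Calls to malloc made from inside a library loaded this way are resolved in the library's own dependency scope first, which includes libc, so an allocator supplied through LD_PRELOAD is bypassed for those calls while the rest of the process still uses it.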

This causes issues for a number of allocators and for AddressSanitizer. More examples can be found in the Bugzilla issue I raised:
  30186 – RTLD_DEEPBIND interacts badly with LD_PRELOAD (sourceware.org)<https://sourceware.org/bugzilla/show_bug.cgi?id=30186>
And in the Twitter thread:
  Twitter thread on RTLD_DEEPBIND<https://twitter.com/ParkyMatthew/status/1630500641708683268>

I am raising this on libc-alpha to discuss possible solutions, and how acceptable each would be to the community.  This is the list I have so far from discussions with colleagues and feedback from Adhemerval Zanella and Siddhesh Poyarekar:

  1.  Malloc only solutions
     *   Introduce new malloc specific symbols for LD_PRELOAD
     *   Use malloc tunables to specify the allocator
  2.  General solutions
     *   Change RTLD_DEEPBIND to look at LD_PRELOADed libraries first
     *   Introduce new environment variable LD_PRELOAD_OVERRIDE_DEEPBIND(*) that must be respected by RTLD_DEEPBIND
     *   Introduce new RTLD_DEEPBIND_RESPECT_PRELOAD(*) that looks at LD_PRELOAD first.

(*) Naming is not my strong point, just trying to be illustrative.

As an allocator person I am fine with something from the “Malloc only solutions”, but I also appreciate that anything added has to be maintained, so a quick, specific solution may be a bad long-term choice.  The “General solutions” have far more ramifications, which I personally don’t understand.

Here are some more details of the specific ideas:

1a.  This is probably the quickest solution.  Introduce a collection of internal symbols that are used to override the allocator. I have put a very minimal PoC for a single call at:
  deepbindexample/solutionopt at main · mjp41/deepbindexample · GitHub<https://github.com/mjp41/deepbindexample/tree/main/solutionopt>

The core idea for something exposed would be:

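#include <stdio.h>

/* Default implementation; hidden so that it cannot be interposed. */
__attribute__((visibility("hidden")))
void message_impl(void)
{
    puts("lib.c: message_impl");
}

/* Weak reference: resolves to NULL unless a loaded object defines it. */
__attribute__((weak))
extern void override_message(void);

void message(void)
{
    if (override_message != NULL)
    {
        puts("lib.c: message -> override_message");
        override_message();
        return;
    }
    puts("lib.c: message -> message_impl");
    message_impl();
}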

Here `message` would be the libc function we want to be able to override.  A library that wants to override this would provide both `message` and `override_message`.  This would then work even in the presence of RTLD_DEEPBIND libraries.  The call from a library that was loaded with RTLD_DEEPBIND would call the libc `message`, which would then call the `override_message` from the preload.

This incurs a single load, compare and branch on the fast path when no override is preloaded.  It does not suffer from the problems of the previous malloc hooks, as the override is bound through a relocation rather than a code pointer in the data segment.
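
Applied to the allocator, the same pattern might look like the sketch below.  The names `override_malloc`, `malloc_impl` and `demo_malloc` are hypothetical and are not existing glibc symbols: `demo_malloc` stands in for libc's `malloc`, and `malloc_impl` simply forwards to the system `malloc` so that the sketch can be compiled on its own.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical override symbol (not a real glibc symbol).  As a weak
   reference it resolves to NULL unless some loaded object, such as a
   preloaded allocator, defines it. */
__attribute__((weak))
extern void *override_malloc(size_t size);

/* Stand-in for the allocator's internal implementation. */
static void *malloc_impl(size_t size)
{
    return malloc(size);
}

/* Stand-in for the public entry point that libc would export. */
void *demo_malloc(size_t size)
{
    if (override_malloc != NULL)
    {
        /* An override is present: forward to it. */
        return override_malloc(size);
    }
    /* No override: use the default implementation. */
    return malloc_impl(size);
}
```

Under this assumed naming, a preloaded allocator would define both `malloc` and `override_malloc`.  When nothing defines the weak symbol, the reference is NULL and the default path is taken.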

1b.  This was proposed by Siddhesh Poyarekar. I think the idea is to expose a “Tunable” parameter that specifies which malloc library to use.  This is very appealing and has a clear meaning to me.  I worry a bit about when Tunables are processed, and whether any allocation occurs before then.

2a. This would be the nicest solution if RTLD_DEEPBIND didn’t already exist.  It would alter the existing semantics of programs, and hence is probably a compatibility nightmare.

2b and 2c. Both add a new feature to enable the desired behaviour.  Personally, I prefer 2b, as it doesn’t require everything that currently uses RTLD_DEEPBIND to be modified.  However, I do not have enough experience to understand the consequences of either choice properly.

I am sure there are other possible approaches not outlined here, and I am sure there are consequences of each choice that I am not aware of.  However, I do believe making LD_PRELOADing an allocator more reliable is an important feature for glibc.

--
Matthew Parkinson,
Principal Researcher
Microsoft