* [Bug libc/31986] New: Loading the same library within an audit library and within an application can cause ld.so to crash with an assert.
@ 2024-07-17 19:00 woodard at redhat dot com
  2024-08-06  9:13 ` [Bug dynamic-link/31986] " fweimer at redhat dot com
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: woodard at redhat dot com @ 2024-07-17 19:00 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31986

            Bug ID: 31986
           Summary: Loading the same library within an audit library and
                    within an application can cause ld.so to crash with an
                    assert.
           Product: glibc
           Version: 2.39
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
          Assignee: unassigned at sourceware dot org
          Reporter: woodard at redhat dot com
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

Created attachment 15630
  --> https://sourceware.org/bugzilla/attachment.cgi?id=15630&action=edit
reproducer

This is a particularly serious issue for auditor-based tools that need to
interface with binaries within the application namespace. Tools often need to
make calls to a library immediately after it is loaded, before application code
starts to use the library. It is not safe to call into the library before its
init constructors have run, and the auditor interface does not provide a
callback after init constructors have run. Thus the only alternative is to
"promote" the init constructors through a recursive call to dl*open during
la_activity(CONSISTENT).
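
For readers unfamiliar with the pattern, the "promotion" described above can be
sketched roughly as follows. This is a minimal illustration and not the
attached reproducer: the target name "libinit.so" is borrowed from the test
output in this report, and the bookkeeping variables (target_path, promoted)
are hypothetical. Only the la_* entry points and dlmopen are the standard
rtld-audit and libdl interfaces.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <link.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: substring identifying the library the tool cares about.  */
static const char target_name[] = "libinit.so";

/* Path reported by la_objopen for the target, or NULL if not seen yet.  */
static const char *target_path;
/* Nonzero once the recursive open has been attempted.  */
static int promoted;

unsigned int
la_version (unsigned int version)
{
  (void) version;
  /* Tell ld.so which audit interface version this auditor implements.  */
  return LAV_CURRENT;
}

unsigned int
la_objopen (struct link_map *map, Lmid_t lmid, uintptr_t *cookie)
{
  (void) cookie;
  assert (map != NULL);
  /* Remember the target when it appears in the application namespace.
     At this point it is mapped, but its constructors have not run.  */
  if (lmid == LM_ID_BASE && map->l_name != NULL
      && strstr (map->l_name, target_name) != NULL)
    target_path = map->l_name;
  return 0;                     /* No symbol-binding auditing requested.  */
}

void
la_activity (uintptr_t *cookie, unsigned int flag)
{
  (void) cookie;
  if (flag != LA_ACT_CONSISTENT || target_path == NULL || promoted)
    return;
  promoted = 1;
  /* Recursive open into the application namespace, intended to force the
     target's constructors to run before the tool calls into it.  This is
     the kind of call that trips the assertion reported here.  */
  void *handle = dlmopen (LM_ID_BASE, target_path, RTLD_NOW);
  (void) handle;                /* A real tool would dlsym() from it here.  */
}
```

Built as a shared object and named in LD_AUDIT, a library along these lines
performs the inner dl*open while the outer load of the same object is still in
progress, which is the situation the failing test cases exercise.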

In particular cases the dynamic linker asserts with:

Inconsistency detected by ld.so: dl-open.c: 627: dl_open_worker_begin:
Assertion `r_state == RT_CONSISTENT' failed!
make: [Makefile:19: test] Error 127 (ignored)

To run the attached reproducer, simply:

tar xvzf recursive-dlopen-crashes.tar.gz
cd recursive-dlopen-crashes
make

The two test cases which fail are at the end of the output:

Outer dlopen(libinit), inner dlopen(libinit):
LD_AUDIT=./auditor.so ./main
[main] Dlopening libinit...
[audit] libinit has been loaded (but not initialized)
[audit] First CONSISTENT with libinit, dlopening libinit...
Inconsistency detected by ld.so: dl-open.c: 627: dl_open_worker_begin:
Assertion `r_state == RT_CONSISTENT' failed!
make: [Makefile:19: test] Error 127 (ignored)

Outer dlopen(libwrap), inner dlopen(libwrap):
LD_AUDIT=./auditor-wrap.so ./main-wrap
[main] Dlopening libwrap...
[audit] libinit has been loaded (but not initialized)
[audit] First CONSISTENT with libinit, dlopening libwrap...
Inconsistency detected by ld.so: dl-open.c: 627: dl_open_worker_begin:
Assertion `r_state == RT_CONSISTENT' failed!
make: [Makefile:22: test] Error 127 (ignored)

This use case is particularly problematic for LD_AUDIT-based tools that work
with GPU frameworks. For instance, in HPCToolkit, as part of initialization we
call into libcuda.so to set up callbacks for monitoring CUDA operations. We
call dlopen/dlsym to access the libcuda.so API without creating a direct
dependency (to prevent loading libcuda.so for non-CUDA applications). However,
some application frameworks initiate CUDA operations during their init
constructors. To capture these operations, along with other operations of
interest such as thread creation, we initialize as soon as libcuda.so is
loaded. If the first action by an application framework's init constructor is
a dlopen(libcuda.so) (seen in IBM’s XL OpenMP runtime when used by Clang for
OpenMP offloading), we initialize during this call and recursively
dlopen(libcuda.so), and subsequently crash due to this bug.
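
The dlopen/dlsym access pattern mentioned above, which avoids a link-time
dependency on an optional library, might look roughly like this. It is a
generic sketch and not HPCToolkit code; the helper name lookup_optional is
hypothetical.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Look up SYMBOL in LIBRARY without a link-time dependency on it.
   LIBRARY may be NULL to search the main program and its dependencies.
   Returns NULL if the library or the symbol is unavailable.  */
void *
lookup_optional (const char *library, const char *symbol)
{
  assert (symbol != NULL);
  void *handle = dlopen (library, RTLD_NOW | RTLD_LOCAL);
  if (handle == NULL)
    return NULL;                /* For example: not a CUDA system.  */
  void *addr = dlsym (handle, symbol);
  if (addr == NULL)
    dlclose (handle);           /* Nothing to keep the library open for.  */
  /* On success the handle is deliberately kept open so that the returned
     address stays valid for the lifetime of the process.  */
  return addr;
}
```

A tool would call something like lookup_optional ("libcuda.so", "cuInit") and
cast the result to the matching function pointer type before use. When that
call happens inside la_activity while the application is itself in the middle
of dlopen(libcuda.so), it becomes the recursive open described in this report.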

-- 
You are receiving this mail because:
You are on the CC list for the bug.

* [Bug dynamic-link/31986] Loading the same library within an audit library and within an application can cause ld.so to crash with an assert.
  2024-07-17 19:00 [Bug libc/31986] New: Loading the same library within an audit library and within an application can cause ld.so to crash with an assert woodard at redhat dot com
@ 2024-08-06  9:13 ` fweimer at redhat dot com
  2024-08-06 17:50 ` fweimer at redhat dot com
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: fweimer at redhat dot com @ 2024-08-06  9:13 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31986

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|libc                        |dynamic-link
             Status|NEW                         |ASSIGNED
              Flags|                            |security-
                 CC|                            |fweimer at redhat dot com
           Assignee|unassigned at sourceware dot org   |fweimer at redhat dot com

--- Comment #1 from Florian Weimer <fweimer at redhat dot com> ---
The recursive dlmopen hits the assert in the already-open path in
dl_open_worker_begin:

  /* It was already open.  */
  if (__glibc_unlikely (new->l_searchlist.r_list != NULL))
    {
      /* Let the user know about the opencount.  */
      if (__glibc_unlikely (GLRO(dl_debug_mask) & DL_DEBUG_FILES))
        _dl_debug_printf ("opening file=%s [%lu]; direct_opencount=%u\n\n",
                          new->l_name, new->l_ns, new->l_direct_opencount);

      /* If the user requested the object to be in the global
         namespace but it is not so far, prepare to add it now.  This
         can raise an exception to do a malloc failure.  */
      if ((mode & RTLD_GLOBAL) && new->l_global == 0)
        add_to_global_resize (new);

      /* Mark the object as not deletable if the RTLD_NODELETE flags
         was passed.  */
      if (__glibc_unlikely (mode & RTLD_NODELETE))
        {
          if (__glibc_unlikely (GLRO (dl_debug_mask) & DL_DEBUG_FILES)
              && !new->l_nodelete_active)
            _dl_debug_printf ("marking %s [%lu] as NODELETE\n",
                              new->l_name, new->l_ns);
          new->l_nodelete_active = true;
        }

      /* Finalize the addition to the global scope.  */
      if ((mode & RTLD_GLOBAL) && new->l_global == 0)
        add_to_global_update (new);

      const int r_state __attribute__ ((unused))
        = _dl_debug_update (args->nsid)->r_state;
      assert (r_state == RT_CONSISTENT);

I think we need to look at new->l_init_called and re-run the constructors along
new->l_searchlist.r_list if new->l_init_called is false. Not sure if we want to
switch to RT_CONSISTENT before that, or leave it (potentially) at RT_ADD.
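
To make the idea concrete, here is a toy model of "re-run the constructors
along the search list if they have not been called". It is emphatically not
glibc code: struct toy_map and toy_run_pending_inits are hypothetical stand-ins
for the link map, l_init_called and l_searchlist.r_list, and the list is simply
assumed to already be in initialization order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a link map; not glibc's struct link_map.  */
struct toy_map
{
  const char *name;
  bool init_called;             /* Plays the role of l_init_called.  */
  void (*init) (void);          /* Constructor, may be NULL.  */
};

/* Run the constructors that have not run yet for every object on LIST
   (standing in for l_searchlist.r_list).  Returns how many were run.  */
size_t
toy_run_pending_inits (struct toy_map **list, size_t n)
{
  assert (list != NULL || n == 0);
  size_t ran = 0;
  for (size_t i = 0; i < n; i++)
    if (!list[i]->init_called)
      {
        /* Mark before calling so that a nested open of the same object
           does not run the constructor a second time.  */
        list[i]->init_called = true;
        if (list[i]->init != NULL)
          list[i]->init ();
        ran++;
      }
  return ran;
}
```

In this model an "already open" object whose constructor is still pending gets
initialized by the nested open, and a second call finds nothing left to do.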

* [Bug dynamic-link/31986] Loading the same library within an audit library and within an application can cause ld.so to crash with an assert.
  2024-07-17 19:00 [Bug libc/31986] New: Loading the same library within an audit library and within an application can cause ld.so to crash with an assert woodard at redhat dot com
  2024-08-06  9:13 ` [Bug dynamic-link/31986] " fweimer at redhat dot com
@ 2024-08-06 17:50 ` fweimer at redhat dot com
  2024-08-06 23:15 ` woodard at redhat dot com
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: fweimer at redhat dot com @ 2024-08-06 17:50 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31986

--- Comment #2 from Florian Weimer <fweimer at redhat dot com> ---
I've got a reproducer of the missing constructor call that doesn't even need an
auditor.

* [Bug dynamic-link/31986] Loading the same library within an audit library and within an application can cause ld.so to crash with an assert.
  2024-07-17 19:00 [Bug libc/31986] New: Loading the same library within an audit library and within an application can cause ld.so to crash with an assert woodard at redhat dot com
  2024-08-06  9:13 ` [Bug dynamic-link/31986] " fweimer at redhat dot com
  2024-08-06 17:50 ` fweimer at redhat dot com
@ 2024-08-06 23:15 ` woodard at redhat dot com
  2024-08-07  9:01 ` fweimer at redhat dot com
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: woodard at redhat dot com @ 2024-08-06 23:15 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31986

--- Comment #3 from Ben Woodard <woodard at redhat dot com> ---
I talked to the original problem reporter and his opinion is that it should
transition through RT_CONSISTENT rather than staying in RT_ADD. One of the
reasons is there is an assumption in tools that user code such as library
constructors should not be running while the linker state is in RT_ADD.

* [Bug dynamic-link/31986] Loading the same library within an audit library and within an application can cause ld.so to crash with an assert.
  2024-07-17 19:00 [Bug libc/31986] New: Loading the same library within an audit library and within an application can cause ld.so to crash with an assert woodard at redhat dot com
                   ` (2 preceding siblings ...)
  2024-08-06 23:15 ` woodard at redhat dot com
@ 2024-08-07  9:01 ` fweimer at redhat dot com
  2024-08-07 10:07 ` fweimer at redhat dot com
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: fweimer at redhat dot com @ 2024-08-07  9:01 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31986

--- Comment #4 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to Ben Woodard from comment #3)
> I talked to the original problem reporter and his opinion is that it should
> transition through RT_CONSISTENT rather than staying in RT_ADD. One of the
> reasons is there is an assumption in tools that user code such as library
> constructors should not be running while the linker state is in RT_ADD.

My patches swap the order of the la_activity calls and the switch back to
RT_CONSISTENT. This takes care of the asserts (after some other fixes …), and I
think it makes sense from a conceptual point of view, too. Introducing more
la_activity calls is problematic because, even in our limited tests, this can
introduce infinite recursion that wasn't there before.
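
The effect of swapping the order can be illustrated with a toy model. This is
a sketch under heavy simplification, not the patch itself: all ord_* names are
hypothetical, the "auditor callback" only records whether a nested open would
have seen a consistent state, and the real code has many more steps between
the two events.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ord_state { ORD_CONSISTENT, ORD_ADD };

struct ord_linker
{
  enum ord_state state;         /* Plays the role of r_state.  */
  bool nested_open_ok;          /* What a nested open observed.  */
};

/* Stand-in for an auditor's la_activity (CONSISTENT) handler that does a
   nested open; the nested open checks the state like the quoted assert.  */
void
ord_notify_consistent (struct ord_linker *l)
{
  assert (l != NULL);
  l->nested_open_ok = (l->state == ORD_CONSISTENT);
}

/* Notify auditors first, switch the state afterwards.  */
bool
ord_finish_notify_first (struct ord_linker *l)
{
  assert (l != NULL);
  l->state = ORD_ADD;           /* An open is in progress.  */
  ord_notify_consistent (l);
  l->state = ORD_CONSISTENT;
  return l->nested_open_ok;
}

/* Switch the state first, notify auditors afterwards.  */
bool
ord_finish_switch_first (struct ord_linker *l)
{
  assert (l != NULL);
  l->state = ORD_ADD;
  l->state = ORD_CONSISTENT;
  ord_notify_consistent (l);
  return l->nested_open_ok;
}
```

With the first ordering the nested open sees the ORD_ADD state and would fail
the check; with the second it sees ORD_CONSISTENT, without adding any extra
notification calls.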

* [Bug dynamic-link/31986] Loading the same library within an audit library and within an application can cause ld.so to crash with an assert.
  2024-07-17 19:00 [Bug libc/31986] New: Loading the same library within an audit library and within an application can cause ld.so to crash with an assert woodard at redhat dot com
                   ` (3 preceding siblings ...)
  2024-08-07  9:01 ` fweimer at redhat dot com
@ 2024-08-07 10:07 ` fweimer at redhat dot com
  2024-09-06 13:55 ` carlos at redhat dot com
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: fweimer at redhat dot com @ 2024-08-07 10:07 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31986

--- Comment #5 from Florian Weimer <fweimer at redhat dot com> ---
Patches posted:

[PATCH 0/4] Fixes for recursive dlopen (bug 31986)
<https://inbox.sourceware.org/libc-alpha/cover.1723024001.git.fweimer@redhat.com/>

* [Bug dynamic-link/31986] Loading the same library within an audit library and within an application can cause ld.so to crash with an assert.
  2024-07-17 19:00 [Bug libc/31986] New: Loading the same library within an audit library and within an application can cause ld.so to crash with an assert woodard at redhat dot com
                   ` (4 preceding siblings ...)
  2024-08-07 10:07 ` fweimer at redhat dot com
@ 2024-09-06 13:55 ` carlos at redhat dot com
  2024-09-19 19:53 ` woodard at redhat dot com
  2024-09-19 23:23 ` woodard at redhat dot com
  7 siblings, 0 replies; 9+ messages in thread
From: carlos at redhat dot com @ 2024-09-06 13:55 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31986

Carlos O'Donell <carlos at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com

--- Comment #6 from Carlos O'Donell <carlos at redhat dot com> ---
v2 posted by Florian:
https://patchwork.sourceware.org/project/glibc/list/?series=37208

* [Bug dynamic-link/31986] Loading the same library within an audit library and within an application can cause ld.so to crash with an assert.
  2024-07-17 19:00 [Bug libc/31986] New: Loading the same library within an audit library and within an application can cause ld.so to crash with an assert woodard at redhat dot com
                   ` (5 preceding siblings ...)
  2024-09-06 13:55 ` carlos at redhat dot com
@ 2024-09-19 19:53 ` woodard at redhat dot com
  2024-09-19 23:23 ` woodard at redhat dot com
  7 siblings, 0 replies; 9+ messages in thread
From: woodard at redhat dot com @ 2024-09-19 19:53 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31986

--- Comment #7 from Ben Woodard <woodard at redhat dot com> ---
I think that I may need a bit more for this to be a complete fix of the
problem.

I built a local version of the trunk with:
fae459a273 (HEAD -> fw-fixes) elf: Signal RT_CONSISTENT after relocation processing in dlopen (bug 31986)
be59ac60e3 elf: Signal LA_ACT_CONSISTENT to auditors after RT_CONSISTENT switch
00cdcdfe1a elf: Run constructors on cyclic recursive dlopen (bug 31986)
edf36ee9ab elf: Reorder audit events in dlcose to match _dl_fini (bug 32066)
d5167014b6 elf: Call la_objclose for proxy link maps in _dl_fini (bug 32065)
7bd0d8585d elf: Signal la_objopen for the proxy link map in dlmopen (bug 31985)
e36412841b elf: Update DSO list, write audit log to elf/tst-audit23.out
e64a1e81aa (origin/master, origin/HEAD, master) tst: Extend cross-test-ssh.sh to support passing glibc tunables

and though it makes it through the first few test cases and gets farther than
when I originally filed the bug, it doesn't make it through all of them.

[ben@darkstar build]$ ./testrun.sh /usr/bin/make -C ../../test/auditor-tests/tier2/recursive-dlopen-crashes
make: Entering directory '/home/ben/Shared/Work/test/auditor-tests/tier2/recursive-dlopen-crashes'

All tests below require lines end in OK (not FAIL and no error)

Outer dlopen(libwrap), inner dlopen(libinit):
LD_AUDIT=./auditor.so ./main-wrap
[main] Dlopening libwrap...
[audit] libinit has been loaded (but not initialized)
[audit] First CONSISTENT with libinit, dlopening libinit...
  [libinit] Initializing... OK
[audit -> libinit] Validating libinit has initialized...
  [libinit] Checking if initialized... OK
[main -> libwrap] Validating libinit has initialized...
[libwrap -> libinit] Passing validation down to libinit...
  [libinit] Checking if initialized... OK

Outer libinit preloaded, inner dlopen(libinit):
LD_PRELOAD=./libinit.so LD_AUDIT=./auditor.so ./main
[audit] libinit has been loaded (but not initialized)
[audit] First CONSISTENT with libinit, dlopening libinit...
  [libinit] Initializing... OK
[audit -> libinit] Validating libinit has initialized...
  [libinit] Checking if initialized... OK
[main] Dlopening libinit...
[main -> libinit] Validating libinit has initialized...
  [libinit] Checking if initialized... OK

Outer libinit loaded by main dependency, inner dlopen(libinit):
LD_AUDIT=./auditor.so ./main-init
[audit] libinit has been loaded (but not initialized)
[audit] First CONSISTENT with libinit, dlopening libinit...
  [libinit] Initializing... OK
[audit -> libinit] Validating libinit has initialized...
  [libinit] Checking if initialized... OK
[main] Dlopening libinit...
[main -> libinit] Validating libinit has initialized...
  [libinit] Checking if initialized... OK

Outer dlopen(libinit), inner dlopen(libwrap):
LD_AUDIT=./auditor-wrap.so ./main
[main] Dlopening libinit...
[audit] libinit has been loaded (but not initialized)
[audit] First CONSISTENT with libinit, dlopening libwrap...
  [libinit] Initializing... OK
[audit -> libwrap] Validating libinit has initialized...
[libwrap -> libinit] Passing validation down to libinit...
  [libinit] Checking if initialized... OK
[main -> libinit] Validating libinit has initialized...
  [libinit] Checking if initialized... OK

Outer dlopen(libinit), inner dlopen(libinit):
LD_AUDIT=./auditor.so ./main
[main] Dlopening libinit...
[audit] libinit has been loaded (but not initialized)
[audit] First CONSISTENT with libinit, dlopening libinit...
Inconsistency detected by ld.so: dl-open.c: 624: dl_open_worker_begin:
Assertion `_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT'
failed!
make: [Makefile:19: test] Error 127 (ignored)

Outer dlopen(libwrap), inner dlopen(libwrap):
LD_AUDIT=./auditor-wrap.so ./main-wrap
[main] Dlopening libwrap...
[audit] libinit has been loaded (but not initialized)
[audit] First CONSISTENT with libinit, dlopening libwrap...
Inconsistency detected by ld.so: dl-open.c: 624: dl_open_worker_begin:
Assertion `_dl_debug_initialize (0, args->nsid)->r_state == RT_CONSISTENT'
failed!
make: [Makefile:22: test] Error 127 (ignored)
make: Leaving directory
'/home/ben/Shared/Work/test/auditor-tests/tier2/recursive-dlopen-crashes'

Is there a patch that I missed? Or some other set of patches that I need to
apply first?

* [Bug dynamic-link/31986] Loading the same library within an audit library and within an application can cause ld.so to crash with an assert.
  2024-07-17 19:00 [Bug libc/31986] New: Loading the same library within an audit library and within an application can cause ld.so to crash with an assert woodard at redhat dot com
                   ` (6 preceding siblings ...)
  2024-09-19 19:53 ` woodard at redhat dot com
@ 2024-09-19 23:23 ` woodard at redhat dot com
  7 siblings, 0 replies; 9+ messages in thread
From: woodard at redhat dot com @ 2024-09-19 23:23 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=31986

--- Comment #8 from Ben Woodard <woodard at redhat dot com> ---
A more precise way to run the specific test is:

[ben@darkstar build2]$ LD_AUDIT=../../test/auditor-tests/tier2/recursive-dlopen-crashes/auditor.so ./testrun.sh ../../test/auditor-tests/tier2/recursive-dlopen-crashes/main
[main] Dlopening libinit...
[main -> libinit] Validating libinit has initialized...
Segmentation fault (core dumped)

GDB doesn't give us any deeper insights. 

$ LD_LIBRARY_PATH=./nptl_db gdb \
    -ex "set env GCONV_PATH=./iconvdata" \
    -ex "set env LOCPATH=./localedata" \
    -ex "set env LC_ALL=C" \
    -ex "set arg --library-path .:./math:./elf:./dlfcn:./nss:./nis:./rt:./resolv:./mathvec:./support:./nptl ../../test/auditor-tests/tier2/recursive-dlopen-crashes/main" \
    -ex "set env LD_AUDIT=../../test/auditor-tests/tier2/recursive-dlopen-crashes/auditor.so" \
    ./elf/ld-linux-x86-64.so.2
GNU gdb (GDB) Red Hat Enterprise Linux 10.2-13.el9
<snip>
Reading symbols from ./elf/ld-linux-x86-64.so.2...
(gdb) r
Starting program: /home/ben/Shared/Work/glibc/build2/elf/ld-linux-x86-64.so.2
--library-path
.:./math:./elf:./dlfcn:./nss:./nis:./rt:./resolv:./mathvec:./support:./nptl
../../test/auditor-tests/tier2/recursive-dlopen-crashes/main
warning: Corrupted shared library list: 0x0 != 0x7ffff7fc9000
warning: Corrupted shared library list: 0x0 != 0x7ffff7fc9000
warning: Corrupted shared library list: 0x0 != 0x7ffff7fc9000
warning: Corrupted shared library list: 0x0 != 0x7ffff7fc9000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "./nptl_db/libthread_db.so.1".
[main] Dlopening libinit...
warning: Corrupted shared library list: 0x0 != 0x7ffff7fc9000
[main -> libinit] Validating libinit has initialized...

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00000000004011b5 in ?? ()
#2  0x0000000000000000 in ?? ()

A clue could be the "warning: Corrupted shared library list: 0x0 !=
0x7ffff7fc9000".
As with PR31985, I accept that this could easily be user error on my part.
I just wanted to try the patches with the reproducers before they were applied
upstream.
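
For reference, the list that warning refers to can also be inspected from
inside a process. The sketch below locates struct r_debug through the DT_DEBUG
dynamic tag and counts link-map entries whose l_prev does not point at the
entry visited just before. That GDB's warning comes from an equivalent check
is an assumption here, not something verified against the GDB sources, and the
function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Provided by the static linker for dynamically linked programs; declared
   weak so that the lookup degrades to NULL elsewhere.  */
extern ElfW(Dyn) _DYNAMIC[] __attribute__ ((weak));

/* Find the r_debug structure through the DT_DEBUG dynamic tag.  */
struct r_debug *
find_r_debug (void)
{
  ElfW(Dyn) *dyn = _DYNAMIC;
  if (dyn == NULL)
    return NULL;
  for (; dyn->d_tag != DT_NULL; dyn++)
    if (dyn->d_tag == DT_DEBUG)
      return (struct r_debug *) (uintptr_t) dyn->d_un.d_ptr;
  return NULL;
}

/* Count entries whose l_prev does not point at the entry visited just
   before them.  Stores the total number of entries in *TOTAL.  */
size_t
count_prev_mismatches (const struct r_debug *dbg, size_t *total)
{
  assert (dbg != NULL && total != NULL);
  size_t mismatches = 0;
  const struct link_map *prev = NULL;
  *total = 0;
  for (const struct link_map *m = dbg->r_map; m != NULL; m = m->l_next)
    {
      if (m->l_prev != prev)
        mismatches++;
      prev = m;
      ++*total;
    }
  return mismatches;
}
```

In an ordinary dynamically linked program, outside of any dlopen in progress,
one would expect r_state to be RT_CONSISTENT and the mismatch count to be
zero. A nonzero count would point at the list itself rather than at the
debugger.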
