* [Bug shlibs/30765] New: Recursive library loading problem when using glibc probes
@ 2023-08-15 13:43 aburgess at redhat dot com
From: aburgess at redhat dot com @ 2023-08-15 13:43 UTC
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=30765
Bug ID: 30765
Summary: Recursive library loading problem when using glibc
probes
Product: gdb
Version: HEAD
Status: NEW
Severity: normal
Priority: P2
Component: shlibs
Assignee: unassigned at sourceware dot org
Reporter: aburgess at redhat dot com
Target Milestone: ---
Created attachment 15060
--> https://sourceware.org/bugzilla/attachment.cgi?id=15060&action=edit
GDB test case that exposes the issue described in this bug.
This bug describes an issue with the mechanism GDB uses to detect
shared library loading, specifically with glibc's probe interface. I think
the real problem is with glibc, though it may be possible to work
around this issue in GDB, but I'm not sure how yet.
The attached patch applies to current(ish) HEAD of GDB (86dfe011797) and adds a
test which shows the problem. When run, I see results like this:
=== gdb Summary ===
# of expected passes 4
# of known failures 3
Below is the description of the bug taken from the commit message included in
the patch:
gdb/testsuite: expose issue with recursive dlopen
This commit exposes an issue with GDB's handling of recursive dlopen.
The bug is actually an issue in glibc, but I'm creating this patch so
that I can file a GDB bug, which I'll then reference from a glibc bug.
The bug is actually in glibc's reloc_complete probe, which the glibc
documentation describes like this:
reloc_complete:
The linker has relocated all objects in the specified namespace.
The namespace's r_debug structure is consistent and may be
inspected, and all objects in the namespace's link-map are
guaranteed to have been relocated.
In this test we create a situation where a recursive dlopen occurs.
This is done by overriding malloc.
Inside the overridden malloc we dlopen a library (libbar), call a
function from within that library, and then dlclose the library. Care
is taken so that we don't trigger this behaviour recursively: if the
dlopen, call, dlclose sequence used within malloc triggers another
malloc then, in that case, we just forward the request straight
through to the real malloc.
Now, in the main() function we dlopen a different library (libfoo),
call a function within it, and then dlclose the library. There is no
recursion protection here. And so, the basic sequence of events is:
  In main, dlopen libfoo
    dlopen calls malloc
      In malloc, dlopen libbar
        dlopen calls malloc
          In malloc, allocate memory and return
        dlopen for libbar completes
      In malloc, call function from libbar
      In malloc, dlclose libbar
      In malloc, allocate memory and return
    dlopen for libfoo completes
  In main, call function from libfoo
  In main, dlclose libfoo
It's not quite that simple. It turns out that dlopen calls malloc a
number of times, and so we actually see repeated calls into malloc
that each result in libbar being loaded, called, and closed.
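As a rough illustration of the sequence above (this is not the attached
test, which is written in C), here is a toy Python model of the guarded
malloc hook. Nothing here calls the real dlopen or malloc; the names
in_hook, real_malloc and the malloc_calls parameter are made up for the
sketch, and dlclose is not modelled as allocating.

```python
# Toy model of the call sequence; all names are hypothetical.
trace = []
in_hook = False  # recursion guard, like the protection described for the test


def real_malloc(depth):
    trace.append("  " * depth + "In malloc, allocate memory and return")


def dlopen(lib, depth, malloc_calls=1):
    # Model dlopen as performing `malloc_calls` allocations before completing.
    for _ in range(malloc_calls):
        trace.append("  " * depth + "dlopen calls malloc")
        malloc(depth + 1)
    trace.append("  " * depth + f"dlopen for {lib} completes")


def malloc(depth):
    global in_hook
    if in_hook:
        # Already inside the hook: forward straight to the real allocator.
        real_malloc(depth)
        return
    in_hook = True
    try:
        trace.append("  " * depth + "In malloc, dlopen libbar")
        dlopen("libbar", depth + 1)
        trace.append("  " * depth + "In malloc, call function from libbar")
        trace.append("  " * depth + "In malloc, dlclose libbar")
    finally:
        in_hook = False
    real_malloc(depth)


def main(malloc_calls=1):
    trace.clear()
    trace.append("In main, dlopen libfoo")
    dlopen("libfoo", 1, malloc_calls)
    trace.append("In main, call function from libfoo")
    trace.append("In main, dlclose libfoo")
    return list(trace)
```

Calling main() reproduces the twelve steps listed above, and main(3)
models dlopen allocating three times, so libbar is loaded, called, and
closed three times while libfoo is still being opened.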
Within glibc, as each library is loaded, we pass through a number of
probes:
- map_start
- map_complete
- reloc_start
- reloc_complete
GDB only cares about the 'reloc_complete' probe, which is hit when all
the libraries have been mapped and relocated.
At some point after map_start the new library is added to the shared
library list, but is not yet relocated. Only when reloc_complete is
hit are we guaranteed that all libraries have been relocated...
The problem is, glibc calls malloc at some point between map_start and
reloc_complete. This call to malloc triggers the recursive dlopen.
This recursive dlopen passes through all these probes, which means
that GDB will be triggered by the reloc_complete probe.
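To make the ordering concrete, here is a small Python model of the probe
sequence with one nested load. It is only a sketch: the exact point at
which glibc allocates is an assumption (I place it after reloc_start),
and the relocated set is ground truth that a debugger cannot observe.

```python
def run():
    link_map = []      # library names in load order, as a debugger would walk them
    relocated = set()  # ground truth, not visible to the debugger
    events = []        # (probe, library, listed-but-unrelocated libraries)

    def probe(name, lib):
        pending = [entry for entry in link_map if entry not in relocated]
        events.append((name, lib, pending))

    def load(lib, on_malloc=None):
        probe("map_start", lib)
        link_map.append(lib)       # listed, but not yet relocated
        probe("map_complete", lib)
        probe("reloc_start", lib)
        if on_malloc is not None:
            on_malloc()            # assumed allocation point; may dlopen again
        relocated.add(lib)
        probe("reloc_complete", lib)

    def unload(lib):
        link_map.remove(lib)
        relocated.discard(lib)

    def malloc_hook():
        load("libbar")
        unload("libbar")

    load("libfoo", on_malloc=malloc_hook)
    return events


for name, lib, pending in run():
    print(f"{name:15} {lib:7} unrelocated in list: {pending}")
```

In this model the reloc_complete event for libbar is recorded while
libfoo is still listed as unrelocated, which is the state GDB ends up
observing.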
When the reloc_complete probe is hit the following things happen:
First, GDB tries to only load information about the most recently
added libraries. To do this GDB tracks the known library list. When
reloc_complete is hit glibc passes GDB a pointer to the new library,
which is part of a doubly linked list.
GDB follows the back pointer for the new library and expects the
previous library to be the last library that GDB knows was loaded.
However, in our problem case this is not true. The first
library (libfoo) has already been added to the library list, but has
not yet been announced (with reloc_complete) to GDB. GDB is
seeing the reloc_complete probe for libbar. However, within glibc's
data structure, the previous library is libfoo, and this is why we see
the following warning from GDB:
warning: Corrupted shared library list: 0x7ffff7ffd988 != 0x405ee0
Now, when GDB emits that warning it falls back to performing a
complete reload of all the shared libraries. This is done by walking
glibc's data structure to find all the libraries. This will include
libfoo, which has not yet been relocated.
Unfortunately, there is nothing in glibc's data structure (that is
visible to GDB) that can tell us that libfoo is not yet relocated. As
a result, GDB will believe that libfoo has been fully relocated, and
will announce the library to the user.
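The incremental update and its fallback can be sketched in Python as
below. This is a toy model and not GDB's code: the class and field
names, the addresses, and the operand order in the warning text are all
made up for illustration. The relocated flag exists only so the sketch
can show what the debugger is unable to check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class LinkMap:
    addr: int                      # hypothetical address of this list entry
    name: str
    relocated: bool                # ground truth; NOT visible to the debugger
    prev: Optional["LinkMap"] = None
    next: Optional["LinkMap"] = None


def append(tail, node):
    tail.next, node.prev = node, tail
    return node


class ToyDebugger:
    def __init__(self, head):
        self.head = head
        self.known = []            # library names announced so far
        self.last_addr = None      # address of the last entry processed
        self.warnings = []
        self.full_reload()

    def full_reload(self):
        # Walk the whole list; nothing here can consult node.relocated.
        self.known = []
        node = self.head
        while node is not None:
            self.known.append(node.name)
            self.last_addr = node.addr
            node = node.next

    def on_reloc_complete(self, new):
        prev_addr = new.prev.addr if new.prev is not None else 0
        if prev_addr != self.last_addr:
            self.warnings.append(
                f"Corrupted shared library list: {prev_addr:#x} != {self.last_addr:#x}"
            )
            self.full_reload()
            return
        # Expected case: only pick up the newly added entries.
        node = new
        while node is not None:
            self.known.append(node.name)
            self.last_addr = node.addr
            node = node.next


def demo():
    exe = LinkMap(0x1000, "a.out", True)
    libc = append(exe, LinkMap(0x2000, "libc", True))
    gdb = ToyDebugger(exe)
    foo = append(libc, LinkMap(0x3000, "libfoo", False))  # listed, not announced
    bar = append(foo, LinkMap(0x4000, "libbar", True))
    gdb.on_reloc_complete(bar)     # probe fires for libbar first
    return gdb, foo
```

After demo(), the toy debugger has logged one warning and its known
list includes libfoo, even though libfoo's relocated flag is still
False.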
This test shows that the library is not fully relocated by stopping on
the solib event, watching for GDB to tell us that libfoo has been
loaded, and then printing a global variable from within the library.
The global variable happens to be initialised with a pointer value,
and so will not be correct unless relocation has been performed. As
we see, GDB can observe the global in an uninitialised state.
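For readers less familiar with why a pointer-initialised global depends
on relocation, here is a simplified Python model. It assumes a
RELA-style relocation where the slot holds zero until the loader writes
load_base + addend into it; real layouts and relocation types vary by
target, and the offsets and load address below are invented.

```python
import struct


def make_image():
    # Offset 0: int target = 42
    # Offset 8: int *ptr = &target   (needs a relocation to be valid)
    image = bytearray(16)
    struct.pack_into("<q", image, 0, 42)
    relocs = [(8, 0)]  # (slot offset, addend): slot := load_base + addend
    return image, relocs


def read_ptr(image, offset):
    return struct.unpack_from("<Q", image, offset)[0]


def relocate(image, relocs, load_base):
    for offset, addend in relocs:
        struct.pack_into("<Q", image, offset, load_base + addend)


image, relocs = make_image()
before = read_ptr(image, 8)              # what a too-early reader observes
relocate(image, relocs, 0x7FFFF7FB0000)  # hypothetical load address
after = read_ptr(image, 8)
print(hex(before), hex(after))
```

Reading the slot before relocate() runs corresponds to GDB printing the
global at the premature solib event.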
I don't know if there are wider implications from GDB seeing the
library load earlier than it should. We can, for sure, load the debug
information at this point, but could we get anything wrong as a result
of relocation having not been completed yet? We could potentially
trigger the loading of Python extensions from the library, and this
could certainly run into problems if the Python code reads any globals
that it expects to be initialised.
In terms of fixing this, the only options I see would require GDB to
be _more_ trusting of glibc, and even then, I don't think the solution
would be perfect. We could track the reloc_start/reloc_complete
pairs to try and track recursion, and thus ignore libraries that have
not been relocated yet, but this would mean we could not fall back (as
we currently do) to just "reload everything" when we see some
unexpected state -- as "everything" can include libraries that are not
relocated yet.
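A minimal sketch of what tracking reloc_start/reloc_complete pairs
might look like is below. It is hypothetical: it assumes each probe can
be attributed to a single named library, which is a simplification of
what the probes actually report, and RelocTracker is not an existing GDB
class.

```python
class RelocTracker:
    def __init__(self):
        self.in_progress = []  # reloc_start seen, reloc_complete still pending
        self.announced = []    # libraries considered safe to show the user

    def on_probe(self, probe, lib):
        if probe == "reloc_start":
            self.in_progress.append(lib)
        elif probe == "reloc_complete":
            if not self.in_progress or self.in_progress[-1] != lib:
                # Unexpected state: "reload everything" is not a safe fallback.
                raise RuntimeError("unbalanced probes for " + lib)
            self.in_progress.pop()
            self.announced.append(lib)

    def visible(self, link_map):
        # Hide anything whose relocation has started but not completed.
        return [lib for lib in link_map if lib not in self.in_progress]


tracker = RelocTracker()
tracker.on_probe("reloc_start", "libfoo")
tracker.on_probe("reloc_start", "libbar")     # nested dlopen from malloc
tracker.on_probe("reloc_complete", "libbar")
mid_state = tracker.visible(["a.out", "libfoo", "libbar"])
tracker.on_probe("reloc_complete", "libfoo")
```

The RuntimeError branch is where the trade-off shows up: once the pairs
are trusted, there is no safe recovery when they don't balance, and the
tracker has no history at all when attaching to a running process.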
Also, if we attach to a process we're stuck: the only option is to
walk the library list and "reload everything", but at that point we
might end up finding a library that is not relocated yet.
Ultimately, the right solution is for glibc to ensure that we really
do only add the library to the library list just prior to hitting the
reloc_complete probe.
Well, to maintain the existing API, I think glibc would need to add
the library to the list just prior to map_complete, then remove the
library again just after reloc_start, before adding the library
again at reloc_complete -- which really sucks. Or maybe glibc needs
to be smarter and "preallocate" its required memory ahead of time
before mapping and relocating the library...
* [Bug shlibs/30765] Recursive library loading problem when using glibc probes
From: aburgess at redhat dot com @ 2023-08-15 14:31 UTC
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=30765
Andrew Burgess <aburgess at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Depends on| |30766
--- Comment #1 from Andrew Burgess <aburgess at redhat dot com> ---
I created glibc bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30766 for
the glibc side of this issue.
Referenced Bugs:
https://sourceware.org/bugzilla/show_bug.cgi?id=30766
[Bug 30766] The reloc_complete probe can be hit when not all libraries have
been relocated