[PATCH] gdb/debuginfod: cleanup debuginfod earlier

public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed

* [PATCH] gdb/debuginfod: cleanup debuginfod earlier
@ 2023-05-23 11:23 Andrew Burgess
  2023-05-23 12:31 ` Simon Marchi
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Andrew Burgess @ 2023-05-23 11:23 UTC (permalink / raw)
  To: gdb-patches; +Cc: Aaron Merey, Andrew Burgess, Mark Wielaard

A GDB crash was discovered on Fedora GDB that was tracked back to an
issue with the way that debuginfod is cleaned up.

The bug was reported on Fedora 37, 38, and 39.  Here are the steps to
reproduce:

1. The file /etc/ssl/openssl.cnf contains the following lines:

   [provider_sect]
   default = default_sect
   ##legacy = legacy_sect
   ##
   [default_sect]
   activate = 1

   ##[legacy_sect]
   ##activate = 1

   The bug will occur when the '##' characters are removed so that the
   lines in question look like this:

   [provider_sect]
   default = default_sect
   legacy = legacy_sect

   [default_sect]
   activate = 1

   [legacy_sect]
   activate = 1

2. Clean up any existing debuginfod cache data:

   > rm -rf $HOME/.cache/debuginfod_client

3. Run GDB:

   > gdb -nx -q -iex 'set trace-commands on' \
                -iex 'set debuginfod enabled on' \
                -iex 'set confirm off' \
		-ex 'start' -ex 'quit' /bin/ls
   +set debuginfod enabled on
   +set confirm off
   Reading symbols from /bin/ls...
   Downloading separate debug info for /usr/bin/ls
   ... snip ...
   Temporary breakpoint 1, main (argc=1, argv=0x7fffffffde38) at ../src/ls.c:1646
   1646    {
   +quit

   Fatal signal: Segmentation fault
   ----- Backtrace -----
   ... snip ...

So GDB ends up crashing during exit.

What's happening is that when debuginfod is initialised
debuginfod_begin is called (this is in the debuginfod library), this
in turn sets up libcurl, which makes use of openssl.  Somewhere during
this setup process an at_exit function is registered to cleanup some
state.

Back in GDB the debuginfod_client object is managed using this code:

  /* Deleter for a debuginfod_client.  */

  struct debuginfod_client_deleter
  {
    void operator() (debuginfod_client *c)
    {
      debuginfod_end (c);
    }
  };

  using debuginfod_client_up
    = std::unique_ptr<debuginfod_client, debuginfod_client_deleter>;

And then a global debuginfod_client_up is created to hold a pointer to
the debuginfod_client object.  As a global this will be cleaned up
using the standard C++ global object destructor mechanism, which is
run after the at_exit handlers.

However, it is expected that when debuginfod_end is called the
debuginfod_client object will still be in a usable state, that is, we
don't expect the at_exit handlers to have run and started cleaning up
the library state.

To fix this issue we need to ensure that debuginfod_end is called
before the at_exit handlers have a chance to run.

I propose that we should make use of GDB's make_final_cleanup
mechanism to call debuginfod_end.  Or rather, I intend to continue
using the debuginfod_client_up class, but ensure that the global
instance of this class is set back to nullptr before GDB calls exit.
Setting debuginfod_client_up to nullptr will invoke the deleter, which
will call debuginfod_end.

For the implementation I've created a new class which holds the
debuginfod_client_up we can then pass a pointer to this class to
make_final_cleanup.  The alternative, passing a pointer to a
unique_ptr made me feel really uncomfortable -- we'd then have
multiple references to the unique_ptr, which seems like a bad idea.

There's no test associated with this patch.  I have no idea how I
might trigger this bug from within the testsuite.  If anyone has any
ideas then I'm happy to have a go at writing something.

None of the existing debuginfod tests fail after this change.

Co-Authored-By: Mark Wielaard <mark@klomp.org>
---
 gdb/debuginfod-support.c | 104 +++++++++++++++++++++++++++++++++++----
 1 file changed, 94 insertions(+), 10 deletions(-)

diff --git a/gdb/debuginfod-support.c b/gdb/debuginfod-support.c
index 5853f420a18..5dfb313f0af 100644
--- a/gdb/debuginfod-support.c
+++ b/gdb/debuginfod-support.c
@@ -180,20 +180,104 @@ progressfn (debuginfod_client *c, long cur, long total)
   return 0;
 }
 
-static debuginfod_client *
-get_debuginfod_client ()
+/* A class that manages a debuginfod_client object.  There should only be a
+   single debuginfod_client created within GDB, so we should only ever have
+   a single instance of this class.
+
+   The critical part of this class is that we register a final cleanup with
+   GDB that is responsible for ensuring the debuginfod_client object is
+   correctly cleaned up before GDB calls exit.  If we rely on the C++
+   global object destructors to clean up the debuginfod_client object then
+   we run into problems with the order in which object destructors are
+   called relative to at_exit callbacks.  See the init method below for
+   more details.  */
+
+struct debuginfod_client_manager
 {
-  static debuginfod_client_up global_client;
+  /* Return a pointer to the managed debuginfod_client object, initialising
+     it first if needed.  */
 
-  if (global_client == nullptr)
-    {
-      global_client.reset (debuginfod_begin ());
+  debuginfod_client *get ()
+  {
+    if (m_global_client == nullptr)
+      this->init ();
+    return m_global_client.get ();
+  }
 
-      if (global_client != nullptr)
-	debuginfod_set_progressfn (global_client.get (), progressfn);
-    }
+  /* There should only be a single, global, instance of this class.  The
+     managed debuginfod_client will be reset back to nullptr by GDB's
+     make_final_cleanup mechanism before the global object is destroyed,
+     which is why the assertion here makes sense.  */
+
+  ~debuginfod_client_manager ()
+  {
+    gdb_assert (m_global_client == nullptr);
+  }
 
-  return global_client.get ();
+private:
+
+  /* Callback used by GDB's final cleanup mechanism.  Cleans up the
+     debuginfod_client object being managed by ARG, which should be a
+     debuginfod_client_manager pointer.  */
+
+  static void cleanup_debuginfod_client (void *arg)
+  {
+    debuginfod_client_manager *m = (debuginfod_client_manager *) arg;
+    m->cleanup ();
+  }
+
+  /* Set the managed debuginfod_client back to nullptr.  This will ensure
+     that the debuginfod cleanup function is called.  */
+
+  void cleanup ()
+  {
+    m_global_client = nullptr;
+  }
+
+  /* Initialise the managed debuginfod_client object and register with
+     GDB's global cleanup mechanism to ensure that the object is cleaned up
+     correctly before GDB calls exit.  */
+
+  void init ()
+  {
+    gdb_assert (m_global_client == nullptr);
+
+    debuginfod_client *client = debuginfod_begin ();
+
+    if (client != nullptr)
+      {
+	/* Debuginfod (or the libraries it uses) use at_exit to cleanup
+	   some of their state.  However, it is requires that the
+	   debuginfod_client object be cleaned up before the libraries that
+	   debuginfod uses are cleaned up.
+
+	   If we rely on the C++ global object destructors to cleanup the
+	   debuginfod object then these are run after the at_exit
+	   callbacks, which is too late.
+
+	   So instead, we register with GDB's final cleanup mechanism, this
+	   is run before GDB calls exit, which is before the at_exit
+	   handlers are run, this ensures things are cleaned up in the
+	   correct order.  */
+	make_final_cleanup (cleanup_debuginfod_client, this);
+	debuginfod_set_progressfn (client, progressfn);
+	m_global_client.reset (client);
+      }
+  }
+
+  /* The managed debuginfod_client object.  */
+
+  debuginfod_client_up m_global_client;
+};
+
+/* Return a pointer to the single global debuginfod_client, initialising it
+   first if needed.  */
+
+static debuginfod_client *
+get_debuginfod_client ()
+{
+  static debuginfod_client_manager global_client_manager;
+  return global_client_manager.get ();
 }
 
 /* Check if debuginfod is enabled.  If configured to do so, ask the user

base-commit: e84060b489746d031ed1ec9e7b6b39fdf4b6cfe3
-- 
2.25.4


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] gdb/debuginfod: cleanup debuginfod earlier
  2023-05-23 11:23 [PATCH] gdb/debuginfod: cleanup debuginfod earlier Andrew Burgess
@ 2023-05-23 12:31 ` Simon Marchi
  2023-05-23 14:23 ` Aaron Merey
  2023-06-08 13:33 ` [PATCHv2] " Andrew Burgess
  2 siblings, 0 replies; 9+ messages in thread
From: Simon Marchi @ 2023-05-23 12:31 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches; +Cc: Aaron Merey, Mark Wielaard

On 5/23/23 07:23, Andrew Burgess via Gdb-patches wrote:
> A GDB crash was discovered on Fedora GDB that was tracked back to an
> issue with the way that debuginfod is cleaned up.
> 
> The bug was reported on Fedora 37, 38, and 39.  Here are the steps to
> reproduce:
> 
> 1. The file /etc/ssl/openssl.cnf contains the following lines:
> 
>    [provider_sect]
>    default = default_sect
>    ##legacy = legacy_sect
>    ##
>    [default_sect]
>    activate = 1
> 
>    ##[legacy_sect]
>    ##activate = 1
> 
>    The bug will occur when the '##' characters are removed so that the
>    lines in question look like this:
> 
>    [provider_sect]
>    default = default_sect
>    legacy = legacy_sect
> 
>    [default_sect]
>    activate = 1
> 
>    [legacy_sect]
>    activate = 1
> 
> 2. Clean up any existing debuginfod cache data:
> 
>    > rm -rf $HOME/.cache/debuginfod_client
> 
> 3. Run GDB:
> 
>    > gdb -nx -q -iex 'set trace-commands on' \
>                 -iex 'set debuginfod enabled on' \
>                 -iex 'set confirm off' \
> 		-ex 'start' -ex 'quit' /bin/ls
>    +set debuginfod enabled on
>    +set confirm off
>    Reading symbols from /bin/ls...
>    Downloading separate debug info for /usr/bin/ls
>    ... snip ...
>    Temporary breakpoint 1, main (argc=1, argv=0x7fffffffde38) at ../src/ls.c:1646
>    1646    {
>    +quit
> 
>    Fatal signal: Segmentation fault
>    ----- Backtrace -----
>    ... snip ...
> 
> So GDB ends up crashing during exit.
> 
> What's happening is that when debuginfod is initialised
> debuginfod_begin is called (this is in the debuginfod library), this
> in turn sets up libcurl, which makes use of openssl.  Somewhere during
> this setup process an at_exit function is registered to cleanup some
> state.
> 
> Back in GDB the debuginfod_client object is managed using this code:
> 
>   /* Deleter for a debuginfod_client.  */
> 
>   struct debuginfod_client_deleter
>   {
>     void operator() (debuginfod_client *c)
>     {
>       debuginfod_end (c);
>     }
>   };
> 
>   using debuginfod_client_up
>     = std::unique_ptr<debuginfod_client, debuginfod_client_deleter>;
> 
> And then a global debuginfod_client_up is created to hold a pointer to
> the debuginfod_client object.  As a global this will be cleaned up
> using the standard C++ global object destructor mechanism, which is
> run after the at_exit handlers.
> 
> However, it is expected that when debuginfod_end is called the
> debuginfod_client object will still be in a usable state, that is, we
> don't expect the at_exit handlers to have run and started cleaning up
> the library state.
> 
> To fix this issue we need to ensure that debuginfod_end is called
> before the at_exit handlers have a chance to run.
> 
> I propose that we should make use of GDB's make_final_cleanup
> mechanism to call debuginfod_end.  Or rather, I intend to continue
> using the debuginfod_client_up class, but ensure that the global
> instance of this class is set back to nullptr before GDB calls exit.
> Setting debuginfod_client_up to nullptr will invoke the deleter, which
> will call debuginfod_end.
> 
> For the implementation I've created a new class which holds the
> debuginfod_client_up we can then pass a pointer to this class to
> make_final_cleanup.  The alternative, passing a pointer to a
> unique_ptr made me feel really uncomfortable -- we'd then have
> multiple references to the unique_ptr, which seems like a bad idea.
> 
> There's no test associated with this patch.  I have no idea how I
> might trigger this bug from within the testsuite.  If anyone has any
> ideas then I'm happy to have a go at writing something.
> 
> None of the existing debuginfod tests fail after this change.
> 
> Co-Authored-By: Mark Wielaard <mark@klomp.org>

I was able to reproduce on Arch Linux using the same openssl
configuration.

Using the final cleanup mechanism sounds fine to me.  But if we're not
going to manage the lifetime of the object with the destructor, I think
that there is no point in having a debuginfod_client_up, it just adds an
additional and unnecessary layer to run the same code (we can just call
debuginfod_end where you would reset the unique pointer).

Also, I think the debuginfod_client_manager class is unnecessary and
just adds some more unnecessary layers, making the code hard to follow
for no benefit.  We can just make the existing get_debuginfod_client
function set up the final cleanup handler, as follow:

diff --git a/gdb/debuginfod-support.c b/gdb/debuginfod-support.c
index 5853f420a18d..4ab650ef2494 100644
--- a/gdb/debuginfod-support.c
+++ b/gdb/debuginfod-support.c
@@ -96,20 +96,6 @@ struct user_data
   ui_out::progress_update progress;
 };

-/* Deleter for a debuginfod_client.  */
-
-struct debuginfod_client_deleter
-{
-  void operator() (debuginfod_client *c)
-  {
-    debuginfod_end (c);
-  }
-};
-
-using debuginfod_client_up
-  = std::unique_ptr<debuginfod_client, debuginfod_client_deleter>;
-
-
 /* Convert SIZE into a unit suitable for use with progress updates.
    SIZE should in given in bytes and will be converted into KB, MB, GB
    or remain unchanged. UNIT will be set to "B", "KB", "MB" or "GB"
@@ -180,20 +166,32 @@ progressfn (debuginfod_client *c, long cur, long total)
   return 0;
 }

+static void cleanup_debuginfod_client (void *arg)
+{
+  debuginfod_client *client = static_cast<debuginfod_client *> (arg);
+  debuginfod_end (client);
+}
+
+/* Return a pointer to the single global debuginfod_client, initialising it
+   first if needed.  */
+
 static debuginfod_client *
 get_debuginfod_client ()
 {
-  static debuginfod_client_up global_client;
+  static debuginfod_client *global_client;

   if (global_client == nullptr)
     {
-      global_client.reset (debuginfod_begin ());
+      global_client = debuginfod_begin ();

       if (global_client != nullptr)
-	debuginfod_set_progressfn (global_client.get (), progressfn);
+	{
+	  make_final_cleanup (cleanup_debuginfod_client, global_client);
+	  debuginfod_set_progressfn (global_client, progressfn);
+	}
     }

-  return global_client.get ();
+  return global_client;
 }

 /* Check if debuginfod is enabled.  If configured to do so, ask the user

---

As for the test I don't see any other way of truly testing this than
running the regular tests on the affected systems.

Simon

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] gdb/debuginfod: cleanup debuginfod earlier
  2023-05-23 11:23 [PATCH] gdb/debuginfod: cleanup debuginfod earlier Andrew Burgess
  2023-05-23 12:31 ` Simon Marchi
@ 2023-05-23 14:23 ` Aaron Merey
  2023-05-24 14:33   ` Andrew Burgess
  2023-06-08 13:33 ` [PATCHv2] " Andrew Burgess
  2 siblings, 1 reply; 9+ messages in thread
From: Aaron Merey @ 2023-05-23 14:23 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches, Mark Wielaard

On Tue, May 23, 2023 at 7:23 AM Andrew Burgess <aburgess@redhat.com> wrote:
>
> And then a global debuginfod_client_up is created to hold a pointer to
> the debuginfod_client object.  As a global this will be cleaned up
> using the standard C++ global object destructor mechanism, which is
> run after the at_exit handlers.
>
> However, it is expected that when debuginfod_end is called the
> debuginfod_client object will still be in a usable state, that is, we
> don't expect the at_exit handlers to have run and started cleaning up
> the library state.

The crash comes down to curl_multi_cleanup triggering a double free
when it's called during process exit. Ideally this should be fixed in
libcurl or at least the libcurl docs should mention that curl_multi_cleanup
shouldn't be called at exit.

But it's still a good idea to add this workaround to gdb. Thanks for looking
into this. I tested Simon's patch since it's a bit simpler and it fixes the
crash for me on F37.

> There's no test associated with this patch.  I have no idea how I
> might trigger this bug from within the testsuite.  If anyone has any
> ideas then I'm happy to have a go at writing something.

gdb's debuginfod tests only pull files from local servers.  This crash
does not reproduce when using a localhost URL.  To test for this
gdb would have to download from a remote server and maybe
use the OPENSSL_CONF environment variable to set a custom
config file path.  However this might cause more problems than it
solves.

Aaron

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] gdb/debuginfod: cleanup debuginfod earlier
  2023-05-23 14:23 ` Aaron Merey
@ 2023-05-24 14:33   ` Andrew Burgess
  0 siblings, 0 replies; 9+ messages in thread
From: Andrew Burgess @ 2023-05-24 14:33 UTC (permalink / raw)
  To: Aaron Merey; +Cc: gdb-patches, Mark Wielaard

Aaron Merey <amerey@redhat.com> writes:

> On Tue, May 23, 2023 at 7:23 AM Andrew Burgess <aburgess@redhat.com> wrote:
>>
>> And then a global debuginfod_client_up is created to hold a pointer to
>> the debuginfod_client object.  As a global this will be cleaned up
>> using the standard C++ global object destructor mechanism, which is
>> run after the at_exit handlers.
>>
>> However, it is expected that when debuginfod_end is called the
>> debuginfod_client object will still be in a usable state, that is, we
>> don't expect the at_exit handlers to have run and started cleaning up
>> the library state.
>
> The crash comes down to curl_multi_cleanup triggering a double free
> when it's called during process exit. Ideally this should be fixed in
> libcurl or at least the libcurl docs should mention that curl_multi_cleanup
> shouldn't be called at exit.
>
> But it's still a good idea to add this workaround to gdb.

It's only a workaround if libcurl doesn't just update their docs to say
don't call curl_multi_cleanup at exit!  If they do that then debuginfod
needs to update its docs to say don't call debuginfod_end at exit
... and then we're back here, right?

>                                                            Thanks for looking
> into this. I tested Simon's patch since it's a bit simpler and it fixes the
> crash for me on F37.

Excellent, I'll add a commit message to Simon's version and repost it.

Thanks,
Andrew

>
>> There's no test associated with this patch.  I have no idea how I
>> might trigger this bug from within the testsuite.  If anyone has any
>> ideas then I'm happy to have a go at writing something.
>
> gdb's debuginfod tests only pull files from local servers.  This crash
> does not reproduce when using a localhost URL.  To test for this
> gdb would have to download from a remote server and maybe
> use the OPENSSL_CONF environment variable to set a custom
> config file path.  However this might cause more problems than it
> solves.
>
> Aaron


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCHv2] gdb/debuginfod: cleanup debuginfod earlier
  2023-05-23 11:23 [PATCH] gdb/debuginfod: cleanup debuginfod earlier Andrew Burgess
  2023-05-23 12:31 ` Simon Marchi
  2023-05-23 14:23 ` Aaron Merey
@ 2023-06-08 13:33 ` Andrew Burgess
  2023-06-08 15:09   ` Mark Wielaard
                     ` (3 more replies)
  2 siblings, 4 replies; 9+ messages in thread
From: Andrew Burgess @ 2023-06-08 13:33 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess, Simon Marchi, Aaron Merey, Mark Wielaard

In V2:

  - Complete rewrite inline with Simon's suggestion.

--

A GDB crash was discovered on Fedora GDB that was tracked back to an
issue with the way that debuginfod is cleaned up.

The bug was reported on Fedora 37, 38, and 39.  Here are the steps to
reproduce:

1. The file /etc/ssl/openssl.cnf contains the following lines:

   [provider_sect]
   default = default_sect
   ##legacy = legacy_sect
   ##
   [default_sect]
   activate = 1

   ##[legacy_sect]
   ##activate = 1

   The bug will occur when the '##' characters are removed so that the
   lines in question look like this:

   [provider_sect]
   default = default_sect
   legacy = legacy_sect

   [default_sect]
   activate = 1

   [legacy_sect]
   activate = 1

2. Clean up any existing debuginfod cache data:

   > rm -rf $HOME/.cache/debuginfod_client

3. Run GDB:

   > gdb -nx -q -iex 'set trace-commands on' \
                -iex 'set debuginfod enabled on' \
                -iex 'set confirm off' \
		-ex 'start' -ex 'quit' /bin/ls
   +set debuginfod enabled on
   +set confirm off
   Reading symbols from /bin/ls...
   Downloading separate debug info for /usr/bin/ls
   ... snip ...
   Temporary breakpoint 1, main (argc=1, argv=0x7fffffffde38) at ../src/ls.c:1646
   1646    {
   +quit

   Fatal signal: Segmentation fault
   ----- Backtrace -----
   ... snip ...

So GDB ends up crashing during exit.

What's happening is that when debuginfod is initialised
debuginfod_begin is called (this is in the debuginfod library), this
in turn sets up libcurl, which makes use of openssl.  Somewhere during
this setup process an at_exit function is registered to cleanup some
state.

Back in GDB the debuginfod_client object is managed using this code:

  /* Deleter for a debuginfod_client.  */

  struct debuginfod_client_deleter
  {
    void operator() (debuginfod_client *c)
    {
      debuginfod_end (c);
    }
  };

  using debuginfod_client_up
    = std::unique_ptr<debuginfod_client, debuginfod_client_deleter>;

And then a global debuginfod_client_up is created to hold a pointer to
the debuginfod_client object.  As a global this will be cleaned up
using the standard C++ global object destructor mechanism, which is
run after the at_exit handlers.

However, it is expected that when debuginfod_end is called the
debuginfod_client object will still be in a usable state, that is, we
don't expect the at_exit handlers to have run and started cleaning up
the library state.

To fix this issue we need to ensure that debuginfod_end is called
before the at_exit handlers have a chance to run.

This commit removes the debuginfod_client_up type, and instead has GDB
hold a raw pointer to the debuginfod_client object.  We then make use
of GDB's make_final_cleanup to register a function that will call
debuginfod_end.

As GDB's final cleanups are called before exit is called, this means
that debuginfod_end will be called before the at_exit handlers are
called, and the crash identified above is resolved.

It's not obvious how this issue can easily be tested for. The bug does
not appear to manifest when using a local debuginfod server, so we'd
need to setup something more involved.  For now I'm proposing this
patch without any associated tests.

Co-Authored-By: Mark Wielaard <mark@klomp.org>
Co-Authored-By: Simon Marchi <simark@simark.ca>
---
 gdb/debuginfod-support.c | 47 +++++++++++++++++++++++++---------------
 1 file changed, 29 insertions(+), 18 deletions(-)

diff --git a/gdb/debuginfod-support.c b/gdb/debuginfod-support.c
index 5853f420a18..a41a4c95785 100644
--- a/gdb/debuginfod-support.c
+++ b/gdb/debuginfod-support.c
@@ -96,20 +96,6 @@ struct user_data
   ui_out::progress_update progress;
 };

-/* Deleter for a debuginfod_client.  */
-
-struct debuginfod_client_deleter
-{
-  void operator() (debuginfod_client *c)
-  {
-    debuginfod_end (c);
-  }
-};
-
-using debuginfod_client_up
-  = std::unique_ptr<debuginfod_client, debuginfod_client_deleter>;
-
-
 /* Convert SIZE into a unit suitable for use with progress updates.
    SIZE should in given in bytes and will be converted into KB, MB, GB
    or remain unchanged. UNIT will be set to "B", "KB", "MB" or "GB"
@@ -180,20 +166,45 @@ progressfn (debuginfod_client *c, long cur, long total)
   return 0;
 }

+/* Cleanup ARG, which is a debuginfod_client pointer.  */
+
+static void
+cleanup_debuginfod_client (void *arg)
+{
+  debuginfod_client *client = static_cast<debuginfod_client *> (arg);
+  debuginfod_end (client);
+}
+
+/* Return a pointer to the single global debuginfod_client, initialising it
+   first if needed.  */
+
 static debuginfod_client *
 get_debuginfod_client ()
 {
-  static debuginfod_client_up global_client;
+  static debuginfod_client *global_client = nullptr;

   if (global_client == nullptr)
     {
-      global_client.reset (debuginfod_begin ());
+      global_client = debuginfod_begin ();

       if (global_client != nullptr)
-	debuginfod_set_progressfn (global_client.get (), progressfn);
+	{
+	  /* It is important that we cleanup the debuginfod_client object
+	     before calling exit.  Some of the libraries used by debuginfod
+	     make use of at_exit handlers to perform cleanup.
+
+	     If we wrapped the debuginfod_client in a unique_ptr and relied
+	     on its destructor to cleanup then this would be run as part of
+	     the global C++ object destructors, which is after the at_exit
+	     handlers, which is too late.
+
+	     So instead, we make use of GDB's final cleanup mechanism.  */
+	  make_final_cleanup (cleanup_debuginfod_client, global_client);
+	  debuginfod_set_progressfn (global_client, progressfn);
+	}
     }

-  return global_client.get ();
+  return global_client;
 }

 /* Check if debuginfod is enabled.  If configured to do so, ask the user

base-commit: baab375361c365afee2577c94cbbd3fdd443d6da
-- 
2.25.4

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHv2] gdb/debuginfod: cleanup debuginfod earlier
  2023-06-08 13:33 ` [PATCHv2] " Andrew Burgess
@ 2023-06-08 15:09   ` Mark Wielaard
  2023-06-08 21:08   ` Aaron Merey
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Mark Wielaard @ 2023-06-08 15:09 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches; +Cc: Simon Marchi, Aaron Merey

Hi Andrew,

On Thu, 2023-06-08 at 14:33 +0100, Andrew Burgess wrote:
> To fix this issue we need to ensure that debuginfod_end is called
> before the at_exit handlers have a chance to run.
> 
> This commit removes the debuginfod_client_up type, and instead has GDB
> hold a raw pointer to the debuginfod_client object.  We then make use
> of GDB's make_final_cleanup to register a function that will call
> debuginfod_end.
> 
> As GDB's final cleanups are called before exit is called, this means
> that debuginfod_end will be called before the at_exit handlers are
> called, and the crash identified above is resolved.

The nice thing is that the code got simpler.

> It's not obvious how this issue can easily be tested for. The bug does
> not appear to manifest when using a local debuginfod server, so we'd
> need to setup something more involved.  For now I'm proposing this
> patch without any associated tests.

Agreed. This really is a somewhat obscure bug that only triggers for a
legacy openssl backend in libcurl that is shutdown "too early".

> Co-Authored-By: Mark Wielaard <mark@klomp.org>

That is somewhat generous, I don't think any of what I suggested was
left in this new (and better) version :)


Code looks good to me.

Thanks,

Mark

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHv2] gdb/debuginfod: cleanup debuginfod earlier
  2023-06-08 13:33 ` [PATCHv2] " Andrew Burgess
  2023-06-08 15:09   ` Mark Wielaard
@ 2023-06-08 21:08   ` Aaron Merey
  2023-06-09 14:24   ` Tom Tromey
  2023-06-09 14:38   ` Andrew Burgess
  3 siblings, 0 replies; 9+ messages in thread
From: Aaron Merey @ 2023-06-08 21:08 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches, Simon Marchi, Mark Wielaard

On Thu, Jun 8, 2023 at 9:34 AM Andrew Burgess <aburgess@redhat.com> wrote:
>
> It's not obvious how this issue can easily be tested for. The bug does
> not appear to manifest when using a local debuginfod server, so we'd
> need to setup something more involved.  For now I'm proposing this
> patch without any associated tests.

LGTM. Tested manually on F37.

Thanks,
Aaron


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHv2] gdb/debuginfod: cleanup debuginfod earlier
  2023-06-08 13:33 ` [PATCHv2] " Andrew Burgess
  2023-06-08 15:09   ` Mark Wielaard
  2023-06-08 21:08   ` Aaron Merey
@ 2023-06-09 14:24   ` Tom Tromey
  2023-06-09 14:38   ` Andrew Burgess
  3 siblings, 0 replies; 9+ messages in thread
From: Tom Tromey @ 2023-06-09 14:24 UTC (permalink / raw)
  To: Andrew Burgess via Gdb-patches
  Cc: Andrew Burgess, Simon Marchi, Aaron Merey, Mark Wielaard

>>>>> "Andrew" == Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> writes:

Andrew> In V2:
Andrew>   - Complete rewrite inline with Simon's suggestion.

Thanks for doing this.  FWIW this looks good to me.
Reviewed-By: Tom Tromey <tom@tromey.com>

Tom

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCHv2] gdb/debuginfod: cleanup debuginfod earlier
  2023-06-08 13:33 ` [PATCHv2] " Andrew Burgess
                     ` (2 preceding siblings ...)
  2023-06-09 14:24   ` Tom Tromey
@ 2023-06-09 14:38   ` Andrew Burgess
  3 siblings, 0 replies; 9+ messages in thread
From: Andrew Burgess @ 2023-06-09 14:38 UTC (permalink / raw)
  To: gdb-patches; +Cc: Simon Marchi, Aaron Merey, Mark Wielaard

Andrew Burgess <aburgess@redhat.com> writes:

> In V2:
>
>   - Complete rewrite inline with Simon's suggestion.
>

Thanks for all the reviews.  I've now pushed this patch.

Thanks,
Andrew


> --
>
> A GDB crash was discovered on Fedora GDB that was tracked back to an
> issue with the way that debuginfod is cleaned up.
>
> The bug was reported on Fedora 37, 38, and 39.  Here are the steps to
> reproduce:
>
> 1. The file /etc/ssl/openssl.cnf contains the following lines:
>
>    [provider_sect]
>    default = default_sect
>    ##legacy = legacy_sect
>    ##
>    [default_sect]
>    activate = 1
>
>    ##[legacy_sect]
>    ##activate = 1
>
>    The bug will occur when the '##' characters are removed so that the
>    lines in question look like this:
>
>    [provider_sect]
>    default = default_sect
>    legacy = legacy_sect
>
>    [default_sect]
>    activate = 1
>
>    [legacy_sect]
>    activate = 1
>
> 2. Clean up any existing debuginfod cache data:
>
>    > rm -rf $HOME/.cache/debuginfod_client
>
> 3. Run GDB:
>
>    > gdb -nx -q -iex 'set trace-commands on' \
>                 -iex 'set debuginfod enabled on' \
>                 -iex 'set confirm off' \
> 		-ex 'start' -ex 'quit' /bin/ls
>    +set debuginfod enabled on
>    +set confirm off
>    Reading symbols from /bin/ls...
>    Downloading separate debug info for /usr/bin/ls
>    ... snip ...
>    Temporary breakpoint 1, main (argc=1, argv=0x7fffffffde38) at ../src/ls.c:1646
>    1646    {
>    +quit
>
>    Fatal signal: Segmentation fault
>    ----- Backtrace -----
>    ... snip ...
>
> So GDB ends up crashing during exit.
>
> What's happening is that when debuginfod is initialised
> debuginfod_begin is called (this is in the debuginfod library), this
> in turn sets up libcurl, which makes use of openssl.  Somewhere during
> this setup process an at_exit function is registered to cleanup some
> state.
>
> Back in GDB the debuginfod_client object is managed using this code:
>
>   /* Deleter for a debuginfod_client.  */
>
>   struct debuginfod_client_deleter
>   {
>     void operator() (debuginfod_client *c)
>     {
>       debuginfod_end (c);
>     }
>   };
>
>   using debuginfod_client_up
>     = std::unique_ptr<debuginfod_client, debuginfod_client_deleter>;
>
> And then a global debuginfod_client_up is created to hold a pointer to
> the debuginfod_client object.  As a global this will be cleaned up
> using the standard C++ global object destructor mechanism, which is
> run after the at_exit handlers.
>
> However, it is expected that when debuginfod_end is called the
> debuginfod_client object will still be in a usable state, that is, we
> don't expect the at_exit handlers to have run and started cleaning up
> the library state.
>
> To fix this issue we need to ensure that debuginfod_end is called
> before the at_exit handlers have a chance to run.
>
> This commit removes the debuginfod_client_up type, and instead has GDB
> hold a raw pointer to the debuginfod_client object.  We then make use
> of GDB's make_final_cleanup to register a function that will call
> debuginfod_end.
>
> As GDB's final cleanups are called before exit is called, this means
> that debuginfod_end will be called before the at_exit handlers are
> called, and the crash identified above is resolved.
>
> It's not obvious how this issue can easily be tested for. The bug does
> not appear to manifest when using a local debuginfod server, so we'd
> need to setup something more involved.  For now I'm proposing this
> patch without any associated tests.
>
> Co-Authored-By: Mark Wielaard <mark@klomp.org>
> Co-Authored-By: Simon Marchi <simark@simark.ca>
> ---
>  gdb/debuginfod-support.c | 47 +++++++++++++++++++++++++---------------
>  1 file changed, 29 insertions(+), 18 deletions(-)
>
> diff --git a/gdb/debuginfod-support.c b/gdb/debuginfod-support.c
> index 5853f420a18..a41a4c95785 100644
> --- a/gdb/debuginfod-support.c
> +++ b/gdb/debuginfod-support.c
> @@ -96,20 +96,6 @@ struct user_data
>    ui_out::progress_update progress;
>  };
>  
> -/* Deleter for a debuginfod_client.  */
> -
> -struct debuginfod_client_deleter
> -{
> -  void operator() (debuginfod_client *c)
> -  {
> -    debuginfod_end (c);
> -  }
> -};
> -
> -using debuginfod_client_up
> -  = std::unique_ptr<debuginfod_client, debuginfod_client_deleter>;
> -
> -
>  /* Convert SIZE into a unit suitable for use with progress updates.
>     SIZE should in given in bytes and will be converted into KB, MB, GB
>     or remain unchanged. UNIT will be set to "B", "KB", "MB" or "GB"
> @@ -180,20 +166,45 @@ progressfn (debuginfod_client *c, long cur, long total)
>    return 0;
>  }
>  
> +/* Cleanup ARG, which is a debuginfod_client pointer.  */
> +
> +static void
> +cleanup_debuginfod_client (void *arg)
> +{
> +  debuginfod_client *client = static_cast<debuginfod_client *> (arg);
> +  debuginfod_end (client);
> +}
> +
> +/* Return a pointer to the single global debuginfod_client, initialising it
> +   first if needed.  */
> +
>  static debuginfod_client *
>  get_debuginfod_client ()
>  {
> -  static debuginfod_client_up global_client;
> +  static debuginfod_client *global_client = nullptr;
>  
>    if (global_client == nullptr)
>      {
> -      global_client.reset (debuginfod_begin ());
> +      global_client = debuginfod_begin ();
>  
>        if (global_client != nullptr)
> -	debuginfod_set_progressfn (global_client.get (), progressfn);
> +	{
> +	  /* It is important that we cleanup the debuginfod_client object
> +	     before calling exit.  Some of the libraries used by debuginfod
> +	     make use of at_exit handlers to perform cleanup.
> +
> +	     If we wrapped the debuginfod_client in a unique_ptr and relied
> +	     on its destructor to cleanup then this would be run as part of
> +	     the global C++ object destructors, which is after the at_exit
> +	     handlers, which is too late.
> +
> +	     So instead, we make use of GDB's final cleanup mechanism.  */
> +	  make_final_cleanup (cleanup_debuginfod_client, global_client);
> +	  debuginfod_set_progressfn (global_client, progressfn);
> +	}
>      }
>  
> -  return global_client.get ();
> +  return global_client;
>  }
>  
>  /* Check if debuginfod is enabled.  If configured to do so, ask the user
>
> base-commit: baab375361c365afee2577c94cbbd3fdd443d6da
> -- 
> 2.25.4


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-06-09 14:38 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-23 11:23 [PATCH] gdb/debuginfod: cleanup debuginfod earlier Andrew Burgess
2023-05-23 12:31 ` Simon Marchi
2023-05-23 14:23 ` Aaron Merey
2023-05-24 14:33   ` Andrew Burgess
2023-06-08 13:33 ` [PATCHv2] " Andrew Burgess
2023-06-08 15:09   ` Mark Wielaard
2023-06-08 21:08   ` Aaron Merey
2023-06-09 14:24   ` Tom Tromey
2023-06-09 14:38   ` Andrew Burgess

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).