public inbox for gdb-cvs@sourceware.org
help / color / mirror / Atom feed
* [binutils-gdb] gdb: have target_stack automate reference count handling
@ 2022-12-14 13:58 Andrew Burgess
  0 siblings, 0 replies; only message in thread
From: Andrew Burgess @ 2022-12-14 13:58 UTC (permalink / raw)
  To: gdb-cvs

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=91e3d1d1a5462e49bbf70e0ff2fa3bf3a6a80503

commit 91e3d1d1a5462e49bbf70e0ff2fa3bf3a6a80503
Author: Andrew Burgess <aburgess@redhat.com>
Date:   Thu Sep 22 11:17:39 2022 +0100

    gdb: have target_stack automate reference count handling
    
    This commit changes the target_stack class from using a C style array
    of 'target_ops *' to using a C++ std::array<target_ops_ref, ...>.  The
    benefit of this change is that some of the reference counting of
    target_ops objects is now done automatically.
    
    This commit fixes a crash in gdb.python/py-inferior.exp where GDB
    crashes at exit, leaving a core file behind.
    
    The crash occurs in connpy_connection_dealloc, and is actually
    triggered by this assert:
    
    gdb_assert (conn_obj->target == nullptr);
    
    Now a little aside...
    
        ... the assert is never actually printed, instead GDB crashes due
        to calling a pure virtual function.  The backtrace at the point of
        crash looks like this:
    
          #7  0x00007fef7e2cf747 in std::terminate() () from /lib64/libstdc++.so.6
          #8  0x00007fef7e2d0515 in __cxa_pure_virtual () from /lib64/libstdc++.so.6
          #9  0x0000000000de334d in target_stack::find_beneath (this=0x4934d78, t=0x2bda270 <the_dummy_target>) at ../../s>
          #10 0x0000000000df4380 in inferior::find_target_beneath (this=0x4934b50, t=0x2bda270 <the_dummy_target>) at ../.>
          #11 0x0000000000de2381 in target_ops::beneath (this=0x2bda270 <the_dummy_target>) at ../../src/gdb/target.c:3047
          #12 0x0000000000de68aa in target_ops::supports_terminal_ours (this=0x2bda270 <the_dummy_target>) at ../../src/gd>
          #13 0x0000000000dde6b9 in target_supports_terminal_ours () at ../../src/gdb/target.c:1112
          #14 0x0000000000ee55f1 in internal_vproblem(internal_problem *, const char *, int, const char *, typedef __va_li>
    
        Notice in frame #12 we called target_ops::supports_terminal_ours,
        however, this is the_dummy_target, which is of type dummy_target,
        and so we should have called dummy_target::supports_terminal_ours.
        I believe the reason we ended up in the wrong implementation of
        supports_terminal_ours (which is a virtual function) is because we
        made the call during GDB's shut-down, and, I suspect, the vtables
        were in a weird state.
    
        Anyway, the point of this patch is not to fix GDB's ability to
        print an assert during exit, but to address the root cause of the
        assert.  With that aside out of the way, we can return to the main
        story...
    
    Connections are represented in Python with gdb.TargetConnection
    objects (or its sub-classes).  The assert in question confirms that
    when a gdb.TargetConnection is deallocated, the underlying GDB
    connection has itself been removed from GDB.  If this is not true then
    we risk creating multiple different gdb.TargetConnection objects for
    the same connection, which would be bad.
    
    To ensure that we have one gdb.TargetConnection object for each
    connection, the all_connection_objects map exists, this maps the
    process_stratum_target object (the connection) to the
    gdb.TargetConnection object that represents the connection.
    
    When a connection is removed in GDB the connection_removed observer
    fires, which we catch with connpy_connection_removed, this function
    then sets conn_obj->target to nullptr, and removes the corresponding
    entry from the all_connection_objects map.
    
    The first issue here is that connpy_connection_dealloc is being called
    as part of GDB's exit code, which is run after the Python interpreter
    has been shut down.  The connpy_connection_dealloc function is used to
    deallocate the gdb.TargetConnection Python object.  Surely it is
    wrong for us to be deallocating Python objects after the interpreter
    has been shut down.
    
    The reason why connpy_connection_dealloc is called during GDB's exit
    is that the global all_connection_objects map is still holding a
    reference to the gdb.TargetConnection object.  When the map is
    destroyed during GDB's exit, the gdb.TargetConnection objects within
    the map can finally be deallocated.
    
    The reason why all_connection_objects has contents when GDB exits, and
    the reason the assert fires, is that, when GDB exits, there are still
    some connections that have not yet been removed from GDB, that is,
    they have a non-zero reference count.
    
    If we take a look at quit_force (top.c) you can see that, for each
    inferior, we call pop_all_targets before we (later in the function)
    call do_final_cleanups.  It is the do_final_cleanups call that is
    responsible for shutting down the Python interpreter.  The
    pop_all_targets calls should, in theory, cause all the connections to
    be removed from GDB.
    
    That this isn't working indicates that some targets have a non-zero
    reference count even after this final pop_all_targets call, and
    indeed, when I debug GDB, that is what I see.
    
    I tracked the problem down to delete_inferior where we do some house
    keeping, and then delete the inferior object, which calls
    inferior::~inferior.
    
    In neither delete_inferior or inferior::~inferior do we call
    pop_all_targets, and it is this missing call that means we leak some
    references to the target_ops objects on the inferior's target_stack.
    
    In this commit I will provide a partial fix for the problem.  I say
    partial fix, but this will actually be enough to resolve the crash.
    In a later commit I will provide the final part of the fix.
    
    As mentioned at the start of the commit message, this commit changes
    the m_stack in target_stack to hold target_ops_ref objects.  This
    means that when inferior::~inferior is called, and m_stack is
    released, we automatically decrement the target_ops reference count.
    With this change in place we no longer leak any references, and now,
    in quit_force the final pop_all_targets calls will release the final
    references.  This means that the targets will be correctly closed at
    this point, which means the connections will be removed from GDB and
    the Python objects deallocated before the Python interpreter shuts
    down.
    
    There's a slight oddity in target_stack::unpush, where we std::move
    the reference out of m_stack like this:
    
      auto ref = std::move (m_stack[stratum]);
    
    the `ref' isn't used explicitly, but it serves to hold the
    target_ops_ref until the end of the scope while allowing the m_stack
    entry to be reset back to nullptr.  The alternative would be to
    directly set the m_stack entry to nullptr, like this:
    
      m_stack[stratum] = nullptr;
    
    The problem here is that when we set the m_stack entry to nullptr we
    first decrement the target_ops reference count, and then set the array
    entry to nullptr.
    
    If the decrement means that the target_ops object reaches a zero
    reference count then the target_ops object will be closed by calling
    target_close.  In target_close we ensure that the target being closed
    is not in any inferiors target_stack.
    
    As we decrement before clearing, then this check in target_close will
    fail, and an assert will trigger.
    
    By using std::move to move the reference out of m_stack, this clears
    the m_stack entry, meaning the inferior no longer contains the
    target_ops in its target_stack.  Now when the REF object goes out of
    scope and the reference count is decremented, target_close can run
    successfully.
    
    I've made use of the Python connection_removed listener API to add a
    test for this issue.  The test installs a listener and then causes
    delete_inferior to be called, we can then see that the connection is
    then correctly removed (because the listener triggers).

Diff:
---
 gdb/target.c                                       | 43 +++++-----
 gdb/target.h                                       |  4 +-
 gdb/testsuite/gdb.python/py-connection-removed.exp | 92 ++++++++++++++++++++++
 3 files changed, 118 insertions(+), 21 deletions(-)

diff --git a/gdb/target.c b/gdb/target.c
index 1dd0f42af7d..b3fd2345b78 100644
--- a/gdb/target.c
+++ b/gdb/target.c
@@ -1182,23 +1182,28 @@ decref_target (target_ops *t)
 void
 target_stack::push (target_ops *t)
 {
-  t->incref ();
+  /* We must create a new reference first.  It is possible that T is
+     already pushed on this target stack, in which case we will first
+     unpush it below, before re-pushing it.  If we don't increment the
+     reference count now, then when we unpush it, we might end up deleting
+     T, which is not good.  */
+  auto ref = target_ops_ref::new_reference (t);
 
   strata stratum = t->stratum ();
 
-  if (stratum == process_stratum)
-    connection_list_add (as_process_stratum_target (t));
-
   /* If there's already a target at this stratum, remove it.  */
 
-  if (m_stack[stratum] != NULL)
-    unpush (m_stack[stratum]);
+  if (m_stack[stratum].get () != nullptr)
+    unpush (m_stack[stratum].get ());
 
   /* Now add the new one.  */
-  m_stack[stratum] = t;
+  m_stack[stratum] = std::move (ref);
 
   if (m_top < stratum)
     m_top = stratum;
+
+  if (stratum == process_stratum)
+    connection_list_add (as_process_stratum_target (t));
 }
 
 /* See target.h.  */
@@ -1223,19 +1228,19 @@ target_stack::unpush (target_ops *t)
       return false;
     }
 
-  /* Unchain the target.  */
-  m_stack[stratum] = NULL;
-
   if (m_top == stratum)
     m_top = this->find_beneath (t)->stratum ();
 
-  /* Finally close the target, if there are no inferiors
-     referencing this target still.  Note we do this after unchaining,
-     so any target method calls from within the target_close
-     implementation don't end up in T anymore.  Do leave the target
-     open if we have are other inferiors referencing this target
-     still.  */
-  decref_target (t);
+  /* Move the target reference off the target stack, this sets the pointer
+     held in m_stack to nullptr, and places the reference in ref.  When
+     ref goes out of scope its reference count will be decremented, which
+     might cause the target to close.
+
+     We have to do it this way, and not just set the value in m_stack to
+     nullptr directly, because doing so would decrement the reference
+     count first, which might close the target, and closing the target
+     does a check that the target is not on any inferiors target_stack.  */
+  auto ref = std::move (m_stack[stratum]);
 
   return true;
 }
@@ -3612,8 +3617,8 @@ target_stack::find_beneath (const target_ops *t) const
 {
   /* Look for a non-empty slot at stratum levels beneath T's.  */
   for (int stratum = t->stratum () - 1; stratum >= 0; --stratum)
-    if (m_stack[stratum] != NULL)
-      return m_stack[stratum];
+    if (m_stack[stratum].get () != NULL)
+      return m_stack[stratum].get ();
 
   return NULL;
 }
diff --git a/gdb/target.h b/gdb/target.h
index 28aa9273893..b9888f851cc 100644
--- a/gdb/target.h
+++ b/gdb/target.h
@@ -1389,7 +1389,7 @@ public:
   { return at (t->stratum ()) == t; }
 
   /* Return the target at STRATUM.  */
-  target_ops *at (strata stratum) const { return m_stack[stratum]; }
+  target_ops *at (strata stratum) const { return m_stack[stratum].get (); }
 
   /* Return the target at the top of the stack.  */
   target_ops *top () const { return at (m_top); }
@@ -1404,7 +1404,7 @@ private:
   /* The stack, represented as an array, with one slot per stratum.
      If no target is pushed at some stratum, the corresponding slot is
      null.  */
-  target_ops *m_stack[(int) debug_stratum + 1] {};
+  std::array<target_ops_ref, (int) debug_stratum + 1> m_stack;
 };
 
 /* Return the dummy target.  */
diff --git a/gdb/testsuite/gdb.python/py-connection-removed.exp b/gdb/testsuite/gdb.python/py-connection-removed.exp
new file mode 100644
index 00000000000..6a0dbd17fe0
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-connection-removed.exp
@@ -0,0 +1,92 @@
+# Copyright (C) 2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# Check that the gdb.connect_removed event triggers when we expect it
+# too.
+#
+# Checking this event has wider implications that simply some corner
+# of the Python API working or not.  The connection_removed event
+# triggers when the reference count of a process_stratum_target
+# reaches zero.  If these events stop triggering when expected then
+# GDB might be getting the reference counting on target_ops objects
+# wrong.
+
+load_lib gdb-python.exp
+
+standard_testfile py-connection.c
+
+if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile}] } {
+    return -1
+}
+
+# Skip all tests if Python scripting is not enabled.
+if { [skip_python_tests] } { continue }
+
+if ![runto_main] then {
+    return 0
+}
+
+# Register a callback that will trigger when a connection is removed
+# (deleted) within GDB.
+gdb_test_multiline "Add connection_removed event" \
+    "python" "" \
+    "def connection_removed_handler(event):" "" \
+    "   num = event.connection.num" "" \
+    "   type = event.connection.type" "" \
+    "   print('Connection %d (%s) removed' % (num, type))" "" \
+    "gdb.events.connection_removed.connect (connection_removed_handler)" "" \
+    "end" ""
+
+if { [target_info exists gdb_protocol] } {
+    if { [target_info gdb_protocol] == "extended-remote" } {
+	set connection_type "extended-remote"
+    } else {
+	set connection_type "remote"
+    }
+} else {
+    set connection_type "native"
+}
+
+# Add an inferior that shares a connection with inferior 1.
+gdb_test "add-inferior" "Added inferior 2 on connection 1 \[^\r\n\]+"
+
+# Add an inferior with no connection.
+gdb_test "add-inferior -no-connection" "Added inferior 3"
+
+# Kill the first inferior.  If we are using the plain 'remote' protocol then
+# it as this point that the remote target connection is removed.  For the
+# 'extended-remote' and 'native' targets the connection is removed later.
+if { $connection_type == "remote" } {
+    gdb_test "with confirm off -- kill" \
+	"Connection 1 \\(remote\\) removed\r\n.*" "kill inferior"
+} else {
+    gdb_test "with confirm off -- kill" "" "kill inferior"
+}
+
+# Switch to inferior 3 (the one with no connection).
+gdb_test "inferior 3"
+
+# Remove inferior 1, not expecting anything interesting at this point.
+gdb_test_no_output "remove-inferiors 1"
+
+# Now removed inferior 2.  For the 'remote' target the connection has
+# already been removed (see above), but for 'extended-remote' and 'native'
+# targets, it is at this point that the connection is removed.
+if { $connection_type == "remote" } {
+    gdb_test_no_output "remove-inferiors 2"
+} else {
+    gdb_test "remove-inferiors 2" \
+	"Connection 1 \\(${connection_type}\\) removed"
+}

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2022-12-14 13:58 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-14 13:58 [binutils-gdb] gdb: have target_stack automate reference count handling Andrew Burgess

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).