From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <87eae56e-e370-0f02-86b4-d4d2edb4dfa8@simark.ca>
Date: Wed, 21 Sep 2022 11:30:50 -0400
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101
 Thunderbird/102.2.2
Subject: Re: [PATCH] gdb: fix target_ops reference count for some cases
Content-Language: en-US
To: Andrew Burgess <aburgess@redhat.com>, gdb-patches@sourceware.org
References: <20220921131200.3983844-1-aburgess@redhat.com>
From: Simon Marchi <simark@simark.ca>
In-Reply-To: <20220921131200.3983844-1-aburgess@redhat.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit


On 2022-09-21 09:12, Andrew Burgess via Gdb-patches wrote:
> This commit started as an investigation into why the test
> gdb.python/py-inferior.exp crashes when GDB exits, leaving a core file
> behind.
> 
> The crash occurs in connpy_connection_dealloc, and is actually
> triggered by this assert:
> 
>   gdb_assert (conn_obj->target == nullptr);
> 
> Now a little aside...
> 
> ... the assert is never actually printed, instead GDB crashes due to
> calling a pure virtual function.  The backtrace at the point of crash
> looks like this:
> 
>   #7  0x00007fef7e2cf747 in std::terminate() () from /lib64/libstdc++.so.6
>   #8  0x00007fef7e2d0515 in __cxa_pure_virtual () from /lib64/libstdc++.so.6
>   #9  0x0000000000de334d in target_stack::find_beneath (this=0x4934d78, t=0x2bda270 <the_dummy_target>) at ../../src/gdb/target.c:3606
>   #10 0x0000000000df4380 in inferior::find_target_beneath (this=0x4934b50, t=0x2bda270 <the_dummy_target>) at ../../src/gdb/inferior.h:377
>   #11 0x0000000000de2381 in target_ops::beneath (this=0x2bda270 <the_dummy_target>) at ../../src/gdb/target.c:3047
>   #12 0x0000000000de68aa in target_ops::supports_terminal_ours (this=0x2bda270 <the_dummy_target>) at ../../src/gdb/target-delegates.c:1223
>   #13 0x0000000000dde6b9 in target_supports_terminal_ours () at ../../src/gdb/target.c:1112
>   #14 0x0000000000ee55f1 in internal_vproblem(internal_problem *, const char *, int, const char *, typedef __va_list_tag __va_list_tag *) (problem=0x2bdab00 <internal_error_problem>, file=0x198acf0 "../../src/gdb/python/py-connection.c", line=193, fmt=0x198ac9f "%s: Assertion `%s' failed.", ap=0x7ffdc26109d8) at ../../src/gdb/utils.c:379
> 
> Notice in frame #12 we called target_ops::supports_terminal_ours,
> however, this is the_dummy_target, which is of type dummy_target, and
> so we should have called dummy_target::supports_terminal_ours.  I
> believe the reason we ended up in the wrong implementation of
> supports_terminal_ours (which is a virtual function) is because we
> made the call during GDB's shut-down, and, I suspect, the vtables were
> in a weird state.
> 
> Anyway, the point of this patch is not to fix GDB's ability to print
> an assert during exit, but to address the root cause of the assert.
> With that aside out of the way, we can return to the main story...
> 
> Connections are represented in Python with gdb.TargtetConnection
> objects (or its sub-classes).  The assert in question confirms that
> when a gdb.TargtetConnection is deallocated, the underlying GDB
> connection has itself been removed from GDB.  If this is not true then
> we risk creating multiple different gdb.TargtetConnection objects for
> the same connection, which would be bad.
> 
> When a connection removed in GDB the connection_removed observer

Missing "is".

> fires, which we catch with connpy_connection_removed, this function
> then sets conn_obj->target to nullptr.
> 
> The first issue here is that connpy_connection_dealloc is being called
> as part of GDB's exit code, which is run after the Python interpreter
> has been shut down.  The connpy_connection_dealloc function is used to
> deallocate the gdb.TargtetConnection Python object.  Surely it is
> wrong for us to be deallocating Python objects after the interpreter
> has been shut down.
> 
> The reason why connpy_connection_dealloc is called during GDB's exit
> is that the global all_connection_objects map is holding a reference
> to the gdb.TargtetConnection object.  When the map is destroyed during

Typo in "TargtetConnection".

> GDB's exit, the gdb.TargtetConnection objects within the map can
> finally be deallocated.
> 
> Another job of connpy_connection_removed (the function we mentioned
> earlier) is to remove connections from the all_connection_objects map
> when the connection is removed from GDB.
> 
> And so, the reason why all_connection_objects has contents when GDB
> exits, and the reason the assert fires, is that, when GDB exits, there
> are still some connections that have not yet been removed from GDB,
> that is, they have a non-zero reference count.
>
> If we take a look at quit_force (top.c) you can see that, for each
> inferior, we call pop_all_targets before we (later in the function)
> call do_final_cleanups.  It is the do_final_cleanups call that is
> responsible for shutting down the Python interpreter.
> 
> So, in theory, we should have popped all targets be the time GDB

be -> before?

> exits, this should have reduced their reference counts to zero, which
> in turn should have triggered the connection_removed observer, and
> resulted in the connection being removed from all_connection_objects,
> and the gdb.TargtetConnection object being deallocated.

"TargtetConnection"

> That this is not happening indicates that earlier, somewhere else in
> GDB, we are leaking references to GDB's connections.
> 
> I tracked the problem down to the 'remove-inferiors' command,
> implemented with the remove_inferior_command function (in inferior.c).
> This function calls delete_inferior for each inferior the user
> specifies.
> 
> In delete_inferior we do some housekeeping, and then delete the
> inferior object, which calls inferior::~inferior.
> 
> In neither delete_inferior nor inferior::~inferior do we call
> pop_all_targets, and it is this missing call that means we leak some
> references to the target_ops objects on the inferior's target_stack.
> 
> To fix this we need to add a pop_all_targets call either in
> delete_inferior or in inferior::~inferior.  Currently, I think that we
> should place the call in delete_inferior.
> 
> Before calling pop_all_targets the inferior for which we are popping
> needs to be made current, along with the program_space associated with
> the inferior.

Why do the inferior and program_space need to be made current in order
to pop the targets?  I understand that pop_all_targets_above and other
functions use `current_inferior`, but could we convert them (or add new
versions) so they don't?  Off-hand I don't see why they couldn't receive
the inferior as a parameter (or be made methods of inferior and/or
target_stack).

It shouldn't be important which inferior is the current one when calling
target_close on a target.  If we are closing a target, it means it is no
longer controlling any inferior.
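
To illustrate, here is a rough sketch of the "method of inferior"
variant.  The types are simplified stand-ins, not GDB's real target_ops
and inferior classes, and inferior::pop_all_targets is a hypothetical
method; the "closed" flag only stands in for a target_close call.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-ins for GDB's types; all names are illustrative.
struct target_ops
{
  int refcount = 0;
  bool closed = false;
};

struct inferior
{
  // Targets pushed on this inferior, bottom first.
  std::vector<target_ops *> stack;

  void push_target (target_ops *t)
  {
    ++t->refcount;
    stack.push_back (t);
  }

  // Pop every target of this inferior.  Nothing here consults a
  // global "current inferior".
  void pop_all_targets ()
  {
    while (!stack.empty ())
      {
        target_ops *t = stack.back ();
        stack.pop_back ();
        assert (t->refcount > 0);
        if (--t->refcount == 0)
          t->closed = true;	/* Stands in for target_close (t).  */
      }
  }
};
```

With something along these lines, delete_inferior could call
inf->pop_all_targets () without switching inferior or program space
first.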

> At the moment the inferior's program_space is deleted in
> delete_inferior before we call inferior::~inferior, so, I think, to
> place the pop_all_targets call into inferior::~inferior would require
> additional adjustment to GDB.  As delete_inferior already exists, and
> includes various housekeeping tasks, it doesn't seem unreasonable to
> place the pop_all_targets call there.

I don't object to fixing it like this.  I'm just wondering, did you
consider changing target_stack::m_stack to make it hold strong
references, something like std::vector<target_ops_ref>?  I haven't tried
so maybe this doesn't make sense / is too difficult.  But if it does, I
guess the problem would take care of itself.  When deleting an inferior
that still has some targets pushed, they would be automatically decref'd
and closed if needed.
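
Roughly what I have in mind, as an untested sketch.  The target_ops_ref
below is a simplified stand-in written for this example (it is not the
real ref_ptr-based type), and the "closed" flag again stands in for
target_close:

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct target_ops
{
  int refcount = 0;
  bool closed = false;
};

// Simplified strong reference: incref when acquired, decref when
// released, "close" the target when the last reference goes away.
class target_ops_ref
{
public:
  explicit target_ops_ref (target_ops *t) : m_target (t)
  {
    assert (t != nullptr);
    ++m_target->refcount;
  }

  target_ops_ref (const target_ops_ref &o) : m_target (o.m_target)
  {
    if (m_target != nullptr)
      ++m_target->refcount;
  }

  target_ops_ref (target_ops_ref &&o) noexcept : m_target (o.m_target)
  { o.m_target = nullptr; }

  target_ops_ref &operator= (target_ops_ref o) noexcept
  {
    std::swap (m_target, o.m_target);
    return *this;
  }

  ~target_ops_ref ()
  {
    if (m_target != nullptr && --m_target->refcount == 0)
      m_target->closed = true;	/* Stands in for target_close.  */
  }

private:
  target_ops *m_target;
};

struct target_stack
{
  void push (target_ops *t) { m_stack.emplace_back (t); }

  // Each element owns one reference to its target.
  std::vector<target_ops_ref> m_stack;
};

struct inferior
{
  target_stack targets;
};
```

Destroying an inferior destroys its target_stack, which drops one
reference per pushed target, so no explicit pop is needed to keep the
counts right.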

> Now when I run py-inferior.exp, by the time GDB exits, the reference
> counts are correct.  The final pop_all_targets calls in quit_force
> reduce the reference counts to zero, which means the connections are
> removed before the Python interpreter is shut down.  When GDB actually
> exits the all_connection_objects map is empty, and no further Python
> objects are deallocated at that point.  The test now exits cleanly
> without creating a core file.
> 
> I've made some additional, related, changes in this commit.
> 
> In inferior::~inferior I've added a new assert that ensures, by the
> time the inferior is destructed, the inferior's target stack is
> empty (with the exception of the dummy_target).  If this is not true
> then we will be losing a reference to a target_ops object.
> 
> It is worth noting that we are losing references to the dummy_target
> object, however, I've not tried to fix that problem in this patch, as
> I don't think it is as important.  The dummy target is a global
> singleton, there's no observer for when the dummy target is deleted,
> so no other parts of GDB care when the object is deleted.  As a global
> it is always just deleted as part of the exit code, and we never
> really care what its reference count is.  So, though it is a little
> annoying that its reference count is wrong, it doesn't really matter.
> Maybe I'll come back in a later patch and try to clean that up... but
> that's for another day.
> 
> When I tested the changes above I ran into a failure from 'maint
> selftest infrun_thread_ptid_changed'.
> 
> The problem is with scoped_mock_context.  This object creates a new
> inferior (called mock_inferior), with a thread, and some other
> associated state, and then selects this new inferior.  We also push a
> process_stratum_target sub-class onto the new inferior's target stack.
> 
> In ~scoped_mock_context we call:
> 
>   pop_all_targets_at_and_above (process_stratum);
> 
> this will remove all target_ops objects from the mock_inferior's
> target stack, but leaves anything at the dummy_stratum and the
> file_stratum (which I find a little weird, but more on this later).
> 
> The problem though is that pop_all_targets_at_and_above, just like
> pop_all_targets, removes things from the target stack of the current
> inferior.  In ~scoped_mock_context we don't ensure that the
> mock_inferior associated with the current scoped_mock_context is
> actually selected.
> 
> In most tests we create a single scoped_mock_context, which
> automatically selects its contained mock_inferior.  However, in the
> test infrun_thread_ptid_changed, we create multiple
> scoped_mock_context objects, and then change which inferior is currently
> selected.
> 
> As a result, in one case, we end up in ~scoped_mock_context with the
> wrong inferior selected.  The pop_all_targets_at_and_above call then
> removes the target_ops from the wrong inferior's target stack.  This
> leaves the target_ops on the scoped_mock_context::mock_inferior's
> target stack, and, when the mock_inferior is destructed, we lose
> some references, which triggers the assert I placed in
> inferior::~inferior.
> 
> To fix this I added a switch_to_inferior_no_thread call within the
> ~scoped_mock_context function.

Good catch.  Although, if that could be fixed by making
pop_all_targets_at_and_above not use the current_inferior, I think it
would be nicer.  And if the target stack could take care of managing the
refcount, as mentioned above, even nicer.
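
As a toy model of the selftest situation (none of this is the real
scoped_mock_context; current_inf, the constructor argument and
inferior::pop_all_targets are made up for the example), a destructor
that pops from its own inferior is immune to whatever happens to be
selected at the time:

```cpp
#include <cassert>
#include <vector>

struct target_ops
{
  int refcount = 0;
};

struct inferior
{
  std::vector<target_ops *> stack;

  void push_target (target_ops *t)
  {
    ++t->refcount;
    stack.push_back (t);
  }

  void pop_all_targets ()
  {
    for (target_ops *t : stack)
      {
        assert (t->refcount > 0);
        --t->refcount;
      }
    stack.clear ();
  }
};

// Global selection, playing the role of GDB's current inferior.
inferior *current_inf = nullptr;

struct scoped_mock_context
{
  explicit scoped_mock_context (target_ops &t) : m_prev (current_inf)
  {
    mock_inferior.push_target (&t);
    current_inf = &mock_inferior;
  }

  ~scoped_mock_context ()
  {
    // Pop from our own inferior, whichever one is selected right now.
    mock_inferior.pop_all_targets ();
    current_inf = m_prev;
  }

  scoped_mock_context (const scoped_mock_context &) = delete;
  scoped_mock_context &operator= (const scoped_mock_context &) = delete;

  inferior mock_inferior;

private:
  inferior *m_prev;
};
```

If the destructor instead popped from *current_inf, creating two
contexts and selecting the first one before the second is destroyed
would leave the second context's target with a stale reference, which
is the failure you describe.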

> As I mention above, it seems weird that we call
> pop_all_targets_at_and_above instead of pop_all_targets, so I've
> changed that.  I didn't see any test regressions after this, so I'm
> assuming this is fine.

Seems fine to me (this is essentially what a target stack holding
target_ops_refs would do).

Simon