From: Simon Marchi
To: Andrew Burgess, gdb-patches@sourceware.org
Date: Wed, 21 Sep 2022 11:30:50 -0400
Subject: Re: [PATCH] gdb: fix target_ops reference count for some cases
Message-ID: <87eae56e-e370-0f02-86b4-d4d2edb4dfa8@simark.ca>
In-Reply-To: <20220921131200.3983844-1-aburgess@redhat.com>

On 2022-09-21 09:12, Andrew Burgess via Gdb-patches wrote:
> This commit started as an investigation into why the test
> gdb.python/py-inferior.exp crashes when GDB exits, leaving a core file
> behind.
>
> The crash occurs in connpy_connection_dealloc, and is actually
> triggered by this assert:
>
>   gdb_assert (conn_obj->target == nullptr);
>
> Now a little aside...
>
> ... the assert is never actually printed, instead GDB crashes due to
> calling a pure virtual function. The backtrace at the point of crash
> looks like this:
>
>   #7 0x00007fef7e2cf747 in std::terminate() () from /lib64/libstdc++.so.6
>   #8 0x00007fef7e2d0515 in __cxa_pure_virtual () from /lib64/libstdc++.so.6
>   #9 0x0000000000de334d in target_stack::find_beneath (this=0x4934d78, t=0x2bda270 ) at ../../src/gdb/target.c:3606
>   #10 0x0000000000df4380 in inferior::find_target_beneath (this=0x4934b50, t=0x2bda270 ) at ../../src/gdb/inferior.h:377
>   #11 0x0000000000de2381 in target_ops::beneath (this=0x2bda270 ) at ../../src/gdb/target.c:3047
>   #12 0x0000000000de68aa in target_ops::supports_terminal_ours (this=0x2bda270 ) at ../../src/gdb/target-delegates.c:1223
>   #13 0x0000000000dde6b9 in target_supports_terminal_ours () at ../../src/gdb/target.c:1112
>   #14 0x0000000000ee55f1 in internal_vproblem(internal_problem *, const char *, int, const char *, typedef __va_list_tag __va_list_tag *) (problem=0x2bdab00 , file=0x198acf0 "../../src/gdb/python/py-connection.c", line=193, fmt=0x198ac9f "%s: Assertion `%s' failed.", ap=0x7ffdc26109d8) at ../../src/gdb/utils.c:379
>
> Notice in frame #12 we called target_ops::supports_terminal_ours,
> however, this is the_dummy_target, which is of type dummy_target, and
> so we should have called dummy_target::supports_terminal_ours. I
> believe the reason we ended up in the wrong implementation of
> supports_terminal_ours (which is a virtual function) is because we
> made the call during GDB's shut-down, and, I suspect, the vtables were
> in a weird state.
>
> Anyway, the point of this patch is not to fix GDB's ability to print
> an assert during exit, but to address the root cause of the assert.
> With that aside out of the way, we can return to the main story...
>
> Connections are represented in Python with gdb.TargtetConnection
> objects (or its sub-classes). The assert in question confirms that
> when a gdb.TargtetConnection is deallocated, the underlying GDB
> connection has itself been removed from GDB. If this is not true then
> we risk creating multiple different gdb.TargtetConnection objects for
> the same connection, which would be bad.
>
> When a connection removed in GDB the connection_removed observer

Missing "is".

> fires, which we catch with connpy_connection_removed, this function
> then sets conn_obj->target to nullptr.
>
> The first issue here is that connpy_connection_dealloc is being called
> as part of GDB's exit code, which is run after the Python interpreter
> has been shut down. The connpy_connection_dealloc function is used to
> deallocate the gdb.TargtetConnection Python object. Surely it is
> wrong for us to be deallocating Python objects after the interpreter
> has been shut down.
>
> The reason why connpy_connection_dealloc is called during GDB's exit
> is that the global all_connection_objects map is holding a reference
> to the gdb.TargtetConnection object. When the map is destroyed during

Typo in "TargtetConnection".

> GDB's exit, the gdb.TargtetConnection objects within the map can
> finally be deallocated.
>
> Another job of connpy_connection_removed (the function we mentioned
> earlier) is to remove connections from the all_connection_objects map
> when the connection is removed from GDB.
>
> And so, the reason why all_connection_objects has contents when GDB
> exits, and the reason the assert fires, is that, when GDB exits, there
> are still some connections that have not yet been removed from GDB,
> that is, they have a non-zero reference count.
>
> If we take a look at quit_force (top.c) you can see that, for each
> inferior, we call pop_all_targets before we (later in the function)
> call do_final_cleanups.
> It is the do_final_cleanups call that is
> responsible for shutting down the Python interpreter.
>
> So, in theory, we should have popped all targets be the time GDB

be -> before?

> exits, this should have reduced their reference counts to zero, which
> in turn should have triggered the connection_removed observer, and
> resulted in the connection being removed from all_connection_objects,
> and the gdb.TargtetConnection object being deallocated.

"TargtetConnection"

> That this is not happening indicates that earlier, somewhere else in
> GDB, we are leaking references to GDB's connections.
>
> I tracked the problem down to the 'remove-inferiors' command,
> implemented with the remove_inferior_command function (in inferior.c).
> This function calls delete_inferior for each inferior the user
> specifies.
>
> In delete_inferior we do some house keeping, and then delete the
> inferior object, which calls inferior::~inferior.
>
> In neither delete_inferior or inferior::~inferior do we call
> pop_all_targets, and it is this missing call that means we leak some
> references to the target_ops objects on the inferior's target_stack.
>
> To fix this we need to add a pop_all_targets call either in
> delete_inferior or in inferior::~inferior. Currently, I think that we
> should place the call in delete_inferior.
>
> Before calling pop_all_targets the inferior for which we are popping
> needs to be made current, along with the program_space associated with
> the inferior.

Why do the inferior and program_space need to be made current in
order to pop the targets? I understand that pop_all_targets_above and
other functions use `current_inferior`, but could we convert them (or
add new versions) so they don't? Off-hand I don't see why they couldn't
receive the inferior as a parameter (or be made methods of inferior
and/or target_stack). It shouldn't be important which inferior is the
current one when calling target_close on a target.
If we are closing a target, it means it is no longer controlling any
inferior.

> At the moment the inferior's program_space is deleted in
> delete_inferior before we call inferior::~inferior, so, I think, to
> place the pop_all_targets call into inferior::~inferior would require
> additional adjustment to GDB. As delete_inferior already exists, and
> includes various house keeping tasks, it doesn't seem unreasonable to
> place the pop_all_targets call there.

I don't object to fixing it like this. I'm just wondering, did you
consider changing target_stack::m_stack to make it hold strong
references, something like std::vector<target_ops_ref>? I haven't tried
so maybe this doesn't make sense / is too difficult. But if it does, I
guess the problem would take care of itself. When deleting an inferior
that still has some targets pushed, they would be automatically
decref'd and closed if needed.

> Now when I run py-inferior.exp, by the time GDB exits, the reference
> counts are correct. The final pop_all_targets calls in quit_force
> reduce the reference counts to zero, which means the connections are
> removed before the Python interpreter is shut down. When GDB actually
> exits the all_connection_objects map is empty, and no further Python
> objects are deallocated at that point. The test now exits cleanly
> without creating a core file.
>
> I've made some additional, related, changes in this commit.
>
> In inferior::~inferior I've added a new assert that ensures, by the
> time the inferior is destructed, the inferior's target stack is
> empty (with the exception of the dummy_target). If this is not true
> then we will be loosing a reference to a target_ops object.
>
> It is worth noting that we are loosing references to the dummy_target
> object, however, I've not tried to fix that problem in this patch, as
> I don't think it is as important.
> The dummy target is a global
> singleton, there's no observer for when the dummy target is deleted,
> so no other parts of GDB care when the object is deleted. As a global
> it is always just deleted as part of the exit code, and we never
> really care what its reference count is. So, though it is a little
> annoying that its reference count is wrong, it doesn't really matter.
> Maybe I'll come back in a later patch and try to clean that up... but
> that's for another day.
>
> When I tested the changes above I ran into a failure from 'maint
> selftest infrun_thread_ptid_changed'.
>
> The problem is with scoped_mock_context. This object creates a new
> inferior (called mock_inferior), with a thread, and some other
> associated state, and then select this new inferior. We also push a
> process_stratum_target sub-class onto the new inferior's target stack.
>
> In ~scoped_mock_context we call:
>
>   pop_all_targets_at_and_above (process_stratum);
>
> this will remove all target_ops objects from the mock_inferior's
> target stack, but leaves anything at the dummy_stratum and the
> file_stratum (which I find a little weird, but more on this later).
>
> The problem though is that pop_all_targets_at_and_above, just like
> pop_all_targets, removes things from the target stack of the current
> inferior. In ~scoped_mock_context we don't ensure that the
> mock_inferior associated with the current scoped_mock_context is
> actually selected.
>
> In most tests we create a single scoped_mock_context, which
> automatically selects its contained mock_inferior. However, in the
> test infrun_thread_ptid_changed, we create multiple
> scoped_mock_context, and then change which inferior is currently
> selected.
>
> As a result, in one case, we end up in ~scoped_mock_context with the
> wrong inferior selected. The pop_all_targets_at_and_above call then
> removes the target_ops from the wrong inferior's target stack.
> This leaves the target_ops on the scoped_mock_context::mock_inferior's
> target stack, and, when the mock_inferior is destructed, we loose
> some references, this triggers the assert I placed in
> inferior::~inferior.
>
> To fix this I added a switch_to_inferior_no_thread call within the
> ~scoped_mock_context function.

Good catch. Although, if that could be fixed by making
pop_all_targets_at_and_above not use the current_inferior, I think it
would be nicer. And if the target stack could take care of managing the
refcount, as mentioned above, even nicer.

> As I mention above, it seems weird that we call
> pop_all_targets_at_and_above instead of pop_all_targets, so I've
> changed that. I didn't see any test regressions after this, so I'm
> assuming this is fine.

Seems fine to me (this is essentially what a target stack holding
target_ops_refs would do).

Simon
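
For illustration only, here is a minimal, self-contained sketch of the idea raised in this review: a target stack that owns a reference to each pushed target, so that destroying an inferior drops those references without an explicit pop. Every name below (mock_target, mock_target_ref, mock_target_stack, mock_inferior) is a hypothetical stand-in and not GDB's actual target_ops, target_ops_ref, target_stack or inferior. Setting a "closed" flag when the count reaches zero is an assumption standing in for closing the target.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reference-counted target.
struct mock_target
{
  int refcount = 0;
  bool closed = false;	// Set when the last reference goes away.
};

// Hypothetical owning reference: incref on construction/copy, decref on
// destruction, and "close" the target when the count drops to zero.
class mock_target_ref
{
public:
  explicit mock_target_ref (mock_target *t)
    : m_target (t)
  {
    if (m_target != nullptr)
      ++m_target->refcount;
  }

  mock_target_ref (const mock_target_ref &other)
    : mock_target_ref (other.m_target)
  {}

  mock_target_ref (mock_target_ref &&other) noexcept
    : m_target (std::exchange (other.m_target, nullptr))
  {}

  // Copy-and-swap: the by-value parameter releases the old reference.
  mock_target_ref &operator= (mock_target_ref other) noexcept
  {
    std::swap (m_target, other.m_target);
    return *this;
  }

  ~mock_target_ref ()
  {
    if (m_target != nullptr && --m_target->refcount == 0)
      m_target->closed = true;
  }

private:
  mock_target *m_target;
};

// Hypothetical target stack holding strong references.
struct mock_target_stack
{
  void push (mock_target *t)
  { m_stack.emplace_back (t); }

  std::vector<mock_target_ref> m_stack;
};

// Hypothetical inferior: destroying it destroys the stack, which drops
// every reference it still holds.
struct mock_inferior
{
  mock_target_stack targets;
};
```

In this sketch, deleting a mock_inferior that still has targets pushed releases them as a side effect of the vector's destructor, and a target shared by two inferiors is only marked closed once the second one goes away. Whether the same approach fits GDB's real target stack is exactly the open question in the review above.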