From: Andrew Burgess
To: Simon Marchi, gdb-patches@sourceware.org
Subject: Re: [PATCH] gdb: fix target_ops reference count for some cases
In-Reply-To: <87eae56e-e370-0f02-86b4-d4d2edb4dfa8@simark.ca>
References: <20220921131200.3983844-1-aburgess@redhat.com> <87eae56e-e370-0f02-86b4-d4d2edb4dfa8@simark.ca>
Date: Thu, 22 Sep 2022 15:21:45 +0100
Message-ID: <87edw3ebgm.fsf@redhat.com>
List-Id: Gdb-patches mailing list

Simon Marchi writes:

> On 2022-09-21 09:12, Andrew Burgess via Gdb-patches wrote:
>> This commit started as an investigation into why the test
>> gdb.python/py-inferior.exp crashes when GDB exits, leaving a core file
>> behind.
>>
>> The crash occurs in connpy_connection_dealloc, and is actually
>> triggered by this assert:
>>
>>   gdb_assert (conn_obj->target == nullptr);
>>
>> Now a little aside...
>>
>> ... the assert is never actually printed, instead GDB crashes due to
>> calling a pure virtual function.
>> The backtrace at the point of crash
>> looks like this:
>>
>>   #7 0x00007fef7e2cf747 in std::terminate() () from /lib64/libstdc++.so.6
>>   #8 0x00007fef7e2d0515 in __cxa_pure_virtual () from /lib64/libstdc++.so.6
>>   #9 0x0000000000de334d in target_stack::find_beneath (this=0x4934d78, t=0x2bda270 ) at ../../src/gdb/target.c:3606
>>   #10 0x0000000000df4380 in inferior::find_target_beneath (this=0x4934b50, t=0x2bda270 ) at ../../src/gdb/inferior.h:377
>>   #11 0x0000000000de2381 in target_ops::beneath (this=0x2bda270 ) at ../../src/gdb/target.c:3047
>>   #12 0x0000000000de68aa in target_ops::supports_terminal_ours (this=0x2bda270 ) at ../../src/gdb/target-delegates.c:1223
>>   #13 0x0000000000dde6b9 in target_supports_terminal_ours () at ../../src/gdb/target.c:1112
>>   #14 0x0000000000ee55f1 in internal_vproblem(internal_problem *, const char *, int, const char *, typedef __va_list_tag __va_list_tag *) (problem=0x2bdab00 , file=0x198acf0 "../../src/gdb/python/py-connection.c", line=193, fmt=0x198ac9f "%s: Assertion `%s' failed.", ap=0x7ffdc26109d8) at ../../src/gdb/utils.c:379
>>
>> Notice in frame #12 we called target_ops::supports_terminal_ours,
>> however, this is the_dummy_target, which is of type dummy_target, and
>> so we should have called dummy_target::supports_terminal_ours. I
>> believe the reason we ended up in the wrong implementation of
>> supports_terminal_ours (which is a virtual function) is because we
>> made the call during GDB's shut-down, and, I suspect, the vtables were
>> in a weird state.
>>
>> Anyway, the point of this patch is not to fix GDB's ability to print
>> an assert during exit, but to address the root cause of the assert.
>> With that aside out of the way, we can return to the main story...
>>
>> Connections are represented in Python with gdb.TargtetConnection
>> objects (or its sub-classes). The assert in question confirms that
>> when a gdb.TargtetConnection is deallocated, the underlying GDB
>> connection has itself been removed from GDB.
>> If this is not true then
>> we risk creating multiple different gdb.TargtetConnection objects for
>> the same connection, which would be bad.
>>
>> When a connection removed in GDB the connection_removed observer
>
> Missing "is".
>
>> fires, which we catch with connpy_connection_removed, this function
>> then sets conn_obj->target to nullptr.
>>
>> The first issue here is that connpy_connection_dealloc is being called
>> as part of GDB's exit code, which is run after the Python interpreter
>> has been shut down. The connpy_connection_dealloc function is used to
>> deallocate the gdb.TargtetConnection Python object. Surely it is
>> wrong for us to be deallocating Python objects after the interpreter
>> has been shut down.
>>
>> The reason why connpy_connection_dealloc is called during GDB's exit
>> is that the global all_connection_objects map is holding a reference
>> to the gdb.TargtetConnection object. When the map is destroyed during
>
> Typo in "TargtetConnection".
>
>> GDB's exit, the gdb.TargtetConnection objects within the map can
>> finally be deallocated.
>>
>> Another job of connpy_connection_removed (the function we mentioned
>> earlier) is to remove connections from the all_connection_objects map
>> when the connection is removed from GDB.
>>
>> And so, the reason why all_connection_objects has contents when GDB
>> exits, and the reason the assert fires, is that, when GDB exits, there
>> are still some connections that have not yet been removed from GDB,
>> that is, they have a non-zero reference count.
>>
>> If we take a look at quit_force (top.c) you can see that, for each
>> inferior, we call pop_all_targets before we (later in the function)
>> call do_final_cleanups. It is the do_final_cleanups call that is
>> responsible for shutting down the Python interpreter.
>>
>> So, in theory, we should have popped all targets be the time GDB
>
> be -> before?
>
>> exits, this should have reduced their reference counts to zero, which
>> in turn should have triggered the connection_removed observer, and
>> resulted in the connection being removed from all_connection_objects,
>> and the gdb.TargtetConnection object being deallocated.
>
> "TargtetConnection"
>
>> That this is not happening indicates that earlier, somewhere else in
>> GDB, we are leaking references to GDB's connections.
>>
>> I tracked the problem down to the 'remove-inferiors' command,
>> implemented with the remove_inferior_command function (in inferior.c).
>> This function calls delete_inferior for each inferior the user
>> specifies.
>>
>> In delete_inferior we do some house keeping, and then delete the
>> inferior object, which calls inferior::~inferior.
>>
>> In neither delete_inferior or inferior::~inferior do we call
>> pop_all_targets, and it is this missing call that means we leak some
>> references to the target_ops objects on the inferior's target_stack.
>>
>> To fix this we need to add a pop_all_targets call either in
>> delete_inferior or in inferior::~inferior. Currently, I think that we
>> should place the call in delete_inferior.
>>
>> Before calling pop_all_targets the inferior for which we are popping
>> needs to be made current, along with the program_space associated with
>> the inferior.
>
> Why does the inferior and program_space need to be made current in order
> to pop the targets? I understand that pop_all_targets_above and other
> functions use `current_inferior`, but could we convert them (or add new
> versions) so they don't? Off-hand I don't see why they couldn't receive
> the inferior as a parameter (or be made methods of inferior and/or
> target_stack).
>
> It shouldn't be important which inferior is the current one when calling
> target_close on a target. If we are closing a target, it means it is no
> longer controlling any inferior.

I agree with you 100%.
Unfortunately, the following targets all seem to depend on
current_inferior being set (in their ::close method):

  bsd_kvm_target
  core_target
  darwin_nat_target
  record_btrace_target
  ctf_target
  tfile_target
  windows_nat_target (though this is only for debug output)

I suspect that this means these targets only really work when GDB has a
single inferior maybe? In most cases GDB seems to be clearing out some
per-inferior state relating to the target...

I need to investigate more, but I guess I wanted to raise this in case
you (or anyone) had thoughts.

>
>> At the moment the inferior's program_space is deleted in
>> delete_inferior before we call inferior::~inferior, so, I think, to
>> place the pop_all_targets call into inferior::~inferior would require
>> additional adjustment to GDB. As delete_inferior already exists, and
>> includes various house keeping tasks, it doesn't seem unreasonable to
>> place the pop_all_targets call there.
>
> I don't object to fixing it like this. I'm just wondering, did you
> consider changing target_stack::m_stack to make it hold strong
> references, something like std::vector<target_ops_ref>? I haven't tried
> so maybe this doesn't make sense / is too difficult. But if it does, I
> guess the problem would take care of itself. When deleting an inferior
> that still has some targets pushed, they would be automatically decref'd
> and closed if needed.

I did think about this. I think in the end the fix I proposed here was
just less churn.

I've revisited the idea of holding target_ops_ref objects, and I have
some patches that move GDB in that direction, though I haven't yet
figured out if we can get rid of the whole pop_all_targets API, which I
think is what you're hinting at.

>
>> Now when I run py-inferior.exp, by the time GDB exits, the reference
>> counts are correct. The final pop_all_targets calls in quit_force
>> reduce the reference counts to zero, which means the connections are
>> removed before the Python interpreter is shut down.
>> When GDB actually
>> exits the all_connection_objects map is empty, and no further Python
>> objects are deallocated at that point. The test now exits cleanly
>> without creating a core file.
>>
>> I've made some additional, related, changes in this commit.
>>
>> In inferior::~inferior I've added a new assert that ensures, by the
>> time the inferior is destructed, the inferior's target stack is
>> empty (with the exception of the dummy_target). If this is not true
>> then we will be losing a reference to a target_ops object.
>>
>> It is worth noting that we are losing references to the dummy_target
>> object, however, I've not tried to fix that problem in this patch, as
>> I don't think it is as important. The dummy target is a global
>> singleton, there's no observer for when the dummy target is deleted,
>> so no other parts of GDB care when the object is deleted. As a global
>> it is always just deleted as part of the exit code, and we never
>> really care what its reference count is. So, though it is a little
>> annoying that its reference count is wrong, it doesn't really matter.
>> Maybe I'll come back in a later patch and try to clean that up... but
>> that's for another day.
>>
>> When I tested the changes above I ran into a failure from 'maint
>> selftest infrun_thread_ptid_changed'.
>>
>> The problem is with scoped_mock_context. This object creates a new
>> inferior (called mock_inferior), with a thread, and some other
>> associated state, and then selects this new inferior. We also push a
>> process_stratum_target sub-class onto the new inferior's target stack.
>>
>> In ~scoped_mock_context we call:
>>
>>   pop_all_targets_at_and_above (process_stratum);
>>
>> this will remove all target_ops objects from the mock_inferior's
>> target stack, but leaves anything at the dummy_stratum and the
>> file_stratum (which I find a little weird, but more on this later).
>>
>> The problem though is that pop_all_targets_at_and_above, just like
>> pop_all_targets, removes things from the target stack of the current
>> inferior. In ~scoped_mock_context we don't ensure that the
>> mock_inferior associated with the current scoped_mock_context is
>> actually selected.
>>
>> In most tests we create a single scoped_mock_context, which
>> automatically selects its contained mock_inferior. However, in the
>> test infrun_thread_ptid_changed, we create multiple
>> scoped_mock_context, and then change which inferior is currently
>> selected.
>>
>> As a result, in one case, we end up in ~scoped_mock_context with the
>> wrong inferior selected. The pop_all_targets_at_and_above call then
>> removes the target_ops from the wrong inferior's target stack. This
>> leaves the target_ops on the scoped_mock_context::mock_inferior's
>> target stack, and, when the mock_inferior is destructed, we lose
>> some references, this triggers the assert I placed in
>> inferior::~inferior.
>>
>> To fix this I added a switch_to_inferior_no_thread call within the
>> ~scoped_mock_context function.
>
> Good catch. Although, if that could be fixed by making
> pop_all_targets_at_and_above not use the current_inferior, I think it
> would be nicer. And if the target stack could take care of managing the
> refcount, as mentioned above, even nicer.

As I mention above, right now it seems we do need the correct inferior
selected, so we might need something like this. I'll see how my new
patches work out.

Thanks,
Andrew

>
>> As I mention above, it seems weird that we call
>> pop_all_targets_at_and_above instead of pop_all_targets, so I've
>> changed that. I didn't see any test regressions after this, so I'm
>> assuming this is fine.
>
> Seems fine to me (this is essentially what a target stack holding
> target_ops_refs would do).
>
> Simon
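
For readers following the thread, the "target stack holding strong
references" idea can be sketched roughly as below. This is only a toy
model under stated assumptions, not GDB code: toy_target,
toy_target_ref, toy_target_stack and toy_inferior are hypothetical
names, and std::shared_ptr stands in for GDB's own reference-counting
type, which works differently. It shows only that when the stack owns
its references, destroying the owner releases them without an explicit
pop call.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a target; only "closed when the last
// reference is dropped" is modelled.
struct toy_target
{
  explicit toy_target (bool *closed_flag) : m_closed (closed_flag) {}

  // Runs when the last strong reference goes away.
  ~toy_target () { *m_closed = true; }

private:
  bool *m_closed;
};

// Assumption: std::shared_ptr is used here only as the closest
// standard-library analogue of a reference-holding wrapper.
using toy_target_ref = std::shared_ptr<toy_target>;

// A stack that owns a strong reference to everything pushed on it.
class toy_target_stack
{
public:
  void push (toy_target_ref t) { m_stack.push_back (std::move (t)); }

  // Drop every reference this stack holds.
  void pop_all () { m_stack.clear (); }

  std::size_t size () const { return m_stack.size (); }

private:
  std::vector<toy_target_ref> m_stack;
};

// Hypothetical stand-in for an inferior that owns its target stack.
struct toy_inferior
{
  toy_target_stack stack;
};

// Returns true if destroying an inferior that still has a target
// pushed releases (and so closes) that target with no explicit pop.
bool
deleting_inferior_closes_target ()
{
  bool closed = false;
  {
    toy_inferior inf;
    inf.stack.push (std::make_shared<toy_target> (&closed));
    if (closed)
      return false;   // Still referenced by the stack, must stay open.
  }                   // inf is destroyed here; the stack drops its reference.
  return closed;
}

// Returns true if a target shared by two inferiors stays open until
// the second inferior drops its reference.
bool
shared_target_outlives_first_inferior ()
{
  bool closed = false;
  auto inf2 = std::make_unique<toy_inferior> ();
  {
    toy_inferior inf1;
    toy_target_ref t = std::make_shared<toy_target> (&closed);
    inf1.stack.push (t);
    inf2->stack.push (t);
  }   // t and inf1 go away; inf2 still holds a reference.
  bool open_after_first = !closed;
  inf2.reset ();
  return open_after_first && closed;
}
```

Neither helper depends on which inferior is "current", which is the
property Simon asks about. The thread notes that several real targets
still rely on current_inferior in their ::close method, and the toy
model does not capture that.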