From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from simark.ca (simark.ca [158.69.221.121]) by sourceware.org (Postfix) with ESMTPS id A0B323858D20 for ; Mon, 22 Jan 2024 20:43:48 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A0B323858D20 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=simark.ca Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=simark.ca ARC-Filter: OpenARC Filter v1.0.0 sourceware.org A0B323858D20 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=158.69.221.121 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1705956230; cv=none; b=lMPOqlitaGRWdD2mkT1H4mhfWYMpsurZET70NGVoBQsrt7YQM86AE1v/ickL1m0nLccdtQEMsr1zxVKukm/xXyPkPVB+NIMuyb25w4NcJ5m32p574ZzM3TrxHhEBwMD2wpYo3vbhuMp/QpErbfZxA2TS4KiQDYWhP9cYz0RrRNk= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1705956230; c=relaxed/simple; bh=HTJU7bgQqyXpuupGjlNEx2Gp5ta0ngJV2GXCxmB/ah0=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=EqtoeBaa/oT6EVAeiX6Aur/1hxpeNDGdaxOcGORbvBhk3RGzrkBxI2K8U4wUFPDoGXrYyA7hEg7EyECd8HnzunArWajdZKtEd7N4Yhq1A4BHqsjVCzQcbYgVogZQQoWnwchPyiGK3mXwC/ttoNlkEwFTEF7YFsbQohqbEcAI23o= ARC-Authentication-Results: i=1; server2.sourceware.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=simark.ca; s=mail; t=1705956226; bh=HTJU7bgQqyXpuupGjlNEx2Gp5ta0ngJV2GXCxmB/ah0=; h=Date:Subject:To:References:From:In-Reply-To:From; b=VkNJsrz5bQgYZyt66RcXkdCFTegVy+e11PbftLZyTaUCY4Fox/gDyUEkceBYROEls h4/4FUIRodl9Z4tDdOIFqLilH6Bt5ebQSwbSk7bsxX6WEeuOCLzhVULv6vG5qSMopF BOlZUXj1WVIBaM7PV6yS9RnsykwX+AzobQOPvMww= Received: from [10.0.0.11] (modemcable238.237-201-24.mc.videotron.ca [24.201.237.238]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256) (No client certificate requested) by simark.ca (Postfix) with ESMTPSA id BD0891E092; Mon, 22 Jan 2024 15:43:46 -0500 (EST) Message-ID: Date: Mon, 22 Jan 2024 15:43:46 -0500 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] [gdb] Fix heap-use-after-free in select_event_lwp Content-Language: en-US To: Tom de Vries , gdb-patches@sourceware.org References: <20240122133403.31661-1-tdevries@suse.de> From: Simon Marchi In-Reply-To: <20240122133403.31661-1-tdevries@suse.de> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-10.8 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,SPF_HELO_PASS,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE,WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 2024-01-22 08:34, Tom de Vries wrote: > When building gdb with -O0 -fsanitize=thread, and running test-case > gdb.base/vfork-follow-parent.exp, 5 times out of 10 I run into: > ... > WARNING: ThreadSanitizer: heap-use-after-free (pid=249653) > Write of size 4 at 0xffffee83055c by main thread: > #0 select_event_lwp gdb/linux-nat.c:2809 (gdb+0xb0a65c) > #1 linux_nat_wait_1 gdb/linux-nat.c:3389 (gdb+0xb0c470) > #2 linux_nat_target::wait(ptid_t, target_waitstatus*, enum_flags) gdb/linux-nat.c:3560 (gdb+0xb0cfc8) > #3 thread_db_target::wait(ptid_t, target_waitstatus*, enum_flags) gdb/linux-thread-db.c:1402 (gdb+0xb35958) > #4 target_wait(ptid_t, target_waitstatus*, enum_flags) gdb/target.c:2571 (gdb+0xfb6c34) > #5 do_target_wait_1 gdb/infrun.c:4120 (gdb+0xa99dc4) > #6 operator() gdb/infrun.c:4179 (gdb+0xa99f70) > #7 do_target_wait gdb/infrun.c:4198 (gdb+0xa9a2bc) > #8 fetch_inferior_event() gdb/infrun.c:4629 (gdb+0xa9b658) > #9 inferior_event_handler(inferior_event_type) gdb/inf-loop.c:42 (gdb+0xa6b0c8) > #10 handle_target_event gdb/linux-nat.c:4357 (gdb+0xb0f694) > #11 handle_file_event gdbsupport/event-loop.cc:573 (gdb+0x1cfc03c) > #12 gdb_wait_for_event gdbsupport/event-loop.cc:694 (gdb+0x1cfc700) > #13 gdb_do_one_event(int) gdbsupport/event-loop.cc:217 (gdb+0x1cfa8ac) > #14 start_event_loop gdb/main.c:408 (gdb+0xb7be9c) > #15 captured_command_loop gdb/main.c:472 (gdb+0xb7c0cc) > #16 captured_main gdb/main.c:1342 (gdb+0xb7e4e4) > #17 gdb_main(captured_main_args*) gdb/main.c:1361 (gdb+0xb7e594) > #18 main gdb/gdb.c:39 (gdb+0x423ce8) > > Previous write of size 8 at 0xffffee830558 by main thread: > #0 operator delete(void*, unsigned long) (libtsan.so.2+0x8fb14) > #1 delete_lwp gdb/linux-nat.c:849 (gdb+0xb0384c) > #2 exit_lwp gdb/linux-nat.c:924 (gdb+0xb03c4c) > #3 wait_lwp gdb/linux-nat.c:2224 (gdb+0xb08404) > #4 stop_wait_callback gdb/linux-nat.c:2458 (gdb+0xb092a8) > #5 gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const gdb/../gdbsupport/function-view.h:326 (gdb+0xb15ab0) > #6 gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) gdb/../gdbsupport/function-view.h:320 (gdb+0xb15b18) > #7 gdb::function_view::operator()(lwp_info*) const gdb/../gdbsupport/function-view.h:289 (gdb+0xb13e90) > #8 iterate_over_lwps(ptid_t, gdb::function_view) gdb/linux-nat.c:879 (gdb+0xb03a18) > #9 linux_nat_wait_1 gdb/linux-nat.c:3382 (gdb+0xb0c3f8) > #10 linux_nat_target::wait(ptid_t, target_waitstatus*, enum_flags) gdb/linux-nat.c:3560 (gdb+0xb0cfc8) > #11 thread_db_target::wait(ptid_t, target_waitstatus*, enum_flags) gdb/linux-thread-db.c:1402 (gdb+0xb35958) > #12 target_wait(ptid_t, target_waitstatus*, enum_flags) gdb/target.c:2571 (gdb+0xfb6c34) > #13 do_target_wait_1 gdb/infrun.c:4120 (gdb+0xa99dc4) > #14 operator() gdb/infrun.c:4179 (gdb+0xa99f70) > #15 do_target_wait gdb/infrun.c:4198 (gdb+0xa9a2bc) > #16 fetch_inferior_event() gdb/infrun.c:4629 (gdb+0xa9b658) > #17 inferior_event_handler(inferior_event_type) gdb/inf-loop.c:42 (gdb+0xa6b0c8) > #18 handle_target_event gdb/linux-nat.c:4357 (gdb+0xb0f694) > #19 handle_file_event gdbsupport/event-loop.cc:573 (gdb+0x1cfc03c) > #20 gdb_wait_for_event gdbsupport/event-loop.cc:694 (gdb+0x1cfc700) > #21 gdb_do_one_event(int) gdbsupport/event-loop.cc:217 (gdb+0x1cfa8ac) > #22 start_event_loop gdb/main.c:408 (gdb+0xb7be9c) > #23 captured_command_loop gdb/main.c:472 (gdb+0xb7c0cc) > #24 captured_main gdb/main.c:1342 (gdb+0xb7e4e4) > #25 gdb_main(captured_main_args*) gdb/main.c:1361 (gdb+0xb7e594) > #26 main gdb/gdb.c:39 (gdb+0x423ce8) > > SUMMARY: ThreadSanitizer: heap-use-after-free gdb/linux-nat.c:2809 in select_event_lwp > ... > > Since heap-use-after-free is essentially an address sanitizer complaint, I > also tried building gdb with -O0 -fsanitize=address, but with this setup it > doesn't seem to trigger (0 times out of 10). > > The heap-use-after-free happens during the following scenario: > - linux_nat_wait_1 selects an LWP thread T1 with a status to report. > - it sets variable lp to point to the corresponding lwp_info. > - it calls stop_callback and stop_wait_callback for all threads > (because !target_is_non_stop_p ()). > - it calls select_event_lwp to maybe pick another thread than T1, to prevent > starvation. > > The problem seems to be the following: > - while calling stop_wait_callback for all threads, it also does this for T1. > While doing so, the corresponding lwp_info is deleted (callstack > stop_wait_callback -> wait_lwp -> exit_lwp -> delete_lwp), leaving variable > lp as a dangling pointer. > - variable lp is passed to select_event_lwp, which derefences it, which causes > the heap-use-after-free. > > Note that the comment here mentions "all other LWP's": > ... > /* Now stop all other LWP's ... */ > iterate_over_lwps (minus_one_ptid, stop_callback); > /* ... and wait until all of them have reported back that > they're no longer running. */ > iterate_over_lwps (minus_one_ptid, stop_wait_callback); > ... > which presumably means other than the one in lp, but the iterators > don't skip lp. > > Fix this by making the code match the comment, and skipping stop_callback and > stop_wait_callback for lp It's not totally clear to me in which part of the test this happens. Is it when doing the "continue to end of inferior 2" test, in which case the thread T1 in your example is the thread of inferior 2 reporting an exit? > Tested on aarch64-linux. > > PR gdb/31259 > Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31259 > --- > gdb/linux-nat.c | 16 ++++++++++++++-- > 1 file changed, 14 insertions(+), 2 deletions(-) > > diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c > index e91c57ba239..41f56935284 100644 > --- a/gdb/linux-nat.c > +++ b/gdb/linux-nat.c > @@ -3375,11 +3375,23 @@ linux_nat_wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus, > if (!target_is_non_stop_p ()) > { > /* Now stop all other LWP's ... */ > - iterate_over_lwps (minus_one_ptid, stop_callback); > + iterate_over_lwps (minus_one_ptid, > + [&] (struct lwp_info *info) > + { > + if (info == lp) > + return 0; > + return stop_callback (info); > + }); > > /* ... and wait until all of them have reported back that > they're no longer running. */ > - iterate_over_lwps (minus_one_ptid, stop_wait_callback); > + iterate_over_lwps (minus_one_ptid, > + [&] (struct lwp_info *info) > + { > + if (info == lp) > + return 0; > + return stop_wait_callback (info); > + }); > } > > /* If we're not waiting for a specific LWP, choose an event LWP from The code changes make sense to me. There's no need to stop the event thread, it's presumably already stopped. I think the code would be clearer if you wrote a for loop (I think we should phase out iterate_over_lwps): for (thread_info *to_stop : all_lwps_safe ()) { if (lp == to_stop) continue; stop_callback (to_stop); } stop_callback becomes a little bit misnamed here, but cleaning that up would be a bigger change. Simon