From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp.polymtl.ca (smtp.polymtl.ca [132.207.4.11]) by sourceware.org (Postfix) with ESMTPS id 6E5AB3858C66 for ; Tue, 23 Jan 2024 16:08:26 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 6E5AB3858C66 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=polymtl.ca Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=polymtl.ca ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 6E5AB3858C66 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=132.207.4.11 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1706026111; cv=none; b=FCHWFAZZlsJJZjxAEJxKORPbcStKgkkGkEzVFSwyspwOyZC08BWYS6TYJbVTL397nYxgCUa6/56ym6sRNt4cVz58cnpKeBiWGOSBls6BBuLUeGpl93jRyidwbX21t0wDw5PNRwwpMq10Nrrdgxv36Ww0ydQLzxhlMU4aYNrqEls= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1706026111; c=relaxed/simple; bh=CNX+EJMbprTlNQnLIPIrDQl6wTGT/x4yLKISHiKbpls=; h=DKIM-Signature:Message-ID:Date:MIME-Version:Subject:To:From; b=UkZdEmEX6M+akY4gQNzyv4DMgZ8HltM21qL9k0VtMy+ueyEr4FkGDKLklvORCUM6o+tRBqWtvl3mQtmtBmsOhJtSKV3uXtDisUWsNy/m4FB/dYYnZwo1/gdc6qvUDeFDFnkoEcRI3+yfFWAb5MVdm54tmnVx7KSD/4cYU7ryF3U= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from simark.ca (simark.ca [158.69.221.121]) (authenticated bits=0) by smtp.polymtl.ca (8.14.7/8.14.7) with ESMTP id 40NG8Kqi010101 (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 23 Jan 2024 11:08:25 -0500 DKIM-Filter: OpenDKIM Filter v2.11.0 smtp.polymtl.ca 40NG8Kqi010101 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=polymtl.ca; s=default; t=1706026105; bh=P37L1l9fpTV8aCGzNYGLTv8i2KKuphgKJ4uxjfmkriM=; h=Date:Subject:To:From:In-Reply-To:From; b=EcU/BHvo4YHGBE2Ncg0MqOSkqFPX+4Jd3OsSLzT93VGlSeN+BdvBTGmPsodK0xJ54 eqtEEJGx5HayP/A6EdjhW2fc5IQU3Nm8k6Lru76B5IArP9YOUcoj0pmAabR8F7hGGe 1g8hTYLd5kPt82TncQksHP37V59woZObC9Bh55es= Received: from [10.0.0.11] (modemcable238.237-201-24.mc.videotron.ca [24.201.237.238]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature ECDSA (prime256v1) server-digest SHA256) (No client certificate requested) by simark.ca (Postfix) with ESMTPSA id 2609D1E092; Tue, 23 Jan 2024 11:08:19 -0500 (EST) Message-ID: Date: Tue, 23 Jan 2024 11:08:19 -0500 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2] [gdb] Fix heap-use-after-free in select_event_lwp Content-Language: en-US To: Tom de Vries , gdb-patches@sourceware.org References: <20240123114830.20253-1-tdevries@suse.de> From: Simon Marchi In-Reply-To: <20240123114830.20253-1-tdevries@suse.de> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Poly-FromMTA: (simark.ca [158.69.221.121]) at Tue, 23 Jan 2024 16:08:20 +0000 X-Spam-Status: No, score=-3037.2 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL,SPF_HELO_PASS,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE,WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 2024-01-23 06:48, Tom de Vries wrote: > When building gdb with -O0 -fsanitize=thread, and running test-case > gdb.base/vfork-follow-parent.exp, 5 times out of 10 I run into: > ... > WARNING: ThreadSanitizer: heap-use-after-free (pid=249653) > Write of size 4 at 0xffffee83055c by main thread: > #0 select_event_lwp gdb/linux-nat.c:2809 (gdb+0xb0a65c) > #1 linux_nat_wait_1 gdb/linux-nat.c:3389 (gdb+0xb0c470) > #2 linux_nat_target::wait(ptid_t, target_waitstatus*, enum_flags) gdb/linux-nat.c:3560 (gdb+0xb0cfc8) > #3 thread_db_target::wait(ptid_t, target_waitstatus*, enum_flags) gdb/linux-thread-db.c:1402 (gdb+0xb35958) > #4 target_wait(ptid_t, target_waitstatus*, enum_flags) gdb/target.c:2571 (gdb+0xfb6c34) > #5 do_target_wait_1 gdb/infrun.c:4120 (gdb+0xa99dc4) > #6 operator() gdb/infrun.c:4179 (gdb+0xa99f70) > #7 do_target_wait gdb/infrun.c:4198 (gdb+0xa9a2bc) > #8 fetch_inferior_event() gdb/infrun.c:4629 (gdb+0xa9b658) > #9 inferior_event_handler(inferior_event_type) gdb/inf-loop.c:42 (gdb+0xa6b0c8) > #10 handle_target_event gdb/linux-nat.c:4357 (gdb+0xb0f694) > #11 handle_file_event gdbsupport/event-loop.cc:573 (gdb+0x1cfc03c) > #12 gdb_wait_for_event gdbsupport/event-loop.cc:694 (gdb+0x1cfc700) > #13 gdb_do_one_event(int) gdbsupport/event-loop.cc:217 (gdb+0x1cfa8ac) > #14 start_event_loop gdb/main.c:408 (gdb+0xb7be9c) > #15 captured_command_loop gdb/main.c:472 (gdb+0xb7c0cc) > #16 captured_main gdb/main.c:1342 (gdb+0xb7e4e4) > #17 gdb_main(captured_main_args*) gdb/main.c:1361 (gdb+0xb7e594) > #18 main gdb/gdb.c:39 (gdb+0x423ce8) > > Previous write of size 8 at 0xffffee830558 by main thread: > #0 operator delete(void*, unsigned long) (libtsan.so.2+0x8fb14) > #1 delete_lwp gdb/linux-nat.c:849 (gdb+0xb0384c) > #2 exit_lwp gdb/linux-nat.c:924 (gdb+0xb03c4c) > #3 wait_lwp gdb/linux-nat.c:2224 (gdb+0xb08404) > #4 stop_wait_callback gdb/linux-nat.c:2458 (gdb+0xb092a8) > #5 gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const gdb/../gdbsupport/function-view.h:326 (gdb+0xb15ab0) > #6 gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) gdb/../gdbsupport/function-view.h:320 (gdb+0xb15b18) > #7 gdb::function_view::operator()(lwp_info*) const gdb/../gdbsupport/function-view.h:289 (gdb+0xb13e90) > #8 iterate_over_lwps(ptid_t, gdb::function_view) gdb/linux-nat.c:879 (gdb+0xb03a18) > #9 linux_nat_wait_1 gdb/linux-nat.c:3382 (gdb+0xb0c3f8) > #10 linux_nat_target::wait(ptid_t, target_waitstatus*, enum_flags) gdb/linux-nat.c:3560 (gdb+0xb0cfc8) > #11 thread_db_target::wait(ptid_t, target_waitstatus*, enum_flags) gdb/linux-thread-db.c:1402 (gdb+0xb35958) > #12 target_wait(ptid_t, target_waitstatus*, enum_flags) gdb/target.c:2571 (gdb+0xfb6c34) > #13 do_target_wait_1 gdb/infrun.c:4120 (gdb+0xa99dc4) > #14 operator() gdb/infrun.c:4179 (gdb+0xa99f70) > #15 do_target_wait gdb/infrun.c:4198 (gdb+0xa9a2bc) > #16 fetch_inferior_event() gdb/infrun.c:4629 (gdb+0xa9b658) > #17 inferior_event_handler(inferior_event_type) gdb/inf-loop.c:42 (gdb+0xa6b0c8) > #18 handle_target_event gdb/linux-nat.c:4357 (gdb+0xb0f694) > #19 handle_file_event gdbsupport/event-loop.cc:573 (gdb+0x1cfc03c) > #20 gdb_wait_for_event gdbsupport/event-loop.cc:694 (gdb+0x1cfc700) > #21 gdb_do_one_event(int) gdbsupport/event-loop.cc:217 (gdb+0x1cfa8ac) > #22 start_event_loop gdb/main.c:408 (gdb+0xb7be9c) > #23 captured_command_loop gdb/main.c:472 (gdb+0xb7c0cc) > #24 captured_main gdb/main.c:1342 (gdb+0xb7e4e4) > #25 gdb_main(captured_main_args*) gdb/main.c:1361 (gdb+0xb7e594) > #26 main gdb/gdb.c:39 (gdb+0x423ce8) > > SUMMARY: ThreadSanitizer: heap-use-after-free gdb/linux-nat.c:2809 in select_event_lwp > ... > > Since heap-use-after-free is essentially an address sanitizer complaint, I > also tried building gdb with -O0 -fsanitize=address, but with this setup it > doesn't seem to trigger (0 times out of 10). > > The heap-use-after-free happens during the following scenario: > - linux_nat_wait_1 selects an LWP thread T1 with a status to report. > - it sets variable lp to point to the corresponding lwp_info. > - it calls stop_callback and stop_wait_callback for all threads > (because !target_is_non_stop_p ()). > - it calls select_event_lwp to maybe pick another thread than T1, to prevent > starvation. > > The problem seems to be the following: > - while calling stop_wait_callback for all threads, it also does this for T1. > While doing so, the corresponding lwp_info is deleted (callstack > stop_wait_callback -> wait_lwp -> exit_lwp -> delete_lwp), leaving variable > lp as a dangling pointer. > - variable lp is passed to select_event_lwp, which derefences it, which causes > the heap-use-after-free. > > Note that the comment here mentions "all other LWP's": > ... > /* Now stop all other LWP's ... */ > iterate_over_lwps (minus_one_ptid, stop_callback); > /* ... and wait until all of them have reported back that > they're no longer running. */ > iterate_over_lwps (minus_one_ptid, stop_wait_callback); > ... > which presumably means other than the one in lp, but the iterators > don't skip lp. > > Fix this by making the code match the comment, and skipping stop_callback and > stop_wait_callback for lp. > > Tested on aarch64-linux. > > PR gdb/31259 > Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31259 > --- > gdb/linux-nat.c | 16 ++++++++++++++-- > 1 file changed, 14 insertions(+), 2 deletions(-) > > diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c > index e91c57ba239..8bfae8555fc 100644 > --- a/gdb/linux-nat.c > +++ b/gdb/linux-nat.c > @@ -3375,11 +3375,23 @@ linux_nat_wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus, > if (!target_is_non_stop_p ()) > { > /* Now stop all other LWP's ... */ > - iterate_over_lwps (minus_one_ptid, stop_callback); > + for (lwp_info *other_lp : all_lwps_safe ()) > + { > + if (other_lp == lp) > + continue; > + > + stop_callback (other_lp); > + } > > /* ... and wait until all of them have reported back that > they're no longer running. */ > - iterate_over_lwps (minus_one_ptid, stop_wait_callback); > + for (lwp_info *other_lp : all_lwps_safe ()) > + { > + if (other_lp == lp) > + continue; > + > + stop_wait_callback (other_lp); > + } I did a bit of archeology to see how this code evolved, and I noticed that this change in commit 9c02b52532 ("linux-nat.c: better starvation avoidance, handle non-stop mode too"): https://gitlab.com/gnutools/binutils-gdb/-/commit/9c02b52532ac?view=parallel#a360e5f37ff035d1ed6814cb60de9f2826b55788_3373_3362 Previously, we had: lp->stopped = 1; just before the snippet you modify. Both stop_callback and stop_wait_callback are no-ops if `lp->stopped` is true, so that would make them skip over the event thread. The commit removed that, so I suppose that the problem was introduced in that commit. Ideally, Pedro should look at this, but in the mean time: Reviewed-By: Simon Marchi Simon