From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) by sourceware.org (Postfix) with ESMTPS id 71A623858D3C for ; Tue, 23 Jan 2024 17:51:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 71A623858D3C Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 71A623858D3C Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=195.135.223.130 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1706032312; cv=none; b=pA1ajrUhoz+DWS9HVUMdn+BeNfZN66VuINlbK/517y6LqNkESuJCi8chjQ5K0bl4K2BWK+CDp6zsNYFhXK1ZUmcH3MtOiv+2r/m5PNh3YDjNTBYKfIN6VIGpeFhoj0gBh0XOYLEnXfTYfkRGG5vyFfMEMOYYifgFc/X4K0kSdMU= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1706032312; c=relaxed/simple; bh=OIdCTStnT9Pv2Ga+XE+V7gjy7bFdVS4abYQNHOUaiWc=; h=DKIM-Signature:DKIM-Signature:DKIM-Signature:DKIM-Signature: Message-ID:Date:MIME-Version:Subject:To:From; b=c+eVkxTby7g/2tmklAEk7aCzMJHV7YJ1DBs/nGv6LGwy6uLBL9fgPx2EOQDfGfUlCBJT7VWEAUSLzj01izVyjyI89i/vehvzbDt/7sxSNYeuWLrou8NCM/Egc4+rkV0tnSbsq/eo83xAd/ojLrM9SRi3dWlRUYxPgSt7lyenURs= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [IPv6:2a07:de40:b281:104:10:150:64:97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 99B9D21A4F; Tue, 23 Jan 2024 17:51:47 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1706032308; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TYij5CqNiqmZsggSB9fTuldCgCjmPACeo0zXwpoOUHQ=; b=pNojKeDnuhMKVqUrGsMwurWAaTqAOMCD1w5G3lnnqG3/r0zUa/HfTysbCBZDJ22c5QIOTb QM9VoucL7Tpy0pGQYPZ1DoKUBXBmsWAh7/WqJ68b2wuEwfmjrz+sd6VuX9h6us29gQxIN5 tu9TC5Fhg+BilLi8g8g0V0ry02Y74jw= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1706032308; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TYij5CqNiqmZsggSB9fTuldCgCjmPACeo0zXwpoOUHQ=; b=Z9KwzHvtNwmHlXoSbgun/mXFowpEpNLYqaPzYaO/6WUWZepoO05KC8CAlglzfGLmThb1hc JZAxoWOEcfaJwHBg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1706032307; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TYij5CqNiqmZsggSB9fTuldCgCjmPACeo0zXwpoOUHQ=; b=uQOqt07Xi75zowW6+LiYtVUqFlNx4CR1lOiTe1S7O7pWELEvrNQDySNsRycllCuDELrIiu e0J1P90UcSEcEdozECZdjEJqJ4Nl10rshuX0m+ah7lO8TmHoC9fdHeqsFl5HSl9yhS4/op cLftyChxL5VojDGdPpfJJfN8P8JAtCQ= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1706032307; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=TYij5CqNiqmZsggSB9fTuldCgCjmPACeo0zXwpoOUHQ=; b=9WEgw91buEQ+yENtxAD2Yj8oiM6CJpZrZ7bC+Ln6SMY8vUuTGt07xO03NLfSe3bAL4BUKI AU0lALlmK49hTmCw== Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 75916136A4; Tue, 23 Jan 2024 17:51:47 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id TnZXG7P8r2UqJgAAD6G6ig (envelope-from ); Tue, 23 Jan 2024 17:51:47 +0000 Message-ID: Date: Tue, 23 Jan 2024 18:52:59 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2] [gdb] Fix heap-use-after-free in select_event_lwp Content-Language: en-US To: Simon Marchi , gdb-patches@sourceware.org References: <20240123114830.20253-1-tdevries@suse.de> Cc: Pedro Alves From: Tom de Vries In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Authentication-Results: smtp-out1.suse.de; dkim=pass header.d=suse.de header.s=susede2_rsa header.b=uQOqt07X; dkim=pass header.d=suse.de header.s=susede2_ed25519 header.b=9WEgw91b X-Spamd-Result: default: False [-5.30 / 50.00]; ARC_NA(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; R_DKIM_ALLOW(-0.20)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519]; XM_UA_NO_VERSION(0.01)[]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3]; TO_DN_SOME(0.00)[]; TO_MATCH_ENVRCPT_ALL(0.00)[]; MIME_GOOD(-0.10)[text/plain]; DWL_DNSWL_MED(-2.00)[suse.de:dkim]; BAYES_HAM(-3.00)[100.00%]; RCVD_COUNT_THREE(0.00)[3]; DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519]; DKIM_TRACE(0.00)[suse.de:+]; MX_GOOD(-0.01)[]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.de:dkim]; FUZZY_BLOCKED(0.00)[rspamd.com]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_TLS_ALL(0.00)[]; MID_RHS_MATCH_FROM(0.00)[] X-Rspamd-Server: rspamd1.dmz-prg2.suse.org X-Rspamd-Queue-Id: 99B9D21A4F X-Spam-Level: X-Spam-Score: -5.30 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,GIT_PATCH_0,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE,WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 1/23/24 17:08, Simon Marchi wrote: > > > On 2024-01-23 06:48, Tom de Vries wrote: >> When building gdb with -O0 -fsanitize=thread, and running test-case >> gdb.base/vfork-follow-parent.exp, 5 times out of 10 I run into: >> ... >> WARNING: ThreadSanitizer: heap-use-after-free (pid=249653) >> Write of size 4 at 0xffffee83055c by main thread: >> #0 select_event_lwp gdb/linux-nat.c:2809 (gdb+0xb0a65c) >> #1 linux_nat_wait_1 gdb/linux-nat.c:3389 (gdb+0xb0c470) >> #2 linux_nat_target::wait(ptid_t, target_waitstatus*, enum_flags) gdb/linux-nat.c:3560 (gdb+0xb0cfc8) >> #3 thread_db_target::wait(ptid_t, target_waitstatus*, enum_flags) gdb/linux-thread-db.c:1402 (gdb+0xb35958) >> #4 target_wait(ptid_t, target_waitstatus*, enum_flags) gdb/target.c:2571 (gdb+0xfb6c34) >> #5 do_target_wait_1 gdb/infrun.c:4120 (gdb+0xa99dc4) >> #6 operator() gdb/infrun.c:4179 (gdb+0xa99f70) >> #7 do_target_wait gdb/infrun.c:4198 (gdb+0xa9a2bc) >> #8 fetch_inferior_event() gdb/infrun.c:4629 (gdb+0xa9b658) >> #9 inferior_event_handler(inferior_event_type) gdb/inf-loop.c:42 (gdb+0xa6b0c8) >> #10 handle_target_event gdb/linux-nat.c:4357 (gdb+0xb0f694) >> #11 handle_file_event gdbsupport/event-loop.cc:573 (gdb+0x1cfc03c) >> #12 gdb_wait_for_event gdbsupport/event-loop.cc:694 (gdb+0x1cfc700) >> #13 gdb_do_one_event(int) gdbsupport/event-loop.cc:217 (gdb+0x1cfa8ac) >> #14 start_event_loop gdb/main.c:408 (gdb+0xb7be9c) >> #15 captured_command_loop gdb/main.c:472 (gdb+0xb7c0cc) >> #16 captured_main gdb/main.c:1342 (gdb+0xb7e4e4) >> #17 gdb_main(captured_main_args*) gdb/main.c:1361 (gdb+0xb7e594) >> #18 main gdb/gdb.c:39 (gdb+0x423ce8) >> >> Previous write of size 8 at 0xffffee830558 by main thread: >> #0 operator delete(void*, unsigned long) (libtsan.so.2+0x8fb14) >> #1 delete_lwp gdb/linux-nat.c:849 (gdb+0xb0384c) >> #2 exit_lwp gdb/linux-nat.c:924 (gdb+0xb03c4c) >> #3 wait_lwp gdb/linux-nat.c:2224 (gdb+0xb08404) >> #4 stop_wait_callback gdb/linux-nat.c:2458 (gdb+0xb092a8) >> #5 gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const gdb/../gdbsupport/function-view.h:326 (gdb+0xb15ab0) >> #6 gdb::function_view::bind(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) gdb/../gdbsupport/function-view.h:320 (gdb+0xb15b18) >> #7 gdb::function_view::operator()(lwp_info*) const gdb/../gdbsupport/function-view.h:289 (gdb+0xb13e90) >> #8 iterate_over_lwps(ptid_t, gdb::function_view) gdb/linux-nat.c:879 (gdb+0xb03a18) >> #9 linux_nat_wait_1 gdb/linux-nat.c:3382 (gdb+0xb0c3f8) >> #10 linux_nat_target::wait(ptid_t, target_waitstatus*, enum_flags) gdb/linux-nat.c:3560 (gdb+0xb0cfc8) >> #11 thread_db_target::wait(ptid_t, target_waitstatus*, enum_flags) gdb/linux-thread-db.c:1402 (gdb+0xb35958) >> #12 target_wait(ptid_t, target_waitstatus*, enum_flags) gdb/target.c:2571 (gdb+0xfb6c34) >> #13 do_target_wait_1 gdb/infrun.c:4120 (gdb+0xa99dc4) >> #14 operator() gdb/infrun.c:4179 (gdb+0xa99f70) >> #15 do_target_wait gdb/infrun.c:4198 (gdb+0xa9a2bc) >> #16 fetch_inferior_event() gdb/infrun.c:4629 (gdb+0xa9b658) >> #17 inferior_event_handler(inferior_event_type) gdb/inf-loop.c:42 (gdb+0xa6b0c8) >> #18 handle_target_event gdb/linux-nat.c:4357 (gdb+0xb0f694) >> #19 handle_file_event gdbsupport/event-loop.cc:573 (gdb+0x1cfc03c) >> #20 gdb_wait_for_event gdbsupport/event-loop.cc:694 (gdb+0x1cfc700) >> #21 gdb_do_one_event(int) gdbsupport/event-loop.cc:217 (gdb+0x1cfa8ac) >> #22 start_event_loop gdb/main.c:408 (gdb+0xb7be9c) >> #23 captured_command_loop gdb/main.c:472 (gdb+0xb7c0cc) >> #24 captured_main gdb/main.c:1342 (gdb+0xb7e4e4) >> #25 gdb_main(captured_main_args*) gdb/main.c:1361 (gdb+0xb7e594) >> #26 main gdb/gdb.c:39 (gdb+0x423ce8) >> >> SUMMARY: ThreadSanitizer: heap-use-after-free gdb/linux-nat.c:2809 in select_event_lwp >> ... >> >> Since heap-use-after-free is essentially an address sanitizer complaint, I >> also tried building gdb with -O0 -fsanitize=address, but with this setup it >> doesn't seem to trigger (0 times out of 10). >> >> The heap-use-after-free happens during the following scenario: >> - linux_nat_wait_1 selects an LWP thread T1 with a status to report. >> - it sets variable lp to point to the corresponding lwp_info. >> - it calls stop_callback and stop_wait_callback for all threads >> (because !target_is_non_stop_p ()). >> - it calls select_event_lwp to maybe pick another thread than T1, to prevent >> starvation. >> >> The problem seems to be the following: >> - while calling stop_wait_callback for all threads, it also does this for T1. >> While doing so, the corresponding lwp_info is deleted (callstack >> stop_wait_callback -> wait_lwp -> exit_lwp -> delete_lwp), leaving variable >> lp as a dangling pointer. >> - variable lp is passed to select_event_lwp, which derefences it, which causes >> the heap-use-after-free. >> >> Note that the comment here mentions "all other LWP's": >> ... >> /* Now stop all other LWP's ... */ >> iterate_over_lwps (minus_one_ptid, stop_callback); >> /* ... and wait until all of them have reported back that >> they're no longer running. */ >> iterate_over_lwps (minus_one_ptid, stop_wait_callback); >> ... >> which presumably means other than the one in lp, but the iterators >> don't skip lp. >> >> Fix this by making the code match the comment, and skipping stop_callback and >> stop_wait_callback for lp. >> >> Tested on aarch64-linux. >> >> PR gdb/31259 >> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31259 >> --- >> gdb/linux-nat.c | 16 ++++++++++++++-- >> 1 file changed, 14 insertions(+), 2 deletions(-) >> >> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c >> index e91c57ba239..8bfae8555fc 100644 >> --- a/gdb/linux-nat.c >> +++ b/gdb/linux-nat.c >> @@ -3375,11 +3375,23 @@ linux_nat_wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus, >> if (!target_is_non_stop_p ()) >> { >> /* Now stop all other LWP's ... */ >> - iterate_over_lwps (minus_one_ptid, stop_callback); >> + for (lwp_info *other_lp : all_lwps_safe ()) >> + { >> + if (other_lp == lp) >> + continue; >> + >> + stop_callback (other_lp); >> + } >> >> /* ... and wait until all of them have reported back that >> they're no longer running. */ >> - iterate_over_lwps (minus_one_ptid, stop_wait_callback); >> + for (lwp_info *other_lp : all_lwps_safe ()) >> + { >> + if (other_lp == lp) >> + continue; >> + >> + stop_wait_callback (other_lp); >> + } > > I did a bit of archeology to see how this code evolved, and I noticed > that this change in commit 9c02b52532 ("linux-nat.c: better starvation > avoidance, handle non-stop mode too"): > > https://gitlab.com/gnutools/binutils-gdb/-/commit/9c02b52532ac?view=parallel#a360e5f37ff035d1ed6814cb60de9f2826b55788_3373_3362 > > Previously, we had: > > lp->stopped = 1; > > just before the snippet you modify. Both stop_callback and > stop_wait_callback are no-ops if `lp->stopped` is true, so that would > make them skip over the event thread. The commit removed that, so I > suppose that the problem was introduced in that commit. > Hi Simon, thanks for the review. > Ideally, Pedro should look at this, but in the mean time: > cc-ing Pedro. Thanks, - Tom > Reviewed-By: Simon Marchi > > Simon