From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm1-f45.google.com (mail-wm1-f45.google.com [209.85.128.45]) by sourceware.org (Postfix) with ESMTPS id 3D43F3858C50 for ; Fri, 23 Feb 2024 14:33:04 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 3D43F3858C50 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=palves.net Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 3D43F3858C50 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=209.85.128.45 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1708698786; cv=none; b=IYE63bjKzrq0wPqGZBKf7G0gh+dWRLX/6Spev6rIg3x+AvrQs83fz1INyrEc5MWTVykupl8FYtr5WS7tivHS3Fqzu97lJ+8HZ4EqqQBHiXwni0vA2p8VNFMUCTcIQU9Jtyu5ib48MHzPuhLbWOK7bF8OTLAUymIgEA078zDt5io= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1708698786; c=relaxed/simple; bh=5zE/BTRkqbqf3uljGchnBi6v6oT0iA5lrBkb1c/wPC8=; h=Message-ID:Date:MIME-Version:Subject:To:From; b=KzffiUpxkAxgWAwwaabrPiL85q6Gkqesn9ESzKF16QYEZ6+gjxJ++tB2aF8Ufrcqd1NWY4Si+CpQjLnZKxTYyuG/Uc2zNVVvLoJ002OfStIpNkxsImF7+aLsvpmSZdlSpDnzdYJCzhZvCoAsYCCDOXZ2D4IdZjUQNETG4YIlbaw= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-wm1-f45.google.com with SMTP id 5b1f17b1804b1-4127190ad83so6086965e9.0 for ; Fri, 23 Feb 2024 06:33:04 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708698783; x=1709303583; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ufA4cUy8FvUfRLGgH/4IFaK+SWFww08hQwuKt4YG1Oc=; b=rrgN+WiXB47xJ+EYq5oDcCbIJcHWi8kh/Hs39WWCJMKsofJPc+uvbkFcR/7/IMCYuX JSoS8hvnvMtW1GLKzum6PwmHWbwAnVDboaGxwapSQz5jvOCwM6jOUzmfHR18ufLx8Njs 0//zYmYgzl4EMPV8bn4zwdO8ipVXB9p5SXudHqdsL8ytJu/Jrs2q7kTclp1+nG3VaS+t y536Hi7XhqrzptNs6K5lU9XOftSkkHI6EEjcHmVr4zWxlF9+iGt4x6iCe9oxaLtF7f30 EhsPp2H20fAA+sexkpuvXXYnqTPYjvK3vIYKqcJxIA/u8Lg90+3GgK2tp5fBDrX1a/wC cD9A== X-Forwarded-Encrypted: i=1; AJvYcCXOCMgPBj1/hVvqktOe7zZvHyTQzL4QZYCMH1GcI8tnb97JQEozqYU9b50zvKkXfcPfEkMB4FJJ79IN3dXBhs0v9ofAQvb2fYoDVQ== X-Gm-Message-State: AOJu0YxLmzF0JfBZsEHq7UlT1xfB7rFNWX1bWoOT6w1m+9PeBfuhe4jw V9lreRfVdI7OaOsqjmf34Bx4oT+UXZw99ZB8KYB9AWQeafTK7krU X-Google-Smtp-Source: AGHT+IG4LtZWjDxG7eUKy1e+yEUJhdRhylgrouk9+X2PQ9qfLU6qNNebB7VQ2756xUUkYAryl7N5gw== X-Received: by 2002:a05:600c:46d4:b0:412:748b:7197 with SMTP id q20-20020a05600c46d400b00412748b7197mr1965wmo.24.1708698782596; Fri, 23 Feb 2024 06:33:02 -0800 (PST) Received: from ?IPV6:2001:8a0:f918:ab00:ac50:2e85:9ec5:a3f0? ([2001:8a0:f918:ab00:ac50:2e85:9ec5:a3f0]) by smtp.gmail.com with ESMTPSA id p29-20020a05600c1d9d00b00411a6ce0f99sm2652936wms.24.2024.02.23.06.33.01 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Fri, 23 Feb 2024 06:33:02 -0800 (PST) Message-ID: <01b587fa-6f28-4a64-baf5-9d985a0f78cc@palves.net> Date: Fri, 23 Feb 2024 14:33:00 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2] [gdb] Fix heap-use-after-free in select_event_lwp Content-Language: en-US To: Tom de Vries , gdb-patches@sourceware.org Cc: Simon Marchi References: <20240123114830.20253-1-tdevries@suse.de> <830ab71f-8968-4ab0-b8e7-8a2884169d4c@palves.net> <17059615-cba2-4ff2-a8dc-c6799930406f@palves.net> From: Pedro Alves In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-11.0 required=5.0 tests=BAYES_00,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,GIT_PATCH_0,HEADER_FROM_DIFFERENT_DOMAINS,KAM_DMARC_STATUS,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 2024-02-22 11:43, Tom de Vries wrote: > On 2/21/24 18:42, Pedro Alves wrote: > > It does. > > As mentioned in the PR, on aarch64-linux this reproduces for me 5/10 times, and with your patch 0/10 times. > > Given that pretty much the entire patch is yours, do you want to proceed with this, or do you want me to integrate this into my patch with a Co-Authored-By tag? I think it'd be good to fix up the commit log a bit, so I thought of proceeding with this. I think it makes sense to add you as Co-Authored-By, as you wrote the initial commit log and did investigation work. On the commit log, here: > The problem seems to be the following: > - while calling stop_wait_callback for all threads, it also does this for T1. > While doing so, the corresponding lwp_info is deleted (callstack > stop_wait_callback -> wait_lwp -> exit_lwp -> delete_lwp), leaving variable > lp as a dangling pointer. > - variable lp is passed to select_event_lwp, which derefences it, which causes > the heap-use-after-free." we are missing part of the scenario. It must have been that we were going to report an exit event for T1, because we know that the bug is that we missed setting its lwp->stopped flag from within stop_wait_callback -> wait_lwp. So that miss must have happened earlier than what is described above, while reporting a previous event for another thread. I can see no other way that this scenario could trigger. Also, took me a bit to realize, but here: > The heap-use-after-free happens during the following scenario: > - linux_nat_wait_1 selects an LWP thread T1 with a status to report. I think you are saying that linux_nat_wait_1 selected a previously pending event, not that it pulled an event out of the kernel. With that, it makes a lot more sense to me. So, here's the same patch but now with an adjusted commit log, with that missing info added. WDYT? >From 96f7005bf56ce57a8dfebcb48885342c5f9e237c Mon Sep 17 00:00:00 2001 From: Pedro Alves Date: Wed, 21 Feb 2024 16:23:55 +0000 Subject: [PATCH] [gdb] Fix heap-use-after-free in select_event_lwp PR gdb/31259 reveals one scenario where we run into a heap-use-after-free reported by thread sanitizer, while running gdb.base/vfork-follow-parent.exp. The heap-use-after-free happens during the following scenario: - linux_nat_wait_1 is about to return an event for T2. It stops all other threads, and while doing so, stop_wait_callback -> wait_lwp sees T1 exit, and decides to leave the exit event pending. It should have set the lp->stopped flag too, but does not -- this is the bug. - The event for T2 is reported, is processed by infrun, and we're back at linux_nat_wait_1. - linux_nat_wait_1 selects LWP T1 with the pending exit status to report. - it sets variable lp to point to the corresponding lwp_info. - it calls stop_callback and stop_wait_callback for all threads (because !target_is_non_stop_p ()). - it calls select_event_lwp to maybe pick another thread than T1, to prevent starvation. The problem is the following: - while calling stop_wait_callback for all threads, it also does this for T1. While doing so, the corresponding lwp_info is deleted (callstack stop_wait_callback -> wait_lwp -> exit_lwp -> delete_lwp), leaving variable lp as a dangling pointer. - variable lp is passed to select_event_lwp, which derefences it, which causes the heap-use-after-free. Note that the comment here mentions "all other LWP's": ... /* Now stop all other LWP's ... */ iterate_over_lwps (minus_one_ptid, stop_callback); /* ... and wait until all of them have reported back that they're no longer running. */ iterate_over_lwps (minus_one_ptid, stop_wait_callback); ... The reason the comments say "all other LWP's", and doesn't bother filtering out LP is that lp->stopped should be true at this point, and the callbacks (both stop_callback and stop_wait_callback) check that flag, and do nothing if set. I.e., they skip already-stopped threads, so they should skip LP. In this particular scenario, though, we missed setting the stopped flag right in the first step described above, so LP was iterated over incorrectly. The fix is to make wait_lwp set the lp->stopped flag when it decides to leave the exit event pending. However, going a bit further, GDBserver has a mark_lwp_dead function to centralize setting up various lwp flags such that the rest of the code doesn't mishandle them, and it seems like a good idea to do a similar thing in GDB as well. That is what this patch does. PR gdb/31259 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31259 Co-Authored-By: Tom de Vries Change-Id: I4a6169976f89bf714c478cbb2b7d4c32365e62a9 --- gdb/linux-nat.c | 34 +++++++++++++++++++++++++--------- 1 file changed, 25 insertions(+), 9 deletions(-) diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c index e91c57ba239..a9984eb3221 100644 --- a/gdb/linux-nat.c +++ b/gdb/linux-nat.c @@ -2123,6 +2123,27 @@ wait_for_signal () } } +/* Mark LWP dead, with STATUS as exit status pending to report + later. */ + +static void +mark_lwp_dead (lwp_info *lp, int status) +{ + /* Store the exit status lp->waitstatus, because lp->status would be + ambiguous (W_EXITCODE(0,0) == 0). */ + lp->waitstatus = host_status_to_waitstatus (status); + + /* If we're processing LP's status, there should be no other event + already recorded as pending. */ + gdb_assert (lp->status == 0); + + /* Dead LWPs aren't expected to report a pending sigstop. */ + lp->signalled = 0; + + /* Prevent trying to stop it. */ + lp->stopped = 1; +} + /* Wait for LP to stop. Returns the wait status, or 0 if the LWP has exited. */ @@ -2207,9 +2228,8 @@ wait_lwp (struct lwp_info *lp) /* If this is the leader exiting, it means the whole process is gone. Store the status to report to the - core. Store it in lp->waitstatus, because lp->status - would be ambiguous (W_EXITCODE(0,0) == 0). */ - lp->waitstatus = host_status_to_waitstatus (status); + core. */ + mark_lwp_dead (lp, status); return 0; } @@ -3013,12 +3033,7 @@ linux_nat_filter_event (int lwpid, int status) linux_nat_debug_printf ("LWP %ld exited (resumed=%d)", lp->ptid.lwp (), lp->resumed); - /* Dead LWP's aren't expected to reported a pending sigstop. */ - lp->signalled = 0; - - /* Store the pending event in the waitstatus, because - W_EXITCODE(0,0) == 0. */ - lp->waitstatus = host_status_to_waitstatus (status); + mark_lwp_dead (lp, status); return; } @@ -3368,6 +3383,7 @@ linux_nat_wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus, } gdb_assert (lp); + gdb_assert (lp->stopped); status = lp->status; lp->status = 0; base-commit: e346d50a89106a52fa1545db5eade2a25a4932f0 -- 2.43.2