public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Tom de Vries <tdevries@suse.de>
To: Pedro Alves <pedro@palves.net>, gdb-patches@sourceware.org
Cc: Simon Marchi <simon.marchi@polymtl.ca>
Subject: Re: [PATCH v2] [gdb] Fix heap-use-after-free in select_event_lwp
Date: Mon, 26 Feb 2024 15:23:08 +0100	[thread overview]
Message-ID: <8d111966-a623-4c84-86bf-d48ff4efbeb9@suse.de> (raw)
In-Reply-To: <01b587fa-6f28-4a64-baf5-9d985a0f78cc@palves.net>

On 2/23/24 15:33, Pedro Alves wrote:
> On 2024-02-22 11:43, Tom de Vries wrote:
>> On 2/21/24 18:42, Pedro Alves wrote:
> 
>>
>> It does.
>>
>> As mentioned in the PR, on aarch64-linux this reproduces for me 5/10 times, and with your patch 0/10 times.
>>
>> Given that pretty much the entire patch is yours, do you want to proceed with this, or do you want me to integrate this into my patch with a Co-Authored-By tag?
> 
> I think it'd be good to fix up the commit log a bit, so I thought of proceeding with this.

Thanks.

> I think it makes sense to add you as Co-Authored-By, as you wrote the initial commit
> log and did investigation work.
> 
> On the commit log, here:
> 
>>   The problem seems to be the following:
>>   - while calling stop_wait_callback for all threads, it also does this for T1.
>>     While doing so, the corresponding lwp_info is deleted (callstack
>>     stop_wait_callback -> wait_lwp -> exit_lwp -> delete_lwp), leaving variable
>>     lp as a dangling pointer.
>>   - variable lp is passed to select_event_lwp, which derefences it, which causes
>>     the heap-use-after-free."
> 
> we are missing part of the scenario.  It must have been that we were going to
> report an exit event for T1, because we know that the bug is that we missed setting
> its lwp->stopped flag from within stop_wait_callback -> wait_lwp.  So that miss must
> have happened earlier than what is described above, while reporting a previous event
> for another thread.  I can see no other way that this scenario could trigger.
> 
> Also, took me a bit to realize, but here:
> 
>>   The heap-use-after-free happens during the following scenario:
>>   - linux_nat_wait_1 selects an LWP thread T1 with a status to report.
> 
> I think you are saying that linux_nat_wait_1 selected a previously pending
> event, not that it pulled an event out of the kernel.
> 
> With that, it makes a lot more sense to me.
> 
> So, here's the same patch but now with an adjusted commit log, with that
> missing info added.  WDYT?
> 

Hi Pedro,

I've verified that the scenario is as you describe it.

LGTM.

Thanks,
- Tom

>  From 96f7005bf56ce57a8dfebcb48885342c5f9e237c Mon Sep 17 00:00:00 2001
> From: Pedro Alves <pedro@palves.net>
> Date: Wed, 21 Feb 2024 16:23:55 +0000
> Subject: [PATCH] [gdb] Fix heap-use-after-free in select_event_lwp
> 
> PR gdb/31259 reveals one scenario where we run into a
> heap-use-after-free reported by thread sanitizer, while running
> gdb.base/vfork-follow-parent.exp.
> 
> The heap-use-after-free happens during the following scenario:
> 
>   - linux_nat_wait_1 is about to return an event for T2.  It stops all
>     other threads, and while doing so, stop_wait_callback -> wait_lwp
>     sees T1 exit, and decides to leave the exit event pending.  It
>     should have set the lp->stopped flag too, but does not -- this is
>     the bug.
> 
>   - The event for T2 is reported, is processed by infrun, and we're
>     back at linux_nat_wait_1.
> 
>   - linux_nat_wait_1 selects LWP T1 with the pending exit status to
>     report.
> 
>   - it sets variable lp to point to the corresponding lwp_info.
> 
>   - it calls stop_callback and stop_wait_callback for all threads
>     (because !target_is_non_stop_p ()).
> 
>   - it calls select_event_lwp to maybe pick another thread than T1, to
>     prevent starvation.
> 
> The problem is the following:
> 
>   - while calling stop_wait_callback for all threads, it also does this
>     for T1.  While doing so, the corresponding lwp_info is deleted
>     (callstack stop_wait_callback -> wait_lwp -> exit_lwp ->
>     delete_lwp), leaving variable lp as a dangling pointer.
> 
>   - variable lp is passed to select_event_lwp, which derefences it,
>     which causes the heap-use-after-free.
> 
> Note that the comment here mentions "all other LWP's":
> ...
>        /* Now stop all other LWP's ...  */
>        iterate_over_lwps (minus_one_ptid, stop_callback);
>        /* ... and wait until all of them have reported back that
>          they're no longer running.  */
>        iterate_over_lwps (minus_one_ptid, stop_wait_callback);
> ...
> 
> The reason the comments say "all other LWP's", and doesn't bother
> filtering out LP is that lp->stopped should be true at this point, and
> the callbacks (both stop_callback and stop_wait_callback) check that
> flag, and do nothing if set.  I.e., they skip already-stopped threads,
> so they should skip LP.
> 
> In this particular scenario, though, we missed setting the stopped
> flag right in the first step described above, so LP was iterated over
> incorrectly.
> 
> The fix is to make wait_lwp set the lp->stopped flag when it decides
> to leave the exit event pending.  However, going a bit further,
> GDBserver has a mark_lwp_dead function to centralize setting up
> various lwp flags such that the rest of the code doesn't mishandle
> them, and it seems like a good idea to do a similar thing in GDB as
> well.  That is what this patch does.
> 
> PR gdb/31259
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31259
> Co-Authored-By: Tom de Vries <tdevries@suse.de>
> Change-Id: I4a6169976f89bf714c478cbb2b7d4c32365e62a9
> ---
>   gdb/linux-nat.c | 34 +++++++++++++++++++++++++---------
>   1 file changed, 25 insertions(+), 9 deletions(-)
> 
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index e91c57ba239..a9984eb3221 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -2123,6 +2123,27 @@ wait_for_signal ()
>       }
>   }
>   
> +/* Mark LWP dead, with STATUS as exit status pending to report
> +   later.  */
> +
> +static void
> +mark_lwp_dead (lwp_info *lp, int status)
> +{
> +  /* Store the exit status lp->waitstatus, because lp->status would be
> +     ambiguous (W_EXITCODE(0,0) == 0).  */
> +  lp->waitstatus = host_status_to_waitstatus (status);
> +
> +  /* If we're processing LP's status, there should be no other event
> +     already recorded as pending.  */
> +  gdb_assert (lp->status == 0);
> +
> +  /* Dead LWPs aren't expected to report a pending sigstop.  */
> +  lp->signalled = 0;
> +
> +  /* Prevent trying to stop it.  */
> +  lp->stopped = 1;
> +}
> +
>   /* Wait for LP to stop.  Returns the wait status, or 0 if the LWP has
>      exited.  */
>   
> @@ -2207,9 +2228,8 @@ wait_lwp (struct lwp_info *lp)
>   
>   	      /* If this is the leader exiting, it means the whole
>   		 process is gone.  Store the status to report to the
> -		 core.  Store it in lp->waitstatus, because lp->status
> -		 would be ambiguous (W_EXITCODE(0,0) == 0).  */
> -	      lp->waitstatus = host_status_to_waitstatus (status);
> +		 core.  */
> +	      mark_lwp_dead (lp, status);
>   	      return 0;
>   	    }
>   
> @@ -3013,12 +3033,7 @@ linux_nat_filter_event (int lwpid, int status)
>         linux_nat_debug_printf ("LWP %ld exited (resumed=%d)",
>   			      lp->ptid.lwp (), lp->resumed);
>   
> -      /* Dead LWP's aren't expected to reported a pending sigstop.  */
> -      lp->signalled = 0;
> -
> -      /* Store the pending event in the waitstatus, because
> -	 W_EXITCODE(0,0) == 0.  */
> -      lp->waitstatus = host_status_to_waitstatus (status);
> +      mark_lwp_dead (lp, status);
>         return;
>       }
>   
> @@ -3368,6 +3383,7 @@ linux_nat_wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
>       }
>   
>     gdb_assert (lp);
> +  gdb_assert (lp->stopped);
>   
>     status = lp->status;
>     lp->status = 0;
> 
> base-commit: e346d50a89106a52fa1545db5eade2a25a4932f0


  reply	other threads:[~2024-02-26 14:23 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-23 11:48 Tom de Vries
2024-01-23 16:08 ` Simon Marchi
2024-01-23 17:52   ` Tom de Vries
2024-02-09 15:46 ` Pedro Alves
2024-02-19 15:04   ` Tom de Vries
2024-02-21 17:42     ` Pedro Alves
2024-02-22 11:43       ` Tom de Vries
2024-02-23 14:33         ` Pedro Alves
2024-02-26 14:23           ` Tom de Vries [this message]
2024-02-26 15:28             ` Pedro Alves

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8d111966-a623-4c84-86bf-d48ff4efbeb9@suse.de \
    --to=tdevries@suse.de \
    --cc=gdb-patches@sourceware.org \
    --cc=pedro@palves.net \
    --cc=simon.marchi@polymtl.ca \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).