public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Pedro Alves <pedro@palves.net>
To: Tom de Vries <tdevries@suse.de>, gdb-patches@sourceware.org
Cc: Simon Marchi <simon.marchi@polymtl.ca>
Subject: Re: [PATCH v2] [gdb] Fix heap-use-after-free in select_event_lwp
Date: Fri, 23 Feb 2024 14:33:00 +0000	[thread overview]
Message-ID: <01b587fa-6f28-4a64-baf5-9d985a0f78cc@palves.net> (raw)
In-Reply-To: <ab380bc0-46aa-44d4-b6d9-44f87f52a9d5@suse.de>

On 2024-02-22 11:43, Tom de Vries wrote:
> On 2/21/24 18:42, Pedro Alves wrote:

> 
> It does.
> 
> As mentioned in the PR, on aarch64-linux this reproduces for me 5/10 times, and with your patch 0/10 times.
> 
> Given that pretty much the entire patch is yours, do you want to proceed with this, or do you want me to integrate this into my patch with a Co-Authored-By tag?

I think it'd be good to fix up the commit log a bit, so I thought of proceeding with this.
I think it makes sense to add you as Co-Authored-By, as you wrote the initial commit
log and did investigation work.

On the commit log, here:

>  The problem seems to be the following:
>  - while calling stop_wait_callback for all threads, it also does this for T1.
>    While doing so, the corresponding lwp_info is deleted (callstack
>    stop_wait_callback -> wait_lwp -> exit_lwp -> delete_lwp), leaving variable
>    lp as a dangling pointer.
>  - variable lp is passed to select_event_lwp, which derefences it, which causes
>    the heap-use-after-free."

we are missing part of the scenario.  It must have been that we were going to
report an exit event for T1, because we know that the bug is that we missed setting
its lwp->stopped flag from within stop_wait_callback -> wait_lwp.  So that miss must
have happened earlier than what is described above, while reporting a previous event
for another thread.  I can see no other way that this scenario could trigger.

Also, took me a bit to realize, but here:

>  The heap-use-after-free happens during the following scenario:
>  - linux_nat_wait_1 selects an LWP thread T1 with a status to report.

I think you are saying that linux_nat_wait_1 selected a previously pending
event, not that it pulled an event out of the kernel.

With that, it makes a lot more sense to me.

So, here's the same patch but now with an adjusted commit log, with that
missing info added.  WDYT?

From 96f7005bf56ce57a8dfebcb48885342c5f9e237c Mon Sep 17 00:00:00 2001
From: Pedro Alves <pedro@palves.net>
Date: Wed, 21 Feb 2024 16:23:55 +0000
Subject: [PATCH] [gdb] Fix heap-use-after-free in select_event_lwp

PR gdb/31259 reveals one scenario where we run into a
heap-use-after-free reported by thread sanitizer, while running
gdb.base/vfork-follow-parent.exp.

The heap-use-after-free happens during the following scenario:

 - linux_nat_wait_1 is about to return an event for T2.  It stops all
   other threads, and while doing so, stop_wait_callback -> wait_lwp
   sees T1 exit, and decides to leave the exit event pending.  It
   should have set the lp->stopped flag too, but does not -- this is
   the bug.

 - The event for T2 is reported, is processed by infrun, and we're
   back at linux_nat_wait_1.

 - linux_nat_wait_1 selects LWP T1 with the pending exit status to
   report.

 - it sets variable lp to point to the corresponding lwp_info.

 - it calls stop_callback and stop_wait_callback for all threads
   (because !target_is_non_stop_p ()).

 - it calls select_event_lwp to maybe pick another thread than T1, to
   prevent starvation.

The problem is the following:

 - while calling stop_wait_callback for all threads, it also does this
   for T1.  While doing so, the corresponding lwp_info is deleted
   (callstack stop_wait_callback -> wait_lwp -> exit_lwp ->
   delete_lwp), leaving variable lp as a dangling pointer.

 - variable lp is passed to select_event_lwp, which derefences it,
   which causes the heap-use-after-free.

Note that the comment here mentions "all other LWP's":
...
      /* Now stop all other LWP's ...  */
      iterate_over_lwps (minus_one_ptid, stop_callback);
      /* ... and wait until all of them have reported back that
        they're no longer running.  */
      iterate_over_lwps (minus_one_ptid, stop_wait_callback);
...

The reason the comments say "all other LWP's", and doesn't bother
filtering out LP is that lp->stopped should be true at this point, and
the callbacks (both stop_callback and stop_wait_callback) check that
flag, and do nothing if set.  I.e., they skip already-stopped threads,
so they should skip LP.

In this particular scenario, though, we missed setting the stopped
flag right in the first step described above, so LP was iterated over
incorrectly.

The fix is to make wait_lwp set the lp->stopped flag when it decides
to leave the exit event pending.  However, going a bit further,
GDBserver has a mark_lwp_dead function to centralize setting up
various lwp flags such that the rest of the code doesn't mishandle
them, and it seems like a good idea to do a similar thing in GDB as
well.  That is what this patch does.

PR gdb/31259
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31259
Co-Authored-By: Tom de Vries <tdevries@suse.de>
Change-Id: I4a6169976f89bf714c478cbb2b7d4c32365e62a9
---
 gdb/linux-nat.c | 34 +++++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index e91c57ba239..a9984eb3221 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -2123,6 +2123,27 @@ wait_for_signal ()
     }
 }
 
+/* Mark LWP dead, with STATUS as exit status pending to report
+   later.  */
+
+static void
+mark_lwp_dead (lwp_info *lp, int status)
+{
+  /* Store the exit status lp->waitstatus, because lp->status would be
+     ambiguous (W_EXITCODE(0,0) == 0).  */
+  lp->waitstatus = host_status_to_waitstatus (status);
+
+  /* If we're processing LP's status, there should be no other event
+     already recorded as pending.  */
+  gdb_assert (lp->status == 0);
+
+  /* Dead LWPs aren't expected to report a pending sigstop.  */
+  lp->signalled = 0;
+
+  /* Prevent trying to stop it.  */
+  lp->stopped = 1;
+}
+
 /* Wait for LP to stop.  Returns the wait status, or 0 if the LWP has
    exited.  */
 
@@ -2207,9 +2228,8 @@ wait_lwp (struct lwp_info *lp)
 
 	      /* If this is the leader exiting, it means the whole
 		 process is gone.  Store the status to report to the
-		 core.  Store it in lp->waitstatus, because lp->status
-		 would be ambiguous (W_EXITCODE(0,0) == 0).  */
-	      lp->waitstatus = host_status_to_waitstatus (status);
+		 core.  */
+	      mark_lwp_dead (lp, status);
 	      return 0;
 	    }
 
@@ -3013,12 +3033,7 @@ linux_nat_filter_event (int lwpid, int status)
       linux_nat_debug_printf ("LWP %ld exited (resumed=%d)",
 			      lp->ptid.lwp (), lp->resumed);
 
-      /* Dead LWP's aren't expected to reported a pending sigstop.  */
-      lp->signalled = 0;
-
-      /* Store the pending event in the waitstatus, because
-	 W_EXITCODE(0,0) == 0.  */
-      lp->waitstatus = host_status_to_waitstatus (status);
+      mark_lwp_dead (lp, status);
       return;
     }
 
@@ -3368,6 +3383,7 @@ linux_nat_wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
     }
 
   gdb_assert (lp);
+  gdb_assert (lp->stopped);
 
   status = lp->status;
   lp->status = 0;

base-commit: e346d50a89106a52fa1545db5eade2a25a4932f0
-- 
2.43.2



  reply	other threads:[~2024-02-23 14:33 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-23 11:48 Tom de Vries
2024-01-23 16:08 ` Simon Marchi
2024-01-23 17:52   ` Tom de Vries
2024-02-09 15:46 ` Pedro Alves
2024-02-19 15:04   ` Tom de Vries
2024-02-21 17:42     ` Pedro Alves
2024-02-22 11:43       ` Tom de Vries
2024-02-23 14:33         ` Pedro Alves [this message]
2024-02-26 14:23           ` Tom de Vries
2024-02-26 15:28             ` Pedro Alves

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=01b587fa-6f28-4a64-baf5-9d985a0f78cc@palves.net \
    --to=pedro@palves.net \
    --cc=gdb-patches@sourceware.org \
    --cc=simon.marchi@polymtl.ca \
    --cc=tdevries@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).