public inbox for gdb-patches@sourceware.org
From: Simon Marchi <simon.marchi@polymtl.ca>
To: Tom de Vries <tdevries@suse.de>, gdb-patches@sourceware.org
Subject: Re: [PATCH v2] [gdb] Fix heap-use-after-free in select_event_lwp
Date: Tue, 23 Jan 2024 11:08:19 -0500
Message-ID: <e1f9f082-032c-4d16-a1d1-f23a250e2128@polymtl.ca>
In-Reply-To: <20240123114830.20253-1-tdevries@suse.de>



On 2024-01-23 06:48, Tom de Vries wrote:
> When building gdb with -O0 -fsanitize=thread, and running test-case
> gdb.base/vfork-follow-parent.exp, 5 times out of 10 I run into:
> ...
> WARNING: ThreadSanitizer: heap-use-after-free (pid=249653)
>   Write of size 4 at 0xffffee83055c by main thread:
>     #0 select_event_lwp gdb/linux-nat.c:2809 (gdb+0xb0a65c)
>     #1 linux_nat_wait_1 gdb/linux-nat.c:3389 (gdb+0xb0c470)
>     #2 linux_nat_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/linux-nat.c:3560 (gdb+0xb0cfc8)
>     #3 thread_db_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/linux-thread-db.c:1402 (gdb+0xb35958)
>     #4 target_wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/target.c:2571 (gdb+0xfb6c34)
>     #5 do_target_wait_1 gdb/infrun.c:4120 (gdb+0xa99dc4)
>     #6 operator() gdb/infrun.c:4179 (gdb+0xa99f70)
>     #7 do_target_wait gdb/infrun.c:4198 (gdb+0xa9a2bc)
>     #8 fetch_inferior_event() gdb/infrun.c:4629 (gdb+0xa9b658)
>     #9 inferior_event_handler(inferior_event_type) gdb/inf-loop.c:42 (gdb+0xa6b0c8)
>     #10 handle_target_event gdb/linux-nat.c:4357 (gdb+0xb0f694)
>     #11 handle_file_event gdbsupport/event-loop.cc:573 (gdb+0x1cfc03c)
>     #12 gdb_wait_for_event gdbsupport/event-loop.cc:694 (gdb+0x1cfc700)
>     #13 gdb_do_one_event(int) gdbsupport/event-loop.cc:217 (gdb+0x1cfa8ac)
>     #14 start_event_loop gdb/main.c:408 (gdb+0xb7be9c)
>     #15 captured_command_loop gdb/main.c:472 (gdb+0xb7c0cc)
>     #16 captured_main gdb/main.c:1342 (gdb+0xb7e4e4)
>     #17 gdb_main(captured_main_args*) gdb/main.c:1361 (gdb+0xb7e594)
>     #18 main gdb/gdb.c:39 (gdb+0x423ce8)
> 
>   Previous write of size 8 at 0xffffee830558 by main thread:
>     #0 operator delete(void*, unsigned long) <null> (libtsan.so.2+0x8fb14)
>     #1 delete_lwp gdb/linux-nat.c:849 (gdb+0xb0384c)
>     #2 exit_lwp gdb/linux-nat.c:924 (gdb+0xb03c4c)
>     #3 wait_lwp gdb/linux-nat.c:2224 (gdb+0xb08404)
>     #4 stop_wait_callback gdb/linux-nat.c:2458 (gdb+0xb092a8)
>     #5 gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const gdb/../gdbsupport/function-view.h:326 (gdb+0xb15ab0)
>     #6 gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) gdb/../gdbsupport/function-view.h:320 (gdb+0xb15b18)
>     #7 gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const gdb/../gdbsupport/function-view.h:289 (gdb+0xb13e90)
>     #8 iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) gdb/linux-nat.c:879 (gdb+0xb03a18)
>     #9 linux_nat_wait_1 gdb/linux-nat.c:3382 (gdb+0xb0c3f8)
>     #10 linux_nat_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/linux-nat.c:3560 (gdb+0xb0cfc8)
>     #11 thread_db_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/linux-thread-db.c:1402 (gdb+0xb35958)
>     #12 target_wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/target.c:2571 (gdb+0xfb6c34)
>     #13 do_target_wait_1 gdb/infrun.c:4120 (gdb+0xa99dc4)
>     #14 operator() gdb/infrun.c:4179 (gdb+0xa99f70)
>     #15 do_target_wait gdb/infrun.c:4198 (gdb+0xa9a2bc)
>     #16 fetch_inferior_event() gdb/infrun.c:4629 (gdb+0xa9b658)
>     #17 inferior_event_handler(inferior_event_type) gdb/inf-loop.c:42 (gdb+0xa6b0c8)
>     #18 handle_target_event gdb/linux-nat.c:4357 (gdb+0xb0f694)
>     #19 handle_file_event gdbsupport/event-loop.cc:573 (gdb+0x1cfc03c)
>     #20 gdb_wait_for_event gdbsupport/event-loop.cc:694 (gdb+0x1cfc700)
>     #21 gdb_do_one_event(int) gdbsupport/event-loop.cc:217 (gdb+0x1cfa8ac)
>     #22 start_event_loop gdb/main.c:408 (gdb+0xb7be9c)
>     #23 captured_command_loop gdb/main.c:472 (gdb+0xb7c0cc)
>     #24 captured_main gdb/main.c:1342 (gdb+0xb7e4e4)
>     #25 gdb_main(captured_main_args*) gdb/main.c:1361 (gdb+0xb7e594)
>     #26 main gdb/gdb.c:39 (gdb+0x423ce8)
> 
> SUMMARY: ThreadSanitizer: heap-use-after-free gdb/linux-nat.c:2809 in select_event_lwp
> ...
> 
> Since heap-use-after-free is essentially an address sanitizer complaint, I
> also tried building gdb with -O0 -fsanitize=address, but with this setup it
> doesn't seem to trigger (0 times out of 10).
> 
> The heap-use-after-free happens during the following scenario:
> - linux_nat_wait_1 selects an LWP thread T1 with a status to report.
> - it sets variable lp to point to the corresponding lwp_info.
> - it calls stop_callback and stop_wait_callback for all threads
>   (because !target_is_non_stop_p ()).
> - it calls select_event_lwp to maybe pick another thread than T1, to prevent
>   starvation.
> 
> The problem seems to be the following:
> - while calling stop_wait_callback for all threads, it also does this for T1.
>   While doing so, the corresponding lwp_info is deleted (callstack
>   stop_wait_callback -> wait_lwp -> exit_lwp -> delete_lwp), leaving variable
>   lp as a dangling pointer.
> - variable lp is passed to select_event_lwp, which dereferences it, which causes
>   the heap-use-after-free.
> 
> Note that the comment here mentions "all other LWP's":
> ...
>       /* Now stop all other LWP's ...  */
>       iterate_over_lwps (minus_one_ptid, stop_callback);
>       /* ... and wait until all of them have reported back that
>         they're no longer running.  */
>       iterate_over_lwps (minus_one_ptid, stop_wait_callback);
> ...
> which presumably means other than the one in lp, but the iterators
> don't skip lp.
> 
> Fix this by making the code match the comment, and skipping stop_callback and
> stop_wait_callback for lp.
> 
> Tested on aarch64-linux.
> 
> PR gdb/31259
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31259
> ---
>  gdb/linux-nat.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index e91c57ba239..8bfae8555fc 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -3375,11 +3375,23 @@ linux_nat_wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
>    if (!target_is_non_stop_p ())
>      {
>        /* Now stop all other LWP's ...  */
> -      iterate_over_lwps (minus_one_ptid, stop_callback);
> +      for (lwp_info *other_lp : all_lwps_safe ())
> +	{
> +	  if (other_lp == lp)
> +	    continue;
> +
> +	  stop_callback (other_lp);
> +	}
>  
>        /* ... and wait until all of them have reported back that
>  	 they're no longer running.  */
> -      iterate_over_lwps (minus_one_ptid, stop_wait_callback);
> +      for (lwp_info *other_lp : all_lwps_safe ())
> +	{
> +	  if (other_lp == lp)
> +	    continue;
> +
> +	  stop_wait_callback (other_lp);
> +	}

I did a bit of archeology to see how this code evolved, and I noticed
this change in commit 9c02b52532 ("linux-nat.c: better starvation
avoidance, handle non-stop mode too"):

https://gitlab.com/gnutools/binutils-gdb/-/commit/9c02b52532ac?view=parallel#a360e5f37ff035d1ed6814cb60de9f2826b55788_3373_3362

Previously, we had:

  lp->stopped = 1;

just before the snippet you modify.  Both stop_callback and
stop_wait_callback are no-ops if `lp->stopped` is true, so that
assignment made them skip over the event thread.  The commit removed
the assignment, so I suppose that the problem was introduced there.
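
To make the mechanism concrete, here is a small stand-alone sketch.  It
is not gdb code: toy_lwp, toy_stop_wait, toy_stop_all and
event_lwp_survives are hypothetical names, and the "exited" flag is
only an assumed stand-in for wait_lwp finding out that the LWP is gone.
It shows that marking the event LWP as stopped beforehand makes the
callback skip it, while leaving the flag clear lets the iteration
delete it.

```cpp
#include <iterator>
#include <list>
#include <memory>
#include <utility>

// Hypothetical stand-in for gdb's lwp_info; only what the sketch needs.
struct toy_lwp
{
  int id = 0;
  bool stopped = false;
  bool exited = false;   // Assumption: models "the LWP is already gone".
};

using lwp_list = std::list<std::unique_ptr<toy_lwp>>;

// Simplified model of stop_wait_callback: a no-op if the LWP is already
// marked stopped; otherwise "wait" for it and delete it if it has exited.
// Returns true if the LWP was deleted.
static bool
toy_stop_wait (lwp_list &lwps, lwp_list::iterator it)
{
  if ((*it)->stopped)
    return false;

  if ((*it)->exited)
    {
      lwps.erase (it);
      return true;
    }

  (*it)->stopped = true;
  return false;
}

// Visit every LWP, fetching the next iterator first so that erasing the
// current element is safe for the loop itself.
static int
toy_stop_all (lwp_list &lwps)
{
  int deleted = 0;
  for (auto it = lwps.begin (); it != lwps.end (); )
    {
      auto next = std::next (it);
      if (toy_stop_wait (lwps, it))
	++deleted;
      it = next;
    }
  return deleted;
}

// Build three LWPs, treat LWP 1 as the event LWP that has exited, and
// report whether it is still in the list after stopping everything.
static bool
event_lwp_survives (bool mark_event_stopped)
{
  lwp_list lwps;
  for (int id = 1; id <= 3; ++id)
    {
      auto lwp = std::make_unique<toy_lwp> ();
      lwp->id = id;
      lwps.push_back (std::move (lwp));
    }

  toy_lwp *event = lwps.front ().get ();
  event->exited = true;
  if (mark_event_stopped)
    event->stopped = true;   // What the old code did before iterating.

  toy_stop_all (lwps);

  for (const auto &lwp : lwps)
    if (lwp->id == 1)
      return true;
  return false;
}
```

In this model, event_lwp_survives (true) keeps LWP 1 alive, and
event_lwp_survives (false) deletes it, which is where a saved pointer
to it would become dangling.  The sketch checks for the LWP by id
afterwards rather than dereferencing the saved pointer, so it never
touches freed memory itself.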

Ideally, Pedro should look at this, but in the meantime:

Reviewed-By: Simon Marchi <simon.marchi@efficios.com>

Simon

