public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] [gdb] Fix heap-use-after-free in select_event_lwp
@ 2024-01-22 13:34 Tom de Vries
  2024-01-22 20:43 ` Simon Marchi
  0 siblings, 1 reply; 3+ messages in thread
From: Tom de Vries @ 2024-01-22 13:34 UTC (permalink / raw)
  To: gdb-patches

When building gdb with -O0 -fsanitize=thread, and running test-case
gdb.base/vfork-follow-parent.exp, 5 times out of 10 I run into:
...
WARNING: ThreadSanitizer: heap-use-after-free (pid=249653)
  Write of size 4 at 0xffffee83055c by main thread:
    #0 select_event_lwp gdb/linux-nat.c:2809 (gdb+0xb0a65c)
    #1 linux_nat_wait_1 gdb/linux-nat.c:3389 (gdb+0xb0c470)
    #2 linux_nat_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/linux-nat.c:3560 (gdb+0xb0cfc8)
    #3 thread_db_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/linux-thread-db.c:1402 (gdb+0xb35958)
    #4 target_wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/target.c:2571 (gdb+0xfb6c34)
    #5 do_target_wait_1 gdb/infrun.c:4120 (gdb+0xa99dc4)
    #6 operator() gdb/infrun.c:4179 (gdb+0xa99f70)
    #7 do_target_wait gdb/infrun.c:4198 (gdb+0xa9a2bc)
    #8 fetch_inferior_event() gdb/infrun.c:4629 (gdb+0xa9b658)
    #9 inferior_event_handler(inferior_event_type) gdb/inf-loop.c:42 (gdb+0xa6b0c8)
    #10 handle_target_event gdb/linux-nat.c:4357 (gdb+0xb0f694)
    #11 handle_file_event gdbsupport/event-loop.cc:573 (gdb+0x1cfc03c)
    #12 gdb_wait_for_event gdbsupport/event-loop.cc:694 (gdb+0x1cfc700)
    #13 gdb_do_one_event(int) gdbsupport/event-loop.cc:217 (gdb+0x1cfa8ac)
    #14 start_event_loop gdb/main.c:408 (gdb+0xb7be9c)
    #15 captured_command_loop gdb/main.c:472 (gdb+0xb7c0cc)
    #16 captured_main gdb/main.c:1342 (gdb+0xb7e4e4)
    #17 gdb_main(captured_main_args*) gdb/main.c:1361 (gdb+0xb7e594)
    #18 main gdb/gdb.c:39 (gdb+0x423ce8)

  Previous write of size 8 at 0xffffee830558 by main thread:
    #0 operator delete(void*, unsigned long) <null> (libtsan.so.2+0x8fb14)
    #1 delete_lwp gdb/linux-nat.c:849 (gdb+0xb0384c)
    #2 exit_lwp gdb/linux-nat.c:924 (gdb+0xb03c4c)
    #3 wait_lwp gdb/linux-nat.c:2224 (gdb+0xb08404)
    #4 stop_wait_callback gdb/linux-nat.c:2458 (gdb+0xb092a8)
    #5 gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const gdb/../gdbsupport/function-view.h:326 (gdb+0xb15ab0)
    #6 gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) gdb/../gdbsupport/function-view.h:320 (gdb+0xb15b18)
    #7 gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const gdb/../gdbsupport/function-view.h:289 (gdb+0xb13e90)
    #8 iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) gdb/linux-nat.c:879 (gdb+0xb03a18)
    #9 linux_nat_wait_1 gdb/linux-nat.c:3382 (gdb+0xb0c3f8)
    #10 linux_nat_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/linux-nat.c:3560 (gdb+0xb0cfc8)
    #11 thread_db_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/linux-thread-db.c:1402 (gdb+0xb35958)
    #12 target_wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/target.c:2571 (gdb+0xfb6c34)
    #13 do_target_wait_1 gdb/infrun.c:4120 (gdb+0xa99dc4)
    #14 operator() gdb/infrun.c:4179 (gdb+0xa99f70)
    #15 do_target_wait gdb/infrun.c:4198 (gdb+0xa9a2bc)
    #16 fetch_inferior_event() gdb/infrun.c:4629 (gdb+0xa9b658)
    #17 inferior_event_handler(inferior_event_type) gdb/inf-loop.c:42 (gdb+0xa6b0c8)
    #18 handle_target_event gdb/linux-nat.c:4357 (gdb+0xb0f694)
    #19 handle_file_event gdbsupport/event-loop.cc:573 (gdb+0x1cfc03c)
    #20 gdb_wait_for_event gdbsupport/event-loop.cc:694 (gdb+0x1cfc700)
    #21 gdb_do_one_event(int) gdbsupport/event-loop.cc:217 (gdb+0x1cfa8ac)
    #22 start_event_loop gdb/main.c:408 (gdb+0xb7be9c)
    #23 captured_command_loop gdb/main.c:472 (gdb+0xb7c0cc)
    #24 captured_main gdb/main.c:1342 (gdb+0xb7e4e4)
    #25 gdb_main(captured_main_args*) gdb/main.c:1361 (gdb+0xb7e594)
    #26 main gdb/gdb.c:39 (gdb+0x423ce8)

SUMMARY: ThreadSanitizer: heap-use-after-free gdb/linux-nat.c:2809 in select_event_lwp
...

Since heap-use-after-free is essentially an address sanitizer complaint, I
also tried building gdb with -O0 -fsanitize=address, but with this setup it
doesn't seem to trigger (0 times out of 10).

The heap-use-after-free happens during the following scenario:
- linux_nat_wait_1 selects an LWP thread T1 with a status to report.
- it sets variable lp to point to the corresponding lwp_info.
- it calls stop_callback and stop_wait_callback for all threads
  (because !target_is_non_stop_p ()).
- it calls select_event_lwp to maybe pick another thread than T1, to prevent
  starvation.

The problem seems to be the following:
- while calling stop_wait_callback for all threads, it also does this for T1.
  While doing so, the corresponding lwp_info is deleted (callstack
  stop_wait_callback -> wait_lwp -> exit_lwp -> delete_lwp), leaving variable
  lp as a dangling pointer.
- variable lp is passed to select_event_lwp, which derefences it, which causes
  the heap-use-after-free.

Note that the comment here mentions "all other LWP's":
...
      /* Now stop all other LWP's ...  */
      iterate_over_lwps (minus_one_ptid, stop_callback);
      /* ... and wait until all of them have reported back that
        they're no longer running.  */
      iterate_over_lwps (minus_one_ptid, stop_wait_callback);
...
which presumably means other than the one in lp, but the iterators
don't skip lp.

Fix this by making the code match the comment, and skipping stop_callback and
stop_wait_callback for lp.

Tested on aarch64-linux.

PR gdb/31259
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31259
---
 gdb/linux-nat.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index e91c57ba239..41f56935284 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -3375,11 +3375,23 @@ linux_nat_wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
   if (!target_is_non_stop_p ())
     {
       /* Now stop all other LWP's ...  */
-      iterate_over_lwps (minus_one_ptid, stop_callback);
+      iterate_over_lwps (minus_one_ptid,
+			 [&] (struct lwp_info *info)
+			 {
+			   if (info == lp)
+			     return 0;
+			   return stop_callback (info);
+			 });
 
       /* ... and wait until all of them have reported back that
 	 they're no longer running.  */
-      iterate_over_lwps (minus_one_ptid, stop_wait_callback);
+      iterate_over_lwps (minus_one_ptid,
+			 [&] (struct lwp_info *info)
+			 {
+			   if (info == lp)
+			     return 0;
+			   return stop_wait_callback (info);
+			 });
     }
 
   /* If we're not waiting for a specific LWP, choose an event LWP from

base-commit: c2625a463ffd8c0d10b85b65e80ab8b67b28e441
-- 
2.35.3


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] [gdb] Fix heap-use-after-free in select_event_lwp
  2024-01-22 13:34 [PATCH] [gdb] Fix heap-use-after-free in select_event_lwp Tom de Vries
@ 2024-01-22 20:43 ` Simon Marchi
  2024-01-23 11:51   ` Tom de Vries
  0 siblings, 1 reply; 3+ messages in thread
From: Simon Marchi @ 2024-01-22 20:43 UTC (permalink / raw)
  To: Tom de Vries, gdb-patches



On 2024-01-22 08:34, Tom de Vries wrote:
> When building gdb with -O0 -fsanitize=thread, and running test-case
> gdb.base/vfork-follow-parent.exp, 5 times out of 10 I run into:
> ...
> WARNING: ThreadSanitizer: heap-use-after-free (pid=249653)
>   Write of size 4 at 0xffffee83055c by main thread:
>     #0 select_event_lwp gdb/linux-nat.c:2809 (gdb+0xb0a65c)
>     #1 linux_nat_wait_1 gdb/linux-nat.c:3389 (gdb+0xb0c470)
>     #2 linux_nat_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/linux-nat.c:3560 (gdb+0xb0cfc8)
>     #3 thread_db_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/linux-thread-db.c:1402 (gdb+0xb35958)
>     #4 target_wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/target.c:2571 (gdb+0xfb6c34)
>     #5 do_target_wait_1 gdb/infrun.c:4120 (gdb+0xa99dc4)
>     #6 operator() gdb/infrun.c:4179 (gdb+0xa99f70)
>     #7 do_target_wait gdb/infrun.c:4198 (gdb+0xa9a2bc)
>     #8 fetch_inferior_event() gdb/infrun.c:4629 (gdb+0xa9b658)
>     #9 inferior_event_handler(inferior_event_type) gdb/inf-loop.c:42 (gdb+0xa6b0c8)
>     #10 handle_target_event gdb/linux-nat.c:4357 (gdb+0xb0f694)
>     #11 handle_file_event gdbsupport/event-loop.cc:573 (gdb+0x1cfc03c)
>     #12 gdb_wait_for_event gdbsupport/event-loop.cc:694 (gdb+0x1cfc700)
>     #13 gdb_do_one_event(int) gdbsupport/event-loop.cc:217 (gdb+0x1cfa8ac)
>     #14 start_event_loop gdb/main.c:408 (gdb+0xb7be9c)
>     #15 captured_command_loop gdb/main.c:472 (gdb+0xb7c0cc)
>     #16 captured_main gdb/main.c:1342 (gdb+0xb7e4e4)
>     #17 gdb_main(captured_main_args*) gdb/main.c:1361 (gdb+0xb7e594)
>     #18 main gdb/gdb.c:39 (gdb+0x423ce8)
> 
>   Previous write of size 8 at 0xffffee830558 by main thread:
>     #0 operator delete(void*, unsigned long) <null> (libtsan.so.2+0x8fb14)
>     #1 delete_lwp gdb/linux-nat.c:849 (gdb+0xb0384c)
>     #2 exit_lwp gdb/linux-nat.c:924 (gdb+0xb03c4c)
>     #3 wait_lwp gdb/linux-nat.c:2224 (gdb+0xb08404)
>     #4 stop_wait_callback gdb/linux-nat.c:2458 (gdb+0xb092a8)
>     #5 gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const gdb/../gdbsupport/function-view.h:326 (gdb+0xb15ab0)
>     #6 gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) gdb/../gdbsupport/function-view.h:320 (gdb+0xb15b18)
>     #7 gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const gdb/../gdbsupport/function-view.h:289 (gdb+0xb13e90)
>     #8 iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) gdb/linux-nat.c:879 (gdb+0xb03a18)
>     #9 linux_nat_wait_1 gdb/linux-nat.c:3382 (gdb+0xb0c3f8)
>     #10 linux_nat_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/linux-nat.c:3560 (gdb+0xb0cfc8)
>     #11 thread_db_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/linux-thread-db.c:1402 (gdb+0xb35958)
>     #12 target_wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/target.c:2571 (gdb+0xfb6c34)
>     #13 do_target_wait_1 gdb/infrun.c:4120 (gdb+0xa99dc4)
>     #14 operator() gdb/infrun.c:4179 (gdb+0xa99f70)
>     #15 do_target_wait gdb/infrun.c:4198 (gdb+0xa9a2bc)
>     #16 fetch_inferior_event() gdb/infrun.c:4629 (gdb+0xa9b658)
>     #17 inferior_event_handler(inferior_event_type) gdb/inf-loop.c:42 (gdb+0xa6b0c8)
>     #18 handle_target_event gdb/linux-nat.c:4357 (gdb+0xb0f694)
>     #19 handle_file_event gdbsupport/event-loop.cc:573 (gdb+0x1cfc03c)
>     #20 gdb_wait_for_event gdbsupport/event-loop.cc:694 (gdb+0x1cfc700)
>     #21 gdb_do_one_event(int) gdbsupport/event-loop.cc:217 (gdb+0x1cfa8ac)
>     #22 start_event_loop gdb/main.c:408 (gdb+0xb7be9c)
>     #23 captured_command_loop gdb/main.c:472 (gdb+0xb7c0cc)
>     #24 captured_main gdb/main.c:1342 (gdb+0xb7e4e4)
>     #25 gdb_main(captured_main_args*) gdb/main.c:1361 (gdb+0xb7e594)
>     #26 main gdb/gdb.c:39 (gdb+0x423ce8)
> 
> SUMMARY: ThreadSanitizer: heap-use-after-free gdb/linux-nat.c:2809 in select_event_lwp
> ...
> 
> Since heap-use-after-free is essentially an address sanitizer complaint, I
> also tried building gdb with -O0 -fsanitize=address, but with this setup it
> doesn't seem to trigger (0 times out of 10).
> 
> The heap-use-after-free happens during the following scenario:
> - linux_nat_wait_1 selects an LWP thread T1 with a status to report.
> - it sets variable lp to point to the corresponding lwp_info.
> - it calls stop_callback and stop_wait_callback for all threads
>   (because !target_is_non_stop_p ()).
> - it calls select_event_lwp to maybe pick another thread than T1, to prevent
>   starvation.
> 
> The problem seems to be the following:
> - while calling stop_wait_callback for all threads, it also does this for T1.
>   While doing so, the corresponding lwp_info is deleted (callstack
>   stop_wait_callback -> wait_lwp -> exit_lwp -> delete_lwp), leaving variable
>   lp as a dangling pointer.
> - variable lp is passed to select_event_lwp, which derefences it, which causes
>   the heap-use-after-free.
> 
> Note that the comment here mentions "all other LWP's":
> ...
>       /* Now stop all other LWP's ...  */
>       iterate_over_lwps (minus_one_ptid, stop_callback);
>       /* ... and wait until all of them have reported back that
>         they're no longer running.  */
>       iterate_over_lwps (minus_one_ptid, stop_wait_callback);
> ...
> which presumably means other than the one in lp, but the iterators
> don't skip lp.
> 
> Fix this by making the code match the comment, and skipping stop_callback and
> stop_wait_callback for lp

It's not totally clear to me in which part of the test this happens.
Is it when doing the "continue to end of inferior 2" test, in which case
the thread T1 in your example is the thread of inferior 2 reporting an
exit?

> Tested on aarch64-linux.
> 
> PR gdb/31259
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31259
> ---
>  gdb/linux-nat.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
> index e91c57ba239..41f56935284 100644
> --- a/gdb/linux-nat.c
> +++ b/gdb/linux-nat.c
> @@ -3375,11 +3375,23 @@ linux_nat_wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
>    if (!target_is_non_stop_p ())
>      {
>        /* Now stop all other LWP's ...  */
> -      iterate_over_lwps (minus_one_ptid, stop_callback);
> +      iterate_over_lwps (minus_one_ptid,
> +			 [&] (struct lwp_info *info)
> +			 {
> +			   if (info == lp)
> +			     return 0;
> +			   return stop_callback (info);
> +			 });
>  
>        /* ... and wait until all of them have reported back that
>  	 they're no longer running.  */
> -      iterate_over_lwps (minus_one_ptid, stop_wait_callback);
> +      iterate_over_lwps (minus_one_ptid,
> +			 [&] (struct lwp_info *info)
> +			 {
> +			   if (info == lp)
> +			     return 0;
> +			   return stop_wait_callback (info);
> +			 });
>      }
>  
>    /* If we're not waiting for a specific LWP, choose an event LWP from

The code changes make sense to me.  There's no need to stop the event
thread, it's presumably already stopped.

I think the code would be clearer if you wrote a for loop (I think we
should phase out iterate_over_lwps):

  for (thread_info *to_stop : all_lwps_safe ())
    {
      if (lp == to_stop)
        continue;

      stop_callback (to_stop);
    }


stop_callback becomes a little bit misnamed here, but cleaning that up
would be a bigger change.

Simon

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] [gdb] Fix heap-use-after-free in select_event_lwp
  2024-01-22 20:43 ` Simon Marchi
@ 2024-01-23 11:51   ` Tom de Vries
  0 siblings, 0 replies; 3+ messages in thread
From: Tom de Vries @ 2024-01-23 11:51 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches

On 1/22/24 21:43, Simon Marchi wrote:
> 
> 
> On 2024-01-22 08:34, Tom de Vries wrote:
>> When building gdb with -O0 -fsanitize=thread, and running test-case
>> gdb.base/vfork-follow-parent.exp, 5 times out of 10 I run into:
>> ...
>> WARNING: ThreadSanitizer: heap-use-after-free (pid=249653)
>>    Write of size 4 at 0xffffee83055c by main thread:
>>      #0 select_event_lwp gdb/linux-nat.c:2809 (gdb+0xb0a65c)
>>      #1 linux_nat_wait_1 gdb/linux-nat.c:3389 (gdb+0xb0c470)
>>      #2 linux_nat_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/linux-nat.c:3560 (gdb+0xb0cfc8)
>>      #3 thread_db_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/linux-thread-db.c:1402 (gdb+0xb35958)
>>      #4 target_wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/target.c:2571 (gdb+0xfb6c34)
>>      #5 do_target_wait_1 gdb/infrun.c:4120 (gdb+0xa99dc4)
>>      #6 operator() gdb/infrun.c:4179 (gdb+0xa99f70)
>>      #7 do_target_wait gdb/infrun.c:4198 (gdb+0xa9a2bc)
>>      #8 fetch_inferior_event() gdb/infrun.c:4629 (gdb+0xa9b658)
>>      #9 inferior_event_handler(inferior_event_type) gdb/inf-loop.c:42 (gdb+0xa6b0c8)
>>      #10 handle_target_event gdb/linux-nat.c:4357 (gdb+0xb0f694)
>>      #11 handle_file_event gdbsupport/event-loop.cc:573 (gdb+0x1cfc03c)
>>      #12 gdb_wait_for_event gdbsupport/event-loop.cc:694 (gdb+0x1cfc700)
>>      #13 gdb_do_one_event(int) gdbsupport/event-loop.cc:217 (gdb+0x1cfa8ac)
>>      #14 start_event_loop gdb/main.c:408 (gdb+0xb7be9c)
>>      #15 captured_command_loop gdb/main.c:472 (gdb+0xb7c0cc)
>>      #16 captured_main gdb/main.c:1342 (gdb+0xb7e4e4)
>>      #17 gdb_main(captured_main_args*) gdb/main.c:1361 (gdb+0xb7e594)
>>      #18 main gdb/gdb.c:39 (gdb+0x423ce8)
>>
>>    Previous write of size 8 at 0xffffee830558 by main thread:
>>      #0 operator delete(void*, unsigned long) <null> (libtsan.so.2+0x8fb14)
>>      #1 delete_lwp gdb/linux-nat.c:849 (gdb+0xb0384c)
>>      #2 exit_lwp gdb/linux-nat.c:924 (gdb+0xb03c4c)
>>      #3 wait_lwp gdb/linux-nat.c:2224 (gdb+0xb08404)
>>      #4 stop_wait_callback gdb/linux-nat.c:2458 (gdb+0xb092a8)
>>      #5 gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const gdb/../gdbsupport/function-view.h:326 (gdb+0xb15ab0)
>>      #6 gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) gdb/../gdbsupport/function-view.h:320 (gdb+0xb15b18)
>>      #7 gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const gdb/../gdbsupport/function-view.h:289 (gdb+0xb13e90)
>>      #8 iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) gdb/linux-nat.c:879 (gdb+0xb03a18)
>>      #9 linux_nat_wait_1 gdb/linux-nat.c:3382 (gdb+0xb0c3f8)
>>      #10 linux_nat_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/linux-nat.c:3560 (gdb+0xb0cfc8)
>>      #11 thread_db_target::wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/linux-thread-db.c:1402 (gdb+0xb35958)
>>      #12 target_wait(ptid_t, target_waitstatus*, enum_flags<target_wait_flag>) gdb/target.c:2571 (gdb+0xfb6c34)
>>      #13 do_target_wait_1 gdb/infrun.c:4120 (gdb+0xa99dc4)
>>      #14 operator() gdb/infrun.c:4179 (gdb+0xa99f70)
>>      #15 do_target_wait gdb/infrun.c:4198 (gdb+0xa9a2bc)
>>      #16 fetch_inferior_event() gdb/infrun.c:4629 (gdb+0xa9b658)
>>      #17 inferior_event_handler(inferior_event_type) gdb/inf-loop.c:42 (gdb+0xa6b0c8)
>>      #18 handle_target_event gdb/linux-nat.c:4357 (gdb+0xb0f694)
>>      #19 handle_file_event gdbsupport/event-loop.cc:573 (gdb+0x1cfc03c)
>>      #20 gdb_wait_for_event gdbsupport/event-loop.cc:694 (gdb+0x1cfc700)
>>      #21 gdb_do_one_event(int) gdbsupport/event-loop.cc:217 (gdb+0x1cfa8ac)
>>      #22 start_event_loop gdb/main.c:408 (gdb+0xb7be9c)
>>      #23 captured_command_loop gdb/main.c:472 (gdb+0xb7c0cc)
>>      #24 captured_main gdb/main.c:1342 (gdb+0xb7e4e4)
>>      #25 gdb_main(captured_main_args*) gdb/main.c:1361 (gdb+0xb7e594)
>>      #26 main gdb/gdb.c:39 (gdb+0x423ce8)
>>
>> SUMMARY: ThreadSanitizer: heap-use-after-free gdb/linux-nat.c:2809 in select_event_lwp
>> ...
>>
>> Since heap-use-after-free is essentially an address sanitizer complaint, I
>> also tried building gdb with -O0 -fsanitize=address, but with this setup it
>> doesn't seem to trigger (0 times out of 10).
>>
>> The heap-use-after-free happens during the following scenario:
>> - linux_nat_wait_1 selects an LWP thread T1 with a status to report.
>> - it sets variable lp to point to the corresponding lwp_info.
>> - it calls stop_callback and stop_wait_callback for all threads
>>    (because !target_is_non_stop_p ()).
>> - it calls select_event_lwp to maybe pick another thread than T1, to prevent
>>    starvation.
>>
>> The problem seems to be the following:
>> - while calling stop_wait_callback for all threads, it also does this for T1.
>>    While doing so, the corresponding lwp_info is deleted (callstack
>>    stop_wait_callback -> wait_lwp -> exit_lwp -> delete_lwp), leaving variable
>>    lp as a dangling pointer.
>> - variable lp is passed to select_event_lwp, which derefences it, which causes
>>    the heap-use-after-free.
>>
>> Note that the comment here mentions "all other LWP's":
>> ...
>>        /* Now stop all other LWP's ...  */
>>        iterate_over_lwps (minus_one_ptid, stop_callback);
>>        /* ... and wait until all of them have reported back that
>>          they're no longer running.  */
>>        iterate_over_lwps (minus_one_ptid, stop_wait_callback);
>> ...
>> which presumably means other than the one in lp, but the iterators
>> don't skip lp.
>>
>> Fix this by making the code match the comment, and skipping stop_callback and
>> stop_wait_callback for lp
> 
> It's not totally clear to me in which part of the test this happens.
> Is it when doing the "continue to end of inferior 2" test, in which case
> the thread T1 in your example is the thread of inferior 2 reporting an
> exit?
> 

The context is (copied from the PR):
...
(gdb) PASS: gdb.base/vfork-follow-parent.exp: 
exec_file=vfork-follow-parent-exec: target-non-stop=off: non-stop=off: 
resolution_method=schedule-multiple: set schedule-multiple on
continue
Continuing.
[New inferior 2 (process 600810)]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
process 600810 is executing new program: 
/home/vries/gdb/build/gdb/testsuite/outputs/gdb.base/vfork-follow-parent/vforked-prog
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Thread 0xfffff7fb4020 (LWP 600810) exited]
==================
WARNING: ThreadSanitizer: heap-use-after-free (pid=600786)
   ...
SUMMARY: ThreadSanitizer: heap-use-after-free 
/home/vries/gdb/src/gdb/linux-nat.c:2809 in select_event_lwp
==================
FAIL: gdb.base/vfork-follow-parent.exp: 
exec_file=vfork-follow-parent-exec: target-non-stop=off: non-stop=off: 
resolution_method=schedule-multiple: continue to end of inferior 2 (timeout)
...

So, I'd say yes.

>> Tested on aarch64-linux.
>>
>> PR gdb/31259
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31259
>> ---
>>   gdb/linux-nat.c | 16 ++++++++++++++--
>>   1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
>> index e91c57ba239..41f56935284 100644
>> --- a/gdb/linux-nat.c
>> +++ b/gdb/linux-nat.c
>> @@ -3375,11 +3375,23 @@ linux_nat_wait_1 (ptid_t ptid, struct target_waitstatus *ourstatus,
>>     if (!target_is_non_stop_p ())
>>       {
>>         /* Now stop all other LWP's ...  */
>> -      iterate_over_lwps (minus_one_ptid, stop_callback);
>> +      iterate_over_lwps (minus_one_ptid,
>> +			 [&] (struct lwp_info *info)
>> +			 {
>> +			   if (info == lp)
>> +			     return 0;
>> +			   return stop_callback (info);
>> +			 });
>>   
>>         /* ... and wait until all of them have reported back that
>>   	 they're no longer running.  */
>> -      iterate_over_lwps (minus_one_ptid, stop_wait_callback);
>> +      iterate_over_lwps (minus_one_ptid,
>> +			 [&] (struct lwp_info *info)
>> +			 {
>> +			   if (info == lp)
>> +			     return 0;
>> +			   return stop_wait_callback (info);
>> +			 });
>>       }
>>   
>>     /* If we're not waiting for a specific LWP, choose an event LWP from
> 
> The code changes make sense to me.  There's no need to stop the event
> thread, it's presumably already stopped.
> 
> I think the code would be clearer if you wrote a for loop (I think we
> should phase out iterate_over_lwps):
> 
>    for (thread_info *to_stop : all_lwps_safe ())
>      {
>        if (lp == to_stop)
>          continue;
> 
>        stop_callback (to_stop);
>      }
> 
> 
> stop_callback becomes a little bit misnamed here, but cleaning that up
> would be a bigger change.

Ack, I've updated that in the v2 ( 
https://sourceware.org/pipermail/gdb-patches/2024-January/206171.html ). 
  Though I'm using iteration variable "other_lp" instead of "to_stop", 
copying what I found elsewhere in the same file.

Thanks,
- Tom


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2024-01-23 11:50 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-22 13:34 [PATCH] [gdb] Fix heap-use-after-free in select_event_lwp Tom de Vries
2024-01-22 20:43 ` Simon Marchi
2024-01-23 11:51   ` Tom de Vries

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).