public inbox for gdb@sourceware.org
From: Bill Messmer <wmessmer@microsoft.com>
To: Simon Marchi <simark@simark.ca>,
	"gdb@sourceware.org" <gdb@sourceware.org>
Subject: RE: [EXTERNAL] Re: Issues With Thread Events In User Mode GDBServer
Date: Fri, 30 Sep 2022 21:08:47 +0000
Message-ID: <MN2PR21MB1439C53BE37D447BF27B34C9C4569@MN2PR21MB1439.namprd21.prod.outlook.com>
In-Reply-To: <054cd411-885d-443e-d357-1c517315dbcb@simark.ca>

Simon,

Apologies for the delay in response.  I finally had a bit of time to debug through gdbserver while trying to get all of this working on my side...

linux_process_target::wait_1 does *NOT* call stop_all_lwps at all (even in full stop mode) if the event is a termination event.  The relevant block is the large

      if (WIFEXITED (w) || WIFSIGNALED (w))
        {

        ...

          if (ourstatus->kind () == TARGET_WAITKIND_EXITED)
            return filter_exit_event (event_child, ourstatus);

          return ptid_of (current_thread);
        }

I went and tweaked this to:

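      if (WIFEXITED (w) || WIFSIGNALED (w))
        {

        ...

          /* Pick the ptid to report first, so that the remaining LWPs
             can be stopped before returning.  */
          ptid_t result;

          if (ourstatus->kind () == TARGET_WAITKIND_EXITED)
            result = filter_exit_event (event_child, ourstatus);
          else
            result = ptid_of (current_thread);

          /* In all-stop mode, every stop reply must leave all threads
             stopped, including replies for termination events.  */
          if (!non_stop)
            stop_all_lwps (0, NULL);

          return result;
        }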

With the tests I have, things appear to largely work as I'd expect after making these changes.  Again -- I have little familiarity with GDBServer, so I don't know if I'm missing something here.

If this seems reasonably correct to you -- I'm happy to submit a patch.

Sincerely,

Bill Messmer
wmessmer@microsoft.com

-----Original Message-----
From: Simon Marchi <simark@simark.ca> 
Sent: Tuesday, September 13, 2022 4:39 PM
To: Bill Messmer <wmessmer@microsoft.com>; gdb@sourceware.org
Subject: Re: [EXTERNAL] Re: Issues With Thread Events In User Mode GDBServer

> The GDBServer then segfaults when the first thread exits.  GDB itself shows that the gdbserver faulted at:
>
>     Program received signal SIGSEGV, Segmentation fault.
>     resume (actions=actions@entry=0x55e85605f590, num_actions=num_actions@entry=1) at /build/gdb-wIRHdd/gdb-12.0.90/gdbserver/server.cc:2966
>     2966    /build/gdb-wIRHdd/gdb-12.0.90/gdbserver/server.cc: No such file or directory.
>     (gdb) bt
>     #0  resume (actions=actions@entry=0x55e85605f590, num_actions=num_actions@entry=1)
>         at /build/gdb-wIRHdd/gdb-12.0.90/gdbserver/server.cc:2966
>     #1  0x000055e854c61020 in handle_v_cont (own_buf=0x55e85604aed0 "vCont;c")
>         at /build/gdb-wIRHdd/gdb-12.0.90/gdbserver/server.cc:2910
>     #2  handle_v_requests (own_buf=0x55e85604aed0 "vCont;c", packet_len=<optimized out>,
>         new_packet_len=<optimized out>) at /build/gdb-wIRHdd/gdb-12.0.90/gdbserver/server.cc:3177
>     #3  0x000055e854c6299e in process_serial_event ()
>         at /build/gdb-wIRHdd/gdb-12.0.90/gdbserver/server.cc:4523
>     #4  handle_serial_event (err=<optimized out>, client_data=<optimized out>)
>         at /build/gdb-wIRHdd/gdb-12.0.90/gdbserver/server.cc:4555
>     #5  0x000055e854c994b6 in gdb_wait_for_event (block=block@entry=1)
>         at /build/gdb-wIRHdd/gdb-12.0.90/gdbsupport/event-loop.cc:700
>     #6  0x000055e854c9994b in gdb_wait_for_event (block=1)
>         at /build/gdb-wIRHdd/gdb-12.0.90/gdbsupport/event-loop.cc:596
>     #7  gdb_do_one_event () at /build/gdb-wIRHdd/gdb-12.0.90/gdbsupport/event-loop.cc:237
>     #8  0x000055e854c50872 in start_event_loop ()
>         at /build/gdb-wIRHdd/gdb-12.0.90/gdbserver/server.cc:3553
>     #9  captured_main (argv=<optimized out>, argc=<optimized out>)
>         at /build/gdb-wIRHdd/gdb-12.0.90/gdbserver/server.cc:4033
>     #10 main (argc=<optimized out>, argv=<optimized out>)
>         at /build/gdb-wIRHdd/gdb-12.0.90/gdbserver/server.cc:4119

Thanks for the detailed report.

A bit of background: the only time GDB ever requests thread events from GDBserver is in non-stop mode, when it wants to stop all threads.  It is the case described in the QThreadEvents documentation:

   For example, this is used in non-stop mode when GDB stops a set of
   threads and synchronously waits for their corresponding stop
   replies. Without exit events, if one of the threads exits, GDB would
   hang forever not knowing that it should no longer expect a stop for
   that same thread.

By using QThreadEvents in all-stop mode, you likely trigger some different code path (not a reason for GDBserver to crash, of course).

I think I was able to reproduce the crash using GDB, with this simple patch that enables thread events all the time, just like you do:

diff --git a/gdb/remote.c b/gdb/remote.c
index 70f918a7362c..700e2c2b929f 100644
--- a/gdb/remote.c
+++ b/gdb/remote.c
@@ -4776,6 +4776,8 @@ remote_target::start_remote_1 (int from_tty, int extended_p)
   if (packet_support (PACKET_QAllow) != PACKET_DISABLE)
     set_permissions ();

+  this->thread_events (1);
+
   /* gdbserver < 7.7 (before its fix from 2013-12-11) did reply to any
      unknown 'v' packet with string "OK".  "OK" gets interpreted by GDB
      as a reply to known packet.  For packet "vFile:setfs:" it is an

Using a test program similar to yours:

  $ ./gdb -nx -q --data-directory=data-directory a.out -ex "tar rem :1234" -ex c

... leads to gdbserver crashing, the backtrace looks just like yours:

==33707==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000030 (pc 0x55d6edb7df07 bp 0x7fff852bc360 sp 0x7fff852bc350 T0)
==33707==The signal is caused by a READ memory access.
==33707==Hint: address points to the zero page.
    #0 0x55d6edb7df07 in target_waitstatus::reset() /home/smarchi/src/binutils-gdb/gdbserver/../gdb/target/waitstatus.h:400
    #1 0x55d6edbc6519 in target_waitstatus::operator=(target_waitstatus const&) /home/smarchi/src/binutils-gdb/gdbserver/../gdb/target/waitstatus.h:187
    #2 0x55d6edbb6bab in resume /home/smarchi/src/binutils-gdb/gdbserver/server.cc:2931
    #3 0x55d6edbb6523 in handle_v_cont /home/smarchi/src/binutils-gdb/gdbserver/server.cc:2875
    #4 0x55d6edbb8129 in handle_v_requests(char*, int, int*) /home/smarchi/src/binutils-gdb/gdbserver/server.cc:3138
    #5 0x55d6edbc1844 in process_serial_event /home/smarchi/src/binutils-gdb/gdbserver/server.cc:4484
    #6 0x55d6edbc1a9b in handle_serial_event(int, void*) /home/smarchi/src/binutils-gdb/gdbserver/server.cc:4516
    #7 0x55d6edcdcef1 in handle_file_event /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:574
    #8 0x55d6edcdd82d in gdb_wait_for_event /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:695
    #9 0x55d6edcdb4f8 in gdb_do_one_event(int) /home/smarchi/src/binutils-gdb/gdbsupport/event-loop.cc:265
    #10 0x55d6edbba12b in start_event_loop /home/smarchi/src/binutils-gdb/gdbserver/server.cc:3514
    #11 0x55d6edbbde10 in captured_main /home/smarchi/src/binutils-gdb/gdbserver/server.cc:3994
    #12 0x55d6edbbe4b8 in main /home/smarchi/src/binutils-gdb/gdbserver/server.cc:4080
    #13 0x7ff01623c28f  (/usr/lib/libc.so.6+0x2328f)
    #14 0x7ff01623c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
    #15 0x55d6edb59ec4 in _start ../sysdeps/x86_64/start.S:115


> So I went into *resume* and added the "cs.last_status.kind () != TARGET_WAITKIND_THREAD_EXITED" check to the code below, as the "current_thread->last_status" reference is the source of the segfault:
>
>       if (cs.last_status.kind () != TARGET_WAITKIND_EXITED
>           && cs.last_status.kind () != TARGET_WAITKIND_SIGNALLED
>           && cs.last_status.kind () != TARGET_WAITKIND_NO_RESUMED
>           && cs.last_status.kind () != TARGET_WAITKIND_THREAD_EXITED)
>         current_thread->last_status = cs.last_status;

I think that makes sense: once linux-low.cc has reported TARGET_WAITKIND_THREAD_EXITED, it has already deleted that thread_info, so current_thread will have been set to nullptr by the time resume dereferences it.
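
To make the reasoning concrete, here is a minimal standalone sketch of that guard. It is not gdbserver code: wait_kind, thread_model, event_has_no_thread and record_last_status are hypothetical stand-ins I made up for illustration, and only the set of event kinds mirrors the condition quoted above.

```cpp
#include <cassert>

// Hypothetical stand-in for the target_waitkind values named in the thread.
enum class wait_kind { stopped, exited, signalled, no_resumed, thread_exited };

// Hypothetical stand-in for a thread_info that remembers its last status.
struct thread_model
{
  wait_kind last_status = wait_kind::stopped;
};

// True for events after which there may be no live thread to store the
// status on (the process or the reporting thread is gone).
bool
event_has_no_thread (wait_kind kind)
{
  return (kind == wait_kind::exited
          || kind == wait_kind::signalled
          || kind == wait_kind::no_resumed
          || kind == wait_kind::thread_exited);
}

// Store KIND on CURRENT only when CURRENT is expected to be valid.
// Returns true if the status was stored.
bool
record_last_status (thread_model *current, wait_kind kind)
{
  if (event_has_no_thread (kind))
    return false;  // CURRENT may be nullptr here; do not touch it.

  assert (current != nullptr);
  current->last_status = kind;
  return true;
}
```

With this shape, calling record_last_status (nullptr, wait_kind::thread_exited) returns false instead of dereferencing a null pointer, which is the behaviour the extra condition is meant to give.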

> After making this change, the server no longer crashes at the first 
> thread exit, but instead, I get a packet that is
>
>     w0;2635
>
> Here's the problem though.  When I receive the various "T05create;..." packets, the debuggee process is frozen.  There's a bunch of printf's in my test app...  and nothing happens until I issue the vCont back to the server.  On receipt of the w0;2635 packet, however, the process just keeps going...

That is a bug, from what I understand.  In all-stop, the target should stop all threads whenever it returns any stop reply.  This should be done by the "low" target, linux-low.cc.  Off-hand I don't understand why this call to stop_all_lwps in linux_process_target::wait_1 doesn't stop the threads in that situation:
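
As a rough illustration of the invariant I mean (a toy model only, not the linux-low.cc implementation; lwp_model, target_model and report_thread_exit are hypothetical names, and the stop_all_lwps method here just flips a flag):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of one LWP: just an id and a running flag.
struct lwp_model
{
  int tid;
  bool running;
};

// Hypothetical model of a target that tracks its LWPs.
struct target_model
{
  std::vector<lwp_model> lwps;
  bool non_stop = false;

  // Mark every remaining LWP as stopped.
  void
  stop_all_lwps ()
  {
    for (lwp_model &lwp : lwps)
      lwp.running = false;
  }

  // Report that thread TID exited.  The exiting LWP is forgotten, and in
  // all-stop mode every other LWP is stopped before the event is returned.
  int
  report_thread_exit (int tid)
  {
    auto it = std::find_if (lwps.begin (), lwps.end (),
                            [tid] (const lwp_model &lwp)
                            { return lwp.tid == tid; });
    assert (it != lwps.end ());
    lwps.erase (it);

    if (!non_stop)
      stop_all_lwps ();

    return tid;
  }
};
```

In this model, an all-stop target that reports a thread exit leaves no LWP running, while a non-stop target leaves the others alone.  The behaviour described in the report corresponds to skipping the stop step on the exit path.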

  https://gitlab.com/gnutools/binutils-gdb/-/blob/e9a241e87b42f902d0408704df6bbcd8bf465a46/gdbserver/linux-low.cc#L3463

> I suspect that's a bug in the gdbserver (I'm no expert here in either gdbserver or its code).  That's the first question...  and the second is whether there's some other way that thread creations and exits get detected other than QThreadEvents:1 (as this doesn't seem to be well supported).

Yes, I think it's a bug.  QThreadEvents should be the way to get notified about thread creation / exit events as they happen.  As mentioned earlier, at the moment GDB only enables it while stopping all threads.  It's not enabled by default because it would be inefficient when debugging applications with lots of short-lived threads.  I think it's just that it has never been used in all-stop mode yet, so you are the lucky one to stumble on those bugs.
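
For reference, the thread-exit stop reply you quoted ("w0;2635") has the form "w AA ; tid", where AA is the exit status and tid is a thread-id, both in hex.  Below is a minimal client-side parsing sketch, assuming the payload has already been stripped of the "$...#cs" framing.  thread_exit_event, parse_hex and parse_thread_exit are hypothetical names for illustration, not GDB functions, and the sketch handles only the plain "TID" and "pPID.TID" thread-id forms.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical decoded form of a "w AA ; tid" stop reply.
struct thread_exit_event
{
  unsigned long exit_status;
  bool has_pid;        // true when the thread-id used the pPID.TID form
  unsigned long pid;
  unsigned long tid;
};

// Parse S as a short hex number.  Returns false on empty or malformed input.
static bool
parse_hex (const std::string &s, unsigned long &out)
{
  if (s.empty () || s.size () > 8)
    return false;
  for (unsigned char c : s)
    if (!std::isxdigit (c))
      return false;
  out = std::stoul (s, nullptr, 16);
  return true;
}

// Parse a thread-exit stop reply payload such as "w0;2635".
std::optional<thread_exit_event>
parse_thread_exit (const std::string &payload)
{
  if (payload.empty () || payload[0] != 'w')
    return std::nullopt;

  std::size_t semi = payload.find (';');
  if (semi == std::string::npos)
    return std::nullopt;

  thread_exit_event ev {};
  if (!parse_hex (payload.substr (1, semi - 1), ev.exit_status))
    return std::nullopt;

  std::string id = payload.substr (semi + 1);
  if (!id.empty () && id[0] == 'p')
    {
      // Multiprocess form: pPID.TID
      std::size_t dot = id.find ('.');
      if (dot == std::string::npos
          || !parse_hex (id.substr (1, dot - 1), ev.pid))
        return std::nullopt;
      ev.has_pid = true;
      id = id.substr (dot + 1);
    }

  if (!parse_hex (id, ev.tid))
    return std::nullopt;

  return ev;
}
```

Note that the lowercase 'w' matters: an uppercase 'W' reply reports a process exit and is deliberately rejected by this sketch.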

Simon

