public inbox for gdb-patches@sourceware.org
From: Pedro Alves <pedro@palves.net>
To: Andrew Burgess <aburgess@redhat.com>, gdb-patches@sourceware.org
Subject: Re: [PATCH 03/31] gdb/linux: Delete all other LWPs immediately on ptrace exec event
Date: Mon, 13 Nov 2023 14:04:02 +0000
Message-ID: <51c04503-ec38-4366-b02d-91da84b5ba3c@palves.net>
In-Reply-To: <87pm6n5e20.fsf@redhat.com>

Hi!

At long last, I've resumed work on this.  A belated thank you for
the reviews.

On 2023-05-26 16:04, Andrew Burgess wrote:
> Pedro Alves <pedro@palves.net> writes:

>> +  /* Display one table row for each lwp_info.  */
>> +  for (lwp_info *lp : all_lwps ())
>> +    {
>> +      ui_out_emit_tuple tuple_emitter (uiout, "lwp-entry");
>> +
>> +      struct thread_info *th = find_thread_ptid (linux_target, lp->ptid);
> 
> After recent changes this line becomes:
> 
>   struct thread_info *th = linux_target->find_thread (lp->ptid);

Thanks, done.

>> From ee0a276c08b829ae504fe0eba5badc4f7faf3676 Mon Sep 17 00:00:00 2001
>> From: Pedro Alves <pedro@palves.net>
>> Date: Wed, 13 Jul 2022 17:16:38 +0100
>> Subject: [PATCH 2/2] gdb/linux: Delete all other LWPs immediately on ptrace
>>  exec event
>>
>> I noticed that on an Ubuntu 20.04 system, after a following patch
>> ("Step over clone syscall w/ breakpoint,
>> TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp
>> was passing cleanly, but still, we'd end up with four new unexpected
>> GDB core dumps:
>>
>> 		 === gdb Summary ===
>>
>>  # of unexpected core files      4
>>  # of expected passes            48
>>
>> Said patch is making the pre-existing
>> gdb.threads/step-over-exec.exp testcase (almost silently) expose a
>> latent problem in gdb/linux-nat.c, resulting in a GDB crash when:
>>
>>  #1 - a non-leader thread execs
>>  #2 - the post-exec program stops somewhere
>>  #3 - you kill the inferior
>>
>> Instead of #3 directly, the testcase just returns, which ends up in
>> gdb_exit, tearing down GDB, which kills the inferior, and is thus
>> equivalent to #3 above.
>>
>> Vis:
>>
>>  $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
>>  ...
>>  (top-gdb) r
>>  ...
>>  (gdb) b main
>>  ...
>>  (gdb) r
>>  ...
>>  Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
>>  69        argv0 = argv[0];
>>  (gdb) c
>>  Continuing.
>>  [New Thread 0x7ffff7d89700 (LWP 2506975)]
>>  Other going in exec.
>>  Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>>  process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
>>
>>  Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
>>  28        foo ();
>>  (gdb) k
>>  ...
>>  Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
>>  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>>  393         return m_suspend.waitstatus_pending_p;
>>  (top-gdb) bt
>>  #0  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
>>  #1  0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
>>  #2  0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
>>  #3  0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
>>  #4  0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
>>  #5  0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
>>  #6  0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
>>  #7  0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590
>>  #8  0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
>>  ...
> 
> As I mentioned in my other message, this backtrace includes
> kill_unfollowed_child_callback, which doesn't exist yet!  I think that's
> OK though, the text before the backtrace does make it clear that you saw
> this problem only after applying a later patch.

I've tweaked the commit log to make that more explicit.

> 
>>
>> The root of the problem is that when a non-leader LWP execs, it just
>> changes its tid to the tgid, replacing the pre-exec leader thread,
>> becoming the new leader.  There's no thread exit event for the execing
>> thread.  It's as if the old pre-exec LWP vanishes without trace.  The
>> ptrace man page says:
>>
>> "PTRACE_O_TRACEEXEC (since Linux 2.5.46)
>> 	Stop the tracee at the next execve(2).  A waitpid(2) by the
>> 	tracer will return a status value such that
>>
>> 	  status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))
>>
>> 	If the execing thread is not a thread group leader, the thread
>> 	ID is reset to thread group leader's ID before this stop.
>> 	Since Linux 3.0, the former thread ID can be retrieved with
>> 	PTRACE_GETEVENTMSG."
>>
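
For anyone following along, the status check quoted from the man page
can be sketched roughly as below.  This is only an illustration, not
code from the patch: the sketch_* names are hypothetical, and the
value 4 is the Linux value of PTRACE_EVENT_EXEC, defined locally so
the snippet builds without the ptrace headers.

```cpp
#include <cassert>
#include <csignal>

// Assumption: mirrors the Linux value of PTRACE_EVENT_EXEC.  Defined
// locally so this sketch does not depend on <sys/ptrace.h>.
constexpr int sketch_ptrace_event_exec = 4;

// Return true if STATUS, as filled in by waitpid, describes a
// PTRACE_EVENT_EXEC stop, using the test from the ptrace man page.
static bool
sketch_is_exec_event (int status)
{
  return (status >> 8) == (SIGTRAP | (sketch_ptrace_event_exec << 8));
}

// Build the wait status for a ptrace event stop: 0x7f in the low byte
// marks a stopped tracee, the next byte holds the stop signal
// (SIGTRAP), and the event code sits above that.
static int
sketch_make_event_status (int event)
{
  return (event << 16) | (SIGTRAP << 8) | 0x7f;
}
```

A plain SIGTRAP stop, with no event code in the upper bits, does not
pass the check, which is what lets a tracer tell the two apart.
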
>> When the core of GDB processes an exec event, it deletes all the
>> threads of the inferior.  But, that is too late -- deleting the thread
>> does not delete the corresponding LWP, so we end up leaving the pre-exec
>> non-leader LWP stale in the LWP list.  That's what leads to the crash
>> above -- linux_nat_target::kill iterates over all LWPs, and after the
>> patch in question, that code will look for the corresponding
>> thread_info for each LWP.  For the pre-exec non-leader LWP still
>> listed, it won't find one.
>>
>> This patch fixes it, by deleting the pre-exec non-leader LWP (and
>> thread) from the LWP/thread lists as soon as we get an exec event out
>> of ptrace.
>>
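
To make the stale-LWP scenario concrete, here is a toy model of the
two lists.  It is a sketch under simplifying assumptions, not the
linux-nat.c code: sketch_ptid and sketch_target are hypothetical
stand-ins for ptid_t and the target's LWP/thread bookkeeping.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for GDB's ptid_t: a process id plus an LWP id.
struct sketch_ptid
{
  int pid;
  int lwp;
};

// Hypothetical stand-in for the native target's bookkeeping.
struct sketch_target
{
  std::vector<sketch_ptid> lwps;     // models the LWP list
  std::vector<sketch_ptid> threads;  // models the thread list

  // Handle an exec event for process PID.  By the time the event is
  // reported, the execing thread has taken over the leader's id, so
  // every LWP and thread of PID other than the leader is dropped here.
  void
  handle_exec_event (int pid)
  {
    auto stale = [pid] (const sketch_ptid &p)
      { return p.pid == pid && p.lwp != pid; };
    lwps.erase (std::remove_if (lwps.begin (), lwps.end (), stale),
		lwps.end ());
    threads.erase (std::remove_if (threads.begin (), threads.end (), stale),
		   threads.end ());
  }

  // Return true if every LWP has a matching thread.  Code that looks
  // up a thread for each LWP relies on this holding.
  bool
  every_lwp_has_thread () const
  {
    for (const sketch_ptid &lp : lwps)
      {
	bool found
	  = std::any_of (threads.begin (), threads.end (),
			 [&lp] (const sketch_ptid &th)
			   { return th.pid == lp.pid && th.lwp == lp.lwp; });
	if (!found)
	  return false;
      }
    return true;
  }
};
```

If only the thread list is pruned, as happens when the core deletes
the threads, the non-leader entry lingers in the LWP list and the
lookup comes back empty.  Pruning both lists at the exec event keeps
them in sync.
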
>> GDBserver does not need an equivalent fix, because it is already doing
>> this, as a side effect of mourning the pre-exec process, in
>> gdbserver/linux-low.cc:
>>
>>   else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events)
>>     {
>> ...
>>       /* Delete the execing process and all its threads.  */
>>       mourn (proc);
>>       switch_to_thread (nullptr);
>>
>>
>> The crash with gdb.threads/step-over-exec.exp is not observable on
>> newer systems, which postdate the glibc change to move "libpthread.so"
>> internals to "libc.so.6", because right after the exec, GDB traps a
>> load event for "libc.so.6", which leads to GDB trying to open
>> libthread_db for the post-exec inferior, and, on such systems that
>> succeeds.  When we load libthread_db, we call
>> linux_stop_and_wait_all_lwps, which, as the name suggests, stops all
>> lwps, and then waits to see their stops.  While doing this, GDB
>> detects that the pre-exec stale LWP is gone, and deletes it.
>>
>> If we use "catch exec" to stop right at the exec before the
>> "libc.so.6" load event ever happens, and issue "kill" right there,
>> then GDB crashes on newer systems as well.  So instead of tweaking
>> gdb.threads/step-over-exec.exp to cover the fix, add a new
>> gdb.threads/threads-after-exec.exp testcase that uses "catch exec".
> 
> Maybe it's worth mentioning that because the crash itself only happens
> once a later patch is applied we use 'maint info linux-lwps' to reveal
> the issue for now?

I've done something like that.

>>
>> Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
>> ---
>>  gdb/infrun.c                                  |  8 +--
>>  gdb/linux-nat.c                               | 15 ++++
>>  .../gdb.threads/threads-after-exec.exp        | 70 +++++++++++++++++++
> 
> Oops, this diff is missing the two source files for this test (.c and
> -execd.c).  I was able to figure something out though so I could test
> the rest of this patch :)
> 

Ouch.  And enough time has passed that I completely lost those files.  But as you
discovered, they're trivial enough to rewrite.  Actually, I simplified, and
only use one .c file this time.


>>  3 files changed, 88 insertions(+), 5 deletions(-)
>>  create mode 100644 gdb/testsuite/gdb.threads/threads-after-exec.exp
>>

...

>> +
>> +standard_testfile .c -execd.c
>> +
>> +proc do_test { } {
>> +    global srcdir subdir srcfile srcfile2 binfile testfile
>> +    global decimal
>> +
>> +    # Compile main binary (the one that does the exec).
>> +    if {[gdb_compile_pthreads $srcdir/$subdir/$srcfile $binfile \
>> +	     executable {debug}] != "" } {
>> +	return -1
>> +    }
> 
> You can do:
> 
>     if {[build_executable "failed to build main executable" \
>              $binfile $srcfile {debug pthread}] == -1} {
> 	return -1
>     }
> 
>> +
>> +    # Compile the second binary (the one that gets exec'd).
>> +    if {[gdb_compile $srcdir/$subdir/$srcfile2 $binfile-execd \
>> +	     executable {debug}] != "" } {
>> +	return -1
>> +    }
> 
> And:
> 
>     if {[build_executable "failed to build execd executable" \
>              $binfile-execd $srcfile2 {debug}] == -1} {
> 	return -1
>     }
> 
> I thought we were moving away from calling the gdb_compile* functions
> directly.
> 

Done-ish -- since I only have one test program this time, I can use
prepare_for_testing, instead.

> Assuming the missing source files are added, this all looks great.
> 
> Reviewed-By: Andrew Burgess <aburgess@redhat.com>

