public inbox for gdb-prs@sourceware.org
* [Bug gdb/24694] FAIL: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=same: continue across exec that changes architecture (Couldn't get registers: No such process)
From: vries at gcc dot gnu.org @ 2020-05-17  9:10 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=24694

--- Comment #6 from Tom de Vries <vries at gcc dot gnu.org> ---
I reproduced this at 966dc1a27c "Automatic date update in version.in", that is,
with the fix for PR25478 committed.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

From: vries at gcc dot gnu.org @ 2020-07-21 14:41 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=24694

--- Comment #7 from Tom de Vries <vries at gcc dot gnu.org> ---
I installed openSUSE Tumbleweed in a VM, with 1 virtual CPU and execution cap
set to 75%.  In this setting, this reproduces every time.

From: simark at simark dot ca @ 2020-12-01  2:57 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=24694

Simon Marchi <simark at simark dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |simark at simark dot ca

--- Comment #8 from Simon Marchi <simark at simark dot ca> ---
This is due to an exec race.  We have two threads:

- leader thread, doing the exec
- other thread, minding its own business

Things to remember are:

- GDB resumes threads one by one, one after the other
- When a multi-threaded program execs, regardless of which thread did the exec,
all non-leader threads disappear and it looks like the leader is now executing
the new executable.

When we do a "continue" after stopping at the all_started function, GDB tries
to resume leader first, then other.  In the failing case, leader has the time
to run its exec before GDB tries to resume the other thread.  So the ptrace
resume on the other thread fails with

  Couldn't get registers: No such process.

That aborts the resumption command and causes unexpected output, making the
test fail.

Since this particular test is not about testing this particular corner case, I
think we can avoid it by adding a synchronization point after the continue, so
the exec only happens after both threads were resumed.

But I also think we should handle the situation better.  When resuming multiple
threads and one of them fails with "no such process", we should probably print
a warning but carry on, not abort the continue.  There would be a separate test
for that (or whatever behavior we decide to have).
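
The "warn but carry on" behavior suggested above could look roughly like the
sketch below.  This is not GDB code: resume_fn, resume_all and fake_resume are
hypothetical names, and the callback merely stands in for a per-thread ptrace
resume.  It assumes ESRCH on resume means the thread was reaped by an exec in
another thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback standing in for a per-thread ptrace resume.
   Returns 0 on success, -1 with errno set on failure.  */
typedef int (*resume_fn) (int tid);

/* Resume every thread in TIDS.  A thread that fails with ESRCH is
   assumed to have disappeared (for example because another thread
   exec'ed): warn and keep going.  Any other error aborts.  Returns the
   number of threads actually resumed, or -1 on a hard error.  */
static int
resume_all (const int *tids, size_t n, resume_fn resume_one)
{
  int resumed = 0;

  assert (tids != NULL && resume_one != NULL);
  for (size_t i = 0; i < n; i++)
    {
      errno = 0;
      if (resume_one (tids[i]) == 0)
        resumed++;
      else if (errno == ESRCH)
        fprintf (stderr, "warning: thread %d is gone, skipping\n", tids[i]);
      else
        return -1;
    }
  return resumed;
}

/* Stub for illustration only: pretend thread 2 vanished during the exec.  */
static int
fake_resume (int tid)
{
  if (tid == 2)
    {
      errno = ESRCH;
      return -1;
    }
  return 0;
}
```

With the stub, resuming threads 1, 2 and 3 prints one warning for thread 2 and
reports two threads resumed instead of aborting the whole command.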

From: simark at simark dot ca @ 2020-12-01  3:03 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=24694

--- Comment #9 from Simon Marchi <simark at simark dot ca> ---
Applying this little delay should make the issue trigger all the time.  It's
using repeated usleep instead of one sleep, because incoming SIGCHLDs interrupt
the sleep.


diff --git a/gdb/inf-ptrace.c b/gdb/inf-ptrace.c
index d5a062163c7..9540339a9da 100644
--- a/gdb/inf-ptrace.c
+++ b/gdb/inf-ptrace.c
@@ -308,6 +308,8 @@ inf_ptrace_target::resume (ptid_t ptid, int step, enum gdb_signal signal)
   gdb_ptrace (request, ptid, (PTRACE_TYPE_ARG3)1, gdb_signal_to_host (signal));
   if (errno != 0)
     perror_with_name (("ptrace"));
+  for (int i = 0 ; i < 100; i++)
+    usleep (10000);
 }

 /* Wait for the child specified by PTID to do something.  Return the
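
For readers wondering about the loop: a sleep that is interrupted by a signal
handler returns early, so a single long sleep can be cut short by the first
SIGCHLD.  Many short usleep calls bound the loss to one short interval each.
An alternative, sketched below and not part of the debugging patch above, is
to restart nanosleep with the remaining time.  sleep_full_ms is a hypothetical
helper name.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for at least MS milliseconds, restarting
   after each signal interruption.  Returns 0 on success, -1 on any
   error other than EINTR.  */
static int
sleep_full_ms (long ms)
{
  struct timespec req, rem;

  assert (ms >= 0);
  req.tv_sec = ms / 1000;
  req.tv_nsec = (ms % 1000) * 1000000L;

  while (nanosleep (&req, &rem) == -1)
    {
      if (errno != EINTR)
        return -1;
      req = rem;  /* Interrupted by a signal: sleep for what is left.  */
    }
  return 0;
}
```

Calling sleep_full_ms (1000) would give the same one-second total as the loop
in the diff, regardless of how many signals arrive in the meantime.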

From: simark at simark dot ca @ 2020-12-01  3:42 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=24694

--- Comment #10 from Simon Marchi <simark at simark dot ca> ---
Proposed patch:
https://sourceware.org/pipermail/gdb-patches/2020-December/173647.html

From: palves at redhat dot com @ 2020-12-04 13:18 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=24694

Pedro Alves <palves at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |palves at redhat dot com

--- Comment #11 from Pedro Alves <palves at redhat dot com> ---
> So the ptrace resume on the other thread fails with
>   Couldn't get registers: No such process.

> But I also think we should handle the situation better.  When resuming
> multiple threads and one of them fails with "no such process", we should
> probably print a warning but carry on, not abort the continue.  

Note we already ignore failures to resume when the thread disappears.  See 
linux_resume_one_lwp -> check_ptrace_stopped_lwp_gone in both gdb and
gdbserver.

And also, gdbserver ignores ESRCH when reading registers.  See
linux-low.cc:regsets_fetch_inferior_registers:

          else if (errno == ESRCH)
            {
              /* At this point, ESRCH should mean the process is
                 already gone, in which case we simply ignore attempts
                 to read its registers.  */
            }

Native gdb is missing the equivalent.  Unfortunately, on the native side, the
code calling PTRACE_GETREGS / PTRACE_GETREGSET / PTRACE_SETREGS / 
PTRACE_SETREGSET is dispersed throughout all architecture ports.

Does the issue trigger with gdbserver?  I'd assume not.

From: simark at simark dot ca @ 2020-12-04 17:59 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=24694

--- Comment #12 from Simon Marchi <simark at simark dot ca> ---
(In reply to Pedro Alves from comment #11)
> Does the issue trigger with gdbserver?  I'd assume not.

Indeed, I don't seem to be able to trigger it with gdbserver.

I found this other bug while trying though :)

https://sourceware.org/bugzilla/show_bug.cgi?id=27018

From: cvs-commit at gcc dot gnu.org @ 2020-12-11  0:56 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=24694

--- Comment #13 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Simon Marchi <simark@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=4483a8e72ad265b5d428899d384bf190db071759

commit 4483a8e72ad265b5d428899d384bf190db071759
Author: Simon Marchi <simon.marchi@efficios.com>
Date:   Thu Dec 10 19:55:56 2020 -0500

    gdb/testsuite: fix race condition in gdb.multi/multi-arch-exec.exp

    That test fails intermittently for me.  The problem is a race condition
    between the exec syscall and GDB resuming threads.

    The initial situation is that we have two threads, let's call them
    "leader" and "other".  Leader is the one who is going to do the exec.
    We stop at the breakpoint on the all_started function, so both threads
    are stopped.  When resuming, GDB resumes leader first and other second.
    However, between resuming the two threads, leader has time to run and do
    its exec, making other disappear.  When GDB tries to resume other, it is
    no longer there.  We get some "Couldn't get registers: No such
    process." messages, and the state is a bit messed up.

    The issue can be triggered consistently by adding a small delay after
    the resume syscall:

        diff --git a/gdb/inf-ptrace.c b/gdb/inf-ptrace.c
        index d5a062163c7..9540339a9da 100644
        --- a/gdb/inf-ptrace.c
        +++ b/gdb/inf-ptrace.c
        @@ -308,6 +308,8 @@ inf_ptrace_target::resume (ptid_t ptid, int step, enum gdb_signal signal)
           gdb_ptrace (request, ptid, (PTRACE_TYPE_ARG3)1, gdb_signal_to_host (signal));
           if (errno != 0)
             perror_with_name (("ptrace"));
        +  for (int i = 0 ; i < 100; i++)
        +    usleep (10000);
         }

         /* Wait for the child specified by PTID to do something.  Return the

    This patch is about fixing the test to avoid this, since the test is not
    about testing this particular corner case.  Handling of multi-threaded
    programs doing execs should be improved too, but that's not the goal of
    this patch.

    Fix it by adding a synchronization point in the test to make sure both
    threads were resumed by GDB before doing the exec.  I added two
    pthread_barrier_wait calls in each thread (for a total of three).  I
    think adding one call in each thread would not be enough, because this
    could happen:

    - both threads reach the first barrier
    - the "other" thread is scheduled so has time to run and hit the second
      barrier
    - the "leader" thread hits the all_started function breakpoint, causing
      both threads to be stopped by GDB
    - GDB resumes the "leader" thread
    - Since the "other" thread has already reached the second barrier, the
      "leader" thread is free to run past its second barrier and do the
      exec, while GDB still hasn't resumed the second one

    By adding two barrier calls in each thread, I think we are good.  The test
    passes consistently for me, even with the artificial delay added.

    gdb/testsuite/ChangeLog:

            PR gdb/24694
            * gdb.multi/multi-arch-exec.c (thread_start, main): Add barrier
            calls.

    Change-Id: I25c8ea9724010b6bf20b42691c716235537d0e27
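
The basic guarantee the fix relies on can be sketched in isolation.  This is
not the actual change to gdb.multi/multi-arch-exec.c: the names below are
hypothetical, the exec is replaced by reading a counter, and GDB's stop at
all_started is not modeled.  It only shows that once the leader gets past a
two-party pthread_barrier_wait, the other thread must have been running
recently enough to reach the same wait.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define N_BARRIERS 3  /* Three waits per thread, as in the commit message.  */

static pthread_barrier_t barrier;
static atomic_int other_arrivals;  /* Waits the "other" thread has reached.  */

/* The "other" thread: count each arrival, then wait for the leader.  */
static void *
other_thread (void *arg)
{
  (void) arg;
  for (int i = 0; i < N_BARRIERS; i++)
    {
      atomic_fetch_add (&other_arrivals, 1);
      pthread_barrier_wait (&barrier);
    }
  return NULL;
}

/* Run the "leader".  Returns how many waits "other" had reached when
   the leader got past its last one, i.e. the point where the real test
   would exec.  Returns -1 if setup fails.  */
static int
run_leader (void)
{
  pthread_t other;
  int seen, err;

  atomic_store (&other_arrivals, 0);
  if (pthread_barrier_init (&barrier, NULL, 2) != 0)
    return -1;
  if (pthread_create (&other, NULL, other_thread, NULL) != 0)
    return -1;

  for (int i = 0; i < N_BARRIERS; i++)
    pthread_barrier_wait (&barrier);

  seen = atomic_load (&other_arrivals);  /* The exec would go here.  */

  err = pthread_join (other, NULL);
  assert (err == 0);
  pthread_barrier_destroy (&barrier);
  return seen;
}
```

Because the leader cannot leave the last barrier before the other thread has
arrived at it, run_leader always observes all three arrivals.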

From: simark at simark dot ca @ 2020-12-11  0:57 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=24694

Simon Marchi <simark at simark dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #14 from Simon Marchi <simark at simark dot ca> ---
This is hopefully fixed now.
