[PATCH] gdb/testsuite: fix race condition in gdb.multi/multi-arch-exec.exp

public inbox for gdb-patches@sourceware.org

* [PATCH] gdb/testsuite: fix race condition in gdb.multi/multi-arch-exec.exp
@ 2020-12-01  3:41 Simon Marchi
From: Simon Marchi @ 2020-12-01  3:41 UTC
  To: gdb-patches; +Cc: Simon Marchi

That test fails intermittently for me.  The problem is a race condition
between the exec syscall and GDB resuming threads.

The initial situation is that we have two threads, let's call them
"leader" and "other".  Leader is the one who is going to do the exec.
We stop at the breakpoint on the all_started function, so both threads
are stopped.  When resuming, GDB resumes leader first and other second.
However, between resuming the two threads, leader has time to run and do
its exec, making other disappear.  When GDB tries to resume other, it is
no longer there.  We get some "Couldn't get registers: No such
process." messages, and the state is a bit messed up.

The issue can be triggered consistently by adding a delay of about one
second after the ptrace call that resumes a thread:

    diff --git a/gdb/inf-ptrace.c b/gdb/inf-ptrace.c
    index d5a062163c7..9540339a9da 100644
    --- a/gdb/inf-ptrace.c
    +++ b/gdb/inf-ptrace.c
    @@ -308,6 +308,8 @@ inf_ptrace_target::resume (ptid_t ptid, int step, enum gdb_signal signal)
       gdb_ptrace (request, ptid, (PTRACE_TYPE_ARG3)1, gdb_signal_to_host (signal));
       if (errno != 0)
         perror_with_name (("ptrace"));
    +  for (int i = 0 ; i < 100; i++)
    +    usleep (10000);
     }

     /* Wait for the child specified by PTID to do something.  Return the

This patch fixes the test to avoid this race, since the test is not
meant to exercise this particular corner case.  Handling of
multi-threaded programs doing execs should be improved too, but that's
not the goal of this patch.

Fix it by adding a synchronization point in the test to make sure both
threads were resumed by GDB before doing the exec.  I added two
pthread_barrier_wait calls in each thread (for a total of three per
thread).  I think adding only one call in each thread would not be
enough, because this could happen:

- both threads reach the first barrier
- the "other" thread is scheduled so has time to run and hit the second
  barrier
- the "leader" thread hits the all_started function breakpoint, causing
  both threads to be stopped by GDB
- GDB resumes the "leader" thread
- Since the "other" thread has already reached the second barrier, the
  "leader" thread is free to run past its second barrier and do the
  exec, while GDB still hasn't resumed the "other" thread

By adding two barrier calls in each thread, I think we are good.  The test
passes consistently for me, even with the artificial delay added.

gdb/testsuite/ChangeLog:

	PR gdb/24694
	* gdb.multi/multi-arch-exec.c (thread_start, main): Add barrier
	calls.

Change-Id: I25c8ea9724010b6bf20b42691c716235537d0e27
---
 gdb/testsuite/gdb.multi/multi-arch-exec.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/gdb/testsuite/gdb.multi/multi-arch-exec.c b/gdb/testsuite/gdb.multi/multi-arch-exec.c
index a2c04dfa87d..1c22050a009 100644
--- a/gdb/testsuite/gdb.multi/multi-arch-exec.c
+++ b/gdb/testsuite/gdb.multi/multi-arch-exec.c
@@ -30,6 +30,8 @@ static void *
 thread_start (void *arg)
 {
   pthread_barrier_wait (&barrier);
+  pthread_barrier_wait (&barrier);
+  pthread_barrier_wait (&barrier);

   while (1)
     sleep (1);
@@ -60,6 +62,11 @@ main (int argc, char ** argv)
   pthread_barrier_wait (&barrier);
   all_started ();

+  /* Avoid races with GDB ptrace-resuming the threads and the exec: ensure
+     both threads were resumed by GDB before going into the exec.  */
+  pthread_barrier_wait (&barrier);
+  pthread_barrier_wait (&barrier);
+
   execl (prog,
          prog,
          (char *) NULL);
-- 
2.29.2

* Re: [PATCH] gdb/testsuite: fix race condition in gdb.multi/multi-arch-exec.exp
  2020-12-01  3:41 [PATCH] gdb/testsuite: fix race condition in gdb.multi/multi-arch-exec.exp Simon Marchi
@ 2020-12-11  0:56 ` Simon Marchi
From: Simon Marchi @ 2020-12-11  0:56 UTC
  To: Simon Marchi, gdb-patches

I just pushed this.

Simon
