public inbox for gdb-prs@sourceware.org
* [Bug gdb/29762] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: non-stop: access mem (print global_var after writing again, inf=2, iter=1)
@ 2022-11-09  2:15 simark at simark dot ca
  2022-11-11  1:16 ` [Bug gdb/29762] " simark at simark dot ca
  0 siblings, 1 reply; 2+ messages in thread
From: simark at simark dot ca @ 2022-11-09  2:15 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=29762

            Bug ID: 29762
           Summary: FAIL: gdb.threads/access-mem-running-thread-exit.exp:
                    non-stop: access mem (print global_var after writing
                    again, inf=2, iter=1)
           Product: gdb
           Version: HEAD
            Status: NEW
          Severity: normal
          Priority: P2
         Component: gdb
          Assignee: unassigned at sourceware dot org
          Reporter: simark at simark dot ca
  Target Milestone: ---

I get this failure very rarely on my CI.  I managed to reproduce it on my dev
machine by running:

$ while taskset -c 1,19 make check \
    TESTS="gdb.threads/access-mem-running-thread-exit.exp" \
    RUNTESTFLAGS="--target_board=native-extended-gdbserver"; do :; done

It takes a few runs, maybe a few minutes, but it eventually fails.

I think running

$ stress -n $(nproc)

at the same time helped, but maybe it was just an illusion.

Here's an instance of the failure:

(gdb) print global_var = 555^M
$1 = 555^M
(gdb) print global_var^M
$2 = 555^M
(gdb) print global_var = 333^M
$3 = 333^M
(gdb) print global_var^M
$4 = 123^M
(gdb) FAIL: gdb.threads/access-mem-running-thread-exit.exp: non-stop: access mem (print global_var after writing again, inf=2, iter=1)

In another case it looks like this:

(gdb) print global_var = 555^M
$1 = 555^M
(gdb) print global_var^M
$2 = 123^M
(gdb) FAIL: gdb.threads/access-mem-running-thread-exit.exp: non-stop: access mem (print global_var after writing, inf=2, iter=1)

I don't know if the taskset is a red herring, but I never got a failure by
running it without the taskset, or by running with taskset on a single core.

Interestingly, every failure I got was on iter=1.

I don't really know what kind of racy problem it could be in GDB.  It sounds
like a "write memory on one core, get migrated to another CPU, then read the
old value on another core" kind of problem.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug gdb/29762] FAIL: gdb.threads/access-mem-running-thread-exit.exp: non-stop: access mem (print global_var after writing again, inf=2, iter=1)
  2022-11-09  2:15 [Bug gdb/29762] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: non-stop: access mem (print global_var after writing again, inf=2, iter=1) simark at simark dot ca
@ 2022-11-11  1:16 ` simark at simark dot ca
  0 siblings, 0 replies; 2+ messages in thread
From: simark at simark dot ca @ 2022-11-11  1:16 UTC (permalink / raw)
  To: gdb-prs

https://sourceware.org/bugzilla/show_bug.cgi?id=29762

--- Comment #1 from Simon Marchi <simark at simark dot ca> ---
> I don't really know what kind of racy problem it could be in GDB.  It sounds
> like a "write memory on one core, get migrated to another CPU, then read the
> old value on another core" kind of problem.

Of course it's absolutely not that.

It turns out that with this change in the test, it reproduces pretty much every
time:

diff --git a/gdb/testsuite/gdb.threads/access-mem-running-thread-exit.exp b/gdb/testsuite/gdb.threads/access-mem-running-thread-exit.exp
index 7932c0a82e6..54080e5e5bc 100644
--- a/gdb/testsuite/gdb.threads/access-mem-running-thread-exit.exp
+++ b/gdb/testsuite/gdb.threads/access-mem-running-thread-exit.exp
@@ -172,10 +172,13 @@ proc test { non_stop } {

        my_gdb_test "print global_var = 555" " = 555" \
            "write to global_var"
+       sleep 1
        my_gdb_test "print global_var" " = 555" \
            "print global_var after writing"
+       sleep 1
        my_gdb_test "print global_var = 333" " = 333" \
            "write to global_var again"
+       sleep 1
        my_gdb_test "print global_var" " = 333" \
            "print global_var after writing again"
     }

By putting some printfs in gdbserver (and hacking the testsuite so it would
connect to the gdbserver I started manually, so I could see its stdout), I
found that we would end up writing to or reading from the wrong inferior.
Sometimes, prior to a memory access operation (used to implement those
prints), GDB tries to set the remote general thread, but that fails because
the thread has just exited (and GDB doesn't know about it yet).  The Hg packet
fails, but we don't check the response:

https://gitlab.com/gnutools/binutils-gdb/-/blob/cde010e1a866e67b7e895cbcb95dedd3de0a1e56/gdb/remote.c#L2914

So GDB proceeds with the memory operation with the previous remote general
thread still set, which belongs to the other inferior.
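
To make the sequence above concrete, here is a toy model of it in Python.  It
is only a sketch, not GDB or gdbserver code: FakeStub, read_global_var and the
"read" request are made up for the illustration (real memory accesses use
other packets).  Only the Hg packet name and the p<pid>.<tid> thread-id syntax
are taken from the remote protocol.

```python
# Toy model of the failure mode. All class and function names are hypothetical.


def parse_thread_id(text):
    """Parse a multiprocess thread-id of the form p<pid>.<tid> (hex fields)."""
    pid_hex, tid_hex = text[1:].split(".")
    return int(pid_hex, 16), int(tid_hex, 16)


class FakeStub:
    """Stands in for the remote side: it remembers the general thread."""

    def __init__(self, global_var_by_inferior, live_threads):
        self.global_var = dict(global_var_by_inferior)  # inferior -> value
        self.live_threads = set(live_threads)           # (inferior, thread)
        self.general_thread = None

    def handle(self, packet):
        if packet.startswith("Hg"):
            thread = parse_thread_id(packet[2:])
            if thread not in self.live_threads:
                # The thread already exited: report an error and keep the
                # previously selected general thread.
                return "E01"
            self.general_thread = thread
            return "OK"
        if packet == "read":
            # Simplified placeholder for a memory read of global_var in
            # whichever inferior the general thread belongs to.
            inferior, _ = self.general_thread
            return str(self.global_var[inferior])
        return ""  # an empty reply means "unsupported"


def read_global_var(stub, inferior, thread, check_reply):
    """Select (inferior, thread) with Hg, then read global_var."""
    reply = stub.handle(f"Hgp{inferior:x}.{thread:x}")
    if check_reply and reply != "OK":
        raise RuntimeError(f"Hg failed with reply {reply!r}")
    return int(stub.handle("read"))


if __name__ == "__main__":
    # Inferior 2 holds 333, but its thread 1 has exited without the client
    # knowing.  The general thread still points into inferior 1.
    stub = FakeStub({1: 123, 2: 333}, live_threads=[(1, 1), (2, 2)])
    stub.handle("Hgp1.1")
    print(read_global_var(stub, 2, 1, check_reply=False))
```

With check_reply=False the read is served from inferior 1 even though inferior
2 was requested, which is the same shape as the "$4 = 123" output above.  With
check_reply=True the failed Hg surfaces as an error instead, and the caller
could pick another live thread of the same inferior and retry.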
