* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
@ 2021-07-08 10:01 ` vries at gcc dot gnu.org
2021-07-08 10:01 ` vries at gcc dot gnu.org
` (25 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: vries at gcc dot gnu.org @ 2021-07-08 10:01 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #1 from Tom de Vries <vries at gcc dot gnu.org> ---
Created attachment 13545
--> https://sourceware.org/bugzilla/attachment.cgi?id=13545&action=edit
gdb.log gzipped
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 28+ messages in thread
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
2021-07-08 10:01 ` [Bug threads/28065] " vries at gcc dot gnu.org
@ 2021-07-08 10:01 ` vries at gcc dot gnu.org
2021-07-08 10:06 ` vries at gcc dot gnu.org
` (24 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: vries at gcc dot gnu.org @ 2021-07-08 10:01 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
Tom de Vries <vries at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |pedro at palves dot net
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
2021-07-08 10:01 ` [Bug threads/28065] " vries at gcc dot gnu.org
2021-07-08 10:01 ` vries at gcc dot gnu.org
@ 2021-07-08 10:06 ` vries at gcc dot gnu.org
2021-07-08 10:41 ` vries at gcc dot gnu.org
` (23 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: vries at gcc dot gnu.org @ 2021-07-08 10:06 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #2 from Tom de Vries <vries at gcc dot gnu.org> ---
Reproduced on openSUSE Leap 15.2:
...
$ for n in $(seq 1 100); do echo -n "$n: " ; ./test.sh 2>&1 | grep "expected passes" ; cp gdb.log gdb.$n.log; done
1: # of expected passes 13
2: # of expected passes 13
3: # of expected passes 13
4: # of expected passes 13
5: # of expected passes 13
6: # of expected passes 13
7: # of expected passes 13
8: # of expected passes 13
9: # of expected passes 13
10: # of expected passes 13
11: # of expected passes 13
12: # of expected passes 13
13: # of expected passes 12
14: # of expected passes 12
15: # of expected passes 13
...
gdb.13.log:
...
(gdb) inferior 2^M
[Switching to inferior 2 [process 1343] (access-mem-running-thread-exit)]^M
[Switching to thread 2.3260 (Thread 0x7fffe27fc700 (LWP 8131))](running)^M
(gdb) print global_var = 555^M
$297 = 555^M
(gdb) print global_var^M
Cannot access memory at address 0x601070^M
(gdb) FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing, inf=2, iter=75)
...
gdb.14.log:
...
(gdb) inferior 2^M
[Switching to inferior 2 [process 6525] (access-mem-running-thread-exit)]^M
[Switching to thread 2.23673 (Thread 0x7fffdbfff700 (LWP 22533))](running)^M
(gdb) print global_var = 555^M
Cannot access memory at address 0x601070^M
(gdb) FAIL: gdb.threads/access-mem-running-thread-exit.exp: non-stop: access mem (write to global_var, inf=2, iter=523)
...
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
` (2 preceding siblings ...)
2021-07-08 10:06 ` vries at gcc dot gnu.org
@ 2021-07-08 10:41 ` vries at gcc dot gnu.org
2021-07-08 10:41 ` vries at gcc dot gnu.org
` (22 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: vries at gcc dot gnu.org @ 2021-07-08 10:41 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #3 from Tom de Vries <vries at gcc dot gnu.org> ---
Reproduced on openSUSE Leap 15.2 with trunk:
...
83: # of expected passes 13
84: # of expected passes 12
85: # of expected passes 12
86: # of expected passes 13
...
gdb.84.log:
...
(gdb) inferior 1^M
[Switching to inferior 1 [process 16006] (access-mem-running-thread-exit)]^M
[Switching to thread 1.24406 (Thread 0x7ffff54c0700 (LWP 486))](running)^M
(gdb) print global_var = 555^M
$2149 = 555^M
(gdb) print global_var^M
$2150 = 555^M
(gdb) print global_var = 333^M
Cannot access memory at address 0x601070^M
(gdb) FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (write to global_var again, inf=1, iter=538)
...
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
` (3 preceding siblings ...)
2021-07-08 10:41 ` vries at gcc dot gnu.org
@ 2021-07-08 10:41 ` vries at gcc dot gnu.org
2021-07-08 10:45 ` pedro at palves dot net
` (21 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: vries at gcc dot gnu.org @ 2021-07-08 10:41 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
Tom de Vries <vries at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Version|unknown |HEAD
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
` (4 preceding siblings ...)
2021-07-08 10:41 ` vries at gcc dot gnu.org
@ 2021-07-08 10:45 ` pedro at palves dot net
2021-07-08 11:03 ` vries at gcc dot gnu.org
` (20 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: pedro at palves dot net @ 2021-07-08 10:45 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #4 from Pedro Alves <pedro at palves dot net> ---
Huh. Unfortunately, the gdb.log doesn't show anything useful. We'd need "set
debug lin-lwp 1" logs at least.
I've set the testcase running in a loop here, went past 50 iterations, and I
still get no failures. I'll leave it running for a while longer. This is on
Ubuntu 20.04.
The code to look at is linux_proc_xfer_memory_partial /
linux_proc_xfer_memory_partial_pid.
Could it be that opening /proc/pid/task/lwp/mem is failing with errno other
than EACCES or ENOENT on openSUSE's kernel?
To debug this, I'd insert abort() calls in all TARGET_XFER_EOF failure paths in
linux_proc_xfer_memory_partial, and then set the testcase running in a loop.
Then debug the core dump.
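The question about errno values other than EACCES or ENOENT could also be probed outside GDB. The following is only a minimal sketch under assumptions: it assumes a Linux /proc, and `try_open_task_mem` is a hypothetical helper name, not a function in linux-nat.c.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of GDB): try to open
   /proc/PID/task/LWP/mem and report why the open failed, if it did.
   Returns the open file descriptor, or -1 with *ERR set to errno.  */
static int
try_open_task_mem (pid_t pid, pid_t lwp, int *err)
{
  char path[64];
  snprintf (path, sizeof path, "/proc/%d/task/%d/mem", (int) pid, (int) lwp);

  int fd = open (path, O_RDWR | O_CLOEXEC);
  if (fd == -1)
    {
      *err = errno;
      /* EACCES and ENOENT are the failures named above; flag anything
         else so it stands out in the output.  */
      fprintf (stderr, "opening %s failed: %s (%d)%s\n", path,
               strerror (*err), *err,
               (*err == EACCES || *err == ENOENT) ? "" : " [unexpected]");
      return -1;
    }

  *err = 0;
  return fd;
}
```

Calling it in a loop against the LWPs of a process whose threads are exiting would show which errno values actually occur on a given kernel.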
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
` (5 preceding siblings ...)
2021-07-08 10:45 ` pedro at palves dot net
@ 2021-07-08 11:03 ` vries at gcc dot gnu.org
2021-07-08 11:23 ` vries at gcc dot gnu.org
` (19 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: vries at gcc dot gnu.org @ 2021-07-08 11:03 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #5 from Tom de Vries <vries at gcc dot gnu.org> ---
By doing:
...
@@ -153,6 +153,8 @@ proc test { non_stop } {
"print global_var after writing again"
}
+ gdb_test "echo bla \\n"
+
if {$ok} {
pass "access mem"
}
...
we see just a bit more:
...
(gdb) inferior 1^M
[Switching to inferior 1 [process 9358] (access-mem-running-thread-exit)]^M
[Switching to thread 1.8997 (Thread 0x7fffc47d0700 (LWP 27527))](running)^M
(gdb) print global_var = 555^M
$797 = 555^M
(gdb) print global_var^M
Cannot access memory at address 0x601070^M
(gdb) FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing, inf=1, iter=200)
Cannot find user-level thread for LWP 27716: generic error^M
(gdb) echo bla \n^M
bla ^M
(gdb) PASS: gdb.threads/access-mem-running-thread-exit.exp: all-stop: echo bla
\n
...
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
` (6 preceding siblings ...)
2021-07-08 11:03 ` vries at gcc dot gnu.org
@ 2021-07-08 11:23 ` vries at gcc dot gnu.org
2021-07-08 12:06 ` pedro at palves dot net
` (18 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: vries at gcc dot gnu.org @ 2021-07-08 11:23 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #6 from Tom de Vries <vries at gcc dot gnu.org> ---
Created attachment 13546
--> https://sourceware.org/bugzilla/attachment.cgi?id=13546&action=edit
debug patch
(In reply to Pedro Alves from comment #4)
> Could it be that opening /proc/pid/task/lwp/mem is failing with errno other
> than EACCES or ENOENT on openSUSE's kernel?
>
> To debug this, I'd insert abort() calls in all TARGET_XFER_EOF failure paths
> in linux_proc_xfer_memory_partial, and then set the testcase running in a
> loop. Then debug the core dump.
Core dump gives this location:
...
(gdb) up
#6 0x0000000000801ec9 in linux_proc_xfer_memory_partial (readbuf=0x2dc7620 "",
writebuf=0x0, offset=6295664, len=4, xfered_len=0x7ffc589ada38)
at /home/vries/gdb_versions/devel/src/gdb/linux-nat.c:4007
4007 gdb_assert_not_reached ("4");
(gdb) l
4002
4003 if (res == 0)
4004 {
4005 /* EOF means the address space is gone, the whole process
4006 exited or execed. */
4007 gdb_assert_not_reached ("4");
4008 return TARGET_XFER_EOF;
4009 }
4010 else if (res == -1)
4011 {
(gdb)
...
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
` (7 preceding siblings ...)
2021-07-08 11:23 ` vries at gcc dot gnu.org
@ 2021-07-08 12:06 ` pedro at palves dot net
2021-07-08 13:20 ` vries at gcc dot gnu.org
` (17 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: pedro at palves dot net @ 2021-07-08 12:06 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #7 from Pedro Alves <pedro at palves dot net> ---
Huh^2.
Off hand, it looks as if either something is busted in the kernel, or the
process really died somehow.
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
` (8 preceding siblings ...)
2021-07-08 12:06 ` pedro at palves dot net
@ 2021-07-08 13:20 ` vries at gcc dot gnu.org
2021-07-08 13:34 ` pedro at palves dot net
` (16 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: vries at gcc dot gnu.org @ 2021-07-08 13:20 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #8 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Pedro Alves from comment #4)
> Huh. Unfortunately, the gdb.log doesn't show anything useful. We'd need
> "set debug lin-lwp 1" logs at least.
>
I've tried that, but the problem no longer reproduces with debugging on.
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
` (9 preceding siblings ...)
2021-07-08 13:20 ` vries at gcc dot gnu.org
@ 2021-07-08 13:34 ` pedro at palves dot net
2021-07-09 9:41 ` vries at gcc dot gnu.org
` (15 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: pedro at palves dot net @ 2021-07-08 13:34 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #9 from Pedro Alves <pedro at palves dot net> ---
I'm looking at the kernel sources, and AFAICT, the only way pread64 could
return 0 is if the address space is gone. See here:
https://github.com/torvalds/linux/blob/master/fs/proc/base.c#L834
For reads, mem_read -> mem_rw.
We either reach the !mm path:
if (!mm)
return 0;
or, here:
copied = 0;
if (!mmget_not_zero(mm))
goto free;
...
free:
free_page((unsigned long) page);
return copied;
'mm' is cleared by mem_release.
Could this be something like an OOM kill?
Maybe by tweaking the testcase to not exit immediately on first FAIL we would
see the process exit.
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
` (10 preceding siblings ...)
2021-07-08 13:34 ` pedro at palves dot net
@ 2021-07-09 9:41 ` vries at gcc dot gnu.org
2021-07-09 9:44 ` vries at gcc dot gnu.org
` (14 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: vries at gcc dot gnu.org @ 2021-07-09 9:41 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #10 from Tom de Vries <vries at gcc dot gnu.org> ---
I split up the test-case into two:
- access-mem-running-thread-exit-non-stop-off.exp.
- access-mem-running-thread-exit-non-stop-on.exp.
I established that running either in a loop triggers the problem.
I continued with access-mem-running-thread-exit-non-stop-off.exp.
Then I cranked up the hammer-away time from 5 seconds to 10 minutes, which
gave me a fairly reliable reproducer without having to resort to iteration.
Then I tried on ubuntu 18.04.5, cranked up hammer-away time to 30 minutes, and
... managed to reproduce.
https://github.com/vries/gdb/commits/access-mem-running-thread-exit
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
` (11 preceding siblings ...)
2021-07-09 9:41 ` vries at gcc dot gnu.org
@ 2021-07-09 9:44 ` vries at gcc dot gnu.org
2021-07-09 13:33 ` vries at gcc dot gnu.org
` (13 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: vries at gcc dot gnu.org @ 2021-07-09 9:44 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #11 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #10)
> Then I tried on ubuntu 18.04.5, cranked up hammer-away time to 30 minutes,
> and ... managed to reproduce.
I just reproduced again, this time I measured the time: real 8m33s.
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
` (12 preceding siblings ...)
2021-07-09 9:44 ` vries at gcc dot gnu.org
@ 2021-07-09 13:33 ` vries at gcc dot gnu.org
2021-07-09 16:19 ` vries at gcc dot gnu.org
` (12 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: vries at gcc dot gnu.org @ 2021-07-09 13:33 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #12 from Tom de Vries <vries at gcc dot gnu.org> ---
I updated the test-case to remove "set print thread-events off".
Also I moved the debug statements related to the /mem fp to a separate debug
category: lin-lwp-mem, enabled it in the test-case, and verified that the
problem still reproduced.
Changes available at
https://github.com/vries/gdb/commits/access-mem-running-thread-exit-v2 .
I got the following log:
...
[linux-nat] linux_proc_xfer_memory_partial_pid: opening /proc/14678/task/19075/mem failed: No such file or directory (2)^M
^M
[linux-nat] linux_proc_xfer_memory_partial_pid: opened fd 13 for /proc/14678/task/19285/mem^M
^M
[linux-nat] linux_proc_xfer_memory_partial_pid: accessing fd 13 for pid 19285 got EOF^M
^M
[linux-nat] linux_proc_xfer_memory_partial_pid: fd 13 for /proc/14678/task/19285/mem^M
^M
[linux-nat] linux_proc_xfer_memory_partial_pid: accessing fd 13 for pid 19075 got EOF^M
^M
[linux-nat] linux_proc_xfer_memory_partial_pid: fd 13 for /proc/14678/task/19285/mem^M
^M
Cannot access memory at address 0x601070^M
(gdb) [LWP 19287 exited]^M
...
I noticed that we get EOF twice, and that the second access, although it is for
a different pid, is tried on the same file.
This led me to the following:
...
diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c
index f206b874929..14892a4ee2a 100644
--- a/gdb/linux-nat.c
+++ b/gdb/linux-nat.c
@@ -3907,6 +3907,7 @@ linux_proc_xfer_memory_partial_pid (ptid_t ptid,
{
linux_nat_debug_printf ("accessing fd %d for pid %ld got EOF\n",
fd, ptid.lwp ());
+ last_proc_mem_file.close ();
}
return ret;
...
And this fixed the failure: I managed to run the test-case for half an hour
without triggering the FAIL.
I don't understand things well enough to say whether this is a proper fix.
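The idea behind the one-line change can be illustrated outside GDB. The sketch below is not GDB's code: `mem_file_cache`, `mem_file_cache_close` and `cached_mem_read` are hypothetical names, and keying the cache on the process id alone is an assumption made to keep the example short. The point it shows is that a descriptor which returned EOF is dropped, so the next access reopens the file instead of reusing a stale descriptor.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a cached /proc/PID/task/LWP/mem file.  */
struct mem_file_cache
{
  int fd;     /* -1 when nothing is cached.  */
  pid_t pid;  /* Process the descriptor was opened for.  */
};

/* Drop the cached descriptor, if any.  */
static void
mem_file_cache_close (struct mem_file_cache *c)
{
  if (c->fd != -1)
    close (c->fd);
  c->fd = -1;
}

/* Read LEN bytes at ADDR for process PID.  PATH is only opened when no
   descriptor for PID is cached.  Returns what pread64 returns.  */
static ssize_t
cached_mem_read (struct mem_file_cache *c, pid_t pid, const char *path,
                 uintptr_t addr, void *buf, size_t len)
{
  if (c->fd == -1 || c->pid != pid)
    {
      mem_file_cache_close (c);
      c->fd = open (path, O_RDONLY | O_CLOEXEC);
      if (c->fd == -1)
        return -1;
      c->pid = pid;
    }

  ssize_t res = pread64 (c->fd, buf, len, (off64_t) addr);

  /* On EOF, forget the descriptor so that the next access reopens
     the file rather than hitting the same EOF again.  */
  if (res == 0)
    mem_file_cache_close (c);

  return res;
}
```

Without the close on EOF, a second access for the same process would go through the same descriptor, which matches the repeated "fd 13" lines in the log.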
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
` (13 preceding siblings ...)
2021-07-09 13:33 ` vries at gcc dot gnu.org
@ 2021-07-09 16:19 ` vries at gcc dot gnu.org
2021-07-09 16:40 ` pedro at palves dot net
` (11 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: vries at gcc dot gnu.org @ 2021-07-09 16:19 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #13 from Tom de Vries <vries at gcc dot gnu.org> ---
Created attachment 13552
--> https://sourceware.org/bugzilla/attachment.cgi?id=13552&action=edit
access.c
Standalone reproducer:
...
$ gcc access.c -pthread
$ ./a.out
pread64 res: 0
a.out: access.c:43: get_global_var: Assertion `res == 4' failed.
Aborted (core dumped)
$
...
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
` (14 preceding siblings ...)
2021-07-09 16:19 ` vries at gcc dot gnu.org
@ 2021-07-09 16:40 ` pedro at palves dot net
2021-07-09 17:11 ` pedro at palves dot net
` (10 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: pedro at palves dot net @ 2021-07-09 16:40 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #14 from Pedro Alves <pedro at palves dot net> ---
This is brilliant work, Tom. Thanks for doing all this.
In the standalone reproducer, you're missing a close(fd) at the end of
get_global_var, but even with that it reproduces for me too.
It looks like a kernel bug to me off hand, are you thinking otherwise?
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
` (15 preceding siblings ...)
2021-07-09 16:40 ` pedro at palves dot net
@ 2021-07-09 17:11 ` pedro at palves dot net
2021-07-09 18:57 ` vries at gcc dot gnu.org
` (9 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: pedro at palves dot net @ 2021-07-09 17:11 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #15 from Pedro Alves <pedro at palves dot net> ---
Created attachment 13553
--> https://sourceware.org/bugzilla/attachment.cgi?id=13553&action=edit
program showing you can continue accessing memory via file after thread exits
This got me questioning whether I was indeed correct that you can continue
accessing memory via a file after the thread it was open for has exited, so I
tweaked the standalone reproducer to double check it, and indeed, that is
correct. I'm attaching the file, just so if somebody else questions it, we
have this recorded.
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
` (16 preceding siblings ...)
2021-07-09 17:11 ` pedro at palves dot net
@ 2021-07-09 18:57 ` vries at gcc dot gnu.org
2021-07-09 19:06 ` pedro at palves dot net
` (8 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: vries at gcc dot gnu.org @ 2021-07-09 18:57 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #16 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Pedro Alves from comment #14)
> This is brilliant work, Tom. Thanks for doing all this.
>
Np :)
> In the standalone reproducer, you're missing a close(fd) at the end of
> get_global_var, but even with that it reproduces for me too.
>
Ack, thanks.
> It looks like a kernel bug to me off hand, are you thinking otherwise?
I'm not sure. I made the standalone reproducer to show to the kernel people
such that they can answer that question ;)
I read your comment about mem_rw, and looked at the code, and that seems to
make sense, but ... not my expertise, so I easily could be overlooking
something.
FWIW, I found in https://man7.org/linux/man-pages/man5/proc.5.html :
...
/proc/[pid]/task (since Linux 2.6.0)
...
In a multithreaded process, the contents of the
/proc/[pid]/task directory are not available if the main
thread has already terminated (typically by calling
pthread_exit(3)).
...
So, I guess it's possible we get the "unsupported" answer.
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
` (17 preceding siblings ...)
2021-07-09 18:57 ` vries at gcc dot gnu.org
@ 2021-07-09 19:06 ` pedro at palves dot net
2021-07-09 19:10 ` pedro at palves dot net
` (7 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: pedro at palves dot net @ 2021-07-09 19:06 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #17 from Pedro Alves <pedro at palves dot net> ---
> So, I guess it's possible we get the "unsupported" answer.
Earlier I thought that we might be seeing some race with the kernel thinking
that the whole process was gone because the main thread had exited, and
thread_fn exits without making sure first that the new threads it spawned were
actually started. So I removed the main thread's pthread_exit to check it,
and, the test still failed...
To avoid confusing the kernel people with that detail, it may just be better
to remove the main thread's pthread_exit from the standalone reproducer. I did
that here, replaced with:
while (1)
sleep (1);
and the program still fails the same way.
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
` (18 preceding siblings ...)
2021-07-09 19:06 ` pedro at palves dot net
@ 2021-07-09 19:10 ` pedro at palves dot net
2021-07-09 20:32 ` pedro at palves dot net
` (6 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: pedro at palves dot net @ 2021-07-09 19:10 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
Pedro Alves <pedro at palves dot net> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #13553|program showing you can |access-after-thread-exit.c
description|continue accessing memory |(program showing you can
|via file after thread exits |continue accessing memory
| |via file after thread
| |exits)
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
` (19 preceding siblings ...)
2021-07-09 19:10 ` pedro at palves dot net
@ 2021-07-09 20:32 ` pedro at palves dot net
2021-07-09 21:53 ` vries at gcc dot gnu.org
` (5 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: pedro at palves dot net @ 2021-07-09 20:32 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
Pedro Alves <pedro at palves dot net> changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #13552|0 |1
is obsolete| |
--- Comment #18 from Pedro Alves <pedro at palves dot net> ---
Created attachment 13554
--> https://sourceware.org/bugzilla/attachment.cgi?id=13554&action=edit
attach-2.c
Adjusted version of testcase that:
- closes fd
- does not have main thread call pthread_exit.
- also expects ESRCH. I saw that a couple times.
Otherwise, still fails the same way.
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
2021-07-08 9:57 [Bug threads/28065] New: FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) vries at gcc dot gnu.org
` (20 preceding siblings ...)
2021-07-09 20:32 ` pedro at palves dot net
@ 2021-07-09 21:53 ` vries at gcc dot gnu.org
2021-07-09 22:55 ` pedro at palves dot net
` (4 subsequent siblings)
26 siblings, 0 replies; 28+ messages in thread
From: vries at gcc dot gnu.org @ 2021-07-09 21:53 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
Tom de Vries <vries at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |bp at alien8 dot de
--- Comment #19 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Pedro Alves from comment #18)
> Created attachment 13554 [details]
> attach-2.c
>
> Adjusted version of testcase that:
>
> - closes fd
> - does not have main thread call pthread_exit.
> - also expects ESRCH. I saw that a couple times.
>
> Otherwise, still fails the same way.
Hi Boris,
We're seeing this reproducer fail like this:
...
$ gcc -pthread access-2.c
$ ./a.out
pread64 res: 0
a.out: access-2.c:45: get_global_var: Assertion `res == 4' failed.
Aborted (core dumped)
...
and we're trying to understand if this is a kernel bug or not.
What happens:
- we do an open ("/proc/$pid/task/$tid/mem"), and this succeeds, so
we get a file descriptor.
- then we try to read from the file descriptor, but this returns 0, in other
words we have EOF.
The fact that read doesn't complete could be explained by tid exiting
before/during the read.
The question is then whether EOF is the correct failure mode. Shouldn't we
expect read to return -1, with errno set to EBADF or EINVAL or some such?
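The three outcomes discussed above (data, 0/EOF, and -1 with errno) can be told apart with a small helper. This is only a sketch, not the attached testcase: `classify_read`, `read_task_mem`, and the `mem_read_status` enum are hypothetical names introduced here for illustration.

```c
#define _GNU_SOURCE /* for pread() when compiling in a strict ISO C mode */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical classification of a read from a /proc mem file.  */
enum mem_read_status
{
  MEM_READ_OK,    /* all requested bytes were read */
  MEM_READ_SHORT, /* some, but not all, bytes were read */
  MEM_READ_EOF,   /* read returned 0: no address space behind the file */
  MEM_READ_ERROR  /* open or read returned -1; errno says why */
};

/* Map the return value RES of a read of LEN bytes to a status.  */
static enum mem_read_status
classify_read (ssize_t res, size_t len)
{
  if (res < 0)
    return MEM_READ_ERROR;
  if (res == 0)
    return MEM_READ_EOF;
  return (size_t) res == len ? MEM_READ_OK : MEM_READ_SHORT;
}

/* Read LEN bytes at ADDR through /proc/PID/task/TID/mem into BUF.  */
static enum mem_read_status
read_task_mem (pid_t pid, pid_t tid, const void *addr, void *buf, size_t len)
{
  char path[64];

  assert (buf != NULL && len > 0);
  snprintf (path, sizeof path, "/proc/%d/task/%d/mem", (int) pid, (int) tid);

  int fd = open (path, O_RDONLY);
  if (fd < 0)
    return MEM_READ_ERROR;

  ssize_t res = pread (fd, buf, len, (off_t) (uintptr_t) addr);
  close (fd);
  return classify_read (res, len);
}
```

In the failure reported here, the open succeeds and the read lands in the `MEM_READ_EOF` case rather than `MEM_READ_ERROR`.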
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
From: pedro at palves dot net @ 2021-07-09 22:55 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #20 from Pedro Alves <pedro at palves dot net> ---
> The fact that read doesn't complete could be explained by tid exiting
> before/during the read.
It could, but I don't think it should. As seen with the
access-after-thread-exit.c program attached to this bugzilla (and confirmed the
same in gdb), if you manage to open a /proc/pid/task/tid/mem file, you continue
to be able to use the file to read memory from pid's address space, even after
tid exits. I believe that what is supposed to happen is that read should return
0 (eof) only when the whole process (thread group) exits or execs, because
the address space is destroyed.
However, we're seeing read return 0 (eof) even without the process exiting or
execing. This smells like a race in the kernel somewhere.
> The question is then whether EOF is the correct failure mode. Shouldn't we
> expect read to return -1, with errno set to EBADF or EINVAL or some such?
I don't think it should fail at all.
Very curious to hear from Boris!
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
From: simark at simark dot ca @ 2021-07-10 12:24 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
Simon Marchi <simark at simark dot ca> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |simark at simark dot ca
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
From: vries at gcc dot gnu.org @ 2021-10-05 11:27 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #21 from Tom de Vries <vries at gcc dot gnu.org> ---
Patch posted:
https://sourceware.org/pipermail/gdb-patches/2021-September/182255.html
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
From: cvs-commit at gcc dot gnu.org @ 2021-11-05 17:58 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
--- Comment #22 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pedro Alves <palves@sourceware.org>:
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=8a89ddbda2ecb41be0f12142e5d4b95c7bd5a138
commit 8a89ddbda2ecb41be0f12142e5d4b95c7bd5a138
Author: Pedro Alves <pedro@palves.net>
Date: Tue Sep 14 19:01:37 2021 +0100
Avoid /proc/pid/mem races (PR 28065)
PR 28065 (gdb.threads/access-mem-running-thread-exit.exp intermittent
failure) shows that GDB can hit an unexpected scenario -- it can
happen that the kernel manages to open a /proc/PID/task/LWP/mem file,
but then reading from the file returns 0/EOF, even though the process
hasn't exited or execed.
"0" out of read/write is normally what you get when the address space
of the process the file was open for is gone, because the process
execed or exited. So when GDB gets the 0, it returns memory access
failure. In the bad case in question, the process hasn't execed or
exited, so GDB fails a memory access when the access should have
worked.
GDB has code in place to gracefully handle the case of opening the
/proc/PID/task/LWP/mem just while the LWP is exiting -- most often the
open fails with EACCES or ENOENT. When it happens, GDB just tries
opening the file for a different thread of the process. The testcase
is written such that it stresses GDB's logic of closing/reopening the
/proc/PID/task/LWP/mem file, by constantly spawning short-lived
threads.
However, there's a window where the kernel manages to find the thread,
but the thread exits just after and clears its address space pointer.
In this case, the kernel creates a file successfully, but the file
ends up with no address space associated, so a subsequent read/write
returns 0/EOF too, just as if the whole process had execed or
exited. This is the case in question that GDB does not handle.
Oleg Nesterov gave this suggestion as workaround for that race:
gdb can open(/proc/pid/mem) and then read (say) /proc/pid/statm.
If statm reports something non-zero, then open() was "successful".
I think that might work. However, I didn't try it, because I realized
we have another nasty race that that wouldn't fix.
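For illustration only, the untried statm check could be sketched as below. The helper names are hypothetical, and treating the first statm field (total program size) as the "something non-zero" is an assumption made for this sketch, not something the suggestion specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

/* Return true if LINE, the contents of a statm file, reports a non-zero
   first field.  Assumption: a task without an address space reports 0.  */
static bool
statm_line_has_mm (const char *line)
{
  unsigned long size = 0;

  assert (line != NULL);
  return sscanf (line, "%lu", &size) == 1 && size != 0;
}

/* Read /proc/PID/statm and apply the check above.  Meant to be called
   after the mem file has already been opened.  */
static bool
pid_has_address_space (pid_t pid)
{
  char path[64];
  char line[256];

  snprintf (path, sizeof path, "/proc/%d/statm", (int) pid);

  FILE *f = fopen (path, "r");
  if (f == NULL)
    return false;

  bool ok = fgets (line, sizeof line, f) != NULL && statm_line_has_mm (line);
  fclose (f);
  return ok;
}
```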
The other race I realized is that because we close/reopen the
/proc/PID/task/LWP/mem file when GDB switches to a different inferior,
then it can happen that GDB reopens /proc/PID/task/LWP/mem just after
a thread execs, and before GDB has seen the corresponding exec event.
I.e., we can open a /proc/PID/task/LWP/mem file accessing the
post-exec address space thinking we're accessing the pre-exec address
space.
A few months back, Simon, Oleg and I discussed a similar race:
[Bug gdb/26754] Race condition when resuming threads and one does an exec
https://sourceware.org/bugzilla/show_bug.cgi?id=26754
The solution back then was to make the kernel fail any ptrace
operation until the exec event is consumed, with this kernel commit:
commit dbb5afad100a828c97e012c6106566d99f041db6
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Wed May 12 15:33:08 2021 +0200
Commit: Linus Torvalds <torvalds@linux-foundation.org>
CommitDate: Wed May 12 10:45:22 2021 -0700
ptrace: make ptrace() fail if the tracee changed its pid unexpectedly
This, however, only applies to ptrace, not to the /proc/pid/mem file
opening case. Also, even if it did apply to the file open case, we
would want to support current kernels until such a fix is more
widespread anyhow.
So all in all, this commit gives up on the idea of only ever keeping
one /proc/pid/mem file descriptor open. Instead, make GDB open a
/proc/pid/mem per inferior, and keep it open until the inferior exits,
is detached or execs. Make GDB open the file right after the inferior
is created or is attached to or forks, at which point we know the
inferior is stable and stopped and isn't thus going to exec, or have a
thread exit, and so the file open won't fail (unless the whole process
is SIGKILLed from outside GDB, at which point it doesn't matter
whether we open the file).
This way, we avoid both races described above, at the expense of using
more file descriptors (one per inferior).
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28065
Change-Id: Iff943b95126d0f98a7973a07e989e4f020c29419
* [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354)
From: pedro at palves dot net @ 2021-11-05 18:02 UTC (permalink / raw)
To: gdb-prs
https://sourceware.org/bugzilla/show_bug.cgi?id=28065
Pedro Alves <pedro at palves dot net> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |FIXED
Target Milestone|--- |12.1
--- Comment #23 from Pedro Alves <pedro at palves dot net> ---
Fix merged to master. Do you think we should put this in GDB 11? If so, it
might be better to let it cook in master for a while.