From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id D529B3858410; Fri, 5 Nov 2021 17:58:51 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org D529B3858410 From: "cvs-commit at gcc dot gnu.org" To: gdb-prs@sourceware.org Subject: [Bug threads/28065] FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: access mem (print global_var after writing again, inf=1, iter=354) Date: Fri, 05 Nov 2021 17:58:51 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gdb X-Bugzilla-Component: threads X-Bugzilla-Version: HEAD X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: cvs-commit at gcc dot gnu.org X-Bugzilla-Status: NEW X-Bugzilla-Resolution: X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: unassigned at sourceware dot org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://sourceware.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: gdb-prs@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-prs mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Nov 2021 17:58:51 -0000 https://sourceware.org/bugzilla/show_bug.cgi?id=3D28065 --- Comment #22 from cvs-commit at gcc dot gnu.org --- The master branch has been updated by Pedro Alves : https://sourceware.org/git/gitweb.cgi?p=3Dbinutils-gdb.git;h=3D8a89ddbda2ec= b41be0f12142e5d4b95c7bd5a138 commit 8a89ddbda2ecb41be0f12142e5d4b95c7bd5a138 Author: Pedro Alves Date: Tue Sep 14 19:01:37 2021 +0100 Avoid /proc/pid/mem races (PR 28065) PR 28065 (gdb.threads/access-mem-running-thread-exit.exp intermittent failure) shows that GDB can hit an unexpected scenario -- it can happen that the kernel manages to open a /proc/PID/task/LWP/mem file, but then reading from the file returns 0/EOF, even though the process hasn't exited or execed. "0" out of read/write is normally what you get when the address space of the process the file was open for is gone, because the process execed or exited. So when GDB gets the 0, it returns memory access failure. In the bad case in question, the process hasn't execed or exited, so GDB fails a memory access when the access should have worked. GDB has code in place to gracefully handle the case of opening the /proc/PID/task/LWP/mem just while the LWP is exiting -- most often the open fails with EACCES or ENOENT. When it happens, GDB just tries opening the file for a different thread of the process. The testcase is written such that it stresses GDB's logic of closing/reopening the /proc/PID/task/LWP/mem file, by constantly spawning short lived threads. However, there's a window where the kernel manages to find the thread, but the thread exits just after and clears its address space pointer. In this case, the kernel creates a file successfully, but the file ends up with no address space associated, so a subsequent read/write returns 0/EOF too, just like if the whole process had execed or exited. This is the case in question that GDB does not handle. Oleg Nesterov gave this suggestion as workaround for that race: gdb can open(/proc/pid/mem) and then read (say) /proc/pid/statm. If statm reports something non-zero, then open() was "successfull". I think that might work. However, I didn't try it, because I realized we have another nasty race that that wouldn't fix. The other race I realized is that because we close/reopen the /proc/PID/task/LWP/mem file when GDB switches to a different inferior, then it can happen that GDB reopens /proc/PID/task/LWP/mem just after a thread execs, and before GDB has seen the corresponding exec event. I.e., we can open a /proc/PID/task/LWP/mem file accessing the post-exec address space thinking we're accessing the pre-exec address space. A few months back, Simon, Oleg and I discussed a similar race: [Bug gdb/26754] Race condition when resuming threads and one does an = exec https://sourceware.org/bugzilla/show_bug.cgi?id=3D26754 The solution back then was to make the kernel fail any ptrace operation until the exec event is consumed, with this kernel commit: commit dbb5afad100a828c97e012c6106566d99f041db6 Author: Oleg Nesterov AuthorDate: Wed May 12 15:33:08 2021 +0200 Commit: Linus Torvalds CommitDate: Wed May 12 10:45:22 2021 -0700 ptrace: make ptrace() fail if the tracee changed its pid unexpecte= dly This however, only applies to ptrace, not to the /proc/pid/mem file opening case. Also, even if it did apply to the file open case, we would want to support current kernels until such a fix is more wide spread anyhow. So all in all, this commit gives up on the idea of only ever keeping one /proc/pid/mem file descriptor open. Instead, make GDB open a /proc/pid/mem per inferior, and keep it open until the inferior exits, is detached or execs. Make GDB open the file right after the inferior is created or is attached to or forks, at which point we know the inferior is stable and stopped and isn't thus going to exec, or have a thread exit, and so the file open won't fail (unless the whole process is SIGKILLed from outside GDB, at which point it doesn't matter whether we open the file). This way, we avoid both races described above, at the expense of using more file descriptors (one per inferior). Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=3D28065 Change-Id: Iff943b95126d0f98a7973a07e989e4f020c29419 --=20 You are receiving this mail because: You are on the CC list for the bug.=