From: "cvs-commit at gcc dot gnu.org"
To: gdb-prs@sourceware.org
Subject: [Bug gdb/24694] FAIL: gdb.multi/multi-arch-exec.exp: first_arch=1: selected_thread=1: follow_exec_mode=same: continue across exec that changes architecture (Couldn't get registers: No such process)
Date: Fri, 11 Dec 2020 00:56:51 +0000

https://sourceware.org/bugzilla/show_bug.cgi?id=24694

--- Comment #13 from cvs-commit at gcc dot gnu.org ---
The master branch has been updated by Simon Marchi:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=4483a8e72ad265b5d428899d384bf190db071759

commit 4483a8e72ad265b5d428899d384bf190db071759
Author: Simon Marchi
Date:   Thu Dec 10 19:55:56 2020 -0500

gdb/testsuite: fix race condition in gdb.multi/multi-arch-exec.exp

That test fails intermittently for me.  The problem is a race
condition between the exec syscall and GDB resuming threads.

The initial situation is that we have two threads, let's call them
"leader" and "other".  Leader is the one who is going to do the exec.
We stop at the breakpoint on the all_started function, so both threads
are stopped.  When resuming, GDB resumes leader first and other second.
However, between resuming the two threads, leader has time to run and
do its exec, making other disappear.  When GDB tries to resume other,
it is no longer there.  We get some "Couldn't get registers: No such
process." messages, and the state is a bit messed up.

The issue can be triggered consistently by adding a small delay after
the resume syscall:

    diff --git a/gdb/inf-ptrace.c b/gdb/inf-ptrace.c
    index d5a062163c7..9540339a9da 100644
    --- a/gdb/inf-ptrace.c
    +++ b/gdb/inf-ptrace.c
    @@ -308,6 +308,8 @@ inf_ptrace_target::resume (ptid_t ptid, int step, enum gdb_signal signal)
       gdb_ptrace (request, ptid, (PTRACE_TYPE_ARG3)1, gdb_signal_to_host (signal));
       if (errno != 0)
         perror_with_name (("ptrace"));
    +  for (int i = 0 ; i < 100; i++)
    +    usleep (10000);
     }

     /* Wait for the child specified by PTID to do something.  Return the

This patch is about fixing the test to avoid this, since the test is
not about testing this particular corner case.  Handling of
multi-threaded programs doing execs should be improved too, but that's
not the goal of this patch.

Fix it by adding a synchronization point in the test to make sure both
threads were resumed by GDB before doing the exec.  I added two
pthread_barrier_wait calls in each thread (for a total of three).
I think adding one call in each thread would not be enough, because
this could happen:

- both threads reach the first barrier
- the "other" thread is scheduled so has time to run and hit the
  second barrier
- the "leader" thread hits the all_started function breakpoint,
  causing both threads to be stopped by GDB
- GDB resumes the "leader" thread
- Since the "other" thread has already reached the second barrier, the
  "leader" thread is free to run past its second barrier and do the
  exec, while GDB still hasn't resumed the "other" thread

By adding two barrier calls in each thread, I think we are good.  The
test passes consistently for me, even with the artificial delay added.

gdb/testsuite/ChangeLog:

	PR gdb/24694
	* gdb.multi/multi-arch-exec.c (thread_start, main): Add barrier
	calls.

Change-Id: I25c8ea9724010b6bf20b42691c716235537d0e27

-- 
You are receiving this mail because:
You are on the CC list for the bug.