From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 103363 invoked by alias); 14 Mar 2016 21:29:32 -0000 Mailing-List: contact gdb-patches-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: gdb-patches-owner@sourceware.org Received: (qmail 103353 invoked by uid 89); 14 Mar 2016 21:29:32 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=continuously, Parent, diagnosis, restarted X-HELO: relay1.mentorg.com Received: from relay1.mentorg.com (HELO relay1.mentorg.com) (192.94.38.131) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-GCM-SHA384 encrypted) ESMTPS; Mon, 14 Mar 2016 21:29:21 +0000 Received: from svr-orw-fem-02x.mgc.mentorg.com ([147.34.96.206] helo=SVR-ORW-FEM-02.mgc.mentorg.com) by relay1.mentorg.com with esmtp id 1afa3D-0005es-Vt from Don_Breazeal@mentor.com ; Mon, 14 Mar 2016 14:29:20 -0700 Received: from [172.30.4.244] (147.34.91.1) by SVR-ORW-FEM-02.mgc.mentorg.com (147.34.96.168) with Microsoft SMTP Server (TLS) id 14.3.224.2; Mon, 14 Mar 2016 14:29:19 -0700 Subject: Re: [PING] Re: [PATCH v2 2/3] PR remote/19496, interrupted syscall in forking-threads-plus-bkpt To: , References: <1455150484-12600-1-git-send-email-donb@codesourcery.com> <56CF39CF.5080007@codesourcery.com> <56D88042.1050207@codesourcery.com> From: Don Breazeal Message-ID: <56E72D2F.2010307@codesourcery.com> Date: Mon, 14 Mar 2016 21:29:00 -0000 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <56D88042.1050207@codesourcery.com> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2016-03/txt/msg00222.txt.bz2 Hi Pedro, Did you have any more suggestions about handling the interrupted system call, or shall we go with the "loop until we don't get -1 and errno===EINTR" approach? Thanks, --Don On 3/3/2016 10:19 AM, Don Breazeal wrote: > Ping. > I checked, the patch still applies cleanly to mainline. > Thanks > --Don > > On 2/25/2016 9:28 AM, Don Breazeal wrote: >> Ping >> Thanks, >> --Don >> >> On 2/10/2016 4:28 PM, Don Breazeal wrote: >>> Hi Pedro, >>> >>> On 2/1/2016 11:38 AM, Pedro Alves wrote: >>>> On 01/28/2016 12:48 AM, Don Breazeal wrote: >>>>> This patch addresses "fork:Interrupted system call" (or wait:) failures >>>>> in gdb.threads/forking-threads-plus-breakpoint.exp. >>>>> >>>>> The test program spawns ten threads, each of which do ten fork/waitpid >>>>> sequences. The cause of the problem was that when one of the fork >>>>> children exited before the corresponding fork parent could initiate its >>>>> waitpid for that child, a SIGCHLD was delivered and interrupted a fork >>>>> or waitpid in another thread. >>> >>> In fact, I think my diagnosis here was incorrect, or at least incorrect >>> in some cases. I believe at least some of the interruptions are caused >>> by SIGSTOP, sent by GDB when stopping all the threads. More below. >>> >>>>> >>>>> The fix was to wrap the system calls in a loop to retry the call if >>>>> it was interrupted, like: >>>>> >>>>> do >>>>> { >>>>> pid = fork (); >>>>> } >>>>> while (pid == -1 && errno == EINTR); >>>>> >>>>> Since this is a Linux-only test I figure it is OK to use errno and EINTR. >>>>> >>>>> Tested on Nios II Linux target with x86 Linux host. >>>> >>>> I'd prefer to avoid this if possible. These loops potentially hide >>>> bugs like ERESTARTSYS escaping out of a syscall and mishandling of >>>> signals. See bc9540e842eb5639ca59cb133adef211d252843c for example: >>>> https://sourceware.org/ml/gdb-patches/2015-02/msg00654.html >>>> >>>> How about setting SIGCHLD to SIG_IGN, or making SIGCHLD be SA_RESTART? >>> >>> I spent a couple of days trying to find an alternate solution, but >>> couldn't find one that was reliable. Here is a snapshot of what I tried: >>> >>> 1) SIG_IGN: results in an ECHILD from waitpid. The man page for waitpid >>> says "This can happen for one's own child if the action for SIGCHLD is >>> set to SIG_IGN." >>> >>> 2) SA_RESTART: While waitpid is listed as a system call that can be >>> restarted by SA_RESTART, fork is not. Even if I leave the "EINTR loop" >>> in place for fork, using SA_RESTART I still see an interrupted system >>> call for waitpid. Possibly because the problem is SIGSTOP and not >>> SIGCHLD. >>> >>> 3) pthread_sigblock: With this set for SIGCHLD in all the threads, I >>> still saw an interrupted system call. You can't block SIGSTOP. >>> >>> 4) pthread_sigblock with sigwait: using pthread_sigblock on all the >>> blockable signals with a signal thread that called sigwait for all >>> the signals in a loop, the signal thread would see a bunch of SIGCHLDs, >>> but there would eventually be an interrupted system call. >>> >>> 5) bsd_signal: this function is supposed to automatically restart blocking >>> system calls. fork is not a blocking system call, but it doesn't help >>> for waitpid either. >>> >>> I found this in the ptrace(2) man page: "Note that a suppressed signal >>> still causes system calls to return prematurely. In this case, system >>> calls will be restarted: the tracer will observe the tracee to reexecute >>> the interrupted system call (or restart_syscall(2) system call for a few >>> system calls which use a different mechanism for restarting) if the tracer >>> uses PTRACE_SYSCALL. Even system calls (such as poll(2)) which are not >>> restartable after signal are restarted after signal is suppressed; however, >>> kernel bugs exist which cause some system calls to fail with EINTR even >>> though no observable signal is injected to the tracee." >>> >>> The GDB manual mentions something similar about interrupted system calls. >>> >>> So, the bottom line is that I haven't changed the fix for the interrupted >>> system calls, because I can't find anything that works as well as the >>> original fix. Perhaps this test puts enough stress on the kernel that the >>> kernel bugs mentioned above are exposed. >>> >>> One change I did make from the previous version was to increase the >>> timeout to 90 seconds, which was necessary to get more reliable results >>> on the Nios II target. >>> >>> Let me know what you think. >>> Thanks! >>> --Don >>> >>> --- >>> This patch addresses "fork:Interrupted system call" (or wait:) failures >>> in gdb.threads/forking-threads-plus-breakpoint.exp. >>> >>> The test program spawns ten threads, each of which do ten fork/waitpid >>> sequences. The cause of the problem was that when one of the fork >>> children exited before the corresponding fork parent could initiate its >>> waitpid for that child, a SIGCHLD was delivered and interrupted a fork >>> or waitpid in another thread. >>> >>> The fix was to wrap the system calls in a loop to retry the call if >>> it was interrupted, like: >>> >>> do >>> { >>> pid = fork (); >>> } >>> while (pid == -1 && errno == EINTR); >>> >>> Since this is a Linux-only test I figure it is OK to use errno and EINTR. >>> I tried a number of alternative fixes using SIG_IGN, SA_RESTART, >>> pthread_sigblock, and bsd_signal, but none of these worked as well. >>> >>> Tested on Nios II Linux target with x86 Linux host. >>> >>> gdb/testsuite/ChangeLog: >>> 2016-02-10 Don Breazeal >>> >>> * gdb.threads/forking-threads-plus-breakpoint.c (thread_forks): >>> Retry fork and waitpid on interrupted system call errors. >>> * gdb.threads/forking-threads-plus-breakpoint.exp: (do_test): >>> Increase timeout to 90. >>> >>> --- >>> .../gdb.threads/forking-threads-plus-breakpoint.c | 14 ++++++++++++-- >>> .../gdb.threads/forking-threads-plus-breakpoint.exp | 3 +++ >>> 2 files changed, 15 insertions(+), 2 deletions(-) >>> >>> diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c >>> index fc64d93..c169e18 100644 >>> --- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c >>> +++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.c >>> @@ -22,6 +22,7 @@ >>> #include >>> #include >>> #include >>> +#include >>> >>> /* Number of threads. Each thread continuously spawns a fork and wait >>> for it. If we have another thread continuously start a step over, >>> @@ -49,14 +50,23 @@ thread_forks (void *arg) >>> { >>> pid_t pid; >>> >>> - pid = fork (); >>> + do >>> + { >>> + pid = fork (); >>> + } >>> + while (pid == -1 && errno == EINTR); >>> >>> if (pid > 0) >>> { >>> int status; >>> >>> /* Parent. */ >>> - pid = waitpid (pid, &status, 0); >>> + do >>> + { >>> + pid = waitpid (pid, &status, 0); >>> + } >>> + while (pid == -1 && errno == EINTR); >>> + >>> if (pid == -1) >>> { >>> perror ("wait"); >>> diff --git a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp >>> index ff3ca9a..6889c2b 100644 >>> --- a/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp >>> +++ b/gdb/testsuite/gdb.threads/forking-threads-plus-breakpoint.exp >>> @@ -73,6 +73,9 @@ proc do_test { cond_bp_target detach_on_fork displaced } { >>> global linenum >>> global is_remote_target >>> >>> + global timeout >>> + set timeout 90 >>> + >>> set saved_gdbflags $GDBFLAGS >>> set GDBFLAGS [concat $GDBFLAGS " -ex \"set non-stop on\""] >>> clean_restart $binfile >>> >> >