From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <53F1F154.1020702@cornell.edu>
Date: Mon, 18 Aug 2014 12:28:00 -0000
From: Ken Brown
To: cygwin@cygwin.com
Subject: Re: (call-process ...) hangs in emacs
References: <53DB8D23.7060806@alice.it>
 <20140801133225.GD25860@calimero.vinschen.de>
 <53DEDBBA.20102@cornell.edu>
 <20140804080034.GA2578@calimero.vinschen.de>
 <53DF8BDC.8090104@cornell.edu>
 <20140804134526.GK2578@calimero.vinschen.de>
 <53E0CC2D.4080305@cornell.edu>
 <20140805135830.GA9994@calimero.vinschen.de>
 <53E11A93.9070800@cornell.edu>
 <20140805184047.GC13601@calimero.vinschen.de>
 <53E3685B.8050508@cornell.edu>
 <53E39BAD.3010004@redhat.com>
 <53E3CB46.1020909@cornell.edu>
 <53E3F2AE.7030608@redhat.com>
 <53E4D01B.9010005@cornell.edu>
In-Reply-To: <53E4D01B.9010005@cornell.edu>

On 8/8/2014 9:26 AM, Ken Brown wrote:
> On 8/7/2014 5:42 PM, Eric Blake wrote:
>> On 08/07/2014 12:53 PM, Ken Brown wrote:
>>> On 8/7/2014 11:30 AM, Eric Blake wrote:
>>>> On 08/07/2014 05:51 AM, Ken Brown wrote:
>>>>>
>>>>> I think I found the problem with NORMAL mutexes. emacs calls
>>>>> pthread_atfork after initializing the mutexes, and the resulting
>>>>> 'prepare' handler locks the mutexes. (The parent and child handlers
>>>>> unlock them.) So when emacs calls fork, the mutexes are locked, and
>>>>> shortly thereafter the Cygwin DLL calls calloc, leading to a deadlock.
>>>>> Here's a gdb backtrace showing the sequence of calls:
>>>>
>>>> Arguably, that's an upstream bug in emacs. POSIX has declared
>>>> pthread_atfork to be fundamentally useless; it is broken by design,
>>>> because you cannot use it for anything that is not async-signal-safe
>>>> without risking deadlock. And (except for sem_post()), NONE of the
>>>> standardized locking functions are async-signal-safe.
>>>>
>>>> http://austingroupbugs.net/view.php?id=858
>>>>
>>>> That said, it would still be nice to support this, since even though
>>>> the theory says it is broken, there are still lots of (broken)
>>>> programs/libraries still trying to use it.
>>>
>>> So what do you think emacs should do instead of using pthread_atfork? Or
>>> is it better to just remove it? I don't know how likely it is that this
>>> would cause a problem.
>>
>> The POSIX recommendation is that multithreaded apps limit themselves
>> solely to async-signal-safe functions in the window between fork and
>> exec (or to use posix_spawn instead of fork/exec). I don't know what
>> emacs is trying to do in that window, but at this point, it's certainly
>> worth reporting it upstream. If you need a pointer to the full list of
>> async-signal-safe functions:
>>
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04
>>
>> and search for "The following table defines a set of functions that
>> shall be async-signal-safe."
>>
>> The most common deadlocks when violating async-signal-safety rules look
>> like this in single-threaded programs:
>>
>> function calls malloc()
>> malloc() grabs a non-recursive mutex
>> async signal arrives
>> signal handler called
>> signal handler calls malloc()
>> malloc() can't grab the mutex - deadlock
>>
>> and this counterpart in multithreaded programs:
>>
>> thread1 calls malloc()
>> malloc() grabs a non-recursive mutex
>> thread2 gains control and calls fork()
>> because of the fork, thread1 no longer exists to release the lock
>> child process calls malloc()
>> malloc() tries to grab mutex, but it is locked with no thread to
>> release it
>>
>> Switching malloc() to a recursive lock may or may not "solve" the
>> single-threaded deadlock (in that malloc can now obtain the mutex), but
>> it is probably NOT what you want to happen (unless malloc is fully
>> re-entrant, the inner instance will see incomplete data and either be
>> totally clobbered itself, or else totally clobber the outer instance
>> when it returns). So it's GOOD that malloc does NOT use a recursive
>> mutex by default.
>>
>> In the multithreaded case, you are flat out hosed. Switching to a
>> recursive lock does not change the picture - you are still deadlocked
>> waiting on thread1 to release the lock, but thread1 doesn't exist.
>
> Thanks for the explanations, Eric. I've filed an emacs bug report:
>
> http://debbugs.gnu.org/cgi/bugreport.cgi?bug=18222

I've just made a new emacs test release that includes a workaround for
this bug. I think I see a way to make emacs use Cygwin's malloc; if
this works, it will provide a better fix for the bug.

Ken
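
For readers who want to see the spawn-instead-of-fork/exec route mentioned above, here is a minimal sketch. It is written in Python rather than C purely for brevity and is not what emacs or Cygwin actually does; the helper name spawn_and_wait is hypothetical. It assumes a POSIX system and Python 3.8 or later, where os.posix_spawn is available. The idea is that the caller never runs its own code in a forked child, so there is no window in which a lock left held by a thread that no longer exists (such as malloc's) could be requested.

```python
import os
import sys


def spawn_and_wait(argv, env=None):
    """Start argv[0] with posix_spawn and return the child's exit code.

    Hypothetical helper for illustration only. No user code runs in the
    child between process creation and exec, which is the window where
    the fork-related deadlocks described in the thread occur.
    """
    # argv[0] must be a path to the executable; posix_spawn does not
    # search PATH (os.posix_spawnp does).
    pid = os.posix_spawn(argv[0], argv, os.environ if env is None else env)

    # Reap the child and decode its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Child was killed by a signal: report it as a negative number.
    return -os.WTERMSIG(status)


if __name__ == "__main__":
    # Run a trivial child: the same Python interpreter, exiting with 7.
    code = spawn_and_wait([sys.executable, "-c", "raise SystemExit(7)"])
    print("child exit code:", code)
```

Whether this route fits a given program depends on what it needs to do between fork and exec; anything beyond what the spawn attributes and file actions can express still has to be restricted to async-signal-safe calls.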