From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <cygwin-return-193069-listarch-cygwin=sourceware.org@cygwin.com>
Received: (qmail 12004 invoked by alias); 26 Oct 2014 11:58:54 -0000
Mailing-List: contact cygwin-help@cygwin.com; run by ezmlm
Precedence: bulk
List-Id: <cygwin.cygwin.com>
List-Subscribe: <mailto:cygwin-subscribe@cygwin.com>
List-Archive: <http://sourceware.org/ml/cygwin/>
List-Post: <mailto:cygwin@cygwin.com>
List-Help: <mailto:cygwin-help@cygwin.com>, <http://sourceware.org/ml/#faqs>
Sender: cygwin-owner@cygwin.com
Mail-Followup-To: cygwin@cygwin.com
Received: (qmail 11995 invoked by uid 89); 26 Oct 2014 11:58:53 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.5 required=5.0 tests=BAYES_00,RP_MATCHES_RCVD,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2
X-HELO: limerock04.mail.cornell.edu
Received: from limerock04.mail.cornell.edu (HELO limerock04.mail.cornell.edu) (128.84.13.244) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sun, 26 Oct 2014 11:58:52 +0000
X-CornellRouted: This message has been Routed already.
Received: from authusersmtp.mail.cornell.edu (granite3.serverfarm.cornell.edu [10.16.197.8])	by limerock04.mail.cornell.edu (8.14.4/8.14.4_cu) with ESMTP id s9QBwnhv014219	for <cygwin@cygwin.com>; Sun, 26 Oct 2014 07:58:50 -0400
Received: from [10.0.0.113] (50-247-204-241-static.hfc.comcastbusiness.net [50.247.204.241] (may be forged))	(authenticated bits=0)	by authusersmtp.mail.cornell.edu (8.14.4/8.12.10) with ESMTP id s9QBwmBp002420	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES128-SHA bits=128 verify=NOT)	for <cygwin@cygwin.com>; Sun, 26 Oct 2014 07:58:49 -0400
Message-ID: <544CE1F7.5050603@cornell.edu>
Date: Sun, 26 Oct 2014 11:58:00 -0000
From: Ken Brown <kbrown@cornell.edu>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: cygwin@cygwin.com
Subject: Re: Threads
References: <54450835.3050602@cornell.edu> <5448E6F9.8040005@dronecode.org.uk> <5448EEBF.3020706@cornell.edu> <20141023153730.GC20607@calimero.vinschen.de> <544A327E.9090006@dronecode.org.uk> <20141024125416.GK20607@calimero.vinschen.de> <20141024135231.GL20607@calimero.vinschen.de>
In-Reply-To: <20141024135231.GL20607@calimero.vinschen.de>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
X-SW-Source: 2014-10/txt/msg00437.txt.bz2

On 10/24/2014 9:52 AM, Corinna Vinschen wrote:
> On Oct 24 14:54, Corinna Vinschen wrote:
>> On Oct 24 12:05, Jon TURNEY wrote:
>>> On 23/10/2014 16:37, Corinna Vinschen wrote:
>>>> On Oct 23 08:04, Ken Brown wrote:
>>>>> Yes, flags register corruption is exactly what Eli suggested in the other
>>>>> bug report I cited.
>>>>
>>>> The aforementioned patch was supposed to fix this problem and it is
>>>> definitely in the current 1.7.32 release...
>>>
>>> I didn't mean to suggest otherwise, just that perhaps a similar problem
>>> exists now.
>>>
>>> So I made the attached test case to explore that.  Maybe I've made an
>>> obvious mistake with it, but on the face of it, it seems to demonstrate
>>> something...
>>>
>>> jon@tambora /
>>> $ gcc signal-stress.c  -Wall -O0 -g
>>>
>>> jon@tambora /
>>> $ ./a
>>> failed: 2144210386 isn't equal to 2144210386, apparently
>>
>> So it checks i and j for equality, fails, and then comes up with
>> "42 isn't equal to 42"?  This is weird...
>>
>>> Note there is some odd load dependency. For me, it works fine when it's the
>>> only thing running, but when I start up something CPU intensive, it often
>>> fails...
>>
>> That's... interesting.  I wonder if that only occurs in multi-core or
>> multi-CPU environments.  The fact that i and j are not the same when
>> testing, but then are the same when printf is called looks like an
>> out-of-order execution problem.
>>
>> Is it possible that we have to add CPU memory barriers to the sigdelayed
>> function to avoid stuff like this?
>
> I discussed this with my colleague Kai Tietz (many thanks to him from
> here), and we came up with a problem in sigdelayed in the 64 bit case:
> pushf is called *after* aligning the stack with andq.  This alignment
> potentially changes the CPU flag values so the restored flags are
> potentially not the flags when entering sigdelayed.
>
> I just applied a patch and created new snapshots on
> https://cygwin.com/snapshots/
>
> I couldn't reproduce the problem locally, so I'd be grateful if you
> could test if that fixes the problem in your testcase, Jon.

I tried Jon's testcase.  With cygwin-1.7.33-0.1, it failed within a few
minutes.  With cygwin-1.7.33-0.2, I ran it for over an hour on a heavily
loaded system with no failures.  So it looks good so far.

> Ken, can you check if this snapshot helps emacs along, too?

The people who have been reporting frequent crashes are aware of the fix.
Now I just have to wait and hope I don't hear from them for a few days.

Ken

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple