From: "kevin.dempsey at aculab dot com"
To: glibc-bugs@sources.redhat.com
Subject: [Bug nptl/12674] sem_post/sem_wait race causing sem_post to return EINVAL
Date: Fri, 10 Feb 2012 16:26:00 -0000
X-Bugzilla-Product: glibc
X-Bugzilla-Component: nptl
X-Bugzilla-Severity: critical
X-Bugzilla-Status: REOPENED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: drepper.fsp at gmail dot com

http://sourceware.org/bugzilla/show_bug.cgi?id=12674

Kevin Dempsey changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |kevin.dempsey at aculab dot
                   |                            |com

--- Comment #8 from Kevin Dempsey 2012-02-10 16:26:32 UTC ---
We have been getting the same problem on an Amazon EC2 instance running a Fedora 8-based image (2.6.21.7-5.fc8 kernel-xen) with glibc.i686 2.7-2, using the nosegneg variant. The program aborts when sem_post() returns an error, and it has been averaging one failure every three months.
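A minimal sketch of a test harness with the same abort-on-error policy, assuming a hypothetical two-thread shape (this is not the original reporter's source): one thread posts, and the other destroys and re-initialises the semaphore as soon as sem_wait() returns, which POSIX allows once no thread is blocked on it. The names run_post_wait_stress(), ITERATIONS, die() and wait_retrying() are illustrative. Whether this actually hits the race depends on the glibc version and on timing, so a clean run does not prove the race is absent.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ITERATIONS 20000      /* hypothetical round count */

static sem_t target;          /* destroyed and re-initialised every round */
static sem_t ready;           /* tells the poster that target is usable again */
static long rounds_done;      /* written only by the poster thread */

/* Mirror the production program: any semaphore error is fatal. */
static void die(const char *what, int err)
{
    fprintf(stderr, "%s: %s\n", what, strerror(err));
    abort();
}

static void wait_retrying(sem_t *s)
{
    while (sem_wait(s) != 0)
        if (errno != EINTR)            /* EINTR only means a signal arrived */
            die("sem_wait", errno);
}

static void *waiter(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        wait_retrying(&target);
        /* sem_post() in the other thread may still be executing here. */
        sem_destroy(&target);
        if (sem_init(&target, 0, 0) != 0)
            die("sem_init", errno);
        if (sem_post(&ready) != 0)
            die("sem_post(ready)", errno);
    }
    return NULL;
}

static void *poster(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        if (sem_post(&target) != 0)
            die("sem_post(target)", errno);   /* the failure being reported */
        wait_retrying(&ready);                /* wait until target is re-created */
        rounds_done++;
    }
    return NULL;
}

/* Returns the number of completed rounds; aborts on any semaphore error. */
long run_post_wait_stress(void)
{
    pthread_t w, p;
    int rc;

    rounds_done = 0;
    if (sem_init(&target, 0, 0) != 0 || sem_init(&ready, 0, 0) != 0)
        die("sem_init", errno);
    if ((rc = pthread_create(&w, NULL, waiter, NULL)) != 0)
        die("pthread_create", rc);
    if ((rc = pthread_create(&p, NULL, poster, NULL)) != 0)
        die("pthread_create", rc);
    pthread_join(w, NULL);
    pthread_join(p, NULL);
    sem_destroy(&target);
    sem_destroy(&ready);
    return rounds_done;
}
```

Calling run_post_wait_stress() in a loop from main() gives a long-running soak test; the `ready` semaphore ensures the poster never posts to `target` before the waiter has re-initialised it.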
Having seen this bug report, I have been testing with a program based on the original reporter's source. On an EC2 instance it has never run for more than 4 hours without failing (I have not seen a failure on bare metal). When a failure does occur, the strace output shows that the futex() syscall has been made with an invalid operation:

12072 futex(0x9152098, 0x1010101 /* FUTEX_??? */, 1) = -1 ENOSYS (Function not implemented)

presumably because the PRIVATE field has been overwritten.

From the glibc source repository it appears that this race was introduced when sem_post() was changed to call FUTEX_WAKE only when there are threads waiting. In fact, with the test program forced to use the old implementation (using .symver), I haven't had it fail.

If the value and nwaiters fields were adjacent in memory, they could both be accessed atomically using cmpxchg8b (on i586 and later). Perhaps then somebody skilled in the art could eliminate the race condition?
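A minimal sketch of that suggestion, assuming a hypothetical packed_sem type rather than glibc's actual sem_t layout: the count occupies the low 32 bits and the waiter count the high 32 bits of one 64-bit word, so a post can increment the count and learn whether anyone is waiting in a single compare-and-swap. On i586 and later, GCC can implement a 64-bit compare-and-swap with cmpxchg8b. The futex calls themselves are left out; packed_sem_post() only reports whether a FUTEX_WAKE would be needed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: value in bits 0-31, waiter count in bits 32-63. */
typedef struct {
    _Atomic uint64_t word;
} packed_sem;

#define VALUE_MASK   UINT64_C(0xffffffff)
#define WAITER_SHIFT 32

static uint32_t packed_value(uint64_t w)   { return (uint32_t)(w & VALUE_MASK); }
static uint32_t packed_waiters(uint64_t w) { return (uint32_t)(w >> WAITER_SHIFT); }

void packed_sem_init(packed_sem *s, uint32_t value)
{
    atomic_init(&s->word, (uint64_t)value);
}

/* Increment the value and, in the same atomic step, capture the waiter count.
   Returns true if the caller should issue a FUTEX_WAKE.  Anything else the
   wake needs (such as a private flag) would have to be read before the CAS,
   so the poster does not read the semaphore after the increment is visible. */
bool packed_sem_post(packed_sem *s)
{
    uint64_t old = atomic_load(&s->word);
    uint64_t desired;
    do {
        desired = old + 1;   /* low half only; overflow check omitted */
    } while (!atomic_compare_exchange_weak(&s->word, &old, desired));
    return packed_waiters(old) > 0;
}

/* Take one unit without blocking; fails if the value is zero. */
bool packed_sem_trywait(packed_sem *s)
{
    uint64_t old = atomic_load(&s->word);
    do {
        if (packed_value(old) == 0)
            return false;
    } while (!atomic_compare_exchange_weak(&s->word, &old, old - 1));
    return true;
}

/* A blocking waiter would register before sleeping on the futex and
   unregister after it wakes. */
void packed_sem_add_waiter(packed_sem *s)
{
    atomic_fetch_add(&s->word, UINT64_C(1) << WAITER_SHIFT);
}

void packed_sem_remove_waiter(packed_sem *s)
{
    atomic_fetch_sub(&s->word, UINT64_C(1) << WAITER_SHIFT);
}
```

The design point is that the waiter count used to decide on the wake comes from the same word the successful compare-and-swap replaced, rather than from a separate nwaiters load after the value has been published.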