From: "dhatch at ilm dot com"
To: glibc-bugs@sources.redhat.com
Subject: [Bug nptl/12674] New: sem_post/sem_wait race causing sem_post to return EINVAL
Date: Thu, 14 Apr 2011 06:33:00 -0000

http://sourceware.org/bugzilla/show_bug.cgi?id=12674

           Summary: sem_post/sem_wait race causing sem_post to return EINVAL
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: critical
          Priority: P2
         Component: nptl
        AssignedTo: drepper.fsp@gmail.com
        ReportedBy: dhatch@ilm.com

Created attachment 5671
  --> http://sourceware.org/bugzilla/attachment.cgi?id=5671
the test program, to be run in gdb as described

There appears to be a race in the implementation of sem_post/sem_wait on
AMD64 (nptl/sysdeps/unix/sysv/linux/x86_64/sem_post.S in the source code)
which sometimes causes sem_post to access freed memory and to fail with
EINVAL. In a nutshell, if sem_post happens to go to sleep right after it
increments sem->value but before it looks at sem->nwaiters, another thread
can sail through a sem_wait without blocking and destroy the semaphore, so
that when the sem_post thread wakes up and looks at sem->nwaiters, it is
looking at already-freed (and possibly unmapped) memory.

The bug was originally filed as Gentoo bug 93366
( http://bugs.gentoo.org/show_bug.cgi?id=93366 ).

It's extremely hard to reproduce, and I don't have a simple program that
can demonstrate the problem reliably by just running it (for less than a
million years). But it can be reproduced consistently either by hacking up
the sem_post source code and adding a sleep() at a crucial point, or by
carefully stopping and resuming the threads in a debugger with
thread-specific breakpoints. I'll include instructions for doing the latter
using gdb >=7.1.

We're observing the problem on an AMD64 machine running RHEL5.3 Linux, with
glibc-2.5-34.el5_3.1 and gcc-4.1.2-44.el5, which I know is ancient. However,
I also downloaded the most current glibc source code today and compiled the
sem_post.S and sem_wait.S from it, and I can still reproduce the problem
using those.

Here are the instructions for reproducing the problem using gdb 7.1 or 7.2
on the attached program (gdb 7.0.1 and earlier fail with a supposed syntax
error on the "b *(sem_post+18) thread 3").

    % gcc -Wall -g semtest.c -lpthread -o semtest
    % gdb ./semtest
        # per http://sourceware.org/gdb/onlinedocs/gdb/Non_002dStop-Mode.html ...
        # Enable the async interface.
        set target-async 1
        # If using the CLI, pagination breaks non-stop.
        set pagination off
        # Finally, turn it on!
        set non-stop on

        b waiter
        b poster
        r
        # thread 2 stops in waiter
        # thread 3 stops in poster
        t 2
        b sem_wait thread 2
        c
        # thread 2 (waiter) stops at the beginning of sem_wait(varsem)
        disas sem_post
        # look for the "cmpq $0x0,0x8(%rdi)" and put a breakpoint there.
        # in older versions it's sem_post+4;
        # in newer versions it's sem_post+18.
        t 3
        b *(sem_post+18) thread 3    <-- or sem_post+4 or whatever
        c
        # thread 3 (poster) stops at the breakpoint inside sem_post,
        # after incrementing varsem->value (4-byte value 0 bytes into the object)
        # but before looking at varsem->nwaiters (8-byte value 8 bytes into the object)
        t 2
        b free thread 2
        c
        # thread 2 (waiter) sails through the sem_wait without blocking,
        # calls sem_destroy(varsem),
        # trashes the memory,
        # and stops at the beginning of free
        t 3
        c
        # thread 3 (poster) resumes in the middle of sem_post,
        # looks at varsem->nwaiters and sees it's nonzero (trash)
        # so it makes the FUTEX_WAKE syscall which returns EINVAL,
        # the program exits with error message
        # "sem_post() in poster: Invalid argument"

I hope I am not overinflating this bug's severity by calling it "critical"
("major" would feel more appropriate to me, but there seems to be no "major"
option, only "normal" and "critical"). Although failure is rare, we are
about to be forced to implement our own semaphores rather than using the
posix semaphores because of this bug, so it does seem rather severe.
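For readers without the attachment, the usage pattern described in this report can be sketched roughly as below. This is not the attached test program (attachment 5671). The names waiter, poster and varsem follow the gdb notes, while run_once and the memset used to "trash" the memory are assumptions made for illustration. Running it normally is not expected to trigger the race, which per the report needs the gdb steps or an added sleep() inside sem_post.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the pattern in the report, NOT attachment 5671. */

static void *waiter(void *arg)
{
    sem_t *sem = arg;

    if (sem_wait(sem) != 0) {
        perror("sem_wait() in waiter");
        exit(1);
    }
    /* sem_wait has returned, so this thread treats the semaphore as
       finished with. Nothing here can tell whether the poster is still
       inside sem_post, which is the window the report describes. */
    sem_destroy(sem);
    memset(sem, 0xff, sizeof *sem);   /* assumption: "trashes the memory" */
    free(sem);
    return NULL;
}

static void *poster(void *arg)
{
    sem_t *sem = arg;

    if (sem_post(sem) != 0) {
        /* The failure reported: "sem_post() in poster: Invalid argument" */
        perror("sem_post() in poster");
        exit(1);
    }
    return NULL;
}

/* Runs one waiter/poster pair on a heap-allocated semaphore.
   Returns 0 if both threads finished without an error, -1 on setup failure. */
int run_once(void)
{
    pthread_t w, p;
    sem_t *varsem = malloc(sizeof *varsem);

    if (varsem == NULL || sem_init(varsem, 0, 0) != 0)
        return -1;
    if (pthread_create(&w, NULL, waiter, varsem) != 0)
        return -1;
    if (pthread_create(&p, NULL, poster, varsem) != 0)
        return -1;
    pthread_join(w, NULL);
    pthread_join(p, NULL);
    return 0;
}
```

Calling run_once() from a main() and building with "gcc -Wall -g ... -lpthread" as above gives something that can be stepped through with the same breakpoints, assuming the thread numbering comes out the same (waiter created first, then poster).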