From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (qmail 16456 invoked by alias); 17 Jun 2008 00:19:25 -0000
Received: (qmail 16039 invoked by uid 48); 17 Jun 2008 00:18:47 -0000
Date: Tue, 17 Jun 2008 00:19:00 -0000
Message-ID: <20080617001847.16038.qmail@sourceware.org>
From: "twong at gear6 dot com"
To: glibc-bugs@sources.redhat.com
In-Reply-To: <20070328191239.4294.Matthew.L.Dunkle@nasa.gov>
References: <20070328191239.4294.Matthew.L.Dunkle@nasa.gov>
Reply-To: sourceware-bugzilla@sourceware.org
Subject: [Bug nptl/4294] rwlock hangs under stress load
X-Bugzilla-Reason: CC
Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: glibc-bugs-owner@sourceware.org
X-SW-Source: 2008-06/txt/msg00023.txt.bz2

------- Additional Comments From twong at gear6 dot com  2008-06-17 00:18 -------
In fact, we hit the same problem in the past few weeks. We are running a 2.6.12 kernel on AMD processors. After heavy use of the rwlock, we found that the rwlock's reader count can go negative (sometimes -1, -2, -3, etc.) when in fact there were no readers. The system would then hang because the writers could never get the rwlock.

We added some code to keep a separate reader count (protected by a different mutex), and we stop when we detect -1 in the rwlock. In one case, our independent log shows that there were 2 simultaneous reads, so the reader count should be 2 after the rwlock is acquired. However, it remains at 1. It seems that one of the incl instructions did not take effect.

This is reproduced by running our reflexOS, an NFS cache application with a very large number of network connections, which uses the rwlock. We can reproduce this problem in 30 minutes to 1 hour with our code. We have written smaller standalone programs to try to reproduce the problem, but we could not. We believe it has a lot to do with the workload and memory usage.

Unfortunately, no root cause is reported here, but we will switch to a mutex as suggested. We suspect it has something to do with the memory barrier behavior of AMD processors.

-- 
http://sourceware.org/bugzilla/show_bug.cgi?id=4294
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.