From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 24964 invoked by alias); 28 Mar 2007 18:13:04 -0000
Received: (qmail 24661 invoked by uid 48); 28 Mar 2007 18:12:41 -0000
Date: Wed, 28 Mar 2007 18:13:00 -0000
From: "Matthew dot L dot Dunkle at nasa dot gov"
To: glibc-bugs@sources.redhat.com
Message-ID: <20070328191239.4294.Matthew.L.Dunkle@nasa.gov>
Reply-To: sourceware-bugzilla@sourceware.org
Subject: [Bug nptl/4294] New: rwlock hangs under stress load
X-Bugzilla-Reason: CC
Mailing-List: contact glibc-bugs-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help:
Sender: glibc-bugs-owner@sourceware.org
X-SW-Source: 2007-03/txt/msg00084.txt.bz2

Fedora Core 6 x86_64 SMP installation, updated to a kernel.org 2.6.20.3 kernel configured as fully preemptible, including the preemptible big kernel lock. Running on a system with four dual-core AMD 880 processors and 8 GB of RAM.

A real-time process runs 4 reader threads and 1 writer thread, the writer at a higher real-time SCHED_FIFO priority than the readers. All 5 threads use CPU affinity so that each gets a processor to itself. I attempted to set the rwlock attributes to "writer preferred", although I'm not certain it worked (the "flags" field still appears to be zero in the pthread_rwlock_t structure, but maybe I'm looking at the wrong thing). Under full load there are approximately 15000 write locks and 4x15000 read locks per second.

The process will eventually get "stuck", with the reader threads returning "resource temporarily unavailable" status forever after. The amount of time it takes to get stuck is highly variable (it can be minutes or hours). The EAGAIN return status would appear to indicate a counter overflow condition, but I believe it's actually just the opposite, in a roundabout manner of speaking.
I don't know much about assembly language, so I took the C equivalents of the pthread_rwlock_rdlock, pthread_rwlock_wrlock, and pthread_rwlock_unlock functions (instead of the x86_64 assembly versions) and incorporated them into my process, hoping to duplicate the symptoms and to let me insert some printf statements that might shed light on the problem.

I was able to duplicate the situation. What appears to be happening is that two of the reader threads simultaneously increment the __nr_readers counter in the pthread_rwlock_t structure, so one of the increments is "missed". For example, say the __nr_readers counter starts at zero: both threads increment it simultaneously, and it ends up at one, where it should have ended up at two. Then when the unlock calls decrement the counter, it goes to -1 (4294967295 as an unsigned 32-bit integer). The next time a rdlock is issued, it thinks the counter is about to roll over and returns EAGAIN.

I thought the low-level lock should prevent two threads from simultaneously incrementing or decrementing those counters, but for some reason that doesn't seem to be the case, so perhaps the problem is really with lll_mutex_lock rather than the rwlock itself; I'm not really sure.

Sorry, this is my first bug report, and I didn't know what to fill in for the host, target, and build triplets, but hopefully I've provided enough information otherwise. If not, feel free to e-mail me at Matthew.L.Dunkle@nasa.gov for additional information. I know this might not be easy to reproduce, especially considering the equipment I am working with, but I appreciate whatever efforts you can make. In the meantime, I am going to attempt to use something else, maybe a plain vanilla mutex, to see if I can get it working in a different manner. Thank you.
--
Summary: rwlock hangs under stress load
Product: glibc
Version: 2.4
Status: NEW
Severity: normal
Priority: P2
Component: nptl
AssignedTo: drepper at redhat dot com
ReportedBy: Matthew dot L dot Dunkle at nasa dot gov
CC: glibc-bugs at sources dot redhat dot com

http://sourceware.org/bugzilla/show_bug.cgi?id=4294