public inbox for glibc-bugs@sourceware.org
* [Bug nptl/4294] New: rwlock hangs under stress load
@ 2007-03-28 18:13 Matthew dot L dot Dunkle at nasa dot gov
  2007-03-30 11:36 ` [Bug nptl/4294] " jakub at redhat dot com
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Matthew dot L dot Dunkle at nasa dot gov @ 2007-03-28 18:13 UTC
  To: glibc-bugs

fedora core 6 x86_64 smp installation.  updated to kernel.org 2.6.20.3 kernel
configured with fully preemptible kernel, including preemptible big kernel lock.
running on a system with 4 dual-core AMD 880 processors and 8 gig of ram.

real-time process with 4 reader threads and 1 writer thread of higher real-time
SCHED_FIFO priority than reader threads.  all 5 threads use "cpu affinity"
setting to each obtain a processor to themselves.  attempted to set rwlock
attributes to "writer preferred", although i'm not certain it worked ("flags"
still appear to be zero in pthread_rwlock_t structure, but maybe i'm looking at
the wrong thing).  approximately 15000 write locks and 4x15000 read locks per
second under full load.

process will eventually get "stuck" with reader threads returning "RESOURCE
TEMPORARILY UNAVAILABLE" status forever after.  amount of time it takes to get
"stuck" is highly variable (can be minutes or hours).

the EAGAIN return status would appear to be indicative of a counter overflow
condition, but i believe it's actually just the opposite, in a roundabout manner
of speaking.  i don't know much about assembly language code, so i tried taking
the "C" code equivalents (instead of the x86_64 assembly functions) for the
pthread_rwlock_rdlock, pthread_rwlock_wrlock, and pthread_rwlock_unlock
functions, and "incorporated them into my process" so to speak, hoping to
duplicate the symptoms, and allowing me to insert some printf statements, which
might shed some light on the problem.

i was able to duplicate the situation, and what appears to be happening to me is
two of the reader threads are simultaneously incrementing the __nr_readers
counter in the pthread_rwlock_t structure, so essentially one of the increments
is "missed".  for example, the __nr_readers counter starts at zero let's say,
both threads increment the counter simultaneously, and it ends up at one, where
it should have ended up at two.  then when the "unlock" call decrements the
counter, it goes to -1 (or 4294967295 as an unsigned 32-bit integer).  the next
time a rdlock is issued, it thinks the counter is about to roll over, and
returns the EAGAIN status.

i thought the low level lock should prevent two threads from simultaneously
incrementing or decrementing those counters, but for some reason that doesn't
seem to be the case.  so perhaps the problem is really with lll_mutex_lock
rather than the rwlock itself; i'm not really sure.

sorry, this is my first bug report, and i didn't know what to fill in for the
host, target, build, triplets, but hopefully i've provided enough information
otherwise.  if not, feel free to e-mail me at Matthew.L.Dunkle@nasa.gov if you
need additional information.

i know this might not be easy to reproduce, especially considering the equipment
i am working with and so forth, but i appreciate whatever efforts you can make.
in the meantime, i am going to attempt to use something else, maybe a plain
vanilla mutex, to see if i can get it working in a different manner.  thank you.

-- 
           Summary: rwlock hangs under stress load
           Product: glibc
           Version: 2.4
            Status: NEW
          Severity: normal
          Priority: P2
         Component: nptl
        AssignedTo: drepper at redhat dot com
        ReportedBy: Matthew dot L dot Dunkle at nasa dot gov
                CC: glibc-bugs at sources dot redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=4294

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.



* [Bug nptl/4294] rwlock hangs under stress load
From: jakub at redhat dot com @ 2007-03-30 11:36 UTC
  To: glibc-bugs


------- Additional Comments From jakub at redhat dot com  2007-03-30 12:36 -------
Please write a small self-contained testcase that shows this.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING



* [Bug nptl/4294] rwlock hangs under stress load
From: Matthew dot L dot Dunkle at nasa dot gov @ 2007-03-30 14:22 UTC
  To: glibc-bugs


------- Additional Comments From Matthew dot L dot Dunkle at nasa dot gov  2007-03-30 15:21 -------
(In reply to comment #1)
> Please write a small self-contained testcase that shows this.

i will try as soon as possible.  fyi - switched program to use mutex lock
instead of rwlock, works perfectly now.


* [Bug nptl/4294] rwlock hangs under stress load
From: Matthew dot L dot Dunkle at nasa dot gov @ 2007-04-09 22:28 UTC
  To: glibc-bugs


------- Additional Comments From Matthew dot L dot Dunkle at nasa dot gov  2007-04-09 23:28 -------
i have thus far been unable to duplicate the problem in a test program that i 
wrote, that i thought would contain the essence of my application, but 
apparently something essential is still missing.

i can send you a copy of that test code, but since it doesn't exhibit the 
problem at this time, i don't think it would do you much good.  as i mentioned 
previously, changing my original application to use a mutex lock instead of a
read/write lock fixed the problem, and the application works fine now.  at this
time, unfortunately, i don't have much spare time to spend on the test software 
to try and replicate the original problem.

so if you'd like to mark this bug as unreproducible, or whatever, that would be
fine with me.  it has at least been documented, so if someone else experiences 
something similar, they might be able to help in recreating the situation.  if 
i am able to get the test software to fail sometime in the future, i will send 
you a copy.  in the meantime, please do whatever you think appropriate with 
this bug report, and thank you for your time.



* [Bug nptl/4294] rwlock hangs under stress load
From: drepper at redhat dot com @ 2007-05-01  6:01 UTC
  To: glibc-bugs


------- Additional Comments From drepper at redhat dot com  2007-05-01 07:01 -------
I doubt it's a real problem.  You might have a memory corruption or so.  The
code is not only in very wide use, it has been looked over many times by
different people.  There were kernel versions which had bugs in the futex code.
There is also a description somewhere of hardware problems on AMD processors
wrt atomic operations.  What really is the cause is for you to decide in case
you care.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |WORKSFORME



* [Bug nptl/4294] rwlock hangs under stress load
From: twong at gear6 dot com @ 2008-06-17  0:19 UTC
  To: glibc-bugs


------- Additional Comments From twong at gear6 dot com  2008-06-17 00:18 -------
In fact, we hit the same problem just in the past few weeks.
We are running a 2.6.12 kernel on AMD processors.

After heavy use of the rwlock, we found that the reader count can go
negative (sometimes -1, -2, -3, etc.) when in fact there were no
readers.  The system would hang because the writers can never get
the rwlock.

We have added some code to keep a separate reader count (protected
by a different mutex), and we stop when we detect -1 in the rwlock.

In one case, our independent log shows that there are 2 simultaneous
reads, so the number of readers should be 2 after the rwlock is acquired.
However, it remains at 1.  It seems that the effect of one of the incl
instructions is lost.

This is reproduced by running our reflexOS, an NFS cache application
with many network connections, which uses the rwlock.  We can reproduce
this problem in 30 minutes to 1 hour with our code.

We have written smaller standalone programs to try to reproduce the
problem, but we could not.  We believe it has a lot to do with the workload
and memory usage.  Unfortunately no root cause is reported here, but
we will switch to a mutex as suggested.

We suspect it has something to do with the memory barrier features
in AMD.


