public inbox for libc-ports@sourceware.org
* Coldfire __lll_lock fails under heavy system stress
@ 2012-10-31 20:23 Ed Slas
  2012-10-31 20:43 ` Joseph S. Myers
  0 siblings, 1 reply; 6+ messages in thread
From: Ed Slas @ 2012-10-31 20:23 UTC (permalink / raw)
  To: libc-ports



I have been able to show that the low-level locks do not work properly on ColdFire CPUs.


Under heavy system stress, with an application that uses real-time threads, the low-level lock implementation fails with several symptoms.
I have a test app that creates 40 RT threads; each thread mallocs 500 bytes, fills the buffer, and checks that it was filled, repeating this a million times.
While this is running, one of the following will happen:
- A thread stops running, waiting on a lock that no one holds.
- The app dies with a segmentation fault.
- glibc aborts, complaining about a corrupted double-linked list.


ports/sysdeps/unix/sysv/linux/m68k/nptl/lowlevellock.h
uses
atomic_compare_and_exchange_val_acq
to implement locks. 
For coldfire, atomic_compare_and_exchange_val_acq is implemented in ports/sysdeps/unix/sysv/linux/m68k/coldfire/nptl/bits/atomic.h
with the system call
atomic_cmpxchg_32

I rarely delve this low into the code, so I cannot explain why this implementation fails, but it is very suspicious that the kernel (2.6.38) interrupt handler repositions the PC if it sees the PC at interrupt was in atomic_cmpxchg_32(). 

In an effort to work around my problem, I wrote a user-space lock using the m68k TAS instruction. This instruction only works on a single bit, so I could not implement the 'waiters' condition, where the futex is changed from 1 to 2.
Lacking the waiters condition, I simply call FUTEX_WAKE on every unlock.
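
A minimal sketch of a lock along these lines, assuming Linux and C11 atomics. It is not the code from this message, which was not posted. The names `tas_lock`, `futex_call`, `tas_lock_acquire`, and `tas_lock_release` are hypothetical, and `atomic_exchange(..., 1)` on a whole word stands in for the TAS instruction.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock type: 0 = free, 1 = held.
   There is no "2 = held, with waiters" state. */
typedef struct { atomic_int word; } tas_lock;

/* Thin wrapper around the raw futex system call. */
static long futex_call(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void tas_lock_acquire(tas_lock *l)
{
    /* Writing 1 and getting the old value back is the stand-in for TAS.
       If the old value was already 1, someone else holds the lock. */
    while (atomic_exchange_explicit(&l->word, 1, memory_order_acquire) != 0)
        /* The kernel only puts us to sleep if the word is still 1. */
        futex_call(&l->word, FUTEX_WAIT_PRIVATE, 1);
}

static void tas_lock_release(tas_lock *l)
{
    int prev = atomic_exchange_explicit(&l->word, 0, memory_order_release);
    assert(prev == 1); /* releasing a lock that is not held is a bug */
    /* Without a waiters state we cannot tell whether anyone sleeps,
       so every release pays for a FUTEX_WAKE system call. */
    futex_call(&l->word, FUTEX_WAKE_PRIVATE, 1);
}
```

The cost of this design is one system call per unlock even when the lock is uncontended, which is what the 1-to-2 waiters transition normally avoids.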

Using these modifications, I was able to run my test app successfully and ultimately my real application no longer exhibits these issues. 

Not being a maintainer, I do not have the wherewithal to submit a patch. However, I am willing to work with anyone interested in getting this fixed in the m68k/coldfire port.

Anyone interested?

-Ed


* Re: Coldfire __lll_lock fails under heavy system stress
  2012-10-31 20:23 Coldfire __lll_lock fails under heavy system stress Ed Slas
@ 2012-10-31 20:43 ` Joseph S. Myers
  2012-11-01 14:57   ` Ed Slas
  0 siblings, 1 reply; 6+ messages in thread
From: Joseph S. Myers @ 2012-10-31 20:43 UTC (permalink / raw)
  To: Ed Slas; +Cc: libc-ports

On Wed, 31 Oct 2012, Ed Slas wrote:

> I rarely delve this low into the code, so I cannot explain why this 
> implementation fails, but it is very suspicious that the kernel (2.6.38) 
> interrupt handler repositions the PC if it sees the PC at interrupt was 
> in atomic_cmpxchg_32().

Given what you have described, there is no evidence for a bug in glibc 
rather than the kernel (and so no evidence that the problem you see should 
be fixed in glibc rather than the kernel).

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Coldfire __lll_lock fails under heavy system stress
  2012-10-31 20:43 ` Joseph S. Myers
@ 2012-11-01 14:57   ` Ed Slas
  2012-11-01 16:52     ` Joseph S. Myers
  0 siblings, 1 reply; 6+ messages in thread
From: Ed Slas @ 2012-11-01 14:57 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: libc-ports

Joseph,

Thanks for your time. I understand the kernel's atomic_cmpxchg_32() is most likely the issue, but note that most of the other platforms use an atomic lock in user space, then resort to the kernel to arbitrate contention. The ColdFire port makes the atomic_cmpxchg_32 kernel call first, even though a user-space atomic primitive (the TAS instruction) is available.


That said, I believe I have a 'better' way, i.e. one that follows the design of the other ports more closely and eliminates the dependency on the buggy atomic_cmpxchg_32.

Also note that the ColdFire TAS instruction was not added until the v4 version of this chip, so this is probably not a good fit for patching tip anyhow.

Ed





* Re: Coldfire __lll_lock fails under heavy system stress
  2012-11-01 14:57   ` Ed Slas
@ 2012-11-01 16:52     ` Joseph S. Myers
  2012-11-02  3:35       ` Carlos O'Donell
  0 siblings, 1 reply; 6+ messages in thread
From: Joseph S. Myers @ 2012-11-01 16:52 UTC (permalink / raw)
  To: Ed Slas; +Cc: libc-ports


On Thu, 1 Nov 2012, Ed Slas wrote:

> Thanks for your time. I understand the kernel's atomic_cmpxchg_32() is 
> most likely the issue, but note that most of the other platforms use a 
> atomic lock in user space, then resort to the kernel to arbitrate 
> contentions. The Coldfire port makes the atomic_cmpxchg_32 kernel call 
> first, when there is a user space atomic lock available (TAS 
> instruction).

I don't believe TAS is sufficient to implement a general 
compare-and-exchange operation, such as is expected by NPTL.  The syscall 
is used because the ColdFire architecture has neither an atomic 
compare-and-exchange instruction, nor load-locked / store-conditional that 
are used on some other architectures to implement compare-and-exchange in 
userspace.
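
To illustrate why compare-and-exchange is needed, here is a minimal sketch of the well-known three-state futex lock (0 = free, 1 = held, 2 = held with possible waiters), assuming Linux and C11 atomics. It is not glibc's lowlevellock code; `futex_call`, `three_state_lock`, and `three_state_unlock` are hypothetical names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw futex system call. */
static long futex_call(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void three_state_lock(atomic_int *f)
{
    int c = 0;
    /* Fast path: store 1 only if the word is still 0.  This conditional
       store is the compare-and-exchange; a test-and-set of one bit
       cannot express "0 -> 1, but leave a 2 untouched". */
    if (atomic_compare_exchange_strong(f, &c, 1))
        return;
    /* Slow path: c now holds the value we saw.  Mark the lock as
       contended, then sleep for as long as the word stays 2. */
    if (c != 2)
        c = atomic_exchange(f, 2);
    while (c != 0) {
        futex_call(f, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(f, 2);
    }
}

static void three_state_unlock(atomic_int *f)
{
    int prev = atomic_exchange(f, 0);
    assert(prev != 0); /* unlocking a free lock is a bug */
    /* Enter the kernel only when a waiter may be sleeping. */
    if (prev == 2)
        futex_call(f, FUTEX_WAKE_PRIVATE, 1);
}
```

An uncontended lock and unlock pair in this sketch never makes a futex call, provided the compare-and-exchange itself does not require one.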

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Coldfire __lll_lock fails under heavy system stress
  2012-11-01 16:52     ` Joseph S. Myers
@ 2012-11-02  3:35       ` Carlos O'Donell
  2012-11-02 12:02         ` Chris Metcalf
  0 siblings, 1 reply; 6+ messages in thread
From: Carlos O'Donell @ 2012-11-02  3:35 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Ed Slas, libc-ports

On Thu, Nov 1, 2012 at 12:51 PM, Joseph S. Myers
<joseph@codesourcery.com> wrote:
> On Thu, 1 Nov 2012, Ed Slas wrote:
>
>> Thanks for your time. I understand the kernel's atomic_cmpxchg_32() is
>> most likely the issue, but note that most of the other platforms use a
>> atomic lock in user space, then resort to the kernel to arbitrate
>> contentions. The Coldfire port makes the atomic_cmpxchg_32 kernel call
>> first, when there is a user space atomic lock available (TAS
>> instruction).
>
> I don't believe TAS is sufficient to implement a general
> compare-and-exchange operation, such as is expected by NPTL.  The syscall
> is used because the ColdFire architecture has neither an atomic
> compare-and-exchange instruction, nor load-locked / store-conditional that
> are used on some other architectures to implement compare-and-exchange in
> userspace.

Correct, TAS is not sufficient. You really do need to be able to CAS
in both userspace *and* in the kernel for futexes to be useful.

One defect on HP-PARISC was that our kernel-helper CAS didn't
coordinate with the futex syscall.

We fixed this by having the kernel-helper CAS use the same locks as
the futex syscall would use in order to complete the futex operation
when required.

Cheers,
Carlos.


* Re: Coldfire __lll_lock fails under heavy system stress
  2012-11-02  3:35       ` Carlos O'Donell
@ 2012-11-02 12:02         ` Chris Metcalf
  0 siblings, 0 replies; 6+ messages in thread
From: Chris Metcalf @ 2012-11-02 12:02 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Joseph S. Myers, Ed Slas, libc-ports

On 11/1/2012 11:35 PM, Carlos O'Donell wrote:
> On Thu, Nov 1, 2012 at 12:51 PM, Joseph S. Myers
> <joseph@codesourcery.com> wrote:
>> On Thu, 1 Nov 2012, Ed Slas wrote:
>>
>>> Thanks for your time. I understand the kernel's atomic_cmpxchg_32() is
>>> most likely the issue, but note that most of the other platforms use a
>>> atomic lock in user space, then resort to the kernel to arbitrate
>>> contentions. The Coldfire port makes the atomic_cmpxchg_32 kernel call
>>> first, when there is a user space atomic lock available (TAS
>>> instruction).
>> I don't believe TAS is sufficient to implement a general
>> compare-and-exchange operation, such as is expected by NPTL.  The syscall
>> is used because the ColdFire architecture has neither an atomic
>> compare-and-exchange instruction, nor load-locked / store-conditional that
>> are used on some other architectures to implement compare-and-exchange in
>> userspace.
> Correct, TAS is not sufficient. You really do need to be able to CAS
> in both userspace *and* in the kernel for futexes to be useful.
>
> One defect on HP-PARISC was that our kernel-helper CAS didn't
> coordinate with the futex syscall.
>
> We fixed this by having the kernel-helper CAS use the same locks as
> the futex syscall would use in order to complete the futex operation
> when required.

Our older 32-bit tilepro architecture has this same issue of supporting
only a single not-very-powerful atomic primitive, "tns".  It has the
semantics of "atomic_exchange(1)", i.e. you write the 32-bit value "1" and
get back the old value.  In the end we provided a kernel fastpath cmpxchg()
operation (as well as a few other atomic update primitives like "add",
"and", and "or"), and we use the kernel cmpxchg() in the glibc fastpath. 
The kernel fastpath is really much faster than a regular syscall, though. 
We leave interrupts disabled throughout, don't save/restore any registers,
and just take some bits in VA and hash them into an array of "tns" locks to
implement atomicity.  When cache/TLB is hot the whole syscall takes only
about 50 cycles.  (And note that kernel locks, futex locks, and the fast
atomic syscalls all coordinate with each other on tilepro.)
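
The hashed-lock idea in the paragraph above can be sketched in portable C11 as follows. This is a user-space illustration only, not the tilepro kernel code: the names `tns_locks`, `lock_for`, and `emulated_cmpxchg`, the table size, and the hash are all assumptions, and `atomic_exchange(..., 1)` stands in for "tns".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NLOCKS 64 /* assumed table size; any power of two would do */

/* One test-and-set lock word per hash bucket: 0 = free, 1 = held. */
static atomic_int tns_locks[NLOCKS];

/* Hash some bits of the target address into the lock table. */
static atomic_int *lock_for(const void *addr)
{
    uintptr_t bits = (uintptr_t)addr >> 2; /* drop the word-offset bits */
    return &tns_locks[bits % NLOCKS];
}

/* Compare-and-exchange built from test-and-set: returns the old value
   and stores `desired` only if the old value equals `expected`. */
static int emulated_cmpxchg(int *addr, int expected, int desired)
{
    assert(addr != NULL);
    atomic_int *lk = lock_for(addr);

    /* "tns": write 1, get the old value; spin until we saw 0. */
    while (atomic_exchange_explicit(lk, 1, memory_order_acquire) != 0)
        ;

    int old = *addr;
    if (old == expected)
        *addr = desired;

    atomic_store_explicit(lk, 0, memory_order_release);
    return old;
}
```

This only gives atomicity if every access to the target word goes through the same lock table. In user space a thread can also be preempted while holding a bucket lock, which is why the real fastpath described above runs in the kernel with interrupts disabled.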

We tried implementing pthread locks with "tns" in userspace, but it's
tricky because you need that extra bit of state to track whether the mutex
is contended.  We ended up just using our kernel fastpath for everything. 
(Well, we use "tns" for pthread_spinlock.)

One approach we rejected early on, because it seemed hard to get right, was
to use two words of state for glibc's lowlevellock, and use "tns" on a
spinlock word plus having a value word that held the lock state (0, 1, or
2).  One problem with this approach is that if a thread gets
context-switched while holding the "tns" lock and before completing the
read-modify-write of the value portion of the lock, the lock gets frozen
and everyone else ends up busywaiting on the spinlock part.  Real atomic
instructions are much more convenient; good thing the Tilera chip architects
listened to us software folks for the current generation processor :-)

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com

