* Priority Inversion and Unlimited Spin of pthread_rwlock_t
@ 2024-03-12 3:19 Peng Zheng
2024-03-12 3:48 ` Peng Zheng
2024-03-12 14:39 ` Florian Weimer
0 siblings, 2 replies; 5+ messages in thread
From: Peng Zheng @ 2024-03-12 3:19 UTC (permalink / raw)
To: libc-alpha
Hi,
I found that there are several unbounded spin loops in the current
pthread_rwlock implementation.
As a result, it suffers from the same problem as the user-space spinlocks
discussed in this LWN article ([0]):
> One thread might be spinning on a lock while the holder has been
> preempted and isn't running at all. In such cases, the lock will not be
> released soon, and the spinning just wastes CPU time. In the worst case,
> the thread that is spinning may be the one that is keeping the lock
> holder from running, meaning that the spinning thread is actively
> preventing the lock it needs from being released. In such situations,
> the code should simply stop spinning and go to sleep until the lock is
> released.
I just encountered one such issue in an embedded Linux system: there
were several readers running at SCHED_RR priority and one writer running
at SCHED_OTHER.
It turned out that two of the high-priority readers were spinning (each
consuming 100% of a CPU) in the loop near the end of
`__pthread_rwlock_rdlock_full`:
for (;;)
  {
    while (((wpf = atomic_load_relaxed (&rwlock->__data.__wrphase_futex))
            | PTHREAD_RWLOCK_FUTEX_USED) == (1 | PTHREAD_RWLOCK_FUTEX_USED))
      {
        /* omitted */
      }
    if (ready)
      /* See below.  */
      break;
    if ((atomic_load_acquire (&rwlock->__data.__readers)
         & PTHREAD_RWLOCK_WRPHASE) == 0)
      ready = true;
  }
return 0;
Meanwhile, the SCHED_OTHER writer was just about to set `__wrphase_futex`
in `__pthread_rwlock_wrlock_full` (just one ARM instruction away), but it
was never able to do so because the two readers consumed nearly all
available CPU time:
while ((r & PTHREAD_RWLOCK_WRPHASE) == 0
       && (r >> PTHREAD_RWLOCK_READER_SHIFT) == 0)
  {
    if (atomic_compare_exchange_weak_acquire (&rwlock->__data.__readers,
                                              &r, r | PTHREAD_RWLOCK_WRPHASE))
      {
        atomic_store_relaxed (&rwlock->__data.__wrphase_futex, 1);
        /* writer was stuck HERE! */
        goto done;
      }
    /* TODO Back-off.  */
  }
In ARM assembly:
mov r3, #1        ; the writer is stuck HERE!
str r3, [r12, #8] ; r12 holds the address of rwlock->__data, and 8 is the
                  ; offset of __wrphase_futex in __data
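For reference, here is a minimal reproducer sketch (not taken from the
affected system; the two readers, the single writer, the priority value,
and the pinning of everything to CPU 0 are just illustrative assumptions).
Once the SCHED_OTHER writer happens to be preempted between the CAS and
the `__wrphase_futex` store, the SCHED_RR readers should spin forever and
the writer's progress messages should stop:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;

/* Pin the calling thread to CPU 0 so that the readers and the writer
   compete for the same CPU, as on a small embedded system.  */
static void pin_to_cpu0 (void)
{
  cpu_set_t set;
  CPU_ZERO (&set);
  CPU_SET (0, &set);
  pthread_setaffinity_np (pthread_self (), sizeof set, &set);
}

static void *reader (void *arg)
{
  pin_to_cpu0 ();
  for (;;)
    {
      pthread_rwlock_rdlock (&lock);
      pthread_rwlock_unlock (&lock);
    }
  return NULL;
}

static void *writer (void *arg)
{
  pin_to_cpu0 ();
  for (unsigned long i = 0; ; i++)
    {
      pthread_rwlock_wrlock (&lock);
      pthread_rwlock_unlock (&lock);
      if (i % 100000 == 0)
        fprintf (stderr, "writer still alive: %lu\n", i);
    }
  return NULL;
}

int main (void)
{
  pthread_t w, r1, r2;
  pthread_attr_t attr;
  struct sched_param sp = { .sched_priority = 1 };

  /* Readers run at SCHED_RR; this needs CAP_SYS_NICE or root.  */
  pthread_attr_init (&attr);
  pthread_attr_setinheritsched (&attr, PTHREAD_EXPLICIT_SCHED);
  pthread_attr_setschedpolicy (&attr, SCHED_RR);
  pthread_attr_setschedparam (&attr, &sp);

  pthread_create (&w, NULL, writer, NULL);   /* SCHED_OTHER writer */
  pthread_create (&r1, &attr, reader, NULL);
  pthread_create (&r2, &attr, reader, NULL);

  pthread_join (w, NULL);
  return 0;
}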
Unbounded user-space spinning is too dangerous to rely on. How about
limiting the total number of spins before suspending on a futex? Or
using rseq, as mentioned in the LWN article [0]?
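By the first option I mean something along the lines of the usual
spin-then-block pattern, sketched below against a generic futex word.
This is only an illustration of the idea; MAX_SPIN, the helper name, and
the bare futex word are my own assumptions, not a patch against the
rwlock internals (where the waker side would also have to set and honour
PTHREAD_RWLOCK_FUTEX_USED):

#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

#define MAX_SPIN 100

/* Wait until *futex_word no longer contains VAL: spin a bounded number
   of times, then let the kernel block us instead of burning CPU.  */
static void
bounded_wait (atomic_uint *futex_word, unsigned int val)
{
  for (int i = 0; i < MAX_SPIN; i++)
    if (atomic_load_explicit (futex_word, memory_order_relaxed) != val)
      return;

  /* Spinning did not help; the holder is probably not running.  Block
     in the kernel until a waker changes the word and calls FUTEX_WAKE.  */
  while (atomic_load_explicit (futex_word, memory_order_relaxed) == val)
    syscall (SYS_futex, futex_word, FUTEX_WAIT_PRIVATE, val, NULL, NULL, 0);
}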
Any ideas?
[0] https://lwn.net/Articles/944895/
Regards,
--
Peng Zheng
* Re: Priority Inversion and Unlimited Spin of pthread_rwlock_t
2024-03-12 3:19 Priority Inversion and Unlimited Spin of pthread_rwlock_t Peng Zheng
@ 2024-03-12 3:48 ` Peng Zheng
2024-03-14 7:32 ` Peng Zheng
2024-03-12 14:39 ` Florian Weimer
1 sibling, 1 reply; 5+ messages in thread
From: Peng Zheng @ 2024-03-12 3:48 UTC (permalink / raw)
To: libc-alpha
On 2024/3/12 11:19, Peng Zheng wrote:
> Hi,
>
> I found that there are several unbounded spin loops in the current
> pthread_rwlock implementation.
> As a result, it suffers from the same problem as the user-space spinlocks
> discussed in this LWN article ([0]):
I forgot to mention that this issue is glibc-specific.
I just checked musl's source code: all spinning in its pthread_rwlock_t
implementation is limited to 100 iterations. For example:
int __pthread_rwlock_timedwrlock(pthread_rwlock_t *restrict rw,
                                 const struct timespec *restrict at)
{
        int r, t;

        r = pthread_rwlock_trywrlock(rw);
        if (r != EBUSY) return r;

        int spins = 100;
        while (spins-- && rw->_rw_lock && !rw->_rw_waiters) a_spin();

        while ((r=__pthread_rwlock_trywrlock(rw))==EBUSY) {
                if (!(r=rw->_rw_lock)) continue;
                t = r | 0x80000000;
                a_inc(&rw->_rw_waiters);
                a_cas(&rw->_rw_lock, r, t);
                r = __timedwait(&rw->_rw_lock, t, CLOCK_REALTIME, at, rw->_rw_shared^128);
                a_dec(&rw->_rw_waiters);
                if (r && r != EINTR) return r;
        }
        return r;
}
Regards,
--
Peng Zheng
* Re: Priority Inversion and Unlimited Spin of pthread_rwlock_t
2024-03-12 3:19 Priority Inversion and Unlimited Spin of pthread_rwlock_t Peng Zheng
2024-03-12 3:48 ` Peng Zheng
@ 2024-03-12 14:39 ` Florian Weimer
2024-03-13 1:48 ` Peng Zheng
1 sibling, 1 reply; 5+ messages in thread
From: Florian Weimer @ 2024-03-12 14:39 UTC (permalink / raw)
To: Peng Zheng; +Cc: libc-alpha
* Peng Zheng:
> Meanwhile, the SCHED_OTHER writer was just about to set `__wrphase_futex`
> in `__pthread_rwlock_wrlock_full` (just one ARM instruction away), but it
> was never able to do so because the two readers consumed nearly all
> available CPU time:
>
> while ((r & PTHREAD_RWLOCK_WRPHASE) == 0
>        && (r >> PTHREAD_RWLOCK_READER_SHIFT) == 0)
>   {
>     if (atomic_compare_exchange_weak_acquire (&rwlock->__data.__readers,
>                                               &r, r | PTHREAD_RWLOCK_WRPHASE))
>       {
>         atomic_store_relaxed (&rwlock->__data.__wrphase_futex, 1); /* writer was stuck HERE! */
>         goto done;
>       }
>     /* TODO Back-off.  */
>   }
Is this about filling in the TODO?
> Unbounded user-space spinning is too dangerous to rely on. How about
> limiting the total number of spins before suspending on a futex? Or
> using rseq, as mentioned in the LWN article [0]?
>
> [0] https://lwn.net/Articles/944895/
The LWN article does not talk about priority inversion. With rseq,
userspace can detect preemption, but that may never happen with certain
realtime scheduling priorities.
Thanks,
Florian
* Re: Priority Inversion and Unlimited Spin of pthread_rwlock_t
2024-03-12 14:39 ` Florian Weimer
@ 2024-03-13 1:48 ` Peng Zheng
0 siblings, 0 replies; 5+ messages in thread
From: Peng Zheng @ 2024-03-13 1:48 UTC (permalink / raw)
To: Florian Weimer; +Cc: libc-alpha
On 2024/3/12 22:39, Florian Weimer wrote:
> * Peng Zheng:
>
>> Meanwhile, the SCHED_OTHER writer was just about to set `__wrphase_futex`
>> in `__pthread_rwlock_wrlock_full` (just one ARM instruction away), but it
>> was never able to do so because the two readers consumed nearly all
>> available CPU time:
>>
>> while ((r & PTHREAD_RWLOCK_WRPHASE) == 0
>>        && (r >> PTHREAD_RWLOCK_READER_SHIFT) == 0)
>>   {
>>     if (atomic_compare_exchange_weak_acquire (&rwlock->__data.__readers,
>>                                               &r, r | PTHREAD_RWLOCK_WRPHASE))
>>       {
>>         atomic_store_relaxed (&rwlock->__data.__wrphase_futex, 1); /* writer was stuck HERE! */
>>         goto done;
>>       }
>>     /* TODO Back-off.  */
>>   }
>
> Is this about filling in the TODO?
No, I forgot to remove my inline comments from the quoted source.
It is about a low-priority (SCHED_OTHER) writer that was about to acquire
the lock, was preempted by high-priority readers, and thus was never able
to set `__wrphase_futex` to 1 (see the "IT IS PREEMPTED HERE" comment
below).
while ((r & PTHREAD_RWLOCK_WRPHASE) == 0
       && (r >> PTHREAD_RWLOCK_READER_SHIFT) == 0)
  {
    if (atomic_compare_exchange_weak_acquire (&rwlock->__data.__readers,
                                              &r, r | PTHREAD_RWLOCK_WRPHASE))
      {
        /* IT IS PREEMPTED HERE */
        atomic_store_relaxed (&rwlock->__data.__wrphase_futex, 1);
        goto done;
      }
  }
And the two high-priority readers were stuck in the loop near the end of
`__pthread_rwlock_rdlock_full`, eating all available CPU:
for (;;)
  {
    while (((wpf = atomic_load_relaxed (&rwlock->__data.__wrphase_futex))
            | PTHREAD_RWLOCK_FUTEX_USED) == (1 | PTHREAD_RWLOCK_FUTEX_USED))
      {
        /* omitted */
      }
    if (ready)
      break;
    if ((atomic_load_acquire (&rwlock->__data.__readers)
         & PTHREAD_RWLOCK_WRPHASE) == 0)
      ready = true;
  }
return 0;
Note that PTHREAD_RWLOCK_WRPHASE had already been set by the preempted
writer, which means `ready` is always false.
Note also that `__wrphase_futex` had not yet been set to 1 by the
preempted writer, which means the readers cannot wait on the futex to
stop spinning.
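Spelling out the loop conditions with that state (my annotation of the
code above, assuming PTHREAD_RWLOCK_FUTEX_USED is 2 as in the current
sources):

/* State as described above:
     __wrphase_futex == 0 (the preempted writer has not stored 1 yet), so
       (wpf | PTHREAD_RWLOCK_FUTEX_USED) == 2
       is not equal to (1 | PTHREAD_RWLOCK_FUTEX_USED) == 3,
       and the inner while loop, the only place that does a futex wait,
       is never entered;
     __readers has PTHREAD_RWLOCK_WRPHASE set (the writer's CAS already
       succeeded), so `ready' never becomes true and the outer for (;;)
       loop spins at full speed.  */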
This illustrates just one of several unbounded-spin possibilities; I have
encountered two or three of them. If you are interested, I can provide
the corresponding postmortem debugging sessions.
--
Peng Zheng
* Re: Priority Inversion and Unlimited Spin of pthread_rwlock_t
2024-03-12 3:48 ` Peng Zheng
@ 2024-03-14 7:32 ` Peng Zheng
0 siblings, 0 replies; 5+ messages in thread
From: Peng Zheng @ 2024-03-14 7:32 UTC (permalink / raw)
To: libc-alpha; +Cc: triegel
On 2024/3/12 11:48, Peng Zheng wrote:
> On 2024/3/12 11:19, Peng Zheng wrote:
>> Hi,
>>
>> I found that there are several unbounded spin loops in the current
>> pthread_rwlock implementation.
>> As a result, it suffers from the same problem as the user-space spinlocks
>> discussed in this LWN article ([0]):
>
> I forgot to mention that this issue is glibc-specific.
> I just checked musl's source code: all spinning in its pthread_rwlock_t
> implementation is limited to 100 iterations. For example:
It seems that this issue was introduced by commit
cc25c8b4c1196a8c29e9a45b1e096b99a87b7f8c, whose design is quite
complicated.
It is highly non-trivial to add bounded spinning to the current
implementation.
This issue has a broad impact because OpenSSL uses wrlock as an ordinary
mutex by default:
https://github.com/openssl/openssl/blob/3cb0755323281267211fbe951b94a2552e99d32a/crypto/threads_pthread.c#L622
And that affects nearly all subsystems of OpenSSL:
https://github.com/openssl/openssl/blob/b372b1f76450acdfed1e2301a39810146e28b02c/crypto/ex_data.c#L50C22-L50C34
I can safely say that nearly any Linux system running real-time tasks
that use OpenSSL may be affected by this issue (for example, real-time
video streaming over a TLS connection).
>
> int __pthread_rwlock_timedwrlock(pthread_rwlock_t *restrict rw,
>                                  const struct timespec *restrict at)
> {
>         int r, t;
>
>         r = pthread_rwlock_trywrlock(rw);
>         if (r != EBUSY) return r;
>
>         int spins = 100;
>         while (spins-- && rw->_rw_lock && !rw->_rw_waiters) a_spin();
>
>         while ((r=__pthread_rwlock_trywrlock(rw))==EBUSY) {
>                 if (!(r=rw->_rw_lock)) continue;
>                 t = r | 0x80000000;
>                 a_inc(&rw->_rw_waiters);
>                 a_cas(&rw->_rw_lock, r, t);
>                 r = __timedwait(&rw->_rw_lock, t, CLOCK_REALTIME, at, rw->_rw_shared^128);
>                 a_dec(&rw->_rw_waiters);
>                 if (r && r != EINTR) return r;
>         }
>         return r;
> }
>
> Regards,
>
--
Peng Zheng