* Glibc pthread_rwlock_timed*() Optimization Bug
@ 2021-09-07 21:31 Justin Chen
2021-09-07 21:33 ` Florian Fainelli
2021-09-08 6:20 ` Florian Weimer
0 siblings, 2 replies; 4+ messages in thread
From: Justin Chen @ 2021-09-07 21:31 UTC (permalink / raw)
To: libc-help; +Cc: Florian Fainelli
Hello all,
We are seeing a faulty compiler optimization with
pthread_rwlock_timed*() when cross-compiling glibc for arm. This is
leading to deadlocks with some of our software.
We are using glibc-2.27, but I see the same issue with the current
master branch. The output below is from a master-branch build
configured as follows:
../configure --host=arm-unknown-linux-gnueabihf
--prefix=/local/users/jc957059/source/glibc/test_lib/lib-built/
make && make install
The code in question is in nptl/pthread_rwlock_common.c, lines 490-501:
  /* We still need to wait for explicit hand-over, but we must
     not use futex_wait anymore because we would just time out
     in this case and thus make the spin-waiting we need
     unnecessarily expensive.  */
  while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
          | PTHREAD_RWLOCK_FUTEX_USED)
         == (1 | PTHREAD_RWLOCK_FUTEX_USED))
    {
      /* TODO Back-off? */
    }
  ready = true;
  break;
The compiled ASM is the following
5dc98: f043 0302 orr.w r3, r3, #2
atomic_thread_fence_acquire ();
/* We still need to wait for explicit hand-over, but we must
not use futex_wait anymore because we would just time out
in this case and thus make the spin-waiting we need
unnecessarily expensive. */
while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
5dc9c: 2b03 cmp r3, #3
5dc9e: d184 bne.n 5dbaa
<__pthread_rwlock_timedrdlock64+0x6e>
5dca0: e7fe b.n 5dca0
<__pthread_rwlock_timedrdlock64+0x164>
We only read __wrphase_futex once then hit an infinite loop.
Adding volatile seems to do the trick.
diff --git a/sysdeps/nptl/bits/struct_rwlock.h
b/sysdeps/nptl/bits/struct_rwlock.h
index 2f8b7ac..cd47bd2 100644
--- a/sysdeps/nptl/bits/struct_rwlock.h
+++ b/sysdeps/nptl/bits/struct_rwlock.h
@@ -30,7 +30,7 @@ struct __pthread_rwlock_arch_t
 {
   unsigned int __readers;
   unsigned int __writers;
-  unsigned int __wrphase_futex;
+  volatile unsigned int __wrphase_futex;
   unsigned int __writers_futex;
   unsigned int __pad3;
   unsigned int __pad4;
The compiled ASM with this change (and with a few related declarations
updated to use the volatile type) is the following:
5d2ca: f043 0302 orr.w r3, r3, #2
atomic_thread_fence_acquire ();
/* We still need to wait for explicit hand-over, but we must
not use futex_wait anymore because we would just time out
in this case and thus make the spin-waiting we need
unnecessarily expensive. */
while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
5d2ce: 2b03 cmp r3, #3
5d2d0: d0fa beq.n 5d2c8
<__pthread_rwlock_clockrdlock64+0x168>
5d2d2: e783 b.n 5d1dc
<__pthread_rwlock_clockrdlock64+0x7c>
We no longer have an infinite loop here.
It seems the compiler is optimizing the loop incorrectly because it is
not informed that the value of __wrphase_futex can be changed in
another context, which I believe should be indicated with the volatile
qualifier.
Does this analysis look correct?
Thank You,
Justin
* Re: Glibc pthread_rwlock_timed*() Optimization Bug
2021-09-07 21:31 Glibc pthread_rwlock_timed*() Optimization Bug Justin Chen
@ 2021-09-07 21:33 ` Florian Fainelli
2021-09-08 6:14 ` Florian Weimer
2021-09-08 6:20 ` Florian Weimer
1 sibling, 1 reply; 4+ messages in thread
From: Florian Fainelli @ 2021-09-07 21:33 UTC (permalink / raw)
To: Justin Chen, libc-help, fw
On 9/7/2021 2:31 PM, Justin Chen wrote:
> Hello all,
>
> We are seeing a faulty compiler optimization with
> pthread_rwlock_timed*() when cross-compiling glibc for arm. This is
> leading to deadlocks with some of our software.
FWIW, this was previously reported, along with a reproducer here:
https://sourceware.org/bugzilla/show_bug.cgi?id=24774
The comment about not triggering bug 24774 does not seem to be related
to that bug; was it a typo?
>
> We are using glibc-2.27, but I see the same issue with the current
> master branch. This is built with the master branch using the
> following configure.
> ../configure --host=arm-unknown-linux-gnueabihf
> --prefix=/local/users/jc957059/source/glibc/test_lib/lib-built/
> make && make install
>
> The code in question is in nptl/pthread_rwlock_common.c line 490-501
>   /* We still need to wait for explicit hand-over, but we must
>      not use futex_wait anymore because we would just time out
>      in this case and thus make the spin-waiting we need
>      unnecessarily expensive.  */
>   while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
>           | PTHREAD_RWLOCK_FUTEX_USED)
>          == (1 | PTHREAD_RWLOCK_FUTEX_USED))
>     {
>       /* TODO Back-off? */
>     }
>   ready = true;
>   break;
>
> The compiled ASM is the following
> 5dc98: f043 0302 orr.w r3, r3, #2
> atomic_thread_fence_acquire ();
> /* We still need to wait for explicit hand-over, but we must
> not use futex_wait anymore because we would just time out
> in this case and thus make the spin-waiting we need
> unnecessarily expensive. */
> while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
> 5dc9c: 2b03 cmp r3, #3
> 5dc9e: d184 bne.n 5dbaa
> <__pthread_rwlock_timedrdlock64+0x6e>
> 5dca0: e7fe b.n 5dca0
> <__pthread_rwlock_timedrdlock64+0x164>
> We only read __wrphase_futex once then hit an infinite loop.
>
> Adding volatile seems to do the trick.
> diff --git a/sysdeps/nptl/bits/struct_rwlock.h
> b/sysdeps/nptl/bits/struct_rwlock.h
> index 2f8b7ac..cd47bd2 100644
> --- a/sysdeps/nptl/bits/struct_rwlock.h
> +++ b/sysdeps/nptl/bits/struct_rwlock.h
> @@ -30,7 +30,7 @@ struct __pthread_rwlock_arch_t
>  {
>    unsigned int __readers;
>    unsigned int __writers;
> -  unsigned int __wrphase_futex;
> +  volatile unsigned int __wrphase_futex;
>    unsigned int __writers_futex;
>    unsigned int __pad3;
>    unsigned int __pad4;
>
> The compiled ASM with this change (and with a few declarations
> corrected with the volatile type)
> 5d2ca: f043 0302 orr.w r3, r3, #2
> atomic_thread_fence_acquire ();
> /* We still need to wait for explicit hand-over, but we must
> not use futex_wait anymore because we would just time out
> in this case and thus make the spin-waiting we need
> unnecessarily expensive. */
> while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
> 5d2ce: 2b03 cmp r3, #3
> 5d2d0: d0fa beq.n 5d2c8
> <__pthread_rwlock_clockrdlock64+0x168>
> 5d2d2: e783 b.n 5d1dc
> <__pthread_rwlock_clockrdlock64+0x7c>
> No longer have infinite loop here.
>
> It seems like the compiler is incorrectly optimizing the loop because
> it is not informed that the value of __wrphase_futex can be changed in
> another context, which I believe should be done with the volatile
> attribute.
> Does this analysis look correct?
>
> Thank You,
> Justin
>
--
Florian
* Re: Glibc pthread_rwlock_timed*() Optimization Bug
2021-09-07 21:33 ` Florian Fainelli
@ 2021-09-08 6:14 ` Florian Weimer
0 siblings, 0 replies; 4+ messages in thread
From: Florian Weimer @ 2021-09-08 6:14 UTC (permalink / raw)
To: Florian Fainelli via Libc-help; +Cc: Justin Chen, Florian Fainelli
* Florian Fainelli via Libc-help:
> On 9/7/2021 2:31 PM, Justin Chen wrote:
>> Hello all,
>> We are seeing a faulty compiler optimization with
>> pthread_rwlock_timed*() when cross-compiling glibc for arm. This is
>> leading to deadlocks with some of our software.
>
> FWIW, this was previously reported, along with a reproducer here:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=24774
>
> The comment about not triggering bug 24774 does not seem to be related
> to that bug; was it a typo?
Yes, it was a typo.
Thanks,
Florian
* Re: Glibc pthread_rwlock_timed*() Optimization Bug
2021-09-07 21:31 Glibc pthread_rwlock_timed*() Optimization Bug Justin Chen
2021-09-07 21:33 ` Florian Fainelli
@ 2021-09-08 6:20 ` Florian Weimer
1 sibling, 0 replies; 4+ messages in thread
From: Florian Weimer @ 2021-09-08 6:20 UTC (permalink / raw)
To: Justin Chen via Libc-help; +Cc: Justin Chen, Florian Fainelli
* Justin Chen via Libc-help:
> It seems like the compiler is incorrectly optimizing the loop because
> it is not informed that the value of __wrphase_futex can be changed in
> another context, which I believe should be done with the volatile
> attribute.
> Does this analysis look correct?
Yes, it's a bug/misfeature in our custom atomics implementation. Thanks
for tracking it down.
Florian