public inbox for libc-help@sourceware.org
* Glibc pthread_rwlock_timed*() Optimization Bug
@ 2021-09-07 21:31 Justin Chen
  2021-09-07 21:33 ` Florian Fainelli
  2021-09-08  6:20 ` Florian Weimer
  0 siblings, 2 replies; 4+ messages in thread
From: Justin Chen @ 2021-09-07 21:31 UTC (permalink / raw)
  To: libc-help; +Cc: Florian Fainelli

Hello all,

We are seeing a faulty compiler optimization with
pthread_rwlock_timed*() when cross-compiling glibc for arm. This is
leading to deadlocks with some of our software.

We are using glibc-2.27, but I see the same issue with the current
master branch. The build below was made from master with the following
configure invocation:
 ../configure --host=arm-unknown-linux-gnueabihf \
   --prefix=/local/users/jc957059/source/glibc/test_lib/lib-built/
 make && make install

The code in question is in nptl/pthread_rwlock_common.c lines 490-501:
              /* We still need to wait for explicit hand-over, but we must
                 not use futex_wait anymore because we would just time out
                 in this case and thus make the spin-waiting we need
                 unnecessarily expensive.  */
              while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
                      | PTHREAD_RWLOCK_FUTEX_USED)
                     == (1 | PTHREAD_RWLOCK_FUTEX_USED))
                {
                  /* TODO Back-off?  */
                }
              ready = true;
              break;

The compiled ASM is the following:
   5dc98:       f043 0302       orr.w   r3, r3, #2
              atomic_thread_fence_acquire ();
              /* We still need to wait for explicit hand-over, but we must
                 not use futex_wait anymore because we would just time out
                 in this case and thus make the spin-waiting we need
                 unnecessarily expensive.  */
              while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
   5dc9c:       2b03            cmp     r3, #3
   5dc9e:       d184            bne.n   5dbaa <__pthread_rwlock_timedrdlock64+0x6e>
   5dca0:       e7fe            b.n     5dca0 <__pthread_rwlock_timedrdlock64+0x164>
We only read __wrphase_futex once, then hit an infinite loop.
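
For illustration, a minimal stand-alone loop (made-up names, not the
glibc sources) shows the same hoisting when the relaxed load is just a
plain access and the file is built with optimization (e.g. -O2):

  /* Hypothetical reproducer, not glibc code.  The literal 2 stands in
     for PTHREAD_RWLOCK_FUTEX_USED, so 1 | 2 == 3 as in the cmp above.  */
  #define my_load_relaxed(mem) (*(mem))   /* plain, non-volatile load */

  unsigned int wrphase_futex;             /* stands in for __wrphase_futex */

  void
  spin_until_handover (void)
  {
    /* Nothing tells the compiler that the value can change behind its
       back, so it may read once and turn the loop into an unconditional
       branch-to-self, like the b.n 5dca0 above.  */
    while ((my_load_relaxed (&wrphase_futex) | 2) == (1 | 2))
      {
        /* spin */
      }
  }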

Adding volatile seems to do the trick.
diff --git a/sysdeps/nptl/bits/struct_rwlock.h b/sysdeps/nptl/bits/struct_rwlock.h
index 2f8b7ac..cd47bd2 100644
--- a/sysdeps/nptl/bits/struct_rwlock.h
+++ b/sysdeps/nptl/bits/struct_rwlock.h
@@ -30,7 +30,7 @@ struct __pthread_rwlock_arch_t
 {
   unsigned int __readers;
   unsigned int __writers;
-  unsigned int __wrphase_futex;
+  volatile unsigned int __wrphase_futex;
   unsigned int __writers_futex;
   unsigned int __pad3;
   unsigned int __pad4;

The compiled ASM with this change (and with a few related declarations
updated to the volatile type) is the following:
   5d2ca:       f043 0302       orr.w   r3, r3, #2
              atomic_thread_fence_acquire ();
              /* We still need to wait for explicit hand-over, but we must
                 not use futex_wait anymore because we would just time out
                 in this case and thus make the spin-waiting we need
                 unnecessarily expensive.  */
              while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
   5d2ce:       2b03            cmp     r3, #3
   5d2d0:       d0fa            beq.n   5d2c8 <__pthread_rwlock_clockrdlock64+0x168>
   5d2d2:       e783            b.n     5d1dc <__pthread_rwlock_clockrdlock64+0x7c>
We no longer have an infinite loop here.

It seems like the compiler is optimizing the loop this way because it
is not informed that the value of __wrphase_futex can be changed in
another context, which I believe should be conveyed with the volatile
qualifier.
Does this analysis look correct?
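
As a side note, I am assuming the non-builtin atomic_load_relaxed
fallback expands to something like a plain access (roughly *(mem); I
have not checked the exact macro), so an alternative to marking the
struct field volatile might be to make the load itself volatile, or to
use the compiler builtin. A rough sketch (made-up names, not a proposed
patch):

  /* Sketch only; these macros are illustrative, not glibc's.  */

  /* (a) force a fresh read through a volatile-qualified pointer */
  #define load_relaxed_volatile(mem) \
    (*(volatile __typeof (*(mem)) *) (mem))

  /* (b) or let the compiler emit the relaxed load itself */
  #define load_relaxed_builtin(mem) \
    __atomic_load_n ((mem), __ATOMIC_RELAXED)

Either form should keep the compiler from caching the value and
hoisting the read out of the spin loop.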

Thank You,
Justin


* Re: Glibc pthread_rwlock_timed*() Optimization Bug
  2021-09-07 21:31 Glibc pthread_rwlock_timed*() Optimization Bug Justin Chen
@ 2021-09-07 21:33 ` Florian Fainelli
  2021-09-08  6:14   ` Florian Weimer
  2021-09-08  6:20 ` Florian Weimer
  1 sibling, 1 reply; 4+ messages in thread
From: Florian Fainelli @ 2021-09-07 21:33 UTC (permalink / raw)
  To: Justin Chen, libc-help, fw



On 9/7/2021 2:31 PM, Justin Chen wrote:
> Hello all,
> 
> We are seeing a faulty compiler optimization with
> pthread_rwlock_timed*() when cross-compiling glibc for arm. This is
> leading to deadlocks with some of our software.

FWIW, this was previously reported, along with a reproducer here:

https://sourceware.org/bugzilla/show_bug.cgi?id=24774

The comment about not triggering bug 24774 does not seem to be related
to that bug; was it a typo?


-- 
Florian


* Re: Glibc pthread_rwlock_timed*() Optimization Bug
  2021-09-07 21:33 ` Florian Fainelli
@ 2021-09-08  6:14   ` Florian Weimer
  0 siblings, 0 replies; 4+ messages in thread
From: Florian Weimer @ 2021-09-08  6:14 UTC (permalink / raw)
  To: Florian Fainelli via Libc-help; +Cc: Justin Chen, Florian Fainelli

* Florian Fainelli via Libc-help:

> On 9/7/2021 2:31 PM, Justin Chen wrote:
>> Hello all,
>> We are seeing a faulty compiler optimization with
>> pthread_rwlock_timed*() when cross-compiling glibc for arm. This is
>> leading to deadlocks with some of our software.
>
> FWIW, this was previously reported, along with a reproducer here:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=24774
>
> The comment about not triggering bug 24774 does not seem to be related
> to that bug; was it a typo?

Yes, it was a typo.

Thanks,
Florian



* Re: Glibc pthread_rwlock_timed*() Optimization Bug
  2021-09-07 21:31 Glibc pthread_rwlock_timed*() Optimization Bug Justin Chen
  2021-09-07 21:33 ` Florian Fainelli
@ 2021-09-08  6:20 ` Florian Weimer
  1 sibling, 0 replies; 4+ messages in thread
From: Florian Weimer @ 2021-09-08  6:20 UTC (permalink / raw)
  To: Justin Chen via Libc-help; +Cc: Justin Chen, Florian Fainelli

* Justin Chen via Libc-help:

> It seems like the compiler is optimizing the loop this way because it
> is not informed that the value of __wrphase_futex can be changed in
> another context, which I believe should be conveyed with the volatile
> qualifier.
> Does this analysis look correct?

Yes, it's a bug/misfeature in our custom atomics implementation.  Thanks
for tracking it down.

Florian


