From: Justin Chen
Date: Tue, 7 Sep 2021 14:31:58 -0700
Subject: Glibc pthread_rwlock_timed*() Optimization Bug
To: libc-help@sourceware.org
Cc: Florian Fainelli
Hello all,

We are seeing a faulty compiler optimization with pthread_rwlock_timed*()
when cross-compiling glibc for arm, which is leading to deadlocks in some
of our software. We are using glibc-2.27, but I see the same issue with
the current master branch.

The library was built from the master branch with the following configure:

  ../configure --host=arm-unknown-linux-gnueabihf --prefix=/local/users/jc957059/source/glibc/test_lib/lib-built/
  make && make install

The code in question is in nptl/pthread_rwlock_common.c, lines 490-501:

  /* We still need to wait for explicit hand-over, but we must not
     use futex_wait anymore because we would just time out in this
     case and thus make the spin-waiting we need unnecessarily
     expensive.  */
  while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
          | PTHREAD_RWLOCK_FUTEX_USED)
         == (1 | PTHREAD_RWLOCK_FUTEX_USED))
    {
      /* TODO Back-off?  */
    }
  ready = true;
  break;

The compiled ASM is the following:

  5dc98: f043 0302  orr.w  r3, r3, #2
        atomic_thread_fence_acquire ();
        /* We still need to wait for explicit hand-over, but we must
           not use futex_wait anymore because we would just time out
           in this case and thus make the spin-waiting we need
           unnecessarily expensive.  */
        while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
  5dc9c: 2b03       cmp    r3, #3
  5dc9e: d184       bne.n  5dbaa <__pthread_rwlock_timedrdlock64+0x6e>
  5dca0: e7fe       b.n    5dca0 <__pthread_rwlock_timedrdlock64+0x164>

Note that __wrphase_futex is read only once, and the branch at 5dca0
jumps to itself, so the wait becomes an infinite loop. Adding volatile
seems to do the trick.
diff --git a/sysdeps/nptl/bits/struct_rwlock.h b/sysdeps/nptl/bits/struct_rwlock.h
index 2f8b7ac..cd47bd2 100644
--- a/sysdeps/nptl/bits/struct_rwlock.h
+++ b/sysdeps/nptl/bits/struct_rwlock.h
@@ -30,7 +30,7 @@ struct __pthread_rwlock_arch_t
 {
   unsigned int __readers;
   unsigned int __writers;
-  unsigned int __wrphase_futex;
+  volatile unsigned int __wrphase_futex;
   unsigned int __writers_futex;
   unsigned int __pad3;
   unsigned int __pad4;

The compiled ASM with this change (and with a few declarations corrected
to the volatile type):

  5d2ca: f043 0302  orr.w  r3, r3, #2
        atomic_thread_fence_acquire ();
        /* We still need to wait for explicit hand-over, but we must
           not use futex_wait anymore because we would just time out
           in this case and thus make the spin-waiting we need
           unnecessarily expensive.  */
        while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
  5d2ce: 2b03       cmp    r3, #3
  5d2d0: d0fa       beq.n  5d2c8 <__pthread_rwlock_clockrdlock64+0x168>
  5d2d2: e783       b.n    5d1dc <__pthread_rwlock_clockrdlock64+0x7c>

There is no longer an infinite loop here: the load is repeated on every
iteration.

It seems the compiler is incorrectly optimizing the loop because it is
not informed that the value of __wrphase_futex can be changed by another
thread, which I believe should be expressed with the volatile qualifier.
Does this analysis look correct?

Thank you,
Justin