From: Justin Chen
Date: Tue, 7 Sep 2021 14:31:58 -0700
Subject: Glibc pthread_rwlock_timed*() Optimization Bug
To: libc-help@sourceware.org
Cc: Florian Fainelli
Hello all,

We are seeing a faulty compiler optimization with pthread_rwlock_timed*()
when cross-compiling glibc for arm, which is leading to deadlocks in some
of our software. We are using glibc-2.27, but I see the same issue with
the current master branch.

The library was built from the master branch with the following configure:

  ../configure --host=arm-unknown-linux-gnueabihf --prefix=/local/users/jc957059/source/glibc/test_lib/lib-built/
  make && make install

The code in question is in nptl/pthread_rwlock_common.c, lines 490-501:

  /* We still need to wait for explicit hand-over, but we must not
     use futex_wait anymore because we would just time out in this
     case and thus make the spin-waiting we need unnecessarily
     expensive.  */
  while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
          | PTHREAD_RWLOCK_FUTEX_USED)
         == (1 | PTHREAD_RWLOCK_FUTEX_USED))
    {
      /* TODO Back-off?  */
    }
  ready = true;
  break;

The compiled ASM is the following:

  5dc98: f043 0302  orr.w  r3, r3, #2
        atomic_thread_fence_acquire ();
        /* We still need to wait for explicit hand-over, but we must
           not use futex_wait anymore because we would just time out
           in this case and thus make the spin-waiting we need
           unnecessarily expensive.  */
        while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
  5dc9c: 2b03       cmp    r3, #3
  5dc9e: d184       bne.n  5dbaa <__pthread_rwlock_timedrdlock64+0x6e>
  5dca0: e7fe       b.n    5dca0 <__pthread_rwlock_timedrdlock64+0x164>

Note that __wrphase_futex is read only once, and the branch at 5dca0
jumps to itself, so the wait becomes an infinite loop. Adding volatile
seems to do the trick.
diff --git a/sysdeps/nptl/bits/struct_rwlock.h b/sysdeps/nptl/bits/struct_rwlock.h
index 2f8b7ac..cd47bd2 100644
--- a/sysdeps/nptl/bits/struct_rwlock.h
+++ b/sysdeps/nptl/bits/struct_rwlock.h
@@ -30,7 +30,7 @@ struct __pthread_rwlock_arch_t
 {
   unsigned int __readers;
   unsigned int __writers;
-  unsigned int __wrphase_futex;
+  volatile unsigned int __wrphase_futex;
   unsigned int __writers_futex;
   unsigned int __pad3;
   unsigned int __pad4;

The compiled ASM with this change (and with a few declarations corrected
to the volatile type):

  5d2ca: f043 0302  orr.w  r3, r3, #2
        atomic_thread_fence_acquire ();
        /* We still need to wait for explicit hand-over, but we must
           not use futex_wait anymore because we would just time out
           in this case and thus make the spin-waiting we need
           unnecessarily expensive.  */
        while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
  5d2ce: 2b03       cmp    r3, #3
  5d2d0: d0fa       beq.n  5d2c8 <__pthread_rwlock_clockrdlock64+0x168>
  5d2d2: e783       b.n    5d1dc <__pthread_rwlock_clockrdlock64+0x7c>

There is no longer an infinite loop here: the load is repeated on every
iteration.

It seems the compiler is incorrectly optimizing the loop because it is
not informed that the value of __wrphase_futex can be changed by another
thread, which I believe should be expressed with the volatile qualifier.
Does this analysis look correct?

Thank you,
Justin