From: Florian Fainelli
To: Justin Chen, libc-help@sourceware.org, fw@sourceware.org
Subject: Re: Glibc pthread_rwlock_timed*() Optimization Bug
Date: Tue, 7 Sep 2021 14:33:57 -0700
Message-ID: <00d50647-d99f-8296-03a1-c9ebc24fbdf3@gmail.com>

On 9/7/2021 2:31 PM, Justin Chen wrote:
> Hello all,
>
> We are seeing a faulty compiler optimization with
> pthread_rwlock_timed*() when cross-compiling glibc for arm. This is
> leading to deadlocks with some of our software.

FWIW, this was previously reported, along with a reproducer, here:

https://sourceware.org/bugzilla/show_bug.cgi?id=24774

The comment about not triggering bug 24774 does not seem to be related
to that bug; was it a typo?

> We are using glibc-2.27, but I see the same issue with the current
> master branch. This is built with the master branch using the
> following configure:
>
> ../configure --host=arm-unknown-linux-gnueabihf
>   --prefix=/local/users/jc957059/source/glibc/test_lib/lib-built/
> make && make install
>
> The code in question is in nptl/pthread_rwlock_common.c, lines 490-501:
>
>       /* We still need to wait for explicit hand-over, but we must
>          not use futex_wait anymore because we would just time out
>          in this case and thus make the spin-waiting we need
>          unnecessarily expensive.  */
>       while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
>               | PTHREAD_RWLOCK_FUTEX_USED)
>              == (1 | PTHREAD_RWLOCK_FUTEX_USED))
>         {
>           /* TODO Back-off?  */
>         }
>       ready = true;
>       break;
>
> The compiled ASM is the following:
>
>    5dc98:       f043 0302       orr.w   r3, r3, #2
>              atomic_thread_fence_acquire ();
>              /* We still need to wait for explicit hand-over, but we must
>                 not use futex_wait anymore because we would just time out
>                 in this case and thus make the spin-waiting we need
>                 unnecessarily expensive.  */
>              while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
>    5dc9c:       2b03            cmp     r3, #3
>    5dc9e:       d184            bne.n   5dbaa <__pthread_rwlock_timedrdlock64+0x6e>
>    5dca0:       e7fe            b.n     5dca0 <__pthread_rwlock_timedrdlock64+0x164>
>
> We only read __wrphase_futex once, then hit an infinite loop.
>
> Adding volatile seems to do the trick:
>
> diff --git a/sysdeps/nptl/bits/struct_rwlock.h b/sysdeps/nptl/bits/struct_rwlock.h
> index 2f8b7ac..cd47bd2 100644
> --- a/sysdeps/nptl/bits/struct_rwlock.h
> +++ b/sysdeps/nptl/bits/struct_rwlock.h
> @@ -30,7 +30,7 @@ struct __pthread_rwlock_arch_t
>  {
>    unsigned int __readers;
>    unsigned int __writers;
> -  unsigned int __wrphase_futex;
> +  volatile unsigned int __wrphase_futex;
>    unsigned int __writers_futex;
>    unsigned int __pad3;
>    unsigned int __pad4;
>
> The compiled ASM with this change (and with a few declarations
> corrected to the volatile type):
>
>    5d2ca:       f043 0302       orr.w   r3, r3, #2
>              atomic_thread_fence_acquire ();
>              /* We still need to wait for explicit hand-over, but we must
>                 not use futex_wait anymore because we would just time out
>                 in this case and thus make the spin-waiting we need
>                 unnecessarily expensive.  */
>              while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
>    5d2ce:       2b03            cmp     r3, #3
>    5d2d0:       d0fa            beq.n   5d2c8 <__pthread_rwlock_clockrdlock64+0x168>
>    5d2d2:       e783            b.n     5d1dc <__pthread_rwlock_clockrdlock64+0x7c>
>
> We no longer have an infinite loop here.
>
> It seems like the compiler is incorrectly optimizing the loop because
> it is not informed that the value of __wrphase_futex can be changed in
> another context, which I believe should be done with the volatile
> attribute.
>
> Does this analysis look correct?
>
> Thank You,
> Justin
>
-- 
Florian
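
For readers who want to see the hoisting problem in isolation, below is a
minimal stand-alone sketch, not taken from the glibc tree; the names flag
and setter are purely illustrative, and it assumes the GCC/Clang __atomic
builtins plus POSIX threads (build with gcc -O2 -pthread). The plain-load
loop shown in the comment can be compiled into a single read followed by a
self-branch, much like the b.n 5dca0 instruction above, while the
__atomic_load_n (..., __ATOMIC_RELAXED) form forces a fresh load from
memory on every iteration, which is what atomic_load_relaxed is meant to
provide.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static unsigned int flag;   /* plays the role of __wrphase_futex */

static void *
setter (void *arg)
{
  (void) arg;
  sleep (1);
  /* Hand-over from another thread, published with release semantics.  */
  __atomic_store_n (&flag, 1, __ATOMIC_RELEASE);
  return NULL;
}

int
main (void)
{
  pthread_t t;
  pthread_create (&t, NULL, setter, NULL);

  /* Broken spin-wait: with a plain load the compiler may read 'flag'
     once and then spin on a register, i.e. an infinite loop:

       while (flag == 0)
         ;
  */

  /* Spin-wait with an explicit relaxed atomic load: the value is
     re-read from memory on every iteration.  */
  while (__atomic_load_n (&flag, __ATOMIC_RELAXED) == 0)
    ;   /* TODO Back-off?  */

  __atomic_thread_fence (__ATOMIC_ACQUIRE);
  puts ("hand-over observed");
  pthread_join (t, NULL);
  return 0;
}

This only illustrates the symptom; whether the proper fix is the volatile
field in the patch above or a change to how atomic_load_relaxed is expanded
for this configuration is the question raised in the thread.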