From: Florian Fainelli
To: Justin Chen, libc-help@sourceware.org, fw@sourceware.org
Subject: Re: Glibc pthread_rwlock_timed*() Optimization Bug
Date: Tue, 7 Sep 2021 14:33:57 -0700
Message-ID: <00d50647-d99f-8296-03a1-c9ebc24fbdf3@gmail.com>

On 9/7/2021 2:31 PM, Justin Chen wrote:
> Hello all,
>
> We are seeing a faulty compiler optimization with
> pthread_rwlock_timed*() when cross-compiling glibc for arm. This is
> leading to deadlocks with some of our software.

FWIW, this was previously reported, along with a reproducer, here:

https://sourceware.org/bugzilla/show_bug.cgi?id=24774

The comment about not triggering bug 24774 does not seem to be related
to that bug; was it a typo?

> We are using glibc-2.27, but I see the same issue with the current
> master branch. This is built with the master branch using the
> following configure:
>
> ../configure --host=arm-unknown-linux-gnueabihf
>   --prefix=/local/users/jc957059/source/glibc/test_lib/lib-built/
> make && make install
>
> The code in question is in nptl/pthread_rwlock_common.c, lines 490-501:
>
>       /* We still need to wait for explicit hand-over, but we must
>          not use futex_wait anymore because we would just time out
>          in this case and thus make the spin-waiting we need
>          unnecessarily expensive.  */
>       while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
>               | PTHREAD_RWLOCK_FUTEX_USED)
>              == (1 | PTHREAD_RWLOCK_FUTEX_USED))
>         {
>           /* TODO Back-off?  */
>         }
>       ready = true;
>       break;
>
> The compiled ASM is the following:
>
>    5dc98:       f043 0302       orr.w   r3, r3, #2
>              atomic_thread_fence_acquire ();
>              /* We still need to wait for explicit hand-over, but we must
>                 not use futex_wait anymore because we would just time out
>                 in this case and thus make the spin-waiting we need
>                 unnecessarily expensive.  */
>              while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
>    5dc9c:       2b03            cmp     r3, #3
>    5dc9e:       d184            bne.n   5dbaa <__pthread_rwlock_timedrdlock64+0x6e>
>    5dca0:       e7fe            b.n     5dca0 <__pthread_rwlock_timedrdlock64+0x164>
>
> We only read __wrphase_futex once, then hit an infinite loop.
>
> Adding volatile seems to do the trick:
>
> diff --git a/sysdeps/nptl/bits/struct_rwlock.h b/sysdeps/nptl/bits/struct_rwlock.h
> index 2f8b7ac..cd47bd2 100644
> --- a/sysdeps/nptl/bits/struct_rwlock.h
> +++ b/sysdeps/nptl/bits/struct_rwlock.h
> @@ -30,7 +30,7 @@ struct __pthread_rwlock_arch_t
>  {
>    unsigned int __readers;
>    unsigned int __writers;
> -  unsigned int __wrphase_futex;
> +  volatile unsigned int __wrphase_futex;
>    unsigned int __writers_futex;
>    unsigned int __pad3;
>    unsigned int __pad4;
>
> The compiled ASM with this change (and with a few declarations
> corrected to the volatile type):
>
>    5d2ca:       f043 0302       orr.w   r3, r3, #2
>              atomic_thread_fence_acquire ();
>              /* We still need to wait for explicit hand-over, but we must
>                 not use futex_wait anymore because we would just time out
>                 in this case and thus make the spin-waiting we need
>                 unnecessarily expensive.  */
>              while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
>    5d2ce:       2b03            cmp     r3, #3
>    5d2d0:       d0fa            beq.n   5d2c8 <__pthread_rwlock_clockrdlock64+0x168>
>    5d2d2:       e783            b.n     5d1dc <__pthread_rwlock_clockrdlock64+0x7c>
>
> We no longer have an infinite loop here.
>
> It seems like the compiler is incorrectly optimizing the loop because
> it is not informed that the value of __wrphase_futex can be changed in
> another context, which I believe should be done with the volatile
> attribute.
>
> Does this analysis look correct?
>
> Thank You,
> Justin
>
-- 
Florian
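
For readers who want to see the hoisting problem in isolation, below is a
minimal stand-alone sketch, not taken from the glibc tree; the names flag
and setter are purely illustrative, and it assumes the GCC/Clang __atomic
builtins plus POSIX threads (build with gcc -O2 -pthread). The plain-load
loop shown in the comment can be compiled into a single read followed by a
self-branch, much like the b.n 5dca0 instruction above, while the
__atomic_load_n (..., __ATOMIC_RELAXED) form forces a fresh load from
memory on every iteration, which is what atomic_load_relaxed is meant to
provide.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static unsigned int flag;   /* plays the role of __wrphase_futex */

static void *
setter (void *arg)
{
  (void) arg;
  sleep (1);
  /* Hand-over from another thread, published with release semantics.  */
  __atomic_store_n (&flag, 1, __ATOMIC_RELEASE);
  return NULL;
}

int
main (void)
{
  pthread_t t;
  pthread_create (&t, NULL, setter, NULL);

  /* Broken spin-wait: with a plain load the compiler may read 'flag'
     once and then spin on a register, i.e. an infinite loop:

       while (flag == 0)
         ;
  */

  /* Spin-wait with an explicit relaxed atomic load: the value is
     re-read from memory on every iteration.  */
  while (__atomic_load_n (&flag, __ATOMIC_RELAXED) == 0)
    ;   /* TODO Back-off?  */

  __atomic_thread_fence (__ATOMIC_ACQUIRE);
  puts ("hand-over observed");
  pthread_join (t, NULL);
  return 0;
}

This only illustrates the symptom; whether the proper fix is the volatile
field in the patch above or a change to how atomic_load_relaxed is expanded
for this configuration is the question raised in the thread.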