From: Florian Weimer
To: Stefan Liebler, libc-alpha@sourceware.org
Cc: Carlos O'Donell, Torvald Riegel
Subject: Re: FAIL nptl/tst-robustpi4 [BZ 23183]
Date: Fri, 29 Jun 2018 07:39:00 -0000
Message-ID: <1e1bea42-78b2-0d3d-c36e-bcf6f37e2bb7@redhat.com>
References: <1485447752.16721.17.camel@redhat.com>

On 06/29/2018 08:54 AM, Stefan Liebler wrote:
> On 01/26/2017 05:22 PM, Torvald Riegel wrote:
>> On Thu, 2017-01-26 at 11:12 -0500, Carlos O'Donell wrote:
>>> On 01/26/2017 10:29 AM, Stefan Liebler wrote:
>>>> It seems that a race between the futex and exit syscalls causes
>>>> an ESRCH result from the futex syscall.
>>
>> I'll have a closer look at this.
>>
>>>> I see those fails with Linux 4.8 / 4.9 running in a z/VM guest as
>>>> well as with 4.6 on an LPAR (but less often).
>>>
>>> I've seen tst-robustpi7 and tst-robustpi8 failures on all hardware
>>> across a wide number of kernels, but never tst-robustpi4.
>>>
>>> https://sourceware.org/bugzilla/show_bug.cgi?id=19004
>>>
>>> The robustpi support is certainly not very robust as Torvald's
>>> recent fixes show, and there still remains at least one design
>>> flaw that can't be fixed.
>>>
>>> e.g.
>>> https://sourceware.org/bugzilla/show_bug.cgi?id=14485
>>
>> The underlying problem for that bug does not affect PI+robust, just
>> robust, I think. Unless I forgot about something, PI+robust should
>> always use the kernel to unlock.
> In the meantime, Florian Weimer was also able to reproduce this issue
> and opened Bugzilla bug 23183 - tst-robustpi4 test failure
> (https://sourceware.org/bugzilla/show_bug.cgi?id=23183).
>
> I've also dug a bit deeper - see details in Bugzilla - and was also
> able to reproduce it on Intel.
>
> If the thread holding the locked mutex is executing the exit syscall
> while the main thread is executing the futex syscall, this can lead
> to the ESRCH return value from the futex syscall, which triggers the
> assertion.
>
> In this situation, the futex syscall has already added the
> FUTEX_WAITERS bit to the lock value and is then calling
> attach_to_pi_owner().
>
> The exit syscall now sets the lock value to FUTEX_WAITERS |
> FUTEX_OWNER_DIED and proceeds.
>
> attach_to_pi_owner() is now, e.g., trying to get the owner task and/or
> testing whether the owner is currently exiting. In those cases, ESRCH
> is returned!

Does the kernel look at the TID and determine that it no longer exists,
or does it use the FUTEX_OWNER_DIED bit to detect this situation? I'm
worried that using the TID introduces a TID race here.

Thanks,
Florian
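
[Editorial sketch, not part of the original message.] The interleaving Stefan describes can be modelled roughly as below. This is a toy model under stated assumptions, not the kernel's code: the three bit masks match the values in <linux/futex.h>, but lock_pi_mark_waiters(), exit_cleanup(), attach_to_owner() and simulate_race() are hypothetical helpers invented for illustration, and the "live task set" stands in for whatever lookup the kernel really performs.

```python
# Bit layout of the futex word, as defined in <linux/futex.h>.
FUTEX_WAITERS = 0x80000000
FUTEX_OWNER_DIED = 0x40000000
FUTEX_TID_MASK = 0x3FFFFFFF

ESRCH = 3  # errno value for "No such process" on Linux


def decode(word):
    """Split a futex word into (owner_tid, waiters_bit, owner_died_bit)."""
    return (word & FUTEX_TID_MASK,
            bool(word & FUTEX_WAITERS),
            bool(word & FUTEX_OWNER_DIED))


def lock_pi_mark_waiters(word):
    """Hypothetical waiter-side step: mark the lock as contended."""
    return word | FUTEX_WAITERS


def exit_cleanup(word):
    """Hypothetical exit-side step: drop the TID, keep WAITERS, set OWNER_DIED."""
    return (word & FUTEX_WAITERS) | FUTEX_OWNER_DIED


def attach_to_owner(owner_tid, live_tids):
    """Hypothetical stand-in for the owner lookup: -ESRCH if the task is gone."""
    return 0 if owner_tid in live_tids else -ESRCH


def simulate_race(owner_tid=1234):
    """Replay the interleaving from the thread in a single-threaded model."""
    live_tids = {owner_tid}
    word = owner_tid                      # mutex locked by owner_tid

    word = lock_pi_mark_waiters(word)     # main thread: futex syscall sets WAITERS
    seen_tid = decode(word)[0]            # main thread: remembers the owner TID

    live_tids.discard(owner_tid)          # owner thread: exit syscall in progress
    word = exit_cleanup(word)             # lock value is now WAITERS | OWNER_DIED

    rc = attach_to_owner(seen_tid, live_tids)  # lookup by the stale TID fails
    return rc, word


if __name__ == "__main__":
    rc, word = simulate_race()
    print(rc, hex(word), decode(word))
```

In this model the waiter fails because it looks up a TID it read before the owner exited, even though the futex word already carries FUTEX_OWNER_DIED by the time the lookup happens. Whether the real kernel path behaves this way is exactly the open question in the message above.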