From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: FAIL nptl/tst-robustpi4 [BZ 23183]
To: libc-alpha@sourceware.org
Cc: Florian Weimer, "Carlos O'Donell", Torvald Riegel
From: Stefan Liebler
Date: Fri, 29 Jun 2018 06:55:00 -0000
References: <1485447752.16721.17.camel@redhat.com>
In-Reply-To: <1485447752.16721.17.camel@redhat.com>
X-SW-Source: 2018-06/txt/msg00933.txt.bz2

On 01/26/2017 05:22 PM, Torvald Riegel wrote:
> On Thu, 2017-01-26 at 11:12 -0500, Carlos O'Donell wrote:
>> On 01/26/2017 10:29 AM, Stefan Liebler wrote:
>>> It seems as a race between futex- and exit-syscall causes ESRCH
>>> result from futex-syscall.
>
> I'll have a closer look at this.
>
>>> I see those fails with Linux 4.8 / 4.9 running in a z/VM guest as
>>> well as with 4.6 on a LPAR (but less often).
>>
>> I've seen tst-robustpi7 and tst-robustpi8 failures on all hardware
>> across a wide number of kernels, but never tst-robustpi4.
>>
>> https://sourceware.org/bugzilla/show_bug.cgi?id=19004
>>
>> The robustpi support is certainly not very robust as Torvald's
>> recent fixes show, and there still remains at least one design
>> flaw that can't be fixed.
>>
>> e.g.
>> https://sourceware.org/bugzilla/show_bug.cgi?id=14485
>
> The underlying problem for that bug does not affect PI+robust, just
> robust, I think. Unless I forgot about something, PI+robust should
> always use the kernel to unlock.
>

Hi,

in the meantime, Florian Weimer could also reproduce this issue and opened
Bug 23183 - tst-robustpi4 test failure
(https://sourceware.org/bugzilla/show_bug.cgi?id=23183).
I've also dug a bit deeper (see details in bugzilla) and was also able to
reproduce it on Intel.

If the thread holding the locked mutex is executing the exit-syscall while
the main-thread is executing the futex-syscall, the futex-syscall can
return ESRCH, which triggers the assertion.

In this situation, the futex-syscall has already added the FUTEX_WAITERS
bit to the lock-value and is then calling attach_to_pi_owner().
The exit-syscall is now setting the lock-value to
FUTEX_WAITERS | FUTEX_OWNER_DIED and is proceeding.
attach_to_pi_owner() is now e.g. trying to get the owner-task and/or is
testing if the owner is currently exiting. In those cases, ESRCH is
returned!

Back in glibc, this assertion is triggered:

    /* ESRCH can happen only for non-robust PI mutexes where
       the owner of the lock died.  */
    assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);

The assertion/comment does not agree with the current behaviour of the
kernel.

Any ideas?

Bye
Stefan
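
P.S. For illustration, the interleaving described above can be replayed in
user space. This is only a sketch under assumptions: all sk_* names are
hypothetical, the bit values are the ones from <linux/futex.h> redefined
locally, and nothing here calls the kernel or glibc. The ESRCH step simply
models attach_to_pi_owner() finding the owner gone or exiting.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of a PI futex word (same values as <linux/futex.h>),
   redefined so the sketch builds without kernel headers.  */
#define SK_FUTEX_WAITERS    0x80000000u
#define SK_FUTEX_OWNER_DIED 0x40000000u
#define SK_FUTEX_TID_MASK   0x3fffffffu

struct sk_result
{
  uint32_t lock;      /* Final lock-value after the interleaving.  */
  int err;            /* Modelled futex-syscall error (0 or ESRCH).  */
  bool assert_holds;  /* Would the glibc assertion hold?  */
};

/* Step 1: the futex-syscall of the waiter marks the lock contended.  */
uint32_t
sk_waiter_sets_waiters (uint32_t lock)
{
  return lock | SK_FUTEX_WAITERS;
}

/* Step 2: the exit-syscall of the owner replaces the lock-value;
   the owner TID is gone afterwards.  */
uint32_t
sk_owner_exits (uint32_t lock)
{
  (void) lock;
  return SK_FUTEX_WAITERS | SK_FUTEX_OWNER_DIED;
}

/* Step 3: model of attach_to_pi_owner() failing because the owner
   task cannot be found or is exiting (assumption of this sketch).  */
int
sk_attach_to_pi_owner (bool owner_exiting)
{
  return owner_exiting ? ESRCH : 0;
}

/* Step 4: the condition checked by the glibc assertion.  */
bool
sk_glibc_assert_holds (int err, bool robust)
{
  return err != ESRCH || !robust;
}

/* Replay steps 1-4 in the order described in the mail.  */
struct sk_result
sk_replay_race (uint32_t owner_tid, bool robust)
{
  assert ((owner_tid & ~SK_FUTEX_TID_MASK) == 0);

  struct sk_result r;
  r.lock = owner_tid;                       /* Mutex locked by owner.  */
  r.lock = sk_waiter_sets_waiters (r.lock); /* Waiter enters futex.  */
  r.lock = sk_owner_exits (r.lock);         /* Owner exits meanwhile.  */
  r.err = sk_attach_to_pi_owner (true);     /* Waiter continues.  */
  r.assert_holds = sk_glibc_assert_holds (r.err, robust);
  return r;
}
```

With robust set, sk_replay_race (1234, true) ends with the lock-value
FUTEX_WAITERS | FUTEX_OWNER_DIED, err == ESRCH and assert_holds == false,
which is the state in which the test aborts.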