From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by sourceware.org (Postfix) with ESMTP id BFDFF3853824 for ; Fri, 20 Aug 2021 16:24:24 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org BFDFF3853824 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-25-x_H1uidSMLuqMwUnf-DJoA-1; Fri, 20 Aug 2021 12:24:21 -0400 X-MC-Unique: x_H1uidSMLuqMwUnf-DJoA-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 05378801A92; Fri, 20 Aug 2021 16:24:20 +0000 (UTC) Received: from oldenburg.str.redhat.com (unknown [10.39.194.2]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 29FEE18432; Fri, 20 Aug 2021 16:24:18 +0000 (UTC) From: Florian Weimer To: Adhemerval Zanella Cc: Adhemerval Zanella via Libc-alpha Subject: Re: [PATCH 3/3] nptl: Fix race between pthread_kill and thread exit (bug 12889) References: <811aff39780f8d997421dcda8c6afaf01ea1ee6a.1629208174.git.fweimer@redhat.com> <88bdbb1e-2338-4fd2-0c03-89f2196ba239@linaro.org> <87r1epz8ym.fsf@oldenburg.str.redhat.com> <84b846a9-5304-3bca-bfc4-f1d15c26204f@linaro.org> Date: Fri, 20 Aug 2021 18:24:17 +0200 In-Reply-To: <84b846a9-5304-3bca-bfc4-f1d15c26204f@linaro.org> (Adhemerval Zanella's message of "Thu, 19 Aug 2021 13:49:30 -0300") Message-ID: <87v940ulse.fsf@oldenburg.str.redhat.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 20 Aug 2021 16:24:35 -0000 * Adhemerval Zanella: > On 19/08/2021 13:37, Florian Weimer wrote: >> * Adhemerval Zanella via Libc-alpha: >> >>> On 17/08/2021 10:51, Florian Weimer via Libc-alpha wrote: >>>> Thread exit is delayed until all pending pthread_kill operations >>>> on the thread have completed. This avoids sending signals to the >>>> wrong thread or a spurious ESRCH error. >>>> >>>> The test sysdeps/pthread/tst-pthread_cancel-select-loop.c is derived >>>> from a downstream test originally written by Marek Polacek. >>> >>> And I really hope we could first sort out the BZ#19951 and move away >>> of using pthread::tid as the synchronization member to check thread >>> termination. I sent a possible fix [1], but it does not fully handle >>> the pthread::tid access, it would require to a proper lock to >>> synchronize between thread exit and pthread_kill (as Rich has suggested >>> on BZ#12889). >>> >>> I will need to send an updated version of my pthread fixes patchset, >>> since clone3 changes broke it and INVALID_TD_P needs fixing as well. >>> >>> [1] https://patchwork.sourceware.org/project/glibc/patch/20210610193639.3650754-5-adhemerval.zanella@linaro.org/ >> >> I think my patch is independent of the other fixes. Its advantage over >> other proposals is that it does not add waiting to pthread_kill, so >> there is no signal-safety issue. > > I am not sure, the way I think we should fix to also fix BZ#19951 > would to add a lock to access 'tid' and uses not only on pthread_kill, > but also on pthread_getcpuclockid, pthread_getschedparam, > pthread_setschedparam, and pthread_setschedprio. The lock also > simplifies the code, there is no need to special handling of futex > internal state. I also think pthread_kill would also be simplified. > > The lock is just to simplify the code, if we really need to add > some scalability we might use a more clever code as you are proposing. I don't think we can wait for the fix for bug 19951 because glibc 2.34 has a real regression in pthread_cancel: it can fail spuriously with ESRCH. And it can happen with a single pthread_cancel call in the program, you don't even need concurrent pthread_cancel to trigger this bug now. I have moved the magic no-TID-reuse functionality into a separate function in the new version, so we can replace it once we have something better. Thanks, Florian