From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi1-x235.google.com (mail-oi1-x235.google.com [IPv6:2607:f8b0:4864:20::235]) by sourceware.org (Postfix) with ESMTPS id 5EE043858D3C for ; Tue, 19 Apr 2022 12:30:51 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 5EE043858D3C Received: by mail-oi1-x235.google.com with SMTP id r85so12722318oie.7 for ; Tue, 19 Apr 2022 05:30:51 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:cc:references:from:in-reply-to :content-transfer-encoding; bh=z7ll+ybuifEM3IXaNKQ/u9xpbKZh8rFxJVtFLQMK4y0=; b=1rdVFoyGWQOK5iRk0QTLkBXIs+CZ8jSg2JKctB50zHwCh3nWxh0+uPGQHTLY2HijlX Xxznf7LxBvDE2tsohYCIQS8+tanAWMmMBYw80iZtBSwe9GlD2KQ3cRsSC+WmBwCX7GVw 1iSpAB2VntYeZ76d2CJWGsGIUdeHCADFqOfVsxQchq9lkTk+dv1rkdQZa7fcbkrqVJap U5sBLX1wRgLQ9xBAVeR+Z/Qiil1H3B45jPD5400y6FzU0LSj+XQXCAQnJ6gWHD/BOQjy HmAdEgBhXsQT7/fi+ForOqp5PCZ7hzEUgCQaG6tvQDwcDE5KHtWQ9pHKuS7GkGYnYpoi Pf1Q== X-Gm-Message-State: AOAM532LNZB9Iv/k/V8CsC5Ms+zSRxfQP8DvzIeT25Tqzd+HzKwQOCdb zLESNivba+IG3xPKpxq8Bp3x7Q== X-Google-Smtp-Source: ABdhPJzEQKZ7v/UyTbB5DrB4CDD858sPheN0rfHe9Vkz8VcdOnxIrNUvUh6FCVKX2yqTzWKvGZpQcg== X-Received: by 2002:a05:6808:14d4:b0:2da:3647:6b26 with SMTP id f20-20020a05680814d400b002da36476b26mr9068500oiw.161.1650371450640; Tue, 19 Apr 2022 05:30:50 -0700 (PDT) Received: from ?IPV6:2804:431:c7ca:c9d0:98f6:7aed:2f61:2745? ([2804:431:c7ca:c9d0:98f6:7aed:2f61:2745]) by smtp.gmail.com with ESMTPSA id oa9-20020a056870bc0900b000e60e510d5esm1077196oab.25.2022.04.19.05.30.48 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 19 Apr 2022 05:30:49 -0700 (PDT) Message-ID: <5963984e-1185-0c65-b565-5485299eac39@linaro.org> Date: Tue, 19 Apr 2022 09:30:47 -0300 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0 Subject: Re: [PATCH v2] nptl: Handle spurious EINTR when thread cancellation is disabled (BZ#29029) Content-Language: en-US To: Szabolcs Nagy Cc: libc-alpha@sourceware.org, Florian Weimer , Aurelien Jarno References: <20220414154947.2187880-1-adhemerval.zanella@linaro.org> From: Adhemerval Zanella In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-13.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 19 Apr 2022 12:30:53 -0000 On 19/04/2022 09:10, Szabolcs Nagy wrote: > The 04/14/2022 12:49, Adhemerval Zanella via Libc-alpha wrote: >> --- a/nptl/pthread_cancel.c >> +++ b/nptl/pthread_cancel.c > ... >> + /* Some syscalls are never restarted after being interrupted by a signal >> + handler, regardless of the use of SA_RESTART (they always fail with >> + EINTR). So pthread_cancel cannot send SIGCANCEL unless the cancellation >> + is enabled and set as asynchronous (in this case the cancellation will >> + be acted in the cancellation handler instead by the syscall wrapper). >> + Otherwise the target thread is set as 'cancelling' (CANCELING_BITMASK) >> + by atomically setting 'cancelhandling' and the cancelation will be acted >> + upon on next cancellation entrypoing in the target thread. >> + >> + It also requires to atomically check if cancellation is enabled and >> + asynchronous, so both cancellation state and type are tracked on >> + 'cancelhandling'. */ >> + >> + int result = 0; >> + int oldval = atomic_load_relaxed (&pd->cancelhandling); >> + int newval; >> + do >> { >> - /* A single-threaded process should be able to kill itself, since there >> - is nothing in the POSIX specification that says that it cannot. So >> - we set multiple_threads to true so that cancellation points get >> - executed. */ >> - THREAD_SETMEM (THREAD_SELF, header.multiple_threads, 1); >> + newval = oldval | CANCELING_BITMASK | CANCELED_BITMASK; >> + if (oldval == newval) >> + break; >> + >> + /* If the cancellation is handled asynchronously just send a >> + signal. We avoid this if possible since it's more >> + expensive. */ >> + if (cancel_enabled_and_canceled_and_async (newval)) >> + { >> + /* Mark the cancellation as "in progress". */ >> + int newval2 = oldval | CANCELING_BITMASK; >> + if (!atomic_compare_exchange_weak_acquire (&pd->cancelhandling, >> + &oldval, newval2)) >> + continue; > > this continue looks wrong, the cas can fail spuriously > (cancelhandling == oldval) and then continue jumps to... > >> + >> + if (pd == THREAD_SELF) >> + /* This is not merely an optimization: An application may >> + call pthread_cancel (pthread_self ()) without calling >> + pthread_create, so the signal handler may not have been >> + set up for a self-cancel. */ >> + { >> + pd->result = PTHREAD_CANCELED; >> + if ((newval & CANCELTYPE_BITMASK) != 0) >> + __do_cancel (); >> + } >> + else >> + /* The cancellation handler will take care of marking the >> + thread as canceled. */ >> + result = __pthread_kill_internal (th, SIGCANCEL); >> + >> + break; >> + } >> + >> + /* A single-threaded process should be able to kill itself, since >> + there is nothing in the POSIX specification that says that it >> + cannot. So we set multiple_threads to true so that cancellation >> + points get executed. */ >> + THREAD_SETMEM (THREAD_SELF, header.multiple_threads, 1); >> #ifndef TLS_MULTIPLE_THREADS_IN_TCB >> __libc_multiple_threads = 1; >> #endif >> - >> - THREAD_SETMEM (pd, result, PTHREAD_CANCELED); >> - if (pd->cancelstate == PTHREAD_CANCEL_ENABLE >> - && pd->canceltype == PTHREAD_CANCEL_ASYNCHRONOUS) >> - __do_cancel (); >> - return 0; >> } >> + while (!atomic_compare_exchange_weak_acquire (&pd->cancelhandling, &oldval, >> + newval)); > > ...here and this cas updates cancelhandling to newval without > actually doing any cancellation. Yeah, it looks wrong indeed thanks for catching it. Old code has a goto that I tried to remove (wrongly): do { again: oldval = pd->cancelhandling; newval = oldval | CANCELING_BITMASK | CANCELED_BITMASK; [...] if (CANCEL_ENABLED_AND_CANCELED_AND_ASYNCHRONOUS (newval)) { if (atomic_compare_and_exchange_bool_acq (&pd->cancelhandling, oldval | CANCELING_BITMASK, oldval); goto again; [...] } [...] } while (atomic_compare_and_exchange_bool_acq (&pd->cancelhandling, newval, oldval); Do this fix the intermittent issue you are seeing: diff --git a/nptl/pthread_cancel.c b/nptl/pthread_cancel.c index c76882e279..e67b2df5cc 100644 --- a/nptl/pthread_cancel.c +++ b/nptl/pthread_cancel.c @@ -121,6 +121,7 @@ __pthread_cancel (pthread_t th) int newval; do { + again: newval = oldval | CANCELING_BITMASK | CANCELED_BITMASK; if (oldval == newval) break; @@ -134,7 +135,7 @@ __pthread_cancel (pthread_t th) int newval2 = oldval | CANCELING_BITMASK; if (!atomic_compare_exchange_weak_acquire (&pd->cancelhandling, &oldval, newval2)) - continue; + goto again; if (pd == THREAD_SELF) /* This is not merely an optimization: An application may