From: Adhemerval Zanella
Date: Tue, 19 Apr 2022 09:18:01 -0300
Subject: Re: [PATCH v2] nptl: Handle spurious EINTR when thread cancellation is disabled (BZ#29029)
To: Szabolcs Nagy
Cc: libc-alpha@sourceware.org, Florian Weimer, Aurelien Jarno
References: <20220414154947.2187880-1-adhemerval.zanella@linaro.org>

On 19/04/2022 07:44, Szabolcs Nagy wrote:
> The 04/14/2022 12:49, Adhemerval Zanella via Libc-alpha wrote:
>> Some Linux interfaces never restart after being interrupted by a signal
>> handler, regardless of the use of SA_RESTART [1].  It means that for
>> pthread cancellation, if the target thread disables cancellation with
>> pthread_setcancelstate and calls such interfaces (like poll or select),
>> it should not see spurious EINTR failures due to the internal SIGCANCEL.
>>
>> However, recent changes made pthread_cancel always send the internal
>> signal, regardless of the target thread cancellation status or type.
>> To fix it, the previous semantic is restored, where the cancel signal
>> is only sent if the target thread has cancellation enabled in
>> asynchronous mode.
>>
>> The cancel state and cancel type are moved back to cancelhandling
>> and atomic operations are used to synchronize between threads.  The
>> patch essentially reverts the following commits:
>>
>>   8c1c0aae20 nptl: Move cancel type out of cancelhandling
>>   2b51742531 nptl: Move cancel state out of cancelhandling
>>   26cfbb7162 nptl: Remove CANCELING_BITMASK
>>
>> However, I changed the atomic operations to follow the internal C11
>> semantic and removed the macro usage, which simplifies the resulting
>> code a bit (and removes another usage of the old atomic macros).
>>
>> Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
>> and powerpc64-linux-gnu.
>>
>> [1] https://man7.org/linux/man-pages/man7/signal.7.html
>>
>> Reviewed-by: Florian Weimer
>> Tested-by: Aurelien Jarno
>> ---
>> v2: Fixed some typos and extended pthread_cancel comments.
>
>
> since this commit various cancel tests fail for me (unreliably)
> on aarch64, e.g. failures from 2 different test runs:
>
> FAIL: nptl/tst-cancel17
> FAIL: nptl/tst-cancelx5
> FAIL: nptl/tst-cond7
> FAIL: nptl/tst-pthread-raise-blocked-self
> FAIL: nptl/tst-pthread_cancel-select-loop
>
> FAIL: nptl/tst-cancelx20
> FAIL: nptl/tst-cond7
> FAIL: nptl/tst-cond8
> FAIL: nptl/tst-join12
> FAIL: nptl/tst-key3
> FAIL: nptl/tst-pthread_cancel-select-loop
>
> an example run of nptl/tst-cond7
>
> $ elf/ld.so --library-path . nptl/tst-cond7 --direct
> round 0
> child created
> parent: joining now
> round 1
> child created
> parent: joining now
> round 2
> child created
> parent: joining now
> round 3
> child created
> parent: joining now
> round 4
> child created
> parent: joining now
>
> where it is blocked forever: it seems pthread_cancel returns without
> sending a signal (__pthread_kill_internal is not called) so join hangs.
>

But the signal should only be sent if the thread is cancelled and has
async cancellation enabled, which is not the case for the tests.

I am trying to reproduce it on an aarch64 machine, but I can't see any
of the test failures above.  I will double check whether I reverted
everything and whether the atomics usage is fully correct.
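
To make the rule above concrete, here is a minimal sketch of the
decision "record the request, but only signal when cancellation is
enabled and asynchronous", using C11 atomics on a single state word.
It is an illustration only, not the glibc code: the sk_* names, the
struct, and the bit values are all hypothetical and do not match
glibc's cancelhandling layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit layout, for illustration only (not glibc's values).  */
enum
{
  SK_STATE_DISABLED = 1 << 0,	/* Cancellation is disabled.  */
  SK_TYPE_ASYNC = 1 << 1,	/* Cancel type is asynchronous.  */
  SK_CANCELED = 1 << 2		/* A cancel request is pending.  */
};

/* Hypothetical stand-in for the per-thread descriptor.  */
struct sk_thread
{
  atomic_int cancelhandling;
};

/* Enable or disable cancellation for thread T.  */
static void
sk_set_disabled (struct sk_thread *t, bool disabled)
{
  assert (t != NULL);
  if (disabled)
    atomic_fetch_or (&t->cancelhandling, SK_STATE_DISABLED);
  else
    atomic_fetch_and (&t->cancelhandling, ~SK_STATE_DISABLED);
}

/* Record a cancel request on T.  Return true only if the caller should
   also send the cancel signal, that is, if T had cancellation enabled
   in asynchronous mode at the moment the request was recorded.  */
static bool
sk_cancel_request (struct sk_thread *t)
{
  assert (t != NULL);
  int old = atomic_load_explicit (&t->cancelhandling, memory_order_relaxed);
  int desired;
  do
    {
      if (old & SK_CANCELED)
	return false;		/* Request already pending.  */
      desired = old | SK_CANCELED;
    }
  while (!atomic_compare_exchange_weak_explicit (&t->cancelhandling, &old,
						 desired,
						 memory_order_acq_rel,
						 memory_order_relaxed));
  /* OLD is the state observed by the successful exchange.  */
  return !(old & SK_STATE_DISABLED) && (old & SK_TYPE_ASYNC);
}
```

In this sketch a thread with cancellation disabled never gets a signal,
so a blocking call such as poll or select would not fail with a
spurious EINTR; the pending bit stays set for the thread to act on
later.  The decision is taken from the value observed by the
compare-and-exchange, so a concurrent state change cannot slip in
between the check and the update of the word.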