From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qv1-xf2a.google.com (mail-qv1-xf2a.google.com [IPv6:2607:f8b0:4864:20::f2a]) by sourceware.org (Postfix) with ESMTPS id AC3623861031 for ; Wed, 26 May 2021 17:51:44 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org AC3623861031 Received: by mail-qv1-xf2a.google.com with SMTP id u33so1179570qvf.9 for ; Wed, 26 May 2021 10:51:44 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=QEONnMPRoNdURulpYEeZwTr5P2vvwfUdjv3sVdyaVkc=; b=pLBOZMTOZeUazWrgCP38cEnIyPMRlNGFOnOCrU08zOlDeNmOl1p6zHPuQFs48Howlu WWd4KLKlGurasYRClNClGuZgOtxXRQAqFiq1Tc035lZPgwC0+/BqdrR1pBIzQnbAJRlH T2Oe7IhVaaL09V2uWV/TDSO+e8fgg6YgiVkNJQ0cW8YeCF2dJCwmHThZPC423juCRVMG HbZCztDHkQlF2mdPboPRz+J5ylmjdsTlMuoSX9xcTYL0jDAB7BNII/c+/LYJ6OlCo9kK exGG9oJaTVoIsvZQSs+K4tDq6dEKi3dOnEDVMy53+TX6ZasxUUC2WnyH6+IhEyntdfmP 7aCg== X-Gm-Message-State: AOAM531c5/erRy1XIP09Qkk4jX/BMD+yXCr2omuoS1aDKwdMjSz4HGQz fnRw+WHUizUiEeb9S0UdxLrGSzKWxAbCQw== X-Google-Smtp-Source: ABdhPJwvMJ0WqLTou1ECVCMHlqD4XJPHzv1gUfj16InUGuA2nEoWklHQJkLVGt+P9Bbq6TypQlCbCQ== X-Received: by 2002:a05:6214:17cb:: with SMTP id cu11mr44689921qvb.27.1622051504123; Wed, 26 May 2021 10:51:44 -0700 (PDT) Received: from [192.168.1.4] ([177.194.37.86]) by smtp.gmail.com with ESMTPSA id i11sm1892751qtv.8.2021.05.26.10.51.42 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 26 May 2021 10:51:43 -0700 (PDT) Subject: Re: [PATCH 05/11] nptl: Deallocate the thread stack on setup failure (BZ #19511) To: Florian Weimer , Adhemerval Zanella via Libc-alpha References: <20210526165728.1772546-1-adhemerval.zanella@linaro.org> <20210526165728.1772546-6-adhemerval.zanella@linaro.org> <87fsy9qu1p.fsf@oldenburg.str.redhat.com> From: Adhemerval Zanella Message-ID: Date: Wed, 26 May 2021 14:51:41 -0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1 MIME-Version: 1.0 In-Reply-To: <87fsy9qu1p.fsf@oldenburg.str.redhat.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-6.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 26 May 2021 17:51:46 -0000 On 26/05/2021 14:33, Florian Weimer wrote: > * Adhemerval Zanella via Libc-alpha: > >> @@ -819,9 +809,13 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr, >> CONCURRENCY NOTES above). We can assert that STOPPED_START >> must have been true because thread creation didn't fail, but >> thread attribute setting did. */ >> - /* See bug 19511 which explains why doing nothing here is a >> - resource leak for a joinable thread. */ >> - assert (stopped_start); >> + { >> + assert (stopped_start); >> + if (stopped_start) >> + /* State (a), we own PD. The thread blocked on this lock will >> + finish due setup_failed being set. */ >> + lll_unlock (pd->lock, LLL_PRIVATE); >> + } >> else >> { >> /* State (e) and we have ownership of PD (see CONCURRENCY > > The assert followed by the if doesn't look right. One should check > setup_failed, I assume. Is there any way to trigger the broken assert > or the hang from the missing wakeup? This just check what is described in the CONCURRENCY NOTES: 204 Axioms: 205 206 * The create_thread function can never set stopped_start to false. 207 * The created thread can read stopped_start but *never write to it*. > > I happend to have looked at this today for the sigaltstack stuff. > > I felt that it would be simpler to perform the kernel-side setup > (affinity and scheduling policy) on the new thread. Essentially use > stopped_start mode unconditionally. And use a separate lock word for > the barrier/latch, instead of reusing the pd->lock. > > The advantage is that we can pass an arbitrary temporary object into the > internal start function, such as the signal mask. The creating thread > blocks until the initialization has happened and the immediate fate of > the thread has been decided. After that point the creating thread > deallocates the temporary object, and if the thread couldn't start, any > thread-related data. Otherwise the new thread takes ownership of that > data. > > I think it will be much easier to explain what is going on in the > concurrency notes. We have an additional synchronization step on all > created threads, but I doubt that this matters in practice. > > What do you think? I'm not sure if the change should happen as part of > this series. I see two potential issues on this approach: 1. The cpuset information is tied to the pthread_attr_t and can be potentially large, it means that if we want to make the created thread to consume we will need to either duplicate, add something like a reference count, or add more synchronization for the consumer. None is really simpler to implement and I don't think it is really an improvement. 2. For both schedule parameters and affinity ideally we want to inform right away to user if the operation has failed. If we move it to thread itself, it still would require the synchronization to inform pthread_create the operation has failed. I don't see much improvement, the pthread_create latency will be essentially the same. After reading the CONCURRENCY NOTES (great work on this btw), I think the main issue on the resource deallocation here is mixing pthread cancellation and thread creation.