From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk1-x72e.google.com (mail-qk1-x72e.google.com [IPv6:2607:f8b0:4864:20::72e]) by sourceware.org (Postfix) with ESMTPS id DBEE93AAB4AA for ; Tue, 1 Jun 2021 13:55:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DBEE93AAB4AA Received: by mail-qk1-x72e.google.com with SMTP id v8so14287436qkv.1 for ; Tue, 01 Jun 2021 06:55:46 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:from:to:references:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=GaNMJE8ChbOfMqiMMtY3hNBSCKEUHQQkzF0GmEz5qNk=; b=uVJBYYwuKzpipN0hpgDvZ6PmXx2abwYIzaPOxnUPRjIhJWbpqVnvE04K9p9sjiY0xj yAMArZLUCQ/eC8QCOedAa5zTAo1bd103SnbnKkPbHNwOX7QLZ6RRIe7ePkOA99EQSaTt 4SQ4qbuMH0KCi3XQSO2cghLtV7CEsIdS5N2Xha8flwjf0iLMXjOcb0wZYK8xkvwXagEP bk0Utr1X+xdQwonpFzKAZDzLVVZr9XL6I5MTTY/rjv9NAwyM4XfOnxH/aU2/Jtq9cl0r PCFmmEpXnQfe4BWA9byvGskOeOs2kiyYztZRrIl3g4U/kBnZe/SS4z/z66KqFfuPtJc9 B5cA== X-Gm-Message-State: AOAM5336pqSfgiKZ/fhLp0+ZyjvsvxaMpbyGakHN1WFJyDAz5jsLYPrc S33RTBi2KVwuT3F+yD4faqbMO3L3v+nZnQ== X-Google-Smtp-Source: ABdhPJxHgCaoD+x+ZfaP48jEHg0dR/Obk4xbdxI7Ipjf/TEyspa2vqg10EAjhkEqaNVP1vbBVXVAdQ== X-Received: by 2002:a37:b0f:: with SMTP id 15mr2725373qkl.210.1622555746266; Tue, 01 Jun 2021 06:55:46 -0700 (PDT) Received: from [192.168.1.4] ([177.194.59.218]) by smtp.gmail.com with ESMTPSA id v20sm828514qtq.67.2021.06.01.06.55.45 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 01 Jun 2021 06:55:45 -0700 (PDT) Subject: Re: [PATCH v2 2/9] nptl: Deallocate the thread stack on setup failure (BZ #19511) From: Adhemerval Zanella To: Florian Weimer , Adhemerval Zanella via Libc-alpha References: <20210527172823.3461314-1-adhemerval.zanella@linaro.org> <20210527172823.3461314-3-adhemerval.zanella@linaro.org> <87r1hm6l4g.fsf@oldenburg.str.redhat.com> <69aac17c-75e1-5b80-7df7-1dc09dea18b4@linaro.org> Message-ID: <6d2dc011-054b-56be-7ba7-9fdd91003bb6@linaro.org> Date: Tue, 1 Jun 2021 10:55:43 -0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1 MIME-Version: 1.0 In-Reply-To: <69aac17c-75e1-5b80-7df7-1dc09dea18b4@linaro.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-12.7 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, NICE_REPLY_A, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Jun 2021 13:55:48 -0000 On 01/06/2021 10:08, Adhemerval Zanella wrote: > > > On 01/06/2021 05:32, Florian Weimer wrote: >> * Adhemerval Zanella via Libc-alpha: >> >>> @@ -424,17 +409,22 @@ start_thread (void *arg) >>> have ownership (see CONCURRENCY NOTES above). */ >>> if (__glibc_unlikely (pd->stopped_start)) >>> { >>> /* Get the lock the parent locked to force synchronization. */ >>> lll_lock (pd->lock, LLL_PRIVATE); >>> >>> /* We have ownership of PD now. */ >>> + if (pd->setup_failed == 1) >>> + { >>> + /* In the case of a failed setup, the created thread will be >>> + responsible to free up the allocated resources. The >>> + detached mode ensure it won't be joined and it will trigger >>> + the required cleanup. */ >>> + pd->joinid = pd; >>> + __do_cancel (); >>> + } >>> >>> /* And give it up right away. */ >>> lll_unlock (pd->lock, LLL_PRIVATE); >>> } >> >> I don't think this is quite right. The failed thread should not linger >> around after pthread_create has reported failure to the caller. It's >> better than what we had before, so maybe we should use this version and >> clean it up further later. > > Agreed, but I think this is really an improvement over current strategy > as I commented earlier [1]. I think to proper fix would move the > lock prior any initialization so we don't need to run any unwinding > operations, skip directly to INTERNAL_SYSCALL_CALL (exit, 0), and let the > parent join the thread and do the resource deallocation. > >> >> At the point of the __do_cancel call, unwinding may not yet initialized, >> so the implied dlopen happens on a potentially very small stack. It >> should be possible to replace it with a goto to the cleanup code. Or >> maybe put the cleanup code into a separate function and call it here and >> on the regular cleanup path. >> >> Apart from that, it looks fine to me. > > The __do_cancel is not done really in asynchronous mode, so it will use > the allocated thread stack for the dlopen. But I think with a better > strategy to move the 'struct thread' lock earlier than any initialization > will avoid such issue. > >> >> Thanks, >> Florian >> > > [1] https://sourceware.org/pipermail/libc-alpha/2021-May/126879.html > The patch below on top of this patch implements the synchronous thread join in case of setup failure. It shows no regressions on testcases, and it is slight better because it avoid calling the __pthread_unwind for such cases (and issuing the libgcc dlopen) In fact since there is no registered user case that might required calling either cancellation cleanup handler or thread destructors, I think there is no need to actually issue a pthread_exit like termination. -- diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c index 018af30c85..7a4c742416 100644 --- a/nptl/pthread_create.c +++ b/nptl/pthread_create.c @@ -346,6 +346,23 @@ start_thread (void *arg) { struct pthread *pd = arg; + /* We are either in (a) or (b), and in either case we either own PD already + (2) or are about to own PD (1), and so our only restriction would be that + we can't free PD until we know we have ownership (see CONCURRENCY NOTES + above). */ + if (__glibc_unlikely (pd->stopped_start)) + { + /* Get the lock the parent locked to force synchronization. */ + lll_lock (pd->lock, LLL_PRIVATE); + + /* We have ownership of PD now. */ + if (pd->setup_failed == 1) + goto out; + + /* And give it up right away. */ + lll_unlock (pd->lock, LLL_PRIVATE); + } + /* Initialize resolver state pointer. */ __resp = &pd->res; @@ -403,30 +420,6 @@ start_thread (void *arg) /* Store the new cleanup handler info. */ THREAD_SETMEM (pd, cleanup_jmp_buf, &unwind_buf); - /* We are either in (a) or (b), and in either case we either own - PD already (2) or are about to own PD (1), and so our only - restriction would be that we can't free PD until we know we - have ownership (see CONCURRENCY NOTES above). */ - if (__glibc_unlikely (pd->stopped_start)) - { - /* Get the lock the parent locked to force synchronization. */ - lll_lock (pd->lock, LLL_PRIVATE); - - /* We have ownership of PD now. */ - if (pd->setup_failed == 1) - { - /* In the case of a failed setup, the created thread will be - responsible to free up the allocated resources. The - detached mode ensure it won't be joined and it will trigger - the required cleanup. */ - pd->joinid = pd; - __do_cancel (); - } - - /* And give it up right away. */ - lll_unlock (pd->lock, LLL_PRIVATE); - } - LIBC_PROBE (pthread_start, 3, (pthread_t) pd, pd->start_routine, pd->arg); /* Run the code the user provided. */ @@ -556,6 +549,7 @@ start_thread (void *arg) /* Free the TCB. */ __nptl_free_tcb (pd); +out: /* We cannot call '_exit' here. '_exit' will terminate the process. The 'exit' implementation in the kernel will signal when the @@ -802,6 +796,8 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr, thread. */ __libc_signal_restore_set (&original_sigmask); + int setup_failed = 0; + if (__glibc_unlikely (retval != 0)) { if (thread_ran) @@ -814,7 +810,7 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr, /* State (a), we own PD. The thread blocked on this lock will finish due setup_failed being set. The thread will be responsible to cleanup any allocated resource. */ - pd->setup_failed = 1; + setup_failed = pd->setup_failed = 1; lll_unlock (pd->lock, LLL_PRIVATE); } else @@ -857,6 +853,9 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr, THREAD_SETMEM (THREAD_SELF, header.multiple_threads, 1); } + if (setup_failed == 1) + __pthread_join (*newthread, NULL); + out: if (destroy_default_attr) __pthread_attr_destroy (&default_attr.external);