public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: Florian Weimer <fweimer@redhat.com>
Cc: Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org>
Subject: Re: [PATCH 05/11] nptl: Deallocate the thread stack on setup failure (BZ #19511)
Date: Wed, 26 May 2021 16:19:18 -0300	[thread overview]
Message-ID: <4a561aad-4ba3-897b-e6b6-31305b06a088@linaro.org> (raw)
In-Reply-To: <877djlqswb.fsf@oldenburg.str.redhat.com>



On 26/05/2021 14:58, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> On 26/05/2021 14:33, Florian Weimer wrote:
>>> * Adhemerval Zanella via Libc-alpha:
>>>
>>>> @@ -819,9 +809,13 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
>>>>  	   CONCURRENCY NOTES above).  We can assert that STOPPED_START
>>>>  	   must have been true because thread creation didn't fail, but
>>>>  	   thread attribute setting did.  */
>>>> -	/* See bug 19511 which explains why doing nothing here is a
>>>> -	   resource leak for a joinable thread.  */
>>>> -	assert (stopped_start);
>>>> +        {
>>>> +	  assert (stopped_start);
>>>> +          if (stopped_start)
>>>> +  	    /* State (a), we own PD. The thread blocked on this lock will
>>>> +               finish due setup_failed being set.  */
>>>> +	    lll_unlock (pd->lock, LLL_PRIVATE);
>>>> +        }
>>>>        else
>>>>  	{
>>>>  	  /* State (e) and we have ownership of PD (see CONCURRENCY
>>>
>>> The assert followed by the if doesn't look right.  One should check
>>> setup_failed, I assume.  Is there any way to trigger the broken assert
>>> or the hang from the missing wakeup?
>>
>> This just check what is described in the CONCURRENCY NOTES:
>>
>> 204    Axioms:                                                                                          
>> 205 
>> 206    * The create_thread function can never set stopped_start to false.                               
>> 207    * The created thread can read stopped_start but *never write to it*.   
> 
> I meant this:
> 
> +	  assert (stopped_start);
> +         if (stopped_start)
> 
> It doesn't look right to me.

Yeah, the 'if' check is not really required.  It will hit this code path
only if clone has succeeded ('thread_ran' is true) but either 
sched_setaffinity or sched_setscheduler has failed (in this case
'stopped_start' will be always true).

I changed to:

  if (thread_ran)
    {
      /* State (c) or (d) and we may not have PD ownership (see
         CONCURRENCY NOTES above).  We can assert that STOPPED_START
         must have been true because thread creation didn't fail, but
         thread attribute setting did.  */
      assert (stopped_start);
      /* State (a), we own PD. The thread blocked on this lock will
         finish due setup_failed being set.
         The thread will be responsible to cleanup any allocated
         resource.  */
      lll_unlock (pd->lock, LLL_PRIVATE).
    }

> 
>>> I happend to have looked at this today for the sigaltstack stuff.
>>>
>>> I felt that it would be simpler to perform the kernel-side setup
>>> (affinity and scheduling policy) on the new thread.  Essentially use
>>> stopped_start mode unconditionally.  And use a separate lock word for
>>> the barrier/latch, instead of reusing the pd->lock.
>>>
>>> The advantage is that we can pass an arbitrary temporary object into the
>>> internal start function, such as the signal mask.  The creating thread
>>> blocks until the initialization has happened and the immediate fate of
>>> the thread has been decided.  After that point the creating thread
>>> deallocates the temporary object, and if the thread couldn't start, any
>>> thread-related data.  Otherwise the new thread takes ownership of that
>>> data.
>>>
>>> I think it will be much easier to explain what is going on in the
>>> concurrency notes.  We have an additional synchronization step on all
>>> created threads, but I doubt that this matters in practice.
>>>
>>> What do you think?  I'm not sure if the change should happen as part of
>>> this series.
>>
>> I see two potential issues on this approach:
>>
>>   1. The cpuset information is tied to the pthread_attr_t and can be
>>      potentially large, it means that if we want to make the created
>>      thread to consume we will need to either duplicate, add something
>>      like a reference count, or add more synchronization for the 
>>      consumer.  None is really simpler to implement and I don't think 
>>      it is really an improvement.
> 
> No, I think that's covered by the temporary object thing.  The creating
> thread will block until it's guaranteed that the new thread won't need
> this data anymore.
> 
>>   2. For both schedule parameters and affinity ideally we want to inform
>>      right away to user if the operation has failed.  If we move it to
>>      thread itself, it still would require the synchronization to inform
>>      pthread_create the operation has failed.  I don't see much improvement,
>>      the pthread_create latency will be essentially the same.
> 
> Correct.  But it will be much simpler because we can treat all
> kernel-side initialization the same.  For example, sigaltstack might
> fail with invalid flags, too.  And we'd definitely have to run that on
> the new thread.

The small downside is we will need to consume some of the thread own stack
to pass such information, but it can be only a pointer in any case.

But I still see setting the scheduling/affinity synchronous to have some
advantage: we can fast exit in the case of failure and join the thread
on pthread_create (instead of making it act a detached one).  This make
the failure case not 'leak' the thread context and resource out of
pthread_create.

It might be somewhat complex to code it right now, since we need to take
care of destructor for thread_local variables and thread-local data,
__libc_thread_freeres, etc.

That is also why I used the current unwind mechanism to exit in case of
setup failure (maybe instead of adding the setup_failed on the IS_DETACHED
check the patch could just setup the thread as detached, the pd->joinid
could be set non-atomically since it owns the pd).

In any case, since we do not implement this I don't have a preference.
Your arbitrary temporary object seems ok and should work on the
scheduler/affinity setup.

  reply	other threads:[~2021-05-26 19:19 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-26 16:57 [PATCH 00/11] nptl: pthread cancellation refactor Adhemerval Zanella
2021-05-26 16:57 ` [PATCH 01/11] nptl: Move Linux createthread to nptl Adhemerval Zanella
2021-05-26 17:14   ` Florian Weimer
2021-05-26 16:57 ` [PATCH 02/11] nptl: Move createthread to pthread_create Adhemerval Zanella
2021-05-26 17:16   ` Florian Weimer
2021-05-26 17:29     ` Adhemerval Zanella
2021-05-26 16:57 ` [PATCH 03/11] support: Add xpthread_attr_setaffinity_np wrapper Adhemerval Zanella
2021-05-26 17:17   ` Florian Weimer
2021-05-27 22:35   ` Joseph Myers
2021-05-28  1:08     ` Adhemerval Zanella
2021-05-26 16:57 ` [PATCH 04/11] nptl: Add pthread_attr_setaffinity_np failure test Adhemerval Zanella
2021-05-26 17:21   ` Florian Weimer
2021-05-26 17:30     ` Adhemerval Zanella
2021-05-26 16:57 ` [PATCH 05/11] nptl: Deallocate the thread stack on setup failure (BZ #19511) Adhemerval Zanella
2021-05-26 17:33   ` Florian Weimer
2021-05-26 17:51     ` Adhemerval Zanella
2021-05-26 17:58       ` Florian Weimer
2021-05-26 19:19         ` Adhemerval Zanella [this message]
2021-05-26 18:21   ` Andreas Schwab
2021-05-26 18:40     ` Adhemerval Zanella
2021-05-26 19:26   ` Adhemerval Zanella
2021-05-27  7:43     ` Florian Weimer
2021-05-26 16:57 ` [PATCH 06/11] nptl: Install cancellation handler on pthread_cancel Adhemerval Zanella
2021-05-26 17:38   ` Florian Weimer
2021-05-26 17:52     ` Adhemerval Zanella
2021-05-26 16:57 ` [PATCH 07/11] nptl: Remove CANCELING_BITMASK Adhemerval Zanella
2021-05-26 18:02   ` Florian Weimer
2021-06-15 22:07   ` Florian Weimer
2021-06-15 23:33     ` Adhemerval Zanella
2021-06-16 12:46       ` Adhemerval Zanella
2021-05-26 16:57 ` [PATCH 08/11] nptl: Move cancel state out of cancelhandling Adhemerval Zanella
2021-05-26 18:20   ` Florian Weimer
2021-05-27 16:40     ` Adhemerval Zanella
2021-05-27 16:48       ` Florian Weimer
2021-05-27 16:57         ` Adhemerval Zanella
2021-05-26 16:57 ` [PATCH 09/11] nptl: Move cancel type " Adhemerval Zanella
2021-05-26 16:57 ` [PATCH 10/11] nptl: Implement raise in terms of pthread_kill Adhemerval Zanella
2021-05-26 16:57 ` [PATCH 11/11] nptl: Use pthread_kill on pthread_cancel Adhemerval Zanella

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4a561aad-4ba3-897b-e6b6-31305b06a088@linaro.org \
    --to=adhemerval.zanella@linaro.org \
    --cc=fweimer@redhat.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).