public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* RFC: clone with CLONE_VM behavior
@ 2016-04-13 13:26 Adhemerval Zanella
  2016-04-13 13:32 ` Adhemerval Zanella
  0 siblings, 1 reply; 4+ messages in thread
From: Adhemerval Zanella @ 2016-04-13 13:26 UTC (permalink / raw)
  To: GNU C Library

Hi all,

Szabolcs has brought to my attention that the new posix_spawn is showing
some issue on his aarch64 [1], but it is not limited to aarch64.

The problem is due the fact GLIBC clone implementation resets both
THREAD_SELF pid and tid when CLONE_VM is specified. This leads to
inconsistency since the value is not restored back in parent and
thus INVALID_TD_P and INVALID_NOT_TERMINATED_TD_P (used in pthread
implementations) will bail with an error handler.

Previous posix_spawn uses vfork which only interferes with THREAD_SELF
pid field by negating it before the syscall and restoring the value
after it.

I am trying to came up with the best solution for this, since both
pid and tid is used in both pthread_{join,cancel} and also on raise
(which also have another issue somewhat related [2]) and I am inclined
to just remove the CLONE_VM changes to pid/tid fields in the syscall
itself and moving it to START_THREAD_DEFN instead.

Any better ideas, tips, advices?

[1] https://sourceware.org/ml/libc-alpha/2016-04/msg00274.html
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=15368

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFC: clone with CLONE_VM behavior
  2016-04-13 13:26 RFC: clone with CLONE_VM behavior Adhemerval Zanella
@ 2016-04-13 13:32 ` Adhemerval Zanella
  2016-04-15  4:56   ` Carlos O'Donell
  0 siblings, 1 reply; 4+ messages in thread
From: Adhemerval Zanella @ 2016-04-13 13:32 UTC (permalink / raw)
  To: GNU C Library



On 13-04-2016 10:26, Adhemerval Zanella wrote:
> Hi all,
> 
> Szabolcs has brought to my attention that the new posix_spawn is showing
> some issue on his aarch64 [1], but it is not limited to aarch64.
> 
> The problem is due the fact GLIBC clone implementation resets both
> THREAD_SELF pid and tid when CLONE_VM is specified. This leads to
> inconsistency since the value is not restored back in parent and
> thus INVALID_TD_P and INVALID_NOT_TERMINATED_TD_P (used in pthread
> implementations) will bail with an error handler.
> 
> Previous posix_spawn uses vfork which only interferes with THREAD_SELF
> pid field by negating it before the syscall and restoring the value
> after it.
> 
> I am trying to came up with the best solution for this, since both
> pid and tid is used in both pthread_{join,cancel} and also on raise
> (which also have another issue somewhat related [2]) and I am inclined
> to just remove the CLONE_VM changes to pid/tid fields in the syscall
> itself and moving it to START_THREAD_DEFN instead.

In fact, pthread does not really require it, since it will use CLONE_THREAD
and the glibc clone implementation won't change THERAD_SELF pid/tid field.
So the question is if glibc really should change pid/tid value for
CLONE_VM flag.

My impression is to allow programs that might use this flag to be able
to use the raise function and force pthread call to fail (due invalid
handle), however I am not convinced it is the right path to take
(and with proposed solution on BZ#15368 I think using cached value is
not an option).

> 
> Any better ideas, tips, advices?
> 
> [1] https://sourceware.org/ml/libc-alpha/2016-04/msg00274.html
> [2] https://sourceware.org/bugzilla/show_bug.cgi?id=15368
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFC: clone with CLONE_VM behavior
  2016-04-13 13:32 ` Adhemerval Zanella
@ 2016-04-15  4:56   ` Carlos O'Donell
  2016-04-15 13:13     ` Adhemerval Zanella
  0 siblings, 1 reply; 4+ messages in thread
From: Carlos O'Donell @ 2016-04-15  4:56 UTC (permalink / raw)
  To: Adhemerval Zanella, GNU C Library, Florian Weimer
  Cc: Rich Felker, Szabolcs Nagy

On 04/13/2016 09:32 AM, Adhemerval Zanella wrote:
> In fact, pthread does not really require it, since it will use CLONE_THREAD
> and the glibc clone implementation won't change THERAD_SELF pid/tid field.
> So the question is if glibc really should change pid/tid value for
> CLONE_VM flag.
> 
> My impression is to allow programs that might use this flag to be able
> to use the raise function and force pthread call to fail (due invalid
> handle), however I am not convinced it is the right path to take
> (and with proposed solution on BZ#15368 I think using cached value is
> not an option).

Florian and I discussed this briefly, and I think we came to some more
dire conclusions. After a cup of coffee and further discussions with
Adhemerval and Szabolcs Nagy I realized things are not so bad. I would
still like to see Florian review this list and agree with me.

I went through several questions when thinking about this issue, and so
I wrote them all down, even though only (e) applies.

(a) Non-problem: vfork.

All glibc vfork implementations set tcb->pthread->pid to -PID during the vfork
operation. Calls to getpid or getppid in a signal handler at this point will
return valid results from having called getpid or getppid from the kernel
(because they see a negative value in the cache), and will *not* refresh the 
tcb->pthread->tid cache (tid is non-zero) because doing so would result in
the cache having a stale value in the child and a signal handler could observe
that.

Corollary: All async-signal-safe functions that operate on the tid/pid cache
           directly must understand -PID/-TID.

(b) Non-problem: failed exec after vfork changes parent errno value.

A failed exec after vfork sets errno, and that sets the parent's errno, and
thus vfork technically sets errno but is not described as setting errno.
POSIX says that since vfork doesn't describe setting errno that the value
is unspecified, and so this is OK from a standards perspective that calling
vfork may change the parent's errno.

(c) Non-problem: pthread_join from another thread to a vfork-ing thread.

A vfork-ing thread will have it's child write -PID into it's own PID cache,
but the TID remains intact. Therefore another thread may still call
pthread_join on the thread that is vfork-ing and wait for it to exit.

(d) Problem: raise is not async-signal safe.

Still true. See https://sourceware.org/bugzilla/show_bug.cgi?id=15368
Though with (e) fixed there is one less scenario to worry about.

(e) Problem: pthread_join from another thread to a posix_spawn'ing thread.

The test nptl/tst-exec1.c calls pthread_join on a thread using posix_spawn,
and this results in a testsuite failure given the new posix_spawn
implementation. The thread spawning the child using posix_spawn is able
to accelerate the spawn with clone, but that results in tid/pid caches
set to -1 and thus the pthread_join from the other thread fails, and the
test fails.

The real problem is that __clone sets tid/pid to -1 in child using
parent memory. 

All implmentations of clone.S in glibc write -1 to tid/pid in the
tcb->pthread->pid/tid fields. This modifies the tid/pid of the parent.
It would seem that the purpose of this is to reset the tid/pid cache
such that a signal handler calling gepid/getpid in the child will
call the syscall to get the new correct results. The consequence is
that any external parallel thread calling pthread_join will see the
value of -1 in tid and be unable to wait on the thread calling vfork
(there is no way to get the right tid value anymore). You should be
able to join on the thread calling posix_spawn. I expect it was an
oversight that pthread_join fails to work in this case e.g. vfork
from a thread was not considered.

There are two theoretical solutions, but only one is practical:

(e.1) Remove setting tid/pid to -1 in __clone and provide users with a clone
      that does *nothing* to the parent.

      This would fix a complaint from ASAN which has them using a direct clone
      syscall because our clone wrapper breaks the parent's tid/pid with CLONE_VM
      but not CLONE_THREAD (they use this for a detached pseudo-monitor thread).

      https://github.com/llvm-mirror/compiler-rt/blob/master/lib/sanitizer_common/sanitizer_linux.cc#L881

      Verify that posix_spawn{p} does not modify tid/pid. The child created by
      posix_spawn{p} will have default signal dispositions so no signal handler
      exists that can observe the wrong tid/pid.

      Change the tst-pid* tests in glibc to verify tid/pid does not change.

(e.2) Create a fake tcb->pthread.

      In the parent create a tcbhead_t, and struct pthread, and set them up with
      enough data they are valid, and reference the parent's TLS data or a copy
      of the parent's TLS data.

      Call clone with with CLONE_SETTLS pointing at the tcbhead_t copy, with
      CLONE_PARENT_SETTID pointing at the pid, and CLONE_CHILD_SETTID pointing
      at the tid. This will mean that you enter the child with unique data
      for struct pthread, tcbhead_t, and unique and correct values for pid/tid
      which are the same.

      You can immediately take a signal and the returned values from getpid
      and getppid in a signal handler would be valid.

      It is unclear exactly how hard it will be to get a valid working tcbhead_t
      or struct pthread setup and initialized for everything to operate
      correctly.

In practice (1) is the most reasonable.

The only problem is if we have an expectation that user code can call clone
or __clone2 and have getpid and getppid work correctly in the child? I think
the answer to that is: No we don't have any such expectation because clone
is designed for low-level runtime implementation.

I support removing the -1 tid/pid reset in clone/__clone2 (all clone.S assembly
files for all machines).

Other comments?

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: RFC: clone with CLONE_VM behavior
  2016-04-15  4:56   ` Carlos O'Donell
@ 2016-04-15 13:13     ` Adhemerval Zanella
  0 siblings, 0 replies; 4+ messages in thread
From: Adhemerval Zanella @ 2016-04-15 13:13 UTC (permalink / raw)
  To: Carlos O'Donell, GNU C Library, Florian Weimer
  Cc: Rich Felker, Szabolcs Nagy



On 15-04-2016 01:56, Carlos O'Donell wrote:
> On 04/13/2016 09:32 AM, Adhemerval Zanella wrote:
>> In fact, pthread does not really require it, since it will use CLONE_THREAD
>> and the glibc clone implementation won't change THERAD_SELF pid/tid field.
>> So the question is if glibc really should change pid/tid value for
>> CLONE_VM flag.
>>
>> My impression is to allow programs that might use this flag to be able
>> to use the raise function and force pthread call to fail (due invalid
>> handle), however I am not convinced it is the right path to take
>> (and with proposed solution on BZ#15368 I think using cached value is
>> not an option).
> 
> Florian and I discussed this briefly, and I think we came to some more
> dire conclusions. After a cup of coffee and further discussions with
> Adhemerval and Szabolcs Nagy I realized things are not so bad. I would
> still like to see Florian review this list and agree with me.
> 
> I went through several questions when thinking about this issue, and so
> I wrote them all down, even though only (e) applies.
> 
> (a) Non-problem: vfork.
> 
> All glibc vfork implementations set tcb->pthread->pid to -PID during the vfork
> operation. Calls to getpid or getppid in a signal handler at this point will
> return valid results from having called getpid or getppid from the kernel
> (because they see a negative value in the cache), and will *not* refresh the 
> tcb->pthread->tid cache (tid is non-zero) because doing so would result in
> the cache having a stale value in the child and a signal handler could observe
> that.
> 
> Corollary: All async-signal-safe functions that operate on the tid/pid cache
>            directly must understand -PID/-TID.
> 
> (b) Non-problem: failed exec after vfork changes parent errno value.
> 
> A failed exec after vfork sets errno, and that sets the parent's errno, and
> thus vfork technically sets errno but is not described as setting errno.
> POSIX says that since vfork doesn't describe setting errno that the value
> is unspecified, and so this is OK from a standards perspective that calling
> vfork may change the parent's errno.

As you stated vfork it not an issue here, since it will use either the vfork
syscall or clone directly, without calling the __clone.

> 
> (c) Non-problem: pthread_join from another thread to a vfork-ing thread.
> 
> A vfork-ing thread will have it's child write -PID into it's own PID cache,
> but the TID remains intact. Therefore another thread may still call
> pthread_join on the thread that is vfork-ing and wait for it to exit.
> 
> (d) Problem: raise is not async-signal safe.
> 
> Still true. See https://sourceware.org/bugzilla/show_bug.cgi?id=15368
> Though with (e) fixed there is one less scenario to worry about.

This BZ is on my radar and the proposed solution, although it is considering
the default implementation (nptl/raise.c), instead of the Linux one
(unix/sysv/linux/{pt-}raise.c).

I think the approach outline in the bug report is the correct one,
although slower (it much lock all signals, execute a gettid).

> 
> (e) Problem: pthread_join from another thread to a posix_spawn'ing thread.
> 
> The test nptl/tst-exec1.c calls pthread_join on a thread using posix_spawn,
> and this results in a testsuite failure given the new posix_spawn
> implementation. The thread spawning the child using posix_spawn is able
> to accelerate the spawn with clone, but that results in tid/pid caches
> set to -1 and thus the pthread_join from the other thread fails, and the
> test fails.
> 
> The real problem is that __clone sets tid/pid to -1 in child using
> parent memory. 
> 
> All implmentations of clone.S in glibc write -1 to tid/pid in the
> tcb->pthread->pid/tid fields. This modifies the tid/pid of the parent.
> It would seem that the purpose of this is to reset the tid/pid cache
> such that a signal handler calling gepid/getpid in the child will
> call the syscall to get the new correct results. The consequence is
> that any external parallel thread calling pthread_join will see the
> value of -1 in tid and be unable to wait on the thread calling vfork
> (there is no way to get the right tid value anymore). You should be
> able to join on the thread calling posix_spawn. I expect it was an
> oversight that pthread_join fails to work in this case e.g. vfork
> from a thread was not considered.
> 
> There are two theoretical solutions, but only one is practical:
> 
> (e.1) Remove setting tid/pid to -1 in __clone and provide users with a clone
>       that does *nothing* to the parent.
> 
>       This would fix a complaint from ASAN which has them using a direct clone
>       syscall because our clone wrapper breaks the parent's tid/pid with CLONE_VM
>       but not CLONE_THREAD (they use this for a detached pseudo-monitor thread).
> 
>       https://github.com/llvm-mirror/compiler-rt/blob/master/lib/sanitizer_common/sanitizer_linux.cc#L881
> 
>       Verify that posix_spawn{p} does not modify tid/pid. The child created by
>       posix_spawn{p} will have default signal dispositions so no signal handler
>       exists that can observe the wrong tid/pid.
> 
>       Change the tst-pid* tests in glibc to verify tid/pid does not change.

This is the approach I am working currently and the one I think the correct way.
As a side note, even with this modification ASAN will need to carry one its
own clone implementation because it requires the clone syscall to not modify
anything in its own pthread, not even the pid/tid in case of success.

To really fix it for ASAN it will require to remove the new pid/tid write
and move it for fork() code and fix the places in GLIBC where it calls
clone directly (sysdeps/unix/sysv/linux/system.c).

> 
> (e.2) Create a fake tcb->pthread.
> 
>       In the parent create a tcbhead_t, and struct pthread, and set them up with
>       enough data they are valid, and reference the parent's TLS data or a copy
>       of the parent's TLS data.
> 
>       Call clone with with CLONE_SETTLS pointing at the tcbhead_t copy, with
>       CLONE_PARENT_SETTID pointing at the pid, and CLONE_CHILD_SETTID pointing
>       at the tid. This will mean that you enter the child with unique data
>       for struct pthread, tcbhead_t, and unique and correct values for pid/tid
>       which are the same.
> 
>       You can immediately take a signal and the returned values from getpid
>       and getppid in a signal handler would be valid.
> 
>       It is unclear exactly how hard it will be to get a valid working tcbhead_t
>       or struct pthread setup and initialized for everything to operate
>       correctly.

I do not really this approach because it add much more complexity than the
implementation really requires to setup a cloned task (just check TLS_INIT_TP)
to execute posix_spawn.

> 
> In practice (1) is the most reasonable.
> 
> The only problem is if we have an expectation that user code can call clone
> or __clone2 and have getpid and getppid work correctly in the child? I think
> the answer to that is: No we don't have any such expectation because clone
> is designed for low-level runtime implementation.
> 
> I support removing the -1 tid/pid reset in clone/__clone2 (all clone.S assembly
> files for all machines).
> 
> Other comments?

I think the main problem is here is if we rellay should support direct clone
syscall. GLIBC requires some internal state update for all the clone internal
usage (either in thread creation, fork, and now posix_spawn) so pursing it
might be problematic.

As Szabolcs has pointed out in IRC, clone usage outside GLIBC are *very*
specific and contains some very dubious assumptions. For instance systemtap [1]
do uses clone directly with CLONE_VM, but does not seem to rely on getpid
to work.

[1] http://sources.debian.net/src/systemtap/3.0-2/testsuite/systemtap.clone/dtrace_clone.c/?hl=27#L27



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-04-15 13:13 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-13 13:26 RFC: clone with CLONE_VM behavior Adhemerval Zanella
2016-04-13 13:32 ` Adhemerval Zanella
2016-04-15  4:56   ` Carlos O'Donell
2016-04-15 13:13     ` Adhemerval Zanella

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).