* ctermid: return string literal, document MT-Safety pitfall
@ 2014-11-07 8:35 Alexandre Oliva
  2014-11-07 10:36 ` Richard Henderson
  2014-11-11 13:30 ` Florian Weimer
  0 siblings, 2 replies; 24+ messages in thread
From: Alexandre Oliva @ 2014-11-07 8:35 UTC (permalink / raw)
  To: libc-alpha

The ctermid implementation, like cuserid, uses a static buffer.

I noticed this one, but I reasoned that, since the buffer was
initialized with the same short string in every thread that called the
function without passing it a buffer, the value would remain unchanged,
and so no harmful effects would be caused by what is technically a data
race.

This was based on an interpretation that strcpy (and memcpy, and
compiler-inlined versions thereof) could not write garbage in the
destination before writing the intended values, because this would be a
deviation from the specification, and it could be observed by an
asynchronous signal handler.

Whether or not this reading of POSIX is correct is not so important:
ctermid can be implemented so as to return a pre-initialized static
buffer, instead of initializing it every time. Callers are not allowed
by POSIX to modify this buffer, so we can even make it read-only.

This patch does this, to sidestep the debate. It might even be the case
that it makes ctermid more efficient, since it avoids reinitializing a
static buffer every time. GCC is still smart enough to notice that,
when a buffer is passed in, the string copied to it is a known constant,
so it optimizes the strcpy to the same sequence of stores used before
this patch.

As for the MT-Safety documentation, I update the comments next to the
annotations to reflect this change in the implementation, add a note
indicating we diverge from POSIX in the static buffer case (MT-Safety is
not required), and suggest that, when we drop the note that indicates
this is preliminary documentation about the current implementation,
rather than a commitment to remain within these safety boundaries in the
future, we may want to add a note indicating the possibility of a race
condition.

Ok to install?


From: Alexandre Oliva <aoliva@redhat.com>

for ChangeLog

	* sysdeps/posix/ctermid.c (ctermid): Return a pointer to a
	string literal if not passed a buffer.
	* manual/job.texi (ctermid): Update reasoning, note deviation
	from posix, suggest mtasurace when not passed a buffer, for
	future non-preliminary safety notes.
---
 manual/job.texi         |  8 +++++---
 sysdeps/posix/ctermid.c | 13 +++++++------
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/manual/job.texi b/manual/job.texi
index 4f9bd81..095c26d 100644
--- a/manual/job.texi
+++ b/manual/job.texi
@@ -1039,10 +1039,12 @@ The function @code{ctermid} is declared in the header file
 @comment stdio.h
 @comment POSIX.1
 @deftypefun {char *} ctermid (char *@var{string})
-@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
+@safety{@prelim{}@mtsafe{@mtsposix{/!string}}@assafe{}@acsafe{}}
 @c This function is a stub by default; the actual implementation, for
-@c posix systems, returns an internal buffer if passed a NULL string,
-@c but the internal buffer is always set to /dev/tty.
+@c posix systems, returns a pointer to a string literal if passed a NULL
+@c string. It's not clear we want to commit to being MT-Safe in the
+@c !string case, so maybe add mtasurace{:ctermid/!string} when we take
+@c prelim out, to make room for using a static buffer in the future.
 The @code{ctermid} function returns a string containing the file name of
 the controlling terminal for the current process. If @var{string} is
 not a null pointer, it should be an array that can hold at least
diff --git a/sysdeps/posix/ctermid.c b/sysdeps/posix/ctermid.c
index 0ef9a3f..ca81d42 100644
--- a/sysdeps/posix/ctermid.c
+++ b/sysdeps/posix/ctermid.c
@@ -19,17 +19,18 @@
 #include <string.h>


-/* Return the name of the controlling terminal.
-   If S is not NULL, the name is copied into it (it should be at
-   least L_ctermid bytes long), otherwise a static buffer is used. */
+/* Return the name of the controlling terminal. If S is not NULL, the
+   name is copied into it (it should be at least L_ctermid bytes
+   long), otherwise we return a pointer to a non-const but read-only
+   string literal, that POSIX states the caller must not modify. */
 char *
 ctermid (s)
      char *s;
 {
-  static char name[L_ctermid];
+  char *name = (char /*drop const*/ *) "/dev/tty";

   if (s == NULL)
-    s = name;
+    return name;

-  return strcpy (s, "/dev/tty");
+  return strcpy (s, name);
 }

--
Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/ FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

^ permalink raw reply [flat|nested] 24+ messages in thread
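For readers who want to try the two shapes of ctermid side by side, here is a minimal standalone sketch. It is not the glibc source: the function names `sketch_ctermid_old` and `sketch_ctermid_new` and the macro `SKETCH_L_CTERMID` (with its value of 16) are hypothetical stand-ins chosen for illustration. The first function stores into a shared static array on every call without a buffer, which is the data race described in the message above; the second performs no store in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for L_ctermid from <stdio.h>; the value is an
   assumption so that the sketch does not depend on the real macro.  */
#define SKETCH_L_CTERMID 16

/* Pre-patch shape: a call without a buffer rewrites a shared static
   array, so concurrent callers all store to NAME.  */
char *
sketch_ctermid_old (char *s)
{
  static char name[SKETCH_L_CTERMID];

  if (s == NULL)
    s = name;

  return strcpy (s, "/dev/tty");
}

/* Post-patch shape: the no-buffer case stores nothing; it returns a
   pointer to a string literal that the caller must not modify.  */
char *
sketch_ctermid_new (char *s)
{
  char *name = (char *) "/dev/tty";

  if (s == NULL)
    return name;

  /* The caller's buffer is assumed to hold SKETCH_L_CTERMID bytes.  */
  assert (strlen (name) < SKETCH_L_CTERMID);
  return strcpy (s, name);
}
```

Both functions return the same string to the caller; the difference is only in whether shared memory is written when no buffer is passed.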
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-07 8:35 ctermid: return string literal, document MT-Safety pitfall Alexandre Oliva @ 2014-11-07 10:36 ` Richard Henderson 2014-11-08 14:22 ` Alexandre Oliva 2014-11-11 13:30 ` Florian Weimer 1 sibling, 1 reply; 24+ messages in thread From: Richard Henderson @ 2014-11-07 10:36 UTC (permalink / raw) To: Alexandre Oliva, libc-alpha On 11/07/2014 09:35 AM, Alexandre Oliva wrote: > char * > ctermid (s) > char *s; Can you please fix the K&R at the same time? r~ ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: ctermid: return string literal, document MT-Safety pitfall
  2014-11-07 10:36 ` Richard Henderson
@ 2014-11-08 14:22   ` Alexandre Oliva
  2014-11-08 15:01     ` Richard Henderson
  0 siblings, 1 reply; 24+ messages in thread
From: Alexandre Oliva @ 2014-11-08 14:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: libc-alpha

On Nov 7, 2014, Richard Henderson <rth@twiddle.net> wrote:

> On 11/07/2014 09:35 AM, Alexandre Oliva wrote:
>> char *
>> ctermid (s)
>> char *s;

> Can you please fix the K&R at the same time?

Sure, how's this?


ctermid: return string literal, document MT-Safety pitfall

From: Alexandre Oliva <aoliva@redhat.com>

for ChangeLog

	* sysdeps/posix/ctermid.c (ctermid): Return a pointer to a
	string literal if not passed a buffer.
	* manual/job.texi (ctermid): Update reasoning, note deviation
	from posix, suggest mtasurace when not passed a buffer, for
	future non-preliminary safety notes.
---
 manual/job.texi         |  8 +++++---
 sysdeps/posix/ctermid.c | 16 ++++++++--------
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/manual/job.texi b/manual/job.texi
index 4f9bd81..095c26d 100644
--- a/manual/job.texi
+++ b/manual/job.texi
@@ -1039,10 +1039,12 @@ The function @code{ctermid} is declared in the header file
 @comment stdio.h
 @comment POSIX.1
 @deftypefun {char *} ctermid (char *@var{string})
-@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
+@safety{@prelim{}@mtsafe{@mtsposix{/!string}}@assafe{}@acsafe{}}
 @c This function is a stub by default; the actual implementation, for
-@c posix systems, returns an internal buffer if passed a NULL string,
-@c but the internal buffer is always set to /dev/tty.
+@c posix systems, returns a pointer to a string literal if passed a NULL
+@c string. It's not clear we want to commit to being MT-Safe in the
+@c !string case, so maybe add mtasurace{:ctermid/!string} when we take
+@c prelim out, to make room for using a static buffer in the future.
 The @code{ctermid} function returns a string containing the file name of
 the controlling terminal for the current process. If @var{string} is
 not a null pointer, it should be an array that can hold at least
diff --git a/sysdeps/posix/ctermid.c b/sysdeps/posix/ctermid.c
index 0ef9a3f..9714285 100644
--- a/sysdeps/posix/ctermid.c
+++ b/sysdeps/posix/ctermid.c
@@ -19,17 +19,17 @@
 #include <string.h>


-/* Return the name of the controlling terminal.
-   If S is not NULL, the name is copied into it (it should be at
-   least L_ctermid bytes long), otherwise a static buffer is used. */
+/* Return the name of the controlling terminal. If S is not NULL, the
+   name is copied into it (it should be at least L_ctermid bytes
+   long), otherwise we return a pointer to a non-const but read-only
+   string literal, that POSIX states the caller must not modify. */
 char *
-ctermid (s)
-     char *s;
+ctermid (char *s)
 {
-  static char name[L_ctermid];
+  char *name = (char /*drop const*/ *) "/dev/tty";

   if (s == NULL)
-    s = name;
+    return name;

-  return strcpy (s, "/dev/tty");
+  return strcpy (s, name);
 }

--
Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/ FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-08 14:22 ` Alexandre Oliva @ 2014-11-08 15:01 ` Richard Henderson 2014-11-11 14:37 ` Torvald Riegel 0 siblings, 1 reply; 24+ messages in thread From: Richard Henderson @ 2014-11-08 15:01 UTC (permalink / raw) To: Alexandre Oliva; +Cc: libc-alpha On 11/08/2014 03:20 PM, Alexandre Oliva wrote: > ctermid: return string literal, document MT-Safety pitfall > > From: Alexandre Oliva <aoliva@redhat.com> > > for ChangeLog > > * sysdeps/posix/ctermid.c (ctermid): Return a pointer to a > string literal if not passed a buffer. > * manual/job.texi (ctermid): Update reasoning, note deviation > from posix, suggest mtasurace when not passed a buffer, for > future non-preliminary safety notes. LGTM. r~ ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-08 15:01 ` Richard Henderson @ 2014-11-11 14:37 ` Torvald Riegel 0 siblings, 0 replies; 24+ messages in thread From: Torvald Riegel @ 2014-11-11 14:37 UTC (permalink / raw) To: Richard Henderson; +Cc: Alexandre Oliva, libc-alpha On Sat, 2014-11-08 at 16:01 +0100, Richard Henderson wrote: > On 11/08/2014 03:20 PM, Alexandre Oliva wrote: > > ctermid: return string literal, document MT-Safety pitfall > > > > From: Alexandre Oliva <aoliva@redhat.com> > > > > for ChangeLog > > > > * sysdeps/posix/ctermid.c (ctermid): Return a pointer to a > > string literal if not passed a buffer. > > * manual/job.texi (ctermid): Update reasoning, note deviation > > from posix, suggest mtasurace when not passed a buffer, for > > future non-preliminary safety notes. > > LGTM. Same for me. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-07 8:35 ctermid: return string literal, document MT-Safety pitfall Alexandre Oliva 2014-11-07 10:36 ` Richard Henderson @ 2014-11-11 13:30 ` Florian Weimer 2014-11-13 21:03 ` Alexandre Oliva 1 sibling, 1 reply; 24+ messages in thread From: Florian Weimer @ 2014-11-11 13:30 UTC (permalink / raw) To: Alexandre Oliva, libc-alpha On 11/07/2014 09:35 AM, Alexandre Oliva wrote: > This was based on an interpretation that strcpy (and memcpy, and > compiler-inlined versions thereof) could not write garbage in the > destination before writing the intended values, because this would be a > deviation from the specification, and it could be observed by an > asynchronous signal handler. Which specification do you mean? glibc or the C standard? -- Florian Weimer / Red Hat Product Security ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-11 13:30 ` Florian Weimer @ 2014-11-13 21:03 ` Alexandre Oliva 2014-11-14 12:01 ` Florian Weimer 0 siblings, 1 reply; 24+ messages in thread From: Alexandre Oliva @ 2014-11-13 21:03 UTC (permalink / raw) To: Florian Weimer; +Cc: libc-alpha On Nov 11, 2014, Florian Weimer <fweimer@redhat.com> wrote: > On 11/07/2014 09:35 AM, Alexandre Oliva wrote: >> This was based on an interpretation that strcpy (and memcpy, and >> compiler-inlined versions thereof) could not write garbage in the >> destination before writing the intended values, because this would be a >> deviation from the specification, and it could be observed by an >> asynchronous signal handler. > Which specification do you mean? glibc or the C standard? I meant standard C. -- Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-13 21:03 ` Alexandre Oliva @ 2014-11-14 12:01 ` Florian Weimer 2014-11-14 13:28 ` Torvald Riegel 2014-11-14 16:46 ` Alexandre Oliva 0 siblings, 2 replies; 24+ messages in thread From: Florian Weimer @ 2014-11-14 12:01 UTC (permalink / raw) To: Alexandre Oliva; +Cc: libc-alpha On 11/13/2014 10:03 PM, Alexandre Oliva wrote: > On Nov 11, 2014, Florian Weimer <fweimer@redhat.com> wrote: > >> On 11/07/2014 09:35 AM, Alexandre Oliva wrote: >>> This was based on an interpretation that strcpy (and memcpy, and >>> compiler-inlined versions thereof) could not write garbage in the >>> destination before writing the intended values, because this would be a >>> deviation from the specification, and it could be observed by an >>> asynchronous signal handler. > >> Which specification do you mean? glibc or the C standard? > > I meant standard C. I've been staring at the standard for a while. The standard explicitly refuses to deal with the interaction of signal handlers and threads (7.14.1.1/7, “Use of this function in a multi-threaded program results in undefined behavior.”). However, the standard still required that lock-free atomic objects have values which are not unspecified. But as far as I can tell, the standard does not explicitly sequence operations on atomic objects, so the normal sequencing rules apply, and they fail to specify a value, so the value is still effectively unspecified, and library functions such as memcpy and memset can write ghost values, or can be implemented with one-char-at-a-time loops, and there is no way to observe that. This (the “not unspecified but not specified either” state) seems to be a defect in the standard. 
I very much doubt the intent was to invalidate existing implementations which write ghost values, such as the Solaris/SPARC memset implementation: <https://bugs.openjdk.java.net/browse/JDK-6948537> -- Florian Weimer / Red Hat Product Security ^ permalink raw reply [flat|nested] 24+ messages in thread
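As background for the signal-handler side of this argument, the following is a minimal sketch, not taken from the thread, of the one thing ISO C clearly lets a handler do with a static object: assign to a `volatile sig_atomic_t`. The names `sketch_flag`, `sketch_handler` and `sketch_raise_once` are hypothetical. The sketch assumes a single-threaded program, since 7.14.1.1/7 (quoted above) leaves `signal` undefined otherwise, and it uses `raise`, which does not return until the handler has returned.

```c
#include <assert.h>
#include <signal.h>

/* The only kind of static object a portable handler may assign to.  */
static volatile sig_atomic_t sketch_flag = 0;

static void
sketch_handler (int sig)
{
  (void) sig;
  sketch_flag = 1;
}

/* Install the handler, deliver SIGINT synchronously, restore the
   previous disposition, and report whether the handler ran.  */
int
sketch_raise_once (void)
{
  void (*old) (int) = signal (SIGINT, sketch_handler);

  assert (old != SIG_ERR);
  raise (SIGINT);
  signal (SIGINT, old);
  return sketch_flag;
}
```

A handler that instead read a buffer while strcpy was filling it would be doing something this sketch deliberately avoids, which is part of why the observation argument is hard to ground in the standard.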
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-14 12:01 ` Florian Weimer @ 2014-11-14 13:28 ` Torvald Riegel 2014-11-14 13:47 ` Florian Weimer 2014-11-14 16:46 ` Alexandre Oliva 1 sibling, 1 reply; 24+ messages in thread From: Torvald Riegel @ 2014-11-14 13:28 UTC (permalink / raw) To: Florian Weimer; +Cc: Alexandre Oliva, libc-alpha On Fri, 2014-11-14 at 13:01 +0100, Florian Weimer wrote: > On 11/13/2014 10:03 PM, Alexandre Oliva wrote: > > On Nov 11, 2014, Florian Weimer <fweimer@redhat.com> wrote: > > > >> On 11/07/2014 09:35 AM, Alexandre Oliva wrote: > >>> This was based on an interpretation that strcpy (and memcpy, and > >>> compiler-inlined versions thereof) could not write garbage in the > >>> destination before writing the intended values, because this would be a > >>> deviation from the specification, and it could be observed by an > >>> asynchronous signal handler. > > > >> Which specification do you mean? glibc or the C standard? > > > > I meant standard C. > > I've been staring at the standard for a while. The standard explicitly > refuses to deal with the interaction of signal handlers and threads > (7.14.1.1/7, “Use of this function in a multi-threaded program results > in undefined behavior.”). At least in ISO C++, what a signal handler can do is still being discussed. There's sig_atomic_t and the lock-free atomic ops that are safe, and at least C++ wants some ordering guarantee wrt. the installation of the signal handler and an executing signal handler. AFAIU from the ISO C++ discussions, the same discussion is happening among ISO C committee members. > However, the standard still required that lock-free atomic objects have > values which are not unspecified. But as far as I can tell, the > standard does not explicitly sequence operations on atomic objects, What do you mean by "to sequence"? The sequenced-before relation can include atomic operations, and atomic operations will be part of happens-before.
> so > the normal sequencing rules apply, and they fail to specify a value, so > the value is still effectively unspecified, and library functions such > as memcpy and memset can write ghost values, or can be implemented with > one-char-at-a-time loops, and there is no way to observe that. > > This (the “not unspecified but not specified either” state) seems to be > a defect in the standard. I very much doubt the intent was to invalidate > existing implementations which write ghost values, such as the > Solaris/SPARC memset implementation: > > <https://bugs.openjdk.java.net/browse/JDK-6948537> > I agree that what happens during the execution of non-concurrent functions is unspecified, and that the as-if rule applies. For a sequential specification of a function, one has a precondition and a postcondition -- but how to reach a state satisfying the postcondition is left to the implementation. memset is allowed to change one bit at a time, in any order. Wanting anything else would require specifying the actual implementation, which the standard doesn't do; it might be easy to assume that many implementations of a very simple function like memset would behave in a certain way -- but this already breaks down with more complex functions such as qsort (which intermediate states are actually allowed? can it use the to-be-sorted array as scratch space?). Also, making assumptions about intermediate states kills the as-if rule, hampering compiler optimizations. If we want to reason about states during the execution of a function, there must be some way to observe that, and the observation will be concurrent with the execution of the function. Thus, we need a concurrent specification, not just a sequential one, which describes the possible outcomes when combining the function and something concurrently running. ^ permalink raw reply [flat|nested] 24+ messages in thread
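The precondition/postcondition point can be made concrete with a small sketch. The two functions below are hypothetical (`sketch_memset_forward` and `sketch_memset_ghost` are names invented here, not library code). Both satisfy the same sequential specification, "after the call, the first n bytes equal c", yet they pass through different intermediate states; the second even writes a transient "ghost" value first. A compiler is free to drop those ghost stores under the as-if rule, which is itself an illustration of why no caller may rely on intermediate states.

```c
#include <assert.h>
#include <stddef.h>

/* Fill front to back, one byte at a time.  */
void *
sketch_memset_forward (void *dst, int c, size_t n)
{
  unsigned char *p = dst;

  assert (dst != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    p[i] = (unsigned char) c;
  return dst;
}

/* Fill back to front, after first scribbling a value nobody asked
   for.  Only the final state is part of the contract.  */
void *
sketch_memset_ghost (void *dst, int c, size_t n)
{
  unsigned char *p = dst;

  assert (dst != NULL || n == 0);
  for (size_t i = n; i > 0; i--)
    p[i - 1] = (unsigned char) ~c;	/* transient ghost value */
  for (size_t i = n; i > 0; i--)
    p[i - 1] = (unsigned char) c;
  return dst;
}
```

A sequential caller cannot tell the two apart; only a concurrent observer could, and nothing in the specification of memset says what such an observer would be entitled to see.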
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-14 13:28 ` Torvald Riegel @ 2014-11-14 13:47 ` Florian Weimer 2014-11-14 14:06 ` Torvald Riegel 0 siblings, 1 reply; 24+ messages in thread From: Florian Weimer @ 2014-11-14 13:47 UTC (permalink / raw) To: Torvald Riegel; +Cc: Alexandre Oliva, libc-alpha On 11/14/2014 02:28 PM, Torvald Riegel wrote: >> However, the standard still required that lock-free atomic objects have >> values which are not unspecified. But as far as I can tell, the >> standard does not explicitly sequence operations on atomic objects, > > What do you mean by "to sequence"? The sequenced-before relation can > include atomic operations, and atomic operations will be part of > happens-before. Unlike volatile accesses, accesses to atomic objects do not contribute to the sequenced-before relation directly, only their corresponding full expressions do. > Wanting anything else would require specifying the > actual implementation, which the standard doesn't do; it might be easy > to assume that many implementations of a very simple function like > memset would behave in a certain way -- but this already breaks down > with more complex functions such as qsort (which intermediate states are > actually allowed? can it use the to-be-sorted array as scratch space?). > Also, making assumptions about intermediate states kills the as-if rule, > hampering compiler optimizations. It tries to do that for memset_s, but I doubt it succeeds at this (we touched on this issue briefly before). I still think the language in the standard allows the compiler to elide dead memset_s calls, despite the intent. -- Florian Weimer / Red Hat Product Security ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-14 13:47 ` Florian Weimer @ 2014-11-14 14:06 ` Torvald Riegel 2014-11-14 16:53 ` Alexandre Oliva 0 siblings, 1 reply; 24+ messages in thread From: Torvald Riegel @ 2014-11-14 14:06 UTC (permalink / raw) To: Florian Weimer; +Cc: Alexandre Oliva, libc-alpha On Fri, 2014-11-14 at 14:47 +0100, Florian Weimer wrote: > On 11/14/2014 02:28 PM, Torvald Riegel wrote: > > Wanting anything else would require specifying the > > actual implementation, which the standard doesn't do; it might be easy > > to assume that many implementations of a very simple function like > > memset would behave in a certain way -- but this already breaks down > > with more complex functions such as qsort (which intermediate states are > > actually allowed? can it use the to-be-sorted array as scratch space?). > > Also, making assumptions about intermediate states kills the as-if rule, > > hampering compiler optimizations. > > It tries to do that for memset_s, but I doubt it succeeds at this (we > touched on this issue briefly before). I still think the language in the > standard allows the compiler to elide dead memset_s calls, despite the > intent. AFAICT memset_s is still a sequentially-specified function. Even though it states that the memory will be modified strictly according to the rules of the abstract machine, it doesn't state that the stores don't contribute to data races -- thus, data-race freedom would still be required. And it doesn't make the stores atomic. Also, for a concurrent observer to actually see the stores in the way they were issued by the memset_s, it would have to synchronize with the stores; this would require a statement how that needs to happen. It could be used to have a "volatile" memset I guess. ^ permalink raw reply [flat|nested] 24+ messages in thread
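One possible reading of the "volatile memset" remark, sketched under stated assumptions: the function name `sketch_volatile_memset` is hypothetical, and this is neither memset_s nor any glibc interface. It stores one byte at a time through a volatile-qualified pointer, which in practice keeps compilers from eliding or merging the stores. As the message above notes, that does not make the stores atomic and creates no synchronization, so concurrent readers would still race.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-wise fill through a volatile lvalue.  Each store is a volatile
   access, but none of them is atomic or ordered with respect to other
   threads.  */
void *
sketch_volatile_memset (void *dst, int c, size_t n)
{
  volatile unsigned char *p = dst;

  assert (dst != NULL || n == 0);
  while (n-- > 0)
    *p++ = (unsigned char) c;
  return dst;
}
```

Whether accesses to a non-volatile object through a volatile lvalue carry the full guarantees of volatile objects is itself a point of interpretation, so treat this as an illustration of the idea rather than a portable secure-erase routine.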
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-14 14:06 ` Torvald Riegel @ 2014-11-14 16:53 ` Alexandre Oliva 2014-11-17 9:44 ` Torvald Riegel 0 siblings, 1 reply; 24+ messages in thread From: Alexandre Oliva @ 2014-11-14 16:53 UTC (permalink / raw) To: Torvald Riegel; +Cc: Florian Weimer, libc-alpha On Nov 14, 2014, Torvald Riegel <triegel@redhat.com> wrote: > AFAICT memset_s is still a sequentially-specified function. How can you tell? It's not like the standard explicitly says so, is it? It can't be the as-if rule if intermediate results can be observed in ways that are not ruled out by the standard. -- Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-14 16:53 ` Alexandre Oliva @ 2014-11-17 9:44 ` Torvald Riegel 2014-11-18 22:23 ` Alexandre Oliva 0 siblings, 1 reply; 24+ messages in thread From: Torvald Riegel @ 2014-11-17 9:44 UTC (permalink / raw) To: Alexandre Oliva; +Cc: Florian Weimer, libc-alpha On Fri, 2014-11-14 at 14:53 -0200, Alexandre Oliva wrote: > On Nov 14, 2014, Torvald Riegel <triegel@redhat.com> wrote: > > > AFAICT memset_s is still a sequentially-specified function. > > How can you tell? It's not like the standard explicitly says so, is it? > It can't be the as-if rule if intermediate results can be observed in > ways that are not ruled out by the standard. If we're talking about C11, which Florian cited, then the by-default data-race freedom requirement applies, and memset_s doesn't say anything about atomicity or ordering, so if you would observe intermediate states, you'd have a race condition. You wouldn't have a race condition if you'd have an observer that happens-before the memset_s or have the memset_s happens-before the observer. IOW, you're not allowed to look at the intermediate states. If we disregard data-race freedom for a second, memset_s is, in comparison to memset, a little special in that it says that the function has to be executed strictly according to the rules of the abstract machine. That may look like it could be useful for concurrent settings, but then you still have the issue that observers need to be constrained as well, execution under racing accesses from multiple threads is still undefined, and there's no memory ordering (which matters less in the related ctermid case of concurrent memset_s to the same memory locations because you just store store store). memset_s doesn't specify any of that, so, by absence of defined semantics, it's still a sequential function to me.
The way I read the special memset_s requirements is that if the function's execution is terminated prematurely because of violating the runtime constraints, an observer then gets a state that is as-if equivalent to the abstract machine's. Not that you can just observe the results without it being terminated. Also, C11 states in K.3.7.4.1p4: "Unlike memset, any call to the memset_s function shall be evaluated strictly according to the rules of the abstract machine as described in (5.1.2.3)." This indicates that memset can write intermediate states; otherwise, the standard wouldn't need to state the deviation from the default for memset_s. If the standard doesn't define semantics of multi-threaded executions, I disagree that you can assume some semantics for it; it's undefined, so like undefined behavior, you can get anything. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-17 9:44 ` Torvald Riegel @ 2014-11-18 22:23 ` Alexandre Oliva 2014-11-19 22:11 ` Torvald Riegel 0 siblings, 1 reply; 24+ messages in thread From: Alexandre Oliva @ 2014-11-18 22:23 UTC (permalink / raw) To: Torvald Riegel; +Cc: Florian Weimer, libc-alpha On Nov 17, 2014, Torvald Riegel <triegel@redhat.com> wrote: > On Fri, 2014-11-14 at 14:53 -0200, Alexandre Oliva wrote: >> On Nov 14, 2014, Torvald Riegel <triegel@redhat.com> wrote: >> >> > AFAICT memset_s is still a sequentially-specified function. >> >> How can you tell? It's not like the standard explicitly says so, is it? >> It can't be the as-if rule if intermediate results can be observed in >> ways that are not ruled out by the standard. > If we're talking about C11, which Florian cited, then the by-default > data-race freedom requirement applies, and memset_s doesn't say anything > about atomicity or ordering, so if you would observe intermediate > states, you'd have a race condition. You wouldn't have a race condition > if you'd have an observer that happens-before the memset_s or have the > memset_s happens-before the observer. IOW, you're not allowed to look > at the intermediate states. I'm not asking specifically about memset or strcpy, I'm asking how do you tell in general. You've long ago, and again recently, claimed that such functions as qsort and bsearch have sequential specifications, even though they have callbacks that must necessarily observe and compute based on intermediate states. I'm just trying to figure out what the heck you mean by “sequential function”, and by “sequential specification”. I had understood the latter had to do with specifications limited to pre- and post-conditions, but the standards we've been talking about do not limit function specifications to that. So, something is clearly amiss. 
As for observing intermediate results, we seem to have ruled out as undefined accesses from other threads, and from interrupting signal handlers. This covers almost all possibilities, but how about cancelling the thread that's running memcpy or strcpy, if it has asynchronous cancellation enabled? If you do that, and then pthread_join completes, you have set a clear happens-before relationship. Sure enough, POSIX doesn't require such functions as memcpy or strcpy to be AC-Safe, but our manual claims our current implementations are. Does this mean it is safe to access the variables that were partially modified by the interrupted memcpy/strcpy/whatever, and that this provides means to safely inspect intermediate states? Or does it mean our manual should not claim these functions to be AC-Safe, just so that we can claim a program that attempts to inspect intermediate states of strcpy is undefined behavior? Or could we resort to any other argument to make it undefined? -- Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-18 22:23 ` Alexandre Oliva @ 2014-11-19 22:11 ` Torvald Riegel 2014-11-21 9:31 ` Alexandre Oliva 0 siblings, 1 reply; 24+ messages in thread From: Torvald Riegel @ 2014-11-19 22:11 UTC (permalink / raw) To: Alexandre Oliva; +Cc: Florian Weimer, libc-alpha On Tue, 2014-11-18 at 20:23 -0200, Alexandre Oliva wrote: > On Nov 17, 2014, Torvald Riegel <triegel@redhat.com> wrote: > > > On Fri, 2014-11-14 at 14:53 -0200, Alexandre Oliva wrote: > >> On Nov 14, 2014, Torvald Riegel <triegel@redhat.com> wrote: > >> > >> > AFAICT memset_s is still a sequentially-specified function. > >> > >> How can you tell? It's not like the standard explicitly says so, is it? > >> It can't be the as-if rule if intermediate results can be observed in > >> ways that are not ruled out by the standard. > > > If we're talking about C11, which Florian cited, then the by-default > > data-race freedom requirement applies, and memset_s doesn't say anything > > about atomicity or ordering, so if you would observe intermediate > > states, you'd have a race condition. You wouldn't have a race condition > > if you'd have an observer that happens-before the memset_s or have the > > memset_s happens-before the observer. IOW, you're not allowed to look > > at the intermediate states. > > I'm not asking specifically about memset or strcpy, I'm asking how do > you tell in general. For C11 specifically, what I wrote above applies. If the function is doing something, for example a store, and the function does not specify what it does internally and which inter-thread happens-before relations this creates, then there's nothing specified that makes this not a data race if you try to look at an intermediate state from another thread. So if you actually look at an intermediate state, there's a data race. Data-race-freedom is the default. 
5.1.2.3p10 gives an example of an allowed implementation that just guarantees equality of non-volatiles to the abstract machine at function boundaries; IOW, the implementation is allowed to just satisfy a post-condition (as long as it doesn't violate other invariants, introduce data races on its own, etc.). > You've long ago, and again recently, claimed that > such functions as qsort and bsearch have sequential specifications, even > though they have callbacks that must necessarily observe and compute > based on intermediate states. Well, the comparison callbacks can't just look at every piece of intermediate state at will. They get called with specific arguments, and the memory locations that they have to compare are exactly specified. So, I agree that these *specific* memory locations are intermediate states, but the comparison functions are not guaranteed to be able to look at other elements of the arrays and find sensible information in those. The promise that a function such as qsort makes is still that after it has finished, the array will be sorted. Yes it can call other functions while doing that, and it will do those calls in a way that satisfies the preconditions of those other functions (e.g., don't have garbage in the elements that a comparison function needs to compare); but that doesn't mean that it guarantees anything beyond that in terms of its promise. > I'm just trying to figure out what the > heck you mean by “sequential function”, and by “sequential > specification”. What I mean is that they are not concurrent specifications that make guarantees about states of an unfinished execution as visible to concurrent observers. They only make guarantees about the state after a function has finished executing. (Sorry if I'm using shared-memory synchronization terminology here, but given that we want to distinguish between concurrent and non-concurrent, that seems to make sense.)
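The qsort point can be illustrated with a short sketch; the names `sketch_cmp_int` and `sketch_sort_ints` are hypothetical and exist only for this example. The comparison callback receives pointers to exactly two elements and dereferences only those. It makes no assumption about what the rest of the array holds while the sort is in progress, and the only guarantee the caller relies on is the postcondition that the array is sorted once qsort returns.

```c
#include <assert.h>
#include <stdlib.h>

/* Compare the two elements handed to us and nothing else.  The rest
   of the array may be in any intermediate state during the sort.  */
static int
sketch_cmp_int (const void *a, const void *b)
{
  const int *x = a;
  const int *y = b;

  return (*x > *y) - (*x < *y);
}

/* Sort V[0..N-1] in ascending order; only the final state of V is
   part of the contract.  */
void
sketch_sort_ints (int *v, size_t n)
{
  assert (v != NULL);
  qsort (v, n, sizeof *v, sketch_cmp_int);
}
```

A comparison function that walked neighbouring elements through pointer arithmetic on its arguments would be relying on intermediate states that qsort never promised.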
> I had understood the latter had to do with > specifications limited to pre- and post-conditions, but the standards > we've been talking about do not limit function specifications to that. Why do you think that is the case? The callback, or composition of functions in general, is one thing you mentioned, and I hope I was able to convince you that this doesn't give guarantees about the caller (e.g., qsort) to the callee (e.g., comparison function), except when those guarantees overlap with preconditions for the callee. > So, something is clearly amiss. > > As for observing intermediate results, we seem to have ruled out as > undefined accesses from other threads, and from interrupting signal > handlers. Good. > This covers almost all possibilities, but how about > cancelling the thread that's running memcpy or strcpy, if it has > asynchronous cancellation enabled? If you do that, and then > pthread_join completes, you have set a clear happens-before > relationship. Well, that's what I would guess too. I definitely agree that the cancellation happens-before the return of pthread_join, but which effects then actually happen-before depends on the definition of AC-Safe. Which you seem to point out next: > Sure enough, POSIX doesn't require such functions as > memcpy or strcpy to be AC-Safe, but our manual claims our current > implementations are. That is a good question, and it really is up to the definition of AC-Safe. The one I see is (please correct me or cite further parts of the spec that may apply): "A function that may be safely invoked by an application while the asynchronous form of cancellation is enabled." That doesn't really tell me a lot :) I can interpret "safely invoked" to at least mean that the mere act of cancellation will not break anything. But it doesn't tell me which state one can expect after cancellation. I think we can distinguish between three kinds of functions here: 1) Functions like memset that (IMO) don't specify intermediate states.
2) Functions like memset_s, that to some extent do specify intermediate states (e.g., in this case, it's equivalence to steps the abstract machine would do when storing one character at a time, starting at the beginning). 3) Those with an already concurrent specification, which clearly designate atomic parts (i.e., indivisible steps, even wrt. cancellation). One way to define safety would be to say that a cancelled function should either take effect or not, but never partially take effect. IOW, it's either just the precondition or the postcondition that holds. Another option would be to allow specified intermediate steps to take effect. For 2), we could say that cancellation happens anywhere between the steps the abstract machine would do, but not within a step. This would be satisfied under the requirement you assumed for memset and strcpy implementations, I believe. For 3), it could be cancellation between any of the atomic steps, unless otherwise specified. For condvar wait, for example, this could be one of the three parts: lock release, wakeup, lock acquisition. However, that may not be really useful, so more needs to be specified (as condvars do, IIRC). If a concurrent function is supposed to be just one atomic step, safety could mean either pre- or postcondition. For 3), we can probably assume safety to be that the function was cancelled somewhere between the atomic steps it does make. > Does this mean it is safe to access the variables > that were partially modified by the interrupted memcpy/strcpy/whatever, > and that this provides means to safely inspect intermediate states? For normal memcpy, strcpy, and other functions in group 1), the intermediate states aren't defined, so unless we want to define safety as just being cancellable (and leaving the affected memory in an unspecified state), we can't do much. 
> Or > does it mean our manual should not claim these functions to be AC-Safe, > just so that we can claim a program that attempts to inspect > intermediate states of strcpy is undefined behavior? I guess not claiming AC-Safety makes most sense for group 1). Cancellation could be useful in scenarios where you actually don't need to look at the state at all -- but then AC Safety must clarify the definition that you shouldn't look at state. I'm not sure whether we want this as default though. > Or could we resort > to any other argument to make it undefined? I'm not aware of one. I believe clarifying (our interpretation of) the definition of AC-Safety is a good way forward. Or checking back with POSIX. If there is indeed an agreed upon, clear definition, we should just adapt to it I suppose. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-19 22:11 ` Torvald Riegel @ 2014-11-21 9:31 ` Alexandre Oliva 2014-11-21 17:17 ` Joseph Myers 2014-11-21 23:43 ` Torvald Riegel 0 siblings, 2 replies; 24+ messages in thread From: Alexandre Oliva @ 2014-11-21 9:31 UTC (permalink / raw) To: Torvald Riegel; +Cc: Florian Weimer, libc-alpha On Nov 19, 2014, Torvald Riegel <triegel@redhat.com> wrote: > On Tue, 2014-11-18 at 20:23 -0200, Alexandre Oliva wrote: >> On Nov 17, 2014, Torvald Riegel <triegel@redhat.com> wrote: >> >> > On Fri, 2014-11-14 at 14:53 -0200, Alexandre Oliva wrote: >> >> On Nov 14, 2014, Torvald Riegel <triegel@redhat.com> wrote: >> >> >> >> > AFAICT memset_s is still a sequentially-specified function. >> >> >> >> How can you tell? It's not like the standard explicitly says so, is it? >> I'm asking how do you tell in general. > If the function is doing something, for example a store, and the > function does not specify what it does internally and which > inter-thread happens-before relations this creates, then there's > nothing specified that makes this not a data race if you try to look > at an intermediate state from another thread. Which is why I've resorted to non-threaded means of inspection of intermediate states. I think we differ in whether “the function does not specify what it does internally”. If the definition of the function said “copy n chars from src[0..n-1] to dest[0..n-1] respectively”, besides any pre- and post- conditions, then it *does* specify what it does internally. Not the order in which the chars are copied, for sure, but still, it says the function should copy each and every one of those chars. It doesn't state how to copy a char, but anything other than load from src[i] and store the loaded value in dest[i] is hardly a copy. 
So while this makes room for an interrupted copy to leave dest[i] in an unspecified state that could be its earlier value or the newly-copied one, it would be hard to argue that anything else complies with the behavior specification enclosed in quotes above. While I can see value in making simplifying assumptions to reason about behavior in the presence of multiple threads, and I realize that the no-data-race requirements can enable *reasoning* about sequential functions in such contexts as if only the pre- and post-conditions mattered, I do not agree that applying similar reasoning to go backwards is logically sound. I mean, “I perceive this as a sequential function, which enables simplifying assumptions about internal behavior in multi-threaded contexts, therefore I can disregard the explicit behavior specification and only look at explicit or inferred (pre- and?) post-conditions to reason in any context whatsoever, or to implement the function however I like, even deviating from the specification, as long as it still satisfies the post-conditions when given the pre-conditions” doesn't hold, because there are issues that arise besides those that come up in multi-threaded contexts, to which the simplifying assumptions for reasoning about multi-threaded contexts do not apply. > Well, the comparison callbacks can't just look at will at every piece > of intermediate state. Why is that? I mean, what, if any, part of the relevant standards says so? > So, I agree that these *specific* memory locations are intermediate > states, but the comparison functions are not guaranteed to be able to > look at other elements of the arrays and find sensible information in > those.
The important question here IMHO is whether looking at them invokes undefined behavior, or just yields unspecified values, possibly narrowed to a subset of all values that might be held by the types of the objects in those locations, if there can even be valid assumptions about the types of those memory locations. >> I'm just trying to figure out what the >> heck you mean by “sequential function”, and by “sequential >> specification”. > What I mean is that they are not concurrent specifications that make > guarantees about states of an unfinished execution as visible to > concurrent observers. They only make guarantees about the state after a > function has finished executing. (Sorry if I'm using shared-memory > synchronization terminology here, but given that we want to distinguish > between concurrent and non-concurrent, that seems to make sense.) Thanks. That definitely makes sense, when the goal is to reason about shared-memory multi-threaded (henceforth SMMT) issues. But there are other issues for which this distinction, or the simplifications in SMMT reasoning that follow from it, don't apply, and may even contradict other standard-imposed requirements. So please take the “sequential function” claims with a grain of salt, and don't use them to discard parts of the specification you don't generally have to worry about when you're thinking of SMMT, when the context is not limited to SMMT. >> I had understood the latter had to do with >> specifications limited to pre- and post-conditions, but the standards >> we've been talking about do not limit function specifications to that. > Why do you think that is the case? What does “that” mean? That I had understood it in a certain way? Or that the standards do not limit specs to pre- and post-conditions?
> The callback, or composition of functions in general, is one thing you > mentioned, and I hope was able to convince you that this doesn't give > guarantees about the caller (e.g., qsort) to the callee (e.g., > comparison function), except when those guarantees overlap with > preconditions for the callee. I'm afraid you haven't, but you've helped me understand our differences in reasoning, because I won't turn specifications of behavior into pre- and post-conditions and label a function as sequential to then pretend the original specifications did not exist and did not impose any other requirements that are not necessarily relevant for SMMT contexts, but that might be in other contexts. > "A function that may be safely invoked by an application while the > asynchronous form of cancellation is enabled." > That doesn't really tell me a lot :) I can interpret "safely invoked" > to at least mean that the mere act of cancellation will not break > anything. But it doesn't tell me which state one can expect after > cancellation. Yup. Again, the important question is: is it undefined or unspecified? > One way to define safety would be to say that a cancelled function > should either take effect or not, but never partially take effect. IOW, > it's either just the precondition or the postcondition that holds. This would be a way to extend the simplifying assumptions of sequential functions to some other contexts. Sequential functions would essentially be regarded as, and required to behave as, atomic. > Another option would be to allow specified intermediate steps to take > effect. For 2), we could say that cancellation happens anywhere between > the steps the abstract machine would do, but not within a step. This > would be satisfied under the requirement you assumed for memset and > strcpy implementations, I believe. Yeah, with the caveat that the order of steps of the abstract machine that may be used to carry out the required behavior is not specified. 
So, interrupting memset, you might observe that dest[i+1] is modified while dest[i] wasn't yet, or vice-versa. > For 3), it could be cancellation between any of the atomic steps, unless > otherwise specified. For condvar wait, for example, this could be one > of the three parts: lock release, wakeup, lock acquisition. Eeek, it would be Really Bad (TM) IMHO if a condvar wait could be canceled while the lock is not held: this could mess with enclosing cleanup handlers that, among other things, release the lock. What states can cancellation cleanup handlers reliably inspect, anyway? Are they to be regarded as running in async signal context, so that they can't reliably access local state and are very limited in global state? Or are they allowed to access local state, plus any global state that could be accessed after pthread_join()ing the canceled thread? >> Does this mean it is safe to access the variables >> that were partially modified by the interrupted memcpy/strcpy/whatever, >> and that this provides means to safely inspect intermediate states? > For normal memcpy, strcpy, and other functions in group 1), the > intermediate states aren't defined Again, not defined or not specified? -- Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-21 9:31 ` Alexandre Oliva @ 2014-11-21 17:17 ` Joseph Myers 2014-11-21 23:43 ` Torvald Riegel 1 sibling, 0 replies; 24+ messages in thread From: Joseph Myers @ 2014-11-21 17:17 UTC (permalink / raw) To: Alexandre Oliva; +Cc: Torvald Riegel, Florian Weimer, libc-alpha On Fri, 21 Nov 2014, Alexandre Oliva wrote: > > So, I agree that these *specific* memory locations are intermediate > > states, but the comparison functions are not guaranteed to be able to > > look at other elements of the arrays and find sensible information in > > those. > > The important question here IMHO is whether looking at them is invokes > undefined behavior, or just yields unspecified values, possibly narrowed > to a subset of all values that might be held by the types of the objects > in those locations, if there can even be valid assumptions about the > types of those memory locations. If a location is modified by a function, and the semantics of that function do not specify it to be modified as an atomic operation with a particular memory order, I think asynchronous accesses result in undefined behavior. At least, they behave like accessing uninitialized automatic storage or struct padding (i.e., a variable copied from the possibly modified location has a wobbly value, as in DR#451, that need not behave consistently like any particular value of its type for subsequent operations on it). I don't think memset_s is any different - it acts on memory as if it were volatile, but not atomic. Similarly, it is valid for functions to read their inputs multiple times unless otherwise specified; memcpy has undefined behavior if its inputs change concurrently with the call to memcpy (rather than it simply being unspecified which value gets copied). -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 24+ messages in thread
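To make the distinction drawn above concrete, here is a minimal C11 sketch of what a copy might look like if a concurrent observer in another thread were to be allowed to look at the destination while the copy is in progress. The names `sketch_atomic_copy` and `sketch_atomic_peek` are hypothetical and not part of glibc or of any standard; the sketch only assumes that `_Atomic unsigned char` is available. Each element is an atomic object, so a concurrent load yields either the old or the new byte rather than a wobbly value, although relaxed ordering says nothing about which bytes have been written so far.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only: copy n bytes into a
   destination whose elements are atomic objects.  Every store is an
   atomic operation with relaxed memory order.  */
static void
sketch_atomic_copy (_Atomic unsigned char *dest,
                    const unsigned char *src, size_t n)
{
  assert (dest != NULL && src != NULL);
  for (size_t i = 0; i < n; i++)
    atomic_store_explicit (&dest[i], src[i], memory_order_relaxed);
}

/* Hypothetical observer: an atomic load of one element, which does not
   race with the stores above.  */
static unsigned char
sketch_atomic_peek (_Atomic unsigned char *buf, size_t i)
{
  assert (buf != NULL);
  return atomic_load_explicit (&buf[i], memory_order_relaxed);
}
```

Plain memcpy and strcpy make no such promise; an observer that wants a defined result with them has to be ordered after the call returns.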
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-21 9:31 ` Alexandre Oliva 2014-11-21 17:17 ` Joseph Myers @ 2014-11-21 23:43 ` Torvald Riegel 1 sibling, 0 replies; 24+ messages in thread From: Torvald Riegel @ 2014-11-21 23:43 UTC (permalink / raw) To: Alexandre Oliva; +Cc: Florian Weimer, libc-alpha On Fri, 2014-11-21 at 07:30 -0200, Alexandre Oliva wrote: > On Nov 19, 2014, Torvald Riegel <triegel@redhat.com> wrote: > > > On Tue, 2014-11-18 at 20:23 -0200, Alexandre Oliva wrote: > >> On Nov 17, 2014, Torvald Riegel <triegel@redhat.com> wrote: > >> > >> > On Fri, 2014-11-14 at 14:53 -0200, Alexandre Oliva wrote: > >> >> On Nov 14, 2014, Torvald Riegel <triegel@redhat.com> wrote: > >> >> > >> >> > AFAICT memset_s is still a sequentially-specified function. > >> >> > >> >> How can you tell? It's not like the standard explicitly says so, is it? > > >> I'm asking how do you tell in general. > > > If the function is doing something, for example a store, and the > > function does not specify what it does internally and which > > inter-thread happens-before relations this creates, then there's > > nothing specified that makes this not a data race if you try to look > > at an intermediate state from another thread. > > Which is why I've resorted to non-threaded means of inspection of > intermediate states. I think we differ in whether “the function does > not specify what it does internally”. Yes, that seems to be the case. > If the definition of the function > said “copy n chars from src[0..n-1] to dest[0..n-1] respectively”, > besides any pre- and post- conditions, then it *does* specify what it > does internally. I think we differ on whether that specifies the details of an implementation, or what condition will hold after the function returns.
The way I read it, and the way I think this needs to be understood to be actually generally applicable, is that it effectively says: "When the function has returned (and thus finished execution), it will have copied n chars from src[0..n-1] to dest[0..n-1] respectively". That's the post-condition. One reason why I think that this is indeed the intended verbose specification is that it's clear how to inspect the state after the function has returned; it's not executing anymore, a copy can be easily understood, and there's no restriction on how you actually look at the state (e.g., when you rely on the post-condition, or want to check whether it holds). In contrast, while the function is still running, there are several other things that would have to be specified for this to make sense and to prevent it from being interpreted differently (which a standard wouldn't want). For example, which steps does copying a char take? It sounds trivial in this example, but what disallows this from being a bit-wise copy? If we look at other functions like qsort, which are specified to sort the array, then there are various ways to do sorting; do you think it says anything except that the array is sorted when the function has returned? (Ignoring the comparison callbacks, which also don't reveal which sorting algorithm is used.) Thus, if we consider these more complex functions, and you agree that they only guarantee the effects that are in place when they have returned, wouldn't it then make most sense to take this as the default way to understand the specification? Why would it be worthwhile to special-case simpler functions such as memcpy? > Not the order in which the chars are copied, for sure, > but still, it says the function should copy each and every one of those > chars. It doesn't state how to copy a char, but anything other than > load from src[i] and store the loaded value in dest[i] is hardly a copy.
> > So while this makes room an interrupted copy to leave dest[i] in an > unspecified state that could be its earlier value or the newly-copied > one, it would be hard to argue that anything else complies with the > behavior specification enclosed in quotes above. Why couldn't it be a partially copied value? Where does the standard disallow bit-wise copy, or require atomic operations for every access to char? > I can see value in making simplifying assumptions to reason about > behavior in the presence of multiple threads, and I realize that the no > data race requirements can enable *reasoning* about sequential functions > in such contexts as if only the pre- and post-conditions mattered, I do > not agree that applying similar reasoning to go backwards is logically > sound. > > I mean, “I perceive this as a sequential function, which enables > simplifying assumptions about internal behavior in multi-threaded > contexts, therefore I can disregard the explicit behavior specification > and only look at explicit or inferred (pre- and?) post-conditions to > reason in any context whatsoever, or to implement the function however I > like, even deviating from the specification, as long as it still > satisfies the post-conditions when given the pre-conditions” doesn't > hold, because there are issues that arise besides those that come up in > multi-threaded contexts, to which the simplifying assumptions for > reasoning about multi-threaded contexts do not apply. I'm not sure I understand what you're saying. First, I think that what precisely constitutes the behavioral specification is something we still disagree about. From my perspective, guaranteeing the effects of a (sequential, non-synchronizing, non-volatile, ...) function when it returns is perfectly in line with the specifications. So, from my perspective, nothing is disregarded. Compilers rely on the as-if rule to make optimizations.
For example, they may store speculative values if they can prove that some value will be written to a variable in all executions of the function. That speculative store might never have a right value. Unless I misunderstand you, such a compiler optimization would be incorrect in your opinion because it stores a value that the abstract machine might never store. C11 (N1570) 5.1.2.3p9-10 start with the following sentences: "An implementation might define a one-to-one correspondence between abstract and actual semantics: at every sequence point, the values of the actual objects would agree with those specified by the abstract semantics. The keyword volatile would then be redundant. Alternatively, an implementation might perform various optimizations within each translation unit, such that the actual semantics would agree with the abstract semantics only when making function calls across translation unit boundaries." Wouldn't the alternative implementation be unable to provide what you argue the standard requires? > > Well, the comparison callbacks can't just look at will at every piece > > of intermediate state. > > Why is that? I mean, what, if any, part of the relevant standards says > so? > > > So, I agree that these *specific* memory locations are intermediate > > states, but the comparison functions are not guaranteed to be able to > > look at other elements of the arrays and find sensible information in > > those. > > The important question here IMHO is whether looking at them is invokes > undefined behavior, or just yields unspecified values, possibly narrowed > to a subset of all values that might be held by the types of the objects > in those locations, if there can even be valid assumptions about the > types of those memory locations. I'm not sure about the comparison functions. But even if there should be a stronger requirement for the comparison functions, this wouldn't imply that accesses from other threads wouldn't be a data race.
> > >> I'm just trying to figure out what the > >> heck you mean by “sequential function”, and by “sequential > >> specification”. > > > What I mean is that they are not concurrent specifications that make > > guarantees about states of an unfinished execution as visible to > > concurrent observers. They only make guarantees about the state after a > > function has finished executing. (Sorry if I'm using shared-memory > > synchronization terminology here, but given that we want to distinguish > > between concurrent and non-concurrent, that seems to make sense.) > > Thanks. The definitely makes sense, when the goal is to reason about > shared-memory multi-threaded (henceforth SMMT) issues. But there are > other issues for which this distinction, or the simplifications in SMMT > reasoning that follow from it, don't apply, and may even contradict > other standard-imposed requirements. So please take the “sequential > function” claims with a grain of salt, and don't use them to discard > parts of the specification you don't generally have to worry about when > you're thinking of SMMT, when the context is not limited to SMMT. So which issues are you thinking about, and how do they affect MT-Safety? > >> I had understood the latter had to do with > >> specifications limited to pre- and post-conditions, but the standards > >> we've been talking about do not limit function specifications to that. > > > Why do you think that is the case? > > What does “that” mean? That I had understood it in a certain way? Or > that the standards do not limit specs to pre- and post-conditions? The latter. > > The callback, or composition of functions in general, is one thing you > > mentioned, and I hope was able to convince you that this doesn't give > > guarantees about the caller (e.g., qsort) to the callee (e.g., > > comparison function), except when those guarantees overlap with > > preconditions for the callee.
> > I'm afraid you haven't, but you've helped me understand our differences > in reasoning, because I won't turn specifications of behavior into pre- > and post-conditions and label a function as sequential to then pretend > the original specifications did not exist and did not impose any other > requirements that are not necessarily relevant for SMMT contexts, but > that might be in other contexts. > > > > "A function that may be safely invoked by an application while the > > asynchronous form of cancellation is enabled." > > > That doesn't really tell me a lot :) I can interpret "safely invoked" > > to at least mean that the mere act of cancellation will not break > > anything. But it doesn't tell me which state one can expect after > > cancellation. > > Yup. Again, the important question is: is it undefined or unspecified? Do you think that unspecified would really help you a lot? What do you think it means? Is it all values allowed by a type? Or something else? > > One way to define safety would be to say that a cancelled function > > should either take effect or not, but never partially take effect. IOW, > > it's either just the precondition or the postcondition that holds. > > This would be a way to extend the simplifying assumptions of sequential > functions to some other contexts. Sequential functions would > essentially be regarded as, and required to behave as, atomic. Atomic if cancelled, yes. > > Another option would be to allow specified intermediate steps to take > > effect. For 2), we could say that cancellation happens anywhere between > > the steps the abstract machine would do, but not within a step. This > > would be satisfied under the requirement you assumed for memset and > > strcpy implementations, I believe. > > Yeah, with the caveat that the order of steps of the abstract machine > that may be used to carry out the required behavior is not specified. 
> So, interrupting memset, you might observe that dest[i+1] is modified > while dest[i] wasn't yet, or vice-versa. The problem with that is that it still would need to be specified what the actual steps are. This is done explicitly for memset_s, but not for memset, IMO. It's not only the order that the standard mentions additionally for memset_s, though -- it also explicitly requires that the implementation is strictly equivalent to the abstract machine (K.3.7.4.1p4): "Unlike memset, any call to the memset_s function shall be evaluated strictly according to the rules of the abstract machine as described in (5.1.2.3)." Why would the standard add these requirements for memset_s if memset already had them? > > For 3), it could be cancellation between any of the atomic steps, unless > > otherwise specified. For condvar wait, for example, this could be one > > of the three parts: lock release, wakeup, lock acquisition. > > Eeek, it would be Really Bad (TM) IMHO if a condvar wait could be > canceled while the lock is not held: this could mess with enclosing > cleanup handlers that, among other things, release the lock. condvar wait was just an example. (It's a bad one, because it's not a cancellation point.) > What states can cancellation cleanup handlers reliably inspect, anyway? > Are they to be regarded as running in async signal context, so that they > can't reliably access local state and are very limited in global state? > Or are they allowed to access local state, plus any global state that > could be accessed after pthread_join()ing the canceled thread? I don't know. I'd focus on what you call the global state first. > >> Does this mean it is safe to access the variables > >> that were partially modified by the interrupted memcpy/strcpy/whatever, > >> and that this provides means to safely inspect intermediate states? 
> > > For normal memcpy, strcpy, and other functions in group 1), the > > intermediate states aren't defined > > Again, not defined or not specified? Again, what's your definition of "unspecified"? ^ permalink raw reply [flat|nested] 24+ messages in thread
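As a rough illustration of the difference between memset and memset_s discussed in this message, the following C sketch stores one character at a time through a volatile-qualified pointer, starting at the beginning. `sketch_memset_s` and `plain_fill` are hypothetical names used only here; this is not glibc code (glibc does not provide memset_s) and the error handling is much simpler than what K.3.7.4.1 describes. Note that volatile is not atomic, so this still does not make concurrent observation from another thread data-race free.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not a real library function.  Every store goes
   through a volatile lvalue, so the stores are evaluated strictly
   according to the abstract machine: none may be elided or merged.  */
static int
sketch_memset_s (void *s, size_t smax, int c, size_t n)
{
  assert (s != NULL);
  volatile unsigned char *p = s;
  size_t count = n < smax ? n : smax;

  for (size_t i = 0; i < count; i++)
    p[i] = (unsigned char) c;

  /* Simplified: report a constraint violation if n exceeded smax.  */
  return n > smax ? -1 : 0;
}

/* For comparison: plain memset only promises the state after it has
   returned; how it gets there is left to the implementation.  */
static void
plain_fill (void *s, int c, size_t n)
{
  assert (s != NULL);
  memset (s, c, n);
}
```

Under this reading, only the first function says anything about the individual steps; the second one is described by its post-condition alone.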
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-14 12:01 ` Florian Weimer 2014-11-14 13:28 ` Torvald Riegel @ 2014-11-14 16:46 ` Alexandre Oliva 2014-11-14 21:43 ` Florian Weimer 1 sibling, 1 reply; 24+ messages in thread From: Alexandre Oliva @ 2014-11-14 16:46 UTC (permalink / raw) To: Florian Weimer; +Cc: libc-alpha On Nov 14, 2014, Florian Weimer <fweimer@redhat.com> wrote: > I've been staring at the standard for a while. The standard > explicitly refuses to deal with the interaction of signal handlers and > threads The argument doesn't require threads, they'd be a distraction at best. The issue is whether a signal handler that interrupts strcpy (or memcpy, or any other standard function) could observe effects in the destination string (or whatever else they modify) that are not specified in the definition of the corresponding function. Say, given: char foo[5] = "12"; int main() { signal (SIGUSR1, checkme); strcpy (&foo[1], "23"); } what standard-compliant values can checkme legitimately expect to find in foo[0], foo[1], foo[2], foo[3], and foo[4]? Under my reading, foo[0] and foo[4] could only hold '1' and '\0', respectively, since nothing allows strcpy to modify them from their initial values. foo[1] and foo[3] could only hold '2' and '\0', respectively, since they already held the values that the standard says strcpy should store in them. foo[2] could hold '\0' or '3' (*), depending on whether the signal interrupts strcpy before or after it gets to it. I reason that temporarily storing alternate values in the destination would be as much of a deviation from the specification as writing to foo[0] or foo[4]. (*) or perhaps other intermediate values, if chars are too big to copy as a single memory transaction, so that narrower memory blocks, such as individual bits, had to be copied one at a time.
I realize this reading would rule out not only the potentially useful practice of resetting full cache lines instead of loading them from memory before overwriting them, but also other possibilities that would arguably comply with the current specification if it were limited to pre- and post-conditions only, without any observable intermediate results, such as repeatedly generating random strings and comparing them with the source until they were identical, or incrementing the destination as a multi-byte unsigned number, represented by the concatenation of all the bits in the destination string, until it compares equal to the source. These other possibilities would not only fail the efficiency expectations, but also produce visible intermediate results that IMHO are not allowed by the current wording of the standard. But see below. > I very much doubt the intent was invalidate existing implementations > which write ghost values This was the sort of argument that made me revisit my understanding that strcpy et al couldn't “write garbage” in the destination before writing the final value. I still don't see how the current wording would allow for that, but now I agree it would make perfect sense to have wording in standards that would allow this sort of behavior. Maybe not arbitrary garbage, certainly not temporarily writing garbage to other user-visible portions of the address space, but something that can be construed as executing an algorithm that, starting with the destination assumed to contain random garbage, makes progress towards the goal of having the destination hold a copy of the source.
Similar wording could apply to qsort too, although to inspect intermediate qsort results you don't even need signal handlers: it calls back the compare function synchronously, and the compare function is not prohibited from accessing the array being sorted; plus, I believe numerous qsort implementations have historically exchanged array entries as sorting progresses, so attempting to rule that out would be unlikely to fly. -- Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer ^ permalink raw reply [flat|nested] 24+ messages in thread
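The qsort remark above can be tried out directly. The following is a minimal sketch, not anything posted in the thread: the names `values`, `peeking_compare`, `unsorted_views` and `observe_qsort` are hypothetical, and the comparator simply looks at the whole array on each call and counts how often it was not yet sorted. How many such intermediate views occur depends on the qsort implementation, so the sketch asserts nothing about that count.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { N = 8 };

static int values[N];                 /* the array handed to qsort */
static unsigned long calls;           /* comparator invocations */
static unsigned long unsorted_views;  /* calls that saw values[] not yet sorted */

static int is_sorted(const int *a, int n)
{
    for (int i = 1; i < n; i++)
        if (a[i - 1] > a[i])
            return 0;
    return 1;
}

/* Comparator that, besides comparing, inspects the array being sorted.
   Nothing here needs a signal handler: qsort calls it synchronously. */
static int peeking_compare(const void *pa, const void *pb)
{
    int a = *(const int *)pa;
    int b = *(const int *)pb;

    calls++;
    if (!is_sorted(values, N))
        unsorted_views++;
    return (a > b) - (a < b);
}

/* Sorts a fixed test array; returns 1 if the final result is sorted. */
int observe_qsort(void)
{
    static const int init[N] = {7, 3, 5, 1, 8, 2, 6, 4};

    memcpy(values, init, sizeof values);
    calls = 0;
    unsorted_views = 0;
    qsort(values, N, sizeof values[0], peeking_compare);
    return is_sorted(values, N);
}
```

Calling `observe_qsort()` from a test program and then printing `calls` and `unsorted_views` shows how much intermediate state a given C library exposes to the comparator.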
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-14 16:46 ` Alexandre Oliva @ 2014-11-14 21:43 ` Florian Weimer 2014-11-15 0:00 ` Alexandre Oliva 0 siblings, 1 reply; 24+ messages in thread From: Florian Weimer @ 2014-11-14 21:43 UTC (permalink / raw) To: Alexandre Oliva; +Cc: libc-alpha On 11/14/2014 05:45 PM, Alexandre Oliva wrote: > Say, given: > > char foo[5] = "12"; > > int main() { > signal (SIGUSR1, checkme); > strcpy (&foo[1], "23"); > } > > what standard-compliant values can checkme legitimately expect to find > in foo[0], foo[1], foo[2], foo[3], and foo[4]? foo is not an atomic object, so this is undefined. As I tried to explain, things turn out rather messy if you add the _Atomic qualifier. I still think the values are unspecified (despite the standard saying they are not) because the accesses from strcpy and the signal handler are not sequenced. -- Florian Weimer / Red Hat Product Security ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-14 21:43 ` Florian Weimer @ 2014-11-15 0:00 ` Alexandre Oliva 2014-11-17 7:53 ` Florian Weimer 2014-11-17 10:05 ` Torvald Riegel 0 siblings, 2 replies; 24+ messages in thread From: Alexandre Oliva @ 2014-11-15 0:00 UTC (permalink / raw) To: Florian Weimer; +Cc: libc-alpha On Nov 14, 2014, Florian Weimer <fweimer@redhat.com> wrote: > On 11/14/2014 05:45 PM, Alexandre Oliva wrote: >> Say, given: >> >> char foo[5] = "12"; >> >> int main() { >> signal (SIGUSR1, checkme); >> strcpy (&foo[1], "23"); >> } >> >> what standard-compliant values can checkme legitimately expect to find >> in foo[0], foo[1], foo[2], foo[3], and foo[4]? > foo is not an atomic object, so this is undefined. Yeah, I goofed in the testcase, and I failed to mention the reasoning was supposed to apply to earlier C standards, that we still intend to comply with, so our implementation of strcpy shouldn't gratuitously break. In order to avoid the undefinedness under e.g. C90, the actual string couldn't be in static storage. Getting ahold of a pointer to the string storage could be messy, though; maybe C90 and C99 were written so as to imply it couldn't be done at all, even if POSIX introduced means that would make it possible, such as writing the pointer to a pipe, or to a file using POSIX functions not defined in standard C. If their intent was to make access impossible, then my argument would indeed fall apart. > As I tried to explain, things turn out rather messy if you add the > _Atomic qualifier. I still think the values are unspecified (despite > the standard saying they are not) because the accesses from strcpy and > the signal handler are not sequenced. 
Once we make C11 the focus, getting the pointer to the signal handler is easier, through an _Atomic intptr_t with static or per-thread storage, but the string storage would have to be _Atomic as well, and then, as you say, what might happen within strcpy, memcpy et al is not entirely clear. However, making it unspecified might be pushing it too far. The standard could specify, for example, that it is unspecified, within an interrupting signal handler, whether observed values would be those originally held in the atomic storage, or those that should be put in there by the copy, without permitting any other values. That would be in line with my understanding, and I'll dare now put forth the idea that the apparent contradiction you point out might be an indication that this was the intent. But it could also say any value whatsoever would be permitted, which might make writing temporary garbage or invalidating entire cache lines more defensible. I suppose we'll only know if we ask and get a clarification... It no longer matters for the situation that initiated the debate, but it might matter for other future decisions. -- Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer ^ permalink raw reply [flat|nested] 24+ messages in thread
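The C11 arrangement described above (handler locates the storage through an `_Atomic intptr_t`, the string storage itself is `_Atomic`) can be sketched as follows. This is only an illustration under stated assumptions: strcpy cannot take `_Atomic char *`, so a hypothetical element-wise copy `atomic_copy_with_signals` stands in for it; `raise()` stands in for an asynchronous interruption after every store, which also keeps the handler's accesses to ordinary static objects defined; and `sigaction` is used instead of `signal` so the handler stays installed across repeated deliveries. All identifiers are made up for the sketch. The handler checks the "old value or new value, nothing else" property.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static _Atomic char buf[5];        /* atomic string storage */
static _Atomic intptr_t buf_addr;  /* how the handler finds the storage */
static atomic_int bad_values;      /* observations that were neither old nor new */

static const char old_val[5] = {'1', '2', '\0', '\0', '\0'};  /* "12" */
static const char new_val[5] = {'1', '2', '3', '\0', '\0'};   /* after the copy */

static void checkme(int sig)
{
    (void)sig;
    _Atomic char *p = (_Atomic char *)atomic_load(&buf_addr);

    for (int i = 0; i < 5; i++) {
        char c = atomic_load(&p[i]);
        if (c != old_val[i] && c != new_val[i])
            atomic_fetch_add(&bad_values, 1);
    }
}

/* Element-wise stand-in for strcpy on atomic storage; the handler runs
   after every single store, including the terminating NUL. */
static void atomic_copy_with_signals(_Atomic char *dst, const char *src)
{
    size_t i = 0;

    do {
        atomic_store(&dst[i], src[i]);
        raise(SIGUSR1);
    } while (src[i++] != '\0');
}

/* Returns the number of unexpected values seen, or -1 on setup failure. */
int run_atomic_copy_demo(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = checkme;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    for (int i = 0; i < 5; i++)
        atomic_store(&buf[i], old_val[i]);
    atomic_store(&buf_addr, (intptr_t)buf);
    atomic_store(&bad_values, 0);

    raise(SIGUSR1);                          /* before any store */
    atomic_copy_with_signals(&buf[1], "23");
    return atomic_load(&bad_values);
}
```

With per-element atomic stores the handler can only ever see a mix of old and new elements; the open question in the thread is whether a real strcpy or memcpy is, or should be, held to the same guarantee.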
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-15 0:00 ` Alexandre Oliva @ 2014-11-17 7:53 ` Florian Weimer 2014-11-17 10:21 ` Torvald Riegel 2014-11-17 10:05 ` Torvald Riegel 1 sibling, 1 reply; 24+ messages in thread From: Florian Weimer @ 2014-11-17 7:53 UTC (permalink / raw) To: Alexandre Oliva; +Cc: libc-alpha On 11/15/2014 12:58 AM, Alexandre Oliva wrote: > The standard could specify, for example, that it is unspecified, within > an interrupting signal handler, whether observed values would be those > originally held in the atomic storage, or those that should be put in > there by the copy, without permitting any other values. That would be > in line with my understanding, and I'll dare now put forth the idea that > the apparent contradiction you point out might be an indication that > this was the intent. It would mean that memset and memcpy would align the passed-in pointer to the largest possible atomic object size and update the target using atomic instructions of at least this size. (This might also apply to the string functions.) Head and tail may not be a multiple of the word size, so we'd need a compare-and-swap loop to cover this case, with quite a bit of performance overhead. Personally, I find it rather attractive to leave this unspecified. (Note that C11 is a bit ambiguous whether there is a “no values out of thin air” requirement in the memory model. Java has this even in the presence of data races, but I don't think GCC provides this for C11. If all data races are indeed undefined behavior, the fact that the standard makes a contrary claim about how the memory model works (see the previous discussion with Torvald for a quote from the standard) does not matter.) -- Florian Weimer / Red Hat Product Security ^ permalink raw reply [flat|nested] 24+ messages in thread
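The cost mentioned above comes from the head and tail of a copy, where only part of an aligned word may be overwritten. A minimal sketch of such a partial-word update with a compare-and-swap loop follows; it assumes a 32-bit atomic word for illustration, and `store_partial_word` and `partial_store_demo` are hypothetical names, not glibc functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replace n bytes at byte offset off inside *word, leaving the remaining
   bytes untouched. Requires off + n <= sizeof(uint32_t). An observer that
   loads the word atomically sees either the old or the new contents. */
static void store_partial_word(_Atomic uint32_t *word, size_t off,
                               const unsigned char *src, size_t n)
{
    uint32_t expected = atomic_load(word);
    uint32_t desired;

    do {
        desired = expected;                               /* keep untouched bytes */
        memcpy((unsigned char *)&desired + off, src, n);  /* splice in new bytes */
        /* On failure, expected is refreshed with the current value. */
    } while (!atomic_compare_exchange_weak(word, &expected, desired));
}

/* Returns 1 if bytes 1..2 of "abcd" were replaced, giving "aXYd". */
int partial_store_demo(void)
{
    _Atomic uint32_t word;
    uint32_t init, result;

    memcpy(&init, "abcd", sizeof init);
    atomic_init(&word, init);

    store_partial_word(&word, 1, (const unsigned char *)"XY", 2);

    result = atomic_load(&word);
    return memcmp(&result, "aXYd", sizeof result) == 0;
}
```

The bytes are spliced through memcpy rather than shifts and masks so the sketch does not depend on byte order. The full-word middle of a copy would use plain atomic stores; only the two partial ends need the loop, which is where the overhead would concentrate for short copies.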
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-17 7:53 ` Florian Weimer @ 2014-11-17 10:21 ` Torvald Riegel 0 siblings, 0 replies; 24+ messages in thread From: Torvald Riegel @ 2014-11-17 10:21 UTC (permalink / raw) To: Florian Weimer; +Cc: Alexandre Oliva, libc-alpha On Mon, 2014-11-17 at 08:53 +0100, Florian Weimer wrote: > On 11/15/2014 12:58 AM, Alexandre Oliva wrote: > > The standard could specify, for example, that it is unspecified, within > > an interrupting signal handler, whether observed values would be those > > originally held in the atomic storage, or those that should be put in > > there by the copy, without permitting any other values. That would be > > in line with my understanding, and I'll dare now put forth the idea that > > the apparent contradiction you point out might be an indication that > > this was the intent. > > It would mean that memset and memcpy would align the passed-in pointer > to the largest possible atomic object size and update the target using > atomic instructions of at least this size. (This might also apply to > the string functions.) Head and tail may not be a multiple of the word > size, so we'd need a compare-and-swap loop to cover this case, with > quite a bit of performance overhead. > > Personally, I find it rather attractive to leave this unspecified. > > (Note that C11 is a bit ambiguous whether there is a “no values out of > thin air” requirement in the memory model. I've heard nobody in the C++ committee say that they want to allow out-of-thin-air values -- and the C and C++ models are supposed to be equivalent. For C++, there is such a requirement, even though in a non-normative note -- but just because it's hard to specify precisely. 
See this paper for details (e.g., Section 4): http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4136.pdf But, I don't think this specification problem is critical for us; the paper also states: "This is a high-level-language specification problem: there is no suggestion that thin-air executions occur in practice with current compilers and hardware; the problem is rather how to exclude them without preventing desired compiler optimisations." > Java has this even in the > presence of data races, ... and that makes it hard for them. > but I don't think GCC provides this for C11. Agreed. Data races are undefined behavior as stated by C11. That doesn't mean that everything that would be a data race under C11 also leads to a program crash, for example, when compiled by GCC. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: ctermid: return string literal, document MT-Safety pitfall 2014-11-15 0:00 ` Alexandre Oliva 2014-11-17 7:53 ` Florian Weimer @ 2014-11-17 10:05 ` Torvald Riegel 1 sibling, 0 replies; 24+ messages in thread From: Torvald Riegel @ 2014-11-17 10:05 UTC (permalink / raw) To: Alexandre Oliva; +Cc: Florian Weimer, libc-alpha On Fri, 2014-11-14 at 21:58 -0200, Alexandre Oliva wrote: > On Nov 14, 2014, Florian Weimer <fweimer@redhat.com> wrote: > > > On 11/14/2014 05:45 PM, Alexandre Oliva wrote: > >> Say, given: > >> > >> char foo[5] = "12"; > >> > >> int main() { > >> signal (SIGUSR1, checkme); > >> strcpy (&foo[1], "23"); > >> } > >> > >> what standard-compliant values can checkme legitimately expect to find > >> in foo[0], foo[1], foo[2], foo[3], and foo[4]? > > > foo is not an atomic object, so this is undefined. > > Yeah, I goofed in the testcase, and I failed to mention the reasoning > was supposed to apply to earlier C standards, that we still intend to > comply with, so our implementation of strcpy shouldn't gratuitously > break. > > In order to avoid the undefinedness under e.g. C90, the actual string > couldn't be in static storage. Getting ahold of a pointer to the string > storage could be messy, though; maybe C90 and C99 were written so as to > imply it couldn't be done at all, even if POSIX introduced means that > would make it possible, such as writing the pointer to a pipe, or to a > file using POSIX functions not defined in standard C. If their intent > was to make access impossible, then my argument would indeed fall apart. > > > As I tried to explain, things turn out rather messy if you add the > > _Atomic qualifier. I still think the values are unspecified (despite > > the standard saying they are not) because the accesses from strcpy and > > the signal handler are not sequenced. 
> > Once we make C11 the focus, getting the pointer to the signal handler is > easier, through an _Atomic intptr_t with static or per-thread storage, > but the string storage would have to be _Atomic as well, and then, as > you say, what might happen within strcpy, memcpy et al is not entirely > clear. However, making it unspecified might be pushing it too far. For example, strcpy doesn't take atomic types as arguments, so it won't access the memory atomically, so you get data races and undefined behavior unless you make sure that the signal handler happens after or before the strcpy (via happens-before). Please have a look at C11 (N1570) 5.1.2.3p10, which gives an example of an allowed implementation that just guarantees equality of non-volatiles to the abstract machine at function boundaries. > The standard could specify, for example, that it is unspecified, within > an interrupting signal handler, whether observed values would be those > originally held in the atomic storage, or those that should be put in > there by the copy, without permitting any other values. If the signal handler's atomic accesses are concurrent with other atomic accesses, then there is no data race because atomics don't create data races. But strcpy doesn't write with atomics. ^ permalink raw reply [flat|nested] 24+ messages in thread
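One way to read the "happens after or before the strcpy" condition above is that the handler must not look at the buffer until the copy has been published. The sketch below is an assumption-laden illustration, not code from the thread: a plain char buffer is written with the ordinary strcpy, an atomic flag is then set with release ordering, and the handler reads the buffer only after an acquire load sees the flag. The names `text`, `ready`, `skipped`, `matched` and `run_ordered_copy_demo` are hypothetical; `raise()` delivers the signal synchronously, and `sigaction` is used so the handler stays installed for both deliveries.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

static char text[8];         /* plain, non-atomic destination */
static atomic_int ready;     /* 0: copy not published, 1: text[] is complete */
static atomic_int skipped;   /* handler runs that found ready == 0 */
static atomic_int matched;   /* handler runs that saw the finished string */

static void checkme(int sig)
{
    static const char want[] = "23";
    int same = 1;

    (void)sig;
    if (atomic_load_explicit(&ready, memory_order_acquire) == 0) {
        atomic_fetch_add(&skipped, 1);   /* do not touch text[] at all */
        return;
    }
    /* The copy happened before the flag was set, so reading is ordered. */
    for (size_t i = 0; i < sizeof want; i++)
        if (text[i] != want[i])
            same = 0;
    if (same)
        atomic_fetch_add(&matched, 1);
}

/* Returns 1 if the handler skipped once and matched once, 0 otherwise,
   or -1 on setup failure. */
int run_ordered_copy_demo(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = checkme;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    memset(text, 0, sizeof text);
    atomic_store(&ready, 0);
    atomic_store(&skipped, 0);
    atomic_store(&matched, 0);

    raise(SIGUSR1);                      /* before the copy: handler backs off */
    strcpy(text, "23");                  /* ordinary, non-atomic writes */
    atomic_store_explicit(&ready, 1, memory_order_release);
    raise(SIGUSR1);                      /* after publication: handler may read */

    return atomic_load(&skipped) == 1 && atomic_load(&matched) == 1;
}
```

This sidesteps the question of what a handler may see in the middle of strcpy by never letting it look there, which is a different guarantee from the one debated earlier in the thread.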