* [PATCH] Fix undefined behaviour inconsistent for strtok
@ 2016-10-25 10:58 Adhemerval Zanella
2016-10-25 11:31 ` Andreas Schwab
2016-10-25 13:32 ` Florian Weimer
From: Adhemerval Zanella @ 2016-10-25 10:58 UTC
To: libc-alpha
Although no standard specifies what strtok should return when it is
passed a null argument and the previous argument was also null, this
patch changes the default implementation to return NULL in that case.
Comment #1 in the original bug report states that the glibc coding
conventions [6] should not allow this; however, for this specific
function the contract does not expect failure even if the return value
is ignored (since the call would be a no-op). The patch also focuses on
implementation portability, since it aligns glibc with other
implementations that follow the same behavior for strtok:
- FreeBSD [1], NetBSD [2], OpenBSD [3];
- uClibc and uClibc-ng [4];
- musl [5].
I see little value in asserting on null input (as proposed in comment
#2 of the original bug report), in changing both the x86_64 and
powerpc64le implementations to fault on such input, or in keeping
behavior that differs from other libc implementations.
Checked on x86_64, aarch64, and powerpc64le.
* string/strtok.c (strtok): Return null is previous input is also
null.
* string/tst-strtok.c (do_test): Add more strtok coverage.
[1] https://github.com/freebsd/freebsd/blob/386ddae58459341ec567604707805814a2128a57/lib/libc/string/strtok.c
[2] http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/string/strtok_r.c?rev=1.10&content-type=text/x-cvsweb-markup&only_with_tag=MAIN
[3] https://github.com/openbsd/src/blob/5271000b44abe23907b73bbb3aa38ddf4a0bce08/lib/libc/string/strtok.c
[4] http://www.uclibc-ng.org/browser/uclibc-ng/libc/string/strtok_r.c
[5] https://git.musl-libc.org/cgit/musl/tree/src/string/strtok.c
[6] https://sourceware.org/glibc/wiki/Style_and_Conventions#Invalid_pointers
---
ChangeLog | 7 +++++++
string/strtok.c | 4 ++--
string/tst-strtok.c | 23 ++++++++++++-----------
3 files changed, 21 insertions(+), 13 deletions(-)
diff --git a/string/strtok.c b/string/strtok.c
index 7a4574d..5c4b309 100644
--- a/string/strtok.c
+++ b/string/strtok.c
@@ -40,8 +40,8 @@ STRTOK (char *s, const char *delim)
 {
   char *token;
 
-  if (s == NULL)
-    s = olds;
+  if ((s == NULL) && ((s = olds) == NULL))
+    return NULL;
 
   /* Scan leading delimiters. */
   s += strspn (s, delim);
diff --git a/string/tst-strtok.c b/string/tst-strtok.c
index 6fbef9f..d9180a4 100644
--- a/string/tst-strtok.c
+++ b/string/tst-strtok.c
@@ -2,25 +2,26 @@
 #include <stdio.h>
 #include <string.h>
 
+static int do_test (void);
+
+#define TEST_FUNCTION do_test ()
+#include "../test-skeleton.c"
+
 static int
 do_test (void)
 {
   char buf[1] = { 0 };
   int result = 0;
 
+  if (strtok (NULL, " ") != NULL)
+    FAIL_RET ("first strtok call did not return NULL");
+  if (strtok (NULL, " ") != NULL)
+    FAIL_RET ("second strtok call did not return NULL");
+
   if (strtok (buf, " ") != NULL)
-    {
-      puts ("first strtok call did not return NULL");
-      result = 1;
-    }
+    FAIL_RET ("third strtok call did not return NULL");
   else if (strtok (NULL, " ") != NULL)
-    {
-      puts ("second strtok call did not return NULL");
-      result = 1;
-    }
+    FAIL_RET ("fourth strtok call did not return NULL");
 
   return result;
 }
-
-#define TEST_FUNCTION do_test ()
-#include "../test-skeleton.c"
--
2.7.4
* Re: [PATCH] Fix undefined behaviour inconsistent for strtok
2016-10-25 10:58 [PATCH] Fix undefined behaviour inconsistent for strtok Adhemerval Zanella
@ 2016-10-25 11:31 ` Andreas Schwab
2016-10-25 12:33 ` Adhemerval Zanella
2016-10-25 13:32 ` Florian Weimer
From: Andreas Schwab @ 2016-10-25 11:31 UTC
To: Adhemerval Zanella; +Cc: libc-alpha
On Okt 25 2016, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> * string/strtok.c (strtok): Return null is previous input is also
s/is/if/
> diff --git a/string/strtok.c b/string/strtok.c
> index 7a4574d..5c4b309 100644
> --- a/string/strtok.c
> +++ b/string/strtok.c
> @@ -40,8 +40,8 @@ STRTOK (char *s, const char *delim)
> {
> char *token;
>
> - if (s == NULL)
> - s = olds;
> + if ((s == NULL) && ((s = olds) == NULL))
Please avoid assignment in an expression. And the parens are redundant.
Andreas.
--
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
* Re: [PATCH] Fix undefined behaviour inconsistent for strtok
2016-10-25 11:31 ` Andreas Schwab
@ 2016-10-25 12:33 ` Adhemerval Zanella
2016-10-25 12:57 ` Andreas Schwab
From: Adhemerval Zanella @ 2016-10-25 12:33 UTC
To: Andreas Schwab; +Cc: libc-alpha
On 25/10/2016 09:31, Andreas Schwab wrote:
> On Okt 25 2016, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>
>> * string/strtok.c (strtok): Return null is previous input is also
>
> s/is/if/
>
>> diff --git a/string/strtok.c b/string/strtok.c
>> index 7a4574d..5c4b309 100644
>> --- a/string/strtok.c
>> +++ b/string/strtok.c
>> @@ -40,8 +40,8 @@ STRTOK (char *s, const char *delim)
>> {
>> char *token;
>>
>> - if (s == NULL)
>> - s = olds;
>> + if ((s == NULL) && ((s = olds) == NULL))
>
> Please avoid assignment in an expression. And the parens are redundant.
>
> Andreas.
>
Right, with these fixes would it be acceptable?
* Re: [PATCH] Fix undefined behaviour inconsistent for strtok
2016-10-25 12:33 ` Adhemerval Zanella
@ 2016-10-25 12:57 ` Andreas Schwab
2016-10-25 13:13 ` Adhemerval Zanella
From: Andreas Schwab @ 2016-10-25 12:57 UTC
To: Adhemerval Zanella; +Cc: libc-alpha
On Okt 25 2016, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> On 25/10/2016 09:31, Andreas Schwab wrote:
>> On Okt 25 2016, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>>
>>> * string/strtok.c (strtok): Return null is previous input is also
>>
>> s/is/if/
>>
>>> diff --git a/string/strtok.c b/string/strtok.c
>>> index 7a4574d..5c4b309 100644
>>> --- a/string/strtok.c
>>> +++ b/string/strtok.c
>>> @@ -40,8 +40,8 @@ STRTOK (char *s, const char *delim)
>>> {
>>> char *token;
>>>
>>> - if (s == NULL)
>>> - s = olds;
>>> + if ((s == NULL) && ((s = olds) == NULL))
>>
>> Please avoid assignment in an expression. And the parens are redundant.
>>
>> Andreas.
>>
>
> Right, with these fixes would it be acceptable?
I don't see much point in supporting invalid use of strtok.
Andreas.
* Re: [PATCH] Fix undefined behaviour inconsistent for strtok
2016-10-25 12:57 ` Andreas Schwab
@ 2016-10-25 13:13 ` Adhemerval Zanella
2016-10-25 13:20 ` Andreas Schwab
From: Adhemerval Zanella @ 2016-10-25 13:13 UTC
To: Andreas Schwab; +Cc: libc-alpha
On 25/10/2016 10:57, Andreas Schwab wrote:
> On Okt 25 2016, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>
>> On 25/10/2016 09:31, Andreas Schwab wrote:
>>> On Okt 25 2016, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>>>
>>>> * string/strtok.c (strtok): Return null is previous input is also
>>>
>>> s/is/if/
>>>
>>>> diff --git a/string/strtok.c b/string/strtok.c
>>>> index 7a4574d..5c4b309 100644
>>>> --- a/string/strtok.c
>>>> +++ b/string/strtok.c
>>>> @@ -40,8 +40,8 @@ STRTOK (char *s, const char *delim)
>>>> {
>>>> char *token;
>>>>
>>>> - if (s == NULL)
>>>> - s = olds;
>>>> + if ((s == NULL) && ((s = olds) == NULL))
>>>
>>> Please avoid assignment in an expression. And the parens are redundant.
>>>
>>> Andreas.
>>>
>>
>> Right, with these fixes would it be acceptable?
>
> I don't see much point in supporting invalid use of strtok.
>
> Andreas.
>
My point is just to add portability and align with other current
implementations.
* Re: [PATCH] Fix undefined behaviour inconsistent for strtok
2016-10-25 13:13 ` Adhemerval Zanella
@ 2016-10-25 13:20 ` Andreas Schwab
2016-10-25 13:23 ` Adhemerval Zanella
From: Andreas Schwab @ 2016-10-25 13:20 UTC
To: Adhemerval Zanella; +Cc: libc-alpha
On Okt 25 2016, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> On 25/10/2016 10:57, Andreas Schwab wrote:
>> On Okt 25 2016, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>>
>>> On 25/10/2016 09:31, Andreas Schwab wrote:
>>>> On Okt 25 2016, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>>>>
>>>>> * string/strtok.c (strtok): Return null is previous input is also
>>>>
>>>> s/is/if/
>>>>
>>>>> diff --git a/string/strtok.c b/string/strtok.c
>>>>> index 7a4574d..5c4b309 100644
>>>>> --- a/string/strtok.c
>>>>> +++ b/string/strtok.c
>>>>> @@ -40,8 +40,8 @@ STRTOK (char *s, const char *delim)
>>>>> {
>>>>> char *token;
>>>>>
>>>>> - if (s == NULL)
>>>>> - s = olds;
>>>>> + if ((s == NULL) && ((s = olds) == NULL))
>>>>
>>>> Please avoid assignment in an expression. And the parens are redundant.
>>>>
>>>> Andreas.
>>>>
>>>
>>> Right, with these fixes would it be acceptable?
>>
>> I don't see much point in supporting invalid use of strtok.
>>
>> Andreas.
>>
>
> My point is just to add portability and align with other current
> implementations.
Has it ever been a problem in the past?
Andreas.
* Re: [PATCH] Fix undefined behaviour inconsistent for strtok
2016-10-25 13:20 ` Andreas Schwab
@ 2016-10-25 13:23 ` Adhemerval Zanella
2016-10-25 13:45 ` Andreas Schwab
2016-10-25 13:51 ` Joseph Myers
From: Adhemerval Zanella @ 2016-10-25 13:23 UTC
To: Andreas Schwab; +Cc: libc-alpha
On 25/10/2016 11:19, Andreas Schwab wrote:
> On Okt 25 2016, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>
>> On 25/10/2016 10:57, Andreas Schwab wrote:
>>> On Okt 25 2016, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>>>
>>>> On 25/10/2016 09:31, Andreas Schwab wrote:
>>>>> On Okt 25 2016, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>>>>>
>>>>>> * string/strtok.c (strtok): Return null is previous input is also
>>>>>
>>>>> s/is/if/
>>>>>
>>>>>> diff --git a/string/strtok.c b/string/strtok.c
>>>>>> index 7a4574d..5c4b309 100644
>>>>>> --- a/string/strtok.c
>>>>>> +++ b/string/strtok.c
>>>>>> @@ -40,8 +40,8 @@ STRTOK (char *s, const char *delim)
>>>>>> {
>>>>>> char *token;
>>>>>>
>>>>>> - if (s == NULL)
>>>>>> - s = olds;
>>>>>> + if ((s == NULL) && ((s = olds) == NULL))
>>>>>
>>>>> Please avoid assignment in an expression. And the parens are redundant.
>>>>>
>>>>> Andreas.
>>>>>
>>>>
>>>> Right, with these fixes would it be acceptable?
>>>
>>> I don't see much point in supporting invalid use of strtok.
>>>
>>> Andreas.
>>>
>>
>> My point is just to add portability and align with other current
>> implementations.
>
> Has it ever be a problem in the past?
>
> Andreas.
>
None that I am aware of, but regardless, this is an effort to close
old glibc bugs and keep the backlog under control.
* Re: [PATCH] Fix undefined behaviour inconsistent for strtok
2016-10-25 10:58 [PATCH] Fix undefined behaviour inconsistent for strtok Adhemerval Zanella
2016-10-25 11:31 ` Andreas Schwab
@ 2016-10-25 13:32 ` Florian Weimer
From: Florian Weimer @ 2016-10-25 13:32 UTC
To: Adhemerval Zanella, libc-alpha
On 10/25/2016 12:58 PM, Adhemerval Zanella wrote:
> The original bug report comment #1
I don't see a reference to that bug anywhere, so it's not clear (to me
at least) what the report was about. (But I can guess …)
Florian
* Re: [PATCH] Fix undefined behaviour inconsistent for strtok
2016-10-25 13:23 ` Adhemerval Zanella
@ 2016-10-25 13:45 ` Andreas Schwab
2016-10-25 13:49 ` Adhemerval Zanella
2016-10-25 13:51 ` Joseph Myers
From: Andreas Schwab @ 2016-10-25 13:45 UTC
To: Adhemerval Zanella; +Cc: libc-alpha
On Okt 25 2016, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> None I am aware of, but regardless it is a effort to close down old
> glibc bugs and keep the backlog under control.
Which bug?
Andreas.
* Re: [PATCH] Fix undefined behaviour inconsistent for strtok
2016-10-25 13:45 ` Andreas Schwab
@ 2016-10-25 13:49 ` Adhemerval Zanella
From: Adhemerval Zanella @ 2016-10-25 13:49 UTC
To: Andreas Schwab; +Cc: libc-alpha
On 25/10/2016 11:45, Andreas Schwab wrote:
> On Okt 25 2016, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>
>> None I am aware of, but regardless it is a effort to close down old
>> glibc bugs and keep the backlog under control.
>
> Which bug?
>
> Andreas.
>
Oops, my bad. BZ#16640.
* Re: [PATCH] Fix undefined behaviour inconsistent for strtok
2016-10-25 13:23 ` Adhemerval Zanella
2016-10-25 13:45 ` Andreas Schwab
@ 2016-10-25 13:51 ` Joseph Myers
2016-10-25 14:08 ` Adhemerval Zanella
From: Joseph Myers @ 2016-10-25 13:51 UTC
To: Adhemerval Zanella; +Cc: Andreas Schwab, libc-alpha
On Tue, 25 Oct 2016, Adhemerval Zanella wrote:
> None I am aware of, but regardless it is a effort to close down old
> glibc bugs and keep the backlog under control.
Well, if a bug report is invalid then closing it as INVALID is
appropriate. Or if there is an idea in the bug report that might or might
not be a good idea but isn't appropriate for Bugzilla, closing as INVALID
and putting a note on
<https://sourceware.org/glibc/wiki/Development_Todo/Master> of the idea to
consider (with a link to the previous discussion in the bug) is
appropriate.
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH] Fix undefined behaviour inconsistent for strtok
2016-10-25 13:51 ` Joseph Myers
@ 2016-10-25 14:08 ` Adhemerval Zanella
From: Adhemerval Zanella @ 2016-10-25 14:08 UTC
To: Joseph Myers; +Cc: Andreas Schwab, libc-alpha
On 25/10/2016 11:51, Joseph Myers wrote:
> On Tue, 25 Oct 2016, Adhemerval Zanella wrote:
>
>> None I am aware of, but regardless it is a effort to close down old
>> glibc bugs and keep the backlog under control.
>
> Well, if a bug report is invalid then closing it as INVALID is
> appropriate. Or if there is an idea in the bug report that might or might
> not be a good idea but isn't appropriate for Bugzilla, closing as INVALID
> and putting a note on
> <https://sourceware.org/glibc/wiki/Development_Todo/Master> of the idea to
> consider (with a link to the previous discussion in the bug) is
> appropriate.
>
Right, but the bug report is about the inconsistent behaviour between
the x86_64 (and powerpc) implementations and the default one. Carlos's
comments in the bug report pointed out that it should be fixed in the
x86_64/powerpc implementations, while I argued that it would be better
to follow what other libcs do, since this specific case does not
trigger any particular issue. That's why I think it is not invalid.
If the consensus is indeed to fix x86_64/powerpc, I will work
towards that, although I still prefer to aim for portability.
* Re: [PATCH] Fix undefined behaviour inconsistent for strtok
2016-10-25 14:04 Wilco Dijkstra
@ 2016-10-25 16:46 ` Wilco Dijkstra
From: Wilco Dijkstra @ 2016-10-25 16:46 UTC
To: adhemerval.zanella, Andreas Schwab; +Cc: libc-alpha, nd
Hi,
>+ if ((s == NULL) && ((s = olds) == NULL))
>+ return NULL;
>
> What is the benefit of this given:
>
> if (s == NULL)
> /* This token finishes the string. */
> olds = __rawmemchr (token, '\0');
Right, looking at this a bit more, if we call strcspn rather than strpbrk,
we get the end of the string for free and avoid 2 calls.
It looks like the reason behind the (s = olds) == NULL check is purely
performance. Checking *s == '\0' at the start is slightly faster still (and
would seem better as it catches incorrect use of strtok).
With these changes you get > 2x speedup for most cases. I'll post a patch.
Wilco
* Re: [PATCH] Fix undefined behaviour inconsistent for strtok
@ 2016-10-25 14:04 Wilco Dijkstra
2016-10-25 16:46 ` Wilco Dijkstra
From: Wilco Dijkstra @ 2016-10-25 14:04 UTC
To: adhemerval.zanella, Andreas Schwab; +Cc: libc-alpha, nd
Hi,
+ if ((s == NULL) && ((s = olds) == NULL))
+ return NULL;
What is the benefit of this given:
  if (s == NULL)
    /* This token finishes the string. */
    olds = __rawmemchr (token, '\0');
So in the current implementation 'olds' can only ever be NULL at the
very first call to strtok.
To avoid doing unnecessary work at the end of a string and avoid
use after free or other memory errors, this would be much better:
  if (s == NULL)
    /* This token finishes the string. */
    olds = NULL;
Setting it to a NULL pointer (and not checking for it on entry) causes a crash
so any bug is found immediately rather than potentially staying latent when
returning NULL. The goal should be to make bugs obvious, not trying to hide them.
Btw, strtok_r has more potential issues: the reference to the previous
string may be a NULL pointer, and the pointer it contains may not be
initialized at all, so it's not useful to test either.
Wilco