public inbox for libc-alpha@sourceware.org
* [PATCH] Fix undefined behaviour inconsistent for strtok
@ 2016-10-25 10:58 Adhemerval Zanella
  2016-10-25 11:31 ` Andreas Schwab
  2016-10-25 13:32 ` Florian Weimer
  0 siblings, 2 replies; 14+ messages in thread
From: Adhemerval Zanella @ 2016-10-25 10:58 UTC (permalink / raw)
  To: libc-alpha

Although no standard states what strtok should return when it is passed a
null argument and the previous input was also null, this patch changes the
default implementation to return NULL in that case.

Comment #1 in the original bug report states that the glibc coding
conventions [6] should not allow this; however, for this specific function
the contract does not expect a failure even if the return value is ignored
(since the call would be a no-op).  The patch is also focused on
implementation portability, since it aligns glibc with other
implementations that follow the same approach for strtok:

  - FreeBSD [1], NetBSD [2], OpenBSD [3]
  - uclibc and uclibc-ng [4]
  - musl [5]

I see little value in asserting on null input (as suggested in comment #2
of the original bug report), in changing both the x86_64 and powerpc64le
implementations to fault on such input, or in keeping a behavior that
differs from other libc implementations.

Checked on x86_64, aarch64, and powerpc64le.

	* string/strtok.c (strtok): Return null if previous input is also
	null.
	* string/tst-strtok.c (do_test): Add more strtok coverage.

[1] https://github.com/freebsd/freebsd/blob/386ddae58459341ec567604707805814a2128a57/lib/libc/string/strtok.c
[2] http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/string/strtok_r.c?rev=1.10&content-type=text/x-cvsweb-markup&only_with_tag=MAIN
[3] https://github.com/openbsd/src/blob/5271000b44abe23907b73bbb3aa38ddf4a0bce08/lib/libc/string/strtok.c
[4] http://www.uclibc-ng.org/browser/uclibc-ng/libc/string/strtok_r.c
[5] https://git.musl-libc.org/cgit/musl/tree/src/string/strtok.c
[6] https://sourceware.org/glibc/wiki/Style_and_Conventions#Invalid_pointers
---
 ChangeLog           |  7 +++++++
 string/strtok.c     |  4 ++--
 string/tst-strtok.c | 23 ++++++++++++-----------
 3 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/string/strtok.c b/string/strtok.c
index 7a4574d..5c4b309 100644
--- a/string/strtok.c
+++ b/string/strtok.c
@@ -40,8 +40,8 @@ STRTOK (char *s, const char *delim)
 {
   char *token;
 
-  if (s == NULL)
-    s = olds;
+  if ((s == NULL) && ((s = olds) == NULL))
+    return NULL;
 
   /* Scan leading delimiters.  */
   s += strspn (s, delim);
diff --git a/string/tst-strtok.c b/string/tst-strtok.c
index 6fbef9f..d9180a4 100644
--- a/string/tst-strtok.c
+++ b/string/tst-strtok.c
@@ -2,25 +2,26 @@
 #include <stdio.h>
 #include <string.h>
 
+static int do_test (void);
+
+#define TEST_FUNCTION do_test ()
+#include "../test-skeleton.c"
+
 static int
 do_test (void)
 {
   char buf[1] = { 0 };
   int result = 0;
 
+  if (strtok (NULL, " ") != NULL)
+    FAIL_RET ("first strtok call did not return NULL");
+  if (strtok (NULL, " ") != NULL)
+    FAIL_RET ("second strtok call did not return NULL");
+
   if (strtok (buf, " ") != NULL)
-    {
-      puts ("first strtok call did not return NULL");
-      result = 1;
-    }
+    FAIL_RET ("third strtok call did not return NULL");
   else if (strtok (NULL, " ") != NULL)
-    {
-      puts ("second strtok call did not return NULL");
-      result = 1;
-    }
+    FAIL_RET ("fourth strtok call did not return NULL");
 
   return result;
 }
-
-#define TEST_FUNCTION do_test ()
-#include "../test-skeleton.c"
-- 
2.7.4

* Re: [PATCH] Fix undefined behaviour inconsistent for strtok
@ 2016-10-25 14:04 Wilco Dijkstra
  2016-10-25 16:46 ` Wilco Dijkstra
  0 siblings, 1 reply; 14+ messages in thread
From: Wilco Dijkstra @ 2016-10-25 14:04 UTC (permalink / raw)
  To: adhemerval.zanella, Andreas Schwab; +Cc: libc-alpha, nd

Hi,

+  if ((s == NULL) && ((s = olds) == NULL))
+    return NULL;

What is the benefit of this, given:

  if (s == NULL)
    /* This token finishes the string.  */
    olds = __rawmemchr (token, '\0');

So in the current implementation 'olds' can only ever be NULL at the
very first call to strtok.

To avoid doing unnecessary work at the end of a string and avoid
use after free or other memory errors, this would be much better:

  if (s == NULL)
    /* This token finishes the string.  */
    olds = NULL;

Setting it to a NULL pointer (and not checking for it on entry) causes a crash,
so any bug is found immediately rather than potentially staying latent when
NULL is returned. The goal should be to make bugs obvious, not to hide them.

Btw, strtok_r has more potential issues: the reference to the previous string
may itself be a NULL pointer, and the pointer it contains may not be initialized
at all, so it's not useful to test either.

Wilco



Thread overview: 14+ messages
2016-10-25 10:58 [PATCH] Fix undefined behaviour inconsistent for strtok Adhemerval Zanella
2016-10-25 11:31 ` Andreas Schwab
2016-10-25 12:33   ` Adhemerval Zanella
2016-10-25 12:57     ` Andreas Schwab
2016-10-25 13:13       ` Adhemerval Zanella
2016-10-25 13:20         ` Andreas Schwab
2016-10-25 13:23           ` Adhemerval Zanella
2016-10-25 13:45             ` Andreas Schwab
2016-10-25 13:49               ` Adhemerval Zanella
2016-10-25 13:51             ` Joseph Myers
2016-10-25 14:08               ` Adhemerval Zanella
2016-10-25 13:32 ` Florian Weimer
2016-10-25 14:04 Wilco Dijkstra
2016-10-25 16:46 ` Wilco Dijkstra
