public inbox for libc-alpha@sourceware.org
* [PATCH] manual: Document a GNU extension for strncmp/wcsncmp
@ 2024-06-27 15:58 Florian Weimer
  2024-06-27 16:33 ` Andreas Schwab
  0 siblings, 1 reply; 2+ messages in thread
From: Florian Weimer @ 2024-06-27 15:58 UTC (permalink / raw)
  To: libc-alpha

At least strncmp is widely used for string prefix checking,
so add some language to make this valid.  Add tests to show that
glibc implements this extension.
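
For illustration, the prefix-checking idiom referred to above can be
sketched as follows.  This is a minimal sketch, not part of the patch:
the helper name has_prefix is hypothetical, and it assumes that both
arguments are null-terminated strings.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not a glibc function): return true if PREFIX
   is a prefix of S.  Both arguments are assumed to be null-terminated
   strings.  S may be shorter than PREFIX; the comparison then ends at
   the null terminator of S and reports a difference.  */
static bool
has_prefix (const char *s, const char *prefix)
{
  return strncmp (s, prefix, strlen (prefix)) == 0;
}
```

With this helper, has_prefix ("foobar", "foo") is true and
has_prefix ("fo", "foo") is false, even though "fo" is an array of
fewer than strlen ("foo") + 1 elements.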

This should probably go in after the strnlen/wcsnlen GNU extension.

Tested on aarch64-linux-gnu (Neoverse-V2), i686-linux-gnu (Zen 4),
powerpc64le-linux-gnu (POWER10), x86_64-linux-gnu (Zen 4).

On s390x-linux-gnu (z16), the new wcsncmp test fails due to bug 31934.
(There could be further issues because the test crashes rather early.)

---
 manual/string.texi        |  36 ++++++++-
 string/Makefile           |   1 +
 string/test-Xncmp-gnu.c   | 183 ++++++++++++++++++++++++++++++++++++++++++++++
 string/test-strncmp-gnu.c |   4 +
 wcsmbs/Makefile           |   1 +
 wcsmbs/test-wcsncmp-gnu.c |   5 ++
 6 files changed, 226 insertions(+), 4 deletions(-)

diff --git a/manual/string.texi b/manual/string.texi
index 0b667bd3fb..ecd3c66d43 100644
--- a/manual/string.texi
+++ b/manual/string.texi
@@ -1234,6 +1234,12 @@ char} objects, then promoted to @code{int}).
 
 If the contents of the two blocks are equal, @code{memcmp} returns
 @code{0}.
+
+Note that @code{memcmp} requires objects of at least @var{size} bytes at
+@var{a1} and @var{a2}.  The implementation does not necessarily stop
+processing after the first byte difference.  Use @code{strcmp} to
+compare a string with a string literal, and use the GNU extension of
+@code{strncmp} to check if a string has a given prefix.
 @end deftypefun
 
 @deftypefun int wmemcmp (const wchar_t *@var{a1}, const wchar_t *@var{a2}, size_t @var{size})
@@ -1247,6 +1253,13 @@ smaller or larger than the corresponding wide character in @var{a2}.
 
 If the contents of the two blocks are equal, @code{wmemcmp} returns
 @code{0}.
+
+Note that @code{wmemcmp} requires that @var{size} wide characters are
+available starting at @var{a1} and @var{a2}.  The implementation does
+not necessarily stop processing after the first difference encountered.
+Use @code{wcscmp} to compare a wide string with a wide string literal,
+and use the GNU extension of @code{wcsncmp} to check if a wide string
+has a given prefix.
 @end deftypefun
 
 On arbitrary arrays, the @code{memcmp} function is mostly useful for
@@ -1367,15 +1380,30 @@ This function is the similar to @code{strcmp}, except that no more than
 @var{size} bytes are compared.  In other words, if the two
 strings are the same in their first @var{size} bytes, the
 return value is zero.
+
+As a GNU extension, the pointer arguments do not need to point to arrays
+of at least @var{size} elements in some cases.  For example, for
+null-terminated strings @var{s1} and @var{s2}, the expression
+@code{strncmp (@var{s1}, @var{s2}, strlen (@var{s2})) == 0} is true if
+and only if the string @var{s2} is a prefix of the string @var{s1}.
+More generally, in the GNU version, @code{strncmp (@var{s1}, @var{s2},
+@var{size})} is valid if both @code{strnlen (@var{s1}, @var{size})} and
+@code{strnlen (@var{s2}, @var{size})} are valid.  In the prefix checking
+idiom, note that this still requires that @var{s1} is a null-terminated
+string if there are fewer than @var{size} array elements starting at
+@var{s1}.
 @end deftypefun
 
 @deftypefun int wcsncmp (const wchar_t *@var{ws1}, const wchar_t *@var{ws2}, size_t @var{size})
 @standards{ISO, wchar.h}
 @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
-This function is similar to @code{wcscmp}, except that no more than
-@var{size} wide characters are compared.  In other words, if the two
-strings are the same in their first @var{size} wide characters, the
-return value is zero.
+This function is similar to @code{strncmp}, except that it operates
+on wide characters instead of bytes.  At most @var{size} wide characters
+are compared.
+
+As a GNU extension, @code{wcsncmp (@var{ws1}, @var{ws2}, @var{size})} is
+valid if both @code{wcsnlen (@var{ws1}, @var{size})} and @code{wcsnlen
+(@var{ws2}, @var{size})} are valid.
 @end deftypefun
 
 @deftypefun int strncasecmp (const char *@var{s1}, const char *@var{s2}, size_t @var{n})
diff --git a/string/Makefile b/string/Makefile
index 8f31fa49e6..ad98d06391 100644
--- a/string/Makefile
+++ b/string/Makefile
@@ -181,6 +181,7 @@ tests := \
   test-strncasecmp \
   test-strncat \
   test-strncmp \
+  test-strncmp-gnu \
   test-strncpy \
   test-strndup \
   test-strnlen \
diff --git a/string/test-Xncmp-gnu.c b/string/test-Xncmp-gnu.c
new file mode 100644
index 0000000000..9dc1ecca3c
--- /dev/null
+++ b/string/test-Xncmp-gnu.c
@@ -0,0 +1,183 @@
+/* Test GNU extension for non-array inputs to string comparison functions.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+/* This skeleton file is included from string/test-strncmp-gnu.c and
+   wcsmbs/test-wcsncmp-gnu.c to test that reading of the arrays stops
+   at the first null character.
+
+   TEST_IDENTIFIER must be the test function identifier.  TEST_NAME is
+   the same as a string.
+
+   CHAR must be defined as the character type.  */
+
+#include <array_length.h>
+#include <string.h>
+#include <support/check.h>
+#include <support/next_to_fault.h>
+#include <support/test-driver.h>
+#include <sys/param.h>
+#include <unistd.h>
+
+/* Much shorter than test-Xnlen-gnu.c because of deeply nested loops.  */
+enum { buffer_length = 80 };
+
+/* The test buffer layout follows what is described in test-Xnlen-gnu.c,
+   except that there are two buffers, left and right.  The variables
+   a_count, zero_count, start_offset are all duplicated.  */
+
+/* Return the maximum string length for a string that starts at
+   start_offset.  */
+static int
+string_length (int a_count, int start_offset)
+{
+  if (start_offset == buffer_length || start_offset >= a_count)
+    return 0;
+  else
+    return a_count - start_offset;
+}
+
+/* This is the valid maximum length argument computation for
+   strnlen/wcsnlen.  See test-Xnlen-gnu.c.  */
+static int
+maximum_length (int start_offset, int zero_count)
+{
+  if (start_offset == buffer_length)
+    return 0;
+  else if (zero_count > 0)
+    /* Effectively unbounded, but we need to stop fairly low,
+       otherwise testing takes too long.  */
+    return buffer_length + 32;
+  else
+    return buffer_length - start_offset;
+}
+
+typedef __typeof (TEST_IDENTIFIER) *proto_t;
+
+#define TEST_MAIN
+#include "test-string.h"
+
+IMPL (TEST_IDENTIFIER, 1)
+
+static int
+test_main (void)
+{
+  TEST_VERIFY_EXIT (sysconf (_SC_PAGESIZE) >= buffer_length);
+  test_init ();
+
+  struct support_next_to_fault left_ntf
+    = support_next_to_fault_allocate (buffer_length * sizeof (CHAR));
+  CHAR *left_buffer = (CHAR *) left_ntf.buffer;
+  struct support_next_to_fault right_ntf
+    = support_next_to_fault_allocate (buffer_length * sizeof (CHAR));
+  CHAR *right_buffer = (CHAR *) right_ntf.buffer;
+
+  FOR_EACH_IMPL (impl, 0)
+    {
+      printf ("info: testing %s\n", impl->name);
+      for (size_t i = 0; i < buffer_length; ++i)
+        left_buffer[i] = 'A';
+
+      for (int left_zero_count = 0; left_zero_count <= buffer_length;
+           ++left_zero_count)
+        {
+          if (left_zero_count > 0)
+            left_buffer[buffer_length - left_zero_count] = 0;
+          int left_a_count = buffer_length - left_zero_count;
+          for (size_t i = 0; i < buffer_length; ++i)
+            right_buffer[i] = 'A';
+          for (int right_zero_count = 0; right_zero_count <= buffer_length;
+               ++right_zero_count)
+            {
+              if (right_zero_count > 0)
+                right_buffer[buffer_length - right_zero_count] = 0;
+              int right_a_count = buffer_length - right_zero_count;
+              for (int left_start_offset = 0;
+                   left_start_offset <= buffer_length;
+                   ++left_start_offset)
+                {
+                  CHAR *left_start_pointer = left_buffer + left_start_offset;
+                  int left_maxlen
+                    = maximum_length (left_start_offset, left_zero_count);
+                  int left_length
+                    = string_length (left_a_count, left_start_offset);
+                  for (int right_start_offset = 0;
+                       right_start_offset <= buffer_length;
+                       ++right_start_offset)
+                    {
+                      CHAR *right_start_pointer
+                        = right_buffer + right_start_offset;
+                      int right_maxlen
+                        = maximum_length (right_start_offset, right_zero_count);
+                      int right_length
+                        = string_length (right_a_count, right_start_offset);
+
+                      /* Maximum length is modelled after strnlen/wcsnlen,
+                         and must be valid for both pointer arguments at
+                         the same time.  */
+                      int maxlen = MIN (left_maxlen, right_maxlen);
+
+                      for (int length_argument = 0; length_argument <= maxlen;
+                           ++length_argument)
+                        {
+                          if (test_verbose)
+                            {
+                              printf ("left: zero_count=%d"
+                                      " a_count=%d start_offset=%d\n",
+                                      left_zero_count, left_a_count,
+                                      left_start_offset);
+                              printf ("right: zero_count=%d"
+                                      " a_count=%d start_offset=%d\n",
+                                      right_zero_count, right_a_count,
+                                      right_start_offset);
+                              printf ("length argument: %d\n",
+                                      length_argument);
+                            }
+
+                          /* Effective lengths bounded by length argument.
+                             The effective length determines the
+                             outcome of the comparison.  */
+                          int left_effective
+                            = MIN (left_length, length_argument);
+                          int right_effective
+                            = MIN (right_length, length_argument);
+                          if (left_effective == right_effective)
+                            TEST_COMPARE (CALL (impl,
+                                                left_start_pointer,
+                                                right_start_pointer,
+                                                length_argument), 0);
+                          else if (left_effective < right_effective)
+                            TEST_COMPARE (CALL (impl,
+                                                left_start_pointer,
+                                                right_start_pointer,
+                                                length_argument) < 0, 1);
+                          else
+                            TEST_COMPARE (CALL (impl,
+                                                left_start_pointer,
+                                                right_start_pointer,
+                                                length_argument) > 0, 1);
+                        }
+                    }
+                }
+            }
+        }
+    }
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/string/test-strncmp-gnu.c b/string/test-strncmp-gnu.c
new file mode 100644
index 0000000000..0652145caa
--- /dev/null
+++ b/string/test-strncmp-gnu.c
@@ -0,0 +1,4 @@
+#define TEST_IDENTIFIER strncmp
+#define TEST_NAME "strncmp"
+typedef char CHAR;
+#include "test-Xncmp-gnu.c"
diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
index 1cddd8cc6d..884b9ce8b7 100644
--- a/wcsmbs/Makefile
+++ b/wcsmbs/Makefile
@@ -158,6 +158,7 @@ tests := \
   test-wcslen \
   test-wcsncat \
   test-wcsncmp \
+  test-wcsncmp-gnu \
   test-wcsncpy \
   test-wcsnlen \
   test-wcspbrk \
diff --git a/wcsmbs/test-wcsncmp-gnu.c b/wcsmbs/test-wcsncmp-gnu.c
new file mode 100644
index 0000000000..6d085d300b
--- /dev/null
+++ b/wcsmbs/test-wcsncmp-gnu.c
@@ -0,0 +1,5 @@
+#include <wchar.h>
+#define TEST_IDENTIFIER wcsncmp
+#define TEST_NAME "wcsncmp"
+typedef wchar_t CHAR;
+#include "../string/test-Xncmp-gnu.c"

base-commit: 21738846a19eb4a36981efd37d9ee7cb6d687494



* Re: [PATCH] manual: Document a GNU extension for strncmp/wcsncmp
  2024-06-27 15:58 [PATCH] manual: Document a GNU extension for strncmp/wcsncmp Florian Weimer
@ 2024-06-27 16:33 ` Andreas Schwab
  0 siblings, 0 replies; 2+ messages in thread
From: Andreas Schwab @ 2024-06-27 16:33 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Jun 27 2024, Florian Weimer wrote:

> @@ -1367,15 +1380,30 @@ This function is the similar to @code{strcmp}, except that no more than
>  @var{size} bytes are compared.  In other words, if the two
>  strings are the same in their first @var{size} bytes, the
>  return value is zero.
> +
> +As a GNU extension, the pointer arguments do not need to point to arrays
> +of at least @var{size} elements in some cases.

I don't think this is needed.  The standard already says that characters
beyond the first null character are not compared.  Similar wording
exists also for the other strn and wcsn functions.
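
As an illustration of the behavior under discussion (a sketch only,
with hypothetical helper names, and assuming both arguments are
null-terminated strings): if the size argument is larger than both
string lengths, the comparison is decided at or before the first null
character, so the sign of the strncmp result matches that of strcmp.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: reduce a comparison result to -1, 0, or 1.  */
static int
sign_of (int x)
{
  return (x > 0) - (x < 0);
}

/* Hypothetical helper: return true if the bounded comparison agrees
   in sign with the unbounded one.  A and B are assumed to be
   null-terminated strings.  */
static bool
bounded_agrees (const char *a, const char *b, size_t size)
{
  return sign_of (strncmp (a, b, size)) == sign_of (strcmp (a, b));
}
```

For example, bounded_agrees ("ab", "abc", 100) is true, while
bounded_agrees ("abd", "abc", 2) is false because only the first two
bytes are compared in the bounded call.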

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
