public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] libiberty: Make strstr.c in libiberty ANSI compliant
@ 2020-05-02  0:06 Seija Kijin
  2020-11-13 18:53 ` Jeff Law
  0 siblings, 1 reply; 4+ messages in thread
From: Seija Kijin @ 2020-05-02  0:06 UTC (permalink / raw)
  To: gcc-patches

The original code in libiberty says "FIXME" and then says it has not been
validated to be ANSI compliant. However, this patch changes the function to
match implementations that ARE compliant, and such code is in the public
domain.

I ran the test results, and there are no test failures.

--- strstr.c 2020-03-12 07:07:24.000000000 -0400
+++ strstr_fixed.c 2020-05-01 19:53:13.904340902 -0400
@@ -16,26 +16,20 @@

 */

-
-/* FIXME:  The above description is ANSI compiliant.  This routine has not
-   been validated to comply with it.  -fnf */
-
 #include <stddef.h>

-extern char *strchr (const char *, int);
-extern int strncmp (const void *, const void *, size_t);
+extern int memcmp (const void *, const void *, size_t);
 extern size_t strlen (const char *);

 char *
 strstr (const char *s1, const char *s2)
 {
-  const char *p = s1;
   const size_t len = strlen (s2);
-
-  for (; (p = strchr (p, *s2)) != 0; p++)
-    {
-      if (strncmp (p, s2, len) == 0)
- return (char *)p;
-    }
+  while (*s1)
+  {
+    if (!memcmp (s1, s2, len))
+      return (char *)s1;
+    ++s1;
+  }
   return (0);
 }

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] libiberty: Make strstr.c in libiberty ANSI compliant
  2020-05-02  0:06 [PATCH] libiberty: Make strstr.c in libiberty ANSI compliant Seija Kijin
@ 2020-11-13 18:53 ` Jeff Law
  2023-03-29 15:00   ` Jakub Jelinek
  0 siblings, 1 reply; 4+ messages in thread
From: Jeff Law @ 2020-11-13 18:53 UTC (permalink / raw)
  To: Seija Kijin, gcc-patches


On 5/1/20 6:06 PM, Seija Kijin via Gcc-patches wrote:
> The original code in libiberty says "FIXME" and then says it has not been
> validated to be ANSI compliant. However, this patch changes the function to
> match implementations that ARE compliant, and such code is in the public
> domain.
>
> I ran the test results, and there are no test failures.

Thanks.  This seems to be the standard "simple" strstr implementation. 
There's significantly faster implementations available, but I doubt it's
worth the effort as the version in this file only gets used if there is
no system strstr.c.


I've pushed this patch to the trunk after fixing some minor formatting
issues

jeff



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] libiberty: Make strstr.c in libiberty ANSI compliant
  2020-11-13 18:53 ` Jeff Law
@ 2023-03-29 15:00   ` Jakub Jelinek
  2023-04-02 17:54     ` Jeff Law
  0 siblings, 1 reply; 4+ messages in thread
From: Jakub Jelinek @ 2023-03-29 15:00 UTC (permalink / raw)
  To: Jeff Law; +Cc: Seija Kijin, gcc-patches

On Fri, Nov 13, 2020 at 11:53:43AM -0700, Jeff Law via Gcc-patches wrote:
> 
> On 5/1/20 6:06 PM, Seija Kijin via Gcc-patches wrote:
> > The original code in libiberty says "FIXME" and then says it has not been
> > validated to be ANSI compliant. However, this patch changes the function to
> > match implementations that ARE compliant, and such code is in the public
> > domain.
> >
> > I ran the test results, and there are no test failures.
> 
> Thanks.  This seems to be the standard "simple" strstr implementation. 
> There's significantly faster implementations available, but I doubt it's
> worth the effort as the version in this file only gets used if there is
> no system strstr.c.

Except that PR109306 says the new version is non-compliant and
is certainly slower than what we used to have.  The only problem I see
on the old version (sure, it is not very fast version) is that for
strstr ("abcd", "") it returned "abcd"+4 rather than "abcd" because
strchr in that case changed p to point to the last character and then
strncmp returned 0.

The question reported in PR109306 is whether memcmp is required not to
access characters beyond the first difference or not.
For all of memcmp/strcmp/strncmp, C17 says:
"The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp
is determined by the sign of the difference between the values of the first pair of characters (both
interpreted as unsigned char) that differ in the objects being compared."
but then in memcmp description says:
"The memcmp function compares the first n characters of the object pointed to by s1 to the first n
characters of the object pointed to by s2."
rather than something similar to strncmp wording:
"The strncmp function compares not more than n characters (characters that follow a null character
are not compared) from the array pointed to by s1 to the array pointed to by
s2."
So, while for strncmp it seems clearly well defined when there is zero
terminator before reaching the n, for memcmp it is unclear if say
int
memcmp (const void *s1, const void *s2, size_t n)
{
  int ret = 0;
  size_t i;
  const unsigned char *p1 = (const unsigned char *) s1;
  const unsigned char *p2 = (const unsigned char *) s2;

  for (i = n; i; i--)
    if (p1[i - 1] != p2[i - 1])
      ret = p1[i - 1] < p2[i - 1] ? -1 : 1;
  return ret;
}
wouldn't be valid implementation (one which always compares all characters
and just returns non-zero from the first one that differs).

So, shouldn't we just revert and handle the len == 0 case correctly?

I think almost nothing really uses it, but still, the old version
at least worked nicer with a fast strchr.
Could as well strncmp (p + 1, s2 + 1, len - 1) if that is preferred
because strchr already compared the first character.

2023-03-29  Jakub Jelinek  <jakub@redhat.com>

	PR other/109306
	* strstr.c: Revert the 2020-11-13 changes.
	(strstr): Return s1 if len is 0.

--- libiberty/strstr.c.jj	2020-11-14 10:40:11.346408311 +0100
+++ libiberty/strstr.c	2023-03-29 16:30:11.163602758 +0200
@@ -16,20 +16,25 @@ length, the function returns @var{string
 
 */
 
 #include <stddef.h>
 
-extern int memcmp (const void *, const void *, size_t);
+extern char *strchr (const char *, int);
+extern int strncmp (const void *, const void *, size_t);
 extern size_t strlen (const char *);
 
 char *
 strstr (const char *s1, const char *s2)
 {
+  const char *p = s1;
   const size_t len = strlen (s2);
-  while (*s1)
+
+  if (!len)
+    return s1;
+
+  for (; (p = strchr (p, *s2)) != 0; p++)
     {
-      if (!memcmp (s1, s2, len))
-	return (char *)s1;
-      ++s1;
+      if (strncmp (p, s2, len) == 0)
+	return (char *)p;
     }
   return (0);
 }


	Jakub


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] libiberty: Make strstr.c in libiberty ANSI compliant
  2023-03-29 15:00   ` Jakub Jelinek
@ 2023-04-02 17:54     ` Jeff Law
  0 siblings, 0 replies; 4+ messages in thread
From: Jeff Law @ 2023-04-02 17:54 UTC (permalink / raw)
  To: Jakub Jelinek, Jeff Law; +Cc: Seija Kijin, gcc-patches



On 3/29/23 09:00, Jakub Jelinek via Gcc-patches wrote:
> On Fri, Nov 13, 2020 at 11:53:43AM -0700, Jeff Law via Gcc-patches wrote:
>>
>> On 5/1/20 6:06 PM, Seija Kijin via Gcc-patches wrote:
>>> The original code in libiberty says "FIXME" and then says it has not been
>>> validated to be ANSI compliant. However, this patch changes the function to
>>> match implementations that ARE compliant, and such code is in the public
>>> domain.
>>>
>>> I ran the test results, and there are no test failures.
>>
>> Thanks.  This seems to be the standard "simple" strstr implementation.
>> There's significantly faster implementations available, but I doubt it's
>> worth the effort as the version in this file only gets used if there is
>> no system strstr.c.
> 
> Except that PR109306 says the new version is non-compliant and
> is certainly slower than what we used to have.  The only problem I see
> on the old version (sure, it is not very fast version) is that for
> strstr ("abcd", "") it returned "abcd"+4 rather than "abcd" because
> strchr in that case changed p to point to the last character and then
> strncmp returned 0.
> 
> The question reported in PR109306 is whether memcmp is required not to
> access characters beyond the first difference or not.
> For all of memcmp/strcmp/strncmp, C17 says:
> "The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp
> is determined by the sign of the difference between the values of the first pair of characters (both
> interpreted as unsigned char) that differ in the objects being compared."
> but then in memcmp description says:
> "The memcmp function compares the first n characters of the object pointed to by s1 to the first n
> characters of the object pointed to by s2."
> rather than something similar to strncmp wording:
> "The strncmp function compares not more than n characters (characters that follow a null character
> are not compared) from the array pointed to by s1 to the array pointed to by
> s2."
> So, while for strncmp it seems clearly well defined when there is zero
> terminator before reaching the n, for memcmp it is unclear if say
> int
> memcmp (const void *s1, const void *s2, size_t n)
> {
>    int ret = 0;
>    size_t i;
>    const unsigned char *p1 = (const unsigned char *) s1;
>    const unsigned char *p2 = (const unsigned char *) s2;
> 
>    for (i = n; i; i--)
>      if (p1[i - 1] != p2[i - 1])
>        ret = p1[i - 1] < p2[i - 1] ? -1 : 1;
>    return ret;
> }
> wouldn't be valid implementation (one which always compares all characters
> and just returns non-zero from the first one that differs).
> 
> So, shouldn't we just revert and handle the len == 0 case correctly?
> 
> I think almost nothing really uses it, but still, the old version
> at least worked nicer with a fast strchr.
> Could as well strncmp (p + 1, s2 + 1, len - 1) if that is preferred
> because strchr already compared the first character.
> 
> 2023-03-29  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR other/109306
> 	* strstr.c: Revert the 2020-11-13 changes.
> 	(strstr): Return s1 if len is 0.
I really don't think the performance of this code matters.  If this 
fixes the compliance issue, then it's fine with me.
Jeff

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-04-02 17:54 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-02  0:06 [PATCH] libiberty: Make strstr.c in libiberty ANSI compliant Seija Kijin
2020-11-13 18:53 ` Jeff Law
2023-03-29 15:00   ` Jakub Jelinek
2023-04-02 17:54     ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).