public inbox for libc-alpha@sourceware.org
* [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
@ 2019-10-30 10:25 Florian Weimer (Code Review)
  2019-10-30 10:44 ` Andreas Schwab
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Florian Weimer (Code Review) @ 2019-10-30 10:25 UTC (permalink / raw)
  To: libc-alpha

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/444
......................................................................

manual: Clarify strnlen, wcsnlen, strndup null termination behavior

The inputs must be arrays, because reading is not guaranteed to stop
at the first null byte.

Change-Id: Ia3e68bc2d4d7e967df141702fb2f600cbd4a6432
---
M manual/string.texi
1 file changed, 10 insertions(+), 0 deletions(-)



diff --git a/manual/string.texi b/manual/string.texi
index a1c58e5..ba8a588 100644
--- a/manual/string.texi
+++ b/manual/string.texi
@@ -328,6 +328,10 @@
     @result{} 5
 @end smallexample
 
+Note that @var{s} must be an array of at least @var{maxlen} bytes.  It
+is undefined to call @code{strnlen} on a shorter array, even if it is
+known that the shorter array contains a null terminator.
+
 This function is a GNU extension and is declared in @file{string.h}.
 @end deftypefun
 
@@ -336,6 +340,8 @@
 @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
 @code{wcsnlen} is the wide character equivalent to @code{strnlen}.  The
 @var{maxlen} parameter specifies the maximum number of wide characters.
+Similar to @code{strnlen}, @var{ws} must point to an array of at least
+@var{maxlen} wide characters.
 
 This function is a GNU extension and is declared in @file{wchar.h}.
 @end deftypefun
@@ -919,6 +925,10 @@
 copies just the first @var{size} bytes and adds a closing null byte.
 Otherwise all bytes are copied and the string is terminated.
 
+Note that @var{s} must be an array of at least @var{size} bytes.  It
+is undefined to call @code{strndup} on a shorter array, even if it is
+known that the shorter array contains a null terminator.
+
 This function differs from @code{strncpy} in that it always terminates
 the destination string.
 

-- 
Gerrit-Project: glibc
Gerrit-Branch: master
Gerrit-Change-Id: Ia3e68bc2d4d7e967df141702fb2f600cbd4a6432
Gerrit-Change-Number: 444
Gerrit-PatchSet: 1
Gerrit-Owner: Florian Weimer <fweimer@redhat.com>
Gerrit-MessageType: newchange

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-10-30 10:25 [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior Florian Weimer (Code Review)
@ 2019-10-30 10:44 ` Andreas Schwab
  2019-10-30 10:55   ` Florian Weimer
  2019-11-27 19:08 ` Carlos O'Donell (Code Review)
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 24+ messages in thread
From: Andreas Schwab @ 2019-10-30 10:44 UTC (permalink / raw)
  To: Florian Weimer (Code Review); +Cc: libc-alpha, fweimer

On Okt 30 2019, Florian Weimer (Code Review) wrote:

> +Note that @var{s} must be an array of at least @var{maxlen} bytes.  It
> +is undefined to call @code{strnlen} on a shorter array, even if it is
> +known that the shorter array contains a null terminator.

This is not true.  strnlen _always_ stops before the null byte.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-10-30 10:44 ` Andreas Schwab
@ 2019-10-30 10:55   ` Florian Weimer
  2019-10-30 11:00     ` Andreas Schwab
  0 siblings, 1 reply; 24+ messages in thread
From: Florian Weimer @ 2019-10-30 10:55 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: libc-alpha

* Andreas Schwab:

> On Okt 30 2019, Florian Weimer (Code Review) wrote:
>
>> +Note that @var{s} must be an array of at least @var{maxlen} bytes.  It
>> +is undefined to call @code{strnlen} on a shorter array, even if it is
>> +known that the shorter array contains a null terminator.
>
> This is not true.  strnlen _always_ stops before the null byte.

This is not how it is specified in POSIX.

Our generic implementation of strnlen performs out-of-bounds pointer
arithmetic in that case, and it looks really iffy:

  const char *char_ptr, *end_ptr = str + maxlen;
…
  if (__glibc_unlikely (end_ptr < str))
    end_ptr = (const char *) ~0UL;

GCC does the right thing on x86-64, I think, but that's far from
guaranteed.

And what about wcsnlen?

Thanks,
Florian

* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-10-30 10:55   ` Florian Weimer
@ 2019-10-30 11:00     ` Andreas Schwab
  2019-10-30 11:03       ` Florian Weimer
  0 siblings, 1 reply; 24+ messages in thread
From: Andreas Schwab @ 2019-10-30 11:00 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Okt 30 2019, Florian Weimer wrote:

> * Andreas Schwab:
>
>> On Okt 30 2019, Florian Weimer (Code Review) wrote:
>>
>>> +Note that @var{s} must be an array of at least @var{maxlen} bytes.  It
>>> +is undefined to call @code{strnlen} on a shorter array, even if it is
>>> +known that the shorter array contains a null terminator.
>>
>> This is not true.  strnlen _always_ stops before the null byte.
>
> This is not how it is specified in POSIX.

Yes, it is.

    The strnlen() function shall return the number of bytes preceding
    the first null byte in the array to which s points, if s contains a
    null byte within the first maxlen bytes; otherwise, it shall return
    maxlen.

There is nothing undefined here.  Your interpretation would be
completely useless anyway.

Andreas.

* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-10-30 11:00     ` Andreas Schwab
@ 2019-10-30 11:03       ` Florian Weimer
  2019-10-30 11:10         ` Andreas Schwab
  2019-11-28  9:43         ` Florian Weimer
  0 siblings, 2 replies; 24+ messages in thread
From: Florian Weimer @ 2019-10-30 11:03 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: libc-alpha

* Andreas Schwab:

> On Okt 30 2019, Florian Weimer wrote:
>
>> * Andreas Schwab:
>>
>>> On Okt 30 2019, Florian Weimer (Code Review) wrote:
>>>
>>>> +Note that @var{s} must be an array of at least @var{maxlen} bytes.  It
>>>> +is undefined to call @code{strnlen} on a shorter array, even if it is
>>>> +known that the shorter array contains a null terminator.
>>>
>>> This is not true.  strnlen _always_ stops before the null byte.
>>
>> This is not how it is specified in POSIX.
>
> Yes, it is.
>
>     The strnlen() function shall return the number of bytes preceding
>     the first null byte in the array to which s points, if s contains a
>     null byte within the first maxlen bytes; otherwise, it shall return
>     maxlen.
>
> There is nothing undefined here.  Your interpretation would be
> completely useless anyway.

It says “array”, which implies a length.  Admittedly, it does not say
that maxlen corresponds to the array length.  POSIX also says this:

| The strnlen() function shall never examine more than maxlen bytes of
| the array pointed to by s.

But it does NOT say that reading stops after the first null terminator.

Thanks,
Florian

* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-10-30 11:03       ` Florian Weimer
@ 2019-10-30 11:10         ` Andreas Schwab
  2019-10-30 12:01           ` Zack Weinberg
  2019-11-28  9:43         ` Florian Weimer
  1 sibling, 1 reply; 24+ messages in thread
From: Andreas Schwab @ 2019-10-30 11:10 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Okt 30 2019, Florian Weimer wrote:

> * Andreas Schwab:
>
>> On Okt 30 2019, Florian Weimer wrote:
>>
>>> * Andreas Schwab:
>>>
>>>> On Okt 30 2019, Florian Weimer (Code Review) wrote:
>>>>
>>>>> +Note that @var{s} must be an array of at least @var{maxlen} bytes.  It
>>>>> +is undefined to call @code{strnlen} on a shorter array, even if it is
>>>>> +known that the shorter array contains a null terminator.
>>>>
>>>> This is not true.  strnlen _always_ stops before the null byte.
>>>
>>> This is not how it is specified in POSIX.
>>
>> Yes, it is.
>>
>>     The strnlen() function shall return the number of bytes preceding
>>     the first null byte in the array to which s points, if s contains a
>>     null byte within the first maxlen bytes; otherwise, it shall return
>>     maxlen.
>>
>> There is nothing undefined here.  Your interpretation would be
>> completely useless anyway.
>
> It says “array”

Yes, because a null terminator is not required.

> But it does NOT say that reading stops after the first null terminator.

Yes, it does, see above.  Otherwise it doesn't make sense.

Andreas.

* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-10-30 11:10         ` Andreas Schwab
@ 2019-10-30 12:01           ` Zack Weinberg
  2019-10-30 16:20             ` Andreas Schwab
  2019-10-30 17:24             ` Joseph Myers
  0 siblings, 2 replies; 24+ messages in thread
From: Zack Weinberg @ 2019-10-30 12:01 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Florian Weimer, GNU C Library

On Wed, Oct 30, 2019 at 7:10 AM Andreas Schwab <schwab@suse.de> wrote:
> On Okt 30 2019, Florian Weimer wrote:
> > * Andreas Schwab:
> >> On Okt 30 2019, Florian Weimer wrote:
> >>> * Andreas Schwab:
> >>>>
> >>>> strnlen _always_ stops before the null byte.
> >>> This is not how it is specified in POSIX.
> >> Yes, it is.
> >>
> >> # The strnlen() function shall return the number of bytes preceding
> >> # the first null byte in the array to which s points, if s contains a
> >> # null byte within the first maxlen bytes; otherwise, it shall return
> >> # maxlen.
> >> There is nothing undefined here. Your interpretation would be
> >> completely useless anyway.
> >
> > It says “array”
>
> Yes, because a null terminator is not required.
>
> > But it does NOT say that reading stops after the first null terminator.
>
> Yes, it does, see above.  Otherwise it doesn't make sense.

I agree with Florian's interpretation. The text Andreas quoted only says
that strnlen shall find the first null byte within the array and
return the number of bytes preceding. It does not say anything about
whether read accesses beyond the first NUL are allowed.

Looking at
<https://pubs.opengroup.org/onlinepubs/9699919799/functions/strnlen.html>,
Andreas quoted only the RETURN VALUE section of the specification;
there's another paragraph in the DESCRIPTION section which clarifies:

# The strnlen() function shall compute the smaller of the number of
# bytes in the array to which s points, not including any terminating
# NUL character, or the value of the maxlen argument.  The strnlen()
# function shall never examine more than maxlen bytes of the array
# pointed to by s.

It says that accesses beyond maxlen are forbidden, but it *doesn't*
say that accesses beyond the first NUL are forbidden; therefore they
are allowed.

As a matter of QoI I think our implementation should take care not to
access beyond the end of the *page* containing the first NUL
(which happens naturally if we don't do speculative or misaligned
loads) but it is appropriate for the manual to warn people that
portable code needs to make sure the entire array is readable.

(I have not looked at the rest of the proposed changes.)

zw

* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-10-30 12:01           ` Zack Weinberg
@ 2019-10-30 16:20             ` Andreas Schwab
  2019-10-30 16:31               ` Zack Weinberg
  2019-10-30 17:24             ` Joseph Myers
  1 sibling, 1 reply; 24+ messages in thread
From: Andreas Schwab @ 2019-10-30 16:20 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Florian Weimer, GNU C Library

On Okt 30 2019, Zack Weinberg wrote:

> It says that accesses beyond maxlen are forbidden, but it *doesn't*
> say that accesses beyond the first NUL are forbidden; therefore they
> are allowed.

Neither does it say that about strncpy or strncat.

Andreas.

* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-10-30 16:20             ` Andreas Schwab
@ 2019-10-30 16:31               ` Zack Weinberg
  2019-10-30 16:47                 ` Andreas Schwab
  0 siblings, 1 reply; 24+ messages in thread
From: Zack Weinberg @ 2019-10-30 16:31 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Florian Weimer, GNU C Library

On Wed, Oct 30, 2019 at 12:20 PM Andreas Schwab <schwab@suse.de> wrote:
> On Okt 30 2019, Zack Weinberg wrote:
>
> > It says that accesses beyond maxlen are forbidden, but it *doesn't*
> > say that accesses beyond the first NUL are forbidden; therefore they
> > are allowed.
>
> Neither does it say that about strncpy or strncat.

I don't see why that would change anything.

zw

* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-10-30 16:31               ` Zack Weinberg
@ 2019-10-30 16:47                 ` Andreas Schwab
  2019-10-30 16:58                   ` Zack Weinberg
  0 siblings, 1 reply; 24+ messages in thread
From: Andreas Schwab @ 2019-10-30 16:47 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Florian Weimer, GNU C Library

On Okt 30 2019, Zack Weinberg wrote:

> On Wed, Oct 30, 2019 at 12:20 PM Andreas Schwab <schwab@suse.de> wrote:
>> On Okt 30 2019, Zack Weinberg wrote:
>>
>> > It says that accesses beyond maxlen are forbidden, but it *doesn't*
>> > say that accesses beyond the first NUL are forbidden; therefore they
>> > are allowed.
>>
>> Neither does it say that about strncpy or strncat.
>
> I don't see why that would change anything.

That means that strncpy (x, "a", 10) is undefined.

Andreas.

* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-10-30 16:47                 ` Andreas Schwab
@ 2019-10-30 16:58                   ` Zack Weinberg
  2019-10-30 17:26                     ` Andreas Schwab
  0 siblings, 1 reply; 24+ messages in thread
From: Zack Weinberg @ 2019-10-30 16:58 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Florian Weimer, GNU C Library

On Wed, Oct 30, 2019 at 12:47 PM Andreas Schwab <schwab@suse.de> wrote:
> On Okt 30 2019, Zack Weinberg wrote:
> > On Wed, Oct 30, 2019 at 12:20 PM Andreas Schwab <schwab@suse.de> wrote:
> >> On Okt 30 2019, Zack Weinberg wrote:
> >>
> >> > It says that accesses beyond maxlen are forbidden, but it *doesn't*
> >> > say that accesses beyond the first NUL are forbidden; therefore they
> >> > are allowed.
> >>
> >> Neither does it say that about strncpy or strncat.
> >
> > I don't see why that would change anything.
>
> That means that strncpy (x, "a", 10) is undefined.

Yes, that could be a defect in the specification of strncpy (I can
argue either way about what the parenthetical "(bytes that follow a
NUL character are not copied)" means).  How does text's presence or
absence in the specification of strncpy change anything about the
requirements on strnlen?

zw

* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-10-30 12:01           ` Zack Weinberg
  2019-10-30 16:20             ` Andreas Schwab
@ 2019-10-30 17:24             ` Joseph Myers
  1 sibling, 0 replies; 24+ messages in thread
From: Joseph Myers @ 2019-10-30 17:24 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Andreas Schwab, Florian Weimer, GNU C Library

On Wed, 30 Oct 2019, Zack Weinberg wrote:

> I agree with Florian's interpretation. The text Andreas quoted only says
> that strnlen shall find the first null byte within the array and
> return the number of bytes preceding. It does not say anything about
> whether read accesses beyond the first NUL are allowed.

Also, ISO C has special wording for memchr for this issue "The 
implementation shall behave as if it reads the characters sequentially and 
stops as soon as a matching character is found.", which it doesn't for 
other string functions.  (As noted in the prior discussion of strnlen in 
bug 19391, currently marked RESOLVED/INVALID.)

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-10-30 16:58                   ` Zack Weinberg
@ 2019-10-30 17:26                     ` Andreas Schwab
  2019-10-30 18:12                       ` Zack Weinberg
  0 siblings, 1 reply; 24+ messages in thread
From: Andreas Schwab @ 2019-10-30 17:26 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Florian Weimer, GNU C Library

On Okt 30 2019, Zack Weinberg wrote:

> Yes, that could be a defect in the specification of strncpy (I can
> argue either way about what the parenthetical "(bytes that follow a
> NUL character are not copied)" means).  How does text's presence or
> absence in the specification of strncpy change anything about the
> requirements on strnlen?

Because it shows how flawed your argument is.

Andreas.

* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-10-30 17:26                     ` Andreas Schwab
@ 2019-10-30 18:12                       ` Zack Weinberg
  2019-10-30 18:36                         ` Florian Weimer
  0 siblings, 1 reply; 24+ messages in thread
From: Zack Weinberg @ 2019-10-30 18:12 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: GNU C Library

On Wed, Oct 30, 2019 at 1:26 PM Andreas Schwab <schwab@suse.de> wrote:
> On Okt 30 2019, Zack Weinberg wrote:
>
> > Yes, that could be a defect in the specification of strncpy (I can
> > argue either way about what the parenthetical "(bytes that follow a
> > NUL character are not copied)" means).  How does text's presence or
> > absence in the specification of strncpy change anything about the
> > requirements on strnlen?
>
> Because it shows how flawed your argument is.

Are you seriously saying that I have to read the specification of
strncpy to understand the specification of strnlen?  That's not how I
was taught to read standards.

zw

* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-10-30 18:12                       ` Zack Weinberg
@ 2019-10-30 18:36                         ` Florian Weimer
  0 siblings, 0 replies; 24+ messages in thread
From: Florian Weimer @ 2019-10-30 18:36 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Andreas Schwab, GNU C Library

* Zack Weinberg:

> On Wed, Oct 30, 2019 at 1:26 PM Andreas Schwab <schwab@suse.de> wrote:
>> On Okt 30 2019, Zack Weinberg wrote:
>>
>> > Yes, that could be a defect in the specification of strncpy (I can
>> > argue either way about what the parenthetical "(bytes that follow a
>> > NUL character are not copied)" means).  How does text's presence or
>> > absence in the specification of strncpy change anything about the
>> > requirements on strnlen?
>>
>> Because it shows how flawed your argument is.
>
> Are you seriously saying that I have to read the specification of
> strncpy to understand the specification of strnlen?  That's not how I
> was taught to read standards.

I actually find the strncpy-based argument quite convincing.

And it's really the way you have to read standards if you want to derive
meaning from them.  You need to look at how certain terms are used in
other contexts and what they imply there.  For strncpy, clearly the
intent is that it is safe to specify a source string shorter than the
target array.  If comparable wording is used to describe the strnlen
behavior, then it makes sense to assume that the POSIX authors
probably have not thought about this corner case.  At the very least,
it tells us that the standard does not say what the behavior should be
in this case.

Does anyone know if we have test cases that exercise page crossing
after the null terminator in strnlen?

* [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-10-30 10:25 [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior Florian Weimer (Code Review)
  2019-10-30 10:44 ` Andreas Schwab
@ 2019-11-27 19:08 ` Carlos O'Donell (Code Review)
  2019-11-27 19:14 ` Florian Weimer (Code Review)
  2019-11-27 22:11 ` Carlos O'Donell (Code Review)
  3 siblings, 0 replies; 24+ messages in thread
From: Carlos O'Donell (Code Review) @ 2019-11-27 19:08 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

Carlos O'Donell has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/444
......................................................................


Patch Set 1: Code-Review+2

(2 comments)

Looks good to me.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

| --- /dev/null
| +++ /COMMIT_MSG
| @@ -1,0 +1,12 @@ 
| +Parent:     177a3d48 (y2038: linux: Provide __clock_getres64 implementation)
| +Author:     Florian Weimer <fweimer@redhat.com>
| +AuthorDate: 2019-10-30 11:21:18 +0100
| +Commit:     Florian Weimer <fweimer@redhat.com>
| +CommitDate: 2019-10-30 11:21:18 +0100
| +
| +manual: Clarify strnlen, wcsnlen, strndup null termination behavior
| +
| +The inputs must be arrays, because reading is not guaranteed to stop
| +at the first null byte.

PS1, Line 10:

OK. Agreed.

| +
| +Change-Id: Ia3e68bc2d4d7e967df141702fb2f600cbd4a6432
| --- manual/string.texi
| +++ manual/string.texi
| @@ -321,18 +321,22 @@ is more efficient and works even if @var{s} is not null-terminated so
|  long as @var{maxlen} does not exceed the size of @var{s}'s array.
|  
|  @smallexample
|  char string[32] = "hello, world";
|  strnlen (string, 32)
|      @result{} 12
|  strnlen (string, 5)
|      @result{} 5
|  @end smallexample
|  
| +Note that @var{s} must be an array of at least @var{maxlen} bytes.  It
| +is undefined to call @code{strnlen} on a shorter array, even if it is
| +known that the shorter array contains a null terminator.

PS1, Line 333:

OK. Agreed, you must have at least maxlen bytes, otherwise it's
undefined. We might even create something that scans backwards for
null bytes knowing we have maxlen bytes.

| +
|  This function is a GNU extension and is declared in @file{string.h}.
|  @end deftypefun
|  
|  @deftypefun size_t wcsnlen (const wchar_t *@var{ws}, size_t @var{maxlen})
|  @standards{GNU, wchar.h}
|  @safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
|  @code{wcsnlen} is the wide character equivalent to @code{strnlen}.  The
|  @var{maxlen} parameter specifies the maximum number of wide characters.

-- 
Gerrit-Project: glibc
Gerrit-Branch: master
Gerrit-Change-Id: Ia3e68bc2d4d7e967df141702fb2f600cbd4a6432
Gerrit-Change-Number: 444
Gerrit-PatchSet: 1
Gerrit-Owner: Florian Weimer <fweimer@redhat.com>
Gerrit-Reviewer: Carlos O'Donell <carlos@redhat.com>
Gerrit-Comment-Date: Wed, 27 Nov 2019 19:08:06 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: Yes
Gerrit-MessageType: comment

* [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-10-30 10:25 [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior Florian Weimer (Code Review)
  2019-10-30 10:44 ` Andreas Schwab
  2019-11-27 19:08 ` Carlos O'Donell (Code Review)
@ 2019-11-27 19:14 ` Florian Weimer (Code Review)
  2019-11-27 22:11 ` Carlos O'Donell (Code Review)
  3 siblings, 0 replies; 24+ messages in thread
From: Florian Weimer (Code Review) @ 2019-11-27 19:14 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha; +Cc: Carlos O'Donell

Florian Weimer has posted comments on this change.

Change URL: https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/444
......................................................................


Patch Set 1: Code-Review-2

There has been objection to this on libc-alpha:

https://sourceware.org/ml/libc-alpha/2019-10/msg00939.html

I should have probably closed out this review.


-- 
Gerrit-Project: glibc
Gerrit-Branch: master
Gerrit-Change-Id: Ia3e68bc2d4d7e967df141702fb2f600cbd4a6432
Gerrit-Change-Number: 444
Gerrit-PatchSet: 1
Gerrit-Owner: Florian Weimer <fweimer@redhat.com>
Gerrit-Reviewer: Carlos O'Donell <carlos@redhat.com>
Gerrit-Reviewer: Florian Weimer <fweimer@redhat.com>
Gerrit-Comment-Date: Wed, 27 Nov 2019 19:14:00 +0000
Gerrit-HasComments: No
Gerrit-Has-Labels: Yes
Gerrit-MessageType: comment

* [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-10-30 10:25 [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior Florian Weimer (Code Review)
                   ` (2 preceding siblings ...)
  2019-11-27 19:14 ` Florian Weimer (Code Review)
@ 2019-11-27 22:11 ` Carlos O'Donell (Code Review)
  3 siblings, 0 replies; 24+ messages in thread
From: Carlos O'Donell (Code Review) @ 2019-11-27 22:11 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

Carlos O'Donell has abandoned this change. ( https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/444 )

Change subject: manual: Clarify strnlen, wcsnlen, strndup null termination behavior
......................................................................


Abandoned

Dropping this change due to: https://sourceware.org/ml/libc-alpha/2019-10/msg00939.html
-- 
Gerrit-Project: glibc
Gerrit-Branch: master
Gerrit-Change-Id: Ia3e68bc2d4d7e967df141702fb2f600cbd4a6432
Gerrit-Change-Number: 444
Gerrit-PatchSet: 1
Gerrit-Owner: Florian Weimer <fweimer@redhat.com>
Gerrit-Reviewer: Carlos O'Donell <carlos@redhat.com>
Gerrit-Reviewer: Florian Weimer <fweimer@redhat.com>
Gerrit-MessageType: abandon

* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-10-30 11:03       ` Florian Weimer
  2019-10-30 11:10         ` Andreas Schwab
@ 2019-11-28  9:43         ` Florian Weimer
  2019-11-28 15:56           ` Carlos O'Donell
  1 sibling, 1 reply; 24+ messages in thread
From: Florian Weimer @ 2019-11-28  9:43 UTC (permalink / raw)
  To: libc-alpha

* Florian Weimer:

> * Andreas Schwab:
>
>> On Okt 30 2019, Florian Weimer wrote:
>>
>>> * Andreas Schwab:
>>>
>>>> On Okt 30 2019, Florian Weimer (Code Review) wrote:
>>>>
>>>>> +Note that @var{s} must be an array of at least @var{maxlen} bytes.  It
>>>>> +is undefined to call @code{strnlen} on a shorter array, even if it is
>>>>> +known that the shorter array contains a null terminator.
>>>>
>>>> This is not true.  strnlen _always_ stops before the null byte.
>>>
>>> This is not how it is specified in POSIX.
>>
>> Yes, it is.
>>
>>     The strnlen() function shall return the number of bytes preceding
>>     the first null byte in the array to which s points, if s contains a
>>     null byte within the first maxlen bytes; otherwise, it shall return
>>     maxlen.
>>
>> There is nothing undefined here.  Your interpretation would be
>> completely useless anyway.
>
> It says “array”, which implies a length.  Admittedly, it does not say
> that maxlen corresponds to the array length.  POSIX also says this:
>
> | The strnlen() function shall never examine more than maxlen bytes of
> | the array pointed to by s.
>
> But it does NOT say that reading stops after the first null terminator.

I have built glibc with --disable-multi-arch and this patch on x86-64:

diff --git a/string/strnlen.c b/string/strnlen.c
index 0b3a12e8b1..d5781dbb6f 100644
--- a/string/strnlen.c
+++ b/string/strnlen.c
@@ -33,6 +33,10 @@
 size_t
 __strnlen (const char *str, size_t maxlen)
 {
+  /* Assert that the entire input is readable.  */
+  for (size_t i = 0; i < maxlen; ++i)
+    asm volatile ("" :: "r" (str[i]));
+
   const char *char_ptr, *end_ptr = str + maxlen;
   const unsigned long int *longword_ptr;
   unsigned long int longword, himagic, lomagic;
diff --git a/sysdeps/x86_64/strnlen.S b/sysdeps/x86_64/strnlen.S
deleted file mode 100644
index d3c43ac482..0000000000
--- a/sysdeps/x86_64/strnlen.S
+++ /dev/null
@@ -1,6 +0,0 @@
-#define AS_STRNLEN
-#define strlen __strnlen
-#include "strlen.S"
-
-weak_alias (__strnlen, strnlen);
-libc_hidden_builtin_def (strnlen)
diff --git a/wcsmbs/wcsnlen.c b/wcsmbs/wcsnlen.c
index 17e004dcc0..0d3709ac91 100644
--- a/wcsmbs/wcsnlen.c
+++ b/wcsmbs/wcsnlen.c
@@ -26,6 +26,10 @@
 size_t
 __wcsnlen (const wchar_t *s, size_t maxlen)
 {
+  /* Assert that the entire input is readable.  */
+  for (size_t i = 0; i < maxlen; ++i)
+    asm volatile ("" :: "r" (s[i]));
+
   const wchar_t *ret = __wmemchr (s, L'\0', maxlen);
   if (ret)
     maxlen = ret - s;

The resulting crashes demonstrate that the test suite verifies that we
do not treat the input as an array (to some degree; there might be
gaps in coverage).

I think we should document this as a GNU extension.  Thoughts?

* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-11-28  9:43         ` Florian Weimer
@ 2019-11-28 15:56           ` Carlos O'Donell
  2019-11-28 15:58             ` Carlos O'Donell
  0 siblings, 1 reply; 24+ messages in thread
From: Carlos O'Donell @ 2019-11-28 15:56 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 11/28/19 4:43 AM, Florian Weimer wrote:
> * Florian Weimer:
> 
>> * Andreas Schwab:
>>
>>> On Okt 30 2019, Florian Weimer wrote:
>>>
>>>> * Andreas Schwab:
>>>>
>>>>> On Okt 30 2019, Florian Weimer (Code Review) wrote:
>>>>>
>>>>>> +Note that @var{s} must be an array of at least @var{maxlen} bytes.  It
>>>>>> +is undefined to call @code{strnlen} on a shorter array, even if it is
>>>>>> +known that the shorter array contains a null terminator.
>>>>>
>>>>> This is not true.  strnlen _always_ stops before the null byte.
>>>>
>>>> This is not how it is specified in POSIX.
>>>
>>> Yes, it is.
>>>
>>>     The strnlen() function shall return the number of bytes preceding
>>>     the first null byte in the array to which s points, if s contains a
>>>     null byte within the first maxlen bytes; otherwise, it shall return
>>>     maxlen.
>>>
>>> There is nothing undefined here.  Your interpretation would be
>>> completely useless anyway.
>>
>> It says “array”, which implies a length.  Admittedly, it does not say
>> that maxlen corresponds to the array length.  POSIX also says this:
>>
>> | The strnlen() function shall never examine more than maxlen bytes of
>> | the array pointed to by s.
>>
>> But it does NOT say that reading stops after the first null terminator.
> 
> I have built glibc with --disable-multi-arch and this patch on x86-64:
> 
> diff --git a/string/strnlen.c b/string/strnlen.c
> index 0b3a12e8b1..d5781dbb6f 100644
> --- a/string/strnlen.c
> +++ b/string/strnlen.c
> @@ -33,6 +33,10 @@
>  size_t
>  __strnlen (const char *str, size_t maxlen)
>  {
> +  /* Assert that the entire input is readable.  */
> +  for (size_t i = 0; i < maxlen; ++i)
> +    asm volatile ("" :: "r" (str[i]));
> +
>    const char *char_ptr, *end_ptr = str + maxlen;
>    const unsigned long int *longword_ptr;
>    unsigned long int longword, himagic, lomagic;
> diff --git a/sysdeps/x86_64/strnlen.S b/sysdeps/x86_64/strnlen.S
> deleted file mode 100644
> index d3c43ac482..0000000000
> --- a/sysdeps/x86_64/strnlen.S
> +++ /dev/null
> @@ -1,6 +0,0 @@
> -#define AS_STRNLEN
> -#define strlen __strnlen
> -#include "strlen.S"
> -
> -weak_alias (__strnlen, strnlen);
> -libc_hidden_builtin_def (strnlen)
> diff --git a/wcsmbs/wcsnlen.c b/wcsmbs/wcsnlen.c
> index 17e004dcc0..0d3709ac91 100644
> --- a/wcsmbs/wcsnlen.c
> +++ b/wcsmbs/wcsnlen.c
> @@ -26,6 +26,10 @@
>  size_t
>  __wcsnlen (const wchar_t *s, size_t maxlen)
>  {
> +  /* Assert that the entire input is readable.  */
> +  for (size_t i = 0; i < maxlen; ++i)
> +    asm volatile ("" :: "r" (s[i]));
> +
>    const wchar_t *ret = __wmemchr (s, L'\0', maxlen);
>    if (ret)
>      maxlen = ret - s;
> 
> The resulting crashes demonstrate that the test suite verifies that we
> do not treat the input as an array (to some degree; there might be
> gaps in coverage).
> 
> I think we should document this as a GNU extension.  Thoughts?

We should absolutely document this. It's an implementation-dependent detail
that we choose to interpret the standard in this way.

-- 
Cheers,
Carlos.
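The interpretation being documented, that reading stops at the first null byte and never goes past maxlen elements, can be stated as a character-at-a-time reference version. This is a sketch for illustration, not glibc's implementation (the optimized versions typically read aligned words or vectors); `ref_strnlen` and `ref_wcsnlen` are hypothetical names. Because of the short-circuit in the loop condition, `s[i]` is never read once `i` reaches maxlen, and the loop ends at the first null.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical reference strnlen: examines bytes one at a time and
   stops at the first null byte or after MAXLEN bytes, whichever
   comes first.  No byte after the terminator is ever read.  */
size_t
ref_strnlen (const char *s, size_t maxlen)
{
  size_t i = 0;
  while (i < maxlen && s[i] != '\0')
    ++i;
  return i;
}

/* Hypothetical wide-character counterpart; MAXLEN counts wide
   characters, not bytes.  */
size_t
ref_wcsnlen (const wchar_t *s, size_t maxlen)
{
  size_t i = 0;
  while (i < maxlen && s[i] != L'\0')
    ++i;
  return i;
}
```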


* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-11-28 15:56           ` Carlos O'Donell
@ 2019-11-28 15:58             ` Carlos O'Donell
  2019-11-28 18:23               ` Rich Felker
  0 siblings, 1 reply; 24+ messages in thread
From: Carlos O'Donell @ 2019-11-28 15:58 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha, Rich Felker

On 11/28/19 10:56 AM, Carlos O'Donell wrote:
> We should absolutely document this. It's an implementation-dependent detail
> that we choose to interpret the standard in this way.
> 

I also think we should get changes into the Linux man-pages project to call
this out so that nobody thinks about changing this again and so the
implementation is clear.

Have we asked Rich what musl does and what he thinks on the topic?

-- 
Cheers,
Carlos.


* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-11-28 15:58             ` Carlos O'Donell
@ 2019-11-28 18:23               ` Rich Felker
  2019-11-28 18:38                 ` Szabolcs Nagy
  0 siblings, 1 reply; 24+ messages in thread
From: Rich Felker @ 2019-11-28 18:23 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Florian Weimer, libc-alpha

On Thu, Nov 28, 2019 at 10:58:13AM -0500, Carlos O'Donell wrote:
> I also think we should get changes into the linux man page project to call
> this out so that nobody thinks about changing this again and so the
> implementation is clear.
> 
> Have we asked Rich what musl does and what he thinks on the topic?

I missed this whole thread, and haven't had time to look back through
it yet. Is the claim that strnlen, etc. require a pointer to at least
n bytes? I do not think that matches the intent of these interfaces at
all. The language in POSIX is sloppy ("the number of bytes in the
array to which s points"?! I think they were just trying to avoid
saying "string" here because it's not necessarily a string, but they
botched it) but a function like this that requires a large array is
utterly useless. The whole point of strnlen is to be a bounded-time
strlen when lengths >n will be treated as errors (or otherwise
specially) after it returns.

Rich
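The use case described above, bounding the cost of measuring untrusted input, might look like the following sketch. `NAME_MAX_LEN` and `name_is_valid` are hypothetical. Callers routinely pass strings far shorter than the bound (for example the literal `"bob"`), which is well-defined only if `strnlen` stops at the first null byte; under the strict array reading proposed in the patch, such a call would be undefined.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <string.h>

/* Hypothetical limit, for illustration only.  */
#define NAME_MAX_LEN 32

/* Accepts NAME only if it is non-empty and at most NAME_MAX_LEN bytes
   long.  strnlen examines at most NAME_MAX_LEN + 1 bytes, so a very
   long input costs bounded time, and any length above NAME_MAX_LEN is
   treated as an error.  */
bool
name_is_valid (const char *name)
{
  size_t len = strnlen (name, NAME_MAX_LEN + 1);
  return len > 0 && len <= NAME_MAX_LEN;
}
```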


* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-11-28 18:23               ` Rich Felker
@ 2019-11-28 18:38                 ` Szabolcs Nagy
  2019-11-29 18:20                   ` Martin Sebor
  0 siblings, 1 reply; 24+ messages in thread
From: Szabolcs Nagy @ 2019-11-28 18:38 UTC (permalink / raw)
  To: Rich Felker, Carlos O'Donell
  Cc: nd, Florian Weimer, libc-alpha, Martin Sebor

On 28/11/2019 18:22, Rich Felker wrote:
> I missed this whole thread, and haven't had time to look back through
> it yet. Is the claim that strnlen, etc. require a pointer to at least
> n bytes? I do not think that matches the intent of these interfaces at
> all. The language in POSIX is sloppy ("the number of bytes in the
> array to which s points"?! I think they were just trying to avoid
> saying "string" here because it's not necessarily a string, but they
> botched it) but a function like this that requires a large array is
> utterly useless. The whole point of strnlen is to be a bounded-time
> strlen when lengths >n will be treated as errors (or otherwise
> specially) after it returns.

if there is something wrong with the posix wording then
maybe the c2x proposal of strnlen should be updated too?
(cc Martin)

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2351.htm


* Re: [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior
  2019-11-28 18:38                 ` Szabolcs Nagy
@ 2019-11-29 18:20                   ` Martin Sebor
  0 siblings, 0 replies; 24+ messages in thread
From: Martin Sebor @ 2019-11-29 18:20 UTC (permalink / raw)
  To: Szabolcs Nagy, Rich Felker, Carlos O'Donell
  Cc: nd, Florian Weimer, libc-alpha

On 11/28/19 11:38 AM, Szabolcs Nagy wrote:
> if there is something wrong with the posix wording then
> maybe the c2x proposal of strnlen should be updated too?
> (cc Martin)
> 
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2351.htm

I'm not sure I'd call it wrong but I'm not opposed to updating
the text to make it clear the function (and others like it) must
not examine bytes of the source array past the first NUL.

The strncpy and strncat functions are specified similarly (there's
no explicit requirement that they not read characters past the first
NUL).  All three functions should behave analogously WRT reading
the source array.

Martin
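The point about strncpy and strncat shows up in the usual fixed-size-field idioms. In this sketch (`copy_field` and `append_field` are hypothetical helpers), the bound comes from the destination buffer and is often larger than the source string, so callers implicitly rely on the functions not reading the source past its null terminator.

```c
#include <string.h>

/* Hypothetical helper: copies SRC into the field DST of DST_SIZE
   bytes (DST_SIZE must be at least 1).  strncpy pads the rest of DST
   with null bytes when SRC is shorter than DST_SIZE, but does not
   terminate DST when SRC is DST_SIZE bytes or longer, so the last
   byte is forced to null.  */
void
copy_field (char *dst, const char *src, size_t dst_size)
{
  strncpy (dst, src, dst_size);
  dst[dst_size - 1] = '\0';
}

/* Hypothetical helper: appends as much of SRC as fits in DST, a
   buffer of DST_SIZE bytes that already holds a null-terminated
   string.  strncat writes at most N bytes from SRC plus a
   terminator, so N is the remaining room minus one.  */
void
append_field (char *dst, const char *src, size_t dst_size)
{
  size_t used = strlen (dst);
  if (used + 1 < dst_size)
    strncat (dst, src, dst_size - used - 1);
}
```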



Thread overview: 24+ messages
2019-10-30 10:25 [review] manual: Clarify strnlen, wcsnlen, strndup null termination behavior Florian Weimer (Code Review)
2019-10-30 10:44 ` Andreas Schwab
2019-10-30 10:55   ` Florian Weimer
2019-10-30 11:00     ` Andreas Schwab
2019-10-30 11:03       ` Florian Weimer
2019-10-30 11:10         ` Andreas Schwab
2019-10-30 12:01           ` Zack Weinberg
2019-10-30 16:20             ` Andreas Schwab
2019-10-30 16:31               ` Zack Weinberg
2019-10-30 16:47                 ` Andreas Schwab
2019-10-30 16:58                   ` Zack Weinberg
2019-10-30 17:26                     ` Andreas Schwab
2019-10-30 18:12                       ` Zack Weinberg
2019-10-30 18:36                         ` Florian Weimer
2019-10-30 17:24             ` Joseph Myers
2019-11-28  9:43         ` Florian Weimer
2019-11-28 15:56           ` Carlos O'Donell
2019-11-28 15:58             ` Carlos O'Donell
2019-11-28 18:23               ` Rich Felker
2019-11-28 18:38                 ` Szabolcs Nagy
2019-11-29 18:20                   ` Martin Sebor
2019-11-27 19:08 ` Carlos O'Donell (Code Review)
2019-11-27 19:14 ` Florian Weimer (Code Review)
2019-11-27 22:11 ` Carlos O'Donell (Code Review)
