public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* UB status of snprintf on invalid ptr+size combination?
@ 2023-03-14 19:47 Simon Chopin
  2023-03-14 21:39 ` Paul Eggert
  0 siblings, 1 reply; 32+ messages in thread
From: Simon Chopin @ 2023-03-14 19:47 UTC (permalink / raw)
  To: libc-alpha

Hi,

We've come across a couple of regressions[0] in 2.37 on armhf due to a
peculiar use of snprintf that boils down to this:

char [32]buf;
...
snprintf(buf, INT_MAX, "%s", "Hello world");

This used to work fine, but now fails in a non-obvious way: the return
value is the one you would expect from a successful call (here, 11), but
the last character is overwritten by a NUL terminator.

It's likely the new behavior has been introduced by

commit e88b9f0e5cc50cab57a299dc7efe1a4eb385161d
Author: Florian Weimer <fweimer@redhat.com>
Date:   Mon Dec 19 18:56:54 2022 +0100

    stdio-common: Convert vfprintf and related functions to buffers

We're hitting the issue because buf+INT_MAX overflows on armhf (being
32-bit and on the stack).

When the issue was first discovered[1], I didn't raise the issue because I
dismissed it as UB, but it reappeared in an unrelated context, and
colleagues pointed out that the wording in the standard doesn't actually
say that the `n` argument is the size of the array.

Do you think it worth it to try and fix this regression? Florian
suggested an approach similar to

commit 0d50f477f47ba637b54fb03ac48d769ec4543e8d
Author: Florian Weimer <fweimer@redhat.com>
Date: Wed Jan 25 08:01:00 2023 +0100

    stdio-common: Handle -1 buffer size in __sprintf_chk & co (bug 30039)


Thanks in advance,
Simon

[0]: https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/2011326
[1]: https://bugs.launchpad.net/ubuntu/+source/notcurses/+bug/2008108

--
Simon Chopin
Foundations Team                               	    Ubuntu MOTU/Core Dev
simon.chopin@canonical.com                            schopin@ubuntu.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-14 19:47 UB status of snprintf on invalid ptr+size combination? Simon Chopin
@ 2023-03-14 21:39 ` Paul Eggert
  2023-03-15  9:22   ` Andreas Schwab
  2023-03-15 12:39   ` Vincent Lefevre
  0 siblings, 2 replies; 32+ messages in thread
From: Paul Eggert @ 2023-03-14 21:39 UTC (permalink / raw)
  To: Simon Chopin; +Cc: libc-alpha

On 3/14/23 12:47, Simon Chopin via Libc-alpha wrote:
> When the issue was first discovered[1], I didn't raise the issue because I
> dismissed it as UB, but it reappeared in an unrelated context, and
> colleagues pointed out that the wording in the standard doesn't actually
> say that the `n` argument is the size of the array.

That sounds like a misreading of the C standard. Even though the 
standard often does not explicitly say that a size argument is the size 
of an array, it's obvious from context that this is intended. So Florian 
is correct here that the call with INT_MAX is not portable C code.

For example, it's valid for snprintf to be implemented this way:

   int
   snprintf (char *buf, size_t size, char const *fmt, ...)
   {
      char *buf_limit = buf + size;
      ...
   }

even though this would have undefined behavior if BUF points to a 
character array smaller than SIZE.

glibc snprintf does the equivalent of the above internally, so it's a 
good thing that notcurses has been fixed to not use the INT_MAX trick as 
that trick already did not work in general with glibc and I assume with 
other C libraries (with rare and hard-to-diagnose failures), even aside 
from any fortification business that made the bug more visible.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-14 21:39 ` Paul Eggert
@ 2023-03-15  9:22   ` Andreas Schwab
  2023-03-15 15:54     ` Siddhesh Poyarekar
                       ` (2 more replies)
  2023-03-15 12:39   ` Vincent Lefevre
  1 sibling, 3 replies; 32+ messages in thread
From: Andreas Schwab @ 2023-03-15  9:22 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Simon Chopin, libc-alpha

On Mär 14 2023, Paul Eggert wrote:

> For example, it's valid for snprintf to be implemented this way:
>
>   int
>   snprintf (char *buf, size_t size, char const *fmt, ...)
>   {
>      char *buf_limit = buf + size;
>      ...
>   }
>
> even though this would have undefined behavior if BUF points to a
> character array smaller than SIZE.

Since it is part of the implementation it is irrelevant from the POV of
the standard.  The implementation does not have to abide to the C
standard, as long as it properly implements the interface constraints.

What matters is the wording of the standard.  The POSIX standard is more
explicit here: "with the addition of the n argument which states the
size of the buffer referred to by s."  Probably the C standard should be
clarified.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-14 21:39 ` Paul Eggert
  2023-03-15  9:22   ` Andreas Schwab
@ 2023-03-15 12:39   ` Vincent Lefevre
  2023-03-16 10:29     ` Stephan Bergmann
  1 sibling, 1 reply; 32+ messages in thread
From: Vincent Lefevre @ 2023-03-15 12:39 UTC (permalink / raw)
  To: Paul Eggert; +Cc: Simon Chopin, libc-alpha

About

  snprintf(buf, INT_MAX, "%s", "Hello world");

On 2023-03-14 14:39:42 -0700, Paul Eggert wrote:
> On 3/14/23 12:47, Simon Chopin via Libc-alpha wrote:
> > When the issue was first discovered[1], I didn't raise the issue because I
> > dismissed it as UB, but it reappeared in an unrelated context, and
> > colleagues pointed out that the wording in the standard doesn't actually
> > say that the `n` argument is the size of the array.
> 
> That sounds like a misreading of the C standard. Even though the standard
> often does not explicitly say that a size argument is the size of an array,
> it's obvious from context that this is intended. So Florian is correct here
> that the call with INT_MAX is not portable C code.

No, it is not obvious. If the C standard does not say that this is
the size of the array, then it does not have to be the size of the
array. The C standard just says:

  Otherwise, output characters beyond the n-1st are discarded rather
  than being written to the array, and a null character is written at
  the end of the characters actually written into the array.

In practice, it is possible that the check of the buffer size has
already been done somewhere else, so that using a larger value for n
is fine.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-15  9:22   ` Andreas Schwab
@ 2023-03-15 15:54     ` Siddhesh Poyarekar
  2023-03-15 18:34     ` Michael Hudson-Doyle
  2023-03-19 14:45     ` manfred
  2 siblings, 0 replies; 32+ messages in thread
From: Siddhesh Poyarekar @ 2023-03-15 15:54 UTC (permalink / raw)
  To: Andreas Schwab, Paul Eggert; +Cc: Simon Chopin, libc-alpha

On 2023-03-15 05:22, Andreas Schwab via Libc-alpha wrote:
> On Mär 14 2023, Paul Eggert wrote:
> 
>> For example, it's valid for snprintf to be implemented this way:
>>
>>    int
>>    snprintf (char *buf, size_t size, char const *fmt, ...)
>>    {
>>       char *buf_limit = buf + size;
>>       ...
>>    }
>>
>> even though this would have undefined behavior if BUF points to a
>> character array smaller than SIZE.
> 
> Since it is part of the implementation it is irrelevant from the POV of
> the standard.  The implementation does not have to abide to the C
> standard, as long as it properly implements the interface constraints.
> 
> What matters is the wording of the standard.  The POSIX standard is more
> explicit here: "with the addition of the n argument which states the
> size of the buffer referred to by s."  Probably the C standard should be
> clarified.

+1, the C standard wording ought to mirror POSIX here.  FWIW, when built 
with fortification, this code will abort prematurely because it 
considers passed size being greater than the buffer size as being unsafe.

Thanks,
Sid

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-15  9:22   ` Andreas Schwab
  2023-03-15 15:54     ` Siddhesh Poyarekar
@ 2023-03-15 18:34     ` Michael Hudson-Doyle
  2023-03-19 14:45     ` manfred
  2 siblings, 0 replies; 32+ messages in thread
From: Michael Hudson-Doyle @ 2023-03-15 18:34 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Paul Eggert, Simon Chopin, libc-alpha

[-- Attachment #1: Type: text/plain, Size: 1210 bytes --]

On Wed, 15 Mar 2023 at 22:22, Andreas Schwab via Libc-alpha <
libc-alpha@sourceware.org> wrote:

> On Mär 14 2023, Paul Eggert wrote:
>
> > For example, it's valid for snprintf to be implemented this way:
> >
> >   int
> >   snprintf (char *buf, size_t size, char const *fmt, ...)
> >   {
> >      char *buf_limit = buf + size;
> >      ...
> >   }
> >
> > even though this would have undefined behavior if BUF points to a
> > character array smaller than SIZE.
>
> Since it is part of the implementation it is irrelevant from the POV of
> the standard.  The implementation does not have to abide to the C
> standard, as long as it properly implements the interface constraints.
>
> What matters is the wording of the standard.  The POSIX standard is more
> explicit here: "with the addition of the n argument which states the
> size of the buffer referred to by s."  Probably the C standard should be
> clarified.
>

Ah that's interesting that POSIX is clearer here, thanks for pointing that
out. I can feel more confident declaring the affected code broken now :-)

Is anyone here close enough to the C standards process to push getting this
clarified there?

Cheers,
mwh

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-15 12:39   ` Vincent Lefevre
@ 2023-03-16 10:29     ` Stephan Bergmann
  2023-03-18  2:07       ` Vincent Lefevre
  0 siblings, 1 reply; 32+ messages in thread
From: Stephan Bergmann @ 2023-03-16 10:29 UTC (permalink / raw)
  To: libc-alpha; +Cc: Vincent Lefevre, Paul Eggert, Simon Chopin

On 15/03/2023 13:39, Vincent Lefevre wrote:
> No, it is not obvious. If the C standard does not say that this is
> the size of the array, then it does not have to be the size of the
> array. The C standard just says:
> 
>    Otherwise, output characters beyond the n-1st are discarded rather
>    than being written to the array, and a null character is written at
>    the end of the characters actually written into the array.

But in 7.1.4 "Use of library functions" the standard also says

> If a function argument is described as being an array, the pointer passed to the function shall
> have a value such that all address computations and accesses to objects (that would be valid if
> the pointer did point to the first element of such an array) are valid.

which could be construed as meaning that the n-1st array element must 
always be accessible, even if a given invocation is known to always 
generate less then n output characters.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-16 10:29     ` Stephan Bergmann
@ 2023-03-18  2:07       ` Vincent Lefevre
  2023-03-18  2:30         ` Alejandro Colomar
  0 siblings, 1 reply; 32+ messages in thread
From: Vincent Lefevre @ 2023-03-18  2:07 UTC (permalink / raw)
  To: Stephan Bergmann; +Cc: libc-alpha, Paul Eggert, Simon Chopin

On 2023-03-16 11:29:31 +0100, Stephan Bergmann wrote:
> On 15/03/2023 13:39, Vincent Lefevre wrote:
> > No, it is not obvious. If the C standard does not say that this is
> > the size of the array, then it does not have to be the size of the
> > array. The C standard just says:
> > 
> >    Otherwise, output characters beyond the n-1st are discarded rather
> >    than being written to the array, and a null character is written at
> >    the end of the characters actually written into the array.
> 
> But in 7.1.4 "Use of library functions" the standard also says
> 
> > If a function argument is described as being an array, the pointer
> > passed to the function shall have a value such that all address
> > computations and accesses to objects (that would be valid if the
> > pointer did point to the first element of such an array) are
> > valid.
> 
> which could be construed as meaning that the n-1st array element must always
> be accessible, even if a given invocation is known to always generate less
> then n output characters.

But the standard does not say that n is the size of the array.
The size of the array could be the maximum of n and the size
corresponding to the untruncated output string.

Similarly, for strncpy, I would not see n as the size of the arrays,
i.e. it is not allowed for the implementation to read characters
past a null character (possibly unless this does not have unwanted
effects), even though such characters would be among the first n
characters.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-18  2:07       ` Vincent Lefevre
@ 2023-03-18  2:30         ` Alejandro Colomar
  2023-03-18 10:58           ` Vincent Lefevre
  0 siblings, 1 reply; 32+ messages in thread
From: Alejandro Colomar @ 2023-03-18  2:30 UTC (permalink / raw)
  To: Vincent Lefevre, libc-alpha
  Cc: Stephan Bergmann, Paul Eggert, Simon Chopin, Andreas Schwab


[-- Attachment #1.1: Type: text/plain, Size: 2608 bytes --]

Hello Vincent,

On 3/18/23 03:07, Vincent Lefevre wrote:
> On 2023-03-16 11:29:31 +0100, Stephan Bergmann wrote:
>> On 15/03/2023 13:39, Vincent Lefevre wrote:
>>> No, it is not obvious. If the C standard does not say that this is
>>> the size of the array, then it does not have to be the size of the
>>> array. The C standard just says:
>>>
>>>    Otherwise, output characters beyond the n-1st are discarded rather
>>>    than being written to the array, and a null character is written at
>>>    the end of the characters actually written into the array.
>>
>> But in 7.1.4 "Use of library functions" the standard also says
>>
>>> If a function argument is described as being an array, the pointer
>>> passed to the function shall have a value such that all address
>>> computations and accesses to objects (that would be valid if the
>>> pointer did point to the first element of such an array) are
>>> valid.
>>
>> which could be construed as meaning that the n-1st array element must always
>> be accessible, even if a given invocation is known to always generate less
>> then n output characters.
> 
> But the standard does not say that n is the size of the array.
> The size of the array could be the maximum of n and the size
> corresponding to the untruncated output string.

I guess you mean the minimum?  If it were the maximum, then it would
never truncate.

[assuming you meant minimum]:

As Andreas mentioned, that's valid for ISO C, but POSIX is more
restrictive.  Here's a quote from fprintf(3posix):

       The snprintf() function shall be equivalent to sprintf(),  with
       the  addition  of  the  n argument which states the size of the
       buffer referred to by s.

It clearly specifies that 'n' is the size of the buffer, so
implementations are free to assume that `s+n` is a valid pointer.

> 
> Similarly, for strncpy, I would not see n as the size of the arrays,
> i.e. it is not allowed for the implementation to read characters
> past a null character (possibly unless this does not have unwanted
> effects), even though such characters would be among the first n
> characters.

The size argument to strncpy(3) is the size of the destination buffer,
not the size of the input buffer.  The input buffer must be either
a string, or a character sequence at least as large as the destination
buffer.  Thus, in strncpy(3), reads are limited by
`strnlen(src, size)`, but writes are limited by `size`.

Cheers,
Alex

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-18  2:30         ` Alejandro Colomar
@ 2023-03-18 10:58           ` Vincent Lefevre
  2023-03-18 15:01             ` Andreas Schwab
  0 siblings, 1 reply; 32+ messages in thread
From: Vincent Lefevre @ 2023-03-18 10:58 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: libc-alpha, Stephan Bergmann, Paul Eggert, Simon Chopin, Andreas Schwab

On 2023-03-18 03:30:27 +0100, Alejandro Colomar wrote:
> Hello Vincent,
> 
> On 3/18/23 03:07, Vincent Lefevre wrote:
> > On 2023-03-16 11:29:31 +0100, Stephan Bergmann wrote:
> >> On 15/03/2023 13:39, Vincent Lefevre wrote:
> >>> No, it is not obvious. If the C standard does not say that this is
> >>> the size of the array, then it does not have to be the size of the
> >>> array. The C standard just says:
> >>>
> >>>    Otherwise, output characters beyond the n-1st are discarded rather
> >>>    than being written to the array, and a null character is written at
> >>>    the end of the characters actually written into the array.
> >>
> >> But in 7.1.4 "Use of library functions" the standard also says
> >>
> >>> If a function argument is described as being an array, the pointer
> >>> passed to the function shall have a value such that all address
> >>> computations and accesses to objects (that would be valid if the
> >>> pointer did point to the first element of such an array) are
> >>> valid.
> >>
> >> which could be construed as meaning that the n-1st array element must always
> >> be accessible, even if a given invocation is known to always generate less
> >> then n output characters.
> > 
> > But the standard does not say that n is the size of the array.
> > The size of the array could be the maximum of n and the size
> > corresponding to the untruncated output string.
> 
> I guess you mean the minimum?  If it were the maximum, then it would
> never truncate.

Yes, obviously, the minimum.

> [assuming you meant minimum]:
> 
> As Andreas mentioned, that's valid for ISO C, but POSIX is more
> restrictive.  Here's a quote from fprintf(3posix):
> 
>        The snprintf() function shall be equivalent to sprintf(),  with
>        the  addition  of  the  n argument which states the size of the
>        buffer referred to by s.
> 
> It clearly specifies that 'n' is the size of the buffer, so
> implementations are free to assume that `s+n` is a valid pointer.

However, I'm wondering whether such a change is intentional. BTW,
this description is even wrong: this is certainly not equivalent!
If the untruncated output is larger than n, then the call is UB
with sprintf(), while the output is truncated with snprintf(). What
really occurs is described later, but still, this first sentence is
incorrect. This could just mean that they tried to oversimplify when
writing this text, without thinking of the details.

> > Similarly, for strncpy, I would not see n as the size of the arrays,
> > i.e. it is not allowed for the implementation to read characters
> > past a null character (possibly unless this does not have unwanted
> > effects), even though such characters would be among the first n
> > characters.
> 
> The size argument to strncpy(3) is the size of the destination buffer,
> not the size of the input buffer.  The input buffer must be either
> a string, or a character sequence at least as large as the destination
> buffer.  Thus, in strncpy(3), reads are limited by
> `strnlen(src, size)`, but writes are limited by `size`.

I agree, but I was answering to Stephan Bergmann, who thought that
a size_t argument could be seen as the size of the array arguments.
IMHO, the (minimum) sizes of the arrays are just implied from the
described behavior of the function.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-18 10:58           ` Vincent Lefevre
@ 2023-03-18 15:01             ` Andreas Schwab
  2023-03-19 22:48               ` Vincent Lefevre
  0 siblings, 1 reply; 32+ messages in thread
From: Andreas Schwab @ 2023-03-18 15:01 UTC (permalink / raw)
  To: Vincent Lefevre
  Cc: Alejandro Colomar, libc-alpha, Stephan Bergmann, Paul Eggert,
	Simon Chopin

On Mär 18 2023, Vincent Lefevre wrote:

> However, I'm wondering whether such a change is intentional. BTW,
> this description is even wrong: this is certainly not equivalent!
> If the untruncated output is larger than n, then the call is UB
> with sprintf(), while the output is truncated with snprintf().

Which makes them equivalent in all situations where sprintf is defined,
so there is no discrepancy here.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-15  9:22   ` Andreas Schwab
  2023-03-15 15:54     ` Siddhesh Poyarekar
  2023-03-15 18:34     ` Michael Hudson-Doyle
@ 2023-03-19 14:45     ` manfred
  2023-03-19 23:07       ` Vincent Lefevre
  2023-03-20 15:09       ` Vincent Lefevre
  2 siblings, 2 replies; 32+ messages in thread
From: manfred @ 2023-03-19 14:45 UTC (permalink / raw)
  To: libc-alpha

After reading it a few times, I believe that the meaning of the wording 
of the ISO C standard is that 'n' is an upper limit to the number of 
characters written to s, not necessarily the size of the array.

Also, the standard says that
   "The snprintf function is equivalent to fprintf, except that the
    output is written into an array (specified by argument s) rather than
    to a stream"

Which implies that no access to the output array is performed after 
termination of encoding, regardless of the actual value of n.
To me, it looks like in this case, i.e. when the encoded result is 
shorter than the array size and n, the distinction between n being the 
actual array size, or an upper limit to the length of the output string, 
is only relevant to the implementation:
The question becomes: is the implementation allowed to access elements 
past the encoded result? Is it allowed to evaluate s+n?

Out of curiosity, I compared the wording with the specifications in 
Annex K, which is explicitly aimed at checking array boundaries: as far 
as I can tell snprintf_s refers to snprintf, and does not add to this 
scenario. sprintf_s says about n:

   "n shall neither equal zero nor be greater than RSIZE_MAX. The number
    of characters (including the trailing null) required for the result
    to be written to the array pointed to by s shall not be greater than
    n."

Again, this describes an upper limit to the length of the result, not 
the array size, but next in the description:

  "4 The sprintf_s function is equivalent to the sprintf function except
     for the parameter n and the explicit runtime-constraints listed
     above.

   5 The sprintf_s function, unlike snprintf_s, treats a result too big
     for the array pointed to by s as a runtime-constraint violation."

In this last sentence, the expression "a result too big for the array" 
hints at the array size (otherwise it should say "a result longer than 
n"), but propagating this to the rest of the text requires some 
imagination about the intention of the authors.

All of that said, back to the OP case I would not pass INT_MAX to 
snprintf. If I have a situation wherein I know that the buffer is large 
enough, but I don't know its exact size, I'd use sprintf and be done 
with it. (I'm sure that the actual code is more elaborate than this, but 
still)

My 2c


On 3/15/2023 5:22 AM, Andreas Schwab via Libc-alpha wrote:
> On Mär 14 2023, Paul Eggert wrote:
> 
>> For example, it's valid for snprintf to be implemented this way:
>>
>>    int
>>    snprintf (char *buf, size_t size, char const *fmt, ...)
>>    {
>>       char *buf_limit = buf + size;
>>       ...
>>    }
>>
>> even though this would have undefined behavior if BUF points to a
>> character array smaller than SIZE.
> 
> Since it is part of the implementation it is irrelevant from the POV of
> the standard.  The implementation does not have to abide to the C
> standard, as long as it properly implements the interface constraints.
> 
> What matters is the wording of the standard.  The POSIX standard is more
> explicit here: "with the addition of the n argument which states the
> size of the buffer referred to by s."  Probably the C standard should be
> clarified.
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-18 15:01             ` Andreas Schwab
@ 2023-03-19 22:48               ` Vincent Lefevre
  2023-03-19 23:24                 ` Andreas Schwab
  0 siblings, 1 reply; 32+ messages in thread
From: Vincent Lefevre @ 2023-03-19 22:48 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Alejandro Colomar, libc-alpha, Stephan Bergmann, Paul Eggert,
	Simon Chopin

On 2023-03-18 16:01:07 +0100, Andreas Schwab wrote:
> On Mär 18 2023, Vincent Lefevre wrote:
> 
> > However, I'm wondering whether such a change is intentional. BTW,
> > this description is even wrong: this is certainly not equivalent!
> > If the untruncated output is larger than n, then the call is UB
> > with sprintf(), while the output is truncated with snprintf().
> 
> Which makes them equivalent in all situations where sprintf is defined,
> so there is no discrepancy here.

The conditions under which the function is defined or not are part of
the equivalence. For instance, I would not say that memcpy and memmove
are equivalent, even though they are equivalent when memcpy is defined.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-19 14:45     ` manfred
@ 2023-03-19 23:07       ` Vincent Lefevre
  2023-03-20 12:05         ` Siddhesh Poyarekar
  2023-03-20 15:09       ` Vincent Lefevre
  1 sibling, 1 reply; 32+ messages in thread
From: Vincent Lefevre @ 2023-03-19 23:07 UTC (permalink / raw)
  To: libc-alpha

On 2023-03-19 10:45:59 -0400, manfred via Libc-alpha wrote:
> All of that said, back to the OP case I would not pass INT_MAX to snprintf.
> If I have a situation wherein I know that the buffer is large enough, but I
> don't know its exact size, I'd use sprintf and be done with it. (I'm sure
> that the actual code is more elaborate than this, but still)

In simple code, probably. But in actual code, it may be more natural
to use snprintf. Something like that:

  snprintf(buf, checked ? SIZE_MAX : n, "%s", s);

The function may not know the buffer size if `checked` is true,
so that it uses a known bound. Thanks to common code factorized,
this is more readable than

  if (checked)
    sprintf (buf, "%s", s);
  else
    snprintf(buf, n, "%s", s);

in particular in the cases where the format string is complex.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-19 22:48               ` Vincent Lefevre
@ 2023-03-19 23:24                 ` Andreas Schwab
  2023-03-20  4:10                   ` Vincent Lefevre
  0 siblings, 1 reply; 32+ messages in thread
From: Andreas Schwab @ 2023-03-19 23:24 UTC (permalink / raw)
  To: Vincent Lefevre
  Cc: Alejandro Colomar, libc-alpha, Stephan Bergmann, Paul Eggert,
	Simon Chopin

On Mär 19 2023, Vincent Lefevre wrote:

> On 2023-03-18 16:01:07 +0100, Andreas Schwab wrote:
>> On Mär 18 2023, Vincent Lefevre wrote:
>> 
>> > However, I'm wondering whether such a change is intentional. BTW,
>> > this description is even wrong: this is certainly not equivalent!
>> > If the untruncated output is larger than n, then the call is UB
>> > with sprintf(), while the output is truncated with snprintf().
>> 
>> Which makes them equivalent in all situations where sprintf is defined,
>> so there is no discrepancy here.
>
> The conditions under which the function is defined or not are part of
> the equivalence.

No, it isn't.  Equivalent is not the same as identical.

> For instance, I would not say that memcpy and memmove are equivalent,
> even though they are equivalent when memcpy is defined.

But they are.  You can use either memcpy or memove in all situations
where memcpy is defined.  Outside of the domain of memcpy there is no
relation at all.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-19 23:24                 ` Andreas Schwab
@ 2023-03-20  4:10                   ` Vincent Lefevre
  2023-03-20  9:19                     ` Andreas Schwab
  0 siblings, 1 reply; 32+ messages in thread
From: Vincent Lefevre @ 2023-03-20  4:10 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Alejandro Colomar, libc-alpha, Stephan Bergmann, Paul Eggert,
	Simon Chopin

On 2023-03-20 00:24:51 +0100, Andreas Schwab wrote:
> On Mär 19 2023, Vincent Lefevre wrote:
> 
> > On 2023-03-18 16:01:07 +0100, Andreas Schwab wrote:
> >> On Mär 18 2023, Vincent Lefevre wrote:
> >> 
> >> > However, I'm wondering whether such a change is intentional. BTW,
> >> > this description is even wrong: this is certainly not equivalent!
> >> > If the untruncated output is larger than n, then the call is UB
> >> > with sprintf(), while the output is truncated with snprintf().
> >> 
> >> Which makes them equivalent in all situations where sprintf is defined,
> >> so there is no discrepancy here.
> >
> > The conditions under which the function is defined or not are part of
> > the equivalence.
> 
> No, it isn't.  Equivalent is not the same as identical.
> 
> > For instance, I would not say that memcpy and memmove are equivalent,
> > even though they are equivalent when memcpy is defined.
> 
> But they are.  You can use either memcpy or memove in all situations
> where memcpy is defined.  Outside of the domain of memcpy there is no
> relation at all.

POSIX specifies _exit() by:

  The _Exit() and _exit() functions shall be functionally equivalent.

So, you mean that _exit() may have undefined behavior while _Exit()
has a well-defined behavior? (as this would match your definition
of equivalence.)

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-20  4:10                   ` Vincent Lefevre
@ 2023-03-20  9:19                     ` Andreas Schwab
  2023-03-20 10:42                       ` Vincent Lefevre
  0 siblings, 1 reply; 32+ messages in thread
From: Andreas Schwab @ 2023-03-20  9:19 UTC (permalink / raw)
  To: Vincent Lefevre
  Cc: Alejandro Colomar, libc-alpha, Stephan Bergmann, Paul Eggert,
	Simon Chopin

On Mär 20 2023, Vincent Lefevre wrote:

> So, you mean that _exit() may have undefined behavior while _Exit()
> has a well-defined behavior? (as this would match your definition
> of equivalence.)

That's a logical fallacy.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-20  9:19                     ` Andreas Schwab
@ 2023-03-20 10:42                       ` Vincent Lefevre
  2023-03-20 10:44                         ` Andreas Schwab
  0 siblings, 1 reply; 32+ messages in thread
From: Vincent Lefevre @ 2023-03-20 10:42 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Alejandro Colomar, libc-alpha, Stephan Bergmann, Paul Eggert,
	Simon Chopin

On 2023-03-20 10:19:13 +0100, Andreas Schwab wrote:
> On Mär 20 2023, Vincent Lefevre wrote:
> 
> > So, you mean that _exit() may have undefined behavior while _Exit()
> > has a well-defined behavior? (as this would match your definition
> > of equivalence.)
> 
> That's a logical fallacy.

Not more than saying that memcpy and memmove are equivalent.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-20 10:42                       ` Vincent Lefevre
@ 2023-03-20 10:44                         ` Andreas Schwab
  0 siblings, 0 replies; 32+ messages in thread
From: Andreas Schwab @ 2023-03-20 10:44 UTC (permalink / raw)
  To: Vincent Lefevre
  Cc: Alejandro Colomar, libc-alpha, Stephan Bergmann, Paul Eggert,
	Simon Chopin

On Mär 20 2023, Vincent Lefevre wrote:

> On 2023-03-20 10:19:13 +0100, Andreas Schwab wrote:
>> On Mär 20 2023, Vincent Lefevre wrote:
>> 
>> > So, you mean that _exit() may have undefined behavior while _Exit()
>> > has a well-defined behavior? (as this would match your definition
>> > of equivalence.)
>> 
>> That's a logical fallacy.
>
> Not more than saying that memcpy and memmove are equivalent.

No.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-19 23:07       ` Vincent Lefevre
@ 2023-03-20 12:05         ` Siddhesh Poyarekar
  2023-03-20 12:17           ` Alejandro Colomar
  2023-03-20 13:50           ` Vincent Lefevre
  0 siblings, 2 replies; 32+ messages in thread
From: Siddhesh Poyarekar @ 2023-03-20 12:05 UTC (permalink / raw)
  To: Vincent Lefevre, libc-alpha

On 2023-03-19 19:07, Vincent Lefevre wrote:
> On 2023-03-19 10:45:59 -0400, manfred via Libc-alpha wrote:
>> All of that said, back to the OP case I would not pass INT_MAX to snprintf.
>> If I have a situation wherein I know that the buffer is large enough, but I
>> don't know its exact size, I'd use sprintf and be done with it. (I'm sure
>> that the actual code is more elaborate than this, but still)
> 
> In simple code, probably. But in actual code, it may be more natural
> to use snprintf. Something like that:
> 
>    snprintf(buf, checked ? SIZE_MAX : n, "%s", s);
> 
> The function may not know the buffer size if `checked` is true,
> so that it uses a known bound. Thanks to common code factorized,
> this is more readable than
> 
>    if (checked)
>      sprintf (buf, "%s", s);
>    else
>      snprintf(buf, n, "%s", s);
> 
> in particular in the cases where the format string is complex.

If your application requires such patterns then it really needs an 
additional layer of abstraction or maybe a rethink on the pattern 
itself.  This is not something the C runtime should try to solve.

I think on the glibc front it makes sense from a security perspective to 
interpret this through POSIX than the C standard.  Even if the C 
standard is clarified to be contrary to POSIX and explicitly state that 
n is not the size of the buffer (which would be a terrible mistake IMO), 
I'd lean towards violating the C standard and conforming to POSIX instead.

Sid

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-20 12:05         ` Siddhesh Poyarekar
@ 2023-03-20 12:17           ` Alejandro Colomar
  2023-03-20 12:29             ` Siddhesh Poyarekar
  2023-03-20 13:36             ` Vincent Lefevre
  2023-03-20 13:50           ` Vincent Lefevre
  1 sibling, 2 replies; 32+ messages in thread
From: Alejandro Colomar @ 2023-03-20 12:17 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Vincent Lefevre, libc-alpha


[-- Attachment #1.1: Type: text/plain, Size: 2090 bytes --]

Hi Vincent, Siddhesh,

On 3/20/23 13:05, Siddhesh Poyarekar wrote:
> On 2023-03-19 19:07, Vincent Lefevre wrote:
>> On 2023-03-19 10:45:59 -0400, manfred via Libc-alpha wrote:
>>> All of that said, back to the OP case I would not pass INT_MAX to snprintf.
>>> If I have a situation wherein I know that the buffer is large enough, but I
>>> don't know its exact size, I'd use sprintf and be done with it. (I'm sure
>>> that the actual code is more elaborate than this, but still)
>>
>> In simple code, probably. But in actual code, it may be more natural
>> to use snprintf. Something like that:
>>
>>    snprintf(buf, checked ? SIZE_MAX : n, "%s", s);
>>
>> The function may not know the buffer size if `checked` is true,
>> so that it uses a known bound. Thanks to common code factorized,
>> this is more readable than
>>
>>    if (checked)
>>      sprintf (buf, "%s", s);
>>    else
>>      snprintf(buf, n, "%s", s);
>>
>> in particular in the cases where the format string is complex.

That pattern looks like _FORTIFY_SOURCE, doesn't it?  If so, the correct
action would be to call sprintf(3) and rely on the compiler to do the
checks.

snprintf(3) should be called when you can't guarantee at coding time
if the array is possibly overrun.  If you can guarantee that, then call
sprintf(3), and the compiler will confirm.

Cheers,
Alex

> 
> If your application requires such patterns then it really needs an 
> additional layer of abstraction or maybe a rethink on the pattern 
> itself.  This is not something the C runtime should try to solve.
> 
> I think on the glibc front it makes sense from a security perspective to 
> interpret this through POSIX than the C standard.  Even if the C 
> standard is clarified to be contrary to POSIX and explicitly state that 
> n is not the size of the buffer (which would be a terrible mistake IMO), 
> I'd lean towards violating the C standard and conforming to POSIX instead.
> 
> Sid

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-20 12:17           ` Alejandro Colomar
@ 2023-03-20 12:29             ` Siddhesh Poyarekar
  2023-03-20 13:36             ` Vincent Lefevre
  1 sibling, 0 replies; 32+ messages in thread
From: Siddhesh Poyarekar @ 2023-03-20 12:29 UTC (permalink / raw)
  To: Alejandro Colomar, Vincent Lefevre, libc-alpha

On 2023-03-20 08:17, Alejandro Colomar wrote:
>>>     if (checked)
>>>       sprintf (buf, "%s", s);
>>>     else
>>>       snprintf(buf, n, "%s", s);
>>>
>>> in particular in the cases where the format string is complex.
> 
> That pattern looks like _FORTIFY_SOURCE, doesn't it?  If so, the correct
> action would be to call sprintf(3) and rely on the compiler to do the
> checks.

Yes the pattern is similar to __sprintf_chk, except that if the result 
string is larger, it will result in program abort while in the above 
pattern the string will get truncated.

Sid

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-20 12:17           ` Alejandro Colomar
  2023-03-20 12:29             ` Siddhesh Poyarekar
@ 2023-03-20 13:36             ` Vincent Lefevre
  1 sibling, 0 replies; 32+ messages in thread
From: Vincent Lefevre @ 2023-03-20 13:36 UTC (permalink / raw)
  To: libc-alpha

On 2023-03-20 13:17:11 +0100, Alejandro Colomar wrote:
> On 3/20/23 13:05, Siddhesh Poyarekar wrote:
> > On 2023-03-19 19:07, Vincent Lefevre wrote:
> >> The function may not know the buffer size if `checked` is true,
> >> so that it uses a known bound. Thanks to common code factorized,
> >> this is more readable than
> >>
> >>    if (checked)
> >>      sprintf (buf, "%s", s);
> >>    else
> >>      snprintf(buf, n, "%s", s);
> >>
> >> in particular in the cases where the format string is complex.
> 
> That pattern looks like _FORTIFY_SOURCE, doesn't it?  If so, the correct
> action would be to call sprintf(3) and rely on the compiler to do the
> checks.

Not necessarily. Here, this is the programmer who would do the check,
though he may add code (e.g. assertions) on the caller side to give
enough information to the compiler so that a smart compiler could
prove that the call is valid. But this may need an analysis across
different compilation units. If the compiler cannot check, the
validity relies on the proof of the code; but not that in any case,
the code needs to be proved: for instance, snprintf() may yield
data loss and potential security issues if string truncation is
not expected (or if the string is truncation too early).

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-20 12:05         ` Siddhesh Poyarekar
  2023-03-20 12:17           ` Alejandro Colomar
@ 2023-03-20 13:50           ` Vincent Lefevre
  2023-03-20 16:56             ` Siddhesh Poyarekar
  1 sibling, 1 reply; 32+ messages in thread
From: Vincent Lefevre @ 2023-03-20 13:50 UTC (permalink / raw)
  To: libc-alpha

On 2023-03-20 08:05:32 -0400, Siddhesh Poyarekar wrote:
> I think on the glibc front it makes sense from a security
> perspective to interpret this through POSIX than the C standard.
> Even if the C standard is clarified to be contrary to POSIX and
> explicitly state that n is not the size of the buffer (which would
> be a terrible mistake IMO), I'd lean towards violating the C
> standard and conforming to POSIX instead.

I disagree about the POSIX behavior (assuming it is intentional).
With it, if the compiler detects that n is larger than the actual
buffer size, then due to undefined behavior, the compiler could
assume that this is dead code and introduce erratic behavior in
code written with the C standard in mind (or when it was introduced
in BSD).

Note that the printf(3) man page from FreeBSD 1.0 (1991), just says:

  Snprintf() and vsnprintf() will write at most size-1 of the
  characters printed into the output string (the size'th character
  then gets the terminating `\0'); if the return value is greater
  than or equal to the size argument, the string was too short and
  some of the printed characters were discarded.

Reference:
https://man.freebsd.org/cgi/man.cgi?query=snprintf&apropos=0&sektion=0&manpath=FreeBSD+1.0-RELEASE&arch=default&format=html

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-19 14:45     ` manfred
  2023-03-19 23:07       ` Vincent Lefevre
@ 2023-03-20 15:09       ` Vincent Lefevre
  2023-03-20 16:15         ` Alejandro Colomar
  1 sibling, 1 reply; 32+ messages in thread
From: Vincent Lefevre @ 2023-03-20 15:09 UTC (permalink / raw)
  To: libc-alpha

On 2023-03-19 10:45:59 -0400, manfred via Libc-alpha wrote:
> All of that said, back to the OP case I would not pass INT_MAX to snprintf.
> If I have a situation wherein I know that the buffer is large enough, but I
> don't know its exact size, I'd use sprintf and be done with it. (I'm sure
> that the actual code is more elaborate than this, but still)

Here's another example where the support of snprintf with a large n
argument (larger than the buffer size) may be used:

In GNU MPFR, for our function mpfr_snprintf, we have assumed a
behavior analogue to the ISO C behavior. In particular, we use
that in our tests in order to check that large values of n are
correctly handled. This allowed us to trigger/detect a bug in
the choice of the integer types in our implementation:

The test:

https://gitlab.inria.fr/mpfr/mpfr/-/commit/67a75bfe41d3a7f95367ee9e62bd7dfc73e5b395

The bug fix (replacing an int by a size_t in a variable declaration):

https://gitlab.inria.fr/mpfr/mpfr/-/commit/6b8cf3e2bdc285027627281cac230ed932c1b73f

Note that the MPFR code currently does not use snprintf or any
similar function, so that there should be no issues with this
test if snprintf does not support n > buffer size, it but may
use such a function in the future. (Indeed, MPFR currently uses
gmp_vasprintf and gmp_asprintf, thus generating a full output
before truncating it as needed, so that this is potentially
very inefficient and could unnecessarily exhaust the memory
and/or crash.)

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-20 15:09       ` Vincent Lefevre
@ 2023-03-20 16:15         ` Alejandro Colomar
  2023-03-20 16:33           ` Vincent Lefevre
  2023-03-20 17:00           ` Vincent Lefevre
  0 siblings, 2 replies; 32+ messages in thread
From: Alejandro Colomar @ 2023-03-20 16:15 UTC (permalink / raw)
  To: Vincent Lefevre, libc-alpha
  Cc: Stephan Bergmann, Andreas Schwab, Paul Eggert, Simon Chopin


[-- Attachment #1.1: Type: text/plain, Size: 6801 bytes --]

Hi Vincent,

On 3/20/23 14:36, Vincent Lefevre wrote:
> On 2023-03-20 13:17:11 +0100, Alejandro Colomar wrote:
>> On 3/20/23 13:05, Siddhesh Poyarekar wrote:
>>> On 2023-03-19 19:07, Vincent Lefevre wrote:
>>>> The function may not know the buffer size if `checked` is true,
>>>> so that it uses a known bound. Thanks to common code factorized,
>>>> this is more readable than
>>>>
>>>>    if (checked)
>>>>      sprintf (buf, "%s", s);
>>>>    else
>>>>      snprintf(buf, n, "%s", s);
>>>>
>>>> in particular in the cases where the format string is complex.
>>
>> That pattern looks like _FORTIFY_SOURCE, doesn't it?  If so, the correct
>> action would be to call sprintf(3) and rely on the compiler to do the
>> checks.
> 
> Not necessarily. Here, this is the programmer who would do the check,
> though he may add code (e.g. assertions) on the caller side to give
> enough information to the compiler so that a smart compiler could
> prove that the call is valid. But this may need an analysis across
> different compilation units. If the compiler cannot check, the
> validity relies on the proof of the code; but not that in any case,
> the code needs to be proved: for instance, snprintf() may yield
> data loss and potential security issues if string truncation is
> not expected (or if the string is truncation too early).
> 

$ cat str.c 
#include <stdio.h>

void f(char *dst, char *src)
{
	sprintf(dst, "%s plus some extra stuff", src);
}

$ cat main.c 
void f(char *dst, char *src);

int main(void)
{
	char *str = "some long string";
	char s[20];

	f(s, str);
}

$ cc -Wall -Wextra *.c -flto -O3 -fanalyzer -D_FORTIFY_SOURCE=1
$ ./a.out 
*** buffer overflow detected ***: terminated
Aborted
$ cc -Wall -Wextra *.c -flto -O3 -fanalyzer -D_FORTIFY_SOURCE=3
alx@asus5775:~/tmp/fort$ ./a.out 
*** buffer overflow detected ***: terminated
Aborted


Is this what you're looking for?  I agree that it would be nicer
if the analyzer could catch this at build time, but it seems it's
not yet so powerful.


On 3/20/23 14:50, Vincent Lefevre wrote:
> On 2023-03-20 08:05:32 -0400, Siddhesh Poyarekar wrote:
>> I think on the glibc front it makes sense from a security
>> perspective to interpret this through POSIX than the C standard.
>> Even if the C standard is clarified to be contrary to POSIX and
>> explicitly state that n is not the size of the buffer (which would
>> be a terrible mistake IMO), I'd lean towards violating the C
>> standard and conforming to POSIX instead.
> I disagree about the POSIX behavior (assuming it is intentional).
> With it, if the compiler detects that n is larger than the actual
> buffer size, then due to undefined behavior, the compiler could
> assume that this is dead code and introduce erratic behavior in
> code written with the C standard in mind (or when it was introduced
> in BSD).
> 
> Note that the printf(3) man page from FreeBSD 1.0 (1991), just says:
> 
>   Snprintf() and vsnprintf() will write at most size-1 of the
>   characters printed into the output string (the size'th character
>   then gets the terminating `\0'); if the return value is greater
>   than or equal to the size argument, the string was too short and
>   some of the printed characters were discarded.
The Linux man-pages show the following SYNOPSIS, which shows the
meaning of 'size' as being an array size:


SYNOPSIS
       #include <stdio.h>

       int printf(const char *restrict format, ...);
       int fprintf(FILE *restrict stream,
                   const char *restrict format, ...);
       int dprintf(int fd,
                   const char *restrict format, ...);
       int sprintf(char *restrict str,
                   const char *restrict format, ...);
       int snprintf(char str[restrict .size], size_t size,
                   const char *restrict format, ...);

       int vprintf(const char *restrict format, va_list ap);
       int vfprintf(FILE *restrict stream,
                   const char *restrict format, va_list ap);
       int vdprintf(int fd,
                   const char *restrict format, va_list ap);
       int vsprintf(char *restrict str,
                   const char *restrict format, va_list ap);
       int vsnprintf(char str[restrict .size], size_t size,
                   const char *restrict format, va_list ap);


I'll need to update the text, though, to be more explicit in
that it's an array size, and not just a limit for the copying.


> 
> Reference:
> https://man.freebsd.org/cgi/man.cgi?query=snprintf&apropos=0&sektion=0&manpath=FreeBSD+1.0-RELEASE&arch=default&format=html
> 
> -- Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/> 100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/> Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


On 3/20/23 16:09, Vincent Lefevre wrote:
> On 2023-03-19 10:45:59 -0400, manfred via Libc-alpha wrote:
>> All of that said, back to the OP case I would not pass INT_MAX to snprintf.
>> If I have a situation wherein I know that the buffer is large enough, but I
>> don't know its exact size, I'd use sprintf and be done with it. (I'm sure
>> that the actual code is more elaborate than this, but still)
> 
> Here's another example where the support of snprintf with a large n
> argument (larger than the buffer size) may be used:
> 
> In GNU MPFR, for our function mpfr_snprintf, we have assumed a
> behavior analogue to the ISO C behavior. In particular, we use
> that in our tests in order to check that large values of n are
> correctly handled. This allowed us to trigger/detect a bug in
> the choice of the integer types in our implementation:
> 
> The test:
> 
> https://gitlab.inria.fr/mpfr/mpfr/-/commit/67a75bfe41d3a7f95367ee9e62bd7dfc73e5b395
> 
> The bug fix (replacing an int by a size_t in a variable declaration):
> 
> https://gitlab.inria.fr/mpfr/mpfr/-/commit/6b8cf3e2bdc285027627281cac230ed932c1b73f

I don't understand how snprintf(3) helped catch the bug.  Could
you show some small reproducer?  Would _FORTIFY_SOURCE not do a
similar thing?

Cheers,

Alex

> 
> Note that the MPFR code currently does not use snprintf or any
> similar function, so that there should be no issues with this
> test if snprintf does not support n > buffer size, it but may
> use such a function in the future. (Indeed, MPFR currently uses
> gmp_vasprintf and gmp_asprintf, thus generating a full output
> before truncating it as needed, so that this is potentially
> very inefficient and could unnecessarily exhaust the memory
> and/or crash.)
> 

-- 
<http://www.alejandro-colomar.es/>
GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-20 16:15         ` Alejandro Colomar
@ 2023-03-20 16:33           ` Vincent Lefevre
  2023-03-20 17:00           ` Vincent Lefevre
  1 sibling, 0 replies; 32+ messages in thread
From: Vincent Lefevre @ 2023-03-20 16:33 UTC (permalink / raw)
  To: libc-alpha

On 2023-03-20 17:15:49 +0100, Alejandro Colomar wrote:
> $ cat str.c 
> #include <stdio.h>
> 
> void f(char *dst, char *src)
> {
> 	sprintf(dst, "%s plus some extra stuff", src);
> }
> 
> $ cat main.c 
> void f(char *dst, char *src);
> 
> int main(void)
> {
> 	char *str = "some long string";
> 	char s[20];
> 
> 	f(s, str);
> }
> 
> $ cc -Wall -Wextra *.c -flto -O3 -fanalyzer -D_FORTIFY_SOURCE=1
> $ ./a.out 
> *** buffer overflow detected ***: terminated
> Aborted
> $ cc -Wall -Wextra *.c -flto -O3 -fanalyzer -D_FORTIFY_SOURCE=3
> alx@asus5775:~/tmp/fort$ ./a.out 
> *** buffer overflow detected ***: terminated
> Aborted
> 
> 
> Is this what you're looking for?  I agree that it would be nicer
> if the analyzer could catch this at build time, but it seems it's
> not yet so powerful.

I meant at build time.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-20 13:50           ` Vincent Lefevre
@ 2023-03-20 16:56             ` Siddhesh Poyarekar
  2023-03-20 17:36               ` Vincent Lefevre
  0 siblings, 1 reply; 32+ messages in thread
From: Siddhesh Poyarekar @ 2023-03-20 16:56 UTC (permalink / raw)
  To: Vincent Lefevre, libc-alpha

On 2023-03-20 09:50, Vincent Lefevre wrote:
> On 2023-03-20 08:05:32 -0400, Siddhesh Poyarekar wrote:
>> I think on the glibc front it makes sense from a security
>> perspective to interpret this through POSIX than the C standard.
>> Even if the C standard is clarified to be contrary to POSIX and
>> explicitly state that n is not the size of the buffer (which would
>> be a terrible mistake IMO), I'd lean towards violating the C
>> standard and conforming to POSIX instead.
> 
> I disagree about the POSIX behavior (assuming it is intentional).

Why do you think it may be unintentional?  The POSIX wording seems 
pretty deliberate and clear to me.  The C standard wording on the other 
hand leaves things to the imagination, which is why we're having this 
discussion.

> With it, if the compiler detects that n is larger than the actual
> buffer size, then due to undefined behavior, the compiler could
> assume that this is dead code and introduce erratic behavior in
> code written with the C standard in mind (or when it was introduced
> in BSD).

In fact, with _FORTIFY_SOURCE, if the runtime detects that n is larger 
than the actual buffer size, the code will abort, see pr28989[1].  But 
that's a runtime feature, nothing to do with gcc.

gcc at the moment doesn't have any such check AFAICT but if it does, I 
reckon that's a discussion to be had on the gcc mailing list.

Sid

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=28989

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-20 16:15         ` Alejandro Colomar
  2023-03-20 16:33           ` Vincent Lefevre
@ 2023-03-20 17:00           ` Vincent Lefevre
  2023-03-20 17:31             ` Siddhesh Poyarekar
  1 sibling, 1 reply; 32+ messages in thread
From: Vincent Lefevre @ 2023-03-20 17:00 UTC (permalink / raw)
  To: libc-alpha

On 2023-03-20 17:15:49 +0100, Alejandro Colomar wrote:
> On 3/20/23 16:09, Vincent Lefevre wrote:
> > Here's another example where the support of snprintf with a large n
> > argument (larger than the buffer size) may be used:
> > 
> > In GNU MPFR, for our function mpfr_snprintf, we have assumed a
> > behavior analogue to the ISO C behavior. In particular, we use
> > that in our tests in order to check that large values of n are
> > correctly handled. This allowed us to trigger/detect a bug in
> > the choice of the integer types in our implementation:
> > 
> > The test:
> > 
> > https://gitlab.inria.fr/mpfr/mpfr/-/commit/67a75bfe41d3a7f95367ee9e62bd7dfc73e5b395
> > 
> > The bug fix (replacing an int by a size_t in a variable declaration):
> > 
> > https://gitlab.inria.fr/mpfr/mpfr/-/commit/6b8cf3e2bdc285027627281cac230ed932c1b73f
> 
> I don't understand how snprintf(3) helped catch the bug.

It doesn't. *Currently*, MPFR does not use snprintf. With the buggy
version, on a typical 64-bit machine (where int = 32 bits), the size
given to mpfr_snprintf became the value modulo 2^32, so if n is 2^32
(possible as size_t has 64 bits), the implementation of mpfr_snprintf
assumed that the size was 0 (instead of 2^32). Hence the incorrect
behavior.

The test helped to catch this bug because it checks mpfr_snprintf
on this value n = (size_t) UINT_MAX + 1, which is 2^32 here. But in
order not to use much memory, the test is done with a small buffer.
Using a buffer of size n would need 4 GB, and this amount of memory
is not available everywhere.

As MPFR does not currently use snprintf, there would not be any issue
with any snprintf behavior. But in the future, we should use snprintf
(actually gmp_snprintf or similar, but this may call snprintf at the
end) to implement mpfr_snprintf. In such a case, the above test will
no longer work if snprintf aborts just because n is larger than the
buffer size.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-20 17:00           ` Vincent Lefevre
@ 2023-03-20 17:31             ` Siddhesh Poyarekar
  2023-03-20 17:45               ` Vincent Lefevre
  0 siblings, 1 reply; 32+ messages in thread
From: Siddhesh Poyarekar @ 2023-03-20 17:31 UTC (permalink / raw)
  To: Vincent Lefevre, libc-alpha

On 2023-03-20 13:00, Vincent Lefevre wrote:
> It doesn't. *Currently*, MPFR does not use snprintf. With the buggy
> version, on a typical 64-bit machine (where int = 32 bits), the size
> given to mpfr_snprintf became the value modulo 2^32, so if n is 2^32
> (possible as size_t has 64 bits), the implementation of mpfr_snprintf
> assumed that the size was 0 (instead of 2^32). Hence the incorrect
> behavior.
> 
> The test helped to catch this bug because it checks mpfr_snprintf
> on this value n = (size_t) UINT_MAX + 1, which is 2^32 here. But in
> order not to use much memory, the test is done with a small buffer.
> Using a buffer of size n would need 4 GB, and this amount of memory
> is not available everywhere.

In glibc we tend to skip a test as UNSUPPORTED when resources to run a 
test are considered uncommon (from the perspective of a test system) and 
cannot be expected to be met in all environments.

It looks like mpfr_snprintf and glibc snprintf are incompatible in this 
context and the latter should not be used to implement the former if the 
n > __bos(dest) use case is important to it.

Then again, I wish mpfr_snprintf also similarly tightened its 
requirements, but that's a discussion for another day.

Thanks,
Sid

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-20 16:56             ` Siddhesh Poyarekar
@ 2023-03-20 17:36               ` Vincent Lefevre
  0 siblings, 0 replies; 32+ messages in thread
From: Vincent Lefevre @ 2023-03-20 17:36 UTC (permalink / raw)
  To: libc-alpha

On 2023-03-20 12:56:34 -0400, Siddhesh Poyarekar wrote:
> On 2023-03-20 09:50, Vincent Lefevre wrote:
> > On 2023-03-20 08:05:32 -0400, Siddhesh Poyarekar wrote:
> > > I think on the glibc front it makes sense from a security
> > > perspective to interpret this through POSIX than the C standard.
> > > Even if the C standard is clarified to be contrary to POSIX and
> > > explicitly state that n is not the size of the buffer (which would
> > > be a terrible mistake IMO), I'd lean towards violating the C
> > > standard and conforming to POSIX instead.
> > 
> > I disagree about the POSIX behavior (assuming it is intentional).
> 
> Why do you think it may be unintentional?

Yes, because it does not warn about a difference with the C standard
(and see below...).

> The POSIX wording seems pretty deliberate and clear to me.

Standards may have clear text, while some corner cases have been
overlooked. This occurs frequently. As an example, see

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61399
  (LDBL_MAX is incorrect with IBM long double format)

and a DR eventually changed the definition, even though it was clear:

  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61399#c4

Note: This change of definition was not much an issue in practice
because it made the standard match the actual implementations
(probably all of them). So it did not break anything.

Back to POSIX: The text was probably written like that because in
almost all the cases, n is a valid size of the array and the change
of formulation made the text a bit simpler. But this broke some
legitimate uses, and at that time, the working group may not have
been aware of that. It would be useful to know whether this had been
brought in the discussions at that time.

> The C standard wording on the other hand leaves things to the
> imagination, which is why we're having this discussion.

What things to the imagination? IMHO, the C standard wording is very
clear.

> > With it, if the compiler detects that n is larger than the actual
> > buffer size, then due to undefined behavior, the compiler could
> > assume that this is dead code and introduce erratic behavior in
> > code written with the C standard in mind (or when it was introduced
> > in BSD).
> 
> In fact, with _FORTIFY_SOURCE, if the runtime detects that n is larger than
> the actual buffer size, the code will abort, see pr28989[1].  But that's a
> runtime feature, nothing to do with gcc.
> 
> gcc at the moment doesn't have any such check AFAICT but if it does, I
> reckon that's a discussion to be had on the gcc mailing list.

If the ISO C standard is meant to be changed, all compilers could
potentially consider the code as dead code, not just GCC.

IMHO, it would be very bad to consider the code as UB. It would be
better to standardize _FORTIFY_SOURCE (that would add restrictions,
but with a controlled behavior).

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: UB status of snprintf on invalid ptr+size combination?
  2023-03-20 17:31             ` Siddhesh Poyarekar
@ 2023-03-20 17:45               ` Vincent Lefevre
  0 siblings, 0 replies; 32+ messages in thread
From: Vincent Lefevre @ 2023-03-20 17:45 UTC (permalink / raw)
  To: libc-alpha

On 2023-03-20 13:31:33 -0400, Siddhesh Poyarekar wrote:
> In glibc we tend to skip a test as UNSUPPORTED when resources to run a test
> are considered uncommon (from the perspective of a test system) and cannot
> be expected to be met in all environments.

In MPFR, we have some tests that need much memory, which are enabled
only if the MPFR_CHECK_LARGEMEM environment variable is set. But this
means that some issues would not be detected on particular platforms
by end users (MPFR_CHECK_LARGEMEM is mainly for developers).

> It looks like mpfr_snprintf and glibc snprintf are incompatible in this
> context and the latter should not be used to implement the former if the n >
> __bos(dest) use case is important to it.
> 
> Then again, I wish mpfr_snprintf also similarly tightened its requirements,
> but that's a discussion for another day.

Well, if the C standard is changed, I suppose that we will align to
it. We can still recommend not to use n > buffer size if this is not
supported by snprintf (even though the MPFR code is currently
guaranteed to work in this case).

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2023-03-20 17:45 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-14 19:47 UB status of snprintf on invalid ptr+size combination? Simon Chopin
2023-03-14 21:39 ` Paul Eggert
2023-03-15  9:22   ` Andreas Schwab
2023-03-15 15:54     ` Siddhesh Poyarekar
2023-03-15 18:34     ` Michael Hudson-Doyle
2023-03-19 14:45     ` manfred
2023-03-19 23:07       ` Vincent Lefevre
2023-03-20 12:05         ` Siddhesh Poyarekar
2023-03-20 12:17           ` Alejandro Colomar
2023-03-20 12:29             ` Siddhesh Poyarekar
2023-03-20 13:36             ` Vincent Lefevre
2023-03-20 13:50           ` Vincent Lefevre
2023-03-20 16:56             ` Siddhesh Poyarekar
2023-03-20 17:36               ` Vincent Lefevre
2023-03-20 15:09       ` Vincent Lefevre
2023-03-20 16:15         ` Alejandro Colomar
2023-03-20 16:33           ` Vincent Lefevre
2023-03-20 17:00           ` Vincent Lefevre
2023-03-20 17:31             ` Siddhesh Poyarekar
2023-03-20 17:45               ` Vincent Lefevre
2023-03-15 12:39   ` Vincent Lefevre
2023-03-16 10:29     ` Stephan Bergmann
2023-03-18  2:07       ` Vincent Lefevre
2023-03-18  2:30         ` Alejandro Colomar
2023-03-18 10:58           ` Vincent Lefevre
2023-03-18 15:01             ` Andreas Schwab
2023-03-19 22:48               ` Vincent Lefevre
2023-03-19 23:24                 ` Andreas Schwab
2023-03-20  4:10                   ` Vincent Lefevre
2023-03-20  9:19                     ` Andreas Schwab
2023-03-20 10:42                       ` Vincent Lefevre
2023-03-20 10:44                         ` Andreas Schwab

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).