public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Alejandro Colomar <alx.manpages@gmail.com>
To: Zack Weinberg <zack@owlfolio.org>
Cc: libc-alpha@sourceware.org,
	'linux-man' <linux-man@vger.kernel.org>,
	Ian Abbott <abbotti@mev.co.uk>
Subject: Re: [PATCH] scanf.3: Do not mention the ERANGE error
Date: Mon, 12 Dec 2022 11:21:29 +0100	[thread overview]
Message-ID: <a7a60a45-afb2-2fae-f6b0-a26db649c09c@gmail.com> (raw)
In-Reply-To: <ypikk02xv09c.fsf@owlfolio.org>


[-- Attachment #1.1: Type: text/plain, Size: 8332 bytes --]

Hi Zack!

On 12/12/22 03:11, Zack Weinberg wrote:
> Alejandro Colomar <alx.manpages@gmail.com> writes:
>> On 12/9/22 22:41, Zack Weinberg via Libc-alpha wrote:
>>> The `scanf` functions have undefined behavior if numeric input
>>> overflows.  This means it is *impossible* to detect malformed input
>>> reliably using these functions.
>>> Many input specifications (e.g. `%s`, `%[^\n]`) read a sequence of
>>> characters into a destination buffer whose size is unspecified; any
>>> use of such specifications renders `scanf` every bit as dangerous as
>>> `gets`.
> …
>>> Best practice is not to use any of these functions at all.
> …
>> I'm inclined to add that in that manual page.  Is there anything that
>> can be saved from that page, or should we burn it all?  To be more
>> specific:
>>
>> -  Are there any functions in that page that are still useful for any
>>     corner cases, or are they all useless?
>> -  Are there any conversion specifiers that can be used safely?
> 
> Hmm, this turns out to be a bit of a rabbit hole.
> 
> There are two major design-level problems with the scanf family.
> The more important one is that string input conversions (%s, %[…])
> will read an unlimited number of characters by default, oblivious to
> the size of the destination buffer — exactly the same design flaw as
> ‘gets’.  They do stop scanning at _any_ whitespace (not just \n) so,
> if you’re trying to craft exploit code, there are more byte values that
> must be avoided, but this is only a minor obstacle.  They can, however,
> be used safely, either by supplying a field width that accurately
> reflects the size of the destination buffer, or by using the ‘m’
> modifier (a POSIX extension), which directs scanf to allocate the
> right amount of space for the string with malloc.

Okay, so %s and $[ are at least usable.  Useful?  I don't know.  Probably 
fgets(3) and then either <string.h> or <regex.h> functions or taking 
unterminated strings (pointer plus length) is a much better idea.

But not enough to deprecate the specifiers; probably better to warn on GCC.

> 
> (Field widths are awkward to use because you have to write them as
> decimal constants _inside the format string_, which makes them more
> likely to get out of sync with the actual size of the buffer than,
> say, the buffer-size argument to ‘fgets’, but this is not a fatal
> flaw in and of itself.)

Yeah; it's an almost useless feature, but not a fatal flaw.  Any programmer 
would probably intuitively know that it's bad.  So, no action is required here.

> 
> The other design-level issue affects all of the numeric conversions:
> if the result of (abstract, infinite-precision) numeric input conversion
> does not fit in the variable supplied to hold the result of that conversion,
> the behavior is undefined.  The manpage says that you get an ERANGE error
> in this case, and that may be the behavior _glibc_ guarantees (I don’t
> actually know for sure), but in the modern era of compilers drawing
> inferences from undefined behavior, a guarantee by one C library is
> not good enough.

This, to me, is enough to mark them as deprecated in the manual page.  Anyway, 
deprecating something is not removing it.  It's just saying "hey, you shouldn't 
be using that; it's bad, and don't expect that ISO C will keep it around next 
century".

Something that results in undefined behavior without control of the programmer 
is as bad as gets(3) (okay, not as bad; gets(3) just was a red carpet for 
attacks, but fundamentally, yes).

So, I'll apply the diff shown at the bottom of the page for this.

> 
> That covers everything except %c and %n, which are safe but somewhat
> pointless in isolation.
> 
>> Or the converse questions:
>>
>> -  Which conversion specifiers (or modifiers) are impossible to use
>>     safely as gets(3) and should therefore be marked as deprecated in
>>     the manual page (and probably warned in GCC)?
>> -  Which functions in that page are impossible to use safely and
>>     should therefore be marked as deprecated?
>>
>> Would you please mark them as [[deprecated]] in glibc too?
> 
> I don’t think glibc should unilaterally deprecate any function that’s
> specified by ISO C.

Okay.  The man-pages are not that restricted, since they won't affect code at 
all, and only the minds of the programmers, which is more powerful.  So, I'll 
mark as deprecated at least the integer conversion specifiers.

>  And, the scanf family *can* be used safely with
> sufficient care — read entire lines of input with getline,

If getline(3) _is necessary_ to be safe, then I would deprecate the stream 
functions, and keep only the "s" variants.  Is it?

In fact, I'd say that even if it's not necessary to be safe, there are no good 
reasons to use [f]scanf(3) at all.  I'm very much considering deprecating them 
in the manual page.

> then split
> them up into fields with sscanf using only %ms and %m[…], and finally
> parse all numeric fields by hand with strtoX — the issue is more that,
> if you limit yourself to the set of scanf operations that are 100% safe,
> you’re left only with stuff that is arguably *easier* to do with <string.h>
> and <regex.h> functions.
> 
> In a more sober tone of voice I suggest this text for the manpage:
> 
> .SH BUGS
> By default, the
> .IR %s " and " %[
> conversions will read an
> .I unlimited
> number of characters from the input.
> In this mode they are just as unsafe as the infamous
> .BR gets (3).
> One should always specify either a field width,
> or the
> .I m
> modifier,
> with every use of
> .IR %s " or " %[ .
> .PP
> If a numeric input conversion produces a value
> that is not representable in the type of the corresponding argument
> (e.g. if 99999 is to be stored in an
> .IR "unsigned short" ),
> ISO C says that the behavior is undefined.
> The GNU C Library guarantees to treat this condition as a
> .IR "matching failure" ,
> but portable code should avoid using the numeric conversions.

That makes sense to me.  Would you mind sending a patch?  :)

> 
> I also suggest that GCC should add diagnostics to -Wformat and/or
> -Wformat-security to catch use of %s and %[ with no size specified;
> if glibc doesn’t already treat numeric overflow as a matching failure,
> it should be changed to do so; and maybe someone should write up a
> proposal for the C standard to make the same change.

Acked-by: Alejandro Colomar <alx@kernel.org>

> 
> zw

Cheers,

Alex

---


diff --git a/man3/scanf.3 b/man3/scanf.3
index ba470a5c1..0041d5573 100644
--- a/man3/scanf.3
+++ b/man3/scanf.3
@@ -386,6 +386,7 @@ .SS Conversions
  and assignment does not occur.
  .TP
  .B d
+.IR Deprecated .
  Matches an optionally signed decimal integer;
  the next pointer must be a pointer to
  .IR int .
@@ -400,6 +401,7 @@ .SS Conversions
  .\" is silently ignored, causing old programs to fail mysteriously.)
  .TP
  .B i
+.IR Deprecated .
  Matches an optionally signed integer; the next pointer must be a pointer to
  .IR int .
  The integer is read in base 16 if it begins with
@@ -412,15 +414,18 @@ .SS Conversions
  Only characters that correspond to the base are used.
  .TP
  .B o
+.IR Deprecated .
  Matches an unsigned octal integer; the next pointer must be a pointer to
  .IR "unsigned int" .
  .TP
  .B u
+.IR Deprecated .
  Matches an unsigned decimal integer; the next pointer must be a
  pointer to
  .IR "unsigned int" .
  .TP
  .B x
+.IR Deprecated .
  Matches an unsigned hexadecimal integer
  (that may optionally begin with a prefix of
  .I 0x
@@ -431,27 +436,33 @@ .SS Conversions
  .IR "unsigned int" .
  .TP
  .B X
+.IR Deprecated .
  Equivalent to
  .BR x .
  .TP
  .B f
+.IR Deprecated .
  Matches an optionally signed floating-point number; the next pointer must
  be a pointer to
  .IR float .
  .TP
  .B e
+.IR Deprecated .
  Equivalent to
  .BR f .
  .TP
  .B g
+.IR Deprecated .
  Equivalent to
  .BR f .
  .TP
  .B E
+.IR Deprecated .
  Equivalent to
  .BR f .
  .TP
  .B a
+.IR Deprecated .
  (C99) Equivalent to
  .BR f .
  .TP



-- 
<http://www.alejandro-colomar.es/>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2022-12-12 10:21 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20221208123454.13132-1-abbotti@mev.co.uk>
2022-12-09 18:59 ` Alejandro Colomar
2022-12-09 19:28   ` Ian Abbott
2022-12-09 19:33     ` Alejandro Colomar
2022-12-09 21:41       ` Zack Weinberg
2022-12-11 15:58         ` Alejandro Colomar
2022-12-11 16:03           ` Alejandro Colomar
2022-12-12  2:11           ` Zack Weinberg
2022-12-12 10:21             ` Alejandro Colomar [this message]
2022-12-14  2:13               ` Zack Weinberg
2022-12-14 10:47                 ` Alejandro Colomar
2022-12-14 11:03                   ` Ian Abbott
2022-12-29  6:42                     ` Zack Weinberg
2022-12-29  6:39                   ` Zack Weinberg
2022-12-29 10:47                     ` Alejandro Colomar
2022-12-29 16:35                       ` Zack Weinberg
2022-12-29 16:39                         ` Alejandro Colomar
2022-12-12 15:22             ` Ian Abbott
2022-12-14  2:18               ` Zack Weinberg
2022-12-14 10:22                 ` Ian Abbott
2022-12-14 10:39                   ` Alejandro Colomar
2022-12-14 10:52                     ` Ian Abbott
2022-12-14 11:23                       ` Alejandro Colomar
2022-12-14 14:10                         ` Ian Abbott
2022-12-14 16:38                         ` Joseph Myers
2022-12-12 10:07       ` Ian Abbott

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a7a60a45-afb2-2fae-f6b0-a26db649c09c@gmail.com \
    --to=alx.manpages@gmail.com \
    --cc=abbotti@mev.co.uk \
    --cc=libc-alpha@sourceware.org \
    --cc=linux-man@vger.kernel.org \
    --cc=zack@owlfolio.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).