public inbox for libc-alpha@sourceware.org
From: Alejandro Colomar <alx.manpages@gmail.com>
To: Zack Weinberg <zack@owlfolio.org>
Cc: libc-alpha@sourceware.org,
	'linux-man' <linux-man@vger.kernel.org>,
	Ian Abbott <abbotti@mev.co.uk>
Subject: Re: [PATCH] scanf.3: Do not mention the ERANGE error
Date: Wed, 14 Dec 2022 11:47:12 +0100
Message-ID: <4fe9ed93-8fb9-64d0-26f1-a9560387d108@gmail.com>
In-Reply-To: <ypikwn6uag11.fsf@owlfolio.org>



Hi Zack,

On 12/14/22 03:13, Zack Weinberg wrote:
> Alejandro Colomar <alx.manpages@gmail.com> writes:
>> Okay, so %s and %[ are at least usable.  Useful?  I don't know.  Probably
>> fgets(3) and then either <string.h> or <regex.h> functions or taking
>> unterminated strings (pointer plus length) is a much better idea.
> 
> Yeah, agreed.
> 
>>> The other design-level issue affects all of the numeric conversions:
>>> if the result of (abstract, infinite-precision) numeric input conversion
>>> does not fit in the variable supplied to hold the result of that conversion,
>>> the behavior is undefined.  The manpage says that you get an ERANGE error
>>> in this case, and that may be the behavior _glibc_ guarantees (I don’t
>>> actually know for sure), but in the modern era of compilers drawing
>>> inferences from undefined behavior, a guarantee by one C library is
>>> not good enough.
>>
>> This, to me, is enough to mark them as deprecated in the manual page.  Anyway,
>> deprecating something is not removing it.  It's just saying "hey, you shouldn't
>> be using that; it's bad, and don't expect that ISO C will keep it around next
>> century".
> 
> In my lexicon “deprecated” is a very strong statement, possibly because
> I’m used to seeing it in the context of standards where it means “we
> think we should never have added this in the first place, there’s no
> plausible way to fix it, but we have to keep it around for backward
> compatibility.”
> 
> The scanf numeric conversions could be fixed with a one-sentence edit to
> the C standard: change the last sentence of http://port70.net/~nsz/c/c11/n1570.html#7.21.6.2p10
> from “If this object does not have an appropriate type, or if the result
> of the conversion cannot be represented in the object, the behavior is
> undefined” to “If this object does not have an appropriate type, the
> behavior is undefined.  If the result of the conversion cannot be
> represented in the object, the execution of the directive fails; this
> condition is a matching failure.”  And, even if the C committee doesn’t
> want to make that change, open-source C libraries can and should do it
> unilaterally, as a documented implementation extension.  I think that’s
> a better plan than declaring most uses of *scanf “deprecated.”

Yeah, if you have plans to fix it, I'm fine removing the deprecation now. :)
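
For comparison, strtol(3) already has defined behavior on overflow, so a
range-checked conversion can be written today without relying on scanf.  A
minimal sketch follows; parse_int is a hypothetical helper name of mine, not
anything from glibc or the manual page, and it assumes the whole string must
be one decimal integer:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not from glibc or the man page): convert the
 * decimal text in s to an int.  Returns 0 on success.  Returns -1 with
 * errno set to EINVAL if there are no digits or trailing characters
 * follow the number, and -1 with errno set to ERANGE if the value does
 * not fit in an int. */
int
parse_int(const char *s, int *out)
{
    char *end;
    long  val;

    errno = 0;
    val = strtol(s, &end, 10);

    if (end == s || *end != '\0') {
        errno = EINVAL;
        return -1;
    }
    /* strtol itself reports overflow of long; the explicit comparison
     * covers platforms where long is wider than int. */
    if (errno == ERANGE || val < INT_MIN || val > INT_MAX) {
        errno = ERANGE;
        return -1;
    }

    *out = (int) val;
    return 0;
}
```

This is roughly the behavior the proposed wording would give "%d": an
out-of-range input becomes a reportable failure rather than UB.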

> 
>>>   And, the scanf family *can* be used safely with
>>> sufficient care — read entire lines of input with getline,
>>
>> If getline(3) _is necessary_ to be safe, then I would deprecate the stream
>> functions, and keep only the "s" variants.  Is it?
> 
> Oh, right, the _third_ headache with fscanf.
> 
> Yes, I think it would be fair to say that it is almost always a mistake
> to use the scanf variants that read directly from a FILE.  The issue
> here is, at its root, that people new to C _expect_ a scanf call to read
> an entire line of input, but it doesn’t. This is especially problematic
> for interactive input — they try to use plain scanf to read numeric
> input, don’t realize that `scanf("%d", &arg)` doesn’t consume the \n in
> the terminal’s line buffer _after_ the number, and get very confused
> when a subsequent getchar() reads that \n instead of the ‘y’ or ‘n’ they
> were expecting as a response to the _next_ prompt.  But it’s _also_ a
> problem for error recovery, because scanf will stop in the middle of the
> line when a matching failure occurs, and if you naively assumed it would
> throw away the rest of the line, you get an error cascade.
> 
> The recommended practice to avoid this trap is to use one of the
> functions that _does_ read an entire line of input, i.e. fgets or
> getline, and then parse the line as a string.  It would make sense for
> the [f]scanf manpage to say that.
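
To make the line-at-a-time approach concrete, here is a minimal sketch built
on fgets(3).  read_line is a hypothetical helper name of mine, and discarding
the tail of an overlong line is just one possible policy, chosen so that the
next call always starts on a fresh line:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one whole line from fp into buf (capacity
 * size), dropping the trailing '\n'.  If the line does not fit, the
 * rest of it is discarded so that the next call starts on a fresh line.
 * Returns 0 on success, -1 on end of file or read error. */
int
read_line(FILE *fp, char *buf, size_t size)
{
    size_t len;
    int    c;

    if (fgets(buf, (int) size, fp) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 0;
    }

    /* No newline was stored: the line was too long, or the input ended
     * without one.  Drain up to and including the end of the line. */
    while ((c = getc(fp)) != EOF && c != '\n')
        continue;

    return 0;
}
```

With input "42\ny\n", the first call yields "42" and the second yields "y",
so the stray newline never reaches the next prompt.  The caller then parses
each buffer as a string, for example with strtol(3).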

Please clarify: do you consider [v][f]scanf to fall under "we think we should
never have added this in the first place, there’s no plausible way to fix it,
but we have to keep it around for backward compatibility"?

> 
>>> In a more sober tone of voice I suggest this text for the manpage:
> …
>> That makes sense to me.  Would you mind sending a patch?  :)
> 
> I do not have time to do that anytime soon.  Also, maybe glibc’s
> behavior on numeric input overflow should be fixed first.

That also makes sense ;)

In short:

(1)  Numeric conversion specifiers are broken but can be fixed, and you plan
     to fix them.

     (1.1)  I'll revert the deprecation warning now, since they are broken
            only because the _current_ standard and implementations are
            broken, not because of inherent design problems.

     (1.2)  When you fix the implementation so that overflow is no longer UB,
            it will also make sense to revert the patch that removed the
            ERANGE error, since you'll need to report it.

(2)  For the string conversion specifiers, there are ways to use them safely,
     and you plan to add a way to specify a size at run time to the function,
     so they will be even better in the future.  No action required.

(3)  [v][f]scanf seem to be really broken by design.  Please confirm.
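
As an illustration of point (2), this is the kind of safe use I have in mind:
sscanf(3) on an already-read line, with an explicit maximum field width on
every %[ so the buffers cannot overflow.  It is only a sketch;
split_key_value, the "key=value" format, and the 32-byte buffers are all
assumptions of mine:

```c
#include <stdio.h>

/* Hypothetical helper: split a line of the form "key=value" into two
 * 32-byte buffers.  The maximum field width 31 leaves room for the
 * terminating null byte that %[ appends.  Returns 0 on success, -1 if
 * the line does not match (including when the key exceeds 31 bytes,
 * because the literal '=' then fails to match). */
int
split_key_value(const char *line, char key[32], char value[32])
{
    if (sscanf(line, " %31[^= \t\n]=%31[^\n]", key, value) != 2)
        return -1;
    return 0;
}
```

The width has to be spelled out in the format string and kept in sync with
the buffer size by hand, which is the limitation a run-time size would remove.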

Cheers,

Alex

> 
> zw

-- 
<http://www.alejandro-colomar.es/>

