public inbox for libc-alpha@sourceware.org
* Re: [PATCH] scanf.3: Do not mention the ERANGE error
       [not found] <20221208123454.13132-1-abbotti@mev.co.uk>
@ 2022-12-09 18:59 ` Alejandro Colomar
  2022-12-09 19:28   ` Ian Abbott
  0 siblings, 1 reply; 25+ messages in thread
From: Alejandro Colomar @ 2022-12-09 18:59 UTC (permalink / raw)
  To: Ian Abbott, Alejandro Colomar; +Cc: linux-man, GNU C Library


Hi Ian,

On 12/8/22 13:34, Ian Abbott wrote:
> The `scanf()` function does not intentionally set `errno` to `ERANGE`.
> That is just a side effect of the code that it uses to perform
> conversions.  It also does not work as reliably as indicated in the
> 'man' page when the target integer type is narrower than `long`.
> Typically (at least in glibc) for target integer types narrower than
> `long`, the number has to exceed the range of `long` (for signed
> conversions) or `unsigned long` (for unsigned conversions) for `errno`
> to be set to `ERANGE`.
> 
> Documenting `ERANGE` in the ERRORS section kind of implies that
> `scanf()` should return `EOF` when an integer overflow is encountered,
> which it doesn't (and doing so would violate the C standard).
> 
> Just remove any mention of the `ERANGE` error to avoid confusion.
> 
> Fixes: 646af540e467 ("Add an ERRORS section documenting at least some of the errors that may occur for scanf().")
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> Signed-off-by: Ian Abbott <abbotti@mev.co.uk>

I see.  How about saying something like "it may also fail for any of the errors 
that the functions used to perform the conversions may fail with"?

Cheers,

Alex

> ---
>   man3/scanf.3 | 7 -------
>   1 file changed, 7 deletions(-)
> 
> diff --git a/man3/scanf.3 b/man3/scanf.3
> index ba470a5c1..c5ff59f45 100644
> --- a/man3/scanf.3
> +++ b/man3/scanf.3
> @@ -576,10 +576,6 @@ is NULL.
>   .TP
>   .B ENOMEM
>   Out of memory.
> -.TP
> -.B ERANGE
> -The result of an integer conversion would exceed the size
> -that can be stored in the corresponding integer type.
>   .SH ATTRIBUTES
>   For an explanation of the terms used in this section, see
>   .BR attributes (7).
> @@ -609,9 +605,6 @@ The functions
>   and
>   .BR sscanf ()
>   conform to C89 and C99 and POSIX.1-2001.
> -These standards do not specify the
> -.B ERANGE
> -error.
>   .PP
>   The
>   .B q

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-09 18:59 ` [PATCH] scanf.3: Do not mention the ERANGE error Alejandro Colomar
@ 2022-12-09 19:28   ` Ian Abbott
  2022-12-09 19:33     ` Alejandro Colomar
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Abbott @ 2022-12-09 19:28 UTC (permalink / raw)
  To: Alejandro Colomar, Alejandro Colomar; +Cc: linux-man, GNU C Library

On 09/12/2022 18:59, Alejandro Colomar wrote:
> Hi Ian,
> 
> On 12/8/22 13:34, Ian Abbott wrote:
>> The `scanf()` function does not intentionally set `errno` to `ERANGE`.
>> That is just a side effect of the code that it uses to perform
>> conversions.  It also does not work as reliably as indicated in the
>> 'man' page when the target integer type is narrower than `long`.
>> Typically (at least in glibc) for target integer types narrower than
>> `long`, the number has to exceed the range of `long` (for signed
>> conversions) or `unsigned long` (for unsigned conversions) for `errno`
>> to be set to `ERANGE`.
>>
>> Documenting `ERANGE` in the ERRORS section kind of implies that
>> `scanf()` should return `EOF` when an integer overflow is encountered,
>> which it doesn't (and doing so would violate the C standard).
>>
>> Just remove any mention of the `ERANGE` error to avoid confusion.
>>
>> Fixes: 646af540e467 ("Add an ERRORS section documenting at least some 
>> of the errors that may occur for scanf().")
>> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
>> Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
> 
> I see.  How about saying something like "it may also fail for any of the 
> errors that the functions used to perform the conversions may fail with"?

It depends what you mean by "fail".  These errors do not make scanf 
return EOF.  Technically, the behavior is undefined if the result of the 
conversion cannot be represented in the object being assigned to by 
scanf.  (In the case of glibc, that probably results in either the 
integer object being set to a truncated version of the input integer, or 
the integer object being set to a truncated version of LONG_MIN or 
LONG_MAX, depending on the actual number.)

Setting errno to 0 before calling scanf and expecting errno to have a 
meaningful value when scanf returns something other than EOF is bogus usage.
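
By contrast, the errno = 0 idiom is documented to work with strtol(3), 
which is the kind of function the conversions are built on.  A minimal 
sketch, assuming a hypothetical helper parse_long() (not part of glibc or 
the man page) that requires the whole string to be one decimal number:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse the whole string as a base-10 long.
   Returns true and stores the value in *out on success. */
bool parse_long(const char *s, long *out)
{
    char *end;

    assert(s != NULL && out != NULL);

    errno = 0;                  /* strtol() reports overflow through errno */
    long v = strtol(s, &end, 10);

    if (end == s)               /* no digits were consumed */
        return false;
    if (*end != '\0')           /* trailing characters after the number */
        return false;
    if (errno == ERANGE)        /* value was clamped to LONG_MIN/LONG_MAX */
        return false;

    *out = v;
    return true;
}
```

Here the caller tests the function's own success indication first and 
looks at errno only because strtol(3) specifies it for this case.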

> 
> Cheers,
> 
> Alex

Cheers,
Ian

> 
>> ---
>>   man3/scanf.3 | 7 -------
>>   1 file changed, 7 deletions(-)
>>
>> diff --git a/man3/scanf.3 b/man3/scanf.3
>> index ba470a5c1..c5ff59f45 100644
>> --- a/man3/scanf.3
>> +++ b/man3/scanf.3
>> @@ -576,10 +576,6 @@ is NULL.
>>   .TP
>>   .B ENOMEM
>>   Out of memory.
>> -.TP
>> -.B ERANGE
>> -The result of an integer conversion would exceed the size
>> -that can be stored in the corresponding integer type.
>>   .SH ATTRIBUTES
>>   For an explanation of the terms used in this section, see
>>   .BR attributes (7).
>> @@ -609,9 +605,6 @@ The functions
>>   and
>>   .BR sscanf ()
>>   conform to C89 and C99 and POSIX.1-2001.
>> -These standards do not specify the
>> -.B ERANGE
>> -error.
>>   .PP
>>   The
>>   .B q
> 

-- 
-=( Ian Abbott <abbotti@mev.co.uk> || MEV Ltd. is a company  )=-
-=( registered in England & Wales.  Regd. number: 02862268.  )=-
-=( Regd. addr.: S11 & 12 Building 67, Europa Business Park, )=-
-=( Bird Hall Lane, STOCKPORT, SK3 0XA, UK. || www.mev.co.uk )=-



* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-09 19:28   ` Ian Abbott
@ 2022-12-09 19:33     ` Alejandro Colomar
  2022-12-09 21:41       ` Zack Weinberg
  2022-12-12 10:07       ` Ian Abbott
  0 siblings, 2 replies; 25+ messages in thread
From: Alejandro Colomar @ 2022-12-09 19:33 UTC (permalink / raw)
  To: Ian Abbott, Alejandro Colomar; +Cc: linux-man, GNU C Library


Hi Ian,

On 12/9/22 20:28, Ian Abbott wrote:
> On 09/12/2022 18:59, Alejandro Colomar wrote:
>> On 12/8/22 13:34, Ian Abbott wrote:
>>> The `scanf()` function does not intentionally set `errno` to `ERANGE`.
>>> That is just a side effect of the code that it uses to perform
>>> conversions.  It also does not work as reliably as indicated in the
>>> 'man' page when the target integer type is narrower than `long`.
>>> Typically (at least in glibc) for target integer types narrower than
>>> `long`, the number has to exceed the range of `long` (for signed
>>> conversions) or `unsigned long` (for unsigned conversions) for `errno`
>>> to be set to `ERANGE`.
>>>
>>> Documenting `ERANGE` in the ERRORS section kind of implies that
>>> `scanf()` should return `EOF` when an integer overflow is encountered,
>>> which it doesn't (and doing so would violate the C standard).
>>>
>>> Just remove any mention of the `ERANGE` error to avoid confusion.
>>>
>>> Fixes: 646af540e467 ("Add an ERRORS section documenting at least some of the 
>>> errors that may occur for scanf().")
>>> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
>>> Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
>>
>> I see.  How about saying something like "it may also fail for any of the 
>> errors that the functions used to perform the conversions may fail with"?
> 
> It depends what you mean by "fail".  These errors do not make scanf return EOF.  

Just to clarify.  Does scanf(3) _never_ fail (EOF) due to ERANGE?  Or is it that 
ERANGE sometimes makes it fail, sometimes not?

If it's the former, I agree with your patch.  When a function hasn't reported 
failure, errno is unspecified.

If it's the latter, I'd write something about it.


> Technically, the behavior is undefined if the result of the conversion cannot be 
> represented in the object being assigned to by scanf.  (In the case of glibc, 
> that probably results in either the integer object being set to a truncated 
> version of the input integer, or the integer object being set to a truncated 
> version of LONG_MIN or LONG_MAX, depending on the actual number.)

Hmm, UB.  Under UB, anything can change, so error reporting is already 
unreliable.  If EOF+ERANGE can _only_ happen under UB, I'd rather remove the 
paragraph.  Please confirm.

> 
> Setting errno to 0 before calling scanf and expecting errno to have a meaningful 
> value when scanf returns something other than EOF is bogus usage.

Yep, that's bogus.


Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-09 19:33     ` Alejandro Colomar
@ 2022-12-09 21:41       ` Zack Weinberg
  2022-12-11 15:58         ` Alejandro Colomar
  2022-12-12 10:07       ` Ian Abbott
  1 sibling, 1 reply; 25+ messages in thread
From: Zack Weinberg @ 2022-12-09 21:41 UTC (permalink / raw)
  To: libc-alpha, 'linux-man'

On 2022-12-09 2:33 PM, Alejandro Colomar via Libc-alpha wrote:
>> Technically, the behavior is undefined if the result of the conversion 
>> cannot be represented in the object being assigned to by scanf.  (In 
>> the case of glibc, that probably results in either the integer object 
>> being set to a truncated version of the input integer, or the integer 
>> object being set to a truncated version of LONG_MIN or LONG_MAX, 
>> depending on the actual number.)
> 
> Hmm, UB.  Under UB, anything can change, so error reporting is already 
> unreliable.  If EOF+ERANGE can _only_ happen under UB, I'd rather remove 
> the paragraph.  Please confirm.
			
BUGS

The `scanf` functions have undefined behavior if numeric input 
overflows.  This means it is *impossible* to detect malformed input 
reliably using these functions.

Many input specifications (e.g. `%s`, `%[^\n]`) read a sequence of 
characters into a destination buffer whose size is unspecified; any use 
of such specifications renders `scanf` every bit as dangerous as `gets`.

Best practice is not to use any of these functions at all.

zw (no, this is not a joke)



* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-09 21:41       ` Zack Weinberg
@ 2022-12-11 15:58         ` Alejandro Colomar
  2022-12-11 16:03           ` Alejandro Colomar
  2022-12-12  2:11           ` Zack Weinberg
  0 siblings, 2 replies; 25+ messages in thread
From: Alejandro Colomar @ 2022-12-11 15:58 UTC (permalink / raw)
  To: Zack Weinberg, libc-alpha, 'linux-man'; +Cc: Ian Abbott


[CC += Ian]

Hi Zack,

On 12/9/22 22:41, Zack Weinberg via Libc-alpha wrote:
> On 2022-12-09 2:33 PM, Alejandro Colomar via Libc-alpha wrote:
>>> Technically, the behavior is undefined if the result of the conversion cannot 
>>> be represented in the object being assigned to by scanf.  (In the case of 
>>> glibc, that probably results in either the integer object being set to a 
>>> truncated version of the input integer, or the integer object being set to a 
>>> truncated version of LONG_MIN or LONG_MAX, depending on the actual number.)
>>
>> Hmm, UB.  Under UB, anything can change, so error reporting is already 
>> unreliable.  If EOF+ERANGE can _only_ happen under UB, I'd rather remove the 
>> paragraph.  Please confirm.
> 
> BUGS
> 
> The `scanf` functions have undefined behavior if numeric input overflows.  This 
> means it is *impossible* to detect malformed input reliably using these functions.
> 
> Many input specifications (e.g. `%s`, `%[^\n]`) read a sequence of characters 
> into a destination buffer whose size is unspecified; any use of such 
> specifications renders `scanf` every bit as dangerous as `gets`.

Thanks for the reminder!  Since I don't use these functions, I don't remember 
how bad they are :)

> 
> Best practice is not to use any of these functions at all.
> 
> zw (no, this is not a joke)

I'm inclined to add that in that manual page.  Is there anything that can be 
saved from that page, or should we burn it all?  To be more specific:

-  Are there any functions in that page that are still useful for any corner 
cases, or are they all useless?
-  Are there any conversion specifiers that can be used safely?

Or the converse questions:

-  Which conversion specifiers (or modifiers) are as impossible to use safely as 
gets(3) and should therefore be marked as deprecated in the manual page (and 
probably warned about in GCC)?
-  Which functions in that page are impossible to use safely and should 
therefore be marked as deprecated?

Would you please mark them as [[deprecated]] in glibc too?  This is not 
essential to me, since I can mark them as deprecated in the manual pages without 
that happening, but it'd help.

Cheers,

Alex

> 

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-11 15:58         ` Alejandro Colomar
@ 2022-12-11 16:03           ` Alejandro Colomar
  2022-12-12  2:11           ` Zack Weinberg
  1 sibling, 0 replies; 25+ messages in thread
From: Alejandro Colomar @ 2022-12-11 16:03 UTC (permalink / raw)
  To: Zack Weinberg, libc-alpha, 'linux-man'; +Cc: Ian Abbott




On 12/11/22 16:58, Alejandro Colomar wrote:
> [CC += Ian]
> 
> Hi Zack,
> 
> On 12/9/22 22:41, Zack Weinberg via Libc-alpha wrote:
>> On 2022-12-09 2:33 PM, Alejandro Colomar via Libc-alpha wrote:
>>>> Technically, the behavior is undefined if the result of the conversion 
>>>> cannot be represented in the object being assigned to by scanf.  (In the 
>>>> case of glibc, that probably results in either the integer object being set 
>>>> to a truncated version of the input integer, or the integer object being set 
>>>> to a truncated version of LONG_MIN or LONG_MAX, depending on the actual 
>>>> number.)
>>>
>>> Hmm, UB.  Under UB, anything can change, so error reporting is already 
>>> unreliable.  If EOF+ERANGE can _only_ happen under UB, I'd rather remove the 
>>> paragraph.  Please confirm.
>>
>> BUGS
>>
>> The `scanf` functions have undefined behavior if numeric input overflows.  
>> This means it is *impossible* to detect malformed input reliably using these 
>> functions.
>>
>> Many input specifications (e.g. `%s`, `%[^\n]`) read a sequence of characters 
>> into a destination buffer whose size is unspecified; any use of such 
>> specifications renders `scanf` every bit as dangerous as `gets`.
> 
> Thanks for the reminder!  Since I don't use these functions, I don't remember 
> how bad they are :)
> 
>>
>> Best practice is not to use any of these functions at all.
>>
>> zw (no, this is not a joke)
> 
> I'm inclined to add that in that manual page.  Is there anything that can be 
> saved from that page, or should we burn it all?  To be more specific:
> 
> -  Are there any functions in that page that are still useful for any corner 
> cases, or are they all useless?
> -  Are there any conversion specifiers that can be used safely?
> 
> Or the converse questions:
> 
> -  Which conversion specifiers (or modifiers) are as impossible to use safely as 
> gets(3) and should therefore be marked as deprecated in the manual page (and 
> probably warned about in GCC)?
> -  Which functions in that page are impossible to use safely and should 
> therefore be marked as deprecated?

This includes functions that can be used safely only for the most basic tasks, 
for which functions such as fgets(3) are a better choice.  I'd also deprecate those.

> 
> Would you please mark them as [[deprecated]] in glibc too?  This is not 
> essential to me, since I can mark them as deprecated in the manual pages without 
> that happening, but it'd help.
> 
> Cheers,
> 
> Alex
> 
>>
> 

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-11 15:58         ` Alejandro Colomar
  2022-12-11 16:03           ` Alejandro Colomar
@ 2022-12-12  2:11           ` Zack Weinberg
  2022-12-12 10:21             ` Alejandro Colomar
  2022-12-12 15:22             ` Ian Abbott
  1 sibling, 2 replies; 25+ messages in thread
From: Zack Weinberg @ 2022-12-12  2:11 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: libc-alpha, 'linux-man', Ian Abbott

Alejandro Colomar <alx.manpages@gmail.com> writes:
> On 12/9/22 22:41, Zack Weinberg via Libc-alpha wrote:
>> The `scanf` functions have undefined behavior if numeric input
>> overflows.  This means it is *impossible* to detect malformed input
>> reliably using these functions.
>> Many input specifications (e.g. `%s`, `%[^\n]`) read a sequence of
>> characters into a destination buffer whose size is unspecified; any
>> use of such specifications renders `scanf` every bit as dangerous as
>> `gets`.
>> Best practice is not to use any of these functions at all.
> I'm inclined to add that in that manual page.  Is there anything that
> can be saved from that page, or should we burn it all?  To be more
> specific:
>
> -  Are there any functions in that page that are still useful for any
>    corner cases, or are they all useless?
> -  Are there any conversion specifiers that can be used safely?

Hmm, this turns out to be a bit of a rabbit hole.

There are two major design-level problems with the scanf family.
The more important one is that string input conversions (%s, %[…])
will read an unlimited number of characters by default, oblivious to
the size of the destination buffer — exactly the same design flaw as
‘gets’.  They do stop scanning at _any_ whitespace (not just \n) so,
if you’re trying to craft exploit code, there are more byte values that
must be avoided, but this is only a minor obstacle.  They can, however,
be used safely, either by supplying a field width that accurately
reflects the size of the destination buffer, or by using the ‘m’
modifier (a POSIX extension), which directs scanf to allocate the
right amount of space for the string with malloc.

(Field widths are awkward to use because you have to write them as
decimal constants _inside the format string_, which makes them more
likely to get out of sync with the actual size of the buffer than,
say, the buffer-size argument to ‘fgets’, but this is not a fatal
flaw in and of itself.)
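
One way to keep the width and the buffer size from drifting apart is to
derive both from a single macro.  A minimal sketch; WORD_MAX, STR() and
both function names are made up for illustration, and the ‘m’ variant
assumes a C library that implements the POSIX.1-2008 modifier:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define WORD_MAX 15             /* hypothetical limit, not counting the NUL */
#define STR_(x)  #x
#define STR(x)   STR_(x)

/* Copy the first whitespace-delimited word of line into buf.
   Returns 1 on success, 0 if there is no word to read. */
int first_word_bounded(const char *line, char buf[WORD_MAX + 1])
{
    assert(line != NULL && buf != NULL);

    /* The format expands to "%15s": the width comes from the same
       macro that sizes the buffer. */
    return sscanf(line, "%" STR(WORD_MAX) "s", buf) == 1;
}

/* Same idea, but sscanf() allocates the buffer with malloc().
   Returns NULL if there is no word; the caller frees the result. */
char *first_word_alloc(const char *line)
{
    char *word = NULL;

    assert(line != NULL);

    if (sscanf(line, "%ms", &word) != 1)
        return NULL;
    return word;
}
```

With the bounded version an over-long word is silently cut at WORD_MAX
characters; the allocating version returns the whole word instead.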

The other design-level issue affects all of the numeric conversions:
if the result of (abstract, infinite-precision) numeric input conversion
does not fit in the variable supplied to hold the result of that conversion,
the behavior is undefined.  The manpage says that you get an ERANGE error
in this case, and that may be the behavior _glibc_ guarantees (I don’t
actually know for sure), but in the modern era of compilers drawing
inferences from undefined behavior, a guarantee by one C library is
not good enough.

That covers everything except %c and %n, which are safe but somewhat
pointless in isolation.

> Or the converse questions:
>
> -  Which conversion specifiers (or modifiers) are as impossible to use
>    safely as gets(3) and should therefore be marked as deprecated in
>    the manual page (and probably warned about in GCC)?
> -  Which functions in that page are impossible to use safely and
>    should therefore be marked as deprecated?
>
> Would you please mark them as [[deprecated]] in glibc too?

I don’t think glibc should unilaterally deprecate any function that’s
specified by ISO C.  And, the scanf family *can* be used safely with
sufficient care — read entire lines of input with getline, then split
them up into fields with sscanf using only %ms and %m[…], and finally
parse all numeric fields by hand with strtoX — the issue is more that,
if you limit yourself to the set of scanf operations that are 100% safe,
you’re left only with stuff that is arguably *easier* to do with <string.h>
and <regex.h> functions.
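
The careful approach just described might look roughly like this.  It is
only a sketch under assumptions: the record format (a name and an integer
on each line) and the function name read_record() are invented for
illustration.

```c
#define _POSIX_C_SOURCE 200809L /* getline() and the scanf 'm' modifier */

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record format: "<name> <integer>\n".
   Returns 1 on success (the caller frees *name), 0 if the line is
   malformed, and -1 at end of input or on a read error.
   Anything after the second field is ignored. */
int read_record(FILE *fp, char **name, long *value)
{
    char *line = NULL, *num = NULL, *end;
    size_t cap = 0;
    int ok = 0;

    assert(fp != NULL && name != NULL && value != NULL);

    *name = NULL;
    if (getline(&line, &cap, fp) == -1) {
        free(line);
        return -1;
    }

    /* Split into fields; sscanf() allocates each one, so no
       fixed-size buffer can overflow. */
    if (sscanf(line, "%ms %ms", name, &num) == 2) {
        /* Numeric parsing is done by hand so overflow is detected. */
        errno = 0;
        long v = strtol(num, &end, 10);
        if (end != num && *end == '\0' && errno != ERANGE) {
            *value = v;
            ok = 1;
        }
    }

    if (!ok) {
        free(*name);
        *name = NULL;
    }
    free(num);
    free(line);
    return ok;
}
```

No numeric conversion specifier appears in the format string, so the
undefined-behavior case never arises; the price is that sscanf() is doing
little more than splitting on whitespace.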

In a more sober tone of voice I suggest this text for the manpage:

.SH BUGS
By default, the 
.IR %s " and " %[
conversions will read an
.I unlimited
number of characters from the input.
In this mode they are just as unsafe as the infamous
.BR gets (3).
One should always specify either a field width,
or the
.I m
modifier,
with every use of
.IR %s " or " %[ .
.PP
If a numeric input conversion produces a value
that is not representable in the type of the corresponding argument
(e.g. if 99999 is to be stored in an
.IR "unsigned short" ),
ISO C says that the behavior is undefined.
The GNU C Library guarantees to treat this condition as a
.IR "matching failure" ,
but portable code should avoid using the numeric conversions.

I also suggest that GCC should add diagnostics to -Wformat and/or
-Wformat-security to catch use of %s and %[ with no size specified;
if glibc doesn’t already treat numeric overflow as a matching failure,
it should be changed to do so; and maybe someone should write up a
proposal for the C standard to make the same change.

zw


* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-09 19:33     ` Alejandro Colomar
  2022-12-09 21:41       ` Zack Weinberg
@ 2022-12-12 10:07       ` Ian Abbott
  1 sibling, 0 replies; 25+ messages in thread
From: Ian Abbott @ 2022-12-12 10:07 UTC (permalink / raw)
  To: Alejandro Colomar, Alejandro Colomar; +Cc: linux-man, GNU C Library

On 09/12/2022 19:33, Alejandro Colomar wrote:
> Hi Ian,
> 
> On 12/9/22 20:28, Ian Abbott wrote:
>> On 09/12/2022 18:59, Alejandro Colomar wrote:
>>> On 12/8/22 13:34, Ian Abbott wrote:
>>>> The `scanf()` function does not intentionally set `errno` to `ERANGE`.
>>>> That is just a side effect of the code that it uses to perform
>>>> conversions.  It also does not work as reliably as indicated in the
>>>> 'man' page when the target integer type is narrower than `long`.
>>>> Typically (at least in glibc) for target integer types narrower than
>>>> `long`, the number has to exceed the range of `long` (for signed
>>>> conversions) or `unsigned long` (for unsigned conversions) for `errno`
>>>> to be set to `ERANGE`.
>>>>
>>>> Documenting `ERANGE` in the ERRORS section kind of implies that
>>>> `scanf()` should return `EOF` when an integer overflow is encountered,
>>>> which it doesn't (and doing so would violate the C standard).
>>>>
>>>> Just remove any mention of the `ERANGE` error to avoid confusion.
>>>>
>>>> Fixes: 646af540e467 ("Add an ERRORS section documenting at least 
>>>> some of the errors that may occur for scanf().")
>>>> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
>>>> Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
>>>
>>> I see.  How about saying something like "it may also fail for any of 
>>> the errors that the functions used to perform the conversions may fail with"?
>>
>> It depends what you mean by "fail".  These errors do not make scanf 
>> return EOF. 
> 
> Just to clarify.  Does scanf(3) _never_ fail (EOF) due to ERANGE?  Or is 
> it that ERANGE sometimes makes it fail, sometimes not?

The glibc implementation certainly doesn't return EOF when ERANGE is 
detected.  __vfscanf_internal() in stdio-common/vfscanf-internal.c does 
not contain any code to deal with ERANGE - it's just a side-effect of 
the calls to __strtol_internal(), __strtoul_internal(), 
__strtoll_internal(), or __strtoull_internal().

> If it's the former, I agree with your patch.  When a function hasn't 
> reported failure, errno is unspecified.
> 
> If it's the latter, I'd write something about it.

For the glibc implementation, it's the former.

>> Technically, the behavior is undefined if the result of the conversion 
>> cannot be represented in the object being assigned to by scanf.  (In 
>> the case of glibc, that probably results in either the integer object 
>> being set to a truncated version of the input integer, or the integer 
>> object being set to a truncated version of LONG_MIN or LONG_MAX, 
>> depending on the actual number.)
> 
> Hmm, UB.  Under UB, anything can change, so error reporting is already 
> unreliable.  If EOF+ERANGE can _only_ happen under UB, I'd rather remove 
> the paragraph.  Please confirm.

Yes, it is UB as per C17 7.21.6.2 paragraph 10: "[...] Unless assignment 
suppression was indicated by a *, the result of the conversion is placed 
in the object pointed to by the first argument following the format 
argument that has not already received a conversion result. If this 
object does not have an appropriate type, or if the result of the 
conversion cannot be represented in the object, the behavior is undefined."
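
Code that must not depend on that undefined behavior can do the range 
check by hand before storing anything.  A minimal sketch, assuming a 
hypothetical helper parse_ushort() (the name and its strictness about 
signs are choices made for this example only):

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse the whole string as a base-10
   unsigned short.  Returns true and stores the value on success. */
bool parse_ushort(const char *s, unsigned short *out)
{
    char *end;

    assert(s != NULL && out != NULL);

    /* strtoul() accepts a leading '-' and negates the result, so this
       sketch skips whitespace and then insists on a digit (which also
       rejects an explicit '+'). */
    while (isspace((unsigned char)*s))
        s++;
    if (!isdigit((unsigned char)*s))
        return false;

    errno = 0;
    unsigned long v = strtoul(s, &end, 10);

    if (*end != '\0' || errno == ERANGE)
        return false;
    if (v > USHRT_MAX)          /* fits in unsigned long, not in the target */
        return false;

    *out = (unsigned short)v;
    return true;
}
```

The second range test is the one that matters for types narrower than 
long: a value that fits in unsigned long never makes strtoul() report 
ERANGE, so the comparison against USHRT_MAX has to be explicit.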

>> Setting errno to 0 before calling scanf and expecting errno to have a 
>> meaningful value when scanf returns something other than EOF is bogus 
>> usage.
> 
> Yep, that's bogus.
> 
> 
> Cheers,
> 
> Alex

Best regards,
Ian




* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-12  2:11           ` Zack Weinberg
@ 2022-12-12 10:21             ` Alejandro Colomar
  2022-12-14  2:13               ` Zack Weinberg
  2022-12-12 15:22             ` Ian Abbott
  1 sibling, 1 reply; 25+ messages in thread
From: Alejandro Colomar @ 2022-12-12 10:21 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: libc-alpha, 'linux-man', Ian Abbott


Hi Zack!

On 12/12/22 03:11, Zack Weinberg wrote:
> Alejandro Colomar <alx.manpages@gmail.com> writes:
>> On 12/9/22 22:41, Zack Weinberg via Libc-alpha wrote:
>>> The `scanf` functions have undefined behavior if numeric input
>>> overflows.  This means it is *impossible* to detect malformed input
>>> reliably using these functions.
>>> Many input specifications (e.g. `%s`, `%[^\n]`) read a sequence of
>>> characters into a destination buffer whose size is unspecified; any
>>> use of such specifications renders `scanf` every bit as dangerous as
>>> `gets`.
> …
>>> Best practice is not to use any of these functions at all.
> …
>> I'm inclined to add that in that manual page.  Is there anything that
>> can be saved from that page, or should we burn it all?  To be more
>> specific:
>>
>> -  Are there any functions in that page that are still useful for any
>>     corner cases, or are they all useless?
>> -  Are there any conversion specifiers that can be used safely?
> 
> Hmm, this turns out to be a bit of a rabbit hole.
> 
> There are two major design-level problems with the scanf family.
> The more important one is that string input conversions (%s, %[…])
> will read an unlimited number of characters by default, oblivious to
> the size of the destination buffer — exactly the same design flaw as
> ‘gets’.  They do stop scanning at _any_ whitespace (not just \n) so,
> if you’re trying to craft exploit code, there are more byte values that
> must be avoided, but this is only a minor obstacle.  They can, however,
> be used safely, either by supplying a field width that accurately
> reflects the size of the destination buffer, or by using the ‘m’
> modifier (a POSIX extension), which directs scanf to allocate the
> right amount of space for the string with malloc.

Okay, so %s and %[ are at least usable.  Useful?  I don't know.  Probably 
fgets(3) and then either <string.h> or <regex.h> functions or taking 
unterminated strings (pointer plus length) is a much better idea.

But not enough to deprecate the specifiers; probably better to warn in GCC.

> 
> (Field widths are awkward to use because you have to write them as
> decimal constants _inside the format string_, which makes them more
> likely to get out of sync with the actual size of the buffer than,
> say, the buffer-size argument to ‘fgets’, but this is not a fatal
> flaw in and of itself.)

Yeah; it's an almost useless feature, but not a fatal flaw.  Any programmer 
would probably intuitively know that it's bad.  So, no action is required here.

> 
> The other design-level issue affects all of the numeric conversions:
> if the result of (abstract, infinite-precision) numeric input conversion
> does not fit in the variable supplied to hold the result of that conversion,
> the behavior is undefined.  The manpage says that you get an ERANGE error
> in this case, and that may be the behavior _glibc_ guarantees (I don’t
> actually know for sure), but in the modern era of compilers drawing
> inferences from undefined behavior, a guarantee by one C library is
> not good enough.

This, to me, is enough to mark them as deprecated in the manual page.  Anyway, 
deprecating something is not removing it.  It's just saying "hey, you shouldn't 
be using that; it's bad, and don't expect that ISO C will keep it around next 
century".

Something that results in undefined behavior without control of the programmer 
is as bad as gets(3) (okay, not as bad; gets(3) just was a red carpet for 
attacks, but fundamentally, yes).

So, I'll apply the diff shown at the bottom of the page for this.

> 
> That covers everything except %c and %n, which are safe but somewhat
> pointless in isolation.
> 
>> Or the converse questions:
>>
>> -  Which conversion specifiers (or modifiers) are as impossible to use
>>     safely as gets(3) and should therefore be marked as deprecated in
>>     the manual page (and probably warned about in GCC)?
>> -  Which functions in that page are impossible to use safely and
>>     should therefore be marked as deprecated?
>>
>> Would you please mark them as [[deprecated]] in glibc too?
> 
> I don’t think glibc should unilaterally deprecate any function that’s
> specified by ISO C.

Okay.  The man-pages are not that restricted, since they won't affect code at 
all, and only the minds of the programmers, which is more powerful.  So, I'll 
mark as deprecated at least the integer conversion specifiers.

>  And, the scanf family *can* be used safely with
> sufficient care — read entire lines of input with getline,

If getline(3) _is necessary_ to be safe, then I would deprecate the stream 
functions, and keep only the "s" variants.  Is it?

In fact, I'd say that even if it's not necessary to be safe, there are no good 
reasons to use [f]scanf(3) at all.  I'm very much considering deprecating them 
in the manual page.

> then split
> them up into fields with sscanf using only %ms and %m[…], and finally
> parse all numeric fields by hand with strtoX — the issue is more that,
> if you limit yourself to the set of scanf operations that are 100% safe,
> you’re left only with stuff that is arguably *easier* to do with <string.h>
> and <regex.h> functions.
> 
> In a more sober tone of voice I suggest this text for the manpage:
> 
> .SH BUGS
> By default, the
> .IR %s " and " %[
> conversions will read an
> .I unlimited
> number of characters from the input.
> In this mode they are just as unsafe as the infamous
> .BR gets (3).
> One should always specify either a field width,
> or the
> .I m
> modifier,
> with every use of
> .IR %s " or " %[ .
> .PP
> If a numeric input conversion produces a value
> that is not representable in the type of the corresponding argument
> (e.g. if 99999 is to be stored in an
> .IR "unsigned short" ),
> ISO C says that the behavior is undefined.
> The GNU C Library guarantees to treat this condition as a
> .IR "matching failure" ,
> but portable code should avoid using the numeric conversions.

That makes sense to me.  Would you mind sending a patch?  :)

> 
> I also suggest that GCC should add diagnostics to -Wformat and/or
> -Wformat-security to catch use of %s and %[ with no size specified;
> if glibc doesn’t already treat numeric overflow as a matching failure,
> it should be changed to do so; and maybe someone should write up a
> proposal for the C standard to make the same change.

Acked-by: Alejandro Colomar <alx@kernel.org>

> 
> zw

Cheers,

Alex

---


diff --git a/man3/scanf.3 b/man3/scanf.3
index ba470a5c1..0041d5573 100644
--- a/man3/scanf.3
+++ b/man3/scanf.3
@@ -386,6 +386,7 @@ .SS Conversions
  and assignment does not occur.
  .TP
  .B d
+.IR Deprecated .
  Matches an optionally signed decimal integer;
  the next pointer must be a pointer to
  .IR int .
@@ -400,6 +401,7 @@ .SS Conversions
  .\" is silently ignored, causing old programs to fail mysteriously.)
  .TP
  .B i
+.IR Deprecated .
  Matches an optionally signed integer; the next pointer must be a pointer to
  .IR int .
  The integer is read in base 16 if it begins with
@@ -412,15 +414,18 @@ .SS Conversions
  Only characters that correspond to the base are used.
  .TP
  .B o
+.IR Deprecated .
  Matches an unsigned octal integer; the next pointer must be a pointer to
  .IR "unsigned int" .
  .TP
  .B u
+.IR Deprecated .
  Matches an unsigned decimal integer; the next pointer must be a
  pointer to
  .IR "unsigned int" .
  .TP
  .B x
+.IR Deprecated .
  Matches an unsigned hexadecimal integer
  (that may optionally begin with a prefix of
  .I 0x
@@ -431,27 +436,33 @@ .SS Conversions
  .IR "unsigned int" .
  .TP
  .B X
+.IR Deprecated .
  Equivalent to
  .BR x .
  .TP
  .B f
+.IR Deprecated .
  Matches an optionally signed floating-point number; the next pointer must
  be a pointer to
  .IR float .
  .TP
  .B e
+.IR Deprecated .
  Equivalent to
  .BR f .
  .TP
  .B g
+.IR Deprecated .
  Equivalent to
  .BR f .
  .TP
  .B E
+.IR Deprecated .
  Equivalent to
  .BR f .
  .TP
  .B a
+.IR Deprecated .
  (C99) Equivalent to
  .BR f .
  .TP



-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-12  2:11           ` Zack Weinberg
  2022-12-12 10:21             ` Alejandro Colomar
@ 2022-12-12 15:22             ` Ian Abbott
  2022-12-14  2:18               ` Zack Weinberg
  1 sibling, 1 reply; 25+ messages in thread
From: Ian Abbott @ 2022-12-12 15:22 UTC (permalink / raw)
  To: Zack Weinberg, Alejandro Colomar; +Cc: libc-alpha, 'linux-man'

On 12/12/2022 02:11, Zack Weinberg wrote:
> (Field widths are awkward to use because you have to write them as
> decimal constants _inside the format string_, which makes them more
> likely to get out of sync with the actual size of the buffer than,
> say, the buffer-size argument to ‘fgets’, but this is not a fatal
> flaw in and of itself.)

It's a shame that scanf's maximum field width couldn't be specified 
using an integer parameter in the same way as printf's minimum field 
width, but the '*' flag was already taken!
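
Until such a flag exists, one workaround is to build the format string at
run time so that the width always follows the real buffer size.  A minimal
sketch, assuming a hypothetical helper named read_word() (not part of any
library):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy the first whitespace-delimited word of
   `input` into buf, never writing more than bufsize bytes.
   Returns 1 on success, 0 if no word was found or on error.  */
int
read_word(const char *input, char *buf, size_t bufsize)
{
    char fmt[32];
    int  n;

    assert(input != NULL && buf != NULL);
    if (bufsize < 2)
        return 0;

    /* %Ns stores at most N characters plus a terminating NUL, so the
       width must be one less than the buffer size.  */
    n = snprintf(fmt, sizeof fmt, "%%%zus", bufsize - 1);
    if (n < 0 || (size_t) n >= sizeof fmt)
        return 0;

    return sscanf(input, fmt, buf) == 1;
}
```

With a 4-byte buffer the generated format is "%3s", so a longer word is
truncated rather than overflowing the buffer.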

-- 
-=( Ian Abbott <abbotti@mev.co.uk> || MEV Ltd. is a company  )=-
-=( registered in England & Wales.  Regd. number: 02862268.  )=-
-=( Regd. addr.: S11 & 12 Building 67, Europa Business Park, )=-
-=( Bird Hall Lane, STOCKPORT, SK3 0XA, UK. || www.mev.co.uk )=-



* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-12 10:21             ` Alejandro Colomar
@ 2022-12-14  2:13               ` Zack Weinberg
  2022-12-14 10:47                 ` Alejandro Colomar
  0 siblings, 1 reply; 25+ messages in thread
From: Zack Weinberg @ 2022-12-14  2:13 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: libc-alpha, 'linux-man', Ian Abbott

Alejandro Colomar <alx.manpages@gmail.com> writes:
> Okay, so %s and %[ are at least usable.  Useful?  I don't know.  Probably 
> fgets(3) and then either <string.h> or <regex.h> functions or taking 
> unterminated strings (pointer plus length) is a much better idea.

Yeah, agreed.

>> The other design-level issue affects all of the numeric conversions:
>> if the result of (abstract, infinite-precision) numeric input conversion
>> does not fit in the variable supplied to hold the result of that conversion,
>> the behavior is undefined.  The manpage says that you get an ERANGE error
>> in this case, and that may be the behavior _glibc_ guarantees (I don’t
>> actually know for sure), but in the modern era of compilers drawing
>> inferences from undefined behavior, a guarantee by one C library is
>> not good enough.
>
> This, to me, is enough to mark them as deprecated in the manual page.  Anyway, 
> deprecating something is not removing it.  It's just saying "hey, you shouldn't 
> be using that; it's bad, and don't expect that ISO C will keep it around next 
> century".

In my lexicon “deprecated” is a very strong statement, possibly because
I’m used to seeing it in the context of standards where it means “we
think we should never have added this in the first place, there’s no
plausible way to fix it, but we have to keep it around for backward
compatibility.”

The scanf numeric conversions could be fixed with a one-sentence edit to
the C standard: change the last sentence of http://port70.net/~nsz/c/c11/n1570.html#7.21.6.2p10
from “If this object does not have an appropriate type, or if the result
of the conversion cannot be represented in the object, the behavior is
undefined” to “If this object does not have an appropriate type, the
behavior is undefined.  If the result of the conversion cannot be
represented in the object, the execution of the directive fails; this
condition is a matching failure.”  And, even if the C committee doesn’t
want to make that change, open-source C libraries can and should do it
unilaterally, as a documented implementation extension.  I think that’s
a better plan than declaring most uses of *scanf “deprecated.”
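
As an illustration of the proposed semantics, here is a sketch that parses
by hand with strtoul(3) and treats an unrepresentable value as a failure.
The name parse_ushort() is hypothetical; it models the suggested behavior
and is not how any existing scanf implementation works:

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value if s begins with
   a decimal number representable as unsigned short; returns 0 (the
   analogue of a matching failure) otherwise, leaving *out untouched.  */
int
parse_ushort(const char *s, unsigned short *out)
{
    char          *end;
    unsigned long  v;

    assert(s != NULL && out != NULL);

    while (isspace((unsigned char) *s))
        s++;
    if (*s == '-')          /* strtoul would silently negate the value */
        return 0;

    errno = 0;
    v = strtoul(s, &end, 10);
    if (end == s)           /* no digits at all */
        return 0;
    if (errno == ERANGE || v > USHRT_MAX)
        return 0;           /* does not fit in the target type */

    *out = (unsigned short) v;
    return 1;
}
```

On a platform where unsigned short is 16 bits wide, the input "99999" is
rejected instead of being stored truncated.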

>>  And, the scanf family *can* be used safely with
>> sufficient care — read entire lines of input with getline,
>
> If getline(3) _is necessary_ to be safe, then I would deprecate the stream 
> functions, and keep only the "s" variants.  Is it?

Oh, right, the _third_ headache with fscanf.

Yes, I think it would be fair to say that it is almost always a mistake
to use the scanf variants that read directly from a FILE.  The issue
here is, at its root, that people new to C _expect_ a scanf call to read
an entire line of input, but it doesn’t. This is especially problematic
for interactive input — they try to use plain scanf to read numeric
input, don’t realize that `scanf("%d", &arg)` doesn’t consume the \n in
the terminal’s line buffer _after_ the number, and get very confused
when a subsequent getchar() reads that \n instead of the ‘y’ or ‘n’ they
were expecting as a response to the _next_ prompt.  But it’s _also_ a
problem for error recovery, because scanf will stop in the middle of the
line when a matching failure occurs, and if you naively assumed it would
throw away the rest of the line, you get an error cascade.

The recommended practice to avoid this trap is to use one of the
functions that _does_ read an entire line of input, i.e. fgets or
getline, and then parse the line as a string.  It would make sense for
the [f]scanf manpage to say that.
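
A minimal sketch of that practice, assuming a hypothetical helper named
read_long_line() and, for brevity, lines shorter than the fixed buffer:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one whole line from fp and parse it as a
   decimal long.  Returns 1 on success, 0 if the line is malformed or
   out of range, and EOF at end of file or on a read error.
   Simplification: a line longer than the buffer is read in pieces.  */
int
read_long_line(FILE *fp, long *out)
{
    char  line[128];
    char *end;
    long  v;

    assert(fp != NULL && out != NULL);

    if (fgets(line, sizeof line, fp) == NULL)
        return EOF;

    errno = 0;
    v = strtol(line, &end, 10);
    if (end == line || errno == ERANGE)
        return 0;

    /* Accept only trailing blanks and the newline after the number.  */
    while (*end == ' ' || *end == '\t' || *end == '\n')
        end++;
    if (*end != '\0')
        return 0;

    *out = v;
    return 1;
}
```

Because fgets() consumes the newline along with the number, a following
getchar() or fgetc() sees the first character of the next line, and a
malformed line is discarded as a whole rather than left half-read.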

>> In a more sober tone of voice I suggest this text for the manpage:
> That makes sense to me.  Would you mind sending a patch?  :)

I do not have time to do that anytime soon.  Also, maybe glibc’s
behavior on numeric input overflow should be fixed first.

zw


* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-12 15:22             ` Ian Abbott
@ 2022-12-14  2:18               ` Zack Weinberg
  2022-12-14 10:22                 ` Ian Abbott
  0 siblings, 1 reply; 25+ messages in thread
From: Zack Weinberg @ 2022-12-14  2:18 UTC (permalink / raw)
  To: Ian Abbott; +Cc: Alejandro Colomar, libc-alpha, 'linux-man'

Ian Abbott <abbotti@mev.co.uk> writes:

> On 12/12/2022 02:11, Zack Weinberg wrote:
>> Field widths are awkward to use because you have to write them as
>> decimal constants _inside the format string_…
>
> It's a shame that scanf's maximum field width couldn't be specified
> using an integer parameter in the same way as printf's minimum field
> width, but the '*' flag was already taken!

Yup.  I suppose we could make up another flag … ‘@’ isn’t used for
anything …

zw


* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-14  2:18               ` Zack Weinberg
@ 2022-12-14 10:22                 ` Ian Abbott
  2022-12-14 10:39                   ` Alejandro Colomar
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Abbott @ 2022-12-14 10:22 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Alejandro Colomar, libc-alpha, 'linux-man'

On 14/12/2022 02:18, Zack Weinberg wrote:
> Ian Abbott <abbotti@mev.co.uk> writes:
> 
>> On 12/12/2022 02:11, Zack Weinberg wrote:
>>> Field widths are awkward to use because you have to write them as
>>> decimal constants _inside the format string_…
>>
>> It's a shame that scanf's maximum field width couldn't be specified
>> using an integer parameter in the same way as printf's minimum field
>> width, but the '*' flag was already taken!
> 
> Yup.  I suppose we could make up another flag … ‘@’ isn’t used for
> anything …

'@' isn't included in C's basic character set though.  '&' is available.

-- 
-=( Ian Abbott <abbotti@mev.co.uk> || MEV Ltd. is a company  )=-
-=( registered in England & Wales.  Regd. number: 02862268.  )=-
-=( Regd. addr.: S11 & 12 Building 67, Europa Business Park, )=-
-=( Bird Hall Lane, STOCKPORT, SK3 0XA, UK. || www.mev.co.uk )=-


* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-14 10:22                 ` Ian Abbott
@ 2022-12-14 10:39                   ` Alejandro Colomar
  2022-12-14 10:52                     ` Ian Abbott
  0 siblings, 1 reply; 25+ messages in thread
From: Alejandro Colomar @ 2022-12-14 10:39 UTC (permalink / raw)
  To: Ian Abbott, Zack Weinberg; +Cc: libc-alpha, 'linux-man'



Hi Ian & Zack,

On 12/14/22 11:22, Ian Abbott wrote:
> On 14/12/2022 02:18, Zack Weinberg wrote:
>> Ian Abbott <abbotti@mev.co.uk> writes:
>>
>>> On 12/12/2022 02:11, Zack Weinberg wrote:
>>>> Field widths are awkward to use because you have to write them as
>>>> decimal constants _inside the format string_…
>>>
>>> It's a shame that scanf's maximum field width couldn't be specified
>>> using an integer parameter in the same way as printf's minimum field
>>> width, but the '*' flag was already taken!
>>
>> Yup.  I suppose we could make up another flag … ‘@’ isn’t used for
>> anything …
> 
> '@' isn't included in C's basic character set though.  '&' is available.

Just a curious question from someone ignorant of the topic:  what's the 
difference between the basic character set and the source character set?

Thanks,

Alex

> 

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-14  2:13               ` Zack Weinberg
@ 2022-12-14 10:47                 ` Alejandro Colomar
  2022-12-14 11:03                   ` Ian Abbott
  2022-12-29  6:39                   ` Zack Weinberg
  0 siblings, 2 replies; 25+ messages in thread
From: Alejandro Colomar @ 2022-12-14 10:47 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: libc-alpha, 'linux-man', Ian Abbott



Hi Zack,

On 12/14/22 03:13, Zack Weinberg wrote:
> Alejandro Colomar <alx.manpages@gmail.com> writes:
>> Okay, so %s and %[ are at least usable.  Useful?  I don't know.  Probably
>> fgets(3) and then either <string.h> or <regex.h> functions or taking
>> unterminated strings (pointer plus length) is a much better idea.
> 
> Yeah, agreed.
> 
>>> The other design-level issue affects all of the numeric conversions:
>>> if the result of (abstract, infinite-precision) numeric input conversion
>>> does not fit in the variable supplied to hold the result of that conversion,
>>> the behavior is undefined.  The manpage says that you get an ERANGE error
>>> in this case, and that may be the behavior _glibc_ guarantees (I don’t
>>> actually know for sure), but in the modern era of compilers drawing
>>> inferences from undefined behavior, a guarantee by one C library is
>>> not good enough.
>>
>> This, to me, is enough to mark them as deprecated in the manual page.  Anyway,
>> deprecating something is not removing it.  It's just saying "hey, you shouldn't
>> be using that; it's bad, and don't expect that ISO C will keep it around next
>> century".
> 
> In my lexicon “deprecated” is a very strong statement, possibly because
> I’m used to seeing it in the context of standards where it means “we
> think we should never have added this in the first place, there’s no
> plausible way to fix it, but we have to keep it around for backward
> compatibility.”
> 
> The scanf numeric conversions could be fixed with a one-sentence edit to
> the C standard: change the last sentence of http://port70.net/~nsz/c/c11/n1570.html#7.21.6.2p10
> from “If this object does not have an appropriate type, or if the result
> of the conversion cannot be represented in the object, the behavior is
> undefined” to “If this object does not have an appropriate type, the
> behavior is undefined.  If the result of the conversion cannot be
> represented in the object, the execution of the directive fails; this
> condition is a matching failure.”  And, even if the C committee doesn’t
> want to make that change, open-source C libraries can and should do it
> unilaterally, as a documented implementation extension.  I think that’s
> a better plan than declaring most uses of *scanf “deprecated.”

Yeah, if you have plans to fix it, I'm fine removing the deprecation now. :)

> 
>>>   And, the scanf family *can* be used safely with
>>> sufficient care — read entire lines of input with getline,
>>
>> If getline(3) _is necessary_ to be safe, then I would deprecate the stream
>> functions, and keep only the "s" variants.  Is it?
> 
> Oh, right, the _third_ headache with fscanf.
> 
> Yes, I think it would be fair to say that it is almost always a mistake
> to use the scanf variants that read directly from a FILE.  The issue
> here is, at its root, that people new to C _expect_ a scanf call to read
> an entire line of input, but it doesn’t. This is especially problematic
> for interactive input — they try to use plain scanf to read numeric
> input, don’t realize that `scanf("%d", &arg)` doesn’t consume the \n in
> the terminal’s line buffer _after_ the number, and get very confused
> when a subsequent getchar() reads that \n instead of the ‘y’ or ‘n’ they
> were expecting as a response to the _next_ prompt.  But it’s _also_ a
> problem for error recovery, because scanf will stop in the middle of the
> line when a matching failure occurs, and if you naively assumed it would
> throw away the rest of the line, you get an error cascade.
> 
> The recommended practice to avoid this trap, is that you should use one
> of the functions that _does_ read an entire line of input, i.e. fgets or
> getline, and then parse the line as a string.  It would make sense for
> the [f]scanf manpage to say that.

Please clarify; do you consider [v][f]scanf something that "we think we should 
never have added this in the first place, there’s no plausible way to fix it, 
but we have to keep it around for backward compatibility"?

> 
>>> In a more sober tone of voice I suggest this text for the manpage:
> …
>> That makes sense to me.  Would you mind sending a patch?  :)
> 
> I do not have time to do that anytime soon.  Also, maybe glibc’s
> behavior on numeric input overflow should be fixed first.

That also makes sense ;)

In short:

(1)  Numeric conversion specifiers are broken but can be fixed, and you plan to 
fix them.

      (1.1)  I'll revert the deprecation warning now; since they are only broken 
because the _current_ standard and implementations are broken, but not by 
inherent design problems.

      (1.2)  When you fix the implementation to not be UB anymore, it will also 
make sense to revert the patch that removed the ERANGE error, since you'll need 
to report it.

(2)  For the string conversion specifiers, there are ways to use them safely, 
and you plan to add a way to specify a size at runtime to the function, so it 
will be even better in the future.  No action required.

(3)  [v][f]scanf seem to be really broken by design.  Please confirm.

Cheers,

Alex

> 
> zw

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-14 10:39                   ` Alejandro Colomar
@ 2022-12-14 10:52                     ` Ian Abbott
  2022-12-14 11:23                       ` Alejandro Colomar
  0 siblings, 1 reply; 25+ messages in thread
From: Ian Abbott @ 2022-12-14 10:52 UTC (permalink / raw)
  To: Alejandro Colomar, Zack Weinberg; +Cc: libc-alpha, 'linux-man'

On 2022-12-14 10:39, Alejandro Colomar wrote:
> Hi Ian & Zack,
> 
> On 12/14/22 11:22, Ian Abbott wrote:
>> On 14/12/2022 02:18, Zack Weinberg wrote:
>>> Ian Abbott <abbotti@mev.co.uk> writes:
>>>
>>>> On 12/12/2022 02:11, Zack Weinberg wrote:
>>>>> Field widths are awkward to use because you have to write them as
>>>>> decimal constants _inside the format string_…
>>>>
>>>> It's a shame that scanf's maximum field width couldn't be specified
>>>> using an integer parameter in the same way as printf's minimum field
>>>> width, but the '*' flag was already taken!
>>>
>>> Yup.  I suppose we could make up another flag … ‘@’ isn’t used for
>>> anything …
>>
>> '@' isn't included in C's basic character set though.  '&' is available.
> 
> Just a curious question from an ignorant:  what's the difference between 
> the basic character set and the source character set?

The source character set may contain locale-specific characters outside 
the basic source character set.

Actually, there are two basic character sets - the basic source 
character set and the basic execution character set (which includes the 
basic source character set plus a few control characters).  The source 
character set and/or execution character set may contain 
locale-specific, extended characters outside the basic character set.

https://port70.net/~nsz/c/c11/n1570.html#5.2.1

-- 
-=( Ian Abbott <abbotti@mev.co.uk> || MEV Ltd. is a company  )=-
-=( registered in England & Wales.  Regd. number: 02862268.  )=-
-=( Regd. addr.: S11 & 12 Building 67, Europa Business Park, )=-
-=( Bird Hall Lane, STOCKPORT, SK3 0XA, UK. || www.mev.co.uk )=-


* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-14 10:47                 ` Alejandro Colomar
@ 2022-12-14 11:03                   ` Ian Abbott
  2022-12-29  6:42                     ` Zack Weinberg
  2022-12-29  6:39                   ` Zack Weinberg
  1 sibling, 1 reply; 25+ messages in thread
From: Ian Abbott @ 2022-12-14 11:03 UTC (permalink / raw)
  To: Alejandro Colomar, Zack Weinberg; +Cc: libc-alpha, 'linux-man'

On 2022-12-14 10:47, Alejandro Colomar wrote:
> On 12/14/22 03:13, Zack Weinberg wrote:
>> Alejandro Colomar <alx.manpages@gmail.com> writes:
>>>> In a more sober tone of voice I suggest this text for the manpage:
>> …
>>> That makes sense to me.  Would you mind sending a patch?  :)
>>
>> I do not have time to do that anytime soon.  Also, maybe glibc’s
>> behavior on numeric input overflow should be fixed first.
> 
> That also makes sense ;)
> 
> In short:
> 
> (1)  Numeric conversion specifiers are broken but can be fixed, and you 
> plan to fix them.
> 
>       (1.1)  I'll revert the deprecation warning now; since they are 
> only broken because the _current_ standard and implementations are 
> broken, but not by inherent design problems.
> 
>       (1.2)  When you fix the implementation to not be UB anymore, it 
> will also make sense to revert the patch that removed the ERANGE error, 
> since you'll need to report it.

And would ERANGE cause scanf to return EOF in the fixed implementation? 
That seems like it would break a lot of existing code (even though it is 
currently UB).  It would probably be better to silently set errno to 
ERANGE without returning EOF, and to set the integer object's value to 
the maximum or minimum value for its type (as it currently does for 
signed/unsigned long).
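
For comparison, this clamp-and-set-errno behavior is what ISO C specifies
for strtol(3) itself.  A small sketch, with to_long_clamped() being a
hypothetical wrapper used only to show the convention:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical wrapper: convert s the way strtol does and report
   through *clamped whether the result was out of range.  On overflow
   strtol returns LONG_MAX or LONG_MIN and sets errno to ERANGE.  */
long
to_long_clamped(const char *s, int *clamped)
{
    long v;

    assert(s != NULL && clamped != NULL);

    errno = 0;                       /* must be cleared before the call */
    v = strtol(s, NULL, 10);
    *clamped = (errno == ERANGE);
    return v;
}
```

The caller still gets a value and has to look at the flag (or at errno)
to learn that it was clamped, which is the "silent" reporting style
suggested above.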

-- 
-=( Ian Abbott <abbotti@mev.co.uk> || MEV Ltd. is a company  )=-
-=( registered in England & Wales.  Regd. number: 02862268.  )=-
-=( Regd. addr.: S11 & 12 Building 67, Europa Business Park, )=-
-=( Bird Hall Lane, STOCKPORT, SK3 0XA, UK. || www.mev.co.uk )=-


* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-14 10:52                     ` Ian Abbott
@ 2022-12-14 11:23                       ` Alejandro Colomar
  2022-12-14 14:10                         ` Ian Abbott
  2022-12-14 16:38                         ` Joseph Myers
  0 siblings, 2 replies; 25+ messages in thread
From: Alejandro Colomar @ 2022-12-14 11:23 UTC (permalink / raw)
  To: Ian Abbott, Zack Weinberg; +Cc: libc-alpha, 'linux-man'





On 12/14/22 11:52, Ian Abbott wrote:
>>>
>>> '@' isn't included in C's basic character set though.  '&' is available.
>>
>> Just a curious question from an ignorant:  what's the difference between the 
>> basic character set and the source character set?
> 
> The source character set may contain locale-specific characters outside the 
> basic source character set.
> 
> Actually, there are two basic character sets - the basic source character set 
> and the basic execution character set (which includes the basic source character 
> set plus a few control characters).  The source character set and/or execution 
> character set may contain locale-specific, extended characters outside the basic 
> character set.
> 
> https://port70.net/~nsz/c/c11/n1570.html#5.2.1

I still have a small doubt.  C23 added '@' to the source character set, but 
it seems to be a second-class citizen:



  The execution character set may also contain multibyte characters, which
  need not have the same encoding as for the source character set.  For both
  character sets, the following shall hold:
  — The basic character set, @, $, and ` shall be present and each character
    shall be encoded as a single byte.



What's the difference, and why isn't it part of the basic character set?  Maybe 
because not all keyboards have those three characters?
> 

-- 
<http://www.alejandro-colomar.es/>


* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-14 11:23                       ` Alejandro Colomar
@ 2022-12-14 14:10                         ` Ian Abbott
  2022-12-14 16:38                         ` Joseph Myers
  1 sibling, 0 replies; 25+ messages in thread
From: Ian Abbott @ 2022-12-14 14:10 UTC (permalink / raw)
  To: Alejandro Colomar, Zack Weinberg; +Cc: libc-alpha, 'linux-man'

On 14/12/2022 11:23, Alejandro Colomar wrote:
> 
> 
> On 12/14/22 11:52, Ian Abbott wrote:
>>>>
>>>> '@' isn't included in C's basic character set though.  '&' is 
>>>> available.
>>>
>>> Just a curious question from an ignorant:  what's the difference 
>>> between the basic character set and the source character set?
>>
>> The source character set may contain locale-specific characters 
>> outside the basic source character set.
>>
>> Actually, there are two basic character sets - the basic source 
>> character set and the basic execution character set (which includes 
>> the basic source character set plus a few control characters).  The 
>> source character set and/or execution character set may contain 
>> locale-specific, extended characters outside the basic character set.
>>
>> https://port70.net/~nsz/c/c11/n1570.html#5.2.1
> 
> I still have a small doubt.  C23 added '@' to the source character set, 
> but seems to be a second-class citizen:
> 
> 
> 
>   The execution character set may also contain multibyte characters, which
> need not have the same encoding as for the source character set. For 
> both character sets, the following
> shall hold:
> — The basic character set, @, $, and ` shall be present and each 
> character shall be encoded as a
> single byte.
> 
> What's the difference, and why isn't it part of the basic character 
> set?  Maybe because not all keyboards have those three characters?

I think the inability to type certain characters in the basic source 
character set is the reason why the language contains the horrible 
trigraph sequences (no longer valid since the C23 final draft N3054), 
and the slightly less horrible digraph tokens.

Here is the rationale for inclusion of @ and $ in the source and 
execution character sets, but ` is only mentioned briefly as an also-ran 
at the end of the document in section "Do we also want to add ` in the 
same way as @ and $?":

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2701.htm

The rationale for exclusion of @ and $ characters from the basic 
character set is given in this paragraph from the document:

"""
By requiring @ and $ in the source and execution character set we, reach 
the goal of making them useable in comments and string literals. By not 
adding them to the basic source character set, we protect the freedom of 
implementations of allowing or disallowing them in identifiers, and 
avoid inconsistency or incompability regarding the use of universal 
character names (currently the use of universal character names for 
characters in the basic source character set is not allowed, so adding 
characters to the basic source character set without lifting that 
restriction could break existing code).
"""

I guess it was decided to add all three proposed characters during the 
Jan/Feb 2022 virtual meeting of WG14 as mentioned here:

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2913.htm

The first C2x draft that incorporated the change is this one:

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf

-- 
-=( Ian Abbott <abbotti@mev.co.uk> || MEV Ltd. is a company  )=-
-=( registered in England & Wales.  Regd. number: 02862268.  )=-
-=( Regd. addr.: S11 & 12 Building 67, Europa Business Park, )=-
-=( Bird Hall Lane, STOCKPORT, SK3 0XA, UK. || www.mev.co.uk )=-



* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-14 11:23                       ` Alejandro Colomar
  2022-12-14 14:10                         ` Ian Abbott
@ 2022-12-14 16:38                         ` Joseph Myers
  1 sibling, 0 replies; 25+ messages in thread
From: Joseph Myers @ 2022-12-14 16:38 UTC (permalink / raw)
  To: Alejandro Colomar
  Cc: Ian Abbott, Zack Weinberg, libc-alpha, 'linux-man'

On Wed, 14 Dec 2022, Alejandro Colomar via Libc-alpha wrote:

> What's the difference, and why isn't it part of the basic character set?

Apart from the point discussed about how making them part of the basic 
character set would interact with other rules involving that character 
set, the basic source character set consists of those characters that have 
some specified role in the C syntax.  There is no specified role for $ @ ` 
in the C syntax - they can only be used in string or character literals 
(modulo the question of whether $ should still be allowed as an 
implementation-defined character in identifiers, see N3046) and have no 
special role to play there.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-14 10:47                 ` Alejandro Colomar
  2022-12-14 11:03                   ` Ian Abbott
@ 2022-12-29  6:39                   ` Zack Weinberg
  2022-12-29 10:47                     ` Alejandro Colomar
  1 sibling, 1 reply; 25+ messages in thread
From: Zack Weinberg @ 2022-12-29  6:39 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: libc-alpha, 'linux-man', Ian Abbott

On Wed, 14 Dec 2022 05:47:12 -0500, Alejandro Colomar wrote:
> On 12/14/22 03:13, Zack Weinberg wrote:
> > The scanf numeric conversions could be fixed with a one-sentence edit to
> > the C standard: change the last sentence of http://port70.net/~nsz/c/c11/n1570.html#7.21.6.2p10
> > from “If this object does not have an appropriate type, or if the result
> > of the conversion cannot be represented in the object, the behavior is
> > undefined” to “If this object does not have an appropriate type, the
> > behavior is undefined.  If the result of the conversion cannot be
> > represented in the object, the execution of the directive fails; this
> > condition is a matching failure.”  And, even if the C committee doesn’t
> > want to make that change, open-source C libraries can and should do it
> > unilaterally, as a documented implementation extension.  I think that’s
> > a better plan than declaring most uses of *scanf “deprecated.”
> 
> Yeah, if you have plans to fix it, I'm fine removing the deprecation now. :)

To be clear, I personally don’t have plans to do any of the actual
programming or standard-changing work involved here.  :-)

> > Yes, I think it would be fair to say that it is almost always a mistake
> > to use the scanf variants that read directly from a FILE.  The issue
> > here is, at its root, that people new to C _expect_ a scanf call to read
> > an entire line of input, but it doesn’t. This is especially problematic
> > for interactive input ― they try to use plain scanf to read numeric
> > input, don’t realize that `scanf("%d", &arg)` doesn’t consume the \n in
> > the terminal’s line buffer _after_ the number, and get very confused
> > when a subsequent getchar() reads that \n instead of the ‘y’ or ‘n’ they
> > were expecting as a response to the _next_ prompt.  But it’s _also_ a
> > problem for error recovery, because scanf will stop in the middle of the
> > line when a matching failure occurs, and if you naively assumed it would
> > throw away the rest of the line, you get an error cascade.
> > 
> > The recommended practice to avoid this trap, is that you should use one
> > of the functions that _does_ read an entire line of input, i.e. fgets or
> > getline, and then parse the line as a string.  It would make sense for
> > the [f]scanf manpage to say that.
> 
> Please clarify; do you consider [v][f]scanf something that "we think
> we should never have added this in the first place, there’s no
> plausible way to fix it, but we have to keep it around for backward
> compatibility"?

_I_ wouldn’t have added them in the first place, but I care more than
the average about robustness in the face of unexpected input, even for
“throwaway” programs.  I doubt the C committee would be prepared to
say the same thing.  They _can_ be used legitimately, and they can
even be used in ways that meet my robustness standards if you go
to enough trouble.  It’s just that (IMNSHO) there are better ways to
reach those standards.

In terms of text for the [v][f]scanf manpage, maybe something like

[NOTES? BUGS?]
    When scanf() or fscanf() reports a _matching failure_, all of the
    text that was matched successfully has still been read from
    _stdin_ or the _stream_ (respectively), and so have an
    unpredictable number of characters associated with the conversion
    that failed to match.  The latter characters are lost.  This may
    make it difficult to recover from invalid input.

    One way to make recovery easier is to separate reading from
    parsing: use fgets() or getline() to read an entire line of text
    into a string, then use sscanf() to parse the string.  If a
    _matching failure_ occurs, you can try sscanf() again with a
    different _format_; the equivalent is not possible using fscanf().

    _Successful_ calls to scanf() and fscanf() frequently consume
    either more, or fewer, characters from the input than was
    expected.  For example, assuming the next six characters readable
    from `stdin` are `"123\n a"`, `scanf("%d", &val)` will consume the
    digits but _not_ the newline, and `scanf("%d\n", &val)` will
    consume the digits, the newline, _and_ the space.  Either of these
    is likely to cause trouble when mixing calls to scanf() with calls
    to fgets() or fgetc().  As above, it helps to read entire lines of
    text with fgets() or getline() and then parse them with sscanf().
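
The separation of reading from parsing described above can be sketched
as follows.  This is only an illustration, not proposed manpage text:
parse_point() and read_point() are hypothetical helpers, the 256-byte
buffer is an arbitrary choice, and the two input formats ("x,y" and
"x y") are made up for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse "x,y" or, failing that, "x y" from a
   line that has already been read into a string.
   Returns 0 on success, 1 if neither format matches. */
int parse_point(const char *line, int *x, int *y)
{
    assert(line != NULL && x != NULL && y != NULL);

    /* First attempt: comma-separated values. */
    if (sscanf(line, "%d ,%d", x, y) == 2)
        return 0;

    /* The string is untouched by the failed attempt, so a second
       format can be tried against exactly the same text. */
    if (sscanf(line, "%d %d", x, y) == 2)
        return 0;

    return 1;
}

/* Hypothetical helper: read one whole line from fp, then parse it.
   Returns 0 on success, 1 on a malformed line, EOF at end of input
   or on a read error.  Lines longer than the buffer are not handled
   here; the remainder would be left in the stream. */
int read_point(FILE *fp, int *x, int *y)
{
    char buf[256];

    if (fgets(buf, sizeof buf, fp) == NULL)
        return EOF;
    return parse_point(buf, x, y);
}
```

A caller would loop on read_point(stdin, &x, &y) until it returns EOF,
reporting and skipping any line for which it returns 1.  A malformed
line costs exactly one line of input, since fgets() has already
consumed it in full before any matching is attempted.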

> In short:
> 
> (1)  Numeric conversion specifiers are broken but can be fixed…

Yes.

> …and you plan to fix them.

No :)

>      (1.1)  I'll revert the deprecation warning now, since they are
> only broken because the _current_ standard and implementations are
> broken, but not by inherent design problems.

OK.

>      (1.2) When [someone fixes] the implementation to not be UB
> anymore, it will also make sense to revert the patch that removed
> the ERANGE error, since you'll need to report it.

Yes.

> (2)  For the string conversion specifiers, there are ways to use them
> safely.

Yes.

> and you plan to add a way to specify a size at runtime to the
> function,

No again :)

> so it will be even better in the future.  No action required.

Concur.

> (3)  [v][f]scanf seem to be really broken by design.  Please confirm.

See above.

If you remind me where to find the git repo for the manpages, I _may_
have time to write a patch for all this sometime next week.

zw


* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-14 11:03                   ` Ian Abbott
@ 2022-12-29  6:42                     ` Zack Weinberg
  0 siblings, 0 replies; 25+ messages in thread
From: Zack Weinberg @ 2022-12-29  6:42 UTC (permalink / raw)
  To: Ian Abbott; +Cc: Alejandro Colomar, libc-alpha, 'linux-man'

On Wed, 14 Dec 2022 06:03:07 -0500, Ian Abbott wrote:
> And would ERANGE cause scanf to return EOF in the fixed
> implementation?

In the counterfactual universe where I actually patch scanf, yes.

> That seems like it would break a lot of existing code
> (even though it is currently UB).  It would probably be better to
> silently set errno to ERANGE without returning EOF

My intuition says it’s the other way around.  What existing code do
you expect will break?
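
Whichever way a fixed scanf ends up reporting overflow, a caller that
needs range checking today has to convert the number by hand.  A
minimal sketch, assuming a hypothetical helper parse_int() built on
strtol(); the use of EINVAL for "no digits" is a choice made for this
example only.  The explicit INT_MIN/INT_MAX comparison is there
because strtol() itself only sets ERANGE when the value does not fit
in a long.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: convert the leading decimal number in s to
   an int.  Returns 0 on success.  Returns -1 with errno set to
   EINVAL if s does not start with a number, or to ERANGE if the
   number does not fit in an int. */
int parse_int(const char *s, int *out)
{
    char *end;
    long v;

    assert(s != NULL && out != NULL);

    errno = 0;
    v = strtol(s, &end, 10);

    if (end == s) {
        /* No digits were consumed at all. */
        errno = EINVAL;
        return -1;
    }
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX) {
        /* Out of range for long, or for the narrower int. */
        errno = ERANGE;
        return -1;
    }

    *out = (int) v;
    return 0;
}
```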

zw


* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-29  6:39                   ` Zack Weinberg
@ 2022-12-29 10:47                     ` Alejandro Colomar
  2022-12-29 16:35                       ` Zack Weinberg
  0 siblings, 1 reply; 25+ messages in thread
From: Alejandro Colomar @ 2022-12-29 10:47 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: libc-alpha, 'linux-man', Ian Abbott



Hi Zack,

On 12/29/22 07:39, Zack Weinberg wrote:
>> Yeah, if you have plans to fix it, I'm fine removing the deprecation now. :)
> 
> To be clear, I personally don’t have plans to do any of the actual
> programming or standard-changing work involved here.  :-)

Ah, no, I meant more that the whole set of glibc maintainers had that in mind, 
as a long term plan (like 10 years maybe?).  If you don't think these functions 
will be saved (even if we can) because it's not worth it, maybe we can just kill 
them.  But I'll defer that decision to after some fixes to the documentation.

>> Please clarify; do you consider [v][f]scanf something that "we think
>> we should never have added this in the first place, there’s no
>> plausible way to fix it, but we have to keep it around for backward
>> compatibility"?
> 
> _I_ wouldn’t have added them in the first place, but I care more than
> the average about robustness in the face of unexpected input, even for
> “throwaway” programs.  I doubt the C committee would be prepared to
> say the same thing.

I share your concerns about robustness of code, and am prepared to defend at 
least some partial/unofficial deprecation.  I'll clarify below.

> They _can_ be used legitimately, and they can
> even be used in ways that meet my robustness standards if you go
> to enough trouble.

That's not so clear to me.

>  It’s just that (IMNSHO) there are better ways to
> reach those standards.
> 
> In terms of text for the [v][f]scanf manpage, maybe something like
> 
> [NOTES? BUGS?]
>      When scanf() or fscanf() reports a _matching failure_, all of the
>      text that was matched successfully has still been read from
>      _stdin_ or the _stream_ (respectively), and so have an
>      unpredictable number of characters associated with the conversion
>      that failed to match.  The latter characters are lost.  This may
>      make it difficult to recover from invalid input.
> 
>      One way to make recovery easier is to separate reading from
>      parsing: use fgets() or getline() to read an entire line of text
>      into a string, then use sscanf() to parse the string.  If a
>      _matching failure_ occurs, you can try sscanf() again with a
>      different _format_; the equivalent is not possible using fscanf().

This reads more or less as: "the only way to use scanf(3) is not to use it; use 
fgets(3)/getline(3) + sscanf(3) instead".

> 
>      _Successful_ calls to scanf() and fscanf() frequently consume
>      either more, or fewer, characters from the input than was
>      expected.  For example, assuming the next six characters readable
>      from `stdin` are `"123\n a"`, `scanf("%d", &val)` will consume the
>      digits but _not_ the newline, and `scanf("%d\n", &val)` will
>      consume the digits, the newline, _and_ the space.  Either of these
>      is likely to cause trouble when mixing calls to scanf() with calls
>      to fgets() or fgetc().  As above, it helps to read entire lines of
>      text with fgets() or getline() and then parse them with sscanf().

And this reads as "even \"successful\" calls to scanf(3) are doomed; really, 
never use it".  :)

> 
>> In short:
>>
>> (1)  Numeric conversion specifiers are broken but can be fixed…
> 
> Yes.
> 
>> …and you plan to fix them.
> 
> No :)
> 
>>       (1.1)  I'll revert the deprecation warning now, since they are
>> only broken because the _current_ standard and implementations are
>> broken, but not by inherent design problems.
> 
> OK.
> 
>>       (1.2) When [someone fixes] the implementation to not be UB
>> anymore, it will also make sense to revert the patch that removed
>> the ERANGE error, since you'll need to report it.
> 
> Yes.
> 
>> (2)  For the string conversion specifiers, there are ways to use them
>> safely.
> 
> Yes.
> 
>> and you plan to add a way to specify a size at runtime to the
>> function,
> 
> No again :)
> 
>> so it will be even better in the future.  No action required.
> 
> Concur.
> 
>> (3)  [v][f]scanf seem to be really broken by design.  Please confirm.
> 
> See above.

Before you start writing patches, I'm considering the following, which is my way 
to say don't use these functions without deprecating them:

Split FILE and char* functions into separate manual pages.  In the one for 
[v]sscanf(3), I'd keep the current documentation.  In the one for FILE 
functions, I'd keep it very short, deferring to sscanf(3) for documentation of 
things like conversion specifiers, and that page would only cover the 
bugs^Wdifferences that apply only to FILE functions.

I'll prepare some patches and show them for discussion in linux-man@.

> 
> If you remind me where to find the git repo for the manpages, I _may_
> have time to write a patch for all this sometime next week.

man-pages' README:

Versions
        Tarballs of releases starting from 2.00 are available at
        <https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/>.

        The git(1) repository can be found at:
        <git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git>

        A secondary git(1) repository can be found at:
        <git://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git>

See also
        man-pages(7)

    Website
        <http://www.kernel.org/doc/man-pages/index.html>.


> 
> zw


Cheers,

Alex

-- 
<http://www.alejandro-colomar.es/>



* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-29 10:47                     ` Alejandro Colomar
@ 2022-12-29 16:35                       ` Zack Weinberg
  2022-12-29 16:39                         ` Alejandro Colomar
  0 siblings, 1 reply; 25+ messages in thread
From: Zack Weinberg @ 2022-12-29 16:35 UTC (permalink / raw)
  To: Alejandro Colomar; +Cc: libc-alpha, 'linux-man', Ian Abbott

On Thu, 29 Dec 2022 05:47:06 -0500, Alejandro Colomar wrote:
> On 12/29/22 07:39, Zack Weinberg wrote:
> > To be clear, I personally don’t have plans to do any of the actual
> > programming or standard-changing work involved here.  :-)
> 
> Ah, no, I meant more that the whole set of glibc maintainers had that
> in mind, as a long term plan (like 10 years maybe?).

Oh, OK.  Yeah, changes to the standard can easily take that long.

> Before you start writing patches, I'm considering the following, which
> is my way to say don't use these functions without deprecating them:
> 
> Split FILE and char* functions into separate manual pages.  In the one
> for [v]sscanf(3), I'd keep the current documentation.  In the one for
> FILE functions, I'd keep it very short, deferring to sscanf(3) for
> documentation of things like conversion specifiers, and that page
> would only cover the bugs^Wdifferences that apply only to FILE
> functions.

That seems like a good way forward to me.

zw


* Re: [PATCH] scanf.3: Do not mention the ERANGE error
  2022-12-29 16:35                       ` Zack Weinberg
@ 2022-12-29 16:39                         ` Alejandro Colomar
  0 siblings, 0 replies; 25+ messages in thread
From: Alejandro Colomar @ 2022-12-29 16:39 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: libc-alpha, 'linux-man', Ian Abbott



Hi Zack,

On 12/29/22 17:35, Zack Weinberg wrote:
> On Thu, 29 Dec 2022 05:47:06 -0500, Alejandro Colomar wrote:
>> On 12/29/22 07:39, Zack Weinberg wrote:
>>> To be clear, I personally don’t have plans to do any of the actual
>>> programming or standard-changing work involved here.  :-)
>>
>> Ah, no, I meant more that the whole set of glibc maintainers had that
>> in mind, as a long term plan (like 10 years maybe?).
> 
> Oh, OK.  Yeah, changes to the standard can easily take that long.
> 
>> Before you start writing patches, I'm considering the following, which
>> is my way to say don't use these functions without deprecating them:
>>
>> Split FILE and char* functions into separate manual pages.  In the one
>> for [v]sscanf(3), I'd keep the current documentation.  In the one for
>> FILE functions, I'd keep it very short, deferring to sscanf(3) for
>> documentation of things like conversion specifiers, and that page
>> would only cover the bugs^Wdifferences that apply only to FILE
>> functions.
> 
> That seems like a good way forward to me.

I've done the splitting.  If you would like to prepare any patches for adding 
BUGS, I'll take them :)

Cheers,

Alex

> 
> zw

-- 
<http://www.alejandro-colomar.es/>



end of thread, other threads:[~2022-12-29 16:39 UTC | newest]

Thread overview: 25+ messages
     [not found] <20221208123454.13132-1-abbotti@mev.co.uk>
2022-12-09 18:59 ` [PATCH] scanf.3: Do not mention the ERANGE error Alejandro Colomar
2022-12-09 19:28   ` Ian Abbott
2022-12-09 19:33     ` Alejandro Colomar
2022-12-09 21:41       ` Zack Weinberg
2022-12-11 15:58         ` Alejandro Colomar
2022-12-11 16:03           ` Alejandro Colomar
2022-12-12  2:11           ` Zack Weinberg
2022-12-12 10:21             ` Alejandro Colomar
2022-12-14  2:13               ` Zack Weinberg
2022-12-14 10:47                 ` Alejandro Colomar
2022-12-14 11:03                   ` Ian Abbott
2022-12-29  6:42                     ` Zack Weinberg
2022-12-29  6:39                   ` Zack Weinberg
2022-12-29 10:47                     ` Alejandro Colomar
2022-12-29 16:35                       ` Zack Weinberg
2022-12-29 16:39                         ` Alejandro Colomar
2022-12-12 15:22             ` Ian Abbott
2022-12-14  2:18               ` Zack Weinberg
2022-12-14 10:22                 ` Ian Abbott
2022-12-14 10:39                   ` Alejandro Colomar
2022-12-14 10:52                     ` Ian Abbott
2022-12-14 11:23                       ` Alejandro Colomar
2022-12-14 14:10                         ` Ian Abbott
2022-12-14 16:38                         ` Joseph Myers
2022-12-12 10:07       ` Ian Abbott
