* Re: [PATCH] scanf.3: Do not mention the ERANGE error
From: Alejandro Colomar @ 2022-12-09 18:59 UTC
To: Ian Abbott, Alejandro Colomar; +Cc: linux-man, GNU C Library

Hi Ian,

On 12/8/22 13:34, Ian Abbott wrote:
> The `scanf()` function does not intentionally set `errno` to `ERANGE`.
> That is just a side effect of the code that it uses to perform
> conversions. It also does not work as reliably as indicated in the
> 'man' page when the target integer type is narrower than `long`.
> Typically (at least in glibc) for target integer types narrower than
> `long`, the number has to exceed the range of `long` (for signed
> conversions) or `unsigned long` (for unsigned conversions) for `errno`
> to be set to `ERANGE`.
>
> Documenting `ERANGE` in the ERRORS section kind of implies that
> `scanf()` should return `EOF` when an integer overflow is encountered,
> which it doesn't (and doing so would violate the C standard).
>
> Just remove any mention of the `ERANGE` error to avoid confusion.
>
> Fixes: 646af540e467 ("Add an ERRORS section documenting at least some of the errors that may occur for scanf().")
> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
> Signed-off-by: Ian Abbott <abbotti@mev.co.uk>

I see. How about saying something like "it may also fail with any of the errors that the functions used to perform the conversions may fail with"?

Cheers,

Alex

> ---
>  man3/scanf.3 | 7 -------
>  1 file changed, 7 deletions(-)
>
> diff --git a/man3/scanf.3 b/man3/scanf.3
> index ba470a5c1..c5ff59f45 100644
> --- a/man3/scanf.3
> +++ b/man3/scanf.3
> @@ -576,10 +576,6 @@ is NULL.
>  .TP
>  .B ENOMEM
>  Out of memory.
> -.TP
> -.B ERANGE
> -The result of an integer conversion would exceed the size
> -that can be stored in the corresponding integer type.
> .SH ATTRIBUTES > For an explanation of the terms used in this section, see > .BR attributes (7). > @@ -609,9 +605,6 @@ The functions > and > .BR sscanf () > conform to C89 and C99 and POSIX.1-2001. > -These standards do not specify the > -.B ERANGE > -error. > .PP > The > .B q -- <http://www.alejandro-colomar.es/> [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH] scanf.3: Do not mention the ERANGE error
From: Ian Abbott @ 2022-12-09 19:28 UTC
To: Alejandro Colomar, Alejandro Colomar; +Cc: linux-man, GNU C Library

On 09/12/2022 18:59, Alejandro Colomar wrote:
> Hi Ian,
>
> On 12/8/22 13:34, Ian Abbott wrote:
>> The `scanf()` function does not intentionally set `errno` to `ERANGE`.
>> That is just a side effect of the code that it uses to perform
>> conversions. It also does not work as reliably as indicated in the
>> 'man' page when the target integer type is narrower than `long`.
>> Typically (at least in glibc) for target integer types narrower than
>> `long`, the number has to exceed the range of `long` (for signed
>> conversions) or `unsigned long` (for unsigned conversions) for `errno`
>> to be set to `ERANGE`.
>>
>> Documenting `ERANGE` in the ERRORS section kind of implies that
>> `scanf()` should return `EOF` when an integer overflow is encountered,
>> which it doesn't (and doing so would violate the C standard).
>>
>> Just remove any mention of the `ERANGE` error to avoid confusion.
>>
>> Fixes: 646af540e467 ("Add an ERRORS section documenting at least some
>> of the errors that may occur for scanf().")
>> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
>> Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
>
> I see. How about saying something like "it may also fail with any of the
> errors that the functions used to perform the conversions may fail with"?

It depends what you mean by "fail". These errors do not make scanf return EOF.

Technically, the behavior is undefined if the result of the conversion cannot be represented in the object being assigned to by scanf.
(In the case of glibc, that probably results in either the integer object being set to a truncated version of the input integer, or the integer object being set to a truncated version of LONG_MIN or LONG_MAX, depending on the actual number.)

Setting errno to 0 before calling scanf and expecting errno to have a meaningful value when scanf returns something other than EOF is bogus usage.

>
> Cheers,
>
> Alex

Cheers,
Ian

>
>> ---
>>  man3/scanf.3 | 7 -------
>>  1 file changed, 7 deletions(-)
>>
>> diff --git a/man3/scanf.3 b/man3/scanf.3
>> index ba470a5c1..c5ff59f45 100644
>> --- a/man3/scanf.3
>> +++ b/man3/scanf.3
>> @@ -576,10 +576,6 @@ is NULL.
>>  .TP
>>  .B ENOMEM
>>  Out of memory.
>> -.TP
>> -.B ERANGE
>> -The result of an integer conversion would exceed the size
>> -that can be stored in the corresponding integer type.
>>  .SH ATTRIBUTES
>>  For an explanation of the terms used in this section, see
>>  .BR attributes (7).
>> @@ -609,9 +605,6 @@ The functions
>>  and
>>  .BR sscanf ()
>>  conform to C89 and C99 and POSIX.1-2001.
>> -These standards do not specify the
>> -.B ERANGE
>> -error.
>>  .PP
>>  The
>>  .B q
>

-- 
-=( Ian Abbott <abbotti@mev.co.uk> || MEV Ltd. is a company )=-
-=( registered in England & Wales. Regd. number: 02862268. )=-
-=( Regd. addr.: S11 & 12 Building 67, Europa Business Park, )=-
-=( Bird Hall Lane, STOCKPORT, SK3 0XA, UK. || www.mev.co.uk )=-
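Editorial sketch (not part of the thread): Ian's point is that errno means nothing after a scanf call that did not return EOF, and that numeric overflow in scanf is undefined rather than reported. strtol(3), by contrast, is specified to set errno to ERANGE on overflow, so the usual pattern is to clear errno, convert, and then check both the end pointer and errno. The code below is a minimal C sketch that assumes the whole string must be a single decimal int; `parse_int` is a hypothetical helper name. The explicit INT_MIN/INT_MAX comparison is needed because strtol only detects overflow of `long`, which is the same narrower-than-`long` gap Ian describes for glibc's scanf.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse all of s as a decimal int.
 * Returns true and stores the value in *out on success.
 * Returns false (leaving *out untouched) if s has no digits,
 * has trailing characters, or holds a value outside int's range. */
bool parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;                      /* strtol signals overflow only via errno */
    v = strtol(s, &end, 10);
    if (end == s)                   /* no digits were consumed */
        return false;
    if (*end != '\0')               /* junk after the number */
        return false;
    if (errno == ERANGE)            /* out of range for long */
        return false;
    if (v < INT_MIN || v > INT_MAX) /* fits in long but not in int */
        return false;
    *out = (int) v;
    return true;
}
```

For example, `parse_int("99999999999999999999999", &x)` returns false and leaves `x` alone, whereas `sscanf` with `%d` on the same text would have undefined behavior.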
* Re: [PATCH] scanf.3: Do not mention the ERANGE error
From: Alejandro Colomar @ 2022-12-09 19:33 UTC
To: Ian Abbott, Alejandro Colomar; +Cc: linux-man, GNU C Library

Hi Ian,

On 12/9/22 20:28, Ian Abbott wrote:
> On 09/12/2022 18:59, Alejandro Colomar wrote:
>> On 12/8/22 13:34, Ian Abbott wrote:
>>> The `scanf()` function does not intentionally set `errno` to `ERANGE`.
>>> That is just a side effect of the code that it uses to perform
>>> conversions. It also does not work as reliably as indicated in the
>>> 'man' page when the target integer type is narrower than `long`.
>>> Typically (at least in glibc) for target integer types narrower than
>>> `long`, the number has to exceed the range of `long` (for signed
>>> conversions) or `unsigned long` (for unsigned conversions) for `errno`
>>> to be set to `ERANGE`.
>>>
>>> Documenting `ERANGE` in the ERRORS section kind of implies that
>>> `scanf()` should return `EOF` when an integer overflow is encountered,
>>> which it doesn't (and doing so would violate the C standard).
>>>
>>> Just remove any mention of the `ERANGE` error to avoid confusion.
>>>
>>> Fixes: 646af540e467 ("Add an ERRORS section documenting at least some of the
>>> errors that may occur for scanf().")
>>> Cc: Michael Kerrisk <mtk.manpages@gmail.com>
>>> Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
>>
>> I see. How about saying something like "it may also fail with any of the
>> errors that the functions used to perform the conversions may fail with"?
>
> It depends what you mean by "fail". These errors do not make scanf return EOF.

Just to clarify. Does scanf(3) _never_ fail (EOF) due to ERANGE? Or is it that ERANGE sometimes makes it fail and sometimes not?

If it's the former, I agree with your patch.
When a function hasn't reported failure, errno is unspecified.

If it's the latter, I'd write something about it.

> Technically, the behavior is undefined if the result of the conversion cannot be
> represented in the object being assigned to by scanf. (In the case of glibc,
> that probably results in either the integer object being set to a truncated
> version of the input integer, or the integer object being set to a truncated
> version of LONG_MIN or LONG_MAX, depending on the actual number.)

Hmm, UB. Under UB, anything can change, so error reporting is already unreliable. If EOF+ERANGE can _only_ happen under UB, I'd rather remove the paragraph. Please confirm.

>
> Setting errno to 0 before calling scanf and expecting errno to have a meaningful
> value when scanf returns something other than EOF is bogus usage.

Yep, that's bogus.

Cheers,

Alex
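Editorial sketch (not part of the thread): since errno is only meaningful after a reported failure, what a scanf call actually did has to be read from its return value, which is EOF for an input failure before the first conversion and otherwise the number of successful assignments. The sketch below classifies the result of a two-field scan; `scan_pair` and the `scan_result` names are hypothetical. It deliberately ignores errno, and it cannot guard against the undefined behavior discussed above if a number overflows `int`.

```c
#include <stdio.h>

/* Hypothetical result codes for a two-integer scan. */
enum scan_result { SCAN_OK, SCAN_PARTIAL, SCAN_INPUT_FAILURE };

/* Scan two ints from s and classify the outcome using only the
 * return value of sscanf:
 *   EOF  -> input failure before the first conversion
 *   0, 1 -> matching failure after that many assignments
 *   2    -> both conversions succeeded
 * Caveat: if a number does not fit in int, the behavior is still
 * undefined; nothing here can detect that. */
enum scan_result scan_pair(const char *s, int *a, int *b)
{
    int n = sscanf(s, "%d %d", a, b);

    if (n == EOF)
        return SCAN_INPUT_FAILURE;
    if (n < 2)
        return SCAN_PARTIAL;
    return SCAN_OK;
}
```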
* Re: [PATCH] scanf.3: Do not mention the ERANGE error
From: Zack Weinberg @ 2022-12-09 21:41 UTC
To: libc-alpha, 'linux-man'

On 2022-12-09 2:33 PM, Alejandro Colomar via Libc-alpha wrote:
>> Technically, the behavior is undefined if the result of the conversion
>> cannot be represented in the object being assigned to by scanf. (In
>> the case of glibc, that probably results in either the integer object
>> being set to a truncated version of the input integer, or the integer
>> object being set to a truncated version of LONG_MIN or LONG_MAX,
>> depending on the actual number.)
>
> Hmm, UB. Under UB, anything can change, so error reporting is already
> unreliable. If EOF+ERANGE can _only_ happen under UB, I'd rather remove
> the paragraph. Please confirm.

BUGS

The `scanf` functions have undefined behavior if numeric input overflows. This means it is *impossible* to detect malformed input reliably using these functions.

Many input specifications (e.g. `%s`, `%[^\n]`) read a sequence of characters into a destination buffer whose size is unspecified; any use of such specifications renders `scanf` every bit as dangerous as `gets`.

Best practice is not to use any of these functions at all.

zw (no, this is not a joke)
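Editorial sketch (not part of the thread): the standard mitigation for an unbounded `%s` is a field width, which has to be written as a decimal literal inside the format string. One common way to keep that literal in sync with the buffer size is to derive both from one macro and stringize it with the usual two-level preprocessor trick. This is a general C technique rather than something proposed in the thread; `WORD_MAX` and `read_word` are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical limit: maximum word length, not counting the NUL. */
#define WORD_MAX 15

/* Two-level stringization so STR(WORD_MAX) expands to "15". */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Read at most WORD_MAX non-whitespace characters from s into word,
 * which must have room for WORD_MAX + 1 bytes.  The format string
 * below is assembled at compile time as "%15s".  Any excess
 * characters of a longer word are simply left unread.
 * Returns sscanf's result: 1 on success, EOF on empty input. */
int read_word(const char *s, char word[WORD_MAX + 1])
{
    return sscanf(s, "%" STR(WORD_MAX) "s", word);
}
```

Changing `WORD_MAX` updates both the buffer size expected by callers and the width inside the format, which removes the out-of-sync risk for this one conversion.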
* Re: [PATCH] scanf.3: Do not mention the ERANGE error
From: Alejandro Colomar @ 2022-12-11 15:58 UTC
To: Zack Weinberg, libc-alpha, 'linux-man'; +Cc: Ian Abbott

[CC += Ian]

Hi Zack,

On 12/9/22 22:41, Zack Weinberg via Libc-alpha wrote:
> On 2022-12-09 2:33 PM, Alejandro Colomar via Libc-alpha wrote:
>>> Technically, the behavior is undefined if the result of the conversion cannot
>>> be represented in the object being assigned to by scanf. (In the case of
>>> glibc, that probably results in either the integer object being set to a
>>> truncated version of the input integer, or the integer object being set to a
>>> truncated version of LONG_MIN or LONG_MAX, depending on the actual number.)
>>
>> Hmm, UB. Under UB, anything can change, so error reporting is already
>> unreliable. If EOF+ERANGE can _only_ happen under UB, I'd rather remove the
>> paragraph. Please confirm.
>
> BUGS
>
> The `scanf` functions have undefined behavior if numeric input overflows. This
> means it is *impossible* to detect malformed input reliably using these functions.
>
> Many input specifications (e.g. `%s`, `%[^\n]`) read a sequence of characters
> into a destination buffer whose size is unspecified; any use of such
> specifications renders `scanf` every bit as dangerous as `gets`.

Thanks for the reminder! Since I don't use these functions, I don't remember how bad they are :)

>
> Best practice is not to use any of these functions at all.
>
> zw (no, this is not a joke)

I'm inclined to add that to that manual page. Is there anything that can be saved from that page, or should we burn it all? To be more specific:

- Are there any functions in that page that are still useful for any corner cases, or are they all useless?
- Are there any conversion specifiers that can be used safely?

Or the converse questions:

- Which conversion specifiers (or modifiers) are as impossible to use safely as gets(3) and should therefore be marked as deprecated in the manual page (and probably warned about in GCC)?
- Which functions in that page are impossible to use safely and should therefore be marked as deprecated?

Would you please mark them as [[deprecated]] in glibc too? This is not essential to me, since I can mark them as deprecated in the manual pages without that happening, but it'd help.

Cheers,

Alex
* Re: [PATCH] scanf.3: Do not mention the ERANGE error
From: Alejandro Colomar @ 2022-12-11 16:03 UTC
To: Zack Weinberg, libc-alpha, 'linux-man'; +Cc: Ian Abbott

On 12/11/22 16:58, Alejandro Colomar wrote:
> [CC += Ian]
>
> Hi Zack,
>
> On 12/9/22 22:41, Zack Weinberg via Libc-alpha wrote:
>> On 2022-12-09 2:33 PM, Alejandro Colomar via Libc-alpha wrote:
>>>> Technically, the behavior is undefined if the result of the conversion
>>>> cannot be represented in the object being assigned to by scanf. (In the
>>>> case of glibc, that probably results in either the integer object being set
>>>> to a truncated version of the input integer, or the integer object being set
>>>> to a truncated version of LONG_MIN or LONG_MAX, depending on the actual
>>>> number.)
>>>
>>> Hmm, UB. Under UB, anything can change, so error reporting is already
>>> unreliable. If EOF+ERANGE can _only_ happen under UB, I'd rather remove the
>>> paragraph. Please confirm.
>>
>> BUGS
>>
>> The `scanf` functions have undefined behavior if numeric input overflows.
>> This means it is *impossible* to detect malformed input reliably using these
>> functions.
>>
>> Many input specifications (e.g. `%s`, `%[^\n]`) read a sequence of characters
>> into a destination buffer whose size is unspecified; any use of such
>> specifications renders `scanf` every bit as dangerous as `gets`.
>
> Thanks for the reminder! Since I don't use these functions, I don't remember
> how bad they are :)
>
>>
>> Best practice is not to use any of these functions at all.
>>
>> zw (no, this is not a joke)
>
> I'm inclined to add that to that manual page. Is there anything that can be
> saved from that page, or should we burn it all?
> To be more specific:
>
> - Are there any functions in that page that are still useful for any corner
> cases, or are they all useless?
> - Are there any conversion specifiers that can be used safely?
>
> Or the converse questions:
>
> - Which conversion specifiers (or modifiers) are as impossible to use safely as
> gets(3) and should therefore be marked as deprecated in the manual page (and
> probably warned about in GCC)?
> - Which functions in that page are impossible to use safely and should
> therefore be marked as deprecated?

This includes functions that can only be used safely for the most basic behavior, for which functions such as fgets(3) are superior. I'd also deprecate those.

>
> Would you please mark them as [[deprecated]] in glibc too? This is not
> essential to me, since I can mark them as deprecated in the manual pages without
> that happening, but it'd help.
>
> Cheers,
>
> Alex
* Re: [PATCH] scanf.3: Do not mention the ERANGE error
From: Zack Weinberg @ 2022-12-12 2:11 UTC
To: Alejandro Colomar; +Cc: libc-alpha, 'linux-man', Ian Abbott

Alejandro Colomar <alx.manpages@gmail.com> writes:
> On 12/9/22 22:41, Zack Weinberg via Libc-alpha wrote:
>> The `scanf` functions have undefined behavior if numeric input
>> overflows. This means it is *impossible* to detect malformed input
>> reliably using these functions.
>> Many input specifications (e.g. `%s`, `%[^\n]`) read a sequence of
>> characters into a destination buffer whose size is unspecified; any
>> use of such specifications renders `scanf` every bit as dangerous as
>> `gets`.
…
>> Best practice is not to use any of these functions at all.
…
> I'm inclined to add that to that manual page. Is there anything that
> can be saved from that page, or should we burn it all? To be more
> specific:
>
> - Are there any functions in that page that are still useful for any
> corner cases, or are they all useless?
> - Are there any conversion specifiers that can be used safely?

Hmm, this turns out to be a bit of a rabbit hole.

There are two major design-level problems with the scanf family. The more important one is that string input conversions (%s, %[…]) will read an unlimited number of characters by default, oblivious to the size of the destination buffer: exactly the same design flaw as ‘gets’. They do stop scanning at _any_ whitespace (not just \n) so, if you’re trying to craft exploit code, there are more byte values that must be avoided, but this is only a minor obstacle.
They can, however, be used safely, either by supplying a field width that accurately reflects the size of the destination buffer, or by using the ‘m’ modifier (a POSIX extension), which directs scanf to allocate the right amount of space for the string with malloc.

(Field widths are awkward to use because you have to write them as decimal constants _inside the format string_, which makes them more likely to get out of sync with the actual size of the buffer than, say, the buffer-size argument to ‘fgets’, but this is not a fatal flaw in and of itself.)

The other design-level issue affects all of the numeric conversions: if the result of (abstract, infinite-precision) numeric input conversion does not fit in the variable supplied to hold the result of that conversion, the behavior is undefined. The manpage says that you get an ERANGE error in this case, and that may be the behavior _glibc_ guarantees (I don’t actually know for sure), but in the modern era of compilers drawing inferences from undefined behavior, a guarantee by one C library is not good enough.

That covers everything except %c and %n, which are safe but somewhat pointless in isolation.

> Or the converse questions:
>
> - Which conversion specifiers (or modifiers) are as impossible to use
> safely as gets(3) and should therefore be marked as deprecated in
> the manual page (and probably warned about in GCC)?
> - Which functions in that page are impossible to use safely and
> should therefore be marked as deprecated?
>
> Would you please mark them as [[deprecated]] in glibc too?

I don’t think glibc should unilaterally deprecate any function that’s specified by ISO C.
And, the scanf family *can* be used safely with sufficient care (read entire lines of input with getline, then split them up into fields with sscanf using only %ms and %m[…], and finally parse all numeric fields by hand with strtoX); the issue is more that, if you limit yourself to the set of scanf operations that are 100% safe, you’re left only with stuff that is arguably *easier* to do with <string.h> and <regex.h> functions.

In a more sober tone of voice I suggest this text for the manpage:

.SH BUGS
By default, the
.IR %s " and " %[
conversions will read an
.I unlimited
number of characters from the input.
In this mode they are just as unsafe as the infamous
.BR gets (3).
One should always specify either a field width,
or the
.I m
modifier,
with every use of
.IR %s " or " %[ .
.PP
If a numeric input conversion produces a value
that is not representable in the type of the corresponding argument
(e.g. if 99999 is to be stored in an
.IR "unsigned short" ),
ISO C says that the behavior is undefined.
The GNU C Library guarantees to treat this condition as a
.IR "matching failure" ,
but portable code should avoid using the numeric conversions.

I also suggest that GCC should add diagnostics to -Wformat and/or -Wformat-security to catch use of %s and %[ with no size specified; if glibc doesn’t already treat numeric overflow as a matching failure, it should be changed to do so; and maybe someone should write up a proposal for the C standard to make the same change.

zw
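Editorial sketch (not part of the thread): Zack's three-step recipe (getline, then sscanf with only `%ms`, then strtoX by hand) might look like the following. It assumes each input line has the form `NAME NUMBER`; `read_record` and that line format are hypothetical. The `m` modifier is POSIX.1-2008, not ISO C, so this may not work with a C library that lacks it.

```c
#define _POSIX_C_SOURCE 200809L  /* for getline(); must precede all includes */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper following the recipe:
 *   1. getline() reads one whole line, however long;
 *   2. sscanf() splits it using only %ms, so each field gets a
 *      malloc'd buffer of exactly the right size;
 *   3. strtol() parses the numeric field with overflow checking.
 * Extra fields after the number are ignored in this sketch.
 * Returns 0 on success (caller frees *name), 1 on a malformed line,
 * -1 at end of input or on a read error. */
int read_record(FILE *in, char **name, long *value)
{
    char *line = NULL, *num = NULL, *end;
    size_t cap = 0;
    int rc = 1;

    if (getline(&line, &cap, in) == -1) {
        free(line);              /* POSIX: free the buffer even on failure */
        return -1;
    }

    *name = NULL;
    if (sscanf(line, "%ms %ms", name, &num) == 2) {
        long v;

        errno = 0;
        v = strtol(num, &end, 10);
        if (end != num && *end == '\0' && errno != ERANGE) {
            *value = v;
            rc = 0;
        }
    }
    if (rc != 0) {               /* discard a partially parsed record */
        free(*name);
        *name = NULL;
    }
    free(num);
    free(line);
    return rc;
}
```

No fixed-size buffer appears anywhere, and numeric overflow is reported through strtol's documented ERANGE rather than left to scanf's undefined behavior.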
* Re: [PATCH] scanf.3: Do not mention the ERANGE error
From: Alejandro Colomar @ 2022-12-12 10:21 UTC
To: Zack Weinberg; +Cc: libc-alpha, 'linux-man', Ian Abbott

Hi Zack!

On 12/12/22 03:11, Zack Weinberg wrote:
> Alejandro Colomar <alx.manpages@gmail.com> writes:
>> On 12/9/22 22:41, Zack Weinberg via Libc-alpha wrote:
>>> The `scanf` functions have undefined behavior if numeric input
>>> overflows. This means it is *impossible* to detect malformed input
>>> reliably using these functions.
>>> Many input specifications (e.g. `%s`, `%[^\n]`) read a sequence of
>>> characters into a destination buffer whose size is unspecified; any
>>> use of such specifications renders `scanf` every bit as dangerous as
>>> `gets`.
> …
>>> Best practice is not to use any of these functions at all.
> …
>> I'm inclined to add that to that manual page. Is there anything that
>> can be saved from that page, or should we burn it all? To be more
>> specific:
>>
>> - Are there any functions in that page that are still useful for any
>> corner cases, or are they all useless?
>> - Are there any conversion specifiers that can be used safely?
>
> Hmm, this turns out to be a bit of a rabbit hole.
>
> There are two major design-level problems with the scanf family.
> The more important one is that string input conversions (%s, %[…])
> will read an unlimited number of characters by default, oblivious to
> the size of the destination buffer: exactly the same design flaw as
> ‘gets’. They do stop scanning at _any_ whitespace (not just \n) so,
> if you’re trying to craft exploit code, there are more byte values that
> must be avoided, but this is only a minor obstacle.
> They can, however,
> be used safely, either by supplying a field width that accurately
> reflects the size of the destination buffer, or by using the ‘m’
> modifier (a POSIX extension), which directs scanf to allocate the
> right amount of space for the string with malloc.

Okay, so %s and %[ are at least usable. Useful? I don't know. Probably fgets(3) and then either <string.h> or <regex.h> functions or taking unterminated strings (pointer plus length) is a much better idea. But that's not enough to deprecate the specifiers; it's probably better to warn in GCC.

>
> (Field widths are awkward to use because you have to write them as
> decimal constants _inside the format string_, which makes them more
> likely to get out of sync with the actual size of the buffer than,
> say, the buffer-size argument to ‘fgets’, but this is not a fatal
> flaw in and of itself.)

Yeah; it's an almost useless feature, but not a fatal flaw. Any programmer would probably intuitively know that it's bad. So, no action is required here.

>
> The other design-level issue affects all of the numeric conversions:
> if the result of (abstract, infinite-precision) numeric input conversion
> does not fit in the variable supplied to hold the result of that conversion,
> the behavior is undefined. The manpage says that you get an ERANGE error
> in this case, and that may be the behavior _glibc_ guarantees (I don’t
> actually know for sure), but in the modern era of compilers drawing
> inferences from undefined behavior, a guarantee by one C library is
> not good enough.

This, to me, is enough to mark them as deprecated in the manual page. Anyway, deprecating something is not removing it. It's just saying "hey, you shouldn't be using that; it's bad, and don't expect that ISO C will keep it around next century". Something that results in undefined behavior outside the programmer's control is as bad as gets(3) (okay, not as bad; gets(3) was simply a red carpet for attacks, but fundamentally, yes).
So, I'll apply the diff shown at the bottom of this email.

>
> That covers everything except %c and %n, which are safe but somewhat
> pointless in isolation.
>
>> Or the converse questions:
>>
>> - Which conversion specifiers (or modifiers) are as impossible to use
>> safely as gets(3) and should therefore be marked as deprecated in
>> the manual page (and probably warned about in GCC)?
>> - Which functions in that page are impossible to use safely and
>> should therefore be marked as deprecated?
>>
>> Would you please mark them as [[deprecated]] in glibc too?
>
> I don’t think glibc should unilaterally deprecate any function that’s
> specified by ISO C.

Okay. The man-pages are not that restricted, since they don't affect code at all, only the minds of programmers, which is more powerful. So, I'll mark at least the integer conversion specifiers as deprecated.

> And, the scanf family *can* be used safely with
> sufficient care (read entire lines of input with getline,

If getline(3) _is necessary_ to be safe, then I would deprecate the stream functions, and keep only the "s" variants. Is it?

In fact, I'd say that even if it's not necessary to be safe, there are no good reasons to use [f]scanf(3) at all. I'm very much considering deprecating them in the manual page.

> then split
> them up into fields with sscanf using only %ms and %m[…], and finally
> parse all numeric fields by hand with strtoX); the issue is more that,
> if you limit yourself to the set of scanf operations that are 100% safe,
> you’re left only with stuff that is arguably *easier* to do with <string.h>
> and <regex.h> functions.
>
> In a more sober tone of voice I suggest this text for the manpage:
>
> .SH BUGS
> By default, the
> .IR %s " and " %[
> conversions will read an
> .I unlimited
> number of characters from the input.
> In this mode they are just as unsafe as the infamous
> .BR gets (3).
> One should always specify either a field width,
> or the
> .I m
> modifier,
> with every use of
> .IR %s " or " %[ .
> .PP
> If a numeric input conversion produces a value
> that is not representable in the type of the corresponding argument
> (e.g. if 99999 is to be stored in an
> .IR "unsigned short" ),
> ISO C says that the behavior is undefined.
> The GNU C Library guarantees to treat this condition as a
> .IR "matching failure" ,
> but portable code should avoid using the numeric conversions.

That makes sense to me. Would you mind sending a patch? :)

>
> I also suggest that GCC should add diagnostics to -Wformat and/or
> -Wformat-security to catch use of %s and %[ with no size specified;
> if glibc doesn’t already treat numeric overflow as a matching failure,
> it should be changed to do so; and maybe someone should write up a
> proposal for the C standard to make the same change.

Acked-by: Alejandro Colomar <alx@kernel.org>

>
> zw

Cheers,

Alex

---
diff --git a/man3/scanf.3 b/man3/scanf.3
index ba470a5c1..0041d5573 100644
--- a/man3/scanf.3
+++ b/man3/scanf.3
@@ -386,6 +386,7 @@ .SS Conversions
 and assignment does not occur.
 .TP
 .B d
+.IR Deprecated .
 Matches an optionally signed decimal integer;
 the next pointer must be a pointer to
 .IR int .
@@ -400,6 +401,7 @@ .SS Conversions
 .\" is silently ignored, causing old programs to fail mysteriously.)
 .TP
 .B i
+.IR Deprecated .
 Matches an optionally signed integer; the next pointer must be a pointer to
 .IR int .
 The integer is read in base 16 if it begins with
@@ -412,15 +414,18 @@ .SS Conversions
 Only characters that correspond to the base are used.
 .TP
 .B o
+.IR Deprecated .
 Matches an unsigned octal integer; the next pointer must be a pointer to
 .IR "unsigned int" .
 .TP
 .B u
+.IR Deprecated .
 Matches an unsigned decimal integer; the next pointer must be a pointer to
 .IR "unsigned int" .
 .TP
 .B x
+.IR Deprecated .
 Matches an unsigned hexadecimal integer
 (that may optionally begin with a prefix of
 .I 0x
@@ -431,27 +436,33 @@ .SS Conversions
 .IR "unsigned int" .
 .TP
 .B X
+.IR Deprecated .
 Equivalent to
 .BR x .
 .TP
 .B f
+.IR Deprecated .
 Matches an optionally signed floating-point number; the next pointer must be a pointer to
 .IR float .
 .TP
 .B e
+.IR Deprecated .
 Equivalent to
 .BR f .
 .TP
 .B g
+.IR Deprecated .
 Equivalent to
 .BR f .
 .TP
 .B E
+.IR Deprecated .
 Equivalent to
 .BR f .
 .TP
 .B a
+.IR Deprecated .
 (C99) Equivalent to
 .BR f .
 .TP
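Editorial sketch (not part of the thread): Alejandro's alternative of taking unterminated strings (pointer plus length) needs only <string.h>. strspn(3) skips the separators and strcspn(3) measures the field, so nothing is copied and there is no destination buffer to overflow. `next_field` is a hypothetical helper, and the whitespace separator set is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the next whitespace-delimited field in
 * *cursor.  Returns a pointer to its first character and stores its
 * length in *len, without copying or NUL-terminating it, then
 * advances *cursor past the field.  Returns NULL when no field is left. */
const char *next_field(const char **cursor, size_t *len)
{
    static const char ws[] = " \t\r\n\v\f";
    const char *start = *cursor + strspn(*cursor, ws);  /* skip separators */

    if (*start == '\0')
        return NULL;
    *len = strcspn(start, ws);      /* length up to the next separator */
    *cursor = start + *len;
    return start;
}
```

A caller compares or converts each field with length-aware functions such as strncmp(3), or copies it into a buffer it sized from `len`.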
* Re: [PATCH] scanf.3: Do not mention the ERANGE error
From: Zack Weinberg @ 2022-12-14 2:13 UTC
To: Alejandro Colomar; +Cc: libc-alpha, 'linux-man', Ian Abbott

Alejandro Colomar <alx.manpages@gmail.com> writes:
> Okay, so %s and %[ are at least usable. Useful? I don't know. Probably
> fgets(3) and then either <string.h> or <regex.h> functions or taking
> unterminated strings (pointer plus length) is a much better idea.

Yeah, agreed.

>> The other design-level issue affects all of the numeric conversions:
>> if the result of (abstract, infinite-precision) numeric input conversion
>> does not fit in the variable supplied to hold the result of that conversion,
>> the behavior is undefined. The manpage says that you get an ERANGE error
>> in this case, and that may be the behavior _glibc_ guarantees (I don’t
>> actually know for sure), but in the modern era of compilers drawing
>> inferences from undefined behavior, a guarantee by one C library is
>> not good enough.
>
> This, to me, is enough to mark them as deprecated in the manual page. Anyway,
> deprecating something is not removing it. It's just saying "hey, you shouldn't
> be using that; it's bad, and don't expect that ISO C will keep it around next
> century".
In my lexicon “deprecated” is a very strong statement, possibly because I’m used to seeing it in the context of standards, where it means “we think we should never have added this in the first place, there’s no plausible way to fix it, but we have to keep it around for backward compatibility.”

The scanf numeric conversions could be fixed with a one-sentence edit to the C standard: change the last sentence of http://port70.net/~nsz/c/c11/n1570.html#7.21.6.2p10 from

“If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined”

to

“If this object does not have an appropriate type, the behavior is undefined. If the result of the conversion cannot be represented in the object, the execution of the directive fails; this condition is a matching failure.”

And, even if the C committee doesn’t want to make that change, open-source C libraries can and should do it unilaterally, as a documented implementation extension. I think that’s a better plan than declaring most uses of *scanf “deprecated.”

>> And, the scanf family *can* be used safely with
>> sufficient care (read entire lines of input with getline,
>
> If getline(3) _is necessary_ to be safe, then I would deprecate the stream
> functions, and keep only the "s" variants. Is it?

Oh, right, the _third_ headache with fscanf. Yes, I think it would be fair to say that it is almost always a mistake to use the scanf variants that read directly from a FILE.

The issue here is, at its root, that people new to C _expect_ a scanf call to read an entire line of input, but it doesn’t. This is especially problematic for interactive input: they try to use plain scanf to read numeric input, don’t realize that `scanf("%d", &arg)` doesn’t consume the \n in the terminal’s line buffer _after_ the number, and get very confused when a subsequent getchar() reads that \n instead of the ‘y’ or ‘n’ they were expecting as a response to the _next_ prompt.
But it’s _also_ a problem for error recovery, because scanf will stop in the middle of the line when a matching failure occurs, and if you naively assumed it would throw away the rest of the line, you get an error cascade. The recommended practice to avoid this trap, is that you should use one of the functions that _does_ read an entire line of input, i.e. fgets or getline, and then parse the line as a string. It would make sense for the [f]scanf manpage to say that. >> In a more sober tone of voice I suggest this text for the manpage: … > That makes sense to me. Would you mind sending a patch? :) I do not have time to do that anytime soon. Also, maybe glibc’s behavior on numeric input overflow should be fixed first. zw
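A minimal sketch of the line-at-a-time pattern recommended above, assuming a fixed 256-byte line buffer. The helpers `read_line`, `read_int_line` and `read_answer_line` are hypothetical names, not part of any library. Each call reads a whole line with fgets(3) and then parses it with sscanf(3), so the newline after a number is consumed together with the number and the following y/n prompt sees the user's answer rather than a leftover '\n'. Note that `%d` in sscanf(3) still has undefined behaviour on overflow, which is the separate problem discussed in this thread.

```c
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 256  /* assumed line buffer size */

/* Read one line into buf.  If the line was longer than the buffer,
 * discard the rest of it so the next read starts on a fresh line.
 * Returns 0 on success, EOF on end of input or read error. */
static int read_line(FILE *in, char *buf, int size)
{
    if (fgets(buf, size, in) == NULL)
        return EOF;
    if (strchr(buf, '\n') == NULL) {
        int c;
        while ((c = getc(in)) != EOF && c != '\n')
            ;  /* drop the unread tail of an overlong line */
    }
    return 0;
}

/* Parse a single int from the next line.  Returns 1 on success,
 * 0 if the line does not start with a number, EOF at end of input. */
static int read_int_line(FILE *in, int *out)
{
    char line[LINE_MAX_LEN];

    if (read_line(in, line, sizeof line) == EOF)
        return EOF;
    return sscanf(line, "%d", out) == 1 ? 1 : 0;
}

/* Return the first non-blank character of the next line (e.g. 'y'
 * or 'n'), 0 for a blank line, or EOF at end of input. */
static int read_answer_line(FILE *in)
{
    char line[LINE_MAX_LEN];
    char c;

    if (read_line(in, line, sizeof line) == EOF)
        return EOF;
    return sscanf(line, " %c", &c) == 1 ? (unsigned char)c : 0;
}
```

For comparison, calling `fscanf(in, "%d", &v)` on the input "42\ny\n" leaves the '\n' after 42 unread, so a following `getc(in)` returns '\n' instead of 'y'.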
* Re: [PATCH] scanf.3: Do not mention the ERANGE error 2022-12-14 2:13 ` Zack Weinberg @ 2022-12-14 10:47 ` Alejandro Colomar 2022-12-14 11:03 ` Ian Abbott 2022-12-29 6:39 ` Zack Weinberg 0 siblings, 2 replies; 25+ messages in thread From: Alejandro Colomar @ 2022-12-14 10:47 UTC (permalink / raw) To: Zack Weinberg; +Cc: libc-alpha, 'linux-man', Ian Abbott Hi Zack, On 12/14/22 03:13, Zack Weinberg wrote: > Alejandro Colomar <alx.manpages@gmail.com> writes: >> Okay, so %s and %[ are at least usable. Useful? I don't know. Probably >> fgets(3) and then either <string.h> or <regex.h> functions or taking >> unterminated strings (pointer plus length) is a much better idea. > > Yeah, agreed. > >>> The other design-level issue affects all of the numeric conversions: >>> if the result of (abstract, infinite-precision) numeric input conversion >>> does not fit in the variable supplied to hold the result of that conversion, >>> the behavior is undefined. The manpage says that you get an ERANGE error >>> in this case, and that may be the behavior _glibc_ guarantees (I don’t >>> actually know for sure), but in the modern era of compilers drawing >>> inferences from undefined behavior, a guarantee by one C library is >>> not good enough. >> >> This, to me, is enough to mark them as deprecated in the manual page. Anyway, >> deprecating something is not removing it. It's just saying "hey, you shouldn't >> be using that; it's bad, and don't expect that ISO C will keep it around next >> century".
> > In my lexicon “deprecated” is a very strong statement, possibly because > I’m used to seeing it in the context of standards where it means “we > think we should never have added this in the first place, there’s no > plausible way to fix it, but we have to keep it around for backward > compatibility.” > > The scanf numeric conversions could be fixed with a one-sentence edit to > the C standard: change the last sentence of http://port70.net/~nsz/c/c11/n1570.html#7.21.6.2p10 > from “If this object does not have an appropriate type, or if the result > of the conversion cannot be represented in the object, the behavior is > undefined” to “If this object does not have an appropriate type, the > behavior is undefined. If the result of the conversion cannot be > represented in the object, the execution of the directive fails; this > condition is a matching failure.” And, even if the C committee doesn’t > want to make that change, open-source C libraries can and should do it > unilaterally, as a documented implementation extension. I think that’s > a better plan than declaring most uses of *scanf “deprecated.” Yeah, if you have plans to fix it, I'm fine removing the deprecation now. :) > >>> And, the scanf family *can* be used safely with >>> sufficient care — read entire lines of input with getline, >> >> If getline(3) _is necessary_ to be safe, then I would deprecate the stream >> functions, and keep only the "s" variants. Is it? > > Oh, right, the _third_ headache with fscanf. > > Yes, I think it would be fair to say that it is almost always a mistake > to use the scanf variants that read directly from a FILE. The issue > here is, at its root, that people new to C _expect_ a scanf call to read > an entire line of input, but it doesn’t. 
This is especially problematic > for interactive input — they try to use plain scanf to read numeric > input, don’t realize that `scanf("%d", &arg)` doesn’t consume the \n in > the terminal’s line buffer _after_ the number, and get very confused > when a subsequent getchar() reads that \n instead of the ‘y’ or ‘n’ they > were expecting as a response to the _next_ prompt. But it’s _also_ a > problem for error recovery, because scanf will stop in the middle of the > line when a matching failure occurs, and if you naively assumed it would > throw away the rest of the line, you get an error cascade. > > The recommended practice to avoid this trap, is that you should use one > of the functions that _does_ read an entire line of input, i.e. fgets or > getline, and then parse the line as a string. It would make sense for > the [f]scanf manpage to say that. Please clarify; do you consider [v][f]scanf something that "we think we should never have added this in the first place, there’s no plausible way to fix it, but we have to keep it around for backward compatibility"? > >>> In a more sober tone of voice I suggest this text for the manpage: > … >> That makes sense to me. Would you mind sending a patch? :) > > I do not have time to do that anytime soon. Also, maybe glibc’s > behavior on numeric input overflow should be fixed first. That also makes sense ;) In short: (1) Numeric conversion specifiers are broken but can be fixed, and you plan to fix them. (1.1) I'll revert the deprecation warning now; since they are only broken because the _current_ standard and implementations are broken, but not by inherent design problems. (1.2) When you fix the implementation to not be UB anymore, it will also make sense to revert the patch that removed the ERANGE error, since you'll need to report it. (2) For the string conversion specifiers, there are ways to use them safely, and you plan to add a way to specify a size at runtime to the function, so it will be even better in the future. 
No action required. (3) [v][f]scanf seem to be really broken by design. Please confirm. Cheers, Alex > > zw -- <http://www.alejandro-colomar.es/>
* Re: [PATCH] scanf.3: Do not mention the ERANGE error 2022-12-14 10:47 ` Alejandro Colomar @ 2022-12-14 11:03 ` Ian Abbott 2022-12-29 6:42 ` Zack Weinberg 2022-12-29 6:39 ` Zack Weinberg 1 sibling, 1 reply; 25+ messages in thread From: Ian Abbott @ 2022-12-14 11:03 UTC (permalink / raw) To: Alejandro Colomar, Zack Weinberg; +Cc: libc-alpha, 'linux-man' On 2022-12-14 10:47, Alejandro Colomar wrote: > On 12/14/22 03:13, Zack Weinberg wrote: >> Alejandro Colomar <alx.manpages@gmail.com> writes: >>>> In a more sober tone of voice I suggest this text for the manpage: >> … >>> That makes sense to me. Would you mind sending a patch? :) >> >> I do not have time to do that anytime soon. Also, maybe glibc’s >> behavior on numeric input overflow should be fixed first. > > That also makes sense ;) > > In short: > > (1) Numeric conversion specifiers are broken but can be fixed, and you > plan to fix them. > > (1.1) I'll revert the deprecation warning now; since they are > only broken because the _current_ standard and implementations are > broken, but not by inherent design problems. > > (1.2) When you fix the implementation to not be UB anymore, it > will also make sense to revert the patch that removed the ERANGE error, > since you'll need to report it. And would ERANGE cause scanf to return EOF in the fixed implementation? That seems like it would break a lot of existing code (even though it is currently UB). It would probably be better to silently set errno to ERANGE without returning EOF, and to set the integer object's value to the maximum or minimum value for its type (as it currently does for signed/unsigned long). -- -=( Ian Abbott <abbotti@mev.co.uk> || MEV Ltd. is a company )=- -=( registered in England & Wales. Regd. number: 02862268. )=- -=( Regd. addr.: S11 & 12 Building 67, Europa Business Park, )=- -=( Bird Hall Lane, STOCKPORT, SK3 0XA, UK. || www.mev.co.uk )=-
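The two overflow behaviours discussed in this thread, Zack's "the directive fails; this is a matching failure" and Ian's "set errno to ERANGE and clamp to the type's limit", can both be emulated in application code today without relying on undefined behaviour. The trick is to parse with strtol(3), whose overflow handling is specified: it returns LONG_MAX or LONG_MIN and sets errno to ERANGE. The sketch below narrows the result to int under either policy. `parse_int` and `enum overflow_policy` are hypothetical names for illustration only; this is not how glibc's scanf is implemented, and it leaves the EOF-versus-short-count question open by simply returning 0 on failure.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical policies for a decimal value that does not fit in int. */
enum overflow_policy {
    OVERFLOW_FAIL,   /* report it like a matching failure */
    OVERFLOW_CLAMP   /* store INT_MIN or INT_MAX and set errno = ERANGE */
};

/* Parse an optionally signed decimal int at the start of s, skipping
 * leading white space as %d does.  Returns 1 if a value was stored in
 * *out, 0 otherwise.  On success, *end (if not NULL) points just past
 * the digits that were consumed. */
static int parse_int(const char *s, int *out, const char **end,
                     enum overflow_policy policy)
{
    char *p;
    long v;
    int overflow;

    errno = 0;
    v = strtol(s, &p, 10);
    if (p == s)
        return 0;  /* no digits at all: an ordinary matching failure */

    /* strtol() sets ERANGE when the value does not fit in long; the
     * explicit comparisons catch values that fit in long but not int. */
    overflow = errno == ERANGE || v > INT_MAX || v < INT_MIN;
    if (overflow) {
        if (policy == OVERFLOW_FAIL)
            return 0;
        v = v < 0 ? INT_MIN : INT_MAX;
        errno = ERANGE;
    }
    *out = (int)v;
    if (end != NULL)
        *end = p;
    return 1;
}
```

The clamping branch mirrors what strtol(3) itself already does for long, which is the behaviour described above for signed/unsigned long targets; the failing branch corresponds to the proposed wording for the C standard.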
* Re: [PATCH] scanf.3: Do not mention the ERANGE error 2022-12-14 11:03 ` Ian Abbott @ 2022-12-29 6:42 ` Zack Weinberg 0 siblings, 0 replies; 25+ messages in thread From: Zack Weinberg @ 2022-12-29 6:42 UTC (permalink / raw) To: Ian Abbott; +Cc: Alejandro Colomar, libc-alpha, 'linux-man' On Wed, 14 Dec 2022 06:03:07 -0500, Ian Abbott wrote: > And would ERANGE cause scanf to return EOF in the fixed > implementation? In the counterfactual universe where I actually patch scanf, yes. > That seems like it would break a lot of existing code > (even though it is currently UB). It would probably be better to > silently set errno to ERANGE without returning EOF My intuition says it’s the other way around. What existing code do you expect will break? zw
* Re: [PATCH] scanf.3: Do not mention the ERANGE error 2022-12-14 10:47 ` Alejandro Colomar 2022-12-14 11:03 ` Ian Abbott @ 2022-12-29 6:39 ` Zack Weinberg 2022-12-29 10:47 ` Alejandro Colomar 1 sibling, 1 reply; 25+ messages in thread From: Zack Weinberg @ 2022-12-29 6:39 UTC (permalink / raw) To: Alejandro Colomar; +Cc: libc-alpha, 'linux-man', Ian Abbott On Wed, 14 Dec 2022 05:47:12 -0500, Alejandro Colomar wrote: > On 12/14/22 03:13, Zack Weinberg wrote: > > The scanf numeric conversions could be fixed with a one-sentence edit to > > the C standard: change the last sentence of http://port70.net/~nsz/c/c11/n1570.html#7.21.6.2p10 > > from “If this object does not have an appropriate type, or if the result > > of the conversion cannot be represented in the object, the behavior is > > undefined” to “If this object does not have an appropriate type, the > > behavior is undefined. If the result of the conversion cannot be > > represented in the object, the execution of the directive fails; this > > condition is a matching failure.” And, even if the C committee doesn’t > > want to make that change, open-source C libraries can and should do it > > unilaterally, as a documented implementation extension. I think that’s > > a better plan than declaring most uses of *scanf “deprecated.” > > Yeah, if you have plans to fix it, I'm fine removing the deprecation now. :) To be clear, I personally don’t have plans to do any of the actual programming or standard-changing work involved here. :-) > > Yes, I think it would be fair to say that it is almost always a mistake > > to use the scanf variants that read directly from a FILE. The issue > > here is, at its root, that people new to C _expect_ a scanf call to read > > an entire line of input, but it doesn’t. 
This is especially problematic > > for interactive input ― they try to use plain scanf to read numeric > > input, don’t realize that `scanf("%d", &arg)` doesn’t consume the \n in > > the terminal’s line buffer _after_ the number, and get very confused > > when a subsequent getchar() reads that \n instead of the ‘y’ or ‘n’ they > > were expecting as a response to the _next_ prompt. But it’s _also_ a > > problem for error recovery, because scanf will stop in the middle of the > > line when a matching failure occurs, and if you naively assumed it would > > throw away the rest of the line, you get an error cascade. > > > > The recommended practice to avoid this trap, is that you should use one > > of the functions that _does_ read an entire line of input, i.e. fgets or > > getline, and then parse the line as a string. It would make sense for > > the [f]scanf manpage to say that. > > Please clarify; do you consider [v][f]scanf something that "we think > we should never have added this in the first place, there’s no > plausible way to fix it, but we have to keep it around for backward > compatibility"? _I_ wouldn’t have added them in the first place, but I care more than the average about robustness in the face of unexpected input, even for “throwaway” programs. I doubt the C committee would be prepared to say the same thing. They _can_ be used legitimately, and they can even be used in ways that meet my robustness standards if you go to enough trouble. It’s just that (IMNSHO) there are better ways to reach those standards. In terms of text for the [v][f]scanf manpage, maybe something like [NOTES? BUGS?] When scanf() or fscanf() report a _matching failure_, all of the text that was matched successfully has still been read from _stdin_ or the _stream_ (respectively), and so have an unpredictable number of characters associated with the conversion that failed to match. The latter characters are lost. This may make it difficult to recover from invalid input. 
One way to make recovery easier is to separate reading from parsing: use fgets() or getline() to read an entire line of text into a string, then use sscanf() to parse the string. If a _matching failure_ occurs, you can try sscanf() again with a different _format_; the equivalent is not possible using fscanf(). _Successful_ calls to scanf() and fscanf() frequently consume either more, or fewer, characters from the input than was expected. For example, assuming the next six characters readable from `stdin` are `"123\n a"`, `scanf("%d", &val)` will consume the digits but _not_ the newline, and `scanf("%d\n", &val)` will consume the digits, the newline, _and_ the space. Either of these is likely to cause trouble when mixing calls to scanf() with calls to fgets() or fgetc(). As above, it helps to read entire lines of text with fgets() or getline() and then parse them with sscanf(). > In short: > > (1) Numeric conversion specifiers are broken but can be fixed… Yes. > …and you plan to fix them. No :) > (1.1) I'll revert the deprecation warning now; since they are > only broken because the _current_ standard and implementations are > broken, but not by inherent design problems. OK. > (1.2) When [someone fixes] the implementation to not be UB > anymore, it will also make sense to revert the patch that removed > the ERANGE error, since you'll need to report it. Yes. > (2) For the string conversion specifiers, there are ways to use them > safely. Yes. > and you plan to add a way to specify a size at runtime to the > function, No again :) > so it will be even better in the future. No action required. Concur. > (3) [v][f]scanf seem to be really broken by design. Please confirm. See above. If you remind me where to find the git repo for the manpages, I _may_ have time to write a patch for all this sometime next week. zw
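The consumption rules in the proposed manpage text can be checked with sscanf(3) and the `%n` directive, which stores the number of characters read so far and is not counted as a conversion. Directives consume characters the same way whether the input is a string or a stream, so sscanf(3) stands in for scanf(3) on stdin here. The second function sketches the "try sscanf() again with a different format" recovery, which works because the text is already in memory. Both helpers, `show_consumption` and `parse_minutes`, and the "H:MM or plain minutes" input format are hypothetical examples.

```c
#include <stdio.h>

/* Store in *n_plain and *n_newline how many characters of text are
 * consumed by "%d" and by "%d\n" respectively (-1 if %d fails).
 * For "123\n a" these are 3 and 5: the '\n' in the second format is a
 * white-space directive, so it swallows the newline *and* the space. */
static void show_consumption(const char *text, int *n_plain, int *n_newline)
{
    int val;

    if (sscanf(text, "%d%n", &val, n_plain) != 1)
        *n_plain = -1;
    if (sscanf(text, "%d\n%n", &val, n_newline) != 1)
        *n_newline = -1;
}

/* Accept either "H:MM" or a plain number of minutes on one line.
 * If the first format fails to match, nothing is lost: sscanf()
 * simply starts again from the beginning of the same string. */
static int parse_minutes(const char *line, int *minutes)
{
    int h, m, end = 0;

    if (sscanf(line, "%d:%d %n", &h, &m, &end) == 2 && line[end] == '\0') {
        *minutes = h * 60 + m;
        return 1;
    }
    end = 0;
    if (sscanf(line, "%d %n", &m, &end) == 1 && line[end] == '\0') {
        *minutes = m;
        return 1;
    }
    return 0;  /* neither format matched the whole line */
}
```

The trailing `" %n"` plus the `line[end] == '\0'` check rejects lines with leftover junk such as "1:30x"; with fscanf(3) the characters consumed by the failed first attempt would already be gone.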
* Re: [PATCH] scanf.3: Do not mention the ERANGE error 2022-12-29 6:39 ` Zack Weinberg @ 2022-12-29 10:47 ` Alejandro Colomar 2022-12-29 16:35 ` Zack Weinberg 0 siblings, 1 reply; 25+ messages in thread From: Alejandro Colomar @ 2022-12-29 10:47 UTC (permalink / raw) To: Zack Weinberg; +Cc: libc-alpha, 'linux-man', Ian Abbott Hi Zack, On 12/29/22 07:39, Zack Weinberg wrote: >> Yeah, if you have plans to fix it, I'm fine removing the deprecation now. :) > > To be clear, I personally don’t have plans to do any of the actual > programming or standard-changing work involved here. :-) Ah, no, I meant more that the whole set of glibc maintainers had that in mind, as a long term plan (like 10 years maybe?). If you don't think these functions will be saved (even if we can) because it's not worth it, maybe we can just kill them. But I'll defer that decision to after some fixes to the documentation. >> Please clarify; do you consider [v][f]scanf something that "we think >> we should never have added this in the first place, there’s no >> plausible way to fix it, but we have to keep it around for backward >> compatibility"? > > _I_ wouldn’t have added them in the first place, but I care more than > the average about robustness in the face of unexpected input, even for > “throwaway” programs. I doubt the C committee would be prepared to > say the same thing. I share your concerns about robustness of code, and am prepared to defend at least some partial/unofficial deprecation. I'll clarify below. > They _can_ be used legitimately, and they can > even be used in ways that meet my robustness standards if you go > to enough trouble. That's not so clear to me. > It’s just that (IMNSHO) there are better ways to > reach those standards. > > In terms of text for the [v][f]scanf manpage, maybe something like > > [NOTES? BUGS?]
> When scanf() or fscanf() report a _matching failure_, all of the > text that was matched successfully has still been read from > _stdin_ or the _stream_ (respectively), and so have an > unpredictable number of characters associated with the conversion > that failed to match. The latter characters are lost. This may > make it difficult to recover from invalid input. > > One way to make recovery easier is to separate reading from > parsing: use fgets() or getline() to read an entire line of text > into a string, then use sscanf() to parse the string. If a > _matching failure_ occurs, you can try sscanf() again with a > different _format_; the equivalent is not possible using fscanf(). This reads more or less as: "the only way to use scanf(3) is not to use it; use fgets(3)/getline(3) + sscanf(3) instead". > > _Successful_ calls to scanf() and fscanf() frequently consume > either more, or fewer, characters from the input than was > expected. For example, assuming the next six characters readable > from `stdin` are `"123\n a"`, `scanf("%d", &val)` will consume the > digits but _not_ the newline, and `scanf("%d\n", &val)` will > consume the digits, the newline, _and_ the space. Either of these > is likely to cause trouble when mixing calls to scanf() with calls > to fgets() or fgetc(). As above, it helps to read entire lines of > text with fgets() or getline() and then parse them with sscanf(). And this reads as "even \"successful\" calls to scanf(3) are doomed; really, never use it". :) > >> In short: >> >> (1) Numeric conversion specifiers are broken but can be fixed… > > Yes. > >> …and you plan to fix them. > > No :) > >> (1.1) I'll revert the deprecation warning now; since they are >> only broken because the _current_ standard and implementations are >> broken, but not by inherent design problems. > > OK.
> >> (1.2) When [someone fixes] the implementation to not be UB >> anymore, it will also make sense to revert the patch that removed >> the ERANGE error, since you'll need to report it. > > Yes. > >> (2) For the string conversion specifiers, there are ways to use them >> safely. > > Yes. > >> and you plan to add a way to specify a size at runtime to the >> function, > > No again :) > >> so it will be even better in the future. No action required. > > Concur. > >> (3) [v][f]scanf seem to be really broken by design. Please confirm. > > See above. Before you start writing patches, I'm considering the following, which is my way to say don't use these functions without deprecating them: Split FILE and char* functions into separate manual pages. In the one for [v]sscanf(3), I'd keep the current documentation. In the one for FILE functions, I'd keep it very short, deferring to sscanf(3) for documentation of things like conversion specifiers, and that page would only cover the bugs^Wdifferences that apply only to FILE functions. I'll prepare some patches and show them for discussion in linux-man@. > > If you remind me where to find the git repo for the manpages, I _may_ > have time to write a patch for all this sometime next week. man-pages' README: Versions Tarballs of releases starting from 2.00 are available at <https://mirrors.edge.kernel.org/pub/linux/docs/man-pages/>. The git(1) repository can be found at: <git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git> A secondary git(1) repository can be found at: <git://www.alejandro-colomar.es/src/alx/linux/man-pages/man-pages.git> See also man-pages(7) Website <http://www.kernel.org/doc/man-pages/index.html>. > > zw Cheers, Alex
* Re: [PATCH] scanf.3: Do not mention the ERANGE error 2022-12-29 10:47 ` Alejandro Colomar @ 2022-12-29 16:35 ` Zack Weinberg 2022-12-29 16:39 ` Alejandro Colomar 0 siblings, 1 reply; 25+ messages in thread From: Zack Weinberg @ 2022-12-29 16:35 UTC (permalink / raw) To: Alejandro Colomar; +Cc: libc-alpha, 'linux-man', Ian Abbott On Thu, 29 Dec 2022 05:47:06 -0500, Alejandro Colomar wrote: > On 12/29/22 07:39, Zack Weinberg wrote: > > To be clear, I personally don’t have plans to do any of the actual > > programming or standard-changing work involved here. :-) > > Ah, no, I meant more that the whole set of glibc maintainers had that > in mind, as a long term plan (like 10 years maybe?). Oh, OK. Yeah, changes to the standard can easily take that long. > Before you start writing patches, I'm considering the following, which > is my way to say don't use these functions without deprecating them: > > Split FILE and char* functions into separate manual pages. In the one > for [v]sscanf(3), I'd keep the current documentation. In the one for > FILE functions, I'd keep it very short, deferring to sscanf(3) for > documentation of things like conversion specifiers, and that page > would only cover the bugs^Wdifferences that apply only to FILE > functions. That seems like a good way forward to me. zw
* Re: [PATCH] scanf.3: Do not mention the ERANGE error 2022-12-29 16:35 ` Zack Weinberg @ 2022-12-29 16:39 ` Alejandro Colomar 0 siblings, 0 replies; 25+ messages in thread From: Alejandro Colomar @ 2022-12-29 16:39 UTC (permalink / raw) To: Zack Weinberg; +Cc: libc-alpha, 'linux-man', Ian Abbott Hi Zack, On 12/29/22 17:35, Zack Weinberg wrote: > On Thu, 29 Dec 2022 05:47:06 -0500, Alejandro Colomar wrote: >> On 12/29/22 07:39, Zack Weinberg wrote: >>> To be clear, I personally don’t have plans to do any of the actual >>> programming or standard-changing work involved here. :-) >> >> Ah, no, I meant more that the whole set of glibc maintainers had that >> in mind, as a long term plan (like 10 years maybe?). > > Oh, OK. Yeah, changes to the standard can easily take that long. > >> Before you start writing patches, I'm considering the following, which >> is my way to say don't use these functions without deprecating them: >> >> Split FILE and char* functions into separate manual pages. In the one >> for [v]sscanf(3), I'd keep the current documentation. In the one for >> FILE functions, I'd keep it very short, deferring to sscanf(3) for >> documentation of things like conversion specifiers, and that page >> would only cover the bugs^Wdifferences that apply only to FILE >> functions. > > That seems like a good way forward to me. I've done the splitting. If you would like to prepare any patches for adding BUGS, I'll take them :) Cheers, Alex > > zw
* Re: [PATCH] scanf.3: Do not mention the ERANGE error 2022-12-12 2:11 ` Zack Weinberg 2022-12-12 10:21 ` Alejandro Colomar @ 2022-12-12 15:22 ` Ian Abbott 2022-12-14 2:18 ` Zack Weinberg 1 sibling, 1 reply; 25+ messages in thread From: Ian Abbott @ 2022-12-12 15:22 UTC (permalink / raw) To: Zack Weinberg, Alejandro Colomar; +Cc: libc-alpha, 'linux-man' On 12/12/2022 02:11, Zack Weinberg wrote: > (Field widths are awkward to use because you have to write them as > decimal constants _inside the format string_, which makes them more > likely to get out of sync with the actual size of the buffer than, > say, the buffer-size argument to ‘fgets’, but this is not a fatal > flaw in and of itself.) It's a shame that scanf's maximum field width couldn't be specified using an integer parameter in the same way as printf's minimum field width, but the '*' flag was already taken!
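Without a runtime width flag, two common workarounds keep the field width and the buffer size from drifting apart. The first expands a size macro into the format string at compile time. The second builds the format with snprintf(3) at run time. A minimal sketch, assuming a hypothetical `WORD_LEN` macro and hypothetical helper names. The runtime variant passes a non-literal format, so the compiler can no longer check it against the arguments.

```c
#include <stdio.h>

#define WORD_LEN 15  /* hypothetical capacity, excluding the '\0' */

/* Two-level macro so the argument is macro-expanded before being
 * turned into a string literal: STR(WORD_LEN) becomes "15". */
#define STR_(x) #x
#define STR(x)  STR_(x)

/* Compile time: the width in the format and the buffer size come from
 * the same macro, so "%15s" cannot get out of sync with buf[16]. */
static int read_word_fixed(const char *line, char buf[WORD_LEN + 1])
{
    return sscanf(line, "%" STR(WORD_LEN) "s", buf);
}

/* Run time: build "%<size-1>s" with snprintf() when the buffer size
 * is only known at run time.  Returns 0 if size is too small. */
static int read_word(const char *line, char *buf, size_t size)
{
    char fmt[32];

    if (size < 2)
        return 0;
    snprintf(fmt, sizeof fmt, "%%%zus", size - 1);
    return sscanf(line, fmt, buf);
}
```

In both cases the width is one less than the buffer size, because `%s` stores a terminating null byte in addition to at most width characters.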
* Re: [PATCH] scanf.3: Do not mention the ERANGE error 2022-12-12 15:22 ` Ian Abbott @ 2022-12-14 2:18 ` Zack Weinberg 2022-12-14 10:22 ` Ian Abbott 0 siblings, 1 reply; 25+ messages in thread From: Zack Weinberg @ 2022-12-14 2:18 UTC (permalink / raw) To: Ian Abbott; +Cc: Alejandro Colomar, libc-alpha, 'linux-man' Ian Abbott <abbotti@mev.co.uk> writes: > On 12/12/2022 02:11, Zack Weinberg wrote: >> Field widths are awkward to use because you have to write them as >> decimal constants _inside the format string_… > > It's a shame that scanf's maximum field width couldn't be specified > using an integer parameter in the same way as printf's minimum field > width, but the '*' flag was already taken! Yup. I suppose we could make up another flag … ‘@’ isn’t used for anything … zw
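For readers unfamiliar with the clash described here: in printf(3) a `*` in place of the field width takes the width from an int argument, while in scanf(3) the same `*` is the assignment-suppression flag. A short illustration with hypothetical helper names. The `@` or `&` flag floated in this thread is only a proposal with no implementation, so it is not shown.

```c
#include <stdio.h>

/* printf family: '*' reads the minimum field width from an int
 * argument, so the width can be chosen at run time. */
static int format_padded(char *out, size_t size, int width, int value)
{
    return snprintf(out, size, "%*d", width, value);
}

/* scanf family: '*' means "match this field but do not store it"
 * (assignment suppression).  It takes no argument and is not counted
 * in the return value; here it skips over the first number. */
static int second_number(const char *line, int *out)
{
    return sscanf(line, "%*d %d", out);
}
```

With width 5 and value 42, `format_padded` produces "   42"; `second_number("7 9", &v)` returns 1 and stores 9.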
* Re: [PATCH] scanf.3: Do not mention the ERANGE error 2022-12-14 2:18 ` Zack Weinberg @ 2022-12-14 10:22 ` Ian Abbott 2022-12-14 10:39 ` Alejandro Colomar 0 siblings, 1 reply; 25+ messages in thread From: Ian Abbott @ 2022-12-14 10:22 UTC (permalink / raw) To: Zack Weinberg; +Cc: Alejandro Colomar, libc-alpha, 'linux-man' On 14/12/2022 02:18, Zack Weinberg wrote: > Ian Abbott <abbotti@mev.co.uk> writes: > >> On 12/12/2022 02:11, Zack Weinberg wrote: >>> Field widths are awkward to use because you have to write them as >>> decimal constants _inside the format string_… >> >> It's a shame that scanf's maximum field width couldn't be specified >> using an integer parameter in the same way as printf's minimum field >> width, but the '*' flag was already taken! > > Yup. I suppose we could make up another flag … ‘@’ isn’t used for > anything … '@' isn't included in C's basic character set though. '&' is available.
* Re: [PATCH] scanf.3: Do not mention the ERANGE error 2022-12-14 10:22 ` Ian Abbott @ 2022-12-14 10:39 ` Alejandro Colomar 2022-12-14 10:52 ` Ian Abbott 0 siblings, 1 reply; 25+ messages in thread From: Alejandro Colomar @ 2022-12-14 10:39 UTC (permalink / raw) To: Ian Abbott, Zack Weinberg; +Cc: libc-alpha, 'linux-man' Hi Ian & Zack, On 12/14/22 11:22, Ian Abbott wrote: > On 14/12/2022 02:18, Zack Weinberg wrote: >> Ian Abbott <abbotti@mev.co.uk> writes: >> >>> On 12/12/2022 02:11, Zack Weinberg wrote: >>>> Field widths are awkward to use because you have to write them as >>>> decimal constants _inside the format string_… >>> >>> It's a shame that scanf's maximum field width couldn't be specified >>> using an integer parameter in the same way as printf's minimum field >>> width, but the '*' flag was already taken! >> >> Yup. I suppose we could make up another flag … ‘@’ isn’t used for >> anything … > > '@' isn't included in C's basic character set though. '&' is available. Just a curious question from an ignorant: what's the difference between the basic character set and the source character set? Thanks, Alex >
* Re: [PATCH] scanf.3: Do not mention the ERANGE error 2022-12-14 10:39 ` Alejandro Colomar @ 2022-12-14 10:52 ` Ian Abbott 2022-12-14 11:23 ` Alejandro Colomar 0 siblings, 1 reply; 25+ messages in thread From: Ian Abbott @ 2022-12-14 10:52 UTC (permalink / raw) To: Alejandro Colomar, Zack Weinberg; +Cc: libc-alpha, 'linux-man' On 2022-12-14 10:39, Alejandro Colomar wrote: > Hi Ian & Zack, > > On 12/14/22 11:22, Ian Abbott wrote: >> On 14/12/2022 02:18, Zack Weinberg wrote: >>> Ian Abbott <abbotti@mev.co.uk> writes: >>> >>>> On 12/12/2022 02:11, Zack Weinberg wrote: >>>>> Field widths are awkward to use because you have to write them as >>>>> decimal constants _inside the format string_… >>>> >>>> It's a shame that scanf's maximum field width couldn't be specified >>>> using an integer parameter in the same way as printf's minimum field >>>> width, but the '*' flag was already taken! >>> >>> Yup. I suppose we could make up another flag … ‘@’ isn’t used for >>> anything … >> >> '@' isn't included in C's basic character set though. '&' is available. > > Just a curious question from an ignorant: what's the difference between > the basic character set and the source character set? The source character set may contain locale-specific characters outside the basic source character set. Actually, there are two basic character sets - the basic source character set and the basic execution character set (which includes the basic source character set plus a few control characters). The source character set and/or execution character set may contain locale-specific, extended characters outside the basic character set. https://port70.net/~nsz/c/c11/n1570.html#5.2.1
* Re: [PATCH] scanf.3: Do not mention the ERANGE error 2022-12-14 10:52 ` Ian Abbott @ 2022-12-14 11:23 ` Alejandro Colomar 2022-12-14 14:10 ` Ian Abbott 2022-12-14 16:38 ` Joseph Myers 0 siblings, 2 replies; 25+ messages in thread From: Alejandro Colomar @ 2022-12-14 11:23 UTC (permalink / raw) To: Ian Abbott, Zack Weinberg; +Cc: libc-alpha, 'linux-man' On 12/14/22 11:52, Ian Abbott wrote: >>> >>> '@' isn't included in C's basic character set though. '&' is available. >> >> Just a curious question from an ignorant: what's the difference between the >> basic character set and the source character set? > > The source character set may contain locale-specific characters outside the > basic source character set. > > Actually, there are two basic character sets - the basic source character set > and the basic execution character set (which includes the basic source character > set plus a few control characters). The source character set and/or execution > character set may contain locale-specific, extended characters outside the basic > character set. > > https://port70.net/~nsz/c/c11/n1570.html#5.2.1 I still have a small doubt. C23 added '@' to the source character set, but seems to be a second-class citizen: The execution character set may also contain multibyte characters, which need not have the same encoding as for the source character set. For both character sets, the following shall hold: — The basic character set, @, $, and ` shall be present and each character shall be encoded as a single byte. What's the difference, and why isn't it part of the basic character set? Maybe because not all keyboards have those three characters? >
* Re: [PATCH] scanf.3: Do not mention the ERANGE error 2022-12-14 11:23 ` Alejandro Colomar @ 2022-12-14 14:10 ` Ian Abbott 2022-12-14 16:38 ` Joseph Myers 1 sibling, 0 replies; 25+ messages in thread From: Ian Abbott @ 2022-12-14 14:10 UTC To: Alejandro Colomar, Zack Weinberg; +Cc: libc-alpha, 'linux-man' On 14/12/2022 11:23, Alejandro Colomar wrote: > > > On 12/14/22 11:52, Ian Abbott wrote: >>>> >>>> '@' isn't included in C's basic character set though. '&' is >>>> available. >>> >>> Just a curious question from an ignorant: what's the difference >>> between the basic character set and the source character set? >> >> The source character set may contain locale-specific characters >> outside the basic source character set. >> >> Actually, there are two basic character sets - the basic source >> character set and the basic execution character set (which includes >> the basic source character set plus a few control characters). The >> source character set and/or execution character set may contain >> locale-specific, extended characters outside the basic character set. >> >> https://port70.net/~nsz/c/c11/n1570.html#5.2.1 > > I still have a small doubt. C23 added '@' to the source character set, > but seems to be a second-class citizen: > > > > The execution character set may also contain multibyte characters, which > need not have the same encoding as for the source character set. For > both character sets, the following > shall hold: > — The basic character set, @, $, and ` shall be present and each > character shall be encoded as a > single byte. > > What's the difference, and why isn't it part of the basic character > set? Maybe because not all keyboards have those three characters? I think the inability to type certain characters in the basic source character set is the reason why the language contains the horrible trigraph sequences (no longer valid since the C23 final draft N3054), and the slightly less horrible digraph tokens.
Here is the rationale for inclusion of @ and $ in the source and execution character sets, but ` is only mentioned briefly as an also-ran at the end of the document in section "Do we also want to add ` in the same way as @ and $?": https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2701.htm The rationale for exclusion of @ and $ characters from the basic character set is given in this paragraph from the document: """ By requiring @ and $ in the source and execution character set we, reach the goal of making them useable in comments and string literals. By not adding them to the basic source character set, we protect the freedom of implementations of allowing or disallowing them in identifiers, and avoid inconsistency or incompability regarding the use of universal character names (currently the use of universal character names for characters in the basic source character set is not allowed, so adding characters to the basic source character set without lifting that restriction could break existing code). """ I guess it was decided to add all three proposed characters during the Jan/Feb 2022 virtual meeting of WG14 as mentioned here: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2913.htm The first C2x draft that incorporated the change is this one: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf
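The digraph tokens mentioned above are still part of C: they were introduced by the 1995 amendment (C95) and, unlike trigraphs, are not removed in C23. A small sketch, assuming a compiler in C99 mode or later: the hypothetical `sum_array` below uses `%:` for `#`, `<:` and `:>` for `[` and `]`, and `<%` and `%>` for `{` and `}`, and is otherwise an ordinary function.

```c
%:include <stddef.h>

/* Digraphs: <: :> mean [ ], <% %> mean { }, and %: means #.
   This function is equivalent to one written with ordinary brackets. */
int sum_array(const int a<::>, size_t n)
<%
    int total = 0;

    for (size_t i = 0; i < n; i++)
        total += a<:i:>;
    return total;
%>
```

Digraphs are handled as alternative spellings of the tokens themselves, so they work in directives and declarations alike.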
* Re: [PATCH] scanf.3: Do not mention the ERANGE error 2022-12-14 11:23 ` Alejandro Colomar 2022-12-14 14:10 ` Ian Abbott @ 2022-12-14 16:38 ` Joseph Myers 1 sibling, 0 replies; 25+ messages in thread From: Joseph Myers @ 2022-12-14 16:38 UTC To: Alejandro Colomar Cc: Ian Abbott, Zack Weinberg, libc-alpha, 'linux-man' On Wed, 14 Dec 2022, Alejandro Colomar via Libc-alpha wrote: > What's the difference, and why isn't it part of the basic character set? Apart from the point discussed about how making them part of the basic character set would interact with other rules involving that character set, the basic source character set consists of those characters that have some specified role in the C syntax. There is no specified role for $ @ ` in the C syntax - they can only be used in string or character literals (modulo the question of whether $ should still be allowed as an implementation-defined character in identifiers, see N3046) and have no special role to play there. -- Joseph S. Myers joseph@codesourcery.com
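To illustrate Joseph's point, here is a sketch assuming a C23 compiler (or, in practice, any compiler with an ASCII-compatible execution character set, where these characters already worked): the characters @, $ and ` may appear in comments and in string or character literals, where they are ordinary single-byte characters with no special meaning. The names `specials`, `is_c23_extra_char` and `specials_len` are hypothetical. The sketch deliberately avoids $ in identifiers, which remains implementation-defined.

```c
#include <string.h>

/* In C23, @, $ and ` belong to the source and execution character sets
   but not to the basic character set: they may appear in comments like
   this one (@ $ `) and in string or character literals, but they have no
   role in the C syntax itself. */
static const char specials[] = "@$`";

/* Returns nonzero if c is one of the three characters. Each must be
   encoded as a single byte, so a plain char comparison works. */
int is_c23_extra_char(char c)
{
    return c == '@' || c == '$' || c == '`';
}

/* The string holds three characters, one byte each. */
size_t specials_len(void)
{
    return strlen(specials);
}
```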
* Re: [PATCH] scanf.3: Do not mention the ERANGE error 2022-12-09 19:33 ` Alejandro Colomar 2022-12-09 21:41 ` Zack Weinberg @ 2022-12-12 10:07 ` Ian Abbott 1 sibling, 0 replies; 25+ messages in thread From: Ian Abbott @ 2022-12-12 10:07 UTC To: Alejandro Colomar, Alejandro Colomar; +Cc: linux-man, GNU C Library On 09/12/2022 19:33, Alejandro Colomar wrote: > Hi Ian, > > On 12/9/22 20:28, Ian Abbott wrote: >> On 09/12/2022 18:59, Alejandro Colomar wrote: >>> On 12/8/22 13:34, Ian Abbott wrote: >>>> The `scanf()` function does not intentionally set `errno` to `ERANGE`. >>>> That is just a side effect of the code that it uses to perform >>>> conversions. It also does not work as reliably as indicated in the >>>> 'man' page when the target integer type is narrower than `long`. >>>> Typically (at least in glibc) for target integer types narrower than >>>> `long`, the number has to exceed the range of `long` (for signed >>>> conversions) or `unsigned long` (for unsigned conversions) for `errno` >>>> to be set to `ERANGE`. >>>> >>>> Documenting `ERANGE` in the ERRORS section kind of implies that >>>> `scanf()` should return `EOF` when an integer overflow is encountered, >>>> which it doesn't (and doing so would violate the C standard). >>>> >>>> Just remove any mention of the `ERANGE` error to avoid confusion. >>>> >>>> Fixes: 646af540e467 ("Add an ERRORS section documenting at least >>>> some of the errors that may occur for scanf().") >>>> Cc: Michael Kerrisk <mtk.manpages@gmail.com> >>>> Signed-off-by: Ian Abbott <abbotti@mev.co.uk> >>> >>> I see. How about saying something like "it may also fail for any of >>> any errors that functions used to perform the conversions may fail"? >> >> It depends what you mean by "fail". These errors do not make scanf >> return EOF. > > Just to clarify. Does scanf(3) _never_ fail (EOF) due to ERANGE? Or is > it that ERANGE sometimes makes it fail, sometimes not?
The glibc implementation certainly doesn't return EOF when ERANGE is detected. __vfscanf_internal() in stdio-common/vfscanf-internal.c does not contain any code to deal with ERANGE - it's just a side-effect of the calls to __strtol_internal(), __strtoul_internal(), __strtoll_internal(), or __strtoull_internal(). > If it's the former, I agree with your patch. When a function hasn't > reported failure, errno is unspecified. > > If it's the latter, I'd write something about it. For the glibc implementation, it's the former. >> Technically, the behavior is undefined if the result of the conversion >> cannot be represented in the object being assigned to by scanf. (In >> the case of glibc, that probably results in either the integer object >> being set to a truncated version of the input integer, or the integer >> object being set to a truncated version of LONG_MIN or LONG_MAX, >> depending on the actual number.) > > Hmm, UB. Under UB, anything can change, so error reporting is already > unreliable. If EOF+ERANGE can _only_ happen under UB, I'd rather remove > the paragraph. Please confirm. Yes, it is UB as per C17 7.21.6.2 paragraph 10: "[...] Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined." >> Setting errno to 0 before calling scanf and expecting errno to have a >> meaningful value when scanf returns something other than EOF is bogus >> usage. > > Yep, that's bogus. > > > Cheers, > > Alex Best regards, Ian
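The practical consequence of the points above (scanf reports only EOF or a count of assigned items, overflow in a `%d` conversion is undefined behaviour, and errno is unspecified unless the call reports failure) is that overflow-checked integer input is usually done with `strtol` directly. The sketch below is illustrative and not part of the patch: `scanf_status` just shows what `sscanf` returns in the cases discussed, and the hypothetical helper `parse_int` resets `errno` before calling `strtol`, checks for `ERANGE`, and then checks the narrower `int` range itself, since `strtol` only detects overflow of `long`.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustrates scanf's return value: EOF only for an input failure before
   the first conversion, otherwise the number of items assigned. */
int scanf_status(const char *in)
{
    int x;

    return sscanf(in, "%d", &x);
}

/* Hypothetical helper: parse a decimal int from s without undefined
   behaviour on overflow. Returns true and stores the value on success. */
bool parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;                       /* strtol only sets errno on error */
    v = strtol(s, &end, 10);
    if (end == s)                    /* no digits at all */
        return false;
    if (*end != '\0')                /* trailing garbage */
        return false;
    if (errno == ERANGE)             /* outside the range of long */
        return false;
    if (v < INT_MIN || v > INT_MAX)  /* fits in long but not in int */
        return false;
    *out = (int)v;
    return true;
}
```

Here `scanf_status("")` yields EOF, `scanf_status("abc")` yields 0 (a matching failure), and `scanf_status("42")` yields 1; none of these says anything about overflow, which is why the range checks live in `parse_int`.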