public inbox for libc-help@sourceware.org
 help / color / mirror / Atom feed
* Re: Buffer size checking for scanf* functions
@ 2022-07-05 21:50 Yair Lenga
  2022-07-06 12:10 ` Adhemerval Zanella
  0 siblings, 1 reply; 6+ messages in thread
From: Yair Lenga @ 2022-07-05 21:50 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-help


> 
> Adhemerval,

Thanks for taking time to review my posting.

I agree 100% with the critic on annex K. I believe there is (almost) universal agreement that it was half baked solution, rushed into the standard without proper review - which is why it got very little traction.

However, the fact the the annex K has many flaws does not mean that there are no good ideas in it. In particular, the ability to create “safer” vararg lists for atomic types, and use them in scanf* and similar, is significant improvement (IHMO). I believe it improve safety, while keeping code clean, readable, and can be used for many use cases (e.g., my scanf from a result set, parsing csv-like data, …).

Also worth mentioning that one mentioned alternative (OBS - object size checking) can fit into the proposed scanf change, given it ability to identify object sizes for static/dynamic char *.

Hoping to get feedback on my proposal for this limited use case.

Also, regarding the proposal for using stringifying macros, etc. Those are interesting ideas, but will break code clarity/readability . They Will not fit nicely into large scale projects, given their limitation (e.g. with dynamic size strings, etc.)

Thanks again for your time/ideas/feedback,

Yair

Sent from my iPad

> On Jul 5, 2022, at 3:39 PM, Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> The glibc does not support hooks for scanf and I don’t think we will even
> will, the printf hooks itself has some drawbacks (MT-safeness, etc.).  Also
> scan_s and similar ‘enhanced’ functions from TR-24731 have not been widely 
> implemented mainly because they do not actually improve much. Carlos and
> Martim, both GNU developers, wrote a paper discussing the problems with
> these interfaces [1].

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Buffer size checking for scanf* functions
  2022-07-05 21:50 Buffer size checking for scanf* functions Yair Lenga
@ 2022-07-06 12:10 ` Adhemerval Zanella
  2022-07-06 15:04   ` Yair Lenga
  0 siblings, 1 reply; 6+ messages in thread
From: Adhemerval Zanella @ 2022-07-06 12:10 UTC (permalink / raw)
  To: Yair Lenga; +Cc: libc-help



> On 5 Jul 2022, at 18:50, Yair Lenga <yair.lenga@gmail.com> wrote:
> 
> 
>> 
>> Adhemerval,
> 
> Thanks for taking time to review my posting.
> 
> I agree 100% with the critic on annex K. I believe there is (almost) universal agreement that it was half baked solution, rushed into the standard without proper review - which is why it got very little traction.
> 
> However, the fact the the annex K has many flaws does not mean that there are no good ideas in it. In particular, the ability to create “safer” vararg lists for atomic types, and use them in scanf* and similar, is significant improvement (IHMO). I believe it improve safety, while keeping code clean, readable, and can be used for many use cases (e.g., my scanf from a result set, parsing csv-like data, …).
> 
> Also worth mentioning that one mentioned alternative (OBS - object size checking) can fit into the proposed scanf change, given it ability to identify object sizes for static/dynamic char *.
> 
> Hoping to get feedback on my proposal for this limited use case.
> 
> Also, regarding the proposal for using stringifying macros, etc. Those are interesting ideas, but will break code clarity/readability . They Will not fit nicely into large scale projects, given their limitation (e.g. with dynamic size strings, etc.)
> 
> Thanks again for your time/ideas/feedback,

The main issue of extending standard interfaces with non standard 
functionality is the burden of keeping compatibility over releases.  
The scanf family itself requires glibc to provide multiple symbols
to handle C99, originally on glibc ‘a’ was synonymous of ‘m’ and 
the C standard defines ‘a’ to being synonymous of ‘f’.

It means that for -std=c99 glibc does a symbol redirection to a
different one to handle this difference.  If we add another non
standard extension it might require extra additional handling in
the future if C or POSIX reassign the conversion modifier to
something else.

We can follow the scanf_s idea and handle c, s, and [ conversion
as a pair or argument.  This will need also another symbol
redirection (since this interface is not compatible with standard
scanf family) and to export multiple new symbols.

I don’t have a a strong opinion, but if the idea is to add such 
extensions I think it would require some more discussion to avoid 
the scanf_s mistakes and maybe add some useful extension like to
handle non standard types supported by GCC (like int128 and
float128).

It also has the drawback to be an ad-hoc solution that will be
eventually phase out if standard ended up provide simila
functionality

I really think using current standard %m is really a much better 
approach than trying to extending scanf to use fixed sizes buffers.
It does dynamic allocation, but since your idea is to use dynamic
size strings anyway it might fit.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Buffer size checking for scanf* functions
  2022-07-06 12:10 ` Adhemerval Zanella
@ 2022-07-06 15:04   ` Yair Lenga
  2022-07-11 12:38     ` Adhemerval Zanella
  0 siblings, 1 reply; 6+ messages in thread
From: Yair Lenga @ 2022-07-06 15:04 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-help

Thanks for elaborating .Agree that when possible '-m' should be used, but
it's not always trivial .Lot of code is already written in such a way that
those changes are less than trivial. Not to mention that each %m will
require adding an strcpy (or equivalent) to copy the dynamically allocated
strings into the fixed length storage usually defined in struct, etc.

struct { ... } s ;
char *sx = NULL;
scanf("%ms %d", &sx, &s.i) ;
strcpy_fix(s.output, sizeof(s.output), &sx) ;    // Copy from *sx to
s.output, up to the limit, free *sx, and set sx =  NULL

Your point about backward compatibility is also very valid - if possible
new features should try to avoid collision with future improvement. The C
standard is getting updated every 10 years (c99, c11, and the expected
c23), I could not find any reason why the C standard committee chose to use
'%m' instead of using the already established '%a' that existed for many
years in glibc, and assign new meaning to the '%a'. I hope that those are
exceptions that proves the rules.

You raised a good point with the 'scanf_s' - the fact that they chose to
modify the behavior of the '%s' . To my understanding scanf_s '%s' requires
2 arguments (char *, size_t), vs. scanf that will only expect the 'char *'.
It would have been a much better solution to keep '%s' compatible. and
introduce another formatting sequence for the dynamic fixed-length string.

Going back to the question - what will be a good way to integrate the type
safety provided by scanf_s, without creating problems. a few ideas that I
have:
* Use '%S' (upper S) into indicate that a pair (char *, size_t) is
expected, OR
* Use '%@s' ('@' can be any unused letter or special character e.g. '%!s',
'%:s', ...). The logical choice should have been '*' - symmetry with
printf("%*s"). Unfortunately, '*' is already used .. as "ignore assignment'
flag.
* Use '%n %s', when 'n' will indicate a size parameter will be provided,
and will apply to the next '%s' or '%[', or even '%ms' - dynamic width
limit, instead of static width limit.

Personally, I prefer the second option '%@s', it matches the style of '*'
for printf. Easy for existing developers to grasp. Interesting enough, it
might be possible to implement the scanf_s as a wrapper around scanf, with
some manipulation of the argument list.

Yair

On Wed, Jul 6, 2022 at 3:10 PM Adhemerval Zanella <
adhemerval.zanella@linaro.org> wrote:

>
>
> > On 5 Jul 2022, at 18:50, Yair Lenga <yair.lenga@gmail.com> wrote:
> >
> >
> >>
> >> Adhemerval,
> >
> > Thanks for taking time to review my posting.
> >
> > I agree 100% with the critic on annex K. I believe there is (almost)
> universal agreement that it was half baked solution, rushed into the
> standard without proper review - which is why it got very little traction.
> >
> > However, the fact the the annex K has many flaws does not mean that
> there are no good ideas in it. In particular, the ability to create “safer”
> vararg lists for atomic types, and use them in scanf* and similar, is
> significant improvement (IHMO). I believe it improve safety, while keeping
> code clean, readable, and can be used for many use cases (e.g., my scanf
> from a result set, parsing csv-like data, …).
> >
> > Also worth mentioning that one mentioned alternative (OBS - object size
> checking) can fit into the proposed scanf change, given it ability to
> identify object sizes for static/dynamic char *.
> >
> > Hoping to get feedback on my proposal for this limited use case.
> >
> > Also, regarding the proposal for using stringifying macros, etc. Those
> are interesting ideas, but will break code clarity/readability . They Will
> not fit nicely into large scale projects, given their limitation (e.g. with
> dynamic size strings, etc.)
> >
> > Thanks again for your time/ideas/feedback,
>
> The main issue of extending standard interfaces with non standard
> functionality is the burden of keeping compatibility over releases.
> The scanf family itself requires glibc to provide multiple symbols
> to handle C99, originally on glibc ‘a’ was synonymous of ‘m’ and
> the C standard defines ‘a’ to being synonymous of ‘f’.
>
> It means that for -std=c99 glibc does a symbol redirection to a
> different one to handle this difference.  If we add another non
> standard extension it might require extra additional handling in
> the future if C or POSIX reassign the conversion modifier to
> something else.
>
> We can follow the scanf_s idea and handle c, s, and [ conversion
> as a pair or argument.  This will need also another symbol
> redirection (since this interface is not compatible with standard
> scanf family) and to export multiple new symbols.
>
> I don’t have a a strong opinion, but if the idea is to add such
> extensions I think it would require some more discussion to avoid
> the scanf_s mistakes and maybe add some useful extension like to
> handle non standard types supported by GCC (like int128 and
> float128).
>
> It also has the drawback to be an ad-hoc solution that will be
> eventually phase out if standard ended up provide simila
> functionality
>
> I really think using current standard %m is really a much better
> approach than trying to extending scanf to use fixed sizes buffers.
> It does dynamic allocation, but since your idea is to use dynamic
> size strings anyway it might fit.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Buffer size checking for scanf* functions
  2022-07-06 15:04   ` Yair Lenga
@ 2022-07-11 12:38     ` Adhemerval Zanella
  0 siblings, 0 replies; 6+ messages in thread
From: Adhemerval Zanella @ 2022-07-11 12:38 UTC (permalink / raw)
  To: Yair Lenga; +Cc: libc-help




> On 6 Jul 2022, at 12:04, Yair Lenga <yair.lenga@gmail.com> wrote:
> 
> Thanks for elaborating .Agree that when possible '-m' should be used, but it's not always trivial .Lot of code is already written in such a way that those changes are less than trivial. Not to mention that each %m will require adding an strcpy (or equivalent) to copy the dynamically allocated strings into the fixed length storage usually defined in struct, etc.
> 
> struct { ... } s ;
> char *sx = NULL;
> scanf("%ms %d", &sx, &s.i) ;
> strcpy_fix(s.output, sizeof(s.output), &sx) ;    // Copy from *sx to s.output, up to the limit, free *sx, and set sx =  NULL

Well either this or to use the a simpler solution like:

  #define NAMELEN 32
  struct { char name[NAMELEN]; } x;
  #define STRFY(__n)  XSTRFY(__n)
  #define XSTRFY(__n) #__n
  scanf ("%"STRFY(NAMELEN)"s", x.name);

> 
> Your point about backward compatibility is also very valid - if possible new features should try to avoid collision with future improvement. The C standard is getting updated every 10 years (c99, c11, and the expected c23), I could not find any reason why the C standard committee chose to use '%m' instead of using the already established '%a' that existed for many years in glibc, and assign new meaning to the '%a'. I hope that those are exceptions that proves the rules.
> 
> You raised a good point with the 'scanf_s' - the fact that they chose to modify the behavior of the '%s' . To my understanding scanf_s '%s' requires 2 arguments (char *, size_t), vs. scanf that will only expect the 'char *'. It would have been a much better solution to keep '%s' compatible. and introduce another formatting sequence for the dynamic fixed-length string.
> 
> Going back to the question - what will be a good way to integrate the type safety provided by scanf_s, without creating problems. a few ideas that I have:
> * Use '%S' (upper S) into indicate that a pair (char *, size_t) is expected, OR
> * Use '%@s' ('@' can be any unused letter or special character e.g. '%!s', '%:s', ...). The logical choice should have been '*' - symmetry with printf("%*s"). Unfortunately, '*' is already used .. as "ignore assignment' flag.
> * Use '%n %s', when 'n' will indicate a size parameter will be provided, and will apply to the next '%s' or '%[', or even '%ms' - dynamic width limit, instead of static width limit.
> 
> Personally, I prefer the second option '%@s', it matches the style of '*' for printf. Easy for existing developers to grasp. Interesting enough, it might be possible to implement the scanf_s as a wrapper around scanf, with some manipulation of the argument list.

I don’t have a strong preference, and although I think scanf is still a
bad interface [1] I think you might try to raise this on libc-alpha.

I think the ‘@‘ modifier would make more sense, since ideally it would
extend to wscanf familiar as well (and it already defines ’%S’).


[1] https://github.com/biojppm/rapidyaml/issues/40

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Buffer size checking for scanf* functions
  2022-07-05  7:31 Yair Lenga
@ 2022-07-05 12:39 ` Adhemerval Zanella
  0 siblings, 0 replies; 6+ messages in thread
From: Adhemerval Zanella @ 2022-07-05 12:39 UTC (permalink / raw)
  To: Yair Lenga; +Cc: libc-help



> On 5 Jul 2022, at 04:31, Yair Lenga via Libc-help <libc-help@sourceware.org> wrote:
> 
> Hi,
> 
> Looking for feedback on the following age-old scanf problem: I'm trying to
> perform a "safe' scanf, which will avoid buffer overflow.
> 
> #define XSIZE 30
> char x[XSIZE] ;
> sscanf(input, "%29s", x) ;

You can adjust to use the old stringify macro trick instead:

--
$ cat<<EOF>test.c
#include <stdio.h>
int main (int argc, char *argv[])
{
#define XLEN 8
  char x[XLEN+1];
#define XSTRINPUT(size) STRINPUT (size)
#define STRINPUT(size)  "%" #size "s"
  sscanf (argv[1], XSTRINPUT (XLEN), x);

  printf ("x=%s\n", x);
  return 0;
}
EOF
$ gcc -Wall test.c -o test
$ ./test 1234567890
x=12345678
--

It has some limitations where you can’t evaluate neither arithmetic
operations on the macro (for instance XSTRINPTU (XLEN - 1)) nor use
compile directives like ’sizeof’.

> 
> With the common pitfall that anytime the size of X is changed, the format
> string MUST to be modified. One common approach, with glibc, is to use the
> 'm' modifier, switch x to char **x. However, this require code changes, and
> is not practical with my existing code base.
> 
> My question: is there any extension to allow scanf to take an extra
> argument (similar to the scanf_s proposal) - specifying the sizeof any
> string arguments ?
> sscanf(input, "%S", x, sizeof(x)) ;     // The 'S' require adding sizeof
> parameter ?

Currently glibc only provides the %m that accepts any input size and you 
can also use the a maximum field width (something like %128m). 

> 
> If there is no such extension - how hard it will be to implement. I know
> possible to define custom conversions for printf, I could not find anything
> for scanf. IF this will be built into 'glibc', there IF it will be embedded
> into the gcc format checks (2 big IFs), it can reduce the problems I (and I
> believe many other developers) face when using those functions.
> 

The glibc does not support hooks for scanf and I don’t think we will even
will, the printf hooks itself has some drawbacks (MT-safeness, etc.).  Also
scan_s and similar ‘enhanced’ functions from TR-24731 have not been widely 
implemented mainly because they do not actually improve much. Carlos and
Martim, both GNU developers, wrote a paper discussing the problems with
these interfaces [1].

In the end scanf family functions are just a really clunky interface with
a lot of pitfalls and I would advice you just rewrite it either avoid them
it or use the ‘%m’ extension otherwise.

[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1967.htm

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Buffer size checking for scanf* functions
@ 2022-07-05  7:31 Yair Lenga
  2022-07-05 12:39 ` Adhemerval Zanella
  0 siblings, 1 reply; 6+ messages in thread
From: Yair Lenga @ 2022-07-05  7:31 UTC (permalink / raw)
  To: libc-help

Hi,

Looking for feedback on the following age-old scanf problem: I'm trying to
perform a "safe' scanf, which will avoid buffer overflow.

#define XSIZE 30
char x[XSIZE] ;
sscanf(input, "%29s", x) ;

With the common pitfall that anytime the size of X is changed, the format
string MUST to be modified. One common approach, with glibc, is to use the
'm' modifier, switch x to char **x. However, this require code changes, and
is not practical with my existing code base.

My question: is there any extension to allow scanf to take an extra
argument (similar to the scanf_s proposal) - specifying the sizeof any
string arguments ?
sscanf(input, "%S", x, sizeof(x)) ;     // The 'S' require adding sizeof
parameter ?

If there is no such extension - how hard it will be to implement. I know
possible to define custom conversions for printf, I could not find anything
for scanf. IF this will be built into 'glibc', there IF it will be embedded
into the gcc format checks (2 big IFs), it can reduce the problems I (and I
believe many other developers) face when using those functions.

Yair

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-07-11 12:38 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-05 21:50 Buffer size checking for scanf* functions Yair Lenga
2022-07-06 12:10 ` Adhemerval Zanella
2022-07-06 15:04   ` Yair Lenga
2022-07-11 12:38     ` Adhemerval Zanella
  -- strict thread matches above, loose matches on Subject: below --
2022-07-05  7:31 Yair Lenga
2022-07-05 12:39 ` Adhemerval Zanella

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).