public inbox for libc-help@sourceware.org
 help / color / mirror / Atom feed
From: Yair Lenga <yair.lenga@gmail.com>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: libc-help@sourceware.org
Subject: Re: Buffer size checking for scanf* functions
Date: Wed, 6 Jul 2022 18:04:29 +0300	[thread overview]
Message-ID: <CAK3_KpPaLa7yzDg1WeV1bu2APjQURpEU0r49KQcXhM8+ushbDg@mail.gmail.com> (raw)
In-Reply-To: <D978102C-941F-40DE-9E75-2604E256204C@linaro.org>

Thanks for elaborating .Agree that when possible '-m' should be used, but
it's not always trivial .Lot of code is already written in such a way that
those changes are less than trivial. Not to mention that each %m will
require adding an strcpy (or equivalent) to copy the dynamically allocated
strings into the fixed length storage usually defined in struct, etc.

struct { ... } s ;
char *sx = NULL;
scanf("%ms %d", &sx, &s.i) ;
strcpy_fix(s.output, sizeof(s.output), &sx) ;    // Copy from *sx to
s.output, up to the limit, free *sx, and set sx =  NULL

Your point about backward compatibility is also very valid - if possible
new features should try to avoid collision with future improvement. The C
standard is getting updated every 10 years (c99, c11, and the expected
c23), I could not find any reason why the C standard committee chose to use
'%m' instead of using the already established '%a' that existed for many
years in glibc, and assign new meaning to the '%a'. I hope that those are
exceptions that proves the rules.

You raised a good point with the 'scanf_s' - the fact that they chose to
modify the behavior of the '%s' . To my understanding scanf_s '%s' requires
2 arguments (char *, size_t), vs. scanf that will only expect the 'char *'.
It would have been a much better solution to keep '%s' compatible. and
introduce another formatting sequence for the dynamic fixed-length string.

Going back to the question - what will be a good way to integrate the type
safety provided by scanf_s, without creating problems. a few ideas that I
have:
* Use '%S' (upper S) into indicate that a pair (char *, size_t) is
expected, OR
* Use '%@s' ('@' can be any unused letter or special character e.g. '%!s',
'%:s', ...). The logical choice should have been '*' - symmetry with
printf("%*s"). Unfortunately, '*' is already used .. as "ignore assignment'
flag.
* Use '%n %s', when 'n' will indicate a size parameter will be provided,
and will apply to the next '%s' or '%[', or even '%ms' - dynamic width
limit, instead of static width limit.

Personally, I prefer the second option '%@s', it matches the style of '*'
for printf. Easy for existing developers to grasp. Interesting enough, it
might be possible to implement the scanf_s as a wrapper around scanf, with
some manipulation of the argument list.

Yair

On Wed, Jul 6, 2022 at 3:10 PM Adhemerval Zanella <
adhemerval.zanella@linaro.org> wrote:

>
>
> > On 5 Jul 2022, at 18:50, Yair Lenga <yair.lenga@gmail.com> wrote:
> >
> >
> >>
> >> Adhemerval,
> >
> > Thanks for taking time to review my posting.
> >
> > I agree 100% with the critic on annex K. I believe there is (almost)
> universal agreement that it was half baked solution, rushed into the
> standard without proper review - which is why it got very little traction.
> >
> > However, the fact the the annex K has many flaws does not mean that
> there are no good ideas in it. In particular, the ability to create “safer”
> vararg lists for atomic types, and use them in scanf* and similar, is
> significant improvement (IHMO). I believe it improve safety, while keeping
> code clean, readable, and can be used for many use cases (e.g., my scanf
> from a result set, parsing csv-like data, …).
> >
> > Also worth mentioning that one mentioned alternative (OBS - object size
> checking) can fit into the proposed scanf change, given it ability to
> identify object sizes for static/dynamic char *.
> >
> > Hoping to get feedback on my proposal for this limited use case.
> >
> > Also, regarding the proposal for using stringifying macros, etc. Those
> are interesting ideas, but will break code clarity/readability . They Will
> not fit nicely into large scale projects, given their limitation (e.g. with
> dynamic size strings, etc.)
> >
> > Thanks again for your time/ideas/feedback,
>
> The main issue of extending standard interfaces with non standard
> functionality is the burden of keeping compatibility over releases.
> The scanf family itself requires glibc to provide multiple symbols
> to handle C99, originally on glibc ‘a’ was synonymous of ‘m’ and
> the C standard defines ‘a’ to being synonymous of ‘f’.
>
> It means that for -std=c99 glibc does a symbol redirection to a
> different one to handle this difference.  If we add another non
> standard extension it might require extra additional handling in
> the future if C or POSIX reassign the conversion modifier to
> something else.
>
> We can follow the scanf_s idea and handle c, s, and [ conversion
> as a pair or argument.  This will need also another symbol
> redirection (since this interface is not compatible with standard
> scanf family) and to export multiple new symbols.
>
> I don’t have a a strong opinion, but if the idea is to add such
> extensions I think it would require some more discussion to avoid
> the scanf_s mistakes and maybe add some useful extension like to
> handle non standard types supported by GCC (like int128 and
> float128).
>
> It also has the drawback to be an ad-hoc solution that will be
> eventually phase out if standard ended up provide simila
> functionality
>
> I really think using current standard %m is really a much better
> approach than trying to extending scanf to use fixed sizes buffers.
> It does dynamic allocation, but since your idea is to use dynamic
> size strings anyway it might fit.

  reply	other threads:[~2022-07-06 15:04 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-05 21:50 Yair Lenga
2022-07-06 12:10 ` Adhemerval Zanella
2022-07-06 15:04   ` Yair Lenga [this message]
2022-07-11 12:38     ` Adhemerval Zanella
  -- strict thread matches above, loose matches on Subject: below --
2022-07-05  7:31 Yair Lenga
2022-07-05 12:39 ` Adhemerval Zanella

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAK3_KpPaLa7yzDg1WeV1bu2APjQURpEU0r49KQcXhM8+ushbDg@mail.gmail.com \
    --to=yair.lenga@gmail.com \
    --cc=adhemerval.zanella@linaro.org \
    --cc=libc-help@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).