From: Amol Surati <suratiamol@gmail.com>
To: Alejandro Colomar <alx@kernel.org>
Cc: libc-help@sourceware.org, gcc-help@gcc.gnu.org,
    Guillem Jover <guillem@hadrons.org>,
    libbsd@lists.freedesktop.org
Subject: Re: restrictness of strtoi(3bsd) and strtol(3)
Date: Sun, 3 Dec 2023 16:29:07 +0530
Message-ID: <CA+nuEB-T9Qi8eRwVovsZau33J5o+sAQ7X-MD9wy8Up1C_-3qkA@mail.gmail.com>
In-Reply-To: <ZWskPqcvoqXq6dEN@debian>
On Sat, 2 Dec 2023 at 18:05, Alejandro Colomar via Gcc-help
<gcc-help@gcc.gnu.org> wrote:
>
> On Sat, Dec 02, 2023 at 01:29:01PM +0100, Alejandro Colomar wrote:
> > On Sat, Dec 02, 2023 at 12:50:28PM +0100, Alejandro Colomar wrote:
> > > Hi,
> > >
> > > I've been implementing my own copy of strto[iu](3bsd), to avoid the
> > > complexity of calling strtol(3) et al. In the process, I've noticed
> > > that all of these functions use restrict for their parameters.
> > >
> > > Why do these functions use restrict? While the second parameter is not
> > > used for accessing nptr memory (**endptr is not accessed), it can point
> > > to the same memory. Here is an example of how these functions can have
> > > pointers to the same memory in the two arguments.
> > >
> > > l = strtol(p, &p, 0);
> > >
> > > The use of restrict in the prototype of the function could result in
> > > compiler warnings, no? Currently, I don't see any warnings, but I
> > > suspect the compiler could complain, since the same memory is available
> > > to the function via two different arguments (albeit with a different
> > > number of references).
> > >
> > > The use of restrict in the definition of the function doesn't help the
> > > optimizer, since it already knows that the second parameter is out-only,
> > > so even if it weren't restrict, the only way to access memory is via the
> > > first parameter.
> >
> > In the case of strto[iu](3bsd), I have even more doubts.
> >
> > Here's libbsd's version of it (omitting unimportant parts):
> >
> > $ grepc -tfd strtoi .
> > ./src/strtoi.c:intmax_t
> > strtoi(const char *__restrict nptr,
> >     char **__restrict endptr, int base,
> >     intmax_t lo, intmax_t hi, int *rstatus)
> > {
> >     ...
> >
> >     im = strtoimax(nptr, endptr, base);
> >
> >     *rstatus = errno;
> >     errno = serrno;
> >
> >     if (*rstatus == 0) {
> >         /* No digits were found */
> >         if (nptr == *endptr)
> >             *rstatus = ECANCELED;
> >         /* There are further characters after number */
> >         else if (**endptr != '\0')
> >             *rstatus = ENOTSUP;
> >     }
> >
> >     ...
> >
> >     return im;
> > }
> >
> > Let's say the base is unsupported (e.g., -42), and endptr initially
> > points to nptr-1. Imagine this call:
> >
> > i = strtoimax(p + 1, &p, -42);
> >
> > ISO C doesn't specify what happens if the base is not between 0 and 36,
> > so the behavior is probably undefined in ISO C.
> >
> > POSIX says it returns 0 and sets errno to EINVAL, but doesn't say what
> > happens to endptr. I expect two possible implementations:
> >
> > - Leave endptr untouched.
> > - Set *endptr = nptr.
> >
> > Let's suppose it leaves endptr untouched (otherwise, it would be
> > impossible to portably differentiate an EINVAL due to unsupported base
> > from an EINVAL due to no digits in the string).
> >
> > So, the test (nptr == *endptr) would be false (because p+1 != p), and
> > the code would jump into accessing **endptr without having derived
> > that pointer from nptr, which is a violation of restrict.
>
> Oops, it's within an (errno == 0) path, so *endptr is guaranteed to be
> derived from nptr here.
>
> So no bug, but still unclear to me what's the benefit of using restrict,
Section 7, "Library", of [1] has some information about the 'restrict'
keyword.

I think the restrict qualifiers require the caller to keep the string
(or at least the portion of it that strtol actually accesses) and the
pointer object that endptr points to in non-overlapping memory regions.
As long as that holds, calling strtol(p, &p, 0) should be well-defined.
-------------------
[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n881.pdf
-Amol
> and also unclear why GCC doesn't warn about it at call site.
>
> > I made many assumptions here, where the standards are not clear, so I
> > may be wrong in some of them. But it looks to me like a bug.
> >
> > CCing libbsd.
> >
> > Cheers,
> > Alex
> >
> > --
> > <https://www.alejandro-colomar.es/>
>
>
>
> --
> <https://www.alejandro-colomar.es/>