public inbox for libstdc++@gcc.gnu.org
From: Jonathan Wakely <jwakely@redhat.com>
To: Iain Sandoe <idsandoe@googlemail.com>
Cc: "libstdc++" <libstdc++@gcc.gnu.org>,
	GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672]
Date: Thu, 4 Apr 2024 17:30:26 +0100
Message-ID: <CACb0b4k61G-2nttkL7e6e6XaOJ-UUG03dVedhc75JYS9xQfSow@mail.gmail.com>
In-Reply-To: <19633866-184F-4E13-B05B-C3473946E2B9@googlemail.com>

On Thu, 4 Apr 2024 at 16:40, Iain Sandoe <idsandoe@googlemail.com> wrote:
>
>
>
> > On 4 Apr 2024, at 16:29, Jonathan Wakely <jwakely@redhat.com> wrote:
> >
> > I would appreciate more eyes on this to confirm my conclusions about
> > negative int_type values, and the proposed fix, make sense.
> >
> > Tested x86_64-linux.
> >
> > -- >8 --
> >
> > A negative value for the delim value passed to std::istream::ignore can
> > never match any character in the stream, because the comparison is done
> > using traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never
> > returns negative values (except at EOF). The optimized version of
> > ignore for the std::istream specialization uses traits_type::find to
> > locate the delim character in the streambuf, which _can_ match a
> > negative delim on platforms where char is signed, but then we do another
> > comparison using eq_int_type which fails. The code then keeps looping
> > forever, with traits_type::find saying the character is present and
> > eq_int_type saying it's not.
> >
> > A possible fix would be to check with eq_int_type after a successful
> > find, to see whether we really have a match. However, that would be
> > suboptimal since we know that a negative delimiter will never match
> > using eq_int_type. So a better fix is to adjust the check at the top of
> > the function that handles delim==eof(), so that we treat all negative
> > delim values as equivalent to EOF. That way we don't bother using find
> > to search for something that will never match with eq_int_type.
>
> Is the corollary to this that a platform with signed chars can never use a
> negative value as a delimiter - since that will always be treated as EOF?

That's what the C++ standard says (and is what libc++ does).

The delimiter argument to ignore is an int_type, not a char. So
formally you should call it like:

std::cin.ignore(n, std::istream::traits_type::to_int_type('a'));

where to_int_type will cast to unsigned char and then to int, so that
no char can ever produce a negative value for that argument.

If you happen to know that casting 'a' to unsigned char and then to
int doesn't change its value (because it's a 7-bit ASCII value), then
you can be lazy and do:

std::cin.ignore(n, 'a');

That works fine.
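
As a rough illustration of the two conversions being compared here, the
sketch below puts each one in its own function. The names via_traits and
via_implicit are made up for this example and do not appear in libstdc++.

```cpp
#include <string>

// What traits_type::to_int_type does for char streams:
// char -> unsigned char -> int, so the result is never negative.
inline int via_traits(char c)
{
  return std::char_traits<char>::to_int_type(c);
}

// What the "lazy" call relies on: the plain implicit char -> int
// conversion, which sign-extends when char is a signed type.
inline int via_implicit(char c)
{
  return c;
}
```

For a 7-bit character such as 'a' both functions return the same
non-negative value, which is why the lazy call is harmless in that case.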

But if your delimiter character has the MSB set, *and* char is signed
on your platform, then you can't be lazy. The implicit conversion from
char to the stream's int_type is not the same as the result of calling
traits_type::to_int_type, and so these are NOT equivalent on a
platform with signed char:

std::cin.ignore(n, '\x80');
std::cin.ignore(n, (unsigned char)'\x80');

The former is wrong, the latter is correct.
The former will never match a '\x80' in the stream, because the ignore
function will cast each char extracted from the stream to
(int)(unsigned char) and so never match -128.
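
To see the matching rule in isolation, here is a minimal sketch of a
character-by-character skip loop that compares with eq_int_type, in the
spirit of what the primary template is described as doing. It is not the
libstdc++ implementation; skip_until and skipped_for are hypothetical
helpers written only for this example.

```cpp
#include <sstream>
#include <streambuf>
#include <string>

using traits = std::char_traits<char>;

// Hypothetical helper: discard up to n characters from sb, stopping after
// the first one that compares equal to delim. Returns how many were discarded.
inline long skip_until(std::streambuf& sb, long n, int delim)
{
  long count = 0;
  while (count < n)
    {
      // sbumpc() yields (int)(unsigned char)c, or eof() at end of input.
      const int c = sb.sbumpc();
      if (traits::eq_int_type(c, traits::eof()))
        break;
      ++count;
      if (traits::eq_int_type(c, delim))
        break;
    }
  return count;
}

// Run the loop over a fixed five-character input containing one 0x80 byte.
// The literal is split so that "cd" is not parsed as part of the hex escape.
inline long skipped_for(int delim)
{
  std::istringstream in("ab\x80" "cd");
  return skip_until(*in.rdbuf(), 10, delim);
}
```

With the delimiter produced by to_int_type('\x80') the loop stops after
three characters. With -128 nothing ever compares equal, so all five
characters are discarded.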

So the change to treat all negative values as EOF is just an
optimization. Since they can never match, there's no point searching
for them. Just skip n chars.
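
The disagreement between find and eq_int_type that led to the loop can be
sketched outside of any stream. The example assumes the delimiter is
narrowed with to_char_type before being handed to find; that narrowing
step and both function names are assumptions made for illustration, not
code quoted from istream.cc.

```cpp
#include <cstddef>
#include <string>

using traits = std::char_traits<char>;

// True if traits::find locates the delimiter, once narrowed to char,
// somewhere in buf. This compares char against char.
inline bool find_says_present(const char* buf, std::size_t len, int delim)
{
  return traits::find(buf, len, traits::to_char_type(delim)) != nullptr;
}

// True if any char in buf matches delim the way the stream compares:
// each char is widened with to_int_type and checked with eq_int_type.
inline bool eq_int_type_says_present(const char* buf, std::size_t len,
                                     int delim)
{
  for (std::size_t i = 0; i < len; ++i)
    if (traits::eq_int_type(traits::to_int_type(buf[i]), delim))
      return true;
  return false;
}
```

For a buffer holding a 0x80 byte and a delimiter of -128, the first check
reports a match and the second never does. A delimiter of 128 satisfies
both checks.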



>
> - I am not sure if there’s an actual use-case where that matters, but,
> Iain
>
> >
> > The version of ignore in the primary template doesn't need a change,
> > because it doesn't use traits_type::find, instead characters are
> > extracted one-by-one and always matched using eq_int_type. That avoids
> > the inconsistency between find and eq_int_type.
> >
> > libstdc++-v3/ChangeLog:
> >
> >       PR libstdc++/93672
> >       * src/c++98/istream.cc (istream::ignore(streamsize, int_type)):
> >       Treat all negative delimiter values as eof().
> >       * testsuite/27_io/basic_istream/ignore/char/93672.cc: New test.
> > ---
> > libstdc++-v3/src/c++98/istream.cc                 |  5 ++++-
> > .../27_io/basic_istream/ignore/char/93672.cc      | 15 +++++++++++++++
> > 2 files changed, 19 insertions(+), 1 deletion(-)
> > create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
> >
> > diff --git a/libstdc++-v3/src/c++98/istream.cc b/libstdc++-v3/src/c++98/istream.cc
> > index 07ac739c26a..aa1069dea07 100644
> > --- a/libstdc++-v3/src/c++98/istream.cc
> > +++ b/libstdc++-v3/src/c++98/istream.cc
> > @@ -112,7 +112,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >     basic_istream<char>::
> >     ignore(streamsize __n, int_type __delim)
> >     {
> > -      if (traits_type::eq_int_type(__delim, traits_type::eof()))
> > +      // sgetc() returns either (int_type)(unsigned char)c or -1 for EOF.
> > +      // If __delim is negative, then eq_int_type(sgetc(), __delim) can only
> > +      // be true for EOF, so just treat all negative values as eof().
> > +      if (__delim < 0)
> >       return ignore(__n);
> >
> >       _M_gcount = 0;
> > diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
> > new file mode 100644
> > index 00000000000..6d11f5622c8
> > --- /dev/null
> > +++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
> > @@ -0,0 +1,15 @@
> > +// { dg-do run }
> > +
> > +#include <sstream>
> > +#include <testsuite_hooks.h>
> > +
> > +int main()
> > +{
> > +  std::istringstream in("x\xfdxxx\xfex");
> > +  in.ignore(10, std::char_traits<char>::to_int_type('\xfd'));
> > +  VERIFY( in.gcount() == 2 );
> > +  VERIFY( ! in.eof() );
> > +  in.ignore(10, '\xfe');
> > +  VERIFY( in.gcount() == 5 );
> > +  VERIFY( in.eof() );
> > +}
> > --
> > 2.44.0
> >
>

