public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672]
@ 2024-04-04 15:29 Jonathan Wakely
  2024-04-04 15:40 ` Iain Sandoe
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jonathan Wakely @ 2024-04-04 15:29 UTC (permalink / raw)
  To: libstdc++, gcc-patches

I would appreciate more eyes on this to confirm my conclusions about
negative int_type values, and the proposed fix, make sense.

Tested x86_64-linux.

-- >8 --

A negative value for the delim value passed to std::istream::ignore can
never match any character in the stream, because the comparison is done
using traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never
returns negative values (except at EOF). The optimized version of
ignore for the std::istream specialization uses traits_type::find to
locate the delim character in the streambuf, which _can_ match a
negative delim on platforms where char is signed, but then we do another
comparison using eq_int_type which fails. The code then keeps looping
forever, with traits_type::find saying the character is present and
eq_int_type saying it's not.

A possible fix would be to check with eq_int_type after a successful
find, to see whether we really have a match. However, that would be
suboptimal since we know that a negative delimiter will never match
using eq_int_type. So a better fix is to adjust the check at the top of
the function that handles delim==eof(), so that we treat all negative
delim values as equivalent to EOF. That way we don't bother using find
to search for something that will never match with eq_int_type.

The version of ignore in the primary template doesn't need a change,
because it doesn't use traits_type::find, instead characters are
extracted one-by-one and always matched using eq_int_type. That avoids
the inconsistency between find and eq_int_type.

libstdc++-v3/ChangeLog:

	PR libstdc++/93672
	* src/c++98/istream.cc (istream::ignore(streamsize, int_type)):
	Treat all negative delimiter values as eof().
	* testsuite/27_io/basic_istream/ignore/char/93672.cc: New test.
---
 libstdc++-v3/src/c++98/istream.cc                 |  5 ++++-
 .../27_io/basic_istream/ignore/char/93672.cc      | 15 +++++++++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc

diff --git a/libstdc++-v3/src/c++98/istream.cc b/libstdc++-v3/src/c++98/istream.cc
index 07ac739c26a..aa1069dea07 100644
--- a/libstdc++-v3/src/c++98/istream.cc
+++ b/libstdc++-v3/src/c++98/istream.cc
@@ -112,7 +112,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     basic_istream<char>::
     ignore(streamsize __n, int_type __delim)
     {
-      if (traits_type::eq_int_type(__delim, traits_type::eof()))
+      // sgetc() returns either (int_type)(unsigned char)c or -1 for EOF.
+      // If __delim is negative, then eq_int_type(sgetc(), __delim) can only
+      // be true for EOF, so just treat all negative values as eof().
+      if (__delim < 0)
 	return ignore(__n);
 
       _M_gcount = 0;
diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
new file mode 100644
index 00000000000..6d11f5622c8
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
@@ -0,0 +1,15 @@
+// { dg-do run }
+
+#include <sstream>
+#include <testsuite_hooks.h>
+
+int main()
+{
+  std::istringstream in("x\xfdxxx\xfex");
+  in.ignore(10, std::char_traits<char>::to_int_type('\xfd'));
+  VERIFY( in.gcount() == 2 );
+  VERIFY( ! in.eof() );
+  in.ignore(10, '\xfe');
+  VERIFY( in.gcount() == 5 );
+  VERIFY( in.eof() );
+}
-- 
2.44.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672]
  2024-04-04 15:29 [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] Jonathan Wakely
@ 2024-04-04 15:40 ` Iain Sandoe
  2024-04-04 16:30   ` Jonathan Wakely
  2024-04-04 16:28 ` Ulrich Drepper
  2024-04-08 16:48 ` [PATCH v2] " Jonathan Wakely
  2 siblings, 1 reply; 10+ messages in thread
From: Iain Sandoe @ 2024-04-04 15:40 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, GCC Patches



> On 4 Apr 2024, at 16:29, Jonathan Wakely <jwakely@redhat.com> wrote:
> 
> I would appreciate more eyes on this to confirm my conclusions about
> negative int_type values, and the proposed fix, make sense.
> 
> Tested x86_64-linux.
> 
> -- >8 --
> 
> A negative value for the delim value passed to std::istream::ignore can
> never match any character in the stream, because the comparison is done
> using traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never
> returns negative values (except at EOF). The optimized version of
> ignore for the std::istream specialization uses traits_type::find to
> locate the delim character in the streambuf, which _can_ match a
> negative delim on platforms where char is signed, but then we do another
> comparison using eq_int_type which fails. The code then keeps looping
> forever, with traits_type::find saying the character is present and
> eq_int_type saying it's not.
> 
> A possible fix would be to check with eq_int_type after a successful
> find, to see whether we really have a match. However, that would be
> suboptimal since we know that a negative delimiter will never match
> using eq_int_type. So a better fix is to adjust the check at the top of
> the function that handles delim==eof(), so that we treat all negative
> delim values as equivalent to EOF. That way we don't bother using find
> to search for something that will never match with eq_int_type.

Is the corollary to this that a platform with signed chars can never use a
negative value as a delimiter - since that we always be treated as EOF?

- I am not sure it there’s an actual use-case where that matters, but,
Iain

> 
> The version of ignore in the primary template doesn't need a change,
> because it doesn't use traits_type::find, instead characters are
> extracted one-by-one and always matched using eq_int_type. That avoids
> the inconsistency between find and eq_int_type.
> 
> libstdc++-v3/ChangeLog:
> 
> 	PR libstdc++/93672
> 	* src/c++98/istream.cc (istream::ignore(streamsize, int_type)):
> 	Treat all negative delimiter values as eof().
> 	* testsuite/27_io/basic_istream/ignore/char/93672.cc: New test.
> ---
> libstdc++-v3/src/c++98/istream.cc                 |  5 ++++-
> .../27_io/basic_istream/ignore/char/93672.cc      | 15 +++++++++++++++
> 2 files changed, 19 insertions(+), 1 deletion(-)
> create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
> 
> diff --git a/libstdc++-v3/src/c++98/istream.cc b/libstdc++-v3/src/c++98/istream.cc
> index 07ac739c26a..aa1069dea07 100644
> --- a/libstdc++-v3/src/c++98/istream.cc
> +++ b/libstdc++-v3/src/c++98/istream.cc
> @@ -112,7 +112,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>     basic_istream<char>::
>     ignore(streamsize __n, int_type __delim)
>     {
> -      if (traits_type::eq_int_type(__delim, traits_type::eof()))
> +      // sgetc() returns either (int_type)(unsigned char)c or -1 for EOF.
> +      // If __delim is negative, then eq_int_type(sgetc(), __delim) can only
> +      // be true for EOF, so just treat all negative values as eof().
> +      if (__delim < 0)
> 	return ignore(__n);
> 
>       _M_gcount = 0;
> diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
> new file mode 100644
> index 00000000000..6d11f5622c8
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
> @@ -0,0 +1,15 @@
> +// { dg-do run }
> +
> +#include <sstream>
> +#include <testsuite_hooks.h>
> +
> +int main()
> +{
> +  std::istringstream in("x\xfdxxx\xfex");
> +  in.ignore(10, std::char_traits<char>::to_int_type('\xfd'));
> +  VERIFY( in.gcount() == 2 );
> +  VERIFY( ! in.eof() );
> +  in.ignore(10, '\xfe');
> +  VERIFY( in.gcount() == 5 );
> +  VERIFY( in.eof() );
> +}
> -- 
> 2.44.0
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672]
  2024-04-04 15:29 [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] Jonathan Wakely
  2024-04-04 15:40 ` Iain Sandoe
@ 2024-04-04 16:28 ` Ulrich Drepper
  2024-04-04 16:33   ` Jonathan Wakely
  2024-04-08 16:48 ` [PATCH v2] " Jonathan Wakely
  2 siblings, 1 reply; 10+ messages in thread
From: Ulrich Drepper @ 2024-04-04 16:28 UTC (permalink / raw)
  To: Jonathan Wakely; +Cc: libstdc++, gcc-patches

On Thu, Apr 4, 2024 at 5:29 PM Jonathan Wakely <jwakely@redhat.com> wrote:
> I would appreciate more eyes on this to confirm my conclusions about
> negative int_type values, and the proposed fix, make sense.

The way something like this is handled in glibc's ctype functions is
that both branches are considered.  For isXXX(c) whether c is -v or
256-v the same value is returned (except for EOF which is -1).  This
caused the least number of bad surprises.

You could here also perform similar actions.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672]
  2024-04-04 15:40 ` Iain Sandoe
@ 2024-04-04 16:30   ` Jonathan Wakely
  2024-04-04 16:55     ` Jonathan Wakely
  0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Wakely @ 2024-04-04 16:30 UTC (permalink / raw)
  To: Iain Sandoe; +Cc: libstdc++, GCC Patches

On Thu, 4 Apr 2024 at 16:40, Iain Sandoe <idsandoe@googlemail.com> wrote:
>
>
>
> > On 4 Apr 2024, at 16:29, Jonathan Wakely <jwakely@redhat.com> wrote:
> >
> > I would appreciate more eyes on this to confirm my conclusions about
> > negative int_type values, and the proposed fix, make sense.
> >
> > Tested x86_64-linux.
> >
> > -- >8 --
> >
> > A negative value for the delim value passed to std::istream::ignore can
> > never match any character in the stream, because the comparison is done
> > using traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never
> > returns negative values (except at EOF). The optimized version of
> > ignore for the std::istream specialization uses traits_type::find to
> > locate the delim character in the streambuf, which _can_ match a
> > negative delim on platforms where char is signed, but then we do another
> > comparison using eq_int_type which fails. The code then keeps looping
> > forever, with traits_type::find saying the character is present and
> > eq_int_type saying it's not.
> >
> > A possible fix would be to check with eq_int_type after a successful
> > find, to see whether we really have a match. However, that would be
> > suboptimal since we know that a negative delimiter will never match
> > using eq_int_type. So a better fix is to adjust the check at the top of
> > the function that handles delim==eof(), so that we treat all negative
> > delim values as equivalent to EOF. That way we don't bother using find
> > to search for something that will never match with eq_int_type.
>
> Is the corollary to this that a platform with signed chars can never use a
> negative value as a delimiter - since that we always be treated as EOF?

That's what the C++ standard says (and is what libc++ does).

The delimiter argument to ignore is an int_type, not a char. So
formally you should call it like:

std::cin.ignore(n, std::istream::traits_type::to_int_type('a'));

where to_int_type will cast to unsigned char and then to int, so that
no char can ever produce a negative value for that argument.

If you happen to know that casting 'a' to unsigned char and then to
int doesn't change its value (because it's a 7-bit ASCII value), then
you can be lazy and do:

std::cin.ignore(n, 'a');

That works fine.

But if your delimiter character is the MSB set, *and* char is signed
on your platform, then you can't be lazy. The implicit conversion from
char to the stream's int_type is not the same as the result of calling
traits_type::to_int_type, and so these are NOT equivalent on a
platform with signed char:
std::cin.ignore(n, '\x80');
std::cin.ignore(n, (unsigned char)'\x80');

The former is wrong, the latter is correct.
The former will never match a '\x80' in the stream, because the ignore
function will cast each char extracted from the stream to
(int)(unsigned char) and so never match -128.

So the change to treat all negative values as EOF is just an
optimization. Since they can never match, there's no point searching
for them. Just skip n chars.



>
> - I am not sure it there’s an actual use-case where that matters, but,
> Iain
>
> >
> > The version of ignore in the primary template doesn't need a change,
> > because it doesn't use traits_type::find, instead characters are
> > extracted one-by-one and always matched using eq_int_type. That avoids
> > the inconsistency between find and eq_int_type.
> >
> > libstdc++-v3/ChangeLog:
> >
> >       PR libstdc++/93672
> >       * src/c++98/istream.cc (istream::ignore(streamsize, int_type)):
> >       Treat all negative delimiter values as eof().
> >       * testsuite/27_io/basic_istream/ignore/char/93672.cc: New test.
> > ---
> > libstdc++-v3/src/c++98/istream.cc                 |  5 ++++-
> > .../27_io/basic_istream/ignore/char/93672.cc      | 15 +++++++++++++++
> > 2 files changed, 19 insertions(+), 1 deletion(-)
> > create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
> >
> > diff --git a/libstdc++-v3/src/c++98/istream.cc b/libstdc++-v3/src/c++98/istream.cc
> > index 07ac739c26a..aa1069dea07 100644
> > --- a/libstdc++-v3/src/c++98/istream.cc
> > +++ b/libstdc++-v3/src/c++98/istream.cc
> > @@ -112,7 +112,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >     basic_istream<char>::
> >     ignore(streamsize __n, int_type __delim)
> >     {
> > -      if (traits_type::eq_int_type(__delim, traits_type::eof()))
> > +      // sgetc() returns either (int_type)(unsigned char)c or -1 for EOF.
> > +      // If __delim is negative, then eq_int_type(sgetc(), __delim) can only
> > +      // be true for EOF, so just treat all negative values as eof().
> > +      if (__delim < 0)
> >       return ignore(__n);
> >
> >       _M_gcount = 0;
> > diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
> > new file mode 100644
> > index 00000000000..6d11f5622c8
> > --- /dev/null
> > +++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
> > @@ -0,0 +1,15 @@
> > +// { dg-do run }
> > +
> > +#include <sstream>
> > +#include <testsuite_hooks.h>
> > +
> > +int main()
> > +{
> > +  std::istringstream in("x\xfdxxx\xfex");
> > +  in.ignore(10, std::char_traits<char>::to_int_type('\xfd'));
> > +  VERIFY( in.gcount() == 2 );
> > +  VERIFY( ! in.eof() );
> > +  in.ignore(10, '\xfe');
> > +  VERIFY( in.gcount() == 5 );
> > +  VERIFY( in.eof() );
> > +}
> > --
> > 2.44.0
> >
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672]
  2024-04-04 16:28 ` Ulrich Drepper
@ 2024-04-04 16:33   ` Jonathan Wakely
  2024-04-04 16:35     ` Jonathan Wakely
  0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Wakely @ 2024-04-04 16:33 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: libstdc++, gcc-patches

On Thu, 4 Apr 2024 at 17:29, Ulrich Drepper <drepper.fsp@gmail.com> wrote:
>
> On Thu, Apr 4, 2024 at 5:29 PM Jonathan Wakely <jwakely@redhat.com> wrote:
> > I would appreciate more eyes on this to confirm my conclusions about
> > negative int_type values, and the proposed fix, make sense.
>
> The way something like this is handled in glibc's ctype functions is
> that both branches are considered.  For isXXX(c) whether c is -v or
> 256-v the same value is returned (except for EOF which is -1).  This
> caused the least number of bad surprises.
>
> You could here also perform similar actions.

Yes, my first attempt to fix PR93672 did exactly that, see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93672#c1

But since it doesn't work for '\xff' (because that's EOF when char is
signed) it only handles 127 of the 128 possible bugs ;-)
I'm also not sure it's conforming, since the standard specifies how
the matching is done, and that won't match negative chars.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672]
  2024-04-04 16:33   ` Jonathan Wakely
@ 2024-04-04 16:35     ` Jonathan Wakely
  0 siblings, 0 replies; 10+ messages in thread
From: Jonathan Wakely @ 2024-04-04 16:35 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: libstdc++, gcc-patches

On Thu, 4 Apr 2024 at 17:33, Jonathan Wakely <jwakely@redhat.com> wrote:
>
> On Thu, 4 Apr 2024 at 17:29, Ulrich Drepper <drepper.fsp@gmail.com> wrote:
> >
> > On Thu, Apr 4, 2024 at 5:29 PM Jonathan Wakely <jwakely@redhat.com> wrote:
> > > I would appreciate more eyes on this to confirm my conclusions about
> > > negative int_type values, and the proposed fix, make sense.
> >
> > The way something like this is handled in glibc's ctype functions is
> > that both branches are considered.  For isXXX(c) whether c is -v or
> > 256-v the same value is returned (except for EOF which is -1).  This
> > caused the least number of bad surprises.
> >
> > You could here also perform similar actions.
>
> Yes, my first attempt to fix PR93672 did exactly that, see
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93672#c1
>
> But since it doesn't work for '\xff' (because that's EOF when char is
> signed) it only handles 127 of the 128 possible bugs ;-)
> I'm also not sure it's conforming, since the standard specifies how
> the matching is done, and that won't match negative chars.

I might suggest relaxing the standard to allow it though...


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672]
  2024-04-04 16:30   ` Jonathan Wakely
@ 2024-04-04 16:55     ` Jonathan Wakely
  2024-04-04 19:53       ` Jonathan Wakely
  0 siblings, 1 reply; 10+ messages in thread
From: Jonathan Wakely @ 2024-04-04 16:55 UTC (permalink / raw)
  To: Iain Sandoe; +Cc: libstdc++, GCC Patches

On Thu, 4 Apr 2024 at 17:30, Jonathan Wakely <jwakely@redhat.com> wrote:
>
> On Thu, 4 Apr 2024 at 16:40, Iain Sandoe <idsandoe@googlemail.com> wrote:
> >
> >
> >
> > > On 4 Apr 2024, at 16:29, Jonathan Wakely <jwakely@redhat.com> wrote:
> > >
> > > I would appreciate more eyes on this to confirm my conclusions about
> > > negative int_type values, and the proposed fix, make sense.
> > >
> > > Tested x86_64-linux.
> > >
> > > -- >8 --
> > >
> > > A negative value for the delim value passed to std::istream::ignore can
> > > never match any character in the stream, because the comparison is done
> > > using traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never
> > > returns negative values (except at EOF). The optimized version of
> > > ignore for the std::istream specialization uses traits_type::find to
> > > locate the delim character in the streambuf, which _can_ match a
> > > negative delim on platforms where char is signed, but then we do another
> > > comparison using eq_int_type which fails. The code then keeps looping
> > > forever, with traits_type::find saying the character is present and
> > > eq_int_type saying it's not.
> > >
> > > A possible fix would be to check with eq_int_type after a successful
> > > find, to see whether we really have a match. However, that would be
> > > suboptimal since we know that a negative delimiter will never match
> > > using eq_int_type. So a better fix is to adjust the check at the top of
> > > the function that handles delim==eof(), so that we treat all negative
> > > delim values as equivalent to EOF. That way we don't bother using find
> > > to search for something that will never match with eq_int_type.
> >
> > Is the corollary to this that a platform with signed chars can never use a
> > negative value as a delimiter - since that we always be treated as EOF?
>
> That's what the C++ standard says (and is what libc++ does).
>
> The delimiter argument to ignore is an int_type, not a char. So
> formally you should call it like:
>
> std::cin.ignore(n, std::istream::traits_type::to_int_type('a'));
>
> where to_int_type will cast to unsigned char and then to int, so that
> no char can ever produce a negative value for that argument.
>
> If you happen to know that casting 'a' to unsigned char and then to
> int doesn't change its value (because it's a 7-bit ASCII value), then
> you can be lazy and do:
>
> std::cin.ignore(n, 'a');
>
> That works fine.
>
> But if your delimiter character is the MSB set, *and* char is signed
> on your platform, then you can't be lazy. The implicit conversion from
> char to the stream's int_type is not the same as the result of calling
> traits_type::to_int_type, and so these are NOT equivalent on a
> platform with signed char:
> std::cin.ignore(n, '\x80');
> std::cin.ignore(n, (unsigned char)'\x80');
>
> The former is wrong, the latter is correct.
> The former will never match a '\x80' in the stream, because the ignore
> function will cast each char extracted from the stream to
> (int)(unsigned char) and so never match -128.
>
> So the change to treat all negative values as EOF is just an
> optimization. Since they can never match, there's no point searching
> for them. Just skip n chars.

And FWIW, I don't think libc++ does that optimization. They extract
char-by-char and compare them, even though that can never match a
negative value. So they get the same result as with my patch, but much
slower ;-)

The status quo is that libstdc++ loops forever given a negative
delimiter. That's obviously wrong!

We need to fix the buggy logic that loops forever, and the "correct"
standard conforming fix means that we never match the negative value,
because the chars are extracted from the stream as non-negative
int_type values. As an optimization, we can skip the loop that keeps
trying impossible matches, because we know it won't match. That's what
my patch does. (It occurs to me now that we could also treat any delim
value greater than 255 as EOF too, since that can't match either).

Ulrich's suggestion is to allow the buggy user code to Just Work (for
all cases except (char)-1 on a platform where char is signed). That is
maybe the least surprising behaviour for users.

On the other hand, I'm not sure we really want to support:

cin.ignore(n, -10);
cin.ignore(n, -999);
cin.ignore(n, +999);

What are these trying to do? Those are just nonsense arguments to this
function. But if we try to make the testcase in the bug report Just
Work, we end up also accepting (at least) the -10 case, because it's
indistinguishable from the char '\xf6'. Depending how we do it, we
might also make the other cases work, treating -999 as '\x19', and
+999 as '\xe7'.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672]
  2024-04-04 16:55     ` Jonathan Wakely
@ 2024-04-04 19:53       ` Jonathan Wakely
  0 siblings, 0 replies; 10+ messages in thread
From: Jonathan Wakely @ 2024-04-04 19:53 UTC (permalink / raw)
  To: Iain Sandoe; +Cc: libstdc++, GCC Patches

On Thu, 4 Apr 2024 at 17:55, Jonathan Wakely <jwakely@redhat.com> wrote:
>
> On Thu, 4 Apr 2024 at 17:30, Jonathan Wakely <jwakely@redhat.com> wrote:
> Ulrich's suggestion is to allow the buggy user code to Just Work (for
> all cases except (char)-1 on a platform where char is signed). That is
> maybe the least surprising behaviour for users.
>
> On the other hand, I'm not sure we really want to support:
>
> cin.ignore(n, -10);
> cin.ignore(n, -999);
> cin.ignore(n, +999);
>
> What are these trying to do? Those are just nonsense arguments to this
> function. But if we try to make the testcase in the bug report Just
> Work, we end up also accepting (at least) the -10 case, because it's
> indistinguishable from the char '\xf6'. Depending how we do it, we
> might also make the other cases work, treating -999 as '\x19', and
> +999 as '\xe7'.

So maybe what we want is to add a new overload:

basic_istream&
ignore(streamsize n, char_type c)
{
  return ignore(n, traits_type::to_int_type(c));
}

This will only accept the stream's char_type, not -999 or +999, and
will do the required conversion to int_type correctly, not as an
implicit conversion.

Calls that pass eof() will still select the other overload and do the
right thing. Calls using a char (or for wstream, a wchar_t) will
select the new overload.

This new overload will make some calls ambiguous, e.g. ignore(1, 1L)
compiles today, but would become ambiguous. That seems fine, since the
second argument should not be type long (what is it even trying to
do?)

If that's a problem, we can make it a constrained template that only
gets called for the exact char_type:

basic_istream&
ignore(streamsize n, same_as<char_type> auto c)

This would still Do The Right Thing for ignore(n, '\x80') but would
not change the result of ignore(1, 1L) which would still select the
original overload taking int_type for the second parameter.

I think I'll raise this idea with the committee. For now though, I
think my patch fixes the bug and conforms to the standard, and doesn't
do anything inventive.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672]
  2024-04-04 15:29 [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] Jonathan Wakely
  2024-04-04 15:40 ` Iain Sandoe
  2024-04-04 16:28 ` Ulrich Drepper
@ 2024-04-08 16:48 ` Jonathan Wakely
  2024-04-15 18:30   ` Jonathan Wakely
  2 siblings, 1 reply; 10+ messages in thread
From: Jonathan Wakely @ 2024-04-08 16:48 UTC (permalink / raw)
  To: libstdc++, gcc-patches

Patch v2.

I realised that it's not only negative delim values that cause the
problem, but also ones greater than CHAR_MAX. Calling ignore(n, 'a'+256)
will cause traits_type::find to match 'a' but then the eq_int_type
comparison will fail because (int)'a' != (int)('a' + 256).

This version of the patch calls to_int_type on the delim and if that
alters the value, it's never going to match so skip the loop that tries
to find it and just ignore up to n chars instead.

Tested x86_64linux and aarch64-linux.

-- >8 --

A negative delim value passed to std::istream::ignore can never match
any character in the stream, because the comparison is done using
traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never returns
negative values (except at EOF). The optimized version of ignore for the
std::istream specialization uses traits_type::find to locate the delim
character in the streambuf, which _can_ match a negative delim on
platforms where char is signed, but then we do another comparison using
eq_int_type which fails. The code then keeps looping forever, with
traits_type::find locating the character and traits_type::eq_int_type
saying it's not a match, so traits_type::find is used again and finds
the same character again.

A possible fix would be to check with eq_int_type after a successful
find, to see whether we really have a match. However, that would be
suboptimal since we know that a negative delimiter will never match
using eq_int_type. So a better fix is to adjust the check at the top of
the function that handles delim==eof(), so that we treat all negative
delim values as equivalent to EOF. That way we don't bother using find
to search for something that will never match with eq_int_type.

The version of ignore in the primary template doesn't need a change,
because it doesn't use traits_type::find, instead characters are
extracted one-by-one and always matched using eq_int_type. That avoids
the inconsistency between find and eq_int_type. The specialization for
std::wistream does use traits_type::find, but traits_type::to_int_type
is equivalent to an implicit conversion from wchar_t to wint_t, so
passing a wchar_t directly to ignore without using to_int_type works.

libstdc++-v3/ChangeLog:

	PR libstdc++/93672
	* src/c++98/istream.cc (istream::ignore(streamsize, int_type)):
	Treat all negative delimiter values as eof().
	* testsuite/27_io/basic_istream/ignore/char/93672.cc: New test.
	* testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc: New
	test.
---
 libstdc++-v3/src/c++98/istream.cc             |  13 ++-
 .../27_io/basic_istream/ignore/char/93672.cc  | 101 ++++++++++++++++++
 .../basic_istream/ignore/wchar_t/93672.cc     |  34 ++++++
 3 files changed, 146 insertions(+), 2 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
 create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc

diff --git a/libstdc++-v3/src/c++98/istream.cc b/libstdc++-v3/src/c++98/istream.cc
index 07ac739c26a..d1b4444ff2b 100644
--- a/libstdc++-v3/src/c++98/istream.cc
+++ b/libstdc++-v3/src/c++98/istream.cc
@@ -112,8 +112,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     basic_istream<char>::
     ignore(streamsize __n, int_type __delim)
     {
-      if (traits_type::eq_int_type(__delim, traits_type::eof()))
-	return ignore(__n);
+      {
+	// If conversion to int_type changes the value then __delim does not
+	// correspond to a value of type char_type, and so will never match
+	// a character extracted from the input sequence. Just use ignore(n).
+	const int_type chk_delim = traits_type::to_int_type(__delim);
+	const bool matchable = traits_type::eq_int_type(chk_delim, __delim);
+	if (__builtin_expect(!matchable, 0))
+	  return ignore(__n);
+	// Now we know that __delim is a valid char_type value, so it's safe
+	// for the code below to use traits_type::find to search for it.
+      }
 
       _M_gcount = 0;
       sentry __cerb(*this, true);
diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
new file mode 100644
index 00000000000..96737485b83
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
@@ -0,0 +1,101 @@
+// { dg-do run }
+
+#include <sstream>
+#include <limits>
+#include <testsuite_hooks.h>
+
+void
+test_pr93672() // std::basic_istream::ignore hangs if delim MSB is set
+{
+  std::istringstream in(".\xfc..\xfd...\xfe.");
+
+  // This should find '\xfd' even on platforms where char is signed,
+  // because the delimiter is correctly converted to the stream's int_type.
+  in.ignore(100, std::char_traits<char>::to_int_type('\xfc'));
+  VERIFY( in.gcount() == 2 );
+  VERIFY( ! in.eof() );
+
+  // This should work equivalently to traits_type::to_int_type
+  in.ignore(100, (unsigned char)'\xfd');
+  VERIFY( in.gcount() == 3 );
+  VERIFY( ! in.eof() );
+
+  // This only works if char is unsigned.
+  in.ignore(100, '\xfe');
+  if (std::numeric_limits<char>::is_signed)
+  {
+    // When char is signed, '\xfe' != traits_type::to_int_type('\xfe')
+    // so the delimiter does not match the character in the input sequence,
+    // and ignore consumes all input until EOF.
+    VERIFY( in.gcount() == 5 );
+    VERIFY( in.eof() );
+  }
+  else
+  {
+    // When char is unsigned, '\xfe' == to_int_type('\xfe') so the delimiter
+    // matches the character in the input sequence, and doesn't reach EOF.
+    VERIFY( in.gcount() == 4 );
+    VERIFY( ! in.eof() );
+  }
+
+  in.clear();
+  in.str(".a.");
+  in.ignore(100, 'a' + 256); // Should not match 'a'
+  VERIFY( in.gcount() == 3 );
+  VERIFY( in.eof() );
+}
+
+// Custom traits type that inherits all behaviour from std::char_traits<char>.
+struct traits : std::char_traits<char> { };
+
+void
+test_primary_template()
+{
+  // Check that the primary template for std::basic_istream::ignore
+  // works the same as the std::istream::ignore specialization.
+  // The infinite loop bug was never present in the primary template,
+  // because it doesn't use traits_type::find to search the input sequence.
+
+  std::basic_istringstream<char, traits> in(".\xfc..\xfd...\xfe.");
+
+  // This should find '\xfd' even on platforms where char is signed,
+  // because the delimiter is correctly converted to the stream's int_type.
+  in.ignore(100, std::char_traits<char>::to_int_type('\xfc'));
+  VERIFY( in.gcount() == 2 );
+  VERIFY( ! in.eof() );
+
+  // This should work equivalently to traits_type::to_int_type
+  in.ignore(100, (unsigned char)'\xfd');
+  VERIFY( in.gcount() == 3 );
+  VERIFY( ! in.eof() );
+
+  // This only works if char is unsigned.
+  in.ignore(100, '\xfe');
+  if (std::numeric_limits<char>::is_signed)
+  {
+    // When char is signed, '\xfe' != traits_type::to_int_type('\xfe')
+    // so the delimiter does not match the character in the input sequence,
+    // and ignore consumes all input until EOF.
+    VERIFY( in.gcount() == 5 );
+    VERIFY( in.eof() );
+  }
+  else
+  {
+    // When char is unsigned, '\xfe' == to_int_type('\xfe') so the delimiter
+    // matches the character in the input sequence, and doesn't reach EOF.
+    VERIFY( in.gcount() == 4 );
+    VERIFY( ! in.eof() );
+  }
+
+  in.clear();
+  in.str(".a.");
+  in.ignore(100, 'a' + 256); // Should not match 'a'
+  VERIFY( in.gcount() == 3 );
+  VERIFY( in.eof() );
+}
+
+int main()
+{
+  test_pr93672();
+  test_primary_template();
+}
diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc
new file mode 100644
index 00000000000..5ce9155e02c
--- /dev/null
+++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc
@@ -0,0 +1,34 @@
+// { dg-do run }
+
+#include <sstream>
+#include <limits>
+#include <climits>
+#include <testsuite_hooks.h>
+
+// PR 93672 was a bug in std::istream that never affected std::wistream.
+// This test ensures that the bug doesn't get introduced to std::wistream.
+void
+test_pr93672()
+{
+  std::wstring str = L".x..x.";
+  str[1] = (wchar_t)-2;
+  str[4] = (wchar_t)-3;
+  std::wistringstream in(str);
+
+  // This should find the character even on platforms where wchar_t is signed,
+  // because the delimiter is correctly converted to the stream's int_type.
+  in.ignore(100, std::char_traits<wchar_t>::to_int_type((wchar_t)-2));
+  VERIFY( in.gcount() == 2 );
+  VERIFY( ! in.eof() );
+
+  // This also works, because std::char_traits<wchar_t>::to_int_type(wc) is
+  // equivalent to (int_type)wc so using to_int_type isn't needed.
+  in.ignore(100, (wchar_t)-3);
+  VERIFY( in.gcount() == 3 );
+  VERIFY( ! in.eof() );
+}
+
+int main()
+{
+  test_pr93672();
+}
-- 
2.44.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672]
  2024-04-08 16:48 ` [PATCH v2] " Jonathan Wakely
@ 2024-04-15 18:30   ` Jonathan Wakely
  0 siblings, 0 replies; 10+ messages in thread
From: Jonathan Wakely @ 2024-04-15 18:30 UTC (permalink / raw)
  To: libstdc++, gcc-patches

Pushed to trunk now.


On Mon, 8 Apr 2024 at 17:53, Jonathan Wakely <jwakely@redhat.com> wrote:
>
> Patch v2.
>
> I realised that it's not only negative delim values that cause the
> problem, but also ones greater than CHAR_MAX. Calling ignore(n, 'a'+256)
> will cause traits_type::find to match 'a' but then the eq_int_type
> comparison will fail because (int)'a' != (int)('a' + 256).
>
> This version of the patch calls to_int_type on the delim and if that
> alters the value, it's never going to match so skip the loop that tries
> to find it and just ignore up to n chars instead.
>
> Tested x86_64linux and aarch64-linux.
>
> -- >8 --
>
> A negative delim value passed to std::istream::ignore can never match
> any character in the stream, because the comparison is done using
> traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never returns
> negative values (except at EOF). The optimized version of ignore for the
> std::istream specialization uses traits_type::find to locate the delim
> character in the streambuf, which _can_ match a negative delim on
> platforms where char is signed, but then we do another comparison using
> eq_int_type which fails. The code then keeps looping forever, with
> traits_type::find locating the character and traits_type::eq_int_type
> saying it's not a match, so traits_type::find is used again and finds
> the same character again.
>
> A possible fix would be to check with eq_int_type after a successful
> find, to see whether we really have a match. However, that would be
> suboptimal since we know that a negative delimiter will never match
> using eq_int_type. So a better fix is to adjust the check at the top of
> the function that handles delim==eof(), so that we treat all negative
> delim values as equivalent to EOF. That way we don't bother using find
> to search for something that will never match with eq_int_type.
>
> The version of ignore in the primary template doesn't need a change,
> because it doesn't use traits_type::find, instead characters are
> extracted one-by-one and always matched using eq_int_type. That avoids
> the inconsistency between find and eq_int_type. The specialization for
> std::wistream does use traits_type::find, but traits_type::to_int_type
> is equivalent to an implicit conversion from wchar_t to wint_t, so
> passing a wchar_t directly to ignore without using to_int_type works.
>
> libstdc++-v3/ChangeLog:
>
>         PR libstdc++/93672
>         * src/c++98/istream.cc (istream::ignore(streamsize, int_type)):
>         Treat all negative delimiter values as eof().
>         * testsuite/27_io/basic_istream/ignore/char/93672.cc: New test.
>         * testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc: New
>         test.
> ---
>  libstdc++-v3/src/c++98/istream.cc             |  13 ++-
>  .../27_io/basic_istream/ignore/char/93672.cc  | 101 ++++++++++++++++++
>  .../basic_istream/ignore/wchar_t/93672.cc     |  34 ++++++
>  3 files changed, 146 insertions(+), 2 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
>  create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc
>
> diff --git a/libstdc++-v3/src/c++98/istream.cc b/libstdc++-v3/src/c++98/istream.cc
> index 07ac739c26a..d1b4444ff2b 100644
> --- a/libstdc++-v3/src/c++98/istream.cc
> +++ b/libstdc++-v3/src/c++98/istream.cc
> @@ -112,8 +112,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      basic_istream<char>::
>      ignore(streamsize __n, int_type __delim)
>      {
> -      if (traits_type::eq_int_type(__delim, traits_type::eof()))
> -       return ignore(__n);
> +      {
> +       // If conversion to int_type changes the value then __delim does not
> +       // correspond to a value of type char_type, and so will never match
> +       // a character extracted from the input sequence. Just use ignore(n).
> +       const int_type chk_delim = traits_type::to_int_type(__delim);
> +       const bool matchable = traits_type::eq_int_type(chk_delim, __delim);
> +       if (__builtin_expect(!matchable, 0))
> +         return ignore(__n);
> +       // Now we know that __delim is a valid char_type value, so it's safe
> +       // for the code below to use traits_type::find to search for it.
> +      }
>
>        _M_gcount = 0;
>        sentry __cerb(*this, true);
> diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
> new file mode 100644
> index 00000000000..96737485b83
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
> @@ -0,0 +1,101 @@
> +// { dg-do run }
> +
> +#include <sstream>
> +#include <limits>
> +#include <testsuite_hooks.h>
> +
> +void
> +test_pr93672() // std::basic_istream::ignore hangs if delim MSB is set
> +{
> +  std::istringstream in(".\xfc..\xfd...\xfe.");
> +
> +  // This should find '\xfd' even on platforms where char is signed,
> +  // because the delimiter is correctly converted to the stream's int_type.
> +  in.ignore(100, std::char_traits<char>::to_int_type('\xfc'));
> +  VERIFY( in.gcount() == 2 );
> +  VERIFY( ! in.eof() );
> +
> +  // This should work equivalently to traits_type::to_int_type
> +  in.ignore(100, (unsigned char)'\xfd');
> +  VERIFY( in.gcount() == 3 );
> +  VERIFY( ! in.eof() );
> +
> +  // This only works if char is unsigned.
> +  in.ignore(100, '\xfe');
> +  if (std::numeric_limits<char>::is_signed)
> +  {
> +    // When char is signed, '\xfe' != traits_type::to_int_type('\xfe')
> +    // so the delimiter does not match the character in the input sequence,
> +    // and ignore consumes all input until EOF.
> +    VERIFY( in.gcount() == 5 );
> +    VERIFY( in.eof() );
> +  }
> +  else
> +  {
> +    // When char is unsigned, '\xfe' == to_int_type('\xfe') so the delimiter
> +    // matches the character in the input sequence, and doesn't reach EOF.
> +    VERIFY( in.gcount() == 4 );
> +    VERIFY( ! in.eof() );
> +  }
> +
> +  in.clear();
> +  in.str(".a.");
> +  in.ignore(100, 'a' + 256); // Should not match 'a'
> +  VERIFY( in.gcount() == 3 );
> +  VERIFY( in.eof() );
> +}
> +
> +// Custom traits type that inherits all behaviour from std::char_traits<char>.
> +struct traits : std::char_traits<char> { };
> +
> +void
> +test_primary_template()
> +{
> +  // Check that the primary template for std::basic_istream::ignore
> +  // works the same as the std::istream::ignore specialization.
> +  // The infinite loop bug was never present in the primary template,
> +  // because it doesn't use traits_type::find to search the input sequence.
> +
> +  std::basic_istringstream<char, traits> in(".\xfc..\xfd...\xfe.");
> +
> +  // This should find '\xfd' even on platforms where char is signed,
> +  // because the delimiter is correctly converted to the stream's int_type.
> +  in.ignore(100, std::char_traits<char>::to_int_type('\xfc'));
> +  VERIFY( in.gcount() == 2 );
> +  VERIFY( ! in.eof() );
> +
> +  // This should work equivalently to traits_type::to_int_type
> +  in.ignore(100, (unsigned char)'\xfd');
> +  VERIFY( in.gcount() == 3 );
> +  VERIFY( ! in.eof() );
> +
> +  // This only works if char is unsigned.
> +  in.ignore(100, '\xfe');
> +  if (std::numeric_limits<char>::is_signed)
> +  {
> +    // When char is signed, '\xfe' != traits_type::to_int_type('\xfe')
> +    // so the delimiter does not match the character in the input sequence,
> +    // and ignore consumes all input until EOF.
> +    VERIFY( in.gcount() == 5 );
> +    VERIFY( in.eof() );
> +  }
> +  else
> +  {
> +    // When char is unsigned, '\xfe' == to_int_type('\xfe') so the delimiter
> +    // matches the character in the input sequence, and doesn't reach EOF.
> +    VERIFY( in.gcount() == 4 );
> +    VERIFY( ! in.eof() );
> +  }
> +
> +  in.clear();
> +  in.str(".a.");
> +  in.ignore(100, 'a' + 256); // Should not match 'a'
> +  VERIFY( in.gcount() == 3 );
> +  VERIFY( in.eof() );
> +}
> +
> +int main()
> +{
> +  test_pr93672();
> +  test_primary_template();
> +}
> diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc
> new file mode 100644
> index 00000000000..5ce9155e02c
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc
> @@ -0,0 +1,34 @@
> +// { dg-do run }
> +
> +#include <sstream>
> +#include <limits>
> +#include <climits>
> +#include <testsuite_hooks.h>
> +
> +// PR 93672 was a bug in std::istream that never affected std::wistream.
> +// This test ensures that the bug doesn't get introduced to std::wistream.
> +void
> +test_pr93672()
> +{
> +  std::wstring str = L".x..x.";
> +  str[1] = (wchar_t)-2;
> +  str[4] = (wchar_t)-3;
> +  std::wistringstream in(str);
> +
> +  // This should find the character even on platforms where wchar_t is signed,
> +  // because the delimiter is correctly converted to the stream's int_type.
> +  in.ignore(100, std::char_traits<wchar_t>::to_int_type((wchar_t)-2));
> +  VERIFY( in.gcount() == 2 );
> +  VERIFY( ! in.eof() );
> +
> +  // This also works, because std::char_traits<wchar_t>::to_int_type(wc) is
> +  // equivalent to (int_type)wc so using to_int_type isn't needed.
> +  in.ignore(100, (wchar_t)-3);
> +  VERIFY( in.gcount() == 3 );
> +  VERIFY( ! in.eof() );
> +}
> +
> +int main()
> +{
> +  test_pr93672();
> +}
> --
> 2.44.0
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2024-04-15 18:30 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-04 15:29 [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] Jonathan Wakely
2024-04-04 15:40 ` Iain Sandoe
2024-04-04 16:30   ` Jonathan Wakely
2024-04-04 16:55     ` Jonathan Wakely
2024-04-04 19:53       ` Jonathan Wakely
2024-04-04 16:28 ` Ulrich Drepper
2024-04-04 16:33   ` Jonathan Wakely
2024-04-04 16:35     ` Jonathan Wakely
2024-04-08 16:48 ` [PATCH v2] " Jonathan Wakely
2024-04-15 18:30   ` Jonathan Wakely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).