* [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] @ 2024-04-04 15:29 Jonathan Wakely 2024-04-04 15:40 ` Iain Sandoe ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Jonathan Wakely @ 2024-04-04 15:29 UTC (permalink / raw) To: libstdc++, gcc-patches I would appreciate more eyes on this to confirm my conclusions about negative int_type values, and the proposed fix, make sense. Tested x86_64-linux. -- >8 -- A negative value for the delim value passed to std::istream::ignore can never match any character in the stream, because the comparison is done using traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never returns negative values (except at EOF). The optimized version of ignore for the std::istream specialization uses traits_type::find to locate the delim character in the streambuf, which _can_ match a negative delim on platforms where char is signed, but then we do another comparison using eq_int_type which fails. The code then keeps looping forever, with traits_type::find saying the character is present and eq_int_type saying it's not. A possible fix would be to check with eq_int_type after a successful find, to see whether we really have a match. However, that would be suboptimal since we know that a negative delimiter will never match using eq_int_type. So a better fix is to adjust the check at the top of the function that handles delim==eof(), so that we treat all negative delim values as equivalent to EOF. That way we don't bother using find to search for something that will never match with eq_int_type. The version of ignore in the primary template doesn't need a change, because it doesn't use traits_type::find, instead characters are extracted one-by-one and always matched using eq_int_type. That avoids the inconsistency between find and eq_int_type. libstdc++-v3/ChangeLog: PR libstdc++/93672 * src/c++98/istream.cc (istream::ignore(streamsize, int_type)): Treat all negative delimiter values as eof(). * testsuite/27_io/basic_istream/ignore/char/93672.cc: New test. --- libstdc++-v3/src/c++98/istream.cc | 5 ++++- .../27_io/basic_istream/ignore/char/93672.cc | 15 +++++++++++++++ 2 files changed, 19 insertions(+), 1 deletion(-) create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc diff --git a/libstdc++-v3/src/c++98/istream.cc b/libstdc++-v3/src/c++98/istream.cc index 07ac739c26a..aa1069dea07 100644 --- a/libstdc++-v3/src/c++98/istream.cc +++ b/libstdc++-v3/src/c++98/istream.cc @@ -112,7 +112,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION basic_istream<char>:: ignore(streamsize __n, int_type __delim) { - if (traits_type::eq_int_type(__delim, traits_type::eof())) + // sgetc() returns either (int_type)(unsigned char)c or -1 for EOF. + // If __delim is negative, then eq_int_type(sgetc(), __delim) can only + // be true for EOF, so just treat all negative values as eof(). + if (__delim < 0) return ignore(__n); _M_gcount = 0; diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc new file mode 100644 index 00000000000..6d11f5622c8 --- /dev/null +++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc @@ -0,0 +1,15 @@ +// { dg-do run } + +#include <sstream> +#include <testsuite_hooks.h> + +int main() +{ + std::istringstream in("x\xfdxxx\xfex"); + in.ignore(10, std::char_traits<char>::to_int_type('\xfd')); + VERIFY( in.gcount() == 2 ); + VERIFY( ! in.eof() ); + in.ignore(10, '\xfe'); + VERIFY( in.gcount() == 5 ); + VERIFY( in.eof() ); +} -- 2.44.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] 2024-04-04 15:29 [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] Jonathan Wakely @ 2024-04-04 15:40 ` Iain Sandoe 2024-04-04 16:30 ` Jonathan Wakely 2024-04-04 16:28 ` Ulrich Drepper 2024-04-08 16:48 ` [PATCH v2] " Jonathan Wakely 2 siblings, 1 reply; 10+ messages in thread From: Iain Sandoe @ 2024-04-04 15:40 UTC (permalink / raw) To: Jonathan Wakely; +Cc: libstdc++, GCC Patches > On 4 Apr 2024, at 16:29, Jonathan Wakely <jwakely@redhat.com> wrote: > > I would appreciate more eyes on this to confirm my conclusions about > negative int_type values, and the proposed fix, make sense. > > Tested x86_64-linux. > > -- >8 -- > > A negative value for the delim value passed to std::istream::ignore can > never match any character in the stream, because the comparison is done > using traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never > returns negative values (except at EOF). The optimized version of > ignore for the std::istream specialization uses traits_type::find to > locate the delim character in the streambuf, which _can_ match a > negative delim on platforms where char is signed, but then we do another > comparison using eq_int_type which fails. The code then keeps looping > forever, with traits_type::find saying the character is present and > eq_int_type saying it's not. > > A possible fix would be to check with eq_int_type after a successful > find, to see whether we really have a match. However, that would be > suboptimal since we know that a negative delimiter will never match > using eq_int_type. So a better fix is to adjust the check at the top of > the function that handles delim==eof(), so that we treat all negative > delim values as equivalent to EOF. That way we don't bother using find > to search for something that will never match with eq_int_type. Is the corollary to this that a platform with signed chars can never use a negative value as a delimiter - since that we always be treated as EOF? - I am not sure it there’s an actual use-case where that matters, but, Iain > > The version of ignore in the primary template doesn't need a change, > because it doesn't use traits_type::find, instead characters are > extracted one-by-one and always matched using eq_int_type. That avoids > the inconsistency between find and eq_int_type. > > libstdc++-v3/ChangeLog: > > PR libstdc++/93672 > * src/c++98/istream.cc (istream::ignore(streamsize, int_type)): > Treat all negative delimiter values as eof(). > * testsuite/27_io/basic_istream/ignore/char/93672.cc: New test. > --- > libstdc++-v3/src/c++98/istream.cc | 5 ++++- > .../27_io/basic_istream/ignore/char/93672.cc | 15 +++++++++++++++ > 2 files changed, 19 insertions(+), 1 deletion(-) > create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc > > diff --git a/libstdc++-v3/src/c++98/istream.cc b/libstdc++-v3/src/c++98/istream.cc > index 07ac739c26a..aa1069dea07 100644 > --- a/libstdc++-v3/src/c++98/istream.cc > +++ b/libstdc++-v3/src/c++98/istream.cc > @@ -112,7 +112,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > basic_istream<char>:: > ignore(streamsize __n, int_type __delim) > { > - if (traits_type::eq_int_type(__delim, traits_type::eof())) > + // sgetc() returns either (int_type)(unsigned char)c or -1 for EOF. > + // If __delim is negative, then eq_int_type(sgetc(), __delim) can only > + // be true for EOF, so just treat all negative values as eof(). > + if (__delim < 0) > return ignore(__n); > > _M_gcount = 0; > diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc > new file mode 100644 > index 00000000000..6d11f5622c8 > --- /dev/null > +++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc > @@ -0,0 +1,15 @@ > +// { dg-do run } > + > +#include <sstream> > +#include <testsuite_hooks.h> > + > +int main() > +{ > + std::istringstream in("x\xfdxxx\xfex"); > + in.ignore(10, std::char_traits<char>::to_int_type('\xfd')); > + VERIFY( in.gcount() == 2 ); > + VERIFY( ! in.eof() ); > + in.ignore(10, '\xfe'); > + VERIFY( in.gcount() == 5 ); > + VERIFY( in.eof() ); > +} > -- > 2.44.0 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] 2024-04-04 15:40 ` Iain Sandoe @ 2024-04-04 16:30 ` Jonathan Wakely 2024-04-04 16:55 ` Jonathan Wakely 0 siblings, 1 reply; 10+ messages in thread From: Jonathan Wakely @ 2024-04-04 16:30 UTC (permalink / raw) To: Iain Sandoe; +Cc: libstdc++, GCC Patches On Thu, 4 Apr 2024 at 16:40, Iain Sandoe <idsandoe@googlemail.com> wrote: > > > > > On 4 Apr 2024, at 16:29, Jonathan Wakely <jwakely@redhat.com> wrote: > > > > I would appreciate more eyes on this to confirm my conclusions about > > negative int_type values, and the proposed fix, make sense. > > > > Tested x86_64-linux. > > > > -- >8 -- > > > > A negative value for the delim value passed to std::istream::ignore can > > never match any character in the stream, because the comparison is done > > using traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never > > returns negative values (except at EOF). The optimized version of > > ignore for the std::istream specialization uses traits_type::find to > > locate the delim character in the streambuf, which _can_ match a > > negative delim on platforms where char is signed, but then we do another > > comparison using eq_int_type which fails. The code then keeps looping > > forever, with traits_type::find saying the character is present and > > eq_int_type saying it's not. > > > > A possible fix would be to check with eq_int_type after a successful > > find, to see whether we really have a match. However, that would be > > suboptimal since we know that a negative delimiter will never match > > using eq_int_type. So a better fix is to adjust the check at the top of > > the function that handles delim==eof(), so that we treat all negative > > delim values as equivalent to EOF. That way we don't bother using find > > to search for something that will never match with eq_int_type. > > Is the corollary to this that a platform with signed chars can never use a > negative value as a delimiter - since that we always be treated as EOF? That's what the C++ standard says (and is what libc++ does). The delimiter argument to ignore is an int_type, not a char. So formally you should call it like: std::cin.ignore(n, std::istream::traits_type::to_int_type('a')); where to_int_type will cast to unsigned char and then to int, so that no char can ever produce a negative value for that argument. If you happen to know that casting 'a' to unsigned char and then to int doesn't change its value (because it's a 7-bit ASCII value), then you can be lazy and do: std::cin.ignore(n, 'a'); That works fine. But if your delimiter character is the MSB set, *and* char is signed on your platform, then you can't be lazy. The implicit conversion from char to the stream's int_type is not the same as the result of calling traits_type::to_int_type, and so these are NOT equivalent on a platform with signed char: std::cin.ignore(n, '\x80'); std::cin.ignore(n, (unsigned char)'\x80'); The former is wrong, the latter is correct. The former will never match a '\x80' in the stream, because the ignore function will cast each char extracted from the stream to (int)(unsigned char) and so never match -128. So the change to treat all negative values as EOF is just an optimization. Since they can never match, there's no point searching for them. Just skip n chars. > > - I am not sure it there’s an actual use-case where that matters, but, > Iain > > > > > The version of ignore in the primary template doesn't need a change, > > because it doesn't use traits_type::find, instead characters are > > extracted one-by-one and always matched using eq_int_type. That avoids > > the inconsistency between find and eq_int_type. > > > > libstdc++-v3/ChangeLog: > > > > PR libstdc++/93672 > > * src/c++98/istream.cc (istream::ignore(streamsize, int_type)): > > Treat all negative delimiter values as eof(). > > * testsuite/27_io/basic_istream/ignore/char/93672.cc: New test. > > --- > > libstdc++-v3/src/c++98/istream.cc | 5 ++++- > > .../27_io/basic_istream/ignore/char/93672.cc | 15 +++++++++++++++ > > 2 files changed, 19 insertions(+), 1 deletion(-) > > create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc > > > > diff --git a/libstdc++-v3/src/c++98/istream.cc b/libstdc++-v3/src/c++98/istream.cc > > index 07ac739c26a..aa1069dea07 100644 > > --- a/libstdc++-v3/src/c++98/istream.cc > > +++ b/libstdc++-v3/src/c++98/istream.cc > > @@ -112,7 +112,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > > basic_istream<char>:: > > ignore(streamsize __n, int_type __delim) > > { > > - if (traits_type::eq_int_type(__delim, traits_type::eof())) > > + // sgetc() returns either (int_type)(unsigned char)c or -1 for EOF. > > + // If __delim is negative, then eq_int_type(sgetc(), __delim) can only > > + // be true for EOF, so just treat all negative values as eof(). > > + if (__delim < 0) > > return ignore(__n); > > > > _M_gcount = 0; > > diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc > > new file mode 100644 > > index 00000000000..6d11f5622c8 > > --- /dev/null > > +++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc > > @@ -0,0 +1,15 @@ > > +// { dg-do run } > > + > > +#include <sstream> > > +#include <testsuite_hooks.h> > > + > > +int main() > > +{ > > + std::istringstream in("x\xfdxxx\xfex"); > > + in.ignore(10, std::char_traits<char>::to_int_type('\xfd')); > > + VERIFY( in.gcount() == 2 ); > > + VERIFY( ! in.eof() ); > > + in.ignore(10, '\xfe'); > > + VERIFY( in.gcount() == 5 ); > > + VERIFY( in.eof() ); > > +} > > -- > > 2.44.0 > > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] 2024-04-04 16:30 ` Jonathan Wakely @ 2024-04-04 16:55 ` Jonathan Wakely 2024-04-04 19:53 ` Jonathan Wakely 0 siblings, 1 reply; 10+ messages in thread From: Jonathan Wakely @ 2024-04-04 16:55 UTC (permalink / raw) To: Iain Sandoe; +Cc: libstdc++, GCC Patches On Thu, 4 Apr 2024 at 17:30, Jonathan Wakely <jwakely@redhat.com> wrote: > > On Thu, 4 Apr 2024 at 16:40, Iain Sandoe <idsandoe@googlemail.com> wrote: > > > > > > > > > On 4 Apr 2024, at 16:29, Jonathan Wakely <jwakely@redhat.com> wrote: > > > > > > I would appreciate more eyes on this to confirm my conclusions about > > > negative int_type values, and the proposed fix, make sense. > > > > > > Tested x86_64-linux. > > > > > > -- >8 -- > > > > > > A negative value for the delim value passed to std::istream::ignore can > > > never match any character in the stream, because the comparison is done > > > using traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never > > > returns negative values (except at EOF). The optimized version of > > > ignore for the std::istream specialization uses traits_type::find to > > > locate the delim character in the streambuf, which _can_ match a > > > negative delim on platforms where char is signed, but then we do another > > > comparison using eq_int_type which fails. The code then keeps looping > > > forever, with traits_type::find saying the character is present and > > > eq_int_type saying it's not. > > > > > > A possible fix would be to check with eq_int_type after a successful > > > find, to see whether we really have a match. However, that would be > > > suboptimal since we know that a negative delimiter will never match > > > using eq_int_type. So a better fix is to adjust the check at the top of > > > the function that handles delim==eof(), so that we treat all negative > > > delim values as equivalent to EOF. That way we don't bother using find > > > to search for something that will never match with eq_int_type. > > > > Is the corollary to this that a platform with signed chars can never use a > > negative value as a delimiter - since that we always be treated as EOF? > > That's what the C++ standard says (and is what libc++ does). > > The delimiter argument to ignore is an int_type, not a char. So > formally you should call it like: > > std::cin.ignore(n, std::istream::traits_type::to_int_type('a')); > > where to_int_type will cast to unsigned char and then to int, so that > no char can ever produce a negative value for that argument. > > If you happen to know that casting 'a' to unsigned char and then to > int doesn't change its value (because it's a 7-bit ASCII value), then > you can be lazy and do: > > std::cin.ignore(n, 'a'); > > That works fine. > > But if your delimiter character is the MSB set, *and* char is signed > on your platform, then you can't be lazy. The implicit conversion from > char to the stream's int_type is not the same as the result of calling > traits_type::to_int_type, and so these are NOT equivalent on a > platform with signed char: > std::cin.ignore(n, '\x80'); > std::cin.ignore(n, (unsigned char)'\x80'); > > The former is wrong, the latter is correct. > The former will never match a '\x80' in the stream, because the ignore > function will cast each char extracted from the stream to > (int)(unsigned char) and so never match -128. > > So the change to treat all negative values as EOF is just an > optimization. Since they can never match, there's no point searching > for them. Just skip n chars. And FWIW, I don't think libc++ does that optimization. They extract char-by-char and compare them, even though that can never match a negative value. So they get the same result as with my patch, but much slower ;-) The status quo is that libstdc++ loops forever given a negative delimiter. That's obviously wrong! We need to fix the buggy logic that loops forever, and the "correct" standard conforming fix means that we never match the negative value, because the chars are extracted from the stream as non-negative int_type values. As an optimization, we can skip the loop that keeps trying impossible matches, because we know it won't match. That's what my patch does. (It occurs to me now that we could also treat any delim value greater than 255 as EOF too, since that can't match either). Ulrich's suggestion is to allow the buggy user code to Just Work (for all cases except (char)-1 on a platform where char is signed). That is maybe the least surprising behaviour for users. On the other hand, I'm not sure we really want to support: cin.ignore(n, -10); cin.ignore(n, -999); cin.ignore(n, +999); What are these trying to do? Those are just nonsense arguments to this function. But if we try to make the testcase in the bug report Just Work, we end up also accepting (at least) the -10 case, because it's indistinguishable from the char '\xf6'. Depending how we do it, we might also make the other cases work, treating -999 as '\x19', and +999 as '\xe7'. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] 2024-04-04 16:55 ` Jonathan Wakely @ 2024-04-04 19:53 ` Jonathan Wakely 0 siblings, 0 replies; 10+ messages in thread From: Jonathan Wakely @ 2024-04-04 19:53 UTC (permalink / raw) To: Iain Sandoe; +Cc: libstdc++, GCC Patches On Thu, 4 Apr 2024 at 17:55, Jonathan Wakely <jwakely@redhat.com> wrote: > > On Thu, 4 Apr 2024 at 17:30, Jonathan Wakely <jwakely@redhat.com> wrote: > Ulrich's suggestion is to allow the buggy user code to Just Work (for > all cases except (char)-1 on a platform where char is signed). That is > maybe the least surprising behaviour for users. > > On the other hand, I'm not sure we really want to support: > > cin.ignore(n, -10); > cin.ignore(n, -999); > cin.ignore(n, +999); > > What are these trying to do? Those are just nonsense arguments to this > function. But if we try to make the testcase in the bug report Just > Work, we end up also accepting (at least) the -10 case, because it's > indistinguishable from the char '\xf6'. Depending how we do it, we > might also make the other cases work, treating -999 as '\x19', and > +999 as '\xe7'. So maybe what we want is to add a new overload: basic_istream& ignore(streamsize n, char_type c) { return ignore(n, traits_type::to_int_type(c)); } This will only accept the stream's char_type, not -999 or +999, and will do the required conversion to int_type correctly, not as an implicit conversion. Calls that pass eof() will still select the other overload and do the right thing. Calls using a char (or for wstream, a wchar_t) will select the new overload. This new overload will make some calls ambiguous, e.g. ignore(1, 1L) compiles today, but would become ambiguous. That seems fine, since the second argument should not be type long (what is it even trying to do?) If that's a problem, we can make it a constrained template that only gets called for the exact char_type: basic_istream& ignore(streamsize n, same_as<char_type> auto c) This would still Do The Right Thing for ignore(n, '\x80') but would not change the result of ignore(1, 1L) which would still select the original overload taking int_type for the second parameter. I think I'll raise this idea with the committee. For now though, I think my patch fixes the bug and conforms to the standard, and doesn't do anything inventive. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] 2024-04-04 15:29 [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] Jonathan Wakely 2024-04-04 15:40 ` Iain Sandoe @ 2024-04-04 16:28 ` Ulrich Drepper 2024-04-04 16:33 ` Jonathan Wakely 2024-04-08 16:48 ` [PATCH v2] " Jonathan Wakely 2 siblings, 1 reply; 10+ messages in thread From: Ulrich Drepper @ 2024-04-04 16:28 UTC (permalink / raw) To: Jonathan Wakely; +Cc: libstdc++, gcc-patches On Thu, Apr 4, 2024 at 5:29 PM Jonathan Wakely <jwakely@redhat.com> wrote: > I would appreciate more eyes on this to confirm my conclusions about > negative int_type values, and the proposed fix, make sense. The way something like this is handled in glibc's ctype functions is that both branches are considered. For isXXX(c) whether c is -v or 256-v the same value is returned (except for EOF which is -1). This caused the least number of bad surprises. You could here also perform similar actions. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] 2024-04-04 16:28 ` Ulrich Drepper @ 2024-04-04 16:33 ` Jonathan Wakely 2024-04-04 16:35 ` Jonathan Wakely 0 siblings, 1 reply; 10+ messages in thread From: Jonathan Wakely @ 2024-04-04 16:33 UTC (permalink / raw) To: Ulrich Drepper; +Cc: libstdc++, gcc-patches On Thu, 4 Apr 2024 at 17:29, Ulrich Drepper <drepper.fsp@gmail.com> wrote: > > On Thu, Apr 4, 2024 at 5:29 PM Jonathan Wakely <jwakely@redhat.com> wrote: > > I would appreciate more eyes on this to confirm my conclusions about > > negative int_type values, and the proposed fix, make sense. > > The way something like this is handled in glibc's ctype functions is > that both branches are considered. For isXXX(c) whether c is -v or > 256-v the same value is returned (except for EOF which is -1). This > caused the least number of bad surprises. > > You could here also perform similar actions. Yes, my first attempt to fix PR93672 did exactly that, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93672#c1 But since it doesn't work for '\xff' (because that's EOF when char is signed) it only handles 127 of the 128 possible bugs ;-) I'm also not sure it's conforming, since the standard specifies how the matching is done, and that won't match negative chars. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] 2024-04-04 16:33 ` Jonathan Wakely @ 2024-04-04 16:35 ` Jonathan Wakely 0 siblings, 0 replies; 10+ messages in thread From: Jonathan Wakely @ 2024-04-04 16:35 UTC (permalink / raw) To: Ulrich Drepper; +Cc: libstdc++, gcc-patches On Thu, 4 Apr 2024 at 17:33, Jonathan Wakely <jwakely@redhat.com> wrote: > > On Thu, 4 Apr 2024 at 17:29, Ulrich Drepper <drepper.fsp@gmail.com> wrote: > > > > On Thu, Apr 4, 2024 at 5:29 PM Jonathan Wakely <jwakely@redhat.com> wrote: > > > I would appreciate more eyes on this to confirm my conclusions about > > > negative int_type values, and the proposed fix, make sense. > > > > The way something like this is handled in glibc's ctype functions is > > that both branches are considered. For isXXX(c) whether c is -v or > > 256-v the same value is returned (except for EOF which is -1). This > > caused the least number of bad surprises. > > > > You could here also perform similar actions. > > Yes, my first attempt to fix PR93672 did exactly that, see > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93672#c1 > > But since it doesn't work for '\xff' (because that's EOF when char is > signed) it only handles 127 of the 128 possible bugs ;-) > I'm also not sure it's conforming, since the standard specifies how > the matching is done, and that won't match negative chars. I might suggest relaxing the standard to allow it though... ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] 2024-04-04 15:29 [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] Jonathan Wakely 2024-04-04 15:40 ` Iain Sandoe 2024-04-04 16:28 ` Ulrich Drepper @ 2024-04-08 16:48 ` Jonathan Wakely 2024-04-15 18:30 ` Jonathan Wakely 2 siblings, 1 reply; 10+ messages in thread From: Jonathan Wakely @ 2024-04-08 16:48 UTC (permalink / raw) To: libstdc++, gcc-patches Patch v2. I realised that it's not only negative delim values that cause the problem, but also ones greater than CHAR_MAX. Calling ignore(n, 'a'+256) will cause traits_type::find to match 'a' but then the eq_int_type comparison will fail because (int)'a' != (int)('a' + 256). This version of the patch calls to_int_type on the delim and if that alters the value, it's never going to match so skip the loop that tries to find it and just ignore up to n chars instead. Tested x86_64linux and aarch64-linux. -- >8 -- A negative delim value passed to std::istream::ignore can never match any character in the stream, because the comparison is done using traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never returns negative values (except at EOF). The optimized version of ignore for the std::istream specialization uses traits_type::find to locate the delim character in the streambuf, which _can_ match a negative delim on platforms where char is signed, but then we do another comparison using eq_int_type which fails. The code then keeps looping forever, with traits_type::find locating the character and traits_type::eq_int_type saying it's not a match, so traits_type::find is used again and finds the same character again. A possible fix would be to check with eq_int_type after a successful find, to see whether we really have a match. However, that would be suboptimal since we know that a negative delimiter will never match using eq_int_type. So a better fix is to adjust the check at the top of the function that handles delim==eof(), so that we treat all negative delim values as equivalent to EOF. That way we don't bother using find to search for something that will never match with eq_int_type. The version of ignore in the primary template doesn't need a change, because it doesn't use traits_type::find, instead characters are extracted one-by-one and always matched using eq_int_type. That avoids the inconsistency between find and eq_int_type. The specialization for std::wistream does use traits_type::find, but traits_type::to_int_type is equivalent to an implicit conversion from wchar_t to wint_t, so passing a wchar_t directly to ignore without using to_int_type works. libstdc++-v3/ChangeLog: PR libstdc++/93672 * src/c++98/istream.cc (istream::ignore(streamsize, int_type)): Treat all negative delimiter values as eof(). * testsuite/27_io/basic_istream/ignore/char/93672.cc: New test. * testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc: New test. --- libstdc++-v3/src/c++98/istream.cc | 13 ++- .../27_io/basic_istream/ignore/char/93672.cc | 101 ++++++++++++++++++ .../basic_istream/ignore/wchar_t/93672.cc | 34 ++++++ 3 files changed, 146 insertions(+), 2 deletions(-) create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc diff --git a/libstdc++-v3/src/c++98/istream.cc b/libstdc++-v3/src/c++98/istream.cc index 07ac739c26a..d1b4444ff2b 100644 --- a/libstdc++-v3/src/c++98/istream.cc +++ b/libstdc++-v3/src/c++98/istream.cc @@ -112,8 +112,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION basic_istream<char>:: ignore(streamsize __n, int_type __delim) { - if (traits_type::eq_int_type(__delim, traits_type::eof())) - return ignore(__n); + { + // If conversion to int_type changes the value then __delim does not + // correspond to a value of type char_type, and so will never match + // a character extracted from the input sequence. Just use ignore(n). + const int_type chk_delim = traits_type::to_int_type(__delim); + const bool matchable = traits_type::eq_int_type(chk_delim, __delim); + if (__builtin_expect(!matchable, 0)) + return ignore(__n); + // Now we know that __delim is a valid char_type value, so it's safe + // for the code below to use traits_type::find to search for it. + } _M_gcount = 0; sentry __cerb(*this, true); diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc new file mode 100644 index 00000000000..96737485b83 --- /dev/null +++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc @@ -0,0 +1,101 @@ +// { dg-do run } + +#include <sstream> +#include <limits> +#include <testsuite_hooks.h> + +void +test_pr93672() // std::basic_istream::ignore hangs if delim MSB is set +{ + std::istringstream in(".\xfc..\xfd...\xfe."); + + // This should find '\xfd' even on platforms where char is signed, + // because the delimiter is correctly converted to the stream's int_type. + in.ignore(100, std::char_traits<char>::to_int_type('\xfc')); + VERIFY( in.gcount() == 2 ); + VERIFY( ! in.eof() ); + + // This should work equivalently to traits_type::to_int_type + in.ignore(100, (unsigned char)'\xfd'); + VERIFY( in.gcount() == 3 ); + VERIFY( ! in.eof() ); + + // This only works if char is unsigned. + in.ignore(100, '\xfe'); + if (std::numeric_limits<char>::is_signed) + { + // When char is signed, '\xfe' != traits_type::to_int_type('\xfe') + // so the delimiter does not match the character in the input sequence, + // and ignore consumes all input until EOF. + VERIFY( in.gcount() == 5 ); + VERIFY( in.eof() ); + } + else + { + // When char is unsigned, '\xfe' == to_int_type('\xfe') so the delimiter + // matches the character in the input sequence, and doesn't reach EOF. + VERIFY( in.gcount() == 4 ); + VERIFY( ! in.eof() ); + } + + in.clear(); + in.str(".a."); + in.ignore(100, 'a' + 256); // Should not match 'a' + VERIFY( in.gcount() == 3 ); + VERIFY( in.eof() ); +} + +// Custom traits type that inherits all behaviour from std::char_traits<char>. +struct traits : std::char_traits<char> { }; + +void +test_primary_template() +{ + // Check that the primary template for std::basic_istream::ignore + // works the same as the std::istream::ignore specialization. + // The infinite loop bug was never present in the primary template, + // because it doesn't use traits_type::find to search the input sequence. + + std::basic_istringstream<char, traits> in(".\xfc..\xfd...\xfe."); + + // This should find '\xfd' even on platforms where char is signed, + // because the delimiter is correctly converted to the stream's int_type. + in.ignore(100, std::char_traits<char>::to_int_type('\xfc')); + VERIFY( in.gcount() == 2 ); + VERIFY( ! in.eof() ); + + // This should work equivalently to traits_type::to_int_type + in.ignore(100, (unsigned char)'\xfd'); + VERIFY( in.gcount() == 3 ); + VERIFY( ! in.eof() ); + + // This only works if char is unsigned. + in.ignore(100, '\xfe'); + if (std::numeric_limits<char>::is_signed) + { + // When char is signed, '\xfe' != traits_type::to_int_type('\xfe') + // so the delimiter does not match the character in the input sequence, + // and ignore consumes all input until EOF. + VERIFY( in.gcount() == 5 ); + VERIFY( in.eof() ); + } + else + { + // When char is unsigned, '\xfe' == to_int_type('\xfe') so the delimiter + // matches the character in the input sequence, and doesn't reach EOF. + VERIFY( in.gcount() == 4 ); + VERIFY( ! in.eof() ); + } + + in.clear(); + in.str(".a."); + in.ignore(100, 'a' + 256); // Should not match 'a' + VERIFY( in.gcount() == 3 ); + VERIFY( in.eof() ); +} + +int main() +{ + test_pr93672(); + test_primary_template(); +} diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc new file mode 100644 index 00000000000..5ce9155e02c --- /dev/null +++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc @@ -0,0 +1,34 @@ +// { dg-do run } + +#include <sstream> +#include <limits> +#include <climits> +#include <testsuite_hooks.h> + +// PR 93672 was a bug in std::istream that never affected std::wistream. +// This test ensures that the bug doesn't get introduced to std::wistream. +void +test_pr93672() +{ + std::wstring str = L".x..x."; + str[1] = (wchar_t)-2; + str[4] = (wchar_t)-3; + std::wistringstream in(str); + + // This should find the character even on platforms where wchar_t is signed, + // because the delimiter is correctly converted to the stream's int_type. + in.ignore(100, std::char_traits<wchar_t>::to_int_type((wchar_t)-2)); + VERIFY( in.gcount() == 2 ); + VERIFY( ! in.eof() ); + + // This also works, because std::char_traits<wchar_t>::to_int_type(wc) is + // equivalent to (int_type)wc so using to_int_type isn't needed. + in.ignore(100, (wchar_t)-3); + VERIFY( in.gcount() == 3 ); + VERIFY( ! in.eof() ); +} + +int main() +{ + test_pr93672(); +} -- 2.44.0 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] 2024-04-08 16:48 ` [PATCH v2] " Jonathan Wakely @ 2024-04-15 18:30 ` Jonathan Wakely 0 siblings, 0 replies; 10+ messages in thread From: Jonathan Wakely @ 2024-04-15 18:30 UTC (permalink / raw) To: libstdc++, gcc-patches Pushed to trunk now. On Mon, 8 Apr 2024 at 17:53, Jonathan Wakely <jwakely@redhat.com> wrote: > > Patch v2. > > I realised that it's not only negative delim values that cause the > problem, but also ones greater than CHAR_MAX. Calling ignore(n, 'a'+256) > will cause traits_type::find to match 'a' but then the eq_int_type > comparison will fail because (int)'a' != (int)('a' + 256). > > This version of the patch calls to_int_type on the delim and if that > alters the value, it's never going to match so skip the loop that tries > to find it and just ignore up to n chars instead. > > Tested x86_64linux and aarch64-linux. > > -- >8 -- > > A negative delim value passed to std::istream::ignore can never match > any character in the stream, because the comparison is done using > traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never returns > negative values (except at EOF). The optimized version of ignore for the > std::istream specialization uses traits_type::find to locate the delim > character in the streambuf, which _can_ match a negative delim on > platforms where char is signed, but then we do another comparison using > eq_int_type which fails. The code then keeps looping forever, with > traits_type::find locating the character and traits_type::eq_int_type > saying it's not a match, so traits_type::find is used again and finds > the same character again. > > A possible fix would be to check with eq_int_type after a successful > find, to see whether we really have a match. However, that would be > suboptimal since we know that a negative delimiter will never match > using eq_int_type. So a better fix is to adjust the check at the top of > the function that handles delim==eof(), so that we treat all negative > delim values as equivalent to EOF. That way we don't bother using find > to search for something that will never match with eq_int_type. > > The version of ignore in the primary template doesn't need a change, > because it doesn't use traits_type::find, instead characters are > extracted one-by-one and always matched using eq_int_type. That avoids > the inconsistency between find and eq_int_type. The specialization for > std::wistream does use traits_type::find, but traits_type::to_int_type > is equivalent to an implicit conversion from wchar_t to wint_t, so > passing a wchar_t directly to ignore without using to_int_type works. > > libstdc++-v3/ChangeLog: > > PR libstdc++/93672 > * src/c++98/istream.cc (istream::ignore(streamsize, int_type)): > Treat all negative delimiter values as eof(). > * testsuite/27_io/basic_istream/ignore/char/93672.cc: New test. > * testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc: New > test. > --- > libstdc++-v3/src/c++98/istream.cc | 13 ++- > .../27_io/basic_istream/ignore/char/93672.cc | 101 ++++++++++++++++++ > .../basic_istream/ignore/wchar_t/93672.cc | 34 ++++++ > 3 files changed, 146 insertions(+), 2 deletions(-) > create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc > create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc > > diff --git a/libstdc++-v3/src/c++98/istream.cc b/libstdc++-v3/src/c++98/istream.cc > index 07ac739c26a..d1b4444ff2b 100644 > --- a/libstdc++-v3/src/c++98/istream.cc > +++ b/libstdc++-v3/src/c++98/istream.cc > @@ -112,8 +112,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > basic_istream<char>:: > ignore(streamsize __n, int_type __delim) > { > - if (traits_type::eq_int_type(__delim, traits_type::eof())) > - return ignore(__n); > + { > + // If conversion to int_type changes the value then __delim does not > + // correspond to a value of type char_type, and so will never match > + // a character extracted from the input sequence. Just use ignore(n). > + const int_type chk_delim = traits_type::to_int_type(__delim); > + const bool matchable = traits_type::eq_int_type(chk_delim, __delim); > + if (__builtin_expect(!matchable, 0)) > + return ignore(__n); > + // Now we know that __delim is a valid char_type value, so it's safe > + // for the code below to use traits_type::find to search for it. > + } > > _M_gcount = 0; > sentry __cerb(*this, true); > diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc > new file mode 100644 > index 00000000000..96737485b83 > --- /dev/null > +++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc > @@ -0,0 +1,101 @@ > +// { dg-do run } > + > +#include <sstream> > +#include <limits> > +#include <testsuite_hooks.h> > + > +void > +test_pr93672() // std::basic_istream::ignore hangs if delim MSB is set > +{ > + std::istringstream in(".\xfc..\xfd...\xfe."); > + > + // This should find '\xfd' even on platforms where char is signed, > + // because the delimiter is correctly converted to the stream's int_type. > + in.ignore(100, std::char_traits<char>::to_int_type('\xfc')); > + VERIFY( in.gcount() == 2 ); > + VERIFY( ! in.eof() ); > + > + // This should work equivalently to traits_type::to_int_type > + in.ignore(100, (unsigned char)'\xfd'); > + VERIFY( in.gcount() == 3 ); > + VERIFY( ! in.eof() ); > + > + // This only works if char is unsigned. > + in.ignore(100, '\xfe'); > + if (std::numeric_limits<char>::is_signed) > + { > + // When char is signed, '\xfe' != traits_type::to_int_type('\xfe') > + // so the delimiter does not match the character in the input sequence, > + // and ignore consumes all input until EOF. > + VERIFY( in.gcount() == 5 ); > + VERIFY( in.eof() ); > + } > + else > + { > + // When char is unsigned, '\xfe' == to_int_type('\xfe') so the delimiter > + // matches the character in the input sequence, and doesn't reach EOF. > + VERIFY( in.gcount() == 4 ); > + VERIFY( ! in.eof() ); > + } > + > + in.clear(); > + in.str(".a."); > + in.ignore(100, 'a' + 256); // Should not match 'a' > + VERIFY( in.gcount() == 3 ); > + VERIFY( in.eof() ); > +} > + > +// Custom traits type that inherits all behaviour from std::char_traits<char>. > +struct traits : std::char_traits<char> { }; > + > +void > +test_primary_template() > +{ > + // Check that the primary template for std::basic_istream::ignore > + // works the same as the std::istream::ignore specialization. > + // The infinite loop bug was never present in the primary template, > + // because it doesn't use traits_type::find to search the input sequence. > + > + std::basic_istringstream<char, traits> in(".\xfc..\xfd...\xfe."); > + > + // This should find '\xfd' even on platforms where char is signed, > + // because the delimiter is correctly converted to the stream's int_type. > + in.ignore(100, std::char_traits<char>::to_int_type('\xfc')); > + VERIFY( in.gcount() == 2 ); > + VERIFY( ! in.eof() ); > + > + // This should work equivalently to traits_type::to_int_type > + in.ignore(100, (unsigned char)'\xfd'); > + VERIFY( in.gcount() == 3 ); > + VERIFY( ! in.eof() ); > + > + // This only works if char is unsigned. > + in.ignore(100, '\xfe'); > + if (std::numeric_limits<char>::is_signed) > + { > + // When char is signed, '\xfe' != traits_type::to_int_type('\xfe') > + // so the delimiter does not match the character in the input sequence, > + // and ignore consumes all input until EOF. > + VERIFY( in.gcount() == 5 ); > + VERIFY( in.eof() ); > + } > + else > + { > + // When char is unsigned, '\xfe' == to_int_type('\xfe') so the delimiter > + // matches the character in the input sequence, and doesn't reach EOF. > + VERIFY( in.gcount() == 4 ); > + VERIFY( ! in.eof() ); > + } > + > + in.clear(); > + in.str(".a."); > + in.ignore(100, 'a' + 256); // Should not match 'a' > + VERIFY( in.gcount() == 3 ); > + VERIFY( in.eof() ); > +} > + > +int main() > +{ > + test_pr93672(); > + test_primary_template(); > +} > diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc > new file mode 100644 > index 00000000000..5ce9155e02c > --- /dev/null > +++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc > @@ -0,0 +1,34 @@ > +// { dg-do run } > + > +#include <sstream> > +#include <limits> > +#include <climits> > +#include <testsuite_hooks.h> > + > +// PR 93672 was a bug in std::istream that never affected std::wistream. > +// This test ensures that the bug doesn't get introduced to std::wistream. > +void > +test_pr93672() > +{ > + std::wstring str = L".x..x."; > + str[1] = (wchar_t)-2; > + str[4] = (wchar_t)-3; > + std::wistringstream in(str); > + > + // This should find the character even on platforms where wchar_t is signed, > + // because the delimiter is correctly converted to the stream's int_type. > + in.ignore(100, std::char_traits<wchar_t>::to_int_type((wchar_t)-2)); > + VERIFY( in.gcount() == 2 ); > + VERIFY( ! in.eof() ); > + > + // This also works, because std::char_traits<wchar_t>::to_int_type(wc) is > + // equivalent to (int_type)wc so using to_int_type isn't needed. > + in.ignore(100, (wchar_t)-3); > + VERIFY( in.gcount() == 3 ); > + VERIFY( ! in.eof() ); > +} > + > +int main() > +{ > + test_pr93672(); > +} > -- > 2.44.0 > ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-04-15 18:30 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-04-04 15:29 [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672] Jonathan Wakely 2024-04-04 15:40 ` Iain Sandoe 2024-04-04 16:30 ` Jonathan Wakely 2024-04-04 16:55 ` Jonathan Wakely 2024-04-04 19:53 ` Jonathan Wakely 2024-04-04 16:28 ` Ulrich Drepper 2024-04-04 16:33 ` Jonathan Wakely 2024-04-04 16:35 ` Jonathan Wakely 2024-04-08 16:48 ` [PATCH v2] " Jonathan Wakely 2024-04-15 18:30 ` Jonathan Wakely
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).