public inbox for libstdc++@gcc.gnu.org
 help / color / mirror / Atom feed
From: Jonathan Wakely <jwakely@redhat.com>
To: Iain Sandoe <idsandoe@googlemail.com>
Cc: "libstdc++" <libstdc++@gcc.gnu.org>,
	GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672]
Date: Thu, 4 Apr 2024 17:55:36 +0100	[thread overview]
Message-ID: <CACb0b4kNaX+BX8feL4U9UXf0RZ7q21ocQaD4Va5_UYXf4_vEQQ@mail.gmail.com> (raw)
In-Reply-To: <CACb0b4k61G-2nttkL7e6e6XaOJ-UUG03dVedhc75JYS9xQfSow@mail.gmail.com>

On Thu, 4 Apr 2024 at 17:30, Jonathan Wakely <jwakely@redhat.com> wrote:
>
> On Thu, 4 Apr 2024 at 16:40, Iain Sandoe <idsandoe@googlemail.com> wrote:
> >
> >
> >
> > > On 4 Apr 2024, at 16:29, Jonathan Wakely <jwakely@redhat.com> wrote:
> > >
> > > I would appreciate more eyes on this to confirm my conclusions about
> > > negative int_type values, and the proposed fix, make sense.
> > >
> > > Tested x86_64-linux.
> > >
> > > -- >8 --
> > >
> > > A negative value for the delim value passed to std::istream::ignore can
> > > never match any character in the stream, because the comparison is done
> > > using traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never
> > > returns negative values (except at EOF). The optimized version of
> > > ignore for the std::istream specialization uses traits_type::find to
> > > locate the delim character in the streambuf, which _can_ match a
> > > negative delim on platforms where char is signed, but then we do another
> > > comparison using eq_int_type which fails. The code then keeps looping
> > > forever, with traits_type::find saying the character is present and
> > > eq_int_type saying it's not.
> > >
> > > A possible fix would be to check with eq_int_type after a successful
> > > find, to see whether we really have a match. However, that would be
> > > suboptimal since we know that a negative delimiter will never match
> > > using eq_int_type. So a better fix is to adjust the check at the top of
> > > the function that handles delim==eof(), so that we treat all negative
> > > delim values as equivalent to EOF. That way we don't bother using find
> > > to search for something that will never match with eq_int_type.
> >
> > Is the corollary to this that a platform with signed chars can never use a
> > negative value as a delimiter - since that we always be treated as EOF?
>
> That's what the C++ standard says (and is what libc++ does).
>
> The delimiter argument to ignore is an int_type, not a char. So
> formally you should call it like:
>
> std::cin.ignore(n, std::istream::traits_type::to_int_type('a'));
>
> where to_int_type will cast to unsigned char and then to int, so that
> no char can ever produce a negative value for that argument.
>
> If you happen to know that casting 'a' to unsigned char and then to
> int doesn't change its value (because it's a 7-bit ASCII value), then
> you can be lazy and do:
>
> std::cin.ignore(n, 'a');
>
> That works fine.
>
> But if your delimiter character is the MSB set, *and* char is signed
> on your platform, then you can't be lazy. The implicit conversion from
> char to the stream's int_type is not the same as the result of calling
> traits_type::to_int_type, and so these are NOT equivalent on a
> platform with signed char:
> std::cin.ignore(n, '\x80');
> std::cin.ignore(n, (unsigned char)'\x80');
>
> The former is wrong, the latter is correct.
> The former will never match a '\x80' in the stream, because the ignore
> function will cast each char extracted from the stream to
> (int)(unsigned char) and so never match -128.
>
> So the change to treat all negative values as EOF is just an
> optimization. Since they can never match, there's no point searching
> for them. Just skip n chars.

And FWIW, I don't think libc++ does that optimization. They extract
char-by-char and compare them, even though that can never match a
negative value. So they get the same result as with my patch, but much
slower ;-)

The status quo is that libstdc++ loops forever given a negative
delimiter. That's obviously wrong!

We need to fix the buggy logic that loops forever, and the "correct"
standard conforming fix means that we never match the negative value,
because the chars are extracted from the stream as non-negative
int_type values. As an optimization, we can skip the loop that keeps
trying impossible matches, because we know it won't match. That's what
my patch does. (It occurs to me now that we could also treat any delim
value greater than 255 as EOF too, since that can't match either).

Ulrich's suggestion is to allow the buggy user code to Just Work (for
all cases except (char)-1 on a platform where char is signed). That is
maybe the least surprising behaviour for users.

On the other hand, I'm not sure we really want to support:

cin.ignore(n, -10);
cin.ignore(n, -999);
cin.ignore(n, +999);

What are these trying to do? Those are just nonsense arguments to this
function. But if we try to make the testcase in the bug report Just
Work, we end up also accepting (at least) the -10 case, because it's
indistinguishable from the char '\xf6'. Depending how we do it, we
might also make the other cases work, treating -999 as '\x19', and
+999 as '\xe7'.


  reply	other threads:[~2024-04-04 16:55 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-04 15:29 Jonathan Wakely
2024-04-04 15:40 ` Iain Sandoe
2024-04-04 16:30   ` Jonathan Wakely
2024-04-04 16:55     ` Jonathan Wakely [this message]
2024-04-04 19:53       ` Jonathan Wakely
2024-04-04 16:28 ` Ulrich Drepper
2024-04-04 16:33   ` Jonathan Wakely
2024-04-04 16:35     ` Jonathan Wakely
2024-04-08 16:48 ` [PATCH v2] " Jonathan Wakely
2024-04-15 18:30   ` Jonathan Wakely

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CACb0b4kNaX+BX8feL4U9UXf0RZ7q21ocQaD4Va5_UYXf4_vEQQ@mail.gmail.com \
    --to=jwakely@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=idsandoe@googlemail.com \
    --cc=libstdc++@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).