From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20240404153158.313297-1-jwakely@redhat.com> <20240408165252.196710-1-jwakely@redhat.com>
In-Reply-To: <20240408165252.196710-1-jwakely@redhat.com>
From: Jonathan Wakely
Date: Mon, 15 Apr 2024 19:30:21 +0100
Message-ID:
Subject: Re: [PATCH v2] libstdc++: Fix infinite loop in std::istream::ignore(n, delim) [PR93672]
To:
 libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Content-Type: text/plain; charset="UTF-8"

Pushed to trunk now.

On Mon, 8 Apr 2024 at 17:53, Jonathan Wakely wrote:
>
> Patch v2.
>
> I realised that it's not only negative delim values that cause the
> problem, but also ones greater than UCHAR_MAX. Calling ignore(n, 'a'+256)
> will cause traits_type::find to match 'a' but then the eq_int_type
> comparison will fail because (int)'a' != (int)('a' + 256).
>
> This version of the patch calls to_int_type on the delim and if that
> alters the value, it's never going to match, so we skip the loop that
> tries to find it and just ignore up to n chars instead.
>
> Tested x86_64-linux and aarch64-linux.
>
> -- >8 --
>
> A negative delim value passed to std::istream::ignore can never match
> any character in the stream, because the comparison is done using
> traits_type::eq_int_type(sb->sgetc(), delim) and sgetc() never returns
> negative values (except at EOF). The optimized version of ignore for the
> std::istream specialization uses traits_type::find to locate the delim
> character in the streambuf, which _can_ match a negative delim on
> platforms where char is signed, but then we do another comparison using
> eq_int_type which fails. The code then keeps looping forever, with
> traits_type::find locating the character and traits_type::eq_int_type
> saying it's not a match, so traits_type::find is used again and finds
> the same character again.
>
> A possible fix would be to check with eq_int_type after a successful
> find, to see whether we really have a match.
> However, that would be suboptimal since we know that a negative
> delimiter will never match using eq_int_type. So a better fix is to
> adjust the check at the top of the function that handles delim==eof(),
> so that we treat all negative delim values as equivalent to EOF. That
> way we don't bother using find to search for something that will never
> match with eq_int_type.
>
> The version of ignore in the primary template doesn't need a change,
> because it doesn't use traits_type::find; instead, characters are
> extracted one-by-one and always matched using eq_int_type. That avoids
> the inconsistency between find and eq_int_type. The specialization for
> std::wistream does use traits_type::find, but traits_type::to_int_type
> is equivalent to an implicit conversion from wchar_t to wint_t, so
> passing a wchar_t directly to ignore without using to_int_type works.
>
> libstdc++-v3/ChangeLog:
>
>         PR libstdc++/93672
>         * src/c++98/istream.cc (istream::ignore(streamsize, int_type)):
>         Treat all negative delimiter values as eof().
>         * testsuite/27_io/basic_istream/ignore/char/93672.cc: New test.
>         * testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc: New
>         test.
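
For illustration, the v2 check (call to_int_type on the delim and see
whether the value changes) can be sketched outside the library. This is
only a sketch, not the libstdc++ code: is_matchable_delim is a
hypothetical name, and it assumes std::char_traits<char>, where
to_int_type converts the char via unsigned char.

```cpp
#include <string>   // std::char_traits

using traits = std::char_traits<char>;

// Hypothetical helper (not part of libstdc++ or of the patch).
// Returns true if delim corresponds to some char value, i.e. if a
// character read from the stream could ever compare equal to it.
bool is_matchable_delim(traits::int_type delim)
{
    // Narrow to char, then widen again the way the stream does.
    // Before C++20 the out-of-range narrowing is implementation-defined;
    // GCC reduces the value modulo 2^N.
    const traits::int_type round_trip =
        traits::to_int_type(static_cast<char>(delim));
    // If the round trip changed the value, no extracted character can
    // ever be eq_int_type-equal to delim.
    return traits::eq_int_type(round_trip, delim);
}
```

With these assumptions, 'a' and to_int_type('\xfd') are matchable, while
'a' + 256, -2 and eof() are not, which is the distinction the patch uses
to decide whether searching with traits_type::find is safe.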
> ---
>  libstdc++-v3/src/c++98/istream.cc             |  13 ++-
>  .../27_io/basic_istream/ignore/char/93672.cc  | 101 ++++++++++++++++++
>  .../basic_istream/ignore/wchar_t/93672.cc     |  34 ++++++
>  3 files changed, 146 insertions(+), 2 deletions(-)
>  create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
>  create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc
>
> diff --git a/libstdc++-v3/src/c++98/istream.cc b/libstdc++-v3/src/c++98/istream.cc
> index 07ac739c26a..d1b4444ff2b 100644
> --- a/libstdc++-v3/src/c++98/istream.cc
> +++ b/libstdc++-v3/src/c++98/istream.cc
> @@ -112,8 +112,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      basic_istream<char>::
>      ignore(streamsize __n, int_type __delim)
>      {
> -      if (traits_type::eq_int_type(__delim, traits_type::eof()))
> -        return ignore(__n);
> +      {
> +        // If conversion to int_type changes the value then __delim does not
> +        // correspond to a value of type char_type, and so will never match
> +        // a character extracted from the input sequence. Just use ignore(n).
> +        const int_type chk_delim = traits_type::to_int_type(__delim);
> +        const bool matchable = traits_type::eq_int_type(chk_delim, __delim);
> +        if (__builtin_expect(!matchable, 0))
> +          return ignore(__n);
> +        // Now we know that __delim is a valid char_type value, so it's safe
> +        // for the code below to use traits_type::find to search for it.
> +      }
>
>        _M_gcount = 0;
>        sentry __cerb(*this, true);
> diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
> new file mode 100644
> index 00000000000..96737485b83
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/char/93672.cc
> @@ -0,0 +1,101 @@
> +// { dg-do run }
> +
> +#include <sstream>
> +#include <limits>
> +#include <testsuite_hooks.h>
> +
> +void
> +test_pr93672() // std::basic_istream::ignore hangs if delim MSB is set
> +{
> +  std::istringstream in(".\xfc..\xfd...\xfe.");
> +
> +  // This should find '\xfc' even on platforms where char is signed,
> +  // because the delimiter is correctly converted to the stream's int_type.
> +  in.ignore(100, std::char_traits<char>::to_int_type('\xfc'));
> +  VERIFY( in.gcount() == 2 );
> +  VERIFY( ! in.eof() );
> +
> +  // This should work equivalently to traits_type::to_int_type
> +  in.ignore(100, (unsigned char)'\xfd');
> +  VERIFY( in.gcount() == 3 );
> +  VERIFY( ! in.eof() );
> +
> +  // This only works if char is unsigned.
> +  in.ignore(100, '\xfe');
> +  if (std::numeric_limits<char>::is_signed)
> +  {
> +    // When char is signed, '\xfe' != traits_type::to_int_type('\xfe')
> +    // so the delimiter does not match the character in the input sequence,
> +    // and ignore consumes all input until EOF.
> +    VERIFY( in.gcount() == 5 );
> +    VERIFY( in.eof() );
> +  }
> +  else
> +  {
> +    // When char is unsigned, '\xfe' == to_int_type('\xfe') so the delimiter
> +    // matches the character in the input sequence, and doesn't reach EOF.
> +    VERIFY( in.gcount() == 4 );
> +    VERIFY( ! in.eof() );
> +  }
> +
> +  in.clear();
> +  in.str(".a.");
> +  in.ignore(100, 'a' + 256); // Should not match 'a'
> +  VERIFY( in.gcount() == 3 );
> +  VERIFY( in.eof() );
> +}
> +
> +// Custom traits type that inherits all behaviour from std::char_traits<char>.
> +struct traits : std::char_traits<char> { };
> +
> +void
> +test_primary_template()
> +{
> +  // Check that the primary template for std::basic_istream::ignore
> +  // works the same as the std::istream::ignore specialization.
> +  // The infinite loop bug was never present in the primary template,
> +  // because it doesn't use traits_type::find to search the input sequence.
> +
> +  std::basic_istringstream<char, traits> in(".\xfc..\xfd...\xfe.");
> +
> +  // This should find '\xfc' even on platforms where char is signed,
> +  // because the delimiter is correctly converted to the stream's int_type.
> +  in.ignore(100, std::char_traits<char>::to_int_type('\xfc'));
> +  VERIFY( in.gcount() == 2 );
> +  VERIFY( ! in.eof() );
> +
> +  // This should work equivalently to traits_type::to_int_type
> +  in.ignore(100, (unsigned char)'\xfd');
> +  VERIFY( in.gcount() == 3 );
> +  VERIFY( ! in.eof() );
> +
> +  // This only works if char is unsigned.
> +  in.ignore(100, '\xfe');
> +  if (std::numeric_limits<char>::is_signed)
> +  {
> +    // When char is signed, '\xfe' != traits_type::to_int_type('\xfe')
> +    // so the delimiter does not match the character in the input sequence,
> +    // and ignore consumes all input until EOF.
> +    VERIFY( in.gcount() == 5 );
> +    VERIFY( in.eof() );
> +  }
> +  else
> +  {
> +    // When char is unsigned, '\xfe' == to_int_type('\xfe') so the delimiter
> +    // matches the character in the input sequence, and doesn't reach EOF.
> +    VERIFY( in.gcount() == 4 );
> +    VERIFY( ! in.eof() );
> +  }
> +
> +  in.clear();
> +  in.str(".a.");
> +  in.ignore(100, 'a' + 256); // Should not match 'a'
> +  VERIFY( in.gcount() == 3 );
> +  VERIFY( in.eof() );
> +}
> +
> +int main()
> +{
> +  test_pr93672();
> +  test_primary_template();
> +}
> diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc
> new file mode 100644
> index 00000000000..5ce9155e02c
> --- /dev/null
> +++ b/libstdc++-v3/testsuite/27_io/basic_istream/ignore/wchar_t/93672.cc
> @@ -0,0 +1,34 @@
> +// { dg-do run }
> +
> +#include <sstream>
> +#include <limits>
> +#include <string>
> +#include <testsuite_hooks.h>
> +
> +// PR 93672 was a bug in std::istream that never affected std::wistream.
> +// This test ensures that the bug doesn't get introduced to std::wistream.
> +void
> +test_pr93672()
> +{
> +  std::wstring str = L".x..x.";
> +  str[1] = (wchar_t)-2;
> +  str[4] = (wchar_t)-3;
> +  std::wistringstream in(str);
> +
> +  // This should find the character even on platforms where wchar_t is signed,
> +  // because the delimiter is correctly converted to the stream's int_type.
> +  in.ignore(100, std::char_traits<wchar_t>::to_int_type((wchar_t)-2));
> +  VERIFY( in.gcount() == 2 );
> +  VERIFY( ! in.eof() );
> +
> +  // This also works, because std::char_traits<wchar_t>::to_int_type(wc) is
> +  // equivalent to (int_type)wc so using to_int_type isn't needed.
> +  in.ignore(100, (wchar_t)-3);
> +  VERIFY( in.gcount() == 3 );
> +  VERIFY( ! in.eof() );
> +}
> +
> +int main()
> +{
> +  test_pr93672();
> +}
> --
> 2.44.0
>
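
As the char tests show, a plain char with the MSB set only works as a
delimiter where char is unsigned, so portable callers may want to convert
explicitly. A minimal caller-side sketch, assuming nothing beyond the
standard library (skip_past is a hypothetical helper name, not something
added by the patch):

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: discard up to n characters from in, stopping
// after the first occurrence of delim. The delimiter is converted with
// to_int_type, so it matches even when char is signed and the character
// has its most significant bit set.
std::streamsize skip_past(std::istream& in, char delim, std::streamsize n)
{
    using traits = std::istream::traits_type;
    in.ignore(n, traits::to_int_type(delim));
    return in.gcount();   // number of characters actually discarded
}
```

For example, with std::istringstream in(".\xfc..\xfd."), the call
skip_past(in, '\xfc', 100) discards two characters and leaves the stream
positioned just after '\xfc', whether char is signed or unsigned.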