From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id EAEEC3856975 for ; Tue, 12 Jul 2022 22:40:47 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org EAEEC3856975 Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-496-MKFY4YZeNsWof_7UI8NJ4Q-1; Tue, 12 Jul 2022 18:40:38 -0400 X-MC-Unique: MKFY4YZeNsWof_7UI8NJ4Q-1 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id B968A85A585; Tue, 12 Jul 2022 22:40:37 +0000 (UTC) Received: from localhost (unknown [10.33.36.243]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7D21A2026D64; Tue, 12 Jul 2022 22:40:37 +0000 (UTC) From: Jonathan Wakely To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org Subject: [committed] libstdc++: Check for EOF if extraction avoids buffer overflow [PR106248] Date: Tue, 12 Jul 2022 23:40:36 +0100 Message-Id: <20220712224036.312606-1-jwakely@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libstdc++@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libstdc++ mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Jul 2022 22:40:49 -0000 Tested x86_64-linux, pushed to trunk. Backports to gcc-12 and gcc-11 to follow. -- >8 -- In r11-2581-g17abcc77341584 (for LWG 2499) I added overflow checks to the pre-C++20 operator>>(istream&, char*) overload. Those checks can cause extraction to stop after filling the buffer, where previously it would have tried to extract another character and stopped at EOF. When that happens we no longer set eofbit in the stream state, which is consistent with the behaviour of the new C++20 overload, but is an observable and unexpected change in the C++17 behaviour. What makes it worse is that the behaviour change is dependent on optimization, because __builtin_object_size is used to detect the buffer size and that only works when optimizing. To avoid the unexpected and optimization-dependent change in behaviour, set eofbit manually if we stopped extracting because of the buffer size check, but had reached EOF anyway. If the stream's rdstate() != goodbit or width() is non-zero and smaller than the buffer, there's nothing to do. Otherwise, we filled the buffer and need to check for EOF, and maybe set eofbit. The new check is guarded by #ifdef __OPTIMIZE__ because otherwise __builtin_object_size is useless. There's no point compiling and emitting dead code that can't be eliminated because we're not optimizing. We could add extra checks that the next character in the buffer is not whitespace, to detect the case where we stopped early and prevented a buffer overflow that would have happened otherwise. That would allow us to assert or set badbit in the stream state when undefined behaviour was prevented. However, those extra checks would increase the size of the function, potentially reducing the likelihood of it being inlined, and so making the buffer size detection less reliable. It seems preferable to prevent UB and silently truncate, rather than miss the UB and allow the overflow to happen. libstdc++-v3/ChangeLog: PR libstdc++/106248 * include/std/istream [C++17] (operator>>(istream&, char*)): Set eofbit if we stopped extracting at EOF. * testsuite/27_io/basic_istream/extractors_character/char/pr106248.cc: New test. * testsuite/27_io/basic_istream/extractors_character/wchar_t/pr106248.cc: New test. --- libstdc++-v3/include/std/istream | 33 ++++++++++++--- .../extractors_character/char/pr106248.cc | 40 +++++++++++++++++++ .../extractors_character/wchar_t/pr106248.cc | 40 +++++++++++++++++++ 3 files changed, 107 insertions(+), 6 deletions(-) create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/extractors_character/char/pr106248.cc create mode 100644 libstdc++-v3/testsuite/27_io/basic_istream/extractors_character/wchar_t/pr106248.cc diff --git a/libstdc++-v3/include/std/istream b/libstdc++-v3/include/std/istream index b506c4f7504..416ef556fa1 100644 --- a/libstdc++-v3/include/std/istream +++ b/libstdc++-v3/include/std/istream @@ -784,7 +784,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION * - if `width()` is greater than zero, `n` is `min(width(), n)` * - otherwise `n` is the number of elements of the array * - (before C++20 the pointer is assumed to point to an array of - * - the largest possible size for an array of `char_type`). + * the largest possible size for an array of `char_type`). * * Characters are extracted and stored until one of the following happens: * - `n - 1` characters are stored @@ -802,19 +802,40 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION inline basic_istream<_CharT, _Traits>& operator>>(basic_istream<_CharT, _Traits>& __in, _CharT* __s) { +#ifdef __OPTIMIZE__ + // Function inlining might make the buffer size known, allowing us to + // prevent overflow. size_t __n = __builtin_object_size(__s, 0); - if (__builtin_expect(__n < sizeof(_CharT), false)) + if (__n < sizeof(_CharT)) { // There is not even space for the required null terminator. __glibcxx_assert(__n >= sizeof(_CharT)); + // No point calling __istream_extract, but still need to reset width. __in.width(0); __in.setstate(ios_base::failbit); } - else + else if (__n != (size_t)-1) { - if (__n == (size_t)-1) - __n = __gnu_cxx::__numeric_traits::__max; - std::__istream_extract(__in, __s, __n / sizeof(_CharT)); + __n /= sizeof(_CharT); + streamsize __w = __in.width(); + std::__istream_extract(__in, __s, __n); + if (__in.good() && (__w <= 0 || __n < __w)) + { + // Stopped extracting early to avoid overflowing the buffer, + // but might have stopped anyway (and set eofbit) if at EOF. + const typename _Traits::int_type __c = __in.rdbuf()->sgetc(); + const bool __eof = _Traits::eq_int_type(__c, _Traits::eof()); + if (__builtin_expect(__eof, true)) // Assume EOF, not overflow. + __in.setstate(ios_base::eofbit); + } + } + else +#endif // __OPTIMIZE + { + // Buffer size is unknown, have to assume it's huge. + streamsize __n = __gnu_cxx::__numeric_traits::__max; + __n /= sizeof(_CharT); + std::__istream_extract(__in, __s, __n); } return __in; } diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/extractors_character/char/pr106248.cc b/libstdc++-v3/testsuite/27_io/basic_istream/extractors_character/char/pr106248.cc new file mode 100644 index 00000000000..6d89a0e5fef --- /dev/null +++ b/libstdc++-v3/testsuite/27_io/basic_istream/extractors_character/char/pr106248.cc @@ -0,0 +1,40 @@ +// { dg-do run } + +#include +#include + +void +test_pr106248() +{ + char buf[5] = {'x', 'x', 'x', 'x', 'x'}; + std::string s(" four"); + std::istringstream in(s); + in >> buf; +#if __cplusplus >= 202002L + // Extraction stops because buffer is full. + VERIFY( in.good() ); +#else + // PR libstdc++/106248 + // Extraction stops because all input has been consumed and eofbit is set. + VERIFY( in.eof() ); +#endif + // Extracted string must be null-terminated. + VERIFY( buf[4] == '\0' ); + VERIFY( std::string(buf) == "four" ); + + in.clear(); + in.str(s); + for (int i = 0; i < 5; ++i) + s[i] = 'x'; + + in.width(5); + in >> buf; + // Extraction stops due to field width, eofbit not set. + VERIFY( in.good() ); + VERIFY( std::string(buf) == "four" ); +} + +int main() +{ + test_pr106248(); +} diff --git a/libstdc++-v3/testsuite/27_io/basic_istream/extractors_character/wchar_t/pr106248.cc b/libstdc++-v3/testsuite/27_io/basic_istream/extractors_character/wchar_t/pr106248.cc new file mode 100644 index 00000000000..7c226600b9e --- /dev/null +++ b/libstdc++-v3/testsuite/27_io/basic_istream/extractors_character/wchar_t/pr106248.cc @@ -0,0 +1,40 @@ +// { dg-do run } + +#include +#include + +void +test_pr106248() +{ + wchar_t buf[5] = {L'x', L'x', L'x', L'x', L'x'}; + std::wstring s(L" four"); + std::wistringstream in(s); + in >> buf; +#if __cplusplus >= 202002L + // Extraction stops because buffer is full. + VERIFY( in.good() ); +#else + // PR libstdc++/106248 + // Extraction stops because all input has been consumed and eofbit is set. + VERIFY( in.eof() ); +#endif + // Extracted string must be null-terminated. + VERIFY( buf[4] == L'\0' ); + VERIFY( std::wstring(buf) == L"four" ); + + in.clear(); + in.str(s); + for (int i = 0; i < 5; ++i) + s[i] = L'x'; + + in.width(5); + in >> buf; + // Extraction stops due to field width, eofbit not set. + VERIFY( in.good() ); + VERIFY( std::wstring(buf) == L"four" ); +} + +int main() +{ + test_pr106248(); +} -- 2.36.1