From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id AB33F3858C78; Fri, 29 Sep 2023 15:01:45 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org AB33F3858C78 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1695999705; bh=nivRmL4Kij80KxmD4BZ7nvx7dcC4GJLsyNHD77icuXE=; h=From:To:Subject:Date:In-Reply-To:References:From; b=Cok13OT11L9dChdsWtsX4w1D983fKuhu2VMLLOPoqcNGZi92QXVZ3qipUObjkWVnH hGkKPCdVOXVyR3ijqdLW2iB1P1/5An0xw2uSDaq+gxyjeydzyIeitpPWy46GCLKt75 t4FJmW9FI35QPJRDfTZAkDGw8Xo2o7UM3briS3N4= From: "cvs-commit at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug libstdc++/108976] codecvt for Unicode allows surrogate code points Date: Fri, 29 Sep 2023 15:01:43 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: libstdc++ X-Bugzilla-Version: 13.0 X-Bugzilla-Keywords: wrong-code X-Bugzilla-Severity: normal X-Bugzilla-Who: cvs-commit at gcc dot gnu.org X-Bugzilla-Status: NEW X-Bugzilla-Resolution: X-Bugzilla-Priority: P3 X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 List-Id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D108976 --- Comment #8 from CVS Commits --- The master branch has been updated by Jonathan Wakely : https://gcc.gnu.org/g:a8b9c32da787ea0bfbfc9118ac816fa7be4b1bc8 commit r14-4335-ga8b9c32da787ea0bfbfc9118ac816fa7be4b1bc8 Author: Dimitrij Mijoski Date: Thu Sep 28 21:38:11 2023 +0200 libstdc++: Fix handling of surrogate CP in codecvt [PR108976] This patch fixes the handling of surrogate code points in all standard facets for transcoding Unicode that are based on std::codecvt. Surrogate code points should always be treated as error. On the other hand surrogate code units can only appear in UTF-16 and only when they come in a proper pair. Additionally, it fixes a bug in std::codecvt_utf16::in() when odd number of bytes were given in the range [from, from_end), error was returned always. The last byte in such range does not form a full UTF-16 code unit and we can not make any decisions for error, instead partial should be returned. The testsuite for testing these facets was updated in the following order: 1. All functions that test codecvts that work with UTF-8 were refactored and made more generic so they accept codecvt that works with the char type char8_t. 2. The same functions were updated with new test cases for transcoding errors and now additionally test for surrogates, overlong UTF-8 sequences, code points out of the Unicode range, and more tests for missing leading and trailing code units. 3. New tests were added to test codecvt_utf16 in both of its variants, UTF-16 <-> UTF-32/UCS-4 and UTF-16 <-> UCS-2. libstdc++-v3/ChangeLog: PR libstdc++/108976 * src/c++11/codecvt.cc (read_utf8_code_point): Fix handing of surrogates in UTF-8. (ucs4_out): Fix handling of surrogates in UCS-4 -> UTF-8. (ucs4_in): Fix handling of range with odd number of bytes. (ucs4_out): Fix handling of surrogates in UCS-4 -> UTF-16. (ucs2_out): Fix handling of surrogates in UCS-2 -> UTF-16. (ucs2_in): Fix handling of range with odd number of bytes. (__codecvt_utf16_base::do_in): Likewise. (__codecvt_utf16_base::do_in): Likewise. (__codecvt_utf16_base::do_in): Likewise. * testsuite/22_locale/codecvt/codecvt_unicode.cc: Renames, add tests for codecvt_utf16 and codecvt_utf16. * testsuite/22_locale/codecvt/codecvt_unicode.h: Refactor UTF-8 testing functions for char8_t, add more test cases for errors, add testing functions for codecvt_utf16. * testsuite/22_locale/codecvt/codecvt_unicode_wchar_t.cc: Renames, add tests for codecvt_utf16. * testsuite/22_locale/codecvt/codecvt_utf16/79980.cc (test06): Fix test. * testsuite/22_locale/codecvt/codecvt_unicode_char8_t.cc: New test.=