public inbox for gcc-bugs@sourceware.org
From: "cvs-commit at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libstdc++/108976] codecvt for Unicode allows surrogate code points
Date: Tue, 21 May 2024 22:03:17 +0000
Message-ID: <bug-108976-4-bBoB0fqngW@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-108976-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108976

--- Comment #14 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-13 branch has been updated by Jonathan Wakely
<redi@gcc.gnu.org>:

https://gcc.gnu.org/g:bd5e672303f5f777e8927a746d3ee42db21d871b

commit r13-8788-gbd5e672303f5f777e8927a746d3ee42db21d871b
Author: Dimitrij Mijoski <dmjpp@hotmail.com>
Date:   Thu Sep 28 21:38:11 2023 +0200

    libstdc++: Fix handling of surrogate CP in codecvt [PR108976]

    This patch fixes the handling of surrogate code points in all standard
    facets for transcoding Unicode that are based on std::codecvt. Surrogate
    code points should always be treated as an error. Surrogate code units,
    on the other hand, can appear only in UTF-16, and only as part of a
    properly formed pair.
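To make the code point / code unit distinction concrete, here is a minimal C++ sketch of decoding one code point from UTF-16 under these rules. It is not the libstdc++ implementation in src/c++11/codecvt.cc; decode_utf16 and the is_*_surrogate helpers are hypothetical names, and only std::codecvt_base::result is a real library type. A lone trailing surrogate, or a leading surrogate followed by anything other than a trailing surrogate, is an error; a leading surrogate at the very end of the input is reported as partial, because the next unit may still arrive.

```cpp
#include <locale>

// Hypothetical helpers; the names are illustrative, not libstdc++ internals.
constexpr bool is_high_surrogate(char32_t u) { return u >= 0xD800 && u <= 0xDBFF; }
constexpr bool is_low_surrogate(char32_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// U+D800..U+DFFF are not Unicode scalar values, so as *code points*
// (in UTF-8, UTF-32/UCS-4 or UCS-2) they are always invalid.
constexpr bool is_surrogate_code_point(char32_t c) { return c >= 0xD800 && c <= 0xDFFF; }

// Decode one code point from the non-empty UTF-16 range [first, last).
// On ok, stores the code point in `out` and advances `first` past it.
std::codecvt_base::result
decode_utf16(const char16_t*& first, const char16_t* last, char32_t& out)
{
    const char16_t u1 = first[0];
    if (!is_surrogate_code_point(u1)) {      // ordinary BMP character
        out = u1;
        first += 1;
        return std::codecvt_base::ok;
    }
    if (is_low_surrogate(u1))                // trailing unit with no leading unit
        return std::codecvt_base::error;
    if (last - first < 2)                    // leading unit at end of input
        return std::codecvt_base::partial;
    const char16_t u2 = first[1];
    if (!is_low_surrogate(u2))               // leading unit not followed by a trailing unit
        return std::codecvt_base::error;
    out = 0x10000 + ((char32_t(u1) - 0xD800) << 10) + (char32_t(u2) - 0xDC00);
    first += 2;
    return std::codecvt_base::ok;
}
```

For example, the pair 0xD83D 0xDE00 decodes to U+1F600, while a single 0xDC00 is rejected.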

    Additionally, it fixes a bug in std::codecvt_utf16::in(): when an odd
    number of bytes was given in the range [from, from_end), error was
    always returned. The last byte in such a range does not form a complete
    UTF-16 code unit, so no decision about an error can be made yet;
    partial should be returned instead.
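The odd-byte case can be illustrated by the byte-to-code-unit stage alone. The sketch below assumes big-endian byte order (the default for std::codecvt_utf16) and uses a hypothetical function, read_utf16be_units, whose out-parameters mimic those of codecvt::in(); it does not perform the surrogate checks shown earlier and is not the libstdc++ code. The point is the final branch: a single leftover byte yields partial, not error.

```cpp
#include <locale>

// Hypothetical sketch: copy big-endian UTF-16 code units out of a byte range.
// `from_next` is left at the first byte not consumed and `to_next` just past
// the last unit written, mirroring the out-parameters of codecvt::in().
std::codecvt_base::result
read_utf16be_units(const char* from, const char* from_end, const char*& from_next,
                   char16_t* to, char16_t* to_end, char16_t*& to_next)
{
    from_next = from;
    to_next = to;
    // Consume whole two-byte code units while there is input and output room.
    while (from_end - from_next >= 2 && to_next != to_end) {
        const unsigned hi = static_cast<unsigned char>(from_next[0]);
        const unsigned lo = static_cast<unsigned char>(from_next[1]);
        *to_next++ = static_cast<char16_t>((hi << 8) | lo);
        from_next += 2;
    }
    if (from_next == from_end)
        return std::codecvt_base::ok;
    // Either the output is full, or a single trailing byte remains. A lone
    // byte is half of a code unit that later input may complete, so it is
    // reported as partial rather than error.
    return std::codecvt_base::partial;
}
```

With the three bytes 00 41 00, one unit (u'A') is produced, `from_next` stops before the stray byte, and the caller can retry once more input is available.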

    The testsuite for testing these facets was updated in the following
    order:

    1. All functions that test codecvts that work with UTF-8 were refactored
       and made more generic so that they also accept codecvts whose char
       type is char8_t.
    2. The same functions were updated with new test cases for transcoding
       errors and now additionally test for surrogates, overlong UTF-8
       sequences, code points out of the Unicode range, and more tests for
       missing leading and trailing code units.
    3. New tests were added to test codecvt_utf16 in both of its variants,
       UTF-16 <-> UTF-32/UCS-4 and UTF-16 <-> UCS-2.
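The kinds of malformed UTF-8 listed in item 2 (surrogates, overlong sequences, values beyond U+10FFFF, missing leading or trailing units) can be illustrated with a small decoder. This is a sketch under stated assumptions: decode_utf8 is a hypothetical function, not libstdc++'s read_utf8_code_point, and it treats a truncated but otherwise well-formed sequence as partial.

```cpp
#include <locale>

// Hypothetical decoder for one code point from the non-empty UTF-8 range
// [first, last); on ok, stores it in `out` and advances `first`.
std::codecvt_base::result
decode_utf8(const unsigned char*& first, const unsigned char* last, char32_t& out)
{
    const unsigned char b0 = first[0];
    int len = 0;
    char32_t cp = 0;
    if (b0 < 0x80) {                     // ASCII
        out = b0;
        first += 1;
        return std::codecvt_base::ok;
    } else if (b0 < 0xC2) {              // 80..BF: trailing byte with no leading byte;
        return std::codecvt_base::error; // C0, C1: can only start an overlong sequence
    } else if (b0 < 0xE0) {
        len = 2; cp = b0 & 0x1F;
    } else if (b0 < 0xF0) {
        len = 3; cp = b0 & 0x0F;
    } else if (b0 < 0xF5) {
        len = 4; cp = b0 & 0x07;
    } else {                             // F5..FF would encode values above U+10FFFF
        return std::codecvt_base::error;
    }
    for (int i = 1; i < len; ++i) {
        if (first + i == last)           // sequence cut short: more input may follow
            return std::codecvt_base::partial;
        const unsigned char b = first[i];
        if ((b & 0xC0) != 0x80)          // expected a trailing (continuation) byte
            return std::codecvt_base::error;
        cp = (cp << 6) | (b & 0x3F);
    }
    static constexpr char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < min_for_len[len])           // overlong encoding
        return std::codecvt_base::error;
    if (cp > 0x10FFFF)                   // outside the Unicode range
        return std::codecvt_base::error;
    if (cp >= 0xD800 && cp <= 0xDFFF)    // surrogate code point
        return std::codecvt_base::error;
    out = cp;
    first += len;
    return std::codecvt_base::ok;
}
```

Under this sketch, ED A0 80 (the UTF-8 form of U+D800) and E0 80 80 (an overlong U+0000) are both rejected, while E2 82 on its own is partial because it is the valid start of U+20AC.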

    libstdc++-v3/ChangeLog:

            PR libstdc++/108976
            * src/c++11/codecvt.cc (read_utf8_code_point): Fix handling of
            surrogates in UTF-8.
            (ucs4_out): Fix handling of surrogates in UCS-4 -> UTF-8.
            (ucs4_in): Fix handling of range with odd number of bytes.
            (ucs4_out): Fix handling of surrogates in UCS-4 -> UTF-16.
            (ucs2_out): Fix handling of surrogates in UCS-2 -> UTF-16.
            (ucs2_in): Fix handling of range with odd number of bytes.
            (__codecvt_utf16_base<char16_t>::do_in): Likewise.
            (__codecvt_utf16_base<char32_t>::do_in): Likewise.
            (__codecvt_utf16_base<wchar_t>::do_in): Likewise.
            * testsuite/22_locale/codecvt/codecvt_unicode.cc: Renames, add
            tests for codecvt_utf16<char16_t> and codecvt_utf16<char32_t>.
            * testsuite/22_locale/codecvt/codecvt_unicode.h: Refactor UTF-8
            testing functions for char8_t, add more test cases for errors,
            add testing functions for codecvt_utf16.
            * testsuite/22_locale/codecvt/codecvt_unicode_wchar_t.cc:
            Renames, add tests for codecvt_utf16<wchar_t>.
            * testsuite/22_locale/codecvt/codecvt_utf16/79980.cc (test06):
            Fix test.
            * testsuite/22_locale/codecvt/codecvt_unicode_char8_t.cc: New
            test.

    (cherry picked from commit a8b9c32da787ea0bfbfc9118ac816fa7be4b1bc8)
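The ucs4_out and ucs2_out entries in the ChangeLog concern the output direction: a surrogate value in UCS-4 or UCS-2 input must be rejected rather than encoded. A minimal sketch of that check follows, assuming hypothetical free functions (encode_utf8, encode_utf16, is_scalar_value) instead of the real facet members; UCS-2 input is simply the subset below U+10000 and goes through the same check.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: true for Unicode scalar values (no surrogates,
// nothing above U+10FFFF).
constexpr bool is_scalar_value(char32_t c)
{
    return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

// Append the UTF-8 encoding of `c` to `out`; invalid values are rejected
// and leave `out` unchanged.
std::codecvt_base::result encode_utf8(char32_t c, std::string& out)
{
    if (!is_scalar_value(c))
        return std::codecvt_base::error;
    if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
    return std::codecvt_base::ok;
}

// Append the UTF-16 encoding of `c` to `out`, with the same validity rule.
// Only supplementary-plane values produce a surrogate pair.
std::codecvt_base::result encode_utf16(char32_t c, std::u16string& out)
{
    if (!is_scalar_value(c))
        return std::codecvt_base::error;
    if (c < 0x10000) {
        out += static_cast<char16_t>(c);
    } else {
        c -= 0x10000;
        out += static_cast<char16_t>(0xD800 + (c >> 10));
        out += static_cast<char16_t>(0xDC00 + (c & 0x3FF));
    }
    return std::codecvt_base::ok;
}
```

The key design point is that the surrogate check happens before any bytes are written, so a rejected value never leaves a half-encoded sequence in the output.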
