From: Jonathan Wakely <redi@gcc.gnu.org>
To: gcc-cvs@gcc.gnu.org, libstdc++-cvs@gcc.gnu.org
Subject: [gcc r11-10770] libstdc++: Fix reading UTF-8 characters for 16-bit targets [PR104875]
Date: Tue, 16 May 2023 12:50:15 +0000 (GMT)
Message-ID: <20230516125015.EACAA3854170@sourceware.org>

https://gcc.gnu.org/g:e4b0d0b84b719ea9cd3d0a7b0668cdd8055a07d2

commit r11-10770-ge4b0d0b84b719ea9cd3d0a7b0668cdd8055a07d2
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Mar 11 14:52:38 2022 +0000

    libstdc++: Fix reading UTF-8 characters for 16-bit targets [PR104875]

    The current code in read_utf8_code_point assumes that integer promotion
    will create a 32-bit int, but that's not true for 16-bit targets like
    msp430 and avr. This changes the intermediate variables used for each
    octet from unsigned char to char32_t, so that (c << N) works correctly
    when N > 8.

    libstdc++-v3/ChangeLog:

            PR libstdc++/104875
            * src/c++11/codecvt.cc (read_utf8_code_point): Use char32_t to
            hold octets that will be left-shifted.

    (cherry picked from commit 8f7b7c1495f92c72da154d32317943a2cc276ca8)

Diff:
---
 libstdc++-v3/src/c++11/codecvt.cc | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/libstdc++-v3/src/c++11/codecvt.cc b/libstdc++-v3/src/c++11/codecvt.cc
index f8a50fb7150..04252d6dc2a 100644
--- a/libstdc++-v3/src/c++11/codecvt.cc
+++ b/libstdc++-v3/src/c++11/codecvt.cc
@@ -254,7 +254,7 @@ namespace
     const size_t avail = from.size();
     if (avail == 0)
       return incomplete_mb_character;
-    unsigned char c1 = from[0];
+    char32_t c1 = (unsigned char) from[0];
     // https://en.wikipedia.org/wiki/UTF-8#Sample_code
     if (c1 < 0x80)
     {
@@ -267,7 +267,7 @@ namespace
     {
       if (avail < 2)
         return incomplete_mb_character;
-      unsigned char c2 = from[1];
+      char32_t c2 = (unsigned char) from[1];
       if ((c2 & 0xC0) != 0x80)
         return invalid_mb_sequence;
       char32_t c = (c1 << 6) + c2 - 0x3080;
@@ -279,12 +279,12 @@ namespace
     {
       if (avail < 3)
         return incomplete_mb_character;
-      unsigned char c2 = from[1];
+      char32_t c2 = (unsigned char) from[1];
       if ((c2 & 0xC0) != 0x80)
         return invalid_mb_sequence;
       if (c1 == 0xE0 && c2 < 0xA0) // overlong
         return invalid_mb_sequence;
-      unsigned char c3 = from[2];
+      char32_t c3 = (unsigned char) from[2];
       if ((c3 & 0xC0) != 0x80)
         return invalid_mb_sequence;
       char32_t c = (c1 << 12) + (c2 << 6) + c3 - 0xE2080;
@@ -296,17 +296,17 @@ namespace
     {
       if (avail < 4)
         return incomplete_mb_character;
-      unsigned char c2 = from[1];
+      char32_t c2 = (unsigned char) from[1];
       if ((c2 & 0xC0) != 0x80)
         return invalid_mb_sequence;
       if (c1 == 0xF0 && c2 < 0x90) // overlong
         return invalid_mb_sequence;
       if (c1 == 0xF4 && c2 >= 0x90) // > U+10FFFF
         return invalid_mb_sequence;
-      unsigned char c3 = from[2];
+      char32_t c3 = (unsigned char) from[2];
       if ((c3 & 0xC0) != 0x80)
         return invalid_mb_sequence;
-      unsigned char c4 = from[3];
+      char32_t c4 = (unsigned char) from[3];
       if ((c4 & 0xC0) != 0x80)
         return invalid_mb_sequence;
       char32_t c = (c1 << 18) + (c2 << 12) + (c3 << 6) + c4 - 0x3C82080;
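
The effect described in the commit message can be illustrated with a small standalone sketch. It is not part of the patch and not libstdc++ code: the helper names octet, decode3 and shift_in_16_bits are hypothetical, and the 16-bit case is only modelled by truncating to std::uint16_t (on a real 16-bit target the overflowing shift of a promoted int may behave differently). The 3-byte arithmetic and the 0xE2080 constant are taken from the diff above.

```cpp
#include <cstdint>

// Hypothetical helper: widen one UTF-8 octet to char32_t, going through
// unsigned char first so that a negative plain char cannot sign-extend.
constexpr char32_t octet(char c)
{
  return static_cast<unsigned char>(c);
}

// Hypothetical helper: combine a 3-byte UTF-8 sequence with the same
// arithmetic as the patched code. No validation is done here; the real
// function checks the continuation bytes before reaching this step.
constexpr char32_t decode3(const char* s)
{
  return (octet(s[0]) << 12) + (octet(s[1]) << 6) + octet(s[2]) - 0xE2080;
}

// Assumed model of a 16-bit int: keep only the low 16 bits of the shift.
constexpr std::uint16_t shift_in_16_bits(unsigned char c, int n)
{
  return static_cast<std::uint16_t>(static_cast<std::uint32_t>(c) << n);
}

// U+20AC (EURO SIGN) is encoded in UTF-8 as E2 82 AC.
static_assert(decode3("\xE2\x82\xAC") == U'\u20AC',
              "char32_t intermediates keep every bit of the shifted octets");

// With only 16 bits, 0xE2 << 12 (0xE2000) collapses to 0x2000, so the
// lead byte's contribution above bit 15 is lost.
static_assert(shift_in_16_bits(0xE2, 12) == 0x2000,
              "16-bit model drops the high bits of the shifted lead byte");
```

Both static_assert checks pass on a host with a 32-bit int. The second one shows why a value shifted by 12 or 18 bits needs an operand type wider than a 16-bit int.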