From: Jonathan Wakely <jwakely@redhat.com>
To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Subject: [committed] libstdc++: Fix reading UTF-8 characters for 16-bit targets [PR104875]
Date: Mon, 14 Mar 2022 13:10:49 +0000
Message-ID: <20220314131049.2080585-1-jwakely@redhat.com>

Tested powerpc64le-linux and sparc-sun-solaris2.11, pushed to trunk.
This should be backported too. It's not a regression, but this code is
just broken on 16-bit targets.
-- >8 --
The current code in read_utf8_code_point assumes that integer promotion
will create a 32-bit int, but that's not true for 16-bit targets like
msp430 and avr. This changes the intermediate variables used for each
octet from unsigned char to char32_t, so that (c << N) works correctly
when N > 8.
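
To illustrate the problem described above, here is a minimal sketch that models a 16-bit target on an ordinary host. The helper names shift_in_16_bits and shift_in_char32 are hypothetical and do not appear in libstdc++. The model only keeps the low 16 bits of the shifted value; on a real 16-bit target the shift is done in signed int and can overflow, which is not reproduced here.

```cpp
#include <cstdint>

// Hypothetical model of a 16-bit target: after shifting, only the low
// 16 bits of the result survive. This is an assumption made for
// illustration; it does not reproduce signed overflow on real hardware.
constexpr std::uint16_t shift_in_16_bits(unsigned char octet, unsigned shift)
{
  return static_cast<std::uint16_t>(static_cast<std::uint32_t>(octet) << shift);
}

// The approach the patch takes: widen the octet to char32_t first, so the
// shift is performed in a type with at least 32 bits.
constexpr char32_t shift_in_char32(unsigned char octet, unsigned shift)
{
  return static_cast<char32_t>(octet) << shift;
}

// 0xE2 is the lead octet of a three-octet sequence and is shifted by 12.
static_assert(shift_in_16_bits(0xE2, 12) == 0x2000, "high bits are dropped");
static_assert(shift_in_char32(0xE2, 12) == 0xE2000, "all bits are kept");
```

Under this model the lead octet loses its upper bits once the shifted value no longer fits in 16 bits, which is why the patch widens each octet before shifting.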
libstdc++-v3/ChangeLog:
PR libstdc++/104875
* src/c++11/codecvt.cc (read_utf8_code_point): Use char32_t to
hold octets that will be left-shifted.
---
libstdc++-v3/src/c++11/codecvt.cc | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/libstdc++-v3/src/c++11/codecvt.cc b/libstdc++-v3/src/c++11/codecvt.cc
index d9f2dacb647..9f8cb767732 100644
--- a/libstdc++-v3/src/c++11/codecvt.cc
+++ b/libstdc++-v3/src/c++11/codecvt.cc
@@ -254,7 +254,7 @@ namespace
const size_t avail = from.size();
if (avail == 0)
return incomplete_mb_character;
- unsigned char c1 = from[0];
+ char32_t c1 = (unsigned char) from[0];
// https://en.wikipedia.org/wiki/UTF-8#Sample_code
if (c1 < 0x80)
{
@@ -267,7 +267,7 @@ namespace
{
if (avail < 2)
return incomplete_mb_character;
- unsigned char c2 = from[1];
+ char32_t c2 = (unsigned char) from[1];
if ((c2 & 0xC0) != 0x80)
return invalid_mb_sequence;
char32_t c = (c1 << 6) + c2 - 0x3080;
@@ -279,12 +279,12 @@ namespace
{
if (avail < 3)
return incomplete_mb_character;
- unsigned char c2 = from[1];
+ char32_t c2 = (unsigned char) from[1];
if ((c2 & 0xC0) != 0x80)
return invalid_mb_sequence;
if (c1 == 0xE0 && c2 < 0xA0) // overlong
return invalid_mb_sequence;
- unsigned char c3 = from[2];
+ char32_t c3 = (unsigned char) from[2];
if ((c3 & 0xC0) != 0x80)
return invalid_mb_sequence;
char32_t c = (c1 << 12) + (c2 << 6) + c3 - 0xE2080;
@@ -296,17 +296,17 @@ namespace
{
if (avail < 4)
return incomplete_mb_character;
- unsigned char c2 = from[1];
+ char32_t c2 = (unsigned char) from[1];
if ((c2 & 0xC0) != 0x80)
return invalid_mb_sequence;
if (c1 == 0xF0 && c2 < 0x90) // overlong
return invalid_mb_sequence;
if (c1 == 0xF4 && c2 >= 0x90) // > U+10FFFF
return invalid_mb_sequence;
- unsigned char c3 = from[2];
+ char32_t c3 = (unsigned char) from[2];
if ((c3 & 0xC0) != 0x80)
return invalid_mb_sequence;
- unsigned char c4 = from[3];
+ char32_t c4 = (unsigned char) from[3];
if ((c4 & 0xC0) != 0x80)
return invalid_mb_sequence;
char32_t c = (c1 << 18) + (c2 << 12) + (c3 << 6) + c4 - 0x3C82080;
--
2.34.1
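
The constants subtracted in the patched expressions (0x3080, 0xE2080 and 0x3C82080) appear to be the shifted marker bits of the lead and continuation octets. The sketch below checks that reading. The names bias2, bias3, bias4 and decode3 are hypothetical and are not part of codecvt.cc, and decode3 assumes an already validated three-octet sequence.

```cpp
// Marker bits of each octet, shifted exactly as in the patched expressions.
// Lead octets start with 0xC0, 0xE0 or 0xF0; continuation octets with 0x80.
constexpr char32_t bias2 = (char32_t(0xC0) << 6) + 0x80;
constexpr char32_t bias3 = (char32_t(0xE0) << 12) + (char32_t(0x80) << 6) + 0x80;
constexpr char32_t bias4 = (char32_t(0xF0) << 18) + (char32_t(0x80) << 12)
                           + (char32_t(0x80) << 6) + 0x80;

static_assert(bias2 == 0x3080, "two-octet constant");
static_assert(bias3 == 0xE2080, "three-octet constant");
static_assert(bias4 == 0x3C82080, "four-octet constant");

// Hypothetical helper: decodes a three-octet sequence that is assumed to be
// valid, widening each octet to char32_t before shifting as the patch does.
constexpr char32_t decode3(unsigned char b1, unsigned char b2, unsigned char b3)
{
  return (char32_t(b1) << 12) + (char32_t(b2) << 6) + char32_t(b3) - bias3;
}

// E2 82 AC is the UTF-8 encoding of U+20AC (EURO SIGN).
static_assert(decode3(0xE2, 0x82, 0xAC) == 0x20AC, "U+20AC");
```

Subtracting the combined marker bits once avoids masking each octet separately, but it only works if every shifted term keeps all of its bits, hence the need for a 32-bit intermediate type.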