public inbox for gcc-bugs@sourceware.org
From: John Schmerge <jbschmerge@gmail.com>
To: gcc-bugs@gcc.gnu.org
Subject: g++ off-by-one bug in utf16 conversion
Date: Sun, 26 Oct 2014 06:22:00 -0000
Message-ID: <CAL1ZCA-JS0w+qeQW2BsJq8u6geTtZWjwMQYQKBiE5wSaX3tKag@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1116 bytes --]

Hey guys,

I came across this bug earlier today while implementing some unit tests for
utf8/16 conversions...

The following C++ fragment gives the wrong result:

    #include <iostream>

    int main()
    {
        char16_t s[] = u"\uffff";
        std::cout << std::hex << s[0] << " " << s[1] << std::endl;
    }

it prints:

    d7ff dfff

whereas it should print:

    ffff 0

For those unfamiliar with utf16, all unicode values less than or equal to
0xffff remain 16-bit values and no conversion is done on them. Code points
greater than 0xffff get converted to a pair of 16-bit shorts, where the 1st
is in the range 0xd800-0xdbff and the 2nd is in the range 0xdc00-0xdfff.

Clearly this is an off-by-one issue. I traced it down to the use of a
less-than operator where a less-than-or-equal operator is needed in
libcpp/charset.c.

I have verified this is a bug with versions 4.4.7 (rhel 6.5), 4.8.2
(linaro/ubuntu/mint) and g++ (GCC) 5.0.0 20141025... I am a bit surprised
that this has gone so many years unnoticed or at least unresolved.

Attached is a patch against gcc 4.8.2 from the gcc website, for
$gcc-root/libcpp/charset.c, that fixes the issue by my tests.

Thanks,
John

[-- Attachment #2: gcc-utf16.patch --]
[-- Type: text/x-patch, Size: 250 bytes --]

--- libcpp/charset.c 2014-10-26 01:24:10.583796875 -0400
+++ libcpp/charset.c.old 2014-10-26 01:23:50.103796842 -0400
@@ -353,7 +353,7 @@
       return EILSEQ;
     }
 
-  if (s <= 0xFFFF)
+  if (s < 0xFFFF)
     {
       if (*outbytesleftp < 2)
 	{
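The boundary described in the message can be sketched in a few lines of C++. This is only an illustration of the UTF-16 rule, not the libcpp code: the names `encode_utf16` and `to_surrogates` are hypothetical, and the arithmetic assumes 32-bit unsigned code point values that are truncated to 16 bits on output. Under those assumptions, sending 0xFFFF down the surrogate path makes the subtraction of 0x10000 wrap around, which would account for the reported `d7ff dfff`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: applies the surrogate-pair formula unconditionally.
// It is only meaningful for code points in [0x10000, 0x10FFFF]; it exists
// here to show what happens when 0xFFFF is mistakenly routed this way.
void to_surrogates(std::uint32_t cp, char16_t out[2])
{
    assert(out != nullptr);
    std::uint32_t v = cp - 0x10000u;  // wraps around for cp < 0x10000
    out[0] = static_cast<char16_t>(0xD800u + (v >> 10));    // truncated to 16 bits
    out[1] = static_cast<char16_t>(0xDC00u + (v & 0x3FFu));
}

// Hypothetical encoder: writes one code point as UTF-16 and returns the
// number of 16-bit units produced (1 or 2), or 0 for an invalid code point.
std::size_t encode_utf16(std::uint32_t cp, char16_t out[2])
{
    assert(out != nullptr);
    if (cp > 0x10FFFFu || (cp >= 0xD800u && cp <= 0xDFFFu))
        return 0;  // out of range, or a lone surrogate value

    if (cp <= 0xFFFFu) {  // inclusive bound: 0xFFFF is still a single unit
        out[0] = static_cast<char16_t>(cp);
        return 1;
    }

    to_surrogates(cp, out);
    return 2;
}
```

With the inclusive comparison, `encode_utf16(0xFFFF, buf)` yields the single unit 0xffff. Calling `to_surrogates(0xFFFF, buf)` directly reproduces the faulty pair: 0xFFFF - 0x10000 wraps to 0xFFFFFFFF, the high half becomes 0xD800 + 0x3FFFFF = 0x40D7FF, which truncates to 0xd7ff, and the low half becomes 0xDC00 + 0x3FF = 0xdfff.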
Thread overview: 2+ messages
2014-10-26  6:22 John Schmerge [this message]
2014-10-27 18:26 ` Joseph S. Myers