public inbox for gcc-bugs@sourceware.org
* [Bug c/41698] New: "\uFFFF" converts incorrectly to two-byte character
From: chasonr at newsguy dot com @ 2009-10-13 21:00 UTC (permalink / raw)
To: gcc-bugs
GCC 4.4.1 incorrectly converts the code point U+FFFF when generating a two-byte
(UTF-16) character. It mistakes this code point for a supplementary one, and
generates an improper surrogate pair U+D7FF U+DFFF instead of the single code
unit U+FFFF. This bug is present as far back as GCC 3.4.6.
Here is a test program that demonstrates the bug, and could function as a
regression test. This program uses char16_t, but GCC 3.4.5 as shipped with
MinGW also shows this bug when wchar_t is used.
--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--
/* gcc-utf16-test.c -- demonstrate a bug in GCC 4.4.1, that causes the code
   point U+FFFF to convert incorrectly to UTF-16.
   Compile on GCC 4.4.1 with -std=gnu99. */
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    static const __CHAR16_TYPE__ teststr1[] = u"\uFFFF";
    static const __CHAR16_TYPE__ teststr2[] = u"\U00010000";
    size_t i;

    printf("The string \"\\uFFFF\" converts as:");
    for (i = 0; teststr1[i] != 0; i++)
        printf(" U+%04X", teststr1[i]);
    printf("\n");
    if (teststr1[0] != 0xFFFF || teststr1[1] != 0)
    {
        printf("This conversion is INCORRECT. It should be U+FFFF.\n");
        return EXIT_FAILURE;
    }

    printf("The string \"\\U00010000\" converts as:");
    for (i = 0; teststr2[i] != 0; i++)
        printf(" U+%04X", teststr2[i]);
    printf("\n");
    if (teststr2[0] != 0xD800 || teststr2[1] != 0xDC00 || teststr2[2] != 0)
    {
        printf("This conversion is INCORRECT. It should be U+D800 U+DC00.\n");
        return EXIT_FAILURE;
    }

    return EXIT_SUCCESS;
}
--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--
The problem is a simple off-by-one error in the function one_utf8_to_utf16 in
libcpp/charset.c. The following patch against the GCC 4.4.1 source corrects
the bug:
--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--
--- gcc-4.4.1/libcpp/charset.c.old 2009-04-09 19:23:07.000000000 -0400
+++ gcc-4.4.1/libcpp/charset.c 2009-10-12 04:06:25.000000000 -0400
@@ -354,7 +354,7 @@
       return EILSEQ;
     }
 
-  if (s < 0xFFFF)
+  if (s <= 0xFFFF)
     {
       if (*outbytesleftp < 2)
         {
--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--CUT HERE--
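To show why the wrong comparison yields exactly U+D7FF U+DFFF, here is a minimal standalone sketch of the boundary logic. It is not the libcpp code: encode_utf16 and its use_buggy_test parameter are hypothetical names introduced only for illustration, and the sketch assumes the code point is held in an unsigned 32-bit value so that the subtraction wraps around.

```c
#include <stdint.h>

/* Hypothetical helper (not part of libcpp): encode one code point S as
   UTF-16 into OUT and return the number of 16-bit units written.
   If USE_BUGGY_TEST is nonzero, the BMP check uses '<' as in the
   unpatched source; otherwise it uses the corrected '<='.  */
int
encode_utf16(uint32_t s, int use_buggy_test, uint16_t out[2])
{
    int is_bmp = use_buggy_test ? (s < 0xFFFF) : (s <= 0xFFFF);

    if (is_bmp)
    {
        /* A BMP code point is stored as a single code unit. */
        out[0] = (uint16_t) s;
        return 1;
    }

    /* Surrogate-pair arithmetic, meaningful only for s >= 0x10000.
       For s == 0xFFFF the unsigned subtraction wraps to 0xFFFFFFFF,
       and truncating the results to 16 bits gives 0xD7FF and 0xDFFF. */
    uint32_t v = s - 0x10000;
    out[0] = (uint16_t) (v / 0x400 + 0xD800);
    out[1] = (uint16_t) (v % 0x400 + 0xDC00);
    return 2;
}
```

With the '<' test, U+FFFF falls through to the surrogate branch and comes out as the two units 0xD7FF 0xDFFF; with '<=' it is emitted as the single unit 0xFFFF, while U+10000 still encodes as 0xD800 0xDC00 either way.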
--
Summary: "\uFFFF" converts incorrectly to two-byte character
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: minor
Priority: P3
Component: c
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: chasonr at newsguy dot com
GCC build triplet: x86_64-unknown-linux
GCC host triplet: x86_64-unknown-linux
GCC target triplet: x86_64-unknown-linux
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41698