* [Bug c++/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
2012-06-15 19:42 [Bug c++/53690] New: \u0000 and \U00000000 are wrong encoded as U+0001 kennytm at gmail dot com
@ 2012-06-15 20:48 ` redi at gcc dot gnu.org
2012-07-08 11:59 ` schwab@linux-m68k.org
` (9 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: redi at gcc dot gnu.org @ 2012-06-15 20:48 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
Jonathan Wakely <redi at gcc dot gnu.org> changed:
What             |Removed                     |Added
----------------------------------------------------------------------------
Status           |UNCONFIRMED                 |NEW
Last reconfirmed |                            |2012-06-15
Version          |unknown                     |4.8.0
Summary          |\u0000 and \U00000000 are   |[C++11] \u0000 and
                 |wrongly encoded as U+0001.  |\U00000000 are wrongly
                 |                            |encoded as U+0001.
Ever Confirmed   |0                           |1
Known to fail    |                            |4.6.3, 4.7.1, 4.8.0
--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> 2012-06-15 20:48:07 UTC ---
confirmed
* [Bug c++/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: schwab@linux-m68k.org @ 2012-07-08 11:59 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
Andreas Schwab <schwab@linux-m68k.org> changed:
What             |Removed                     |Added
----------------------------------------------------------------------------
CC               |                            |arnetheduck at gmail dot
                 |                            |com
--- Comment #2 from Andreas Schwab <schwab@linux-m68k.org> 2012-07-08 11:58:31 UTC ---
*** Bug 53892 has been marked as a duplicate of this bug. ***
* [Bug c++/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: steven at gcc dot gnu.org @ 2012-07-08 20:23 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
--- Comment #3 from Steven Bosscher <steven at gcc dot gnu.org> 2012-07-08 20:23:40 UTC ---
Test case:
$ cat testsuite/g++.dg/pr53690.C
// { dg-do compile }
// { dg-options "-std=c++11" }
extern "C" int printf (__const char *__restrict __format, ...);
typedef unsigned short uint16_t;
typedef unsigned int uint32_t;
int main() {
  uint32_t a = U'\U00000000';
  uint32_t b = U'\u0000';
  uint32_t c = U'\x00';
  uint32_t d = U'\0';
  uint16_t e = u'\U00000000';
  uint16_t f = u'\u0000';
  uint16_t g = u'\x00';
  uint16_t h = u'\0';
  printf("%x %x %x %x %x %x %x %x\n", a, b, c, d, e, f, g, h);
  return 0;
}
// { dg-final { scan-tree-dump-not "= 1" "original" } }
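The expectation behind this test case can also be written as compile-time checks. The following is only a sketch, assuming a C++11 (or later) compiler; the function name `null_char_literals_are_zero` is hypothetical and not part of the GCC testsuite. On a compiler affected by this bug, the checks on the two UCN spellings would be expected to fail, because those literals evaluate to 1 rather than 0.

```cpp
#include <cassert>  // only needed for the usage line shown after this block

// Every spelling of the null character should have the value 0.
static_assert(U'\U00000000' == 0, "8-digit UCN, char32_t literal");
static_assert(U'\u0000' == 0, "4-digit UCN, char32_t literal");
static_assert(U'\x00' == 0, "hex escape, char32_t literal");
static_assert(U'\0' == 0, "octal escape, char32_t literal");
static_assert(u'\U00000000' == 0, "8-digit UCN, char16_t literal");
static_assert(u'\u0000' == 0, "4-digit UCN, char16_t literal");
static_assert(u'\x00' == 0, "hex escape, char16_t literal");
static_assert(u'\0' == 0, "octal escape, char16_t literal");

// Runtime counterpart of the checks above (hypothetical helper).
inline bool null_char_literals_are_zero() {
    const char32_t wide[] = {U'\U00000000', U'\u0000', U'\x00', U'\0'};
    const char16_t narrow[] = {u'\U00000000', u'\u0000', u'\x00', u'\0'};
    for (char32_t c : wide)
        if (c != 0) return false;
    for (char16_t c : narrow)
        if (c != 0) return false;
    return true;
}
```

A caller could then write `assert(null_char_literals_are_zero());` in place of inspecting the printed values by eye.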
* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: steven at gcc dot gnu.org @ 2012-07-08 20:53 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
Steven Bosscher <steven at gcc dot gnu.org> changed:
What             |Removed                     |Added
----------------------------------------------------------------------------
CC               |steven at gcc dot gnu.org   |tromey at redhat dot com
Component        |c++                         |preprocessor
--- Comment #4 from Steven Bosscher <steven at gcc dot gnu.org> 2012-07-08 20:53:32 UTC ---
The bug is in the preprocessor, see libcpp/charset.c:1074:
  if (result == 0)
    result = 1;
  return result;
}
That code is older than the revision where libcpp became stand-alone 8 years
ago (r82199), and the initial check-in of gcc/cppcharset.c (r65845) already has
this code, too.
(See also http://gcc.gnu.org/ml/gcc-patches/2003-04/msg01497.html)
One for the libcpp maintainer...
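To make the effect of the quoted lines concrete, here is a minimal sketch that models only that final clamp. The function name `clamp_zero_like_quoted_code` is hypothetical, and the real routine in libcpp/charset.c does considerably more (parsing and validating the digits); nothing here is copied from it beyond the three quoted statements.

```cpp
#include <cassert>  // only needed for the usage lines shown after this block
#include <cstdint>

// Hypothetical stand-in for the tail of the routine quoted above:
// a computed code point of 0 is silently turned into 1 before returning.
inline std::uint32_t clamp_zero_like_quoted_code(std::uint32_t result) {
    if (result == 0)
        result = 1;
    return result;
}
```

With this model, `clamp_zero_like_quoted_code(0)` yields 1 while any nonzero input such as `0x41` is returned unchanged, which matches the symptom in the bug title: only the null character is affected.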
* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: redi at gcc dot gnu.org @ 2012-07-08 21:44 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
--- Comment #5 from Jonathan Wakely <redi at gcc dot gnu.org> 2012-07-08 21:44:28 UTC ---
For the record,
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2170.html changed the
behaviour so UCNs corresponding to control characters are allowed in character
and string literals.
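As an illustration of what that rule permits, the sketch below uses UCNs for control characters inside character and string literals. It assumes a C++11 (or later) compiler that does not have this bug; the helper name `control_ucns_in_string_literal` is hypothetical.

```cpp
#include <cassert>  // only needed for the usage line shown after this block

// A control character other than null, written as a UCN in a character literal.
static_assert(U'\u001B' == 0x1B, "ESC written as a UCN");

// A null UCN embedded in a string literal: 'a', U+0000, 'b', plus the terminator.
static_assert(sizeof(u"a\u0000b") / sizeof(char16_t) == 4,
              "embedded null counts as one element");

// Hypothetical helper: checks each element of such a literal at run time.
inline bool control_ucns_in_string_literal() {
    const char16_t s[] = u"a\u0000b";
    return s[0] == u'a' && s[1] == 0 && s[2] == u'b' && s[3] == 0;
}
```

A caller could verify the behaviour with `assert(control_ucns_in_string_literal());`.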
* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: steven at gcc dot gnu.org @ 2012-07-08 21:59 UTC (permalink / raw)
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
Steven Bosscher <steven at gcc dot gnu.org> changed:
What             |Removed                     |Added
----------------------------------------------------------------------------
CC               |                            |steven at gcc dot gnu.org
--- Comment #6 from Steven Bosscher <steven at gcc dot gnu.org> 2012-07-08 21:59:03 UTC ---
(In reply to comment #5)
> For the record,
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2170.html changed the
> behaviour so UCNs corresponding to control characters are allowed in character
> and string literals.
Yes, see r152614.
* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: paolo.carlini at oracle dot com @ 2015-04-30 10:37 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
Paolo Carlini <paolo.carlini at oracle dot com> changed:
What             |Removed                     |Added
----------------------------------------------------------------------------
CC               |                            |wjl at icecavern dot net
--- Comment #7 from Paolo Carlini <paolo.carlini at oracle dot com> ---
*** Bug 59873 has been marked as a duplicate of this bug. ***
* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: wjl at icecavern dot net @ 2015-06-21 17:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
--- Comment #8 from Wesley J. Landaker <wjl at icecavern dot net> ---
This major bug -- with security implications -- is still present in GCC 5.1.1.
$ g++ --version
g++ (Debian 5.1.1-20) 5.1.1 20150616
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: paolo.carlini at oracle dot com @ 2015-07-01 18:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
Paolo Carlini <paolo.carlini at oracle dot com> changed:
What             |Removed                      |Added
----------------------------------------------------------------------------
Status           |NEW                          |ASSIGNED
Assignee         |unassigned at gcc dot gnu.org|paolo.carlini at oracle dot com
--- Comment #9 from Paolo Carlini <paolo.carlini at oracle dot com> ---
Mine.
* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: paolo at gcc dot gnu.org @ 2015-07-02 18:55 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
--- Comment #10 from paolo at gcc dot gnu.org <paolo at gcc dot gnu.org> ---
Author: paolo
Date: Thu Jul 2 18:54:41 2015
New Revision: 225353
URL: https://gcc.gnu.org/viewcvs?rev=225353&root=gcc&view=rev
Log:
/libcpp
2015-07-02 Paolo Carlini <paolo.carlini@oracle.com>
PR c++/53690
* charset.c (_cpp_valid_ucn): Add cppchar_t * parameter and change
return type to bool. Fix encoding of \u0000 and \U00000000 in C++.
(convert_ucn): Adjust call.
* lex.c (forms_identifier_p): Likewise.
* internal.h (_cpp_valid_ucn): Adjust declaration.
/gcc/testsuite
2015-07-02 Paolo Carlini <paolo.carlini@oracle.com>
PR c++/53690
* g++.dg/cpp/pr53690.C: New.
Added:
trunk/gcc/testsuite/g++.dg/cpp/pr53690.C
Modified:
trunk/gcc/testsuite/ChangeLog
trunk/libcpp/ChangeLog
trunk/libcpp/charset.c
trunk/libcpp/internal.h
trunk/libcpp/lex.c
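The log describes an interface change: a bool return type plus an added pointer parameter for the value. One plausible reading is that validity and the decoded code point are now reported separately, so a result of 0 can be a legitimate value. The sketch below illustrates that general pattern only; `parse_ucn_digits` and its parameters are invented for this example and are not the libcpp function or its signature.

```cpp
#include <cassert>  // only needed for the usage lines shown after this block
#include <cstdint>

// Hypothetical illustration of "return bool, deliver the value via a pointer".
// Reads exactly `count` hex digits; returns false on any non-hex character
// (including an early terminator) and leaves *out untouched in that case.
inline bool parse_ucn_digits(const char* digits, int count, std::uint32_t* out) {
    std::uint32_t value = 0;
    for (int i = 0; i < count; ++i) {
        const char c = digits[i];
        std::uint32_t d;
        if (c >= '0' && c <= '9')
            d = static_cast<std::uint32_t>(c - '0');
        else if (c >= 'a' && c <= 'f')
            d = static_cast<std::uint32_t>(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F')
            d = static_cast<std::uint32_t>(c - 'A' + 10);
        else
            return false;
        value = value * 16 + d;
    }
    *out = value;  // 0 is an ordinary result here, not a failure marker
    return true;
}
```

With this shape, `parse_ucn_digits("0000", 4, &v)` succeeds and stores 0 in `v`, while `parse_ucn_digits("00G0", 4, &v)` reports failure through the return value alone.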
* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: paolo.carlini at oracle dot com @ 2015-07-02 18:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
Paolo Carlini <paolo.carlini at oracle dot com> changed:
What             |Removed                     |Added
----------------------------------------------------------------------------
Status           |ASSIGNED                    |RESOLVED
Resolution       |---                         |FIXED
Target Milestone |---                         |6.0
--- Comment #11 from Paolo Carlini <paolo.carlini at oracle dot com> ---
Fixed.