public inbox for gcc-bugs@sourceware.org
* [Bug c++/53690] New: \u0000 and \U00000000 are wrong encoded as U+0001.
@ 2012-06-15 19:42 kennytm at gmail dot com
2012-06-15 20:48 ` [Bug c++/53690] [C++11] \u0000 and \U00000000 are wrongly " redi at gcc dot gnu.org
` (10 more replies)
0 siblings, 11 replies; 12+ messages in thread
From: kennytm at gmail dot com @ 2012-06-15 19:42 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
Bug #: 53690
Summary: \u0000 and \U00000000 are wrong encoded as U+0001.
Classification: Unclassified
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassigned@gcc.gnu.org
ReportedBy: kennytm@gmail.com
Tested with gcc 4.7 and 4.5 (via ideone)
~~~~~~~~~~~
#include <cstdint>
#include <cstdio>
int main() {
    uint32_t a = U'\U00000000';
    uint32_t b = U'\u0000';
    uint32_t c = U'\x00';
    uint32_t d = U'\0';
    uint16_t e = u'\U00000000';
    uint16_t f = u'\u0000';
    uint16_t g = u'\x00';
    uint16_t h = u'\0';
    printf("%x %x %x %x %x %x %x %x\n", a, b, c, d, e, f, g, h);
    return 0;
}
// Compile with:
//
// g++ -std=c++11 x.cpp
~~~~~~~~~~~
This program prints "1 1 0 0 1 1 0 0", but the expected output is
"0 0 0 0 0 0 0 0".
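The expectation above can also be restated as a compile-time check, so that the problem shows up as a build failure instead of a wrong printout. This is only a sketch: the helper name `null_spellings_are_zero` is made up for illustration, and the claim that the assertion fires on affected releases follows from the output reported above, not from a separate test.

```cpp
// Sketch: all eight spellings of the null character from the report,
// checked at compile time instead of printed at run time.
// null_spellings_are_zero is a hypothetical helper name.
constexpr bool null_spellings_are_zero() {
    return U'\U00000000' == 0 && U'\u0000' == 0 &&
           U'\x00' == 0 && U'\0' == 0 &&
           u'\U00000000' == 0 && u'\u0000' == 0 &&
           u'\x00' == 0 && u'\0' == 0;
}

// On an affected compiler the UCN spellings evaluate to 1, so this fires.
static_assert(null_spellings_are_zero(),
              "a spelling of the null character did not evaluate to zero");
```

Compiling with `g++ -std=c++11 -fsyntax-only x.cpp` is enough here; no executable needs to be run.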
gcc -v:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /build/src/gcc-4.7-20120505/configure --prefix=/usr
--libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/
--enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared
--enable-threads=posix --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch
--enable-libstdcxx-time --enable-gnu-unique-object --enable-linker-build-id
--with-ppl --enable-cloog-backend=isl --enable-lto --enable-gold
--enable-ld=default --enable-plugin --with-plugin-ld=ld.gold
--with-linker-hash-style=gnu --enable-multilib --disable-libssp
--disable-build-with-cxx --disable-build-poststage1-with-cxx
--enable-checking=release --with-fpmath=sse
Thread model: posix
gcc version 4.7.0 20120505 (prerelease) (GCC)
* [Bug c++/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: redi at gcc dot gnu.org @ 2012-06-15 20:48 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
Jonathan Wakely <redi at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |NEW
Last reconfirmed| |2012-06-15
Version|unknown |4.8.0
Summary|\u0000 and \U00000000 are wrongly encoded as U+0001. |[C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
Ever Confirmed|0 |1
Known to fail| |4.6.3, 4.7.1, 4.8.0
--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> 2012-06-15 20:48:07 UTC ---
confirmed
* [Bug c++/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: schwab@linux-m68k.org @ 2012-07-08 11:59 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
Andreas Schwab <schwab@linux-m68k.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |arnetheduck at gmail dot com
--- Comment #2 from Andreas Schwab <schwab@linux-m68k.org> 2012-07-08 11:58:31 UTC ---
*** Bug 53892 has been marked as a duplicate of this bug. ***
* [Bug c++/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: steven at gcc dot gnu.org @ 2012-07-08 20:23 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
--- Comment #3 from Steven Bosscher <steven at gcc dot gnu.org> 2012-07-08 20:23:40 UTC ---
Test case:
$ cat testsuite/g++.dg/pr53690.C
// { dg-do compile }
// { dg-options "-std=c++11" }
extern "C" int printf (__const char *__restrict __format, ...);
typedef unsigned short uint16_t;
typedef unsigned int uint32_t;
int main() {
    uint32_t a = U'\U00000000';
    uint32_t b = U'\u0000';
    uint32_t c = U'\x00';
    uint32_t d = U'\0';
    uint16_t e = u'\U00000000';
    uint16_t f = u'\u0000';
    uint16_t g = u'\x00';
    uint16_t h = u'\0';
    printf("%x %x %x %x %x %x %x %x\n", a, b, c, d, e, f, g, h);
    return 0;
}
// { dg-final { scan-tree-dump-not "= 1" "original" } }
* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: steven at gcc dot gnu.org @ 2012-07-08 20:53 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
Steven Bosscher <steven at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC|steven at gcc dot gnu.org |tromey at redhat dot com
Component|c++ |preprocessor
--- Comment #4 from Steven Bosscher <steven at gcc dot gnu.org> 2012-07-08 20:53:32 UTC ---
The bug is in the preprocessor, see libcpp/charset.c:1074:
  if (result == 0)
    result = 1;
  return result;
}
That code is older than the revision where libcpp became stand-alone 8 years
ago (r82199), and the initial check-in of gcc/cppcharset.c (r65845) already has
this code, too.
(See also http://gcc.gnu.org/ml/gcc-patches/2003-04/msg01497.html)
One for the libcpp maintainer...
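The effect of the quoted lines can be modelled in isolation. The sketch below is not libcpp code: `code_point` stands in for libcpp's character type, and `clamp_like_quoted_code` is a hypothetical name that reproduces only the `result == 0` adjustment, under the assumption that the decoded UCN value reaches those lines unchanged.

```cpp
#include <cstdint>

// Hypothetical stand-in for the character type used by libcpp.
using code_point = std::uint32_t;

// Models only the quoted tail of the function: a decoded value of zero
// is replaced by one before being returned; anything else passes through.
constexpr code_point clamp_like_quoted_code(code_point decoded) {
    return decoded == 0 ? 1 : decoded;
}

// A null UCN comes back as U+0001 ...
static_assert(clamp_like_quoted_code(0x0000) == 1, "zero is rewritten to one");
// ... which makes it indistinguishable from a genuine U+0001.
static_assert(clamp_like_quoted_code(0x0001) == 1, "one stays one");
// Every other value is unaffected.
static_assert(clamp_like_quoted_code(0x00E9) == 0x00E9, "non-zero values pass through");
```

Since the caller sees 1 for both inputs, nothing downstream can tell a null UCN from a real U+0001, which matches the output in the original report.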
* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: redi at gcc dot gnu.org @ 2012-07-08 21:44 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
--- Comment #5 from Jonathan Wakely <redi at gcc dot gnu.org> 2012-07-08 21:44:28 UTC ---
For the record,
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2170.html changed the
behaviour so UCNs corresponding to control characters are allowed in character
and string literals.
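As an illustration of what that rule permits, the sketch below places UCNs for control characters inside character and string literals and checks their values at compile time. The array name `text` and the chosen code points are only examples; the checks describe the conforming result, so the null-character ones are expected to fail on releases affected by this bug.

```cpp
// Sketch: control-character UCNs inside literals (C++11).
// 'text' is an illustrative name, not taken from any GCC test.
constexpr char16_t text[] = u"a\u0000b";

// Three characters plus the terminating null: the UCN occupies one code unit.
static_assert(sizeof(text) / sizeof(text[0]) == 4, "unexpected literal length");
static_assert(text[0] == u'a', "first code unit should be 'a'");
static_assert(text[1] == 0, "embedded null UCN should encode as zero");
static_assert(text[2] == u'b', "third code unit should be 'b'");

// Other control characters follow the same rule, for example ESC.
static_assert(U'\u001B' == 0x1B, "UCN for ESC should be U+001B");
```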
* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: steven at gcc dot gnu.org @ 2012-07-08 21:59 UTC
To: gcc-bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
Steven Bosscher <steven at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |steven at gcc dot gnu.org
--- Comment #6 from Steven Bosscher <steven at gcc dot gnu.org> 2012-07-08 21:59:03 UTC ---
(In reply to comment #5)
> For the record,
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2170.html changed the
> behaviour so UCNs corresponding to control characters are allowed in character
> and string literals.
Yes, see r152614.
* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: paolo.carlini at oracle dot com @ 2015-04-30 10:37 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
Paolo Carlini <paolo.carlini at oracle dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |wjl at icecavern dot net
--- Comment #7 from Paolo Carlini <paolo.carlini at oracle dot com> ---
*** Bug 59873 has been marked as a duplicate of this bug. ***
* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: wjl at icecavern dot net @ 2015-06-21 17:23 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
--- Comment #8 from Wesley J. Landaker <wjl at icecavern dot net> ---
This major bug -- with security implications -- is still present in GCC 5.1.1.
$ g++ --version
g++ (Debian 5.1.1-20) 5.1.1 20150616
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: paolo.carlini at oracle dot com @ 2015-07-01 18:40 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
Paolo Carlini <paolo.carlini at oracle dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |ASSIGNED
Assignee|unassigned at gcc dot gnu.org |paolo.carlini at oracle dot com
--- Comment #9 from Paolo Carlini <paolo.carlini at oracle dot com> ---
Mine.
* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: paolo at gcc dot gnu.org @ 2015-07-02 18:55 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
--- Comment #10 from paolo at gcc dot gnu.org <paolo at gcc dot gnu.org> ---
Author: paolo
Date: Thu Jul 2 18:54:41 2015
New Revision: 225353
URL: https://gcc.gnu.org/viewcvs?rev=225353&root=gcc&view=rev
Log:
/libcpp
2015-07-02 Paolo Carlini <paolo.carlini@oracle.com>
PR c++/53690
* charset.c (_cpp_valid_ucn): Add cppchar_t * parameter and change
return type to bool. Fix encoding of \u0000 and \U00000000 in C++.
(convert_ucn): Adjust call.
* lex.c (forms_identifier_p): Likewise.
* internal.h (_cpp_valid_ucn): Adjust declaration.
/gcc/testsuite
2015-07-02 Paolo Carlini <paolo.carlini@oracle.com>
PR c++/53690
* g++.dg/cpp/pr53690.C: New.
Added:
trunk/gcc/testsuite/g++.dg/cpp/pr53690.C
Modified:
trunk/gcc/testsuite/ChangeLog
trunk/libcpp/ChangeLog
trunk/libcpp/charset.c
trunk/libcpp/internal.h
trunk/libcpp/lex.c
* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
From: paolo.carlini at oracle dot com @ 2015-07-02 18:57 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690
Paolo Carlini <paolo.carlini at oracle dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution|--- |FIXED
Target Milestone|--- |6.0
--- Comment #11 from Paolo Carlini <paolo.carlini at oracle dot com> ---
Fixed.