public inbox for gcc-bugs@sourceware.org
* [Bug c++/53690] New: \u0000 and \U00000000 are wrongly encoded as U+0001.
@ 2012-06-15 19:42 kennytm at gmail dot com
  2012-06-15 20:48 ` [Bug c++/53690] [C++11] \u0000 and \U00000000 are wrongly " redi at gcc dot gnu.org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: kennytm at gmail dot com @ 2012-06-15 19:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690

             Bug #: 53690
           Summary: \u0000 and \U00000000 are wrongly encoded as U+0001.
    Classification: Unclassified
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: kennytm@gmail.com


Tested with gcc 4.7 and 4.5 (via ideone)

~~~~~~~~~~~

#include <cstdint>
#include <cstdio>

int main() {
    uint32_t a = U'\U00000000';
    uint32_t b = U'\u0000';
    uint32_t c = U'\x00';
    uint32_t d = U'\0';

    uint16_t e = u'\U00000000';
    uint16_t f = u'\u0000';
    uint16_t g = u'\x00';
    uint16_t h = u'\0';

    printf("%x %x %x %x %x %x %x %x\n", a, b, c, d, e, f, g, h);

    return 0;
}

// Compile with:
//
// g++ -std=c++11 x.cpp

~~~~~~~~~~~

This program prints "1 1 0 0 1 1 0 0". All eight literals denote U+0000, so
the expected output is "0 0 0 0 0 0 0 0".
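
The same expectation can be written as a compile-time check. This is a minimal
sketch, assuming a compiler in C++11 mode or later; the helper name
nul_ucn_is_zero is made up for illustration. On a compiler affected by this
bug the static_assert would be expected to fail.

```cpp
// Sketch only: every spelling of the NUL character should have the value 0.
// nul_ucn_is_zero is a hypothetical helper name, not part of any library.
constexpr bool nul_ucn_is_zero() {
    return U'\U00000000' == 0 && U'\u0000' == 0 && U'\x00' == 0 && U'\0' == 0
        && u'\U00000000' == 0 && u'\u0000' == 0 && u'\x00' == 0 && u'\0' == 0;
}

// Fails to compile if any of the eight literals is not encoded as 0.
static_assert(nul_ucn_is_zero(), "all spellings of U+0000 must have value 0");
```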



gcc -v:

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /build/src/gcc-4.7-20120505/configure --prefix=/usr
--libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/
--enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared
--enable-threads=posix --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch
--enable-libstdcxx-time --enable-gnu-unique-object --enable-linker-build-id
--with-ppl --enable-cloog-backend=isl --enable-lto --enable-gold
--enable-ld=default --enable-plugin --with-plugin-ld=ld.gold
--with-linker-hash-style=gnu --enable-multilib --disable-libssp
--disable-build-with-cxx --disable-build-poststage1-with-cxx
--enable-checking=release --with-fpmath=sse
Thread model: posix
gcc version 4.7.0 20120505 (prerelease) (GCC)



* [Bug c++/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
  2012-06-15 19:42 [Bug c++/53690] New: \u0000 and \U00000000 are wrong encoded as U+0001 kennytm at gmail dot com
@ 2012-06-15 20:48 ` redi at gcc dot gnu.org
  2012-07-08 11:59 ` schwab@linux-m68k.org
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: redi at gcc dot gnu.org @ 2012-06-15 20:48 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-06-15
            Version|unknown                     |4.8.0
            Summary|\u0000 and \U00000000 are   |[C++11] \u0000 and
                   |wrongly encoded as U+0001.  |\U00000000 are wrongly
                   |                            |encoded as U+0001.
     Ever Confirmed|0                           |1
      Known to fail|                            |4.6.3, 4.7.1, 4.8.0

--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> 2012-06-15 20:48:07 UTC ---
confirmed



* [Bug c++/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
  2012-06-15 19:42 [Bug c++/53690] New: \u0000 and \U00000000 are wrong encoded as U+0001 kennytm at gmail dot com
  2012-06-15 20:48 ` [Bug c++/53690] [C++11] \u0000 and \U00000000 are wrongly " redi at gcc dot gnu.org
@ 2012-07-08 11:59 ` schwab@linux-m68k.org
  2012-07-08 20:23 ` steven at gcc dot gnu.org
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: schwab@linux-m68k.org @ 2012-07-08 11:59 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690

Andreas Schwab <schwab@linux-m68k.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |arnetheduck at gmail dot
                   |                            |com

--- Comment #2 from Andreas Schwab <schwab@linux-m68k.org> 2012-07-08 11:58:31 UTC ---
*** Bug 53892 has been marked as a duplicate of this bug. ***



* [Bug c++/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
  2012-06-15 19:42 [Bug c++/53690] New: \u0000 and \U00000000 are wrong encoded as U+0001 kennytm at gmail dot com
  2012-06-15 20:48 ` [Bug c++/53690] [C++11] \u0000 and \U00000000 are wrongly " redi at gcc dot gnu.org
  2012-07-08 11:59 ` schwab@linux-m68k.org
@ 2012-07-08 20:23 ` steven at gcc dot gnu.org
  2012-07-08 20:53 ` [Bug preprocessor/53690] " steven at gcc dot gnu.org
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: steven at gcc dot gnu.org @ 2012-07-08 20:23 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690

--- Comment #3 from Steven Bosscher <steven at gcc dot gnu.org> 2012-07-08 20:23:40 UTC ---
Test case:

$ cat testsuite/g++.dg/pr53690.C
// { dg-do compile }
// { dg-options "-std=c++11 -fdump-tree-original" }

extern "C" int printf (__const char *__restrict __format, ...);

typedef unsigned short uint16_t;
typedef unsigned int uint32_t;

int main() {
    uint32_t a = U'\U00000000';
    uint32_t b = U'\u0000';
    uint32_t c = U'\x00';
    uint32_t d = U'\0';

    uint16_t e = u'\U00000000';
    uint16_t f = u'\u0000';
    uint16_t g = u'\x00';
    uint16_t h = u'\0';

    printf("%x %x %x %x %x %x %x %x\n", a, b, c, d, e, f, g, h);

    return 0;
}

// { dg-final { scan-tree-dump-not "= 1" "original" } }



* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
  2012-06-15 19:42 [Bug c++/53690] New: \u0000 and \U00000000 are wrong encoded as U+0001 kennytm at gmail dot com
                   ` (2 preceding siblings ...)
  2012-07-08 20:23 ` steven at gcc dot gnu.org
@ 2012-07-08 20:53 ` steven at gcc dot gnu.org
  2012-07-08 21:44 ` redi at gcc dot gnu.org
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: steven at gcc dot gnu.org @ 2012-07-08 20:53 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|steven at gcc dot gnu.org   |tromey at redhat dot com
          Component|c++                         |preprocessor

--- Comment #4 from Steven Bosscher <steven at gcc dot gnu.org> 2012-07-08 20:53:32 UTC ---
The bug is in the preprocessor, see libcpp/charset.c:1074:

  if (result == 0)
    result = 1;

  return result;
}

That code is older than the revision where libcpp became stand-alone 8 years
ago (r82199), and the initial check-in of gcc/cppcharset.c (r65845) already has
this code, too.

(See also http://gcc.gnu.org/ml/gcc-patches/2003-04/msg01497.html)
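
The effect of the quoted lines can be sketched in isolation. This is an
illustration only: clamp_like_quoted_code is a hypothetical stand-in for the
tail of the function at charset.c:1074, not the libcpp code itself. It shows
that U+0000 is the only code point whose value changes, which matches the
report, where only the \u and \U spellings print 1.

```cpp
#include <cstdint>

// Hypothetical stand-in for the quoted "if (result == 0) result = 1;" tail.
// The ternary is equivalent to that if-statement; it is used here so the
// function can be evaluated at compile time in C++11.
constexpr std::uint32_t clamp_like_quoted_code(std::uint32_t result) {
    return result == 0 ? 1 : result;
}

// U+0000 is silently turned into U+0001; every other value passes through.
static_assert(clamp_like_quoted_code(0x0000) == 0x0001, "NUL becomes U+0001");
static_assert(clamp_like_quoted_code(0x0041) == 0x0041, "'A' is unchanged");
```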

One for the libcpp maintainer...



* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
  2012-06-15 19:42 [Bug c++/53690] New: \u0000 and \U00000000 are wrong encoded as U+0001 kennytm at gmail dot com
                   ` (3 preceding siblings ...)
  2012-07-08 20:53 ` [Bug preprocessor/53690] " steven at gcc dot gnu.org
@ 2012-07-08 21:44 ` redi at gcc dot gnu.org
  2012-07-08 21:59 ` steven at gcc dot gnu.org
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: redi at gcc dot gnu.org @ 2012-07-08 21:44 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690

--- Comment #5 from Jonathan Wakely <redi at gcc dot gnu.org> 2012-07-08 21:44:28 UTC ---
For the record,
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2170.html changed the
behaviour so UCNs corresponding to control characters are allowed in character
and string literals.
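
As a small sketch of what that permits, assuming a compiler in C++11 mode or
later: a string literal may contain \u0000 in the middle, and the element at
that position should be 0. The names kWithNul and embedded_nul_ok are
hypothetical. With the bug described in this report, the second element would
be expected to be 1 instead.

```cpp
// Sketch only: a char32_t string with a UCN-spelled NUL between 'a' and 'b'.
// kWithNul and embedded_nul_ok are made-up names for illustration.
constexpr char32_t kWithNul[] = U"a\u0000b";

// Three characters plus the implicit terminator.
static_assert(sizeof(kWithNul) / sizeof(kWithNul[0]) == 4,
              "literal holds 'a', NUL, 'b' and the terminator");

constexpr bool embedded_nul_ok() {
    return kWithNul[0] == U'a' && kWithNul[1] == 0
        && kWithNul[2] == U'b' && kWithNul[3] == 0;
}

static_assert(embedded_nul_ok(), "embedded \\u0000 must be encoded as 0");
```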



* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
  2012-06-15 19:42 [Bug c++/53690] New: \u0000 and \U00000000 are wrong encoded as U+0001 kennytm at gmail dot com
                   ` (4 preceding siblings ...)
  2012-07-08 21:44 ` redi at gcc dot gnu.org
@ 2012-07-08 21:59 ` steven at gcc dot gnu.org
  2015-04-30 10:37 ` paolo.carlini at oracle dot com
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: steven at gcc dot gnu.org @ 2012-07-08 21:59 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690

Steven Bosscher <steven at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |steven at gcc dot gnu.org

--- Comment #6 from Steven Bosscher <steven at gcc dot gnu.org> 2012-07-08 21:59:03 UTC ---
(In reply to comment #5)
> For the record,
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2170.html changed the
> behaviour so UCNs corresponding to control characters are allowed in character
> and string literals.

Yes, see r152614.



* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
  2012-06-15 19:42 [Bug c++/53690] New: \u0000 and \U00000000 are wrong encoded as U+0001 kennytm at gmail dot com
                   ` (5 preceding siblings ...)
  2012-07-08 21:59 ` steven at gcc dot gnu.org
@ 2015-04-30 10:37 ` paolo.carlini at oracle dot com
  2015-06-21 17:23 ` wjl at icecavern dot net
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: paolo.carlini at oracle dot com @ 2015-04-30 10:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690

Paolo Carlini <paolo.carlini at oracle dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wjl at icecavern dot net

--- Comment #7 from Paolo Carlini <paolo.carlini at oracle dot com> ---
*** Bug 59873 has been marked as a duplicate of this bug. ***



* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
  2012-06-15 19:42 [Bug c++/53690] New: \u0000 and \U00000000 are wrong encoded as U+0001 kennytm at gmail dot com
                   ` (6 preceding siblings ...)
  2015-04-30 10:37 ` paolo.carlini at oracle dot com
@ 2015-06-21 17:23 ` wjl at icecavern dot net
  2015-07-01 18:40 ` paolo.carlini at oracle dot com
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: wjl at icecavern dot net @ 2015-06-21 17:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690

--- Comment #8 from Wesley J. Landaker <wjl at icecavern dot net> ---
This major bug -- with security implications -- is still present in GCC 5.1.1.

$ g++ --version
g++ (Debian 5.1.1-20) 5.1.1 20150616
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.



* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
  2012-06-15 19:42 [Bug c++/53690] New: \u0000 and \U00000000 are wrong encoded as U+0001 kennytm at gmail dot com
                   ` (7 preceding siblings ...)
  2015-06-21 17:23 ` wjl at icecavern dot net
@ 2015-07-01 18:40 ` paolo.carlini at oracle dot com
  2015-07-02 18:55 ` paolo at gcc dot gnu.org
  2015-07-02 18:57 ` paolo.carlini at oracle dot com
  10 siblings, 0 replies; 12+ messages in thread
From: paolo.carlini at oracle dot com @ 2015-07-01 18:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690

Paolo Carlini <paolo.carlini at oracle dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot       |paolo.carlini at oracle dot
                   |gnu.org                     |com

--- Comment #9 from Paolo Carlini <paolo.carlini at oracle dot com> ---
Mine.



* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
  2012-06-15 19:42 [Bug c++/53690] New: \u0000 and \U00000000 are wrong encoded as U+0001 kennytm at gmail dot com
                   ` (8 preceding siblings ...)
  2015-07-01 18:40 ` paolo.carlini at oracle dot com
@ 2015-07-02 18:55 ` paolo at gcc dot gnu.org
  2015-07-02 18:57 ` paolo.carlini at oracle dot com
  10 siblings, 0 replies; 12+ messages in thread
From: paolo at gcc dot gnu.org @ 2015-07-02 18:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690

--- Comment #10 from paolo at gcc dot gnu.org <paolo at gcc dot gnu.org> ---
Author: paolo
Date: Thu Jul  2 18:54:41 2015
New Revision: 225353

URL: https://gcc.gnu.org/viewcvs?rev=225353&root=gcc&view=rev
Log:
/libcpp
2015-07-02  Paolo Carlini  <paolo.carlini@oracle.com>

        PR c++/53690
        * charset.c (_cpp_valid_ucn): Add cppchar_t * parameter and change
        return type to bool.  Fix encoding of \u0000 and \U00000000 in C++.
        (convert_ucn): Adjust call.
        * lex.c (forms_identifier_p): Likewise.
        * internal.h (_cpp_valid_ucn): Adjust declaration.

/gcc/testsuite
2015-07-02  Paolo Carlini  <paolo.carlini@oracle.com>

        PR c++/53690
        * g++.dg/cpp/pr53690.C: New.

Added:
    trunk/gcc/testsuite/g++.dg/cpp/pr53690.C
Modified:
    trunk/gcc/testsuite/ChangeLog
    trunk/libcpp/ChangeLog
    trunk/libcpp/charset.c
    trunk/libcpp/internal.h
    trunk/libcpp/lex.c
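
One plausible reading of the ChangeLog entry, sketched under assumptions: if a
function returns the code point and callers also treat that return value as a
validity flag, then U+0000 cannot be told apart from "not valid", which would
explain a clamp to 1. Returning bool and writing the code point through a
pointer removes the ambiguity. The names below (codepoint, valid_ucn_old,
valid_ucn_new, check_both_shapes) are hypothetical and are not the libcpp
functions.

```cpp
#include <cassert>
#include <cstdint>

using codepoint = std::uint32_t;  // hypothetical stand-in for cppchar_t

// Old shape (assumed): the return value is both the code point and the
// validity flag, so 0 has to be avoided for a valid result.
codepoint valid_ucn_old(bool valid, codepoint value) {
    if (!valid)
        return 0;
    if (value == 0)
        value = 1;  // the clamp that turns U+0000 into U+0001
    return value;
}

// New shape (assumed): validity is the return value and the code point is
// stored through the pointer, so 0 is an ordinary result.
bool valid_ucn_new(bool valid, codepoint value, codepoint *out) {
    if (!valid)
        return false;
    *out = value;
    return true;
}

// Exercises both shapes with U+0000 as input.
void check_both_shapes() {
    assert(valid_ucn_old(true, 0) == 1);  // old shape loses U+0000
    codepoint cp = 0xFFFF;
    assert(valid_ucn_new(true, 0, &cp));  // new shape reports success
    assert(cp == 0);                      // and keeps U+0000 intact
}
```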



* [Bug preprocessor/53690] [C++11] \u0000 and \U00000000 are wrongly encoded as U+0001.
  2012-06-15 19:42 [Bug c++/53690] New: \u0000 and \U00000000 are wrong encoded as U+0001 kennytm at gmail dot com
                   ` (9 preceding siblings ...)
  2015-07-02 18:55 ` paolo at gcc dot gnu.org
@ 2015-07-02 18:57 ` paolo.carlini at oracle dot com
  10 siblings, 0 replies; 12+ messages in thread
From: paolo.carlini at oracle dot com @ 2015-07-02 18:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53690

Paolo Carlini <paolo.carlini at oracle dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED
   Target Milestone|---                         |6.0

--- Comment #11 from Paolo Carlini <paolo.carlini at oracle dot com> ---
Fixed.


