public inbox for gcc-bugs@sourceware.org
* [Bug c++/59873] New: The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0.
@ 2014-01-18 22:02 wjl at icecavern dot net
  2014-01-18 22:05 ` [Bug c++/59873] " wjl at icecavern dot net
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: wjl at icecavern dot net @ 2014-01-18 22:02 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59873

            Bug ID: 59873
           Summary: The value of char32_t U'\u0000' and char16_t u'\u000'
                    is 1, instead of 0.
           Product: gcc
           Version: 4.8.3
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wjl at icecavern dot net

I found a major bug with char32_t and char16_t literals when trying to encode a
U+0000 (Null).

The following expressions have the numeric value 1, instead of the correct
value, 0. This makes it impossible to use code which has these literals.

The following typescript shows a program that demonstrates the problem, and
shows the behavior of g++ (incorrect) vs. clang++ (correct):

$ cat test.c++ 
#include <cstdint>
#include <iostream>

int main() {
    char32_t null = U'\u0000';
    std::cerr << "null (char32_t) = " << null << '\n';
    std::cerr << "null (uint32_t) = " << uint32_t(null) << '\n';

    char32_t soh = U'\u0001';
    std::cerr << "soh (char32_t) = " << soh << '\n';
    std::cerr << "soh (uint32_t) = " << uint32_t(soh) << '\n';

    std::cerr << "char32_t null == soh = " << (U'\u0000' == U'\u0001') << '\n';

    char16_t null16 = u'\u0000';
    std::cerr << "null (char16_t) = " << null16 << '\n';
    std::cerr << "null (uint16_t) = " << uint16_t(null16) << '\n';

    char16_t soh16 = u'\u0001';
    std::cerr << "soh (char16_t) = " << soh16 << '\n';
    std::cerr << "soh (uint16_t) = " << uint16_t(soh16) << '\n';

    std::cerr << "char16_t null == soh = " << (u'\u0000' == u'\u0001') << '\n';
}
$ g++ --version
g++ (Debian 4.8.2-10) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ g++ -Wall -Wextra -std=c++11 test.c++
$ ./a.out
null (char32_t) = 1
null (uint32_t) = 1
soh (char32_t) = 1
soh (uint32_t) = 1
char32_t null == soh = 1
null (char16_t) = 1
null (uint16_t) = 1
soh (char16_t) = 1
soh (uint16_t) = 1
char16_t null == soh = 1
$ clang++ --version
Debian clang version 3.5-1 (trunk) (based on LLVM 3.5)
Target: x86_64-pc-linux-gnu
Thread model: posix
$ clang++ -Wall -Wextra -std=c++11 test.c++ 
$ ./a.out 
null (char32_t) = 0
null (uint32_t) = 0
soh (char32_t) = 1
soh (uint32_t) = 1
char32_t null == soh = 0
null (char16_t) = 0
null (uint16_t) = 0
soh (char16_t) = 1
soh (uint16_t) = 1
char16_t null == soh = 0



* [Bug c++/59873] The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0.
  2014-01-18 22:02 [Bug c++/59873] New: The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0 wjl at icecavern dot net
@ 2014-01-18 22:05 ` wjl at icecavern dot net
  2014-01-18 22:10 ` wjl at icecavern dot net
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: wjl at icecavern dot net @ 2014-01-18 22:05 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59873

Wesley J. Landaker <wjl at icecavern dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|4.8.3                       |4.9.0

--- Comment #1 from Wesley J. Landaker <wjl at icecavern dot net> ---
I just tested this with gcc 4.9 in Debian experimental and the problem still
exists there:

$ g++-4.9 --version
g++-4.9 (Debian 4.9-20140116-1) 4.9.0 20140116 (experimental) [trunk revision
206688]
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ g++-4.9 -Wall -Wextra -std=c++11 test.c++ 
$ ./a.out 
null (char32_t) = 1
null (uint32_t) = 1
soh (char32_t) = 1
soh (uint32_t) = 1
char32_t null == soh = 1
null (char16_t) = 1
null (uint16_t) = 1
soh (char16_t) = 1
soh (uint16_t) = 1
char16_t null == soh = 1



* [Bug c++/59873] The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0.
  2014-01-18 22:02 [Bug c++/59873] New: The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0 wjl at icecavern dot net
  2014-01-18 22:05 ` [Bug c++/59873] " wjl at icecavern dot net
@ 2014-01-18 22:10 ` wjl at icecavern dot net
  2014-01-18 23:19 ` wjl at icecavern dot net
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: wjl at icecavern dot net @ 2014-01-18 22:10 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59873

--- Comment #2 from Wesley J. Landaker <wjl at icecavern dot net> ---
Created attachment 31886
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31886&action=edit
The test.c++ program shown in the bug

For convenience, here is the test.c++ program as an attachment (same exact code
as shown in the bug report).



* [Bug c++/59873] The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0.
  2014-01-18 22:02 [Bug c++/59873] New: The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0 wjl at icecavern dot net
  2014-01-18 22:05 ` [Bug c++/59873] " wjl at icecavern dot net
  2014-01-18 22:10 ` wjl at icecavern dot net
@ 2014-01-18 23:19 ` wjl at icecavern dot net
  2014-01-18 23:21 ` glisse at gcc dot gnu.org
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: wjl at icecavern dot net @ 2014-01-18 23:19 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59873

--- Comment #3 from Wesley J. Landaker <wjl at icecavern dot net> ---
Created attachment 31887
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31887&action=edit
A truncated version of char32_literal_test.c++

I also made another program that tests ALL possible char32_t literals and
demonstrates that U+0000 (Null) is the only one that fails on gcc (it works on
clang).

The attached program is truncated because the full program is over 17 MiB, but
the literals were just generated with a script like this (surrogates were just
cut out by hand with vim):

for i in {0..1114111}; do printf "\tU'\\\\U%08x',\n" $i; done > char32_literal_test.c++
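
The generation step above, including the surrogate removal that was done by hand, could also be sketched in C++. This is only an illustration: the helper names is_surrogate, literal_line, and write_literals are made up here and are not part of the attached program. The line layout is assumed to match the printf format in the shell loop.

```cpp
#include <cstdint>
#include <cstdio>
#include <ostream>
#include <string>

// True for UTF-16 surrogate code points (0xD800-0xDFFF), which a compiler
// rejects when they are written as a universal-character-name.
constexpr bool is_surrogate(std::uint32_t cp) {
    return cp >= 0xD800 && cp <= 0xDFFF;
}

// Format one initializer line: a tab, a char32_t literal spelled with an
// eight-digit lowercase hex escape, a comma, and a newline.
std::string literal_line(std::uint32_t cp) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "\tU'\\U%08x',\n", static_cast<unsigned>(cp));
    return buf;
}

// Write one line per code point in [first, last], skipping surrogates.
void write_literals(std::ostream& out, std::uint32_t first, std::uint32_t last) {
    for (std::uint32_t cp = first; cp <= last; ++cp) {
        if (!is_surrogate(cp)) out << literal_line(cp);
        if (cp == last) break;  // avoid wrap-around when last is the maximum value
    }
}
```

Calling write_literals(std::cout, 0, 0x10FFFF) from main (with <iostream> included) would cover the same range as the shell loop, 0 through 1114111, without the manual editing step.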



* [Bug c++/59873] The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0.
  2014-01-18 22:02 [Bug c++/59873] New: The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0 wjl at icecavern dot net
                   ` (2 preceding siblings ...)
  2014-01-18 23:19 ` wjl at icecavern dot net
@ 2014-01-18 23:21 ` glisse at gcc dot gnu.org
  2014-01-18 23:33 ` wjl at icecavern dot net
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: glisse at gcc dot gnu.org @ 2014-01-18 23:21 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59873

--- Comment #4 from Marc Glisse <glisse at gcc dot gnu.org> ---
Seems to be on purpose, see the comment before _cpp_valid_ucn in
libcpp/charset.c, and the last instruction in that function.

[lex.charset] is a bit hard to read for me.



* [Bug c++/59873] The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0.
  2014-01-18 22:02 [Bug c++/59873] New: The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0 wjl at icecavern dot net
                   ` (3 preceding siblings ...)
  2014-01-18 23:21 ` glisse at gcc dot gnu.org
@ 2014-01-18 23:33 ` wjl at icecavern dot net
  2014-01-18 23:40 ` schwab@linux-m68k.org
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: wjl at icecavern dot net @ 2014-01-18 23:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59873

--- Comment #5 from Wesley J. Landaker <wjl at icecavern dot net> ---
(In reply to Marc Glisse from comment #4)
> Seems to be on purpose, see the comment before _cpp_valid_ucn in
> libcpp/charset.c, and the last instruction in that function.
> 
> [lex.charset] is a bit hard to read for me.

If I'm reading that comment right, it sounds like the C++11 standard says that
something like U'\u0000' should yield a compiler error, as U'\ud800' (a
surrogate) currently does, instead of silently working in an unexpected
manner.

Assuming this line of reasoning is correct, my second test program (the
char32_literal_test.c++) shows that gcc has a bug in that it does not properly
*reject* any invalid \uXXXX or \UXXXXXXXX except for surrogates. (As an aside,
if this really does violate the C++11 standard, clang has this same bug -- it
just behaved in the way I naively expected it to.)



* [Bug c++/59873] The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0.
  2014-01-18 22:02 [Bug c++/59873] New: The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0 wjl at icecavern dot net
                   ` (4 preceding siblings ...)
  2014-01-18 23:33 ` wjl at icecavern dot net
@ 2014-01-18 23:40 ` schwab@linux-m68k.org
  2014-01-18 23:47 ` wjl at icecavern dot net
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: schwab@linux-m68k.org @ 2014-01-18 23:40 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59873

--- Comment #6 from Andreas Schwab <schwab@linux-m68k.org> ---
\u0000 is only malformed outside of string and char literals, e.g. in
identifiers.
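
The distinction drawn here can be sketched as a small predicate. This is an illustrative reading of C++11 [lex.charset], not code from GCC or clang: the function names are hypothetical, and treating values above 0x10FFFF as invalid is an assumption added for completeness.

```cpp
#include <cstdint>

// Surrogate code points are rejected in every context.
constexpr bool is_surrogate(std::uint32_t cp) {
    return cp >= 0xD800 && cp <= 0xDFFF;
}

// Control characters: 0x00-0x1F and 0x7F-0x9F.
constexpr bool is_control(std::uint32_t cp) {
    return cp <= 0x1F || (cp >= 0x7F && cp <= 0x9F);
}

// Every printable ASCII character except '$', '@' and '`' belongs to the
// basic source character set.
constexpr bool is_basic_source_printable(std::uint32_t cp) {
    return cp >= 0x20 && cp <= 0x7E && cp != 0x24 && cp != 0x40 && cp != 0x60;
}

// Inside a character or string literal only surrogates (and, by assumption,
// out-of-range values) are rejected. Outside a literal, control characters
// and basic source characters are rejected as well.
constexpr bool ucn_well_formed(std::uint32_t cp, bool inside_literal) {
    return !is_surrogate(cp) && cp <= 0x10FFFF &&
           (inside_literal || !(is_control(cp) || is_basic_source_printable(cp)));
}
```

Under this reading, ucn_well_formed(0x0000, true) holds, so U'\u0000' is a valid literal whose value should be 0, while the same escape in an identifier is ill-formed.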



* [Bug c++/59873] The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0.
  2014-01-18 22:02 [Bug c++/59873] New: The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0 wjl at icecavern dot net
                   ` (5 preceding siblings ...)
  2014-01-18 23:40 ` schwab@linux-m68k.org
@ 2014-01-18 23:47 ` wjl at icecavern dot net
  2014-01-20  0:16 ` wjl at icecavern dot net
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: wjl at icecavern dot net @ 2014-01-18 23:47 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59873

Wesley J. Landaker <wjl at icecavern dot net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |http://llvm.org/bugs/show_b
                   |                            |ug.cgi?id=18535

--- Comment #7 from Wesley J. Landaker <wjl at icecavern dot net> ---
(In reply to Andreas Schwab from comment #6)
> \u0000 is only malformed outside of string and char literals, e.g. in
> identifiers.

In that case, it sounds like my original issue of U'\u0000' == 1 is still the
real bug.

(Aside for those who care about clang): Based on Marc's comment and what I read
in the linked source code, I reported an issue to clang (see the See Also bug),
believing that this might be a bug in their compiler as well. I may have done
so in error if this is only a gcc issue, as I originally believed.



* [Bug c++/59873] The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0.
  2014-01-18 22:02 [Bug c++/59873] New: The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0 wjl at icecavern dot net
                   ` (6 preceding siblings ...)
  2014-01-18 23:47 ` wjl at icecavern dot net
@ 2014-01-20  0:16 ` wjl at icecavern dot net
  2014-01-20  0:19 ` wjl at icecavern dot net
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: wjl at icecavern dot net @ 2014-01-20  0:16 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59873

--- Comment #8 from Wesley J. Landaker <wjl at icecavern dot net> ---
Just as an additional point, L'\u0000' also yields a wchar_t with the value of
1. (If that is an illegal construct, it is not warned about when using -Wall
-Wextra -Werror).



* [Bug c++/59873] The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0.
  2014-01-18 22:02 [Bug c++/59873] New: The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0 wjl at icecavern dot net
                   ` (7 preceding siblings ...)
  2014-01-20  0:16 ` wjl at icecavern dot net
@ 2014-01-20  0:19 ` wjl at icecavern dot net
  2014-07-20 20:51 ` [Bug c++/59873] The value of char32_t U'\u0000' and char16_t u'\u0000' " richard-gccbugzilla at metafoo dot co.uk
  2015-04-30 10:37 ` paolo.carlini at oracle dot com
  10 siblings, 0 replies; 12+ messages in thread
From: wjl at icecavern dot net @ 2014-01-20  0:19 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59873

--- Comment #9 from Wesley J. Landaker <wjl at icecavern dot net> ---
This also happens in strings, e.g.:

static_assert(U"\u0000"[0] == 1, "this passes");
static_assert(U"\u0000"[0] == 0, "this fails");
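
For comparison, the null character can also be spelled without a universal-character-name. The sketch below assumes the problem is confined to \u and \U escapes, as the pointer to _cpp_valid_ucn suggests; it has not been verified against the affected gcc releases, and the variable names are made up for illustration.

```cpp
// Spellings of the null character that do not use \u or \U.
constexpr char32_t null32_octal = U'\0';       // octal escape
constexpr char32_t null32_hex   = U'\x0';      // hexadecimal escape
constexpr char32_t null32_init  = char32_t();  // value-initialization
constexpr char16_t null16       = u'\0';
constexpr wchar_t  nullw        = L'\0';

// The standard requires each of these to have the value 0.
static_assert(null32_octal == 0, "octal escape yields 0");
static_assert(null32_hex == 0, "hex escape yields 0");
static_assert(null32_init == 0, "value-initialized char32_t is 0");
static_assert(null16 == 0, "char16_t null is 0");
static_assert(nullw == 0, "wchar_t null is 0");
static_assert(U"\0"[0] == 0, "string literal element is 0");
```

If these assertions pass on a compiler where the \u0000 assertions above do not, that would point at the universal-character-name handling specifically.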



* [Bug c++/59873] The value of char32_t U'\u0000' and char16_t u'\u0000' is 1, instead of 0.
  2014-01-18 22:02 [Bug c++/59873] New: The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0 wjl at icecavern dot net
                   ` (8 preceding siblings ...)
  2014-01-20  0:19 ` wjl at icecavern dot net
@ 2014-07-20 20:51 ` richard-gccbugzilla at metafoo dot co.uk
  2015-04-30 10:37 ` paolo.carlini at oracle dot com
  10 siblings, 0 replies; 12+ messages in thread
From: richard-gccbugzilla at metafoo dot co.uk @ 2014-07-20 20:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59873

Richard Smith <richard-gccbugzilla at metafoo dot co.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |richard-gccbugzilla@metafoo
                   |                            |.co.uk

--- Comment #11 from Richard Smith <richard-gccbugzilla at metafoo dot co.uk> ---
This looks like a duplicate of bug 53690.



* [Bug c++/59873] The value of char32_t U'\u0000' and char16_t u'\u0000' is 1, instead of 0.
  2014-01-18 22:02 [Bug c++/59873] New: The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0 wjl at icecavern dot net
                   ` (9 preceding siblings ...)
  2014-07-20 20:51 ` [Bug c++/59873] The value of char32_t U'\u0000' and char16_t u'\u0000' " richard-gccbugzilla at metafoo dot co.uk
@ 2015-04-30 10:37 ` paolo.carlini at oracle dot com
  10 siblings, 0 replies; 12+ messages in thread
From: paolo.carlini at oracle dot com @ 2015-04-30 10:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59873

Paolo Carlini <paolo.carlini at oracle dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #12 from Paolo Carlini <paolo.carlini at oracle dot com> ---
Thanks Richard.

*** This bug has been marked as a duplicate of bug 53690 ***



end of thread, other threads:[~2015-04-30 10:37 UTC | newest]

Thread overview: 12+ messages
2014-01-18 22:02 [Bug c++/59873] New: The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0 wjl at icecavern dot net
2014-01-18 22:05 ` [Bug c++/59873] " wjl at icecavern dot net
2014-01-18 22:10 ` wjl at icecavern dot net
2014-01-18 23:19 ` wjl at icecavern dot net
2014-01-18 23:21 ` glisse at gcc dot gnu.org
2014-01-18 23:33 ` wjl at icecavern dot net
2014-01-18 23:40 ` schwab@linux-m68k.org
2014-01-18 23:47 ` wjl at icecavern dot net
2014-01-20  0:16 ` wjl at icecavern dot net
2014-01-20  0:19 ` wjl at icecavern dot net
2014-07-20 20:51 ` [Bug c++/59873] The value of char32_t U'\u0000' and char16_t u'\u0000' " richard-gccbugzilla at metafoo dot co.uk
2015-04-30 10:37 ` paolo.carlini at oracle dot com
