From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 21265 invoked by alias); 18 Jan 2014 22:02:15 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 21243 invoked by uid 48); 18 Jan 2014 22:02:10 -0000
From: "wjl at icecavern dot net"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c++/59873] New: The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0.
Date: Sat, 18 Jan 2014 22:02:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: c++
X-Bugzilla-Version: 4.8.3
X-Bugzilla-Keywords:
X-Bugzilla-Severity: major
X-Bugzilla-Who: wjl at icecavern dot net
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-01/txt/msg02001.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59873

            Bug ID: 59873
           Summary: The value of char32_t U'\u0000' and char16_t u'\u000' is 1, instead of 0.
           Product: gcc
           Version: 4.8.3
            Status: UNCONFIRMED
          Severity: major
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wjl at icecavern dot net

I found a major bug with char32_t and char16_t literals when trying to encode a
U+0000 (Null). The following expressions have the numeric value 1, instead of
the correct value, 0. This makes it impossible to use code which has these
literals.

The following typescript shows a program that demonstrates the problem, and
shows the behavior of g++ (incorrect) vs.
clang++ (correct):

$ cat test.c++
#include <cstdint>
#include <iostream>

int main()
{
  char32_t null = U'\u0000';
  std::cerr << "null (char32_t) = " << null << '\n';
  std::cerr << "null (uint32_t) = " << uint32_t(null) << '\n';

  char32_t soh = U'\u0001';
  std::cerr << "soh (char32_t) = " << soh << '\n';
  std::cerr << "soh (uint32_t) = " << uint32_t(soh) << '\n';

  std::cerr << "char32_t null == soh = " << (U'\u0000' == U'\u0001') << '\n';

  char16_t null16 = u'\u0000';
  std::cerr << "null (char16_t) = " << null16 << '\n';
  std::cerr << "null (uint16_t) = " << uint16_t(null16) << '\n';

  char16_t soh16 = u'\u0001';
  std::cerr << "soh (char16_t) = " << soh16 << '\n';
  std::cerr << "soh (uint16_t) = " << uint16_t(soh16) << '\n';

  std::cerr << "char16_t null == soh = " << (u'\u0000' == u'\u0001') << '\n';
}

$ g++ --version
g++ (Debian 4.8.2-10) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ g++ -Wall -Wextra -std=c++11 test.c++
$ ./a.out
null (char32_t) = 1
null (uint32_t) = 1
soh (char32_t) = 1
soh (uint32_t) = 1
char32_t null == soh = 1
null (char16_t) = 1
null (uint16_t) = 1
soh (char16_t) = 1
soh (uint16_t) = 1
char16_t null == soh = 1

$ clang++ --version
Debian clang version 3.5-1 (trunk) (based on LLVM 3.5)
Target: x86_64-pc-linux-gnu
Thread model: posix

$ clang++ -Wall -Wextra -std=c++11 test.c++
$ ./a.out
null (char32_t) = 0
null (uint32_t) = 0
soh (char32_t) = 1
soh (uint32_t) = 1
char32_t null == soh = 0
null (char16_t) = 0
null (uint16_t) = 0
soh (char16_t) = 1
soh (uint16_t) = 1
char16_t null == soh = 0