From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 37323 invoked by alias); 17 Dec 2018 21:02:11 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 37302 invoked by uid 89); 17 Dec 2018 21:02:11 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.2 spammy=sk:cp_pars, integral, get_identifier, weak X-HELO: mail-qk1-f196.google.com Received: from mail-qk1-f196.google.com (HELO mail-qk1-f196.google.com) (209.85.222.196) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 17 Dec 2018 21:02:07 +0000 Received: by mail-qk1-f196.google.com with SMTP id 189so8215194qkj.8 for ; Mon, 17 Dec 2018 13:02:07 -0800 (PST) Return-Path: Received: from [192.168.1.149] (209-6-216-142.s141.c3-0.smr-cbr1.sbo-smr.ma.cable.rcncustomer.com. [209.6.216.142]) by smtp.gmail.com with ESMTPSA id g5sm7012693qkd.55.2018.12.17.13.02.05 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 17 Dec 2018 13:02:05 -0800 (PST) Subject: Re: [PATCH 2/9]: C++ P0482R5 char8_t: Core language support From: Jason Merrill To: Tom Honermann , gcc-patches References: <3649a3a5-df5e-b198-dcd5-a17a29df1ef5@redhat.com> <6bc5ae56-be78-865f-481a-36d257fffcb7@redhat.com> <6fc1fe2e-2330-c57e-3722-081cf95ffd8f@honermann.net> <015f32d4-8651-8f82-45a5-5d8a3f813894@redhat.com> Message-ID: <6d595950-d7ee-8c19-fb6b-ad554622892a@redhat.com> Date: Mon, 17 Dec 2018 21:02:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1 MIME-Version: 1.0 In-Reply-To: <015f32d4-8651-8f82-45a5-5d8a3f813894@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-IsSubscribed: yes X-SW-Source: 2018-12/txt/msg01256.txt.bz2 On 12/5/18 11:16 AM, Jason Merrill wrote: > On 12/5/18 2:09 AM, Tom Honermann wrote: >> On 12/3/18 5:01 PM, Jason Merrill wrote: >>> On 12/3/18 4:51 PM, Jason Merrill wrote: >>>> On 11/5/18 2:39 PM, Tom Honermann wrote: >>>>> This patch adds support for the P0482R5 core language changes. >>>>> This includes: >>>>> - The -fchar8_t and -fno_char8_t command line options. >>>>> - char8_t as a keyword. >>>>> - The char8_t builtin type as a non-aliasing unsigned integral >>>>>    character type of size 1. >>>>> - Use of char8_t as a simple type specifier. >>>>> - u8 character literals with type char8_t. >>>>> - u8 string literals with type array of const char8_t. >>>>> - User defined literal operators that accept char8_1 and char8_t >>>>> pointer >>>>>    types. >>>>> - New __cpp_char8_t predefined feature test macro. >>>>> - New __CHAR8_TYPE__ and __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined >>>>>    macros . >>>>> - Name mangling and demangling for char8_t (using Du). >>>>> >>>>> gcc/ChangeLog: >>>>> >>>>> 2018-11-04  Tom Honermann  >>>>> >>>>>       * defaults.h: Define CHAR8_TYPE. >>>>> >>>>> gcc/c-family/ChangeLog: >>>>> >>>>> 2018-11-04  Tom Honermann  >>>>>       * c-family/c-common.c (c_common_reswords): Add char8_t. >>>>>       (fix_string_type): Use char8_t for the type of u8 string >>>>> literals. >>>>>       (c_common_get_alias_set): char8_t doesn't alias. >>>>>       (c_common_nodes_and_builtins): Define char8_t as a builtin >>>>> type in >>>>>       C++. >>>>>       (c_stddef_cpp_builtins): Add __CHAR8_TYPE__. >>>>>       (keyword_begins_type_specifier): Add RID_CHAR8. >>>>>       * gcc/c-family/c-common.h (rid): Add RID_CHAR8. >>>>>       (c_tree_index): Add CTI_CHAR8_TYPE and CTI_CHAR8_ARRAY_TYPE. >>>>>       Define D_CXX_CHAR8_T and D_CXX_CHAR8_T_FLAGS. >>>>>       Define char8_type_node and char8_array_type_node. >>>>>       * c-family/c-cppbuiltin.c (cpp_atomic_builtins): Predefine >>>>>       __GCC_ATOMIC_CHAR8_T_LOCK_FREE. >>>>>       (c_cpp_builtins): Predefine __cpp_char8_t. >>>>>       * c-family/c-lex.c (lex_string): Use char8_array_type_node as >>>>> the >>>>>       type of CPP_UTF8STRING. >>>>>       (lex_charconst): Use char8_type_node as the type of >>>>> CPP_UTF8CHAR. >>>>>       * c-family/c.opt: Add the -fchar8_t command line option. >>>>> >>>>> gcc/c/ChangeLog: >>>>> >>>>> 2018-11-04  Tom Honermann  >>>>> >>>>>       * c/c-typeck.c (char_type_p): Add char8_type_node. >>>>>       (digest_init): Handle initialization by a u8 string literal of >>>>>       char8_t type. >>>>> >>>>> gcc/cp/ChangeLog: >>>>> >>>>> 2018-11-04  Tom Honermann  >>>>> >>>>>       * cp/cvt.c (type_promotes_to): Handle char8_t promotion. >>>>>       * cp/decl.c (grokdeclarator): Handle invalid type specifier >>>>>       combinations involving char8_t. >>>>>       * cp/lex.c (init_reswords): Add char8_t as a reserved word. >>>>>       * cp/mangle.c (write_builtin_type): Add name mangling for >>>>> char8_t >>>>>       (Du). >>>>>       * cp/parser.c (cp_keyword_starts_decl_specifier_p, >>>>>       cp_parser_simple_type_specifier): Recognize char8_t as a simple >>>>>       type specifier. >>>>>       (cp_parser_string_literal): Use char8_array_type_node for the >>>>> type >>>>>       of CPP_UTF8STRING. >>>>>       (cp_parser_set_decl_spec_type): Tolerate char8_t typedefs in >>>>> system >>>>>       headers. >>>>>       * cp/rtti.c (emit_support_tinfos): type_info support for >>>>> char8_t. >>>>>       * cp/tree.c (char_type_p): Recognize char8_t as a character >>>>> type. >>>>>       * cp/typeck.c (string_conv_p): Handle conversions of u8 string >>>>>       literals of char8_t type. >>>>>       (check_literal_operator_args): Handle UDLs with u8 string >>>>> literals >>>>>       of char8_t type. >>>>>       * cp/typeck2.c (digest_init_r): Disallow initializing a char >>>>> array >>>>>       with a u8 string literal. >>>>> >>>>> libiberty/ChangeLog: >>>>> >>>>> 2018-10-31  Tom Honermann  >>>>>       * cp-demangle.c (cplus_demangle_builtin_types, >>>>>       cplus_demangle_type): Add name demangling for char8_t (Du). >>>>>       * cp-demangle.h: Increase D_BUILTIN_TYPE_COUNT to accommodate >>>>> the >>>>>       new char8_t type. >>>> >>>>> @@ -3543,6 +3556,10 @@ c_common_get_alias_set (tree t) >>>>>    if (!TYPE_P (t)) >>>>>      return -1; >>>> >>>>> +  /* Unlike char, char8_t doesn't alias. */ >>>>> +  if (flag_char8_t && t == char8_type_node) >>>>> +    return -1; >>>> >>>> This seems unnecessary; doesn't the existing code have the same >>>> effect? I think we could do with just an adjustment to the existing >>>> comment. >> I'm not sure.  I had concerns about unintended matching due to char8_t >> having an underlying type of unsigned char. > > That shouldn't be a problem: if char8_t is a distinct type, it won't > match unsigned char, and if it's the same as unsigned char, flag_char8_t > will be false. > >>>>> +  else if (flag_char8_t && TREE_TYPE (value) == >>>>> char8_array_type_node) >>>>> +      || (flag_char8_t && type == char8_type_node) >>>>> +      bool char8_array = (flag_char8_t && !!comptypes (typ1, >>>>> char8_type_node)); >>>>> +       || (flag_char8_t && type == char8_type_node >>>> In many places you check the flag and then for one of the char8 >>>> types. Since the types won't be used without the flag, checking the >>>> flag seems redundant? >> >> This was again protection against unintended matching of the >> underlying unsigned char type, particularly when compiling as C. >> char8_type_node is constructed (in c_common_nodes_and_builtins) >> following the pattern in place for char16_t and char32_t with the >> following code: >> >> +  char8_type_node = get_identifier (CHAR8_TYPE); >> +  char8_type_node = TREE_TYPE (identifier_global_value >> (char8_type_node)); >> +  char8_type_size = TYPE_PRECISION (char8_type_node); >> +  if (c_dialect_cxx ()) >> +    { >> +      char8_type_node = make_unsigned_type (char8_type_size); >> + >> +      if (flag_char8_t) >> +        record_builtin_type (RID_CHAR8, "char8_t", char8_type_node); >> +    } >> >> My knowledge of gcc internals is weak, but I understand this to be, >> effectively, defining a type alias (of unsigned char) and then, if >> compiling as C++, re-defining it as a strong unsigned type. >> >> I don't recall the details now, but at one point, I was missing a >> check for flag_char8_t in some location and I encountered some test >> failures as a result. > > Since char8_type_node is always a distinct type in C++, we shouldn't > need these checks in the C++ front end.  And since it's never a distinct > type in C, the C front end (c/*) doesn't need to change at all. > >>>>> -      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT) >>>>> +      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT >>>>> +          && (typ1 == char_type_node || !flag_char8_t)) >>>> >>>> This looks wrong, or at least incomplete; we want to complain about >>>> mismatched types here even with -fchar8-t.  Perhaps we should >>>> replace all of this if/else with simply comparing typ1 and >>>> char_type, and complaining if they're different.  Talking about wide >>>> and non-wide isn't as useful as the actual types would be. >>> >>> Well, I suppose it isn't quite that simple, since we still need to >>> treat the ordinary character types as interchangeable. >> >> We do need to complain about mismatched types and test >> gcc/testsuite/g++.dg/ext/char8_t-init-2.C was added to ensure that >> happens. > > You don't seem to test initializing an array of signed/unsigned char, > which I think will be broken by the change only considering char_type_node. > > I think we want to specifically allow conversion from array of one > ordinary character type to another, and otherwise complain about the > types being different with a message like "cannot initialize array of > %qT from array of %qT" rather than mess with terms like int-array and > (non-)wide string. Ping. Will you have a chance to update the patch soon, or would you like me to make these last changes myself? Jason