From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 129399 invoked by alias); 24 Dec 2018 02:28:12 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 129087 invoked by uid 89); 24 Dec 2018 02:28:09 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.2 spammy=this, so, parserc, UD:parser.c X-HELO: smtp89.iad3b.emailsrvr.com Received: from smtp89.iad3b.emailsrvr.com (HELO smtp89.iad3b.emailsrvr.com) (146.20.161.89) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 24 Dec 2018 02:28:03 +0000 Received: from smtp4.relay.iad3b.emailsrvr.com (localhost [127.0.0.1]) by smtp4.relay.iad3b.emailsrvr.com (SMTP Server) with ESMTP id B3F7B201B9; Sun, 23 Dec 2018 21:28:01 -0500 (EST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=honermann.net; s=20180930-2j89z3ji; t=1545618481; bh=AbJHdI/8L13O1e/Iqol1EoPSvdD/IyqvQSuCrXPWbSI=; h=Subject:From:To:Date:From; b=m5t7Tmfs0WgWhZp4Z5RunQ/yOr+E3MgYsE14CwXXjgk7pbVgcpH36KLr9znKtkc1D Ny87FnxMZyid0kaiq9H0Hnvny5qIM5564TkINgLHsHxI/Vru7UfF/9sLRWhF4ZP6GE FgEZ8R5dhOQi5tiavA46zaTgFGYUlzSisCj9gcGw= X-Auth-ID: tom@honermann.net Received: by smtp4.relay.iad3b.emailsrvr.com (Authenticated sender: tom-AT-honermann.net) with ESMTPSA id 781C1201B3; Sun, 23 Dec 2018 21:28:01 -0500 (EST) X-Sender-Id: tom@honermann.net Received: from [192.168.1.22] (pool-71-176-235-158.rcmdva.fios.verizon.net [71.176.235.158]) (using TLSv1.2 with cipher AES128-SHA) by 0.0.0.0:25 (trex/5.7.12); Sun, 23 Dec 2018 21:28:01 -0500 Subject: Re: [PATCH 2/9]: C++ P0482R5 char8_t: Core language support From: Tom Honermann To: Jason Merrill , gcc-patches References: <3649a3a5-df5e-b198-dcd5-a17a29df1ef5@redhat.com> <6bc5ae56-be78-865f-481a-36d257fffcb7@redhat.com> <6fc1fe2e-2330-c57e-3722-081cf95ffd8f@honermann.net> <015f32d4-8651-8f82-45a5-5d8a3f813894@redhat.com> <6d595950-d7ee-8c19-fb6b-ad554622892a@redhat.com> <69be5cf2-3c78-1719-8942-874bcf7efbba@honermann.net> Message-ID: <094cab90-81ef-d41d-48b7-d69895f1f4d0@honermann.net> Date: Mon, 24 Dec 2018 02:32:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1 MIME-Version: 1.0 In-Reply-To: <69be5cf2-3c78-1719-8942-874bcf7efbba@honermann.net> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-IsSubscribed: yes X-SW-Source: 2018-12/txt/msg01659.txt.bz2 Thanks, Jason.  I just sent a revised set of patches addressing most of your feedback with exceptions as described inline below. On 12/17/18 4:47 PM, Tom Honermann wrote: > On 12/17/18 4:02 PM, Jason Merrill wrote: >> On 12/5/18 11:16 AM, Jason Merrill wrote: >>> On 12/5/18 2:09 AM, Tom Honermann wrote: >>>> On 12/3/18 5:01 PM, Jason Merrill wrote: >>>>> On 12/3/18 4:51 PM, Jason Merrill wrote: >>>>>> On 11/5/18 2:39 PM, Tom Honermann wrote: >>>>>>> This patch adds support for the P0482R5 core language changes.  >>>>>>> This includes: >>>>>>> - The -fchar8_t and -fno_char8_t command line options. >>>>>>> - char8_t as a keyword. >>>>>>> - The char8_t builtin type as a non-aliasing unsigned integral >>>>>>>    character type of size 1. >>>>>>> - Use of char8_t as a simple type specifier. >>>>>>> - u8 character literals with type char8_t. >>>>>>> - u8 string literals with type array of const char8_t. >>>>>>> - User defined literal operators that accept char8_1 and char8_t >>>>>>> pointer >>>>>>>    types. >>>>>>> - New __cpp_char8_t predefined feature test macro. >>>>>>> - New __CHAR8_TYPE__ and __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined >>>>>>>    macros . >>>>>>> - Name mangling and demangling for char8_t (using Du). >>>>>>> >>>>>>> gcc/ChangeLog: >>>>>>> >>>>>>> 2018-11-04  Tom Honermann  >>>>>>> >>>>>>>       * defaults.h: Define CHAR8_TYPE. >>>>>>> >>>>>>> gcc/c-family/ChangeLog: >>>>>>> >>>>>>> 2018-11-04  Tom Honermann  >>>>>>>       * c-family/c-common.c (c_common_reswords): Add char8_t. >>>>>>>       (fix_string_type): Use char8_t for the type of u8 string >>>>>>> literals. >>>>>>>       (c_common_get_alias_set): char8_t doesn't alias. >>>>>>>       (c_common_nodes_and_builtins): Define char8_t as a builtin >>>>>>> type in >>>>>>>       C++. >>>>>>>       (c_stddef_cpp_builtins): Add __CHAR8_TYPE__. >>>>>>>       (keyword_begins_type_specifier): Add RID_CHAR8. >>>>>>>       * gcc/c-family/c-common.h (rid): Add RID_CHAR8. >>>>>>>       (c_tree_index): Add CTI_CHAR8_TYPE and CTI_CHAR8_ARRAY_TYPE. >>>>>>>       Define D_CXX_CHAR8_T and D_CXX_CHAR8_T_FLAGS. >>>>>>>       Define char8_type_node and char8_array_type_node. >>>>>>>       * c-family/c-cppbuiltin.c (cpp_atomic_builtins): Predefine >>>>>>>       __GCC_ATOMIC_CHAR8_T_LOCK_FREE. >>>>>>>       (c_cpp_builtins): Predefine __cpp_char8_t. >>>>>>>       * c-family/c-lex.c (lex_string): Use char8_array_type_node >>>>>>> as the >>>>>>>       type of CPP_UTF8STRING. >>>>>>>       (lex_charconst): Use char8_type_node as the type of >>>>>>> CPP_UTF8CHAR. >>>>>>>       * c-family/c.opt: Add the -fchar8_t command line option. >>>>>>> >>>>>>> gcc/c/ChangeLog: >>>>>>> >>>>>>> 2018-11-04  Tom Honermann  >>>>>>> >>>>>>>       * c/c-typeck.c (char_type_p): Add char8_type_node. >>>>>>>       (digest_init): Handle initialization by a u8 string >>>>>>> literal of >>>>>>>       char8_t type. >>>>>>> >>>>>>> gcc/cp/ChangeLog: >>>>>>> >>>>>>> 2018-11-04  Tom Honermann  >>>>>>> >>>>>>>       * cp/cvt.c (type_promotes_to): Handle char8_t promotion. >>>>>>>       * cp/decl.c (grokdeclarator): Handle invalid type specifier >>>>>>>       combinations involving char8_t. >>>>>>>       * cp/lex.c (init_reswords): Add char8_t as a reserved word. >>>>>>>       * cp/mangle.c (write_builtin_type): Add name mangling for >>>>>>> char8_t >>>>>>>       (Du). >>>>>>>       * cp/parser.c (cp_keyword_starts_decl_specifier_p, >>>>>>>       cp_parser_simple_type_specifier): Recognize char8_t as a >>>>>>> simple >>>>>>>       type specifier. >>>>>>>       (cp_parser_string_literal): Use char8_array_type_node for >>>>>>> the type >>>>>>>       of CPP_UTF8STRING. >>>>>>>       (cp_parser_set_decl_spec_type): Tolerate char8_t typedefs >>>>>>> in system >>>>>>>       headers. >>>>>>>       * cp/rtti.c (emit_support_tinfos): type_info support for >>>>>>> char8_t. >>>>>>>       * cp/tree.c (char_type_p): Recognize char8_t as a >>>>>>> character type. >>>>>>>       * cp/typeck.c (string_conv_p): Handle conversions of u8 >>>>>>> string >>>>>>>       literals of char8_t type. >>>>>>>       (check_literal_operator_args): Handle UDLs with u8 string >>>>>>> literals >>>>>>>       of char8_t type. >>>>>>>       * cp/typeck2.c (digest_init_r): Disallow initializing a >>>>>>> char array >>>>>>>       with a u8 string literal. >>>>>>> >>>>>>> libiberty/ChangeLog: >>>>>>> >>>>>>> 2018-10-31  Tom Honermann  >>>>>>>       * cp-demangle.c (cplus_demangle_builtin_types, >>>>>>>       cplus_demangle_type): Add name demangling for char8_t (Du). >>>>>>>       * cp-demangle.h: Increase D_BUILTIN_TYPE_COUNT to >>>>>>> accommodate the >>>>>>>       new char8_t type. >>>>>> >>>>>>> @@ -3543,6 +3556,10 @@ c_common_get_alias_set (tree t) >>>>>>>    if (!TYPE_P (t)) >>>>>>>      return -1; >>>>>> >>>>>>> +  /* Unlike char, char8_t doesn't alias. */ >>>>>>> +  if (flag_char8_t && t == char8_type_node) >>>>>>> +    return -1; >>>>>> >>>>>> This seems unnecessary; doesn't the existing code have the same >>>>>> effect? I think we could do with just an adjustment to the >>>>>> existing comment. >>>> I'm not sure.  I had concerns about unintended matching due to >>>> char8_t having an underlying type of unsigned char. >>> >>> That shouldn't be a problem: if char8_t is a distinct type, it won't >>> match unsigned char, and if it's the same as unsigned char, >>> flag_char8_t will be false. I tried removing this check and that resulted in test gcc/testsuite/g++.dg/ext/char8_t-aliasing-1.C (added in patch 3/9) failing.  It seems this change is needed.  If you believe that implies that something is wrong elsewhere, please let me know. >>> >>>>>>> +  else if (flag_char8_t && TREE_TYPE (value) == >>>>>>> char8_array_type_node) >>>>>>> +      || (flag_char8_t && type == char8_type_node) >>>>>>> +      bool char8_array = (flag_char8_t && !!comptypes (typ1, >>>>>>> char8_type_node)); >>>>>>> +       || (flag_char8_t && type == char8_type_node >>>>>> In many places you check the flag and then for one of the char8 >>>>>> types. Since the types won't be used without the flag, checking >>>>>> the flag seems redundant? >>>> >>>> This was again protection against unintended matching of the >>>> underlying unsigned char type, particularly when compiling as C. >>>> char8_type_node is constructed (in c_common_nodes_and_builtins) >>>> following the pattern in place for char16_t and char32_t with the >>>> following code: >>>> >>>> +  char8_type_node = get_identifier (CHAR8_TYPE); >>>> +  char8_type_node = TREE_TYPE (identifier_global_value >>>> (char8_type_node)); >>>> +  char8_type_size = TYPE_PRECISION (char8_type_node); >>>> +  if (c_dialect_cxx ()) >>>> +    { >>>> +      char8_type_node = make_unsigned_type (char8_type_size); >>>> + >>>> +      if (flag_char8_t) >>>> +        record_builtin_type (RID_CHAR8, "char8_t", char8_type_node); >>>> +    } >>>> >>>> My knowledge of gcc internals is weak, but I understand this to be, >>>> effectively, defining a type alias (of unsigned char) and then, if >>>> compiling as C++, re-defining it as a strong unsigned type. >>>> >>>> I don't recall the details now, but at one point, I was missing a >>>> check for flag_char8_t in some location and I encountered some test >>>> failures as a result. >>> >>> Since char8_type_node is always a distinct type in C++, we shouldn't >>> need these checks in the C++ front end.  And since it's never a >>> distinct type in C, the C front end (c/*) doesn't need to change at >>> all. Thanks, that makes sense.  I removed the checks in the C++ front end.  The only checks now remaining are in gcc/c-family/c-common.c and I believe they are necessary there. >>> >>>>>>> -      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT) >>>>>>> +      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT >>>>>>> +          && (typ1 == char_type_node || !flag_char8_t)) >>>>>> >>>>>> This looks wrong, or at least incomplete; we want to complain >>>>>> about mismatched types here even with -fchar8-t. Perhaps we >>>>>> should replace all of this if/else with simply comparing typ1 and >>>>>> char_type, and complaining if they're different.  Talking about >>>>>> wide and non-wide isn't as useful as the actual types would be. >>>>> >>>>> Well, I suppose it isn't quite that simple, since we still need to >>>>> treat the ordinary character types as interchangeable. >>>> >>>> We do need to complain about mismatched types and test >>>> gcc/testsuite/g++.dg/ext/char8_t-init-2.C was added to ensure that >>>> happens. >>> >>> You don't seem to test initializing an array of signed/unsigned >>> char, which I think will be broken by the change only considering >>> char_type_node. Thank you for spotting this!  Yes, this was broken.  Now fixed and tests added. >>> >>> I think we want to specifically allow conversion from array of one >>> ordinary character type to another, and otherwise complain about the >>> types being different with a message like "cannot initialize array >>> of %qT from array of %qT" rather than mess with terms like int-array >>> and (non-)wide string. I like this suggestion.  For this patch, I just made a small tweak to avoid misleadingly describing UTF-8 strings literals as wide strings.  The new (temporary) error messages look like "char-array initialized from UTF-8 string" and "char8_t-array initialized from ordinary string".  I have a separate patch that changes the error messages along the lines you suggested, but I'll send that separately (in the next day or so; I need to re-run the testsuite) as it will require updates to a distinct set of tests. Tom. >> >> Ping.  Will you have a chance to update the patch soon, or would you >> like me to make these last changes myself? > > Hi, Jason.  Updating the patch has remained high on my todo list, but > I've been under water elsewhere.  I've been hoping to get to it in the > next week or so.  If you have time available to help complete changes, > sure, that would be great. > > Tom. > >> >> Jason > >