From: Jason Merrill <jason@redhat.com>
To: Tom Honermann <tom@honermann.net>, gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH 2/9]: C++ P0482R5 char8_t: Core language support
Date: Mon, 17 Dec 2018 21:02:00 -0000 [thread overview]
Message-ID: <6d595950-d7ee-8c19-fb6b-ad554622892a@redhat.com> (raw)
In-Reply-To: <015f32d4-8651-8f82-45a5-5d8a3f813894@redhat.com>
On 12/5/18 11:16 AM, Jason Merrill wrote:
> On 12/5/18 2:09 AM, Tom Honermann wrote:
>> On 12/3/18 5:01 PM, Jason Merrill wrote:
>>> On 12/3/18 4:51 PM, Jason Merrill wrote:
>>>> On 11/5/18 2:39 PM, Tom Honermann wrote:
>>>>> This patch adds support for the P0482R5 core language changes.
>>>>> This includes:
>>>>> - The -fchar8_t and -fno_char8_t command line options.
>>>>> - char8_t as a keyword.
>>>>> - The char8_t builtin type as a non-aliasing unsigned integral
>>>>> Â Â character type of size 1.
>>>>> - Use of char8_t as a simple type specifier.
>>>>> - u8 character literals with type char8_t.
>>>>> - u8 string literals with type array of const char8_t.
>>>>> - User defined literal operators that accept char8_1 and char8_t
>>>>> pointer
>>>>> Â Â types.
>>>>> - New __cpp_char8_t predefined feature test macro.
>>>>> - New __CHAR8_TYPE__ and __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined
>>>>> Â Â macros .
>>>>> - Name mangling and demangling for char8_t (using Du).
>>>>>
>>>>> gcc/ChangeLog:
>>>>>
>>>>> 2018-11-04 Tom Honermann <tom@honermann.net>
>>>>>
>>>>> Â Â Â Â Â * defaults.h: Define CHAR8_TYPE.
>>>>>
>>>>> gcc/c-family/ChangeLog:
>>>>>
>>>>> 2018-11-04 Tom Honermann <tom@honermann.net>
>>>>> Â Â Â Â Â * c-family/c-common.c (c_common_reswords): Add char8_t.
>>>>> Â Â Â Â Â (fix_string_type): Use char8_t for the type of u8 string
>>>>> literals.
>>>>> Â Â Â Â Â (c_common_get_alias_set): char8_t doesn't alias.
>>>>> Â Â Â Â Â (c_common_nodes_and_builtins): Define char8_t as a builtin
>>>>> type in
>>>>> Â Â Â Â Â C++.
>>>>> Â Â Â Â Â (c_stddef_cpp_builtins): Add __CHAR8_TYPE__.
>>>>> Â Â Â Â Â (keyword_begins_type_specifier): Add RID_CHAR8.
>>>>> Â Â Â Â Â * gcc/c-family/c-common.h (rid): Add RID_CHAR8.
>>>>> Â Â Â Â Â (c_tree_index): Add CTI_CHAR8_TYPE and CTI_CHAR8_ARRAY_TYPE.
>>>>> Â Â Â Â Â Define D_CXX_CHAR8_T and D_CXX_CHAR8_T_FLAGS.
>>>>> Â Â Â Â Â Define char8_type_node and char8_array_type_node.
>>>>> Â Â Â Â Â * c-family/c-cppbuiltin.c (cpp_atomic_builtins): Predefine
>>>>> Â Â Â Â Â __GCC_ATOMIC_CHAR8_T_LOCK_FREE.
>>>>> Â Â Â Â Â (c_cpp_builtins): Predefine __cpp_char8_t.
>>>>> Â Â Â Â Â * c-family/c-lex.c (lex_string): Use char8_array_type_node as
>>>>> the
>>>>> Â Â Â Â Â type of CPP_UTF8STRING.
>>>>> Â Â Â Â Â (lex_charconst): Use char8_type_node as the type of
>>>>> CPP_UTF8CHAR.
>>>>> Â Â Â Â Â * c-family/c.opt: Add the -fchar8_t command line option.
>>>>>
>>>>> gcc/c/ChangeLog:
>>>>>
>>>>> 2018-11-04 Tom Honermann <tom@honermann.net>
>>>>>
>>>>> Â Â Â Â Â * c/c-typeck.c (char_type_p): Add char8_type_node.
>>>>> Â Â Â Â Â (digest_init): Handle initialization by a u8 string literal of
>>>>> Â Â Â Â Â char8_t type.
>>>>>
>>>>> gcc/cp/ChangeLog:
>>>>>
>>>>> 2018-11-04 Tom Honermann <tom@honermann.net>
>>>>>
>>>>> Â Â Â Â Â * cp/cvt.c (type_promotes_to): Handle char8_t promotion.
>>>>> Â Â Â Â Â * cp/decl.c (grokdeclarator): Handle invalid type specifier
>>>>> Â Â Â Â Â combinations involving char8_t.
>>>>> Â Â Â Â Â * cp/lex.c (init_reswords): Add char8_t as a reserved word.
>>>>> Â Â Â Â Â * cp/mangle.c (write_builtin_type): Add name mangling for
>>>>> char8_t
>>>>> Â Â Â Â Â (Du).
>>>>> Â Â Â Â Â * cp/parser.c (cp_keyword_starts_decl_specifier_p,
>>>>> Â Â Â Â Â cp_parser_simple_type_specifier): Recognize char8_t as a simple
>>>>> Â Â Â Â Â type specifier.
>>>>> Â Â Â Â Â (cp_parser_string_literal): Use char8_array_type_node for the
>>>>> type
>>>>> Â Â Â Â Â of CPP_UTF8STRING.
>>>>> Â Â Â Â Â (cp_parser_set_decl_spec_type): Tolerate char8_t typedefs in
>>>>> system
>>>>> Â Â Â Â Â headers.
>>>>> Â Â Â Â Â * cp/rtti.c (emit_support_tinfos): type_info support for
>>>>> char8_t.
>>>>> Â Â Â Â Â * cp/tree.c (char_type_p): Recognize char8_t as a character
>>>>> type.
>>>>> Â Â Â Â Â * cp/typeck.c (string_conv_p): Handle conversions of u8 string
>>>>> Â Â Â Â Â literals of char8_t type.
>>>>> Â Â Â Â Â (check_literal_operator_args): Handle UDLs with u8 string
>>>>> literals
>>>>> Â Â Â Â Â of char8_t type.
>>>>> Â Â Â Â Â * cp/typeck2.c (digest_init_r): Disallow initializing a char
>>>>> array
>>>>> Â Â Â Â Â with a u8 string literal.
>>>>>
>>>>> libiberty/ChangeLog:
>>>>>
>>>>> 2018-10-31 Tom Honermann <tom@honermann.net>
>>>>> Â Â Â Â Â * cp-demangle.c (cplus_demangle_builtin_types,
>>>>> Â Â Â Â Â cplus_demangle_type): Add name demangling for char8_t (Du).
>>>>> Â Â Â Â Â * cp-demangle.h: Increase D_BUILTIN_TYPE_COUNT to accommodate
>>>>> the
>>>>> Â Â Â Â Â new char8_t type.
>>>>
>>>>> @@ -3543,6 +3556,10 @@ c_common_get_alias_set (tree t)
>>>>> Â Â if (!TYPE_P (t))
>>>>> Â Â Â Â return -1;
>>>>
>>>>> +Â /* Unlike char, char8_t doesn't alias. */
>>>>> +Â if (flag_char8_t && t == char8_type_node)
>>>>> +Â Â Â return -1;
>>>>
>>>> This seems unnecessary; doesn't the existing code have the same
>>>> effect? I think we could do with just an adjustment to the existing
>>>> comment.
>> I'm not sure. I had concerns about unintended matching due to char8_t
>> having an underlying type of unsigned char.
>
> That shouldn't be a problem: if char8_t is a distinct type, it won't
> match unsigned char, and if it's the same as unsigned char, flag_char8_t
> will be false.
>
>>>>> +Â else if (flag_char8_t && TREE_TYPE (value) ==
>>>>> char8_array_type_node)
>>>>> +Â Â Â Â Â || (flag_char8_t && type == char8_type_node)
>>>>> +Â Â Â Â Â bool char8_array = (flag_char8_t && !!comptypes (typ1,
>>>>> char8_type_node));
>>>>> +Â Â Â Â Â Â || (flag_char8_t && type == char8_type_node
>>>> In many places you check the flag and then for one of the char8
>>>> types. Since the types won't be used without the flag, checking the
>>>> flag seems redundant?
>>
>> This was again protection against unintended matching of the
>> underlying unsigned char type, particularly when compiling as C.
>> char8_type_node is constructed (in c_common_nodes_and_builtins)
>> following the pattern in place for char16_t and char32_t with the
>> following code:
>>
>> +Â char8_type_node = get_identifier (CHAR8_TYPE);
>> +Â char8_type_node = TREE_TYPE (identifier_global_value
>> (char8_type_node));
>> +Â char8_type_size = TYPE_PRECISION (char8_type_node);
>> +Â if (c_dialect_cxx ())
>> +Â Â Â {
>> +Â Â Â Â Â char8_type_node = make_unsigned_type (char8_type_size);
>> +
>> +Â Â Â Â Â if (flag_char8_t)
>> +Â Â Â Â Â Â Â record_builtin_type (RID_CHAR8, "char8_t", char8_type_node);
>> +Â Â Â }
>>
>> My knowledge of gcc internals is weak, but I understand this to be,
>> effectively, defining a type alias (of unsigned char) and then, if
>> compiling as C++, re-defining it as a strong unsigned type.
>>
>> I don't recall the details now, but at one point, I was missing a
>> check for flag_char8_t in some location and I encountered some test
>> failures as a result.
>
> Since char8_type_node is always a distinct type in C++, we shouldn't
> need these checks in the C++ front end. And since it's never a distinct
> type in C, the C front end (c/*) doesn't need to change at all.
>
>>>>> -Â Â Â Â Â if (TYPE_PRECISION (typ1) == BITS_PER_UNIT)
>>>>> +Â Â Â Â Â if (TYPE_PRECISION (typ1) == BITS_PER_UNIT
>>>>> +Â Â Â Â Â Â Â Â Â && (typ1 == char_type_node || !flag_char8_t))
>>>>
>>>> This looks wrong, or at least incomplete; we want to complain about
>>>> mismatched types here even with -fchar8-t. Perhaps we should
>>>> replace all of this if/else with simply comparing typ1 and
>>>> char_type, and complaining if they're different. Talking about wide
>>>> and non-wide isn't as useful as the actual types would be.
>>>
>>> Well, I suppose it isn't quite that simple, since we still need to
>>> treat the ordinary character types as interchangeable.
>>
>> We do need to complain about mismatched types and test
>> gcc/testsuite/g++.dg/ext/char8_t-init-2.C was added to ensure that
>> happens.
>
> You don't seem to test initializing an array of signed/unsigned char,
> which I think will be broken by the change only considering char_type_node.
>
> I think we want to specifically allow conversion from array of one
> ordinary character type to another, and otherwise complain about the
> types being different with a message like "cannot initialize array of
> %qT from array of %qT" rather than mess with terms like int-array and
> (non-)wide string.
Ping. Will you have a chance to update the patch soon, or would you
like me to make these last changes myself?
Jason
next prev parent reply other threads:[~2018-12-17 21:02 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-11-05 19:40 Tom Honermann
2018-12-03 21:51 ` Jason Merrill
2018-12-03 22:01 ` Jason Merrill
2018-12-05 7:10 ` Tom Honermann
2018-12-05 16:16 ` Jason Merrill
2018-12-17 21:02 ` Jason Merrill [this message]
2018-12-17 21:47 ` Tom Honermann
2018-12-24 2:32 ` Tom Honermann
2018-12-24 2:27 ` [REVISED PATCH " Tom Honermann
2019-01-14 19:59 ` Jason Merrill
2019-01-15 4:08 ` Tom Honermann
2019-01-15 6:51 ` Christophe Lyon
2019-01-15 15:50 ` Tom Honermann
2019-01-15 18:28 ` Jason Merrill
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6d595950-d7ee-8c19-fb6b-ad554622892a@redhat.com \
--to=jason@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=tom@honermann.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).