public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Tom Honermann <tom@honermann.net>
To: gcc-patches <gcc-patches@gcc.gnu.org>
Subject: Re: [REVISED PATCH 2/9]: C++ P0482R5 char8_t: Core language support
Date: Mon, 24 Dec 2018 02:27:00 -0000	[thread overview]
Message-ID: <84731895-95a5-f4d3-f444-169fa7441080@honermann.net> (raw)
In-Reply-To: <f625e9a2-1757-10eb-ceff-5f1bbb6a1a66@honermann.net>

[-- Attachment #1: Type: text/plain, Size: 3646 bytes --]

Attached is a revised patch that addresses changes in P0482R6 as well as 
feedback provided by Jason.  Changes from the prior patch include:
- Updated the value of the __cpp_char8_t feature test macro to 201811
   per P0482R6.
- Enable char8_t support with -std=c++2a per adoption of P0482R6 in
   San Diego.
- Reverted the unnecessary changes to gcc/gcc/c/c-typeck.c as requested
   by Jason.
- Removed unnecessary checks of 'flag_char8_t' within the C++ front
   end as requested by Jason.
- Corrected the regression spotted by Jason regarding initialization of
   signed char and unsigned char arrays with string literals.
- Made minor changes to the error message emitted for ill-formed
   initialization of char arrays with UTF-8 string literals.  These
   changes do not yet implement Jason's suggestion; I'll follow up with a
   separate patch for that due to additional test impact.

Tested on x86_64-linux.

gcc/ChangeLog:

2018-11-04  Tom Honermann  <tom@honermann.net>

      * defaults.h: Define CHAR8_TYPE.

gcc/c-family/ChangeLog:

2018-11-04  Tom Honermann  <tom@honermann.net>
      * c-family/c-common.c (c_common_reswords): Add char8_t.
      (fix_string_type): Use char8_t for the type of u8 string literals.
      (c_common_get_alias_set): char8_t doesn't alias.
      (c_common_nodes_and_builtins): Define char8_t as a builtin type in
      C++.
      (c_stddef_cpp_builtins): Add __CHAR8_TYPE__.
      (keyword_begins_type_specifier): Add RID_CHAR8.
      * c-family/c-common.h (rid): Add RID_CHAR8.
      (c_tree_index): Add CTI_CHAR8_TYPE and CTI_CHAR8_ARRAY_TYPE.
      Define D_CXX_CHAR8_T and D_CXX_CHAR8_T_FLAGS.
      Define char8_type_node and char8_array_type_node.
      * c-family/c-cppbuiltin.c (cpp_atomic_builtins): Predefine
      __GCC_ATOMIC_CHAR8_T_LOCK_FREE.
      (c_cpp_builtins): Predefine __cpp_char8_t.
      * c-family/c-lex.c (lex_string): Use char8_array_type_node as the
      type of CPP_UTF8STRING.
      (lex_charconst): Use char8_type_node as the type of CPP_UTF8CHAR.
      * c-family/c-opts.c: If not otherwise specified, enable -fchar8_t
      when targeting C++2a.
      * c-family/c.opt: Add the -fchar8_t command line option.

gcc/cp/ChangeLog:

2018-11-04  Tom Honermann  <tom@honermann.net>

      * cp/cvt.c (type_promotes_to): Handle char8_t promotion.
      * cp/decl.c (grokdeclarator): Handle invalid type specifier
      combinations involving char8_t.
      * cp/lex.c (init_reswords): Add char8_t as a reserved word.
      * cp/mangle.c (write_builtin_type): Add name mangling for char8_t
      (Du).
      * cp/parser.c (cp_keyword_starts_decl_specifier_p,
      cp_parser_simple_type_specifier): Recognize char8_t as a simple
      type specifier.
      (cp_parser_string_literal): Use char8_array_type_node for the type
      of CPP_UTF8STRING.
      (cp_parser_set_decl_spec_type): Tolerate char8_t typedefs in system
      headers.
      * cp/rtti.c (emit_support_tinfos): type_info support for char8_t.
      * cp/tree.c (char_type_p): Recognize char8_t as a character type.
      * cp/typeck.c (string_conv_p): Handle conversions of u8 string
      literals of char8_t type.
      (check_literal_operator_args): Handle UDLs with u8 string literals
      of char8_t type.
      * cp/typeck2.c (digest_init_r): Disallow initializing a char array
      with a u8 string literal.

libiberty/ChangeLog:

2018-10-31  Tom Honermann  <tom@honermann.net>
      * cp-demangle.c (cplus_demangle_builtin_types,
      cplus_demangle_type): Add name demangling for char8_t (Du).
      * cp-demangle.h: Increase D_BUILTIN_TYPE_COUNT to accommodate the
      new char8_t type.

Tom.



[-- Attachment #2: p0482r5-2-2.patch --]
[-- Type: text/x-patch, Size: 21355 bytes --]

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index f10cf89c3a7..b387daca137 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -79,6 +79,7 @@ machine_mode c_default_pointer_mode = VOIDmode;
 	tree signed_char_type_node;
 	tree wchar_type_node;
 
+	tree char8_type_node;
 	tree char16_type_node;
 	tree char32_type_node;
 
@@ -128,6 +129,11 @@ machine_mode c_default_pointer_mode = VOIDmode;
 
 	tree wchar_array_type_node;
 
+   Type `char8_t[SOMENUMBER]' or something like it.
+   Used when a UTF-8 string literal is created.
+
+	tree char8_array_type_node;
+
    Type `char16_t[SOMENUMBER]' or something like it.
    Used when a UTF-16 string literal is created.
 
@@ -450,6 +456,7 @@ const struct c_common_resword c_common_reswords[] =
   { "case",		RID_CASE,	0 },
   { "catch",		RID_CATCH,	D_CXX_OBJC | D_CXXWARN },
   { "char",		RID_CHAR,	0 },
+  { "char8_t",		RID_CHAR8,	D_CXX_CHAR8_T_FLAGS | D_CXXWARN },
   { "char16_t",		RID_CHAR16,	D_CXXONLY | D_CXX11 | D_CXXWARN },
   { "char32_t",		RID_CHAR32,	D_CXXONLY | D_CXX11 | D_CXXWARN },
   { "class",		RID_CLASS,	D_CXX_OBJC | D_CXXWARN },
@@ -746,6 +753,11 @@ fix_string_type (tree value)
       nchars = length;
       e_type = char_type_node;
     }
+  else if (flag_char8_t && TREE_TYPE (value) == char8_array_type_node)
+    {
+      nchars = length / (TYPE_PRECISION (char8_type_node) / BITS_PER_UNIT);
+      e_type = char8_type_node;
+    }
   else if (TREE_TYPE (value) == char16_array_type_node)
     {
       nchars = length / (TYPE_PRECISION (char16_type_node) / BITS_PER_UNIT);
@@ -813,7 +825,8 @@ fix_string_type (tree value)
    CPP_STRING16, or CPP_STRING32.  Return CPP_OTHER in case of error.
    This may not be exactly the string token type that initially created
    the string, since CPP_WSTRING is indistinguishable from the 16/32 bit
-   string type at this point.
+   string type, and CPP_UTF8STRING is indistinguishable from CPP_STRING
+   at this point.
 
    This effectively reverses part of the logic in lex_string and
    fix_string_type.  */
@@ -3543,8 +3556,12 @@ c_common_get_alias_set (tree t)
   if (!TYPE_P (t))
     return -1;
 
+  /* Unlike char, char8_t doesn't alias. */
+  if (flag_char8_t && t == char8_type_node)
+    return -1;
+
   /* The C standard guarantees that any object may be accessed via an
-     lvalue that has character type.  */
+     lvalue that has narrow character type (except char8_t).  */
   if (t == char_type_node
       || t == signed_char_type_node
       || t == unsigned_char_type_node)
@@ -3953,6 +3970,7 @@ c_get_ident (const char *id)
 void
 c_common_nodes_and_builtins (void)
 {
+  int char8_type_size;
   int char16_type_size;
   int char32_type_size;
   int wchar_type_size;
@@ -4244,6 +4262,22 @@ c_common_nodes_and_builtins (void)
   wchar_array_type_node
     = build_array_type (wchar_type_node, array_domain_type);
 
+  /* Define 'char8_t'.  */
+  char8_type_node = get_identifier (CHAR8_TYPE);
+  char8_type_node = TREE_TYPE (identifier_global_value (char8_type_node));
+  char8_type_size = TYPE_PRECISION (char8_type_node);
+  if (c_dialect_cxx ())
+    {
+      char8_type_node = make_unsigned_type (char8_type_size);
+
+      if (flag_char8_t)
+        record_builtin_type (RID_CHAR8, "char8_t", char8_type_node);
+    }
+
+  /* This is for UTF-8 string constants.  */
+  char8_array_type_node
+    = build_array_type (char8_type_node, array_domain_type);
+
   /* Define 'char16_t'.  */
   char16_type_node = get_identifier (CHAR16_TYPE);
   char16_type_node = TREE_TYPE (identifier_global_value (char16_type_node));
@@ -5041,6 +5075,8 @@ c_stddef_cpp_builtins(void)
   builtin_define_with_value ("__WINT_TYPE__", WINT_TYPE, 0);
   builtin_define_with_value ("__INTMAX_TYPE__", INTMAX_TYPE, 0);
   builtin_define_with_value ("__UINTMAX_TYPE__", UINTMAX_TYPE, 0);
+  if (flag_char8_t)
+    builtin_define_with_value ("__CHAR8_TYPE__", CHAR8_TYPE, 0);
   builtin_define_with_value ("__CHAR16_TYPE__", CHAR16_TYPE, 0);
   builtin_define_with_value ("__CHAR32_TYPE__", CHAR32_TYPE, 0);
   if (SIG_ATOMIC_TYPE)
@@ -7717,6 +7753,7 @@ keyword_begins_type_specifier (enum rid keyword)
     case RID_ACCUM:
     case RID_BOOL:
     case RID_WCHAR:
+    case RID_CHAR8:
     case RID_CHAR16:
     case RID_CHAR32:
     case RID_SAT:
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 641fe57d671..56992b63c0b 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -179,6 +179,9 @@ enum rid
   /* C++11 */
   RID_CONSTEXPR, RID_DECLTYPE, RID_NOEXCEPT, RID_NULLPTR, RID_STATIC_ASSERT,
 
+  /* char8_t */
+  RID_CHAR8,
+
   /* C++ concepts */
   RID_CONCEPT, RID_REQUIRES,
 
@@ -286,6 +289,7 @@ extern GTY ((length ("(int) RID_MAX"))) tree *ridpointers;
 
 enum c_tree_index
 {
+    CTI_CHAR8_TYPE,
     CTI_CHAR16_TYPE,
     CTI_CHAR32_TYPE,
     CTI_WCHAR_TYPE,
@@ -329,6 +333,7 @@ enum c_tree_index
     CTI_UINTPTR_TYPE,
 
     CTI_CHAR_ARRAY_TYPE,
+    CTI_CHAR8_ARRAY_TYPE,
     CTI_CHAR16_ARRAY_TYPE,
     CTI_CHAR32_ARRAY_TYPE,
     CTI_WCHAR_ARRAY_TYPE,
@@ -408,20 +413,22 @@ extern machine_mode c_default_pointer_mode;
    mask) is _true_.  Thus for keywords which are present in all
    languages the disable field is zero.  */
 
-#define D_CONLY		0x001	/* C only (not in C++).  */
-#define D_CXXONLY	0x002	/* C++ only (not in C).  */
-#define D_C99		0x004	/* In C, C99 only.  */
-#define D_CXX11         0x008	/* In C++, C++11 only.  */
-#define D_EXT		0x010	/* GCC extension.  */
-#define D_EXT89		0x020	/* GCC extension incorporated in C99.  */
-#define D_ASM		0x040	/* Disabled by -fno-asm.  */
-#define D_OBJC		0x080	/* In Objective C and neither C nor C++.  */
-#define D_CXX_OBJC	0x100	/* In Objective C, and C++, but not C.  */
-#define D_CXXWARN	0x200	/* In C warn with -Wcxx-compat.  */
-#define D_CXX_CONCEPTS  0x400   /* In C++, only with concepts. */
-#define D_TRANSMEM	0X800   /* C++ transactional memory TS.  */
+#define D_CONLY		0x0001	/* C only (not in C++).  */
+#define D_CXXONLY	0x0002	/* C++ only (not in C).  */
+#define D_C99		0x0004	/* In C, C99 only.  */
+#define D_CXX11         0x0008	/* In C++, C++11 only.  */
+#define D_EXT		0x0010	/* GCC extension.  */
+#define D_EXT89		0x0020	/* GCC extension incorporated in C99.  */
+#define D_ASM		0x0040	/* Disabled by -fno-asm.  */
+#define D_OBJC		0x0080	/* In Objective C and neither C nor C++.  */
+#define D_CXX_OBJC	0x0100	/* In Objective C, and C++, but not C.  */
+#define D_CXXWARN	0x0200	/* In C warn with -Wcxx-compat.  */
+#define D_CXX_CONCEPTS  0x0400	/* In C++, only with concepts.  */
+#define D_TRANSMEM	0X0800	/* C++ transactional memory TS.  */
+#define D_CXX_CHAR8_T	0X1000	/* In C++, only with -fchar8_t.  */
 
 #define D_CXX_CONCEPTS_FLAGS D_CXXONLY | D_CXX_CONCEPTS
+#define D_CXX_CHAR8_T_FLAGS D_CXXONLY | D_CXX_CHAR8_T
 
 /* The reserved keyword table.  */
 extern const struct c_common_resword c_common_reswords[];
@@ -429,6 +436,7 @@ extern const struct c_common_resword c_common_reswords[];
 /* The number of items in the reserved keyword table.  */
 extern const unsigned int num_c_common_reswords;
 
+#define char8_type_node			c_global_trees[CTI_CHAR8_TYPE]
 #define char16_type_node		c_global_trees[CTI_CHAR16_TYPE]
 #define char32_type_node		c_global_trees[CTI_CHAR32_TYPE]
 #define wchar_type_node			c_global_trees[CTI_WCHAR_TYPE]
@@ -474,6 +482,7 @@ extern const unsigned int num_c_common_reswords;
 #define truthvalue_false_node		c_global_trees[CTI_TRUTHVALUE_FALSE]
 
 #define char_array_type_node		c_global_trees[CTI_CHAR_ARRAY_TYPE]
+#define char8_array_type_node		c_global_trees[CTI_CHAR8_ARRAY_TYPE]
 #define char16_array_type_node		c_global_trees[CTI_CHAR16_ARRAY_TYPE]
 #define char32_array_type_node		c_global_trees[CTI_CHAR32_ARRAY_TYPE]
 #define wchar_array_type_node		c_global_trees[CTI_WCHAR_ARRAY_TYPE]
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 96a6b4dfd2b..1287b55d3b3 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -702,6 +702,11 @@ cpp_atomic_builtins (cpp_reader *pfile)
 			(have_swap[SWAP_INDEX (boolean_type_node)]? 2 : 1));
   builtin_define_with_int_value ("__GCC_ATOMIC_CHAR_LOCK_FREE", 
 			(have_swap[SWAP_INDEX (signed_char_type_node)]? 2 : 1));
+  if (flag_char8_t)
+    {
+      builtin_define_with_int_value ("__GCC_ATOMIC_CHAR8_T_LOCK_FREE",
+			(have_swap[SWAP_INDEX (char8_type_node)]? 2 : 1));
+    }
   builtin_define_with_int_value ("__GCC_ATOMIC_CHAR16_T_LOCK_FREE", 
 			(have_swap[SWAP_INDEX (char16_type_node)]? 2 : 1));
   builtin_define_with_int_value ("__GCC_ATOMIC_CHAR32_T_LOCK_FREE", 
@@ -993,6 +998,8 @@ c_cpp_builtins (cpp_reader *pfile)
 	cpp_define (pfile, "__cpp_template_template_args=201611");
       if (flag_threadsafe_statics)
 	cpp_define (pfile, "__cpp_threadsafe_static_init=200806");
+      if (flag_char8_t)
+        cpp_define (pfile, "__cpp_char8_t=201811");
     }
   /* Note that we define this for C as well, so that we know if
      __attribute__((cleanup)) will interface with EH.  */
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index 28a820a2a3d..f7cf79ee350 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -1279,9 +1279,14 @@ lex_string (const cpp_token *tok, tree *valp, bool objc_string, bool translate)
     {
     default:
     case CPP_STRING:
-    case CPP_UTF8STRING:
       TREE_TYPE (value) = char_array_type_node;
       break;
+    case CPP_UTF8STRING:
+      if (flag_char8_t)
+        TREE_TYPE (value) = char8_array_type_node;
+      else
+        TREE_TYPE (value) = char_array_type_node;
+      break;
     case CPP_STRING16:
       TREE_TYPE (value) = char16_array_type_node;
       break;
@@ -1321,7 +1326,12 @@ lex_charconst (const cpp_token *token)
   else if (token->type == CPP_CHAR16)
     type = char16_type_node;
   else if (token->type == CPP_UTF8CHAR)
-    type = char_type_node;
+    {
+      if (flag_char8_t)
+        type = char8_type_node;
+      else
+        type = char_type_node;
+    }
   /* In C, a character constant has type 'int'.
      In C++ 'char', but multi-char charconsts have type 'int'.  */
   else if (!c_dialect_cxx () || chars_seen > 1)
diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 9cf1900fb9a..507bf122e3d 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -995,6 +995,10 @@ c_common_post_options (const char **pfilename)
   if (flag_sized_deallocation == -1)
     flag_sized_deallocation = (cxx_dialect >= cxx14);
 
+  /* char8_t support is new in C++2A.  */
+  if (flag_char8_t == -1)
+    flag_char8_t = (cxx_dialect >= cxx2a);
+
   if (flag_extern_tls_init)
     {
       if (!TARGET_SUPPORTS_ALIASES || !SUPPORTS_WEAK)
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 6f88a1013d6..5d5f5c26ce0 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1291,6 +1291,11 @@ fcanonical-system-headers
 C ObjC C++ ObjC++
 Where shorter, use canonicalized paths to systems headers.
 
+fchar8_t
+C++ ObjC++ Var(flag_char8_t) Init(-1)
+Enable the char8_t fundamental type and use it as the type for UTF-8 string
+and character literals.
+
 fcheck-pointer-bounds
 C ObjC C++ ObjC++ LTO Deprecated
 Deprecated in GCC 9.  This switch has no effect.
diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index 315b0d6a65a..c40606f7c2c 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -1863,6 +1863,7 @@ type_promotes_to (tree type)
      wider.  Scoped enums don't promote, but pretend they do for backward
      ABI bug compatibility wrt varargs.  */
   else if (TREE_CODE (type) == ENUMERAL_TYPE
+	   || type == char8_type_node
 	   || type == char16_type_node
 	   || type == char32_type_node
 	   || type == wchar_type_node)
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 5ebfaaf85e6..f3c09ad2274 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10714,7 +10714,9 @@ grokdeclarator (const cp_declarator *declarator,
 	  error_at (&richloc, "%<long%> and %<short%> specified together");
 	}
       else if (TREE_CODE (type) != INTEGER_TYPE
-	       || type == char16_type_node || type == char32_type_node
+	       || type == char8_type_node
+	       || type == char16_type_node
+	       || type == char32_type_node
 	       || ((long_p || short_p)
 		   && (explicit_char || explicit_intN)))
 	error_at (loc, "%qs specified with %qT", key, type);
diff --git a/gcc/cp/lex.c b/gcc/cp/lex.c
index 47b99c3c469..c679eb73cdd 100644
--- a/gcc/cp/lex.c
+++ b/gcc/cp/lex.c
@@ -229,6 +229,8 @@ init_reswords (void)
     mask |= D_CXX_CONCEPTS;
   if (!flag_tm)
     mask |= D_TRANSMEM;
+  if (!flag_char8_t)
+    mask |= D_CXX_CHAR8_T;
   if (flag_no_asm)
     mask |= D_ASM | D_EXT;
   if (flag_no_gnu_keywords)
diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 59a3111fba2..2363ed87dee 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -2527,10 +2527,12 @@ write_builtin_type (tree type)
       break;
 
     case INTEGER_TYPE:
-      /* TYPE may still be wchar_t, char16_t, or char32_t, since that
+      /* TYPE may still be wchar_t, char8_t, char16_t, or char32_t, since that
 	 isn't in integer_type_nodes.  */
       if (type == wchar_type_node)
 	write_char ('w');
+      else if (type == char8_type_node)
+	write_string ("Du");
       else if (type == char16_type_node)
 	write_string ("Ds");
       else if (type == char32_type_node)
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index ebe326eb923..1e76779b84e 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -944,6 +944,7 @@ cp_keyword_starts_decl_specifier_p (enum rid keyword)
     case RID_TYPENAME:
       /* Simple type specifiers.  */
     case RID_CHAR:
+    case RID_CHAR8:
     case RID_CHAR16:
     case RID_CHAR32:
     case RID_WCHAR:
@@ -4183,9 +4184,14 @@ cp_parser_string_literal (cp_parser *parser, bool translate, bool wide_ok,
 	{
 	default:
 	case CPP_STRING:
-	case CPP_UTF8STRING:
 	  TREE_TYPE (value) = char_array_type_node;
 	  break;
+	case CPP_UTF8STRING:
+	  if (flag_char8_t)
+	    TREE_TYPE (value) = char8_array_type_node;
+	  else
+	    TREE_TYPE (value) = char_array_type_node;
+	  break;
 	case CPP_STRING16:
 	  TREE_TYPE (value) = char16_array_type_node;
 	  break;
@@ -17064,6 +17070,9 @@ cp_parser_simple_type_specifier (cp_parser* parser,
 	decl_specs->explicit_char_p = true;
       type = char_type_node;
       break;
+    case RID_CHAR8:
+      type = char8_type_node;
+      break;
     case RID_CHAR16:
       type = char16_type_node;
       break;
@@ -28275,14 +28284,15 @@ cp_parser_set_decl_spec_type (cp_decl_specifier_seq *decl_specs,
 {
   decl_specs->any_specifiers_p = true;
 
-  /* If the user tries to redeclare bool, char16_t, char32_t, or wchar_t
-     (with, for example, in "typedef int wchar_t;") we remember that
+  /* If the user tries to redeclare bool, char8_t, char16_t, char32_t, or
+     wchar_t (with, for example, in "typedef int wchar_t;") we remember that
      this is what happened.  In system headers, we ignore these
      declarations so that G++ can work with system headers that are not
      C++-safe.  */
   if (decl_spec_seq_has_spec_p (decl_specs, ds_typedef)
       && !type_definition_p
       && (type_spec == boolean_type_node
+	  || type_spec == char8_type_node
 	  || type_spec == char16_type_node
 	  || type_spec == char32_type_node
 	  || type_spec == wchar_type_node)
diff --git a/gcc/cp/rtti.c b/gcc/cp/rtti.c
index a0629e19360..987183f14f0 100644
--- a/gcc/cp/rtti.c
+++ b/gcc/cp/rtti.c
@@ -1539,7 +1539,7 @@ emit_support_tinfos (void)
   {
     &void_type_node,
     &boolean_type_node,
-    &wchar_type_node, &char16_type_node, &char32_type_node,
+    &wchar_type_node, &char8_type_node, &char16_type_node, &char32_type_node,
     &char_type_node, &signed_char_type_node, &unsigned_char_type_node,
     &short_integer_type_node, &short_unsigned_type_node,
     &integer_type_node, &unsigned_type_node,
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 251c344f181..52bd62b27b5 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -5036,6 +5036,7 @@ char_type_p (tree type)
   return (same_type_p (type, char_type_node)
 	  || same_type_p (type, unsigned_char_type_node)
 	  || same_type_p (type, signed_char_type_node)
+	  || same_type_p (type, char8_type_node)
 	  || same_type_p (type, char16_type_node)
 	  || same_type_p (type, char32_type_node)
 	  || same_type_p (type, wchar_type_node));
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index c921096cb31..8f78d8cf1f3 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -2206,6 +2206,7 @@ string_conv_p (const_tree totype, const_tree exp, int warn)
 
   t = TREE_TYPE (totype);
   if (!same_type_p (t, char_type_node)
+      && !same_type_p (t, char8_type_node)
       && !same_type_p (t, char16_type_node)
       && !same_type_p (t, char32_type_node)
       && !same_type_p (t, wchar_type_node))
@@ -10206,6 +10207,7 @@ check_literal_operator_args (const_tree decl,
 	      t = TYPE_MAIN_VARIANT (t);
 	      if ((maybe_raw_p = same_type_p (t, char_type_node))
 		  || same_type_p (t, wchar_type_node)
+		  || same_type_p (t, char8_type_node)
 		  || same_type_p (t, char16_type_node)
 		  || same_type_p (t, char32_type_node))
 		{
@@ -10238,6 +10240,8 @@ check_literal_operator_args (const_tree decl,
 	    max_arity = 1;
 	  else if (same_type_p (t, wchar_type_node))
 	    max_arity = 1;
+	  else if (same_type_p (t, char8_type_node))
+	    max_arity = 1;
 	  else if (same_type_p (t, char16_type_node))
 	    max_arity = 1;
 	  else if (same_type_p (t, char32_type_node))
diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index fec1db00ca4..782fd7f9cd5 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -1063,7 +1063,21 @@ digest_init_r (tree type, tree init, int nested, int flags,
 
 	  if (TYPE_PRECISION (typ1) == BITS_PER_UNIT)
 	    {
-	      if (char_type != char_type_node)
+	      if (typ1 != char8_type_node && char_type == char8_type_node)
+		{
+		  if (complain & tf_error)
+		    error_at (loc, "char-array initialized from UTF-8 string");
+		  return error_mark_node;
+		}
+	      else if (typ1 == char8_type_node && char_type == char_type_node)
+		{
+		  if (complain & tf_error)
+		    error_at (loc, "char8_t-array initialized from ordinary "
+			      "string");
+		  return error_mark_node;
+		}
+	      else if (char_type != char_type_node
+		       && char_type != char8_type_node)
 		{
 		  if (complain & tf_error)
 		    error_at (loc, "char-array initialized from wide string");
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 9035b333be8..fc90b5fae79 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -583,6 +583,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    affect C++ name mangling because in C++ these are distinct types
    not typedefs.  */
 
+#ifndef CHAR8_TYPE
+#define CHAR8_TYPE "unsigned char"
+#endif
+
 #ifdef UINT_LEAST16_TYPE
 #define CHAR16_TYPE UINT_LEAST16_TYPE
 #else
diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index 3f2a097e7f2..a45b041c400 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -2355,9 +2355,10 @@ cplus_demangle_builtin_types[D_BUILTIN_TYPE_COUNT] =
   /* 27 */ { NL ("decimal64"),	NL ("decimal64"),	D_PRINT_DEFAULT },
   /* 28 */ { NL ("decimal128"),	NL ("decimal128"),	D_PRINT_DEFAULT },
   /* 29 */ { NL ("half"),	NL ("half"),		D_PRINT_FLOAT },
-  /* 30 */ { NL ("char16_t"),	NL ("char16_t"),	D_PRINT_DEFAULT },
-  /* 31 */ { NL ("char32_t"),	NL ("char32_t"),	D_PRINT_DEFAULT },
-  /* 32 */ { NL ("decltype(nullptr)"),	NL ("decltype(nullptr)"),
+  /* 30 */ { NL ("char8_t"),	NL ("char8_t"),		D_PRINT_DEFAULT },
+  /* 31 */ { NL ("char16_t"),	NL ("char16_t"),	D_PRINT_DEFAULT },
+  /* 32 */ { NL ("char32_t"),	NL ("char32_t"),	D_PRINT_DEFAULT },
+  /* 33 */ { NL ("decltype(nullptr)"),	NL ("decltype(nullptr)"),
 	     D_PRINT_DEFAULT },
 };
 
@@ -2645,14 +2646,19 @@ cplus_demangle_type (struct d_info *di)
 	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[29]);
 	  di->expansion += ret->u.s_builtin.type->len;
 	  break;
+	case 'u':
+	  /* char8_t */
+	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[30]);
+	  di->expansion += ret->u.s_builtin.type->len;
+	  break;
 	case 's':
 	  /* char16_t */
-	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[30]);
+	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[31]);
 	  di->expansion += ret->u.s_builtin.type->len;
 	  break;
 	case 'i':
 	  /* char32_t */
-	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[31]);
+	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[32]);
 	  di->expansion += ret->u.s_builtin.type->len;
 	  break;
 
@@ -2678,7 +2684,7 @@ cplus_demangle_type (struct d_info *di)
 
         case 'n':
           /* decltype(nullptr) */
-	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[32]);
+	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[33]);
 	  di->expansion += ret->u.s_builtin.type->len;
 	  break;
 
diff --git a/libiberty/cp-demangle.h b/libiberty/cp-demangle.h
index 51b8a243e0e..d4405127645 100644
--- a/libiberty/cp-demangle.h
+++ b/libiberty/cp-demangle.h
@@ -173,7 +173,7 @@ d_advance (struct d_info *di, int i)
 extern const struct demangle_operator_info cplus_demangle_operators[];
 #endif
 
-#define D_BUILTIN_TYPE_COUNT (33)
+#define D_BUILTIN_TYPE_COUNT (34)
 
 CP_STATIC_IF_GLIBCPP_V3
 const struct demangle_builtin_type_info

  parent reply	other threads:[~2018-12-24  2:27 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-05 19:40 [PATCH " Tom Honermann
2018-12-03 21:51 ` Jason Merrill
2018-12-03 22:01   ` Jason Merrill
2018-12-05  7:10     ` Tom Honermann
2018-12-05 16:16       ` Jason Merrill
2018-12-17 21:02         ` Jason Merrill
2018-12-17 21:47           ` Tom Honermann
2018-12-24  2:32             ` Tom Honermann
2018-12-24  2:27 ` Tom Honermann [this message]
2019-01-14 19:59   ` [REVISED PATCH " Jason Merrill
2019-01-15  4:08     ` Tom Honermann
2019-01-15  6:51     ` Christophe Lyon
2019-01-15 15:50       ` Tom Honermann
2019-01-15 18:28         ` Jason Merrill

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=84731895-95a5-f4d3-f444-169fa7441080@honermann.net \
    --to=tom@honermann.net \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).