public inbox for gcc-patches@gcc.gnu.org
* [PATCH 2/9]: C++ P0482R5 char8_t: Core language support
@ 2018-11-05 19:40 Tom Honermann
  2018-12-03 21:51 ` Jason Merrill
  2018-12-24  2:27 ` [REVISED PATCH " Tom Honermann
  0 siblings, 2 replies; 14+ messages in thread
From: Tom Honermann @ 2018-11-05 19:40 UTC
  To: gcc-patches; +Cc: tom

[-- Attachment #1: Type: text/plain, Size: 3507 bytes --]

This patch adds support for the P0482R5 core language changes.  This 
includes:
- The -fchar8_t and -fno-char8_t command line options.
- char8_t as a keyword.
- The char8_t builtin type as a non-aliasing unsigned integral
   character type of size 1.
- Use of char8_t as a simple type specifier.
- u8 character literals with type char8_t.
- u8 string literals with type array of const char8_t.
- User defined literal operators that accept char8_t and char8_t pointer
   types.
- New __cpp_char8_t predefined feature test macro.
- New __CHAR8_TYPE__ and __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined
   macros.
- Name mangling and demangling for char8_t (using Du).
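
As a rough illustration of the list above, the following sketch checks which
element type a u8 string literal has in the current translation unit.  It
assumes a compiler that defines __cpp_char8_t whenever char8_t is enabled, as
this patch does; the names u8_elem and u8_literal_matches_macro are made up
for the example and do not appear in the patch.

```cpp
#include <type_traits>

// Element type of a u8 string literal, with the reference and
// cv-qualifiers stripped.
using u8_elem = std::remove_cv<
    std::remove_reference<decltype(u8"x"[0])>::type>::type;

// True when the literal's element type agrees with the feature-test
// macro: char8_t if __cpp_char8_t is defined, plain char otherwise.
constexpr bool u8_literal_matches_macro()
{
#ifdef __cpp_char8_t
  return std::is_same<u8_elem, char8_t>::value;
#else
  return std::is_same<u8_elem, char>::value;
#endif
}

// Either way a UTF-8 code unit occupies one byte.
static_assert(sizeof(u8_elem) == 1, "UTF-8 code units are one byte wide");
```

With this patch applied, building with -fchar8_t would be expected to take
the first branch and building with -fno-char8_t the second.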

gcc/ChangeLog:

2018-11-04  Tom Honermann  <tom@honermann.net>

      * defaults.h: Define CHAR8_TYPE.

gcc/c-family/ChangeLog:

2018-11-04  Tom Honermann  <tom@honermann.net>
      * c-family/c-common.c (c_common_reswords): Add char8_t.
      (fix_string_type): Use char8_t for the type of u8 string literals.
      (c_common_get_alias_set): char8_t doesn't alias.
      (c_common_nodes_and_builtins): Define char8_t as a builtin type in
      C++.
      (c_stddef_cpp_builtins): Add __CHAR8_TYPE__.
      (keyword_begins_type_specifier): Add RID_CHAR8.
      * gcc/c-family/c-common.h (rid): Add RID_CHAR8.
      (c_tree_index): Add CTI_CHAR8_TYPE and CTI_CHAR8_ARRAY_TYPE.
      Define D_CXX_CHAR8_T and D_CXX_CHAR8_T_FLAGS.
      Define char8_type_node and char8_array_type_node.
      * c-family/c-cppbuiltin.c (cpp_atomic_builtins): Predefine
      __GCC_ATOMIC_CHAR8_T_LOCK_FREE.
      (c_cpp_builtins): Predefine __cpp_char8_t.
      * c-family/c-lex.c (lex_string): Use char8_array_type_node as the
      type of CPP_UTF8STRING.
      (lex_charconst): Use char8_type_node as the type of CPP_UTF8CHAR.
      * c-family/c.opt: Add the -fchar8_t command line option.

gcc/c/ChangeLog:

2018-11-04  Tom Honermann  <tom@honermann.net>

      * c/c-typeck.c (char_type_p): Add char8_type_node.
      (digest_init): Handle initialization by a u8 string literal of
      char8_t type.

gcc/cp/ChangeLog:

2018-11-04  Tom Honermann  <tom@honermann.net>

      * cp/cvt.c (type_promotes_to): Handle char8_t promotion.
      * cp/decl.c (grokdeclarator): Handle invalid type specifier
      combinations involving char8_t.
      * cp/lex.c (init_reswords): Add char8_t as a reserved word.
      * cp/mangle.c (write_builtin_type): Add name mangling for char8_t
      (Du).
      * cp/parser.c (cp_keyword_starts_decl_specifier_p,
      cp_parser_simple_type_specifier): Recognize char8_t as a simple
      type specifier.
      (cp_parser_string_literal): Use char8_array_type_node for the type
      of CPP_UTF8STRING.
      (cp_parser_set_decl_spec_type): Tolerate char8_t typedefs in system
      headers.
      * cp/rtti.c (emit_support_tinfos): type_info support for char8_t.
      * cp/tree.c (char_type_p): Recognize char8_t as a character type.
      * cp/typeck.c (string_conv_p): Handle conversions of u8 string
      literals of char8_t type.
      (check_literal_operator_args): Handle UDLs with u8 string literals
      of char8_t type.
      * cp/typeck2.c (digest_init_r): Disallow initializing a char array
      with a u8 string literal.

libiberty/ChangeLog:

2018-10-31  Tom Honermann  <tom@honermann.net>
      * cp-demangle.c (cplus_demangle_builtin_types,
      cplus_demangle_type): Add name demangling for char8_t (Du).
      * cp-demangle.h: Increase D_BUILTIN_TYPE_COUNT to accommodate the
      new char8_t type.
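
To make the Du encoding concrete, here is a minimal toy mangler.  It is only a
sketch: it handles unscoped function names with a handful of builtin
character parameter types, and the helper names builtin_code and toy_mangle
are hypothetical.  It is not the code in cp/mangle.c or cp-demangle.c.

```cpp
#include <string>
#include <vector>

// Itanium C++ ABI codes for a few builtin character types.
// "Du" for char8_t is the code this patch adds.
inline std::string builtin_code(const std::string &type)
{
  if (type == "char")     return "c";
  if (type == "wchar_t")  return "w";
  if (type == "char8_t")  return "Du";
  if (type == "char16_t") return "Ds";
  if (type == "char32_t") return "Di";
  return "";  // not handled by this sketch
}

// Mangle an unscoped function name with builtin parameter types only:
// "_Z" <length of name> <name> <parameter codes>.
inline std::string toy_mangle(const std::string &name,
                              const std::vector<std::string> &params)
{
  std::string out = "_Z" + std::to_string(name.size()) + name;
  if (params.empty())
    out += "v";  // an empty parameter list is encoded as void
  for (const std::string &p : params)
    out += builtin_code(p);
  return out;
}
```

Under these assumptions, a declaration such as void f(char8_t) corresponds to
toy_mangle("f", {"char8_t"}), which yields _Z1fDu.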

Tom.

[-- Attachment #2: p0482r5-2.patch --]
[-- Type: text/x-patch, Size: 21363 bytes --]

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index f10cf89c3a7..c7d88eb9a22 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -79,6 +79,7 @@ machine_mode c_default_pointer_mode = VOIDmode;
 	tree signed_char_type_node;
 	tree wchar_type_node;
 
+	tree char8_type_node;
 	tree char16_type_node;
 	tree char32_type_node;
 
@@ -128,6 +129,11 @@ machine_mode c_default_pointer_mode = VOIDmode;
 
 	tree wchar_array_type_node;
 
+   Type `char8_t[SOMENUMBER]' or something like it.
+   Used when a UTF-8 string literal is created.
+
+	tree char8_array_type_node;
+
    Type `char16_t[SOMENUMBER]' or something like it.
    Used when a UTF-16 string literal is created.
 
@@ -450,6 +456,7 @@ const struct c_common_resword c_common_reswords[] =
   { "case",		RID_CASE,	0 },
   { "catch",		RID_CATCH,	D_CXX_OBJC | D_CXXWARN },
   { "char",		RID_CHAR,	0 },
+  { "char8_t",		RID_CHAR8,	D_CXX_CHAR8_T_FLAGS | D_CXXWARN },
   { "char16_t",		RID_CHAR16,	D_CXXONLY | D_CXX11 | D_CXXWARN },
   { "char32_t",		RID_CHAR32,	D_CXXONLY | D_CXX11 | D_CXXWARN },
   { "class",		RID_CLASS,	D_CXX_OBJC | D_CXXWARN },
@@ -746,6 +753,11 @@ fix_string_type (tree value)
       nchars = length;
       e_type = char_type_node;
     }
+  else if (flag_char8_t && TREE_TYPE (value) == char8_array_type_node)
+    {
+      nchars = length / (TYPE_PRECISION (char8_type_node) / BITS_PER_UNIT);
+      e_type = char8_type_node;
+    }
   else if (TREE_TYPE (value) == char16_array_type_node)
     {
       nchars = length / (TYPE_PRECISION (char16_type_node) / BITS_PER_UNIT);
@@ -813,7 +825,8 @@ fix_string_type (tree value)
    CPP_STRING16, or CPP_STRING32.  Return CPP_OTHER in case of error.
    This may not be exactly the string token type that initially created
    the string, since CPP_WSTRING is indistinguishable from the 16/32 bit
-   string type at this point.
+   string type, and CPP_UTF8STRING is indistinguishable from CPP_STRING
+   at this point.
 
    This effectively reverses part of the logic in lex_string and
    fix_string_type.  */
@@ -3543,6 +3556,10 @@ c_common_get_alias_set (tree t)
   if (!TYPE_P (t))
     return -1;
 
+  /* Unlike char, char8_t doesn't alias. */
+  if (flag_char8_t && t == char8_type_node)
+    return -1;
+
   /* The C standard guarantees that any object may be accessed via an
      lvalue that has character type.  */
   if (t == char_type_node
@@ -3953,6 +3970,7 @@ c_get_ident (const char *id)
 void
 c_common_nodes_and_builtins (void)
 {
+  int char8_type_size;
   int char16_type_size;
   int char32_type_size;
   int wchar_type_size;
@@ -4244,6 +4262,22 @@ c_common_nodes_and_builtins (void)
   wchar_array_type_node
     = build_array_type (wchar_type_node, array_domain_type);
 
+  /* Define 'char8_t'.  */
+  char8_type_node = get_identifier (CHAR8_TYPE);
+  char8_type_node = TREE_TYPE (identifier_global_value (char8_type_node));
+  char8_type_size = TYPE_PRECISION (char8_type_node);
+  if (c_dialect_cxx ())
+    {
+      char8_type_node = make_unsigned_type (char8_type_size);
+
+      if (flag_char8_t)
+        record_builtin_type (RID_CHAR8, "char8_t", char8_type_node);
+    }
+
+  /* This is for UTF-8 string constants.  */
+  char8_array_type_node
+    = build_array_type (char8_type_node, array_domain_type);
+
   /* Define 'char16_t'.  */
   char16_type_node = get_identifier (CHAR16_TYPE);
   char16_type_node = TREE_TYPE (identifier_global_value (char16_type_node));
@@ -5041,6 +5075,8 @@ c_stddef_cpp_builtins(void)
   builtin_define_with_value ("__WINT_TYPE__", WINT_TYPE, 0);
   builtin_define_with_value ("__INTMAX_TYPE__", INTMAX_TYPE, 0);
   builtin_define_with_value ("__UINTMAX_TYPE__", UINTMAX_TYPE, 0);
+  if (flag_char8_t)
+    builtin_define_with_value ("__CHAR8_TYPE__", CHAR8_TYPE, 0);
   builtin_define_with_value ("__CHAR16_TYPE__", CHAR16_TYPE, 0);
   builtin_define_with_value ("__CHAR32_TYPE__", CHAR32_TYPE, 0);
   if (SIG_ATOMIC_TYPE)
@@ -7717,6 +7753,7 @@ keyword_begins_type_specifier (enum rid keyword)
     case RID_ACCUM:
     case RID_BOOL:
     case RID_WCHAR:
+    case RID_CHAR8:
     case RID_CHAR16:
     case RID_CHAR32:
     case RID_SAT:
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 641fe57d671..56992b63c0b 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -179,6 +179,9 @@ enum rid
   /* C++11 */
   RID_CONSTEXPR, RID_DECLTYPE, RID_NOEXCEPT, RID_NULLPTR, RID_STATIC_ASSERT,
 
+  /* char8_t */
+  RID_CHAR8,
+
   /* C++ concepts */
   RID_CONCEPT, RID_REQUIRES,
 
@@ -286,6 +289,7 @@ extern GTY ((length ("(int) RID_MAX"))) tree *ridpointers;
 
 enum c_tree_index
 {
+    CTI_CHAR8_TYPE,
     CTI_CHAR16_TYPE,
     CTI_CHAR32_TYPE,
     CTI_WCHAR_TYPE,
@@ -329,6 +333,7 @@ enum c_tree_index
     CTI_UINTPTR_TYPE,
 
     CTI_CHAR_ARRAY_TYPE,
+    CTI_CHAR8_ARRAY_TYPE,
     CTI_CHAR16_ARRAY_TYPE,
     CTI_CHAR32_ARRAY_TYPE,
     CTI_WCHAR_ARRAY_TYPE,
@@ -408,20 +413,22 @@ extern machine_mode c_default_pointer_mode;
    mask) is _true_.  Thus for keywords which are present in all
    languages the disable field is zero.  */
 
-#define D_CONLY		0x001	/* C only (not in C++).  */
-#define D_CXXONLY	0x002	/* C++ only (not in C).  */
-#define D_C99		0x004	/* In C, C99 only.  */
-#define D_CXX11         0x008	/* In C++, C++11 only.  */
-#define D_EXT		0x010	/* GCC extension.  */
-#define D_EXT89		0x020	/* GCC extension incorporated in C99.  */
-#define D_ASM		0x040	/* Disabled by -fno-asm.  */
-#define D_OBJC		0x080	/* In Objective C and neither C nor C++.  */
-#define D_CXX_OBJC	0x100	/* In Objective C, and C++, but not C.  */
-#define D_CXXWARN	0x200	/* In C warn with -Wcxx-compat.  */
-#define D_CXX_CONCEPTS  0x400   /* In C++, only with concepts. */
-#define D_TRANSMEM	0X800   /* C++ transactional memory TS.  */
+#define D_CONLY		0x0001	/* C only (not in C++).  */
+#define D_CXXONLY	0x0002	/* C++ only (not in C).  */
+#define D_C99		0x0004	/* In C, C99 only.  */
+#define D_CXX11         0x0008	/* In C++, C++11 only.  */
+#define D_EXT		0x0010	/* GCC extension.  */
+#define D_EXT89		0x0020	/* GCC extension incorporated in C99.  */
+#define D_ASM		0x0040	/* Disabled by -fno-asm.  */
+#define D_OBJC		0x0080	/* In Objective C and neither C nor C++.  */
+#define D_CXX_OBJC	0x0100	/* In Objective C, and C++, but not C.  */
+#define D_CXXWARN	0x0200	/* In C warn with -Wcxx-compat.  */
+#define D_CXX_CONCEPTS  0x0400	/* In C++, only with concepts.  */
+#define D_TRANSMEM	0X0800	/* C++ transactional memory TS.  */
+#define D_CXX_CHAR8_T	0X1000	/* In C++, only with -fchar8_t.  */
 
 #define D_CXX_CONCEPTS_FLAGS D_CXXONLY | D_CXX_CONCEPTS
+#define D_CXX_CHAR8_T_FLAGS D_CXXONLY | D_CXX_CHAR8_T
 
 /* The reserved keyword table.  */
 extern const struct c_common_resword c_common_reswords[];
@@ -429,6 +436,7 @@ extern const struct c_common_resword c_common_reswords[];
 /* The number of items in the reserved keyword table.  */
 extern const unsigned int num_c_common_reswords;
 
+#define char8_type_node			c_global_trees[CTI_CHAR8_TYPE]
 #define char16_type_node		c_global_trees[CTI_CHAR16_TYPE]
 #define char32_type_node		c_global_trees[CTI_CHAR32_TYPE]
 #define wchar_type_node			c_global_trees[CTI_WCHAR_TYPE]
@@ -474,6 +482,7 @@ extern const unsigned int num_c_common_reswords;
 #define truthvalue_false_node		c_global_trees[CTI_TRUTHVALUE_FALSE]
 
 #define char_array_type_node		c_global_trees[CTI_CHAR_ARRAY_TYPE]
+#define char8_array_type_node		c_global_trees[CTI_CHAR8_ARRAY_TYPE]
 #define char16_array_type_node		c_global_trees[CTI_CHAR16_ARRAY_TYPE]
 #define char32_array_type_node		c_global_trees[CTI_CHAR32_ARRAY_TYPE]
 #define wchar_array_type_node		c_global_trees[CTI_WCHAR_ARRAY_TYPE]
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 96a6b4dfd2b..49399aed06b 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -702,6 +702,11 @@ cpp_atomic_builtins (cpp_reader *pfile)
 			(have_swap[SWAP_INDEX (boolean_type_node)]? 2 : 1));
   builtin_define_with_int_value ("__GCC_ATOMIC_CHAR_LOCK_FREE", 
 			(have_swap[SWAP_INDEX (signed_char_type_node)]? 2 : 1));
+  if (flag_char8_t)
+    {
+      builtin_define_with_int_value ("__GCC_ATOMIC_CHAR8_T_LOCK_FREE",
+			(have_swap[SWAP_INDEX (char8_type_node)]? 2 : 1));
+    }
   builtin_define_with_int_value ("__GCC_ATOMIC_CHAR16_T_LOCK_FREE", 
 			(have_swap[SWAP_INDEX (char16_type_node)]? 2 : 1));
   builtin_define_with_int_value ("__GCC_ATOMIC_CHAR32_T_LOCK_FREE", 
@@ -993,6 +998,8 @@ c_cpp_builtins (cpp_reader *pfile)
 	cpp_define (pfile, "__cpp_template_template_args=201611");
       if (flag_threadsafe_statics)
 	cpp_define (pfile, "__cpp_threadsafe_static_init=200806");
+      if (flag_char8_t)
+        cpp_define (pfile, "__cpp_char8_t=201803");
     }
   /* Note that we define this for C as well, so that we know if
      __attribute__((cleanup)) will interface with EH.  */
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index 28a820a2a3d..f7cf79ee350 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -1279,9 +1279,14 @@ lex_string (const cpp_token *tok, tree *valp, bool objc_string, bool translate)
     {
     default:
     case CPP_STRING:
-    case CPP_UTF8STRING:
       TREE_TYPE (value) = char_array_type_node;
       break;
+    case CPP_UTF8STRING:
+      if (flag_char8_t)
+        TREE_TYPE (value) = char8_array_type_node;
+      else
+        TREE_TYPE (value) = char_array_type_node;
+      break;
     case CPP_STRING16:
       TREE_TYPE (value) = char16_array_type_node;
       break;
@@ -1321,7 +1326,12 @@ lex_charconst (const cpp_token *token)
   else if (token->type == CPP_CHAR16)
     type = char16_type_node;
   else if (token->type == CPP_UTF8CHAR)
-    type = char_type_node;
+    {
+      if (flag_char8_t)
+        type = char8_type_node;
+      else
+        type = char_type_node;
+    }
   /* In C, a character constant has type 'int'.
      In C++ 'char', but multi-char charconsts have type 'int'.  */
   else if (!c_dialect_cxx () || chars_seen > 1)
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 6f88a1013d6..306f58efb2d 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1291,6 +1291,11 @@ fcanonical-system-headers
 C ObjC C++ ObjC++
 Where shorter, use canonicalized paths to systems headers.
 
+fchar8_t
+C++ ObjC++ Var(flag_char8_t)
+Enable the char8_t fundamental type and use it as the type for UTF-8 string
+and character literals.
+
 fcheck-pointer-bounds
 C ObjC C++ ObjC++ LTO Deprecated
 Deprecated in GCC 9.  This switch has no effect.
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 9d09b8d65fd..66d363439a4 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -3608,6 +3608,7 @@ char_type_p (tree type)
   return (type == char_type_node
 	  || type == unsigned_char_type_node
 	  || type == signed_char_type_node
+	  || (flag_char8_t && type == char8_type_node)
 	  || type == char16_type_node
 	  || type == char32_type_node);
 }
@@ -7440,10 +7441,11 @@ digest_init (location_t init_loc, tree type, tree init, tree origtype,
 			 || typ1 == signed_char_type_node
 			 || typ1 == unsigned_char_type_node);
       bool wchar_array = !!comptypes (typ1, wchar_type_node);
+      bool char8_array = (flag_char8_t && !!comptypes (typ1, char8_type_node));
       bool char16_array = !!comptypes (typ1, char16_type_node);
       bool char32_array = !!comptypes (typ1, char32_type_node);
 
-      if (char_array || wchar_array || char16_array || char32_array)
+      if (char_array || wchar_array || char8_array || char16_array || char32_array)
 	{
 	  struct c_expr expr;
 	  tree typ2 = TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (inside_init)));
diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index 315b0d6a65a..b1cc1d40fd5 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -1863,6 +1863,7 @@ type_promotes_to (tree type)
      wider.  Scoped enums don't promote, but pretend they do for backward
      ABI bug compatibility wrt varargs.  */
   else if (TREE_CODE (type) == ENUMERAL_TYPE
+	   || (flag_char8_t && type == char8_type_node)
 	   || type == char16_type_node
 	   || type == char32_type_node
 	   || type == wchar_type_node)
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 5ebfaaf85e6..d95512eb02a 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10714,6 +10714,7 @@ grokdeclarator (const cp_declarator *declarator,
 	  error_at (&richloc, "%<long%> and %<short%> specified together");
 	}
       else if (TREE_CODE (type) != INTEGER_TYPE
+	       || (flag_char8_t && type == char8_type_node)
 	       || type == char16_type_node || type == char32_type_node
 	       || ((long_p || short_p)
 		   && (explicit_char || explicit_intN)))
diff --git a/gcc/cp/lex.c b/gcc/cp/lex.c
index 47b99c3c469..c679eb73cdd 100644
--- a/gcc/cp/lex.c
+++ b/gcc/cp/lex.c
@@ -229,6 +229,8 @@ init_reswords (void)
     mask |= D_CXX_CONCEPTS;
   if (!flag_tm)
     mask |= D_TRANSMEM;
+  if (!flag_char8_t)
+    mask |= D_CXX_CHAR8_T;
   if (flag_no_asm)
     mask |= D_ASM | D_EXT;
   if (flag_no_gnu_keywords)
diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 59a3111fba2..134b792d35b 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -2527,10 +2527,12 @@ write_builtin_type (tree type)
       break;
 
     case INTEGER_TYPE:
-      /* TYPE may still be wchar_t, char16_t, or char32_t, since that
+      /* TYPE may still be wchar_t, char8_t, char16_t, or char32_t, since that
 	 isn't in integer_type_nodes.  */
       if (type == wchar_type_node)
 	write_char ('w');
+      else if (flag_char8_t && type == char8_type_node)
+	write_string ("Du");
       else if (type == char16_type_node)
 	write_string ("Ds");
       else if (type == char32_type_node)
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index ebe326eb923..888ecc5cd34 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -944,6 +944,7 @@ cp_keyword_starts_decl_specifier_p (enum rid keyword)
     case RID_TYPENAME:
       /* Simple type specifiers.  */
     case RID_CHAR:
+    case RID_CHAR8:
     case RID_CHAR16:
     case RID_CHAR32:
     case RID_WCHAR:
@@ -4183,9 +4184,14 @@ cp_parser_string_literal (cp_parser *parser, bool translate, bool wide_ok,
 	{
 	default:
 	case CPP_STRING:
-	case CPP_UTF8STRING:
 	  TREE_TYPE (value) = char_array_type_node;
 	  break;
+	case CPP_UTF8STRING:
+	  if (flag_char8_t)
+	    TREE_TYPE (value) = char8_array_type_node;
+	  else
+	    TREE_TYPE (value) = char_array_type_node;
+	  break;
 	case CPP_STRING16:
 	  TREE_TYPE (value) = char16_array_type_node;
 	  break;
@@ -17064,6 +17070,9 @@ cp_parser_simple_type_specifier (cp_parser* parser,
 	decl_specs->explicit_char_p = true;
       type = char_type_node;
       break;
+    case RID_CHAR8:
+      type = char8_type_node;
+      break;
     case RID_CHAR16:
       type = char16_type_node;
       break;
@@ -28275,14 +28284,15 @@ cp_parser_set_decl_spec_type (cp_decl_specifier_seq *decl_specs,
 {
   decl_specs->any_specifiers_p = true;
 
-  /* If the user tries to redeclare bool, char16_t, char32_t, or wchar_t
-     (with, for example, in "typedef int wchar_t;") we remember that
+  /* If the user tries to redeclare bool, char8_t, char16_t, char32_t, or
+     wchar_t (with, for example, in "typedef int wchar_t;") we remember that
      this is what happened.  In system headers, we ignore these
      declarations so that G++ can work with system headers that are not
      C++-safe.  */
   if (decl_spec_seq_has_spec_p (decl_specs, ds_typedef)
       && !type_definition_p
       && (type_spec == boolean_type_node
+	  || (flag_char8_t && type_spec == char8_type_node)
 	  || type_spec == char16_type_node
 	  || type_spec == char32_type_node
 	  || type_spec == wchar_type_node)
diff --git a/gcc/cp/rtti.c b/gcc/cp/rtti.c
index a0629e19360..987183f14f0 100644
--- a/gcc/cp/rtti.c
+++ b/gcc/cp/rtti.c
@@ -1539,7 +1539,7 @@ emit_support_tinfos (void)
   {
     &void_type_node,
     &boolean_type_node,
-    &wchar_type_node, &char16_type_node, &char32_type_node,
+    &wchar_type_node, &char8_type_node, &char16_type_node, &char32_type_node,
     &char_type_node, &signed_char_type_node, &unsigned_char_type_node,
     &short_integer_type_node, &short_unsigned_type_node,
     &integer_type_node, &unsigned_type_node,
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 251c344f181..4b5ba0b54e4 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -5036,6 +5036,7 @@ char_type_p (tree type)
   return (same_type_p (type, char_type_node)
 	  || same_type_p (type, unsigned_char_type_node)
 	  || same_type_p (type, signed_char_type_node)
+	  || (flag_char8_t && same_type_p (type, char8_type_node))
 	  || same_type_p (type, char16_type_node)
 	  || same_type_p (type, char32_type_node)
 	  || same_type_p (type, wchar_type_node));
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index c921096cb31..7c50bc78304 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -2206,6 +2206,7 @@ string_conv_p (const_tree totype, const_tree exp, int warn)
 
   t = TREE_TYPE (totype);
   if (!same_type_p (t, char_type_node)
+      && !(flag_char8_t && same_type_p (t, char8_type_node))
       && !same_type_p (t, char16_type_node)
       && !same_type_p (t, char32_type_node)
       && !same_type_p (t, wchar_type_node))
@@ -10206,6 +10207,7 @@ check_literal_operator_args (const_tree decl,
 	      t = TYPE_MAIN_VARIANT (t);
 	      if ((maybe_raw_p = same_type_p (t, char_type_node))
 		  || same_type_p (t, wchar_type_node)
+		  || (flag_char8_t && same_type_p (t, char8_type_node))
 		  || same_type_p (t, char16_type_node)
 		  || same_type_p (t, char32_type_node))
 		{
@@ -10238,6 +10240,8 @@ check_literal_operator_args (const_tree decl,
 	    max_arity = 1;
 	  else if (same_type_p (t, wchar_type_node))
 	    max_arity = 1;
+	  else if (flag_char8_t && same_type_p (t, char8_type_node))
+	    max_arity = 1;
 	  else if (same_type_p (t, char16_type_node))
 	    max_arity = 1;
 	  else if (same_type_p (t, char32_type_node))
diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index fec1db00ca4..e80dd4bba1e 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -1061,7 +1061,8 @@ digest_init_r (tree type, tree init, int nested, int flags,
 	{
 	  tree char_type = TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (init)));
 
-	  if (TYPE_PRECISION (typ1) == BITS_PER_UNIT)
+	  if (TYPE_PRECISION (typ1) == BITS_PER_UNIT
+	      && (typ1 == char_type_node || !flag_char8_t))
 	    {
 	      if (char_type != char_type_node)
 		{
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 9035b333be8..fc90b5fae79 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -583,6 +583,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    affect C++ name mangling because in C++ these are distinct types
    not typedefs.  */
 
+#ifndef CHAR8_TYPE
+#define CHAR8_TYPE "unsigned char"
+#endif
+
 #ifdef UINT_LEAST16_TYPE
 #define CHAR16_TYPE UINT_LEAST16_TYPE
 #else
diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index 3f2a097e7f2..a45b041c400 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -2355,9 +2355,10 @@ cplus_demangle_builtin_types[D_BUILTIN_TYPE_COUNT] =
   /* 27 */ { NL ("decimal64"),	NL ("decimal64"),	D_PRINT_DEFAULT },
   /* 28 */ { NL ("decimal128"),	NL ("decimal128"),	D_PRINT_DEFAULT },
   /* 29 */ { NL ("half"),	NL ("half"),		D_PRINT_FLOAT },
-  /* 30 */ { NL ("char16_t"),	NL ("char16_t"),	D_PRINT_DEFAULT },
-  /* 31 */ { NL ("char32_t"),	NL ("char32_t"),	D_PRINT_DEFAULT },
-  /* 32 */ { NL ("decltype(nullptr)"),	NL ("decltype(nullptr)"),
+  /* 30 */ { NL ("char8_t"),	NL ("char8_t"),		D_PRINT_DEFAULT },
+  /* 31 */ { NL ("char16_t"),	NL ("char16_t"),	D_PRINT_DEFAULT },
+  /* 32 */ { NL ("char32_t"),	NL ("char32_t"),	D_PRINT_DEFAULT },
+  /* 33 */ { NL ("decltype(nullptr)"),	NL ("decltype(nullptr)"),
 	     D_PRINT_DEFAULT },
 };
 
@@ -2645,14 +2646,19 @@ cplus_demangle_type (struct d_info *di)
 	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[29]);
 	  di->expansion += ret->u.s_builtin.type->len;
 	  break;
+	case 'u':
+	  /* char8_t */
+	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[30]);
+	  di->expansion += ret->u.s_builtin.type->len;
+	  break;
 	case 's':
 	  /* char16_t */
-	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[30]);
+	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[31]);
 	  di->expansion += ret->u.s_builtin.type->len;
 	  break;
 	case 'i':
 	  /* char32_t */
-	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[31]);
+	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[32]);
 	  di->expansion += ret->u.s_builtin.type->len;
 	  break;
 
@@ -2678,7 +2684,7 @@ cplus_demangle_type (struct d_info *di)
 
         case 'n':
           /* decltype(nullptr) */
-	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[32]);
+	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[33]);
 	  di->expansion += ret->u.s_builtin.type->len;
 	  break;
 
diff --git a/libiberty/cp-demangle.h b/libiberty/cp-demangle.h
index 51b8a243e0e..d4405127645 100644
--- a/libiberty/cp-demangle.h
+++ b/libiberty/cp-demangle.h
@@ -173,7 +173,7 @@ d_advance (struct d_info *di, int i)
 extern const struct demangle_operator_info cplus_demangle_operators[];
 #endif
 
-#define D_BUILTIN_TYPE_COUNT (33)
+#define D_BUILTIN_TYPE_COUNT (34)
 
 CP_STATIC_IF_GLIBCPP_V3
 const struct demangle_builtin_type_info

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/9]: C++ P0482R5 char8_t: Core language support
  2018-11-05 19:40 [PATCH 2/9]: C++ P0482R5 char8_t: Core language support Tom Honermann
@ 2018-12-03 21:51 ` Jason Merrill
  2018-12-03 22:01   ` Jason Merrill
  2018-12-24  2:27 ` [REVISED PATCH " Tom Honermann
  1 sibling, 1 reply; 14+ messages in thread
From: Jason Merrill @ 2018-12-03 21:51 UTC
  To: Tom Honermann, gcc-patches

On 11/5/18 2:39 PM, Tom Honermann wrote:
> This patch adds support for the P0482R5 core language changes.  This 
> includes:
> - The -fchar8_t and -fno-char8_t command line options.
> - char8_t as a keyword.
> - The char8_t builtin type as a non-aliasing unsigned integral
>    character type of size 1.
> - Use of char8_t as a simple type specifier.
> - u8 character literals with type char8_t.
> - u8 string literals with type array of const char8_t.
> - User defined literal operators that accept char8_t and char8_t pointer
>    types.
> - New __cpp_char8_t predefined feature test macro.
> - New __CHAR8_TYPE__ and __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined
>    macros.
> - Name mangling and demangling for char8_t (using Du).
> 
> gcc/ChangeLog:
> 
> 2018-11-04  Tom Honermann  <tom@honermann.net>
> 
>       * defaults.h: Define CHAR8_TYPE.
> 
> gcc/c-family/ChangeLog:
> 
> 2018-11-04  Tom Honermann  <tom@honermann.net>
>       * c-family/c-common.c (c_common_reswords): Add char8_t.
>       (fix_string_type): Use char8_t for the type of u8 string literals.
>       (c_common_get_alias_set): char8_t doesn't alias.
>       (c_common_nodes_and_builtins): Define char8_t as a builtin type in
>       C++.
>       (c_stddef_cpp_builtins): Add __CHAR8_TYPE__.
>       (keyword_begins_type_specifier): Add RID_CHAR8.
>       * gcc/c-family/c-common.h (rid): Add RID_CHAR8.
>       (c_tree_index): Add CTI_CHAR8_TYPE and CTI_CHAR8_ARRAY_TYPE.
>       Define D_CXX_CHAR8_T and D_CXX_CHAR8_T_FLAGS.
>       Define char8_type_node and char8_array_type_node.
>       * c-family/c-cppbuiltin.c (cpp_atomic_builtins): Predefine
>       __GCC_ATOMIC_CHAR8_T_LOCK_FREE.
>       (c_cpp_builtins): Predefine __cpp_char8_t.
>       * c-family/c-lex.c (lex_string): Use char8_array_type_node as the
>       type of CPP_UTF8STRING.
>       (lex_charconst): Use char8_type_node as the type of CPP_UTF8CHAR.
>       * c-family/c.opt: Add the -fchar8_t command line option.
> 
> gcc/c/ChangeLog:
> 
> 2018-11-04  Tom Honermann  <tom@honermann.net>
> 
>       * c/c-typeck.c (char_type_p): Add char8_type_node.
>       (digest_init): Handle initialization by a u8 string literal of
>       char8_t type.
> 
> gcc/cp/ChangeLog:
> 
> 2018-11-04  Tom Honermann  <tom@honermann.net>
> 
>       * cp/cvt.c (type_promotes_to): Handle char8_t promotion.
>       * cp/decl.c (grokdeclarator): Handle invalid type specifier
>       combinations involving char8_t.
>       * cp/lex.c (init_reswords): Add char8_t as a reserved word.
>       * cp/mangle.c (write_builtin_type): Add name mangling for char8_t
>       (Du).
>       * cp/parser.c (cp_keyword_starts_decl_specifier_p,
>       cp_parser_simple_type_specifier): Recognize char8_t as a simple
>       type specifier.
>       (cp_parser_string_literal): Use char8_array_type_node for the type
>       of CPP_UTF8STRING.
>       (cp_parser_set_decl_spec_type): Tolerate char8_t typedefs in system
>       headers.
>       * cp/rtti.c (emit_support_tinfos): type_info support for char8_t.
>       * cp/tree.c (char_type_p): Recognize char8_t as a character type.
>       * cp/typeck.c (string_conv_p): Handle conversions of u8 string
>       literals of char8_t type.
>       (check_literal_operator_args): Handle UDLs with u8 string literals
>       of char8_t type.
>       * cp/typeck2.c (digest_init_r): Disallow initializing a char array
>       with a u8 string literal.
> 
> libiberty/ChangeLog:
> 
> 2018-10-31  Tom Honermann  <tom@honermann.net>
>       * cp-demangle.c (cplus_demangle_builtin_types,
>       cplus_demangle_type): Add name demangling for char8_t (Du).
>       * cp-demangle.h: Increase D_BUILTIN_TYPE_COUNT to accommodate the
>       new char8_t type.

> @@ -3543,6 +3556,10 @@ c_common_get_alias_set (tree t)
>    if (!TYPE_P (t))
>      return -1;

> +  /* Unlike char, char8_t doesn't alias. */
> +  if (flag_char8_t && t == char8_type_node)
> +    return -1;

This seems unnecessary; doesn't the existing code have the same effect? 
I think we could do with just an adjustment to the existing comment.

> +  else if (flag_char8_t && TREE_TYPE (value) == char8_array_type_node)
> +	  || (flag_char8_t && type == char8_type_node)
> +      bool char8_array = (flag_char8_t && !!comptypes (typ1, char8_type_node));
> +	   || (flag_char8_t && type == char8_type_node)

In many places you check the flag and then for one of the char8 types. 
Since the types won't be used without the flag, checking the flag seems 
redundant?
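
One way to see why the keyword itself cannot appear without the flag is the
reserved-word mask from the init_reswords hunk.  The sketch below is a
simplified model only: the flag values are copied from the c-common.h hunk in
this patch, but toy_disable_mask and toy_is_keyword are hypothetical helpers,
and the real init_reswords folds in several other conditions.  It also models
only the keyword path; char8_type_node is still created regardless of the
flag.

```cpp
// Flag values as defined in the c-common.h hunk of this patch.
constexpr unsigned D_CXXONLY     = 0x0002;
constexpr unsigned D_CXXWARN     = 0x0200;
constexpr unsigned D_CXX_CHAR8_T = 0x1000;

// Disable bits attached to the "char8_t" entry in c_common_reswords.
constexpr unsigned char8_keyword_flags =
    D_CXXONLY | D_CXX_CHAR8_T | D_CXXWARN;

// Model of the mask built in init_reswords: D_CXX_CHAR8_T is added
// only when -fchar8_t is off.  Other mask bits are omitted here.
constexpr unsigned toy_disable_mask(bool flag_char8_t)
{
  return flag_char8_t ? 0u : D_CXX_CHAR8_T;
}

// A reserved word is active when none of its disable bits are masked.
constexpr bool toy_is_keyword(unsigned disable, unsigned mask)
{
  return (disable & mask) == 0;
}
```

In this model char8_t is a keyword with the flag on and an ordinary
identifier with it off.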

> -	  if (TYPE_PRECISION (typ1) == BITS_PER_UNIT)
> +	  if (TYPE_PRECISION (typ1) == BITS_PER_UNIT
> +	      && (typ1 == char_type_node || !flag_char8_t))

This looks wrong, or at least incomplete; we want to complain about 
mismatched types here even with -fchar8_t.  Perhaps we should replace 
all of this if/else with simply comparing typ1 and char_type, and 
complaining if they're different.  Talking about wide and non-wide isn't 
as useful as the actual types would be.
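
A minimal sketch of the suggested shape, comparing the two element types
directly and naming them in the diagnostic, might look like the following.
Types are modelled as plain strings, toy_check_string_init is a hypothetical
helper, and the message wording is invented; this is not digest_init_r.  It
also ignores that C++ lets signed char and unsigned char arrays be
initialized from ordinary string literals, which a real change would need to
keep working.

```cpp
#include <string>

// Returns an empty string when the initialization is accepted, or a
// diagnostic that names both element types when they differ.
inline std::string toy_check_string_init(const std::string &array_elem,
                                         const std::string &literal_elem)
{
  if (array_elem == literal_elem)
    return "";
  return "cannot initialize array of '" + array_elem +
         "' from a string literal with type array of '" +
         literal_elem + "'";
}
```

In this model, initializing a char array from a u8 literal under -fchar8_t
produces a message that mentions both char and char8_t rather than "wide" or
"non-wide".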

Jason

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/9]: C++ P0482R5 char8_t: Core language support
  2018-12-03 21:51 ` Jason Merrill
@ 2018-12-03 22:01   ` Jason Merrill
  2018-12-05  7:10     ` Tom Honermann
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Merrill @ 2018-12-03 22:01 UTC
  To: Tom Honermann, gcc-patches

On 12/3/18 4:51 PM, Jason Merrill wrote:
> On 11/5/18 2:39 PM, Tom Honermann wrote:
>> This patch adds support for the P0482R5 core language changes.  This 
>> includes:
>> - The -fchar8_t and -fno-char8_t command line options.
>> - char8_t as a keyword.
>> - The char8_t builtin type as a non-aliasing unsigned integral
>>    character type of size 1.
>> - Use of char8_t as a simple type specifier.
>> - u8 character literals with type char8_t.
>> - u8 string literals with type array of const char8_t.
>> - User defined literal operators that accept char8_t and char8_t pointer
>>    types.
>> - New __cpp_char8_t predefined feature test macro.
>> - New __CHAR8_TYPE__ and __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined
>>    macros.
>> - Name mangling and demangling for char8_t (using Du).
>>
>> gcc/ChangeLog:
>>
>> 2018-11-04  Tom Honermann  <tom@honermann.net>
>>
>>       * defaults.h: Define CHAR8_TYPE.
>>
>> gcc/c-family/ChangeLog:
>>
>> 2018-11-04  Tom Honermann  <tom@honermann.net>
>>       * c-family/c-common.c (c_common_reswords): Add char8_t.
>>       (fix_string_type): Use char8_t for the type of u8 string literals.
>>       (c_common_get_alias_set): char8_t doesn't alias.
>>       (c_common_nodes_and_builtins): Define char8_t as a builtin type in
>>       C++.
>>       (c_stddef_cpp_builtins): Add __CHAR8_TYPE__.
>>       (keyword_begins_type_specifier): Add RID_CHAR8.
>>       * gcc/c-family/c-common.h (rid): Add RID_CHAR8.
>>       (c_tree_index): Add CTI_CHAR8_TYPE and CTI_CHAR8_ARRAY_TYPE.
>>       Define D_CXX_CHAR8_T and D_CXX_CHAR8_T_FLAGS.
>>       Define char8_type_node and char8_array_type_node.
>>       * c-family/c-cppbuiltin.c (cpp_atomic_builtins): Predefine
>>       __GCC_ATOMIC_CHAR8_T_LOCK_FREE.
>>       (c_cpp_builtins): Predefine __cpp_char8_t.
>>       * c-family/c-lex.c (lex_string): Use char8_array_type_node as the
>>       type of CPP_UTF8STRING.
>>       (lex_charconst): Use char8_type_node as the type of CPP_UTF8CHAR.
>>       * c-family/c.opt: Add the -fchar8_t command line option.
>>
>> gcc/c/ChangeLog:
>>
>> 2018-11-04  Tom Honermann  <tom@honermann.net>
>>
>>       * c/c-typeck.c (char_type_p): Add char8_type_node.
>>       (digest_init): Handle initialization by a u8 string literal of
>>       char8_t type.
>>
>> gcc/cp/ChangeLog:
>>
>> 2018-11-04  Tom Honermann  <tom@honermann.net>
>>
>>       * cp/cvt.c (type_promotes_to): Handle char8_t promotion.
>>       * cp/decl.c (grokdeclarator): Handle invalid type specifier
>>       combinations involving char8_t.
>>       * cp/lex.c (init_reswords): Add char8_t as a reserved word.
>>       * cp/mangle.c (write_builtin_type): Add name mangling for char8_t
>>       (Du).
>>       * cp/parser.c (cp_keyword_starts_decl_specifier_p,
>>       cp_parser_simple_type_specifier): Recognize char8_t as a simple
>>       type specifier.
>>       (cp_parser_string_literal): Use char8_array_type_node for the type
>>       of CPP_UTF8STRING.
>>       (cp_parser_set_decl_spec_type): Tolerate char8_t typedefs in system
>>       headers.
>>       * cp/rtti.c (emit_support_tinfos): type_info support for char8_t.
>>       * cp/tree.c (char_type_p): Recognize char8_t as a character type.
>>       * cp/typeck.c (string_conv_p): Handle conversions of u8 string
>>       literals of char8_t type.
>>       (check_literal_operator_args): Handle UDLs with u8 string literals
>>       of char8_t type.
>>       * cp/typeck2.c (digest_init_r): Disallow initializing a char array
>>       with a u8 string literal.
>>
>> libiberty/ChangeLog:
>>
>> 2018-10-31  Tom Honermann  <tom@honermann.net>
>>       * cp-demangle.c (cplus_demangle_builtin_types,
>>       cplus_demangle_type): Add name demangling for char8_t (Du).
>>       * cp-demangle.h: Increase D_BUILTIN_TYPE_COUNT to accommodate the
>>       new char8_t type.
> 
>> @@ -3543,6 +3556,10 @@ c_common_get_alias_set (tree t)
>>    if (!TYPE_P (t))
>>      return -1;
> 
>> +  /* Unlike char, char8_t doesn't alias. */
>> +  if (flag_char8_t && t == char8_type_node)
>> +    return -1;
> 
> This seems unnecessary; doesn't the existing code have the same effect? 
> I think we could do with just an adjustment to the existing comment.
> 
>> +  else if (flag_char8_t && TREE_TYPE (value) == char8_array_type_node)
>> +      || (flag_char8_t && type == char8_type_node)
>> +      bool char8_array = (flag_char8_t && !!comptypes (typ1, 
>> char8_type_node));
>> +       || (flag_char8_t && type == char8_type_node
> In many places you check the flag and then for one of the char8 types. 
> Since the types won't be used without the flag, checking the flag seems 
> redundant?
> 
>> -      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT)
>> +      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT
>> +          && (typ1 == char_type_node || !flag_char8_t))
> 
> This looks wrong, or at least incomplete; we want to complain about 
> mismatched types here even with -fchar8_t.  Perhaps we should replace 
> all of this if/else with simply comparing typ1 and char_type, and 
> complaining if they're different.  Talking about wide and non-wide isn't 
> as useful as the actual types would be.

Well, I suppose it isn't quite that simple, since we still need to treat 
the ordinary character types as interchangeable.
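
To make the interchangeability concrete, here is a minimal sketch (the array names and the helper check_ordinary_arrays are made up for illustration and are not taken from the patch or the testsuite). An ordinary string literal can initialize an array of any of the three ordinary character types, and that presumably has to keep working however the diagnostic ends up being restructured:

```cpp
#include <cstring>

// Hypothetical test-style snippet; all names are illustrative only.
// An ordinary (narrow) string literal can initialize arrays of char,
// signed char, and unsigned char alike.
char          plain_arr[]    = "abc";
signed char   signed_arr[]   = "abc";
unsigned char unsigned_arr[] = "abc";

// Returns true if all three arrays hold the same four bytes
// ("abc" plus the terminating NUL).
bool check_ordinary_arrays ()
{
  return sizeof plain_arr == 4
         && sizeof signed_arr == 4
         && sizeof unsigned_arr == 4
         && std::memcmp (plain_arr, signed_arr, 4) == 0
         && std::memcmp (plain_arr, unsigned_arr, 4) == 0;
}
```

A check that only accepts char_type_node would be at risk of rejecting the second and third declarations.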

Jason


* Re: [PATCH 2/9]: C++ P0482R5 char8_t: Core language support
  2018-12-03 22:01   ` Jason Merrill
@ 2018-12-05  7:10     ` Tom Honermann
  2018-12-05 16:16       ` Jason Merrill
  0 siblings, 1 reply; 14+ messages in thread
From: Tom Honermann @ 2018-12-05  7:10 UTC (permalink / raw)
  To: Jason Merrill, gcc-patches

On 12/3/18 5:01 PM, Jason Merrill wrote:
> On 12/3/18 4:51 PM, Jason Merrill wrote:
>> On 11/5/18 2:39 PM, Tom Honermann wrote:
>>
>>> @@ -3543,6 +3556,10 @@ c_common_get_alias_set (tree t)
>>>    if (!TYPE_P (t))
>>>      return -1;
>>
>>> +  /* Unlike char, char8_t doesn't alias. */
>>> +  if (flag_char8_t && t == char8_type_node)
>>> +    return -1;
>>
>> This seems unnecessary; doesn't the existing code have the same 
>> effect? I think we could do with just an adjustment to the existing 
>> comment.
I'm not sure.  I had concerns about unintended matching due to char8_t 
having an underlying type of unsigned char.
>>
>>> +  else if (flag_char8_t && TREE_TYPE (value) == char8_array_type_node)
>>> +      || (flag_char8_t && type == char8_type_node)
>>> +      bool char8_array = (flag_char8_t && !!comptypes (typ1, 
>>> char8_type_node));
>>> +       || (flag_char8_t && type == char8_type_node
>> In many places you check the flag and then for one of the char8 
>> types. Since the types won't be used without the flag, checking the 
>> flag seems redundant?

This was again protection against unintended matching of the underlying 
unsigned char type, particularly when compiling as C. char8_type_node is 
constructed (in c_common_nodes_and_builtins) following the pattern in 
place for char16_t and char32_t with the following code:

+  char8_type_node = get_identifier (CHAR8_TYPE);
+  char8_type_node = TREE_TYPE (identifier_global_value (char8_type_node));
+  char8_type_size = TYPE_PRECISION (char8_type_node);
+  if (c_dialect_cxx ())
+    {
+      char8_type_node = make_unsigned_type (char8_type_size);
+
+      if (flag_char8_t)
+        record_builtin_type (RID_CHAR8, "char8_t", char8_type_node);
+    }

My knowledge of GCC internals is weak, but I understand this to be, 
effectively, defining a type alias (of unsigned char) and then, if 
compiling as C++, redefining it as a strong unsigned type.

I don't recall the details now, but at one point, I was missing a check 
for flag_char8_t in some location and I encountered some test failures 
as a result.

>>> -      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT)
>>> +      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT
>>> +          && (typ1 == char_type_node || !flag_char8_t))
>>
>> This looks wrong, or at least incomplete; we want to complain about 
>> mismatched types here even with -fchar8_t.  Perhaps we should replace 
>> all of this if/else with simply comparing typ1 and char_type, and 
>> complaining if they're different.  Talking about wide and non-wide 
>> isn't as useful as the actual types would be.
>
> Well, I suppose it isn't quite that simple, since we still need to 
> treat the ordinary character types as interchangeable.

We do need to complain about mismatched types, and the test 
gcc/testsuite/g++.dg/ext/char8_t-init-2.C was added to ensure that happens.

Tom.

>
> Jason



* Re: [PATCH 2/9]: C++ P0482R5 char8_t: Core language support
  2018-12-05  7:10     ` Tom Honermann
@ 2018-12-05 16:16       ` Jason Merrill
  2018-12-17 21:02         ` Jason Merrill
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Merrill @ 2018-12-05 16:16 UTC (permalink / raw)
  To: Tom Honermann, gcc-patches

On 12/5/18 2:09 AM, Tom Honermann wrote:
> On 12/3/18 5:01 PM, Jason Merrill wrote:
>> On 12/3/18 4:51 PM, Jason Merrill wrote:
>>> On 11/5/18 2:39 PM, Tom Honermann wrote:
>>>
>>>> @@ -3543,6 +3556,10 @@ c_common_get_alias_set (tree t)
>>>>    if (!TYPE_P (t))
>>>>      return -1;
>>>
>>>> +  /* Unlike char, char8_t doesn't alias. */
>>>> +  if (flag_char8_t && t == char8_type_node)
>>>> +    return -1;
>>>
>>> This seems unnecessary; doesn't the existing code have the same 
>>> effect? I think we could do with just an adjustment to the existing 
>>> comment.
> I'm not sure.  I had concerns about unintended matching due to char8_t 
> having an underlying type of unsigned char.

That shouldn't be a problem: if char8_t is a distinct type, it won't 
match unsigned char, and if it's the same as unsigned char, flag_char8_t 
will be false.

>>>> +  else if (flag_char8_t && TREE_TYPE (value) == char8_array_type_node)
>>>> +      || (flag_char8_t && type == char8_type_node)
>>>> +      bool char8_array = (flag_char8_t && !!comptypes (typ1, 
>>>> char8_type_node));
>>>> +       || (flag_char8_t && type == char8_type_node
>>> In many places you check the flag and then for one of the char8 
>>> types. Since the types won't be used without the flag, checking the 
>>> flag seems redundant?
> 
> This was again protection against unintended matching of the underlying 
> unsigned char type, particularly when compiling as C. char8_type_node is 
> constructed (in c_common_nodes_and_builtins) following the pattern in 
> place for char16_t and char32_t with the following code:
> 
> +  char8_type_node = get_identifier (CHAR8_TYPE);
> +  char8_type_node = TREE_TYPE (identifier_global_value (char8_type_node));
> +  char8_type_size = TYPE_PRECISION (char8_type_node);
> +  if (c_dialect_cxx ())
> +    {
> +      char8_type_node = make_unsigned_type (char8_type_size);
> +
> +      if (flag_char8_t)
> +        record_builtin_type (RID_CHAR8, "char8_t", char8_type_node);
> +    }
> 
> My knowledge of gcc internals is weak, but I understand this to be, 
> effectively, defining a type alias (of unsigned char) and then, if 
> compiling as C++, re-defining it as a strong unsigned type.
> 
> I don't recall the details now, but at one point, I was missing a check 
> for flag_char8_t in some location and I encountered some test failures 
> as a result.

Since char8_type_node is always a distinct type in C++, we shouldn't 
need these checks in the C++ front end.  And since it's never a distinct 
type in C, the C front end (c/*) doesn't need to change at all.

>>>> -      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT)
>>>> +      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT
>>>> +          && (typ1 == char_type_node || !flag_char8_t))
>>>
>>> This looks wrong, or at least incomplete; we want to complain about 
>>> mismatched types here even with -fchar8_t.  Perhaps we should replace 
>>> all of this if/else with simply comparing typ1 and char_type, and 
>>> complaining if they're different.  Talking about wide and non-wide 
>>> isn't as useful as the actual types would be.
>>
>> Well, I suppose it isn't quite that simple, since we still need to 
>> treat the ordinary character types as interchangeable.
> 
> We do need to complain about mismatched types and test 
> gcc/testsuite/g++.dg/ext/char8_t-init-2.C was added to ensure that happens.

You don't seem to test initializing an array of signed/unsigned char, 
which I think will be broken by the change only considering char_type_node.

I think we want to specifically allow conversion from array of one 
ordinary character type to another, and otherwise complain about the 
types being different with a message like "cannot initialize array of 
%qT from array of %qT" rather than mess with terms like int-array and 
(non-)wide string.
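
The rule described above can be sketched as a small self-contained model. This is a toy, not GCC code: the elt_kind enum and the functions ordinary_char_p, string_init_ok, kind_name, and init_diagnostic are hypothetical stand-ins for the tree-based logic in digest_init_r, and the quoted type names stand in for what %qT would print.

```cpp
#include <string>

// Hypothetical stand-in for the element type of an array or literal.
enum elt_kind { K_CHAR, K_SCHAR, K_UCHAR, K_CHAR8, K_CHAR16, K_CHAR32, K_WCHAR };

// The three ordinary character types are interchangeable for
// string-literal initialization.
bool ordinary_char_p (elt_kind k)
{
  return k == K_CHAR || k == K_SCHAR || k == K_UCHAR;
}

// An array of TO may be initialized from a string literal with elements
// of FROM if the types match or both are ordinary character types.
bool string_init_ok (elt_kind to, elt_kind from)
{
  return to == from || (ordinary_char_p (to) && ordinary_char_p (from));
}

// Printable name for a kind, standing in for %qT.
const char *kind_name (elt_kind k)
{
  switch (k)
    {
    case K_CHAR:   return "char";
    case K_SCHAR:  return "signed char";
    case K_UCHAR:  return "unsigned char";
    case K_CHAR8:  return "char8_t";
    case K_CHAR16: return "char16_t";
    case K_CHAR32: return "char32_t";
    case K_WCHAR:  return "wchar_t";
    }
  return "?";
}

// Returns an empty string when the initialization is accepted, and the
// proposed diagnostic text otherwise.
std::string init_diagnostic (elt_kind to, elt_kind from)
{
  if (string_init_ok (to, from))
    return "";
  return std::string ("cannot initialize array of '") + kind_name (to)
         + "' from array of '" + kind_name (from) + "'";
}
```

In this model, init_diagnostic (K_UCHAR, K_CHAR) is empty, while init_diagnostic (K_CHAR, K_CHAR8) names both types instead of talking about wide or non-wide strings.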

Jason


* Re: [PATCH 2/9]: C++ P0482R5 char8_t: Core language support
  2018-12-05 16:16       ` Jason Merrill
@ 2018-12-17 21:02         ` Jason Merrill
  2018-12-17 21:47           ` Tom Honermann
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Merrill @ 2018-12-17 21:02 UTC (permalink / raw)
  To: Tom Honermann, gcc-patches

On 12/5/18 11:16 AM, Jason Merrill wrote:
> On 12/5/18 2:09 AM, Tom Honermann wrote:
>> On 12/3/18 5:01 PM, Jason Merrill wrote:
>>> On 12/3/18 4:51 PM, Jason Merrill wrote:
>>>> On 11/5/18 2:39 PM, Tom Honermann wrote:
>>>>
>>>>> @@ -3543,6 +3556,10 @@ c_common_get_alias_set (tree t)
>>>>>    if (!TYPE_P (t))
>>>>>      return -1;
>>>>
>>>>> +  /* Unlike char, char8_t doesn't alias. */
>>>>> +  if (flag_char8_t && t == char8_type_node)
>>>>> +    return -1;
>>>>
>>>> This seems unnecessary; doesn't the existing code have the same 
>>>> effect? I think we could do with just an adjustment to the existing 
>>>> comment.
>> I'm not sure.  I had concerns about unintended matching due to char8_t 
>> having an underlying type of unsigned char.
> 
> That shouldn't be a problem: if char8_t is a distinct type, it won't 
> match unsigned char, and if it's the same as unsigned char, flag_char8_t 
> will be false.
> 
>>>>> +  else if (flag_char8_t && TREE_TYPE (value) == 
>>>>> char8_array_type_node)
>>>>> +      || (flag_char8_t && type == char8_type_node)
>>>>> +      bool char8_array = (flag_char8_t && !!comptypes (typ1, 
>>>>> char8_type_node));
>>>>> +       || (flag_char8_t && type == char8_type_node
>>>> In many places you check the flag and then for one of the char8 
>>>> types. Since the types won't be used without the flag, checking the 
>>>> flag seems redundant?
>>
>> This was again protection against unintended matching of the 
>> underlying unsigned char type, particularly when compiling as C. 
>> char8_type_node is constructed (in c_common_nodes_and_builtins) 
>> following the pattern in place for char16_t and char32_t with the 
>> following code:
>>
>> +  char8_type_node = get_identifier (CHAR8_TYPE);
>> +  char8_type_node = TREE_TYPE (identifier_global_value 
>> (char8_type_node));
>> +  char8_type_size = TYPE_PRECISION (char8_type_node);
>> +  if (c_dialect_cxx ())
>> +    {
>> +      char8_type_node = make_unsigned_type (char8_type_size);
>> +
>> +      if (flag_char8_t)
>> +        record_builtin_type (RID_CHAR8, "char8_t", char8_type_node);
>> +    }
>>
>> My knowledge of gcc internals is weak, but I understand this to be, 
>> effectively, defining a type alias (of unsigned char) and then, if 
>> compiling as C++, re-defining it as a strong unsigned type.
>>
>> I don't recall the details now, but at one point, I was missing a 
>> check for flag_char8_t in some location and I encountered some test 
>> failures as a result.
> 
> Since char8_type_node is always a distinct type in C++, we shouldn't 
> need these checks in the C++ front end.  And since it's never a distinct 
> type in C, the C front end (c/*) doesn't need to change at all.
> 
>>>>> -      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT)
>>>>> +      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT
>>>>> +          && (typ1 == char_type_node || !flag_char8_t))
>>>>
>>>> This looks wrong, or at least incomplete; we want to complain about 
>>>> mismatched types here even with -fchar8_t.  Perhaps we should 
>>>> replace all of this if/else with simply comparing typ1 and 
>>>> char_type, and complaining if they're different.  Talking about wide 
>>>> and non-wide isn't as useful as the actual types would be.
>>>
>>> Well, I suppose it isn't quite that simple, since we still need to 
>>> treat the ordinary character types as interchangeable.
>>
>> We do need to complain about mismatched types and test 
>> gcc/testsuite/g++.dg/ext/char8_t-init-2.C was added to ensure that 
>> happens.
> 
> You don't seem to test initializing an array of signed/unsigned char, 
> which I think will be broken by the change only considering char_type_node.
> 
> I think we want to specifically allow conversion from array of one 
> ordinary character type to another, and otherwise complain about the 
> types being different with a message like "cannot initialize array of 
> %qT from array of %qT" rather than mess with terms like int-array and 
> (non-)wide string.

Ping.  Will you have a chance to update the patch soon, or would you 
like me to make these last changes myself?

Jason


* Re: [PATCH 2/9]: C++ P0482R5 char8_t: Core language support
  2018-12-17 21:02         ` Jason Merrill
@ 2018-12-17 21:47           ` Tom Honermann
  2018-12-24  2:32             ` Tom Honermann
  0 siblings, 1 reply; 14+ messages in thread
From: Tom Honermann @ 2018-12-17 21:47 UTC (permalink / raw)
  To: Jason Merrill, gcc-patches

On 12/17/18 4:02 PM, Jason Merrill wrote:
> On 12/5/18 11:16 AM, Jason Merrill wrote:
>> On 12/5/18 2:09 AM, Tom Honermann wrote:
>>> On 12/3/18 5:01 PM, Jason Merrill wrote:
>>>> On 12/3/18 4:51 PM, Jason Merrill wrote:
>>>>> On 11/5/18 2:39 PM, Tom Honermann wrote:
>>>>>
>>>>>> @@ -3543,6 +3556,10 @@ c_common_get_alias_set (tree t)
>>>>>>    if (!TYPE_P (t))
>>>>>>      return -1;
>>>>>
>>>>>> +  /* Unlike char, char8_t doesn't alias. */
>>>>>> +  if (flag_char8_t && t == char8_type_node)
>>>>>> +    return -1;
>>>>>
>>>>> This seems unnecessary; doesn't the existing code have the same 
>>>>> effect? I think we could do with just an adjustment to the 
>>>>> existing comment.
>>> I'm not sure.  I had concerns about unintended matching due to 
>>> char8_t having an underlying type of unsigned char.
>>
>> That shouldn't be a problem: if char8_t is a distinct type, it won't 
>> match unsigned char, and if it's the same as unsigned char, 
>> flag_char8_t will be false.
>>
>>>>>> +  else if (flag_char8_t && TREE_TYPE (value) == 
>>>>>> char8_array_type_node)
>>>>>> +      || (flag_char8_t && type == char8_type_node)
>>>>>> +      bool char8_array = (flag_char8_t && !!comptypes (typ1, 
>>>>>> char8_type_node));
>>>>>> +       || (flag_char8_t && type == char8_type_node
>>>>> In many places you check the flag and then for one of the char8 
>>>>> types. Since the types won't be used without the flag, checking 
>>>>> the flag seems redundant?
>>>
>>> This was again protection against unintended matching of the 
>>> underlying unsigned char type, particularly when compiling as C. 
>>> char8_type_node is constructed (in c_common_nodes_and_builtins) 
>>> following the pattern in place for char16_t and char32_t with the 
>>> following code:
>>>
>>> +  char8_type_node = get_identifier (CHAR8_TYPE);
>>> +  char8_type_node = TREE_TYPE (identifier_global_value 
>>> (char8_type_node));
>>> +  char8_type_size = TYPE_PRECISION (char8_type_node);
>>> +  if (c_dialect_cxx ())
>>> +    {
>>> +      char8_type_node = make_unsigned_type (char8_type_size);
>>> +
>>> +      if (flag_char8_t)
>>> +        record_builtin_type (RID_CHAR8, "char8_t", char8_type_node);
>>> +    }
>>>
>>> My knowledge of gcc internals is weak, but I understand this to be, 
>>> effectively, defining a type alias (of unsigned char) and then, if 
>>> compiling as C++, re-defining it as a strong unsigned type.
>>>
>>> I don't recall the details now, but at one point, I was missing a 
>>> check for flag_char8_t in some location and I encountered some test 
>>> failures as a result.
>>
>> Since char8_type_node is always a distinct type in C++, we shouldn't 
>> need these checks in the C++ front end.  And since it's never a 
>> distinct type in C, the C front end (c/*) doesn't need to change at all.
>>
>>>>>> -      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT)
>>>>>> +      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT
>>>>>> +          && (typ1 == char_type_node || !flag_char8_t))
>>>>>
>>>>> This looks wrong, or at least incomplete; we want to complain 
>>>>> about mismatched types here even with -fchar8_t. Perhaps we should 
>>>>> replace all of this if/else with simply comparing typ1 and 
>>>>> char_type, and complaining if they're different.  Talking about 
>>>>> wide and non-wide isn't as useful as the actual types would be.
>>>>
>>>> Well, I suppose it isn't quite that simple, since we still need to 
>>>> treat the ordinary character types as interchangeable.
>>>
>>> We do need to complain about mismatched types and test 
>>> gcc/testsuite/g++.dg/ext/char8_t-init-2.C was added to ensure that 
>>> happens.
>>
>> You don't seem to test initializing an array of signed/unsigned char, 
>> which I think will be broken by the change only considering 
>> char_type_node.
>>
>> I think we want to specifically allow conversion from array of one 
>> ordinary character type to another, and otherwise complain about the 
>> types being different with a message like "cannot initialize array of 
>> %qT from array of %qT" rather than mess with terms like int-array and 
>> (non-)wide string.
>
> Ping.  Will you have a chance to update the patch soon, or would you 
> like me to make these last changes myself?

Hi, Jason.  Updating the patch has remained high on my to-do list, but 
I've been under water elsewhere.  I hope to get to it within the next 
week or so.  If you have time available to help complete the changes, 
that would be great.

Tom.

>
> Jason



* Re: [REVISED PATCH 2/9]: C++ P0482R5 char8_t: Core language support
  2018-11-05 19:40 [PATCH 2/9]: C++ P0482R5 char8_t: Core language support Tom Honermann
  2018-12-03 21:51 ` Jason Merrill
@ 2018-12-24  2:27 ` Tom Honermann
  2019-01-14 19:59   ` Jason Merrill
  1 sibling, 1 reply; 14+ messages in thread
From: Tom Honermann @ 2018-12-24  2:27 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3646 bytes --]

Attached is a revised patch that addresses changes in P0482R6 as well as 
feedback provided by Jason.  Changes from the prior patch include:
- Updated the value of the __cpp_char8_t feature test macro to 201811
   per P0482R6.
- Enable char8_t support with -std=c++2a per adoption of P0482R6 in
   San Diego.
- Reverted the unnecessary changes to gcc/c/c-typeck.c as requested
   by Jason.
- Removed unnecessary checks of 'flag_char8_t' within the C++ front
   end as requested by Jason.
- Corrected the regression spotted by Jason regarding initialization of
   signed char and unsigned char arrays with string literals.
- Made minor changes to the error message emitted for ill-formed
   initialization of char arrays with UTF-8 string literals.  These
   changes do not yet implement Jason's suggestion; I'll follow up with a
   separate patch for that due to additional test impact.

Tested on x86_64-linux.
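
As a rough illustration of the user-visible effect of the items above (a
sketch only, not part of the patch or its testsuite), the following checks
the code unit type of a u8 string literal in whichever mode it is compiled.
The names u8_code_unit, char8_enabled, and is_u8_code_unit are made up for
this example; only __cpp_char8_t and char8_t come from the patch.

```cpp
#include <type_traits>

// Code unit type of a u8 string literal in the current compilation mode.
using u8_code_unit =
    std::remove_cv<std::remove_reference<decltype(u8"x"[0])>::type>::type;

#if defined(__cpp_char8_t)
// char8_t is enabled (for example via -fchar8_t or -std=c++2a).
constexpr bool char8_enabled = true;
static_assert(std::is_same<u8_code_unit, char8_t>::value,
              "u8 string literals are arrays of const char8_t");
static_assert(!std::is_same<char8_t, unsigned char>::value,
              "char8_t is a distinct type, not a typedef of unsigned char");
static_assert(!std::is_same<char8_t, char>::value,
              "char8_t is distinct from char as well");
static_assert(std::is_unsigned<char8_t>::value && sizeof(char8_t) == 1,
              "char8_t is an unsigned character type of size 1");
#else
// char8_t is disabled: u8 string literals keep their pre-P0482 type.
constexpr bool char8_enabled = false;
static_assert(std::is_same<u8_code_unit, char>::value,
              "u8 string literals are arrays of const char");
#endif

// Hypothetical helper: true when T is the type u8 literals are made of.
template <typename T>
constexpr bool is_u8_code_unit()
{
  return std::is_same<T, u8_code_unit>::value;
}
```

Guarding on __cpp_char8_t rather than on the language standard keeps the
snippet valid with -fchar8_t in older modes and with -fno-char8_t in C++2a
mode.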

gcc/ChangeLog:

2018-11-04  Tom Honermann  <tom@honermann.net>

      * defaults.h: Define CHAR8_TYPE.

gcc/c-family/ChangeLog:

2018-11-04  Tom Honermann  <tom@honermann.net>

      * c-family/c-common.c (c_common_reswords): Add char8_t.
      (fix_string_type): Use char8_t for the type of u8 string literals.
      (c_common_get_alias_set): char8_t doesn't alias.
      (c_common_nodes_and_builtins): Define char8_t as a builtin type in
      C++.
      (c_stddef_cpp_builtins): Add __CHAR8_TYPE__.
      (keyword_begins_type_specifier): Add RID_CHAR8.
      * c-family/c-common.h (rid): Add RID_CHAR8.
      (c_tree_index): Add CTI_CHAR8_TYPE and CTI_CHAR8_ARRAY_TYPE.
      Define D_CXX_CHAR8_T and D_CXX_CHAR8_T_FLAGS.
      Define char8_type_node and char8_array_type_node.
      * c-family/c-cppbuiltin.c (cpp_atomic_builtins): Predefine
      __GCC_ATOMIC_CHAR8_T_LOCK_FREE.
      (c_cpp_builtins): Predefine __cpp_char8_t.
      * c-family/c-lex.c (lex_string): Use char8_array_type_node as the
      type of CPP_UTF8STRING.
      (lex_charconst): Use char8_type_node as the type of CPP_UTF8CHAR.
      * c-family/c-opts.c: If not otherwise specified, enable -fchar8_t
      when targeting C++2a.
      * c-family/c.opt: Add the -fchar8_t command line option.

gcc/cp/ChangeLog:

2018-11-04  Tom Honermann  <tom@honermann.net>

      * cp/cvt.c (type_promotes_to): Handle char8_t promotion.
      * cp/decl.c (grokdeclarator): Handle invalid type specifier
      combinations involving char8_t.
      * cp/lex.c (init_reswords): Add char8_t as a reserved word.
      * cp/mangle.c (write_builtin_type): Add name mangling for char8_t
      (Du).
      * cp/parser.c (cp_keyword_starts_decl_specifier_p,
      cp_parser_simple_type_specifier): Recognize char8_t as a simple
      type specifier.
      (cp_parser_string_literal): Use char8_array_type_node for the type
      of CPP_UTF8STRING.
      (cp_parser_set_decl_spec_type): Tolerate char8_t typedefs in system
      headers.
      * cp/rtti.c (emit_support_tinfos): type_info support for char8_t.
      * cp/tree.c (char_type_p): Recognize char8_t as a character type.
      * cp/typeck.c (string_conv_p): Handle conversions of u8 string
      literals of char8_t type.
      (check_literal_operator_args): Handle UDLs with u8 string literals
      of char8_t type.
      * cp/typeck2.c (digest_init_r): Disallow initializing a char array
      with a u8 string literal.

libiberty/ChangeLog:

2018-10-31  Tom Honermann  <tom@honermann.net>

      * cp-demangle.c (cplus_demangle_builtin_types,
      cplus_demangle_type): Add name demangling for char8_t (Du).
      * cp-demangle.h: Increase D_BUILTIN_TYPE_COUNT to accommodate the
      new char8_t type.

Tom.



[-- Attachment #2: p0482r5-2-2.patch --]
[-- Type: text/x-patch, Size: 21355 bytes --]

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index f10cf89c3a7..b387daca137 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -79,6 +79,7 @@ machine_mode c_default_pointer_mode = VOIDmode;
 	tree signed_char_type_node;
 	tree wchar_type_node;
 
+	tree char8_type_node;
 	tree char16_type_node;
 	tree char32_type_node;
 
@@ -128,6 +129,11 @@ machine_mode c_default_pointer_mode = VOIDmode;
 
 	tree wchar_array_type_node;
 
+   Type `char8_t[SOMENUMBER]' or something like it.
+   Used when a UTF-8 string literal is created.
+
+	tree char8_array_type_node;
+
    Type `char16_t[SOMENUMBER]' or something like it.
    Used when a UTF-16 string literal is created.
 
@@ -450,6 +456,7 @@ const struct c_common_resword c_common_reswords[] =
   { "case",		RID_CASE,	0 },
   { "catch",		RID_CATCH,	D_CXX_OBJC | D_CXXWARN },
   { "char",		RID_CHAR,	0 },
+  { "char8_t",		RID_CHAR8,	D_CXX_CHAR8_T_FLAGS | D_CXXWARN },
   { "char16_t",		RID_CHAR16,	D_CXXONLY | D_CXX11 | D_CXXWARN },
   { "char32_t",		RID_CHAR32,	D_CXXONLY | D_CXX11 | D_CXXWARN },
   { "class",		RID_CLASS,	D_CXX_OBJC | D_CXXWARN },
@@ -746,6 +753,11 @@ fix_string_type (tree value)
       nchars = length;
       e_type = char_type_node;
     }
+  else if (flag_char8_t && TREE_TYPE (value) == char8_array_type_node)
+    {
+      nchars = length / (TYPE_PRECISION (char8_type_node) / BITS_PER_UNIT);
+      e_type = char8_type_node;
+    }
   else if (TREE_TYPE (value) == char16_array_type_node)
     {
       nchars = length / (TYPE_PRECISION (char16_type_node) / BITS_PER_UNIT);
@@ -813,7 +825,8 @@ fix_string_type (tree value)
    CPP_STRING16, or CPP_STRING32.  Return CPP_OTHER in case of error.
    This may not be exactly the string token type that initially created
    the string, since CPP_WSTRING is indistinguishable from the 16/32 bit
-   string type at this point.
+   string type, and CPP_UTF8STRING is indistinguishable from CPP_STRING
+   at this point.
 
    This effectively reverses part of the logic in lex_string and
    fix_string_type.  */
@@ -3543,8 +3556,12 @@ c_common_get_alias_set (tree t)
   if (!TYPE_P (t))
     return -1;
 
+  /* Unlike char, char8_t doesn't alias. */
+  if (flag_char8_t && t == char8_type_node)
+    return -1;
+
   /* The C standard guarantees that any object may be accessed via an
-     lvalue that has character type.  */
+     lvalue that has narrow character type (except char8_t).  */
   if (t == char_type_node
       || t == signed_char_type_node
       || t == unsigned_char_type_node)
@@ -3953,6 +3970,7 @@ c_get_ident (const char *id)
 void
 c_common_nodes_and_builtins (void)
 {
+  int char8_type_size;
   int char16_type_size;
   int char32_type_size;
   int wchar_type_size;
@@ -4244,6 +4262,22 @@ c_common_nodes_and_builtins (void)
   wchar_array_type_node
     = build_array_type (wchar_type_node, array_domain_type);
 
+  /* Define 'char8_t'.  */
+  char8_type_node = get_identifier (CHAR8_TYPE);
+  char8_type_node = TREE_TYPE (identifier_global_value (char8_type_node));
+  char8_type_size = TYPE_PRECISION (char8_type_node);
+  if (c_dialect_cxx ())
+    {
+      char8_type_node = make_unsigned_type (char8_type_size);
+
+      if (flag_char8_t)
+        record_builtin_type (RID_CHAR8, "char8_t", char8_type_node);
+    }
+
+  /* This is for UTF-8 string constants.  */
+  char8_array_type_node
+    = build_array_type (char8_type_node, array_domain_type);
+
   /* Define 'char16_t'.  */
   char16_type_node = get_identifier (CHAR16_TYPE);
   char16_type_node = TREE_TYPE (identifier_global_value (char16_type_node));
@@ -5041,6 +5075,8 @@ c_stddef_cpp_builtins(void)
   builtin_define_with_value ("__WINT_TYPE__", WINT_TYPE, 0);
   builtin_define_with_value ("__INTMAX_TYPE__", INTMAX_TYPE, 0);
   builtin_define_with_value ("__UINTMAX_TYPE__", UINTMAX_TYPE, 0);
+  if (flag_char8_t)
+    builtin_define_with_value ("__CHAR8_TYPE__", CHAR8_TYPE, 0);
   builtin_define_with_value ("__CHAR16_TYPE__", CHAR16_TYPE, 0);
   builtin_define_with_value ("__CHAR32_TYPE__", CHAR32_TYPE, 0);
   if (SIG_ATOMIC_TYPE)
@@ -7717,6 +7753,7 @@ keyword_begins_type_specifier (enum rid keyword)
     case RID_ACCUM:
     case RID_BOOL:
     case RID_WCHAR:
+    case RID_CHAR8:
     case RID_CHAR16:
     case RID_CHAR32:
     case RID_SAT:
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 641fe57d671..56992b63c0b 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -179,6 +179,9 @@ enum rid
   /* C++11 */
   RID_CONSTEXPR, RID_DECLTYPE, RID_NOEXCEPT, RID_NULLPTR, RID_STATIC_ASSERT,
 
+  /* char8_t */
+  RID_CHAR8,
+
   /* C++ concepts */
   RID_CONCEPT, RID_REQUIRES,
 
@@ -286,6 +289,7 @@ extern GTY ((length ("(int) RID_MAX"))) tree *ridpointers;
 
 enum c_tree_index
 {
+    CTI_CHAR8_TYPE,
     CTI_CHAR16_TYPE,
     CTI_CHAR32_TYPE,
     CTI_WCHAR_TYPE,
@@ -329,6 +333,7 @@ enum c_tree_index
     CTI_UINTPTR_TYPE,
 
     CTI_CHAR_ARRAY_TYPE,
+    CTI_CHAR8_ARRAY_TYPE,
     CTI_CHAR16_ARRAY_TYPE,
     CTI_CHAR32_ARRAY_TYPE,
     CTI_WCHAR_ARRAY_TYPE,
@@ -408,20 +413,22 @@ extern machine_mode c_default_pointer_mode;
    mask) is _true_.  Thus for keywords which are present in all
    languages the disable field is zero.  */
 
-#define D_CONLY		0x001	/* C only (not in C++).  */
-#define D_CXXONLY	0x002	/* C++ only (not in C).  */
-#define D_C99		0x004	/* In C, C99 only.  */
-#define D_CXX11         0x008	/* In C++, C++11 only.  */
-#define D_EXT		0x010	/* GCC extension.  */
-#define D_EXT89		0x020	/* GCC extension incorporated in C99.  */
-#define D_ASM		0x040	/* Disabled by -fno-asm.  */
-#define D_OBJC		0x080	/* In Objective C and neither C nor C++.  */
-#define D_CXX_OBJC	0x100	/* In Objective C, and C++, but not C.  */
-#define D_CXXWARN	0x200	/* In C warn with -Wcxx-compat.  */
-#define D_CXX_CONCEPTS  0x400   /* In C++, only with concepts. */
-#define D_TRANSMEM	0X800   /* C++ transactional memory TS.  */
+#define D_CONLY		0x0001	/* C only (not in C++).  */
+#define D_CXXONLY	0x0002	/* C++ only (not in C).  */
+#define D_C99		0x0004	/* In C, C99 only.  */
+#define D_CXX11         0x0008	/* In C++, C++11 only.  */
+#define D_EXT		0x0010	/* GCC extension.  */
+#define D_EXT89		0x0020	/* GCC extension incorporated in C99.  */
+#define D_ASM		0x0040	/* Disabled by -fno-asm.  */
+#define D_OBJC		0x0080	/* In Objective C and neither C nor C++.  */
+#define D_CXX_OBJC	0x0100	/* In Objective C, and C++, but not C.  */
+#define D_CXXWARN	0x0200	/* In C warn with -Wcxx-compat.  */
+#define D_CXX_CONCEPTS  0x0400	/* In C++, only with concepts.  */
+#define D_TRANSMEM	0X0800	/* C++ transactional memory TS.  */
+#define D_CXX_CHAR8_T	0X1000	/* In C++, only with -fchar8_t.  */
 
 #define D_CXX_CONCEPTS_FLAGS D_CXXONLY | D_CXX_CONCEPTS
+#define D_CXX_CHAR8_T_FLAGS D_CXXONLY | D_CXX_CHAR8_T
 
 /* The reserved keyword table.  */
 extern const struct c_common_resword c_common_reswords[];
@@ -429,6 +436,7 @@ extern const struct c_common_resword c_common_reswords[];
 /* The number of items in the reserved keyword table.  */
 extern const unsigned int num_c_common_reswords;
 
+#define char8_type_node			c_global_trees[CTI_CHAR8_TYPE]
 #define char16_type_node		c_global_trees[CTI_CHAR16_TYPE]
 #define char32_type_node		c_global_trees[CTI_CHAR32_TYPE]
 #define wchar_type_node			c_global_trees[CTI_WCHAR_TYPE]
@@ -474,6 +482,7 @@ extern const unsigned int num_c_common_reswords;
 #define truthvalue_false_node		c_global_trees[CTI_TRUTHVALUE_FALSE]
 
 #define char_array_type_node		c_global_trees[CTI_CHAR_ARRAY_TYPE]
+#define char8_array_type_node		c_global_trees[CTI_CHAR8_ARRAY_TYPE]
 #define char16_array_type_node		c_global_trees[CTI_CHAR16_ARRAY_TYPE]
 #define char32_array_type_node		c_global_trees[CTI_CHAR32_ARRAY_TYPE]
 #define wchar_array_type_node		c_global_trees[CTI_WCHAR_ARRAY_TYPE]
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 96a6b4dfd2b..1287b55d3b3 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -702,6 +702,11 @@ cpp_atomic_builtins (cpp_reader *pfile)
 			(have_swap[SWAP_INDEX (boolean_type_node)]? 2 : 1));
   builtin_define_with_int_value ("__GCC_ATOMIC_CHAR_LOCK_FREE", 
 			(have_swap[SWAP_INDEX (signed_char_type_node)]? 2 : 1));
+  if (flag_char8_t)
+    {
+      builtin_define_with_int_value ("__GCC_ATOMIC_CHAR8_T_LOCK_FREE",
+			(have_swap[SWAP_INDEX (char8_type_node)]? 2 : 1));
+    }
   builtin_define_with_int_value ("__GCC_ATOMIC_CHAR16_T_LOCK_FREE", 
 			(have_swap[SWAP_INDEX (char16_type_node)]? 2 : 1));
   builtin_define_with_int_value ("__GCC_ATOMIC_CHAR32_T_LOCK_FREE", 
@@ -993,6 +998,8 @@ c_cpp_builtins (cpp_reader *pfile)
 	cpp_define (pfile, "__cpp_template_template_args=201611");
       if (flag_threadsafe_statics)
 	cpp_define (pfile, "__cpp_threadsafe_static_init=200806");
+      if (flag_char8_t)
+        cpp_define (pfile, "__cpp_char8_t=201811");
     }
   /* Note that we define this for C as well, so that we know if
      __attribute__((cleanup)) will interface with EH.  */
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index 28a820a2a3d..f7cf79ee350 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -1279,9 +1279,14 @@ lex_string (const cpp_token *tok, tree *valp, bool objc_string, bool translate)
     {
     default:
     case CPP_STRING:
-    case CPP_UTF8STRING:
       TREE_TYPE (value) = char_array_type_node;
       break;
+    case CPP_UTF8STRING:
+      if (flag_char8_t)
+        TREE_TYPE (value) = char8_array_type_node;
+      else
+        TREE_TYPE (value) = char_array_type_node;
+      break;
     case CPP_STRING16:
       TREE_TYPE (value) = char16_array_type_node;
       break;
@@ -1321,7 +1326,12 @@ lex_charconst (const cpp_token *token)
   else if (token->type == CPP_CHAR16)
     type = char16_type_node;
   else if (token->type == CPP_UTF8CHAR)
-    type = char_type_node;
+    {
+      if (flag_char8_t)
+        type = char8_type_node;
+      else
+        type = char_type_node;
+    }
   /* In C, a character constant has type 'int'.
      In C++ 'char', but multi-char charconsts have type 'int'.  */
   else if (!c_dialect_cxx () || chars_seen > 1)
diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 9cf1900fb9a..507bf122e3d 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -995,6 +995,10 @@ c_common_post_options (const char **pfilename)
   if (flag_sized_deallocation == -1)
     flag_sized_deallocation = (cxx_dialect >= cxx14);
 
+  /* char8_t support is new in C++2A.  */
+  if (flag_char8_t == -1)
+    flag_char8_t = (cxx_dialect >= cxx2a);
+
   if (flag_extern_tls_init)
     {
       if (!TARGET_SUPPORTS_ALIASES || !SUPPORTS_WEAK)
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 6f88a1013d6..5d5f5c26ce0 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1291,6 +1291,11 @@ fcanonical-system-headers
 C ObjC C++ ObjC++
 Where shorter, use canonicalized paths to systems headers.
 
+fchar8_t
+C++ ObjC++ Var(flag_char8_t) Init(-1)
+Enable the char8_t fundamental type and use it as the type for UTF-8 string
+and character literals.
+
 fcheck-pointer-bounds
 C ObjC C++ ObjC++ LTO Deprecated
 Deprecated in GCC 9.  This switch has no effect.
diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index 315b0d6a65a..c40606f7c2c 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -1863,6 +1863,7 @@ type_promotes_to (tree type)
      wider.  Scoped enums don't promote, but pretend they do for backward
      ABI bug compatibility wrt varargs.  */
   else if (TREE_CODE (type) == ENUMERAL_TYPE
+	   || type == char8_type_node
 	   || type == char16_type_node
 	   || type == char32_type_node
 	   || type == wchar_type_node)
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 5ebfaaf85e6..f3c09ad2274 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10714,7 +10714,9 @@ grokdeclarator (const cp_declarator *declarator,
 	  error_at (&richloc, "%<long%> and %<short%> specified together");
 	}
       else if (TREE_CODE (type) != INTEGER_TYPE
-	       || type == char16_type_node || type == char32_type_node
+	       || type == char8_type_node
+	       || type == char16_type_node
+	       || type == char32_type_node
 	       || ((long_p || short_p)
 		   && (explicit_char || explicit_intN)))
 	error_at (loc, "%qs specified with %qT", key, type);
diff --git a/gcc/cp/lex.c b/gcc/cp/lex.c
index 47b99c3c469..c679eb73cdd 100644
--- a/gcc/cp/lex.c
+++ b/gcc/cp/lex.c
@@ -229,6 +229,8 @@ init_reswords (void)
     mask |= D_CXX_CONCEPTS;
   if (!flag_tm)
     mask |= D_TRANSMEM;
+  if (!flag_char8_t)
+    mask |= D_CXX_CHAR8_T;
   if (flag_no_asm)
     mask |= D_ASM | D_EXT;
   if (flag_no_gnu_keywords)
diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 59a3111fba2..2363ed87dee 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -2527,10 +2527,12 @@ write_builtin_type (tree type)
       break;
 
     case INTEGER_TYPE:
-      /* TYPE may still be wchar_t, char16_t, or char32_t, since that
+      /* TYPE may still be wchar_t, char8_t, char16_t, or char32_t, since that
 	 isn't in integer_type_nodes.  */
       if (type == wchar_type_node)
 	write_char ('w');
+      else if (type == char8_type_node)
+	write_string ("Du");
       else if (type == char16_type_node)
 	write_string ("Ds");
       else if (type == char32_type_node)
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index ebe326eb923..1e76779b84e 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -944,6 +944,7 @@ cp_keyword_starts_decl_specifier_p (enum rid keyword)
     case RID_TYPENAME:
       /* Simple type specifiers.  */
     case RID_CHAR:
+    case RID_CHAR8:
     case RID_CHAR16:
     case RID_CHAR32:
     case RID_WCHAR:
@@ -4183,9 +4184,14 @@ cp_parser_string_literal (cp_parser *parser, bool translate, bool wide_ok,
 	{
 	default:
 	case CPP_STRING:
-	case CPP_UTF8STRING:
 	  TREE_TYPE (value) = char_array_type_node;
 	  break;
+	case CPP_UTF8STRING:
+	  if (flag_char8_t)
+	    TREE_TYPE (value) = char8_array_type_node;
+	  else
+	    TREE_TYPE (value) = char_array_type_node;
+	  break;
 	case CPP_STRING16:
 	  TREE_TYPE (value) = char16_array_type_node;
 	  break;
@@ -17064,6 +17070,9 @@ cp_parser_simple_type_specifier (cp_parser* parser,
 	decl_specs->explicit_char_p = true;
       type = char_type_node;
       break;
+    case RID_CHAR8:
+      type = char8_type_node;
+      break;
     case RID_CHAR16:
       type = char16_type_node;
       break;
@@ -28275,14 +28284,15 @@ cp_parser_set_decl_spec_type (cp_decl_specifier_seq *decl_specs,
 {
   decl_specs->any_specifiers_p = true;
 
-  /* If the user tries to redeclare bool, char16_t, char32_t, or wchar_t
-     (with, for example, in "typedef int wchar_t;") we remember that
+  /* If the user tries to redeclare bool, char8_t, char16_t, char32_t, or
+     wchar_t (with, for example, in "typedef int wchar_t;") we remember that
      this is what happened.  In system headers, we ignore these
      declarations so that G++ can work with system headers that are not
      C++-safe.  */
   if (decl_spec_seq_has_spec_p (decl_specs, ds_typedef)
       && !type_definition_p
       && (type_spec == boolean_type_node
+	  || type_spec == char8_type_node
 	  || type_spec == char16_type_node
 	  || type_spec == char32_type_node
 	  || type_spec == wchar_type_node)
diff --git a/gcc/cp/rtti.c b/gcc/cp/rtti.c
index a0629e19360..987183f14f0 100644
--- a/gcc/cp/rtti.c
+++ b/gcc/cp/rtti.c
@@ -1539,7 +1539,7 @@ emit_support_tinfos (void)
   {
     &void_type_node,
     &boolean_type_node,
-    &wchar_type_node, &char16_type_node, &char32_type_node,
+    &wchar_type_node, &char8_type_node, &char16_type_node, &char32_type_node,
     &char_type_node, &signed_char_type_node, &unsigned_char_type_node,
     &short_integer_type_node, &short_unsigned_type_node,
     &integer_type_node, &unsigned_type_node,
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 251c344f181..52bd62b27b5 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -5036,6 +5036,7 @@ char_type_p (tree type)
   return (same_type_p (type, char_type_node)
 	  || same_type_p (type, unsigned_char_type_node)
 	  || same_type_p (type, signed_char_type_node)
+	  || same_type_p (type, char8_type_node)
 	  || same_type_p (type, char16_type_node)
 	  || same_type_p (type, char32_type_node)
 	  || same_type_p (type, wchar_type_node));
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index c921096cb31..8f78d8cf1f3 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -2206,6 +2206,7 @@ string_conv_p (const_tree totype, const_tree exp, int warn)
 
   t = TREE_TYPE (totype);
   if (!same_type_p (t, char_type_node)
+      && !same_type_p (t, char8_type_node)
       && !same_type_p (t, char16_type_node)
       && !same_type_p (t, char32_type_node)
       && !same_type_p (t, wchar_type_node))
@@ -10206,6 +10207,7 @@ check_literal_operator_args (const_tree decl,
 	      t = TYPE_MAIN_VARIANT (t);
 	      if ((maybe_raw_p = same_type_p (t, char_type_node))
 		  || same_type_p (t, wchar_type_node)
+		  || same_type_p (t, char8_type_node)
 		  || same_type_p (t, char16_type_node)
 		  || same_type_p (t, char32_type_node))
 		{
@@ -10238,6 +10240,8 @@ check_literal_operator_args (const_tree decl,
 	    max_arity = 1;
 	  else if (same_type_p (t, wchar_type_node))
 	    max_arity = 1;
+	  else if (same_type_p (t, char8_type_node))
+	    max_arity = 1;
 	  else if (same_type_p (t, char16_type_node))
 	    max_arity = 1;
 	  else if (same_type_p (t, char32_type_node))
diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index fec1db00ca4..782fd7f9cd5 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -1063,7 +1063,21 @@ digest_init_r (tree type, tree init, int nested, int flags,
 
 	  if (TYPE_PRECISION (typ1) == BITS_PER_UNIT)
 	    {
-	      if (char_type != char_type_node)
+	      if (typ1 != char8_type_node && char_type == char8_type_node)
+		{
+		  if (complain & tf_error)
+		    error_at (loc, "char-array initialized from UTF-8 string");
+		  return error_mark_node;
+		}
+	      else if (typ1 == char8_type_node && char_type == char_type_node)
+		{
+		  if (complain & tf_error)
+		    error_at (loc, "char8_t-array initialized from ordinary "
+			      "string");
+		  return error_mark_node;
+		}
+	      else if (char_type != char_type_node
+		       && char_type != char8_type_node)
 		{
 		  if (complain & tf_error)
 		    error_at (loc, "char-array initialized from wide string");
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 9035b333be8..fc90b5fae79 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -583,6 +583,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    affect C++ name mangling because in C++ these are distinct types
    not typedefs.  */
 
+#ifndef CHAR8_TYPE
+#define CHAR8_TYPE "unsigned char"
+#endif
+
 #ifdef UINT_LEAST16_TYPE
 #define CHAR16_TYPE UINT_LEAST16_TYPE
 #else
diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index 3f2a097e7f2..a45b041c400 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -2355,9 +2355,10 @@ cplus_demangle_builtin_types[D_BUILTIN_TYPE_COUNT] =
   /* 27 */ { NL ("decimal64"),	NL ("decimal64"),	D_PRINT_DEFAULT },
   /* 28 */ { NL ("decimal128"),	NL ("decimal128"),	D_PRINT_DEFAULT },
   /* 29 */ { NL ("half"),	NL ("half"),		D_PRINT_FLOAT },
-  /* 30 */ { NL ("char16_t"),	NL ("char16_t"),	D_PRINT_DEFAULT },
-  /* 31 */ { NL ("char32_t"),	NL ("char32_t"),	D_PRINT_DEFAULT },
-  /* 32 */ { NL ("decltype(nullptr)"),	NL ("decltype(nullptr)"),
+  /* 30 */ { NL ("char8_t"),	NL ("char8_t"),		D_PRINT_DEFAULT },
+  /* 31 */ { NL ("char16_t"),	NL ("char16_t"),	D_PRINT_DEFAULT },
+  /* 32 */ { NL ("char32_t"),	NL ("char32_t"),	D_PRINT_DEFAULT },
+  /* 33 */ { NL ("decltype(nullptr)"),	NL ("decltype(nullptr)"),
 	     D_PRINT_DEFAULT },
 };
 
@@ -2645,14 +2646,19 @@ cplus_demangle_type (struct d_info *di)
 	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[29]);
 	  di->expansion += ret->u.s_builtin.type->len;
 	  break;
+	case 'u':
+	  /* char8_t */
+	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[30]);
+	  di->expansion += ret->u.s_builtin.type->len;
+	  break;
 	case 's':
 	  /* char16_t */
-	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[30]);
+	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[31]);
 	  di->expansion += ret->u.s_builtin.type->len;
 	  break;
 	case 'i':
 	  /* char32_t */
-	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[31]);
+	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[32]);
 	  di->expansion += ret->u.s_builtin.type->len;
 	  break;
 
@@ -2678,7 +2684,7 @@ cplus_demangle_type (struct d_info *di)
 
         case 'n':
           /* decltype(nullptr) */
-	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[32]);
+	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[33]);
 	  di->expansion += ret->u.s_builtin.type->len;
 	  break;
 
diff --git a/libiberty/cp-demangle.h b/libiberty/cp-demangle.h
index 51b8a243e0e..d4405127645 100644
--- a/libiberty/cp-demangle.h
+++ b/libiberty/cp-demangle.h
@@ -173,7 +173,7 @@ d_advance (struct d_info *di, int i)
 extern const struct demangle_operator_info cplus_demangle_operators[];
 #endif
 
-#define D_BUILTIN_TYPE_COUNT (33)
+#define D_BUILTIN_TYPE_COUNT (34)
 
 CP_STATIC_IF_GLIBCPP_V3
 const struct demangle_builtin_type_info

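The way the patch gates the char8_t keyword on -fchar8_t can be modeled in
isolation.  The sketch below is a simplified stand-in, not GCC code: the
bit values are copied from the patched c-common.h, while char8_t_disable,
cxx_reswords_mask, and is_keyword are hypothetical names, and the mask
models only the D_CXX_CHAR8_T bit that init_reswords sets when
flag_char8_t is off.

```cpp
// Bit values copied from the patched gcc/c-family/c-common.h.
constexpr unsigned D_CXXONLY = 0x0002;      // C++ only (not in C).
constexpr unsigned D_CXXWARN = 0x0200;      // In C warn with -Wcxx-compat.
constexpr unsigned D_CXX_CHAR8_T = 0x1000;  // In C++, only with -fchar8_t.
constexpr unsigned D_CXX_CHAR8_T_FLAGS = D_CXXONLY | D_CXX_CHAR8_T;

// Disable bits of the "char8_t" entry added to c_common_reswords.
constexpr unsigned char8_t_disable = D_CXX_CHAR8_T_FLAGS | D_CXXWARN;

// Hypothetical reduction of the mask built in cp/lex.c (init_reswords):
// only the char8_t bit is modeled here.
constexpr unsigned cxx_reswords_mask(bool flag_char8_t)
{
  return flag_char8_t ? 0u : D_CXX_CHAR8_T;
}

// A reserved word acts as a keyword only when none of its disable bits
// are present in the mask.
constexpr bool is_keyword(unsigned disable, unsigned mask)
{
  return (disable & mask) == 0;
}
```

With the flag off, char8_t stays an ordinary identifier, so existing code
that declares its own char8_t name keeps compiling.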

* Re: [PATCH 2/9]: C++ P0482R5 char8_t: Core language support
  2018-12-17 21:47           ` Tom Honermann
@ 2018-12-24  2:32             ` Tom Honermann
  0 siblings, 0 replies; 14+ messages in thread
From: Tom Honermann @ 2018-12-24  2:32 UTC (permalink / raw)
  To: Jason Merrill, gcc-patches

Thanks, Jason.  I just sent a revised set of patches addressing most of 
your feedback with exceptions as described inline below.

On 12/17/18 4:47 PM, Tom Honermann wrote:
> On 12/17/18 4:02 PM, Jason Merrill wrote:
>> On 12/5/18 11:16 AM, Jason Merrill wrote:
>>> On 12/5/18 2:09 AM, Tom Honermann wrote:
>>>> On 12/3/18 5:01 PM, Jason Merrill wrote:
>>>>> On 12/3/18 4:51 PM, Jason Merrill wrote:
>>>>>> On 11/5/18 2:39 PM, Tom Honermann wrote:
>>>>>>> This patch adds support for the P0482R5 core language changes.  
>>>>>>> This includes:
>>>>>>> - The -fchar8_t and -fno-char8_t command line options.
>>>>>>> - char8_t as a keyword.
>>>>>>> - The char8_t builtin type as a non-aliasing unsigned integral
>>>>>>>    character type of size 1.
>>>>>>> - Use of char8_t as a simple type specifier.
>>>>>>> - u8 character literals with type char8_t.
>>>>>>> - u8 string literals with type array of const char8_t.
>>>>>>> - User defined literal operators that accept char8_t and char8_t
>>>>>>>    pointer types.
>>>>>>> - New __cpp_char8_t predefined feature test macro.
>>>>>>> - New __CHAR8_TYPE__ and __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined
>>>>>>>    macros.
>>>>>>> - Name mangling and demangling for char8_t (using Du).
>>>>>>>
>>>>>>> gcc/ChangeLog:
>>>>>>>
>>>>>>> 2018-11-04  Tom Honermann  <tom@honermann.net>
>>>>>>>
>>>>>>>       * defaults.h: Define CHAR8_TYPE.
>>>>>>>
>>>>>>> gcc/c-family/ChangeLog:
>>>>>>>
>>>>>>> 2018-11-04  Tom Honermann  <tom@honermann.net>
>>>>>>>       * c-family/c-common.c (c_common_reswords): Add char8_t.
>>>>>>>       (fix_string_type): Use char8_t for the type of u8 string 
>>>>>>> literals.
>>>>>>>       (c_common_get_alias_set): char8_t doesn't alias.
>>>>>>>       (c_common_nodes_and_builtins): Define char8_t as a builtin 
>>>>>>> type in
>>>>>>>       C++.
>>>>>>>       (c_stddef_cpp_builtins): Add __CHAR8_TYPE__.
>>>>>>>       (keyword_begins_type_specifier): Add RID_CHAR8.
>>>>>>>       * gcc/c-family/c-common.h (rid): Add RID_CHAR8.
>>>>>>>       (c_tree_index): Add CTI_CHAR8_TYPE and CTI_CHAR8_ARRAY_TYPE.
>>>>>>>       Define D_CXX_CHAR8_T and D_CXX_CHAR8_T_FLAGS.
>>>>>>>       Define char8_type_node and char8_array_type_node.
>>>>>>>       * c-family/c-cppbuiltin.c (cpp_atomic_builtins): Predefine
>>>>>>>       __GCC_ATOMIC_CHAR8_T_LOCK_FREE.
>>>>>>>       (c_cpp_builtins): Predefine __cpp_char8_t.
>>>>>>>       * c-family/c-lex.c (lex_string): Use char8_array_type_node 
>>>>>>> as the
>>>>>>>       type of CPP_UTF8STRING.
>>>>>>>       (lex_charconst): Use char8_type_node as the type of 
>>>>>>> CPP_UTF8CHAR.
>>>>>>>       * c-family/c.opt: Add the -fchar8_t command line option.
>>>>>>>
>>>>>>> gcc/c/ChangeLog:
>>>>>>>
>>>>>>> 2018-11-04  Tom Honermann  <tom@honermann.net>
>>>>>>>
>>>>>>>       * c/c-typeck.c (char_type_p): Add char8_type_node.
>>>>>>>       (digest_init): Handle initialization by a u8 string 
>>>>>>> literal of
>>>>>>>       char8_t type.
>>>>>>>
>>>>>>> gcc/cp/ChangeLog:
>>>>>>>
>>>>>>> 2018-11-04  Tom Honermann  <tom@honermann.net>
>>>>>>>
>>>>>>>       * cp/cvt.c (type_promotes_to): Handle char8_t promotion.
>>>>>>>       * cp/decl.c (grokdeclarator): Handle invalid type specifier
>>>>>>>       combinations involving char8_t.
>>>>>>>       * cp/lex.c (init_reswords): Add char8_t as a reserved word.
>>>>>>>       * cp/mangle.c (write_builtin_type): Add name mangling for 
>>>>>>> char8_t
>>>>>>>       (Du).
>>>>>>>       * cp/parser.c (cp_keyword_starts_decl_specifier_p,
>>>>>>>       cp_parser_simple_type_specifier): Recognize char8_t as a 
>>>>>>> simple
>>>>>>>       type specifier.
>>>>>>>       (cp_parser_string_literal): Use char8_array_type_node for 
>>>>>>> the type
>>>>>>>       of CPP_UTF8STRING.
>>>>>>>       (cp_parser_set_decl_spec_type): Tolerate char8_t typedefs 
>>>>>>> in system
>>>>>>>       headers.
>>>>>>>       * cp/rtti.c (emit_support_tinfos): type_info support for 
>>>>>>> char8_t.
>>>>>>>       * cp/tree.c (char_type_p): Recognize char8_t as a 
>>>>>>> character type.
>>>>>>>       * cp/typeck.c (string_conv_p): Handle conversions of u8 
>>>>>>> string
>>>>>>>       literals of char8_t type.
>>>>>>>       (check_literal_operator_args): Handle UDLs with u8 string 
>>>>>>> literals
>>>>>>>       of char8_t type.
>>>>>>>       * cp/typeck2.c (digest_init_r): Disallow initializing a 
>>>>>>> char array
>>>>>>>       with a u8 string literal.
>>>>>>>
>>>>>>> libiberty/ChangeLog:
>>>>>>>
>>>>>>> 2018-10-31  Tom Honermann  <tom@honermann.net>
>>>>>>>       * cp-demangle.c (cplus_demangle_builtin_types,
>>>>>>>       cplus_demangle_type): Add name demangling for char8_t (Du).
>>>>>>>       * cp-demangle.h: Increase D_BUILTIN_TYPE_COUNT to 
>>>>>>> accommodate the
>>>>>>>       new char8_t type.
>>>>>>
>>>>>>> @@ -3543,6 +3556,10 @@ c_common_get_alias_set (tree t)
>>>>>>>    if (!TYPE_P (t))
>>>>>>>      return -1;
>>>>>>
>>>>>>> +  /* Unlike char, char8_t doesn't alias. */
>>>>>>> +  if (flag_char8_t && t == char8_type_node)
>>>>>>> +    return -1;
>>>>>>
>>>>>> This seems unnecessary; doesn't the existing code have the same 
>>>>>> effect? I think we could do with just an adjustment to the 
>>>>>> existing comment.
>>>> I'm not sure.  I had concerns about unintended matching due to 
>>>> char8_t having an underlying type of unsigned char.
>>>
>>> That shouldn't be a problem: if char8_t is a distinct type, it won't 
>>> match unsigned char, and if it's the same as unsigned char, 
>>> flag_char8_t will be false.
I tried removing this check and that resulted in test 
gcc/testsuite/g++.dg/ext/char8_t-aliasing-1.C (added in patch 3/9) 
failing.  It seems this change is needed.  If you believe that implies 
that something is wrong elsewhere, please let me know.
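
To illustrate what "char8_t doesn't alias" means at the source level, here 
is a minimal sketch.  It is not taken from the patch or from 
char8_t-aliasing-1.C; the names utf8_unit, byte_sum_aliasing and 
byte_sum_copy are made up for the example, and the fallback to unsigned 
char is only there so the snippet also builds without char8_t support.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#if defined(__cpp_char8_t)
using utf8_unit = char8_t;        // distinct type that may not alias other objects
#else
using utf8_unit = unsigned char;  // assumption: fallback when char8_t is unavailable
#endif

// Sums the bytes of VALUE by reading its object representation through
// unsigned char, which is allowed to alias any object.
inline unsigned byte_sum_aliasing(std::uint32_t value)
{
  const unsigned char *p = reinterpret_cast<const unsigned char *>(&value);
  unsigned sum = 0;
  for (std::size_t i = 0; i < sizeof value; ++i)
    sum += p[i];
  return sum;
}

// Same result without relying on aliasing: copy the bytes into a utf8_unit
// buffer first.  Reading VALUE directly through a char8_t pointer would be
// undefined, which is why the copy is used here.
inline unsigned byte_sum_copy(std::uint32_t value)
{
  utf8_unit buf[sizeof value];
  std::memcpy(buf, &value, sizeof value);
  unsigned sum = 0;
  for (utf8_unit b : buf)
    sum += b;
  return sum;
}
```

Both helpers return the same sum for a given input regardless of byte 
order; only the first one depends on the aliasing exemption that char8_t 
deliberately lacks.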
>>>
>>>>>>> +  else if (flag_char8_t && TREE_TYPE (value) == 
>>>>>>> char8_array_type_node)
>>>>>>> +      || (flag_char8_t && type == char8_type_node)
>>>>>>> +      bool char8_array = (flag_char8_t && !!comptypes (typ1, 
>>>>>>> char8_type_node));
>>>>>>> +       || (flag_char8_t && type == char8_type_node
>>>>>> In many places you check the flag and then for one of the char8 
>>>>>> types. Since the types won't be used without the flag, checking 
>>>>>> the flag seems redundant?
>>>>
>>>> This was again protection against unintended matching of the 
>>>> underlying unsigned char type, particularly when compiling as C. 
>>>> char8_type_node is constructed (in c_common_nodes_and_builtins) 
>>>> following the pattern in place for char16_t and char32_t with the 
>>>> following code:
>>>>
>>>> +  char8_type_node = get_identifier (CHAR8_TYPE);
>>>> +  char8_type_node = TREE_TYPE (identifier_global_value 
>>>> (char8_type_node));
>>>> +  char8_type_size = TYPE_PRECISION (char8_type_node);
>>>> +  if (c_dialect_cxx ())
>>>> +    {
>>>> +      char8_type_node = make_unsigned_type (char8_type_size);
>>>> +
>>>> +      if (flag_char8_t)
>>>> +        record_builtin_type (RID_CHAR8, "char8_t", char8_type_node);
>>>> +    }
>>>>
>>>> My knowledge of gcc internals is weak, but I understand this to be, 
>>>> effectively, defining a type alias (of unsigned char) and then, if 
>>>> compiling as C++, re-defining it as a strong unsigned type.
>>>>
>>>> I don't recall the details now, but at one point, I was missing a 
>>>> check for flag_char8_t in some location and I encountered some test 
>>>> failures as a result.
>>>
>>> Since char8_type_node is always a distinct type in C++, we shouldn't 
>>> need these checks in the C++ front end.  And since it's never a 
>>> distinct type in C, the C front end (c/*) doesn't need to change at 
>>> all.
Thanks, that makes sense.  I removed the checks in the C++ front end.  
The only checks now remaining are in gcc/c-family/c-common.c and I 
believe they are necessary there.
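
For readers following along, the "distinct type in C++" property can be 
checked from user code roughly as follows.  This is a sketch under the 
assumption that the compiler defines __cpp_char8_t when the feature is 
enabled; which() and expected_u8_overload are hypothetical names used only 
for this illustration.

```cpp
#include <type_traits>

#if defined(__cpp_char8_t)
// char8_t is its own type, even though it shares size and signedness
// with unsigned char.
static_assert(!std::is_same<char8_t, unsigned char>::value,
              "char8_t is distinct from unsigned char");
static_assert(!std::is_same<char8_t, char>::value,
              "char8_t is distinct from char");
static_assert(sizeof(char8_t) == 1, "char8_t has size 1");
static_assert(static_cast<char8_t>(-1) > 0, "char8_t is unsigned");
#endif

// Overload #1: ordinary strings.
inline int which(const char *) { return 1; }

#if defined(__cpp_char8_t)
// Overload #2: UTF-8 strings, only available when char8_t exists.
inline int which(const char8_t *) { return 2; }
constexpr int expected_u8_overload = 2;
#else
// Without char8_t, u8 string literals are arrays of char and pick #1.
constexpr int expected_u8_overload = 1;
#endif
```

With the feature enabled, which(u8"text") selects the second overload; 
with it disabled, the same call falls back to the first.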
>>>
>>>>>>> -      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT)
>>>>>>> +      if (TYPE_PRECISION (typ1) == BITS_PER_UNIT
>>>>>>> +          && (typ1 == char_type_node || !flag_char8_t))
>>>>>>
>>>>>> This looks wrong, or at least incomplete; we want to complain 
>>>>>> about mismatched types here even with -fchar8_t. Perhaps we 
>>>>>> should replace all of this if/else with simply comparing typ1 and 
>>>>>> char_type, and complaining if they're different.  Talking about 
>>>>>> wide and non-wide isn't as useful as the actual types would be.
>>>>>
>>>>> Well, I suppose it isn't quite that simple, since we still need to 
>>>>> treat the ordinary character types as interchangeable.
>>>>
>>>> We do need to complain about mismatched types and test 
>>>> gcc/testsuite/g++.dg/ext/char8_t-init-2.C was added to ensure that 
>>>> happens.
>>>
>>> You don't seem to test initializing an array of signed/unsigned 
>>> char, which I think will be broken by the change only considering 
>>> char_type_node.
Thank you for spotting this!  Yes, this was broken.  Now fixed and tests 
added.
>>>
>>> I think we want to specifically allow conversion from array of one 
>>> ordinary character type to another, and otherwise complain about the 
>>> types being different with a message like "cannot initialize array 
>>> of %qT from array of %qT" rather than mess with terms like int-array 
>>> and (non-)wide string.

I like this suggestion.  For this patch, I just made a small tweak to 
avoid misleadingly describing UTF-8 string literals as wide strings.  
The new (temporary) error messages look like "char-array initialized 
from UTF-8 string" and "char8_t-array initialized from ordinary 
string".  I have a separate patch that changes the error messages along 
the lines you suggested, but I'll send that separately (in the next day 
or so; I need to re-run the testsuite) as it will require updates to a 
distinct set of tests.
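
As a rough illustration of the initialization rules discussed here (not 
copied from the testsuite), the sketch below shows which array 
initializations are expected to be accepted.  The variable names and the 
helper array_sizes_match() are invented for the example, and the 
commented-out lines reflect the behavior described for this patch; later 
revisions of the standard may treat some of them differently.

```cpp
#include <cstddef>

// An ordinary string literal may initialize an array of any ordinary
// character type: char, signed char, or unsigned char.
const char          plain[] = "xx";
const signed char   s_arr[] = "xx";
const unsigned char u_arr[] = "xx";

#if defined(__cpp_char8_t)
// With char8_t enabled, a u8 literal initializes a char8_t array.
const char8_t utf8[] = u8"xx";
// const char    bad1[] = u8"xx";  // rejected: char-array from UTF-8 string
// const char8_t bad2[] = "xx";    // rejected: char8_t-array from ordinary string
#else
// Without char8_t, a u8 literal is still an array of char.
const char utf8[] = u8"xx";
#endif

// Every array above holds two characters plus the terminating zero.
inline bool array_sizes_match()
{
  return sizeof plain == 3 && sizeof s_arr == 3
         && sizeof u_arr == 3 && sizeof utf8 == 3;
}
```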

Tom.

>>
>> Ping.  Will you have a chance to update the patch soon, or would you 
>> like me to make these last changes myself?
>
> Hi, Jason.  Updating the patch has remained high on my todo list, but 
> I've been under water elsewhere.  I've been hoping to get to it in the 
> next week or so.  If you have time available to help complete changes, 
> sure, that would be great.
>
> Tom.
>
>>
>> Jason
>
>


* Re: [REVISED PATCH 2/9]: C++ P0482R5 char8_t: Core language support
  2018-12-24  2:27 ` [REVISED PATCH " Tom Honermann
@ 2019-01-14 19:59   ` Jason Merrill
  2019-01-15  4:08     ` Tom Honermann
  2019-01-15  6:51     ` Christophe Lyon
  0 siblings, 2 replies; 14+ messages in thread
From: Jason Merrill @ 2019-01-14 19:59 UTC (permalink / raw)
  To: Tom Honermann, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1204 bytes --]

On 12/23/18 9:27 PM, Tom Honermann wrote:
> Attached is a revised patch that addresses changes in P0482R6 as well as 
> feedback provided by Jason.  Changes from the prior patch include:
> - Updated the value of the __cpp_char8_t feature test macro to 201811
>    per P0482R6.
> - Enable char8_t support with -std=c++2a per adoption of P0482R6 in
>    San Diego.
> - Reverted the unnecessary changes to gcc/gcc/c/c-typeck.c as requested
>    by Jason.
> - Removed unnecessary checks of 'flag_char8_t' within the C++ front
>    end as requested by Jason.
> - Corrected the regression spotted by Jason regarding initialization of
>    signed char and unsigned char arrays with string literals.
> - Made minor changes to the error message emitted for ill-formed
>    initialization of char arrays with UTF-8 string literals.  These
>    changes do not yet implement Jason's suggestion; I'll follow up with a
>    separate patch for that due to additional test impact.
> 
> Tested on x86_64-linux.

I just applied the compiler changes with small modifications, as 
follows; thank you very much for the patches.  Jonathan should check in 
the library portion before long.
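
For anyone wanting to probe the result from user code, a minimal sketch 
based on the __cpp_char8_t value quoted above might look like this.  The 
names char8_feature_level and has_char8_t are hypothetical and exist only 
for the example.

```cpp
// Record the feature-test macro value, or 0 when char8_t is not enabled
// (for example with -fno-char8_t or an older -std= setting).
#if defined(__cpp_char8_t)
constexpr long char8_feature_level = __cpp_char8_t;
#else
constexpr long char8_feature_level = 0;
#endif

// True when the compiler advertises at least the P0482R6 level (201811).
constexpr bool has_char8_t()
{
  return char8_feature_level >= 201811L;
}
```

Code that must build both ways can branch on has_char8_t() or on the 
macro directly.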

Jason

[-- Attachment #2: char8-core.diff --]
[-- Type: text/x-patch, Size: 85687 bytes --]

commit 08872ecfcbe97cc6ccedf31b8d9a7edeb29bf290
Author: Jason Merrill <jason@redhat.com>
Date:   Mon Jan 7 23:51:35 2019 -0500

            Implement P0482R5, char8_t: A type for UTF-8 characters and strings
    
    gcc/cp/
            * cvt.c (type_promotes_to): Handle char8_t promotion.
            * decl.c (grokdeclarator): Handle invalid type specifier
            combinations involving char8_t.
            * lex.c (init_reswords): Add char8_t as a reserved word.
            * mangle.c (write_builtin_type): Add name mangling for char8_t (Du).
            * parser.c (cp_keyword_starts_decl_specifier_p)
            (cp_parser_simple_type_specifier): Recognize char8_t as a simple
            type specifier.
            (cp_parser_string_literal): Use char8_array_type_node for the type
            of CPP_UTF8STRING.
            (cp_parser_set_decl_spec_type): Tolerate char8_t typedefs in system
            headers.
            * rtti.c (emit_support_tinfos): type_info support for char8_t.
            * tree.c (char_type_p): Recognize char8_t as a character type.
            * typeck.c (string_conv_p): Handle conversions of u8 string
            literals of char8_t type.
            (check_literal_operator_args): Handle UDLs with u8 string literals
            of char8_t type.
            * typeck2.c (ordinary_char_type_p): New.
            (digest_init_r): Disallow initializing a char array with a u8 string
            literal.
    gcc/c-family/
            * c-common.c (c_common_reswords): Add char8_t.
            (fix_string_type): Use char8_t for the type of u8 string literals.
            (c_common_get_alias_set): char8_t doesn't alias.
            (c_common_nodes_and_builtins): Define char8_t as a builtin type in
            C++.
            (c_stddef_cpp_builtins): Add __CHAR8_TYPE__.
            (keyword_begins_type_specifier): Add RID_CHAR8.
            * c-common.h (rid): Add RID_CHAR8.
            (c_tree_index): Add CTI_CHAR8_TYPE and CTI_CHAR8_ARRAY_TYPE.
            Define D_CXX_CHAR8_T and D_CXX_CHAR8_T_FLAGS.
            Define char8_type_node and char8_array_type_node.
            * c-cppbuiltin.c (cpp_atomic_builtins): Predefine
            __GCC_ATOMIC_CHAR8_T_LOCK_FREE.
            (c_cpp_builtins): Predefine __cpp_char8_t.
            * c-lex.c (lex_string): Use char8_array_type_node as the type of
            CPP_UTF8STRING.
            (lex_charconst): Use char8_type_node as the type of CPP_UTF8CHAR.
            * c-opts.c: If not otherwise specified, enable -fchar8_t when
            targeting C++2a.
            * c.opt: Add the -fchar8_t command line option.
    libiberty/
            * cp-demangle.c (cplus_demangle_builtin_types)
            (cplus_demangle_type): Add name demangling for char8_t (Du).
            * cp-demangle.h: Increase D_BUILTIN_TYPE_COUNT to accommodate the
            new char8_t type.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 5ed1d133420..1151708aaf0 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -206,7 +206,7 @@ in the following sections.
 @item C++ Language Options
 @xref{C++ Dialect Options,,Options Controlling C++ Dialect}.
 @gccoptlist{-fabi-version=@var{n}  -fno-access-control @gol
--faligned-new=@var{n}  -fargs-in-order=@var{n}  -fcheck-new @gol
+-faligned-new=@var{n}  -fargs-in-order=@var{n}  -fchar8_t  -fcheck-new @gol
 -fconstexpr-depth=@var{n}  -fconstexpr-loop-limit=@var{n} @gol
 -fno-elide-constructors @gol
 -fno-enforce-eh-specs @gol
@@ -2426,6 +2426,60 @@ but few users will need to override the default of
 
 This flag is enabled by default for @option{-std=c++17}.
 
+@item -fchar8_t
+@itemx -fno-char8_t
+@opindex fchar8_t
+@opindex fno-char8_t
+Enable support for @code{char8_t} as adopted for C++2a.  This includes
+the addition of a new @code{char8_t} fundamental type, changes to the
+types of UTF-8 string and character literals, new signatures for
+user-defined literals, associated standard library updates, and new
+@code{__cpp_char8_t} and @code{__cpp_lib_char8_t} feature test macros.
+
+This option enables functions to be overloaded for ordinary and UTF-8
+strings:
+
+@smallexample
+int f(const char *);    // #1
+int f(const char8_t *); // #2
+int v1 = f("text");     // Calls #1
+int v2 = f(u8"text");   // Calls #2
+@end smallexample
+
+@noindent
+and introduces new signatures for user-defined literals:
+
+@smallexample
+int operator""_udl1(char8_t);
+int v3 = u8'x'_udl1;
+int operator""_udl2(const char8_t*, std::size_t);
+int v4 = u8"text"_udl2;
+template<typename T, T...> int operator""_udl3();
+int v5 = u8"text"_udl3;
+@end smallexample
+
+@noindent
+The change to the types of UTF-8 string and character literals introduces
+incompatibilities with ISO C++11 and later standards.  For example, the
+following code is well-formed under ISO C++11, but is ill-formed when
+@option{-fchar8_t} is specified.
+
+@smallexample
+char ca[] = u8"xx";     // error: char-array initialized from wide
+                        //        string
+const char *cp = u8"xx";// error: invalid conversion from
+                        //        `const char8_t*' to `const char*'
+int f(const char*);
+auto v = f(u8"xx");     // error: invalid conversion from
+                        //        `const char8_t*' to `const char*'
+std::string s@{u8"xx"@};  // error: no matching function for call to
+                        //        `std::basic_string<char>::basic_string()'
+using namespace std::literals;
+s = u8"xx"s;            // error: conversion from
+                        //        `basic_string<char8_t>' to non-scalar
+                        //        type `basic_string<char>' requested
+@end smallexample
+
 @item -fcheck-new
 @opindex fcheck-new
 Check that the pointer returned by @code{operator new} is non-null
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index d118e74ab07..858beff53d6 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1300,6 +1300,11 @@ fcanonical-system-headers
 C ObjC C++ ObjC++
 Where shorter, use canonicalized paths to systems headers.
 
+fchar8_t
+C++ ObjC++ Var(flag_char8_t) Init(-1)
+Enable the char8_t fundamental type and use it as the type for UTF-8 string
+and character literals.
+
 fcheck-pointer-bounds
 C ObjC C++ ObjC++ LTO Deprecated
 Deprecated in GCC 9.  This switch has no effect.
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 9f790bc6a14..9fe90f32b16 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -180,6 +180,9 @@ enum rid
   /* C++11 */
   RID_CONSTEXPR, RID_DECLTYPE, RID_NOEXCEPT, RID_NULLPTR, RID_STATIC_ASSERT,
 
+  /* char8_t */
+  RID_CHAR8,
+
   /* C++ concepts */
   RID_CONCEPT, RID_REQUIRES,
 
@@ -287,6 +290,7 @@ extern GTY ((length ("(int) RID_MAX"))) tree *ridpointers;
 
 enum c_tree_index
 {
+    CTI_CHAR8_TYPE,
     CTI_CHAR16_TYPE,
     CTI_CHAR32_TYPE,
     CTI_WCHAR_TYPE,
@@ -330,6 +334,7 @@ enum c_tree_index
     CTI_UINTPTR_TYPE,
 
     CTI_CHAR_ARRAY_TYPE,
+    CTI_CHAR8_ARRAY_TYPE,
     CTI_CHAR16_ARRAY_TYPE,
     CTI_CHAR32_ARRAY_TYPE,
     CTI_WCHAR_ARRAY_TYPE,
@@ -409,20 +414,22 @@ extern machine_mode c_default_pointer_mode;
    mask) is _true_.  Thus for keywords which are present in all
    languages the disable field is zero.  */
 
-#define D_CONLY		0x001	/* C only (not in C++).  */
-#define D_CXXONLY	0x002	/* C++ only (not in C).  */
-#define D_C99		0x004	/* In C, C99 only.  */
-#define D_CXX11         0x008	/* In C++, C++11 only.  */
-#define D_EXT		0x010	/* GCC extension.  */
-#define D_EXT89		0x020	/* GCC extension incorporated in C99.  */
-#define D_ASM		0x040	/* Disabled by -fno-asm.  */
-#define D_OBJC		0x080	/* In Objective C and neither C nor C++.  */
-#define D_CXX_OBJC	0x100	/* In Objective C, and C++, but not C.  */
-#define D_CXXWARN	0x200	/* In C warn with -Wcxx-compat.  */
-#define D_CXX_CONCEPTS  0x400   /* In C++, only with concepts. */
-#define D_TRANSMEM	0X800   /* C++ transactional memory TS.  */
+#define D_CONLY		0x0001	/* C only (not in C++).  */
+#define D_CXXONLY	0x0002	/* C++ only (not in C).  */
+#define D_C99		0x0004	/* In C, C99 only.  */
+#define D_CXX11         0x0008	/* In C++, C++11 only.  */
+#define D_EXT		0x0010	/* GCC extension.  */
+#define D_EXT89		0x0020	/* GCC extension incorporated in C99.  */
+#define D_ASM		0x0040	/* Disabled by -fno-asm.  */
+#define D_OBJC		0x0080	/* In Objective C and neither C nor C++.  */
+#define D_CXX_OBJC	0x0100	/* In Objective C, and C++, but not C.  */
+#define D_CXXWARN	0x0200	/* In C warn with -Wcxx-compat.  */
+#define D_CXX_CONCEPTS  0x0400	/* In C++, only with concepts.  */
+#define D_TRANSMEM	0X0800	/* C++ transactional memory TS.  */
+#define D_CXX_CHAR8_T	0X1000	/* In C++, only with -fchar8_t.  */
 
 #define D_CXX_CONCEPTS_FLAGS D_CXXONLY | D_CXX_CONCEPTS
+#define D_CXX_CHAR8_T_FLAGS D_CXXONLY | D_CXX_CHAR8_T
 
 /* The reserved keyword table.  */
 extern const struct c_common_resword c_common_reswords[];
@@ -430,6 +437,7 @@ extern const struct c_common_resword c_common_reswords[];
 /* The number of items in the reserved keyword table.  */
 extern const unsigned int num_c_common_reswords;
 
+#define char8_type_node			c_global_trees[CTI_CHAR8_TYPE]
 #define char16_type_node		c_global_trees[CTI_CHAR16_TYPE]
 #define char32_type_node		c_global_trees[CTI_CHAR32_TYPE]
 #define wchar_type_node			c_global_trees[CTI_WCHAR_TYPE]
@@ -475,6 +483,7 @@ extern const unsigned int num_c_common_reswords;
 #define truthvalue_false_node		c_global_trees[CTI_TRUTHVALUE_FALSE]
 
 #define char_array_type_node		c_global_trees[CTI_CHAR_ARRAY_TYPE]
+#define char8_array_type_node		c_global_trees[CTI_CHAR8_ARRAY_TYPE]
 #define char16_array_type_node		c_global_trees[CTI_CHAR16_ARRAY_TYPE]
 #define char32_array_type_node		c_global_trees[CTI_CHAR32_ARRAY_TYPE]
 #define wchar_array_type_node		c_global_trees[CTI_WCHAR_ARRAY_TYPE]
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 56489465c0c..6a2004330d2 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7488,6 +7488,7 @@ extern tree store_init_value			(tree, tree, vec<tree, va_gc>**, int);
 extern tree split_nonconstant_init		(tree, tree);
 extern bool check_narrowing			(tree, tree, tsubst_flags_t,
 						 bool = false);
+extern bool ordinary_char_type_p		(tree);
 extern tree digest_init				(tree, tree, tsubst_flags_t);
 extern tree digest_init_flags			(tree, tree, int, tsubst_flags_t);
 extern tree digest_nsdmi_init		        (tree, tree, tsubst_flags_t);
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 2a1a5b8d1df..b7534256119 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -583,6 +583,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
    affect C++ name mangling because in C++ these are distinct types
    not typedefs.  */
 
+#ifndef CHAR8_TYPE
+#define CHAR8_TYPE "unsigned char"
+#endif
+
 #ifdef UINT_LEAST16_TYPE
 #define CHAR16_TYPE UINT_LEAST16_TYPE
 #else
diff --git a/gcc/testsuite/g++.dg/cpp0x/udlit-string-literal.h b/gcc/testsuite/g++.dg/cpp0x/udlit-string-literal.h
index e61034ec4ef..c8725fa9f46 100644
--- a/gcc/testsuite/g++.dg/cpp0x/udlit-string-literal.h
+++ b/gcc/testsuite/g++.dg/cpp0x/udlit-string-literal.h
@@ -8,6 +8,12 @@ inline namespace my_string_literals
   operator"" s(const char* str, std::size_t len)
   { return std::string{str, len}; }
 
+#if __cpp_lib_char8_t
+  std::u8string
+  operator"" s(const char8_t* str, std::size_t len)
+  { return std::u8string{str, len}; }
+#endif
+
   std::wstring
   operator"" s(const wchar_t* str, std::size_t len)
   { return std::wstring{str, len}; }
diff --git a/libiberty/cp-demangle.h b/libiberty/cp-demangle.h
index b739bdfb8a4..92191cf3ea8 100644
--- a/libiberty/cp-demangle.h
+++ b/libiberty/cp-demangle.h
@@ -176,7 +176,7 @@ d_advance (struct d_info *di, int i)
 extern const struct demangle_operator_info cplus_demangle_operators[];
 #endif
 
-#define D_BUILTIN_TYPE_COUNT (33)
+#define D_BUILTIN_TYPE_COUNT (34)
 
 CP_STATIC_IF_GLIBCPP_V3
 const struct demangle_builtin_type_info
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index d2ea384d653..2a5a8e7defb 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -79,6 +79,7 @@ machine_mode c_default_pointer_mode = VOIDmode;
 	tree signed_char_type_node;
 	tree wchar_type_node;
 
+	tree char8_type_node;
 	tree char16_type_node;
 	tree char32_type_node;
 
@@ -128,6 +129,11 @@ machine_mode c_default_pointer_mode = VOIDmode;
 
 	tree wchar_array_type_node;
 
+   Type `char8_t[SOMENUMBER]' or something like it.
+   Used when a UTF-8 string literal is created.
+
+	tree char8_array_type_node;
+
    Type `char16_t[SOMENUMBER]' or something like it.
    Used when a UTF-16 string literal is created.
 
@@ -452,6 +458,7 @@ const struct c_common_resword c_common_reswords[] =
   { "case",		RID_CASE,	0 },
   { "catch",		RID_CATCH,	D_CXX_OBJC | D_CXXWARN },
   { "char",		RID_CHAR,	0 },
+  { "char8_t",		RID_CHAR8,	D_CXX_CHAR8_T_FLAGS | D_CXXWARN },
   { "char16_t",		RID_CHAR16,	D_CXXONLY | D_CXX11 | D_CXXWARN },
   { "char32_t",		RID_CHAR32,	D_CXXONLY | D_CXX11 | D_CXXWARN },
   { "class",		RID_CLASS,	D_CXX_OBJC | D_CXXWARN },
@@ -748,6 +755,11 @@ fix_string_type (tree value)
       charsz = 1;
       e_type = char_type_node;
     }
+  else if (flag_char8_t && TREE_TYPE (value) == char8_array_type_node)
+    {
+      charsz = TYPE_PRECISION (char8_type_node) / BITS_PER_UNIT;
+      e_type = char8_type_node;
+    }
   else if (TREE_TYPE (value) == char16_array_type_node)
     {
       charsz = TYPE_PRECISION (char16_type_node) / BITS_PER_UNIT;
@@ -828,7 +840,8 @@ fix_string_type (tree value)
    CPP_STRING16, or CPP_STRING32.  Return CPP_OTHER in case of error.
    This may not be exactly the string token type that initially created
    the string, since CPP_WSTRING is indistinguishable from the 16/32 bit
-   string type at this point.
+   string type, and CPP_UTF8STRING is indistinguishable from CPP_STRING
+   at this point.
 
    This effectively reverses part of the logic in lex_string and
    fix_string_type.  */
@@ -3640,8 +3653,12 @@ c_common_get_alias_set (tree t)
   if (!TYPE_P (t))
     return -1;
 
+  /* Unlike char, char8_t doesn't alias. */
+  if (flag_char8_t && t == char8_type_node)
+    return -1;
+
   /* The C standard guarantees that any object may be accessed via an
-     lvalue that has character type.  */
+     lvalue that has narrow character type (except char8_t).  */
   if (t == char_type_node
       || t == signed_char_type_node
       || t == unsigned_char_type_node)
@@ -4050,6 +4067,7 @@ c_get_ident (const char *id)
 void
 c_common_nodes_and_builtins (void)
 {
+  int char8_type_size;
   int char16_type_size;
   int char32_type_size;
   int wchar_type_size;
@@ -4341,6 +4359,22 @@ c_common_nodes_and_builtins (void)
   wchar_array_type_node
     = build_array_type (wchar_type_node, array_domain_type);
 
+  /* Define 'char8_t'.  */
+  char8_type_node = get_identifier (CHAR8_TYPE);
+  char8_type_node = TREE_TYPE (identifier_global_value (char8_type_node));
+  char8_type_size = TYPE_PRECISION (char8_type_node);
+  if (c_dialect_cxx ())
+    {
+      char8_type_node = make_unsigned_type (char8_type_size);
+
+      if (flag_char8_t)
+        record_builtin_type (RID_CHAR8, "char8_t", char8_type_node);
+    }
+
+  /* This is for UTF-8 string constants.  */
+  char8_array_type_node
+    = build_array_type (char8_type_node, array_domain_type);
+
   /* Define 'char16_t'.  */
   char16_type_node = get_identifier (CHAR16_TYPE);
   char16_type_node = TREE_TYPE (identifier_global_value (char16_type_node));
@@ -5138,6 +5172,8 @@ c_stddef_cpp_builtins(void)
   builtin_define_with_value ("__WINT_TYPE__", WINT_TYPE, 0);
   builtin_define_with_value ("__INTMAX_TYPE__", INTMAX_TYPE, 0);
   builtin_define_with_value ("__UINTMAX_TYPE__", UINTMAX_TYPE, 0);
+  if (flag_char8_t)
+    builtin_define_with_value ("__CHAR8_TYPE__", CHAR8_TYPE, 0);
   builtin_define_with_value ("__CHAR16_TYPE__", CHAR16_TYPE, 0);
   builtin_define_with_value ("__CHAR32_TYPE__", CHAR32_TYPE, 0);
   if (SIG_ATOMIC_TYPE)
@@ -7856,6 +7892,7 @@ keyword_begins_type_specifier (enum rid keyword)
     case RID_ACCUM:
     case RID_BOOL:
     case RID_WCHAR:
+    case RID_CHAR8:
     case RID_CHAR16:
     case RID_CHAR32:
     case RID_SAT:
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 25b5c1a7406..c9b63caeb2d 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -702,6 +702,9 @@ cpp_atomic_builtins (cpp_reader *pfile)
 			(have_swap[SWAP_INDEX (boolean_type_node)]? 2 : 1));
   builtin_define_with_int_value ("__GCC_ATOMIC_CHAR_LOCK_FREE", 
 			(have_swap[SWAP_INDEX (signed_char_type_node)]? 2 : 1));
+  if (flag_char8_t)
+    builtin_define_with_int_value ("__GCC_ATOMIC_CHAR8_T_LOCK_FREE",
+			(have_swap[SWAP_INDEX (char8_type_node)]? 2 : 1));
   builtin_define_with_int_value ("__GCC_ATOMIC_CHAR16_T_LOCK_FREE", 
 			(have_swap[SWAP_INDEX (char16_type_node)]? 2 : 1));
   builtin_define_with_int_value ("__GCC_ATOMIC_CHAR32_T_LOCK_FREE", 
@@ -1000,6 +1003,8 @@ c_cpp_builtins (cpp_reader *pfile)
 	cpp_define (pfile, "__cpp_template_template_args=201611");
       if (flag_threadsafe_statics)
 	cpp_define (pfile, "__cpp_threadsafe_static_init=200806");
+      if (flag_char8_t)
+        cpp_define (pfile, "__cpp_char8_t=201811");
     }
   /* Note that we define this for C as well, so that we know if
      __attribute__((cleanup)) will interface with EH.  */
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index d5ce9e9a032..0a368a33a58 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -1281,9 +1281,14 @@ lex_string (const cpp_token *tok, tree *valp, bool objc_string, bool translate)
     {
     default:
     case CPP_STRING:
-    case CPP_UTF8STRING:
       TREE_TYPE (value) = char_array_type_node;
       break;
+    case CPP_UTF8STRING:
+      if (flag_char8_t)
+        TREE_TYPE (value) = char8_array_type_node;
+      else
+        TREE_TYPE (value) = char_array_type_node;
+      break;
     case CPP_STRING16:
       TREE_TYPE (value) = char16_array_type_node;
       break;
@@ -1323,7 +1328,12 @@ lex_charconst (const cpp_token *token)
   else if (token->type == CPP_CHAR16)
     type = char16_type_node;
   else if (token->type == CPP_UTF8CHAR)
-    type = char_type_node;
+    {
+      if (flag_char8_t)
+        type = char8_type_node;
+      else
+        type = char_type_node;
+    }
   /* In C, a character constant has type 'int'.
      In C++ 'char', but multi-char charconsts have type 'int'.  */
   else if (!c_dialect_cxx () || chars_seen > 1)
diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index 2c22574b730..9660f51867a 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -996,6 +996,10 @@ c_common_post_options (const char **pfilename)
   if (flag_sized_deallocation == -1)
     flag_sized_deallocation = (cxx_dialect >= cxx14);
 
+  /* char8_t support is new in C++2A.  */
+  if (flag_char8_t == -1)
+    flag_char8_t = (cxx_dialect >= cxx2a);
+
   if (flag_extern_tls_init)
     {
       if (!TARGET_SUPPORTS_ALIASES || !SUPPORTS_WEAK)
diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index 449ce50d9e4..91428119398 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -1877,6 +1877,7 @@ type_promotes_to (tree type)
      wider.  Scoped enums don't promote, but pretend they do for backward
      ABI bug compatibility wrt varargs.  */
   else if (TREE_CODE (type) == ENUMERAL_TYPE
+	   || type == char8_type_node
 	   || type == char16_type_node
 	   || type == char32_type_node
 	   || type == wchar_type_node)
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 6e75c3d3d93..41972aa0955 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -10770,7 +10770,9 @@ grokdeclarator (const cp_declarator *declarator,
 	  error_at (&richloc, "%<long%> and %<short%> specified together");
 	}
       else if (TREE_CODE (type) != INTEGER_TYPE
-	       || type == char16_type_node || type == char32_type_node
+	       || type == char8_type_node
+	       || type == char16_type_node
+	       || type == char32_type_node
 	       || ((long_p || short_p)
 		   && (explicit_char || explicit_intN)))
 	error_at (loc, "%qs specified with %qT", key, type);
diff --git a/gcc/cp/lex.c b/gcc/cp/lex.c
index 36ffa37e037..369ecc05df2 100644
--- a/gcc/cp/lex.c
+++ b/gcc/cp/lex.c
@@ -233,6 +233,8 @@ init_reswords (void)
     mask |= D_CXX_CONCEPTS;
   if (!flag_tm)
     mask |= D_TRANSMEM;
+  if (!flag_char8_t)
+    mask |= D_CXX_CHAR8_T;
   if (flag_no_asm)
     mask |= D_ASM | D_EXT;
   if (flag_no_gnu_keywords)
diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 919f7b3f2fb..00bde4ee59a 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -2473,10 +2473,12 @@ write_builtin_type (tree type)
       break;
 
     case INTEGER_TYPE:
-      /* TYPE may still be wchar_t, char16_t, or char32_t, since that
+      /* TYPE may still be wchar_t, char8_t, char16_t, or char32_t, since that
 	 isn't in integer_type_nodes.  */
       if (type == wchar_type_node)
 	write_char ('w');
+      else if (type == char8_type_node)
+	write_string ("Du");
       else if (type == char16_type_node)
 	write_string ("Ds");
       else if (type == char32_type_node)
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index be669f2f321..7d7b0292650 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -948,6 +948,7 @@ cp_keyword_starts_decl_specifier_p (enum rid keyword)
     case RID_TYPENAME:
       /* Simple type specifiers.  */
     case RID_CHAR:
+    case RID_CHAR8:
     case RID_CHAR16:
     case RID_CHAR32:
     case RID_WCHAR:
@@ -4235,9 +4236,14 @@ cp_parser_string_literal (cp_parser *parser, bool translate, bool wide_ok,
 	{
 	default:
 	case CPP_STRING:
-	case CPP_UTF8STRING:
 	  TREE_TYPE (value) = char_array_type_node;
 	  break;
+	case CPP_UTF8STRING:
+	  if (flag_char8_t)
+	    TREE_TYPE (value) = char8_array_type_node;
+	  else
+	    TREE_TYPE (value) = char_array_type_node;
+	  break;
 	case CPP_STRING16:
 	  TREE_TYPE (value) = char16_array_type_node;
 	  break;
@@ -17504,6 +17510,9 @@ cp_parser_simple_type_specifier (cp_parser* parser,
 	decl_specs->explicit_char_p = true;
       type = char_type_node;
       break;
+    case RID_CHAR8:
+      type = char8_type_node;
+      break;
     case RID_CHAR16:
       type = char16_type_node;
       break;
@@ -28919,14 +28928,15 @@ cp_parser_set_decl_spec_type (cp_decl_specifier_seq *decl_specs,
 {
   decl_specs->any_specifiers_p = true;
 
-  /* If the user tries to redeclare bool, char16_t, char32_t, or wchar_t
-     (with, for example, in "typedef int wchar_t;") we remember that
+  /* If the user tries to redeclare bool, char8_t, char16_t, char32_t, or
+     wchar_t (as in, for example, "typedef int wchar_t;") we remember that
      this is what happened.  In system headers, we ignore these
      declarations so that G++ can work with system headers that are not
      C++-safe.  */
   if (decl_spec_seq_has_spec_p (decl_specs, ds_typedef)
       && !type_definition_p
       && (type_spec == boolean_type_node
+	  || type_spec == char8_type_node
 	  || type_spec == char16_type_node
 	  || type_spec == char32_type_node
 	  || type_spec == wchar_type_node)
diff --git a/gcc/cp/rtti.c b/gcc/cp/rtti.c
index a6d32b914a7..c4aabea7003 100644
--- a/gcc/cp/rtti.c
+++ b/gcc/cp/rtti.c
@@ -1539,7 +1539,7 @@ emit_support_tinfos (void)
   {
     &void_type_node,
     &boolean_type_node,
-    &wchar_type_node, &char16_type_node, &char32_type_node,
+    &wchar_type_node, &char8_type_node, &char16_type_node, &char32_type_node,
     &char_type_node, &signed_char_type_node, &unsigned_char_type_node,
     &short_integer_type_node, &short_unsigned_type_node,
     &integer_type_node, &unsigned_type_node,
diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 4db89a4e5a6..50002161500 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -5022,6 +5022,7 @@ char_type_p (tree type)
   return (same_type_p (type, char_type_node)
 	  || same_type_p (type, unsigned_char_type_node)
 	  || same_type_p (type, signed_char_type_node)
+	  || same_type_p (type, char8_type_node)
 	  || same_type_p (type, char16_type_node)
 	  || same_type_p (type, char32_type_node)
 	  || same_type_p (type, wchar_type_node));
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 43d2899a3c4..88e2cd6ab9b 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -2216,6 +2216,7 @@ string_conv_p (const_tree totype, const_tree exp, int warn)
 
   t = TREE_TYPE (totype);
   if (!same_type_p (t, char_type_node)
+      && !same_type_p (t, char8_type_node)
       && !same_type_p (t, char16_type_node)
       && !same_type_p (t, char32_type_node)
       && !same_type_p (t, wchar_type_node))
@@ -10288,6 +10289,7 @@ check_literal_operator_args (const_tree decl,
 	      t = TYPE_MAIN_VARIANT (t);
 	      if ((maybe_raw_p = same_type_p (t, char_type_node))
 		  || same_type_p (t, wchar_type_node)
+		  || same_type_p (t, char8_type_node)
 		  || same_type_p (t, char16_type_node)
 		  || same_type_p (t, char32_type_node))
 		{
@@ -10320,6 +10322,8 @@ check_literal_operator_args (const_tree decl,
 	    max_arity = 1;
 	  else if (same_type_p (t, wchar_type_node))
 	    max_arity = 1;
+	  else if (same_type_p (t, char8_type_node))
+	    max_arity = 1;
 	  else if (same_type_p (t, char16_type_node))
 	    max_arity = 1;
 	  else if (same_type_p (t, char32_type_node))
diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index ecc313b2355..cd4313295d5 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -1026,6 +1026,17 @@ check_narrowing (tree type, tree init, tsubst_flags_t complain, bool const_only)
   return ok;
 }
 
+/* True iff TYPE is a C++2a "ordinary" character type.  */
+
+bool
+ordinary_char_type_p (tree type)
+{
+  type = TYPE_MAIN_VARIANT (type);
+  return (type == char_type_node
+	  || type == signed_char_type_node
+	  || type == unsigned_char_type_node);
+}
+
 /* Process the initializer INIT for a variable of type TYPE, emitting
    diagnostics for invalid initializers and converting the initializer as
    appropriate.
@@ -1091,36 +1102,30 @@ digest_init_r (tree type, tree init, int nested, int flags,
 	  && TREE_CODE (stripped_init) == STRING_CST)
 	{
 	  tree char_type = TYPE_MAIN_VARIANT (TREE_TYPE (TREE_TYPE (init)));
+	  bool incompat_string_cst = false;
 
-	  if (TYPE_PRECISION (typ1) == BITS_PER_UNIT)
+	  if (typ1 != char_type)
 	    {
-	      if (char_type != char_type_node
-		  && char_type != signed_char_type_node
-		  && char_type != unsigned_char_type_node)
-		{
-		  if (complain & tf_error)
-		    error_at (loc, "char-array initialized from wide string");
-		  return error_mark_node;
-		}
+	      /* The array element type does not match the initializing string
+	         literal element type; this is only allowed when both types are
+	         ordinary character types.  There are no string literals of
+	         signed or unsigned char type in the language, but we can get
+	         them internally from converting braced-init-lists to
+	         STRING_CST.  */
+	      if (ordinary_char_type_p (typ1)
+		  && ordinary_char_type_p (char_type))
+		/* OK */;
+	      else
+		incompat_string_cst = true;
 	    }
-	  else
+
+	  if (incompat_string_cst)
 	    {
-	      if (char_type == char_type_node
-		  || char_type == signed_char_type_node
-		  || char_type == unsigned_char_type_node)
-		{
-		  if (complain & tf_error)
-		    error_at (loc,
-			      "int-array initialized from non-wide string");
-		  return error_mark_node;
-		}
-	      else if (char_type != typ1)
-		{
-		  if (complain & tf_error)
-		    error_at (loc, "int-array initialized from incompatible "
-			      "wide string");
-		  return error_mark_node;
-		}
+	      if (complain & tf_error)
+		error_at (loc, "cannot initialize array of %qT from "
+		          "a string literal with type array of %qT",
+		          typ1, char_type);
+	      return error_mark_node;
 	    }
 
 	  if (nested == 2 && !TYPE_DOMAIN (type))
diff --git a/gcc/testsuite/c-c++-common/raw-string-13.c b/gcc/testsuite/c-c++-common/raw-string-13.c
index 1b37405cee9..fa11edaa7aa 100644
--- a/gcc/testsuite/c-c++-common/raw-string-13.c
+++ b/gcc/testsuite/c-c++-common/raw-string-13.c
@@ -62,6 +62,47 @@ const char s16[] = R"??(??)??";
 const char s17[] = R"?(?)??)?";
 const char s18[] = R"??(??)??)??)??";
 
+const char u800[] = u8R"??=??(??<??>??)??'??!??-\
+(a)#[{}]^|~";
+)??=??";
+const char u801[] = u8R"a(
+)\
+a"
+)a";
+const char u802[] = u8R"a(
+)a\
+"
+)a";
+const char u803[] = u8R"ab(
+)a\
+b"
+)ab";
+const char u804[] = u8R"a??/(x)a??/";
+const char u805[] = u8R"abcdefghijklmn??(abc)abcdefghijklmn??";
+const char u806[] = u8R"abcdefghijklm??/(abc)abcdefghijklm??/";
+const char u807[] = u8R"abc(??)\
+abc";)abc";
+const char u808[] = u8R"def(de)\
+def";)def";
+const char u809[] = u8R"a(??)\
+a"
+)a";
+const char u810[] = u8R"a(??)a\
+"
+)a";
+const char u811[] = u8R"ab(??)a\
+b"
+)ab";
+const char u812[] = u8R"a#(a#)a??=)a#";
+const char u813[] = u8R"a#(??)a??=??)a#";
+const char u814[] = u8R"??/(x)??/
+";)??/";
+const char u815[] = u8R"??/(??)??/
+";)??/";
+const char u816[] = u8R"??(??)??";
+const char u817[] = u8R"?(?)??)?";
+const char u818[] = u8R"??(??)??)??)??";
+
 const char16_t u00[] = uR"??=??(??<??>??)??'??!??-\
 (a)#[{}]^|~";
 )??=??";
@@ -211,6 +252,25 @@ main (void)
   TEST (s16, "??");
   TEST (s17, "?)??");
   TEST (s18, "??"")??"")??");
+  TEST (u800, u8"??""<??"">??"")??""'??""!??""-\\\n(a)#[{}]^|~\";\n");
+  TEST (u801, u8"\n)\\\na\"\n");
+  TEST (u802, u8"\n)a\\\n\"\n");
+  TEST (u803, u8"\n)a\\\nb\"\n");
+  TEST (u804, u8"x");
+  TEST (u805, u8"abc");
+  TEST (u806, u8"abc");
+  TEST (u807, u8"??"")\\\nabc\";");
+  TEST (u808, u8"de)\\\ndef\";");
+  TEST (u809, u8"??"")\\\na\"\n");
+  TEST (u810, u8"??"")a\\\n\"\n");
+  TEST (u811, u8"??"")a\\\nb\"\n");
+  TEST (u812, u8"a#)a??""=");
+  TEST (u813, u8"??"")a??""=??");
+  TEST (u814, u8"x)??""/\n\";");
+  TEST (u815, u8"??"")??""/\n\";");
+  TEST (u816, u8"??");
+  TEST (u817, u8"?)??");
+  TEST (u818, u8"??"")??"")??");
   TEST (u00, u"??""<??"">??"")??""'??""!??""-\\\n(a)#[{}]^|~\";\n");
   TEST (u01, u"\n)\\\na\"\n");
   TEST (u02, u"\n)a\\\n\"\n");
diff --git a/gcc/testsuite/c-c++-common/raw-string-15.c b/gcc/testsuite/c-c++-common/raw-string-15.c
index 9dfdaabd87d..1d101dc8393 100644
--- a/gcc/testsuite/c-c++-common/raw-string-15.c
+++ b/gcc/testsuite/c-c++-common/raw-string-15.c
@@ -62,6 +62,47 @@ const char s16[] = R"??(??)??";
 const char s17[] = R"?(?)??)?";
 const char s18[] = R"??(??)??)??)??";
 
+const char u800[] = u8R"??=??(??<??>??)??'??!??-\
+(a)#[{}]^|~";
+)??=??";
+const char u801[] = u8R"a(
+)\
+a"
+)a";
+const char u802[] = u8R"a(
+)a\
+"
+)a";
+const char u803[] = u8R"ab(
+)a\
+b"
+)ab";
+const char u804[] = u8R"a??/(x)a??/";
+const char u805[] = u8R"abcdefghijklmn??(abc)abcdefghijklmn??";
+const char u806[] = u8R"abcdefghijklm??/(abc)abcdefghijklm??/";
+const char u807[] = u8R"abc(??)\
+abc";)abc";
+const char u808[] = u8R"def(de)\
+def";)def";
+const char u809[] = u8R"a(??)\
+a"
+)a";
+const char u810[] = u8R"a(??)a\
+"
+)a";
+const char u811[] = u8R"ab(??)a\
+b"
+)ab";
+const char u812[] = u8R"a#(a#)a??=)a#";
+const char u813[] = u8R"a#(??)a??=??)a#";
+const char u814[] = u8R"??/(x)??/
+";)??/";
+const char u815[] = u8R"??/(??)??/
+";)??/";
+const char u816[] = u8R"??(??)??";
+const char u817[] = u8R"?(?)??)?";
+const char u818[] = u8R"??(??)??)??)??";
+
 const char16_t u00[] = uR"??=??(??<??>??)??'??!??-\
 (a)#[{}]^|~";
 )??=??";
@@ -211,6 +252,25 @@ main (void)
   TEST (s16, "??");
   TEST (s17, "?)??");
   TEST (s18, "??"")??"")??");
+  TEST (u800, u8"??""<??"">??"")??""'??""!??""-\\\n(a)#[{}]^|~\";\n");
+  TEST (u801, u8"\n)\\\na\"\n");
+  TEST (u802, u8"\n)a\\\n\"\n");
+  TEST (u803, u8"\n)a\\\nb\"\n");
+  TEST (u804, u8"x");
+  TEST (u805, u8"abc");
+  TEST (u806, u8"abc");
+  TEST (u807, u8"??"")\\\nabc\";");
+  TEST (u808, u8"de)\\\ndef\";");
+  TEST (u809, u8"??"")\\\na\"\n");
+  TEST (u810, u8"??"")a\\\n\"\n");
+  TEST (u811, u8"??"")a\\\nb\"\n");
+  TEST (u812, u8"a#)a??""=");
+  TEST (u813, u8"??"")a??""=??");
+  TEST (u814, u8"x)??""/\n\";");
+  TEST (u815, u8"??"")??""/\n\";");
+  TEST (u816, u8"??");
+  TEST (u817, u8"?)??");
+  TEST (u818, u8"??"")??"")??");
   TEST (u00, u"??""<??"">??"")??""'??""!??""-\\\n(a)#[{}]^|~\";\n");
   TEST (u01, u"\n)\\\na\"\n");
   TEST (u02, u"\n)a\\\n\"\n");
diff --git a/libiberty/cp-demangle.c b/libiberty/cp-demangle.c
index ddcd3be6b8f..b34b4856922 100644
--- a/libiberty/cp-demangle.c
+++ b/libiberty/cp-demangle.c
@@ -2364,9 +2364,10 @@ cplus_demangle_builtin_types[D_BUILTIN_TYPE_COUNT] =
   /* 27 */ { NL ("decimal64"),	NL ("decimal64"),	D_PRINT_DEFAULT },
   /* 28 */ { NL ("decimal128"),	NL ("decimal128"),	D_PRINT_DEFAULT },
   /* 29 */ { NL ("half"),	NL ("half"),		D_PRINT_FLOAT },
-  /* 30 */ { NL ("char16_t"),	NL ("char16_t"),	D_PRINT_DEFAULT },
-  /* 31 */ { NL ("char32_t"),	NL ("char32_t"),	D_PRINT_DEFAULT },
-  /* 32 */ { NL ("decltype(nullptr)"),	NL ("decltype(nullptr)"),
+  /* 30 */ { NL ("char8_t"),	NL ("char8_t"),		D_PRINT_DEFAULT },
+  /* 31 */ { NL ("char16_t"),	NL ("char16_t"),	D_PRINT_DEFAULT },
+  /* 32 */ { NL ("char32_t"),	NL ("char32_t"),	D_PRINT_DEFAULT },
+  /* 33 */ { NL ("decltype(nullptr)"),	NL ("decltype(nullptr)"),
 	     D_PRINT_DEFAULT },
 };
 
@@ -2654,14 +2655,19 @@ cplus_demangle_type (struct d_info *di)
 	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[29]);
 	  di->expansion += ret->u.s_builtin.type->len;
 	  break;
+	case 'u':
+	  /* char8_t */
+	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[30]);
+	  di->expansion += ret->u.s_builtin.type->len;
+	  break;
 	case 's':
 	  /* char16_t */
-	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[30]);
+	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[31]);
 	  di->expansion += ret->u.s_builtin.type->len;
 	  break;
 	case 'i':
 	  /* char32_t */
-	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[31]);
+	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[32]);
 	  di->expansion += ret->u.s_builtin.type->len;
 	  break;
 
@@ -2687,7 +2693,7 @@ cplus_demangle_type (struct d_info *di)
 
         case 'n':
           /* decltype(nullptr) */
-	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[32]);
+	  ret = d_make_builtin_type (di, &cplus_demangle_builtin_types[33]);
 	  di->expansion += ret->u.s_builtin.type->len;
 	  break;
 
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-wstring2.C b/gcc/testsuite/g++.dg/cpp0x/constexpr-wstring2.C
index 4055e0ee8ec..b878918a9f3 100644
--- a/gcc/testsuite/g++.dg/cpp0x/constexpr-wstring2.C
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-wstring2.C
@@ -4,3 +4,4 @@
 constexpr wchar_t c1 = L"hi"[3];	// { dg-error "array subscript" }
 constexpr char16_t c2 = u"hi"[3];	// { dg-error "array subscript" }
 constexpr char32_t c3 = U"hi"[3];	// { dg-error "array subscript" }
+constexpr char c4 = u8"hi"[3];		// { dg-error "array subscript" }
diff --git a/gcc/testsuite/g++.dg/cpp0x/udlit-implicit-conv-neg-char8_t.C b/gcc/testsuite/g++.dg/cpp0x/udlit-implicit-conv-neg-char8_t.C
new file mode 100644
index 00000000000..b917b5f6b90
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/udlit-implicit-conv-neg-char8_t.C
@@ -0,0 +1,81 @@
+// { dg-options "-std=c++17 -fchar8_t" }
+
+#include <cstddef>
+
+int operator"" _bar (long double);
+
+double operator"" _foo (long long unsigned);
+
+int i = 12_bar; // { dg-error "unable to find numeric literal operator|with|argument" }
+
+double d = 1.2_foo; // { dg-error "unable to find numeric literal operator|with|argument" }
+
+int operator"" _char(char);
+
+int operator"" _char8_t(char8_t);
+
+int operator"" _wchar_t(wchar_t);
+
+int operator"" _char16_t(char16_t);
+
+int operator"" _char32_t(char32_t);
+
+int cwcx = 'c'_wchar_t; // { dg-error "unable to find character literal operator|with|argument" }
+int cc8  = 'c'_char8_t; // { dg-error "unable to find character literal operator|with|argument" }
+int cc16 = 'c'_char16_t; // { dg-error "unable to find character literal operator|with|argument" }
+int cc32 = 'c'_char32_t; // { dg-error "unable to find character literal operator|with|argument" }
+
+int wccx = L'c'_char; // { dg-error "unable to find character literal operator|with|argument" }
+int wcc8 = L'c'_char8_t; // { dg-error "unable to find character literal operator|with|argument" }
+int wcc16 = L'c'_char16_t; // { dg-error "unable to find character literal operator|with|argument" }
+int wcc32 = L'c'_char32_t; // { dg-error "unable to find character literal operator|with|argument" }
+
+int c8c  = u8'c'_char; // { dg-error "unable to find character literal operator|with|argument" }
+int c8wc = u8'c'_wchar_t; // { dg-error "unable to find character literal operator|with|argument" }
+int c8c16 = u8'c'_char16_t; // { dg-error "unable to find character literal operator|with|argument" }
+int c8c32 = u8'c'_char32_t; // { dg-error "unable to find character literal operator|with|argument" }
+
+int c16c = u'c'_char; // { dg-error "unable to find character literal operator|with|argument" }
+int c16c8 = u'c'_char8_t; // { dg-error "unable to find character literal operator|with|argument" }
+int c16wc = u'c'_wchar_t; // { dg-error "unable to find character literal operator|with|argument" }
+int c16c32 = u'c'_char32_t; // { dg-error "unable to find character literal operator|with|argument" }
+
+int c32c = U'c'_char; // { dg-error "unable to find character literal operator|with|argument" }
+int c32c8 = U'c'_char8_t; // { dg-error "unable to find character literal operator|with|argument" }
+int c32wc = U'c'_wchar_t; // { dg-error "unable to find character literal operator|with|argument" }
+int c32c16 = U'c'_char16_t; // { dg-error "unable to find character literal operator|with|argument" }
+
+int operator"" _char_str(const char*, std::size_t);
+
+int operator"" _wchar_t_str(const wchar_t*, std::size_t);
+
+int operator"" _char8_t_str(const char8_t*, std::size_t);
+
+int operator"" _char16_t_str(const char16_t*, std::size_t);
+
+int operator"" _char32_t_str(const char32_t*, std::size_t);
+
+int strwstr = "str"_wchar_t_str; // { dg-error "unable to find string literal operator|with|arguments" }
+int strstr8 = "str"_char8_t_str; // { dg-error "unable to find string literal operator|with|arguments" }
+int strstr16 = "str"_char16_t_str; // { dg-error "unable to find string literal operator|with|arguments" }
+int strstr32 = "str"_char32_t_str; // { dg-error "unable to find string literal operator|with|arguments" }
+
+int str8str = u8"str"_char_str; // { dg-error "unable to find string literal operator|with|arguments" }
+int str8wstr = u8"str"_wchar_t_str; // { dg-error "unable to find string literal operator|with|arguments" }
+int str8str16 = u8"str"_char16_t_str; // { dg-error "unable to find string literal operator|with|arguments" }
+int str8str32 = u8"str"_char32_t_str; // { dg-error "unable to find string literal operator|with|arguments" }
+
+int wstrstr = L"str"_char_str; // { dg-error "unable to find string literal operator|with|arguments" }
+int wstrstr8 = L"str"_char8_t_str; // { dg-error "unable to find string literal operator|with|arguments" }
+int wstrstr16 = L"str"_char16_t_str; // { dg-error "unable to find string literal operator|with|arguments" }
+int wstrstr32 = L"str"_char32_t_str; // { dg-error "unable to find string literal operator|with|arguments" }
+
+int str16str = u"str"_char_str; // { dg-error "unable to find string literal operator|with|arguments" }
+int str16wstr = u"str"_wchar_t_str; // { dg-error "unable to find string literal operator|with|arguments" }
+int str16str8 = u"str"_char8_t_str; // { dg-error "unable to find string literal operator|with|arguments" }
+int str16str32 = u"str"_char32_t_str; // { dg-error "unable to find string literal operator|with|arguments" }
+
+int str32str = U"str"_char_str; // { dg-error "unable to find string literal operator|with|arguments" }
+int str32wstr = U"str"_wchar_t_str; // { dg-error "unable to find string literal operator|with|arguments" }
+int str32str8 = U"str"_char8_t_str; // { dg-error "unable to find string literal operator|with|arguments" }
+int str32str16 = U"str"_char16_t_str; // { dg-error "unable to find string literal operator|with|arguments" }
diff --git a/gcc/testsuite/g++.dg/cpp0x/udlit-resolve-char8_t.C b/gcc/testsuite/g++.dg/cpp0x/udlit-resolve-char8_t.C
new file mode 100644
index 00000000000..19cbd519a86
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/udlit-resolve-char8_t.C
@@ -0,0 +1,38 @@
+// { dg-options "-std=c++17 -fchar8_t" }
+
+#include <cstddef>
+#include <cassert>
+
+int operator"" _foo(const char*)                  { return 0; }
+int operator"" _foo(unsigned long long int)       { return 1; }
+int operator"" _foo(long double)                  { return 2; }
+int operator"" _foo(char)                         { return 3; }
+int operator"" _foo(wchar_t)                      { return 4; }
+int operator"" _foo(char8_t)                      { return 5; }
+int operator"" _foo(char16_t)                     { return 6; }
+int operator"" _foo(char32_t)                     { return 7; }
+int operator"" _foo(const char*, std::size_t)     { return 8; }
+int operator"" _foo(const wchar_t*, std::size_t)  { return 9; }
+int operator"" _foo(const char8_t*, std::size_t)  { return 10; }
+int operator"" _foo(const char16_t*, std::size_t) { return 11; }
+int operator"" _foo(const char32_t*, std::size_t) { return 12; }
+template<char...> int operator"" _foo2()          { return 20; }
+int operator"" _foo2(unsigned long long int)      { return 21; }
+
+int
+main()
+{
+  assert(123_foo == 1);
+  assert(0.123_foo == 2);
+  assert('c'_foo == 3);
+  assert(L'c'_foo == 4);
+  assert(u8'c'_foo == 5);
+  assert(u'c'_foo == 6);
+  assert(U'c'_foo == 7);
+  assert("abc"_foo == 8);
+  assert(L"abc"_foo == 9);
+  assert(u8"abc"_foo == 10);
+  assert(u"abc"_foo == 11);
+  assert(U"abc"_foo == 12);
+  assert(123_foo2 == 21);
+}
diff --git a/gcc/testsuite/g++.dg/cpp0x/udlit-string-length.C b/gcc/testsuite/g++.dg/cpp0x/udlit-string-length.C
index acfda45d5fe..c76b0af9a28 100644
--- a/gcc/testsuite/g++.dg/cpp0x/udlit-string-length.C
+++ b/gcc/testsuite/g++.dg/cpp0x/udlit-string-length.C
@@ -9,6 +9,14 @@ operator"" _len(const char*, size_type len)
   return len;
 }
 
+#if __cpp_char8_t
+constexpr size_type
+operator"" _len(const char8_t*, size_type len)
+{
+  return len;
+}
+#endif
+
 constexpr size_type
 operator"" _len(const wchar_t*, size_type len)
 {
diff --git a/gcc/testsuite/g++.dg/cpp0x/udlit-string-literal.C b/gcc/testsuite/g++.dg/cpp0x/udlit-string-literal.C
index 734a0f38ad0..ab65dd08714 100644
--- a/gcc/testsuite/g++.dg/cpp0x/udlit-string-literal.C
+++ b/gcc/testsuite/g++.dg/cpp0x/udlit-string-literal.C
@@ -7,7 +7,9 @@
 using namespace my_string_literals;
 
 decltype("Hello, World!"s) s;
+#if !__cpp_char8_t == !__cpp_lib_char8_t
 decltype(u8"Hello, World!"s) s8;
+#endif
 decltype(L"Hello, World!"s) ws;
 decltype(u"Hello, World!"s) s16;
 decltype(U"Hello, World!"s) s32;
diff --git a/gcc/testsuite/g++.dg/cpp1z/udlit-utf8char.C b/gcc/testsuite/g++.dg/cpp1z/udlit-utf8char.C
index 0e921963835..093e32345cd 100644
--- a/gcc/testsuite/g++.dg/cpp1z/udlit-utf8char.C
+++ b/gcc/testsuite/g++.dg/cpp1z/udlit-utf8char.C
@@ -1,7 +1,9 @@
 // { dg-do compile { target c++17 } }
 
+typedef decltype(u8'c') u8_char_t;
+
 constexpr int
-operator""_foo(char c)
+operator""_foo(u8_char_t c)
 { return c * 100; }
 
 auto cc = u8'8'_foo;
diff --git a/gcc/testsuite/g++.dg/cpp1z/utf8.C b/gcc/testsuite/g++.dg/cpp1z/utf8.C
index e08fbb9c86e..ed413f30976 100644
--- a/gcc/testsuite/g++.dg/cpp1z/utf8.C
+++ b/gcc/testsuite/g++.dg/cpp1z/utf8.C
@@ -6,7 +6,11 @@
 auto c = 'c';
 auto u8c = u8'c';
 
+#if __cpp_char8_t
+static_assert(!std::experimental::is_same_v<decltype(u8c), decltype(c)>, "");
+#else
 static_assert(std::experimental::is_same_v<decltype(u8c), decltype(c)>, "");
+#endif
 
 auto u8s = u8"c";
 auto x = u8s[0];
diff --git a/gcc/testsuite/g++.dg/cpp2a/char8_t1.C b/gcc/testsuite/g++.dg/cpp2a/char8_t1.C
new file mode 100644
index 00000000000..aa0860b9f63
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/char8_t1.C
@@ -0,0 +1,5 @@
+// P0482R6
+// { dg-do compile }
+// { dg-options "-std=c++2a" }
+
+char8_t c8;
diff --git a/gcc/testsuite/g++.dg/cpp2a/char8_t2.C b/gcc/testsuite/g++.dg/cpp2a/char8_t2.C
new file mode 100644
index 00000000000..71eea7952d3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/char8_t2.C
@@ -0,0 +1,5 @@
+// P0482R6
+// { dg-do compile }
+// { dg-options "-std=c++2a -fno-char8_t" }
+
+char8_t c8; // { dg-error "does not name a type" }
diff --git a/gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C b/gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C
index b80cc342364..8e1ea48bb1d 100644
--- a/gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C
+++ b/gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C
@@ -471,3 +471,11 @@
 #else
 #  error "__has_cpp_attribute"
 #endif
+
+// C++2A features:
+
+#ifndef __cpp_char8_t
+#  error "__cpp_char8_t"
+#elif __cpp_char8_t != 201811
+#  error "__cpp_char8_t != 201811"
+#endif
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-aliasing-1.C b/gcc/testsuite/g++.dg/ext/char8_t-aliasing-1.C
new file mode 100644
index 00000000000..9252ef9dfa6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-aliasing-1.C
@@ -0,0 +1,8 @@
+// Test that char8_t does not alias with other types when -fchar8_t is enabled.
+// { dg-do compile }
+// { dg-options "-fstrict-aliasing -Wstrict-aliasing=1 -fchar8_t" }
+
+extern long l;
+char8_t* f() {
+  return (char8_t*)&l; // { dg-warning "dereferencing type-punned pointer might break strict-aliasing rules" "char8_t" }
+}
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C
new file mode 100644
index 00000000000..8ed85ccfdcd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C
@@ -0,0 +1,12 @@
+// Test that UTF-8 character literals have type char if -fchar8_t is not enabled.
+// { dg-do compile }
+// { dg-options "-std=c++17 -fno-char8_t" }
+
+template<typename T1, typename T2>
+  struct is_same
+  { static const bool value = false; };
+template<typename T>
+  struct is_same<T, T>
+  { static const bool value = true; };
+
+static_assert(is_same<decltype(u8'x'), char>::value, "Error");
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C
new file mode 100644
index 00000000000..7861736689c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C
@@ -0,0 +1,12 @@
+// Test that UTF-8 character literals have type char8_t if -fchar8_t is enabled.
+// { dg-do compile }
+// { dg-options "-std=c++17 -fchar8_t" }
+
+template<typename T1, typename T2>
+  struct is_same
+  { static const bool value = false; };
+template<typename T>
+  struct is_same<T, T>
+  { static const bool value = true; };
+
+static_assert(is_same<decltype(u8'x'), char8_t>::value, "Error");
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-deduction-1.C b/gcc/testsuite/g++.dg/ext/char8_t-deduction-1.C
new file mode 100644
index 00000000000..27f19fe2dc9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-deduction-1.C
@@ -0,0 +1,30 @@
+// Test that char is deduced for UTF-8 character and string literals when
+// -fchar8_t is not in effect.
+// { dg-do compile }
+// { dg-options "-std=c++17 -fno-char8_t" }
+
+template<typename T1, typename T2>
+  struct is_same
+  { static const bool value = false; };
+
+template<typename T>
+  struct is_same<T, T>
+  { static const bool value = true; };
+
+template<typename T1, typename T2, typename T3>
+void ft(T1, T2, T3 &) {
+  static_assert(is_same<T1, char>::value, "Error");
+  static_assert(is_same<T2, const char*>::value, "Error");
+  static_assert(is_same<T3, const char[2]>::value, "Error");
+}
+
+auto x = (ft(u8'x', u8"x", u8"x"),0);
+
+auto c8 = u8'x';
+static_assert(is_same<decltype(c8), char>::value, "Error");
+
+auto c8p = u8"x";
+static_assert(is_same<decltype(c8p), const char*>::value, "Error");
+
+auto &c8a = u8"x";
+static_assert(is_same<decltype(c8a), const char(&)[2]>::value, "Error");
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-deduction-2.C b/gcc/testsuite/g++.dg/ext/char8_t-deduction-2.C
new file mode 100644
index 00000000000..1daf2969d61
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-deduction-2.C
@@ -0,0 +1,30 @@
+// Test that char8_t is deduced for UTF-8 character and string literals when
+// -fchar8_t is in effect.
+// { dg-do compile }
+// { dg-options "-std=c++17 -fchar8_t" }
+
+template<typename T1, typename T2>
+  struct is_same
+  { static const bool value = false; };
+
+template<typename T>
+  struct is_same<T, T>
+  { static const bool value = true; };
+
+template<typename T1, typename T2, typename T3>
+void ft(T1, T2, T3 &) {
+  static_assert(is_same<T1, char8_t>::value, "Error");
+  static_assert(is_same<T2, const char8_t*>::value, "Error");
+  static_assert(is_same<T3, const char8_t[2]>::value, "Error");
+}
+
+auto x = (ft(u8'x', u8"x", u8"x"),0);
+
+auto c8 = u8'x';
+static_assert(is_same<decltype(c8), char8_t>::value, "Error");
+
+auto c8p = u8"x";
+static_assert(is_same<decltype(c8p), const char8_t*>::value, "Error");
+
+auto &c8a = u8"x";
+static_assert(is_same<decltype(c8a), const char8_t(&)[2]>::value, "Error");
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-feature-test-macro-1.C b/gcc/testsuite/g++.dg/ext/char8_t-feature-test-macro-1.C
new file mode 100644
index 00000000000..6107cb61ecc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-feature-test-macro-1.C
@@ -0,0 +1,8 @@
+// Test that predefined feature test macros are not present when -fchar8_t is
+// not enabled.
+// { dg-do compile }
+// { dg-options "-fno-char8_t" }
+
+#if defined(__cpp_char8_t)
+#error __cpp_char8_t is defined!
+#endif
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-feature-test-macro-2.C b/gcc/testsuite/g++.dg/ext/char8_t-feature-test-macro-2.C
new file mode 100644
index 00000000000..df1063f6aa1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-feature-test-macro-2.C
@@ -0,0 +1,10 @@
+// Test that predefined feature test macros are present when -fchar8_t is
+// enabled.
+// { dg-do compile }
+// { dg-options "-fchar8_t" }
+
+#if !defined(__cpp_char8_t)
+#  error __cpp_char8_t is not defined!
+#elif __cpp_char8_t != 201811
+#  error __cpp_char8_t != 201811
+#endif
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-init-1.C b/gcc/testsuite/g++.dg/ext/char8_t-init-1.C
new file mode 100644
index 00000000000..e2fd67bac72
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-init-1.C
@@ -0,0 +1,21 @@
+// Test initialization from UTF-8 literals when -fchar8_t is not enabled.
+// { dg-do compile }
+// { dg-options "-std=c++17 -fno-char8_t" }
+
+char c1 = 'x';
+char c2 = u8'x';
+
+const char *pc1 = "x";
+const char *pc2 = u8"x";
+
+const char (&rca1)[2] = "x";
+const char (&rca2)[2] = u8"x";
+
+char ca1[] = "x";
+char ca2[] = u8"x";
+
+signed char sca1[] = "x";
+signed char sca2[] = u8"x";
+
+unsigned char uca1[] = "x";
+unsigned char uca2[] = u8"x";
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-init-2.C b/gcc/testsuite/g++.dg/ext/char8_t-init-2.C
new file mode 100644
index 00000000000..c713bc12266
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-init-2.C
@@ -0,0 +1,33 @@
+// Test initialization from UTF-8 literals when -fchar8_t is enabled.
+// { dg-do compile }
+// { dg-options "-std=c++17 -fchar8_t" }
+
+char c1 = 'x';
+char c2 = u8'x';
+char8_t c3 = 'x';
+char8_t c4 = u8'x';
+char8_t c5 = u'x';
+
+const char *pc1 = "x";
+const char *pc2 = u8"x"; // { dg-error "invalid conversion from .const char8_t.. to .const char.." "char8_t" }
+const char8_t *pc3 = "x"; // { dg-error "invalid conversion from .const char.. to .const char8_t.." "char8_t" }
+const char8_t *pc4 = u8"x";
+const char8_t *pc5 = u"x"; // { dg-error "cannot convert .const char16_t.. to .const char8_t.. in initialization" "char8_t" }
+
+const char (&rca1)[2] = "x";
+const char (&rca2)[2] = u8"x"; // { dg-error "invalid initialization of reference of type .const char ....... from expression of type .const char8_t ...." "char8_t" }
+const char8_t (&rca3)[2] = "x"; // { dg-error "invalid initialization of reference of type .const char8_t ....... from expression of type .const char ...." "char8_t" }
+const char8_t (&rca4)[2] = u8"x";
+const char8_t (&rca5)[2] = u"x"; // { dg-error "invalid initialization of reference of type .const char8_t ....... from expression of type .const char16_t ...." "char8_t" }
+
+char ca1[] = "x";
+char ca2[] = u8"x"; // { dg-error "from a string literal with type array of .char8_t." "char8_t" }
+char8_t ca3[] = "x"; // { dg-error "from a string literal with type array of .char." "char8_t" }
+char8_t ca4[] = u8"x";
+char8_t ca5[] = u"x"; // { dg-error "from a string literal with type array of .char16_t." "char8_t" }
+
+signed char sca1[] = "x";
+signed char sca2[] = u8"x"; // { dg-error "from a string literal with type array of .char8_t." "char8_t" }
+
+unsigned char uca1[] = "x";
+unsigned char uca2[] = u8"x"; // { dg-error "from a string literal with type array of .char8_t." "char8_t" }
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-keyword-1.C b/gcc/testsuite/g++.dg/ext/char8_t-keyword-1.C
new file mode 100644
index 00000000000..f2475094aa5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-keyword-1.C
@@ -0,0 +1,5 @@
+// Test that char8_t is not a keyword if -fchar8_t is not enabled.
+// { dg-do compile }
+// { dg-options "-fno-char8_t" }
+
+int char8_t;
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-keyword-2.C b/gcc/testsuite/g++.dg/ext/char8_t-keyword-2.C
new file mode 100644
index 00000000000..8c84e1e79dd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-keyword-2.C
@@ -0,0 +1,5 @@
+// Test that char8_t is recognized as a keyword if -fchar8_t is enabled.
+// { dg-do compile }
+// { dg-options "-fchar8_t" }
+
+int char8_t; /* { dg-error "multiple types in one declaration|declaration does not declare anything" "char8_t" } */
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-limits-1.C b/gcc/testsuite/g++.dg/ext/char8_t-limits-1.C
new file mode 100644
index 00000000000..0d6df34d23f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-limits-1.C
@@ -0,0 +1,9 @@
+// Test for unsignedness and that the max limit of char8_t is at least 0xFF
+// when -fchar8_t is enabled.
+// { dg-do compile }
+// { dg-options "-std=c++17 -fchar8_t" }
+
+static_assert(u8'\xFF' == 0xFF, "Error");
+static_assert(u8"\xFF"[0] == 0xFF, "Error");
+static_assert(char8_t(-1) >= 0, "Error");
+static_assert(char8_t{-1} >= 0, "Error"); // { dg-error "narrowing conversion of .-1. from .int. to .char8_t." "char8_t" }
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-overload-1.C b/gcc/testsuite/g++.dg/ext/char8_t-overload-1.C
new file mode 100644
index 00000000000..48aa44a6691
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-overload-1.C
@@ -0,0 +1,26 @@
+// Test overloading for UTF-8 literals when -fchar8_t is not in effect.
+// { dg-do compile }
+// { dg-options "-std=c++17 -fno-char8_t" }
+
+template<typename T1, typename T2>
+  struct is_same
+  { static const bool value = false; };
+
+template<typename T>
+  struct is_same<T, T>
+  { static const bool value = true; };
+
+int fc(char);
+long fc(unsigned char);
+static_assert(is_same<decltype(fc('x')), int>::value, "Error");
+static_assert(is_same<decltype(fc(u8'x')), int>::value, "Error");
+
+int fs(const char*);
+long fs(const unsigned char*);
+static_assert(is_same<decltype(fs("x")), int>::value, "Error");
+static_assert(is_same<decltype(fs(u8"x")), int>::value, "Error");
+
+int fr(const char(&)[2]);
+long fr(const unsigned char(&)[2]);
+static_assert(is_same<decltype(fr("x")), int>::value, "Error");
+static_assert(is_same<decltype(fr(u8"x")), int>::value, "Error");
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-overload-2.C b/gcc/testsuite/g++.dg/ext/char8_t-overload-2.C
new file mode 100644
index 00000000000..15e28cd6db2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-overload-2.C
@@ -0,0 +1,26 @@
+// Test overloading for UTF-8 literals when -fchar8_t is in effect.
+// { dg-do compile }
+// { dg-options "-std=c++17 -fchar8_t" }
+
+template<typename T1, typename T2>
+  struct is_same
+  { static const bool value = false; };
+
+template<typename T>
+  struct is_same<T, T>
+  { static const bool value = true; };
+
+int fc(char);
+long fc(char8_t);
+static_assert(is_same<decltype(fc('x')), int>::value, "Error");
+static_assert(is_same<decltype(fc(u8'x')), long>::value, "Error");
+
+int fs(const char*);
+long fs(const char8_t*);
+static_assert(is_same<decltype(fs("x")), int>::value, "Error");
+static_assert(is_same<decltype(fs(u8"x")), long>::value, "Error");
+
+int fr(const char(&)[2]);
+long fr(const char8_t(&)[2]);
+static_assert(is_same<decltype(fr("x")), int>::value, "Error");
+static_assert(is_same<decltype(fr(u8"x")), long>::value, "Error");
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-predefined-macros-1.C b/gcc/testsuite/g++.dg/ext/char8_t-predefined-macros-1.C
new file mode 100644
index 00000000000..36d411b20db
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-predefined-macros-1.C
@@ -0,0 +1,12 @@
+// Test that char8_t related predefined macros are not present when -fchar8_t is
+// not enabled.
+// { dg-do compile }
+// { dg-options "-fno-char8_t" }
+
+#if defined(__CHAR8_TYPE__)
+#error __CHAR8_TYPE__ is defined!
+#endif
+
+#if defined(__GCC_ATOMIC_CHAR8_T_LOCK_FREE)
+#error __GCC_ATOMIC_CHAR8_T_LOCK_FREE is defined!
+#endif
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-predefined-macros-2.C b/gcc/testsuite/g++.dg/ext/char8_t-predefined-macros-2.C
new file mode 100644
index 00000000000..06d9b246794
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-predefined-macros-2.C
@@ -0,0 +1,12 @@
+// Test that char8_t related predefined macros are present when -fchar8_t is
+// enabled.
+// { dg-do compile }
+// { dg-options "-fchar8_t" }
+
+#if !defined(__CHAR8_TYPE__)
+#error __CHAR8_TYPE__ is not defined!
+#endif
+
+#if !defined(__GCC_ATOMIC_CHAR8_T_LOCK_FREE)
+#error __GCC_ATOMIC_CHAR8_T_LOCK_FREE is not defined!
+#endif
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-sizeof-1.C b/gcc/testsuite/g++.dg/ext/char8_t-sizeof-1.C
new file mode 100644
index 00000000000..c4bc4cb3872
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-sizeof-1.C
@@ -0,0 +1,7 @@
+// Test sizeof for char8_t.
+// { dg-do compile }
+// { dg-options "-std=c++17 -fchar8_t" }
+
+static_assert(sizeof(u8'x') == 1);
+static_assert(sizeof(char8_t) == 1);
+static_assert(sizeof(__CHAR8_TYPE__) == 1);
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-specialization-1.C b/gcc/testsuite/g++.dg/ext/char8_t-specialization-1.C
new file mode 100644
index 00000000000..1c2fe360abc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-specialization-1.C
@@ -0,0 +1,8 @@
+// Test specialization for UTF-8 literals when -fchar8_t is not enabled.
+// { dg-do compile }
+// { dg-options "-std=c++17 -fno-char8_t" }
+
+template<auto> struct ct { static constexpr int dm = 1; };
+template<> struct ct<'x'> { static constexpr int dm = 2; };
+static_assert(ct<'x'>::dm == 2, "Error");
+static_assert(ct<u8'x'>::dm == 2, "Error");
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-specialization-2.C b/gcc/testsuite/g++.dg/ext/char8_t-specialization-2.C
new file mode 100644
index 00000000000..969e09ecc18
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-specialization-2.C
@@ -0,0 +1,17 @@
+// Test specialization for UTF-8 literals when -fchar8_t is enabled.
+// { dg-do compile }
+// { dg-options "-std=c++17 -fchar8_t" }
+
+template<auto> struct ct { static constexpr int dm = 1; };
+template<> struct ct<'x'> { static constexpr int dm = 2; };
+template<> struct ct<u8'x'> { static constexpr int dm = 3; };
+static_assert(ct<'x'>::dm == 2, "Error");
+static_assert(ct<u8'x'>::dm == 3, "Error");
+
+template<typename T, const T *> struct ct2 { static constexpr int dm = 4; };
+template<const char *P> struct ct2<char,P> { static constexpr int dm = 5; };
+template<const char8_t *P> struct ct2<char8_t,P> { static constexpr int dm = 6; };
+constexpr const char s[] = "x";
+constexpr const char8_t s8[] = u8"x";
+static_assert(ct2<char,s>::dm == 5, "Error");
+static_assert(ct2<char8_t,s8>::dm == 6, "Error");
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-string-literal-1.C b/gcc/testsuite/g++.dg/ext/char8_t-string-literal-1.C
new file mode 100644
index 00000000000..6cfb47be3a9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-string-literal-1.C
@@ -0,0 +1,12 @@
+// Test that UTF-8 string literals have type const char[] if -fchar8_t is not enabled.
+// { dg-do compile { target c++11 } }
+// { dg-options "-fno-char8_t" }
+
+template<typename T1, typename T2>
+  struct is_same
+  { static const bool value = false; };
+template<typename T>
+  struct is_same<T, T>
+  { static const bool value = true; };
+
+static_assert(is_same<decltype(u8""), const char(&)[1]>::value, "Error");
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-string-literal-2.C b/gcc/testsuite/g++.dg/ext/char8_t-string-literal-2.C
new file mode 100644
index 00000000000..f51df72d7ce
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-string-literal-2.C
@@ -0,0 +1,12 @@
+// Test that UTF-8 string literals have type const char8_t[] if -fchar8_t is enabled.
+// { dg-do compile { target c++11 } }
+// { dg-options "-fchar8_t" }
+
+template<typename T1, typename T2>
+  struct is_same
+  { static const bool value = false; };
+template<typename T>
+  struct is_same<T, T>
+  { static const bool value = true; };
+
+static_assert(is_same<decltype(u8""), const char8_t(&)[1]>::value, "Error");
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-type-specifier-1.C b/gcc/testsuite/g++.dg/ext/char8_t-type-specifier-1.C
new file mode 100644
index 00000000000..dac4a47eea3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-type-specifier-1.C
@@ -0,0 +1,5 @@
+// Test that char8_t is not a type specifier if -fchar8_t is not enabled.
+// { dg-do compile }
+// { dg-options "-fno-char8_t" }
+
+char8_t c8; /* { dg-error ".char8_t. does not name a type" "no-char8_t" } */
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-type-specifier-2.C b/gcc/testsuite/g++.dg/ext/char8_t-type-specifier-2.C
new file mode 100644
index 00000000000..ecc5d1c67c0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-type-specifier-2.C
@@ -0,0 +1,16 @@
+// Test that char8_t is recognized as a type specifier if -fchar8_t is enabled.
+// { dg-do compile }
+// { dg-options "-fchar8_t" }
+
+char8_t c8;
+
+signed char8_t         sc8;            /* { dg-error "signed" } */
+unsigned char8_t       uc8;            /* { dg-error "unsigned" } */
+
+short char8_t          shc8;           /* { dg-error "short" } */
+long char8_t           lgc8;           /* { dg-error "long" } */
+
+signed short char8_t   ssc8;           /* { dg-error "signed" } */
+signed long char8_t    slc8;           /* { dg-error "signed" } */
+unsigned short char8_t usc8;           /* { dg-error "unsigned" } */
+unsigned long char8_t  ulc8;           /* { dg-error "unsigned" } */
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-typedef-1.C b/gcc/testsuite/g++.dg/ext/char8_t-typedef-1.C
new file mode 100644
index 00000000000..b77d9a2e6c4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-typedef-1.C
@@ -0,0 +1,6 @@
+// Test that no error is issued for attempted char8_t typedef declarations
+// when -fchar8_t is not enabled.
+// { dg-do compile }
+// { dg-options "-fno-char8_t" }
+
+typedef unsigned char char8_t;
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-typedef-2.C b/gcc/testsuite/g++.dg/ext/char8_t-typedef-2.C
new file mode 100644
index 00000000000..bb20499c26e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-typedef-2.C
@@ -0,0 +1,6 @@
+// Test that an error is issued for attempted char8_t typedef declarations
+// when -fchar8_t is enabled.
+// { dg-do compile }
+// { dg-options "-fchar8_t" }
+
+typedef unsigned char char8_t; // { dg-error "redeclaration" }
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-udl-1.C b/gcc/testsuite/g++.dg/ext/char8_t-udl-1.C
new file mode 100644
index 00000000000..627c263bafe
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-udl-1.C
@@ -0,0 +1,19 @@
+// Test overloading for UTF-8 user defined literals when -fchar8_t is not in effect.
+// { dg-do compile }
+// { dg-options "-std=c++17 -fno-char8_t" }
+
+template<typename T1, typename T2>
+  struct is_same
+  { static const bool value = false; };
+
+template<typename T>
+  struct is_same<T, T>
+  { static const bool value = true; };
+
+int operator "" _udcl(char);
+static_assert(is_same<decltype('x'_udcl), int>::value, "Error");
+static_assert(is_same<decltype(u8'x'_udcl), int>::value, "Error");
+
+int operator "" _udsl(const char*, __SIZE_TYPE__);
+static_assert(is_same<decltype("x"_udsl), int>::value, "Error");
+static_assert(is_same<decltype(u8"x"_udsl), int>::value, "Error");
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-udl-2.C b/gcc/testsuite/g++.dg/ext/char8_t-udl-2.C
new file mode 100644
index 00000000000..74cc775e87c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/char8_t-udl-2.C
@@ -0,0 +1,21 @@
+// Test overloading for UTF-8 user defined literals when -fchar8_t is in effect.
+// { dg-do compile }
+// { dg-options "-std=c++17 -fchar8_t" }
+
+template<typename T1, typename T2>
+  struct is_same
+  { static const bool value = false; };
+
+template<typename T>
+  struct is_same<T, T>
+  { static const bool value = true; };
+
+int operator "" _udcl(char);
+long operator "" _udcl(char8_t);
+static_assert(is_same<decltype('x'_udcl), int>::value, "Error");
+static_assert(is_same<decltype(u8'x'_udcl), long>::value, "Error");
+
+int operator "" _udsl(const char*, __SIZE_TYPE__);
+long operator "" _udsl(const char8_t*, __SIZE_TYPE__);
+static_assert(is_same<decltype("x"_udsl), int>::value, "Error");
+static_assert(is_same<decltype(u8"x"_udsl), long>::value, "Error");
diff --git a/gcc/testsuite/g++.dg/ext/utf-array-short-wchar.C b/gcc/testsuite/g++.dg/ext/utf-array-short-wchar.C
index 2ce774abc4a..f37f93c8b73 100644
--- a/gcc/testsuite/g++.dg/ext/utf-array-short-wchar.C
+++ b/gcc/testsuite/g++.dg/ext/utf-array-short-wchar.C
@@ -3,34 +3,44 @@
 /* { dg-do compile { target c++11 } } */
 /* { dg-options "-fshort-wchar" } */
 
+#if __cpp_char8_t
+typedef char8_t u8_char_t;
+#else
+typedef char u8_char_t;
+#endif
+
 const char	s_0[]	= "ab";
-const char	s_1[]	= u"ab";	/* { dg-error "from wide string" } */
-const char	s_2[]	= U"ab";	/* { dg-error "from wide string" } */
-const char	s_3[]	= L"ab";	/* { dg-error "from wide string" } */
+const char	s_1[]	= u"ab";	/* { dg-error "from a string literal with type array of .char16_t." } */
+const char	s_2[]	= U"ab";	/* { dg-error "from a string literal with type array of .char32_t." } */
+const char	s_3[]	= L"ab";	/* { dg-error "from a string literal with type array of .wchar_t." } */
+const u8_char_t	s_4[]	= u8"ab";
 
-const char16_t	s16_0[]	= "ab";		/* { dg-error "from non-wide" } */
+const char16_t	s16_0[]	= "ab";		/* { dg-error "from a string literal with type array of .char." } */
 const char16_t	s16_1[]	= u"ab";
-const char16_t	s16_2[]	= U"ab";	/* { dg-error "from incompatible" } */
-const char16_t	s16_3[]	= L"ab";	/* { dg-error "from incompatible" } */
+const char16_t	s16_2[]	= U"ab";	/* { dg-error "from a string literal with type array of .char32_t." } */
+const char16_t	s16_3[]	= L"ab";	/* { dg-error "from a string literal with type array of .wchar_t." } */
+const char16_t	s16_4[]	= u8"ab";	/* { dg-error "from a string literal with type array of .char." } */
 
-const char16_t	s16_4[0] = u"ab";	/* { dg-error "chars is too long" } */
-const char16_t	s16_5[1] = u"ab";	/* { dg-error "chars is too long" } */
-const char16_t	s16_6[2] = u"ab";	/* { dg-error "chars is too long" } */
-const char16_t	s16_7[3] = u"ab";
-const char16_t	s16_8[4] = u"ab";
+const char16_t	s16_5[0] = u"ab";	/* { dg-error "chars is too long" } */
+const char16_t	s16_6[1] = u"ab";	/* { dg-error "chars is too long" } */
+const char16_t	s16_7[2] = u"ab";	/* { dg-error "chars is too long" } */
+const char16_t	s16_8[3] = u"ab";
+const char16_t	s16_9[4] = u"ab";
 
-const char32_t	s32_0[]	= "ab";		/* { dg-error "from non-wide" } */
-const char32_t	s32_1[]	= u"ab";	/* { dg-error "from incompatible" } */
+const char32_t	s32_0[]	= "ab";		/* { dg-error "from a string literal with type array of .char." } */
+const char32_t	s32_1[]	= u"ab";	/* { dg-error "from a string literal with type array of .char16_t." } */
 const char32_t	s32_2[]	= U"ab";
-const char32_t	s32_3[]	= L"ab";	/* { dg-error "from incompatible" } */
+const char32_t	s32_3[]	= L"ab";	/* { dg-error "from a string literal with type array of .wchar_t." } */
+const char32_t	s32_4[]	= u8"ab";	/* { dg-error "from a string literal with type array of .char." } */
 
-const char32_t	s32_4[0] = U"ab";	/* { dg-error "chars is too long" } */
-const char32_t	s32_5[1] = U"ab";	/* { dg-error "chars is too long" } */
-const char32_t	s32_6[2] = U"ab";	/* { dg-error "chars is too long" } */
-const char32_t	s32_7[3] = U"ab";
-const char32_t	s32_8[4] = U"ab";
+const char32_t	s32_5[0] = U"ab";	/* { dg-error "chars is too long" } */
+const char32_t	s32_6[1] = U"ab";	/* { dg-error "chars is too long" } */
+const char32_t	s32_7[2] = U"ab";	/* { dg-error "chars is too long" } */
+const char32_t	s32_8[3] = U"ab";
+const char32_t	s32_9[4] = U"ab";
 
-const wchar_t	sw_0[]	= "ab";		/* { dg-error "from non-wide" } */
-const wchar_t	sw_1[]	= u"ab";	/* { dg-error "from incompatible" } */
-const wchar_t	sw_2[]	= U"ab";	/* { dg-error "from incompatible" } */
+const wchar_t	sw_0[]	= "ab";		/* { dg-error "from a string literal with type array of .char." } */
+const wchar_t	sw_1[]	= u"ab";	/* { dg-error "from a string literal with type array of .char16_t." } */
+const wchar_t	sw_2[]	= U"ab";	/* { dg-error "from a string literal with type array of .char32_t." } */
 const wchar_t	sw_3[]	= L"ab";
+const wchar_t	sw_4[]	= u8"ab";	/* { dg-error "from a string literal with type array of .char." } */
diff --git a/gcc/testsuite/g++.dg/ext/utf-array.C b/gcc/testsuite/g++.dg/ext/utf-array.C
index 21e438693a2..0e403db7e05 100644
--- a/gcc/testsuite/g++.dg/ext/utf-array.C
+++ b/gcc/testsuite/g++.dg/ext/utf-array.C
@@ -3,34 +3,44 @@
 /* { dg-do compile { target c++11 } } */
 // { dg-options "" }
 
+#if __cpp_char8_t
+typedef char8_t u8_char_t;
+#else
+typedef char u8_char_t;
+#endif
+
 const char	s_0[]	= "ab";
-const char	s_1[]	= u"ab";	/* { dg-error "from wide string" } */
-const char	s_2[]	= U"ab";	/* { dg-error "from wide string" } */
-const char	s_3[]	= L"ab";	/* { dg-error "from wide string" } */
+const char	s_1[]	= u"ab";	/* { dg-error "from a string literal with type array of .char16_t." } */
+const char	s_2[]	= U"ab";	/* { dg-error "from a string literal with type array of .char32_t." } */
+const char	s_3[]	= L"ab";	/* { dg-error "from a string literal with type array of .wchar_t." } */
+const u8_char_t	s_4[]	= u8"ab";
 
-const char16_t	s16_0[]	= "ab";		/* { dg-error "from non-wide" } */
+const char16_t	s16_0[]	= "ab";		/* { dg-error "from a string literal with type array of .char." } */
 const char16_t	s16_1[]	= u"ab";
-const char16_t	s16_2[]	= U"ab";	/* { dg-error "from incompatible" } */
-const char16_t	s16_3[]	= L"ab";	/* { dg-error "from incompatible" } */
+const char16_t	s16_2[]	= U"ab";	/* { dg-error "from a string literal with type array of .char32_t." } */
+const char16_t	s16_3[]	= L"ab";	/* { dg-error "from a string literal with type array of .wchar_t." } */
+const char16_t	s16_4[]	= u8"ab";	/* { dg-error "from a string literal with type array of .char." } */
 
-const char16_t	s16_4[0] = u"ab";	/* { dg-error "chars is too long" } */
-const char16_t	s16_5[1] = u"ab";	/* { dg-error "chars is too long" } */
-const char16_t	s16_6[2] = u"ab";	/* { dg-error "chars is too long" } */
-const char16_t	s16_7[3] = u"ab";
-const char16_t	s16_8[4] = u"ab";
+const char16_t	s16_5[0] = u"ab";	/* { dg-error "chars is too long" } */
+const char16_t	s16_6[1] = u"ab";	/* { dg-error "chars is too long" } */
+const char16_t	s16_7[2] = u"ab";	/* { dg-error "chars is too long" } */
+const char16_t	s16_8[3] = u"ab";
+const char16_t	s16_9[4] = u"ab";
 
-const char32_t	s32_0[]	= "ab";		/* { dg-error "from non-wide" } */
-const char32_t	s32_1[]	= u"ab";	/* { dg-error "from incompatible" } */
+const char32_t	s32_0[]	= "ab";		/* { dg-error "from a string literal with type array of .char." } */
+const char32_t	s32_1[]	= u"ab";	/* { dg-error "from a string literal with type array of .char16_t." } */
 const char32_t	s32_2[]	= U"ab";
-const char32_t	s32_3[]	= L"ab";	/* { dg-error "from incompatible" } */
+const char32_t	s32_3[]	= L"ab";	/* { dg-error "from a string literal with type array of .wchar_t." } */
+const char32_t	s32_4[]	= u8"ab";	/* { dg-error "from a string literal with type array of .char." } */
 
-const char32_t	s32_4[0] = U"ab";	/* { dg-error "chars is too long" } */
-const char32_t	s32_5[1] = U"ab";	/* { dg-error "chars is too long" } */
-const char32_t	s32_6[2] = U"ab";	/* { dg-error "chars is too long" } */
-const char32_t	s32_7[3] = U"ab";
-const char32_t	s32_8[4] = U"ab";
+const char32_t	s32_5[0] = U"ab";	/* { dg-error "chars is too long" } */
+const char32_t	s32_6[1] = U"ab";	/* { dg-error "chars is too long" } */
+const char32_t	s32_7[2] = U"ab";	/* { dg-error "chars is too long" } */
+const char32_t	s32_8[3] = U"ab";
+const char32_t	s32_9[4] = U"ab";
 
-const wchar_t	sw_0[]	= "ab";		/* { dg-error "from non-wide" } */
-const wchar_t	sw_1[]	= u"ab";	/* { dg-error "from incompatible" } */
-const wchar_t	sw_2[]	= U"ab";	/* { dg-error "from incompatible" } */
+const wchar_t	sw_0[]	= "ab";		/* { dg-error "from a string literal with type array of .char." } */
+const wchar_t	sw_1[]	= u"ab";	/* { dg-error "from a string literal with type array of .char16_t." } */
+const wchar_t	sw_2[]	= U"ab";	/* { dg-error "from a string literal with type array of .char32_t." } */
 const wchar_t	sw_3[]	= L"ab";
+const wchar_t	sw_4[]	= u8"ab";	/* { dg-error "from a string literal with type array of .char." } */
diff --git a/gcc/testsuite/g++.dg/ext/utf-cvt-char8_t.C b/gcc/testsuite/g++.dg/ext/utf-cvt-char8_t.C
new file mode 100644
index 00000000000..0170b36da14
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/utf-cvt-char8_t.C
@@ -0,0 +1,39 @@
+/* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
+/* Test the char8_t promotion rules. */
+/* { dg-do compile { target c++11 } } */
+/* { dg-options "-fchar8_t -Wall -Wconversion -Wsign-conversion -Wsign-promo" } */
+
+extern void f_c (char);
+extern void fsc (signed char);
+extern void fuc (unsigned char);
+extern void f_s (short);
+extern void fss (signed short);
+extern void fus (unsigned short);
+extern void f_i (int);
+extern void fsi (signed int);
+extern void fui (unsigned int);
+extern void f_l (long);
+extern void fsl (signed long);
+extern void ful (unsigned long);
+extern void f_ll (long long);
+extern void fsll (signed long long);
+extern void full (unsigned long long);
+
+void m(char8_t c)
+{
+    f_c (c);	/* { dg-warning "change the sign" } */
+    fsc (c);	/* { dg-warning "change the sign" } */
+    fuc (c);
+    f_s (c);
+    fss (c);
+    fus (c);
+    f_i (c);
+    fsi (c);
+    fui (c);
+    f_l (c);
+    fsl (c);
+    ful (c);
+    f_ll (c);
+    fsll (c);
+    full (c);
+}
diff --git a/gcc/testsuite/g++.dg/ext/utf-cxx98.C b/gcc/testsuite/g++.dg/ext/utf-cxx98.C
index 365118e3964..ada97be5ef6 100644
--- a/gcc/testsuite/g++.dg/ext/utf-cxx98.C
+++ b/gcc/testsuite/g++.dg/ext/utf-cxx98.C
@@ -1,27 +1,33 @@
 /* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
 /* Expected errors for char16_t/char32_t in c++98. */
-/* Ensure u and U prefixes are parsed as separate tokens in c++98. */
+/* Ensure u8, u and U prefixes are parsed as separate tokens in c++98. */
 /* { dg-do compile } */
 /* { dg-options "-std=c++98" } */
 
 const static char16_t	c0	= 'a';	/* { dg-error "not name a type" } */
 const static char32_t	c1	= 'a';	/* { dg-error "not name a type" } */
 
-const unsigned short	c2	= u'a';	/* { dg-error "not declared" } */
-const unsigned long	c3	= U'a';	/* { dg-error "not declared" } */
+const unsigned short	c2	= u'a';		/* { dg-error "not declared" } */
+const unsigned long	c3	= U'a';		/* { dg-error "not declared" } */
+const unsigned char	c4	= u8'a';	/* { dg-error "not declared" } */
 
 #define u	1 +
 #define U	2 +
+#define u8	3 +
 
 const unsigned short	c5	= u'a';
 const unsigned long	c6	= U'a';
+const unsigned char	c7	= u8'a';
 
 #undef u
 #undef U
+#undef u8
 #define u	"a"
 #define U	"b"
+#define u8	"c"
 
 const void		*s0	= u"a";
 const void		*s1	= U"a";
+const void		*s2	= u8"a";
 
 int main () {}
diff --git a/gcc/testsuite/g++.dg/ext/utf-dflt.C b/gcc/testsuite/g++.dg/ext/utf-dflt.C
index c2b127d5dda..6bf020f7cdb 100644
--- a/gcc/testsuite/g++.dg/ext/utf-dflt.C
+++ b/gcc/testsuite/g++.dg/ext/utf-dflt.C
@@ -1,27 +1,33 @@
 /* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
 /* Expected errors for char16_t/char32_t in default std. */
-/* Ensure u and U prefixes are parsed as separate tokens in default std. */
+/* Ensure u8, u and U prefixes are parsed as separate tokens in default std. */
 /* { dg-do compile } */
 /* { dg-options "-std=c++98" } */
 
 const static char16_t	c0	= 'a';	/* { dg-error "not name a type" } */
 const static char32_t	c1	= 'a';	/* { dg-error "not name a type" } */
 
-const unsigned short	c2	= u'a';	/* { dg-error "not declared" } */
-const unsigned long	c3	= U'a';	/* { dg-error "not declared" } */
+const unsigned short	c2	= u'a';		/* { dg-error "not declared" } */
+const unsigned long	c3	= U'a';		/* { dg-error "not declared" } */
+const unsigned char	c4	= u8'a';	/* { dg-error "not declared" } */
 
 #define u	1 +
 #define U	2 +
+#define u8	3 +
 
-const unsigned short	c4	= u'a';
-const unsigned long	c5	= U'a';
+const unsigned short	c5	= u'a';
+const unsigned long	c6	= U'a';
+const unsigned char	c7	= u8'a';
 
 #undef u
 #undef U
+#undef u8
 #define u	"a"
 #define U	"b"
+#define u8	"c"
 
 const void		*s0	= u"a";
 const void		*s1	= U"a";
+const void		*s2	= u8"a";
 
 int main () {}
diff --git a/gcc/testsuite/g++.dg/ext/utf-gnuxx98.C b/gcc/testsuite/g++.dg/ext/utf-gnuxx98.C
index b3be121e2dc..dc9a814c161 100644
--- a/gcc/testsuite/g++.dg/ext/utf-gnuxx98.C
+++ b/gcc/testsuite/g++.dg/ext/utf-gnuxx98.C
@@ -1,27 +1,33 @@
 /* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
 /* Expected errors for char16_t/char32_t in gnu++98. */
-/* Ensure u and U prefixes are parsed as separate tokens in gnu++98. */
+/* Ensure u8, u and U prefixes are parsed as separate tokens in gnu++98. */
 /* { dg-do compile } */
 /* { dg-options "-std=gnu++98" } */
 
 const static char16_t	c0	= 'a';	/* { dg-error "not name a type" } */
 const static char32_t	c1	= 'a';	/* { dg-error "not name a type" } */
 
-const unsigned short	c2	= u'a';	/* { dg-error "not declared" } */
-const unsigned long	c3	= U'a';	/* { dg-error "not declared" } */
+const unsigned short	c2	= u'a';		/* { dg-error "not declared" } */
+const unsigned long	c3	= U'a';		/* { dg-error "not declared" } */
+const unsigned char	c4	= u8'a';	/* { dg-error "not declared" } */
 
 #define u	1 +
 #define U	2 +
+#define u8	3 +
 
 const unsigned short	c5	= u'a';
 const unsigned long	c6	= U'a';
+const unsigned char	c7	= u8'a';
 
 #undef u
 #undef U
+#undef u8
 #define u	"a"
 #define U	"b"
+#define u8	"c"
 
 const void		*s0	= u"a";
 const void		*s1	= U"a";
+const void		*s2	= u8"a";
 
 int main () {}
diff --git a/gcc/testsuite/g++.dg/ext/utf-type-char8_t.C b/gcc/testsuite/g++.dg/ext/utf-type-char8_t.C
new file mode 100644
index 00000000000..a7d8b16a285
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/utf-type-char8_t.C
@@ -0,0 +1,11 @@
+/* Ensure that __CHAR8_TYPE__ exists and matches the underlying type. */
+/* { dg-do run { target c++11 } } */
+/* { dg-options "-fchar8_t -Wall -Werror" } */
+
+extern "C" void abort (void);
+
+int main ()
+{
+    if (sizeof (__CHAR8_TYPE__) != sizeof (char8_t))
+	abort();
+}
diff --git a/gcc/testsuite/g++.dg/ext/utf8-1.C b/gcc/testsuite/g++.dg/ext/utf8-1.C
index a1a3518a497..089465fa518 100644
--- a/gcc/testsuite/g++.dg/ext/utf8-1.C
+++ b/gcc/testsuite/g++.dg/ext/utf8-1.C
@@ -2,15 +2,20 @@
 // { dg-require-iconv "ISO-8859-2" }
 // { dg-options "-fexec-charset=ISO-8859-2" }
 
+#if __cpp_char8_t
+typedef char8_t u8_char_t;
+#else
+typedef char u8_char_t;
+#endif
+
 const char *str1 = "h\u00e1\U0000010Dky ";
 const char *str2 = "\u010d\u00E1rky\n";
-const char *str3 = u8"h\u00e1\U0000010Dky ";
-const char *str4 = u8"\u010d\u00E1rky\n";
+const u8_char_t *str3 = u8"h\u00e1\U0000010Dky ";
+const u8_char_t *str4 = u8"\u010d\u00E1rky\n";
 const char *str5 = "h\u00e1\U0000010Dky " "\u010d\u00E1rky\n";
-const char *str6 = u8"h\u00e1\U0000010Dky " "\u010d\u00E1rky\n";
-const char *str7 = "h\u00e1\U0000010Dky " u8"\u010d\u00E1rky\n";
-#define u8
-const char *str8 = u8"h\u00e1\U0000010Dky " u8"\u010d\u00E1rky\n";
+const u8_char_t *str6 = u8"h\u00e1\U0000010Dky " "\u010d\u00E1rky\n";
+const u8_char_t *str7 = "h\u00e1\U0000010Dky " u8"\u010d\u00E1rky\n";
+const u8_char_t *str8 = u8"h\u00e1\U0000010Dky " u8"\u010d\u00E1rky\n";
 
 const char latin2_1[] = "\x68\xe1\xe8\x6b\x79\x20";
 const char latin2_2[] = "\xe8\xe1\x72\x6b\x79\n";
@@ -22,16 +27,16 @@ main (void)
 {
   if (__builtin_strcmp (str1, latin2_1) != 0
       || __builtin_strcmp (str2, latin2_2) != 0
-      || __builtin_strcmp (str3, utf8_1) != 0
-      || __builtin_strcmp (str4, utf8_2) != 0
+      || __builtin_memcmp (str3, utf8_1, sizeof (utf8_1) - 1) != 0
+      || __builtin_memcmp (str4, utf8_2, sizeof (utf8_2) - 1) != 0
       || __builtin_strncmp (str5, latin2_1, sizeof (latin2_1) - 1) != 0
       || __builtin_strcmp (str5 + sizeof (latin2_1) - 1, latin2_2) != 0
-      || __builtin_strncmp (str6, utf8_1, sizeof (utf8_1) - 1) != 0
-      || __builtin_strcmp (str6 + sizeof (utf8_1) - 1, utf8_2) != 0
-      || __builtin_strncmp (str7, utf8_1, sizeof (utf8_1) - 1) != 0
-      || __builtin_strcmp (str7 + sizeof (utf8_1) - 1, utf8_2) != 0
-      || __builtin_strncmp (str8, utf8_1, sizeof (utf8_1) - 1) != 0
-      || __builtin_strcmp (str8 + sizeof (utf8_1) - 1, utf8_2) != 0)
+      || __builtin_memcmp (str6, utf8_1, sizeof (utf8_1) - 1) != 0
+      || __builtin_memcmp (str6 + sizeof (utf8_1) - 1, utf8_2, sizeof (utf8_2) - 1) != 0
+      || __builtin_memcmp (str7, utf8_1, sizeof (utf8_1) - 1) != 0
+      || __builtin_memcmp (str7 + sizeof (utf8_1) - 1, utf8_2, sizeof (utf8_2) - 1) != 0
+      || __builtin_memcmp (str8, utf8_1, sizeof (utf8_1) - 1) != 0
+      || __builtin_memcmp (str8 + sizeof (utf8_1) - 1, utf8_2, sizeof (utf8_2) - 1) != 0)
     __builtin_abort ();
   if (sizeof ("a" u8"b"[0]) != 1
       || sizeof (u8"a" "b"[0]) != 1
diff --git a/gcc/testsuite/g++.dg/ext/utf8-2.C b/gcc/testsuite/g++.dg/ext/utf8-2.C
index bafe6e8351c..b13d55f1139 100644
--- a/gcc/testsuite/g++.dg/ext/utf8-2.C
+++ b/gcc/testsuite/g++.dg/ext/utf8-2.C
@@ -1,21 +1,27 @@
 // { dg-do compile { target c++11 } }
 // { dg-options "" }
 
-const char	s0[]	= u8"ab";
-const char16_t	s1[]	= u8"ab";	// { dg-error "from non-wide" }
-const char32_t  s2[]    = u8"ab";	// { dg-error "from non-wide" }
-const wchar_t   s3[]    = u8"ab";	// { dg-error "from non-wide" }
+#if __cpp_char8_t
+typedef char8_t u8_char_t;
+#else
+typedef char u8_char_t;
+#endif
 
-const char      t0[0]   = u8"ab";	// { dg-error "chars is too long" }
-const char      t1[1]   = u8"ab";	// { dg-error "chars is too long" }
-const char      t2[2]   = u8"ab";	// { dg-error "chars is too long" }
-const char      t3[3]   = u8"ab";
-const char      t4[4]   = u8"ab";
+const u8_char_t	s0[]	= u8"ab";
+const char16_t	s1[]	= u8"ab";	// { dg-error "from a string literal with type array of .char." }
+const char32_t  s2[]    = u8"ab";	// { dg-error "from a string literal with type array of .char." }
+const wchar_t   s3[]    = u8"ab";	// { dg-error "from a string literal with type array of .char." }
 
-const char      u0[0]   = u8"\u2160.";	// { dg-error "chars is too long" }
-const char      u1[1]   = u8"\u2160.";	// { dg-error "chars is too long" }
-const char      u2[2]   = u8"\u2160.";	// { dg-error "chars is too long" }
-const char      u3[3]   = u8"\u2160.";	// { dg-error "chars is too long" }
-const char      u4[4]   = u8"\u2160.";	// { dg-error "chars is too long" }
-const char      u5[5]   = u8"\u2160.";
-const char      u6[6]   = u8"\u2160.";
+const u8_char_t      t0[0]   = u8"ab";	// { dg-error "chars is too long" }
+const u8_char_t      t1[1]   = u8"ab";	// { dg-error "chars is too long" }
+const u8_char_t      t2[2]   = u8"ab";	// { dg-error "chars is too long" }
+const u8_char_t      t3[3]   = u8"ab";
+const u8_char_t      t4[4]   = u8"ab";
+
+const u8_char_t      u0[0]   = u8"\u2160.";	// { dg-error "chars is too long" }
+const u8_char_t      u1[1]   = u8"\u2160.";	// { dg-error "chars is too long" }
+const u8_char_t      u2[2]   = u8"\u2160.";	// { dg-error "chars is too long" }
+const u8_char_t      u3[3]   = u8"\u2160.";	// { dg-error "chars is too long" }
+const u8_char_t      u4[4]   = u8"\u2160.";	// { dg-error "chars is too long" }
+const u8_char_t      u5[5]   = u8"\u2160.";
+const u8_char_t      u6[6]   = u8"\u2160.";
diff --git a/gcc/testsuite/g++.dg/warn/Wformat-ranges-c++11.C b/gcc/testsuite/g++.dg/warn/Wformat-ranges-c++11.C
index a4d3fff2967..653171b4357 100644
--- a/gcc/testsuite/g++.dg/warn/Wformat-ranges-c++11.C
+++ b/gcc/testsuite/g++.dg/warn/Wformat-ranges-c++11.C
@@ -1,4 +1,5 @@
 // { dg-do compile { target c++11 } }
+// { dg-skip-if "char8_t" { c++2a } }
 /* { dg-options "-Wformat -fdiagnostics-show-caret" } */
 
 /* C++11-specific format tests. */


* Re: [REVISED PATCH 2/9]: C++ P0482R5 char8_t: Core language support
  2019-01-14 19:59   ` Jason Merrill
@ 2019-01-15  4:08     ` Tom Honermann
  2019-01-15  6:51     ` Christophe Lyon
  1 sibling, 0 replies; 14+ messages in thread
From: Tom Honermann @ 2019-01-15  4:08 UTC (permalink / raw)
  To: Jason Merrill, gcc-patches

On 1/14/19 2:58 PM, Jason Merrill wrote:
> On 12/23/18 9:27 PM, Tom Honermann wrote:
>> Attached is a revised patch that addresses changes in P0482R6 as well 
>> as feedback provided by Jason. Changes from the prior patch include:
>> - Updated the value of the __cpp_char8_t feature test macro to 201811
>>    per P0482R6.
>> - Enable char8_t support with -std=c++2a per adoption of P0482R6 in
>>    San Diego.
>> - Reverted the unnecessary changes to gcc/gcc/c/c-typeck.c as requested
>>    by Jason.
>> - Removed unnecessary checks of 'flag_char8_t' within the C++ front
>>    end as requested by Jason.
>> - Corrected the regression spotted by Jason regarding initialization of
>>    signed char and unsigned char arrays with string literals.
>> - Made minor changes to the error message emitted for ill-formed
>>    initialization of char arrays with UTF-8 string literals. These
>>    changes do not yet implement Jason's suggestion; I'll follow up
>>    with a separate patch for that due to additional test impact.
>>
>> Tested on x86_64-linux.
>
> I just applied the compiler changes with small modifications, as 
> follows; thank you very much for the patches.  Jonathan should check 
> in the library portion before long.

Excellent, thank you, Jason!

Tom.

>
> Jason



* Re: [REVISED PATCH 2/9]: C++ P0482R5 char8_t: Core language support
  2019-01-14 19:59   ` Jason Merrill
  2019-01-15  4:08     ` Tom Honermann
@ 2019-01-15  6:51     ` Christophe Lyon
  2019-01-15 15:50       ` Tom Honermann
  1 sibling, 1 reply; 14+ messages in thread
From: Christophe Lyon @ 2019-01-15  6:51 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Tom Honermann, gcc-patches

On Mon, 14 Jan 2019 at 20:59, Jason Merrill <jason@redhat.com> wrote:
>
> On 12/23/18 9:27 PM, Tom Honermann wrote:
> > Attached is a revised patch that addresses changes in P0482R6 as well as
> > feedback provided by Jason.  Changes from the prior patch include:
> > - Updated the value of the __cpp_char8_t feature test macro to 201811
> >    per P0482R6.
> > - Enable char8_t support with -std=c++2a per adoption of P0482R6 in
> >    San Diego.
> > - Reverted the unnecessary changes to gcc/gcc/c/c-typeck.c as requested
> >    by Jason.
> > - Removed unnecessary checks of 'flag_char8_t' within the C++ front
> >    end as requested by Jason.
> > - Corrected the regression spotted by Jason regarding initialization of
> >    signed char and unsigned char arrays with string literals.
> > - Made minor changes to the error message emitted for ill-formed
> >    initialization of char arrays with UTF-8 string literals.  These
> >    changes do not yet implement Jason's suggestion; I'll follow up with a
> >    separate patch for that due to additional test impact.
> >
> > Tested on x86_64-linux.
>
> I just applied the compiler changes with small modifications, as
> follows; thank you very much for the patches.  Jonathan should check in
> the library portion before long.
>
> Jason

Hi,

The new testcase g++.dg/ext/utf-cvt-char8_t.C fails at least on arm and aarch64:

g++.dg/ext/utf-cvt-char8_t.C  -std=gnu++14  (test for warnings, line 24)
g++.dg/ext/utf-cvt-char8_t.C  -std=gnu++17  (test for warnings, line 24)

Christophe


* Re: [REVISED PATCH 2/9]: C++ P0482R5 char8_t: Core language support
  2019-01-15  6:51     ` Christophe Lyon
@ 2019-01-15 15:50       ` Tom Honermann
  2019-01-15 18:28         ` Jason Merrill
  0 siblings, 1 reply; 14+ messages in thread
From: Tom Honermann @ 2019-01-15 15:50 UTC (permalink / raw)
  To: Christophe Lyon, Jason Merrill; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1911 bytes --]

On 1/15/19 1:51 AM, Christophe Lyon wrote:
> On Mon, 14 Jan 2019 at 20:59, Jason Merrill <jason@redhat.com> wrote:
>> On 12/23/18 9:27 PM, Tom Honermann wrote:
>>> Attached is a revised patch that addresses changes in P0482R6 as well as
>>> feedback provided by Jason.  Changes from the prior patch include:
>>> - Updated the value of the __cpp_char8_t feature test macro to 201811
>>>     per P0482R6.
>>> - Enable char8_t support with -std=c++2a per adoption of P0482R6 in
>>>     San Diego.
>>> - Reverted the unnecessary changes to gcc/gcc/c/c-typeck.c as requested
>>>     by Jason.
>>> - Removed unnecessary checks of 'flag_char8_t' within the C++ front
>>>     end as requested by Jason.
>>> - Corrected the regression spotted by Jason regarding initialization of
>>>     signed char and unsigned char arrays with string literals.
>>> - Made minor changes to the error message emitted for ill-formed
>>>     initialization of char arrays with UTF-8 string literals.  These
>>>     changes do not yet implement Jason's suggestion; I'll follow up with a
>>>     separate patch for that due to additional test impact.
>>>
>>> Tested on x86_64-linux.
>> I just applied the compiler changes with small modifications, as
>> follows; thank you very much for the patches.  Jonathan should check in
>> the library portion before long.
>>
>> Jason
> Hi,
>
> The new testcase g++.dg/ext/utf-cvt-char8_t.C fails at least on arm and aarch64:
>
> g++.dg/ext/utf-cvt-char8_t.C  -std=gnu++14  (test for warnings, line 24)
> g++.dg/ext/utf-cvt-char8_t.C  -std=gnu++17  (test for warnings, line 24)

Arm and aarch64 have unsigned char by default, so the warning 
("conversion to 'char' from 'char8_t' may change the sign of the 
result") isn't emitted on those platforms.  I presume adding 
'-fsigned-char' to the options for the test would be a sufficient fix?  
If so, a patch is attached.

Tom.

>
> Christophe



[-- Attachment #2: test-utf-cvt-char8_t.C.diff --]
[-- Type: text/x-patch, Size: 648 bytes --]

Index: gcc/testsuite/g++.dg/ext/utf-cvt-char8_t.C
===================================================================
--- gcc/testsuite/g++.dg/ext/utf-cvt-char8_t.C	(revision 267930)
+++ gcc/testsuite/g++.dg/ext/utf-cvt-char8_t.C	(working copy)
@@ -1,7 +1,7 @@
 /* Contributed by Kris Van Hees <kris.van.hees@oracle.com> */
 /* Test the char8_t promotion rules. */
 /* { dg-do compile { target c++11 } } */
-/* { dg-options "-fchar8_t -Wall -Wconversion -Wsign-conversion -Wsign-promo" } */
+/* { dg-options "-fchar8_t -fsigned-char -Wall -Wconversion -Wsign-conversion -Wsign-promo" } */
 
 extern void f_c (char);
 extern void fsc (signed char);


* Re: [REVISED PATCH 2/9]: C++ P0482R5 char8_t: Core language support
  2019-01-15 15:50       ` Tom Honermann
@ 2019-01-15 18:28         ` Jason Merrill
  0 siblings, 0 replies; 14+ messages in thread
From: Jason Merrill @ 2019-01-15 18:28 UTC (permalink / raw)
  To: Tom Honermann, Christophe Lyon; +Cc: gcc-patches

On 1/15/19 10:50 AM, Tom Honermann wrote:
> On 1/15/19 1:51 AM, Christophe Lyon wrote:
>> On Mon, 14 Jan 2019 at 20:59, Jason Merrill <jason@redhat.com> wrote:
>>> On 12/23/18 9:27 PM, Tom Honermann wrote:
>>>> Attached is a revised patch that addresses changes in P0482R6 as well as
>>>> feedback provided by Jason.  Changes from the prior patch include:
>>>> - Updated the value of the __cpp_char8_t feature test macro to 201811
>>>>     per P0482R6.
>>>> - Enable char8_t support with -std=c++2a per adoption of P0482R6 in
>>>>     San Diego.
>>>> - Reverted the unnecessary changes to gcc/gcc/c/c-typeck.c as requested
>>>>     by Jason.
>>>> - Removed unnecessary checks of 'flag_char8_t' within the C++ front
>>>>     end as requested by Jason.
>>>> - Corrected the regression spotted by Jason regarding initialization of
>>>>     signed char and unsigned char arrays with string literals.
>>>> - Made minor changes to the error message emitted for ill-formed
>>>>     initialization of char arrays with UTF-8 string literals.  These
>>>>     changes do not yet implement Jason's suggestion; I'll follow up
>>>>     with a separate patch for that due to additional test impact.
>>>>
>>>> Tested on x86_64-linux.
>>> I just applied the compiler changes with small modifications, as
>>> follows; thank you very much for the patches.  Jonathan should check in
>>> the library portion before long.
>>>
>>> Jason
>> Hi,
>>
>> The new testcase g++.dg/ext/utf-cvt-char8_t.C fails at least on arm 
>> and aarch64:
>>
>> g++.dg/ext/utf-cvt-char8_t.C  -std=gnu++14  (test for warnings, line 24)
>> g++.dg/ext/utf-cvt-char8_t.C  -std=gnu++17  (test for warnings, line 24)
> 
> Arm and aarch64 have unsigned char by default, so the warning 
> ("conversion to 'char' from 'char8_t' may change the sign of the 
> result") isn't emitted on those platforms.  I presume adding 
> '-fsigned-char' to the options for the test would be a sufficient fix? 
> If so, a patch is attached.

Applied, thanks.

Jason


end of thread, other threads:[~2019-01-15 18:28 UTC | newest]

Thread overview: 14+ messages
2018-11-05 19:40 [PATCH 2/9]: C++ P0482R5 char8_t: Core language support Tom Honermann
2018-12-03 21:51 ` Jason Merrill
2018-12-03 22:01   ` Jason Merrill
2018-12-05  7:10     ` Tom Honermann
2018-12-05 16:16       ` Jason Merrill
2018-12-17 21:02         ` Jason Merrill
2018-12-17 21:47           ` Tom Honermann
2018-12-24  2:32             ` Tom Honermann
2018-12-24  2:27 ` [REVISED PATCH " Tom Honermann
2019-01-14 19:59   ` Jason Merrill
2019-01-15  4:08     ` Tom Honermann
2019-01-15  6:51     ` Christophe Lyon
2019-01-15 15:50       ` Tom Honermann
2019-01-15 18:28         ` Jason Merrill
