public inbox for gcc-patches@gcc.gnu.org
* [PATCH v4 0/2] Implement C2X N2653 (char8_t) and correct UTF-8 character literal type in preprocessor directives for C++
@ 2022-08-02 18:36 Tom Honermann
  2022-08-02 18:36 ` [PATCH v4 1/2] C: Implement C2X N2653 char8_t and UTF-8 string literal changes Tom Honermann
  2022-08-02 18:36 ` [PATCH v4 2/2] preprocessor/106426: Treat u8 character literals as unsigned in char8_t modes Tom Honermann
  0 siblings, 2 replies; 7+ messages in thread
From: Tom Honermann @ 2022-08-02 18:36 UTC
  To: gcc-patches

This patch series provides an implementation and tests for the WG14 N2653
paper as adopted for C2X.

Additionally, a fix is included for the C++ preprocessor to treat UTF-8
character literals in preprocessor directives as an unsigned type in char8_t
enabled modes (in C++17 and earlier with -fchar8_t or in C++20 or later
without -fno-char8_t).

Tom Honermann (2):
  C: Implement C2X N2653 char8_t and UTF-8 string literal changes
  preprocessor/106426: Treat u8 character literals as unsigned in
    char8_t modes.

 gcc/c-family/c-lex.cc                         | 13 ++++--
 gcc/c-family/c-opts.cc                        |  5 ++-
 gcc/c/c-parser.cc                             | 16 ++++++-
 gcc/c/c-typeck.cc                             |  2 +-
 gcc/ginclude/stdatomic.h                      |  6 +++
 .../g++.dg/ext/char8_t-char-literal-1.C       |  6 ++-
 .../g++.dg/ext/char8_t-char-literal-2.C       |  4 ++
 .../atomic/c2x-stdatomic-lockfree-char8_t.c   | 42 +++++++++++++++++++
 .../atomic/gnu2x-stdatomic-lockfree-char8_t.c |  5 +++
 gcc/testsuite/gcc.dg/c11-utf8str-type.c       |  6 +++
 gcc/testsuite/gcc.dg/c17-utf8str-type.c       |  6 +++
 gcc/testsuite/gcc.dg/c2x-utf8str-type.c       |  6 +++
 gcc/testsuite/gcc.dg/c2x-utf8str.c            | 34 +++++++++++++++
 gcc/testsuite/gcc.dg/gnu2x-utf8str-type.c     |  5 +++
 gcc/testsuite/gcc.dg/gnu2x-utf8str.c          | 34 +++++++++++++++
 libcpp/charset.cc                             |  4 +-
 libcpp/include/cpplib.h                       |  4 +-
 libcpp/init.cc                                |  1 +
 18 files changed, 185 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/atomic/c2x-stdatomic-lockfree-char8_t.c
 create mode 100644 gcc/testsuite/gcc.dg/atomic/gnu2x-stdatomic-lockfree-char8_t.c
 create mode 100644 gcc/testsuite/gcc.dg/c11-utf8str-type.c
 create mode 100644 gcc/testsuite/gcc.dg/c17-utf8str-type.c
 create mode 100644 gcc/testsuite/gcc.dg/c2x-utf8str-type.c
 create mode 100644 gcc/testsuite/gcc.dg/c2x-utf8str.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu2x-utf8str-type.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu2x-utf8str.c

-- 
2.32.0



* [PATCH v4 1/2] C: Implement C2X N2653 char8_t and UTF-8 string literal changes
  2022-08-02 18:36 [PATCH v4 0/2] Implement C2X N2653 (char8_t) and correct UTF-8 character literal type in preprocessor directives for C++ Tom Honermann
@ 2022-08-02 18:36 ` Tom Honermann
  2022-08-02 22:10   ` Joseph Myers
  2022-08-02 18:36 ` [PATCH v4 2/2] preprocessor/106426: Treat u8 character literals as unsigned in char8_t modes Tom Honermann
  1 sibling, 1 reply; 7+ messages in thread
From: Tom Honermann @ 2022-08-02 18:36 UTC
  To: gcc-patches

This patch implements the core language and compiler dependent library
changes adopted for C2X via WG14 N2653.  The changes include:
- Change of type for UTF-8 string literals from array of const char to
  array of const char8_t (unsigned char).
- A new atomic_char8_t typedef.
- A new ATOMIC_CHAR8_T_LOCK_FREE macro defined in terms of the existing
  __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined macro.
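
As a rough illustration of the first item (a sketch for readers, not part
of the patch): the element type of a UTF-8 string literal can be probed
with _Generic.  U8_ELEMENT_KIND and u8_element_type_name are hypothetical
names used only here.  The expectation is char for -std=c11 and -std=c17,
and unsigned char for -std=c2x with this patch applied; other compilers or
versions may answer differently.

```c
#include <assert.h>  /* static_assert (a macro before C23) */

/* 1: elements are plain char (C11/C17 behavior).
   2: elements are unsigned char, i.e. char8_t (C2X behavior).
   0: anything else.  */
#define U8_ELEMENT_KIND \
  _Generic (u8"x"[0], char: 1, unsigned char: 2, default: 0)

static_assert (U8_ELEMENT_KIND == 1 || U8_ELEMENT_KIND == 2,
	       "UTF-8 string literal elements have an unexpected type");

/* Hypothetical helper: names the element type this translation unit sees.  */
static const char *
u8_element_type_name (void)
{
  return U8_ELEMENT_KIND == 2 ? "unsigned char" : "char";
}
```

Building the same file with -std=c17 and then -std=c2x should flip the
reported name from "char" to "unsigned char".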

gcc/ChangeLog:

	* ginclude/stdatomic.h (atomic_char8_t,
	ATOMIC_CHAR8_T_LOCK_FREE): New typedef and macro.
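
A minimal usage sketch for the new typedef and macro, assuming (as the
#ifdef guards in the header change suggest) that both are only available
when char8_t support is enabled.  char8_lock_free_level is a hypothetical
helper name, and -1 is an arbitrary "not available" value chosen for this
example.

```c
#include <assert.h>     /* static_assert (a macro before C23) */
#include <stdatomic.h>

#ifdef ATOMIC_CHAR8_T_LOCK_FREE
/* Like the other ATOMIC_*_LOCK_FREE macros, the value is 0, 1, or 2.  */
static_assert (ATOMIC_CHAR8_T_LOCK_FREE >= 0 && ATOMIC_CHAR8_T_LOCK_FREE <= 2,
	       "unexpected ATOMIC_CHAR8_T_LOCK_FREE value");
#endif

/* Hypothetical helper: returns ATOMIC_CHAR8_T_LOCK_FREE when
   <stdatomic.h> provides it, or -1 otherwise.  */
static int
char8_lock_free_level (void)
{
#ifdef ATOMIC_CHAR8_T_LOCK_FREE
  atomic_char8_t probe;		/* the new typedef */
  atomic_init (&probe, 'x');
  (void) atomic_load (&probe);	/* usable like any other atomic type */
  return ATOMIC_CHAR8_T_LOCK_FREE;
#else
  return -1;
#endif
}
```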

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_string_literal): Use char8_t as the type
	of CPP_UTF8STRING when char8_t support is enabled.
	* c-typeck.cc (digest_init): Allow initialization of an array
	of character type by a string literal with type array of
	char8_t.
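
A short sketch of what the digest_init change is meant to keep working:
arrays of any of the three character types remain initializable from a
UTF-8 string literal, whether the literal has type array of char or array
of char8_t.  The array names and utf8_buffers_match are hypothetical; the
tests in the patch additionally cover char8_t arrays, which are left out
here so the example also builds in modes without __CHAR8_TYPE__.

```c
#include <assert.h>  /* static_assert (a macro before C23) */
#include <string.h>  /* memcmp */

/* Accepted both before C2X (literal is an array of char) and in C2X
   (literal is an array of char8_t).  */
static const char          cbuf[]  = u8"text";
static const signed char   scbuf[] = u8"text";
static const unsigned char ucbuf[] = u8"text";

static_assert (sizeof cbuf == 5 && sizeof scbuf == 5 && sizeof ucbuf == 5,
	       "each array holds \"text\" plus the terminating null");

/* Hypothetical helper: 1 if all three arrays hold the same bytes.  */
static int
utf8_buffers_match (void)
{
  return memcmp (cbuf, scbuf, sizeof cbuf) == 0
	 && memcmp (cbuf, ucbuf, sizeof cbuf) == 0;
}
```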

gcc/c-family/ChangeLog:

	* c-lex.cc (lex_string, lex_charconst): Use char8_t as the type
	of CPP_UTF8CHAR and CPP_UTF8STRING when char8_t support is
	enabled.
	* c-opts.cc (c_common_post_options): Set flag_char8_t if
	targeting C2X.
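
The defaulting rule in c_common_post_options can be modeled outside of GCC
roughly as follows.  This is a stand-alone sketch, not GCC code:
resolve_char8_flag, CHAR8_FLAG_UNSET, and the boolean parameters are
hypothetical stand-ins for flag_char8_t, its initial value of -1,
cxx_dialect >= cxx20, and flag_isoc2x.

```c
#include <assert.h>
#include <stdbool.h>

/* -1 models "neither -fchar8_t nor -fno-char8_t was given".  */
enum { CHAR8_FLAG_UNSET = -1 };

/* An explicit setting wins; otherwise char8_t support is enabled for
   C++20 and later, and for C2X.  */
static bool
resolve_char8_flag (int flag, bool cxx20_or_later, bool c2x)
{
  if (flag == CHAR8_FLAG_UNSET)
    return cxx20_or_later || c2x;
  return flag != 0;
}

/* Usage example covering the modes named in the cover letter.  */
static void
example_char8_defaults (void)
{
  assert (resolve_char8_flag (CHAR8_FLAG_UNSET, false, true));   /* C2X */
  assert (resolve_char8_flag (CHAR8_FLAG_UNSET, true, false));   /* C++20 */
  assert (!resolve_char8_flag (CHAR8_FLAG_UNSET, false, false)); /* C17, C++17 */
  assert (resolve_char8_flag (1, false, false));   /* C++17 with -fchar8_t */
  assert (!resolve_char8_flag (0, true, false));   /* C++20 with -fno-char8_t */
}
```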

gcc/testsuite/ChangeLog:

	* gcc.dg/atomic/c2x-stdatomic-lockfree-char8_t.c: New test.
	* gcc.dg/atomic/gnu2x-stdatomic-lockfree-char8_t.c: New test.
	* gcc.dg/c11-utf8str-type.c: New test.
	* gcc.dg/c17-utf8str-type.c: New test.
	* gcc.dg/c2x-utf8str-type.c: New test.
	* gcc.dg/c2x-utf8str.c: New test.
	* gcc.dg/gnu2x-utf8str-type.c: New test.
	* gcc.dg/gnu2x-utf8str.c: New test.
---
 gcc/c-family/c-lex.cc                         | 13 ++++--
 gcc/c-family/c-opts.cc                        |  4 +-
 gcc/c/c-parser.cc                             | 16 ++++++-
 gcc/c/c-typeck.cc                             |  2 +-
 gcc/ginclude/stdatomic.h                      |  6 +++
 .../atomic/c2x-stdatomic-lockfree-char8_t.c   | 42 +++++++++++++++++++
 .../atomic/gnu2x-stdatomic-lockfree-char8_t.c |  5 +++
 gcc/testsuite/gcc.dg/c11-utf8str-type.c       |  6 +++
 gcc/testsuite/gcc.dg/c17-utf8str-type.c       |  6 +++
 gcc/testsuite/gcc.dg/c2x-utf8str-type.c       |  6 +++
 gcc/testsuite/gcc.dg/c2x-utf8str.c            | 34 +++++++++++++++
 gcc/testsuite/gcc.dg/gnu2x-utf8str-type.c     |  5 +++
 gcc/testsuite/gcc.dg/gnu2x-utf8str.c          | 34 +++++++++++++++
 13 files changed, 170 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/atomic/c2x-stdatomic-lockfree-char8_t.c
 create mode 100644 gcc/testsuite/gcc.dg/atomic/gnu2x-stdatomic-lockfree-char8_t.c
 create mode 100644 gcc/testsuite/gcc.dg/c11-utf8str-type.c
 create mode 100644 gcc/testsuite/gcc.dg/c17-utf8str-type.c
 create mode 100644 gcc/testsuite/gcc.dg/c2x-utf8str-type.c
 create mode 100644 gcc/testsuite/gcc.dg/c2x-utf8str.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu2x-utf8str-type.c
 create mode 100644 gcc/testsuite/gcc.dg/gnu2x-utf8str.c

diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
index 8bfa4f4024f..0b6f94e18a8 100644
--- a/gcc/c-family/c-lex.cc
+++ b/gcc/c-family/c-lex.cc
@@ -1352,7 +1352,14 @@ lex_string (const cpp_token *tok, tree *valp, bool objc_string, bool translate)
 	default:
 	case CPP_STRING:
 	case CPP_UTF8STRING:
-	  value = build_string (1, "");
+	  if (type == CPP_UTF8STRING && flag_char8_t)
+	    {
+	      value = build_string (TYPE_PRECISION (char8_type_node)
+				    / TYPE_PRECISION (char_type_node),
+				    "");  /* char8_t is 8 bits */
+	    }
+	  else
+	    value = build_string (1, "");
 	  break;
 	case CPP_STRING16:
 	  value = build_string (TYPE_PRECISION (char16_type_node)
@@ -1425,9 +1432,7 @@ lex_charconst (const cpp_token *token)
     type = char16_type_node;
   else if (token->type == CPP_UTF8CHAR)
     {
-      if (!c_dialect_cxx ())
-	type = unsigned_char_type_node;
-      else if (flag_char8_t)
+      if (flag_char8_t)
         type = char8_type_node;
       else
         type = char_type_node;
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index b9f01a65ed7..108adc5caf8 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1059,9 +1059,9 @@ c_common_post_options (const char **pfilename)
   if (flag_sized_deallocation == -1)
     flag_sized_deallocation = (cxx_dialect >= cxx14);
 
-  /* char8_t support is new in C++20.  */
+  /* char8_t support is implicitly enabled in C++20 and C2X.  */
   if (flag_char8_t == -1)
-    flag_char8_t = (cxx_dialect >= cxx20);
+    flag_char8_t = (cxx_dialect >= cxx20) || flag_isoc2x;
 
   if (flag_extern_tls_init)
     {
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 92049d1a101..fa9395986de 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -7447,7 +7447,14 @@ c_parser_string_literal (c_parser *parser, bool translate, bool wide_ok)
 	default:
 	case CPP_STRING:
 	case CPP_UTF8STRING:
-	  value = build_string (1, "");
+	  if (type == CPP_UTF8STRING && flag_char8_t)
+	    {
+	      value = build_string (TYPE_PRECISION (char8_type_node)
+				    / TYPE_PRECISION (char_type_node),
+				    "");  /* char8_t is 8 bits */
+	    }
+	  else
+	    value = build_string (1, "");
 	  break;
 	case CPP_STRING16:
 	  value = build_string (TYPE_PRECISION (char16_type_node)
@@ -7472,9 +7479,14 @@ c_parser_string_literal (c_parser *parser, bool translate, bool wide_ok)
     {
     default:
     case CPP_STRING:
-    case CPP_UTF8STRING:
       TREE_TYPE (value) = char_array_type_node;
       break;
+    case CPP_UTF8STRING:
+      if (flag_char8_t)
+	TREE_TYPE (value) = char8_array_type_node;
+      else
+	TREE_TYPE (value) = char_array_type_node;
+      break;
     case CPP_STRING16:
       TREE_TYPE (value) = char16_array_type_node;
       break;
diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
index fd0a7f81a7a..231f4e980b6 100644
--- a/gcc/c/c-typeck.cc
+++ b/gcc/c/c-typeck.cc
@@ -8045,7 +8045,7 @@ digest_init (location_t init_loc, tree type, tree init, tree origtype,
 
 	  if (char_array)
 	    {
-	      if (typ2 != char_type_node)
+	      if (typ2 != char_type_node && typ2 != char8_type_node)
 		incompat_string_cst = true;
 	    }
 	  else if (!comptypes (typ1, typ2))
diff --git a/gcc/ginclude/stdatomic.h b/gcc/ginclude/stdatomic.h
index bfcfdf664c7..9f2475b739d 100644
--- a/gcc/ginclude/stdatomic.h
+++ b/gcc/ginclude/stdatomic.h
@@ -49,6 +49,9 @@ typedef _Atomic long atomic_long;
 typedef _Atomic unsigned long atomic_ulong;
 typedef _Atomic long long atomic_llong;
 typedef _Atomic unsigned long long atomic_ullong;
+#ifdef __CHAR8_TYPE__
+typedef _Atomic __CHAR8_TYPE__ atomic_char8_t;
+#endif
 typedef _Atomic __CHAR16_TYPE__ atomic_char16_t;
 typedef _Atomic __CHAR32_TYPE__ atomic_char32_t;
 typedef _Atomic __WCHAR_TYPE__ atomic_wchar_t;
@@ -97,6 +100,9 @@ extern void atomic_signal_fence (memory_order);
 
 #define ATOMIC_BOOL_LOCK_FREE		__GCC_ATOMIC_BOOL_LOCK_FREE
 #define ATOMIC_CHAR_LOCK_FREE		__GCC_ATOMIC_CHAR_LOCK_FREE
+#ifdef __GCC_ATOMIC_CHAR8_T_LOCK_FREE
+#define ATOMIC_CHAR8_T_LOCK_FREE	__GCC_ATOMIC_CHAR8_T_LOCK_FREE
+#endif
 #define ATOMIC_CHAR16_T_LOCK_FREE	__GCC_ATOMIC_CHAR16_T_LOCK_FREE
 #define ATOMIC_CHAR32_T_LOCK_FREE	__GCC_ATOMIC_CHAR32_T_LOCK_FREE
 #define ATOMIC_WCHAR_T_LOCK_FREE	__GCC_ATOMIC_WCHAR_T_LOCK_FREE
diff --git a/gcc/testsuite/gcc.dg/atomic/c2x-stdatomic-lockfree-char8_t.c b/gcc/testsuite/gcc.dg/atomic/c2x-stdatomic-lockfree-char8_t.c
new file mode 100644
index 00000000000..1b692f55ed0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/atomic/c2x-stdatomic-lockfree-char8_t.c
@@ -0,0 +1,42 @@
+/* Test atomic_is_lock_free for char8_t.  */
+/* { dg-do run } */
+/* { dg-options "-std=c2x -pedantic-errors" } */
+
+#include <stdatomic.h>
+#include <stdint.h>
+
+extern void abort (void);
+
+_Atomic __CHAR8_TYPE__ ac8a;
+atomic_char8_t ac8t;
+
+#define CHECK_TYPE(MACRO, V1, V2)		\
+  do						\
+    {						\
+      int r1 = MACRO;				\
+      int r2 = atomic_is_lock_free (&V1);	\
+      int r3 = atomic_is_lock_free (&V2);	\
+      if (r1 != 0 && r1 != 1 && r1 != 2)	\
+	abort ();				\
+      if (r2 != 0 && r2 != 1)			\
+	abort ();				\
+      if (r3 != 0 && r3 != 1)			\
+	abort ();				\
+      if (r1 == 2 && r2 != 1)			\
+	abort ();				\
+      if (r1 == 2 && r3 != 1)			\
+	abort ();				\
+      if (r1 == 0 && r2 != 0)			\
+	abort ();				\
+      if (r1 == 0 && r3 != 0)			\
+	abort ();				\
+    }						\
+  while (0)
+
+int
+main ()
+{
+  CHECK_TYPE (ATOMIC_CHAR8_T_LOCK_FREE, ac8a, ac8t);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/atomic/gnu2x-stdatomic-lockfree-char8_t.c b/gcc/testsuite/gcc.dg/atomic/gnu2x-stdatomic-lockfree-char8_t.c
new file mode 100644
index 00000000000..27a3cfe3552
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/atomic/gnu2x-stdatomic-lockfree-char8_t.c
@@ -0,0 +1,5 @@
+/* Test atomic_is_lock_free for char8_t with -std=gnu2x.  */
+/* { dg-do run } */
+/* { dg-options "-std=gnu2x -pedantic-errors" } */
+
+#include "c2x-stdatomic-lockfree-char8_t.c"
diff --git a/gcc/testsuite/gcc.dg/c11-utf8str-type.c b/gcc/testsuite/gcc.dg/c11-utf8str-type.c
new file mode 100644
index 00000000000..8be9abb9686
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c11-utf8str-type.c
@@ -0,0 +1,6 @@
+/* Test C11 UTF-8 string literal type.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c11" } */
+
+_Static_assert (_Generic (u8"text", char*: 1, default: 2) == 1, "UTF-8 string literals have an unexpected type");
+_Static_assert (_Generic (u8"x"[0], char:  1, default: 2) == 1, "UTF-8 string literal elements have an unexpected type");
diff --git a/gcc/testsuite/gcc.dg/c17-utf8str-type.c b/gcc/testsuite/gcc.dg/c17-utf8str-type.c
new file mode 100644
index 00000000000..515c6db3970
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c17-utf8str-type.c
@@ -0,0 +1,6 @@
+/* Test C17 UTF-8 string literal type.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c17" } */
+
+_Static_assert (_Generic (u8"text", char*: 1, default: 2) == 1, "UTF-8 string literals have an unexpected type");
+_Static_assert (_Generic (u8"x"[0], char:  1, default: 2) == 1, "UTF-8 string literal elements have an unexpected type");
diff --git a/gcc/testsuite/gcc.dg/c2x-utf8str-type.c b/gcc/testsuite/gcc.dg/c2x-utf8str-type.c
new file mode 100644
index 00000000000..ebdde97b57a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2x-utf8str-type.c
@@ -0,0 +1,6 @@
+/* Test C2X UTF-8 string literal type.  */
+/* { dg-do compile } */
+/* { dg-options "-std=c2x" } */
+
+_Static_assert (_Generic (u8"text", unsigned char*: 1, default: 2) == 1, "UTF-8 string literals have an unexpected type");
+_Static_assert (_Generic (u8"x"[0], unsigned char:  1, default: 2) == 1, "UTF-8 string literal elements have an unexpected type");
diff --git a/gcc/testsuite/gcc.dg/c2x-utf8str.c b/gcc/testsuite/gcc.dg/c2x-utf8str.c
new file mode 100644
index 00000000000..2e4c392da9f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/c2x-utf8str.c
@@ -0,0 +1,34 @@
+/* Test initialization by UTF-8 string literal in C2X.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=c2x" } */
+
+typedef __CHAR8_TYPE__  char8_t;
+typedef __CHAR16_TYPE__ char16_t;
+typedef __CHAR32_TYPE__ char32_t;
+typedef __WCHAR_TYPE__  wchar_t;
+
+/* Test that char, signed char, unsigned char, and char8_t arrays can be
+   initialized by a UTF-8 string literal.  */
+const char cbuf1[] = u8"text";
+const char cbuf2[] = { u8"text" };
+const signed char scbuf1[] = u8"text";
+const signed char scbuf2[] = { u8"text" };
+const unsigned char ucbuf1[] = u8"text";
+const unsigned char ucbuf2[] = { u8"text" };
+const char8_t c8buf1[] = u8"text";
+const char8_t c8buf2[] = { u8"text" };
+
+/* Test that a diagnostic is issued for attempted initialization of
+   other character types by a UTF-8 string literal.  */
+const char16_t c16buf1[] = u8"text";		/* { dg-error "from a string literal with type array of .unsigned char." } */
+const char16_t c16buf2[] = { u8"text" };	/* { dg-error "from a string literal with type array of .unsigned char." } */
+const char32_t c32buf1[] = u8"text";		/* { dg-error "from a string literal with type array of .unsigned char." } */
+const char32_t c32buf2[] = { u8"text" };	/* { dg-error "from a string literal with type array of .unsigned char." } */
+const wchar_t wbuf1[] = u8"text";		/* { dg-error "from a string literal with type array of .unsigned char." } */
+const wchar_t wbuf2[] = { u8"text" };		/* { dg-error "from a string literal with type array of .unsigned char." } */
+
+/* Test that char8_t arrays can be initialized by an ordinary string
+   literal.  */
+const char8_t c8buf3[] = "text";
+const char8_t c8buf4[] = { "text" };
diff --git a/gcc/testsuite/gcc.dg/gnu2x-utf8str-type.c b/gcc/testsuite/gcc.dg/gnu2x-utf8str-type.c
new file mode 100644
index 00000000000..efe16ffc28d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gnu2x-utf8str-type.c
@@ -0,0 +1,5 @@
+/* Test C2X UTF-8 string literal type with -std=gnu2x.  */
+/* { dg-do compile } */
+/* { dg-options "-std=gnu2x" } */
+
+#include "c2x-utf8str-type.c"
diff --git a/gcc/testsuite/gcc.dg/gnu2x-utf8str.c b/gcc/testsuite/gcc.dg/gnu2x-utf8str.c
new file mode 100644
index 00000000000..f3719ea8c77
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gnu2x-utf8str.c
@@ -0,0 +1,34 @@
+/* Test initialization by UTF-8 string literal in C2X with -std=gnu2x.  */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=gnu2x" } */
+
+typedef __CHAR8_TYPE__  char8_t;
+typedef __CHAR16_TYPE__ char16_t;
+typedef __CHAR32_TYPE__ char32_t;
+typedef __WCHAR_TYPE__  wchar_t;
+
+/* Test that char, signed char, unsigned char, and char8_t arrays can be
+   initialized by a UTF-8 string literal.  */
+const char cbuf1[] = u8"text";
+const char cbuf2[] = { u8"text" };
+const signed char scbuf1[] = u8"text";
+const signed char scbuf2[] = { u8"text" };
+const unsigned char ucbuf1[] = u8"text";
+const unsigned char ucbuf2[] = { u8"text" };
+const char8_t c8buf1[] = u8"text";
+const char8_t c8buf2[] = { u8"text" };
+
+/* Test that a diagnostic is issued for attempted initialization of
+   other character types by a UTF-8 string literal.  */
+const char16_t c16buf1[] = u8"text";		/* { dg-error "from a string literal with type array of .unsigned char." } */
+const char16_t c16buf2[] = { u8"text" };	/* { dg-error "from a string literal with type array of .unsigned char." } */
+const char32_t c32buf1[] = u8"text";		/* { dg-error "from a string literal with type array of .unsigned char." } */
+const char32_t c32buf2[] = { u8"text" };	/* { dg-error "from a string literal with type array of .unsigned char." } */
+const wchar_t wbuf1[] = u8"text";		/* { dg-error "from a string literal with type array of .unsigned char." } */
+const wchar_t wbuf2[] = { u8"text" };		/* { dg-error "from a string literal with type array of .unsigned char." } */
+
+/* Test that char8_t arrays can be initialized by an ordinary string
+   literal.  */
+const char8_t c8buf3[] = "text";
+const char8_t c8buf4[] = { "text" };
-- 
2.32.0



* [PATCH v4 2/2] preprocessor/106426: Treat u8 character literals as unsigned in char8_t modes.
  2022-08-02 18:36 [PATCH v4 0/2] Implement C2X N2653 (char8_t) and correct UTF-8 character literal type in preprocessor directives for C++ Tom Honermann
  2022-08-02 18:36 ` [PATCH v4 1/2] C: Implement C2X N2653 char8_t and UTF-8 string literal changes Tom Honermann
@ 2022-08-02 18:36 ` Tom Honermann
  2022-08-02 22:14   ` Joseph Myers
  1 sibling, 1 reply; 7+ messages in thread
From: Tom Honermann @ 2022-08-02 18:36 UTC
  To: gcc-patches

This patch corrects handling of UTF-8 character literals in preprocessing
directives so that they are treated as unsigned types in char8_t enabled
C++ modes (C++17 and earlier with -fchar8_t, or C++20 and later without
-fno-char8_t). Previously, in C++ modes, UTF-8 character literals were
always treated as having the same type as ordinary character literals
(signed or unsigned depending on the target or on use of the -fsigned-char
or -funsigned-char options).
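
The updated tests detect signedness in the preprocessor with the
u8'\0' - 1 comparison.  The same probe is sketched below in C, to match
the C examples in patch 1; the patch itself concerns C++, where the
identical #if expressions apply.  The PP_* macro names and
utf8_char_pp_signedness are hypothetical.  The u8 probe is guarded because
u8 character literals are assumed to be unavailable in C before C2X.

```c
#include <assert.h>  /* static_assert (a macro before C23) */

/* Ordinary character literals: signedness follows the target and
   -fsigned-char / -funsigned-char.  An unsigned '\0' makes the
   subtraction wrap to a large positive value.  */
#if '\0' - 1 < 0
# define PP_PLAIN_CHAR_SIGNED 1
#else
# define PP_PLAIN_CHAR_SIGNED 0
#endif

/* UTF-8 character literals: 1 signed, 0 unsigned, -1 not available.  */
#if defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L
# if u8'\0' - 1 < 0
#  define PP_UTF8_CHAR_SIGNED 1
# else
#  define PP_UTF8_CHAR_SIGNED 0
# endif
#else
# define PP_UTF8_CHAR_SIGNED (-1)
#endif

static_assert (PP_UTF8_CHAR_SIGNED >= -1 && PP_UTF8_CHAR_SIGNED <= 1,
	       "PP_UTF8_CHAR_SIGNED is -1, 0, or 1");

/* Hypothetical helper exposing the preprocessor's answer at run time.  */
static int
utf8_char_pp_signedness (void)
{
  return PP_UTF8_CHAR_SIGNED;
}
```

In a char8_t enabled mode the u8 probe is expected to report unsigned (0)
regardless of -fsigned-char.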

	PR preprocessor/106426

gcc/c-family/ChangeLog:
	* c-opts.cc (c_common_post_options): Assign cpp_opts->unsigned_utf8char
	subject to -fchar8_t, -fsigned-char, and/or -funsigned-char.
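
The assignment can be read as the following selection rule.  This is a
stand-alone model rather than GCC code: utf8_char_is_unsigned and its
parameters are hypothetical stand-ins for cpp_opts->unsigned_utf8char,
flag_char8_t, and cpp_opts->unsigned_char.

```c
#include <assert.h>
#include <stdbool.h>

/* With char8_t enabled, u8 character literals are always unsigned in
   #if expressions; otherwise they follow ordinary character literals.  */
static bool
utf8_char_is_unsigned (bool char8_enabled, bool plain_char_unsigned)
{
  return char8_enabled ? true : plain_char_unsigned;
}

/* Usage example: the four combinations of the two inputs.  */
static void
example_utf8_char_signedness (void)
{
  assert (utf8_char_is_unsigned (true, false));   /* -fchar8_t -fsigned-char */
  assert (utf8_char_is_unsigned (true, true));    /* -fchar8_t -funsigned-char */
  assert (!utf8_char_is_unsigned (false, false)); /* -fno-char8_t -fsigned-char */
  assert (utf8_char_is_unsigned (false, true));   /* -fno-char8_t -funsigned-char */
}
```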

gcc/testsuite/ChangeLog:
	* g++.dg/ext/char8_t-char-literal-1.C: Check signedness of u8 literals.
	* g++.dg/ext/char8_t-char-literal-2.C: Check signedness of u8 literals.

libcpp/ChangeLog:
	* charset.cc (narrow_str_to_charconst): Set signedness of CPP_UTF8CHAR
	literals based on unsigned_utf8char.
	* include/cpplib.h (cpp_options): Add unsigned_utf8char.
	* init.cc (cpp_create_reader): Initialize unsigned_utf8char.
---
 gcc/c-family/c-opts.cc                            | 1 +
 gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C | 6 +++++-
 gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C | 4 ++++
 libcpp/charset.cc                                 | 4 ++--
 libcpp/include/cpplib.h                           | 4 ++--
 libcpp/init.cc                                    | 1 +
 6 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index 108adc5caf8..02ce1e86cdb 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1062,6 +1062,7 @@ c_common_post_options (const char **pfilename)
   /* char8_t support is implicitly enabled in C++20 and C2X.  */
   if (flag_char8_t == -1)
     flag_char8_t = (cxx_dialect >= cxx20) || flag_isoc2x;
+  cpp_opts->unsigned_utf8char = flag_char8_t ? 1 : cpp_opts->unsigned_char;
 
   if (flag_extern_tls_init)
     {
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C
index 8ed85ccfdcd..2994dd38516 100644
--- a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C
+++ b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-1.C
@@ -1,6 +1,6 @@
 // Test that UTF-8 character literals have type char if -fchar8_t is not enabled.
 // { dg-do compile }
-// { dg-options "-std=c++17 -fno-char8_t" }
+// { dg-options "-std=c++17 -fsigned-char -fno-char8_t" }
 
 template<typename T1, typename T2>
   struct is_same
@@ -10,3 +10,7 @@ template<typename T>
   { static const bool value = true; };
 
 static_assert(is_same<decltype(u8'x'), char>::value, "Error");
+
+#if u8'\0' - 1 > 0
+#error "UTF-8 character literals not signed in preprocessor"
+#endif
diff --git a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C
index 7861736689c..db4fe70046d 100644
--- a/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C
+++ b/gcc/testsuite/g++.dg/ext/char8_t-char-literal-2.C
@@ -10,3 +10,7 @@ template<typename T>
   { static const bool value = true; };
 
 static_assert(is_same<decltype(u8'x'), char8_t>::value, "Error");
+
+#if u8'\0' - 1 < 0
+#error "UTF-8 character literals not unsigned in preprocessor"
+#endif
diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index ca8b7cf7aa5..12e31632228 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1960,8 +1960,8 @@ narrow_str_to_charconst (cpp_reader *pfile, cpp_string str,
   /* Multichar constants are of type int and therefore signed.  */
   if (i > 1)
     unsigned_p = 0;
-  else if (type == CPP_UTF8CHAR && !CPP_OPTION (pfile, cplusplus))
-    unsigned_p = 1;
+  else if (type == CPP_UTF8CHAR)
+    unsigned_p = CPP_OPTION (pfile, unsigned_utf8char);
   else
     unsigned_p = CPP_OPTION (pfile, unsigned_char);
 
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index 3eba6f74b57..f9c042db034 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -581,8 +581,8 @@ struct cpp_options
      ints and target wide characters, respectively.  */
   size_t precision, char_precision, int_precision, wchar_precision;
 
-  /* True means chars (wide chars) are unsigned.  */
-  bool unsigned_char, unsigned_wchar;
+  /* True means chars (wide chars, UTF-8 chars) are unsigned.  */
+  bool unsigned_char, unsigned_wchar, unsigned_utf8char;
 
   /* True if the most significant byte in a word has the lowest
      address in memory.  */
diff --git a/libcpp/init.cc b/libcpp/init.cc
index f4ab83d2145..0242da5f55c 100644
--- a/libcpp/init.cc
+++ b/libcpp/init.cc
@@ -231,6 +231,7 @@ cpp_create_reader (enum c_lang lang, cpp_hash_table *table,
   CPP_OPTION (pfile, int_precision) = CHAR_BIT * sizeof (int);
   CPP_OPTION (pfile, unsigned_char) = 0;
   CPP_OPTION (pfile, unsigned_wchar) = 1;
+  CPP_OPTION (pfile, unsigned_utf8char) = 1;
   CPP_OPTION (pfile, bytes_big_endian) = 1;  /* does not matter */
 
   /* Default to no charset conversion.  */
-- 
2.32.0



* Re: [PATCH v4 1/2] C: Implement C2X N2653 char8_t and UTF-8 string literal changes
  2022-08-02 18:36 ` [PATCH v4 1/2] C: Implement C2X N2653 char8_t and UTF-8 string literal changes Tom Honermann
@ 2022-08-02 22:10   ` Joseph Myers
  0 siblings, 0 replies; 7+ messages in thread
From: Joseph Myers @ 2022-08-02 22:10 UTC
  To: Tom Honermann; +Cc: gcc-patches

On Tue, 2 Aug 2022, Tom Honermann via Gcc-patches wrote:

> This patch implements the core language and compiler dependent library
> changes adopted for C2X via WG14 N2653.  The changes include:
> - Change of type for UTF-8 string literals from array of const char to
>   array of const char8_t (unsigned char).
> - A new atomic_char8_t typedef.
> - A new ATOMIC_CHAR8_T_LOCK_FREE macro defined in terms of the existing
>   __GCC_ATOMIC_CHAR8_T_LOCK_FREE predefined macro.

OK.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH v4 2/2] preprocessor/106426: Treat u8 character literals as unsigned in char8_t modes.
  2022-08-02 18:36 ` [PATCH v4 2/2] preprocessor/106426: Treat u8 character literals as unsigned in char8_t modes Tom Honermann
@ 2022-08-02 22:14   ` Joseph Myers
  2022-08-08 13:45     ` Tom Honermann
  0 siblings, 1 reply; 7+ messages in thread
From: Joseph Myers @ 2022-08-02 22:14 UTC
  To: Tom Honermann; +Cc: gcc-patches

On Tue, 2 Aug 2022, Tom Honermann via Gcc-patches wrote:

> This patch corrects handling of UTF-8 character literals in preprocessing
> directives so that they are treated as unsigned types in char8_t enabled
> C++ modes (C++17 with -fchar8_t or C++20 without -fno-char8_t). Previously,
> UTF-8 character literals were always treated as having the same type as
> ordinary character literals (signed or unsigned dependent on target or use
> of the -fsigned-char or -funsigned-char options).

OK in the absence of C++ maintainer objections within 72 hours.  (This is 
the case where, when I added support for such literals for C (commit 
7c5890cc0a0ecea0e88cc39e9fba6385fb579e61), I raised the question of 
whether they should be unsigned in the preprocessor for C++ as well.)

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH v4 2/2] preprocessor/106426: Treat u8 character literals as unsigned in char8_t modes.
  2022-08-02 22:14   ` Joseph Myers
@ 2022-08-08 13:45     ` Tom Honermann
  2022-08-08 20:01       ` Joseph Myers
  0 siblings, 1 reply; 7+ messages in thread
From: Tom Honermann @ 2022-08-08 13:45 UTC
  To: Joseph Myers; +Cc: gcc-patches

On 8/2/22 6:14 PM, Joseph Myers wrote:
> On Tue, 2 Aug 2022, Tom Honermann via Gcc-patches wrote:
>
>> This patch corrects handling of UTF-8 character literals in preprocessing
>> directives so that they are treated as unsigned types in char8_t enabled
>> C++ modes (C++17 with -fchar8_t or C++20 without -fno-char8_t). Previously,
>> UTF-8 character literals were always treated as having the same type as
>> ordinary character literals (signed or unsigned dependent on target or use
>> of the -fsigned-char or -funsigned-char options).
> OK in the absence of C++ maintainer objections within 72 hours.  (This is
> the case where, when I added support for such literals for C (commit
> 7c5890cc0a0ecea0e88cc39e9fba6385fb579e61), I raised the question of
> whether they should be unsigned in the preprocessor for C++ as well.)

Joseph, would you be so kind as to commit this patch series for me? I 
don't have commit access. Thank you in advance!

Tom.



* Re: [PATCH v4 2/2] preprocessor/106426: Treat u8 character literals as unsigned in char8_t modes.
  2022-08-08 13:45     ` Tom Honermann
@ 2022-08-08 20:01       ` Joseph Myers
  0 siblings, 0 replies; 7+ messages in thread
From: Joseph Myers @ 2022-08-08 20:01 UTC
  To: Tom Honermann; +Cc: gcc-patches

On Mon, 8 Aug 2022, Tom Honermann via Gcc-patches wrote:

> On 8/2/22 6:14 PM, Joseph Myers wrote:
> > On Tue, 2 Aug 2022, Tom Honermann via Gcc-patches wrote:
> > 
> > > This patch corrects handling of UTF-8 character literals in preprocessing
> > > directives so that they are treated as unsigned types in char8_t enabled
> > > C++ modes (C++17 with -fchar8_t or C++20 without -fno-char8_t).
> > > Previously,
> > > UTF-8 character literals were always treated as having the same type as
> > > ordinary character literals (signed or unsigned dependent on target or use
> > > of the -fsigned-char or -funsigned-char options).
> > OK in the absence of C++ maintainer objections within 72 hours.  (This is
> > the case where, when I added support for such literals for C (commit
> > 7c5890cc0a0ecea0e88cc39e9fba6385fb579e61), I raised the question of
> > whether they should be unsigned in the preprocessor for C++ as well.)
> 
> Joseph, would you be so kind as to commit this patch series for me? I don't
> have commit access. Thank you in advance!

Done.

-- 
Joseph S. Myers
joseph@codesourcery.com
