public inbox for libc-alpha@sourceware.org
* [PATCH v4 0/3] C++20 P0482R6 and C2X N2653: char8_t, mbrtoc8(), and c8rtomb()
@ 2022-06-30 12:52 Tom Honermann
  2022-06-30 12:52 ` [PATCH v4 1/3] gconv: Correct Big5-HKSCS conversion to preserve all state bits. [BZ #25744] Tom Honermann
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Tom Honermann @ 2022-06-30 12:52 UTC (permalink / raw)
  To: libc-alpha; +Cc: Tom Honermann

This series of patches provides the following:
- A fix for BZ #25744 [1].
- Implementations of the mbrtoc8 and c8rtomb functions adopted for
  C++20 via WG21 P0482R6 [2] and for C2X via WG14 N2653 [3].
- A char8_t typedef as adopted for C2X via WG14 N2653 [3].

The fix for BZ #25744 [1] is included in this patch series because the tests
for mbrtoc8 and c8rtomb depend on it to exercise the special case where a pair
of Unicode code points is converted to/from a single double-byte sequence.
Big5-HKSCS has such conversions.

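N2653 was adopted by WG14 for C2X, and its wording is present in the N2912
revision of the C2X working draft [4].  This patch series enables the new
declarations in C2X mode and when _GNU_SOURCE is defined; the mbrtoc8 and
c8rtomb declarations are also enabled when the C++20 __cpp_char8_t feature
test macro is defined.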

This patch series addresses feedback provided in response to the prior
submission series starting at [5].  All feedback was addressed except as noted
below:
- I removed the previously proposed wcsmbs/test-char8-type.c test as
  redundant, per feedback.  I prototyped extending the c++-types.data test as
  suggested, but that would have introduced a C++20 dependency.  The prototype
  showed that the char16_t and char32_t types were mangled as 'Ds' and 'Di' as
  expected, but char8_t was mangled as 'h' (unsigned char) rather than 'Du'.
  It might be worth extending this test in the future if/when C++20 support
  becomes a minimum requirement.
- I did not switch the iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c test to use
  TEST_COMPARE; I kept the existing custom error handling instead.  The
  current error messages are customized to explain each failure in more
  detail, so removing them seemed like an arguable regression, and my changes
  stay consistent with the existing code.  Switching to TEST_COMPARE is
  reasonable, but I think that should be done separately from this change and
  separately motivated.
- I did not change the c8rtomb() and mbrtoc8() implementations to use
  thread-local storage when PS is null; the implementations continue to use
  static storage, consistent with c16rtomb(), mbrtoc16(), c32rtomb(), and
  mbrtoc32().  I think it is reasonable to switch all of these functions to
  thread-local storage, but such a change should be made consistently across
  all of them and should be separately motivated.

Thank you to Joseph Myers, Carlos O'Donell, Adhemerval Zanella, and Florian
Weimer for prior reviews of this patch series.

Tested on Linux x86_64.

[1]: Bug 25744
     "mbrtowc with Big5-HKSCS returns 2 instead of 1 when consuming the
     second byte of certain double byte characters"
     https://sourceware.org/bugzilla/show_bug.cgi?id=25744

[2]: WG21 P0482R6
     "char8_t: A type for UTF-8 characters and strings (Revision 6)"
     https://wg21.link/p0482r6

[3]: WG14 N2653
     "char8_t: A type for UTF-8 characters and strings (Revision 1)"
     http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm

[4]: WG14 N2912
     "Programming languages — C working draft — June 8, 2022"
     https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf

[5]: libc-alpha mailing list
     "[PATCH 0/3]: C++20 P0482R6 and C2X N2653: support for char8_t, mbrtoc8(), and c8rtomb()."
     https://sourceware.org/pipermail/libc-alpha/2022-February/136723.html

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v4 1/3] gconv: Correct Big5-HKSCS conversion to preserve all state bits. [BZ #25744]
  2022-06-30 12:52 [PATCH v4 0/3] C++20 P0482R6 and C2X N2653: char8_t, mbrtoc8(), and c8rtomb() Tom Honermann
@ 2022-06-30 12:52 ` Tom Honermann
  2022-07-04 18:16   ` Adhemerval Zanella
  2022-06-30 12:52 ` [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef Tom Honermann
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Tom Honermann @ 2022-06-30 12:52 UTC (permalink / raw)
  To: libc-alpha; +Cc: Tom Honermann

This patch corrects the Big5-HKSCS converter to preserve the lowest 3 bits of
the mbstate_t __count data member when the converter encounters an incomplete
multibyte character.

This fixes BZ #25744.
---
 iconvdata/big5hkscs.c                     | 16 +++---
 iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c | 65 +++++++++++++++++++++++
 2 files changed, 73 insertions(+), 8 deletions(-)

diff --git a/iconvdata/big5hkscs.c b/iconvdata/big5hkscs.c
index a28b18a5ec..d12389b2e3 100644
--- a/iconvdata/big5hkscs.c
+++ b/iconvdata/big5hkscs.c
@@ -17769,7 +17769,7 @@ static struct
    the output state to the initial state.  This has to be done during the
    flushing.  */
 #define EMIT_SHIFT_TO_INIT \
-  if (data->__statep->__count != 0)					      \
+  if ((data->__statep->__count >> 3) != 0)				      \
     {									      \
       if (FROM_DIRECTION)						      \
 	{								      \
@@ -17778,7 +17778,7 @@ static struct
 	      /* Write out the last character.  */			      \
 	      *((uint32_t *) outbuf) = data->__statep->__count >> 3;	      \
 	      outbuf += sizeof (uint32_t);				      \
-	      data->__statep->__count = 0;				      \
+	      data->__statep->__count &= 7;				      \
 	    }								      \
 	  else								      \
 	    /* We don't have enough room in the output buffer.  */	      \
@@ -17792,7 +17792,7 @@ static struct
 	      uint32_t lasttwo = data->__statep->__count >> 3;		      \
 	      *outbuf++ = (lasttwo >> 8) & 0xff;			      \
 	      *outbuf++ = lasttwo & 0xff;				      \
-	      data->__statep->__count = 0;				      \
+	      data->__statep->__count &= 7;				      \
 	    }								      \
 	  else								      \
 	    /* We don't have enough room in the output buffer.  */	      \
@@ -17878,7 +17878,7 @@ static struct
 									      \
 		/* Otherwise store only the first character now, and	      \
 		   put the second one into the queue.  */		      \
-		*statep = ch2 << 3;					      \
+		*statep = (ch2 << 3) | (*statep & 7);			      \
 		/* Tell the caller why we terminate the loop.  */	      \
 		result = __GCONV_FULL_OUTPUT;				      \
 		break;							      \
@@ -17895,7 +17895,7 @@ static struct
       }									      \
     else								      \
       /* Clear the queue and proceed to output the saved character.  */	      \
-      *statep = 0;							      \
+      *statep &= 7;							      \
 									      \
     put32 (outptr, ch);							      \
     outptr += 4;							      \
@@ -17946,7 +17946,7 @@ static struct
 	  }								      \
 	*outptr++ = (ch >> 8) & 0xff;					      \
 	*outptr++ = ch & 0xff;						      \
-	*statep = 0;							      \
+	*statep &= 7;							      \
 	inptr += 4;							      \
 	continue;							      \
 									      \
@@ -17959,7 +17959,7 @@ static struct
 	  }								      \
 	*outptr++ = (lasttwo >> 8) & 0xff;				      \
 	*outptr++ = lasttwo & 0xff;					      \
-	*statep = 0;							      \
+	*statep &= 7;							      \
 	continue;							      \
       }									      \
 									      \
@@ -17996,7 +17996,7 @@ static struct
 	   /* Check for possible combining character.  */		      \
 	    if (__glibc_unlikely (ch == 0xca || ch == 0xea))		      \
 	      {								      \
-		*statep = ((cp[0] << 8) | cp[1]) << 3;			      \
+		*statep = (((cp[0] << 8) | cp[1]) << 3) | (*statep & 7);      \
 		inptr += 4;						      \
 		continue;						      \
 	      }								      \
diff --git a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
index 9601b6c1d9..e1472dc2e2 100644
--- a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
+++ b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
@@ -128,6 +128,71 @@ check_conversion (struct testdata test)
       printf ("error: Result of third conversion was wrong.\n");
       err++;
     }
+
+  /* Now perform the same test as above consuming one byte at a time.  */
+  mbs = test.input;
+  memset (&st, 0, sizeof (st));
+
+  /* Consume the first byte; expect an incomplete multibyte character.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != -2)
+    {
+      printf ("error: First byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Advance past the first consumed byte.  */
+  mbs += 1;
+  /* Consume the second byte; expect the first wchar_t.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != 1)
+    {
+      printf ("error: Second byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Advance past the second consumed byte.  */
+  mbs += 1;
+  if (wc != test.expected[0])
+    {
+      printf ("error: Result of first wchar_t conversion was wrong.\n");
+      err++;
+    }
+  /* Consume no bytes; expect the second wchar_t.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != 0)
+    {
+      printf ("error: First attempt of third byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Do not advance past the third byte.  */
+  mbs += 0;
+  if (wc != test.expected[1])
+    {
+      printf ("error: Result of second wchar_t conversion was wrong.\n");
+      err++;
+    }
+  /* After the second wchar_t conversion, the converter should be in
+     the initial state since the two input BIG5-HKSCS bytes have been
+     consumed and the two wchar_t's have been output.  */
+  if (mbsinit (&st) == 0)
+    {
+      printf ("error: Converter not in initial state.\n");
+      err++;
+    }
+  /* Consume the third byte; expect the third wchar_t.  */
+  ret = mbrtowc (&wc, mbs, 1, &st);
+  if (ret != 1)
+    {
+      printf ("error: Third byte conversion returned %zd.\n", ret);
+      err++;
+    }
+  /* Advance past the third consumed byte.  */
+  mbs += 1;
+  if (wc != test.expected[2])
+    {
+      printf ("error: Result of third wchar_t conversion was wrong.\n");
+      err++;
+    }
+
   /* Return 0 if we saw no errors.  */
   return err;
 }
-- 
2.32.0



* [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-06-30 12:52 [PATCH v4 0/3] C++20 P0482R6 and C2X N2653: char8_t, mbrtoc8(), and c8rtomb() Tom Honermann
  2022-06-30 12:52 ` [PATCH v4 1/3] gconv: Correct Big5-HKSCS conversion to preserve all state bits. [BZ #25744] Tom Honermann
@ 2022-06-30 12:52 ` Tom Honermann
  2022-07-04 18:33   ` Adhemerval Zanella
  2022-06-30 12:52 ` [PATCH v4 3/3] stdlib: Tests for " Tom Honermann
  2022-07-04 19:08 ` [PATCH v4 0/3] C++20 P0482R6 and C2X N2653: char8_t, mbrtoc8(), and c8rtomb() Adhemerval Zanella
  3 siblings, 1 reply; 26+ messages in thread
From: Tom Honermann @ 2022-06-30 12:52 UTC (permalink / raw)
  To: libc-alpha; +Cc: Tom Honermann

This change provides implementations for the mbrtoc8 and c8rtomb
functions adopted for C++20 via WG21 P0482R6 and for C2X via WG14
N2653.  It also provides the char8_t typedef from WG14 N2653.

The mbrtoc8 and c8rtomb functions are declared in uchar.h in C2X
mode or when the _GNU_SOURCE macro or C++20 __cpp_char8_t feature
test macro is defined.

The char8_t typedef is declared in uchar.h in C2X mode or when the
_GNU_SOURCE macro is defined and the C++20 __cpp_char8_t feature
test macro is not defined (if __cpp_char8_t is defined, then char8_t
is a builtin type).
---
 NEWS                                          |   9 ++
 sysdeps/mach/hurd/i386/libc.abilist           |   2 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   2 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/arc/libc.abilist      |   2 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   2 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   2 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   2 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   2 +
 .../sysv/linux/microblaze/be/libc.abilist     |   2 +
 .../sysv/linux/microblaze/le/libc.abilist     |   2 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   2 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   2 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   2 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   2 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist     |   2 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   2 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   2 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   2 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   2 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist   |   2 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   2 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   2 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   2 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   2 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   2 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   2 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   2 +
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   2 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   2 +
 wcsmbs/Makefile                               |   2 +-
 wcsmbs/Versions                               |   3 +
 wcsmbs/c8rtomb.c                              | 132 ++++++++++++++++++
 wcsmbs/mbrtoc8.c                              | 126 +++++++++++++++++
 wcsmbs/uchar.h                                |  21 +++
 40 files changed, 360 insertions(+), 1 deletion(-)
 create mode 100644 wcsmbs/c8rtomb.c
 create mode 100644 wcsmbs/mbrtoc8.c

diff --git a/NEWS b/NEWS
index b0a3d7e512..94243e2170 100644
--- a/NEWS
+++ b/NEWS
@@ -46,6 +46,15 @@ Major new features:
   to more flexibly configure and operate on filesystem mounts.  The new
   mount APIs are specifically designed to work with namespaces.
 
+* Support for the mbrtoc8 and c8rtomb multibyte/UTF-8 character conversion
+  functions has been added per the ISO C2X N2653 and C++20 P0482R6 proposals.
+  Support for the char8_t typedef has been added per the ISO C2X N2653
+  proposal.  The functions are declared in uchar.h in C2X mode or when the
+  _GNU_SOURCE macro or C++20 __cpp_char8_t feature test macro is defined.
+  The char8_t typedef is declared in uchar.h in C2X mode or when the
+  _GNU_SOURCE macro is defined and the C++20 __cpp_char8_t feature test macro
+  is not defined (if __cpp_char8_t is defined, then char8_t is a builtin type).
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * Support for prelink will be removed in the next release; this includes
diff --git a/sysdeps/mach/hurd/i386/libc.abilist b/sysdeps/mach/hurd/i386/libc.abilist
index 4dc87e9061..66fb0e28fa 100644
--- a/sysdeps/mach/hurd/i386/libc.abilist
+++ b/sysdeps/mach/hurd/i386/libc.abilist
@@ -2289,6 +2289,8 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 close_range F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index 8dba065b81..b3cf9fdd70 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2616,8 +2616,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index 08f4750022..2a45006462 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2713,8 +2713,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist
index 75db763023..0ac6bba241 100644
--- a/sysdeps/unix/sysv/linux/arc/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arc/libc.abilist
@@ -2377,8 +2377,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
index fa33f317ac..bfa763906b 100644
--- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
@@ -496,8 +496,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
index dba2e4ce42..ffcd7ca432 100644
--- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
@@ -493,8 +493,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
index e6ff921c29..940777b118 100644
--- a/sysdeps/unix/sysv/linux/csky/libc.abilist
+++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
@@ -2652,8 +2652,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 8a40cece83..508efe6626 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -2601,8 +2601,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index a89826049f..16b91fcee9 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2785,8 +2785,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index d1d96b7469..51b646790d 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -2551,8 +2551,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index 63a62f267a..ddb43651f2 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -497,8 +497,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index f68325f9bc..3db7deb4d0 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -2728,8 +2728,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
index 247af2075c..94afb7ad0b 100644
--- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
@@ -2701,8 +2701,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
index b0ac3f9009..5873751425 100644
--- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
@@ -2698,8 +2698,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index b22cd6bf2f..f296e4edb7 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -2693,8 +2693,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index 12fc2cce3e..1888756819 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -2691,8 +2691,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index d3e96dfd43..7dfacee25b 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -2699,8 +2699,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index cb58ed4db0..53e188aafe 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -2602,8 +2602,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index 61ad58a599..bc6a836b1b 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2740,8 +2740,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/or1k/libc.abilist b/sysdeps/unix/sysv/linux/or1k/libc.abilist
index 1260dc4e2e..299fa67961 100644
--- a/sysdeps/unix/sysv/linux/or1k/libc.abilist
+++ b/sysdeps/unix/sysv/linux/or1k/libc.abilist
@@ -2123,8 +2123,10 @@ GLIBC_2.35 wprintf F
 GLIBC_2.35 write F
 GLIBC_2.35 writev F
 GLIBC_2.35 wscanf F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index 363939762c..a5a072394d 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -2755,8 +2755,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index f512ad8baf..2d26fd8639 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2788,8 +2788,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index c9bdc9859c..d9f1c593ea 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -2510,8 +2510,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index f091be30bd..874f33dbcc 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2812,8 +2812,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
index 7ea73f9af8..465798a56f 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
@@ -2379,8 +2379,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index 333fa62714..ecc0544c05 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2579,8 +2579,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index a867467b12..3e8d00d513 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2753,8 +2753,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index dbad5b3163..a872a3d186 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -2547,8 +2547,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
index 6f755cc173..a2938ca2be 100644
--- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
@@ -2608,8 +2608,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
index 77d936aa3c..ef318251c5 100644
--- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
@@ -2605,8 +2605,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 09bb4363e1..2e2fbe72e2 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -2748,8 +2748,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index 9df9cb6adb..e1991259cd 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -2574,8 +2574,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index 4829450ad0..7d0843d1d8 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2525,8 +2525,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index caea228bcb..761958f768 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2631,8 +2631,10 @@ GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
 GLIBC_2.35 epoll_pwait2 F
 GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
+GLIBC_2.36 c8rtomb F
 GLIBC_2.36 fsmount F
 GLIBC_2.36 fsopen F
+GLIBC_2.36 mbrtoc8 F
 GLIBC_2.36 move_mount F
 GLIBC_2.36 pidfd_getfd F
 GLIBC_2.36 pidfd_open F
diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
index df9a85f4a9..bda281ad70 100644
--- a/wcsmbs/Makefile
+++ b/wcsmbs/Makefile
@@ -42,7 +42,7 @@ routines := wcscat wcschr wcscmp wcscpy wcscspn wcsdup wcslen wcsncat \
 	    wcsmbsload mbsrtowcs_l \
 	    isoc99_wscanf isoc99_vwscanf isoc99_fwscanf isoc99_vfwscanf \
 	    isoc99_swscanf isoc99_vswscanf \
-	    mbrtoc16 c16rtomb mbrtoc32 c32rtomb
+	    mbrtoc8 c8rtomb mbrtoc16 c16rtomb mbrtoc32 c32rtomb
 
 strop-tests :=  wcscmp wcsncmp wmemcmp wcslen wcschr wcsrchr wcscpy wcsnlen \
 		wcpcpy wcsncpy wcpncpy wcscat wcsncat wcschrnul wcsspn wcspbrk \
diff --git a/wcsmbs/Versions b/wcsmbs/Versions
index 0b31c1b940..ec28acfb73 100644
--- a/wcsmbs/Versions
+++ b/wcsmbs/Versions
@@ -49,4 +49,7 @@ libc {
     wcstof32; wcstof64; wcstof32x;
     wcstof32_l; wcstof64_l; wcstof32x_l;
   }
+  GLIBC_2.36 {
+    c8rtomb; mbrtoc8;
+  }
 }
diff --git a/wcsmbs/c8rtomb.c b/wcsmbs/c8rtomb.c
new file mode 100644
index 0000000000..b564770eb5
--- /dev/null
+++ b/wcsmbs/c8rtomb.c
@@ -0,0 +1,132 @@
+/* UTF-8 to multibyte conversion.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <uchar.h>
+#include <wchar.h>
+
+
+/* This is the private state used if PS is NULL.  */
+static mbstate_t state;
+
+size_t
+c8rtomb (char *s, char8_t c8, mbstate_t *ps)
+{
+  /* This implementation depends on the converter invoked by wcrtomb not
+     needing to retain state in either the topmost bit of ps->__count or
+     in ps->__value between invocations.  This implementation uses the
+     topmost bit of ps->__count to indicate that trailing code units are
+     expected and uses ps->__value to store previously seen code units.  */
+
+  wchar_t wc;
+
+  if (ps == NULL)
+    ps = &state;
+
+  if (s == NULL)
+    {
+      /* If 's' is a null pointer, behave as if u8'\0' were passed as 'c8'.
+         If this occurs in the middle of an incomplete code unit sequence,
+         an error is reported below.  */
+      c8 = u8""[0];
+    }
+
+  if (! (ps->__count & 0x80000000))
+    {
+      /* Initial state.  */
+      if ((c8 >= 0x80 && c8 <= 0xC1) || c8 >= 0xF5)
+	{
+	  /* An invalid lead code unit.  */
+	  __set_errno (EILSEQ);
+	  return -1;
+	}
+      if (c8 >= 0xC2)
+	{
+	  /* A valid lead code unit.  */
+	  ps->__count |= 0x80000000;
+	  ps->__value.__wchb[0] = c8;
+	  ps->__value.__wchb[3] = 1;
+	  return 0;
+	}
+      /* A single byte (ASCII) code unit.  */
+      wc = c8;
+    }
+  else
+    {
+      char8_t cu1 = ps->__value.__wchb[0];
+      if (ps->__value.__wchb[3] == 1)
+	{
+	  /* A single lead code unit was previously seen.  */
+	  if ((c8 < 0x80 || c8 > 0xBF)
+              || (cu1 == 0xE0 && c8 < 0xA0)
+              || (cu1 == 0xED && c8 > 0x9F)
+              || (cu1 == 0xF0 && c8 < 0x90)
+              || (cu1 == 0xF4 && c8 > 0x8F))
+	    {
+	      /* An invalid second code unit.  */
+	      __set_errno (EILSEQ);
+	      return -1;
+	    }
+	  if (cu1 >= 0xE0)
+	    {
+	      /* A three or four code unit sequence.  */
+	      ps->__value.__wchb[1] = c8;
+	      ++ps->__value.__wchb[3];
+	      return 0;
+	    }
+	  wc = ((cu1 & 0x1F) << 6)
+	       + (c8 & 0x3F);
+	}
+      else
+	{
+	  char8_t cu2 = ps->__value.__wchb[1];
+	  /* A three or four byte code unit sequence.  */
+	  if (c8 < 0x80 || c8 > 0xBF)
+	    {
+	      /* An invalid third or fourth code unit.  */
+	      __set_errno (EILSEQ);
+	      return -1;
+	    }
+	  if (ps->__value.__wchb[3] == 2 && cu1 >= 0xF0)
+	    {
+	      /* A four code unit sequence.  */
+	      ps->__value.__wchb[2] = c8;
+	      ++ps->__value.__wchb[3];
+	      return 0;
+	    }
+	  if (cu1 < 0xF0)
+	    {
+	      wc = ((cu1 & 0x0F) << 12)
+		   + ((cu2 & 0x3F) << 6)
+		   + (c8 & 0x3F);
+	    }
+	  else
+	    {
+	      char8_t cu3 = ps->__value.__wchb[2];
+	      wc = ((cu1 & 0x07) << 18)
+		   + ((cu2 & 0x3F) << 12)
+		   + ((cu3 & 0x3F) << 6)
+		   + (c8 & 0x3F);
+	    }
+	}
+      ps->__count &= 0x7fffffff;
+      ps->__value.__wch = 0;
+    }
+
+  return wcrtomb (s, wc, ps);
+}
diff --git a/wcsmbs/mbrtoc8.c b/wcsmbs/mbrtoc8.c
new file mode 100644
index 0000000000..f2fab3b6a7
--- /dev/null
+++ b/wcsmbs/mbrtoc8.c
@@ -0,0 +1,126 @@
+/* Multibyte to UTF-8 conversion.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <assert.h>
+#include <dlfcn.h>
+#include <errno.h>
+#include <gconv.h>
+#include <uchar.h>
+#include <wcsmbsload.h>
+
+#include <sysdep.h>
+
+#ifndef EILSEQ
+# define EILSEQ EINVAL
+#endif
+
+
+/* This is the private state used if PS is NULL.  */
+static mbstate_t state;
+
+size_t
+mbrtoc8 (char8_t *pc8, const char *s, size_t n, mbstate_t *ps)
+{
+  /* This implementation depends on the converter invoked by mbrtowc() not
+     needing to retain state in either the topmost bit of ps->__count or
+     in ps->__value between invocations.  This implementation uses the
+     topmost bit of ps->__count to indicate that trailing code units are
+     yet to be written and uses ps->__value to store those code units.  */
+
+  if (ps == NULL)
+    ps = &state;
+
+  /* If state indicates that trailing code units are yet to be written, write
+     those first regardless of whether 's' is a null pointer.  */
+  if (ps->__count & 0x80000000)
+    {
+      /* ps->__value.__wchb[3] stores the index of the next code unit to
+         write.  Code units are stored in reverse order.  */
+      size_t i = ps->__value.__wchb[3];
+      if (pc8 != NULL)
+	{
+	  *pc8 = ps->__value.__wchb[i];
+	}
+      if (i == 0)
+	{
+	  ps->__count &= 0x7fffffff;
+	  ps->__value.__wch = 0;
+	}
+      else
+	--ps->__value.__wchb[3];
+      return -3;
+    }
+
+  if (s == NULL)
+    {
+      /* If 's' is a null pointer, behave as if a null pointer was passed for
+         'pc8', an empty string was passed for 's', and 1 was passed for 'n'.  */
+      pc8 = NULL;
+      s = "";
+      n = 1;
+    }
+
+  wchar_t wc;
+  size_t result;
+
+  result = mbrtowc (&wc, s, n, ps);
+  if (result <= n)
+    {
+      if (wc <= 0x7F)
+	{
+	  if (pc8 != NULL)
+	    *pc8 = wc;
+	}
+      else if (wc <= 0x7FF)
+	{
+	  if (pc8 != NULL)
+	    *pc8 = 0xC0 + ((wc >> 6) & 0x1F);
+	  ps->__value.__wchb[0] = 0x80 + (wc & 0x3F);
+	  ps->__value.__wchb[3] = 0;
+	  ps->__count |= 0x80000000;
+	}
+      else if (wc <= 0xFFFF)
+	{
+	  if (pc8 != NULL)
+	    *pc8 = 0xE0 + ((wc >> 12) & 0x0F);
+	  ps->__value.__wchb[1] = 0x80 + ((wc >> 6) & 0x3F);
+	  ps->__value.__wchb[0] = 0x80 + (wc & 0x3F);
+	  ps->__value.__wchb[3] = 1;
+	  ps->__count |= 0x80000000;
+	}
+      else if (wc <= 0x10FFFF)
+	{
+	  if (pc8 != NULL)
+	    *pc8 = 0xF0 + ((wc >> 18) & 0x07);
+	  ps->__value.__wchb[2] = 0x80 + ((wc >> 12) & 0x3F);
+	  ps->__value.__wchb[1] = 0x80 + ((wc >> 6) & 0x3F);
+	  ps->__value.__wchb[0] = 0x80 + (wc & 0x3F);
+	  ps->__value.__wchb[3] = 2;
+	  ps->__count |= 0x80000000;
+	}
+    }
+  if (result == 0 && wc != 0)
+    {
+      /* mbrtowc() never returns -3.  When a multibyte sequence converts to
+         multiple wide characters, the subsequent ones consume no input, so
+         mbrtowc() returns 0 for them even though no null was written.  */
+      result = -3;
+    }
+
+  return result;
+}
diff --git a/wcsmbs/uchar.h b/wcsmbs/uchar.h
index 051cdcbeb5..c37e8619a0 100644
--- a/wcsmbs/uchar.h
+++ b/wcsmbs/uchar.h
@@ -31,6 +31,13 @@
 #include <bits/types.h>
 #include <bits/types/mbstate_t.h>
 
+/* Declare the C2x char8_t typedef in C2x modes, but only if the C++
+   __cpp_char8_t feature test macro is not defined.  */
+#if __GLIBC_USE (ISOC2X) && !defined __cpp_char8_t
+/* Define the 8-bit character type.  */
+typedef unsigned char char8_t;
+#endif
+
 #ifndef __USE_ISOCXX11
 /* Define the 16-bit and 32-bit character types.  */
 typedef __uint_least16_t char16_t;
@@ -40,6 +47,20 @@ typedef __uint_least32_t char32_t;
 
 __BEGIN_DECLS
 
+/* Declare the C2x mbrtoc8() and c8rtomb() functions in C2x modes or if
+   the C++ __cpp_char8_t feature test macro is defined.  */
+#if __GLIBC_USE (ISOC2X) || defined __cpp_char8_t
+/* Write char8_t representation of multibyte character pointed
+   to by S to PC8.  */
+extern size_t mbrtoc8  (char8_t *__restrict __pc8,
+			const char *__restrict __s, size_t __n,
+			mbstate_t *__restrict __p) __THROW;
+
+/* Write multibyte representation of char8_t C8 to S.  */
+extern size_t c8rtomb  (char *__restrict __s, char8_t __c8,
+			mbstate_t *__restrict __ps) __THROW;
+#endif
+
 /* Write char16_t representation of multibyte character pointed
    to by S to PC16.  */
 extern size_t mbrtoc16 (char16_t *__restrict __pc16,
-- 
2.32.0



* [PATCH v4 3/3] stdlib: Tests for mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-06-30 12:52 [PATCH v4 0/3] C++20 P0482R6 and C2X N2653: char8_t, mbrtoc8(), and c8rtomb() Tom Honermann
  2022-06-30 12:52 ` [PATCH v4 1/3] gconv: Correct Big5-HKSCS conversion to preserve all state bits. [BZ #25744] Tom Honermann
  2022-06-30 12:52 ` [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef Tom Honermann
@ 2022-06-30 12:52 ` Tom Honermann
  2022-07-04 18:58   ` Adhemerval Zanella
  2022-07-04 19:08 ` [PATCH v4 0/3] C++20 P0482R6 and C2X N2653: char8_t, mbrtoc8(), and c8rtomb() Adhemerval Zanella
  3 siblings, 1 reply; 26+ messages in thread
From: Tom Honermann @ 2022-06-30 12:52 UTC
  To: libc-alpha; +Cc: Tom Honermann

This change adds tests for the mbrtoc8 and c8rtomb functions adopted for
C++20 via WG21 P0482R6 and for C2X via WG14 N2653, and for the char8_t
typedef adopted for C2X via WG14 N2653.
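The validity rules these tests exercise (overlong encodings, surrogate
code points, values above U+10FFFF) reduce to a per-lead-code-unit range
check on the second code unit, as in the c8rtomb implementation in patch
2/3.  A minimal standalone sketch of that check follows; utf8_second_unit_ok
is a hypothetical helper, not a glibc function, and it assumes the lead
code unit has already been validated as being in 0xC2..0xF4:

```c
#include <stdbool.h>

/* Hypothetical helper (not part of glibc): return true if C2 may follow
   the lead code unit C1 in a multi code unit UTF-8 sequence.  C1 is
   assumed to already be a valid lead code unit (0xC2..0xF4).  */
static bool
utf8_second_unit_ok (unsigned char c1, unsigned char c2)
{
  if (c2 < 0x80 || c2 > 0xBF)
    return false;               /* Not a trailing code unit at all.  */
  if (c1 == 0xE0 && c2 < 0xA0)
    return false;               /* Overlong three code unit encoding.  */
  if (c1 == 0xED && c2 > 0x9F)
    return false;               /* Would encode a surrogate code point.  */
  if (c1 == 0xF0 && c2 < 0x90)
    return false;               /* Overlong four code unit encoding.  */
  if (c1 == 0xF4 && c2 > 0x8F)
    return false;               /* Would encode a value above U+10FFFF.  */
  return true;
}
```

For example, u8"\xE0\x9F\xBF" (an overlong U+07FF) is rejected at the
second code unit, which is why test_overlong_encoding expects c8rtomb to
return (size_t) -1 for u8s[1] rather than for u8s[2].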

The tests for mbrtoc8 and c8rtomb specifically exercise conversion to
and from Big5-HKSCS because of special cases that arise with that encoding.
Big5-HKSCS defines some double byte sequences that convert to more than
one Unicode code point.  In order to test this, the locale dependencies
for running tests under wcsmbs are expanded to include zh_HK.BIG5-HKSCS.
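For example, test-c8rtomb.c expects U+00CA U+0304 to convert to the
Big5-HKSCS double byte sequence 0x88 0x62.  In UTF-8 each of those code
points needs two code units, so in the other direction mbrtoc8 has to hand
back four char8_t values while consuming only two input bytes; the values
beyond the first are reported with the (size_t) -3 return value, which
indicates that no input was consumed.  A sketch of the code point to UTF-8
split, mirroring the arithmetic in mbrtoc8 but written as a hypothetical
standalone helper (utf8_encode is not a glibc function, and unlike mbrtoc8
it rejects surrogates itself rather than relying on mbrtowc):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of glibc): encode the Unicode scalar
   value CP as UTF-8 into OUT and return the number of code units
   written, or 0 if CP is a surrogate or exceeds U+10FFFF.  */
static size_t
utf8_encode (uint32_t cp, unsigned char out[4])
{
  if (cp <= 0x7F)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp <= 0x7FF)
    {
      out[0] = (unsigned char) (0xC0 + ((cp >> 6) & 0x1F));
      out[1] = (unsigned char) (0x80 + (cp & 0x3F));
      return 2;
    }
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;                   /* Surrogates are not scalar values.  */
  if (cp <= 0xFFFF)
    {
      out[0] = (unsigned char) (0xE0 + ((cp >> 12) & 0x0F));
      out[1] = (unsigned char) (0x80 + ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 + (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      out[0] = (unsigned char) (0xF0 + ((cp >> 18) & 0x07));
      out[1] = (unsigned char) (0x80 + ((cp >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 + ((cp >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 + (cp & 0x3F));
      return 4;
    }
  return 0;                     /* Beyond the Unicode code space.  */
}
```

Under these assumptions, utf8_encode (0x00CA, ...) yields 0xC3 0x8A and
utf8_encode (0x0304, ...) yields 0xCC 0x84.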
---
 wcsmbs/Makefile       |   3 +-
 wcsmbs/test-c8rtomb.c | 613 ++++++++++++++++++++++++++++++++++++++++++
 wcsmbs/test-mbrtoc8.c | 539 +++++++++++++++++++++++++++++++++++++
 3 files changed, 1154 insertions(+), 1 deletion(-)
 create mode 100644 wcsmbs/test-c8rtomb.c
 create mode 100644 wcsmbs/test-mbrtoc8.c

diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
index bda281ad70..e6b9e8743a 100644
--- a/wcsmbs/Makefile
+++ b/wcsmbs/Makefile
@@ -52,6 +52,7 @@ tests := tst-wcstof wcsmbs-tst1 tst-wcsnlen tst-btowc tst-mbrtowc \
 	 tst-c16c32-1 wcsatcliff tst-wcstol-locale tst-wcstod-nan-locale \
 	 tst-wcstod-round test-char-types tst-fgetwc-after-eof \
 	 tst-wcstod-nan-sign tst-c16-surrogate tst-c32-state \
+	 test-mbrtoc8 test-c8rtomb \
 	 $(addprefix test-,$(strop-tests)) tst-mbstowcs \
 	 tst-wprintf-binary
 
@@ -59,7 +60,7 @@ include ../Rules
 
 ifeq ($(run-built-tests),yes)
 LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 hr_HR.ISO-8859-2 \
-	   ja_JP.EUC-JP zh_TW.EUC-TW tr_TR.UTF-8 tr_TR.ISO-8859-9
+	   ja_JP.EUC-JP zh_TW.EUC-TW tr_TR.UTF-8 tr_TR.ISO-8859-9 zh_HK.BIG5-HKSCS
 include ../gen-locales.mk
 
 $(objpfx)tst-btowc.out: $(gen-locales)
diff --git a/wcsmbs/test-c8rtomb.c b/wcsmbs/test-c8rtomb.c
new file mode 100644
index 0000000000..6d72189e86
--- /dev/null
+++ b/wcsmbs/test-c8rtomb.c
@@ -0,0 +1,613 @@
+/* Test c8rtomb.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <limits.h>
+#include <locale.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <uchar.h>
+#include <wchar.h>
+#include <support/check.h>
+#include <support/support.h>
+
+static int
+test_truncated_code_unit_sequence (void)
+{
+  /* Missing trailing code unit for a two byte code unit sequence.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xC2";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  /* Missing first trailing code unit for a three byte code unit sequence.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xE0";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  /* Missing second trailing code unit for a three byte code unit sequence.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xE0\xA0";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t)  0);
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  /* Missing first trailing code unit for a four byte code unit sequence.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xF0";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  /* Missing second trailing code unit for a four byte code unit sequence.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xF0\x90";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t)  0);
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  /* Missing third trailing code unit for a four byte code unit sequence.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xF0\x90\x80";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t)  0);
+    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t)  0);
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  return 0;
+}
+
+static int
+test_invalid_trailing_code_unit_sequence (void)
+{
+  /* Invalid trailing code unit for a two byte code unit sequence.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xC2\xC0";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  /* Invalid first trailing code unit for a three byte code unit sequence.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xE0\xC0";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  /* Invalid second trailing code unit for a three byte code unit sequence.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xE0\xA0\xC0";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t)  0);
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  /* Invalid first trailing code unit for a four byte code unit sequence.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xF0\xC0";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  /* Invalid second trailing code unit for a four byte code unit sequence.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xF0\x90\xC0";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t)  0);
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  /* Invalid third trailing code unit for a four byte code unit sequence.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xF0\x90\x80\xC0";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t)  0);
+    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t)  0);
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  return 0;
+}
+
+static int
+test_lone_trailing_code_units (void)
+{
+  /* Lone trailing code unit.  */
+  const char8_t *u8s = (const char8_t*) u8"\x80";
+  char buf[MB_LEN_MAX] = { 0 };
+  mbstate_t s = { 0 };
+
+  errno = 0;
+  TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1);
+  TEST_COMPARE (errno, EILSEQ);
+
+  return 0;
+}
+
+static int
+test_overlong_encoding (void)
+{
+  /* Two byte overlong encoding.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xC0\x80";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  /* Two byte overlong encoding.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xC1\x80";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  /* Three byte overlong encoding.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xE0\x9F\xBF";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  /* Four byte overlong encoding.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xF0\x8F\xBF\xBF";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  return 0;
+}
+
+static int
+test_surrogate_range (void)
+{
+  /* Would encode U+D800.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xED\xA0\x80";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  /* Would encode U+DFFF.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xED\xBF\xBF";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  return 0;
+}
+
+static int
+test_out_of_range_encoding (void)
+{
+  /* Would encode U+00110000.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xF4\x90\x80\x80";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  /* Would encode U+00150000.  */
+  {
+    const char8_t *u8s = (const char8_t*) u8"\xF5\x90\x80\x80";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    errno = 0;
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  return 0;
+}
+
+static int
+test_null_output_buffer (void)
+{
+  /* Null output buffer with an initial state; 'c8' is treated as u8'\0'.  */
+  {
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (NULL, u8"X"[0], &s), (size_t) 1);
+    /* Assert the state is now an initial state.  */
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Null buffer with a state corresponding to an incompletely read code
+     unit sequence.  In this case, an error occurs since insufficient
+     information is available to complete the already started code unit
+     sequence and return to the initial state.  */
+  {
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8"\xC2"[0], &s), (size_t)  0);
+    errno = 0;
+    TEST_COMPARE (c8rtomb (NULL, u8"\x80"[0], &s), (size_t) -1);
+    TEST_COMPARE (errno, EILSEQ);
+  }
+
+  return 0;
+}
+
+static int
+test_utf8 (void)
+{
+  xsetlocale (LC_ALL, "de_DE.UTF-8");
+
+  /* Null character.  */
+  {
+    /* U+0000 => 0x00 */
+    const char8_t *u8s = (const char8_t*) u8"\x00";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 1);
+    TEST_COMPARE (buf[0], (char) 0x00);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* First non-null character in the code point range that maps to a single
+     code unit.  */
+  {
+    /* U+0001 => 0x01 */
+    const char8_t *u8s = (const char8_t*) u8"\x01";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 1);
+    TEST_COMPARE (buf[0], (char) 0x01);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Last character in the code point range that maps to a single code unit.  */
+  {
+    /* U+007F => 0x7F */
+    const char8_t *u8s = (const char8_t*) u8"\x7F";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 1);
+    TEST_COMPARE (buf[0], (char) 0x7F);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* First character in the code point range that maps to two code units.  */
+  {
+    /* U+0080 => 0xC2 0x80 */
+    const char8_t *u8s = (const char8_t*) u8"\xC2\x80";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 2);
+    TEST_COMPARE (buf[0], (char) 0xC2);
+    TEST_COMPARE (buf[1], (char) 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Last character in the code point range that maps to two code units.  */
+  {
+    /* U+07FF => 0xDF 0xBF */
+    const char8_t *u8s = (const char8_t*) u8"\u07FF";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 2);
+    TEST_COMPARE (buf[0], (char) 0xDF);
+    TEST_COMPARE (buf[1], (char) 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* First character in the code point range that maps to three code units.  */
+  {
+    /* U+0800 => 0xE0 0xA0 0x80 */
+    const char8_t *u8s = (const char8_t*) u8"\u0800";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
+    TEST_COMPARE (buf[0], (char) 0xE0);
+    TEST_COMPARE (buf[1], (char) 0xA0);
+    TEST_COMPARE (buf[2], (char) 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Last character in the code point range that maps to three code units
+     before the surrogate code point range.  */
+  {
+    /* U+D7FF => 0xED 0x9F 0xBF */
+    const char8_t *u8s = (const char8_t*) u8"\uD7FF";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
+    TEST_COMPARE (buf[0], (char) 0xED);
+    TEST_COMPARE (buf[1], (char) 0x9F);
+    TEST_COMPARE (buf[2], (char) 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* First character in the code point range that maps to three code units
+     after the surrogate code point range.  */
+  {
+    /* U+E000 => 0xEE 0x80 0x80 */
+    const char8_t *u8s = (const char8_t*) u8"\uE000";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
+    TEST_COMPARE (buf[0], (char) 0xEE);
+    TEST_COMPARE (buf[1], (char) 0x80);
+    TEST_COMPARE (buf[2], (char) 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* U+FEFF is converted as an ordinary character, not treated as a BOM.  */
+  {
+    /* U+FEFF => 0xEF 0xBB 0xBF */
+    const char8_t *u8s = (const char8_t*) u8"\uFEFF";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
+    TEST_COMPARE (buf[0], (char) 0xEF);
+    TEST_COMPARE (buf[1], (char) 0xBB);
+    TEST_COMPARE (buf[2], (char) 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Replacement character.  */
+  {
+    /* U+FFFD => 0xEF 0xBF 0xBD */
+    const char8_t *u8s = (const char8_t*) u8"\uFFFD";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
+    TEST_COMPARE (buf[0], (char) 0xEF);
+    TEST_COMPARE (buf[1], (char) 0xBF);
+    TEST_COMPARE (buf[2], (char) 0xBD);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Last character in the code point range that maps to three code units.  */
+  {
+    /* U+FFFF => 0xEF 0xBF 0xBF */
+    const char8_t *u8s = (const char8_t*) u8"\uFFFF";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
+    TEST_COMPARE (buf[0], (char) 0xEF);
+    TEST_COMPARE (buf[1], (char) 0xBF);
+    TEST_COMPARE (buf[2], (char) 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* First character in the code point range that maps to four code units.  */
+  {
+    /* U+10000 => 0xF0 0x90 0x80 0x80 */
+    const char8_t *u8s = (const char8_t*) u8"\U00010000";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 4);
+    TEST_COMPARE (buf[0], (char) 0xF0);
+    TEST_COMPARE (buf[1], (char) 0x90);
+    TEST_COMPARE (buf[2], (char) 0x80);
+    TEST_COMPARE (buf[3], (char) 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Last character in the code point range that maps to four code units.  */
+  {
+    /* U+10FFFF => 0xF4 0x8F 0xBF 0xBF */
+    const char8_t *u8s = (const char8_t*) u8"\U0010FFFF";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 4);
+    TEST_COMPARE (buf[0], (char) 0xF4);
+    TEST_COMPARE (buf[1], (char) 0x8F);
+    TEST_COMPARE (buf[2], (char) 0xBF);
+    TEST_COMPARE (buf[3], (char) 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  return 0;
+}
+
+static int
+test_big5_hkscs (void)
+{
+  xsetlocale (LC_ALL, "zh_HK.BIG5-HKSCS");
+
+  /* A pair of two-byte UTF-8 code unit sequences, a Unicode code point
+     followed by a combining character, that map to a single double-byte
+     character.  */
+  {
+    /* U+00CA U+0304 => 0x88 0x62 */
+    const char8_t *u8s = (const char8_t*) u8"\u00CA\u0304";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 2);
+    TEST_COMPARE (buf[0], (char) 0x88);
+    TEST_COMPARE (buf[1], (char) 0x62);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Another pair of two-byte UTF-8 code unit sequences, a Unicode code
+     point followed by a combining character, that map to a single
+     double-byte character.  */
+  {
+    /* U+00EA U+030C => 0x88 0xA5 */
+    const char8_t *u8s = (const char8_t*) u8"\u00EA\u030C";
+    char buf[MB_LEN_MAX] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0);
+    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 2);
+    TEST_COMPARE (buf[0], (char) 0x88);
+    TEST_COMPARE (buf[1], (char) 0xA5);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  return 0;
+}
+
+static int
+do_test (void)
+{
+  test_truncated_code_unit_sequence ();
+  test_invalid_trailing_code_unit_sequence ();
+  test_lone_trailing_code_units ();
+  test_overlong_encoding ();
+  test_surrogate_range ();
+  test_out_of_range_encoding ();
+  test_null_output_buffer ();
+  test_utf8 ();
+  test_big5_hkscs ();
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/wcsmbs/test-mbrtoc8.c b/wcsmbs/test-mbrtoc8.c
new file mode 100644
index 0000000000..b282fa6dba
--- /dev/null
+++ b/wcsmbs/test-mbrtoc8.c
@@ -0,0 +1,539 @@
+/* Test mbrtoc8.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <locale.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <uchar.h>
+#include <wchar.h>
+#include <support/check.h>
+#include <support/support.h>
+
+static int
+test_utf8 (void)
+{
+  xsetlocale (LC_ALL, "de_DE.UTF-8");
+
+  /* No inputs.  */
+  {
+    const char *mbs = "";
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 0, &s), (size_t) -2); /* no input */
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Null character.  */
+  {
+    const char *mbs = "\x00"; /* 0x00 => U+0000 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 0);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0x00);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* First non-null character in the code point range that maps to a single
+     code unit.  */
+  {
+    const char *mbs = "\x01"; /* 0x01 => U+0001 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0x01);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Last character in the code point range that maps to a single code unit.  */
+  {
+    const char *mbs = "\x7F"; /* 0x7F => U+007F */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0x7F);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* First character in the code point range that maps to two code units.  */
+  {
+    const char *mbs = "\xC2\x80"; /* 0xC2 0x80 => U+0080 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2);
+    mbs += 2;
+    TEST_COMPARE (buf[0], 0xC2);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xC2\x80"; /* 0xC2 0x80 => U+0080 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xC2);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Last character in the code point range that maps to two code units.  */
+  {
+    const char *mbs = "\xDF\xBF"; /* 0xDF 0xBF => U+07FF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2);
+    mbs += 2;
+    TEST_COMPARE (buf[0], 0xDF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xDF\xBF"; /* 0xDF 0xBF => U+07FF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xDF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* First character in the code point range that maps to three code units.  */
+  {
+    const char *mbs = "\xE0\xA0\x80"; /* 0xE0 0xA0 0x80 => U+0800 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
+    mbs += 3;
+    TEST_COMPARE (buf[0], 0xE0);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xA0);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xE0\xA0\x80"; /* 0xE0 0xA0 0x80 => U+0800 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xE0);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xA0);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Last character in the code point range that maps to three code units
+     before the surrogate code point range.  */
+  {
+    const char *mbs = "\xED\x9F\xBF"; /* 0xED 0x9F 0xBF => U+D7FF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
+    mbs += 3;
+    TEST_COMPARE (buf[0], 0xED);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x9F);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xED\x9F\xBF"; /* 0xED 0x9F 0xBF => U+D7FF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xED);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x9F);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* First character in the code point range that maps to three code units
+     after the surrogate code point range.  */
+  {
+    const char *mbs = "\xEE\x80\x80"; /* 0xEE 0x80 0x80 => U+E000 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
+    mbs += 3;
+    TEST_COMPARE (buf[0], 0xEE);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xEE\x80\x80"; /* 0xEE 0x80 0x80 => U+E000 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xEE);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* U+FEFF is converted like any other character, not treated as a BOM.  */
+  {
+    const char *mbs = "\xEF\xBB\xBF"; /* 0xEF 0xBB 0xBF => U+FEFF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
+    mbs += 3;
+    TEST_COMPARE (buf[0], 0xEF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBB);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xEF\xBB\xBF"; /* 0xEF 0xBB 0xBF => U+FEFF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xEF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBB);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Replacement character.  */
+  {
+    const char *mbs = "\xEF\xBF\xBD"; /* 0xEF 0xBF 0xBD => U+FFFD */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
+    mbs += 3;
+    TEST_COMPARE (buf[0], 0xEF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBD);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xEF\xBF\xBD"; /* 0xEF 0xBF 0xBD => U+FFFD */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xEF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBD);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Last character in the code point range that maps to three code units.  */
+  {
+    const char *mbs = "\xEF\xBF\xBF"; /* 0xEF 0xBF 0xBF => U+FFFF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
+    mbs += 3;
+    TEST_COMPARE (buf[0], 0xEF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xEF\xBF\xBF"; /* 0xEF 0xBF 0xBF => U+FFFF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xEF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* First character in the code point range that maps to four code units.  */
+  {
+    const char *mbs = "\xF0\x90\x80\x80"; /* 0xF0 0x90 0x80 0x80 => U+10000 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 4);
+    mbs += 4;
+    TEST_COMPARE (buf[0], 0xF0);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x90);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xF0\x90\x80\x80"; /* 0xF0 0x90 0x80 0x80 => U+10000 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xF0);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x90);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x80);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Last character in the code point range that maps to four code units.  */
+  {
+    const char *mbs = "\xF4\x8F\xBF\xBF"; /* 0xF4 0x8F 0xBF 0xBF => U+10FFFF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 4);
+    mbs += 4;
+    TEST_COMPARE (buf[0], 0xF4);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x8F);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\xF4\x8F\xBF\xBF"; /* 0xF4 0x8F 0xBF 0xBF => U+10FFFF */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xF4);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x8F);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xBF);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  return 0;
+}
+
+static int
+test_big5_hkscs (void)
+{
+  xsetlocale (LC_ALL, "zh_HK.BIG5-HKSCS");
+
+  /* A double-byte character that maps to a pair of two-byte UTF-8 code unit
+     sequences.  */
+  {
+    const char *mbs = "\x88\x62"; /* 0x88 0x62 => U+00CA U+0304 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2);
+    mbs += 2;
+    TEST_COMPARE (buf[0], 0xC3);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x8A);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xCC);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x84);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\x88\x62"; /* 0x88 0x62 => U+00CA U+0304 */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xC3);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x8A);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xCC);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x84);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Another double-byte character that maps to a pair of two-byte UTF-8
+     code unit sequences.  */
+  {
+    const char *mbs = "\x88\xA5"; /* 0x88 0xA5 => U+00EA U+030C */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2);
+    mbs += 2;
+    TEST_COMPARE (buf[0], 0xC3);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xAA);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xCC);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x8C);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  /* Same as last test, but one code unit at a time.  */
+  {
+    const char *mbs = "\x88\xA5"; /* 0x88 0xA5 => U+00EA U+030C */
+    char8_t buf[1] = { 0 };
+    mbstate_t s = { 0 };
+
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
+    mbs += 1;
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
+    mbs += 1;
+    TEST_COMPARE (buf[0], 0xC3);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xAA);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0xCC);
+    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
+    TEST_COMPARE (buf[0], 0x8C);
+    TEST_VERIFY (mbsinit (&s));
+  }
+
+  return 0;
+}
+
+static int
+do_test (void)
+{
+  test_utf8 ();
+  test_big5_hkscs ();
+  return 0;
+}
+
+#include <support/test-driver.c>
-- 
2.32.0



* Re: [PATCH v4 1/3] gconv: Correct Big5-HKSCS conversion to preserve all state bits. [BZ #25744]
  2022-06-30 12:52 ` [PATCH v4 1/3] gconv: Correct Big5-HKSCS conversion to preserve all state bits. [BZ #25744] Tom Honermann
@ 2022-07-04 18:16   ` Adhemerval Zanella
  0 siblings, 0 replies; 26+ messages in thread
From: Adhemerval Zanella @ 2022-07-04 18:16 UTC (permalink / raw)
  To: Tom Honermann; +Cc: libc-alpha



> On 30 Jun 2022, at 09:52, Tom Honermann via Libc-alpha <libc-alpha@sourceware.org> wrote:
> 
> This patch corrects the Big5-HKSCS converter to preserve the lowest 3 bits of
> the mbstate_t __count data member when the converter encounters an incomplete
> multibyte character.
> 
> This fixes BZ #25744.

LGTM, thanks.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> ---
> iconvdata/big5hkscs.c                     | 16 +++---
> iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c | 65 +++++++++++++++++++++++
> 2 files changed, 73 insertions(+), 8 deletions(-)
> 
> diff --git a/iconvdata/big5hkscs.c b/iconvdata/big5hkscs.c
> index a28b18a5ec..d12389b2e3 100644
> --- a/iconvdata/big5hkscs.c
> +++ b/iconvdata/big5hkscs.c
> @@ -17769,7 +17769,7 @@ static struct
>    the output state to the initial state.  This has to be done during the
>    flushing.  */
> #define EMIT_SHIFT_TO_INIT \
> -  if (data->__statep->__count != 0)					      \
> +  if ((data->__statep->__count >> 3) != 0)				      \
>     {									      \
>       if (FROM_DIRECTION)						      \
> 	{								      \
> @@ -17778,7 +17778,7 @@ static struct
> 	      /* Write out the last character.  */			      \
> 	      *((uint32_t *) outbuf) = data->__statep->__count >> 3;	      \
> 	      outbuf += sizeof (uint32_t);				      \
> -	      data->__statep->__count = 0;				      \
> +	      data->__statep->__count &= 7;				      \
> 	    }								      \
> 	  else								      \
> 	    /* We don't have enough room in the output buffer.  */	      \
> @@ -17792,7 +17792,7 @@ static struct
> 	      uint32_t lasttwo = data->__statep->__count >> 3;		      \
> 	      *outbuf++ = (lasttwo >> 8) & 0xff;			      \
> 	      *outbuf++ = lasttwo & 0xff;				      \
> -	      data->__statep->__count = 0;				      \
> +	      data->__statep->__count &= 7;				      \
> 	    }								      \
> 	  else								      \
> 	    /* We don't have enough room in the output buffer.  */	      \
> @@ -17878,7 +17878,7 @@ static struct
> 									      \
> 		/* Otherwise store only the first character now, and	      \
> 		   put the second one into the queue.  */		      \
> -		*statep = ch2 << 3;					      \
> +		*statep = (ch2 << 3) | (*statep & 7);			      \
> 		/* Tell the caller why we terminate the loop.  */	      \
> 		result = __GCONV_FULL_OUTPUT;				      \
> 		break;							      \
> @@ -17895,7 +17895,7 @@ static struct
>       }									      \
>     else								      \
>       /* Clear the queue and proceed to output the saved character.  */	      \
> -      *statep = 0;							      \
> +      *statep &= 7;							      \
> 									      \
>     put32 (outptr, ch);							      \
>     outptr += 4;							      \
> @@ -17946,7 +17946,7 @@ static struct
> 	  }								      \
> 	*outptr++ = (ch >> 8) & 0xff;					      \
> 	*outptr++ = ch & 0xff;						      \
> -	*statep = 0;							      \
> +	*statep &= 7;							      \
> 	inptr += 4;							      \
> 	continue;							      \
> 									      \
> @@ -17959,7 +17959,7 @@ static struct
> 	  }								      \
> 	*outptr++ = (lasttwo >> 8) & 0xff;				      \
> 	*outptr++ = lasttwo & 0xff;					      \
> -	*statep = 0;							      \
> +	*statep &= 7;							      \
> 	continue;							      \
>       }									      \
> 									      \
> @@ -17996,7 +17996,7 @@ static struct
> 	   /* Check for possible combining character.  */		      \
> 	    if (__glibc_unlikely (ch == 0xca || ch == 0xea))		      \
> 	      {								      \
> -		*statep = ((cp[0] << 8) | cp[1]) << 3;			      \
> +		*statep = (((cp[0] << 8) | cp[1]) << 3) | (*statep & 7);      \
> 		inptr += 4;						      \
> 		continue;						      \
> 	      }								      \
> diff --git a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
> index 9601b6c1d9..e1472dc2e2 100644
> --- a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
> +++ b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
> @@ -128,6 +128,71 @@ check_conversion (struct testdata test)
>       printf ("error: Result of third conversion was wrong.\n");
>       err++;
>     }
> +
> +  /* Now perform the same test as above consuming one byte at a time.  */
> +  mbs = test.input;
> +  memset (&st, 0, sizeof (st));
> +
> +  /* Consume the first byte; expect an incomplete multibyte character.  */
> +  ret = mbrtowc (&wc, mbs, 1, &st);
> +  if (ret != -2)
> +    {
> +      printf ("error: First byte conversion returned %zd.\n", ret);
> +      err++;
> +    }
> +  /* Advance past the first consumed byte.  */
> +  mbs += 1;
> +  /* Consume the second byte; expect the first wchar_t.  */
> +  ret = mbrtowc (&wc, mbs, 1, &st);
> +  if (ret != 1)
> +    {
> +      printf ("error: Second byte conversion returned %zd.\n", ret);
> +      err++;
> +    }
> +  /* Advance past the second consumed byte.  */
> +  mbs += 1;
> +  if (wc != test.expected[0])
> +    {
> +      printf ("error: Result of first wchar_t conversion was wrong.\n");
> +      err++;
> +    }
> +  /* Consume no bytes; expect the second wchar_t.  */
> +  ret = mbrtowc (&wc, mbs, 1, &st);
> +  if (ret != 0)
> +    {
> +      printf ("error: First attempt of third byte conversion returned %zd.\n", ret);
> +      err++;
> +    }
> +  /* Do not advance past the third byte.  */
> +  mbs += 0;
> +  if (wc != test.expected[1])
> +    {
> +      printf ("error: Result of second wchar_t conversion was wrong.\n");
> +      err++;
> +    }
> +  /* After the second wchar_t conversion, the converter should be in
> +     the initial state since the two input BIG5-HKSCS bytes have been
> +     consumed and the two wchar_t values have been output.  */
> +  if (mbsinit (&st) == 0)
> +    {
> +      printf ("error: Converter not in initial state.\n");
> +      err++;
> +    }
> +  /* Consume the third byte; expect the third wchar_t.  */
> +  ret = mbrtowc (&wc, mbs, 1, &st);
> +  if (ret != 1)
> +    {
> +      printf ("error: Third byte conversion returned %zd.\n", ret);
> +      err++;
> +    }
> +  /* Advance past the third consumed byte.  */
> +  mbs += 1;
> +  if (wc != test.expected[2])
> +    {
> +      printf ("error: Result of third wchar_t conversion was wrong.\n");
> +      err++;
> +    }
> +
>   /* Return 0 if we saw no errors.  */
>   return err;
> }
> -- 
> 2.32.0
> 



* Re: [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-06-30 12:52 ` [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef Tom Honermann
@ 2022-07-04 18:33   ` Adhemerval Zanella
  2022-07-19 21:08     ` Joseph Myers
  0 siblings, 1 reply; 26+ messages in thread
From: Adhemerval Zanella @ 2022-07-04 18:33 UTC (permalink / raw)
  To: Tom Honermann; +Cc: libc-alpha



> On 30 Jun 2022, at 09:52, Tom Honermann via Libc-alpha <libc-alpha@sourceware.org> wrote:
> 
> This change provides implementations for the mbrtoc8 and c8rtomb
> functions adopted for C++20 via WG21 P0482R6 and for C2X via WG14
> N2653.  It also provides the char8_t typedef from WG14 N2653.
> 
> The mbrtoc8 and c8rtomb functions are declared in uchar.h in C2X
> mode or when the _GNU_SOURCE macro or C++20 __cpp_char8_t feature
> test macro is defined.
> 
> The char8_t typedef is declared in uchar.h in C2X mode or when the
> _GNU_SOURCE macro is defined and the C++20 __cpp_char8_t feature
> test macro is not defined (if __cpp_char8_t is defined, then char8_t
> is a builtin type).

LGTM with a minor nit below.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> ---
> NEWS                                          |   9 ++
> sysdeps/mach/hurd/i386/libc.abilist           |   2 +
> sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   2 +
> sysdeps/unix/sysv/linux/alpha/libc.abilist    |   2 +
> sysdeps/unix/sysv/linux/arc/libc.abilist      |   2 +
> sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   2 +
> sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   2 +
> sysdeps/unix/sysv/linux/csky/libc.abilist     |   2 +
> sysdeps/unix/sysv/linux/hppa/libc.abilist     |   2 +
> sysdeps/unix/sysv/linux/i386/libc.abilist     |   2 +
> sysdeps/unix/sysv/linux/ia64/libc.abilist     |   2 +
> .../sysv/linux/m68k/coldfire/libc.abilist     |   2 +
> .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   2 +
> .../sysv/linux/microblaze/be/libc.abilist     |   2 +
> .../sysv/linux/microblaze/le/libc.abilist     |   2 +
> .../sysv/linux/mips/mips32/fpu/libc.abilist   |   2 +
> .../sysv/linux/mips/mips32/nofpu/libc.abilist |   2 +
> .../sysv/linux/mips/mips64/n32/libc.abilist   |   2 +
> .../sysv/linux/mips/mips64/n64/libc.abilist   |   2 +
> sysdeps/unix/sysv/linux/nios2/libc.abilist    |   2 +
> sysdeps/unix/sysv/linux/or1k/libc.abilist     |   2 +
> .../linux/powerpc/powerpc32/fpu/libc.abilist  |   2 +
> .../powerpc/powerpc32/nofpu/libc.abilist      |   2 +
> .../linux/powerpc/powerpc64/be/libc.abilist   |   2 +
> .../linux/powerpc/powerpc64/le/libc.abilist   |   2 +
> .../unix/sysv/linux/riscv/rv32/libc.abilist   |   2 +
> .../unix/sysv/linux/riscv/rv64/libc.abilist   |   2 +
> .../unix/sysv/linux/s390/s390-32/libc.abilist |   2 +
> .../unix/sysv/linux/s390/s390-64/libc.abilist |   2 +
> sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   2 +
> sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   2 +
> .../sysv/linux/sparc/sparc32/libc.abilist     |   2 +
> .../sysv/linux/sparc/sparc64/libc.abilist     |   2 +
> .../unix/sysv/linux/x86_64/64/libc.abilist    |   2 +
> .../unix/sysv/linux/x86_64/x32/libc.abilist   |   2 +
> wcsmbs/Makefile                               |   2 +-
> wcsmbs/Versions                               |   3 +
> wcsmbs/c8rtomb.c                              | 132 ++++++++++++++++++
> wcsmbs/mbrtoc8.c                              | 126 +++++++++++++++++
> wcsmbs/uchar.h                                |  21 +++
> 40 files changed, 360 insertions(+), 1 deletion(-)
> create mode 100644 wcsmbs/c8rtomb.c
> create mode 100644 wcsmbs/mbrtoc8.c
> 
> diff --git a/NEWS b/NEWS
> index b0a3d7e512..94243e2170 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -46,6 +46,15 @@ Major new features:
>   to more flexibly configure and operate on filesystem mounts.  The new
>   mount APIs are specifically designed to work with namespaces.
> 
> +* Support for the mbrtoc8 and c8rtomb multibyte/UTF-8 character conversion
> +  functions has been added per the ISO C2X N2653 and C++20 P0482R6 proposals.
> +  Support for the char8_t typedef has been added per the ISO C2X N2653
> +  proposal.  The functions are declared in uchar.h in C2X mode or when the
> +  _GNU_SOURCE macro or C++20 __cpp_char8_t feature test macro is defined.
> +  The char8_t typedef is declared in uchar.h in C2X mode or when the
> +  _GNU_SOURCE macro is defined and the C++20 __cpp_char8_t feature test macro
> +  is not defined (if __cpp_char8_t is defined, then char8_t is a builtin type).
> +
> Deprecated and removed features, and other changes affecting compatibility:
> 
> * Support for prelink will be removed in the next release; this includes
> diff --git a/sysdeps/mach/hurd/i386/libc.abilist b/sysdeps/mach/hurd/i386/libc.abilist
> index 4dc87e9061..66fb0e28fa 100644
> --- a/sysdeps/mach/hurd/i386/libc.abilist
> +++ b/sysdeps/mach/hurd/i386/libc.abilist
> @@ -2289,6 +2289,8 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 close_range F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.4 __confstr_chk F
> GLIBC_2.4 __fgets_chk F
> GLIBC_2.4 __fgets_unlocked_chk F
> diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
> index 8dba065b81..b3cf9fdd70 100644
> --- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
> @@ -2616,8 +2616,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
> index 08f4750022..2a45006462 100644
> --- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
> @@ -2713,8 +2713,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist
> index 75db763023..0ac6bba241 100644
> --- a/sysdeps/unix/sysv/linux/arc/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/arc/libc.abilist
> @@ -2377,8 +2377,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
> index fa33f317ac..bfa763906b 100644
> --- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
> @@ -496,8 +496,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
> index dba2e4ce42..ffcd7ca432 100644
> --- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
> @@ -493,8 +493,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
> index e6ff921c29..940777b118 100644
> --- a/sysdeps/unix/sysv/linux/csky/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
> @@ -2652,8 +2652,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
> index 8a40cece83..508efe6626 100644
> --- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
> @@ -2601,8 +2601,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
> index a89826049f..16b91fcee9 100644
> --- a/sysdeps/unix/sysv/linux/i386/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
> @@ -2785,8 +2785,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
> index d1d96b7469..51b646790d 100644
> --- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
> @@ -2551,8 +2551,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
> index 63a62f267a..ddb43651f2 100644
> --- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
> @@ -497,8 +497,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
> index f68325f9bc..3db7deb4d0 100644
> --- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
> @@ -2728,8 +2728,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
> index 247af2075c..94afb7ad0b 100644
> --- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
> @@ -2701,8 +2701,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
> index b0ac3f9009..5873751425 100644
> --- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
> @@ -2698,8 +2698,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
> index b22cd6bf2f..f296e4edb7 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
> @@ -2693,8 +2693,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
> index 12fc2cce3e..1888756819 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
> @@ -2691,8 +2691,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
> index d3e96dfd43..7dfacee25b 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
> @@ -2699,8 +2699,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
> index cb58ed4db0..53e188aafe 100644
> --- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
> @@ -2602,8 +2602,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
> index 61ad58a599..bc6a836b1b 100644
> --- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
> @@ -2740,8 +2740,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/or1k/libc.abilist b/sysdeps/unix/sysv/linux/or1k/libc.abilist
> index 1260dc4e2e..299fa67961 100644
> --- a/sysdeps/unix/sysv/linux/or1k/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/or1k/libc.abilist
> @@ -2123,8 +2123,10 @@ GLIBC_2.35 wprintf F
> GLIBC_2.35 write F
> GLIBC_2.35 writev F
> GLIBC_2.35 wscanf F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
> index 363939762c..a5a072394d 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
> @@ -2755,8 +2755,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
> index f512ad8baf..2d26fd8639 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
> @@ -2788,8 +2788,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
> index c9bdc9859c..d9f1c593ea 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
> @@ -2510,8 +2510,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
> index f091be30bd..874f33dbcc 100644
> --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
> @@ -2812,8 +2812,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
> index 7ea73f9af8..465798a56f 100644
> --- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
> @@ -2379,8 +2379,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> index 333fa62714..ecc0544c05 100644
> --- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
> @@ -2579,8 +2579,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
> index a867467b12..3e8d00d513 100644
> --- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
> @@ -2753,8 +2753,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
> index dbad5b3163..a872a3d186 100644
> --- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
> @@ -2547,8 +2547,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
> index 6f755cc173..a2938ca2be 100644
> --- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
> @@ -2608,8 +2608,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
> index 77d936aa3c..ef318251c5 100644
> --- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
> @@ -2605,8 +2605,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
> index 09bb4363e1..2e2fbe72e2 100644
> --- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
> @@ -2748,8 +2748,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
> index 9df9cb6adb..e1991259cd 100644
> --- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
> @@ -2574,8 +2574,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
> index 4829450ad0..7d0843d1d8 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
> @@ -2525,8 +2525,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
> index caea228bcb..761958f768 100644
> --- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
> +++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
> @@ -2631,8 +2631,10 @@ GLIBC_2.35 __memcmpeq F
> GLIBC_2.35 _dl_find_object F
> GLIBC_2.35 epoll_pwait2 F
> GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F
> +GLIBC_2.36 c8rtomb F
> GLIBC_2.36 fsmount F
> GLIBC_2.36 fsopen F
> +GLIBC_2.36 mbrtoc8 F
> GLIBC_2.36 move_mount F
> GLIBC_2.36 pidfd_getfd F
> GLIBC_2.36 pidfd_open F
> diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
> index df9a85f4a9..bda281ad70 100644
> --- a/wcsmbs/Makefile
> +++ b/wcsmbs/Makefile
> @@ -42,7 +42,7 @@ routines := wcscat wcschr wcscmp wcscpy wcscspn wcsdup wcslen wcsncat \
> 	    wcsmbsload mbsrtowcs_l \
> 	    isoc99_wscanf isoc99_vwscanf isoc99_fwscanf isoc99_vfwscanf \
> 	    isoc99_swscanf isoc99_vswscanf \
> -	    mbrtoc16 c16rtomb mbrtoc32 c32rtomb
> +	    mbrtoc8 c8rtomb mbrtoc16 c16rtomb mbrtoc32 c32rtomb
> 
> strop-tests :=  wcscmp wcsncmp wmemcmp wcslen wcschr wcsrchr wcscpy wcsnlen \
> 		wcpcpy wcsncpy wcpncpy wcscat wcsncat wcschrnul wcsspn wcspbrk \
> diff --git a/wcsmbs/Versions b/wcsmbs/Versions
> index 0b31c1b940..ec28acfb73 100644
> --- a/wcsmbs/Versions
> +++ b/wcsmbs/Versions
> @@ -49,4 +49,7 @@ libc {
>     wcstof32; wcstof64; wcstof32x;
>     wcstof32_l; wcstof64_l; wcstof32x_l;
>   }
> +  GLIBC_2.36 {
> +    c8rtomb; mbrtoc8;
> +  }
> }
> diff --git a/wcsmbs/c8rtomb.c b/wcsmbs/c8rtomb.c
> new file mode 100644
> index 0000000000..b564770eb5
> --- /dev/null
> +++ b/wcsmbs/c8rtomb.c
> @@ -0,0 +1,132 @@
> +/* UTF-8 to multibyte conversion.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <errno.h>
> +#include <uchar.h>
> +#include <wchar.h>
> +
> +
> +/* This is the private state used if PS is NULL.  */
> +static mbstate_t state;
> +
> +size_t
> +c8rtomb (char *s, char8_t c8, mbstate_t *ps)
> +{
> +  /* This implementation depends on the converter invoked by wcrtomb not
> +     needing to retain state in either the top most bit of ps->__count or
> +     in ps->__value between invocations.  This implementation uses the
> +     top most bit of ps->__count to indicate that trailing code units are
> +     expected and uses ps->__value to store previously seen code units.  */
> +
> +  wchar_t wc;
> +
> +  if (ps == NULL)
> +    ps = &state;
> +
> +  if (s == NULL)
> +    {
> +      /* if 's' is a null pointer, behave as if u8'\0' was passed as 'c8'.  If
> +         this occurs for an incomplete code unit sequence, then an error will
> +         be reported below.  */
> +      c8 = u8""[0];
> +    }
> +
> +  if (! (ps->__count & 0x80000000))
> +    {
> +      /* Initial state.  */
> +      if ((c8 >= 0x80 && c8 <= 0xC1) || c8 >= 0xF5)
> +	{
> +	  /* An invalid lead code unit.  */
> +	  __set_errno (EILSEQ);
> +	  return -1;
> +	}
> +      if (c8 >= 0xC2)
> +	{
> +	  /* A valid lead code unit.  */
> +	  ps->__count |= 0x80000000;
> +	  ps->__value.__wchb[0] = c8;
> +	  ps->__value.__wchb[3] = 1;
> +	  return 0;
> +	}
> +      /* A single byte (ASCII) code unit.  */
> +      wc = c8;
> +    }
> +  else
> +    {
> +      char8_t cu1 = ps->__value.__wchb[0];
> +      if (ps->__value.__wchb[3] == 1)
> +	{
> +	  /* A single lead code unit was previously seen.  */
> +	  if ((c8 < 0x80 || c8 > 0xBF)
> +              || (cu1 == 0xE0 && c8 < 0xA0)
> +              || (cu1 == 0xED && c8 > 0x9F)
> +              || (cu1 == 0xF0 && c8 < 0x90)
> +              || (cu1 == 0xF4 && c8 > 0x8F))
> +	    {
> +	      /* An invalid second code unit.  */
> +	      __set_errno (EILSEQ);
> +	      return -1;
> +	    }
> +	  if (cu1 >= 0xE0)
> +	    {
> +	      /* A three or four code unit sequence.  */
> +	      ps->__value.__wchb[1] = c8;
> +	      ++ps->__value.__wchb[3];
> +	      return 0;
> +	    }
> +	  wc = ((cu1 & 0x1F) << 6)
> +	       + (c8 & 0x3F);
> +	}
> +      else
> +	{
> +	  char8_t cu2 = ps->__value.__wchb[1];
> +	  /* A three or four byte code unit sequence.  */
> +	  if (c8 < 0x80 || c8 > 0xBF)
> +	    {
> +	      /* An invalid third or fourth code unit.  */
> +	      __set_errno (EILSEQ);
> +	      return -1;
> +	    }
> +	  if (ps->__value.__wchb[3] == 2 && cu1 >= 0xF0)
> +	    {
> +	      /* A four code unit sequence.  */
> +	      ps->__value.__wchb[2] = c8;
> +	      ++ps->__value.__wchb[3];
> +	      return 0;
> +	    }
> +	  if (cu1 < 0xF0)
> +	    {
> +	      wc = ((cu1 & 0x0F) << 12)
> +		   + ((cu2 & 0x3F) << 6)
> +		   + (c8 & 0x3F);
> +	    }
> +	  else
> +	    {
> +	      char8_t cu3 = ps->__value.__wchb[2];
> +	      wc = ((cu1 & 0x07) << 18)
> +		   + ((cu2 & 0x3F) << 12)
> +		   + ((cu3 & 0x3F) << 6)
> +		   + (c8 & 0x3F);
> +	    }
> +	}
> +      ps->__count &= 0x7fffffff;
> +      ps->__value.__wch = 0;
> +    }
> +
> +  return wcrtomb (s, wc, ps);
> +}
> diff --git a/wcsmbs/mbrtoc8.c b/wcsmbs/mbrtoc8.c
> new file mode 100644
> index 0000000000..f2fab3b6a7
> --- /dev/null
> +++ b/wcsmbs/mbrtoc8.c
> @@ -0,0 +1,126 @@
> +/* Multibyte to UTF-8 conversion.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <assert.h>
> +#include <dlfcn.h>
> +#include <errno.h>
> +#include <gconv.h>
> +#include <uchar.h>
> +#include <wcsmbsload.h>
> +
> +#include <sysdep.h>
> +
> +#ifndef EILSEQ
> +# define EILSEQ EINVAL
> +#endif
> +
> +
> +/* This is the private state used if PS is NULL.  */
> +static mbstate_t state;
> +
> +size_t
> +mbrtoc8 (char8_t *pc8, const char *s, size_t n, mbstate_t *ps)
> +{
> +  /* This implementation depends on the converter invoked by mbrtowc() not

Refer to the function name without ‘()’.

> +     needing to retain state in either the top most bit of ps->__count or
> +     in ps->__value between invocations.  This implementation uses the
> +     top most bit of ps->__count to indicate that trailing code units are
> +     yet to be written and uses ps->__value to store those code units.  */
> +
> +  if (ps == NULL)
> +    ps = &state;
> +
> +  /* If state indicates that trailing code units are yet to be written, write
> +     those first regardless of whether 's' is a null pointer.  */
> +  if (ps->__count & 0x80000000)
> +    {
> +      /* ps->__value.__wchb[3] stores the index of the next code unit to
> +         write.  Code units are stored in reverse order.  */
> +      size_t i = ps->__value.__wchb[3];
> +      if (pc8 != NULL)
> +	{
> +	  *pc8 = ps->__value.__wchb[i];
> +	}
> +      if (i == 0)
> +	{
> +	  ps->__count &= 0x7fffffff;
> +	  ps->__value.__wch = 0;
> +	}
> +      else
> +	--ps->__value.__wchb[3];
> +      return -3;
> +    }
> +
> +  if (s == NULL)
> +    {
> +      /* if 's' is a null pointer, behave as if a null pointer was passed for
> +         'pc8', an empty string was passed for 's', and 1 passed for 'n'.  */
> +      pc8 = NULL;
> +      s = "";
> +      n = 1;
> +    }
> +
> +  wchar_t wc;
> +  size_t result;
> +
> +  result = mbrtowc (&wc, s, n, ps);
> +  if (result <= n)
> +    {
> +      if (wc <= 0x7F)
> +	{
> +	  if (pc8 != NULL)
> +	    *pc8 = wc;
> +	}
> +      else if (wc <= 0x7FF)
> +	{
> +	  if (pc8 != NULL)
> +	    *pc8 = 0xC0 + ((wc >> 6) & 0x1F);
> +	  ps->__value.__wchb[0] = 0x80 + (wc & 0x3F);
> +	  ps->__value.__wchb[3] = 0;
> +	  ps->__count |= 0x80000000;
> +	}
> +      else if (wc <= 0xFFFF)
> +	{
> +	  if (pc8 != NULL)
> +	    *pc8 = 0xE0 + ((wc >> 12) & 0x0F);
> +	  ps->__value.__wchb[1] = 0x80 + ((wc >> 6) & 0x3F);
> +	  ps->__value.__wchb[0] = 0x80 + (wc & 0x3F);
> +	  ps->__value.__wchb[3] = 1;
> +	  ps->__count |= 0x80000000;
> +	}
> +      else if (wc <= 0x10FFFF)
> +	{
> +	  if (pc8 != NULL)
> +	    *pc8 = 0xF0 + ((wc >> 18) & 0x07);
> +	  ps->__value.__wchb[2] = 0x80 + ((wc >> 12) & 0x3F);
> +	  ps->__value.__wchb[1] = 0x80 + ((wc >> 6) & 0x3F);
> +	  ps->__value.__wchb[0] = 0x80 + (wc & 0x3F);
> +	  ps->__value.__wchb[3] = 2;
> +	  ps->__count |= 0x80000000;
> +	}
> +    }
> +  if (result == 0 && wc != 0)
> +    {
> +      /* mbrtowc() never returns -3.  When a MB sequence converts to multiple
> +         WCs, no input is consumed when writing the subsequent WCs resulting
> +         in a result of 0 even if a null character wasn't written.  */
> +      result = -3;
> +    }
> +
> +  return result;
> +}
> diff --git a/wcsmbs/uchar.h b/wcsmbs/uchar.h
> index 051cdcbeb5..c37e8619a0 100644
> --- a/wcsmbs/uchar.h
> +++ b/wcsmbs/uchar.h
> @@ -31,6 +31,13 @@
> #include <bits/types.h>
> #include <bits/types/mbstate_t.h>
> 
> +/* Declare the C2x char8_t typedef in C2x modes, but only if the C++
> +  __cpp_char8_t feature test macro is not defined.  */
> +#if __GLIBC_USE (ISOC2X) && !defined __cpp_char8_t
> +/* Define the 8-bit character type.  */
> +typedef unsigned char char8_t;
> +#endif
> +
> #ifndef __USE_ISOCXX11
> /* Define the 16-bit and 32-bit character types.  */
> typedef __uint_least16_t char16_t;
> @@ -40,6 +47,20 @@ typedef __uint_least32_t char32_t;
> 
> __BEGIN_DECLS
> 
> +/* Declare the C2x mbrtoc8() and c8rtomb() functions in C2x modes or if
> +   the C++ __cpp_char8_t feature test macro is defined.  */
> +#if __GLIBC_USE (ISOC2X) || defined __cpp_char8_t
> +/* Write char8_t representation of multibyte character pointed
> +   to by S to PC8.  */
> +extern size_t mbrtoc8  (char8_t *__restrict __pc8,
> +			const char *__restrict __s, size_t __n,
> +			mbstate_t *__restrict __p) __THROW;
> +
> +/* Write multibyte representation of char8_t C8 to S.  */
> +extern size_t c8rtomb  (char *__restrict __s, char8_t __c8,
> +			mbstate_t *__restrict __ps) __THROW;
> +#endif
> +
> /* Write char16_t representation of multibyte character pointed
>    to by S to PC16.  */
> extern size_t mbrtoc16 (char16_t *__restrict __pc16,
> -- 
> 2.32.0
> 



* Re: [PATCH v4 3/3] stdlib: Tests for mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-06-30 12:52 ` [PATCH v4 3/3] stdlib: Tests for " Tom Honermann
@ 2022-07-04 18:58   ` Adhemerval Zanella
From: Adhemerval Zanella @ 2022-07-04 18:58 UTC (permalink / raw)
  To: Tom Honermann; +Cc: libc-alpha



> On 30 Jun 2022, at 09:52, Tom Honermann via Libc-alpha <libc-alpha@sourceware.org> wrote:
> 
> This change adds tests for the mbrtoc8 and c8rtomb functions adopted for
> C++20 via WG21 P0482R6 and for C2X via WG14 N2653, and for the char8_t
> typedef adopted for C2X from WG14 N2653.
> 
> The tests for mbrtoc8 and c8rtomb specifically exercise conversion to
> and from Big5-HKSCS because of special cases that arise with that encoding.
> Big5-HKSCS defines some double byte sequences that convert to more than
> one Unicode code point.  In order to test this, the locale dependencies
> for running tests under wcsmbs is expanded to include zh_HK.BIG5-HKSCS.

LGTM, with some minor style issues below.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> ---
> wcsmbs/Makefile       |   3 +-
> wcsmbs/test-c8rtomb.c | 613 ++++++++++++++++++++++++++++++++++++++++++
> wcsmbs/test-mbrtoc8.c | 539 +++++++++++++++++++++++++++++++++++++
> 3 files changed, 1154 insertions(+), 1 deletion(-)
> create mode 100644 wcsmbs/test-c8rtomb.c
> create mode 100644 wcsmbs/test-mbrtoc8.c
> 
> diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
> index bda281ad70..e6b9e8743a 100644
> --- a/wcsmbs/Makefile
> +++ b/wcsmbs/Makefile
> @@ -52,6 +52,7 @@ tests := tst-wcstof wcsmbs-tst1 tst-wcsnlen tst-btowc tst-mbrtowc \
> 	 tst-c16c32-1 wcsatcliff tst-wcstol-locale tst-wcstod-nan-locale \
> 	 tst-wcstod-round test-char-types tst-fgetwc-after-eof \
> 	 tst-wcstod-nan-sign tst-c16-surrogate tst-c32-state \
> +	 test-mbrtoc8 test-c8rtomb \
> 	 $(addprefix test-,$(strop-tests)) tst-mbstowcs \
> 	 tst-wprintf-binary
> 
> @@ -59,7 +60,7 @@ include ../Rules
> 
> ifeq ($(run-built-tests),yes)
> LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 hr_HR.ISO-8859-2 \
> -	   ja_JP.EUC-JP zh_TW.EUC-TW tr_TR.UTF-8 tr_TR.ISO-8859-9
> +	   ja_JP.EUC-JP zh_TW.EUC-TW tr_TR.UTF-8 tr_TR.ISO-8859-9 zh_HK.BIG5-HKSCS
> include ../gen-locales.mk
> 
> $(objpfx)tst-btowc.out: $(gen-locales)
> diff --git a/wcsmbs/test-c8rtomb.c b/wcsmbs/test-c8rtomb.c
> new file mode 100644
> index 0000000000..6d72189e86
> --- /dev/null
> +++ b/wcsmbs/test-c8rtomb.c
> @@ -0,0 +1,613 @@
> +/* Test c8rtomb.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <errno.h>
> +#include <limits.h>
> +#include <locale.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <uchar.h>
> +#include <wchar.h>
> +#include <support/check.h>
> +#include <support/support.h>
> +
> +static int
> +test_truncated_code_unit_sequence (void)
> +{
> +  /* Missing trailing code unit for a two byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xC2";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Missing first trailing code unit for a three byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xE0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Missing second trailing code unit for a three byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xE0\xA0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Missing first trailing code unit for a four byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Missing second trailing code unit for a four byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0\x90";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Missing third trailing code unit for a four byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0\x90\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t)  0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_invalid_trailing_code_unit_sequence (void)
> +{
> +  /* Invalid trailing code unit for a two byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xC2\xC0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Invalid first trailing code unit for a three byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xE0\xC0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Invalid second trailing code unit for a three byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xE0\xA0\xC0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Invalid first trailing code unit for a four byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0\xC0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Invalid second trailing code unit for a four byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0\x90\xC0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Invalid third trailing code unit for a four byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0\x90\x80\xC0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t)  0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_lone_trailing_code_units (void)
> +{
> +  /* Lone trailing code unit.  */
> +  const char8_t *u8s = (const char8_t*) u8"\x80";
> +  char buf[MB_LEN_MAX] = { 0 };
> +  mbstate_t s = { 0 };
> +
> +  errno = 0;
> +  TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1);
> +  TEST_COMPARE (errno, EILSEQ);
> +
> +  return 0;
> +}
> +
> +static int
> +test_overlong_encoding (void)
> +{
> +  /* Two byte overlong encoding.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xC0\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Two byte overlong encoding.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xC1\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Three byte overlong encoding.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xE0\x9F\xBF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Four byte overlong encoding.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0\x8F\xBF\xBF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_surrogate_range (void)
> +{
> +  /* Would encode U+D800.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xED\xA0\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Would encode U+DFFF.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xED\xBF\xBF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_out_of_range_encoding (void)
> +{
> +  /* Would encode U+00110000.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF4\x90\x80\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Would encode U+00150000.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF5\x90\x80\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_null_output_buffer (void)
> +{
> +  /* Null buffer with an initial state.  */
> +  {
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (NULL, u8"X"[0], &s), (size_t) 1);
> +    /* Assert the state is now an initial state.  */
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Null buffer with a state corresponding to an incompletely read code
> +     unit sequence.  In this case, an error occurs since insufficient
> +     information is available to complete the already started code unit
> +     sequence and return to the initial state.  */
> +  {
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8"\xC2"[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (NULL, u8"\x80"[0], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_utf8 (void)
> +{
> +  xsetlocale (LC_ALL, "de_DE.UTF-8");
> +
> +  /* Null character.  */
> +  {
> +    /* U+0000 => 0x00 */
> +    const char8_t *u8s = (const char8_t*) u8"\x00";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 1);
> +    TEST_COMPARE (buf[0], (char) 0x00);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First non-null character in the code point range that maps to a single
> +     code unit.  */
> +  {
> +    /* U+0001 => 0x01 */
> +    const char8_t *u8s = (const char8_t*) u8"\x01";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 1);
> +    TEST_COMPARE (buf[0], (char) 0x01);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to a single code unit.  */
> +  {
> +    /* U+007F => 0x7F */
> +    const char8_t *u8s = (const char8_t*) u8"\x7F";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 1);
> +    TEST_COMPARE (buf[0], (char) 0x7F);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to two code units.  */
> +  {
> +    /* U+0080 => 0xC2 0x80 */
> +    const char8_t *u8s = (const char8_t*) u8"\xC2\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 2);
> +    TEST_COMPARE (buf[0], (char) 0xC2);
> +    TEST_COMPARE (buf[1], (char) 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to two code units.  */
> +  {
> +    /* U+07FF => 0xDF 0xBF */
> +    const char8_t *u8s = (const char8_t*) u8"\u07FF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 2);
> +    TEST_COMPARE (buf[0], (char) 0xDF);
> +    TEST_COMPARE (buf[1], (char) 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to three code units.  */
> +  {
> +    /* U+0800 => 0xE0 0xA0 0x80 */
> +    const char8_t *u8s = (const char8_t*) u8"\u0800";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
> +    TEST_COMPARE (buf[0], (char) 0xE0);
> +    TEST_COMPARE (buf[1], (char) 0xA0);
> +    TEST_COMPARE (buf[2], (char) 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to three code units
> +     before the surrogate code point range.  */
> +  {
> +    /* U+D7FF => 0xED 0x9F 0xBF */
> +    const char8_t *u8s = (const char8_t*) u8"\uD7FF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
> +    TEST_COMPARE (buf[0], (char) 0xED);
> +    TEST_COMPARE (buf[1], (char) 0x9F);
> +    TEST_COMPARE (buf[2], (char) 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to three code units
> +     after the surrogate code point range.  */
> +  {
> +    /* U+E000 => 0xEE 0x80 0x80 */
> +    const char8_t *u8s = (const char8_t*) u8"\uE000";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
> +    TEST_COMPARE (buf[0], (char) 0xEE);
> +    TEST_COMPARE (buf[1], (char) 0x80);
> +    TEST_COMPARE (buf[2], (char) 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Not a BOM.  */
> +  {
> +    /* U+FEFF => 0xEF 0xBB 0xBF */
> +    const char8_t *u8s = (const char8_t*) u8"\uFEFF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
> +    TEST_COMPARE (buf[0], (char) 0xEF);
> +    TEST_COMPARE (buf[1], (char) 0xBB);
> +    TEST_COMPARE (buf[2], (char) 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Replacement character.  */
> +  {
> +    /* U+FFFD => 0xEF 0xBF 0xBD */
> +    const char8_t *u8s = (const char8_t*) u8"\uFFFD";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
> +    TEST_COMPARE (buf[0], (char) 0xEF);
> +    TEST_COMPARE (buf[1], (char) 0xBF);
> +    TEST_COMPARE (buf[2], (char) 0xBD);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to three code units.  */
> +  {
> +    /* U+FFFF => 0xEF 0xBF 0xBF */
> +    const char8_t *u8s = (const char8_t*) u8"\uFFFF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
> +    TEST_COMPARE (buf[0], (char) 0xEF);
> +    TEST_COMPARE (buf[1], (char) 0xBF);
> +    TEST_COMPARE (buf[2], (char) 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to four code units.  */
> +  {
> +    /* U+10000 => 0xF0 0x90 0x80 0x80 */
> +    const char8_t *u8s = (const char8_t*) u8"\U00010000";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 4);
> +    TEST_COMPARE (buf[0], (char) 0xF0);
> +    TEST_COMPARE (buf[1], (char) 0x90);
> +    TEST_COMPARE (buf[2], (char) 0x80);
> +    TEST_COMPARE (buf[3], (char) 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to four code units.  */
> +  {
> +    /* U+10FFFF => 0xF4 0x8F 0xBF 0xBF */
> +    const char8_t *u8s = (const char8_t*) u8"\U0010FFFF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 4);
> +    TEST_COMPARE (buf[0], (char) 0xF4);
> +    TEST_COMPARE (buf[1], (char) 0x8F);
> +    TEST_COMPARE (buf[2], (char) 0xBF);
> +    TEST_COMPARE (buf[3], (char) 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_big5_hkscs (void)
> +{
> +  xsetlocale (LC_ALL, "zh_HK.BIG5-HKSCS");
> +
> +  /* A pair of two byte UTF-8 code unit sequences that map a Unicode code
> +     point and combining character to a single double byte character.  */
> +  {
> +    /* U+00CA U+0304 => 0x88 0x62 */
> +    const char8_t *u8s = (const char8_t*) u8"\u00CA\u0304";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 2);
> +    TEST_COMPARE (buf[0], (char) 0x88);
> +    TEST_COMPARE (buf[1], (char) 0x62);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Another pair of two byte UTF-8 code unit sequences that map a Unicode code
> +     point and combining character to a single double byte character.  */
> +  {
> +    /* U+00EA U+030C => 0x88 0xA5 */
> +    const char8_t *u8s = (const char8_t*) u8"\u00EA\u030C";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 2);
> +    TEST_COMPARE (buf[0], (char) 0x88);
> +    TEST_COMPARE (buf[1], (char) 0xA5);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +do_test (void)
> +{
> +  test_truncated_code_unit_sequence ();
> +  test_invalid_trailing_code_unit_sequence ();
> +  test_lone_trailing_code_units ();
> +  test_overlong_encoding ();
> +  test_surrogate_range ();
> +  test_out_of_range_encoding ();
> +  test_null_output_buffer ();
> +  test_utf8 ();
> +  test_big5_hkscs ();
> +  return 0;
> +}
> +
> +#include <support/test-driver.c>

Ok.
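Taken together, the error-path tests in test-c8rtomb.c encode the well-formed UTF-8 rules of the Unicode Standard (Table 3-7): C0, C1 and F5..FF never start a sequence, E0 and F0 restrict their second unit to exclude overlong forms, ED excludes surrogates, and F4 caps the range at U+10FFFF. The "truncated" cases feed the literal's terminating NUL as the next unit, which is rejected like any other invalid trailing unit. A minimal sketch of that validation with the same 0 / length / (size_t) -1 return pattern, assuming the target encoding is UTF-8 so completed units are copied through unchanged (struct u8_accum and u8_accum_feed are hypothetical names, not glibc internals):

```c
#include <stddef.h>

/* Hypothetical stand-in for the mbstate_t contents.  */
struct u8_accum
{
  unsigned char units[4]; /* Code units collected so far.  */
  int have;               /* Number of units in UNITS.  */
  int need;               /* Total length of the current sequence.  */
};

/* Feeds one UTF-8 code unit.  Returns 0 while a sequence is incomplete,
   the number of bytes copied to OUT (room for 4) once a valid sequence
   completes, or (size_t) -1 for an ill-formed unit.  */
static size_t
u8_accum_feed (struct u8_accum *st, unsigned char c8, char *out)
{
  if (st->have == 0)
    {
      if (c8 < 0x80)
        {
          out[0] = (char) c8;
          return 1;
        }
      else if (c8 >= 0xC2 && c8 <= 0xDF)
        st->need = 2;
      else if (c8 >= 0xE0 && c8 <= 0xEF)
        st->need = 3;
      else if (c8 >= 0xF0 && c8 <= 0xF4)
        st->need = 4;
      else
        return (size_t) -1; /* Lone trailing unit, C0/C1, or F5..FF.  */
      st->units[st->have++] = c8;
      return 0;
    }

  /* Allowed range for this trailing unit.  */
  unsigned char lo = 0x80, hi = 0xBF;
  if (st->have == 1)
    switch (st->units[0])
      {
      case 0xE0: lo = 0xA0; break; /* Reject overlong forms.  */
      case 0xED: hi = 0x9F; break; /* Reject surrogates.  */
      case 0xF0: lo = 0x90; break; /* Reject overlong forms.  */
      case 0xF4: hi = 0x8F; break; /* Reject values above U+10FFFF.  */
      }
  if (c8 < lo || c8 > hi)
    {
      st->have = 0; /* Drop the partial sequence.  */
      return (size_t) -1;
    }
  st->units[st->have++] = c8;
  if (st->have < st->need)
    return 0;

  /* Sequence complete: copy it out and return to the initial state.  */
  for (int i = 0; i < st->need; i++)
    out[i] = (char) st->units[i];
  size_t n = (size_t) st->need;
  st->have = 0;
  st->need = 0;
  return n;
}
```

Checking the second unit against a lead-byte-specific range is what lets the overlong, surrogate and out-of-range cases fail at the second code unit, exactly where the tests expect EILSEQ.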

> diff --git a/wcsmbs/test-mbrtoc8.c b/wcsmbs/test-mbrtoc8.c
> new file mode 100644
> index 0000000000..b282fa6dba
> --- /dev/null
> +++ b/wcsmbs/test-mbrtoc8.c
> @@ -0,0 +1,539 @@
> +/* Test mbrtoc8.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <locale.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <uchar.h>
> +#include <wchar.h>
> +#include <support/check.h>
> +#include <support/support.h>
> +
> +static int
> +test_utf8 (void)
> +{
> +  xsetlocale (LC_ALL, "de_DE.UTF-8");
> +
> +  /* No inputs.  */
> +  {
> +    const char *mbs = "";
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 0, &s), (size_t) -2); /* no input */
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Null character.  */
> +  {
> +    const char *mbs = "\x00"; /* 0x00 => U+0000 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 0);

Style: strlen (mbs) + 1.
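The test_utf8 cases in test-mbrtoc8.c expect mbrtoc8 to consume the whole multibyte character on the completing call, store only the first UTF-8 code unit, and hand back each remaining unit on later calls that return (size_t) -3 without consuming input, after which mbsinit is true again. A minimal sketch of that bookkeeping, assuming the decoded code point is already known (struct u8_pending, utf8_encode, pending_start and pending_pop are hypothetical names and do not reflect glibc's mbstate_t layout):

```c
#include <stdint.h>

/* Hypothetical model of the pending code units kept in the state.  */
struct u8_pending
{
  unsigned char units[4];
  int count; /* Units still to be returned.  */
  int next;  /* Index of the next unit to return.  */
};

/* Encodes CP (assumed to be a valid scalar value <= 0x10FFFF) as UTF-8
   into OUT and returns the number of code units.  */
static int
utf8_encode (uint32_t cp, unsigned char out[4])
{
  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  out[0] = (unsigned char) (0xF0 | (cp >> 18));
  out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
  out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[3] = (unsigned char) (0x80 | (cp & 0x3F));
  return 4;
}

/* Models the completing call: store the first unit in *PC8 and queue
   the rest.  */
static void
pending_start (struct u8_pending *p, uint32_t cp, unsigned char *pc8)
{
  int n = utf8_encode (cp, p->units);
  *pc8 = p->units[0];
  p->next = 1;
  p->count = n - 1;
}

/* Models a follow-up call: stores the next queued unit in *PC8 and
   returns 1 (the case mbrtoc8 reports as (size_t) -3), or returns 0
   when nothing is pending, i.e. the state is initial again.  */
static int
pending_pop (struct u8_pending *p, unsigned char *pc8)
{
  if (p->count == 0)
    return 0;
  *pc8 = p->units[p->next++];
  p->count--;
  return 1;
}
```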

> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0x00);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First non-null character in the code point range that maps to a single
> +     code unit.  */
> +  {
> +    const char *mbs = "\x01"; /* 0x01 => U+0001 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0x01);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to a single code unit.  */
> +  {
> +    const char *mbs = "\x7F"; /* 0x7F => U+007F */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0x7F);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to two code units.  */
> +  {
> +    const char *mbs = "\xC2\x80"; /* 0xC2 0x80 => U+0080 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2);
> +    mbs += 2;
> +    TEST_COMPARE (buf[0], 0xC2);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\xC2\x80"; /* 0xC2 0x80 => U+0080 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xC2);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to two code units.  */
> +  {
> +    const char *mbs = "\xDF\xBF"; /* 0xDF 0xBF => U+07FF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2);
> +    mbs += 2;
> +    TEST_COMPARE (buf[0], 0xDF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\xDF\xBF"; /* 0xDF 0xBF => U+07FF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xDF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to three code units.  */
> +  {
> +    const char *mbs = u8"\xE0\xA0\x80"; /* 0xE0 0xA0 0x80 => U+0800 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
> +    mbs += 3;
> +    TEST_COMPARE (buf[0], 0xE0);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xA0);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = u8"\xE0\xA0\x80"; /* 0xE0 0xA0 0x80 => U+0800 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xE0);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xA0);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to three code units
> +     before the surrogate code point range.  */
> +  {
> +    const char *mbs = "\xED\x9F\xBF"; /* 0xED 0x9F 0xBF => U+D7FF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
> +    mbs += 3;
> +    TEST_COMPARE (buf[0], 0xED);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x9F);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\xED\x9F\xBF"; /* 0xED 0x9F 0xBF => U+D7FF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xED);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x9F);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to three code units
> +     after the surrogate code point range.  */
> +  {
> +    const char *mbs = "\xEE\x80\x80"; /* 0xEE 0x80 0x80 => U+E000 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
> +    mbs += 3;
> +    TEST_COMPARE (buf[0], 0xEE);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\xEE\x80\x80"; /* 0xEE 0x80 0x80 => U+E000 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xEE);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Not a BOM.  */
> +  {
> +    const char *mbs = "\xEF\xBB\xBF"; /* 0xEF 0xBB 0xBF => U+FEFF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
> +    mbs += 3;
> +    TEST_COMPARE (buf[0], 0xEF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBB);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\xEF\xBB\xBF"; /* 0xEF 0xBB 0xBF => U+FEFF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xEF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBB);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Replacement character.  */
> +  {
> +    const char *mbs = "\xEF\xBF\xBD"; /* 0xEF 0xBF 0xBD => U+FFFD */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
> +    mbs += 3;
> +    TEST_COMPARE (buf[0], 0xEF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBD);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\xEF\xBF\xBD"; /* 0xEF 0xBF 0xBD => U+FFFD */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xEF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBD);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to three code units.  */
> +  {
> +    const char *mbs = "\xEF\xBF\xBF"; /* 0xEF 0xBF 0xBF => U+FFFF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
> +    mbs += 3;
> +    TEST_COMPARE (buf[0], 0xEF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\xEF\xBF\xBF"; /* 0xEF 0xBF 0xBF => U+FFFF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xEF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to four code units.  */
> +  {
> +    const char *mbs = "\xF0\x90\x80\x80"; /* 0xF0 0x90 0x80 0x80 => U+10000 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 4);
> +    mbs += 4;
> +    TEST_COMPARE (buf[0], 0xF0);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x90);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\xF0\x90\x80\x80"; /* 0xF0 0x90 0x80 0x80 => U+10000 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xF0);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x90);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to four code units.  */
> +  {
> +    const char *mbs = "\xF4\x8F\xBF\xBF"; /* 0xF4 0x8F 0xBF 0xBF => U+10FFFF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 4);
> +    mbs += 4;
> +    TEST_COMPARE (buf[0], 0xF4);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x8F);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\xF4\x8F\xBF\xBF"; /* 0xF4 0x8F 0xBF 0xBF => U+10FFFF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xF4);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x8F);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_big5_hkscs (void)
> +{
> +  xsetlocale (LC_ALL, "zh_HK.BIG5-HKSCS");
> +
> +  /* A double byte character that maps to a pair of two byte UTF-8 code unit
> +     sequences.  */
> +  {
> +    const char *mbs = "\x88\x62"; /* 0x88 0x62 => U+00CA U+0304 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2);
> +    mbs += 2;
> +    TEST_COMPARE (buf[0], 0xC3);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x8A);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xCC);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x84);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\x88\x62"; /* 0x88 0x62 => U+00CA U+0304 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xC3);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x8A);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xCC);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x84);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Another double byte character that maps to a pair of two byte UTF-8 code
> +     unit sequences.  */
> +  {
> +    const char *mbs = "\x88\xA5"; /* 0x88 0xA5 => U+00EA U+030C */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2);
> +    mbs += 2;
> +    TEST_COMPARE (buf[0], 0xC3);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xAA);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xCC);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x8C);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\x88\xA5"; /* 0x88 0xA5 => U+00EA U+030C */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xC3);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xAA);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xCC);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x8C);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +do_test (void)
> +{
> +  test_utf8 ();
> +  test_big5_hkscs ();
> +  return 0;
> +}
> +
> +#include <support/test-driver.c>
> -- 
> 2.32.0
> 



* Re: [PATCH v4 0/3] C++20 P0482R6 and C2X N2653: char8_t, mbrtoc8(), and c8rtomb()
  2022-06-30 12:52 [PATCH v4 0/3] C++20 P0482R6 and C2X N2653: char8_t, mbrtoc8(), and c8rtomb() Tom Honermann
                   ` (2 preceding siblings ...)
  2022-06-30 12:52 ` [PATCH v4 3/3] stdlib: Tests for " Tom Honermann
@ 2022-07-04 19:08 ` Adhemerval Zanella
  2022-07-06  3:27   ` Tom Honermann
  3 siblings, 1 reply; 26+ messages in thread
From: Adhemerval Zanella @ 2022-07-04 19:08 UTC (permalink / raw)
  To: Tom Honermann; +Cc: libc-alpha



> On 30 Jun 2022, at 09:52, Tom Honermann via Libc-alpha <libc-alpha@sourceware.org> wrote:
> 
> This series of patches provides the following:
> - A fix for BZ #25744 [1].
> - Implementations of the mbrtoc8 and c8rtomb functions adopted for
>  C++20 via WG21 P0482R6 [2] and for C2X via WG14 N2653 [3].
> - A char8_t typedef as adopted for C2X via WG14 N2653 [3].
> 
> The fix for BZ #25744 [1] is included in this patch series because the tests
> for mbrtoc8 and c8rtomb depend on it for exercising the special case where a
> pair of Unicode code points is converted to/from a single double byte sequence.
> Such conversion cases exist for Big5-HKSCS. 
> 
> N2653 was adopted by WG14 for C2X and wording is present in the N2912 revision
> of the C2X working draft [4].  This patch series enables the new declarations
> in C2X mode and when _GNU_SOURCE is defined. 
> 
> This patch series addresses feedback provided in response to the prior
> submission series starting at [5].  All feedback was addressed except as noted
> below:
> - I removed the previously proposed wcsmbs/test-char8-type.c test as redundant
>  per feedback. I prototyped extending the c++-types.data test as suggested,
>  but that would have introduced a C++20 dependency.  The test demonstrated that
>  the char16_t and char32_t types were mapped to 'Ds' and 'Di' as expected, but
>  char8_t got mapped to 'h' rather than 'Du'.  It might be worth extending this
>  test in the future if/when C++20 support becomes a minimal requirement.
> - I did not switch the iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c test to use
>  TEST_COMPARE in favor of the existing custom error handling; the current
>  error messages are customized to provide a more detailed explanation of the
>  error, so removing that seemed like an arguable regression.  My changes retain
>  consistency with the existing code.  I think it is reasonable to switch to
>  TEST_COMPARE, but I think that should be done separately from this change and
>  separately motivated.
> - I did not change the c8rtomb() and mbrtoc8() implementations to use thread
>  local storage when PS is null; the implementations continue to use static
>  storage consistent with c16rtomb(), mbrtoc16(), c32rtomb(), and mbrtoc32().
>  I think it is reasonable to switch all of these functions to use thread
>  local storage, but I think such a change should be done consistently across
>  all of them and should be separately motivated.
> 
> Thank you to Joseph Myers, Carlos O'Donell, Adhemerval Zanella, and Florian
> Weimer for prior reviews of this patch series.
> 
> Tested on Linux x86_64.
> 
> [1]: Bug 25744
>     "mbrtowc with Big5-HKSCS returns 2 instead of 1 when consuming the
>     second byte of certain double byte characters"
>     https://sourceware.org/bugzilla/show_bug.cgi?id=25744
> 
> [2]: WG21 P0482R6
>     "char8_t: A type for UTF-8 characters and strings (Revision 6)"
>     https://wg21.link/p0482r6
> 
> [3]: WG14 N2653
>     "char8_t: A type for UTF-8 characters and strings (Revision 1)"
>     http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm
> 
> [4]: WG14 N2912
>     "Programming languages — C working draft — June 8, 2022"
>     https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf
> 
> [5]: libc-alpha mailing list
>     "[PATCH 0/3]: C++20 P0482R6 and C2X N2653: support for char8_t, mbrtoc8(), and c8rtomb()."
>     https://sourceware.org/pipermail/libc-alpha/2022-February/136723.html

Hi Tom,

Patches look good in general, just some really minor style issues I
found in last review.  I have fixed them and if you are ok I can
install them; I have created a branch so you can check [1].

[1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/mbrtoc8-c8rtomb



* Re: [PATCH v4 0/3] C++20 P0482R6 and C2X N2653: char8_t, mbrtoc8(), and c8rtomb()
  2022-07-04 19:08 ` [PATCH v4 0/3] C++20 P0482R6 and C2X N2653: char8_t, mbrtoc8(), and c8rtomb() Adhemerval Zanella
@ 2022-07-06  3:27   ` Tom Honermann
  2022-07-06 12:23     ` Adhemerval Zanella
  0 siblings, 1 reply; 26+ messages in thread
From: Tom Honermann @ 2022-07-06  3:27 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

On 7/4/22 3:08 PM, Adhemerval Zanella wrote:
>
>> On 30 Jun 2022, at 09:52, Tom Honermann via Libc-alpha <libc-alpha@sourceware.org> wrote:
>>
>> This series of patches provides the following:
>> - A fix for BZ #25744 [1].
>> - Implementations of the mbrtoc8 and c8rtomb functions adopted for
>>   C++20 via WG21 P0482R6 [2] and for C2X via WG14 N2653 [3].
>> - A char8_t typedef as adopted for C2X via WG14 N2653 [3].
>>
>> The fix for BZ #25744 [1] is included in this patch series because the tests
>> for mbrtoc8 and c8rtomb depend on it for exercising the special case where a
>> pair of Unicode code points is converted to/from a single double byte sequence.
>> Such conversion cases exist for Big5-HKSCS.
>>
>> N2653 was adopted by WG14 for C2X and wording is present in the N2912 revision
>> of the C2X working draft [4].  This patch series enables the new declarations
>> in C2X mode and when _GNU_SOURCE is defined.
>>
>> This patch series addresses feedback provided in response to the prior
>> submission series starting at [5].  All feedback was addressed except as noted
>> below:
>> - I removed the previously proposed wcsmbs/test-char8-type.c test as redundant
>>   per feedback. I prototyped extending the c++-types.data test as suggested,
>>   but that would have introduced a C++20 dependency.  The test demonstrated that
>>   the char16_t and char32_t types were mapped to 'Ds' and 'Di' as expected, but
>>   char8_t got mapped to 'h' rather than 'Du'.  It might be worth extending this
>>   test in the future if/when C++20 support becomes a minimal requirement.
>> - I did not switch the iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c test to use
>>   TEST_COMPARE in favor of the existing custom error handling; the current
>>   error messages are customized to provide a more detailed explanation of the
>>   error, so removing that seemed like an arguable regression.  My changes retain
>>   consistency with the existing code.  I think it is reasonable to switch to
>>   TEST_COMPARE, but I think that should be done separately from this change and
>>   separately motivated.
>> - I did not change the c8rtomb() and mbrtoc8() implementations to use thread
>>   local storage when PS is null; the implementations continue to use static
>>   storage consistent with c16rtomb(), mbrtoc16(), c32rtomb(), and mbrtoc32().
>>   I think it is reasonable to switch all of these functions to use thread
>>   local storage, but I think such a change should be done consistently across
>>   all of them and should be separately motivated.
>>
>> Thank you to Joseph Myers, Carlos O'Donell, Adhemerval Zanella, and Florian
>> Weimer for prior reviews of this patch series.
>>
>> Tested on Linux x86_64.
>>
>> [1]: Bug 25744
>>      "mbrtowc with Big5-HKSCS returns 2 instead of 1 when consuming the
>>      second byte of certain double byte characters"
>>      https://sourceware.org/bugzilla/show_bug.cgi?id=25744
>>
>> [2]: WG21 P0482R6
>>      "char8_t: A type for UTF-8 characters and strings (Revision 6)"
>>      https://wg21.link/p0482r6
>>
>> [3]: WG14 N2653
>>      "char8_t: A type for UTF-8 characters and strings (Revision 1)"
>>      http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm
>>
>> [4]: WG14 N2912
>>      "Programming languages — C working draft — June 8, 2022"
>>      https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf
>>
>> [5]: libc-alpha mailing list
>>      "[PATCH 0/3]: C++20 P0482R6 and C2X N2653: support for char8_t, mbrtoc8(), and c8rtomb()."
>>      https://sourceware.org/pipermail/libc-alpha/2022-February/136723.html
> Hi Tom,
>
> Patches look good in general, just some really minor style issues I
> found in last review.  I have fixed them and if you are ok I can
> install them; I have created a branch so you can check [1].

Excellent, thank you! I verified that your branch differs from the 
patches I posted just by the style issues mentioned in your review and 
the additional removal of the (empty) 
sysdeps/mach/hurd/libhurduser.abilist and 
sysdeps/mach/libmachuser.abilist files (which I presume to be 
intentional). Looks good to me! I very much appreciate you making the 
additional style corrections!

Tom.

>
> [1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/mbrtoc8-c8rtomb
>


* Re: [PATCH v4 0/3] C++20 P0482R6 and C2X N2653: char8_t, mbrtoc8(), and c8rtomb()
  2022-07-06  3:27   ` Tom Honermann
@ 2022-07-06 12:23     ` Adhemerval Zanella
  0 siblings, 0 replies; 26+ messages in thread
From: Adhemerval Zanella @ 2022-07-06 12:23 UTC (permalink / raw)
  To: Tom Honermann; +Cc: libc-alpha



> On 6 Jul 2022, at 00:27, Tom Honermann <tom@honermann.net> wrote:
> 
> On 7/4/22 3:08 PM, Adhemerval Zanella wrote:
>> 
>>> On 30 Jun 2022, at 09:52, Tom Honermann via Libc-alpha <libc-alpha@sourceware.org> wrote:
>>> 
>>> This series of patches provides the following:
>>> - A fix for BZ #25744 [1].
>>> - Implementations of the mbrtoc8 and c8rtomb functions adopted for
>>>  C++20 via WG21 P0482R6 [2] and for C2X via WG14 N2653 [3].
>>> - A char8_t typedef as adopted for C2X via WG14 N2653 [3].
>>> 
>>> The fix for BZ #25744 [1] is included in this patch series because the tests
>>> for mbrtoc8 and c8rtomb depend on it for exercising the special case where a
>>> pair of Unicode code points is converted to/from a single double byte sequence.
>>> Such conversion cases exist for Big5-HKSCS.
>>> 
>>> N2653 was adopted by WG14 for C2X and wording is present in the N2912 revision
>>> of the C2X working draft [4].  This patch series enables the new declarations
>>> in C2X mode and when _GNU_SOURCE is defined.
>>> 
>>> This patch series addresses feedback provided in response to the prior
>>> submission series starting at [5].  All feedback was addressed except as noted
>>> below:
>>> - I removed the previously proposed wcsmbs/test-char8-type.c test as redundant
>>>  per feedback. I prototyped extending the c++-types.data test as suggested,
>>>  but that would have introduced a C++20 dependency.  The test demonstrated that
>>>  the char16_t and char32_t types were mapped to 'Ds' and 'Di' as expected, but
>>>  char8_t got mapped to 'h' rather than 'Du'.  It might be worth extending this
>>>  test in the future if/when C++20 support becomes a minimal requirement.
>>> - I did not switch the iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c test to use
>>>  TEST_COMPARE in favor of the existing custom error handling; the current
>>>  error messages are customized to provide a more detailed explanation of the
>>>  error, so removing that seemed like an arguable regression.  My changes retain
>>>  consistency with the existing code.  I think it is reasonable to switch to
>>>  TEST_COMPARE, but I think that should be done separately from this change and
>>>  separately motivated.
>>> - I did not change the c8rtomb() and mbrtoc8() implementations to use thread
>>>  local storage when PS is null; the implementations continue to use static
>>>  storage consistent with c16rtomb(), mbrtoc16(), c32rtomb(), and mbrtoc32().
>>>  I think it is reasonable to switch all of these functions to use thread
>>>  local storage, but I think such a change should be done consistently across
>>>  all of them and should be separately motivated.
>>> 
>>> Thank you to Joseph Myers, Carlos O'Donell, Adhemerval Zanella, and Florian
>>> Weimer for prior reviews of this patch series.
>>> 
>>> Tested on Linux x86_64.
>>> 
>>> [1]: Bug 25744
>>>     "mbrtowc with Big5-HKSCS returns 2 instead of 1 when consuming the
>>>     second byte of certain double byte characters"
>>>     https://sourceware.org/bugzilla/show_bug.cgi?id=25744
>>> 
>>> [2]: WG21 P0482R6
>>>     "char8_t: A type for UTF-8 characters and strings (Revision 6)"
>>>     https://wg21.link/p0482r6
>>> 
>>> [3]: WG14 N2653
>>>     "char8_t: A type for UTF-8 characters and strings (Revision 1)"
>>>     http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm
>>> 
>>> [4]: WG14 N2912
>>>     "Programming languages — C working draft — June 8, 2022"
>>>     https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf
>>> 
>>> [5]: libc-alpha mailing list
>>>     "[PATCH 0/3]: C++20 P0482R6 and C2X N2653: support for char8_t, mbrtoc8(), and c8rtomb()."
>>>     https://sourceware.org/pipermail/libc-alpha/2022-February/136723.html
>> Hi Tom,
>> 
>> Patches look good in general, just some really minor style issues I
>> found in last review.  I have fixed them and if you are ok I can
>> install them; I have created a branch so you can check [1].
> 
> Excellent, thank you! I verified that your branch differs from the patches I posted just by the style issues mentioned in your review and the additional removal of the (empty) sysdeps/mach/hurd/libhurduser.abilist and sysdeps/mach/libmachuser.abilist files (which I presume to be intentional). Looks good to me! I very much appreciate you making the additional style corrections!
> 
> Tom.

The Hurd file removal was a mistake on my part (we should really fix
make update-abi/check-abi on Hurd...).  I will fix it and install the
patches.

> 
>> 
>> [1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/mbrtoc8-c8rtomb
>> 



* Re: [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-07-04 18:33   ` Adhemerval Zanella
@ 2022-07-19 21:08     ` Joseph Myers
  2022-07-20 12:04       ` Adhemerval Zanella Netto
  2022-07-20 16:47       ` Tom Honermann
  0 siblings, 2 replies; 26+ messages in thread
From: Joseph Myers @ 2022-07-19 21:08 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: Tom Honermann, libc-alpha

This change appears to introduce a failure of 
wcsmbs/check-installed-headers-cxx with GCC mainline, because uchar.h now 
produces:

../wcsmbs/uchar.h:38:23: error: identifier 'char8_t' is a keyword in C++20 [-Werror=c++20-compat]
   38 | typedef unsigned char char8_t;

(recall we want our installed headers to avoid warnings *without* relying 
on the default disabling of warnings in system headers).

Unfortunately, GCC for C++ doesn't disable -Wc++20-compat inside 
__extension__ (unlike what the C front end does), so simply adding 
__extension__ to that declaration wouldn't help, but we could use 
diagnostic disabling pragmas (as already done in some installed headers).

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-07-19 21:08     ` Joseph Myers
@ 2022-07-20 12:04       ` Adhemerval Zanella Netto
  2022-07-20 13:54         ` Florian Weimer
  2022-07-20 16:47       ` Tom Honermann
  1 sibling, 1 reply; 26+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-20 12:04 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Tom Honermann, libc-alpha



On 19/07/22 18:08, Joseph Myers wrote:
> This change appears to introduce a failure of 
> wcsmbs/check-installed-headers-cxx with GCC mainline, because uchar.h now 
> produces:
> 
> ../wcsmbs/uchar.h:38:23: error: identifier 'char8_t' is a keyword in C++20 [-Werror=c++20-compat]
>    38 | typedef unsigned char char8_t;
> 
> (recall we want our installed headers to avoid warnings *without* relying 
> on the default disabling of warnings in system headers).
> 
> Unfortunately, GCC for C++ doesn't disable -Wc++20-compat inside 
> __extension__ (unlike what the C front end does), so simply adding 
> __extension__ to that declaration wouldn't help, but we could use 
> diagnostic disabling pragmas (as already done in some installed headers).
> 

My understanding is that the compiler will define __cpp_char8_t exactly to
avoid such a redefinition.  But it seems from the GCC documentation that it
is only actually enabled with -fchar8_t.  Do we have a preprocessor flag to
indicate it?


* Re: [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-07-20 12:04       ` Adhemerval Zanella Netto
@ 2022-07-20 13:54         ` Florian Weimer
  2022-07-20 14:31           ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 26+ messages in thread
From: Florian Weimer @ 2022-07-20 13:54 UTC (permalink / raw)
  To: Adhemerval Zanella Netto via Libc-alpha
  Cc: Joseph Myers, Adhemerval Zanella Netto, Tom Honermann

* Adhemerval Zanella Netto via Libc-alpha:

> On 19/07/22 18:08, Joseph Myers wrote:
>> This change appears to introduce a failure of 
>> wcsmbs/check-installed-headers-cxx with GCC mainline, because uchar.h now 
>> produces:
>> 
>> ../wcsmbs/uchar.h:38:23: error: identifier 'char8_t' is a keyword in C++20 [-Werror=c++20-compat]
>>    38 | typedef unsigned char char8_t;
>> 
>> (recall we want our installed headers to avoid warnings *without* relying 
>> on the default disabling of warnings in system headers).
>> 
>> Unfortunately, GCC for C++ doesn't disable -Wc++20-compat inside 
>> __extension__ (unlike what the C front end does), so simply adding 
>> __extension__ to that declaration wouldn't help, but we could use 
>> diagnostic disabling pragmas (as already done in some installed headers).

> My understanding is that the compiler will define __cpp_char8_t exactly to
> avoid such a redefinition.  But it seems from the GCC documentation that it
> is only actually enabled with -fchar8_t.  Do we have a preprocessor flag to
> indicate it?

I think __cpp_char8_t is only defined if the language mode is active.
The warning is independent of the language mode, though.

Thanks,
Florian



* Re: [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-07-20 13:54         ` Florian Weimer
@ 2022-07-20 14:31           ` Adhemerval Zanella Netto
  2022-07-20 15:05             ` Florian Weimer
  0 siblings, 1 reply; 26+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-20 14:31 UTC (permalink / raw)
  To: Florian Weimer, Adhemerval Zanella Netto via Libc-alpha
  Cc: Joseph Myers, Tom Honermann



On 20/07/22 10:54, Florian Weimer wrote:
> * Adhemerval Zanella Netto via Libc-alpha:
> 
>> On 19/07/22 18:08, Joseph Myers wrote:
>>> This change appears to introduce a failure of 
>>> wcsmbs/check-installed-headers-cxx with GCC mainline, because uchar.h now 
>>> produces:
>>>
>>> ../wcsmbs/uchar.h:38:23: error: identifier 'char8_t' is a keyword in C++20 [-Werror=c++20-compat]
>>>    38 | typedef unsigned char char8_t;
>>>
>>> (recall we want our installed headers to avoid warnings *without* relying 
>>> on the default disabling of warnings in system headers).
>>>
>>> Unfortunately, GCC for C++ doesn't disable -Wc++20-compat inside 
>>> __extension__ (unlike what the C front end does), so simply adding 
>>> __extension__ to that declaration wouldn't help, but we could use 
>>> diagnostic disabling pragmas (as already done in some installed headers).
> 
>> My understanding is that the compiler will define __cpp_char8_t exactly to
>> avoid such a redefinition.  But it seems from the GCC documentation that it
>> is only actually enabled with -fchar8_t.  Do we have a preprocessor flag to
>> indicate it?
> 
> I think __cpp_char8_t is only defined if the language mode is active.
> The warning is independent of the language mode, though.

Maybe we should also skip the char8_t definition for __cplusplus == 202002L?


* Re: [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-07-20 14:31           ` Adhemerval Zanella Netto
@ 2022-07-20 15:05             ` Florian Weimer
  2022-07-20 16:53               ` Tom Honermann
  0 siblings, 1 reply; 26+ messages in thread
From: Florian Weimer @ 2022-07-20 15:05 UTC (permalink / raw)
  To: Adhemerval Zanella Netto
  Cc: Adhemerval Zanella Netto via Libc-alpha, Joseph Myers, Tom Honermann

* Adhemerval Zanella Netto:

>> I think __cpp_char8_t is only defined if the language mode is active.
>> The warning is independent of the language mode, though.
>
> Maybe we should also skip the char8_t definition for __cplusplus == 202002L?

We have to do that in any case, but we still need to deal with the
warning in the earlier language modes.

Thanks,
Florian



* Re: [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-07-19 21:08     ` Joseph Myers
  2022-07-20 12:04       ` Adhemerval Zanella Netto
@ 2022-07-20 16:47       ` Tom Honermann
  2022-07-21 19:22         ` Adhemerval Zanella Netto
  1 sibling, 1 reply; 26+ messages in thread
From: Tom Honermann @ 2022-07-20 16:47 UTC (permalink / raw)
  To: Joseph Myers, Adhemerval Zanella; +Cc: libc-alpha

Confirmed that this issue can be easily reproduced outside the testsuite.

$ cat t.cpp
#include <uchar.h>

$ g++ --version
g++ (GCC) 13.0.0 20220720 (experimental)
...

$ g++ -c -I/path/to/glibc-char8_t/include -std=c++17 -Werror=c++20-compat t.cpp
In file included from t.cpp:1:
/home/tom/products/glibc-char8_t/include/uchar.h:38:23: error: identifier ‘char8_t’ is a keyword in C++20 [-Werror=c++20-compat]
    38 | typedef unsigned char char8_t;
       |                       ^~~~~~~
cc1plus: some warnings being treated as errors

The char8_t typedef is currently guarded by:

/* Declare the C2x char8_t typedef in C2x modes, but only if the C++
   __cpp_char8_t feature test macro is not defined.  */
#if __GLIBC_USE (ISOC2X) && !defined __cpp_char8_t
/* Define the 8-bit character type.  */
typedef unsigned char char8_t;
#endif

__GLIBC_USE (ISOC2X) evaluates to true because g++ unconditionally
defines _GNU_SOURCE. Otherwise, I believe C++17 mode would (or should)
only imply __GLIBC_USE (ISOC11).

Regardless, it seems that pragma directives should be added to suppress
the diagnostic. I tried prototyping such a fix (below), but it does not
suppress the diagnostic for me, and I don't understand why.

$ diff -U3 uchar.h.old uchar.h
--- uchar.h.old 2022-07-20 12:37:55.544301692 -0400
+++ uchar.h     2022-07-20 12:43:21.124365563 -0400
@@ -34,8 +34,17 @@
  /* Declare the C2x char8_t typedef in C2x modes, but only if the C++
    __cpp_char8_t feature test macro is not defined.  */
  #if __GLIBC_USE (ISOC2X) && !defined __cpp_char8_t
+/* Suppress the C++20 compatibility diagnostic regarding char8_t being a
+   keyword.  */
+#if defined __GNUC__ && 4 < __GNUC__ + (6 <= __GNUC_MINOR__)
+# pragma GCC diagnostic push
+# pragma GCC diagnostic ignored "-Wc++20-compat"
+#endif
  /* Define the 8-bit character type.  */
  typedef unsigned char char8_t;
+#if defined __GNUC__ && 4 < __GNUC__ + (6 <= __GNUC_MINOR__)
+# pragma GCC diagnostic pop
+#endif
  #endif

  #ifndef __USE_ISOCXX11

Tom.

On 7/19/22 5:08 PM, Joseph Myers wrote:
> This change appears to introduce a failure of
> wcsmbs/check-installed-headers-cxx with GCC mainline, because uchar.h now
> produces:
>
> ../wcsmbs/uchar.h:38:23: error: identifier 'char8_t' is a keyword in C++20 [-Werror=c++20-compat]
>     38 | typedef unsigned char char8_t;
>
> (recall we want our installed headers to avoid warnings *without* relying
> on the default disabling of warnings in system headers).
>
> Unfortunately, GCC for C++ doesn't disable -Wc++20-compat inside
> __extension__ (unlike what the C front end does), so simply adding
> __extension__ to that declaration wouldn't help, but we could use
> diagnostic disabling pragmas (as already done in some installed headers).
>
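For reference, the diagnostic-disabling pattern mentioned here looks roughly like the following in C. This is a minimal sketch, not code taken from glibc: the helper `find_zero` is hypothetical, and the use of -Wcast-qual is an assumption, loosely modeled on the bits/stdlib-bsearch.h case Adhemerval mentions later in the thread. The const-discarding cast is wrapped in push/ignored/pop so the warning is silenced only for that one definition.

```c
#include <stddef.h>

/* Explicit form of a "GCC 4.6 or later" check, when the diagnostic
   push/pop pragmas became available.  */
#if defined __GNUC__ \
    && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6))
# pragma GCC diagnostic push
# pragma GCC diagnostic ignored "-Wcast-qual"
#endif

/* Hypothetical helper: return a modifiable pointer to the first zero
   element of a const array, the way bsearch returns void * for a
   const void * base.  The cast discards const, which -Wcast-qual
   would otherwise diagnose.  */
static int *
find_zero (const int *arr, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (arr[i] == 0)
      return (int *) &arr[i];
  return NULL;
}

#if defined __GNUC__ \
    && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6))
# pragma GCC diagnostic pop
#endif
```

Because the pragmas bracket only this definition, code that includes such a header still gets -Wcast-qual diagnostics for its own casts.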


* Re: [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-07-20 15:05             ` Florian Weimer
@ 2022-07-20 16:53               ` Tom Honermann
  0 siblings, 0 replies; 26+ messages in thread
From: Tom Honermann @ 2022-07-20 16:53 UTC (permalink / raw)
  To: Florian Weimer, Adhemerval Zanella Netto
  Cc: Adhemerval Zanella Netto via Libc-alpha, Joseph Myers

On 7/20/22 11:05 AM, Florian Weimer wrote:
> * Adhemerval Zanella Netto:
>
>>> I think __cpp_char8_t is only defined if the language mode is active.
>>> The warning is independent of the language mode, though.
>> Maybe we should also skip the char8_t definition for __cplusplus == 202002L?
> We have to do that in any case, but we still need to deal with the
> warning in the earlier language modes.

Declaring the char8_t typedef (and related mbrtoc8 and c8rtomb 
functions) in C++ modes that don't enable the builtin char8_t type 
(including -std=c++20 -fno-char8_t) is beneficial for compatibility with 
C in C23 or _GNU_SOURCE modes.

Tom.

>
> Thanks,
> Florian
>


* Re: [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-07-20 16:47       ` Tom Honermann
@ 2022-07-21 19:22         ` Adhemerval Zanella Netto
  2022-07-21 20:51           ` Tom Honermann
  0 siblings, 1 reply; 26+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-21 19:22 UTC (permalink / raw)
  To: Tom Honermann, Joseph Myers; +Cc: libc-alpha



On 20/07/22 13:47, Tom Honermann wrote:
> Confirmed that this issue can be easily reproduced outside the testsuite.
> 
> $ cat t.cpp
> #include <uchar.h>
> 
> $ g++ --version
> g++ (GCC) 13.0.0 20220720 (experimental)
> ...
> 
> $ g++ -c -I/path/to/glibc-char8_t/include -std=c++17 -Werror=c++20-compat t.cpp
> In file included from t.cpp:1:
> /home/tom/products/glibc-char8_t/include/uchar.h:38:23: error: identifier ‘char8_t’ is a keyword in C++20 [-Werror=c++20-compat]
>    38 | typedef unsigned char char8_t;
>       |                       ^~~~~~~
> cc1plus: some warnings being treated as errors
> 
> The char8_t typedef is currently guarded by:
> 
> /* Declare the C2x char8_t typedef in C2x modes, but only if the C++
>   __cpp_char8_t feature test macro is not defined.  */
> #if __GLIBC_USE (ISOC2X) && !defined __cpp_char8_t
> /* Define the 8-bit character type.  */
> typedef unsigned char char8_t;
> #endif
> 
> __GLIBC_USE (ISOC2X) evaluates to true because gcc unconditionally defines _GNU_SOURCE. I believe otherwise, C++17 mode would only (or should only) imply __GLIBC_USE (ISOC11).
> 
> Regardless, it seems that directives should be added to suppress the diagnostic. I tried prototyping such a fix, but it doesn't seem to work for me. I don't understand why.

I have tried as well and I can't get it to work either.  I would expect it to work,
as we have done the same in bits/stdlib-bsearch.h; could it be a gcc issue?

> 
> $ diff -U3 uchar.h.old uchar.h
> --- uchar.h.old 2022-07-20 12:37:55.544301692 -0400
> +++ uchar.h     2022-07-20 12:43:21.124365563 -0400
> @@ -34,8 +34,17 @@
>  /* Declare the C2x char8_t typedef in C2x modes, but only if the C++
>    __cpp_char8_t feature test macro is not defined.  */
>  #if __GLIBC_USE (ISOC2X) && !defined __cpp_char8_t
> +/* Suppress the C++20 compatibility diagnostic regarding char8_t being a
> +   keyword.  */
> +#if defined __GNUC__ && 4 < __GNUC__ + (6 <= __GNUC_MINOR__)
> +# pragma GCC diagnostic push
> +# pragma GCC diagnostic ignored "-Wc++20-compat"
> +#endif
>  /* Define the 8-bit character type.  */
>  typedef unsigned char char8_t;
> +#if defined __GNUC__ && 4 < __GNUC__ + (6 <= __GNUC_MINOR__)
> +# pragma GCC diagnostic pop
> +#endif
>  #endif
> 
>  #ifndef __USE_ISOCXX11
> 
> Tom.
> 
> On 7/19/22 5:08 PM, Joseph Myers wrote:
>> This change appears to introduce a failure of
>> wcsmbs/check-installed-headers-cxx with GCC mainline, because uchar.h now
>> produces:
>>
>> ../wcsmbs/uchar.h:38:23: error: identifier 'char8_t' is a keyword in C++20 [-Werror=c++20-compat]
>>     38 | typedef unsigned char char8_t;
>>
>> (recall we want our installed headers to avoid warnings *without* relying
>> on the default disabling of warnings in system headers).
>>
>> Unfortunately, GCC for C++ doesn't disable -Wc++20-compat inside
>> __extension__ (unlike what the C front end does), so simply adding
>> __extension__ to that declaration wouldn't help, but we could use
>> diagnostic disabling pragmas (as already done in some installed headers).
>>


* Re: [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-07-21 19:22         ` Adhemerval Zanella Netto
@ 2022-07-21 20:51           ` Tom Honermann
  2022-07-21 20:56             ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 26+ messages in thread
From: Tom Honermann @ 2022-07-21 20:51 UTC (permalink / raw)
  To: Adhemerval Zanella Netto, Joseph Myers; +Cc: libc-alpha

On 7/21/22 3:22 PM, Adhemerval Zanella Netto wrote:
>
> On 20/07/22 13:47, Tom Honermann wrote:
>> Confirmed that this issue can be easily reproduced outside the testsuite.
>>
>> $ cat t.cpp
>> #include <uchar.h>
>>
>> $ g++ --version
>> g++ (GCC) 13.0.0 20220720 (experimental)
>> ...
>>
>> $ g++ -c -I/path/to/glibc-char8_t/include -std=c++17 -Werror=c++20-compat t.cpp
>> In file included from t.cpp:1:
>> /home/tom/products/glibc-char8_t/include/uchar.h:38:23: error: identifier ‘char8_t’ is a keyword in C++20 [-Werror=c++20-compat]
>>     38 | typedef unsigned char char8_t;
>>        |                       ^~~~~~~
>> cc1plus: some warnings being treated as errors
>>
>> The char8_t typedef is currently guarded by:
>>
>> /* Declare the C2x char8_t typedef in C2x modes, but only if the C++
>>    __cpp_char8_t feature test macro is not defined.  */
>> #if __GLIBC_USE (ISOC2X) && !defined __cpp_char8_t
>> /* Define the 8-bit character type.  */
>> typedef unsigned char char8_t;
>> #endif
>>
>> __GLIBC_USE (ISOC2X) evaluates to true because gcc unconditionally defines _GNU_SOURCE. I believe otherwise, C++17 mode would only (or should only) imply __GLIBC_USE (ISOC11).
>>
>> Regardless, it seems that directives should be added to suppress the diagnostic. I tried prototyping such a fix, but it doesn't seem to work for me. I don't understand why.
> I have tried as well and I can't get it to work either.  I would expect it to work,
> as we have done the same in bits/stdlib-bsearch.h; could it be a gcc issue?

Yes, this appears to be a gcc issue. I spent some time looking at gcc 
source code, but didn't find anything obvious. I verified the same 
technique does work to suppress the similar warning issued for use of, 
e.g., constexpr, as an identifier when -Wc++11-compat is enabled. I 
found tests that exercise #pragma GCC diagnostic ignored "-Wc++-compat", 
but none for -Wc++20-compat (or -Wc++11-compat).

Tom.

>
>> $ diff -U3 uchar.h.old uchar.h
>> --- uchar.h.old 2022-07-20 12:37:55.544301692 -0400
>> +++ uchar.h     2022-07-20 12:43:21.124365563 -0400
>> @@ -34,8 +34,17 @@
>>   /* Declare the C2x char8_t typedef in C2x modes, but only if the C++
>>     __cpp_char8_t feature test macro is not defined.  */
>>   #if __GLIBC_USE (ISOC2X) && !defined __cpp_char8_t
>> +/* Suppress the C++20 compatibility diagnostic regarding char8_t being a
>> +   keyword.  */
>> +#if defined __GNUC__ && 4 < __GNUC__ + (6 <= __GNUC_MINOR__)
>> +# pragma GCC diagnostic push
>> +# pragma GCC diagnostic ignored "-Wc++20-compat"
>> +#endif
>>   /* Define the 8-bit character type.  */
>>   typedef unsigned char char8_t;
>> +#if defined __GNUC__ && 4 < __GNUC__ + (6 <= __GNUC_MINOR__)
>> +# pragma GCC diagnostic pop
>> +#endif
>>   #endif
>>
>>   #ifndef __USE_ISOCXX11
>>
>> Tom.
>>
>> On 7/19/22 5:08 PM, Joseph Myers wrote:
>>> This change appears to introduce a failure of
>>> wcsmbs/check-installed-headers-cxx with GCC mainline, because uchar.h now
>>> produces:
>>>
>>> ../wcsmbs/uchar.h:38:23: error: identifier 'char8_t' is a keyword in C++20 [-Werror=c++20-compat]
>>>      38 | typedef unsigned char char8_t;
>>>
>>> (recall we want our installed headers to avoid warnings *without* relying
>>> on the default disabling of warnings in system headers).
>>>
>>> Unfortunately, GCC for C++ doesn't disable -Wc++20-compat inside
>>> __extension__ (unlike what the C front end does), so simply adding
>>> __extension__ to that declaration wouldn't help, but we could use
>>> diagnostic disabling pragmas (as already done in some installed headers).
>>>


* Re: [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-07-21 20:51           ` Tom Honermann
@ 2022-07-21 20:56             ` Adhemerval Zanella Netto
  2022-07-22  5:24               ` Tom Honermann
  0 siblings, 1 reply; 26+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-21 20:56 UTC (permalink / raw)
  To: Tom Honermann, Joseph Myers; +Cc: libc-alpha



On 21/07/22 17:51, Tom Honermann wrote:
> On 7/21/22 3:22 PM, Adhemerval Zanella Netto wrote:
>>
>> On 20/07/22 13:47, Tom Honermann wrote:
>>> Confirmed that this issue can be easily reproduced outside the testsuite.
>>>
>>> $ cat t.cpp
>>> #include <uchar.h>
>>>
>>> $ g++ --version
>>> g++ (GCC) 13.0.0 20220720 (experimental)
>>> ...
>>>
>>> $ g++ -c -I/path/to/glibc-char8_t/include -std=c++17 -Werror=c++20-compat t.cpp
>>> In file included from t.cpp:1:
>>> /home/tom/products/glibc-char8_t/include/uchar.h:38:23: error: identifier ‘char8_t’ is a keyword in C++20 [-Werror=c++20-compat]
>>>     38 | typedef unsigned char char8_t;
>>>        |                       ^~~~~~~
>>> cc1plus: some warnings being treated as errors
>>>
>>> The char8_t typedef is currently guarded by:
>>>
>>> /* Declare the C2x char8_t typedef in C2x modes, but only if the C++
>>>    __cpp_char8_t feature test macro is not defined.  */
>>> #if __GLIBC_USE (ISOC2X) && !defined __cpp_char8_t
>>> /* Define the 8-bit character type.  */
>>> typedef unsigned char char8_t;
>>> #endif
>>>
>>> __GLIBC_USE (ISOC2X) evaluates to true because gcc unconditionally defines _GNU_SOURCE. I believe otherwise, C++17 mode would only (or should only) imply __GLIBC_USE (ISOC11).
>>>
>>> Regardless, it seems that directives should be added to suppress the diagnostic. I tried prototyping such a fix, but it doesn't seem to work for me. I don't understand why.
>> I have tried as well and I can't get it to work either.  I would expect it to work,
>> as we have done the same in bits/stdlib-bsearch.h; could it be a gcc issue?
> 
> Yes, this appears to be a gcc issue. I spent some time looking at gcc source code, but didn't find anything obvious. I verified the same technique does work to suppress the similar warning issued for use of, e.g., constexpr, as an identifier when -Wc++11-compat is enabled. I found tests that exercise #pragma GCC diagnostic ignored "-Wc++-compat", but none for -Wc++20-compat (or -Wc++11-compat).
> 
> Tom.
> 

In any case I think the fix below is the correct way (in fact I don't see
another way so I am assuming a compiler issue here).  We also need to avoid 
declaring the typedef for __cplusplus >= 202002L.

>>
>>> $ diff -U3 uchar.h.old uchar.h
>>> --- uchar.h.old 2022-07-20 12:37:55.544301692 -0400
>>> +++ uchar.h     2022-07-20 12:43:21.124365563 -0400
>>> @@ -34,8 +34,17 @@
>>>   /* Declare the C2x char8_t typedef in C2x modes, but only if the C++
>>>     __cpp_char8_t feature test macro is not defined.  */
>>>   #if __GLIBC_USE (ISOC2X) && !defined __cpp_char8_t
>>> +/* Suppress the C++20 compatibility diagnostic regarding char8_t being a
>>> +   keyword.  */
>>> +#if defined __GNUC__ && 4 < __GNUC__ + (6 <= __GNUC_MINOR__)

Use __GNUC_PREREQ.

>>> +# pragma GCC diagnostic push
>>> +# pragma GCC diagnostic ignored "-Wc++20-compat"
>>> +#endif
>>>   /* Define the 8-bit character type.  */
>>>   typedef unsigned char char8_t;
>>> +#if defined __GNUC__ && 4 < __GNUC__ + (6 <= __GNUC_MINOR__)
>>> +# pragma GCC diagnostic pop
>>> +#endif
>>>   #endif
>>>
>>>   #ifndef __USE_ISOCXX11
>>>
>>> Tom.
>>>
>>> On 7/19/22 5:08 PM, Joseph Myers wrote:
>>>> This change appears to introduce a failure of
>>>> wcsmbs/check-installed-headers-cxx with GCC mainline, because uchar.h now
>>>> produces:
>>>>
>>>> ../wcsmbs/uchar.h:38:23: error: identifier 'char8_t' is a keyword in C++20 [-Werror=c++20-compat]
>>>>      38 | typedef unsigned char char8_t;
>>>>
>>>> (recall we want our installed headers to avoid warnings *without* relying
>>>> on the default disabling of warnings in system headers).
>>>>
>>>> Unfortunately, GCC for C++ doesn't disable -Wc++20-compat inside
>>>> __extension__ (unlike what the C front end does), so simply adding
>>>> __extension__ to that declaration wouldn't help, but we could use
>>>> diagnostic disabling pragmas (as already done in some installed headers).
>>>>


* Re: [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-07-21 20:56             ` Adhemerval Zanella Netto
@ 2022-07-22  5:24               ` Tom Honermann
  2022-07-22 11:21                 ` Adhemerval Zanella Netto
  2022-07-24  4:46                 ` Tom Honermann
  0 siblings, 2 replies; 26+ messages in thread
From: Tom Honermann @ 2022-07-22  5:24 UTC (permalink / raw)
  To: Adhemerval Zanella Netto, Joseph Myers; +Cc: libc-alpha

On 7/21/22 4:56 PM, Adhemerval Zanella Netto wrote:
>
> On 21/07/22 17:51, Tom Honermann wrote:
>> On 7/21/22 3:22 PM, Adhemerval Zanella Netto wrote:
>>> On 20/07/22 13:47, Tom Honermann wrote:
>>>> Confirmed that this issue can be easily reproduced outside the testsuite.
>>>>
>>>> $ cat t.cpp
>>>> #include <uchar.h>
>>>>
>>>> $ g++ --version
>>>> g++ (GCC) 13.0.0 20220720 (experimental)
>>>> ...
>>>>
>>>> $ g++ -c -I/path/to/glibc-char8_t/include -std=c++17 -Werror=c++20-compat t.cpp
>>>> In file included from t.cpp:1:
>>>> /home/tom/products/glibc-char8_t/include/uchar.h:38:23: error: identifier ‘char8_t’ is a keyword in C++20 [-Werror=c++20-compat]
>>>>      38 | typedef unsigned char char8_t;
>>>>         |                       ^~~~~~~
>>>> cc1plus: some warnings being treated as errors
>>>>
>>>> The char8_t typedef is currently guarded by:
>>>>
>>>> /* Declare the C2x char8_t typedef in C2x modes, but only if the C++
>>>>     __cpp_char8_t feature test macro is not defined.  */
>>>> #if __GLIBC_USE (ISOC2X) && !defined __cpp_char8_t
>>>> /* Define the 8-bit character type.  */
>>>> typedef unsigned char char8_t;
>>>> #endif
>>>>
>>>> __GLIBC_USE (ISOC2X) evaluates to true because gcc unconditionally defines _GNU_SOURCE. I believe otherwise, C++17 mode would only (or should only) imply __GLIBC_USE (ISOC11).
>>>>
>>>> Regardless, it seems that directives should be added to suppress the diagnostic. I tried prototyping such a fix, but it doesn't seem to work for me. I don't understand why.
>>> I have tried as well and I can't get it to work either.  I would expect it to work,
>>> as we have done the same in bits/stdlib-bsearch.h; could it be a gcc issue?
>> Yes, this appears to be a gcc issue. I spent some time looking at gcc source code, but didn't find anything obvious. I verified the same technique does work to suppress the similar warning issued for use of, e.g., constexpr, as an identifier when -Wc++11-compat is enabled. I found tests that exercise #pragma GCC diagnostic ignored "-Wc++-compat", but none for -Wc++20-compat (or -Wc++11-compat).
>>
>> Tom.
>>
> In any case I think the fix below is the correct way (in fact I don't see
> another way so I am assuming a compiler issue here).
I agree. I debugged gcc tonight and discovered what the problem was. 
I'll submit a patch to gcc.
> We also need to avoid
> declaring the typedef for __cplusplus >= 202002L.
The typedef is already avoided if the __cpp_char8_t feature test macro 
is defined (builtin char8_t support can be enabled in previous C++ 
standard modes via the -fchar8_t option).
>
>>>> $ diff -U3 uchar.h.old uchar.h
>>>> --- uchar.h.old 2022-07-20 12:37:55.544301692 -0400
>>>> +++ uchar.h     2022-07-20 12:43:21.124365563 -0400
>>>> @@ -34,8 +34,17 @@
>>>>    /* Declare the C2x char8_t typedef in C2x modes, but only if the C++
>>>>      __cpp_char8_t feature test macro is not defined.  */
>>>>    #if __GLIBC_USE (ISOC2X) && !defined __cpp_char8_t
>>>> +/* Suppress the C++20 compatibility diagnostic regarding char8_t being a
>>>> +   keyword.  */
>>>> +#if defined __GNUC__ && 4 < __GNUC__ + (6 <= __GNUC_MINOR__)
> Use __GNUC_PREREQ.

Yes, thank you. I also determined that the -Wc++20-compat option was 
added to gcc 10 and is only recognized in C++ modes, so the diagnostic 
suppression guards will be __GNUC_PREREQ (10, 0) && defined __cplusplus.
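To make the two version checks concrete: glibc's <features.h> defines __GNUC_PREREQ (maj, min) as ((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min)). The sketch below reimplements that comparison as ordinary C functions that take the compiler version as parameters (all function names are hypothetical), checks that the prototype's hand-written 4 < __GNUC__ + (6 <= __GNUC_MINOR__) test is equivalent to __GNUC_PREREQ (4, 6), and models the revised guard described above.

```c
#include <stdbool.h>

/* Same arithmetic as glibc's __GNUC_PREREQ, with the compiler version
   passed in instead of read from __GNUC__ / __GNUC_MINOR__.  */
static bool
gnuc_prereq (int gnuc, int gnuc_minor, int maj, int min)
{
  return (gnuc << 16) + gnuc_minor >= (maj << 16) + min;
}

/* The hand-written test from the prototype diff:
   4 < __GNUC__ + (6 <= __GNUC_MINOR__).  */
static bool
prototype_check (int gnuc, int gnuc_minor)
{
  return 4 < gnuc + (6 <= gnuc_minor);
}

/* True if prototype_check agrees with gnuc_prereq (..., 4, 6) for every
   version from 0.0 up to max_major.max_minor.  */
static bool
prototype_matches_prereq (int max_major, int max_minor)
{
  for (int major = 0; major <= max_major; major++)
    for (int minor = 0; minor <= max_minor; minor++)
      if (prototype_check (major, minor) != gnuc_prereq (major, minor, 4, 6))
        return false;
  return true;
}

/* Model of the revised guard: emit the -Wc++20-compat pragmas only for
   GCC 10 or later, and only when compiling C++ (the option is not
   recognized in C modes).  */
static bool
suppress_cxx20_compat (int gnuc, int gnuc_minor, bool cplusplus)
{
  return gnuc_prereq (gnuc, gnuc_minor, 10, 0) && cplusplus;
}
```

In the header itself, the pragmas would sit between #if __GNUC_PREREQ (10, 0) && defined __cplusplus and #endif, in the same positions as in the prototype diff.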

I'll follow up with a patch soon.

Tom.

>
>>>> +# pragma GCC diagnostic push
>>>> +# pragma GCC diagnostic ignored "-Wc++20-compat"
>>>> +#endif
>>>>    /* Define the 8-bit character type.  */
>>>>    typedef unsigned char char8_t;
>>>> +#if defined __GNUC__ && 4 < __GNUC__ + (6 <= __GNUC_MINOR__)
>>>> +# pragma GCC diagnostic pop
>>>> +#endif
>>>>    #endif
>>>>
>>>>    #ifndef __USE_ISOCXX11
>>>>
>>>> Tom.
>>>>
>>>> On 7/19/22 5:08 PM, Joseph Myers wrote:
>>>>> This change appears to introduce a failure of
>>>>> wcsmbs/check-installed-headers-cxx with GCC mainline, because uchar.h now
>>>>> produces:
>>>>>
>>>>> ../wcsmbs/uchar.h:38:23: error: identifier 'char8_t' is a keyword in C++20 [-Werror=c++20-compat]
>>>>>       38 | typedef unsigned char char8_t;
>>>>>
>>>>> (recall we want our installed headers to avoid warnings *without* relying
>>>>> on the default disabling of warnings in system headers).
>>>>>
>>>>> Unfortunately, GCC for C++ doesn't disable -Wc++20-compat inside
>>>>> __extension__ (unlike what the C front end does), so simply adding
>>>>> __extension__ to that declaration wouldn't help, but we could use
>>>>> diagnostic disabling pragmas (as already done in some installed headers).
>>>>>


* Re: [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-07-22  5:24               ` Tom Honermann
@ 2022-07-22 11:21                 ` Adhemerval Zanella Netto
  2022-07-22 14:15                   ` Adhemerval Zanella Netto
  2022-07-24  4:46                 ` Tom Honermann
  1 sibling, 1 reply; 26+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-22 11:21 UTC (permalink / raw)
  To: Tom Honermann, Joseph Myers; +Cc: libc-alpha



On 22/07/22 02:24, Tom Honermann wrote:
> On 7/21/22 4:56 PM, Adhemerval Zanella Netto wrote:
>>
>> On 21/07/22 17:51, Tom Honermann wrote:
>>> On 7/21/22 3:22 PM, Adhemerval Zanella Netto wrote:
>>>> On 20/07/22 13:47, Tom Honermann wrote:
>>>>> Confirmed that this issue can be easily reproduced outside the testsuite.
>>>>>
>>>>> $ cat t.cpp
>>>>> #include <uchar.h>
>>>>>
>>>>> $ g++ --version
>>>>> g++ (GCC) 13.0.0 20220720 (experimental)
>>>>> ...
>>>>>
>>>>> $ g++ -c -I/path/to/glibc-char8_t/include -std=c++17 -Werror=c++20-compat t.cpp
>>>>> In file included from t.cpp:1:
>>>>> /home/tom/products/glibc-char8_t/include/uchar.h:38:23: error: identifier ‘char8_t’ is a keyword in C++20 [-Werror=c++20-compat]
>>>>>      38 | typedef unsigned char char8_t;
>>>>>         |                       ^~~~~~~
>>>>> cc1plus: some warnings being treated as errors
>>>>>
>>>>> The char8_t typedef is currently guarded by:
>>>>>
>>>>> /* Declare the C2x char8_t typedef in C2x modes, but only if the C++
>>>>>     __cpp_char8_t feature test macro is not defined.  */
>>>>> #if __GLIBC_USE (ISOC2X) && !defined __cpp_char8_t
>>>>> /* Define the 8-bit character type.  */
>>>>> typedef unsigned char char8_t;
>>>>> #endif
>>>>>
>>>>> __GLIBC_USE (ISOC2X) evaluates to true because gcc unconditionally defines _GNU_SOURCE. I believe otherwise, C++17 mode would only (or should only) imply __GLIBC_USE (ISOC11).
>>>>>
>>>>> Regardless, it seems that directives should be added to suppress the diagnostic. I tried prototyping such a fix, but it doesn't seem to work for me. I don't understand why.
>>>> I have tried as well and I can't get it to work either.  I would expect it to work,
>>>> as we have done the same in bits/stdlib-bsearch.h; could it be a gcc issue?
>>> Yes, this appears to be a gcc issue. I spent some time looking at gcc source code, but didn't find anything obvious. I verified the same technique does work to suppress the similar warning issued for use of, e.g., constexpr, as an identifier when -Wc++11-compat is enabled. I found tests that exercise #pragma GCC diagnostic ignored "-Wc++-compat", but none for -Wc++20-compat (or -Wc++11-compat).
>>>
>>> Tom.
>>>
>> In any case I think the fix below is the correct way (in fact I don't see
>> another way so I am assuming a compiler issue here).
> I agree. I debugged gcc tonight and discovered what the problem was. I'll submit a patch to gcc.
>> We also need to avoid
>> declaring the typedef for __cplusplus >= 202002L.
> The typedef is already avoided if the __cpp_char8_t feature test macro is defined (builtin char8_t support can be enabled in previous C++ standard modes via the -fchar8_t option).

If the compiler preprocessor defined for -std=c++20? I had the impression it is
enabled iff -fchar8_t is set.


* Re: [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-07-22 11:21                 ` Adhemerval Zanella Netto
@ 2022-07-22 14:15                   ` Adhemerval Zanella Netto
  2022-07-22 17:00                     ` Tom Honermann
  0 siblings, 1 reply; 26+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-22 14:15 UTC (permalink / raw)
  To: Tom Honermann, Joseph Myers; +Cc: libc-alpha



On 22/07/22 08:21, Adhemerval Zanella Netto wrote:
> 
> 
> On 22/07/22 02:24, Tom Honermann wrote:
>> On 7/21/22 4:56 PM, Adhemerval Zanella Netto wrote:
>>>
>>> On 21/07/22 17:51, Tom Honermann wrote:
>>>> On 7/21/22 3:22 PM, Adhemerval Zanella Netto wrote:
>>>>> On 20/07/22 13:47, Tom Honermann wrote:
>>>>>> Confirmed that this issue can be easily reproduced outside the testsuite.
>>>>>>
>>>>>> $ cat t.cpp
>>>>>> #include <uchar.h>
>>>>>>
>>>>>> $ g++ --version
>>>>>> g++ (GCC) 13.0.0 20220720 (experimental)
>>>>>> ...
>>>>>>
>>>>>> $ g++ -c -I/path/to/glibc-char8_t/include -std=c++17 -Werror=c++20-compat t.cpp
>>>>>> In file included from t.cpp:1:
>>>>>> /home/tom/products/glibc-char8_t/include/uchar.h:38:23: error: identifier ‘char8_t’ is a keyword in C++20 [-Werror=c++20-compat]
>>>>>>      38 | typedef unsigned char char8_t;
>>>>>>         |                       ^~~~~~~
>>>>>> cc1plus: some warnings being treated as errors
>>>>>>
>>>>>> The char8_t typedef is currently guarded by:
>>>>>>
>>>>>> /* Declare the C2x char8_t typedef in C2x modes, but only if the C++
>>>>>>     __cpp_char8_t feature test macro is not defined.  */
>>>>>> #if __GLIBC_USE (ISOC2X) && !defined __cpp_char8_t
>>>>>> /* Define the 8-bit character type.  */
>>>>>> typedef unsigned char char8_t;
>>>>>> #endif
>>>>>>
>>>>>> __GLIBC_USE (ISOC2X) evaluates to true because gcc unconditionally defines _GNU_SOURCE. I believe otherwise, C++17 mode would only (or should only) imply __GLIBC_USE (ISOC11).
>>>>>>
>>>>>> Regardless, it seems that directives should be added to suppress the diagnostic. I tried prototyping such a fix, but it doesn't seem to work for me. I don't understand why.
>>>>> I have tried as well and I can't get it to work either.  I would expect it to work,
>>>>> as we have done the same in bits/stdlib-bsearch.h; could it be a gcc issue?
>>>> Yes, this appears to be a gcc issue. I spent some time looking at gcc source code, but didn't find anything obvious. I verified the same technique does work to suppress the similar warning issued for use of, e.g., constexpr, as an identifier when -Wc++11-compat is enabled. I found tests that exercise #pragma GCC diagnostic ignored "-Wc++-compat", but none for -Wc++20-compat (or -Wc++11-compat).
>>>>
>>>> Tom.
>>>>
>>> In any case I think the fix below is the correct way (in fact I don't see
>>> another way so I am assuming a compiler issue here).
>> I agree. I debugged gcc tonight and discovered what the problem was. I'll submit a patch to gcc.
>>> We also need to avoid
>>> declaring the typedef for __cplusplus >= 202002L.
>> The typedef is already avoided if the __cpp_char8_t feature test macro is defined (builtin char8_t support can be enabled in previous C++ standard modes via the -fchar8_t option).
> 
> If the compiler preprocessor defined for -std=c++20? I had the impression it is
> enabled iff -fchar8_t is set.

I realize now that I did not write proper English; I meant to ask whether gcc always defines
__cpp_char8_t for -std=c++20 and higher, since I had the impression that the
macro is only defined iff -fchar8_t is explicitly used.


* Re: [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-07-22 14:15                   ` Adhemerval Zanella Netto
@ 2022-07-22 17:00                     ` Tom Honermann
  2022-07-22 17:01                       ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 26+ messages in thread
From: Tom Honermann @ 2022-07-22 17:00 UTC (permalink / raw)
  To: Adhemerval Zanella Netto, Joseph Myers; +Cc: libc-alpha

On 7/22/22 10:15 AM, Adhemerval Zanella Netto wrote:
>
> On 22/07/22 08:21, Adhemerval Zanella Netto wrote:
>>
>> On 22/07/22 02:24, Tom Honermann wrote:
>>> On 7/21/22 4:56 PM, Adhemerval Zanella Netto wrote:
>>>> On 21/07/22 17:51, Tom Honermann wrote:
>>>>> On 7/21/22 3:22 PM, Adhemerval Zanella Netto wrote:
>>>>>> On 20/07/22 13:47, Tom Honermann wrote:
>>>>>>> Confirmed that this issue can be easily reproduced outside the testsuite.
>>>>>>>
>>>>>>> $ cat t.cpp
>>>>>>> #include <uchar.h>
>>>>>>>
>>>>>>> $ g++ --version
>>>>>>> g++ (GCC) 13.0.0 20220720 (experimental)
>>>>>>> ...
>>>>>>>
>>>>>>> $ g++ -c -I/path/to/glibc-char8_t/include -std=c++17 -Werror=c++20-compat t.cpp
>>>>>>> In file included from t.cpp:1:
>>>>>>> /home/tom/products/glibc-char8_t/include/uchar.h:38:23: error: identifier ‘char8_t’ is a keyword in C++20 [-Werror=c++20-compat]
>>>>>>>       38 | typedef unsigned char char8_t;
>>>>>>>          |                       ^~~~~~~
>>>>>>> cc1plus: some warnings being treated as errors
>>>>>>>
>>>>>>> The char8_t typedef is currently guarded by:
>>>>>>>
>>>>>>> /* Declare the C2x char8_t typedef in C2x modes, but only if the C++
>>>>>>>      __cpp_char8_t feature test macro is not defined.  */
>>>>>>> #if __GLIBC_USE (ISOC2X) && !defined __cpp_char8_t
>>>>>>> /* Define the 8-bit character type.  */
>>>>>>> typedef unsigned char char8_t;
>>>>>>> #endif
>>>>>>>
>>>>>>> __GLIBC_USE (ISOC2X) evaluates to true because gcc unconditionally defines _GNU_SOURCE. I believe otherwise, C++17 mode would only (or should only) imply __GLIBC_USE (ISOC11).
>>>>>>>
>>>>>>> Regardless, it seems that directives should be added to suppress the diagnostic. I tried prototyping such a fix, but it doesn't seem to work for me. I don't understand why.
>>>>>> I have tried as well and I can't get it to work either.  I would expect it to work,
>>>>>> as we have done the same in bits/stdlib-bsearch.h; could it be a gcc issue?
>>>>> Yes, this appears to be a gcc issue. I spent some time looking at gcc source code, but didn't find anything obvious. I verified the same technique does work to suppress the similar warning issued for use of, e.g., constexpr, as an identifier when -Wc++11-compat is enabled. I found tests that exercise #pragma GCC diagnostic ignored "-Wc++-compat", but none for -Wc++20-compat (or -Wc++11-compat).
>>>>>
>>>>> Tom.
>>>>>
>>>> In any case I think the fix below is the correct way (in fact I don't see
>>>> another way so I am assuming a compiler issue here).
>>> I agree. I debugged gcc tonight and discovered what the problem was. I'll submit a patch to gcc.
>>>> We also need to avoid
>>>> declaring the typedef for __cplusplus >= 202002L.
>>> The typedef is already avoided if the __cpp_char8_t feature test macro is defined (builtin char8_t support can be enabled in previous C++ standard modes via the -fchar8_t option).
>> If the compiler preprocessor defined for -std=c++20? I had the impression it is
>> enabled iff -fchar8_t is set.
> I realize now that I did not write proper English; I meant to ask whether gcc always defines
> __cpp_char8_t for -std=c++20 and higher, since I had the impression that the
> macro is only defined iff -fchar8_t is explicitly used.

English is hard :)

For gcc, clang, and MSVC, -std=c++20 (/std:c++20 for MSVC) implies 
-fchar8_t (/Zc:char8_t), but that feature can be disabled with 
-fno-char8_t (/Zc:char8_t-). For example:

$ gcc -E -dM -x c++ -std=c++17 /dev/null | grep __cpp_char8_t
$ gcc -E -dM -x c++ -std=c++17 -fchar8_t /dev/null | grep __cpp_char8_t
#define __cpp_char8_t 201811L
$ gcc -E -dM -x c++ -std=c++20 /dev/null | grep __cpp_char8_t
#define __cpp_char8_t 201811L
$ gcc -E -dM -x c++ -std=c++20 -fno-char8_t /dev/null | grep __cpp_char8_t
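The four commands above amount to a small truth table. The following sketch models it in C. It illustrates the observed g++ behavior and is not compiler code; the names enum char8_flag, cpp_char8_t_defined and declares_char8_typedef are hypothetical. It also evaluates the uchar.h guard (__GLIBC_USE (ISOC2X) && !defined __cpp_char8_t) for each case.

```c
#include <stdbool.h>

/* Explicit char8_t option given on the command line, if any.  */
enum char8_flag
{
  CHAR8_DEFAULT,  /* neither -fchar8_t nor -fno-char8_t */
  CHAR8_ON,       /* -fchar8_t */
  CHAR8_OFF       /* -fno-char8_t */
};

/* Whether __cpp_char8_t is predefined for -std=c++<cxx_std>, per the
   gcc -E -dM output above: on by default from C++20, with the explicit
   flags overriding the default in either direction.  */
static bool
cpp_char8_t_defined (int cxx_std, enum char8_flag flag)
{
  switch (flag)
    {
    case CHAR8_ON:
      return true;
    case CHAR8_OFF:
      return false;
    default:
      return cxx_std >= 20;
    }
}

/* The uchar.h guard: __GLIBC_USE (ISOC2X) && !defined __cpp_char8_t.
   isoc2x is true under g++, which defines _GNU_SOURCE.  */
static bool
declares_char8_typedef (bool isoc2x, int cxx_std, enum char8_flag flag)
{
  return isoc2x && !cpp_char8_t_defined (cxx_std, flag);
}
```

In particular, -std=c++20 -fno-char8_t still gets the typedef under this guard, whereas an additional __cplusplus >= 202002L test would drop it in that mode (assuming the compiler reports 202002L for -std=c++20).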

Tom.



* Re: [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-07-22 17:00                     ` Tom Honermann
@ 2022-07-22 17:01                       ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 26+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-22 17:01 UTC (permalink / raw)
  To: Tom Honermann, Joseph Myers; +Cc: libc-alpha



On 22/07/22 14:00, Tom Honermann wrote:
> On 7/22/22 10:15 AM, Adhemerval Zanella Netto wrote:
>>
>> On 22/07/22 08:21, Adhemerval Zanella Netto wrote:
>>>
>>> On 22/07/22 02:24, Tom Honermann wrote:
>>>> On 7/21/22 4:56 PM, Adhemerval Zanella Netto wrote:
>>>>> On 21/07/22 17:51, Tom Honermann wrote:
>>>>>> On 7/21/22 3:22 PM, Adhemerval Zanella Netto wrote:
>>>>>>> On 20/07/22 13:47, Tom Honermann wrote:
>>>>>>>> Confirmed that this issue can be easily reproduced outside the testsuite.
>>>>>>>>
>>>>>>>> $ cat t.cpp
>>>>>>>> #include <uchar.h>
>>>>>>>>
>>>>>>>> $ g++ --version
>>>>>>>> g++ (GCC) 13.0.0 20220720 (experimental)
>>>>>>>> ...
>>>>>>>>
>>>>>>>> $ g++ -c -I/path/to/glibc-char8_t/include -std=c++17 -Werror=c++20-compat t.cpp
>>>>>>>> In file included from t.cpp:1:
>>>>>>>> /home/tom/products/glibc-char8_t/include/uchar.h:38:23: error: identifier ‘char8_t’ is a keyword in C++20 [-Werror=c++20-compat]
>>>>>>>>       38 | typedef unsigned char char8_t;
>>>>>>>>          |                       ^~~~~~~
>>>>>>>> cc1plus: some warnings being treated as errors
>>>>>>>>
>>>>>>>> The char8_t typedef is currently guarded by:
>>>>>>>>
>>>>>>>> /* Declare the C2x char8_t typedef in C2x modes, but only if the C++
>>>>>>>>      __cpp_char8_t feature test macro is not defined.  */
>>>>>>>> #if __GLIBC_USE (ISOC2X) && !defined __cpp_char8_t
>>>>>>>> /* Define the 8-bit character type.  */
>>>>>>>> typedef unsigned char char8_t;
>>>>>>>> #endif
>>>>>>>>
>>>>>>>> __GLIBC_USE (ISOC2X) evaluates to true because gcc unconditionally defines _GNU_SOURCE when compiling C++. Otherwise, I believe C++17 mode would (or should) only imply __GLIBC_USE (ISOC11).
>>>>>>>>
>>>>>>>> Regardless, it seems that directives should be added to suppress the diagnostic. I tried prototyping such a fix, but it doesn't seem to work for me. I don't understand why.
>>>>>>> I have tried as well and I can't get it to work either.  I would expect it to work
>>>>>>> as we have done in bits/stdlib-bsearch.h; could it be a gcc issue?
>>>>>> Yes, this appears to be a gcc issue. I spent some time looking at gcc source code, but didn't find anything obvious. I verified the same technique does work to suppress the similar warning issued for use of, e.g., constexpr, as an identifier when -Wc++11-compat is enabled. I found tests that exercise #pragma GCC diagnostic ignored "-Wc++-compat", but none for -Wc++20-compat (or -Wc++11-compat).
>>>>>>
>>>>>> Tom.
>>>>>>
>>>>> In any case I think the fix below is the correct way (in fact I don't see
>>>>> another way so I am assuming a compiler issue here).
>>>> I agree. I debugged gcc tonight and discovered what the problem was. I'll submit a patch to gcc.
>>>>> We also need to avoid
>>>>> declaring the typedef for __cplusplus >= 202002L.
>>>> The typedef is already avoided if the __cpp_char8_t feature test macro is defined (builtin char8_t support can be enabled in previous C++ standard modes via the -fchar8_t option).
>>> If the compiler preprocessor defined for -std=c++20? I had the impression it is
>>> enabled iff -fchar8_t is set.
>> I realize now I did not write proper English; I meant to ask whether gcc always defines
>> __cpp_char8_t for -std=c++20 and higher, since I had the impression that the
>> macro is only defined iff -fchar8_t is actively used.
> 
> English is hard :)
> 
> For gcc, clang, and MSVC, -std=c++20 (/std:c++20 for MSVC) implies -fchar8_t (/Zc:char8_t), but that feature can be disabled with -fno-char8_t (/Zc:char8_t-). For example:
> 
> $ gcc -E -dM -x c++ -std=c++17 /dev/null | grep __cpp_char8_t
> $ gcc -E -dM -x c++ -std=c++17 -fchar8_t /dev/null | grep __cpp_char8_t
> #define __cpp_char8_t 201811L
> $ gcc -E -dM -x c++ -std=c++20 /dev/null | grep __cpp_char8_t
> #define __cpp_char8_t 201811L
> $ gcc -E -dM -x c++ -std=c++20 -fno-char8_t /dev/null | grep __cpp_char8_t
> 
> Tom.
> 

Ok, thanks.
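
The gcc -E -dM results quoted above can be captured in a small preprocessor sketch that contrasts the two conditions discussed here: `__cplusplus >= 202002L` (C++20 or later) and the `__cpp_char8_t` feature test macro, which gcc defines for `-std=c++20` and for `-std=c++17 -fchar8_t`, but not for `-std=c++20 -fno-char8_t`. The `SKETCH_*` macros and the `char8_t_origin()` helper are hypothetical names made up for illustration; they are not glibc or gcc identifiers.

```c
#include <assert.h>

/* Hypothetical macros (not glibc or gcc names) mirroring the two
   conditions discussed in this thread.  */

/* True for -std=c++20 and later, regardless of -fchar8_t/-fno-char8_t.  */
#if defined __cplusplus && __cplusplus >= 202002L
# define SKETCH_CXX20_OR_LATER 1
#else
# define SKETCH_CXX20_OR_LATER 0
#endif

/* Per the gcc -E -dM output quoted above:
     -std=c++17               -> __cpp_char8_t not defined
     -std=c++17 -fchar8_t     -> __cpp_char8_t 201811L
     -std=c++20               -> __cpp_char8_t 201811L
     -std=c++20 -fno-char8_t  -> __cpp_char8_t not defined
   It is never defined when compiling C.  */
#ifdef __cpp_char8_t
# define SKETCH_BUILTIN_CHAR8_T 1
#else
# define SKETCH_BUILTIN_CHAR8_T 0
#endif

/* Only the feature test macro tracks -fchar8_t and -fno-char8_t, so it
   is the condition that says whether a char8_t typedef is needed.  */
#define SKETCH_NEED_CHAR8_T_TYPEDEF (!SKETCH_BUILTIN_CHAR8_T)

#ifndef __cplusplus
/* When compiling C, neither condition can hold.  */
static_assert (!SKETCH_CXX20_OR_LATER && !SKETCH_BUILTIN_CHAR8_T,
               "C never has a built-in char8_t");
#endif

/* Report where char8_t comes from in the current compilation mode.  */
const char *
char8_t_origin (void)
{
  return SKETCH_NEED_CHAR8_T_TYPEDEF ? "typedef" : "keyword";
}
```

The two macros only disagree in C++ builds using `-std=c++17 -fchar8_t` or `-std=c++20 -fno-char8_t`, which is why keying the typedef off `__cpp_char8_t` covers cases that a `__cplusplus >= 202002L` check would get wrong.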


* Re: [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
  2022-07-22  5:24               ` Tom Honermann
  2022-07-22 11:21                 ` Adhemerval Zanella Netto
@ 2022-07-24  4:46                 ` Tom Honermann
  1 sibling, 0 replies; 26+ messages in thread
From: Tom Honermann @ 2022-07-24  4:46 UTC
  To: Adhemerval Zanella Netto, Joseph Myers; +Cc: libc-alpha

On 7/22/22 1:24 AM, Tom Honermann via Libc-alpha wrote:
> On 7/21/22 4:56 PM, Adhemerval Zanella Netto wrote:
>>
>> On 21/07/22 17:51, Tom Honermann wrote:
>>> On 7/21/22 3:22 PM, Adhemerval Zanella Netto wrote:
>>>> On 20/07/22 13:47, Tom Honermann wrote:
>>>>> Confirmed that this issue can be easily reproduced outside the 
>>>>> testsuite.
>>>>>
>>>>> $ cat t.cpp
>>>>> #include <uchar.h>
>>>>>
>>>>> $ g++ --version
>>>>> g++ (GCC) 13.0.0 20220720 (experimental)
>>>>> ...
>>>>>
>>>>> $ g++ -c -I/path/to/glibc-char8_t/include -std=c++17 -Werror=c++20-compat t.cpp
>>>>> In file included from t.cpp:1:
>>>>> /home/tom/products/glibc-char8_t/include/uchar.h:38:23: error: identifier ‘char8_t’ is a keyword in C++20 [-Werror=c++20-compat]
>>>>>      38 | typedef unsigned char char8_t;
>>>>>         |                       ^~~~~~~
>>>>> cc1plus: some warnings being treated as errors
>>>>>
>>>>> The char8_t typedef is currently guarded by:
>>>>>
>>>>> /* Declare the C2x char8_t typedef in C2x modes, but only if the C++
>>>>>     __cpp_char8_t feature test macro is not defined.  */
>>>>> #if __GLIBC_USE (ISOC2X) && !defined __cpp_char8_t
>>>>> /* Define the 8-bit character type.  */
>>>>> typedef unsigned char char8_t;
>>>>> #endif
>>>>>
>>>>> __GLIBC_USE (ISOC2X) evaluates to true because gcc unconditionally
>>>>> defines _GNU_SOURCE when compiling C++. Otherwise, I believe C++17
>>>>> mode would (or should) only imply __GLIBC_USE (ISOC11).
>>>>>
>>>>> Regardless, it seems that directives should be added to suppress 
>>>>> the diagnostic. I tried prototyping such a fix, but it doesn't 
>>>>> seem to work for me. I don't understand why.
>>>> I have tried as well and I can't get it to work either.  I would
>>>> expect it to work as we have done in bits/stdlib-bsearch.h; could it
>>>> be a gcc issue?
>>> Yes, this appears to be a gcc issue. I spent some time looking at 
>>> gcc source code, but didn't find anything obvious. I verified the 
>>> same technique does work to suppress the similar warning issued for 
>>> use of, e.g., constexpr, as an identifier when -Wc++11-compat is 
>>> enabled. I found tests that exercise #pragma GCC diagnostic ignored 
>>> "-Wc++-compat", but none for -Wc++20-compat (or -Wc++11-compat).
>>>
>>> Tom.
>>>
>> In any case I think the fix below is the correct way (in fact I don't
>> see another way so I am assuming a compiler issue here).
> I agree. I debugged gcc tonight and discovered what the problem was. 
> I'll submit a patch to gcc.

A patch has been submitted to gcc to correct '#pragma GCC diagnostic
ignored'-based suppression of -Wc++20-compat diagnostics. See the
thread at https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598736.html.

Tom.
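
A minimal sketch of the pragma-based suppression discussed in this thread, modelled on the uchar.h guard quoted above. It is an illustration, not necessarily the form that was committed to glibc: the glibc-internal `__GLIBC_USE (ISOC2X)` condition is omitted, and the push/ignored/pop pragmas are only emitted for GNU-compatible compilers in C++ mode, since -Wc++20-compat is a C++ warning. As reported above, gcc without the referenced patch may still emit the diagnostic despite the pragma.

```c
#include <assert.h>

/* Sketch only: declare char8_t as a typedef unless the C++ compiler
   already provides it as a keyword (signalled by __cpp_char8_t).  The
   glibc-internal __GLIBC_USE (ISOC2X) condition is omitted here.  */
#ifndef __cpp_char8_t
# if defined __cplusplus && defined __GNUC__
/* In C++17 mode, -Wc++20-compat warns that char8_t becomes a keyword in
   C++20; silence that warning for this one declaration only.  */
#  pragma GCC diagnostic push
#  pragma GCC diagnostic ignored "-Wc++20-compat"
# endif
/* Define the 8-bit character type.  */
typedef unsigned char char8_t;
# if defined __cplusplus && defined __GNUC__
#  pragma GCC diagnostic pop
# endif
#endif

/* Whether it is a keyword or the typedef above, char8_t is one byte.  */
static_assert (sizeof (char8_t) == 1, "char8_t must be one byte wide");
```

Compiled as C, the pragmas are skipped and only the typedef remains; with -std=c++20 (and no -fno-char8_t) the whole block is skipped because char8_t is already a keyword, so only C++17-and-earlier builds without -fchar8_t reach the suppressed declaration.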



end of thread

Thread overview: 26+ messages
2022-06-30 12:52 [PATCH v4 0/3] C++20 P0482R6 and C2X N2653: char8_t, mbrtoc8(), and c8rtomb() Tom Honermann
2022-06-30 12:52 ` [PATCH v4 1/3] gconv: Correct Big5-HKSCS conversion to preserve all state bits. [BZ #25744] Tom Honermann
2022-07-04 18:16   ` Adhemerval Zanella
2022-06-30 12:52 ` [PATCH v4 2/3] stdlib: Implement mbrtoc8(), c8rtomb(), and the char8_t typedef Tom Honermann
2022-07-04 18:33   ` Adhemerval Zanella
2022-07-19 21:08     ` Joseph Myers
2022-07-20 12:04       ` Adhemerval Zanella Netto
2022-07-20 13:54         ` Florian Weimer
2022-07-20 14:31           ` Adhemerval Zanella Netto
2022-07-20 15:05             ` Florian Weimer
2022-07-20 16:53               ` Tom Honermann
2022-07-20 16:47       ` Tom Honermann
2022-07-21 19:22         ` Adhemerval Zanella Netto
2022-07-21 20:51           ` Tom Honermann
2022-07-21 20:56             ` Adhemerval Zanella Netto
2022-07-22  5:24               ` Tom Honermann
2022-07-22 11:21                 ` Adhemerval Zanella Netto
2022-07-22 14:15                   ` Adhemerval Zanella Netto
2022-07-22 17:00                     ` Tom Honermann
2022-07-22 17:01                       ` Adhemerval Zanella Netto
2022-07-24  4:46                 ` Tom Honermann
2022-06-30 12:52 ` [PATCH v4 3/3] stdlib: Tests for " Tom Honermann
2022-07-04 18:58   ` Adhemerval Zanella
2022-07-04 19:08 ` [PATCH v4 0/3] C++20 P0482R6 and C2X N2653: char8_t, mbrtoc8(), and c8rtomb() Adhemerval Zanella
2022-07-06  3:27   ` Tom Honermann
2022-07-06 12:23     ` Adhemerval Zanella
