public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Tom Honermann <tom@honermann.net>
To: libc-alpha <libc-alpha@sourceware.org>
Subject: [PATCH 2/3]: C++20 P0482R6 and C2X N2653: Implement mbrtoc8, c8rtomb,  char8_t
Date: Fri, 7 Jan 2022 19:39:11 -0500	[thread overview]
Message-ID: <a7b6eeac-4e0e-e189-8420-c9acd9ccbbe3@honermann.net> (raw)

[-- Attachment #1: Type: text/plain, Size: 831 bytes --]

This patch provides implementations for the mbrtoc8 and c8rtomb 
functions adopted for C++20 via WG21 P0482R6 [1] and proposed for C2X 
via WG14 N2653 [2]. It also provides the char8_t typedef from WG14 N2653 
[2].

The mbrtoc8 and c8rtomb functions are declared in uchar.h if either of 
the C++20 __cpp_char8_t or _GNU_SOURCE feature test macros are defined.

The char8_t typedef is declared in uchar.h if _GNU_SOURCE is defined and 
__cpp_char8_t is not defined (if __cpp_char8_t is defined, then char8_t 
is a builtin type).

Tested on Linux x86_64.

Tom.

[1]: WG21 P0482R6
      "char8_t: A type for UTF-8 characters and strings (Revision 6)"
      https://wg21.link/p0482r6

[2]: WG14 N2653
      "char8_t: A type for UTF-8 characters and strings (Revision 1)"
      http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm

[-- Attachment #2: n2653-2.patch --]
[-- Type: text/x-patch, Size: 27563 bytes --]

commit 0710b1004c1eb151d739c73090c4eab81e454eb1
Author: Tom Honermann <tom@honermann.net>
Date:   Wed Jan 5 18:42:03 2022 -0500

    Implement mbrtoc8(), c8rtomb(), and the char8_t typedef.
    
    This change provides implementations for the mbrtoc8 and c8rtomb
    functions adopted for C++20 via WG21 P0482R6 and proposed for C2X
    via WG14 N2653.  It also provides the char8_t typedef from N2653.
    
    The mbrtoc8 and c8rtomb functions are declared in uchar.h if
    either of the C++20 __cpp_char8_t feature test macro or the
    _GNU_SOURCE macro are defined.
    
    The char8_t typedef is declared in uchar.h if _GNU_SOURCE is
    defined and __cpp_char8_t is not defined (if __cpp_char8_t is
    defined, then char8_t is a builtin type).

diff --git a/NEWS b/NEWS
index 4762bfcc4e..ccb22553f2 100644
--- a/NEWS
+++ b/NEWS
@@ -237,6 +237,14 @@ Major new features:
 * The audit libraries will avoid unnecessary slowdown if it is not required
   PLT tracking (by not implementing the la_pltenter or la_pltexit callbacks).
 
+* The mbrtoc8 and c8rtomb functions are added for implementation of the
+  C++20 P0482R6 and C2X N2653 proposals.  These functions perform conversions
+  between multibyte sequences and the UTF-8 character encoding.  A char8_t
+  typedef is added for the C2X N2653 proposal.  The functions are declared
+  in uchar.h if the C++20 __cpp_char8_t feature test macro or _GNU_SOURCE
+  macro is defined.  The char8_t typedef is declared in uchar.h if _GNU_SOURCE
+  is defined and __cpp_char8_t is not defined.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * The function pthread_mutex_consistent_np has been deprecated; programs
diff --git a/sysdeps/mach/hurd/i386/libc.abilist b/sysdeps/mach/hurd/i386/libc.abilist
index ecf8c0992e..d1188a83a5 100644
--- a/sysdeps/mach/hurd/i386/libc.abilist
+++ b/sysdeps/mach/hurd/i386/libc.abilist
@@ -2287,7 +2287,9 @@ GLIBC_2.34 shm_unlink F
 GLIBC_2.34 timespec_getres F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
 GLIBC_2.35 close_range F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index fed942ed4b..35586c023f 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2613,4 +2613,6 @@ GLIBC_2.34 tss_delete F
 GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
+GLIBC_2.35 c8rtomb F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index 2867932704..4969c57a37 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2711,6 +2711,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist
index 239db7bab0..fccd1f8662 100644
--- a/sysdeps/unix/sysv/linux/arc/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arc/libc.abilist
@@ -2375,3 +2375,5 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
index bc79dcfe8a..1cb2d4fd66 100644
--- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
@@ -493,6 +493,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
index 614607fd6b..b36859f8e1 100644
--- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
@@ -490,6 +490,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
index 2b61543f0d..4f7828d822 100644
--- a/sysdeps/unix/sysv/linux/csky/libc.abilist
+++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
@@ -2649,3 +2649,5 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 6b3cb1adb4..d59accc822 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -2598,6 +2598,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index 7f608c1b64..37a065f555 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2782,6 +2782,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index 865deec43f..69839c86fd 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -2549,6 +2549,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index a172d74632..b8418d612c 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -494,6 +494,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0x98
 GLIBC_2.4 _IO_2_1_stdin_ D 0x98
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index 174e9c7739..8cfbb5f235 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -2725,6 +2725,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
index d042be1369..0d729329aa 100644
--- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
@@ -2698,3 +2698,5 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
index 332da62de2..3980c6df0d 100644
--- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
@@ -2695,3 +2695,5 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index 2d6ec0d0e8..f8c2087559 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -2690,6 +2690,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index 6c5befa72b..ce87791286 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -2688,6 +2688,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index 5fb24c97e1..ee547ceb78 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -2696,6 +2696,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index f4f29fc15e..4c558a283e 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -2600,6 +2600,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index 2e7300cd05..234ab84125 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2737,3 +2737,5 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index 129a2f16a7..e383bac9bd 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -2752,6 +2752,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index 7e23226779..ddbcb3e085 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2785,6 +2785,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index 6f97392b70..daaa374fa0 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -2508,6 +2508,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index 29058a041a..a24f6b4751 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2810,3 +2810,5 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
index d2924766d2..3dc0e04f64 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist
@@ -2377,3 +2377,5 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index b770e05da3..af3bbfc83b 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2577,3 +2577,5 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index bed3433a2b..4025a7f913 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2750,6 +2750,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index 4f1a143da5..55febf9508 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -2545,6 +2545,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
index 92c8dec8ec..cc37920ee2 100644
--- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
@@ -2605,6 +2605,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
index 263da58cb7..298654b119 100644
--- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
@@ -2602,6 +2602,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 0171efe7db..ed214429f0 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -2745,6 +2745,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index 7f8d45f362..0dcc3674eb 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -2572,6 +2572,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index c2f1a8ecc6..ec4e0b73e4 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2523,6 +2523,8 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index 8b43acf100..b60cd45821 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2629,3 +2629,5 @@ GLIBC_2.34 tss_get F
 GLIBC_2.34 tss_set F
 GLIBC_2.35 __memcmpeq F
 GLIBC_2.35 _dl_find_object F
+GLIBC_2.35 c8rtomb F
+GLIBC_2.35 mbrtoc8 F
diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
index 3a3be9cd95..390d94bbb2 100644
--- a/wcsmbs/Makefile
+++ b/wcsmbs/Makefile
@@ -42,7 +42,7 @@ routines := wcscat wcschr wcscmp wcscpy wcscspn wcsdup wcslen wcsncat \
 	    wcsmbsload mbsrtowcs_l \
 	    isoc99_wscanf isoc99_vwscanf isoc99_fwscanf isoc99_vfwscanf \
 	    isoc99_swscanf isoc99_vswscanf \
-	    mbrtoc16 c16rtomb mbrtoc32 c32rtomb
+	    mbrtoc8 c8rtomb mbrtoc16 c16rtomb mbrtoc32 c32rtomb
 
 strop-tests :=  wcscmp wcsncmp wmemcmp wcslen wcschr wcsrchr wcscpy wcsnlen \
 		wcpcpy wcsncpy wcpncpy wcscat wcsncat wcschrnul wcsspn wcspbrk \
diff --git a/wcsmbs/Versions b/wcsmbs/Versions
index 0b31c1b940..d578518fad 100644
--- a/wcsmbs/Versions
+++ b/wcsmbs/Versions
@@ -49,4 +49,7 @@ libc {
     wcstof32; wcstof64; wcstof32x;
     wcstof32_l; wcstof64_l; wcstof32x_l;
   }
+  GLIBC_2.35 {
+    c8rtomb; mbrtoc8;
+  }
 }
diff --git a/wcsmbs/c8rtomb.c b/wcsmbs/c8rtomb.c
new file mode 100644
index 0000000000..8a5ffeab23
--- /dev/null
+++ b/wcsmbs/c8rtomb.c
@@ -0,0 +1,132 @@
+/* UTF-8 to multibyte conversion.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <errno.h>
+#include <uchar.h>
+#include <wchar.h>
+
+
+/* This is the private state used if PS is NULL.  */
+static mbstate_t state;
+
+size_t
+c8rtomb (char *s, char8_t c8, mbstate_t *ps)
+{
+  /* This implementation depends on the converter invoked by wcrtomb not
+     needing to retain state in either the top most bit of ps->__count or
+     in ps->__value between invocations.  This implementation uses the
+     top most bit of ps->__count to indicate that trailing code units are
+     expected and uses ps->__value to store previously seen code units.  */
+
+  wchar_t wc;
+
+  if (ps == NULL)
+    ps = &state;
+
+  if (s == NULL)
+    {
+      /* if 's' is a null pointer, behave as if u8'\0' was passed as 'c8'.  If
+         this occurs for an incomplete code unit sequence, then an error will
+         be reported below.  */
+      c8 = u8""[0];
+    }
+
+  if (! (ps->__count & 0x80000000))
+    {
+      /* Initial state.  */
+      if ((c8 >= 0x80 && c8 <= 0xC1) || c8 >= 0xF5)
+	{
+	  /* An invalid lead code unit.  */
+	  __set_errno (EILSEQ);
+	  return -1;
+	}
+      if (c8 >= 0xC2)
+	{
+	  /* A valid lead code unit.  */
+	  ps->__count |= 0x80000000;
+	  ps->__value.__wchb[0] = c8;
+          ps->__value.__wchb[3] = 1;
+	  return 0;
+	}
+      /* A single byte (ASCII) code unit.  */
+      wc = c8;
+    }
+  else
+    {
+      char8_t cu1 = ps->__value.__wchb[0];
+      if (ps->__value.__wchb[3] == 1)
+	{
+	  /* A single lead code unit was previously seen.  */
+	  if ((c8 < 0x80 || c8 > 0xBF) ||
+               (cu1 == 0xE0 && c8 < 0xA0) ||
+               (cu1 == 0xED && c8 > 0x9F) ||
+               (cu1 == 0xF0 && c8 < 0x90) ||
+               (cu1 == 0xF4 && c8 > 0x8F))
+	    {
+	      /* An invalid second code unit.  */
+	      __set_errno (EILSEQ);
+	      return -1;
+	    }
+	  if (cu1 >= 0xE0)
+	    {
+	      /* A three or four code unit sequence.  */
+	      ps->__value.__wchb[1] = c8;
+	      ++ps->__value.__wchb[3];
+	      return 0;
+	    }
+	  wc = ((cu1 & 0x1F) << 6) +
+	       (c8 & 0x3F);
+	}
+      else
+	{
+	  char8_t cu2 = ps->__value.__wchb[1];
+	  /* A three or four byte code unit sequence.  */
+	  if (c8 < 0x80 || c8 > 0xBF)
+	    {
+	      /* An invalid third or fourth code unit.  */
+	      __set_errno (EILSEQ);
+	      return -1;
+	    }
+	  if (ps->__value.__wchb[3] == 2 && cu1 >= 0xF0)
+	    {
+	      /* A four code unit sequence.  */
+	      ps->__value.__wchb[2] = c8;
+	      ++ps->__value.__wchb[3];
+	      return 0;
+	    }
+	  if (cu1 < 0xF0)
+	    {
+	      wc = ((cu1 & 0x0F) << 12) +
+		   ((cu2 & 0x3F) << 6) +
+		   (c8 & 0x3F);
+	    }
+	  else
+	    {
+	      char8_t cu3 = ps->__value.__wchb[2];
+	      wc = ((cu1 & 0x07) << 18) +
+		   ((cu2 & 0x3F) << 12) +
+		   ((cu3 & 0x3F) << 6) +
+		   (c8 & 0x3F);
+	    }
+	}
+      ps->__count &= 0x7fffffff;
+      ps->__value.__wch = 0;
+    }
+
+  return wcrtomb (s, wc, ps);
+}
diff --git a/wcsmbs/mbrtoc8.c b/wcsmbs/mbrtoc8.c
new file mode 100644
index 0000000000..8ca457088d
--- /dev/null
+++ b/wcsmbs/mbrtoc8.c
@@ -0,0 +1,126 @@
+/* Multibyte to UTF-8 conversion.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <assert.h>
+#include <dlfcn.h>
+#include <errno.h>
+#include <gconv.h>
+#include <uchar.h>
+#include <wcsmbsload.h>
+
+#include <sysdep.h>
+
+#ifndef EILSEQ
+# define EILSEQ EINVAL
+#endif
+
+
+/* This is the private state used if PS is NULL.  */
+static mbstate_t state;
+
+size_t
+mbrtoc8 (char8_t *pc8, const char *s, size_t n, mbstate_t *ps)
+{
+  /* This implementation depends on the converter invoked by mbrtowc() not
+     needing to retain state in either the top most bit of ps->__count or
+     in ps->__value between invocations.  This implementation uses the
+     top most bit of ps->__count to indicate that trailing code units are
+     yet to be written and uses ps->__value to store those code units.  */
+
+  if (ps == NULL)
+    ps = &state;
+
+  /* If state indicates that trailing code units are yet to be written, write
+     those first regardless of whether 's' is a null pointer.  */
+  if (ps->__count & 0x80000000)
+    {
+      /* ps->__value.__wchb[3] stores the index of the next code unit to
+         write.  Code units are stored in reverse order.  */
+      size_t i = ps->__value.__wchb[3];
+      if (pc8 != NULL)
+	{
+	  *pc8 = ps->__value.__wchb[i];
+	}
+      if (i == 0)
+	{
+	  ps->__count &= 0x7fffffff;
+	  ps->__value.__wch = 0;
+	}
+      else
+	--ps->__value.__wchb[3];
+      return -3;
+    }
+
+  if (s == NULL)
+    {
+      /* if 's' is a null pointer, behave as if a null pointer was passed for
+         'pc8', an empty string was passed for 's', and 1 passed for 'n'.  */
+      pc8 = NULL;
+      s = "";
+      n = 1;
+    }
+
+  wchar_t wc;
+  size_t result;
+
+  result = mbrtowc(&wc, s, n, ps);
+  if (result <= n)
+    {
+      if (wc <= 0x7F)
+	{
+	  if (pc8 != NULL)
+	    *pc8 = wc;
+	}
+      else if (wc <= 0x7FF)
+	{
+	  if (pc8 != NULL)
+	    *pc8 = 0xC0 + ((wc >> 6) & 0x1F);
+	  ps->__value.__wchb[0] = 0x80 + (wc & 0x3F);
+	  ps->__value.__wchb[3] = 0;
+	  ps->__count |= 0x80000000;
+	}
+      else if (wc <= 0xFFFF)
+	{
+	  if (pc8 != NULL)
+	    *pc8 = 0xE0 + ((wc >> 12) & 0x0F);
+	  ps->__value.__wchb[1] = 0x80 + ((wc >> 6) & 0x3F);
+	  ps->__value.__wchb[0] = 0x80 + (wc & 0x3F);
+	  ps->__value.__wchb[3] = 1;
+	  ps->__count |= 0x80000000;
+	}
+      else if (wc <= 0x10FFFF)
+	{
+	  if (pc8 != NULL)
+	    *pc8 = 0xF0 + ((wc >> 18) & 0x07);
+	  ps->__value.__wchb[2] = 0x80 + ((wc >> 12) & 0x3F);
+	  ps->__value.__wchb[1] = 0x80 + ((wc >> 6) & 0x3F);
+	  ps->__value.__wchb[0] = 0x80 + (wc & 0x3F);
+	  ps->__value.__wchb[3] = 2;
+	  ps->__count |= 0x80000000;
+	}
+    }
+  if (result == 0 && wc != 0)
+    {
+      /* mbrtowc() never returns -3.  When a MB sequence converts to multiple
+         WCs, no input is consumed when writing the subsequent WCs resulting
+         in a result of 0 even if a null character wasn't written.  */
+      result = -3;
+    }
+
+  return result;
+}
diff --git a/wcsmbs/uchar.h b/wcsmbs/uchar.h
index 6020f66cf6..2f15c6f74d 100644
--- a/wcsmbs/uchar.h
+++ b/wcsmbs/uchar.h
@@ -31,6 +31,13 @@
 #include <bits/types.h>
 #include <bits/types/mbstate_t.h>
 
+/* Declare the char8_t typedef in _GNU_SOURCE mode, but only if the C++
+   __cpp_char8_t feature test macro is not defined.  */
+#if defined __USE_GNU && !defined __cpp_char8_t
+/* Define the 8-bit character type.  */
+typedef unsigned char char8_t;
+#endif
+
 #ifndef __USE_ISOCXX11
 /* Define the 16-bit and 32-bit character types.  */
 typedef __uint_least16_t char16_t;
@@ -40,6 +47,20 @@ typedef __uint_least32_t char32_t;
 
 __BEGIN_DECLS
 
+/* Declare mbrtoc8() and c8rtomb() in _GNU_SOURCE mode or if the C++
+   __cpp_char8_t feature test macro is defined.  */
+#if defined __USE_GNU || defined __cpp_char8_t
+/* Write char8_t representation of multibyte character pointed
+   to by S to PC8.  */
+extern size_t mbrtoc8  (char8_t *__restrict __pc8,
+			const char *__restrict __s, size_t __n,
+			mbstate_t *__restrict __p) __THROW;
+
+/* Write multibyte representation of char8_t C8 to S.  */
+extern size_t c8rtomb  (char *__restrict __s, char8_t __c8,
+			mbstate_t *__restrict __ps) __THROW;
+#endif
+
 /* Write char16_t representation of multibyte character pointed
    to by S to PC16.  */
 extern size_t mbrtoc16 (char16_t *__restrict __pc16,

             reply	other threads:[~2022-01-08  0:39 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-08  0:39 Tom Honermann [this message]
2022-01-11  0:53 ` Joseph Myers
2022-01-11 19:23   ` Tom Honermann
2022-01-20 23:17     ` Tom Honermann
2022-01-21 20:01       ` Adhemerval Zanella
2022-01-22 12:24         ` Tom Honermann
2022-02-16 18:29           ` Joseph Myers
2022-02-16 19:14             ` tom
  -- strict thread matches above, loose matches on Subject: below --
2022-02-27 16:53 Tom Honermann
2022-02-28 23:01 ` Joseph Myers
2022-03-01  3:40   ` Tom Honermann
2022-05-17 15:57     ` Carlos O'Donell
2022-05-17 18:05     ` Adhemerval Zanella
2022-05-17 18:12       ` Joseph Myers
2022-05-17 18:17         ` Adhemerval Zanella
2022-05-17 21:33           ` Florian Weimer
2022-05-18 15:32             ` Tom Honermann
2022-05-18 16:17               ` Adhemerval Zanella
2022-05-18 17:26                 ` Tom Honermann
2022-05-18 17:39                   ` Adhemerval Zanella
2022-05-18 17:40                 ` Florian Weimer
2022-05-18 17:57                   ` Adhemerval Zanella
2021-06-07  2:08 Tom Honermann
2021-06-07 18:53 ` Joseph Myers
2021-06-11 11:25   ` Tom Honermann
2021-06-11 16:28     ` Joseph Myers
2021-06-13 15:35       ` Tom Honermann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a7b6eeac-4e0e-e189-8420-c9acd9ccbbe3@honermann.net \
    --to=tom@honermann.net \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).