public inbox for libc-alpha@sourceware.org
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: Tom Honermann <tom@honermann.net>
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH v4 3/3] stdlib: Tests for mbrtoc8(), c8rtomb(), and the char8_t typedef.
Date: Mon, 4 Jul 2022 15:58:06 -0300
Message-ID: <5BC53FAF-5520-4D24-9C6B-F7F7767E4EA6@linaro.org>
In-Reply-To: <20220630125215.6052-4-tom@honermann.net>



> On 30 Jun 2022, at 09:52, Tom Honermann via Libc-alpha <libc-alpha@sourceware.org> wrote:
> 
> This change adds tests for the mbrtoc8 and c8rtomb functions adopted for
> C++20 via WG21 P0482R6 and for C2X via WG14 N2653, and for the char8_t
> typedef adopted for C2X from WG14 N2653.
> 
> The tests for mbrtoc8 and c8rtomb specifically exercise conversion to
> and from Big5-HKSCS because of special cases that arise with that encoding.
> Big5-HKSCS defines some double byte sequences that convert to more than
> one Unicode code point.  In order to test this, the locale dependencies
> for running tests under wcsmbs is expanded to include zh_HK.BIG5-HKSCS.

LGTM, with some minor style issues below.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

> ---
> wcsmbs/Makefile       |   3 +-
> wcsmbs/test-c8rtomb.c | 613 ++++++++++++++++++++++++++++++++++++++++++
> wcsmbs/test-mbrtoc8.c | 539 +++++++++++++++++++++++++++++++++++++
> 3 files changed, 1154 insertions(+), 1 deletion(-)
> create mode 100644 wcsmbs/test-c8rtomb.c
> create mode 100644 wcsmbs/test-mbrtoc8.c
> 
> diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
> index bda281ad70..e6b9e8743a 100644
> --- a/wcsmbs/Makefile
> +++ b/wcsmbs/Makefile
> @@ -52,6 +52,7 @@ tests := tst-wcstof wcsmbs-tst1 tst-wcsnlen tst-btowc tst-mbrtowc \
> 	 tst-c16c32-1 wcsatcliff tst-wcstol-locale tst-wcstod-nan-locale \
> 	 tst-wcstod-round test-char-types tst-fgetwc-after-eof \
> 	 tst-wcstod-nan-sign tst-c16-surrogate tst-c32-state \
> +	 test-mbrtoc8 test-c8rtomb \
> 	 $(addprefix test-,$(strop-tests)) tst-mbstowcs \
> 	 tst-wprintf-binary
> 
> @@ -59,7 +60,7 @@ include ../Rules
> 
> ifeq ($(run-built-tests),yes)
> LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 hr_HR.ISO-8859-2 \
> -	   ja_JP.EUC-JP zh_TW.EUC-TW tr_TR.UTF-8 tr_TR.ISO-8859-9
> +	   ja_JP.EUC-JP zh_TW.EUC-TW tr_TR.UTF-8 tr_TR.ISO-8859-9 zh_HK.BIG5-HKSCS
> include ../gen-locales.mk
> 
> $(objpfx)tst-btowc.out: $(gen-locales)
> diff --git a/wcsmbs/test-c8rtomb.c b/wcsmbs/test-c8rtomb.c
> new file mode 100644
> index 0000000000..6d72189e86
> --- /dev/null
> +++ b/wcsmbs/test-c8rtomb.c
> @@ -0,0 +1,613 @@
> +/* Test c8rtomb.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <errno.h>
> +#include <limits.h>
> +#include <locale.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <uchar.h>
> +#include <wchar.h>
> +#include <support/check.h>
> +#include <support/support.h>
> +
> +static int
> +test_truncated_code_unit_sequence (void)
> +{
> +  /* Missing trailing code unit for a two code byte unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xC2";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Missing first trailing code unit for a three byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xE0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Missing second trailing code unit for a three byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xE0\xA0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Missing first trailing code unit for a four byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Missing second trailing code unit for a four byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0\x90";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Missing third trailing code unit for a four byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0\x90\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t)  0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_invalid_trailing_code_unit_sequence (void)
> +{
> +  /* Invalid trailing code unit for a two code byte unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xC2\xC0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Invalid first trailing code unit for a three byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xE0\xC0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Invalid second trailing code unit for a three byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xE0\xA0\xC0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Invalid first trailing code unit for a four byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0\xC0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Invalid second trailing code unit for a four byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0\x90\xC0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Invalid third trailing code unit for a four byte code unit sequence.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0\x90\x80\xC0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t)  0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_lone_trailing_code_units (void)
> +{
> +  /* Lone trailing code unit.  */
> +  const char8_t *u8s = (const char8_t*) u8"\x80";
> +  char buf[MB_LEN_MAX] = { 0 };
> +  mbstate_t s = { 0 };
> +
> +  errno = 0;
> +  TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1);
> +  TEST_COMPARE (errno, EILSEQ);
> +
> +  return 0;
> +}
> +
> +static int
> +test_overlong_encoding (void)
> +{
> +  /* Two byte overlong encoding.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xC0\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Two byte overlong encoding.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xC1\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Three byte overlong encoding.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xE0\x9F\xBF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Four byte overlong encoding.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0\x8F\xBF\xBF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_surrogate_range (void)
> +{
> +  /* Would encode U+D800.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xED\xA0\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Would encode U+DFFF.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xED\xBF\xBF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_out_of_range_encoding (void)
> +{
> +  /* Would encode U+00110000.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF4\x90\x80\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Would encode U+00140000.  */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF5\x90\x80\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_null_output_buffer (void)
> +{
> +  /* Null character with an initial state.  */
> +  {
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (NULL, u8"X"[0], &s), (size_t) 1);
> +    /* Assert the state is now an initial state.  */
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Null buffer with a state corresponding to an incompletely read code
> +     unit sequence.  In this case, an error occurs since insufficient
> +     information is available to complete the already started code unit
> +     sequence and return to the initial state.  */
> +  {
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8"\xC2"[0], &s), (size_t)  0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (NULL, u8"\x80"[0], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_utf8 (void)
> +{
> +  xsetlocale (LC_ALL, "de_DE.UTF-8");
> +
> +  /* Null character.  */
> +  {
> +    /* U+0000 => 0x00 */
> +    const char8_t *u8s = (const char8_t*) u8"\x00";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 1);
> +    TEST_COMPARE (buf[0], (char) 0x00);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First non-null character in the code point range that maps to a single
> +     code unit.  */
> +  {
> +    /* U+0001 => 0x01 */
> +    const char8_t *u8s = (const char8_t*) u8"\x01";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 1);
> +    TEST_COMPARE (buf[0], (char) 0x01);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to a single code unit.  */
> +  {
> +    /* U+007F => 0x7F */
> +    const char8_t *u8s = (const char8_t*) u8"\x7F";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 1);
> +    TEST_COMPARE (buf[0], (char) 0x7F);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to two code units.  */
> +  {
> +    /* U+0080 => 0xC2 0x80 */
> +    const char8_t *u8s = (const char8_t*) u8"\xC2\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 2);
> +    TEST_COMPARE (buf[0], (char) 0xC2);
> +    TEST_COMPARE (buf[1], (char) 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to two code units.  */
> +  {
> +    /* U+07FF => 0xDF 0xBF */
> +    const char8_t *u8s = (const char8_t*) u8"\u07FF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 2);
> +    TEST_COMPARE (buf[0], (char) 0xDF);
> +    TEST_COMPARE (buf[1], (char) 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to three code units.  */
> +  {
> +    /* U+0800 => 0xE0 0xA0 0x80 */
> +    const char8_t *u8s = (const char8_t*) u8"\u0800";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
> +    TEST_COMPARE (buf[0], (char) 0xE0);
> +    TEST_COMPARE (buf[1], (char) 0xA0);
> +    TEST_COMPARE (buf[2], (char) 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to three code units
> +     before the surrogate code point range.  */
> +  {
> +    /* U+D7FF => 0xED 0x9F 0xBF */
> +    const char8_t *u8s = (const char8_t*) u8"\uD7FF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
> +    TEST_COMPARE (buf[0], (char) 0xED);
> +    TEST_COMPARE (buf[1], (char) 0x9F);
> +    TEST_COMPARE (buf[2], (char) 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to three code units
> +     after the surrogate code point range.  */
> +  {
> +    /* U+E000 => 0xEE 0x80 0x80 */
> +    const char8_t *u8s = (const char8_t*) u8"\uE000";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
> +    TEST_COMPARE (buf[0], (char) 0xEE);
> +    TEST_COMPARE (buf[1], (char) 0x80);
> +    TEST_COMPARE (buf[2], (char) 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Not a BOM.  */
> +  {
> +    /* U+FEFF => 0xEF 0xBB 0xBF */
> +    const char8_t *u8s = (const char8_t*) u8"\uFEFF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
> +    TEST_COMPARE (buf[0], (char) 0xEF);
> +    TEST_COMPARE (buf[1], (char) 0xBB);
> +    TEST_COMPARE (buf[2], (char) 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Replacement character.  */
> +  {
> +    /* U+FFFD => 0xEF 0xBF 0xBD */
> +    const char8_t *u8s = (const char8_t*) u8"\uFFFD";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
> +    TEST_COMPARE (buf[0], (char) 0xEF);
> +    TEST_COMPARE (buf[1], (char) 0xBF);
> +    TEST_COMPARE (buf[2], (char) 0xBD);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to three code units.  */
> +  {
> +    /* U+FFFF => 0xEF 0xBF 0xBF */
> +    const char8_t *u8s = (const char8_t*) u8"\uFFFF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
> +    TEST_COMPARE (buf[0], (char) 0xEF);
> +    TEST_COMPARE (buf[1], (char) 0xBF);
> +    TEST_COMPARE (buf[2], (char) 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to four code units.  */
> +  {
> +    /* U+10000 => 0xF0 0x90 0x80 0x80 */
> +    const char8_t *u8s = (const char8_t*) u8"\U00010000";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 4);
> +    TEST_COMPARE (buf[0], (char) 0xF0);
> +    TEST_COMPARE (buf[1], (char) 0x90);
> +    TEST_COMPARE (buf[2], (char) 0x80);
> +    TEST_COMPARE (buf[3], (char) 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to four code units.  */
> +  {
> +    /* U+10FFFF => 0xF4 0x8F 0xBF 0xBF */
> +    const char8_t *u8s = (const char8_t*) u8"\U0010FFFF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 4);
> +    TEST_COMPARE (buf[0], (char) 0xF4);
> +    TEST_COMPARE (buf[1], (char) 0x8F);
> +    TEST_COMPARE (buf[2], (char) 0xBF);
> +    TEST_COMPARE (buf[3], (char) 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_big5_hkscs (void)
> +{
> +  xsetlocale (LC_ALL, "zh_HK.BIG5-HKSCS");
> +
> +  /* A pair of two byte UTF-8 code unit sequences that map a Unicode code
> +     point and combining character to a single double byte character.  */
> +  {
> +    /* U+00CA U+0304 => 0x88 0x62 */
> +    const char8_t *u8s = (const char8_t*) u8"\u00CA\u0304";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 2);
> +    TEST_COMPARE (buf[0], (char) 0x88);
> +    TEST_COMPARE (buf[1], (char) 0x62);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Another pair of two byte UTF-8 code unit sequences that map a Unicode code
> +     point and combining character to a single double byte character.  */
> +  {
> +    /* U+00EA U+030C => 0x88 0xA5 */
> +    const char8_t *u8s = (const char8_t*) u8"\u00EA\u030C";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 2);
> +    TEST_COMPARE (buf[0], (char) 0x88);
> +    TEST_COMPARE (buf[1], (char) 0xA5);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +do_test (void)
> +{
> +  test_truncated_code_unit_sequence ();
> +  test_invalid_trailing_code_unit_sequence ();
> +  test_lone_trailing_code_units ();
> +  test_overlong_encoding ();
> +  test_surrogate_range ();
> +  test_out_of_range_encoding ();
> +  test_null_output_buffer ();
> +  test_utf8 ();
> +  test_big5_hkscs ();
> +  return 0;
> +}
> +
> +#include <support/test-driver.c>

Ok.
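
For anyone reading along: the error cases above (overlong, surrogate, and
out-of-range encodings) fail on the lead or on the second code unit because
of the well-formed UTF-8 byte ranges in the Unicode Standard.  Below is a
minimal sketch of that rule.  It assumes a hypothetical helper,
utf8_prefix_status, which is not part of glibc or of this patch and says
nothing about how c8rtomb is actually implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only; not glibc code.  */
enum utf8_status
{
  UTF8_INVALID = -1,    /* The prefix can never become well-formed.  */
  UTF8_INCOMPLETE = 0,  /* Well-formed so far, more code units needed.  */
  UTF8_COMPLETE = 1     /* Exactly one well-formed sequence.  */
};

/* Classify the first N code units at U against the well-formed UTF-8
   byte ranges.  */
int
utf8_prefix_status (const unsigned char *u, size_t n)
{
  assert (u != NULL || n == 0);
  if (n == 0)
    return UTF8_INCOMPLETE;

  unsigned char lead = u[0];
  size_t len;
  /* Allowed range for the second code unit; narrowed for some leads.  */
  unsigned char lo = 0x80, hi = 0xBF;

  if (lead <= 0x7F)
    len = 1;
  else if (lead >= 0xC2 && lead <= 0xDF)
    len = 2;
  else if (lead >= 0xE0 && lead <= 0xEF)
    {
      len = 3;
      if (lead == 0xE0)
        lo = 0xA0;              /* Rejects three byte overlong forms.  */
      else if (lead == 0xED)
        hi = 0x9F;              /* Rejects U+D800..U+DFFF.  */
    }
  else if (lead >= 0xF0 && lead <= 0xF4)
    {
      len = 4;
      if (lead == 0xF0)
        lo = 0x90;              /* Rejects four byte overlong forms.  */
      else if (lead == 0xF4)
        hi = 0x8F;              /* Rejects values above U+10FFFF.  */
    }
  else
    return UTF8_INVALID;        /* 0x80..0xC1 and 0xF5..0xFF.  */

  if (n > len)
    return UTF8_INVALID;

  for (size_t i = 1; i < n; i++)
    {
      unsigned char min = i == 1 ? lo : 0x80;
      unsigned char max = i == 1 ? hi : 0xBF;
      if (u[i] < min || u[i] > max)
        return UTF8_INVALID;
    }

  return n == len ? UTF8_COMPLETE : UTF8_INCOMPLETE;
}
```

With this helper, the prefix 0xE0 0x9F is already invalid after two code
units, while 0xC0 and 0xF5 are invalid as lead units on their own.  That is
the same position at which the tests above expect (size_t) -1 with EILSEQ.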

> diff --git a/wcsmbs/test-mbrtoc8.c b/wcsmbs/test-mbrtoc8.c
> new file mode 100644
> index 0000000000..b282fa6dba
> --- /dev/null
> +++ b/wcsmbs/test-mbrtoc8.c
> @@ -0,0 +1,539 @@
> +/* Test mbrtoc8.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <https://www.gnu.org/licenses/>.  */
> +
> +#include <locale.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <uchar.h>
> +#include <wchar.h>
> +#include <support/check.h>
> +#include <support/support.h>
> +
> +static int
> +test_utf8 (void)
> +{
> +  xsetlocale (LC_ALL, "de_DE.UTF-8");
> +
> +  /* No inputs.  */
> +  {
> +    const char *mbs = "";
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 0, &s), (size_t) -2); /* no input */
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Null character.  */
> +  {
> +    const char *mbs = "\x00"; /* 0x00 => U+0000 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 0);

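Style: put a space before the parenthesis and around the operator, i.e.
strlen (mbs) + 1.  The same applies to the other strlen(mbs)+1 calls in
this file.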

> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0x00);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First non-null character in the code point range that maps to a single
> +     code unit.  */
> +  {
> +    const char *mbs = "\x01"; /* 0x01 => U+0001 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0x01);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to a single code unit.  */
> +  {
> +    const char *mbs = "\x7F"; /* 0x7F => U+007F */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0x7F);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to two code units.  */
> +  {
> +    const char *mbs = "\xC2\x80"; /* 0xC2 0x80 => U+0080 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2);
> +    mbs += 2;
> +    TEST_COMPARE (buf[0], 0xC2);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\xC2\x80"; /* 0xC2 0x80 => U+0080 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xC2);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to two code units.  */
> +  {
> +    const char *mbs = "\xDF\xBF"; /* 0xDF 0xBF => U+07FF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2);
> +    mbs += 2;
> +    TEST_COMPARE (buf[0], 0xDF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\xDF\xBF"; /* 0xDF 0xBF => U+07FF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xDF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to three code units.  */
> +  {
> +    const char *mbs = u8"\xE0\xA0\x80"; /* 0xE0 0xA0 0x80 => U+0800 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
> +    mbs += 3;
> +    TEST_COMPARE (buf[0], 0xE0);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xA0);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = u8"\xE0\xA0\x80"; /* 0xE0 0xA0 0x80 => U+0800 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xE0);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xA0);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to three code units
> +     before the surrogate code point range.  */
> +  {
> +    const char *mbs = "\xED\x9F\xBF"; /* 0xED 0x9F 0xBF => U+D7FF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
> +    mbs += 3;
> +    TEST_COMPARE (buf[0], 0xED);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x9F);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\xED\x9F\xBF"; /* 0xED 0x9F 0xBF => U+D7FF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xED);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x9F);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to three code units
> +     after the surrogate code point range.  */
> +  {
> +    const char *mbs = "\xEE\x80\x80"; /* 0xEE 0x80 0x80 => U+E000 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
> +    mbs += 3;
> +    TEST_COMPARE (buf[0], 0xEE);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\xEE\x80\x80"; /* 0xEE 0x80 0x80 => U+E000 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xEE);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Not a BOM.  */
> +  {
> +    const char *mbs = "\xEF\xBB\xBF"; /* 0xEF 0xBB 0xBF => U+FEFF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
> +    mbs += 3;
> +    TEST_COMPARE (buf[0], 0xEF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBB);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\xEF\xBB\xBF"; /* 0xEF 0xBB 0xBF => U+FEFF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xEF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBB);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Replacement character.  */
> +  {
> +    const char *mbs = "\xEF\xBF\xBD"; /* 0xEF 0xBF 0xBD => U+FFFD */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
> +    mbs += 3;
> +    TEST_COMPARE (buf[0], 0xEF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBD);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\xEF\xBF\xBD"; /* 0xEF 0xBF 0xBD => U+FFFD */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xEF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBD);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to three code units.  */
> +  {
> +    const char *mbs = "\xEF\xBF\xBF"; /* 0xEF 0xBF 0xBF => U+FFFF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
> +    mbs += 3;
> +    TEST_COMPARE (buf[0], 0xEF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\xEF\xBF\xBF"; /* 0xEF 0xBF 0xBF => U+FFFF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xEF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to four code units.  */
> +  {
> +    const char *mbs = "\xF0\x90\x80\x80"; /* 0xF0 0x90 0x80 0x80 => U+10000 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 4);
> +    mbs += 4;
> +    TEST_COMPARE (buf[0], 0xF0);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x90);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\xF0\x90\x80\x80"; /* 0xF0 0x90 0x80 0x80 => U+10000 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xF0);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x90);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to four code units.  */
> +  {
> +    const char *mbs = "\xF4\x8F\xBF\xBF"; /* 0xF4 0x8F 0xBF 0xBF => U+10FFFF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 4);
> +    mbs += 4;
> +    TEST_COMPARE (buf[0], 0xF4);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x8F);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\xF4\x8F\xBF\xBF"; /* 0xF4 0x8F 0xBF 0xBF => U+10FFFF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xF4);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x8F);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_big5_hkscs (void)
> +{
> +  xsetlocale (LC_ALL, "zh_HK.BIG5-HKSCS");
> +
> +  /* A double byte character that maps to a pair of two byte UTF-8 code unit
> +     sequences.  */
> +  {
> +    const char *mbs = "\x88\x62"; /* 0x88 0x62 => U+00CA U+0304 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2);
> +    mbs += 2;
> +    TEST_COMPARE (buf[0], 0xC3);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x8A);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xCC);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x84);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\x88\x62"; /* 0x88 0x62 => U+00CA U+0304 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xC3);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x8A);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xCC);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x84);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Another double byte character that maps to a pair of two byte UTF-8 code
> +     unit sequences.  */
> +  {
> +    const char *mbs = "\x88\xA5"; /* 0x88 0xA5 => U+00EA U+030C */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2);
> +    mbs += 2;
> +    TEST_COMPARE (buf[0], 0xC3);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xAA);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xCC);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x8C);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time.  */
> +  {
> +    const char *mbs = "\x88\xA5"; /* 0x88 0xA5 => U+00EA U+030C */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xC3);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xAA);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xCC);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x8C);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +do_test (void)
> +{
> +  test_utf8 ();
> +  test_big5_hkscs ();
> +  return 0;
> +}
> +
> +#include <support/test-driver.c>
> -- 
> 2.32.0
> 

