Subject: Re: [PATCH v4 3/3] stdlib: Tests for mbrtoc8(), c8rtomb(), and the char8_t typedef.
From: Adhemerval Zanella
In-Reply-To: <20220630125215.6052-4-tom@honermann.net>
Date: Mon, 4 Jul 2022 15:58:06 -0300
Cc: libc-alpha@sourceware.org
Message-Id: <5BC53FAF-5520-4D24-9C6B-F7F7767E4EA6@linaro.org>
References: <20220630125215.6052-1-tom@honermann.net> <20220630125215.6052-4-tom@honermann.net>
To: Tom Honermann

> On 30 Jun 2022, at 09:52, Tom Honermann via Libc-alpha wrote:
>
> This change adds tests for the mbrtoc8 and c8rtomb functions adopted for
> C++20 via WG21 P0482R6 and for C2X via WG14 N2653, and for the char8_t
> typedef adopted for C2X from WG14 N2653.
>
> The tests for mbrtoc8 and c8rtomb specifically exercise conversion to
> and from Big5-HKSCS because of special cases that arise with that encoding.
> Big5-HKSCS defines some double byte sequences that convert to more than
> one Unicode code point. In order to test this, the locale dependencies
> for running tests under wcsmbs is expanded to include zh_HK.BIG5-HKSCS.

LGTM, with some minor style issues below.

Reviewed-by: Adhemerval Zanella

> ---
> wcsmbs/Makefile       |   3 +-
> wcsmbs/test-c8rtomb.c | 613 ++++++++++++++++++++++++++++++++++++++++++
> wcsmbs/test-mbrtoc8.c | 539 +++++++++++++++++++++++++++++++++++++
> 3 files changed, 1154 insertions(+), 1 deletion(-)
> create mode 100644 wcsmbs/test-c8rtomb.c
> create mode 100644 wcsmbs/test-mbrtoc8.c
>
> diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
> index bda281ad70..e6b9e8743a 100644
> --- a/wcsmbs/Makefile
> +++ b/wcsmbs/Makefile
> @@ -52,6 +52,7 @@ tests := tst-wcstof wcsmbs-tst1 tst-wcsnlen tst-btowc tst-mbrtowc \
>   tst-c16c32-1 wcsatcliff tst-wcstol-locale tst-wcstod-nan-locale \
>   tst-wcstod-round test-char-types tst-fgetwc-after-eof \
>   tst-wcstod-nan-sign tst-c16-surrogate tst-c32-state \
> +  test-mbrtoc8 test-c8rtomb \
>   $(addprefix test-,$(strop-tests)) tst-mbstowcs \
>   tst-wprintf-binary
>
> @@ -59,7 +60,7 @@ include ../Rules
>
>  ifeq ($(run-built-tests),yes)
>  LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 hr_HR.ISO-8859-2 \
> -  ja_JP.EUC-JP zh_TW.EUC-TW tr_TR.UTF-8 tr_TR.ISO-8859-9
> +  ja_JP.EUC-JP zh_TW.EUC-TW tr_TR.UTF-8 tr_TR.ISO-8859-9 zh_HK.BIG5-HKSCS
>  include ../gen-locales.mk
>
>  $(objpfx)tst-btowc.out: $(gen-locales)
> diff --git a/wcsmbs/test-c8rtomb.c b/wcsmbs/test-c8rtomb.c
> new file mode 100644
> index 0000000000..6d72189e86
> --- /dev/null
> +++ b/wcsmbs/test-c8rtomb.c
> @@ -0,0 +1,613 @@
> +/* Test c8rtomb.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   . */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +static int
> +test_truncated_code_unit_sequence (void)
> +{
> +  /* Missing trailing code unit for a two code byte unit sequence. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xC2";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Missing first trailing code unit for a three byte code unit sequence. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xE0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Missing second trailing code unit for a three byte code unit sequence. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xE0\xA0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Missing first trailing code unit for a four byte code unit sequence. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Missing second trailing code unit for a four byte code unit sequence. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0\x90";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Missing third trailing code unit for a four byte code unit sequence. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0\x90\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_invalid_trailing_code_unit_sequence (void)
> +{
> +  /* Invalid trailing code unit for a two code byte unit sequence. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xC2\xC0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Invalid first trailing code unit for a three byte code unit sequence. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xE0\xC0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Invalid second trailing code unit for a three byte code unit sequence. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xE0\xA0\xC0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Invalid first trailing code unit for a four byte code unit sequence. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0\xC0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Invalid second trailing code unit for a four byte code unit sequence. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0\x90\xC0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Invalid third trailing code unit for a four byte code unit sequence. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0\x90\x80\xC0";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_lone_trailing_code_units (void)
> +{
> +  /* Lone trailing code unit. */
> +  const char8_t *u8s = (const char8_t*) u8"\x80";
> +  char buf[MB_LEN_MAX] = { 0 };
> +  mbstate_t s = { 0 };
> +
> +  errno = 0;
> +  TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1);
> +  TEST_COMPARE (errno, EILSEQ);
> +
> +  return 0;
> +}
> +
> +static int
> +test_overlong_encoding (void)
> +{
> +  /* Two byte overlong encoding. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xC0\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Two byte overlong encoding. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xC1\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Three byte overlong encoding. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xE0\x9F\xBF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Four byte overlong encoding. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF0\x8F\xBF\xBF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_surrogate_range (void)
> +{
> +  /* Would encode U+D800. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xED\xA0\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Would encode U+DFFF. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xED\xBF\xBF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_out_of_range_encoding (void)
> +{
> +  /* Would encode U+00110000. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF4\x90\x80\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  /* Would encode U+00140000. */
> +  {
> +    const char8_t *u8s = (const char8_t*) u8"\xF5\x90\x80\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_null_output_buffer (void)
> +{
> +  /* Null character with an initial state. */
> +  {
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (NULL, u8"X"[0], &s), (size_t) 1);
> +    /* Assert the state is now an initial state. */
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Null buffer with a state corresponding to an incompletely read code
> +     unit sequence. In this case, an error occurs since insufficient
> +     information is available to complete the already started code unit
> +     sequence and return to the initial state. */
> +  {
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8"\xC2"[0], &s), (size_t) 0);
> +    errno = 0;
> +    TEST_COMPARE (c8rtomb (NULL, u8"\x80"[0], &s), (size_t) -1);
> +    TEST_COMPARE (errno, EILSEQ);
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_utf8 (void)
> +{
> +  xsetlocale (LC_ALL, "de_DE.UTF-8");
> +
> +  /* Null character. */
> +  {
> +    /* U+0000 => 0x00 */
> +    const char8_t *u8s = (const char8_t*) u8"\x00";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 1);
> +    TEST_COMPARE (buf[0], (char) 0x00);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First non-null character in the code point range that maps to a single
> +     code unit. */
> +  {
> +    /* U+0001 => 0x01 */
> +    const char8_t *u8s = (const char8_t*) u8"\x01";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 1);
> +    TEST_COMPARE (buf[0], (char) 0x01);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to a single code unit. */
> +  {
> +    /* U+007F => 0x7F */
> +    const char8_t *u8s = (const char8_t*) u8"\x7F";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 1);
> +    TEST_COMPARE (buf[0], (char) 0x7F);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to two code units. */
> +  {
> +    /* U+0080 => 0xC2 0x80 */
> +    const char8_t *u8s = (const char8_t*) u8"\xC2\x80";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 2);
> +    TEST_COMPARE (buf[0], (char) 0xC2);
> +    TEST_COMPARE (buf[1], (char) 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to two code units. */
> +  {
> +    /* U+07FF => 0xDF 0xBF */
> +    const char8_t *u8s = (const char8_t*) u8"\u07FF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 2);
> +    TEST_COMPARE (buf[0], (char) 0xDF);
> +    TEST_COMPARE (buf[1], (char) 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to three code units. */
> +  {
> +    /* U+0800 => 0xE0 0xA0 0x80 */
> +    const char8_t *u8s = (const char8_t*) u8"\u0800";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
> +    TEST_COMPARE (buf[0], (char) 0xE0);
> +    TEST_COMPARE (buf[1], (char) 0xA0);
> +    TEST_COMPARE (buf[2], (char) 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to three code units
> +     before the surrogate code point range. */
> +  {
> +    /* U+D7FF => 0xED 0x9F 0xBF */
> +    const char8_t *u8s = (const char8_t*) u8"\uD7FF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
> +    TEST_COMPARE (buf[0], (char) 0xED);
> +    TEST_COMPARE (buf[1], (char) 0x9F);
> +    TEST_COMPARE (buf[2], (char) 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to three code units
> +     after the surrogate code point range. */
> +  {
> +    /* U+E000 => 0xEE 0x80 0x80 */
> +    const char8_t *u8s = (const char8_t*) u8"\uE000";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
> +    TEST_COMPARE (buf[0], (char) 0xEE);
> +    TEST_COMPARE (buf[1], (char) 0x80);
> +    TEST_COMPARE (buf[2], (char) 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Not a BOM. */
> +  {
> +    /* U+FEFF => 0xEF 0xBB 0xBF */
> +    const char8_t *u8s = (const char8_t*) u8"\uFEFF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
> +    TEST_COMPARE (buf[0], (char) 0xEF);
> +    TEST_COMPARE (buf[1], (char) 0xBB);
> +    TEST_COMPARE (buf[2], (char) 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Replacement character. */
> +  {
> +    /* U+FFFD => 0xEF 0xBF 0xBD */
> +    const char8_t *u8s = (const char8_t*) u8"\uFFFD";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
> +    TEST_COMPARE (buf[0], (char) 0xEF);
> +    TEST_COMPARE (buf[1], (char) 0xBF);
> +    TEST_COMPARE (buf[2], (char) 0xBD);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to three code units. */
> +  {
> +    /* U+FFFF => 0xEF 0xBF 0xBF */
> +    const char8_t *u8s = (const char8_t*) u8"\uFFFF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 3);
> +    TEST_COMPARE (buf[0], (char) 0xEF);
> +    TEST_COMPARE (buf[1], (char) 0xBF);
> +    TEST_COMPARE (buf[2], (char) 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to four code units. */
> +  {
> +    /* U+10000 => 0xF0 0x90 0x80 0x80 */
> +    const char8_t *u8s = (const char8_t*) u8"\U00010000";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 4);
> +    TEST_COMPARE (buf[0], (char) 0xF0);
> +    TEST_COMPARE (buf[1], (char) 0x90);
> +    TEST_COMPARE (buf[2], (char) 0x80);
> +    TEST_COMPARE (buf[3], (char) 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to four code units. */
> +  {
> +    /* U+10FFFF => 0xF4 0x8F 0xBF 0xBF */
> +    const char8_t *u8s = (const char8_t*) u8"\U0010FFFF";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 4);
> +    TEST_COMPARE (buf[0], (char) 0xF4);
> +    TEST_COMPARE (buf[1], (char) 0x8F);
> +    TEST_COMPARE (buf[2], (char) 0xBF);
> +    TEST_COMPARE (buf[3], (char) 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +test_big5_hkscs (void)
> +{
> +  xsetlocale (LC_ALL, "zh_HK.BIG5-HKSCS");
> +
> +  /* A pair of two byte UTF-8 code unit sequences that map a Unicode code
> +     point and combining character to a single double byte character. */
> +  {
> +    /* U+00CA U+0304 => 0x88 0x62 */
> +    const char8_t *u8s = (const char8_t*) u8"\u00CA\u0304";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 2);
> +    TEST_COMPARE (buf[0], (char) 0x88);
> +    TEST_COMPARE (buf[1], (char) 0x62);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Another pair of two byte UTF-8 code unit sequences that map a Unicode code
> +     point and combining character to a single double byte character. */
> +  {
> +    /* U+00EA U+030C => 0x88 0xA5 */
> +    const char8_t *u8s = (const char8_t*) u8"\u00EA\u030C";
> +    char buf[MB_LEN_MAX] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (c8rtomb (buf, u8s[0], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[1], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[2], &s), (size_t) 0);
> +    TEST_COMPARE (c8rtomb (buf, u8s[3], &s), (size_t) 2);
> +    TEST_COMPARE (buf[0], (char) 0x88);
> +    TEST_COMPARE (buf[1], (char) 0xA5);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  return 0;
> +}
> +
> +static int
> +do_test (void)
> +{
> +  test_truncated_code_unit_sequence ();
> +  test_invalid_trailing_code_unit_sequence ();
> +  test_lone_trailing_code_units ();
> +  test_overlong_encoding ();
> +  test_surrogate_range ();
> +  test_out_of_range_encoding ();
> +  test_null_output_buffer ();
> +  test_utf8 ();
> +  test_big5_hkscs ();
> +  return 0;
> +}
> +
> +#include

Ok.

> diff --git a/wcsmbs/test-mbrtoc8.c b/wcsmbs/test-mbrtoc8.c
> new file mode 100644
> index 0000000000..b282fa6dba
> --- /dev/null
> +++ b/wcsmbs/test-mbrtoc8.c
> @@ -0,0 +1,539 @@
> +/* Test mbrtoc8.
> +   Copyright (C) 2022 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   . */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +static int
> +test_utf8 (void)
> +{
> +  xsetlocale (LC_ALL, "de_DE.UTF-8");
> +
> +  /* No inputs. */
> +  {
> +    const char *mbs = "";
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 0, &s), (size_t) -2); /* no input */
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Null character. */
> +  {
> +    const char *mbs = "\x00"; /* 0x00 => U+0000 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 0);

Style: strlen (mbs) + 1.

> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0x00);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First non-null character in the code point range that maps to a single
> +     code unit. */
> +  {
> +    const char *mbs = "\x01"; /* 0x01 => U+0001 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0x01);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to a single code unit. */
> +  {
> +    const char *mbs = "\x7F"; /* 0x7F => U+007F */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0x7F);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to two code units. */
> +  {
> +    const char *mbs = "\xC2\x80"; /* 0xC2 0x80 => U+0080 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2);
> +    mbs += 2;
> +    TEST_COMPARE (buf[0], 0xC2);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time. */
> +  {
> +    const char *mbs = "\xC2\x80"; /* 0xC2 0x80 => U+0080 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xC2);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to two code units. */
> +  {
> +    const char *mbs = "\xDF\xBF"; /* 0xDF 0xBF => U+07FF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2);
> +    mbs += 2;
> +    TEST_COMPARE (buf[0], 0xDF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time. */
> +  {
> +    const char *mbs = "\xDF\xBF"; /* 0xDF 0xBF => U+07FF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xDF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to three code units. */
> +  {
> +    const char *mbs = u8"\xE0\xA0\x80"; /* 0xE0 0xA0 0x80 => U+0800 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
> +    mbs += 3;
> +    TEST_COMPARE (buf[0], 0xE0);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xA0);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time. */
> +  {
> +    const char *mbs = u8"\xE0\xA0\x80"; /* 0xE0 0xA0 0x80 => U+0800 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xE0);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xA0);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to three code units
> +     before the surrogate code point range. */
> +  {
> +    const char *mbs = "\xED\x9F\xBF"; /* 0xED 0x9F 0xBF => U+D7FF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
> +    mbs += 3;
> +    TEST_COMPARE (buf[0], 0xED);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x9F);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time. */
> +  {
> +    const char *mbs = "\xED\x9F\xBF"; /* 0xED 0x9F 0xBF => U+D7FF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xED);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x9F);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to three code units
> +     after the surrogate code point range. */
> +  {
> +    const char *mbs = "\xEE\x80\x80"; /* 0xEE 0x80 0x80 => U+E000 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
> +    mbs += 3;
> +    TEST_COMPARE (buf[0], 0xEE);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time. */
> +  {
> +    const char *mbs = "\xEE\x80\x80"; /* 0xEE 0x80 0x80 => U+E000 */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xEE);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0x80);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Not a BOM. */
> +  {
> +    const char *mbs = "\xEF\xBB\xBF"; /* 0xEF 0xBB 0xBF => U+FEFF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
> +    mbs += 3;
> +    TEST_COMPARE (buf[0], 0xEF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBB);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time. */
> +  {
> +    const char *mbs = "\xEF\xBB\xBF"; /* 0xEF 0xBB 0xBF => U+FEFF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xEF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBB);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Replacement character. */
> +  {
> +    const char *mbs = "\xEF\xBF\xBD"; /* 0xEF 0xBF 0xBD => U+FFFD */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
> +    mbs += 3;
> +    TEST_COMPARE (buf[0], 0xEF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBD);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time. */
> +  {
> +    const char *mbs = "\xEF\xBF\xBD"; /* 0xEF 0xBF 0xBD => U+FFFD */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xEF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBD);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Last character in the code point range that maps to three code units. */
> +  {
> +    const char *mbs = "\xEF\xBF\xBF"; /* 0xEF 0xBF 0xBF => U+FFFF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 3);
> +    mbs += 3;
> +    TEST_COMPARE (buf[0], 0xEF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* Same as last test, but one code unit at a time. */
> +  {
> +    const char *mbs = "\xEF\xBF\xBF"; /* 0xEF 0xBF 0xBF => U+FFFF */
> +    char8_t buf[1] = { 0 };
> +    mbstate_t s = { 0 };
> +
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2);
> +    mbs += 1;
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1);
> +    mbs += 1;
> +    TEST_COMPARE (buf[0], 0xEF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3);
> +    TEST_COMPARE (buf[0], 0xBF);
> +    TEST_VERIFY (mbsinit (&s));
> +  }
> +
> +  /* First character in the code point range that maps to four code units.
*/ > + { > + const char *mbs =3D "\xF0\x90\x80\x80"; /* 0xF0 0x90 0x80 0x80 =3D>= U+10000 */ > + char8_t buf[1] =3D { 0 }; > + mbstate_t s =3D { 0 }; > + > + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 4); > + mbs +=3D 4; > + TEST_COMPARE (buf[0], 0xF0); > + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) = -3); > + TEST_COMPARE (buf[0], 0x90); > + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) = -3); > + TEST_COMPARE (buf[0], 0x80); > + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) = -3); > + TEST_COMPARE (buf[0], 0x80); > + TEST_VERIFY (mbsinit (&s)); > + } > + > + /* Same as last test, but one code unit at a time. */ > + { > + const char *mbs =3D "\xF0\x90\x80\x80"; /* 0xF0 0x90 0x80 0x80 =3D>= U+10000 */ > + char8_t buf[1] =3D { 0 }; > + mbstate_t s =3D { 0 }; > + > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); > + mbs +=3D 1; > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); > + mbs +=3D 1; > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); > + mbs +=3D 1; > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1); > + mbs +=3D 1; > + TEST_COMPARE (buf[0], 0xF0); > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); > + TEST_COMPARE (buf[0], 0x90); > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); > + TEST_COMPARE (buf[0], 0x80); > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); > + TEST_COMPARE (buf[0], 0x80); > + TEST_VERIFY (mbsinit (&s)); > + } > + > + /* Last character in the code point range that maps to four code = units. 
*/ > + { > + const char *mbs =3D "\xF4\x8F\xBF\xBF"; /* 0xF4 0x8F 0xBF 0xBF =3D>= U+10FFFF */ > + char8_t buf[1] =3D { 0 }; > + mbstate_t s =3D { 0 }; > + > + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 4); > + mbs +=3D 4; > + TEST_COMPARE (buf[0], 0xF4); > + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) = -3); > + TEST_COMPARE (buf[0], 0x8F); > + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) = -3); > + TEST_COMPARE (buf[0], 0xBF); > + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) = -3); > + TEST_COMPARE (buf[0], 0xBF); > + TEST_VERIFY (mbsinit (&s)); > + } > + > + /* Same as last test, but one code unit at a time. */ > + { > + const char *mbs =3D "\xF4\x8F\xBF\xBF"; /* 0xF4 0x8F 0xBF 0xBF =3D>= U+10FFFF */ > + char8_t buf[1] =3D { 0 }; > + mbstate_t s =3D { 0 }; > + > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); > + mbs +=3D 1; > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); > + mbs +=3D 1; > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); > + mbs +=3D 1; > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1); > + mbs +=3D 1; > + TEST_COMPARE (buf[0], 0xF4); > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); > + TEST_COMPARE (buf[0], 0x8F); > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); > + TEST_COMPARE (buf[0], 0xBF); > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); > + TEST_COMPARE (buf[0], 0xBF); > + TEST_VERIFY (mbsinit (&s)); > + } > + > + return 0; > +} > + > +static int > +test_big5_hkscs (void) > +{ > + xsetlocale (LC_ALL, "zh_HK.BIG5-HKSCS"); > + > + /* A double byte character that maps to a pair of two byte UTF-8 = code unit > + sequences. 
*/ > + { > + const char *mbs =3D "\x88\x62"; /* 0x88 0x62 =3D> U+00CA U+0304 = */ > + char8_t buf[1] =3D { 0 }; > + mbstate_t s =3D { 0 }; > + > + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2); > + mbs +=3D 2; > + TEST_COMPARE (buf[0], 0xC3); > + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) = -3); > + TEST_COMPARE (buf[0], 0x8A); > + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) = -3); > + TEST_COMPARE (buf[0], 0xCC); > + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) = -3); > + TEST_COMPARE (buf[0], 0x84); > + TEST_VERIFY (mbsinit (&s)); > + } > + > + /* Same as last test, but one code unit at a time. */ > + { > + const char *mbs =3D "\x88\x62"; /* 0x88 0x62 =3D> U+00CA U+0304 = */ > + char8_t buf[1] =3D { 0 }; > + mbstate_t s =3D { 0 }; > + > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); > + mbs +=3D 1; > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1); > + mbs +=3D 1; > + TEST_COMPARE (buf[0], 0xC3); > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); > + TEST_COMPARE (buf[0], 0x8A); > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); > + TEST_COMPARE (buf[0], 0xCC); > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); > + TEST_COMPARE (buf[0], 0x84); > + TEST_VERIFY (mbsinit (&s)); > + } > + > + /* Another double byte character that maps to a pair of two byte = UTF-8 code > + unit sequences. 
*/ > + { > + const char *mbs =3D "\x88\xA5"; /* 0x88 0xA5 =3D> U+00EA U+030C = */ > + char8_t buf[1] =3D { 0 }; > + mbstate_t s =3D { 0 }; > + > + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) 2); > + mbs +=3D 2; > + TEST_COMPARE (buf[0], 0xC3); > + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) = -3); > + TEST_COMPARE (buf[0], 0xAA); > + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) = -3); > + TEST_COMPARE (buf[0], 0xCC); > + TEST_COMPARE (mbrtoc8 (buf, mbs, strlen(mbs)+1, &s), (size_t) = -3); > + TEST_COMPARE (buf[0], 0x8C); > + TEST_VERIFY (mbsinit (&s)); > + } > + > + /* Same as last test, but one code unit at a time. */ > + { > + const char *mbs =3D "\x88\xA5"; /* 0x88 0xA5 =3D> U+00EA U+030C = */ > + char8_t buf[1] =3D { 0 }; > + mbstate_t s =3D { 0 }; > + > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -2); > + mbs +=3D 1; > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) 1); > + mbs +=3D 1; > + TEST_COMPARE (buf[0], 0xC3); > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); > + TEST_COMPARE (buf[0], 0xAA); > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); > + TEST_COMPARE (buf[0], 0xCC); > + TEST_COMPARE (mbrtoc8 (buf, mbs, 1, &s), (size_t) -3); > + TEST_COMPARE (buf[0], 0x8C); > + TEST_VERIFY (mbsinit (&s)); > + } > + > + return 0; > +} > + > +static int > +do_test (void) > +{ > + test_utf8 (); > + test_big5_hkscs (); > + return 0; > +} > + > +#include > --=20 > 2.32.0 >=20
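For anyone double-checking the expected values in these tests: each char8_t the tests compare against is the UTF-8 encoding of the code point(s) named in the comment. mbrtoc8 writes the first code unit along with the count of input bytes consumed, and it returns each remaining unit, including the units of the second code point in the Big5-HKSCS cases, with (size_t) -3. The sketch below only reproduces those byte values. It uses hypothetical helpers, encode_utf8 and utf8_matches, which are not part of glibc or of this patch, and it assumes the input is a valid Unicode scalar value. It does not model mbstate_t or the -2/-3 return protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not glibc API): encode the Unicode scalar value
   CP as UTF-8 into OUT and return the number of code units written.  */
static size_t
encode_utf8 (uint32_t cp, unsigned char out[4])
{
  /* Assumption: CP is a scalar value, i.e. not a surrogate and not
     above U+10FFFF.  */
  assert (cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));

  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      /* 110xxxxx 10xxxxxx */
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      /* 1110xxxx 10xxxxxx 10xxxxxx */
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
  out[0] = (unsigned char) (0xF0 | (cp >> 18));
  out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
  out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[3] = (unsigned char) (0x80 | (cp & 0x3F));
  return 4;
}

/* Hypothetical helper: nonzero if CP encodes to exactly the LEN code
   units in EXPECTED.  */
static int
utf8_matches (uint32_t cp, const unsigned char *expected, size_t len)
{
  unsigned char buf[4];
  return encode_utf8 (cp, buf) == len && memcmp (buf, expected, len) == 0;
}
```

For example, U+10FFFF encodes to 0xF4 0x8F 0xBF 0xBF, which matches the four code units the last UTF-8 test expects. U+00CA followed by U+0304 encodes to 0xC3 0x8A 0xCC 0x84, the sequence the first Big5-HKSCS test expects for 0x88 0x62.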