From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from tarta.nabijaczleweli.xyz (unknown [139.28.40.42]) by sourceware.org (Postfix) with ESMTP id A4545384C007 for ; Tue, 6 Sep 2022 14:06:20 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A4545384C007
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=nabijaczleweli.xyz
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nabijaczleweli.xyz
Received: from tarta.nabijaczleweli.xyz (unknown [192.168.1.250]) by tarta.nabijaczleweli.xyz (Postfix) with ESMTPSA id 21C772F4; Tue, 6 Sep 2022 16:06:18 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=nabijaczleweli.xyz; s=202205; t=1662473178;
	bh=HDIutxB+IajL3634UFarYj/IgByjVrgmUmFvynTa9hQ=;
	h=Date:From:To:Cc:Subject:References:In-Reply-To:From;
	b=sQYYB7E/mIdbQF2KmjfhkOOgJNPjmL1J2N57SRuurLr0A2vyOk98U5pfhPDg3gC24
	 z9QdzICyJun2a6ASZkqc9jyi0cX7f6dXizr4M2AK6Se6FJ364wIoaeEfBlzJFWuCR4
	 9WdOEyLAuA6NKJX3Fv+irzjerEHQdHFGeldiUJu94vIRu+g06E+aaDfU9bS1M6JVQj
	 kqfw7rjd+zXUr+U6ECtgVla9DZzlohtMP49QgzI/kiSVPA3Zb5zeWbxK+FdaW8rCuL
	 3c7h/PHQQpFP3kGhcjDNsJDFk+GmcCmWkkP9uNN0vanlYkikRO0lsZkUWnJUzWAHVs
	 CT1tg5Guf6TO71PBrYEVZHuvJVCd0R+qOyQJJsHZGDwVFmy42Iq+EZ9NEWdJGK/ie9
	 D87vGlKR9+eR8OC6bsdOCo13ax5g+SFSPo6BBo4fs4Gqzq7AlszgA6ws1uC9kgUnwG
	 9ZOjzBDYt5vly1LQv3G8+uZcTKN+sbGJGGRv18IjwAwlHbQJxAmuxIO1vZnoE/Kv1N
	 STOQzJhyLiHRPjRuTpQcAw+sMm8j6+0Dy0VkUhcnFAyG9+kpbYNa/gd9qGumdmMgn0
	 UYyA2U7Oq3+lXmCtlYFS4qOxpNxfzMdQYRBsGfS3Z63id9f3LuxEhpGjDcSLses0fB
	 qVuwKH5Fez/3AlO5+A41QUbo=
Date: Tue, 6 Sep 2022 16:06:17 +0200
From: наб
To: libc-alpha@sourceware.org
Cc: Florian Weimer
Subject: [PATCH v2] POSIX locale covers every byte [BZ# 29511]
Message-ID: <20220906140617.5dpxxfqh47ovcxfh@tarta.nabijaczleweli.xyz>
References: <20220830181932.oggrz6f6itrpyi6g@tarta.nabijaczleweli.xyz>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="iuu2ecballkdxbik"
Content-Disposition: inline
In-Reply-To: <20220830181932.oggrz6f6itrpyi6g@tarta.nabijaczleweli.xyz>
User-Agent: NeoMutt/20220429
X-Spam-Status: No, score=-6.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FROM_SUSPICIOUS_NTLD,GIT_PATCH_0,KAM_INFOUSMEBIZ,PDS_OTHER_BAD_TLD,RDNS_DYNAMIC,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id:

--iuu2ecballkdxbik
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

This is a trivial patch, largely duplicating the extant ASCII code.

There are two user-facing changes:
  * nl_langinfo(CODESET) is "POSIX" instead of "ANSI_X3.4-1968"
  * mbrtowc() and friends return b if b <= 0x7F else 0xDF00+b

Since Issue 7 TC 2/Issue 8, the C/POSIX locale, effectively,
  (a) is 1-byte, stateless, and contains 256 characters
  (b) they collate in byte order
  (c) the first 128 characters are equivalent to ASCII (as before)
cf. https://www.austingroupbugs.net/view.php?id=663 for a summary of
changes to the standard; in short, this means that mbrtowc() must never
fail and must return
  b if b <= 0x7F else ab+c
for all bytes b
where c is some constant >=0x80
  and a is a positive integer constant

By strategically picking c=0xDF00 we land at the tail-end
of the Unicode Low Surrogate Area at DC00-DFFF, described as
  > Isolated surrogate code points have no interpretation;
  > consequently, no character code charts or names lists
  > are provided for this range.
and match musl

Signed-off-by: Ahelenia Ziemiańska
---
v2: rebased, no changes, resending after a week per guidelines

 iconv/gconv_builtin.h                 |   8 +
 iconv/gconv_int.h                     |   7 +
 iconv/gconv_simple.c                  |  75 +++++++
 iconv/tst-iconv_prog.sh               |  43 ++++
 inet/tst-idna_name_classify.c         |   6 +-
 locale/tst-C-locale.c                 |  69 ++++++
 localedata/locales/POSIX              | 143 +++++++++++-
 stdio-common/tst-printf-bz25691.c     |   2 +
 sysdeps/s390/multiarch/gconv_simple.c | 298 ++++++++++++++++++++++++++
 wcsmbs/wcsmbsload.c                   |  10 +-
 10 files changed, 652 insertions(+), 9 deletions(-)

diff --git a/iconv/gconv_builtin.h b/iconv/gconv_builtin.h
index 68c2369b1f..cd1805b3ce 100644
--- a/iconv/gconv_builtin.h
+++ b/iconv/gconv_builtin.h
@@ -89,6 +89,14 @@ BUILTIN_TRANSFORMATION ("INTERNAL", "ANSI_X3.4-1968//", 1, "=INTERNAL->ascii",
                         __gconv_transform_internal_ascii, NULL, 4, 4, 1, 1)
 
 
+BUILTIN_TRANSFORMATION ("POSIX//", "INTERNAL", 1, "=posix->INTERNAL",
+                        __gconv_transform_posix_internal, __gconv_btwoc_posix,
+                        1, 1, 4, 4)
+
+BUILTIN_TRANSFORMATION ("INTERNAL", "POSIX//", 1, "=INTERNAL->posix",
+                        __gconv_transform_internal_posix, NULL, 4, 4, 1, 1)
+
+
 #if BYTE_ORDER == BIG_ENDIAN
 BUILTIN_ALIAS ("UNICODEBIG//", "ISO-10646/UCS2/")
 BUILTIN_ALIAS ("UCS-2BE//", "ISO-10646/UCS2/")
diff --git a/iconv/gconv_int.h b/iconv/gconv_int.h
index 1c6745043e..45ab1edfad 100644
--- a/iconv/gconv_int.h
+++ b/iconv/gconv_int.h
@@ -281,6 +281,8 @@ extern int __gconv_compare_alias (const char *name1, const char *name2)
 
 __BUILTIN_TRANSFORM (__gconv_transform_ascii_internal);
 __BUILTIN_TRANSFORM (__gconv_transform_internal_ascii);
+__BUILTIN_TRANSFORM (__gconv_transform_posix_internal);
+__BUILTIN_TRANSFORM (__gconv_transform_internal_posix);
 __BUILTIN_TRANSFORM (__gconv_transform_utf8_internal);
 __BUILTIN_TRANSFORM (__gconv_transform_internal_utf8);
 __BUILTIN_TRANSFORM (__gconv_transform_ucs2_internal);
@@ -299,6 +301,11 @@ __BUILTIN_TRANSFORM (__gconv_transform_utf16_internal);
    only ASCII characters.  */
 extern wint_t __gconv_btwoc_ascii (struct __gconv_step *step, unsigned char c);
 
+/* Specialized conversion function for a single byte to INTERNAL,
+   identity-mapping bytes [0, 0x7F], and moving [0x80, 0xFF] into the end
+   of the Low Surrogate Area at [U+DF80, U+DFFF].  */
+extern wint_t __gconv_btwoc_posix (struct __gconv_step *step, unsigned char c);
+
 #endif
 
 __END_DECLS
diff --git a/iconv/gconv_simple.c b/iconv/gconv_simple.c
index 640068d9ba..4cd01854cd 100644
--- a/iconv/gconv_simple.c
+++ b/iconv/gconv_simple.c
@@ -53,6 +53,18 @@ __gconv_btwoc_ascii (struct __gconv_step *step, unsigned char c)
     return WEOF;
 }
 
+/* Specialized conversion function for a single byte to INTERNAL,
+   identity-mapping bytes [0, 0x7F], and moving [0x80, 0xFF] into the end
+   of the Low Surrogate Area at [U+DF80, U+DFFF].  */
+wint_t
+__gconv_btwoc_posix (struct __gconv_step *step, unsigned char c)
+{
+  if (c < 0x80)
+    return c;
+  else
+    return 0xdf00 + c;
+}
+
 
 /* Transform from the internal, UCS4-like format, to UCS4.  The
    difference between the internal ucs4 format and the real UCS4
@@ -868,6 +880,69 @@ ucs4le_internal_loop_single (struct __gconv_step *step,
 #include
 
 
+/* Convert from {[0, 0x7F] => ISO 646-IRV; [0x80, 0xFF] => [U+DF80, U+DFFF]}
+   to the internal (UCS4-like) format.  */
+#define DEFINE_INIT             0
+#define DEFINE_FINI             0
+#define MIN_NEEDED_FROM         1
+#define MIN_NEEDED_TO           4
+#define FROM_DIRECTION          1
+#define FROM_LOOP               posix_internal_loop
+#define TO_LOOP                 posix_internal_loop /* This is not used.  */
+#define FUNCTION_NAME           __gconv_transform_posix_internal
+#define ONE_DIRECTION           1
+
+#define MIN_NEEDED_INPUT        MIN_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT       MIN_NEEDED_TO
+#define LOOPFCT                 FROM_LOOP
+#define BODY \
+  { \
+    if (__glibc_unlikely (*inptr > '\x7f')) \
+      *((uint32_t *) outptr) = 0xdf00 + *inptr++; \
+    else \
+      *((uint32_t *) outptr) = *inptr++; \
+    outptr += sizeof (uint32_t); \
+  }
+#include
+#include
+
+
+/* Convert from the internal (UCS4-like) format to
+   {ISO 646-IRV => [0, 0x7F]; [U+DF80, U+DFFF] => [0x80, 0xFF]}.  */
+#define DEFINE_INIT             0
+#define DEFINE_FINI             0
+#define MIN_NEEDED_FROM         4
+#define MIN_NEEDED_TO           1
+#define FROM_DIRECTION          1
+#define FROM_LOOP               internal_posix_loop
+#define TO_LOOP                 internal_posix_loop /* This is not used.  */
+#define FUNCTION_NAME           __gconv_transform_internal_posix
+#define ONE_DIRECTION           1
+
+#define MIN_NEEDED_INPUT        MIN_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT       MIN_NEEDED_TO
+#define LOOPFCT                 FROM_LOOP
+#define BODY \
+  { \
+    uint32_t val = *((const uint32_t *) inptr); \
+    if (__glibc_unlikely ((val > 0x7f && val < 0xdf80) || val > 0xdfff)) \
+      { \
+        UNICODE_TAG_HANDLER (val, 4); \
+        STANDARD_TO_LOOP_ERR_HANDLER (4); \
+      } \
+    else \
+      { \
+        if (__glibc_unlikely (val > 0x7f)) \
+          val -= 0xdf00; \
+        *outptr++ = val; \
+        inptr += sizeof (uint32_t); \
+      } \
+  }
+#define LOOP_NEED_FLAGS
+#include
+#include
+
+
 /* Convert from the internal (UCS4-like) format to UTF-8.  */
 #define DEFINE_INIT             0
 #define DEFINE_FINI             0
diff --git a/iconv/tst-iconv_prog.sh b/iconv/tst-iconv_prog.sh
index b3d8bf5110..a24d8d2207 100644
--- a/iconv/tst-iconv_prog.sh
+++ b/iconv/tst-iconv_prog.sh
@@ -285,3 +285,46 @@ for errorcommand in "${errorarray[@]}"; do
   execute_test
   check_errtest_result
 done
+
+allbytes ()
+{
+  for (( i = 0; i <= 255; i++ )); do
+    printf '\'"$(printf "%o" "$i")"
+  done
+}
+
+allucs4be ()
+{
+  for (( i = 0; i <= 127; i++ )); do
+    printf '\0\0\0\'"$(printf "%o" "$i")"
+  done
+  for (( i = 128; i <= 255; i++ )); do
+    printf '\0\0\xdf\'"$(printf "%o" "$i")"
+  done
+}
+
+check_posix_result ()
+{
+  if [ $? -eq 0 ]; then
+    result=PASS
+  else
+    result=FAIL
+  fi
+
+  echo "$result: from \"$1\", to: \"$2\""
+
+  if [ "$result" != "PASS" ]; then
+    exit 1
+  fi
+}
+
+check_posix_encoding ()
+{
+  eval PROG=\"$ICONV\"
+  allbytes | $PROG -f POSIX -t UCS-4BE | cmp -s - <(allucs4be)
+  check_posix_result POSIX UCS-4BE
+  allucs4be | $PROG -f UCS-4BE -t POSIX | cmp -s - <(allbytes)
+  check_posix_result UCS-4BE POSIX
+}
+
+check_posix_encoding
diff --git a/inet/tst-idna_name_classify.c b/inet/tst-idna_name_classify.c
index bfd34eee31..b379481844 100644
--- a/inet/tst-idna_name_classify.c
+++ b/inet/tst-idna_name_classify.c
@@ -37,11 +37,11 @@ do_test (void)
   puts ("info: C locale tests");
   locale_insensitive_tests ();
   TEST_COMPARE (__idna_name_classify ("abc\200def"),
-                idna_name_encoding_error);
+                idna_name_nonascii);
   TEST_COMPARE (__idna_name_classify ("abc\200\\def"),
-                idna_name_encoding_error);
+                idna_name_nonascii_backslash);
   TEST_COMPARE (__idna_name_classify ("abc\377def"),
-                idna_name_encoding_error);
+                idna_name_nonascii);
 
   puts ("info: en_US.ISO-8859-1 locale tests");
   if (setlocale (LC_CTYPE, "en_US.ISO-8859-1") == 0)
diff --git a/locale/tst-C-locale.c b/locale/tst-C-locale.c
index 6bd0367069..f30396ae12 100644
--- a/locale/tst-C-locale.c
+++ b/locale/tst-C-locale.c
@@ -229,6 +229,75 @@ run_test (const char *locname)
   STRTEST (YESSTR, "");
   STRTEST (NOSTR, "");
 
+#define CONVTEST(b, v) \
+  { \
+    unsigned char bs[] = {b, 0}; \
+    mbstate_t ctx = {}; \
+    wchar_t wc = -1; \
+    size_t sz = mbrtowc(&wc, (char *) bs, 1, &ctx); \
+    if (sz != !!b) \
+      { \
+        printf ("mbrtowc(%02hhx) width in locale %s wrong " \
+                "(is %zd, should be %d)\n", *bs, locname, sz, !!b); \
+        result = 1; \
+      } \
+    if (wc != v) \
+      { \
+        printf ("mbrtowc(%02hhx) value in locale %s wrong " \
+                "(is %x, should be %x)\n", *bs, locname, wc, v); \
+        result = 1; \
+      } \
+  }
+  for(int i = 0; i <= 0x7f; ++i)
+    CONVTEST(i, i);
+  for(int i = 0x80; i <= 0xff; ++i)
+    CONVTEST(i, 0xdf00 + i);
+
+#define DECONVTEST(v, b) \
+  { \
+    unsigned char ob = -1; \
+    mbstate_t ctx = {}; \
+    size_t sz = wcrtomb((char *) &ob, v, &ctx); \
+    if (sz != 1) \
+      { \
+        printf ("wcrtomb(%x) width in locale %s wrong " \
+                "(is %zd, should be 1)\n", v, locname, sz); \
+        result = 1; \
+      } \
+    if (ob != b) \
+      { \
+        printf ("wcrtomb(%x) value in locale %s wrong " \
+                "(is %hhx, should be %hhx)\n", v, locname, ob, b); \
+        result = 1; \
+      } \
+  }
+#define DECONVERR(v) \
+  { \
+    unsigned char ob = -1; \
+    mbstate_t ctx = {}; \
+    size_t sz = wcrtomb((char *) &ob, v, &ctx); \
+    if (sz != (size_t) -1) \
+      { \
+        printf ("wcrtomb(%x) width in locale %s wrong " \
+                "(is %zd, should be (size_t )-1)\n", v, locname, sz); \
+        result = 1; \
+      } \
+    if (ob != (unsigned char) -1) \
+      { \
+        printf ("wcrtomb(%x) value in locale %s wrong " \
+                "(is %hhx, should be unchanged)\n", v, locname, ob); \
+        result = 1; \
+      } \
+  }
+  for(int i = 0; i <= 0x7f; ++i)
+    DECONVTEST(i, i);
+  for(int i = 0x80; i < 0xdf00; ++i)
+    DECONVERR(i);
+  for(int i = 0x80; i <= 0xff; ++i)
+    DECONVTEST(0xdf00 + i, i);
+  for(int i = 0xe000; i <= 0xffff; ++i)
+    DECONVERR(i);
+
   /* Test the new locale mechanisms.  */
   loc = newlocale (LC_ALL_MASK, locname, NULL);
   if (loc == NULL)
diff --git a/localedata/locales/POSIX b/localedata/locales/POSIX
index 7ec7f1c577..fc34a6abc1 100644
--- a/localedata/locales/POSIX
+++ b/localedata/locales/POSIX
@@ -97,6 +97,20 @@ END LC_CTYPE
 LC_COLLATE
 % This is the POSIX Locale definition for the LC_COLLATE category.
 % The order is the same as in the ASCII code set.
+% Values above () inserted in order, per Issue 7 TC2,
+% XBD, 7.3.2, LC_COLLATE Category in the POSIX Locale:
+% > All characters not explicitly listed here shall be inserted
+% > in the character collation order after the listed characters
+% > and shall be assigned unique primary weights. If the listed
+% > characters have ASCII encoding, the other characters shall
+% > be in ascending order according to their coded character set values
+% Since Issue 7 TC2 (XBD, 6.2 Character Encoding):
+% > The POSIX locale shall contain 256 single-byte characters [...]
+% (cf. bug 663, 674).
+% this is in contrast to previous issues, which limited the POSIX
+% locale to the Portable Character Set (7-bit ASCII).
+% We use the end of the Low Surrogate Area to contain these,
+% yielding [, ]
 order_start forward
@@ -226,7 +240,134 @@ order_start forward
-UNDEFINED
+[128 added lines, one collating symbol each; the symbol names were lost in extraction]
 order_end
 % END LC_COLLATE
diff --git a/stdio-common/tst-printf-bz25691.c b/stdio-common/tst-printf-bz25691.c
index 44844e71c3..e66242b58f 100644
--- a/stdio-common/tst-printf-bz25691.c
+++ b/stdio-common/tst-printf-bz25691.c
@@ -30,6 +30,8 @@
 static int
 do_test (void)
 {
+  setlocale(LC_CTYPE, "C.UTF-8");
+
   mtrace ();
 
   /* For 's' conversion specifier with 'l' modifier the array must be
diff --git a/sysdeps/s390/multiarch/gconv_simple.c b/sysdeps/s390/multiarch/gconv_simple.c
index 41132f620a..3896bdd96a 100644
--- a/sysdeps/s390/multiarch/gconv_simple.c
+++ b/sysdeps/s390/multiarch/gconv_simple.c
@@ -68,6 +68,8 @@
 
 # undef __gconv_transform_ascii_internal
 # undef __gconv_transform_internal_ascii
+# undef __gconv_transform_posix_internal
+# undef __gconv_transform_internal_posix
 # undef __gconv_transform_internal_ucs4le
 # undef __gconv_transform_ucs4_internal
 # undef __gconv_transform_ucs4le_internal
@@ -385,6 +387,302 @@ ICONV_VX_IFUNC (__gconv_transform_ascii_internal)
 # undef BODY_ORIG_ERROR
 ICONV_VX_IFUNC (__gconv_transform_internal_ascii)
 
+/* Convert from {[0, 0x7F] => ISO 646-IRV; [0x80, 0xFF] => [U+DF80, U+DFFF]}
+   to the internal (UCS4-like) format.  */
+# define DEFINE_INIT            0
+# define DEFINE_FINI            0
+# define MIN_NEEDED_FROM        1
+# define MIN_NEEDED_TO          4
+# define FROM_DIRECTION         1
+# define FROM_LOOP              ICONV_VX_NAME (posix_internal_loop)
+# define TO_LOOP                ICONV_VX_NAME (posix_internal_loop) /* This is not used.  */
+# define FUNCTION_NAME          ICONV_VX_NAME (__gconv_transform_posix_internal)
+# define ONE_DIRECTION          1
+
+# define MIN_NEEDED_INPUT       MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT      MIN_NEEDED_TO
+# define LOOPFCT                FROM_LOOP
+
+# define BODY_ORIG \
+  { \
+    if (__glibc_unlikely (*inptr > '\x7f')) \
+      *((uint32_t *) outptr) = 0xdf00 + *inptr++; \
+    else \
+      *((uint32_t *) outptr) = *inptr++; \
+    outptr += sizeof (uint32_t); \
+  }
+# define BODY \
+  { \
+    size_t len = inend - inptr; \ TODO: entirely ascii_internal_loop, above
+    if (len > (outend - outptr) / 4) \
+      len = (outend - outptr) / 4; \
+    size_t loop_count, tmp; \
+    __asm__ volatile (".machine push\n\t" \
+                      ".machine \"z13\"\n\t" \
+                      ".machinemode \"zarch_nohighgprs\"\n\t" \
+                      CONVERT_32BIT_SIZE_T ([R_LEN]) \
+                      " vrepib %%v30,0x7f\n\t" /* For compare > 0x7f.  */ \
+                      " srlg %[R_LI],%[R_LEN],4\n\t" \
+                      " vrepib %%v31,0x20\n\t" \
+                      " clgije %[R_LI],0,1f\n\t" \
+                      "0: \n\t" /* Handle 16-byte blocks.  */ \
+                      " vl %%v16,0(%[R_IN])\n\t" \
+                      /* Checking for values > 0x7f.  */ \
+                      " vstrcbs %%v17,%%v16,%%v30,%%v31\n\t" \
+                      " jno 10f\n\t" \
+                      /* Enlarge to UCS4.  */ \
+                      " vuplhb %%v17,%%v16\n\t" \
+                      " vupllb %%v18,%%v16\n\t" \
+                      " vuplhh %%v19,%%v17\n\t" \
+                      " vupllh %%v20,%%v17\n\t" \
+                      " vuplhh %%v21,%%v18\n\t" \
+                      " vupllh %%v22,%%v18\n\t" \
+                      /* Store 64bytes to buf_out.  */ \
+                      " vstm %%v19,%%v22,0(%[R_OUT])\n\t" \
+                      " la %[R_IN],16(%[R_IN])\n\t" \
+                      " la %[R_OUT],64(%[R_OUT])\n\t" \
+                      " brctg %[R_LI],0b\n\t" \
+                      " lghi %[R_LI],15\n\t" \
+                      " ngr %[R_LEN],%[R_LI]\n\t" \
+                      " je 20f\n\t" /* Jump away if no remaining bytes.  */ \
+                      /* Handle remaining bytes.  */ \
+                      "1: aghik %[R_LI],%[R_LEN],-1\n\t" \
+                      " jl 20f\n\t" /* Jump away if no remaining bytes.  */ \
+                      " vll %%v16,%[R_LI],0(%[R_IN])\n\t" \
+                      /* Checking for values > 0x7f.  */ \
+                      " vstrcbs %%v17,%%v16,%%v30,%%v31\n\t" \
+                      " vlgvb %[R_TMP],%%v17,7\n\t" \
+                      " clr %[R_TMP],%[R_LI]\n\t" \
+                      " locrh %[R_TMP],%[R_LEN]\n\t" \
+                      " locghih %[R_LEN],0\n\t" \
+                      " j 12f\n\t" \
+                      "10:\n\t" \
+                      /* Found a value > 0x7f. \
+                         Store the preceding chars.  */ \
+                      " vlgvb %[R_TMP],%%v17,7\n\t" \
+                      "12: la %[R_IN],0(%[R_TMP],%[R_IN])\n\t" \
+                      " sllk %[R_TMP],%[R_TMP],2\n\t" \
+                      " ahi %[R_TMP],-1\n\t" \
+                      " jl 20f\n\t" \
+                      " lgr %[R_LI],%[R_TMP]\n\t" \
+                      " vuplhb %%v17,%%v16\n\t" \
+                      " vuplhh %%v19,%%v17\n\t" \
+                      " vstl %%v19,%[R_LI],0(%[R_OUT])\n\t" \
+                      " ahi %[R_LI],-16\n\t" \
+                      " jl 11f\n\t" \
+                      " vupllh %%v20,%%v17\n\t" \
+                      " vstl %%v20,%[R_LI],16(%[R_OUT])\n\t" \
+                      " ahi %[R_LI],-16\n\t" \
+                      " jl 11f\n\t" \
+                      " vupllb %%v18,%%v16\n\t" \
+                      " vuplhh %%v21,%%v18\n\t" \
+                      " vstl %%v21,%[R_LI],32(%[R_OUT])\n\t" \
+                      " ahi %[R_LI],-16\n\t" \
+                      " jl 11f\n\t" \
+                      " vupllh %%v22,%%v18\n\t" \
+                      " vstl %%v22,%[R_LI],48(%[R_OUT])\n\t" \
+                      "11:\n\t" \
+                      " la %[R_OUT],1(%[R_TMP],%[R_OUT])\n\t" \
+                      "20:\n\t" \
+                      ".machine pop" \
+                      : /* outputs */ [R_OUT] "+a" (outptr) \
+                        , [R_IN] "+a" (inptr) \
+                        , [R_LEN] "+d" (len) \
+                        , [R_LI] "=d" (loop_count) \
+                        , [R_TMP] "=a" (tmp) \
+                      : /* inputs */ \
+                      : /* clobber list*/ "memory", "cc" \
+                        ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17") \
+                        ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19") \
+                        ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21") \
+                        ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v30") \
+                        ASM_CLOBBER_VR ("v31") \
+                      ); \
+    if (len > 0) \
+      { \
+        /* Found an invalid character at the next input byte.  */ \
+        BODY_ORIG_ERROR \
+      } \
+  }
+
+# include
+# include
+# undef BODY_ORIG
+# undef BODY_ORIG_ERROR
+ICONV_VX_IFUNC (__gconv_transform_posix_internal)
+
+/* Convert from the internal (UCS4-like) format to
+   {ISO 646-IRV => [0, 0x7F]; [U+DF80, U+DFFF] => [0x80, 0xFF]}.  */
+# define DEFINE_INIT            0
+# define DEFINE_FINI            0
+# define MIN_NEEDED_FROM        4
+# define MIN_NEEDED_TO          1
+# define FROM_DIRECTION         1
+# define FROM_LOOP              ICONV_VX_NAME (internal_posix_loop)
+# define TO_LOOP                ICONV_VX_NAME (internal_posix_loop) /* This is not used.  */
+# define FUNCTION_NAME          ICONV_VX_NAME (__gconv_transform_internal_posix)
+# define ONE_DIRECTION          1
+
+# define MIN_NEEDED_INPUT       MIN_NEEDED_FROM
+# define MIN_NEEDED_OUTPUT      MIN_NEEDED_TO
+# define LOOPFCT                FROM_LOOP
+# define BODY_ORIG_ERROR \
+  UNICODE_TAG_HANDLER (*((const uint32_t *) inptr), 4); \
+  STANDARD_TO_LOOP_ERR_HANDLER (4);
+
+# define BODY_ORIG \
+  { \
+    uint32_t val = *((const uint32_t *) inptr); \
+    if (__glibc_unlikely ((val > 0x7f && val < 0xdf80) || val > 0xdfff)) \
+      { \
+        UNICODE_TAG_HANDLER (val, 4); \
+        STANDARD_TO_LOOP_ERR_HANDLER (4); \
+      } \
+    else \
+      { \
+        if (__glibc_unlikely (val > 0x7f)) \
+          val -= 0xdf00; \
+        *outptr++ = val; \
+        inptr += sizeof (uint32_t); \
+      } \
+  }
+
+# define BODY \
+  { \
+    size_t len = (inend - inptr) / 4; \ TODO: entirely internal_ascii_loop, above
+    if (len > outend - outptr) \
+      len = outend - outptr; \
+    size_t loop_count, tmp, tmp2; \
+    __asm__ volatile (".machine push\n\t" \
+                      ".machine \"z13\"\n\t" \
+                      ".machinemode \"zarch_nohighgprs\"\n\t" \
+                      CONVERT_32BIT_SIZE_T ([R_LEN]) \
+                      /* Setup to check for ch > 0x7f.  */ \
+                      " vzero %%v21\n\t" \
+                      " srlg %[R_LI],%[R_LEN],4\n\t" \
+                      " vleih %%v21,8192,0\n\t" /* element 0: > */ \
+                      " vleih %%v21,-8192,2\n\t" /* element 1: =<> */ \
+                      " vleif %%v20,127,0\n\t" /* element 0: 127 */ \
+                      " lghi %[R_TMP],0\n\t" \
+                      " clgije %[R_LI],0,1f\n\t" \
+                      "0:\n\t" \
+                      " vlm %%v16,%%v19,0(%[R_IN])\n\t" \
+                      /* Shorten to byte values.  */ \
+                      " vpkf %%v23,%%v16,%%v17\n\t" \
+                      " vpkf %%v24,%%v18,%%v19\n\t" \
+                      " vpkh %%v23,%%v23,%%v24\n\t" \
+                      /* Checking for values > 0x7f.  */ \
+                      " vstrcfs %%v22,%%v16,%%v20,%%v21\n\t" \
+                      " jno 10f\n\t" \
+                      " vstrcfs %%v22,%%v17,%%v20,%%v21\n\t" \
+                      " jno 11f\n\t" \
+                      " vstrcfs %%v22,%%v18,%%v20,%%v21\n\t" \
+                      " jno 12f\n\t" \
+                      " vstrcfs %%v22,%%v19,%%v20,%%v21\n\t" \
+                      " jno 13f\n\t" \
+                      /* Store 16bytes to outptr.  */ \
+                      " vst %%v23,0(%[R_OUT])\n\t" \
+                      " la %[R_IN],64(%[R_IN])\n\t" \
+                      " la %[R_OUT],16(%[R_OUT])\n\t" \
+                      " brctg %[R_LI],0b\n\t" \
+                      " lghi %[R_LI],15\n\t" \
+                      " ngr %[R_LEN],%[R_LI]\n\t" \
+                      " je 20f\n\t" /* Jump away if no remaining bytes.  */ \
+                      /* Handle remaining bytes.  */ \
+                      "1: sllg %[R_LI],%[R_LEN],2\n\t" \
+                      " aghi %[R_LI],-1\n\t" \
+                      " jl 20f\n\t" /* Jump away if no remaining bytes.  */ \
+                      /* Load remaining 1...63 bytes.  */ \
+                      " vll %%v16,%[R_LI],0(%[R_IN])\n\t" \
+                      " ahi %[R_LI],-16\n\t" \
+                      " jl 2f\n\t" \
+                      " vll %%v17,%[R_LI],16(%[R_IN])\n\t" \
+                      " ahi %[R_LI],-16\n\t" \
+                      " jl 2f\n\t" \
+                      " vll %%v18,%[R_LI],32(%[R_IN])\n\t" \
+                      " ahi %[R_LI],-16\n\t" \
+                      " jl 2f\n\t" \
+                      " vll %%v19,%[R_LI],48(%[R_IN])\n\t" \
+                      "2:\n\t" \
+                      /* Shorten to byte values.  */ \
+                      " vpkf %%v23,%%v16,%%v17\n\t" \
+                      " vpkf %%v24,%%v18,%%v19\n\t" \
+                      " vpkh %%v23,%%v23,%%v24\n\t" \
+                      " sllg %[R_LI],%[R_LEN],2\n\t" \
+                      " aghi %[R_LI],-16\n\t" \
+                      " jl 3f\n\t" /* v16 is not fully loaded.  */ \
+                      " vstrcfs %%v22,%%v16,%%v20,%%v21\n\t" \
+                      " jno 10f\n\t" \
+                      " aghi %[R_LI],-16\n\t" \
+                      " jl 4f\n\t" /* v17 is not fully loaded.  */ \
+                      " vstrcfs %%v22,%%v17,%%v20,%%v21\n\t" \
+                      " jno 11f\n\t" \
+                      " aghi %[R_LI],-16\n\t" \
+                      " jl 5f\n\t" /* v18 is not fully loaded.  */ \
+                      " vstrcfs %%v22,%%v18,%%v20,%%v21\n\t" \
+                      " jno 12f\n\t" \
+                      " aghi %[R_LI],-16\n\t" \
+                      /* v19 is not fully loaded.  */ \
+                      " lghi %[R_TMP],12\n\t" \
+                      " vstrcfs %%v22,%%v19,%%v20,%%v21\n\t" \
+                      "6: vlgvb %[R_I],%%v22,7\n\t" \
+                      " aghi %[R_LI],16\n\t" \
+                      " clrjl %[R_I],%[R_LI],14f\n\t" \
+                      " lgr %[R_I],%[R_LEN]\n\t" \
+                      " lghi %[R_LEN],0\n\t" \
+                      " j 15f\n\t" \
+                      "3: vstrcfs %%v22,%%v16,%%v20,%%v21\n\t" \
+                      " j 6b\n\t" \
+                      "4: vstrcfs %%v22,%%v17,%%v20,%%v21\n\t" \
+                      " lghi %[R_TMP],4\n\t" \
+                      " j 6b\n\t" \
+                      "5: vstrcfs %%v22,%%v17,%%v20,%%v21\n\t" \
+                      " lghi %[R_TMP],8\n\t" \
+                      " j 6b\n\t" \
+                      /* Found a value > 0x7f.  */ \
+                      "13: ahi %[R_TMP],4\n\t" \
+                      "12: ahi %[R_TMP],4\n\t" \
+                      "11: ahi %[R_TMP],4\n\t" \
+                      "10: vlgvb %[R_I],%%v22,7\n\t" \
+                      "14: srlg %[R_I],%[R_I],2\n\t" \
+                      " agr %[R_I],%[R_TMP]\n\t" \
+                      " je 20f\n\t" \
+                      /* Store characters before invalid one...  */ \
+                      "15: aghi %[R_I],-1\n\t" \
+                      " vstl %%v23,%[R_I],0(%[R_OUT])\n\t" \
+                      /* ... and update pointers.  */ \
+                      " la %[R_OUT],1(%[R_I],%[R_OUT])\n\t" \
+                      " sllg %[R_I],%[R_I],2\n\t" \
+                      " la %[R_IN],4(%[R_I],%[R_IN])\n\t" \
+                      "20:\n\t" \
+                      ".machine pop" \
+                      : /* outputs */ [R_OUT] "+a" (outptr) \
+                        , [R_IN] "+a" (inptr) \
+                        , [R_LEN] "+d" (len) \
+                        , [R_LI] "=d" (loop_count) \
+                        , [R_I] "=a" (tmp2) \
+                        , [R_TMP] "=d" (tmp) \
+                      : /* inputs */ \
+                      : /* clobber list*/ "memory", "cc" \
+                        ASM_CLOBBER_VR ("v16") ASM_CLOBBER_VR ("v17") \
+                        ASM_CLOBBER_VR ("v18") ASM_CLOBBER_VR ("v19") \
+                        ASM_CLOBBER_VR ("v20") ASM_CLOBBER_VR ("v21") \
+                        ASM_CLOBBER_VR ("v22") ASM_CLOBBER_VR ("v23") \
+                        ASM_CLOBBER_VR ("v24") \
+                      ); \
+    if (len > 0) \
+      { \
+        /* Found an invalid character > 0x7f at next character.  */ \
+        BODY_ORIG_ERROR \
+      } \
+  }
+# define LOOP_NEED_FLAGS
+# include
+# include
+# undef BODY_ORIG
+# undef BODY_ORIG_ERROR
+ICONV_VX_IFUNC (__gconv_transform_internal_posix)
+
 
 /* Convert from internal UCS4 to UCS4 little endian form.  */
 # define DEFINE_INIT            0
diff --git a/wcsmbs/wcsmbsload.c b/wcsmbs/wcsmbsload.c
index 0f0f55f9ed..f87099bcf5 100644
--- a/wcsmbs/wcsmbsload.c
+++ b/wcsmbs/wcsmbsload.c
@@ -33,10 +33,10 @@ static const struct __gconv_step to_wc =
   .__shlib_handle = NULL,
   .__modname = NULL,
   .__counter = INT_MAX,
-  .__from_name = (char *) "ANSI_X3.4-1968//TRANSLIT",
+  .__from_name = (char *) "POSIX",
   .__to_name = (char *) "INTERNAL",
-  .__fct = __gconv_transform_ascii_internal,
-  .__btowc_fct = __gconv_btwoc_ascii,
+  .__fct = __gconv_transform_posix_internal,
+  .__btowc_fct = __gconv_btwoc_posix,
   .__init_fct = NULL,
   .__end_fct = NULL,
   .__min_needed_from = 1,
@@ -53,8 +53,8 @@ static const struct __gconv_step to_mb =
   .__modname = NULL,
   .__counter = INT_MAX,
   .__from_name = (char *) "INTERNAL",
-  .__to_name = (char *) "ANSI_X3.4-1968//TRANSLIT",
-  .__fct = __gconv_transform_internal_ascii,
+  .__to_name = (char *) "POSIX",
+  .__fct = __gconv_transform_internal_posix,
   .__btowc_fct = NULL,
   .__init_fct = NULL,
   .__end_fct = NULL,
-- 
2.30.2

--iuu2ecballkdxbik
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQIzBAABCgAdFiEEfWlHToQCjFzAxEFjvP0LAY0mWPEFAmMXU9gACgkQvP0LAY0m
WPHhxQ//doS1jx8ampJvjHA3zEQTVwUa8BC5DVRnCt9EF58ZEAd2V2GwkgqUoKAZ
pI2YlqE+xTyAzRmFmflh6M7aDgXutLuGhGYtCeNqKDAZ7BzqVPJDT/8/M7YPRqlT
5vgSNA1lWHGIfqmslovU3M4woK+NeOszY6c5lB/GOYoZClug0WY7b8OcMvxLTWMe
nS3/oxiQW+Zji4HcwFuJJYb2+CrW19KhZErzCnLEnjxwVt3B/DDxRUYgJJtx3FVM
wdPQPR0g6s8pYVUCNQwXHurCsxJeQfozNFDgR5yzSD0GkD9GssYMayxHTN3Yadai
I93veCIe6/6ClwFCXEaZHzd5iNS81FYas3HlZeng9AY5wxZcD6M7ufoJ8z54jlgt
PprMqUGL1iQ61s9RspveNxY+1jexmW/Asp2xufogAtHQtY67AcZDfCdcqhJcrLJa
fCjKnThht/QPjnMb4bObsN/yXBr84kt2K7Qxq6vZzqOzCEU9CznqwcW2Jl1RIW0m
eIudPhO2nlqhig8J28GPVFXmjGnazEsz4JB5xKe0hP+yib3bYrecyHp8e0lcC/Mn
NoFLwaW+dl5MwQOHD+9eqa3q346/vZ3ahv0rPeX0cGYVxwZ0VLhqM4LznRdSbgpf
1WvQpBSaaRczBXwU/oLvAMF3m4UNSViEyAgpnOpu8ySwXy/m1WY=
=DEkW
-----END PGP SIGNATURE-----

--iuu2ecballkdxbik--
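
For readers who want the byte mapping from the commit message without the gconv scaffolding, here is a minimal standalone sketch in C. It assumes only what the patch states: bytes 0x00-0x7F map to themselves, bytes 0x80-0xFF map to 0xDF00 + b (that is, U+DF80 to U+DFFF), and every other code point is rejected on the way back. The names posix_byte_to_wc, posix_wc_to_byte and posix_roundtrip_ok are hypothetical helpers for illustration only; they are not part of glibc or of this patch.

```c
#include <stdint.h>

/* Illustrative constant: the offset the patch adds to bytes >= 0x80. */
#define POSIX_HIGH_BASE 0xdf00u

/* Byte -> wide value: identity below 0x80, otherwise 0xDF00 + b.
   Never fails, mirroring the requirement that mbrtowc() must not fail. */
static inline uint32_t
posix_byte_to_wc (unsigned char b)
{
  return b < 0x80 ? (uint32_t) b : POSIX_HIGH_BASE + b;
}

/* Wide value -> byte.  Returns 0 and stores the byte on success,
   or -1 (leaving *out untouched) for values outside
   [0, 0x7F] and [0xDF80, 0xDFFF].  */
static inline int
posix_wc_to_byte (uint32_t wc, unsigned char *out)
{
  if (wc <= 0x7f)
    {
      *out = (unsigned char) wc;
      return 0;
    }
  if (wc >= 0xdf80 && wc <= 0xdfff)
    {
      *out = (unsigned char) (wc - POSIX_HIGH_BASE);
      return 0;
    }
  return -1;
}

/* Returns 1 if all 256 bytes survive a round trip, 0 otherwise. */
static inline int
posix_roundtrip_ok (void)
{
  for (unsigned int b = 0; b <= 0xff; ++b)
    {
      unsigned char back = 0;
      if (posix_wc_to_byte (posix_byte_to_wc ((unsigned char) b), &back) != 0
          || back != b)
        return 0;
    }
  return 1;
}
```

The two range checks in posix_wc_to_byte correspond to the single condition `(val > 0x7f && val < 0xdf80) || val > 0xdfff` used as the error test in the patch's internal-to-POSIX loop.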