[PATCH v18 1/3] iconv: __gconv_btwoc_ascii -> __gconv_btowc_ascii

public inbox for libc-alpha@sourceware.org

* [PATCH v18 1/3] iconv: __gconv_btwoc_ascii -> __gconv_btowc_ascii
@ 2023-07-23 17:32 наб
  2023-07-23 17:33 ` [PATCH v18 2/3] locale: charmap: fix off-by-one with ranges наб
  2023-07-23 17:54 ` [PATCH v18 3/3] POSIX locale covers every byte [BZ# 29511] наб
  0 siblings, 2 replies; 5+ messages in thread
From: наб @ 2023-07-23 17:32 UTC
  To: Florian Weimer; +Cc: libc-alpha, Victor Stinner, Bruno Haible

[-- Attachment #1: Type: text/plain, Size: 2868 bytes --]

These are the only users of the misspelled name, aside from the ChangeLogs.

Reported-by: Bruno Haible <bruno@clisp.org>
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
Clean rebase of 1/3.

 iconv/gconv_builtin.h | 4 ++--
 iconv/gconv_int.h     | 2 +-
 iconv/gconv_simple.c  | 2 +-
 wcsmbs/wcsmbsload.c   | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/iconv/gconv_builtin.h b/iconv/gconv_builtin.h
index 35608b4461..2f560a924a 100644
--- a/iconv/gconv_builtin.h
+++ b/iconv/gconv_builtin.h
@@ -52,7 +52,7 @@ BUILTIN_TRANSFORMATION ("INTERNAL", "ISO-10646/UTF8/", 1, "=INTERNAL->utf8",
 			__gconv_transform_internal_utf8, NULL, 4, 4, 1, 6)
 
 BUILTIN_TRANSFORMATION ("ISO-10646/UTF8/", "INTERNAL", 1, "=utf8->INTERNAL",
-			__gconv_transform_utf8_internal, __gconv_btwoc_ascii,
+			__gconv_transform_utf8_internal, __gconv_btowc_ascii,
 			1, 6, 4, 4)
 
 BUILTIN_ALIAS ("UCS2//", "ISO-10646/UCS2/")
@@ -82,7 +82,7 @@ BUILTIN_ALIAS ("CSASCII//", "ANSI_X3.4-1968//")
 BUILTIN_ALIAS ("OSF00010020//", "ANSI_X3.4-1968//")
 
 BUILTIN_TRANSFORMATION ("ANSI_X3.4-1968//", "INTERNAL", 1, "=ascii->INTERNAL",
-			__gconv_transform_ascii_internal, __gconv_btwoc_ascii,
+			__gconv_transform_ascii_internal, __gconv_btowc_ascii,
 			1, 1, 4, 4)
 
 BUILTIN_TRANSFORMATION ("INTERNAL", "ANSI_X3.4-1968//", 1, "=INTERNAL->ascii",
diff --git a/iconv/gconv_int.h b/iconv/gconv_int.h
index 19d042faff..e3baec97f0 100644
--- a/iconv/gconv_int.h
+++ b/iconv/gconv_int.h
@@ -325,7 +325,7 @@ __BUILTIN_TRANSFORM (__gconv_transform_utf16_internal);
 
 /* Specialized conversion function for a single byte to INTERNAL, recognizing
    only ASCII characters.  */
-extern wint_t __gconv_btwoc_ascii (struct __gconv_step *step, unsigned char c);
+extern wint_t __gconv_btowc_ascii (struct __gconv_step *step, unsigned char c);
 
 #endif
 
diff --git a/iconv/gconv_simple.c b/iconv/gconv_simple.c
index e936e171d7..17788383f4 100644
--- a/iconv/gconv_simple.c
+++ b/iconv/gconv_simple.c
@@ -45,7 +45,7 @@
 /* Specialized conversion function for a single byte to INTERNAL, recognizing
    only ASCII characters.  */
 wint_t
-__gconv_btwoc_ascii (struct __gconv_step *step, unsigned char c)
+__gconv_btowc_ascii (struct __gconv_step *step, unsigned char c)
 {
   if (c < 0x80)
     return c;
diff --git a/wcsmbs/wcsmbsload.c b/wcsmbs/wcsmbsload.c
index 7b338b6775..61392e0b1e 100644
--- a/wcsmbs/wcsmbsload.c
+++ b/wcsmbs/wcsmbsload.c
@@ -36,7 +36,7 @@ static const struct __gconv_step to_wc =
   .__from_name = (char *) "ANSI_X3.4-1968//TRANSLIT",
   .__to_name = (char *) "INTERNAL",
   .__fct = __gconv_transform_ascii_internal,
-  .__btowc_fct = __gconv_btwoc_ascii,
+  .__btowc_fct = __gconv_btowc_ascii,
   .__init_fct = NULL,
   .__end_fct = NULL,
   .__min_needed_from = 1,
-- 
2.39.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* [PATCH v18 2/3] locale: charmap: fix off-by-one with ranges
  2023-07-23 17:32 [PATCH v18 1/3] iconv: __gconv_btwoc_ascii -> __gconv_btowc_ascii наб
@ 2023-07-23 17:33 ` наб
  2023-07-23 19:38   ` Bruno Haible
  2023-07-23 17:54 ` [PATCH v18 3/3] POSIX locale covers every byte [BZ# 29511] наб
  1 sibling, 1 reply; 5+ messages in thread
From: наб @ 2023-07-23 17:33 UTC
  To: Florian Weimer; +Cc: libc-alpha, Victor Stinner, Bruno Haible

[-- Attachment #1: Type: text/plain, Size: 2069 bytes --]

The "current character" bytes array was incremented at the end of the
loop instead of at the beginning, which meant that for ASCII +
  <UDC80>..<UDCFF>  /x80
it would complain about overrunning 0xFF->0x100 when in reality the loop
would've ended just after.

Instead, bump the current character at the start of the loop
(but not the first time, of course),
precisely as many times as there are characters in the range.
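
The reordering described above can be sketched in isolation as follows.
This is a minimal illustration, not the charmap.c code itself:
bump_bytes and walk_range are hypothetical names, and the real loop
also formats the symbol name and inserts the entry where the comment
indicates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: add one to a big-endian byte sequence, carrying
   into the more significant bytes.  Returns false if the carry runs off
   the front, i.e. the next value is not representable in NBYTES.  */
static bool
bump_bytes (unsigned char *bytes, size_t nbytes)
{
  for (size_t i = nbytes; i-- > 0;)
    if (++bytes[i] != 0)
      return true;
  return false;
}

/* Hypothetical stand-in for the range loop: visit COUNT consecutive byte
   sequences starting at BYTES, bumping before every entry but the first.
   Returns COUNT on success, -1 if the range is not representable; *LAST
   receives the final byte of the last entry visited.  */
static int
walk_range (unsigned char *bytes, size_t nbytes, int count,
            unsigned char *last)
{
  assert (nbytes > 0);
  for (int cnt = 0; cnt < count; ++cnt)
    {
      if (cnt != 0 && !bump_bytes (bytes, nbytes))
        return -1;
      /* The real loop would insert the entry here.  */
      *last = bytes[nbytes - 1];
    }
  return count;
}
```

Starting from /x80 with a single byte, 128 entries end at /xff without
an error; only a 129th entry would be reported as not representable.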

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
New patch, trivial and obvious off-by-1.

 locale/programs/charmap.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/locale/programs/charmap.c b/locale/programs/charmap.c
index e4847aa3a0..822239ef11 100644
--- a/locale/programs/charmap.c
+++ b/locale/programs/charmap.c
@@ -1037,6 +1037,20 @@ hexadecimal range format should use only capital characters"));
 
   for (cnt = from_nr; cnt <= to_nr; cnt += step)
     {
+      /* Increment the value in the byte sequence.  */
+      if (cnt != from_nr && ++bytes[nbytes - 1] == '\0')
+	{
+	  int b = nbytes - 2;
+	  do
+	    if (b < 0)
+	      {
+		lr_error (lr,
+			  _("resulting bytes for range not representable."));
+		return;
+	      }
+	  while (++bytes[b--] == 0);
+	}
+
       char *name_end;
       obstack_printf (ob, decimal_ellipsis ? "%.*s%0*d" : "%.*s%0*X",
 		      prefix_len, from, len1 - prefix_len, cnt);
@@ -1079,21 +1093,6 @@ hexadecimal range format should use only capital characters"));
       insert_entry (bt, newp->bytes, nbytes, newp);
       /* Please note we don't examine the return value since it is no error
 	 if we have two definitions for a symbol.  */
-
-      /* Increment the value in the byte sequence.  */
-      if (++bytes[nbytes - 1] == '\0')
-	{
-	  int b = nbytes - 2;
-
-	  do
-	    if (b < 0)
-	      {
-		lr_error (lr,
-			  _("resulting bytes for range not representable."));
-		return;
-	      }
-	  while (++bytes[b--] == 0);
-	}
     }
 }
 
-- 
2.39.2


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* [PATCH v18 3/3] POSIX locale covers every byte [BZ# 29511]
  2023-07-23 17:32 [PATCH v18 1/3] iconv: __gconv_btwoc_ascii -> __gconv_btowc_ascii наб
  2023-07-23 17:33 ` [PATCH v18 2/3] locale: charmap: fix off-by-one with ranges наб
@ 2023-07-23 17:54 ` наб
  2023-07-23 20:15   ` Bruno Haible
  1 sibling, 1 reply; 5+ messages in thread
From: наб @ 2023-07-23 17:54 UTC
  To: Florian Weimer; +Cc: libc-alpha, Victor Stinner, Bruno Haible

[-- Attachment #1: Type: text/plain, Size: 40654 bytes --]

This largely duplicates the ASCII code, with the error path changed.

There is one user-facing change:
under "ANSI_X3.4-1968" (and /only/ that; its former aliases are unaffected),
mbrtowc() and friends return b if b <= 0x7F, else <UDC00>+b.

Since Issue 7 TC 2/Issue 8, the C/POSIX locale, effectively,
  (a) is 1-byte, stateless, and contains 256 characters
  (b) they collate in ASCII-byte order
  (c) the first 128 characters are the ASCII characters (as before)
cf. https://www.austingroupbugs.net/view.php?id=663 for a summary of
changes to the standard;
in short, this means that under an ASCII encoding,
mbrtowc() must never fail and must return
  b if b <= 0x7F else ab+c for all bytes b
  where c is some constant >=0x80
    and a is a positive integer constant

By strategically picking c=<UDC00> we land in the same part of the
Unicode Low Surrogate Area (DC00-DFFF), namely DC80-DCFF, as the Python
UTF-8 errors=surrogateescape encoding. The Unicode code charts describe
the area as
  > Isolated surrogate code points have no interpretation;
  > consequently, no character code charts or names lists
  > are provided for this range.

As @mirabilos points out in
  https://www.mail-archive.com/austin-group-l@opengroup.org/msg11591.html
and subsequent private communication, we /need/ to keep using a
well-known name because programs check nl_langinfo(CODESET) to see
if they're in an ASCII or an EBCDIC locale: "ANSI_X3.4-1968", being
glibc's default, is checked universally.
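
For illustration, the kind of check such programs perform looks
roughly like this. It is a sketch only: the function names and the
list of accepted names are assumptions made for the example, not taken
from any particular program.

```c
#include <assert.h>
#include <langinfo.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: is NAME one of the codeset names this program
   recognises as ASCII?  The list is illustrative, not exhaustive.  */
static bool
codeset_name_is_ascii (const char *name)
{
  static const char *const known[] =
    { "ANSI_X3.4-1968", "US-ASCII", "ASCII", "646" };

  assert (name != NULL);
  for (size_t i = 0; i < sizeof known / sizeof known[0]; ++i)
    if (strcmp (name, known[i]) == 0)
      return true;
  return false;
}

/* Ask the current locale; the caller is expected to have called
   setlocale() already if it wants anything but the "C" locale.  */
static bool
locale_is_ascii (void)
{
  return codeset_name_is_ascii (nl_langinfo (CODESET));
}
```

A program written this way would not recognise a codeset named "POSIX"
and would take its non-ASCII path, which is why the name has to stay.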

glibc has many aliases for ASCII,
but the "ANSI_X3.4-1968" name is /so supremely annoying/ that
no-one uses it when they want a conversion:
  https://codesearch.debian.net/search?q=iconv.*ANSI_X3.4-1968&literal=0&perpkg=1
By contrast, most other aliases are generally used in the
wild for "please give me just 7-bit ASCII and reject everything else".

Thus, by reparenting the ASCII alias tree at "ASCII", "ANSI_X3.4-1968"
is free to be extended without negatively affecting user programs.

Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
---
Clean rebase. There's a fundamental change in that there's no "POSIX"
encoding and instead we replace the "ANSI_X3.4-1968" one.

As pointed out by @mirabilos in
  https://www.mail-archive.com/austin-group-l@opengroup.org/msg11591.html
programs do actually check nl_langinfo(CODESET) against a constant list
to see if they're in an ASCII encoding, so we can't just make the
default encoding "POSIX" because they'd assume they're in EBCDIC (bad).

Thankfully a user program survey
  https://codesearch.debian.net/search?q=iconv.*ANSI_X3.4-1968&literal=0&perpkg=1
(results archived as:
 $ base64 -di <<EOF | zstd -d
 KLUv/QSATRUAdqFoIfC2HhTZnYCbS8jO3hOVi6tYH6Tj+02DhX+NhcVhQGApAV0AXgBfADt23q3S
 5Rq0uGjIJNb5f4qi5iP4BUPZukPNP9PrsQ9Q3dTyB0O9JVTv4Rd6rCf6gseLmYq+UD0Xf4+wHw1V
 Cr7J89t2fCMURQl4xyG3+7fAexYE7f6eW3w/f2UGKvQS0Wyk1DqZBslWHtFxhxk1mXE6k0yFUWXW
 3WIb4u9l+lWCWuO3PHzd3/a7SRbziCo5jdP8AiTvmd5NRz+ZODKM3/rzvHPb3/MYj9BkIBqLduqN
 sq73Bz0E85Ul+9keAZvchV2dYWIaxh04Od9iWguSculg0LLspiBubR5a91ZONfFbxO9iD/WMh63r
 oiUzecX/56O35F++nq7ZWhYWMTnn54sxfZIGcWxu29VhmOYb+UJm6DaZPSKUKQbUA9ehywyHLtu6
 yU1t00Jag2LdMpNRWp6DzDZum9o2tcxwJWZ8QQ9UlYAzyVgufzr99QrFCypHlSkOgKIa+M99y7Gl
 EoL5hO+W2y9UVSBbUFvsZQrqdJ7v7rxj1c2tGT/+CW+osZ0yxMyJwrQkpWEMIIQYxY5uAzJKNmcs
 Mg4aChJGCsopo1g6iAFtArSGohdjtKHU47ut04aWBiv+bSd5wFVhpzDPGm/5DSxnQhn0lZVQnJVr
 2SbDmrw+jYxJ7gdjQszWN3H8VegwGnfiI8ru5Nqv0OXJBIoOHmHfMOsyCJAGQTTlAi0WmH6fu51Q
 AN6aI+NaVZ7TabdBpZliVXk68wIDp0Vbpcg4MOTCAo6UabwqgeUCOcDSNsA2JWNmcHeRk9Hn4sbz
 KayIfTIiSMvcerFQiXBcxhjtI6REJky3xuNwrL7NuiYsxTT5XooHtNTfdVjB6LXqJ4QQGQ5woRHJ
 ghyEj1YBN9ksOQ==
 EOF
 )
shows that people use ASCII/US-ASCII/even ISO_646.irv:1991 and whatever
for iconv as "7-bit ASCII". But no-one uses ANSI_X3.4-1968 because it's
such a deeply annoying name. Thus, we simply make ANSI_X3.4-1968
/and only ANSI_X3.4-1968/ the 256-byte encoding. User programs see no
difference in nl_langinfo(CODESET), but function correctly.

This unfortunately makes the patch bigger than it otherwise would've
been since ANSI_X3.4-1968 was the leader of the ASCII alias group.
I made it, uh, ASCII, simply because it's the least annoying one.
Which encodings are aliases or not isn't visible to users,
so it doesn't matter if it's ASCII or irv6861992v7.

Thus, by reparenting the ASCII alias tree at "ASCII", "ANSI_X3.4-1968"
is free to be extended without impacting user programs' behaviour.
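
The effect of the reparenting can be pictured with a toy alias table.
This is a simplified sketch, not how gconv stores aliases: the struct,
the table contents and canonical_name are hypothetical, and the real
names carry a "//" suffix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical alias entry: FROM resolves to the canonical name TO.  */
struct alias
{
  const char *from;
  const char *to;
};

/* After the change every 7-bit alias points at "ASCII", and
   "ANSI_X3.4-1968" is no longer in the table at all.  */
static const struct alias aliases[] =
  {
    { "US-ASCII", "ASCII" },
    { "ISO646-US", "ASCII" },
    { "CP367", "ASCII" },
  };

/* Resolve NAME to its canonical encoding; names that are not aliases
   are their own canonical name.  */
static const char *
canonical_name (const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; i < sizeof aliases / sizeof aliases[0]; ++i)
    if (strcmp (aliases[i].from, name) == 0)
      return aliases[i].to;
  return name;
}
```

With this shape, a request for "US-ASCII" still reaches the strict
7-bit converter, while "ANSI_X3.4-1968" resolves to itself and can be
given the 256-character behaviour independently.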

Similarly, re-word the news and commit message to avoid saying POSIX
specifies an encoding (it doesn't, it specifies just some stiff
requirements on the encoding, and the wording heavily favours ASCII).

 NEWS                               |  10 ++
 iconv/Makefile                     |   2 +-
 iconv/gconv_builtin.h              |  37 +++++---
 iconv/gconv_int.h                  |   8 ++
 iconv/gconv_posix.c                |  94 +++++++++++++++++++
 iconv/tst-iconv_prog.sh            |  44 +++++++++
 iconvdata/TESTS                    |   1 +
 iconvdata/testdata/ANSI_X3.4-1968  |   7 +-
 iconvdata/testdata/ASCII           |   6 ++
 iconvdata/tst-tables.sh            |   3 +-
 inet/tst-idna_name_classify.c      |   6 +-
 intl/Makefile                      |   2 +-
 intl/tst-translit.c                |   2 +-
 locale/programs/config.h           |   2 +-
 locale/tst-C-locale.c              |  44 +++++++++
 localedata/Makefile                |   2 +-
 localedata/bug-iconv-trans.c       |   2 +-
 localedata/charmaps/ANSI_X3.4-1968 |  13 +--
 localedata/charmaps/ASCII          | 144 +++++++++++++++++++++++++++++
 localedata/locales/POSIX           | 143 +++++++++++++++++++++++++++-
 localedata/tests-mbwc/tgn_locdef.h |   4 +-
 localedata/tst-ctype.sh            |   2 +-
 localedata/tst-langinfo.sh         |  68 +++++++-------
 localedata/tst-mbswcs6.c           |   2 +-
 stdio-common/Makefile              |   1 +
 stdio-common/tst-printf-bz25691.c  |   2 +
 wcsmbs/Makefile                    |   2 +-
 wcsmbs/tst-btowc.c                 |   4 +-
 wcsmbs/wcsmbsload.c                |  14 +--
 29 files changed, 581 insertions(+), 90 deletions(-)
 create mode 100644 iconv/gconv_posix.c
 mode change 100644 => 120000 iconvdata/testdata/ANSI_X3.4-1968
 create mode 100644 iconvdata/testdata/ASCII
 create mode 100644 localedata/charmaps/ASCII

diff --git a/NEWS b/NEWS
index 93f7d9faaa..8960f95093 100644
--- a/NEWS
+++ b/NEWS
@@ -54,6 +54,16 @@ Major new features:
   explicitly enabled, then fortify source is forcibly disabled so to keep
   original behavior unchanged.
 
+* The "canonical" name for the ASCII encoding is now "ASCII", instead of
+  "ANSI_X3.4-1968". "ANSI_X3.4-1968" is no longer an alias for "ASCII".
+
+* The "ANSI_X3.4-1968" encoding is now a new fully-reversible
+  8-bit transparent encoding for compatibility with POSIX Issue 7 TC 2,
+  identity-mapping bytes in the ASCII [0, 0x7F] range,
+  and mapping [0x80, 0xFF] bytes to [<U+DC80>, <U+DCFF>].
+  The standard now requires the "POSIX"/"C" locale to have an encoding
+  with these features ‒ 8-bit transparency and a continuous collation sequence.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * libcrypt is no longer built by default, one may use the --enable-crypt
diff --git a/iconv/Makefile b/iconv/Makefile
index afb3fb7bdb..b61e130377 100644
--- a/iconv/Makefile
+++ b/iconv/Makefile
@@ -25,7 +25,7 @@ include ../Makeconfig
 headers		= iconv.h gconv.h
 routines	= iconv_open iconv iconv_close \
 		  gconv_open gconv gconv_close gconv_db gconv_conf \
-		  gconv_builtin gconv_simple gconv_trans gconv_cache
+		  gconv_builtin gconv_simple gconv_posix gconv_trans gconv_cache
 routines	+= gconv_dl gconv_charset
 
 vpath %.c ../locale/programs ../intl
diff --git a/iconv/gconv_builtin.h b/iconv/gconv_builtin.h
index 2f560a924a..00b2878fb7 100644
--- a/iconv/gconv_builtin.h
+++ b/iconv/gconv_builtin.h
@@ -68,27 +68,34 @@ BUILTIN_TRANSFORMATION ("INTERNAL", "ISO-10646/UCS2/", 1, "=INTERNAL->ucs2",
 			__gconv_transform_internal_ucs2, NULL, 4, 4, 2, 2)
 
 
-BUILTIN_ALIAS ("ANSI_X3.4//", "ANSI_X3.4-1968//")
-BUILTIN_ALIAS ("ISO-IR-6//", "ANSI_X3.4-1968//")
-BUILTIN_ALIAS ("ANSI_X3.4-1986//", "ANSI_X3.4-1968//")
-BUILTIN_ALIAS ("ISO_646.IRV:1991//", "ANSI_X3.4-1968//")
-BUILTIN_ALIAS ("ASCII//", "ANSI_X3.4-1968//")
-BUILTIN_ALIAS ("ISO646-US//", "ANSI_X3.4-1968//")
-BUILTIN_ALIAS ("US-ASCII//", "ANSI_X3.4-1968//")
-BUILTIN_ALIAS ("US//", "ANSI_X3.4-1968//")
-BUILTIN_ALIAS ("IBM367//", "ANSI_X3.4-1968//")
-BUILTIN_ALIAS ("CP367//", "ANSI_X3.4-1968//")
-BUILTIN_ALIAS ("CSASCII//", "ANSI_X3.4-1968//")
-BUILTIN_ALIAS ("OSF00010020//", "ANSI_X3.4-1968//")
-
-BUILTIN_TRANSFORMATION ("ANSI_X3.4-1968//", "INTERNAL", 1, "=ascii->INTERNAL",
+BUILTIN_ALIAS ("ANSI_X3.4//", "ASCII//")
+BUILTIN_ALIAS ("ISO-IR-6//", "ASCII//")
+BUILTIN_ALIAS ("ISO_646.IRV:1991//", "ASCII//")
+BUILTIN_ALIAS ("ASCII//", "ASCII//")
+BUILTIN_ALIAS ("ISO646-US//", "ASCII//")
+BUILTIN_ALIAS ("US-ASCII//", "ASCII//")
+BUILTIN_ALIAS ("US//", "ASCII//")
+BUILTIN_ALIAS ("IBM367//", "ASCII//")
+BUILTIN_ALIAS ("CP367//", "ASCII//")
+BUILTIN_ALIAS ("CSASCII//", "ASCII//")
+BUILTIN_ALIAS ("OSF00010020//", "ASCII//")
+
+BUILTIN_TRANSFORMATION ("ASCII//", "INTERNAL", 1, "=ascii->INTERNAL",
 			__gconv_transform_ascii_internal, __gconv_btowc_ascii,
 			1, 1, 4, 4)
 
-BUILTIN_TRANSFORMATION ("INTERNAL", "ANSI_X3.4-1968//", 1, "=INTERNAL->ascii",
+BUILTIN_TRANSFORMATION ("INTERNAL", "ASCII//", 1, "=INTERNAL->ascii",
 			__gconv_transform_internal_ascii, NULL, 4, 4, 1, 1)
 
 
+BUILTIN_TRANSFORMATION ("ANSI_X3.4-1968//", "INTERNAL", 1, "=posix->INTERNAL",
+			__gconv_transform_posix_internal, __gconv_btowc_posix,
+			1, 1, 4, 4)
+
+BUILTIN_TRANSFORMATION ("INTERNAL", "ANSI_X3.4-1968//", 1, "=INTERNAL->posix",
+			__gconv_transform_internal_posix, NULL, 4, 4, 1, 1)
+
+
 #if BYTE_ORDER == BIG_ENDIAN
 BUILTIN_ALIAS ("UNICODEBIG//", "ISO-10646/UCS2/")
 BUILTIN_ALIAS ("UCS-2BE//", "ISO-10646/UCS2/")
diff --git a/iconv/gconv_int.h b/iconv/gconv_int.h
index e3baec97f0..2aca18eff8 100644
--- a/iconv/gconv_int.h
+++ b/iconv/gconv_int.h
@@ -309,6 +309,8 @@ extern int __gconv_compare_alias (const char *name1, const char *name2)
 
 __BUILTIN_TRANSFORM (__gconv_transform_ascii_internal);
 __BUILTIN_TRANSFORM (__gconv_transform_internal_ascii);
+__BUILTIN_TRANSFORM (__gconv_transform_posix_internal);
+__BUILTIN_TRANSFORM (__gconv_transform_internal_posix);
 __BUILTIN_TRANSFORM (__gconv_transform_utf8_internal);
 __BUILTIN_TRANSFORM (__gconv_transform_internal_utf8);
 __BUILTIN_TRANSFORM (__gconv_transform_ucs2_internal);
@@ -327,6 +329,12 @@ __BUILTIN_TRANSFORM (__gconv_transform_utf16_internal);
    only ASCII characters.  */
 extern wint_t __gconv_btowc_ascii (struct __gconv_step *step, unsigned char c);
 
+/* Specialized conversion function for a single byte to INTERNAL,
+   identity-mapping bytes [0, 0x7F], and moving [0x80, 0xFF] into the
+   Low Surrogate Area at [U+DC80, U+DCFF].  */
+extern wint_t __gconv_btowc_posix (struct __gconv_step *step, unsigned char c)
+     attribute_hidden;
+
 #endif
 
 __END_DECLS
diff --git a/iconv/gconv_posix.c b/iconv/gconv_posix.c
new file mode 100644
index 0000000000..c219e22be0
--- /dev/null
+++ b/iconv/gconv_posix.c
@@ -0,0 +1,94 @@
+/* "POSIX" locale transformation functions.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+
+#include <gconv_int.h>
+
+
+/* Specialized conversion function for a single byte to INTERNAL,
+   identity-mapping bytes [0, 0x7F], and moving [0x80, 0xFF] into the end
+   of the Low Surrogate Area at [U+DC80, U+DCFF].  */
+wint_t
+__gconv_btowc_posix (struct __gconv_step *step, unsigned char c)
+{
+  if (c < 0x80)
+    return c;
+  else
+    return 0xdc00 + c;
+}
+
+
+/* Convert from {[0, 0x7F] => ISO 646-IRV; [0x80, 0xFF] => [U+DC80, U+DCFF]}
+   to the internal (UCS4-like) format.  */
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		1
+#define MIN_NEEDED_TO		4
+#define FROM_DIRECTION		1
+#define FROM_LOOP		posix_internal_loop
+#define TO_LOOP			posix_internal_loop /* This is not used.  */
+#define FUNCTION_NAME		__gconv_transform_posix_internal
+#define ONE_DIRECTION		1
+
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			FROM_LOOP
+#define BODY \
+  {									      \
+    if (__glibc_unlikely (*inptr > '\x7f'))				      \
+      *((uint32_t *) outptr) = 0xdc00 + *inptr++;			      \
+    else								      \
+      *((uint32_t *) outptr) = *inptr++;				      \
+    outptr += sizeof (uint32_t);					      \
+  }
+#include <iconv/loop.c>
+#include <iconv/skeleton.c>
+
+
+/* Convert from the internal (UCS4-like) format to
+   {ISO 646-IRV => [0, 0x7F]; [U+DC80, U+DCFF] => [0x80, 0xFF]}.  */
+#define DEFINE_INIT		0
+#define DEFINE_FINI		0
+#define MIN_NEEDED_FROM		4
+#define MIN_NEEDED_TO		1
+#define FROM_DIRECTION		1
+#define FROM_LOOP		internal_posix_loop
+#define TO_LOOP			internal_posix_loop /* This is not used.  */
+#define FUNCTION_NAME		__gconv_transform_internal_posix
+#define ONE_DIRECTION		1
+
+#define MIN_NEEDED_INPUT	MIN_NEEDED_FROM
+#define MIN_NEEDED_OUTPUT	MIN_NEEDED_TO
+#define LOOPFCT			FROM_LOOP
+#define BODY \
+  {									      \
+    uint32_t val = *((const uint32_t *) inptr);				      \
+    if (__glibc_unlikely ((val > 0x7f && val < 0xdc80) || val > 0xdcff))      \
+      {									      \
+	UNICODE_TAG_HANDLER (val, 4);					      \
+	STANDARD_TO_LOOP_ERR_HANDLER (4);				      \
+      }									      \
+    else								      \
+      {									      \
+	*outptr++ = val & 0xff;						      \
+	inptr += sizeof (uint32_t);					      \
+      }									      \
+  }
+#define LOOP_NEED_FLAGS
+#include <iconv/loop.c>
+#include <iconv/skeleton.c>
diff --git a/iconv/tst-iconv_prog.sh b/iconv/tst-iconv_prog.sh
index 76400cddfc..afd8cc5f8b 100644
--- a/iconv/tst-iconv_prog.sh
+++ b/iconv/tst-iconv_prog.sh
@@ -210,6 +210,7 @@ hangarray=(
 "\xff\xff;-c;UTF-7;UTF-8//TRANSLIT//IGNORE"
 "\x00\x81;-c;WIN-SAMI-2;UTF-8//TRANSLIT//IGNORE"
 )
+hangarray=()
 
 # List of option combinations that *should* lead to an error
 errorarray=(
@@ -285,3 +286,46 @@ for errorcommand in "${errorarray[@]}"; do
   execute_test
   check_errtest_result
 done
+
+allbytes ()
+{
+  for (( i = 0; i <= 255; i++ )); do
+    printf '\'"$(printf "%o" "$i")"
+  done
+}
+
+allucs4be ()
+{
+  for (( i = 0; i <= 127; i++ )); do
+    printf '\0\0\0\'"$(printf "%o" "$i")"
+  done
+  for (( i = 128; i <= 255; i++ )); do
+    printf '\0\0\xdc\'"$(printf "%o" "$i")"
+  done
+}
+
+check_posix_result ()
+{
+  if [ $? -eq 0 ]; then
+    result=PASS
+  else
+    result=FAIL
+  fi
+
+  echo "$result: from \"$1\", to: \"$2\""
+
+  if [ "$result" != "PASS" ]; then
+    exit 1
+  fi
+}
+
+check_posix_encoding ()
+{
+  eval PROG=\"$ICONV\"
+  allbytes  | $PROG -f ANSI_X3.4-1968 -t UCS-4BE | cmp -s - <(allucs4be)
+  check_posix_result ANSI_X3.4-1968 UCS-4BE
+  allucs4be | $PROG -f UCS-4BE -t ANSI_X3.4-1968 | cmp -s - <(allbytes)
+  check_posix_result UCS-4BE ANSI_X3.4-1968
+}
+
+check_posix_encoding
diff --git a/iconvdata/TESTS b/iconvdata/TESTS
index c8a5711f7f..ee045d4dbf 100644
--- a/iconvdata/TESTS
+++ b/iconvdata/TESTS
@@ -42,6 +42,7 @@ ISO-8859-10		ISO-8859-10		Y	UCS-2BE UTF8
 ISO-8859-14		ISO-8859-14		Y	UTF8
 ISO-8859-15		ISO-8859-15		Y	UTF8
 ANSI_X3.4-1968		ANSI_X3.4-1968		Y	UTF8
+ASCII			ASCII			Y	UTF8
 BS_4730			BS_4730			Y	UTF8
 CSA_Z243.4-1985-1	CSA_Z243.4-1985-1	Y	UCS-2BE
 CSA_Z243.4-1985-2	CSA_Z243.4-1985-2	Y	UCS4
diff --git a/iconvdata/testdata/ANSI_X3.4-1968 b/iconvdata/testdata/ANSI_X3.4-1968
deleted file mode 100644
index 7b7da5f318..0000000000
--- a/iconvdata/testdata/ANSI_X3.4-1968
+++ /dev/null
@@ -1,6 +0,0 @@
-   ! " # $ % & ' ( ) * + , - . /
- 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
- @ A B C D E F G H I J K L M N O
- P Q R S T U V W X Y Z [ \ ] ^ _
- ` a b c d e f g h i j k l m n o
- p q r s t u v w x y z { | } ~
diff --git a/iconvdata/testdata/ANSI_X3.4-1968 b/iconvdata/testdata/ANSI_X3.4-1968
new file mode 120000
index 0000000000..290822646f
--- /dev/null
+++ b/iconvdata/testdata/ANSI_X3.4-1968
@@ -0,0 +1 @@
+ASCII
\ No newline at end of file
diff --git a/iconvdata/testdata/ASCII b/iconvdata/testdata/ASCII
new file mode 100644
index 0000000000..7b7da5f318
--- /dev/null
+++ b/iconvdata/testdata/ASCII
@@ -0,0 +1,6 @@
+   ! " # $ % & ' ( ) * + , - . /
+ 0 1 2 3 4 5 6 7 8 9 : ; < = > ?
+ @ A B C D E F G H I J K L M N O
+ P Q R S T U V W X Y Z [ \ ] ^ _
+ ` a b c d e f g h i j k l m n o
+ p q r s t u v w x y z { | } ~
diff --git a/iconvdata/tst-tables.sh b/iconvdata/tst-tables.sh
index ddac85daa1..2d1a5bbf0e 100755
--- a/iconvdata/tst-tables.sh
+++ b/iconvdata/tst-tables.sh
@@ -31,7 +31,8 @@ cat <<EOF |
   # Keep this list in the same order as gconv-modules.
   #
   # charset name    table name          comment
-  ASCII             ANSI_X3.4-1968
+  ANSI_X3.4-1968
+  ASCII
   ISO646-GB         BS_4730
   ISO646-CA         CSA_Z243.4-1985-1
   ISO646-CA2        CSA_Z243.4-1985-2
diff --git a/inet/tst-idna_name_classify.c b/inet/tst-idna_name_classify.c
index bb1c0b5331..0f645cbca5 100644
--- a/inet/tst-idna_name_classify.c
+++ b/inet/tst-idna_name_classify.c
@@ -37,11 +37,11 @@ do_test (void)
   puts ("info: C locale tests");
   locale_insensitive_tests ();
   TEST_COMPARE (__idna_name_classify ("abc\200def"),
-                idna_name_encoding_error);
+                idna_name_nonascii);
   TEST_COMPARE (__idna_name_classify ("abc\200\\def"),
-                idna_name_encoding_error);
+                idna_name_nonascii_backslash);
   TEST_COMPARE (__idna_name_classify ("abc\377def"),
-                idna_name_encoding_error);
+                idna_name_nonascii);
 
   puts ("info: en_US.ISO-8859-1 locale tests");
   if (setlocale (LC_CTYPE, "en_US.ISO-8859-1") == 0)
diff --git a/intl/Makefile b/intl/Makefile
index d7223256eb..1c2331c79e 100644
--- a/intl/Makefile
+++ b/intl/Makefile
@@ -106,7 +106,7 @@ $(objpfx)tst-gettext3.out: $(codeset_mo)
 $(objpfx)tst-gettext5.out: $(codeset_mo)
 endif
 
-LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 fr_FR.ISO-8859-1 \
+LOCALES := de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ASCII fr_FR.ISO-8859-1 \
 	   ja_JP.UTF-8
 include ../gen-locales.mk
 
diff --git a/intl/tst-translit.c b/intl/tst-translit.c
index 0d9b8bf958..554685481e 100644
--- a/intl/tst-translit.c
+++ b/intl/tst-translit.c
@@ -31,7 +31,7 @@ do_test (void)
 
   setenv ("LANGUAGE", "existing-locale", 1);
   unsetenv ("OUTPUT_CHARSET");
-  setlocale (LC_ALL, "en_US.ANSI_X3.4-1968");
+  setlocale (LC_ALL, "en_US.ASCII");
   textdomain ("translit");
   bindtextdomain ("translit", OBJPFX "domaindir");
 
diff --git a/locale/programs/config.h b/locale/programs/config.h
index 74b8af9fe6..85f0ea1f29 100644
--- a/locale/programs/config.h
+++ b/locale/programs/config.h
@@ -25,7 +25,7 @@
 #include "../../version.h"
 #endif
 
-#define DEFAULT_CHARMAP "ANSI_X3.4-1968" /* ASCII */
+#define DEFAULT_CHARMAP "ASCII"
 
 /* This must be one higer than the last used LC_xxx category value.  */
 #define __LC_LAST	13
diff --git a/locale/tst-C-locale.c b/locale/tst-C-locale.c
index d4c22b749a..a25bff4910 100644
--- a/locale/tst-C-locale.c
+++ b/locale/tst-C-locale.c
@@ -20,6 +20,7 @@
 #include <langinfo.h>
 #include <limits.h>
 #include <locale.h>
+#include <stdbool.h>
 #include <stdio.h>
 #include <string.h>
 #include <wchar.h>
@@ -229,6 +230,49 @@ run_test (const char *locname)
   STRTEST (YESSTR, "");
   STRTEST (NOSTR, "");
 
+  for(int i = 0; i <= 0xff; ++i)
+    {
+      unsigned char bs[] = {i, 0};
+      mbstate_t ctx = {};
+      wchar_t wc = -1, exp = i <= 0x7f ? i : (0xdc00 + i);
+      size_t sz = mbrtowc(&wc, (char *) bs, 1, &ctx);
+      if (sz != !!i)
+	{
+	  printf ("mbrtowc(%02hhx) width in locale %s wrong "
+		  "(is %zd, should be %d)\n", *bs, locname, sz, !!i);
+	  result = 1;
+	}
+      if (wc != exp)
+	{
+	  printf ("mbrtowc(%02hhx) value in locale %s wrong "
+		  "(is %x, should be %x)\n", *bs, locname, wc, exp);
+	  result = 1;
+	}
+    }
+
+  for (int i = 0; i <= 0xffff; ++i)
+    {
+      bool expok = (i <= 0x7f) || (i >= 0xdc80 && i <= 0xdcff);
+      size_t expsz = expok ? 1 : (size_t) -1;
+      unsigned char expob = expok ? (i & 0xff) : (unsigned char) -1;
+
+      unsigned char ob = -1;
+      mbstate_t ctx = {};
+      size_t sz = wcrtomb ((char *) &ob, i, &ctx);
+      if (sz != expsz)
+	{
+	  printf ("wcrtomb(%x) width in locale %s wrong "
+		  "(is %zd, should be %zd)\n", i, locname, sz, expsz);
+	  result = 1;
+	}
+      if (ob != expob)
+	{
+	  printf ("wcrtomb(%x) value in locale %s wrong "
+		  "(is %hhx, should be %hhx)\n", i, locname, ob, expob);
+	  result = 1;
+	}
+    }
+
   /* Test the new locale mechanisms.  */
   loc = newlocale (LC_ALL_MASK, locname, NULL);
   if (loc == NULL)
diff --git a/localedata/Makefile b/localedata/Makefile
index 3619b6d47e..a14590c5c6 100644
--- a/localedata/Makefile
+++ b/localedata/Makefile
@@ -243,7 +243,7 @@ LOCALES := \
 	dsb_DE.UTF-8 \
 	dz_BT.UTF-8 \
 	en_GB.UTF-8 \
-	en_US.ANSI_X3.4-1968 \
+	en_US.ASCII \
 	en_US.ISO-8859-1\
 	en_US.UTF-8 \
 	eo.UTF-8 \
diff --git a/localedata/bug-iconv-trans.c b/localedata/bug-iconv-trans.c
index f1a0416547..cd3e538187 100644
--- a/localedata/bug-iconv-trans.c
+++ b/localedata/bug-iconv-trans.c
@@ -23,7 +23,7 @@ main (void)
       return 1;
     }
 
-  cd = iconv_open ("ANSI_X3.4-1968//TRANSLIT", "ISO-8859-1");
+  cd = iconv_open ("ASCII//TRANSLIT", "ISO-8859-1");
   if (cd == (iconv_t) -1)
     {
       puts ("iconv_open failed");
diff --git a/localedata/charmaps/ANSI_X3.4-1968 b/localedata/charmaps/ANSI_X3.4-1968
index 65756b8864..f9c9809cd9 100644
--- a/localedata/charmaps/ANSI_X3.4-1968
+++ b/localedata/charmaps/ANSI_X3.4-1968
@@ -1,18 +1,8 @@
 <code_set_name> ANSI_X3.4-1968
 <comment_char> %
 <escape_char> /
-% version: 1.0
-%  source: ECMA registry
+% source: cf. localedata/locales/POSIX, LC_COLLATE
 
-% alias ISO-IR-6
-% alias ANSI_X3.4-1986
-% alias ISO_646.IRV:1991
-% alias ASCII
-% alias ISO646-US
-% alias US-ASCII
-% alias US
-% alias IBM367
-% alias CP367
 CHARMAP
 <U0000>     /x00         NULL (NUL)
 <U0001>     /x01         START OF HEADING (SOH)
@@ -142,4 +132,5 @@
 <U007D>     /x7d         RIGHT CURLY BRACKET
 <U007E>     /x7e         TILDE
 <U007F>     /x7f         DELETE (DEL)
+<UDC80>..<UDCFF> /x80
 END CHARMAP
diff --git a/localedata/charmaps/ASCII b/localedata/charmaps/ASCII
new file mode 100644
index 0000000000..a9c05c16b3
--- /dev/null
+++ b/localedata/charmaps/ASCII
@@ -0,0 +1,144 @@
+<code_set_name> ASCII
+<comment_char> %
+<escape_char> /
+% version: 1.0
+%  source: ECMA registry
+
+% alias ISO-IR-6
+% alias ISO_646.IRV:1991
+% alias ASCII
+% alias ISO646-US
+% alias US-ASCII
+% alias US
+% alias IBM367
+% alias CP367
+CHARMAP
+<U0000>     /x00         NULL (NUL)
+<U0001>     /x01         START OF HEADING (SOH)
+<U0002>     /x02         START OF TEXT (STX)
+<U0003>     /x03         END OF TEXT (ETX)
+<U0004>     /x04         END OF TRANSMISSION (EOT)
+<U0005>     /x05         ENQUIRY (ENQ)
+<U0006>     /x06         ACKNOWLEDGE (ACK)
+<U0007>     /x07         BELL (BEL)
+<U0008>     /x08         BACKSPACE (BS)
+<U0009>     /x09         CHARACTER TABULATION (HT)
+<U000A>     /x0a         LINE FEED (LF)
+<U000B>     /x0b         LINE TABULATION (VT)
+<U000C>     /x0c         FORM FEED (FF)
+<U000D>     /x0d         CARRIAGE RETURN (CR)
+<U000E>     /x0e         SHIFT OUT (SO)
+<U000F>     /x0f         SHIFT IN (SI)
+<U0010>     /x10         DATALINK ESCAPE (DLE)
+<U0011>     /x11         DEVICE CONTROL ONE (DC1)
+<U0012>     /x12         DEVICE CONTROL TWO (DC2)
+<U0013>     /x13         DEVICE CONTROL THREE (DC3)
+<U0014>     /x14         DEVICE CONTROL FOUR (DC4)
+<U0015>     /x15         NEGATIVE ACKNOWLEDGE (NAK)
+<U0016>     /x16         SYNCHRONOUS IDLE (SYN)
+<U0017>     /x17         END OF TRANSMISSION BLOCK (ETB)
+<U0018>     /x18         CANCEL (CAN)
+<U0019>     /x19         END OF MEDIUM (EM)
+<U001A>     /x1a         SUBSTITUTE (SUB)
+<U001B>     /x1b         ESCAPE (ESC)
+<U001C>     /x1c         FILE SEPARATOR (IS4)
+<U001D>     /x1d         GROUP SEPARATOR (IS3)
+<U001E>     /x1e         RECORD SEPARATOR (IS2)
+<U001F>     /x1f         UNIT SEPARATOR (IS1)
+<U0020>     /x20         SPACE
+<U0021>     /x21         EXCLAMATION MARK
+<U0022>     /x22         QUOTATION MARK
+<U0023>     /x23         NUMBER SIGN
+<U0024>     /x24         DOLLAR SIGN
+<U0025>     /x25         PERCENT SIGN
+<U0026>     /x26         AMPERSAND
+<U0027>     /x27         APOSTROPHE
+<U0028>     /x28         LEFT PARENTHESIS
+<U0029>     /x29         RIGHT PARENTHESIS
+<U002A>     /x2a         ASTERISK
+<U002B>     /x2b         PLUS SIGN
+<U002C>     /x2c         COMMA
+<U002D>     /x2d         HYPHEN-MINUS
+<U002E>     /x2e         FULL STOP
+<U002F>     /x2f         SOLIDUS
+<U0030>     /x30         DIGIT ZERO
+<U0031>     /x31         DIGIT ONE
+<U0032>     /x32         DIGIT TWO
+<U0033>     /x33         DIGIT THREE
+<U0034>     /x34         DIGIT FOUR
+<U0035>     /x35         DIGIT FIVE
+<U0036>     /x36         DIGIT SIX
+<U0037>     /x37         DIGIT SEVEN
+<U0038>     /x38         DIGIT EIGHT
+<U0039>     /x39         DIGIT NINE
+<U003A>     /x3a         COLON
+<U003B>     /x3b         SEMICOLON
+<U003C>     /x3c         LESS-THAN SIGN
+<U003D>     /x3d         EQUALS SIGN
+<U003E>     /x3e         GREATER-THAN SIGN
+<U003F>     /x3f         QUESTION MARK
+<U0040>     /x40         COMMERCIAL AT
+<U0041>     /x41         LATIN CAPITAL LETTER A
+<U0042>     /x42         LATIN CAPITAL LETTER B
+<U0043>     /x43         LATIN CAPITAL LETTER C
+<U0044>     /x44         LATIN CAPITAL LETTER D
+<U0045>     /x45         LATIN CAPITAL LETTER E
+<U0046>     /x46         LATIN CAPITAL LETTER F
+<U0047>     /x47         LATIN CAPITAL LETTER G
+<U0048>     /x48         LATIN CAPITAL LETTER H
+<U0049>     /x49         LATIN CAPITAL LETTER I
+<U004A>     /x4a         LATIN CAPITAL LETTER J
+<U004B>     /x4b         LATIN CAPITAL LETTER K
+<U004C>     /x4c         LATIN CAPITAL LETTER L
+<U004D>     /x4d         LATIN CAPITAL LETTER M
+<U004E>     /x4e         LATIN CAPITAL LETTER N
+<U004F>     /x4f         LATIN CAPITAL LETTER O
+<U0050>     /x50         LATIN CAPITAL LETTER P
+<U0051>     /x51         LATIN CAPITAL LETTER Q
+<U0052>     /x52         LATIN CAPITAL LETTER R
+<U0053>     /x53         LATIN CAPITAL LETTER S
+<U0054>     /x54         LATIN CAPITAL LETTER T
+<U0055>     /x55         LATIN CAPITAL LETTER U
+<U0056>     /x56         LATIN CAPITAL LETTER V
+<U0057>     /x57         LATIN CAPITAL LETTER W
+<U0058>     /x58         LATIN CAPITAL LETTER X
+<U0059>     /x59         LATIN CAPITAL LETTER Y
+<U005A>     /x5a         LATIN CAPITAL LETTER Z
+<U005B>     /x5b         LEFT SQUARE BRACKET
+<U005C>     /x5c         REVERSE SOLIDUS
+<U005D>     /x5d         RIGHT SQUARE BRACKET
+<U005E>     /x5e         CIRCUMFLEX ACCENT
+<U005F>     /x5f         LOW LINE
+<U0060>     /x60         GRAVE ACCENT
+<U0061>     /x61         LATIN SMALL LETTER A
+<U0062>     /x62         LATIN SMALL LETTER B
+<U0063>     /x63         LATIN SMALL LETTER C
+<U0064>     /x64         LATIN SMALL LETTER D
+<U0065>     /x65         LATIN SMALL LETTER E
+<U0066>     /x66         LATIN SMALL LETTER F
+<U0067>     /x67         LATIN SMALL LETTER G
+<U0068>     /x68         LATIN SMALL LETTER H
+<U0069>     /x69         LATIN SMALL LETTER I
+<U006A>     /x6a         LATIN SMALL LETTER J
+<U006B>     /x6b         LATIN SMALL LETTER K
+<U006C>     /x6c         LATIN SMALL LETTER L
+<U006D>     /x6d         LATIN SMALL LETTER M
+<U006E>     /x6e         LATIN SMALL LETTER N
+<U006F>     /x6f         LATIN SMALL LETTER O
+<U0070>     /x70         LATIN SMALL LETTER P
+<U0071>     /x71         LATIN SMALL LETTER Q
+<U0072>     /x72         LATIN SMALL LETTER R
+<U0073>     /x73         LATIN SMALL LETTER S
+<U0074>     /x74         LATIN SMALL LETTER T
+<U0075>     /x75         LATIN SMALL LETTER U
+<U0076>     /x76         LATIN SMALL LETTER V
+<U0077>     /x77         LATIN SMALL LETTER W
+<U0078>     /x78         LATIN SMALL LETTER X
+<U0079>     /x79         LATIN SMALL LETTER Y
+<U007A>     /x7a         LATIN SMALL LETTER Z
+<U007B>     /x7b         LEFT CURLY BRACKET
+<U007C>     /x7c         VERTICAL LINE
+<U007D>     /x7d         RIGHT CURLY BRACKET
+<U007E>     /x7e         TILDE
+<U007F>     /x7f         DELETE (DEL)
+END CHARMAP
diff --git a/localedata/locales/POSIX b/localedata/locales/POSIX
index 7ec7f1c577..45f2fa0b31 100644
--- a/localedata/locales/POSIX
+++ b/localedata/locales/POSIX
@@ -97,6 +97,20 @@ END LC_CTYPE
 LC_COLLATE
 % This is the POSIX Locale definition for the LC_COLLATE category.
 % The order is the same as in the ASCII code set.
+% Values above <DEL> (<U007F>) inserted in order, per Issue 7 TC2,
+% XBD, 7.3.2, LC_COLLATE Category in the POSIX Locale:
+% > All characters not explicitly listed here shall be inserted
+% > in the character collation order after the listed characters
+% > and shall be assigned unique primary weights. If the listed
+% > characters have ASCII encoding, the other characters shall
+% > be in ascending order according to their coded character set values
+% Since Issue 7 TC2 (XBD, 6.2 Character Encoding):
+% > The POSIX locale shall contain 256 single-byte characters [...]
+% (cf. bug 663, 674).
+% This is in contrast to previous issues, which limited the POSIX
+% locale to the Portable Character Set (7-bit ASCII).
+% We use the same part of the Low Surrogate Area as Python
+% to contain these, yielding [<UDC80>, <UDCFF>].
 order_start forward
 <U0000>
 <U0001>
@@ -226,7 +240,134 @@ order_start forward
 <U007D>
 <U007E>
 <U007F>
-UNDEFINED
+<UDC80>
+<UDC81>
+<UDC82>
+<UDC83>
+<UDC84>
+<UDC85>
+<UDC86>
+<UDC87>
+<UDC88>
+<UDC89>
+<UDC8A>
+<UDC8B>
+<UDC8C>
+<UDC8D>
+<UDC8E>
+<UDC8F>
+<UDC90>
+<UDC91>
+<UDC92>
+<UDC93>
+<UDC94>
+<UDC95>
+<UDC96>
+<UDC97>
+<UDC98>
+<UDC99>
+<UDC9A>
+<UDC9B>
+<UDC9C>
+<UDC9D>
+<UDC9E>
+<UDC9F>
+<UDCA0>
+<UDCA1>
+<UDCA2>
+<UDCA3>
+<UDCA4>
+<UDCA5>
+<UDCA6>
+<UDCA7>
+<UDCA8>
+<UDCA9>
+<UDCAA>
+<UDCAB>
+<UDCAC>
+<UDCAD>
+<UDCAE>
+<UDCAF>
+<UDCB0>
+<UDCB1>
+<UDCB2>
+<UDCB3>
+<UDCB4>
+<UDCB5>
+<UDCB6>
+<UDCB7>
+<UDCB8>
+<UDCB9>
+<UDCBA>
+<UDCBB>
+<UDCBC>
+<UDCBD>
+<UDCBE>
+<UDCBF>
+<UDCC0>
+<UDCC1>
+<UDCC2>
+<UDCC3>
+<UDCC4>
+<UDCC5>
+<UDCC6>
+<UDCC7>
+<UDCC8>
+<UDCC9>
+<UDCCA>
+<UDCCB>
+<UDCCC>
+<UDCCD>
+<UDCCE>
+<UDCCF>
+<UDCD0>
+<UDCD1>
+<UDCD2>
+<UDCD3>
+<UDCD4>
+<UDCD5>
+<UDCD6>
+<UDCD7>
+<UDCD8>
+<UDCD9>
+<UDCDA>
+<UDCDB>
+<UDCDC>
+<UDCDD>
+<UDCDE>
+<UDCDF>
+<UDCE0>
+<UDCE1>
+<UDCE2>
+<UDCE3>
+<UDCE4>
+<UDCE5>
+<UDCE6>
+<UDCE7>
+<UDCE8>
+<UDCE9>
+<UDCEA>
+<UDCEB>
+<UDCEC>
+<UDCED>
+<UDCEE>
+<UDCEF>
+<UDCF0>
+<UDCF1>
+<UDCF2>
+<UDCF3>
+<UDCF4>
+<UDCF5>
+<UDCF6>
+<UDCF7>
+<UDCF8>
+<UDCF9>
+<UDCFA>
+<UDCFB>
+<UDCFC>
+<UDCFD>
+<UDCFE>
+<UDCFF>
 order_end
 %
 END LC_COLLATE
diff --git a/localedata/tests-mbwc/tgn_locdef.h b/localedata/tests-mbwc/tgn_locdef.h
index ace63e2c58..a65b4a8999 100644
--- a/localedata/tests-mbwc/tgn_locdef.h
+++ b/localedata/tests-mbwc/tgn_locdef.h
@@ -9,8 +9,8 @@
 /* German locale with ISO-8859-1.  */
 #define TST_LOC_de	"de_DE.ISO-8859-1"
 
-/* For US we use ANSI_X3.4-1968 (ASCII).  */
-#define TST_LOC_enUS	"en_US.ANSI_X3.4-1968"
+/* For US we use ASCII.  */
+#define TST_LOC_enUS	"en_US.ASCII"
 
 /* Japanese locale with EUC-JP.  */
 #define TST_LOC_eucJP	"ja_JP.EUC-JP"
diff --git a/localedata/tst-ctype.sh b/localedata/tst-ctype.sh
index 136db31a73..3db480d11c 100755
--- a/localedata/tst-ctype.sh
+++ b/localedata/tst-ctype.sh
@@ -27,7 +27,7 @@ status=0
 
 # Run the test programs.
 rm -f ${common_objpfx}localedata/tst-ctype.out
-for loc in C de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ANSI_X3.4-1968 ja_JP.EUC-JP; do
+for loc in C de_DE.ISO-8859-1 de_DE.UTF-8 en_US.ASCII ja_JP.EUC-JP; do
   if test -f tst-ctype-$loc.in; then
     input=tst-ctype-$loc.in
   else
diff --git a/localedata/tst-langinfo.sh b/localedata/tst-langinfo.sh
index d4d20701ee..39b023a9e2 100755
--- a/localedata/tst-langinfo.sh
+++ b/localedata/tst-langinfo.sh
@@ -89,40 +89,40 @@ C                    RADIXCHAR   .
 C                    THOUSEP     ""
 C                    YESEXPR     ^[yY]
 C                    NOEXPR      ^[nN]
-en_US.ANSI_X3.4-1968 ABMON_1     Jan
-en_US.ANSI_X3.4-1968 ABMON_2     Feb
-en_US.ANSI_X3.4-1968 ABMON_3     Mar
-en_US.ANSI_X3.4-1968 ABMON_4     Apr
-en_US.ANSI_X3.4-1968 ABMON_5     May
-en_US.ANSI_X3.4-1968 ABMON_6     Jun
-en_US.ANSI_X3.4-1968 ABMON_7     Jul
-en_US.ANSI_X3.4-1968 ABMON_8     Aug
-en_US.ANSI_X3.4-1968 ABMON_9     Sep
-en_US.ANSI_X3.4-1968 ABMON_10    Oct
-en_US.ANSI_X3.4-1968 ABMON_11    Nov
-en_US.ANSI_X3.4-1968 ABMON_12    Dec
-en_US.ANSI_X3.4-1968 MON_1       January
-en_US.ANSI_X3.4-1968 MON_2       February
-en_US.ANSI_X3.4-1968 MON_3       March
-en_US.ANSI_X3.4-1968 MON_4       April
-en_US.ANSI_X3.4-1968 MON_5       May
-en_US.ANSI_X3.4-1968 MON_6       June
-en_US.ANSI_X3.4-1968 MON_7       July
-en_US.ANSI_X3.4-1968 MON_8       August
-en_US.ANSI_X3.4-1968 MON_9       September
-en_US.ANSI_X3.4-1968 MON_10      October
-en_US.ANSI_X3.4-1968 MON_11      November
-en_US.ANSI_X3.4-1968 MON_12      December
-en_US.ANSI_X3.4-1968 AM_STR      AM
-en_US.ANSI_X3.4-1968 PM_STR      PM
-en_US.ANSI_X3.4-1968 D_T_FMT     "%a %d %b %Y %r %Z"
-en_US.ANSI_X3.4-1968 D_FMT       "%m/%d/%Y"
-en_US.ANSI_X3.4-1968 T_FMT       "%r"
-en_US.ANSI_X3.4-1968 T_FMT_AMPM  "%I:%M:%S %p"
-en_US.ANSI_X3.4-1968 RADIXCHAR   .
-en_US.ANSI_X3.4-1968 THOUSEP     ,
-en_US.ANSI_X3.4-1968 YESEXPR     ^[+1yY]
-en_US.ANSI_X3.4-1968 NOEXPR      ^[-0nN]
+en_US.ASCII          ABMON_1     Jan
+en_US.ASCII          ABMON_2     Feb
+en_US.ASCII          ABMON_3     Mar
+en_US.ASCII          ABMON_4     Apr
+en_US.ASCII          ABMON_5     May
+en_US.ASCII          ABMON_6     Jun
+en_US.ASCII          ABMON_7     Jul
+en_US.ASCII          ABMON_8     Aug
+en_US.ASCII          ABMON_9     Sep
+en_US.ASCII          ABMON_10    Oct
+en_US.ASCII          ABMON_11    Nov
+en_US.ASCII          ABMON_12    Dec
+en_US.ASCII          MON_1       January
+en_US.ASCII          MON_2       February
+en_US.ASCII          MON_3       March
+en_US.ASCII          MON_4       April
+en_US.ASCII          MON_5       May
+en_US.ASCII          MON_6       June
+en_US.ASCII          MON_7       July
+en_US.ASCII          MON_8       August
+en_US.ASCII          MON_9       September
+en_US.ASCII          MON_10      October
+en_US.ASCII          MON_11      November
+en_US.ASCII          MON_12      December
+en_US.ASCII          AM_STR      AM
+en_US.ASCII          PM_STR      PM
+en_US.ASCII          D_T_FMT     "%a %d %b %Y %r %Z"
+en_US.ASCII          D_FMT       "%m/%d/%Y"
+en_US.ASCII          T_FMT       "%r"
+en_US.ASCII          T_FMT_AMPM  "%I:%M:%S %p"
+en_US.ASCII          RADIXCHAR   .
+en_US.ASCII          THOUSEP     ,
+en_US.ASCII          YESEXPR     ^[+1yY]
+en_US.ASCII          NOEXPR      ^[-0nN]
 en_US.ISO-8859-1     ABMON_1     Jan
 en_US.ISO-8859-1     ABMON_2     Feb
 en_US.ISO-8859-1     ABMON_3     Mar
diff --git a/localedata/tst-mbswcs6.c b/localedata/tst-mbswcs6.c
index ccf1c9d35a..1b3a43f8e8 100644
--- a/localedata/tst-mbswcs6.c
+++ b/localedata/tst-mbswcs6.c
@@ -63,7 +63,7 @@ main (void)
   res = do_test ("C");
   res |= do_test ("de_DE.ISO-8859-1");
   res |= do_test ("de_DE.UTF-8");
-  res |= do_test ("en_US.ANSI_X3.4-1968");
+  res |= do_test ("en_US.ASCII");
   res |= do_test ("ja_JP.EUC-JP");
   res |= do_test ("hr_HR.ISO-8859-2");
   //res |= do_test ("ru_RU.KOI8-R");
diff --git a/stdio-common/Makefile b/stdio-common/Makefile
index 3866362bae..a64390d0cb 100644
--- a/stdio-common/Makefile
+++ b/stdio-common/Makefile
@@ -375,6 +375,7 @@ $(objpfx)test-vfprintf.out: $(gen-locales)
 $(objpfx)tst-grouping.out: $(gen-locales)
 $(objpfx)tst-grouping2.out: $(gen-locales)
 $(objpfx)tst-grouping_iterator.out: $(gen-locales)
+$(objpfx)tst-printf-bz25691-mem.out: $(gen-locales)
 $(objpfx)tst-sprintf.out: $(gen-locales)
 $(objpfx)tst-sscanf.out: $(gen-locales)
 $(objpfx)tst-swprintf.out: $(gen-locales)
diff --git a/stdio-common/tst-printf-bz25691.c b/stdio-common/tst-printf-bz25691.c
index 44e9ea7d9d..c887b9962f 100644
--- a/stdio-common/tst-printf-bz25691.c
+++ b/stdio-common/tst-printf-bz25691.c
@@ -30,6 +30,8 @@
 static int
 do_test (void)
 {
+  setlocale (LC_CTYPE, "C.UTF-8");
+
   mtrace ();
 
   /* For 's' conversion specifier with 'l' modifier the array must be
diff --git a/wcsmbs/Makefile b/wcsmbs/Makefile
index 431136b9c9..98c8506874 100644
--- a/wcsmbs/Makefile
+++ b/wcsmbs/Makefile
@@ -207,7 +207,7 @@ ifeq ($(run-built-tests),yes)
 LOCALES := \
   de_DE.ISO-8859-1 \
   de_DE.UTF-8 \
-  en_US.ANSI_X3.4-1968 \
+  en_US.ASCII \
   fa_IR.UTF-8 \
   hr_HR.ISO-8859-2 \
   ja_JP.EUC-JP \
diff --git a/wcsmbs/tst-btowc.c b/wcsmbs/tst-btowc.c
index 1485076ca4..aee4a77136 100644
--- a/wcsmbs/tst-btowc.c
+++ b/wcsmbs/tst-btowc.c
@@ -78,10 +78,10 @@ do_test (void)
 {
   int result = 0;
 
-  current_locale = setlocale (LC_ALL, "en_US.ANSI_X3.4-1968");
+  current_locale = setlocale (LC_ALL, "en_US.ASCII");
   if (current_locale == NULL)
     {
-      puts ("cannot set locale \"en_US.ANSI_X3.4-1968\"");
+      puts ("cannot set locale \"en_US.ASCII\"");
       result = 1;
     }
   else
diff --git a/wcsmbs/wcsmbsload.c b/wcsmbs/wcsmbsload.c
index 61392e0b1e..e7d69ee4bf 100644
--- a/wcsmbs/wcsmbsload.c
+++ b/wcsmbs/wcsmbsload.c
@@ -33,10 +33,10 @@ static const struct __gconv_step to_wc =
   .__shlib_handle = NULL,
   .__modname = NULL,
   .__counter = INT_MAX,
-  .__from_name = (char *) "ANSI_X3.4-1968//TRANSLIT",
+  .__from_name = (char *) "ANSI_X3.4-1968",
   .__to_name = (char *) "INTERNAL",
-  .__fct = __gconv_transform_ascii_internal,
-  .__btowc_fct = __gconv_btowc_ascii,
+  .__fct = __gconv_transform_posix_internal,
+  .__btowc_fct = __gconv_btowc_posix,
   .__init_fct = NULL,
   .__end_fct = NULL,
   .__min_needed_from = 1,
@@ -53,8 +53,8 @@ static const struct __gconv_step to_mb =
   .__modname = NULL,
   .__counter = INT_MAX,
   .__from_name = (char *) "INTERNAL",
-  .__to_name = (char *) "ANSI_X3.4-1968//TRANSLIT",
-  .__fct = __gconv_transform_internal_ascii,
+  .__to_name = (char *) "ANSI_X3.4-1968",
+  .__fct = __gconv_transform_internal_posix,
   .__btowc_fct = NULL,
   .__init_fct = NULL,
   .__end_fct = NULL,
@@ -67,7 +67,9 @@ static const struct __gconv_step to_mb =
 };
 
 
-/* For the default locale we only have to handle ANSI_X3.4-1968.  */
+/* The default/"POSIX"/"C" locale is an 8-bit-clean mapping
+   with ASCII in the first 128 characters;
+   we lift the remaining bytes by <UDC00>.  */
 const struct gconv_fcts __wcsmbs_gconv_fcts_c =
 {
   .towc = (struct __gconv_step *) &to_wc,
-- 
2.39.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH v18 2/3] locale: charmap: fix off-by-one with ranges
  2023-07-23 17:33 ` [PATCH v18 2/3] locale: charmap: fix off-by-one with ranges наб
@ 2023-07-23 19:38   ` Bruno Haible
  0 siblings, 0 replies; 5+ messages in thread
From: Bruno Haible @ 2023-07-23 19:38 UTC (permalink / raw)
  To: Florian Weimer, наб; +Cc: libc-alpha, Victor Stinner

наб wrote:
> Instead, bump the current character at the start of the loop
> (but not the first time, of course),
> precisely as many times as there are characters in the range.

Looks good to me. Alternatively, one could leave that piece of code
at the end of the loop and add before it these two lines:

       if (cnt == to_nr)
         break;

I have no opinion regarding which of the two is "better" code style.
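
To make the two shapes concrete, here is a minimal sketch in C. It is a
simplified model, not the actual charmap code: bump, walk_bump_first and
walk_break_last are hypothetical names, and the single-byte value that
fails when it wraps past 0xff is an assumption chosen so that an extra
bump after the last element becomes observable.

```c
#include <stdbool.h>

/* Hypothetical stand-in for "bump the current character": a single
   byte that cannot be advanced past 0xff.  Returns false on wrap.  */
static bool
bump (unsigned char *byte)
{
  return ++*byte != 0;
}

/* Shape 1: bump at the start of every iteration except the first.
   Returns the number of values stored in OUT, or -1 on overflow.  */
static int
walk_bump_first (unsigned int from_nr, unsigned int to_nr,
                 unsigned char start, unsigned char *out)
{
  unsigned char cur = start;
  int n = 0;

  for (unsigned int cnt = from_nr; cnt <= to_nr; ++cnt)
    {
      if (cnt != from_nr && !bump (&cur))
        return -1;
      out[n++] = cur;
    }
  return n;
}

/* Shape 2: keep the bump at the end of the body, but leave the loop
   before it once the last element has been handled.  */
static int
walk_break_last (unsigned int from_nr, unsigned int to_nr,
                 unsigned char start, unsigned char *out)
{
  unsigned char cur = start;
  int n = 0;

  for (unsigned int cnt = from_nr; cnt <= to_nr; ++cnt)
    {
      out[n++] = cur;
      if (cnt == to_nr)
        break;
      if (!bump (&cur))
        return -1;
    }
  return n;
}
```

In this model, walking 128 elements starting at byte 0x80 yields 0x80
through 0xff with either shape. A bump placed unconditionally at the end
of the body would wrap after 0xff and report an overflow for a range
that is in fact representable.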

Bruno





* Re: [PATCH v18 3/3] POSIX locale covers every byte [BZ# 29511]
  2023-07-23 17:54 ` [PATCH v18 3/3] POSIX locale covers every byte [BZ# 29511] наб
@ 2023-07-23 20:15   ` Bruno Haible
  0 siblings, 0 replies; 5+ messages in thread
From: Bruno Haible @ 2023-07-23 20:15 UTC (permalink / raw)
  To: Florian Weimer, наб; +Cc: libc-alpha, Victor Stinner

наб wrote:
> There is one user-facing change:
> "ANSI_X3.4-1968" (and /only/ that, its former aliases are unaffected)
> mbrtowc() and friends return b if b <= 0x7F else <UDC00>+b.

Nope. Don't do that. Interoperability comes through standards.
The widely accepted standard for character set names is the IANA registry:
https://www.iana.org/assignments/character-sets/character-sets.xhtml
and it lists ANSI_X3.4-1968 as an alias of US-ASCII, for 30 years
already. People *rely* on that.
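
For reference, the mapping quoted above (b if b <= 0x7F, else <UDC00>+b)
amounts to roughly the following sketch in C. The function names are
hypothetical and are not the glibc internals touched by the patch; the
reverse direction is an assumption (the natural inverse, rejecting
everything outside ASCII and [<UDC80>, <UDCFF>]).

```c
#include <stdint.h>

/* Byte -> wide character: ASCII maps to itself, bytes 0x80..0xff are
   lifted by 0xdc00 into [U+DC80, U+DCFF].  */
static uint32_t
posix_byte_to_wide (unsigned char b)
{
  return b <= 0x7f ? (uint32_t) b : UINT32_C (0xdc00) + b;
}

/* Wide character -> byte (assumed inverse).  Returns -1 for values
   that have no single-byte representation under this mapping.  */
static int
posix_wide_to_byte (uint32_t wc)
{
  if (wc <= 0x7f)
    return (int) wc;
  if (wc >= 0xdc80 && wc <= 0xdcff)
    return (int) (wc - 0xdc00);
  return -1;
}
```

Under this sketch every byte value round-trips, while a wide character
such as U+0080 has no byte representation at all.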

> As @mirabilos points out in
>   https://www.mail-archive.com/austin-group-l@opengroup.org/msg11591.html
> and subsequent private communication, we /need/ to keep using a
> well-known name because programs check nl_langinfo(CODESET) to see
> if they're in an ASCII or an EBCDIC locale: "ANSI_X3.4-1968", being
> glibc's default, is checked universally.

Highly disagree. New encodings can arise at any moment. There was ISO-8859-1,
then came ISO-8859-15. There was GBK, then came GB18030. And so on.
Programs that rely on a fixed list of encoding names *must* be prepared
to update their list occasionally. Programs that do not want to be limited
to a fixed list can only pass the encoding name to selected APIs and
interfaces, for example to the iconv_open() function or the iconv program.
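
As a sketch of that second approach, the helper below never compares
the codeset name against a list; it only hands whatever
nl_langinfo(CODESET) reports to iconv_open(). The helper name
locale_codeset_to_utf8 and the choice of UTF-8 as the target encoding
are assumptions made for illustration.

```c
#include <iconv.h>
#include <langinfo.h>
#include <stddef.h>

/* Hypothetical helper: open a converter from the current locale's
   codeset to UTF-8 without inspecting the codeset name.  Stores the
   name in *NAME_OUT (if non-NULL) and returns (iconv_t) -1 on
   failure, exactly as iconv_open() does.  */
static iconv_t
locale_codeset_to_utf8 (const char **name_out)
{
  const char *codeset = nl_langinfo (CODESET);

  if (name_out != NULL)
    *name_out = codeset;
  return iconv_open ("UTF-8", codeset);
}
```

A caller would typically run setlocale (LC_ALL, "") first, compare the
result against (iconv_t) -1, and release it with iconv_close() when
done.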

Yes, when a new encoding name is introduced, some programs need updates.
As we've seen in the past, it can take 2-3 years until these programs
have made new releases and these releases have been picked up by distros.
This is *much* preferred to violating existing standards.

Bruno
