On Wed, Jul 12, 2023 at 11:49:59PM +0200, Bruno Haible wrote:
> Some more comments:
> * The encoding name "POSIX" is available as nl_langinfo (CODESET). Many
>   applications pass this encoding name to iconv(1) or iconv_open(3);
>   this is the main purpose of knowing the locale's encoding, after all.
>
>   But if musl libc, or any other libc (Solaris or whatever), does the
>   same thing, but with a different mapping table than glibc has,
>   iconv implementations have a problem, because the name "POSIX" then
>   has different mapping tables on different systems.

Implementation survey via
  #include <langinfo.h>
  #include <stdio.h>
  int main(){ puts(nl_langinfo (CODESET)); }
and my original test program from https://bugs.debian.org/1017852 says:
  musl:          "ASCII"     (7f: 1, 007f  80: 1, df80  ff: 1, dfff)
  NetBSD 9.3:    "646"       (7f: 1, 007f  80: 1, 0080  ff: 1, 00ff)
  tribblix m25:  ‒ || ‒      (this is the "solaris or whatever")
  FreeBSD 13.0:  "US-ASCII"  (7f: 1, 007f  80: 1, 0080  ff: 1, 00ff)
  OpenBSD 7.2:   ‒ || ‒
  MacOS:         ‒ || ‒      (uname says 19.6.0)
(notably, all of these are, strictly, wrong, but).

None of these implementations' iconvs have a POSIX/posix encoding, either;
I'm like mostly-sure this is just historical behaviour
on the historical systems.

>   Suggestion: Change the encoding name from "POSIX" to "POSIX-GNU" or
>   something similar, to make it possible for iconv implementations to
>   support it in a future-proof way.

Given the above, I don't think there's much value to making the
encoding name weirder and longer, since the dust has long settled on this?

> * The identifier __gconv_btwoc_posix looks misspelled.
>   Suggestion: s/__gconv_btwoc_posix/__gconv_btowc_posix/
>   (The previous one, __gconv_btwoc_ascii, is misspelled as well.)

This explains why I didn't understand what it was trying to say lol;
applied, and propagated to __gconv_btwoc_ascii.

> * iconv/gconv_int.h: In the comment
>     identity-mapping bytes [0, 0x7F], and moving [0x80, 0xFF] into the end
>     of the Low Surrogate Area at [U+DC80, U+DCFF].  */
>   it's no longer the end of the Low Surrogate Area [U+DC00, U+DFFF].
>
>   Suggestion: Change the comment to
>     identity-mapping bytes [0, 0x7F], and mapping [0x80, 0xFF] into the
>     Low Surrogate Area at [U+DC80, U+DCFF].  */

Applied.

Will resend rebased for Monday.
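
For reference, the mapping that the gconv_int.h comment describes can be sketched roughly as below. This is only an illustration of the byte ranges named in the comment, not the code from the patch: posix_byte_to_wc and posix_wc_to_byte are hypothetical names, and uint32_t stands in for whatever wide-character type the real converter uses.

```c
#include <stdint.h>

/* Hypothetical helper, not the glibc implementation.
   Bytes [0, 0x7F] map to themselves; bytes [0x80, 0xFF] map into
   the Low Surrogate Area at [U+DC80, U+DCFF].  */
static inline uint32_t posix_byte_to_wc(unsigned char b)
{
    return b < 0x80 ? (uint32_t)b : 0xDC00u + b;
}

/* Inverse direction (also hypothetical): returns the byte value,
   or -1 if wc is outside both mapped ranges.  */
static inline int posix_wc_to_byte(uint32_t wc)
{
    if (wc < 0x80)
        return (int)wc;
    if (wc >= 0xDC80 && wc <= 0xDCFF)
        return (int)(wc - 0xDC00);
    return -1;
}
```

Under this sketch, the three probe bytes from the survey (7f, 80, ff) would come out as 007f, dc80 and dcff, and every byte survives a round trip through both functions.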