Date: Thu, 13 Jul 2023 01:51:47 +0200
From: наб
To: Bruno Haible
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH v16] POSIX locale covers every byte [BZ# 29511]
In-Reply-To: <6497083.JsnAkG3lO3@nimes>

On Wed, Jul 12, 2023 at 11:49:59PM +0200, Bruno Haible wrote:
> Some more comments:
> * The encoding name "POSIX" is available as nl_langinfo (CODESET). Many
>   applications pass this encoding name to iconv(1) or iconv_open(3);
>   this is the main purpose of knowing the locale's encoding, after all.
>
>   But if musl libc, or any other libc (Solaris or whatever), does the
>   same thing, but with a different mapping table than glibc has,
>   iconv implementations have a problem, because the name "POSIX" then
>   has different mapping tables on different systems.

Implementation survey via
	#include <langinfo.h>
	#include <stdio.h>
	int main(){ puts(nl_langinfo (CODESET)); }
and my original test program from https://bugs.debian.org/1017852 says:
	musl:          "ASCII"     (7f: 1, 007f    80: 1, df80    ff: 1, dfff)
	NetBSD 9.3:    "646"       (7f: 1, 007f    80: 1, 0080    ff: 1, 00ff)
	tribblix m25:  ‒ || ‒      (this is the "solaris or whatever")
	FreeBSD 13.0:  "US-ASCII"  (7f: 1, 007f    80: 1, 0080    ff: 1, 00ff)
	OpenBSD 7.2:   ‒ || ‒
	MacOS:         ‒ || ‒      (uname says 19.6.0)
(notably, all of these are, strictly, wrong, but).

None of these implementations' iconvs have a POSIX/posix encoding, either;
I'm like mostly-sure this is just historical behaviour on the historical systems.

> Suggestion: Change the encoding name from "POSIX" to "POSIX-GNU" or
> something similar, to make it possible for iconv implementations to
> support it in a future-proof way.

Given the above, I don't think there's much value to making the encoding
name weirder and longer, since the dust has long settled on this?

> * The identifier __gconv_btwoc_posix looks misspelled.
>   Suggestion: s/__gconv_btwoc_posix/__gconv_btowc_posix/
>   (The previous one, __gconv_btwoc_ascii, is misspelled as well.)

This explains why I didn't understand what it was trying to say lol;
applied, and propagated to __gconv_btwoc_ascii.

> * iconv/gconv_int.h: In the comment
>     identity-mapping bytes [0, 0x7F], and moving [0x80, 0xFF] into the end
>     of the Low Surrogate Area at [U+DC80, U+DCFF].  */
>   it's no longer the end of the Low Surrogate Area [U+DC00, U+DFFF].
>
>   Suggestion: Change the comment to
>     identity-mapping bytes [0, 0x7F], and mapping [0x80, 0xFF] into the
>     Low Surrogate Area at [U+DC80, U+DCFF].  */

Applied. Will resend rebased for Monday.