From: Bruno Haible
To: Ahelenia Ziemiańska
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH v16] POSIX locale covers every byte [BZ# 29511]
Date: Wed, 12 Jul 2023 21:44:26 +0200
Message-ID: <4881032.NnENhoQgcM@nimes>

Thank you for working on this.

Regarding the mapping of the bytes 0x80..0xFF:

> By strategically picking c= we land at the same point of the
> Unicode Low Surrogate Area at DC00-DCFF, described as
> > Isolated surrogate code points have no interpretation;
> > consequently, no character code charts or names lists
> > are provided for this range.
> as the Python UTF-8 errors=surrogateescape encoding.

musl libc maps the bytes 0x80..0xFF to U+DF80..U+DFFF.
[1][2]

I think it is more useful to avoid an inconsistency between glibc and
musl libc, than to be consistent with what a particular user-space
program (Python) does.

How about mapping the bytes 0x80..0xFF to U+DF80..U+DFFF, like
musl libc does?

Bruno

[1] https://git.musl-libc.org/cgit/musl/tree/src/multibyte/internal.h#n19
[2] https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=tests/test-btowc.c#l71
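
To make the difference between the two candidate mappings concrete, here
is a minimal sketch in C. It is not taken from glibc, musl libc, or the
patch; the function names are hypothetical and chosen only for
illustration. It assumes that bytes below 0x80 map to themselves and that
a high byte b is placed at a fixed base plus b: U+DF00 + b for the
musl-style range U+DF80..U+DFFF, and U+DC00 + b for the range
U+DC80..U+DCFF that Python's surrogateescape error handler uses.

```c
#include <stdint.h>

/* Hypothetical helper: musl-style mapping as described in this mail.
   Bytes 0x00..0x7F map to themselves; bytes 0x80..0xFF land in
   U+DF80..U+DFFF.  */
uint32_t byte_to_codepoint_musl_style(unsigned char b)
{
    return b < 0x80 ? (uint32_t) b : UINT32_C(0xDF00) + b;
}

/* Hypothetical helper: surrogateescape-style mapping.
   Bytes 0x00..0x7F map to themselves; bytes 0x80..0xFF land in
   U+DC80..U+DCFF.  */
uint32_t byte_to_codepoint_surrogateescape_style(unsigned char b)
{
    return b < 0x80 ? (uint32_t) b : UINT32_C(0xDC00) + b;
}
```

Under these assumptions the two mappings agree on ASCII and differ by a
constant offset of 0x300 on every high byte, so a wide character produced
with one convention is not recognized as an escaped byte by code that
expects the other.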