From: Andreas Schwab
To: Carlos O'Donell via Libc-alpha
Subject: Re: [PATCH 2/2] Add new C.UTF-8 locale (Bug 17318)
Date: Mon, 29 Jun 2020 09:54:29 +0200
Message-ID: <87r1tyz4qy.fsf@igel.home>

On Jun 29 2020, Carlos O'Donell via Libc-alpha wrote:

> @@ -125,67 +146,122 @@ def process_charmap(flines, outfile):
>
>   /x10 DATA LINK ESCAPE
>   .. /xe3/x90/x80
> - % /xed/xa0/x80
> - % /xed/xad/xbf
> + /xed/xa0/x80
> + /xed/xad/xbf
>   .. /xf4/x8f/xbf/x80
>
> + Note that old glibc UTF-8 charmap left the surrogates commented out.
> + We keep the surrogate entries because we want to be able to sort the
> + invalid values into a consistent location.
> +
>   '''
>   fields_start = []
> + fields_end = []
>   for line in flines:
>   fields = line.split(";")
> - # Some characters have “<control>” as their name. We try to
> - # use the “Unicode 1.0 Name” (10th field in
> - # UnicodeData.txt) for them.
> - #
> - # The Characters U+0080, U+0081, U+0084 and U+0099 have
> - # “<control>” as their name but do not even have aa
> - # ”Unicode 1.0 Name”. We could write code to take their
> - # alternate names from NameAliases.txt.
> + # Some characters have “<control>” as their name. We try to
> + # use the “Unicode 1.0 Name” (10th field in
> + # UnicodeData.txt) for them.
> + #
> + # The Characters U+0080, U+0081, U+0084 and U+0099 have
> + # “<control>” as their name but do not even have aa

s/aa/a/

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
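The quoted docstring lists each charmap entry's bytes in `/xNN` notation. These are the UTF-8 bit patterns of the code points, and the surrogates U+D800 and U+DB7F appear there even though a strict UTF-8 encoder refuses them. As a minimal sketch, Python's `surrogatepass` error handler reproduces those byte strings. The helper `charmap_bytes` is hypothetical and is not part of glibc's generator scripts.

```python
def charmap_bytes(code_point: int) -> str:
    """Return a charmap-style byte string such as "/xe3/x90/x80".

    Hypothetical helper for illustration only. It applies the ordinary
    UTF-8 bit pattern; the 'surrogatepass' error handler lets lone
    surrogates (U+D800..U+DFFF) encode with the usual three-byte form
    instead of raising UnicodeEncodeError.
    """
    raw = chr(code_point).encode("utf-8", "surrogatepass")
    return "".join(f"/x{b:02x}" for b in raw)


if __name__ == "__main__":
    # Code points whose byte sequences appear in the quoted docstring.
    for cp in (0x0010, 0x3400, 0xD800, 0xDB7F, 0x10FFC0):
        print(f"U+{cp:04X} {charmap_bytes(cp)}")
```

Running it prints the same byte sequences that appear in the docstring, for example `/xed/xa0/x80` for U+D800 and `/xf4/x8f/xbf/x80` for U+10FFC0.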