From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 78912 invoked by alias); 17 Jun 2019 08:59:28 -0000 Mailing-List: contact libc-locales-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Subscribe: List-Post: List-Help: , Sender: libc-locales-owner@sourceware.org Received: (qmail 78892 invoked by uid 89); 17 Jun 2019 08:59:28 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=0.7 required=5.0 tests=AWL,BAYES_00,BODY_8BITS,GARBLED_BODY,RCVD_IN_DNSWL_LOW,SPF_HELO_PASS,SPF_PASS autolearn=no version=3.3.1 spammy=intensive, essential, appliances, intelligent X-HELO: mail-40132.protonmail.ch Date: Mon, 17 Jun 2019 08:59:00 -0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kobylkin.com; s=protonmail; t=1560761963; bh=5DoenkvRsM8zv0jZQmXx2jOz1mds58WIytiYKrVCSPY=; h=Date:To:From:Cc:Reply-To:Subject:In-Reply-To:References: Feedback-ID:From; b=M7GeL//UTr4DNS38joi1tYySVeBGTuIRt163Oyx1vQlHQ+7AKFwNv+7TCEN53lyrs QvhjG0B0/5L7u0ERorE9f9vMmtmR5fmeUTV2coHCaOVtB491U64lMuvQY6RzPDam6t OMPdTY8m880qoqlIKl//6ms4xb70//C4hNYdnj10= To: Rafal Luzynski From: "Diego (Egor) Kobylkin" Cc: Carlos O'Donell , Marko Myllynen , "libc-alpha@sourceware.org" , "libc-locales@sourceware.org" , Siddhesh Poyarekar , Mike Fabian Reply-To: "Diego (Egor) Kobylkin" Subject: Re: [PING^8][PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872] Message-ID: <6wv51loHLfKSGkdpVWewwuBG_P28BI2WZGNfStpjj6TdejdGYIhMynrzeLfh63mRjURnIVT7eciFtrgidjwlhJd7hWg-TfByCOUo5AjCZ4w=@kobylkin.com> In-Reply-To: <1728627823.1766022.1560206406480@poczta.nazwa.pl> References: <2030695416.914859.1559778544120@poczta.nazwa.pl> <1640311749.1550210.1559856673283@poczta.nazwa.pl> <054f3b06-3ca8-00b0-ee07-1ff86a4106dc@redhat.com> <956159024.1658672.1559904734686@poczta.nazwa.pl> <761147fe-75d8-fbbf-b75a-1b58323254f9@redhat.com> <1728627823.1766022.1560206406480@poczta.nazwa.pl> MIME-Version: 1.0 Content-Type: multipart/signed; 
protocol="application/pgp-signature"; micalg=pgp-sha512; boundary="---------------------33840e6408d733a9786bde5278de2c1c"; charset=UTF-8
X-SW-Source: 2019-q2/txt/msg00094.txt.bz2

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
-----------------------33840e6408d733a9786bde5278de2c1c
Content-Type: multipart/mixed;boundary=---------------------20b920c355980d95103879e54e70cd6b

-----------------------20b920c355980d95103879e54e70cd6b
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;charset=utf-8
Content-length: 2934

Carlos,

we seem to have a consensus of all involved that the patch can be committed as is.
Do you see it like this on your side as well, or are there any more questions or suggestions?

Bests,
Egor

P.S. Just a clarification to Rafal's points below, and thanks @Rafal for the intensive "peer review" so far!
It definitely looks to me like we finally don't have any more divergent points after all the issues discussed.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, June 11, 2019 12:40 AM, Rafal Luzynski wrote:
...
> 7.06.2019 14:59 "Diego (Egor) Kobylkin" egor@kobylkin.com wrote:
> > But the target system doesn't support Russian locale and so you must
> > transliterate the filenames.
>
> While talking about the filesystem: I think the problem is not
> that it does not support Russian locale but that it tries to
> handle it and fails at this. If the filesystem accepted any
> byte string as a file name wouldn't it accept a byte string which
> constructs correct Cyrillic characters in UTF-8, without any
> transliteration?

Just to clarify here - the need to transliterate is the essential part in this example, not the actual cause of that need.
A lot of "things" don't support UTF-8 or Cyrillic - filesystems, some UNIX power tools, older network appliances, databases, key-value stores etc.
We are talking about a situation where you are forced to transliterate to ASCII. So that requirement is a given.
...
>
> > In glibc we don't have any framework for an intelligent conversion.
> > We would have to write specific code to handle this case and add
> > it into the translit code for special handling in this case.
>
> My suggestion was to add such an intelligent conversion. The rule
> should be simple: if a letter is followed by a lowercase it should
> be a titlecase (Sh), otherwise it should be uppercase (SH). But
> this may break Egor's requirement to keep them always uppercase.

Again, for the record, my "requirement" is to have a minimal patch committed sooner rather than later. It turned out to be surprisingly difficult to keep our focus even on the single flat mapping table that the ASCII transliteration really is.

>
> > I think we should today leave "Ш"->"SH" and "Сх"->"Sh", since it's
> > the most conservative position that avoids ambiguity, and then we
> > can discuss the aesthetics of this and the other impacts and solutions.
> > I appreciate Rafal's position, but I think being conservative here,
> > even if it's not as pretty as uconv, is a good guiding idea.
>
> Just to summarize: if you want to apply the relaxed rules, more
> technical than linguistic, then I am more willing to accept these
> patches.

The great thing is that we seem to have a consensus now and can proceed.
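
For illustration only, here is a minimal Python sketch of the two options discussed in this thread: the flat mapping table and the context-sensitive titlecase rule quoted above. It is not code from the patch or from glibc. The table is a hypothetical six-letter excerpt (only "Ш"->"SH" and "Сх"->"Sh" are taken from the quoted text), and the function names are made up for the example.

```python
# Illustrative sketch only: NOT the glibc locale data and NOT the patch.
# FLAT_TABLE is a hypothetical excerpt chosen for this example.
FLAT_TABLE = {
    "Ш": "SH", "ш": "sh",
    "С": "S", "с": "s",
    "Х": "H", "х": "h",
    "А": "A", "а": "a",
    "Р": "R", "р": "r",
}


def translit_flat(text):
    """Flat table lookup; characters not in the table pass through."""
    return "".join(FLAT_TABLE.get(ch, ch) for ch in text)


def translit_titlecase(text):
    """Context-sensitive variant of the rule quoted above.

    An uppercase letter with a multi-character mapping becomes
    titlecase ("Sh") when the next character is lowercase, and stays
    uppercase ("SH") otherwise.
    """
    out = []
    for i, ch in enumerate(text):
        mapped = FLAT_TABLE.get(ch, ch)
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if len(mapped) > 1 and ch.isupper() and nxt.islower():
            mapped = mapped.capitalize()
        out.append(mapped)
    return "".join(out)


if __name__ == "__main__":
    for word in ("ШАР", "Шар", "Схар"):
        print(word, translit_flat(word), translit_titlecase(word))
```

With this toy table the flat mapping keeps "Шар" and "Схар" distinct in the output, while the titlecase rule maps both to the same ASCII string. That is the ambiguity the conservative position avoids.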
-----------------------20b920c355980d95103879e54e70cd6b Content-Type: application/pgp-keys; filename="publickey - egor@kobylkin.com - 0x01FEB4E8.asc"; name="publickey - egor@kobylkin.com - 0x01FEB4E8.asc" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="publickey - egor@kobylkin.com - 0x01FEB4E8.asc"; name="publickey - egor@kobylkin.com - 0x01FEB4E8.asc" Content-length: 891 LS0tLS1CRUdJTiBQR1AgUFVCTElDIEtFWSBCTE9DSy0tLS0tDQpWZXJzaW9u OiBPcGVuUEdQLmpzIHY0LjUuMQ0KQ29tbWVudDogaHR0cHM6Ly9vcGVucGdw anMub3JnDQoNCnhqTUVYTGN4NkJZSkt3WUJCQUhhUnc4QkFRZEFUYVpYRStO US9ZYXJYRk9jTEhJQk9DSWJ6TXNnNXpQZQ0KSTZ5VzR4OHBQVlhOSnlKbFoy OXlRR3R2WW5sc2EybHVMbU52YlNJZ1BHVm5iM0pBYTI5aWVXeHJhVzR1DQpZ Mjl0UHNKM0JCQVdDZ0FmQlFKY3R6SG9CZ3NKQndnREFnUVZDQW9DQXhZQ0FR SVpBUUliQXdJZUFRQUsNCkNSQStPcVNEZ0FHcG9acmVBUDlOTUdxMXZ1UVJi Y1hBbGhZbStvRU9XMGVWYXRyK0RJcDRBdGJoYzdkZw0KUUFFQXA1NjBKMFEz RHpmK1BKY1pDdFBHeERlOWZWVkZyelBYUzN3MTBYN00wd2ZPT0FSY3R6SG9F Z29yDQpCZ0VFQVpkVkFRVUJBUWRBb2RSbXRLSDkwV0ZMZzlwTHloS0c2b0Rv ZWpIdWhjOEd0eTROSXlhRUxtd0QNCkFRZ0h3bUVFR0JZSUFBa0ZBbHkzTWVn Q0d3d0FDZ2tRUGpxa2c0QUJxYUVtc2dFQTZnSWdWQ29jMVp0cw0KWWMyNVh6 MEtVWXNuMWtPNEZxZmwyd2pQNzVUYkxYZ0EvQW9odWdlc2xXZVFsRTdUQ2Fh U3hFV0RXL2xYDQo4SmRlTEo4dFlIZFEvNU1MDQo9T0JwMQ0KLS0tLS1FTkQg UEdQIFBVQkxJQyBLRVkgQkxPQ0stLS0tLQ0K -----------------------20b920c355980d95103879e54e70cd6b-- -----------------------33840e6408d733a9786bde5278de2c1c Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" Content-length: 249 -----BEGIN PGP SIGNATURE----- Version: ProtonMail Comment: https://protonmail.com wl4EARYKAAYFAl0HVmkACgkQPjqkg4ABqaEc4wEA/CKYM1aKRr+IXHHEKf3F ZHY5AZoTjPjuRxgN9nn0RxEA/iJIeWmxT9zJ2Vl4Bch61dy8qRidDrv4F3BA XTUoldQK =4+XG -----END PGP SIGNATURE----- -----------------------33840e6408d733a9786bde5278de2c1c--