From: Egor Kobylkin
To: Rafal Luzynski, Marko Myllynen
Cc: Keld Simonsen, libc-alpha@sourceware.org, libc-locales@sourceware.org, "Dmitry V. Levin", Volodymyr Lisivka, Carlos O'Donell, Max Kutny, danilo@gnome.org
Subject: Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
Date: Tue, 09 Oct 2018 22:40:00 -0000
Message-ID: <70c29e42-0fd3-4f10-fafb-44d67190d870@kobylkin.com>
In-Reply-To: <1198370378.413479.1539123456488@poczta.nazwa.pl>

On 10.10.2018 00:17, Rafal Luzynski wrote:
> 9.10.2018 20:34 Egor Kobylkin wrote:
>>
>> The culprits were the "" around the
>> "" () and
>> "" ().
>> It works now with
>> % CYRILLIC UNDEFINED
>> ;""
>> % CYRILLIC UNDEFINED
>> ;""
>>
>> [...]
>
> I wonder why you need Cyrillic U with acute, and why you comment it
> as "undefined" at all. I know that any Cyrillic vowel may appear with
> an acute accent but "the diacritic is used only in dictionaries, children's
> books, resources for foreign-language learners (...)". [1] So maybe
> all vowels with an acute accent should be handled (which I think is fine)
> rather than just U.

I have just taken the https://en.wikipedia.org/wiki/ISO_9 table and
implemented it on Marko's suggestion. Personally I have no opinion on
which letters should be included and under what name. These funny Us
just happened to be in the ISO 9 table.

There is no codepoint and no name for and in Unicode. That's why it's
coming through that way from my worksheet, as it does a reverse lookup
on the names based on the Unicode codepoints. Manually we can change it
to whatever you'd suggest in the translit_cyrillic. I just don't know
the right name.

On my side I think I have completed all outstanding tasks for the patch
https://sourceware.org/bugzilla/attachment.cgi?id=11144. So please let me
know explicitly if you'd like anything changed there.

I was planning to rewrite just the commit message according to your
earlier feedback and resubmit sometime soon.

Bests,
Diego