From: Marko Myllynen
To: Egor Kobylkin, Rafal Luzynski
Cc: Keld Simonsen, libc-alpha@sourceware.org, libc-locales@sourceware.org, "Dmitry V. Levin", Volodymyr Lisivka, Carlos O'Donell, Max Kutny, danilo@gnome.org
Subject: Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
Date: Wed, 10 Oct 2018 12:34:00 -0000
Message-ID: <286bc20c-db97-5244-8c26-a3a95e989361@redhat.com>
In-Reply-To: <3f50cc1f-9493-0611-3478-0394ecb6b37e@kobylkin.com>

Hi,

On 2018-10-10 15:19, Egor Kobylkin wrote:
> On 10.10.2018 13:22, Marko Myllynen wrote:
>>> correct link https://sourceware.org/bugzilla/attachment.cgi?id=11303
>>
>> Although I haven't checked every rule this in general looks very good
>> (but see below).
>
>> Not sure do we want to add the few missing characters
>> mentioned at https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode,
>> e.g., one instantly notices that U+0400 is missing. (I wouldn't add at
>> least initially the more exotic characters, like the historic ones,
>> though.) Perhaps filing a bug or two for these cases for separate
>> consideration would be ok.
>
> The question here is what should serve as their transliteration and
> transcription?

I'm not sure, so filing a separate bug about this once your patch is
merged might be the most suitable action for now. I don't think we want
to postpone merging your work further because of these non-ISO 9 cases.

>> I'm not sure this will work, no existing rule in translit_* files
>> contain two characters, I'd assume that the rule for U+0423 is applied
>> first and then the below rule is never used.
>>
>> % CYRILLIC UNDEFINED
>> ;""
>>
>> Perhaps this should be commented out or removed altogether if it's not
>> working as intended.
>
> So yes, they are not processed. I would drop them to not to have special
> cases. But I am also fine with keeping them because all work is done
> already.

I'd probably drop them, but I don't feel strongly about this either way.

Thanks for your efforts. I don't have any further comments, so I'll
leave this now for Rafal and Mike to provide additional feedback and
hopefully merge soon.

Thanks,

-- 
Marko Myllynen
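
For illustration only, the concern about the two-character rule can be sketched with a toy model. This is not glibc's transliteration code, and it makes several assumptions: the mapping of U+0423 to "U", the second code point (U+0301 is a hypothetical stand-in, since the actual rule is not legible in the quoted text), and the replacement string "U'". It only shows why a rule keyed on two characters would never fire if rules are looked up one character at a time, and how a longest-match lookup would behave differently.

```python
# Toy model only: NOT glibc's real implementation.
# All mappings below are assumptions made for this illustration.
RULES = {
    "\u0423": "U",          # single-character rule for U+0423 (assumed mapping)
    "\u0423\u0301": "U'",   # hypothetical two-character rule (U+0301 is a stand-in)
}


def translit_per_char(text, rules):
    """Look up each character on its own; multi-character keys are never consulted."""
    return "".join(rules.get(ch, ch) for ch in text)


def translit_longest_match(text, rules):
    """Try the longest matching source sequence first at each position."""
    max_len = max(len(key) for key in rules)
    out = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in rules:
                out.append(rules[chunk])
                i += n
                break
        else:
            # No rule matched at this position: copy the character through.
            out.append(text[i])
            i += 1
    return "".join(out)


sample = "\u0423\u0301"
per_char_result = translit_per_char(sample, RULES)
longest_result = translit_longest_match(sample, RULES)
```

Under the per-character model, the U+0423 rule consumes the first character and the second is passed through untouched, so the two-character rule is dead weight. That matches the observation in the thread that such rules "are not processed".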