From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-locales-return-6428-listarch-libc-locales=sources.redhat.com@sourceware.org>
Received: (qmail 63530 invoked by alias); 10 Oct 2018 12:34:35 -0000
Mailing-List: contact libc-locales-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-locales.sourceware.org>
List-Subscribe: <mailto:libc-locales-subscribe@sourceware.org>
List-Post: <mailto:libc-locales@sourceware.org>
List-Help: <mailto:libc-locales-help@sourceware.org>, <http://sourceware.org/lists.html#faqs>
Sender: libc-locales-owner@sourceware.org
Received: (qmail 63494 invoked by uid 89); 10 Oct 2018 12:34:34 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-0.3 required=5.0 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY,KAM_NUMSUBJECT,RCVD_IN_DNSWL_NONE autolearn=no version=3.3.2 spammy=instantly, 10102018, H*r:sk:libc-lo
X-HELO: mail-wm1-f66.google.com
Return-Path: <myllynen@redhat.com>
Reply-To: Marko Myllynen <myllynen@redhat.com>
Subject: Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ
 #2872] re-submission for 2.29
To: Egor Kobylkin <egor@kobylkin.com>,
 Rafal Luzynski <digitalfreak@lingonborough.com>
Cc: Keld Simonsen <keld@keldix.com>, libc-alpha@sourceware.org,
 libc-locales@sourceware.org, "Dmitry V. Levin" <ldv@altlinux.org>,
 Volodymyr Lisivka <vlisivka@gmail.com>, Carlos O'Donell <carlos@redhat.com>,
 Max Kutny <mkutny@gmail.com>, danilo@gnome.org
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com>
 <20181003091949.GA21486@rap.rap.dk>
 <21d872b2-613e-d1f5-26c0-baa4b5721df9@kobylkin.com>
 <1485772360.805333.1538731225156@poczta.nazwa.pl>
 <deacdf31-d0bb-a92d-1de3-934d6b4cb158@kobylkin.com>
 <bda2ca60-18f1-3b19-91e5-c9ad144bc834@redhat.com>
 <bb4e1ba5-5fa5-2986-2573-7d27be226124@kobylkin.com>
 <69e26cab-810e-824b-3b16-b75ac44d8b0c@redhat.com>
 <b8f02fe9-f911-487f-b50b-9b0c43191cb6@kobylkin.com>
 <f51992ad-008b-03a4-8880-4c12edced53b@redhat.com>
 <246390048.827062.1539037422672@poczta.nazwa.pl>
 <4db1ce91-3184-cf45-01c5-80667fc4cf65@kobylkin.com>
 <f6b530b0-53b7-bd90-9bb9-864d0a477f50@kobylkin.com>
 <a9af47d8-bf3d-e607-38e1-a6e765a604d3@kobylkin.com>
 <1198370378.413479.1539123456488@poczta.nazwa.pl>
 <70c29e42-0fd3-4f10-fafb-44d67190d870@kobylkin.com>
 <c89f0ac3-6ccb-3e41-dc26-75ef03d9afa1@kobylkin.com>
 <9edcf6f2-607c-91ac-8eaf-ffbc973fe597@redhat.com>
 <3f50cc1f-9493-0611-3478-0394ecb6b37e@kobylkin.com>
From: Marko Myllynen <myllynen@redhat.com>
Message-ID: <286bc20c-db97-5244-8c26-a3a95e989361@redhat.com>
Date: Wed, 10 Oct 2018 12:34:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.9.1
MIME-Version: 1.0
In-Reply-To: <3f50cc1f-9493-0611-3478-0394ecb6b37e@kobylkin.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-SW-Source: 2018-q4/txt/msg00040.txt.bz2

Hi,

On 2018-10-10 15:19, Egor Kobylkin wrote:
> On 10.10.2018 13:22, Marko Myllynen wrote:
>>> correct link https://sourceware.org/bugzilla/attachment.cgi?id=11303
>>
>> Although I haven't checked every rule, this in general looks very good
>> (but see below).
> 
>> Not sure whether we want to add the few missing characters
>> mentioned at https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode,
>> e.g., one instantly notices that U+0400 is missing. (I wouldn't add
>> the more exotic characters, like the historic ones, at least
>> initially, though.) Perhaps filing a bug or two for these cases for
>> separate consideration would be ok.
> 
> The question here is what should serve as their transliteration and
> transcription?

Not sure, so filing a separate bug about this once your patch is merged
might be the most suitable action for now. I don't think we want to
postpone merging your work any further because of these non-ISO 9 cases.
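For whoever picks up that bug, a small script along these lines could list
the assigned code points in the basic Cyrillic block (U+0400..U+04FF) that
have no single-character rule yet. This is only a sketch: it assumes one
rule per line, the source written as <UXXXX> tokens, and '%' comment lines;
the function names and the sample rules are hypothetical.

```python
import re
import unicodedata

# One <UXXXX> token as written in glibc locale source files.
UCS_TOKEN = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def covered_code_points(translit_text):
    """Return the code points that have a single-character rule.

    Assumption: the source sequence is the first whitespace-separated field
    of a rule line, and '%' starts a comment line. Rules whose source is
    more than one <UXXXX> token are not counted.
    """
    covered = set()
    for line in translit_text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue
        match = UCS_TOKEN.fullmatch(line.split()[0])
        if match:
            covered.add(int(match.group(1), 16))
    return covered


def missing_cyrillic(translit_text, start=0x0400, end=0x04FF):
    """List (code point, Unicode name) for assigned characters without a rule."""
    covered = covered_code_points(translit_text)
    missing = []
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), None)  # None for unassigned code points
        if name is not None and cp not in covered:
            missing.append((f"U+{cp:04X}", name))
    return missing


# Hypothetical excerpt of a translit file, for illustration only.
SAMPLE = """% CYRILLIC CAPITAL LETTER IE
<U0415> <U0045>
<U0423><U0301> <U00DA>;"<U0055><U0060>"
"""

print(missing_cyrillic(SAMPLE, 0x0400, 0x0401))
# [('U+0400', 'CYRILLIC CAPITAL LETTER IE WITH GRAVE'), ('U+0401', 'CYRILLIC CAPITAL LETTER IO')]
```

Running it over the real file would give the list of candidates; which of
them deserve a rule (and what it should be) is still the open question.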

>> I'm not sure this will work: no existing rule in the translit_* files
>> contains two characters, so I'd assume that the rule for U+0423 is
>> applied first and the rule below is never used.
>>
>> % CYRILLIC UNDEFINED
>> <U0423><U0301> <U00DA>;"<U0055><U0060>"
>>
>> Perhaps this should be commented out or removed altogether if it's not
>> working as intended.
> 
> So yes, they are not processed. I would drop them so as not to have
> special cases, but I am also fine with keeping them because the work is
> already done.

I'd probably drop them, but I don't feel strongly about this either way.
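To make the concern about the two-character rule concrete, here is a toy
model. It decodes the quoted rule into its source sequence (U+0423 followed
by U+0301 COMBINING ACUTE ACCENT) and its alternatives (U+00DA, then "U`"),
and compares a per-character lookup with a longest-match lookup. The
<U0423> <U0055> rule and all function names are assumptions for
illustration; this does not reproduce glibc's actual transliteration code.

```python
import re

# One <UXXXX> token as written in glibc locale source files.
UCS_TOKEN = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def decode(field):
    """Turn <U0423><U0301> or "<U0055><U0060>" into the string it encodes."""
    return "".join(chr(int(h, 16)) for h in UCS_TOKEN.findall(field))


def parse_rule(line):
    """Split a rule line into (source, [alternative, ...]).

    Assumes whitespace between source and targets and ';' between targets.
    """
    source, targets = line.split(None, 1)
    return decode(source), [decode(t) for t in targets.split(";")]


def per_char_translit(text, rules):
    """Toy model: each character is looked up on its own; first alternative wins."""
    return "".join(rules[ch][0] if ch in rules else ch for ch in text)


def longest_match_translit(text, rules):
    """Toy model of what the two-character rule intends: longest keys first."""
    max_len = max(len(key) for key in rules)
    out, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in rules:
                out.append(rules[text[i:i + n]][0])
                i += n
                break
        else:
            # No rule starts at this position: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Hypothetical rule set: a plain U+0423 rule plus the two-character rule quoted above.
RULES = dict(parse_rule(line) for line in (
    '<U0423> <U0055>',
    '<U0423><U0301> <U00DA>;"<U0055><U0060>"',
))

print(per_char_translit("\u0423\u0301", RULES) == "U\u0301")  # True
print(longest_match_translit("\u0423\u0301", RULES) == "\u00da")  # True
```

With per-character lookup the U+0423 rule fires and the combining accent is
left over, so the two-character rule is dead weight; only a longest-match
scheme would ever reach it.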

Thanks for your efforts. I don't have any further comments; I'll now
leave this to Rafal and Mike to provide additional feedback and,
hopefully, merge soon.

Thanks,

-- 
Marko Myllynen