From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-locales-return-6445-listarch-libc-locales=sources.redhat.com@sourceware.org>
Received: (qmail 33575 invoked by alias); 15 Oct 2018 11:05:01 -0000
Mailing-List: contact libc-locales-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-locales.sourceware.org>
List-Subscribe: <mailto:libc-locales-subscribe@sourceware.org>
List-Post: <mailto:libc-locales@sourceware.org>
List-Help: <mailto:libc-locales-help@sourceware.org>, <http://sourceware.org/lists.html#faqs>
Sender: libc-locales-owner@sourceware.org
Received: (qmail 33522 invoked by uid 89); 15 Oct 2018 11:04:59 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-11.9 required=5.0 tests=BAYES_00,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.2 spammy=wins, extending, discrepancies, HContent-Transfer-Encoding:8bit
X-HELO: mail-wm1-f67.google.com
Return-Path: <myllynen@redhat.com>
Reply-To: Marko Myllynen <myllynen@redhat.com>
Subject: Re: [PATCH v5] Locales: Cyrillic -> ASCII transliteration table [BZ
 #2872]
To: Egor Kobylkin <egor@kobylkin.com>,
 Rafal Luzynski <digitalfreak@lingonborough.com>, libc-alpha@sourceware.org,
 libc-locales@sourceware.org
Cc: mfabian@redhat.com, "Dmitry V. Levin" <ldv@altlinux.org>,
 Volodymyr Lisivka <vlisivka@gmail.com>, Max Kutny <mkutny@gmail.com>,
 danilo@gnome.org
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com>
 <20180412224352.GB2911@altlinux.org>
 <d5582688-819b-90c2-3f4a-0d19c932d487@kobylkin.com>
 <165238610.582597.1539392357757@poczta.nazwa.pl>
 <e072a70c-9962-4087-93c2-06ec3c9a0b1f@kobylkin.com>
From: Marko Myllynen <myllynen@redhat.com>
Message-ID: <1374aef3-4c16-b9cd-49a6-b6da9b1a9eeb@redhat.com>
Date: Mon, 15 Oct 2018 11:05:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.9.1
MIME-Version: 1.0
In-Reply-To: <e072a70c-9962-4087-93c2-06ec3c9a0b1f@kobylkin.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-SW-Source: 2018-q4/txt/msg00057.txt.bz2

Hi,

On 2018-10-13 19:58, Egor Kobylkin wrote:
> On 13.10.2018 02:59, Rafal Luzynski wrote:
> 
>> Regarding the tests, I think there is no complete transliteration 
>> test suite at the moment.  Probably the only test is 
>> localedata/bug-iconv-trans.c. You can also see the collation tests 
>> placed in the same directory, they use those multiple *.UTF-8.in 
>> files.
>>
>> You can skip the tests for now.
> 
> First I thought they could just be added, but not all locales
> transliterate umlauts, so just extending the current test won't do as it
> will fail for those locales.

I still think a one-time check against uconv(1) (part of Unicode's ICU
project) for discrepancies would be worthwhile.
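
As a rough illustration of such a one-time check, a minimal Python sketch
follows. It assumes that iconv(1) with ASCII//TRANSLIT and uconv(1) with
the ICU transform chain "Cyrillic-Latin; Latin-ASCII" are the two sides
worth comparing; that choice of transform, the helper names and the
character range are assumptions for illustration, not something taken
from the patch.

```python
import subprocess


def run_filter(argv, text):
    """Pipe text through an external command and return its decoded stdout."""
    result = subprocess.run(argv, input=text.encode("utf-8"),
                            stdout=subprocess.PIPE)
    return result.stdout.decode("utf-8", errors="replace")


def glibc_translit(text):
    # glibc iconv; the active locale decides which translit rules apply.
    return run_filter(["iconv", "-f", "UTF-8", "-t", "ASCII//TRANSLIT"], text)


def icu_translit(text):
    # Assumption: this ICU transform chain is a reasonable reference.
    return run_filter(["uconv", "-f", "UTF-8", "-t", "UTF-8",
                       "-x", "Cyrillic-Latin; Latin-ASCII"], text)


def find_discrepancies(chars, first, second):
    """Return (char, first_result, second_result) wherever the two differ."""
    diffs = []
    for ch in chars:
        a, b = first(ch), second(ch)
        if a != b:
            diffs.append((ch, a, b))
    return diffs


def report():
    """Print every basic Cyrillic letter on which iconv and uconv disagree."""
    cyrillic = [chr(cp) for cp in range(0x0410, 0x0450)]  # U+0410..U+044F
    for ch, a, b in find_discrepancies(cyrillic, glibc_translit, icu_translit):
        print(f"U+{ord(ch):04X} {ch}: iconv={a!r} uconv={b!r}")
```

Calling report() needs both tools installed and should be run under each
locale of interest (for example via LC_ALL), since the glibc result
depends on the locale in effect.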

>>> [...]
>>> diff -uNr a/localedata/locales/am_ET b/localedata/locales/am_ET
>>> --- a/localedata/locales/am_ET 2018-10-11 15:10:11.000000000 +0000
>>> +++ b/localedata/locales/am_ET 2018-10-11 15:10:43.000000000 +0000
>>> @@ -1394,6 +1394,7 @@
>>>  <U137A> <U0060><U0039><U0030>
>>>  <U137B> <U0060><U0031><U0030><U0030>
>>>  <U137C> <U0060><U0031><U0030><U0030><U0030><U0030>
>>> +include "translit_cyrillic";""
>>>  translit_end
>>>  %
>>>  END LC_CTYPE
>>
>> Shouldn't “include "translit_cyrillic";""” be placed before the
>> custom rules, together with other includes?  The same in more files,
>> I will not mention them all.
> 
> If I recall correctly it is because of the
> "translit_end
> END LC_CTYPE"
> part at the end of the translit_cyrillic. This way it works for any
> locale, regardless of whether it has translit itself or not. And being at
> the end it does not supersede any previous transliteration that may be
> there for a reason.

I suspect one problem would be that the later rule wins, so if there
are some locale-specific rules, then any translit_* inclusions would
override them unless they are included before the locale-specific rules.
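
To make the ordering concern concrete, here is a small Python sketch. It
only models the suspicion that a later rule replaces an earlier one; it
is not glibc's actual lookup code, and the rules shown are hypothetical
examples rather than entries from any real locale file.

```python
def effective_table(rule_blocks):
    """Merge rule blocks in file order.

    Assumed model: a later rule for the same character replaces an
    earlier one.
    """
    table = {}
    for block in rule_blocks:
        table.update(block)
    return table


# Hypothetical rules, for illustration only.
included = {"х": "kh", "ж": "zh"}   # stands in for translit_cyrillic
locale_specific = {"х": "h"}        # stands in for a locale's own rule

# Include placed after the locale-specific rules: the include wins.
include_last = effective_table([locale_specific, included])
# Include placed before the locale-specific rules: the locale wins.
include_first = effective_table([included, locale_specific])

print(include_last["х"])   # kh
print(include_first["х"])  # h
```

Under that assumed model, placing the include ahead of the custom rules
keeps the locale's own choice for the shared character while still
picking up the remaining included rules.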

Cheers,

-- 
Marko Myllynen