From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marko Myllynen
Reply-To: Marko Myllynen
Subject: Re: [PATCH v5] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]
To: Egor Kobylkin, Rafal Luzynski, libc-alpha@sourceware.org, libc-locales@sourceware.org
Cc: mfabian@redhat.com, "Dmitry V. Levin", Volodymyr Lisivka, Max Kutny, danilo@gnome.org
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com> <20180412224352.GB2911@altlinux.org> <165238610.582597.1539392357757@poczta.nazwa.pl>
Message-ID: <1374aef3-4c16-b9cd-49a6-b6da9b1a9eeb@redhat.com>
Date: Mon, 15 Oct 2018 11:05:00 -0000

Hi,

On 2018-10-13 19:58, Egor Kobylkin wrote:
> On 13.10.2018 02:59, Rafal Luzynski wrote:
>
>> Regarding the tests, I think there is no complete transliteration
>> test suite at the moment. Probably the only test is
>> localedata/bug-iconv-trans.c. You can also see the collation tests
>> placed in the same directory, they use those multiple *.UTF-8.in
>> files.
>>
>> You can skip the tests for now.
>
> First I thought they could just be added but not all locales
> transliterate Umlauts so just extending the current test won't do as it
> will fail for those locales.

I still think a one-time check against uconv(1) (part of Unicode's ICU
project) for discrepancies would be worthwhile.

>>> [...]
>>> diff -uNr a/localedata/locales/am_ET b/localedata/locales/am_ET
>>> --- a/localedata/locales/am_ET 2018-10-11 15:10:11.000000000 +0000
>>> +++ b/localedata/locales/am_ET 2018-10-11 15:10:43.000000000 +0000
>>> @@ -1394,6 +1394,7 @@
>>>
>>> +include "translit_cyrillic";""
>>>  translit_end
>>>  %
>>>  END LC_CTYPE
>>
>> Shouldn't “include "translit_cyrillic";""” be placed before the
>> custom rules, together with other includes? The same in more files,
>> I will not mention them all.
>
> If I recall correctly it is because of the
> "translit_end
> END LC_CTYPE"
> part at the end of the translit_cyrillic. This way it works for any
> locale, regardless of whether it has translit itself or not. And being at
> the end it does not supersede any previous transliteration that may be
> there for a reason.

I suspect one problem would be that the later rule wins, so if a locale
has some locale-specific rules, then any translit_* inclusions would
override them unless they are included before the locale-specific rules.

Cheers,

-- 
Marko Myllynen