From: Marko Myllynen
Subject: Re: [PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872]
To: Egor Kobylkin, Rafal Luzynski, libc-alpha@sourceware.org, libc-locales@sourceware.org, Carlos O'Donell, Siddhesh Poyarekar
Cc: Mike Fabian
Date: Mon, 07 Jan 2019 20:37:00 -0000
In-Reply-To: <908ed415-cfe4-804c-f421-4351ef062edc@kobylkin.com>
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com> <20180412224352.GB2911@altlinux.org> <2124833400.35614.1546698902753@poczta.nazwa.pl> <908ed415-cfe4-804c-f421-4351ef062edc@kobylkin.com>

Hi,

On 05/01/2019 23.12, Egor Kobylkin wrote:
> On 05.01.19 15:35, Rafal Luzynski wrote:
>> 2.01.2019 19:38 Egor Kobylkin wrote:
>>>
>>> Changelog v12:
>>> [...]
>>>
>>> Changelog v11:
>>> * Re-targeted the patch against locale/C-translit.h.in as the proper
>>> file for the ASCII translit table.
>>> * Correspondingly the patch now only contains the additional
>>> Cyrillic-ASCII strings in the format of locale/C-translit.h.in table.
>>> The 'include "translit_cyrillic";""' directives are not necessary in the
>>> locale files and they are now all left intact.
>>> * Also the file translit_cyrillic is no longer needed and is omitted.
>>> * Edited below email, commit message.
>>> [...]
>>
>> I have tested this and, unfortunately, now this transliteration
>> works *only* in C locale, that is, only when no locale is set or when
>> it is explicitly set to C (C.UTF8, POSIX). It does not work when locale
>> is set to anything different, including en_US, ru_RU, etc.
>
> Good catch! Should we maybe split this into two patches, one for C and
> the other for "country" locales? They have different codes and
> functionality so it looks like it would be easier to keep focus.

That would probably make sense. The standard C/POSIX locale won't
support System A, so the split also narrows down the solution
alternatives for it. (If the C.UTF-8 locale (see
https://sourceware.org/bugzilla/show_bug.cgi?id=17318) materializes one
day, I'm not sure whether transliteration would be applicable in that
context.)

> My understanding is that locale/C-translit.h.in is still the proper
> locale for the sole ASCII translit table. It is also the only solution
> for many use cases where there is no locale available (not compiled or
> not set).

Correct. As Siddhesh mentioned, those rules will end up in the built-in
C/POSIX locale, which is ASCII and is used if no other locale is
available or properly set. The translit_* files won't affect it.

> "Country" locales in localedata/locales/ can then have the exact same
> translit table included or they can have any other flavor - I don't see
> a problem here.

Indeed, and since those files are not limited to ASCII, perhaps we
could now reconsider the v9 approach for them, i.e., prefer System A if
possible, otherwise use System B / ASCII (we just need to make sure
that their ASCII fallback matches the built-in C ASCII rules)?

Thanks,

-- 
Marko Myllynen