From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (qmail 13209 invoked by alias); 10 Dec 2018 21:20:42 -0000
Mailing-List: contact libc-locales-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-locales-owner@sourceware.org
Received: (qmail 13182 invoked by uid 89); 10 Dec 2018 21:20:41 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.2 spammy=HX-Google-DKIM-Signature:reply-to, aside, article, theoretical
X-HELO: mail-wr1-f54.google.com
Reply-To: Marko Myllynen
Subject: Re: [PATCH v9] Locales: Cyrillic -> ASCII transliteration table [BZ #2872]
To: Rafal Luzynski, Egor Kobylkin, libc-alpha@sourceware.org, libc-locales@sourceware.org
References: <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com>
 <20180412224352.GB2911@altlinux.org>
 <837001401.21346.1542406647888@poczta.nazwa.pl>
 <5a247161-c498-ed50-ff4a-58f2ecf974f0@redhat.com>
 <1441622134.517912.1543702039942@poczta.nazwa.pl>
 <2f6fc82c-77ba-d331-ae5d-e2373e122a88@kobylkin.com>
 <1361059722.707244.1544231740358@poczta.nazwa.pl>
From: Marko Myllynen
Cc: Mike Fabian, Carlos O'Donell
Date: Mon, 10 Dec 2018 21:20:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.0
MIME-Version: 1.0
In-Reply-To: <1361059722.707244.1544231740358@poczta.nazwa.pl>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-SW-Source: 2018-q4/txt/msg00127.txt.bz2

Hi,

On 08/12/2018 03.15, Rafal Luzynski wrote:
> 17.11.2018 19:34 Egor Kobylkin wrote:
>>
>> The SH/Sh can be decided on either way - seems like an easy change any
>> way.
>
> I'm in favor of "Sh" because it will work fine for titlecased words
> (where only the first letter is uppercase) but I'm aware it would be
> a problem for uppercased words. Unfortunately, I think we are unable
> to satisfy both cases.
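
To make the trade-off in the quoted paragraph concrete, here is a minimal
Python sketch. It assumes a context-free, per-character table (which is how I
understand the locale translit entries to behave); the table entries and the
helper names make_table() and transliterate() are made up for illustration and
are not taken from the patch.

```python
# Hypothetical per-character table; only the uppercase "Ш" entry varies.
def make_table(upper_sha):
    return {
        "Ш": upper_sha, "ш": "sh",
        "А": "A", "а": "a",
        "Р": "R", "р": "r",
    }


def transliterate(text, table):
    # Each character is mapped on its own, with no look at its neighbours,
    # so the table cannot tell a titlecased word from an uppercased one.
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    for upper_sha in ("Sh", "SH"):
        table = make_table(upper_sha)
        # Titlecased "Шар" and uppercased "ШАР" under each choice.
        print(upper_sha, transliterate("Шар", table), transliterate("ШАР", table))
```

With "Sh" the titlecased word comes out right and the uppercased one gets
mixed case; with "SH" it is the other way around.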

I think I'm in favor of "Sh" as well; although it is not perfect, I'd assume
it's going to be correct in more cases than "SH".

>> System A was added on Marko's request (so setting him on TO:) I am
>> neutral on keeping it or dropping it, just to be clear.
>
> I think I didn't see this Marko's request but I'm in favor of keeping
> System A, too.
>
> Marko, it would be good to hear your opinion about System A vs. System B
> again.

I think System A is the better option, as it should be the same as ISO 9
and in some cases perhaps also produces results that are closer to what
people expect than System B does (if the Wikipedia ISO 9 article is to be
believed). Wrt BZ #2872, I think it's good to keep it in mind, but IMHO we
can also deviate from it if needed. However, with System A + ASCII
fallback definitions the RFE should be satisfied as well?

> 19.11.2018 20:35 Marko Myllynen wrote:
>> [...]
>> In any case once your patch lands I'm going to submit a follow-up patch
>> for fi_FI to make it compliant with the applicable national standard
>> (SFS 4900) which defines how to do Cyrillic transliteration /
>> transcription in the context of Finnish.
>
> I totally agree. As far as I can see, SFS 4900 is more similar to
> System A (ISO 9) rather than System B, that is, it transliterates to Latin
> characters with diacritics rather than plain ASCII. Marko, what is your
> opinion about possible implementation of SFS 4900 in these cases:
>
> * When the destination charset does not contain required Latin diacritic
> characters (e.g., it is plain ASCII)?

This would be done according to http://jkorpela.fi/iso9.html8, so for
example zh instead of ž and shtsh instead of štš.

> * When the output is ambiguous, that means, when two different Cyrillic
> strings produce the same Latin (or ASCII) output?

This is a good point and one I hadn't considered, but I'm not sure there
is anything we can do about it (at least without major work on the locale
system internals).
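
As a rough illustration of both cases (ASCII fallback and ambiguous output),
here is a small Python sketch. The table entries are assumptions chosen for
the example (a handful of letters in the style of System A / ISO 9, plus the
zh/sh style of fallback mentioned above); they are not copied from the patch
or from the full text of any standard, and transliterate() and collisions()
are hypothetical helper names.

```python
# Illustrative entries only, assumed for this sketch.
PRIMARY = {"с": "s", "х": "h", "з": "z", "ш": "š", "ж": "ž"}
# Fallback used when the target charset lacks the diacritic letters.
ASCII_FALLBACK = {"š": "sh", "ž": "zh"}


def transliterate(text, ascii_only=False):
    out = []
    for ch in text:
        latin = PRIMARY.get(ch, ch)
        if ascii_only:
            # Replace each diacritic letter by its ASCII digraph.
            latin = "".join(ASCII_FALLBACK.get(c, c) for c in latin)
        out.append(latin)
    return "".join(out)


def collisions(words, ascii_only):
    # Group the inputs by output and keep only outputs shared by 2+ inputs.
    seen = {}
    for word in words:
        seen.setdefault(transliterate(word, ascii_only), []).append(word)
    return {latin: src for latin, src in seen.items() if len(src) > 1}


if __name__ == "__main__":
    samples = ["ш", "сх", "ж", "зх"]
    print(collisions(samples, ascii_only=False))
    print(collisions(samples, ascii_only=True))
```

With these assumed entries the diacritic output is unambiguous, while the
ASCII fallback makes "ш" and "сх" (and likewise "ж" and "зх") collapse into
the same string. Running something like collisions() over a word list would
be one way to estimate how often this occurs in practice.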
Do you have a rough idea of how frequently this could happen, or is this
more of a theoretical issue? (Sorry if I've missed earlier comments about
this, it's been a long thread.)

>> The same with having both System A and System B. Initially I went along
>> with the suggestion to include the system A but it is clear now that it
>> doesn’t make fixing [BZ #2872] more straightforward. So I’d also propose
>> to set it aside for the moment and use the v10 without the system A.
>> That is the whole reason I have submitted it, to be superclear on that.
>
> OK, I think that now I understand your reason to drop System A better.
> But still I'd like to rethink implementing System A somehow and drop
> (or rather: implement only partially) System B.

Yes, I also think System A, AKA ISO 9, would be the better choice, but I'll
leave the final decision to you two (and others who might weigh in).

Thanks,

-- 
Marko Myllynen