From: Marko Myllynen
To: Egor Kobylkin, libc-alpha@sourceware.org, libc-locales@sourceware.org, Carlos O'Donell
Cc: Rafal Luzynski, Siddhesh Poyarekar, Mike Fabian
Subject: Re: [PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872] ping for 2.30
Date: Thu, 14 Feb 2019 16:48:00 -0000
In-Reply-To: <41aff10b-9cf1-638c-4fbc-8c4f4122f2e9@kobylkin.com>

Hi Carlos, Mike, Rafal,

It seems clear that you are all currently too busy to have a look at
this, but would you have any estimate of when you might be able to
review it so that we could consider merging?

FWIW, I chatted with Egor off-list and we're on the same page wrt the
following; hopefully this gives you a bit of a jump start on this
subject when you have time to dig deeper:

1) The built-in C locale doesn't read or use any translit_* files, it
can't have any fallback mechanisms, and it only supports ASCII, so using
GOST 7.79 System B in locale/C-translit.h.in (as per patch v12) would
seem to be the appropriate way to implement Cyrillic transliteration for
the built-in C locale (it adds some 8 KB to the binary).

2) Other locales read and use translit_* files, and with them fallbacks
and non-ASCII output are possible, so it would seem preferable to first
try ISO 9 / GOST 7.79 System A and only if that fails use GOST 7.79
System B (in which case the end result should match the built-in C
locale). For this the translit_cyrillic file should be added (as per
patch v9 + changes mentioned in patches v10 and v12).

3) Individual locale files can then be updated to use translit_cyrillic
as appropriate (see patch v9), and language- or nation-specific
conventions (e.g., SFS 4900 for fi_FI) can be applied on a per-locale
basis.

Thanks,

On 04/02/2019 09.14, Egor Kobylkin wrote:
> Carlos,
> are you comfortable to pick this up again this month?
>
> I would really love to have a reliable action plan to get this committed
> for 2.30. Maybe cut out a subset that is undisputed and commit only that
> first. It looks kinda like an eternal moving target otherwise.
>
> for your reference:
> https://sourceware.org/ml/libc-alpha/2019-01/msg00036.html
> https://sourceware.org/ml/libc-alpha/2019-01/msg00040.html
>
> Bests,
> Egor Kobylkin
>
> On 09.01.19 21:03, Marko Myllynen wrote:
>> Hi,
>>
>> On 09/01/2019 02.46, Egor Kobylkin wrote:
>>> On 07.01.19 21:37, Marko Myllynen wrote:
>>>> On 05/01/2019 23.12, Egor Kobylkin wrote:
>>>>>
>>>>> Good catch! Should we maybe split this into two patches, one for C and
>>>>> the other for "country" locales? They have different codes and
>>>>> functionality so it looks like it would be easier to keep focus.
>>>>
>>>> That would probably make sense, the standard C/POSIX locale won't
>>>> support System A so it also narrows down solution alternatives with it.
>>>>
>>>>> "Country" locales in localedata/locales/ can then have the exact same
>>>>> translit table included or they can have any other flavor - I don't see
>>>>> a problem here.
>>>>
>>>> Indeed, and since those files are not limited to ASCII, perhaps we could
>>>> now reconsider the v9 approach for them, i.e., prefer System A if
>>>> possible, otherwise use System B / ASCII (just need to make sure that
>>>> the ASCII fall-back for them will match the built-in C ASCII rule)?
>>>
>>> Happy to hear the split seems to be a clear cut one.
>>> How about I rename the "[PATCH v12]...[BZ #2872]" to "[PATCH v1]...
>>> C/POSIX [BZ #2872]" and the "[PATCH v9]" gets its own bug-report
>>> (number) and title for clarity in communication?
>>
>> I'm not sure a new BZ is really needed for such an addition, perhaps a
>> NEWS entry might be more appropriate (with the full details explained in
>> the commit messages of course) but I'll leave this to others to decide.
>>
>>> This way it would probably be easier to have the decision making process
>>> tied up for both patches (separately). We may want to get the v12 POSIX
>>> out of the door in 2.30 then and can take all the time we need to set up
>>> the rules for "Countries" locales as you need them to be.
>>
>> Perhaps Rafal or Carlos have better suggestions but I would think we
>> could have a patch series where the patch 1/3 adds the C/POSIX locale
>> part (that would be what you posted as v12), then patch 2/3 adds
>> translit_cyrillic (based on your v9 so supports ISO 9:1995 / GOST 7.79
>> System A and GOST 7.79 System B as a fall-back (which would match the
>> C/POSIX rules)), and finally the patch 3/3 updates locales to use
>> translit_cyrillic as appropriate. But as said, Rafal or Carlos may have
>> alternative suggestions so it might be best to wait for their feedback
>> before doing anything yet (it's unfortunate you've had to do so many
>> iterations around this already but I think we've all learned something
>> during the process and the end result will be more correct than any of
>> the earlier versions).
>>
>> Thanks,
>>
>> --
>> Marko Myllynen
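
As a rough illustration of the lookup order described in points 1) and
2) above, here is a minimal Python sketch. It is not glibc code and does
not use the tables from any of the patches: SYSTEM_A and SYSTEM_B are a
tiny hand-picked lowercase subset, and the function names, the use of
Python codecs to stand in for the locale's charset, and the "?"
placeholder for unmapped characters are all assumptions made for the
example.

```python
# Illustrative subset only (assumption): a few lowercase letters, not the
# full GOST 7.79 / ISO 9 tables and not the contents of any glibc patch.
SYSTEM_A = {"а": "a", "к": "k", "у": "u", "ж": "ž", "ч": "č", "ш": "š"}
SYSTEM_B = {"а": "a", "к": "k", "у": "u", "ж": "zh", "ч": "ch", "ш": "sh"}


def encodable(text, encoding):
    """Return True if text can be represented in the target charset."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def translit(text, encoding, tables):
    """Try each table in order; keep the first candidate the charset accepts."""
    out = []
    for ch in text:
        if encodable(ch, encoding):
            out.append(ch)  # already representable, nothing to do
            continue
        for table in tables:
            candidate = table.get(ch)
            if candidate is not None and encodable(candidate, encoding):
                out.append(candidate)
                break
        else:
            out.append("?")  # hypothetical placeholder for unmapped input
    return "".join(out)


def c_locale(text):
    # Point 1): ASCII only, a single table, no fallback chain.
    return translit(text, "ascii", [SYSTEM_B])


def other_locale(text, encoding):
    # Point 2): prefer System A, fall back to System B.
    return translit(text, encoding, [SYSTEM_A, SYSTEM_B])


print(other_locale("жук", "latin2"))  # žuk
print(other_locale("жук", "ascii"))   # zhuk
print(c_locale("жук"))                # zhuk
```

With these tables, an ASCII target gives the same result through the
fallback chain as through the C-only path, which is the matching
property mentioned in point 2).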