From: Marko Myllynen
Subject: Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
To: Egor Kobylkin, Rafal Luzynski, Keld Simonsen
Cc: libc-alpha@sourceware.org, libc-locales@sourceware.org, "Dmitry V. Levin", Volodymyr Lisivka, Carlos O'Donell, Max Kutny, danilo@gnome.org
Date: Tue, 09 Oct 2018 16:49:00 -0000
Message-ID: <8bfe3169-55c9-af90-91cb-fe0f3ecccfb6@redhat.com>
In-Reply-To: <18f97c1f-3da2-809d-14bb-6e6d677b27eb@kobylkin.com>

Hi,

To clarify, the page has a section explaining the differences
between transliteration and transcription and how the terminology is not
entirely unambiguous. It also explains that the national standard SFS 4900
overrides ISO 9, thus ISO 9 can't be used as-is in a Finnish context.

Thanks,

On 2018-10-09 19:22, Egor Kobylkin wrote:
> In the hope of being helpful: what you describe below from
> https://fi.wikipedia.org/wiki/Siirtokirjoitus is called _transcription_,
> not transliteration.
>
> Transliteration is what we have done with ISO 9 or GOST 7.79 System A
> and it could be the same for all languages indeed.
>
> The transcription can be phonetic or serve other purposes and depends on
> the target language or use case. We have used the GOST 7.79 System B.
>
> Egor
>
> On 09.10.2018 18:10, Marko Myllynen wrote:
>> Hi,
>>
>> On 2018-10-09 01:04, Rafal Luzynski wrote:
>>>
>>> Particularly, I think that those rules will not be helpful at all for
>>> the languages which use neither the Latin nor the Cyrillic alphabet.
>>
>> This is certainly a very good point.
>>
>>> If you refer to other languages than Russian which also use the Cyrillic
>>> alphabet but need different transliteration rules than Russian for
>>> the same characters then it is OK for me now. I am afraid that the iconv
>>> algorithm does not handle such a case. Of course, we should add this missing
>>> feature eventually but I do not volunteer to do it now.
>>
>> Yes, this would be needed for correct transliteration of different
>> languages, and this might be quite a bit of work. There's also the case
>> of transliteration and character sets, consider the transliteration
>> examples from https://fi.wikipedia.org/wiki/Siirtokirjoitus:
>>
>> Russian: Борис Николаевич Ельцин
>> Int'l: Boris Nikolaevič Elʹcin
>> Finnish: Boris Nikolajevitš Jeltsin
>> French: Boris Nikolaïevitch Ieltsine
>> Phonetic (IPA): [bɐˈrʲis nʲɪkɐˈlaɪvʲɪtɕ ˈjelʲtsɨn]
>>
>> For French you'll get the correct transliteration with iconv by using -t
>> ISO-8859-1//TRANSLIT, for Finnish with -t ISO-8859-15//TRANSLIT but it's
>> not so obvious how to get the above kind of transliteration for ISO 9
>> international or especially for the phonetic case.
>>
>> One thing that might be helpful here could be something like:
>>
>> $ echo ж | LC_ALL=fi_FI.UTF-8 iconv -f UTF-8 -t UTF-8//TRANSLIT_FORCE
>> ž
>>
>> That is, force transliteration of each character (if defined) even if
>> it's part of the target character set. AFAICS this is not currently
>> possible.
>>
>>> But, while at it, is there anything that stops us from adding transliteration
>>> rules for additional Cyrillic characters not used in Russian but used in
>>> other languages?
>>
>> This would probably make sense.
>>
>> FWIW, for Finnish the diff for Russian to be applied in the locale on
>> top of translit_cyrillic (ISO 9) rules would be something like below, I
>> still need to check whether there are rules needed for other languages
>> than Russian that could be added (I hope to submit a proper patch
>> against fi_FI shortly after translit_cyrillic has landed):
>>
>> ""
>> "";""
>> "";""
>> "";""
>> ""
>> ""
>> ""
>> ""
>> ""
>> ""
>>
>> Thanks,
>>
>
-- 
Marko Myllynen
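
The forced mode proposed in the quoted message ("force transliteration of each character (if defined) even if it's part of the target character set") can be illustrated with a minimal Python sketch. This is only a model of the idea, not a binding to iconv: the function name `translit`, the `force` flag, and the one-entry rule table are assumptions made for illustration, and `//TRANSLIT_FORCE` itself is a suggestion from the thread rather than an existing iconv suffix. The `?` fallback for characters without a rule is likewise a simplification.

```python
def translit(text, table, target_encoding, force=False):
    """Toy model of table-driven transliteration.

    table maps a source character to its replacement string.
    force=False: replace a character only when it cannot be encoded in
                 target_encoding (modelled on //TRANSLIT).
    force=True:  replace every character that has a rule, even if the
                 target encoding could represent it (the hypothetical
                 //TRANSLIT_FORCE from the thread).
    """
    out = []
    for ch in text:
        if force and ch in table:
            out.append(table[ch])
            continue
        try:
            ch.encode(target_encoding)      # representable as-is?
            out.append(ch)
        except UnicodeEncodeError:
            out.append(table.get(ch, "?"))  # fall back to the rule, else "?"
    return "".join(out)


# Hypothetical one-entry table, taken from the example in the message.
RULES = {"ж": "ž"}

print(translit("ж", RULES, "utf-8"))              # ж (UTF-8 can encode it, so no rule fires)
print(translit("ж", RULES, "utf-8", force=True))  # ž (rule applied regardless of the target)
print(translit("ж", RULES, "iso-8859-15"))        # ž (not encodable, so the rule fires)
```

The first call shows why a UTF-8 target never triggers the locale's rules in the non-forced mode: every input character is already representable, so nothing is replaced.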