From: Marko Myllynen <myllynen@redhat.com>
To: Egor Kobylkin <egor@kobylkin.com>,
Rafal Luzynski <digitalfreak@lingonborough.com>,
Keld Simonsen <keld@keldix.com>
Cc: libc-alpha@sourceware.org, libc-locales@sourceware.org,
"Dmitry V. Levin" <ldv@altlinux.org>,
Volodymyr Lisivka <vlisivka@gmail.com>,
Carlos O'Donell <carlos@redhat.com>, Max Kutny <mkutny@gmail.com>,
danilo@gnome.org
Subject: Re: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29
Date: Fri, 05 Oct 2018 12:21:00 -0000 [thread overview]
Message-ID: <69e26cab-810e-824b-3b16-b75ac44d8b0c@redhat.com> (raw)
In-Reply-To: <bb4e1ba5-5fa5-2986-2573-7d27be226124@kobylkin.com>
Hi,
The scheme I proposed would also be ASCII compatible; consider this example:
% CYRILLIC CAPITAL LETTER SHA
<U0428> "<U0160>";"<U0053><U0068>"
"printf \\u0428\\n | iconv -f UTF-8 -t ISO-8859-15//TRANSLIT | iconv -f
ISO-8859-15 -t UTF-8" would produce Å as per System A and "printf
\\u0428\\n | iconv -f UTF-8 -t ASCII//TRANSLIT" would produce Sh as per
System B.
Thanks,
On 2018-10-05 15:00, Egor Kobylkin wrote:
> Hi Marko,
>
> I have chosen the System B because it is ASCII compartible. System A is
> not ASCII compartible (diacritics in target).
>
> https://en.wikipedia.org/wiki/ISO_9#ISO_9:1995,_or_GOST_7.79_System_A
> "GOST 7.79 contains two transliteration tables.
>
> System A
> one Cyrillic character to one Latin character, some with diacritics
> â identical to ISO 9:1995
>
> System B
> one Cyrillic character to one or many Latin characters without
> diacritics
> "
> Hope this helps,
> Egor
>
> On 05.10.2018 13:54, Marko Myllynen wrote:
>> Hi,
>>
>> Would it make sense to first use ISO 9:1995/GOST 7.79 System A if
>> possible and if not, then fall back to GOST 7.79 System B?
>>
>> Implementation-wise current translit_* files have few examples where a
>> non-ASCII transliteration is tried first before an ASCII fallback. These
>> examples are from translit_neutral:
>>
>> % NARROW NO-BREAK SPACE
>> <U202F> <U00A0>;<U0020>
>> % REVERSED TRIPLE PRIME
>> <U2037> "<U2035><U2035><U2035>";"<U0060><U0060><U0060>"
>>
>> Thanks,
>>
>> On 2018-10-05 13:29, Egor Kobylkin wrote:
>>> Keld,Marko,Rafal, other locale maintainers,
>>>
>>> this all is written with having in mind a minimal viable fix for this
>>> bug asap. I want to avoid wasting maintainers time getting into
>>> fundamental discussions here (although for perfectly good reasons).
>>>
>>> I see three options:
>>> 1. those locale maintainers that are fine with using ISO
>>> 9:1995/GOST_7.79_System_B cyrillic transliteration table (Ru) include it
>>> in their locales (see attached screenshot of the table).
>>> 2. those that that want to have a differing table can create their own
>>> variety based on the spreadsheet I have prepared
>>> https://sourceware.org/bugzilla/attachment.cgi?id=8590 and include it in
>>> this patch.
>>> 3. those that want to omit a cyrillic transliteration altogether for now
>>> state so and just carry over the bug #2872 from the year 2006.
>>>
>>> Does this make sense to you?
>>>
>>> Just to be super clear on this: the patch is a stopgap _ASCII_
>>> transliteration table. ASCII being AMERICAN Standard Code for
>>> Information Interchange, that is obviously orthogonal to any
>>> transliteration rule of other countries. As such it is not explicitly
>>> targeting transliteration standards of any country.
>>>
>>> The fact that the patch is reflecting Russian variety of ISO
>>> 9:1995/GOST_7.79_System_B is because a) ISO 9:1995/GOST_7.79_System_B is
>>> available and can be helpful to a majority of cyrillic users b) I have
>>> access to it including via being proficient in Russian.
>>>
>>> It is offered to all the respective locale maintainers as a stopgap
>>> solution. Stopgap in the sense that it is better to have some
>>> transliteration than not to have any at all and carry over the bug from
>>> 2006. That it may be a somewhat officially correct transliteration for
>>> ru_RU is a bonus. In that sense I would dub the discussion on the
>>> correctness for other languages "offtopic". Let me know if this is not OK.
>>>
>>> You are all are correctly mentioning the deficiencies of this approach.
>>> However, I couldn't find a better straightforward approach as of yet.
>>> Happy to hear from you as on how this could be handled.
>>>
>>> There is a danger of being caught in the web of language/country
>>> differences. I propose just pruning the locales that are not comfortable
>>> including this current table. We can address possible solutions in the
>>> second wave of patching.
>>>
>>> I am vary of getting into discussions on specific country variants just
>>> because of the sheer complexity of this topic. It is probably better
>>> addressed by respective maintainers of their locales. I do not see a
>>> "one fits all" solution in this first wave possible.
>>>
>>> I would like to have this "three options plan of action" vetted first
>>> and then we could go to the specific detail. (Like, for instance, what
>>> characters should be included in to the table, and in which
>>> transliteration form.)
>>>
>>> I am looking forward to your reply,
>>> Egor Kobylkin
>>>
>>> P.S. specifically as to how address languages other than Ru included in
>>> GOST_7.79_System_B: we can take the first option left to right from that
>>> table (Ru,By,Uk,Bg,Mk). Then it will technically work for all those
>>> locales/languages but with errors where Ru supersedes their own variants.
>>>
>>>
>>> On 05.10.2018 11:20, Rafal Luzynski wrote:
>>>> 3.10.2018 11:32 Egor Kobylkin <egor@kobylkin.com> wrote:
>>>>>
>>>>> On 03.10.2018 11:19, Keld Simonsen wrote:
>>>>>> Hi
>>>>>>
>>>>>> Please note that translitteration of Cyrillic to latin is not universal.
>>>>>> There are different schemes for for example German, English and Danish, and
>>>>>> there is also an ISO standard for it.
>>>>>
>>>>> Thanks for your feedback, Keld!
>>>>>
>>>>> Could the locale maintainers that wouldn't like to include this patch
>>>>> explicitly state so here?
>>>>
>>>> I think it is about me so I must reply. I am sorry about that and the sole
>>>> reason is my lack of time. I'm just a volunteer here, that means it's not
>>>> my regular job to work on locale data nor anything in glibc nor in any other
>>>> open source project. I do these things only in my free time which I don't
>>>> have much. Of course you will see my contributions here and there but they
>>>> are either trivial or take me months to complete. Your patches are on my
>>>> radar but I can't tell any ETA for them. Of course, there are other people
>>>> around here and they are all welcome to come and join.
>>>>
>>>>> That is:
>>>>> - In the case that there is a different preferred cyrillic
>>>>> transliteration table for any specific locale their maintainers may want
>>>>> to point me to it so I can supply a separate table/patch.
>>>>> - Or they could state explicitly that for some reason they would like to
>>>>> exclude their locale from the patch for a default cyrillic
>>>>> transliteration altogether.
>>>>
>>>> As Keld wrote, there are probably separate rules for every language so
>>>> I don't think you should treat your rules as universal and include them
>>>> in every locale. At first sight, it seems to me they work only for English
>>>> (as a destination locale). Also, although it is called "transliteration
>>>> from Cyrillic" it seems that it covers only Russian alphabet. What about
>>>> other languages which use Cyrillic alphabet but add their own diacritic
>>>> characters? Think about Belarusian, Ukrainian, Serbian, Chechen, Chuvash,
>>>> Mari, Ossetian, Yakut, Tatar, and more. What about languages which use
>>>> Cyrillic alphabet but transliterate their respective letters in a different
>>>> way than Russian? For example, Russian "Ъ" is (I think) usually skipped
>>>> in transliteration, I think you propose "``", but when transliterating from
>>>> Bulgarian they usually transliterate this as "Ä".
>>>>
>>>> Few remarks:
>>>>
>>>> * I think you transliterate "Ñ" as "shh", wouldn't "shch" be better?
>>>> * You transliterate "Ñ" as "cz", wouldn't "ts" be better? By the way,
>>>> in Polish language "cz" is a correct transliteration of "Ñ".
>>>> * You transliterate "й" as "j", this is fine in many languages but wouldn't
>>>> "y" be better in English?
>>>> * In case of "е": how will you know if it is correct to transliterate it
>>>> to "e" or "ie" or "je" or "ye"?
>>>>
>>>> These remarks are obviously incomplete, your patch deserves much more
>>>> attention to review.
>>>>
>>>> Best regards,
>>>>
>>>> Rafal
>>>>
>>>
>>
>>
>
--
Marko Myllynen
next prev parent reply other threads:[~2018-10-05 12:21 UTC|newest]
Thread overview: 107+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <41532e13-a63d-5df1-ab37-05eb4d6c8d0a@kobylkin.com>
[not found] ` <20180412224352.GB2911@altlinux.org>
2018-07-17 19:34 ` SUBJECT: [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] Egor Kobylkin
2018-07-17 19:41 ` Carlos O'Donell
2018-07-17 19:50 ` Egor Kobylkin
2018-07-17 19:59 ` Carlos O'Donell
2018-08-06 19:00 ` [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] re-submission for 2.29 Egor Kobylkin
2018-10-03 8:28 ` Egor Kobylkin
2018-10-03 9:20 ` Keld Simonsen
2018-10-03 9:32 ` Egor Kobylkin
2018-10-05 8:44 ` Marko Myllynen
2018-10-05 9:20 ` Rafal Luzynski
2018-10-05 10:37 ` Egor Kobylkin
2018-10-08 22:05 ` Rafal Luzynski
2018-10-08 22:52 ` Egor Kobylkin
2018-10-09 21:43 ` Rafal Luzynski
2018-10-09 16:10 ` Marko Myllynen
2018-10-09 16:22 ` Egor Kobylkin
2018-10-09 16:49 ` Marko Myllynen
2018-10-09 22:09 ` Rafal Luzynski
2018-10-10 11:21 ` Marko Myllynen
2018-10-11 10:10 ` Marko Myllynen
[not found] ` <deacdf31-d0bb-a92d-1de3-934d6b4cb158@kobylkin.com>
2018-10-05 11:54 ` Marko Myllynen
2018-10-05 12:01 ` Egor Kobylkin
2018-10-05 12:21 ` Marko Myllynen [this message]
2018-10-05 20:47 ` Egor Kobylkin
2018-10-08 12:41 ` Marko Myllynen
2018-10-08 22:23 ` Rafal Luzynski
2018-10-08 23:36 ` Egor Kobylkin
2018-10-09 13:18 ` Egor Kobylkin
2018-10-09 18:34 ` Egor Kobylkin
2018-10-09 22:18 ` Rafal Luzynski
2018-10-09 22:40 ` Egor Kobylkin
2018-10-09 22:43 ` Egor Kobylkin
2018-10-10 11:23 ` Marko Myllynen
2018-10-10 12:20 ` Egor Kobylkin
2018-10-10 12:34 ` Marko Myllynen
2018-10-10 22:29 ` [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] v2 Egor Kobylkin
2018-10-11 10:00 ` Marko Myllynen
2018-10-11 11:05 ` Rafal Luzynski
2018-10-11 13:10 ` Marko Myllynen
2018-10-11 13:51 ` Volodymyr Lisivka
2018-10-11 14:59 ` Egor Kobylkin
2018-10-11 21:31 ` Egor Kobylkin
2018-10-11 15:05 ` Egor Kobylkin
2018-10-11 15:45 ` [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] v3 Egor Kobylkin
2018-10-11 21:33 ` [PATCH] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] v4 Egor Kobylkin
2018-10-12 14:06 ` [PATCH v5] Locales: Cyrillic -> ASCII transliteration table [BZ #2872] Egor Kobylkin
2018-10-13 1:01 ` Rafal Luzynski
2018-10-13 16:58 ` Egor Kobylkin
2018-10-15 11:05 ` Marko Myllynen
2018-10-15 11:55 ` Egor Kobylkin
2018-10-23 23:08 ` Rafal Luzynski
2018-10-17 14:17 ` [PATCH v6] " Egor Kobylkin
2018-11-01 22:52 ` [PATCH v7] " Egor Kobylkin
2018-11-02 0:01 ` [PATCH v8] " Egor Kobylkin
2018-11-02 22:22 ` Rafal Luzynski
2018-11-02 23:27 ` Egor Kobylkin
2018-11-14 21:25 ` [PATCH v9] " Egor Kobylkin
2018-11-16 22:17 ` Rafal Luzynski
2018-11-17 18:35 ` Egor Kobylkin
2018-11-19 7:14 ` Marko Myllynen
2018-11-19 9:22 ` Egor Kobylkin
2018-11-19 19:36 ` Marko Myllynen
2018-12-01 22:09 ` Rafal Luzynski
2018-12-01 22:53 ` Egor Kobylkin
2018-12-03 22:19 ` Egor Kobylkin
[not found] ` <1361059722.707244.1544231740358@poczta.nazwa.pl>
2018-12-10 21:20 ` Marko Myllynen
2018-12-19 22:26 ` Rafal Luzynski
2018-12-19 22:48 ` Egor Kobylkin
2018-12-19 23:51 ` Rafal Luzynski
2018-11-19 11:11 ` [PATCH v10] " Egor Kobylkin
2018-12-07 23:35 ` Rafal Luzynski
2018-12-08 21:51 ` Egor Kobylkin
2018-12-19 22:42 ` Rafal Luzynski
2018-12-19 23:02 ` Egor Kobylkin
2018-12-20 0:06 ` Rafal Luzynski
2018-12-08 22:28 ` [PATCH v11] Locales: Cyrillic -> ASCII transliteration " Egor Kobylkin
2018-12-19 23:16 ` Egor Kobylkin
2018-12-26 10:07 ` Siddhesh Poyarekar
2018-12-26 12:14 ` Egor Kobylkin
2018-12-27 1:31 ` Siddhesh Poyarekar
2018-12-27 11:31 ` Rafal Luzynski
2019-01-02 18:39 ` [PATCH v12] " Egor Kobylkin
2019-01-05 14:36 ` Rafal Luzynski
2019-01-05 21:13 ` Egor Kobylkin
2019-01-07 20:37 ` Marko Myllynen
2019-01-09 0:46 ` Egor Kobylkin
2019-01-09 20:03 ` Marko Myllynen
2019-02-04 7:14 ` [PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872] ping for 2.30 Egor Kobylkin
2019-02-14 16:48 ` Marko Myllynen
2019-03-04 22:12 ` Egor Kobylkin
2019-03-11 13:59 ` PING " Egor Kobylkin
2019-03-14 19:49 ` Egor Kobylkin
2019-04-19 22:24 ` Rafal Luzynski
[not found] ` <5ELixS9SQ0DW4mlvswp96ASpLobBabU9KQ6zOTH-Udrb34mABhcqiPERpBZfPWZ9F77s8XNmiLIAq9UWu0AjLFFdjOz_FZVU5_xF-SiQkrw=@kobylkin.com>
2019-04-27 2:51 ` Siddhesh Poyarekar
2019-04-27 7:34 ` Diego (Egor) Kobylkin
2019-04-09 1:04 ` [PATCH v12] Locales: Cyrillic -> ASCII transliteration [BZ #2872] Carlos O'Donell
2019-03-19 10:39 ` ping " Egor Kobylkin
2019-03-28 16:20 ` [PING^4][PATCH " Marko Myllynen
2019-04-04 19:44 ` [PING^5][PATCH " Egor Kobylkin
2019-04-06 1:36 ` Siddhesh Poyarekar
2019-04-16 7:15 ` [PING^6][PATCH " Marko Myllynen
2019-04-16 13:17 ` Carlos O'Donell
2019-04-16 17:07 ` Egor Kobylkin
2019-04-16 17:58 ` Carlos O'Donell
2019-04-16 18:41 ` Egor Kobylkin
2019-04-16 19:06 ` Carlos O'Donell
2019-05-10 12:19 ` Marko Myllynen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=69e26cab-810e-824b-3b16-b75ac44d8b0c@redhat.com \
--to=myllynen@redhat.com \
--cc=carlos@redhat.com \
--cc=danilo@gnome.org \
--cc=digitalfreak@lingonborough.com \
--cc=egor@kobylkin.com \
--cc=keld@keldix.com \
--cc=ldv@altlinux.org \
--cc=libc-alpha@sourceware.org \
--cc=libc-locales@sourceware.org \
--cc=mkutny@gmail.com \
--cc=vlisivka@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).