Hi Rafal,

> But, while at this, is there anything that stops us from adding
> transliteration rules for additional Cyrillic characters not used in
> Russian but used in other languages?

Just to make sure we are not talking at cross purposes: since your last
email on this topic, and following Marko's suggestion, I have already
implemented ISO 9 transliteration for all the characters it defines.
This should cover most, if not all, Slavic Cyrillic alphabets. It seems
you noticed Marko's email and replied to it just as I was writing mine.

Please also check the spreadsheet version I have just uploaded:
https://sourceware.org/bugzilla/attachment.cgi?id=11298

I am currently working through Marko's further suggestions and
corrections to that version and will get back to you for more
discussion once I am done. Rest assured that I am reading your
suggestions and taking them to heart.

Two professional translators independently pointed out to me the
difference between transliteration and transcription. Transliteration is
normative (letter for letter), while transcription is phonetic: each
letter becomes whatever combination of Latin letters sounds like it to a
native speaker of the target language. While transliteration should be
easy to cover for all those languages via ISO 9, transcription is
inherently language-specific. The problem is that we are (mis)using
transcription as transliteration to ASCII, because the ASCII character
set lacks the diacritics that proper letter-for-letter transliteration
needs. Another problem is that, to be really useful, the ASCII
transliteration should also work outside the source locale (i.e. not
only in ru_RU but also in en_US, de_DE, en_DE, es_ES, etc., or even in
the plain C locale).

In fact, I would commit myself to doing all the work needed to cover at
least C, en_US, ru_RU and de_DE, in that order. ru_RU would be a
"courtesy": I am not really using it, but I hope it may attract more
locale contributors who will fix my bugs :-).


> The problem is that we don't have a separate maintainer for each
> locale, we have only 2 maintainers for about 200 locales and we must
> represent them all.

It was not clear to me that the glibc team cannot fall back on the
individual locale maintainers to make the decision. But then that may
make the decision-making even easier. If you guys have a list of
requirements (maybe implicit until now), could you please shoot them my
way? We can also certainly just keep this thread going until all issues
are ironed out.

Anyway, with ISO 9 as the first column in translit_cyrillic, we
hopefully now cover the completeness of the transliteration. What we
still need to figure out is the transcription/transliteration to ASCII,
i.e. the second column.
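
As I understand it, glibc translit entries can list several replacements
separated by `;`, and the first one the target character set can
represent is used. Under that assumption, here is a minimal Python
sketch of how the two columns would interact (ISO 9 first, ASCII
second). The three-letter table and the `translit`/`encodable` helpers
are hypothetical; they are neither the actual translit_cyrillic contents
nor glibc's code.

```python
# Hypothetical subset of a two-column table:
# first alternative = ISO 9, second alternative = ASCII fallback.
TABLE = {
    "ж": ["ž", "zh"],
    "щ": ["ŝ", "shh"],
    "х": ["h", "x"],
}


def encodable(s: str, charset: str) -> bool:
    """True if `s` can be represented in `charset`."""
    try:
        s.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


def translit(text: str, charset: str) -> str:
    """Replace each unrepresentable character with its first encodable alternative."""
    out = []
    for ch in text:
        if encodable(ch, charset):
            out.append(ch)  # no transliteration needed
            continue
        for alt in TABLE.get(ch, []):
            if encodable(alt, charset):
                out.append(alt)
                break
        else:
            out.append("?")  # placeholder chosen for this sketch
    return "".join(out)


if __name__ == "__main__":
    print(translit("жщх", "ascii"))      # zhshhh
    print(translit("жщх", "iso8859_2"))  # žshhh (Latin-2 has ž but not ŝ)
```

Note that `х` ends up as `h` even for ASCII, because its ISO 9 form is
already ASCII; getting the System B `x` would require a different column
order, which is the kind of detail the second-column discussion would
have to settle.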

Do we share the same view on this?

Speaking of decision-making: maybe I can get an officially certified
court translator to answer our questions. Would you care to put together
a list of questions you would like answered in order to decide on the
table and its inclusion in various locales?

Hope this helps,
Egor


On 09.10.2018 00:04, Rafal Luzynski wrote:
> 5.10.2018 12:36 Egor Kobylkin <egor@kobylkin.com> wrote:
>> [...] I see three options:
>>
>> 1. Those locale maintainers that are fine with using the ISO
>>    9:1995/GOST_7.79_System_B Cyrillic transliteration table (Ru)
>>    include it in their locales.
>>    https://sourceware.org/bugzilla/attachment.cgi?id=11289
>> 2. Those that want a different table can create their own variety
>>    based on the spreadsheet I have prepared
>>    https://sourceware.org/bugzilla/attachment.cgi?id=8590 and include
>>    it in this patch.
>> 3. Those that want to omit Cyrillic transliteration altogether for
>>    now state so and just carry over bug #2872 from the year 2006.
>> 
>> Does this make sense to you?
> 
> The problem is that we don't have a separate maintainer for each
> locale, we have only 2 maintainers for about 200 locales and we must
> represent them all.  Sometimes a locale may happen to be our own
> native locale or that of someone on this list, or it may be a locale
> which we happen to speak as a foreign language, or we may have
> friends who can speak it. Or it may be totally unknown and we still
> must somehow handle it.
> 
> I think that these transliteration rules should be included in
> multiple locales on an "opt-in" basis rather than "opt-out".  I mean,
> we should not include them in all locales unless someone explicitly
> provides different rules.  Instead, I think we should add them
> (maybe with modification) only to those locales where we have a good
> reason to think they will work.
> 
> Particularly, I think that those rules will not be helpful at all
> for the languages which use neither Latin nor Cyrillic alphabet.
> 
>> [...] The patch reflects the Russian variety of ISO
>> 9:1995/GOST_7.79_System_B because a) ISO 9:1995/GOST_7.79_System_B
>> is available and can be helpful to a majority of Cyrillic users, and
>> b) I have access to it, not least through being proficient in Russian.
> 
> I took a look at these standards.  At first I doubted they could be
> correct for the English language, but now I understand they were
> created for Russian users.  Therefore I think it is quite correct to
> include them in the Russian locale data.  Will it be OK if we say that
> it is only for the Russian language?  Will it be satisfying for you
> and/or your users?
> 
>> It is offered to all the respective locale maintainers as a
>> stopgap solution. Stopgap in the sense that it is better to have
>> some transliteration than not to have any at all and carry over the
>> bug from 2006. That it may be a somewhat officially correct
>> transliteration for ru_RU is a bonus. In that sense I would dub the
>> discussion on the correctness for other languages "off-topic". Let
>> me know if this is not OK.
> 
> If you refer to languages other than Russian which also use the
> Cyrillic alphabet but need different transliteration rules than
> Russian for the same characters, then it is OK for me now.  I am
> afraid that the iconv algorithm does not handle such a case.  Of
> course, we should add this missing feature eventually, but I do not
> volunteer to do it now.
> 
>> [...] P.S. Specifically, as to how to address the languages other than
>> Ru included in GOST_7.79_System_B: we can take the first option, left
>> to right, from that table (Ru,By,Uk,Bg,Mk). Then it will technically
>> work for all those locales/languages, but with errors where Ru
>> supersedes their own variants.
> 
> Makes sense, as long as we cannot select the source language now.
> 
> But, while at this, is there anything that stops us from adding
> transliteration rules for additional Cyrillic characters not used in
> Russian but used in other languages?
> 
> Regards,
> 
> Rafal
>