public inbox for libc-locales@sourceware.org
* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
@ 2015-09-07 11:50 ` ekobylkin at paypal dot com
  2015-09-07 11:54 ` schwab@linux-m68k.org
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-07 11:50 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

Egor Kobylkin <ekobylkin at paypal dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
                 CC|                            |ekobylkin at paypal dot com
         Resolution|WORKSFORME                  |---

--- Comment #5 from Egor Kobylkin <ekobylkin at paypal dot com> ---
I would like to try to supply the data you need to make the Cyrillic
transliteration work for the ru_RU locale. Could you point me to an example of
the data you would need?

Here is what I have tried just to see what works.
$ echo Лаковый |LANG=ru_RU.UTF-8 iconv -t ASCII//TRANSLIT
iconv: (stdin):1:0: cannot convert

$ echo Лаковый |LANG=sr_CS.UTF-8 iconv -t ASCII//TRANSLIT
iconv: (стдул):1:0: не може претворити

$ echo Лаковый |LANG=de_DE.UTF-8 iconv -t ASCII//TRANSLIT
iconv: (Standard-Eingabe):1:0: Kann nicht umwandeln.

$ echo Müßte |LANG=de_DE.UTF-8 iconv -t ASCII//TRANSLIT
M"usste

$ echo Лаковый |LANG=en_US.UTF-8 iconv -t ASCII//TRANSLIT
iconv: (stdin):1:0: cannot convert

$ echo Müßte |LANG=en_US.UTF-8 iconv -t ASCII//TRANSLIT
M"usste

-- 
You are receiving this mail because:
You are the assignee for the bug.

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
  2015-09-07 11:50 ` [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails ekobylkin at paypal dot com
@ 2015-09-07 11:54 ` schwab@linux-m68k.org
  2015-09-07 12:22 ` ekobylkin at paypal dot com
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: schwab@linux-m68k.org @ 2015-09-07 11:54 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

Andreas Schwab <schwab@linux-m68k.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |WAITING

--- Comment #6 from Andreas Schwab <schwab@linux-m68k.org> ---
Make sure you are not using any local modifications.

$ echo Лаковый |LANG=ru_RU.UTF-8 iconv -t ASCII//TRANSLIT
???????
$ echo Лаковый |LANG=sr_CS.UTF-8 iconv -t ASCII//TRANSLIT
iconv: illegal input sequence at position 0
$ echo Лаковый |LANG=de_DE.UTF-8 iconv -t ASCII//TRANSLIT
???????
$ echo Müßte |LANG=de_DE.UTF-8 iconv -t ASCII//TRANSLIT
Muesste
$ echo Лаковый |LANG=en_US.UTF-8 iconv -t ASCII//TRANSLIT
???????
$ echo Müßte |LANG=en_US.UTF-8 iconv -t ASCII//TRANSLIT
Musste

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
  2015-09-07 11:50 ` [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails ekobylkin at paypal dot com
  2015-09-07 11:54 ` schwab@linux-m68k.org
@ 2015-09-07 12:22 ` ekobylkin at paypal dot com
  2015-09-07 15:07 ` myllynen at redhat dot com
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-07 12:22 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

--- Comment #7 from Egor Kobylkin <ekobylkin at paypal dot com> ---
Andreas, in your example the Cyrillic transliteration does not work either. My
understanding is that the tool lacks a transliteration table from Cyrillic to
ASCII, for example in the ru_RU locale. This is what Ulrich Drepper asks for
in comment 2: https://sourceware.org/bugzilla/show_bug.cgi?id=2872#c2
I would like to know in which form this data should be provided.

I am only concerned with Cyrillic for now. German serves as an example that
the functionality works at all in at least one case.

My first submission was from Cygwin on Windows 7. While the result may indeed
be an effect of Cygwin (the name similarity to Cyrillic is coincidental), I
have just tried the same on Ubuntu 12 with essentially the same effect.

$ echo Лаковый |LANG=ru_RU.UTF-8 iconv -t ASCII//TRANSLIT
iconv: illegal input sequence at position 0
$ echo Лаковый |LANG=sr_CS.UTF-8 iconv -t ASCII//TRANSLIT
iconv: illegal input sequence at position 0
$ echo Лаковый |LANG=de_DE.UTF-8 iconv -t ASCII//TRANSLIT
iconv: illegal input sequence at position 0
$ echo Müßte |LANG=de_DE.UTF-8 iconv -t ASCII//TRANSLIT
Miconv: illegal input sequence at position 1
$ echo Лаковый |LANG=en_US.UTF-8 iconv -t ASCII//TRANSLIT
???????
$ echo Müßte |LANG=en_US.UTF-8 iconv -t ASCII//TRANSLIT
Musste

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2015-09-07 12:22 ` ekobylkin at paypal dot com
@ 2015-09-07 15:07 ` myllynen at redhat dot com
  2015-09-07 23:08 ` ekobylkin at paypal dot com
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: myllynen at redhat dot com @ 2015-09-07 15:07 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

Marko Myllynen <myllynen at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |myllynen at redhat dot com

--- Comment #8 from Marko Myllynen <myllynen at redhat dot com> ---
You'll need to set up a testing environment where you can see how your changes
affect iconv and then try to come up with proper rules. The following links
should help you get started (and are pretty much all the documentation there
is):

https://sourceware.org/glibc/wiki/Locales
http://man7.org/linux/man-pages/man1/iconv.1.html
https://sourceware.org/bugzilla/show_bug.cgi?id=16061
https://sourceware.org/ml/libc-alpha/2015-07/msg00836.html

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2015-09-07 15:07 ` myllynen at redhat dot com
@ 2015-09-07 23:08 ` ekobylkin at paypal dot com
  2015-09-07 23:09 ` ekobylkin at paypal dot com
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-07 23:08 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

--- Comment #9 from Egor Kobylkin <ekobylkin at paypal dot com> ---
Created attachment 8585
  --> https://sourceware.org/bugzilla/attachment.cgi?id=8585&action=edit
the OpenOffice Calc spreadsheet used to create the translit_cyrillic file

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2015-09-07 23:08 ` ekobylkin at paypal dot com
@ 2015-09-07 23:09 ` ekobylkin at paypal dot com
  2015-09-07 23:12 ` ekobylkin at paypal dot com
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-07 23:09 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

Egor Kobylkin <ekobylkin at paypal dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #8585|the OpenOffice Calc         |the LibreOffice Calc
        description|spreadsheet used to create  |spreadsheet used to create
                   |the translit_cyrillic file  |the translit_cyrillic file

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2015-09-07 23:09 ` ekobylkin at paypal dot com
@ 2015-09-07 23:12 ` ekobylkin at paypal dot com
  2015-09-07 23:35 ` ekobylkin at paypal dot com
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-07 23:12 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

--- Comment #10 from Egor Kobylkin <ekobylkin at paypal dot com> ---
Created attachment 8586
  --> https://sourceware.org/bugzilla/attachment.cgi?id=8586&action=edit
translation table for transliteration of cyrillic to ascii

Single character version. Up to three characters are required to do a
reversible transliteration. The table for a reversible transliteration can be
created through the same spreadsheet included here.

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2015-09-07 23:12 ` ekobylkin at paypal dot com
@ 2015-09-07 23:35 ` ekobylkin at paypal dot com
  2015-09-08  7:31 ` myllynen at redhat dot com
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-07 23:35 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

--- Comment #11 from Egor Kobylkin <ekobylkin at paypal dot com> ---
I have read the documents Marko Myllynen linked in comment 8.
My understanding so far is that, apart from possibly required code changes that
are not yet clear to me, there should be a translation table for the
transliteration.

Based on the
man page http://man7.org/linux/man-pages/man5/locale.5.html
the official Russian GOST 7.79-2000 transliteration table
http://transliteration.ru/gost-7-79-2000/
and the Unicode file http://www.unicode.org/Public/UNIDATA/UnicodeData.txt
I have created a single-character transliteration table in the form of the
following list:
% CYRILLIC CAPITAL LETTER IO
<U0401> <U0059>
% CYRILLIC CAPITAL LETTER A
<U0410> <U0041>
% CYRILLIC CAPITAL LETTER BE
<U0411> <U0042>
% CYRILLIC CAPITAL LETTER VE
<U0412> <U0056>
etc.
The first Unicode value is the Cyrillic letter and the second is the
corresponding ASCII character.

The file is attached as translit_cyrillic.
I wonder whether it is already usable for inclusion into the Latin-based
locale files via the "include" keyword.

Please let me know what you think. Specifically, my understanding is that this
is the list that Ulrich Drepper was requesting.

I would be grateful if somebody familiar with the logic behind the
transliteration file structure could outline the missing parts in case the
above is not sufficient to bootstrap the Cyrillic-to-ASCII transliteration.
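
For illustration, entries in the format described above can be produced
mechanically. The following Python sketch is not the spreadsheet attached to
this bug; it is a minimal stand-in that assumes a hand-written mapping (the
hypothetical SAMPLE_MAPPING, holding only the four letters quoted above) and
takes the comment lines from the standard unicodedata module.

```python
import unicodedata

# Hypothetical sample mapping: only the four letters quoted in the comment.
# Escapes are used so the source letters are unambiguous code points.
SAMPLE_MAPPING = {
    "\u0401": "Y",  # CYRILLIC CAPITAL LETTER IO
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u0411": "B",  # CYRILLIC CAPITAL LETTER BE
    "\u0412": "V",  # CYRILLIC CAPITAL LETTER VE
}


def code_point(ch):
    """Render one character in the <Uxxxx> notation used by locale files."""
    return f"<U{ord(ch):04X}>"


def format_entry(source, target):
    """Return a '% NAME' comment line followed by the mapping line."""
    return f"% {unicodedata.name(source)}\n{code_point(source)} {code_point(target)}"


def format_table(mapping):
    """Format all entries, ordered by the code point of the source letter."""
    ordered = sorted(mapping.items(), key=lambda item: ord(item[0]))
    return "\n".join(format_entry(src, dst) for src, dst in ordered)


if __name__ == "__main__":
    print(format_table(SAMPLE_MAPPING))
```

Sorting by code point keeps the generated file in the same order as the
excerpt, which makes it easier to compare against UnicodeData.txt.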

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2015-09-07 23:35 ` ekobylkin at paypal dot com
@ 2015-09-08  7:31 ` myllynen at redhat dot com
  2015-09-08  7:41 ` ekobylkin at paypal dot com
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: myllynen at redhat dot com @ 2015-09-08  7:31 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

Marko Myllynen <myllynen at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maiku.fabian at gmail dot com

--- Comment #12 from Marko Myllynen <myllynen at redhat dot com> ---
I don't read Cyrillic but technically the table looks like what would be
expected. I'm CC'ing Mike Fabian, who has done the heavy lifting for bug 16061.
Mike, how does this look to you?

If you haven't done so already, please test your changes; the wiki page
mentioned earlier and the following man pages should provide all the needed
information.

http://man7.org/linux/man-pages/man1/locale.1.html
http://man7.org/linux/man-pages/man1/localedef.1.html
http://man7.org/linux/man-pages/man7/locale.7.html

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2015-09-08  7:31 ` myllynen at redhat dot com
@ 2015-09-08  7:41 ` ekobylkin at paypal dot com
  2015-09-08  9:11 ` ekobylkin at paypal dot com
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-08  7:41 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

--- Comment #13 from Egor Kobylkin <ekobylkin at paypal dot com> ---
Thank you for the feedback, Marko!
I will do the testing as suggested and will supply the multi-character
transliteration as well. While for my purposes a single-character would do, it
should be more practical to have the multi-character one in place.

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2015-09-08  7:41 ` ekobylkin at paypal dot com
@ 2015-09-08  9:11 ` ekobylkin at paypal dot com
  2015-09-08 10:06 ` ekobylkin at paypal dot com
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-08  9:11 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

Egor Kobylkin <ekobylkin at paypal dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #8586|0                           |1
        is obsolete|                            |

--- Comment #14 from Egor Kobylkin <ekobylkin at paypal dot com> ---
Created attachment 8588
  --> https://sourceware.org/bugzilla/attachment.cgi?id=8588&action=edit
a version that works for localedef

I have tested it with the en_GB locale by including it in the following
section:
LC_CTYPE
copy "i18n"

translit_start
include "translit_combining";"translit_cyrillic";""
translit_end
END LC_CTYPE

Let's copy en_GB to en_TR for testing purposes and generate the new locale
en_TR.UTF-8 from within glibc/localedata/locales:
$ I18NPATH=./ localedef -f UTF-8 -i en_TR en_TR.UTF-8
Now we can test the transliteration:

$ echo Съешь ещё этих мягких французских булок, да выпей же чаю | LOCPATH=. \
  LC_ALL=en_TR.UTF-8 LANG=en_TR.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT
S`es` esy etix mygkix francuzskix bulok, da vypej ze cay

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2015-09-08  9:11 ` ekobylkin at paypal dot com
@ 2015-09-08 10:06 ` ekobylkin at paypal dot com
  2015-09-08 10:06 ` ekobylkin at paypal dot com
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-08 10:06 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

Egor Kobylkin <ekobylkin at paypal dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #8585|0                           |1
        is obsolete|                            |

--- Comment #16 from Egor Kobylkin <ekobylkin at paypal dot com> ---
Created attachment 8590
  --> https://sourceware.org/bugzilla/attachment.cgi?id=8590&action=edit
the LibreOffice Calc spreadsheet used to create the translit_cyrillic file with
multi-character transliteration

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2015-09-08 10:06 ` ekobylkin at paypal dot com
@ 2015-09-08 10:06 ` ekobylkin at paypal dot com
  2015-09-08 10:06 ` ekobylkin at paypal dot com
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-08 10:06 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

Egor Kobylkin <ekobylkin at paypal dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #8589|0                           |1
        is obsolete|                            |

--- Comment #17 from Egor Kobylkin <ekobylkin at paypal dot com> ---
Created attachment 8591
  --> https://sourceware.org/bugzilla/attachment.cgi?id=8591&action=edit
multi-character transliteration table cyrillic->ascii with fallback to
single-character

Correction: updated the comment in the file to reflect the new spreadsheet and
the multi-character feature.

I have not tested the fallback to single-character transliteration but have
included it in case somebody needs it. I am not sure how to test it.
The file should ideally be included in all Latin-based locales via an include
in this section, as follows (example):
LC_CTYPE
copy "i18n"

translit_start
include "translit_combining";"translit_cyrillic";""
translit_end
END LC_CTYPE
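
To make the multi-character entry with a single-character fallback concrete,
here is a hedged Python sketch of how such a line might be formatted. It
assumes the convention seen in existing glibc locale sources, where
alternatives are quoted strings separated by semicolons and are tried in
order; the attached file may be laid out differently, and the helper names
are hypothetical.

```python
def code_points(text):
    """Render a string as a run of <Uxxxx> symbols."""
    return "".join(f"<U{ord(ch):04X}>" for ch in text)


def format_with_fallback(source, candidates):
    """Format one entry: source letter, then quoted alternatives in
    preference order (multi-character first, single-character last)."""
    alternatives = ";".join(f'"{code_points(c)}"' for c in candidates)
    return f"{code_points(source)} {alternatives}"


if __name__ == "__main__":
    # CYRILLIC CAPITAL LETTER ZHE: prefer "Zh", fall back to "Z".
    print(format_with_fallback("\u0416", ["Zh", "Z"]))
```

For the letter ZHE this yields an entry of the form
<U0416> "<U005A><U0068>";"<U005A>".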

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2015-09-08 10:06 ` ekobylkin at paypal dot com
@ 2015-09-08 10:06 ` ekobylkin at paypal dot com
  2015-09-08 10:21 ` ekobylkin at paypal dot com
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-08 10:06 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

Egor Kobylkin <ekobylkin at paypal dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Attachment #8588|0                           |1
        is obsolete|                            |

--- Comment #15 from Egor Kobylkin <ekobylkin at paypal dot com> ---
Created attachment 8589
  --> https://sourceware.org/bugzilla/attachment.cgi?id=8589&action=edit
multi-character transliteration table cyrillic->ascii with fallback to
single-character

I have not tested the fallback to single-character transliteration but have
included it in case somebody needs it. I am not sure how to test it.
The file should ideally be included in all Latin-based locales via an include
in this section, as follows (example):
LC_CTYPE
copy "i18n"

translit_start
include "translit_combining";"translit_cyrillic";""
translit_end
END LC_CTYPE

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2015-09-08 10:06 ` ekobylkin at paypal dot com
@ 2015-09-08 10:21 ` ekobylkin at paypal dot com
  2015-09-18  9:30 ` ekobylkin at paypal dot com
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-08 10:21 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

--- Comment #18 from Egor Kobylkin <ekobylkin at paypal dot com> ---
(In reply to Ulrich Drepper from comment #2)
> Transliteration is locale dependent, there is no way around it:
> 
> Russian/Cyrillic:  Горбачёв
> 
> German transliteration: Gorbaschow
> 
> English transliteration: Gorbatsov or Gorbatsev
> 
> If you want cyrillic transliteration for the locale you use, provide the
> data.
I want to comment on this to clarify my starting point and to ask for
suggestions in case somebody decides to take on further development. For now I
believe the issue is solved, albeit in the most basic way.

From a Russian speaker's point of view, various transliterations of Cyrillic
are possible, depending on the purpose. A good overview of the many such
transliterations is available at http://transliteration.ru/ . Although they use
different characters to represent the Cyrillic letters, they all carry the same
phonetic meaning for a Russian-speaking person, so any of them could be used
for all the Latin locales. This is what I propose as a first approximation to
solve this issue. My submission above takes this approach, with the GOST
7.79-2000 transliteration chosen as the basis.

For a non-Russian speaker, yet another transliteration may make more sense, one
that reflects the phonetic rules of their own language. This is what Ulrich is
referring to in his comment above.

One could take my table as a basis and create separate transliteration tables
for specific locales. The one I have proposed could then still serve as an
ASCII//TRANSLIT target or be replaced by a more suitable one.
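
The idea of a common base table with per-locale replacements can be sketched
in a few lines of Python. This is purely illustrative and not how glibc
implements transliteration: BASE_TABLE covers only the letters of one sample
word, using values consistent with the multi-character table tested in this
bug, and the de_DE entries in LOCALE_OVERRIDES are hypothetical.

```python
# Base table: a handful of lowercase letters only, enough for one sample word.
BASE_TABLE = {
    "\u0433": "g", "\u043e": "o", "\u0440": "r", "\u0431": "b",
    "\u0430": "a", "\u0447": "ch", "\u0451": "yo", "\u0432": "v",
}

# Hypothetical per-locale overrides layered on top of the base table.
LOCALE_OVERRIDES = {
    "de_DE": {"\u0447": "tsch", "\u0451": "o", "\u0432": "w"},
}


def transliterate(text, locale):
    """Use the locale's override if present, else fall back to the base table."""
    overrides = LOCALE_OVERRIDES.get(locale, {})
    out = []
    for ch in text:
        lower = ch.lower()
        replacement = overrides.get(lower, BASE_TABLE.get(lower))
        if replacement is None:
            out.append(ch)  # leave unknown characters untouched
        elif ch != lower:
            out.append(replacement.capitalize())  # preserve initial capital
        else:
            out.append(replacement)
    return "".join(out)


if __name__ == "__main__":
    word = "\u0413\u043e\u0440\u0431\u0430\u0447\u0451\u0432"
    print(transliterate(word, "en_US"))  # Gorbachyov
    print(transliterate(word, "de_DE"))  # Gorbatschow
```

A locale without an entry in LOCALE_OVERRIDES simply gets the base table,
which corresponds to the "first approximation" proposed above.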

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (14 preceding siblings ...)
  2015-09-08 10:21 ` ekobylkin at paypal dot com
@ 2015-09-18  9:30 ` ekobylkin at paypal dot com
  2015-09-18 10:05 ` ekobylkin at paypal dot com
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-18  9:30 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

Egor Kobylkin <ekobylkin at paypal dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |suse-linux at ml082 dot pinguin.un
                   |                            |i.cc

--- Comment #19 from Egor Kobylkin <ekobylkin at paypal dot com> ---
*** Bug 12031 has been marked as a duplicate of this bug. ***

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (15 preceding siblings ...)
  2015-09-18  9:30 ` ekobylkin at paypal dot com
@ 2015-09-18 10:05 ` ekobylkin at paypal dot com
  2015-09-18 13:23 ` ekobylkin at paypal dot com
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-18 10:05 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

Egor Kobylkin <ekobylkin at paypal dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pere at hungry dot com

--- Comment #20 from Egor Kobylkin <ekobylkin at paypal dot com> ---
*** Bug 89 has been marked as a duplicate of this bug. ***

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (16 preceding siblings ...)
  2015-09-18 10:05 ` ekobylkin at paypal dot com
@ 2015-09-18 13:23 ` ekobylkin at paypal dot com
  2015-09-18 13:23 ` myllynen at redhat dot com
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-18 13:23 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

Egor Kobylkin <ekobylkin at paypal dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW

--- Comment #22 from Egor Kobylkin <ekobylkin at paypal dot com> ---
Reset the bug status to "NEW", to signify it's ready for review by maintainers
of the library

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (18 preceding siblings ...)
  2015-09-18 13:23 ` myllynen at redhat dot com
@ 2015-09-18 13:23 ` ekobylkin at paypal dot com
  2015-09-18 14:07 ` ekobylkin at paypal dot com
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-18 13:23 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

--- Comment #21 from Egor Kobylkin <ekobylkin at paypal dot com> ---
Tested including the Greeklish_transliteration
https://sourceware.org/bugzilla/attachment.cgi?id=6380 from the duplicate of
this bug along with the Cyrillic transliteration proposed in this bug.

Copied the file translit_greeklish to glibc/localedata/locales. In en_TR2, a
copy of the en_GB locale, changed this section:
LC_CTYPE
copy "i18n"
translit_start
include "translit_combining";""
include "translit_cyrillic";""
include "translit_greeklish";""
translit_end
END LC_CTYPE

Generated the en_TR2 locale:
$ I18NPATH=./ localedef -f UTF-8 -i en_TR2 ../../../en_TR/en_TR2.UTF-8

$ echo CYRILLIC Съешь ещё этих мягких французских булок, да выпей же чаю GREEK \
  Ελληνικό Ίδρυμα Ευρωπαϊκής και Εξωτερικής | LOCPATH=.../en_TR/ \
  LC_ALL=en_TR2.UTF-8 LANG=en_TR2.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT
CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe chayu
GREEK Ellhniko Idryma Eyrwpaikhs kai Ekswterikhs

Test successful.
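
When several translit files are combined like this, the LC_CTYPE section can
be generated rather than edited by hand. The Python sketch below only
reproduces the section text shown above; the function name is hypothetical
and the file names are the ones used in this test.

```python
def render_lc_ctype(translit_files):
    """Build an LC_CTYPE section with one include line per translit file."""
    lines = ["LC_CTYPE", 'copy "i18n"', "translit_start"]
    lines += [f'include "{name}";""' for name in translit_files]
    lines += ["translit_end", "END LC_CTYPE"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_lc_ctype([
        "translit_combining",
        "translit_cyrillic",
        "translit_greeklish",
    ]))
```

Keeping the list of files in one place makes it straightforward to add or
drop a table when testing different combinations.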

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (17 preceding siblings ...)
  2015-09-18 13:23 ` ekobylkin at paypal dot com
@ 2015-09-18 13:23 ` myllynen at redhat dot com
  2015-09-18 13:23 ` ekobylkin at paypal dot com
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: myllynen at redhat dot com @ 2015-09-18 13:23 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

--- Comment #23 from Marko Myllynen <myllynen at redhat dot com> ---
Looks like you're making very good progress here. According to
https://sourceware.org/glibc/wiki/Locales the next step would be to check the
Contribution checklist at
https://sourceware.org/glibc/wiki/Contribution%20checklist and post your
patches to libc-alpha + libc-locales for formal review.

However, please be aware that for example Mike's translit update patch has been
pending for a review for many months already [1] so having the patches included
might take a while. But the first step anyway is to post them to the lists.

1) https://sourceware.org/ml/libc-alpha/2015-09/msg00190.html

Thanks.

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (19 preceding siblings ...)
  2015-09-18 13:23 ` ekobylkin at paypal dot com
@ 2015-09-18 14:07 ` ekobylkin at paypal dot com
  2015-09-18 16:39 ` ekobylkin at paypal dot com
  2015-09-18 16:40 ` ekobylkin at paypal dot com
  22 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-18 14:07 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

--- Comment #24 from Egor Kobylkin <ekobylkin at paypal dot com> ---
Marko,

thanks for reviewing, I will proceed as you propose.

In case you know, it would be great to have your advice:
what is the right venue to discuss getting the translit included into the
default C.UTF8 locale?

It is the default Cygwin locale and, AFAIK, there is no way to generate one's
own locale in the Cygwin environment. Nor could I regenerate C.UTF8 from the
original POSIX file on my Ubuntu system to test.

I get the same error messages as listed here:
http://ask.debian.net/questions/how-to-generate-a-c-utf-8-locale-in-debian-squeeze

So this appears to be a blocker for generating a patch. It seems the POSIX
source file for C.UTF8 is somehow broken on Ubuntu. Do I need to file another
bug for that or is that by design?

* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (20 preceding siblings ...)
  2015-09-18 14:07 ` ekobylkin at paypal dot com
@ 2015-09-18 16:39 ` ekobylkin at paypal dot com
  2015-09-18 16:40 ` ekobylkin at paypal dot com
  22 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-18 16:39 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

Egor Kobylkin <ekobylkin at paypal dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING

--- Comment #25 from Egor Kobylkin <ekobylkin at paypal dot com> ---
Pangrams in five languages to test the transliteration.

echo CYRILLIC Съешь ещё этих мягких французских булок, да выпей же чаю GREEK
Ελληνικό Ίδρυμα Ευρωπαϊκής και Εξωτερικής GERMAN Zwölf Boxkämpfer jagen Victor
quer über den großen Sylter Deich FRENCH Dès Noël où un zéphyr haï me vêt de
glaçons würmiens je dîne d’exquis rôtis de bœuf au kir à l’aÿ d’âge mûr \&
cætera SPANISH El veloz murciélago hindú comía feliz cardillo y kiwi, la
cigüeña tocaba el saxofón detrás del palenque de paja|LOCPATH=./
LC_ALL=en_TR2.UTF-8 LANG=en_TR2.UTF-8 iconv -f UTF-8 -t ASCII//TRANSLIT

And the result, so you can compare:
CYRILLIC S``esh` eshhyo e`tix myagkix franczuzskix bulok, da vy'pej zhe chayu
GREEK Ellhniko Idryma Eyrwpaikhs kai Ekswterikhs GERMAN Zwolf Boxkampfer jagen
Victor quer uber den grossen Sylter Deich FRENCH Des Noel ou un zephyr hai me
vet de glacons wurmiens je dine d'exquis rotis de boeuf au kir a l'ay d'age mur
& caetera SPANISH El veloz murcielago hindu comia feliz cardillo y kiwi, la
ciguena tocaba el saxofon detras del palenque de paja
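
A comparison like the one above can be automated. The following is a minimal
sketch, not part of the attached test file: the function name and both file
arguments are hypothetical, and it assumes that LOCPATH and LC_ALL are already
set in the environment to select the locale under test.

```shell
#!/bin/sh
# Sketch: compare iconv's transliteration of a sample with a reference file.
# Both file names are placeholders supplied by the caller.

check_translit() {
    input=$1      # UTF-8 sample text
    expected=$2   # ASCII output that is expected

    # iconv exits non-zero when it meets a character it cannot convert.
    actual=$(iconv -f UTF-8 -t ASCII//TRANSLIT < "$input") || {
        echo "iconv could not convert $input" >&2
        return 2
    }

    if [ "$actual" = "$(cat "$expected")" ]; then
        echo "OK"
    else
        echo "MISMATCH"
        return 1
    fi
}

# Usage (file names are hypothetical):
# export LOCPATH=./ LC_ALL=en_TR2.UTF-8
# check_translit pangrams.txt pangrams.expected
```

Because command substitution strips trailing newlines, this check ignores a
difference that consists only of trailing blank lines.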


* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
       [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
                   ` (21 preceding siblings ...)
  2015-09-18 16:39 ` ekobylkin at paypal dot com
@ 2015-09-18 16:40 ` ekobylkin at paypal dot com
  22 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-18 16:40 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2872

Egor Kobylkin <ekobylkin at paypal dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|libc-locales at sourceware dot org |ekobylkin at paypal dot com

--- Comment #26 from Egor Kobylkin <ekobylkin at paypal dot com> ---
Created attachment 8618
  --> https://sourceware.org/bugzilla/attachment.cgi?id=8618&action=edit
test file


* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
  2006-07-02  8:31 [Bug localedata/2872] New: " edi at gmx dot de
                   ` (2 preceding siblings ...)
  2007-02-17 22:17 ` edi at gmx dot de
@ 2007-02-19  0:52 ` drepper at redhat dot com
  3 siblings, 0 replies; 27+ messages in thread
From: drepper at redhat dot com @ 2007-02-19  0:52 UTC (permalink / raw)
  To: libc-locales


------- Additional Comments From drepper at redhat dot com  2007-02-19 00:51 -------
It all works as designed given the data provided.  If you want change, provide
the data.  Otherwise go away.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |WORKSFORME


http://sourceware.org/bugzilla/show_bug.cgi?id=2872


* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
  2006-07-02  8:31 [Bug localedata/2872] New: " edi at gmx dot de
  2006-07-20 10:18 ` [Bug localedata/2872] " dsegan at gmx dot net
  2007-02-17 19:24 ` drepper at redhat dot com
@ 2007-02-17 22:17 ` edi at gmx dot de
  2007-02-19  0:52 ` drepper at redhat dot com
  3 siblings, 0 replies; 27+ messages in thread
From: edi at gmx dot de @ 2007-02-17 22:17 UTC (permalink / raw)
  To: libc-locales


------- Additional Comments From edi at gmx dot de  2007-02-17 22:17 -------
"WORKSFORME" implies that you cannot reproduce the problem, but... does
transliterating Горбачов work with English or not? (see below) If not, how can
it be "resolved"?

Or what are you trying to say with "provide the data"? That there is no data
yet? I have seen it working some years ago. With correct US-style
transliterations. If it is broken now or data has been lost, then maybe
TRANSLIT should be disabled and throw an error immediately. Currently it
produces crap and AFAICS there is no proper documentation explaining why.

The crap it creates is not even consistent within the same language/country
pair: without the .UTF-8 suffix it produces even funnier nonsense.

echo Горбачов | LANG=de_DE.UTF-8 iconv -t ASCII//TRANSLIT
????????
echo Горбачов | LANG=de_DE.UTF-8 iconv -t ASCII//TRANSLIT
????????
echo Горбачов | LANG=de_DE iconv -t ASCII//TRANSLIT
??? 3/4 N?????N?? 3/4 ??
echo Горбачов | LANG=en_US iconv -t ASCII//TRANSLIT
iconv: illegal input sequence at position 0
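
One likely reason for the differing output: none of the commands above passes
-f, so iconv takes the input encoding from the locale. Under LANG=de_DE (no
.UTF-8 suffix) the UTF-8 bytes are then probably read as a single-byte
charset, which would account for the "3/4" and "N" fragments. A locale that is
not installed makes the program fall back to the C locale. The sketch below
states the input encoding explicitly and warns about a missing locale; the
function name is hypothetical, and it does not by itself add any Cyrillic
transliteration data.

```shell
#!/bin/sh
# Sketch: run the transliteration with the input encoding stated explicitly,
# and warn when the requested locale does not appear to be installed.

translit_with_locale() {
    loc=$1   # locale name, e.g. de_DE.UTF-8

    # locale -a lists installed locales; a UTF-8 locale is often listed
    # with the normalized suffix "utf8", so check both spellings.
    norm=$(printf '%s' "$loc" | sed 's/UTF-8$/utf8/')
    if ! locale -a 2>/dev/null | grep -qxF -e "$loc" -e "$norm"; then
        echo "warning: locale $loc does not appear to be installed" >&2
    fi

    # -f UTF-8 keeps iconv from guessing the input charset from the locale.
    LC_ALL=$loc iconv -f UTF-8 -t ASCII//TRANSLIT
}

# Usage:
# echo 'Горбачов' | translit_with_locale de_DE.UTF-8
```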



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|WORKSFORME                  |


http://sourceware.org/bugzilla/show_bug.cgi?id=2872


* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
  2006-07-02  8:31 [Bug localedata/2872] New: " edi at gmx dot de
  2006-07-20 10:18 ` [Bug localedata/2872] " dsegan at gmx dot net
@ 2007-02-17 19:24 ` drepper at redhat dot com
  2007-02-17 22:17 ` edi at gmx dot de
  2007-02-19  0:52 ` drepper at redhat dot com
  3 siblings, 0 replies; 27+ messages in thread
From: drepper at redhat dot com @ 2007-02-17 19:24 UTC (permalink / raw)
  To: libc-locales


------- Additional Comments From drepper at redhat dot com  2007-02-17 19:24 -------
Transliteration is locale dependent; there is no way around it:

Russian/Cyrillic:  Горбачёв

German transliteration: Gorbaschow

English transliteration: Gorbatsov or Gorbatsev

If you want Cyrillic transliteration for the locale you use, provide the data.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WORKSFORME


http://sourceware.org/bugzilla/show_bug.cgi?id=2872


* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
  2006-07-02  8:31 [Bug localedata/2872] New: " edi at gmx dot de
@ 2006-07-20 10:18 ` dsegan at gmx dot net
  2007-02-17 19:24 ` drepper at redhat dot com
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: dsegan at gmx dot net @ 2006-07-20 10:18 UTC (permalink / raw)
  To: libc-locales



------- Additional Comments From dsegan at gmx dot net  2006-07-20 10:18 -------
Works for me with sr_CS locale:

$ echo 'Müßte Данило' | LANG=sr_CS.UTF-8 iconv -t ASCII//translit
Musste Danilo

-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=2872


Thread overview: 27+ messages
     [not found] <bug-2872-716@http.sourceware.org/bugzilla/>
2015-09-07 11:50 ` [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails ekobylkin at paypal dot com
2015-09-07 11:54 ` schwab@linux-m68k.org
2015-09-07 12:22 ` ekobylkin at paypal dot com
2015-09-07 15:07 ` myllynen at redhat dot com
2015-09-07 23:08 ` ekobylkin at paypal dot com
2015-09-07 23:09 ` ekobylkin at paypal dot com
2015-09-07 23:12 ` ekobylkin at paypal dot com
2015-09-07 23:35 ` ekobylkin at paypal dot com
2015-09-08  7:31 ` myllynen at redhat dot com
2015-09-08  7:41 ` ekobylkin at paypal dot com
2015-09-08  9:11 ` ekobylkin at paypal dot com
2015-09-08 10:06 ` ekobylkin at paypal dot com
2015-09-08 10:06 ` ekobylkin at paypal dot com
2015-09-08 10:06 ` ekobylkin at paypal dot com
2015-09-08 10:21 ` ekobylkin at paypal dot com
2015-09-18  9:30 ` ekobylkin at paypal dot com
2015-09-18 10:05 ` ekobylkin at paypal dot com
2015-09-18 13:23 ` ekobylkin at paypal dot com
2015-09-18 13:23 ` myllynen at redhat dot com
2015-09-18 13:23 ` ekobylkin at paypal dot com
2015-09-18 14:07 ` ekobylkin at paypal dot com
2015-09-18 16:39 ` ekobylkin at paypal dot com
2015-09-18 16:40 ` ekobylkin at paypal dot com
2006-07-02  8:31 [Bug localedata/2872] New: " edi at gmx dot de
2006-07-20 10:18 ` [Bug localedata/2872] " dsegan at gmx dot net
2007-02-17 19:24 ` drepper at redhat dot com
2007-02-17 22:17 ` edi at gmx dot de
2007-02-19  0:52 ` drepper at redhat dot com
