public inbox for libc-locales@sourceware.org
* [Bug localedata/2872] New: Transliteration Cyrillic -> ASCII fails
@ 2006-07-02  8:31 edi at gmx dot de
  2006-07-20 10:18 ` [Bug localedata/2872] " dsegan at gmx dot net
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: edi at gmx dot de @ 2006-07-02  8:31 UTC (permalink / raw)
  To: libc-locales

Hello,

I tried to convert some text from Cyrillic (UTF-8) to ASCII using the
//translit flag. However, it fails badly: all characters are simply replaced
with ?. This seems to be independent of my current locale; whether I set
en_US.UTF-8, de_DE.UTF-8, or ru_RU.UTF-8, it still fails.

Transliteration of Latin seems to work, though:

echo Müßte асдфасфд | LANG=de_DE.UTF-8 iconv -f UTF-8 -t ASCII//translit
Muesste ????????
echo Müßte асдфасфд | LANG=fr_FR.UTF-8 iconv -f UTF-8 -t ASCII//translit
Musste ????????

I had a "discussion" with a Debian maintainer of glibc, who indicated that the
problem is in the locale, which controls the transliterations. From my point of
view, though, there should be a default fallback when no other transliteration
scheme is available. I also remember this working with glibc some months or
years ago.
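
The "default fallback" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table `FALLBACK` and the helper `translit` are hypothetical names, the table covers just the four letters of the test string, and none of this reflects glibc's locale data or its lookup logic.

```python
# Hypothetical fallback table: a tiny, illustrative subset of Cyrillic
# letters mapped to ASCII. This is NOT glibc's locale data.
FALLBACK = {"а": "a", "с": "s", "д": "d", "ф": "f"}


def translit(text, table=FALLBACK):
    """Map each non-ASCII character through the table, '?' if unknown."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)                  # ASCII passes through unchanged
        else:
            out.append(table.get(ch, "?"))  # unknown characters become '?'
    return "".join(out)


print(translit("асдфасфд"))  # asdfasfd
```

Characters missing from the table still come out as `?`, which is the behaviour reported for the locales tried above.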

-- 
           Summary: Transliteration Cyrillic -> ASCII fails
           Product: glibc
           Version: 2.3.6
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
        AssignedTo: libc-locales at sources dot redhat dot com
        ReportedBy: edi at gmx dot de
                CC: glibc-bugs at sources dot redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=2872

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
From: dsegan at gmx dot net @ 2006-07-20 10:18 UTC (permalink / raw)
  To: libc-locales


------- Additional Comments From dsegan at gmx dot net  2006-07-20 10:18 -------
Works for me with sr_CS locale:

$ echo 'Müßte Данило' | LANG=sr_CS.UTF-8 iconv -t ASCII//translit
Musste Danilo


* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
From: drepper at redhat dot com @ 2007-02-17 19:24 UTC (permalink / raw)
  To: libc-locales


------- Additional Comments From drepper at redhat dot com  2007-02-17 19:24 -------
Transliteration is locale dependent; there is no way around it:

Russian/Cyrillic:  Горбачёв

German transliteration: Gorbaschow

English transliteration: Gorbatsov or Gorbatsev

If you want Cyrillic transliteration for the locale you use, provide the data.
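
A minimal Python sketch of why the result depends on the locale: the same letter is rendered differently by different target languages. The names `TABLES` and `translit_for` are hypothetical, the tables cover only the letters of one word, and the renderings are assumptions of this sketch (the common German "tsch"/"w" and English "ch"/"v" conventions), not data taken from glibc.

```python
import unicodedata

# Letters that both illustrative tables render the same way.
COMMON = {"Г": "G", "о": "o", "р": "r", "б": "b", "а": "a"}

# Hypothetical per-locale tables; "\u0451" is the precomposed letter ё.
TABLES = {
    "de_DE": {**COMMON, "ч": "tsch", "\u0451": "o", "в": "w"},
    "en_US": {**COMMON, "ч": "ch", "\u0451": "e", "в": "v"},
}


def translit_for(text, locale_name):
    """Transliterate text with the table of the given (hypothetical) locale."""
    table = TABLES[locale_name]
    text = unicodedata.normalize("NFC", text)  # compose е + diaeresis into ё
    return "".join(
        ch if ord(ch) < 128 else table.get(ch, "?") for ch in text
    )


print(translit_for("Горбач\u0451в", "de_DE"))  # Gorbatschow
print(translit_for("Горбач\u0451в", "en_US"))  # Gorbachev
```

With no table for a locale there is nothing to look up, which is the situation "provide the data" refers to.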

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WORKSFORME



* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
From: edi at gmx dot de @ 2007-02-17 22:17 UTC (permalink / raw)
  To: libc-locales


------- Additional Comments From edi at gmx dot de  2007-02-17 22:17 -------
"WORKSFORME" implies that you cannot reproduce the problem, but... does
transliterating Горбачов work with English or not? (see below) If not, how can
it be "resolved"?

Or what are you trying to say with "provide the data"? That there is no data
yet? I saw it working some years ago, with correct US-style transliterations.
If it is broken now or the data has been lost, then perhaps TRANSLIT should be
disabled and raise an error immediately. Currently it produces crap, and AFAICS
there is no proper documentation explaining why.

The crap it creates is not even consistent within the same language/country
pair: without the .UTF-8 suffix it produces even stranger nonsense.

echo Горбачов | LANG=de_DE.UTF-8 iconv -t ASCII//TRANSLIT
????????
echo Горбачов | LANG=de_DE.UTF-8 iconv -t ASCII//TRANSLIT
????????
echo Горбачов | LANG=de_DE iconv -t ASCII//TRANSLIT
??? 3/4 N?????N?? 3/4 ??
echo Горбачов | LANG=en_US iconv -t ASCII//TRANSLIT
iconv: illegal input sequence at position 0
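
One plausible reading of the `LANG=de_DE` output: no `-f` option is given, so iconv takes the input charset from the locale, and `de_DE` without a suffix normally means ISO-8859-1 in glibc. The UTF-8 bytes are then read as 16 Latin-1 characters. The Python sketch below only reproduces that mis-decoding step (the variable names are illustrative); it does not call iconv and does not explain the `en_US` error.

```python
word = "Горбачов"
raw = word.encode("utf-8")            # the bytes that echo writes to the pipe
as_latin1 = raw.decode("iso-8859-1")  # the same bytes read as Latin-1

print(len(word), len(raw))  # 8 16

# Report where the Latin-1 characters ¾ (0xBE) and Ñ (0xD1) appear.
for i, ch in enumerate(as_latin1):
    if ch in "¾Ñ":
        print(i, ch)
# 3 ¾
# 4 Ñ
# 10 Ñ
# 13 ¾
```

Those positions line up with the ` 3/4 ` and `N` fragments in the output above, which would be the ASCII transliterations of ¾ and Ñ; the remaining characters have no ASCII mapping and come out as `?`.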



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|WORKSFORME                  |



* [Bug localedata/2872] Transliteration Cyrillic -> ASCII fails
From: drepper at redhat dot com @ 2007-02-19  0:52 UTC (permalink / raw)
  To: libc-locales


------- Additional Comments From drepper at redhat dot com  2007-02-19 00:51 -------
It all works as designed given the data provided.  If you want change, provide
the data.  Otherwise go away.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|                            |WORKSFORME


