public inbox for libc-locales@sourceware.org
* [Bug localedata/2253] unicode combining accents can't be iconv-ed to latin//translit (and others)
       [not found] <bug-2253-716@http.sourceware.org/bugzilla/>
@ 2014-02-07  2:56 ` jsm28 at gcc dot gnu.org
  2014-06-26  5:15 ` pravin.d.s at gmail dot com
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: jsm28 at gcc dot gnu.org @ 2014-02-07  2:56 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2253

Joseph Myers <jsm28 at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |libc-locales at sourceware dot org
          Component|libc                        |localedata

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug localedata/2253] unicode combining accents can't be iconv-ed to latin//translit (and others)
       [not found] <bug-2253-716@http.sourceware.org/bugzilla/>
  2014-02-07  2:56 ` [Bug localedata/2253] unicode combining accents can't be iconv-ed to latin//translit (and others) jsm28 at gcc dot gnu.org
@ 2014-06-26  5:15 ` pravin.d.s at gmail dot com
  2015-05-04 20:39 ` maiku.fabian at gmail dot com
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: pravin.d.s at gmail dot com @ 2014-06-26  5:15 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2253

Pravin S <pravin.d.s at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pravin.d.s at gmail dot com


* [Bug localedata/2253] unicode combining accents can't be iconv-ed to latin//translit (and others)
       [not found] <bug-2253-716@http.sourceware.org/bugzilla/>
  2014-02-07  2:56 ` [Bug localedata/2253] unicode combining accents can't be iconv-ed to latin//translit (and others) jsm28 at gcc dot gnu.org
  2014-06-26  5:15 ` pravin.d.s at gmail dot com
@ 2015-05-04 20:39 ` maiku.fabian at gmail dot com
  2015-05-04 23:07 ` samuel.thibault@ens-lyon.org
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: maiku.fabian at gmail dot com @ 2015-05-04 20:39 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2253

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maiku.fabian at gmail dot com

--- Comment #5 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Samuel Thibault from comment #4)
> Right, thus changing bug title: the transliteration however still produces
> "e", while it could produce "é".

Transliterating to “e” is probably OK in most locales; for
example, in English, dropping accents seems to be common usage.


* [Bug localedata/2253] unicode combining accents can't be iconv-ed to latin//translit (and others)
       [not found] <bug-2253-716@http.sourceware.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2015-05-04 20:39 ` maiku.fabian at gmail dot com
@ 2015-05-04 23:07 ` samuel.thibault@ens-lyon.org
  2015-05-05  9:46 ` maiku.fabian at gmail dot com
  2018-04-19 13:58 ` fweimer at redhat dot com
  5 siblings, 0 replies; 6+ messages in thread
From: samuel.thibault@ens-lyon.org @ 2015-05-04 23:07 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2253

--- Comment #6 from Samuel Thibault <samuel.thibault@ens-lyon.org> ---
Err, but here e+combineacute *is* representable in latin1, it's eacute. So
transliteration should not discard the accent.


* [Bug localedata/2253] unicode combining accents can't be iconv-ed to latin//translit (and others)
       [not found] <bug-2253-716@http.sourceware.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2015-05-04 23:07 ` samuel.thibault@ens-lyon.org
@ 2015-05-05  9:46 ` maiku.fabian at gmail dot com
  2018-04-19 13:58 ` fweimer at redhat dot com
  5 siblings, 0 replies; 6+ messages in thread
From: maiku.fabian at gmail dot com @ 2015-05-05  9:46 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2253

--- Comment #7 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Samuel Thibault from comment #6)
> Err, but here e+combineacute *is* representable in latin1, it's eacute. So
> transliteration should not discard the accent.

Yes, maybe.

But is this doable with the glibc transliteration system?
All the glibc/localedata/locales/translit_* files just transliterate
one single character to another character or a list of characters.
It never starts with a character sequence. So I guess this is not supported.

As Jungshik Shin suggests in comment#1, iconv could
normalize the input to NFC before attempting a transliteration.

This should certainly not happen without transliteration, as Rich Felker
writes in comment#3, but *if* transliteration is requested, normalizing
to NFC first and then transliterating might be a reasonable approach.
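
A minimal sketch of the "normalize to NFC, then transliterate" idea,
written in Python for illustration only. It is not how glibc's iconv is
implemented. The helper name translit_to_latin1 and the fallback of
stripping combining marks are assumptions made for this example.

```python
import unicodedata


def translit_to_latin1(text: str) -> bytes:
    """Hypothetical helper: compose first, fall back only when needed."""
    out = bytearray()
    # Step 1: NFC turns e + U+0301 into the single code point U+00E9.
    for ch in unicodedata.normalize("NFC", text):
        try:
            # Step 2: characters that Latin-1 can represent are kept as-is.
            out += ch.encode("latin-1")
        except UnicodeEncodeError:
            # Step 3 (assumed fallback): decompose and drop combining marks,
            # roughly what an accent-stripping transliteration would do.
            base = "".join(
                c
                for c in unicodedata.normalize("NFD", ch)
                if not unicodedata.combining(c)
            )
            out += base.encode("latin-1", errors="replace")
    return bytes(out)
```

With this ordering, "e" followed by U+0301 becomes the Latin-1 byte 0xE9
instead of a bare "e", while a character such as U+0161 (s with caron),
which Latin-1 lacks, still degrades to "s".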


* [Bug localedata/2253] unicode combining accents can't be iconv-ed to latin//translit (and others)
       [not found] <bug-2253-716@http.sourceware.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2015-05-05  9:46 ` maiku.fabian at gmail dot com
@ 2018-04-19 13:58 ` fweimer at redhat dot com
  5 siblings, 0 replies; 6+ messages in thread
From: fweimer at redhat dot com @ 2018-04-19 13:58 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=2253

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com

--- Comment #8 from Florian Weimer <fweimer at redhat dot com> ---
Created attachment 10961
  --> https://sourceware.org/bugzilla/attachment.cgi?id=10961&action=edit
e-combining-acute

Attaching file for posterity.

00000000: 65cc 810a                                e...

Issue is still present.
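
For readers who want to inspect the attachment, the four bytes in the
hex dump can be decoded as follows. This is a small Python sketch, not
part of the bug report; the variable names are illustrative.

```python
import unicodedata

# Bytes from the hex dump: 65 cc 81 0a
data = bytes.fromhex("65cc810a")
text = data.decode("utf-8")

# Name each code point; newline has no Unicode name, so fall back to repr().
names = [unicodedata.name(c, repr(c)) for c in text]
# names[0] is 'LATIN SMALL LETTER E', names[1] is 'COMBINING ACUTE ACCENT'

# The composed (NFC) form of the same text fits in Latin-1.
composed = unicodedata.normalize("NFC", text)
latin1 = composed.encode("latin-1")
```

The file therefore holds the decomposed sequence (e followed by U+0301)
plus a newline, whose NFC form is the single Latin-1 byte 0xE9.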
