public inbox for libc-locales@sourceware.org
* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
@ 2011-05-15  7:32 ` drepper.fsp at gmail dot com
  2011-09-24  7:14 ` mistresssilvara at hotmail dot com
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: drepper.fsp at gmail dot com @ 2011-05-15  7:32 UTC (permalink / raw)
  To: libc-locales

http://sourceware.org/bugzilla/show_bug.cgi?id=12031

Ulrich Drepper <drepper.fsp at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING
                 CC|                            |drepper.fsp at gmail dot
                   |                            |com

--- Comment #1 from Ulrich Drepper <drepper.fsp at gmail dot com> 2011-05-15 04:43:52 UTC ---
What would the transliteration look like?  And is it locale-independent?

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
  2011-05-15  7:32 ` [Bug localedata/12031] iconv -t ascii//translit with Greek characters drepper.fsp at gmail dot com
@ 2011-09-24  7:14 ` mistresssilvara at hotmail dot com
  2012-02-03 19:48 ` alexander.karlstad at gmail dot com
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: mistresssilvara at hotmail dot com @ 2011-09-24  7:14 UTC (permalink / raw)
  To: libc-locales

http://sourceware.org/bugzilla/show_bug.cgi?id=12031

-EMail Hidden- <mistresssilvara at hotmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
                 CC|                            |mistresssilvara at hotmail
                   |                            |dot com

--- Comment #3 from -EMail Hidden- <mistresssilvara at hotmail dot com> 2011-09-24 02:33:54 UTC ---
Absolutely any transliteration scheme is good if it gives some ASCII characters
instead of the error this function produces now.


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
  2011-05-15  7:32 ` [Bug localedata/12031] iconv -t ascii//translit with Greek characters drepper.fsp at gmail dot com
  2011-09-24  7:14 ` mistresssilvara at hotmail dot com
@ 2012-02-03 19:48 ` alexander.karlstad at gmail dot com
  2012-02-04 11:59 ` pere at hungry dot com
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: alexander.karlstad at gmail dot com @ 2012-02-03 19:48 UTC (permalink / raw)
  To: libc-locales

http://sourceware.org/bugzilla/show_bug.cgi?id=12031

Alexander Karlstad <alexander.karlstad at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alexander.karlstad at gmail
                   |                            |dot com

--- Comment #4 from Alexander Karlstad <alexander.karlstad at gmail dot com> 2012-02-03 19:28:54 UTC ---
I have a similar problem with later versions of iconv (2.13 in Ubuntu).

iconv -t ascii//TRANSLIT <<< 'æ,ø,å'

gives me "ae,?,a" but in my opinion it should give me "ae,o,a".

Tested this on several machines with the same version (2.13) and on an old
SunOS box with 1.9. The latter returned the desired result.

My LC_ALL and LANG variables are both set to nb_NO.UTF-8, and I've tried changing
them to other available locales without getting the wanted result.

Is this a bug?


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2012-02-03 19:48 ` alexander.karlstad at gmail dot com
@ 2012-02-04 11:59 ` pere at hungry dot com
  2012-04-29  0:56 ` nick.andrik at gmail dot com
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: pere at hungry dot com @ 2012-02-04 11:59 UTC (permalink / raw)
  To: libc-locales

http://sourceware.org/bugzilla/show_bug.cgi?id=12031

Petter Reinholdtsen <pere at hungry dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pere at hungry dot com

--- Comment #5 from Petter Reinholdtsen <pere at hungry dot com> 2012-02-04 11:20:39 UTC ---
(In reply to comment #4)
> gives me "ae,?,a" but in my opinion it should give me "ae,o,a".
[...]
> Is this a bug?

I believe it is a bug.

The request to change transliteration for æøå is
http://sourceware.org/bugzilla/show_bug.cgi?id=89 .  Please explain there why
you believe it should transliterate to ae,o,a and not ae,oe,aa.


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2012-02-04 11:59 ` pere at hungry dot com
@ 2012-04-29  0:56 ` nick.andrik at gmail dot com
  2014-02-16 21:31 ` jackie.rosen at hushmail dot com
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: nick.andrik at gmail dot com @ 2012-04-29  0:56 UTC (permalink / raw)
  To: libc-locales

http://sourceware.org/bugzilla/show_bug.cgi?id=12031

--- Comment #6 from Nick Andrik <nick.andrik at gmail dot com> 2012-04-28 19:40:48 UTC ---
Created attachment 6380
  --> http://sourceware.org/bugzilla/attachment.cgi?id=6380
Greeklish trasliteration

I have created a first version of a file to use for greeklish (greek to ascii)
transliteration.

The conversion scheme is:

alpha -> a
beta -> b
gamma -> g
delta -> d
epsilon -> e
zeta -> z
eta -> h
theta -> 8
iota -> i
kappa -> k
lamda -> l
mu -> m
nu -> n
xi -> ks
omikron -> o
pi -> p
ro -> r
sigma -> s
tau -> t
ypsilon -> y
phi -> f
chi -> x
psi -> ps
omega -> w

From my experiments I realized that there is no "chained" transliteration.
By this, I mean that I had to specify the greeklish transliterations for all
accented versions of letters, even though I had already specified one for the
plain letter.

Example:
ETA with PERISPOMENI -> ETA (this is already in translit_combining)
ETA -> H (this is my addition)
If I try to convert "ETA with PERISPOMENI" to ASCII with only these two rules,
I get ?, so I had to change the first rule to this:
ETA with PERISPOMENI -> ETA;H
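
The missing chaining can be illustrated with a small sketch. This is a toy model in Python, not glibc code: the names `translit`, `unchained` and `explicit` are hypothetical, and the model simply assumes that each rule is an ordered list of candidates from which the first ASCII one is used, without looking a candidate up again.

```python
# Toy model of a transliteration table; NOT glibc's implementation.
# Each source character maps to an ordered list of candidates,
# in the "ETA;H" style of the example above.
ETA_PERISPOMENI = "\u1fc6"  # GREEK SMALL LETTER ETA WITH PERISPOMENI
ETA = "\u03b7"              # GREEK SMALL LETTER ETA


def translit(ch, rules):
    """Return the first ASCII candidate for ch, or '?' if there is none.

    Candidates are used as they are and never looked up again,
    which models the missing "chained" transliteration.
    """
    if ch.isascii():
        return ch
    for candidate in rules.get(ch, []):
        if candidate.isascii():
            return candidate
    return "?"


# Existing combining-style rule plus the new eta -> h rule.
unchained = {ETA_PERISPOMENI: [ETA], ETA: ["h"]}
# The workaround: list the ASCII fallback explicitly in the first rule.
explicit = {ETA_PERISPOMENI: [ETA, "h"], ETA: ["h"]}

print(translit(ETA_PERISPOMENI, unchained))  # ?
print(translit(ETA_PERISPOMENI, explicit))   # h
```

In this model the second table succeeds only because the ASCII fallback is repeated in the rule for the accented letter, which is why every accented variant needs its own explicit entry.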


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2012-04-29  0:56 ` nick.andrik at gmail dot com
@ 2014-02-16 21:31 ` jackie.rosen at hushmail dot com
  2014-05-28 19:54 ` schwab at sourceware dot org
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: jackie.rosen at hushmail dot com @ 2014-02-16 21:31 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

Jackie Rosen <jackie.rosen at hushmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jackie.rosen at hushmail dot com



* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2014-02-16 21:31 ` jackie.rosen at hushmail dot com
@ 2014-05-28 19:54 ` schwab at sourceware dot org
  2014-06-26  5:16 ` pravin.d.s at gmail dot com
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: schwab at sourceware dot org @ 2014-05-28 19:54 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

Andreas Schwab <schwab at sourceware dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|jackie.rosen at hushmail dot com   |


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2014-05-28 19:54 ` schwab at sourceware dot org
@ 2014-06-26  5:16 ` pravin.d.s at gmail dot com
  2014-06-30  9:14 ` fweimer at redhat dot com
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: pravin.d.s at gmail dot com @ 2014-06-26  5:16 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

Pravin S <pravin.d.s at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pravin.d.s at gmail dot com


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2014-06-26  5:16 ` pravin.d.s at gmail dot com
@ 2014-06-30  9:14 ` fweimer at redhat dot com
  2015-05-04 20:40 ` maiku.fabian at gmail dot com
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: fweimer at redhat dot com @ 2014-06-30  9:14 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2014-06-30  9:14 ` fweimer at redhat dot com
@ 2015-05-04 20:40 ` maiku.fabian at gmail dot com
  2015-05-04 21:07 ` pere at hungry dot com
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: maiku.fabian at gmail dot com @ 2015-05-04 20:40 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maiku.fabian at gmail dot com

--- Comment #8 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Petter Reinholdtsen from comment #5)
> (In reply to comment #4)
> > gives me "ae,?,a" but in my opinion it should give me "ae,o,a".
> [...]
> > Is this a bug?
> 
> I believe it is a bug.

It works in recent glibc (glibc-2.20-8.fc21.x86_64)
in *all* locales except C/POSIX. 

$ echo 'Æ,æ,Ø,ø,Å,å' | LANG=nb_NO.UTF-8 iconv -t ascii//TRANSLIT 
AE,ae,OE,oe,A,a

$ echo 'Æ,æ,Ø,ø,Å,å' | LANG=en_US.UTF-8 iconv -t ascii//TRANSLIT 
AE,ae,OE,oe,A,a

$ echo 'Æ,æ,Ø,ø,Å,å' | LANG=POSIX iconv -t ascii//TRANSLIT 
iconv: illegal input sequence at position 0

It is independent of the locale because all locales (except C/POSIX)
include translit_neutral where this is defined.

> The request to change transliteration for æøå is
> http://sourceware.org/bugzilla/show_bug.cgi?id=89 .  Please explain there
> why you believe it should transliterate to ae,o,a and not ae,oe,aa.

For Scandinavian locales, transliterating 'Æ,æ,Ø,ø,Å,å' to 'Ae, ae,
Oe, oe, Aa, aa' is more appropriate. For most other locales,
transliterating å to a is probably OK.  I am a bit puzzled about Æ ->
AE, shouldn’t this be transliterated to Ae, even in English locales?
(Same with Ø: transliterating to just O, or maybe Oe, in
translit_neutral for all locales which do not have special rules
seems better.)

The patch attached to

https://sourceware.org/bugzilla/show_bug.cgi?id=89#c5

fixes the transliteration for Norwegian locales (nn_NO and nb_NO).
Probably the same fix should be applied also for Swedish and Finnish
locales (and maybe Icelandic locales as well).


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2015-05-04 20:40 ` maiku.fabian at gmail dot com
@ 2015-05-04 21:07 ` pere at hungry dot com
  2015-05-04 21:10   ` Keld Simonsen
  2015-05-04 21:13 ` keld at keldix dot com
                   ` (14 subsequent siblings)
  25 siblings, 1 reply; 27+ messages in thread
From: pere at hungry dot com @ 2015-05-04 21:07 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

--- Comment #9 from Petter Reinholdtsen <pere at hungry dot com> ---
(In reply to Mike FABIAN from comment #8)

> I am a bit puzzled about Æ ->
> AE, shouldn’t this be transliterated to Ae, even in English locales?
> (Same with Ø, transliterating to just O or maybe Oe in
> translit_neutral for all locales which do not have special rules
> seems better.

For me it makes more sense to transliterate a capital letter to all capital
letters, to ensure words with only capital letters look sane.  For example
SØRING would end up as SOERING, not SOeRING.  Sure, if the capital letter is
the first one in the sentence, it would make more sense to use Øvelse ->
Oevelse, but I suspect special Norwegian characters at the start of a sentence
are less common than capital special Norwegian letters in an all-capital word.
Most Norwegian words do not start with æ, ø or å. :)
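
The trade-off can be seen in a minimal sketch. The two tables and the `translit` helper below are hypothetical Python illustrations, not the glibc locale data; the AA/Aa entries for Å are an assumption based on the ae,oe,aa scheme mentioned earlier in this thread.

```python
# Two hypothetical per-character tables for the capital letters.
UPPER = {"Æ": "AE", "Ø": "OE", "Å": "AA"}  # all-capital replacements
TITLE = {"Æ": "Ae", "Ø": "Oe", "Å": "Aa"}  # mixed-case replacements


def translit(text, table):
    """Replace each character found in table; leave the rest untouched."""
    return "".join(table.get(ch, ch) for ch in text)


print(translit("SØRING", UPPER))  # SOERING
print(translit("SØRING", TITLE))  # SOeRING
print(translit("Øvelse", UPPER))  # OEvelse
print(translit("Øvelse", TITLE))  # Oevelse
```

A per-character table cannot look at the neighbouring letters, so one of the two cases always comes out looking odd; the question is only which case is rarer.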

-- 
Happy hacking
Petter Reinholdtsen


* Re: [Bug localedata/12031] iconv -t ascii//translit with Greek characters
  2015-05-04 21:07 ` pere at hungry dot com
@ 2015-05-04 21:10   ` Keld Simonsen
  0 siblings, 0 replies; 27+ messages in thread
From: Keld Simonsen @ 2015-05-04 21:10 UTC (permalink / raw)
  To: pere at hungry dot com; +Cc: libc-locales

On Mon, May 04, 2015 at 09:00:36PM +0000, pere at hungry dot com wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=12031
> 
> --- Comment #9 from Petter Reinholdtsen <pere at hungry dot com> ---
> (In reply to Mike FABIAN from comment #8)
> 
> > I am a bit puzzled about Æ ->
> > AE, shouldn’t this be transliterated to Ae, even in English locales?
> > (Same with Ø, transliterating to just O or maybe Oe in
> > translit_neutral for all locales which do not have special rules
> > seems better.
> 
> For me it make more sense to transliterate a capital letter to all capital
> letters, to ensure words with only capital letters look sane.  For example
> SØRING would end up like SOERING, not SOeRING.  Sure, if the capital letter is
> the first one in the sentence, it would make more sense to use Øvelse ->
> Oevelse,
> but I suspect special norwegian characters at the start of the sentence
> is less common than capital special norwegian letters in an all capital word. 
> Most Norwegian words do not start with æ, ø or å. :)

The same goes for Danish, which due to some common heritage uses the same letters
and to some extent the same transliteration rules.

I would also recommend transliterating Æ, Ø, Å to AE, OE, AA

Best regards
Keld


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (10 preceding siblings ...)
  2015-05-04 21:07 ` pere at hungry dot com
@ 2015-05-04 21:13 ` keld at keldix dot com
  2015-05-27  7:06 ` myllynen at redhat dot com
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: keld at keldix dot com @ 2015-05-04 21:13 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

--- Comment #10 from keld at keldix dot com <keld at keldix dot com> ---
The same goes for Danish, which due to some common heritage uses the same letters
and to some extent the same transliteration rules.

I would also recommend transliterating Æ, Ø, Å to AE, OE, AA

Best regards
Keld


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (11 preceding siblings ...)
  2015-05-04 21:13 ` keld at keldix dot com
@ 2015-05-27  7:06 ` myllynen at redhat dot com
  2015-09-18  9:29 ` ekobylkin at paypal dot com
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: myllynen at redhat dot com @ 2015-05-27  7:06 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

Marko Myllynen <myllynen at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |myllynen at redhat dot com


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (12 preceding siblings ...)
  2015-05-27  7:06 ` myllynen at redhat dot com
@ 2015-09-18  9:29 ` ekobylkin at paypal dot com
  2015-09-18 16:39 ` ekobylkin at paypal dot com
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-18  9:29 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

Egor Kobylkin <ekobylkin at paypal dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |ekobylkin at paypal dot com
         Resolution|---                         |DUPLICATE

--- Comment #11 from Egor Kobylkin <ekobylkin at paypal dot com> ---
The problem is present for many languages and was reported earlier in
https://sourceware.org/bugzilla/show_bug.cgi?id=2872
I have created a spreadsheet to generate transliteration tables:
https://sourceware.org/bugzilla/attachment.cgi?id=8590
The table should look like this:
https://sourceware.org/bugzilla/attachment.cgi?id=8591
The list of Unicode characters can be found here:
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

Those who are interested in their language being included for transliteration,
would you spend some time to generate the needed table/file?

*** This bug has been marked as a duplicate of bug 2872 ***


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (13 preceding siblings ...)
  2015-09-18  9:29 ` ekobylkin at paypal dot com
@ 2015-09-18 16:39 ` ekobylkin at paypal dot com
  2019-08-07  7:44 ` egor at kobylkin dot com
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: ekobylkin at paypal dot com @ 2015-09-18 16:39 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

--- Comment #12 from Egor Kobylkin <ekobylkin at paypal dot com> ---
I have tested the translit_greeklish by Nick Andrik and will try to get it
included into the fix along with the translit_cyrilic that I have
generated myself.


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (14 preceding siblings ...)
  2015-09-18 16:39 ` ekobylkin at paypal dot com
@ 2019-08-07  7:44 ` egor at kobylkin dot com
  2019-08-07  7:49 ` egor at kobylkin dot com
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: egor at kobylkin dot com @ 2019-08-07  7:44 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

--- Comment #13 from Egor Kobylkin <egor at kobylkin dot com> ---
Created attachment 11938
  --> https://sourceware.org/bugzilla/attachment.cgi?id=11938&action=edit
greeklish translit for C-translit.h.in based on 2872 bug table

Greek transcription table in the style of the Cyrillic transcription accepted as a
fix for 2872. We can hopefully get this approved soon, as there should not be any
discussion about the style.


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (15 preceding siblings ...)
  2019-08-07  7:44 ` egor at kobylkin dot com
@ 2019-08-07  7:49 ` egor at kobylkin dot com
  2019-08-07 14:34 ` myllynen at redhat dot com
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: egor at kobylkin dot com @ 2019-08-07  7:49 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

--- Comment #14 from Egor Kobylkin <egor at kobylkin dot com> ---
Created attachment 11939
  --> https://sourceware.org/bugzilla/attachment.cgi?id=11939&action=edit
adaptation of the Cyrillic transliteration ODS worksheet from 2872 to Greek


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (16 preceding siblings ...)
  2019-08-07  7:49 ` egor at kobylkin dot com
@ 2019-08-07 14:34 ` myllynen at redhat dot com
  2019-08-10  9:25 ` egor at kobylkin dot com
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: myllynen at redhat dot com @ 2019-08-07 14:34 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

Marko Myllynen <myllynen at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|DUPLICATE                   |---

--- Comment #15 from Marko Myllynen <myllynen at redhat dot com> ---
Reopening since this is about Greek not Cyrillic.

Looking at the table for Modern Greek at
https://en.wikipedia.org/wiki/Romanization_of_Greek I see that this seems to
follow the standards there, but it would probably be a good idea to name a standard
(e.g., ELOT 743) explicitly somewhere (e.g., in the commit message).

For inclusion I'd suggest sending a patch for review to the mailing lists as
you did with the Cyrillic patch.

Thanks.


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (17 preceding siblings ...)
  2019-08-07 14:34 ` myllynen at redhat dot com
@ 2019-08-10  9:25 ` egor at kobylkin dot com
  2019-08-12  7:17 ` myllynen at redhat dot com
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: egor at kobylkin dot com @ 2019-08-10  9:25 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

--- Comment #16 from Egor Kobylkin <egor at kobylkin dot com> ---
AFAIK there are many versions of transcription tables for Greek to ASCII
transcription. Given that the current iconv logic can only transliterate one
symbol to many, but not many to many, we take the "Standard" part of the
following table

https://en.wikipedia.org/wiki/Romanization_of_Greek#Modern_Greek

and only keep the single-letter Greek graphemes. That "standard" seems to be close
to ELOT 743 indeed, but not the same.

So we omit things like Μ and Μπ being transliterated as M and B respectively.
Rather, Μπ will be treated as two separate graphemes and transliterated as Mp.
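
A minimal sketch of what "one to many, but not many to many" means in practice. The table below is a hypothetical four-entry excerpt written in Python for illustration, not the patch itself; the lookup works one input character at a time, so a digraph rule such as Μπ -> B has nowhere to live.

```python
# Hypothetical excerpt of a one-character-to-string table; NOT the patch.
TABLE = {
    "\u039c": "M",   # GREEK CAPITAL LETTER MU
    "\u03c0": "p",   # GREEK SMALL LETTER PI
    "\u03b8": "th",  # GREEK SMALL LETTER THETA (one to many)
    "\u03c8": "ps",  # GREEK SMALL LETTER PSI (one to many)
}


def translit(text):
    """Transliterate character by character; unknown non-ASCII becomes '?'."""
    out = []
    for ch in text:
        if ch.isascii():
            out.append(ch)
        else:
            out.append(TABLE.get(ch, "?"))
    return "".join(out)


print(translit("\u039c\u03c0"))  # Mp  (Mu and pi are looked up separately)
print(translit("\u03c8"))        # ps  (one character may yield two)
```

Supporting Μπ -> B would need a lookup keyed on more than one input character, which is outside what this per-character model can express.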


Here is the list of some standards I have collected so far. There doesn't seem
to be a way to harmonize them all into one. But if anyone wants to propose a
solution, please do.

+ ΕΛΟΤ 743 (passports): https://www.teicrete.gr/users/kutrulis/Ergalia/ELOT743.htm
+ ISO 843: https://en.wikipedia.org/wiki/ISO_843
+ ALA-LC (book titles): https://www.loc.gov/catdir/cpso/romanization/greek.pdf
+ BGN/PCGN (geographical names): http://libraries.ucsd.edu/bib/fed/USBGN_romanization.pdf
  http://geonames.nga.mil/gns/html/Romanization/Romanization_Greek.pdf


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (18 preceding siblings ...)
  2019-08-10  9:25 ` egor at kobylkin dot com
@ 2019-08-12  7:17 ` myllynen at redhat dot com
  2019-09-02 15:28 ` egor at kobylkin dot com
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: myllynen at redhat dot com @ 2019-08-12  7:17 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

--- Comment #17 from Marko Myllynen <myllynen at redhat dot com> ---
Thanks, that sounds like a good rationale and approach.


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (19 preceding siblings ...)
  2019-08-12  7:17 ` myllynen at redhat dot com
@ 2019-09-02 15:28 ` egor at kobylkin dot com
  2019-09-04  7:01 ` egor at kobylkin dot com
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: egor at kobylkin dot com @ 2019-09-02 15:28 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

--- Comment #18 from Egor Kobylkin <egor at kobylkin dot com> ---
Created attachment 11970
  --> https://sourceware.org/bugzilla/attachment.cgi?id=11970&action=edit
A patch for the full U0370-U03FF Greek/Coptic Unicode range

Added all characters from the Unicode range. Quite a few of them are rather
obscure, so this is a best-effort attempt to cover them all nevertheless.


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (20 preceding siblings ...)
  2019-09-02 15:28 ` egor at kobylkin dot com
@ 2019-09-04  7:01 ` egor at kobylkin dot com
  2019-09-04  7:07 ` egor at kobylkin dot com
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: egor at kobylkin dot com @ 2019-09-04  7:01 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

Egor Kobylkin <egor at kobylkin dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #11970|0                           |1
        is obsolete|                            |

--- Comment #19 from Egor Kobylkin <egor at kobylkin dot com> ---
Created attachment 11972
  --> https://sourceware.org/bugzilla/attachment.cgi?id=11972&action=edit
A patch for the full U0370-U03FF Greek/Coptic Unicode range

Removed the apostrophe from the ASCII version of Greek letters with Tonos and
Dialytika, as these marks arguably reflect grammatical conventions rather than
spoken or semantic reality.

Here is a string that can be used for testing: "GREEK Ελληνικό Ίδρυμα Ευρωπαϊκής
και Εξωτερικής.’ Ταχίστη αλώπηξ βαφής ψημένη γη, δρασκελίζει υπέρ νωθρού κυνός.
NativeLetterͰͱͲͳʹ͵Ͷͷͺͻͼͽ;Ϳ΄΅Ά·ΈΉΊΌΎΏΐΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩΪΫάέήίΰαβγδεζηθικλμνξοπρςστυφχψωϊϋόύώϏϐϑϑϒϓϕϖϗϘϙϚϛϜϝϞϟϠϡϢϣϤϥϦϧϨϩϪϫϬϭϮϯϰϱϲϳϴϵ϶ϷϸϹϺϻϼϽϾϿ"

It should produce this: "#GREEK Elliniko Idryma Eyropaikes kai Exoterikes.'
Tachisti alopix vafes psimeni gi, draskelizei yper nothroy kynos.
NativeLetterHhSSss##`Wwisss?J``A;EEIOYOIAVGDEZITHIKLMNXOPRSTYFCHPSOIYaeeiyavgdezithiklmnxoprsstyfchpsoiyoyo&bththY`Y`fp&Qq66Ww9090900900SHshFfKHkhHhDJdjGJgjTItikrsjTHeeSHshSSsrSSS"
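
The expected output above implies a per-letter mapping for the basic alphabet.
As a rough illustration only, that mapping can be read off the quoted strings
and checked against the two test sentences. This is not the glibc locale table
from the patch: the names GREEK_TO_ASCII and translit are hypothetical, the
table covers only the plain, Tonos and Dialytika letters, and the pairs are
inferred from the strings in this comment.

```python
import unicodedata

# Letter pairs inferred from the expected output quoted in this comment.
# Assumption: this is an illustrative subset, not the table shipped in glibc.
_GREEK = "αβγδεζηθικλμνξοπρςστυφχψω" "άέήίόύώϊϋΰ"
_ASCII = ("a v g d e z i th i k l m n x o p r s s t y f ch ps o "
          "a e e i o y o i y y").split()
assert len(_GREEK) == len(_ASCII)

GREEK_TO_ASCII = dict(zip(_GREEK, _ASCII))

# Derive the uppercase pairs (e.g. Θ -> TH) from the lowercase ones.
# Letters whose uppercase form is more than one code point are skipped.
for greek, ascii_ in list(GREEK_TO_ASCII.items()):
    upper = greek.upper()
    if len(upper) == 1:
        GREEK_TO_ASCII.setdefault(upper, ascii_.upper())


def translit(text):
    """Replace each mapped Greek letter; leave every other character as is."""
    text = unicodedata.normalize("NFC", text)
    return "".join(GREEK_TO_ASCII.get(ch, ch) for ch in text)


print(translit("Ελληνικό Ίδρυμα"))  # Elliniko Idryma
```

Unlike iconv, this sketch passes unmapped characters through unchanged instead
of replacing them, so it is only useful for eyeballing the letter mapping.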


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (21 preceding siblings ...)
  2019-09-04  7:01 ` egor at kobylkin dot com
@ 2019-09-04  7:07 ` egor at kobylkin dot com
  2019-09-07 14:45 ` egor at kobylkin dot com
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: egor at kobylkin dot com @ 2019-09-04  7:07 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

Egor Kobylkin <egor at kobylkin dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #11939|0                           |1
        is obsolete|                            |

--- Comment #20 from Egor Kobylkin <egor at kobylkin dot com> ---
Created attachment 11973
  --> https://sourceware.org/bugzilla/attachment.cgi?id=11973&action=edit
Adaptation of the Cyrillic transliteration ODS worksheet from bug 2872 to Greek


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (22 preceding siblings ...)
  2019-09-04  7:07 ` egor at kobylkin dot com
@ 2019-09-07 14:45 ` egor at kobylkin dot com
  2019-11-26 11:43 ` fweimer at redhat dot com
  2019-11-26 11:43 ` cvs-commit at gcc dot gnu.org
  25 siblings, 0 replies; 27+ messages in thread
From: egor at kobylkin dot com @ 2019-09-07 14:45 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

Egor Kobylkin <egor at kobylkin dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #11938|0                           |1
        is obsolete|                            |


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (24 preceding siblings ...)
  2019-11-26 11:43 ` fweimer at redhat dot com
@ 2019-11-26 11:43 ` cvs-commit at gcc dot gnu.org
  25 siblings, 0 replies; 27+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2019-11-26 11:43 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

--- Comment #21 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Florian Weimer <fw@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=7fc8c286e31a336caa008a8bcfb00aac1e47cdc8

commit 7fc8c286e31a336caa008a8bcfb00aac1e47cdc8
Author: Egor Kobylkin <egor@kobylkin.com>
Date:   Thu Nov 14 13:59:39 2019 +0100

    locale: Greek -> ASCII transliteration table [BZ #12031]


* [Bug localedata/12031] iconv -t ascii//translit with Greek characters
       [not found] <bug-12031-716@http.sourceware.org/bugzilla/>
                   ` (23 preceding siblings ...)
  2019-09-07 14:45 ` egor at kobylkin dot com
@ 2019-11-26 11:43 ` fweimer at redhat dot com
  2019-11-26 11:43 ` cvs-commit at gcc dot gnu.org
  25 siblings, 0 replies; 27+ messages in thread
From: fweimer at redhat dot com @ 2019-11-26 11:43 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=12031

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
                 CC|                            |fweimer at redhat dot com
         Resolution|---                         |FIXED
   Target Milestone|---                         |2.31

--- Comment #22 from Florian Weimer <fweimer at redhat dot com> ---
Fixed for glibc 2.31.

