* [PATCHv2] locale/C-translit.h.in: Greek -> ASCII transliteration table [BZ #12031]
@ 2019-11-14 12:29 Diego (Egor) Kobylkin
From: Diego (Egor) Kobylkin @ 2019-11-14 12:29 UTC (permalink / raw)
To: libc-locales, libc-alpha; +Cc: Florian Weimer, Marko Myllynen
Changelog:
v2
* ETA WITH TONOS is now transliterated as I/i, to be consistent with the rest of the table. Ancient Greek calls for E/e and Modern Greek for I/i; we follow the modern convention here.
Thanks Florian for the feedback on this!
Egor
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, September 4, 2019 9:31 AM, Diego (Egor) Kobylkin <egor@kobylkin.com> wrote:
> Dear locale maintainers,
>
> To fix glibc bug 12031, "iconv -t ascii//translit with Greek characters" [1],
> this patch adds Greek transliteration rows to locale/C-translit.h.in.
>
> This work follows on the heels of the successfully committed patch for a
> virtually identical bug [BZ #2872] concerning Cyrillic characters [2].
>
> AFAIK there are many versions of Greek-to-ASCII transcription tables. Given
> that the current iconv logic can only transliterate one symbol to many, not
> many to many, we take the "Standard" part of the
> Romanization_of_Greek#Modern_Greek table [3] and keep only the single-letter
> Greek graphemes. That "standard" seems close to ELOT 743, but not identical.
>
> So we omit rules such as Μ → M but Μπ → B. Instead, Μπ is treated as two
> separate graphemes and transliterated as Mp.
>
> Here is a list of some standards I have collected so far. There doesn't seem
> to be a way to harmonize them all into one, but if anyone wants to propose a
> solution, please do.
>
> - ΕΛΟΤ 743 (passports): https://www.teicrete.gr/users/kutrulis/Ergalia/ELOT743.htm
> - ISO 843: https://en.wikipedia.org/wiki/ISO_843
> - ALA-LC (book titles): https://www.loc.gov/catdir/cpso/romanization/greek.pdf
> - BGN/PCGN: http://libraries.ucsd.edu/bib/fed/USBGN_romanization.pdf
> - Geographical names: http://geonames.nga.mil/gns/html/Romanization/Romanization_Greek.pdf
>
> Furthermore, to cover the whole U+0370-U+03FF Greek and Coptic Unicode block,
> I asked around and made a best-effort transliteration for the remaining
> characters not covered by the above standards.
>
> If you have better sources for the actual transliteration entries, please
> send your feedback!
>
> The patch is attached.
>
> Best regards,
> Egor Kobylkin
>
> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=12031
> [2] https://sourceware.org/ml/libc-alpha/2019-07/msg00477.html
> [3] https://en.wikipedia.org/wiki/Romanization_of_Greek#Modern_Greek
>
[-- Attachment #1.2: 0001-Locales-Greek-ASCII-transliteration-table-BZ-12031.patch --]
[-- Type: text/x-patch; filename="0001-Locales-Greek-ASCII-transliteration-table-BZ-12031.patch"; name="0001-Locales-Greek-ASCII-transliteration-table-BZ-12031.patch", Size: 8282 bytes --]
From 21865905ff539265b3142b3a44c8eac394bf8eeb Mon Sep 17 00:00:00 2001
From: Egor Kobylkin <egor@kobylkin.com>
Date: Thu, 14 Nov 2019 13:05:52 +0100
Subject: [PATCH] Locales: Greek -> ASCII transliteration table [BZ #12031]
[BZ #12031]
* locale/C-translit.h.in: Add Greeklish transliteration.
---
locale/C-translit.h.in | 137 ++++++++++++++++++++++++++++++++++++++++-
1 file changed, 136 insertions(+), 1 deletion(-)
diff --git a/locale/C-translit.h.in b/locale/C-translit.h.in
index 12cbcd35be..c22dddd430 100644
--- a/locale/C-translit.h.in
+++ b/locale/C-translit.h.in
@@ -15,7 +15,7 @@
#
# You should have received a copy of the GNU Lesser General Public
# License along with the GNU C Library; if not, see
-# <https://www.gnu.org/licenses/>.
+# <http://www.gnu.org/licenses/>.
# The entries here have to be sorted relative to the input string.
@@ -56,6 +56,141 @@
"\x02cd" "_" # <U02CD> MODIFIER LETTER LOW MACRON
"\x02d0" ":" # <U02D0> MODIFIER LETTER TRIANGULAR COLON
"\x02dc" "~" # <U02DC> SMALL TILDE
+"\x0370" "H" # <U0370> GREEK CAPITAL LETTER HETA
+"\x0371" "h" # <U0371> GREEK SMALL LETTER HETA
+"\x0372" "SS" # <U0372> GREEK CAPITAL LETTER ARCHAIC SAMPI
+"\x0373" "ss" # <U0373> GREEK SMALL LETTER ARCHAIC SAMPI
+"\x0374" "#" # <U0374> GREEK NUMERAL SIGN
+"\x0375" "#`" # <U0375> GREEK LOWER NUMERAL SIGN
+"\x0376" "W" # <U0376> GREEK CAPITAL LETTER PAMPHYLIAN DIGAMMA
+"\x0377" "w" # <U0377> GREEK SMALL LETTER PAMPHYLIAN DIGAMMA
+"\x037a" "i" # <U037A> GREEK YPOGEGRAMMENI
+"\x037b" "s" # <U037B> GREEK SMALL REVERSED LUNATE SIGMA SYMBOL
+"\x037c" "s" # <U037C> GREEK SMALL DOTTED LUNATE SIGMA SYMBOL
+"\x037d" "s" # <U037D> GREEK SMALL REVERSED DOTTED LUNATE SIGMA SYMBOL
+"\x037e" "?" # <U037E> GREEK QUESTION MARK
+"\x037f" "J" # <U037F> GREEK CAPITAL LETTER YOT
+"\x0384" "`" # <U0384> GREEK TONOS
+"\x0385" "`" # <U0385> GREEK DIALYTIKA TONOS
+"\x0386" "A" # <U0386> GREEK CAPITAL LETTER ALPHA WITH TONOS
+"\x0387" ";" # <U0387> GREEK ANO TELEIA
+"\x0388" "E" # <U0388> GREEK CAPITAL LETTER EPSILON WITH TONOS
+"\x0389" "I" # <U0389> GREEK CAPITAL LETTER ETA WITH TONOS
+"\x038a" "I" # <U038A> GREEK CAPITAL LETTER IOTA WITH TONOS
+"\x038c" "O" # <U038C> GREEK CAPITAL LETTER OMICRON WITH TONOS
+"\x038e" "Y" # <U038E> GREEK CAPITAL LETTER UPSILON WITH TONOS
+"\x038f" "O" # <U038F> GREEK CAPITAL LETTER OMEGA WITH TONOS
+"\x0390" "I" # <U0390> GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS
+"\x0391" "A" # <U0391> GREEK CAPITAL LETTER ALPHA
+"\x0392" "V" # <U0392> GREEK CAPITAL LETTER BETA
+"\x0393" "G" # <U0393> GREEK CAPITAL LETTER GAMMA
+"\x0394" "D" # <U0394> GREEK CAPITAL LETTER DELTA
+"\x0395" "E" # <U0395> GREEK CAPITAL LETTER EPSILON
+"\x0396" "Z" # <U0396> GREEK CAPITAL LETTER ZETA
+"\x0397" "I" # <U0397> GREEK CAPITAL LETTER ETA
+"\x0398" "TH" # <U0398> GREEK CAPITAL LETTER THETA
+"\x0399" "I" # <U0399> GREEK CAPITAL LETTER IOTA
+"\x039a" "K" # <U039A> GREEK CAPITAL LETTER KAPPA
+"\x039b" "L" # <U039B> GREEK CAPITAL LETTER LAMDA
+"\x039c" "M" # <U039C> GREEK CAPITAL LETTER MU
+"\x039d" "N" # <U039D> GREEK CAPITAL LETTER NU
+"\x039e" "X" # <U039E> GREEK CAPITAL LETTER XI
+"\x039f" "O" # <U039F> GREEK CAPITAL LETTER OMICRON
+"\x03a0" "P" # <U03A0> GREEK CAPITAL LETTER PI
+"\x03a1" "R" # <U03A1> GREEK CAPITAL LETTER RHO
+"\x03a3" "S" # <U03A3> GREEK CAPITAL LETTER SIGMA
+"\x03a4" "T" # <U03A4> GREEK CAPITAL LETTER TAU
+"\x03a5" "Y" # <U03A5> GREEK CAPITAL LETTER UPSILON
+"\x03a6" "F" # <U03A6> GREEK CAPITAL LETTER PHI
+"\x03a7" "CH" # <U03A7> GREEK CAPITAL LETTER CHI
+"\x03a8" "PS" # <U03A8> GREEK CAPITAL LETTER PSI
+"\x03a9" "O" # <U03A9> GREEK CAPITAL LETTER OMEGA
+"\x03aa" "I" # <U03AA> GREEK CAPITAL LETTER IOTA WITH DIALYTIKA
+"\x03ab" "Y" # <U03AB> GREEK CAPITAL LETTER UPSILON WITH DIALYTIKA
+"\x03ac" "a" # <U03AC> GREEK SMALL LETTER ALPHA WITH TONOS
+"\x03ad" "e" # <U03AD> GREEK SMALL LETTER EPSILON WITH TONOS
+"\x03ae" "i" # <U03AE> GREEK SMALL LETTER ETA WITH TONOS
+"\x03af" "i" # <U03AF> GREEK SMALL LETTER IOTA WITH TONOS
+"\x03b0" "y" # <U03B0> GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
+"\x03b1" "a" # <U03B1> GREEK SMALL LETTER ALPHA
+"\x03b2" "v" # <U03B2> GREEK SMALL LETTER BETA
+"\x03b3" "g" # <U03B3> GREEK SMALL LETTER GAMMA
+"\x03b4" "d" # <U03B4> GREEK SMALL LETTER DELTA
+"\x03b5" "e" # <U03B5> GREEK SMALL LETTER EPSILON
+"\x03b6" "z" # <U03B6> GREEK SMALL LETTER ZETA
+"\x03b7" "i" # <U03B7> GREEK SMALL LETTER ETA
+"\x03b8" "th" # <U03B8> GREEK SMALL LETTER THETA
+"\x03b9" "i" # <U03B9> GREEK SMALL LETTER IOTA
+"\x03ba" "k" # <U03BA> GREEK SMALL LETTER KAPPA
+"\x03bb" "l" # <U03BB> GREEK SMALL LETTER LAMDA
+"\x03bc" "m" # <U03BC> GREEK SMALL LETTER MU
+"\x03bd" "n" # <U03BD> GREEK SMALL LETTER NU
+"\x03be" "x" # <U03BE> GREEK SMALL LETTER XI
+"\x03bf" "o" # <U03BF> GREEK SMALL LETTER OMICRON
+"\x03c0" "p" # <U03C0> GREEK SMALL LETTER PI
+"\x03c1" "r" # <U03C1> GREEK SMALL LETTER RHO
+"\x03c2" "s" # <U03C2> GREEK SMALL LETTER FINAL SIGMA
+"\x03c3" "s" # <U03C3> GREEK SMALL LETTER SIGMA
+"\x03c4" "t" # <U03C4> GREEK SMALL LETTER TAU
+"\x03c5" "y" # <U03C5> GREEK SMALL LETTER UPSILON
+"\x03c6" "f" # <U03C6> GREEK SMALL LETTER PHI
+"\x03c7" "ch" # <U03C7> GREEK SMALL LETTER CHI
+"\x03c8" "ps" # <U03C8> GREEK SMALL LETTER PSI
+"\x03c9" "o" # <U03C9> GREEK SMALL LETTER OMEGA
+"\x03ca" "i" # <U03CA> GREEK SMALL LETTER IOTA WITH DIALYTIKA
+"\x03cb" "y" # <U03CB> GREEK SMALL LETTER UPSILON WITH DIALYTIKA
+"\x03cc" "o" # <U03CC> GREEK SMALL LETTER OMICRON WITH TONOS
+"\x03cd" "y" # <U03CD> GREEK SMALL LETTER UPSILON WITH TONOS
+"\x03ce" "o" # <U03CE> GREEK SMALL LETTER OMEGA WITH TONOS
+"\x03cf" "&" # <U03CF> GREEK CAPITAL KAI SYMBOL
+"\x03d0" "b" # <U03D0> GREEK BETA SYMBOL
+"\x03d1" "th" # <U03D1> GREEK THETA SYMBOL
+"\x03d2" "Y`" # <U03D2> GREEK UPSILON WITH HOOK SYMBOL
+"\x03d3" "Y`" # <U03D3> GREEK UPSILON WITH ACUTE AND HOOK SYMBOL
+"\x03d4" "Y`" # <U03D4> GREEK UPSILON WITH DIAERESIS AND HOOK SYMBOL
+"\x03d5" "f" # <U03D5> GREEK PHI SYMBOL
+"\x03d6" "p" # <U03D6> GREEK PI SYMBOL
+"\x03d7" "&" # <U03D7> GREEK KAI SYMBOL
+"\x03d8" "Q" # <U03D8> GREEK LETTER ARCHAIC KOPPA
+"\x03d9" "q" # <U03D9> GREEK SMALL LETTER ARCHAIC KOPPA
+"\x03da" "6" # <U03DA> GREEK LETTER STIGMA
+"\x03db" "6" # <U03DB> GREEK SMALL LETTER STIGMA
+"\x03dc" "W" # <U03DC> GREEK LETTER DIGAMMA
+"\x03dd" "w" # <U03DD> GREEK SMALL LETTER DIGAMMA
+"\x03de" "90" # <U03DE> GREEK LETTER KOPPA
+"\x03df" "90" # <U03DF> GREEK SMALL LETTER KOPPA
+"\x03e0" "900" # <U03E0> GREEK LETTER SAMPI
+"\x03e1" "900" # <U03E1> GREEK SMALL LETTER SAMPI
+"\x03e2" "SH" # <U03E2> COPTIC CAPITAL LETTER SHEI
+"\x03e3" "sh" # <U03E3> COPTIC SMALL LETTER SHEI
+"\x03e4" "F" # <U03E4> COPTIC CAPITAL LETTER FEI
+"\x03e5" "f" # <U03E5> COPTIC SMALL LETTER FEI
+"\x03e6" "KH" # <U03E6> COPTIC CAPITAL LETTER KHEI
+"\x03e7" "kh" # <U03E7> COPTIC SMALL LETTER KHEI
+"\x03e8" "H" # <U03E8> COPTIC CAPITAL LETTER HORI
+"\x03e9" "h" # <U03E9> COPTIC SMALL LETTER HORI
+"\x03ea" "DJ" # <U03EA> COPTIC CAPITAL LETTER GANGIA
+"\x03eb" "dj" # <U03EB> COPTIC SMALL LETTER GANGIA
+"\x03ec" "GJ" # <U03EC> COPTIC CAPITAL LETTER SHIMA
+"\x03ed" "gj" # <U03ED> COPTIC SMALL LETTER SHIMA
+"\x03ee" "TI" # <U03EE> COPTIC CAPITAL LETTER DEI
+"\x03ef" "ti" # <U03EF> COPTIC SMALL LETTER DEI
+"\x03f0" "k" # <U03F0> GREEK KAPPA SYMBOL
+"\x03f1" "r" # <U03F1> GREEK RHO SYMBOL
+"\x03f2" "s" # <U03F2> GREEK LUNATE SIGMA SYMBOL
+"\x03f3" "j" # <U03F3> GREEK LETTER YOT
+"\x03f4" "TH" # <U03F4> GREEK CAPITAL THETA SYMBOL
+"\x03f5" "e" # <U03F5> GREEK LUNATE EPSILON SYMBOL
+"\x03f6" "e" # <U03F6> GREEK REVERSED LUNATE EPSILON SYMBOL
+"\x03f7" "SH" # <U03F7> GREEK CAPITAL LETTER SHO
+"\x03f8" "sh" # <U03F8> GREEK SMALL LETTER SHO
+"\x03f9" "S" # <U03F9> GREEK CAPITAL LUNATE SIGMA SYMBOL
+"\x03fa" "S" # <U03FA> GREEK CAPITAL LETTER SAN
+"\x03fb" "s" # <U03FB> GREEK SMALL LETTER SAN
+"\x03fc" "r" # <U03FC> GREEK RHO WITH STROKE SYMBOL
+"\x03fd" "S" # <U03FD> GREEK CAPITAL REVERSED LUNATE SIGMA SYMBOL
+"\x03fe" "S" # <U03FE> GREEK CAPITAL DOTTED LUNATE SIGMA SYMBOL
+"\x03ff" "S" # <U03FF> GREEK CAPITAL REVERSED DOTTED LUNATE SIGMA SYMBOL
"\x0401" "YO" # <U0401> CYRILLIC CAPITAL LETTER IO
"\x0402" "DJ" # <U0402> CYRILLIC CAPITAL LETTER DJE
"\x0403" "G`" # <U0403> CYRILLIC CAPITAL LETTER GJE
--
2.17.1
* Re: [PATCHv2] locale/C-translit.h.in: Greek -> ASCII transliteration table [BZ #12031]
@ 2019-11-14 12:40 ` Florian Weimer
From: Florian Weimer @ 2019-11-14 12:40 UTC (permalink / raw)
To: Diego (Egor) Kobylkin; +Cc: libc-locales, libc-alpha, Marko Myllynen
* Diego Kobylkin:
> -# <https://www.gnu.org/licenses/>.
> +# <http://www.gnu.org/licenses/>.
Spurious change.
I think we should take this; even if imperfect, it beats all those '?' characters.
Thanks,
Florian