public inbox for libc-alpha@sourceware.org
* Re: transliteration of Hanzi characters
@ 2021-05-05  0:13     ` Bruno Haible
From: Bruno Haible @ 2021-05-05  0:13 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha

Carlos O'Donell wrote in
<https://lists.gnu.org/archive/html/bug-gettext/2021-05/msg00008.html>:
> > 1 Empfaenger Chinese (??,???,??)      ??
> >   * For the second line of output, in the first three cases, iconv()
> >     did transliteration, and the result was always an ASCII string.
> >     (The quality of glibc's transliteration of Hanzi characters to
> >     question marks can be debated, though.)
> 
> Completely off-topic, but is there a "high quality" transliteration of
> Hanzi characters?
> Would you have expected a phoneme to be spelled out in ASCII?
> I am not aware of any way to keep the meaning of the Hanzi characters in
> ASCII, therefore you see the locale's "default_missing" character U+003F '?'.

I think this entire question of transliteration is mostly irrelevant
nowadays. It was relevant when Ulrich Drepper designed the glibc gettext()
and iconv() systems, in and before 2001, because at that time
  - many applications were not multibyte-encoding aware (and some were
    not even 8-bit clean),
  - GUI applications could often not display Unicode well,
  - console users were limited to a font with just Latin, Cyrillic,
    Greek, and Hebrew scripts (essentially).

Nowadays
  - most applications are multibyte-encoding aware,
  - all UI toolkits display Unicode fonts well,
  - even the framebuffer devices can display Chinese characters (via
    fbterm).

Nevertheless, coming back to your question, specifically for Han characters
and for Chinese, the Unicode database [1] (file Unihan_Readings.txt, field
'kMandarin') looks like it could provide the data for a transliteration.
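
As a rough illustration of that idea, here is a minimal Python sketch. It
assumes the tab-separated "U+XXXX<TAB>field<TAB>value" line format used by
Unihan_Readings.txt, and it keeps only the first reading when a kMandarin
entry lists more than one. The function names (parse_kmandarin, strip_tones,
transliterate) and the two sample lines are made up for this example; this is
not how glibc's locale data or iconv() works.

```python
import unicodedata

# Two sample lines in the Unihan_Readings.txt format (illustration only;
# a real run would read the file extracted from Unihan.zip instead).
SAMPLE = "U+4E2D\tkMandarin\tzhōng\nU+6587\tkMandarin\twén\n"


def parse_kmandarin(lines):
    """Build a {character: pinyin} table from Unihan_Readings.txt-style lines."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) != 3 or fields[1] != "kMandarin":
            continue  # ignore all other reading fields
        codepoint, _, value = fields
        char = chr(int(codepoint[2:], 16))  # "U+4E2D" -> the character itself
        table[char] = value.split()[0]      # assumption: keep the first reading
    return table


def strip_tones(pinyin):
    """Remove tone marks and other diacritics (lossy: 'ü' becomes 'u')."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def transliterate(text, table, missing="?"):
    """Map each non-ASCII character to toneless pinyin, or to `missing`."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        elif ch in table:
            out.append(strip_tones(table[ch]))
        else:
            out.append(missing)
    return "".join(out)


table = parse_kmandarin(SAMPLE.splitlines())
print(transliterate("\u4e2d\u6587", table))
```

Note that this drops the tones and runs syllables together, so the result is
a pronunciation hint at best. It does not preserve meaning, and it is
specific to Mandarin readings of the characters.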

Bruno

[1] https://www.unicode.org/Public/13.0.0/ucd/Unihan.zip

