public inbox for libc-locales@sourceware.org
From: "elie.roux@telecom-bretagne.eu" <sourceware-bugzilla@sourceware.org>
To: libc-locales@sourceware.org
Subject: [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
Date: Mon, 15 Jan 2018 22:03:00 -0000
Message-ID: <bug-21547-716-K0vCmff41H@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-21547-716@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=21547

--- Comment #18 from Elie Roux <elie.roux@telecom-bretagne.eu> ---
Well, things are supposed to be sorted exactly as in the sorted list attached
to this bug report.

Now, I agree there is some magic going on here, and it's not totally obvious to
me how this works, but it works.

Even though it's not clear what line 7 does, it clearly does something: if you
remove it, the tests in the tibetan-collation GitHub repo fail with:

expected [གངས་ལྷགས།, གཉྫིར།, གད།]
got      [གངས་ལྷགས།, གད།, གཉྫིར།]

The test corresponds to page 347 of the tshig mdzod chen mo:

https://www.tbrc.org/browser/ImageService?work=W29329&igroup=I1KG15042&image=379&first=1&last=1058&fetchimg=yes

So line 7 has a purpose and doesn't get completely overwritten, although I
agree the magic that takes place is a bit above my head... I suppose that it
somehow indicates that གཉྫ should be sorted after the initial value of གཉ, and
that this gets recorded, even though གཉ takes another value afterwards.
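To make that guess concrete, here is a toy Python model. It is NOT glibc's
actual localedef algorithm, just a sketch of the hypothesis: rules are applied
in file order, and each rule places an element immediately after its anchor's
*current* position, so moving the anchor later does not drag the element
along. The `apply_rules` helper, the starting order, and the reduction of
words to a few sort units are all hypothetical.

```python
# Toy model (hypothetical, not glibc's localedef): rules are applied in
# file order, and each rule inserts an element right after the anchor's
# position *at that moment*; later moves of the anchor do not drag it along.

GA, NGA, NYA, DA = "\u0f42", "\u0f44", "\u0f49", "\u0f51"  # ག ང ཉ ད
SUB_DZA = "\u0fab"  # ྫ (subjoined DZA)


def apply_rules(base, rules):
    """Apply (element, anchor) rules in order and return the new ordering."""
    order = list(base)
    for element, anchor in rules:
        if element in order:
            order.remove(element)  # a rule may re-position an existing element
        order.insert(order.index(anchor) + 1, element)
    return order


# Simplified starting order: GA-rooted units first, then the main letter NYA.
BASE = [GA, GA + NGA, GA + NYA, GA + DA, NYA]

RULE_LINE_7 = (GA + NYA + SUB_DZA, GA + NYA)  # put GA+NYA+DZA after GA+NYA
RULE_LINE_30 = (GA + NYA, NYA)                # move GA+NYA after main letter NYA

early = apply_rules(BASE, [RULE_LINE_7, RULE_LINE_30])  # rule near the top
late = apply_rules(BASE, [RULE_LINE_30, RULE_LINE_7])   # rule moved to the end

print(early)
print(late)
```

In this model, `early` keeps གཉྫ between གང and གད, as in the expected list
above, while `late` pushes it past གད. That is consistent with the observation
that the position of the rule in the file matters.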

It may become less confusing with a bit of background on Tibetan: in Tibetan,
གཉ never exists on its own, since it would be the main letter ག followed by the
suffix ཉ, and that simply never happens (ཉ cannot be a suffix). There are two
cases that can start with གཉ:

1. གཉྫིར is transliterated Sanskrit and, somewhat exceptionally (and quite
erratically), behaves as if ཉ were a suffix; it is thus sorted with the main
letter ག, and that's what line 7 is trying to sort. What I believe happens is
that at the time line 7 is parsed, གཉ is still sorted with the main letter ག,
as it would be in the root collation. So this sorts གཉྫ with the main letter ག.
Note that if you put the rule at the end of the file, the result is not the
same, so I think that's more or less what's happening...

2. གཉར is the prefix ག, then the main letter ཉ, then the suffix ར. It is sorted
in a totally different way, with the main letter ཉ, as stated by the rule on
line 30, far after the main letter ག. So this sorts གཉ with the main letter ཉ.
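For the two examples above, the cases can be told apart by what follows ཉ: in
གཉྫིར it carries a subjoined consonant (ྫ), while in གཉར it is followed by a
suffix. The `root_letter` function below is a hypothetical heuristic that only
covers syllables starting with གཉ; it is a sketch, not a general Tibetan
syllable parser and not how the glibc locale file expresses the rule.

```python
# Hypothetical heuristic for syllables starting with GA + NYA only; it is
# not a general Tibetan syllable parser.

GA, NYA = "\u0f42", "\u0f49"  # ག ཉ


def is_subjoined(ch):
    """True for Tibetan subjoined consonants (U+0F90..U+0FBC)."""
    return "\u0f90" <= ch <= "\u0fbc"


def root_letter(syllable):
    """Return the main letter a GA+NYA syllable is sorted under."""
    if not syllable.startswith(GA + NYA):
        raise ValueError("this sketch only handles syllables starting with GA+NYA")
    rest = syllable[2:]
    if rest and is_subjoined(rest[0]):
        # Case 1: transliterated Sanskrit such as GA+NYA+DZA; NYA behaves
        # like a suffix, so the word is sorted under GA.
        return GA
    # Case 2: GA is a prefix and NYA is the main letter.
    return NYA


print(root_letter("\u0f42\u0f49\u0fab\u0f72\u0f62") == GA)  # གཉྫིར
print(root_letter("\u0f42\u0f49\u0f62") == NYA)             # གཉར
```

Checking only the character right after ཉ keeps the sketch small; real words
with other shapes would need a proper syllable analysis.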

That's my understanding of the situation, and I still think the rules are
correct... I'm not sure I have made things clearer; if you want more details,
don't hesitate to ask!
