* [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan)
@ 2017-06-05 10:37 elie.roux@telecom-bretagne.eu
2017-06-16 9:03 ` [Bug localedata/21547] " elie.roux@telecom-bretagne.eu
` (34 more replies)
0 siblings, 35 replies; 36+ messages in thread
From: elie.roux@telecom-bretagne.eu @ 2017-06-05 10:37 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
Bug ID: 21547
Summary: Tibetan script collation broken (Dzongkha and Tibetan)
Product: glibc
Version: 2.24
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: localedata
Assignee: unassigned at sourceware dot org
Reporter: elie.roux@telecom-bretagne.eu
CC: libc-locales at sourceware dot org
Target Milestone: ---
Hello,
Tibetan and Dzongkha sorting does not work properly with the current locale data.
With the following test file:
$ cat tibt_order_test.txt
ལྔ
ང
ཅ
རྔ
སྔ
བརྔ
བསྔ
I get the following wrong result:
$ LC_COLLATE="dz_BT.utf8" sort tibt_order_test.txt
ང
བརྔ
བསྔ
རྔ
ལྔ
སྔ
ཅ
The correct result would be
ང
རྔ
ལྔ
སྔ
བརྔ
བསྔ
ཅ
dz and bo have the same collation data in CLDR.
See https://github.com/eroux/tibetan-collation for more on Tibetan collation.
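The same check can also be run from Python instead of sort(1). This is only a minimal sketch: the helper name sort_with_locale is made up for illustration, and it assumes the dz_BT.utf8 locale is installed (it returns None otherwise).

```python
import locale


def sort_with_locale(words, locale_name):
    """Sort words with the LC_COLLATE rules of locale_name.

    Returns None when the locale is not installed on this system.
    (Hypothetical helper, not part of glibc or of the bug report.)
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm() turns each string into a key that compares like strcoll()
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


# The seven strings from tibt_order_test.txt
words = ["ལྔ", "ང", "ཅ", "རྔ", "སྔ", "བརྔ", "བསྔ"]
print(sort_with_locale(words, "dz_BT.utf8"))
```

On an affected glibc this should print the same wrong order as the sort command above.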
Result of locale -a:
bo_CN
bo_CN.utf8
bo_IN
bo_IN.utf8
C
C.UTF-8
dz_BT
dz_BT.utf8
en_GB.utf8
en_US.utf8
fr_FR.utf8
POSIX
Thank you,
--
Elie
--
You are receiving this mail because:
You are on the CC list for the bug.
^ permalink raw reply [flat|nested] 36+ messages in thread
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
@ 2017-06-16 9:03 ` elie.roux@telecom-bretagne.eu
2017-10-21 8:26 ` maiku.fabian at gmail dot com
` (33 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: elie.roux@telecom-bretagne.eu @ 2017-06-16 9:03 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
Elie Roux <elie.roux@telecom-bretagne.eu> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |elie.roux@telecom-bretagne.
| |eu
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
2017-06-16 9:03 ` [Bug localedata/21547] " elie.roux@telecom-bretagne.eu
@ 2017-10-21 8:26 ` maiku.fabian at gmail dot com
2017-12-14 15:51 ` maiku.fabian at gmail dot com
` (32 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: maiku.fabian at gmail dot com @ 2017-10-21 8:26 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
Mike FABIAN <maiku.fabian at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |maiku.fabian at gmail dot com
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
2017-06-16 9:03 ` [Bug localedata/21547] " elie.roux@telecom-bretagne.eu
2017-10-21 8:26 ` maiku.fabian at gmail dot com
@ 2017-12-14 15:51 ` maiku.fabian at gmail dot com
2017-12-18 17:15 ` maiku.fabian at gmail dot com
` (31 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: maiku.fabian at gmail dot com @ 2017-12-14 15:51 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
Mike FABIAN <maiku.fabian at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|unassigned at sourceware dot org |maiku.fabian at gmail dot com
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (2 preceding siblings ...)
2017-12-14 15:51 ` maiku.fabian at gmail dot com
@ 2017-12-18 17:15 ` maiku.fabian at gmail dot com
2017-12-18 17:58 ` elie.roux@telecom-bretagne.eu
` (30 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: maiku.fabian at gmail dot com @ 2017-12-18 17:15 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #1 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Do you have a complete test file which tests
all collation rules for Tibetan?
Are the current rules in CLDR correct or not?
https://unicode.org/cldr/trac/browser/trunk/common/collation/dz.xml
Your ticket
http://unicode.org/cldr/trac/ticket/9895
seems to be an update for the rules currently in CLDR, but
your update is not yet included, right?
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (3 preceding siblings ...)
2017-12-18 17:15 ` maiku.fabian at gmail dot com
@ 2017-12-18 17:58 ` elie.roux@telecom-bretagne.eu
2017-12-18 18:00 ` elie.roux@telecom-bretagne.eu
` (29 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: elie.roux@telecom-bretagne.eu @ 2017-12-18 17:58 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #2 from Elie Roux <elie.roux@telecom-bretagne.eu> ---
Created attachment 10696
--> https://sourceware.org/bugzilla/attachment.cgi?id=10696&action=edit
sorted list for test
This is a correctly sorted test file, made using the latest rules of
https://github.com/eroux/tibetan-collation
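A correctly sorted reference list like this one can be used as a regression test: shuffle it, sort it again with the collation under test, and compare. The sketch below is an assumption about how such a check could look, not the test code of the tibetan-collation repository; first_mismatch is a hypothetical name, and key is any sort-key function (for example locale.strxfrm).

```python
import random


def first_mismatch(reference, key, seed=0):
    """Re-sort a shuffled copy of reference and compare with the original.

    Returns None if sorting with key reproduces the reference order,
    otherwise (index, expected, got) for the first differing position.
    """
    shuffled = list(reference)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps runs repeatable
    result = sorted(shuffled, key=key)
    for i, (want, got) in enumerate(zip(reference, result)):
        if want != got:
            return i, want, got
    return None
```

Reporting only the first mismatch keeps the output short when a single misplaced rule shifts a large part of the list.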
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (4 preceding siblings ...)
2017-12-18 17:58 ` elie.roux@telecom-bretagne.eu
@ 2017-12-18 18:00 ` elie.roux@telecom-bretagne.eu
2018-01-15 10:25 ` maiku.fabian at gmail dot com
` (28 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: elie.roux@telecom-bretagne.eu @ 2017-12-18 18:00 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #3 from Elie Roux <elie.roux@telecom-bretagne.eu> ---
Thanks for your answer!
You're right, my proposal to update the CLDR rules hasn't been considered yet.
I don't know how long these things usually take, but if you have any
advice on speeding up the process I'd be glad to hear it!
I've attached a sorted list of strings; it sorts correctly with the latest
rules of https://github.com/eroux/tibetan-collation/
Thank you!
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (5 preceding siblings ...)
2017-12-18 18:00 ` elie.roux@telecom-bretagne.eu
@ 2018-01-15 10:25 ` maiku.fabian at gmail dot com
2018-01-15 10:50 ` elie.roux@telecom-bretagne.eu
` (27 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: maiku.fabian at gmail dot com @ 2018-01-15 10:25 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #4 from Mike FABIAN <maiku.fabian at gmail dot com> ---
I wonder whether there isn’t a contradiction in your rules.
https://github.com/eroux/tibetan-collation/blob/master/implementations/Unicode/rules.txt#L7
contains:
&གཉ<གཉྫ
so གཉ comes *before* གཉྫ as a primary difference.
But then
https://github.com/eroux/tibetan-collation/blob/master/implementations/Unicode/rules.txt#L30
contains:
&ཉ<<ྋྙ<གཉ<མཉ<རྙ=ཪྙ<སྙ<བརྙ=བཪྙ<བསྙ
And this causes གཉ to be sorted *after* གཉྫ.
(I tested this with ICU 57.1, using it via Python 3.)
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (6 preceding siblings ...)
2018-01-15 10:25 ` maiku.fabian at gmail dot com
@ 2018-01-15 10:50 ` elie.roux@telecom-bretagne.eu
2018-01-15 10:51 ` elie.roux@telecom-bretagne.eu
` (26 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: elie.roux@telecom-bretagne.eu @ 2018-01-15 10:50 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #5 from Elie Roux <elie.roux@telecom-bretagne.eu> ---
Hello Fabian,
Thanks a lot for your thorough review, that's appreciated!
I have to say I don't really understand the second part: why would line 30
cause གཉ to be sorted after གཉྫ? Can you elaborate a little bit?
Anyways, a solution I suppose is to remove line 7 and have line 30 be:
&ཉ<<ྋྙ<གཉ<གཉྫ<མཉ<རྙ=ཪྙ<སྙ<བརྙ=བཪྙ<བསྙ
would that solve the problem? I can elaborate a bit on why lines 4 to 18 are
separated from the rest if you want, but I think it will be helpful to
understand your point of view a bit better first.
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (7 preceding siblings ...)
2018-01-15 10:50 ` elie.roux@telecom-bretagne.eu
@ 2018-01-15 10:51 ` elie.roux@telecom-bretagne.eu
2018-01-15 14:35 ` maiku.fabian at gmail dot com
` (25 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: elie.roux@telecom-bretagne.eu @ 2018-01-15 10:51 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #6 from Elie Roux <elie.roux@telecom-bretagne.eu> ---
Hello Fabian,
Thanks a lot for your thorough review, that's appreciated!
I have to say I don't really understand the second part: why would line 30
cause གཉ to be sorted after གཉྫ? Can you elaborate a little bit?
Anyways, a solution I suppose is to remove line 7 and have line 30 be:
&ཉ<<ྋྙ<གཉ<གཉྫ<མཉ<རྙ=ཪྙ<སྙ<བརྙ=བཪྙ<བསྙ
would that solve the problem? I can elaborate a bit on why lines 4 to 18 are
separated from the rest if you want, but I think it will be helpful to
understand your point of view a bit better first.
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (8 preceding siblings ...)
2018-01-15 10:51 ` elie.roux@telecom-bretagne.eu
@ 2018-01-15 14:35 ` maiku.fabian at gmail dot com
2018-01-15 14:36 ` maiku.fabian at gmail dot com
` (24 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: maiku.fabian at gmail dot com @ 2018-01-15 14:35 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #7 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Elie Roux from comment #6)
> Hello Fabian,
>
> Thanks a lot for your thorough review, that's appreciated!
>
> I have to say I don't really understand the second part: why would line 30
> cause གཉ to be sorted after གཉྫ? Can you elaborate a little bit?
I am not sure why this happens either. But it seems to happen.
I tested like this:
First the input test file to be sorted, made very short to contain only the
strings in question:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ cat localedata/dz_BT.UTF-8.in.mini
གཉ
གཉྫ
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
And I use a very short rule file, first containing only &གཉ<གཉྫ and
the other rule commented out:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ cat rules-mini.txt
&གཉ<གཉྫ
#&ཉ<<ྋྙ<གཉ<མཉ<རྙ=ཪྙ<སྙ<བརྙ=བཪྙ<བསྙ
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
Now I sort using my small test program ~/bin/icu-collation-test.py
(I’ll attach it in the next comment):
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ ~/bin/icu-collation-test.py -r rules-mini.txt -i localedata/dz_BT.UTF-8.in.mini -o /tmp/dz_BT.UTF-8.out
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
And check the result:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ diff -u /local/mfabian/src/glibc/localedata/dz_BT.UTF-8.in.mini /tmp/dz_BT.UTF-8.out
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
No difference between input and output, i.e. གཉ is still before གཉྫ
in dz_BT.UTF-8.out.
Now I remove the comment in front of the second line in rules-mini.txt:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ cat rules-mini.txt
&གཉ<གཉྫ
&ཉ<<ྋྙ<གཉ<མཉ<རྙ=ཪྙ<སྙ<བརྙ=བཪྙ<བསྙ
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
And sort again:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ ~/bin/icu-collation-test.py -r rules-mini.txt -i localedata/dz_BT.UTF-8.in.mini -o /tmp/dz_BT.UTF-8.out
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
Checking the result:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ diff -u /local/mfabian/src/glibc/localedata/dz_BT.UTF-8.in.mini /tmp/dz_BT.UTF-8.out
--- /local/mfabian/src/glibc/localedata/dz_BT.UTF-8.in.mini 2018-01-15 15:21:59.357477414 +0100
+++ /tmp/dz_BT.UTF-8.out 2018-01-15 15:26:12.266632745 +0100
@@ -1,2 +1,2 @@
-གཉ
གཉྫ
+གཉ
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$
Now the order is reversed, གཉ comes after གཉྫ.
The same happened to me while I was implementing the rules for glibc
and testing sorting using glibc. I found this very confusing and thought
I might have done something wrong implementing the rules in the glibc
way. But then I tested with the above small Python3 program using icu
and found that it behaves the same way.
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (9 preceding siblings ...)
2018-01-15 14:35 ` maiku.fabian at gmail dot com
@ 2018-01-15 14:36 ` maiku.fabian at gmail dot com
2018-01-15 14:47 ` elie.roux@telecom-bretagne.eu
` (23 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: maiku.fabian at gmail dot com @ 2018-01-15 14:36 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #8 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Created attachment 10749
--> https://sourceware.org/bugzilla/attachment.cgi?id=10749&action=edit
icu-collation-test.py
A simple tool to test ICU/CLDR style collation rules.
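The attachment itself is not reproduced in this thread. For orientation only, the core of such a tool could look roughly like the sketch below. It assumes the third-party PyICU binding (module icu) is installed; the function names are made up, and only the three file arguments mirror the -r, -i and -o options used above.

```python
def sort_with_rules(lines, rules):
    """Sort lines with an ICU collator built from CLDR-style tailoring rules."""
    import icu  # third-party PyICU binding, assumed to be installed

    collator = icu.RuleBasedCollator(rules)
    # getSortKey() returns a byte string that compares like the collator does
    return sorted(lines, key=collator.getSortKey)


def sort_file(rules_path, in_path, out_path):
    """Read rules and input as UTF-8, then write the sorted lines to out_path."""
    with open(rules_path, encoding="utf-8") as f:
        rules = f.read()
    with open(in_path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    with open(out_path, "w", encoding="utf-8") as f:
        for line in sort_with_rules(lines, rules):
            f.write(line + "\n")
```

For example, sort_file("rules-mini.txt", "localedata/dz_BT.UTF-8.in.mini", "/tmp/dz_BT.UTF-8.out") would correspond to the invocation shown in comment #7.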
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (10 preceding siblings ...)
2018-01-15 14:36 ` maiku.fabian at gmail dot com
@ 2018-01-15 14:47 ` elie.roux@telecom-bretagne.eu
2018-01-15 14:56 ` maiku.fabian at gmail dot com
` (22 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: elie.roux@telecom-bretagne.eu @ 2018-01-15 14:47 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #9 from Elie Roux <elie.roux@telecom-bretagne.eu> ---
I have to say I don't really understand why ICU behaves like that... I think we
should do two things:
- change my rule file so that it contains just one line and fix this oddity
- report a bug on ICU (maybe it's not a bug per se, but I can't see any other
way to solve this mystery)
I'll fix the rule file (possibly today). If you have some time do you think you
could report the ICU bug?
Thank you!
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (11 preceding siblings ...)
2018-01-15 14:47 ` elie.roux@telecom-bretagne.eu
@ 2018-01-15 14:56 ` maiku.fabian at gmail dot com
2018-01-15 15:06 ` maiku.fabian at gmail dot com
` (21 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: maiku.fabian at gmail dot com @ 2018-01-15 14:56 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #10 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Elie Roux from comment #9)
> I have to say I don't really understand why ICU behaves like that... I think
> we should do two things:
>
> - change my rule file so that it contains just one line and fix this oddity
> - report a bug on ICU (maybe it's not a bug per se, but I can't see any
> other way to solve this mystery)
The glibc rules I implemented showed the same behaviour, so I doubt this
is an ICU bug; I think it is a bug in the rules.
> I'll fix the rule file (possibly today). If you have some time do you think
> you could report the ICU bug?
>
> Thank you!
The order of the rules does matter. Look at this:
A somewhat longer test input:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ cat localedata/dz_BT.UTF-8.in.mini
ཉ
ྋྙ
གཉ
གཉྫ
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
Again the same two lines in the rules file:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ cat rules-mini.txt
&གཉ<གཉྫ
&ཉ<<ྋྙ<གཉ<མཉ<རྙ=ཪྙ<སྙ<བརྙ=བཪྙ<བསྙ
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
Testing:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ ~/bin/icu-collation-test.py -r rules-mini.txt -i localedata/dz_BT.UTF-8.in.mini -o /tmp/dz_BT.UTF-8.out
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
Result looks unexpected, probably not what you want:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ diff -u /local/mfabian/src/glibc/localedata/dz_BT.UTF-8.in.mini /tmp/dz_BT.UTF-8.out
--- /local/mfabian/src/glibc/localedata/dz_BT.UTF-8.in.mini 2018-01-15 15:45:48.332377013 +0100
+++ /tmp/dz_BT.UTF-8.out 2018-01-15 15:50:46.357040054 +0100
@@ -1,4 +1,4 @@
+གཉྫ
ཉ
ྋྙ
གཉ
-གཉྫ
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
Now I reverse the order of the two lines in the rules file:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ cat rules-mini.txt
&ཉ<<ྋྙ<གཉ<མཉ<རྙ=ཪྙ<སྙ<བརྙ=བཪྙ<བསྙ
&གཉ<གཉྫ
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
Testing again:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ ~/bin/icu-collation-test.py -r rules-mini.txt -i localedata/dz_BT.UTF-8.in.mini -o /tmp/dz_BT.UTF-8.out
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
No difference, I get the expected order:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ diff -u /local/mfabian/src/glibc/localedata/dz_BT.UTF-8.in.mini /tmp/dz_BT.UTF-8.out
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (12 preceding siblings ...)
2018-01-15 14:56 ` maiku.fabian at gmail dot com
@ 2018-01-15 15:06 ` maiku.fabian at gmail dot com
2018-01-15 15:14 ` elie.roux@telecom-bretagne.eu
` (20 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: maiku.fabian at gmail dot com @ 2018-01-15 15:06 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #11 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Elie Roux from comment #9)
> I have to say I don't really understand why ICU behaves like that... I think
> we should do two things:
>
> - change my rule file so that it contains just one line and fix this oddity
> - report a bug on ICU (maybe it's not a bug per se, but I can't see any
> other way to solve this mystery)
>
> I'll fix the rule file (possibly today). If you have some time do you think
> you could report the ICU bug?
>
> Thank you!
Here are the two lines with rules:
&ཉ<<ྋྙ<གཉ<མཉ<རྙ=ཪྙ<སྙ<བརྙ=བཪྙ<བསྙ
&གཉ<གཉྫ
And here I added code points in [] brackets to understand better what
is going on:
&ག[0F42]ཉ[0F49]<ག[0F42]ཉ[0F49]ྫ[0FAB]
&ཉ[0F49]<<ྋ[0F8B]ྙ[0F99]<ག[0F42]ཉ[0F49]<མ[0F58]ཉ[0F49]<ར[0F62]ྙ[0F99]=ཪ[0F6A]ྙ[0F99]<ས[0F66]ྙ[0F99]<བ[0F56]ར[0F62]ྙ[0F99]=བ[0F56]ཪ[0F6A]ྙ[0F99]<བ[0F56]ས[0F66]ྙ[0F99]
So the first line orders U+0F42 U+0F49 U+0FAB after U+0F42 U+0F49.
But then the second line reorders U+0F42 U+0F49 after U+0F8B U+0F99.
So the reference point after which U+0F42 U+0F49 U+0FAB was placed by the
first line has been moved somewhere else by the second line.
Moving the reference point U+0F42 U+0F49 away does not also move
U+0F42 U+0F49 U+0FAB so that it stays behind the reference point.
I.e. the second line overrides the first.
If the order of the lines is reversed, it works, because the line:
&ཉ[0F49]<<ྋ[0F8B]ྙ[0F99]<ག[0F42]ཉ[0F49]<མ[0F58]ཉ[0F49]<ར[0F62]ྙ[0F99]=ཪ[0F6A]ྙ[0F99]<ས[0F66]ྙ[0F99]<བ[0F56]ར[0F62]ྙ[0F99]=བ[0F56]ཪ[0F6A]ྙ[0F99]<བ[0F56]ས[0F66]ྙ[0F99]
then first reorders U+0F42 U+0F49 somewhere and *after* doing that,
the line:
&ག[0F42]ཉ[0F49]<ག[0F42]ཉ[0F49]ྫ[0FAB]
inserts U+0F42 U+0F49 U+0FAB after the current position of U+0F42 U+0F49.
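Annotated rule lines like the ones above do not have to be written by hand. A small sketch (annotate is a hypothetical helper name) that appends the code point in brackets to every non-ASCII character and leaves the ASCII rule syntax alone:

```python
def annotate(rule):
    """Append [XXXX] (hex code point) after each non-ASCII character."""
    out = []
    for ch in rule:
        out.append(ch)
        if ord(ch) > 0x7F:  # leave &, <, = and other ASCII syntax untouched
            out.append("[%04X]" % ord(ch))
    return "".join(out)


# Line 7 of rules.txt, written with escapes so the code points are explicit
print(annotate("&\u0f42\u0f49<\u0f42\u0f49\u0fab"))
```

This is handy for Tibetan because combining subjoined letters such as U+0FAB are hard to tell apart visually in a terminal.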
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (13 preceding siblings ...)
2018-01-15 15:06 ` maiku.fabian at gmail dot com
@ 2018-01-15 15:14 ` elie.roux@telecom-bretagne.eu
2018-01-15 15:15 ` maiku.fabian at gmail dot com
` (19 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: elie.roux@telecom-bretagne.eu @ 2018-01-15 15:14 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #12 from Elie Roux <elie.roux@telecom-bretagne.eu> ---
Thanks a lot for the explanation! I'm inspecting that, I'll change this rule
and the one on line 12, which is in the same case I think. I need to take more
time to work on this, probably tonight.
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (14 preceding siblings ...)
2018-01-15 15:14 ` elie.roux@telecom-bretagne.eu
@ 2018-01-15 15:15 ` maiku.fabian at gmail dot com
2018-01-15 15:18 ` maiku.fabian at gmail dot com
` (18 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: maiku.fabian at gmail dot com @ 2018-01-15 15:15 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #13 from Mike FABIAN <maiku.fabian at gmail dot com> ---
A simpler example. A small rules file:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ cat rules-mini.txt
&b<z
&d<b
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
And a simple test file:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ cat /tmp/test-latin.in
a
b
c
d
e
f
z
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
Testing:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ ~/bin/icu-collation-test.py -r rules-mini.txt -i /tmp/test-latin.in -o /tmp/test-latin.out
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
Result:
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$ cat /tmp/test-latin.out
a
z
c
d
b
e
f
mfabian@taka:/local/mfabian/src/glibc (locales *$%)
$
The first line in the rules file moves z after b. Then the second line
moves b after d. But that does not keep the z after the b: the z is not
reordered together with the b, it stays where it was originally put by
the first line in the rules file. Moving the reference point b
somewhere else does not move the z.
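The behaviour described above can be imitated with a toy model in which the collation order is a plain list and each rule "&x<y" takes y out of the list and puts it back directly after the current position of x. This is only an illustration of the reference-point idea under that assumption; it is not how ICU or glibc implement tailoring, and apply_rules is a hypothetical name.

```python
def apply_rules(order, rules):
    """Apply (anchor, item) rules in sequence to a list-based toy ordering.

    Each rule moves item to the slot right after anchor's *current*
    position; items placed by earlier rules are not dragged along.
    """
    order = list(order)
    for anchor, item in rules:
        if item in order:
            order.remove(item)
        order.insert(order.index(anchor) + 1, item)
    return order


# Rules in file order: &b<z, then &d<b
print("".join(apply_rules("abcdefz", [("b", "z"), ("d", "b")])))  # azcdbef
# Reversed order: &d<b, then &b<z
print("".join(apply_rules("abcdefz", [("d", "b"), ("b", "z")])))  # acdbzef
```

With the rules in file order the toy model gives the same sequence as /tmp/test-latin.out; with the rules reversed, z follows b to its new position.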
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (15 preceding siblings ...)
2018-01-15 15:15 ` maiku.fabian at gmail dot com
@ 2018-01-15 15:18 ` maiku.fabian at gmail dot com
2018-01-15 15:25 ` maiku.fabian at gmail dot com
` (17 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: maiku.fabian at gmail dot com @ 2018-01-15 15:18 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #14 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Elie Roux from comment #12)
> Thanks a lot for the explanation! I'm inspecting that, I'll change this rule
> and the one on line 12, which is in the same case I think. I need to take more
> time to work on this, probably tonight.
Thank you!
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (16 preceding siblings ...)
2018-01-15 15:18 ` maiku.fabian at gmail dot com
@ 2018-01-15 15:25 ` maiku.fabian at gmail dot com
2018-01-15 19:09 ` elie.roux@telecom-bretagne.eu
` (16 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: maiku.fabian at gmail dot com @ 2018-01-15 15:25 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #15 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Mike FABIAN from comment #14)
> (In reply to Elie Roux from comment #12)
> > Thanks a lot for the explanation! I'm inspecting that, I'll change this rule
> > and the one on line 12, which is in the same case I think. I need to take more
> > time to work on this, probably tonight.
>
> Thank you!
Ah, and maybe add some more stuff to your test file, apparently
your test file did not check the correct order of གཉ and གཉྫ.
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (17 preceding siblings ...)
2018-01-15 15:25 ` maiku.fabian at gmail dot com
@ 2018-01-15 19:09 ` elie.roux@telecom-bretagne.eu
2018-01-15 20:58 ` maiku.fabian at gmail dot com
` (15 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: elie.roux@telecom-bretagne.eu @ 2018-01-15 19:09 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #16 from Elie Roux <elie.roux@telecom-bretagne.eu> ---
I've been able to look at things properly and... there is in fact no bug. The
rules do exactly what they are supposed to do, and the sorted lists are exactly
as in the reference dictionary. I was wrong when I suggested that
&ཉ<<ྋྙ<གཉ<གཉྫ
would be correct. It would seem more natural to me, but the dictionary I took
as a reference does not say that at all... This is a pretty specific case, and
the sorting of transliterated Sanskrit is sort of a grey area, but I think we
can keep that. I'm really sorry for the inconvenience.
Would that be ok?
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (18 preceding siblings ...)
2018-01-15 19:09 ` elie.roux@telecom-bretagne.eu
@ 2018-01-15 20:58 ` maiku.fabian at gmail dot com
2018-01-15 22:03 ` elie.roux@telecom-bretagne.eu
` (14 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: maiku.fabian at gmail dot com @ 2018-01-15 20:58 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #17 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Elie Roux from comment #16)
> I've been able to look at things properly and... there is in fact no bug.
> The rules do exactly what they are supposed to do, and the sorted lists are
> exactly as in the reference dictionary. I was wrong when I suggested that
>
> &ཉ<<ྋྙ<གཉ<གཉྫ
>
> would be correct. It would seem more natural to me, but the dictionary I
> took as a reference does not say that at all... This is a pretty specific
> case, and the sorting of transliterated Sanskrit is sort of a grey area,
> but I think we can keep that. I'm really sorry for the inconvenience.
>
> Would that be ok?
But what is the purpose of line 7:
https://github.com/eroux/tibetan-collation/blob/master/implementations/Unicode/rules.txt#L7
&གཉ<གཉྫ
if it is overridden later?
Should གཉ be sorted before གཉྫ or not?
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (19 preceding siblings ...)
2018-01-15 20:58 ` maiku.fabian at gmail dot com
@ 2018-01-15 22:03 ` elie.roux@telecom-bretagne.eu
2018-01-16 7:38 ` maiku.fabian at gmail dot com
` (13 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: elie.roux@telecom-bretagne.eu @ 2018-01-15 22:03 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #18 from Elie Roux <elie.roux@telecom-bretagne.eu> ---
well, things are supposed to be sorted just like in the sorted list attached to
this bug report.
Now, I agree there is some magic going on here, and it's not totally obvious to
me how this works, but it works.
Even though it's not clear what line 7 does, it clearly does something, because
if you remove it, the tests in the tibetan-collation GitHub repo fail with:
expected [གངས་ལྷགས།, གཉྫིར།, གད།]
got [གངས་ལྷགས།, གད།, གཉྫིར།]
The test corresponds to page 347 of the tshig mdzod chen mo:
https://www.tbrc.org/browser/ImageService?work=W29329&igroup=I1KG15042&image=379&first=1&last=1058&fetchimg=yes
So line 7 has a purpose and doesn't get completely overwritten, although I
agree the magic that takes place is a bit above my head... I suppose that
somehow it indicates that གཉྫ should be sorted after the initial value of གཉ,
and this gets recorded somehow, even though གཉ then takes another value
afterwards.
I guess it may become less confusing with a bit of an understanding of Tibetan:
in Tibetan གཉ absolutely never exists on its own, as it would be main letter ག
then suffix ཉ and this can simply never happen (ཉ cannot be a suffix). What
may happen are two cases starting with གཉ:
1. གཉྫིར is transliterated Sanskrit, and sort of exceptionally (and quite
erratically) behaves as if ཉ was a suffix, and is thus sorted with the main
letter ག, and that's what line 7 is trying to sort. What I believe happens is
that at the time line 7 is parsed, གཉ is still sorted with the main letter ག,
as it would be in the root collation. So this sorts གཉྫ with the main letter ག.
Note that if you put the rule at the end of the file, the result is not the
same, so I think it's more or less what's happening...
2. གཉར is prefix ག, then main letter ཉ then suffix ར, which is sorted in a
totally different way, with the main letter ཉ, as stated by the rule of line
30, far after main letter ག. So this sorts གཉ with the main letter ཉ.
That's my understanding of the situation, and I still think the rules are
correct... I'm not sure I have made things clearer, if you want more details
don't hesitate to ask!
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (20 preceding siblings ...)
2018-01-15 22:03 ` elie.roux@telecom-bretagne.eu
@ 2018-01-16 7:38 ` maiku.fabian at gmail dot com
2018-01-16 8:11 ` elie.roux@telecom-bretagne.eu
` (12 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: maiku.fabian at gmail dot com @ 2018-01-16 7:38 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #19 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Elie Roux from comment #18)
> well, things are supposed to be sorted just like in the sorted list attached
> to this bug report.
>
> Now, I agree there is some magic going on here, and it's not totally obvious
> to me how this works, but it works.
>
> Even though it's not clear what line 7 does, it clearly does something,
> because if you remove it, the tests in the tibetan-collation GitHub repo
> fail with:
Yes, it does something. You see that also in my simple Latin example
in comment#13:
https://sourceware.org/bugzilla/show_bug.cgi?id=21547#c13
In
&b<z
&d<b
The rule &b<z does something, although b is reordered again by the next rule.
But z is not reordered and stays where it was reordered by the first rule.
So one could have written
&a<z
&d<b
to achieve the same effect to get the order
a
z
c
d
b
e
f
and those rules would have been much clearer and easier to understand.
I have adapted the collation rules for many locales during the last weeks
because I am updating to a new iso14651_t1_common file. And none of the
locales had a collation rule which reordered some character after an
anchor character and then reordered that anchor. It is not that this cannot
work, it just seems very weird.
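The effect described above can be reproduced with a small toy model. This is
only a sketch: it assumes each rule "&anchor<target" simply moves the target to
the position right after the anchor, and that the starting alphabet is the
seven letters a b c d e f z. The function apply_rules and the starting
alphabet are made up for illustration; this is not how glibc or ICU actually
process tailorings.

```python
def apply_rules(order, rules):
    """Apply simplified '&anchor<target' rules one after another.

    Each rule moves `target` to the slot immediately after `anchor`.
    Toy model only; real collation tailoring works on weights.
    """
    result = list(order)
    for anchor, target in rules:
        if target in result:
            result.remove(target)
        result.insert(result.index(anchor) + 1, target)
    return result


base = list("abcdefz")  # assumed starting order, for illustration

# &b<z then &d<b: b is moved again, but z stays where the first rule put it.
chained = apply_rules(base, [("b", "z"), ("d", "b")])

# &a<z then &d<b: the clearer way to write the same thing.
direct = apply_rules(base, [("a", "z"), ("d", "b")])

# Same two rules as `chained`, but in the opposite order.
swapped = apply_rules(base, [("d", "b"), ("b", "z")])

print("".join(chained))  # azcdbef
print("".join(direct))   # azcdbef
print("".join(swapped))  # acdbzef
```

In this model the first two rule sets give the same order, while swapping the
two rules changes the result. That is consistent with the observation in
comment #18 that moving line 7 to the end of the file changes the outcome.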
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (21 preceding siblings ...)
2018-01-16 7:38 ` maiku.fabian at gmail dot com
@ 2018-01-16 8:11 ` elie.roux@telecom-bretagne.eu
2018-01-22 15:17 ` maiku.fabian at gmail dot com
` (11 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: elie.roux@telecom-bretagne.eu @ 2018-01-16 8:11 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #20 from Elie Roux <elie.roux@telecom-bretagne.eu> ---
I agree it's counter-intuitive. I think the reason why the rule is there is
because of some Tibetan way of expressing things, let me elaborate:
I could replace
&གཉ<གཉྫ
with
&གངས<གཉྫ
this would keep all tests passing, but the reason I don't really want to do
that is the following:
གངས may not be the very last element before གཉྫ: although it would be quite
exceptional, some uncommon suffix combination (in Dzongkha for instance) could
sort between གངས and གཉྫ. Basically nothing, however, can come between the
initial version of གཉ and གཉྫ, so the rule as it stands is quite safe. Does
that make more sense? If these rules are the only ones to use this pattern and
you think it's confusing, I'll remove it though; I suppose
&གངས<གཉྫ
is a good enough replacement.
What do you think?
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (22 preceding siblings ...)
2018-01-16 8:11 ` elie.roux@telecom-bretagne.eu
@ 2018-01-22 15:17 ` maiku.fabian at gmail dot com
2018-01-22 15:18 ` maiku.fabian at gmail dot com
` (10 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: maiku.fabian at gmail dot com @ 2018-01-22 15:17 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #21 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Created attachment 10755
--> https://sourceware.org/bugzilla/attachment.cgi?id=10755&action=edit
glibc/locales/iso14651_t1_common
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (23 preceding siblings ...)
2018-01-22 15:17 ` maiku.fabian at gmail dot com
@ 2018-01-22 15:18 ` maiku.fabian at gmail dot com
2018-01-22 15:22 ` maiku.fabian at gmail dot com
` (9 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: maiku.fabian at gmail dot com @ 2018-01-22 15:18 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #22 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Created attachment 10756
--> https://sourceware.org/bugzilla/attachment.cgi?id=10756&action=edit
glib/localedata/locales/dz_BT
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (24 preceding siblings ...)
2018-01-22 15:18 ` maiku.fabian at gmail dot com
@ 2018-01-22 15:22 ` maiku.fabian at gmail dot com
2018-01-22 21:45 ` elie.roux@telecom-bretagne.eu
` (8 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: maiku.fabian at gmail dot com @ 2018-01-22 15:22 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #23 from Mike FABIAN <maiku.fabian at gmail dot com> ---
I was finally able to translate your collation rules to the glibc LC_COLLATE
syntax and make them work, i.e. sort your test file attached to comment#2
correctly using glibc.
The updated dz_BT locale file is attached. And one also needs the update for
the iso14651_t1_common file which I have attached as well.
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (25 preceding siblings ...)
2018-01-22 15:22 ` maiku.fabian at gmail dot com
@ 2018-01-22 21:45 ` elie.roux@telecom-bretagne.eu
2018-01-23 17:32 ` maiku.fabian at gmail dot com
` (7 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: elie.roux@telecom-bretagne.eu @ 2018-01-22 21:45 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #24 from Elie Roux <elie.roux@telecom-bretagne.eu> ---
That's really great, thank you so much! Is it possible to have the exact same
collation rules for the bo locale(s)? bo, bo_IN and bo_CN
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (26 preceding siblings ...)
2018-01-22 21:45 ` elie.roux@telecom-bretagne.eu
@ 2018-01-23 17:32 ` maiku.fabian at gmail dot com
2018-01-23 21:57 ` elie.roux@telecom-bretagne.eu
` (6 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: maiku.fabian at gmail dot com @ 2018-01-23 17:32 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #25 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Elie Roux from comment #24)
> That's really great, thank you so much! Is it possible to have the exact
> same collation rules for the bo locale(s)? bo, bo_IN and bo_CN
Yes, I did this:
$ grep -A3 ^LC_COLLATE dz_* bo_*
dz_BT:LC_COLLATE
dz_BT-% Using the rules.txt attached to:
dz_BT-% http://unicode.org/cldr/trac/ticket/9895
dz_BT-% See also: https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--
bo_CN:LC_COLLATE
bo_CN-copy "dz_BT"
bo_CN-END LC_COLLATE
bo_CN-
--
bo_IN:LC_COLLATE
bo_IN-copy "bo_CN"
bo_IN-END LC_COLLATE
bo_IN-
So dz_BT implements the rules and bo_CN and bo_IN just copy them.
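One way to check that the copied locales really sort the same way is to sort
the test words from the original report under both locales and compare. The
following is a minimal sketch, assuming the locales are installed under the
names dz_BT.UTF-8 and bo_CN.UTF-8 (the exact names depend on the system). The
helper sort_in_locale is made up for illustration and returns None when a
locale is not available.

```python
import locale


def sort_in_locale(words, locale_name):
    """Sort `words` using LC_COLLATE of `locale_name`.

    Returns None if the locale is not installed on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        # strxfrm turns each string into a key that compares like strcoll.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


# Test words from the original bug report.
words = ["ལྔ", "ང", "ཅ", "རྔ", "སྔ", "བརྔ", "བསྔ"]

dz = sort_in_locale(words, "dz_BT.UTF-8")  # assumed locale name
bo = sort_in_locale(words, "bo_CN.UTF-8")  # assumed locale name

if dz is None or bo is None:
    print("locale not installed; nothing to compare")
else:
    print("same order" if dz == bo else "orders differ")
```

The result only reflects the fix on a system whose glibc contains the updated
locale data.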
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (27 preceding siblings ...)
2018-01-23 17:32 ` maiku.fabian at gmail dot com
@ 2018-01-23 21:57 ` elie.roux@telecom-bretagne.eu
2018-01-24 0:04 ` maiku.fabian at gmail dot com
` (5 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: elie.roux@telecom-bretagne.eu @ 2018-01-23 21:57 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #26 from Elie Roux <elie.roux@telecom-bretagne.eu> ---
That's perfect, thank you so much, really! Do you have any idea of when this
will be publicly released? (the glibc / i18n release cycle is totally unknown
to me)
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (28 preceding siblings ...)
2018-01-23 21:57 ` elie.roux@telecom-bretagne.eu
@ 2018-01-24 0:04 ` maiku.fabian at gmail dot com
2018-01-24 8:49 ` elie.roux@telecom-bretagne.eu
` (4 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: maiku.fabian at gmail dot com @ 2018-01-24 0:04 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #27 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Elie Roux from comment #26)
> That's perfect, thank you so much, really! Do you have any idea of when this
> will be publicly released? (the glibc / i18n release cycle is totally
> unknown to me)
It seems like it has been postponed to glibc 2.28:
https://sourceware.org/glibc/wiki/Release/2.27#Desirable_this_release.3F
where it says:
Bug 14095 - Review / update collation data from Unicode / ISO 14651
Work on this is almost finished, patch will be sent to the mailing list in
a few days.
Postponed to 2.28
(Bug 14095 is the big collation update which includes the fix for
this bug here).
Here is the glibc release timeline:
https://sourceware.org/glibc/wiki/Glibc%20Timeline
So glibc 2.28 is scheduled for 2018-08-01.
Sorry that I could not make it fast enough for glibc 2.27.
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (29 preceding siblings ...)
2018-01-24 0:04 ` maiku.fabian at gmail dot com
@ 2018-01-24 8:49 ` elie.roux@telecom-bretagne.eu
2018-02-27 16:55 ` cvs-commit at gcc dot gnu.org
` (3 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: elie.roux@telecom-bretagne.eu @ 2018-01-24 8:49 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #28 from Elie Roux <elie.roux@telecom-bretagne.eu> ---
That's perfect, no problem! I was just being curious
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
` (30 preceding siblings ...)
2018-01-24 8:49 ` elie.roux@telecom-bretagne.eu
@ 2018-02-27 16:55 ` cvs-commit at gcc dot gnu.org
2018-03-01 14:40 ` maiku.fabian at gmail dot com
` (2 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2018-02-27 16:55 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #29 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".
The branch, master has been updated
via 874c56d7979858bbb1bb1604c55769ad0ce7a072 (commit)
via 159738548130d5ac4fe6178977e940ed5f8cfdc4 (commit)
via ce6636b06b67d6bb9b3d6927bf2a926b9b7478f5 (commit)
via ac3a3b4b0d561d776b60317d6a926050c8541655 (commit)
via 770cbe147cf33580e05ba6de78993c3070c5c2f8 (commit)
via 0fc355d9a7b3cc9d5e4190ce929e1eb4459ef0ea (commit)
via 43f3893f4b5679cb9eb93300b18f7febd17e5239 (commit)
via df74ef786f9c87ce5404df3b68a91cb9d2c4c26f (commit)
via d5adfbadd47e6836a7ddae54fba9f88e2b3354db (commit)
via 5f5a96109187b4bb4a10b62139ab1c7fe45f7c1d (commit)
via 8a97e9002ffa807b49e1222e5a9d51ce7896f209 (commit)
via bbdd2fba7d36d8f03c919b34f95238d8cf248b47 (commit)
via 1569e551aff088ed48e2694b07045256f3582271 (commit)
via 9479b6d5e08eacce06c6ab60abc9b2f4eb8b71e4 (commit)
from 93d260ddda87a124d3fbb9af400fa154cfd00b4b (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=874c56d7979858bbb1bb1604c55769ad0ce7a072
commit 874c56d7979858bbb1bb1604c55769ad0ce7a072
Author: Mike FABIAN <mfabian@redhat.com>
Date: Thu Dec 21 18:56:52 2017 +0100
Remove the lines from cmn_TW.UTF-8.in which cannot work at the moment.
See this bug https://sourceware.org/bugzilla/show_bug.cgi?id=22898
These lines don’t yet work because of a glibc bug, not because of
problems in the locale data. No matter what sorting rules one uses,
these characters cannot be sorted at all at the moment.
As soon as that bug is fixed, these lines should be added back to the
test file.
* localedata/cmn_TW.UTF-8.in: Remove the lines which cannot
be sorted correctly at the moment because of a bug.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=159738548130d5ac4fe6178977e940ed5f8cfdc4
commit 159738548130d5ac4fe6178977e940ed5f8cfdc4
Author: Mike FABIAN <mfabian@redhat.com>
Date: Mon Dec 11 18:26:22 2017 +0100
Adapt collation in several locales to the new iso14651_t1_common file
[BZ #22550] - es_ES locale (and other es_* locales): collation should
treat ñ as a primary different character, sync the collation
for Spanish with CLDR
[BZ #21547] - Tibetan script collation broken (Dzongkha and Tibetan)
* localedata/Makefile: Add new test files.
* localedata/lv_LV.UTF-8.in: Adapt test file to new collation order.
* localedata/sv_SE.ISO-8859-1.in: Adapt test file to new collation
order.
* localedata/uk_UA.UTF-8.in: Adapt test file to new collation order.
* localedata/am_ET.UTF-8.in: New test file.
* localedata/az_AZ.UTF-8.in: Likewise.
* localedata/be_BY.UTF-8.in: Likewise.
* localedata/ber_DZ.UTF-8.in: Likewise.
* localedata/ber_MA.UTF-8.in: Likewise.
* localedata/bg_BG.UTF-8.in: Likewise.
* localedata/br_FR.UTF-8.in: Likewise.
* localedata/cmn_TW.UTF-8.in: Likewise.
* localedata/crh_UA.UTF-8.in: Likewise.
* localedata/csb_PL.UTF-8.in: Likewise.
* localedata/cv_RU.UTF-8.in: Likewise.
* localedata/cy_GB.UTF-8.in: Likewise.
* localedata/dz_BT.UTF-8.in: Likewise.
* localedata/eo.UTF-8.in: Likewise.
* localedata/es_ES.UTF-8.in: Likewise.
* localedata/fa_IR.UTF-8.in: Likewise.
* localedata/fi_FI.UTF-8.in: Likewise.
* localedata/fil_PH.UTF-8.in: Likewise.
* localedata/fur_IT.UTF-8.in: Likewise.
* localedata/gez_ER.UTF-8@abegede.in: Likewise.
* localedata/ha_NG.UTF-8.in: Likewise.
* localedata/ig_NG.UTF-8.in: Likewise.
* localedata/ik_CA.UTF-8.in: Likewise.
* localedata/kk_KZ.UTF-8.in: Likewise.
* localedata/ku_TR.UTF-8.in: Likewise.
* localedata/ky_KG.UTF-8.in: Likewise.
* localedata/ln_CD.UTF-8.in: Likewise.
* localedata/mi_NZ.UTF-8.in: Likewise.
* localedata/ml_IN.UTF-8.in: Likewise.
* localedata/mn_MN.UTF-8.in: Likewise.
* localedata/mr_IN.UTF-8.in: Likewise.
* localedata/mt_MT.UTF-8.in: Likewise.
* localedata/nb_NO.UTF-8.in: Likewise.
* localedata/om_KE.UTF-8.in: Likewise.
* localedata/os_RU.UTF-8.in: Likewise.
* localedata/ps_AF.UTF-8.in: Likewise.
* localedata/ro_RO.UTF-8.in: Likewise.
* localedata/ru_RU.UTF-8.in: Likewise.
* localedata/sc_IT.UTF-8.in: Likewise.
* localedata/se_NO.UTF-8.in: Likewise.
* localedata/sq_AL.UTF-8.in: Likewise.
* localedata/sv_SE.UTF-8.in: Likewise.
* localedata/szl_PL.UTF-8.in: Likewise.
* localedata/tg_TJ.UTF-8.in: Likewise.
* localedata/tk_TM.UTF-8.in: Likewise.
* localedata/tt_RU.UTF-8.in: Likewise.
* localedata/tt_RU.UTF-8@iqtelif.in: Likewise.
* localedata/ug_CN.UTF-8.in: Likewise.
* localedata/uz_UZ.UTF-8.in: Likewise.
* localedata/vi_VN.UTF-8.in: Likewise.
* localedata/yi_US.UTF-8.in: Likewise.
* localedata/yo_NG.UTF-8.in: Likewise.
* localedata/zh_CN.UTF-8.in: Likewise.
* localedata/locales/am_ET: Adapt collation rules to new
iso14651_t1_common file and fix bugs in the collation.
* localedata/locales/az_AZ: Likewise.
* localedata/locales/be_BY: Likewise.
* localedata/locales/ber_DZ: Likewise.
* localedata/locales/ber_MA: Likewise.
* localedata/locales/bg_BG: Likewise.
* localedata/locales/br_FR: Likewise.
* localedata/locales/br_FR@euro: Likewise.
* localedata/locales/ca_ES: Likewise.
* localedata/locales/cns11643_stroke: Likewise.
* localedata/locales/crh_UA: Likewise.
* localedata/locales/cs_CZ: Likewise.
* localedata/locales/csb_PL: Likewise.
* localedata/locales/cv_RU: Likewise.
* localedata/locales/cy_GB: Likewise.
* localedata/locales/da_DK: Likewise.
* localedata/locales/dz_BT: Likewise.
* localedata/locales/en_CA: Likewise.
* localedata/locales/eo: Likewise.
* localedata/locales/es_CU: Likewise.
* localedata/locales/es_EC: Likewise.
* localedata/locales/es_ES: Likewise.
* localedata/locales/es_US: Likewise.
* localedata/locales/et_EE: Likewise.
* localedata/locales/fa_IR: Likewise.
* localedata/locales/fi_FI: Likewise.
* localedata/locales/fil_PH: Likewise.
* localedata/locales/fur_IT: Likewise.
* localedata/locales/gez_ER@abegede: Likewise.
* localedata/locales/ha_NG: Likewise.
* localedata/locales/hr_HR: Likewise.
* localedata/locales/hsb_DE: Likewise.
* localedata/locales/hu_HU: Likewise.
* localedata/locales/ig_NG: Likewise.
* localedata/locales/ik_CA: Likewise.
* localedata/locales/is_IS: Likewise.
* localedata/locales/iso14651_t1_pinyin: Likewise.
* localedata/locales/kk_KZ: Likewise.
* localedata/locales/ku_TR: Likewise.
* localedata/locales/ky_KG: Likewise.
* localedata/locales/ln_CD: Likewise.
* localedata/locales/lt_LT: Likewise.
* localedata/locales/lv_LV: Likewise.
* localedata/locales/mi_NZ: Likewise.
* localedata/locales/ml_IN: Likewise.
* localedata/locales/mn_MN: Likewise.
* localedata/locales/mr_IN: Likewise.
* localedata/locales/mt_MT: Likewise.
* localedata/locales/nb_NO: Likewise.
* localedata/locales/om_KE: Likewise.
* localedata/locales/os_RU: Likewise.
* localedata/locales/pl_PL: Likewise.
* localedata/locales/ps_AF: Likewise.
* localedata/locales/ro_RO: Likewise.
* localedata/locales/ru_RU: Likewise.
* localedata/locales/ru_UA: Likewise.
* localedata/locales/sc_IT: Likewise.
* localedata/locales/se_NO: Likewise.
* localedata/locales/si_LK: Likewise.
* localedata/locales/sq_AL: Likewise.
* localedata/locales/sv_FI: Likewise.
* localedata/locales/sv_FI@euro: Likewise.
* localedata/locales/sv_SE: Likewise.
* localedata/locales/szl_PL: Likewise.
* localedata/locales/tg_TJ: Likewise.
* localedata/locales/ti_ER: Likewise.
* localedata/locales/tk_TM: Likewise.
* localedata/locales/tl_PH: Likewise.
* localedata/locales/tr_TR: Likewise.
* localedata/locales/tt_RU: Likewise.
* localedata/locales/tt_RU@iqtelif: Likewise.
* localedata/locales/ug_CN: Likewise.
* localedata/locales/uk_UA: Likewise.
* localedata/locales/uz_UZ: Likewise.
* localedata/locales/uz_UZ@cyrillic: Likewise.
* localedata/locales/vi_VN: Likewise.
* localedata/locales/yi_US: Likewise.
* localedata/locales/yo_NG: Likewise.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ce6636b06b67d6bb9b3d6927bf2a926b9b7478f5
commit ce6636b06b67d6bb9b3d6927bf2a926b9b7478f5
Author: Mike FABIAN <mfabian@redhat.com>
Date: Mon Jan 1 15:33:50 2018 +0100
Improve gen-locales.mk and gen-locale.sh to make test files with @ options
work
Without this, adding collation test files like
localedata/gez_ER.UTF-8@abegede.in
does not work for locales which contain @ modifiers.
* gen-locales.mk: Make test files which contain @ modifiers in their
name work.
* localedata/gen-locale.sh: Likewise.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ac3a3b4b0d561d776b60317d6a926050c8541655
commit ac3a3b4b0d561d776b60317d6a926050c8541655
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 23 17:29:36 2018 +0100
Fix test cases tst-fnmatch and tst-regexloc for the new iso14651_t1_common
file.
See:
http://pubs.opengroup.org/onlinepubs/7908799/xbd/re.html
> A range expression represents the set of collating elements that fall
> between two elements in the current collation sequence,
> inclusively. It is expressed as the starting point and the ending
> point separated by a hyphen (-).
>
> Range expressions must not be used in portable applications because
> their behaviour is dependent on the collating sequence. Ranges will be
> treated according to the current collating sequence, and include such
> characters that fall within the range based on that collating
> sequence, regardless of character values. This, however, means that
> the interpretation will differ depending on collating sequence. If,
> for instance, one collating sequence defines ä as a variant of a,
> while another defines it as a letter following z, then the expression
> [ä-z] is valid in the first language and invalid in the second.
Therefore, using [a-z] does not make much sense except in the C/POSIX
locale.
The new iso14651_t1_common lists upper case and lower case Latin
characters in a different order than the old one which causes
surprising results for example in the de_DE locale: [a-z] now includes
A because A comes after a in iso14651_t1_common but does not include Z
because that comes after z in iso14651_t1_common.
* posix/tst-fnmatch.input: Fix results for range expressions
for non C locales.
* posix/tst-regexloc.c: Do not use a range expression for
de_DE.ISO-8859-1 locale.
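The range behaviour described in this commit message can be illustrated with a
toy model. This is only a sketch: it assumes a range [start-end] matches every
character whose position in the collation sequence lies between the two end
points, and it uses a made-up interleaved sequence aAbB...zZ as a stand-in for
the new ordering. The real iso14651_t1_common table is far larger and is not
reproduced here.

```python
import string


def in_range(ch, start, end, sequence):
    """True if `ch` lies between `start` and `end` in `sequence`, inclusive."""
    pos = sequence.index
    return pos(start) <= pos(ch) <= pos(end)


# Code point order, as in the C/POSIX locale: A..Z then a..z.
codepoint_order = string.ascii_uppercase + string.ascii_lowercase

# Hypothetical stand-in for the new order: each lower case letter is
# immediately followed by its upper case form (aAbB...zZ).
interleaved = "".join(
    lower + upper
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
)

for name, seq in (("codepoint", codepoint_order), ("interleaved", interleaved)):
    print(name,
          "A in [a-z]:", in_range("A", "a", "z", seq),
          "Z in [a-z]:", in_range("Z", "a", "z", seq))
```

With the interleaved sequence, A falls inside [a-z] and Z falls outside, which
matches the de_DE behaviour the commit message describes.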
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=770cbe147cf33580e05ba6de78993c3070c5c2f8
commit 770cbe147cf33580e05ba6de78993c3070c5c2f8
Author: Mike FABIAN <mfabian@redhat.com>
Date: Fri Dec 15 07:19:45 2017 +0100
Fix posix/bug-regex5.c test case, adapt to iso14651_t1_common update
This test case tests how many collating elements are defined in
da_DK.ISO-8859-1 locale. The da_DK locale source defines 4:
collating-element <A-A> from "<U0041><U0041>"
collating-element <A-a> from "<U0041><U0061>"
collating-element <a-A> from "<U0061><U0041>"
collating-element <a-a> from "<U0061><U0061>"
The new iso14651_t1_common file defines more collating elements, two
of them are in the ISO-8859-1 range:
collating-element <U004C_00B7> from "<U004C><U00B7>" % decomposition of
LATIN CAPITAL LETTER L WITH MIDDLE DOT
collating-element <U006C_00B7> from "<U006C><U00B7>" % decomposition of
LATIN SMALL LETTER L WITH MIDDLE DOT
So the total count is now 6 instead of 4.
* posix/bug-regex5.c: Fix test case because with the new
iso14651_t1_common file, the da_DK locale now has 6 collating
elements in the ISO-8859-1 range instead of 4 with the old
iso14651_t1_common file.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0fc355d9a7b3cc9d5e4190ce929e1eb4459ef0ea
commit 0fc355d9a7b3cc9d5e4190ce929e1eb4459ef0ea
Author: Mike FABIAN <mfabian@redhat.com>
Date: Wed Dec 13 14:39:54 2017 +0100
Collation order of @-. and space has changed in new iso14651_t1_common
file, adapt test files
* localedata/da_DK.ISO-8859-1.in: In the new iso14651_t1_common file
downloaded from ISO, the collation order of @-. and space has
changed.
Therefore, this test file needed to be adapted.
* localedata/fr_CA.UTF-8.in: Likewise.
* localedata/fr_FR.UTF-8.in: Likewise.
* localedata/uk_UA.UTF-8.in: Likewise.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=43f3893f4b5679cb9eb93300b18f7febd17e5239
commit 43f3893f4b5679cb9eb93300b18f7febd17e5239
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Dec 12 14:39:34 2017 +0100
Collation order of ȥ has changed in new iso14651_t1_common file, adapt test
files
* localedata/cs_CZ.UTF-8.in: adapt this test file to the collation
order of ȥ in the new iso14651_t1_common file.
* localedata/pl_PL.UTF-8.in: Likewise.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=df74ef786f9c87ce5404df3b68a91cb9d2c4c26f
commit df74ef786f9c87ce5404df3b68a91cb9d2c4c26f
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 15:45:05 2018 +0100
Add sections for various scripts to the iso14651_t1_common file
* localedata/locales/iso14651_t1_common: Add sections for various
scripts to the iso14651_t1_common file.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=d5adfbadd47e6836a7ddae54fba9f88e2b3354db
commit d5adfbadd47e6836a7ddae54fba9f88e2b3354db
Author: Mike FABIAN <mfabian@redhat.com>
Date: Wed Jan 31 06:18:47 2018 +0100
iso14651_t1_common: make the fourth level the codepoint for characters
which are ignorable on all 4 levels
Entries for characters which have “IGNORE” on all 4 levels like:
<U0001> IGNORE;IGNORE;IGNORE;IGNORE % START OF HEADING (in ISO 6429)
are changed into:
<U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING (in ISO 6429)
i.e. putting the code point of the character into the fourth level
instead of “IGNORE”. Without that change, all such characters
would compare equal which would make a wcscoll test case fail.
It is better to have a clearly defined sort order even for characters
like this so it is good to use the code point as a tie-break.
* localedata/locales/iso14651_t1_common: Use the code point of a
character in the fourth collation level instead of IGNORE for all
entries which have IGNORE on all 4 levels.
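The reason for the tie-break can be shown with a small model of four-level
sort keys. This is a sketch only: it assumes a key is a tuple of four levels
compared from left to right and represents IGNORE as an empty tuple. The
helpers key_old and key_new are made up for illustration and are not the
internal representation used by glibc.

```python
IGNORE = ()  # stand-in for a level that contributes no weight


def key_old(ch):
    """Old entry style: IGNORE on all four levels."""
    return (IGNORE, IGNORE, IGNORE, IGNORE)


def key_new(ch):
    """New entry style: the code point is used as the fourth level."""
    return (IGNORE, IGNORE, IGNORE, (ord(ch),))


soh, stx = "\u0001", "\u0002"  # START OF HEADING, START OF TEXT

# With the old entries both characters get the same key and compare equal.
print(key_old(soh) == key_old(stx))

# With the new entries the fourth level breaks the tie by code point.
print(key_new(soh) < key_new(stx))
print(sorted([stx, soh], key=key_new) == [soh, stx])
```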
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5f5a96109187b4bb4a10b62139ab1c7fe45f7c1d
commit 5f5a96109187b4bb4a10b62139ab1c7fe45f7c1d
Author: Mike FABIAN <mfabian@redhat.com>
Date: Mon Dec 11 20:00:24 2017 +0100
Add convenience symbols like <AFTER-A>, <BEFORE-A> to iso14651_t1_common
* localedata/locales/iso14651_t1_common: Add some convenient collation
symbols like <AFTER-A>, <BEFORE-A> to make tailoring easier using
rules similar to those in CLDR.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=8a97e9002ffa807b49e1222e5a9d51ce7896f209
commit 8a97e9002ffa807b49e1222e5a9d51ce7896f209
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 18:24:47 2018 +0100
Fixing syntax errors after updating the iso14651_t1_common file
* localedata/locales/iso14651_t1_common: The new version of this
file downloaded from ISO contained several syntax errors which
are fixed by this patch.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=bbdd2fba7d36d8f03c919b34f95238d8cf248b47
commit bbdd2fba7d36d8f03c919b34f95238d8cf248b47
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 18:07:39 2018 +0100
iso14651_t1_common: <U\([0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F]\)> →
<U000\1>
* localedata/locales/iso14651_t1_common: replace all <U.....>
with <U000.....> because glibc understands only 4 digit or 8 digit
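The substitution in the commit title is written in sed-style syntax. An
equivalent can be sketched in Python with the re module. The sample line below
is made up for illustration and is not taken from the real file, and the
helper name pad_symbols is likewise hypothetical.

```python
import re

# Matches a symbol with exactly five hexadecimal digits, e.g. <U1D400>.
FIVE_DIGIT = re.compile(r"<U([0-9A-F]{5})>")


def pad_symbols(text):
    """Rewrite five-digit <Uxxxxx> symbols as eight-digit <U000xxxxx>."""
    return FIVE_DIGIT.sub(r"<U000\1>", text)


sample = "<U1D400> <U0041> <U0001D401>"  # hypothetical input
print(pad_symbols(sample))  # <U0001D400> <U0041> <U0001D401>
```

Four-digit and eight-digit symbols are left untouched because the pattern only
matches when exactly five hexadecimal digits sit between "<U" and ">".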
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1569e551aff088ed48e2694b07045256f3582271
commit 1569e551aff088ed48e2694b07045256f3582271
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 18:04:31 2018 +0100
Necessary changes after updating the iso14651_t1_common file
* localedata/locales/iso14651_t1_common: Necessary changes
to make the file downloaded from ISO usable by glibc.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9479b6d5e08eacce06c6ab60abc9b2f4eb8b71e4
commit 9479b6d5e08eacce06c6ab60abc9b2f4eb8b71e4
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 17:59:00 2018 +0100
Update iso14651_t1_common file to ISO14651_2016_TABLE1_en.txt [BZ #14095]
[BZ #14095] - Review / update collation data from Unicode / ISO 14651
File downloaded from:
http://standards.iso.org/iso-iec/14651/ed-4/ISO14651_2016_TABLE1_en.txt
Updating this file alone is not enough, there are problems in the new
file which need to be fixed and the collation rules for many locales
need to be adapted. This is done by the following patches.
This update also fixes the problem that many characters are treated as
identical when sorting because they were not yet in the old
iso14651_t1_common file, see:
https://bugzilla.redhat.com/show_bug.cgi?id=1336308
- Infinite (∞) and empty set (∅) are treated as if they were the same
character by sort and uniq
[BZ #14095]
* localedata/locales/iso14651_t1_common: Update file to
latest version from ISO (ISO14651_2016_TABLE1_en.txt).
-----------------------------------------------------------------------
Summary of changes:
ChangeLog | 224 +
gen-locales.mk | 4 +-
localedata/Makefile | 185 +-
localedata/am_ET.UTF-8.in | 347 +
localedata/az_AZ.UTF-8.in | 73 +
localedata/be_BY.UTF-8.in | 16 +
localedata/ber_DZ.UTF-8.in | 50 +
localedata/ber_MA.UTF-8.in | 13 +
localedata/bg_BG.UTF-8.in | 57 +
localedata/br_FR.UTF-8.in | 15 +
localedata/cmn_TW.UTF-8.in |75649 ++++++++++++++++++++++++++
localedata/crh_UA.UTF-8.in | 50 +
localedata/cs_CZ.UTF-8.in | 4 +-
localedata/csb_PL.UTF-8.in | 70 +
localedata/cv_RU.UTF-8.in | 45 +
localedata/cy_GB.UTF-8.in | 72 +
localedata/da_DK.ISO-8859-1.in | 4 +-
localedata/dz_BT.UTF-8.in | 789 +
localedata/eo.UTF-8.in | 32 +
localedata/es_ES.UTF-8.in | 46 +
localedata/fa_IR.UTF-8.in | 71 +
localedata/fi_FI.UTF-8.in | 140 +
localedata/fil_PH.UTF-8.in | 16 +
localedata/fr_CA.UTF-8.in | 9 +-
localedata/fr_FR.UTF-8.in | 9 +-
localedata/fur_IT.UTF-8.in | 12 +
localedata/gen-locale.sh | 5 +-
localedata/gez_ER.UTF-8@abegede.in | 365 +
localedata/ha_NG.UTF-8.in | 47 +
localedata/ig_NG.UTF-8.in | 93 +
localedata/ik_CA.UTF-8.in | 60 +
localedata/kk_KZ.UTF-8.in | 40 +
localedata/ku_TR.UTF-8.in | 52 +
localedata/ky_KG.UTF-8.in | 72 +
localedata/ln_CD.UTF-8.in | 18 +
localedata/locales/am_ET | 549 +-
localedata/locales/az_AZ | 201 +-
localedata/locales/be_BY | 41 +-
localedata/locales/ber_DZ | 173 +-
localedata/locales/ber_MA | 42 +-
localedata/locales/bg_BG | 290 +-
localedata/locales/br_FR | 55 +-
localedata/locales/br_FR@euro | 3 +-
localedata/locales/ca_ES | 16 +-
localedata/locales/cns11643_stroke | 9 +-
localedata/locales/crh_UA | 111 +-
localedata/locales/cs_CZ | 69 +-
localedata/locales/csb_PL | 83 +-
localedata/locales/cv_RU | 75 +-
localedata/locales/cy_GB | 242 +-
localedata/locales/da_DK | 116 +-
localedata/locales/dz_BT | 2484 +-
localedata/locales/en_CA | 8 -
localedata/locales/eo | 69 +-
localedata/locales/es_CU | 3 +-
localedata/locales/es_EC | 2 +-
localedata/locales/es_ES | 49 +-
localedata/locales/es_US | 56 +-
localedata/locales/et_EE | 31 +-
localedata/locales/fa_IR | 287 +-
localedata/locales/fi_FI | 175 +-
localedata/locales/fil_PH | 57 +-
localedata/locales/fur_IT | 15 +-
localedata/locales/gez_ER@abegede | 409 +-
localedata/locales/ha_NG | 165 +-
localedata/locales/hr_HR | 84 +-
localedata/locales/hsb_DE | 64 +-
localedata/locales/hu_HU | 298 +-
localedata/locales/ig_NG | 453 +-
localedata/locales/ik_CA | 153 +-
localedata/locales/is_IS | 72 +-
localedata/locales/iso14651_t1_common |94998 +++++++++++++++++++++++++++++----
localedata/locales/iso14651_t1_pinyin | 9 +-
localedata/locales/kk_KZ | 132 +-
localedata/locales/ku_TR | 87 +-
localedata/locales/ky_KG | 59 +-
localedata/locales/ln_CD | 47 +-
localedata/locales/lt_LT | 52 +-
localedata/locales/lv_LV | 67 +-
localedata/locales/mi_NZ | 43 +-
localedata/locales/ml_IN | 158 +-
localedata/locales/mn_MN | 34 +-
localedata/locales/mr_IN | 76 +-
localedata/locales/mt_MT | 144 +-
localedata/locales/nan_TW@latin | 33 +-
localedata/locales/nb_NO | 120 +-
localedata/locales/om_KE | 120 +-
localedata/locales/os_RU | 14 +-
localedata/locales/pl_PL | 66 +-
localedata/locales/ps_AF | 224 +-
localedata/locales/ro_RO | 99 +-
localedata/locales/ru_RU | 24 +-
localedata/locales/ru_UA | 16 +-
localedata/locales/sc_IT | 15 +-
localedata/locales/se_NO | 298 +-
localedata/locales/si_LK | 42 +
localedata/locales/sq_AL | 291 +-
localedata/locales/sv_FI | 2 +-
localedata/locales/sv_FI@euro | 2 +-
localedata/locales/sv_SE | 113 +-
localedata/locales/szl_PL | 86 +-
localedata/locales/tg_TJ | 106 +-
localedata/locales/ti_ER | 2 +
localedata/locales/tk_TM | 399 +-
localedata/locales/tl_PH | 31 +-
localedata/locales/tr_TR | 47 +-
localedata/locales/tt_RU | 244 +-
localedata/locales/tt_RU@iqtelif | 14 +-
localedata/locales/ug_CN | 196 +-
localedata/locales/uk_UA | 487 +-
localedata/locales/uz_UZ | 131 +-
localedata/locales/uz_UZ@cyrillic | 56 +-
localedata/locales/vi_VN | 242 +-
localedata/locales/yi_US | 125 +-
localedata/locales/yo_NG | 365 +-
localedata/lv_LV.UTF-8.in | 6 +-
localedata/mi_NZ.UTF-8.in | 37 +
localedata/ml_IN.UTF-8.in | 25 +
localedata/mn_MN.UTF-8.in | 15 +
localedata/mr_IN.UTF-8.in | 9 +
localedata/mt_MT.UTF-8.in | 39 +
localedata/nan_TW.UTF-8@latin.in | 11 +
localedata/nb_NO.UTF-8.in | 66 +
localedata/om_KE.UTF-8.in | 36 +
localedata/os_RU.UTF-8.in | 9 +
localedata/pl_PL.UTF-8.in | 4 +-
localedata/ps_AF.UTF-8.in | 61 +
localedata/ro_RO.UTF-8.in | 32 +
localedata/ru_RU.UTF-8.in | 15 +
localedata/sc_IT.UTF-8.in | 12 +
localedata/se_NO.UTF-8.in | 144 +
localedata/sq_AL.UTF-8.in | 82 +
localedata/sv_SE.ISO-8859-1.in | 10 +-
localedata/sv_SE.UTF-8.in | 107 +
localedata/szl_PL.UTF-8.in | 49 +
localedata/tg_TJ.UTF-8.in | 105 +
localedata/tk_TM.UTF-8.in | 213 +
localedata/tt_RU.UTF-8.in | 194 +
localedata/tt_RU.UTF-8@iqtelif.in | 53 +
localedata/ug_CN.UTF-8.in | 16 +
localedata/uk_UA.UTF-8.in | 18 +-
localedata/uz_UZ.UTF-8.in | 26 +
localedata/vi_VN.UTF-8.in | 45 +
localedata/yi_US.UTF-8.in | 39 +
localedata/yo_NG.UTF-8.in | 30 +
localedata/zh_CN.UTF-8.in |25498 +++++++++
posix/bug-regex5.c | 4 +-
posix/tst-fnmatch.input | 58 +-
posix/tst-regexloc.c | 4 +-
149 files changed, 197751 insertions(+), 15000 deletions(-)
create mode 100644 localedata/am_ET.UTF-8.in
create mode 100644 localedata/az_AZ.UTF-8.in
create mode 100644 localedata/be_BY.UTF-8.in
create mode 100644 localedata/ber_DZ.UTF-8.in
create mode 100644 localedata/ber_MA.UTF-8.in
create mode 100644 localedata/bg_BG.UTF-8.in
create mode 100644 localedata/br_FR.UTF-8.in
create mode 100644 localedata/cmn_TW.UTF-8.in
create mode 100644 localedata/crh_UA.UTF-8.in
create mode 100644 localedata/csb_PL.UTF-8.in
create mode 100644 localedata/cv_RU.UTF-8.in
create mode 100644 localedata/cy_GB.UTF-8.in
create mode 100644 localedata/dz_BT.UTF-8.in
create mode 100644 localedata/eo.UTF-8.in
create mode 100644 localedata/es_ES.UTF-8.in
create mode 100644 localedata/fa_IR.UTF-8.in
create mode 100644 localedata/fi_FI.UTF-8.in
create mode 100644 localedata/fil_PH.UTF-8.in
create mode 100644 localedata/fur_IT.UTF-8.in
create mode 100644 localedata/gez_ER.UTF-8@abegede.in
create mode 100644 localedata/ha_NG.UTF-8.in
create mode 100644 localedata/ig_NG.UTF-8.in
create mode 100644 localedata/ik_CA.UTF-8.in
create mode 100644 localedata/kk_KZ.UTF-8.in
create mode 100644 localedata/ku_TR.UTF-8.in
create mode 100644 localedata/ky_KG.UTF-8.in
create mode 100644 localedata/ln_CD.UTF-8.in
create mode 100644 localedata/mi_NZ.UTF-8.in
create mode 100644 localedata/ml_IN.UTF-8.in
create mode 100644 localedata/mn_MN.UTF-8.in
create mode 100644 localedata/mr_IN.UTF-8.in
create mode 100644 localedata/mt_MT.UTF-8.in
create mode 100644 localedata/nan_TW.UTF-8@latin.in
create mode 100644 localedata/nb_NO.UTF-8.in
create mode 100644 localedata/om_KE.UTF-8.in
create mode 100644 localedata/os_RU.UTF-8.in
create mode 100644 localedata/ps_AF.UTF-8.in
create mode 100644 localedata/ro_RO.UTF-8.in
create mode 100644 localedata/ru_RU.UTF-8.in
create mode 100644 localedata/sc_IT.UTF-8.in
create mode 100644 localedata/se_NO.UTF-8.in
create mode 100644 localedata/sq_AL.UTF-8.in
create mode 100644 localedata/sv_SE.UTF-8.in
create mode 100644 localedata/szl_PL.UTF-8.in
create mode 100644 localedata/tg_TJ.UTF-8.in
create mode 100644 localedata/tk_TM.UTF-8.in
create mode 100644 localedata/tt_RU.UTF-8.in
create mode 100644 localedata/tt_RU.UTF-8@iqtelif.in
create mode 100644 localedata/ug_CN.UTF-8.in
create mode 100644 localedata/uz_UZ.UTF-8.in
create mode 100644 localedata/vi_VN.UTF-8.in
create mode 100644 localedata/yi_US.UTF-8.in
create mode 100644 localedata/yo_NG.UTF-8.in
create mode 100644 localedata/zh_CN.UTF-8.in
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
From: maiku.fabian at gmail dot com @ 2018-03-01 14:40 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
Mike FABIAN <maiku.fabian at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |RESOLVED
Resolution|--- |FIXED
Target Milestone|--- |2.28
--- Comment #30 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Fixed.
(In reply to Elie Roux from comment #28)
> That's perfect, no problem! I was just being curious
I hope your collation improvements will be accepted by CLDR soon.
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
From: cvs-commit at gcc dot gnu.org @ 2018-03-02 12:59 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
--- Comment #31 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".
The branch, mfabian/collation-update-2.27 has been created
at 9589174d076327deb7ed816d16b89b0e7470abd6 (commit)
- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9589174d076327deb7ed816d16b89b0e7470abd6
commit 9589174d076327deb7ed816d16b89b0e7470abd6
Author: Mike FABIAN <mfabian@redhat.com>
Date: Thu Dec 21 18:56:52 2017 +0100
Remove the lines from cmn_TW.UTF-8.in which cannot work at the moment.
See this bug https://sourceware.org/bugzilla/show_bug.cgi?id=22898
These lines don’t yet work because of a glibc bug, not because of
problems in the locale data. No matter what sorting rules one uses,
these characters cannot be sorted at all at the moment.
As soon as that bug is fixed, these lines should be added back to the
test file.
* localedata/cmn_TW.UTF-8.in: Remove the lines which cannot
be sorted correctly at the moment because of a bug.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e289a7d4c7f2abf09e4a4877b8cadcded7440e55
commit e289a7d4c7f2abf09e4a4877b8cadcded7440e55
Author: Mike FABIAN <mfabian@redhat.com>
Date: Mon Dec 11 18:26:22 2017 +0100
Adapt collation in several locales to the new iso14651_t1_common file
[BZ #22550] - es_ES locale (and other es_* locales): collation should
treat ñ as a primary different character, sync the collation
for Spanish with CLDR
[BZ #21547] - Tibetan script collation broken (Dzongkha and Tibetan)
* localedata/Makefile: Add new test files.
* localedata/lv_LV.UTF-8.in: Adapt test file to new collation order.
* localedata/sv_SE.ISO-8859-1.in: Adapt test file to new collation
order.
* localedata/uk_UA.UTF-8.in: Adapt test file to new collation order.
* localedata/am_ET.UTF-8.in: New test file.
* localedata/az_AZ.UTF-8.in: Likewise.
* localedata/be_BY.UTF-8.in: Likewise.
* localedata/ber_DZ.UTF-8.in: Likewise.
* localedata/ber_MA.UTF-8.in: Likewise.
* localedata/bg_BG.UTF-8.in: Likewise.
* localedata/br_FR.UTF-8.in: Likewise.
* localedata/cmn_TW.UTF-8.in: Likewise.
* localedata/crh_UA.UTF-8.in: Likewise.
* localedata/csb_PL.UTF-8.in: Likewise.
* localedata/cv_RU.UTF-8.in: Likewise.
* localedata/cy_GB.UTF-8.in: Likewise.
* localedata/dz_BT.UTF-8.in: Likewise.
* localedata/eo.UTF-8.in: Likewise.
* localedata/es_ES.UTF-8.in: Likewise.
* localedata/fa_IR.UTF-8.in: Likewise.
* localedata/fi_FI.UTF-8.in: Likewise.
* localedata/fil_PH.UTF-8.in: Likewise.
* localedata/fur_IT.UTF-8.in: Likewise.
* localedata/gez_ER.UTF-8@abegede.in: Likewise.
* localedata/ha_NG.UTF-8.in: Likewise.
* localedata/ig_NG.UTF-8.in: Likewise.
* localedata/ik_CA.UTF-8.in: Likewise.
* localedata/kk_KZ.UTF-8.in: Likewise.
* localedata/ku_TR.UTF-8.in: Likewise.
* localedata/ky_KG.UTF-8.in: Likewise.
* localedata/ln_CD.UTF-8.in: Likewise.
* localedata/mi_NZ.UTF-8.in: Likewise.
* localedata/ml_IN.UTF-8.in: Likewise.
* localedata/mn_MN.UTF-8.in: Likewise.
* localedata/mr_IN.UTF-8.in: Likewise.
* localedata/mt_MT.UTF-8.in: Likewise.
* localedata/nb_NO.UTF-8.in: Likewise.
* localedata/om_KE.UTF-8.in: Likewise.
* localedata/os_RU.UTF-8.in: Likewise.
* localedata/ps_AF.UTF-8.in: Likewise.
* localedata/ro_RO.UTF-8.in: Likewise.
* localedata/ru_RU.UTF-8.in: Likewise.
* localedata/sc_IT.UTF-8.in: Likewise.
* localedata/se_NO.UTF-8.in: Likewise.
* localedata/sq_AL.UTF-8.in: Likewise.
* localedata/sv_SE.UTF-8.in: Likewise.
* localedata/szl_PL.UTF-8.in: Likewise.
* localedata/tg_TJ.UTF-8.in: Likewise.
* localedata/tk_TM.UTF-8.in: Likewise.
* localedata/tt_RU.UTF-8.in: Likewise.
* localedata/tt_RU.UTF-8@iqtelif.in: Likewise.
* localedata/ug_CN.UTF-8.in: Likewise.
* localedata/uz_UZ.UTF-8.in: Likewise.
* localedata/vi_VN.UTF-8.in: Likewise.
* localedata/yi_US.UTF-8.in: Likewise.
* localedata/yo_NG.UTF-8.in: Likewise.
* localedata/zh_CN.UTF-8.in: Likewise.
* localedata/locales/am_ET: Adapt collation rules to new
iso14651_t1_common file and fix bugs in the collation.
* localedata/locales/az_AZ: Likewise.
* localedata/locales/be_BY: Likewise.
* localedata/locales/ber_DZ: Likewise.
* localedata/locales/ber_MA: Likewise.
* localedata/locales/bg_BG: Likewise.
* localedata/locales/br_FR: Likewise.
* localedata/locales/br_FR@euro: Likewise.
* localedata/locales/ca_ES: Likewise.
* localedata/locales/cns11643_stroke: Likewise.
* localedata/locales/crh_UA: Likewise.
* localedata/locales/cs_CZ: Likewise.
* localedata/locales/csb_PL: Likewise.
* localedata/locales/cv_RU: Likewise.
* localedata/locales/cy_GB: Likewise.
* localedata/locales/da_DK: Likewise.
* localedata/locales/dz_BT: Likewise.
* localedata/locales/en_CA: Likewise.
* localedata/locales/eo: Likewise.
* localedata/locales/es_CU: Likewise.
* localedata/locales/es_EC: Likewise.
* localedata/locales/es_ES: Likewise.
* localedata/locales/es_US: Likewise.
* localedata/locales/et_EE: Likewise.
* localedata/locales/fa_IR: Likewise.
* localedata/locales/fi_FI: Likewise.
* localedata/locales/fil_PH: Likewise.
* localedata/locales/fur_IT: Likewise.
* localedata/locales/gez_ER@abegede: Likewise.
* localedata/locales/ha_NG: Likewise.
* localedata/locales/hr_HR: Likewise.
* localedata/locales/hsb_DE: Likewise.
* localedata/locales/hu_HU: Likewise.
* localedata/locales/ig_NG: Likewise.
* localedata/locales/ik_CA: Likewise.
* localedata/locales/is_IS: Likewise.
* localedata/locales/iso14651_t1_pinyin: Likewise.
* localedata/locales/kk_KZ: Likewise.
* localedata/locales/ku_TR: Likewise.
* localedata/locales/ky_KG: Likewise.
* localedata/locales/ln_CD: Likewise.
* localedata/locales/lt_LT: Likewise.
* localedata/locales/lv_LV: Likewise.
* localedata/locales/mi_NZ: Likewise.
* localedata/locales/ml_IN: Likewise.
* localedata/locales/mn_MN: Likewise.
* localedata/locales/mr_IN: Likewise.
* localedata/locales/mt_MT: Likewise.
* localedata/locales/nb_NO: Likewise.
* localedata/locales/om_KE: Likewise.
* localedata/locales/os_RU: Likewise.
* localedata/locales/pl_PL: Likewise.
* localedata/locales/ps_AF: Likewise.
* localedata/locales/ro_RO: Likewise.
* localedata/locales/ru_RU: Likewise.
* localedata/locales/ru_UA: Likewise.
* localedata/locales/sc_IT: Likewise.
* localedata/locales/se_NO: Likewise.
* localedata/locales/si_LK: Likewise.
* localedata/locales/sq_AL: Likewise.
* localedata/locales/sv_FI: Likewise.
* localedata/locales/sv_FI@euro: Likewise.
* localedata/locales/sv_SE: Likewise.
* localedata/locales/szl_PL: Likewise.
* localedata/locales/tg_TJ: Likewise.
* localedata/locales/ti_ER: Likewise.
* localedata/locales/tk_TM: Likewise.
* localedata/locales/tl_PH: Likewise.
* localedata/locales/tr_TR: Likewise.
* localedata/locales/tt_RU: Likewise.
* localedata/locales/tt_RU@iqtelif: Likewise.
* localedata/locales/ug_CN: Likewise.
* localedata/locales/uk_UA: Likewise.
* localedata/locales/uz_UZ: Likewise.
* localedata/locales/uz_UZ@cyrillic: Likewise.
* localedata/locales/vi_VN: Likewise.
* localedata/locales/yi_US: Likewise.
* localedata/locales/yo_NG: Likewise.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=242596394db9dad6147bb2b7bcb53d8a7610e1d0
commit 242596394db9dad6147bb2b7bcb53d8a7610e1d0
Author: Mike FABIAN <mfabian@redhat.com>
Date: Mon Jan 1 15:33:50 2018 +0100
Improve gen-locales.mk and gen-locale.sh to make test files with @ options
work
Without this, adding collation test files like
localedata/gez_ER.UTF-8@abegede.in
does not work for locales which contain @ modifiers.
* gen-locales.mk: Make test files which contain @ modifiers in their
name work.
* localedata/gen-locale.sh: Likewise.
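To illustrate what a test file name with an @ modifier has to be split into, here is a minimal Python sketch. It is not how gen-locales.mk or gen-locale.sh do it; the function name split_test_name is hypothetical, and the only assumption is the naming pattern LOCALE.CHARSET[@MODIFIER].in seen in the file names listed in this thread.

```python
def split_test_name(filename):
    """Split 'LOCALE.CHARSET[@MODIFIER].in' into (locale, charset, modifier).

    Hypothetical helper for illustration only; the modifier is '' when absent.
    """
    # Drop the trailing '.in' used by the collation test input files.
    stem = filename[:-3] if filename.endswith(".in") else filename
    # The modifier, if any, follows the '@' sign.
    base, _, modifier = stem.partition("@")
    # The charset follows the first '.' after the locale name.
    locale, _, charset = base.partition(".")
    return locale, charset, modifier


with_modifier = split_test_name("gez_ER.UTF-8@abegede.in")
without_modifier = split_test_name("da_DK.ISO-8859-1.in")
```

With this split, the locale source to use would be the locale name plus "@" plus the modifier (gez_ER@abegede), which matches the file names under localedata/locales/ in the diffstat.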
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=cc5351f2c0502826f8b4143f3646d44e334ff7b8
commit cc5351f2c0502826f8b4143f3646d44e334ff7b8
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 23 17:29:36 2018 +0100
Fix test cases tst-fnmatch and tst-regexloc for the new iso14651_t1_common
file.
See:
http://pubs.opengroup.org/onlinepubs/7908799/xbd/re.html
> A range expression represents the set of collating elements that fall
> between two elements in the current collation sequence,
> inclusively. It is expressed as the starting point and the ending
> point separated by a hyphen (-).
>
> Range expressions must not be used in portable applications because
> their behaviour is dependent on the collating sequence. Ranges will be
> treated according to the current collating sequence, and include such
> characters that fall within the range based on that collating
> sequence, regardless of character values. This, however, means that
> the interpretation will differ depending on collating sequence. If,
> for instance, one collating sequence defines ä as a variant of a,
> while another defines it as a letter following z, then the expression
> [ä-z] is valid in the first language and invalid in the second.
Therefore, using [a-z] does not make much sense except in the C/POSIX
locale.
The new iso14651_t1_common lists upper case and lower case Latin
characters in a different order than the old one, which causes
surprising results, for example in the de_DE locale: [a-z] now includes
A because A comes after a in iso14651_t1_common, but does not include Z
because that comes after z in iso14651_t1_common.
* posix/tst-fnmatch.input: Fix results for range expressions
for non C locales.
* posix/tst-regexloc.c: Do not use a range expression for
de_DE.ISO-8859-1 locale.
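The effect described above can be pictured with a toy model in Python. This is only a sketch: real glibc collation uses multi-level weights, while here a range is evaluated against a flat sequence. The interleaved order (a A b B ... z Z) is an assumption chosen to mirror the description, not the actual content of iso14651_t1_common.

```python
import string


def in_range(ch, start, end, sequence):
    """True if ch lies between start and end (inclusive) in the given sequence."""
    position = {c: i for i, c in enumerate(sequence)}
    return position[start] <= position[ch] <= position[end]


# Assumed toy order where each upper case letter directly follows its
# lower case counterpart: a A b B ... z Z
interleaved = "".join(
    lower + upper
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
)

# Code point order, as in the C/POSIX locale: A..Z followed by a..z
codepoint = string.ascii_uppercase + string.ascii_lowercase

a_in_range_interleaved = in_range("A", "a", "z", interleaved)
z_in_range_interleaved = in_range("Z", "a", "z", interleaved)
a_in_range_codepoint = in_range("A", "a", "z", codepoint)
```

In the interleaved order, A falls inside [a-z] and Z falls outside it; in code point order no upper case letter is in the range at all. That is why the tests avoid range expressions outside the C/POSIX locale.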
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ffa8106c727607fb365f2b93649fe3ea182dffe4
commit ffa8106c727607fb365f2b93649fe3ea182dffe4
Author: Mike FABIAN <mfabian@redhat.com>
Date: Fri Dec 15 07:19:45 2017 +0100
Fix posix/bug-regex5.c test case, adapt to iso14651_t1_common update
This test case tests how many collating elements are defined in
da_DK.ISO-8859-1 locale. The da_DK locale source defines 4:
collating-element <A-A> from "<U0041><U0041>"
collating-element <A-a> from "<U0041><U0061>"
collating-element <a-A> from "<U0061><U0041>"
collating-element <a-a> from "<U0061><U0061>"
The new iso14651_t1_common file defines more collating elements, two
of them are in the ISO-8859-1 range:
collating-element <U004C_00B7> from "<U004C><U00B7>" % decomposition of
LATIN CAPITAL LETTER L WITH MIDDLE DOT
collating-element <U006C_00B7> from "<U006C><U00B7>" % decomposition of
LATIN SMALL LETTER L WITH MIDDLE DOT
So the total count is now 6 instead of 4.
* posix/bug-regex5.c: Fix test case because with the new
iso14651_t1_common file, the da_DK locale now has 6 collating
elements in the ISO-8859-1 range instead of 4 with the old
iso14651_t1_common file.
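The count of 4 + 2 = 6 can be reproduced with a small Python sketch. It is not what posix/bug-regex5.c does; it only parses collating-element lines and keeps those whose code points all fit in ISO-8859-1 (at most U+00FF). The six definitions are the ones quoted above; the last entry, named <HYPOTHETICAL>, is made up to show that elements outside the range are not counted.

```python
import re

DEFINITIONS = """
collating-element <A-A> from "<U0041><U0041>"
collating-element <A-a> from "<U0041><U0061>"
collating-element <a-A> from "<U0061><U0041>"
collating-element <a-a> from "<U0061><U0061>"
collating-element <U004C_00B7> from "<U004C><U00B7>" % L WITH MIDDLE DOT
collating-element <U006C_00B7> from "<U006C><U00B7>" % l WITH MIDDLE DOT
collating-element <HYPOTHETICAL> from "<U0F40><U0FB5>" % made-up entry
"""

ELEMENT = re.compile(r'collating-element\s+<[^>]+>\s+from\s+"([^"]*)"')
CODE_POINT = re.compile(r"<U([0-9A-F]{4,8})>")


def count_latin1_elements(text):
    """Count collating elements whose code points are all <= U+00FF."""
    count = 0
    for line in text.splitlines():
        match = ELEMENT.match(line.strip())
        if not match:
            continue
        code_points = [int(h, 16) for h in CODE_POINT.findall(match.group(1))]
        if code_points and all(cp <= 0xFF for cp in code_points):
            count += 1
    return count


latin1_count = count_latin1_elements(DEFINITIONS)
```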
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=61e613fb97aa619ae4fabac3f106d5fffe15eacb
commit 61e613fb97aa619ae4fabac3f106d5fffe15eacb
Author: Mike FABIAN <mfabian@redhat.com>
Date: Wed Dec 13 14:39:54 2017 +0100
Collation order of @-. and space has changed in new iso14651_t1_common
file, adapt test files
* localedata/da_DK.ISO-8859-1.in: In the new iso14651_t1_common file
downloaded from ISO, the collation order of @-. and space has
changed. Therefore, this test file needed to be adapted.
* localedata/fr_CA.UTF-8.in: Likewise.
* localedata/fr_FR.UTF-8.in: Likewise.
* localedata/uk_UA.UTF-8.in: Likewise.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=059454de60bdb1be9979ee09596c1e9a7e9e6c8b
commit 059454de60bdb1be9979ee09596c1e9a7e9e6c8b
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Dec 12 14:39:34 2017 +0100
Collation order of ȥ has changed in new iso14651_t1_common file, adapt test
files
* localedata/cs_CZ.UTF-8.in: Adapt this test file to the collation
order of ȥ in the new iso14651_t1_common file.
* localedata/pl_PL.UTF-8.in: Likewise.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1f4df3bb2ac69f2e1947c2953379a7f19b5f0c35
commit 1f4df3bb2ac69f2e1947c2953379a7f19b5f0c35
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 15:45:05 2018 +0100
Add sections for various scripts to the iso14651_t1_common file
* localedata/locales/iso14651_t1_common: Add sections for various
scripts to the iso14651_t1_common file.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a93fecdcece3e2178834f4b4868b2309b0158753
commit a93fecdcece3e2178834f4b4868b2309b0158753
Author: Mike FABIAN <mfabian@redhat.com>
Date: Wed Jan 31 06:18:47 2018 +0100
iso14651_t1_common: make the fourth level the codepoint for characters
which are ignorable on all 4 levels
Entries for characters which have “IGNORE” on all 4 levels like:
<U0001> IGNORE;IGNORE;IGNORE;IGNORE % START OF HEADING (in ISO 6429)
are changed into:
<U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING (in ISO 6429)
i.e. putting the code point of the character into the fourth level
instead of “IGNORE”. Without that change, all such characters
would compare equal which would make a wcscoll test case fail.
It is better to have a clearly defined sort order even for characters
like this so it is good to use the code point as a tie-break.
* localedata/locales/iso14651_t1_common: Use the code point of a
character in the fourth collation level instead of IGNORE for all
entries which have IGNORE on all 4 levels.
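The rewrite described in this commit message can be sketched in Python as follows. This is an illustration under assumptions, not the script actually used for the commit: the function name add_tie_break is hypothetical, and the pattern only handles entries written exactly as "<Uxxxx> IGNORE;IGNORE;IGNORE;IGNORE".

```python
import re

# An entry whose four levels are all IGNORE, e.g.
#   <U0001> IGNORE;IGNORE;IGNORE;IGNORE % START OF HEADING (in ISO 6429)
ALL_IGNORE = re.compile(r"^(<U[0-9A-F]{4,8}>)(\s+)IGNORE;IGNORE;IGNORE;IGNORE\b")


def add_tie_break(line):
    """Put the character's own code point symbol into the fourth level."""
    return ALL_IGNORE.sub(
        lambda m: f"{m.group(1)}{m.group(2)}IGNORE;IGNORE;IGNORE;{m.group(1)}",
        line,
    )


before = "<U0001> IGNORE;IGNORE;IGNORE;IGNORE % START OF HEADING (in ISO 6429)"
after = add_tie_break(before)
```

Lines that do not match the pattern are returned unchanged, so the function could be applied to every line of the file.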
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=3e7089bf28ed1fd77e644bb3ce7405aff7847e61
commit 3e7089bf28ed1fd77e644bb3ce7405aff7847e61
Author: Mike FABIAN <mfabian@redhat.com>
Date: Mon Dec 11 20:00:24 2017 +0100
Add convenience symbols like <AFTER-A>, <BEFORE-A> to iso14651_t1_common
* localedata/locales/iso14651_t1_common: Add some convenient collation
symbols like <AFTER-A>, <BEFORE-A> to make tailoring easier using
rules similar to those in CLDR.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=50a54ba443575e69ffb03aa67d53ccf8b66a4fbd
commit 50a54ba443575e69ffb03aa67d53ccf8b66a4fbd
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 18:24:47 2018 +0100
Fixing syntax errors after updating the iso14651_t1_common file
* localedata/locales/iso14651_t1_common: The new version of this
file downloaded from ISO contained several syntax errors which
are fixed by this patch.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=661ab21c7521ba8e6e8bc7dad897b6cf162e0cd0
commit 661ab21c7521ba8e6e8bc7dad897b6cf162e0cd0
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 18:07:39 2018 +0100
iso14651_t1_common: <U\([0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F]\)> →
<U000\1>
* localedata/locales/iso14651_t1_common: replace all <U.....>
with <U000.....> because glibc understands only 4 digit or 8 digit
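The substitution in the subject line is written in sed-style regex syntax. An equivalent in Python might look like the sketch below; the function name pad_code_points is hypothetical, and the pattern assumes upper case hex digits, as in the commit subject.

```python
import re

# Exactly five hex digits between "<U" and ">", e.g. <U1D400>.
FIVE_DIGIT = re.compile(r"<U([0-9A-F]{5})>")


def pad_code_points(line):
    """Rewrite every five-digit <UXXXXX> as the eight-digit <U000XXXXX>."""
    return FIVE_DIGIT.sub(r"<U000\g<1>>", line)


padded = pad_code_points("<U1D400> <U0041> <U0001D401>")
```

Four-digit and eight-digit names are left alone, because the pattern requires the closing ">" immediately after the fifth digit.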
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=06061c30d615b2862ac360f11384092c92022ea7
commit 06061c30d615b2862ac360f11384092c92022ea7
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 18:04:31 2018 +0100
Necessary changes after updating the iso14651_t1_common file
* localedata/locales/iso14651_t1_common: Necessary changes
to make the file downloaded from ISO usable by glibc.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=bc1d41044c0cf9f0214acdbfd79b6cd11fd1e8c1
commit bc1d41044c0cf9f0214acdbfd79b6cd11fd1e8c1
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 17:59:00 2018 +0100
Update iso14651_t1_common file to ISO14651_2016_TABLE1_en.txt [BZ #14095]
[BZ #14095] - Review / update collation data from Unicode / ISO 14651
File downloaded from:
http://standards.iso.org/iso-iec/14651/ed-4/ISO14651_2016_TABLE1_en.txt
Updating this file alone is not enough: there are problems in the new
file which need to be fixed and the collation rules for many locales
need to be adapted. This is done by the following patches.
This update also fixes the problem that many characters are treated as
identical when sorting because they were not yet in the old
iso14651_t1_common file, see:
https://bugzilla.redhat.com/show_bug.cgi?id=1336308
- Infinite (∞) and empty set (∅) are treated as if they were the same
character by sort and uniq
[BZ #14095]
* localedata/locales/iso14651_t1_common: Update file to
latest version from ISO (ISO14651_2016_TABLE1_en.txt).
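One simplified way to picture why characters missing from the table compared as identical is the toy model below. It assumes that a character without an entry contributes nothing to the sort key; the weights are made up, and the real glibc algorithm is multi-level and more involved.

```python
def sort_key(text, weights):
    """Build a single-level sort key from a table of weights.

    Toy assumption: characters missing from the table add nothing to the key.
    """
    return tuple(weights[ch] for ch in text if ch in weights)


old_table = {"a": 1, "b": 2}               # made-up weights, no ∞ or ∅
new_table = {**old_table, "∞": 3, "∅": 4}  # made-up weights after an update

old_same = sort_key("∞", old_table) == sort_key("∅", old_table)
new_same = sort_key("∞", new_table) == sort_key("∅", new_table)
```

Under this model both symbols produce an empty key with the old table, so tools like sort and uniq cannot tell them apart, while the updated table gives each its own weight.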
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=16e349c550942d274d3193ccedaa88855e3ac690
commit 16e349c550942d274d3193ccedaa88855e3ac690
Author: Mike FABIAN <mfabian@redhat.com>
Date: Fri Mar 2 11:29:24 2018 +0100
Remove --quiet argument when installing locales
Using this argument hides problems. I would like to see when something
fails.
* localedata/Makefile: Remove --quiet argument when
installing locales
-----------------------------------------------------------------------
* [Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)
From: jeremip11 at gmail dot com @ 2018-03-31 12:32 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=21547
Jeremi <jeremip11 at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jeremip11 at gmail dot com
Thread overview: 36+ messages
2017-06-05 10:37 [Bug localedata/21547] New: Tibetan script collation broken (Dzongkha and Tibetan) elie.roux@telecom-bretagne.eu
2017-06-16 9:03 ` [Bug localedata/21547] " elie.roux@telecom-bretagne.eu
2017-10-21 8:26 ` maiku.fabian at gmail dot com
2017-12-14 15:51 ` maiku.fabian at gmail dot com
2017-12-18 17:15 ` maiku.fabian at gmail dot com
2017-12-18 17:58 ` elie.roux@telecom-bretagne.eu
2017-12-18 18:00 ` elie.roux@telecom-bretagne.eu
2018-01-15 10:25 ` maiku.fabian at gmail dot com
2018-01-15 10:50 ` elie.roux@telecom-bretagne.eu
2018-01-15 10:51 ` elie.roux@telecom-bretagne.eu
2018-01-15 14:35 ` maiku.fabian at gmail dot com
2018-01-15 14:36 ` maiku.fabian at gmail dot com
2018-01-15 14:47 ` elie.roux@telecom-bretagne.eu
2018-01-15 14:56 ` maiku.fabian at gmail dot com
2018-01-15 15:06 ` maiku.fabian at gmail dot com
2018-01-15 15:14 ` elie.roux@telecom-bretagne.eu
2018-01-15 15:15 ` maiku.fabian at gmail dot com
2018-01-15 15:18 ` maiku.fabian at gmail dot com
2018-01-15 15:25 ` maiku.fabian at gmail dot com
2018-01-15 19:09 ` elie.roux@telecom-bretagne.eu
2018-01-15 20:58 ` maiku.fabian at gmail dot com
2018-01-15 22:03 ` elie.roux@telecom-bretagne.eu
2018-01-16 7:38 ` maiku.fabian at gmail dot com
2018-01-16 8:11 ` elie.roux@telecom-bretagne.eu
2018-01-22 15:17 ` maiku.fabian at gmail dot com
2018-01-22 15:18 ` maiku.fabian at gmail dot com
2018-01-22 15:22 ` maiku.fabian at gmail dot com
2018-01-22 21:45 ` elie.roux@telecom-bretagne.eu
2018-01-23 17:32 ` maiku.fabian at gmail dot com
2018-01-23 21:57 ` elie.roux@telecom-bretagne.eu
2018-01-24 0:04 ` maiku.fabian at gmail dot com
2018-01-24 8:49 ` elie.roux@telecom-bretagne.eu
2018-02-27 16:55 ` cvs-commit at gcc dot gnu.org
2018-03-01 14:40 ` maiku.fabian at gmail dot com
2018-03-02 12:59 ` cvs-commit at gcc dot gnu.org
2018-03-31 12:32 ` jeremip11 at gmail dot com