* [Bug localedata/22550] New: es_ES locale (and other es_* locales): collation should treat ñ as a primary different character, sync the collation for Spanish with CLDR
@ 2017-12-05 14:32 maiku.fabian at gmail dot com
2017-12-05 14:38 ` [Bug localedata/22550] " maiku.fabian at gmail dot com
` (19 more replies)
0 siblings, 20 replies; 21+ messages in thread
From: maiku.fabian at gmail dot com @ 2017-12-05 14:32 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
Bug ID: 22550
Summary: es_ES locale (and other es_* locales): collation
should treat ñ as a primary different character,
sync the collation for Spanish with CLDR
Product: glibc
Version: 2.26
Status: NEW
Severity: normal
Priority: P2
Component: localedata
Assignee: unassigned at sourceware dot org
Reporter: maiku.fabian at gmail dot com
CC: libc-locales at sourceware dot org
Target Milestone: ---
https://github.com/voidlinux/void-packages/issues/9744
And CLDR has:
https://unicode.org/cldr/trac/browser/trunk/common/collation/es.xml
which contains:
<collation type="standard">
<cr><![CDATA[
&N<ñ<<<Ñ
]]></cr>
and:
<collation type="traditional">
<cr><![CDATA[
&N<ñ<<<Ñ
&C<ch<<<Ch<<<CH
&l<ll<<<Ll<<<LL
]]></cr>
I guess we want “standard”, not “traditional”.
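For illustration, the “standard” rule `&N<ñ<<<Ñ` can be approximated in a few lines of Python. This is only a sketch under simplifying assumptions, not how CLDR or glibc implement collation: the names `ALPHABET`, `fold` and `standard_key` are made up for this example, only a primary level plus a crude case tie-breaker is modelled, and only a-z and ñ get primary weights (accented vowels are folded to their base letters).

```python
import unicodedata

# Hypothetical primary alphabet: ñ is its own letter, placed right after n.
ALPHABET = "abcdefghijklmnñopqrstuvwxyz"
WEIGHT = {letter: i for i, letter in enumerate(ALPHABET)}


def fold(ch):
    """Lower-case one character and strip accents, but keep ñ intact."""
    ch = ch.lower()
    if ch == "ñ":
        return ch
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def standard_key(word):
    """Sort key: primary weights first, then a lower-before-upper tie-breaker."""
    composed = unicodedata.normalize("NFC", word)
    primary = [WEIGHT.get(c, len(ALPHABET)) for ch in composed for c in fold(ch)]
    # swapcase() makes lower case compare first, loosely mirroring ñ<<<Ñ.
    return (primary, composed.swapcase())


words = ["peña", "penoso", "pena", "Ñu", "nube", "ñu"]
print(sorted(words, key=standard_key))
```

With this key, "peña" sorts after "penoso" because ñ has its own primary weight, and "ñu" sorts before "Ñu" because case only breaks ties.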
--
You are receiving this mail because:
You are on the CC list for the bug.
From: maiku.fabian at gmail dot com @ 2017-12-05 14:38 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
Mike FABIAN <maiku.fabian at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |carlos at redhat dot com
From: maiku.fabian at gmail dot com @ 2017-12-05 14:46 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
--- Comment #1 from Mike FABIAN <maiku.fabian at gmail dot com> ---
The es_US locale currently implements the “traditional” sort order.
Do we want that?:
LC_COLLATE
copy "iso14651_t1"
collating-element <C-H> from "<U0043><U0048>"
collating-element <c-h> from "<U0063><U0068>"
collating-element <C-h> from "<U0043><U0068>"
collating-element <c-H> from "<U0063><U0048>"
collating-element <L-L> from "<U004C><U004C>"
collating-element <l-l> from "<U006C><U006C>"
collating-element <L-l> from "<U004C><U006C>"
collating-element <l-L> from "<U006C><U004C>"
collating-symbol <ch>
collating-symbol <ll>
collating-symbol <ntilde>
collating-symbol <CAP-MIN>
collating-symbol <MIN-CAP>
reorder-after <MIN>
<MIN-CAP>
<CAP-MIN>
reorder-after <n>
<ntilde>
reorder-after <U006E>
<U00F1> <ntilde>;<BAS>;<MIN>;IGNORE
reorder-after <U004E>
<U00D1> <ntilde>;<BAS>;<CAP>;IGNORE
reorder-after <c>
<ch>
reorder-after <U0063>
<c-H> <ch>;<BAS>;<MIN-CAP>;IGNORE
<c-h> <ch>;<BAS>;<MIN>;IGNORE
reorder-after <U0043>
<C-H> <ch>;<BAS>;<CAP>;IGNORE
<C-h> <ch>;<BAS>;<CAP-MIN>;IGNORE
reorder-after <l>
<ll>
reorder-after <U006C>
<l-L> <ll>;<BAS>;<MIN-CAP>;IGNORE
<l-l> <ll>;<BAS>;<MIN>;IGNORE
reorder-after <U004C>
<L-L> <ll>;<BAS>;<CAP>;IGNORE
<L-l> <ll>;<BAS>;<CAP-MIN>;IGNORE
reorder-end
END LC_COLLATE
From: carlos at redhat dot com @ 2017-12-05 15:18 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
--- Comment #2 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Mike FABIAN from comment #1)
> The es_US locale currently implements the “traditional” sort order.
> Do we want that?:
Yes. I'm Argentine, and I would expect the traditional sort order.
Even though 'ch' is not in common use, the traditional sort order is
appropriate for all Spanish locales.
We should harmonize the collation across all es locales.
We may need to offer a post-2010 "new style" collation locale, but we need to
get feedback from our users.
From: carlos at redhat dot com @ 2017-12-05 15:20 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
--- Comment #3 from Carlos O'Donell <carlos at redhat dot com> ---
We definitely need to have ch, ll, and ñ as distinct primary characters.
From: carlos at redhat dot com @ 2017-12-05 15:31 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
--- Comment #4 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Carlos O'Donell from comment #3)
> We definitely need to have ch, ll, and ñ as distinct primary characters.
I have a slight worry that, in those Spanish dialects that have deviated more
heavily from Castellano, the ch and ll digraphs will cause confusion among
users of the locale.
This remains to be seen, though, and we should keep an eye out for such regional
differences.
Locale to locale we may need to adjust from "traditional" to "standard" to
match geographic / cultural tradition.
From: carlos at redhat dot com @ 2017-12-05 15:47 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
--- Comment #5 from Carlos O'Donell <carlos at redhat dot com> ---
(In reply to Carlos O'Donell from comment #4)
> (In reply to Carlos O'Donell from comment #3)
> > We definitely need to have ch, ll, and ñ as distinct primary characters.
>
> I have a slight worry that in those Spanish dialects that have deviated more
> heavily from Castellano that the ch and ll digraphs will cause confusion
> among users of the locale.
>
> This needs to be seen though, and we should keep an eye out for such
> regional differences.
>
> Locale to locale we may need to adjust from "traditional" to "standard" to
> match geographic / cultural tradition.
I take it back.
I think we should aim for maximum compatibility.
(a) Make ñ a proper primary character everywhere. This is the correct thing
to do for all Spanish locales.
(b) Only make ch and ll proper characters in Argentina, Mexico, Spain, US, and
Venezuela and collate them as per "traditional". These countries are known to
support the traditional characters and their sortings and not follow RAE's
ruling in 2010 to drop ch/ll.
(c) Everything else remains as-is following "standard" with the new ñ from (a),
and sorting accordingly. This maximizes compatibility for the other locales.
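The split proposed in (a) to (c) amounts to a small lookup from territory to collation style. The following is a minimal sketch of that proposal only; the function name `proposed_collation` and the set name are hypothetical, and nothing here reflects what glibc actually ships.

```python
# Territories named in option (b) of the proposal above.
TRADITIONAL_TERRITORIES = {"AR", "MX", "ES", "US", "VE"}


def proposed_collation(locale_name):
    """Return 'traditional' or 'standard' for an es_XX locale name."""
    language, _, territory = locale_name.partition("_")
    if language != "es":
        raise ValueError(f"not a Spanish locale: {locale_name!r}")
    # Drop an optional ".codeset" and "@modifier" suffix.
    territory = territory.split(".")[0].split("@")[0]
    if territory in TRADITIONAL_TERRITORIES:
        return "traditional"  # option (b): ñ, ch and ll are separate letters
    return "standard"         # options (a) + (c): only ñ is a separate letter


for name in ("es_AR", "es_CL.UTF-8", "es_ES.UTF-8"):
    print(name, proposed_collation(name))
```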
From: hector.monacci at gmail dot com @ 2017-12-05 16:13 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
Héctor M. Monacci <hector.monacci at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hector.monacci at gmail dot com
--- Comment #6 from Héctor M. Monacci <hector.monacci at gmail dot com> ---
Hi everyone.
I am a native Spanish speaker born in Argentina. I have a University degree in
Grammar. I have worked 4 years teaching Spanish grammar at Buenos Aires
University.
Up to 1994, Spanish dictionaries around the world included 2 "letters", namely
"ch" and "ll", which together with "ñ" were meant to have a unique position in
the alphabet. As independent letters, CH was always located after C, LL was
always located after L, and Ñ was always located after N.
So this was the OLD (pre-1995) correct Spanish sorting order:
CH:
1. caca
2. caco
3. cacha
LL:
1. lana
2. luna
3. llanto
Ñ:
1. pena
2. penoso
3. peña
This has been very different since this ACUERDO entered into force back in 1994:
http://www.rae.es/consultas/exclusion-de-ch-y-ll-del-abecedario
This was an ACUERDO (i.e., an AGREEMENT). Every Academy of the Spanish
Language, the one in Spain, the one in Argentina (Academia Argentina de
Letras), and the two dozen other Spanish Language Academies, signed this
Agreement.
The official Spanish Language alphabet, since then, has no unique place for CH
and LL, now just considered digraphs instead of letters.
This is how you order the same words all over the Spanish-speaking world since
1994:
CH:
1. caca
2. cacha
3. caco
LL:
1. lana
2. llanto
3. luna
Ñ:
1. pena
2. penoso
3. peña
As you can see, only the Ñ remains in the same situation as before the
Agreement of 1994: i.e., as an independent letter to be sorted always after N.
Of course, all paper dictionaries you may find, printed before 1995, still
carry the old letter situation. Of course, people who learnt to read and left
school before 1995 may be less aware of the change.
But there is no doubt about it.
Even the US Academia de la Lengua Española signed that Agreement, as you can
see here:
http://www.rae.es/la-institucion/politica-panhispanica/x-congreso-madrid-1994
So, since 1994 Spanish is less different from the rest of the languages that
use the Latin alphabet, in the sense that we have only one special letter that
affects alphabetic ordering: namely, Ñ.
Accented vowels (á, é, í, ó, ú and ü) DO NOT change alphabetic ordering. They
are treated as ordinary vowels (a, e, i, o, u).
The Ñ letter still does. CH doesn't. LL doesn't.
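The two orderings listed above can be reproduced with a small sketch. It assumes a toy weight table per alphabet and a greedy split into letters (so "ch" and "ll" count as one letter only in the old table); the names `CURRENT`, `OLD`, `fold` and `key` are invented for this example and are not part of any library.

```python
import unicodedata

LETTERS = list("abcdefghijklmnñopqrstuvwxyz")
# Post-1994 alphabet: ñ is the only extra letter.
CURRENT = {letter: i for i, letter in enumerate(LETTERS)}

# Pre-1995 alphabet: ch follows c and ll follows l (hypothetical weight table).
OLD_LETTERS = []
for letter in LETTERS:
    OLD_LETTERS.append(letter)
    if letter == "c":
        OLD_LETTERS.append("ch")
    elif letter == "l":
        OLD_LETTERS.append("ll")
OLD = {letter: i for i, letter in enumerate(OLD_LETTERS)}


def fold(word):
    """Lower-case and drop accents on vowels while keeping ñ."""
    out = []
    for ch in unicodedata.normalize("NFC", word.lower()):
        if ch == "ñ":
            out.append(ch)
        else:
            nfd = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in nfd if not unicodedata.combining(c)))
    return "".join(out)


def key(word, weights):
    """Split greedily into the longest known letters and map them to weights."""
    text, i, result = fold(word), 0, []
    while i < len(text):
        pair = text[i:i + 2]
        token = pair if len(pair) == 2 and pair in weights else text[i]
        result.append(weights.get(token, len(weights)))
        i += len(token)
    return result


words = ["cacha", "luna", "peña", "caco", "llanto", "penoso", "caca", "lana", "pena"]
print(sorted(words, key=lambda w: key(w, CURRENT)))  # current rules
print(sorted(words, key=lambda w: key(w, OLD)))      # pre-1995 rules
```

Accented vowels get the same weight as plain vowels in both tables, while ñ keeps its own slot after n in both.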
From: maiku.fabian at gmail dot com @ 2017-12-06 7:12 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
--- Comment #7 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Héctor M. Monacci from comment #6)
> The Ñ letter still does. CH doesn't. LL doesn't.
I think Héctor is right here and we should only make Ñ special in *all*
Spanish locales.
Until now, only es_US (and es_EC, which copies "es_US") do anything
special at all for the collation. es_US does treat Ñ, CH, and LL as
special. So if we only treat Ñ as special from now on, this is
certainly an improvement for most Spanish locales over the current
state. And it follows the current official rules. It could be a
regression only for users of es_US and es_EC who are used to the
traditional sorting rules.
Overall I think going with the current official rules is better than
using the traditional rules for some random number of es_* locales
where we just guess that the users of these locales would like the
traditional rules.
Carlos and Héctor are both from Argentina, Carlos prefers the
traditional rules and Héctor the current rules. So we have
a disagreement for es_AR already. In case of such a disagreement,
I think it is best just to follow the official rules.
From: maiku.fabian at gmail dot com @ 2017-12-06 8:38 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
--- Comment #8 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Mike FABIAN from comment #7)
> (In reply to Héctor M. Monacci from comment #6)
>
> > The Ñ letter still does. CH doesn't. LL doesn't.
>
> I think Héctor is right here and we should only make Ñ special in *all*
> Spanish locales.
[...]
> I think it is best just to follow the official rules.
And if there are really some users for some Spanish locales who
insist on the traditional rules, we could add locales like
es_US@traditional
but only if somebody really asks for that ...
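As a rough sketch of how such an opt-in could be recognised, a locale name of the form `language_TERRITORY.codeset@modifier` can be split into its parts and the modifier inspected. The helper names below are hypothetical, and `es_US@traditional` is only the locale floated above, not one known to exist.

```python
def split_locale(name):
    """Split 'language_TERRITORY.codeset@modifier' into its four parts."""
    rest, _, modifier = name.partition("@")
    rest, _, codeset = rest.partition(".")
    language, _, territory = rest.partition("_")
    return language, territory, codeset, modifier


def collation_variant(name):
    """Hypothetical rule: '@traditional' opts in, everything else is standard."""
    modifier = split_locale(name)[3]
    return "traditional" if modifier == "traditional" else "standard"


for name in ("es_US", "es_US.UTF-8@traditional"):
    print(name, split_locale(name), collation_variant(name))
```

Keeping the traditional order behind a modifier would leave the default es_* locales on the current official rules.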
From: hector.monacci at gmail dot com @ 2017-12-06 10:05 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
--- Comment #9 from Héctor M. Monacci <hector.monacci at gmail dot com> ---
I agree with Mike.
I think it is way better to synchronize all the variants of the Spanish
language around an official decision taken 23 years ago, by general
international agreement, taught in schools since, and recently reasserted in
2010 when a new Panhispanic Orthography was published
(https://www.diariolibre.com/noticias/el-alfabeto-despide-letras-ch-y-ll-BNdl272715).
This would lead to treating Ñ as the only extra standalone letter in Spanish,
always sorted after N.
My local copy of es_ES (up-to-date Void Linux) has this right now:
LC_COLLATE
% Copy the template from ISO/IEC 14651
copy "iso14651_t1"
END LC_COLLATE
This, of course, is horrible. It even ignores Ñ as a standalone letter.
Please, we need to move ahead. We are late to the party by some years now.
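To make the complaint concrete: when ñ is not a letter of its own, it behaves like an n whose tilde only breaks ties. The sketch below imitates that behaviour in Python; it is an approximation for illustration, not the ISO/IEC 14651 table itself, and `template_like_key` is a made-up name.

```python
import unicodedata


def template_like_key(word):
    """Base letters decide first; accents (including the tilde) only break ties."""
    nfd = unicodedata.normalize("NFD", word.lower())
    base = "".join(c for c in nfd if not unicodedata.combining(c))
    return (base, nfd)


print(sorted(["penoso", "peña", "pena"], key=template_like_key))
```

Here "peña" lands between "pena" and "penoso", which is exactly the order a Spanish dictionary does not use.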
From: hector.monacci at gmail dot com @ 2017-12-06 10:28 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
--- Comment #10 from Héctor M. Monacci <hector.monacci at gmail dot com> ---
Created attachment 10668
--> https://sourceware.org/bugzilla/attachment.cgi?id=10668&action=edit
Diccionario del Español Actual, 1999
From: hector.monacci at gmail dot com @ 2017-12-06 10:37 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
--- Comment #11 from Héctor M. Monacci <hector.monacci at gmail dot com> ---
Please find attached an example of Spanish letter sorting after the 1994
Agreement. This is from Diccionario del Español Actual, by Manuel Seco,
published in 1999, page 2860.
As you can see, after LIZO comes LLAGA (and then, several pages ahead, you will
find LUJO). This is to show that already back in 1999 the old LL "letter" was
nonexistent, and words carrying LL used the common Latin-alphabet order. If the
old situation hadn't changed, this dictionary would have sorted these words
thus:
LIZO
LUJO
LLAGA
Instead, already in 1999 these words are sorted, in this Dictionary, thus:
LIZO
LLAGA
LUJO
From: hector.monacci at gmail dot com @ 2017-12-09 11:26 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
--- Comment #12 from Héctor M. Monacci <hector.monacci at gmail dot com> ---
Created attachment 10676
--> https://sourceware.org/bugzilla/attachment.cgi?id=10676&action=edit
Gran Diccionario Salvat, 1992
From: hector.monacci at gmail dot com @ 2017-12-09 11:32 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
--- Comment #13 from Héctor M. Monacci <hector.monacci at gmail dot com> ---
I have just attached another interesting piece of evidence.
Back in 1992, i.e., years before the Agreement was reached in 1994, there were
some very prestigious dictionaries that already used the new alphabet (without
CH and LL as independent letters).
The image is from page 241 of the Gran Diccionario Salvat, originally published
in Barcelona. My copy is from a local edition in Buenos Aires. The newspaper La
Nación published it as a series of installments.
The fact that, back in this publication of 1992, only the Ñ was a special
Spanish character (and not CH and LL) didn't upset anybody.
It is a quarter of a century since. It is time to move on.
Please, can we agree on this?
We need Ñ to be treated properly under es_* locales, and especially under
LC_COLLATE sections. And we need to abandon support, at least as default, for
CH and LL as independent letters.
From: maiku.fabian at gmail dot com @ 2017-12-09 11:35 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
--- Comment #14 from Mike FABIAN <maiku.fabian at gmail dot com> ---
I agree.
From: maiku.fabian at gmail dot com @ 2017-12-20 14:31 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
Mike FABIAN <maiku.fabian at gmail dot com> changed:
           What    |Removed                         |Added
----------------------------------------------------------------------------
           Assignee|unassigned at sourceware dot org|maiku.fabian at gmail dot com
From: cvs-commit at gcc dot gnu.org @ 2018-02-27 16:55 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
--- Comment #15 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".
The branch, master has been updated
via 874c56d7979858bbb1bb1604c55769ad0ce7a072 (commit)
via 159738548130d5ac4fe6178977e940ed5f8cfdc4 (commit)
via ce6636b06b67d6bb9b3d6927bf2a926b9b7478f5 (commit)
via ac3a3b4b0d561d776b60317d6a926050c8541655 (commit)
via 770cbe147cf33580e05ba6de78993c3070c5c2f8 (commit)
via 0fc355d9a7b3cc9d5e4190ce929e1eb4459ef0ea (commit)
via 43f3893f4b5679cb9eb93300b18f7febd17e5239 (commit)
via df74ef786f9c87ce5404df3b68a91cb9d2c4c26f (commit)
via d5adfbadd47e6836a7ddae54fba9f88e2b3354db (commit)
via 5f5a96109187b4bb4a10b62139ab1c7fe45f7c1d (commit)
via 8a97e9002ffa807b49e1222e5a9d51ce7896f209 (commit)
via bbdd2fba7d36d8f03c919b34f95238d8cf248b47 (commit)
via 1569e551aff088ed48e2694b07045256f3582271 (commit)
via 9479b6d5e08eacce06c6ab60abc9b2f4eb8b71e4 (commit)
from 93d260ddda87a124d3fbb9af400fa154cfd00b4b (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=874c56d7979858bbb1bb1604c55769ad0ce7a072
commit 874c56d7979858bbb1bb1604c55769ad0ce7a072
Author: Mike FABIAN <mfabian@redhat.com>
Date: Thu Dec 21 18:56:52 2017 +0100
Remove the lines from cmn_TW.UTF-8.in which cannot work at the moment.
See this bug https://sourceware.org/bugzilla/show_bug.cgi?id=22898
These lines don’t yet work because of a glibc bug, not because of
problems in the locale data. No matter what sorting rules one uses,
these characters cannot be sorted at all at the moment.
As soon as that bug is fixed, these lines should be added back to the
test file.
* localedata/cmn_TW.UTF-8.in: Remove the lines which cannot
be sorted correctly at the moment because of a bug.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=159738548130d5ac4fe6178977e940ed5f8cfdc4
commit 159738548130d5ac4fe6178977e940ed5f8cfdc4
Author: Mike FABIAN <mfabian@redhat.com>
Date: Mon Dec 11 18:26:22 2017 +0100
Adapt collation in several locales to the new iso14651_t1_common file
[BZ #22550] - es_ES locale (and other es_* locales): collation should
treat ñ as a primary different character, sync the collation
for Spanish with CLDR
[BZ #21547] - Tibetan script collation broken (Dzongkha and Tibetan)
* localedata/Makefile: Add new test files.
* localedata/lv_LV.UTF-8.in: Adapt test file to new collation order.
* localedata/sv_SE.ISO-8859-1.in: Adapt test file to new collation
order.
* localedata/uk_UA.UTF-8.in: Adapt test file to new collation order.
* localedata/am_ET.UTF-8.in: New test file.
* localedata/az_AZ.UTF-8.in: Likewise.
* localedata/be_BY.UTF-8.in: Likewise.
* localedata/ber_DZ.UTF-8.in: Likewise.
* localedata/ber_MA.UTF-8.in: Likewise.
* localedata/bg_BG.UTF-8.in: Likewise.
* localedata/br_FR.UTF-8.in: Likewise.
* localedata/cmn_TW.UTF-8.in: Likewise.
* localedata/crh_UA.UTF-8.in: Likewise.
* localedata/csb_PL.UTF-8.in: Likewise.
* localedata/cv_RU.UTF-8.in: Likewise.
* localedata/cy_GB.UTF-8.in: Likewise.
* localedata/dz_BT.UTF-8.in: Likewise.
* localedata/eo.UTF-8.in: Likewise.
* localedata/es_ES.UTF-8.in: Likewise.
* localedata/fa_IR.UTF-8.in: Likewise.
* localedata/fi_FI.UTF-8.in: Likewise.
* localedata/fil_PH.UTF-8.in: Likewise.
* localedata/fur_IT.UTF-8.in: Likewise.
* localedata/gez_ER.UTF-8@abegede.in: Likewise.
* localedata/ha_NG.UTF-8.in: Likewise.
* localedata/ig_NG.UTF-8.in: Likewise.
* localedata/ik_CA.UTF-8.in: Likewise.
* localedata/kk_KZ.UTF-8.in: Likewise.
* localedata/ku_TR.UTF-8.in: Likewise.
* localedata/ky_KG.UTF-8.in: Likewise.
* localedata/ln_CD.UTF-8.in: Likewise.
* localedata/mi_NZ.UTF-8.in: Likewise.
* localedata/ml_IN.UTF-8.in: Likewise.
* localedata/mn_MN.UTF-8.in: Likewise.
* localedata/mr_IN.UTF-8.in: Likewise.
* localedata/mt_MT.UTF-8.in: Likewise.
* localedata/nb_NO.UTF-8.in: Likewise.
* localedata/om_KE.UTF-8.in: Likewise.
* localedata/os_RU.UTF-8.in: Likewise.
* localedata/ps_AF.UTF-8.in: Likewise.
* localedata/ro_RO.UTF-8.in: Likewise.
* localedata/ru_RU.UTF-8.in: Likewise.
* localedata/sc_IT.UTF-8.in: Likewise.
* localedata/se_NO.UTF-8.in: Likewise.
* localedata/sq_AL.UTF-8.in: Likewise.
* localedata/sv_SE.UTF-8.in: Likewise.
* localedata/szl_PL.UTF-8.in: Likewise.
* localedata/tg_TJ.UTF-8.in: Likewise.
* localedata/tk_TM.UTF-8.in: Likewise.
* localedata/tt_RU.UTF-8.in: Likewise.
* localedata/tt_RU.UTF-8@iqtelif.in: Likewise.
* localedata/ug_CN.UTF-8.in: Likewise.
* localedata/uz_UZ.UTF-8.in: Likewise.
* localedata/vi_VN.UTF-8.in: Likewise.
* localedata/yi_US.UTF-8.in: Likewise.
* localedata/yo_NG.UTF-8.in: Likewise.
* localedata/zh_CN.UTF-8.in: Likewise.
* localedata/locales/am_ET: Adapt collation rules to new iso14651_t1_common
file and fix bugs in the collation.
* localedata/locales/az_AZ: Likewise.
* localedata/locales/be_BY: Likewise.
* localedata/locales/ber_DZ: Likewise.
* localedata/locales/ber_MA: Likewise.
* localedata/locales/bg_BG: Likewise.
* localedata/locales/br_FR: Likewise.
* localedata/locales/br_FR@euro: Likewise.
* localedata/locales/ca_ES: Likewise.
* localedata/locales/cns11643_stroke: Likewise.
* localedata/locales/crh_UA: Likewise.
* localedata/locales/cs_CZ: Likewise.
* localedata/locales/csb_PL: Likewise.
* localedata/locales/cv_RU: Likewise.
* localedata/locales/cy_GB: Likewise.
* localedata/locales/da_DK: Likewise.
* localedata/locales/dz_BT: Likewise.
* localedata/locales/en_CA: Likewise.
* localedata/locales/eo: Likewise.
* localedata/locales/es_CU: Likewise.
* localedata/locales/es_EC: Likewise.
* localedata/locales/es_ES: Likewise.
* localedata/locales/es_US: Likewise.
* localedata/locales/et_EE: Likewise.
* localedata/locales/fa_IR: Likewise.
* localedata/locales/fi_FI: Likewise.
* localedata/locales/fil_PH: Likewise.
* localedata/locales/fur_IT: Likewise.
* localedata/locales/gez_ER@abegede: Likewise.
* localedata/locales/ha_NG: Likewise.
* localedata/locales/hr_HR: Likewise.
* localedata/locales/hsb_DE: Likewise.
* localedata/locales/hu_HU: Likewise.
* localedata/locales/ig_NG: Likewise.
* localedata/locales/ik_CA: Likewise.
* localedata/locales/is_IS: Likewise.
* localedata/locales/iso14651_t1_pinyin: Likewise.
* localedata/locales/kk_KZ: Likewise.
* localedata/locales/ku_TR: Likewise.
* localedata/locales/ky_KG: Likewise.
* localedata/locales/ln_CD: Likewise.
* localedata/locales/lt_LT: Likewise.
* localedata/locales/lv_LV: Likewise.
* localedata/locales/mi_NZ: Likewise.
* localedata/locales/ml_IN: Likewise.
* localedata/locales/mn_MN: Likewise.
* localedata/locales/mr_IN: Likewise.
* localedata/locales/mt_MT: Likewise.
* localedata/locales/nb_NO: Likewise.
* localedata/locales/om_KE: Likewise.
* localedata/locales/os_RU: Likewise.
* localedata/locales/pl_PL: Likewise.
* localedata/locales/ps_AF: Likewise.
* localedata/locales/ro_RO: Likewise.
* localedata/locales/ru_RU: Likewise.
* localedata/locales/ru_UA: Likewise.
* localedata/locales/sc_IT: Likewise.
* localedata/locales/se_NO: Likewise.
* localedata/locales/si_LK: Likewise.
* localedata/locales/sq_AL: Likewise.
* localedata/locales/sv_FI: Likewise.
* localedata/locales/sv_FI@euro: Likewise.
* localedata/locales/sv_SE: Likewise.
* localedata/locales/szl_PL: Likewise.
* localedata/locales/tg_TJ: Likewise.
* localedata/locales/ti_ER: Likewise.
* localedata/locales/tk_TM: Likewise.
* localedata/locales/tl_PH: Likewise.
* localedata/locales/tr_TR: Likewise.
* localedata/locales/tt_RU: Likewise.
* localedata/locales/tt_RU@iqtelif: Likewise.
* localedata/locales/ug_CN: Likewise.
* localedata/locales/uk_UA: Likewise.
* localedata/locales/uz_UZ: Likewise.
* localedata/locales/uz_UZ@cyrillic: Likewise.
* localedata/locales/vi_VN: Likewise.
* localedata/locales/yi_US: Likewise.
* localedata/locales/yo_NG: Likewise.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ce6636b06b67d6bb9b3d6927bf2a926b9b7478f5
commit ce6636b06b67d6bb9b3d6927bf2a926b9b7478f5
Author: Mike FABIAN <mfabian@redhat.com>
Date: Mon Jan 1 15:33:50 2018 +0100
Improve gen-locales.mk and gen-locale.sh to make test files with @ options work
Without this change, collation test files such as
localedata/gez_ER.UTF-8@abegede.in
do not work, because the locale name contains an @ modifier.
* gen-locales.mk: Make test files which contain @ modifiers in their
name work.
* localedata/gen-locale.sh: Likewise.
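To illustrate the naming scheme involved, here is a minimal Python sketch of how a test file name with an @ modifier relates to the locale source name under localedata/locales/. It is not the logic of gen-locales.mk or gen-locale.sh; the helper name split_test_name is hypothetical, and the mapping is only inferred from the file names listed in this thread.

```python
def split_test_name(filename):
    """Split a collation test file name into (locale source, charmap).

    Hypothetical helper for illustration only; the real build scripts
    may derive these names differently.
    """
    # Drop the trailing ".in" used by the test input files.
    name = filename[:-len(".in")] if filename.endswith(".in") else filename
    # The @ modifier, if any, follows the charmap in the test file name.
    base, _, modifier = name.partition("@")
    # The charmap follows the first dot, e.g. "UTF-8" or "ISO-8859-1".
    locale, _, charmap = base.partition(".")
    # In localedata/locales/ the modifier is attached directly to the locale.
    source = f"{locale}@{modifier}" if modifier else locale
    return source, charmap


if __name__ == "__main__":
    for test_file in ("gez_ER.UTF-8@abegede.in", "sv_SE.ISO-8859-1.in"):
        print(test_file, "->", split_test_name(test_file))
```

The point of the sketch is only that the modifier sits after the charmap in the test file name but directly after the locale name in the source file name, so any script that splits on the dot has to handle the @ part explicitly.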
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ac3a3b4b0d561d776b60317d6a926050c8541655
commit ac3a3b4b0d561d776b60317d6a926050c8541655
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 23 17:29:36 2018 +0100
Fix test cases tst-fnmatch and tst-regexloc for the new iso14651_t1_common file.
See:
http://pubs.opengroup.org/onlinepubs/7908799/xbd/re.html
> A range expression represents the set of collating elements that fall
> between two elements in the current collation sequence,
> inclusively. It is expressed as the starting point and the ending
> point separated by a hyphen (-).
>
> Range expressions must not be used in portable applications because
> their behaviour is dependent on the collating sequence. Ranges will be
> treated according to the current collating sequence, and include such
> characters that fall within the range based on that collating
> sequence, regardless of character values. This, however, means that
> the interpretation will differ depending on collating sequence. If,
> for instance, one collating sequence defines ä as a variant of a,
> while another defines it as a letter following z, then the expression
> [ä-z] is valid in the first language and invalid in the second.
Therefore, using [a-z] does not make much sense except in the C/POSIX
locale.
The new iso14651_t1_common lists upper case and lower case Latin characters
in a different order than the old one, which causes surprising results, for
example in the de_DE locale: [a-z] now includes A because A comes after a in
iso14651_t1_common, but does not include Z because Z comes after z in
iso14651_t1_common.
* posix/tst-fnmatch.input: Fix results for range expressions
for non C locales.
* posix/tst-regexloc.c: Do not use a range expression for
de_DE.ISO-8859-1 locale.
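The effect described in the commit message above can be reproduced with a small Python model. This is a simplified sketch, not glibc's implementation: it assumes a collation sequence is a flat string of characters and that a range [start-end] matches every character positioned between the two end points. The "interleaved" order (aAbB...zZ) is an assumption standing in for the new iso14651_t1_common ordering; real collation uses multi-level weights.

```python
import string


def in_range(ch, start, end, sequence):
    """Return True if ch lies between start and end (inclusive) in sequence."""
    pos = sequence.index
    return pos(start) <= pos(ch) <= pos(end)


# C/POSIX-like order: all upper case letters, then all lower case letters.
codepoint_order = string.ascii_uppercase + string.ascii_lowercase

# Assumed model of the new order: each lower case letter is followed
# directly by its upper case counterpart (aAbB...zZ).
interleaved_order = "".join(
    lower + upper
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase)
)

if __name__ == "__main__":
    for name, seq in (("codepoint", codepoint_order),
                      ("interleaved", interleaved_order)):
        print(name,
              "A in [a-z]:", in_range("A", "a", "z", seq),
              "Z in [a-z]:", in_range("Z", "a", "z", seq))
```

In the interleaved model, A sits right after a and is therefore inside [a-z], while Z sits after z and falls outside, which is the behaviour the adapted test cases account for.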
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=770cbe147cf33580e05ba6de78993c3070c5c2f8
commit 770cbe147cf33580e05ba6de78993c3070c5c2f8
Author: Mike FABIAN <mfabian@redhat.com>
Date: Fri Dec 15 07:19:45 2017 +0100
Fix posix/bug-regex5.c test case, adapt to iso14651_t1_common update
This test case tests how many collating elements are defined in the
da_DK.ISO-8859-1 locale. The da_DK locale source defines 4:
collating-element <A-A> from "<U0041><U0041>"
collating-element <A-a> from "<U0041><U0061>"
collating-element <a-A> from "<U0061><U0041>"
collating-element <a-a> from "<U0061><U0061>"
The new iso14651_t1_common file defines more collating elements, two
of which are in the ISO-8859-1 range:
collating-element <U004C_00B7> from "<U004C><U00B7>" % decomposition of LATIN CAPITAL LETTER L WITH MIDDLE DOT
collating-element <U006C_00B7> from "<U006C><U00B7>" % decomposition of LATIN SMALL LETTER L WITH MIDDLE DOT
So the total count is now 6 instead of 4.
* posix/bug-regex5.c: Fix test case because with the new
iso14651_t1_common file, the da_DK locale now has 6 collating elements
in the ISO-8859-1 range instead of 4 with the old iso14651_t1_common file.
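The count of 6 can be checked with a short Python sketch. It only models the arithmetic from the commit message: the six collating elements are taken from the text above, the extra out-of-range entry is a made-up placeholder, and the helper in_latin1 is hypothetical. It is not how posix/bug-regex5.c counts the elements.

```python
# Collating elements defined by the da_DK locale source (from the message).
da_dk = {
    "<A-A>": "\u0041\u0041",
    "<A-a>": "\u0041\u0061",
    "<a-A>": "\u0061\u0041",
    "<a-a>": "\u0061\u0061",
}

# Collating elements from the new iso14651_t1_common quoted in the message,
# plus one hypothetical entry outside ISO-8859-1 to show the filtering.
common = {
    "<U004C_00B7>": "\u004C\u00B7",
    "<U006C_00B7>": "\u006C\u00B7",
    "<HYPOTHETICAL>": "\u0915\u094D",  # placeholder, not from the real file
}


def in_latin1(sequence):
    """True if every character is representable in ISO-8859-1."""
    return all(ord(ch) <= 0xFF for ch in sequence)


all_elements = {**da_dk, **common}
latin1_count = sum(in_latin1(seq) for seq in all_elements.values())

if __name__ == "__main__":
    print("collating elements in the ISO-8859-1 range:", latin1_count)
```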
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0fc355d9a7b3cc9d5e4190ce929e1eb4459ef0ea
commit 0fc355d9a7b3cc9d5e4190ce929e1eb4459ef0ea
Author: Mike FABIAN <mfabian@redhat.com>
Date: Wed Dec 13 14:39:54 2017 +0100
Collation order of @-. and space has changed in new iso14651_t1_common file, adapt test files
* localedata/da_DK.ISO-8859-1.in: In the new iso14651_t1_common file
downloaded from ISO, the collation order of @-. and space has changed.
Therefore, this test file needed to be adapted.
* localedata/fr_CA.UTF-8.in: Likewise.
* localedata/fr_FR.UTF-8.in: Likewise.
* localedata/uk_UA.UTF-8.in: Likewise.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=43f3893f4b5679cb9eb93300b18f7febd17e5239
commit 43f3893f4b5679cb9eb93300b18f7febd17e5239
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Dec 12 14:39:34 2017 +0100
Collation order of ȥ has changed in new iso14651_t1_common file, adapt test files
* localedata/cs_CZ.UTF-8.in: Adapt this test file to the collation
order of ȥ in the new iso14651_t1_common file.
* localedata/pl_PL.UTF-8.in: Likewise.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=df74ef786f9c87ce5404df3b68a91cb9d2c4c26f
commit df74ef786f9c87ce5404df3b68a91cb9d2c4c26f
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 15:45:05 2018 +0100
Add sections for various scripts to the iso14651_t1_common file
* localedata/locales/iso14651_t1_common: Add sections for various
scripts to the iso14651_t1_common file.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=d5adfbadd47e6836a7ddae54fba9f88e2b3354db
commit d5adfbadd47e6836a7ddae54fba9f88e2b3354db
Author: Mike FABIAN <mfabian@redhat.com>
Date: Wed Jan 31 06:18:47 2018 +0100
iso14651_t1_common: make the fourth level the codepoint for characters
which are ignorable on all 4 levels
Entries for characters which have “IGNORE” on all 4 levels like:
<U0001> IGNORE;IGNORE;IGNORE;IGNORE % START OF HEADING (in ISO 6429)
are changed into:
<U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING (in ISO 6429)
i.e. putting the code point of the character into the fourth level
instead of “IGNORE”. Without that change, all such characters
would compare equal which would make a wcscoll test case fail.
It is better to have a clearly defined sort order even for characters
like this so it is good to use the code point as a tie-break.
* localedata/locales/iso14651_t1_common: Use the code point of a
character in the fourth collation level instead of IGNORE for all
entries which have IGNORE on all 4 levels.
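Why the code point helps as a tie-break can be sketched in Python. This is a simplified model under stated assumptions: a sort key is built level by level, IGNORE contributes nothing to a level, and keys are compared level after level. The tables below are hypothetical; only the entry for <U0001> comes from the commit message, and <U0002> is assumed to be handled the same way.

```python
IGNORE = None  # stands in for the IGNORE keyword in the collation table


def sort_key(text, table):
    """Build a four-level sort key; IGNORE weights are skipped."""
    levels = ([], [], [], [])
    for ch in text:
        for level, weight in zip(levels, table[ch]):
            if weight is not IGNORE:
                level.append(weight)
    return tuple(tuple(level) for level in levels)


# Old style: ignorable on all four levels.
old_table = {
    "\u0001": (IGNORE, IGNORE, IGNORE, IGNORE),
    "\u0002": (IGNORE, IGNORE, IGNORE, IGNORE),
}

# New style: the code point is used as the fourth-level weight.
new_table = {
    "\u0001": (IGNORE, IGNORE, IGNORE, 0x0001),
    "\u0002": (IGNORE, IGNORE, IGNORE, 0x0002),
}

if __name__ == "__main__":
    print("old keys equal:",
          sort_key("\u0001", old_table) == sort_key("\u0002", old_table))
    print("new keys ordered:",
          sort_key("\u0001", new_table) < sort_key("\u0002", new_table))
```

With the old entries both keys are empty on every level, so the two characters compare equal; with the code point on the fourth level the order is well defined.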
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5f5a96109187b4bb4a10b62139ab1c7fe45f7c1d
commit 5f5a96109187b4bb4a10b62139ab1c7fe45f7c1d
Author: Mike FABIAN <mfabian@redhat.com>
Date: Mon Dec 11 20:00:24 2017 +0100
Add convenience symbols like <AFTER-A>, <BEFORE-A> to iso14651_t1_common
* localedata/locales/iso14651_t1_common: Add some convenient collation
symbols like <AFTER-A>, <BEFORE-A> to make tailoring easier using
rules similar to those in CLDR.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=8a97e9002ffa807b49e1222e5a9d51ce7896f209
commit 8a97e9002ffa807b49e1222e5a9d51ce7896f209
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 18:24:47 2018 +0100
Fixing syntax errors after updating the iso14651_t1_common file
* localedata/locales/iso14651_t1_common: The new version of this
file downloaded from ISO contained several syntax errors which
are fixed by this patch.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=bbdd2fba7d36d8f03c919b34f95238d8cf248b47
commit bbdd2fba7d36d8f03c919b34f95238d8cf248b47
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 18:07:39 2018 +0100
iso14651_t1_common: <U\([0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F]\)> →
<U000\1>
* localedata/locales/iso14651_t1_common: replace all <U.....>
with <U000.....> because glibc understands only 4 digit or 8 digit
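The substitution in the subject line is written in sed-style syntax. An equivalent, shown here as a Python sketch rather than the command actually used for the commit, pads every 5-digit <U.....> symbol to 8 digits and leaves 4-digit and 8-digit symbols alone. The function name pad_five_digit_symbols and the sample lines are assumptions for illustration.

```python
import re

# Matches <U followed by exactly five upper case hex digits and >.
FIVE_DIGIT_SYMBOL = re.compile(r"<U([0-9A-F]{5})>")


def pad_five_digit_symbols(line):
    """Rewrite <UXXXXX> as <U000XXXXX>; other symbols are left unchanged."""
    return FIVE_DIGIT_SYMBOL.sub(r"<U000\1>", line)


if __name__ == "__main__":
    samples = ["<U1D400>", "<U0041>", "<U0001D400>", "<U004C><U1F600>"]
    for sample in samples:
        print(sample, "->", pad_five_digit_symbols(sample))
```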
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1569e551aff088ed48e2694b07045256f3582271
commit 1569e551aff088ed48e2694b07045256f3582271
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 18:04:31 2018 +0100
Necessary changes after updating the iso14651_t1_common file
* localedata/locales/iso14651_t1_common: Necessary changes
to make the file downloaded from ISO usable by glibc.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9479b6d5e08eacce06c6ab60abc9b2f4eb8b71e4
commit 9479b6d5e08eacce06c6ab60abc9b2f4eb8b71e4
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 17:59:00 2018 +0100
Update iso14651_t1_common file to ISO14651_2016_TABLE1_en.txt [BZ #14095]
[BZ #14095] - Review / update collation data from Unicode / ISO 14651
File downloaded from:
http://standards.iso.org/iso-iec/14651/ed-4/ISO14651_2016_TABLE1_en.txt
Updating this file alone is not enough: the new file contains problems
which need to be fixed, and the collation rules for many locales
need to be adapted. This is done by the following patches.
This update also fixes the problem that many characters are treated as
identical when sorting because they were not yet in the old
iso14651_t1_common file, see:
https://bugzilla.redhat.com/show_bug.cgi?id=1336308
- Infinite (∞) and empty set (∅) are treated as if they were the same
character by sort and uniq
[BZ #14095]
* localedata/locales/iso14651_t1_common: Update file to
latest version from ISO (ISO14651_2016_TABLE1_en.txt).
-----------------------------------------------------------------------
Summary of changes:
ChangeLog | 224 +
gen-locales.mk | 4 +-
localedata/Makefile | 185 +-
localedata/am_ET.UTF-8.in | 347 +
localedata/az_AZ.UTF-8.in | 73 +
localedata/be_BY.UTF-8.in | 16 +
localedata/ber_DZ.UTF-8.in | 50 +
localedata/ber_MA.UTF-8.in | 13 +
localedata/bg_BG.UTF-8.in | 57 +
localedata/br_FR.UTF-8.in | 15 +
localedata/cmn_TW.UTF-8.in |75649 ++++++++++++++++++++++++++
localedata/crh_UA.UTF-8.in | 50 +
localedata/cs_CZ.UTF-8.in | 4 +-
localedata/csb_PL.UTF-8.in | 70 +
localedata/cv_RU.UTF-8.in | 45 +
localedata/cy_GB.UTF-8.in | 72 +
localedata/da_DK.ISO-8859-1.in | 4 +-
localedata/dz_BT.UTF-8.in | 789 +
localedata/eo.UTF-8.in | 32 +
localedata/es_ES.UTF-8.in | 46 +
localedata/fa_IR.UTF-8.in | 71 +
localedata/fi_FI.UTF-8.in | 140 +
localedata/fil_PH.UTF-8.in | 16 +
localedata/fr_CA.UTF-8.in | 9 +-
localedata/fr_FR.UTF-8.in | 9 +-
localedata/fur_IT.UTF-8.in | 12 +
localedata/gen-locale.sh | 5 +-
localedata/gez_ER.UTF-8@abegede.in | 365 +
localedata/ha_NG.UTF-8.in | 47 +
localedata/ig_NG.UTF-8.in | 93 +
localedata/ik_CA.UTF-8.in | 60 +
localedata/kk_KZ.UTF-8.in | 40 +
localedata/ku_TR.UTF-8.in | 52 +
localedata/ky_KG.UTF-8.in | 72 +
localedata/ln_CD.UTF-8.in | 18 +
localedata/locales/am_ET | 549 +-
localedata/locales/az_AZ | 201 +-
localedata/locales/be_BY | 41 +-
localedata/locales/ber_DZ | 173 +-
localedata/locales/ber_MA | 42 +-
localedata/locales/bg_BG | 290 +-
localedata/locales/br_FR | 55 +-
localedata/locales/br_FR@euro | 3 +-
localedata/locales/ca_ES | 16 +-
localedata/locales/cns11643_stroke | 9 +-
localedata/locales/crh_UA | 111 +-
localedata/locales/cs_CZ | 69 +-
localedata/locales/csb_PL | 83 +-
localedata/locales/cv_RU | 75 +-
localedata/locales/cy_GB | 242 +-
localedata/locales/da_DK | 116 +-
localedata/locales/dz_BT | 2484 +-
localedata/locales/en_CA | 8 -
localedata/locales/eo | 69 +-
localedata/locales/es_CU | 3 +-
localedata/locales/es_EC | 2 +-
localedata/locales/es_ES | 49 +-
localedata/locales/es_US | 56 +-
localedata/locales/et_EE | 31 +-
localedata/locales/fa_IR | 287 +-
localedata/locales/fi_FI | 175 +-
localedata/locales/fil_PH | 57 +-
localedata/locales/fur_IT | 15 +-
localedata/locales/gez_ER@abegede | 409 +-
localedata/locales/ha_NG | 165 +-
localedata/locales/hr_HR | 84 +-
localedata/locales/hsb_DE | 64 +-
localedata/locales/hu_HU | 298 +-
localedata/locales/ig_NG | 453 +-
localedata/locales/ik_CA | 153 +-
localedata/locales/is_IS | 72 +-
localedata/locales/iso14651_t1_common |94998 +++++++++++++++++++++++++++++----
localedata/locales/iso14651_t1_pinyin | 9 +-
localedata/locales/kk_KZ | 132 +-
localedata/locales/ku_TR | 87 +-
localedata/locales/ky_KG | 59 +-
localedata/locales/ln_CD | 47 +-
localedata/locales/lt_LT | 52 +-
localedata/locales/lv_LV | 67 +-
localedata/locales/mi_NZ | 43 +-
localedata/locales/ml_IN | 158 +-
localedata/locales/mn_MN | 34 +-
localedata/locales/mr_IN | 76 +-
localedata/locales/mt_MT | 144 +-
localedata/locales/nan_TW@latin | 33 +-
localedata/locales/nb_NO | 120 +-
localedata/locales/om_KE | 120 +-
localedata/locales/os_RU | 14 +-
localedata/locales/pl_PL | 66 +-
localedata/locales/ps_AF | 224 +-
localedata/locales/ro_RO | 99 +-
localedata/locales/ru_RU | 24 +-
localedata/locales/ru_UA | 16 +-
localedata/locales/sc_IT | 15 +-
localedata/locales/se_NO | 298 +-
localedata/locales/si_LK | 42 +
localedata/locales/sq_AL | 291 +-
localedata/locales/sv_FI | 2 +-
localedata/locales/sv_FI@euro | 2 +-
localedata/locales/sv_SE | 113 +-
localedata/locales/szl_PL | 86 +-
localedata/locales/tg_TJ | 106 +-
localedata/locales/ti_ER | 2 +
localedata/locales/tk_TM | 399 +-
localedata/locales/tl_PH | 31 +-
localedata/locales/tr_TR | 47 +-
localedata/locales/tt_RU | 244 +-
localedata/locales/tt_RU@iqtelif | 14 +-
localedata/locales/ug_CN | 196 +-
localedata/locales/uk_UA | 487 +-
localedata/locales/uz_UZ | 131 +-
localedata/locales/uz_UZ@cyrillic | 56 +-
localedata/locales/vi_VN | 242 +-
localedata/locales/yi_US | 125 +-
localedata/locales/yo_NG | 365 +-
localedata/lv_LV.UTF-8.in | 6 +-
localedata/mi_NZ.UTF-8.in | 37 +
localedata/ml_IN.UTF-8.in | 25 +
localedata/mn_MN.UTF-8.in | 15 +
localedata/mr_IN.UTF-8.in | 9 +
localedata/mt_MT.UTF-8.in | 39 +
localedata/nan_TW.UTF-8@latin.in | 11 +
localedata/nb_NO.UTF-8.in | 66 +
localedata/om_KE.UTF-8.in | 36 +
localedata/os_RU.UTF-8.in | 9 +
localedata/pl_PL.UTF-8.in | 4 +-
localedata/ps_AF.UTF-8.in | 61 +
localedata/ro_RO.UTF-8.in | 32 +
localedata/ru_RU.UTF-8.in | 15 +
localedata/sc_IT.UTF-8.in | 12 +
localedata/se_NO.UTF-8.in | 144 +
localedata/sq_AL.UTF-8.in | 82 +
localedata/sv_SE.ISO-8859-1.in | 10 +-
localedata/sv_SE.UTF-8.in | 107 +
localedata/szl_PL.UTF-8.in | 49 +
localedata/tg_TJ.UTF-8.in | 105 +
localedata/tk_TM.UTF-8.in | 213 +
localedata/tt_RU.UTF-8.in | 194 +
localedata/tt_RU.UTF-8@iqtelif.in | 53 +
localedata/ug_CN.UTF-8.in | 16 +
localedata/uk_UA.UTF-8.in | 18 +-
localedata/uz_UZ.UTF-8.in | 26 +
localedata/vi_VN.UTF-8.in | 45 +
localedata/yi_US.UTF-8.in | 39 +
localedata/yo_NG.UTF-8.in | 30 +
localedata/zh_CN.UTF-8.in |25498 +++++++++
posix/bug-regex5.c | 4 +-
posix/tst-fnmatch.input | 58 +-
posix/tst-regexloc.c | 4 +-
149 files changed, 197751 insertions(+), 15000 deletions(-)
create mode 100644 localedata/am_ET.UTF-8.in
create mode 100644 localedata/az_AZ.UTF-8.in
create mode 100644 localedata/be_BY.UTF-8.in
create mode 100644 localedata/ber_DZ.UTF-8.in
create mode 100644 localedata/ber_MA.UTF-8.in
create mode 100644 localedata/bg_BG.UTF-8.in
create mode 100644 localedata/br_FR.UTF-8.in
create mode 100644 localedata/cmn_TW.UTF-8.in
create mode 100644 localedata/crh_UA.UTF-8.in
create mode 100644 localedata/csb_PL.UTF-8.in
create mode 100644 localedata/cv_RU.UTF-8.in
create mode 100644 localedata/cy_GB.UTF-8.in
create mode 100644 localedata/dz_BT.UTF-8.in
create mode 100644 localedata/eo.UTF-8.in
create mode 100644 localedata/es_ES.UTF-8.in
create mode 100644 localedata/fa_IR.UTF-8.in
create mode 100644 localedata/fi_FI.UTF-8.in
create mode 100644 localedata/fil_PH.UTF-8.in
create mode 100644 localedata/fur_IT.UTF-8.in
create mode 100644 localedata/gez_ER.UTF-8@abegede.in
create mode 100644 localedata/ha_NG.UTF-8.in
create mode 100644 localedata/ig_NG.UTF-8.in
create mode 100644 localedata/ik_CA.UTF-8.in
create mode 100644 localedata/kk_KZ.UTF-8.in
create mode 100644 localedata/ku_TR.UTF-8.in
create mode 100644 localedata/ky_KG.UTF-8.in
create mode 100644 localedata/ln_CD.UTF-8.in
create mode 100644 localedata/mi_NZ.UTF-8.in
create mode 100644 localedata/ml_IN.UTF-8.in
create mode 100644 localedata/mn_MN.UTF-8.in
create mode 100644 localedata/mr_IN.UTF-8.in
create mode 100644 localedata/mt_MT.UTF-8.in
create mode 100644 localedata/nan_TW.UTF-8@latin.in
create mode 100644 localedata/nb_NO.UTF-8.in
create mode 100644 localedata/om_KE.UTF-8.in
create mode 100644 localedata/os_RU.UTF-8.in
create mode 100644 localedata/ps_AF.UTF-8.in
create mode 100644 localedata/ro_RO.UTF-8.in
create mode 100644 localedata/ru_RU.UTF-8.in
create mode 100644 localedata/sc_IT.UTF-8.in
create mode 100644 localedata/se_NO.UTF-8.in
create mode 100644 localedata/sq_AL.UTF-8.in
create mode 100644 localedata/sv_SE.UTF-8.in
create mode 100644 localedata/szl_PL.UTF-8.in
create mode 100644 localedata/tg_TJ.UTF-8.in
create mode 100644 localedata/tk_TM.UTF-8.in
create mode 100644 localedata/tt_RU.UTF-8.in
create mode 100644 localedata/tt_RU.UTF-8@iqtelif.in
create mode 100644 localedata/ug_CN.UTF-8.in
create mode 100644 localedata/uz_UZ.UTF-8.in
create mode 100644 localedata/vi_VN.UTF-8.in
create mode 100644 localedata/yi_US.UTF-8.in
create mode 100644 localedata/yo_NG.UTF-8.in
create mode 100644 localedata/zh_CN.UTF-8.in
* [Bug localedata/22550] es_ES locale (and other es_* locales): collation should treat ñ as a primary different character, sync the collation for Spanish with CLDR
From: maiku.fabian at gmail dot com @ 2018-02-28 14:13 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
Mike FABIAN <maiku.fabian at gmail dot com> changed:
What            |Removed |Added
----------------------------------------------------------------------------
Status          |NEW     |RESOLVED
Resolution      |---     |FIXED
Target Milestone|---     |2.28
--- Comment #16 from Mike FABIAN <maiku.fabian at gmail dot com> ---
Fixed.
* [Bug localedata/22550] es_ES locale (and other es_* locales): collation should treat ñ as a primary different character, sync the collation for Spanish with CLDR
From: cvs-commit at gcc dot gnu.org @ 2018-03-02 12:59 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
--- Comment #17 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".
The branch, mfabian/collation-update-2.27 has been created
at 9589174d076327deb7ed816d16b89b0e7470abd6 (commit)
- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9589174d076327deb7ed816d16b89b0e7470abd6
commit 9589174d076327deb7ed816d16b89b0e7470abd6
Author: Mike FABIAN <mfabian@redhat.com>
Date: Thu Dec 21 18:56:52 2017 +0100
Remove the lines from cmn_TW.UTF-8.in which cannot work at the moment.
See this bug https://sourceware.org/bugzilla/show_bug.cgi?id=22898
These lines don’t yet work because of a glibc bug, not because of
problems in the locale data. No matter what sorting rules one uses,
these characters cannot be sorted at all at the moment.
As soon as that bug is fixed, these lines should be added back to the
test file.
* localedata/cmn_TW.UTF-8.in: Remove the lines which cannot
be sorted correctly at the moment because of a bug.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e289a7d4c7f2abf09e4a4877b8cadcded7440e55
commit e289a7d4c7f2abf09e4a4877b8cadcded7440e55
Author: Mike FABIAN <mfabian@redhat.com>
Date: Mon Dec 11 18:26:22 2017 +0100
Adapt collation in several locales to the new iso14651_t1_common file
[BZ #22550] - es_ES locale (and other es_* locales): collation should
treat ñ as a primary different character, sync the collation
for Spanish with CLDR
[BZ #21547] - Tibetan script collation broken (Dzongkha and Tibetan)
* localedata/Makefile: Add new test files.
* localedata/lv_LV.UTF-8.in: Adapt test file to new collation order.
* localedata/sv_SE.ISO-8859-1.in: Adapt test file to new collation
order.
* localedata/uk_UA.UTF-8.in: Adapt test file to new collation order.
* localedata/am_ET.UTF-8.in: New test file.
* localedata/az_AZ.UTF-8.in: Likewise.
* localedata/be_BY.UTF-8.in: Likewise.
* localedata/ber_DZ.UTF-8.in: Likewise.
* localedata/ber_MA.UTF-8.in: Likewise.
* localedata/bg_BG.UTF-8.in: Likewise.
* localedata/br_FR.UTF-8.in: Likewise.
* localedata/cmn_TW.UTF-8.in: Likewise.
* localedata/crh_UA.UTF-8.in: Likewise.
* localedata/csb_PL.UTF-8.in: Likewise.
* localedata/cv_RU.UTF-8.in: Likewise.
* localedata/cy_GB.UTF-8.in: Likewise.
* localedata/dz_BT.UTF-8.in: Likewise.
* localedata/eo.UTF-8.in: Likewise.
* localedata/es_ES.UTF-8.in: Likewise.
* localedata/fa_IR.UTF-8.in: Likewise.
* localedata/fi_FI.UTF-8.in: Likewise.
* localedata/fil_PH.UTF-8.in: Likewise.
* localedata/fur_IT.UTF-8.in: Likewise.
* localedata/gez_ER.UTF-8@abegede.in: Likewise.
* localedata/ha_NG.UTF-8.in: Likewise.
* localedata/ig_NG.UTF-8.in: Likewise.
* localedata/ik_CA.UTF-8.in: Likewise.
* localedata/kk_KZ.UTF-8.in: Likewise.
* localedata/ku_TR.UTF-8.in: Likewise.
* localedata/ky_KG.UTF-8.in: Likewise.
* localedata/ln_CD.UTF-8.in: Likewise.
* localedata/mi_NZ.UTF-8.in: Likewise.
* localedata/ml_IN.UTF-8.in: Likewise.
* localedata/mn_MN.UTF-8.in: Likewise.
* localedata/mr_IN.UTF-8.in: Likewise.
* localedata/mt_MT.UTF-8.in: Likewise.
* localedata/nb_NO.UTF-8.in: Likewise.
* localedata/om_KE.UTF-8.in: Likewise.
* localedata/os_RU.UTF-8.in: Likewise.
* localedata/ps_AF.UTF-8.in: Likewise.
* localedata/ro_RO.UTF-8.in: Likewise.
* localedata/ru_RU.UTF-8.in: Likewise.
* localedata/sc_IT.UTF-8.in: Likewise.
* localedata/se_NO.UTF-8.in: Likewise.
* localedata/sq_AL.UTF-8.in: Likewise.
* localedata/sv_SE.UTF-8.in: Likewise.
* localedata/szl_PL.UTF-8.in: Likewise.
* localedata/tg_TJ.UTF-8.in: Likewise.
* localedata/tk_TM.UTF-8.in: Likewise.
* localedata/tt_RU.UTF-8.in: Likewise.
* localedata/tt_RU.UTF-8@iqtelif.in: Likewise.
* localedata/ug_CN.UTF-8.in: Likewise.
* localedata/uz_UZ.UTF-8.in: Likewise.
* localedata/vi_VN.UTF-8.in: Likewise.
* localedata/yi_US.UTF-8.in: Likewise.
* localedata/yo_NG.UTF-8.in: Likewise.
* localedata/zh_CN.UTF-8.in: Likewise.
* localedata/locales/am_ET: Adapt collation rules to new iso14651_t1_common
file and fix bugs in the collation.
* localedata/locales/az_AZ: Likewise.
* localedata/locales/be_BY: Likewise.
* localedata/locales/ber_DZ: Likewise.
* localedata/locales/ber_MA: Likewise.
* localedata/locales/bg_BG: Likewise.
* localedata/locales/br_FR: Likewise.
* localedata/locales/br_FR@euro: Likewise.
* localedata/locales/ca_ES: Likewise.
* localedata/locales/cns11643_stroke: Likewise.
* localedata/locales/crh_UA: Likewise.
* localedata/locales/cs_CZ: Likewise.
* localedata/locales/csb_PL: Likewise.
* localedata/locales/cv_RU: Likewise.
* localedata/locales/cy_GB: Likewise.
* localedata/locales/da_DK: Likewise.
* localedata/locales/dz_BT: Likewise.
* localedata/locales/en_CA: Likewise.
* localedata/locales/eo: Likewise.
* localedata/locales/es_CU: Likewise.
* localedata/locales/es_EC: Likewise.
* localedata/locales/es_ES: Likewise.
* localedata/locales/es_US: Likewise.
* localedata/locales/et_EE: Likewise.
* localedata/locales/fa_IR: Likewise.
* localedata/locales/fi_FI: Likewise.
* localedata/locales/fil_PH: Likewise.
* localedata/locales/fur_IT: Likewise.
* localedata/locales/gez_ER@abegede: Likewise.
* localedata/locales/ha_NG: Likewise.
* localedata/locales/hr_HR: Likewise.
* localedata/locales/hsb_DE: Likewise.
* localedata/locales/hu_HU: Likewise.
* localedata/locales/ig_NG: Likewise.
* localedata/locales/ik_CA: Likewise.
* localedata/locales/is_IS: Likewise.
* localedata/locales/iso14651_t1_pinyin: Likewise.
* localedata/locales/kk_KZ: Likewise.
* localedata/locales/ku_TR: Likewise.
* localedata/locales/ky_KG: Likewise.
* localedata/locales/ln_CD: Likewise.
* localedata/locales/lt_LT: Likewise.
* localedata/locales/lv_LV: Likewise.
* localedata/locales/mi_NZ: Likewise.
* localedata/locales/ml_IN: Likewise.
* localedata/locales/mn_MN: Likewise.
* localedata/locales/mr_IN: Likewise.
* localedata/locales/mt_MT: Likewise.
* localedata/locales/nb_NO: Likewise.
* localedata/locales/om_KE: Likewise.
* localedata/locales/os_RU: Likewise.
* localedata/locales/pl_PL: Likewise.
* localedata/locales/ps_AF: Likewise.
* localedata/locales/ro_RO: Likewise.
* localedata/locales/ru_RU: Likewise.
* localedata/locales/ru_UA: Likewise.
* localedata/locales/sc_IT: Likewise.
* localedata/locales/se_NO: Likewise.
* localedata/locales/si_LK: Likewise.
* localedata/locales/sq_AL: Likewise.
* localedata/locales/sv_FI: Likewise.
* localedata/locales/sv_FI@euro: Likewise.
* localedata/locales/sv_SE: Likewise.
* localedata/locales/szl_PL: Likewise.
* localedata/locales/tg_TJ: Likewise.
* localedata/locales/ti_ER: Likewise.
* localedata/locales/tk_TM: Likewise.
* localedata/locales/tl_PH: Likewise.
* localedata/locales/tr_TR: Likewise.
* localedata/locales/tt_RU: Likewise.
* localedata/locales/tt_RU@iqtelif: Likewise.
* localedata/locales/ug_CN: Likewise.
* localedata/locales/uk_UA: Likewise.
* localedata/locales/uz_UZ: Likewise.
* localedata/locales/uz_UZ@cyrillic: Likewise.
* localedata/locales/vi_VN: Likewise.
* localedata/locales/yi_US: Likewise.
* localedata/locales/yo_NG: Likewise.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=242596394db9dad6147bb2b7bcb53d8a7610e1d0
commit 242596394db9dad6147bb2b7bcb53d8a7610e1d0
Author: Mike FABIAN <mfabian@redhat.com>
Date: Mon Jan 1 15:33:50 2018 +0100
Improve gen-locales.mk and gen-locale.sh to make test files with @ options work
Without this change, collation test files such as
localedata/gez_ER.UTF-8@abegede.in
do not work, because the locale name contains an @ modifier.
* gen-locales.mk: Make test files which contain @ modifiers in their
name work.
* localedata/gen-locale.sh: Likewise.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=cc5351f2c0502826f8b4143f3646d44e334ff7b8
commit cc5351f2c0502826f8b4143f3646d44e334ff7b8
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 23 17:29:36 2018 +0100
Fix test cases tst-fnmatch and tst-regexloc for the new iso14651_t1_common file.
See:
http://pubs.opengroup.org/onlinepubs/7908799/xbd/re.html
> A range expression represents the set of collating elements that fall
> between two elements in the current collation sequence,
> inclusively. It is expressed as the starting point and the ending
> point separated by a hyphen (-).
>
> Range expressions must not be used in portable applications because
> their behaviour is dependent on the collating sequence. Ranges will be
> treated according to the current collating sequence, and include such
> characters that fall within the range based on that collating
> sequence, regardless of character values. This, however, means that
> the interpretation will differ depending on collating sequence. If,
> for instance, one collating sequence defines ä as a variant of a,
> while another defines it as a letter following z, then the expression
> [ä-z] is valid in the first language and invalid in the second.
Therefore, using [a-z] does not make much sense except in the C/POSIX
locale.
The new iso14651_t1_common lists upper case and lower case Latin
characters in a different order than the old one, which causes
surprising results, for example in the de_DE locale: [a-z] now includes
A because A comes after a in iso14651_t1_common, but it does not include
Z because Z comes after z in iso14651_t1_common.
* posix/tst-fnmatch.input: Fix results for range expressions
for non C locales.
* posix/tst-regexloc.c: Do not use a range expression for
de_DE.ISO-8859-1 locale.
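The effect described above can be illustrated with a small sketch. It is a toy model, not glibc's bracket-expression code: it assumes a collation sequence that simply interleaves lower and upper case (a, A, b, B, ..., z, Z), which matches the two facts stated in the commit message (A after a, Z after z) but ignores every other character in the real table. The function names are made up for this illustration.

```python
# Toy model only: this is not how glibc evaluates bracket expressions.
import string


def collation_sequence():
    """Return the assumed order a, A, b, B, ..., z, Z."""
    seq = []
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase):
        seq.extend([lower, upper])
    return seq


def in_range(ch, start, end, seq):
    """True if ch lies between start and end (inclusive) in seq."""
    pos = {c: i for i, c in enumerate(seq)}
    return pos[start] <= pos[ch] <= pos[end]


SEQ = collation_sequence()
print(in_range("A", "a", "z", SEQ))  # A sorts after a, so it falls inside [a-z]
print(in_range("Z", "a", "z", SEQ))  # Z sorts after z, so it falls outside [a-z]
```

Under this assumed order the range [a-z] covers every letter except Z, which is why the tests stop relying on range expressions outside the C/POSIX locale.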
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ffa8106c727607fb365f2b93649fe3ea182dffe4
commit ffa8106c727607fb365f2b93649fe3ea182dffe4
Author: Mike FABIAN <mfabian@redhat.com>
Date: Fri Dec 15 07:19:45 2017 +0100
Fix posix/bug-regex5.c test case, adapt to iso14651_t1_common update
This test case tests how many collating elements are defined in
da_DK.ISO-8859-1 locale. The da_DK locale source defines 4:
collating-element <A-A> from "<U0041><U0041>"
collating-element <A-a> from "<U0041><U0061>"
collating-element <a-A> from "<U0061><U0041>"
collating-element <a-a> from "<U0061><U0061>"
The new iso14651_t1_common file defines more collating elements, two
of them are in the ISO-8859-1 range:
collating-element <U004C_00B7> from "<U004C><U00B7>" % decomposition of
LATIN CAPITAL LETTER L WITH MIDDLE DOT
collating-element <U006C_00B7> from "<U006C><U00B7>" % decomposition of
LATIN SMALL LETTER L WITH MIDDLE DOT
So the total count is now 6 instead of 4.
* posix/bug-regex5.c: Fix test case because with the new
iso14651_t1_common file, the da_DK locale now has 6 collating
elements in the ISO-8859-1 range instead of 4 with the old
iso14651_t1_common file.
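The count of 6 can be checked with a short sketch. This is only an illustration of the arithmetic: the real test inspects the compiled locale, whereas the code below filters the six definition lines quoted in the commit message and keeps those whose code points all fit in ISO-8859-1 (at most U+00FF). The helper name latin1_elements is hypothetical.

```python
import re

# The six definitions quoted in the commit message (comments omitted).
DEFINITIONS = """\
collating-element <A-A> from "<U0041><U0041>"
collating-element <A-a> from "<U0041><U0061>"
collating-element <a-A> from "<U0061><U0041>"
collating-element <a-a> from "<U0061><U0061>"
collating-element <U004C_00B7> from "<U004C><U00B7>"
collating-element <U006C_00B7> from "<U006C><U00B7>"
"""

CODE_POINT = re.compile(r"<U([0-9A-F]{4,8})>")


def latin1_elements(text):
    """Names of collating elements built only from code points <= U+00FF."""
    names = []
    for line in text.splitlines():
        if not line.startswith("collating-element"):
            continue
        head, _, source = line.partition(" from ")
        points = [int(h, 16) for h in CODE_POINT.findall(source)]
        if points and all(p <= 0xFF for p in points):
            names.append(head.split()[1])
    return names


print(len(latin1_elements(DEFINITIONS)))
```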
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=61e613fb97aa619ae4fabac3f106d5fffe15eacb
commit 61e613fb97aa619ae4fabac3f106d5fffe15eacb
Author: Mike FABIAN <mfabian@redhat.com>
Date: Wed Dec 13 14:39:54 2017 +0100
Collation order of @-. and space has changed in new iso14651_t1_common
file, adapt test files
* localedata/da_DK.ISO-8859-1.in: In the new iso14651_t1_common file
downloaded from ISO, the collation order of @-. and space has
changed. Therefore, this test file needed to be adapted.
* localedata/fr_CA.UTF-8.in: Likewise.
* localedata/fr_FR.UTF-8.in: Likewise.
* localedata/uk_UA.UTF-8.in: Likewise.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=059454de60bdb1be9979ee09596c1e9a7e9e6c8b
commit 059454de60bdb1be9979ee09596c1e9a7e9e6c8b
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Dec 12 14:39:34 2017 +0100
Collation order of ȥ has changed in new iso14651_t1_common file, adapt test
files
* localedata/cs_CZ.UTF-8.in: adapt this test file to the collation
order of ȥ in the new iso14651_t1_common file.
* localedata/pl_PL.UTF-8.in: Likewise.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1f4df3bb2ac69f2e1947c2953379a7f19b5f0c35
commit 1f4df3bb2ac69f2e1947c2953379a7f19b5f0c35
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 15:45:05 2018 +0100
Add sections for various scripts to the iso14651_t1_common file
* localedata/locales/iso14651_t1_common: Add sections for various
scripts to the iso14651_t1_common file.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a93fecdcece3e2178834f4b4868b2309b0158753
commit a93fecdcece3e2178834f4b4868b2309b0158753
Author: Mike FABIAN <mfabian@redhat.com>
Date: Wed Jan 31 06:18:47 2018 +0100
iso14651_t1_common: make the fourth level the codepoint for characters
which are ignorable on all 4 levels
Entries for characters which have “IGNORE” on all 4 levels like:
<U0001> IGNORE;IGNORE;IGNORE;IGNORE % START OF HEADING (in ISO 6429)
are changed into:
<U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING (in ISO 6429)
i.e. putting the code point of the character into the fourth level
instead of “IGNORE”. Without that change, all such characters
would compare equal which would make a wcscoll test case fail.
It is better to have a clearly defined sort order even for characters
like this so it is good to use the code point as a tie-break.
* localedata/locales/iso14651_t1_common: Use the code point of a
character in the fourth collation level instead of IGNORE for all
entries which have IGNORE on all 4 levels.
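Why the fourth-level code point acts as a tie-break can be sketched as follows. This is a simplified model, not glibc's collation code: each character maps to four weights, None stands for IGNORE, and a key is built per level from the non-ignored weights. The tables and the key function are assumptions made for the illustration.

```python
IGNORE = None  # stands for the IGNORE keyword in the locale source


def key(text, table):
    """Build a four-level key: one tuple of non-ignored weights per level."""
    levels = []
    for level in range(4):
        weights = [table[ch][level] for ch in text]
        levels.append(tuple(w for w in weights if w is not IGNORE))
    return tuple(levels)


# Old style: ignorable on all four levels.
old_table = {"\u0001": (IGNORE,) * 4, "\u0002": (IGNORE,) * 4}
# New style: the code point is used as the fourth-level weight.
new_table = {ch: (IGNORE, IGNORE, IGNORE, ord(ch)) for ch in old_table}

print(key("\u0001", old_table) == key("\u0002", old_table))  # keys are identical
print(key("\u0001", new_table) < key("\u0002", new_table))   # keys now differ
```

With the old entries both characters produce an empty key and compare equal; with the code point on the fourth level they get a defined order.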
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=3e7089bf28ed1fd77e644bb3ce7405aff7847e61
commit 3e7089bf28ed1fd77e644bb3ce7405aff7847e61
Author: Mike FABIAN <mfabian@redhat.com>
Date: Mon Dec 11 20:00:24 2017 +0100
Add convenience symbols like <AFTER-A>, <BEFORE-A> to iso14651_t1_common
* localedata/locales/iso14651_t1_common: Add some convenient collation
symbols like <AFTER-A>, <BEFORE-A> to make tailoring easier using
rules similar to those in CLDR.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=50a54ba443575e69ffb03aa67d53ccf8b66a4fbd
commit 50a54ba443575e69ffb03aa67d53ccf8b66a4fbd
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 18:24:47 2018 +0100
Fixing syntax errors after updating the iso14651_t1_common file
* localedata/locales/iso14651_t1_common: The new version of this
file downloaded from ISO contained several syntax errors which
are fixed by this patch.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=661ab21c7521ba8e6e8bc7dad897b6cf162e0cd0
commit 661ab21c7521ba8e6e8bc7dad897b6cf162e0cd0
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 18:07:39 2018 +0100
iso14651_t1_common: <U\([0-9A-F][0-9A-F][0-9A-F][0-9A-F][0-9A-F]\)> →
<U000\1>
* localedata/locales/iso14651_t1_common: replace all <U.....>
with <U000.....> because glibc understands only 4 digit or 8 digit
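The substitution in the commit title is written in sed-style syntax. A rough Python equivalent, given only as a sketch (the function name pad_code_points is made up here), pads 5-digit code point symbols to 8 digits and leaves 4-digit and 8-digit symbols untouched:

```python
import re

# Matches <U.....> with exactly five upper-case hexadecimal digits.
FIVE_DIGIT = re.compile(r"<U([0-9A-F]{5})>")


def pad_code_points(text):
    """Rewrite <UXXXXX> as <U000XXXXX>; other symbols are left unchanged."""
    return FIVE_DIGIT.sub(r"<U000\1>", text)


print(pad_code_points("<U1D400> <U0041>"))
```

Running the function a second time changes nothing, because the padded symbols no longer match the 5-digit pattern.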
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=06061c30d615b2862ac360f11384092c92022ea7
commit 06061c30d615b2862ac360f11384092c92022ea7
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 18:04:31 2018 +0100
Necessary changes after updating the iso14651_t1_common file
* localedata/locales/iso14651_t1_common: Necessary changes
to make the file downloaded from ISO usable by glibc.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=bc1d41044c0cf9f0214acdbfd79b6cd11fd1e8c1
commit bc1d41044c0cf9f0214acdbfd79b6cd11fd1e8c1
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jan 30 17:59:00 2018 +0100
Update iso14651_t1_common file to ISO14651_2016_TABLE1_en.txt [BZ #14095]
[BZ #14095] - Review / update collation data from Unicode / ISO 14651
File downloaded from:
http://standards.iso.org/iso-iec/14651/ed-4/ISO14651_2016_TABLE1_en.txt
Updating this file alone is not enough, there are problems in the new
file which need to be fixed and the collation rules for many locales
need to be adapted. This is done by the following patches.
This update also fixes the problem that many characters are treated as
identical when sorting because they were not yet in the old
iso14651_t1_common file, see:
https://bugzilla.redhat.com/show_bug.cgi?id=1336308
- Infinite (∞) and empty set (∅) are treated as if they were the same
character by sort and uniq
[BZ #14095]
* localedata/locales/iso14651_t1_common: Update file to
latest version from ISO (ISO14651_2016_TABLE1_en.txt).
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=16e349c550942d274d3193ccedaa88855e3ac690
commit 16e349c550942d274d3193ccedaa88855e3ac690
Author: Mike FABIAN <mfabian@redhat.com>
Date: Fri Mar 2 11:29:24 2018 +0100
Remove --quiet argument when installing locales
Using this argument hides problems. I would like to see when something
fails.
* localedata/Makefile: Remove --quiet argument when
installing locales
-----------------------------------------------------------------------
* [Bug localedata/22550] es_ES locale (and other es_* locales): collation should treat ñ as a primary different character, sync the collation for Spanish with CLDR
2017-12-05 14:32 [Bug localedata/22550] New: es_ES locale (and other es_* locales): collation should treat ñ as a primary different character, sync the collation for Spanish with CLDR maiku.fabian at gmail dot com
` (18 preceding siblings ...)
2018-03-02 12:59 ` cvs-commit at gcc dot gnu.org
@ 2018-03-31 12:32 ` jeremip11 at gmail dot com
19 siblings, 0 replies; 21+ messages in thread
From: jeremip11 at gmail dot com @ 2018-03-31 12:32 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=22550
Jeremi <jeremip11 at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |jeremip11 at gmail dot com
end of thread, other threads:[~2018-03-31 12:32 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-05 14:32 [Bug localedata/22550] New: es_ES locale (and other es_* locales): collation should treat ñ as a primary different character, sync the collation for Spanish with CLDR maiku.fabian at gmail dot com
2017-12-05 14:38 ` [Bug localedata/22550] " maiku.fabian at gmail dot com
2017-12-05 14:46 ` maiku.fabian at gmail dot com
2017-12-05 15:18 ` carlos at redhat dot com
2017-12-05 15:20 ` carlos at redhat dot com
2017-12-05 15:31 ` carlos at redhat dot com
2017-12-05 15:47 ` carlos at redhat dot com
2017-12-05 16:13 ` hector.monacci at gmail dot com
2017-12-06 7:12 ` maiku.fabian at gmail dot com
2017-12-06 8:38 ` maiku.fabian at gmail dot com
2017-12-06 10:05 ` hector.monacci at gmail dot com
2017-12-06 10:28 ` hector.monacci at gmail dot com
2017-12-06 10:37 ` hector.monacci at gmail dot com
2017-12-09 11:26 ` hector.monacci at gmail dot com
2017-12-09 11:32 ` hector.monacci at gmail dot com
2017-12-09 11:35 ` maiku.fabian at gmail dot com
2017-12-20 14:31 ` maiku.fabian at gmail dot com
2018-02-27 16:55 ` cvs-commit at gcc dot gnu.org
2018-02-28 14:13 ` maiku.fabian at gmail dot com
2018-03-02 12:59 ` cvs-commit at gcc dot gnu.org
2018-03-31 12:32 ` jeremip11 at gmail dot com