public inbox for glibc-bugs@sourceware.org
* [Bug localedata/672] New: Include iso14651_t1 in collation rules
@ 2005-01-16 8:22 barbier at linuxfr dot org
2005-01-16 8:24 ` [Bug localedata/672] " barbier at linuxfr dot org
` (9 more replies)
0 siblings, 10 replies; 11+ messages in thread
From: barbier at linuxfr dot org @ 2005-01-16 8:22 UTC (permalink / raw)
To: glibc-bugs
The rationale was given by Pablo in BZ#664.
I tested that sequences of 2 'alnum' characters produce the same
sorted output. Ligatures and expanded characters have different
weights, so there are some minor changes when checking with more
than 2 characters.
Extra rules are added to mimic current behavior; I did not fix any
supposed errors.
--
Summary: Include iso14651_t1 in collation rules
Product: glibc
Version: 2.3.4
Status: NEW
Severity: normal
Priority: P2
Component: localedata
AssignedTo: pere at hungry dot com
ReportedBy: barbier at linuxfr dot org
CC: glibc-bugs at sources dot redhat dot com
http://sources.redhat.com/bugzilla/show_bug.cgi?id=672
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
* [Bug localedata/672] Include iso14651_t1 in collation rules
2005-01-16 8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
@ 2005-01-16 8:24 ` barbier at linuxfr dot org
2005-09-24 19:06 ` drepper at redhat dot com
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: barbier at linuxfr dot org @ 2005-01-16 8:24 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From barbier at linuxfr dot org 2005-01-16 08:24 -------
Created an attachment (id=368)
--> (http://sources.redhat.com/bugzilla/attachment.cgi?id=368&action=view)
Patch to include iso14651_t1 in LC_COLLATE
--
http://sources.redhat.com/bugzilla/show_bug.cgi?id=672
* [Bug localedata/672] Include iso14651_t1 in collation rules
2005-01-16 8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
2005-01-16 8:24 ` [Bug localedata/672] " barbier at linuxfr dot org
@ 2005-09-24 19:06 ` drepper at redhat dot com
2005-10-26 22:08 ` barbier at linuxfr dot org
` (7 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: drepper at redhat dot com @ 2005-09-24 19:06 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From drepper at redhat dot com 2005-09-24 19:06 -------
How did you verify nothing changed?
--
What |Removed |Added
----------------------------------------------------------------------------
CC| |drepper at redhat dot com
Status|NEW |WAITING
http://sourceware.org/bugzilla/show_bug.cgi?id=672
* [Bug localedata/672] Include iso14651_t1 in collation rules
2005-01-16 8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
2005-01-16 8:24 ` [Bug localedata/672] " barbier at linuxfr dot org
2005-09-24 19:06 ` drepper at redhat dot com
@ 2005-10-26 22:08 ` barbier at linuxfr dot org
2005-10-26 22:11 ` barbier at linuxfr dot org
` (6 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: barbier at linuxfr dot org @ 2005-10-26 22:08 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From barbier at linuxfr dot org 2005-10-26 22:08 -------
> How did you verify nothing changed?
I used the attached files to check differences:
* tst-show-table-sorted.c contains 2 loops to print 2 characters per
line, and sort them according to the current locale. Only
non-ignorable and alphanumeric characters are taken into account.
* test-collate.sh
+ applies collate-iso.patch
+ modifies iso14651_t1 so that include "iso14651_t1" gives the same
ruleset as in the original locale files (this is a workaround for BZ#645)
+ compiles original and patched locales
+ runs tst-show-table-sorted with these locales
+ compares output
The only differences are with
<U00AA>: FEMININE ORDINAL INDICATOR
<U00BA>: MASCULINE ORDINAL INDICATOR
<U00DF>: LATIN SMALL LETTER SHARP S
Some locales also have differences with respect to
<U00D0>: LATIN CAPITAL LETTER ETH
<U00F0>: LATIN SMALL LETTER ETH
<U00DE>: LATIN CAPITAL LETTER THORN
<U00FE>: LATIN SMALL LETTER THORN
but in such cases, these characters are not commonly used in the locale concerned.
See the end of test-collate.sh for exhaustive results.
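
The check described above can be approximated with a short sketch. This is
not the attached tst-show-table-sorted.c or test-collate.sh; it is a Python
illustration under stated assumptions: the helper names `sorted_pairs` and
`differing_positions` are hypothetical, the character set is passed in by
the caller instead of being derived from the locale's non-ignorable
alphanumeric characters, and both locales are assumed to be installed.

```python
import functools
import itertools
import locale


def sorted_pairs(chars, locale_name):
    """Return every 2-character string over `chars`, sorted by LC_COLLATE.

    `chars` stands in for the set of non-ignorable alphanumeric
    characters; how that set is obtained is outside this sketch.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # Two nested loops over the character set, as in the description.
        pairs = ["".join(p) for p in itertools.product(chars, repeat=2)]
        return sorted(pairs, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


def differing_positions(chars, locale_a, locale_b):
    """List (index, pair_a, pair_b) wherever the two sorted outputs differ."""
    a = sorted_pairs(chars, locale_a)
    b = sorted_pairs(chars, locale_b)
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

Calling `differing_positions(chars, original, patched)` with the names of an
original and a patched locale plays the role of the "compares output" step;
an empty list means the two locales order all two-character strings
identically.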
--
http://sourceware.org/bugzilla/show_bug.cgi?id=672
* [Bug localedata/672] Include iso14651_t1 in collation rules
2005-01-16 8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
` (2 preceding siblings ...)
2005-10-26 22:08 ` barbier at linuxfr dot org
@ 2005-10-26 22:11 ` barbier at linuxfr dot org
2005-10-26 22:12 ` barbier at linuxfr dot org
` (5 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: barbier at linuxfr dot org @ 2005-10-26 22:11 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From barbier at linuxfr dot org 2005-10-26 22:11 -------
Created an attachment (id=728)
--> (http://sourceware.org/bugzilla/attachment.cgi?id=728&action=view)
Program to display and sort all combinations of 2 characters
--
http://sourceware.org/bugzilla/show_bug.cgi?id=672
* [Bug localedata/672] Include iso14651_t1 in collation rules
2005-01-16 8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
` (3 preceding siblings ...)
2005-10-26 22:11 ` barbier at linuxfr dot org
@ 2005-10-26 22:12 ` barbier at linuxfr dot org
2006-05-12 10:05 ` pablo at mandriva dot com
` (4 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: barbier at linuxfr dot org @ 2005-10-26 22:12 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From barbier at linuxfr dot org 2005-10-26 22:12 -------
Created an attachment (id=729)
--> (http://sourceware.org/bugzilla/attachment.cgi?id=729&action=view)
Script to compare output of tst-show-table-sorted with original and patched
locales
--
http://sourceware.org/bugzilla/show_bug.cgi?id=672
* [Bug localedata/672] Include iso14651_t1 in collation rules
2005-01-16 8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
` (4 preceding siblings ...)
2005-10-26 22:12 ` barbier at linuxfr dot org
@ 2006-05-12 10:05 ` pablo at mandriva dot com
2006-05-12 10:18 ` pablo at mandriva dot com
` (3 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: pablo at mandriva dot com @ 2006-05-12 10:05 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From pablo at mandriva dot com 2006-05-12 10:05 -------
Created an attachment (id=1018)
--> (http://sourceware.org/bugzilla/attachment.cgi?id=1018&action=view)
improved iso14651_t1 file
improved iso14651_t1 file; changes are:
- converted to UTF-8 (for text in comments)
- added Armenian script block, with proper sorting
- added Tifinagh script block
- added a whole lot of Latin and Cyrillic script letters,
so they are "properly" sorted (not at random positions
before "0" or after "z"; for example, "e with dot below"
is sorted as "e", etc.)
--
http://sourceware.org/bugzilla/show_bug.cgi?id=672
* [Bug localedata/672] Include iso14651_t1 in collation rules
2005-01-16 8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
` (5 preceding siblings ...)
2006-05-12 10:05 ` pablo at mandriva dot com
@ 2006-05-12 10:18 ` pablo at mandriva dot com
2006-05-12 14:33 ` pablo at mandriva dot com
` (2 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: pablo at mandriva dot com @ 2006-05-12 10:18 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From pablo at mandriva dot com 2006-05-12 10:17 -------
Using iso14651_t1 by default, and then redefining or adding only the local
rules that are needed, is indeed much better than redefining everything in a
locale; as the redefined part is much smaller, it helps in understanding the
important rules, and in detecting and correcting errors more easily.
It also allows characters outside the scope of the locale to be sorted in a
predictable way, which is a very nice thing to have.
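
The "shared default plus a few local rules" idea can be illustrated with a
toy sketch. It does not use the locale definition file syntax, and nothing
in it is taken from iso14651_t1: the weight table, the single override and
the helper name `make_key` are all hypothetical, chosen only to show why the
locale-specific part stays small.

```python
from collections import ChainMap

# Hypothetical shared default table: character -> primary weight.
DEFAULT_WEIGHTS = {ch: i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}
# Assumed default rule: a with diaeresis sorts together with plain "a".
DEFAULT_WEIGHTS["\u00e4"] = DEFAULT_WEIGHTS["a"]

# Hypothetical local rule: this locale places a with diaeresis after "z".
LOCAL_OVERRIDES = {"\u00e4": len(DEFAULT_WEIGHTS)}


def make_key(overrides):
    """Build a sort key that consults local rules first, then the defaults."""
    table = ChainMap(overrides, DEFAULT_WEIGHTS)

    def key(word):
        # Characters missing from both tables raise KeyError in this sketch.
        return [table[ch] for ch in word]

    return key


words = ["zon", "\u00e4r", "bil"]
print(sorted(words, key=make_key({})))               # defaults only
print(sorted(words, key=make_key(LOCAL_OVERRIDES)))  # with the local rule
```

Only the one overridden character has to be stated for the "locale"; every
other character, including ones the locale author never thought about,
keeps its position from the shared table.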
I attached an improved iso14651_t1 that adds a lot of other Latin and Cyrillic
characters that were missing, so they get sorted too; it also handles double
accented letters (as in Vietnamese); adds Armenian and Tifinagh script
blocks; considers t/s with cedilla and t/s with comma below as synonyms for
sorting; and makes digraphs (as opposed to ligatures) synonyms of the base
letters for sorting.
It provides a much better default collating set.
Note that with the exception of t/s with cedilla and t/s with comma below and
the digraphs (which are Unicode compatibility stuff and should never be typed
directly btw), I mainly added new, previously ignored, characters.
The main advantage of that modified file is a proper (or at least quite
acceptable) sorting when using a generic (i.e. not specific to that language)
locale, in particular when sorting words from Armenian, Vietnamese, African or
Native American languages written in Latin script, or languages of the former
USSR written in Cyrillic script.
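
The effect on accented Latin letters can be sketched as follows. This is
only an illustration of the intended ordering, not how iso14651_t1 works:
the real file lists explicit weights per character, whereas this sketch
derives a base letter by Unicode decomposition. The helper names
`base_letters` and `collation_key` and the sample words are assumptions.

```python
import unicodedata


def base_letters(text):
    """Strip combining marks so that e.g. U+1EB9 compares as plain 'e'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def collation_key(text):
    # First level: base letters, case-folded.
    # Second level: the decomposed string, used only to break ties.
    return (base_letters(text).casefold(), unicodedata.normalize("NFD", text))


# "\u1eb9" is e with dot below; "\u1ec7" is e with circumflex and dot below.
words = ["zebra", "\u1eb9ja", "eta", "Vi\u1ec7t", "vie"]
print(sorted(words))                     # plain code point order
print(sorted(words, key=collation_key))  # accented letters sort with the base
```

In plain code point order the word starting with e with dot below lands
after "zebra"; with the key above it sorts among the other words starting
with "e", and the doubly accented Vietnamese letter is treated as "e" too.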
--
http://sourceware.org/bugzilla/show_bug.cgi?id=672
* [Bug localedata/672] Include iso14651_t1 in collation rules
2005-01-16 8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
` (6 preceding siblings ...)
2006-05-12 10:18 ` pablo at mandriva dot com
@ 2006-05-12 14:33 ` pablo at mandriva dot com
2006-05-12 16:31 ` petrosyan at gmail dot com
2007-02-18 4:35 ` drepper at redhat dot com
9 siblings, 0 replies; 11+ messages in thread
From: pablo at mandriva dot com @ 2006-05-12 14:33 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From pablo at mandriva dot com 2006-05-12 14:32 -------
Created an attachment (id=1019)
--> (http://sourceware.org/bugzilla/attachment.cgi?id=1019&action=view)
improved iso14651_t1 file
(fixed a small problem: there were two defined symbols that were unused)
--
What |Removed |Added
----------------------------------------------------------------------------
Attachment #1018 is|0 |1
obsolete| |
http://sourceware.org/bugzilla/show_bug.cgi?id=672
* [Bug localedata/672] Include iso14651_t1 in collation rules
2005-01-16 8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
` (7 preceding siblings ...)
2006-05-12 14:33 ` pablo at mandriva dot com
@ 2006-05-12 16:31 ` petrosyan at gmail dot com
2007-02-18 4:35 ` drepper at redhat dot com
9 siblings, 0 replies; 11+ messages in thread
From: petrosyan at gmail dot com @ 2006-05-12 16:31 UTC (permalink / raw)
To: glibc-bugs
--
What |Removed |Added
----------------------------------------------------------------------------
CC| |petrosyan at gmail dot com
http://sourceware.org/bugzilla/show_bug.cgi?id=672
* [Bug localedata/672] Include iso14651_t1 in collation rules
2005-01-16 8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
` (8 preceding siblings ...)
2006-05-12 16:31 ` petrosyan at gmail dot com
@ 2007-02-18 4:35 ` drepper at redhat dot com
9 siblings, 0 replies; 11+ messages in thread
From: drepper at redhat dot com @ 2007-02-18 4:35 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From drepper at redhat dot com 2007-02-18 04:34 -------
I've added the latest iso14651_t1 and then changed the locale definitions.
Please check whether this is all that's needed.
--
What |Removed |Added
----------------------------------------------------------------------------
Status|WAITING |RESOLVED
Resolution| |FIXED
http://sourceware.org/bugzilla/show_bug.cgi?id=672
Thread overview: 11+ messages
2005-01-16 8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
2005-01-16 8:24 ` [Bug localedata/672] " barbier at linuxfr dot org
2005-09-24 19:06 ` drepper at redhat dot com
2005-10-26 22:08 ` barbier at linuxfr dot org
2005-10-26 22:11 ` barbier at linuxfr dot org
2005-10-26 22:12 ` barbier at linuxfr dot org
2006-05-12 10:05 ` pablo at mandriva dot com
2006-05-12 10:18 ` pablo at mandriva dot com
2006-05-12 14:33 ` pablo at mandriva dot com
2006-05-12 16:31 ` petrosyan at gmail dot com
2007-02-18 4:35 ` drepper at redhat dot com