public inbox for glibc-bugs@sourceware.org
* [Bug localedata/672] New: Include iso14651_t1 in collation rules
@ 2005-01-16  8:22 barbier at linuxfr dot org
  2005-01-16  8:24 ` [Bug localedata/672] " barbier at linuxfr dot org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: barbier at linuxfr dot org @ 2005-01-16  8:22 UTC
  To: glibc-bugs

The rationale was given by Pablo in BZ#664.

I tested that sequences of 2 'alnum' characters produce the same
sorted output.  Ligatures and expanded characters have different
weights, so there are some minor changes when checking with more
than 2 characters.
Extra rules are added to mimic current behavior; I did not fix any
supposed errors.

-- 
           Summary: Include iso14651_t1 in collation rules
           Product: glibc
           Version: 2.3.4
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
        AssignedTo: pere at hungry dot com
        ReportedBy: barbier at linuxfr dot org
                CC: glibc-bugs at sources dot redhat dot com


http://sources.redhat.com/bugzilla/show_bug.cgi?id=672

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.



* [Bug localedata/672] Include iso14651_t1 in collation rules
  2005-01-16  8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
@ 2005-01-16  8:24 ` barbier at linuxfr dot org
  2005-09-24 19:06 ` drepper at redhat dot com
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: barbier at linuxfr dot org @ 2005-01-16  8:24 UTC
  To: glibc-bugs


------- Additional Comments From barbier at linuxfr dot org  2005-01-16 08:24 -------
Created an attachment (id=368)
 --> (http://sources.redhat.com/bugzilla/attachment.cgi?id=368&action=view)
Patch to include iso14651_t1 in LC_COLLATE


-- 


http://sources.redhat.com/bugzilla/show_bug.cgi?id=672


* [Bug localedata/672] Include iso14651_t1 in collation rules
  2005-01-16  8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
  2005-01-16  8:24 ` [Bug localedata/672] " barbier at linuxfr dot org
@ 2005-09-24 19:06 ` drepper at redhat dot com
  2005-10-26 22:08 ` barbier at linuxfr dot org
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: drepper at redhat dot com @ 2005-09-24 19:06 UTC
  To: glibc-bugs


------- Additional Comments From drepper at redhat dot com  2005-09-24 19:06 -------
How did you verify nothing changed?

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |drepper at redhat dot com
             Status|NEW                         |WAITING


http://sourceware.org/bugzilla/show_bug.cgi?id=672


* [Bug localedata/672] Include iso14651_t1 in collation rules
  2005-01-16  8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
  2005-01-16  8:24 ` [Bug localedata/672] " barbier at linuxfr dot org
  2005-09-24 19:06 ` drepper at redhat dot com
@ 2005-10-26 22:08 ` barbier at linuxfr dot org
  2005-10-26 22:11 ` barbier at linuxfr dot org
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: barbier at linuxfr dot org @ 2005-10-26 22:08 UTC
  To: glibc-bugs


------- Additional Comments From barbier at linuxfr dot org  2005-10-26 22:08 -------
> How did you verify nothing changed?

I used the attached files to check differences:
  * tst-show-table-sorted.c contains 2 loops to print 2 characters per
    line, and sort them according to the current locale.  Only
    non-ignorable and alphanumeric characters are taken into account.
  * test-collate.sh
    + applies collate-iso.patch
    + modifies iso14651_t1 so that include "iso14651_t1" gives the same
      ruleset as in the original locale files (this is a workaround for BZ#645)
    + compiles original and patched locales
    + runs tst-show-table-sorted with these locales
    + compares output
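
As an illustration of what the first attachment does, here is a minimal
Python sketch. It is not the actual tst-show-table-sorted.c: the names
sorted_pairs and locale_name are hypothetical, and it works on whatever
small alphabet is passed in rather than on all non-ignorable alphanumeric
characters.

```python
import locale
from functools import cmp_to_key


def sorted_pairs(alphabet, locale_name=None):
    """Return every 2-character string over `alphabet`, sorted with strcoll.

    `locale_name` is a hypothetical convenience parameter: when given, it is
    passed to setlocale() for LC_COLLATE before sorting.
    """
    if locale_name is not None:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    # Two nested loops, one per character position.
    pairs = [first + second for first in alphabet for second in alphabet]
    # locale.strcoll compares strings under the current LC_COLLATE rules.
    return sorted(pairs, key=cmp_to_key(locale.strcoll))
```

Calling sorted_pairs("aAbB", "") would sort under the locale named in the
environment; printing one pair per line gives output that can be compared
between the original and the patched locale.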

The only differences are with 
   <U00AA>: FEMININE ORDINAL INDICATOR
   <U00BA>: MASCULINE ORDINAL INDICATOR
   <U00DF>: LATIN SMALL LETTER SHARP S
Some locales also show differences with respect to
   <U00D0>: LATIN CAPITAL LETTER ETH
   <U00F0>: LATIN SMALL LETTER ETH
   <U00DE>: LATIN CAPITAL LETTER THORN
   <U00FE>: LATIN SMALL LETTER THORN
but in such cases, these characters are not commonly used in that locale.
See the end of test-collate.sh for exhaustive results.
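
The final "compares output" step can be pictured with the short Python
sketch below. It is only an assumption about how such a comparison could be
done, not the contents of test-collate.sh; the function name moved_lines and
its arguments are hypothetical. It takes the two sorted outputs as lists of
lines and returns the lines that a diff would report as removed or added.

```python
import difflib


def moved_lines(original_output, patched_output):
    """Return the lines that differ between two sorted outputs."""
    matcher = difflib.SequenceMatcher(
        a=original_output, b=patched_output, autojunk=False
    )
    moved = set()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Anything that is not an "equal" run was deleted, inserted or replaced.
        if tag != "equal":
            moved.update(original_output[i1:i2])
            moved.update(patched_output[j1:j2])
    return moved
```

The characters appearing in the returned lines are the candidates whose
collation position changed between the two locale builds.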


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=672


* [Bug localedata/672] Include iso14651_t1 in collation rules
  2005-01-16  8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
                   ` (2 preceding siblings ...)
  2005-10-26 22:08 ` barbier at linuxfr dot org
@ 2005-10-26 22:11 ` barbier at linuxfr dot org
  2005-10-26 22:12 ` barbier at linuxfr dot org
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: barbier at linuxfr dot org @ 2005-10-26 22:11 UTC
  To: glibc-bugs


------- Additional Comments From barbier at linuxfr dot org  2005-10-26 22:11 -------
Created an attachment (id=728)
 --> (http://sourceware.org/bugzilla/attachment.cgi?id=728&action=view)
Program to display and sort all combinations of 2 characters


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=672


* [Bug localedata/672] Include iso14651_t1 in collation rules
  2005-01-16  8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
                   ` (3 preceding siblings ...)
  2005-10-26 22:11 ` barbier at linuxfr dot org
@ 2005-10-26 22:12 ` barbier at linuxfr dot org
  2006-05-12 10:05 ` pablo at mandriva dot com
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: barbier at linuxfr dot org @ 2005-10-26 22:12 UTC
  To: glibc-bugs


------- Additional Comments From barbier at linuxfr dot org  2005-10-26 22:12 -------
Created an attachment (id=729)
 --> (http://sourceware.org/bugzilla/attachment.cgi?id=729&action=view)
Script to compare output of tst-show-table-sorted with original and patched
locales


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=672


* [Bug localedata/672] Include iso14651_t1 in collation rules
  2005-01-16  8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
                   ` (4 preceding siblings ...)
  2005-10-26 22:12 ` barbier at linuxfr dot org
@ 2006-05-12 10:05 ` pablo at mandriva dot com
  2006-05-12 10:18 ` pablo at mandriva dot com
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pablo at mandriva dot com @ 2006-05-12 10:05 UTC
  To: glibc-bugs


------- Additional Comments From pablo at mandriva dot com  2006-05-12 10:05 -------
Created an attachment (id=1018)
 --> (http://sourceware.org/bugzilla/attachment.cgi?id=1018&action=view)
improved iso14651_t1 file

improved iso14651_t1 file; changes are:
- converted to UTF-8 (for text in comments)
- added Armenian script block, with proper sorting
- added Tifinagh script block
- added a whole lot of Latin and Cyrillic script letters,
  so they are "properly" sorted (not at random positions
  before "0" or after "z", but, for example, "e with dot below"
  sorted as "e", etc.)


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=672


* [Bug localedata/672] Include iso14651_t1 in collation rules
  2005-01-16  8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
                   ` (5 preceding siblings ...)
  2006-05-12 10:05 ` pablo at mandriva dot com
@ 2006-05-12 10:18 ` pablo at mandriva dot com
  2006-05-12 14:33 ` pablo at mandriva dot com
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: pablo at mandriva dot com @ 2006-05-12 10:18 UTC
  To: glibc-bugs


------- Additional Comments From pablo at mandriva dot com  2006-05-12 10:17 -------
Using iso14651_t1 by default, and then redefining or adding only the local
rules that are needed, is indeed much better than redefining everything in a
locale: because much less is redefined, it is easier to understand the
important rules and to detect and correct errors.
It also allows characters outside the scope of the locale to be sorted in a
predictable way, which is a very nice thing to have.

I attached an improved iso14651_t1 that adds a lot of other Latin and Cyrillic
characters that were missing, so they get sorted too; it also handles double
accented letters (as in Vietnamese), adds Armenian and Tifinagh script
blocks, considers t/s with cedilla and t/s with comma below as synonyms for
sorting, and makes digraphs (as opposed to ligatures) synonyms of the base
letters for sorting.

It provides a much better default collating set.
Note that with the exception of t/s with cedilla and t/s with comma below and
the digraphs (which are Unicode compatibility stuff and should never be typed
directly, btw), I mainly added only new, previously ignored characters.

The main advantage of the modified file is a proper (or at least quite
acceptable) sorting when using a generic locale (i.e. one not specific to that
language), in particular when sorting words from Armenian, Vietnamese, African
or Native American languages written in Latin script, or languages of the
former USSR written in Cyrillic script.
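
To make the idea of "sorted as the base letter" concrete, here is a small
Python sketch. It is only an analogy: glibc gets this behaviour from the
weights assigned in iso14651_t1, whereas this sketch approximates it by
Unicode decomposition, and the names base_letter_key and sort_words are
hypothetical.

```python
import unicodedata


def base_letter_key(text):
    """Strip combining marks so accented letters compare as their base letter."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sort_words(words):
    # First compare by base letters; fall back to the original string so that
    # words differing only in their accents still get a deterministic order.
    return sorted(words, key=lambda word: (base_letter_key(word), word))
```

With this key, "e with dot below" (U+1EB9) lands between "d" and "f" instead
of after "z", and s with cedilla (U+015F) and s with comma below (U+0219)
share the same first-level key.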


-- 


http://sourceware.org/bugzilla/show_bug.cgi?id=672


* [Bug localedata/672] Include iso14651_t1 in collation rules
  2005-01-16  8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
                   ` (6 preceding siblings ...)
  2006-05-12 10:18 ` pablo at mandriva dot com
@ 2006-05-12 14:33 ` pablo at mandriva dot com
  2006-05-12 16:31 ` petrosyan at gmail dot com
  2007-02-18  4:35 ` drepper at redhat dot com
  9 siblings, 0 replies; 11+ messages in thread
From: pablo at mandriva dot com @ 2006-05-12 14:33 UTC
  To: glibc-bugs


------- Additional Comments From pablo at mandriva dot com  2006-05-12 14:32 -------
Created an attachment (id=1019)
 --> (http://sourceware.org/bugzilla/attachment.cgi?id=1019&action=view)
improved iso14651_t1 file

(fixed a small problem: there were two defined symbols that were unused)

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1018 is|0                           |1
           obsolete|                            |


http://sourceware.org/bugzilla/show_bug.cgi?id=672


* [Bug localedata/672] Include iso14651_t1 in collation rules
  2005-01-16  8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
                   ` (7 preceding siblings ...)
  2006-05-12 14:33 ` pablo at mandriva dot com
@ 2006-05-12 16:31 ` petrosyan at gmail dot com
  2007-02-18  4:35 ` drepper at redhat dot com
  9 siblings, 0 replies; 11+ messages in thread
From: petrosyan at gmail dot com @ 2006-05-12 16:31 UTC
  To: glibc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |petrosyan at gmail dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=672


* [Bug localedata/672] Include iso14651_t1 in collation rules
  2005-01-16  8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
                   ` (8 preceding siblings ...)
  2006-05-12 16:31 ` petrosyan at gmail dot com
@ 2007-02-18  4:35 ` drepper at redhat dot com
  9 siblings, 0 replies; 11+ messages in thread
From: drepper at redhat dot com @ 2007-02-18  4:35 UTC
  To: glibc-bugs


------- Additional Comments From drepper at redhat dot com  2007-02-18 04:34 -------
I've added the latest iso14651_t1 and then changed the locale definitions.
Please check whether this is all that's needed.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|                            |FIXED


http://sourceware.org/bugzilla/show_bug.cgi?id=672


end of thread

Thread overview: 11+ messages
2005-01-16  8:22 [Bug localedata/672] New: Include iso14651_t1 in collation rules barbier at linuxfr dot org
2005-01-16  8:24 ` [Bug localedata/672] " barbier at linuxfr dot org
2005-09-24 19:06 ` drepper at redhat dot com
2005-10-26 22:08 ` barbier at linuxfr dot org
2005-10-26 22:11 ` barbier at linuxfr dot org
2005-10-26 22:12 ` barbier at linuxfr dot org
2006-05-12 10:05 ` pablo at mandriva dot com
2006-05-12 10:18 ` pablo at mandriva dot com
2006-05-12 14:33 ` pablo at mandriva dot com
2006-05-12 16:31 ` petrosyan at gmail dot com
2007-02-18  4:35 ` drepper at redhat dot com
