public inbox for glibc-bugs@sourceware.org
* [Bug libc/645] New: localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-01-07 22:49 UTC
  To: glibc-bugs

Executive summary: several bugs in ld-collate.c make localedef produce
  wrong collation data; here is a detailed analysis and a patch.

Sorting with French locales is special because diacritics are compared
from right to left, as described in ISO 14651 and many other documents.
Indeed, localedata/locales/iso14651_t1 contains
  order_start <LATIN>;forward;backward;forward;forward,position
An example is available at
  http://www.open-std.org/jtc1/sc22/wg20/docs/n602.htm#AnnexC
but fr_FR sorts this text as if the backward directive had no effect.

I wrote simple tests to debug this problem.  The xx_XX.tmpl locale file
defines the characters a and A with the rule forward;forward;forward;forward,
and b and B with the rule forward;backward;forward;forward.
The tst-coll-rule program takes pairs of characters (with the same
primary weight but different secondary weights) as arguments, and
prints the direction of the second level (f=forward, b=backward) for
each pair.
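The tst-coll-rule source is only available as attachment 342 and is not reproduced here.  As a hypothetical sketch of how such a probe can work (not the attached program), assume the locale has already been selected with setlocale and that the first character of each pair sorts before the second at the secondary level.  The strings "aA" and "Aa" then tie at the primary level, so strcoll is decided by the secondary level, and its sign shows which end of the string was compared first.  The name level2_direction is made up for this example.

```c
#include <string.h>

/* Hypothetical probe, not the attached tst-coll-rule.c.
   Assumptions: setlocale (LC_ALL, "") has already been called, LO and HI
   share a primary weight, and LO sorts before HI at the secondary level.
   Returns 'f' if the secondary level is compared forward, 'b' if backward.  */
char level2_direction (char lo, char hi)
{
  /* Both strings contain the same characters, so they tie at level 1.  */
  char lohi[3] = { lo, hi, '\0' };
  char hilo[3] = { hi, lo, '\0' };

  /* Forward: the first position decides (LO before HI), so "LO HI" < "HI LO".
     Backward: the last position decides (HI vs LO), so "HI LO" < "LO HI".  */
  return strcoll (lohi, hilo) < 0 ? 'f' : 'b';
}
```

Under these assumptions, a program taking "aA bB" on its command line would call level2_direction ('a', 'A') and level2_direction ('b', 'B') and print the two results.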
  $ export LOCPATH=$(mktemp -d /tmp/localedef.XXXXXX)
  $ localedef -i xx_XX.tmpl -f ISO-8859-1 $LOCPATH/xx_XX
  $ LC_ALL=xx_XX ./tst-coll-rule aA bB 
  bb
After swapping the definitions of S1 and S2 in xx_XX.tmpl:
  $ localedef -i xx_XX.tmpl -f ISO-8859-1 $LOCPATH/xx_XX
  $ LC_ALL=xx_XX ./tst-coll-rule aA bB 
  ff

So the last definition wins and overwrites the other one.  This is
due to the ruleset optimization in ld-collate.c: the memcmp at line 1843
compares nrules bytes instead of nrules rule entries, so it needs to be
changed from
  memcmp (osect->rules, sect->rules, nrules) == 0
to
  memcmp (osect->rules, sect->rules, nrules * sizeof (*osect->rules)) == 0
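To see why the byte count matters, here is a minimal standalone sketch.  The enum below is a hypothetical stand-in for the per-level rule values held in a section's rules array (its numeric values mirror the bytes printed by "locale collate-rulesets" further down in this report); it is not the declaration from ld-collate.c.  An enum is typically 4 bytes (an int) on common ABIs, so with four levels the buggy memcmp only inspects the first level, and rulesets that differ only at level 2 compare equal and get merged.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the stored rule values (1=forward,
   2=backward, 5=forward,position); not the real glibc type.  */
enum rule { RULE_FORWARD = 1, RULE_BACKWARD = 2, RULE_FORWARD_POSITION = 5 };

/* Two four-level rulesets that differ only at level 2.  */
const enum rule all_forward[4] =
  { RULE_FORWARD, RULE_FORWARD, RULE_FORWARD, RULE_FORWARD };
const enum rule backward_level2[4] =
  { RULE_FORWARD, RULE_BACKWARD, RULE_FORWARD, RULE_FORWARD };

/* Buggy: compares NRULES bytes, i.e. only the first element when
   sizeof (enum rule) == 4 and NRULES == 4.  */
int same_rules_buggy (const enum rule *a, const enum rule *b, size_t nrules)
{
  return memcmp (a, b, nrules) == 0;
}

/* Fixed: compares NRULES whole elements.  */
int same_rules_fixed (const enum rule *a, const enum rule *b, size_t nrules)
{
  return memcmp (a, b, nrules * sizeof (*a)) == 0;
}
```

With 4-byte enums, same_rules_buggy (all_forward, backward_level2, 4) reports a match while same_rules_fixed correctly reports a difference.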

With this patch applied and xx_XX.tmpl reverted to its original
contents, we now get:
  $ localedef -i xx_XX.tmpl -f ISO-8859-1 $LOCPATH/xx_XX
  $ LC_ALL=xx_XX ./tst-coll-rule aA bB 
  bb

Huh?  This patch alone is not enough, and more digging in
ld-collate.c is needed.  There are named sections, at most one unnamed
section (defined without a script name, e.g. order_start forward;forward),
and a symbol section, which stores symbols read before the
first rule.

The test-collate.sh shell script defines all combinations of two-level
scripts and runs tst-coll-rule to check whether the stored collation data
match their definitions.  Output fields:
  1st field: LC_COLLATE definition
    s: there is a symbol section, i.e. symbols are defined before the
       first order_start keyword.
    N: order_start <script_name>;forward;forward
    n: order_start <script_name>;forward;backward
    U: order_start forward;forward
    u: order_start forward;backward
  2nd field: output of "LC_ALL=xx_XX tst-coll-rule aA bB", or **
      when localedef segfaults.
  3rd field: expected output
  4th field: 0=match  1=mismatch  *=localedef segfaults
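The 3rd and 4th fields can be derived mechanically from the 1st and 2nd.  The helper below is a hypothetical sketch (not part of test-collate.sh) based on reading the tables: the last two letters of the definition code name the sections holding aA and bB, N and U map to f, n and u map to b, and a leading s does not change the expectation.

```c
#include <string.h>

/* Hypothetical helper, not part of test-collate.sh.
   N and U are forward;forward ('f'); any other letter (n, u) is
   forward;backward ('b').  */
static char section_direction (char section)
{
  return (section == 'N' || section == 'U') ? 'f' : 'b';
}

/* 3rd field: the last two letters of CODE (e.g. "sNn") are the sections
   holding aA and bB.  CODE must be at least two characters long.  */
void expected_output (const char *code, char out[3])
{
  size_t len = strlen (code);
  out[0] = section_direction (code[len - 2]);
  out[1] = section_direction (code[len - 1]);
  out[2] = '\0';
}

/* 4th field: '0' on match, '1' on mismatch, '*' when localedef
   segfaulted (2nd field "**").  */
char match_field (const char *code, const char *observed)
{
  char expected[3];

  if (strcmp (observed, "**") == 0)
    return '*';
  expected_output (code, expected);
  return strcmp (expected, observed) == 0 ? '0' : '1';
}
```

For instance, the cell "snU ff bf 1" in the first table corresponds to match_field ("snU", "ff") returning '1'.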

Current CVS version:
  snn bb bb 0 | sNn bb fb 1 | nn ** bb * | Nn ** fb * 
  snu bb bb 0 | sNu bb fb 1 | nu bb bb 0 | Nu bb fb 1 
  snN ff bf 1 | sNN ff ff 0 | nN ** bf * | NN ** ff * 
  snU ff bf 1 | sNU ff ff 0 | nU ff bf 1 | NU ff ff 0 
  sun bb bb 0 | sUn bb fb 1 | un bb bb 0 | Un bb fb 1 
  suN ff bf 1 | sUN ff ff 0 | uN ff bf 1 | UN ff ff 0 
After applying the one-line patch described above:
  snn bb bb 0 | sNn bb fb 1 | nn ** bb * | Nn ** fb *
  snu bb bb 0 | sNu fb fb 0 | nu bb bb 0 | Nu bb fb 1
  snN ff bf 1 | sNN ff ff 0 | nN ** bf * | NN ** ff *
  snU bf bf 0 | sNU ff ff 0 | nU ff bf 1 | NU ff ff 0
  sun bb bb 0 | sUn bb fb 1 | un bb bb 0 | Un bb fb 1
  suN ff bf 1 | sUN ff ff 0 | uN ff bf 1 | UN ff ff 0
After applying ld-collate.patch:
  snn bb bb 0 | sNn fb fb 0 | nn bb bb 0 | Nn fb fb 0
  snu bb bb 0 | sNu fb fb 0 | nu bb bb 0 | Nu fb fb 0
  snN bf bf 0 | sNN ff ff 0 | nN bf bf 0 | NN ff ff 0
  snU bf bf 0 | sNU ff ff 0 | nU bf bf 0 | NU ff ff 0
  sun bb bb 0 | sUn fb fb 0 | un bb bb 0 | Un fb fb 0
  suN bf bf 0 | sUN ff ff 0 | uN bf bf 0 | UN ff ff 0
which looks much better.  And indeed, my French locale now sorts
the sample file as expected, great.

-- 
           Summary: localedef does not respect rule definitions in
                    LC_COLLATE
           Product: glibc
           Version: 2.3.4
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
        AssignedTo: gotom at debian dot or dot jp
        ReportedBy: barbier at linuxfr dot org
                CC: glibc-bugs at sources dot redhat dot com


http://sources.redhat.com/bugzilla/show_bug.cgi?id=645

------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.



* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-01-07 22:51 UTC
  To: glibc-bugs


------- Additional Comments From barbier at linuxfr dot org  2005-01-07 22:50 -------
Created an attachment (id=339)
 --> (http://sources.redhat.com/bugzilla/attachment.cgi?id=339&action=view)
patch giving correct results for all cases (see the last table)



* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-01-07 22:53 UTC
  To: glibc-bugs


------- Additional Comments From barbier at linuxfr dot org  2005-01-07 22:53 -------
Created an attachment (id=340)
 --> (http://sources.redhat.com/bugzilla/attachment.cgi?id=340&action=view)
sample text to check sorting in French environment, found in
http://www.open-std.org/jtc1/sc22/wg20/docs/n602.htm#AnnexC



* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-01-07 22:55 UTC
  To: glibc-bugs


------- Additional Comments From barbier at linuxfr dot org  2005-01-07 22:55 -------
Created an attachment (id=341)
 --> (http://sources.redhat.com/bugzilla/attachment.cgi?id=341&action=view)
locale file to perform tests



* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-01-07 22:56 UTC
  To: glibc-bugs


------- Additional Comments From barbier at linuxfr dot org  2005-01-07 22:56 -------
Created an attachment (id=342)
 --> (http://sources.redhat.com/bugzilla/attachment.cgi?id=342&action=view)
C source file to check rule directives in collation data



* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-01-07 22:57 UTC
  To: glibc-bugs


------- Additional Comments From barbier at linuxfr dot org  2005-01-07 22:57 -------
Created an attachment (id=343)
 --> (http://sources.redhat.com/bugzilla/attachment.cgi?id=343&action=view)
shell script to display results found in this bugreport



* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-01-09 12:08 UTC
  To: glibc-bugs


------- Additional Comments From barbier at linuxfr dot org  2005-01-09 12:08 -------
Created an attachment (id=344)
 --> (http://sources.redhat.com/bugzilla/attachment.cgi?id=344&action=view)
update for patch #339

The first chunk is new compared to the previous patch.  Previously all
characters were wrongly assigned to the same section; once this is
fixed, characters defined by reorder-after keywords must also be
assigned to the right section, otherwise strxfrm segfaults.


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #339 is|0                           |1
           obsolete|                            |



* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-01-20 14:37 UTC
  To: glibc-bugs


------- Additional Comments From barbier at linuxfr dot org  2005-01-20 14:37 -------
Created an attachment (id=381)
 --> (http://sources.redhat.com/bugzilla/attachment.cgi?id=381&action=view)
updated patch

The previous patch fixed reorder-after but broke plain definitions;
here is a better patch.


-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #344 is|0                           |1
           obsolete|                            |



* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-05-24 21:02 UTC
  To: glibc-bugs



-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |968



* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: drepper at redhat dot com @ 2005-10-15 20:51 UTC
  To: glibc-bugs



-- 
Bug 645 depends on bug 968, which changed state.

Bug 968 Summary: Integer overflow in strxfrm_l.c
http://sourceware.org/bugzilla/show_bug.cgi?id=968

           What    |Old Value                   |New Value
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-12-30  8:53 UTC
  To: glibc-bugs


------- Additional Comments From barbier at linuxfr dot org  2005-12-30 08:53 -------
With unmodified sources:
  $ LC_ALL=en_US.UTF-8 locale collate-rulesets | od -tc
  0000000 001 001 001 005  \n

This shows that only one ruleset is taken into account, whereas
the en_US definition contains the following rulesets:
  forward;backward;forward;forward,position
  forward;forward;forward;forward,position
  backward;backward;backward;forward,position

After applying the last ld-collate.patch:
  $ LC_ALL=en_US.UTF-8 locale collate-rulesets | od -tc
  0000000 001 002 001 005 001 001 001 005 002 002 002 005  \n
All rulesets are then considered.
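The od output can be read back as ruleset directives.  The sketch below assumes each byte is a set of flags where 1 means forward, 2 backward and 4 position; this is consistent with the outputs above (5 = forward,position) but is an inference, not a quote of glibc headers.  decode_ruleset is a hypothetical name, and nrules is 4 here because these locales define four levels.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical decoder: turn NRULES flag bytes into a directive string
   such as "forward;backward;forward;forward,position".
   Assumed flags: 1 = forward, 2 = backward, 4 = position.  */
void decode_ruleset (const unsigned char *rules, size_t nrules,
                     char *buf, size_t bufsize)
{
  size_t used = 0;

  buf[0] = '\0';
  /* Stop appending once the buffer is full (snprintf truncates).  */
  for (size_t i = 0; i < nrules && used < bufsize; i++)
    {
      /* Direction: the backward bit wins, otherwise forward.  */
      const char *dir = (rules[i] & 2) ? "backward" : "forward";
      /* Optional position modifier.  */
      const char *pos = (rules[i] & 4) ? ",position" : "";
      int n = snprintf (buf + used, bufsize - used, "%s%s%s",
                        i > 0 ? ";" : "", dir, pos);
      if (n < 0)
        break;
      used += (size_t) n;
    }
}
```

Decoding the patched output in chunks of four bytes gives back the three en_US rulesets listed above, in the same order.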



* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: drepper at redhat dot com @ 2007-10-02 15:54 UTC
  To: glibc-bugs


------- Additional Comments From drepper at redhat dot com  2007-10-02 15:54 -------
I've made changes to the CVS code which should also take care of this issue.

-- 
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


