public inbox for glibc-bugs@sourceware.org
* [Bug libc/645] New: localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-01-07 22:49 UTC (permalink / raw)
To: glibc-bugs
Executive summary: several bugs in ld-collate.c make localedef produce
wrong collation data. A detailed analysis and a patch follow.
Sorting with French locales is special because diacritics are considered
from right to left, as described in ISO-14651 and many other documents.
And indeed, localedata/locales/iso14651_t1 contains
order_start <LATIN>;forward;backward;forward;forward,position
An example is available at
http://www.open-std.org/jtc1/sc22/wg20/docs/n602.htm#AnnexC
and fr_FR sorts this text as if the backward directive had no effect.
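To illustrate what the backward directive is supposed to change, here is a
minimal sketch in C. It is not glibc code: the struct elem representation
and the compare_words function are hypothetical, and only two levels are
modelled (base letters, then accents).

```c
#include <stddef.h>

/* Hypothetical collation element: a base letter (primary weight) and an
   accent rank (secondary weight, 0 = unaccented). */
struct elem { char base; int accent; };

/* Compare two sequences of n elements on two levels.
   secondary_dir is 'f' (forward) or 'b' (backward). */
int compare_words(const struct elem *x, const struct elem *y, size_t n,
                  char secondary_dir)
{
    /* Level 1: base letters, always scanned forward in this sketch. */
    for (size_t i = 0; i < n; i++)
        if (x[i].base != y[i].base)
            return x[i].base < y[i].base ? -1 : 1;

    /* Level 2: accents, scanned from the start or from the end. */
    for (size_t k = 0; k < n; k++) {
        size_t i = (secondary_dir == 'b') ? n - 1 - k : k;
        if (x[i].accent != y[i].accent)
            return x[i].accent < y[i].accent ? -1 : 1;
    }
    return 0;
}
```

With "coté" encoded as accents {0,0,0,1} and "côte" as {0,1,0,0}, the
forward scan puts "coté" first, while the backward scan puts "côte" first,
which is the French order.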
I wrote simple tests to debug this problem; the xx_XX.tmpl locale file
defines a and A characters with the rule forward;forward;forward;forward,
and b, B with the rule forward;backward;forward;forward.
The tst-coll-rule program gets pairs of characters (with the same
primary level but different secondary level) as arguments, and
displays the direction of the 2nd level (f=forward, b=backward) for each
pair.
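The source of tst-coll-rule is in an attachment and not reproduced here.
The following is only a sketch of one way such a probe could work, assuming
the two characters x and y share a primary weight and differ at the second
level: comparing "xy" with "yx" gives the same sign as comparing "x" with
"y" when the second level is scanned forward, and the opposite sign when it
is scanned backward. The names collate_fn, second_level_direction and
backward_cmp are hypothetical.

```c
#include <string.h>

/* Signature shared by strcoll and strcmp. */
typedef int (*collate_fn)(const char *, const char *);

static int sign(int v) { return (v > 0) - (v < 0); }

/* Returns 'f' for forward, 'b' for backward, '?' if the pair is not
   distinguished by the comparator. */
char second_level_direction(collate_fn cmp, char x, char y)
{
    const char sx[2] = { x, '\0' }, sy[2] = { y, '\0' };
    const char xy[3] = { x, y, '\0' }, yx[3] = { y, x, '\0' };

    int single = sign(cmp(sx, sy));    /* order of x and y alone */
    int swapped = sign(cmp(xy, yx));   /* order once both appear */
    if (single == 0 || swapped == 0)
        return '?';
    return swapped == single ? 'f' : 'b';
}

/* Stand-in comparator that scans equal-length strings from the end,
   mimicking a backward level; for demonstration only. */
int backward_cmp(const char *s, const char *t)
{
    size_t n = strlen(s);
    if (n != strlen(t))
        return strcmp(s, t);
    while (n-- > 0)
        if (s[n] != t[n])
            return (unsigned char)s[n] < (unsigned char)t[n] ? -1 : 1;
    return 0;
}
```

Against a real locale one would call setlocale (LC_ALL, "") first and pass
strcoll as the comparator.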
$ export LOCPATH=$(mktemp -d /tmp/localedef.XXXXXX)
$ localedef -i xx_XX.tmpl -f ISO-8859-1 $LOCPATH/xx_XX
$ LC_ALL=xx_XX ./tst-coll-rule aA bB
bb
After switching definitions for S1 and S2:
$ localedef -i xx_XX.tmpl -f ISO-8859-1 $LOCPATH/xx_XX
$ LC_ALL=xx_XX ./tst-coll-rule aA bB
ff
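So the last definition wins and overwrites the other one. This is
due to the optimization of rulesets in ld-collate.c: memcmp takes a
byte count, not an element count, so the comparison stops short when
a rule is wider than one byte. Line 1843 needs to be changed from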
memcmp (osect->rules, sect->rules, nrules) == 0
to
memcmp (osect->rules, sect->rules, nrules * sizeof (*osect->rules)) == 0
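The effect of the missing sizeof can be reproduced outside glibc. In this
sketch the rule_t type and both function names are hypothetical, and the
4-byte element width is an assumption chosen to make the result
deterministic; only the memcmp calls mirror the two lines above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the rule type: 1 = forward, 2 = backward.
   A fixed 4-byte width is assumed for this demonstration. */
typedef uint32_t rule_t;

/* Old comparison: looks at the first n bytes only. */
int same_rules_bytes(const rule_t *a, const rule_t *b, size_t n)
{
    return memcmp(a, b, n) == 0;
}

/* Fixed comparison: looks at all n elements. */
int same_rules_elems(const rule_t *a, const rule_t *b, size_t n)
{
    return memcmp(a, b, n * sizeof (*a)) == 0;
}
```

For the rulesets {1,1,1,1} and {1,2,1,1} with n = 4, the first function
covers only the first element and reports them as identical, so the two
sections would be merged; the second function reports them as different.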
With this patch applied and xx_XX.tmpl reverted to its initial state,
we now get:
$ localedef -i xx_XX.tmpl -f ISO-8859-1 $LOCPATH/xx_XX
$ LC_ALL=xx_XX ./tst-coll-rule aA bB
bb
Huh? The expected output is fb, so this patch alone is not enough and
some more digging in ld-collate.c is needed. There are named sections,
at most one unnamed section (defined without a script name, e.g.
order_start forward;forward), and a symbol section, which stores symbols
if they are read before the first rule.
The test-collate.sh shell script defines all combinations of 2-level
scripts, and runs tst-coll-rule to check whether the stored collation
data match their definition. The output fields are:
  1st field: LC_COLLATE definition
     s: there is a symbol section, i.e. symbols are defined before the
        first order_start keyword.
     N: order_start <script_name>;forward;forward
     n: order_start <script_name>;forward;backward
     U: order_start forward;forward
     u: order_start forward;backward
  2nd field: output of "LC_ALL=xx_XX tst-coll-rule aA bB", or **
             when localedef segfaults.
  3rd field: expected output
  4th field: 0=match 1=mismatch *=localedef segfaults
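Reading the legend together with the tables, the 3rd field appears to
follow directly from the 1st: after the optional s, the first letter
describes the section holding a and A, the second the section holding b
and B, and an uppercase letter means a forward second level while a
lowercase one means backward. The function below is a sketch of that
mapping inferred from the tables; expected_directions is a hypothetical
name and is not part of test-collate.sh.

```c
#include <string.h>

/* Derive the expected tst-coll-rule output from a definition code such
   as "sNn" or "nU".  Returns 0 on success, -1 on a malformed code. */
int expected_directions(const char *code, char out[3])
{
    if (*code == 's')            /* optional symbol-section flag */
        code++;
    if (strlen(code) != 2)
        return -1;
    for (int i = 0; i < 2; i++) {
        switch (code[i]) {
        case 'N': case 'U': out[i] = 'f'; break;  /* forward;forward  */
        case 'n': case 'u': out[i] = 'b'; break;  /* forward;backward */
        default: return -1;
        }
    }
    out[2] = '\0';
    return 0;
}
```

For example, "sNn" maps to "fb" and "nU" maps to "bf", matching the 3rd
column of the corresponding table rows.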
Current CVS version:
snn bb bb 0 | sNn bb fb 1 | nn ** bb * | Nn ** fb *
snu bb bb 0 | sNu bb fb 1 | nu bb bb 0 | Nu bb fb 1
snN ff bf 1 | sNN ff ff 0 | nN ** bf * | NN ** ff *
snU ff bf 1 | sNU ff ff 0 | nU ff bf 1 | NU ff ff 0
sun bb bb 0 | sUn bb fb 1 | un bb bb 0 | Un bb fb 1
suN ff bf 1 | sUN ff ff 0 | uN ff bf 1 | UN ff ff 0
After applying the one-line patch described above:
snn bb bb 0 | sNn bb fb 1 | nn ** bb * | Nn ** fb *
snu bb bb 0 | sNu fb fb 0 | nu bb bb 0 | Nu bb fb 1
snN ff bf 1 | sNN ff ff 0 | nN ** bf * | NN ** ff *
snU bf bf 0 | sNU ff ff 0 | nU ff bf 1 | NU ff ff 0
sun bb bb 0 | sUn bb fb 1 | un bb bb 0 | Un bb fb 1
suN ff bf 1 | sUN ff ff 0 | uN ff bf 1 | UN ff ff 0
After applying ld-collate.patch:
snn bb bb 0 | sNn fb fb 0 | nn bb bb 0 | Nn fb fb 0
snu bb bb 0 | sNu fb fb 0 | nu bb bb 0 | Nu fb fb 0
snN bf bf 0 | sNN ff ff 0 | nN bf bf 0 | NN ff ff 0
snU bf bf 0 | sNU ff ff 0 | nU bf bf 0 | NU ff ff 0
sun bb bb 0 | sUn fb fb 0 | un bb bb 0 | Un fb fb 0
suN bf bf 0 | sUN ff ff 0 | uN bf bf 0 | UN ff ff 0
which looks much better. And indeed, my French locale now sorts
the sample file as expected, great.
--
Summary: localedef does not respect rule definitions in
LC_COLLATE
Product: glibc
Version: 2.3.4
Status: NEW
Severity: normal
Priority: P2
Component: libc
AssignedTo: gotom at debian dot or dot jp
ReportedBy: barbier at linuxfr dot org
CC: glibc-bugs at sources dot redhat dot com
http://sources.redhat.com/bugzilla/show_bug.cgi?id=645
------- You are receiving this mail because: -------
You are on the CC list for the bug, or are watching someone who is.
* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-01-07 22:51 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From barbier at linuxfr dot org 2005-01-07 22:50 -------
Created an attachment (id=339)
--> (http://sources.redhat.com/bugzilla/attachment.cgi?id=339&action=view)
patch giving good results for all cases (=last table)
--
http://sources.redhat.com/bugzilla/show_bug.cgi?id=645
* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-01-07 22:53 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From barbier at linuxfr dot org 2005-01-07 22:53 -------
Created an attachment (id=340)
--> (http://sources.redhat.com/bugzilla/attachment.cgi?id=340&action=view)
sample text to check sorting in French environment, found in
http://www.open-std.org/jtc1/sc22/wg20/docs/n602.htm#AnnexC
--
http://sources.redhat.com/bugzilla/show_bug.cgi?id=645
* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-01-07 22:55 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From barbier at linuxfr dot org 2005-01-07 22:55 -------
Created an attachment (id=341)
--> (http://sources.redhat.com/bugzilla/attachment.cgi?id=341&action=view)
locale file to perform tests
--
http://sources.redhat.com/bugzilla/show_bug.cgi?id=645
* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-01-07 22:56 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From barbier at linuxfr dot org 2005-01-07 22:56 -------
Created an attachment (id=342)
--> (http://sources.redhat.com/bugzilla/attachment.cgi?id=342&action=view)
C source file to check rule directives in collation data
--
http://sources.redhat.com/bugzilla/show_bug.cgi?id=645
* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-01-07 22:57 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From barbier at linuxfr dot org 2005-01-07 22:57 -------
Created an attachment (id=343)
--> (http://sources.redhat.com/bugzilla/attachment.cgi?id=343&action=view)
shell script to display results found in this bugreport
--
http://sources.redhat.com/bugzilla/show_bug.cgi?id=645
* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-01-09 12:08 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From barbier at linuxfr dot org 2005-01-09 12:08 -------
Created an attachment (id=344)
--> (http://sources.redhat.com/bugzilla/attachment.cgi?id=344&action=view)
update for patch #339
The first chunk is added to the previous patch. Previously all
characters were wrongly assigned to the same section, but when
this gets fixed, characters defined by reorder-after keywords
must also be assigned to the right section, otherwise strxfrm
segfaults.
--
What |Removed |Added
----------------------------------------------------------------------------
Attachment #339 is|0 |1
obsolete| |
http://sources.redhat.com/bugzilla/show_bug.cgi?id=645
* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-01-20 14:37 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From barbier at linuxfr dot org 2005-01-20 14:37 -------
Created an attachment (id=381)
--> (http://sources.redhat.com/bugzilla/attachment.cgi?id=381&action=view)
updated patch
The last patch fixed reorder-after but broke plain definitions;
here is a better patch.
--
What |Removed |Added
----------------------------------------------------------------------------
Attachment #344 is|0 |1
obsolete| |
http://sources.redhat.com/bugzilla/show_bug.cgi?id=645
* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-05-24 21:02 UTC (permalink / raw)
To: glibc-bugs
--
What |Removed |Added
----------------------------------------------------------------------------
BugsThisDependsOn| |968
http://sources.redhat.com/bugzilla/show_bug.cgi?id=645
* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: drepper at redhat dot com @ 2005-10-15 20:51 UTC (permalink / raw)
To: glibc-bugs
--
Bug 645 depends on bug 968, which changed state.
Bug 968 Summary: Integer overflow in strxfrm_l.c
http://sourceware.org/bugzilla/show_bug.cgi?id=968
What |Old Value |New Value
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
http://sourceware.org/bugzilla/show_bug.cgi?id=645
* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: barbier at linuxfr dot org @ 2005-12-30 8:53 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From barbier at linuxfr dot org 2005-12-30 08:53 -------
With unmodified sources:
$ LC_ALL=en_US.UTF-8 locale collate-rulesets | od -tc
0000000 001 001 001 005 \n
This demonstrates that only one ruleset is taken into account,
whereas the en_US definition contains the following rulesets:
forward;backward;forward;forward,position
forward;forward;forward;forward,position
backward;backward;backward;forward,position
After applying the last ld-collate.patch:
$ LC_ALL=en_US.UTF-8 locale collate-rulesets | od -tc
0000000 001 002 001 005 001 001 001 005 002 002 002 005 \n
All rulesets are then considered.
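The od output above can be read by hand: matching the bytes
001 002 001 005 against forward;backward;forward;forward,position suggests
that each byte encodes one level, with 1 = forward, 2 = backward and
4 = position as an added flag. The decoder below is a sketch under that
assumption, which is inferred from this output rather than taken from the
glibc sources; decode_ruleset and the RULE_* names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Bit values inferred from the od output in this message. */
enum { RULE_FORWARD = 1, RULE_BACKWARD = 2, RULE_POSITION = 4 };

/* Decode nlevels rule bytes into "forward;backward;..." notation.
   Returns 0 on success, -1 if a byte has no valid direction or the
   buffer is too small. */
int decode_ruleset(const unsigned char *rules, size_t nlevels,
                   char *buf, size_t buflen)
{
    size_t used = 0;

    if (buflen == 0)
        return -1;
    buf[0] = '\0';
    for (size_t i = 0; i < nlevels; i++) {
        const char *dir;
        if ((rules[i] & 3) == RULE_FORWARD)
            dir = "forward";
        else if ((rules[i] & 3) == RULE_BACKWARD)
            dir = "backward";
        else
            return -1;
        int n = snprintf(buf + used, buflen - used, "%s%s%s",
                         i ? ";" : "", dir,
                         (rules[i] & RULE_POSITION) ? ",position" : "");
        if (n < 0 || (size_t)n >= buflen - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

Feeding it the twelve bytes of the patched output four at a time yields the
three rulesets listed above, in the order 001 002 001 005, 001 001 001 005,
002 002 002 005.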
--
http://sourceware.org/bugzilla/show_bug.cgi?id=645
* [Bug libc/645] localedef does not respect rule definitions in LC_COLLATE
From: drepper at redhat dot com @ 2007-10-02 15:54 UTC (permalink / raw)
To: glibc-bugs
------- Additional Comments From drepper at redhat dot com 2007-10-02 15:54 -------
I've made changes to the CVS code which should also take care of this issue.
--
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
http://sourceware.org/bugzilla/show_bug.cgi?id=645