From: "b.cama at kerlink dot fr" <sourceware-bugzilla@sourceware.org>
To: libc-locales@sourceware.org
Subject: [Bug localedata/23421] New: Strange collation rules for A and space with UTF-8 locale when other characters appended
Date: Tue, 17 Jul 2018 09:09:00 -0000
Message-ID: <bug-23421-716@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=23421
Bug ID: 23421
Summary: Strange collation rules for A and space with UTF-8
locale when other characters appended
Product: glibc
Version: 2.28
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: localedata
Assignee: unassigned at sourceware dot org
Reporter: b.cama at kerlink dot fr
CC: libc-locales at sourceware dot org
Target Milestone: ---
Created attachment 11136
--> https://sourceware.org/bugzilla/attachment.cgi?id=11136&action=edit
A and space collation test case
Hi,
I ran into a strange string-ordering bug and managed to reduce it to the
attached test case. I *think* it comes from the locale data alone: with an
older localedata the difference values change slightly (but not the
ordering), while the behavior is exactly the same between libc 2.24 and the
latest master (as of two days ago).
Here is the output with git master:
% ./testrun.sh ../collate_a_space
setlocale(LC_COLLATE,"C") = 361286057
strcoll("A", " ") = 33
strcoll("AB", " B") = 33
strcoll("B", " ") = 34
strcoll("BB", " B") = 34
setlocale(LC_COLLATE,"en_US.UTF-8") = 380448880
strcoll("A", " ") = 1
strcoll("AB", " B") = -13
strcoll("B", " ") = 1
strcoll("BB", " B") = 1
(the result is exactly the same when not using testrun.sh and only setting
LOCPATH)
And with my stock libc (2.24):
% ../collate_a_space
setlocale(LC_COLLATE,"C") = 1774630437
strcoll("A", " ") = 33
strcoll("AB", " B") = 33
strcoll("B", " ") = 34
strcoll("BB", " B") = 34
setlocale(LC_COLLATE,"en_US.UTF-8") = 1082470992
strcoll("A", " ") = 1
strcoll("AB", " B") = -1
strcoll("B", " ") = 1
strcoll("BB", " B") = 1
Note the second strcoll test, which gives the opposite result in a UTF-8
locale compared to the raw C one. This only happens with the letter “A” (or
“a”) and with no other letter, hence the “B” test for comparison. The
ordering is correct when comparing a lone “A” (or any letter) with “ ”
(space), with nothing appended.
Note that this is with 100% ASCII characters.