public inbox for libc-locales@sourceware.org
* [Bug localedata/23421] New: Strange collation rules for A and space with UTF-8 locale when other characters appended
@ 2018-07-17  9:09 b.cama at kerlink dot fr
  2018-07-17  9:23 ` [Bug localedata/23421] " b.cama at kerlink dot fr
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: b.cama at kerlink dot fr @ 2018-07-17  9:09 UTC
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=23421

            Bug ID: 23421
           Summary: Strange collation rules for A and space with UTF-8
                    locale when other characters appended
           Product: glibc
           Version: 2.28
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: b.cama at kerlink dot fr
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

Created attachment 11136
  --> https://sourceware.org/bugzilla/attachment.cgi?id=11136&action=edit
A and space collation test case

Hi,
I stumbled upon a strange string ordering bug and managed to reduce it to
the attached test case. I *think* it comes from the locale data only: using a
different localedata changes the returned values slightly (but not the
ordering), while the behavior is otherwise exactly the same between libc 2.24
and the latest master (as of two days ago).

Here is the output with git master:

% ./testrun.sh ../collate_a_space
setlocale(LC_COLLATE,"C") = 361286057
strcoll("A", " ") = 33
strcoll("AB", " B") = 33
strcoll("B", " ") = 34
strcoll("BB", " B") = 34
setlocale(LC_COLLATE,"en_US.UTF-8") = 380448880
strcoll("A", " ") = 1
strcoll("AB", " B") = -13
strcoll("B", " ") = 1
strcoll("BB", " B") = 1

(the result is exactly the same when not using testrun.sh and only setting
LOCPATH)

And with my stock libc (2.24):

% ../collate_a_space
setlocale(LC_COLLATE,"C") = 1774630437
strcoll("A", " ") = 33
strcoll("AB", " B") = 33
strcoll("B", " ") = 34
strcoll("BB", " B") = 34
setlocale(LC_COLLATE,"en_US.UTF-8") = 1082470992
strcoll("A", " ") = 1
strcoll("AB", " B") = -1
strcoll("B", " ") = 1
strcoll("BB", " B") = 1

Note the second strcoll test, which gives the opposite result in a UTF-8
locale compared to the plain C locale. This only happens with the letter “A”
(or “a”), and with no other letter, hence the “B” tests for comparison. The
ordering is correct when comparing a lone “A” (or any other letter) with “ ”
(space), with nothing appended.

Note that this is with 100% ASCII characters.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


end of thread, other threads:[~2018-07-18  8:52 UTC | newest]

Thread overview: 8+ messages
2018-07-17  9:09 [Bug localedata/23421] New: Strange collation rules for A and space with UTF-8 locale when other characters appended b.cama at kerlink dot fr
2018-07-17  9:23 ` [Bug localedata/23421] " b.cama at kerlink dot fr
2018-07-17 15:21 ` carlos at redhat dot com
2018-07-17 16:42 ` b.cama at kerlink dot fr
2018-07-17 16:54 ` carlos at redhat dot com
2018-07-18  8:10 ` b.cama at kerlink dot fr
2018-07-18  8:36 ` schwab@linux-m68k.org
2018-07-18  8:52 ` b.cama at kerlink dot fr
