public inbox for libc-locales@sourceware.org
From: "b.cama at kerlink dot fr" <sourceware-bugzilla@sourceware.org>
To: libc-locales@sourceware.org
Subject: [Bug localedata/23421] New: Strange collation rules for A and space with UTF-8 locale when other characters appended
Date: Tue, 17 Jul 2018 09:09:00 -0000
Message-ID: <bug-23421-716@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=23421

            Bug ID: 23421
           Summary: Strange collation rules for A and space with UTF-8
                    locale when other characters appended
           Product: glibc
           Version: 2.28
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: b.cama at kerlink dot fr
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

Created attachment 11136
  --> https://sourceware.org/bugzilla/attachment.cgi?id=11136&action=edit
A and space collation test case

Hi,
I ran into a strange string ordering bug and managed to reduce it to
the attached test case. I *think* it comes from the locale data only: swapping
in an older localedata slightly changes the values strcoll returns (but not
the resulting ordering), while the behavior is exactly the same between libc
2.24 and latest master (as of two days ago).

Here is the output with git master:

% ./testrun.sh ../collate_a_space
setlocale(LC_COLLATE,"C") = 361286057
strcoll("A", " ") = 33
strcoll("AB", " B") = 33
strcoll("B", " ") = 34
strcoll("BB", " B") = 34
setlocale(LC_COLLATE,"en_US.UTF-8") = 380448880
strcoll("A", " ") = 1
strcoll("AB", " B") = -13
strcoll("B", " ") = 1
strcoll("BB", " B") = 1

(the result is exactly the same when not using testrun.sh and only setting
LOCPATH)

And with my stock libc (2.24):

% ../collate_a_space
setlocale(LC_COLLATE,"C") = 1774630437
strcoll("A", " ") = 33
strcoll("AB", " B") = 33
strcoll("B", " ") = 34
strcoll("BB", " B") = 34
setlocale(LC_COLLATE,"en_US.UTF-8") = 1082470992
strcoll("A", " ") = 1
strcoll("AB", " B") = -1
strcoll("B", " ") = 1
strcoll("BB", " B") = 1

Note the second strcoll test, which gives the opposite result with a UTF-8
locale compared to the plain C locale. This only happens with the letter “A”
(or “a”) and no other letter, hence the “B” tests for comparison. The ordering
is correct when comparing a lone “A” (or any letter) with “ ” (space), with
nothing appended.

Note that this is with 100% ASCII characters.


