From: "infinity0 at pwned dot gg"
To: libc-locales@sourceware.org
Subject: [Bug localedata/30149] en_GB thinks "㐬" and "㒼" are the same character, other locales are fine
Date: Fri, 14 Apr 2023 13:46:04 +0000
X-Bugzilla-Product: glibc
X-Bugzilla-Component: localedata
X-Bugzilla-Version: 2.36
X-Bugzilla-Status: UNCONFIRMED

https://sourceware.org/bugzilla/show_bug.cgi?id=30149

--- Comment #2 from infinity0 at pwned dot gg ---
Thanks for the investigation & explanation.

However I can confirm again that en_US.UTF-8 does indeed work as above for me, giving the correct answer unlike en_GB which gives the incorrect answer as reported.
In fact every other locale works correctly, except en_GB and surprisingly zh_TW as well:

$ for l in /usr/share/locale/*; do echo $(LC_COLLATE=$(basename $l).UTF-8 sort -u test2.txt | wc -l) $l; done | grep -v ^2
1 /usr/share/locale/en_GB
1 /usr/share/locale/zh_TW

Furthermore, other characters I tried are fine, giving the expected "2" even in en_GB and zh_TW.

Granted, 㐬 and 㒼 are rare characters mostly used for linguistic study purposes and not everyday use; however, they are still certainly distinct characters and the locale and/or sorting is buggy.

Are you able to replicate this? Feel free to ask me to run stuff on my computer to debug further, in case my results are different from yours.
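
For anyone trying to replicate this, here is a minimal sketch of the same check as a self-contained script. It is not the exact command from the report: the function names (count_unique, make_sample, find_collapsing_locales) are hypothetical, it assumes a POSIX shell with `locale -a` available, and it sets LC_ALL instead of LC_COLLATE so that an inherited LC_ALL cannot mask the setting. It also iterates over `locale -a` rather than /usr/share/locale, since those directories hold message catalogs and need not match the locales that are actually installed.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of the original report.

# Count the distinct lines `sort -u` keeps under a given locale.
# $1 = locale name (e.g. C, en_GB.UTF-8), $2 = file to check.
count_unique() {
    LC_ALL="$1" sort -u -- "$2" | wc -l | tr -d ' '
}

# Write the two characters from the report (U+342C and U+34BC), one per line.
# The UTF-8 bytes are given as octal escapes so the script itself stays ASCII.
make_sample() {
    printf '\343\220\254\n\343\222\274\n' > "$1"
}

# Print every installed UTF-8 locale in which the two lines collapse into one.
# $1 = sample file created by make_sample.
find_collapsing_locales() {
    locale -a | grep -i -E '\.utf-?8$' | while read -r loc; do
        [ "$(count_unique "$loc" "$1")" = "2" ] || echo "$loc"
    done
}
```

Possible usage: run `make_sample test2.txt` followed by `find_collapsing_locales test2.txt`. Any locale it prints treats the two characters as equal for `sort -u`. Which locales show up, if any, depends on the installed glibc locale data.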