From: "ju.orth+sourceware at gmail dot com"
To: libc-locales@sourceware.org
Subject: [Bug localedata/16527] New: strxfrm & strcoll broken with Hangul & en_US.UTF-8
Date: Tue, 04 Feb 2014 22:14:00 -0000

https://sourceware.org/bugzilla/show_bug.cgi?id=16527

            Bug ID: 16527
           Summary: strxfrm & strcoll broken with Hangul & en_US.UTF-8
           Product: glibc
           Version: 2.18
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: ju.orth+sourceware at gmail dot com
                CC: libc-locales at sourceware dot org

Consider this program:

=============
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print the strxfrm() key of a, byte by byte, including the final NUL. */
void ps(const char *a)
{
    size_t s;
    unsigned char *b;
    int i;

    s = strxfrm(NULL, a, 0);        /* length of the key, without NUL */
    b = malloc(s+1);
    strxfrm((void *)b, a, s+1);
    for (i = 0; i <= s; i++)
        printf("%u ", (unsigned)b[i]);
    printf("\n");
}

int main(void)
{
    /* First in the default "C" locale ... */
    ps("퍼");
    ps("흐");
    /* ... then with the collation rules taken from the environment. */
    setlocale(LC_COLLATE, "");
    ps("퍼");
    ps("흐");
}
=============

On systems with LANG=en_US.UTF-8 the output is

=============
237 141 188 0
237 157 144 0
1 1 1 1 194 182 1 194 182 1 194 182 0
1 1 1 1 194 182 1 194 182 1 194 182 0
=============

The output after setlocale(LC_COLLATE, "") is nonsensical: the two
different Hangul syllables are transformed to the same byte sequence, so
the result can no longer be used to order them. Similar useless output is
generated with the locales de_DE.UTF-8, ru_RU.UTF-8, and jp_JP.UTF-8.
ko_KR.UTF-8 seems to be the only working locale.

This can be circumvented by adding the following code to iso14651_t1:

=============
script order_start ;forward;forward;forward;forward,position ;IGNORE;IGNORE;IGNORE .. ..;IGNORE;IGNORE;IGNORE ;IGNORE;IGNORE;IGNORE # order_end #
=============

Right below a very similar workaround...
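
The symptom above can also be checked without reading the byte dumps by eye. The following is only a minimal sketch, not part of the original report: the helper name same_collation_key is hypothetical, and it assumes nothing beyond the standard strxfrm() interface. It reports whether two strings get byte-identical keys in whatever LC_COLLATE locale is currently active.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from the bug report): returns 1 if a and b
 * produce byte-identical strxfrm() keys in the current LC_COLLATE locale,
 * 0 if the keys differ, and -1 if memory allocation fails.
 */
int same_collation_key(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    /* With n == 0 the destination may be NULL; only the length is returned. */
    size_t na = strxfrm(NULL, a, 0);
    size_t nb = strxfrm(NULL, b, 0);

    char *ka = malloc(na + 1);
    char *kb = malloc(nb + 1);
    int result = -1;

    if (ka != NULL && kb != NULL) {
        strxfrm(ka, a, na + 1);
        strxfrm(kb, b, nb + 1);
        /* Keys are equal only if both length and contents match. */
        result = (na == nb && memcmp(ka, kb, na) == 0);
    }

    free(ka);
    free(kb);
    return result;
}
```

In the default "C" locale this should return 0 for the two syllables from the report, since their keys are simply their distinct UTF-8 bytes. Going by the output quoted above, calling it after setlocale(LC_COLLATE, "") on an affected en_US.UTF-8 system would presumably return 1, and strcoll() would then be expected to treat the two strings as equal.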