* [Bug localedata/16527] New: strxfrm & strcoll broken with Hangul & en_US.UTF-8
From: ju.orth+sourceware at gmail dot com @ 2014-02-04 22:14 UTC
To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=16527

            Bug ID: 16527
           Summary: strxfrm & strcoll broken with Hangul & en_US.UTF-8
           Product: glibc
           Version: 2.18
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: ju.orth+sourceware at gmail dot com
                CC: libc-locales at sourceware dot org

Consider this program:

=============
#include <stdio.h>
#include <locale.h>
#include <string.h>
#include <malloc.h>

void ps(const char *a)
{
    size_t s;
    unsigned char *b;
    int i;

    s = strxfrm(NULL, a, 0);
    b = malloc(s+1);
    strxfrm((void *)b, a, s+1);
    for (i = 0; i <= s; i++)
        printf("%u ", (unsigned)b[i]);
    printf("\n");
}

int main(void)
{
    ps("퍼");
    ps("흐");
    setlocale(LC_COLLATE, "");
    ps("퍼");
    ps("흐");
}
=============

On systems with LANG=en_US.UTF-8 the output is

=============
237 141 188 0
237 157 144 0
1 1 1 1 194 182 1 194 182 1 194 182 0
1 1 1 1 194 182 1 194 182 1 194 182 0
=============

The output after setlocale(LC_COLLATE, "") is completely nonsensical: the two
distinct syllables produce identical sort keys. Similar useless output is
generated with the locales de_DE.UTF-8, ru_RU.UTF-8, and jp_JP.UTF-8.
ko_KR.UTF-8 seems to be the only working locale.

This can be circumvented by adding the following code to iso14651_t1:

=============
script <HANGUL>
order_start <HANGUL>;forward;forward;forward;forward,position
<UAC00> <UAC00>;IGNORE;IGNORE;IGNORE
.. ..;IGNORE;IGNORE;IGNORE
<UD7A3> <UD7A3>;IGNORE;IGNORE;IGNORE
#
order_end
#
=============

Right below a very similar workaround...

-- 
You are receiving this mail because:
You are on the CC list for the bug.
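The summary names strcoll as well, while the reproducer above only prints strxfrm keys. As a minimal sketch (not part of the original report), the two helpers below check both symptoms directly: whether two strings get byte-identical strxfrm keys, and whether strcoll treats them as equal. The names same_sort_key and collate_equal are hypothetical and chosen only for illustration; both helpers use whatever LC_COLLATE locale is active when they are called.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a and b transform to byte-identical
 * strxfrm() keys in the current LC_COLLATE locale, 0 if the keys differ,
 * and -1 if memory for the keys could not be allocated. */
int same_sort_key(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    /* With n == 0 the destination may be NULL; the return value is the
     * key length excluding the terminating null byte. */
    size_t na = strxfrm(NULL, a, 0) + 1;
    size_t nb = strxfrm(NULL, b, 0) + 1;
    char *ka = malloc(na);
    char *kb = malloc(nb);
    int result = -1;

    if (ka != NULL && kb != NULL) {
        strxfrm(ka, a, na);
        strxfrm(kb, b, nb);
        result = (strcmp(ka, kb) == 0);
    }
    free(ka);
    free(kb);
    return result;
}

/* Hypothetical helper: returns 1 if strcoll() reports a and b as equal
 * in the current LC_COLLATE locale, 0 otherwise. */
int collate_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcoll(a, b) == 0;
}
```

Called after setlocale(LC_COLLATE, "") on an affected system, same_sort_key("퍼", "흐") would be expected to return 1, since the report shows identical key bytes for both strings. What collate_equal returns there is not shown in the report and may differ between glibc versions.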
* [Bug localedata/16527] strxfrm & strcoll broken with Hangul & en_US.UTF-8
From: ju.orth+sourceware at gmail dot com @ 2014-02-04 22:14 UTC
To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=16527

ju.orth+sourceware at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ju.orth+sourceware at gmail dot com
* [Bug localedata/16527] strxfrm & strcoll broken with Hangul & en_US.UTF-8
From: fweimer at redhat dot com @ 2014-06-13 8:46 UTC
To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=16527

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-
* [Bug localedata/16527] strxfrm & strcoll broken with Hangul & en_US.UTF-8
From: egmont at gmail dot com @ 2015-09-06 23:12 UTC
To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=16527

Egmont Koblinger <egmont at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |egmont at gmail dot com

--- Comment #1 from Egmont Koblinger <egmont at gmail dot com> ---
Please see bug 18927 as well.
* [Bug localedata/16527] strxfrm & strcoll broken with Hangul & en_US.UTF-8
From: maiku.fabian at gmail dot com @ 2017-10-21 8:26 UTC
To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=16527

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maiku.fabian at gmail dot com