public inbox for libc-locales@sourceware.org
* [Bug localedata/16527] strxfrm & strcoll broken with Hangul & en_US.UTF-8
  2014-02-04 22:14 [Bug localedata/16527] New: strxfrm & strcoll broken with Hangul & en_US.UTF-8 ju.orth+sourceware at gmail dot com
@ 2014-02-04 22:14 ` ju.orth+sourceware at gmail dot com
  2014-06-13  8:46 ` fweimer at redhat dot com
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: ju.orth+sourceware at gmail dot com @ 2014-02-04 22:14 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=16527

ju.orth+sourceware at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ju.orth+sourceware at gmail dot com

-- 
You are receiving this mail because:
You are on the CC list for the bug.

* [Bug localedata/16527] New: strxfrm & strcoll broken with Hangul & en_US.UTF-8
From: ju.orth+sourceware at gmail dot com @ 2014-02-04 22:14 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=16527

            Bug ID: 16527
           Summary: strxfrm & strcoll broken with Hangul & en_US.UTF-8
           Product: glibc
           Version: 2.18
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: ju.orth+sourceware at gmail dot com
                CC: libc-locales at sourceware dot org

Consider this program:

=============
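#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <string.h>

/* Print the strxfrm sort key of a, byte by byte, including the
   terminating NUL. */
void ps(const char *a)
{
    size_t s;
    unsigned char *b;
    size_t i;

    s = strxfrm(NULL, a, 0);    /* key length, excluding the NUL */
    b = malloc(s+1);
    if (b == NULL)
        return;
    strxfrm((char *)b, a, s+1);
    for (i = 0; i <= s; i++)
        printf("%u ", (unsigned)b[i]);
    printf("\n");
    free(b);
}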

int main(void)
{
    ps("퍼");
    ps("흐");

    setlocale(LC_COLLATE, "");

    ps("퍼");
    ps("흐");
}
=============

On systems with LANG=en_US.UTF-8 the output is

=============
237 141 188 0 
237 157 144 0 
1 1 1 1 194 182 1 194 182 1 194 182 0 
1 1 1 1 194 182 1 194 182 1 194 182 0 
=============

The output after setlocale(LC_COLLATE, "") is nonsensical: the two distinct
syllables produce identical keys, so strcoll treats them as equal. The same
collision occurs with the locales de_DE.UTF-8, ru_RU.UTF-8, and ja_JP.UTF-8.
ko_KR.UTF-8 seems to be the only working locale.
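
The collision in the output above can also be checked programmatically. What
follows is only a minimal sketch: the helper names make_key and same_sort_key
are hypothetical and not part of glibc or of the test program. It builds both
strxfrm keys in the current LC_COLLATE locale and compares them with strcmp.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate and return the strxfrm key of s in the
   current LC_COLLATE locale, or NULL if allocation fails. */
static char *make_key(const char *s)
{
    size_t n = strxfrm(NULL, s, 0);   /* key length, excluding the NUL */
    char *key = malloc(n + 1);

    if (key != NULL)
        strxfrm(key, s, n + 1);
    return key;
}

/* Hypothetical helper: returns 1 if a and b transform to the same key,
   0 if the keys differ, and -1 if memory allocation fails. */
int same_sort_key(const char *a, const char *b)
{
    char *ka = make_key(a);
    char *kb = make_key(b);
    int result = -1;

    if (ka != NULL && kb != NULL)
        result = (strcmp(ka, kb) == 0);
    free(ka);
    free(kb);
    return result;
}
```

Called as same_sort_key("퍼", "흐") after setlocale(LC_COLLATE, ""), a return
value of 1 would indicate the collision described here. In the default "C"
locale the keys differ, so it returns 0.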

This can be circumvented by adding the following code to iso14651_t1:

=============
script <HANGUL>

order_start <HANGUL>;forward;forward;forward;forward,position
<UAC00> <UAC00>;IGNORE;IGNORE;IGNORE
.. ..;IGNORE;IGNORE;IGNORE
<UD7A3> <UD7A3>;IGNORE;IGNORE;IGNORE
#
order_end
#
=============

Right below a very similar workaround...
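
Where patching the locale data is not an option, an application can break the
tie on its own side. This is a sketch of a generic technique, not something
proposed in this report, and the function name coll_with_tiebreak is
hypothetical. It uses strcoll first and falls back to strcmp only when strcoll
reports a tie.

```c
#include <string.h>

/* Hypothetical application-side comparator: use locale collation first,
   then fall back to byte order when strcoll reports the strings as
   equal. Byte-identical strings still compare equal. */
int coll_with_tiebreak(const char *a, const char *b)
{
    int r = strcoll(a, b);

    if (r != 0)
        return r;
    return strcmp(a, b);
}
```

For UTF-8 input the strcmp fallback orders tied strings by code point, which is
not a linguistically meaningful order for Hangul but at least keeps distinct
strings distinct and the sort deterministic.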

* [Bug localedata/16527] strxfrm & strcoll broken with Hangul & en_US.UTF-8
From: fweimer at redhat dot com @ 2014-06-13  8:46 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=16527

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-

* [Bug localedata/16527] strxfrm & strcoll broken with Hangul & en_US.UTF-8
From: egmont at gmail dot com @ 2015-09-06 23:12 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=16527

Egmont Koblinger <egmont at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |egmont at gmail dot com

--- Comment #1 from Egmont Koblinger <egmont at gmail dot com> ---
Please see bug 18927 as well.

* [Bug localedata/16527] strxfrm & strcoll broken with Hangul & en_US.UTF-8
From: maiku.fabian at gmail dot com @ 2017-10-21  8:26 UTC (permalink / raw)
  To: libc-locales

https://sourceware.org/bugzilla/show_bug.cgi?id=16527

Mike FABIAN <maiku.fabian at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maiku.fabian at gmail dot com
