public inbox for glibc-bugs@sourceware.org
* [Bug localedata/13063] New: Can not 'sort -u' all Chinese characters in CJK UNIFIED IDEOGRAPH EXTENSION A/B/C/D
From: an.euroford at gmail dot com @ 2011-08-06 17:21 UTC
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=13063

           Summary: Can not 'sort -u' all Chinese characters in CJK
                    UNIFIED IDEOGRAPH EXTENSION A/B/C/D
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: critical
          Priority: P2
         Component: localedata
        AssignedTo: libc-locales@sources.redhat.com
        ReportedBy: an.euroford@gmail.com


Hi,

According to glibc/localedata/locales/zh_CN and iso14651_t1_pinyin (or
iso14651_t1), glibc only supports Unicode 3.0.

The current version of Unicode is 6.0, which extends the CJK Unified Ideographs
with Extensions A/B/C/D, and Extension A is included in GB18030:2005 (the
Chinese national character set standard).

So, at a minimum, glibc should be able to sort all Chinese characters in CJK
Unified Ideographs and Extension A (U+3400-U+4DBF).

The problem shows up with sort -u. If you run
sort -u examples_CJK_extensionA.txt (see attachment), you get only one
Chinese character, "㑗".
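A rough model of why this happens: sort -u keeps one line from each run of lines that compare equal under the locale's collation, not under byte equality. If every character that is missing from the collation table ends up with the same collation key, all such lines compare equal and collapse into one. The sketch below assumes a hypothetical weight table (TABLE) and a hypothetical shared fallback key (UNDEFINED_KEY); it illustrates the mechanism only and does not reproduce glibc's actual weights.

```python
# Hypothetical model of `sort -u` under a collation table that lacks entries
# for some characters. TABLE and UNDEFINED_KEY are illustrative assumptions,
# not glibc's real collation data.

# Collation keys for the few characters this toy table knows about.
TABLE = {
    "\u4e00": (1,),  # 一
    "\u4e01": (2,),  # 丁
}

# Assumption: every character without an entry gets the same key.
UNDEFINED_KEY = (0,)


def collation_key(line: str) -> tuple:
    """Concatenate per-character keys; unknown characters share one key."""
    key = ()
    for ch in line:
        key += TABLE.get(ch, UNDEFINED_KEY)
    return key


def sort_unique(lines: list) -> list:
    """Sort by collation key, then keep only the first line of each run of
    equal keys, mirroring how `sort -u` drops lines that compare equal."""
    result = []
    last_key = None
    for line in sorted(lines, key=collation_key):  # sorted() is stable
        k = collation_key(line)
        if k != last_key:
            result.append(line)
            last_key = k
    return result


if __name__ == "__main__":
    # Three Extension A characters absent from TABLE, plus two listed ones.
    sample = ["\u3400", "\u3401", "\u3402", "\u4e00", "\u4e01"]
    print(sort_unique(sample))
```

With this toy table the three Extension A characters share one key, so only the first of them survives, which matches the symptom described in the report.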


Regards,
An Yang

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.



* [Bug localedata/13063] Can not 'sort -u' all Chinese characters in CJK UNIFIED IDEOGRAPH EXTENSION A/B/C/D
From: an.euroford at gmail dot com @ 2011-08-06 17:25 UTC
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=13063

--- Comment #1 from An Yang <an.euroford at gmail dot com> 2011-08-06 17:24:33 UTC ---
Created attachment 5880
  --> http://sourceware.org/bugzilla/attachment.cgi?id=5880
example characters in CJK extension A.


* [Bug localedata/13063] 'sort -u' will erase some Chinese characters
From: an.euroford at gmail dot com @ 2011-08-07 17:34 UTC
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=13063

An Yang <an.euroford at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Can not 'sort -u' all       |'sort -u' will erase some
                   |Chinese characters in CJK   |Chinese characters
                   |UNIFIED IDEOGRAPH EXTENSION |
                   |A/B/C/D                     |


* [Bug localedata/13063] 'sort -u' will erase some Chinese characters
From: an.euroford at gmail dot com @ 2011-08-07 17:43 UTC
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=13063

--- Comment #2 from An Yang <an.euroford at gmail dot com> 2011-08-07 17:42:44 UTC ---
I'm not sure whether this bug has anything to do with the charmaps; it may or may not.
But the value of LC_COLLATE in zh_CN is:

% ISO 14651 collation sequence
LC_COLLATE
copy "iso14651_t1_pinyin"
END LC_COLLATE

I'm sure something is wrong in this table.

None of the erased Chinese characters has an entry in iso14651_t1_pinyin,
although all of them are in CJK Unified Ideographs or Extensions A/B/C/D.
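To check whether an erased character falls into one of the blocks named above and lacks a table entry, a lookup on code point ranges is enough. The ranges below are the Unicode block boundaries for CJK Unified Ideographs and Extensions A-D (blocks can include unassigned code points). The function names and the table_chars set are hypothetical; extracting the listed characters from iso14651_t1_pinyin is assumed and not shown.

```python
# Unicode block ranges (inclusive) for the CJK blocks named in the report.
CJK_BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("Extension A", 0x3400, 0x4DBF),
    ("Extension B", 0x20000, 0x2A6DF),
    ("Extension C", 0x2A700, 0x2B73F),
    ("Extension D", 0x2B740, 0x2B81F),
]


def cjk_block(ch: str):
    """Return the name of the CJK block containing ch, or None."""
    cp = ord(ch)
    for name, lo, hi in CJK_BLOCKS:
        if lo <= cp <= hi:
            return name
    return None


def missing_from_table(text: str, table_chars: set) -> list:
    """Distinct CJK characters in text with no entry in table_chars
    (a hypothetical set of characters listed in the collation file),
    in order of first appearance."""
    found = []
    for ch in text:
        if cjk_block(ch) and ch not in table_chars and ch not in found:
            found.append(ch)
    return found
```

For example, missing_from_table("\u3400\u4e00\U00020000", {"\u4e00"}) reports the Extension A and Extension B characters but not the listed one.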


* [Bug localedata/13063] 'sort -u' will erase some Chinese characters
From: an.euroford at gmail dot com @ 2011-08-08 16:54 UTC
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=13063

--- Comment #3 from An Yang <an.euroford at gmail dot com> 2011-08-08 16:54:28 UTC ---
There are 25496 Chinese characters in iso14651_t1_pinyin, most of them
distributed over CJK Unified Ideographs and CJK Unified Ideographs Extension A.

But CJK Unified Ideographs and Extension A together contain 27552 Chinese
characters, so at least 2056 characters (27552 - 25496) that have no pinyin
entry are missing from the table.

So my suggestion is simply to append the missing characters at the end of
iso14651_t1_pinyin, in Unicode code point order.
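The proposal amounts to this: take every code point in the target ranges that the pinyin table does not already list, and emit it after the existing entries in ascending code point order. A minimal sketch, assuming the set of listed code points has already been extracted from iso14651_t1_pinyin (parsing not shown). It produces only the <Uxxxx> symbolic names used in glibc locale sources (eight hex digits above U+FFFF) and deliberately leaves out the weight syntax of each collation line. The function name appended_symbols is hypothetical.

```python
def appended_symbols(listed: set, ranges: list) -> list:
    """Return <Uxxxx> names for code points in ranges that are absent from
    listed, in ascending code point order. ranges holds inclusive
    (lo, hi) pairs that are assumed not to overlap."""
    out = []
    for lo, hi in sorted(ranges):  # order the ranges by starting code point
        for cp in range(lo, hi + 1):
            if cp not in listed:
                # Four hex digits in the BMP, eight beyond it.
                fmt = "<U%04X>" if cp <= 0xFFFF else "<U%08X>"
                out.append(fmt % cp)
    return out
```

Passing the ranges in any order still yields Unicode order, so Extension A (starting at U+3400) comes out before the main CJK Unified Ideographs block (starting at U+4E00).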

Could you give me any feedback?


* [Bug localedata/13063] 'sort -u' will erase some Chinese characters
From: fweimer at redhat dot com @ 2014-06-13 14:35 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=13063

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-


* [Bug localedata/13063] 'sort -u' will erase some Chinese characters
From: bluebat at member dot fsf.org @ 2014-11-19  4:13 UTC
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=13063

--- Comment #5 from Wei-Lun Chao <bluebat at member dot fsf.org> ---
Tested with the patch from bug 17563; the test passes.
