* [Bug localedata/13063] Can not 'sort -u' all Chinese characters in CJK UNIFIED IDEOGRAPH EXTENSION A/B/C/D
2011-08-06 19:28 [Bug localedata/13063] New: Can not 'sort -u' all Chinese characters in CJK UNIFIED IDEOGRAPH EXTENSION A/B/C/D an.euroford at gmail dot com
@ 2011-08-06 19:28 ` an.euroford at gmail dot com
2011-08-07 20:46 ` [Bug localedata/13063] 'sort -u' will erase some Chinese characters an.euroford at gmail dot com
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: an.euroford at gmail dot com @ 2011-08-06 19:28 UTC (permalink / raw)
To: libc-locales
http://sourceware.org/bugzilla/show_bug.cgi?id=13063
--- Comment #1 from An Yang <an.euroford at gmail dot com> 2011-08-06 17:24:33 UTC ---
Created attachment 5880
--> http://sourceware.org/bugzilla/attachment.cgi?id=5880
example characters in CJK extension A.
--
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
* [Bug localedata/13063] 'sort -u' will erase some Chinese characters
From: an.euroford at gmail dot com @ 2011-08-07 20:46 UTC (permalink / raw)
To: libc-locales
http://sourceware.org/bugzilla/show_bug.cgi?id=13063
--- Comment #2 from An Yang <an.euroford at gmail dot com> 2011-08-07 17:42:44 UTC ---
I'm not sure whether this bug has any relationship with charmaps; it may or may not.
But the value of LC_COLLATE in zh_CN is:
% ISO 14651 collation sequence
LC_COLLATE
copy "iso14651_t1_pinyin"
END LC_COLLATE
I'm sure something is wrong in this table.
None of the erased Chinese characters has a record in iso14651_t1_pinyin, but
they are all included in CJK Unified Ideographs/ExtA/B/C/D.
* [Bug localedata/13063] 'sort -u' will erase some Chinese characters
From: an.euroford at gmail dot com @ 2011-08-07 20:47 UTC (permalink / raw)
To: libc-locales
http://sourceware.org/bugzilla/show_bug.cgi?id=13063
An Yang <an.euroford at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Can not 'sort -u' all |'sort -u' will erase some
|Chinese characters in CJK |Chinese characters
|UNIFIED IDEOGRAPH EXTENSION |
|A/B/C/D |
* [Bug localedata/13063] 'sort -u' will erase some Chinese characters
From: an.euroford at gmail dot com @ 2011-08-08 16:56 UTC (permalink / raw)
To: libc-locales
http://sourceware.org/bugzilla/show_bug.cgi?id=13063
--- Comment #3 from An Yang <an.euroford at gmail dot com> 2011-08-08 16:54:28 UTC ---
There are 25496 Chinese characters in iso14651_t1_pinyin, most of them
distributed over CJK Unified Ideographs and CJK Unified Ideographs Extension A.
But there are 27552 Chinese characters in CJK Unified Ideographs and Extension
A, so more than 2000 Chinese characters without pinyin are missing from the table.
So my suggestion is simply to add the missing characters at the end of
iso14651_t1_pinyin, in Unicode order.
Could you give me any feedback?
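
The suggestion above can be illustrated with a minimal sketch. It is not
taken from any glibc patch. It assumes the set of code points already
covered by the table has been obtained by some other means (no parsing of
iso14651_t1_pinyin is attempted here), and it uses the Unicode block
boundaries U+3400..U+4DBF and U+4E00..U+9FFF, which also contain unassigned
code points that a real patch would have to filter out. The helper names
missing_codepoints and format_symbol are hypothetical.

```python
# Block boundaries (not lists of assigned characters): an assumption of this sketch.
CJK_EXT_A = range(0x3400, 0x4DBF + 1)
CJK_UNIFIED = range(0x4E00, 0x9FFF + 1)


def missing_codepoints(covered, blocks=(CJK_EXT_A, CJK_UNIFIED)):
    """Return the code points in `blocks` that are absent from `covered`,
    sorted in Unicode (code point) order."""
    covered = set(covered)
    return sorted(cp for block in blocks for cp in block if cp not in covered)


def format_symbol(cp):
    """Format a code point in the <UXXXX> notation used by glibc locale
    sources (8 hex digits beyond the BMP, assumed here)."""
    return f"<U{cp:04X}>" if cp <= 0xFFFF else f"<U{cp:08X}>"
```

Calling missing_codepoints with the covered set and mapping format_symbol
over the result would give the list of symbols to append, in Unicode order.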
* [Bug localedata/13063] 'sort -u' will erase some Chinese characters
From: bluebat at member dot fsf.org @ 2014-05-07 8:20 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=13063
趙惟倫 <bluebat at member dot fsf.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |bluebat at member dot fsf.org
--- Comment #4 from 趙惟倫 <bluebat at member dot fsf.org> ---
Confirmed, as reported in BZ#15616.
BZ#16905 is another approach, but it is untested.
* [Bug localedata/13063] 'sort -u' will erase some Chinese characters
From: fweimer at redhat dot com @ 2014-06-13 15:12 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=13063
Florian Weimer <fweimer at redhat dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Flags| |security-
* [Bug localedata/13063] 'sort -u' will erase some Chinese characters
From: bluebat at member dot fsf.org @ 2014-11-19 4:14 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=13063
--- Comment #5 from Wei-Lun Chao <bluebat at member dot fsf.org> ---
Tested with the patch from bug 17563; the test passes.
* [Bug localedata/13063] 'sort -u' will erase some Chinese characters
From: arthur200126 at gmail dot com @ 2017-01-22 23:58 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=13063
Mingye Wang <arthur200126 at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |arthur200126 at gmail dot com
--- Comment #6 from Mingye Wang <arthur200126 at gmail dot com> ---
This bug is seen not only with extA characters, but also with simple
punctuation and/or kana.
$ printf '%s\n' , 。 : ¥ あ か ア カ a b c , . : $ | LC_COLLATE=zh_CN.UTF-8 sort -u
,
:
.
$
,
a
b
c
(uniq does the same thing.)
It seems that glibc is just eating away anything not on that list. (What kind
of equivalence assumption is that?)
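
A minimal sketch of how such collisions could be listed outside of sort:
group the distinct input strings by their collation key, and report every
group with more than one member, since those are the strings that a
collation-based uniqueness check would treat as equal. The function names
are hypothetical, and collisions_in_locale assumes the zh_CN.UTF-8 locale
is installed (locale.setlocale raises locale.Error otherwise).

```python
import locale
from collections import defaultdict


def find_collisions(items, key):
    """Group distinct strings that share the same collation key.

    Returns only the groups with more than one member, in input order.
    """
    groups = defaultdict(list)
    for s in dict.fromkeys(items):  # drop exact duplicates, keep order
        groups[key(s)].append(s)
    return [g for g in groups.values() if len(g) > 1]


def collisions_in_locale(items, locale_name="zh_CN.UTF-8"):
    """Run find_collisions with the C library's collation keys.

    Assumes `locale_name` is available on the system.
    """
    locale.setlocale(locale.LC_COLLATE, locale_name)
    return find_collisions(items, locale.strxfrm)
```

Passing the same arguments as in the printf command above to
collisions_in_locale would show which of them the locale considers equal.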
* [Bug localedata/13063] 'sort -u' will erase some Chinese characters
From: maiku.fabian at gmail dot com @ 2017-07-19 16:16 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=13063
Mike FABIAN <maiku.fabian at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |maiku.fabian at gmail dot com
* [Bug localedata/13063] 'sort -u' will erase some Chinese characters
From: maiku.fabian at gmail dot com @ 2017-07-20 8:02 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=13063
--- Comment #7 from Mike FABIAN <maiku.fabian at gmail dot com> ---
(In reply to Mingye Wang from comment #6)
> This bug is seen not only with extA characters, but also with simple
> punctuation and/or kana.
>
> $ printf '%s\n' , 。 : ¥ あ か ア カ a b c , . : $ | LC_COLLATE=zh_CN.UTF-8 sort
> -u
> ,
> :
> .
> $
> ,
> a
> b
> c
>
> (uniq does the same thing.)
>
> It seems that glibc is just eating away anything not on that list. (What
> kind of equivalence assumption is that?)
This is caused by the collation symbol UNDEFINED not working correctly,
see:
https://sourceware.org/bugzilla/show_bug.cgi?id=18978