* [Bug localedata/20865] New: iconv: cp950 does not contain EUDC/PUA mappings
@ 2016-11-25 4:49 arthur200126 at gmail dot com
2016-11-29 8:15 ` [Bug localedata/20865] charmaps: " fweimer at redhat dot com
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: arthur200126 at gmail dot com @ 2016-11-25 4:49 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=20865
Bug ID: 20865
Summary: iconv: cp950 does not contain EUDC/PUA mappings
Product: glibc
Version: unspecified
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: localedata
Assignee: unassigned at sourceware dot org
Reporter: arthur200126 at gmail dot com
CC: libc-locales at sourceware dot org
Target Milestone: ---
Microsoft's cp950 mapping assigns Big5's End-User-Defined Characters (EUDC)
areas sequentially to the Unicode Private Use Area (PUA). A number of Big5
extensions rely on such mappings, including HKSCS, which uses these PUA code
points when a character is not yet available in the target UCS version.
The following sessions come from GNU bash running in a UTF-8 console. $''
denotes bash's ANSI C-style quoting, where \xhh produces the raw byte with hex
value hh and \uhhhh produces the encoding of U+hhhh in the current locale.
Currently glibc's cp950 implementation does not contain these mappings:
# iconv (Ubuntu GLIBC 2.23-0ubuntu4) 2.23
ubuntu$ iconv -f cp950 -t utf-32le <<< $'\x81\x40' | hexdump -C
iconv: illegal input sequence at position 0
ubuntu$ iconv -t cp950 -f utf-8 <<< $'\ueeb8' | hexdump -C
iconv: illegal input sequence at position 0
The desired behavior for decoding can be seen in libiconv:
# iconv (GNU libiconv 1.14)
cygwin$ iconv -f cp950 -t utf-32le <<< $'\x81\x40' | hexdump -C
00000000 b8 ee 00 00 0a 00 00 00 |........|
00000008
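The arithmetic behind that result can be sketched in shell. This is an illustration, not libiconv's code. The three block bases are assumptions modeled on the private-area table in the comments of libiconv's cp950.h; only 0x8140 -> U+EEB8 is confirmed by the session above. The function name eudc_to_pua is hypothetical.

```shell
#!/bin/sh
# Sketch: map a CP950 EUDC double-byte code to a Unicode PUA code point.
# ASSUMPTION: the block bases below are not taken from this thread, apart
# from 0x8140 -> U+EEB8, which the libiconv session shows.
# Each lead byte covers 157 trail bytes: 0x40..0x7E (63) and 0xA1..0xFE (94).

eudc_to_pua() (
    lead=$(($1)) trail=$(($2))

    # Column index of the trail byte within its 157-cell row.
    if [ "$trail" -ge $((0x40)) ] && [ "$trail" -le $((0x7E)) ]; then
        col=$((trail - 0x40))
    elif [ "$trail" -ge $((0xA1)) ] && [ "$trail" -le $((0xFE)) ]; then
        col=$((trail - 0xA1 + 63))
    else
        return 1    # not a valid trail byte
    fi

    # PUA base and first lead byte of the block containing this lead byte.
    if   [ "$lead" -ge $((0x81)) ] && [ "$lead" -le $((0x8D)) ]; then
        base=$((0xEEB8)) first=$((0x81))
    elif [ "$lead" -ge $((0x8E)) ] && [ "$lead" -le $((0xA0)) ]; then
        base=$((0xE311)) first=$((0x8E))
    elif [ "$lead" -ge $((0xFA)) ] && [ "$lead" -le $((0xFE)) ]; then
        base=$((0xE000)) first=$((0xFA))
    else
        return 1    # lead byte is outside the assumed EUDC blocks
    fi

    printf 'U+%04X\n' $((base + 157 * (lead - first) + col))
)

eudc_to_pua 0x81 0x40    # prints U+EEB8
```

Under these assumptions the three blocks tile U+E000..U+F6B0 without gaps, which is what "sequential" means in the report.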
Note that libiconv does not do the reverse (encoding) conversion:
cygwin$ iconv -t cp950 -f utf-8 <<< $'\ueeb8' | hexdump -C
iconv: illegal input sequence at position 0
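For comparison, the encoding direction that fails here would amount to inverting the decoding arithmetic. The following is a minimal sketch under assumed block ranges (U+EEB8.., U+E311.., U+E000.. for lead bytes 0x81.., 0x8E.., 0xFA..); only the pair U+EEB8 / 0x8140 appears in this report. pua_to_eudc is a hypothetical name, not an existing iconv interface.

```shell
#!/bin/sh
# Sketch: map a Unicode PUA code point back to a CP950 EUDC double-byte code.
# ASSUMPTION: block ranges are illustrative; each lead byte is taken to cover
# 157 trail bytes, 0x40..0x7E (63) followed by 0xA1..0xFE (94).

pua_to_eudc() (
    cp=$(($1))

    # Find the block that contains the code point.
    if   [ "$cp" -ge $((0xEEB8)) ] && [ "$cp" -le $((0xF6B0)) ]; then
        base=$((0xEEB8)) first=$((0x81))
    elif [ "$cp" -ge $((0xE311)) ] && [ "$cp" -le $((0xEEB7)) ]; then
        base=$((0xE311)) first=$((0x8E))
    elif [ "$cp" -ge $((0xE000)) ] && [ "$cp" -le $((0xE310)) ]; then
        base=$((0xE000)) first=$((0xFA))
    else
        return 1    # outside the assumed PUA blocks
    fi

    off=$((cp - base))
    lead=$((first + off / 157))
    col=$((off % 157))

    # The first 63 columns use trail bytes 0x40..0x7E, the rest 0xA1..0xFE.
    if [ "$col" -lt 63 ]; then
        trail=$((0x40 + col))
    else
        trail=$((0xA1 + col - 63))
    fi

    printf '0x%02X%02X\n' "$lead" "$trail"
)

pua_to_eudc 0xEEB8    # prints 0x8140
```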
libiconv's mapping:
http://git.savannah.gnu.org/cgit/libiconv.git/tree/lib/cp950.h#n72
--
You are receiving this mail because:
You are on the CC list for the bug.
* [Bug localedata/20865] charmaps: cp950 does not contain EUDC/PUA mappings
From: fweimer at redhat dot com @ 2016-11-29 8:15 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=20865
Mike Frysinger <vapier at gentoo dot org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|iconv: cp950 does not       |charmaps: cp950 does not
                   |contain EUDC/PUA mappings   |contain EUDC/PUA mappings
Florian Weimer <fweimer at redhat dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-
--- Comment #1 from Mike Frysinger <vapier at gentoo dot org> ---
We currently alias CP950 to BIG5. I don't think we want to add all these
mappings to BIG5, since it explicitly carves out that space as "reserved":
https://en.wikipedia.org/wiki/Big5#A_more_detailed_look_at_the_organization
That means we need to copy BIG5 to CP950, add the MS extensions, and then drop
the alias of CP950->BIG5.
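As a rough idea of what the added entries might look like, the sketch below prints charmap-style lines for a single EUDC lead byte. Several things are assumptions: the `<Uxxxx> /xHH/xHH` line layout and the `<Private Use>` label are modeled on existing glibc charmap files rather than taken from this thread, and emit_row is a hypothetical helper. The starting code point U+EEB8 for lead byte 0x81 comes from the libiconv session in the original report.

```shell
#!/bin/sh
# Sketch: print charmap-style entries for one EUDC row.
# ASSUMPTION: the line layout and the "<Private Use>" label are illustrative.
# Usage: emit_row LEAD_BYTE FIRST_CODE_POINT

emit_row() (
    lead=$(($1)) cp=$(($2))

    # A row has two runs of trail bytes: 0x40..0x7E and 0xA1..0xFE.
    for range in "0x40 0x7E" "0xA1 0xFE"; do
        set -- $range
        trail=$(($1)) last=$(($2))
        while [ "$trail" -le "$last" ]; do
            printf '<U%04X>     /x%02x/x%02x         <Private Use>\n' \
                "$cp" "$lead" "$trail"
            cp=$((cp + 1))
            trail=$((trail + 1))
        done
    done
)

# Show the first two of the 157 entries for lead byte 0x81.
emit_row 0x81 0xEEB8 | sed -n '1,2p'
```

Generating the entries from a rule rather than writing them by hand keeps the sequential assignment easy to check against whichever reference table is chosen.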
* [Bug localedata/20865] charmaps: cp950 does not contain EUDC/PUA mappings
From: arthur200126 at gmail dot com @ 2016-11-30 1:03 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=20865
--- Comment #2 from Mingye Wang <arthur200126 at gmail dot com> ---
> I don't think we want to add all these mappings to BIG5 since it
> explicitly carves out that space as "reserved":
If you consider how GB 18030's user-defined areas are mapped, mapping the part
that is "reserved for user-defined characters" to the PUA may be acceptable.
The "reserved, not for user-defined" part is not mapped to the PUA in cp950 or
big5-2003.
> which means we need to copy BIG5 to CP950 and add the MS extensions
I went back to libiconv's charts and read a few lines up to see all of MS's
modifications and extensions; it does now sound messy enough for a split.
(0xA1F2 and 0xA1F3 are entertainingly weird.)
* [Bug localedata/20865] charmaps: cp950 does not contain EUDC/PUA mappings
From: arthur200126 at gmail dot com @ 2016-11-30 21:25 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=20865
--- Comment #3 from Mingye Wang <arthur200126 at gmail dot com> ---
> now it does sound messy enough for a split.
Hmm, glibc is already using the "MICSFT/WINDOWS" cp950 definition for BIG5's
charmap. So it's not that bad...
* [Bug localedata/20865] charmaps: cp950 does not contain EUDC/PUA mappings
From: maiku.fabian at gmail dot com @ 2017-10-21 8:27 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=20865
Mike FABIAN <maiku.fabian at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maiku.fabian at gmail dot com
* [Bug localedata/20865] charmaps: cp950 needs Windows EUDC/PUA mappings
From: arthur200126 at gmail dot com @ 2017-10-28 18:22 UTC (permalink / raw)
To: libc-locales
https://sourceware.org/bugzilla/show_bug.cgi?id=20865
Mingye Wang <arthur200126 at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|charmaps: cp950 does not    |charmaps: cp950 needs
                   |contain EUDC/PUA mappings   |Windows EUDC/PUA mappings