From: "glibcbugz at ghalkes dot nl" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sources.redhat.com
Subject: [Bug libc/12830] ISO-2022-JP-2 maps C1 control characters incorrectly
Date: Tue, 20 Sep 2011 13:21:00 -0000
Message-ID: <bug-12830-131-uHEZF57rPa@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-12830-131@http.sourceware.org/bugzilla/>

http://sourceware.org/bugzilla/show_bug.cgi?id=12830

--- Comment #2 from G. Halkes <glibcbugz at ghalkes dot nl> 2011-09-20 13:20:34 UTC ---
Testcase: in bash, using GNU libc iconv (converts U+0081 from C1):

echo -e -n '\x00\x81' | iconv -f UTF-16BE -t ISO-2022-JP-2 | od -t x1

result:
0000000 1b 2e 41 1b 4e 01
0000006

expected result:
0000000 1b 41
0000002

The standard I base my opinion on is ECMA-35, which can be found at
http://www.ecma-international.org/publications/standards/Ecma-035.htm
and which, according to ECMA itself, is "fully identical with International
Standard ISO/IEC 2022:1994". However, the ECMA-35 specification is freely
available, contrary to the ISO-2022 spec.

Specifically, section 9 discusses the structure of 7-bit codes, such as
ISO-2022-JP-2. It references section 7.2, which discusses the definitions of G0
through G3 and C0 and C1. In the specification of graphics sets G0 - G3, it
notes that it uses "column numbers" 02 through 07, i.e. has values between 0x20
and 0x7f. For C1 codes, it defines that they use column numbers 08 and 09, or
ESC Fe. The meaning of the Fe is explained in section 6.4.3 and 13.2, and
basically means a byte in the range 0x40 - 0x5f.

In my reading of the standard, changing GL to one of G0 through G3, using any
of the shift mechanisms, has no impact on the control codes in CL (the range
0x00 through 0x1f). Therefore, the generated sequence is incorrect, and is
essentially equal to the sequence "01".

Because columns 08 and 09 are not used in a 7 bit code such as ISO-2022-JP-2,
it has to use the ESC Fe construct for representing C1 control codes. Thus the
correct sequence would be "1b 41". This actually corresponds to how the control
character SS2 (U+008E) from C1 is already encoded in the example (i.e. "1b
4e"). See also Figure 8 on page 22, for a graphical representation of the
structure of a 7 bit code.

I hope this sufficiently clarifies my previous report.
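
The ESC Fe mapping argued for in the report can be illustrated with a short Python sketch. It only shows the arithmetic as the report reads ECMA-35 (a C1 control in columns 08 and 09 becomes ESC followed by a byte in 0x40 - 0x5f); it is not glibc's converter code, and the names c1_to_esc_fe and hexdump are hypothetical helpers introduced here for illustration.

```python
ESC = 0x1B  # the ESCAPE control character from C0


def c1_to_esc_fe(code_point: int) -> bytes:
    """Return the 7-bit ESC Fe form of a C1 control (U+0080..U+009F).

    Assumption, following the report's reading of ECMA-35: columns 08-09
    are unavailable in a 7-bit code, so the control is written as ESC
    plus a final byte Fe in the range 0x40-0x5f.
    """
    if not 0x80 <= code_point <= 0x9F:
        raise ValueError(f"U+{code_point:04X} is not a C1 control character")
    # Columns 08-09 map onto columns 04-05, i.e. subtract 0x40.
    return bytes([ESC, code_point - 0x40])


def hexdump(data: bytes) -> str:
    """Format bytes as space-separated hex pairs, like `od -t x1` shows them."""
    return " ".join(f"{b:02x}" for b in data)


if __name__ == "__main__":
    print(hexdump(c1_to_esc_fe(0x81)))  # the testcase character U+0081
    print(hexdump(c1_to_esc_fe(0x8E)))  # SS2, U+008E
```

Under that assumption, U+0081 comes out as "1b 41", the expected result in the testcase, and U+008E (SS2) comes out as "1b 4e", matching the bytes already present in the iconv output.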