public inbox for glibc-bugs@sourceware.org
* [Bug libc/12830] New: ISO-2022-JP-2 maps C1 control characters incorrectly
@ 2011-06-01  7:42 glibcbugz at ghalkes dot nl
  2011-09-09 19:50 ` [Bug libc/12830] " drepper.fsp at gmail dot com
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: glibcbugz at ghalkes dot nl @ 2011-06-01  7:42 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=12830

           Summary: ISO-2022-JP-2 maps C1 control characters incorrectly
           Product: glibc
           Version: 2.13
            Status: NEW
          Severity: normal
          Priority: P2
         Component: libc
        AssignedTo: drepper.fsp@gmail.com
        ReportedBy: glibcbugz@ghalkes.nl


In the ISO-2022-JP-2 converter, the C1 control codes (U+0080-U+009F) are encoded
as 1B 2E 41 1B 4E [00 - 1F] (i.e., load ISO-8859-1 in the G2 graphics set, use
single shift to select G2, and encode the byte [00 - 1F]). However, if I understand
the standard correctly, switching to the G2 set _only_ changes the mapping of
the 96 characters in the range 20-7F (or the 94 characters in the range 21-7E
if a smaller set is used). The control characters are unaffected. To access the
C1 control set, one should use 1B [40 - 5F]. This is actually done for the
encoding of the "single shift 2" control (U+008E) in the sequence above, which
is encoded as 1B 4E.
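
To make the contrast concrete, here is a minimal Python sketch of the two
encodings described above. It is not glibc code: the function names are
hypothetical, and the "reported" form simply reproduces the byte pattern given
in this report.

```python
ESC = 0x1B


def _check_c1(code_point):
    """Reject anything outside the C1 range U+0080..U+009F."""
    if not 0x80 <= code_point <= 0x9F:
        raise ValueError("not a C1 control: U+%04X" % code_point)


def reported_encoding(code_point):
    """Byte pattern the report attributes to the converter (illustrative)."""
    _check_c1(code_point)
    # 1B 2E 41: load ISO-8859-1 into G2; 1B 4E: single shift 2;
    # the last byte falls in 00..1F, i.e. the control area.
    return bytes([ESC, 0x2E, 0x41, ESC, 0x4E, code_point - 0x80])


def esc_fe_encoding(code_point):
    """ESC Fe form the report argues for: ESC plus a byte in 40..5F."""
    _check_c1(code_point)
    return bytes([ESC, code_point - 0x40])


print(reported_encoding(0x81).hex(" "))  # 1b 2e 41 1b 4e 01
print(esc_fe_encoding(0x81).hex(" "))    # 1b 41
print(esc_fe_encoding(0x8E).hex(" "))    # 1b 4e
```

The last line shows that SS2 (U+008E) in ESC Fe form is exactly the 1B 4E that
already appears inside the reported sequence.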

-- 
Configure bugmail: http://sourceware.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.



* [Bug libc/12830] ISO-2022-JP-2 maps C1 control characters incorrectly
From: drepper.fsp at gmail dot com @ 2011-09-09 19:50 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=12830

Ulrich Drepper <drepper.fsp at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |WAITING

--- Comment #1 from Ulrich Drepper <drepper.fsp at gmail dot com> 2011-09-09 19:49:24 UTC ---
Provide a self-contained test case and a reference to the specification on
which you base your opinion.


* [Bug libc/12830] ISO-2022-JP-2 maps C1 control characters incorrectly
From: glibcbugz at ghalkes dot nl @ 2011-09-20 13:21 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=12830

--- Comment #2 from G. Halkes <glibcbugz at ghalkes dot nl> 2011-09-20 13:20:34 UTC ---
Testcase: in bash, using GNU libc iconv (converts U+0081 from C1):

echo -e -n '\x00\x81' | iconv -f UTF-16BE -t ISO-2022-JP-2 | od -t x1

result:

0000000 1b 2e 41 1b 4e 01
0000006

expected result:

0000000 1b 41
0000002

The standard I base my opinion on is ECMA-35, which can be found at
http://www.ecma-international.org/publications/standards/Ecma-035.htm and
which, according to ECMA itself, is "fully identical with International
Standard ISO/IEC 2022:1994". Unlike the ISO/IEC 2022 text, however, the
ECMA-35 specification is freely available.

Specifically, section 9 discusses the structure of 7-bit codes, such as
ISO-2022-JP-2. It references section 7.2, which discusses the definitions of G0
through G3 and C0 and C1. In the specification of graphics sets G0 - G3, it
notes that they use "column numbers" 02 through 07, i.e. byte values between
0x20 and 0x7f. For C1 codes, it defines that they use column numbers 08 and 09,
or ESC Fe. The meaning of Fe is explained in sections 6.4.3 and 13.2; it
basically means a byte in the range 0x40 - 0x5f.
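
The column ranges above can be sketched as follows. This is an illustration
only: the helper names are hypothetical, and the area labels follow the
description in this comment rather than quoting the standard.

```python
def column_row(byte):
    """Format a byte in column/row notation, e.g. 0x1B -> '01/11'."""
    return "%02d/%02d" % (byte >> 4, byte & 0x0F)


def area(byte):
    """Name the code-table area a byte value falls in, by column number."""
    column = byte >> 4
    if column <= 1:
        return "C0"  # columns 00-01, bytes 0x00..0x1F
    if column <= 7:
        return "GL"  # columns 02-07, bytes 0x20..0x7F
    if column <= 9:
        return "C1"  # columns 08-09, bytes 0x80..0x9F (8-bit codes only)
    return "GR"      # columns 10-15, bytes 0xA0..0xFF (8-bit codes only)


def is_fe(byte):
    """True if the byte can follow ESC to represent a C1 control."""
    return 0x40 <= byte <= 0x5F


for b in (0x01, 0x1B, 0x41, 0x81):
    print("0x%02X" % b, column_row(b), area(b))
```

In a 7-bit code only the first two areas exist, which is why a C1 control has to
be written as ESC followed by an Fe byte.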

In my reading of the standard, changing GL to one of G0 through G3, using any
of the shift mechanisms, has no impact on the control codes in CL (the range
0x00 through 0x1f). Therefore, the generated sequence is incorrect, and is
essentially equal to the sequence "01".

Because columns 08 and 09 are not used in a 7-bit code such as ISO-2022-JP-2,
it has to use the ESC Fe construct for representing C1 control codes. Thus the
correct sequence would be "1b 41". This actually corresponds to how the control
character SS2 (U+008E) from C1 is already encoded in the example (i.e. "1b
4e"). See also Figure 8 on page 22 for a graphical representation of the
structure of a 7-bit code.
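
As a rough illustration of this reading, the sketch below splits a 7-bit byte
stream into tokens. It is a simplification written for this comment, not a
conforming ISO 2022 decoder: the `tokenize` name is hypothetical, it does not
track which set is designated or shifted, and it assumes, as argued above, that
bytes 0x00-0x1f are always C0 controls.

```python
ESC = 0x1B


def tokenize(data):
    """Split bytes into escape sequences, C1 (ESC Fe), C0 and GL tokens."""
    tokens = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b == ESC and i + 1 < n:
            nxt = data[i + 1]
            if 0x40 <= nxt <= 0x5F:
                # ESC Fe: 7-bit representation of a C1 control.
                tokens.append("C1 U+%04X" % (nxt + 0x40))
                i += 2
                continue
            # Otherwise expect intermediate bytes (20..2F) and a final byte.
            j = i + 1
            while j < n and 0x20 <= data[j] <= 0x2F:
                j += 1
            if j > i + 1 and j < n and 0x30 <= data[j] <= 0x7E:
                tokens.append("ESC-SEQ " + data[i + 1:j + 1].hex(" "))
                i = j + 1
                continue
        if b <= 0x1F:
            tokens.append("C0 U+%04X" % b)
        else:
            tokens.append("GL 0x%02X" % b)
        i += 1
    return tokens


print(tokenize(bytes.fromhex("1b2e411b4e01")))
# ['ESC-SEQ 2e 41', 'C1 U+008E', 'C0 U+0001']
print(tokenize(bytes.fromhex("1b41")))
# ['C1 U+0081']
```

Under these assumptions the generated sequence ends in the C0 control U+0001,
whereas "1b 41" yields the intended U+0081.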

I hope this sufficiently clarifies my previous report.


* [Bug libc/12830] ISO-2022-JP-2 maps C1 control characters incorrectly
From: aj at suse dot de @ 2012-01-04 20:03 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=12830

Andreas Jaeger <aj at suse dot de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW
                 CC|                            |aj at suse dot de

--- Comment #3 from Andreas Jaeger <aj at suse dot de> 2012-01-04 20:02:18 UTC ---
Information was provided, resetting state.


* [Bug libc/12830] ISO-2022-JP-2 maps C1 control characters incorrectly
From: schwab@linux-m68k.org @ 2012-12-19 10:51 UTC (permalink / raw)
  To: glibc-bugs

http://sourceware.org/bugzilla/show_bug.cgi?id=12830

Andreas Schwab <schwab@linux-m68k.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|drepper.fsp at gmail dot    |unassigned at sourceware
                   |com                         |dot org


* [Bug localedata/12830] ISO-2022-JP-2 maps C1 control characters incorrectly
From: jsm28 at gcc dot gnu.org @ 2014-02-07  2:56 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=12830

Joseph Myers <jsm28 at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |libc-locales at sourceware dot org
          Component|libc                        |localedata


* [Bug localedata/12830] ISO-2022-JP-2 maps C1 control characters incorrectly
From: fweimer at redhat dot com @ 2014-06-27 13:14 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=12830

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
              Flags|                            |security-

