public inbox for cygwin@cygwin.com
* Re: Problem with displaying ASCII table in mintty
@ 2009-06-26 15:20 Mark Harig
  0 siblings, 0 replies; 7+ messages in thread
From: Mark Harig @ 2009-06-26 15:20 UTC (permalink / raw)
  To: The Cygwin Mailing List

> >
> > With this configuration, the upper 128 entries to the ASCII
> > table are displayed as follows (the #'s are replacements for
> > the gray box character that is displayed):
>
> That's because bytes from 0x80 to 0xFF by themselves are
> invalid in UTF-8. Those codepoints need to be encoded as two-byte
> sequences. I'd suspect /bin/ascii isn't designed for that.

OK.  From what you have written, it appears that the defect
is in /usr/bin/ascii (cygutils) in that it does not support UTF-8.
Either that will need to be fixed or the documentation for
the tool could list/describe the problem so that other users
are told that it is known.

> > In addition,
> > many entries are displayed in the wrong location (some rows are out
> > of order).
>
> That's because some of the C1 control characters are interpreted
> specially, in particular CSI and OSC. It's the same if you try it in
> xterm.
>
> You can get most of the printable characters in the C1 range by
> switching to Windows codepage 1252. (Well, you could anyway if it
> wasn't for a rather bad bug in mintty-0.4.0 and 0.4.1 that means that
> ISO-8859-1 is used no matter your codepage setting. That's fixed on
> the 0.4 SVN branch.)

OK.  Users can display the upper 128 entries in 'rxvt' so there is
a work-around in a Cygwin environment.  (In 'rxvt', LC_CTYPE
should not be set to UTF-8.  AFAIK, UTF-8 is not supported
in cygwin's rxvt.)



--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple


* Re: Problem with displaying ASCII table in mintty
  2009-06-26 19:51 ` Mark J. Reed
@ 2009-06-27  0:14   ` Charles Wilson
  0 siblings, 0 replies; 7+ messages in thread
From: Charles Wilson @ 2009-06-27  0:14 UTC (permalink / raw)
  To: cygwin

Mark J. Reed wrote:
> On Fri, Jun 26, 2009 at 12:51 PM, Mark Harig wrote:
>> Do you have any recommendations about what the utility program /usr/bin/ascii
>> (in the package 'cygutils') should do?
> 
> Since the Cygwin version of ascii doesn't appear to have a man page,
> I'm not sure what it "should" do.  What it appears to do is simply
> print out all possible 8-bit characters so you can see what they
> are.  Which will fail in any multibyte locale.

Yep. ascii.exe is dirt simple.  One possibility is that it could detect
the locale, and for multibyte ones display only 0x00..0x7f (or
0x20..0x7f).  For single byte locales, go ahead with the current
dirt-simple behavior.
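
For illustration only, here is a minimal Python sketch of that idea. It is not how ascii.exe is written (a C program would more likely test MB_CUR_MAX), and the helper names max_bytes_per_char and table_codes are made up for this example. It estimates the longest encoded character in the locale's charset and limits the table to 7-bit codes when that is more than one byte.

```python
import locale

def max_bytes_per_char(encoding):
    """Rough stand-in for C's MB_CUR_MAX: longest encoding of any sampled character."""
    # Sample U+0080..U+30FF (Latin, Greek, Cyrillic, kana, ...). Characters the
    # encoding cannot represent are dropped by errors="ignore" and count as 0 bytes.
    return max(len(chr(cp).encode(encoding, "ignore")) for cp in range(0x80, 0x3100))

def table_codes(encoding):
    """Codes a table printer could emit as raw single bytes in this encoding."""
    if max_bytes_per_char(encoding) > 1:
        return range(0x00, 0x80)    # multibyte locale: stay within ASCII
    return range(0x00, 0x100)       # single-byte locale: all 256 values

enc = locale.getpreferredencoding(False)
codes = table_codes(enc)
print(f"{enc}: would print 0x{codes[0]:02x}..0x{codes[-1]:02x}")
```

The sampling range is an assumption that happens to separate UTF-8 and the common East Asian charsets from the ISO-8859 and Windows single-byte code pages. It is a heuristic, not a general test.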

PTC.

--
Chuck



* Re: Problem with displaying ASCII table in mintty
  2009-06-26 17:36 Mark Harig
@ 2009-06-26 19:51 ` Mark J. Reed
  2009-06-27  0:14   ` Charles Wilson
  0 siblings, 1 reply; 7+ messages in thread
From: Mark J. Reed @ 2009-06-26 19:51 UTC (permalink / raw)
  To: cygwin

On Fri, Jun 26, 2009 at 12:51 PM, Mark Harig wrote:
> Do you have any recommendations about what the utility program /usr/bin/ascii
> (in the package 'cygutils') should do?

Since the Cygwin version of ascii doesn't appear to have a man page,
I'm not sure what it "should" do.  What it appears to do is simply
print out all possible 8-bit characters so you can see what they
are.  Which will fail in any multibyte locale.

You can write your own imitation ascii as a Perl one-liner:

perl -e 'for ($i=0;$i<256;$i+=4) { for ($i..$i+3) { printf "%03d 0x%02x  %c\t", ($_)x3 } print "\n"; }'

which can be adjusted for different locale settings:

perl -Mencoding=utf8 -e '...'


adding the control-sequence support (^x) is left as an exercise for
the reader. :)
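
One possible take on that exercise, sketched in Python rather than Perl. The function names caret, format_entry and ascii_table are invented for this example, and it stays within 0..127 so that the output is valid in any ASCII-compatible locale. Control characters are shown in the usual caret notation (code + 0x40, with ^? for DEL).

```python
def caret(code):
    """Caret notation for ASCII control characters, e.g. 0x1b -> '^['."""
    if code < 0x20:
        return "^" + chr(code + 0x40)
    if code == 0x7F:
        return "^?"
    return chr(code)

def format_entry(code):
    # Same fields as the Perl printf: decimal, hex, then the character,
    # padded to two columns so caret pairs and plain characters line up.
    return f"{code:03d} 0x{code:02x}  {caret(code):<2}"

def ascii_table(columns=4):
    """Build the 7-bit table, `columns` entries per row, tab separated."""
    rows = []
    for start in range(0, 128, columns):
        rows.append("\t".join(format_entry(c) for c in range(start, start + columns)))
    return "\n".join(rows)

print(ascii_table())
```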

-- 
Mark J. Reed <markjreed@gmail.com>


* Re: Problem with displaying ASCII table in mintty
@ 2009-06-26 17:36 Mark Harig
  2009-06-26 19:51 ` Mark J. Reed
  0 siblings, 1 reply; 7+ messages in thread
From: Mark Harig @ 2009-06-26 17:36 UTC (permalink / raw)
  To: The Cygwin Mailing List

Fri, 26 Jun 2009 10:50:45 -0400 Mark J. Reed wrote:

> > Is it possible to display the upper 128 entries in the ASCII
> > table in mintty using the 'cygutils' application 'ascii'?
>
> The ASCII table doesn't have an upper 128 entries.  Only codes 0
> through 127 decimal are defined by ASCII.  Once you hit 128 you're not
> in ASCII anymore, and what you *are* in depends entirely on what code
> page you're using.
>
> 128 through 159 are control characters in Unicode and Latin-1, but
> printable characters in Windows 1252.  160 through 255 are the same in
> Windows 1252, Latin-1, and Unicode, but defined differently in the
> other ISO-8859 and ISO-2022 character sets and Windows code pages.
>
> If you're using UTF-8 (a particular way of representing Unicode
> characters, which are defined as numbers, as concrete bits and bytes),
> then only characters 0 through 127 can be expressed in one byte.
> Characters from 128 to 2047 take two bytes; the rest of the BMP (2048
> through 65535) takes three bytes per character, and the rest of Unicode
> four bytes per character.
>
> So if you just send the byte with decimal value 128, not preceded by
> the start of a UTF-8 sequence, to a UTF-8 terminal, the terminal will
> reject it as invalid, or display gobbledygook, depending on its error
> handling design.

Thank you for the explanation.  I see from the manual page for 'ascii'
provided at http://www.kernel.org/pub/linux/docs/manpages/ that the
ASCII table is as you described (that is, 128 entries only).  Do you have
any recommendations about what the utility program /usr/bin/ascii
(in the package 'cygutils') should do?  Should it omit the values
above 127, since they are not part of the ASCII table?  Or does it
make sense to provide options that handle the values above 127
under the various conditions described above?



* Re: Problem with displaying ASCII table in mintty
  2009-06-26  2:07 Mark Harig
  2009-06-26  8:49 ` Andy Koppe
@ 2009-06-26 15:01 ` Mark J. Reed
  1 sibling, 0 replies; 7+ messages in thread
From: Mark J. Reed @ 2009-06-26 15:01 UTC (permalink / raw)
  To: cygwin

On Thu, Jun 25, 2009 at 9:24 PM, Mark Harig wrote:
> Is it possible to display the upper 128 entries in the ASCII
> table in mintty using the 'cygutils' application 'ascii'?

The ASCII table doesn't have an upper 128 entries.  Only codes 0
through 127 decimal are defined by ASCII.  Once you hit 128 you're not
in ASCII anymore, and what you *are* in depends entirely on what code
page you're using.

128 through 159 are control characters in Unicode and Latin-1, but
printable characters in Windows 1252.  160 through 255 are the same in
Windows 1252, Latin-1, and Unicode, but defined differently in the
other ISO-8859 and ISO-2022 character sets and Windows code pages.
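
To make the difference concrete, here is a small Python sketch (the helper name describe is made up) that decodes single bytes under Latin-1 and Windows 1252 and reports the Unicode general category of the result. It relies on Python's bundled codecs, which leave a few Windows 1252 positions such as 0x81 undefined; other implementations may map those differently.

```python
import unicodedata

def describe(byte, encoding):
    """Decode one byte and report the Unicode general category of the result."""
    try:
        ch = bytes([byte]).decode(encoding)
    except UnicodeDecodeError:
        return "undefined"
    return unicodedata.category(ch)

# 'Cc' is a control character; 'Sc', 'Pf', 'So', 'Ll' are printable categories.
for byte in (0x80, 0x9B, 0xA9, 0xE9):
    print(f"0x{byte:02x}  latin-1: {describe(byte, 'latin-1'):<10} "
          f"cp1252: {describe(byte, 'cp1252')}")

# From 160 up, the two code pages agree byte for byte.
same_upper = all(bytes([b]).decode("latin-1") == bytes([b]).decode("cp1252")
                 for b in range(0xA0, 0x100))
print("160..255 identical:", same_upper)
```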

If you're using UTF-8 (a particular way of representing Unicode
characters, which are defined as numbers, as concrete bits and bytes),
then only characters 0 through 127 can be expressed in one byte.
Characters from 128 to 2047 take two bytes; the rest of the BMP (2048
through 65535) takes three bytes per character, and the rest of Unicode
four bytes per character.
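
Those length boundaries can be written down and cross-checked against a real encoder. The following Python sketch does that; utf8_length is a name chosen for this example, and it assumes the input is a valid non-surrogate code point.

```python
def utf8_length(code_point):
    """Number of bytes UTF-8 uses for a (non-surrogate) code point."""
    if code_point < 0x80:        # 0..127
        return 1
    if code_point < 0x800:       # 128..2047
        return 2
    if code_point < 0x10000:     # 2048..65535, the rest of the BMP
        return 3
    return 4                     # 65536..1114111 (U+10000..U+10FFFF)

# Cross-check each boundary against Python's own UTF-8 encoder.
for cp in (0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF):
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
    print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")
```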

So if you just send the byte with decimal value 128, not preceded by
the start of a UTF-8 sequence, to a UTF-8 terminal, the terminal will
reject it as invalid, or display gobbledygook, depending on its error
handling design.
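
As a rough analogy, Python's decoder offers the same choices a terminal has to make. This sketch only shows the decoder's policies; it does not claim to reproduce what mintty or any other terminal does internally.

```python
lone = bytes([128])  # 0x80 with no UTF-8 lead byte in front of it

# Strict handling: reject the byte outright.
try:
    lone.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True

# Lenient handling: substitute U+FFFD REPLACEMENT CHARACTER.
replaced = lone.decode("utf-8", errors="replace")

# Reading the same byte as Latin-1 instead yields a C1 control character.
as_latin1 = lone.decode("latin-1")

print(rejected, ascii(replaced), ascii(as_latin1))
```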

-- 
Mark J. Reed <markjreed@gmail.com>


* Re: Problem with displaying ASCII table in mintty
  2009-06-26  2:07 Mark Harig
@ 2009-06-26  8:49 ` Andy Koppe
  2009-06-26 15:01 ` Mark J. Reed
  1 sibling, 0 replies; 7+ messages in thread
From: Andy Koppe @ 2009-06-26  8:49 UTC (permalink / raw)
  To: cygwin

2009/6/26 Mark Harig
>
> Is is possible to display the upper 128 entries in the ASCII
> table in mintty using the 'cygutils' application 'ascii'?
>
> I have attempted to use two configurations, but neither one
> displays the table without problems in mintty:
>
> Configuration 1:
>
>   - mintty: Using the font's codepage set to UTF-8
>
>     bash-3.2$ /usr/bin/grep Codepage ~/.minttyrc
>     Codepage=UTF-8
>
>   - bash:
>     bash-3.2$ echo $TERM
>     xterm
>
>    bash-3.2$ echo \"$LANG\" ":" \"$LC_ALL\" ":" \"$LC_CTYPE\"
>     "en_US.UTF-8" : "en_US.UTF-8" : "en_US.UTF-8"
>
>  With this configuration, the upper 128 entries to the ASCII
>  table are displayed as follows (the #'s are replacements for
>  the gray box character that is displayed):

That's because bytes from 0x80 to 0xFF by themselves are
invalid in UTF-8. Those codepoints need to be encoded as two-byte
sequences. I'd suspect /bin/ascii isn't designed for that.
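
A quick way to see the two-byte sequences involved, as a Python sketch (utf8_row is a made-up helper name): each code point from 0x80 to 0xFF encodes to a lead byte of 0xC2 or 0xC3 followed by one continuation byte.

```python
def utf8_row(code):
    """UTF-8 bytes for code point `code`, formatted as space-separated hex."""
    return " ".join(f"{b:02x}" for b in chr(code).encode("utf-8"))

for code in (0x80, 0xA9, 0xBF, 0xC0, 0xFF):
    print(f"{code:3d}  0x{code:02x}  ->  {utf8_row(code)}")

# Every code point in the upper half of the table needs exactly two bytes.
all_two_bytes = all(len(chr(c).encode("utf-8")) == 2 for c in range(0x80, 0x100))
print("all two bytes:", all_two_bytes)
```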


> Configuration 2:
>
>   - mintty: Using the font's codepage set to ISO-8859-1
>
>     bash-3.2$ /usr/bin/grep -i codepage ~/.minttyrc
>     Codepage=ISO-8859-1:1998 (Latin-1, West Europe)
>
>  - bash:
>
>    bash-3.2$ echo \"$LANG\" ":" \"$LC_ALL\" ":" \"$LC_CTYPE\"
>    "en_US.ISO-8859-1" : "en_US.ISO-8859-1" : "en_US.ISO-8859-1"
>
>  With this second configuration, most of the upper 128 entries
>  of the ASCII table are displayed, but many are missing.

What's missing are the characters from 0x80 to 0x9F, aka the C1
control character set in the ISO codepages. Windows codepages have
printable characters in their place.

>  In addition,
>  many entries are displayed in the wrong location (some rows are out
>  of order).

That's because some of the C1 control characters are interpreted
specially, in particular CSI and OSC. It's the same if you try it in
xterm.
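
For reference, each C1 control byte has a 7-bit equivalent: ESC followed by the byte minus 0x40, so CSI (0x9B) acts like ESC [ and OSC (0x9D) like ESC ]. The Python sketch below shows that mapping, plus one way a table printer could avoid sending raw C1 bytes. The names seven_bit_form and visible are invented for this example, and the "." placeholder is an arbitrary choice.

```python
ESC = 0x1B

def seven_bit_form(c1_byte):
    """Map a C1 control (0x80..0x9F) to its equivalent two-byte ESC sequence."""
    if not 0x80 <= c1_byte <= 0x9F:
        raise ValueError("not a C1 control byte")
    return bytes([ESC, c1_byte - 0x40])

def visible(byte):
    """Placeholder a Latin-1 table printer could show instead of a raw C1 byte."""
    return "." if 0x80 <= byte <= 0x9F else bytes([byte]).decode("latin-1")

print(seven_bit_form(0x9B))  # CSI, same effect as ESC [
print(seven_bit_form(0x9D))  # OSC, same effect as ESC ]
print("".join(visible(b) for b in range(0x80, 0x100)))
```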

You can get most of the printable characters in the C1 range by
switching to Windows codepage 1252. (Well, you could anyway if it
wasn't for a rather bad bug in mintty-0.4.0 and 0.4.1 that means that
ISO-8859-1 is used no matter your codepage setting. That's fixed on
the 0.4 SVN branch.)

Andy


* Problem with displaying ASCII table in mintty
@ 2009-06-26  2:07 Mark Harig
  2009-06-26  8:49 ` Andy Koppe
  2009-06-26 15:01 ` Mark J. Reed
  0 siblings, 2 replies; 7+ messages in thread
From: Mark Harig @ 2009-06-26  2:07 UTC (permalink / raw)
  To: The Cygwin Mailing List

Is it possible to display the upper 128 entries in the ASCII
table in mintty using the 'cygutils' application 'ascii'?

Package versions:

bash-3.2$ /usr/bin/cygcheck -c bash cygutils cygwin mintty
Cygwin Package Information
Package              Version        Status
bash                 3.2.49-22      OK
cygutils             1.4.0-1        OK
cygwin               1.7.0-50       OK
mintty               0.4.1-1        OK

bash-3.2$ /usr/bin/ascii --version
/usr/bin/ascii is part of cygutils version 1.4.0
  Prints nicely formatted table of the ascii character set


I have attempted to use two configurations, but neither one
displays the table without problems in mintty:

Configuration 1:

    - mintty: Using the font's codepage set to UTF-8

      bash-3.2$ /usr/bin/grep Codepage ~/.minttyrc
      Codepage=UTF-8

    - bash:
      bash-3.2$ echo $TERM
      xterm

     bash-3.2$ echo \"$LANG\" ":" \"$LC_ALL\" ":" \"$LC_CTYPE\"
      "en_US.UTF-8" : "en_US.UTF-8" : "en_US.UTF-8"

   With this configuration, the upper 128 entries to the ASCII
   table are displayed as follows (the #'s are replacements for
   the gray box character that is displayed):

       bash-3.2$ /usr/bin/ascii
       128  0x80  #    160  0xa0  #    192  0xc0  #    224  0xe0  #
       ...
       159  0x9f  #    191  0xbf  #    223  0xdf  #    255  0xff  #

Configuration 2:

    - mintty: Using the font's codepage set to ISO-8859-1

      bash-3.2$ /usr/bin/grep -i codepage ~/.minttyrc
      Codepage=ISO-8859-1:1998 (Latin-1, West Europe)

   - bash:

     bash-3.2$ echo \"$LANG\" ":" \"$LC_ALL\" ":" \"$LC_CTYPE\"
     "en_US.ISO-8859-1" : "en_US.ISO-8859-1" : "en_US.ISO-8859-1"

   With this second configuration, most of the upper 128 entries
   of the ASCII table are displayed, but many are missing.  In addition,
   many entries are displayed in the wrong location (some rows are out
   of order).

Note: I had mintty start bash with the '--norc' and '--noprofile' options.


