public inbox for cygwin@cygwin.com
From: Doug Henderson <djndnbvg@gmail.com>
To: cygwin <cygwin@cygwin.com>
Subject: Re: Cygwin fails to utilize Unicode replacement character
Date: Tue, 04 Sep 2018 19:59:00 -0000
Message-ID: <CAJ1FpuNrMhfB-cmKSiQbj_JB2F_GymCzuv_kY2K9M7RuFqr8Rw@mail.gmail.com>
In-Reply-To: <5b8aba97.1c69fb81.96f14.1b37@mx.google.com>

On Sat, 1 Sep 2018 at 10:13, Steven Penny wrote:
<snip>
> You get this result with Linux:
>
>     $ cat alfa.txt
>     �
>
> Where "cat" properly outputs Unicode 'REPLACEMENT CHARACTER' (U+FFFD). However
> with Cygwin you get this:
>
>     $ cat alfa.txt
>     ▒
>
> Where "cat" outputs Unicode Character 'MEDIUM SHADE' (U+2592).


My preference is to remove the output fiddling code that Corinna has
been working on. It is trying to solve the wrong problem.
I think we have gone down a rabbit hole at the wrong end of cat's data flow.

Should any changes to the way a character is displayed be required, they
need to be made in the terminal program that displays the character, not in
cygwin, which should pass the character along unmodified.

Both cygwin and Debian 9.5 show:

    $ file alfa.txt
    alfa.txt: ISO-8859 text

When Linux reads the file, it assumes the encoding is UTF-8.
When cygwin reads the file, it assumes the encoding is CP1252.
This command shows the problem:

    $ iconv -f utf8 alfa.txt
    iconv: alfa.txt:1:0: incomplete character or shift sequence

On Linux, this shows a slightly different message, with the same intent.
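
As a rough illustration of why a strict UTF-8 reading fails where a
CP1252 reading succeeds, here is a minimal Python sketch. The sample
bytes are an assumption (a lone 0xEB followed by a newline), since the
actual contents of alfa.txt are not shown in this message, and
try_decode is a hypothetical helper name.

```python
from typing import Optional

# Hypothetical sample: a lone 0xEB byte plus newline. The real contents of
# alfa.txt are not shown in this message, so this is only an assumption.
SAMPLE = b"\xeb\n"


def try_decode(data: bytes, encoding: str) -> Optional[str]:
    """Return the decoded text, or None if data is not valid in encoding."""
    try:
        return data.decode(encoding)  # strict error handling is the default
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    for enc in ("utf-8", "cp1252"):
        print(enc, "->", repr(try_decode(SAMPLE, enc)))
```

Under that assumption, the UTF-8 attempt is rejected because 0xEB starts
a multi-byte sequence that the newline cannot continue, while CP1252
maps the byte directly to a character.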

Try using this string:

    $ printf "\xC3\xAB\353\n"
    ë▒

to get a better understanding of the problem. It contains two
representations of LATIN SMALL LETTER E WITH DIAERESIS, first encoded
in UTF-8, then in ISO-8859-1.
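
The same bytes can also be inspected outside the terminal, which takes
the font out of the picture. A minimal Python sketch, assuming only the
bytes from the printf line above (octal \353 is 0xEB); the variable
names are mine:

```python
# Bytes produced by: printf "\xC3\xAB\353\n"  (octal 353 == 0xEB)
DATA = b"\xc3\xab\xeb\n"

# Read as UTF-8, substituting U+FFFD for anything that is not valid UTF-8.
as_utf8 = DATA.decode("utf-8", errors="replace")

# Read as ISO-8859-1, where every byte maps to exactly one character.
as_latin1 = DATA.decode("iso-8859-1")

if __name__ == "__main__":
    print([f"U+{ord(c):04X}" for c in as_utf8])
    print([f"U+{ord(c):04X}" for c in as_latin1])
```

Read as UTF-8, the first two bytes form U+00EB and the stray 0xEB
becomes U+FFFD. Read as ISO-8859-1, the UTF-8 pair turns into two
separate characters and only the last byte is U+00EB.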

There are two different reasons for the MEDIUM SHADE. Here it
indicates an invalid UTF-8 byte sequence, shown this way because the
font does not have a glyph for REPLACEMENT CHARACTER. The MEDIUM SHADE
is also used in place of an ordinary character without a glyph in the font.

HTH
Doug

-- 
Doug Henderson, Calgary, Alberta, Canada - from gmail.com
