public inbox for cygwin@cygwin.com
 help / color / mirror / Atom feed
From: Steven Penny <svnpenn@gmail.com>
To: cygwin@cygwin.com
Subject: Re: Cygwin fails to utilize Unicode replacement character
Date: Tue, 04 Sep 2018 21:05:00 -0000	[thread overview]
Message-ID: <5b8ef3af.1c69fb81.6801.f392@mx.google.com> (raw)
In-Reply-To: <CAJ1FpuNrMhfB-cmKSiQbj_JB2F_GymCzuv_kY2K9M7RuFqr8Rw@mail.gmail.com>

On Tue, 4 Sep 2018 13:59:10, Doug Henderson wrote:
> My preference is to remove the output fiddling code that Corrina has
> been working on. It is trying to solve the wrong problem.
> I think we have gone down a rabbit hole at the wrong end of cat's data flow.

this has nothing to do with "cat". it has to do with the unfounded design
decision to use U+2592. Granted at this point we are bikeshedding - but an
official standard does exist, namely Unicode, with 2 applicable characters for
this use case:

1. U+FFFD: http://unicode.org/charts/nameslist/n_FFF0.html
2. U+25A1: http://unicode.org/charts/nameslist/n_25A0.html

> Should any changes to the way a character is displayed be required, it
> needs to be in the terminal program that display the character, not in
> cygwin which should pass the character along unmodified.

the "terminal" in this case is either "cygwin" or "xterm" - in both cases code
changes have already been made in reponse to this thread, so i dont think your
comment here holds weight.

> Both cygwin and Debian 9.5 show:
>
>     $ file alfa.txt
>     alfa.txt: ISO-8859 text
>
> When Linux reads the file, it assumes the encoding is UTF-8.
> When cygwin reads the file, it assume the encoding is CP1252
> This command shows the problem
>
>     $ iconv -f utf8 alfa.txt
>     iconv: alfa.txt:1:0: incomplete character or shift sequence
>
> On Linux, this shows a slightly different message, with the same intent.
>
> Try using this string:
>
>     $ printf "\xC3\xAB\353\n"
>     =C3=AB=E2=96=92
>
> to get a better understanding of the problem. It contains two
> representation of LATIN SMALL LETTER E WITH DIAERESIS, first encoded
> in UTF-8, then using ISO-8859-1.

now it appears *you* are going down the rabbit hole. both Cygwin and Mintty were
in violation on Unicode standard - however this has already been remedied in the
code.

> There are two different reasons for the MEDIUM SHADE. Here it
> indicates an invalid UTF-8 character, and the font does not have a
> glyph for REPLACEMENT CHARACTER. The MEDIUM SHADE is also used in
> place of an ordinary character without a glyph in the font.

this is flat wrong. U+2592 MEDIUM SHADE is *only* used in cases of invalid
UTF-8. In case of missing character - the ".notdef" glyph is used - as has been
discussed several times in this thread. This is not an actual character, so i
cannot paste it here - but as an example with "DejaVu Sans Mono" the glyph is
an empty rectangle.


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

      reply	other threads:[~2018-09-04 21:05 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-01 16:13 Steven Penny
2018-09-01 18:11 ` Thomas Wolff
2018-09-01 18:46   ` Steven Penny
2018-09-01 21:07     ` Thomas Wolff
2018-09-01 19:40 ` Corinna Vinschen
2018-09-01 21:50 ` Doug Henderson
2018-09-01 22:49   ` Steven Penny
2018-09-02  8:07     ` Thomas Wolff
2018-09-02 12:51       ` Steven Penny
2018-09-03 12:46         ` Corinna Vinschen
2018-09-03 14:59           ` Corinna Vinschen
2018-09-03 16:34             ` Thomas Wolff
2018-09-03 17:17               ` Corinna Vinschen
2018-09-03 17:56                 ` Thomas Wolff
2018-09-03 18:20                   ` Thomas Wolff
2018-09-03 19:14                     ` Corinna Vinschen
2018-09-03 20:27                       ` Corinna Vinschen
2018-09-03 20:42                         ` Thomas Wolff
2018-09-03 21:03                           ` Corinna Vinschen
2018-09-03 22:15                             ` Steven Penny
2018-09-04  6:06                               ` Brian Inglis
2018-09-04  9:00                               ` Corinna Vinschen
2018-09-04 11:40                                 ` Steven Penny
2018-09-05  7:55                                   ` Corinna Vinschen
2018-09-05  9:22                                     ` Thomas Wolff
2018-09-05 11:58                                     ` Steven Penny
2018-09-05 13:18                                       ` Marco Atzeri
2018-09-05 15:20                                         ` Andrey Repin
2018-09-05 15:58                                         ` Corinna Vinschen
2018-09-05 20:15                                           ` Corinna Vinschen
2018-09-06  1:35                                             ` Steven Penny
2018-09-06  7:01                                               ` Corinna Vinschen
2018-09-07  8:20                                                 ` Corinna Vinschen
2018-09-07 10:34                                                   ` Thomas Wolff
2018-09-07 11:29                                                     ` Corinna Vinschen
2018-09-07 11:42                                                       ` Thomas Wolff
2018-09-07 11:51                                                         ` Thomas Wolff
2018-09-07 11:54                                                           ` Corinna Vinschen
2018-09-07 16:22                                                             ` Brian Inglis
2018-09-07 16:48                                                             ` Brian Inglis
2018-09-07 17:01                                                               ` Marco Atzeri
2018-09-07 18:21                                                                 ` Corinna Vinschen
2018-09-07 18:20                                                               ` Corinna Vinschen
2018-09-05 13:35                                       ` Andrey Repin
2018-09-05 14:04                                         ` Houder
2018-09-05 15:05                                           ` Andrey Repin
2018-09-04 12:50                                 ` David Macek
2018-09-04 14:18                                   ` Thomas Wolff
2018-09-04 14:46                                     ` David Macek
2018-09-04 18:20                                     ` Steven Penny
2018-09-04 18:41                                       ` Thomas Wolff
2018-09-04 19:50                                         ` Andrey Repin
2018-09-04 19:53                                         ` Steven Penny
2018-09-04 21:43                                           ` Thomas Wolff
2018-09-04 23:29                                             ` Steven Penny
2018-09-04 20:40                                       ` Brian Inglis
2018-09-05  8:32                                         ` Corinna Vinschen
2018-09-04 13:05                                 ` Andrey Repin
2018-10-04  0:25                               ` Steven Penny
2018-09-03 16:05         ` Brian Inglis
2018-09-04 19:59 ` Doug Henderson
2018-09-04 21:05   ` Steven Penny [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5b8ef3af.1c69fb81.6801.f392@mx.google.com \
    --to=svnpenn@gmail.com \
    --cc=cygwin@cygwin.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).