From: Steven Penny <svnpenn@gmail.com>
To: cygwin@cygwin.com
Subject: Re: Cygwin fails to utilize Unicode replacement character
Date: Tue, 04 Sep 2018 21:05:00 -0000
Message-ID: <5b8ef3af.1c69fb81.6801.f392@mx.google.com>
In-Reply-To: <CAJ1FpuNrMhfB-cmKSiQbj_JB2F_GymCzuv_kY2K9M7RuFqr8Rw@mail.gmail.com>
On Tue, 4 Sep 2018 13:59:10, Doug Henderson wrote:
> My preference is to remove the output fiddling code that Corinna has
> been working on. It is trying to solve the wrong problem.
> I think we have gone down a rabbit hole at the wrong end of cat's data flow.
This has nothing to do with "cat". It has to do with the unfounded design
decision to use U+2592. Granted, at this point we are bikeshedding - but an
official standard does exist, namely Unicode, with 2 applicable characters for
this use case:
1. U+FFFD: http://unicode.org/charts/nameslist/n_FFF0.html
2. U+25A1: http://unicode.org/charts/nameslist/n_25A0.html
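For anyone who wants to check the names of the code points being argued over, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` is only for illustration; it is not part of Cygwin or mintty.

```python
import unicodedata


def describe(code_point):
    """Return a 'U+XXXX NAME' label for a code point, using Python's Unicode database."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"


if __name__ == "__main__":
    # U+2592 is the character discussed in this thread as the current choice;
    # U+FFFD and U+25A1 are the two alternatives listed above.
    for cp in (0x2592, 0xFFFD, 0x25A1):
        print(describe(cp))
```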
> Should any changes to the way a character is displayed be required, it
> needs to be in the terminal program that displays the character, not in
> cygwin which should pass the character along unmodified.
The "terminal" in this case is either "cygwin" or "xterm" - in both cases code
changes have already been made in response to this thread, so I don't think your
comment here holds weight.
> Both cygwin and Debian 9.5 show:
>
> $ file alfa.txt
> alfa.txt: ISO-8859 text
>
> When Linux reads the file, it assumes the encoding is UTF-8.
> When cygwin reads the file, it assumes the encoding is CP1252.
> This command shows the problem:
>
> $ iconv -f utf8 alfa.txt
> iconv: alfa.txt:1:0: incomplete character or shift sequence
>
> On Linux, this shows a slightly different message, with the same intent.
>
> Try using this string:
>
> $ printf "\xC3\xAB\353\n"
> ë▒
>
> to get a better understanding of the problem. It contains two
> representations of LATIN SMALL LETTER E WITH DIAERESIS, first encoded
> in UTF-8, then using ISO-8859-1.
Now it appears *you* are going down the rabbit hole. Both Cygwin and Mintty were
in violation of the Unicode standard - however this has already been remedied in
the code.
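For reference, the test string quoted above can be reproduced outside any terminal. The sketch below uses Python's built-in UTF-8 decoder with the "replace" error handler, which substitutes U+FFFD for the invalid byte. It only illustrates the substitution being discussed; it is not the Cygwin or Mintty code, and the function names are made up for the example.

```python
# \353 in the printf string is octal for the byte 0xEB.
SAMPLE = b"\xC3\xAB\xEB\n"


def is_valid_utf8(raw):
    """Return True if raw decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def decode_with_replacement(raw):
    """Decode as UTF-8, substituting U+FFFD for each invalid sequence."""
    return raw.decode("utf-8", errors="replace")


def code_points(text):
    """List the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    print(is_valid_utf8(SAMPLE))                         # False
    print(code_points(decode_with_replacement(SAMPLE)))  # U+00EB, U+FFFD, U+000A
    # The same lone byte read as ISO-8859-1 is the intended letter.
    print(code_points(b"\xEB".decode("latin-1")))        # U+00EB
```

The first two bytes decode to U+00EB; the lone 0xEB is not a valid UTF-8 sequence, so a strict decoder rejects it (as iconv does above) and a lenient one emits a replacement.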
> There are two different reasons for the MEDIUM SHADE. Here it
> indicates an invalid UTF-8 character, and the font does not have a
> glyph for REPLACEMENT CHARACTER. The MEDIUM SHADE is also used in
> place of an ordinary character without a glyph in the font.
This is flat wrong. U+2592 MEDIUM SHADE is *only* used in cases of invalid
UTF-8. In the case of a missing character, the ".notdef" glyph is used - as has
been discussed several times in this thread. This is not an actual character, so
I cannot paste it here - but as an example, with "DejaVu Sans Mono" the glyph is
an empty rectangle.
--
Problem reports: http://cygwin.com/problems.html
FAQ: http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info: http://cygwin.com/ml/#unsubscribe-simple