From: Doug Henderson <djndnbvg@gmail.com>
To: cygwin <cygwin@cygwin.com>
Subject: Re: Need help with multibyte UTF-8 characters
Date: Tue, 12 Dec 2017 20:00:00 -0000
Message-ID: <CAJ1FpuM_omZszrpUj9TtjgrQbvpUqKb6AUnGOzZfgmSLiMb1Tg@mail.gmail.com>
In-Reply-To: <9d3b73ff-f596-51a2-909a-30a767e3e9b3@gmail.com>

On 11 December 2017 at 16:36, Thomas Taylor wrote:
> Thank you for your advice on setting my locale to en_US.UTF-8.
> Unfortunately, Cygwin still seems to have trouble displaying some three-byte
> UTF-8 encoded characters correctly.  For example, see the following snippet
> from a "sed" file.  This file attempts to convert XML-encoded filenames to
> UTF-8.  As you can see, it converts one- and two-byte encodings correctly,
> but fails on some three-byte encodings (the en dash, the em dash, and the
> ellipsis, all of which are displayed as a filled-in rectangle):
>

Your sed script works for me. I copy/pasted your sample script into
"cvt_script.sed" and also into "cvt_input.txt". My sed command looks
like: "sed --file=cvt_script.sed < cvt_input.txt > cvt_output.txt". It
correctly translates all the encoded strings to UTF-8.
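
One way to separate a conversion problem from a display problem is to
look at the bytes sed wrote rather than at the glyphs the terminal
draws. The sketch below is only an illustration: the one-rule script
and the sample input are hypothetical stand-ins (the original script is
not reproduced here), and it uses -f, the short form of --file=.

```shell
#!/bin/sh
# Hypothetical stand-in for the real script: a single rule mapping the
# XML reference for an en dash (U+2013) to its three UTF-8 bytes.
workdir=$(mktemp -d)

# printf octal escapes keep this example pure ASCII;
# \342\200\223 is the byte sequence e2 80 93.
printf 's/&#8211;/\342\200\223/g\n' > "$workdir/cvt_script.sed"
printf 'a&#8211;b\n' > "$workdir/cvt_input.txt"

sed -f "$workdir/cvt_script.sed" \
    < "$workdir/cvt_input.txt" > "$workdir/cvt_output.txt"

# Dump the output as hex with the spacing removed. A correct conversion
# has e2 80 93 between 61 ('a') and 62 ('b').
hex=$(od -An -tx1 "$workdir/cvt_output.txt" | tr -d ' \n')
echo "$hex"
```

If the hex dump shows the expected bytes but the terminal still draws a
filled-in rectangle, the conversion is fine and the font or terminal is
the more likely suspect.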

Your display may appear different if you are using different fonts in
mintty or the Windows console. I am using Lucida Console, 10pt and
Consolas 16, respectively. They display different glyphs for the
non-breaking space, but are otherwise identical. In mintty, I have
LANG and all the LC_* variables set to en_CA.UTF-8, and in the Windows
console, to en_US.UTF-8.

I am running Win 10 and cygwin setup was last updated a couple or
three days ago.

Check the output of the "locale" command. All variables should have
the same value.
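
A quick way to run that check is to reduce the "locale" output to its
distinct values. This is a rough sketch, not a definitive test: the
helper name distinct_locale_values is made up for illustration, and it
assumes the usual NAME=value or NAME="value" output format. Empty
values (LC_ALL is normally unset) are ignored.

```shell
#!/bin/sh
# Hypothetical helper: read `locale` output on stdin and print each
# distinct non-empty value of LANG and the LC_* variables once.
distinct_locale_values() {
    grep -E '^(LANG|LC_[A-Z_]+)=' |
        cut -d= -f2- |
        tr -d '"' |
        sed '/^$/d' |
        sort -u
}

# A single line of output means every variable that is set agrees.
locale | distinct_locale_values
```

More than one line of output means at least one variable differs from
the rest, which is worth fixing before looking at fonts.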

Is your Cygwin installation up to date, or fairly close to current?
What Windows version are you using?

HTH,
Doug


-- 
Doug Henderson, Calgary, Alberta, Canada - from gmail.com

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
