From: Lee <ler762@gmail.com>
To: cygwin@cygwin.com
Subject: Re: UTF-8 character encoding
Date: Mon, 25 Jun 2018 20:52:00 -0000
Message-ID: <CAD8GWsuevQX6fBUzkEvUs5rBPehhG7-ht+FPZU=eOaACF5uCPg@mail.gmail.com>
In-Reply-To: <5B3045B1.4080504@tlinx.org>
On 6/24/18, L A Walsh <cygwin@tlinx.org> wrote:
> Lee wrote:
>> So... keep it simple, set
>> LANG=en_US.UTF-8
>> and use vi or something else that comes with cygwin to create the file
>> and I'll have a file with UTF-8 character encoding - correct?
> ---
> The first 128 characters of UTF-8 are identical to the
> 128 characters of ASCII, and to the first 128 of latin1 (iso-8859-1).
>
> If you don't use any characters that need accents or special symbols,
> then nothing will be encoded in UTF-8, because it's only
> the characters OVER the first 128
> (see chart @ http://www.babelstone.co.uk/Unicode/babelmap.html).
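A quick way to check the point being discussed is to encode the same text with Python's built-in codecs and compare the bytes. This is a minimal sketch assuming Python 3; the helper names `encodings_agree` and `is_valid_utf8` and the sample strings are illustrative only, not part of Cygwin or any tool mentioned in the thread. Text limited to 0x00-0x7F yields the same bytes under ASCII, Latin-1 and UTF-8, and those bytes also decode cleanly as UTF-8; a character such as U+00E9 is where the encodings split.

```python
def encodings_agree(text: str) -> bool:
    """Return True if ASCII, Latin-1 and UTF-8 all produce the same bytes for text."""
    try:
        ascii_bytes = text.encode("ascii")
    except UnicodeEncodeError:
        return False  # text contains a character above 0x7F
    return ascii_bytes == text.encode("latin-1") == text.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


plain = "Hello, Cygwin!"
accented = "caf\u00e9"  # "cafe" with e-acute (U+00E9)

print(encodings_agree(plain))                     # True
print(encodings_agree(accented))                  # False
print(accented.encode("latin-1"))                 # b'caf\xe9'
print(accented.encode("utf-8"))                   # b'caf\xc3\xa9'
print(is_valid_utf8(plain.encode("ascii")))       # True
print(is_valid_utf8(accented.encode("latin-1")))  # False
```

To check a file written with vi, pass its raw bytes (for example `open(path, "rb").read()`) to `is_valid_utf8`. A pure-ASCII file passes, but passing only shows that the bytes are valid UTF-8, not which encoding the editor intended.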
I'm still trying to figure UTF-8 out, but it seems to me that 0x00 - 0x7f
is part of the UTF-8 encoding too. The chart quoted below, from this page,
makes things clearer ... at least for me :)
http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt
The proposed UCS transformation format encodes UCS values in the range
[0,0x7fffffff] using multibyte characters of lengths 1, 2, 3, 4, and 5
bytes. For all encodings of more than one byte, the initial byte
determines the number of bytes used and the high-order bit in each byte
is set.
An easy way to remember this transformation format is to note that the
number of high-order 1's in the first byte is the same as the number of
subsequent bytes in the multibyte character:
Bytes  Bits  Hex Min   Hex Max   Byte Sequence in Binary
  1      7   00000000  0000007f  0zzzzzzz
  2     13   00000080  0000207f  10zzzzzz 1yyyyyyy
  3     19   00002080  0008207f  110zzzzz 1yyyyyyy 1xxxxxxx
  4     25   00082080  0208207f  1110zzzz 1yyyyyyy 1xxxxxxx 1wwwwwww
  5     31   02082080  7fffffff  11110zzz 1yyyyyyy 1xxxxxxx 1wwwwwww 1vvvvvvv
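The table can be turned into a small encoder, which makes the offsets and bit fields concrete. This is a minimal sketch of the *proposed* format exactly as tabulated; the names `ROWS`, `fss_utf_proposal_encode` and `sequence_length` are hypothetical. It assumes, as the left-to-right letter order suggests, that the z bits in the lead byte are the most significant. Note that the history document quotes this as a proposal: the UTF-8 that was eventually adopted uses different patterns (two bytes are `110xxxxx 10xxxxxx`, with every continuation byte starting `10`), and that adopted form is what Python's built-in `"utf-8"` codec implements, shown on the last line for comparison.

```python
# Rows of the proposed table: (bytes, lead-byte prefix, prefix length, Hex Min, Hex Max)
ROWS = [
    (1, 0b0,     1, 0x00000000, 0x0000007F),
    (2, 0b10,    2, 0x00000080, 0x0000207F),
    (3, 0b110,   3, 0x00002080, 0x0008207F),
    (4, 0b1110,  4, 0x00082080, 0x0208207F),
    (5, 0b11110, 5, 0x02082080, 0x7FFFFFFF),
]


def fss_utf_proposal_encode(value: int) -> bytes:
    """Encode one UCS value with the proposed (not adopted) format from the table.

    Assumption: the lead byte carries the most significant payload bits.
    """
    for nbytes, prefix, prefix_len, lo, hi in ROWS:
        if lo <= value <= hi:
            v = value - lo                          # subtract the row's offset (Hex Min)
            trailing = []
            for _ in range(nbytes - 1):
                trailing.append(0x80 | (v & 0x7F))  # high bit set + 7 payload bits
                v >>= 7
            lead_bits = 8 - prefix_len              # payload bits left in the lead byte
            lead = (prefix << lead_bits) | v
            return bytes([lead] + trailing[::-1])   # most significant trailing byte first
    raise ValueError(f"value out of range: {value:#x}")


def sequence_length(lead: int) -> int:
    """Bytes in a sequence: the lead byte's count of high-order 1 bits, plus one."""
    ones = 0
    while ones < 8 and lead & (0x80 >> ones):
        ones += 1
    if ones > 4:
        raise ValueError(f"not a lead byte in the proposed table: {lead:#04x}")
    return ones + 1


print(fss_utf_proposal_encode(0x41).hex())  # 41
print(fss_utf_proposal_encode(0xE9).hex())  # 80e9  (proposed format)
print(sequence_length(0xF7))                # 5
print("\u00e9".encode("utf-8").hex())       # c3a9  (adopted UTF-8)
```

One consequence is visible in the table itself: a 2-byte lead byte (`10zzzzzz`) can look exactly like a trailing byte (`1yyyyyyy`), so a decoder that starts in the middle of a string cannot always tell where a character begins. The adopted UTF-8 avoids this by reserving the `10` prefix for continuation bytes only.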
Thanks
Lee