From: Lee
Date: Mon, 25 Jun 2018 20:52:00 -0000
Subject: Re: UTF-8 character encoding
To: cygwin@cygwin.com
In-Reply-To: <5B3045B1.4080504@tlinx.org>
References: <1183751257.20180621042620@yandex.ru> <5B3045B1.4080504@tlinx.org>

On
6/24/18, L A Walsh wrote:
> Lee wrote:
>> So... keep it simple, set
>>   LANG=en_US.UTF-8
>> and use vi or something else that comes with cygwin to create the file
>> and I'll have a file with UTF-8 character encoding - correct?
> ---
>     The first 127 characters of UTF-8 are identical to the
> first 127 characters of ASCII, and latin1 and iso-8859-1.
>
> If you don't use any characters that need accents or special symbols,
> then nothing will be encoded in UTF-8, because it's only
> the characters OVER the first 127
> (see chart @ http://www.babelstone.co.uk/Unicode/babelmap.html).

I'm still trying to figure UTF-8 out, but it seems to me that 0x0 -
0xff is part of the UTF-8 encoding.  This chart makes things clearer
... at least for me :)

http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt

  The proposed UCS transformation format encodes UCS values in the range
  [0,0x7fffffff] using multibyte characters of lengths 1, 2, 3, 4, and 5
  bytes.  For all encodings of more than one byte, the initial byte
  determines the number of bytes used and the high-order bit in each byte
  is set.

  An easy way to remember this transformation format is to note that the
  number of high-order 1's in the first byte is the same as the number of
  subsequent bytes in the multibyte character:

       Bits  Hex Min   Hex Max   Byte Sequence in Binary
    1    7   00000000  0000007f  0zzzzzzz
    2   13   00000080  0000207f  10zzzzzz 1yyyyyyy
    3   19   00002080  0008207f  110zzzzz 1yyyyyyy 1xxxxxxx
    4   25   00082080  0208207f  1110zzzz 1yyyyyyy 1xxxxxxx 1wwwwwww
    5   31   02082080  7fffffff  11110zzz 1yyyyyyy 1xxxxxxx 1wwwwwww 1vvvvvvv

Thanks
Lee
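
For anyone who wants to check the byte ranges discussed above, here is a minimal sketch, assuming Python 3 is available. The helper name utf8_bytes is made up for illustration. One caveat: the table quoted from utf-8-history.txt describes the early proposal, where follow-on bytes look like 1xxxxxxx. UTF-8 as standardized marks every continuation byte with 10xxxxxx, and that is what Python's encoder produces, so the bit patterns printed here will not match that table for multibyte characters.

```python
def utf8_bytes(code_point: int) -> str:
    """Return the UTF-8 encoding of one code point as space-separated binary octets."""
    encoded = chr(code_point).encode("utf-8")
    return " ".join(f"{byte:08b}" for byte in encoded)


if __name__ == "__main__":
    # Sample code points on both sides of the 0x7f boundary, plus one above 0xff.
    for cp in (0x41, 0x7F, 0x80, 0xE9, 0xFF, 0x20AC):
        print(f"U+{cp:04X} -> {utf8_bytes(cp)}")
```

Code points 0x00 through 0x7f come out as a single byte identical to ASCII. Code points 0x80 through 0xff are still encodable, but each takes two bytes, which is why a file containing only plain ASCII text looks the same in ASCII, latin1, and UTF-8.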