From: Lee
Date: Wed, 27 Jun 2018 09:31:00 -0000
Subject: Re: UTF-8 character encoding
To: cygwin@cygwin.com

On 6/26/18, Thomas Wolff wrote:
> This encoding scheme is wrong; where did you get it from? Maybe it's the
> obsolete UTF-8...

http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt

I thought I saw something about UTF-8 being able to handle a 31-bit value...
Is that also obsolete/wrong?

How about this for the current encoding scheme:
http://www.unicode.org/versions/Unicode11.0.0/ch03.pdf

Table 3-6. UTF-8 Bit Distribution

Bits  Scalar Value                First Byte  Second Byte  Third Byte  Fourth Byte
 7    00000000 0xxxxxxx           0xxxxxxx
11    00000yyy yyxxxxxx           110yyyyy    10xxxxxx
16    zzzzyyyy yyxxxxxx           1110zzzz    10yyyyyy     10xxxxxx
21    000uuuuu zzzzyyyy yyxxxxxx  11110uuu    10uuzzzz     10yyyyyy    10xxxxxx

Lee

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
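The four rows of Table 3-6 map directly onto shifts and masks. A minimal Python sketch of that mapping follows; the function name `utf8_encode` is hypothetical, chosen only for illustration. It assumes input is a single code point given as an integer and, since the table covers only Unicode scalar values, it rejects anything above U+10FFFF and the surrogate range U+D800..U+DFFF. This is why the table stops at four bytes (21 bits), unlike the older scheme in the utf-8-history link, which allowed longer sequences.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value using the bit layout of Table 3-6.

    Hypothetical helper for illustration; Python's own str.encode("utf-8")
    is what you would normally use.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:X} is outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is a surrogate, not a scalar value")
    if cp < 0x80:
        # 7 bits:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 11 bits: 110yyyyy 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 16 bits: 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 21 bits: 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


if __name__ == "__main__":
    # U+20AC (euro sign) falls in the 16-bit row; U+1F600 in the 21-bit row.
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
        print(f"U+{cp:04X} -> {utf8_encode(cp).hex(' ')}")
```

For example, `utf8_encode(0x20AC).hex()` gives `'e282ac'`, the same bytes Python's built-in `"\u20ac".encode("utf-8")` produces.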