From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact cygwin-help@cygwin.com; run by ezmlm
Sender: cygwin-owner@cygwin.com
Mail-Followup-To: cygwin@cygwin.com
Subject: Re: Need help with multibyte UTF-8 characters
References: <626a3c06-e9f2-1932-f1f3-47ddb2051215@gmail.com> <9d3b73ff-f596-51a2-909a-30a767e3e9b3@gmail.com>
From: Thomas Taylor
To: cygwin@cygwin.com
Message-ID: <1909177a-3f35-52d5-1717-9007d6efaa71@gmail.com>
Date: Tue, 12 Dec 2017 20:17:00 -0000
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.5.0
MIME-Version: 1.0
In-Reply-To: <9d3b73ff-f596-51a2-909a-30a767e3e9b3@gmail.com>
Content-Type: multipart/mixed; boundary="------------F45A94644063078C0C6E8549"
X-SW-Source: 2017-12/txt/msg00115.txt.bz2

--------------F45A94644063078C0C6E8549
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit

I believe that Cygwin displays certain UTF-8 characters incorrectly. To see the problem, first save the attached "utf-8_test.sed" text file to your desktop. Then run "mintty" and open its options by right-clicking in its title bar and selecting "Options" and then "Text". On the Text page, set "Locale" to "en_US" and "Character set" to "UTF-8", and then click "Save". Now exit and restart mintty.

Change directory to your desktop and run the editor "vim" on the utf-8_test.sed file. Once inside vim, enter ":set fileencoding=utf-8". You should now see that vim correctly displays a sample of one-, two-, and three-byte UTF-8 character encodings in the test file. Vim fails, however, on the three-byte encodings for the en dash, the em dash, and the ellipsis, each of which displays incorrectly as a filled-in rectangle.

Now exit vim and run "less" or "cat" on the utf-8_test.sed file. You should see most of the sample UTF-8 encoded characters displayed correctly, except once again for the en dash, em dash, and ellipsis.
So it looks like a problem in the underlying Cygwin run-time libraries rather than in vim, less, or cat. I haven't tested this on four-byte UTF-8 character encodings, but assume Cygwin will have similar problems.

--------------F45A94644063078C0C6E8549
Content-Type: text/plain; charset=UTF-8; name="utf-8_test.sed"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="utf-8_test.sed"
Content-length: 1546

IyBUaGlzIGlzIGZpbGUgInV0Zi04X3Rlc3Quc2VkIgojCiMgSXQncyB1c2Vk
IGJ5IHRoZSAic2VkIiB1dGlsaXR5IHByb2dyYW0KIyB0byBjb252ZXJ0IFhN
TC1lbmNvZGVkIGZpbGVuYW1lcyB0byBVVEYtOAoKIyBNYXRjaCBsb25nZXN0
IHN0cmluZ3MgZmlyc3QKCiMgVGhyZWUtYnl0ZSBlbmNvZGluZ3M6CgojIEVu
IGRhc2gKcy8lW0VlXTIlODAlOTMv4oCTL2cKCiMgRW0gZGFzaApzLyVbRWVd
MiU4MCU5NC/igJQvZwoKIyBIb3Jpem9udGFsIGVsbGlwc2lzCnMvJVtFZV0y
JTgwJVtBYV02L+KApi9nCgojIExlc3MtdGhhbi1vci1lcXVhbCBzaWduCnMv
JVtFZV0yJTg5JVtBYV00L+KJpC9nCgojIEV1cm8gc3ltYm9sCnMvJVtFZV0y
JTgyJVtBYV1bQ2NdL+KCrC9nCgojIFR3by1ieXRlIGVuY29kaW5nczoKCiMg
Tm9uLWJyZWFrIHNwYWNlCnMvJVtDY10yJVtBYV0wL+KOtS9nCgojIExvd2Vy
Y2FzZSBhIHdpdGggYWN1dGUgYWNjZW50CnMvJVtDY10zJVtBYV0xL8OhL2cK
CiMgTG93ZXJjYXNlIGEgd2l0aCB1bWxhdXQgKGEuay5hLiBkaWFlcmVzaXMp
CnMvJVtDY10zJVtBYV00L8OkL2cKCiMgTG93ZXJjYXNlIGUgd2l0aCBhY3V0
ZSBhY2NlbnQKcy8lW0NjXTMlW0FhXTkvw6kvZwoKIyBMb3dlcmNhc2UgaSB3
aXRoIGFjdXRlIGFjY2VudApzLyVbQ2NdMyVbQWFdRC/DrS9nCgojIExvd2Vy
Y2FzZSBvIHdpdGggYWN1dGUgYWNjZW50CnMvJVtDY10zJVtCYl0zL8OzL2cK
CiMgTG93ZXJjYXNlIG4gd2l0aCB0aWxkZQpzLyVbQ2NdMyVbQmJdMS/DsS9n
CgojIExvd2VyY2FzZSBjIHdpdGggYWN1dGUgYWNjZW50IApzLyVbQ2NdNCU4
Ny/Ehy9nCgojIExvd2VyY2FzZSBvIHdpdGggbG9uZyBhY2NlbnQgKGEuay5h
LiBtYWNyb24pCnMvJVtDY101JThbRGRdL8WNL2cKCiMgT25lLWJ5dGUgZW5j
b2RpbmdzOgoKIyAiQW5kIiBzaWduIChhLmsuYS4gYW1wZXJzYW5kKQpzLyYj
Mzg7L1wmL2cKCiMgU3BhY2UKcy8lMjAvIC9nCgojIFNoYXJwIChvciBwb3Vu
ZCkgc2lnbgpzLyUyMy8jL2cKCiMgUGVyY2VudCBzaWduCnMvJTI1LyUvZwoK
IyBMZWZ0IHNxdWFyZSBicmFja2V0CnMvJTVbQmJdL1svZwoKIyBSaWdodCBz
cXVhcmUgYnJhY2tldApzLyU1W0RkXS9dL2cKCiMgRW5kIG9mIGZpbGUgInV0
Zi04X3Rlc3Quc2VkIgoK
--------------F45A94644063078C0C6E8549--
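
The message above singles out the en dash, em dash, and ellipsis as three-byte UTF-8 encodings that render as filled-in rectangles. As a cross-check that does not depend on any terminal or font, the following minimal Python sketch lists the code point and UTF-8 byte sequence of those characters, plus two of the characters that reportedly display correctly. The names `SAMPLES` and `describe` are hypothetical and exist only for this illustration; the code points themselves are standard Unicode.

```python
import unicodedata

# Characters named in the report. The first three are the ones said to
# render as filled-in rectangles; the last two reportedly render correctly.
# SAMPLES is a hypothetical name used only in this sketch.
SAMPLES = ["\u2013", "\u2014", "\u2026", "\u20ac", "\u00e1"]


def describe(ch):
    """Return (code point, UTF-8 bytes as hex, byte count) for one character."""
    encoded = ch.encode("utf-8")
    return (f"U+{ord(ch):04X}", encoded.hex(" "), len(encoded))


def main():
    for ch in SAMPLES:
        code_point, hex_bytes, length = describe(ch)
        # Print only ASCII so the output is readable even where the
        # characters themselves do not render.
        print(f"{code_point}  {hex_bytes:<9}  {length} byte(s)  {unicodedata.name(ch)}")


if __name__ == "__main__":
    main()
```

If a hex dump of utf-8_test.sed shows the same byte sequences for the affected characters, the file contents are presumably intact, which would point at the display path (terminal, font, or library) rather than at the data. That is only a way to narrow the problem down, not a diagnosis.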