Subject: Re: Need help with multibyte UTF-8 characters
From: Brian Inglis
Reply-To: Brian.Inglis@SystematicSw.ab.ca
To: cygwin@cygwin.com
Date: Fri, 15 Dec 2017 02:51:00 -0000

On 2017-12-12 12:42, Thomas Taylor wrote:
> I believe that Cygwin displays certain UTF-8 characters incorrectly. To see the
> problem, first save the attached "utf-8_test.sed" text file to your desktop.
> Then run "mintty," and set its options by right clicking in its title bar,
> selecting "Options" and then "Text." On the Text page set "Locale" to "en_US"
> and "Character set" to "UTF-8," and then "Save." Now exit and restart mintty.
> Change directory to your desktop and run the editor "vim" on the utf-8_test.sed
> file. Once inside vim do a ":set fileencoding=utf-8". You should now see that
> vim displays correctly a sample of one-, two-, and three-byte UTF-8 character
> encodings in the test file. Vim fails, however, on the three-byte encodings for
> the "en" dash, the "em" dash, and the ellipsis, each of which displays
> incorrectly as a filled-in rectangle. Now exit vim and do a "less" or "cat" on
> the utf-8_test.sed file. You should see most of the sample UTF-8 encoded
> characters displayed correctly, except once again for the en dash, em dash, and
> ellipsis. So it looks like a problem in the underlying Cygwin run-time
> libraries rather than in vim, less, or cat. I haven't tested this on four-byte
> UTF-8 character encodings, but assume Cygwin will have similar problems.

Like many others, I see no problems: all the UTF-8 characters display correctly
in gvim/X, vim, less, and cat run from mintty.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
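Whether the characters render correctly depends on which bytes the test file actually contains, so a quick byte-level check can help narrow things down. The following is a minimal Python 3.9+ sketch, not something posted to the list. It prints the standard three-byte UTF-8 sequences for the en dash (U+2013), em dash (U+2014), and ellipsis (U+2026), then reports the first byte in a given file that is not valid UTF-8. The helper names `expected_utf8`, `first_invalid_utf8`, and `main` are hypothetical, and because the attached utf-8_test.sed is not reproduced in the archive, no output for it is shown.

```python
import sys

# The three characters reported as rendering as filled rectangles.
# Helper names below are hypothetical, for illustration only.
SUSPECTS = {"\u2013": "en dash", "\u2014": "em dash", "\u2026": "ellipsis"}


def expected_utf8() -> dict[str, str]:
    """Map each suspect character's name to its UTF-8 bytes as spaced hex."""
    return {name: ch.encode("utf-8").hex(" ") for ch, name in SUSPECTS.items()}


def first_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first invalid UTF-8 byte, or None."""
    try:
        data.decode("utf-8")  # strict decoding raises on any invalid sequence
        return None
    except UnicodeDecodeError as err:
        return err.start, data[err.start]


def main(path: str) -> None:
    with open(path, "rb") as f:  # read raw bytes, no decoding
        data = f.read()
    for name, hexbytes in expected_utf8().items():
        print(f"{name}: expected UTF-8 bytes {hexbytes}")
    bad = first_invalid_utf8(data)
    if bad is None:
        print(f"{path}: valid UTF-8 throughout")
    else:
        offset, value = bad
        print(f"{path}: invalid UTF-8 byte 0x{value:02x} at offset {offset}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as, for example, `python3 check_utf8.py utf-8_test.sed` (the script name is hypothetical). For comparison, Windows-1252 encodes the same three characters as the single bytes 0x96 (en dash), 0x97 (em dash), and 0x85 (ellipsis), none of which is valid on its own in UTF-8. If the check stops at one of those bytes, that would suggest the file's encoding, rather than the terminal or the run-time libraries, is worth a closer look.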