From: Brian Inglis
Reply-To: Brian.Inglis@SystematicSw.ab.ca
To: cygwin@cygwin.com
Subject: Re: Need help with multibyte UTF-8 characters
Date: Thu, 14 Dec 2017 19:32:00 -0000
Message-ID: <4f67d273-61f1-29d0-433a-d519e70bf912@SystematicSw.ab.ca>
In-Reply-To: <9d3b73ff-f596-51a2-909a-30a767e3e9b3@gmail.com>
References: <626a3c06-e9f2-1932-f1f3-47ddb2051215@gmail.com> <9d3b73ff-f596-51a2-909a-30a767e3e9b3@gmail.com>

On 2017-12-11 16:36, Thomas Taylor wrote:
> Thank you for your advice on setting my locale to en_US.UTF-8.  Unfortunately,
> Cygwin still seems to have trouble displaying some three-byte UTF-8 encoded
> characters correctly.  For example, see the following snippet from a "sed"
> file.  This file attempts to convert XML-encoded filenames to UTF-8.  As you can
> see, it converts one- and two-byte encodings correctly, but fails on some
> three-byte encodings (the en dash, the em dash, and the ellipsis, all of which
> are displayed as a filled-in rectangle):

Going back to first principles - what is your script encoded as and run as?

What characters are in your script?

$ wc -lwmc ...

What does vim say for that script:

:set enc? tenc? fenc? fencs? eol? bomb?

What does locale say sed runs as:

$ locale

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada
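
The questions above come down to which bytes the script actually contains. A minimal sketch of that check, assuming a POSIX shell with od available; hex_bytes and the three variable names are hypothetical and not from the original thread. The characters are built from octal escapes so the example does not depend on how this message itself is encoded:

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the bytes of its
# argument as space-separated hex values.
hex_bytes() {
    # od dumps the bytes in hex; the unquoted substitution splits od's
    # padded output into words, and "$*" rejoins them with single spaces.
    set -- $(printf '%s' "$1" | od -An -tx1)
    echo "$*"
}

# The three characters from the report, built from their UTF-8 bytes.
en_dash=$(printf '\342\200\223')    # U+2013
em_dash=$(printf '\342\200\224')    # U+2014
ellipsis=$(printf '\342\200\246')   # U+2026

hex_bytes "$en_dash"     # e2 80 93
hex_bytes "$em_dash"     # e2 80 94
hex_bytes "$ellipsis"    # e2 80 a6
```

If the script's bytes for these characters differ from the sequences above (for instance the single bytes 96, 97 and 85, which is how Windows-1252 encodes them), the file is probably not saved as UTF-8. That would be one possible explanation for the rectangles, not a confirmed diagnosis.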