From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact cygwin-help@cygwin.com; run by ezmlm
Sender: cygwin-owner@cygwin.com
Mail-Followup-To: cygwin@cygwin.com
Reply-To: Brian.Inglis@SystematicSw.ab.ca
Subject: Re: Cygwin 2.6.0: unreadable UTF-8 in Windows console
References: <123291584.20161001051347@vanav.org>
From: Brian Inglis
To: cygwin@cygwin.com
Date: Sat, 01 Oct 2016 05:17:00 -0000
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.3.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
X-SW-Source: 2016-10/txt/msg00001.txt.bz2

On 2016-09-30 22:34, Brian Inglis wrote:
> On 2016-09-30 20:13, Ivan Vanyushkin wrote:
>> Something has changed in version 2.6.0, and now UTF-8 text can't be displayed in Windows console (cmd).
>> 1. Create a file "test.txt" with non-ASCII text in UTF-8 encoding.
>> 2. Run "cmd".
>> 3. Run:
>> C:\Cygwin\bin\cat test.txt
>> ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ ▒▒▒▒▒▒▒▒▒▒▒▒▒▒ ▒▒▒▒ ▒▒▒▒▒▒ 8000 ▒▒. ▒▒▒▒ ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒ ▒▒▒▒▒▒▒▒▒▒.
>> Non-ASCII text is not readable. Older Cygwin 2.5.2 has no such issue.
>> C:\Cygwin\bin\uname -a
>> CYGWIN_NT-10.0 PCName 2.6.0(0.304/5/3) 2016-08-31 14:32 x86_64 Cygwin
>> C:\Cygwin\bin\locale
>> LANG=
>> LC_CTYPE="C.UTF-8"
>> LC_NUMERIC="C.UTF-8"
>> LC_TIME="C.UTF-8"
>> LC_COLLATE="C.UTF-8"
>> LC_MONETARY="C.UTF-8"
>> LC_MESSAGES="C.UTF-8"
>> LC_ALL=
>> Same issue with any other commands like "grep", or with utilities built and run under Cygwin 2.6.0.
>> Same issue in other Windows consoles, like ConEmu or FAR Manager.
>> If I change Windows console encoding to UTF-8 (run: "chcp 65001"), file can be correctly displayed natively
>> (run: "type test.txt"), but Cygwin "cat" still has the same issue.
>> How should I display UTF-8 now?
>
> No problems here - same setup.
> Don't have files containing UTF-8 specials handy, but do have with Latin1 (ISO-8859-1) specials,
> convertable to UTF-8.
> Stripped common ASCII-only lines from output below.
> Default email encoding is Unicode (hopefully UTF-8) not Western (presumably Latin1), so should render accurately.
>
> $ uname -srvmo
> CYGWIN_NT-10.0 2.6.0(0.304/5/3) 2016-08-31 14:32 x86_64 Cygwin
> $ locale
> LANG=C.UTF-8
> LC_CTYPE="C.UTF-8"
> LC_NUMERIC="C.UTF-8"
> LC_TIME="C.UTF-8"
> LC_COLLATE="C.UTF-8"
> LC_MONETARY="C.UTF-8"
> LC_MESSAGES="C.UTF-8"
> LC_ALL=C.UTF-8
> $ egrep -a 'Deg|LF' latin1.txt # -a needed to override binary assumption - garbled characters
> DegN='▒N'
> DegW='▒W'
> Y2LF='%s▒%s %s %s'
> Y2LLF='|▒%.0s|'
> LF='|▒'.YFP.'|'
> $ iconv -f iso-8859-1 -t utf-8 latin1.txt | egrep 'Deg|LF' # good utf-8 characters
> DegN='°N'
> DegW='°W'
> Y2LF='%s±%s %s %s'
> Y2LLF='|±%.0s|'
> LF='|±'.YFP.'|'

Sorry - that was in mintty - you used cmd!

In cmd I saw problems similar to yours until I set LC_ALL=C.UTF-8 (and LANG for consistency, though that doesn't really matter) and ran chcp 65001. After that, type and the Cygwin commands produce the same output.

Without CP 65001 (and a Unicode console font that maps most characters - I use DejaVu Sans Mono everywhere I can) there may be no valid encoding for UTF-8 special characters in your default console code page (437 for US, 850 for non-US, others for localized versions).

Unfortunately, with that setup less displays spaces as squares, so you may have to set PAGER=more for readability.

-- 
Take care.
Thanks, Brian Inglis, Calgary, Alberta, Canada

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
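
The code-page point in the reply above can be illustrated with a minimal sketch, written in Python rather than the shell used in the thread. It takes the bytes a tool would write under a UTF-8 locale and decodes them the way a console set to a given code page would, then mimics the iconv step from the quoted transcript. The sample string and both helper names are illustrative assumptions, and Python's cp437 and cp850 codecs only approximate what the Windows console actually does.

```python
# Sketch only: SAMPLE, as_seen_in_console and latin1_to_utf8 are hypothetical
# names invented for this illustration; they are not part of Cygwin or Windows.

SAMPLE = "49°N ±5"


def as_seen_in_console(text: str, console_codec: str) -> str:
    """Encode text as UTF-8 (what a tool writes under a UTF-8 locale) and
    decode those bytes the way a console using console_codec would."""
    return text.encode("utf-8").decode(console_codec, errors="replace")


def latin1_to_utf8(data: bytes) -> bytes:
    """Rough equivalent of: iconv -f iso-8859-1 -t utf-8"""
    return data.decode("iso-8859-1").encode("utf-8")


if __name__ == "__main__":
    # "utf-8" stands in for chcp 65001; cp437 and cp850 are the legacy
    # defaults named in the message.
    for codec in ("utf-8", "cp437", "cp850"):
        print(f"{codec:>6}: {as_seen_in_console(SAMPLE, codec)}")

    # A Latin1 degree sign is the single byte 0xB0, which is not valid UTF-8
    # on its own; after conversion it becomes a two-byte UTF-8 sequence.
    raw = b"DegN='\xb0N'"
    print(raw.decode("utf-8", errors="replace"))
    print(latin1_to_utf8(raw).decode("utf-8"))
```

Only the utf-8 row round-trips the sample text. Under the legacy code pages each non-ASCII character turns into two unrelated glyphs, which is consistent with type and cat agreeing only after chcp 65001.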