From: Ivan Vanyushkin
Date: Sun, 02 Oct 2016 06:29:00 -0000
To: cygwin@cygwin.com
Subject: Re: Cygwin 2.6.0: unreadable UTF-8 in Windows console
Message-ID: <1025056909.20161002092650@vanav.org>
References: <123291584.20161001051347@vanav.org>

"set LANG=C.UTF-8" has fixed the issue on Cygwin 2.6.0.
But the documentation [1] says: "The default locale in the absence of
the aforementioned locale environment variables is "C.UTF-8"." This
seems to be broken in Cygwin 2.6.0. "chcp 65001", the console font and
the console charset make no difference here.

This is bad, because I can no longer share compiled binaries with
anyone: users will not have LANG set in their environment, so any
non-ASCII text will be unreadable. For example:

List running Windows services:
sc query | grep -i "running"
- will not work on non-English Windows, because the console output
will be unreadable.

Watch a Windows log:
tail -f C:\Windows\Logs\SomeLog.log
- will be unreadable if the log contains non-English file names.

I think the default locale should remain UTF-8, as in Cygwin 2.5.2.
Both applications and users expect this.

Tests:

// Run Windows console.
cmd

C:\Cygwin_2.6.0\bin\echo ±5°> utf-8.txt
C:\Cygwin_2.6.0\bin\od -t x1z utf-8.txt
0000000 c2 b1 35 c2 b0 0d 0a  >..5....<
// We have UTF-8 now in "utf-8.txt" file.

C:\Cygwin_2.6.0\bin\locale
LANG=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=

C:\Cygwin_2.6.0\bin\cat utf-8.txt
▒▒5▒▒

C:\Cygwin_2.6.0\bin\env LANG=C.UTF-8 C:\Cygwin_2.6.0\bin\cat utf-8.txt
±5°
// Fixed! But what is the default locale then?

C:\Cygwin_2.6.0\bin\env LANG=C.CP1251 C:\Cygwin_2.6.0\bin\cat utf-8.txt
В±5В°

C:\Cygwin_2.6.0\bin\env LANG=C.CP866 C:\Cygwin_2.6.0\bin\cat utf-8.txt
┬▒5┬░

C:\Cygwin_2.6.0\bin\env LANG=C.ISO8859-1 C:\Cygwin_2.6.0\bin\cat utf-8.txt
±5°
// Doesn't match. I have no idea what the default locale is in Cygwin 2.6.0.

// Let's try the console's native encoding.
echo ±5°> cp866.txt
C:\Cygwin_2.6.0\bin\od -t x1z cp866.txt
0000000 2b 35 f8 0d 0a  >+5...<

type cp866.txt
+5°

C:\Cygwin_2.6.0\bin\cat cp866.txt
+5▒
// Bad. Cygwin 2.6.0 can't even display non-UTF-8 text.
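
The garbled outputs above look like the same bytes decoded under different code pages. As a sanity check, here is a minimal Python sketch (not part of the original tests). It assumes Python's "cp1251" and "cp866" codecs correspond to the charsets Cygwin uses for LANG=C.CP1251 and LANG=C.CP866; the names UTF8_BYTES, CP866_BYTES and decode_as are made up for illustration.

```python
# Bytes of utf-8.txt as printed by "od -t x1z" above, without the trailing CR LF.
UTF8_BYTES = bytes.fromhex("c2 b1 35 c2 b0")

# Bytes of cp866.txt as printed by od above, without the trailing CR LF.
CP866_BYTES = bytes.fromhex("2b 35 f8")


def decode_as(data: bytes, codec: str) -> str:
    """Decode raw bytes with a Python codec, replacing undecodable bytes.

    The codec names are Python's own; mapping them to Cygwin locale
    charsets is an assumption made for this sketch.
    """
    return data.decode(codec, errors="replace")


def decode_all(data: bytes, codecs=("utf-8", "cp1251", "cp866")) -> dict:
    """Return {codec: decoded text} for each codec, for side-by-side comparison."""
    return {codec: decode_as(data, codec) for codec in codecs}
```

Under these assumptions, decode_all(UTF8_BYTES) gives ±5° for utf-8, В±5В° for cp1251 and ┬▒5┬░ for cp866, which agrees with the LANG=C.UTF-8, LANG=C.CP1251 and LANG=C.CP866 outputs above. None of the three yields the ▒▒5▒▒ seen when LANG is unset. decode_as(CP866_BYTES, "cp866") gives +5°, the same as "type cp866.txt".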

// Try filenames:
ls -al
lrwxrwxrwx 1 vanav ▒▒▒▒▒▒▒▒▒▒▒▒▒▒ 33 Sep 17 07:43 ''$'\320\234\320\276\320\270'' '$'\320\264\320\276\320\272\321\203\320\274\320\265\320\275\321\202\321\213' -> /cygdrive/c/Users/Vanav/Documents
// Bad.

C:\Cygwin_2.6.0\bin\env LANG=C.UTF-8 C:\Cygwin_2.6.0\bin\ls -al
lrwxrwxrwx 1 vanav система 33 Sep 17 07:43 'Мои документы' -> /cygdrive/c/Users/Vanav/Documents
// Good.

// Now try previous Cygwin 2.5.2.
C:\Cygwin_2.5.2\bin\locale
LANG=
LC_CTYPE="C.UTF-8"
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_ALL=

C:\Cygwin_2.5.2\bin\cat utf-8.txt
±5°
// Good.

C:\Cygwin_2.5.2\bin\env LANG=C.UTF-8 C:\Cygwin_2.5.2\bin\cat utf-8.txt
±5°
// Good.

[1] https://cygwin.com/cygwin-ug-net/setup-locale.html

Saturday, October 1, 2016, 8:15:02 AM, you wrote:

> On 2016-09-30 22:34, Brian Inglis wrote:
> Sorry - this was mintty - you used cmd!
> Saw similar problems you had until I set LC_ALL=C.UTF-8 (and LANG
> for consistency, but doesn't really matter) and chcp 65001.
> Then type and Cygwin commands produce the same output.
> Without CP65001 (and a Unicode console font mapping most characters
> - I use DejaVu Sans Mono everywhere I can) there may be no valid
> encoding for UTF-8 special characters in your default console CP
> (437 for US, 850 for non-US, others for localized versions).
> Unfortunately then less displays spaces as squares, so you may have
> to set PAGER=more for readability.

-- 
Best regards,
 Ivan                          mailto:vanav@vanav.org