Date: Fri, 14 Feb 2014 12:35:00 -0000
From: Andrey Repin
Reply-To: Andrey Repin
Message-ID: <1078913914.20140214155631@mtu-net.ru>
To: Corinna Vinschen
Subject: Re: New passwd/group handling in Cygwin - test results and observations
In-Reply-To: <20140214102044.GX2246@calimero.vinschen.de>
References: <20140213143849.GH2246@calimero.vinschen.de> <1717869165.20140214021113@mtu-net.ru> <20140214102044.GX2246@calimero.vinschen.de>

Greetings, Corinna Vinschen!

>> The issue can be observed when you have a user or group name containing
>> characters outside basic ASCII character set.  Even western diacritics will
>> suffice.
>>
>> Add somewhere in your startup files an equivalent of the following block:
>> (I have it in private .profile)
>>
>> ---->8-------->8-------->8-------->8-------->8-------->8-------->8----
>> case "$TERM" in
>>   xterm*)
>>     LANG=ru_RU.UTF-8
>>     ;;
>>   *)
>>     LANG=ru_RU.CP866
>>     ;;
>> esac
>>
>> export PATH HISTCONTROL LANG
>> ----8<--------8<--------8<--------8<--------8<--------8<--------8<----
>>
>> restart your shell, and try to ls -l a directory, where you have files owned
>> by abovementioned user/group.
>>
>> Try it in mintty (the encoding will be UTF-8 and names will show up readable)
>> and in native console (with appropriate single-byte encoding, the names will
>> still be printed in unicode, means, raw byte sequences will be dumped to
>> terminal).
>> I thought it could be affected by the fact I'm changing LANG on the fly, but
>> starting bash in a console that initially has correct LANG= variable doesn't
>> change observed results.

> Yes, this is a problem, and I'm not sure how to fix it, if at all.

> The problem is hopefully obvious.  We have to initialize things in some
> order.  For instance, to read /etc/fstab.d/$USER, we need the username.
> And since the Cygwin username can be different from the Windows username
> (I guess I should have never added this functionality in the first
> place),

I feel your pain...

> we have to read the user's passwd before we read the fstabs.

> Same for the initialization of $LANG and friends.  That occurs pretty
> late in the process initialization.  You know that Windows uses UTF-16
> under the hood, so a lot of stuff gets read and converted to UTF-8
> before we even care for the environment.  And if you set the codeset in
> the application only, all the relevant information has already been read
> long ago, of course.

> But this is a problem not different from Linux.  If you have a username
> with non-ASCII chars, it will use *some* encoding in the passwd DB,
> usually UTF-8 these days.
> If you then change the codeset in your
> application, you will still get your username in UTF-8.  It won't be
> changed on the fly, just because your application calls setlocale.

I understand it (mostly), but there are actually two issues, not one.
One issue is the display part, where names are output for user consumption.
The other can be observed in, e.g., rsync, and in file access in general
(remember the discussion about accessing long directory names in Unicode).
Changing the LANG variable DOES matter for the latter, and you can only hope
that whatever is output in the former case is actually printable (thank God,
most of the time it actually is, in the case of UTF-8).

It gets even more complicated when you consider that Windows has two
different single-byte encodings, so-called ANSI (for GUI applications) and
OEM (for the console), and a lot of stuff makes assumptions without checking
the current state of things.

As convoluted as the problem is, I think we need some sort of solution, or at
the very least, documentation.


-- 
WBR,
Andrey Repin (anrdaemon@yandex.ru) 14.02.2014, <15:15>

Sorry for my terrible english...

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple