Message-ID: <4AC1EA0F.5040603@towo.net>
Date: Tue, 29 Sep 2009 11:12:00 -0000
From: Thomas Wolff
To: cygwin@cygwin.com
Subject: Re: The C locale
In-Reply-To: <20090928161626.GC8378@calimero.vinschen.de>

Corinna Vinschen wrote:
> On Sep 29 01:03, IWAMURO Motonori wrote:
>
>> 2009/9/27 IWAMURO Motonori:
>>
>>>> LANG="ja" -> EUCJP
>>>> LANG="ja_JP" -> EUCJP
>>>>
>>> Hmmm, It is a difficult problem.
>>>
>>> I think selecting UTF-8 is good because eucJP is legacy.
>>>
>>> But, for interoperability with other UNIX-like system(*), I don't
>>> think selecting UTF-8 is good.
>>>
>>> * Solaris: ja, ja_JP -> eucJP
>>> * Linux (Debian): ja -> Unknown, ja_JP -> eucJP
>>>
>>> I need to think more...
>>>
>> My conclusion is as follows as a result of hearing other Japanese
>> people's opinion:
>>
>> LANG=ja -> UTF-8
>> LANG=ja_JP -> UTF-8
>>
>> Because, we specify "eucJP" explicitly when we need it.
>>
>
> Hmm.
>
> That's an interesting point.
>
> In theory this sounds like a good idea to be used for all locales which
> don't specify the charset explicitly, because that results in using the
> same charset, "UTF-8", for all such locales. "C", "ja" or "en_US"
> would all default to UTF-8.
>

The keyword here again should be compatibility. That means,
unfortunately, that I do not think this is a good idea.
A number of locales have been established on common systems that do not
specify their encoding explicitly (i.e. in their name). Since there is
now more or less a common set of such locales among various Linux and
Unix systems, this seems to be a de facto standard, although I am not
aware of any more formal definition, listing or description of it.
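
To illustrate what "not specifying the encoding in the name" means in
practice, here is a minimal sketch, not taken from any existing
implementation. The function name charset_for_locale is hypothetical,
and the fallback table holds only the two mappings reported earlier in
this thread (ja and ja_JP -> eucJP on Solaris, ja_JP -> eucJP on
Debian); a real table would have to be filled from a reference system.

```shell
# Hypothetical helper: print the encoding implied by a locale name of the
# form language[_territory][.codeset][@modifier].
charset_for_locale() {
    name=$1
    base=${name%%@*}    # drop an "@modifier" suffix, if any

    # An explicit codeset in the name always wins.
    case $base in
        *.*) printf '%s\n' "${base#*.}"; return 0 ;;
    esac

    # No codeset in the name: fall back to a table of established defaults.
    # Illustrative only; these are the entries mentioned in this thread.
    case $base in
        ja|ja_JP) printf '%s\n' eucJP ;;
        *) return 1 ;;  # not in the table, the caller has to decide
    esac
}

charset_for_locale ja_JP.UTF-8                    # prints UTF-8
charset_for_locale ja_JP                          # prints eucJP
charset_for_locale en_US || echo "no default known for en_US"
```

Returning a failure status for names missing from the table, instead of
silently picking UTF-8 or a system codepage, keeps the policy question
discussed here out of the lookup itself.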
On a modern Linux system, use the following command to get a list (not
sure if it's appropriate to attach it here):

    # print each installed locale together with the charmap it implies
    for l in $(locale -a)
    do
        echo "$l $(LC_ALL=$l locale charmap)"
    done

I have also tried to incorporate a best-guess collection of mappings
from modern systems in my editor mined, so that it can derive the
encoding from the locale name; you could also take a working list from
there.
I think this list should be used as the reference for defining the
locale/encoding mapping; other choices may be more attractive but would
only cause problems.

> The downside is that a user, who needs to work under the default ANSI
> codepage for some reason, has to know the name of the default ANSI
> codepage. Right now any user who needs the default ANSI codepage can
> simply set LANG to some language code and go ahead, without having to
> know the number. With your solution, that wouldn't be possible anymore
> and the user would have to figure out the default ANSI codepage on the
> system before being able to use it.
>
> I honestly don't know if that's really a problem, though. But I don't
> want to take that feature away for now. Anybody having a strong opinion
> on this issue?
>

I wasn't quite aware that the old "codepage:oem" setting didn't strictly
mean "CP850" or "CP437" but apparently the respective system locale.
If that is really needed, maybe the "C" locale should get you there, or
some "OEM" as (I think) Andy proposed. If someone feels the need to
combine a specific language setting with the unspecific "system locale",
well, maybe a pseudo encoding name could be invented to form names like
"en_GB.OEM". Just leaving out the encoding suffix should not have that
effect, as I argued above.

Kind regards,
Thomas