From: Christopher Faylor
To: cygwin@cygwin.com
Subject: Re: The C locale
Date: Mon, 31 Aug 2009 00:53:00 -0000
Message-ID: <20090831005258.GG2068@ednor.casa.cgf.cx>
In-Reply-To: <416096c60908300959i1e0084b1xc8f6e65e792b035d@mail.gmail.com>

On Sun, Aug 30, 2009 at 05:59:11PM +0100, Andy Koppe wrote:
>Trying to reply to Tuomo Valkonen's post about locale issues, I got
>rather confused about the C locale. The manual and the POSIX standard
>say that it supports ASCII only, so in theory anything above 0x7F
>should be rejected. In practice though, both Cygwin 1.5 and 1.7 do
>support characters above 0x7F in the C locale, which could be quite
>useful. Trouble is, they do so rather inconsistently.
>
>Both in 1.5 and 1.7, the mb conversion functions treat such characters
>as ISO-8859-1.
>In other words, conversions between chars and wchars are
>simple casts (except that wchars above 0xFF can't be converted). This
>makes some sense.
>
>Filename handling is different though. Cygwin 1.5 translates filenames
>according to the system's ANSI codepage. I guess the inconsistency
>with the mb functions didn't really matter, as the mb functions were
>pretty much useless anyway, and supporting the system codepage was
>more important.
>
>So, with Cygwin 1.7, I'd have expected filename handling in the C
>locale to either use ISO-8859-1 for consistency with the mb functions,
>or the ANSI codepage for compatibility with 1.5. In actual fact
>though, it uses UTF-8.
>
>Is this on purpose? If so, shouldn't the multibyte conversion
>functions in the C locale use UTF-8 as well?

Since Cygwin has a clear system that it is supposed to be emulating, the
real question is "What does Linux do?"

cgf
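
The three interpretations mentioned in the quoted message (ISO-8859-1
casts, the ANSI codepage, and UTF-8) can be illustrated with a short
Python sketch. This is not Cygwin code: Python's codecs merely stand in
for the C mb functions, cp1252 is assumed as an example ANSI codepage,
and interpret() and fits_latin1() are hypothetical helper names.

```python
# Illustrative sketch only; Python codecs stand in for the C mb functions.
# Assumption: cp1252 plays the role of "the system's ANSI codepage".
ENCODINGS = ("iso-8859-1", "cp1252", "utf-8")


def interpret(raw):
    """Decode the same bytes under each encoding; None means rejected."""
    result = {}
    for enc in ENCODINGS:
        try:
            result[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            result[enc] = None
    return result


def fits_latin1(ch):
    """True if the character has an ISO-8859-1 byte (code point <= 0xFF)."""
    try:
        ch.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# Under ISO-8859-1 every byte value equals its code point: a simple cast.
assert all(bytes([b]).decode("iso-8859-1") == chr(b) for b in range(256))

if __name__ == "__main__":
    # One byte above 0x7F, and the UTF-8 encoding of U+00E9.
    for raw in (b"\x80", b"\xc3\xa9"):
        print(raw, interpret(raw))
```

The same bytes come out as different characters, or are rejected,
depending on which interpretation is used, which is the kind of mismatch
that arises when filename handling and the mb functions disagree.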