From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-return-157455-listarch-gcc=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 14371 invoked by alias); 24 Oct 2009 15:17:31 -0000
Received: (qmail 14360 invoked by uid 22791); 24 Oct 2009 15:17:30 -0000
X-SWARE-Spam-Status: No, hits=-3.5 required=5.0 	tests=AWL,BAYES_00,RCVD_IN_DNSWL_LOW,SPF_PASS
X-Spam-Check-By: sourceware.org
Received: from out2.smtp.messagingengine.com (HELO out2.smtp.messagingengine.com) (66.111.4.26)     by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Sat, 24 Oct 2009 15:17:26 +0000
Received: from compute2.internal (compute2.internal [10.202.2.42]) 	by gateway1.messagingengine.com (Postfix) with ESMTP id C9781B4CD1; 	Sat, 24 Oct 2009 11:17:24 -0400 (EDT)
Received: from heartbeat1.messagingengine.com ([10.202.2.160])   by compute2.internal (MEProxy); Sat, 24 Oct 2009 11:17:25 -0400
Received: from [192.168.1.3] (user-0c6sbc4.cable.mindspring.com [24.110.45.132]) 	by mail.messagingengine.com (Postfix) with ESMTPSA id 391365942D; 	Sat, 24 Oct 2009 11:17:24 -0400 (EDT)
Message-ID: <4AE31A50.4010802@cwilson.fastmail.fm>
Date: Sun, 25 Oct 2009 11:08:00 -0000
From: Charles Wilson <cygwin@cwilson.fastmail.fm>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.23) Gecko/20090812 Thunderbird/2.0.0.23 Mnenhy/0.7.6.666
MIME-Version: 1.0
To: Andreas Schwab <schwab@linux-m68k.org>
CC: Dave Korn <dave.korn.cygwin@googlemail.com>,   Andrew Pinski <pinskia@gmail.com>,  "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>
Subject: Re: dg-error vs. i18n?
References: <4AE235E4.2060005@gmail.com> 	<de8d50360910231604u545f6ae8t9e8d714b13013036@mail.gmail.com> 	<4AE2480A.5020206@gmail.com> <m2ws2luihi.fsf@igel.home>
In-Reply-To: <m2ws2luihi.fsf@igel.home>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Mailing-List: contact gcc-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc/>
List-Post: <mailto:gcc@gcc.gnu.org>
List-Help: <http://gcc.gnu.org/ml/>
Sender: gcc-owner@gcc.gnu.org
X-SW-Source: 2009-10/txt/msg00501.txt.bz2

Andreas Schwab wrote:
> Dave Korn writes:
> 
>>   I'll check.  Joseph's suggestion sounds likely: I think Cygwin just switched
>> to use lots of UTF-8 internally, so I might well need to specify an encoding
>> as well.  (Sorry for not being as well educated in this field as I really
>> ought to be by now.)
> 
> If cygwin wants to be POSIX compatible then the C locale cannot use
> UTF-8.

I'm certainly no expert, but AFAICT POSIX requires nothing of the sort.
locale != character encoding, as below. (I could be wrong, but I think
you could easily have a POSIX-conformant C locale on a system which uses
EBCDIC encoding -- because the default locale definition tables are
specified in terms of character, not hexadecimal, values.)


Also, see the HTML table at
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_03.


"The tables in Locale Definition describe the characteristics and
behavior of the POSIX locale for data consisting entirely of characters
from the portable character set and the control character set. For other
characters, the behavior is unspecified. For C-language programs, the
POSIX locale shall be the default locale when the setlocale() function
is not called."

IOW, it only imposes requirements on how the POSIX locale operates on
the basic 128 characters (*interpreted as characters*, with zero regard
to their hexadecimal values).  For ASCII and UTF-8, those characters are
the "lower 128" 7-bit values, and are encoded identically; behavior with
respect to "other characters" -- the "upper 128" for single-byte
encodings, and any multibyte sequences -- is explicitly "unspecified".
So C.UTF-8 is a perfectly valid default POSIX locale.

The underlying issue is actually gcc: its i18n messages appear
explicitly to "translate" from (e.g.) _("error in file '%s'") to "error
in file {fancy-left-quote}%s{fancy-right-quote}"  when the encoding is
UTF-8.  Working around that by calling setlocale(LC_ALL, "C") isn't
sufficient without also specifying the encoding...

But not all systems will recognize "C.ASCII" as /THE/ C locale, with
explicit ASCII encoding; they might not recognize "C.ASCII" at all.
It looks to me like this silence concerning encoding is a hole in the
standard.

--
Chuck