From: Charles Wilson
Date: Sun, 25 Oct 2009 11:08:00 -0000
To: Andreas Schwab
CC: Dave Korn, Andrew Pinski, "gcc@gcc.gnu.org"
Subject: Re: dg-error vs. i18n?
Message-ID: <4AE31A50.4010802@cwilson.fastmail.fm>

Andreas Schwab wrote:
> Dave Korn writes:
>
>> I'll check.  Joseph's suggestion sounds likely: I think Cygwin just switched
>> to use lots of UTF-8 internally, so I might well need to specify an encoding
>> as well.  (Sorry for not being as well educated in this field as I really
>> ought to be by now.)
>
> If cygwin wants to be POSIX compatible then the C locale cannot use
> UTF-8.

I'm certainly no expert, but AFAICT POSIX requires nothing of the sort:
locale != character encoding, as below.  (I could be wrong, but I think you
could easily have a POSIX-conformant C locale on a system that uses EBCDIC
encoding, because the default locale definition tables are specified in
terms of characters, not hexadecimal values.)

Also, see the HTML table at
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_03.

   "The tables in Locale Definition describe the characteristics and
   behavior of the POSIX locale for data consisting entirely of
   characters from the portable character set and the control character
   set. For other characters, the behavior is unspecified. For C-language
   programs, the POSIX locale shall be the default locale when the
   setlocale() function is not called."

IOW, it only imposes requirements on how the POSIX locale operates on the
basic 128 characters (*interpreted as characters*, with zero regard to their
hexadecimal values).  For ASCII and UTF-8, those characters are the "lower
128" 7-bit values, and the two encodings agree on them.  Behavior with
respect to "other characters" (the "upper 128" in a single-byte encoding,
and any multibyte sequence) is explicitly "unspecified".  So C.UTF-8 is a
perfectly valid default POSIX locale.

The underlying issue is actually in gcc: when the encoding is UTF-8, its
i18n messages appear to be explicitly "translated" from (e.g.)
_("error in file '%s'") to "error in file {fancy-left-quote}%s{fancy-right-quote}".
Working around that by calling setlocale(LC_ALL, "C") isn't sufficient
without also specifying the encoding...  But not all systems will recognize
"C.ASCII" as /THE/ C locale with explicit ASCII encoding; they might not
recognize "C.ASCII" at all.

It looks to me like this silence concerning encoding is a hole in the
standard.

--
Chuck