public inbox for gcc@gcc.gnu.org
* dg-error vs. i18n?
@ 2009-10-23 23:04 Dave Korn
  2009-10-23 23:08 ` Richard Guenther
  2009-10-23 23:19 ` Andrew Pinski
  0 siblings, 2 replies; 17+ messages in thread
From: Dave Korn @ 2009-10-23 23:04 UTC (permalink / raw)
  To: gcc


    Hi everyone,

  Sorry for posting a dumb question, but it's not my strongest area: now that
cygwin is handling i18n and unicode and "all that stuff", I started seeing a
whole slew of test failures, e.g.:

> FAIL: g++.dg/debug/pr22514.C  (test for errors, line 12)
> FAIL: g++.dg/debug/pr22514.C (test for excess errors)
> Excess errors:
> /gnu/gcc/releases/4.3.4-2/gcc4-4.3.4-2/src/gcc-4.3.4/gcc/testsuite/g++.dg/debug/pr22514.C:12:
> error: expected unqualified-id before ‘}’ token

  The reason appears to be that the testcase has single-quotes in the regex
pattern:

>> $ cat g++.dg/debug/pr22514.C -n
>>      1  /* { dg-do compile } */
>>      2  namespace s
>>      3  {
>>      4    template <int> struct _List_base
>>      5    {
>>      6       int _M_impl;
>>      7    };
>>      8    template<int i> struct list : _List_base<i>
>>      9    {
>>     10      using _List_base<i>::_M_impl;
>>     11    }
>>     12  }  /* { dg-error "expected unqualified-id before '\}'" } */
>>     13  s::list<1> OutputModuleListType;

... where the actual compiler outputs those fancy left- and right-facing
quotes.  It will probably go away if I set LC_ALL=C or something like that,
but is dg-error meant to be insensitive to this kind of transformation, or
would it be best if dg-error test patterns didn't include any kind of quote
chars that might get i18n'ed?
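
As a rough illustration of the mismatch described above, the sketch below
checks the pattern from the dg-error directive against the diagnostic as it
might be printed in an ASCII locale and in a UTF-8 locale.  It uses Python's
re module purely for illustration; DejaGnu matches with Tcl regular
expressions, and the names matches() and LOOSE_PATTERN are hypothetical.

```python
import re

# Pattern from the dg-error directive in pr22514.C (plain ASCII apostrophes).
DG_PATTERN = r"expected unqualified-id before '\}'"

# The same diagnostic as it might appear in an ASCII and in a UTF-8 locale.
ASCII_MSG = "error: expected unqualified-id before '}' token"
UTF8_MSG = "error: expected unqualified-id before \u2018}\u2019 token"

# Hypothetical quote-agnostic pattern: accept either quoting style.
LOOSE_PATTERN = "expected unqualified-id before ['\u2018]\\}['\u2019]"


def matches(pattern: str, text: str) -> bool:
    """Return True if the regex pattern is found anywhere in text."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    print(matches(DG_PATTERN, ASCII_MSG))     # ASCII quotes in both
    print(matches(DG_PATTERN, UTF8_MSG))      # curly quotes in the output
    print(matches(LOOSE_PATTERN, UTF8_MSG))   # pattern tolerant of both
```

The strict pattern only matches the ASCII form; a pattern that tolerates both
quoting styles is one possible answer to the question, at the cost of noisier
test directives.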

    cheers,
      DaveK


* Re: dg-error vs. i18n?
  2009-10-23 23:04 dg-error vs. i18n? Dave Korn
@ 2009-10-23 23:08 ` Richard Guenther
  2009-10-24  0:04   ` Joseph S. Myers
  2009-10-23 23:19 ` Andrew Pinski
  1 sibling, 1 reply; 17+ messages in thread
From: Richard Guenther @ 2009-10-23 23:08 UTC (permalink / raw)
  To: Dave Korn; +Cc: gcc

On Sat, Oct 24, 2009 at 1:01 AM, Dave Korn
<dave.korn.cygwin@googlemail.com> wrote:
>
>    Hi everyone,
>
>  Sorry for posting a dumb question, but it's not my strongest area: now that
> cygwin is handling i18n and unicode and "all that stuff", I started seeing a
> whole slew of test failures, e.g.:
>
>> FAIL: g++.dg/debug/pr22514.C  (test for errors, line 12)
>> FAIL: g++.dg/debug/pr22514.C (test for excess errors)
>> Excess errors:
>> /gnu/gcc/releases/4.3.4-2/gcc4-4.3.4-2/src/gcc-4.3.4/gcc/testsuite/g++.dg/debug/pr22514.C:12:
>> error: expected unqualified-id before ‘}’ token
>
>  The reason appears to be because the testcase has single-quotes in the regex
> pattern:
>
>>> $ cat g++.dg/debug/pr22514.C -n
>>>      1  /* { dg-do compile } */
>>>      2  namespace s
>>>      3  {
>>>      4    template <int> struct _List_base
>>>      5    {
>>>      6       int _M_impl;
>>>      7    };
>>>      8    template<int i> struct list : _List_base<i>
>>>      9    {
>>>     10      using _List_base<i>::_M_impl;
>>>     11    }
>>>     12  }  /* { dg-error "expected unqualified-id before '\}'" } */
>>>     13  s::list<1> OutputModuleListType;
>
> ... where the actual compiler outputs those fancy left- and right-facing
> quotes.  It will probably go away if I set LC_ALL=c or something like that,
> but is dg-error meant to be insensitive to this kind of transformation, or
> would it be best if dg-error test patterns didn't include any kind of quote
> chars that might get i14ed?

The testsuite should run with C locale.

Richard.

>    cheers,
>      DaveK
>
>


* Re: dg-error vs. i18n?
  2009-10-23 23:04 dg-error vs. i18n? Dave Korn
  2009-10-23 23:08 ` Richard Guenther
@ 2009-10-23 23:19 ` Andrew Pinski
  2009-10-24  1:28   ` Dave Korn
  2009-10-24 15:17   ` Andreas Schwab
  1 sibling, 2 replies; 17+ messages in thread
From: Andrew Pinski @ 2009-10-23 23:19 UTC (permalink / raw)
  To: Dave Korn; +Cc: gcc

On Fri, Oct 23, 2009 at 4:01 PM, Dave Korn
<dave.korn.cygwin@googlemail.com> wrote:
>
>    Hi everyone,
>
>  Sorry for posting a dumb question, but it's not my strongest area: now that
> cygwin is handling i18n and unicode and "all that stuff", I started seeing a
> whole slew of test failures, e.g.:

The .exp files should be setting LC_ALL and LANG to 'c'.  See bug
14264 which I fixed ...
Hopefully I did not miss one when I did that change almost 5 years ago.

Thanks,
Andrew pinski


* Re: dg-error vs. i18n?
  2009-10-23 23:08 ` Richard Guenther
@ 2009-10-24  0:04   ` Joseph S. Myers
  2009-10-24  2:28     ` Dave Korn
  2009-10-25 11:19     ` Dave Korn
  0 siblings, 2 replies; 17+ messages in thread
From: Joseph S. Myers @ 2009-10-24  0:04 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Dave Korn, gcc

On Sat, 24 Oct 2009, Richard Guenther wrote:

> The testsuite should run with C locale.

Unfortunately it appears there are now some systems where C no longer 
implies ASCII, causing problems for predictability of output.  So the 
testsuite may need to check the host and use other host-specific names 
(C.US-ASCII?) in some cases.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: dg-error vs. i18n?
  2009-10-23 23:19 ` Andrew Pinski
@ 2009-10-24  1:28   ` Dave Korn
  2009-10-24 20:05     ` Andreas Schwab
  2009-10-24 15:17   ` Andreas Schwab
  1 sibling, 1 reply; 17+ messages in thread
From: Dave Korn @ 2009-10-24  1:28 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Dave Korn, gcc

Andrew Pinski wrote:
> On Fri, Oct 23, 2009 at 4:01 PM, Dave Korn
> <dave.korn.cygwin@googlemail.com> wrote:
>>    Hi everyone,
>>
>>  Sorry for posting a dumb question, but it's not my strongest area: now that
>> cygwin is handling i18n and unicode and "all that stuff", I started seeing a
>> whole slew of test failures, e.g.:
> 
> The .exp files should be setting LC_ALL and LANG to 'c'.  See bug
> 14264 which I fixed ...
> Hopefully I did not miss one when I did that change almost 5 years ago.

  I'll check.  Joseph's suggestion sounds likely: I think Cygwin just switched
to use lots of UTF-8 internally, so I might well need to specify an encoding
as well.  (Sorry for not being as well educated in this field as I really
ought to be by now.)

    cheers,
      DaveK


* Re: dg-error vs. i18n?
  2009-10-24  0:04   ` Joseph S. Myers
@ 2009-10-24  2:28     ` Dave Korn
  2009-10-24  5:06       ` Charles Wilson
  2009-10-25 11:19     ` Dave Korn
  1 sibling, 1 reply; 17+ messages in thread
From: Dave Korn @ 2009-10-24  2:28 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Richard Guenther, Dave Korn, gcc

Joseph S. Myers wrote:
> On Sat, 24 Oct 2009, Richard Guenther wrote:
> 
>> The testsuite should run with C locale.
> 
> Unfortunately it appears there are now some systems where C no longer 
> implies ASCII, causing problems for predictability of output.  So the 
> testsuite may need to check the host and use other host-specific names 
> (C.US-ASCII?) in some cases.

  Thanks, that was it.  Had to use "C.CP437" in the end; apparently we have
charset encoding names for lots of OEM code pages but none for plain vanilla
ASCII.

    cheers,
      DaveK


* Re: dg-error vs. i18n?
  2009-10-24  2:28     ` Dave Korn
@ 2009-10-24  5:06       ` Charles Wilson
  0 siblings, 0 replies; 17+ messages in thread
From: Charles Wilson @ 2009-10-24  5:06 UTC (permalink / raw)
  To: Dave Korn; +Cc: Joseph S. Myers, Richard Guenther, gcc, Cygwin Mailing List

[cross-posted to cygwin list]

Background for cygwin list: Dave discovered a problem running some of
the gcc tests.  The tests were run in the "C" locale, but in so doing
they assumed an ascii encoding (specifically, that "'" would match ' in
test patterns -- but the program actually emitted those fancy curled
quotes which did not match ').

Dave Korn wrote:
> Thanks, that was it.  Had to use "C.CP437" in the end, apparently we have
> charset encoding names for lots of OEM code pages but none for plain vanilla
> ASCII.

That's interesting. I had thought "ascii" was a fairly common encoding
name; I know I've seen both 'encoding="ascii"' and 'encoding="us-ascii"'
in XML documents.  Maybe we (cygwin) should add an explicit
plain-old-ascii encoding name?

--
Chuck


* Re: dg-error vs. i18n?
  2009-10-23 23:19 ` Andrew Pinski
  2009-10-24  1:28   ` Dave Korn
@ 2009-10-24 15:17   ` Andreas Schwab
  1 sibling, 0 replies; 17+ messages in thread
From: Andreas Schwab @ 2009-10-24 15:17 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Dave Korn, gcc

Andrew Pinski <pinskia@gmail.com> writes:

> On Fri, Oct 23, 2009 at 4:01 PM, Dave Korn
> <dave.korn.cygwin@googlemail.com> wrote:
>>
>>    Hi everyone,
>>
>>  Sorry for posting a dumb question, but it's not my strongest area: now that
>> cygwin is handling i18n and unicode and "all that stuff", I started seeing a
>> whole slew of test failures, e.g.:
>
> The .exp files should be setting LC_ALL and LANG to 'c'.

'C', not 'c'.  The latter is not a valid locale in general (and bash
will then complain).

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: dg-error vs. i18n?
  2009-10-24  1:28   ` Dave Korn
@ 2009-10-24 20:05     ` Andreas Schwab
  2009-10-25 11:08       ` Charles Wilson
  0 siblings, 1 reply; 17+ messages in thread
From: Andreas Schwab @ 2009-10-24 20:05 UTC (permalink / raw)
  To: Dave Korn; +Cc: Andrew Pinski, gcc

Dave Korn <dave.korn.cygwin@googlemail.com> writes:

>   I'll check.  Joseph's suggestion sounds likely: I think Cygwin just switched
> to use lots of UTF-8 internally, so I might well need to specify an encoding
> as well.  (Sorry for not being as well educated in this field as I really
> ought to be by now.)

If cygwin wants to be POSIX compatible then the C locale cannot use
UTF-8.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: dg-error vs. i18n?
  2009-10-24 20:05     ` Andreas Schwab
@ 2009-10-25 11:08       ` Charles Wilson
  2009-10-27 15:49         ` Eric Blake
  0 siblings, 1 reply; 17+ messages in thread
From: Charles Wilson @ 2009-10-25 11:08 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Dave Korn, Andrew Pinski, gcc

Andreas Schwab wrote:
> Dave Korn writes:
> 
>>   I'll check.  Joseph's suggestion sounds likely: I think Cygwin just switched
>> to use lots of UTF-8 internally, so I might well need to specify an encoding
>> as well.  (Sorry for not being as well educated in this field as I really
>> ought to be by now.)
> 
> If cygwin wants to be POSIX compatible then the C locale cannot use
> UTF-8.

I'm certainly no expert, but AFAICT POSIX requires nothing of the sort.
locale != character encoding, as below. (I could be wrong, but I think
you could easily have a POSIX-conformant C locale on a system which uses
EBCDIC encoding -- because the default locale definition tables are
specified in terms of character, not hexadecimal, values.)


Also, see the HTML table at
http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap07.html#tag_07_03.


"The tables in Locale Definition describe the characteristics and
behavior of the POSIX locale for data consisting entirely of characters
from the portable character set and the control character set. For other
characters, the behavior is unspecified. For C-language programs, the
POSIX locale shall be the default locale when the setlocale() function
is not called."

IOW, it only imposes requirements on how the POSIX locale operates on
the basic 128 characters (*interpreted as characters*, with zero regard
to their hexadecimal values).  For ASCII and UTF-8...those characters are
the "lower 128" 7bit hex values, and are the same; behavior with respect
to "other characters" -- the "upper 128" for single byte, and any
multibyte -- is explicitly "unspecified".  So C.UTF-8 is a perfectly
valid default POSIX locale.

The underlying issue is actually gcc: its i18n messages appear
explicitly to "translate" from (e.g.) _("error in file '%s'") to "error
in file {fancy-left-quote}%s{fancy-right-quote}"  when the encoding is
UTF-8.  Working around that by specifying setlocale("C") isn't
sufficient, without also specifying the encoding...
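
To make the behaviour described above concrete, here is a minimal sketch of
choosing quote characters from the reported character set.  It is an
assumption-laden illustration of the idea, not GCC's actual code; the
function names pick_quotes() and quote() are hypothetical.

```python
from typing import Tuple


def pick_quotes(codeset: str) -> Tuple[str, str]:
    """Pick opening/closing quotes from a codeset name (hypothetical helper)."""
    # Treat spellings such as "UTF-8", "utf-8" and "utf8" alike.
    normalized = codeset.lower().replace("-", "")
    if normalized == "utf8":
        return ("\u2018", "\u2019")  # left and right single quotation marks
    return ("'", "'")                # plain ASCII apostrophes


def quote(text: str, codeset: str) -> str:
    """Wrap text in the quotes chosen for the given codeset."""
    open_q, close_q = pick_quotes(codeset)
    return open_q + text + close_q


if __name__ == "__main__":
    print("error in file " + quote("foo.c", "UTF-8"))
    print("error in file " + quote("foo.c", "ANSI_X3.4-1968"))
```

With logic of this shape, the locale name alone ("C") does not determine the
output; the encoding reported for that locale does.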

But not all systems will recognize "C.ASCII" as /THE/ C locale, with
explicit ASCII encoding; they might not recognize "C.ASCII" at all.
It looks to me like this silence concerning encoding is a hole in the
standard.

--
Chuck


* Re: dg-error vs. i18n?
  2009-10-24  0:04   ` Joseph S. Myers
  2009-10-24  2:28     ` Dave Korn
@ 2009-10-25 11:19     ` Dave Korn
  2009-10-25 18:42       ` Joseph S. Myers
  1 sibling, 1 reply; 17+ messages in thread
From: Dave Korn @ 2009-10-25 11:19 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Richard Guenther, Dave Korn, gcc

Joseph S. Myers wrote:
> On Sat, 24 Oct 2009, Richard Guenther wrote:
> 
>> The testsuite should run with C locale.
> 
> Unfortunately it appears there are now some systems where C no longer 
> implies ASCII, causing problems for predictability of output.  

  I was asked about this on another list: by "some systems", do you mean any
other than Cygwin?  Are you specifically aware of any e.g. linux distros that
default to say UTF-8 in the C locale, or were you just making a general
statement since the Cygwin case proves that there "exists at least one" such
system?

    cheers,
      DaveK


* Re: dg-error vs. i18n?
  2009-10-25 11:19     ` Dave Korn
@ 2009-10-25 18:42       ` Joseph S. Myers
  0 siblings, 0 replies; 17+ messages in thread
From: Joseph S. Myers @ 2009-10-25 18:42 UTC (permalink / raw)
  To: Dave Korn; +Cc: Richard Guenther, gcc

On Sat, 24 Oct 2009, Dave Korn wrote:

> Joseph S. Myers wrote:
> > On Sat, 24 Oct 2009, Richard Guenther wrote:
> > 
> >> The testsuite should run with C locale.
> > 
> > Unfortunately it appears there are now some systems where C no longer 
> > implies ASCII, causing problems for predictability of output.  
> 
>   I was asked about this on another list: by "some systems", do you mean any
> other than Cygwin?  Are you specifically aware of any e.g. linux distros that
> default to say UTF-8 in the C locale, or were you just making a general
> statement since the Cygwin case proves that there "exists at least one" such
> system?

It was stated on #gcc on 12 August that AIX 5.3 returns "ISO8859-1" from 
nl_langinfo (CODESET) in the C locale, which also causes problems for 
testcases that are making assertions about GCC's English-language, 
ASCII-charset output.
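
A quick way to see what a given host reports is sketched below.  It assumes a
Unix-like system where Python's locale.nl_langinfo is available; the helper
name c_locale_codeset() is hypothetical, and the value printed depends
entirely on the host's C library.

```python
import locale


def c_locale_codeset() -> str:
    """Return nl_langinfo(CODESET) as reported in the "C" locale."""
    # Query the current LC_CTYPE setting so it can be restored afterwards.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, "C")
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    # The result varies by system; it is not guaranteed to name ASCII.
    print(c_locale_codeset())
```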

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: dg-error vs. i18n?
  2009-10-25 11:08       ` Charles Wilson
@ 2009-10-27 15:49         ` Eric Blake
  2009-10-27 20:18           ` Dave Korn
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Blake @ 2009-10-27 15:49 UTC (permalink / raw)
  To: gcc

Charles Wilson <cygwin <at> cwilson.fastmail.fm> writes:

> > If cygwin wants to be POSIX compatible then the C locale cannot use
> > UTF-8.

Not true.  POSIX has no restriction against the C locale being a
multi-byte charset.

> 
> "The tables in Locale Definition describe the characteristics and
> behavior of the POSIX locale for data consisting entirely of characters
> from the portable character set and the control character set. For other
> characters, the behavior is unspecified. For C-language programs, the
> POSIX locale shall be the default locale when the setlocale() function
> is not called."
> 
> IOW, it only imposes requirements on how the POSIX locale operates on
> the basic 128 characters (*interpreted as characters*, with zero regard
> to their hexadecimal values).  For ASCII and UTF-8...those characters are
> the "lower 128" 7bit hex values, and are the same; behavior with respect
> to "other characters" -- the "upper 128" for single byte, and any
> multibyte -- is explicitly "unspecified".  So C.UTF-8 is a perfectly
> valid default POSIX locale.

I concur with Chuck's reading of POSIX - the C locale is allowed to use a 
multibyte character encoding, _precisely_ because behavior is unspecified if an 
application attempts to ever interpret any 8-bit bytes in a character context.  
Using the UTF-8 charset in the "C" locale is permitted by POSIX, and any 
application that thinks that "C" implies a unibyte charset is broken.

In non-character contexts (such as strcmp), the "C" locale has guarantees that 
8-bit bytes will sort in the same order as extended ascii (ie. based on the 
byte's values, regardless of whether the byte represents any character, and 
regardless of whether the charset has multibyte encodings).  And thankfully, 
UTF-8 has the nice property that strcmp happens to also perform character 
sorting (at least, for properly normalized character sequences).  The problem 
is only visible when using character contexts.  But that is exactly what gcc is 
doing - it is using the charset determination (UTF-8 in cygwin's case) coupled 
with the "C" locale to make decisions on which quoting characters to use, and 
that's where gcc is falling foul of POSIX.
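
The byte-ordering property of UTF-8 mentioned above can be checked with a
small sketch.  It compares sorting by raw UTF-8 bytes (what strcmp would see)
with sorting by code point; this is code point order only, not locale-aware
collation, and the helper names are hypothetical.

```python
def bytewise_sorted(strings):
    """Sort by the raw UTF-8 bytes, as a byte-oriented strcmp would."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))


def codepoint_sorted(strings):
    """Sort by Unicode code point (Python's default str ordering)."""
    return sorted(strings)


if __name__ == "__main__":
    sample = ["z", "\u00e9", "a", "\u20ac", "\U0001F600", "Z"]
    # UTF-8 is designed so that both orderings agree.
    print(bytewise_sorted(sample) == codepoint_sorted(sample))
```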

> The underlying issue is actually gcc: its i18n messages appear
> explicitly to "translate" from (e.g.) _("error in file '%s'") to "error
> in file {fancy-left-quote}%s{fancy-right-quote}"  when the encoding is
> UTF-8.  Working around that by specifying setlocale("C") isn't
> sufficient, without also specifying the encoding...

The correct workaround is indeed to specify a locale with specific charset 
encodings, rather than relying on plain "C" (hopefully cygwin will 
support "C.ASCII", if it does not already).
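
One way a test harness might apply this workaround is sketched below: probe a
list of candidate locale names and export the first one the host accepts.
The candidate names come from this thread ("C.ASCII", plain "C"); the helper
names first_supported_locale() and build_test_env() are hypothetical, and
this is not how the GCC testsuite's .exp files are actually written.

```python
import locale
import os


def first_supported_locale(candidates):
    """Return the first locale name the host accepts, or None."""
    # Query the current LC_CTYPE setting so it can be restored afterwards.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # not known on this host, try the next one
            return name
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


def build_test_env(candidates=("C.ASCII", "C")):
    """Copy the environment with LC_ALL and LANG set for a test run."""
    name = first_supported_locale(candidates) or "C"
    env = dict(os.environ)
    env["LC_ALL"] = name
    env["LANG"] = name
    return env


if __name__ == "__main__":
    print(build_test_env()["LC_ALL"])
```

The resulting dictionary could be passed as the env argument when spawning
the compiler under test, so the caller's own locale settings are untouched.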

> But not all systems will recognize "C.ASCII" as /THE/ C locale, with
> explicit ASCII encoding; they might not recognize "C.ASCII" at all.
> Looks like to me that this silence concerning encoding is a hole in the
> standard.

As far as I know, the hole is intentional.  But if others would like me to, I 
am willing to pursue the action of raising a defect against the POSIX standard, 
requesting that the next version of POSIX consider including a standardized 
name for a locale with guaranteed single-byte encoding.

-- 
Eric Blake



* Re: dg-error vs. i18n?
  2009-10-27 15:49         ` Eric Blake
@ 2009-10-27 20:18           ` Dave Korn
  0 siblings, 0 replies; 17+ messages in thread
From: Dave Korn @ 2009-10-27 20:18 UTC (permalink / raw)
  To: Eric Blake; +Cc: gcc

Eric Blake wrote:

> The correct workaround is indeed to specify a locale with specific charset 
> encodings, rather than relying on plain "C" 

  Yep, I'm testing a patch to that effect.

> (hopefully cygwin will support "C.ASCII", if it does not already).

  Yep, it does.

> As far as I know, the hole is intentional.  But if others would like me to, I 
> am willing to pursue the action of raising a defect against the POSIX standard, 
> requesting that the next version of POSIX consider including a standardized 
> name for a locale with guaranteed single-byte encoding.

  I think that would be useful, albeit with the drawback that it would make it
easier for people to carry on putting off dealing with i18n indefinitely.
I'm surprised that POSIX doesn't specify "C.ASCII" as exactly that already; I
think it would make good sense to put it in the standard.

    cheers,
      DaveK


* Re: dg-error vs. i18n?
@ 2009-10-28 18:59 Ross Ridge
  0 siblings, 0 replies; 17+ messages in thread
From: Ross Ridge @ 2009-10-28 18:59 UTC (permalink / raw)
  To: gcc

Ross Ridge wrote:
> The correct fix is for GCC not to intentionally choose to rely on
> implementation defined behaviour when using the "C" locale.  GCC can't
> portably assume any other locale exists, but can portably and easily
> choose to get consistent output when using the "C" locale.

Joseph S. Myers writes:
>GCC is behaving properly according to the user's locale (representing 
>English-language diagnostics as best it can - remember that ASCII does not 
>allow good representation of English in all cases).  

This is an issue of style, but as far as I'm concerned using these
fancy quotes in English locales is unnecessary and unhelpful.

>The problem here is not a bug in the compiler proper, it is an issue
>with how to test the compiler portably - that is, how the testsuite can
>portably set a locale with English language and ASCII character set in
>order to test the output the compiler gives in such a locale.

It's a design flaw in GCC.  The "C" locale is the only locale that GCC
can use to reliably and portably get consistent output across all ASCII
systems and so should be the locale used to achieve consistent output.
GCC can simply choose to restrict its output to ASCII.  It's not in
any way being forced by POSIX to output non-ASCII characters, or for
that matter to treat the "C" locale as an English locale.

					Ross Ridge


* Re: dg-error vs. i18n?
  2009-10-28 11:30 Ross Ridge
@ 2009-10-28 17:34 ` Joseph S. Myers
  0 siblings, 0 replies; 17+ messages in thread
From: Joseph S. Myers @ 2009-10-28 17:34 UTC (permalink / raw)
  To: Ross Ridge; +Cc: gcc

On Tue, 27 Oct 2009, Ross Ridge wrote:

> Eric Blake writes:
> >The correct workaround is indeed to specify a locale with specific charset 
> >encodings, rather than relying on plain "C" (hopefully cygwin will 
> >support "C.ASCII", if it does not already).
> 
> The correct fix is for GCC not to intentionally choose to rely on
> implementation defined behaviour when using the "C" locale.  GCC can't
> portably assume any other locale exists, but can portably and easily
> choose to get consistent output when using the "C" locale.

GCC is behaving properly according to the user's locale (representing 
English-language diagnostics as best it can - remember that ASCII does not 
allow good representation of English in all cases).  The problem here is 
not a bug in the compiler proper, it is an issue with how to test the 
compiler portably - that is, how the testsuite can portably set a locale 
with English language and ASCII character set in order to test the output 
the compiler gives in such a locale.  (Ideally it would be possible for 
individual tests to specify other locale character sets to test the output 
given in those locales as well - for example, to test that in an ISO 
8859-1 locale extended characters in identifiers that are in ISO 8859-1 
are shown as is in diagnostics while those that are not in ISO 8859-1 are 
shown as UCNs.)

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: dg-error vs. i18n?
@ 2009-10-28 11:30 Ross Ridge
  2009-10-28 17:34 ` Joseph S. Myers
  0 siblings, 1 reply; 17+ messages in thread
From: Ross Ridge @ 2009-10-28 11:30 UTC (permalink / raw)
  To: gcc

Eric Blake writes:
>The correct workaround is indeed to specify a locale with specific charset 
>encodings, rather than relying on plain "C" (hopefully cygwin will 
>support "C.ASCII", if it does not already).

The correct fix is for GCC not to intentionally choose to rely on
implementation defined behaviour when using the "C" locale.  GCC can't
portably assume any other locale exists, but can portably and easily
choose to get consistent output when using the "C" locale.

>As far as I know, the hole is intentional.  But if others would like
>me to, I am willing to pursue the action of raising a defect against
>the POSIX standard, requesting that the next version of POSIX consider
>including a standardized name for a locale with guaranteed single-byte
>encoding.

I don't see how a defect in POSIX is exposed here.  Nothing in
the standard forced GCC to output multi-byte characters when
nl_langinfo(CODESET) returns something like "utf-8".  GCC could just
as easily have chosen to output these quotes as single-byte characters
when nl_langinfo(CODESET) returns something like "windows-1252", or some
other non-ASCII single-byte characters when it returned "iso-8859-1".

					Ross Ridge

