public inbox for gcc@gcc.gnu.org
* Re: Using Unicode quotes (was: Re: Ada files now checked in)
@ 2001-10-07  7:25 dewar
  2001-10-07  7:46 ` Joseph S. Myers
  0 siblings, 1 reply; 4+ messages in thread
From: dewar @ 2001-10-07  7:25 UTC
  To: dewar, jsm28; +Cc: gcc, zack

<<For English, things still look prettier (given a UTF-8 terminal, e.g.
recentish xterm with appropriate options and fonts) if Unicode quotes are
used.  Since things need to produce ASCII double quotes when not in a
UTF-8 LC_CTYPE, but Unicode quotes where available, and since knowledge of
quotes should not be hardcoded everywhere, this suggests adding printf
modifiers to handle quoting to GCC's extensible printf reimplementation.
The common case of quoting is simply `%s', but the C++ front end also
quotes some longer strings with multiple conversions.  Perhaps the flags
should be ` for an open quote to appear before the converted output and '
for a close quote to appear afterwards; the individual left and right
quotes would be translated once in each .po file, and there would be
special-case handling for when no message catalog is being used to produce
Unicode or ASCII quotes according to the value of LC_CTYPE.
>>

The world is not close enough to Unicode and UTF-8 yet for this to make
practical sense. In practice such a policy will simply lead to even more
junk, since most people will just display the raw output of the compiler
in standard ASCII.

If you tell me that this is definitely false, and that the correct translation
will occur with the current version of GCC on all possible targets and
environments, then I agree that this may make sense, but I would need to
be convinced that this is the case :-)

* Re: Using Unicode quotes (was: Re: Ada files now checked in)
  2001-10-07  7:25 Using Unicode quotes (was: Re: Ada files now checked in) dewar
@ 2001-10-07  7:46 ` Joseph S. Myers
  0 siblings, 0 replies; 4+ messages in thread
From: Joseph S. Myers @ 2001-10-07  7:46 UTC
  To: dewar; +Cc: gcc, zack

On Sun, 7 Oct 2001 dewar@gnat.com wrote:

> The world is not close enough to Unicode and UTF-8 yet for this to make
> practical sense. In practice such a policy will simply lead to even more
> junk, since most people will just display the raw output of the compiler
> in standard ASCII.

UTF-8 output would only get displayed in standard ASCII if someone has
LC_CTYPE set to a UTF-8 value which their terminal doesn't support - which
is simply a broken configuration.  Clearly UTF-8 output would only be
enabled when nl_langinfo(CODESET) is available and indicates the locale to
be UTF-8.
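For illustration, a minimal C sketch of that check, assuming a POSIX system that provides nl_langinfo(CODESET). The helper names locale_is_utf8 and codeset_is_utf8 are hypothetical, not GCC functions. Accepting several spellings ("UTF-8", "utf8") is also an assumption, since codeset names differ between C libraries.

```c
#define _POSIX_C_SOURCE 200809L  /* expose nl_langinfo() and CODESET */
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if a codeset name denotes UTF-8.
   Case and '-'/'_' separators are ignored, so "UTF-8" and "utf8" both match. */
bool codeset_is_utf8(const char *codeset)
{
    const char *want = "utf8";
    size_t i = 0;

    if (codeset == NULL)
        return false;
    for (const char *p = codeset; *p != '\0'; p++) {
        if (*p == '-' || *p == '_')
            continue;                       /* skip separators */
        if (want[i] == '\0' || tolower((unsigned char)*p) != want[i])
            return false;                   /* extra or mismatched character */
        i++;
    }
    return want[i] == '\0';                 /* matched all of "utf8" */
}

/* Hypothetical helper: adopt the user's LC_CTYPE (from LC_ALL, LC_CTYPE or
   LANG) and report whether UTF-8 quotes may be emitted.  Any failure means
   plain ASCII output. */
bool locale_is_utf8(void)
{
    if (setlocale(LC_CTYPE, "") == NULL)
        return false;
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

A driver would call locale_is_utf8() once at startup and choose the quote characters from the result. With LC_ALL=C, common C libraries report an ASCII codeset, so plain ASCII quotes are used.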

The world is moving to Unicode; there are people using GNU/Linux systems
entirely in UTF-8 locales, and it largely works.

> If you tell me that this is definitely false, and that the correct translation
> will occur with the current version of GCC on all possible targets and
> environments, then I agree that this may make sense, but I would need to
> be convinced that this is the case :-)

The only problem cases would be with message catalogs on systems without
transliteration support in iconv (sufficiently recent GNU libc or GNU
libiconv).  We can always provide pre-transliterated catalogs for the
(locale, charset) pairs we want to support.  Note that at present message
catalogs in GCC do little enough useful anyway (too few up-to-date
translations).  We can also recommend that people with older or non-GNU
systems install GNU libiconv to gain the advantages of translations - and
other advantages in handling input in multiple character sets, for which
iconv is already used by the Java front end and will be used by the C and
C++ front ends.

-- 
Joseph S. Myers
jsm28@cam.ac.uk

* Re: Using Unicode quotes (was: Re: Ada files now checked in)
@ 2001-10-07  8:11 dewar
  0 siblings, 0 replies; 4+ messages in thread
From: dewar @ 2001-10-07  8:11 UTC
  To: dewar, jsm28; +Cc: gcc, zack

<<The only problem cases would be with message catalogs on systems without
transliteration support in iconv (sufficiently recent GNU libc or GNU
libiconv).  We can always provide pre-transliterated catalogs for the
(locale, charset) pairs we want to support.  Note that at present message
catalogs in GCC do little enough useful anyway (too few up-to-date
translations).  We can also recommend that people with older or non-GNU
systems install GNU libiconv to gain the advantages of translations - and
other advantages in handling input in multiple character sets, for which
iconv is already used by the Java front end and will be used by the C and
C++ front ends.
>>

OK, it sounds like this should be the long-term or even medium-term plan, and
that it is therefore probably not worth worrying about this issue in the
short term.

* Using Unicode quotes (was: Re: Ada files now checked in)
  2001-10-07  5:34 Ada files now checked in dewar
@ 2001-10-07  7:22 ` Joseph S. Myers
  0 siblings, 0 replies; 4+ messages in thread
From: Joseph S. Myers @ 2001-10-07  7:22 UTC
  To: dewar; +Cc: gcc, zack

On Sun, 7 Oct 2001 dewar@gnat.com wrote:

> The character ' is by the way an apostrophe, not a quote. The normal
> English use is in possessives, and there is a special rule about using
> it for nested quotations in place of normal quote marks.
> 
> Note that this is not completely idle discussion: the GNAT message insertions
> do use quotations as in:
> 
> j.adb:3:04: "xyz" is undefined
> 
> compared to the c message
> 
> j.c:1: `asdf' undeclared (first use this function)

For all non-English languages, the problem can simply be solved by having
the .po message catalogs be UTF-8 encoded, and using the proper Unicode
quotes (U+201C and U+201D for double quotes or U+2018 and U+2019 for
single quotes) since gettext will transliterate when converting to other
locale character sets (at least with glibc 2.2).
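The four code points named above all lie between U+0800 and U+FFFF, so each occupies three bytes in a UTF-8 encoded catalog: U+201C is E2 80 9C, U+201D is E2 80 9D, U+2018 is E2 80 98 and U+2019 is E2 80 99. A minimal C encoder makes the bit layout explicit. The function name utf8_encode is illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into buf and return the byte count.
   Sketch only: surrogates (U+D800..U+DFFF) are not rejected. */
size_t utf8_encode(uint32_t cp, unsigned char buf[4])
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {                        /* 0xxxxxxx */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    buf[0] = (unsigned char)(0xF0 | (cp >> 18));
    buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```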

For English, things still look prettier (given a UTF-8 terminal, e.g.  
recentish xterm with appropriate options and fonts) if Unicode quotes are
used.  Since things need to produce ASCII double quotes when not in a
UTF-8 LC_CTYPE, but Unicode quotes where available, and since knowledge of
quotes should not be hardcoded everywhere, this suggests adding printf
modifiers to handle quoting to GCC's extensible printf reimplementation.  
The common case of quoting is simply `%s', but the C++ front end also
quotes some longer strings with multiple conversions.  Perhaps the flags
should be ` for an open quote to appear before the converted output and '
for a close quote to appear afterwards; the individual left and right 
quotes would be translated once in each .po file, and there would be 
special-case handling for when no message catalog is being used to produce 
Unicode or ASCII quotes according to the value of LC_CTYPE.
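To make the proposed flags concrete, here is a small standalone C sketch. It is not GCC's diagnostic printf. The function diag_format and its helpers are hypothetical, only %s, %d and %% are handled, and output is simply truncated when the buffer is full. Following the proposal, the ` flag emits an open quote before a conversion and the ' flag a close quote after it. A single argument is therefore written %`'s, while a quoted phrase spanning several conversions puts ` on the first and ' on the last. Quotes are U+201C/U+201D when the caller says the locale is UTF-8 and ASCII double quotes otherwise. In the proposal the quote strings would instead come from each .po file.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Output buffer that truncates instead of overflowing; left >= 1 always. */
struct outbuf {
    char *p;      /* current write position (always NUL-terminated) */
    size_t left;  /* bytes remaining, including room for the NUL */
};

static void put(struct outbuf *b, const char *s)
{
    size_t len = strlen(s);

    if (len >= b->left)
        len = b->left - 1;        /* keep one byte for the terminator */
    memcpy(b->p, s, len);
    b->p += len;
    b->left -= len;
    *b->p = '\0';
}

/* Hypothetical formatter: %s, %d and %% with the proposed quote flags.
   "`" = open quote before the conversion, "'" = close quote after it. */
void diag_format(char *out, size_t n, bool utf8, const char *fmt, ...)
{
    const char *lq = utf8 ? "\xE2\x80\x9C" : "\"";   /* U+201C or ASCII " */
    const char *rq = utf8 ? "\xE2\x80\x9D" : "\"";   /* U+201D or ASCII " */
    struct outbuf b = { out, n };
    va_list ap;

    assert(n > 0);
    out[0] = '\0';
    va_start(ap, fmt);
    for (const char *f = fmt; *f != '\0'; f++) {
        if (*f != '%') {              /* ordinary character: copy it */
            char c[2] = { *f, '\0' };
            put(&b, c);
            continue;
        }
        bool want_open = false, want_close = false;
        for (f++; *f == '`' || *f == '\''; f++) {   /* collect quote flags */
            if (*f == '`')
                want_open = true;
            else
                want_close = true;
        }
        if (want_open)
            put(&b, lq);
        if (*f == 's') {
            put(&b, va_arg(ap, const char *));
        } else if (*f == 'd') {
            char num[24];
            snprintf(num, sizeof num, "%d", va_arg(ap, int));
            put(&b, num);
        } else if (*f == '%') {
            put(&b, "%");
        } else {
            break;                /* unsupported or truncated conversion: stop */
        }
        if (want_close)
            put(&b, rq);
    }
    va_end(ap);
}
```

With utf8 set to false, the C front end message quoted earlier in this mail would use the format "%`'s undeclared (first use this function)" and come out with ASCII double quotes around asdf. With utf8 set to true, the same call produces U+201C and U+201D instead, without the message text changing.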

-- 
Joseph S. Myers
jsm28@cam.ac.uk
