From: "Martin v. Loewis"
To: vonbrand@sleipnir.valparaiso.cl
Cc: eggert@twinsun.com, zack@rabi.columbia.edu, rms@gnu.org, bothner@cygnus.com, amylaar@cygnus.co.uk, gcc2@gnu.org, egcs@cygnus.com
Subject: Re: revised proposal for GCC and non-Ascii source files
Date: Sun, 31 Jan 1999 23:58:00 -0000
Message-id: <199901030110.CAA00589@mira.isdn.cs.tu-berlin.de>
References: <199901021600.NAA14215@sleipnir.valparaiso.cl>
X-SW-Source: 1999-01n/msg00050.html

> Has anybody else implemented this kind of stuff? What are the ideas
> floating around?

For string and character literals, non-ASCII support has always been
present in compilers in some form (usually without support for
multibyte strings): the bytes were simply copied through as-is (a small
sketch of this pass-through follows below). For non-ASCII in
identifiers, I believe this is actually a first in computing history.

Java (i.e. SunSoft, or whoever) stuck to Unicode and Unicode only.
That is the origin of the \u escapes, right? The C standard very
strongly suggests using those 2-byte escapes for Unicode: it says that
a \u character counts as six characters towards the 31-character limit
on external identifiers, and a \U character counts as ten :-) (a worked
counting example also follows below).

> In that case, the easy way out is to assume that... or are there
> fundamental reasons that _force_ the concurrent use of incompatible
> charsets?

There are fundamental reasons to support more than one charset, though
not necessarily concurrently. The main reason is that Unicode is not
(yet?) universally accepted. Whether this means we need *simultaneous*
use of different charsets is still an open question.

> Just disallow (for the time being?) non-ASCII (or non-Unicode + assorted
> restrictions?) external identifiers.

This would basically put us back where we are right now. We can have
that approach for free :-)

Regards,
Martin
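
A minimal sketch of the literal pass-through described above, assuming
a Latin-1 source file; the program and its string are invented for
illustration, not taken from GCC:

    #include <stdio.h>

    int main (void)
    {
      /* In a Latin-1 source file the compiler copies the non-ASCII
         bytes of the literal straight through: "Grüße" is stored as
         the bytes 47 72 FC DF 65 00 (0xFC = ü, 0xDF = ß).  */
      const char *greeting = "Grüße";
      printf ("%s\n", greeting);
      return 0;
    }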
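
And a worked example of the draft-C9X counting rule mentioned above.
The identifiers are hypothetical, and whether a particular character is
actually permitted in an identifier is a separate question (Annex D of
the draft); only the arithmetic is illustrated here:

    /* Draft C9X: 31 significant initial characters in an external
       identifier; each \uXXXX escape counts as 6 characters and each
       \UXXXXXXXX escape as 10.  */
    int gr\u00FCsse;     /* "gr" + \u00FC + "sse":   2 + 6 + 3 = 11 of 31 */
    int f\U00010330oo;   /* "f" + \U00010330 + "oo": 1 + 10 + 2 = 13 of 31 */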