From: "Martin v. Loewis"
To: vonbrand@sleipnir.valparaiso.cl
Cc: eggert@twinsun.com, zack@rabi.columbia.edu, rms@gnu.org, bothner@cygnus.com, amylaar@cygnus.co.uk, gcc2@gnu.org, egcs@cygnus.com
Subject: Re: revised proposal for GCC and non-Ascii source files
Date: Sun, 31 Jan 1999 23:58:00 -0000
Message-id: <199901030110.CAA00589@mira.isdn.cs.tu-berlin.de>
References: <199901021600.NAA14215@sleipnir.valparaiso.cl>
X-SW-Source: 1999-01n/msg00050.html

> Has anybody else implemented this kind of stuff? What are the ideas
> floating around?

For string and character literals, non-ASCII support has always been
present in compilers in some form (usually without support for
multibyte strings): the bytes were simply copied through as-is (a small
sketch of this pass-through follows below). For non-ASCII in
identifiers, I believe this is actually a first in computing history.

Java (i.e. SunSoft, or whoever) stuck to Unicode and Unicode only.
That is the origin of the \u escapes, right? The C standard very
strongly suggests using those 2-byte escapes for Unicode: it says that
a \u character counts as six characters towards the 31-character limit
on external identifiers, and a \U character counts as ten :-) (a worked
counting example also follows below).

> In that case, the easy way out is to assume that... or are there
> fundamental reasons that _force_ the concurrent use of incompatible
> charsets?

There are fundamental reasons to support more than one charset, though
not necessarily concurrently. The main reason is that Unicode is not
(yet?) universally accepted. Whether this means we need *simultaneous*
use of different charsets is still an open question.

> Just disallow (for the time being?) non-ASCII (or non-Unicode + assorted
> restrictions?) external identifiers.

This would basically put us back where we are right now. We can have
that approach for free :-)

Regards,
Martin
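
A minimal sketch of the literal pass-through described above, assuming
a Latin-1 source file; the program and its string are invented for
illustration, not taken from GCC:

    #include <stdio.h>

    int main (void)
    {
      /* In a Latin-1 source file the compiler copies the non-ASCII
         bytes of the literal straight through: "Grüße" is stored as
         the bytes 47 72 FC DF 65 00 (0xFC = ü, 0xDF = ß).  */
      const char *greeting = "Grüße";
      printf ("%s\n", greeting);
      return 0;
    }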
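
And a worked example of the draft-C9X counting rule mentioned above.
The identifiers are hypothetical, and whether a particular character is
actually permitted in an identifier is a separate question (Annex D of
the draft); only the arithmetic is illustrated here:

    /* Draft C9X: 31 significant initial characters in an external
       identifier; each \uXXXX escape counts as 6 characters and each
       \UXXXXXXXX escape as 10.  */
    int gr\u00FCsse;     /* "gr" + \u00FC + "sse":   2 + 6 + 3 = 11 of 31 */
    int f\U00010330oo;   /* "f" + \U00010330 + "oo": 1 + 10 + 2 = 13 of 31 */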