From: Martin von Loewis <martin@mira.isdn.cs.tu-berlin.de>
To: egcs@cygnus.com
Subject: UTF8 in identifiers
Date: Fri, 01 May 1998 12:29:00 -0000
Message-id: <199805011927.VAA16755@mira.isdn.cs.tu-berlin.de>
X-SW-Source: 1998-05/msg00011.html

After reading gxxint, I see that the mangling format already provides
for Unicode in identifiers. This is good.

However, I wonder whether a different solution might be better: emit
Unicode characters in symbol names as plain UTF-8. This would have
several advantages:
- It applies to C as well. The mangling approach cannot be extended to
  C without introducing ambiguities, yet C9X has the same kind of
  Unicode support as ISO C++.
- It is more compact. The current mangling consumes 5 bytes per
  character, plus one per identifier, whereas UTF-8 needs at most 3
  bytes for any character in the Basic Multilingual Plane.
- It supports \U escapes as well. This is not an important point,
  since the current mangling could be extended to \U escapes, and such
  escapes will be rare in the next few years anyway.

There is one drawback, of course: most assemblers do not accept UTF-8
in symbol names. So in order to implement this, I would have to start
with binutils. As far as I can tell, only gas needs to be changed; ld
already handles 8-bit characters in symbols.

On platforms where the GNU binutils are not used, one would still have
to use the current mechanism, so I would add a UTF8_IN_IDENTIFIERS
macro to gcc, similar to DOLLARS_IN_IDENTIFIERS.

Before starting to work on it, I'd like to know what people think
about this proposal.

TIA,
Martin