public inbox for gcc@gcc.gnu.org
* UTF8 in identifiers
@ 1998-05-01 12:29 Martin von Loewis
  1998-05-01 14:59 ` Per Bothner
  0 siblings, 1 reply; 4+ messages in thread
From: Martin von Loewis @ 1998-05-01 12:29 UTC
  To: egcs

After reading gxxint, I see that the mangling format already provides
for Unicode in identifiers. This is good.

However, I wonder whether it might be better to choose a different
solution: mangle Unicode characters as UTF-8. This would give a couple
of advantages:
- It applies to C as well. The mangling approach cannot be extended to
  C without introducing ambiguities, since C names are emitted
  unmangled and an escaped spelling could collide with an ordinary
  identifier. Yet C9X has the same kind of Unicode support as ISO C++.
- It is more compact. The current mangling consumes 5 bytes per
  character, plus one per identifier, whereas UTF-8 needs only 2 or 3
  bytes for any non-ASCII character in the Basic Multilingual Plane.
- It supports \U escapes as well. This is not an important issue,
  since the current mangling can be extended to \U escapes, and those
  escapes will be rare in the next few years anyway.

There is one drawback, of course: UTF-8 is not accepted by most
assemblers. So in order to implement this, I would have to start with
binutils. As far as I can tell, only gas needs to be changed; ld
already handles 8-bit characters in symbols.

On platforms where the GNU binutils are not used, one would still have
to go with the current mechanism, so I would add a UTF8_IN_IDENTIFIERS
macro to gcc, similar to DOLLARS_IN_IDENTIFIERS.

Before starting to work on it, I'd like to know what people think
about this proposal.

TIA,
Martin
  

Thread overview: 4+ messages
1998-05-01 12:29 UTF8 in identifiers Martin von Loewis
1998-05-01 14:59 ` Per Bothner
1998-05-06  6:22   ` Gerald Pfeifer
1998-05-06  9:19     ` Per Bothner