public inbox for gcc@gcc.gnu.org
From: Martin von Loewis <martin@mira.isdn.cs.tu-berlin.de>
To: egcs@cygnus.com
Subject: UTF8 in identifiers
Date: Fri, 01 May 1998 12:29:00 -0000
Message-ID: <199805011927.VAA16755@mira.isdn.cs.tu-berlin.de>

After reading gxxint, I see that the mangling format already provides
for Unicode in identifiers. This is good.

However, I wonder whether it might be better to choose a different
solution: mangle Unicode characters as UTF-8. This would give a couple
of advantages:
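- It applies to C as well. The current mangling approach cannot be
  extended to C without introducing ambiguities, since C symbols are not
  mangled and an escaped name could collide with an ordinary ASCII
  identifier. Yet C9X has the same kind of Unicode support in
  identifiers as ISO C++.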
- It is more compact. The current mangling consumes 5 bytes per
  character, plus one byte per identifier, whereas UTF-8 needs at most
  3 bytes for any character in the Basic Multilingual Plane.
- It supports \U escapes as well. This is not an important issue, since
  the current mangling could be extended to \U escapes, and such
  escapes are likely to be rare in the next few years.
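To make the compactness and \U points concrete, here is a minimal C sketch (not gcc code) that encodes one code point, as written with a \u or \U escape, into UTF-8, and compares symbol lengths against an escape-style scheme using only the cost figures quoted above. The function names are hypothetical, and the escape cost model is an assumption: ASCII characters cost one byte, each other character costs 5 bytes, and the extra per-identifier byte is added only when the identifier contains a non-ASCII character. The actual gxxint encoding is not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point into UTF-8.  OUT must have room for
   4 bytes.  Returns the number of bytes written (1..4).  */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
  /* Surrogates and values above U+10FFFF are not valid characters. */
  assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
  if (cp < 0x80) {                 /* ASCII: unchanged */
    out[0] = (unsigned char) cp;
    return 1;
  }
  if (cp < 0x800) {                /* 2-byte form */
    out[0] = (unsigned char) (0xC0 | (cp >> 6));
    out[1] = (unsigned char) (0x80 | (cp & 0x3F));
    return 2;
  }
  if (cp < 0x10000) {              /* 3-byte form: rest of the BMP (\u) */
    out[0] = (unsigned char) (0xE0 | (cp >> 12));
    out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char) (0x80 | (cp & 0x3F));
    return 3;
  }
  /* 4-byte form: characters beyond the BMP, reachable only via \U. */
  out[0] = (unsigned char) (0xF0 | (cp >> 18));
  out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
  out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
  out[3] = (unsigned char) (0x80 | (cp & 0x3F));
  return 4;
}

/* Length of a symbol whose characters are emitted as raw UTF-8. */
static size_t utf8_symbol_len(const uint32_t *cps, size_t n)
{
  unsigned char buf[4];
  size_t len = 0;
  for (size_t i = 0; i < n; i++)
    len += utf8_encode(cps[i], buf);
  return len;
}

/* Length under the assumed escape cost model: 1 byte per ASCII
   character, 5 bytes per other character, plus 1 byte for the
   identifier if it contains any non-ASCII character.  */
static size_t escaped_symbol_len(const uint32_t *cps, size_t n)
{
  size_t len = 0;
  int has_unicode = 0;
  for (size_t i = 0; i < n; i++) {
    if (cps[i] < 0x80) {
      len += 1;
    } else {
      len += 5;
      has_unicode = 1;
    }
  }
  return len + (size_t) has_unicode;
}
```

For the identifier größe (g, r, U+00F6, U+00DF, e) this gives 7 bytes in UTF-8 versus 14 under the escape model.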

There is one drawback, of course: UTF-8 is not accepted in symbol names
by most assemblers. So in order to implement this, I would have to start
with binutils. As far as I can tell, only gas needs to be changed; ld
already handles 8-bit characters in symbols.

On platforms where the GNU binutils are not used, one would still have
to use the current mechanism, so I would add a UTF8_IN_IDENTIFIERS
target macro to gcc, similar to DOLLARS_IN_IDENTIFIERS.

Before starting to work on it, I'd like to know what people think
about this proposal.

TIA,
Martin
  


Thread overview: 4+ messages
1998-05-01 12:29 Martin von Loewis [this message]
1998-05-01 14:59 ` Per Bothner
1998-05-06  6:22   ` Gerald Pfeifer
1998-05-06  9:19     ` Per Bothner
