From: Paul Eggert <eggert@twinsun.com>
To: rms@gnu.org
Cc: zack@rabi.columbia.edu, bothner@cygnus.com, amylaar@cygnus.co.uk,
martin@mira.isdn.cs.tu-berlin.de, gcc2@gnu.org, egcs@cygnus.com
Subject: Re: thoughts on martin's proposed patch for GCC and UTF-8
Date: Sat, 26 Dec 1998 00:36:00 -0000
Message-ID: <199812260834.AAA11774@shade.twinsun.com>
In-Reply-To: <199812250809.DAA05042@psilocin.gnu.org>
> Date: Fri, 25 Dec 1998 03:07:56 -0500
> From: Richard Stallman <rms@gnu.org>
> > Even if we solve the mangling problem, though, the ASCII-only
> > name-mangling method seems less useful than UTF-8 name mangling.
> > Neither mangling method allows an arbitrary native encoding
> > (e.g. Shift-JIS or ISO-2022-JP) to be used uniformly,
> ASCII-only name mangling ought to achieve that. Could you
> please explain why you think it will not?
Here's what I was thinking:
* Unsafe native encodings can't be used in assembly-language strings.
The simplest way to handle this is to do what GCC currently does:
escape non-ASCII bytes in assembly-language strings using notation
like `\377'.
* Hence, if ASCII-only name mangling is also used, assembly language
files will contain only ASCII, regardless of the input encoding.
* This will work, but it's unfriendly for non-English writers, because
it means that assembly language uses ASCII instead of the native
encoding -- i.e. the native encoding isn't being used uniformly in
both source and assembly language output. E.g. suppose we have the
following code:
    const char message[] = "contents";
except that the words `message' and `contents' are in Japanese. A
Japanese reader would naturally want to see something like the
following assembly language output:
    message:
            .asciz "contents"
except, of course, the words `message' and `contents' would be in
Japanese. Unfortunately, though, with ASCII name mangling, and with
string mangling as described above, the Japanese reader will see
something like the following instead:
    .x8c.x32.x9c.x41.x91.x32.xac.x90:
            .asciz "\200 \x309!\x240@\x201\\\x300\""
which is painful to work with.
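The string escaping described in the first point above can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not GCC's actual output routine: the name `escape_asm_string` and its buffer interface are hypothetical, and the choice of three-digit octal escapes simply follows the `\377' notation mentioned above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not GCC's real code): copy LEN bytes from SRC into
   DST as the body of an assembler string.  '"' and '\\' get a backslash
   prefix; every byte outside printable ASCII becomes a three-digit octal
   escape such as \377.  Returns the number of characters written, not
   counting the terminating NUL, or (size_t) -1 if DST is too small.  */
size_t
escape_asm_string (const unsigned char *src, size_t len,
                   char *dst, size_t dstsize)
{
  size_t out = 0;

  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = src[i];

      if (c == '"' || c == '\\')
        {
          if (dstsize - out < 3)        /* two chars plus NUL */
            return (size_t) -1;
          dst[out++] = '\\';
          dst[out++] = (char) c;
        }
      else if (0x20 <= c && c < 0x7f)
        {
          if (dstsize - out < 2)        /* one char plus NUL */
            return (size_t) -1;
          dst[out++] = (char) c;
        }
      else
        {
          if (dstsize - out < 5)        /* four chars plus NUL */
            return (size_t) -1;
          out += (size_t) sprintf (dst + out, "\\%03o", (unsigned) c);
        }
    }

  if (out >= dstsize)
    return (size_t) -1;
  dst[out] = '\0';
  return out;
}
```

With this sketch, every byte of a Japanese string literal with the top bit on, in any encoding, comes out as an octal escape, which is why the resulting assembly file contains only ASCII.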
If GCC outputs bytes with the top bit on in assembly language
identifiers and strings, then at least safe encodings like UTF-8, ISO
8859, and EUC will yield the naturally desired assembly language
output. (Shift-JIS and other unsafe encodings may still yield
undesirable escapes in output, but this is no worse than the escapes
they already get.) I believe this is part of the motivation for
Martin's proposed patch, and I'm sympathetic to it.
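A minimal sketch of the behavior described in the preceding paragraph, assuming an output routine that passes bytes with the top bit on through unchanged and escapes only ASCII control characters, `"' and `\'. The name `escape_asm_string_8bit` and its interface are hypothetical and are not taken from Martin's patch or from GCC.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: like an ordinary assembler-string escaper, except
   that bytes >= 0x80 are copied through untouched.  In a "safe" encoding
   (UTF-8, ISO 8859, EUC) every byte of a non-ASCII character has the top
   bit on, so such characters reach the assembly file intact.  Returns the
   number of characters written, not counting the terminating NUL, or
   (size_t) -1 if DST is too small.  */
size_t
escape_asm_string_8bit (const unsigned char *src, size_t len,
                        char *dst, size_t dstsize)
{
  size_t out = 0;

  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = src[i];

      if (c == '"' || c == '\\')
        {
          if (dstsize - out < 3)        /* two chars plus NUL */
            return (size_t) -1;
          dst[out++] = '\\';
          dst[out++] = (char) c;
        }
      else if (c >= 0x80 || (0x20 <= c && c < 0x7f))
        {
          if (dstsize - out < 2)        /* one char plus NUL */
            return (size_t) -1;
          dst[out++] = (char) c;
        }
      else
        {
          if (dstsize - out < 5)        /* four chars plus NUL */
            return (size_t) -1;
          out += (size_t) sprintf (dst + out, "\\%03o", (unsigned) c);
        }
    }

  if (out >= dstsize)
    return (size_t) -1;
  dst[out] = '\0';
  return out;
}
```

In an unsafe encoding such as Shift-JIS, a trail byte can equal 0x5C (`\') and would still be escaped by this routine, which is the kind of undesirable escape mentioned above.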
> Date: Fri, 25 Dec 1998 03:09:25 -0500
> From: Richard Stallman <rms@gnu.org>
> the default mode should be not to convert, and in that case, GCC
> doesn't need to know what the encoding is (unless /u is used).
Even when not converting, GCC needs to know the input encoding if it's
an unsafe one like Shift-JIS or ISO-2022-JP (``unsafe'' meaning ``some
multibyte chars contain ASCII bytes'') -- otherwise GCC won't be able
to parse comments, strings, and identifiers correctly. Much (if not
most) east Asian text currently uses unsafe encodings, so this is not
a minor point.
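To illustrate the parsing problem, here is a small sketch in C of a scanner that looks for the closing quote of a string literal. The function names are hypothetical, and the Shift-JIS lead-byte ranges used (0x81-0x9F and 0xE0-0xFC) are an assumption covering the common variants of the encoding. The point is that a scanner unaware of the encoding mistakes a trail byte 0x5C for a backslash.

```c
#include <stddef.h>

/* Assumed lead-byte ranges for Shift-JIS double-byte characters.  */
static int
sjis_is_lead_byte (unsigned char c)
{
  return (0x81 <= c && c <= 0x9f) || (0xe0 <= c && c <= 0xfc);
}

/* Hypothetical scanner: S points just past the opening '"' of a string
   literal.  Return the index of the closing '"', or LEN if none is found.
   If SJIS is nonzero, each double-byte character is skipped as a unit;
   otherwise every byte is treated as an independent ASCII-compatible
   character, as an encoding-unaware lexer would do.  */
size_t
find_string_end (const unsigned char *s, size_t len, int sjis)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = s[i];

      if (sjis && sjis_is_lead_byte (c) && i + 1 < len)
        i += 2;                 /* skip lead byte and trail byte together */
      else if (c == '\\' && i + 1 < len)
        i += 2;                 /* skip backslash escape */
      else if (c == '"')
        return i;
      else
        i++;
    }
  return len;
}
```

For the bytes 0x83 0x5C `"' `;' (a Shift-JIS katakana character whose trail byte is 0x5C, followed by the closing quote), the Shift-JIS-aware scan returns index 2, while the encoding-unaware scan treats 0x5C as a backslash, skips the quote, and returns 4 without finding the end of the string.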