public inbox for gcc@gcc.gnu.org
From: Paul Eggert <eggert@twinsun.com>
To: rms@gnu.org
Cc: zack@rabi.columbia.edu, bothner@cygnus.com, amylaar@cygnus.co.uk,
	martin@mira.isdn.cs.tu-berlin.de, gcc2@gnu.org, egcs@cygnus.com
Subject: Re: thoughts on martin's proposed patch for GCC and UTF-8
Date: Sat, 26 Dec 1998 00:36:00 -0000
Message-ID: <199812260834.AAA11774@shade.twinsun.com>
In-Reply-To: <199812250809.DAA05042@psilocin.gnu.org>

   Date: Fri, 25 Dec 1998 03:07:56 -0500
   From: Richard Stallman <rms@gnu.org>

       Even if we solve the mangling problem, though, the ASCII-only
       name-mangling method seems less useful than UTF-8 name mangling.
       Neither mangling method allows an arbitrary native encoding
       (e.g. Shift-JIS or ISO-2022-JP) to be used uniformly, 

   ASCII-only name mangling ought to achieve that.  Could you
   please explain why you think it will not?

Here's what I was thinking:

* Unsafe native encodings can't be used in assembly-language strings.
  The simplest way to handle this is to do what GCC currently does:
  escape non-ASCII bytes in assembly-language strings using notation
  like `\377'.

* Hence, if ASCII-only name mangling is also used, assembly language
  files will contain only ASCII, regardless of the input encoding.

* This will work, but it's unfriendly for non-English writers, because
  it means that assembly language uses ASCII instead of the native
  encoding -- i.e. the native encoding isn't being used uniformly in
  both source and assembly language output.  E.g. suppose we have the
  following code:

	const char message[] = "contents";

  except that the words `message' and `contents' are in Japanese.  A
  Japanese reader would naturally desire to see something like the
  following assembly language output:

	message:
		.asciz	"contents"

  except, of course, the words `message' and `contents' would be in
  Japanese.  Unfortunately, though, with ASCII name mangling, and with
  string mangling as described above, the Japanese reader will see
  something like the following instead:

	.x8c.x32.x9c.x41.x91.x32.xac.x90:
		.asciz "\200 \x309!\x240@\x201\\\x300\""

  which is painful to work with.
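
  To make the two schemes concrete, here is a minimal sketch in C.  It
  is not GCC's actual code: the helper names escape_asm_string and
  mangle_ascii_only are hypothetical, and the `.xNN' spelling is only
  modeled loosely on the illustration above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not a GCC function): write the LEN bytes at SRC
   into DST as the body of an assembler string.  Quotes and backslashes
   get a backslash prefix; every byte outside printable ASCII becomes a
   three-digit octal escape such as \377.  DST needs room for
   4 * LEN + 1 bytes.  Returns the number of bytes written, not
   counting the terminating NUL.  */
static size_t
escape_asm_string (char *dst, const unsigned char *src, size_t len)
{
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = src[i];
      if (c == '"' || c == '\\')
        {
          dst[n++] = '\\';
          dst[n++] = (char) c;
        }
      else if (c >= 0x20 && c < 0x7f)
        dst[n++] = (char) c;
      else
        n += (size_t) sprintf (dst + n, "\\%03o", (unsigned) c);
    }
  dst[n] = '\0';
  return n;
}

/* Hypothetical ASCII-only mangling (an assumption for illustration):
   ASCII bytes pass through unchanged, and each byte with the top bit
   set is spelled `.xNN' in lowercase hex.  DST needs room for
   4 * LEN + 1 bytes.  Returns the number of bytes written, not
   counting the terminating NUL.  */
static size_t
mangle_ascii_only (char *dst, const unsigned char *src, size_t len)
{
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = src[i];
      if (c < 0x80)
        dst[n++] = (char) c;
      else
        n += (size_t) sprintf (dst + n, ".x%02x", (unsigned) c);
    }
  dst[n] = '\0';
  return n;
}
```

  Under these assumptions a two-byte UTF-8 character in an identifier
  turns into eight ASCII characters, and every non-ASCII byte of a
  string turns into four, so the output is pure ASCII but no longer
  readable as the original text.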

If GCC outputs bytes with the top bit on in assembly language
identifiers and strings, then at least safe encodings like UTF-8, ISO
8859, and EUC will yield the naturally desired assembly language
output.  (Shift-JIS and other unsafe encodings may still yield
undesirable escapes in output, but this is no worse than the escapes
they already get.)  I believe this is what is partly motivating
martin's proposed patch, and I'm sympathetic to this motivation.

   Date: Fri, 25 Dec 1998 03:09:25 -0500
   From: Richard Stallman <rms@gnu.org>

   the default mode should be not to convert, and in that case, GCC
   doesn't need to know what the encoding is (unless /u is used).

Even when not converting, GCC needs to know the input encoding if it's
an unsafe one like Shift-JIS or ISO-2022-JP (``unsafe'' meaning ``some
multibyte chars contain ASCII bytes'') -- otherwise GCC won't be able
to parse comments, strings, and identifiers correctly.  Much (if not
most) east Asian text currently uses unsafe encodings, so this is not
a minor point.
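
A minimal sketch in C of why the parser has to know about an unsafe
encoding follows.  It is not taken from GCC: the function names are
hypothetical, and the lead-byte ranges are the commonly cited
Shift-JIS ones (0x81-0x9F and 0xE0-0xFC), which may not match every
vendor variant.  The sketch looks for the closing quote of a string
literal twice, once treating every byte as ASCII and once skipping
the trail byte of each two-byte character.

```c
#include <stddef.h>

/* Nonzero if C is the first byte of a two-byte Shift-JIS character
   (assumed ranges; vendor extensions may differ).  */
static int
sjis_is_lead_byte (unsigned char c)
{
  return (c >= 0x81 && c <= 0x9f) || (c >= 0xe0 && c <= 0xfc);
}

/* Encoding-unaware scan.  S points just past the opening quote of a
   string literal.  Returns the index of the closing quote, or LEN if
   none is found.  A 0x5C trail byte is mistaken for a backslash.  */
static size_t
find_close_quote_naive (const unsigned char *s, size_t len)
{
  for (size_t i = 0; i < len; i++)
    {
      if (s[i] == '\\')
        i++;                    /* skip the escaped byte */
      else if (s[i] == '"')
        return i;
    }
  return len;
}

/* Shift-JIS-aware scan: the byte after a lead byte belongs to the
   same character and is never inspected as ASCII.  */
static size_t
find_close_quote_sjis (const unsigned char *s, size_t len)
{
  for (size_t i = 0; i < len; i++)
    {
      if (sjis_is_lead_byte (s[i]))
        i++;                    /* skip the trail byte */
      else if (s[i] == '\\')
        i++;                    /* skip the escaped byte */
      else if (s[i] == '"')
        return i;
    }
  return len;
}
```

For example, the katakana character encoded in Shift-JIS as the bytes
0x83 0x5C ends in the ASCII code for backslash.  Given the four bytes
0x83 0x5C `"' `;', the encoding-unaware scan treats 0x5C as escaping
the quote and never finds the end of the string, while the
Shift-JIS-aware scan finds the closing quote at index 2.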
