Re: revised proposal for GCC and non-Ascii source files

public inbox for gcc@gcc.gnu.org

From: Paul Eggert <eggert@twinsun.com>
To: zack@rabi.columbia.edu
Cc: martin@mira.isdn.cs.tu-berlin.de, rms@gnu.org,
	bothner@cygnus.com, amylaar@cygnus.co.uk, gcc2@gnu.org,
	egcs@cygnus.com
Subject: Re: revised proposal for GCC and non-Ascii source files
Date: Sun, 31 Jan 1999 23:58:00 -0000
Message-ID: <199901050020.QAA06754@shade.twinsun.com>
In-Reply-To: <199901042115.QAA13627@rabi.phys.columbia.edu>

   Date: Mon, 04 Jan 1999 16:15:46 -0500
   From: Zack Weinberg <zack@rabi.columbia.edu>

   The preprocessor may have trouble with native extended chars as
   first characters, but if we can get isalpha() to cooperate, it
   should work.

There should be no trouble with isalpha cooperating, since GCC
shouldn't use isalpha to detect identifiers.

To detect the characters allowed in C identifiers, GCC must use its own
function or table.  GCC shouldn't use isalpha unless we want it to
detect alphabetic bytes in the current locale, which is normally not
what is wanted.  It might be OK for GCC to use isalpha for some obscure
low-level stuff, e.g. detecting whether an MS-DOS file name has a drive
specifier.  But GCC shouldn't use isalpha for detecting identifiers,
because isalpha classifies one byte at a time and so doesn't work with
multibyte chars.
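For illustration, here is a minimal sketch of a locale-independent
table of the kind described above.  The names init_c_idchar_table,
is_c_idchar and is_c_idstart are hypothetical (this is not cccp.c's
is_idchar), and the sketch assumes only the basic characters a-z, A-Z,
0-9 and _ count as identifier characters; extended characters would
still need multibyte-aware handling.  Unlike isalpha, the answer does
not change with the locale: in an ISO 8859-1 locale isalpha (0xE9) may
return nonzero, but is_c_idchar (0xE9) stays false.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical table: true for bytes that may appear in a C identifier.
   It is filled once from a fixed list, so it never consults the locale. */
static bool c_idchar_table[256];

static void init_c_idchar_table(void)
{
    static const char idchars[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789_";

    memset(c_idchar_table, 0, sizeof c_idchar_table);
    for (const char *p = idchars; *p != '\0'; p++)
        c_idchar_table[(unsigned char) *p] = true;
}

/* Can byte C appear anywhere in an identifier? */
static bool is_c_idchar(unsigned char c)
{
    return c_idchar_table[c];
}

/* Can byte C start an identifier?  Digits are excluded. */
static bool is_c_idstart(unsigned char c)
{
    return is_c_idchar(c) && !(c >= '0' && c <= '9');
}
```

Call init_c_idchar_table once at startup; after that, is_c_idchar and
is_c_idstart give the same answers no matter what setlocale has done.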

Parts of GCC already do the right thing here.  E.g. cccp.c uses
is_idchar instead of isalpha.  Some other parts of GCC use isalpha
where they shouldn't, though, and we'll have to fix these parts if we
want GCC to work with multibyte chars properly.

I already did most of this job for the back end and the C front end,
in the internationalization patch that has been applied to GCC2:

ftp://alpha.gnu.org/gnu/testgcc-980705-intl.patch.gz

However, this patch didn't cover the C++ front end, and there are
probably some other places that I missed that we'll just have to find
if and when they come up.  Also, this patch hasn't been fully applied
to EGCS yet, and there's undoubtedly some integration work needed
there since GCC2 and EGCS have diverged.  At some point I'd like to
get it working properly with EGCS as well as GCC2.  I understand that
a merge is in progress and so I'll wait for it to finish before
hacking any further on isalpha problems.

   This raises the issue of how we tell native extended character X
   from native ASCII character %.  I'm beginning to suspect we need
   the more general locale information, not just the charset.

I reluctantly agree.  Another reason we need the locale info is that
iconv (which needs only the charset) doesn't let us determine
character boundaries reliably; we need mbrlen and mbsinit for that,
and they depend on the LC_CTYPE locale.
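A minimal sketch of the boundary-finding described above: walk a
buffer with mbrlen, carrying an mbstate_t, and use mbsinit at the end
to check that the text finishes in the initial shift state.  The
function name count_mb_chars is hypothetical, and the sketch assumes
the caller has already selected the source file's locale with
setlocale (LC_CTYPE, ...), since mbrlen's results depend on it.  It
also assumes an embedded null character occupies a single byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in BUF[0..LEN) under the
   current LC_CTYPE locale.  Return -1 if an invalid sequence is found,
   or if the buffer ends mid-character or in a non-initial shift state. */
static long count_mb_chars(const char *buf, size_t len)
{
    mbstate_t state;
    long count = 0;
    size_t i = 0;

    assert(buf != NULL || len == 0);
    memset(&state, 0, sizeof state);    /* start in the initial state */

    while (i < len) {
        size_t n = mbrlen(buf + i, len - i, &state);
        if (n == (size_t) -1 || n == (size_t) -2)
            return -1;                  /* invalid, or truncated character */
        if (n == 0)
            n = 1;                      /* null character: assume one byte */
        i += n;                         /* step to the next boundary */
        count++;
    }

    /* In a stateful encoding the text must end in the initial state. */
    return mbsinit(&state) ? count : -1;
}
```

In the "C" locale every ASCII byte is one character, so
count_mb_chars ("abc", 3) is 3; in a UTF-8 locale a two-byte sequence
such as "\xc3\xa9" would count as a single character.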

This is why I'm changing the word `charset' to `ctype' in my next
version of the proposal for non-Ascii source files.  GCC will need to
know the LC_CTYPE locale (which I am calling the `ctype'), and from
that GCC can use nl_langinfo (CODESET) to determine the charset.
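Roughly, the lookup from `ctype' to charset could look like the
following minimal sketch.  The helper name ctype_codeset is
hypothetical; setlocale and nl_langinfo (CODESET) are the standard
C/POSIX calls.  Note that the string nl_langinfo returns may be
overwritten by later calls, so a real caller would copy it.

```c
#define _XOPEN_SOURCE 700   /* expose nl_langinfo and CODESET on strict compilers */
#include <assert.h>
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: select LOCALE_NAME as the LC_CTYPE locale and
   return the name of its character set, or NULL if that locale is not
   available.  An empty LOCALE_NAME takes the locale from the
   environment (LC_ALL, LC_CTYPE, or LANG). */
static const char *ctype_codeset(const char *locale_name)
{
    assert(locale_name != NULL);
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return NULL;                    /* no such locale installed */
    return nl_langinfo(CODESET);
}
```

For example, ctype_codeset ("") reports the charset of the user's
LC_CTYPE environment, which is typically "UTF-8" in a UTF-8 locale.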
