public inbox for gcc@gcc.gnu.org
* Re: How to make another conversion with _identifiers_ following '#' in libcpp
@ 2007-12-08  6:12 Zack Weinberg
From: Zack Weinberg @ 2007-12-08  6:12 UTC
  To: Lijuan Hai, GCC Development

Lijuan Hai wrote:
>
> I have a plan to convert UCN to alphabet instead of UTF8 in
> GCC-4.2.0, and already handled it in libcpp.

I would like to offer advice, but I don't understand what you are
trying to do.  You say you want to "convert UCN[s] to [an] alphabet
instead of UTF8" but that doesn't make any sense.  Alphabets are
abstract sets of glyphs commonly used to write a language.  They are
not alternatives to UTF8 (a scheme for encoding integers as sequences
of bytes) or even to Unicode (a mapping from integers to glyphs).

The only thing I can guess is that you want to convert UCNs to some
specific character set other than Unicode, like EUC-JP or ISO8859.n.
In that case the first thing I must ask you is to read up on the
-fexec-charset option, and to explain why that doesn't do what you
need it to do.
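
To make the -fexec-charset suggestion concrete, here is a minimal
sketch that exposes the bytes the compiler emits for a UCN in a string
literal. The names ucn_text and dump_bytes are made up for this
illustration. With GCC's default execution character set (UTF-8) the
literal should come out as the two bytes c3 a9; building the same file
with -fexec-charset=ISO-8859-1 should give the single byte e9 instead.

```c
#include <stdio.h>
#include <string.h>

/* One UCN in a string literal: U+00E9, LATIN SMALL LETTER E WITH ACUTE.
   The compiler converts it to the execution character set. */
static const char ucn_text[] = "\u00e9";

/* Print every byte of s in hex so the chosen encoding is visible.
   (Hypothetical helper, not part of GCC or libcpp.) */
static void dump_bytes(const char *s)
{
    size_t len = strlen(s);

    for (size_t i = 0; i < len; i++)
        printf("%02x ", (unsigned char)s[i]);
    printf("\n");
}
```

Calling dump_bytes(ucn_text) from main and rebuilding with different
-fexec-charset values is a quick way to check whether the option
already produces the target encoding without any change to libcpp.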

> But I encountered a problem when compiling code like the following:
> -------------------cut-------------------
> 1:  #define str(t) #t
> 2:  int foo()
> 3:  {
> 4:    char* cc = str(\u1234);
> 5:    if (!strcmp(cc, "\u1234"))
> 6:      abort();
> 7:  }
> -------------------cut-------------------
>   With my changes, \u1234 is converted to alphabet in line 4 but kept
> as-is in line 5. Converting it in line 4 is incorrect and unexpected,
> because the '#' makes it different from a plain identifier.

As I don't know what you mean by "converted to alphabet", I can't say
for sure, but if I had to guess, I'd say you inserted your code into
the routines for scanning identifiers?  But at that point there is no
way to know that there is a '#' in effect.  You need to postpone the
conversion, whatever it is, until much later; the point where cpplib
hands off identifiers to the compiler proper, or perhaps even the
assembly output macros, depending on your goal.
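
The idea of postponing the conversion can be sketched with a toy
model. Everything below is hypothetical and is not libcpp's real data
structures or API: struct token, stringify and convert_for_handoff are
invented names, and the "conversion" (backslash becomes underscore) is
only a stand-in for whatever transformation is actually wanted. The
point is that the token keeps its source spelling, '#' works from that
spelling, and the conversion runs only at hand-off.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical token: the lexer stores the identifier exactly as
   written and performs no conversion while scanning. */
struct token {
    const char *spelling;   /* e.g. the six characters \u1234 */
};

/* Model of '#': wrap the untouched spelling in double quotes.
   Identifiers need no further escaping. */
static void stringify(const struct token *t, char *out, size_t n)
{
    snprintf(out, n, "\"%s\"", t->spelling);
}

/* Late conversion, applied only when the identifier leaves the
   preprocessor.  Stand-in transformation: '\' becomes '_'. */
static void convert_for_handoff(const struct token *t, char *out, size_t n)
{
    size_t i = 0;

    if (n == 0)
        return;
    for (; t->spelling[i] != '\0' && i + 1 < n; i++)
        out[i] = (t->spelling[i] == '\\') ? '_' : t->spelling[i];
    out[i] = '\0';
}
```

Because stringify never sees converted text, the operand of '#' is
unaffected, while identifiers that reach the hand-off step are still
converted.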

(Have you read the long comment at the top of libcpp/charset.c?  Do
you understand all of the fine distinctions made there?)

zw


* How to make another conversion with _identifiers_ following '#' in libcpp
@ 2007-12-08  4:39 Lijuan Hai
From: Lijuan Hai @ 2007-12-08  4:39 UTC
  To: gcc; +Cc: hailijuan

Hi all,
  I have a plan to convert UCN to alphabet instead of UTF8 in
GCC-4.2.0, and already handled it in libcpp. But I encountered a
problem when compiling code like the following:
-------------------cut-------------------
1:  #define str(t) #t
2:  int foo()
3:  {
4:    char* cc = str(\u1234);
5:    if (!strcmp(cc, "\u1234"))
6:      abort();
7:  }
-------------------cut-------------------
  With my changes, \u1234 is converted to alphabet in line 4 but kept
as-is in line 5. Converting it in line 4 is incorrect and unexpected,
because the '#' makes it different from a plain identifier. So how can
I detect this case and avoid converting it to alphabet? I believe
there is some way in libcpp to handle it. Is anyone familiar with
libcpp processing? Thanks in advance, and have a nice weekend.
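
For reference, a self-contained form of the comparison in lines 4 and
5 might look like the sketch below. It assumes a compiler that accepts
UCNs in identifiers by default (GCC 4.2 needed -fextended-identifiers
for that), and the function name is invented for this example. With an
unmodified preprocessor, the stringified identifier and the string
literal should denote the same character, so the two strings are
expected to compare equal.

```c
#include <string.h>

#define str(t) #t

/* Returns nonzero when the stringified identifier \u1234 and the
   literal "\u1234" are the same byte sequence in the execution
   character set.  (Hypothetical helper for illustration.) */
static int stringified_ucn_matches_literal(void)
{
    const char *cc = str(\u1234);

    return strcmp(cc, "\u1234") == 0;
}
```

A modified preprocessor that converts the identifier before '#' is
applied would be expected to make this function return zero.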

-- 
        Best wishes!
Yours,
Lijuan Hai
  _  _
  (_)(_)
   (,,)
  =()=
 ((__)\
   _|L\_______/
