From mboxrd@z Thu Jan 1 00:00:00 1970
From: Martin von Loewis
To: egcs@cygnus.com
Subject: UTF8 in identifiers
Date: Fri, 01 May 1998 12:29:00 -0000
Message-id: <199805011927.VAA16755@mira.isdn.cs.tu-berlin.de>
X-SW-Source: 1998-05/msg00011.html

After reading gxxint, I see that the mangling format already provides
for Unicode in identifiers. This is good. However, I wonder whether it
might be better to choose a different solution: mangle Unicode
characters as UTF-8. This would give a couple of advantages:

- It applies to C as well. The mangling approach cannot be extended to
  C without introducing ambiguities. However, C9X has the same kind of
  Unicode support as ISO C++.
- It is more compact. The current mangling consumes 5 bytes per
  character, plus one per identifier.
- It supports \U escapes as well. This is not an important issue, as
  the current mangling can be extended to \U escapes, and because
  those escapes will be rare in the next few years.

There is one drawback, of course: UTF-8 is illegal in most assemblers.
So in order to implement this, I would have to start with binutils. As
far as I can tell, only gas needs to be changed - ld already handles
8 bit characters in symbols.

On platforms where the GNU binutils are not used, one would still have
to go with the current mechanism, so I would put UTF8_IN_IDENTIFIERS
into gcc, similar to DOLLAR_IN_IDENTIFIERS.

Before starting to work on it, I'd like to know what people think
about this proposal.

TIA,
Martin
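
To make the compactness argument above concrete, here is a minimal
sketch in C that compares the two sizes for an identifier given as a
sequence of code points. It is an illustration only, not gcc code. The
function names (utf8_encode, utf8_ident_size, escaped_ident_size) are
hypothetical. The encoder follows the modern 4-byte UTF-8 definition.
The cost model for the current mangling uses only the figures quoted
above (5 bytes per character, plus one per identifier) and assumes that
the 5-byte cost applies to non-ASCII characters only, that ASCII
characters cost 1 byte each, and that the extra byte is paid only when
the identifier contains a non-ASCII character. The length prefix of the
mangled name is left out of both counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
   Returns the number of bytes written, or 0 for surrogates and for
   values above U+10FFFF, which have no valid UTF-8 form. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogate, not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Bytes needed for the identifier if every character is emitted as
   UTF-8. Unencodable code points contribute 0 bytes. */
size_t utf8_ident_size(const uint32_t *cps, size_t n)
{
    unsigned char buf[4];
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += utf8_encode(cps[i], buf);
    return total;
}

/* Bytes needed under the ASSUMED cost model for the current mangling:
   1 byte per ASCII character, 5 bytes per non-ASCII character, plus
   1 byte per identifier that contains any non-ASCII character. */
size_t escaped_ident_size(const uint32_t *cps, size_t n)
{
    size_t total = 0;
    int has_unicode = 0;
    for (size_t i = 0; i < n; i++) {
        if (cps[i] < 0x80) {
            total += 1;
        } else {
            total += 5;
            has_unicode = 1;
        }
    }
    return total + (has_unicode ? 1 : 0);
}
```

Under these assumptions, a five-character identifier with two
characters in the U+0080..U+07FF range takes 7 bytes as UTF-8 and 14
bytes in the escaped form. Since a 16-bit character never needs more
than 3 bytes in UTF-8, the UTF-8 form is never the longer of the two
in this model.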