Re: NO_DOLLAR_IN

public inbox for gcc@gcc.gnu.org

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58             ` NO_DOLLAR_IN_LABEL Paul Eggert
@ 1999-01-31 23:58               ` Per Bothner
  1999-01-31 23:58                 ` NO_DOLLAR_IN_LABEL Paul Eggert
  1999-01-31 23:58               ` NO_DOLLAR_IN_LABEL Martin v. Loewis
  1 sibling, 1 reply; 31+ messages in thread
From: Per Bothner @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Paul Eggert; +Cc: egcs, gcc2

I hope my new proposal meets some of the objections;  here
are some comments on previous mail:

> I suggested `__9V_' earlier, but `_9'
> should do if we want something shorter.

"_9" is not a reserved prefix, I believe.  "__9V_" seems
very arbitrary, but I guess I missed the rationale earlier.

> However, if our goal is to use short escapes, then there are better
> characters than underscore.

That is one goal.  Another goal is to simplify the spec, and avoid
different manglings on different platforms.  That argues for '_'.
I think in my proposal, there should be very little need to
mangle '_' as other than '_'.

> And mangled C++ names use _ even more often than C names do.

Irrelevant.  The idea is not to encode mangled C++ names.
The idea is to encode C++ identifiers *as part of a mangling spec*.

> Note that UTF-8 is one possible encoding for extended native
> characters, and should be treated like any such other encoding.

But since Unicode is *the* current international standard for
extended native characters, it is not unreasonable to give
special preference to UTF8, or at least make it the default.
(By the way:  The official current Japanese standard (JIS) for a
multi-national character set is Unicode.  I've seen the document.)
(That is not to say we should lock ourselves into 16-bit Unicode,
as it is likely that more standard "planes" will appear, but new
standards will have Unicode as a subset, and UTF-8 will work just
great with them.)
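
The point that UTF-8 keeps working beyond 16-bit Unicode can be illustrated
with a small encoder.  This is only a sketch: the function name utf8_encode
is made up for illustration, and the upper limit of U+10FFFF follows the
later RFC 3629 definition rather than anything stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one code point as UTF-8.  Returns the number of bytes written
   (1 to 4), or 0 if the value is a surrogate or above U+10FFFF.
   The name and the U+10FFFF limit are assumptions of this sketch. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                       /* plain ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* three bytes: the 16-bit range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* four bytes: planes above 16 bits */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

A character outside the 16-bit range simply takes one more byte; nothing
else about the encoding changes.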

	--Per Bothner
Cygnus Solutions     bothner@cygnus.com     http://www.cygnus.com/~bothner

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58           ` NO_DOLLAR_IN_LABEL Martin v. Loewis
@ 1999-01-31 23:58             ` Paul Eggert
  1999-01-31 23:58               ` NO_DOLLAR_IN_LABEL Per Bothner
  1999-01-31 23:58               ` NO_DOLLAR_IN_LABEL Martin v. Loewis
  1999-01-31 23:58             ` NO_DOLLAR_IN_LABEL Per Bothner
  1999-01-31 23:58             ` NO_DOLLAR_IN_LABEL Horst von Brand
  2 siblings, 2 replies; 31+ messages in thread
From: Paul Eggert @ 1999-01-31 23:58 UTC (permalink / raw)
  To: martin; +Cc: bothner, egcs, gcc2

   Date: Tue, 5 Jan 1999 00:49:09 +0100
   From: "Martin v. Loewis" <martin@mira.isdn.cs.tu-berlin.de>

   > For example, we can mangle:
   > 	extern "C" void x\u00c6(void);
   > as:
   > 	_Ux_00c6

This doesn't suffice for C and C++, which require support for ISO
10646 characters that don't fit into 16 bits.  The gxxint.texi
proposal needs to be extended to allow chars up to 32 bits.

I'm leery of the prefix `_U'.  On Solaris 7, for example, both libX11
and libXol contain symbols that start with `_U'.  It would be better
to use a less common prefix.  I suggested `__9V_' earlier, but `_9'
should do if we want something shorter.

Also, I suggest that escapes' hexadecimal digits be upper case only;
this makes it less likely for the mangled symbols to collide with
existing programs' symbols.  I know that we're in the reserved space,
but it doesn't hurt to upper-case and it might help.

   I think it is a good idea to stick with only the underscore as escape.

I assume you mean as contrasted with `.' (which has problems with some
assemblers).  This is reasonable, as it simplifies the spec.  However,
if our goal is to use short escapes, then there are better characters
than underscore.  In Solaris libc.so, for example, underscore is the
most popular character (4408 occurrences).  And mangled C++ names use
_ even more often than C names do.  We will be more efficient if we
pick an uncommon escape character, like `V' (only 3 occurrences).

   C identifiers that contain universal characters are mangled in a
   different way than C++ and Java identifiers.

This seems unnecessarily confusing to me.  Why not use the same
mangling convention for all languages?  People who write in C++ want
to interface to C routines; we'll simplify their job if we're
consistent.

   C identifiers are mangled by prefixing the whole identifier with _U,
   and replacing each occurrence of a universal character \uVWXY with
   _vwxy.

By ``universal character'' I assume that you mean ``a \u escape that
has no extended native character equivalent''.  Otherwise, the \u
escapes won't unify properly with their native equivalents.

   This still leaves the problem of extended native characters if they are
   not converted to Unicode or UTF-8.

Note that UTF-8 is one possible encoding for extended native
characters, and should be treated like any such other encoding.

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58         ` NO_DOLLAR_IN_LABEL Per Bothner
@ 1999-01-31 23:58           ` Martin v. Loewis
  1999-01-31 23:58             ` NO_DOLLAR_IN_LABEL Paul Eggert
                               ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Martin v. Loewis @ 1999-01-31 23:58 UTC (permalink / raw)
  To: bothner; +Cc: egcs, gcc2

> For example, we can mangle:
> 	extern "C" void x\u00c6(void);
> as:
> 	_Ux_00c6

In an attempt to formalize this proposal, I'd write:

C identifiers that contain universal characters are mangled in a
different way than C++ and Java identifiers. The mangling for the
latter is defined in gxxint.texi.

C identifiers are mangled by prefixing the whole identifier with _U,
and replacing each occurrence of a universal character \uVWXY with
_vwxy. If the underscore appears in such an identifier, it is
converted to _005f
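
A minimal sketch of the rule as formalized above, not an implementation
taken from GCC.  The function name mangle_c_ident and the representation
of the identifier as an array of code points are assumptions made for
illustration; values above 16 bits are rejected because the \uVWXY form
has only four digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stddef.h>

/* Mangle a C identifier, given as code points, into "_U..." form.
   ASCII letters and digits are copied; everything else, including '_',
   becomes '_' plus four lower-case hex digits.
   Returns 0 on success, -1 if the buffer is too small or a value does
   not fit in 16 bits.  (Hypothetical helper, not a GCC function.) */
int mangle_c_ident(const unsigned long *cps, size_t n, char *out, size_t cap)
{
    size_t len = 0;

    assert(cps != NULL && out != NULL);
    if (cap < 3)
        return -1;
    len += (size_t)snprintf(out, cap, "_U");          /* whole-identifier prefix */
    for (size_t i = 0; i < n; i++) {
        if (cps[i] > 0xFFFFUL || cap - len < 6)       /* "_vwxy" plus NUL */
            return -1;
        if (cps[i] < 0x80 && isalnum((int)cps[i]))
            len += (size_t)snprintf(out + len, cap - len, "%c", (int)cps[i]);
        else
            len += (size_t)snprintf(out + len, cap - len, "_%04lx", cps[i]);
    }
    return 0;
}
```

With the code points 'x' and 0xC6 this yields _Ux_00c6, matching the
example quoted at the top of this message.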

Is this what you've meant?

This still leaves the problem of extended native characters if they are
not converted to Unicode or UTF-8. I think it is a good idea to stick
with only the underscore as escape. Since the lower case letters and
digits are used for UCNs, I use a base 16 encoding, with the following
digits:

ABCDEFGHIJKLMNOP

So if an identifier is "ESC h ( a", the mangled name is

_U_BLh_CIa
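
The base-16 scheme with the digits A..P might be sketched as follows.
The name mangle_native_bytes is invented for this example, and the choice
to escape every byte that is not an ASCII letter or digit (so '_' itself
is escaped too) is an assumption; the message above does not say how a
literal underscore is handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Mangle an identifier in some native encoding, byte by byte.
   ASCII letters and digits are copied; any other byte becomes '_'
   followed by its two nibbles written with the digits A..P.
   Returns 0 on success, -1 if the buffer is too small.
   (Hypothetical helper; the treatment of '_' is an assumption.) */
int mangle_native_bytes(const unsigned char *s, size_t n, char *out, size_t cap)
{
    static const char digits[] = "ABCDEFGHIJKLMNOP";
    size_t len = 0;

    assert(s != NULL && out != NULL);
    if (cap < 3)
        return -1;
    out[len++] = '_';
    out[len++] = 'U';
    for (size_t i = 0; i < n; i++) {
        if (cap - len < 4)                 /* "_XY" plus NUL */
            return -1;
        if (s[i] < 0x80 && isalnum(s[i])) {
            out[len++] = (char)s[i];
        } else {
            out[len++] = '_';
            out[len++] = digits[s[i] >> 4];     /* high nibble */
            out[len++] = digits[s[i] & 0x0F];   /* low nibble */
        }
    }
    out[len] = '\0';
    return 0;
}
```

For the bytes 0x1B 'h' 0x28 'a' (ESC h ( a) this gives _U_BLh_CIa, since
0x1B is B,L and 0x28 is C,I in the A..P digits.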

With such an encoding, we don't have to take any precautions towards
the C++ mangling, or towards assembler capabilities. The only
restriction (for libraries) is that we reserve all names starting with
_U.

Regards,
Martin

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58 ` NO_DOLLAR_IN_LABEL Jeffrey A Law
@ 1999-01-31 23:58   ` Martin v. Loewis
  1999-01-31 23:58     ` NO_DOLLAR_IN_LABEL Jeffrey A Law
  0 siblings, 1 reply; 31+ messages in thread
From: Martin v. Loewis @ 1999-01-31 23:58 UTC (permalink / raw)
  To: law; +Cc: egcs, gcc2

> I believe NO_DOLLAR_IN_LABEL is supposed to mean  "the assembler does not
> support $ in label".

So would you consider it a bug then when this is defined for Linux and
Solaris, even though their assemblers support $?

> For example, if we need a macro to indicate that mangling should not
> use '$' independent of NO_DOLLAR_IN_LABEL we could use
> NO_DOLLAR_IN_MANGLED_NAMES or something like that.

I have no problems with that.

Martin

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58                 ` NO_DOLLAR_IN_LABEL Per Bothner
  1999-01-31 23:58                   ` NO_DOLLAR_IN_LABEL Martin v. Loewis
  1999-01-31 23:58                   ` NO_DOLLAR_IN_LABEL Paul Eggert
@ 1999-01-31 23:58                   ` Paul Eggert
  1999-01-31 23:58                     ` NO_DOLLAR_IN_LABEL Joe Buck
  1999-01-31 23:58                     ` NO_DOLLAR_IN_LABEL Marc Espie
  2 siblings, 2 replies; 31+ messages in thread
From: Paul Eggert @ 1999-01-31 23:58 UTC (permalink / raw)
  To: bothner; +Cc: egcs, gcc2

Here's a dumb question: if gas and gld can handle arbitrary strings as
names, why not skip name mangling and assembling entirely?  Wouldn't
this be cleanest?

Obviously we would still have to mangle/assemble on platforms that
only allow A-Za-z0-9_ (I'll write more about this later).  And even
for GNU/Linux, we'd need backwards-compatibility options.  But with
the right compatibility option or two, it seems to me that the
transition to straight-through regime would be doable.  E.g. gcc could
optionally generate both old (mangled) and new (unmangled) format
external definition symbols, so that a shared object would link with
both old- and new-format executables.  This would work because the two
name formats can't possibly collide.

I vaguely recall your mentioning the need for a transition to a
new-format mangling scheme for C++ anyway.....

Obviously we'd also need to translate to UTF-8 for Java (and
optionally for other languages), but other than this, it seems to me
that we don't have to mangle if we don't want to.

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58             ` NO_DOLLAR_IN_LABEL Per Bothner
@ 1999-01-31 23:58               ` Paul Eggert
  1999-01-31 23:58                 ` NO_DOLLAR_IN_LABEL Per Bothner
  0 siblings, 1 reply; 31+ messages in thread
From: Paul Eggert @ 1999-01-31 23:58 UTC (permalink / raw)
  To: bothner; +Cc: martin, egcs, gcc2

   Date: Tue, 05 Jan 1999 12:18:55 -0800
   From: Per Bothner <bothner@cygnus.com>

   A non-plain character CH is written as an initial underscore,
   followed by the uppercased hexadecimal expansion of the
   character's numeric value, with initial zeroes removed.

Isn't this ambiguous?  E.g. both \u0123A (2 characters) and \u123A
(1 character) would be written "_123A".
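
The collision can be reproduced with a literal reading of the quoted
rule.  This is a sketch only; mangle_varhex is a made-up name, and
copying ASCII letters and digits unchanged is assumed to be what
"plain" means.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stddef.h>

/* Quoted rule, read literally: a non-plain character is written as '_'
   followed by its value in upper-case hex with leading zeroes removed.
   Returns 0 on success, -1 if the buffer is too small.
   (Hypothetical helper used only to show the ambiguity.) */
int mangle_varhex(const unsigned long *cps, size_t n, char *out, size_t cap)
{
    size_t len = 0;

    assert(cps != NULL && out != NULL && cap > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w;
        if (cps[i] < 0x80 && isalnum((int)cps[i]))
            w = snprintf(out + len, cap - len, "%c", (int)cps[i]);
        else
            w = snprintf(out + len, cap - len, "_%lX", cps[i]);
        if (w < 0 || (size_t)w >= cap - len)
            return -1;
        len += (size_t)w;
    }
    return 0;
}
```

Feeding it the two code points 0x0123, 'A' and the single code point
0x123A produces the same string both times, so a demangler cannot tell
the two identifiers apart.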

Does the phrase ``the character's numeric value'' assume that native
extended characters are translated to Unicode before mangling begins?
If so, I don't see how to satisfy RMS's desire to support platforms
where names are not translated to Unicode.  And if not, then I don't
see how to distinguish native extended character values from UCN
values when demangling.

Also, does that same phrase assume that UCNs are translated to native
characters before mangling begins?  If so, then I don't know how to
interpret the phrase ``the character's numeric value'' for UCNs; is
this the UCN's corresponding Unicode value, or is it the translated
native character value (which would raise the previous paragraph's
question again)?  And if not, then I don't see how to unify
e.g. \u00B5 with a native MICRO SIGN character.
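
What unification means here can be sketched under one loud assumption:
that the native character set is ISO 8859-1, so a native byte maps to the
code point of the same value.  The function to_code_points is
hypothetical; a real implementation would need per-charset translation
tables.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Translate an identifier spelling into code points.  A backslash, 'u'
   and four hex digits is read as a UCN; every other byte is taken as an
   ISO 8859-1 character (an assumption of this sketch).
   Returns the number of code points stored (at most cap). */
size_t to_code_points(const char *src, unsigned long *out, size_t cap)
{
    size_t n = 0;

    assert(src != NULL && out != NULL);
    while (*src != '\0' && n < cap) {
        if (src[0] == '\\' && src[1] == 'u'
            && isxdigit((unsigned char)src[2])
            && isxdigit((unsigned char)src[3])
            && isxdigit((unsigned char)src[4])
            && isxdigit((unsigned char)src[5])) {
            char hex[5];
            memcpy(hex, src + 2, 4);       /* the four hex digits */
            hex[4] = '\0';
            out[n++] = strtoul(hex, NULL, 16);
            src += 6;
        } else {
            out[n++] = (unsigned char)*src++;   /* native byte, Latin-1 */
        }
    }
    return n;
}
```

Both the spelling with the UCN for U+00B5 and the spelling with the
native byte 0xB5 come out as the same code point sequence, which is the
property a mangling scheme needs if the two are to link together.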

If native extended character values are involved, the ``numeric
value'' mentioned is of type wchar_t, right?  Otherwise it couldn't
represent the native charset.

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58       ` NO_DOLLAR_IN_LABEL Martin v. Loewis
@ 1999-01-31 23:58         ` Per Bothner
  1999-01-31 23:58           ` NO_DOLLAR_IN_LABEL Martin v. Loewis
  0 siblings, 1 reply; 31+ messages in thread
From: Per Bothner @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Martin v. Loewis; +Cc: egcs, gcc2

> For C++ and Java, it is perfect. It fails for C.

As I said, it was just an idea for something compatible with
the existing C++/Java mangling.  G++ is moving away from using
target-dependent magic mangling characters ('$' or '.') towards
using a single mangling scheme, using '_'.

For example, we can mangle:
	extern "C" void x\u00c6(void);
as:
	_Ux_00c6

This name is reserved to the implementation in both C and C++.

	--Per Bothner
Cygnus Solutions     bothner@cygnus.com     http://www.cygnus.com/~bothner

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58                     ` NO_DOLLAR_IN_LABEL Per Bothner
@ 1999-01-31 23:58                       ` Paul Eggert
  1999-01-31 23:58                         ` NO_DOLLAR_IN_LABEL John A. Tamplin
  0 siblings, 1 reply; 31+ messages in thread
From: Paul Eggert @ 1999-01-31 23:58 UTC (permalink / raw)
  To: bothner; +Cc: egcs, gcc2

   Date: Wed, 06 Jan 1999 14:08:53 -0800
   From: Per Bothner <bothner@cygnus.com>

   Maybe --no-unify-names vs --unify-names should be a configure flag,
   rather than a compile flag. One reason is that they change the ABI
   incompatibly, but a bigger reason is that -funify-names may require
   translation tables and library support, and you may need to decide
   at configure time whether and where to load those from.

Sounds reasonable.  I guess that I had been assuming that the
translation library support would be autoconfed, but perhaps some
installers will want to turn it off even if available.

It might also be helpful to have both kinds of flag -- e.g. the
configure flag specifies whether translation libraries are linked in,
and the compile flag specifies whether they are used (if available).

   If -fno-unify-names is in effect, and the native character set is
   not UTF-8, then it seems to me that UCNs don't make much sense,

Perhaps; but frankly I don't see much sense to UCNs even in UTF-8
environments.  I think UCNs are a bit like trigraphs -- they are a
committee invention, and they may look like a good idea if you've
never used them, but that's about the best that one can say for them.
At least UCNs don't have trigraphs' backward-compatibility problems.

I think most people will prefer using native characters, either UTF-8
or some other encoding.

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58             ` NO_DOLLAR_IN_LABEL Horst von Brand
@ 1999-01-31 23:58               ` Martin v. Loewis
  0 siblings, 0 replies; 31+ messages in thread
From: Martin v. Loewis @ 1999-01-31 23:58 UTC (permalink / raw)
  To: vonbrand; +Cc: bothner, egcs, gcc2

> One of the virtues of C++ is that it is defined to be (almost) source-level
> compatible with C, and link-compatible (at least for 'extern "C" ...')

... and I don't plan to change that.

Martin

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58     ` NO_DOLLAR_IN_LABEL Jeffrey A Law
@ 1999-01-31 23:58       ` Martin v. Loewis
  0 siblings, 0 replies; 31+ messages in thread
From: Martin v. Loewis @ 1999-01-31 23:58 UTC (permalink / raw)
  To: law; +Cc: egcs, gcc2

> If they're getting that definition from svr4.h or some other commonized file
> I probably wouldn't consider it a bug.

This is indeed the case. Is it cast in stone that we never emit '$' on
svr4-like platforms? What is the rationale?

Martin

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58 ` NO_DOLLAR_IN_LABEL Per Bothner
@ 1999-01-31 23:58   ` Martin v. Loewis
  1999-01-31 23:58     ` NO_DOLLAR_IN_LABEL Per Bothner
  0 siblings, 1 reply; 31+ messages in thread
From: Martin v. Loewis @ 1999-01-31 23:58 UTC (permalink / raw)
  To: bothner; +Cc: egcs, gcc2

> Why?  If NO_DOLLAR_IN_LABEL is not set, we use '$' as the magic
> character; if it is set, we use '.', unless NO_DOT_IN_LABEL
> is also set.

Because the current value of NO_DOLLAR_IN_LABEL is meaningless.
It says 'we don't want g++ to use DOLLAR', whereas it looks as if
it says 'our assembler does not support DOLLAR'.

> (b) we stick with the existing assumption of a single magic character.

There is no working proposal for such an encoding. Please propose one.

> There is no purpose in designing an encoding that uses *two* magic
> characters.  It will not work on many existing assemblers, and if
> we are going to assume more than a minimal assembler, we might as
> well assume gas, and do whatever is cleanest (which I think is UTF-8).

All assemblers I've tried (gas and Solaris 2.5.1 /usr/ccs/bin/as)
define NO_DOLLAR_IN_LABEL, yet support it. Which major platforms don't
support it?

Regards,
Martin

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58   ` NO_DOLLAR_IN_LABEL Martin v. Loewis
@ 1999-01-31 23:58     ` Jeffrey A Law
  1999-01-31 23:58       ` NO_DOLLAR_IN_LABEL Martin v. Loewis
  0 siblings, 1 reply; 31+ messages in thread
From: Jeffrey A Law @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Martin v. Loewis; +Cc: egcs, gcc2

  In message < 199901030058.BAA00581@mira.isdn.cs.tu-berlin.de >you write:
  > > I believe NO_DOLLAR_IN_LABEL is supposed to mean  "the assembler does not
  > > support $ in label".
  > 
  > So would you consider it a bug then when this is defined for Linux and
  > Solaris, even though their assemblers support $?
Possibly.  Depends on the circumstances.

If they're getting that definition from svr4.h or some other commonized file
I probably wouldn't consider it a bug.

Also remember that we support different assemblers on many targets; it may
have been the case that some version of the assembler would not accept '$', or
was buggy, etc.

However, changing it now is the wrong thing to do since that's an ABI breakage.

jeff

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58   ` NO_DOLLAR_IN_LABEL Martin v. Loewis
@ 1999-01-31 23:58     ` Per Bothner
  1999-01-31 23:58       ` NO_DOLLAR_IN_LABEL David Edelsohn
  1999-01-31 23:58       ` NO_DOLLAR_IN_LABEL Martin v. Loewis
  0 siblings, 2 replies; 31+ messages in thread
From: Per Bothner @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Martin v. Loewis; +Cc: egcs, gcc2

> Because the current value of NO_DOLLAR_IN_LABEL is meaningless.
> It says 'we don't want g++ to use DOLLAR', whereas it looks as if
> it says 'our assembler does not support DOLLAR'.

Well, I see no reason why we need to distinguish these situations,
as long as we anyway have to support assemblers that don't *allow* '$'.

> There is no working proposal for such an encoding. Please propose one.

I have proposed such an encoding.  See gcc/cp/gxxint.texi.
(That is not to say that I will make any claims that that is
a particularly good encoding.)

> All assemblers I've tried (gas and Solaris 2.5.1 /usr/ccs/bin/as) 
> define NO_DOLLAR_IN_LABEL, yet support it. Which major platforms don't 
> support it?

But "major platforms" is not the issue.  Most or all major platforms
support gas.  The issue is what we are doing for minor platforms.
Personally, I don't really care if we start requiring an assembler
that accepts '$'.  Some people might care, though.

	--Per Bothner
Cygnus Solutions     bothner@cygnus.com     http://www.cygnus.com/~bothner

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58             ` NO_DOLLAR_IN_LABEL Paul Eggert
  1999-01-31 23:58               ` NO_DOLLAR_IN_LABEL Per Bothner
@ 1999-01-31 23:58               ` Martin v. Loewis
  1 sibling, 0 replies; 31+ messages in thread
From: Martin v. Loewis @ 1999-01-31 23:58 UTC (permalink / raw)
  To: eggert; +Cc: bothner, egcs, gcc2

> I suggested `__9V_' earlier, but `_9' should do if we want something
> shorter.

__9V will be interpreted as a constructor name of a class with 9 letters.
_9 is not a reserved name.

>    C identifiers that contain universal characters are mangled in a
>    different way than C++ and Java identifiers.
> 
> This seems unnecessarily confusing to me.  Why not use the same
> mangling convention for all languages?

Maybe the wording was confusing. What I meant is that C++ and Java
were covered by gxxint.texi. The manglings will look very similar.
The only difference is the prefixing of the entire identifier.

For example

  foo(_tmp,\u1234);

becomes foo__F4_tmpU5_1234 under the gxxint.texi mangling, and 'foo'
if it is extern "C" (since parameter names are not mangled in C).
With a common mangling, we would either get

_Ufoo_005f_005fF4_005ftmp1_1234

or

_Ufoo_005f_005fF8_005ftmp5_1234

depending on whether the lengths of the class names have to take the
escaped names into account or not. 
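
The two ways of counting can be made concrete with a sketch of one
length-prefixed component.  The name mangle_component and the flag
count_escaped are invented for illustration, and the sketch escapes '_'
as _005f (the code for U+005F), following the formalization earlier in
the thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stddef.h>

/* Emit one sub-identifier as "<length><escaped text>".
   If count_escaped is zero the length counts source characters;
   otherwise it counts characters after escaping.
   Returns 0 on success, -1 on overflow or a value above 16 bits.
   (Hypothetical helper, not the gxxint.texi algorithm itself.) */
int mangle_component(const unsigned long *cps, size_t n, int count_escaped,
                     char *out, size_t cap)
{
    char body[128];
    size_t len = 0;
    int w;

    assert(cps != NULL && out != NULL);
    body[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (cps[i] > 0xFFFFUL || sizeof body - len < 6)
            return -1;
        if (cps[i] < 0x80 && isalnum((int)cps[i]))
            len += (size_t)snprintf(body + len, sizeof body - len,
                                    "%c", (int)cps[i]);
        else
            len += (size_t)snprintf(body + len, sizeof body - len,
                                    "_%04lx", cps[i]);
    }
    w = snprintf(out, cap, "%lu%s",
                 (unsigned long)(count_escaped ? len : n), body);
    return (w < 0 || (size_t)w >= cap) ? -1 : 0;
}
```

For the parameter name _tmp the component is either 4_005ftmp or
8_005ftmp, and for \u1234 either 1_1234 or 5_1234; a demangler has to
know which convention was used before it can split the string.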

The issue is simply that a C++ mangled identifier is a combination of
individual identifiers, which are mangled separately. A C identifier
is a single piece, so we don't have problems with sub-identifiers
and indicating their length.

> People who write in C++ want to interface to C routines; we'll
> simplify their job if we're consistent.

And people can do that. If they define a function with extern "C"
linkage, they will, of course, get the same mangling as if they write
the same identifier in a C program.

> By ``universal character'' I assume that you mean ``a \u escape that
> has no extended native character equivalent''.  Otherwise, the \u
> escapes won't unify properly with their native equivalents.

Certainly - if unification is performed.

Regards,
Martin

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58                 ` NO_DOLLAR_IN_LABEL Per Bothner
  1999-01-31 23:58                   ` NO_DOLLAR_IN_LABEL Martin v. Loewis
@ 1999-01-31 23:58                   ` Paul Eggert
  1999-01-31 23:58                     ` NO_DOLLAR_IN_LABEL Per Bothner
  1999-01-31 23:58                   ` NO_DOLLAR_IN_LABEL Paul Eggert
  2 siblings, 1 reply; 31+ messages in thread
From: Paul Eggert @ 1999-01-31 23:58 UTC (permalink / raw)
  To: bothner; +Cc: egcs, gcc2

   Date: Wed, 06 Jan 1999 00:05:46 -0800
   From: Per Bothner <bothner@cygnus.com>

   > Does the phrase ``the character's numeric value'' assume that native
   > extended characters are translated to Unicode before mangling begins?

   For Java, they have to be.  For C and C++, I think that is the
   cleanest solution.

I tend to agree, and that's why my proposal has a -funify-names option.

RMS prefers that names be left as-is by default, though, so the
proposal also has a -fno-unify-names option.  I think the default
value of this option will depend on the platform, and that
Java-oriented platforms will prefer -funify-names.

   This is what is going to happen for Java, so we will have to
   be able to translate external encodings into UTF8 on input, and vice
   versa on output (in gdb and in ld/as error messages).  So there is
   no simplification in leaving external characters untranslated

These are good points, but I think your conclusion is too strong.  I
can think of at least one simplification: with -fno-unify-names, a
non-Java GCC installation won't need translation tables to and from
UTF-8.  This is a simplification, since these translation tables are a
maintenance hassle: they are not available on all platforms, and I
doubt whether it's wise to include the tables in GCC itself.  (Even
maintaining Ebcdic-to-Ascii tables is dicey, but just try maintaining
tables translating private JIS extensions to UTF-8!)

In other words, if -fno-unify-names is in effect, the non-Java part of
GCC will be easier to port to non-UTF-8 environments; this is a plus.

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58                   ` NO_DOLLAR_IN_LABEL Paul Eggert
  1999-01-31 23:58                     ` NO_DOLLAR_IN_LABEL Joe Buck
@ 1999-01-31 23:58                     ` Marc Espie
  1 sibling, 0 replies; 31+ messages in thread
From: Marc Espie @ 1999-01-31 23:58 UTC (permalink / raw)
  To: eggert; +Cc: egcs

In article < 199901090224.SAA05659@shade.twinsun.com > you write:
>Here's a dumb question: if gas and gld can handle arbitrary strings as
>names, why not skip name mangling and assembling entirely?  Wouldn't
>this be cleanest?

>Obviously we would still have to mangle/assemble on platforms that
>only allow A-Za-z0-9_ (I'll write more about this later).  And even
>for GNU/Linux, we'd need backwards-compatibility options.  But with
>the right compatibility option or two, it seems to me that the
>transition to straight-through regime would be doable.  

The main problem I see for such a scheme is BUGS. 
Some people are working on platforms for which gas/gld does not work, or
is not an option for various reasons. 

If you start foregoing name mangling entirely, there's bound to be a 
time when name-mangled platforms will lose: it's not easy to notice that
you forgot to add name-mangling in an obscure corner if your platform 
actually does not require it.

As a related example, I'm currently wading thru the internals of gas/gld.
Yucky. This program is a pile of assorted junk put together. There is no
obvious way to fix it: what happened is that all the distinct flavors 
diverged somewhat. The end result is that this may be a portable assembler,
but you had better be using a very mainstream platform if you want it to 
work perfectly. Or you will have to fix things yourself. There are oodles of
#if defined(HAVE_ELF) or if (bfd_format == elf_flavor) throughout. From what
I've seen, some of them are not elf-related at all, but rather bug-fixes
that nobody bothered to integrate to the main gas---which is hard, mind you,
as the bug-fix is probably slightly different for a.out, and for coff.

This is the way I see things going for egcs if name-mangling is allowed to
be turned off...

* NO_DOLLAR_IN_LABEL
@ 1999-01-31 23:58 Martin v. Loewis
  1999-01-31 23:58 ` NO_DOLLAR_IN_LABEL Per Bothner
                   ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Martin v. Loewis @ 1999-01-31 23:58 UTC (permalink / raw)
  To: egcs, gcc2

It seems that there is a serious misconfiguration in gcc with regard to
usage of the dollar sign in assembler names. Many platforms define
NO_DOLLAR_IN_LABEL with a comment

/* Use periods rather than dollar signs in special g++ assembler names.  */

or

/* We define this to prevent the name mangler from putting dollar signs into
   function names.  */

This is not really the semantics of NO_DOLLAR_IN_LABEL - it really
should mean

>> Define this macro if the assembler does not accept the character
>> @samp{$} in label names.
                            [tm.texi]

Now, it is proposed that the dollar sign is used to encode universal
character names and 'national' characters if g++ uses '.', and vice
versa, with a fall-back when either dollars or dots are not available.

This is a good proposal, but for it to work, we have to really
indicate what the assembler supports.

I propose that we add an additional macro to tm.texi,
CXX_NO_DOLLAR_IN_LABEL, which is set in all config files which
currently use NO_DOLLAR_IN_LABEL even if the assemblers do support
dollar signs.

If there are no objections, I'll produce a patch in that direction.

Regards,
Martin

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58                 ` NO_DOLLAR_IN_LABEL Per Bothner
@ 1999-01-31 23:58                   ` Martin v. Loewis
  1999-01-31 23:58                   ` NO_DOLLAR_IN_LABEL Paul Eggert
  1999-01-31 23:58                   ` NO_DOLLAR_IN_LABEL Paul Eggert
  2 siblings, 0 replies; 31+ messages in thread
From: Martin v. Loewis @ 1999-01-31 23:58 UTC (permalink / raw)
  To: bothner; +Cc: eggert, egcs, gcc2

> [I keep seeing the abbreviation UCN, but I must have not read the message
> where the abbreviation was expanded.  Could someone do that?]

Universal-character-name. This is what the C and C++ standards call
the \uxxxx and \UXXXXXXXX constructs.
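
As a small illustration of the two spellings, the following sketch
writes a code point as a UCN, choosing the four-digit form when the value
fits in 16 bits and the eight-digit form otherwise.  The name format_ucn
is made up, and the sketch does not check the standards' restrictions on
which code points may be written as UCNs.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Write cp as a universal-character-name: \uXXXX if it fits in 16 bits,
   \UXXXXXXXX otherwise.  Returns 0 on success, -1 if the buffer is too
   small.  (Hypothetical helper; no validity checks on cp.) */
int format_ucn(unsigned long cp, char *out, size_t cap)
{
    int w;

    assert(out != NULL);
    if (cp <= 0xFFFFUL)
        w = snprintf(out, cap, "\\u%04lX", cp);
    else
        w = snprintf(out, cap, "\\U%08lX", cp);
    return (w < 0 || (size_t)w >= cap) ? -1 : 0;
}
```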

Martin

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58     ` NO_DOLLAR_IN_LABEL Per Bothner
@ 1999-01-31 23:58       ` David Edelsohn
  1999-01-31 23:58       ` NO_DOLLAR_IN_LABEL Martin v. Loewis
  1 sibling, 0 replies; 31+ messages in thread
From: David Edelsohn @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Per Bothner; +Cc: Martin v. Loewis, egcs, gcc2

	I think that the AIX assembler might not grok '$'.  It does not handle
it in identifiers, but I am not sure about labels.

David

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58 NO_DOLLAR_IN_LABEL Martin v. Loewis
  1999-01-31 23:58 ` NO_DOLLAR_IN_LABEL Per Bothner
  1999-01-31 23:58 ` NO_DOLLAR_IN_LABEL Per Bothner
@ 1999-01-31 23:58 ` Jeffrey A Law
  1999-01-31 23:58   ` NO_DOLLAR_IN_LABEL Martin v. Loewis
  2 siblings, 1 reply; 31+ messages in thread
From: Jeffrey A Law @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Martin v. Loewis; +Cc: egcs, gcc2

  In message < 199901021027.LAA01251@mira.isdn.cs.tu-berlin.de >you write:
  > I propose that we add an additional macro to tm.texi,
  > CXX_NO_DOLLAR_IN_LABEL, which is set in all config files which
  > currently use NO_DOLLAR_IN_LABEL even if the assemblers do support
  > dollar signs.
I believe NO_DOLLAR_IN_LABEL is supposed to mean  "the assembler does not
support $ in label".  I'm pretty sure the comments are wrong/misleading.
Note the following from tm.texi:

@findex NO_DOLLAR_IN_LABEL
@item NO_DOLLAR_IN_LABEL
Define this macro if the assembler does not accept the character
@samp{$} in label names.  By default constructors and destructors in
G++ have @samp{$} in the identifiers.  If this macro is defined,
@samp{.} is used instead.

@findex NO_DOT_IN_LABEL
@item NO_DOT_IN_LABEL
Define this macro if the assembler does not accept the character
@samp{.} in label names.  By default constructors and destructors in G++
have names that use @samp{.}.  If this macro is defined, these names
are rewritten to avoid @samp{.}.

If we end up needing an additional patch with an additional macro, it needs
to be language independent.  In general, putting language-specific stuff in
the config files is frowned upon.  For example, if we need a macro to indicate
that mangling should not use '$' independent of NO_DOLLAR_IN_LABEL we could
use NO_DOLLAR_IN_MANGLED_NAMES or something like that.

jeff

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58     ` NO_DOLLAR_IN_LABEL Per Bothner
  1999-01-31 23:58       ` NO_DOLLAR_IN_LABEL David Edelsohn
@ 1999-01-31 23:58       ` Martin v. Loewis
  1999-01-31 23:58         ` NO_DOLLAR_IN_LABEL Per Bothner
  1 sibling, 1 reply; 31+ messages in thread
From: Martin v. Loewis @ 1999-01-31 23:58 UTC (permalink / raw)
  To: bothner; +Cc: egcs, gcc2

> I have proposed such an encoding.  See gcc/cp/gxxint.texi.
> (That is not to say that I will make any claims that that is
> a particularly good encoding.)

For C++ and Java, it is perfect. It fails for C. Consider

extern "C" void x\u00c6(void);

Using your encoding, we get 'x_u00C6U' (or perhaps 'U6x_00C6',
depending on the interpretation). Both are valid, non-reserved
identifiers on their own.

I find it desirable to support C linkage to such names for C++, with
no hidden restrictions or potential clashes.

> But "major platforms" is not the issue.  Most or all major platforms
> support gas.

Even for gas, we have the problem of non-Unicode non-ASCII
identifiers. Under Paul's proposal, we have to support these unless
a) A conversion to Unicode is available in the C library and
b) the user gave the -funify-names option or

Regards,
Martin

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58 NO_DOLLAR_IN_LABEL Martin v. Loewis
@ 1999-01-31 23:58 ` Per Bothner
  1999-01-31 23:58 ` NO_DOLLAR_IN_LABEL Per Bothner
  1999-01-31 23:58 ` NO_DOLLAR_IN_LABEL Jeffrey A Law
  2 siblings, 0 replies; 31+ messages in thread
From: Per Bothner @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Martin v. Loewis; +Cc: egcs, gcc2

> I propose that we add an additional macro to tm.texi,
> CXX_NO_DOLLAR_IN_LABEL, which is set in all config files which
> currently use NO_DOLLAR_IN_LABEL even if the assemblers do support
> dollar signs.
>
> If there are no objections, I'll produce a patch in that direction.

I object.  There is no need for such a patch.

	--Per Bothner
Cygnus Solutions     bothner@cygnus.com     http://www.cygnus.com/~bothner

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58 NO_DOLLAR_IN_LABEL Martin v. Loewis
  1999-01-31 23:58 ` NO_DOLLAR_IN_LABEL Per Bothner
@ 1999-01-31 23:58 ` Per Bothner
  1999-01-31 23:58   ` NO_DOLLAR_IN_LABEL Martin v. Loewis
  1999-01-31 23:58 ` NO_DOLLAR_IN_LABEL Jeffrey A Law
  2 siblings, 1 reply; 31+ messages in thread
From: Per Bothner @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Martin v. Loewis; +Cc: egcs, gcc2

> Now, it is proposed that the dollar sign is used to encode universal
> character names and 'national' characters if g++ uses '.', and vice
> versa, with a fall-back when either dollars or dots are not available.
>
> This is a good proposal, but for it to work, we have to really
> indicate what the assembler supports.

Why?  If NO_DOLLAR_IN_LABEL is not set, we use '$' as the magic
character; if it is set, we use '.', unless NO_DOT_IN_LABEL
is also set.

No need for a new macro.
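
The selection rule can be sketched in C as follows.  This is only an
illustration: in GCC, NO_DOLLAR_IN_LABEL and NO_DOT_IN_LABEL are target
macros tested at build time, so the function name and its run-time flag
parameters are assumptions made to keep the example testable.  The '_'
fallback is the "in a pinch" case.

```c
/* Hypothetical helper: pick the single "magic" character used in
   mangled names.  The two parameters stand in for the target macros
   NO_DOLLAR_IN_LABEL and NO_DOT_IN_LABEL (nonzero means "defined"). */
static char mangling_magic_char(int no_dollar_in_label, int no_dot_in_label)
{
    if (!no_dollar_in_label)
        return '$';   /* assembler accepts '$' in labels */
    if (!no_dot_in_label)
        return '.';   /* no '$', but '.' is accepted */
    return '_';       /* neither is available */
}
```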

C++ has managed with one "magic" character:  Either '$' or '.'
or (in a pinch) '_'.  For encoding other character sets, we
either:
(a) assume the assembler supports UTF-8 and/or arbitrary features, or
(b) we stick with the existing assumption of a single magic character.

There is no purpose in designing an encoding that uses *two* magic
characters.  It will not work on many existing assemblers, and if
we are going to assume more than a minimal assembler, we might as
well assume gas, and do whatever is cleanest (which I think is UTF-8).

	--Per Bothner
Cygnus Solutions     bothner@cygnus.com     http://www.cygnus.com/~bothner

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58               ` NO_DOLLAR_IN_LABEL Paul Eggert
@ 1999-01-31 23:58                 ` Per Bothner
  1999-01-31 23:58                   ` NO_DOLLAR_IN_LABEL Martin v. Loewis
                                     ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Per Bothner @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Paul Eggert; +Cc: egcs, gcc2

> Isn't this ambiguous?  E.g. both \u0123A (2 characters) and \u123A
> (1 character) would be written "_123A".

Yes, I was too quick.  I tried to add support for character
values needing more than 16 bits.  Basically, you need to choose
between a fixed-length encoding (e.g. 4 hex digits) and a
terminating character (e.g. the hex encoding has to be
followed by '_').  One possible modification is to add a
'_' after the hex digits, but only if the next character
is a valid hex digit.
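
A minimal sketch of that modification, in C.  The function names are
hypothetical, characters below 128 are simply copied through (the
underscore rules are ignored here), and both upper- and lower-case
a-f are treated as hex digits, which is an assumption on my part.

```c
#include <stdio.h>
#include <string.h>

/* Assumption: 0-9, A-F and a-f all count as "a valid hex digit". */
static int is_hex_digit(unsigned long ch)
{
    return (ch >= '0' && ch <= '9') || (ch >= 'A' && ch <= 'F')
        || (ch >= 'a' && ch <= 'f');
}

/* Escape n code points from src into out (capacity cap).  A character
   >= 128 becomes "_" plus upper-case hex; a closing "_" is added only
   when the next source character is itself a hex digit.
   Returns 0 on success, -1 if out is too small. */
static int escape_with_terminator(const unsigned long *src, size_t n,
                                  char *out, size_t cap)
{
    size_t len = 0, i;

    if (cap == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        char piece[40];
        size_t plen;

        if (src[i] < 128) {
            piece[0] = (char) src[i];
            piece[1] = '\0';
        } else {
            int need_term = i + 1 < n && is_hex_digit(src[i + 1]);
            sprintf(piece, "_%lX%s", src[i], need_term ? "_" : "");
        }
        plen = strlen(piece);
        if (len + plen + 1 > cap)
            return -1;
        memcpy(out + len, piece, plen + 1);
        len += plen;
    }
    return 0;
}
```

With this rule, \u0123 followed by 'A' comes out as "_123_A", while
the single character \u123A stays "_123A", so the two no longer collide.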

> Does the phrase ``the character's numeric value'' assume that native
> extended characters are translated to Unicode before mangling begins?

For Java, they have to be.  For C and C++, I think that is the
cleanest solution.

> If so, I don't see how to satisfy RMS's desire to support platforms
> where names are not translated to Unicode.

I am not convinced this is useful or practical.  I still think
the cleanest solution is to translate all external characters to
UTF8 strings in IDENTIFIER_NODEs in gcc, in names in gdb symbol
tables, and also use UTF8 in symbols in .o files.  The latter is
complicated by the use of old assemblers, hence the need for
mangling.  But abstractly, we should be using UTF8 (not necessarily
Unicode).

This is what is going to happen for Java, so we will have to
be able to translate external encodings into UTF8 on input, and vice
versa on output (in gdb and in ld/as error messages).  So there is
no simplification in leaving external characters untranslated, only
extra useless complication.  (Assuming you consider Java support
important.)

> And if not, then I don't see how to distinguish native extended character
> values from UCN values when demangling.

[I keep seeing the abbreviation UCN, but I must have not read the message
where the abbreviation was expanded.  Could someone do that?]

	--Per Bothner
Cygnus Solutions     bothner@cygnus.com     http://www.cygnus.com/~bothner

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58                   ` NO_DOLLAR_IN_LABEL Paul Eggert
@ 1999-01-31 23:58                     ` Per Bothner
  1999-01-31 23:58                       ` NO_DOLLAR_IN_LABEL Paul Eggert
  0 siblings, 1 reply; 31+ messages in thread
From: Per Bothner @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Paul Eggert; +Cc: egcs, gcc2

> These are good points, but I think your conclusion is too strong.  I
> can think of at least one simplification: with -fno-unify-names, a
> non-Java GCC installation won't need translation tables to and from
> UTF-8.

I'll buy that.

One further point, though:  If -fno-unify-names is in effect,
and the native character set is not UTF-8, then it seems to me
that UCNs don't make much sense, so it doesn't really matter
what we do with them.

Another point:  Maybe --no-unify-names vs --unify-names should
be a configure flag, rather than a compile flag. One reason is
that they change the ABI incompatibly, but a bigger reason is
that -funify-names may require translation tables and library
support, and you may need to decide at configure time whether and
where to load those from.

	--Per Bothner
Cygnus Solutions     bothner@cygnus.com     http://www.cygnus.com/~bothner

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58           ` NO_DOLLAR_IN_LABEL Martin v. Loewis
  1999-01-31 23:58             ` NO_DOLLAR_IN_LABEL Paul Eggert
  1999-01-31 23:58             ` NO_DOLLAR_IN_LABEL Per Bothner
@ 1999-01-31 23:58             ` Horst von Brand
  1999-01-31 23:58               ` NO_DOLLAR_IN_LABEL Martin v. Loewis
  2 siblings, 1 reply; 31+ messages in thread
From: Horst von Brand @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Martin v. Loewis; +Cc: bothner, egcs, gcc2

"Martin v. Loewis" <martin@mira.isdn.cs.tu-berlin.de> said:
> In an attempt to formalize this proposal, I'd write:
> 
> C identifiers that contain universal characters are mangled in a
> different way than C++ and Java identifiers. The mangling for the
> latter is defined in gxxint.texi.

One of the virtues of C++ is that it is defined to be (almost) source-level
compatible with C, and link-compatible (at least for 'extern "C" ...').
-- 
Dr. Horst H. von Brand                       mailto:vonbrand@inf.utfsm.cl
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58                   ` NO_DOLLAR_IN_LABEL Paul Eggert
@ 1999-01-31 23:58                     ` Joe Buck
  1999-01-31 23:58                     ` NO_DOLLAR_IN_LABEL Marc Espie
  1 sibling, 0 replies; 31+ messages in thread
From: Joe Buck @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Paul Eggert; +Cc: bothner, egcs, gcc2

> Here's a dumb question: if gas and gld can handle arbitrary strings as
> names, why not skip name mangling and assembling entirely?  Wouldn't
> this be cleanest?

Fully expanded C++ template names can be huge; it can actually be shorter
to use mangled names (with -fnew-abi), because the long names are highly
redundant.

Example: map<string,string>, an STL map that maps strings to strings.
(Let's ignore the std namespace for the moment).
There is a default template parameter, so this is really
map<string,string,less<string> >.
And string is itself a template,
string = basic_string<char,string_char_traits<char>,allocator<char> >.

So once this is fully expanded you have a very long string:

map<basic_string<char,string_char_traits<char>,allocator<char> >,
    basic_string<char,string_char_traits<char>,allocator<char> >,
    less<basic_string<char,string_char_traits<char>,allocator<char> > > >

Then consider the name of the copy constructor:

map<basic_string<char,string_char_traits<char>,allocator<char> >,
    basic_string<char,string_char_traits<char>,allocator<char> >,
    less<basic_string<char,string_char_traits<char>,allocator<char> > >
>::
map<basic_string<char,string_char_traits<char>,allocator<char> >,
    basic_string<char,string_char_traits<char>,allocator<char> >,
    less<basic_string<char,string_char_traits<char>,allocator<char> > > >
(const
map<basic_string<char,string_char_traits<char>,allocator<char> >,
    basic_string<char,string_char_traits<char>,allocator<char> >,
    less<basic_string<char,string_char_traits<char>,allocator<char> > > >
&);

However, it is highly redundant, and the -fnew-abi encoding needs
to give the expansion of each sub-part only once.  This way neither
the user, the assembler, nor the linker ever has to deal with these
huge names.  This encoding only needs two or three characters to repeat
an already named class or template class.
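
To put a number on the redundancy, here is a small C sketch that counts
how often the expanded string type recurs in the names above.  It says
nothing about the actual -fnew-abi encoding; the macro and function
names are made up for the illustration.

```c
#include <string.h>

/* The expansions quoted above, written without line breaks. */
#define STRING_NAME \
    "basic_string<char,string_char_traits<char>,allocator<char> >"
#define MAP_NAME \
    "map<" STRING_NAME "," STRING_NAME ",less<" STRING_NAME " > >"
#define COPY_CTOR_NAME \
    MAP_NAME "::" MAP_NAME "(const " MAP_NAME " &)"

/* Count non-overlapping occurrences of needle in haystack. */
static unsigned count_occurrences(const char *haystack, const char *needle)
{
    unsigned count = 0;
    size_t step = strlen(needle);
    const char *p = haystack;

    if (step == 0)
        return 0;
    while ((p = strstr(p, needle)) != NULL) {
        count++;
        p += step;
    }
    return count;
}
```

The expanded string type appears 3 times in the map type and 9 times
in the copy constructor's name, which is why an encoding that spells
each sub-part out once and then refers back to it is so much shorter.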

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58                       ` NO_DOLLAR_IN_LABEL Paul Eggert
@ 1999-01-31 23:58                         ` John A. Tamplin
  0 siblings, 0 replies; 31+ messages in thread
From: John A. Tamplin @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Paul Eggert; +Cc: egcs, gcc2

On Wed, 6 Jan 1999, Paul Eggert wrote:

> Perhaps; but frankly I don't see much sense to UCNs even in UTF-8
> environments.  I think UCNs a bit like trigraphs -- they are a
> committee invention, and they may look like a good idea if you've
> never used them, but that's about the best that one can say for them.
> At least UCNs don't have trigraphs' backward-compatibility problems.
> 
> I think most people will prefer using native characters, either UTF-8
> or some other encoding.

Imagine you are writing a program that you want to compile in any
character set which has US-ASCII as a subset (traditionally what is used
by programs).  With UCNs, I can specify a given character without any
ambiguity and without restricting which of these character sets is used.

John A. Tamplin					Traveller Information Services
jat@Traveller.COM				2104 West Ferry Way
256/705-7007 - FAX 256/705-7100 		Huntsville, AL 35801

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58           ` NO_DOLLAR_IN_LABEL Martin v. Loewis
  1999-01-31 23:58             ` NO_DOLLAR_IN_LABEL Paul Eggert
@ 1999-01-31 23:58             ` Per Bothner
  1999-01-31 23:58               ` NO_DOLLAR_IN_LABEL Paul Eggert
  1999-01-31 23:58             ` NO_DOLLAR_IN_LABEL Horst von Brand
  2 siblings, 1 reply; 31+ messages in thread
From: Per Bothner @ 1999-01-31 23:58 UTC (permalink / raw)
  To: Martin v. Loewis; +Cc: egcs, gcc2

> In an attempt to formalize this proposal, I'd write:

It was not meant as a formal proposal, but let me try to make one:

Non-plain characters are any characters except ASCII letters, digits,
and '_'.  In languages that allow identifiers to start with a
digit, an initial digit is also a non-plain character.  (We could
also allow doubled underscore, or initial underscore, or
underscore followed by capital letter *when intended to
be in the user rather than implementation namespace* to also
count as non-plain characters.)

The mangled name (assembly-level name, not counting possible
initial underscore on some platforms) of a C or C++ global
variable, or a C global function, or a C++ global function
declared as extern "C" is as follows (assuming there is no
asm specification):  If the name contains only plain characters,
then the mangled name is the same as the source name.
If the source name contains any non-plain characters,
the mangled name starts with a prefix "_UC" (for "universal
character"), followed by the encoding of each of the characters.
Plain characters except for '_' are encoded as themselves.
An underscore followed by a lower-case letter is encoded
as itself.  Other underscores are encoded as "___".
A non-plain character CH is written as an initial underscore,
followed by the uppercased hexadecimal expansion of the
character's numeric value, with initial zeroes removed.
In other words, as if written by:
	printf ("_%X", CH);
(Note this does not limit us to 16-bit character codes.)
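
Put together, the rules for a simple name might look like the following
C sketch.  The function names are hypothetical, the input is assumed to
be an array of already-decoded character values, and the optional
treatment of initial digits and of reserved underscore patterns is left
out.

```c
#include <stdio.h>
#include <string.h>

/* Plain characters: ASCII letters, digits, and '_'. */
static int is_plain(unsigned long ch)
{
    return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')
        || (ch >= '0' && ch <= '9') || ch == '_';
}

static int append(char *out, size_t cap, size_t *len, const char *piece)
{
    size_t n = strlen(piece);

    if (*len + n + 1 > cap)
        return -1;
    memcpy(out + *len, piece, n + 1);
    *len += n;
    return 0;
}

/* Mangle n character values from src into out (capacity cap).
   Returns 0 on success, -1 if out is too small. */
static int mangle_simple_name(const unsigned long *src, size_t n,
                              char *out, size_t cap)
{
    size_t len = 0, i;
    int any_nonplain = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < n; i++)
        if (!is_plain(src[i]))
            any_nonplain = 1;
    /* Only names with a non-plain character get the "_UC" prefix. */
    if (any_nonplain && append(out, cap, &len, "_UC") != 0)
        return -1;
    for (i = 0; i < n; i++) {
        char piece[40];
        unsigned long ch = src[i];

        if (is_plain(ch) && (ch != '_' || !any_nonplain)) {
            piece[0] = (char) ch;           /* encoded as itself */
            piece[1] = '\0';
        } else if (ch == '_') {
            int lower_next = i + 1 < n
                && src[i + 1] >= 'a' && src[i + 1] <= 'z';
            strcpy(piece, lower_next ? "_" : "___");
        } else {
            sprintf(piece, "_%lX", ch);     /* upper-case hex, no padding */
        }
        if (append(out, cap, &len, piece) != 0)
            return -1;
    }
    return 0;
}
```

For example, the name x\u00c6 becomes "_UCx_C6", while foo_bar, which
has only plain characters, is left as "foo_bar".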

A C++ method is encoded as the encoding of the method name
(as described above), followed by "__", followed by mangling
of the containing class name (*not* as described above) and
parameter types.  (This is the same as the existing C++
mangling, but with a new mangling for non-ascii characters.)

When a class name needs to be mangled in a C++ mangled
method name, we use a variant of the above scheme, because
we need to distinguish class names from primitive types
and other mangling codes.  To mangle a class name, if it
contains only plain characters, we emit the number of
characters in the name, followed by the characters of the name.
Thus class "Foo" is mangled as "3Foo".  If any of the
characters of the class name are non-plain, we emit
a "U", followed by the number of characters in the
mangling of the class name, followed by the encodings
of the characters, as given above for mangling simple names.
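
As a rough C sketch of the class-name rule: the function name is
hypothetical, the caller is assumed to pass the per-character encoding
already computed, and I am assuming the "_UC" prefix is not repeated
inside a class name.

```c
#include <stdio.h>
#include <string.h>

/* encoded:      the class name after per-character encoding
   has_nonplain: nonzero if the source name had a non-plain character
   Writes the mangled class name to out (capacity cap).
   Returns 0 on success, -1 if out is too small. */
static int mangle_class_name(const char *encoded, int has_nonplain,
                             char *out, size_t cap)
{
    int n = snprintf(out, cap, "%s%lu%s",
                     has_nonplain ? "U" : "",
                     (unsigned long) strlen(encoded), encoded);

    return (n >= 0 && (size_t) n < cap) ? 0 : -1;
}
```

So "Foo" gives "3Foo", and a class named x\u00c6, whose characters
encode to "x_C6", gives "U4x_C6".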

	--Per Bothner
Cygnus Solutions     bothner@cygnus.com     http://www.cygnus.com/~bothner

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: NO_DOLLAR_IN_LABEL
  1999-01-31 23:58               ` NO_DOLLAR_IN_LABEL Per Bothner
@ 1999-01-31 23:58                 ` Paul Eggert
  0 siblings, 0 replies; 31+ messages in thread
From: Paul Eggert @ 1999-01-31 23:58 UTC (permalink / raw)
  To: bothner; +Cc: egcs, gcc2

   Date: Tue, 05 Jan 1999 12:34:54 -0800
   From: Per Bothner <bothner@cygnus.com>

   > if our goal is to use short escapes, then there are better
   > characters than underscore.

   That is one goal.  Another goal is simplify the spec, and avoid
   different manglings on different platforms.

I think we have two different models here.  In my model, name mangling
is separate from what (for lack of a better word) I'll call ``name
assembling''.  Name mangling is a language specific process that
generates single names from names+signatures+whatever.  ``Name
assembling'' is a language independent process that generates names
acceptable to the assembler from possibly-mangled names that may not
be assembler-acceptable.  In my model, name assembling operates after
mangling, is independent of mangling, and is part of the back end.

In contrast, you are making name assembling a part of name mangling;
in your model, it's just one big happy procedure.

There are some advantages to making name assembling independent, though:

* It is more modular: e.g. it means that we don't have to modify every
  front end to worry about assembling names.

* It will make it easy to support platforms that use UTF-8 or native
  encodings in names -- name assembling would be the identity function
  on such platforms.

* It's easier to explain.

* It can be space-efficient, if done carefully; I'll send more details
  about this in a later letter.

   it is not unreasonable to give special preference to UTF8

I agree with this point, and will keep it in mind in future proposals.

   The offical current Japanese standard (JIS) for a multi-national
   character set is Unicode.

That's true, but UTF-8 is a different animal.  Few Japanese documents
in practice use UTF-8.  Shift-JIS, EUC, DBCS, and other encodings
(even UCS-2!) are more popular in practice, at least in my experience.
This is partly because of inertia, and partly because these other
encodings are considerably more space-efficient for Japanese than
UTF-8 is.

I think it likely that many Japanese, in practice, will move in the
ISO-2022 direction instead of the UTF-8 direction.  It is possible to
support Unicode under the ISO-2022 umbrella, so this is compatible
with JIS's support for Unicode.  GNU Emacs currently supports ISO-2022
and not UTF-8, so it's possible that the free software folks in Japan
will stick with ISO-2022 even if Microsoft continues to push UCS-2.

   > I suggested `__9V_' earlier, but `_9'
   > should do if we want something shorter.

   "_9" is not a reserved prefix, I believe.

I believe that `_9' is reserved.  In C at least, all identifiers that
begin with an underscore are always reserved for use as identifiers
with file scope, and the assembler identifiers in question all come
from file-scope identifiers.  In theory, we could use only `_' as a
prefix and still conform to the standard, but that's pushing the
limits of practice a bit too hard.

   "__9V_" seems very arbitrary, but I guess I missed the rationale
   earlier.

The rationale for avoiding `_U' is that existing C libraries already
use symbols that start with `_U'.  On GNU/Linux, for example, both
libX11 and libXol contain symbols that start with `_U'.  I surveyed
several systems and never found something that started with `_'
followed by a digit.

Since I wrote the comment quoted above, I shortened `__9V_' to `_9' to
make names shorter.  Martin objected that `_9' might be confused with
the prefix of a mangled C++ name; I don't fully understand C++ name
mangling, but from my limited understanding of it, changing `_9' to
`_0' should fix the problem.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: NO_DOLLAR_IN_LABEL
@ 1999-01-31 23:58 Richard Kenner
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Kenner @ 1999-01-31 23:58 UTC (permalink / raw)
  To: law; +Cc: egcs, gcc2

    If we end up needing an additional patch with an additional macro, it
    needs to be language independent.  In general, putting language
    specific stuff in the config files is frowned upon.  For example, if
    we need a macro to indicate that mangling should not use '$'
    independent of NO_DOLLAR_IN_LABEL we could use
    NO_DOLLAR_IN_MANGLED_NAMES or something like that.

I agree.  An irreverent way of saying this would be "if you want to add a
language-specific macro to config files, at least *pretend* it isn't
language-specific".  And who knows?  Perhaps some other language will indeed
use it.  Many times I've added a macro to tm.h that I was sure would only be
used by one machine, but it ended up being used by lots of them.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~1999-01-31 23:58 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1999-01-31 23:58 NO_DOLLAR_IN_LABEL Martin v. Loewis
1999-01-31 23:58 ` NO_DOLLAR_IN_LABEL Per Bothner
1999-01-31 23:58 ` NO_DOLLAR_IN_LABEL Per Bothner
1999-01-31 23:58   ` NO_DOLLAR_IN_LABEL Martin v. Loewis
1999-01-31 23:58     ` NO_DOLLAR_IN_LABEL Per Bothner
1999-01-31 23:58       ` NO_DOLLAR_IN_LABEL David Edelsohn
1999-01-31 23:58       ` NO_DOLLAR_IN_LABEL Martin v. Loewis
1999-01-31 23:58         ` NO_DOLLAR_IN_LABEL Per Bothner
1999-01-31 23:58           ` NO_DOLLAR_IN_LABEL Martin v. Loewis
1999-01-31 23:58             ` NO_DOLLAR_IN_LABEL Paul Eggert
1999-01-31 23:58               ` NO_DOLLAR_IN_LABEL Per Bothner
1999-01-31 23:58                 ` NO_DOLLAR_IN_LABEL Paul Eggert
1999-01-31 23:58               ` NO_DOLLAR_IN_LABEL Martin v. Loewis
1999-01-31 23:58             ` NO_DOLLAR_IN_LABEL Per Bothner
1999-01-31 23:58               ` NO_DOLLAR_IN_LABEL Paul Eggert
1999-01-31 23:58                 ` NO_DOLLAR_IN_LABEL Per Bothner
1999-01-31 23:58                   ` NO_DOLLAR_IN_LABEL Martin v. Loewis
1999-01-31 23:58                   ` NO_DOLLAR_IN_LABEL Paul Eggert
1999-01-31 23:58                     ` NO_DOLLAR_IN_LABEL Per Bothner
1999-01-31 23:58                       ` NO_DOLLAR_IN_LABEL Paul Eggert
1999-01-31 23:58                         ` NO_DOLLAR_IN_LABEL John A. Tamplin
1999-01-31 23:58                   ` NO_DOLLAR_IN_LABEL Paul Eggert
1999-01-31 23:58                     ` NO_DOLLAR_IN_LABEL Joe Buck
1999-01-31 23:58                     ` NO_DOLLAR_IN_LABEL Marc Espie
1999-01-31 23:58             ` NO_DOLLAR_IN_LABEL Horst von Brand
1999-01-31 23:58               ` NO_DOLLAR_IN_LABEL Martin v. Loewis
1999-01-31 23:58 ` NO_DOLLAR_IN_LABEL Jeffrey A Law
1999-01-31 23:58   ` NO_DOLLAR_IN_LABEL Martin v. Loewis
1999-01-31 23:58     ` NO_DOLLAR_IN_LABEL Jeffrey A Law
1999-01-31 23:58       ` NO_DOLLAR_IN_LABEL Martin v. Loewis
1999-01-31 23:58 NO_DOLLAR_IN_LABEL Richard Kenner
