public inbox for jit@gcc.gnu.org
* Symbol name character restrictions
@ 2023-04-10 14:39 Joshua Saxby
  2023-04-10 14:43 ` Marc Nieper-Wißkirchen
  0 siblings, 1 reply; 4+ messages in thread
From: Joshua Saxby @ 2023-04-10 14:39 UTC (permalink / raw)
  To: jit

Dear All,

I noticed that libgccjit currently restricts symbol names for generated
functions (and, I assume, all other symbols) to the rules for C identifiers,
that is, ASCII letters, digits, and underscores, with no leading digit.

From the source for gcc_jit_context_new_function() (
https://github.com/gcc-mirror/gcc/blob/725bcdeec60771cb9ee387978716028b64ea1b7f/gcc/jit/libgccjit.cc#L1173-L1177
):

  /* The assembler can only handle certain names, so for now, enforce
     C's rules for identifiers upon the name, using ISALPHA and ISALNUM
     from safe-ctype.h to ignore the current locale.
     Eventually we'll need some way to interact with e.g. C++ name
     mangling.  */
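
For readers who want to see what such a check amounts to, here is a minimal sketch in C of a locale-independent test for C's identifier rules. It is not the libgccjit source: the names ascii_isalpha, ascii_isalnum and is_c_identifier are hypothetical, and the two helpers are ASCII-only stand-ins for the ISALPHA and ISALNUM macros mentioned in the comment above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ASCII-only stand-in for ISALPHA: ignores the locale. */
static bool ascii_isalpha(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

/* Hypothetical ASCII-only stand-in for ISALNUM: ignores the locale. */
static bool ascii_isalnum(char c)
{
    return ascii_isalpha(c) || (c >= '0' && c <= '9');
}

/* Returns true if NAME is non-empty, starts with a letter or an
   underscore, and continues with letters, digits, or underscores. */
bool is_c_identifier(const char *name)
{
    if (name == NULL || name[0] == '\0')
        return false;
    if (!ascii_isalpha(name[0]) && name[0] != '_')
        return false;
    for (const char *p = name + 1; *p != '\0'; p++)
        if (!ascii_isalnum(*p) && *p != '_')
            return false;
    return true;
}
```

Under this sketch, a name such as "my_func1" passes, while "1abc", "my-func" and "ns::f" are rejected, which is the kind of name the question below is about.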

I've seen suggestions elsewhere that some assemblers can handle symbol
names with a wider variety of characters than these, but I have struggled to
find any documentation of the exact restrictions on symbol naming in the
assembler itself. (I could assume they are identical to C's identifier rules,
but I like to be sure.) Are there any pointers to where I could find such a
specification? Also, are there any plans to follow up on the extension hinted
at toward the end of that comment, regarding C++ name mangling?

Best Regards,

*J.S.*



*My PGP Public Key Identity*

pub   4096R/*B7A947E4* 2016-11-16 [expires: 2025-12-31]
      Key fingerprint = *E2C4 514F F0FA 52D1 896A  B1D6 3D42 BFD9 B7A9 47E4*
uid       Joshua Saxby <joshua.a.saxby+UMvLnvbsOxBHaeiCHvbdunpz@gmail.com>
uid                   Joshua Saxby (saxbophone) <joshua.a.saxby@gmail.com>
sub   4096R/0A445946 2016-11-16 [expires: 2025-12-31]

* Re: Symbol name character restrictions
  2023-04-10 14:39 Symbol name character restrictions Joshua Saxby
@ 2023-04-10 14:43 ` Marc Nieper-Wißkirchen
  2023-04-10 14:51   ` Joshua Saxby
  0 siblings, 1 reply; 4+ messages in thread
From: Marc Nieper-Wißkirchen @ 2023-04-10 14:43 UTC (permalink / raw)
  To: Joshua Saxby; +Cc: jit

According to the documentation of the GNU assembler at
https://sourceware.org/binutils/docs/as/Symbol-Intro.html, any
characters except for the NUL character are allowed in symbol names
that are enclosed in double quotes.
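
As a rough illustration of what a code generator would have to do to use such names, the sketch below wraps a name in double quotes for use in GNU assembler source. It assumes the rule given on the manual page linked above, namely that a double quote inside a quoted symbol name must be preceded by a backslash. The function name quote_symbol_name is hypothetical, other characters (including backslashes and newlines) are passed through unchanged, and none of this has been tested against a real assembler.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of NAME wrapped in double quotes, with
   each embedded '"' preceded by a backslash.  Returns NULL if allocation
   fails.  The caller must free the result.  Assumption: no other
   character needs escaping. */
char *quote_symbol_name(const char *name)
{
    size_t len = strlen(name);
    /* Worst case: every character escaped, plus two quotes and the NUL. */
    char *out = malloc(2 * len + 3);
    if (out == NULL)
        return NULL;

    char *w = out;
    *w++ = '"';
    for (const char *r = name; *r != '\0'; r++) {
        if (*r == '"')
            *w++ = '\\';
        *w++ = *r;
    }
    *w++ = '"';
    *w = '\0';
    return out;
}
```

For example, passing the name my symbol would yield "my symbol" (including the quotes), which could then be written out as a label or as the operand of a directive such as .globl.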


* Re: Symbol name character restrictions
  2023-04-10 14:43 ` Marc Nieper-Wißkirchen
@ 2023-04-10 14:51   ` Joshua Saxby
  2023-04-13 15:14     ` Joshua Saxby
  0 siblings, 1 reply; 4+ messages in thread
From: Joshua Saxby @ 2023-04-10 14:51 UTC (permalink / raw)
  To: Marc Nieper-Wißkirchen; +Cc: jit

Thanks for that info, Marc. I can't believe I missed it!

I had a feeling that assemblers and object files were pretty permissive about
symbol names, at least in principle.

This seems to contradict the comment from the libgccjit source that I brought
up earlier. I wonder what other technical limitations (if any) there are on
the character set that jit's symbols can support?

Thanks,
*J.S.*




* Re: Symbol name character restrictions
  2023-04-10 14:51   ` Joshua Saxby
@ 2023-04-13 15:14     ` Joshua Saxby
  0 siblings, 0 replies; 4+ messages in thread
From: Joshua Saxby @ 2023-04-13 15:14 UTC (permalink / raw)
  To: Marc Nieper-Wißkirchen; +Cc: jit

I've done some further digging, and it appears that support for symbol names
enclosed in double quotes was added to GNU as about 8 years ago:

https://github.com/bminor/binutils-gdb/commit/d02603dc201f80cd9d2a1f4b1a16110b1e04222b
(commit: d02603d "Allow symbol and label names to be enclosed in double
quotes.")

I guess libgccjit predates this and the change wasn't propagated to it. I
think I will hack on libgccjit locally to remove the check on symbol names
and see if it works.

Cheers,
*J.S.*



