public inbox for gcc-bugs@sourceware.org
* [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier
@ 2023-05-22 19:58 adam at wozniakconsulting dot com
  2023-05-22 20:04 ` [Bug c++/109936] " pinskia at gcc dot gnu.org
                   ` (26 more replies)
  0 siblings, 27 replies; 28+ messages in thread
From: adam at wozniakconsulting dot com @ 2023-05-22 19:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

            Bug ID: 109936
           Summary: error: extended character ≠ is not valid in an
                    identifier
           Product: gcc
           Version: 11.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: adam at wozniakconsulting dot com
  Target Milestone: ---

Created attachment 55138
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55138&action=edit
cpp file that demonstrates bug

#define X(x)
X(🤔) // emojis work
X(≠)  // this "not equal" does NOT work!

///////////////////////////////////////////
#if 0
compile with "g++ -c bad.cpp" gives:

bad.cpp:3:3: error: extended character ≠ is not valid in an identifier
    3 | X(≠)
      |   ^

compile with "g++ -c -fextended-identifiers bad.cpp" gives the same error.

g++ --version says:

g++ (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

/lib/cpp --version says:

cpp (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


manual for both gcc and cpp says:

       -fextended-identifiers
          Accept universal character names and extended characters in
          identifiers.  This option is enabled by default for C99 (and later
          C standard versions) and C++.

BTW, i get a similar error with the following Unicode code points.  while some
may have reasonable explanations, many do not.

0080 - 00a7
00a9
00ab - 00ac
00ae
00b0 - 00b1
00b6
00bb
00bf
00d7
00f7
0300 - 036f
1680
180e
1dc0 - 1dff
2000 - 200a
200e - 2029
202f - 203e
2041 - 2053
2055 - 205f
20d0 - 20ff
2190 - 245f
2500 - 2775
2794 - 2bff
2e00 - 2e7f
3000 - 3003
3008 - 3020
3030
e000 - f8ff
fdd0 - fdef
fe20 - fe2f
fe45 - fe46

#endif
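
The listed ranges can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper `in_reported_ranges` and a table holding only a handful of the ranges from the list above (it is deliberately incomplete). It says nothing about how GCC itself classifies characters.

```cpp
#include <cassert>

// Inclusive code point range [first, last].
struct CodePointRange {
    char32_t first;
    char32_t last;
};

// Hypothetical, partial table: a few entries copied from the reported list.
constexpr CodePointRange kReportedRanges[] = {
    {0x0080, 0x00A7},
    {0x00D7, 0x00D7},
    {0x2190, 0x245F},
    {0x2500, 0x2775},
    {0xE000, 0xF8FF},
};

// Returns true if cp lies inside one of the ranges in the partial table.
constexpr bool in_reported_ranges(char32_t cp) {
    for (const CodePointRange& r : kReportedRanges) {
        if (cp >= r.first && cp <= r.last) {
            return true;
        }
    }
    return false;
}
```

With this table, U+2260 (≠) falls inside 2190 - 245f, while U+1F914 (🤔) lies outside every listed range, which matches the behaviour described in the report.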

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
@ 2023-05-22 20:04 ` pinskia at gcc dot gnu.org
  2023-05-22 20:06 ` pinskia at gcc dot gnu.org
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-22 20:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
clang rejects both:
<source>:2:3: error: unexpected character <U+1F914>
X(🤔) // emojis work
  ^~
<source>:3:3: error: unexpected character <U+2260>
X(≠)  // this "not equal" does NOT work!
  ^


GCC does reject both with -pedantic (in GCC 12+) though:
<source>:2:3: error: extended character 🤔 is not valid in an identifier
    2 | X(🤔) // emojis work
      |   ^
<source>:3:3: error: extended character ≠ is not valid in an identifier
    3 | X(≠)  // this "not equal" does NOT work!
      |   ^


This changed from GCC 9 where GCC would accept both ...

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
  2023-05-22 20:04 ` [Bug c++/109936] " pinskia at gcc dot gnu.org
@ 2023-05-22 20:06 ` pinskia at gcc dot gnu.org
  2023-05-22 20:06 ` pinskia at gcc dot gnu.org
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-22 20:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
See C++ papers: P1041R4 and P1139R2 on this.

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
  2023-05-22 20:04 ` [Bug c++/109936] " pinskia at gcc dot gnu.org
  2023-05-22 20:06 ` pinskia at gcc dot gnu.org
@ 2023-05-22 20:06 ` pinskia at gcc dot gnu.org
  2023-05-22 20:16 ` adam at wozniakconsulting dot com
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-22 20:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
This is all expected.

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (2 preceding siblings ...)
  2023-05-22 20:06 ` pinskia at gcc dot gnu.org
@ 2023-05-22 20:16 ` adam at wozniakconsulting dot com
  2023-05-22 20:17 ` redi at gcc dot gnu.org
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: adam at wozniakconsulting dot com @ 2023-05-22 20:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

Adam Wozniak <adam at wozniakconsulting dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|INVALID                     |---
             Status|RESOLVED                    |UNCONFIRMED
            Version|11.3.0                      |12.1.0

--- Comment #4 from Adam Wozniak <adam at wozniakconsulting dot com> ---
reopening.  this is not at all "expected".

C++ papers P1041R4 and P1139R2 cover literal constants in code.
they do not at all cover anything about arguments to C preprocessor macros.

in this case, the macro generates no code.
it should be perfectly legal to use these as arguments.
note the emoji passes through flawlessly.

bug also exists in 12.1.0, so updating "Version".

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (3 preceding siblings ...)
  2023-05-22 20:16 ` adam at wozniakconsulting dot com
@ 2023-05-22 20:17 ` redi at gcc dot gnu.org
  2023-05-22 20:26 ` redi at gcc dot gnu.org
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: redi at gcc dot gnu.org @ 2023-05-22 20:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #5 from Jonathan Wakely <redi at gcc dot gnu.org> ---
I think https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1949r7.html
is the most relevant paper.

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (4 preceding siblings ...)
  2023-05-22 20:17 ` redi at gcc dot gnu.org
@ 2023-05-22 20:26 ` redi at gcc dot gnu.org
  2023-05-22 20:29 ` pinskia at gcc dot gnu.org
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: redi at gcc dot gnu.org @ 2023-05-22 20:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #6 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Adam Wozniak from comment #4)
> reopening.  this is not at all "expected".
> 
> C++ papers P1041R4 and P1139R2 cover literal constants in code.
> they do not at all cover anything about arguments to C preprocessor macros.
> 
> in this case, the macro generates no code.

That isn't the point. The compiler has to tokenize the input in order to
perform the preprocessing step. That means it has to be able to decide what the
bytes comprising the ≠ mean. Are they multiple tokens? A single token
consisting of an identifier? A C++ operator?

The standard says "Each preprocessing token that is converted to a token (5.6)
shall have the lexical form of a keyword, an identifier, a literal, or an
operator or punctuator."

≠ cannot be used in an identifier, and it's none of the other forms either.

> it should be perfectly legal to use these as arguments.

By that argument, you could say X(£), but that isn't allowed either.

> note the emoji passes through flawlessly.

Not with -Wpedantic

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (5 preceding siblings ...)
  2023-05-22 20:26 ` redi at gcc dot gnu.org
@ 2023-05-22 20:29 ` pinskia at gcc dot gnu.org
  2023-05-22 20:52 ` adam at wozniakconsulting dot com
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-22 20:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Jonathan Wakely from comment #5)
> I think
> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1949r7.html is the
> most relevant paper.

Right, this is the correct paper. I misread the commit logs to find the right
one.

Also see PR 67224, which is what changed the behavior in GCC 10 to reject ≠ as
not valid in an identifier (the P1949R7 paper references the preprocessor here
too).

Again this is all expected.

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (6 preceding siblings ...)
  2023-05-22 20:29 ` pinskia at gcc dot gnu.org
@ 2023-05-22 20:52 ` adam at wozniakconsulting dot com
  2023-05-22 20:58 ` redi at gcc dot gnu.org
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: adam at wozniakconsulting dot com @ 2023-05-22 20:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #8 from Adam Wozniak <adam at wozniakconsulting dot com> ---
(In reply to Jonathan Wakely from comment #6)
> That isn't the point. The compiler has to tokenize the input in order to
> perform the preprocessing step. That means it has to be able to decide what
> the bytes comprising the ≠ mean. Are they multiple tokens? A single token
> consisting of an identifier? A C++ operator?
> 
> The standard says "Each preprocessing token that is converted to a token
> (5.6) shall have the lexical form of a keyword, an identifier, a literal, or
> an operator or punctuator."
> 
> ≠ cannot be used in an identifier, and it's none of the other forms either.
> 
> > it should be perfectly legal to use these as arguments.
> 
> By that argument, you could say X(£), but that isn't allowed either.
> 
> > note the emoji passes through flawlessly.
> 
> Not with -Wpedantic

i would argue that X(£) should also be allowed.
i don't think of the preprocessor as part of the compiler.
it's a different step, a different executable, that happens BEFORE the
compiler.
hence the name, PREprocessor.

i cannot argue with "the standard", however.

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (7 preceding siblings ...)
  2023-05-22 20:52 ` adam at wozniakconsulting dot com
@ 2023-05-22 20:58 ` redi at gcc dot gnu.org
  2023-05-22 21:14 ` schwab@linux-m68k.org
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: redi at gcc dot gnu.org @ 2023-05-22 20:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #9 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Adam Wozniak from comment #8)
> i don't think of the preprocessor as part of the compiler.
> it's a different step, a different executable, that happens BEFORE the
> compiler.

No it isn't. Preprocessing is done by the compiler, using libcpp. There is no
different executable. GCC has worked that way for many, many years.

> hence the name, PREprocessor.
> i cannot argue with "the standard", however.

Tokenization happens before preprocessing anyway. It has to, so that the
preprocessor can tell that X(a) is four tokens.
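
A minimal sketch of that token count, assuming a toy splitter (the name `toy_pp_tokens` is hypothetical, and this is not libcpp's algorithm): it recognises only ASCII identifiers and treats every other non-whitespace character as a one-character token.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy splitter: identifiers of the form [A-Za-z_][A-Za-z0-9_]* become one
// token; any other non-whitespace character becomes a token of its own.
// Real preprocessing also handles numbers, strings, multi-character
// punctuators and extended characters, none of which are modelled here.
std::vector<std::string> toy_pp_tokens(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;  // whitespace separates tokens but is not a token itself
        } else if (std::isalpha(c) || c == '_') {
            const std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) ||
                    src[i] == '_')) {
                ++i;
            }
            tokens.push_back(src.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return tokens;
}
```

For the input `X(a)` this yields the four pieces `X`, `(`, `a` and `)`, which is what allows a preprocessor to recognise a function-like macro invocation and its argument.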

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (8 preceding siblings ...)
  2023-05-22 20:58 ` redi at gcc dot gnu.org
@ 2023-05-22 21:14 ` schwab@linux-m68k.org
  2023-05-22 21:20 ` jakub at gcc dot gnu.org
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: schwab@linux-m68k.org @ 2023-05-22 21:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #10 from Andreas Schwab <schwab@linux-m68k.org> ---
> The standard says "Each preprocessing token that is converted to a token
> (5.6) shall have the lexical form of a keyword, an identifier, a literal, or
> an operator or punctuator."

The argument of the macro is not converted to a token, but discarded in phase
4.

> ≠ cannot be used in an identifier, and it's none of the other forms either.

It is a valid preprocessing token ("non-whitespace character that cannot be one
of the above").
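
The difference between a preprocessing token and a token can be seen without any extended characters. The sketch below uses the pp-number `1.2.3` as a stand-in (an assumption for illustration, not the ≠ case from the report): it is a valid preprocessing token but not a valid literal, yet it can be a macro argument as long as it is discarded or stringified before tokens are formed. The macro names `DISCARD` and `STR` are hypothetical.

```cpp
#include <cassert>
#include <string>

// Expands to nothing: the argument is dropped during macro expansion.
#define DISCARD(x)
// Stringifies its argument: the spelling becomes a string literal.
#define STR(x) #x

// 1.2.3 is a single pp-number. It is never converted to a token here,
// so the fact that it is not a valid literal does not matter.
DISCARD(1.2.3)

// After stringification the only thing converted to a token is the
// string literal "1.2.3".
const std::string kStringified = STR(1.2.3);
```

Writing `double d = 1.2.3;` instead would be rejected, because there the pp-number has to become a literal.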

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (9 preceding siblings ...)
  2023-05-22 21:14 ` schwab@linux-m68k.org
@ 2023-05-22 21:20 ` jakub at gcc dot gnu.org
  2023-05-22 23:36 ` adam at wozniakconsulting dot com
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-05-22 21:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #11 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Bisection points to r10-3309-g7d112d6670a0e0e662

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (10 preceding siblings ...)
  2023-05-22 21:20 ` jakub at gcc dot gnu.org
@ 2023-05-22 23:36 ` adam at wozniakconsulting dot com
  2023-05-22 23:43 ` adam at wozniakconsulting dot com
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: adam at wozniakconsulting dot com @ 2023-05-22 23:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #12 from Adam Wozniak <adam at wozniakconsulting dot com> ---
(In reply to Jonathan Wakely from comment #9)
> (In reply to Adam Wozniak from comment #8)
> > i don't think of the preprocessor as part of the compiler.
> > it's a different step, a different executable, that happens BEFORE the
> > compiler.
> 
> No it isn't. Preprocessing is done by the compiler, using libcpp. There is
> no different executable. GCC has worked that way for many, many years.


No, i am fairly CERTAIN they are different executables.

i can even invoke one without the other; /lib/cpp can be invoked directly, and
g++ can be told to skip the preprocessor by renaming your source file *.i or
*.ii.

$ ls -la /lib/cpp
lrwxrwxrwx 1 root root 21 May 11  2022 /lib/cpp -> /etc/alternatives/cpp
$ ls -la /etc/alternatives/cpp
lrwxrwxrwx 1 root root 12 May 11  2022 /etc/alternatives/cpp -> /usr/bin/cpp
$ ls -la /usr/bin/cpp
lrwxrwxrwx 1 root root 6 May 11  2022 /usr/bin/cpp -> cpp-11
$ ls -la /usr/bin/cpp-11
lrwxrwxrwx 1 root root 23 Jan 16 05:17 /usr/bin/cpp-11 -> x86_64-linux-gnu-cpp-11
$ ls -la /usr/bin/x86_64-linux-gnu-cpp-11
-rwxr-xr-x 1 root root 862976 Jan 16 05:17 /usr/bin/x86_64-linux-gnu-cpp-11
$ which g++
/usr/bin/g++
$ ls -la /usr/bin/g++
lrwxrwxrwx 1 root root 21 May 22 16:06 /usr/bin/g++ -> /etc/alternatives/g++
$ ls -la /etc/alternatives/g++
lrwxrwxrwx 1 root root 15 May 22 19:31 /etc/alternatives/g++ -> /usr/bin/g++-11
$ ls -la /usr/bin/g++-11
lrwxrwxrwx 1 root root 23 Jan 16 05:17 /usr/bin/g++-11 -> x86_64-linux-gnu-g++-11
$ ls -la /usr/bin/x86_64-linux-gnu-g++-11
-rwxr-xr-x 1 root root 862976 Jan 16 05:17 /usr/bin/x86_64-linux-gnu-g++-11

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (11 preceding siblings ...)
  2023-05-22 23:36 ` adam at wozniakconsulting dot com
@ 2023-05-22 23:43 ` adam at wozniakconsulting dot com
  2023-05-22 23:47 ` adam at wozniakconsulting dot com
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: adam at wozniakconsulting dot com @ 2023-05-22 23:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #13 from Adam Wozniak <adam at wozniakconsulting dot com> ---
(In reply to Jakub Jelinek from comment #11)
> Bisection points to r10-3309-g7d112d6670a0e0e662

that link gives me an error

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (12 preceding siblings ...)
  2023-05-22 23:43 ` adam at wozniakconsulting dot com
@ 2023-05-22 23:47 ` adam at wozniakconsulting dot com
  2023-05-23  0:39 ` adam at wozniakconsulting dot com
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: adam at wozniakconsulting dot com @ 2023-05-22 23:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #14 from Adam Wozniak <adam at wozniakconsulting dot com> ---
(In reply to Adam Wozniak from comment #12)
> (In reply to Jonathan Wakely from comment #9)
> > (In reply to Adam Wozniak from comment #8)
> > > i don't think of the preprocessor as part of the compiler.
> > > it's a different step, a different executable, that happens BEFORE the
> > > compiler.
> > 
> > No it isn't. Preprocessing is done by the compiler, using libcpp. There is
> > no different executable. GCC has worked that way for many, many years.
> 
> 
> No, i am fairly CERTAIN they are different executables.
> 
> i can even invoke one without the other; /lib/cpp can be invoked directly,
> and g++ can be told to skip the preprocessor by renaming your source file
> *.i or *.ii.
> 
> $ ls -la /lib/cpp
> lrwxrwxrwx 1 root root 21 May 11  2022 /lib/cpp -> /etc/alternatives/cpp
> $ ls -la /etc/alternatives/cpp
> lrwxrwxrwx 1 root root 12 May 11  2022 /etc/alternatives/cpp -> /usr/bin/cpp
> $ ls -la /usr/bin/cpp
> lrwxrwxrwx 1 root root 6 May 11  2022 /usr/bin/cpp -> cpp-11
> $ ls -la /usr/bin/cpp-11
> lrwxrwxrwx 1 root root 23 Jan 16 05:17 /usr/bin/cpp-11 ->
> x86_64-linux-gnu-cpp-11
> $ ls -la /usr/bin/x86_64-linux-gnu-cpp-11
> -rwxr-xr-x 1 root root 862976 Jan 16 05:17 /usr/bin/x86_64-linux-gnu-cpp-11
> $ which g++
> /usr/bin/g++
> $ ls -la /usr/bin/g++
> lrwxrwxrwx 1 root root 21 May 22 16:06 /usr/bin/g++ -> /etc/alternatives/g++
> $ ls -la /etc/alternatives/g++
> lrwxrwxrwx 1 root root 15 May 22 19:31 /etc/alternatives/g++ ->
> /usr/bin/g++-11
> $ ls -la /usr/bin/g++-11
> lrwxrwxrwx 1 root root 23 Jan 16 05:17 /usr/bin/g++-11 ->
> x86_64-linux-gnu-g++-11
> $ ls -la /usr/bin/x86_64-linux-gnu-g++-11
> -rwxr-xr-x 1 root root 862976 Jan 16 05:17 /usr/bin/x86_64-linux-gnu-g++-11

lest someone claim they are the same because of identical sizes...

$ md5sum /usr/bin/x86_64-linux-gnu-g++-11
f0b26412421754aa03b9457a4d2ee40c  /usr/bin/x86_64-linux-gnu-g++-11

$ md5sum /usr/bin/x86_64-linux-gnu-cpp-11
3bddc1f50d7631ad22da0f875babe7a3  /usr/bin/x86_64-linux-gnu-cpp-11

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (13 preceding siblings ...)
  2023-05-22 23:47 ` adam at wozniakconsulting dot com
@ 2023-05-23  0:39 ` adam at wozniakconsulting dot com
  2023-05-23  1:54 ` pinskia at gcc dot gnu.org
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: adam at wozniakconsulting dot com @ 2023-05-23  0:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #15 from Adam Wozniak <adam at wozniakconsulting dot com> ---
(In reply to Jonathan Wakely from comment #6)
> ≠ cannot be used in an identifier, and it's none of the other forms either.

at the risk of beating a dead horse, what you are saying here is that ≠ simply
cannot be used, ever, anywhere, in C/C++.

that seems like kind of a waste.  a whole raft of unicode characters that
simply cannot be used.  so much for embracing unicode.  Maybe someone wants to
name a variable "§32" for some reason, but can't because...

why exactly?

because the spec says so.

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (14 preceding siblings ...)
  2023-05-23  0:39 ` adam at wozniakconsulting dot com
@ 2023-05-23  1:54 ` pinskia at gcc dot gnu.org
  2023-05-23  1:55 ` pinskia at gcc dot gnu.org
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-23  1:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #16 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Adam Wozniak from comment #12)
> 
> No, i am fairly CERTAIN they are different executables.

GCC has been using an integrated preprocessor since the GCC 3.0 timeframe
(~2000 or maybe even before). In fact, it is implemented in a library called
libcpp.


> i can even invoke one without the other; /lib/cpp can be invoked directly,
> and g++ can be told to skip the preprocessor by renaming your source file
> *.i or *.ii.

g++ and /lib/cpp (which is really gcc) are called drivers; they launch the
actual compiler (cc1 and cc1plus). When you name the file *.ii, the driver
just passes -fpreprocessed down to the actual compiler, which skips part of
the preprocessing stage, but the lexing (tokenizing) stage is still done. You
can also pass -E to make the compiler do just the preprocessing stage for you
and write out the result (note tokenization happens during that too).

The name preprocessor is still used even though most compilers don't have a
separate program that does the preprocessing any more.

It is funny arguing with folks who write parts of GCC on an idea of integrated
vs separate preprocessor really.

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (15 preceding siblings ...)
  2023-05-23  1:54 ` pinskia at gcc dot gnu.org
@ 2023-05-23  1:55 ` pinskia at gcc dot gnu.org
  2023-05-23  8:01 ` schwab@linux-m68k.org
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-05-23  1:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #17 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to Adam Wozniak from comment #13)
> (In reply to Jakub Jelinek from comment #11)
> > Bisection points to r10-3309-g7d112d6670a0e0e662
> 
> that link gives me an error

https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=7d112d6670a0e0e662

Does that link work?

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (16 preceding siblings ...)
  2023-05-23  1:55 ` pinskia at gcc dot gnu.org
@ 2023-05-23  8:01 ` schwab@linux-m68k.org
  2023-05-23  9:06 ` redi at gcc dot gnu.org
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: schwab@linux-m68k.org @ 2023-05-23  8:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

Andreas Schwab <schwab@linux-m68k.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-05-23
             Status|RESOLVED                    |REOPENED
         Resolution|INVALID                     |---

--- Comment #18 from Andreas Schwab <schwab@linux-m68k.org> ---
I don't think this error is correct since the preprocessing token ≠ is never
converted to a token.

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (17 preceding siblings ...)
  2023-05-23  8:01 ` schwab@linux-m68k.org
@ 2023-05-23  9:06 ` redi at gcc dot gnu.org
  2023-05-23 15:31 ` adam at wozniakconsulting dot com
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: redi at gcc dot gnu.org @ 2023-05-23  9:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #19 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Andreas Schwab from comment #10)
> It is a valid preprocessing token ("non-whitespace character that cannot be
> one of the above").

Ah right, yes. It's a preprocessing token, but is never converted to a token,
so doesn't need to be a keyword, identifier etc.

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (18 preceding siblings ...)
  2023-05-23  9:06 ` redi at gcc dot gnu.org
@ 2023-05-23 15:31 ` adam at wozniakconsulting dot com
  2023-05-23 15:38 ` adam at wozniakconsulting dot com
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: adam at wozniakconsulting dot com @ 2023-05-23 15:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #20 from Adam Wozniak <adam at wozniakconsulting dot com> ---
(In reply to Andrew Pinski from comment #17)
> (In reply to Adam Wozniak from comment #13)
> > (In reply to Jakub Jelinek from comment #11)
> > > Bisection points to r10-3309-g7d112d6670a0e0e662
> > 
> > that link gives me an error
> 
> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=7d112d6670a0e0e662
> 
> Does that link work?

i get this response:

This page contains the following errors:
error on line 20 at column 54: AttValue: " or ' expected
Below is a rendering of the page up to the first error.

* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (19 preceding siblings ...)
  2023-05-23 15:31 ` adam at wozniakconsulting dot com
@ 2023-05-23 15:38 ` adam at wozniakconsulting dot com
  2023-05-23 15:40 ` adam at wozniakconsulting dot com
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: adam at wozniakconsulting dot com @ 2023-05-23 15:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #21 from Adam Wozniak <adam at wozniakconsulting dot com> ---
(In reply to Andrew Pinski from comment #16)
> It is funny arguing with folks who write parts of GCC on an idea of
> integrated vs separate preprocessor really.

yeah, i've been pounding out C since the late 80s, my dinosaur is probably
showing.  they'll probably call me in 2038 like they called the old COBOL
programmers for Y2K.

it's weird to me to think of them not separately.  i've even used the C
preprocessor in contexts unrelated to parsing C code.

it's also weird to see someone who thinks of the C preprocessor only in terms
of its service to the compiler.

whatever, that's drifting off topic.

main point for me was, i don't see any other reason to disallow these unicode
chars other than "the spec says so".  i don't see any HARM in allowing them,
and i certainly see use cases where there is BENEFIT to allowing them.

not all macro args get turned into C++ identifiers.  some get thrown away. 
some get stringified.  in the particular case where i tripped over this, they
get thrown away, and i have ANOTHER postprocessing step that picks them up and
does other magic stuff with them.

also, there's probably a really good case for allowing some of these things,
like emoji, as C++ identifiers.


* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (20 preceding siblings ...)
  2023-05-23 15:38 ` adam at wozniakconsulting dot com
@ 2023-05-23 15:40 ` adam at wozniakconsulting dot com
  2023-05-23 16:00 ` redi at gcc dot gnu.org
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: adam at wozniakconsulting dot com @ 2023-05-23 15:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #22 from Adam Wozniak <adam at wozniakconsulting dot com> ---
(In reply to Jonathan Wakely from comment #19)
> (In reply to Andreas Schwab from comment #10)
> > It is a valid preprocessing token ("non-whitespace character that cannot be
> > one of the above").
> 
> Ah right, yes. It's a preprocessing token, but is never converted to a
> token, so doesn't need to be a keyword, identifier etc.

Correct.


* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (21 preceding siblings ...)
  2023-05-23 15:40 ` adam at wozniakconsulting dot com
@ 2023-05-23 16:00 ` redi at gcc dot gnu.org
  2023-05-23 17:23 ` adam at wozniakconsulting dot com
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: redi at gcc dot gnu.org @ 2023-05-23 16:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #23 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Adam Wozniak from comment #20)
> i get this response:
> 
> This page contains the following errors:
> error on line 20 at column 54: AttValue: " or ' expected
> Below is a rendering of the page up to the first error.

That seems to be a problem at your end, the page is well-formed:
https://validator.w3.org/nu/?doc=https%3A%2F%2Fgcc.gnu.org%2Fgit%2Fgitweb.cgi%3Fp%3Dgcc.git%3Bh%3D7d112d6670a0e0e662


* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (22 preceding siblings ...)
  2023-05-23 16:00 ` redi at gcc dot gnu.org
@ 2023-05-23 17:23 ` adam at wozniakconsulting dot com
  2023-05-23 17:36 ` joseph at codesourcery dot com
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: adam at wozniakconsulting dot com @ 2023-05-23 17:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #24 from Adam Wozniak <adam at wozniakconsulting dot com> ---
(In reply to Jonathan Wakely from comment #23)
> (In reply to Adam Wozniak from comment #20)
> > i get this response:
> > 
> > This page contains the following errors:
> > error on line 20 at column 54: AttValue: " or ' expected
> > Below is a rendering of the page up to the first error.
> 
> That seems to be a problem at your end, the page is well-formed:
> https://validator.w3.org/nu/?doc=https%3A%2F%2Fgcc.gnu.org%2Fgit%2Fgitweb.cgi%3Fp%3Dgcc.git%3Bh%3D7d112d6670a0e0e662

works now.  did not before.  weird.


* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (23 preceding siblings ...)
  2023-05-23 17:23 ` adam at wozniakconsulting dot com
@ 2023-05-23 17:36 ` joseph at codesourcery dot com
  2023-05-23 18:05 ` adam at wozniakconsulting dot com
  2023-11-24 15:41 ` pinskia at gcc dot gnu.org
  26 siblings, 0 replies; 28+ messages in thread
From: joseph at codesourcery dot com @ 2023-05-23 17:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #25 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
Older versions of C++ - up to C++20 - would reject such characters (not 
allowed in identifiers based on the list of allowed characters in that 
standard version) even when not converted to a token, because (a) those 
older versions had (as-if) conversion of extended characters to UCNs in 
translation phase 1, and (b) UCNs not permitted in identifiers still 
matched the syntax for identifier preprocessing tokens ("Otherwise, the 
next preprocessing token is the longest sequence of characters that 
matches the syntax of a preprocessing token, even if that would cause 
further lexical analysis to fail") and then violated a semantic rule on 
which UCNs are allowed in identifiers.

C++23 instead converts UCNs to extended characters in phase 3 rather than 
doing the reverse conversion, and has (as of N4944, at least), 
[lex.pptoken], "... single non-whitespace characters that do not lexically 
match the other preprocessing token categories ... If any character not in 
the basic character set matches the last category, the program is 
ill-formed.".  That's part of the description of preprocessing tokens, 
before they get converted to tokens.  I think it has the same effect of 
disallowing the use of such a character (outside contexts such as string 
literals) - even if a different diagnostic might be better.
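
A minimal sketch of the "outside contexts such as string literals" point, assuming UTF-8 source and execution character sets (GCC's defaults). The macro name KEEP and the variable names are hypothetical, chosen only for illustration. Inside a string literal, U+2260 belongs to a string-literal preprocessing token rather than standing alone, and the extended character and its UCN spelling should produce the same bytes:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical pass-through macro: expands to its argument unchanged.
#define KEEP(x) x

// U+2260 written directly; it sits inside a string-literal token,
// so it is never lexed as a standalone preprocessing token.
const char *as_char = KEEP("≠");

// The same character written as a universal character name.
const char *as_ucn = KEEP("\u2260");

// Returns true when both spellings produced identical bytes.
bool same_bytes() { return std::strcmp(as_char, as_ucn) == 0; }
```

Under the UTF-8 assumption, each string holds the three-byte encoding of U+2260.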


* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (24 preceding siblings ...)
  2023-05-23 17:36 ` joseph at codesourcery dot com
@ 2023-05-23 18:05 ` adam at wozniakconsulting dot com
  2023-11-24 15:41 ` pinskia at gcc dot gnu.org
  26 siblings, 0 replies; 28+ messages in thread
From: adam at wozniakconsulting dot com @ 2023-05-23 18:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

--- Comment #26 from Adam Wozniak <adam at wozniakconsulting dot com> ---
(In reply to Jonathan Wakely from comment #19)
> (In reply to Andreas Schwab from comment #10)
> > It is a valid preprocessing token ("non-whitespace character that cannot be
> > one of the above").
> 
> Ah right, yes. It's a preprocessing token, but is never converted to a
> token, so doesn't need to be a keyword, identifier etc.

i feel like it should work for stringification reasons too.  e.g.

#define X(x) #x
const char *letterA = X(A);   // this works
const char *notequal = X(≠);  // this does not
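
One possible workaround, sketched under the assumption of UTF-8 source and execution character sets, is to pass the character inside a string literal. The name quotedNotEqual is hypothetical. Note that stringifying a string literal keeps the quotes, so the result is not the same as what X(≠) would have produced:

```cpp
#include <cassert>
#include <cstring>

#define X(x) #x

// Stringifying an ordinary identifier yields its spelling.
const char *letterA = X(A);

// Stringifying a string literal escapes its quotes, so the result
// contains the quote characters as well as U+2260.
const char *quotedNotEqual = X("≠");
```

If the quotes are unwanted, passing "≠" to a macro that does not stringify its argument avoids them.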


* [Bug c++/109936] error: extended character ≠ is not valid in an identifier
  2023-05-22 19:58 [Bug c++/109936] New: error: extended character ≠ is not valid in an identifier adam at wozniakconsulting dot com
                   ` (25 preceding siblings ...)
  2023-05-23 18:05 ` adam at wozniakconsulting dot com
@ 2023-11-24 15:41 ` pinskia at gcc dot gnu.org
  26 siblings, 0 replies; 28+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-11-24 15:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109936

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stammark at gcc dot gnu.org

--- Comment #27 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
*** Bug 112702 has been marked as a duplicate of this bug. ***

