public inbox for gcc-bugs@sourceware.org
* [Bug preprocessor/94168] New: [10 Regression] error: extended character § is not valid in an identifier since r10-3309-g7d112d6670a0e0e6
@ 2020-03-13 13:44 marxin at gcc dot gnu.org
2020-03-13 13:45 ` [Bug preprocessor/94168] " marxin at gcc dot gnu.org
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: marxin at gcc dot gnu.org @ 2020-03-13 13:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94168
Bug ID: 94168
Summary: [10 Regression] error: extended character § is not
valid in an identifier since
r10-3309-g7d112d6670a0e0e6
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Keywords: rejects-valid
Severity: normal
Priority: P3
Component: preprocessor
Assignee: unassigned at gcc dot gnu.org
Reporter: marxin at gcc dot gnu.org
Target Milestone: ---
I see the following regression:
$ cat red.cc
#ifdef WINDOWS
§
#endif
$ g++-9 red.cc -c
$ g++ red.cc -c
red.cc:2:1: error: extended character § is not valid in an identifier
2 | §
| ^
* [Bug preprocessor/94168] [10 Regression] error: extended character § is not valid in an identifier since r10-3309-g7d112d6670a0e0e6
From: marxin at gcc dot gnu.org @ 2020-03-13 13:45 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94168
Martin Liška <marxin at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-03-13
   Target Milestone|---                         |10.0
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
      Known to fail|                            |10.0
      Known to work|                            |9.3.0
                 CC|                            |lhyatt at gmail dot com
--- Comment #1 from Martin Liška <marxin at gcc dot gnu.org> ---
It's reduced from the hfst-ospell package.
* [Bug preprocessor/94168] [10 Regression] error: extended character § is not valid in an identifier since r10-3309-g7d112d6670a0e0e6
From: lhyatt at gmail dot com @ 2020-03-13 16:44 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94168
--- Comment #2 from Lewis Hyatt <lhyatt at gmail dot com> ---
(In reply to Martin Liška from comment #0)
> I see the following regression:
>
> $ cat red.cc
> #ifdef WINDOWS
> §
> #endif
>
> $ g++-9 red.cc -c
> $ g++ red.cc -c
> red.cc:2:1: error: extended character § is not valid in an identifier
> 2 | §
> | ^
The corrupted colorization in the diagnostics is a bug that I submitted a patch
for already. David prefers that fix to wait for GCC 11.
Regarding the behavior, if you replace the § with the equivalent UCN:
#ifdef WINDOWS
\u00A7
#endif
then you will get the same behavior with older GCC before my patch too. My
patch causes the UTF-8 to be interpreted as an identifier rather than a stray
token, hence it ends up with the same error.
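For reference, the equivalence between the raw § and \u00A7 comes from the UTF-8 encoding: in a UTF-8 source file § is stored as the two bytes 0xC2 0xA7, which decode to code point U+00A7. The sketch below only illustrates that decoding. decode_utf8_2byte is a hypothetical helper written for this note, not a libcpp function, and it handles two-byte sequences only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libcpp: decode one two-byte UTF-8
   sequence (U+0080..U+07FF).  Returns the code point, or 0 if the input
   is not a well-formed two-byte sequence.  */
uint32_t decode_utf8_2byte(const unsigned char *s, size_t len)
{
    if (len < 2)
        return 0;
    /* Lead byte must look like 110xxxxx, continuation byte like 10xxxxxx.  */
    if ((s[0] & 0xE0) != 0xC0 || (s[1] & 0xC0) != 0x80)
        return 0;
    uint32_t cp = ((uint32_t)(s[0] & 0x1F) << 6) | (uint32_t)(s[1] & 0x3F);
    /* Values below U+0080 would be overlong encodings; reject them.  */
    return cp >= 0x80 ? cp : 0;
}
```

Calling it on the bytes 0xC2 0xA7 yields 0x00A7, the same code point that \u00A7 names, which is why both spellings end up in the same identifier check.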
As it happens, if you compile your test in C mode, it will succeed, because the
UTF-8 logic for C mode treats the invalid character as a stray token rather
than part of an identifier, so it is compiled out without complaint. In C++, it
is a syntax error by design, so it triggers this error. With UCN syntax, it is
an error in both C and C++, so it fails either way.
Looking at the relevant code in charset.c (_cpp_valid_ucn and _cpp_valid_utf8)
... I think it is probably just a matter of checking pfile->state.skipping in
more places. I made _cpp_valid_utf8 so as to preserve all the analogous
behavior of the existing _cpp_valid_ucn. It seems that _cpp_valid_ucn checks
pfile->state.skipping in some cases, like for $ in identifiers, but not for
others, such as the invalid character case.
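A minimal sketch of the kind of check being discussed, assuming a toy reader that carries only the two flags involved. The struct and function names are hypothetical and are not libcpp's; the condition itself is the one used in the first hunk of the patch further down in this message.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the reader state; this is not cpp_reader.  */
struct toy_reader {
    bool skipping;   /* true while inside a false #if/#ifdef group */
    bool cplusplus;  /* true when lexing as C++ */
};

/* Report the invalid extended character only when lexing C++ and only
   outside conditionally skipped code.  */
bool toy_should_report_invalid_char(const struct toy_reader *r)
{
    return !r->skipping && r->cplusplus;
}
```

With this gate, the reduced test case (C++, character inside a false #ifdef group) would produce no diagnostic, while the same character in live C++ code would still be reported. Whether suppressing the diagnostic is the right behavior is a separate question.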
I am happy to submit a patch to fix this, but I am not sure in which cases
it is correct to skip the error. For instance, this code can be made to trigger
a diagnostic too, in C90 mode:
$ cat t.c
#ifdef WINDOWS
int \u00E4;
#endif
$ gcc-8 -c t.c -std=c90 -fextended-identifiers
t.c:2:5: warning: universal character names are only valid in C++ and C99
int \u00E4;
^
That is because _cpp_valid_ucn doesn't check pfile->state.skipping for this
case either. I think, especially in C++, there are probably at least some cases
where an error should be triggered even in conditionally compiled code, but I
don't know enough offhand to say for sure.
FWIW, the below patch fixes the present issue, but it doesn't tackle equivalent
UCN behavior or fix the related issues... I just need some guidance as to the
expected behavior to do that.
-Lewis
diff --git a/libcpp/charset.c b/libcpp/charset.c
index d9281c5fb97..129f234349e 100644
--- a/libcpp/charset.c
+++ b/libcpp/charset.c
@@ -1260,7 +1260,7 @@ _cpp_valid_utf8 (cpp_reader *pfile,
way). In C, this byte rather becomes grammatically a separate
token. */
- if (CPP_OPTION (pfile, cplusplus))
+ if (!pfile->state.skipping && CPP_OPTION (pfile, cplusplus))
cpp_error (pfile, CPP_DL_ERROR,
"extended character %.*s is not valid in an identifier",
(int) (*pstr - base), base);
@@ -1273,7 +1273,7 @@ _cpp_valid_utf8 (cpp_reader *pfile,
break;
case 2:
- if (identifier_pos == 1)
+ if (!pfile->state.skipping && identifier_pos == 1)
{
/* This is treated the same way in C++ or C99 -- lexed as an
identifier which is then invalid because an identifier is
* [Bug preprocessor/94168] [10 Regression] error: extended character § is not valid in an identifier since r10-3309-g7d112d6670a0e0e6
From: joseph at codesourcery dot com @ 2020-03-13 18:16 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94168
--- Comment #3 from joseph at codesourcery dot com <joseph at codesourcery dot com> ---
The reasoning for rejecting this (for UCNs in both C and C++, for other
characters in C++ because of the C++ rule that such characters get
converted to UCNs) is that the constraints on permitted characters in
identifiers appear to me to apply to any pp-token that matches the syntax
productions for "identifier", whether or not that pp-token ends up getting
converted to a token. This is similar to
#if 0
"multiline
string"
#endif
being disallowed as well.
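To illustrate the point that the constraint depends only on the spelling of the pp-token, here is a rough sketch of a permitted-character lookup. The table is only the U+00A0..U+00FF portion of the ranges listed in C11 Annex D.1, reproduced from the standard as an assumption to be checked against the text. GCC keeps its own tables, which differ between language standards. The names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct cp_range { uint32_t lo, hi; };

/* Assumed subset of C11 Annex D.1, restricted to U+00A0..U+00FF.  */
static const struct cp_range latin1_ident_ranges[] = {
    { 0x00A8, 0x00A8 }, { 0x00AA, 0x00AA }, { 0x00AD, 0x00AD },
    { 0x00AF, 0x00AF }, { 0x00B2, 0x00B5 }, { 0x00B7, 0x00BA },
    { 0x00BC, 0x00BE }, { 0x00C0, 0x00D6 }, { 0x00D8, 0x00F6 },
    { 0x00F8, 0x00FF },
};

/* Returns true if CP falls in one of the ranges above.  Code points
   outside U+00A0..U+00FF are not covered by this sketch and always
   return false here, although many of them are permitted.  */
bool latin1_allowed_in_identifier(uint32_t cp)
{
    size_t n = sizeof latin1_ident_ranges / sizeof latin1_ident_ranges[0];
    for (size_t i = 0; i < n; i++)
        if (cp >= latin1_ident_ranges[i].lo && cp <= latin1_ident_ranges[i].hi)
            return true;
    return false;
}
```

Under this table U+00A7 (§) is rejected and U+00E4 (ä) is accepted. The lookup takes no argument describing whether the surrounding code is being skipped, which mirrors the reading above that the constraint applies to the pp-token itself.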
* [Bug preprocessor/94168] [10 Regression] error: extended character § is not valid in an identifier since r10-3309-g7d112d6670a0e0e6
From: marxin at gcc dot gnu.org @ 2020-03-16 8:06 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94168
--- Comment #4 from Martin Liška <marxin at gcc dot gnu.org> ---
Thank you for the analysis and suggested patch.
The original source code looks like this:
#ifdef WINDOWS
static std::string wide_string_to_string(const std::wstring & wstr)
{
    int size_needed = WideCharToMultiByte(CP_UTF8, 0, &wstr[0],
        (int)§wstr.size(), NULL, 0, NULL, NULL);
    std::string str( size_needed, 0 );
    WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), &str[0],
        size_needed, NULL, NULL);
    return str;
}
#endif
* [Bug preprocessor/94168] [10 Regression] error: extended character § is not valid in an identifier since r10-3309-g7d112d6670a0e0e6
From: marxin at gcc dot gnu.org @ 2020-03-16 10:29 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94168
--- Comment #5 from Martin Liška <marxin at gcc dot gnu.org> ---
I reported that upstream as well:
https://github.com/hfst/hfst-ospell/issues/49
* [Bug preprocessor/94168] [10 Regression] error: extended character § is not valid in an identifier since r10-3309-g7d112d6670a0e0e6
From: rguenth at gcc dot gnu.org @ 2020-04-01 7:52 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94168
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |INVALID
--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
I read Joseph's comment as saying that GCC is correct to reject the code, and
Martin's quoting of the original testcase shows it's likely a genuine bug in
the program (a typo of some sort).
Thus closing as INVALID. Please reopen if any of the above is wrong.
* [Bug preprocessor/94168] [10 Regression] error: extended character § is not valid in an identifier since r10-3309-g7d112d6670a0e0e6
From: marxin at gcc dot gnu.org @ 2020-04-01 7:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94168
--- Comment #7 from Martin Liška <marxin at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #6)
> I read Joseph's comment as saying that GCC is correct to reject the code, and
> Martin's quoting of the original testcase shows it's likely a genuine bug in
> the program (a typo of some sort).
Yes, a real bug, and it has already been fixed.