public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c/104147] New: C preprocessor may remove the standard required whitespace between the preprocessing tokens
@ 2022-01-20 17:30 pavel.morozkin at gmail dot com
  2022-01-20 20:59 ` [Bug preprocessor/104147] [9/10/11/12 Regression] " jsm28 at gcc dot gnu.org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: pavel.morozkin at gmail dot com @ 2022-01-20 17:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104147

            Bug ID: 104147
           Summary: C preprocessor may remove the standard required
                    whitespace between the preprocessing tokens
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pavel.morozkin at gmail dot com
  Target Milestone: ---

Sample code:
#define X(x,y)  x y
#define STR_(x) #x
#define STR(x)  STR_(x)
STR(X(Y,Y))

Invocations:
$ gcc t222.c -std=c11 -pedantic -Wall -Wextra -E -P
"Y Y"

$ gcc t222.c -std=c11 -pedantic -Wall -Wextra -E -P -D"Y()"
"YY"

Also 1: Somehow gcc takes into account the whitespace between `,` and `Y`:
$ gcc t222.c -std=c11 -pedantic -Wall -Wextra -E -P -D"Y()" -D"Z=STR(X(Y,Y))"
"YY"

$ gcc t222.c -std=c11 -pedantic -Wall -Wextra -E -P -D"Y()" -D"Z=STR(X(Y, Y))"
"Y Y"

Also 2: This:
STR(X(Y,
Y))

leads to:
$ gcc t222.c -std=c11 -pedantic -Wall -Wextra -E -P -D"Y()"
"Y Y"

However, this:
STR(X(Y
,Y))

leads to:
$ gcc t222.c -std=c11 -pedantic -Wall -Wextra -E -P -D"Y()"
"YY"

The whitespace is required by the standard.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug preprocessor/104147] [9/10/11/12 Regression] C preprocessor may remove the standard required whitespace between the preprocessing tokens
  2022-01-20 17:30 [Bug c/104147] New: C preprocessor may remove the standard required whitespace between the preprocessing tokens pavel.morozkin at gmail dot com
@ 2022-01-20 20:59 ` jsm28 at gcc dot gnu.org
  2022-01-21  7:55 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jsm28 at gcc dot gnu.org @ 2022-01-20 20:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104147

Joseph S. Myers <jsm28 at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |3.2.3
      Known to work|                            |3.0.4
   Target Milestone|---                         |9.5
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2022-01-20
     Ever confirmed|0                           |1
            Summary|C preprocessor may remove   |[9/10/11/12 Regression] C
                   |the standard required       |preprocessor may remove the
                   |whitespace between the      |standard required
                   |preprocessing tokens        |whitespace between the
                   |                            |preprocessing tokens

--- Comment #1 from Joseph S. Myers <jsm28 at gcc dot gnu.org> ---
This appears to be a regression between GCC 3.0 and 3.2.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug preprocessor/104147] [9/10/11/12 Regression] C preprocessor may remove the standard required whitespace between the preprocessing tokens
  2022-01-20 17:30 [Bug c/104147] New: C preprocessor may remove the standard required whitespace between the preprocessing tokens pavel.morozkin at gmail dot com
  2022-01-20 20:59 ` [Bug preprocessor/104147] [9/10/11/12 Regression] " jsm28 at gcc dot gnu.org
@ 2022-01-21  7:55 ` rguenth at gcc dot gnu.org
  2022-01-31 14:07 ` jakub at gcc dot gnu.org
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-21  7:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104147

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug preprocessor/104147] [9/10/11/12 Regression] C preprocessor may remove the standard required whitespace between the preprocessing tokens
  2022-01-20 17:30 [Bug c/104147] New: C preprocessor may remove the standard required whitespace between the preprocessing tokens pavel.morozkin at gmail dot com
  2022-01-20 20:59 ` [Bug preprocessor/104147] [9/10/11/12 Regression] " jsm28 at gcc dot gnu.org
  2022-01-21  7:55 ` rguenth at gcc dot gnu.org
@ 2022-01-31 14:07 ` jakub at gcc dot gnu.org
  2022-01-31 14:50 ` jakub at gcc dot gnu.org
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-01-31 14:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104147

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
So, from what I can see on
#define X(x,y)  x y
#define STR_(x) #x
#define STR(x)  STR_(x)
STR(X(Y,Y))
vs.
#define X(x,y)  x y
#define STR_(x) #x
#define STR(x)  STR_(x)
#define Y()
STR(X(Y,Y))
is that on the third call to funlike_invocation_p for the Y macro, the code
reads 2 padding tokens before finding a non-padding non-CPP_OPEN_PAREN one:
1365    static _cpp_buff *
1366    funlike_invocation_p (cpp_reader *pfile, cpp_hashnode *node,
1367                          _cpp_buff **pragma_buff, unsigned *num_args)
1368    {
1369      const cpp_token *token, *padding = NULL;
1370    
1371      for (;;)
1372        {
1373          token = cpp_get_token (pfile);
1374          if (token->type != CPP_PADDING)
1375            break;
1376          if (padding == NULL
1377              || (!(padding->flags & PREV_WHITE) && token->val.source ==
NULL))
1378            padding = token;
1379        }
The first padding token is pfile->avoid_paste and the second padding token is
one created by padding_token:
(gdb) p *padding
$128 = {src_loc = 0, type = CPP_PADDING, flags = 0, val = {node = {node = 0x0,
spelling = 0x0}, source = 0x0, str = {len = 0, text = 0x0}, macro_arg = {arg_no
= 0, spelling = 0x0}, 
    token_no = 0, pragma = 0}}
(gdb) p *token
$129 = {src_loc = 258816, type = CPP_PADDING, flags = 0, val = {node = {node =
0x7fffea249db0, spelling = 0x0}, source = 0x7fffea249db0, str = {len =
3928268208, text = 0x0}, 
    macro_arg = {arg_no = 3928268208, spelling = 0x0}, token_no = 3928268208,
pragma = 3928268208}}
(gdb) p *token->val.source
$130 = {src_loc = 242688, type = CPP_MACRO_ARG, flags = 1, val = {node = {node
= 0x7fff00000002, spelling = 0x7fffea38e2e8}, source = 0x7fff00000002, str =
{len = 2, 
      text = 0x7fffea38e2e8 "\260\260\070\352\377\177"}, macro_arg = {arg_no =
2, spelling = 0x7fffea38e2e8}, token_no = 2, pragma = 2}}
So, both CPP_PADDING tokens have flags of 0, but the second padding has
non-NULL val.source and that one has PREV_WHITE set on it.
Now, because of the above condition, padding is in the end the first of the 2
padding tokens, i.e. pfile->avoid_paste.
The code later will do:
1393      if (token->type != CPP_EOF || token == &pfile->endarg)
1394        {
1395          _cpp_backup_tokens (pfile, 1);
1396          if (padding)
1397            _cpp_push_token_context (pfile, NULL, padding, 1);
1398        }
so backup a single token (a CPP_NAME) and push a new context containing just
the avoid_paste token.
Later on cpp_get_token_1 is called and first hits:
2930          else if (!reached_end_of_context (context))
2931            {
2932              consume_next_token_from_context (pfile, &result,
2933                                               &virt_loc);
and so the avoid_paste token is read and next time reached_end_of_context is
true and so it will:
2945              if (pfile->context->c.macro)
2946                ++num_expanded_macros_counter;
2947              _cpp_pop_context (pfile);
2948              if (pfile->state.in_directive)
2949                continue;
2950              result = &pfile->avoid_paste;
and read avoid_paste again.
Now, if Y is not a macro, the tokens which will be read are the original ones,
i.e.
one avoid_paste followed by the padding_token one.
Now, stringify_arg is called later on, and that one ignores CPP_PADDING tokens,
except when they have val.source set on it:
      if (token->type == CPP_PADDING)
        {
          if (source == NULL
              || (!(source->flags & PREV_WHITE)
                  && token->val.source == NULL))
            source = token->val.source;
          continue;
        }
...
      /* Leading white space?  */
      if (dest - 1 != BUFF_FRONT (pfile->u_buff))
        {
          if (source == NULL)
            source = token;
          if (source->flags & PREV_WHITE)
            *dest++ = ' ';
        }
      source = NULL;
So, when Y is not a macro, we set source to the CPP_MACRO_ARG with PREV_WHITE
set on it and so emit a space in between,
while when Y is a function-like macro, source isn't set from CPP_PADDING tokens
but instead from the actual CPP_NAME
token after it which doesn't have PREV_WHITE and so we don't emit a space in
there.
Now, if the order of the CPP_PADDING tokens doesn't matter, perhaps we want
funlike_invocation_p use the
padding token with non-NULL val.source in preference of the avoid_paste token
rather than the other way around,
but not sure what that would break.  Also, not really sure how PREV_WHITE flags
on the CPP_PADDING tokens come to play with this.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug preprocessor/104147] [9/10/11/12 Regression] C preprocessor may remove the standard required whitespace between the preprocessing tokens
  2022-01-20 17:30 [Bug c/104147] New: C preprocessor may remove the standard required whitespace between the preprocessing tokens pavel.morozkin at gmail dot com
                   ` (2 preceding siblings ...)
  2022-01-31 14:07 ` jakub at gcc dot gnu.org
@ 2022-01-31 14:50 ` jakub at gcc dot gnu.org
  2022-02-01 19:50 ` cvs-commit at gcc dot gnu.org
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-01-31 14:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104147

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Looking at the PREV_WHITE uses, they look like:
  if (token->type == CPP_PADDING)
    {
      avoid_paste = true;
      if (print.source == NULL
          || (!(print.source->flags & PREV_WHITE)
              && token->val.source == NULL))
        print.source = token->val.source;
      return;
    }
...
  if (avoid_paste)
    {
...
      else if (print.source->flags & PREV_WHITE
...
    }
  else if (token->flags & PREV_WHITE)

and stringify_arg has something similar.
So, makes me wonder if funlike_invocation_p isn't just a copy & paste
of those conditions, with the important difference that padding (what we
remember) is not the source, but the token itself.
I'd expect something like:
      if (padding == NULL
          || padding->val.source == NULL
          || (!(padding->val.source->flags & PREV_WHITE)
              && token->val.source == NULL))
        padding = token;
instead.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug preprocessor/104147] [9/10/11/12 Regression] C preprocessor may remove the standard required whitespace between the preprocessing tokens
  2022-01-20 17:30 [Bug c/104147] New: C preprocessor may remove the standard required whitespace between the preprocessing tokens pavel.morozkin at gmail dot com
                   ` (3 preceding siblings ...)
  2022-01-31 14:50 ` jakub at gcc dot gnu.org
@ 2022-02-01 19:50 ` cvs-commit at gcc dot gnu.org
  2022-02-01 19:51 ` [Bug preprocessor/104147] [9/10/11 " jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-02-01 19:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104147

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:95ac5635409606386259d2ff21fb61738858ca4a

commit r12-6976-g95ac5635409606386259d2ff21fb61738858ca4a
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Feb 1 20:48:03 2022 +0100

    libcpp: Fix up padding handling in funlike_invocation_p [PR104147]

    As mentioned in the PR, in some cases we preprocess incorrectly when we
    encounter an identifier which is defined as function-like macro, followed
    by at least 2 CPP_PADDING tokens and then some other identifier.
    On the following testcase, the problem is in the 3rd funlike_invocation_p,
    the tokens are CPP_NAME Y, CPP_PADDING (the pfile->avoid_paste shared
token),
    CPP_PADDING (one created with padding_token, val.source is non-NULL and
    val.source->flags & PREV_WHITE is non-zero) and then another CPP_NAME.
    funlike_invocation_p remembers there was a padding token, but remembers the
    first one because of its condition, then the next token is the CPP_NAME,
    which is not CPP_OPEN_PAREN, so the CPP_NAME token is backed up, but as we
    can't easily backup more tokens, it pushes into a new context the padding
    token (the pfile->avoid_paste one).  The net effect is that when Y is not
    defined as fun-like macro, we read Y, avoid_paste, padding_token, Y,
    while if Y is fun-like macro, we read Y, avoid_paste, avoid_paste, Y
    (the second avoid_paste is because that is how we handle end of a context).
    Now, for stringify_arg that is unfortunately a significant difference,
    which handles CPP_PADDING tokens with:
          if (token->type == CPP_PADDING)
            {
              if (source == NULL
                  || (!(source->flags & PREV_WHITE)
                      && token->val.source == NULL))
                source = token->val.source;
              continue;
            }
    and later on
          /* Leading white space?  */
          if (dest - 1 != BUFF_FRONT (pfile->u_buff))
            {
              if (source == NULL)
                source = token;
              if (source->flags & PREV_WHITE)
                *dest++ = ' ';
            }
          source = NULL;
    (and c-ppoutput.cc has similar code).
    So, when Y is not fun-like macro, ' ' is added because padding_token's
    val.source->flags & PREV_WHITE is non-zero, while when it is fun-like
    macro, we don't add ' ' in between, because source is NULL and so
    used from the next token (CPP_NAME Y), which doesn't have PREV_WHITE set.

    Now, the funlike_invocation_p condition
           if (padding == NULL
               || (!(padding->flags & PREV_WHITE) && token->val.source ==
NULL))
            padding = token;
    looks very similar to that in stringify_arg/c-ppoutput.cc, so I assume
    the intent was to prefer do the same thing and pick the right padding.
    But there are significant differences.  Both stringify_arg and
c-ppoutput.cc
    don't remember the CPP_PADDING token, but its val.source instead, while
    in funlike_invocation_p we want to remember the padding token that has the
    significant information for stringify_arg/c-ppoutput.cc.
    So, IMHO we want to overwrite padding if:
    1) padding == NULL (remember that there was any padding at all)
    2) padding->val.source == NULL (this matches the source == NULL
       case in stringify_arg)
    3) !(padding->val.source->flags & PREV_WHITE) && token->val.source == NULL
       (this matches the !(source->flags & PREV_WHITE) && token->val.source ==
NULL
       case in stringify_arg)

    2022-02-01  Jakub Jelinek  <jakub@redhat.com>

            PR preprocessor/104147
            * macro.cc (funlike_invocation_p): For padding prefer a token
            with val.source non-NULL especially if it has PREV_WHITE set
            on val.source->flags.  Add gcc_assert that CPP_PADDING tokens
            don't have PREV_WHITE set in flags.

            * c-c++-common/cpp/pr104147.c: New test.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug preprocessor/104147] [9/10/11 Regression] C preprocessor may remove the standard required whitespace between the preprocessing tokens
  2022-01-20 17:30 [Bug c/104147] New: C preprocessor may remove the standard required whitespace between the preprocessing tokens pavel.morozkin at gmail dot com
                   ` (4 preceding siblings ...)
  2022-02-01 19:50 ` cvs-commit at gcc dot gnu.org
@ 2022-02-01 19:51 ` jakub at gcc dot gnu.org
  2022-02-19  8:02 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-02-01 19:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104147

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[9/10/11/12 Regression] C   |[9/10/11 Regression] C
                   |preprocessor may remove the |preprocessor may remove the
                   |standard required           |standard required
                   |whitespace between the      |whitespace between the
                   |preprocessing tokens        |preprocessing tokens

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed on the trunk so far.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug preprocessor/104147] [9/10/11 Regression] C preprocessor may remove the standard required whitespace between the preprocessing tokens
  2022-01-20 17:30 [Bug c/104147] New: C preprocessor may remove the standard required whitespace between the preprocessing tokens pavel.morozkin at gmail dot com
                   ` (5 preceding siblings ...)
  2022-02-01 19:51 ` [Bug preprocessor/104147] [9/10/11 " jakub at gcc dot gnu.org
@ 2022-02-19  8:02 ` cvs-commit at gcc dot gnu.org
  2022-02-19  8:08 ` [Bug preprocessor/104147] [9/10 " jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-02-19  8:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104147

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-11 branch has been updated by Jakub Jelinek
<jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:7c442c8897163888e1d9e367b85565f61e0d3136

commit r11-9598-g7c442c8897163888e1d9e367b85565f61e0d3136
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Feb 1 20:48:03 2022 +0100

    libcpp: Fix up padding handling in funlike_invocation_p [PR104147]

    As mentioned in the PR, in some cases we preprocess incorrectly when we
    encounter an identifier which is defined as function-like macro, followed
    by at least 2 CPP_PADDING tokens and then some other identifier.
    On the following testcase, the problem is in the 3rd funlike_invocation_p,
    the tokens are CPP_NAME Y, CPP_PADDING (the pfile->avoid_paste shared
token),
    CPP_PADDING (one created with padding_token, val.source is non-NULL and
    val.source->flags & PREV_WHITE is non-zero) and then another CPP_NAME.
    funlike_invocation_p remembers there was a padding token, but remembers the
    first one because of its condition, then the next token is the CPP_NAME,
    which is not CPP_OPEN_PAREN, so the CPP_NAME token is backed up, but as we
    can't easily backup more tokens, it pushes into a new context the padding
    token (the pfile->avoid_paste one).  The net effect is that when Y is not
    defined as fun-like macro, we read Y, avoid_paste, padding_token, Y,
    while if Y is fun-like macro, we read Y, avoid_paste, avoid_paste, Y
    (the second avoid_paste is because that is how we handle end of a context).
    Now, for stringify_arg that is unfortunately a significant difference,
    which handles CPP_PADDING tokens with:
          if (token->type == CPP_PADDING)
            {
              if (source == NULL
                  || (!(source->flags & PREV_WHITE)
                      && token->val.source == NULL))
                source = token->val.source;
              continue;
            }
    and later on
          /* Leading white space?  */
          if (dest - 1 != BUFF_FRONT (pfile->u_buff))
            {
              if (source == NULL)
                source = token;
              if (source->flags & PREV_WHITE)
                *dest++ = ' ';
            }
          source = NULL;
    (and c-ppoutput.cc has similar code).
    So, when Y is not fun-like macro, ' ' is added because padding_token's
    val.source->flags & PREV_WHITE is non-zero, while when it is fun-like
    macro, we don't add ' ' in between, because source is NULL and so
    used from the next token (CPP_NAME Y), which doesn't have PREV_WHITE set.

    Now, the funlike_invocation_p condition
           if (padding == NULL
               || (!(padding->flags & PREV_WHITE) && token->val.source ==
NULL))
            padding = token;
    looks very similar to that in stringify_arg/c-ppoutput.cc, so I assume
    the intent was to prefer do the same thing and pick the right padding.
    But there are significant differences.  Both stringify_arg and
c-ppoutput.cc
    don't remember the CPP_PADDING token, but its val.source instead, while
    in funlike_invocation_p we want to remember the padding token that has the
    significant information for stringify_arg/c-ppoutput.cc.
    So, IMHO we want to overwrite padding if:
    1) padding == NULL (remember that there was any padding at all)
    2) padding->val.source == NULL (this matches the source == NULL
       case in stringify_arg)
    3) !(padding->val.source->flags & PREV_WHITE) && token->val.source == NULL
       (this matches the !(source->flags & PREV_WHITE) && token->val.source ==
NULL
       case in stringify_arg)

    2022-02-01  Jakub Jelinek  <jakub@redhat.com>

            PR preprocessor/104147
            * macro.c (funlike_invocation_p): For padding prefer a token
            with val.source non-NULL especially if it has PREV_WHITE set
            on val.source->flags.  Add gcc_assert that CPP_PADDING tokens
            don't have PREV_WHITE set in flags.

            * c-c++-common/cpp/pr104147.c: New test.

    (cherry picked from commit 95ac5635409606386259d2ff21fb61738858ca4a)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug preprocessor/104147] [9/10 Regression] C preprocessor may remove the standard required whitespace between the preprocessing tokens
  2022-01-20 17:30 [Bug c/104147] New: C preprocessor may remove the standard required whitespace between the preprocessing tokens pavel.morozkin at gmail dot com
                   ` (6 preceding siblings ...)
  2022-02-19  8:02 ` cvs-commit at gcc dot gnu.org
@ 2022-02-19  8:08 ` jakub at gcc dot gnu.org
  2022-05-10  8:23 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-02-19  8:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104147

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |jakub at gcc dot gnu.org
             Status|NEW                         |ASSIGNED
            Summary|[9/10/11 Regression] C      |[9/10 Regression] C
                   |preprocessor may remove the |preprocessor may remove the
                   |standard required           |standard required
                   |whitespace between the      |whitespace between the
                   |preprocessing tokens        |preprocessing tokens

--- Comment #7 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed for 11.3+ too.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug preprocessor/104147] [9/10 Regression] C preprocessor may remove the standard required whitespace between the preprocessing tokens
  2022-01-20 17:30 [Bug c/104147] New: C preprocessor may remove the standard required whitespace between the preprocessing tokens pavel.morozkin at gmail dot com
                   ` (7 preceding siblings ...)
  2022-02-19  8:08 ` [Bug preprocessor/104147] [9/10 " jakub at gcc dot gnu.org
@ 2022-05-10  8:23 ` cvs-commit at gcc dot gnu.org
  2022-05-11  6:24 ` cvs-commit at gcc dot gnu.org
  2022-05-11  6:36 ` jakub at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-05-10  8:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104147

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-10 branch has been updated by Jakub Jelinek
<jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:fb1792e4c96cf0e969d5fde2a857da3fb4b2a5aa

commit r10-10673-gfb1792e4c96cf0e969d5fde2a857da3fb4b2a5aa
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Feb 1 20:48:03 2022 +0100

    libcpp: Fix up padding handling in funlike_invocation_p [PR104147]

    As mentioned in the PR, in some cases we preprocess incorrectly when we
    encounter an identifier which is defined as function-like macro, followed
    by at least 2 CPP_PADDING tokens and then some other identifier.
    On the following testcase, the problem is in the 3rd funlike_invocation_p,
    the tokens are CPP_NAME Y, CPP_PADDING (the pfile->avoid_paste shared
token),
    CPP_PADDING (one created with padding_token, val.source is non-NULL and
    val.source->flags & PREV_WHITE is non-zero) and then another CPP_NAME.
    funlike_invocation_p remembers there was a padding token, but remembers the
    first one because of its condition, then the next token is the CPP_NAME,
    which is not CPP_OPEN_PAREN, so the CPP_NAME token is backed up, but as we
    can't easily backup more tokens, it pushes into a new context the padding
    token (the pfile->avoid_paste one).  The net effect is that when Y is not
    defined as fun-like macro, we read Y, avoid_paste, padding_token, Y,
    while if Y is fun-like macro, we read Y, avoid_paste, avoid_paste, Y
    (the second avoid_paste is because that is how we handle end of a context).
    Now, for stringify_arg that is unfortunately a significant difference,
    which handles CPP_PADDING tokens with:
          if (token->type == CPP_PADDING)
            {
              if (source == NULL
                  || (!(source->flags & PREV_WHITE)
                      && token->val.source == NULL))
                source = token->val.source;
              continue;
            }
    and later on
          /* Leading white space?  */
          if (dest - 1 != BUFF_FRONT (pfile->u_buff))
            {
              if (source == NULL)
                source = token;
              if (source->flags & PREV_WHITE)
                *dest++ = ' ';
            }
          source = NULL;
    (and c-ppoutput.cc has similar code).
    So, when Y is not fun-like macro, ' ' is added because padding_token's
    val.source->flags & PREV_WHITE is non-zero, while when it is fun-like
    macro, we don't add ' ' in between, because source is NULL and so
    used from the next token (CPP_NAME Y), which doesn't have PREV_WHITE set.

    Now, the funlike_invocation_p condition
           if (padding == NULL
               || (!(padding->flags & PREV_WHITE) && token->val.source ==
NULL))
            padding = token;
    looks very similar to that in stringify_arg/c-ppoutput.cc, so I assume
    the intent was to prefer do the same thing and pick the right padding.
    But there are significant differences.  Both stringify_arg and
c-ppoutput.cc
    don't remember the CPP_PADDING token, but its val.source instead, while
    in funlike_invocation_p we want to remember the padding token that has the
    significant information for stringify_arg/c-ppoutput.cc.
    So, IMHO we want to overwrite padding if:
    1) padding == NULL (remember that there was any padding at all)
    2) padding->val.source == NULL (this matches the source == NULL
       case in stringify_arg)
    3) !(padding->val.source->flags & PREV_WHITE) && token->val.source == NULL
       (this matches the !(source->flags & PREV_WHITE) && token->val.source ==
NULL
       case in stringify_arg)

    2022-02-01  Jakub Jelinek  <jakub@redhat.com>

            PR preprocessor/104147
            * macro.c (funlike_invocation_p): For padding prefer a token
            with val.source non-NULL especially if it has PREV_WHITE set
            on val.source->flags.  Add gcc_assert that CPP_PADDING tokens
            don't have PREV_WHITE set in flags.

            * c-c++-common/cpp/pr104147.c: New test.

    (cherry picked from commit 95ac5635409606386259d2ff21fb61738858ca4a)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug preprocessor/104147] [9/10 Regression] C preprocessor may remove the standard required whitespace between the preprocessing tokens
  2022-01-20 17:30 [Bug c/104147] New: C preprocessor may remove the standard required whitespace between the preprocessing tokens pavel.morozkin at gmail dot com
                   ` (8 preceding siblings ...)
  2022-05-10  8:23 ` cvs-commit at gcc dot gnu.org
@ 2022-05-11  6:24 ` cvs-commit at gcc dot gnu.org
  2022-05-11  6:36 ` jakub at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-05-11  6:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104147

--- Comment #9 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-9 branch has been updated by Jakub Jelinek
<jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:7157e074472435489963cf6961cd4c2d06804f4d

commit r9-10121-g7157e074472435489963cf6961cd4c2d06804f4d
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Feb 1 20:48:03 2022 +0100

    libcpp: Fix up padding handling in funlike_invocation_p [PR104147]

    As mentioned in the PR, in some cases we preprocess incorrectly when we
    encounter an identifier which is defined as function-like macro, followed
    by at least 2 CPP_PADDING tokens and then some other identifier.
    On the following testcase, the problem is in the 3rd funlike_invocation_p,
    the tokens are CPP_NAME Y, CPP_PADDING (the pfile->avoid_paste shared
token),
    CPP_PADDING (one created with padding_token, val.source is non-NULL and
    val.source->flags & PREV_WHITE is non-zero) and then another CPP_NAME.
    funlike_invocation_p remembers there was a padding token, but remembers the
    first one because of its condition, then the next token is the CPP_NAME,
    which is not CPP_OPEN_PAREN, so the CPP_NAME token is backed up, but as we
    can't easily backup more tokens, it pushes into a new context the padding
    token (the pfile->avoid_paste one).  The net effect is that when Y is not
    defined as fun-like macro, we read Y, avoid_paste, padding_token, Y,
    while if Y is fun-like macro, we read Y, avoid_paste, avoid_paste, Y
    (the second avoid_paste is because that is how we handle end of a context).
    Now, for stringify_arg that is unfortunately a significant difference,
    which handles CPP_PADDING tokens with:
          if (token->type == CPP_PADDING)
            {
              if (source == NULL
                  || (!(source->flags & PREV_WHITE)
                      && token->val.source == NULL))
                source = token->val.source;
              continue;
            }
    and later on
          /* Leading white space?  */
          if (dest - 1 != BUFF_FRONT (pfile->u_buff))
            {
              if (source == NULL)
                source = token;
              if (source->flags & PREV_WHITE)
                *dest++ = ' ';
            }
          source = NULL;
    (and c-ppoutput.cc has similar code).
    So, when Y is not fun-like macro, ' ' is added because padding_token's
    val.source->flags & PREV_WHITE is non-zero, while when it is fun-like
    macro, we don't add ' ' in between, because source is NULL and so
    used from the next token (CPP_NAME Y), which doesn't have PREV_WHITE set.

    Now, the funlike_invocation_p condition
           if (padding == NULL
               || (!(padding->flags & PREV_WHITE) && token->val.source ==
NULL))
            padding = token;
    looks very similar to that in stringify_arg/c-ppoutput.cc, so I assume
    the intent was to prefer do the same thing and pick the right padding.
    But there are significant differences.  Both stringify_arg and
c-ppoutput.cc
    don't remember the CPP_PADDING token, but its val.source instead, while
    in funlike_invocation_p we want to remember the padding token that has the
    significant information for stringify_arg/c-ppoutput.cc.
    So, IMHO we want to overwrite padding if:
    1) padding == NULL (remember that there was any padding at all)
    2) padding->val.source == NULL (this matches the source == NULL
       case in stringify_arg)
    3) !(padding->val.source->flags & PREV_WHITE) && token->val.source == NULL
       (this matches the !(source->flags & PREV_WHITE) && token->val.source ==
NULL
       case in stringify_arg)

    2022-02-01  Jakub Jelinek  <jakub@redhat.com>

            PR preprocessor/104147
            * macro.c (funlike_invocation_p): For padding prefer a token
            with val.source non-NULL especially if it has PREV_WHITE set
            on val.source->flags.  Add gcc_assert that CPP_PADDING tokens
            don't have PREV_WHITE set in flags.

            * c-c++-common/cpp/pr104147.c: New test.

    (cherry picked from commit 95ac5635409606386259d2ff21fb61738858ca4a)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Bug preprocessor/104147] [9/10 Regression] C preprocessor may remove the standard required whitespace between the preprocessing tokens
  2022-01-20 17:30 [Bug c/104147] New: C preprocessor may remove the standard required whitespace between the preprocessing tokens pavel.morozkin at gmail dot com
                   ` (9 preceding siblings ...)
  2022-05-11  6:24 ` cvs-commit at gcc dot gnu.org
@ 2022-05-11  6:36 ` jakub at gcc dot gnu.org
  10 siblings, 0 replies; 12+ messages in thread
From: jakub at gcc dot gnu.org @ 2022-05-11  6:36 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104147

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Fixed.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-05-11  6:36 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-20 17:30 [Bug c/104147] New: C preprocessor may remove the standard required whitespace between the preprocessing tokens pavel.morozkin at gmail dot com
2022-01-20 20:59 ` [Bug preprocessor/104147] [9/10/11/12 Regression] " jsm28 at gcc dot gnu.org
2022-01-21  7:55 ` rguenth at gcc dot gnu.org
2022-01-31 14:07 ` jakub at gcc dot gnu.org
2022-01-31 14:50 ` jakub at gcc dot gnu.org
2022-02-01 19:50 ` cvs-commit at gcc dot gnu.org
2022-02-01 19:51 ` [Bug preprocessor/104147] [9/10/11 " jakub at gcc dot gnu.org
2022-02-19  8:02 ` cvs-commit at gcc dot gnu.org
2022-02-19  8:08 ` [Bug preprocessor/104147] [9/10 " jakub at gcc dot gnu.org
2022-05-10  8:23 ` cvs-commit at gcc dot gnu.org
2022-05-11  6:24 ` cvs-commit at gcc dot gnu.org
2022-05-11  6:36 ` jakub at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).