public inbox for gcc-patches@gcc.gnu.org
* [PATCH] c-family: Implement pragma_lex () for preprocess-only mode
@ 2023-06-30 22:59 Lewis Hyatt
  2023-07-26 20:58 ` Lewis Hyatt
  2023-07-26 21:36 ` Jason Merrill
  0 siblings, 2 replies; 9+ messages in thread
From: Lewis Hyatt @ 2023-06-30 22:59 UTC
  To: gcc-patches; +Cc: Jason Merrill, Lewis Hyatt

In order to support processing #pragma in preprocess-only mode (-E or
-save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
libcpp. In full compilation modes, this is accomplished by calling
pragma_lex (), which is a symbol that must be exported by the frontend, and
which is currently implemented for C and C++. Neither of those frontends
initializes its parser machinery in preprocess-only mode, and consequently
pragma_lex () does not work in this case.

Address that by adding a new function c_init_preprocess () for the frontends
to implement, which arranges for pragma_lex () to work in preprocess-only
mode, and adjusting pragma_lex () accordingly.
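
As a rough picture of what "arranging for pragma_lex () to work" can look
like, here is a standalone toy model, loosely patterned on the C++ side of
the patch (maybe_read_tokens_for_pragma_lex). It is a sketch only, not GCC
code: ToySource, ToyPragmaLexer, Token and TokType are hypothetical names
invented for this illustration. The parser object exists up front but holds
no tokens; the first lex call buffers the rest of the pragma line, and a
discard step throws away whatever the handler did not consume.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical toy types; none of these names exist in GCC.
enum class TokType { Name, String, PragmaEol };
struct Token { TokType type; std::string text; };

// Stands in for libcpp: hands out tokens one at a time.
struct ToySource {
  std::deque<Token> pending;
  Token next () {
    assert (!pending.empty ());
    Token t = pending.front ();
    pending.pop_front ();
    return t;
  }
};

// Stands in for the frontend parser that is created once, before
// preprocessing starts, and filled only when a pragma needs tokens.
struct ToyPragmaLexer {
  ToySource *src;
  std::vector<Token> buffer;
  std::size_t pos = 0;

  explicit ToyPragmaLexer (ToySource *s) : src (s) {}

  // Buffer the rest of the pragma line, but only if nothing is buffered.
  void maybe_fill () {
    if (!buffer.empty ())
      return;
    for (;;) {
      Token t = src->next ();
      buffer.push_back (t);
      if (t.type == TokType::PragmaEol)
        break;
    }
    pos = 0;
  }

  // Return the next pragma token; the end-of-line token is never consumed.
  Token lex () {
    maybe_fill ();
    Token t = buffer[pos];
    if (t.type != TokType::PragmaEol)
      ++pos;
    return t;
  }

  // Drop the remainder of the line so the next pragma starts clean.
  // (Toy design choice: fill first, so the source is advanced past the line
  // even if the handler never asked for a token.)
  void discard_to_eol () {
    maybe_fill ();
    buffer.clear ();
    pos = 0;
  }
};
```

In this toy, a handler calls lex () as often as it needs, and the caller
then runs discard_to_eol () once, which is the same division of labour the
patch uses between the early pragma handler and pragma_lex_discard_to_eol ().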

In preprocess-only mode, the preprocessor is accustomed to controlling the
interaction with libcpp, and it only knows about tokens that it has called
into libcpp itself to obtain. Since it still needs to see the tokens
obtained by pragma_lex () so that they can be streamed to the output, also
add a new libcpp callback, on_token_lex (), that ensures the preprocessor
sees these tokens too.
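
The callback idea can be sketched with a standalone toy model. This is not
libcpp: ToyReader, ToyToken, toy_get_token and the other toy_* names are
hypothetical, and the hook here takes fewer arguments than the real
on_token_lex. The point it shows is that a hook installed only while the
pragma code is lexing lets the output side record tokens it did not fetch
itself, without recording anything twice.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy types; none of these names exist in libcpp.
struct ToyReader;
struct ToyToken { std::string text; };

// Optional hook invoked for every token the reader hands out.
using TokenHook = void (*) (ToyReader *, const ToyToken *);

struct ToyReader {
  std::vector<ToyToken> input;
  std::size_t next = 0;
  TokenHook on_token_lex = nullptr;
  std::vector<std::string> streamed;  // what the preprocessed output contains
};

// Single choke point for token retrieval; calls the hook if one is set.
const ToyToken *toy_get_token (ToyReader *r) {
  assert (r->next < r->input.size ());
  const ToyToken *tok = &r->input[r->next++];
  if (r->on_token_lex)
    r->on_token_lex (r, tok);
  return tok;
}

// Output side: append one token to the preprocessed output.
void toy_stream_token (ToyReader *r, const ToyToken *tok) {
  r->streamed.push_back (tok->text);
}

// Normal preprocessing loop body: fetch a token and stream it directly.
void toy_preprocess_token (ToyReader *r) {
  toy_stream_token (r, toy_get_token (r));
}

// Pragma side: install the hook only for the duration of the read, so the
// output still sees the token even though the pragma code fetched it.
std::string toy_pragma_lex (ToyReader *r) {
  r->on_token_lex = toy_stream_token;
  std::string text = toy_get_token (r)->text;
  r->on_token_lex = nullptr;
  return text;
}
```

Resetting the hook before returning matters in this toy: if it stayed
installed, toy_preprocess_token () would stream each later token twice.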

Currently, there is one place where we are already supporting #pragma in
preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
was done by directly interfacing with libcpp, rather than making use of
pragma_lex (). Now that pragma_lex () works, that code is no longer
necessary; remove it.

gcc/c-family/ChangeLog:

	* c-common.h (c_init_preprocess): Declare new function.
	* c-opts.cc (c_common_init): Call it.
	* c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
	(pragma_diagnostic_lex): ...this.
	(pragma_diagnostic_lex_pp): Remove.
	(handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
	all modes.
	(c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
	usage.
	* c-pragma.h (pragma_lex_discard_to_eol): Declare new function.

gcc/c/ChangeLog:

	* c-parser.cc (pragma_lex): Support preprocess-only mode.
	(pragma_lex_discard_to_eol): New function.
	(c_init_preprocess): New function.

gcc/cp/ChangeLog:

	* parser.cc (c_init_preprocess): New function.
	(maybe_read_tokens_for_pragma_lex): New function.
	(pragma_lex): Support preprocess-only mode.
	(pragma_lex_discard_to_eol): New function.

libcpp/ChangeLog:

	* include/cpplib.h (struct cpp_callbacks): Add new callback
	on_token_lex.
	* macro.cc (cpp_get_token_1): Support new callback.
---

Notes:
    Hello-
    
    In r13-1544, I added support for processing `#pragma GCC diagnostic' in
    preprocess-only mode. Because pragma_lex () doesn't work in that mode, in
    that patch I called into libcpp directly to obtain the tokens needed to
    process the pragma. As part of the review, Jason noted that it would
    probably be better to make pragma_lex () usable in preprocess-only mode, and
    we decided just to add a comment about that for the time being, and to go
    ahead and implement that in the future, if it became necessary to support
    other pragmas during preprocessing.
    
    I think now is a good time to proceed with that plan, because I would like
    to fix PR87299, which is about another pragma (#pragma GCC target) not
    working in preprocess-only mode. This patch makes the necessary changes for
    pragma_lex () to work in preprocess-only mode.
    
    I have also added a new callback, on_token_lex (), to libcpp. This is so the
    preprocessor can see and stream out all the tokens that pragma_lex () gets
    from libcpp, since it won't otherwise see them.  This seemed the simplest
    approach to me. Another possibility would be to add a wrapper function in
    c-family/c-lex.cc, which would call cpp_get_token_with_location(), and then
    also stream the token in preprocess-only mode, and then change all calls
    into libcpp in that file to use the wrapper function.  The libcpp callback
    seemed cleaner to me FWIW.
    
    There are no new tests added here, since it's just a change of
    implementation covered by existing tests. Bootstrap + regtest all languages
    looks good on x86-64 Linux.
    
    Please let me know what you think. Thanks!
    
    -Lewis

 gcc/c-family/c-common.h  |  3 +++
 gcc/c-family/c-opts.cc   |  1 +
 gcc/c-family/c-pragma.cc | 56 ++++++----------------------------------
 gcc/c-family/c-pragma.h  |  2 ++
 gcc/c/c-parser.cc        | 34 ++++++++++++++++++++++++
 gcc/cp/parser.cc         | 50 +++++++++++++++++++++++++++++++++++
 libcpp/include/cpplib.h  |  4 +++
 libcpp/macro.cc          |  3 +++
 8 files changed, 105 insertions(+), 48 deletions(-)

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index b5ef5ff6b2c..78fc5248ba6 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -990,6 +990,9 @@ extern void c_parse_file (void);
 
 extern void c_parse_final_cleanups (void);
 
+/* This initializes for preprocess-only mode.  */
+extern void c_init_preprocess (void);
+
 /* These macros provide convenient access to the various _STMT nodes.  */
 
 /* Nonzero if a given STATEMENT_LIST represents the outermost binding
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index af19140e382..4961af63de8 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1232,6 +1232,7 @@ c_common_init (void)
   if (flag_preprocess_only)
     {
       c_finish_options ();
+      c_init_preprocess ();
       preprocess_file (parse_in);
       return false;
     }
diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 0d2b333cebb..73d59df3bf4 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -840,11 +840,11 @@ public:
 
 };
 
-/* When compiling normally, use pragma_lex () to obtain the needed tokens.
-   This will call into either the C or C++ frontends as appropriate.  */
+/* This will call into either the C or C++ frontends as appropriate to get
+   tokens from libcpp for the pragma.  */
 
 static void
-pragma_diagnostic_lex_normal (pragma_diagnostic_data *result)
+pragma_diagnostic_lex (pragma_diagnostic_data *result)
 {
   result->clear ();
   tree x;
@@ -866,46 +866,6 @@ pragma_diagnostic_lex_normal (pragma_diagnostic_data *result)
   result->valid = true;
 }
 
-/* When preprocessing only, pragma_lex () is not available, so obtain the
-   tokens directly from libcpp.  We also need to inform the token streamer
-   about all tokens we lex ourselves here, so it outputs them too; this is
-   done by calling c_pp_stream_token () for each.
-
-   ???  If we need to support more pragmas in the future, maybe initialize
-   this_parser with the pragma tokens and call pragma_lex () instead?  */
-
-static void
-pragma_diagnostic_lex_pp (pragma_diagnostic_data *result)
-{
-  result->clear ();
-
-  auto tok = cpp_get_token_with_location (parse_in, &result->loc_kind);
-  c_pp_stream_token (parse_in, tok, result->loc_kind);
-  if (!(tok->type == CPP_NAME || tok->type == CPP_KEYWORD))
-    return;
-  const unsigned char *const kind_u = cpp_token_as_text (parse_in, tok);
-  result->set_kind ((const char *)kind_u);
-  if (result->pd_kind == pragma_diagnostic_data::PK_INVALID)
-    return;
-
-  if (result->needs_option ())
-    {
-      tok = cpp_get_token_with_location (parse_in, &result->loc_option);
-      c_pp_stream_token (parse_in, tok, result->loc_option);
-      if (tok->type != CPP_STRING)
-	return;
-      cpp_string str;
-      if (!cpp_interpret_string_notranslate (parse_in, &tok->val.str, 1, &str,
-					     CPP_STRING)
-	  || !str.len)
-	return;
-      result->option_str = (const char *)str.text;
-      result->own_option_str = true;
-    }
-
-  result->valid = true;
-}
-
 /* Handle #pragma GCC diagnostic.  Early mode is used by frontends (such as C++)
    that do not process the deferred pragma while they are consuming tokens; they
    can use early mode to make sure diagnostics affecting the preprocessor itself
@@ -916,10 +876,7 @@ handle_pragma_diagnostic_impl ()
   static const bool want_diagnostics = (is_pp || !early);
 
   pragma_diagnostic_data data;
-  if (is_pp)
-    pragma_diagnostic_lex_pp (&data);
-  else
-    pragma_diagnostic_lex_normal (&data);
+  pragma_diagnostic_lex (&data);
 
   if (!data.kind_str)
     {
@@ -1808,7 +1765,10 @@ c_pp_invoke_early_pragma_handler (unsigned int id)
 {
   const auto data = &registered_pp_pragmas[id - PRAGMA_FIRST_EXTERNAL];
   if (data->early_handler)
-    data->early_handler (parse_in);
+    {
+      data->early_handler (parse_in);
+      pragma_lex_discard_to_eol ();
+    }
 }
 
 /* Set up front-end pragmas.  */
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 9cc95ab3ee3..198fa7723e5 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -263,7 +263,9 @@ extern tree maybe_apply_renaming_pragma (tree, tree);
 extern void maybe_apply_pragma_scalar_storage_order (tree);
 extern void add_to_renaming_pragma_list (tree, tree);
 
+/* These are to be implemented in each frontend that needs them.  */
 extern enum cpp_ttype pragma_lex (tree *, location_t *loc = NULL);
+extern void pragma_lex_discard_to_eol ();
 
 /* Flags for use with c_lex_with_flags.  The values here were picked
    so that 0 means to translate and join strings.  */
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 24a6eb6e459..aaf6d704fe6 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -13355,6 +13355,11 @@ c_parser_pragma (c_parser *parser, enum pragma_context context, bool *if_p)
 enum cpp_ttype
 pragma_lex (tree *value, location_t *loc)
 {
+  if (flag_preprocess_only)
+    /* Arrange for the preprocessor to see the tokens we're about to read,
+       since it won't see them later.  */
+    cpp_get_callbacks (parse_in)->on_token_lex = c_pp_stream_token;
+
   c_token *tok = c_parser_peek_token (the_parser);
   enum cpp_ttype ret = tok->type;
 
@@ -13373,9 +13378,29 @@ pragma_lex (tree *value, location_t *loc)
       c_parser_consume_token (the_parser);
     }
 
+  cpp_get_callbacks (parse_in)->on_token_lex = nullptr;
   return ret;
 }
 
+void
+pragma_lex_discard_to_eol ()
+{
+  if (flag_preprocess_only)
+    /* Arrange for the preprocessor to see the tokens we're about to read,
+       since it won't see them later.  */
+    cpp_get_callbacks (parse_in)->on_token_lex = c_pp_stream_token;
+
+  cpp_ttype type;
+  do
+    {
+      type = c_parser_peek_token (the_parser)->type;
+      gcc_assert (type != CPP_EOF);
+      c_parser_consume_token (the_parser);
+    } while (type != CPP_PRAGMA_EOL);
+
+  cpp_get_callbacks (parse_in)->on_token_lex = nullptr;
+}
+
 static void
 c_parser_pragma_pch_preprocess (c_parser *parser)
 {
@@ -24756,6 +24781,15 @@ c_parse_file (void)
   the_parser = NULL;
 }
 
+void
+c_init_preprocess (void)
+{
+  /* Create a parser for use by pragma_lex during preprocessing.  */
+  the_parser = ggc_alloc<c_parser> ();
+  memset (the_parser, 0, sizeof (c_parser));
+  the_parser->tokens = &the_parser->tokens_buf[0];
+}
+
 /* Parse the body of a function declaration marked with "__RTL".
 
    The RTL parser works on the level of characters read from a
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 5e2b5cba57e..b2f2e222d81 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -765,6 +765,15 @@ cp_lexer_new_main (void)
   return lexer;
 }
 
+/* Create a lexer and parser to be used during preprocess-only mode.
+   This will be filled with tokens to parse when needed by pragma_lex ().  */
+void
+c_init_preprocess ()
+{
+  gcc_assert (!the_parser);
+  the_parser = cp_parser_new (cp_lexer_alloc ());
+}
+
 /* Create a new lexer whose token stream is primed with the tokens in
    CACHE.  When these tokens are exhausted, no new tokens will be read.  */
 
@@ -49683,11 +49692,42 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
   return ret;
 }
 
+/* Helper for pragma_lex in preprocess-only mode; in this mode, we have not
+   populated the lexer with any tokens (the tokens rather being read by
+   c-ppoutput.cc's machinery), so we need to read enough tokens now to handle
+   a pragma.  */
+static void
+maybe_read_tokens_for_pragma_lex ()
+{
+  const auto lexer = the_parser->lexer;
+  if (!lexer->buffer->is_empty ())
+    return;
+
+  /* Arrange for the preprocessor to see the tokens we're about to read,
+     since it won't see them later.  */
+  cpp_get_callbacks (parse_in)->on_token_lex = c_pp_stream_token;
+
+  /* Read the rest of the tokens comprising the pragma line.  */
+  cp_token *tok;
+  do
+    {
+      tok = vec_safe_push (lexer->buffer, cp_token ());
+      cp_lexer_get_preprocessor_token (C_LEX_STRING_NO_JOIN, tok);
+      gcc_assert (tok->type != CPP_EOF);
+    } while (tok->type != CPP_PRAGMA_EOL);
+  lexer->next_token = lexer->buffer->address ();
+  lexer->last_token = lexer->next_token + lexer->buffer->length () - 1;
+  cpp_get_callbacks (parse_in)->on_token_lex = nullptr;
+}
+
 /* The interface the pragma parsers have to the lexer.  */
 
 enum cpp_ttype
 pragma_lex (tree *value, location_t *loc)
 {
+  if (flag_preprocess_only)
+    maybe_read_tokens_for_pragma_lex ();
+
   cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
   enum cpp_ttype ret = tok->type;
 
@@ -49710,6 +49750,16 @@ pragma_lex (tree *value, location_t *loc)
   return ret;
 }
 
+void
+pragma_lex_discard_to_eol ()
+{
+  /* We have already read all the tokens, so we just need to discard
+     them here.  */
+  const auto lexer = the_parser->lexer;
+  lexer->next_token = lexer->last_token;
+  lexer->buffer->truncate (0);
+}
+
 \f
 /* External interface.  */
 
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index aef703f8111..8b63204df0e 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -784,6 +784,10 @@ struct cpp_callbacks
      cpp_buffer containing the translation if translating.  */
   char *(*translate_include) (cpp_reader *, line_maps *, location_t,
 			      const char *path);
+
+  /* Called when cpp_get_token() / cpp_get_token_with_location()
+     have produced a token.  */
+  void (*on_token_lex) (cpp_reader *, const cpp_token *, location_t);
 };
 
 #ifdef VMS
diff --git a/libcpp/macro.cc b/libcpp/macro.cc
index dada8fea835..ebbc1618a71 100644
--- a/libcpp/macro.cc
+++ b/libcpp/macro.cc
@@ -3135,6 +3135,9 @@ cpp_get_token_1 (cpp_reader *pfile, location_t *location)
 	}
     }
 
+  if (pfile->cb.on_token_lex)
+    pfile->cb.on_token_lex (pfile, result,
+			    location ? *location : result->src_loc);
   return result;
 }
 


* Re: [PATCH] c-family: Implement pragma_lex () for preprocess-only mode
  2023-06-30 22:59 [PATCH] c-family: Implement pragma_lex () for preprocess-only mode Lewis Hyatt
@ 2023-07-26 20:58 ` Lewis Hyatt
  2023-07-26 21:36 ` Jason Merrill
  1 sibling, 0 replies; 9+ messages in thread
From: Lewis Hyatt @ 2023-07-26 20:58 UTC
  To: gcc-patches; +Cc: Jason Merrill

May I please ping this?
I am just about ready with the followup patch that fixes PR87299, but
it depends on this one. Thanks!
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/623364.html

-Lewis



* Re: [PATCH] c-family: Implement pragma_lex () for preprocess-only mode
  2023-06-30 22:59 [PATCH] c-family: Implement pragma_lex () for preprocess-only mode Lewis Hyatt
  2023-07-26 20:58 ` Lewis Hyatt
@ 2023-07-26 21:36 ` Jason Merrill
  2023-07-26 22:25   ` Lewis Hyatt
  2023-07-27 22:59   ` [PATCH v2] " Lewis Hyatt
  1 sibling, 2 replies; 9+ messages in thread
From: Jason Merrill @ 2023-07-26 21:36 UTC
  To: Lewis Hyatt, gcc-patches

On 6/30/23 18:59, Lewis Hyatt wrote:
> In order to support processing #pragma in preprocess-only mode (-E or
> -save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
> libcpp. In full compilation modes, this is accomplished by calling
> pragma_lex (), which is a symbol that must be exported by the frontend, and
> which is currently implemented for C and C++. Neither of those frontends
> initializes its parser machinery in preprocess-only mode, and consequently
> pragma_lex () does not work in this case.
> 
> Address that by adding a new function c_init_preprocess () for the frontends
> to implement, which arranges for pragma_lex () to work in preprocess-only
> mode, and adjusting pragma_lex () accordingly.
> 
> In preprocess-only mode, the preprocessor is accustomed to controlling the
> interaction with libcpp, and it only knows about tokens that it has called
> into libcpp itself to obtain. Since it still needs to see the tokens
> obtained by pragma_lex () so that they can be streamed to the output, also
> add a new libcpp callback, on_token_lex (), that ensures the preprocessor
> sees these tokens too.
> 
> Currently, there is one place where we are already supporting #pragma in
> preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
> was done by directly interfacing with libcpp, rather than making use of
> pragma_lex (). Now that pragma_lex () works, that code is no longer
> necessary; remove it.
> 
> gcc/c-family/ChangeLog:
> 
> 	* c-common.h (c_init_preprocess): Declare new function.
> 	* c-opts.cc (c_common_init): Call it.
> 	* c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
> 	(pragma_diagnostic_lex): ...this.
> 	(pragma_diagnostic_lex_pp): Remove.
> 	(handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
> 	all modes.
> 	(c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
> 	usage.
> 	* c-pragma.h (pragma_lex_discard_to_eol): Declare new function.
> 
> gcc/c/ChangeLog:
> 
> 	* c-parser.cc (pragma_lex): Support preprocess-only mode.
> 	(pragma_lex_discard_to_eol): New function.
> 	(c_init_preprocess): New function.
> 
> gcc/cp/ChangeLog:
> 
> 	* parser.cc (c_init_preprocess): New function.
> 	(maybe_read_tokens_for_pragma_lex): New function.
> 	(pragma_lex): Support preprocess-only mode.
> 	(pragma_lex_discard_to_eol): New function.
> 
> libcpp/ChangeLog:
> 
> 	* include/cpplib.h (struct cpp_callbacks): Add new callback
> 	on_token_lex.
> 	* macro.cc (cpp_get_token_1): Support new callback.
> ---
> 
> Notes:
>      Hello-
>      
>      In r13-1544, I added support for processing `#pragma GCC diagnostic' in
>      preprocess-only mode. Because pragma_lex () doesn't work in that mode, in
>      that patch I called into libcpp directly to obtain the tokens needed to
>      process the pragma. As part of the review, Jason noted that it would
>      probably be better to make pragma_lex () usable in preprocess-only mode, and
>      we decided just to add a comment about that for the time being, and to go
>      ahead and implement that in the future, if it became necessary to support
>      other pragmas during preprocessing.
>      
>      I think now is a good time to proceed with that plan, because I would like
>      to fix PR87299, which is about another pragma (#pragma GCC target) not
>      working in preprocess-only mode. This patch makes the necessary changes for
>      pragma_lex () to work in preprocess-only mode.
>      
>      I have also added a new callback, on_token_lex (), to libcpp. This is so the
>      preprocessor can see and stream out all the tokens that pragma_lex () gets
>      from libcpp, since it won't otherwise see them.  This seemed the simplest
>      approach to me. Another possibility would be to add a wrapper function in
>      c-family/c-lex.cc, which would call cpp_get_token_with_location(), and then
>      also stream the token in preprocess-only mode, and then change all calls
>      into libcpp in that file to use the wrapper function.  The libcpp callback
>      seemed cleaner to me FWIW.

I think the other way sounds better to me; there are only three calls to 
cpp_get_... in c_lex_with_flags.

The rest of the patch looks good.

Jason



* Re: [PATCH] c-family: Implement pragma_lex () for preprocess-only mode
  2023-07-26 21:36 ` Jason Merrill
@ 2023-07-26 22:25   ` Lewis Hyatt
  2023-07-27 22:59   ` [PATCH v2] " Lewis Hyatt
  1 sibling, 0 replies; 9+ messages in thread
From: Lewis Hyatt @ 2023-07-26 22:25 UTC (permalink / raw)
  To: Jason Merrill; +Cc: gcc-patches

On Wed, Jul 26, 2023 at 5:36 PM Jason Merrill <jason@redhat.com> wrote:
>
> On 6/30/23 18:59, Lewis Hyatt wrote:
> > In order to support processing #pragma in preprocess-only mode (-E or
> > -save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
> > libcpp. In full compilation modes, this is accomplished by calling
> > pragma_lex (), which is a symbol that must be exported by the frontend, and
> > which is currently implemented for C and C++. Neither of those frontends
> > initializes its parser machinery in preprocess-only mode, and consequently
> > pragma_lex () does not work in this case.
> >
> > Address that by adding a new function c_init_preprocess () for the frontends
> > to implement, which arranges for pragma_lex () to work in preprocess-only
> > mode, and adjusting pragma_lex () accordingly.
> >
> > In preprocess-only mode, the preprocessor is accustomed to controlling the
> > interaction with libcpp, and it only knows about tokens that it has called
> > into libcpp itself to obtain. Since it still needs to see the tokens
> > obtained by pragma_lex () so that they can be streamed to the output, also
> > add a new libcpp callback, on_token_lex (), that ensures the preprocessor
> > sees these tokens too.
> >
> > Currently, there is one place where we are already supporting #pragma in
> > preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
> > was done by directly interfacing with libcpp, rather than making use of
> > pragma_lex (). Now that pragma_lex () works, that code is no longer
> > necessary; remove it.
> >
> > gcc/c-family/ChangeLog:
> >
> >       * c-common.h (c_init_preprocess): Declare new function.
> >       * c-opts.cc (c_common_init): Call it.
> >       * c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
> >       (pragma_diagnostic_lex): ...this.
> >       (pragma_diagnostic_lex_pp): Remove.
> >       (handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
> >       all modes.
> >       (c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
> >       usage.
> >       * c-pragma.h (pragma_lex_discard_to_eol): Declare new function.
> >
> > gcc/c/ChangeLog:
> >
> >       * c-parser.cc (pragma_lex): Support preprocess-only mode.
> >       (pragma_lex_discard_to_eol): New function.
> >       (c_init_preprocess): New function.
> >
> > gcc/cp/ChangeLog:
> >
> >       * parser.cc (c_init_preprocess): New function.
> >       (maybe_read_tokens_for_pragma_lex): New function.
> >       (pragma_lex): Support preprocess-only mode.
> >       (pragma_lex_discard_to_eol): New function.
> >
> > libcpp/ChangeLog:
> >
> >       * include/cpplib.h (struct cpp_callbacks): Add new callback
> >       on_token_lex.
> >       * macro.cc (cpp_get_token_1): Support new callback.
> > ---
> >
> > Notes:
> >      Hello-
> >
> >      In r13-1544, I added support for processing `#pragma GCC diagnostic' in
> >      preprocess-only mode. Because pragma_lex () doesn't work in that mode, in
> >      that patch I called into libcpp directly to obtain the tokens needed to
> >      process the pragma. As part of the review, Jason noted that it would
> >      probably be better to make pragma_lex () usable in preprocess-only mode, and
> >      we decided just to add a comment about that for the time being, and to go
> >      ahead and implement that in the future, if it became necessary to support
> >      other pragmas during preprocessing.
> >
> >      I think now is a good time to proceed with that plan, because I would like
> >      to fix PR87299, which is about another pragma (#pragma GCC target) not
> >      working in preprocess-only mode. This patch makes the necessary changes for
> >      pragma_lex () to work in preprocess-only mode.
> >
> >      I have also added a new callback, on_token_lex (), to libcpp. This is so the
> >      preprocessor can see and stream out all the tokens that pragma_lex () gets
> >      from libcpp, since it won't otherwise see them.  This seemed the simplest
> >      approach to me. Another possibility would be to add a wrapper function in
> >      c-family/c-lex.cc, which would call cpp_get_token_with_location(), and then
> >      also stream the token in preprocess-only mode, and then change all calls
> >      into libcpp in that file to use the wrapper function.  The libcpp callback
> >      seemed cleaner to me FWIW.
>
> I think the other way sounds better to me; there are only three calls to
> cpp_get_... in c_lex_with_flags.
>
> The rest of the patch looks good.

Thank you very much for the feedback. I will test it this way and send
the updated version.
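
As a rough illustration of the wrapper approach agreed on above, here is a
minimal, self-contained C++ sketch. It is not GCC code: `Token`, `Reader`,
`streamed`, `stream_tokens`, and `consume_pragma_line` are hypothetical
stand-ins for libcpp's token source and the preprocessor's output side. The
only point is that every token fetch in one file goes through a single
wrapper, which forwards the token to the output when a flag is set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these are GCC or libcpp types.
struct Token
{
  std::string text;
  bool eol;   // true for the end-of-pragma-line marker
};

// Stand-in for the token source: hands out tokens from a fixed list.
struct Reader
{
  std::vector<Token> tokens;
  std::size_t next = 0;

  const Token *get ()
  {
    assert (next < tokens.size ());
    return &tokens[next++];
  }
};

// Stand-in for the preprocessor's output stream.
static std::vector<std::string> streamed;

// Set once when running in the toy "preprocess-only" mode.
static bool stream_tokens = false;

// The wrapper: callers in this file use it instead of Reader::get (),
// so the output side sees each token they obtain.
static const Token *
get_token (Reader &r)
{
  const Token *tok = r.get ();
  if (stream_tokens)
    streamed.push_back (tok->text);
  return tok;
}

// A toy pragma handler that lexes to end of line via the wrapper and
// returns the number of tokens seen before the end-of-line marker.
static int
consume_pragma_line (Reader &r)
{
  int n = 0;
  while (!get_token (r)->eol)
    ++n;
  return n;
}
```

Compared with a callback inside the reader, this keeps the streaming decision
local to the one file that fetches tokens, at the cost of routing each call
site through the wrapper.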

-Lewis


* [PATCH v2] c-family: Implement pragma_lex () for preprocess-only mode
  2023-07-26 21:36 ` Jason Merrill
  2023-07-26 22:25   ` Lewis Hyatt
@ 2023-07-27 22:59   ` Lewis Hyatt
  2023-07-28  1:18     ` Jason Merrill
  1 sibling, 1 reply; 9+ messages in thread
From: Lewis Hyatt @ 2023-07-27 22:59 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jason Merrill, Lewis Hyatt

In order to support processing #pragma in preprocess-only mode (-E or
-save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
libcpp. In full compilation modes, this is accomplished by calling
pragma_lex (), which is a symbol that must be exported by the frontend, and
which is currently implemented for C and C++. Neither of those frontends
initializes its parser machinery in preprocess-only mode, and consequently
pragma_lex () does not work in this case.

Address that by adding a new function c_init_preprocess () for the frontends
to implement, which arranges for pragma_lex () to work in preprocess-only
mode, and adjusting pragma_lex () accordingly.

In preprocess-only mode, the preprocessor is accustomed to controlling the
interaction with libcpp, and it only knows about tokens that it has called
into libcpp itself to obtain. Since it still needs to see the tokens
obtained by pragma_lex () so that they can be streamed to the output, also
adjust c_lex_with_flags () and related functions in c-family/c-lex.cc to
inform the preprocessor about any tokens it won't be aware of.

Currently, there is one place where we are already supporting #pragma in
preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
was done by directly interfacing with libcpp, rather than making use of
pragma_lex (). Now that pragma_lex () works, that code is no longer
necessary; remove it.

gcc/c-family/ChangeLog:

	* c-common.h (c_init_preprocess): Declare.
	(c_lex_enable_token_streaming): Declare.
	* c-opts.cc (c_common_init): Call c_init_preprocess ().
	* c-lex.cc (stream_tokens_to_preprocessor): New static variable.
	(c_lex_enable_token_streaming): New function.
	(cb_def_pragma): Add a comment.
	(get_token): New function wrapping cpp_get_token.
	(c_lex_with_flags): Use the new wrapper function to support
	obtaining tokens in preprocess_only mode.
	(lex_string): Likewise.
	* c-ppoutput.cc (preprocess_file): Call c_lex_enable_token_streaming
	when needed.
	* c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
	(pragma_diagnostic_lex): ...this.
	(pragma_diagnostic_lex_pp): Remove.
	(handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
	all modes.
	(c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
	usage.
	* c-pragma.h (pragma_lex_discard_to_eol): Declare.

gcc/c/ChangeLog:

	* c-parser.cc (pragma_lex_discard_to_eol): New function.
	(c_init_preprocess): New function.

gcc/cp/ChangeLog:

	* parser.cc (c_init_preprocess): New function.
	(maybe_read_tokens_for_pragma_lex): New function.
	(pragma_lex): Support preprocess-only mode.
	(pragma_lex_discard_to_eol): New function.
---

Notes:
    Hello-
    
    Here is version 2 of the patch, incorporating Jason's feedback from
    https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625591.html
    
    Thanks again; please let me know if it's OK. Bootstrap + regtest of all
    languages on x86-64 Linux looks good.
    
    -Lewis
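
To make the C++ side of the design easier to follow, the following is a toy
sketch of the buffer-then-discard pattern: read the whole pragma line lazily on
the first request, hand tokens out one at a time, then drop whatever the
handler left unread. All names (`Kind`, `Token`, `Source`, `PragmaLexer`) are
hypothetical and the types are simplified stand-ins, not the real cp_lexer or
libcpp structures. Staying on the end-of-line token instead of consuming it is
a choice made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-ins; not GCC types.
enum class Kind { Name, String, PragmaEol };

struct Token
{
  Kind kind;
  std::string text;
};

// Stand-in for the token source (libcpp in the real patch).
struct Source
{
  std::deque<Token> pending;

  Token next ()
  {
    assert (!pending.empty ());
    Token t = pending.front ();
    pending.pop_front ();
    return t;
  }
};

// Toy lexer whose buffer is only filled on demand.
struct PragmaLexer
{
  Source &src;
  std::vector<Token> buffer;
  std::size_t pos = 0;

  explicit PragmaLexer (Source &s) : src (s) {}

  // Read the whole pragma line once, the first time a token is requested.
  void fill_if_empty ()
  {
    if (!buffer.empty ())
      return;
    do
      buffer.push_back (src.next ());
    while (buffer.back ().kind != Kind::PragmaEol);
    pos = 0;
  }

  // Return the next token of the line; at the end, keep returning the
  // end-of-line token rather than stepping past it.
  const Token &lex ()
  {
    fill_if_empty ();
    const Token &t = buffer[pos];
    if (t.kind != Kind::PragmaEol)
      ++pos;
    return t;
  }

  // Drop whatever the handler did not consume.
  void discard_to_eol ()
  {
    buffer.clear ();
    pos = 0;
  }
};
```

Because the whole line is buffered up front, discarding the remainder is just
a matter of clearing the buffer; tokens after the pragma line stay in the
source untouched.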

 gcc/c-family/c-common.h    |  4 +++
 gcc/c-family/c-lex.cc      | 49 +++++++++++++++++++++++++++++----
 gcc/c-family/c-opts.cc     |  1 +
 gcc/c-family/c-ppoutput.cc | 17 +++++++++---
 gcc/c-family/c-pragma.cc   | 56 ++++++--------------------------------
 gcc/c-family/c-pragma.h    |  2 ++
 gcc/c/c-parser.cc          | 21 ++++++++++++++
 gcc/cp/parser.cc           | 45 ++++++++++++++++++++++++++++++
 8 files changed, 138 insertions(+), 57 deletions(-)

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index b5ef5ff6b2c..2fe2f194660 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -990,6 +990,9 @@ extern void c_parse_file (void);
 
 extern void c_parse_final_cleanups (void);
 
+/* This initializes for preprocess-only mode.  */
+extern void c_init_preprocess (void);
+
 /* These macros provide convenient access to the various _STMT nodes.  */
 
 /* Nonzero if a given STATEMENT_LIST represents the outermost binding
@@ -1214,6 +1217,7 @@ extern tree c_build_bind_expr (location_t, tree, tree);
 /* In c-lex.cc.  */
 extern enum cpp_ttype
 conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind);
+extern void c_lex_enable_token_streaming (bool enabled);
 
 /* In c-pch.cc  */
 extern void pch_init (void);
diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
index dcd061c7cb1..ac4c018d863 100644
--- a/gcc/c-family/c-lex.cc
+++ b/gcc/c-family/c-lex.cc
@@ -57,6 +57,17 @@ static void cb_ident (cpp_reader *, unsigned int, const cpp_string *);
 static void cb_def_pragma (cpp_reader *, unsigned int);
 static void cb_define (cpp_reader *, unsigned int, cpp_hashnode *);
 static void cb_undef (cpp_reader *, unsigned int, cpp_hashnode *);
+
+/* Flag to remember if we are in a mode (such as flag_preprocess_only) in which
+   tokens obtained here need to be streamed to the preprocessor.  */
+static bool stream_tokens_to_preprocessor = false;
+
+void
+c_lex_enable_token_streaming (bool enabled)
+{
+  stream_tokens_to_preprocessor = enabled;
+}
+
 \f
 void
 init_c_lex (void)
@@ -249,6 +260,10 @@ cb_def_pragma (cpp_reader *pfile, location_t loc)
       location_t fe_loc = loc;
 
       space = name = (const unsigned char *) "";
+
+      /* N.B.  It's fine to call cpp_get_token () directly here (rather than our
+	 local wrapper get_token ()), because this callback is not used with
+	 flag_preprocess_only==true.  */
       s = cpp_get_token (pfile);
       if (s->type != CPP_EOF)
 	{
@@ -284,8 +299,32 @@ cb_undef (cpp_reader *pfile, location_t loc, cpp_hashnode *node)
 			 (const char *) NODE_NAME (node));
 }
 
+/* Wrapper around cpp_get_token_with_location to stream the token to the
+   preprocessor so it can output it.  This is necessary with
+   flag_preprocess_only if we are obtaining tokens here instead of from the loop
+   in c-ppoutput.cc, such as while processing a #pragma.  */
+
+static const cpp_token *
+get_token (cpp_reader *pfile, location_t *loc = nullptr)
+{
+  if (stream_tokens_to_preprocessor)
+    {
+      location_t x;
+      if (!loc)
+	loc = &x;
+      const auto tok = cpp_get_token_with_location (pfile, loc);
+      c_pp_stream_token (pfile, tok, *loc);
+      return tok;
+    }
+  else
+    return cpp_get_token_with_location (pfile, loc);
+}
+
 /* Wrapper around cpp_get_token to skip CPP_PADDING tokens
-   and not consume CPP_EOF.  */
+   and not consume CPP_EOF.  This does not perform the optional
+   streaming in preprocess_only mode, so is suitable to be used
+   when processing builtin expansions such as c_common_has_attribute.  */
+
 static const cpp_token *
 get_token_no_padding (cpp_reader *pfile)
 {
@@ -492,7 +531,7 @@ c_lex_with_flags (tree *value, location_t *loc, unsigned char *cpp_flags,
 
   timevar_push (TV_CPP);
  retry:
-  tok = cpp_get_token_with_location (parse_in, loc);
+  tok = get_token (parse_in, loc);
   type = tok->type;
 
  retry_after_at:
@@ -566,7 +605,7 @@ c_lex_with_flags (tree *value, location_t *loc, unsigned char *cpp_flags,
 	  location_t newloc;
 
 	retry_at:
-	  tok = cpp_get_token_with_location (parse_in, &newloc);
+	  tok = get_token (parse_in, &newloc);
 	  type = tok->type;
 	  switch (type)
 	    {
@@ -716,7 +755,7 @@ c_lex_with_flags (tree *value, location_t *loc, unsigned char *cpp_flags,
 	{
 	  do
 	    {
-	      tok = cpp_get_token_with_location (parse_in, loc);
+	      tok = get_token (parse_in, loc);
 	      type = tok->type;
 	    }
 	  while (type == CPP_PADDING || type == CPP_COMMENT);
@@ -1308,7 +1347,7 @@ lex_string (const cpp_token *tok, tree *valp, bool objc_string, bool translate)
   bool objc_at_sign_was_seen = false;
 
  retry:
-  tok = cpp_get_token (parse_in);
+  tok = get_token (parse_in);
   switch (tok->type)
     {
     case CPP_PADDING:
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index af19140e382..4961af63de8 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1232,6 +1232,7 @@ c_common_init (void)
   if (flag_preprocess_only)
     {
       c_finish_options ();
+      c_init_preprocess ();
       preprocess_file (parse_in);
       return false;
     }
diff --git a/gcc/c-family/c-ppoutput.cc b/gcc/c-family/c-ppoutput.cc
index 4aa2bef2c0f..01a0bd79b13 100644
--- a/gcc/c-family/c-ppoutput.cc
+++ b/gcc/c-family/c-ppoutput.cc
@@ -99,11 +99,20 @@ preprocess_file (cpp_reader *pfile)
     }
   else if (cpp_get_options (pfile)->traditional)
     scan_translation_unit_trad (pfile);
-  else if (cpp_get_options (pfile)->directives_only
-	   && !cpp_get_options (pfile)->preprocessed)
-    scan_translation_unit_directives_only (pfile);
   else
-    scan_translation_unit (pfile);
+    {
+      /* If we end up processing a pragma while preprocessing, the handler
+	 for that pragma may end up obtaining some tokens from libcpp itself,
+	 e.g. by calling pragma_lex ().  The frontend needs to know that it
+	 should inform us about all such tokens, so we can output them.  */
+      c_lex_enable_token_streaming (true);
+
+      if (cpp_get_options (pfile)->directives_only
+	  && !cpp_get_options (pfile)->preprocessed)
+	scan_translation_unit_directives_only (pfile);
+      else
+	scan_translation_unit (pfile);
+    }
 
   /* -dM command line option.  Should this be elsewhere?  */
   if (flag_dump_macros == 'M')
diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 0d2b333cebb..73d59df3bf4 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -840,11 +840,11 @@ public:
 
 };
 
-/* When compiling normally, use pragma_lex () to obtain the needed tokens.
-   This will call into either the C or C++ frontends as appropriate.  */
+/* This will call into either the C or C++ frontends as appropriate to get
+   tokens from libcpp for the pragma.  */
 
 static void
-pragma_diagnostic_lex_normal (pragma_diagnostic_data *result)
+pragma_diagnostic_lex (pragma_diagnostic_data *result)
 {
   result->clear ();
   tree x;
@@ -866,46 +866,6 @@ pragma_diagnostic_lex_normal (pragma_diagnostic_data *result)
   result->valid = true;
 }
 
-/* When preprocessing only, pragma_lex () is not available, so obtain the
-   tokens directly from libcpp.  We also need to inform the token streamer
-   about all tokens we lex ourselves here, so it outputs them too; this is
-   done by calling c_pp_stream_token () for each.
-
-   ???  If we need to support more pragmas in the future, maybe initialize
-   this_parser with the pragma tokens and call pragma_lex () instead?  */
-
-static void
-pragma_diagnostic_lex_pp (pragma_diagnostic_data *result)
-{
-  result->clear ();
-
-  auto tok = cpp_get_token_with_location (parse_in, &result->loc_kind);
-  c_pp_stream_token (parse_in, tok, result->loc_kind);
-  if (!(tok->type == CPP_NAME || tok->type == CPP_KEYWORD))
-    return;
-  const unsigned char *const kind_u = cpp_token_as_text (parse_in, tok);
-  result->set_kind ((const char *)kind_u);
-  if (result->pd_kind == pragma_diagnostic_data::PK_INVALID)
-    return;
-
-  if (result->needs_option ())
-    {
-      tok = cpp_get_token_with_location (parse_in, &result->loc_option);
-      c_pp_stream_token (parse_in, tok, result->loc_option);
-      if (tok->type != CPP_STRING)
-	return;
-      cpp_string str;
-      if (!cpp_interpret_string_notranslate (parse_in, &tok->val.str, 1, &str,
-					     CPP_STRING)
-	  || !str.len)
-	return;
-      result->option_str = (const char *)str.text;
-      result->own_option_str = true;
-    }
-
-  result->valid = true;
-}
-
 /* Handle #pragma GCC diagnostic.  Early mode is used by frontends (such as C++)
    that do not process the deferred pragma while they are consuming tokens; they
    can use early mode to make sure diagnostics affecting the preprocessor itself
@@ -916,10 +876,7 @@ handle_pragma_diagnostic_impl ()
   static const bool want_diagnostics = (is_pp || !early);
 
   pragma_diagnostic_data data;
-  if (is_pp)
-    pragma_diagnostic_lex_pp (&data);
-  else
-    pragma_diagnostic_lex_normal (&data);
+  pragma_diagnostic_lex (&data);
 
   if (!data.kind_str)
     {
@@ -1808,7 +1765,10 @@ c_pp_invoke_early_pragma_handler (unsigned int id)
 {
   const auto data = &registered_pp_pragmas[id - PRAGMA_FIRST_EXTERNAL];
   if (data->early_handler)
-    data->early_handler (parse_in);
+    {
+      data->early_handler (parse_in);
+      pragma_lex_discard_to_eol ();
+    }
 }
 
 /* Set up front-end pragmas.  */
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 9cc95ab3ee3..198fa7723e5 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -263,7 +263,9 @@ extern tree maybe_apply_renaming_pragma (tree, tree);
 extern void maybe_apply_pragma_scalar_storage_order (tree);
 extern void add_to_renaming_pragma_list (tree, tree);
 
+/* These are to be implemented in each frontend that needs them.  */
 extern enum cpp_ttype pragma_lex (tree *, location_t *loc = NULL);
+extern void pragma_lex_discard_to_eol ();
 
 /* Flags for use with c_lex_with_flags.  The values here were picked
    so that 0 means to translate and join strings.  */
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 24a6eb6e459..8cede343edd 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -13376,6 +13376,18 @@ pragma_lex (tree *value, location_t *loc)
   return ret;
 }
 
+void
+pragma_lex_discard_to_eol ()
+{
+  cpp_ttype type;
+  do
+    {
+      type = c_parser_peek_token (the_parser)->type;
+      gcc_assert (type != CPP_EOF);
+      c_parser_consume_token (the_parser);
+    } while (type != CPP_PRAGMA_EOL);
+}
+
 static void
 c_parser_pragma_pch_preprocess (c_parser *parser)
 {
@@ -24756,6 +24768,15 @@ c_parse_file (void)
   the_parser = NULL;
 }
 
+void
+c_init_preprocess (void)
+{
+  /* Create a parser for use by pragma_lex during preprocessing.  */
+  the_parser = ggc_alloc<c_parser> ();
+  memset (the_parser, 0, sizeof (c_parser));
+  the_parser->tokens = &the_parser->tokens_buf[0];
+}
+
 /* Parse the body of a function declaration marked with "__RTL".
 
    The RTL parser works on the level of characters read from a
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index d7ef5b34d42..bd9158134d1 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -765,6 +765,15 @@ cp_lexer_new_main (void)
   return lexer;
 }
 
+/* Create a lexer and parser to be used during preprocess-only mode.
+   This will be filled with tokens to parse when needed by pragma_lex ().  */
+void
+c_init_preprocess ()
+{
+  gcc_assert (!the_parser);
+  the_parser = cp_parser_new (cp_lexer_alloc ());
+}
+
 /* Create a new lexer whose token stream is primed with the tokens in
    CACHE.  When these tokens are exhausted, no new tokens will be read.  */
 
@@ -49687,11 +49696,37 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
   return ret;
 }
 
+/* Helper for pragma_lex in preprocess-only mode; in this mode, we have not
+   populated the lexer with any tokens (the tokens rather being read by
+   c-ppoutput.cc's machinery), so we need to read enough tokens now to handle
+   a pragma.  */
+static void
+maybe_read_tokens_for_pragma_lex ()
+{
+  const auto lexer = the_parser->lexer;
+  if (!lexer->buffer->is_empty ())
+    return;
+
+  /* Read the rest of the tokens comprising the pragma line.  */
+  cp_token *tok;
+  do
+    {
+      tok = vec_safe_push (lexer->buffer, cp_token ());
+      cp_lexer_get_preprocessor_token (C_LEX_STRING_NO_JOIN, tok);
+      gcc_assert (tok->type != CPP_EOF);
+    } while (tok->type != CPP_PRAGMA_EOL);
+  lexer->next_token = lexer->buffer->address ();
+  lexer->last_token = lexer->next_token + lexer->buffer->length () - 1;
+}
+
 /* The interface the pragma parsers have to the lexer.  */
 
 enum cpp_ttype
 pragma_lex (tree *value, location_t *loc)
 {
+  if (flag_preprocess_only)
+    maybe_read_tokens_for_pragma_lex ();
+
   cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
   enum cpp_ttype ret = tok->type;
 
@@ -49714,6 +49749,16 @@ pragma_lex (tree *value, location_t *loc)
   return ret;
 }
 
+void
+pragma_lex_discard_to_eol ()
+{
+  /* We have already read all the tokens, so we just need to discard
+     them here.  */
+  const auto lexer = the_parser->lexer;
+  lexer->next_token = lexer->last_token;
+  lexer->buffer->truncate (0);
+}
+
 \f
 /* External interface.  */
 


* Re: [PATCH v2] c-family: Implement pragma_lex () for preprocess-only mode
  2023-07-27 22:59   ` [PATCH v2] " Lewis Hyatt
@ 2023-07-28  1:18     ` Jason Merrill
  2023-07-28 11:14       ` Lewis Hyatt
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Merrill @ 2023-07-28  1:18 UTC (permalink / raw)
  To: Lewis Hyatt, gcc-patches

On 7/27/23 18:59, Lewis Hyatt wrote:
> In order to support processing #pragma in preprocess-only mode (-E or
> -save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
> libcpp. In full compilation modes, this is accomplished by calling
> pragma_lex (), which is a symbol that must be exported by the frontend, and
> which is currently implemented for C and C++. Neither of those frontends
> initializes its parser machinery in preprocess-only mode, and consequently
> pragma_lex () does not work in this case.
> 
> Address that by adding a new function c_init_preprocess () for the frontends
> to implement, which arranges for pragma_lex () to work in preprocess-only
> mode, and adjusting pragma_lex () accordingly.
> 
> In preprocess-only mode, the preprocessor is accustomed to controlling the
> interaction with libcpp, and it only knows about tokens that it has called
> into libcpp itself to obtain. Since it still needs to see the tokens
> obtained by pragma_lex () so that they can be streamed to the output, also
> adjust c_lex_with_flags () and related functions in c-family/c-lex.cc to
> inform the preprocessor about any tokens it won't be aware of.
> 
> Currently, there is one place where we are already supporting #pragma in
> preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
> was done by directly interfacing with libcpp, rather than making use of
> pragma_lex (). Now that pragma_lex () works, that code is no longer
> necessary; remove it.
> 
> gcc/c-family/ChangeLog:
> 
> 	* c-common.h (c_init_preprocess): Declare.
> 	(c_lex_enable_token_streaming): Declare.
> 	* c-opts.cc (c_common_init): Call c_init_preprocess ().
> 	* c-lex.cc (stream_tokens_to_preprocessor): New static variable.
> 	(c_lex_enable_token_streaming): New function.
> 	(cb_def_pragma): Add a comment.
> 	(get_token): New function wrapping cpp_get_token.
> 	(c_lex_with_flags): Use the new wrapper function to support
> 	obtaining tokens in preprocess_only mode.
> 	(lex_string): Likewise.
> 	* c-ppoutput.cc (preprocess_file): Call c_lex_enable_token_streaming
> 	when needed.
> 	* c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
> 	(pragma_diagnostic_lex): ...this.
> 	(pragma_diagnostic_lex_pp): Remove.
> 	(handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
> 	all modes.
> 	(c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
> 	usage.
> 	* c-pragma.h (pragma_lex_discard_to_eol): Declare.
> 
> gcc/c/ChangeLog:
> 
> 	* c-parser.cc (pragma_lex_discard_to_eol): New function.
> 	(c_init_preprocess): New function.
> 
> gcc/cp/ChangeLog:
> 
> 	* parser.cc (c_init_preprocess): New function.
> 	(maybe_read_tokens_for_pragma_lex): New function.
> 	(pragma_lex): Support preprocess-only mode.
> 	(pragma_lex_discard_to_eol): New function.
> ---
> 
> Notes:
>      Hello-
>      
>      Here is version 2 of the patch, incorporating Jason's feedback from
>      https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625591.html
>      
>      Thanks again; please let me know if it's OK. Bootstrap + regtest of all
>      languages on x86-64 Linux looks good.
>      
>      -Lewis
> 
>   gcc/c-family/c-common.h    |  4 +++
>   gcc/c-family/c-lex.cc      | 49 +++++++++++++++++++++++++++++----
>   gcc/c-family/c-opts.cc     |  1 +
>   gcc/c-family/c-ppoutput.cc | 17 +++++++++---
>   gcc/c-family/c-pragma.cc   | 56 ++++++--------------------------------
>   gcc/c-family/c-pragma.h    |  2 ++
>   gcc/c/c-parser.cc          | 21 ++++++++++++++
>   gcc/cp/parser.cc           | 45 ++++++++++++++++++++++++++++++
>   8 files changed, 138 insertions(+), 57 deletions(-)
> 
> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> index b5ef5ff6b2c..2fe2f194660 100644
> --- a/gcc/c-family/c-common.h
> +++ b/gcc/c-family/c-common.h
> @@ -990,6 +990,9 @@ extern void c_parse_file (void);
>   
>   extern void c_parse_final_cleanups (void);
>   
> +/* This initializes for preprocess-only mode.  */
> +extern void c_init_preprocess (void);
> +
>   /* These macros provide convenient access to the various _STMT nodes.  */
>   
>   /* Nonzero if a given STATEMENT_LIST represents the outermost binding
> @@ -1214,6 +1217,7 @@ extern tree c_build_bind_expr (location_t, tree, tree);
>   /* In c-lex.cc.  */
>   extern enum cpp_ttype
>   conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind);
> +extern void c_lex_enable_token_streaming (bool enabled);
>   
>   /* In c-pch.cc  */
>   extern void pch_init (void);
> diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
> index dcd061c7cb1..ac4c018d863 100644
> --- a/gcc/c-family/c-lex.cc
> +++ b/gcc/c-family/c-lex.cc
> @@ -57,6 +57,17 @@ static void cb_ident (cpp_reader *, unsigned int, const cpp_string *);
>   static void cb_def_pragma (cpp_reader *, unsigned int);
>   static void cb_define (cpp_reader *, unsigned int, cpp_hashnode *);
>   static void cb_undef (cpp_reader *, unsigned int, cpp_hashnode *);
> +
> +/* Flag to remember if we are in a mode (such as flag_preprocess_only) in which
> +   tokens obtained here need to be streamed to the preprocessor.  */
> +static bool stream_tokens_to_preprocessor = false;
> +
> +void
> +c_lex_enable_token_streaming (bool enabled)
> +{
> +  stream_tokens_to_preprocessor = enabled;
> +}
> +
>   \f
>   void
>   init_c_lex (void)
> @@ -249,6 +260,10 @@ cb_def_pragma (cpp_reader *pfile, location_t loc)
>         location_t fe_loc = loc;
>   
>         space = name = (const unsigned char *) "";
> +
> +      /* N.B.  It's fine to call cpp_get_token () directly here (rather than our
> +	 local wrapper get_token ()), because this callback is not used with
> +	 flag_preprocess_only==true.  */
>         s = cpp_get_token (pfile);
>         if (s->type != CPP_EOF)
>   	{
> @@ -284,8 +299,32 @@ cb_undef (cpp_reader *pfile, location_t loc, cpp_hashnode *node)
>   			 (const char *) NODE_NAME (node));
>   }
>   
> +/* Wrapper around cpp_get_token_with_location to stream the token to the
> +   preprocessor so it can output it.  This is necessary with
> +   flag_preprocess_only if we are obtaining tokens here instead of from the loop
> +   in c-ppoutput.cc, such as while processing a #pragma.  */
> +
> +static const cpp_token *
> +get_token (cpp_reader *pfile, location_t *loc = nullptr)
> +{
> +  if (stream_tokens_to_preprocessor)

We can't use flag_preprocess_only here?

> +    {
> +      location_t x;
> +      if (!loc)
> +	loc = &x;
> +      const auto tok = cpp_get_token_with_location (pfile, loc);
> +      c_pp_stream_token (pfile, tok, *loc);
> +      return tok;
> +    }
> +  else
> +    return cpp_get_token_with_location (pfile, loc);
> +}
> +
>   /* Wrapper around cpp_get_token to skip CPP_PADDING tokens
> -   and not consume CPP_EOF.  */
> +   and not consume CPP_EOF.  This does not perform the optional
> +   streaming in preprocess_only mode, so is suitable to be used
> +   when processing builtin expansions such as c_common_has_attribute.  */
> +
>   static const cpp_token *
>   get_token_no_padding (cpp_reader *pfile)
>   {
> @@ -492,7 +531,7 @@ c_lex_with_flags (tree *value, location_t *loc, unsigned char *cpp_flags,
>   
>     timevar_push (TV_CPP);
>    retry:
> -  tok = cpp_get_token_with_location (parse_in, loc);
> +  tok = get_token (parse_in, loc);
>     type = tok->type;
>   
>    retry_after_at:
> @@ -566,7 +605,7 @@ c_lex_with_flags (tree *value, location_t *loc, unsigned char *cpp_flags,
>   	  location_t newloc;
>   
>   	retry_at:
> -	  tok = cpp_get_token_with_location (parse_in, &newloc);
> +	  tok = get_token (parse_in, &newloc);
>   	  type = tok->type;
>   	  switch (type)
>   	    {
> @@ -716,7 +755,7 @@ c_lex_with_flags (tree *value, location_t *loc, unsigned char *cpp_flags,
>   	{
>   	  do
>   	    {
> -	      tok = cpp_get_token_with_location (parse_in, loc);
> +	      tok = get_token (parse_in, loc);
>   	      type = tok->type;
>   	    }
>   	  while (type == CPP_PADDING || type == CPP_COMMENT);
> @@ -1308,7 +1347,7 @@ lex_string (const cpp_token *tok, tree *valp, bool objc_string, bool translate)
>     bool objc_at_sign_was_seen = false;
>   
>    retry:
> -  tok = cpp_get_token (parse_in);
> +  tok = get_token (parse_in);
>     switch (tok->type)
>       {
>       case CPP_PADDING:
> diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
> index af19140e382..4961af63de8 100644
> --- a/gcc/c-family/c-opts.cc
> +++ b/gcc/c-family/c-opts.cc
> @@ -1232,6 +1232,7 @@ c_common_init (void)
>     if (flag_preprocess_only)
>       {
>         c_finish_options ();
> +      c_init_preprocess ();
>         preprocess_file (parse_in);
>         return false;
>       }
> diff --git a/gcc/c-family/c-ppoutput.cc b/gcc/c-family/c-ppoutput.cc
> index 4aa2bef2c0f..01a0bd79b13 100644
> --- a/gcc/c-family/c-ppoutput.cc
> +++ b/gcc/c-family/c-ppoutput.cc
> @@ -99,11 +99,20 @@ preprocess_file (cpp_reader *pfile)
>       }
>     else if (cpp_get_options (pfile)->traditional)
>       scan_translation_unit_trad (pfile);
> -  else if (cpp_get_options (pfile)->directives_only
> -	   && !cpp_get_options (pfile)->preprocessed)
> -    scan_translation_unit_directives_only (pfile);
>     else
> -    scan_translation_unit (pfile);
> +    {
> +      /* If we end up processing a pragma while preprocessing, the handler
> +	 for that pragma may end up obtaining some tokens from libcpp itself,
> +	 e.g. by calling pragma_lex ().  The frontend needs to know that it
> +	 should inform us about all such tokens, so we can output them.  */
> +      c_lex_enable_token_streaming (true);
> +
> +      if (cpp_get_options (pfile)->directives_only
> +	  && !cpp_get_options (pfile)->preprocessed)
> +	scan_translation_unit_directives_only (pfile);
> +      else
> +	scan_translation_unit (pfile);
> +    }
>   
>     /* -dM command line option.  Should this be elsewhere?  */
>     if (flag_dump_macros == 'M')
> diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
> index 0d2b333cebb..73d59df3bf4 100644
> --- a/gcc/c-family/c-pragma.cc
> +++ b/gcc/c-family/c-pragma.cc
> @@ -840,11 +840,11 @@ public:
>   
>   };
>   
> -/* When compiling normally, use pragma_lex () to obtain the needed tokens.
> -   This will call into either the C or C++ frontends as appropriate.  */
> +/* This will call into either the C or C++ frontends as appropriate to get
> +   tokens from libcpp for the pragma.  */
>   
>   static void
> -pragma_diagnostic_lex_normal (pragma_diagnostic_data *result)
> +pragma_diagnostic_lex (pragma_diagnostic_data *result)
>   {
>     result->clear ();
>     tree x;
> @@ -866,46 +866,6 @@ pragma_diagnostic_lex_normal (pragma_diagnostic_data *result)
>     result->valid = true;
>   }
>   
> -/* When preprocessing only, pragma_lex () is not available, so obtain the
> -   tokens directly from libcpp.  We also need to inform the token streamer
> -   about all tokens we lex ourselves here, so it outputs them too; this is
> -   done by calling c_pp_stream_token () for each.
> -
> -   ???  If we need to support more pragmas in the future, maybe initialize
> -   this_parser with the pragma tokens and call pragma_lex () instead?  */
> -
> -static void
> -pragma_diagnostic_lex_pp (pragma_diagnostic_data *result)
> -{
> -  result->clear ();
> -
> -  auto tok = cpp_get_token_with_location (parse_in, &result->loc_kind);
> -  c_pp_stream_token (parse_in, tok, result->loc_kind);
> -  if (!(tok->type == CPP_NAME || tok->type == CPP_KEYWORD))
> -    return;
> -  const unsigned char *const kind_u = cpp_token_as_text (parse_in, tok);
> -  result->set_kind ((const char *)kind_u);
> -  if (result->pd_kind == pragma_diagnostic_data::PK_INVALID)
> -    return;
> -
> -  if (result->needs_option ())
> -    {
> -      tok = cpp_get_token_with_location (parse_in, &result->loc_option);
> -      c_pp_stream_token (parse_in, tok, result->loc_option);
> -      if (tok->type != CPP_STRING)
> -	return;
> -      cpp_string str;
> -      if (!cpp_interpret_string_notranslate (parse_in, &tok->val.str, 1, &str,
> -					     CPP_STRING)
> -	  || !str.len)
> -	return;
> -      result->option_str = (const char *)str.text;
> -      result->own_option_str = true;
> -    }
> -
> -  result->valid = true;
> -}
> -
>   /* Handle #pragma GCC diagnostic.  Early mode is used by frontends (such as C++)
>      that do not process the deferred pragma while they are consuming tokens; they
>      can use early mode to make sure diagnostics affecting the preprocessor itself
> @@ -916,10 +876,7 @@ handle_pragma_diagnostic_impl ()
>     static const bool want_diagnostics = (is_pp || !early);
>   
>     pragma_diagnostic_data data;
> -  if (is_pp)
> -    pragma_diagnostic_lex_pp (&data);
> -  else
> -    pragma_diagnostic_lex_normal (&data);
> +  pragma_diagnostic_lex (&data);
>   
>     if (!data.kind_str)
>       {
> @@ -1808,7 +1765,10 @@ c_pp_invoke_early_pragma_handler (unsigned int id)
>   {
>     const auto data = &registered_pp_pragmas[id - PRAGMA_FIRST_EXTERNAL];
>     if (data->early_handler)
> -    data->early_handler (parse_in);
> +    {
> +      data->early_handler (parse_in);
> +      pragma_lex_discard_to_eol ();
> +    }
>   }
>   
>   /* Set up front-end pragmas.  */
> diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
> index 9cc95ab3ee3..198fa7723e5 100644
> --- a/gcc/c-family/c-pragma.h
> +++ b/gcc/c-family/c-pragma.h
> @@ -263,7 +263,9 @@ extern tree maybe_apply_renaming_pragma (tree, tree);
>   extern void maybe_apply_pragma_scalar_storage_order (tree);
>   extern void add_to_renaming_pragma_list (tree, tree);
>   
> +/* These are to be implemented in each frontend that needs them.  */
>   extern enum cpp_ttype pragma_lex (tree *, location_t *loc = NULL);
> +extern void pragma_lex_discard_to_eol ();
>   
>   /* Flags for use with c_lex_with_flags.  The values here were picked
>      so that 0 means to translate and join strings.  */
> diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
> index 24a6eb6e459..8cede343edd 100644
> --- a/gcc/c/c-parser.cc
> +++ b/gcc/c/c-parser.cc
> @@ -13376,6 +13376,18 @@ pragma_lex (tree *value, location_t *loc)
>     return ret;
>   }
>   
> +void
> +pragma_lex_discard_to_eol ()
> +{
> +  cpp_ttype type;
> +  do
> +    {
> +      type = c_parser_peek_token (the_parser)->type;
> +      gcc_assert (type != CPP_EOF);
> +      c_parser_consume_token (the_parser);
> +    } while (type != CPP_PRAGMA_EOL);
> +}
> +
>   static void
>   c_parser_pragma_pch_preprocess (c_parser *parser)
>   {
> @@ -24756,6 +24768,15 @@ c_parse_file (void)
>     the_parser = NULL;
>   }
>   
> +void
> +c_init_preprocess (void)
> +{
> +  /* Create a parser for use by pragma_lex during preprocessing.  */
> +  the_parser = ggc_alloc<c_parser> ();
> +  memset (the_parser, 0, sizeof (c_parser));
> +  the_parser->tokens = &the_parser->tokens_buf[0];
> +}
> +
>   /* Parse the body of a function declaration marked with "__RTL".
>   
>      The RTL parser works on the level of characters read from a
> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index d7ef5b34d42..bd9158134d1 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -765,6 +765,15 @@ cp_lexer_new_main (void)
>     return lexer;
>   }
>   
> +/* Create a lexer and parser to be used during preprocess-only mode.
> +   This will be filled with tokens to parse when needed by pragma_lex ().  */
> +void
> +c_init_preprocess ()
> +{
> +  gcc_assert (!the_parser);
> +  the_parser = cp_parser_new (cp_lexer_alloc ());
> +}
> +
>   /* Create a new lexer whose token stream is primed with the tokens in
>      CACHE.  When these tokens are exhausted, no new tokens will be read.  */
>   
> @@ -49687,11 +49696,37 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
>     return ret;
>   }
>   
> +/* Helper for pragma_lex in preprocess-only mode; in this mode, we have not
> +   populated the lexer with any tokens (the tokens rather being read by
> +   c-ppoutput.cc's machinery), so we need to read enough tokens now to handle
> +   a pragma.  */
> +static void
> +maybe_read_tokens_for_pragma_lex ()
> +{
> +  const auto lexer = the_parser->lexer;
> +  if (!lexer->buffer->is_empty ())
> +    return;
> +
> +  /* Read the rest of the tokens comprising the pragma line.  */
> +  cp_token *tok;
> +  do
> +    {
> +      tok = vec_safe_push (lexer->buffer, cp_token ());
> +      cp_lexer_get_preprocessor_token (C_LEX_STRING_NO_JOIN, tok);
> +      gcc_assert (tok->type != CPP_EOF);
> +    } while (tok->type != CPP_PRAGMA_EOL);
> +  lexer->next_token = lexer->buffer->address ();
> +  lexer->last_token = lexer->next_token + lexer->buffer->length () - 1;
> +}
> +
>   /* The interface the pragma parsers have to the lexer.  */
>   
>   enum cpp_ttype
>   pragma_lex (tree *value, location_t *loc)
>   {
> +  if (flag_preprocess_only)
> +    maybe_read_tokens_for_pragma_lex ();
> +
>     cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
>     enum cpp_ttype ret = tok->type;
>   
> @@ -49714,6 +49749,16 @@ pragma_lex (tree *value, location_t *loc)
>     return ret;
>   }
>   
> +void
> +pragma_lex_discard_to_eol ()
> +{
> +  /* We have already read all the tokens, so we just need to discard
> +     them here.  */
> +  const auto lexer = the_parser->lexer;
> +  lexer->next_token = lexer->last_token;
> +  lexer->buffer->truncate (0);
> +}
> +
>   \f
>   /* External interface.  */
>   
> 



* Re: [PATCH v2] c-family: Implement pragma_lex () for preprocess-only mode
  2023-07-28  1:18     ` Jason Merrill
@ 2023-07-28 11:14       ` Lewis Hyatt
  2023-07-29  6:29         ` Jason Merrill
  0 siblings, 1 reply; 9+ messages in thread
From: Lewis Hyatt @ 2023-07-28 11:14 UTC (permalink / raw)
  To: Jason Merrill; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 7064 bytes --]

On Thu, Jul 27, 2023 at 06:18:33PM -0700, Jason Merrill wrote:
> On 7/27/23 18:59, Lewis Hyatt wrote:
> > In order to support processing #pragma in preprocess-only mode (-E or
> > -save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
> > libcpp. In full compilation modes, this is accomplished by calling
> > pragma_lex (), which is a symbol that must be exported by the frontend, and
> > which is currently implemented for C and C++. Neither of those frontends
> > initializes its parser machinery in preprocess-only mode, and consequently
> > pragma_lex () does not work in this case.
> > 
> > Address that by adding a new function c_init_preprocess () for the frontends
> > to implement, which arranges for pragma_lex () to work in preprocess-only
> > mode, and adjusting pragma_lex () accordingly.
> > 
> > In preprocess-only mode, the preprocessor is accustomed to controlling the
> > interaction with libcpp, and it only knows about tokens that it has called
> > into libcpp itself to obtain. Since it still needs to see the tokens
> > obtained by pragma_lex () so that they can be streamed to the output, also
> > adjust c_lex_with_flags () and related functions in c-family/c-lex.cc to
> > inform the preprocessor about any tokens it won't be aware of.
> > 
> > Currently, there is one place where we are already supporting #pragma in
> > preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
> > was done by directly interfacing with libcpp, rather than making use of
> > pragma_lex (). Now that pragma_lex () works, that code is no longer
> > necessary; remove it.
> > 
> > gcc/c-family/ChangeLog:
> > 
> > 	* c-common.h (c_init_preprocess): Declare.
> > 	(c_lex_enable_token_streaming): Declare.
> > 	* c-opts.cc (c_common_init): Call c_init_preprocess ().
> > 	* c-lex.cc (stream_tokens_to_preprocessor): New static variable.
> > 	(c_lex_enable_token_streaming): New function.
> > 	(cb_def_pragma): Add a comment.
> > 	(get_token): New function wrapping cpp_get_token.
> > 	(c_lex_with_flags): Use the new wrapper function to support
> > 	obtaining tokens in preprocess_only mode.
> > 	(lex_string): Likewise.
> > 	* c-ppoutput.cc (preprocess_file): Call c_lex_enable_token_streaming
> > 	when needed.
> > 	* c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
> > 	(pragma_diagnostic_lex): ...this.
> > 	(pragma_diagnostic_lex_pp): Remove.
> > 	(handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
> > 	all modes.
> > 	(c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
> > 	usage.
> > 	* c-pragma.h (pragma_lex_discard_to_eol): Declare.
> > 
> > gcc/c/ChangeLog:
> > 
> > 	* c-parser.cc (pragma_lex_discard_to_eol): New function.
> > 	(c_init_preprocess): New function.
> > 
> > gcc/cp/ChangeLog:
> > 
> > 	* parser.cc (c_init_preprocess): New function.
> > 	(maybe_read_tokens_for_pragma_lex): New function.
> > 	(pragma_lex): Support preprocess-only mode.
> > 	(pragma_lex_discard_to_eol): New function.
> > ---
> > 
> > Notes:
> >      Hello-
> >      Here is version 2 of the patch, incorporating Jason's feedback from
> >      https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625591.html
> >      Thanks again, please let me know if it's OK? Bootstrap + regtest all
> >      languages on x86-64 Linux looks good.
> >      -Lewis
> > 
> >   gcc/c-family/c-common.h    |  4 +++
> >   gcc/c-family/c-lex.cc      | 49 +++++++++++++++++++++++++++++----
> >   gcc/c-family/c-opts.cc     |  1 +
> >   gcc/c-family/c-ppoutput.cc | 17 +++++++++---
> >   gcc/c-family/c-pragma.cc   | 56 ++++++--------------------------------
> >   gcc/c-family/c-pragma.h    |  2 ++
> >   gcc/c/c-parser.cc          | 21 ++++++++++++++
> >   gcc/cp/parser.cc           | 45 ++++++++++++++++++++++++++++++
> >   8 files changed, 138 insertions(+), 57 deletions(-)
> > 
> > diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> > index b5ef5ff6b2c..2fe2f194660 100644
> > --- a/gcc/c-family/c-common.h
> > +++ b/gcc/c-family/c-common.h
> > @@ -990,6 +990,9 @@ extern void c_parse_file (void);
> >   extern void c_parse_final_cleanups (void);
> > +/* This initializes for preprocess-only mode.  */
> > +extern void c_init_preprocess (void);
> > +
> >   /* These macros provide convenient access to the various _STMT nodes.  */
> >   /* Nonzero if a given STATEMENT_LIST represents the outermost binding
> > @@ -1214,6 +1217,7 @@ extern tree c_build_bind_expr (location_t, tree, tree);
> >   /* In c-lex.cc.  */
> >   extern enum cpp_ttype
> >   conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind);
> > +extern void c_lex_enable_token_streaming (bool enabled);
> >   /* In c-pch.cc  */
> >   extern void pch_init (void);
> > diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
> > index dcd061c7cb1..ac4c018d863 100644
> > --- a/gcc/c-family/c-lex.cc
> > +++ b/gcc/c-family/c-lex.cc
> > @@ -57,6 +57,17 @@ static void cb_ident (cpp_reader *, unsigned int, const cpp_string *);
> >   static void cb_def_pragma (cpp_reader *, unsigned int);
> >   static void cb_define (cpp_reader *, unsigned int, cpp_hashnode *);
> >   static void cb_undef (cpp_reader *, unsigned int, cpp_hashnode *);
> > +
> > +/* Flag to remember if we are in a mode (such as flag_preprocess_only) in which
> > +   tokens obtained here need to be streamed to the preprocessor.  */
> > +static bool stream_tokens_to_preprocessor = false;
> > +
> > +void
> > +c_lex_enable_token_streaming (bool enabled)
> > +{
> > +  stream_tokens_to_preprocessor = enabled;
> > +}
> > +
> >   \f
> >   void
> >   init_c_lex (void)
> > @@ -249,6 +260,10 @@ cb_def_pragma (cpp_reader *pfile, location_t loc)
> >         location_t fe_loc = loc;
> >         space = name = (const unsigned char *) "";
> > +
> > +      /* N.B.  It's fine to call cpp_get_token () directly here (rather than our
> > +	 local wrapper get_token ()), because this callback is not used with
> > +	 flag_preprocess_only==true.  */
> >         s = cpp_get_token (pfile);
> >         if (s->type != CPP_EOF)
> >   	{
> > @@ -284,8 +299,32 @@ cb_undef (cpp_reader *pfile, location_t loc, cpp_hashnode *node)
> >   			 (const char *) NODE_NAME (node));
> >   }
> > +/* Wrapper around cpp_get_token_with_location to stream the token to the
> > +   preprocessor so it can output it.  This is necessary with
> > +   flag_preprocess_only if we are obtaining tokens here instead of from the loop
> > +   in c-ppoutput.cc, such as while processing a #pragma.  */
> > +
> > +static const cpp_token *
> > +get_token (cpp_reader *pfile, location_t *loc = nullptr)
> > +{
> > +  if (stream_tokens_to_preprocessor)
> 
> We can't use flag_preprocess_only here?

Thanks, I had thought there could be a potential issue with needing to also
check cpp_get_options(pfile)->traditional. But looking at it more, there's no
code path currently that can end up here in traditional mode, so yes we can
eliminate stream_tokens_to_preprocessor and just check flag_preprocess_only.

The attached simplified patch does this; bootstrap + regtest look good as
well.
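
For anyone following the thread, the simplification can be pictured with a
small standalone model. This is only a sketch under stated assumptions, not
GCC code: Token, Reader, stream_token () and preprocess_only are hypothetical
stand-ins for cpp_token, cpp_reader, c_pp_stream_token () and
flag_preprocess_only. The wrapper keys off the global mode flag directly, so
no separate enable/disable switch is needed.

```cpp
// Standalone model of the get_token () wrapper; none of these names are GCC's.
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Token { std::string text; int loc; };

// Hypothetical stand-in for libcpp: hands out tokens one at a time.
struct Reader {
  std::vector<Token> toks;
  std::size_t pos = 0;

  const Token *next (int *loc)
  {
    assert (pos < toks.size ());
    const Token *t = &toks[pos++];
    if (loc)
      *loc = t->loc;
    return t;
  }
};

// Models flag_preprocess_only.
static bool preprocess_only = false;

// Models what the -E output loop has been told about.
static std::vector<std::string> streamed;

static void
stream_token (const Token *t, int /* loc */)
{
  streamed.push_back (t->text);
}

// Fetch the next token; in preprocess-only mode, also report it to the
// output side so that it still appears in the preprocessed output.
static const Token *
get_token (Reader &r, int *loc = nullptr)
{
  if (!preprocess_only)
    return r.next (loc);

  int x;
  if (!loc)
    loc = &x;   // the streaming call always needs a location
  const Token *t = r.next (loc);
  stream_token (t, *loc);
  return t;
}
```

In this model, a caller that lexes pragma tokens through get_token () needs no
knowledge of the output side; only the mode flag decides whether the tokens
are reported.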

-Lewis

[-- Attachment #2: v3-0001-c-family-Implement-pragma_lex-for-preprocess-only-m.txt --]
[-- Type: text/plain, Size: 13181 bytes --]

Subject: [PATCH] c-family: Implement pragma_lex () for preprocess-only mode

In order to support processing #pragma in preprocess-only mode (-E or
-save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
libcpp. In full compilation modes, this is accomplished by calling
pragma_lex (), which is a symbol that must be exported by the frontend, and
which is currently implemented for C and C++. Neither of those frontends
initializes its parser machinery in preprocess-only mode, and consequently
pragma_lex () does not work in this case.

Address that by adding a new function c_init_preprocess () for the frontends
to implement, which arranges for pragma_lex () to work in preprocess-only
mode, and adjusting pragma_lex () accordingly.

In preprocess-only mode, the preprocessor is accustomed to controlling the
interaction with libcpp, and it only knows about tokens that it has called
into libcpp itself to obtain. Since it still needs to see the tokens
obtained by pragma_lex () so that they can be streamed to the output, also
adjust c_lex_with_flags () and related functions in c-family/c-lex.cc to
inform the preprocessor about any tokens it won't be aware of.

Currently, there is one place where we are already supporting #pragma in
preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
was done by directly interfacing with libcpp, rather than making use of
pragma_lex (). Now that pragma_lex () works, that code is no longer
necessary; remove it.

gcc/c-family/ChangeLog:

	* c-common.h (c_init_preprocess): Declare new function.
	* c-opts.cc (c_common_init): Call it.
	* c-lex.cc (cb_def_pragma): Add a comment.
	(get_token): New function wrapping cpp_get_token.
	(c_lex_with_flags): Use the new wrapper function to support
	obtaining tokens in preprocess_only mode.
	(lex_string): Likewise.
	* c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
	(pragma_diagnostic_lex): ...this.
	(pragma_diagnostic_lex_pp): Remove.
	(handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
	all modes.
	(c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
	usage.
	* c-pragma.h (pragma_lex_discard_to_eol): Declare.

gcc/c/ChangeLog:

	* c-parser.cc (pragma_lex_discard_to_eol): New function.
	(c_init_preprocess): New function.

gcc/cp/ChangeLog:

	* parser.cc (c_init_preprocess): New function.
	(maybe_read_tokens_for_pragma_lex): New function.
	(pragma_lex): Support preprocess-only mode.
	(pragma_lex_discard_to_eol): New function.

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index b5ef5ff6b2c..78fc5248ba6 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -990,6 +990,9 @@ extern void c_parse_file (void);
 
 extern void c_parse_final_cleanups (void);
 
+/* This initializes for preprocess-only mode.  */
+extern void c_init_preprocess (void);
+
 /* These macros provide convenient access to the various _STMT nodes.  */
 
 /* Nonzero if a given STATEMENT_LIST represents the outermost binding
diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
index dcd061c7cb1..d8aa2907c51 100644
--- a/gcc/c-family/c-lex.cc
+++ b/gcc/c-family/c-lex.cc
@@ -249,6 +249,10 @@ cb_def_pragma (cpp_reader *pfile, location_t loc)
       location_t fe_loc = loc;
 
       space = name = (const unsigned char *) "";
+
+      /* N.B.  It's fine to call cpp_get_token () directly here (rather than our
+	 local wrapper get_token ()), because this callback is not used with
+	 flag_preprocess_only==true.  */
       s = cpp_get_token (pfile);
       if (s->type != CPP_EOF)
 	{
@@ -284,8 +288,32 @@ cb_undef (cpp_reader *pfile, location_t loc, cpp_hashnode *node)
 			 (const char *) NODE_NAME (node));
 }
 
+/* Wrapper around cpp_get_token_with_location to stream the token to the
+   preprocessor so it can output it.  This is necessary with
+   flag_preprocess_only if we are obtaining tokens here instead of from the loop
+   in c-ppoutput.cc, such as while processing a #pragma.  */
+
+static const cpp_token *
+get_token (cpp_reader *pfile, location_t *loc = nullptr)
+{
+  if (flag_preprocess_only)
+    {
+      location_t x;
+      if (!loc)
+	loc = &x;
+      const auto tok = cpp_get_token_with_location (pfile, loc);
+      c_pp_stream_token (pfile, tok, *loc);
+      return tok;
+    }
+  else
+    return cpp_get_token_with_location (pfile, loc);
+}
+
 /* Wrapper around cpp_get_token to skip CPP_PADDING tokens
-   and not consume CPP_EOF.  */
+   and not consume CPP_EOF.  This does not perform the optional
+   streaming in preprocess_only mode, so is suitable to be used
+   when processing builtin expansions such as c_common_has_attribute.  */
+
 static const cpp_token *
 get_token_no_padding (cpp_reader *pfile)
 {
@@ -492,7 +520,7 @@ c_lex_with_flags (tree *value, location_t *loc, unsigned char *cpp_flags,
 
   timevar_push (TV_CPP);
  retry:
-  tok = cpp_get_token_with_location (parse_in, loc);
+  tok = get_token (parse_in, loc);
   type = tok->type;
 
  retry_after_at:
@@ -566,7 +594,7 @@ c_lex_with_flags (tree *value, location_t *loc, unsigned char *cpp_flags,
 	  location_t newloc;
 
 	retry_at:
-	  tok = cpp_get_token_with_location (parse_in, &newloc);
+	  tok = get_token (parse_in, &newloc);
 	  type = tok->type;
 	  switch (type)
 	    {
@@ -716,7 +744,7 @@ c_lex_with_flags (tree *value, location_t *loc, unsigned char *cpp_flags,
 	{
 	  do
 	    {
-	      tok = cpp_get_token_with_location (parse_in, loc);
+	      tok = get_token (parse_in, loc);
 	      type = tok->type;
 	    }
 	  while (type == CPP_PADDING || type == CPP_COMMENT);
@@ -1308,7 +1336,7 @@ lex_string (const cpp_token *tok, tree *valp, bool objc_string, bool translate)
   bool objc_at_sign_was_seen = false;
 
  retry:
-  tok = cpp_get_token (parse_in);
+  tok = get_token (parse_in);
   switch (tok->type)
     {
     case CPP_PADDING:
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index af19140e382..4961af63de8 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -1232,6 +1232,7 @@ c_common_init (void)
   if (flag_preprocess_only)
     {
       c_finish_options ();
+      c_init_preprocess ();
       preprocess_file (parse_in);
       return false;
     }
diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 0d2b333cebb..73d59df3bf4 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -840,11 +840,11 @@ public:
 
 };
 
-/* When compiling normally, use pragma_lex () to obtain the needed tokens.
-   This will call into either the C or C++ frontends as appropriate.  */
+/* This will call into either the C or C++ frontends as appropriate to get
+   tokens from libcpp for the pragma.  */
 
 static void
-pragma_diagnostic_lex_normal (pragma_diagnostic_data *result)
+pragma_diagnostic_lex (pragma_diagnostic_data *result)
 {
   result->clear ();
   tree x;
@@ -866,46 +866,6 @@ pragma_diagnostic_lex_normal (pragma_diagnostic_data *result)
   result->valid = true;
 }
 
-/* When preprocessing only, pragma_lex () is not available, so obtain the
-   tokens directly from libcpp.  We also need to inform the token streamer
-   about all tokens we lex ourselves here, so it outputs them too; this is
-   done by calling c_pp_stream_token () for each.
-
-   ???  If we need to support more pragmas in the future, maybe initialize
-   this_parser with the pragma tokens and call pragma_lex () instead?  */
-
-static void
-pragma_diagnostic_lex_pp (pragma_diagnostic_data *result)
-{
-  result->clear ();
-
-  auto tok = cpp_get_token_with_location (parse_in, &result->loc_kind);
-  c_pp_stream_token (parse_in, tok, result->loc_kind);
-  if (!(tok->type == CPP_NAME || tok->type == CPP_KEYWORD))
-    return;
-  const unsigned char *const kind_u = cpp_token_as_text (parse_in, tok);
-  result->set_kind ((const char *)kind_u);
-  if (result->pd_kind == pragma_diagnostic_data::PK_INVALID)
-    return;
-
-  if (result->needs_option ())
-    {
-      tok = cpp_get_token_with_location (parse_in, &result->loc_option);
-      c_pp_stream_token (parse_in, tok, result->loc_option);
-      if (tok->type != CPP_STRING)
-	return;
-      cpp_string str;
-      if (!cpp_interpret_string_notranslate (parse_in, &tok->val.str, 1, &str,
-					     CPP_STRING)
-	  || !str.len)
-	return;
-      result->option_str = (const char *)str.text;
-      result->own_option_str = true;
-    }
-
-  result->valid = true;
-}
-
 /* Handle #pragma GCC diagnostic.  Early mode is used by frontends (such as C++)
    that do not process the deferred pragma while they are consuming tokens; they
    can use early mode to make sure diagnostics affecting the preprocessor itself
@@ -916,10 +876,7 @@ handle_pragma_diagnostic_impl ()
   static const bool want_diagnostics = (is_pp || !early);
 
   pragma_diagnostic_data data;
-  if (is_pp)
-    pragma_diagnostic_lex_pp (&data);
-  else
-    pragma_diagnostic_lex_normal (&data);
+  pragma_diagnostic_lex (&data);
 
   if (!data.kind_str)
     {
@@ -1808,7 +1765,10 @@ c_pp_invoke_early_pragma_handler (unsigned int id)
 {
   const auto data = &registered_pp_pragmas[id - PRAGMA_FIRST_EXTERNAL];
   if (data->early_handler)
-    data->early_handler (parse_in);
+    {
+      data->early_handler (parse_in);
+      pragma_lex_discard_to_eol ();
+    }
 }
 
 /* Set up front-end pragmas.  */
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 9cc95ab3ee3..198fa7723e5 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -263,7 +263,9 @@ extern tree maybe_apply_renaming_pragma (tree, tree);
 extern void maybe_apply_pragma_scalar_storage_order (tree);
 extern void add_to_renaming_pragma_list (tree, tree);
 
+/* These are to be implemented in each frontend that needs them.  */
 extern enum cpp_ttype pragma_lex (tree *, location_t *loc = NULL);
+extern void pragma_lex_discard_to_eol ();
 
 /* Flags for use with c_lex_with_flags.  The values here were picked
    so that 0 means to translate and join strings.  */
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 24a6eb6e459..8cede343edd 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -13376,6 +13376,18 @@ pragma_lex (tree *value, location_t *loc)
   return ret;
 }
 
+void
+pragma_lex_discard_to_eol ()
+{
+  cpp_ttype type;
+  do
+    {
+      type = c_parser_peek_token (the_parser)->type;
+      gcc_assert (type != CPP_EOF);
+      c_parser_consume_token (the_parser);
+    } while (type != CPP_PRAGMA_EOL);
+}
+
 static void
 c_parser_pragma_pch_preprocess (c_parser *parser)
 {
@@ -24756,6 +24768,15 @@ c_parse_file (void)
   the_parser = NULL;
 }
 
+void
+c_init_preprocess (void)
+{
+  /* Create a parser for use by pragma_lex during preprocessing.  */
+  the_parser = ggc_alloc<c_parser> ();
+  memset (the_parser, 0, sizeof (c_parser));
+  the_parser->tokens = &the_parser->tokens_buf[0];
+}
+
 /* Parse the body of a function declaration marked with "__RTL".
 
    The RTL parser works on the level of characters read from a
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index d7ef5b34d42..bd9158134d1 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -765,6 +765,15 @@ cp_lexer_new_main (void)
   return lexer;
 }
 
+/* Create a lexer and parser to be used during preprocess-only mode.
+   This will be filled with tokens to parse when needed by pragma_lex ().  */
+void
+c_init_preprocess ()
+{
+  gcc_assert (!the_parser);
+  the_parser = cp_parser_new (cp_lexer_alloc ());
+}
+
 /* Create a new lexer whose token stream is primed with the tokens in
    CACHE.  When these tokens are exhausted, no new tokens will be read.  */
 
@@ -49687,11 +49696,37 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context, bool *if_p)
   return ret;
 }
 
+/* Helper for pragma_lex in preprocess-only mode; in this mode, we have not
+   populated the lexer with any tokens (the tokens rather being read by
+   c-ppoutput.cc's machinery), so we need to read enough tokens now to handle
+   a pragma.  */
+static void
+maybe_read_tokens_for_pragma_lex ()
+{
+  const auto lexer = the_parser->lexer;
+  if (!lexer->buffer->is_empty ())
+    return;
+
+  /* Read the rest of the tokens comprising the pragma line.  */
+  cp_token *tok;
+  do
+    {
+      tok = vec_safe_push (lexer->buffer, cp_token ());
+      cp_lexer_get_preprocessor_token (C_LEX_STRING_NO_JOIN, tok);
+      gcc_assert (tok->type != CPP_EOF);
+    } while (tok->type != CPP_PRAGMA_EOL);
+  lexer->next_token = lexer->buffer->address ();
+  lexer->last_token = lexer->next_token + lexer->buffer->length () - 1;
+}
+
 /* The interface the pragma parsers have to the lexer.  */
 
 enum cpp_ttype
 pragma_lex (tree *value, location_t *loc)
 {
+  if (flag_preprocess_only)
+    maybe_read_tokens_for_pragma_lex ();
+
   cp_token *tok = cp_lexer_peek_token (the_parser->lexer);
   enum cpp_ttype ret = tok->type;
 
@@ -49714,6 +49749,16 @@ pragma_lex (tree *value, location_t *loc)
   return ret;
 }
 
+void
+pragma_lex_discard_to_eol ()
+{
+  /* We have already read all the tokens, so we just need to discard
+     them here.  */
+  const auto lexer = the_parser->lexer;
+  lexer->next_token = lexer->last_token;
+  lexer->buffer->truncate (0);
+}
+
 \f
 /* External interface.  */
 


* Re: [PATCH v2] c-family: Implement pragma_lex () for preprocess-only mode
  2023-07-28 11:14       ` Lewis Hyatt
@ 2023-07-29  6:29         ` Jason Merrill
  2023-07-31 15:56           ` Joseph Myers
  0 siblings, 1 reply; 9+ messages in thread
From: Jason Merrill @ 2023-07-29  6:29 UTC (permalink / raw)
  To: Lewis Hyatt; +Cc: gcc-patches, Joseph S. Myers, Marek Polacek

On 7/28/23 07:14, Lewis Hyatt wrote:
> On Thu, Jul 27, 2023 at 06:18:33PM -0700, Jason Merrill wrote:
>> On 7/27/23 18:59, Lewis Hyatt wrote:
>>> In order to support processing #pragma in preprocess-only mode (-E or
>>> -save-temps for gcc/g++), we need a way to obtain the #pragma tokens from
>>> libcpp. In full compilation modes, this is accomplished by calling
>>> pragma_lex (), which is a symbol that must be exported by the frontend, and
>>> which is currently implemented for C and C++. Neither of those frontends
>>> initializes its parser machinery in preprocess-only mode, and consequently
>>> pragma_lex () does not work in this case.
>>>
>>> Address that by adding a new function c_init_preprocess () for the frontends
>>> to implement, which arranges for pragma_lex () to work in preprocess-only
>>> mode, and adjusting pragma_lex () accordingly.
>>>
>>> In preprocess-only mode, the preprocessor is accustomed to controlling the
>>> interaction with libcpp, and it only knows about tokens that it has called
>>> into libcpp itself to obtain. Since it still needs to see the tokens
>>> obtained by pragma_lex () so that they can be streamed to the output, also
>>> adjust c_lex_with_flags () and related functions in c-family/c-lex.cc to
>>> inform the preprocessor about any tokens it won't be aware of.
>>>
>>> Currently, there is one place where we are already supporting #pragma in
>>> preprocess-only mode, namely the handling of `#pragma GCC diagnostic'.  That
>>> was done by directly interfacing with libcpp, rather than making use of
>>> pragma_lex (). Now that pragma_lex () works, that code is no longer
>>> necessary; remove it.
>>>
>>> gcc/c-family/ChangeLog:
>>>
>>> 	* c-common.h (c_init_preprocess): Declare.
>>> 	(c_lex_enable_token_streaming): Declare.
>>> 	* c-opts.cc (c_common_init): Call c_init_preprocess ().
>>> 	* c-lex.cc (stream_tokens_to_preprocessor): New static variable.
>>> 	(c_lex_enable_token_streaming): New function.
>>> 	(cb_def_pragma): Add a comment.
>>> 	(get_token): New function wrapping cpp_get_token.
>>> 	(c_lex_with_flags): Use the new wrapper function to support
>>> 	obtaining tokens in preprocess_only mode.
>>> 	(lex_string): Likewise.
>>> 	* c-ppoutput.cc (preprocess_file): Call c_lex_enable_token_streaming
>>> 	when needed.
>>> 	* c-pragma.cc (pragma_diagnostic_lex_normal): Rename to...
>>> 	(pragma_diagnostic_lex): ...this.
>>> 	(pragma_diagnostic_lex_pp): Remove.
>>> 	(handle_pragma_diagnostic_impl): Call pragma_diagnostic_lex () in
>>> 	all modes.
>>> 	(c_pp_invoke_early_pragma_handler): Adapt to support pragma_lex ()
>>> 	usage.
>>> 	* c-pragma.h (pragma_lex_discard_to_eol): Declare.
>>>
>>> gcc/c/ChangeLog:
>>>
>>> 	* c-parser.cc (pragma_lex_discard_to_eol): New function.
>>> 	(c_init_preprocess): New function.
>>>
>>> gcc/cp/ChangeLog:
>>>
>>> 	* parser.cc (c_init_preprocess): New function.
>>> 	(maybe_read_tokens_for_pragma_lex): New function.
>>> 	(pragma_lex): Support preprocess-only mode.
>>> 	(pragma_lex_discard_to_eol): New function.
>>> ---
>>>
>>> Notes:
>>>       Hello-
>>>       Here is version 2 of the patch, incorporating Jason's feedback from
>>>       https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625591.html
>>>       Thanks again, please let me know if it's OK? Bootstrap + regtest all
>>>       languages on x86-64 Linux looks good.
>>>       -Lewis
>>>
>>>    gcc/c-family/c-common.h    |  4 +++
>>>    gcc/c-family/c-lex.cc      | 49 +++++++++++++++++++++++++++++----
>>>    gcc/c-family/c-opts.cc     |  1 +
>>>    gcc/c-family/c-ppoutput.cc | 17 +++++++++---
>>>    gcc/c-family/c-pragma.cc   | 56 ++++++--------------------------------
>>>    gcc/c-family/c-pragma.h    |  2 ++
>>>    gcc/c/c-parser.cc          | 21 ++++++++++++++
>>>    gcc/cp/parser.cc           | 45 ++++++++++++++++++++++++++++++
>>>    8 files changed, 138 insertions(+), 57 deletions(-)
>>>
>>> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
>>> index b5ef5ff6b2c..2fe2f194660 100644
>>> --- a/gcc/c-family/c-common.h
>>> +++ b/gcc/c-family/c-common.h
>>> @@ -990,6 +990,9 @@ extern void c_parse_file (void);
>>>    extern void c_parse_final_cleanups (void);
>>> +/* This initializes for preprocess-only mode.  */
>>> +extern void c_init_preprocess (void);
>>> +
>>>    /* These macros provide convenient access to the various _STMT nodes.  */
>>>    /* Nonzero if a given STATEMENT_LIST represents the outermost binding
>>> @@ -1214,6 +1217,7 @@ extern tree c_build_bind_expr (location_t, tree, tree);
>>>    /* In c-lex.cc.  */
>>>    extern enum cpp_ttype
>>>    conflict_marker_get_final_tok_kind (enum cpp_ttype tok1_kind);
>>> +extern void c_lex_enable_token_streaming (bool enabled);
>>>    /* In c-pch.cc  */
>>>    extern void pch_init (void);
>>> diff --git a/gcc/c-family/c-lex.cc b/gcc/c-family/c-lex.cc
>>> index dcd061c7cb1..ac4c018d863 100644
>>> --- a/gcc/c-family/c-lex.cc
>>> +++ b/gcc/c-family/c-lex.cc
>>> @@ -57,6 +57,17 @@ static void cb_ident (cpp_reader *, unsigned int, const cpp_string *);
>>>    static void cb_def_pragma (cpp_reader *, unsigned int);
>>>    static void cb_define (cpp_reader *, unsigned int, cpp_hashnode *);
>>>    static void cb_undef (cpp_reader *, unsigned int, cpp_hashnode *);
>>> +
>>> +/* Flag to remember if we are in a mode (such as flag_preprocess_only) in which
>>> +   tokens obtained here need to be streamed to the preprocessor.  */
>>> +static bool stream_tokens_to_preprocessor = false;
>>> +
>>> +void
>>> +c_lex_enable_token_streaming (bool enabled)
>>> +{
>>> +  stream_tokens_to_preprocessor = enabled;
>>> +}
>>> +
>>>    \f
>>>    void
>>>    init_c_lex (void)
>>> @@ -249,6 +260,10 @@ cb_def_pragma (cpp_reader *pfile, location_t loc)
>>>          location_t fe_loc = loc;
>>>          space = name = (const unsigned char *) "";
>>> +
>>> +      /* N.B.  It's fine to call cpp_get_token () directly here (rather than our
>>> +	 local wrapper get_token ()), because this callback is not used with
>>> +	 flag_preprocess_only==true.  */
>>>          s = cpp_get_token (pfile);
>>>          if (s->type != CPP_EOF)
>>>    	{
>>> @@ -284,8 +299,32 @@ cb_undef (cpp_reader *pfile, location_t loc, cpp_hashnode *node)
>>>    			 (const char *) NODE_NAME (node));
>>>    }
>>> +/* Wrapper around cpp_get_token_with_location to stream the token to the
>>> +   preprocessor so it can output it.  This is necessary with
>>> +   flag_preprocess_only if we are obtaining tokens here instead of from the loop
>>> +   in c-ppoutput.cc, such as while processing a #pragma.  */
>>> +
>>> +static const cpp_token *
>>> +get_token (cpp_reader *pfile, location_t *loc = nullptr)
>>> +{
>>> +  if (stream_tokens_to_preprocessor)
>>
>> We can't use flag_preprocess_only here?
> 
> Thanks, I had thought there could be a potential issue with needing to also
> check cpp_get_options(pfile)->traditional. But looking at it more, there's no
> code path currently that can end up here in traditional mode, so yes we can
> eliminate stream_tokens_to_preprocessor and just check flag_preprocess_only.
> 
> The attached simplified patch does this, bootstrap + regtest look good as
> well.

LGTM, I'll let the C maintainers comment on the C parser change.

Jason
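
For readers following along, the simplification agreed on above (drop the
separate stream_tokens_to_preprocessor variable and test the preprocess-only
flag directly in the token wrapper) might look roughly like this in
isolation. It is a sketch only: Token, Reader, preprocess_only and streamed
are hypothetical stand-ins made up for the illustration, not the libcpp
reader, GCC's flag_preprocess_only, or the real output machinery in
c-ppoutput.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type.
struct Token {
  std::string text;
};

// Hypothetical reader standing in for the libcpp reader.
struct Reader {
  std::vector<Token> input;
  std::size_t pos = 0;

  // Return the next raw token, or nullptr when the input is exhausted.
  const Token *raw_get() {
    return pos < input.size() ? &input[pos++] : nullptr;
  }
};

// Hypothetical global playing the role that flag_preprocess_only plays in
// the discussion above.
bool preprocess_only = false;

// Hypothetical sink standing in for the preprocessed-output machinery.
std::vector<std::string> streamed;

// Wrapper around the raw token fetch: in preprocess-only mode, every token
// obtained through it is also forwarded to the output sink, so tokens
// consumed while handling a pragma still reach the output.
const Token *get_token(Reader &r) {
  const Token *tok = r.raw_get();
  if (tok && preprocess_only)
    streamed.push_back(tok->text);
  return tok;
}
```

With a single mode flag consulted inside the wrapper, callers do not need to
enable or disable streaming themselves, which is the point of removing the
extra variable.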



* Re: [PATCH v2] c-family: Implement pragma_lex () for preprocess-only mode
  2023-07-29  6:29         ` Jason Merrill
@ 2023-07-31 15:56           ` Joseph Myers
  0 siblings, 0 replies; 9+ messages in thread
From: Joseph Myers @ 2023-07-31 15:56 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Lewis Hyatt, gcc-patches, Marek Polacek

On Fri, 28 Jul 2023, Jason Merrill via Gcc-patches wrote:

> > Thanks, I had thought there could be a potential issue with needing to also
> > check cpp_get_options(pfile)->traditional. But looking at it more, there's
> > no
> > code path currently that can end up here in traditional mode, so yes we can
> > eliminate stream_tokens_to_preprocessor and just check flag_preprocess_only.
> > 
> > The attached simplified patch does this, bootstrap + regtest look good as
> > well.
> 
> LGTM, I'll let the C maintainers comment on the C parser change.

The C parser change is OK.

-- 
Joseph S. Myers
joseph@codesourcery.com

Thread overview: 9+ messages
2023-06-30 22:59 [PATCH] c-family: Implement pragma_lex () for preprocess-only mode Lewis Hyatt
2023-07-26 20:58 ` Lewis Hyatt
2023-07-26 21:36 ` Jason Merrill
2023-07-26 22:25   ` Lewis Hyatt
2023-07-27 22:59   ` [PATCH v2] " Lewis Hyatt
2023-07-28  1:18     ` Jason Merrill
2023-07-28 11:14       ` Lewis Hyatt
2023-07-29  6:29         ` Jason Merrill
2023-07-31 15:56           ` Joseph Myers
