public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH RESEND 0/1] RFC: P1689R5 support
@ 2022-10-04 15:11 Ben Boeckel
  2022-10-04 15:12 ` [PATCH RESEND 1/1] p1689r5: initial support Ben Boeckel
  2022-10-10 20:21 ` [PATCH RESEND 0/1] RFC: P1689R5 support Jason Merrill
  0 siblings, 2 replies; 14+ messages in thread
From: Ben Boeckel @ 2022-10-04 15:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: jason, nathan, fortran, gcc, brad.king, Ben Boeckel

This patch adds initial support for ISO C++'s [P1689R5][], a format for
describing C++ module requirements and provisions based on the source
code. This is required because compiling C++ with modules is not
embarrassingly parallel and need to be ordered to ensure that `import
some_module;` can be satisfied in time by making sure that the TU with
`export import some_module;` is compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
<some_header>;` is to search for an appropriate `.gcm` file which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
where headers are looked up "normally" and processed only as much as
scanning requires.

Testing is currently happening in CMake's CI using a prior revision of
this patch (the differences are basically the changelog, some style, and
`trtbd` instead of `p1689r5` as the format name).

For testing within GCC, I'll work on the following:

- scanning non-module source
- scanning module-importing source (`import X;`)
- scanning module-exporting source (`export module X;`)
- scanning module implementation unit (`module X;`)
- flag combinations?

Are there existing tools for handling JSON output for testing purposes?
Basically, something that I can add to the test suite that doesn't care
about whitespace, but checks the structure (with sensible replacements
for absolute paths where relevant)?

For the record, Clang has patches with similar flags and behavior by
Chuanqi Xu here:

    https://reviews.llvm.org/D134269

with the same flags (though using my old `trtbd` spelling for the
format name).

Thanks,

--Ben

Ben Boeckel (1):
  p1689r5: initial support

 gcc/ChangeLog           |   9 ++
 gcc/c-family/ChangeLog  |   6 +
 gcc/c-family/c-opts.cc  |  40 ++++++-
 gcc/c-family/c.opt      |  12 ++
 gcc/cp/ChangeLog        |   5 +
 gcc/cp/module.cc        |   3 +-
 gcc/doc/invoke.texi     |  15 +++
 gcc/fortran/ChangeLog   |   5 +
 gcc/fortran/cpp.cc      |   4 +-
 gcc/genmatch.cc         |   2 +-
 gcc/input.cc            |   4 +-
 libcpp/ChangeLog        |  11 ++
 libcpp/include/cpplib.h |  12 +-
 libcpp/include/mkdeps.h |  17 ++-
 libcpp/init.cc          |  14 ++-
 libcpp/mkdeps.cc        | 235 ++++++++++++++++++++++++++++++++++++++--
 16 files changed, 368 insertions(+), 26 deletions(-)


base-commit: d812e8cb2a920fd75768e16ca8ded59ad93c172f
-- 
2.37.3


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH RESEND 1/1] p1689r5: initial support
  2022-10-04 15:11 [PATCH RESEND 0/1] RFC: P1689R5 support Ben Boeckel
@ 2022-10-04 15:12 ` Ben Boeckel
  2022-10-04 19:12   ` Harald Anlauf
  2022-10-10 21:04   ` Jason Merrill
  2022-10-10 20:21 ` [PATCH RESEND 0/1] RFC: P1689R5 support Jason Merrill
  1 sibling, 2 replies; 14+ messages in thread
From: Ben Boeckel @ 2022-10-04 15:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: jason, nathan, fortran, gcc, brad.king, Ben Boeckel

This patch implements support for [P1689R5][] to communicate to a build
system the C++20 module dependencies to build systems so that they may
build `.gcm` files in the proper order.

Support is communicated through the following three new flags:

- `-fdeps-format=` specifies the format for the output. Currently named
  `p1689r5`.

- `-fdeps-file=` specifies the path to the file to write the format to.

- `-fdep-output=` specifies the `.o` that will be written for the TU
  that is scanned. This is required so that the build system can
  correlate the dependency output with the actual compilation that will
  occur.

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While it remains
experimental, docs may be found in CMake's documentation for
experimental features.

Future work may include using this format for Fortran module
dependencies as well, however this is still pending work.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are 100%
unsupported right now because the `-E` mechanism wants to import their
BMIs. A new mode (i.e., something more workable than existing `-E`
behavior) that mocks up header units as if they were imported purely
from their path and content would be required.

- non-utf8 paths

The current standard says that paths that are not unambiguously
represented using UTF-8 are not supported (because these cases are rare
and the extra complication is not worth it at this time). Future
versions of the format might have ways of encoding non-UTF-8 paths. For
now, this patch just doesn't support non-UTF-8 paths (ignoring the
"unambiguously represetable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Sometimes it seems like the file gets a lot of `NUL` bytes appended to
it. It happens rarely and seems to be the result of some
`ftruncate`-style call which results in extra padding in the contents.
Noting it here as an observation at least.

Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
---
 gcc/ChangeLog           |   9 ++
 gcc/c-family/ChangeLog  |   6 +
 gcc/c-family/c-opts.cc  |  40 ++++++-
 gcc/c-family/c.opt      |  12 ++
 gcc/cp/ChangeLog        |   5 +
 gcc/cp/module.cc        |   3 +-
 gcc/doc/invoke.texi     |  15 +++
 gcc/fortran/ChangeLog   |   5 +
 gcc/fortran/cpp.cc      |   4 +-
 gcc/genmatch.cc         |   2 +-
 gcc/input.cc            |   4 +-
 libcpp/ChangeLog        |  11 ++
 libcpp/include/cpplib.h |  12 +-
 libcpp/include/mkdeps.h |  17 ++-
 libcpp/init.cc          |  14 ++-
 libcpp/mkdeps.cc        | 235 ++++++++++++++++++++++++++++++++++++++--
 16 files changed, 368 insertions(+), 26 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 6dded16c0e3..2d61de6adde 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2022-09-20  Ben Boeckel  <ben.boeckel@kitware.com>
+
+	* doc/invoke.texi: Document -fdeps-format=, -fdep-file=, and
+	-fdep-output= flags.
+	* genmatch.cc (main): Add new preprocessor parameter used for C++
+	module tracking.
+	* input.cc (test_lexer): Add new preprocessor parameter used for C++
+	module tracking.
+
 2022-09-19  Torbjörn SVENSSON  <torbjorn.svensson@foss.st.com>
 
 	* targhooks.cc (default_zero_call_used_regs): Improve sorry
diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
index ba3d76dd6cb..569dcd96e8c 100644
--- a/gcc/c-family/ChangeLog
+++ b/gcc/c-family/ChangeLog
@@ -1,3 +1,9 @@
+2022-09-20  Ben Boeckel  <ben.boeckel@kitware.com>
+
+	* c-opts.cc (c_common_handle_option): Add fdeps_file variable and
+	-fdeps-format=, -fdep-file=, and -fdep-output= parsing.
+	* c.opt: Add -fdeps-format=, -fdep-file=, and -fdep-output= flags.
+
 2022-09-15  Richard Biener  <rguenther@suse.de>
 
 	* c-common.h (build_void_list_node): Remove.
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index babaa2fc157..617d0e93696 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -77,6 +77,9 @@ static bool verbose;
 /* Dependency output file.  */
 static const char *deps_file;
 
+/* Enhanced dependency output file.  */
+static const char *fdeps_file;
+
 /* The prefix given by -iprefix, if any.  */
 static const char *iprefix;
 
@@ -360,6 +363,23 @@ c_common_handle_option (size_t scode, const char *arg, HOST_WIDE_INT value,
       deps_file = arg;
       break;
 
+    case OPT_fdep_format_:
+      if (!strcmp (arg, "p1689r5"))
+	cpp_opts->deps.format = DEPS_FMT_P1689R5;
+      else
+	error ("%<-fdep-format=%> unknown format %s", arg);
+      break;
+
+    case OPT_fdep_file_:
+      deps_seen = true;
+      fdeps_file = arg;
+      break;
+
+    case OPT_fdep_output_:
+      deps_seen = true;
+      defer_opt (code, arg);
+      break;
+
     case OPT_MF:
       deps_seen = true;
       deps_file = arg;
@@ -1279,6 +1299,7 @@ void
 c_common_finish (void)
 {
   FILE *deps_stream = NULL;
+  FILE *fdeps_stream = NULL;
 
   /* Note that we write the dependencies even if there are errors. This is
      useful for handling outdated generated headers that now trigger errors
@@ -1307,9 +1328,24 @@ c_common_finish (void)
      locations with input_location, which would be incorrect now.  */
   override_libcpp_locations = false;
 
+  if (cpp_opts->deps.format != DEPS_FMT_NONE)
+    {
+      if (!fdeps_file)
+	fdeps_stream = out_stream;
+      else if (fdeps_file[0] == '-' && fdeps_file[1] == '\0')
+	fdeps_stream = stdout;
+      else
+	{
+	  fdeps_stream = fopen (fdeps_file, "w");
+	  if (!fdeps_stream)
+	    fatal_error (input_location, "opening dependency file %s: %m",
+			 fdeps_file);
+	}
+    }
+
   /* For performance, avoid tearing down cpplib's internal structures
      with cpp_destroy ().  */
-  cpp_finish (parse_in, deps_stream);
+  cpp_finish (parse_in, deps_stream, fdeps_stream);
 
   if (deps_stream && deps_stream != out_stream && deps_stream != stdout
       && (ferror (deps_stream) || fclose (deps_stream)))
@@ -1381,6 +1417,8 @@ handle_deferred_opts (void)
 
 	if (opt->code == OPT_MT || opt->code == OPT_MQ)
 	  deps_add_target (deps, opt->arg, opt->code == OPT_MQ);
+	else if (opt->code == OPT_fdep_output_)
+	  deps_add_output (deps, opt->arg, true);
       }
 }
 
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 1c7f89eeb94..b0b91ef73a3 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -256,6 +256,18 @@ MT
 C ObjC C++ ObjC++ Joined Separate MissingArgError(missing makefile target after %qs)
 -MT <target>	Add a target that does not require quoting.
 
+fdep-format=
+C ObjC C++ ObjC++ NoDriverArg Joined MissingArgError(missing format after %qs)
+Format for output dependency information.  Supported (\"p1689r5\").
+
+fdep-file=
+C ObjC C++ ObjC++ NoDriverArg Joined MissingArgError(missing output path after %qs)
+File for output dependency information.
+
+fdep-output=
+C ObjC C++ ObjC++ NoDriverArg Joined MissingArgError(missing path after %qs)
+-fdep-output=obj.o Output file for the compile step.
+
 P
 C ObjC C++ ObjC++
 Do not generate #line directives.
diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index dc4ce202c35..eb5fd11bff7 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,3 +1,8 @@
+2022-09-20  Ben Boeckel  <ben.boeckel@kitware.com>
+
+	* module.cc (preprocessed_module): Pass whether the module is
+	exported to dependency tracking.
+
 2022-09-17  Patrick Palka  <ppalka@redhat.com>
 
 	* module.cc (friend_from_decl_list): Don't consider
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 9a9ef4e3332..9a2d1da0af1 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -19751,7 +19751,8 @@ preprocessed_module (cpp_reader *reader)
 		  && (module->is_interface () || module->is_partition ()))
 		deps_add_module_target (deps, module->get_flatname (),
 					maybe_add_cmi_prefix (module->filename),
-					module->is_header());
+					module->is_header (),
+					module->is_exported ());
 	      else
 		deps_add_module_dep (deps, module->get_flatname ());
 	    }
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index aa5655764a0..890bb58b056 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2776,6 +2776,21 @@ is @option{-fpermitted-flt-eval-methods=c11}.  The default when in a GNU
 dialect (@option{-std=gnu11} or similar) is
 @option{-fpermitted-flt-eval-methods=ts-18661-3}.
 
+@item -fdep-file=@var{file}
+@opindex fdep-file
+Where to write structured dependency information.
+
+@item -fdep-format=@var{format}
+@opindex fdep-format
+The format to use for structured dependency information. @samp{p1689r5} is the
+only supported format right now.  Note that when this argument is specified, the
+output of @samp{-MF} is stripped of some information (namely C++ modules) so
+that it does not use extended makefile syntax not understood by most tools.
+
+@item -fdep-output=@var{file}
+@opindex fdep-output
+Analogous to @option{-MT} but for structured dependency information.
+
 @item -fplan9-extensions
 @opindex fplan9-extensions
 Accept some non-standard constructs used in Plan 9 code.
diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog
index f5f8ac04286..b0942f876a0 100644
--- a/gcc/fortran/ChangeLog
+++ b/gcc/fortran/ChangeLog
@@ -1,3 +1,8 @@
+2022-09-20  Ben Boeckel  <ben.boeckel@kitware.com>
+
+	* cpp.cc (gfc_cpp_done): Add new preprocessor parameter used for C++
+	module tracking.
+
 2022-09-19  Francois-Xavier Coudert  <fxcoudert@gcc.gnu.org>
 
 	* libgfortran.h: Declare GFC_FPE_AWAY.
diff --git a/gcc/fortran/cpp.cc b/gcc/fortran/cpp.cc
index 364bd0d2a85..0b9df9c02cd 100644
--- a/gcc/fortran/cpp.cc
+++ b/gcc/fortran/cpp.cc
@@ -712,7 +712,7 @@ gfc_cpp_done (void)
 	  FILE *f = fopen (gfc_cpp_option.deps_filename, "w");
 	  if (f)
 	    {
-	      cpp_finish (cpp_in, f);
+	      cpp_finish (cpp_in, f, NULL);
 	      fclose (f);
 	    }
 	  else
@@ -721,7 +721,7 @@ gfc_cpp_done (void)
 			     xstrerror (errno));
 	}
       else
-	cpp_finish (cpp_in, stdout);
+	cpp_finish (cpp_in, stdout, NULL);
     }
 
   cpp_undef_all (cpp_in);
diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index a0b22c50ae3..b9ddf8ad669 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -5246,7 +5246,7 @@ main (int argc, char **argv)
   dt.gen (stdout, gimple);
 
   /* Finalize.  */
-  cpp_finish (r, NULL);
+  cpp_finish (r, NULL, NULL);
   cpp_destroy (r);
 
   delete operators;
diff --git a/gcc/input.cc b/gcc/input.cc
index 060ca160126..a563f7ca4bc 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -2260,7 +2260,7 @@ test_lexer (const line_table_case &case_)
   ASSERT_NE (tok, NULL);
   ASSERT_EQ (tok->type, CPP_EOF);
 
-  cpp_finish (parser, NULL);
+  cpp_finish (parser, NULL, NULL);
   cpp_destroy (parser);
 }
 
@@ -2291,7 +2291,7 @@ class cpp_reader_ptr
 
   ~cpp_reader_ptr ()
   {
-    cpp_finish (m_ptr, NULL);
+    cpp_finish (m_ptr, NULL, NULL);
     cpp_destroy (m_ptr);
   }
 
diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
index 5984915a49e..52e20886037 100644
--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
@@ -1,3 +1,14 @@
+2022-09-20  Ben Boeckel  <ben.boeckel@kitware.com>
+
+	* include/cpplib.h: Add cpp_deps_format enum.
+	* include/cpplib.h (cpp_options): Add format field.
+	* include/cpplib.h (cpp_finish): Add dependency stream parameter.
+	* include/mkdeps.h (deps_add_module_target): Add new preprocessor
+	parameter used for C++ module tracking.
+	* init.cc (cpp_finish): Add new preprocessor parameter used for C++
+	module tracking.
+	* mkdeps.cc (mkdeps): Implement P1689R5 output.
+
 2022-09-08  Lewis Hyatt  <lhyatt@gmail.com>
 
 	* line-map.cc (location_adhoc_data_update): Remove reliance on
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index 2db1e9cbdfb..90787230a9e 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -298,6 +298,9 @@ typedef CPPCHAR_SIGNED_T cppchar_signed_t;
 /* Style of header dependencies to generate.  */
 enum cpp_deps_style { DEPS_NONE = 0, DEPS_USER, DEPS_SYSTEM };
 
+/* Format of header dependencies to generate.  */
+enum cpp_deps_format { DEPS_FMT_NONE = 0, DEPS_FMT_P1689R5 };
+
 /* The possible normalization levels, from most restrictive to least.  */
 enum cpp_normalize_level {
   /* In NFKC.  */
@@ -581,6 +584,9 @@ struct cpp_options
     /* Style of header dependencies to generate.  */
     enum cpp_deps_style style;
 
+    /* Format of header dependencies to generate.  */
+    enum cpp_deps_format format;
+
     /* Assume missing files are generated files.  */
     bool missing_files;
 
@@ -1104,9 +1110,9 @@ extern void cpp_post_options (cpp_reader *);
 extern void cpp_init_iconv (cpp_reader *);
 
 /* Call this to finish preprocessing.  If you requested dependency
-   generation, pass an open stream to write the information to,
-   otherwise NULL.  It is your responsibility to close the stream.  */
-extern void cpp_finish (cpp_reader *, FILE *deps_stream);
+   generation, pass open stream(s) to write the information to,
+   otherwise NULL.  It is your responsibility to close the stream(s).  */
+extern void cpp_finish (cpp_reader *, FILE *deps_stream, FILE *fdeps_stream);
 
 /* Call this to release the handle at the end of preprocessing.  Any
    use of the handle after this function returns is invalid.  */
diff --git a/libcpp/include/mkdeps.h b/libcpp/include/mkdeps.h
index 96d64641b1a..396afa60f25 100644
--- a/libcpp/include/mkdeps.h
+++ b/libcpp/include/mkdeps.h
@@ -53,20 +53,29 @@ extern void deps_add_default_target (class mkdeps *, const char *);
 
 /* Adds a module target.  The module name and cmi name are copied.  */
 extern void deps_add_module_target (struct mkdeps *, const char *module,
-				    const char *cmi, bool is_header);
+				    const char *cmi, bool is_header,
+				    bool is_exported);
 
 /* Adds a module dependency.  The module name is copied.  */
 extern void deps_add_module_dep (struct mkdeps *, const char *module);
 
+/* Add an output.  */
+extern void deps_add_output (struct mkdeps *, const char *, bool);
+
 /* Add a dependency (appears on the right side of the colon) to the
    deps list.  Dependencies will be printed in the order that they
    were entered with this function.  By convention, the first
    dependency entered should be the primary source file.  */
 extern void deps_add_dep (class mkdeps *, const char *);
 
-/* Write out a deps buffer to a specified file.  The last argument
-   is the number of columns to word-wrap at (0 means don't wrap).  */
-extern void deps_write (const cpp_reader *, FILE *, unsigned int);
+/* Write out a deps buffer to a specified file.  The third argument
+   is the number of columns to word-wrap at (0 means don't wrap).
+   The last argument indicates whether to output extra information
+   (namely modules).  */
+extern void deps_write (const struct cpp_reader *, FILE *, unsigned int, int);
+
+/* Write out a deps buffer to a specified file in P1689R5 format.  */
+extern void deps_write_p1689r5 (const struct mkdeps *, FILE *);
 
 /* Write out a deps buffer to a file, in a form that can be read back
    with deps_restore.  Returns nonzero on error, in which case the
diff --git a/libcpp/init.cc b/libcpp/init.cc
index d3b4f00994b..d575afe23c7 100644
--- a/libcpp/init.cc
+++ b/libcpp/init.cc
@@ -853,7 +853,7 @@ read_original_directory (cpp_reader *pfile)
    Maybe it should also reset state, such that you could call
    cpp_start_read with a new filename to restart processing.  */
 void
-cpp_finish (cpp_reader *pfile, FILE *deps_stream)
+cpp_finish (struct cpp_reader *pfile, FILE *deps_stream, FILE *fdeps_stream)
 {
   /* Warn about unused macros before popping the final buffer.  */
   if (CPP_OPTION (pfile, warn_unused_macros))
@@ -867,8 +867,16 @@ cpp_finish (cpp_reader *pfile, FILE *deps_stream)
   while (pfile->buffer)
     _cpp_pop_buffer (pfile);
 
-  if (deps_stream)
-    deps_write (pfile, deps_stream, 72);
+  cpp_deps_format deps_format = CPP_OPTION (pfile, deps.format);
+  if (deps_format == DEPS_FMT_P1689R5 && fdeps_stream)
+    deps_write_p1689r5 (pfile->deps, fdeps_stream);
+
+  if (CPP_OPTION (pfile, deps.style) != DEPS_NONE
+      && deps_stream)
+    {
+      deps_write (pfile, deps_stream,
+		  72, deps_format == DEPS_FMT_NONE);
+    }
 
   /* Report on headers that could use multiple include guards.  */
   if (CPP_OPTION (pfile, print_include_names))
diff --git a/libcpp/mkdeps.cc b/libcpp/mkdeps.cc
index 30e87d8b4d7..e1253202d9f 100644
--- a/libcpp/mkdeps.cc
+++ b/libcpp/mkdeps.cc
@@ -81,7 +81,8 @@ public:
   };
 
   mkdeps ()
-    : module_name (NULL), cmi_name (NULL), is_header_unit (false), quote_lwm (0)
+    : primary_output (NULL), module_name (NULL), cmi_name (NULL)
+    , is_header_unit (false), is_exported (false), quote_lwm (0)
   {
   }
   ~mkdeps ()
@@ -90,6 +91,9 @@ public:
 
     for (i = targets.size (); i--;)
       free (const_cast <char *> (targets[i]));
+    free (const_cast <char *> (primary_output));
+    for (i = outputs.size (); i--;)
+      free (const_cast <char *> (outputs[i]));
     for (i = deps.size (); i--;)
       free (const_cast <char *> (deps[i]));
     for (i = vpath.size (); i--;)
@@ -103,6 +107,8 @@ public:
 public:
   vec<const char *> targets;
   vec<const char *> deps;
+  const char * primary_output;
+  vec<const char *> outputs;
   vec<velt> vpath;
   vec<const char *> modules;
 
@@ -110,6 +116,7 @@ public:
   const char *module_name;
   const char *cmi_name;
   bool is_header_unit;
+  bool is_exported;
   unsigned short quote_lwm;
 };
 
@@ -288,6 +295,21 @@ deps_add_default_target (class mkdeps *d, const char *tgt)
     }
 }
 
+/* Adds an output O.  We make a copy, so it need not be a permanent
+   string.  */
+void
+deps_add_output (struct mkdeps *d, const char *o, bool is_primary)
+{
+  o = apply_vpath (d, o);
+  if (is_primary)
+  {
+    if (d->primary_output)
+      d->outputs.push (d->primary_output);
+    d->primary_output = xstrdup (o);
+  } else
+    d->outputs.push (xstrdup (o));
+}
+
 void
 deps_add_dep (class mkdeps *d, const char *t)
 {
@@ -325,12 +347,13 @@ deps_add_vpath (class mkdeps *d, const char *vpath)
 
 void
 deps_add_module_target (struct mkdeps *d, const char *m,
-			const char *cmi, bool is_header_unit)
+			const char *cmi, bool is_header_unit, bool is_exported)
 {
   gcc_assert (!d->module_name);
   
   d->module_name = xstrdup (m);
   d->is_header_unit = is_header_unit;
+  d->is_exported = is_exported;
   d->cmi_name = xstrdup (cmi);
 }
 
@@ -387,7 +410,7 @@ make_write_vec (const mkdeps::vec<const char *> &vec, FILE *fp,
    .PHONY targets for all the dependencies too.  */
 
 static void
-make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
+make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax, int extra)
 {
   const mkdeps *d = pfile->deps;
 
@@ -398,7 +421,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
   if (d->deps.size ())
     {
       column = make_write_vec (d->targets, fp, 0, colmax, d->quote_lwm);
-      if (CPP_OPTION (pfile, deps.modules) && d->cmi_name)
+      if (extra && CPP_OPTION (pfile, deps.modules) && d->cmi_name)
 	column = make_write_name (d->cmi_name, fp, column, colmax);
       fputs (":", fp);
       column++;
@@ -412,7 +435,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
   if (!CPP_OPTION (pfile, deps.modules))
     return;
 
-  if (d->modules.size ())
+  if (extra && d->modules.size ())
     {
       column = make_write_vec (d->targets, fp, 0, colmax, d->quote_lwm);
       if (d->cmi_name)
@@ -423,7 +446,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
       fputs ("\n", fp);
     }
 
-  if (d->module_name)
+  if (extra && d->module_name)
     {
       if (d->cmi_name)
 	{
@@ -455,7 +478,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
 	}
     }
   
-  if (d->modules.size ())
+  if (extra && d->modules.size ())
     {
       column = fprintf (fp, "CXX_IMPORTS +=");
       make_write_vec (d->modules, fp, column, colmax, 0, ".c++m");
@@ -468,9 +491,203 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
 /* Really we should be opening fp here.  */
 
 void
-deps_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
+deps_write (const struct cpp_reader *pfile, FILE *fp, unsigned int colmax,
+	    int extra)
+{
+  make_write (pfile, fp, colmax, extra);
+}
+
+static bool
+is_utf8 (const char *name)
+{
+  int byte_length = 0;
+  int expected_continuations = 0;
+  uint32_t codepoint = 0;
+
+  for (const char* c = name; *c; c++)
+    {
+      int byte = *c;
+
+      // Check for a continuation character.
+      if ((byte & 0xc0) == 0x80) // 10xx_xxxx
+	{
+	  --expected_continuations;
+
+	  if (expected_continuations < 0)
+	    return false;
+
+	  // Add the encoded bits to the codepoint.
+	  codepoint = (codepoint << 6) + (byte & ~0x3f);
+
+	  continue;
+	}
+
+      // Make sure we don't expect more continuation characters
+      // once we don't find one.
+      if (expected_continuations != 0)
+	return false;
+
+      // Invalid because if they appear, it is an attempt to encode an ASCII
+      // character in 2 bytes.
+      if (byte == 0xc0 || byte == 0xc1) // 1100_0000 || 1100_0001
+	return false;
+
+      // Invalid because they would encode codepoints greater than allowed
+      // (0x10FFFF).
+      if (byte > 0xf4)
+	return false;
+
+      // ED-prefixed sequence encoding UTF-16 surrogate halves.
+      if (0xD0FF <= codepoint && codepoint <= 0xDFFF)
+	return false;
+
+      // 0x10FFFF is the highest valid codepoint for UTF-8.
+      if (0x10FFFF < codepoint)
+	return false;
+
+      // Overlong encoding of a codepoint.
+      if (byte_length == 2 && codepoint < 0x0800)
+	return false;
+      if (byte_length == 3 && codepoint < 0x10000)
+	return false;
+
+      // Single byte character.
+      if ((byte & 0x80) == 0x00) // 0xxx_xxxx
+	{
+	  codepoint = byte;
+	  byte_length = expected_continuations = 0;
+	}
+      // Prefix codes.
+      else if ((byte & 0xe0) == 0xc0) // 110x_xxxx
+	{
+	  codepoint = byte & 0x1f;
+	  byte_length = expected_continuations = 1;
+	}
+      else if ((byte & 0xf0) == 0xe0) // 1110_xxxx
+	{
+	  codepoint = byte & 0x0f;
+	  byte_length = expected_continuations = 2;
+	}
+      else if ((byte & 0xf8) == 0xf0) // 1111_0xxx
+	{
+	  codepoint = byte & 0x07;
+	  byte_length = expected_continuations = 3;
+	}
+      else
+	return false;
+    }
+
+  if (expected_continuations != 0)
+    return false;
+
+  return true;
+}
+
+static void
+p1689r5_write_filepath (const char *name, FILE *fp)
 {
-  make_write (pfile, fp, colmax);
+  if (is_utf8 (name))
+    {
+      fputc ('"', fp);
+      for (const char* c = name; *c; c++)
+	{
+	  // Escape control characters.
+	  if (ISCNTRL (*c))
+	    fprintf (fp, "\\u%04x", *c);
+	  // JSON escape characters.
+	  else if (*c == '"' || *c == '\\')
+	    {
+	      fputc ('\\', fp);
+	      fputc (*c, fp);
+	    }
+	  // Everything else.
+	  else
+	    fputc (*c, fp);
+	}
+      fputc ('"', fp);
+    }
+  else
+    {
+      // TODO: print an error
+    }
+}
+
+static void
+p1689r5_write_vec (const mkdeps::vec<const char *> &vec, FILE *fp)
+{
+  for (unsigned ix = 0; ix != vec.size (); ix++)
+    {
+      p1689r5_write_filepath (vec[ix], fp);
+      if (ix < vec.size () - 1)
+	fputc (',', fp);
+      fputc ('\n', fp);
+    }
+}
+
+void
+deps_write_p1689r5 (const struct mkdeps *d, FILE *fp)
+{
+  fputs ("{\n", fp);
+
+  fputs ("\"rules\": [\n", fp);
+  fputs ("{\n", fp);
+
+  if (d->primary_output)
+    {
+      fputs ("\"primary-output\": ", fp);
+      p1689r5_write_filepath (d->primary_output, fp);
+      fputs (",\n", fp);
+    }
+
+  if (d->outputs.size ())
+    {
+      fputs ("\"outputs\": [\n", fp);
+      p1689r5_write_vec (d->outputs, fp);
+      fputs ("],\n", fp);
+    }
+
+  if (d->module_name)
+    {
+      fputs ("\"provides\": [\n", fp);
+      fputs ("{\n", fp);
+
+      fputs ("\"logical-name\": ", fp);
+      p1689r5_write_filepath (d->module_name, fp);
+      fputs (",\n", fp);
+
+      fprintf (fp, "\"is-interface\": %s\n", d->is_exported ? "true" : "false");
+
+      // TODO: header-unit information
+
+      fputs ("}\n", fp);
+      fputs ("],\n", fp);
+    }
+
+  fputs ("\"requires\": [\n", fp);
+  for (size_t i = 0; i < d->modules.size (); i++)
+    {
+      if (i != 0)
+	fputs (",\n", fp);
+      fputs ("{\n", fp);
+
+      fputs ("\"logical-name\": ", fp);
+      p1689r5_write_filepath (d->modules[i], fp);
+      fputs ("\n", fp);
+
+      // TODO: header-unit information
+
+      fputs ("}\n", fp);
+    }
+  fputs ("]\n", fp);
+
+  fputs ("}\n", fp);
+
+  fputs ("],\n", fp);
+
+  fputs ("\"version\": 0,\n", fp);
+  fputs ("\"revision\": 0\n", fp);
+
+  fputs ("}\n", fp);
 }
 
 /* Write out a deps buffer to a file, in a form that can be read back
-- 
2.37.3


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RESEND 1/1] p1689r5: initial support
  2022-10-04 15:12 ` [PATCH RESEND 1/1] p1689r5: initial support Ben Boeckel
@ 2022-10-04 19:12   ` Harald Anlauf
  2022-10-11 11:30     ` Ben Boeckel
  2022-10-10 21:04   ` Jason Merrill
  1 sibling, 1 reply; 14+ messages in thread
From: Harald Anlauf @ 2022-10-04 19:12 UTC (permalink / raw)
  To: Ben Boeckel, gcc-patches
  Cc: gcc, jason, brad.king, fortran, Ben Boeckel, nathan

Am 04.10.22 um 17:12 schrieb Ben Boeckel:
> This patch implements support for [P1689R5][] to communicate to a build
> system the C++20 module dependencies to build systems so that they may
> build `.gcm` files in the proper order.

Is there a reason that you are touching so many frontends?

> diff --git a/gcc/fortran/cpp.cc b/gcc/fortran/cpp.cc
> index 364bd0d2a85..0b9df9c02cd 100644
> --- a/gcc/fortran/cpp.cc
> +++ b/gcc/fortran/cpp.cc
> @@ -712,7 +712,7 @@ gfc_cpp_done (void)
>   	  FILE *f = fopen (gfc_cpp_option.deps_filename, "w");
>   	  if (f)
>   	    {
> -	      cpp_finish (cpp_in, f);
> +	      cpp_finish (cpp_in, f, NULL);
>   	      fclose (f);
>   	    }
>   	  else
> @@ -721,7 +721,7 @@ gfc_cpp_done (void)
>   			     xstrerror (errno));
>   	}
>         else
> -	cpp_finish (cpp_in, stdout);
> +	cpp_finish (cpp_in, stdout, NULL);
>       }
>
>     cpp_undef_all (cpp_in);

Couldn't you simply default the third argument of cpp_finish() to NULL?

> diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
> index 2db1e9cbdfb..90787230a9e 100644
> --- a/libcpp/include/cpplib.h
> +++ b/libcpp/include/cpplib.h
> @@ -298,6 +298,9 @@ typedef CPPCHAR_SIGNED_T cppchar_signed_t;
>   /* Style of header dependencies to generate.  */
>   enum cpp_deps_style { DEPS_NONE = 0, DEPS_USER, DEPS_SYSTEM };
>
> +/* Format of header dependencies to generate.  */
> +enum cpp_deps_format { DEPS_FMT_NONE = 0, DEPS_FMT_P1689R5 };
> +
>   /* The possible normalization levels, from most restrictive to least.  */
>   enum cpp_normalize_level {
>     /* In NFKC.  */
> @@ -581,6 +584,9 @@ struct cpp_options
>       /* Style of header dependencies to generate.  */
>       enum cpp_deps_style style;
>
> +    /* Format of header dependencies to generate.  */
> +    enum cpp_deps_format format;
> +
>       /* Assume missing files are generated files.  */
>       bool missing_files;
>
> @@ -1104,9 +1110,9 @@ extern void cpp_post_options (cpp_reader *);
>   extern void cpp_init_iconv (cpp_reader *);
>
>   /* Call this to finish preprocessing.  If you requested dependency
> -   generation, pass an open stream to write the information to,
> -   otherwise NULL.  It is your responsibility to close the stream.  */
> -extern void cpp_finish (cpp_reader *, FILE *deps_stream);
> +   generation, pass open stream(s) to write the information to,
> +   otherwise NULL.  It is your responsibility to close the stream(s).  */
> +extern void cpp_finish (cpp_reader *, FILE *deps_stream, FILE *fdeps_stream);
                                                                  ^^^^^^^
>
>   /* Call this to release the handle at the end of preprocessing.  Any
>      use of the handle after this function returns is invalid.  */



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RESEND 0/1] RFC: P1689R5 support
  2022-10-04 15:11 [PATCH RESEND 0/1] RFC: P1689R5 support Ben Boeckel
  2022-10-04 15:12 ` [PATCH RESEND 1/1] p1689r5: initial support Ben Boeckel
@ 2022-10-10 20:21 ` Jason Merrill
  2022-10-13 17:08   ` David Malcolm
  1 sibling, 1 reply; 14+ messages in thread
From: Jason Merrill @ 2022-10-10 20:21 UTC (permalink / raw)
  To: Ben Boeckel, gcc-patches
  Cc: nathan, fortran, gcc, brad.king, Ben Boeckel, David Malcolm

On 10/4/22 11:11, Ben Boeckel wrote:
> This patch adds initial support for ISO C++'s [P1689R5][], a format for
> describing C++ module requirements and provisions based on the source
> code. This is required because compiling C++ with modules is not
> embarrassingly parallel and need to be ordered to ensure that `import
> some_module;` can be satisfied in time by making sure that the TU with
> `export import some_module;` is compiled first.
> 
> [P1689R5]: https://isocpp.org/files/papers/P1689R5.html
> 
> I'd like feedback on the approach taken here with respect to the
> user-visible flags. I'll also note that header units are not supported
> at this time because the current `-E` behavior with respect to `import
> <some_header>;` is to search for an appropriate `.gcm` file which is not
> something such a "scan" can support. A new mode will likely need to be
> created (e.g., replacing `-E` with `-fc++-module-scanning` or something)
> where headers are looked up "normally" and processed only as much as
> scanning requires.
> 
> Testing is currently happening in CMake's CI using a prior revision of
> this patch (the differences are basically the changelog, some style, and
> `trtbd` instead of `p1689r5` as the format name).
> 
> For testing within GCC, I'll work on the following:
> 
> - scanning non-module source
> - scanning module-importing source (`import X;`)
> - scanning module-exporting source (`export module X;`)
> - scanning module implementation unit (`module X;`)
> - flag combinations?
> 
> Are there existing tools for handling JSON output for testing purposes?

David Malcolm would probably know best about JSON wrangling.

> Basically, something that I can add to the test suite that doesn't care
> about whitespace, but checks the structure (with sensible replacements
> for absolute paths where relevant)?

Various tests in g++.dg/debug/dwarf2 handle that sort of thing with regexps.

> For the record, Clang has patches with similar flags and behavior by
> Chuanqi Xu here:
> 
>      https://reviews.llvm.org/D134269
> 
> with the same flags (though using my old `trtbd` spelling for the
> format name).
> 
> Thanks,
> 
> --Ben
> 
> Ben Boeckel (1):
>    p1689r5: initial support
> 
>   gcc/ChangeLog           |   9 ++
>   gcc/c-family/ChangeLog  |   6 +
>   gcc/c-family/c-opts.cc  |  40 ++++++-
>   gcc/c-family/c.opt      |  12 ++
>   gcc/cp/ChangeLog        |   5 +
>   gcc/cp/module.cc        |   3 +-
>   gcc/doc/invoke.texi     |  15 +++
>   gcc/fortran/ChangeLog   |   5 +
>   gcc/fortran/cpp.cc      |   4 +-
>   gcc/genmatch.cc         |   2 +-
>   gcc/input.cc            |   4 +-
>   libcpp/ChangeLog        |  11 ++
>   libcpp/include/cpplib.h |  12 +-
>   libcpp/include/mkdeps.h |  17 ++-
>   libcpp/init.cc          |  14 ++-
>   libcpp/mkdeps.cc        | 235 ++++++++++++++++++++++++++++++++++++++--
>   16 files changed, 368 insertions(+), 26 deletions(-)
> 
> 
> base-commit: d812e8cb2a920fd75768e16ca8ded59ad93c172f


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RESEND 1/1] p1689r5: initial support
  2022-10-04 15:12 ` [PATCH RESEND 1/1] p1689r5: initial support Ben Boeckel
  2022-10-04 19:12   ` Harald Anlauf
@ 2022-10-10 21:04   ` Jason Merrill
  2022-10-11 11:42     ` Ben Boeckel
  1 sibling, 1 reply; 14+ messages in thread
From: Jason Merrill @ 2022-10-10 21:04 UTC (permalink / raw)
  To: Ben Boeckel, gcc-patches; +Cc: nathan, fortran, gcc, brad.king, Ben Boeckel

On 10/4/22 11:12, Ben Boeckel wrote:
> This patch implements support for [P1689R5][] to communicate to a build
> system the C++20 module dependencies to build systems so that they may
> build `.gcm` files in the proper order.

Thanks!

> Support is communicated through the following three new flags:
> 
> - `-fdeps-format=` specifies the format for the output. Currently named
>    `p1689r5`.
> 
> - `-fdeps-file=` specifies the path to the file to write the format to.

Do you expect users to want to emit Makefile (-MM) and P1689 
dependencies from the same compilation?

> - `-fdep-output=` specifies the `.o` that will be written for the TU
>    that is scanned. This is required so that the build system can
>    correlate the dependency output with the actual compilation that will
>    occur.

The dependency machinery already needs to be able to figure out the name 
of the output file, can't we reuse that instead of specifying it on the 
command line?

> CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
> using an experimental feature selection (to allow for future usage
> evolution without committing to how it works today). While it remains
> experimental, docs may be found in CMake's documentation for
> experimental features.
> 
> Future work may include using this format for Fortran module
> dependencies as well, however this is still pending work.
> 
> [P1689R5]: https://isocpp.org/files/papers/P1689R5.html
> [cmake-experimental]: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst
> 
> TODO:
> 
> - header-unit information fields
> 
> Header units (including the standard library headers) are 100%
> unsupported right now because the `-E` mechanism wants to import their
> BMIs. A new mode (i.e., something more workable than existing `-E`
> behavior) that mocks up header units as if they were imported purely
> from their path and content would be required.
> 
> - non-utf8 paths
> 
> The current standard says that paths that are not unambiguously
> represented using UTF-8 are not supported (because these cases are rare
> and the extra complication is not worth it at this time). Future
> versions of the format might have ways of encoding non-UTF-8 paths. For
> now, this patch just doesn't support non-UTF-8 paths (ignoring the
> "unambiguously represetable in UTF-8" case).
> 
> - figure out why junk gets placed at the end of the file
> 
> Sometimes it seems like the file gets a lot of `NUL` bytes appended to
> it. It happens rarely and seems to be the result of some
> `ftruncate`-style call which results in extra padding in the contents.
> Noting it here as an observation at least.
> 
> Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
> ---
>   gcc/ChangeLog           |   9 ++
>   gcc/c-family/ChangeLog  |   6 +
>   gcc/c-family/c-opts.cc  |  40 ++++++-
>   gcc/c-family/c.opt      |  12 ++
>   gcc/cp/ChangeLog        |   5 +
>   gcc/cp/module.cc        |   3 +-
>   gcc/doc/invoke.texi     |  15 +++
>   gcc/fortran/ChangeLog   |   5 +
>   gcc/fortran/cpp.cc      |   4 +-
>   gcc/genmatch.cc         |   2 +-
>   gcc/input.cc            |   4 +-
>   libcpp/ChangeLog        |  11 ++
>   libcpp/include/cpplib.h |  12 +-
>   libcpp/include/mkdeps.h |  17 ++-
>   libcpp/init.cc          |  14 ++-
>   libcpp/mkdeps.cc        | 235 ++++++++++++++++++++++++++++++++++++++--
>   16 files changed, 368 insertions(+), 26 deletions(-)
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 6dded16c0e3..2d61de6adde 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,12 @@
> +2022-09-20  Ben Boeckel  <ben.boeckel@kitware.com>
> +
> +	* doc/invoke.texi: Document -fdeps-format=, -fdep-file=, and
> +	-fdep-output= flags.
> +	* genmatch.cc (main): Add new preprocessor parameter used for C++
> +	module tracking.
> +	* input.cc (test_lexer): Add new preprocessor parameter used for C++
> +	module tracking.
> +
>   2022-09-19  Torbjörn SVENSSON  <torbjorn.svensson@foss.st.com>
>   
>   	* targhooks.cc (default_zero_call_used_regs): Improve sorry
> diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
> index ba3d76dd6cb..569dcd96e8c 100644
> --- a/gcc/c-family/ChangeLog
> +++ b/gcc/c-family/ChangeLog
> @@ -1,3 +1,9 @@
> +2022-09-20  Ben Boeckel  <ben.boeckel@kitware.com>
> +
> +	* c-opts.cc (c_common_handle_option): Add fdeps_file variable and
> +	-fdeps-format=, -fdep-file=, and -fdep-output= parsing.
> +	* c.opt: Add -fdeps-format=, -fdep-file=, and -fdep-output= flags.
> +
>   2022-09-15  Richard Biener  <rguenther@suse.de>
>   
>   	* c-common.h (build_void_list_node): Remove.
> diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
> index babaa2fc157..617d0e93696 100644
> --- a/gcc/c-family/c-opts.cc
> +++ b/gcc/c-family/c-opts.cc
> @@ -77,6 +77,9 @@ static bool verbose;
>   /* Dependency output file.  */
>   static const char *deps_file;
>   
> +/* Enhanced dependency output file.  */
> +static const char *fdeps_file;
> +
>   /* The prefix given by -iprefix, if any.  */
>   static const char *iprefix;
>   
> @@ -360,6 +363,23 @@ c_common_handle_option (size_t scode, const char *arg, HOST_WIDE_INT value,
>         deps_file = arg;
>         break;
>   
> +    case OPT_fdep_format_:
> +      if (!strcmp (arg, "p1689r5"))
> +	cpp_opts->deps.format = DEPS_FMT_P1689R5;
> +      else
> +	error ("%<-fdep-format=%> unknown format %s", arg);
> +      break;
> +
> +    case OPT_fdep_file_:
> +      deps_seen = true;
> +      fdeps_file = arg;
> +      break;
> +
> +    case OPT_fdep_output_:
> +      deps_seen = true;
> +      defer_opt (code, arg);
> +      break;
> +
>       case OPT_MF:
>         deps_seen = true;
>         deps_file = arg;
> @@ -1279,6 +1299,7 @@ void
>   c_common_finish (void)
>   {
>     FILE *deps_stream = NULL;
> +  FILE *fdeps_stream = NULL;
>   
>     /* Note that we write the dependencies even if there are errors. This is
>        useful for handling outdated generated headers that now trigger errors
> @@ -1307,9 +1328,24 @@ c_common_finish (void)
>        locations with input_location, which would be incorrect now.  */
>     override_libcpp_locations = false;
>   
> +  if (cpp_opts->deps.format != DEPS_FMT_NONE)
> +    {
> +      if (!fdeps_file)
> +	fdeps_stream = out_stream;
> +      else if (fdeps_file[0] == '-' && fdeps_file[1] == '\0')
> +	fdeps_stream = stdout;
> +      else
> +	{
> +	  fdeps_stream = fopen (fdeps_file, "w");
> +	  if (!fdeps_stream)
> +	    fatal_error (input_location, "opening dependency file %s: %m",
> +			 fdeps_file);
> +	}
> +    }
> +
>     /* For performance, avoid tearing down cpplib's internal structures
>        with cpp_destroy ().  */
> -  cpp_finish (parse_in, deps_stream);
> +  cpp_finish (parse_in, deps_stream, fdeps_stream);
>   
>     if (deps_stream && deps_stream != out_stream && deps_stream != stdout
>         && (ferror (deps_stream) || fclose (deps_stream)))
> @@ -1381,6 +1417,8 @@ handle_deferred_opts (void)
>   
>   	if (opt->code == OPT_MT || opt->code == OPT_MQ)
>   	  deps_add_target (deps, opt->arg, opt->code == OPT_MQ);
> +	else if (opt->code == OPT_fdep_output_)
> +	  deps_add_output (deps, opt->arg, true);
>         }
>   }
>   
> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> index 1c7f89eeb94..b0b91ef73a3 100644
> --- a/gcc/c-family/c.opt
> +++ b/gcc/c-family/c.opt
> @@ -256,6 +256,18 @@ MT
>   C ObjC C++ ObjC++ Joined Separate MissingArgError(missing makefile target after %qs)
>   -MT <target>	Add a target that does not require quoting.
>   
> +fdep-format=
> +C ObjC C++ ObjC++ NoDriverArg Joined MissingArgError(missing format after %qs)
> +Format for output dependency information.  Supported (\"p1689r5\").
> +
> +fdep-file=
> +C ObjC C++ ObjC++ NoDriverArg Joined MissingArgError(missing output path after %qs)
> +File for output dependency information.
> +
> +fdep-output=
> +C ObjC C++ ObjC++ NoDriverArg Joined MissingArgError(missing path after %qs)
> +-fdep-output=obj.o Output file for the compile step.
> +
>   P
>   C ObjC C++ ObjC++
>   Do not generate #line directives.
> diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
> index dc4ce202c35..eb5fd11bff7 100644
> --- a/gcc/cp/ChangeLog
> +++ b/gcc/cp/ChangeLog
> @@ -1,3 +1,8 @@
> +2022-09-20  Ben Boeckel  <ben.boeckel@kitware.com>
> +
> +	* module.cc (preprocessed_module): Pass whether the module is
> +	exported to dependency tracking.
> +
>   2022-09-17  Patrick Palka  <ppalka@redhat.com>
>   
>   	* module.cc (friend_from_decl_list): Don't consider
> diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
> index 9a9ef4e3332..9a2d1da0af1 100644
> --- a/gcc/cp/module.cc
> +++ b/gcc/cp/module.cc
> @@ -19751,7 +19751,8 @@ preprocessed_module (cpp_reader *reader)
>   		  && (module->is_interface () || module->is_partition ()))
>   		deps_add_module_target (deps, module->get_flatname (),
>   					maybe_add_cmi_prefix (module->filename),
> -					module->is_header());
> +					module->is_header (),
> +					module->is_exported ());
>   	      else
>   		deps_add_module_dep (deps, module->get_flatname ());
>   	    }
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index aa5655764a0..890bb58b056 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -2776,6 +2776,21 @@ is @option{-fpermitted-flt-eval-methods=c11}.  The default when in a GNU
>   dialect (@option{-std=gnu11} or similar) is
>   @option{-fpermitted-flt-eval-methods=ts-18661-3}.
>   
> +@item -fdep-file=@var{file}
> +@opindex fdep-file
> +Where to write structured dependency information.
> +
> +@item -fdep-format=@var{format}
> +@opindex fdep-format
> +The format to use for structured dependency information. @samp{p1689r5} is the
> +only supported format right now.  Note that when this argument is specified, the
> +output of @samp{-MF} is stripped of some information (namely C++ modules) so
> +that it does not use extended makefile syntax not understood by most tools.
> +
> +@item -fdep-output=@var{file}
> +@opindex fdep-output
> +Analogous to @option{-MT} but for structured dependency information.
> +
>   @item -fplan9-extensions
>   @opindex fplan9-extensions
>   Accept some non-standard constructs used in Plan 9 code.
> diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog
> index f5f8ac04286..b0942f876a0 100644
> --- a/gcc/fortran/ChangeLog
> +++ b/gcc/fortran/ChangeLog
> @@ -1,3 +1,8 @@
> +2022-09-20  Ben Boeckel  <ben.boeckel@kitware.com>
> +
> +	* cpp.cc (gfc_cpp_done): Add new preprocessor parameter used for C++
> +	module tracking.
> +
>   2022-09-19  Francois-Xavier Coudert  <fxcoudert@gcc.gnu.org>
>   
>   	* libgfortran.h: Declare GFC_FPE_AWAY.
> diff --git a/gcc/fortran/cpp.cc b/gcc/fortran/cpp.cc
> index 364bd0d2a85..0b9df9c02cd 100644
> --- a/gcc/fortran/cpp.cc
> +++ b/gcc/fortran/cpp.cc
> @@ -712,7 +712,7 @@ gfc_cpp_done (void)
>   	  FILE *f = fopen (gfc_cpp_option.deps_filename, "w");
>   	  if (f)
>   	    {
> -	      cpp_finish (cpp_in, f);
> +	      cpp_finish (cpp_in, f, NULL);
>   	      fclose (f);
>   	    }
>   	  else
> @@ -721,7 +721,7 @@ gfc_cpp_done (void)
>   			     xstrerror (errno));
>   	}
>         else
> -	cpp_finish (cpp_in, stdout);
> +	cpp_finish (cpp_in, stdout, NULL);
>       }
>   
>     cpp_undef_all (cpp_in);
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index a0b22c50ae3..b9ddf8ad669 100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -5246,7 +5246,7 @@ main (int argc, char **argv)
>     dt.gen (stdout, gimple);
>   
>     /* Finalize.  */
> -  cpp_finish (r, NULL);
> +  cpp_finish (r, NULL, NULL);
>     cpp_destroy (r);
>   
>     delete operators;
> diff --git a/gcc/input.cc b/gcc/input.cc
> index 060ca160126..a563f7ca4bc 100644
> --- a/gcc/input.cc
> +++ b/gcc/input.cc
> @@ -2260,7 +2260,7 @@ test_lexer (const line_table_case &case_)
>     ASSERT_NE (tok, NULL);
>     ASSERT_EQ (tok->type, CPP_EOF);
>   
> -  cpp_finish (parser, NULL);
> +  cpp_finish (parser, NULL, NULL);
>     cpp_destroy (parser);
>   }
>   
> @@ -2291,7 +2291,7 @@ class cpp_reader_ptr
>   
>     ~cpp_reader_ptr ()
>     {
> -    cpp_finish (m_ptr, NULL);
> +    cpp_finish (m_ptr, NULL, NULL);
>       cpp_destroy (m_ptr);
>     }
>   
> diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> index 5984915a49e..52e20886037 100644
> --- a/libcpp/ChangeLog
> +++ b/libcpp/ChangeLog
> @@ -1,3 +1,14 @@
> +2022-09-20  Ben Boeckel  <ben.boeckel@kitware.com>
> +
> +	* include/cpplib.h: Add cpp_deps_format enum.
> +	* include/cpplib.h (cpp_options): Add format field.
> +	* include/cpplib.h (cpp_finish): Add dependency stream parameter.
> +	* include/mkdeps.h (deps_add_module_target): Add new preprocessor
> +	parameter used for C++ module tracking.
> +	* init.cc (cpp_finish): Add new preprocessor parameter used for C++
> +	module tracking.
> +	* mkdeps.cc (mkdeps): Implement P1689R5 output.
> +
>   2022-09-08  Lewis Hyatt  <lhyatt@gmail.com>
>   
>   	* line-map.cc (location_adhoc_data_update): Remove reliance on
> diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
> index 2db1e9cbdfb..90787230a9e 100644
> --- a/libcpp/include/cpplib.h
> +++ b/libcpp/include/cpplib.h
> @@ -298,6 +298,9 @@ typedef CPPCHAR_SIGNED_T cppchar_signed_t;
>   /* Style of header dependencies to generate.  */
>   enum cpp_deps_style { DEPS_NONE = 0, DEPS_USER, DEPS_SYSTEM };
>   
> +/* Format of header dependencies to generate.  */
> +enum cpp_deps_format { DEPS_FMT_NONE = 0, DEPS_FMT_P1689R5 };

Why not add this to cpp_deps_style?

> +
>   /* The possible normalization levels, from most restrictive to least.  */
>   enum cpp_normalize_level {
>     /* In NFKC.  */
> @@ -581,6 +584,9 @@ struct cpp_options
>       /* Style of header dependencies to generate.  */
>       enum cpp_deps_style style;
>   
> +    /* Format of header dependencies to generate.  */
> +    enum cpp_deps_format format;
> +
>       /* Assume missing files are generated files.  */
>       bool missing_files;
>   
> @@ -1104,9 +1110,9 @@ extern void cpp_post_options (cpp_reader *);
>   extern void cpp_init_iconv (cpp_reader *);
>   
>   /* Call this to finish preprocessing.  If you requested dependency
> -   generation, pass an open stream to write the information to,
> -   otherwise NULL.  It is your responsibility to close the stream.  */
> -extern void cpp_finish (cpp_reader *, FILE *deps_stream);
> +   generation, pass open stream(s) to write the information to,
> +   otherwise NULL.  It is your responsibility to close the stream(s).  */
> +extern void cpp_finish (cpp_reader *, FILE *deps_stream, FILE *fdeps_stream);
>   
>   /* Call this to release the handle at the end of preprocessing.  Any
>      use of the handle after this function returns is invalid.  */
> diff --git a/libcpp/include/mkdeps.h b/libcpp/include/mkdeps.h
> index 96d64641b1a..396afa60f25 100644
> --- a/libcpp/include/mkdeps.h
> +++ b/libcpp/include/mkdeps.h
> @@ -53,20 +53,29 @@ extern void deps_add_default_target (class mkdeps *, const char *);
>   
>   /* Adds a module target.  The module name and cmi name are copied.  */
>   extern void deps_add_module_target (struct mkdeps *, const char *module,
> -				    const char *cmi, bool is_header);
> +				    const char *cmi, bool is_header,
> +				    bool is_exported);
>   
>   /* Adds a module dependency.  The module name is copied.  */
>   extern void deps_add_module_dep (struct mkdeps *, const char *module);
>   
> +/* Add an output.  */
> +extern void deps_add_output (struct mkdeps *, const char *, bool);
> +
>   /* Add a dependency (appears on the right side of the colon) to the
>      deps list.  Dependencies will be printed in the order that they
>      were entered with this function.  By convention, the first
>      dependency entered should be the primary source file.  */
>   extern void deps_add_dep (class mkdeps *, const char *);
>   
> -/* Write out a deps buffer to a specified file.  The last argument
> -   is the number of columns to word-wrap at (0 means don't wrap).  */
> -extern void deps_write (const cpp_reader *, FILE *, unsigned int);
> +/* Write out a deps buffer to a specified file.  The third argument
> +   is the number of columns to word-wrap at (0 means don't wrap).
> +   The last argument indicates whether to output extra information
> +   (namely modules).  */
> +extern void deps_write (const struct cpp_reader *, FILE *, unsigned int, int);
> +
> +/* Write out a deps buffer to a specified file in P1689R5 format.  */
> +extern void deps_write_p1689r5 (const struct mkdeps *, FILE *);
>   
>   /* Write out a deps buffer to a file, in a form that can be read back
>      with deps_restore.  Returns nonzero on error, in which case the
> diff --git a/libcpp/init.cc b/libcpp/init.cc
> index d3b4f00994b..d575afe23c7 100644
> --- a/libcpp/init.cc
> +++ b/libcpp/init.cc
> @@ -853,7 +853,7 @@ read_original_directory (cpp_reader *pfile)
>      Maybe it should also reset state, such that you could call
>      cpp_start_read with a new filename to restart processing.  */
>   void
> -cpp_finish (cpp_reader *pfile, FILE *deps_stream)
> +cpp_finish (struct cpp_reader *pfile, FILE *deps_stream, FILE *fdeps_stream)
>   {
>     /* Warn about unused macros before popping the final buffer.  */
>     if (CPP_OPTION (pfile, warn_unused_macros))
> @@ -867,8 +867,16 @@ cpp_finish (cpp_reader *pfile, FILE *deps_stream)
>     while (pfile->buffer)
>       _cpp_pop_buffer (pfile);
>   
> -  if (deps_stream)
> -    deps_write (pfile, deps_stream, 72);
> +  cpp_deps_format deps_format = CPP_OPTION (pfile, deps.format);
> +  if (deps_format == DEPS_FMT_P1689R5 && fdeps_stream)
> +    deps_write_p1689r5 (pfile->deps, fdeps_stream);
> +
> +  if (CPP_OPTION (pfile, deps.style) != DEPS_NONE
> +      && deps_stream)
> +    {
> +      deps_write (pfile, deps_stream,
> +		  72, deps_format == DEPS_FMT_NONE);
> +    }
>   
>     /* Report on headers that could use multiple include guards.  */
>     if (CPP_OPTION (pfile, print_include_names))
> diff --git a/libcpp/mkdeps.cc b/libcpp/mkdeps.cc
> index 30e87d8b4d7..e1253202d9f 100644
> --- a/libcpp/mkdeps.cc
> +++ b/libcpp/mkdeps.cc
> @@ -81,7 +81,8 @@ public:
>     };
>   
>     mkdeps ()
> -    : module_name (NULL), cmi_name (NULL), is_header_unit (false), quote_lwm (0)
> +    : primary_output (NULL), module_name (NULL), cmi_name (NULL)
> +    , is_header_unit (false), is_exported (false), quote_lwm (0)
>     {
>     }
>     ~mkdeps ()
> @@ -90,6 +91,9 @@ public:
>   
>       for (i = targets.size (); i--;)
>         free (const_cast <char *> (targets[i]));
> +    free (const_cast <char *> (primary_output));
> +    for (i = outputs.size (); i--;)
> +      free (const_cast <char *> (outputs[i]));
>       for (i = deps.size (); i--;)
>         free (const_cast <char *> (deps[i]));
>       for (i = vpath.size (); i--;)
> @@ -103,6 +107,8 @@ public:
>   public:
>     vec<const char *> targets;
>     vec<const char *> deps;
> +  const char * primary_output;
> +  vec<const char *> outputs;
>     vec<velt> vpath;
>     vec<const char *> modules;
>   
> @@ -110,6 +116,7 @@ public:
>     const char *module_name;
>     const char *cmi_name;
>     bool is_header_unit;
> +  bool is_exported;
>     unsigned short quote_lwm;
>   };
>   
> @@ -288,6 +295,21 @@ deps_add_default_target (class mkdeps *d, const char *tgt)
>       }
>   }
>   
> +/* Adds an output O.  We make a copy, so it need not be a permanent
> +   string.  */
> +void
> +deps_add_output (struct mkdeps *d, const char *o, bool is_primary)
> +{
> +  o = apply_vpath (d, o);
> +  if (is_primary)
> +  {
> +    if (d->primary_output)
> +      d->outputs.push (d->primary_output);
> +    d->primary_output = xstrdup (o);
> +  } else
> +    d->outputs.push (xstrdup (o));
> +}
> +
>   void
>   deps_add_dep (class mkdeps *d, const char *t)
>   {
> @@ -325,12 +347,13 @@ deps_add_vpath (class mkdeps *d, const char *vpath)
>   
>   void
>   deps_add_module_target (struct mkdeps *d, const char *m,
> -			const char *cmi, bool is_header_unit)
> +			const char *cmi, bool is_header_unit, bool is_exported)
>   {
>     gcc_assert (!d->module_name);
>     
>     d->module_name = xstrdup (m);
>     d->is_header_unit = is_header_unit;
> +  d->is_exported = is_exported;
>     d->cmi_name = xstrdup (cmi);
>   }
>   
> @@ -387,7 +410,7 @@ make_write_vec (const mkdeps::vec<const char *> &vec, FILE *fp,
>      .PHONY targets for all the dependencies too.  */
>   
>   static void
> -make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
> +make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax, int extra)

Instead of adding the "extra" parameter...

>   {
>     const mkdeps *d = pfile->deps;
>   
> @@ -398,7 +421,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
>     if (d->deps.size ())
>       {
>         column = make_write_vec (d->targets, fp, 0, colmax, d->quote_lwm);
> -      if (CPP_OPTION (pfile, deps.modules) && d->cmi_name)
> +      if (extra && CPP_OPTION (pfile, deps.modules) && d->cmi_name)
>   	column = make_write_name (d->cmi_name, fp, column, colmax);
>         fputs (":", fp);
>         column++;
> @@ -412,7 +435,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
>     if (!CPP_OPTION (pfile, deps.modules))
>       return;

...how about checking CPP_OPTION for p1689r5 mode here?

>   
> -  if (d->modules.size ())
> +  if (extra && d->modules.size ())
>       {
>         column = make_write_vec (d->targets, fp, 0, colmax, d->quote_lwm);
>         if (d->cmi_name)
> @@ -423,7 +446,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
>         fputs ("\n", fp);
>       }
>   
> -  if (d->module_name)
> +  if (extra && d->module_name)
>       {
>         if (d->cmi_name)
>   	{
> @@ -455,7 +478,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
>   	}
>       }
>     
> -  if (d->modules.size ())
> +  if (extra && d->modules.size ())
>       {
>         column = fprintf (fp, "CXX_IMPORTS +=");
>         make_write_vec (d->modules, fp, column, colmax, 0, ".c++m");
> @@ -468,9 +491,203 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
>   /* Really we should be opening fp here.  */
>   
>   void
> -deps_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
> +deps_write (const struct cpp_reader *pfile, FILE *fp, unsigned int colmax,
> +	    int extra)
> +{
> +  make_write (pfile, fp, colmax, extra);
> +}
> +
> +static bool
> +is_utf8 (const char *name)

Can we share utf8 parsing code with decode_utf8_char in pretty-print.cc?

> +{
> +  int byte_length = 0;
> +  int expected_continuations = 0;
> +  uint32_t codepoint = 0;
> +
> +  for (const char* c = name; *c; c++)
> +    {
> +      int byte = *c;
> +
> +      // Check for a continuation character.
> +      if ((byte & 0xc0) == 0x80) // 10xx_xxxx
> +	{
> +	  --expected_continuations;
> +
> +	  if (expected_continuations < 0)
> +	    return false;
> +
> +	  // Add the encoded bits to the codepoint.
> +	  codepoint = (codepoint << 6) + (byte & ~0x3f);
> +
> +	  continue;
> +	}
> +
> +      // Make sure we don't expect more continuation characters
> +      // once we don't find one.
> +      if (expected_continuations != 0)
> +	return false;
> +
> +      // Invalid because if they appear, it is an attempt to encode an ASCII
> +      // character in 2 bytes.
> +      if (byte == 0xc0 || byte == 0xc1) // 1100_0000 || 1100_0001
> +	return false;
> +
> +      // Invalid because they would encode codepoints greater than allowed
> +      // (0x10FFFF).
> +      if (byte > 0xf4)
> +	return false;
> +
> +      // ED-prefixed sequence encoding UTF-16 surrogate halves.
> +      if (0xD0FF <= codepoint && codepoint <= 0xDFFF)
> +	return false;
> +
> +      // 0x10FFFF is the highest valid codepoint for UTF-8.
> +      if (0x10FFFF < codepoint)
> +	return false;
> +
> +      // Overlong encoding of a codepoint.
> +      if (byte_length == 2 && codepoint < 0x0800)
> +	return false;
> +      if (byte_length == 3 && codepoint < 0x10000)
> +	return false;
> +
> +      // Single byte character.
> +      if ((byte & 0x80) == 0x00) // 0xxx_xxxx
> +	{
> +	  codepoint = byte;
> +	  byte_length = expected_continuations = 0;
> +	}
> +      // Prefix codes.
> +      else if ((byte & 0xe0) == 0xc0) // 110x_xxxx
> +	{
> +	  codepoint = byte & 0x1f;
> +	  byte_length = expected_continuations = 1;
> +	}
> +      else if ((byte & 0xf0) == 0xe0) // 1110_xxxx
> +	{
> +	  codepoint = byte & 0x0f;
> +	  byte_length = expected_continuations = 2;
> +	}
> +      else if ((byte & 0xf8) == 0xf0) // 1111_0xxx
> +	{
> +	  codepoint = byte & 0x07;
> +	  byte_length = expected_continuations = 3;
> +	}
> +      else
> +	return false;
> +    }
> +
> +  if (expected_continuations != 0)
> +    return false;
> +
> +  return true;
> +}
> +
> +static void
> +p1689r5_write_filepath (const char *name, FILE *fp)
>   {
> -  make_write (pfile, fp, colmax);
> +  if (is_utf8 (name))
> +    {
> +      fputc ('"', fp);
> +      for (const char* c = name; *c; c++)
> +	{
> +	  // Escape control characters.
> +	  if (ISCNTRL (*c))
> +	    fprintf (fp, "\\u%04x", *c);
> +	  // JSON escape characters.
> +	  else if (*c == '"' || *c == '\\')
> +	    {
> +	      fputc ('\\', fp);
> +	      fputc (*c, fp);
> +	    }
> +	  // Everything else.
> +	  else
> +	    fputc (*c, fp);
> +	}
> +      fputc ('"', fp);
> +    }
> +  else
> +    {
> +      // TODO: print an error
> +    }
> +}
> +
> +static void
> +p1689r5_write_vec (const mkdeps::vec<const char *> &vec, FILE *fp)
> +{
> +  for (unsigned ix = 0; ix != vec.size (); ix++)
> +    {
> +      p1689r5_write_filepath (vec[ix], fp);
> +      if (ix < vec.size () - 1)
> +	fputc (',', fp);
> +      fputc ('\n', fp);
> +    }
> +}
> +
> +void
> +deps_write_p1689r5 (const struct mkdeps *d, FILE *fp)
> +{
> +  fputs ("{\n", fp);
> +
> +  fputs ("\"rules\": [\n", fp);
> +  fputs ("{\n", fp);
> +
> +  if (d->primary_output)
> +    {
> +      fputs ("\"primary-output\": ", fp);
> +      p1689r5_write_filepath (d->primary_output, fp);
> +      fputs (",\n", fp);
> +    }
> +
> +  if (d->outputs.size ())
> +    {
> +      fputs ("\"outputs\": [\n", fp);
> +      p1689r5_write_vec (d->outputs, fp);
> +      fputs ("],\n", fp);
> +    }
> +
> +  if (d->module_name)
> +    {
> +      fputs ("\"provides\": [\n", fp);
> +      fputs ("{\n", fp);
> +
> +      fputs ("\"logical-name\": ", fp);
> +      p1689r5_write_filepath (d->module_name, fp);
> +      fputs (",\n", fp);
> +
> +      fprintf (fp, "\"is-interface\": %s\n", d->is_exported ? "true" : "false");
> +
> +      // TODO: header-unit information
> +
> +      fputs ("}\n", fp);
> +      fputs ("],\n", fp);
> +    }
> +
> +  fputs ("\"requires\": [\n", fp);
> +  for (size_t i = 0; i < d->modules.size (); i++)
> +    {
> +      if (i != 0)
> +	fputs (",\n", fp);
> +      fputs ("{\n", fp);
> +
> +      fputs ("\"logical-name\": ", fp);
> +      p1689r5_write_filepath (d->modules[i], fp);
> +      fputs ("\n", fp);
> +
> +      // TODO: header-unit information
> +
> +      fputs ("}\n", fp);
> +    }
> +  fputs ("]\n", fp);
> +
> +  fputs ("}\n", fp);
> +
> +  fputs ("],\n", fp);
> +
> +  fputs ("\"version\": 0,\n", fp);
> +  fputs ("\"revision\": 0\n", fp);
> +
> +  fputs ("}\n", fp);
>   }
>   
>   /* Write out a deps buffer to a file, in a form that can be read back


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RESEND 1/1] p1689r5: initial support
  2022-10-04 19:12   ` Harald Anlauf
@ 2022-10-11 11:30     ` Ben Boeckel
  0 siblings, 0 replies; 14+ messages in thread
From: Ben Boeckel @ 2022-10-11 11:30 UTC (permalink / raw)
  To: Harald Anlauf
  Cc: Ben Boeckel, gcc-patches, gcc, jason, brad.king, fortran, nathan

On Tue, Oct 04, 2022 at 21:12:03 +0200, Harald Anlauf wrote:
> Am 04.10.22 um 17:12 schrieb Ben Boeckel:
> > This patch implements support for [P1689R5][] to communicate to a build
> > system the C++20 module dependencies to build systems so that they may
> > build `.gcm` files in the proper order.
> 
> Is there a reason that you are touching so many frontends?

Just those that needed the update for `cpp_finish`. It does align with
those that will (eventually) need this support anyways (AFAIK).

> > diff --git a/gcc/fortran/cpp.cc b/gcc/fortran/cpp.cc
> > index 364bd0d2a85..0b9df9c02cd 100644
> > --- a/gcc/fortran/cpp.cc
> > +++ b/gcc/fortran/cpp.cc
> > @@ -712,7 +712,7 @@ gfc_cpp_done (void)
> >   	  FILE *f = fopen (gfc_cpp_option.deps_filename, "w");
> >   	  if (f)
> >   	    {
> > -	      cpp_finish (cpp_in, f);
> > +	      cpp_finish (cpp_in, f, NULL);
> >   	      fclose (f);
> >   	    }
> >   	  else
> > @@ -721,7 +721,7 @@ gfc_cpp_done (void)
> >   			     xstrerror (errno));
> >   	}
> >         else
> > -	cpp_finish (cpp_in, stdout);
> > +	cpp_finish (cpp_in, stdout, NULL);
> >       }
> >
> >     cpp_undef_all (cpp_in);
> 
> Couldn't you simply default the third argument of cpp_finish() to NULL?

I could do that. Wasn't sure how much that would be acceptable given
that it is a "silent" change, but it would reduce the number of files
touched here.

Thanks,

--Ben

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RESEND 1/1] p1689r5: initial support
  2022-10-10 21:04   ` Jason Merrill
@ 2022-10-11 11:42     ` Ben Boeckel
  2022-10-18 12:18       ` Ben Boeckel
  0 siblings, 1 reply; 14+ messages in thread
From: Ben Boeckel @ 2022-10-11 11:42 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Ben Boeckel, gcc-patches, nathan, fortran, gcc, brad.king

On Mon, Oct 10, 2022 at 17:04:09 -0400, Jason Merrill wrote:
> On 10/4/22 11:12, Ben Boeckel wrote:
> > This patch implements support for [P1689R5][] to communicate to a build
> > system the C++20 module dependencies to build systems so that they may
> > build `.gcm` files in the proper order.
> 
> Thanks!
> 
> > Support is communicated through the following three new flags:
> > 
> > - `-fdeps-format=` specifies the format for the output. Currently named
> >    `p1689r5`.
> > 
> > - `-fdeps-file=` specifies the path to the file to write the format to.
> 
> Do you expect users to want to emit Makefile (-MM) and P1689 
> dependencies from the same compilation?

Yes, the build system wants to know what files affect scanning as well
(e.g., `#include <iostream>` is still possible and if it changes, this
operation should be performed again. The `-M` flags do this quite nicely
already :) .

> > - `-fdep-output=` specifies the `.o` that will be written for the TU
> >    that is scanned. This is required so that the build system can
> >    correlate the dependency output with the actual compilation that will
> >    occur.
> 
> The dependency machinery already needs to be able to figure out the name 
> of the output file, can't we reuse that instead of specifying it on the 
> command line?

This is how it determines the output of the compilation. Because it is
piggy-backing on the `-E` flag, the `-o` flag specifies the output of
the preprocessed source (purely a side-effect right now).

> > diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
> > index 2db1e9cbdfb..90787230a9e 100644
> > --- a/libcpp/include/cpplib.h
> > +++ b/libcpp/include/cpplib.h
> > @@ -298,6 +298,9 @@ typedef CPPCHAR_SIGNED_T cppchar_signed_t;
> >   /* Style of header dependencies to generate.  */
> >   enum cpp_deps_style { DEPS_NONE = 0, DEPS_USER, DEPS_SYSTEM };
> >   
> > +/* Format of header dependencies to generate.  */
> > +enum cpp_deps_format { DEPS_FMT_NONE = 0, DEPS_FMT_P1689R5 };
> 
> Why not add this to cpp_deps_style?

It is orthogonal. The `-M` flags and `-fdeps-*` flags are similar, but
`-fdeps-*` dependencies are fundamentally different than `-M`
dependencies. The comment does need updated though.

`-M` is about discovered dependencies: those that you find out while
doing work. `-fdep-*` is about ordering dependencies: extracting
information from file content in order to even order future work around.
It stems from the loss of embarassing parallelism when building C++20
code that uses `import`. It's isomorphic to the Fortran module
compilation ordering problem.

> > @@ -387,7 +410,7 @@ make_write_vec (const mkdeps::vec<const char *> &vec, FILE *fp,
> >      .PHONY targets for all the dependencies too.  */
> >   
> >   static void
> > -make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
> > +make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax, int extra)
> 
> Instead of adding the "extra" parameter...

Hmm. Not sure why I had named this so poorly. Maybe this comes from my
initial stab at this functionality in 2019 (the format has been hammered
out in ISO C++'s SG15 since then).

> >   {
> >     const mkdeps *d = pfile->deps;
> >   
> > @@ -398,7 +421,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
> >     if (d->deps.size ())
> >       {
> >         column = make_write_vec (d->targets, fp, 0, colmax, d->quote_lwm);
> > -      if (CPP_OPTION (pfile, deps.modules) && d->cmi_name)
> > +      if (extra && CPP_OPTION (pfile, deps.modules) && d->cmi_name)
> >   	column = make_write_name (d->cmi_name, fp, column, colmax);
> >         fputs (":", fp);
> >         column++;
> > @@ -412,7 +435,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
> >     if (!CPP_OPTION (pfile, deps.modules))
> >       return;
> 
> ...how about checking CPP_OPTION for p1689r5 mode here?

I'll take a look at this.

> >   
> > -  if (d->modules.size ())
> > +  if (extra && d->modules.size ())
> >       {
> >         column = make_write_vec (d->targets, fp, 0, colmax, d->quote_lwm);
> >         if (d->cmi_name)
> > @@ -423,7 +446,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
> >         fputs ("\n", fp);
> >       }
> >   
> > -  if (d->module_name)
> > +  if (extra && d->module_name)
> >       {
> >         if (d->cmi_name)
> >   	{
> > @@ -455,7 +478,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
> >   	}
> >       }
> >     
> > -  if (d->modules.size ())
> > +  if (extra && d->modules.size ())
> >       {
> >         column = fprintf (fp, "CXX_IMPORTS +=");
> >         make_write_vec (d->modules, fp, column, colmax, 0, ".c++m");
> > @@ -468,9 +491,203 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
> >   /* Really we should be opening fp here.  */
> >   
> >   void
> > -deps_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
> > +deps_write (const struct cpp_reader *pfile, FILE *fp, unsigned int colmax,
> > +	    int extra)
> > +{
> > +  make_write (pfile, fp, colmax, extra);
> > +}
> > +
> > +static bool
> > +is_utf8 (const char *name)
> 
> Can we share utf8 parsing code with decode_utf8_char in pretty-print.cc?

I can look at factoring that out. I'll have to decode its logic to see
how much overlap there is.

Thanks,

--Ben

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RESEND 0/1] RFC: P1689R5 support
  2022-10-10 20:21 ` [PATCH RESEND 0/1] RFC: P1689R5 support Jason Merrill
@ 2022-10-13 17:08   ` David Malcolm
  2022-10-18 12:22     ` Ben Boeckel
  0 siblings, 1 reply; 14+ messages in thread
From: David Malcolm @ 2022-10-13 17:08 UTC (permalink / raw)
  To: Jason Merrill, Ben Boeckel, gcc-patches
  Cc: nathan, fortran, gcc, brad.king, Ben Boeckel, Martin Liska

On Mon, 2022-10-10 at 16:21 -0400, Jason Merrill wrote:
> On 10/4/22 11:11, Ben Boeckel wrote:
> > This patch adds initial support for ISO C++'s [P1689R5][], a format
> > for
> > describing C++ module requirements and provisions based on the
> > source
> > code. This is required because compiling C++ with modules is not
> > embarrassingly parallel and need to be ordered to ensure that
> > `import
> > some_module;` can be satisfied in time by making sure that the TU
> > with
> > `export import some_module;` is compiled first.
> > 
> > [P1689R5]: https://isocpp.org/files/papers/P1689R5.html
> > 
> > I'd like feedback on the approach taken here with respect to the
> > user-visible flags. I'll also note that header units are not
> > supported
> > at this time because the current `-E` behavior with respect to
> > `import
> > <some_header>;` is to search for an appropriate `.gcm` file which
> > is not
> > something such a "scan" can support. A new mode will likely need to
> > be
> > created (e.g., replacing `-E` with `-fc++-module-scanning` or
> > something)
> > where headers are looked up "normally" and processed only as much
> > as
> > scanning requires.
> > 
> > Testing is currently happening in CMake's CI using a prior revision
> > of
> > this patch (the differences are basically the changelog, some
> > style, and
> > `trtbd` instead of `p1689r5` as the format name).
> > 
> > For testing within GCC, I'll work on the following:
> > 
> > - scanning non-module source
> > - scanning module-importing source (`import X;`)
> > - scanning module-exporting source (`export module X;`)
> > - scanning module implementation unit (`module X;`)
> > - flag combinations?
> > 
> > Are there existing tools for handling JSON output for testing
> > purposes?
> 
> David Malcolm would probably know best about JSON wrangling.

Unfortunately our JSON output doesn't make any guarantees about the
ordering of keys within an object, so the precise textual output
changes from run to run.  I've coped with that in my test cases by
limiting myself to simple regexes of fragments of the JSON output.

Martin Liska [CCed] went much further in
4e275dccfc2467b3fe39012a3dd2a80bac257dd0 by adding a run-gcov-pytest
DejaGnu directive, allowing for test cases for gcov to be written in
Python, which can thus test much more interesting assertions about the
generated JSON.

Dave

> 
> > Basically, something that I can add to the test suite that doesn't
> > care
> > about whitespace, but checks the structure (with sensible
> > replacements
> > for absolute paths where relevant)?
> 
> Various tests in g++.dg/debug/dwarf2 handle that sort of thing with
> regexps.
> 
> > For the record, Clang has patches with similar flags and behavior
> > by
> > Chuanqi Xu here:
> > 
> >      https://reviews.llvm.org/D134269
> > 
> > with the same flags (though using my old `trtbd` spelling for the
> > format name).
> > 
> > Thanks,
> > 
> > --Ben
> > 
> > Ben Boeckel (1):
> >    p1689r5: initial support
> > 
> >   gcc/ChangeLog           |   9 ++
> >   gcc/c-family/ChangeLog  |   6 +
> >   gcc/c-family/c-opts.cc  |  40 ++++++-
> >   gcc/c-family/c.opt      |  12 ++
> >   gcc/cp/ChangeLog        |   5 +
> >   gcc/cp/module.cc        |   3 +-
> >   gcc/doc/invoke.texi     |  15 +++
> >   gcc/fortran/ChangeLog   |   5 +
> >   gcc/fortran/cpp.cc      |   4 +-
> >   gcc/genmatch.cc         |   2 +-
> >   gcc/input.cc            |   4 +-
> >   libcpp/ChangeLog        |  11 ++
> >   libcpp/include/cpplib.h |  12 +-
> >   libcpp/include/mkdeps.h |  17 ++-
> >   libcpp/init.cc          |  14 ++-
> >   libcpp/mkdeps.cc        | 235
> > ++++++++++++++++++++++++++++++++++++++--
> >   16 files changed, 368 insertions(+), 26 deletions(-)
> > 
> > 
> > base-commit: d812e8cb2a920fd75768e16ca8ded59ad93c172f
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RESEND 1/1] p1689r5: initial support
  2022-10-11 11:42     ` Ben Boeckel
@ 2022-10-18 12:18       ` Ben Boeckel
  2022-10-20 15:39         ` Jason Merrill
  0 siblings, 1 reply; 14+ messages in thread
From: Ben Boeckel @ 2022-10-18 12:18 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Ben Boeckel, gcc-patches, nathan, fortran, gcc, brad.king

On Tue, Oct 11, 2022 at 07:42:43 -0400, Ben Boeckel wrote:
> On Mon, Oct 10, 2022 at 17:04:09 -0400, Jason Merrill wrote:
> > Can we share utf8 parsing code with decode_utf8_char in pretty-print.cc?
> 
> I can look at factoring that out. I'll have to decode its logic to see
> how much overlap there is.

There is some mismatch. First, that is in `gcc` and this is in `libcpp`.
Second, `pretty-print.cc`'s implementation:

- fails on an empty string;
- accepts extended-length (5+-byte) encodings which are invalid Unicode;
  and
- decodes codepoint-by-codepoint instead of just validating the entire
  string.

--Ben

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RESEND 0/1] RFC: P1689R5 support
  2022-10-13 17:08   ` David Malcolm
@ 2022-10-18 12:22     ` Ben Boeckel
  2022-10-19  7:21       ` Martin Liška
  0 siblings, 1 reply; 14+ messages in thread
From: Ben Boeckel @ 2022-10-18 12:22 UTC (permalink / raw)
  To: David Malcolm
  Cc: Jason Merrill, Ben Boeckel, gcc-patches, nathan, fortran, gcc,
	brad.king, Martin Liska

On Thu, Oct 13, 2022 at 13:08:46 -0400, David Malcolm wrote:
> On Mon, 2022-10-10 at 16:21 -0400, Jason Merrill wrote:
> > David Malcolm would probably know best about JSON wrangling.
> 
> Unfortunately our JSON output doesn't make any guarantees about the
> ordering of keys within an object, so the precise textual output
> changes from run to run.  I've coped with that in my test cases by
> limiting myself to simple regexes of fragments of the JSON output.
> 
> Martin Liska [CCed] went much further in
> 4e275dccfc2467b3fe39012a3dd2a80bac257dd0 by adding a run-gcov-pytest
> DejaGnu directive, allowing for test cases for gcov to be written in
> Python, which can thus test much more interesting assertions about the
> generated JSON.

Ok, if Python is acceptable, I'll use its stdlib to do "fancy" things.
Part of this is because I want to assert that unnecessary fields don't
exist and that sounds…unlikely to be possible in any maintainable way
(assuming it is possible) with regexen. `jq` could help immensely, but
that is probably a bridge too far :) .

Thanks,

--Ben

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RESEND 0/1] RFC: P1689R5 support
  2022-10-18 12:22     ` Ben Boeckel
@ 2022-10-19  7:21       ` Martin Liška
  0 siblings, 0 replies; 14+ messages in thread
From: Martin Liška @ 2022-10-19  7:21 UTC (permalink / raw)
  To: Ben Boeckel, David Malcolm
  Cc: Jason Merrill, Ben Boeckel, gcc-patches, nathan, fortran, gcc, brad.king

On 10/18/22 14:22, Ben Boeckel wrote:
> On Thu, Oct 13, 2022 at 13:08:46 -0400, David Malcolm wrote:
>> On Mon, 2022-10-10 at 16:21 -0400, Jason Merrill wrote:
>>> David Malcolm would probably know best about JSON wrangling.
>>
>> Unfortunately our JSON output doesn't make any guarantees about the
>> ordering of keys within an object, so the precise textual output
>> changes from run to run.  I've coped with that in my test cases by
>> limiting myself to simple regexes of fragments of the JSON output.
>>
>> Martin Liska [CCed] went much further in
>> 4e275dccfc2467b3fe39012a3dd2a80bac257dd0 by adding a run-gcov-pytest
>> DejaGnu directive, allowing for test cases for gcov to be written in
>> Python, which can thus test much more interesting assertions about the
>> generated JSON.
> 
> Ok, if Python is acceptable, I'll use its stdlib to do "fancy" things.
> Part of this is because I want to assert that unnecessary fields don't
> exist and that sounds…unlikely to be possible in any maintainable way
> (assuming it is possible) with regexen. `jq` could help immensely, but
> that is probably a bridge too far :) .

Yes, please use Python if you have a more complicated output verification.

Examples I introduced:
./gcc/testsuite/g++.dg/gcov/test-pr98273.py
./gcc/testsuite/g++.dg/gcov/test-gcov-17.py

Martin

> 
> Thanks,
> 
> --Ben


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RESEND 1/1] p1689r5: initial support
  2022-10-18 12:18       ` Ben Boeckel
@ 2022-10-20 15:39         ` Jason Merrill
  2022-10-20 17:31           ` Ben Boeckel
  0 siblings, 1 reply; 14+ messages in thread
From: Jason Merrill @ 2022-10-20 15:39 UTC (permalink / raw)
  To: Ben Boeckel; +Cc: Ben Boeckel, gcc-patches, nathan, fortran, gcc, brad.king

On 10/18/22 08:18, Ben Boeckel wrote:
> On Tue, Oct 11, 2022 at 07:42:43 -0400, Ben Boeckel wrote:
>> On Mon, Oct 10, 2022 at 17:04:09 -0400, Jason Merrill wrote:
>>> Can we share utf8 parsing code with decode_utf8_char in pretty-print.cc?
>>
>> I can look at factoring that out. I'll have to decode its logic to see
>> how much overlap there is.
> 
> There is some mismatch. First, that is in `gcc` and this is in `libcpp`.

Oops, I was thinking this was in gcc as well.  In libcpp there's 
_cpp_valid_utf8 (which calls one_utf8_to_cppchar).

Jason


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RESEND 1/1] p1689r5: initial support
  2022-10-20 15:39         ` Jason Merrill
@ 2022-10-20 17:31           ` Ben Boeckel
  2022-10-20 18:22             ` Jason Merrill
  0 siblings, 1 reply; 14+ messages in thread
From: Ben Boeckel @ 2022-10-20 17:31 UTC (permalink / raw)
  To: Jason Merrill; +Cc: Ben Boeckel, gcc-patches, nathan, fortran, gcc, brad.king

On Thu, Oct 20, 2022 at 11:39:25 -0400, Jason Merrill wrote:
> Oops, I was thinking this was in gcc as well.  In libcpp there's 
> _cpp_valid_utf8 (which calls one_utf8_to_cppchar).

This routine has a lot more logic (including UCN decoding) and the
`one_utf8_to_cppchar` also supports out-of-bounds codepoints above
`0x10FFFF`.

--Ben

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH RESEND 1/1] p1689r5: initial support
  2022-10-20 17:31           ` Ben Boeckel
@ 2022-10-20 18:22             ` Jason Merrill
  0 siblings, 0 replies; 14+ messages in thread
From: Jason Merrill @ 2022-10-20 18:22 UTC (permalink / raw)
  To: Ben Boeckel; +Cc: Ben Boeckel, gcc-patches, nathan, fortran, gcc, brad.king

On 10/20/22 13:31, Ben Boeckel wrote:
> On Thu, Oct 20, 2022 at 11:39:25 -0400, Jason Merrill wrote:
>> Oops, I was thinking this was in gcc as well.  In libcpp there's
>> _cpp_valid_utf8 (which calls one_utf8_to_cppchar).
> 
> This routine has a lot more logic (including UCN decoding) and the
> `one_utf8_to_cppchar` also supports out-of-bounds codepoints above
> `0x10FFFF`.

The latter seems like a bug to be fixed; presumably it hasn't been 
updated since the range of codepoints was restricted.  This sort of 
thing is why I'd like to minimize the number of separate implementations 
of UTF-8 parsing.

Jason


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-10-20 18:22 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-04 15:11 [PATCH RESEND 0/1] RFC: P1689R5 support Ben Boeckel
2022-10-04 15:12 ` [PATCH RESEND 1/1] p1689r5: initial support Ben Boeckel
2022-10-04 19:12   ` Harald Anlauf
2022-10-11 11:30     ` Ben Boeckel
2022-10-10 21:04   ` Jason Merrill
2022-10-11 11:42     ` Ben Boeckel
2022-10-18 12:18       ` Ben Boeckel
2022-10-20 15:39         ` Jason Merrill
2022-10-20 17:31           ` Ben Boeckel
2022-10-20 18:22             ` Jason Merrill
2022-10-10 20:21 ` [PATCH RESEND 0/1] RFC: P1689R5 support Jason Merrill
2022-10-13 17:08   ` David Malcolm
2022-10-18 12:22     ` Ben Boeckel
2022-10-19  7:21       ` Martin Liška

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).