public inbox for gcc@gcc.gnu.org
* [PATCH v2 0/1] RFC: P1689R5 support
@ 2022-10-27 23:16 Ben Boeckel
  2022-10-27 23:16 ` [PATCH v2 1/3] libcpp: reject codepoints above 0x10FFFF Ben Boeckel
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Ben Boeckel @ 2022-10-27 23:16 UTC
  To: gcc-patches
  Cc: Ben Boeckel, jason, nathan, fortran, gcc, brad.king, dmalcolm,
	mliska, anlauf

Hi,

This patch adds initial support for ISO C++'s [P1689R5][], a format for
describing C++ module requirements and provisions based on the source
code. This is required because compiling C++ with modules is not
embarrassingly parallel: translation units need to be ordered so that
`import some_module;` can be satisfied in time, by making sure that the
TU with `export module some_module;` is compiled first.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
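The ordering requirement above amounts to a topological sort over the
modules each TU provides and requires. Below is a minimal C++ sketch of
how a build tool might consume scan results; it is not part of this
patch, and `ScannedTU` and `build_order` are hypothetical names. It uses
Kahn's algorithm and assumes each logical module name is provided by
exactly one TU.

```cpp
// Hypothetical sketch: order TUs so that every TU providing a module is
// compiled before any TU that requires it (Kahn's algorithm).
#include <cstddef>
#include <map>
#include <queue>
#include <stdexcept>
#include <string>
#include <vector>

struct ScannedTU {
  std::string primary_output;          // e.g. "foo.o"
  std::vector<std::string> provides;   // logical module names exported
  std::vector<std::string> requires_;  // logical names imported ("requires" is a keyword)
};

// Returns indices into `tus` in a valid build order.  Throws if a required
// module has no provider or if the module graph contains a cycle.
std::vector<std::size_t> build_order(const std::vector<ScannedTU> &tus) {
  std::map<std::string, std::size_t> provider;
  for (std::size_t i = 0; i < tus.size(); ++i)
    for (const auto &name : tus[i].provides)
      provider[name] = i;

  std::vector<std::vector<std::size_t>> dependents(tus.size());
  std::vector<std::size_t> pending(tus.size(), 0);  // unmet requirements
  for (std::size_t i = 0; i < tus.size(); ++i)
    for (const auto &name : tus[i].requires_) {
      auto it = provider.find(name);
      if (it == provider.end())
        throw std::runtime_error("no TU provides module " + name);
      dependents[it->second].push_back(i);
      ++pending[i];
    }

  std::queue<std::size_t> ready;  // TUs whose requirements are all met
  for (std::size_t i = 0; i < tus.size(); ++i)
    if (pending[i] == 0)
      ready.push(i);

  std::vector<std::size_t> order;
  while (!ready.empty()) {
    std::size_t i = ready.front();
    ready.pop();
    order.push_back(i);
    for (std::size_t d : dependents[i])
      if (--pending[d] == 0)
        ready.push(d);
  }
  if (order.size() != tus.size())
    throw std::runtime_error("module dependency cycle");
  return order;
}
```

TUs with no module requirements start out ready, so anything not
constrained by module dependencies can still be compiled in parallel.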

I'd like feedback on the approach taken here with respect to the
user-visible flags. I'll also note that header units are not supported
at this time because the current `-E` behavior with respect to `import
<some_header>;` is to search for an appropriate `.gcm` file, which is not
something such a "scan" can support. A new mode will likely need to be
created (e.g., replacing `-E` with `-fc++-module-scanning` or similar)
where headers are looked up "normally" and processed only as much as
scanning requires.

For the record, Clang has patches by Chuanqi Xu implementing similar
behavior with the same flags:

    https://reviews.llvm.org/D134269

Thanks,

--Ben

---
v1 -> v2:

- removal of the `deps_write(extra)` parameter to option-checking where
  needed
- default parameter of `cpp_finish(fdeps_stream = NULL)`
- unification of libcpp UTF-8 validity functions from v1
- test cases for flag parsing states (depflags-*) and p1689 output
  (p1689-*)

Ben Boeckel (3):
  libcpp: reject codepoints above 0x10FFFF
  libcpp: add a function to determine UTF-8 validity of a C string
  p1689r5: initial support

 gcc/ChangeLog                                 |   5 +
 gcc/c-family/ChangeLog                        |   6 +
 gcc/c-family/c-opts.cc                        |  40 +++-
 gcc/c-family/c.opt                            |  12 +
 gcc/cp/ChangeLog                              |   5 +
 gcc/cp/module.cc                              |   3 +-
 gcc/doc/invoke.texi                           |  15 ++
 gcc/testsuite/ChangeLog                       |   7 +
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C     |   1 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C    |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C    |   4 +
 .../g++.dg/modules/depflags-fjo-MD.C          |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C    |   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C     |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C    |   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C     |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp      |  11 +
 gcc/testsuite/g++.dg/modules/p1689-1.C        |  18 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.json |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C        |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C        |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-3.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C        |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-4.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C        |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/test-p1689.py    | 222 ++++++++++++++++++
 gcc/testsuite/lib/modules.exp                 |  71 ++++++
 libcpp/ChangeLog                              |  23 ++
 libcpp/charset.cc                             |  22 +-
 libcpp/include/cpplib.h                       |  12 +-
 libcpp/include/mkdeps.h                       |  17 +-
 libcpp/init.cc                                |  13 +-
 libcpp/internal.h                             |   2 +
 libcpp/mkdeps.cc                              | 149 +++++++++++-
 43 files changed, 823 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.exp.json
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-2.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-2.exp.json
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-3.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-3.exp.json
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-4.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-4.exp.json
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-5.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-5.exp.json
 create mode 100644 gcc/testsuite/g++.dg/modules/test-p1689.py
 create mode 100644 gcc/testsuite/lib/modules.exp


base-commit: f95d3d5de72a1c43e8d529bad3ef59afc3214705
-- 
2.37.3



* [PATCH v2 1/3] libcpp: reject codepoints above 0x10FFFF
  2022-10-27 23:16 [PATCH v2 0/1] RFC: P1689R5 support Ben Boeckel
@ 2022-10-27 23:16 ` Ben Boeckel
  2022-10-28 12:54   ` David Malcolm
  2022-11-07 23:04   ` Jason Merrill
  2022-10-27 23:16 ` [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string Ben Boeckel
  2022-10-27 23:16 ` [PATCH v2 3/3] p1689r5: initial support Ben Boeckel
  2 siblings, 2 replies; 12+ messages in thread
From: Ben Boeckel @ 2022-10-27 23:16 UTC
  To: gcc-patches
  Cc: Ben Boeckel, jason, nathan, fortran, gcc, brad.king, dmalcolm,
	mliska, anlauf

Unicode does not support such values because they are unrepresentable in
UTF-16.
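The arithmetic behind the 0x10FFFF limit, as a small standalone C++
sketch (not libcpp code; `is_unicode_scalar_value` and
`decode_surrogates` are illustrative names): a surrogate pair carries
10 bits in each half, offset by 0x10000, so the largest encodable value
is 0x10000 + 0xFFFFF.

```cpp
// Sketch: why 0x10FFFF is the ceiling for Unicode code points.
#include <cstdint>

// High surrogate payload (10 bits) and low surrogate payload (10 bits),
// both at their maximum, plus the 0x10000 offset.
constexpr std::uint32_t max_utf16_codepoint =
    0x10000 + ((0x3FFu << 10) | 0x3FFu);
static_assert(max_utf16_codepoint == 0x10FFFF, "UTF-16 limit");

// Same condition as the patched checks: reject values above 0x10FFFF and
// the surrogate range reserved for UTF-16 pairs.
constexpr bool is_unicode_scalar_value(std::uint32_t c) {
  return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

// Decode a surrogate pair (hi in D800..DBFF, lo in DC00..DFFF).
constexpr std::uint32_t decode_surrogates(std::uint16_t hi, std::uint16_t lo) {
  return 0x10000 + ((std::uint32_t(hi) - 0xD800) << 10)
         + (std::uint32_t(lo) - 0xDC00);
}
```

The previous bound of 0x7FFFFFFF comes from the original 6-byte UTF-8
design, which can encode values that no UTF-16 sequence can reach.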

Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
---
 libcpp/ChangeLog  | 6 ++++++
 libcpp/charset.cc | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
index 18d5bcceaf0..4d707277531 100644
--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
@@ -1,3 +1,9 @@
+2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
+
+	* include/charset.cc: Reject encodings of codepoints above 0x10FFFF.
+	UTF-16 cannot represent such codepoints, so they are not valid
+	Unicode.
+
 2022-10-19  Lewis Hyatt  <lhyatt@gmail.com>
 
 	* include/cpplib.h (struct cpp_string): Use new "string_length" GTY.
diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index 12a398e7527..e9da6674b5f 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -216,7 +216,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t *inbytesleftp,
   if (c <= 0x3FFFFFF && nbytes > 5) return EILSEQ;
 
   /* Make sure the character is valid.  */
-  if (c > 0x7FFFFFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
+  if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
 
   *cp = c;
   *inbufp = inbuf;
@@ -320,7 +320,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp,
   s += inbuf[bigend ? 2 : 1] << 8;
   s += inbuf[bigend ? 3 : 0];
 
-  if (s >= 0x7FFFFFFF || (s >= 0xD800 && s <= 0xDFFF))
+  if (s > 0x10FFFF || (s >= 0xD800 && s <= 0xDFFF))
     return EILSEQ;
 
   rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);
-- 
2.37.3



* [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string
  2022-10-27 23:16 [PATCH v2 0/1] RFC: P1689R5 support Ben Boeckel
  2022-10-27 23:16 ` [PATCH v2 1/3] libcpp: reject codepoints above 0x10FFFF Ben Boeckel
@ 2022-10-27 23:16 ` Ben Boeckel
  2022-10-28 12:59   ` David Malcolm
  2022-11-07 23:47   ` Jason Merrill
  2022-10-27 23:16 ` [PATCH v2 3/3] p1689r5: initial support Ben Boeckel
  2 siblings, 2 replies; 12+ messages in thread
From: Ben Boeckel @ 2022-10-27 23:16 UTC
  To: gcc-patches
  Cc: Ben Boeckel, jason, nathan, fortran, gcc, brad.king, dmalcolm,
	mliska, anlauf

This simplifies the interface for other UTF-8 validity detections when a
simple "yes" or "no" answer is sufficient.
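Because `one_utf8_to_cppchar` is internal to libcpp, here is a
standalone C++ sketch of the same kind of yes/no check on a
NUL-terminated string. `valid_utf8_cstr` is a hypothetical name and this
is not libcpp's implementation; it rejects overlong forms, surrogates,
values above 0x10FFFF, and truncated sequences.

```cpp
// Hypothetical standalone yes/no UTF-8 check for a NUL-terminated string.
#include <cstdint>

bool valid_utf8_cstr(const char *s) {
  const unsigned char *p = reinterpret_cast<const unsigned char *>(s);
  while (*p) {
    unsigned char b = *p;
    std::uint32_t c;    // decoded code point
    int extra;          // number of continuation bytes expected
    std::uint32_t min;  // smallest value allowed for this length
    if (b < 0x80) { ++p; continue; }  // ASCII
    else if ((b & 0xE0) == 0xC0) { c = b & 0x1F; extra = 1; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { c = b & 0x0F; extra = 2; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { c = b & 0x07; extra = 3; min = 0x10000; }
    else return false;  // stray continuation byte or 5/6-byte lead
    ++p;
    for (int i = 0; i < extra; ++i, ++p) {
      if ((*p & 0xC0) != 0x80)  // also stops at the terminating NUL
        return false;
      c = (c << 6) | (*p & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
      return false;     // overlong, out of range, or surrogate
  }
  return true;
}
```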

Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
---
 libcpp/ChangeLog  |  6 ++++++
 libcpp/charset.cc | 18 ++++++++++++++++++
 libcpp/internal.h |  2 ++
 3 files changed, 26 insertions(+)

diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
index 4d707277531..4e2c7900ae2 100644
--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
@@ -1,3 +1,9 @@
+2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
+
+	* charset.cc (_cpp_valid_utf8_str): New function to determine
+	whether a C string is valid UTF-8 or not.
+	* internal.h (_cpp_valid_utf8_str): Add prototype.
+
 2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
 
 	* include/charset.cc: Reject encodings of codepoints above 0x10FFFF.
diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index e9da6674b5f..0524ab6beba 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1864,6 +1864,24 @@ _cpp_valid_utf8 (cpp_reader *pfile,
   return true;
 }
 
+/* Return true if the NUL-terminated string NAME is entirely valid
+   UTF-8, false otherwise.  */
+bool
+_cpp_valid_utf8_str (const char *name)
+{
+  const uchar *in = (const uchar *) name;
+  size_t len = strlen (name);
+  cppchar_t cp;
+
+  while (*in)
+    {
+      if (one_utf8_to_cppchar (&in, &len, &cp))
+	return false;
+    }
+
+  return true;
+}
+
 /* Subroutine of convert_hex and convert_oct.  N is the representation
    in the execution character set of a numeric escape; write it into the
    string buffer TBUF and update the end-of-string pointer therein.  WIDE
diff --git a/libcpp/internal.h b/libcpp/internal.h
index badfd1b40da..4f2dd4a2f5c 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile,
 			     struct normalize_state *nst,
 			     cppchar_t *cp);
 
+extern bool _cpp_valid_utf8_str (const char *str);
+
 extern void _cpp_destroy_iconv (cpp_reader *);
 extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
 					  unsigned char *, size_t, size_t,
-- 
2.37.3



* [PATCH v2 3/3] p1689r5: initial support
  2022-10-27 23:16 [PATCH v2 0/1] RFC: P1689R5 support Ben Boeckel
  2022-10-27 23:16 ` [PATCH v2 1/3] libcpp: reject codepoints above 0x10FFFF Ben Boeckel
  2022-10-27 23:16 ` [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string Ben Boeckel
@ 2022-10-27 23:16 ` Ben Boeckel
  2022-10-28 17:15   ` Ben Boeckel
  2022-11-01 14:57   ` Tom Tromey
  2 siblings, 2 replies; 12+ messages in thread
From: Ben Boeckel @ 2022-10-27 23:16 UTC
  To: gcc-patches
  Cc: Ben Boeckel, jason, nathan, fortran, gcc, brad.king, dmalcolm,
	mliska, anlauf

This patch implements support for [P1689R5][] to communicate C++20
module dependencies to build systems so that they may build `.gcm` files
in the proper order.

Support is communicated through the following three new flags:

- `-fdep-format=` specifies the format for the output. Currently the only
  supported format is `p1689r5`.

- `-fdep-file=` specifies the path of the file to write the output to.
  `-` writes to standard output; if the flag is omitted, the output goes
  to the regular output stream.

- `-fdep-output=` specifies the `.o` that will be written for the TU
  that is scanned. This is required so that the build system can
  correlate the dependency output with the actual compilation that will
  occur.
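To illustrate the shape of the output, here is a hypothetical C++
sketch that emits one rule using only the fields that appear in this
patch's expected test output (`rules`, `primary-output`, `provides`,
`logical-name`, `is-interface`, `requires`, `version`, `revision`). The
format defines further optional fields not shown here, and
`p1689_rule`/`json_quote` are illustrative names, not the libcpp writer.

```cpp
// Hypothetical sketch of a P1689R5 document for a single scanned TU.
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Quote a UTF-8 string as a JSON string literal.
static std::string json_quote(const std::string &s) {
  std::string out = "\"";
  for (unsigned char ch : s) {
    if (ch == '"' || ch == '\\') {
      out += '\\';
      out += static_cast<char>(ch);
    } else if (ch < 0x20) {  // control characters need \u escapes
      char buf[8];
      std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(ch));
      out += buf;
    } else {
      out += static_cast<char>(ch);  // UTF-8 bytes pass through unchanged
    }
  }
  return out + "\"";
}

std::string p1689_rule(const std::string &primary_output,
                       const std::vector<std::string> &provides,
                       bool is_interface,
                       const std::vector<std::string> &requires_) {
  std::string j = "{\"rules\": [{\"primary-output\": " + json_quote(primary_output);
  j += ", \"provides\": [";
  for (std::size_t i = 0; i < provides.size(); ++i) {
    if (i) j += ", ";
    j += "{\"logical-name\": " + json_quote(provides[i]) +
         ", \"is-interface\": " + (is_interface ? "true" : "false") + "}";
  }
  j += "], \"requires\": [";
  for (std::size_t i = 0; i < requires_.size(); ++i) {
    if (i) j += ", ";
    j += "{\"logical-name\": " + json_quote(requires_[i]) + "}";
  }
  return j + "]}], \"version\": 0, \"revision\": 0}";
}
```

The `primary-output` value is what `-fdep-output=` supplies; without it
a build system has no reliable way to tie the rule to a compile step.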

CMake supports this format as of 17 Jun 2022 (to be part of 3.25.0)
using an experimental feature selection (to allow for future usage
evolution without committing to how it works today). While the feature
remains experimental, it is described in CMake's [documentation for
experimental features][cmake-experimental].

Future work may include using this format for Fortran module
dependencies as well; that work has not been done yet.

[P1689R5]: https://isocpp.org/files/papers/P1689R5.html
[cmake-experimental]: https://gitlab.kitware.com/cmake/cmake/-/blob/master/Help/dev/experimental.rst

TODO:

- header-unit information fields

Header units (including the standard library headers) are entirely
unsupported right now because the `-E` mechanism wants to import their
BMIs. Supporting them would require a new mode (i.e., something more
workable than the existing `-E` behavior) that mocks up header units as
if they were imported purely from their path and content.

- non-utf8 paths

P1689R5 states that paths that cannot be unambiguously represented in
UTF-8 are not supported (because these cases are rare and the extra
complication is not worth it at this time). Future versions of the
format might have ways of encoding non-UTF-8 paths. For now, this patch
simply does not support non-UTF-8 paths (ignoring the "unambiguously
representable in UTF-8" case).

- figure out why junk gets placed at the end of the file

Occasionally the file gets a lot of `NUL` bytes appended to it. This
seems to be the result of some `ftruncate`-style call that leaves extra
padding in the contents. Noting it here as an observation at least.

Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>

---
 gcc/ChangeLog                                 |   5 +
 gcc/c-family/ChangeLog                        |   6 +
 gcc/c-family/c-opts.cc                        |  40 +++-
 gcc/c-family/c.opt                            |  12 +
 gcc/cp/ChangeLog                              |   5 +
 gcc/cp/module.cc                              |   3 +-
 gcc/doc/invoke.texi                           |  15 ++
 gcc/testsuite/ChangeLog                       |   7 +
 gcc/testsuite/g++.dg/modules/depflags-f-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-f.C     |   1 +
 gcc/testsuite/g++.dg/modules/depflags-fi.C    |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fj.C    |   4 +
 .../g++.dg/modules/depflags-fjo-MD.C          |   4 +
 gcc/testsuite/g++.dg/modules/depflags-fjo.C   |   5 +
 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-fo.C    |   4 +
 gcc/testsuite/g++.dg/modules/depflags-j-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-j.C     |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C |   3 +
 gcc/testsuite/g++.dg/modules/depflags-jo.C    |   4 +
 gcc/testsuite/g++.dg/modules/depflags-o-MD.C  |   2 +
 gcc/testsuite/g++.dg/modules/depflags-o.C     |   3 +
 gcc/testsuite/g++.dg/modules/modules.exp      |  11 +
 gcc/testsuite/g++.dg/modules/p1689-1.C        |  18 ++
 gcc/testsuite/g++.dg/modules/p1689-1.exp.json |  27 +++
 gcc/testsuite/g++.dg/modules/p1689-2.C        |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-2.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-3.C        |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-3.exp.json |  16 ++
 gcc/testsuite/g++.dg/modules/p1689-4.C        |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-4.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.C        |  14 ++
 gcc/testsuite/g++.dg/modules/p1689-5.exp.json |  14 ++
 gcc/testsuite/g++.dg/modules/test-p1689.py    | 222 ++++++++++++++++++
 gcc/testsuite/lib/modules.exp                 |  71 ++++++
 libcpp/ChangeLog                              |  11 +
 libcpp/include/cpplib.h                       |  12 +-
 libcpp/include/mkdeps.h                       |  17 +-
 libcpp/init.cc                                |  13 +-
 libcpp/mkdeps.cc                              | 149 +++++++++++-
 41 files changed, 789 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-f.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fi.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fj.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fjo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-fo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-j.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-jo.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o-MD.C
 create mode 100644 gcc/testsuite/g++.dg/modules/depflags-o.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-1.exp.json
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-2.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-2.exp.json
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-3.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-3.exp.json
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-4.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-4.exp.json
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-5.C
 create mode 100644 gcc/testsuite/g++.dg/modules/p1689-5.exp.json
 create mode 100644 gcc/testsuite/g++.dg/modules/test-p1689.py
 create mode 100644 gcc/testsuite/lib/modules.exp

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f9052da2f97..4bdf70701f1 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
+
+	* doc/invoke.texi: Document -fdep-format=, -fdep-file=, and
+	-fdep-output= flags.
+
 2022-10-26  David Faust  <david.faust@oracle.com>
 
 	* config/bpf/bpf.cc: Support __builtin_preserve_field_info.
diff --git a/gcc/c-family/ChangeLog b/gcc/c-family/ChangeLog
index ee7b51179e1..819015fdaad 100644
--- a/gcc/c-family/ChangeLog
+++ b/gcc/c-family/ChangeLog
@@ -1,3 +1,9 @@
+2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
+
+	* c-opts.cc (c_common_handle_option): Add fdeps_file variable and
+	-fdep-format=, -fdep-file=, and -fdep-output= parsing.
+	* c.opt: Add -fdep-format=, -fdep-file=, and -fdep-output= flags.
+
 2022-10-26  Marek Polacek  <polacek@redhat.com>
 
 	PR c++/106393
diff --git a/gcc/c-family/c-opts.cc b/gcc/c-family/c-opts.cc
index 32b929e3ece..093432bddeb 100644
--- a/gcc/c-family/c-opts.cc
+++ b/gcc/c-family/c-opts.cc
@@ -77,6 +77,9 @@ static bool verbose;
 /* Dependency output file.  */
 static const char *deps_file;
 
+/* Enhanced dependency output file.  */
+static const char *fdeps_file;
+
 /* The prefix given by -iprefix, if any.  */
 static const char *iprefix;
 
@@ -360,6 +363,23 @@ c_common_handle_option (size_t scode, const char *arg, HOST_WIDE_INT value,
       deps_file = arg;
       break;
 
+    case OPT_fdep_format_:
+      if (!strcmp (arg, "p1689r5"))
+	cpp_opts->deps.format = DEPS_FMT_P1689R5;
+      else
+	error ("%<-fdep-format=%> unknown format %<%s%>", arg);
+      break;
+
+    case OPT_fdep_file_:
+      deps_seen = true;
+      fdeps_file = arg;
+      break;
+
+    case OPT_fdep_output_:
+      deps_seen = true;
+      defer_opt (code, arg);
+      break;
+
     case OPT_MF:
       deps_seen = true;
       deps_file = arg;
@@ -1271,6 +1291,7 @@ void
 c_common_finish (void)
 {
   FILE *deps_stream = NULL;
+  FILE *fdeps_stream = NULL;
 
   /* Note that we write the dependencies even if there are errors. This is
      useful for handling outdated generated headers that now trigger errors
@@ -1299,9 +1320,24 @@ c_common_finish (void)
      locations with input_location, which would be incorrect now.  */
   override_libcpp_locations = false;
 
+  if (cpp_opts->deps.format != DEPS_FMT_NONE)
+    {
+      if (!fdeps_file)
+	fdeps_stream = out_stream;
+      else if (fdeps_file[0] == '-' && fdeps_file[1] == '\0')
+	fdeps_stream = stdout;
+      else
+	{
+	  fdeps_stream = fopen (fdeps_file, "w");
+	  if (!fdeps_stream)
+	    fatal_error (input_location, "opening dependency file %s: %m",
+			 fdeps_file);
+	}
+    }
+
   /* For performance, avoid tearing down cpplib's internal structures
      with cpp_destroy ().  */
-  cpp_finish (parse_in, deps_stream);
+  cpp_finish (parse_in, deps_stream, fdeps_stream);
 
   if (deps_stream && deps_stream != out_stream && deps_stream != stdout
       && (ferror (deps_stream) || fclose (deps_stream)))
@@ -1373,6 +1409,8 @@ handle_deferred_opts (void)
 
 	if (opt->code == OPT_MT || opt->code == OPT_MQ)
 	  deps_add_target (deps, opt->arg, opt->code == OPT_MQ);
+	else if (opt->code == OPT_fdep_output_)
+	  deps_add_output (deps, opt->arg, true);
       }
 }
 
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 070f85c81d2..dcee27c6533 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -256,6 +256,18 @@ MT
 C ObjC C++ ObjC++ Joined Separate MissingArgError(missing makefile target after %qs)
 -MT <target>	Add a target that does not require quoting.
 
+fdep-format=
+C ObjC C++ ObjC++ NoDriverArg Joined MissingArgError(missing format after %qs)
+Format for output dependency information.  Supported (\"p1689r5\").
+
+fdep-file=
+C ObjC C++ ObjC++ NoDriverArg Joined MissingArgError(missing output path after %qs)
+File for output dependency information.
+
+fdep-output=
+C ObjC C++ ObjC++ NoDriverArg Joined MissingArgError(missing path after %qs)
+-fdep-output=obj.o Output file for the compile step.
+
 P
 C ObjC C++ ObjC++
 Do not generate #line directives.
diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index 4a490753778..d7fb7c8f3e6 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,3 +1,8 @@
+2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
+
+	* module.cc (preprocessed_module): Pass whether the module is
+	exported to dependency tracking.
+
 2022-10-26  Marek Polacek  <polacek@redhat.com>
 
 	PR c++/106393
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 9957df510e6..d9e760876b7 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -19794,7 +19794,8 @@ preprocessed_module (cpp_reader *reader)
 		  && (module->is_interface () || module->is_partition ()))
 		deps_add_module_target (deps, module->get_flatname (),
 					maybe_add_cmi_prefix (module->filename),
-					module->is_header());
+					module->is_header (),
+					module->is_exported ());
 	      else
 		deps_add_module_dep (deps, module->get_flatname ());
 	    }
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9f0e5460861..3bd4024f9c5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2786,6 +2786,21 @@ is @option{-fpermitted-flt-eval-methods=c11}.  The default when in a GNU
 dialect (@option{-std=gnu11} or similar) is
 @option{-fpermitted-flt-eval-methods=ts-18661-3}.
 
+@item -fdep-file=@var{file}
+@opindex fdep-file
+Write structured dependency information to @var{file}; @samp{-} means stdout.
+
+@item -fdep-format=@var{format}
+@opindex fdep-format
+The format to use for structured dependency information. @samp{p1689r5} is the
+only supported format right now.  Note that when this argument is specified, the
+output of @option{-MF} is stripped of some information (namely C++ modules) so
+that it does not use extended makefile syntax that most tools do not understand.
+
+@item -fdep-output=@var{file}
+@opindex fdep-output
+Analogous to @option{-MT} but for structured dependency information.
+
 @item -fplan9-extensions
 @opindex fplan9-extensions
 Accept some non-standard constructs used in Plan 9 code.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 56ccf843482..248f57e8035 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,10 @@
+2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
+
+	* g++.dg/modules/depflags*: New tests.
+	* g++.dg/modules/p1689*: New tests.
+	* g++.dg/modules/test-p1689.py: New tool for validating P1689 output.
+	* lib/modules.exp: Support for validating P1689 outputs.
+
 2022-10-26  David Malcolm  <dmalcolm@redhat.com>
 
 	* gcc.dg/analyzer/fd-3.c (test_5): Expect "opened here" message
diff --git a/gcc/testsuite/g++.dg/modules/depflags-f-MD.C b/gcc/testsuite/g++.dg/modules/depflags-f-MD.C
new file mode 100644
index 00000000000..90e1c9983bd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depflags-f-MD.C
@@ -0,0 +1,2 @@
+// { dg-additional-options -MD }
+// { dg-additional-options -fdep-format=p1689r5 }
diff --git a/gcc/testsuite/g++.dg/modules/depflags-f.C b/gcc/testsuite/g++.dg/modules/depflags-f.C
new file mode 100644
index 00000000000..6192300879d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depflags-f.C
@@ -0,0 +1 @@
+// { dg-additional-options -fdep-format=p1689r5 }
diff --git a/gcc/testsuite/g++.dg/modules/depflags-fi.C b/gcc/testsuite/g++.dg/modules/depflags-fi.C
new file mode 100644
index 00000000000..4f649a11bdd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depflags-fi.C
@@ -0,0 +1,3 @@
+// { dg-additional-options -fdep-format=invalid }
+
+// { dg-prune-output "error: '-fdep-format=' unknown format 'invalid'"  }
diff --git a/gcc/testsuite/g++.dg/modules/depflags-fj-MD.C b/gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
new file mode 100644
index 00000000000..a361d81f37f
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depflags-fj-MD.C
@@ -0,0 +1,3 @@
+// { dg-additional-options -MD }
+// { dg-additional-options -fdep-file=depflags-3.json }
+// { dg-additional-options -fdep-format=p1689r5 }
diff --git a/gcc/testsuite/g++.dg/modules/depflags-fj.C b/gcc/testsuite/g++.dg/modules/depflags-fj.C
new file mode 100644
index 00000000000..4a140ec1f13
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depflags-fj.C
@@ -0,0 +1,4 @@
+// { dg-additional-options -fdep-file=depflags-3.json }
+// { dg-additional-options -fdep-format=p1689r5 }
+
+// { dg-prune-output "error: to generate dependencies you must specify either '-M' or '-MM'" }
diff --git a/gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C b/gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C
new file mode 100644
index 00000000000..18d765211b4
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depflags-fjo-MD.C
@@ -0,0 +1,4 @@
+// { dg-additional-options -MD }
+// { dg-additional-options -fdep-file=depflags-3.json }
+// { dg-additional-options -fdep-output=depflags-1.C }
+// { dg-additional-options -fdep-format=p1689r5 }
diff --git a/gcc/testsuite/g++.dg/modules/depflags-fjo.C b/gcc/testsuite/g++.dg/modules/depflags-fjo.C
new file mode 100644
index 00000000000..6d239f63017
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depflags-fjo.C
@@ -0,0 +1,5 @@
+// { dg-additional-options -fdep-file=depflags-3.json }
+// { dg-additional-options -fdep-output=depflags-1.C }
+// { dg-additional-options -fdep-format=p1689r5 }
+
+// { dg-prune-output "error: to generate dependencies you must specify either '-M' or '-MM'" }
diff --git a/gcc/testsuite/g++.dg/modules/depflags-fo-MD.C b/gcc/testsuite/g++.dg/modules/depflags-fo-MD.C
new file mode 100644
index 00000000000..a3a775b606a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depflags-fo-MD.C
@@ -0,0 +1,3 @@
+// { dg-additional-options -MD }
+// { dg-additional-options -fdep-format=p1689r5 }
+// { dg-additional-options -fdep-output=depflags-1.C }
diff --git a/gcc/testsuite/g++.dg/modules/depflags-fo.C b/gcc/testsuite/g++.dg/modules/depflags-fo.C
new file mode 100644
index 00000000000..29839978e59
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depflags-fo.C
@@ -0,0 +1,4 @@
+// { dg-additional-options -fdep-format=p1689r5 }
+// { dg-additional-options -fdep-output=depflags-1.C }
+
+// { dg-prune-output "error: to generate dependencies you must specify either '-M' or '-MM'" }
diff --git a/gcc/testsuite/g++.dg/modules/depflags-j-MD.C b/gcc/testsuite/g++.dg/modules/depflags-j-MD.C
new file mode 100644
index 00000000000..d95c8e6c2e6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depflags-j-MD.C
@@ -0,0 +1,2 @@
+// { dg-additional-options -MD }
+// { dg-additional-options -fdep-file=depflags-3.json }
diff --git a/gcc/testsuite/g++.dg/modules/depflags-j.C b/gcc/testsuite/g++.dg/modules/depflags-j.C
new file mode 100644
index 00000000000..5f100b0f6e5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depflags-j.C
@@ -0,0 +1,3 @@
+// { dg-additional-options -fdep-file=depflags-3.json }
+
+// { dg-prune-output "error: to generate dependencies you must specify either '-M' or '-MM'" }
diff --git a/gcc/testsuite/g++.dg/modules/depflags-jo-MD.C b/gcc/testsuite/g++.dg/modules/depflags-jo-MD.C
new file mode 100644
index 00000000000..44330794abc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depflags-jo-MD.C
@@ -0,0 +1,3 @@
+// { dg-additional-options -MD }
+// { dg-additional-options -fdep-file=depflags-3.json }
+// { dg-additional-options -fdep-output=depflags-1.C }
diff --git a/gcc/testsuite/g++.dg/modules/depflags-jo.C b/gcc/testsuite/g++.dg/modules/depflags-jo.C
new file mode 100644
index 00000000000..8eec6bba1d1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depflags-jo.C
@@ -0,0 +1,4 @@
+// { dg-additional-options -fdep-file=depflags-3.json }
+// { dg-additional-options -fdep-output=depflags-1.C }
+
+// { dg-prune-output "error: to generate dependencies you must specify either '-M' or '-MM'" }
diff --git a/gcc/testsuite/g++.dg/modules/depflags-o-MD.C b/gcc/testsuite/g++.dg/modules/depflags-o-MD.C
new file mode 100644
index 00000000000..429f1f85684
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depflags-o-MD.C
@@ -0,0 +1,2 @@
+// { dg-additional-options -MD }
+// { dg-additional-options -fdep-output=depflags-1.C }
diff --git a/gcc/testsuite/g++.dg/modules/depflags-o.C b/gcc/testsuite/g++.dg/modules/depflags-o.C
new file mode 100644
index 00000000000..9a7326cc812
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/depflags-o.C
@@ -0,0 +1,3 @@
+// { dg-additional-options -fdep-output=depflags-1.C }
+
+// { dg-prune-output "error: to generate dependencies you must specify either '-M' or '-MM'" }
diff --git a/gcc/testsuite/g++.dg/modules/modules.exp b/gcc/testsuite/g++.dg/modules/modules.exp
index afb323d0efd..7fe8825144f 100644
--- a/gcc/testsuite/g++.dg/modules/modules.exp
+++ b/gcc/testsuite/g++.dg/modules/modules.exp
@@ -28,6 +28,7 @@
 # { dg-module-do [link|run] [xfail] [options] } # link [and run]
 
 load_lib g++-dg.exp
+load_lib modules.exp
 
 # If a testcase doesn't have special options, use these.
 global DEFAULT_CXXFLAGS
@@ -237,6 +238,13 @@ proc cleanup_module_files { files } {
     }
 }
 
+# delete the specified set of dep files
+proc cleanup_dep_files { files } {
+    foreach file $files {
+	file_on_host delete $file
+    }
+}
+
 global testdir
 set testdir $srcdir/$subdir
 proc srcdir {} {
@@ -310,6 +318,7 @@ foreach src [lsort [find $srcdir/$subdir {*_a.[CHX}]] {
 	set std_list [module-init $src]
 	foreach std $std_list {
 	    set mod_files {}
+	    set dep_files {}
 	    global module_do
 	    set module_do {"compile" "P"}
 	    set asm_list {}
@@ -346,6 +355,8 @@ foreach src [lsort [find $srcdir/$subdir {*_a.[CHX}]] {
 		set mod_files [find $DEFAULT_REPO *.gcm]
 	    }
 	    cleanup_module_files $mod_files
+
+	    cleanup_dep_files $dep_files
 	}
     }
 }
diff --git a/gcc/testsuite/g++.dg/modules/p1689-1.C b/gcc/testsuite/g++.dg/modules/p1689-1.C
new file mode 100644
index 00000000000..245e30d09ce
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/p1689-1.C
@@ -0,0 +1,18 @@
+// { dg-additional-options -E }
+// { dg-additional-options -MT }
+// { dg-additional-options p1689-1.json }
+// { dg-additional-options -MD }
+// { dg-additional-options -fmodules-ts }
+// { dg-additional-options -fdep-format=p1689r5 }
+// { dg-additional-options -fdep-output=p1689-1.o }
+// { dg-additional-options -fdep-file=p1689-1.json }
+
+// Export a module that uses modules, re-exports modules, and partitions.
+
+export module foo;
+export import foo:part1;
+import foo:part2;
+
+export import bar;
+
+// { dg-final { run-check-p1689-valid p1689-1.json p1689-1.exp.json } }
diff --git a/gcc/testsuite/g++.dg/modules/p1689-1.exp.json b/gcc/testsuite/g++.dg/modules/p1689-1.exp.json
new file mode 100644
index 00000000000..c5648ac7ae5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/p1689-1.exp.json
@@ -0,0 +1,27 @@
+{
+    "rules": [
+        {
+            "primary-output": "p1689-1.o",
+            "provides": [
+                {
+                    "logical-name": "foo",
+                    "is-interface": true
+                }
+            ],
+            "requires": [
+                "__P1689_unordered__",
+                {
+                    "logical-name": "bar"
+                },
+                {
+                    "logical-name": "foo:part1"
+                },
+                {
+                    "logical-name": "foo:part2"
+                }
+            ]
+        }
+    ],
+    "version": 0,
+    "revision": 0
+}
diff --git a/gcc/testsuite/g++.dg/modules/p1689-2.C b/gcc/testsuite/g++.dg/modules/p1689-2.C
new file mode 100644
index 00000000000..add07f59a0e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/p1689-2.C
@@ -0,0 +1,16 @@
+// { dg-additional-options -E }
+// { dg-additional-options -MT }
+// { dg-additional-options p1689-2.json }
+// { dg-additional-options -MD }
+// { dg-additional-options -fmodules-ts }
+// { dg-additional-options -fdep-format=p1689r5 }
+// { dg-additional-options -fdep-output=p1689-2.o }
+// { dg-additional-options -fdep-file=p1689-2.json }
+
+// Export a module partition that includes a (non-module) header.
+
+export module foo:part1;
+
+#include <iostream>
+
+// { dg-final { run-check-p1689-valid p1689-2.json p1689-2.exp.json } }
diff --git a/gcc/testsuite/g++.dg/modules/p1689-2.exp.json b/gcc/testsuite/g++.dg/modules/p1689-2.exp.json
new file mode 100644
index 00000000000..6901172f277
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/p1689-2.exp.json
@@ -0,0 +1,16 @@
+{
+    "rules": [
+        {
+            "primary-output": "p1689-2.o",
+            "provides": [
+                {
+                    "logical-name": "foo:part1",
+                    "is-interface": true
+                }
+            ],
+            "requires": []
+        }
+    ],
+    "version": 0,
+    "revision": 0
+}
diff --git a/gcc/testsuite/g++.dg/modules/p1689-3.C b/gcc/testsuite/g++.dg/modules/p1689-3.C
new file mode 100644
index 00000000000..3482c2f4903
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/p1689-3.C
@@ -0,0 +1,14 @@
+// { dg-additional-options -E }
+// { dg-additional-options -MT }
+// { dg-additional-options p1689-3.json }
+// { dg-additional-options -MD }
+// { dg-additional-options -fmodules-ts }
+// { dg-additional-options -fdep-format=p1689r5 }
+// { dg-additional-options -fdep-output=p1689-3.o }
+// { dg-additional-options -fdep-file=p1689-3.json }
+
+// Provide a module partition.
+
+module foo:part2;
+
+// { dg-final { run-check-p1689-valid p1689-3.json p1689-3.exp.json } }
diff --git a/gcc/testsuite/g++.dg/modules/p1689-3.exp.json b/gcc/testsuite/g++.dg/modules/p1689-3.exp.json
new file mode 100644
index 00000000000..5a40beacd22
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/p1689-3.exp.json
@@ -0,0 +1,16 @@
+{
+    "rules": [
+        {
+            "primary-output": "p1689-3.o",
+            "provides": [
+                {
+                    "logical-name": "foo:part2",
+                    "is-interface": false
+                }
+            ],
+            "requires": []
+        }
+    ],
+    "version": 0,
+    "revision": 0
+}
diff --git a/gcc/testsuite/g++.dg/modules/p1689-4.C b/gcc/testsuite/g++.dg/modules/p1689-4.C
new file mode 100644
index 00000000000..88bac77a8f8
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/p1689-4.C
@@ -0,0 +1,14 @@
+// { dg-additional-options -E }
+// { dg-additional-options -MT }
+// { dg-additional-options p1689-4.json }
+// { dg-additional-options -MD }
+// { dg-additional-options -fmodules-ts }
+// { dg-additional-options -fdep-format=p1689r5 }
+// { dg-additional-options -fdep-output=p1689-4.o }
+// { dg-additional-options -fdep-file=p1689-4.json }
+
+// Module implementation unit.
+
+module foo;
+
+// { dg-final { run-check-p1689-valid p1689-4.json p1689-4.exp.json } }
diff --git a/gcc/testsuite/g++.dg/modules/p1689-4.exp.json b/gcc/testsuite/g++.dg/modules/p1689-4.exp.json
new file mode 100644
index 00000000000..b119f5654b1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/p1689-4.exp.json
@@ -0,0 +1,14 @@
+{
+    "rules": [
+        {
+            "primary-output": "p1689-4.o",
+            "requires": [
+                {
+                    "logical-name": "foo"
+                }
+            ]
+        }
+    ],
+    "version": 0,
+    "revision": 0
+}
diff --git a/gcc/testsuite/g++.dg/modules/p1689-5.C b/gcc/testsuite/g++.dg/modules/p1689-5.C
new file mode 100644
index 00000000000..e985277368b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/p1689-5.C
@@ -0,0 +1,14 @@
+// { dg-additional-options -E }
+// { dg-additional-options -MT }
+// { dg-additional-options p1689-5.json }
+// { dg-additional-options -MD }
+// { dg-additional-options -fmodules-ts }
+// { dg-additional-options -fdep-format=p1689r5 }
+// { dg-additional-options -fdep-output=p1689-5.o }
+// { dg-additional-options -fdep-file=p1689-5.json }
+
+// Use modules, don't provide anything.
+
+import bar;
+
+// { dg-final { run-check-p1689-valid p1689-5.json p1689-5.exp.json } }
diff --git a/gcc/testsuite/g++.dg/modules/p1689-5.exp.json b/gcc/testsuite/g++.dg/modules/p1689-5.exp.json
new file mode 100644
index 00000000000..18704ac8820
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/p1689-5.exp.json
@@ -0,0 +1,14 @@
+{
+    "rules": [
+        {
+            "primary-output": "p1689-5.o",
+            "requires": [
+                {
+                    "logical-name": "bar"
+                }
+            ]
+        }
+    ],
+    "version": 0,
+    "revision": 0
+}
diff --git a/gcc/testsuite/g++.dg/modules/test-p1689.py b/gcc/testsuite/g++.dg/modules/test-p1689.py
new file mode 100644
index 00000000000..2f07cc361aa
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/test-p1689.py
@@ -0,0 +1,222 @@
+import json
+
+
+# Parameters.
+ALL_ERRORS = False
+REPLACEMENTS = {}
+
+
+def _print_path(path):
+    '''Format a JSON path for output.'''
+    return '/'.join(path)
+
+
+def _report_error(msg):
+    '''Report an error.'''
+    full_msg = 'ERROR: ' + msg
+    if ALL_ERRORS:
+        print(full_msg)
+    else:
+        raise RuntimeError(full_msg)
+
+
+def _error_type_mismatch(path, actual, expect):
+    '''Report that there is a type mismatch.'''
+    _report_error('type mismatch at %s: actual: "%s" expect: "%s"' % (_print_path(path), actual, expect))
+
+
+def _error_unknown_type(path, typ):
+    '''Report that there is an unknown type in the JSON object.'''
+    _report_error('unknown type at %s: "%s"' % (_print_path(path), typ))
+
+
+def _error_length_mismatch(path, actual, expect):
+    '''Report a length mismatch in an object or array.'''
+    _report_error('length mismatch at %s: actual: "%s" expect: "%s"' % (_print_path(path), actual, expect))
+
+
+def _error_unexpect_value(path, actual, expect):
+    '''Report a value mismatch.'''
+    _report_error('value mismatch at %s: actual: "%s" expect: "%s"' % (_print_path(path), actual, expect))
+
+
+def _error_extra_key(path, key):
+    '''Report on a key that is unexpected.'''
+    _report_error('extra key at %s: "%s"' % (_print_path(path), key))
+
+
+def _error_missing_key(path, key):
+    '''Report on a key that is missing.'''
+    _report_error('missing key at %s: "%s"' % (_print_path(path), key))
+
+
+def _compare_object(path, actual, expect):
+    '''Compare a JSON object.'''
+    is_ok = True
+
+    if not len(actual) == len(expect):
+        _error_length_mismatch(path, len(actual), len(expect))
+        is_ok = False
+
+    for key in actual:
+        if key not in expect:
+            _error_extra_key(path, key)
+            is_ok = False
+        else:
+            sub_error = compare_json(path + [key], actual[key], expect[key])
+            if sub_error:
+                is_ok = False
+
+    for key in expect:
+        if key not in actual:
+            _error_missing_key(path, key)
+            is_ok = False
+
+    return is_ok
+
+
+def _compare_array(path, actual, expect):
+    '''Compare a JSON array.'''
+    is_ok = True
+
+    if not len(actual) == len(expect):
+        _error_length_mismatch(path, len(actual), len(expect))
+        is_ok = False
+
+    for (idx, (a, e)) in enumerate(zip(actual, expect)):
+        sub_error = compare_json(path + [str(idx)], a, e)
+        if sub_error:
+            is_ok = False
+
+    return is_ok
+
+
+def _make_replacements(value):
+    for (old, new) in REPLACEMENTS.items():
+        value = value.replace(old, new)
+    return value
+
+
+def _compare_string(path, actual, expect):
+    '''Compare a JSON string supporting replacements in the expected output.'''
+    expect = _make_replacements(expect)
+
+    if not actual == expect:
+        _error_unexpect_value(path, actual, expect)
+        return False
+    else:
+        print('%s is ok: %s' % (_print_path(path), actual))
+    return True
+
+
+def _compare_number(path, actual, expect):
+    '''Compare a JSON integer.'''
+    if not actual == expect:
+        _error_unexpect_value(path, actual, expect)
+        return False
+    else:
+        print('%s is ok: %s' % (_print_path(path), actual))
+    return True
+
+
+def _inspect_ordering(arr):
+    req_ordering = True
+
+    if not arr:
+        return arr, req_ordering
+
+    if arr[0] == '__P1689_unordered__':
+        arr.pop(0)
+        req_ordering = False
+
+    return arr, req_ordering
+
+
+def compare_json(path, actual, expect):
+    actual_type = type(actual)
+    expect_type = type(expect)
+
+    is_ok = True
+
+    if not actual_type == expect_type:
+        _error_type_mismatch(path, actual_type, expect_type)
+        is_ok = False
+    elif actual_type == dict:
+        is_ok = _compare_object(path, actual, expect)
+    elif actual_type == list:
+        expect, req_ordering = _inspect_ordering(expect)
+        if not req_ordering:
+            actual = sorted(actual, key=lambda v: json.dumps(v, sort_keys=True))
+            expect = sorted(expect, key=lambda v: json.dumps(v, sort_keys=True))
+        is_ok = _compare_array(path, actual, expect)
+    elif actual_type == str:
+        is_ok = _compare_string(path, actual, expect)
+    elif actual_type == float:
+        is_ok = _compare_number(path, actual, expect)
+    elif actual_type == int:
+        is_ok = _compare_number(path, actual, expect)
+    elif actual_type == bool:
+        is_ok = _compare_number(path, actual, expect)
+    elif actual_type == type(None):
+        pass
+    else:
+        _error_unknown_type(path, actual_type)
+        is_ok = False
+
+    return is_ok
+
+
+def validate_p1689(actual, expect):
+    '''Validate a P1689 file against an expected output file.
+
+    Returns `False` if it fails, `True` if they are the same.
+    '''
+    with open(actual, 'r') as fin:
+        actual_content = fin.read()
+    with open(expect, 'r') as fin:
+        expect_content = fin.read()
+
+    actual_json = json.loads(actual_content)
+    expect_json = json.loads(expect_content)
+
+    return compare_json([], actual_json, expect_json)
+
+
+if __name__ == '__main__':
+    import sys
+
+    actual = None
+    expect = None
+
+    # Parse arguments.
+    args = sys.argv[1:]
+    while args:
+        # Take an argument.
+        arg = args.pop(0)
+
+        # Parse out replacement expressions.
+        if arg == '-r' or arg == '--replace':
+            replacement = args.pop(0)
+            (key, value) = replacement.split('=', maxsplit=1)
+            REPLACEMENTS[key] = value
+        # Flag to change how errors are reported.
+        elif arg == '-A' or arg == '--all':
+            ALL_ERRORS = True
+        # Required arguments.
+        elif arg == '-a' or arg == '--actual':
+            actual = args.pop(0)
+        elif arg == '-e' or arg == '--expect':
+            expect = args.pop(0)
+
+    # Validate that we have the required arguments.
+    if actual is None:
+        raise RuntimeError('missing "actual" file')
+    if expect is None:
+        raise RuntimeError('missing "expect" file')
+
+    # Do the actual work.
+    is_ok = validate_p1689(actual, expect)
+
+    # Fail if errors are found.
+    if not is_ok:
+        sys.exit(1)
diff --git a/gcc/testsuite/lib/modules.exp b/gcc/testsuite/lib/modules.exp
new file mode 100644
index 00000000000..c7cfda6aae4
--- /dev/null
+++ b/gcc/testsuite/lib/modules.exp
@@ -0,0 +1,71 @@
+#   Copyright (C) 1997-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# Helpers for checking the P1689R5 module dependency files produced by
+# the compiler against expected JSON files.
+
+load_lib "target-supports.exp"
+
+#
+# clean-p1689-file -- delete a working file the compiler creates for p1689
+#
+# TESTCASE is the name of the test.
+# SUFFIX is the file suffix.
+
+proc clean-p1689-file { testcase suffix } {
+    set basename [file tail $testcase]
+    set base [file rootname $basename]
+    remote_file host delete $base.$suffix
+}
+
+#
+# clean-p1689 -- delete the working files the compiler creates for p1689
+#
+# TESTCASE is the name of the test.
+#
+proc clean-p1689 { testcase } {
+    clean-p1689-file $testcase "d"
+    clean-p1689-file $testcase "json"
+}
+
+# Called by dg-final to compare P1689 dependency file DEPFILE against TEMPLATE.
+
+proc run-check-p1689-valid { depfile template } {
+    global srcdir subdir
+    # Extract the test file name from the arguments.
+    set testcase [file rootname [file tail $depfile]]
+
+    verbose "Running P1689 validation for $testcase in $srcdir/$subdir" 2
+    set testcase [remote_download host $testcase]
+
+    set pytest_script "test-p1689.py"
+    if { ![check_effective_target_recent_python3] } {
+      unsupported "$pytest_script python3 is missing"
+      return
+    }
+
+    verbose "running script" 1
+    spawn -noecho python3 $srcdir/$subdir/$pytest_script --all --actual $depfile --expect $srcdir/$subdir/$template
+
+    expect {
+      -re "ERROR: (\[^\r\n\]*)" {
+       fail $expect_out(0,string)
+       exp_continue
+      }
+    }
+
+    clean-p1689 $testcase
+}
diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
index 4e2c7900ae2..34117d79fee 100644
--- a/libcpp/ChangeLog
+++ b/libcpp/ChangeLog
@@ -1,3 +1,14 @@
+2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
+
+	* include/cpplib.h: Add cpp_deps_format enum.
+	* include/cpplib.h (cpp_options): Add format field.
+	* include/cpplib.h (cpp_finish): Add dependency stream parameter.
+	* include/mkdeps.h (deps_add_module_target): Add new
+	`is_exported` parameter used for C++ module tracking.
+	* init.cc (cpp_finish): Add new parameter for the P1689R5
+	dependency output stream.
+	* mkdeps.cc (mkdeps): Implement P1689R5 output.
+
 2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
 
 	* include/charset.cc: Add `_cpp_valid_utf8_str` which determines
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index 1d34c00669f..3b2e4f23204 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -302,6 +302,9 @@ typedef CPPCHAR_SIGNED_T cppchar_signed_t;
 /* Style of header dependencies to generate.  */
 enum cpp_deps_style { DEPS_NONE = 0, DEPS_USER, DEPS_SYSTEM };
 
+/* Format of header dependencies to generate.  */
+enum cpp_deps_format { DEPS_FMT_NONE = 0, DEPS_FMT_P1689R5 };
+
 /* The possible normalization levels, from most restrictive to least.  */
 enum cpp_normalize_level {
   /* In NFKC.  */
@@ -589,6 +592,9 @@ struct cpp_options
     /* Style of header dependencies to generate.  */
     enum cpp_deps_style style;
 
+    /* Format of header dependencies to generate.  */
+    enum cpp_deps_format format;
+
     /* Assume missing files are generated files.  */
     bool missing_files;
 
@@ -1112,9 +1118,9 @@ extern void cpp_post_options (cpp_reader *);
 extern void cpp_init_iconv (cpp_reader *);
 
 /* Call this to finish preprocessing.  If you requested dependency
-   generation, pass an open stream to write the information to,
-   otherwise NULL.  It is your responsibility to close the stream.  */
-extern void cpp_finish (cpp_reader *, FILE *deps_stream);
+   generation, pass open stream(s) to write the information to,
+   otherwise NULL.  It is your responsibility to close the stream(s).  */
+extern void cpp_finish (cpp_reader *, FILE *deps_stream, FILE *fdeps_stream = NULL);
 
 /* Call this to release the handle at the end of preprocessing.  Any
    use of the handle after this function returns is invalid.  */
diff --git a/libcpp/include/mkdeps.h b/libcpp/include/mkdeps.h
index 96d64641b1a..0bd284f903e 100644
--- a/libcpp/include/mkdeps.h
+++ b/libcpp/include/mkdeps.h
@@ -53,20 +53,29 @@ extern void deps_add_default_target (class mkdeps *, const char *);
 
 /* Adds a module target.  The module name and cmi name are copied.  */
 extern void deps_add_module_target (struct mkdeps *, const char *module,
-				    const char *cmi, bool is_header);
+				    const char *cmi, bool is_header,
+				    bool is_exported);
 
 /* Adds a module dependency.  The module name is copied.  */
 extern void deps_add_module_dep (struct mkdeps *, const char *module);
 
+/* Add an output.  */
+extern void deps_add_output (struct mkdeps *, const char *, bool);
+
 /* Add a dependency (appears on the right side of the colon) to the
    deps list.  Dependencies will be printed in the order that they
    were entered with this function.  By convention, the first
    dependency entered should be the primary source file.  */
 extern void deps_add_dep (class mkdeps *, const char *);
 
-/* Write out a deps buffer to a specified file.  The last argument
-   is the number of columns to word-wrap at (0 means don't wrap).  */
-extern void deps_write (const cpp_reader *, FILE *, unsigned int);
+/* Write out a deps buffer to a specified file.  The last argument
+   is the number of columns to word-wrap at (0 means don't wrap).
+   C++ module information is written in Make syntax only when no
+   other dependency format (`-fdep-format=`) has been requested.  */
+extern void deps_write (const struct cpp_reader *, FILE *, unsigned int);
+
+/* Write out a deps buffer to a specified file in P1689R5 format.  */
+extern void deps_write_p1689r5 (const struct mkdeps *, FILE *);
 
 /* Write out a deps buffer to a file, in a form that can be read back
    with deps_restore.  Returns nonzero on error, in which case the
diff --git a/libcpp/init.cc b/libcpp/init.cc
index 5f34e3515d2..ca1e57b1669 100644
--- a/libcpp/init.cc
+++ b/libcpp/init.cc
@@ -855,7 +855,7 @@ read_original_directory (cpp_reader *pfile)
    Maybe it should also reset state, such that you could call
    cpp_start_read with a new filename to restart processing.  */
 void
-cpp_finish (cpp_reader *pfile, FILE *deps_stream)
+cpp_finish (struct cpp_reader *pfile, FILE *deps_stream, FILE *fdeps_stream)
 {
   /* Warn about unused macros before popping the final buffer.  */
   if (CPP_OPTION (pfile, warn_unused_macros))
@@ -869,8 +869,15 @@ cpp_finish (cpp_reader *pfile, FILE *deps_stream)
   while (pfile->buffer)
     _cpp_pop_buffer (pfile);
 
-  if (deps_stream)
-    deps_write (pfile, deps_stream, 72);
+  cpp_deps_format deps_format = CPP_OPTION (pfile, deps.format);
+  if (deps_format == DEPS_FMT_P1689R5 && fdeps_stream)
+    deps_write_p1689r5 (pfile->deps, fdeps_stream);
+
+  if (CPP_OPTION (pfile, deps.style) != DEPS_NONE
+      && deps_stream)
+    {
+      deps_write (pfile, deps_stream, 72);
+    }
 
   /* Report on headers that could use multiple include guards.  */
   if (CPP_OPTION (pfile, print_include_names))
diff --git a/libcpp/mkdeps.cc b/libcpp/mkdeps.cc
index 30e87d8b4d7..8e0171d3f00 100644
--- a/libcpp/mkdeps.cc
+++ b/libcpp/mkdeps.cc
@@ -81,7 +81,8 @@ public:
   };
 
   mkdeps ()
-    : module_name (NULL), cmi_name (NULL), is_header_unit (false), quote_lwm (0)
+    : primary_output (NULL), module_name (NULL), cmi_name (NULL)
+    , is_header_unit (false), is_exported (false), quote_lwm (0)
   {
   }
   ~mkdeps ()
@@ -90,6 +91,9 @@ public:
 
     for (i = targets.size (); i--;)
       free (const_cast <char *> (targets[i]));
+    free (const_cast <char *> (primary_output));
+    for (i = outputs.size (); i--;)
+      free (const_cast <char *> (outputs[i]));
     for (i = deps.size (); i--;)
       free (const_cast <char *> (deps[i]));
     for (i = vpath.size (); i--;)
@@ -103,6 +107,8 @@ public:
 public:
   vec<const char *> targets;
   vec<const char *> deps;
+  const char * primary_output;
+  vec<const char *> outputs;
   vec<velt> vpath;
   vec<const char *> modules;
 
@@ -110,6 +116,7 @@ public:
   const char *module_name;
   const char *cmi_name;
   bool is_header_unit;
+  bool is_exported;
   unsigned short quote_lwm;
 };
 
@@ -288,6 +295,21 @@ deps_add_default_target (class mkdeps *d, const char *tgt)
     }
 }
 
+/* Adds an output O.  We make a copy, so it need not be a permanent
+   string.  */
+void
+deps_add_output (struct mkdeps *d, const char *o, bool is_primary)
+{
+  o = apply_vpath (d, o);
+  if (is_primary)
+  {
+    if (d->primary_output)
+      d->outputs.push (d->primary_output);
+    d->primary_output = xstrdup (o);
+  } else
+    d->outputs.push (xstrdup (o));
+}
+
 void
 deps_add_dep (class mkdeps *d, const char *t)
 {
@@ -325,12 +347,13 @@ deps_add_vpath (class mkdeps *d, const char *vpath)
 
 void
 deps_add_module_target (struct mkdeps *d, const char *m,
-			const char *cmi, bool is_header_unit)
+			const char *cmi, bool is_header_unit, bool is_exported)
 {
   gcc_assert (!d->module_name);
   
   d->module_name = xstrdup (m);
   d->is_header_unit = is_header_unit;
+  d->is_exported = is_exported;
   d->cmi_name = xstrdup (cmi);
 }
 
@@ -395,10 +418,15 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
   if (colmax && colmax < 34)
     colmax = 34;
 
+  /* Write out C++ modules information if no other `-fdep-format=`
+     option is given.  */
+  cpp_deps_format deps_format = CPP_OPTION (pfile, deps.format);
+  bool write_make_modules_deps = deps_format == DEPS_FMT_NONE;
+
   if (d->deps.size ())
     {
       column = make_write_vec (d->targets, fp, 0, colmax, d->quote_lwm);
-      if (CPP_OPTION (pfile, deps.modules) && d->cmi_name)
+      if (write_make_modules_deps && CPP_OPTION (pfile, deps.modules) && d->cmi_name)
 	column = make_write_name (d->cmi_name, fp, column, colmax);
       fputs (":", fp);
       column++;
@@ -412,7 +440,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
   if (!CPP_OPTION (pfile, deps.modules))
     return;
 
-  if (d->modules.size ())
+  if (write_make_modules_deps && d->modules.size ())
     {
       column = make_write_vec (d->targets, fp, 0, colmax, d->quote_lwm);
       if (d->cmi_name)
@@ -423,7 +451,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
       fputs ("\n", fp);
     }
 
-  if (d->module_name)
+  if (write_make_modules_deps && d->module_name)
     {
       if (d->cmi_name)
 	{
@@ -455,7 +483,7 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
 	}
     }
   
-  if (d->modules.size ())
+  if (write_make_modules_deps && d->modules.size ())
     {
       column = fprintf (fp, "CXX_IMPORTS +=");
       make_write_vec (d->modules, fp, column, colmax, 0, ".c++m");
@@ -468,11 +496,118 @@ make_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
 /* Really we should be opening fp here.  */
 
 void
-deps_write (const cpp_reader *pfile, FILE *fp, unsigned int colmax)
+deps_write (const struct cpp_reader *pfile, FILE *fp, unsigned int colmax)
 {
   make_write (pfile, fp, colmax);
 }
 
+static void
+p1689r5_write_filepath (const char *name, FILE *fp)
+{
+  if (_cpp_valid_utf8_str (name))
+    {
+      fputc ('"', fp);
+      for (const char* c = name; *c; c++)
+	{
+	  // Escape control characters.
+	  if (ISCNTRL (*c))
+	    fprintf (fp, "\\u%04x", *c);
+	  // JSON escape characters.
+	  else if (*c == '"' || *c == '\\')
+	    {
+	      fputc ('\\', fp);
+	      fputc (*c, fp);
+	    }
+	  // Everything else.
+	  else
+	    fputc (*c, fp);
+	}
+      fputc ('"', fp);
+    }
+  else
+    {
+      // TODO: print an error
+    }
+}
+
+static void
+p1689r5_write_vec (const mkdeps::vec<const char *> &vec, FILE *fp)
+{
+  for (unsigned ix = 0; ix != vec.size (); ix++)
+    {
+      p1689r5_write_filepath (vec[ix], fp);
+      if (ix < vec.size () - 1)
+	fputc (',', fp);
+      fputc ('\n', fp);
+    }
+}
+
+void
+deps_write_p1689r5 (const struct mkdeps *d, FILE *fp)
+{
+  fputs ("{\n", fp);
+
+  fputs ("\"rules\": [\n", fp);
+  fputs ("{\n", fp);
+
+  if (d->primary_output)
+    {
+      fputs ("\"primary-output\": ", fp);
+      p1689r5_write_filepath (d->primary_output, fp);
+      fputs (",\n", fp);
+    }
+
+  if (d->outputs.size ())
+    {
+      fputs ("\"outputs\": [\n", fp);
+      p1689r5_write_vec (d->outputs, fp);
+      fputs ("],\n", fp);
+    }
+
+  if (d->module_name)
+    {
+      fputs ("\"provides\": [\n", fp);
+      fputs ("{\n", fp);
+
+      fputs ("\"logical-name\": ", fp);
+      p1689r5_write_filepath (d->module_name, fp);
+      fputs (",\n", fp);
+
+      fprintf (fp, "\"is-interface\": %s\n", d->is_exported ? "true" : "false");
+
+      // TODO: header-unit information
+
+      fputs ("}\n", fp);
+      fputs ("],\n", fp);
+    }
+
+  fputs ("\"requires\": [\n", fp);
+  for (size_t i = 0; i < d->modules.size (); i++)
+    {
+      if (i != 0)
+	fputs (",\n", fp);
+      fputs ("{\n", fp);
+
+      fputs ("\"logical-name\": ", fp);
+      p1689r5_write_filepath (d->modules[i], fp);
+      fputs ("\n", fp);
+
+      // TODO: header-unit information
+
+      fputs ("}\n", fp);
+    }
+  fputs ("]\n", fp);
+
+  fputs ("}\n", fp);
+
+  fputs ("],\n", fp);
+
+  fputs ("\"version\": 0,\n", fp);
+  fputs ("\"revision\": 0\n", fp);
+
+  fputs ("}\n", fp);
+}
+
 /* Write out a deps buffer to a file, in a form that can be read back
    with deps_restore.  Returns nonzero on error, in which case the
    error number will be in errno.  */
-- 
2.37.3
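
For reviewers, a rough Python model of the escaping that `p1689r5_write_filepath` applies to paths and module names may help when checking the JSON output. This is a sketch based on our reading of the patch, not code from it. `p1689_quote` is a hypothetical name, and we assume `ISCNTRL` matches 0x00-0x1F and DEL. One deliberate difference: the patch currently writes nothing for invalid UTF-8 (its TODO), while this sketch raises `UnicodeDecodeError`.

```python
import json


def p1689_quote(name: bytes) -> str:
    """Quote a path or module name as a JSON string literal.

    Models p1689r5_write_filepath(): ASCII control characters become
    \\uXXXX escapes, '"' and '\\' are backslash-escaped, and every other
    character is copied through unchanged.
    """
    # Hypothetical choice: raise UnicodeDecodeError on invalid UTF-8
    # instead of silently writing nothing, as the patch does for now.
    text = name.decode('utf-8')
    out = ['"']
    for ch in text:
        if ch < ' ' or ch == '\x7f':
            out.append('\\u%04x' % ord(ch))  # e.g. newline -> \u000a
        elif ch in '"\\':
            out.append('\\' + ch)            # JSON-significant characters
        else:
            out.append(ch)                   # includes non-ASCII text
    out.append('"')
    return ''.join(out)


# Every quoted name should round-trip through a real JSON parser.
for raw in [b'foo:part1', b'dir\\sub/"x".o', b'tab\there', 'caf\u00e9'.encode()]:
    assert json.loads(p1689_quote(raw)) == raw.decode('utf-8')

print(p1689_quote(b'a"b\nc'))  # prints "a\"b\u000ac"
```

The writer emits `\u000a` for a newline where Python's `json.dumps` would emit `\n`. Both spellings are valid JSON, so the check above round-trips through `json.loads` rather than comparing strings.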


* Re: [PATCH v2 1/3] libcpp: reject codepoints above 0x10FFFF
  2022-10-27 23:16 ` [PATCH v2 1/3] libcpp: reject codepoints above 0x10FFFF Ben Boeckel
@ 2022-10-28 12:54   ` David Malcolm
  2022-11-07 23:04   ` Jason Merrill
  1 sibling, 0 replies; 12+ messages in thread
From: David Malcolm @ 2022-10-28 12:54 UTC (permalink / raw)
  To: Ben Boeckel, gcc-patches
  Cc: jason, nathan, fortran, gcc, brad.king, mliska, anlauf

On Thu, 2022-10-27 at 19:16 -0400, Ben Boeckel wrote:
> Unicode does not support such values because they are unrepresentable
> in UTF-16.

Wikipedia pointed me to RFC 3629, which is where UTF-8 introduced this
restriction, whereas libcpp was implementing the higher upper limit
from the earlier, superseded RFC 2279.
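
To make the UTF-16 argument concrete: a surrogate pair carries 20 bits of payload on top of the 0x10000 base. The largest encodable codepoint is therefore 0xFFFF + 2**20 = 0x10FFFF. The Python sketch below mirrors the condition the patch now uses in `one_utf8_to_cppchar` and `one_utf32_to_utf8`: reject anything above 0x10FFFF and the surrogate range 0xD800-0xDFFF. The function names are hypothetical and not part of libcpp.

```python
# Largest codepoint UTF-16 can encode: the BMP (up to U+FFFF) plus the
# 2**20 supplementary codepoints reachable through surrogate pairs.
MAX_CODEPOINT = 0xFFFF + (1 << 20)  # == 0x10FFFF


def is_valid_scalar(c: int) -> bool:
    """Mirror the patched libcpp check: reject values above 0x10FFFF and
    the UTF-16 surrogate range 0xD800-0xDFFF."""
    return 0 <= c <= MAX_CODEPOINT and not (0xD800 <= c <= 0xDFFF)


def utf16_units(c: int) -> list:
    """Encode a valid codepoint as UTF-16 code units."""
    if not is_valid_scalar(c):
        raise ValueError('not a Unicode scalar value: %#x' % c)
    if c <= 0xFFFF:
        return [c]
    v = c - 0x10000  # at most 20 bits remain, split 10/10 across the pair
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]
```

For example, `utf16_units(0x10FFFF)` yields `[0xDBFF, 0xDFFF]`, the last possible surrogate pair. No pair exists for 0x110000, which is why RFC 3629 caps UTF-8 there too.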

The patch looks good to me, assuming it bootstraps and passes usual
regression testing, but...
> 
> Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
> ---
>  libcpp/ChangeLog  | 6 ++++++
>  libcpp/charset.cc | 4 ++--
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> index 18d5bcceaf0..4d707277531 100644
> --- a/libcpp/ChangeLog
> +++ b/libcpp/ChangeLog
> @@ -1,3 +1,9 @@
> +2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
> +
> +       * include/charset.cc: Reject encodings of codepoints above 0x10FFFF.
> +       UTF-16 does not support such codepoints and therefore all Unicode
> +       rejects such values.
> +
>  2022-10-19  Lewis Hyatt  <lhyatt@gmail.com>

...AIUI we now put ChangeLog entries in the blurb part of the patch, so
that server-side git scripts add them to the actual ChangeLog file.

Does the patch pass:
  ./contrib/gcc-changelog/git_check_commit.py
?

Thanks
Dave

>  
>         * include/cpplib.h (struct cpp_string): Use new "string_length" GTY.
> diff --git a/libcpp/charset.cc b/libcpp/charset.cc
> index 12a398e7527..e9da6674b5f 100644
> --- a/libcpp/charset.cc
> +++ b/libcpp/charset.cc
> @@ -216,7 +216,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t *inbytesleftp,
>    if (c <= 0x3FFFFFF && nbytes > 5) return EILSEQ;
>  
>    /* Make sure the character is valid.  */
> -  if (c > 0x7FFFFFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
> +  if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
>  
>    *cp = c;
>    *inbufp = inbuf;
> @@ -320,7 +320,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp,
>    s += inbuf[bigend ? 2 : 1] << 8;
>    s += inbuf[bigend ? 3 : 0];
>  
> -  if (s >= 0x7FFFFFFF || (s >= 0xD800 && s <= 0xDFFF))
> +  if (s > 0x10FFFF || (s >= 0xD800 && s <= 0xDFFF))
>      return EILSEQ;
>  
>    rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);

* Re: [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string
  2022-10-27 23:16 ` [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string Ben Boeckel
@ 2022-10-28 12:59   ` David Malcolm
  2022-10-28 17:14     ` Ben Boeckel
  2022-11-07 23:47   ` Jason Merrill
  1 sibling, 1 reply; 12+ messages in thread
From: David Malcolm @ 2022-10-28 12:59 UTC (permalink / raw)
  To: Ben Boeckel, gcc-patches
  Cc: jason, nathan, fortran, gcc, brad.king, mliska, anlauf

On Thu, 2022-10-27 at 19:16 -0400, Ben Boeckel wrote:
> This simplifies the interface for other UTF-8 validity detections
> when a simple "yes" or "no" answer is sufficient.
> 
> Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
> ---
>  libcpp/ChangeLog  |  6 ++++++
>  libcpp/charset.cc | 18 ++++++++++++++++++
>  libcpp/internal.h |  2 ++
>  3 files changed, 26 insertions(+)
> 
> diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> index 4d707277531..4e2c7900ae2 100644
> --- a/libcpp/ChangeLog
> +++ b/libcpp/ChangeLog
> @@ -1,3 +1,9 @@
> +2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
> +
> +       * include/charset.cc: Add `_cpp_valid_utf8_str` which determines
> +       whether a C string is valid UTF-8 or not.
> +       * include/internal.h: Add prototype for `_cpp_valid_utf8_str`.
> +
>  2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
>  
>         * include/charset.cc: Reject encodings of codepoints above 0x10FFFF.

The patch looks good to me, with the same potential caveat that you
might need to move the ChangeLog entry from the patch "body" to the
leading blurb, to satisfy:
  ./contrib/gcc-changelog/git_check_commit.py

Thanks
Dave
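
The body of `_cpp_valid_utf8_str` is not quoted in this thread. The following is therefore only an illustrative Python sketch of what a yes/no UTF-8 check over a NUL-terminated C string involves. It assumes the same limits as patch 1/3 (no codepoints above 0x10FFFF, no surrogates) and also rejects overlong and truncated sequences in the spirit of RFC 3629. `valid_utf8_cstr` is a hypothetical name.

```python
def valid_utf8_cstr(buf: bytes) -> bool:
    """Return True if buf, read up to its first NUL byte, is valid UTF-8.

    Hypothetical helper: rejects overlong forms, surrogates
    (U+D800-U+DFFF), codepoints above U+10FFFF, and truncated sequences.
    """
    nul = buf.find(b'\0')
    if nul != -1:
        buf = buf[:nul]          # a C string ends at its first NUL
    i = 0
    while i < len(buf):
        lead = buf[i]
        if lead < 0x80:          # plain ASCII byte
            i += 1
            continue
        # Sequence length, payload bits of the lead byte, and the smallest
        # codepoint that legitimately needs this many bytes.
        if 0xC2 <= lead <= 0xDF:
            n, cp, lowest = 2, lead & 0x1F, 0x80
        elif 0xE0 <= lead <= 0xEF:
            n, cp, lowest = 3, lead & 0x0F, 0x800
        elif 0xF0 <= lead <= 0xF4:
            n, cp, lowest = 4, lead & 0x07, 0x10000
        else:
            return False         # stray continuation byte or invalid lead
        if i + n > len(buf):
            return False         # truncated sequence
        for byte in buf[i + 1:i + n]:
            if (byte & 0xC0) != 0x80:
                return False     # expected a continuation byte
            cp = (cp << 6) | (byte & 0x3F)
        if cp < lowest or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return False         # overlong, out of range, or surrogate
        i += n
    return True
```

Python's strict decoder (`bytes.decode('utf-8')`) follows RFC 3629. Cross-checking against it is an easy way to gain confidence in a hand-written validator like this one.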

* Re: [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string
  2022-10-28 12:59   ` David Malcolm
@ 2022-10-28 17:14     ` Ben Boeckel
  0 siblings, 0 replies; 12+ messages in thread
From: Ben Boeckel @ 2022-10-28 17:14 UTC (permalink / raw)
  To: David Malcolm
  Cc: gcc-patches, jason, nathan, fortran, gcc, brad.king, mliska, anlauf

On Fri, Oct 28, 2022 at 08:59:16 -0400, David Malcolm wrote:
> On Thu, 2022-10-27 at 19:16 -0400, Ben Boeckel wrote:
> > This simplifies the interface for other UTF-8 validity detections when a
> > simple "yes" or "no" answer is sufficient.
> > 
> > Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
> > ---
> >  libcpp/ChangeLog  |  6 ++++++
> >  libcpp/charset.cc | 18 ++++++++++++++++++
> >  libcpp/internal.h |  2 ++
> >  3 files changed, 26 insertions(+)
> > 
> > diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> > index 4d707277531..4e2c7900ae2 100644
> > --- a/libcpp/ChangeLog
> > +++ b/libcpp/ChangeLog
> > @@ -1,3 +1,9 @@
> > +2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
> > +
> > +       * include/charset.cc: Add `_cpp_valid_utf8_str` which determines
> > +       whether a C string is valid UTF-8 or not.
> > +       * include/internal.h: Add prototype for `_cpp_valid_utf8_str`.
> > +
> >  2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
> >  
> >         * include/charset.cc: Reject encodings of codepoints above 0x10FFFF.
> 
> The patch looks good to me, with the same potential caveat that you
> might need to move the ChangeLog entry from the patch "body" to the
> leading blurb, to satisfy:
>   ./contrib/gcc-changelog/git_check_commit.py

Ah, I had missed that. Now fixed locally for patches 1 and 2; will be in
v3 pending some time for further reviews.

Thanks,

--Ben


* Re: [PATCH v2 3/3] p1689r5: initial support
  2022-10-27 23:16 ` [PATCH v2 3/3] p1689r5: initial support Ben Boeckel
@ 2022-10-28 17:15   ` Ben Boeckel
  2022-11-01 14:57   ` Tom Tromey
  1 sibling, 0 replies; 12+ messages in thread
From: Ben Boeckel @ 2022-10-28 17:15 UTC (permalink / raw)
  To: gcc-patches
  Cc: jason, nathan, fortran, gcc, brad.king, dmalcolm, mliska, anlauf

On Thu, Oct 27, 2022 at 19:16:44 -0400, Ben Boeckel wrote:
> diff --git a/gcc/testsuite/g++.dg/modules/modules.exp b/gcc/testsuite/g++.dg/modules/modules.exp
> index afb323d0efd..7fe8825144f 100644
> --- a/gcc/testsuite/g++.dg/modules/modules.exp
> +++ b/gcc/testsuite/g++.dg/modules/modules.exp
> @@ -28,6 +28,7 @@
>  # { dg-module-do [link|run] [xfail] [options] } # link [and run]
>  
>  load_lib g++-dg.exp
> +load_lib modules.exp
>  
>  # If a testcase doesn't have special options, use these.
>  global DEFAULT_CXXFLAGS
> @@ -237,6 +238,13 @@ proc cleanup_module_files { files } {
>      }
>  }
>  
> +# delete the specified set of dep files
> +proc cleanup_dep_files { files } {
> +    foreach file $files {
> +	file_on_host delete $file
> +    }
> +}
> +
>  global testdir
>  set testdir $srcdir/$subdir
>  proc srcdir {} {
> @@ -310,6 +318,7 @@ foreach src [lsort [find $srcdir/$subdir {*_a.[CHX}]] {
>  	set std_list [module-init $src]
>  	foreach std $std_list {
>  	    set mod_files {}
> +	    set dep_files {}
>  	    global module_do
>  	    set module_do {"compile" "P"}
>  	    set asm_list {}
> @@ -346,6 +355,8 @@ foreach src [lsort [find $srcdir/$subdir {*_a.[CHX}]] {
>  		set mod_files [find $DEFAULT_REPO *.gcm]
>  	    }
>  	    cleanup_module_files $mod_files
> +
> +	    cleanup_dep_files $dep_files
>  	}
>      }
>  }

These `cleanup_dep_files` hunks are leftovers from my attempts at
getting the P1689 and flags tests working; they'll be gone in v3.

--Ben


* Re: [PATCH v2 3/3] p1689r5: initial support
  2022-10-27 23:16 ` [PATCH v2 3/3] p1689r5: initial support Ben Boeckel
  2022-10-28 17:15   ` Ben Boeckel
@ 2022-11-01 14:57   ` Tom Tromey
  2022-11-01 16:22     ` Ben Boeckel
  1 sibling, 1 reply; 12+ messages in thread
From: Tom Tromey @ 2022-11-01 14:57 UTC (permalink / raw)
  To: Ben Boeckel via Gcc-patches
  Cc: Ben Boeckel, gcc, brad.king, fortran, anlauf, nathan

>>>>> "Ben" == Ben Boeckel via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

Ben> - `-fdeps-file=` specifies the path to the file to write the format to.

I don't know how this output is intended to be used, but one mistake
made with the other dependency-tracking options was that the output file
isn't created atomically.  As a consequence, Makefiles normally have to
work around this to be robust.  If that's a possible issue here then it
would be best to handle it in this patch.

Tom
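The usual remedy for non-atomic output is to write to a temporary file in the same directory and `rename()` it over the destination once it is complete. POSIX specifies that `rename()` replaces an existing target atomically. What follows is a minimal POSIX sketch of that pattern. `write_file_atomically` is a hypothetical name, and nothing here reflects how the patch's `-fdeps-file=` writer is actually structured.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write CONTENTS to PATH so that concurrent readers
   see either the previous file or the complete new one, never a partial
   write.  Returns 0 on success and -1 on failure.  */
static int
write_file_atomically (const char *path, const char *contents)
{
  static const char suffix[] = ".tmpXXXXXX";
  char *tmp;
  FILE *f;
  int fd, write_failed;
  size_t len;

  assert (path && contents);
  len = strlen (path);
  tmp = malloc (len + sizeof suffix);
  if (!tmp)
    return -1;
  memcpy (tmp, path, len);
  memcpy (tmp + len, suffix, sizeof suffix);  /* Copies the NUL too.  */

  /* Create the temporary next to PATH so rename() stays on one
     filesystem.  */
  fd = mkstemp (tmp);
  if (fd < 0)
    {
      free (tmp);
      return -1;
    }
  f = fdopen (fd, "w");
  if (!f)
    {
      close (fd);
      unlink (tmp);
      free (tmp);
      return -1;
    }

  /* Finish the write, then publish it.  POSIX rename() atomically
     replaces an existing PATH.  */
  write_failed = fputs (contents, f) == EOF;
  if (fclose (f) != 0 || write_failed || rename (tmp, path) != 0)
    {
      unlink (tmp);
      free (tmp);
      return -1;
    }
  free (tmp);
  return 0;
}
```

This sketch has two caveats. First, `mkstemp()` creates the file with mode 0600 rather than the umask-derived mode a plain `fopen()` would give. Second, no `fsync()` is done, so the sketch protects concurrent readers but not against a system crash.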


* Re: [PATCH v2 3/3] p1689r5: initial support
  2022-11-01 14:57   ` Tom Tromey
@ 2022-11-01 16:22     ` Ben Boeckel
  0 siblings, 0 replies; 12+ messages in thread
From: Ben Boeckel @ 2022-11-01 16:22 UTC (permalink / raw)
  To: Tom Tromey
  Cc: Ben Boeckel via Gcc-patches, gcc, brad.king, fortran, anlauf, nathan

On Tue, Nov 01, 2022 at 08:57:37 -0600, Tom Tromey wrote:
> >>>>> "Ben" == Ben Boeckel via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> 
> Ben> - `-fdeps-file=` specifies the path to the file to write the format to.
> 
> I don't know how this output is intended to be used, but one mistake
> made with the other dependency-tracking options was that the output file
> isn't created atomically.  As a consequence, Makefiles normally have to
> work around this to be robust.  If that's a possible issue here then it
> would be best to handle it in this patch.

I don't think there'll be any race here because it's the "output" of the
rule as far as the build graph is concerned. It's also JSON, so anything
reading it "early" will get a partial object and easily detect
"something went wrong". And for clarity, the `-o` flag used in CMake
with this is just a side effect of the `-E` mechanism used and is
completely ignored in the CMake usage of this.

--Ben


* Re: [PATCH v2 1/3] libcpp: reject codepoints above 0x10FFFF
  2022-10-27 23:16 ` [PATCH v2 1/3] libcpp: reject codepoints above 0x10FFFF Ben Boeckel
  2022-10-28 12:54   ` David Malcolm
@ 2022-11-07 23:04   ` Jason Merrill
  1 sibling, 0 replies; 12+ messages in thread
From: Jason Merrill @ 2022-11-07 23:04 UTC (permalink / raw)
  To: Ben Boeckel, gcc-patches
  Cc: nathan, fortran, gcc, brad.king, dmalcolm, mliska, anlauf

On 10/27/22 13:16, Ben Boeckel wrote:
> Unicode does not support such values because they are unrepresentable in
> UTF-16.
> 
> Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
> ---
>   libcpp/ChangeLog  | 6 ++++++
>   libcpp/charset.cc | 4 ++--
>   2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> index 18d5bcceaf0..4d707277531 100644
> --- a/libcpp/ChangeLog
> +++ b/libcpp/ChangeLog
> @@ -1,3 +1,9 @@
> +2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
> +
> +	* include/charset.cc: Reject encodings of codepoints above 0x10FFFF.
> +	UTF-16 does not support such codepoints and therefore all Unicode
> +	rejects such values.
> +
>   2022-10-19  Lewis Hyatt  <lhyatt@gmail.com>
>   
>   	* include/cpplib.h (struct cpp_string): Use new "string_length" GTY.
> diff --git a/libcpp/charset.cc b/libcpp/charset.cc
> index 12a398e7527..e9da6674b5f 100644
> --- a/libcpp/charset.cc
> +++ b/libcpp/charset.cc
> @@ -216,7 +216,7 @@ one_utf8_to_cppchar (const uchar **inbufp, size_t *inbytesleftp,
>     if (c <= 0x3FFFFFF && nbytes > 5) return EILSEQ;
>   
>     /* Make sure the character is valid.  */
> -  if (c > 0x7FFFFFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;
> +  if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return EILSEQ;

Please also adjust the comment before the function that talks about the 
0x7FFFFFFF maximum.

>   
>     *cp = c;
>     *inbufp = inbuf;
> @@ -320,7 +320,7 @@ one_utf32_to_utf8 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp,
>     s += inbuf[bigend ? 2 : 1] << 8;
>     s += inbuf[bigend ? 3 : 0];
>   
> -  if (s >= 0x7FFFFFFF || (s >= 0xD800 && s <= 0xDFFF))
> +  if (s > 0x10FFFF || (s >= 0xD800 && s <= 0xDFFF))
>       return EILSEQ;
>   
>     rval = one_cppchar_to_utf8 (s, outbufp, outbytesleftp);
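The 0x10FFFF bound in the commit message follows from UTF-16 surrogate arithmetic. A surrogate pair carries 10 + 10 bits on top of 0x10000, so the largest encodable value is 0x10000 + 0x3FF * 2^10 + 0x3FF = 0x10FFFF. The sketch below makes that concrete. `decode_surrogate_pair` is a hypothetical helper, not a libcpp function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combine a UTF-16 surrogate pair into a code point.
   The high surrogate (0xD800..0xDBFF) supplies the top 10 bits and the low
   surrogate (0xDC00..0xDFFF) the bottom 10, offset by 0x10000.  The largest
   pair, 0xDBFF 0xDFFF, therefore yields 0x10000 + 0xFFFFF = 0x10FFFF, which
   is why no code point above that value is representable in UTF-16.  */
static uint32_t
decode_surrogate_pair (uint16_t hi, uint16_t lo)
{
  assert (hi >= 0xD800 && hi <= 0xDBFF);
  assert (lo >= 0xDC00 && lo <= 0xDFFF);
  return 0x10000 + (((uint32_t) (hi - 0xD800) << 10)
                    | (uint32_t) (lo - 0xDC00));
}
```

For example, `decode_surrogate_pair (0xDBFF, 0xDFFF)` gives 0x10FFFF, and `decode_surrogate_pair (0xD83D, 0xDE00)` gives U+1F600.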



* Re: [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string
  2022-10-27 23:16 ` [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string Ben Boeckel
  2022-10-28 12:59   ` David Malcolm
@ 2022-11-07 23:47   ` Jason Merrill
  1 sibling, 0 replies; 12+ messages in thread
From: Jason Merrill @ 2022-11-07 23:47 UTC (permalink / raw)
  To: Ben Boeckel, gcc-patches
  Cc: nathan, fortran, gcc, brad.king, dmalcolm, mliska, anlauf

On 10/27/22 13:16, Ben Boeckel wrote:
> This simplifies the interface for other UTF-8 validity detections when a
> simple "yes" or "no" answer is sufficient.
> 
> Signed-off-by: Ben Boeckel <ben.boeckel@kitware.com>
> ---
>   libcpp/ChangeLog  |  6 ++++++
>   libcpp/charset.cc | 18 ++++++++++++++++++
>   libcpp/internal.h |  2 ++
>   3 files changed, 26 insertions(+)
> 
> diff --git a/libcpp/ChangeLog b/libcpp/ChangeLog
> index 4d707277531..4e2c7900ae2 100644
> --- a/libcpp/ChangeLog
> +++ b/libcpp/ChangeLog
> @@ -1,3 +1,9 @@
> +2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
> +
> +	* include/charset.cc: Add `_cpp_valid_utf8_str` which determines
> +	whether a C string is valid UTF-8 or not.
> +	* include/internal.h: Add prototype for `_cpp_valid_utf8_str`.
> +
>   2022-10-27  Ben Boeckel  <ben.boeckel@kitware.com>
>   
>   	* include/charset.cc: Reject encodings of codepoints above 0x10FFFF.
> diff --git a/libcpp/charset.cc b/libcpp/charset.cc
> index e9da6674b5f..0524ab6beba 100644
> --- a/libcpp/charset.cc
> +++ b/libcpp/charset.cc
> @@ -1864,6 +1864,24 @@ _cpp_valid_utf8 (cpp_reader *pfile,
>     return true;
>   }

Please add a comment before the function.

> +extern bool
> +_cpp_valid_utf8_str (const char *name)
> +{
> +  const uchar* in = (const uchar*)name;
> +  size_t len = strlen(name);
> +  cppchar_t cp;
> +
> +  while (*in)
> +    {
> +      if (one_utf8_to_cppchar(&in, &len, &cp))
> +	{
> +	  return false;
> +	}
> +    }

We usually omit unnecessary { } around single statements.

> +  return true;
> +}
> +
>   /* Subroutine of convert_hex and convert_oct.  N is the representation
>      in the execution character set of a numeric escape; write it into the
>      string buffer TBUF and update the end-of-string pointer therein.  WIDE
> diff --git a/libcpp/internal.h b/libcpp/internal.h
> index badfd1b40da..4f2dd4a2f5c 100644
> --- a/libcpp/internal.h
> +++ b/libcpp/internal.h
> @@ -834,6 +834,8 @@ extern bool _cpp_valid_utf8 (cpp_reader *pfile,
>   			     struct normalize_state *nst,
>   			     cppchar_t *cp);
>   
> +extern bool _cpp_valid_utf8_str (const char *str);
> +
>   extern void _cpp_destroy_iconv (cpp_reader *);
>   extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
>   					  unsigned char *, size_t, size_t,



Thread overview: 12+ messages
2022-10-27 23:16 [PATCH v2 0/1] RFC: P1689R5 support Ben Boeckel
2022-10-27 23:16 ` [PATCH v2 1/3] libcpp: reject codepoints above 0x10FFFF Ben Boeckel
2022-10-28 12:54   ` David Malcolm
2022-11-07 23:04   ` Jason Merrill
2022-10-27 23:16 ` [PATCH v2 2/3] libcpp: add a function to determine UTF-8 validity of a C string Ben Boeckel
2022-10-28 12:59   ` David Malcolm
2022-10-28 17:14     ` Ben Boeckel
2022-11-07 23:47   ` Jason Merrill
2022-10-27 23:16 ` [PATCH v2 3/3] p1689r5: initial support Ben Boeckel
2022-10-28 17:15   ` Ben Boeckel
2022-11-01 14:57   ` Tom Tromey
2022-11-01 16:22     ` Ben Boeckel
