public inbox for gcc-patches@gcc.gnu.org
* [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens
@ 2023-07-21 23:08 Lewis Hyatt
  2023-07-21 23:08 ` [PATCH v3 1/4] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers Lewis Hyatt
                   ` (4 more replies)
  0 siblings, 5 replies; 36+ messages in thread
From: Lewis Hyatt @ 2023-07-21 23:08 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm, Lewis Hyatt

Hello-

This is an update to the v2 patch series last sent in January:
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609473.html

Although I have not yet received any feedback on the v2 patches, they needed
rebasing on top of other recent commits to input.cc, so I thought it would be
helpful to send them again now. The patches have not otherwise changed from
v2, and the above-linked message explains how all the patches fit in with the
original v1 series sent last November.

Dave, I would appreciate it very much if you could let me know what you
think of this approach. I think the diagnostics we currently output for
_Pragmas are worth improving. As a reminder, take this example:

=====
 #define S "GCC diagnostic ignored \"oops"
 _Pragma(S)
=====

We currently output:

=====
file.cpp:2:24: warning: missing terminating " character
    2 | _Pragma(S)
      |                        ^
=====

While after these patches, we would output:

=====
<generated>:1:24: warning: missing terminating " character
    1 | GCC diagnostic ignored "oops
      |                        ^
file.cpp:2:1: note: in <_Pragma directive>
    2 | _Pragma(S)
      | ^~~~~~~
=====

Thanks!

-Lewis


* [PATCH v3 1/4] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers
  2023-07-21 23:08 [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
@ 2023-07-21 23:08 ` Lewis Hyatt
  2023-07-28 22:58   ` David Malcolm
  2023-07-21 23:08 ` [PATCH v3 2/4] diagnostics: Handle generated data locations in edit_context Lewis Hyatt
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 36+ messages in thread
From: Lewis Hyatt @ 2023-07-21 23:08 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm, Lewis Hyatt

Add a new linemap reason LC_GEN which enables encoding the location of data
that was generated during compilation and does not appear in any source file.
There could be many use cases, for instance referring to the content of
builtin macros (not yet implemented, but an easy lift after this one). The
first intended application is to create a place to store the input to a
_Pragma directive, so that proper locations can be assigned to those
tokens. This will be done in a subsequent commit.

The actual change needed to the line-maps API in libcpp is not too large and
requires no space overhead in the line map data structures on 64-bit systems,
where the one newly added data member of class line_map_ordinary sits inside
former padding bytes. An LC_GEN map is just an ordinary map like any other,
but the TO_FILE member that normally points to the file name points instead to
the actual data.  This works automatically with PCH as well, for the same
reason that the file name makes its way into a PCH.  In order to avoid
confusion, the member has been renamed from TO_FILE to DATA, and the
associated accessors have been adjusted.
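
To illustrate the idea (this is only a rough sketch, not the code in the
patch), a single pointer member can serve both purposes, with the map's
reason deciding how it is interpreted. All of the toy_* names below are
hypothetical stand-ins and are not the real libcpp declarations; storing a
length that counts the terminating NUL for file names is likewise an
assumption made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy stand-ins only; none of these are the real libcpp declarations.
enum toy_reason { TOY_ENTER, TOY_GEN };

struct toy_ordinary_map
{
  toy_reason reason;
  const char *data;      // File name, or the generated bytes for TOY_GEN.
  unsigned int data_len; // Number of bytes DATA points to.
};

// True if the map describes an in-memory buffer rather than a file.
inline bool
toy_generated_data_p (const toy_ordinary_map &map)
{
  return map.reason == TOY_GEN;
}

// Accessor for the file name; asking a generated-data map for one is a bug.
inline const char *
toy_file_name (const toy_ordinary_map &map)
{
  assert (!toy_generated_data_p (map));
  return map.data;
}

// Build a map for a normal file; the length counts the terminating NUL
// (an assumption of this sketch).
inline toy_ordinary_map
toy_make_file_map (const char *fname)
{
  return { TOY_ENTER, fname, (unsigned int) (std::strlen (fname) + 1) };
}

// Build a map for LEN bytes of generated data starting at BUF.
inline toy_ordinary_map
toy_make_generated_map (const char *buf, unsigned int len)
{
  return { TOY_GEN, buf, len };
}
```

Because the pointer is stored where the file name always was, anything that
already preserves the file name pointer would preserve the buffer too, which
is the intuition behind the PCH remark above.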

Outside libcpp, there are many small changes but most of them are to
selftests, which are necessarily more sensitive to implementation
details. From the perspective of the user (the "user", here, being a frontend
using line maps or else the diagnostics infrastructure), the chief visible
change is that the function location_get_source_line() should be passed an
expanded_location object instead of a separate filename and line number.  This
is not a big change because, in most cases, that information came from a call
to expand_location anyway, so the needed expanded_location object is readily
available. The new overload of location_get_source_line() uses the extra
information in the expanded_location object to obtain the data from the
in-memory buffer when it originated from an LC_GEN map.
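
As a rough sketch of that dispatch (again, not the patch's implementation),
the lookup can check whether the expanded location carries a buffer and, if
so, find the requested line in memory. The toy_exploc type, its field names
and the toy_* functions are hypothetical; the file-based path is deliberately
omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for an expanded location with the extra fields.
struct toy_exploc
{
  const char *file;               // File name, or a placeholder name.
  int line;                       // 1-based line number.
  const char *generated_data;     // Non-null for generated data.
  std::size_t generated_data_len; // Number of bytes in the buffer.
};

// Return line LINE (1-based) of BUF without its newline, or an empty
// view if BUF has no such line.
inline std::string_view
toy_nth_line (std::string_view buf, int line)
{
  if (line < 1)
    return {};
  std::size_t pos = 0;
  for (int cur = 1; cur < line; ++cur)
    {
      std::size_t nl = buf.find ('\n', pos);
      if (nl == std::string_view::npos)
        return {};
      pos = nl + 1;
    }
  if (pos >= buf.size ())
    return {};
  std::size_t end = buf.find ('\n', pos);
  if (end == std::string_view::npos)
    end = buf.size ();
  return buf.substr (pos, end - pos);
}

// Look up the source line for XLOC.  Generated data is read directly
// from memory; reading from a file is left out of this sketch.
inline std::string_view
toy_get_source_line (const toy_exploc &xloc)
{
  if (xloc.generated_data)
    return toy_nth_line (std::string_view (xloc.generated_data,
                                           xloc.generated_data_len),
                         xloc.line);
  return {};
}
```

A caller that already holds an expanded location passes it through unchanged,
so it does not need to know which kind of map the location came from.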

No LC_GEN maps are generated within GCC until the subsequent patch starts
using them, so nothing is added to the testsuite here; however, all relevant
selftests have been extended to cover generated data maps in addition to
normal files.

libcpp/ChangeLog:

	* include/line-map.h (enum lc_reason): Add LC_GEN.
	(struct line_map_ordinary): Add new members to support LC_GEN concept.
	(ORDINARY_MAP_FILE_NAME): Assert that map really does encode a file
	and not generated data.
	(ORDINARY_MAP_GENERATED_DATA_P): New function.
	(ORDINARY_MAP_GENERATED_DATA): New function.
	(ORDINARY_MAP_GENERATED_DATA_LEN): New function.
	(ORDINARY_MAP_FILE_NAME_OR_DATA): New function.
	(ORDINARY_MAPS_SAME_FILE_P): Declare new function.
	(ORDINARY_MAP_CONTAINING_FILE_NAME): Declare new function.
	(LINEMAP_FILE): This was always a synonym for ORDINARY_MAP_FILE_NAME;
	make this explicit.
	(linemap_get_file_highest_location): Adjust prototype.
	(linemap_add): Adjust prototype.
	(class expanded_location): Add new members to store generated content.
	* line-map.cc (ORDINARY_MAP_CONTAINING_FILE_NAME): New function.
	(ORDINARY_MAPS_SAME_FILE_P): New function.
	(linemap_add): Add new argument DATA_LEN. Support generated data in
	LC_GEN maps.
	(linemap_check_files_exited): Adapt to API changes supporting LC_GEN.
	(linemap_line_start): Likewise.
	(linemap_position_for_loc_and_offset): Likewise.
	(linemap_get_expansion_filename): Likewise.
	(linemap_expand_location): Likewise.
	(linemap_dump): Likewise.
	(linemap_dump_location): Likewise.
	(linemap_get_file_highest_location): Likewise.
	* directives.cc (_cpp_do_file_change): Likewise.

gcc/ChangeLog:

	* diagnostic-show-locus.cc (make_range): Initialize new fields in
	expanded_location.
	(compatible_locations_p): Use new ORDINARY_MAPS_SAME_FILE_P ()
	function.
	(layout::calculate_x_offset_display): Use the new expanded_location
	overload of location_get_source_line(), so as to support LC_GEN maps.
	(layout::print_line): Likewise.
	(source_line::source_line): Likewise.
	(line_corrections::add_hint): Likewise.
	(class line_corrections): Store the location as an exploc rather than
	individual filename, so as to support LC_GEN maps.
	(layout::print_trailing_fixits): Use the new exploc constructor for
	class line_corrections.
	(test_layout_x_offset_display_utf8): Test LC_GEN maps as well as normal.
	(test_layout_x_offset_display_tab): Likewise.
	(test_diagnostic_show_locus_one_liner): Likewise.
	(test_diagnostic_show_locus_one_liner_utf8): Likewise.
	(test_add_location_if_nearby): Likewise.
	(test_diagnostic_show_locus_fixit_lines): Likewise.
	(test_fixit_consolidation): Likewise.
	(test_overlapped_fixit_printing): Likewise.
	(test_overlapped_fixit_printing_utf8): Likewise.
	(test_overlapped_fixit_printing_2): Likewise.
	(test_fixit_insert_containing_newline): Likewise.
	(test_fixit_insert_containing_newline_2): Likewise.
	(test_fixit_replace_containing_newline): Likewise.
	(test_fixit_deletion_affecting_newline): Likewise.
	(test_tab_expansion): Likewise.
	(test_escaping_bytes_1): Likewise.
	(test_escaping_bytes_2): Likewise.
	(test_line_numbers_multiline_range): Likewise.
	(diagnostic_show_locus_cc_tests): Likewise.
	* diagnostic.cc (diagnostic_report_current_module): Support LC_GEN
	maps when outputting include trace.
	(assert_location_text): Zero-initialize the expanded_location so as to
	cover all fields, including the newly added ones.
	* gcc-rich-location.cc (blank_line_before_p): Use the new
	expanded_location overload of location_get_source_line().
	* input.cc (special_fname_generated): New function.
	(class file_cache_slot): Factored out most of implementation to a new
	base class...
	(class cache_data_source): ... here.
	(cache_data_source::cache_data_source): New member function.
	(cache_data_source::~cache_data_source): New member function.
	(cache_data_source::reset): New member function.
	(class data_cache_slot): New derived class of cache_data_source which
	handles generated data.
	(data_cache_slot::create): New function.
	(expand_location_1): Handle LC_GEN locations.
	(total_lines_num): Likewise.
	(file_cache::lookup_data): New member function.
	(diagnostics_file_cache_forcibly_evict_data): New function.
	(file_cache::forcibly_evict_data): New member function.
	(file_cache::add_data): New member function.
	(file_cache::lookup_or_add_data): New member function.
	(file_cache::evicted_cache_tab_entry): Adapt to handle generated data
	locations.
	(file_cache::file_cache): Likewise.
	(file_cache::~file_cache): Likewise.
	(file_cache_slot::evict): Rename to...
	(file_cache_slot::reset): ...the new interface here.
	(file_cache_slot::create): Likewise.
	(file_cache_slot::file_cache_slot): Likewise.
	(file_cache_slot::~file_cache_slot): Likewise.
	(file_cache_slot::needs_read_p): Likewise.
	(file_cache_slot::needs_grow_p): Likewise.
	(file_cache_slot::maybe_grow): Likewise.
	(file_cache_slot::read_data): Likewise.
	(file_cache_slot::maybe_read_data): Rename to...
	(file_cache_slot::get_more_data): ...the new interface here.
	(find_end_of_line): Add missing const.
	(file_cache_slot::get_next_line): Refactored to...
	(cache_data_source::get_next_line): ...here.
	(file_cache_slot::goto_next_line): Refactored to...
	(cache_data_source::goto_next_line): ...here.
	(file_cache_slot::read_line_num): Refactored to...
	(cache_data_source::read_line_num): ...here.
	(location_get_source_line): Change to take an expanded_location
	argument instead of a filename.  Support generated data. Add another
	overload taking a filename that delegates to this one.
	(location_compute_display_column): Use new overload of
	location_get_source_line and handle generated data locations.
	(dump_location_info): Likewise.
	(get_substring_ranges_for_loc): Likewise.
	(temp_source_file::do_linemap_add): New member function.
	(line_table_test::line_table_test): Initialize the new member.
	(test_accessing_ordinary_linemaps): Test generated data as well as
	normal files.
	(test_make_location_nonpure_range_endpoints): Likewise.
	(test_line_offset_overflow): Likewise.
	(for_each_line_table_case): Add new argument requesting to test
	generated data.
	(input_cc_tests): Enable testing generated data in the selftests.
	* input.h (special_fname_generated): Declare new function.
	(location_get_source_line): Add new overload taking an
	expanded_location.
	(class data_cache_slot): Forward declare.
	(class file_cache): Add a cache of generated data buffers as well as
	ordinary file buffers.
	(diagnostics_file_cache_forcibly_evict_data): Declare new function.
	* selftest.cc (named_temp_file::named_temp_file): Support nullptr
	argument to disable creating any file.
	(named_temp_file::~named_temp_file): Likewise.
	(temp_source_file::temp_source_file): Add a new constructor argument
	to enable creating generated data instead of a file.
	(temp_source_file::~temp_source_file): Handle freeing generated data buffer.
	* selftest.h (struct line_map_ordinary): Forward declare.
	(class named_temp_file): Add missing explicit on constructor.
	(class temp_source_file): Add new members to store generated content.
	(class line_table_test): Add new m_generated_data member.
	(for_each_line_table_case): Update prototype for new argument.

gcc/c-family/ChangeLog:

	* c-common.cc (try_to_locate_new_include_insertion_point): Add
	awareness of LC_GEN maps.
	* c-format.cc (get_corrected_substring): Use the new expanded_location
	overload of location_get_source_line(), so as to support LC_GEN maps.
	* c-indentation.cc (get_visual_column): Likewise.
	(get_first_nws_vis_column): Likewise.
	(detect_intervening_unindent): Likewise.
	(should_warn_for_misleading_indentation): Likewise.
	(assert_get_visual_column_succeeds): Zero-initialize the exploc to
	cover all fields including those newly added.
	(assert_get_visual_column_fails): Likewise.

gcc/cp/ChangeLog:

	* module.cc (module_state::write_ordinary_maps): Ignore LC_GEN maps to
	be safe.
	(module_state::read_ordinary_maps): Likewise.

gcc/go/ChangeLog:

	* go-linemap.cc (Gcc_linemap::to_string): Adapt to linemaps API change.

gcc/testsuite/ChangeLog:

	* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c: Use the new
	overload of location_get_source_line.
---
 gcc/c-family/c-common.cc                      |   9 +-
 gcc/c-family/c-format.cc                      |   2 +-
 gcc/c-family/c-indentation.cc                 |  28 +-
 gcc/cp/module.cc                              |   9 +-
 gcc/diagnostic-show-locus.cc                  | 239 ++---
 gcc/diagnostic.cc                             |  15 +-
 gcc/gcc-rich-location.cc                      |   2 +-
 gcc/go/go-linemap.cc                          |   3 +-
 gcc/input.cc                                  | 821 ++++++++++--------
 gcc/input.h                                   |  22 +-
 gcc/selftest.cc                               |  53 +-
 gcc/selftest.h                                |  20 +-
 .../diagnostic_plugin_test_show_locus.c       |   4 +-
 libcpp/directives.cc                          |   3 +-
 libcpp/include/line-map.h                     |  92 +-
 libcpp/line-map.cc                            | 178 +++-
 16 files changed, 926 insertions(+), 574 deletions(-)

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 9fbaeb437a1..44256ae5512 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -9206,11 +9206,15 @@ try_to_locate_new_include_insertion_point (const char *file, location_t loc)
       const line_map_ordinary *ord_map
 	= LINEMAPS_ORDINARY_MAP_AT (line_table, i);
 
+      if (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
+	continue;
+
       if (const line_map_ordinary *from
 	  = linemap_included_from_linemap (line_table, ord_map))
 	/* We cannot use pointer equality, because with preprocessed
 	   input all filename strings are unique.  */
-	if (0 == strcmp (from->to_file, file))
+	if (!ORDINARY_MAP_GENERATED_DATA_P (from)
+	    && 0 == strcmp (ORDINARY_MAP_FILE_NAME (from), file))
 	  {
 	    last_include_ord_map = from;
 	    last_ord_map_after_include = NULL;
@@ -9218,7 +9222,8 @@ try_to_locate_new_include_insertion_point (const char *file, location_t loc)
 
       /* Likewise, use strcmp, and reject any line-zero introductory
 	 map.  */
-      if (ord_map->to_line && 0 == strcmp (ord_map->to_file, file))
+      if (ord_map->to_line
+	  && 0 == strcmp (ORDINARY_MAP_FILE_NAME (ord_map), file))
 	{
 	  if (!first_ord_map_in_file)
 	    first_ord_map_in_file = ord_map;
diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index b4eeebcb30e..eda85c0162a 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -4537,7 +4537,7 @@ get_corrected_substring (const substring_loc &fmt_loc,
   if (caret.column > finish.column)
     return NULL;
 
-  char_span line = location_get_source_line (start.file, start.line);
+  char_span line = location_get_source_line (start);
   if (!line)
     return NULL;
 
diff --git a/gcc/c-family/c-indentation.cc b/gcc/c-family/c-indentation.cc
index e8d3dece770..4164fa0b1ba 100644
--- a/gcc/c-family/c-indentation.cc
+++ b/gcc/c-family/c-indentation.cc
@@ -50,7 +50,7 @@ get_visual_column (expanded_location exploc,
 		   unsigned int *first_nws,
 		   unsigned int tab_width)
 {
-  char_span line = location_get_source_line (exploc.file, exploc.line);
+  char_span line = location_get_source_line (exploc);
   if (!line)
     return false;
   if ((size_t)exploc.column > line.length ())
@@ -87,13 +87,13 @@ get_visual_column (expanded_location exploc,
    Otherwise, return false, leaving *FIRST_NWS untouched.  */
 
 static bool
-get_first_nws_vis_column (const char *file, int line_num,
+get_first_nws_vis_column (expanded_location exploc,
 			  unsigned int *first_nws,
 			  unsigned int tab_width)
 {
   gcc_assert (first_nws);
 
-  char_span line = location_get_source_line (file, line_num);
+  char_span line = location_get_source_line (exploc);
   if (!line)
     return false;
   unsigned int vis_column = 0;
@@ -158,19 +158,18 @@ get_first_nws_vis_column (const char *file, int line_num,
    Return true if such an unindent/outdent is detected.  */
 
 static bool
-detect_intervening_unindent (const char *file,
-			     int body_line,
+detect_intervening_unindent (expanded_location exploc,
 			     int next_stmt_line,
 			     unsigned int vis_column,
 			     unsigned int tab_width)
 {
-  gcc_assert (file);
-  gcc_assert (next_stmt_line > body_line);
+  gcc_assert (exploc.file);
+  gcc_assert (next_stmt_line > exploc.line);
 
-  for (int line = body_line + 1; line < next_stmt_line; line++)
+  while (++exploc.line < next_stmt_line)
     {
       unsigned int line_vis_column;
-      if (get_first_nws_vis_column (file, line, &line_vis_column, tab_width))
+      if (get_first_nws_vis_column (exploc, &line_vis_column, tab_width))
 	if (line_vis_column < vis_column)
 	  return true;
     }
@@ -528,8 +527,7 @@ should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo,
 
 	  /* Don't warn if there is an unindent between the two statements. */
 	  int vis_column = MIN (next_stmt_vis_column, body_vis_column);
-	  if (detect_intervening_unindent (body_exploc.file, body_exploc.line,
-					   next_stmt_exploc.line,
+	  if (detect_intervening_unindent (body_exploc, next_stmt_exploc.line,
 					   vis_column, tab_width))
 	    return false;
 
@@ -691,12 +689,10 @@ assert_get_visual_column_succeeds (const location &loc,
 				   unsigned int expected_visual_column,
 				   unsigned int expected_first_nws)
 {
-  expanded_location exploc;
+  expanded_location exploc = {};
   exploc.file = file;
   exploc.line = line;
   exploc.column = column;
-  exploc.data = NULL;
-  exploc.sysp = false;
   unsigned int actual_visual_column;
   unsigned int actual_first_nws;
   bool result = get_visual_column (exploc,
@@ -729,12 +725,10 @@ assert_get_visual_column_fails (const location &loc,
 				const char *file, int line, int column,
 				const unsigned int tab_width)
 {
-  expanded_location exploc;
+  expanded_location exploc = {};
   exploc.file = file;
   exploc.line = line;
   exploc.column = column;
-  exploc.data = NULL;
-  exploc.sysp = false;
   unsigned int actual_visual_column;
   unsigned int actual_first_nws;
   bool result = get_visual_column (exploc,
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index ea362bdffa4..908fff82cce 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -16250,6 +16250,8 @@ module_state::write_ordinary_maps (elf_out *to, range_t &info,
        iter != end; ++iter)
     if (iter->src != current)
       {
+	if (ORDINARY_MAP_GENERATED_DATA_P (iter->src))
+	  continue;
 	current = iter->src;
 	const char *fname = ORDINARY_MAP_FILE_NAME (iter->src);
 
@@ -16267,7 +16269,7 @@ module_state::write_ordinary_maps (elf_out *to, range_t &info,
 		   preprocessed input we could have multiple instances
 		   of the same name, and we'd rather not percolate
 		   that.  */
-		const_cast<line_map_ordinary *> (iter->src)->to_file = name;
+		const_cast<line_map_ordinary *> (iter->src)->data = name;
 		fname = NULL;
 		break;
 	      }
@@ -16295,6 +16297,8 @@ module_state::write_ordinary_maps (elf_out *to, range_t &info,
   for (auto iter = ord_loc_remap->begin (), end = ord_loc_remap->end ();
        iter != end; ++iter)
     {
+      if (ORDINARY_MAP_GENERATED_DATA_P (iter->src))
+	continue;
       dump (dumper::LOCATION)
 	&& dump ("Span:%u ordinary [%u+%u,+%u)->[%u,+%u)",
 		 iter - ord_loc_remap->begin (),
@@ -16456,7 +16460,8 @@ module_state::read_ordinary_maps (unsigned num_ord_locs, unsigned range_bits)
 	  map->m_range_bits = sec.u ();
 	  map->m_column_and_range_bits = sec.u () + map->m_range_bits;
 	  unsigned fnum = sec.u ();
-	  map->to_file = (fnum < filenames.length () ? filenames[fnum] : "");
+	  map->data = (fnum < filenames.length () ? filenames[fnum] : "");
+	  map->data_len = 1 + strlen (map->data);
 	  map->to_line = sec.u ();
 	  base = map;
 	}
diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
index 0514815b51f..fe94dc75d10 100644
--- a/gcc/diagnostic-show-locus.cc
+++ b/gcc/diagnostic-show-locus.cc
@@ -709,9 +709,9 @@ static layout_range
 make_range (int start_line, int start_col, int end_line, int end_col)
 {
   const expanded_location start_exploc
-    = {"", start_line, start_col, NULL, false};
+    = {"", start_line, start_col, NULL, false, 0, NULL};
   const expanded_location finish_exploc
-    = {"", end_line, end_col, NULL, false};
+    = {"", end_line, end_col, NULL, false, 0, NULL};
   return layout_range (exploc_with_display_col (start_exploc, def_policy (),
 						LOCATION_ASPECT_START),
 		       exploc_with_display_col (finish_exploc, def_policy (),
@@ -998,7 +998,7 @@ compatible_locations_p (location_t loc_a, location_t loc_b)
 	 are in the same file.  */
       const line_map_ordinary *ord_map_a = linemap_check_ordinary (map_a);
       const line_map_ordinary *ord_map_b = linemap_check_ordinary (map_b);
-      return ord_map_a->to_file == ord_map_b->to_file;
+      return ORDINARY_MAPS_SAME_FILE_P (ord_map_a, ord_map_b);
     }
 }
 
@@ -1614,8 +1614,7 @@ layout::calculate_x_offset_display ()
       return;
     }
 
-  const char_span line = location_get_source_line (m_exploc.file,
-						   m_exploc.line);
+  const char_span line = location_get_source_line (m_exploc);
   if (!line)
     {
       /* Nothing to do, we couldn't find the source line.  */
@@ -2403,17 +2402,18 @@ class line_corrections
 {
 public:
   line_corrections (const char_display_policy &policy,
-		    const char *filename,
-		    linenum_type row)
-  : m_policy (policy), m_filename (filename), m_row (row)
-  {}
+		    expanded_location exploc, linenum_type row = 0)
+  : m_policy (policy), m_exploc (exploc)
+  {
+    if (row)
+      m_exploc.line = row;
+  }
   ~line_corrections ();
 
   void add_hint (const fixit_hint *hint);
 
   const char_display_policy &m_policy;
-  const char *m_filename;
-  linenum_type m_row;
+  expanded_location m_exploc;
   auto_vec <correction *> m_corrections;
 };
 
@@ -2433,7 +2433,7 @@ line_corrections::~line_corrections ()
 class source_line
 {
 public:
-  source_line (const char *filename, int line);
+  explicit source_line (expanded_location xloc);
 
   char_span as_span () { return char_span (chars, width); }
 
@@ -2443,9 +2443,9 @@ public:
 
 /* source_line's ctor.  */
 
-source_line::source_line (const char *filename, int line)
+source_line::source_line (expanded_location exploc)
 {
-  char_span span = location_get_source_line (filename, line);
+  char_span span = location_get_source_line (exploc);
   chars = span.get_buffer ();
   width = span.length ();
 }
@@ -2489,7 +2489,7 @@ line_corrections::add_hint (const fixit_hint *hint)
 				affected_bytes.start - 1);
 
 	  /* Try to read the source.  */
-	  source_line line (m_filename, m_row);
+	  source_line line (m_exploc);
 	  if (line.chars && between.finish < line.width)
 	    {
 	      /* Consolidate into the last correction:
@@ -2545,7 +2545,7 @@ layout::print_trailing_fixits (linenum_type row)
 {
   /* Build a list of correction instances for the line,
      potentially consolidating hints (for the sake of readability).  */
-  line_corrections corrections (m_policy, m_exploc.file, row);
+  line_corrections corrections (m_policy, m_exploc, row);
   for (unsigned int i = 0; i < m_fixit_hints.length (); i++)
     {
       const fixit_hint *hint = m_fixit_hints[i];
@@ -2783,7 +2783,7 @@ layout::show_ruler (int max_column) const
 void
 layout::print_line (linenum_type row)
 {
-  char_span line = location_get_source_line (m_exploc.file, row);
+  char_span line = location_get_source_line (m_exploc, row);
   if (!line)
     return;
 
@@ -2992,10 +2992,10 @@ test_layout_x_offset_display_utf8 (const line_table_case &case_)
      no multibyte characters earlier on the line.  */
   const int emoji_col = 102;
 
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
   line_table_test ltt (case_);
-
-  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 1);
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, 1 + line_bytes,
+			ltt.m_generated_data);
+  tmp.do_linemap_add (1);
 
   location_t line_end = linemap_position_for_column (line_table, line_bytes);
 
@@ -3003,17 +3003,23 @@ test_layout_x_offset_display_utf8 (const line_table_case &case_)
   if (line_end > LINE_MAP_MAX_LOCATION_WITH_COLS)
     return;
 
-  ASSERT_STREQ (tmp.get_filename (), LOCATION_FILE (line_end));
+  if (ltt.m_generated_data)
+    {
+      ASSERT_EQ (nullptr, tmp.get_filename ());
+      ASSERT_STREQ (special_fname_generated (), LOCATION_FILE (line_end));
+    }
+  else
+    ASSERT_STREQ (tmp.get_filename (), LOCATION_FILE (line_end));
   ASSERT_EQ (1, LOCATION_LINE (line_end));
   ASSERT_EQ (line_bytes, LOCATION_COLUMN (line_end));
 
-  char_span lspan = location_get_source_line (tmp.get_filename (), 1);
+  const expanded_location xloc = expand_location (line_end);
+  char_span lspan = location_get_source_line (xloc);
   ASSERT_EQ (line_display_cols,
 	     cpp_display_width (lspan.get_buffer (), lspan.length (),
 				def_policy ()));
   ASSERT_EQ (line_display_cols,
-	     location_compute_display_column (expand_location (line_end),
-					      def_policy ()));
+	     location_compute_display_column (xloc, def_policy ()));
   ASSERT_EQ (0, memcmp (lspan.get_buffer () + (emoji_col - 1),
 			"\xf0\x9f\x98\x82\xf0\x9f\x98\x82", 8));
 
@@ -3145,10 +3151,10 @@ test_layout_x_offset_display_tab (const line_table_case &case_)
      a space would have taken up.  */
   ASSERT_EQ (7, extra_width[10]);
 
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
   line_table_test ltt (case_);
-
-  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 1);
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, line_bytes + 1,
+			ltt.m_generated_data);
+  tmp.do_linemap_add (1);
 
   location_t line_end = linemap_position_for_column (line_table, line_bytes);
 
@@ -3157,7 +3163,8 @@ test_layout_x_offset_display_tab (const line_table_case &case_)
     return;
 
   /* Check that cpp_display_width handles the tabs as expected.  */
-  char_span lspan = location_get_source_line (tmp.get_filename (), 1);
+  const expanded_location xloc = expand_location (line_end);
+  char_span lspan = location_get_source_line (xloc);
   ASSERT_EQ ('\t', *(lspan.get_buffer () + (tab_col - 1)));
   for (int tabstop = 1; tabstop != num_tabstops; ++tabstop)
     {
@@ -3166,8 +3173,7 @@ test_layout_x_offset_display_tab (const line_table_case &case_)
 		 cpp_display_width (lspan.get_buffer (), lspan.length (),
 				    policy));
       ASSERT_EQ (line_bytes + extra_width[tabstop],
-		 location_compute_display_column (expand_location (line_end),
-						  policy));
+		 location_compute_display_column (xloc, policy));
     }
 
   /* Check that the tab is expanded to the expected number of spaces.  */
@@ -3791,10 +3797,10 @@ test_diagnostic_show_locus_one_liner (const line_table_case &case_)
      ....................0000000001111111.
      ....................1234567890123456.  */
   const char *content = "foo = bar.field;\n";
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
   line_table_test ltt (case_);
-
-  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 1);
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, strlen (content),
+			ltt.m_generated_data);
+  tmp.do_linemap_add (1);
 
   location_t line_end = linemap_position_for_column (line_table, 16);
 
@@ -3802,7 +3808,14 @@ test_diagnostic_show_locus_one_liner (const line_table_case &case_)
   if (line_end > LINE_MAP_MAX_LOCATION_WITH_COLS)
     return;
 
-  ASSERT_STREQ (tmp.get_filename (), LOCATION_FILE (line_end));
+  if (ltt.m_generated_data)
+    {
+      ASSERT_EQ (nullptr, tmp.get_filename ());
+      ASSERT_STREQ (special_fname_generated (), LOCATION_FILE (line_end));
+    }
+  else
+    ASSERT_STREQ (tmp.get_filename (), LOCATION_FILE (line_end));
+
   ASSERT_EQ (1, LOCATION_LINE (line_end));
   ASSERT_EQ (16, LOCATION_COLUMN (line_end));
 
@@ -4373,10 +4386,10 @@ test_diagnostic_show_locus_one_liner_utf8 (const line_table_case &case_)
     /* 0000000000000000000001111111111111111111222222222222222222222233333
        1111222233334444567890122223333456789999000011112222345678999900001
        Byte columns.  */
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
   line_table_test ltt (case_);
-
-  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 1);
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, strlen (content),
+			ltt.m_generated_data);
+  tmp.do_linemap_add (1);
 
   location_t line_end = linemap_position_for_column (line_table, 31);
 
@@ -4384,11 +4397,18 @@ test_diagnostic_show_locus_one_liner_utf8 (const line_table_case &case_)
   if (line_end > LINE_MAP_MAX_LOCATION_WITH_COLS)
     return;
 
-  ASSERT_STREQ (tmp.get_filename (), LOCATION_FILE (line_end));
+  if (ltt.m_generated_data)
+    {
+      ASSERT_EQ (nullptr, tmp.get_filename ());
+      ASSERT_STREQ (special_fname_generated (), LOCATION_FILE (line_end));
+    }
+  else
+    ASSERT_STREQ (tmp.get_filename (), LOCATION_FILE (line_end));
+
   ASSERT_EQ (1, LOCATION_LINE (line_end));
   ASSERT_EQ (31, LOCATION_COLUMN (line_end));
 
-  char_span lspan = location_get_source_line (tmp.get_filename (), 1);
+  char_span lspan = location_get_source_line (expand_location (line_end));
   ASSERT_EQ (25, cpp_display_width (lspan.get_buffer (), lspan.length (),
 				    def_policy ()));
   ASSERT_EQ (25, location_compute_display_column (expand_location (line_end),
@@ -4425,12 +4445,10 @@ test_add_location_if_nearby (const line_table_case &case_)
        "  double x;\n"                              /* line 4.  */
        "  double y;\n"                              /* line 5.  */
        ";\n");                                      /* line 6.  */
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
   line_table_test ltt (case_);
-
-  const line_map_ordinary *ord_map
-    = linemap_check_ordinary (linemap_add (line_table, LC_ENTER, false,
-					   tmp.get_filename (), 0));
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, strlen (content),
+			ltt.m_generated_data);
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
 
   linemap_line_start (line_table, 1, 100);
 
@@ -4489,12 +4507,10 @@ test_diagnostic_show_locus_fixit_lines (const line_table_case &case_)
        "\n"                                      /* line 4.  */
        "\n"                                      /* line 5.  */
        "                        : 0.0};\n");     /* line 6.  */
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
   line_table_test ltt (case_);
-
-  const line_map_ordinary *ord_map
-    = linemap_check_ordinary (linemap_add (line_table, LC_ENTER, false,
-					   tmp.get_filename (), 0));
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, strlen (content),
+			ltt.m_generated_data);
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
 
   linemap_line_start (line_table, 1, 100);
 
@@ -4585,8 +4601,10 @@ static void
 test_fixit_consolidation (const line_table_case &case_)
 {
   line_table_test ltt (case_);
-
-  linemap_add (line_table, LC_ENTER, false, "test.c", 1);
+  if (ltt.m_generated_data)
+    linemap_add (line_table, LC_GEN, false, "some content", 1, 13);
+  else
+    linemap_add (line_table, LC_ENTER, false, "test.c", 1);
 
   const location_t c10 = linemap_position_for_column (line_table, 10);
   const location_t c15 = linemap_position_for_column (line_table, 15);
@@ -4732,13 +4750,11 @@ test_overlapped_fixit_printing (const line_table_case &case_)
      ...123456789012345678901234567890123456789.  */
   const char *content
     = ("  foo *f = (foo *)ptr->field;\n");
-  temp_source_file tmp (SELFTEST_LOCATION, ".C", content);
   line_table_test ltt (case_);
+  temp_source_file tmp (SELFTEST_LOCATION, ".C", content, strlen (content),
+			ltt.m_generated_data);
 
-  const line_map_ordinary *ord_map
-    = linemap_check_ordinary (linemap_add (line_table, LC_ENTER, false,
-					   tmp.get_filename (), 0));
-
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
   linemap_line_start (line_table, 1, 100);
 
   const location_t final_line_end
@@ -4759,6 +4775,8 @@ test_overlapped_fixit_printing (const line_table_case &case_)
     = linemap_position_for_line_and_column (line_table, ord_map, 1, 28);
   const location_t expr = make_location (expr_start, expr_start, expr_finish);
 
+  const expanded_location xloc = expand_location (expr);
+
   /* Various examples of fix-it hints that aren't themselves consolidated,
      but for which the *printing* may need consolidation.  */
 
@@ -4802,7 +4820,7 @@ test_overlapped_fixit_printing (const line_table_case &case_)
     /* Add each hint in turn to a line_corrections instance,
        and verify that they are consolidated into one correction instance
        as expected.  */
-    line_corrections lc (policy, tmp.get_filename (), 1);
+    line_corrections lc (policy, xloc);
 
     /* The first replace hint by itself.  */
     lc.add_hint (hint_0);
@@ -4943,13 +4961,10 @@ test_overlapped_fixit_printing_utf8 (const line_table_case &case_)
        12344445555666677778901234566667777888899990123456789012333344445
        Byte columns.  */
 
-  temp_source_file tmp (SELFTEST_LOCATION, ".C", content);
   line_table_test ltt (case_);
-
-  const line_map_ordinary *ord_map
-    = linemap_check_ordinary (linemap_add (line_table, LC_ENTER, false,
-					   tmp.get_filename (), 0));
-
+  temp_source_file tmp (SELFTEST_LOCATION, ".C", content, strlen (content),
+			ltt.m_generated_data);
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
   linemap_line_start (line_table, 1, 100);
 
   const location_t final_line_end
@@ -4970,6 +4985,8 @@ test_overlapped_fixit_printing_utf8 (const line_table_case &case_)
     = linemap_position_for_line_and_column (line_table, ord_map, 1, 34);
   const location_t expr = make_location (expr_start, expr_start, expr_finish);
 
+  const expanded_location xloc = expand_location (expr);
+
   /* Various examples of fix-it hints that aren't themselves consolidated,
      but for which the *printing* may need consolidation.  */
 
@@ -5018,7 +5035,7 @@ test_overlapped_fixit_printing_utf8 (const line_table_case &case_)
     /* Add each hint in turn to a line_corrections instance,
        and verify that they are consolidated into one correction instance
        as expected.  */
-    line_corrections lc (policy, tmp.get_filename (), 1);
+    line_corrections lc (policy, xloc);
 
     /* The first replace hint by itself.  */
     lc.add_hint (hint_0);
@@ -5176,13 +5193,11 @@ test_overlapped_fixit_printing_2 (const line_table_case &case_)
      ...123456789012345678901234567890123456789.  */
   const char *content
     = ("int a5[][0][0] = { 1, 2 };\n");
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
-  line_table_test ltt (case_);
-
-  const line_map_ordinary *ord_map
-    = linemap_check_ordinary (linemap_add (line_table, LC_ENTER, false,
-					   tmp.get_filename (), 0));
 
+  line_table_test ltt (case_);
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, strlen (content),
+			ltt.m_generated_data);
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
   linemap_line_start (line_table, 1, 100);
 
   const location_t final_line_end
@@ -5267,10 +5282,10 @@ test_fixit_insert_containing_newline (const line_table_case &case_)
 			     "      x = a;\n"  /* line 2. */
 			     "    case 'b':\n" /* line 3. */
 			     "      x = b;\n");/* line 4. */
-
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", old_content);
   line_table_test ltt (case_);
-  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 3);
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", old_content,
+			strlen (old_content), false);
+  tmp.do_linemap_add (3);
 
   location_t case_start = linemap_position_for_column (line_table, 5);
   location_t case_finish = linemap_position_for_column (line_table, 13);
@@ -5338,12 +5353,11 @@ test_fixit_insert_containing_newline_2 (const line_table_case &case_)
 			     "{\n"              /* line 2. */
 			     " putchar (ch);\n" /* line 3. */
 			     "}\n");            /* line 4. */
-
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", old_content);
   line_table_test ltt (case_);
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", old_content,
+			strlen (old_content), ltt.m_generated_data);
 
-  const line_map_ordinary *ord_map = linemap_check_ordinary
-    (linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 0));
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
   linemap_line_start (line_table, 1, 100);
 
   /* The primary range is the "putchar" token.  */
@@ -5402,9 +5416,10 @@ test_fixit_replace_containing_newline (const line_table_case &case_)
     .........................1234567890123.  */
   const char *old_content = "foo = bar ();\n";
 
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", old_content);
   line_table_test ltt (case_);
-  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 1);
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", old_content,
+			strlen (old_content), ltt.m_generated_data);
+  tmp.do_linemap_add (1);
 
   /* Replace the " = " with "\n  = ", as if we were reformatting an
      overly long line.  */
@@ -5442,10 +5457,10 @@ test_fixit_deletion_affecting_newline (const line_table_case &case_)
   const char *old_content = ("foo = bar (\n"
 			     "      );\n");
 
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", old_content);
   line_table_test ltt (case_);
-  const line_map_ordinary *ord_map = linemap_check_ordinary
-    (linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 0));
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", old_content,
+			strlen (old_content), ltt.m_generated_data);
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
   linemap_line_start (line_table, 1, 100);
 
   /* Attempt to delete the " (\n...)".  */
@@ -5494,9 +5509,10 @@ test_tab_expansion (const line_table_case &case_)
   const int last_byte_col = 25;
   ASSERT_EQ (35, cpp_display_width (content, last_byte_col, policy));
 
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
   line_table_test ltt (case_);
-  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 1);
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, strlen (content),
+			ltt.m_generated_data);
+  tmp.do_linemap_add (1);
 
   /* Don't attempt to run the tests if column data might be unavailable.  */
   location_t line_end = linemap_position_for_column (line_table, last_byte_col);
@@ -5543,15 +5559,14 @@ test_escaping_bytes_1 (const line_table_case &case_)
 {
   const char content[] = "before\0\1\2\3\v\x80\xff""after\n";
   const size_t sz = sizeof (content);
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, sz);
   line_table_test ltt (case_);
-  const line_map_ordinary *ord_map = linemap_check_ordinary
-    (linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 0));
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, sz,
+			ltt.m_generated_data);
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
   linemap_line_start (line_table, 1, 100);
 
   location_t finish
-    = linemap_position_for_line_and_column (line_table, ord_map, 1,
-					    strlen (content));
+    = linemap_position_for_line_and_column (line_table, ord_map, 1, sz);
 
   if (finish > LINE_MAP_MAX_LOCATION_WITH_COLS)
     return;
@@ -5599,15 +5614,14 @@ test_escaping_bytes_2 (const line_table_case &case_)
 {
   const char content[]  = "\0after\n";
   const size_t sz = sizeof (content);
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, sz);
   line_table_test ltt (case_);
-  const line_map_ordinary *ord_map = linemap_check_ordinary
-    (linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 0));
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, sz,
+			ltt.m_generated_data);
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
   linemap_line_start (line_table, 1, 100);
 
   location_t finish
-    = linemap_position_for_line_and_column (line_table, ord_map, 1,
-					    strlen (content));
+    = linemap_position_for_line_and_column (line_table, ord_map, 1, sz);
 
   if (finish > LINE_MAP_MAX_LOCATION_WITH_COLS)
     return;
@@ -5659,8 +5673,7 @@ test_line_numbers_multiline_range ()
   temp_source_file tmp (SELFTEST_LOCATION, ".txt", pp_formatted_text (&pp));
   line_table_test ltt;
 
-  const line_map_ordinary *ord_map = linemap_check_ordinary
-    (linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 0));
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
   linemap_line_start (line_table, 1, 100);
 
   /* Create a multi-line location, starting at the "line" of line 9, with
@@ -5701,28 +5714,28 @@ diagnostic_show_locus_cc_tests ()
 
   test_display_widths ();
 
-  for_each_line_table_case (test_layout_x_offset_display_utf8);
-  for_each_line_table_case (test_layout_x_offset_display_tab);
+  for_each_line_table_case (test_layout_x_offset_display_utf8, true);
+  for_each_line_table_case (test_layout_x_offset_display_tab, true);
 
   test_get_line_bytes_without_trailing_whitespace ();
 
   test_diagnostic_show_locus_unknown_location ();
 
-  for_each_line_table_case (test_diagnostic_show_locus_one_liner);
-  for_each_line_table_case (test_diagnostic_show_locus_one_liner_utf8);
-  for_each_line_table_case (test_add_location_if_nearby);
-  for_each_line_table_case (test_diagnostic_show_locus_fixit_lines);
-  for_each_line_table_case (test_fixit_consolidation);
-  for_each_line_table_case (test_overlapped_fixit_printing);
-  for_each_line_table_case (test_overlapped_fixit_printing_utf8);
-  for_each_line_table_case (test_overlapped_fixit_printing_2);
-  for_each_line_table_case (test_fixit_insert_containing_newline);
-  for_each_line_table_case (test_fixit_insert_containing_newline_2);
-  for_each_line_table_case (test_fixit_replace_containing_newline);
-  for_each_line_table_case (test_fixit_deletion_affecting_newline);
-  for_each_line_table_case (test_tab_expansion);
-  for_each_line_table_case (test_escaping_bytes_1);
-  for_each_line_table_case (test_escaping_bytes_2);
+  for_each_line_table_case (test_diagnostic_show_locus_one_liner, true);
+  for_each_line_table_case (test_diagnostic_show_locus_one_liner_utf8, true);
+  for_each_line_table_case (test_add_location_if_nearby, true);
+  for_each_line_table_case (test_diagnostic_show_locus_fixit_lines, true);
+  for_each_line_table_case (test_fixit_consolidation, true);
+  for_each_line_table_case (test_overlapped_fixit_printing, true);
+  for_each_line_table_case (test_overlapped_fixit_printing_utf8, true);
+  for_each_line_table_case (test_overlapped_fixit_printing_2, true);
+  for_each_line_table_case (test_fixit_insert_containing_newline, true);
+  for_each_line_table_case (test_fixit_insert_containing_newline_2, true);
+  for_each_line_table_case (test_fixit_replace_containing_newline, true);
+  for_each_line_table_case (test_fixit_deletion_affecting_newline, true);
+  for_each_line_table_case (test_tab_expansion, true);
+  for_each_line_table_case (test_escaping_bytes_1, true);
+  for_each_line_table_case (test_escaping_bytes_2, true);
 
   test_line_numbers_multiline_range ();
 }
diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index c523f215bae..ec78dcc7dbc 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -798,13 +798,15 @@ diagnostic_report_current_module (diagnostic_context *context, location_t where)
       if (!includes_seen (context, map))
 	{
 	  bool first = true, need_inc = true, was_module = MAP_MODULE_P (map);
+	  const bool was_gen = ORDINARY_MAP_GENERATED_DATA_P (map);
 	  expanded_location s = {};
 	  do
 	    {
 	      where = linemap_included_from (map);
 	      map = linemap_included_from_linemap (line_table, map);
 	      bool is_module = MAP_MODULE_P (map);
-	      s.file = LINEMAP_FILE (map);
+	      s.file = (ORDINARY_MAP_GENERATED_DATA_P (map)
+			? special_fname_generated () : LINEMAP_FILE (map));
 	      s.line = SOURCE_LINE (map, where);
 	      int col = -1;
 	      if (first && context->show_column)
@@ -823,10 +825,13 @@ diagnostic_report_current_module (diagnostic_context *context, location_t where)
 		 N_("of module"),
 		 N_("In module imported at"),	/* 6 */
 		 N_("imported at"),
+		 N_("In buffer generated from"),   /* 8 */
 		};
 
-	      unsigned index = (was_module ? 6 : is_module ? 4
-				: need_inc ? 2 : 0) + !first;
+	      const unsigned index
+		= was_gen ? 8
+		: ((was_module ? 6 : is_module ? 4 : need_inc ? 2 : 0)
+		   + !first);
 
 	      pp_verbatim (context->printer, "%s%s %r%s%s%R",
 			   first ? "" : was_module ? ", " : ",\n",
@@ -2690,12 +2695,10 @@ assert_location_text (const char *expected_loc_text,
   dc.column_unit = column_unit;
   dc.column_origin = origin;
 
-  expanded_location xloc;
+  expanded_location xloc = {};
   xloc.file = filename;
   xloc.line = line;
   xloc.column = column;
-  xloc.data = NULL;
-  xloc.sysp = false;
 
   char *actual_loc_text = diagnostic_get_location_text (&dc, xloc);
   ASSERT_STREQ (expected_loc_text, actual_loc_text);
diff --git a/gcc/gcc-rich-location.cc b/gcc/gcc-rich-location.cc
index edecf07f81e..5a118925f77 100644
--- a/gcc/gcc-rich-location.cc
+++ b/gcc/gcc-rich-location.cc
@@ -78,7 +78,7 @@ static bool
 blank_line_before_p (location_t loc)
 {
   expanded_location exploc = expand_location (loc);
-  char_span line = location_get_source_line (exploc.file, exploc.line);
+  char_span line = location_get_source_line (exploc);
   if (!line)
     return false;
   if (line.length () < (size_t)exploc.column)
diff --git a/gcc/go/go-linemap.cc b/gcc/go/go-linemap.cc
index 1d72e79647d..02d4ce04181 100644
--- a/gcc/go/go-linemap.cc
+++ b/gcc/go/go-linemap.cc
@@ -84,7 +84,8 @@ Gcc_linemap::to_string(Location location)
   resolved_location =
       linemap_resolve_location (line_table, location.gcc_location(),
                                 LRK_SPELLING_LOCATION, &lmo);
-  if (lmo == NULL || resolved_location < RESERVED_LOCATION_COUNT)
+  if (lmo == NULL || resolved_location < RESERVED_LOCATION_COUNT
+      || ORDINARY_MAP_GENERATED_DATA_P (lmo))
     return "";
   const char *path = LINEMAP_FILE (lmo);
   if (!path)
diff --git a/gcc/input.cc b/gcc/input.cc
index eaf301ec7c1..77689e667c5 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -35,6 +35,12 @@ special_fname_builtin ()
   return _("<built-in>");
 }
 
+const char *
+special_fname_generated ()
+{
+  return _("<generated>");
+}
+
 /* Input charset configuration.  */
 static const char *default_charset_callback (const char *)
 {
@@ -49,34 +55,88 @@ file_cache::initialize_input_context (diagnostic_input_charset_callback ccb,
   in_context.should_skip_bom = should_skip_bom;
 }
 
-/* This is a cache used by get_next_line to store the content of a
-   file to be searched for file lines.  */
-class file_cache_slot
-{
-public:
-  file_cache_slot ();
-  ~file_cache_slot ();
+/* This is an abstract interface for a class that provides data which we want to
+   look up by line number.  Concrete implementations follow, which handle the
+   cases of reading the data from the input source files, or of reading it
+   from in-memory generated data buffers.  The design is driven by the case of
+   reading from files; in particular, it is desirable to read only as much of
+   a file from disk as necessary.  It works like a simplified std::istream,
+   i.e. virtual function calls are only needed when we need to retrieve more
+   data from the underlying source.  */
 
-  bool read_line_num (size_t line_num,
-		      char ** line, ssize_t *line_len);
+class cache_data_source
+{
 
-  /* Accessors.  */
-  const char *get_file_path () const { return m_file_path; }
+public:
+  bool read_line_num (size_t line_num, const char **line, ssize_t *line_len);
   unsigned get_use_count () const { return m_use_count; }
+  void inc_use_count () { m_use_count++; }
+  bool get_next_line (const char **line, ssize_t *line_len);
+  bool goto_next_line ();
   bool missing_trailing_newline_p () const
   {
     return m_missing_trailing_newline;
   }
   char_span get_full_file_content ();
+  bool unused () const { return !m_data_begin; }
+  virtual void reset ();
+
+protected:
+  cache_data_source ();
+  virtual ~cache_data_source ();
+
+  /* These pointers delimit the data that we are processing.  They are
+     maintained by the derived classes; we only ask for more by calling
+     get_more_data ().  That function should return TRUE if more data was
+     obtained.  Calling get_more_data () may invalidate these pointers
+     (e.g. if the data is reallocated into a larger buffer).  */
+  const char *m_data_begin;
+  const char *m_data_end;
+  virtual bool get_more_data () = 0;
+
+  /* This is to be called by the derived classes when this object is
+     being activated.  */
+  void on_create (unsigned int use_count, size_t total_lines)
+  {
+    m_use_count = use_count;
+    m_total_lines = total_lines;
+  }
 
-  void inc_use_count () { m_use_count++; }
+private:
+  /* Non-copyable.  */
+  cache_data_source (const cache_data_source &) = delete;
+  cache_data_source& operator= (const cache_data_source &) = delete;
 
-  bool create (const file_cache::input_context &in_context,
-	       const char *file_path, FILE *fp, unsigned highest_use_count);
-  void evict ();
+  /* The number of times this data has been accessed.  This is used to designate
+     which entry to evict from the cache array when needed.  */
+  unsigned m_use_count;
 
- private:
-  /* These are information used to store a line boundary.  */
+  /* Could this file be missing a trailing newline on its final line?
+     Initially true (to cope with empty files), set to true/false
+     as each line is read.  */
+  bool m_missing_trailing_newline;
+
+  /* This is the total number of lines in the current data.  At the
+     moment, we try to get this information from the line map
+     subsystem.  Note that this is just a hint.  When using the C++
+     front-end, this hint is correct because the input file is then
+     completely tokenized before parsing starts; so the line map knows
+     the number of lines before compilation really starts.  With, e.g.,
+     the C front-end, it can happen that we start emitting diagnostics
+     before the line map has seen the end of the file.  */
+  size_t m_total_lines;
+
+  /* The number of the previous line read.  This starts at 1.  Zero
+     means we've read no line so far.  */
+  size_t m_line_num;
+
+  /* The index of the beginning of the current line.  */
+  size_t m_line_start_idx;
+
+  /* This is the information used to store a line boundary.  Here and below,
+     we always store byte offsets, not pointers, since the underlying buffer
+     may be reallocated by the derived implementation, unbeknownst to us,
+     after calling get_more_data ().  */
   class line_info
   {
   public:
@@ -84,13 +144,12 @@ public:
     size_t line_num;
 
     /* The position (byte count) of the beginning of the line,
-       relative to the file data pointer.  This starts at zero.  */
+       relative to M_DATA_BEGIN.  This starts at zero.  */
     size_t start_pos;
 
-    /* The position (byte count) of the last byte of the line.  This
-       normally points to the '\n' character, or to one byte after the
-       last byte of the file, if the file doesn't contain a '\n'
-       character.  */
+    /* The position (byte count) of the last byte of the line.  This normally
+       points to the '\n' character, or to M_DATA_END, if the data doesn't end
+       with a '\n' character.  */
     size_t end_pos;
 
     line_info (size_t l, size_t s, size_t e)
@@ -98,91 +157,76 @@ public:
     {}
 
     line_info ()
-      :line_num (0), start_pos (0), end_pos (0)
+      : line_num (0), start_pos (0), end_pos (0)
     {}
   };
 
-  bool needs_read_p () const;
-  bool needs_grow_p () const;
-  void maybe_grow ();
-  bool read_data ();
-  bool maybe_read_data ();
-  bool get_next_line (char **line, ssize_t *line_len);
-  bool read_next_line (char ** line, ssize_t *line_len);
-  bool goto_next_line ();
-
-  static const size_t buffer_size = 4 * 1024;
+  /* This is a record of the beginning and end of the lines we've seen
+     while reading the file.  This is useful to avoid walking the data
+     from the beginning when we are asked to read a line that is
+     before M_LINE_START_IDX.  Note that the maximum size of this
+     record is line_record_size, so that the memory consumption
+     doesn't explode.  We thus scale total_lines down to
+     line_record_size.  */
+  vec<line_info, va_heap> m_line_record;
   static const size_t line_record_size = 100;
+};
 
-  /* The number of time this file has been accessed.  This is used
-     to designate which file cache to evict from the cache
-     array.  */
-  unsigned m_use_count;
-
-  /* The file_path is the key for identifying a particular file in
-     the cache.
-     For libcpp-using code, the underlying buffer for this field is
-     owned by the corresponding _cpp_file within the cpp_reader.  */
-  const char *m_file_path;
-
-  FILE *m_fp;
-
-  /* This points to the content of the file that we've read so
-     far.  */
-  char *m_data;
-
-  /* The allocated buffer to be freed may start a little earlier than DATA,
-     e.g. if a UTF8 BOM was skipped at the beginning.  */
-  int m_alloc_offset;
+/* This is the implementation of cache_data_source for ordinary
+   source files.  */
+class file_cache_slot final : public cache_data_source
+{
 
-  /*  The size of the DATA array above.*/
-  size_t m_size;
+public:
+  file_cache_slot ();
+  ~file_cache_slot ();
 
-  /* The number of bytes read from the underlying file so far.  This
-     must be less (or equal) than SIZE above.  */
-  size_t m_nb_read;
+  const char *get_file_path () const { return m_file_path; }
+  bool create (const file_cache::input_context &in_context,
+	       const char *file_path, FILE *fp, unsigned highest_use_count);
+  void reset () override;
 
-  /* The index of the beginning of the current line.  */
-  size_t m_line_start_idx;
+protected:
+  bool get_more_data () override;
 
-  /* The number of the previous line read.  This starts at 1.  Zero
-     means we've read no line so far.  */
-  size_t m_line_num;
-
-  /* This is the total number of lines of the current file.  At the
-     moment, we try to get this information from the line map
-     subsystem.  Note that this is just a hint.  When using the C++
-     front-end, this hint is correct because the input file is then
-     completely tokenized before parsing starts; so the line map knows
-     the number of lines before compilation really starts.  For e.g,
-     the C front-end, it can happen that we start emitting diagnostics
-     before the line map has seen the end of the file.  */
-  size_t m_total_lines;
+private:
+  /* The file_path is the key for identifying a particular file in the cache.
+     For libcpp-using code, the underlying buffer for this field is owned by the
+     corresponding _cpp_file within the cpp_reader.  */
+  const char *m_file_path;
 
-  /* Could this file be missing a trailing newline on its final line?
-     Initially true (to cope with empty files), set to true/false
-     as each line is read.  */
-  bool m_missing_trailing_newline;
+  FILE *m_fp;
 
-  /* This is a record of the beginning and end of the lines we've seen
-     while reading the file.  This is useful to avoid walking the data
-     from the beginning when we are asked to read a line that is
-     before LINE_START_IDX above.  Note that the maximum size of this
-     record is line_record_size, so that the memory consumption
-     doesn't explode.  We thus scale total_lines down to
-     line_record_size.  */
-  vec<line_info, va_heap> m_line_record;
+  /* The base class's M_DATA_BEGIN and M_DATA_END delimit the bytes that are
+     ready to process.  The two pointers here track a growable memory buffer,
+     owned by this object, where we store data as we read it from the file; we
+     arrange for the base class pointers to point to the right place within
+     this buffer.  */
+  char *m_buf_begin;
+  char *m_buf_end;
+  void maybe_grow ();
+};
 
-  void offset_buffer (int offset)
+/* This is the implementation of cache_data_source for generated
+   data that is already in memory.  */
+class data_cache_slot final : public cache_data_source
+{
+public:
+  void create (const char *data, unsigned int data_len,
+	       unsigned int highest_use_count);
+  bool represents_data (const char *data, unsigned int) const
   {
-    gcc_assert (offset < 0 ? m_alloc_offset + offset >= 0
-		: (size_t) offset <= m_size);
-    gcc_assert (m_data);
-    m_alloc_offset += offset;
-    m_data += offset;
-    m_size -= offset;
+    /* We can just use pointer equality here since the generated data lives in
+       memory in one persistent place.  It isn't anticipated there would be
+       several generated data buffers with the same content, so we don't mind
+       that in such a case we will store it twice.  */
+    return m_data_begin == data;
   }
 
+protected:
+  /* In contrast to file_cache_slot, we do not own a buffer.  The buffer
+     passed to create() needs to outlive this object.  */
+  bool get_more_data () override { return false; }
 };
 
 /* Current position in real source file.  */
@@ -283,6 +327,8 @@ expand_location_1 (location_t loc,
   xloc.data = block;
   if (loc <= BUILTINS_LOCATION)
     xloc.file = loc == UNKNOWN_LOCATION ? NULL : special_fname_builtin ();
+  else if (xloc.generated_data_len)
+    xloc.file = special_fname_generated ();
 
   return xloc;
 }
@@ -317,11 +363,12 @@ diagnostic_file_cache_fini (void)
    equals the actual number of lines of the file.  */
 
 static size_t
-total_lines_num (const char *file_path)
+total_lines_num (const char *fname_or_data, bool is_data)
 {
   size_t r = 0;
   location_t l = 0;
-  if (linemap_get_file_highest_location (line_table, file_path, &l))
+  if (linemap_get_file_highest_location (line_table, fname_or_data,
+					 is_data, &l))
     {
       gcc_assert (l >= RESERVED_LOCATION_COUNT);
       expanded_location xloc = expand_location (l);
@@ -357,6 +404,21 @@ file_cache::lookup_file (const char *file_path)
   return r;
 }
 
+data_cache_slot *
+file_cache::lookup_data (const char *data, unsigned int data_len)
+{
+  for (unsigned int i = 0; i != num_file_slots; ++i)
+    {
+      const auto slot = m_data_slots + i;
+      if (slot->represents_data (data, data_len))
+	{
+	  slot->inc_use_count ();
+	  return slot;
+	}
+    }
+  return nullptr;
+}
+
 /* Purge any mention of FILENAME from the cache of files used for
    printing source code.  For use in selftests when working
    with tempfiles.  */
@@ -372,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file (const char *file_path)
   global_dc->m_file_cache->forcibly_evict_file (file_path);
 }
 
+void
+diagnostics_file_cache_forcibly_evict_data (const char *data,
+					    unsigned int data_len)
+{
+  if (!global_dc->m_file_cache)
+    return;
+  global_dc->m_file_cache->forcibly_evict_data (data, data_len);
+}
+
 void
 file_cache::forcibly_evict_file (const char *file_path)
 {
@@ -382,55 +453,39 @@ file_cache::forcibly_evict_file (const char *file_path)
     /* Not found.  */
     return;
 
-  r->evict ();
+  r->reset ();
 }
 
 void
-file_cache_slot::evict ()
+file_cache::forcibly_evict_data (const char *data, unsigned int data_len)
 {
-  m_file_path = NULL;
-  if (m_fp)
-    fclose (m_fp);
-  m_fp = NULL;
-  m_nb_read = 0;
-  m_line_start_idx = 0;
-  m_line_num = 0;
-  m_line_record.truncate (0);
-  m_use_count = 0;
-  m_total_lines = 0;
-  m_missing_trailing_newline = true;
+  if (auto r = lookup_data (data, data_len))
+    r->reset ();
 }
 
-/* Return the file cache that has been less used, recently, or the
+/* Return the cache slot that has been used the least, or the
    first empty one.  If HIGHEST_USE_COUNT is non-null,
    *HIGHEST_USE_COUNT is set to the highest use count of the entries
    in the cache table.  */
 
-file_cache_slot*
-file_cache::evicted_cache_tab_entry (unsigned *highest_use_count)
+template <class Slot>
+Slot *
+file_cache::evicted_cache_tab_entry (Slot *slots,
+				     unsigned int *highest_use_count)
 {
-  diagnostic_file_cache_init ();
-
-  file_cache_slot *to_evict = &m_file_slots[0];
+  auto to_evict = &slots[0];
   unsigned huc = to_evict->get_use_count ();
   for (unsigned i = 1; i < num_file_slots; ++i)
     {
-      file_cache_slot *c = &m_file_slots[i];
-      bool c_is_empty = (c->get_file_path () == NULL);
-
+      auto c = &slots[i];
       if (c->get_use_count () < to_evict->get_use_count ()
-	  || (to_evict->get_file_path () && c_is_empty))
+	  || (!to_evict->unused () && c->unused ()))
 	/* We evict C because it's either an entry with a lower use
 	   count or one that is empty.  */
 	to_evict = c;
 
       if (huc < c->get_use_count ())
 	huc = c->get_use_count ();
-
-      if (c_is_empty)
-	/* We've reached the end of the cache; subsequent elements are
-	   all empty.  */
-	break;
     }
 
   if (highest_use_count)
@@ -454,24 +509,21 @@ file_cache::add_file (const char *file_path)
     return NULL;
 
   unsigned highest_use_count = 0;
-  file_cache_slot *r = evicted_cache_tab_entry (&highest_use_count);
+  file_cache_slot *r = evicted_cache_tab_entry (m_file_slots,
+						&highest_use_count);
   if (!r->create (in_context, file_path, fp, highest_use_count))
     return NULL;
   return r;
 }
 
-/* Get a borrowed char_span to the full content of this file
-   as decoded according to the input charset, encoded as UTF-8.  */
-
-char_span
-file_cache_slot::get_full_file_content ()
+data_cache_slot *
+file_cache::add_data (const char *data, unsigned int data_len)
 {
-  char *line;
-  ssize_t line_len;
-  while (get_next_line (&line, &line_len))
-    {
-    }
-  return char_span (m_data, m_nb_read);
+  unsigned int highest_use_count = 0;
+  data_cache_slot *r = evicted_cache_tab_entry (m_data_slots,
+						&highest_use_count);
+  r->create (data, data_len, highest_use_count);
+  return r;
 }
 
 /* Populate this slot for use on FILE_PATH and FP, dropping any
@@ -482,22 +534,12 @@ file_cache_slot::create (const file_cache::input_context &in_context,
 			 const char *file_path, FILE *fp,
 			 unsigned highest_use_count)
 {
+  reset ();
+  on_create (highest_use_count + 1, total_lines_num (file_path, false));
+  m_data_begin = m_buf_begin;
+  m_data_end = m_buf_begin;
   m_file_path = file_path;
-  if (m_fp)
-    fclose (m_fp);
   m_fp = fp;
-  if (m_alloc_offset)
-    offset_buffer (-m_alloc_offset);
-  m_nb_read = 0;
-  m_line_start_idx = 0;
-  m_line_num = 0;
-  m_line_record.truncate (0);
-  /* Ensure that this cache entry doesn't get evicted next time
-     add_file_to_cache_tab is called.  */
-  m_use_count = ++highest_use_count;
-  m_total_lines = total_lines_num (file_path);
-  m_missing_trailing_newline = true;
-
 
   /* Check the input configuration to determine if we need to do any
      transformations, such as charset conversion or BOM skipping.  */
@@ -510,29 +552,37 @@ file_cache_slot::create (const file_cache::input_context &in_context,
 	= cpp_get_converted_source (file_path, input_charset);
       if (!cs.data)
 	return false;
-      if (m_data)
-	XDELETEVEC (m_data);
-      m_data = cs.data;
-      m_nb_read = m_size = cs.len;
-      m_alloc_offset = cs.data - cs.to_free;
+      XDELETEVEC (m_buf_begin);
+      m_buf_begin = cs.to_free;
+      m_buf_end = cs.data + cs.len;
+      m_data_begin = cs.data;
+      m_data_end = m_buf_end;
     }
-  else if (in_context.should_skip_bom)
+  else if (in_context.should_skip_bom && get_more_data ())
     {
-      if (read_data ())
-	{
-	  const int offset = cpp_check_utf8_bom (m_data, m_nb_read);
-	  offset_buffer (offset);
-	  m_nb_read -= offset;
-	}
+      const int offset = cpp_check_utf8_bom (m_data_begin,
+					     m_data_end - m_data_begin);
+      m_data_begin += offset;
     }
 
   return true;
 }
 
+void
+data_cache_slot::create (const char *data, unsigned int data_len,
+			 unsigned int highest_use_count)
+{
+  reset ();
+  on_create (highest_use_count + 1, total_lines_num (data, true));
+  m_data_begin = data;
+  m_data_end = data + data_len;
+}
+
 /* file_cache's ctor.  */
 
 file_cache::file_cache ()
-: m_file_slots (new file_cache_slot[num_file_slots])
+  : m_file_slots (new file_cache_slot[num_file_slots]),
+    m_data_slots (new data_cache_slot[num_file_slots])
 {
   initialize_input_context (nullptr, false);
 }
@@ -541,6 +591,7 @@ file_cache::file_cache ()
 
 file_cache::~file_cache ()
 {
+  delete[] m_data_slots;
   delete[] m_file_slots;
 }
 
@@ -558,55 +609,69 @@ file_cache::lookup_or_add_file (const char *file_path)
   return r;
 }
 
-/* Default constructor for a cache of file used by caret
-   diagnostic.  */
+data_cache_slot *
+file_cache::lookup_or_add_data (const char *data, unsigned int data_len)
+{
+  data_cache_slot *r = lookup_data (data, data_len);
+  if (!r)
+    r = add_data (data, data_len);
+  return r;
+}
 
-file_cache_slot::file_cache_slot ()
-: m_use_count (0), m_file_path (NULL), m_fp (NULL), m_data (0),
-  m_alloc_offset (0), m_size (0), m_nb_read (0), m_line_start_idx (0),
-  m_line_num (0), m_total_lines (0), m_missing_trailing_newline (true)
+cache_data_source::cache_data_source ()
+: m_data_begin (nullptr), m_data_end (nullptr),
+  m_use_count (0),
+  m_missing_trailing_newline (true),
+  m_total_lines (0),
+  m_line_num (0),
+  m_line_start_idx (0)
 {
   m_line_record.create (0);
 }
 
-/* Destructor for a cache of file used by caret diagnostic.  */
-
-file_cache_slot::~file_cache_slot ()
+cache_data_source::~cache_data_source ()
 {
-  if (m_fp)
-    {
-      fclose (m_fp);
-      m_fp = NULL;
-    }
-  if (m_data)
-    {
-      offset_buffer (-m_alloc_offset);
-      XDELETEVEC (m_data);
-      m_data = 0;
-    }
   m_line_record.release ();
 }
 
-/* Returns TRUE iff the cache would need to be filled with data coming
-   from the file.  That is, either the cache is empty or full or the
-   current line is empty.  Note that if the cache is full, it would
-   need to be extended and filled again.  */
-
-bool
-file_cache_slot::needs_read_p () const
+void
+cache_data_source::reset ()
 {
-  return m_fp && (m_nb_read == 0
-	  || m_nb_read == m_size
-	  || (m_line_start_idx >= m_nb_read - 1));
+  m_data_begin = nullptr;
+  m_data_end = nullptr;
+  m_use_count = 0;
+  m_missing_trailing_newline = true;
+  m_total_lines = 0;
+  m_line_num = 0;
+  m_line_start_idx = 0;
+  m_line_record.truncate (0);
 }
 
-/*  Return TRUE iff the cache is full and thus needs to be
-    extended.  */
+file_cache_slot::file_cache_slot ()
+: m_file_path (nullptr), m_fp (nullptr),
+  m_buf_begin (nullptr), m_buf_end (nullptr)
+{}
 
-bool
-file_cache_slot::needs_grow_p () const
+file_cache_slot::~file_cache_slot ()
 {
-  return m_nb_read == m_size;
+  if (m_fp)
+    fclose (m_fp);
+  XDELETEVEC (m_buf_begin);
+}
+
+void
+file_cache_slot::reset ()
+{
+  cache_data_source::reset ();
+  m_file_path = NULL;
+  if (m_fp)
+    {
+      fclose (m_fp);
+      m_fp = NULL;
+    }
+
+  /* Do not free the buffer here; we intend to reuse it the next time this
+     slot is activated.  */
 }
 
 /* Grow the cache if it needs to be extended.  */
@@ -614,22 +679,23 @@ file_cache_slot::needs_grow_p () const
 void
 file_cache_slot::maybe_grow ()
 {
-  if (!needs_grow_p ())
-    return;
-
-  if (!m_data)
+  if (!m_buf_begin)
     {
-      gcc_assert (m_size == 0 && m_alloc_offset == 0);
-      m_size = buffer_size;
-      m_data = XNEWVEC (char, m_size);
+      const size_t buffer_size = 4 * 1024;
+      m_buf_begin = XNEWVEC (char, buffer_size);
+      m_buf_end = m_buf_begin + buffer_size;
+      m_data_begin = m_buf_begin;
+      m_data_end = m_data_begin;
     }
-  else
+  else if (m_data_end == m_buf_end)
     {
-      const int offset = m_alloc_offset;
-      offset_buffer (-offset);
-      m_size *= 2;
-      m_data = XRESIZEVEC (char, m_data, m_size);
-      offset_buffer (offset);
+      const auto new_size = 2 * (m_buf_end - m_buf_begin);
+      const auto data_offset = m_data_begin - m_buf_begin;
+      const auto data_size = m_data_end - m_data_begin;
+      m_buf_begin = XRESIZEVEC (char, m_buf_begin, new_size);
+      m_buf_end = m_buf_begin + new_size;
+      m_data_begin = m_buf_begin + data_offset;
+      m_data_end = m_data_begin + data_size;
     }
 }
 
@@ -637,45 +703,28 @@ file_cache_slot::maybe_grow ()
     Returns TRUE iff new data could be read.  */
 
 bool
-file_cache_slot::read_data ()
+file_cache_slot::get_more_data ()
 {
-  if (feof (m_fp) || ferror (m_fp))
+  if (!m_fp || feof (m_fp) || ferror (m_fp))
     return false;
-
   maybe_grow ();
-
-  char * from = m_data + m_nb_read;
-  size_t to_read = m_size - m_nb_read;
-  size_t nb_read = fread (from, 1, to_read, m_fp);
-
-  if (ferror (m_fp))
+  char *const dest = m_buf_begin + (m_data_end - m_buf_begin);
+  const auto nb_read = fread (dest, 1, m_buf_end - dest, m_fp);
+  if (ferror (m_fp) || !nb_read)
     return false;
-
-  m_nb_read += nb_read;
-  return !!nb_read;
-}
-
-/* Read new data iff the cache needs to be filled with more data
-   coming from the file FP.  Return TRUE iff the cache was filled with
-   mode data.  */
-
-bool
-file_cache_slot::maybe_read_data ()
-{
-  if (!needs_read_p ())
-    return false;
-  return read_data ();
+  m_data_end += nb_read;
+  return true;
 }
 
-/* Helper function for file_cache_slot::get_next_line (), to find the end of
+/* Helper function for cache_data_source::get_next_line (), to find the end of
    the next line.  Returns with the memchr convention, i.e. nullptr if a line
    terminator was not found.  We need to determine line endings in the same
    manner that libcpp does: any of \n, \r\n, or \r is a line ending.  */
 
-static char *
-find_end_of_line (char *s, size_t len)
+static const char *
+find_end_of_line (const char *s, const char *end)
 {
-  for (const auto end = s + len; s != end; ++s)
+  for (; s != end; ++s)
     {
       if (*s == '\n')
 	return s;
@@ -698,41 +747,38 @@ find_end_of_line (char *s, size_t len)
   return nullptr;
 }
 
-/* Read a new line from file FP, using C as a cache for the data
-   coming from the file.  Upon successful completion, *LINE is set to
-   the beginning of the line found.  *LINE points directly in the
-   line cache and is only valid until the next call of get_next_line.
-   *LINE_LEN is set to the length of the line.  Note that the line
-   does not contain any terminal delimiter.  This function returns
-   true if some data was read or process from the cache, false
-   otherwise.  Note that subsequent calls to get_next_line might
-   make the content of *LINE invalid.  */
+/* Read a new line from the data source.  Upon successful completion, *LINE is
+   set to the beginning of the line found.  *LINE points directly in the line
+   cache and is only valid until the next call of get_next_line.  *LINE_LEN is
+   set to the length of the line.  Note that the line does not contain any
+   terminal delimiter.  This function returns true if some data was read or
+   processed from the cache, false otherwise.  Note that subsequent calls to
+   get_next_line might make the content of *LINE invalid.  */
 
 bool
-file_cache_slot::get_next_line (char **line, ssize_t *line_len)
+cache_data_source::get_next_line (const char **line, ssize_t *line_len)
 {
-  /* Fill the cache with data to process.  */
-  maybe_read_data ();
+  const char *line_start = m_data_begin + m_line_start_idx;
 
-  size_t remaining_size = m_nb_read - m_line_start_idx;
-  if (remaining_size == 0)
-    /* There is no more data to process.  */
-    return false;
-
-  char *line_start = m_data + m_line_start_idx;
+  /* Check if we have consumed all the data available so far.  */
+  if (line_start == m_data_end)
+    {
+      if (!get_more_data ())
+	return false;
+      line_start = m_data_begin + m_line_start_idx;
+    }
 
-  char *next_line_start = NULL;
-  size_t len = 0;
-  char *line_end = find_end_of_line (line_start, remaining_size);
+  /* Find the end of the current line.  */
+  const char *next_line_start = NULL;
+  const char *line_end = find_end_of_line (line_start, m_data_end);
   if (line_end == NULL)
     {
       /* We haven't found an end-of-line delimiter in the cache.
 	 Fill the cache with more data from the file and look again.  */
-      while (maybe_read_data ())
+      while (get_more_data ())
 	{
-	  line_start = m_data + m_line_start_idx;
-	  remaining_size = m_nb_read - m_line_start_idx;
-	  line_end = find_end_of_line (line_start, remaining_size);
+	  line_start = m_data_begin + m_line_start_idx;
+	  line_end = find_end_of_line (line_start, m_data_end);
 	  if (line_end != NULL)
 	    {
 	      next_line_start = line_end + 1;
@@ -749,8 +795,8 @@ file_cache_slot::get_next_line (char **line, ssize_t *line_len)
 
 	     If the file ends in a \r, we didn't identify it as a line
 	     terminator above, so do that now instead.  */
-	  line_end = m_data + m_nb_read;
-	  if (m_nb_read && line_end[-1] == '\r')
+	  line_end = m_data_end;
+	  if (line_end != m_data_begin && line_end[-1] == '\r')
 	    {
 	      --line_end;
 	      m_missing_trailing_newline = false;
@@ -767,18 +813,11 @@ file_cache_slot::get_next_line (char **line, ssize_t *line_len)
       m_missing_trailing_newline = false;
     }
 
-  if (m_fp && ferror (m_fp))
-    return false;
-
   /* At this point, we've found the end of the of line.  It either points to
      the line terminator or to one byte after the last byte of the file.  */
-  gcc_assert (line_end != NULL);
-
-  len = line_end - line_start;
-
-  if (m_line_start_idx < m_nb_read)
-    *line = line_start;
-
+  const auto len = line_end - line_start;
+  *line = line_start;
+  *line_len = len;
   ++m_line_num;
 
   /* Before we update our line record, make sure the hint about the
@@ -800,7 +839,7 @@ file_cache_slot::get_next_line (char **line, ssize_t *line_len)
 	m_line_record.safe_push
 	  (file_cache_slot::line_info (m_line_num,
 				       m_line_start_idx,
-				       line_end - m_data));
+				       line_end - m_data_begin));
       else if (m_total_lines > line_record_size)
 	{
 	  /* ... otherwise, we just scale total_lines down to
@@ -811,23 +850,14 @@ file_cache_slot::get_next_line (char **line, ssize_t *line_len)
 	    m_line_record.safe_push
 	      (file_cache_slot::line_info (m_line_num,
 					   m_line_start_idx,
-					   line_end - m_data));
+					   line_end - m_data_begin));
 	}
     }
 
   /* Update m_line_start_idx so that it points to the next line to be
      read.  */
-  if (next_line_start)
-    m_line_start_idx = next_line_start - m_data;
-  else
-    /* We didn't find any terminal '\n'.  Let's consider that the end
-       of line is the end of the data in the cache.  The next
-       invocation of get_next_line will either read more data from the
-       underlying file or return false early because we've reached the
-       end of the file.  */
-    m_line_start_idx = m_nb_read;
-
-  *line_len = len;
+  m_line_start_idx
+    = (next_line_start ? next_line_start : m_data_end) - m_data_begin;
 
   return true;
 }
@@ -839,15 +869,15 @@ file_cache_slot::get_next_line (char **line, ssize_t *line_len)
    completion.  */
 
 bool
-file_cache_slot::goto_next_line ()
+cache_data_source::goto_next_line ()
 {
-  char *l;
+  const char *l;
   ssize_t len;
 
   return get_next_line (&l, &len);
 }
 
-/* Read an arbitrary line number LINE_NUM from the file cached in C.
+/* Read an arbitrary line number LINE_NUM from the data cache.
    If the line was read successfully, *LINE points to the beginning
    of the line in the file cache and *LINE_LEN is the length of the
    line.  *LINE is not nul-terminated, but may contain zero bytes.
@@ -855,8 +885,8 @@ file_cache_slot::goto_next_line ()
    This function returns bool if a line was read.  */
 
 bool
-file_cache_slot::read_line_num (size_t line_num,
-		       char ** line, ssize_t *line_len)
+cache_data_source::read_line_num (size_t line_num,
+				  const char ** line, ssize_t *line_len)
 {
   gcc_assert (line_num > 0);
 
@@ -864,7 +894,7 @@ file_cache_slot::read_line_num (size_t line_num,
     {
       /* We've been asked to read lines that are before m_line_num.
 	 So lets use our line record (if it's not empty) to try to
-	 avoid re-reading the file from the beginning again.  */
+	 avoid re-scanning the data from the beginning again.  */
 
       if (m_line_record.is_empty ())
 	{
@@ -873,7 +903,7 @@ file_cache_slot::read_line_num (size_t line_num,
 	}
       else
 	{
-	  file_cache_slot::line_info *i = NULL;
+	  line_info *i = NULL;
 	  if (m_total_lines <= line_record_size)
 	    {
 	      /* In languages where the input file is not totally
@@ -909,7 +939,7 @@ file_cache_slot::read_line_num (size_t line_num,
 	  if (i && i->line_num == line_num)
 	    {
 	      /* We have the start/end of the line.  */
-	      *line = m_data + i->start_pos;
+	      *line = m_data_begin + i->start_pos;
 	      *line_len = i->end_pos - i->start_pos;
 	      return true;
 	    }
@@ -938,6 +968,20 @@ file_cache_slot::read_line_num (size_t line_num,
   return get_next_line (line, line_len);
 }
 
+/* Get a borrowed char_span to the full content of this file
+   as decoded according to the input charset, encoded as UTF-8.  */
+
+char_span
+cache_data_source::get_full_file_content ()
+{
+  const char *line;
+  ssize_t line_len;
+  while (get_next_line (&line, &line_len))
+    {
+    }
+  return char_span (m_data_begin, m_data_end - m_data_begin);
+}
+
 /* Return the physical source line that corresponds to FILE_PATH/LINE.
    The line is not nul-terminated.  The returned pointer is only
    valid until the next call of location_get_source_line.
@@ -946,30 +990,56 @@ file_cache_slot::read_line_num (size_t line_num,
    If the function fails, a NULL char_span is returned.  */
 
 char_span
-location_get_source_line (const char *file_path, int line)
+location_get_source_line (expanded_location xloc, int line)
 {
-  char *buffer = NULL;
-  ssize_t len;
-
+  const char_span fail (nullptr, 0);
   if (line == 0)
-    return char_span (NULL, 0);
-
-  if (file_path == NULL)
-    return char_span (NULL, 0);
+    return fail;
 
   diagnostic_file_cache_init ();
 
-  file_cache_slot *c = global_dc->m_file_cache->lookup_or_add_file (file_path);
-  if (c == NULL)
-    return char_span (NULL, 0);
+  cache_data_source *c;
+  if (xloc.generated_data_len)
+    {
+      if (!xloc.generated_data)
+	return fail;
+      c = global_dc->m_file_cache->lookup_or_add_data (xloc.generated_data,
+						       xloc.generated_data_len);
+    }
+  else
+    {
+      if (!xloc.file)
+	return fail;
+      c = global_dc->m_file_cache->lookup_or_add_file (xloc.file);
+    }
 
+  if (!c)
+    return fail;
+
+  const char *buffer = NULL;
+  ssize_t len;
   bool read = c->read_line_num (line, &buffer, &len);
   if (!read)
-    return char_span (NULL, 0);
+    return fail;
 
   return char_span (buffer, len);
 }
 
+char_span
+location_get_source_line (expanded_location xloc)
+{
+  return location_get_source_line (xloc, xloc.line);
+}
+
+char_span
+location_get_source_line (const char *file_path, int line)
+{
+  expanded_location xloc = {};
+  xloc.file = file_path;
+  xloc.line = line;
+  return location_get_source_line (xloc);
+}
+
 /* Return a NUL-terminated copy of the source text between two locations, or
    NULL if the arguments are invalid.  The caller is responsible for freeing
    the return value.  */
@@ -986,8 +1056,18 @@ get_source_text_between (location_t start, location_t end)
      start, give up and return nothing.  */
   if (!expstart.file || !expend.file)
     return NULL;
-  if (strcmp (expstart.file, expend.file) != 0)
+  if (expstart.generated_data_len != expend.generated_data_len)
     return NULL;
+  if (expstart.generated_data_len)
+    {
+      if (expstart.generated_data != expend.generated_data)
+	return NULL;
+    }
+  else
+    {
+      if (strcmp (expstart.file, expend.file) != 0)
+	return NULL;
+    }
   if (expstart.line > expend.line)
     return NULL;
   if (expstart.line == expend.line
@@ -1229,9 +1309,10 @@ int
 location_compute_display_column (expanded_location exploc,
 				 const cpp_char_column_policy &policy)
 {
-  if (!(exploc.file && *exploc.file && exploc.line && exploc.column))
+  if (!(exploc.file && (exploc.generated_data_len || *exploc.file)
+	&& exploc.line && exploc.column))
     return exploc.column;
-  char_span line = location_get_source_line (exploc.file, exploc.line);
+  char_span line = location_get_source_line (exploc);
   /* If line is NULL, this function returns exploc.column which is the
      desired fallback.  */
   return cpp_byte_column_to_display_column (line.get_buffer (), line.length (),
@@ -1391,7 +1472,19 @@ dump_location_info (FILE *stream)
       fprintf (stream, "ORDINARY MAP: %i\n", idx);
       dump_location_range (stream,
 			   MAP_START_LOCATION (map), end_location);
-      fprintf (stream, "  file: %s\n", ORDINARY_MAP_FILE_NAME (map));
+
+      if (ORDINARY_MAP_GENERATED_DATA_P (map))
+	{
+	  fprintf (stream, "  file: %s%s\n",
+		   ORDINARY_MAP_CONTAINING_FILE_NAME (line_table, map),
+		   special_fname_generated ());
+	  fprintf (stream, "  data: %.*s\n",
+		   (int) ORDINARY_MAP_GENERATED_DATA_LEN (map),
+		   ORDINARY_MAP_GENERATED_DATA (map));
+	}
+      else
+	fprintf (stream, "  file: %s\n", LINEMAP_FILE (map));
+
       fprintf (stream, "  starting at line: %i\n",
 	       ORDINARY_MAP_STARTING_LINE_NUMBER (map));
       fprintf (stream, "  column and range bits: %i\n",
@@ -1417,6 +1510,9 @@ dump_location_info (FILE *stream)
       case LC_ENTER_MACRO:
 	reason = "LC_RENAME_MACRO";
 	break;
+      case LC_GEN:
+	reason = "LC_GEN";
+	break;
       default:
 	reason = "Unknown";
       }
@@ -1446,13 +1542,14 @@ dump_location_info (FILE *stream)
 	    {
 	      /* Beginning of a new source line: draw the line.  */
 
-	      char_span line_text = location_get_source_line (exploc.file,
-							      exploc.line);
+	      char_span line_text = location_get_source_line (exploc);
 	      if (!line_text)
 		break;
 	      fprintf (stream,
-		       "%s:%3i|loc:%5i|%.*s\n",
-		       exploc.file, exploc.line,
+		       "%s%s:%3i|loc:%5i|%.*s\n",
+		       exploc.file,
+		       exploc.generated_data ? special_fname_generated () : "",
+		       exploc.line,
 		       loc,
 		       (int)line_text.length (), line_text.get_buffer ());
 
@@ -1767,14 +1864,17 @@ get_substring_ranges_for_loc (cpp_reader *pfile,
       expanded_location finish
 	= expand_location_to_spelling_point (src_range.m_finish,
 					     LOCATION_ASPECT_FINISH);
-      if (start.file != finish.file)
+      if (start.generated_data_len != finish.generated_data_len
+	  || (start.generated_data_len
+	      ? start.generated_data != finish.generated_data
+	      : start.file != finish.file))
 	return "range endpoints are in different files";
       if (start.line != finish.line)
 	return "range endpoints are on different lines";
       if (start.column > finish.column)
 	return "range endpoints are reversed";
 
-      char_span line = location_get_source_line (start.file, start.line);
+      char_span line = location_get_source_line (start);
       if (!line)
 	return "unable to read source line";
 
@@ -1814,11 +1914,13 @@ get_substring_ranges_for_loc (cpp_reader *pfile,
       /* Bulletproofing.  We ought to only have different ordinary maps
 	 for start vs finish due to line-length jumps.  */
       if (start_ord_map != final_ord_map
-	  && start_ord_map->to_file != final_ord_map->to_file)
+	  && !ORDINARY_MAPS_SAME_FILE_P (start_ord_map, final_ord_map))
 	return "start and finish are spelled in different ordinary maps";
       /* The file from linemap_resolve_location ought to match that from
 	 expand_location_to_spelling_point.  */
-      if (start_ord_map->to_file != start.file)
+      if (ORDINARY_MAP_GENERATED_DATA_P (start_ord_map)
+	  ? ORDINARY_MAP_GENERATED_DATA (start_ord_map) != start.generated_data
+	  : ORDINARY_MAP_FILE_NAME (start_ord_map) != start.file)
 	return "mismatching file after resolving linemap";
 
       location_t start_loc
@@ -1990,6 +2092,20 @@ get_num_source_ranges_for_substring (cpp_reader *pfile,
 
 /* Selftests of location handling.  */
 
+/* Wrapper around linemap_add to handle transparently adding either a tmp file,
+   or in-memory generated content.  */
+const line_map_ordinary *
+temp_source_file::do_linemap_add (int line)
+{
+  const line_map *map;
+  if (content_buf)
+    map = linemap_add (line_table, LC_GEN, false, content_buf,
+		       line, content_len);
+  else
+    map = linemap_add (line_table, LC_ENTER, false, get_filename (), line);
+  return linemap_check_ordinary (map);
+}
+
 /* Verify that compare() on linenum_type handles comparisons over the full
    range of the type.  */
 
@@ -2068,13 +2184,16 @@ assert_loceq (const char *exp_filename, int exp_linenum, int exp_colnum,
 class line_table_case
 {
 public:
-  line_table_case (int default_range_bits, int base_location)
+  line_table_case (int default_range_bits, int base_location,
+		   bool generated_data)
   : m_default_range_bits (default_range_bits),
-    m_base_location (base_location)
+    m_base_location (base_location),
+    m_generated_data (generated_data)
   {}
 
   int m_default_range_bits;
   int m_base_location;
+  bool m_generated_data;
 };
 
 /* Constructor.  Store the old value of line_table, and create a new
@@ -2091,6 +2210,7 @@ line_table_test::line_table_test ()
   gcc_assert (saved_line_table->round_alloc_size);
   line_table->round_alloc_size = saved_line_table->round_alloc_size;
   line_table->default_range_bits = 0;
+  m_generated_data = false;
 }
 
 /* Constructor.  Store the old value of line_table, and create a new
@@ -2112,6 +2232,7 @@ line_table_test::line_table_test (const line_table_case &case_)
       line_table->highest_location = case_.m_base_location;
       line_table->highest_line = case_.m_base_location;
     }
+  m_generated_data = case_.m_generated_data;
 }
 
 /* Destructor.  Restore the old value of line_table.  */
@@ -2131,7 +2252,10 @@ test_accessing_ordinary_linemaps (const line_table_case &case_)
   line_table_test ltt (case_);
 
   /* Build a simple linemap describing some locations. */
-  linemap_add (line_table, LC_ENTER, false, "foo.c", 0);
+  if (ltt.m_generated_data)
+    linemap_add (line_table, LC_GEN, false, "some data", 0, 10);
+  else
+    linemap_add (line_table, LC_ENTER, false, "foo.c", 0);
 
   linemap_line_start (line_table, 1, 100);
   location_t loc_a = linemap_position_for_column (line_table, 1);
@@ -2181,21 +2305,23 @@ test_accessing_ordinary_linemaps (const line_table_case &case_)
   linemap_add (line_table, LC_LEAVE, false, NULL, 0);
 
   /* Verify that we can recover the location info.  */
-  assert_loceq ("foo.c", 1, 1, loc_a);
-  assert_loceq ("foo.c", 1, 23, loc_b);
-  assert_loceq ("foo.c", 2, 1, loc_c);
-  assert_loceq ("foo.c", 2, 17, loc_d);
-  assert_loceq ("foo.c", 3, 700, loc_e);
-  assert_loceq ("foo.c", 4, 100, loc_back_to_short);
+  const auto fname
+    = (ltt.m_generated_data ? special_fname_generated () : "foo.c");
+  assert_loceq (fname, 1, 1, loc_a);
+  assert_loceq (fname, 1, 23, loc_b);
+  assert_loceq (fname, 2, 1, loc_c);
+  assert_loceq (fname, 2, 17, loc_d);
+  assert_loceq (fname, 3, 700, loc_e);
+  assert_loceq (fname, 4, 100, loc_back_to_short);
 
   /* In the very wide line, the initial location should be fully tracked.  */
-  assert_loceq ("foo.c", 5, 2000, loc_start_of_very_long_line);
+  assert_loceq (fname, 5, 2000, loc_start_of_very_long_line);
   /* ...but once we exceed LINE_MAP_MAX_COLUMN_NUMBER column-tracking should
      be disabled.  */
-  assert_loceq ("foo.c", 5, 0, loc_too_wide);
-  assert_loceq ("foo.c", 5, 0, loc_too_wide_2);
+  assert_loceq (fname, 5, 0, loc_too_wide);
+  assert_loceq (fname, 5, 0, loc_too_wide_2);
   /*...and column-tracking should be re-enabled for subsequent lines.  */
-  assert_loceq ("foo.c", 6, 10, loc_sane_again);
+  assert_loceq (fname, 6, 10, loc_sane_again);
 
   assert_loceq ("bar.c", 1, 150, loc_f);
 
@@ -2242,10 +2368,11 @@ test_make_location_nonpure_range_endpoints (const line_table_case &case_)
      with C++ frontend.
      ....................0000000001111111111222.
      ....................1234567890123456789012.  */
-  const char *content = "     r += !aaa == bbb;\n";
-  temp_source_file tmp (SELFTEST_LOCATION, ".C", content);
   line_table_test ltt (case_);
-  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 1);
+  const char *content = "     r += !aaa == bbb;\n";
+  temp_source_file tmp (SELFTEST_LOCATION, ".C", content, strlen (content),
+			ltt.m_generated_data);
+  tmp.do_linemap_add (1);
 
   const location_t c11 = linemap_position_for_column (line_table, 11);
   const location_t c12 = linemap_position_for_column (line_table, 12);
@@ -3902,7 +4029,8 @@ static const location_t boundary_locations[] = {
 /* Run TESTCASE multiple times, once for each case in our test matrix.  */
 
 void
-for_each_line_table_case (void (*testcase) (const line_table_case &))
+for_each_line_table_case (void (*testcase) (const line_table_case &),
+			  bool test_generated_data)
 {
   /* As noted above in the description of struct line_table_case,
      we want to explore a test matrix of interesting line_table
@@ -3921,16 +4049,19 @@ for_each_line_table_case (void (*testcase) (const line_table_case &))
       const int num_boundary_locations = ARRAY_SIZE (boundary_locations);
       for (int loc_idx = 0; loc_idx < num_boundary_locations; loc_idx++)
 	{
-	  line_table_case c (default_range_bits, boundary_locations[loc_idx]);
-
-	  testcase (c);
-
-	  num_cases_tested++;
+	  /* ...and try both normal files, and internally generated data.  */
+	  for (int gen = 0; gen != 1 + test_generated_data; ++gen)
+	    {
+	      line_table_case c (default_range_bits,
+				 boundary_locations[loc_idx], gen);
+	      testcase (c);
+	      num_cases_tested++;
+	    }
 	}
     }
 
   /* Verify that we fully covered the test matrix.  */
-  ASSERT_EQ (num_cases_tested, 2 * 12);
+  ASSERT_EQ (num_cases_tested, 2 * 12 * (1 + test_generated_data));
 }
 
 /* Verify that when presented with a consecutive pair of locations with
@@ -3941,7 +4072,7 @@ for_each_line_table_case (void (*testcase) (const line_table_case &))
 static void
 test_line_offset_overflow ()
 {
-  line_table_test ltt (line_table_case (5, 0));
+  line_table_test ltt (line_table_case (5, 0, false));
 
   linemap_add (line_table, LC_ENTER, false, "foo.c", 0);
   linemap_line_start (line_table, 1, 100);
@@ -4181,9 +4312,9 @@ input_cc_tests ()
   test_should_have_column_data_p ();
   test_unknown_location ();
   test_builtins ();
-  for_each_line_table_case (test_make_location_nonpure_range_endpoints);
+  for_each_line_table_case (test_make_location_nonpure_range_endpoints, true);
 
-  for_each_line_table_case (test_accessing_ordinary_linemaps);
+  for_each_line_table_case (test_accessing_ordinary_linemaps, true);
   for_each_line_table_case (test_lexer);
   for_each_line_table_case (test_lexer_string_locations_simple);
   for_each_line_table_case (test_lexer_string_locations_ebcdic);
diff --git a/gcc/input.h b/gcc/input.h
index d1087b7a9e8..129d2f7c2f2 100644
--- a/gcc/input.h
+++ b/gcc/input.h
@@ -34,6 +34,7 @@ extern GTY(()) class line_maps *saved_line_table;
 
 /* Returns the translated string referring to the special location.  */
 const char *special_fname_builtin ();
+const char *special_fname_generated ();
 
 /* line-map.cc reserves RESERVED_LOCATION_COUNT to the user.  Ensure
    both UNKNOWN_LOCATION and BUILTINS_LOCATION fit into that.  */
@@ -114,14 +115,21 @@ class char_span
 };
 
 extern char_span location_get_source_line (const char *file_path, int line);
+
+/* The version taking an exploc handles generated source too, and should be used
+   whenever possible.  */
+extern char_span location_get_source_line (expanded_location exploc);
+extern char_span location_get_source_line (expanded_location exploc, int line);
+
 extern char *get_source_text_between (location_t, location_t);
 extern char_span get_source_file_content (const char *file_path);
 
 extern bool location_missing_trailing_newline (const char *file_path);
 
-/* Forward decl of slot within file_cache, so that the definition doesn't
+/* Forward decl of slots within file_cache, so that the definition doesn't
    need to be in this header.  */
 class file_cache_slot;
+class data_cache_slot;
 
 /* A cache of source files for use when emitting diagnostics
    (and in a few places in the C/C++ frontends).
@@ -139,7 +147,9 @@ class file_cache
   ~file_cache ();
 
   file_cache_slot *lookup_or_add_file (const char *file_path);
+  data_cache_slot *lookup_or_add_data (const char *data, unsigned int data_len);
   void forcibly_evict_file (const char *file_path);
+  void forcibly_evict_data (const char *data, unsigned int data_len);
 
   /* See comments in diagnostic.h about the input conversion context.  */
   struct input_context
@@ -151,13 +161,17 @@ class file_cache
 				 bool should_skip_bom);
 
  private:
-  file_cache_slot *evicted_cache_tab_entry (unsigned *highest_use_count);
+  template <class Slot>
+  Slot *evicted_cache_tab_entry (Slot *slots, unsigned int *highest_use_count);
+
   file_cache_slot *add_file (const char *file_path);
+  data_cache_slot *add_data (const char *data, unsigned int data_len);
   file_cache_slot *lookup_file (const char *file_path);
+  data_cache_slot *lookup_data (const char *data, unsigned int data_len);
 
- private:
   static const size_t num_file_slots = 16;
   file_cache_slot *m_file_slots;
+  data_cache_slot *m_data_slots;
   input_context in_context;
 };
 
@@ -254,6 +268,8 @@ void dump_location_info (FILE *stream);
 void diagnostics_file_cache_fini (void);
 
 void diagnostics_file_cache_forcibly_evict_file (const char *file_path);
+void diagnostics_file_cache_forcibly_evict_data (const char *data,
+						 unsigned int data_len);
 
 class GTY(()) string_concat
 {
diff --git a/gcc/selftest.cc b/gcc/selftest.cc
index 20c10bbd055..7126b9901dd 100644
--- a/gcc/selftest.cc
+++ b/gcc/selftest.cc
@@ -163,14 +163,21 @@ assert_str_startswith (const location &loc,
 
 named_temp_file::named_temp_file (const char *suffix)
 {
-  m_filename = make_temp_file (suffix);
-  ASSERT_NE (m_filename, NULL);
+  if (suffix)
+    {
+      m_filename = make_temp_file (suffix);
+      ASSERT_NE (m_filename, NULL);
+    }
+  else
+    m_filename = nullptr;
 }
 
 /* Destructor.  Delete the tempfile.  */
 
 named_temp_file::~named_temp_file ()
 {
+  if (!m_filename)
+    return;
   unlink (m_filename);
   diagnostics_file_cache_forcibly_evict_file (m_filename);
   free (m_filename);
@@ -183,7 +190,9 @@ named_temp_file::~named_temp_file ()
 temp_source_file::temp_source_file (const location &loc,
 				    const char *suffix,
 				    const char *content)
-: named_temp_file (suffix)
+: named_temp_file (suffix),
+  content_buf (nullptr),
+  content_len (0)
 {
   FILE *out = fopen (get_filename (), "w");
   if (!out)
@@ -192,19 +201,41 @@ temp_source_file::temp_source_file (const location &loc,
   fclose (out);
 }
 
-/* As above, but with a size, to allow for NUL bytes in CONTENT.  */
+/* As above, but with a size, to allow for NUL bytes in CONTENT.  When
+   IS_GENERATED==true, the data is kept in memory instead, for testing LC_GEN
+   maps.  */
 
 temp_source_file::temp_source_file (const location &loc,
 				    const char *suffix,
 				    const char *content,
-				    size_t sz)
-: named_temp_file (suffix)
+				    size_t sz,
+				    bool is_generated)
+: named_temp_file (is_generated ? nullptr : suffix),
+  content_buf (is_generated ? XNEWVEC (char, sz) : nullptr),
+  content_len (is_generated ? sz : 0)
 {
-  FILE *out = fopen (get_filename (), "w");
-  if (!out)
-    fail_formatted (loc, "unable to open tempfile: %s", get_filename ());
-  fwrite (content, sz, 1, out);
-  fclose (out);
+  if (is_generated)
+    {
+      gcc_assert (sz); /* Empty generated content is not supported.  */
+      memcpy (content_buf, content, sz);
+    }
+  else
+    {
+      FILE *out = fopen (get_filename (), "w");
+      if (!out)
+	fail_formatted (loc, "unable to open tempfile: %s", get_filename ());
+      fwrite (content, sz, 1, out);
+      fclose (out);
+    }
+}
+
+temp_source_file::~temp_source_file ()
+{
+  if (content_buf)
+    {
+      diagnostics_file_cache_forcibly_evict_data (content_buf, content_len);
+      XDELETEVEC (content_buf);
+    }
 }
 
 /* Avoid introducing locale-specific differences in the results
diff --git a/gcc/selftest.h b/gcc/selftest.h
index 20d522afda4..1bcbd275cd1 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -25,6 +25,8 @@ along with GCC; see the file COPYING3.  If not see
 
 #if CHECKING_P
 
+struct line_map_ordinary;
+
 namespace selftest {
 
 /* A struct describing the source-location of a selftest, to make it
@@ -96,10 +98,9 @@ extern void assert_str_startswith (const location &loc,
 class named_temp_file
 {
  public:
-  named_temp_file (const char *suffix);
+  explicit named_temp_file (const char *suffix);
   ~named_temp_file ();
   const char *get_filename () const { return m_filename; }
-
  private:
   char *m_filename;
 };
@@ -113,7 +114,13 @@ class temp_source_file : public named_temp_file
   temp_source_file (const location &loc, const char *suffix,
 		    const char *content);
   temp_source_file (const location &loc, const char *suffix,
-		    const char *content, size_t sz);
+		    const char *content, size_t sz,
+		    bool is_generated = false);
+  ~temp_source_file ();
+
+  char *const content_buf;
+  const size_t content_len;
+  const line_map_ordinary *do_linemap_add (int line); /* In input.cc */
 };
 
 /* RAII-style class for avoiding introducing locale-specific differences
@@ -171,6 +178,10 @@ class line_table_test
 
   /* Destructor.  Restore the saved line_table.  */
   ~line_table_test ();
+
+  /* When this is enabled in the line_table_case, test storing all the data
+     in memory rather than a file.  */
+  bool m_generated_data;
 };
 
 /* Helper function for selftests that need a function decl.  */
@@ -183,7 +194,8 @@ extern tree make_fndecl (tree return_type,
 /* Run TESTCASE multiple times, once for each case in our test matrix.  */
 
 extern void
-for_each_line_table_case (void (*testcase) (const line_table_case &));
+for_each_line_table_case (void (*testcase) (const line_table_case &),
+			  bool test_generated_data = false);
 
 /* Read the contents of PATH into memory, returning a 0-terminated buffer
    that must be freed by the caller.
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
index baa6b629b83..29e653625f8 100644
--- a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
@@ -430,7 +430,7 @@ test_show_locus (function *fun)
      to upper case.  Give all of the ranges labels (sharing one label).  */
   if (0 == strcmp (fnname, "test_many_nested_locations"))
     {
-      const char *file = LOCATION_FILE (fnstart);
+      const expanded_location xloc = expand_location (fnstart);
       const int start_line = fnstart_line + 2;
       const int finish_line = start_line + 7;
       location_t loc = get_loc (start_line - 1, 2);
@@ -438,7 +438,7 @@ test_show_locus (function *fun)
       rich_location richloc (line_table, loc);
       for (int line = start_line; line <= finish_line; line++)
 	{
-	  char_span content = location_get_source_line (file, line);
+	  char_span content = location_get_source_line (xloc, line);
 	  gcc_assert (content);
 	  /* Split line up into words.  */
 	  for (int idx = 0; idx < content.length (); idx++)
diff --git a/libcpp/directives.cc b/libcpp/directives.cc
index ee5419d1f40..8d7c93bce53 100644
--- a/libcpp/directives.cc
+++ b/libcpp/directives.cc
@@ -1165,7 +1165,7 @@ _cpp_do_file_change (cpp_reader *pfile, enum lc_reason reason,
 		     const char *to_file, linenum_type to_line,
 		     unsigned int sysp)
 {
-  linemap_assert (reason != LC_ENTER_MACRO);
+  linemap_assert (reason != LC_ENTER_MACRO && reason != LC_GEN);
 
   const line_map_ordinary *ord_map = NULL;
   if (!to_line && reason == LC_RENAME_VERBATIM)
@@ -1176,6 +1176,7 @@ _cpp_do_file_change (cpp_reader *pfile, enum lc_reason reason,
          preprocessed source.  */
       line_map_ordinary *last = LINEMAPS_LAST_ORDINARY_MAP (pfile->line_table);
       if (!ORDINARY_MAP_STARTING_LINE_NUMBER (last)
+	  && !ORDINARY_MAP_GENERATED_DATA_P (last)
 	  && 0 == filename_cmp (to_file, ORDINARY_MAP_FILE_NAME (last))
 	  && SOURCE_LINE (last, pfile->line_table->highest_line) == 2)
 	{
diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
index 44fea0ea08e..426cddb6964 100644
--- a/libcpp/include/line-map.h
+++ b/libcpp/include/line-map.h
@@ -75,6 +75,8 @@ enum lc_reason
   LC_RENAME_VERBATIM,	/* Likewise, but "" != stdin.  */
   LC_ENTER_MACRO,	/* Begin macro expansion.  */
   LC_MODULE,		/* A (C++) Module.  */
+  LC_GEN,		/* Internally generated source.  */
+
   /* FIXME: add support for stringize and paste.  */
   LC_HWM /* High Water Mark.  */
 };
@@ -437,7 +439,13 @@ struct GTY((tag ("1"))) line_map_ordinary : public line_map {
 
   /* Pointer alignment boundary on both 32 and 64-bit systems.  */
 
-  const char *to_file;
+  /* For an LC_GEN map, DATA points to the actual content.  Otherwise it is
+     a file name.  In the former case, the data could contain embedded nulls
+     and it need not be null terminated, so we use the GTY markup appropriate
+     for that case.  */
+  const char * GTY((string_length ("%h.data_len"))) data;
+  unsigned int data_len;
+
   linenum_type to_line;
 
   /* Location from whence this line map was included.  For regular
@@ -662,6 +670,12 @@ ORDINARY_MAP_IN_SYSTEM_HEADER_P (const line_map_ordinary *ord_map)
   return ord_map->sysp;
 }
 
+/* TRUE if this line map contains generated data.  */
+inline bool ORDINARY_MAP_GENERATED_DATA_P (const line_map_ordinary *ord_map)
+{
+  return ord_map->reason == LC_GEN;
+}
+
 /* TRUE if this line map is for a module (not a source file).  */
 
 inline bool
@@ -671,14 +685,42 @@ MAP_MODULE_P (const line_map *map)
 	  && linemap_check_ordinary (map)->reason == LC_MODULE);
 }
 
-/* Get the filename of ordinary map MAP.  */
+/* Get the data contents of ordinary map MAP.  */
 
 inline const char *
 ORDINARY_MAP_FILE_NAME (const line_map_ordinary *ord_map)
 {
-  return ord_map->to_file;
+  linemap_assert (ord_map->reason != LC_GEN);
+  return ord_map->data;
 }
 
+inline const char *
+ORDINARY_MAP_GENERATED_DATA (const line_map_ordinary *ord_map)
+{
+  linemap_assert (ord_map->reason == LC_GEN);
+  return ord_map->data;
+}
+
+inline unsigned int
+ORDINARY_MAP_GENERATED_DATA_LEN (const line_map_ordinary *ord_map)
+{
+  linemap_assert (ord_map->reason == LC_GEN);
+  return ord_map->data_len;
+}
+
+/* Sometimes we don't need to care which kind it is.  */
+inline const char *
+ORDINARY_MAP_FILE_NAME_OR_DATA (const line_map_ordinary *ord_map)
+{
+  return ord_map->data;
+}
+
+/* If we just want to know whether two maps point to the same
+   file/buffer or not.  */
+bool
+ORDINARY_MAPS_SAME_FILE_P (const line_map_ordinary *map1,
+			   const line_map_ordinary *map2);
+
 /* Get the cpp macro whose expansion gave birth to macro map MAP.  */
 
 inline cpp_hashnode *
@@ -1097,17 +1139,19 @@ extern line_map *line_map_new_raw (line_maps *, bool, unsigned);
    map that records locations of tokens that are not part of macro
    replacement-lists present at a macro expansion point.
 
-   The text pointed to by TO_FILE must have a lifetime
-   at least as long as the lifetime of SET.  An empty
-   TO_FILE means standard input.  If reason is LC_LEAVE, and
-   TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their
-   natural values considering the file we are returning to.
+   The text pointed to by DATA must have a lifetime at least as long as the
+   lifetime of SET.  If reason is LC_LEAVE, and DATA is NULL, then DATA, TO_LINE
+   and SYSP are given their natural values considering the file we are returning
+   to.  If reason is LC_GEN, then DATA is the actual content, and DATA_LEN>0 is
+   the length of it.  Otherwise DATA is a file name and DATA_LEN need not be
+   specified.  If DATA_LEN is specified for a file name, it should be the length
+   of the file name, including the terminating null.
 
-   A call to this function can relocate the previous set of
-   maps, so any stored line_map pointers should not be used.  */
+   A call to this function can relocate the previous set of maps, so any stored
+   line_map pointers should not be used.  */
 extern const line_map *linemap_add
   (class line_maps *, enum lc_reason, unsigned int sysp,
-   const char *to_file, linenum_type to_line);
+   const char *data, linenum_type to_line, unsigned int data_len = 0);
 
 /* Create a macro map.  A macro map encodes source locations of tokens
    that are part of a macro replacement-list, at a macro expansion
@@ -1257,7 +1301,7 @@ linemap_position_for_loc_and_offset (class line_maps *set,
 inline const char *
 LINEMAP_FILE (const line_map_ordinary *ord_map)
 {
-  return ord_map->to_file;
+  return ORDINARY_MAP_FILE_NAME (ord_map);
 }
 
 /* Return the line number this map started encoding location from.  */
@@ -1277,6 +1321,13 @@ LINEMAP_SYSP (const line_map_ordinary *ord_map)
   return ord_map->sysp;
 }
 
+/* For a normal ordinary map, this is the same as ORDINARY_MAP_FILE_NAME;
+   but for an LC_GEN map, it returns the file name from which the data
+   originated, instead of asserting.  */
+const char *
+ORDINARY_MAP_CONTAINING_FILE_NAME (line_maps *set,
+				   const line_map_ordinary *ord_map);
+
 const struct line_map *first_map_in_common (line_maps *set,
 					    location_t loc0,
 					    location_t loc1,
@@ -1316,6 +1367,11 @@ typedef struct
 
   /* In a system header?. */
   bool sysp;
+
+  /* If generated data, the data and its length.  The data may contain embedded
+   nulls and need not be null-terminated.  */
+  unsigned int generated_data_len;
+  const char *generated_data;
 } expanded_location;
 
 class range_label;
@@ -2104,12 +2160,14 @@ struct linemap_stats
   long adhoc_table_entries_used;
 };
 
-/* Return the highest location emitted for a given file for which
-   there is a line map in SET.  FILE_NAME is the file name to
-   consider.  If the function returns TRUE, *LOC is set to the highest
-   location emitted for that file.  */
+/* Return the highest location emitted for a given file or generated data buffer
+   for which there is a line map in SET.  If the function returns TRUE, *LOC is
+   set to the highest location emitted for that file.  The const char* arg is
+   either a file name or a generated data buffer, as indicated by
+   IS_DATA.  */
 bool linemap_get_file_highest_location (class line_maps * set,
-					const char *file_name,
+					const char *fname_or_data,
+					bool is_data,
 					location_t *loc);
 
 /* Compute and return statistics about the memory consumption of some
diff --git a/libcpp/line-map.cc b/libcpp/line-map.cc
index e0f82e20571..c37effec68d 100644
--- a/libcpp/line-map.cc
+++ b/libcpp/line-map.cc
@@ -48,6 +48,35 @@ static location_t linemap_macro_loc_to_exp_point (line_maps *,
 extern unsigned num_expanded_macros_counter;
 extern unsigned num_macro_tokens_counter;
 
+/* For a normal ordinary map, this is the same as ORDINARY_MAP_FILE_NAME;
+   but for an LC_GEN map, it returns the file name from which the data
+   originated, instead of asserting.  */
+const char *
+ORDINARY_MAP_CONTAINING_FILE_NAME (line_maps *set,
+				   const line_map_ordinary *ord_map)
+{
+  while (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
+    {
+      ord_map = linemap_included_from_linemap (set, ord_map);
+      if (!ord_map)
+	return "-";
+    }
+  return ORDINARY_MAP_FILE_NAME (ord_map);
+}
+
+/* If we just want to know whether two maps point to the same
+   file/buffer or not.  */
+bool
+ORDINARY_MAPS_SAME_FILE_P (const line_map_ordinary *map1,
+			   const line_map_ordinary *map2)
+{
+  const bool is_data = ORDINARY_MAP_GENERATED_DATA_P (map1);
+  return is_data == ORDINARY_MAP_GENERATED_DATA_P (map2)
+    && (is_data
+	? map1->data == map2->data
+	: !filename_cmp (map1->data, map2->data));
+}
+
 /* Destructor for class line_maps.
    Ensure non-GC-managed memory is released.  */
 
@@ -411,8 +440,9 @@ linemap_check_files_exited (line_maps *set)
   for (const line_map_ordinary *map = LINEMAPS_LAST_ORDINARY_MAP (set);
        ! MAIN_FILE_P (map);
        map = linemap_included_from_linemap (set, map))
-    fprintf (stderr, "line-map.cc: file \"%s\" entered but not left\n",
-	     ORDINARY_MAP_FILE_NAME (map));
+    fprintf (stderr, "line-map.cc: file \"%s%s\" entered but not left\n",
+	     ORDINARY_MAP_CONTAINING_FILE_NAME (set, map),
+	     ORDINARY_MAP_GENERATED_DATA_P (map) ? "<generated>" : "");
 }
 
 /* Create NUM zero-initialized maps of type MACRO_P.  */
@@ -505,21 +535,25 @@ LAST_SOURCE_LINE_LOCATION (const line_map_ordinary *map)
 }
 
 /* Add a mapping of logical source line to physical source file and
-   line number.
+   line number.  This function creates an "ordinary map", which is a
+   map that records locations of tokens that are not part of macro
+   replacement-lists present at a macro expansion point.
 
-   The text pointed to by TO_FILE must have a lifetime
-   at least as long as the final call to lookup_line ().  An empty
-   TO_FILE means standard input.  If reason is LC_LEAVE, and
-   TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their
-   natural values considering the file we are returning to.
+   The text pointed to by DATA must have a lifetime at least as long as the
+   lifetime of SET.  If reason is LC_LEAVE, and DATA is NULL, then DATA, TO_LINE
+   and SYSP are given their natural values considering the file we are returning
+   to.  If reason is LC_GEN, then DATA is the actual content, and DATA_LEN>0 is
+   the length of it.  Otherwise DATA is a file name and DATA_LEN need not be
+   specified.  If DATA_LEN is specified for a file name, it should be the length
+   of the file name, including the terminating null.
 
-   FROM_LINE should be monotonic increasing across calls to this
-   function.  A call to this function can relocate the previous set of
-   maps, so any stored line_map pointers should not be used.  */
+   A call to this function can relocate the previous set of maps, so any stored
+   line_map pointers should not be used.  */
 
 const struct line_map *
 linemap_add (line_maps *set, enum lc_reason reason,
-	     unsigned int sysp, const char *to_file, linenum_type to_line)
+	     unsigned int sysp, const char *data, linenum_type to_line,
+	     unsigned int data_len)
 {
   /* Generate a start_location above the current highest_location.
      If possible, make the low range bits be zero.  */
@@ -535,13 +569,25 @@ linemap_add (line_maps *set, enum lc_reason reason,
 		      >= MAP_START_LOCATION (LINEMAPS_LAST_ORDINARY_MAP (set))));
 
   /* When we enter the file for the first time reason cannot be
-     LC_RENAME.  */
-  linemap_assert (!(set->depth == 0 && reason == LC_RENAME));
+     LC_RENAME.  To keep things simple, don't track LC_RENAME for
+     LC_GEN maps, but just keep their reason as always LC_GEN.  */
+  if (reason == LC_RENAME)
+    {
+      linemap_assert (set->depth != 0);
+      const auto prev = LINEMAPS_LAST_ORDINARY_MAP (set);
+      linemap_assert (prev);
+      if (prev->reason == LC_GEN)
+	{
+	  reason = LC_GEN;
+	  data = prev->data;
+	  data_len = prev->data_len;
+	}
+    }
 
   /* If we are leaving the main file, return a NULL map.  */
   if (reason == LC_LEAVE
       && MAIN_FILE_P (LINEMAPS_LAST_ORDINARY_MAP (set))
-      && to_file == NULL)
+      && data == NULL)
     {
       set->depth--;
       return NULL;
@@ -557,8 +603,9 @@ linemap_add (line_maps *set, enum lc_reason reason,
     = linemap_check_ordinary (new_linemap (set, start_location));
   map->reason = reason;
 
-  if (to_file && *to_file == '\0' && reason != LC_RENAME_VERBATIM)
-    to_file = "<stdin>";
+  if (data && *data == '\0' && reason != LC_RENAME_VERBATIM
+      && reason != LC_GEN)
+    data = "<stdin>";
 
   if (reason == LC_RENAME_VERBATIM)
     reason = LC_RENAME;
@@ -577,20 +624,31 @@ linemap_add (line_maps *set, enum lc_reason reason,
 	 that comes right before MAP in the same file.  */
       from = linemap_included_from_linemap (set, map - 1);
 
-      /* A TO_FILE of NULL is special - we use the natural values.  */
-      if (to_file == NULL)
+      /* A DATA of NULL is special - we use the natural values.  */
+      if (data == NULL)
 	{
-	  to_file = ORDINARY_MAP_FILE_NAME (from);
+	  data = ORDINARY_MAP_FILE_NAME_OR_DATA (from);
 	  to_line = SOURCE_LINE (from, from[1].start_location);
 	  sysp = ORDINARY_MAP_IN_SYSTEM_HEADER_P (from);
 	}
       else
-	linemap_assert (filename_cmp (ORDINARY_MAP_FILE_NAME (from),
-				      to_file) == 0);
+	linemap_assert (ORDINARY_MAP_GENERATED_DATA_P (from)
+			? (ORDINARY_MAP_GENERATED_DATA (from) == data)
+			: (filename_cmp (ORDINARY_MAP_FILE_NAME (from), data)
+			   == 0));
     }
 
   map->sysp = sysp;
-  map->to_file = to_file;
+  map->data = data;
+
+  if (reason == LC_GEN)
+    {
+      gcc_assert (data_len);
+      map->data_len = data_len;
+    }
+  else
+    map->data_len = (data_len > 0 ? data_len : strlen (data) + 1);
+
   map->to_line = to_line;
   LINEMAPS_ORDINARY_CACHE (set) = LINEMAPS_ORDINARY_USED (set) - 1;
   /* Do not store range_bits here.  That's readjusted in
@@ -606,7 +664,7 @@ linemap_add (line_maps *set, enum lc_reason reason,
      pure_location_p.  */
   linemap_assert (pure_location_p (set, start_location));
 
-  if (reason == LC_ENTER)
+  if (reason == LC_ENTER || reason == LC_GEN)
     {
       if (set->depth == 0)
 	map->included_from = 0;
@@ -617,7 +675,7 @@ linemap_add (line_maps *set, enum lc_reason reason,
 	      & ~((1 << map[-1].m_column_and_range_bits) - 1))
 	     + map[-1].start_location);
       set->depth++;
-      if (set->trace_includes)
+      if (set->trace_includes && reason == LC_ENTER)
 	trace_include (set, map);
     }
   else if (reason == LC_RENAME)
@@ -863,8 +921,9 @@ linemap_line_start (line_maps *set, linenum_type to_line,
 	        (const_cast <line_map *>
 		  (linemap_add (set, LC_RENAME,
 				ORDINARY_MAP_IN_SYSTEM_HEADER_P (map),
-				ORDINARY_MAP_FILE_NAME (map),
-				to_line)));
+				ORDINARY_MAP_FILE_NAME_OR_DATA (map),
+				to_line,
+				map->data_len)));
       map->m_column_and_range_bits = column_bits;
       map->m_range_bits = range_bits;
       r = (MAP_START_LOCATION (map)
@@ -1025,7 +1084,7 @@ linemap_position_for_loc_and_offset (line_maps *set,
        cannot encode the location there.  */
     if ((map + 1)->reason != LC_RENAME
 	|| line < ORDINARY_MAP_STARTING_LINE_NUMBER (map + 1)
-	|| 0 != strcmp (LINEMAP_FILE (map + 1), LINEMAP_FILE (map)))
+	|| !ORDINARY_MAPS_SAME_FILE_P (map, map + 1))
       return loc;
 
   column += column_offset;
@@ -1283,7 +1342,7 @@ linemap_get_expansion_filename (line_maps *set,
 
   linemap_macro_loc_to_exp_point (set, location, &map);
 
-  return LINEMAP_FILE (map);
+  return ORDINARY_MAP_CONTAINING_FILE_NAME (set, map);
 }
 
 /* Return the name of the macro associated to MACRO_MAP.  */
@@ -1853,8 +1912,12 @@ linemap_expand_location (line_maps *set,
 	abort ();
 
       const line_map_ordinary *ord_map = linemap_check_ordinary (map);
-
-      xloc.file = LINEMAP_FILE (ord_map);
+      xloc.file = ORDINARY_MAP_CONTAINING_FILE_NAME (set, ord_map);
+      if (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
+	{
+	  xloc.generated_data = ORDINARY_MAP_GENERATED_DATA (ord_map);
+	  xloc.generated_data_len = ORDINARY_MAP_GENERATED_DATA_LEN (ord_map);
+	}
       xloc.line = SOURCE_LINE (ord_map, loc);
       xloc.column = SOURCE_COLUMN (ord_map, loc);
       xloc.sysp = LINEMAP_SYSP (ord_map) != 0;
@@ -1873,7 +1936,7 @@ linemap_dump (FILE *stream, class line_maps *set, unsigned ix, bool is_macro)
 {
   const char *const lc_reasons_v[LC_HWM]
       = { "LC_ENTER", "LC_LEAVE", "LC_RENAME", "LC_RENAME_VERBATIM",
-	  "LC_ENTER_MACRO", "LC_MODULE" };
+	  "LC_ENTER_MACRO", "LC_MODULE", "LC_GEN" };
   const line_map *map;
   unsigned reason;
 
@@ -1903,11 +1966,15 @@ linemap_dump (FILE *stream, class line_maps *set, unsigned ix, bool is_macro)
       const line_map_ordinary *includer_map
 	= linemap_included_from_linemap (set, ord_map);
 
-      fprintf (stream, "File: %s:%d\n", ORDINARY_MAP_FILE_NAME (ord_map),
+      fprintf (stream, "File: %s:%d\n",
+	       ORDINARY_MAP_GENERATED_DATA_P (ord_map) ? "<generated>"
+	       : ORDINARY_MAP_FILE_NAME (ord_map),
 	       ORDINARY_MAP_STARTING_LINE_NUMBER (ord_map));
       fprintf (stream, "Included from: [%d] %s\n",
 	       includer_map ? int (includer_map - set->info_ordinary.maps) : -1,
-	       includer_map ? ORDINARY_MAP_FILE_NAME (includer_map) : "None");
+	       includer_map ? ORDINARY_MAP_CONTAINING_FILE_NAME (set,
+								 includer_map)
+	       : "None");
     }
   else
     {
@@ -1931,7 +1998,7 @@ linemap_dump_location (line_maps *set,
 {
   const line_map_ordinary *map;
   location_t location;
-  const char *path = "", *from = "";
+  const char *path = "", *path_suffix = "", *from = "";
   int l = -1, c = -1, s = -1, e = -1;
 
   if (IS_ADHOC_LOC (loc))
@@ -1948,7 +2015,9 @@ linemap_dump_location (line_maps *set,
     linemap_assert (location < RESERVED_LOCATION_COUNT);
   else
     {
-      path = LINEMAP_FILE (map);
+      path = ORDINARY_MAP_CONTAINING_FILE_NAME (set, map);
+      if (ORDINARY_MAP_GENERATED_DATA_P (map))
+	path_suffix = "<generated>";
       l = SOURCE_LINE (map, location);
       c = SOURCE_COLUMN (map, location);
       s = LINEMAP_SYSP (map) != 0;
@@ -1959,24 +2028,27 @@ linemap_dump_location (line_maps *set,
 	{
 	  const line_map_ordinary *from_map
 	    = linemap_included_from_linemap (set, map);
-	  from = from_map ? LINEMAP_FILE (from_map) : "<NULL>";
+	  from = from_map ? ORDINARY_MAP_CONTAINING_FILE_NAME (set, from_map)
+	    : "<NULL>";
 	}
     }
 
   /* P: path, L: line, C: column, S: in-system-header, M: map address,
      E: macro expansion?, LOC: original location, R: resolved location   */
-  fprintf (stream, "{P:%s;F:%s;L:%d;C:%d;S:%d;M:%p;E:%d,LOC:%d,R:%d}",
-	   path, from, l, c, s, (void*)map, e, loc, location);
+  fprintf (stream, "{P:%s%s;F:%s;L:%d;C:%d;S:%d;M:%p;E:%d,LOC:%d,R:%d}",
+	   path, path_suffix, from, l, c, s, (void*)map, e, loc, location);
 }
 
-/* Return the highest location emitted for a given file for which
-   there is a line map in SET.  FILE_NAME is the file name to
-   consider.  If the function returns TRUE, *LOC is set to the highest
-   location emitted for that file.  */
+/* Return the highest location emitted for a given file or generated data buffer
+   for which there is a line map in SET.  If the function returns TRUE, *LOC is
+   set to the highest location emitted for that file.  The const char* arg is
+   either a file name or a generated data buffer, as indicated by
+   IS_DATA.  */
 
 bool
 linemap_get_file_highest_location (line_maps *set,
-				   const char *file_name,
+				   const char *fname_or_data,
+				   bool is_data,
 				   location_t *loc)
 {
   /* If the set is empty or no ordinary map has been created then
@@ -1984,13 +2056,23 @@ linemap_get_file_highest_location (line_maps *set,
   if (set == NULL || set->info_ordinary.used == 0)
     return false;
 
-  /* Now look for the last ordinary map created for FILE_NAME.  */
+  /* Now look for the last ordinary map created for this file.  */
   int i;
   for (i = set->info_ordinary.used - 1; i >= 0; --i)
     {
-      const char *fname = set->info_ordinary.maps[i].to_file;
-      if (fname && !filename_cmp (fname, file_name))
-	break;
+      const auto map = set->info_ordinary.maps + i;
+      if (is_data)
+	{
+	  if (ORDINARY_MAP_GENERATED_DATA_P (map)
+	      && ORDINARY_MAP_GENERATED_DATA (map) == fname_or_data)
+	    break;
+	}
+      else if (!ORDINARY_MAP_GENERATED_DATA_P (map))
+	{
+	  const auto this_fname = ORDINARY_MAP_FILE_NAME (map);
+	  if (this_fname && !filename_cmp (this_fname, fname_or_data))
+	    break;
+	}
     }
 
   if (i < 0)


* [PATCH v3 2/4] diagnostics: Handle generated data locations in edit_context
  2023-07-21 23:08 [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
  2023-07-21 23:08 ` [PATCH v3 1/4] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers Lewis Hyatt
@ 2023-07-21 23:08 ` Lewis Hyatt
  2023-07-21 23:08 ` [PATCH v3 3/4] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings Lewis Hyatt
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 36+ messages in thread
From: Lewis Hyatt @ 2023-07-21 23:08 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm, Lewis Hyatt

Class edit_context outputs fixit hints in diff form that can be applied
manually or automatically by the user.  This does not make sense for generated
data locations, such as the contents of a _Pragma string, because the text to
be modified does not appear in the user's input files.  We do not currently
generate fixit hints in such a context, but for future-proofing purposes,
ignore such locations in edit_context now.
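
As a rough illustration of the check described above, here is a minimal,
self-contained sketch.  The struct and function names are hypothetical
stand-ins (they are not part of GCC), and only the fields relevant to the
check are modelled; the real edit_context::apply_fixit performs further
validation that is omitted here.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for libcpp's expanded_location.
struct sketch_expanded_location
{
  const char *file;
  int line;
  int column;
  const char *generated_data;      // Non-null only for LC_GEN locations.
  unsigned int generated_data_len;
};

// Return true if a fixit hint covering [START, NEXT_LOC) refers to text that
// lives in a real file.  A location with column 0 or with generated data
// attached is rejected, mirroring the early-outs added in the hunk below.
inline bool
sketch_fixit_is_applicable (const sketch_expanded_location &start,
                            const sketch_expanded_location &next_loc)
{
  assert (start.column >= 0 && next_loc.column >= 0);
  if (start.column == 0 || next_loc.column == 0)
    return false;
  if (start.generated_data || next_loc.generated_data)
    return false;
  return true;
}
```

Rejecting the hint when either endpoint is generated keeps the rule simple: a
hint that starts or ends inside an in-memory buffer has no on-disk text that a
diff could modify.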

gcc/ChangeLog:

	* edit-context.cc (edit_context::apply_fixit): Ignore locations in
	generated data.
---
 gcc/edit-context.cc | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/gcc/edit-context.cc b/gcc/edit-context.cc
index 6f5bc6b9d8f..ae11b6f2e00 100644
--- a/gcc/edit-context.cc
+++ b/gcc/edit-context.cc
@@ -301,8 +301,12 @@ edit_context::apply_fixit (const fixit_hint *hint)
     return false;
   if (start.column == 0)
     return false;
+  if (start.generated_data)
+    return false;
   if (next_loc.column == 0)
     return false;
+  if (next_loc.generated_data)
+    return false;
 
   edited_file &file = get_or_insert_file (start.file);
   if (!m_valid)


* [PATCH v3 3/4] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings
  2023-07-21 23:08 [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
  2023-07-21 23:08 ` [PATCH v3 1/4] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers Lewis Hyatt
  2023-07-21 23:08 ` [PATCH v3 2/4] diagnostics: Handle generated data locations in edit_context Lewis Hyatt
@ 2023-07-21 23:08 ` Lewis Hyatt
  2023-07-21 23:08 ` [PATCH v3 4/4] diagnostics: Support generated data locations in SARIF output Lewis Hyatt
  2023-07-28 22:22 ` [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens David Malcolm
  4 siblings, 0 replies; 36+ messages in thread
From: Lewis Hyatt @ 2023-07-21 23:08 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm, Lewis Hyatt

Currently, the tokens obtained from a destringized _Pragma string are not
assigned proper locations while they are being lexed.  After the tokens have
been obtained, they are reassigned the location of the _Pragma token.  That is
sufficient to make things like _Pragma("GCC diagnostic ignored...") operate
correctly, but the resulting diagnostics are still inferior, since they do not
point to the problematic tokens.  Further, if libcpp issues a diagnostic while
lexing the tokens (as opposed to the frontend issuing one while processing the
pragma), the patched-up location is not yet in place.  The user then sees an
invalid location, which may be near the _Pragma string or potentially very far
away, depending on the macro expansion history.  For example:

=====
_Pragma("GCC diagnostic ignored \"oops")
=====

produces the diagnostic:

file.cpp:1:24: warning: missing terminating " character
    1 | _Pragma("GCC diagnostic ignored \"oops")
      |                        ^

with the caret in a nonsensical location, while this one:

=====
 #define S "GCC diagnostic ignored \"oops"
_Pragma(S)
=====

produces:

file.cpp:2:24: warning: missing terminating " character
    2 | _Pragma(S)
      |                        ^

with the caret in a nonsensical location and the relevant context completely
absent.

Fix this by assigning proper locations using the new LC_GEN type of linemap.
Now the tokens are given locations inside a generated content buffer, and the
macro expansion stack is modified to be aware that these tokens logically
belong to the "expansion" of the _Pragma directive. For the above examples we
now output:

======
In buffer generated from file.cpp:1:
<generated>:1:24: warning: missing terminating " character
    1 | GCC diagnostic ignored "oops
      |                        ^
file.cpp:1:1: note: in <_Pragma directive>
    1 | _Pragma("GCC diagnostic ignored \"oops")
      | ^~~~~~~
======

and

======
<generated>:1:24: warning: missing terminating " character
    1 | GCC diagnostic ignored "oops
      |                        ^
file.cpp:2:1: note: in <_Pragma directive>
    2 | _Pragma(S)
      | ^~~~~~~
======

so that carets point to something meaningful and all relevant context appears
in the diagnostic.  For the second example, it would be nice if the macro
expansion trace also output "in expansion of macro S"; however, doing that for
the general case of macro expansions makes the logic very complicated, since it
has to be done after the fact, when the macro maps have already been
constructed.  It doesn't seem worth it for this case, given that the _Pragma
string has already been output once on the first line.
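
To make the column numbers in the new output concrete, the following is a
minimal sketch of destringizing an ordinary string literal and locating the
offending quote inside the resulting buffer.  Both helpers are hypothetical
and are not the libcpp implementation (destringize_and_run also handles
encoding prefixes and, with this patch, raw strings); the sketch assumes a
plain "..." literal with only \" and \\ escapes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: strip the enclosing double quotes from LITERAL and
// undo the \" and \\ escapes, yielding the text that would be lexed.
inline std::string
sketch_destringize (const std::string &literal)
{
  assert (literal.size () >= 2
          && literal.front () == '"' && literal.back () == '"');
  std::string out;
  for (std::size_t i = 1; i + 1 < literal.size (); ++i)
    {
      if (literal[i] == '\\' && i + 2 < literal.size ()
          && (literal[i + 1] == '"' || literal[i + 1] == '\\'))
        ++i;  // Skip the backslash, keep the escaped character.
      out += literal[i];
    }
  return out;
}

// Return the 1-based column of the first double quote in BUF, or 0 if BUF
// contains none.
inline unsigned int
sketch_first_quote_column (const std::string &buf)
{
  const std::string::size_type pos = buf.find ('"');
  return pos == std::string::npos ? 0 : static_cast<unsigned int> (pos + 1);
}
```

Applied to the literal from the examples above, the buffer becomes
GCC diagnostic ignored "oops, and the unterminated quote sits at column 24 of
that buffer, which is the column reported against <generated>:1 in the new
diagnostics.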

gcc/ChangeLog:

	* tree-diagnostic.cc (maybe_unwind_expanded_macro_loc): Add awareness
	of _Pragma directive to the macro expansion trace.

libcpp/ChangeLog:

	* directives.cc (get_token_no_padding): Add argument to receive the
	virtual location of the token.
	(get__Pragma_string): Likewise.
	(do_pragma): Set pfile->directive_result->src_loc properly; it should
	not be a virtual location.
	(destringize_and_run): Update to provide proper locations for the
	_Pragma string tokens.  Support raw strings.
	(_cpp_do__Pragma): Adapt to changes to the helper functions.
	* errors.cc (cpp_diagnostic_at): Support
	cpp_reader::diagnostic_rebase_loc.
	(cpp_diagnostic_with_line): Likewise.
	* include/line-map.h (class rich_location): Add new member
	forget_cached_expanded_locations().
	* internal.h (struct _cpp__Pragma_state): Define new struct.
	(_cpp_rebase_diagnostic_location): Declare new function.
	(struct cpp_reader): Add diagnostic_rebase_loc member.
	(_cpp_push__Pragma_token_context): Declare new function.
	(_cpp_do__Pragma): Adjust prototype.
	* macro.cc (pragma_str): New static var.
	(builtin_macro): Adapt to new implementation of _Pragma processing.
	(_cpp_pop_context): Fix the logic for resetting
	pfile->top_most_macro_node, which previously was never triggered,
	although the error seems to have been harmless.
	(_cpp_push__Pragma_token_context): New function.
	(_cpp_rebase_diagnostic_location): New function.

gcc/c-family/ChangeLog:

	* c-ppoutput.cc (token_streamer::stream): Pass the virtual location of
	the _Pragma token to maybe_print_line(), not the spelling location.

libgomp/ChangeLog:

	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Adjust for new
	macro tracking output for _Pragma directives.
	* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Likewise.

gcc/testsuite/ChangeLog:

	* c-c++-common/cpp/diagnostic-pragma-1.c: Adjust for new macro
	tracking output for _Pragma directives.
	* c-c++-common/cpp/pr57580.c: Likewise.
	* c-c++-common/gomp/pragma-3.c: Likewise.
	* c-c++-common/gomp/pragma-5.c: Likewise.
	* g++.dg/pch/operator-1.C: Likewise.
	* gcc.dg/cpp/pr28165.c: Likewise.
	* gcc.dg/cpp/pr35322.c: Likewise.
	* gcc.dg/dfp/pragma-float-const-decimal64-4.c: Likewise.
	* gcc.dg/dfp/pragma-float-const-decimal64-5.c: Likewise.
	* gcc.dg/dfp/pragma-float-const-decimal64-6.c: Likewise.
	* gcc.dg/gomp/macro-4.c: Likewise.
	* gcc.dg/pragma-message.c: Likewise.
	* c-c++-common/pragma-diag-17.c: New test.
	* c-c++-common/pragma-diag-18.c: New test.
	* g++.dg/cpp/pragma-raw-string.C: New test.
	* g++.dg/pch/LC_GEN-maps.C: New test.
	* g++.dg/pch/LC_GEN-maps.Hs: New test.
	* lib/prune.exp: Support pruning new _Pragma include trace.
---
 gcc/c-family/c-ppoutput.cc                    |   2 +-
 .../c-c++-common/cpp/diagnostic-pragma-1.c    |   1 +
 gcc/testsuite/c-c++-common/cpp/pr57580.c      |   2 +-
 gcc/testsuite/c-c++-common/gomp/pragma-3.c    |   3 +-
 gcc/testsuite/c-c++-common/gomp/pragma-5.c    |   3 +-
 gcc/testsuite/c-c++-common/pragma-diag-17.c   |  35 +++
 gcc/testsuite/c-c++-common/pragma-diag-18.c   |  18 ++
 gcc/testsuite/g++.dg/cpp/pragma-raw-string.C  |  16 +
 gcc/testsuite/g++.dg/pch/LC_GEN-maps.C        |  20 ++
 gcc/testsuite/g++.dg/pch/LC_GEN-maps.Hs       |   5 +
 gcc/testsuite/g++.dg/pch/operator-1.C         |   1 +
 gcc/testsuite/gcc.dg/cpp/pr28165.c            |   1 +
 gcc/testsuite/gcc.dg/cpp/pr35322.c            |   1 +
 .../dfp/pragma-float-const-decimal64-4.c      |   1 +
 .../dfp/pragma-float-const-decimal64-5.c      |   2 +-
 .../dfp/pragma-float-const-decimal64-6.c      |   2 +-
 gcc/testsuite/gcc.dg/gomp/macro-4.c           |   2 +-
 gcc/testsuite/gcc.dg/pragma-message.c         |   3 +-
 gcc/testsuite/lib/prune.exp                   |   1 +
 gcc/tree-diagnostic.cc                        |  18 +-
 libcpp/directives.cc                          | 278 ++++++++++++------
 libcpp/errors.cc                              |  16 +-
 libcpp/include/line-map.h                     |   1 +
 libcpp/internal.h                             |  32 +-
 libcpp/macro.cc                               | 126 +++++++-
 .../libgomp.oacc-c-c++-common/reduction-5.c   |   3 +-
 .../libgomp.oacc-c-c++-common/vred2d-128.c    |  40 ++-
 27 files changed, 491 insertions(+), 142 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pragma-diag-17.c
 create mode 100644 gcc/testsuite/c-c++-common/pragma-diag-18.c
 create mode 100644 gcc/testsuite/g++.dg/cpp/pragma-raw-string.C
 create mode 100644 gcc/testsuite/g++.dg/pch/LC_GEN-maps.C
 create mode 100644 gcc/testsuite/g++.dg/pch/LC_GEN-maps.Hs

diff --git a/gcc/c-family/c-ppoutput.cc b/gcc/c-family/c-ppoutput.cc
index 4aa2bef2c0f..364bfe5ad43 100644
--- a/gcc/c-family/c-ppoutput.cc
+++ b/gcc/c-family/c-ppoutput.cc
@@ -280,7 +280,7 @@ token_streamer::stream (cpp_reader *pfile, const cpp_token *token,
 	  const char *space;
 	  const char *name;
 
-	  line_marker_emitted = maybe_print_line (token->src_loc);
+	  line_marker_emitted = maybe_print_line (loc);
 	  fputs ("#pragma ", print.outf);
 	  c_pp_lookup_pragma (token->val.pragma, &space, &name);
 	  if (space)
diff --git a/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-1.c b/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-1.c
index 9867c94a8dd..801c93935b8 100644
--- a/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-1.c
+++ b/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-1.c
@@ -1,4 +1,5 @@
 // { dg-do compile }
+// { dg-additional-options "-ftrack-macro-expansion=0" }
 
 #pragma GCC warning "warn-a" // { dg-warning warn-a }
 #pragma GCC error "err-b" // { dg-error err-b }
diff --git a/gcc/testsuite/c-c++-common/cpp/pr57580.c b/gcc/testsuite/c-c++-common/cpp/pr57580.c
index e77462b20de..b0e54d876d6 100644
--- a/gcc/testsuite/c-c++-common/cpp/pr57580.c
+++ b/gcc/testsuite/c-c++-common/cpp/pr57580.c
@@ -1,6 +1,6 @@
 /* PR preprocessor/57580 */
 /* { dg-do compile } */
-/* { dg-options "-save-temps" } */
+/* { dg-options "-save-temps -ftrack-macro-expansion=0" } */
 
 #define MSG 	\
   _Pragma("message(\"message0\")")	\
diff --git a/gcc/testsuite/c-c++-common/gomp/pragma-3.c b/gcc/testsuite/c-c++-common/gomp/pragma-3.c
index 3e1b2111c3d..e0cffb8aeea 100644
--- a/gcc/testsuite/c-c++-common/gomp/pragma-3.c
+++ b/gcc/testsuite/c-c++-common/gomp/pragma-3.c
@@ -8,7 +8,8 @@ void
 f (void)
 {
   const char *str = outer(inner(1,2)); /* { dg-line str_location } */
-  /* { dg-warning "35:'pragma omp error' encountered: Test" "" { target *-*-* } inner_location }
+  /* { dg-warning "1:'pragma omp error' encountered: Test" "" { target *-*-* } 1 }
+     { dg-note "35: in <_Pragma directive>" "" { target *-*-* } inner_location }
      { dg-note "20:in expansion of macro 'inner'" "" { target *-*-* } outer_location }
      { dg-note "21:in expansion of macro 'outer'" "" { target *-*-* } str_location } */
 }
diff --git a/gcc/testsuite/c-c++-common/gomp/pragma-5.c b/gcc/testsuite/c-c++-common/gomp/pragma-5.c
index 173c25e803a..787a334882d 100644
--- a/gcc/testsuite/c-c++-common/gomp/pragma-5.c
+++ b/gcc/testsuite/c-c++-common/gomp/pragma-5.c
@@ -8,7 +8,8 @@ void
 f (void)
 {
   const char *str = outer(inner(1,2)); /* { dg-line str_location } */
-  /* { dg-warning "35:'pragma omp error' encountered: Test" "" { target *-*-* } inner_location }
+  /* { dg-warning "4:'pragma omp error' encountered: Test" "" { target *-*-* } 1 }
+     { dg-note "35:in <_Pragma directive>" "" { target *-*-*} inner_location }
      { dg-note "20:in expansion of macro 'inner'" "" { target *-*-* } outer_location }
      { dg-note "21:in expansion of macro 'outer'" "" { target *-*-* } str_location } */
 }
diff --git a/gcc/testsuite/c-c++-common/pragma-diag-17.c b/gcc/testsuite/c-c++-common/pragma-diag-17.c
new file mode 100644
index 00000000000..b9539c9598b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pragma-diag-17.c
@@ -0,0 +1,35 @@
+/* Test virtual location aspects of _Pragmas, when an error is reported after
+   lexing the tokens from the _Pragma string.  */
+/* { dg-additional-options "-Wpragmas -Wunknown-pragmas" } */
+
+_Pragma("GCC diagnostic ignored \"oops1\"") /* { dg-note {1:in <_Pragma directive>} } */
+/* { dg-warning {24:'oops1' is not an option} "" { target *-*-* } 1 } */
+
+#define S2 "GCC diagnostic ignored \"oops2\""
+_Pragma(S2) /* { dg-note {1:in <_Pragma directive>} } */
+/* { dg-warning {24:'oops2' is not an option} "" { target *-*-* } 1 } */
+
+#define PP(x) _Pragma(x) /* { dg-note {15:in <_Pragma directive>} } */
+PP("GCC diagnostic ignored \"oops3\"") /* { dg-note {1:in expansion of macro 'PP'} } */
+/* { dg-warning {24:'oops3' is not an option} "" { target *-*-* } 1 } */
+
+#define X4 _Pragma("GCC diagnostic ignored \"oops4\"") /* { dg-note {12:in <_Pragma directive>} } */
+#define Y4 X4 /* { dg-note {12:in expansion of macro 'X4'} } */
+Y4 /* { dg-note {1:in expansion of macro 'Y4'} } */
+/* { dg-warning {24:'oops4' is not an option} "" { target *-*-* } 1 } */
+
+#define P5 _Pragma /* { dg-note {12:in <_Pragma directive>} } */
+#define S5 "GCC diagnostic ignored \"oops5\""
+#define Y5 P5(S5) /* { dg-note {12:in expansion of macro 'P5'} } */
+Y5 /* { dg-note {1:in expansion of macro 'Y5'} } */
+/* { dg-warning {24:'oops5' is not an option} "" { target *-*-* } 1 } */
+
+#define P6 _Pragma /* { dg-note {12:in <_Pragma directive>} } */
+#define X6 P6("GCC diagnostic ignored \"oops6\"") /* { dg-note {12:in expansion of macro 'P6'} } */
+X6 /* { dg-note {1:in expansion of macro 'X6'} } */
+/* { dg-warning {24:'oops6' is not an option} "" { target *-*-* } 1 } */
+
+_Pragma(__DATE__) /* { dg-warning {-:[-Wunknown-pragmas]} } */
+
+_Pragma("once") /* { dg-note {1:in <_Pragma directive>} } */
+/* { dg-warning {#pragma once in main file} "" { target *-*-*} 1 } */
diff --git a/gcc/testsuite/c-c++-common/pragma-diag-18.c b/gcc/testsuite/c-c++-common/pragma-diag-18.c
new file mode 100644
index 00000000000..5de0fbcb8f1
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pragma-diag-18.c
@@ -0,0 +1,18 @@
+/* Test virtual location aspects of _Pragmas, when an error is reported during
+   lexing of the _Pragma string itself or of the tokens within it.  */
+/* { dg-additional-options "-Wpragmas" } */
+
+#define X1 "\""
+_Pragma(X1) /* { dg-note {1:in <_Pragma directive>} } */
+/* { dg-warning {1:missing terminating " character} "" { target *-*-* } 1 } */
+
+#define X2a _Pragma("GCC warning \"hello\"") ( /* { dg-note {13:in <_Pragma directive>} } */
+#define X2b "GCC warning \"goodbye\"" )
+_Pragma X2a X2b /* { dg-note {9:in expansion of macro 'X2a'} } */
+/* { dg-note {1:in <_Pragma directive>} "" { target *-*-* } .-1 } */
+/* { dg-warning {13:hello} "" { target *-*-* } 1 } */
+/* { dg-warning {13:goodbye} "" { target *-*-* } 1 } */
+
+_Pragma() /* { dg-error {9:_Pragma takes a parenthesized string literal} } */
+/* { dg-note {1:in <_Pragma directive>} "" { target *-*-* } .-1 } */
+/* { dg-error {at end of input|'_Pragma' does not name a type} "" { target *-*-* } .-2 } */
diff --git a/gcc/testsuite/g++.dg/cpp/pragma-raw-string.C b/gcc/testsuite/g++.dg/cpp/pragma-raw-string.C
new file mode 100644
index 00000000000..5a495aadeec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp/pragma-raw-string.C
@@ -0,0 +1,16 @@
+/* Test that _Pragma with a raw string works correctly.  */
+/* { dg-do compile { target c++11 } } */
+/* { dg-additional-options "-Wunused-variable -Wpragmas" } */
+
+_Pragma(R"delim(GCC diagnostic push)delim")
+_Pragma(R"(GCC diagnostic ignored "-Wunused-variable")")
+void f1 () { int i; }
+_Pragma(R"(GCC diagnostic pop)")
+void f2 () { int i; } /* { dg-warning {18:-Wunused-variable} } */
+
+/* Make sure lines stay in sync if there is an embedded newline too.  */
+_Pragma(R"xyz(GCC diagnostic ignored R"(two
+line option?)")xyz")
+/* { dg-note {1:in <_Pragma directive>} "" { target *-*-* } .-2 } */
+/* { dg-warning {24:unknown option} "" { target *-*-* } 1 } */
+void f3 () { int i; } /* { dg-warning {18:-Wunused-variable} } */
diff --git a/gcc/testsuite/g++.dg/pch/LC_GEN-maps.C b/gcc/testsuite/g++.dg/pch/LC_GEN-maps.C
new file mode 100644
index 00000000000..c21bce29bd2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pch/LC_GEN-maps.C
@@ -0,0 +1,20 @@
+#include "LC_GEN-maps.H"
+
+/* The LC_GEN map was written to the PCH, but there is not currently a way to
+   observe that fact in normal user code.  Let's try to test it anyway, using
+   -fdump-internal-locations to inspect the line_maps object we received from
+   the PCH.  */
+
+/* { dg-additional-options -fdump-internal-locations } */
+/* { dg-allow-blank-lines-in-output "" } */
+
+/* These regexps themselves will also appear in the output of
+   -fdump-internal-locations, so we need to make sure they contain at least
+   some regexp special characters, even if not strictly necessary, so they
+   match the intended text only, and not themselves.  Also, we make the second
+   one intentionally match the whole output if it matches anything.  We could
+   use dg-excess-errors instead, but that outputs XFAILS which are not really
+   helpful for this test.  */
+
+/* { dg-regexp {reason: . \(LC_GEN\)} } */
+/* { dg-regexp {(.|[\n\r])*data: this string should end up in the "PCH"(.|[\n\r])*} } */
diff --git a/gcc/testsuite/g++.dg/pch/LC_GEN-maps.Hs b/gcc/testsuite/g++.dg/pch/LC_GEN-maps.Hs
new file mode 100644
index 00000000000..76eefa7d1ae
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pch/LC_GEN-maps.Hs
@@ -0,0 +1,5 @@
+/* Evaluating the _Pragma directive here creates an LC_GEN map in the
+   line_maps object that will be stored in the PCH.  The test will make sure
+   that the buffer holding the de-stringified _Pragma string contents makes
+   its way there.  */
+_Pragma("this string should end up in the \"PCH\"")
diff --git a/gcc/testsuite/g++.dg/pch/operator-1.C b/gcc/testsuite/g++.dg/pch/operator-1.C
index 290b5f7ab21..bf1c8b07bdb 100644
--- a/gcc/testsuite/g++.dg/pch/operator-1.C
+++ b/gcc/testsuite/g++.dg/pch/operator-1.C
@@ -1,2 +1,3 @@
+/* { dg-additional-options "-ftrack-macro-expansion=0" } */
 #include "operator-1.H"
 int main(void){ major(0);} /* { dg-warning "Did not Work" } */
diff --git a/gcc/testsuite/gcc.dg/cpp/pr28165.c b/gcc/testsuite/gcc.dg/cpp/pr28165.c
index 71c7c1dba46..3e5e49ffa01 100644
--- a/gcc/testsuite/gcc.dg/cpp/pr28165.c
+++ b/gcc/testsuite/gcc.dg/cpp/pr28165.c
@@ -2,5 +2,6 @@
 /* PR preprocessor/28165 */
 
 /* { dg-do preprocess } */
+/* { dg-additional-options "-ftrack-macro-expansion=0" } */
 #pragma GCC system_header   /* { dg-warning "system_header" "ignored" } */
 _Pragma ("GCC system_header")   /* { dg-warning "system_header" "ignored" } */
diff --git a/gcc/testsuite/gcc.dg/cpp/pr35322.c b/gcc/testsuite/gcc.dg/cpp/pr35322.c
index 1af9605eac6..5bd5f69b73d 100644
--- a/gcc/testsuite/gcc.dg/cpp/pr35322.c
+++ b/gcc/testsuite/gcc.dg/cpp/pr35322.c
@@ -1,4 +1,5 @@
 /* Test case for PR 35322 -- _Pragma ICE.  */
 
 /* { dg-do preprocess } */
+/* { dg-additional-options "-ftrack-macro-expansion=0" } */
 _Pragma("GCC dependency") /* { dg-error "#pragma dependency expects" } */
diff --git a/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-4.c b/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-4.c
index af0398daf79..42fc28a4384 100644
--- a/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-4.c
+++ b/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-4.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-additional-options -ftrack-macro-expansion=0 } */
 
 /* N1312 7.1.1: The FLOAT_CONST_DECIMAL64 pragma.
    C99 6.4.4.2a (New).
diff --git a/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-5.c b/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-5.c
index 75e9525dda0..3aefede7b5d 100644
--- a/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-5.c
+++ b/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-std=c99 -pedantic" } */
+/* { dg-options "-std=c99 -pedantic -ftrack-macro-expansion=0" } */
 
 /* N1312 7.1.1: The FLOAT_CONST_DECIMAL64 pragma.
    C99 6.4.4.2a (New).
diff --git a/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-6.c b/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-6.c
index 03c1715bee6..6d70ce2bb8d 100644
--- a/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-6.c
+++ b/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-std=c99 -pedantic-errors" } */
+/* { dg-options "-std=c99 -pedantic-errors -ftrack-macro-expansion=0" } */
 
 /* N1312 7.1.1: The FLOAT_CONST_DECIMAL64 pragma.
    C99 6.4.4.2a (New).
diff --git a/gcc/testsuite/gcc.dg/gomp/macro-4.c b/gcc/testsuite/gcc.dg/gomp/macro-4.c
index a4ed9a3980a..c6817d40125 100644
--- a/gcc/testsuite/gcc.dg/gomp/macro-4.c
+++ b/gcc/testsuite/gcc.dg/gomp/macro-4.c
@@ -1,6 +1,6 @@
 /* PR preprocessor/27746 */
 /* { dg-do compile } */
-/* { dg-options "-fopenmp -Wunknown-pragmas" } */
+/* { dg-options "-fopenmp -Wunknown-pragmas -ftrack-macro-expansion=0" } */
 
 #define p		_Pragma ("omp parallel")
 #define omp_p		_Pragma ("omp p")
diff --git a/gcc/testsuite/gcc.dg/pragma-message.c b/gcc/testsuite/gcc.dg/pragma-message.c
index 1b7cf09de0a..72fb0da6f44 100644
--- a/gcc/testsuite/gcc.dg/pragma-message.c
+++ b/gcc/testsuite/gcc.dg/pragma-message.c
@@ -45,8 +45,9 @@
 #define DO_PRAGMA(x) _Pragma (#x) /* { dg-line pragma_loc1 } */
 #define TODO(x) DO_PRAGMA(message ("TODO - " #x)) /* { dg-line pragma_loc2 } */
 TODO(Okay 4) /* { dg-message "in expansion of macro 'TODO'" } */
-/* { dg-message "TODO - Okay 4" "test4.1" { target *-*-* } pragma_loc1 } */
+/* { dg-message "1:TODO - Okay 4" "test4.1" { target *-*-* } 1 } */
 /* { dg-message "in expansion of macro 'DO_PRAGMA'" "test4.2" { target *-*-* } pragma_loc2 } */
+/* { dg-note {in <_Pragma directive>} "test4.3" { target *-*-* } pragma_loc1 } */
 
 #if 0
 #pragma message ("Not printed")
diff --git a/gcc/testsuite/lib/prune.exp b/gcc/testsuite/lib/prune.exp
index 8d37b24e59b..02ebf8b30d9 100644
--- a/gcc/testsuite/lib/prune.exp
+++ b/gcc/testsuite/lib/prune.exp
@@ -54,6 +54,7 @@ proc prune_gcc_output { text } {
 
     # Diagnostic inclusion stack
     regsub -all "(^|\n)(In file)?\[ \]+included from \[^\n\]*" $text "" text
+    regsub -all "(^|\n)In buffer generated from \[^\n\]*" $text "" text
     regsub -all "(^|\n)\[ \]+from \[^\n\]*" $text "" text
     regsub -all "(^|\n)(In|of) module( \[^\n \]*,)? imported at \[^\n\]*" $text "" text
 
diff --git a/gcc/tree-diagnostic.cc b/gcc/tree-diagnostic.cc
index 731e3559cd8..fd2773f3d8a 100644
--- a/gcc/tree-diagnostic.cc
+++ b/gcc/tree-diagnostic.cc
@@ -203,9 +203,12 @@ maybe_unwind_expanded_macro_loc (diagnostic_context *context,
 	const int resolved_def_loc_line = SOURCE_LINE (m, l0);
         if (ix == 0 && saved_location_line != resolved_def_loc_line)
           {
-            diagnostic_append_note (context, resolved_def_loc, 
-                                    "in definition of macro %qs",
-                                    linemap_map_get_macro_name (iter->map));
+	    const char *name = linemap_map_get_macro_name (iter->map);
+	    if (*name == '<')
+	      diagnostic_append_note (context, resolved_def_loc, "in %s", name);
+	    else
+	      diagnostic_append_note (context, resolved_def_loc,
+				      "in definition of macro %qs", name);
             /* At this step, as we've printed the context of the macro
                definition, we don't want to print the context of its
                expansion, otherwise, it'd be redundant.  */
@@ -220,9 +223,12 @@ maybe_unwind_expanded_macro_loc (diagnostic_context *context,
                                     MACRO_MAP_EXPANSION_POINT_LOCATION (iter->map),
                                     LRK_MACRO_DEFINITION_LOCATION, NULL);
 
-        diagnostic_append_note (context, resolved_exp_loc, 
-                                "in expansion of macro %qs",
-                                linemap_map_get_macro_name (iter->map));
+	const char *name = linemap_map_get_macro_name (iter->map);
+	if (*name == '<')
+	  diagnostic_append_note (context, resolved_exp_loc, "in %s", name);
+	else
+	  diagnostic_append_note (context, resolved_exp_loc,
+				  "in expansion of macro %qs", name);
       }
 }
 
diff --git a/libcpp/directives.cc b/libcpp/directives.cc
index 8d7c93bce53..c9e833887fb 100644
--- a/libcpp/directives.cc
+++ b/libcpp/directives.cc
@@ -127,10 +127,10 @@ static void do_pragma_warning_or_error (cpp_reader *, bool error);
 static void do_pragma_warning (cpp_reader *);
 static void do_pragma_error (cpp_reader *);
 static void do_linemarker (cpp_reader *);
-static const cpp_token *get_token_no_padding (cpp_reader *);
-static const cpp_token *get__Pragma_string (cpp_reader *);
-static void destringize_and_run (cpp_reader *, const cpp_string *,
-				 location_t);
+static const cpp_token *get_token_no_padding (cpp_reader *,
+					      location_t * = nullptr);
+static const cpp_token *get__Pragma_string (cpp_reader *,
+					    location_t * = nullptr);
 static bool parse_answer (cpp_reader *, int, location_t, cpp_macro **);
 static cpp_hashnode *parse_assertion (cpp_reader *, int, cpp_macro **);
 static cpp_macro **find_answer (cpp_hashnode *, const cpp_macro *);
@@ -1505,14 +1505,12 @@ do_pragma (cpp_reader *pfile)
 {
   const struct pragma_entry *p = NULL;
   const cpp_token *token, *pragma_token;
-  location_t pragma_token_virt_loc = 0;
   cpp_token ns_token;
   unsigned int count = 1;
 
   pfile->state.prevent_expansion++;
 
-  pragma_token = token = cpp_get_token_with_location (pfile,
-						      &pragma_token_virt_loc);
+  pragma_token = token = cpp_get_token (pfile);
   ns_token = *token;
   if (token->type == CPP_NAME)
     {
@@ -1538,7 +1536,7 @@ do_pragma (cpp_reader *pfile)
     {
       if (p->is_deferred)
 	{
-	  pfile->directive_result.src_loc = pragma_token_virt_loc;
+	  pfile->directive_result.src_loc = pragma_token->src_loc;
 	  pfile->directive_result.type = CPP_PRAGMA;
 	  pfile->directive_result.flags = pragma_token->flags;
 	  pfile->directive_result.val.pragma = p->u.ident;
@@ -1831,11 +1829,11 @@ do_pragma_error (cpp_reader *pfile)
 
 /* Get a token but skip padding.  */
 static const cpp_token *
-get_token_no_padding (cpp_reader *pfile)
+get_token_no_padding (cpp_reader *pfile, location_t *virt_loc)
 {
   for (;;)
     {
-      const cpp_token *result = cpp_get_token (pfile);
+      const cpp_token *result = cpp_get_token_with_location (pfile, virt_loc);
       if (result->type != CPP_PADDING)
 	return result;
     }
@@ -1844,7 +1842,7 @@ get_token_no_padding (cpp_reader *pfile)
 /* Check syntax is "(string-literal)".  Returns the string on success,
    or NULL on failure.  */
 static const cpp_token *
-get__Pragma_string (cpp_reader *pfile)
+get__Pragma_string (cpp_reader *pfile, location_t *string_virt_loc)
 {
   const cpp_token *string;
   const cpp_token *paren;
@@ -1855,7 +1853,7 @@ get__Pragma_string (cpp_reader *pfile)
   if (paren->type != CPP_OPEN_PAREN)
     return NULL;
 
-  string = get_token_no_padding (pfile);
+  string = get_token_no_padding (pfile, string_virt_loc);
   if (string->type == CPP_EOF)
     _cpp_backup_tokens (pfile, 1);
   if (string->type != CPP_STRING && string->type != CPP_WSTRING
@@ -1875,55 +1873,105 @@ get__Pragma_string (cpp_reader *pfile)
 /* Destringize IN into a temporary buffer, by removing the first \ of
    \" and \\ sequences, and process the result as a #pragma directive.  */
 static void
-destringize_and_run (cpp_reader *pfile, const cpp_string *in,
-		     location_t expansion_loc)
-{
-  const unsigned char *src, *limit;
-  char *dest, *result;
-  cpp_context *saved_context;
-  cpp_token *saved_cur_token;
-  tokenrun *saved_cur_run;
-  cpp_token *toks;
-  int count;
-  const struct directive *save_directive;
-
-  dest = result = (char *) alloca (in->len - 1);
-  src = in->text + 1 + (in->text[0] == 'L');
-  limit = in->text + in->len - 1;
-  while (src < limit)
+destringize_and_run (cpp_reader *pfile, _cpp__Pragma_state *pstate)
+{
+  uchar *dest, *result;
+
+  /* Determine where the data starts, and what kind of string it is.  */
+  const cpp_string *const in = &pstate->string_tok->val.str;
+  const uchar *src = in->text;
+  bool is_raw_string = false;
+  for (;;)
     {
-      /* We know there is a character following the backslash.  */
-      if (*src == '\\' && (src[1] == '\\' || src[1] == '"'))
-	src++;
-      *dest++ = *src++;
+      switch (*src++)
+	{
+	case '\"': break;
+	case 'R': is_raw_string = true; continue;
+	case '\0': gcc_assert (false);
+	default: continue;
+	}
+      break;
     }
-  *dest = '\n';
 
-  /* Ugh; an awful kludge.  We are really not set up to be lexing
-     tokens when in the middle of a macro expansion.  Use a new
-     context to force cpp_get_token to lex, and so skip_rest_of_line
-     doesn't go beyond the end of the text.  Also, remember the
-     current lexing position so we can return to it later.
+  /* If we were given a raw string literal, we don't need to destringize it,
+     but we do need to strip off the prefix and the suffix.  */
+  if (is_raw_string)
+    {
+      cpp_string buf;
+      const bool ok
+	= cpp_interpret_string_notranslate (pfile, in, 1, &buf, CPP_STRING);
+      gcc_assert (ok);
 
-     Something like line-at-a-time lexing should remove the need for
-     this.  */
-  saved_context = pfile->context;
-  saved_cur_token = pfile->cur_token;
-  saved_cur_run = pfile->cur_run;
+      /* BUF.TEXT ends with a terminating null (which is counted in BUF.LEN).
+	 We want to end with a newline as required by cpp_push_buffer.  While it
+	 is not strictly necessary to null terminate our buffer, it is useful to
+	 do so for safety, so we reserve one extra byte.  The \n\0 sequence is
+	 appended after the else block.  */
+      result = _cpp_unaligned_alloc (pfile, buf.len + 1);
+      memcpy (result, buf.text, buf.len - 1);
+      dest = result + (buf.len - 1);
+      XDELETEVEC (buf.text);
+    }
+  else
+    {
+      const auto last_ptr = in->text + in->len - 1;
+      /* +2 for the trailing \n\0 as above.  */
+      dest = result = _cpp_unaligned_alloc (pfile, last_ptr - src + 1 + 2);
+      while (src < last_ptr)
+	{
+	  /* We know there is a character following the backslash.  */
+	  if (*src == '\\' && (src[1] == '\\' || src[1] == '"'))
+	    src++;
+	  *dest++ = *src++;
+	}
+    }
+  *dest++ = '\n';
+  *dest++ = '\0';
 
-  pfile->context = XCNEW (cpp_context);
+  /* We will now ask PFILE to interrupt what it was doing (obtaining tokens
+     either from the main context via lexing, or from a macro context), and get
+     tokens from the string argument instead.  We create a new isolated
+     cpp_context so that cpp_get_token will think it is working on the main
+     buffer and call cpp_lex_token accordingly.  Save all the relevant state so
+     we can return to the previous task once that is completed.
 
-  /* Inline run_directive, since we need to delay the _cpp_pop_buffer
-     until we've read all of the tokens that we want.  */
-  cpp_push_buffer (pfile, (const uchar *) result, dest - result,
-		   /* from_stage3 */ true);
-  /* ??? Antique Disgusting Hack.  What does this do?  */
-  if (pfile->buffer->prev)
-    pfile->buffer->file = pfile->buffer->prev->file;
+     Doing things this way is a bit of a kludge, but the alternative would be
+     to create a new context type to support lexing from a string, and that
+     would add overhead to every token parse, while _Pragma is relatively rarely
+     needed.  */
 
+  const auto saved_context = pfile->context;
+  const auto saved_cur_token = pfile->cur_token;
+  const auto saved_cur_run = pfile->cur_run;
+  pfile->context = XCNEW (cpp_context);
   start_directive (pfile);
+
+  /* Set up an LC_GEN line map to get valid locations for the tokens we are
+     about to lex.  We need to do this after calling start_directive, because
+     historically pfile->directive_line is what's been passed to
+     pfile->cb.def_pragma, and we are not proposing to change that now.  To
+     decide if we are in a system header or not, look at the location of the
+     _Pragma token.  So for instance if we have _Pragma(S) in the main file,
+     where S is a macro defined in a system header, we will decide we are not in
+     a system location.  */
+  const unsigned int buf_len = dest - result;
+  const int sysp = linemap_location_in_system_header_p (pfile->line_table,
+							pstate->pragma_loc);
+  linemap_add (pfile->line_table, LC_GEN, sysp, (const char *)result, 1,
+	       buf_len);
+  const auto col_hint = (uchar *) memchr (result, '\n', buf_len) - result;
+  linemap_line_start (pfile->line_table, 1, col_hint);
+
+  /* Push the buffer.  */
+  cpp_push_buffer (pfile, result, buf_len - 2, true);
+
+  /* This is needed to make _Pragma("once") work correctly, as it needs
+     pfile->buffer->file to be set to the current source file.  */
+  pfile->buffer->file = pfile->buffer->prev->file;
+
+  /* We are ready to start handling the directive as normal.  */
   _cpp_clean_line (pfile);
-  save_directive = pfile->directive;
+  const auto save_directive = pfile->directive;
   pfile->directive = &dtable[T_PRAGMA];
   do_pragma (pfile);
   if (pfile->directive_result.type == CPP_PRAGMA)
@@ -1932,85 +1980,127 @@ destringize_and_run (cpp_reader *pfile, const cpp_string *in,
   pfile->directive = save_directive;
 
   /* We always insert at least one token, the directive result.  It'll
-     either be a CPP_PADDING or a CPP_PRAGMA.  In the later case, we 
+     either be a CPP_PADDING or a CPP_PRAGMA.  In the latter case, we
      need to insert *all* of the tokens, including the CPP_PRAGMA_EOL.  */
 
   /* If we're not handling the pragma internally, read all of the tokens from
-     the string buffer now, while the string buffer is still installed.  */
-  /* ??? Note that the token buffer allocated here is leaked.  It's not clear
-     to me what the true lifespan of the tokens are.  It would appear that
-     the lifespan is the entire parse of the main input stream, in which case
-     this may not be wrong.  */
-  if (pfile->directive_result.type == CPP_PRAGMA)
-    {
-      int maxcount;
-
-      count = 1;
-      maxcount = 50;
-      toks = XNEWVEC (cpp_token, maxcount);
-      toks[0] = pfile->directive_result;
-      toks[0].src_loc = expansion_loc;
-
-      do
+     the string buffer now, while the string buffer is still installed, and then
+     push them as a new token context after.  This way, we can clean up the
+     temporarily modified state of the lexer now.  */
+
+  const bool is_deferred = (pfile->directive_result.type == CPP_PRAGMA);
+  if (is_deferred)
+    {
+      /* Using _cpp_buff allows us to arrange for this buffer to be freed when
+	 the new token context is popped, without adding any additional space
+	 overhead to the cpp_context structure.  In order to support
+	 track_macro_expansion==0, we need to store the cpp_token objects
+	 contiguously, and the virt locs separately.  (Note that these tokens
+	 may acquire a virtual loc here, in case the pragma allows macro
+	 expansion.  But they will not yet have virtual locs representing them
+	 as part of the expansion of the _Pragma directive; this will be handled
+	 later in _cpp_push__Pragma_token_context.)  */
+      const size_t init_count = 50;
+      _cpp_buff *tok_buff
+	= _cpp_get_buff (pfile, init_count * sizeof (cpp_token));
+      _cpp_buff *loc_buff
+	= _cpp_get_buff (pfile, init_count * sizeof (location_t));
+
+      /* Remember the base buffs so we can chain the final loc buff after it
+	 once we are done collecting tokens.  */
+      const auto tok_buff0 = tok_buff;
+      pstate->buff_chain = &loc_buff->next;
+
+      /* DIRECTIVE_RESULT is the first token we return (a CPP_PRAGMA).  This
+	 location cannot result from macro expansion, so there is no virtual
+	 location to worry about.  */
+      auto tok_out = (cpp_token *) tok_buff->base;
+      *tok_out++ = pfile->directive_result;
+      auto loc_out = (location_t *) loc_buff->base;
+      *loc_out++ = pfile->directive_result.src_loc;
+      unsigned int ntoks = 1;
+
+      /* Finally get all the tokens.  */
+      for (;;)
 	{
-	  if (count == maxcount)
+	  if (tok_buff->limit - (uchar *)tok_out < (int)sizeof (cpp_token))
 	    {
-	      maxcount = maxcount * 3 / 2;
-	      toks = XRESIZEVEC (cpp_token, toks, maxcount);
+	      _cpp_extend_buff (pfile, &tok_buff,
+				tok_buff->limit - tok_buff->base);
+	      tok_out = ((cpp_token *)tok_buff->base) + ntoks;
 	    }
-	  toks[count] = *cpp_get_token (pfile);
-	  /* _Pragma is a builtin, so we're not within a macro-map, and so
-	     the token locations are set to bogus ordinary locations
-	     near to, but after that of the "_Pragma".
-	     Paper over this by setting them equal to the location of the
-	     _Pragma itself (PR preprocessor/69126).  */
-	  toks[count].src_loc = expansion_loc;
+
+	  if (loc_buff->limit - (uchar *)loc_out < (int)sizeof (location_t))
+	    {
+	      _cpp_extend_buff (pfile, &loc_buff,
+				loc_buff->limit - loc_buff->base);
+	      loc_out = ((location_t *)loc_buff->base) + ntoks;
+	    }
+
+	  const auto this_tok = tok_out;
+	  *tok_out++ = *cpp_get_token_with_location (pfile, loc_out++);
+	  ++ntoks;
+
 	  /* Macros have been already expanded by cpp_get_token
 	     if the pragma allowed expansion.  */
-	  toks[count++].flags |= NO_EXPAND;
+	  this_tok->flags |= NO_EXPAND;
+	  if (this_tok->type == CPP_PRAGMA_EOL)
+	    break;
 	}
-      while (toks[count-1].type != CPP_PRAGMA_EOL);
+
+      /* Finalize the buffers so they can be stored as one chain in a
+	 cpp_context and freed when that context is popped.  */
+      tok_buff0->next = loc_buff;
+      pstate->ntoks = ntoks;
+      pstate->tok_buff = tok_buff;
+      pstate->loc_buff = loc_buff;
     }
   else
     {
-      count = 1;
-      toks = &pfile->avoid_paste;
-
       /* If we handled the entire pragma internally, make sure we get the
 	 line number correct for the next token.  */
       if (pfile->cb.line_change)
 	pfile->cb.line_change (pfile, pfile->cur_token, false);
     }
 
-  /* Finish inlining run_directive.  */
+  /* Reset the old state before...  */
+  const auto map = linemap_add (pfile->line_table, LC_LEAVE, 0, nullptr, 0);
+  linemap_line_start
+    (pfile->line_table,
+     ORDINARY_MAP_STARTING_LINE_NUMBER (linemap_check_ordinary (map)),
+     127);
   pfile->buffer->file = NULL;
   _cpp_pop_buffer (pfile);
-
-  /* Reset the old macro state before ...  */
   XDELETE (pfile->context);
   pfile->context = saved_context;
   pfile->cur_token = saved_cur_token;
   pfile->cur_run = saved_cur_run;
 
-  /* ... inserting the new tokens we collected.  */
-  _cpp_push_token_context (pfile, NULL, toks, count);
+  /* ...inserting the new tokens we collected.  This is not a simple call to
+     _cpp_push_token_context, because we need to create virtual locations
+     for the tokens and push an extended token context to return them.  */
+  if (is_deferred)
+    _cpp_push__Pragma_token_context (pfile, pstate);
+  else
+    _cpp_push_token_context (pfile, nullptr, &pfile->avoid_paste, 1);
 }
 
+
 /* Handle the _Pragma operator.  Return 0 on error, 1 if ok.  */
+
 int
-_cpp_do__Pragma (cpp_reader *pfile, location_t expansion_loc)
+_cpp_do__Pragma (cpp_reader *pfile, _cpp__Pragma_state *pstate)
 {
   /* Make sure we don't invalidate the string token, if the closing parenthesis
    ended up on a different line.  */
   ++pfile->keep_tokens;
-  const cpp_token *string = get__Pragma_string (pfile);
+  pstate->string_tok = get__Pragma_string (pfile, &pstate->string_loc);
   --pfile->keep_tokens;
 
   pfile->directive_result.type = CPP_PADDING;
-
-  if (string)
+  if (pstate->string_tok)
     {
-      destringize_and_run (pfile, &string->val.str, expansion_loc);
+      destringize_and_run (pfile, pstate);
       return 1;
     }
   cpp_error (pfile, CPP_DL_ERROR,
diff --git a/libcpp/errors.cc b/libcpp/errors.cc
index 3269d076af2..54c1c282540 100644
--- a/libcpp/errors.cc
+++ b/libcpp/errors.cc
@@ -60,13 +60,11 @@ cpp_diagnostic_at (cpp_reader * pfile, enum cpp_diagnostic_level level,
 		   enum cpp_warning_reason reason, rich_location *richloc,
 		   const char *msgid, va_list *ap)
 {
-  bool ret;
-
   if (!pfile->cb.diagnostic)
     abort ();
-  ret = pfile->cb.diagnostic (pfile, level, reason, richloc, _(msgid), ap);
-
-  return ret;
+  if (pfile->diagnostic_rebase_loc)
+    _cpp_rebase_diagnostic_location (pfile, richloc);
+  return pfile->cb.diagnostic (pfile, level, reason, richloc, _(msgid), ap);
 }
 
 /* Print a diagnostic at the location of the previously lexed token.  */
@@ -197,16 +195,14 @@ cpp_diagnostic_with_line (cpp_reader * pfile, enum cpp_diagnostic_level level,
 			  location_t src_loc, unsigned int column,
 			  const char *msgid, va_list *ap)
 {
-  bool ret;
-  
   if (!pfile->cb.diagnostic)
     abort ();
   rich_location richloc (pfile->line_table, src_loc);
   if (column)
     richloc.override_column (column);
-  ret = pfile->cb.diagnostic (pfile, level, reason, &richloc, _(msgid), ap);
-
-  return ret;
+  if (pfile->diagnostic_rebase_loc)
+    _cpp_rebase_diagnostic_location (pfile, &richloc);
+  return pfile->cb.diagnostic (pfile, level, reason, &richloc, _(msgid), ap);
 }
 
 /* Print a warning or error, depending on the value of LEVEL.  */
diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
index 426cddb6964..430c567f776 100644
--- a/libcpp/include/line-map.h
+++ b/libcpp/include/line-map.h
@@ -1758,6 +1758,7 @@ class rich_location
   location_range *get_range (unsigned int idx);
 
   expanded_location get_expanded_location (unsigned int idx);
+  void forget_cached_expanded_location () { m_have_expanded_location = false; }
 
   void
   override_column (int column);
diff --git a/libcpp/internal.h b/libcpp/internal.h
index 8b74d10c1a3..b6118d7128b 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -292,6 +292,28 @@ struct lexer_state
   unsigned char ignore__Pragma;
 };
 
+/* Because handling of _Pragma bounces back and forth between macro.cc and
+   directives.cc, it is useful to keep the needed state in one place.  */
+struct _cpp__Pragma_state
+{
+  const cpp_token *string_tok; /* The token for the argument string.  */
+
+  /* These locations are the virtual locations returned by
+     cpp_get_token_with_location, if the relevant tokens came from macro
+     expansions.  */
+  location_t pragma_loc; /* Location of the _Pragma token.  */
+  location_t string_loc; /* Location of the string arg.  */
+
+  /* The tokens lexed from the _Pragma string.  */
+  unsigned int ntoks;
+  _cpp_buff *tok_buff;
+  _cpp_buff *loc_buff;
+  _cpp_buff **buff_chain;
+};
+
+/* In macro.cc, implements pfile->diagnostic_rebase_loc handling.  */
+void _cpp_rebase_diagnostic_location (cpp_reader *, rich_location *);
+
 /* Special nodes - identifiers with predefined significance.  */
 struct spec_nodes
 {
@@ -601,6 +623,12 @@ struct cpp_reader
      zero of said file.  */
   location_t main_loc;
 
+  /* Location from which we would like to pretend a given token was
+     macro-expanded, if a diagnostic is issued.  Useful for improving
+     _Pragma diagnostics.  */
+  location_t diagnostic_rebase_loc;
+  cpp_hashnode *diagnostic_rebase_node;
+
   /* Returns true iff we should warn about UTF-8 bidirectional control
      characters.  */
   bool warn_bidi_p () const
@@ -701,6 +729,8 @@ extern const unsigned char *_cpp_builtin_macro_text (cpp_reader *,
 extern int _cpp_warn_if_unused_macro (cpp_reader *, cpp_hashnode *, void *);
 extern void _cpp_push_token_context (cpp_reader *, cpp_hashnode *,
 				     const cpp_token *, unsigned int);
+extern void _cpp_push__Pragma_token_context (cpp_reader *,
+					     _cpp__Pragma_state *);
 extern void _cpp_backup_tokens_direct (cpp_reader *, unsigned int);
 
 /* In identifiers.cc */
@@ -772,7 +802,7 @@ extern int _cpp_handle_directive (cpp_reader *, bool);
 extern void _cpp_define_builtin (cpp_reader *, const char *);
 extern char ** _cpp_save_pragma_names (cpp_reader *);
 extern void _cpp_restore_pragma_names (cpp_reader *, char **);
-extern int _cpp_do__Pragma (cpp_reader *, location_t);
+extern int _cpp_do__Pragma (cpp_reader *, _cpp__Pragma_state *);
 extern void _cpp_init_directives (cpp_reader *);
 extern void _cpp_init_internal_pragmas (cpp_reader *);
 extern void _cpp_do_file_change (cpp_reader *, enum lc_reason, const char *,
diff --git a/libcpp/macro.cc b/libcpp/macro.cc
index dada8fea835..26019ef7934 100644
--- a/libcpp/macro.cc
+++ b/libcpp/macro.cc
@@ -93,6 +93,8 @@ struct macro_arg_saved_data {
 static const char *vaopt_paste_error =
   N_("'##' cannot appear at either end of __VA_OPT__");
 
+static const uchar pragma_str[] = N_("<_Pragma directive>");
+
 static void expand_arg (cpp_reader *, macro_arg *);
 
 /* A class for tracking __VA_OPT__ state while iterating over a
@@ -756,7 +758,31 @@ builtin_macro (cpp_reader *pfile, cpp_hashnode *node,
       if (pfile->state.in_directive || pfile->state.ignore__Pragma)
 	return 0;
 
-      return _cpp_do__Pragma (pfile, loc);
+      _cpp__Pragma_state pstate = {};
+      pstate.pragma_loc = loc;
+
+      /* The diagnostic_rebase stuff arranges that any diagnostics issued during
+	 lexing will point the user back to the _Pragma location.  */
+      const auto prev_rloc = pfile->diagnostic_rebase_loc;
+      const auto prev_rnode = pfile->diagnostic_rebase_node;
+      pfile->diagnostic_rebase_loc = loc;
+      pfile->diagnostic_rebase_node
+	= cpp_lookup (pfile, pragma_str, (sizeof pragma_str) - 1);
+
+      /* While lexing tokens, if we end up expanding some macros, we would
+	 like not to override top_most_macro_node; preserving it pointing
+	 to the _Pragma helps out the case of -ftrack-macro-expansion=0.
+	 Setting this flag causes in_macro_expansion_p to return TRUE,
+	 even though we are not technically in a macro context.  */
+      const bool prev_expand = pfile->about_to_expand_macro_p;
+      pfile->about_to_expand_macro_p = true;
+
+      /* Get the tokens, then reset everything back how it was.  */
+      const int res = _cpp_do__Pragma (pfile, &pstate);
+      pfile->about_to_expand_macro_p = prev_expand;
+      pfile->diagnostic_rebase_loc = prev_rloc;
+      pfile->diagnostic_rebase_node = prev_rnode;
+      return res;
     }
 
   buf = _cpp_builtin_macro_text (pfile, node, expand_loc);
@@ -2802,7 +2828,8 @@ _cpp_pop_context (cpp_reader *pfile)
 	  && macro_of_context (context->prev) != macro)
 	macro->flags &= ~NODE_DISABLED;
 
-      if (macro == pfile->top_most_macro_node && context->prev == NULL)
+      if (!pfile->about_to_expand_macro_p
+	  && context->prev == &pfile->base_context)
 	/* We are popping the context of the top-most macro node.  */
 	pfile->top_most_macro_node = NULL;
     }
@@ -2836,10 +2863,10 @@ reached_end_of_context (cpp_context *context)
 
 /* Consume the next token contained in the current context of PFILE,
    and return it in *TOKEN. It's "full location" is returned in
-   *LOCATION. If -ftrack-macro-location is in effeect, fFull location"
-   means the location encoding the locus of the token across macro
-   expansion; otherwise it's just is the "normal" location of the
-   token which (*TOKEN)->src_loc.  */
+   *LOCATION.  If -ftrack-macro-expansion is in effect, "full location"
+   means the virtual location encoding the locus of the token across macro
+   expansion; otherwise it's just the "normal" (spelling) location of the
+   token, which is (*TOKEN)->src_loc.  */
 static inline void
 consume_next_token_from_context (cpp_reader *pfile,
 				 const cpp_token ** token,
@@ -4137,3 +4164,90 @@ cpp_macro_definition (cpp_reader *pfile, cpp_hashnode *node,
   *buffer = '\0';
   return pfile->macro_buffer;
 }
+
+/* Handle the list of tokens lexed from a _Pragma string.  We need to create
+   virtual locations (reflecting the fact that these tokens are logically
+   within the expansion of the _Pragma string), and push an extended token
+   context.  */
+
+void
+_cpp_push__Pragma_token_context (cpp_reader *pfile,
+				 _cpp__Pragma_state *pstate)
+{
+  const auto node = cpp_lookup (pfile, pragma_str, (sizeof pragma_str) - 1);
+  const auto toks = (const cpp_token *) pstate->tok_buff->base;
+
+  /* If not tracking macro expansions, then just push a normal token context.
+     cpp_get_token () will return the user the location of the _Pragma
+     directive, so they will have a valid location for the _Pragma which is
+     outside the LC_GEN map.  */
+  if (!CPP_OPTION (pfile, track_macro_expansion))
+    {
+      _cpp_push_token_context (pfile, node, toks, pstate->ntoks);
+      /* Arrange to free the buffers when the context is popped.  */
+      pfile->context->buff = pstate->tok_buff;
+      return;
+    }
+
+  location_t *virt_locs = nullptr;
+  _cpp_buff *const macro_tokens = tokens_buff_new (pfile, pstate->ntoks,
+						   &virt_locs);
+  const auto map = linemap_enter_macro (pfile->line_table, node,
+					pstate->pragma_loc, pstate->ntoks);
+  const auto locs = (location_t *)pstate->loc_buff->base;
+  for (unsigned int i = 0; i != pstate->ntoks; ++i)
+    {
+      tokens_buff_add_token (macro_tokens, virt_locs, toks + i,
+			     locs[i], locs[i], map, i);
+    }
+
+  /* Chain tok_buff ahead of macro_tokens so both are freed together
+     when the context is popped.  pstate->buff_chain is the NEXT pointer
+     of the last buffer in the LOC_BUFF chain, so it looks like:
+     TOK_BUFF_1 -> ... -> TOK_BUFF_N -> ... -> LOC_BUFF_1 -> ... ->
+     LOC_BUFF_N -> MACRO_TOKENS_1 -> ... -> MACRO_TOKENS_N.  */
+  *pstate->buff_chain = macro_tokens;
+  push_extended_tokens_context (pfile, node, pstate->tok_buff, virt_locs,
+				(const cpp_token **) macro_tokens->base,
+				pstate->ntoks);
+}
+
+void
+_cpp_rebase_diagnostic_location (cpp_reader *pfile, rich_location *richloc)
+{
+  /* If we are here, it means a diagnostic is being generated while lexing
+     tokens outside a macro context, but pfile->diagnostic_rebase_loc indicates
+     a location from which we would like to pretend we are actually expanding a
+     macro.  This works around the fact that a macro map can only be generated
+     once we know how many tokens it will contain, but the number of tokens to
+     be lexed from, say, a _Pragma string, is not known ahead of time.  In the
+     case of _Pragma, _cpp_push__Pragma_token_context above handles creating the
+     proper macro map once all the tokens are available.  This function runs
+     earlier than that, while in the middle of lexing tokens, so it creates a
+     temporary macro map which serves only to improve the information content of
+     the diagnostic that's about to be generated.  */
+
+  const int nlocs = richloc->get_num_locations ();
+
+  if (CPP_OPTION (pfile, track_macro_expansion))
+    {
+      const auto map
+	= linemap_enter_macro (pfile->line_table, pfile->diagnostic_rebase_node,
+			       pfile->diagnostic_rebase_loc, nlocs);
+      for (int i = 0; i != nlocs; ++i)
+	{
+	  location_range& r = *richloc->get_range (i);
+	  r.m_loc = linemap_add_macro_token (map, i, r.m_loc, r.m_loc);
+	}
+    }
+  else
+    {
+      /* When not tracking macro expansion, set the location to the
+	 expansion point for all tokens, which is what would be returned
+	 by cpp_get_token in the normal case.  */
+      for (int i = 0; i != nlocs; ++i)
+	richloc->get_range (i)->m_loc = pfile->invocation_location;
+    }
+
+  richloc->forget_cached_expanded_location ();
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
index ddccfe89e73..f518915492d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
@@ -46,7 +46,8 @@ main (void)
   /* Nvptx targets require a vector_length or 32 in to allow spinlocks with
      gangs.  */
   check_reduction (num_workers (nw) vector_length (vl), worker); /* { dg-line check_reduction_loc } */
-  /* { dg-warning "22:region is vector partitioned but does not contain vector partitioned code" "" { target *-*-* } pragma_loc }
+  /* { dg-warning "1:region is vector partitioned but does not contain vector partitioned code" "" { target *-*-* } 1 }
+     { dg-note "22:in <_Pragma directive>" "" { target *-*-* xfail offloading_enabled} pragma_loc }
      { dg-note "1:in expansion of macro 'DO_PRAGMA'" "" { target *-*-* xfail offloading_enabled } DO_PRAGMA_loc }
      { dg-note "3:in expansion of macro 'check_reduction'" "" { target *-*-* xfail offloading_enabled } check_reduction_loc }
      TODO See PR101551 for 'offloading_enabled' XFAILs.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vred2d-128.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vred2d-128.c
index 84e6d51670b..bd2567d96f8 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vred2d-128.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vred2d-128.c
@@ -40,46 +40,54 @@ int a1[n], a2[n];
 
 gentest (test1, "acc parallel loop gang vector_length (128) firstprivate (t1, t2)",
 	 "acc loop vector reduction(+:t1) reduction(-:t2)")
-/* { dg-warning {'t1' is used uninitialized} {} { target *-*-* } outer }
+/* { dg-warning {'t1' is used uninitialized} {} { target *-*-* } 1 }
+   { dg-note {in <_Pragma directive>} {} { target { ! offloading_enabled } } outer }
    { dg-note {'t1' was declared here} {} { target *-*-* } vars }
-   { dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-4 }
+   { dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-5 }
      TODO See PR101551 for 'offloading_enabled' differences.  */
-/* { dg-warning {'t2' is used uninitialized} {} { target *-*-* } outer }
+/* { dg-warning {'t2' is used uninitialized} {} { target *-*-* } 1 }
+   { DUPdg-note {in <_Pragma directive>} {} { target { ! offloading_enabled } } outer }
    { dg-note {'t2' was declared here} {} { target *-*-* } vars }
-   { DUP_dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-8 }
+   { DUP_dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-10 }
      TODO See PR101551 for 'offloading_enabled' differences.  */
 
 gentest (test2, "acc parallel loop gang vector_length (128) firstprivate (t1, t2)",
 	 "acc loop worker vector reduction(+:t1) reduction(-:t2)")
-/* { DUPdg-warning {'t1' is used uninitialized} {} { target *-*-* } outer }
+/* { DUPdg-warning {'t1' is used uninitialized} {} { target *-*-* } 1 }
+   { DUPdg-note {in <_Pragma directive>} {} { target { ! offloading_enabled } } outer }
    { DUP_dg-note {'t1' was declared here} {} { target *-*-* } vars }
-   { dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-4 }
+   { dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-5 }
      TODO See PR101551 for 'offloading_enabled' differences.  */
-/* { DUPdg-warning {'t2' is used uninitialized} {} { target *-*-* } outer }
+/* { DUPdg-warning {'t2' is used uninitialized} {} { target *-*-* } 1 }
+   { DUPdg-note {in <_Pragma directive>} {} { target { ! offloading_enabled } } outer }
    { DUP_dg-note {'t2' was declared here} {} { target *-*-* } vars }
-   { DUP_dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-8 }
+   { DUP_dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-10 }
      TODO See PR101551 for 'offloading_enabled' differences.  */
 
 gentest (test3, "acc parallel loop gang worker vector_length (128) firstprivate (t1, t2)",
 	 "acc loop vector reduction(+:t1) reduction(-:t2)")
-/* { DUPdg-warning {'t1' is used uninitialized} {} { target *-*-* } outer }
+/* { DUPdg-warning {'t1' is used uninitialized} {} { target *-*-* } 1 }
+   { DUPdg-note {in <_Pragma directive>} {} { target { ! offloading_enabled } } outer }
    { DUP_dg-note {'t1' was declared here} {} { target *-*-* } vars }
-   { dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-4 }
+   { dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-5 }
      TODO See PR101551 for 'offloading_enabled' differences.  */
-/* { DUPdg-warning {'t2' is used uninitialized} {} { target *-*-* } outer }
+/* { DUPdg-warning {'t2' is used uninitialized} {} { target *-*-* } 1 }
+   { DUPdg-note {in <_Pragma directive>} {} { target { ! offloading_enabled } } outer }
    { DUP_dg-note {'t2' was declared here} {} { target *-*-* } vars }
-   { DUP_dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-8 }
+   { DUP_dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-10 }
      TODO See PR101551 for 'offloading_enabled' differences.  */
 
 gentest (test4, "acc parallel loop firstprivate (t1, t2)",
 	 "acc loop reduction(+:t1) reduction(-:t2)")
-/* { DUPdg-warning {'t1' is used uninitialized} {} { target *-*-* } outer }
+/* { DUPdg-warning {'t1' is used uninitialized} {} { target *-*-* } 1 }
+   { DUPdg-note {in <_Pragma directive>} {} { target { ! offloading_enabled } } outer }
    { DUP_dg-note {'t1' was declared here} {} { target *-*-* } vars }
-   { dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-4 }
+   { dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-5 }
      TODO See PR101551 for 'offloading_enabled' differences.  */
-/* { DUPdg-warning {'t2' is used uninitialized} {} { target *-*-* } outer }
+/* { DUPdg-warning {'t2' is used uninitialized} {} { target *-*-* } 1 }
+   { DUPdg-note {in <_Pragma directive>} {} { target { ! offloading_enabled } } outer }
    { DUP_dg-note {'t2' was declared here} {} { target *-*-* } vars }
-   { DUP_dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-8 }
+   { DUP_dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-10 }
      TODO See PR101551 for 'offloading_enabled' differences.  */
 
 


* [PATCH v3 4/4] diagnostics: Support generated data locations in SARIF output
  2023-07-21 23:08 [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
                   ` (2 preceding siblings ...)
  2023-07-21 23:08 ` [PATCH v3 3/4] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings Lewis Hyatt
@ 2023-07-21 23:08 ` Lewis Hyatt
  2023-07-28 22:22 ` [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens David Malcolm
  4 siblings, 0 replies; 36+ messages in thread
From: Lewis Hyatt @ 2023-07-21 23:08 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm, Lewis Hyatt

The SARIF output routines read the source code back in to generate "snippet"
and "contents" records, so they must be able to cope with generated data
locations.  Add support for that in diagnostic-format-sarif.cc.
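
For reviewers, here is a minimal standalone sketch of the filename_or_buffer
convention the patch introduces: a nonzero second member means the first
member points to a generated buffer of that length, not to a filename.
mock_xloc, artifact_uri and artifact_text are hypothetical names used only for
illustration. The real code works on expanded_location and obtains the
"<generated>" name from special_fname_generated ().

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the three expanded_location fields this
// patch relies on; the real struct has more members.
struct mock_xloc
{
  const char *file;
  const char *generated_data;
  unsigned int generated_data_len;
};

// Same shape as the typedef added to sarif_builder.
typedef std::pair<const char *, unsigned int> filename_or_buffer;

// Mirrors the logic of sarif_builder::xloc_to_fb: prefer the generated
// buffer when one is present, otherwise fall back to the filename.
static filename_or_buffer
xloc_to_fb (const mock_xloc &xloc)
{
  if (xloc.generated_data_len)
    return filename_or_buffer (xloc.generated_data, xloc.generated_data_len);
  return filename_or_buffer (xloc.file, 0);
}

// Hypothetical helper: the name reported in the "uri" property.
// Generated buffers share one fixed display name.
static std::string
artifact_uri (filename_or_buffer fb)
{
  assert (fb.first != nullptr);
  return fb.second ? "<generated>" : fb.first;
}

// Hypothetical helper: the text of a generated buffer.  The explicit
// length is used, so embedded null bytes would be preserved.
static std::string
artifact_text (filename_or_buffer fb)
{
  return fb.second ? std::string (fb.first, fb.second) : std::string ();
}
```

Keying on the length instead of adding a separate flag keeps the pair small
enough to serve directly as the element type of the m_filenames hash set.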

gcc/ChangeLog:

	* diagnostic-format-sarif.cc (sarif_builder::xloc_to_fb): New function.
	(sarif_builder::maybe_make_physical_location_object): Support
	generated data locations.
	(sarif_builder::make_artifact_location_object): Likewise.
	(sarif_builder::maybe_make_region_object_for_context): Likewise.
	(sarif_builder::make_artifact_object): Likewise.
	(sarif_builder::maybe_make_artifact_content_object): Likewise.
	(get_source_lines): Likewise.

gcc/testsuite/ChangeLog:

	* c-c++-common/diagnostic-format-sarif-file-5.c: New test.
---
 gcc/diagnostic-format-sarif.cc                | 115 +++++++++++-------
 .../diagnostic-format-sarif-file-5.c          |  31 +++++
 2 files changed, 99 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c

diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index 5e483988027..29f614124b2 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -173,7 +173,10 @@ private:
   json::array *maybe_make_kinds_array (diagnostic_event::meaning m) const;
   json::object *maybe_make_physical_location_object (location_t loc);
   json::object *make_artifact_location_object (location_t loc);
-  json::object *make_artifact_location_object (const char *filename);
+
+  typedef std::pair<const char *, unsigned int> filename_or_buffer;
+  json::object *make_artifact_location_object (filename_or_buffer fb);
+
   json::object *make_artifact_location_object_for_pwd () const;
   json::object *maybe_make_region_object (location_t loc) const;
   json::object *maybe_make_region_object_for_context (location_t loc) const;
@@ -196,16 +199,17 @@ private:
   json::object *make_reporting_descriptor_object_for_cwe_id (int cwe_id) const;
   json::object *
   make_reporting_descriptor_reference_object_for_cwe_id (int cwe_id);
-  json::object *make_artifact_object (const char *filename);
-  json::object *maybe_make_artifact_content_object (const char *filename) const;
-  json::object *maybe_make_artifact_content_object (const char *filename,
-						    int start_line,
+  json::object *make_artifact_object (filename_or_buffer fb);
+  json::object *
+  maybe_make_artifact_content_object (filename_or_buffer fb) const;
+  json::object *maybe_make_artifact_content_object (expanded_location xloc,
 						    int end_line) const;
   json::object *make_fix_object (const rich_location &rich_loc);
   json::object *make_artifact_change_object (const rich_location &richloc);
   json::object *make_replacement_object (const fixit_hint &hint) const;
   json::object *make_artifact_content_object (const char *text) const;
   int get_sarif_column (expanded_location exploc) const;
+  static filename_or_buffer xloc_to_fb (expanded_location xloc);
 
   diagnostic_context *m_context;
 
@@ -219,7 +223,11 @@ private:
      diagnostic group.  */
   sarif_result *m_cur_group_result;
 
-  hash_set <const char *> m_filenames;
+  /* If the second member is >0, then this is a buffer of generated content,
+     with that length, not a filename.  */
+  hash_set <pair_hash <nofree_ptr_hash <const char>,
+		       int_hash <unsigned int, -1U> >
+	    > m_filenames;
   bool m_seen_any_relative_paths;
   hash_set <free_string_hash> m_rule_id_set;
   json::array *m_rules_arr;
@@ -749,6 +757,15 @@ sarif_builder::make_location_object (const diagnostic_event &event)
   return location_obj;
 }
 
+/* Populate a filename_or_buffer pair from an expanded location.  */
+sarif_builder::filename_or_buffer
+sarif_builder::xloc_to_fb (expanded_location xloc)
+{
+  if (xloc.generated_data_len)
+    return filename_or_buffer (xloc.generated_data, xloc.generated_data_len);
+  return filename_or_buffer (xloc.file, 0);
+}
+
 /* Make a physicalLocation object (SARIF v2.1.0 section 3.29) for LOC,
    or return NULL;
    Add any filename to the m_artifacts.  */
@@ -764,7 +781,7 @@ sarif_builder::maybe_make_physical_location_object (location_t loc)
   /* "artifactLocation" property (SARIF v2.1.0 section 3.29.3).  */
   json::object *artifact_loc_obj = make_artifact_location_object (loc);
   phys_loc_obj->set ("artifactLocation", artifact_loc_obj);
-  m_filenames.add (LOCATION_FILE (loc));
+  m_filenames.add (xloc_to_fb (expand_location (loc)));
 
   /* "region" property (SARIF v2.1.0 section 3.29.4).  */
   if (json::object *region_obj = maybe_make_region_object (loc))
@@ -788,7 +805,7 @@ sarif_builder::maybe_make_physical_location_object (location_t loc)
 json::object *
 sarif_builder::make_artifact_location_object (location_t loc)
 {
-  return make_artifact_location_object (LOCATION_FILE (loc));
+  return make_artifact_location_object (xloc_to_fb (expand_location (loc)));
 }
 
 /* The ID value for use in "uriBaseId" properties (SARIF v2.1.0 section 3.4.4)
@@ -800,10 +817,12 @@ sarif_builder::make_artifact_location_object (location_t loc)
    or return NULL.  */
 
 json::object *
-sarif_builder::make_artifact_location_object (const char *filename)
+sarif_builder::make_artifact_location_object (filename_or_buffer fb)
 {
   json::object *artifact_loc_obj = new json::object ();
 
+  const auto filename = (fb.second ? special_fname_generated () : fb.first);
+
   /* "uri" property (SARIF v2.1.0 section 3.4.3).  */
   artifact_loc_obj->set ("uri", new json::string (filename));
 
@@ -956,9 +975,7 @@ sarif_builder::maybe_make_region_object_for_context (location_t loc) const
 
   /* "snippet" property (SARIF v2.1.0 section 3.30.13).  */
   if (json::object *artifact_content_obj
-	 = maybe_make_artifact_content_object (exploc_start.file,
-					       exploc_start.line,
-					       exploc_finish.line))
+	= maybe_make_artifact_content_object (exploc_start, exploc_finish.line))
     region_obj->set ("snippet", artifact_content_obj);
 
   return region_obj;
@@ -1449,24 +1466,24 @@ sarif_builder::maybe_make_cwe_taxonomy_object () const
 /* Make an artifact object (SARIF v2.1.0 section 3.24).  */
 
 json::object *
-sarif_builder::make_artifact_object (const char *filename)
+sarif_builder::make_artifact_object (filename_or_buffer fb)
 {
   json::object *artifact_obj = new json::object ();
 
   /* "location" property (SARIF v2.1.0 section 3.24.2).  */
-  json::object *artifact_loc_obj = make_artifact_location_object (filename);
+  json::object *artifact_loc_obj = make_artifact_location_object (fb);
   artifact_obj->set ("location", artifact_loc_obj);
 
   /* "contents" property (SARIF v2.1.0 section 3.24.8).  */
   if (json::object *artifact_content_obj
-	= maybe_make_artifact_content_object (filename))
+	= maybe_make_artifact_content_object (fb))
     artifact_obj->set ("contents", artifact_content_obj);
 
   /* "sourceLanguage" property (SARIF v2.1.0 section 3.24.10).  */
   if (m_context->m_client_data_hooks)
     if (const char *source_lang
 	= m_context->m_client_data_hooks->maybe_get_sarif_source_language
-	    (filename))
+	    (fb.first))
       artifact_obj->set ("sourceLanguage", new json::string (source_lang));
 
   return artifact_obj;
@@ -1476,39 +1493,44 @@ sarif_builder::make_artifact_object (const char *filename)
    full contents of FILENAME.  */
 
 json::object *
-sarif_builder::maybe_make_artifact_content_object (const char *filename) const
+sarif_builder::maybe_make_artifact_content_object (filename_or_buffer fb) const
 {
-  /* Let input.cc handle any charset conversion.  */
-  char_span utf8_content = get_source_file_content (filename);
-  if (!utf8_content)
-    return NULL;
-
-  /* Don't add it if it's not valid UTF-8.  */
-  if (!cpp_valid_utf8_p(utf8_content.get_buffer (), utf8_content.length ()))
-    return NULL;
-
-  json::object *artifact_content_obj = new json::object ();
-  artifact_content_obj->set ("text",
-			     new json::string (utf8_content.get_buffer (),
-					       utf8_content.length ()));
+  json::object *artifact_content_obj = nullptr;
+  if (fb.second)
+    {
+      artifact_content_obj = new json::object ();
+      artifact_content_obj->set ("text", new json::string (fb.first,
+							   fb.second));
+    }
+  else if (char_span utf8_content = get_source_file_content (fb.first))
+    {
+      /* Don't add it if it's not valid UTF-8.  */
+      if (!cpp_valid_utf8_p(utf8_content.get_buffer (), utf8_content.length ()))
+	return NULL;
+      artifact_content_obj = new json::object ();
+      artifact_content_obj->set ("text",
+				 new json::string (utf8_content.get_buffer (),
+						   utf8_content.length ()));
+    }
   return artifact_content_obj;
 }
 
 /* Attempt to read the given range of lines from FILENAME; return
-   a freshly-allocated 0-terminated buffer containing them, or NULL.  */
+   a freshly-allocated buffer containing them, or NULL.
+   The buffer is null-terminated, but could also contain embedded null
+   bytes, so the char_span's length() accessor should be used.  */
 
-static char *
-get_source_lines (const char *filename,
-		  int start_line,
+static char_span
+get_source_lines (expanded_location xloc,
 		  int end_line)
 {
   auto_vec<char> result;
 
-  for (int line = start_line; line <= end_line; line++)
+  for (int line = xloc.line; line <= end_line; line++)
     {
-      char_span line_content = location_get_source_line (filename, line);
+      char_span line_content = location_get_source_line (xloc, line);
       if (!line_content.get_buffer ())
-	return NULL;
+	return char_span (nullptr, 0);
       result.reserve (line_content.length () + 1);
       for (size_t i = 0; i < line_content.length (); i++)
 	result.quick_push (line_content[i]);
@@ -1516,33 +1538,32 @@ get_source_lines (const char *filename,
     }
   result.safe_push ('\0');
 
-  return xstrdup (result.address ());
+  return char_span (xstrdup (result.address ()), result.length () - 1);
 }
 
 /* Make an artifactContent object (SARIF v2.1.0 section 3.3) for the given
-   run of lines within FILENAME (including the endpoints).  */
+   run of lines starting at XLOC (including the endpoints).  */
 
 json::object *
-sarif_builder::maybe_make_artifact_content_object (const char *filename,
-						   int start_line,
+sarif_builder::maybe_make_artifact_content_object (expanded_location xloc,
 						   int end_line) const
 {
-  char *text_utf8 = get_source_lines (filename, start_line, end_line);
+  const char_span text_utf8 = get_source_lines (xloc, end_line);
 
   if (!text_utf8)
     return NULL;
 
   /* Don't add it if it's not valid UTF-8.  */
-  if (!cpp_valid_utf8_p(text_utf8, strlen(text_utf8)))
+  if (!cpp_valid_utf8_p(text_utf8.get_buffer (), text_utf8.length ()))
     {
-      free (text_utf8);
+      free (const_cast<char *> (text_utf8.get_buffer ()));
       return NULL;
     }
 
   json::object *artifact_content_obj = new json::object ();
-  artifact_content_obj->set ("text", new json::string (text_utf8));
-  free (text_utf8);
-
+  artifact_content_obj->set ("text", new json::string (text_utf8.get_buffer (),
+						       text_utf8.length ()));
+  free (const_cast<char *> (text_utf8.get_buffer ()));
   return artifact_content_obj;
 }
 
diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c b/gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c
new file mode 100644
index 00000000000..2ca6a069d3f
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c
@@ -0,0 +1,31 @@
+/* The goal is to test SARIF output of generated data, such as a _Pragma string.
+   But SARIF output as of yet does not output macro definitions, so such
+   generated data buffers never end up in the typical SARIF output.  One way we
+   can achieve it is to use -fdump-internal-locations, which outputs top-level
+   diagnostic notes inside macro definitions, that SARIF will end up processing.
+   It also outputs a lot of other stuff to stderr (not to the SARIF file) that
+   is not relevant to this test, so we use a blanket dg-regexp to filter all of
+   that away.  */
+
+/* { dg-do compile } */
+/* { dg-options "-fdiagnostics-format=sarif-file -fdump-internal-locations" } */
+/* { dg-allow-blank-lines-in-output "" } */
+
+_Pragma("GCC diagnostic push")
+
+/* { dg-regexp {(.|[\n\r])*} } */
+
+/* Because of the way -fdump-internal-locations works, these regexes themselves
+   will end up in the sarif output also.  But due to the escaping, they don't
+   match themselves, so they still test what we need.  */
+
+/* Four of this pair are output for the tokens inside the
+   _Pragma string (3 plus a PRAGMA_EOL).  */
+
+/* { dg-final { scan-sarif-file "\"artifactLocation\": \{\"uri\": \"<generated>\"," } } */
+/* { dg-final { scan-sarif-file "\"snippet\": \{\"text\": \"GCC diagnostic push\\\\n\"" } } */
+
+/* One of this pair is output for the overall internal location.  */
+
+/* { dg-final { scan-sarif-file "\{\"location\": \{\"uri\": \"<generated>\"," } } */
+/* { dg-final { scan-sarif-file "\"contents\": \{\"text\": \"GCC diagnostic push\\\\n\\\\0" } } */

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens
  2023-07-21 23:08 [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
                   ` (3 preceding siblings ...)
  2023-07-21 23:08 ` [PATCH v3 4/4] diagnostics: Support generated data locations in SARIF output Lewis Hyatt
@ 2023-07-28 22:22 ` David Malcolm
  2023-07-29 14:27   ` Lewis Hyatt
  4 siblings, 1 reply; 36+ messages in thread
From: David Malcolm @ 2023-07-28 22:22 UTC (permalink / raw)
  To: Lewis Hyatt, gcc-patches

On Fri, 2023-07-21 at 19:08 -0400, Lewis Hyatt wrote:
> Hello-
> 
> This is an update to the v2 patch series last sent in January:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609473.html
> 
> While I did not receive any feedback on the v2 patches yet, they did
> need some
> rebasing on top of other recent commits to input.cc, so I thought it
> would be
> helpful to send them again now. The patches have not otherwise
> changed from
> v2, and the above-linked message explains how all the patches fit in
> with the
> original v1 series sent last November.
> 
> Dave, I would appreciate it very much if you could please let me know
> what you
> think of this approach? I feel like the diagnostics we currently
> output for _Pragmas are worth improving. As a reminder, say for this
> example:
> 
> =====
>  #define S "GCC diagnostic ignored \"oops"
>  _Pragma(S)
> =====
> 
> We currently output:
> 
> =====
> file.cpp:2:24: warning: missing terminating " character
>     2 | _Pragma(S)
>       |                        ^
> =====
> 
> While after these patches, we would output:
> 
> ======
> <generated>:1:24: warning: missing terminating " character
>     1 | GCC diagnostic ignored "oops
>       |                        ^
> file.cpp:2:1: note: in <_Pragma directive>
>     2 | _Pragma(S)
>       | ^~~~~~~
> ======
> 
> Thanks!

Hi Lewis; sorry for not responding to the v2 patches.

I've started looking at the v3 patches in detail, but I have some high-
level questions about memory usage:

Am I right in thinking that the effect of this patch is that for every
_Pragma in the source we will create a new line_map_ordinary, and a new
buffer for the stringified content of that _Pragma, and that these
allocations will persist for the rest of the compilation?  (plus a
little extra allocation within the "location_t" space from 0 to
0x7fffffff).

It sounds like this will probably be a rounding error that won't be
noticeable in profiling, but did you attempt any such measurement of the
memory usage before/after this patch on some real-world projects?

Thanks
Dave


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 1/4] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers
  2023-07-21 23:08 ` [PATCH v3 1/4] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers Lewis Hyatt
@ 2023-07-28 22:58   ` David Malcolm
  2023-07-31 22:39     ` Lewis Hyatt
  0 siblings, 1 reply; 36+ messages in thread
From: David Malcolm @ 2023-07-28 22:58 UTC (permalink / raw)
  To: Lewis Hyatt, gcc-patches

On Fri, 2023-07-21 at 19:08 -0400, Lewis Hyatt wrote:
> Add a new linemap reason LC_GEN which enables encoding the location
> of data
> that was generated during compilation and does not appear in any
> source file.
> There could be many use cases, such as, for instance, referring to
> the content
> of builtin macros (not yet implemented, but an easy lift after this
> one.) The
> first intended application is to create a place to store the input to
> a
> _Pragma directive, so that proper locations can be assigned to those
> tokens. This will be done in a subsequent commit.
> 
> The actual change needed to the line-maps API in libcpp is not too
> large and
> requires no space overhead in the line map data structures (on 64-bit
> systems
> that is; one newly added data member to class line_map_ordinary sits
> inside
> former padding bytes.) An LC_GEN map is just an ordinary map like any
> other,
> but the TO_FILE member that normally points to the file name points
> instead to
> the actual data.  This works automatically with PCH as well, for the
> same
> reason that the file name makes its way into a PCH.  In order to
> avoid
> confusion, the member has been renamed from TO_FILE to DATA, and
> associated
> accessors adjusted.
> 
> Outside libcpp, there are many small changes but most of them are to
> selftests, which are necessarily more sensitive to implementation
> details. From the perspective of the user (the "user", here, being a
> frontend
> using line maps or else the diagnostics infrastructure), the chief
> visible
> change is that the function location_get_source_line() should be
> passed an
> expanded_location object instead of a separate filename and line
> number.  This
> is not a big change because in most cases, this information came
> anyway from a
> call to expand_location and the needed expanded_location object is
> readily
> available. The new overload of location_get_source_line() uses the
> extra
> information in the expanded_location object to obtain the data from
> the
> in-memory buffer when it originated from an LC_GEN map.
> 
> Until the subsequent patch that starts using LC_GEN maps, none are
> yet
> generated within GCC, hence nothing is added to the testsuite here;
> but all
> relevant selftests have been extended to cover generated data maps in
> addition
> to normal files.

[..snip...]

Thanks for the updated patch.

Reading this patch, it felt a bit unnatural to me to have an
  (expanded_location, source line)
pair where the expanded_location seems to be representing "which source
file or generated buffer", but the line/column info in that
expanded_location is to be ignored in favor of the separately passed
source line.

I think we're missing a class: something that identifies either a
specific source file, or a specific generated buffer.

How about something like either:

class source_id
{
public:
  source_id (const char *filename)
  : m_filename_or_buffer (filename),
    m_len (0)
  {
  }

  explicit source_id (const char *buffer, unsigned buffer_len)
  : m_filename_or_buffer (buffer),
    m_len (buffer_len)
  {
    linemap_assert (buffer_len > 0);
  }

private:
  const char *m_filename_or_buffer;
  unsigned m_len;  // where 0 means "it's a filename"
};

or:

class source_id
{
public:
  source_id (const char *filename)
  : m_ptr (filename),
    m_is_buffer (false)
  {
  }

  explicit source_id (const line_map_ordinary *buffer_linemap)
  : m_ptr (buffer_linemap),
    m_is_buffer (true)
  {
  }

private:
  const void *m_ptr;
  bool m_is_buffer;
};

and use one of these "source_id file" in place of "const char *file",
rather than replacing such things with expanded_location?

> diff --git a/gcc/c-family/c-indentation.cc b/gcc/c-family/c-indentation.cc
> index e8d3dece770..4164fa0b1ba 100644
> --- a/gcc/c-family/c-indentation.cc
> +++ b/gcc/c-family/c-indentation.cc
> @@ -50,7 +50,7 @@ get_visual_column (expanded_location exploc,
>  		   unsigned int *first_nws,
>  		   unsigned int tab_width)
>  {
> -  char_span line = location_get_source_line (exploc.file, exploc.line);
> +  char_span line = location_get_source_line (exploc);

...so this might continue to be:

  char_span line = location_get_source_line (exploc.file, exploc.line);

...but expanded_location's "file" field would become a source_id,
rather than a const char *.  It looks like doing so might make a lot of
"is this the same file or buffer?"  turn into comparisons of source_id
instances.
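
A minimal sketch of how such comparisons could look, assuming the first
source_id variant above.  The is_buffer () accessor and the comparison
operators are hypothetical additions for illustration, and plain assert
stands in for linemap_assert:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical sketch: identifies either a source file (by name) or a
// generated buffer (by pointer and length).  Not the committed GCC API.
class source_id
{
public:
  source_id (const char *filename)
  : m_filename_or_buffer (filename),
    m_len (0)
  {
  }

  explicit source_id (const char *buffer, unsigned buffer_len)
  : m_filename_or_buffer (buffer),
    m_len (buffer_len)
  {
    assert (buffer_len > 0);
  }

  bool is_buffer () const { return m_len > 0; }

  // Files compare by name.  Buffers compare by identity (same pointer
  // and length), since two distinct buffers may hold the same bytes.
  bool operator== (const source_id &other) const
  {
    if (is_buffer () != other.is_buffer ())
      return false;
    if (is_buffer ())
      return (m_filename_or_buffer == other.m_filename_or_buffer
              && m_len == other.m_len);
    if (!m_filename_or_buffer || !other.m_filename_or_buffer)
      return m_filename_or_buffer == other.m_filename_or_buffer;
    return strcmp (m_filename_or_buffer, other.m_filename_or_buffer) == 0;
  }

  bool operator!= (const source_id &other) const
  {
    return !(*this == other);
  }

private:
  const char *m_filename_or_buffer;
  unsigned m_len;  // where 0 means "it's a filename"
};
```

With something like this, two _Pragma strings with identical text would
still count as different sources, which seems to be what "is this the
same file or buffer?" checks would want.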

So I think expanded_location would become:

typedef struct
{
  /* Either the name of the source file involved, or the
     specific generated buffer.  */
  source_id file;

  /* The line-location in the source file.  */
  int line;

  int column;

  void *data;

  /* In a system header?. */
  bool sysp;
} expanded_location;

and we wouldn't need to add these extra fields:

> +
> +  /* If generated data, the data and its length.  The data may contain embedded
> +   nulls and need not be null-terminated.  */
> +  unsigned int generated_data_len;
> +  const char *generated_data;
>  } expanded_location;

and we could pass around source_id instances when identifying specific
filenames/generated buffers.
 
Does this idea simplify/clarify the patch, or make it more complicated?

[...snip...]

Thoughts?
Dave


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens
  2023-07-28 22:22 ` [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens David Malcolm
@ 2023-07-29 14:27   ` Lewis Hyatt
  2023-07-29 16:03     ` David Malcolm
  0 siblings, 1 reply; 36+ messages in thread
From: Lewis Hyatt @ 2023-07-29 14:27 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

On Fri, Jul 28, 2023 at 6:22 PM David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Fri, 2023-07-21 at 19:08 -0400, Lewis Hyatt wrote:
> > Hello-
> >
> > This is an update to the v2 patch series last sent in January:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609473.html
> >
> > While I did not receive any feedback on the v2 patches yet, they did
> > need some
> > rebasing on top of other recent commits to input.cc, so I thought it
> > would be
> > helpful to send them again now. The patches have not otherwise
> > changed from
> > v2, and the above-linked message explains how all the patches fit in
> > with the
> > original v1 series sent last November.
> >
> > Dave, I would appreciate it very much if you could please let me know
> > what you
> > think of this approach? I feel like the diagnostics we currently
> > output for _Pragmas are worth improving. As a reminder, say for this
> > example:
> >
> > =====
> >  #define S "GCC diagnostic ignored \"oops"
> >  _Pragma(S)
> > =====
> >
> > We currently output:
> >
> > =====
> > file.cpp:2:24: warning: missing terminating " character
> >     2 | _Pragma(S)
> >       |                        ^
> > =====
> >
> > While after these patches, we would output:
> >
> > ======
> > <generated>:1:24: warning: missing terminating " character
> >     1 | GCC diagnostic ignored "oops
> >       |                        ^
> > file.cpp:2:1: note: in <_Pragma directive>
> >     2 | _Pragma(S)
> >       | ^~~~~~~
> > ======
> >
> > Thanks!
>
> Hi Lewis; sorry for not responding to the v2 patches.
>
> I've started looking at the v3 patches in detail, but I have some high-
> level questions about memory usage:
>
> Am I right in thinking that the effect of this patch is that for every
> _Pragma in the source we will create a new line_map_ordinary, and a new
> buffer for the stringified content of that _Pragma, and that these
> allocations will persist for the rest of the compilation?  (plus a
> little extra allocation within the "location_t" space from 0 to
> 0x7fffffff).
>
> It sounds like this will probably be a rounding error that won't be
> noticeable in profiling, but did you attempt any such measurement of the
> memory usage before/after this patch on some real-world projects?
>
> Thanks
> Dave
>

Thanks for looking at the patches, I appreciate it whenever you have
time to get to them.

This is a fair point about the memory usage; basically, it means that
each instance of a _Pragma has comparable memory footprint to a macro
definition. (In addition to the overheads you mentioned, it also
creates a macro map to generate a virtual location for the tokens, so
that it's able to output the "in expansion of _Pragma" note. That part
can be disabled with -ftrack-macro-expansion=0 at least.)

I had the sense that _Pragma isn't used often enough for that to be a
problem, but agreed it is worth checking. (I really hope this memory
usage isn't an issue since there are also numerous PRs complaining
about 32-bit limitations in location tracking, that make it tempting
to explore 64-bit line maps or some other option someday too.)

I tried one thing now: wxWidgets uses a lot of diagnostic pragmas
wrapped up inside macros that use _Pragma. (See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55578). The testsuite
contains a file allheaders.cpp which includes the whole library, so I
tried compiling this into a pch, which I believe measures the entire
memory footprint including the ordinary and macro line maps and the
_Pragma strings. The resulting PCH sizes were:

279000173 bytes before the changes
279491345 bytes after the changes

So about 0.18% bigger. Happy to check other projects too; do you have any
standard go-tos? Maybe Firefox or something, I take it.
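
For reference, the relative growth can be computed from the two PCH sizes
quoted above.  This is only a sketch of the arithmetic; percent_increase
is a hypothetical helper, not anything in GCC:

```cpp
#include <cassert>

// Relative growth of AFTER over BEFORE, expressed in percent.
double
percent_increase (double before, double after)
{
  return (after - before) / before * 100.0;
}
```

Calling percent_increase (279000173.0, 279491345.0) gives a value between
0.17 and 0.18, i.e. the PCH grew by 491172 bytes.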

I see your other response on patch #1, I am thinking about that and
will reply later. Thanks again!

-Lewis

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens
  2023-07-29 14:27   ` Lewis Hyatt
@ 2023-07-29 16:03     ` David Malcolm
  0 siblings, 0 replies; 36+ messages in thread
From: David Malcolm @ 2023-07-29 16:03 UTC (permalink / raw)
  To: Lewis Hyatt; +Cc: gcc-patches

On Sat, 2023-07-29 at 10:27 -0400, Lewis Hyatt wrote:
> On Fri, Jul 28, 2023 at 6:22 PM David Malcolm <dmalcolm@redhat.com>
> wrote:
> > 
> > On Fri, 2023-07-21 at 19:08 -0400, Lewis Hyatt wrote:
> > > Hello-
> > > 
> > > This is an update to the v2 patch series last sent in January:
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609473.html
> > > 
> > > While I did not receive any feedback on the v2 patches yet, they
> > > did
> > > need some
> > > rebasing on top of other recent commits to input.cc, so I thought
> > > it
> > > would be
> > > helpful to send them again now. The patches have not otherwise
> > > changed from
> > > v2, and the above-linked message explains how all the patches fit
> > > in
> > > with the
> > > original v1 series sent last November.
> > > 
> > > Dave, I would appreciate it very much if you could please let me
> > > know
> > > what you
> > > think of this approach? I feel like the diagnostics we currently
> > > output for _Pragmas are worth improving. As a reminder, say for
> > > this
> > > example:
> > > 
> > > =====
> > >  #define S "GCC diagnostic ignored \"oops"
> > >  _Pragma(S)
> > > =====
> > > 
> > > We currently output:
> > > 
> > > =====
> > > file.cpp:2:24: warning: missing terminating " character
> > >     2 | _Pragma(S)
> > >       |                        ^
> > > =====
> > > 
> > > While after these patches, we would output:
> > > 
> > > ======
> > > <generated>:1:24: warning: missing terminating " character
> > >     1 | GCC diagnostic ignored "oops
> > >       |                        ^
> > > file.cpp:2:1: note: in <_Pragma directive>
> > >     2 | _Pragma(S)
> > >       | ^~~~~~~
> > > ======
> > > 
> > > Thanks!
> > 
> > Hi Lewis; sorry for not responding to the v2 patches.
> > 
> > I've started looking at the v3 patches in detail, but I have some
> > high-
> > level questions about memory usage:
> > 
> > Am I right in thinking that the effect of this patch is that for
> > every
> > _Pragma in the source we will create a new line_map_ordinary, and a
> > new
> > buffer for the stringified content of that _Pragma, and that these
> > allocations will persist for the rest of the compilation?  (plus a
> > little extra allocation within the "location_t" space from 0 to
> > 0x7fffffff).
> > 
> > It sounds like this will probably be a rounding error that won't be
> > noticeable in profiling, but did you attempt any such measurement of
> > the
> > memory usage before/after this patch on some real-world projects?
> > 
> > Thanks
> > Dave
> > 
> 
> Thanks for looking at the patches, I appreciate it whenever you have
> time to get to them.
> 
> This is a fair point about the memory usage, basically it means that
> each instance of a _Pragma has comparable memory footprint to a macro
> definition. (In addition to the overheads you mentioned, it also
> creates a macro map to generate a virtual location for the tokens, so
> that it's able to output the "in expansion of _Pragma" note. That
> part
> can be disabled with -ftrack-macro-expansion=0 at least.)
> 
> I had the sense that _Pragma isn't used often enough for that to be a
> problem, but agreed it is worth checking. (I really hope this memory
> usage isn't an issue since there are also numerous PRs complaining
> about 32-bit limitations in location tracking, that make it tempting
> to explore 64-bit line maps or some other option someday too.)
> 
> I tried one thing now, wxWidgets uses a lot of diagnostic pragmas
> wrapped up inside macros that use _Pragma. (See
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55578). The testsuite
> contains a file allheaders.cpp which includes the whole library, so I
> tried compiling this into a pch, which I believe measures the entire
> memory footprint including the ordinary and macro line maps and the
> _Pragma strings. The resulting PCH sizes were:
> 
> 279000173 bytes before the changes
> 279491345 bytes after the changes
> 
> So 0.1% bigger. Happy to check other projects too, do you have any
> standard gotos? Maybe firefox or something I take it.

Thanks for doing that test; I think that slight increase on a heavy
user of _Pragma is acceptable.
> 
> I see your other response on patch #1, I am thinking about that and
> will reply later. Thanks again!

Thanks.  Hope that my patch #1 response makes sense and that I'm not
missing something about the way this works.

Dave


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v3 1/4] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers
  2023-07-28 22:58   ` David Malcolm
@ 2023-07-31 22:39     ` Lewis Hyatt
  2023-08-09 22:14       ` [PATCH v4 0/8] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
  0 siblings, 1 reply; 36+ messages in thread
From: Lewis Hyatt @ 2023-07-31 22:39 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

On Fri, Jul 28, 2023 at 6:58 PM David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Fri, 2023-07-21 at 19:08 -0400, Lewis Hyatt wrote:
> > Add a new linemap reason LC_GEN which enables encoding the location
> > of data
> > that was generated during compilation and does not appear in any
> > source file.
> > There could be many use cases, such as, for instance, referring to
> > the content
> > of builtin macros (not yet implemented, but an easy lift after this
> > one.) The
> > first intended application is to create a place to store the input to
> > a
> > _Pragma directive, so that proper locations can be assigned to those
> > tokens. This will be done in a subsequent commit.
> >
> > The actual change needed to the line-maps API in libcpp is not too
> > large and
> > requires no space overhead in the line map data structures (on 64-bit
> > systems
> > that is; one newly added data member to class line_map_ordinary sits
> > inside
> > former padding bytes.) An LC_GEN map is just an ordinary map like any
> > other,
> > but the TO_FILE member that normally points to the file name points
> > instead to
> > the actual data.  This works automatically with PCH as well, for the
> > same
> > reason that the file name makes its way into a PCH.  In order to
> > avoid
> > confusion, the member has been renamed from TO_FILE to DATA, and
> > associated
> > accessors adjusted.
> >
> > Outside libcpp, there are many small changes but most of them are to
> > selftests, which are necessarily more sensitive to implementation
> > details. From the perspective of the user (the "user", here, being a
> > frontend
> > using line maps or else the diagnostics infrastructure), the chief
> > visible
> > change is that the function location_get_source_line() should be
> > passed an
> > expanded_location object instead of a separate filename and line
> > number.  This
> > is not a big change because in most cases, this information came
> > anyway from a
> > call to expand_location and the needed expanded_location object is
> > readily
> > available. The new overload of location_get_source_line() uses the
> > extra
> > information in the expanded_location object to obtain the data from
> > the
> > in-memory buffer when it originated from an LC_GEN map.
> >
> > Until the subsequent patch that starts using LC_GEN maps, none are
> > yet
> > generated within GCC, hence nothing is added to the testsuite here;
> > but all
> > relevant selftests have been extended to cover generated data maps in
> > addition
> > to normal files.
>
> [..snip...]
>
> Thanks for the updated patch.
>
> Reading this patch, it felt a bit unnatural to me to have an
>   (expanded_location, source line)
> pair where the expanded_location seems to be representing "which source
> file or generated buffer", but the line/column info in that
> expanded_location is to be ignored in favor of the separately passed
> source line.
>
> I think we're missing a class: something that identifies either a
> specific source file, or a specific generated buffer.
>
> How about something like either:
>
> class source_id
> {
> public:
>   source_id (const char *filename)
>   : m_filename_or_buffer (filename),
>     m_len (0)
>   {
>   }
>
>   explicit source_id (const char *buffer, unsigned buffer_len)
>   : m_filename_or_buffer (buffer),
>     m_len (buffer_len)
>   {
>     linemap_assert (buffer_len > 0);
>   }
>
> private:
>   const char *m_filename_or_buffer;
>   unsigned m_len;  // where 0 means "it's a filename"
> };
>
> or:
>
> class source_id
> {
> public:
>   source_id (const char *filename)
>   : m_ptr (filename),
>     m_is_buffer (false)
>   {
>   }
>
>   explicit source_id (const line_map_ordinary *buffer_linemap)
>   : m_ptr (buffer_linemap),
>     m_is_buffer (true)
>   {
>   }
>
> private:
>   const void *m_ptr;
>   bool m_is_buffer;
> };
>
> and use one of these "source_id file" in place of "const char *file",
> rather than replacing such things with expanded_location?
>
> > diff --git a/gcc/c-family/c-indentation.cc b/gcc/c-family/c-indentation.cc
> > index e8d3dece770..4164fa0b1ba 100644
> > --- a/gcc/c-family/c-indentation.cc
> > +++ b/gcc/c-family/c-indentation.cc
> > @@ -50,7 +50,7 @@ get_visual_column (expanded_location exploc,
> >                  unsigned int *first_nws,
> >                  unsigned int tab_width)
> >  {
> > -  char_span line = location_get_source_line (exploc.file, exploc.line);
> > +  char_span line = location_get_source_line (exploc);
>
> ...so this might continue to be:
>
>   char_span line = location_get_source_line (exploc.file, exploc.line);
>
> ...but expanded_location's "file" field would become a source_id,
> rather than a const char *.  It looks like doing so might make a lot of
> "is this the same file or buffer?"  turn into comparisons of source_id
> instances.
>
> So I think expanded_location would become:
>
> typedef struct
> {
>   /* Either the name of the source file involved, or the
>      specific generated buffer.  */
>   source_id file;
>
>   /* The line-location in the source file.  */
>   int line;
>
>   int column;
>
>   void *data;
>
>   /* In a system header?. */
>   bool sysp;
> } expanded_location;
>
> and we wouldn't need to add these extra fields:
>
> > +
> > +  /* If generated data, the data and its length.  The data may contain embedded
> > +   nulls and need not be null-terminated.  */
> > +  unsigned int generated_data_len;
> > +  const char *generated_data;
> >  } expanded_location;
>
> and we could pass around source_id instances when identifying specific
> filenames/generated buffers.
>
> Does this idea simplify/clarify the patch, or make it more complicated?
>
> [...snip...]
>
> Thoughts?
> Dave
>

Thanks, this makes sense and I think on balance it makes the interface
nicer to do it this way. In the last patch of this series (for SARIF
output) I had found it necessary to use a
    typedef std::pair<const char*, unsigned int> filename_or_buffer;
which was the same thing in spirit as your source_id. It makes sense
to promote that to a real class and use it more widely.

I will send out an updated series with that change later after testing.

I don't think we can simply change "file" in expanded_location to be a
source_id, because this field is used in lots of places that don't
care about generated data buffers, and that interface change would
necessitate touching all of them. For example, gengtype.cc uses libcpp
and expanded_location, and there are a bunch of call sites in the
middle end and backend that do as well, plus e.g. the custom
diagnostics hooks in the C++ front end that print "inlined from
xyz.cc". I thought about making source_id implicitly convertible to
char*, but I think that is too error prone, plus it doesn't help with
the most common use of this field, which is to pass it to printf().
The approach I am thinking to take is to leave "file" as it is in
expanded_location, but to also add a "source_id src" field too. This
way, the only call sites that need to be touched are those that care
about this distinction, and so the number of changes is much more
limited. But it does still achieve the goal that we don't need to use
an expanded_location to communicate with input.cc, we can use a
source_id instead, and that makes the interface more natural.

With the new interface, the main change needed for diagnostics code
would be that instead of calling location_get_source_line(file_name,
line), you would need to call location_get_source_line(src_id, line).
In case both the source_id and the line number come from an
expanded_location, there could be a convenience overload like
location_get_source_line(exploc) also, but it wouldn't be necessary to
involve an expanded_location if the source_id and line are better
handled separately.
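
To make the shape of that interface concrete, here is a rough sketch under
stated assumptions.  The source_id and expanded_location below are
simplified stand-ins, not the real GCC definitions; the real function
returns a char_span rather than a std::string; and only generated buffers
are handled here (reading from files is left out):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the proposed class.
struct source_id
{
  const char *ptr;   // file name, or start of the generated buffer
  unsigned len;      // 0 means "ptr is a file name"
  bool is_buffer () const { return len > 0; }
};

// Simplified expanded_location: "file" stays a plain string for
// printf-style users, and "src" is the additional field.
struct expanded_location
{
  const char *file;
  source_id src;
  int line;
  int column;
};

// Return the 1-based LINE of SRC without its trailing newline, or an
// empty string if it is unavailable.  File-backed sources are not
// handled in this sketch.
std::string
location_get_source_line (source_id src, int line)
{
  if (!src.is_buffer () || line < 1)
    return std::string ();
  const char *p = src.ptr;
  const char *end = src.ptr + src.len;
  for (int cur = 1; p < end; cur++)
    {
      const char *eol = p;
      while (eol < end && *eol != '\n')
        ++eol;
      if (cur == line)
        return std::string (p, eol - p);
      if (eol == end)
        break;
      p = eol + 1;
    }
  return std::string ();
}

// Convenience overload for callers that already have an
// expanded_location in hand.
std::string
location_get_source_line (const expanded_location &exploc)
{
  return location_get_source_line (exploc.src, exploc.line);
}
```

Callers that only print exploc.file would not need to change at all in
this arrangement; only the ones that fetch source text would switch to
passing a source_id.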

-Lewis

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v4 0/8] diagnostics: libcpp: Overhaul locations for _Pragma tokens
  2023-07-31 22:39     ` Lewis Hyatt
@ 2023-08-09 22:14       ` Lewis Hyatt
  2023-08-09 22:14         ` [PATCH v4 1/8] libcpp: Add LC_GEN linemaps to support in-memory buffers Lewis Hyatt
                           ` (7 more replies)
  0 siblings, 8 replies; 36+ messages in thread
From: Lewis Hyatt @ 2023-08-09 22:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm, Lewis Hyatt

On Mon, Jul 31, 2023 at 06:39:15PM -0400, Lewis Hyatt wrote:
> On Fri, Jul 28, 2023 at 6:58 PM David Malcolm <dmalcolm@redhat.com> wrote:
> >
> > On Fri, 2023-07-21 at 19:08 -0400, Lewis Hyatt wrote:
> > > Add a new linemap reason LC_GEN which enables encoding the location
> > > of data
> > > that was generated during compilation and does not appear in any
> > > source file.
> > > There could be many use cases, such as, for instance, referring to
> > > the content
> > > of builtin macros (not yet implemented, but an easy lift after this
> > > one.) The
> > > first intended application is to create a place to store the input to
> > > a
> > > _Pragma directive, so that proper locations can be assigned to those
> > > tokens. This will be done in a subsequent commit.
> > >
> > > The actual change needed to the line-maps API in libcpp is not too
> > > large and
> > > requires no space overhead in the line map data structures (on 64-bit
> > > systems
> > > that is; one newly added data member to class line_map_ordinary sits
> > > inside
> > > former padding bytes.) An LC_GEN map is just an ordinary map like any
> > > other,
> > > but the TO_FILE member that normally points to the file name points
> > > instead to
> > > the actual data.  This works automatically with PCH as well, for the
> > > same
> > > reason that the file name makes its way into a PCH.  In order to
> > > avoid
> > > confusion, the member has been renamed from TO_FILE to DATA, and
> > > associated
> > > accessors adjusted.
> > >
> > > Outside libcpp, there are many small changes but most of them are to
> > > selftests, which are necessarily more sensitive to implementation
> > > details. From the perspective of the user (the "user", here, being a
> > > frontend
> > > using line maps or else the diagnostics infrastructure), the chief
> > > visible
> > > change is that the function location_get_source_line() should be
> > > passed an
> > > expanded_location object instead of a separate filename and line
> > > number.  This
> > > is not a big change because in most cases, this information came
> > > anyway from a
> > > call to expand_location and the needed expanded_location object is
> > > readily
> > > available. The new overload of location_get_source_line() uses the
> > > extra
> > > information in the expanded_location object to obtain the data from
> > > the
> > > in-memory buffer when it originated from an LC_GEN map.
> > >
> > > Until the subsequent patch that starts using LC_GEN maps, none are
> > > yet
> > > generated within GCC, hence nothing is added to the testsuite here;
> > > but all
> > > relevant selftests have been extended to cover generated data maps in
> > > addition
> > > to normal files.
> >
> > [..snip...]
> >
> > Thanks for the updated patch.
> >
> > Reading this patch, it felt a bit unnatural to me to have an
> >   (expanded_location, source line)
> > pair where the expanded_location seems to be representing "which source
> > file or generated buffer", but the line/column info in that
> > expanded_location is to be ignored in favor of the separately passed
> > source line.
> >
> > I think we're missing a class: something that identifies either a
> > specific source file, or a specific generated buffer.
> >
> > How about something like either:
> >
> > class source_id
> > {
> > public:
> >   source_id (const char *filename)
> >   : m_filename_or_buffer (filename),
> >     m_len (0)
> >   {
> >   }
> >
> >   explicit source_id (const char *buffer, unsigned buffer_len)
> >   : m_filename_or_buffer (buffer),
> >     m_len (buffer_len)
> >   {
> >     linemap_assert (buffer_len > 0);
> >   }
> >
> > private:
> >   const char *m_filename_or_buffer;
> >   unsigned m_len;  // where 0 means "it's a filename"
> > };
> >
> > or:
> >
> > class source_id
> > {
> > public:
> >   source_id (const char *filename)
> >   : m_ptr (filename),
> >     m_is_buffer (false)
> >   {
> >   }
> >
> >   explicit source_id (const linemap_ordinary *buffer_linemap)
> >   : m_ptr (buffer_linemap),
> >     m_is_buffer (true)
> >   {
> >   }
> >
> > private:
> >   const void *m_ptr;
> >   bool m_is_buffer;
> > };
> >
> > and use one of these "source_id file" in place of "const char *file",
> > rather than replacing such things with expanded_location?
> >
> > > diff --git a/gcc/c-family/c-indentation.cc b/gcc/c-family/c-indentation.cc
> > > index e8d3dece770..4164fa0b1ba 100644
> > > --- a/gcc/c-family/c-indentation.cc
> > > +++ b/gcc/c-family/c-indentation.cc
> > > @@ -50,7 +50,7 @@ get_visual_column (expanded_location exploc,
> > >                  unsigned int *first_nws,
> > >                  unsigned int tab_width)
> > >  {
> > > -  char_span line = location_get_source_line (exploc.file, exploc.line);
> > > +  char_span line = location_get_source_line (exploc);
> >
> > ...so this might continue to be:
> >
> >   char_span line = location_get_source_line (exploc.file, exploc.line);
> >
> > ...but expanded_location's "file" field would become a source_id,
> > rather than a const char *.  It looks like doing so might make a lot of
> > "is this the same file or buffer?"  turn into comparisons of source_id
> > instances.
> >
> > So I think expanded_location would become:
> >
> > typedef struct
> > {
> >   /* Either the name of the source file involved, or the
> >      specific generated buffer.  */
> >   source_id file;
> >
> >   /* The line-location in the source file.  */
> >   int line;
> >
> >   int column;
> >
> >   void *data;
> >
> >   /* In a system header?. */
> >   bool sysp;
> > } expanded_location;
> >
> > and we wouldn't need to add these extra fields:
> >
> > > +
> > > +  /* If generated data, the data and its length.  The data may contain embedded
> > > +   nulls and need not be null-terminated.  */
> > > +  unsigned int generated_data_len;
> > > +  const char *generated_data;
> > >  } expanded_location;
> >
> > and we could pass around source_id instances when identifying specific
> > filenames/generated buffers.
> >
> > Does this idea simplify/clarify the patch, or make it more complicated?
> >
> > [...snip...]
> >
> > Thoughts?
> > Dave
> >
> 
> Thanks, this makes sense and I think on balance it makes the interface
> nicer to do it this way. In the last patch of this series (for SARIF
> output) I had found it necessary to use a
>     typedef std::pair<const char*, unsigned int> filename_or_buffer;
> which was the same thing in spirit as your source_id. It makes sense
> to promote that to a real class and use it more widely.
> 
> I will send out an updated series with that change later after testing.
> 
> I don't think we can simply change "file" in expanded_location to be a
> source_id, because this field is used in lots of places that don't
> care about generated data buffers, and that interface change would
> necessitate touching all of them. For example, gengtype.cc uses libcpp
> and expanded_location, and there are a bunch of call sites in the
> middle end and backend that do as well, plus e.g. the custom
> diagnostics hooks in the C++ front end that print "inlined from
> xyz.cc". I thought about making source_id implicitly convertible to
> char*, but I think that is too error prone, plus it doesn't help with
> the most common use of this field, which is to pass it to printf().
> The approach I am planning to take is to leave "file" as it is in
> expanded_location, but to also add a "source_id src" field too. This
> way, the only call sites that need to be touched are those that care
> about this distinction, and so the number of changes is much more
> limited. But it does still achieve the goal that we don't need to use
> an expanded_location to communicate with input.cc, we can use a
> source_id instead, and that makes the interface more natural.
> 
> With the new interface, the main change needed for diagnostics code
> would be that instead of calling location_get_source_line(file_name,
> line), you would need to call location_get_source_line(src_id, line).
> In case both the source_id and the line number come from an
> expanded_location, there could be a convenience overload like
> location_get_source_line(exploc) also, but it wouldn't be necessary to
> involve an expanded_location if the source_id and line are better
> handled separately.
> 

Hello-

Here is the updated patch series with the interface change we
discussed.
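
For readers skimming the thread, the comparison semantics of the new
source_id class can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the code in the patch: the
name source_id_sketch is hypothetical, and std::strcmp stands in for
filename_cmp. File names compare by content, generated buffers compare by
identity (pointer and length), and a file name never equals a buffer.

```cpp
#include <cassert>
#include <cstring>

/* Hypothetical stand-in for the source_id class added by the patch.  */
class source_id_sketch
{
public:
  /* File-name case; implicit on purpose, since it is the common one.  */
  source_id_sketch (const char *filename = nullptr)
    : m_ptr (filename), m_len (0)
  {}

  /* In-memory buffer case; a zero length is reserved for file names.  */
  source_id_sketch (const char *buffer, unsigned buffer_len)
    : m_ptr (buffer), m_len (buffer_len)
  {
    assert (buffer_len > 0);
  }

  bool is_buffer () const { return m_len != 0; }

  bool operator== (const source_id_sketch &other) const
  {
    /* Different lengths means file vs. buffer, or two distinct buffers.  */
    if (m_len != other.m_len)
      return false;
    /* Buffers (and null file names) compare by pointer identity.  */
    if (is_buffer () || !m_ptr || !other.m_ptr)
      return m_ptr == other.m_ptr;
    /* Assumption: strcmp here, where the patch uses filename_cmp.  */
    return std::strcmp (m_ptr, other.m_ptr) == 0;
  }

  bool operator!= (const source_id_sketch &other) const
  {
    return !(*this == other);
  }

private:
  const char *m_ptr;
  unsigned m_len;
};
```

Comparing by content for file names matters because, with preprocessed
input, equal file names need not share a pointer.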

I also identified an issue with the previous version. I had intended for
struct line_map_ordinary to incur 0 space overhead with this change, but
with the prior approach, the size was increasing by 8 bytes (from 24 to
32). In this round, I changed the approach to use a union so there is no
size overhead. Rather, there is an extra level of indirection incurred only
when LC_GEN maps are used, so that the impact on current usage is minimal to
none.
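
To make the union idea concrete, here is a minimal sketch. It assumes
simplified stand-in types (gen_data, map_plain, map_union and map_text are
hypothetical names, not the real libcpp declarations, and the is_generated
flag stands in for checking reason == LC_GEN). The point is only that
overlaying the file-name pointer with a pointer to a separately allocated
descriptor leaves the struct size unchanged, at the cost of one extra
dereference for generated data.

```cpp
#include <cassert>
#include <cstring>

/* Allocated only when a map refers to generated data.  */
struct gen_data
{
  const char *data;
  unsigned int len;
};

/* Simplified "before" layout: just a file name.  */
struct map_plain
{
  const char *to_file;
  unsigned int to_line;
  bool is_generated;
};

/* Simplified "after" layout: the same slot holds either pointer.  */
struct map_union
{
  union
  {
    const char *file;   /* Valid when !is_generated.  */
    gen_data *data;     /* Valid when is_generated.  */
  } src;
  unsigned int to_line;
  bool is_generated;
};

/* The union of two pointers occupies the space of one pointer.  */
static_assert (sizeof (map_union) == sizeof (map_plain),
	       "union must not grow the map");

/* Return the file name or the generated text, storing its length in *LEN.
   Only the generated case pays for the extra indirection.  */
inline const char *
map_text (const map_union &m, unsigned int *len)
{
  assert (len != nullptr);
  if (m.is_generated)
    {
      *len = m.src.data->len;
      return m.src.data->data;
    }
  *len = (unsigned int) std::strlen (m.src.file);
  return m.src.file;
}
```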

While working on this version I felt that the first patch is really too
large. In this iteration, I split it into 6 patches to (hopefully)
make it easier to review. If that is inconvenient for any reason, please let
me know and I can send it the old way.

Thanks again for taking a look at it.

-Lewis

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v4 1/8] libcpp: Add LC_GEN linemaps to support in-memory buffers
  2023-08-09 22:14       ` [PATCH v4 0/8] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
@ 2023-08-09 22:14         ` Lewis Hyatt
  2023-08-11 22:45           ` David Malcolm
  2023-08-09 22:14         ` [PATCH v4 2/8] libcpp: diagnostics: Support generated data in expanded locations Lewis Hyatt
                           ` (6 subsequent siblings)
  7 siblings, 1 reply; 36+ messages in thread
From: Lewis Hyatt @ 2023-08-09 22:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm, Lewis Hyatt

Add a new linemap reason LC_GEN which enables encoding the location of data
that was generated during compilation and does not appear in any source file.
There could be many use cases, such as referring to the content of builtin
macros (not yet implemented, but an easy lift after this one).  The
first intended application is to create a place to store the input to a
_Pragma directive, so that proper locations can be assigned to those
tokens. This will be done in a subsequent commit.

The TO_FILE member of struct line_map_ordinary has been changed to a union
named SRC which can be either a file name, or a pointer to a line_map_data
struct describing the data. There is no space overhead added to the line
maps data structures.

Outside libcpp, this patch includes only the minimal changes implied by the
adjustment from TO_FILE to SRC in struct line_map_ordinary. Subsequent
patches will implement the new functionality.

libcpp/ChangeLog:

	* include/line-map.h (enum lc_reason): Add LC_GEN.
	(struct line_map_data): New struct.
	(struct line_map_ordinary): Change TO_FILE from a char* to a union,
	and rename to SRC.
	(class source_id): New class.
	(ORDINARY_MAP_GENERATED_DATA_P): New function.
	(ORDINARY_MAP_GENERATED_DATA): New function.
	(ORDINARY_MAP_GENERATED_DATA_LEN): New function.
	(ORDINARY_MAP_SOURCE_ID): New function.
	(ORDINARY_MAPS_SAME_FILE_P): New function.
	(ORDINARY_MAP_CONTAINING_FILE_NAME): Declare.
	(LINEMAP_FILE): Adapt to struct line_map_ordinary change.
	(linemap_get_file_highest_location): Likewise.
	* line-map.cc (source_id::operator==): New function.
	(ORDINARY_MAP_CONTAINING_FILE_NAME): New function.
	(linemap_add): Support creating LC_GEN maps.
	(linemap_line_start): Support LC_GEN maps.
	(linemap_check_files_exited): Likewise.
	(linemap_position_for_loc_and_offset): Likewise.
	(linemap_get_expansion_filename): Likewise.
	(linemap_dump): Likewise.
	(linemap_dump_location): Likewise.
	(linemap_get_file_highest_location): Likewise.
	* directives.cc (_cpp_do_file_change): Likewise.

gcc/c-family/ChangeLog:

	* c-common.cc (try_to_locate_new_include_insertion_point): Recognize
	and ignore LC_GEN maps.

gcc/cp/ChangeLog:

	* module.cc (module_state::write_ordinary_maps): Recognize and
	ignore LC_GEN maps, and adapt to interface change in struct
	line_map_ordinary.
	(module_state::read_ordinary_maps): Likewise.

gcc/ChangeLog:

	* diagnostic-show-locus.cc (compatible_locations_p): Adapt to
	interface change in struct line_map_ordinary.
	* input.cc (special_fname_generated): New function.
	(dump_location_info): Support LC_GEN maps.
	(get_substring_ranges_for_loc): Adapt to interface change in struct
	line_map_ordinary.
	* input.h (special_fname_generated): Declare.

gcc/go/ChangeLog:

	* go-linemap.cc (Gcc_linemap::to_string): Recognize and ignore
	LC_GEN maps.
---
 gcc/c-family/c-common.cc     |  11 ++-
 gcc/cp/module.cc             |   8 +-
 gcc/diagnostic-show-locus.cc |   2 +-
 gcc/go/go-linemap.cc         |   3 +-
 gcc/input.cc                 |  27 +++++-
 gcc/input.h                  |   1 +
 libcpp/directives.cc         |   4 +-
 libcpp/include/line-map.h    | 144 ++++++++++++++++++++++++----
 libcpp/line-map.cc           | 181 +++++++++++++++++++++++++----------
 9 files changed, 299 insertions(+), 82 deletions(-)

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 9fbaeb437a1..ecfc2efc29f 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -9206,19 +9206,22 @@ try_to_locate_new_include_insertion_point (const char *file, location_t loc)
       const line_map_ordinary *ord_map
 	= LINEMAPS_ORDINARY_MAP_AT (line_table, i);
 
+      if (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
+	continue;
+
       if (const line_map_ordinary *from
 	  = linemap_included_from_linemap (line_table, ord_map))
 	/* We cannot use pointer equality, because with preprocessed
 	   input all filename strings are unique.  */
-	if (0 == strcmp (from->to_file, file))
+	if (ORDINARY_MAP_SOURCE_ID (from) == file)
 	  {
 	    last_include_ord_map = from;
 	    last_ord_map_after_include = NULL;
 	  }
 
-      /* Likewise, use strcmp, and reject any line-zero introductory
-	 map.  */
-      if (ord_map->to_line && 0 == strcmp (ord_map->to_file, file))
+      /* Likewise, use strcmp (via the source_id comparison), and reject any
+	 line-zero introductory map.  */
+      if (ord_map->to_line && ORDINARY_MAP_SOURCE_ID (ord_map) == file)
 	{
 	  if (!first_ord_map_in_file)
 	    first_ord_map_in_file = ord_map;
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index ea362bdffa4..ff17cd57016 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -16250,6 +16250,8 @@ module_state::write_ordinary_maps (elf_out *to, range_t &info,
        iter != end; ++iter)
     if (iter->src != current)
       {
+	if (ORDINARY_MAP_GENERATED_DATA_P (iter->src))
+	  continue;
 	current = iter->src;
 	const char *fname = ORDINARY_MAP_FILE_NAME (iter->src);
 
@@ -16267,7 +16269,7 @@ module_state::write_ordinary_maps (elf_out *to, range_t &info,
 		   preprocessed input we could have multiple instances
 		   of the same name, and we'd rather not percolate
 		   that.  */
-		const_cast<line_map_ordinary *> (iter->src)->to_file = name;
+		const_cast<line_map_ordinary *> (iter->src)->src.file = name;
 		fname = NULL;
 		break;
 	      }
@@ -16295,6 +16297,8 @@ module_state::write_ordinary_maps (elf_out *to, range_t &info,
   for (auto iter = ord_loc_remap->begin (), end = ord_loc_remap->end ();
        iter != end; ++iter)
     {
+      if (ORDINARY_MAP_GENERATED_DATA_P (iter->src))
+	continue;
       dump (dumper::LOCATION)
 	&& dump ("Span:%u ordinary [%u+%u,+%u)->[%u,+%u)",
 		 iter - ord_loc_remap->begin (),
@@ -16456,7 +16460,7 @@ module_state::read_ordinary_maps (unsigned num_ord_locs, unsigned range_bits)
 	  map->m_range_bits = sec.u ();
 	  map->m_column_and_range_bits = sec.u () + map->m_range_bits;
 	  unsigned fnum = sec.u ();
-	  map->to_file = (fnum < filenames.length () ? filenames[fnum] : "");
+	  map->src.file = (fnum < filenames.length () ? filenames[fnum] : "");
 	  map->to_line = sec.u ();
 	  base = map;
 	}
diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
index 0514815b51f..a2aa6b4e0b5 100644
--- a/gcc/diagnostic-show-locus.cc
+++ b/gcc/diagnostic-show-locus.cc
@@ -998,7 +998,7 @@ compatible_locations_p (location_t loc_a, location_t loc_b)
 	 are in the same file.  */
       const line_map_ordinary *ord_map_a = linemap_check_ordinary (map_a);
       const line_map_ordinary *ord_map_b = linemap_check_ordinary (map_b);
-      return ord_map_a->to_file == ord_map_b->to_file;
+      return ORDINARY_MAPS_SAME_FILE_P (ord_map_a, ord_map_b);
     }
 }
 
diff --git a/gcc/go/go-linemap.cc b/gcc/go/go-linemap.cc
index 1d72e79647d..02d4ce04181 100644
--- a/gcc/go/go-linemap.cc
+++ b/gcc/go/go-linemap.cc
@@ -84,7 +84,8 @@ Gcc_linemap::to_string(Location location)
   resolved_location =
       linemap_resolve_location (line_table, location.gcc_location(),
                                 LRK_SPELLING_LOCATION, &lmo);
-  if (lmo == NULL || resolved_location < RESERVED_LOCATION_COUNT)
+  if (lmo == NULL || resolved_location < RESERVED_LOCATION_COUNT
+      || ORDINARY_MAP_GENERATED_DATA_P (lmo))
     return "";
   const char *path = LINEMAP_FILE (lmo);
   if (!path)
diff --git a/gcc/input.cc b/gcc/input.cc
index eaf301ec7c1..c1735215b29 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -35,6 +35,12 @@ special_fname_builtin ()
   return _("<built-in>");
 }
 
+const char *
+special_fname_generated ()
+{
+  return _("<generated>");
+}
+
 /* Input charset configuration.  */
 static const char *default_charset_callback (const char *)
 {
@@ -1391,7 +1397,19 @@ dump_location_info (FILE *stream)
       fprintf (stream, "ORDINARY MAP: %i\n", idx);
       dump_location_range (stream,
 			   MAP_START_LOCATION (map), end_location);
-      fprintf (stream, "  file: %s\n", ORDINARY_MAP_FILE_NAME (map));
+
+      if (ORDINARY_MAP_GENERATED_DATA_P (map))
+	{
+	  fprintf (stream, "  file: %s%s\n",
+		   ORDINARY_MAP_CONTAINING_FILE_NAME (line_table, map),
+		   special_fname_generated ());
+	  fprintf (stream, "  data: %.*s\n",
+		   (int) ORDINARY_MAP_GENERATED_DATA_LEN (map),
+		   ORDINARY_MAP_GENERATED_DATA (map));
+	}
+      else
+	fprintf (stream, "  file: %s\n", LINEMAP_FILE (map));
+
       fprintf (stream, "  starting at line: %i\n",
 	       ORDINARY_MAP_STARTING_LINE_NUMBER (map));
       fprintf (stream, "  column and range bits: %i\n",
@@ -1417,6 +1435,9 @@ dump_location_info (FILE *stream)
       case LC_ENTER_MACRO:
 	reason = "LC_RENAME_MACRO";
 	break;
+      case LC_GEN:
+	reason = "LC_GEN";
+	break;
       default:
 	reason = "Unknown";
       }
@@ -1814,11 +1835,11 @@ get_substring_ranges_for_loc (cpp_reader *pfile,
       /* Bulletproofing.  We ought to only have different ordinary maps
 	 for start vs finish due to line-length jumps.  */
       if (start_ord_map != final_ord_map
-	  && start_ord_map->to_file != final_ord_map->to_file)
+	  && !ORDINARY_MAPS_SAME_FILE_P (start_ord_map, final_ord_map))
 	return "start and finish are spelled in different ordinary maps";
       /* The file from linemap_resolve_location ought to match that from
 	 expand_location_to_spelling_point.  */
-      if (start_ord_map->to_file != start.file)
+      if (ORDINARY_MAP_SOURCE_ID (start_ord_map) != start.file)
 	return "mismatching file after resolving linemap";
 
       location_t start_loc
diff --git a/gcc/input.h b/gcc/input.h
index d1087b7a9e8..1b81a995f86 100644
--- a/gcc/input.h
+++ b/gcc/input.h
@@ -34,6 +34,7 @@ extern GTY(()) class line_maps *saved_line_table;
 
 /* Returns the translated string referring to the special location.  */
 const char *special_fname_builtin ();
+const char *special_fname_generated ();
 
 /* line-map.cc reserves RESERVED_LOCATION_COUNT to the user.  Ensure
    both UNKNOWN_LOCATION and BUILTINS_LOCATION fit into that.  */
diff --git a/libcpp/directives.cc b/libcpp/directives.cc
index ee5419d1f40..dfd782b3fca 100644
--- a/libcpp/directives.cc
+++ b/libcpp/directives.cc
@@ -1165,7 +1165,7 @@ _cpp_do_file_change (cpp_reader *pfile, enum lc_reason reason,
 		     const char *to_file, linenum_type to_line,
 		     unsigned int sysp)
 {
-  linemap_assert (reason != LC_ENTER_MACRO);
+  linemap_assert (reason != LC_ENTER_MACRO && reason != LC_GEN);
 
   const line_map_ordinary *ord_map = NULL;
   if (!to_line && reason == LC_RENAME_VERBATIM)
@@ -1176,7 +1176,7 @@ _cpp_do_file_change (cpp_reader *pfile, enum lc_reason reason,
          preprocessed source.  */
       line_map_ordinary *last = LINEMAPS_LAST_ORDINARY_MAP (pfile->line_table);
       if (!ORDINARY_MAP_STARTING_LINE_NUMBER (last)
-	  && 0 == filename_cmp (to_file, ORDINARY_MAP_FILE_NAME (last))
+	  && ORDINARY_MAP_SOURCE_ID (last) == to_file
 	  && SOURCE_LINE (last, pfile->line_table->highest_line) == 2)
 	{
 	  ord_map = last;
diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
index 44fea0ea08e..e59123b18c5 100644
--- a/libcpp/include/line-map.h
+++ b/libcpp/include/line-map.h
@@ -75,6 +75,8 @@ enum lc_reason
   LC_RENAME_VERBATIM,	/* Likewise, but "" != stdin.  */
   LC_ENTER_MACRO,	/* Begin macro expansion.  */
   LC_MODULE,		/* A (C++) Module.  */
+  LC_GEN,		/* Internally generated source.  */
+
   /* FIXME: add support for stringize and paste.  */
   LC_HWM /* High Water Mark.  */
 };
@@ -355,6 +357,16 @@ typedef void *(*line_map_realloc) (void *, size_t);
    for a given requested allocation.  */
 typedef size_t (*line_map_round_alloc_size_func) (size_t);
 
+/* Struct to hold the data + size for in-memory data to be stored in a
+   line_map_ordinary.  Because this is used rarely, it is better to
+   dynamically allocate this struct just when needed, rather than adding
+   overhead to every line_map to store the extra field.  */
+struct GTY(()) line_map_data
+{
+  const char * GTY((string_length ("%h.len"))) data;
+  unsigned int len;
+};
+
 /* A line_map encodes a sequence of locations.
    There are two kinds of maps. Ordinary maps and macro expansion
    maps, a.k.a macro maps.
@@ -437,9 +449,15 @@ struct GTY((tag ("1"))) line_map_ordinary : public line_map {
 
   /* Pointer alignment boundary on both 32 and 64-bit systems.  */
 
-  const char *to_file;
-  linenum_type to_line;
+  /* SRC is either the file name, in the typical case, or a pointer to
+     a line_map_data which shows where to find the actual data, for the
+     case of an LC_GEN map.  */
+  union {
+    const char * GTY((tag ("false"))) file;
+    line_map_data * GTY((tag ("true"))) data;
+  } GTY((desc ("ORDINARY_MAP_GENERATED_DATA_P (&%1)"))) src;
 
+  linenum_type to_line;
   /* Location from whence this line map was included.  For regular
      #includes, this location will be the last location of a map.  For
      outermost file, this is 0.  For modules it could be anywhere
@@ -565,6 +583,42 @@ struct GTY((tag ("2"))) line_map_macro : public line_map {
 #define linemap_assert_fails(EXPR) (! (EXPR))
 #endif
 
+/* A source_id represents a location that contains source code, which is usually
+   the name of a file.  But if the buffer length is non-zero, then it refers
+   instead to an in-memory buffer.  This is used so that diagnostics can refer
+   to generated data as well as to normal source code.  */
+
+class source_id
+{
+public:
+  /* This constructor is for the typical case, where the source code lives in
+     a file.  It is not explicit, because this case is by far the most common
+     one, it is worthwhile to allow implicit construction from a string.  */
+  source_id (const char *filename = nullptr)
+    : m_filename_or_buffer (filename),
+      m_len (0)
+  {}
+
+  /* This constructor is for the in-memory data case.  */
+  source_id (const char *buffer, unsigned buffer_len)
+    : m_filename_or_buffer (buffer),
+      m_len (buffer_len)
+  {
+    linemap_assert (buffer_len > 0);
+  }
+
+  explicit operator bool () const { return m_filename_or_buffer; }
+  const char * get_filename_or_buffer () const { return m_filename_or_buffer; }
+  unsigned get_buffer_len () const { return m_len; }
+  bool is_buffer () const { return m_len; }
+  bool operator== (source_id src) const;
+  bool operator!= (source_id src) const { return !(*this == src); }
+
+private:
+  const char *m_filename_or_buffer;
+  unsigned m_len;
+};
+
 /* Get whether location LOC is an ordinary location.  */
 
 inline bool
@@ -662,6 +716,12 @@ ORDINARY_MAP_IN_SYSTEM_HEADER_P (const line_map_ordinary *ord_map)
   return ord_map->sysp;
 }
 
+/* TRUE if this line map contains generated data.  */
+inline bool ORDINARY_MAP_GENERATED_DATA_P (const line_map_ordinary *ord_map)
+{
+  return ord_map->reason == LC_GEN;
+}
+
 /* TRUE if this line map is for a module (not a source file).  */
 
 inline bool
@@ -671,14 +731,46 @@ MAP_MODULE_P (const line_map *map)
 	  && linemap_check_ordinary (map)->reason == LC_MODULE);
 }
 
-/* Get the filename of ordinary map MAP.  */
+/* Get the data contents of ordinary map MAP.  */
 
 inline const char *
 ORDINARY_MAP_FILE_NAME (const line_map_ordinary *ord_map)
 {
-  return ord_map->to_file;
+  linemap_assert (ord_map->reason != LC_GEN);
+  return ord_map->src.file;
+}
+
+inline const char *
+ORDINARY_MAP_GENERATED_DATA (const line_map_ordinary *ord_map)
+{
+  linemap_assert (ord_map->reason == LC_GEN);
+  return ord_map->src.data->data;
+}
+
+inline unsigned int
+ORDINARY_MAP_GENERATED_DATA_LEN (const line_map_ordinary *ord_map)
+{
+  linemap_assert (ord_map->reason == LC_GEN);
+  return ord_map->src.data->len;
+}
+
+inline source_id ORDINARY_MAP_SOURCE_ID (const line_map_ordinary *ord_map)
+{
+  if (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
+    return source_id {ord_map->src.data->data, ord_map->src.data->len};
+  return source_id {ord_map->src.file};
+}
+
+/* If we just want to know whether two maps point to the same
+   file/buffer or not.  */
+inline bool
+ORDINARY_MAPS_SAME_FILE_P (const line_map_ordinary *map1,
+			   const line_map_ordinary *map2)
+{
+  return ORDINARY_MAP_SOURCE_ID (map1) == ORDINARY_MAP_SOURCE_ID (map2);
 }
 
+
 /* Get the cpp macro whose expansion gave birth to macro map MAP.  */
 
 inline cpp_hashnode *
@@ -1093,21 +1185,28 @@ extern location_t linemap_line_start
 extern line_map *line_map_new_raw (line_maps *, bool, unsigned);
 
 /* Add a mapping of logical source line to physical source file and
-   line number. This function creates an "ordinary map", which is a
+   line number.  This function creates an "ordinary map", which is a
    map that records locations of tokens that are not part of macro
    replacement-lists present at a macro expansion point.
 
-   The text pointed to by TO_FILE must have a lifetime
-   at least as long as the lifetime of SET.  An empty
-   TO_FILE means standard input.  If reason is LC_LEAVE, and
-   TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their
-   natural values considering the file we are returning to.
+   The text pointed to by FILENAME_OR_BUFFER must have a lifetime at least as
+   long as the lifetime of SET.  If reason is LC_LEAVE, and FILENAME_OR_BUFFER
+   is NULL, then FILENAME_OR_BUFFER, TO_LINE and SYSP are given their natural
+   values considering the file we are returning to.  If reason is LC_GEN, then
+   FILENAME_OR_BUFFER is the actual content, and DATA_LEN>0 is the length of it.
+   Otherwise FILENAME_OR_BUFFER is a file name and DATA_LEN is ignored.
+
+   If reason is LC_RENAME, and the map being renamed from is an LC_GEN map,
+   then FILENAME_OR_BUFFER may be NULL and will be copied from the source
+   map.
+
+   A call to this function can relocate the previous set of maps, so any stored
+   line_map pointers should not be used.  */
 
-   A call to this function can relocate the previous set of
-   maps, so any stored line_map pointers should not be used.  */
 extern const line_map *linemap_add
   (class line_maps *, enum lc_reason, unsigned int sysp,
-   const char *to_file, linenum_type to_line);
+   const char *filename_or_buffer, linenum_type to_line,
+   unsigned int data_len = 0);
 
 /* Create a macro map.  A macro map encodes source locations of tokens
    that are part of a macro replacement-list, at a macro expansion
@@ -1257,7 +1356,7 @@ linemap_position_for_loc_and_offset (class line_maps *set,
 inline const char *
 LINEMAP_FILE (const line_map_ordinary *ord_map)
 {
-  return ord_map->to_file;
+  return ORDINARY_MAP_FILE_NAME (ord_map);
 }
 
 /* Return the line number this map started encoding location from.  */
@@ -1277,6 +1376,13 @@ LINEMAP_SYSP (const line_map_ordinary *ord_map)
   return ord_map->sysp;
 }
 
+/* For a normal ordinary map, this is the same as ORDINARY_MAP_FILE_NAME;
+   but for an LC_GEN map, it returns the file name from which the data
+   originated, instead of asserting.  */
+const char *
+ORDINARY_MAP_CONTAINING_FILE_NAME (line_maps *set,
+				   const line_map_ordinary *ord_map);
+
 const struct line_map *first_map_in_common (line_maps *set,
 					    location_t loc0,
 					    location_t loc1,
@@ -2104,12 +2210,10 @@ struct linemap_stats
   long adhoc_table_entries_used;
 };
 
-/* Return the highest location emitted for a given file for which
-   there is a line map in SET.  FILE_NAME is the file name to
-   consider.  If the function returns TRUE, *LOC is set to the highest
-   location emitted for that file.  */
-bool linemap_get_file_highest_location (class line_maps * set,
-					const char *file_name,
+/* Return the highest location emitted for a given source ID for which there is
+   a line map in SET.  If the function returns TRUE, *LOC is set to the highest
+   location emitted for that source.  */
+bool linemap_get_file_highest_location (line_maps *set, source_id src,
 					location_t *loc);
 
 /* Compute and return statistics about the memory consumption of some
diff --git a/libcpp/line-map.cc b/libcpp/line-map.cc
index e0f82e20571..e63916054e0 100644
--- a/libcpp/line-map.cc
+++ b/libcpp/line-map.cc
@@ -48,6 +48,31 @@ static location_t linemap_macro_loc_to_exp_point (line_maps *,
 extern unsigned num_expanded_macros_counter;
 extern unsigned num_macro_tokens_counter;
 
+bool
+source_id::operator== (source_id src) const
+{
+  return m_len == src.m_len
+    && (is_buffer () || !m_filename_or_buffer || !src.m_filename_or_buffer
+	? m_filename_or_buffer == src.m_filename_or_buffer
+	: !filename_cmp (m_filename_or_buffer, src.m_filename_or_buffer));
+}
+
+/* For a normal ordinary map, this is the same as ORDINARY_MAP_FILE_NAME;
+   but for an LC_GEN map, it returns the file name from which the data
+   originated, instead of asserting.  */
+const char *
+ORDINARY_MAP_CONTAINING_FILE_NAME (line_maps *set,
+				   const line_map_ordinary *ord_map)
+{
+  while (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
+    {
+      ord_map = linemap_included_from_linemap (set, ord_map);
+      if (!ord_map)
+	return "-";
+    }
+  return ORDINARY_MAP_FILE_NAME (ord_map);
+}
+
 /* Destructor for class line_maps.
    Ensure non-GC-managed memory is released.  */
 
@@ -411,8 +436,9 @@ linemap_check_files_exited (line_maps *set)
   for (const line_map_ordinary *map = LINEMAPS_LAST_ORDINARY_MAP (set);
        ! MAIN_FILE_P (map);
        map = linemap_included_from_linemap (set, map))
-    fprintf (stderr, "line-map.cc: file \"%s\" entered but not left\n",
-	     ORDINARY_MAP_FILE_NAME (map));
+    fprintf (stderr, "line-map.cc: file \"%s%s\" entered but not left\n",
+	     ORDINARY_MAP_CONTAINING_FILE_NAME (set, map),
+	     ORDINARY_MAP_GENERATED_DATA_P (map) ? "<generated>" : "");
 }
 
 /* Create NUM zero-initialized maps of type MACRO_P.  */
@@ -505,21 +531,28 @@ LAST_SOURCE_LINE_LOCATION (const line_map_ordinary *map)
 }
 
 /* Add a mapping of logical source line to physical source file and
-   line number.
+   line number.  This function creates an "ordinary map", which is a
+   map that records locations of tokens that are not part of macro
+   replacement-lists present at a macro expansion point.
+
+   The text pointed to by FILENAME_OR_BUFFER must have a lifetime at least as
+   long as the lifetime of SET.  If reason is LC_LEAVE, and FILENAME_OR_BUFFER
+   is NULL, then FILENAME_OR_BUFFER, TO_LINE and SYSP are given their natural
+   values considering the file we are returning to.  If reason is LC_GEN, then
+   FILENAME_OR_BUFFER is the actual content, and DATA_LEN>0 is the length of it.
+   Otherwise FILENAME_OR_BUFFER is a file name and DATA_LEN is ignored.
 
-   The text pointed to by TO_FILE must have a lifetime
-   at least as long as the final call to lookup_line ().  An empty
-   TO_FILE means standard input.  If reason is LC_LEAVE, and
-   TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their
-   natural values considering the file we are returning to.
+   If reason is LC_RENAME, and the map being renamed from is an LC_GEN map,
+   then FILENAME_OR_BUFFER may be NULL and will be copied from the source
+   map.
 
-   FROM_LINE should be monotonic increasing across calls to this
-   function.  A call to this function can relocate the previous set of
-   maps, so any stored line_map pointers should not be used.  */
+   A call to this function can relocate the previous set of maps, so any stored
+   line_map pointers should not be used.  */
 
 const struct line_map *
 linemap_add (line_maps *set, enum lc_reason reason,
-	     unsigned int sysp, const char *to_file, linenum_type to_line)
+	     unsigned int sysp, const char *filename_or_buffer,
+	     linenum_type to_line, unsigned int data_len)
 {
   /* Generate a start_location above the current highest_location.
      If possible, make the low range bits be zero.  */
@@ -536,12 +569,24 @@ linemap_add (line_maps *set, enum lc_reason reason,
 
   /* When we enter the file for the first time reason cannot be
      LC_RENAME.  */
-  linemap_assert (!(set->depth == 0 && reason == LC_RENAME));
+  line_map_data *data_to_reuse = nullptr;
+  bool is_data_map = (reason == LC_GEN);
+  if (reason == LC_RENAME || reason == LC_RENAME_VERBATIM)
+    {
+      linemap_assert (set->depth != 0);
+      const auto prev = LINEMAPS_LAST_ORDINARY_MAP (set);
+      linemap_assert (prev);
+      if (prev->reason == LC_GEN)
+	{
+	  data_to_reuse = prev->src.data;
+	  is_data_map = true;
+	}
+    }
 
   /* If we are leaving the main file, return a NULL map.  */
   if (reason == LC_LEAVE
       && MAIN_FILE_P (LINEMAPS_LAST_ORDINARY_MAP (set))
-      && to_file == NULL)
+      && filename_or_buffer == NULL)
     {
       set->depth--;
       return NULL;
@@ -557,8 +602,9 @@ linemap_add (line_maps *set, enum lc_reason reason,
     = linemap_check_ordinary (new_linemap (set, start_location));
   map->reason = reason;
 
-  if (to_file && *to_file == '\0' && reason != LC_RENAME_VERBATIM)
-    to_file = "<stdin>";
+  if (filename_or_buffer && *filename_or_buffer == '\0'
+      && reason != LC_RENAME_VERBATIM && !is_data_map)
+    filename_or_buffer = "<stdin>";
 
   if (reason == LC_RENAME_VERBATIM)
     reason = LC_RENAME;
@@ -577,21 +623,50 @@ linemap_add (line_maps *set, enum lc_reason reason,
 	 that comes right before MAP in the same file.  */
       from = linemap_included_from_linemap (set, map - 1);
 
-      /* A TO_FILE of NULL is special - we use the natural values.  */
-      if (to_file == NULL)
+      /* Not currently supporting a #include originating from an LC_GEN
+	 map, since there is no clear use case for this and it would complicate
+	 the logic here.  */
+      linemap_assert (!ORDINARY_MAP_GENERATED_DATA_P (from));
+
+      /* A null FILENAME_OR_BUFFER is special - we use the natural
+	 values.  */
+      if (!filename_or_buffer)
 	{
-	  to_file = ORDINARY_MAP_FILE_NAME (from);
+	  filename_or_buffer = from->src.file;
 	  to_line = SOURCE_LINE (from, from[1].start_location);
 	  sysp = ORDINARY_MAP_IN_SYSTEM_HEADER_P (from);
 	}
       else
 	linemap_assert (filename_cmp (ORDINARY_MAP_FILE_NAME (from),
-				      to_file) == 0);
+				      filename_or_buffer) == 0);
     }
 
   map->sysp = sysp;
-  map->to_file = to_file;
   map->to_line = to_line;
+
+  if (is_data_map)
+    {
+      /* All data maps should have reason == LC_GEN, even if they were
+	 an LC_RENAME, to keep it simple to check which maps contain
+	 data.  */
+      map->reason = LC_GEN;
+
+      if (data_to_reuse)
+	map->src.data = data_to_reuse;
+      else
+	{
+	  auto src_data
+	    = (line_map_data *)set->reallocator (nullptr,
+						 sizeof (line_map_data));
+	  src_data->data = filename_or_buffer;
+	  src_data->len = data_len;
+	  gcc_assert (data_len);
+	  map->src.data = src_data;
+	}
+    }
+  else
+    map->src.file = filename_or_buffer;
+
   LINEMAPS_ORDINARY_CACHE (set) = LINEMAPS_ORDINARY_USED (set) - 1;
   /* Do not store range_bits here.  That's readjusted in
      linemap_line_start.  */
@@ -606,7 +681,7 @@ linemap_add (line_maps *set, enum lc_reason reason,
      pure_location_p.  */
   linemap_assert (pure_location_p (set, start_location));
 
-  if (reason == LC_ENTER)
+  if (reason == LC_ENTER || reason == LC_GEN)
     {
       if (set->depth == 0)
 	map->included_from = 0;
@@ -617,7 +692,7 @@ linemap_add (line_maps *set, enum lc_reason reason,
 	      & ~((1 << map[-1].m_column_and_range_bits) - 1))
 	     + map[-1].start_location);
       set->depth++;
-      if (set->trace_includes)
+      if (set->trace_includes && reason == LC_ENTER)
 	trace_include (set, map);
     }
   else if (reason == LC_RENAME)
@@ -859,12 +934,16 @@ linemap_line_start (line_maps *set, linenum_type to_line,
 	      >= (((uint64_t) 1)
 		  << (CHAR_BIT * sizeof (linenum_type) - column_bits)))
 	  || range_bits < map->m_range_bits)
-	map = linemap_check_ordinary
-	        (const_cast <line_map *>
-		  (linemap_add (set, LC_RENAME,
-				ORDINARY_MAP_IN_SYSTEM_HEADER_P (map),
-				ORDINARY_MAP_FILE_NAME (map),
-				to_line)));
+	{
+	  const auto maybe_filename = ORDINARY_MAP_GENERATED_DATA_P (map)
+	    ? nullptr : map->src.file;
+	  map = linemap_check_ordinary
+	    (const_cast <line_map *>
+	     (linemap_add (set, LC_RENAME,
+			   ORDINARY_MAP_IN_SYSTEM_HEADER_P (map),
+			   maybe_filename,
+			   to_line)));
+	}
       map->m_column_and_range_bits = column_bits;
       map->m_range_bits = range_bits;
       r = (MAP_START_LOCATION (map)
@@ -1023,9 +1102,9 @@ linemap_position_for_loc_and_offset (line_maps *set,
 	     >= MAP_START_LOCATION (map + 1)); map++)
     /* If the next map is a different file, or starts in a higher line, we
        cannot encode the location there.  */
-    if ((map + 1)->reason != LC_RENAME
+    if (((map + 1)->reason != LC_RENAME && (map + 1)->reason != LC_GEN)
 	|| line < ORDINARY_MAP_STARTING_LINE_NUMBER (map + 1)
-	|| 0 != strcmp (LINEMAP_FILE (map + 1), LINEMAP_FILE (map)))
+	|| !ORDINARY_MAPS_SAME_FILE_P (map, map + 1))
       return loc;
 
   column += column_offset;
@@ -1283,7 +1362,7 @@ linemap_get_expansion_filename (line_maps *set,
 
   linemap_macro_loc_to_exp_point (set, location, &map);
 
-  return LINEMAP_FILE (map);
+  return ORDINARY_MAP_CONTAINING_FILE_NAME (set, map);
 }
 
 /* Return the name of the macro associated to MACRO_MAP.  */
@@ -1873,7 +1952,7 @@ linemap_dump (FILE *stream, class line_maps *set, unsigned ix, bool is_macro)
 {
   const char *const lc_reasons_v[LC_HWM]
       = { "LC_ENTER", "LC_LEAVE", "LC_RENAME", "LC_RENAME_VERBATIM",
-	  "LC_ENTER_MACRO", "LC_MODULE" };
+	  "LC_ENTER_MACRO", "LC_MODULE", "LC_GEN" };
   const line_map *map;
   unsigned reason;
 
@@ -1903,11 +1982,15 @@ linemap_dump (FILE *stream, class line_maps *set, unsigned ix, bool is_macro)
       const line_map_ordinary *includer_map
 	= linemap_included_from_linemap (set, ord_map);
 
-      fprintf (stream, "File: %s:%d\n", ORDINARY_MAP_FILE_NAME (ord_map),
+      fprintf (stream, "File: %s:%d\n",
+	       ORDINARY_MAP_GENERATED_DATA_P (ord_map) ? "<generated>"
+	       : ORDINARY_MAP_FILE_NAME (ord_map),
 	       ORDINARY_MAP_STARTING_LINE_NUMBER (ord_map));
       fprintf (stream, "Included from: [%d] %s\n",
 	       includer_map ? int (includer_map - set->info_ordinary.maps) : -1,
-	       includer_map ? ORDINARY_MAP_FILE_NAME (includer_map) : "None");
+	       includer_map ? ORDINARY_MAP_CONTAINING_FILE_NAME (set,
+								 includer_map)
+	       : "None");
     }
   else
     {
@@ -1931,7 +2014,7 @@ linemap_dump_location (line_maps *set,
 {
   const line_map_ordinary *map;
   location_t location;
-  const char *path = "", *from = "";
+  const char *path = "", *path_suffix = "", *from = "";
   int l = -1, c = -1, s = -1, e = -1;
 
   if (IS_ADHOC_LOC (loc))
@@ -1948,7 +2031,9 @@ linemap_dump_location (line_maps *set,
     linemap_assert (location < RESERVED_LOCATION_COUNT);
   else
     {
-      path = LINEMAP_FILE (map);
+      path = ORDINARY_MAP_CONTAINING_FILE_NAME (set, map);
+      if (ORDINARY_MAP_GENERATED_DATA_P (map))
+	path_suffix = "<generated>";
       l = SOURCE_LINE (map, location);
       c = SOURCE_COLUMN (map, location);
       s = LINEMAP_SYSP (map) != 0;
@@ -1959,24 +2044,23 @@ linemap_dump_location (line_maps *set,
 	{
 	  const line_map_ordinary *from_map
 	    = linemap_included_from_linemap (set, map);
-	  from = from_map ? LINEMAP_FILE (from_map) : "<NULL>";
+	  from = from_map ? ORDINARY_MAP_CONTAINING_FILE_NAME (set, from_map)
+	    : "<NULL>";
 	}
     }
 
   /* P: path, L: line, C: column, S: in-system-header, M: map address,
      E: macro expansion?, LOC: original location, R: resolved location   */
-  fprintf (stream, "{P:%s;F:%s;L:%d;C:%d;S:%d;M:%p;E:%d,LOC:%d,R:%d}",
-	   path, from, l, c, s, (void*)map, e, loc, location);
+  fprintf (stream, "{P:%s%s;F:%s;L:%d;C:%d;S:%d;M:%p;E:%d,LOC:%d,R:%d}",
+	   path, path_suffix, from, l, c, s, (void*)map, e, loc, location);
 }
 
-/* Return the highest location emitted for a given file for which
-   there is a line map in SET.  FILE_NAME is the file name to
-   consider.  If the function returns TRUE, *LOC is set to the highest
-   location emitted for that file.  */
+/* Return the highest location emitted for a given source ID for which there is
+   a line map in SET.  If the function returns TRUE, *LOC is set to the highest
+   location emitted for that source.  */
 
 bool
-linemap_get_file_highest_location (line_maps *set,
-				   const char *file_name,
+linemap_get_file_highest_location (line_maps *set, source_id src,
 				   location_t *loc)
 {
   /* If the set is empty or no ordinary map has been created then
@@ -1984,12 +2068,11 @@ linemap_get_file_highest_location (line_maps *set,
   if (set == NULL || set->info_ordinary.used == 0)
     return false;
 
-  /* Now look for the last ordinary map created for FILE_NAME.  */
+  /* Now look for the last ordinary map created for this file.  */
   int i;
   for (i = set->info_ordinary.used - 1; i >= 0; --i)
     {
-      const char *fname = set->info_ordinary.maps[i].to_file;
-      if (fname && !filename_cmp (fname, file_name))
+      if (ORDINARY_MAP_SOURCE_ID (set->info_ordinary.maps + i) == src)
 	break;
     }
 


* [PATCH v4 2/8] libcpp: diagnostics: Support generated data in expanded locations
  2023-08-09 22:14       ` [PATCH v4 0/8] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
  2023-08-09 22:14         ` [PATCH v4 1/8] libcpp: Add LC_GEN linemaps to support in-memory buffers Lewis Hyatt
@ 2023-08-09 22:14         ` Lewis Hyatt
  2023-08-11 23:02           ` David Malcolm
  2023-08-09 22:14         ` [PATCH v4 3/8] diagnostics: Refactor class file_cache_slot Lewis Hyatt
                           ` (5 subsequent siblings)
  7 siblings, 1 reply; 36+ messages in thread
From: Lewis Hyatt @ 2023-08-09 22:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm, Lewis Hyatt

The previous patch in this series introduced the concept of LC_GEN line
maps. This patch continues on the path to using them to improve _Pragma
diagnostics, by adding a new source_id SRC member to struct
expanded_location, which is populated by linemap_expand_location. This
member allows call sites to detect and handle when a location refers to
generated data rather than a plain file name.

The existing FILE member of expanded_location is preserved (although it is
redundant with SRC), so that call sites which do not and never will care
about generated data need not change. Call sites that do care are modified
here to compare locations using SRC rather than FILE.
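
To illustrate why the comparisons move from FILE to SRC, here is a minimal
sketch. It assumes a simplified stand-in for source_id: the names
toy_source_id, toy_exploc and same_source_p are hypothetical and exist only
for this example, while the real source_id class comes from patch 1/8 and may
differ. The point is that two locations in different generated buffers can
share the same placeholder FILE name yet still be distinguishable through SRC.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the patch's source_id; the real class is
// defined in patch 1/8 and may differ in layout and semantics.
class toy_source_id
{
public:
  toy_source_id () = default;
  // A plain file name: a length of zero marks "not a buffer".
  toy_source_id (const char *filename) : m_ptr (filename) {}
  // An in-memory buffer of LEN bytes.
  toy_source_id (const char *buffer, std::size_t len)
    : m_ptr (buffer), m_len (len) {}

  bool is_buffer () const { return m_len != 0; }
  const char *get_filename_or_buffer () const { return m_ptr; }
  explicit operator bool () const { return m_ptr != nullptr; }

  // Identity comparison on pointer and length, no string comparison.
  friend bool operator== (const toy_source_id &a, const toy_source_id &b)
  { return a.m_ptr == b.m_ptr && a.m_len == b.m_len; }
  friend bool operator!= (const toy_source_id &a, const toy_source_id &b)
  { return !(a == b); }

private:
  const char *m_ptr = nullptr;
  std::size_t m_len = 0;
};

// Hypothetical, reduced analogue of expanded_location.
struct toy_exploc
{
  const char *file = nullptr;
  toy_source_id src;
  int line = 0;
  int column = 0;
};

// Mirrors the shape of the updated call sites: compare SRC, not FILE.
bool same_source_p (const toy_exploc &a, const toy_exploc &b)
{
  return a.src == b.src;
}
```

In this sketch, two explocs whose FILE both point at a shared "<generated>"
placeholder compare equal by FILE but unequal by SRC when they refer to
different buffers, which is the distinction the modified call sites need.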

libcpp/ChangeLog:

	* include/line-map.h (struct expanded_location): Add SRC member. Add
	zero-initializers for all members, since source_id is not a POD
	type.
	(class fixit_hint): Adjust prototype.
	* line-map.cc (linemap_expand_location): Populate the new SRC member
	in the expanded_location.
	(rich_location::maybe_add_fixit): Compare explocs with the new SRC
	field instead of the FILE field.
	(fixit_hint::affects_line_p): Accept a source_id instead of a file
	name, and use it for the comparisons.

gcc/c-family/ChangeLog:

	* c-format.cc (get_corrected_substring): Compare explocs with the
	new SRC field instead of the FILE field.
	* c-indentation.cc (should_warn_for_misleading_indentation): Likewise.
	(assert_get_visual_column_succeeds): Initialize the SRC field in the
	test expanded_location.
	(assert_get_visual_column_fails): Likewise.

gcc/ChangeLog:

	* diagnostic-show-locus.cc (make_range): Adapt to the new
	constructor semantics for struct expanded_location.
	(layout::maybe_add_location_range): Compare explocs with the new SRC
	field instead of the FILE field.
	(layout::validate_fixit_hint_p): Likewise.
	(layout::print_leading_fixits): Use the SRC field in struct
	expanded_location to query fixit_hint::affects_line_p.
	(layout::print_trailing_fixits): Likewise.
	* diagnostic.cc (diagnostic_report_current_module): Use the new SRC
	field in expanded_location to detect LC_GEN locations and identify
	them as such.
	(assert_location_text): Adapt to the new constructor semantics for
	struct expanded_location.
	* input.cc (expand_location_1): Likewise. And when libcpp's
	linemap_expand_location returns a null FILE for generated data,
	replace it with special_fname_generated ().
	(total_lines_num): Handle a generic source_id argument rather than a
	file name only.
	(get_source_text_between): Compare explocs with the new SRC field
	instead of the FILE field.
	(get_substring_ranges_for_loc): Likewise.
	* edit-context.cc (edit_context::apply_fixit): Ignore locations in
	generated data.
	* input.h (LOCATION_SRC): New accessor macro.
---
 gcc/c-family/c-format.cc      |  4 ++--
 gcc/c-family/c-indentation.cc | 10 +++++-----
 gcc/diagnostic-show-locus.cc  | 30 +++++++++++++++++-------------
 gcc/diagnostic.cc             | 19 ++++++++++++-------
 gcc/edit-context.cc           |  2 +-
 gcc/input.cc                  | 21 +++++++++++----------
 gcc/input.h                   |  1 +
 libcpp/include/line-map.h     | 24 ++++++++++++++----------
 libcpp/line-map.cc            | 15 +++++++--------
 9 files changed, 70 insertions(+), 56 deletions(-)

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index b4eeebcb30e..529b1408179 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -4522,9 +4522,9 @@ get_corrected_substring (const substring_loc &fmt_loc,
     = expand_location_to_spelling_point (fmt_substring_range.m_start);
   expanded_location finish
     = expand_location_to_spelling_point (fmt_substring_range.m_finish);
-  if (caret.file != start.file)
+  if (caret.src != start.src)
     return NULL;
-  if (start.file != finish.file)
+  if (start.src != finish.src)
     return NULL;
   if (caret.line != start.line)
     return NULL;
diff --git a/gcc/c-family/c-indentation.cc b/gcc/c-family/c-indentation.cc
index e8d3dece770..fce74991aae 100644
--- a/gcc/c-family/c-indentation.cc
+++ b/gcc/c-family/c-indentation.cc
@@ -334,7 +334,7 @@ should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo,
   const unsigned int tab_width = global_dc->tabstop;
 
   /* They must be in the same file.  */
-  if (next_stmt_exploc.file != body_exploc.file)
+  if (next_stmt_exploc.src != body_exploc.src)
     return false;
 
   /* If NEXT_STMT_LOC and BODY_LOC are on the same line, consider
@@ -363,7 +363,7 @@ should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo,
                                           ^ DON'T WARN HERE.  */
   if (next_stmt_exploc.line == body_exploc.line)
     {
-      if (guard_exploc.file != body_exploc.file)
+      if (guard_exploc.src != body_exploc.src)
 	return true;
       if (guard_exploc.line < body_exploc.line)
 	/* The guard is on a line before a line that contains both
@@ -372,7 +372,7 @@ should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo,
       else if (guard_exploc.line == body_exploc.line)
 	{
 	  /* They're all on the same line.  */
-	  gcc_assert (guard_exploc.file == next_stmt_exploc.file);
+	  gcc_assert (guard_exploc.src == next_stmt_exploc.src);
 	  gcc_assert (guard_exploc.line == next_stmt_exploc.line);
 	  unsigned int guard_vis_column;
 	  unsigned int guard_line_first_nws;
@@ -692,7 +692,7 @@ assert_get_visual_column_succeeds (const location &loc,
 				   unsigned int expected_first_nws)
 {
   expanded_location exploc;
-  exploc.file = file;
+  exploc.src = exploc.file = file;
   exploc.line = line;
   exploc.column = column;
   exploc.data = NULL;
@@ -730,7 +730,7 @@ assert_get_visual_column_fails (const location &loc,
 				const unsigned int tab_width)
 {
   expanded_location exploc;
-  exploc.file = file;
+  exploc.src = exploc.file = file;
   exploc.line = line;
   exploc.column = column;
   exploc.data = NULL;
diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
index a2aa6b4e0b5..bf969ab6d6a 100644
--- a/gcc/diagnostic-show-locus.cc
+++ b/gcc/diagnostic-show-locus.cc
@@ -697,9 +697,9 @@ static cpp_char_column_policy def_policy ()
 }
 
 /* Create some expanded locations for testing layout_range.  The filename
-   member of the explocs is set to the empty string.  This member will only be
+   member of the explocs is set to NULL.  This member will only be
    inspected by the calls to location_compute_display_column() made from the
-   layout_point constructors.  That function will check for an empty filename
+   layout_point constructors.  That function will check for a NULL filename
    argument and not attempt to open it, rather treating the non-existent data
    as if the display width were the same as the byte count.  Tests exercising a
    real difference between byte count and display width are performed later,
@@ -708,10 +708,14 @@ static cpp_char_column_policy def_policy ()
 static layout_range
 make_range (int start_line, int start_col, int end_line, int end_col)
 {
-  const expanded_location start_exploc
-    = {"", start_line, start_col, NULL, false};
-  const expanded_location finish_exploc
-    = {"", end_line, end_col, NULL, false};
+  expanded_location start_exploc;
+  start_exploc.line = start_line;
+  start_exploc.column = start_col;
+
+  expanded_location finish_exploc;
+  finish_exploc.line = end_line;
+  finish_exploc.column = end_col;
+
   return layout_range (exploc_with_display_col (start_exploc, def_policy (),
 						LOCATION_ASPECT_START),
 		       exploc_with_display_col (finish_exploc, def_policy (),
@@ -1268,12 +1272,12 @@ layout::maybe_add_location_range (const location_range *loc_range,
 
   /* If any part of the range isn't in the same file as the primary
      location of this diagnostic, ignore the range.  */
-  if (start.file != m_exploc.file)
+  if (start.src != m_exploc.src)
     return false;
-  if (finish.file != m_exploc.file)
+  if (finish.src != m_exploc.src)
     return false;
   if (loc_range->m_range_display_kind == SHOW_RANGE_WITH_CARET)
-    if (caret.file != m_exploc.file)
+    if (caret.src != m_exploc.src)
       return false;
 
   /* Sanitize the caret location for non-primary ranges.  */
@@ -1437,9 +1441,9 @@ layout::get_expanded_location (const line_span *line_span) const
 bool
 layout::validate_fixit_hint_p (const fixit_hint *hint)
 {
-  if (LOCATION_FILE (hint->get_start_loc ()) != m_exploc.file)
+  if (LOCATION_SRC (hint->get_start_loc ()) != m_exploc.src)
     return false;
-  if (LOCATION_FILE (hint->get_next_loc ()) != m_exploc.file)
+  if (LOCATION_SRC (hint->get_next_loc ()) != m_exploc.src)
     return false;
 
   return true;
@@ -2102,7 +2106,7 @@ layout::print_leading_fixits (linenum_type row)
 
       gcc_assert (hint->insertion_p ());
 
-      if (hint->affects_line_p (m_exploc.file, row))
+      if (hint->affects_line_p (m_exploc.src, row))
 	{
 	  /* Printing the '+' with normal colorization
 	     and the inserted line with "insert" colorization
@@ -2554,7 +2558,7 @@ layout::print_trailing_fixits (linenum_type row)
       if (hint->ends_with_newline_p ())
 	continue;
 
-      if (hint->affects_line_p (m_exploc.file, row))
+      if (hint->affects_line_p (m_exploc.src, row))
 	corrections.add_hint (hint);
     }
 
diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index c523f215bae..10a377ea209 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -798,13 +798,17 @@ diagnostic_report_current_module (diagnostic_context *context, location_t where)
       if (!includes_seen (context, map))
 	{
 	  bool first = true, need_inc = true, was_module = MAP_MODULE_P (map);
-	  expanded_location s = {};
+	  const bool was_gen = ORDINARY_MAP_GENERATED_DATA_P (map);
+	  expanded_location s;
 	  do
 	    {
 	      where = linemap_included_from (map);
 	      map = linemap_included_from_linemap (line_table, map);
 	      bool is_module = MAP_MODULE_P (map);
-	      s.file = LINEMAP_FILE (map);
+	      s.src = ORDINARY_MAP_SOURCE_ID (map);
+	      s.file = (s.src.is_buffer ()
+			? special_fname_generated ()
+			: s.src.get_filename_or_buffer ());
 	      s.line = SOURCE_LINE (map, where);
 	      int col = -1;
 	      if (first && context->show_column)
@@ -823,10 +827,13 @@ diagnostic_report_current_module (diagnostic_context *context, location_t where)
 		 N_("of module"),
 		 N_("In module imported at"),	/* 6 */
 		 N_("imported at"),
+		 N_("In buffer generated from"),   /* 8 */
 		};
 
-	      unsigned index = (was_module ? 6 : is_module ? 4
-				: need_inc ? 2 : 0) + !first;
+	      const unsigned index
+		= was_gen ? 8
+		: ((was_module ? 6 : is_module ? 4 : need_inc ? 2 : 0)
+		   + !first);
 
 	      pp_verbatim (context->printer, "%s%s %r%s%s%R",
 			   first ? "" : was_module ? ", " : ",\n",
@@ -2691,11 +2698,9 @@ assert_location_text (const char *expected_loc_text,
   dc.column_origin = origin;
 
   expanded_location xloc;
-  xloc.file = filename;
+  xloc.src = xloc.file = filename;
   xloc.line = line;
   xloc.column = column;
-  xloc.data = NULL;
-  xloc.sysp = false;
 
   char *actual_loc_text = diagnostic_get_location_text (&dc, xloc);
   ASSERT_STREQ (expected_loc_text, actual_loc_text);
diff --git a/gcc/edit-context.cc b/gcc/edit-context.cc
index 6f5bc6b9d8f..15052aec417 100644
--- a/gcc/edit-context.cc
+++ b/gcc/edit-context.cc
@@ -295,7 +295,7 @@ edit_context::apply_fixit (const fixit_hint *hint)
 {
   expanded_location start = expand_location (hint->get_start_loc ());
   expanded_location next_loc = expand_location (hint->get_next_loc ());
-  if (start.file != next_loc.file)
+  if (start.src != next_loc.src || start.src.is_buffer ())
     return false;
   if (start.line != next_loc.line)
     return false;
diff --git a/gcc/input.cc b/gcc/input.cc
index c1735215b29..c2559614a99 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -236,8 +236,6 @@ expand_location_1 (location_t loc,
       loc = LOCATION_LOCUS (loc);
     }
 
-  memset (&xloc, 0, sizeof (xloc));
-
   if (loc >= RESERVED_LOCATION_COUNT)
     {
       if (!expansion_point_p)
@@ -288,7 +286,12 @@ expand_location_1 (location_t loc,
 
   xloc.data = block;
   if (loc <= BUILTINS_LOCATION)
-    xloc.file = loc == UNKNOWN_LOCATION ? NULL : special_fname_builtin ();
+    {
+      xloc.file = loc == UNKNOWN_LOCATION ? NULL : special_fname_builtin ();
+      xloc.src = xloc.file;
+    }
+  else if (xloc.src.is_buffer ())
+    xloc.file = special_fname_generated ();
 
   return xloc;
 }
@@ -323,11 +326,11 @@ diagnostic_file_cache_fini (void)
    equals the actual number of lines of the file.  */
 
 static size_t
-total_lines_num (const char *file_path)
+total_lines_num (source_id src)
 {
   size_t r = 0;
   location_t l = 0;
-  if (linemap_get_file_highest_location (line_table, file_path, &l))
+  if (linemap_get_file_highest_location (line_table, src, &l))
     {
       gcc_assert (l >= RESERVED_LOCATION_COUNT);
       expanded_location xloc = expand_location (l);
@@ -990,9 +993,7 @@ get_source_text_between (location_t start, location_t end)
 
   /* If the locations are in different files or the end comes before the
      start, give up and return nothing.  */
-  if (!expstart.file || !expend.file)
-    return NULL;
-  if (strcmp (expstart.file, expend.file) != 0)
+  if (!expstart.src || expend.src != expstart.src)
     return NULL;
   if (expstart.line > expend.line)
     return NULL;
@@ -1788,7 +1789,7 @@ get_substring_ranges_for_loc (cpp_reader *pfile,
       expanded_location finish
 	= expand_location_to_spelling_point (src_range.m_finish,
 					     LOCATION_ASPECT_FINISH);
-      if (start.file != finish.file)
+      if (start.src != finish.src)
 	return "range endpoints are in different files";
       if (start.line != finish.line)
 	return "range endpoints are on different lines";
@@ -1839,7 +1840,7 @@ get_substring_ranges_for_loc (cpp_reader *pfile,
 	return "start and finish are spelled in different ordinary maps";
       /* The file from linemap_resolve_location ought to match that from
 	 expand_location_to_spelling_point.  */
-      if (ORDINARY_MAP_SOURCE_ID (start_ord_map) != start.file)
+      if (ORDINARY_MAP_SOURCE_ID (start_ord_map) != start.src)
 	return "mismatching file after resolving linemap";
 
       location_t start_loc
diff --git a/gcc/input.h b/gcc/input.h
index 1b81a995f86..5c578f1a9de 100644
--- a/gcc/input.h
+++ b/gcc/input.h
@@ -175,6 +175,7 @@ extern location_t location_with_discriminator (location_t, int);
 extern bool has_discriminator (location_t);
 extern int get_discriminator_from_loc (location_t);
 
+#define LOCATION_SRC(LOC) ((expand_location (LOC)).src)
 #define LOCATION_FILE(LOC) ((expand_location (LOC)).file)
 #define LOCATION_LINE(LOC) ((expand_location (LOC)).line)
 #define LOCATION_COLUMN(LOC)((expand_location (LOC)).column)
diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
index e59123b18c5..76617fe6129 100644
--- a/libcpp/include/line-map.h
+++ b/libcpp/include/line-map.h
@@ -1410,18 +1410,22 @@ linemap_location_before_p (class line_maps *set,
 
 typedef struct
 {
-  /* The name of the source file involved.  */
-  const char *file;
+  /* The file name of the location involved, or NULL if the location
+     is not in an external file.  */
+  const char *file = nullptr;
 
-  /* The line-location in the source file.  */
-  int line;
-
-  int column;
+  /* A source_id recording the file name and/or the in-memory content,
+     as appropriate.  Users that need to handle in-memory content need
+     to use this rather than FILE.  */
+  source_id src;
 
-  void *data;
+  /* The line-location in the source file.  */
+  int line = 0;
+  int column = 0;
+  void *data = nullptr;
 
-  /* In a system header?. */
-  bool sysp;
+  /* In a system header?  */
+  bool sysp = false;
 } expanded_location;
 
 class range_label;
@@ -2065,7 +2069,7 @@ class fixit_hint
 	      const char *new_content);
   ~fixit_hint () { free (m_bytes); }
 
-  bool affects_line_p (const char *file, int line) const;
+  bool affects_line_p (source_id src, int line) const;
   location_t get_start_loc () const { return m_start; }
   location_t get_next_loc () const { return m_next_loc; }
   bool maybe_append (location_t start,
diff --git a/libcpp/line-map.cc b/libcpp/line-map.cc
index e63916054e0..7704c60773b 100644
--- a/libcpp/line-map.cc
+++ b/libcpp/line-map.cc
@@ -1905,8 +1905,6 @@ linemap_expand_location (line_maps *set,
 
 {
   expanded_location xloc;
-
-  memset (&xloc, 0, sizeof (xloc));
   if (IS_ADHOC_LOC (loc))
     {
       xloc.data = get_data_from_adhoc_loc (set, loc);
@@ -1932,8 +1930,9 @@ linemap_expand_location (line_maps *set,
 	abort ();
 
       const line_map_ordinary *ord_map = linemap_check_ordinary (map);
-
-      xloc.file = LINEMAP_FILE (ord_map);
+      xloc.src = ORDINARY_MAP_SOURCE_ID (ord_map);
+      if (!xloc.src.is_buffer ())
+	xloc.file = xloc.src.get_filename_or_buffer ();
       xloc.line = SOURCE_LINE (ord_map, loc);
       xloc.column = SOURCE_COLUMN (ord_map, loc);
       xloc.sysp = LINEMAP_SYSP (ord_map) != 0;
@@ -2534,7 +2533,7 @@ rich_location::maybe_add_fixit (location_t start,
     = linemap_client_expand_location_to_spelling_point (next_loc,
 							LOCATION_ASPECT_START);
   /* They must be within the same file...  */
-  if (exploc_start.file != exploc_next_loc.file)
+  if (exploc_start.src != exploc_next_loc.src)
     {
       stop_supporting_fixits ();
       return;
@@ -2619,19 +2618,19 @@ fixit_hint::fixit_hint (location_t start,
 /* Does this fix-it hint affect the given line?  */
 
 bool
-fixit_hint::affects_line_p (const char *file, int line) const
+fixit_hint::affects_line_p (source_id src, int line) const
 {
   expanded_location exploc_start
     = linemap_client_expand_location_to_spelling_point (m_start,
 							LOCATION_ASPECT_START);
-  if (file != exploc_start.file)
+  if (src != exploc_start.src)
     return false;
   if (line < exploc_start.line)
       return false;
   expanded_location exploc_next_loc
     = linemap_client_expand_location_to_spelling_point (m_next_loc,
 							LOCATION_ASPECT_START);
-  if (file != exploc_next_loc.file)
+  if (src != exploc_next_loc.src)
     return false;
   if (line > exploc_next_loc.line)
       return false;


* [PATCH v4 3/8] diagnostics: Refactor class file_cache_slot
  2023-08-09 22:14       ` [PATCH v4 0/8] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
  2023-08-09 22:14         ` [PATCH v4 1/8] libcpp: Add LC_GEN linemaps to support in-memory buffers Lewis Hyatt
  2023-08-09 22:14         ` [PATCH v4 2/8] libcpp: diagnostics: Support generated data in expanded locations Lewis Hyatt
@ 2023-08-09 22:14         ` Lewis Hyatt
  2023-08-15 15:43           ` David Malcolm
  2023-08-09 22:14         ` [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers Lewis Hyatt
                           ` (4 subsequent siblings)
  7 siblings, 1 reply; 36+ messages in thread
From: Lewis Hyatt @ 2023-08-09 22:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm, Lewis Hyatt

Class file_cache_slot in input.cc is used to query specific lines of source
code from a file when needed by the diagnostics infrastructure. A subsequent
patch will extend it to obtain the source code from in-memory generated
buffers rather than from a file. The present patch refactors class
file_cache_slot, moving most of the logic into a new base class,
cache_data_source, in preparation for reusing that code in the next patch.
There is no change in functionality yet.
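
As a rough illustration of the split described above, the following sketch
shows a base class that owns the line-scanning logic and asks a derived class
for more bytes only when it runs out. The names toy_data_source and
chunked_source are hypothetical and are not the patch's classes; the sketch
only assumes the general shape visible in the diff (a pair of data pointers
maintained by the derived class, a pure virtual get_more_data (), and line
positions stored as byte offsets so they survive a reallocation).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical, simplified analogue of cache_data_source.
class toy_data_source
{
public:
  virtual ~toy_data_source () = default;

  // Store the next line (without its newline) in *LINE.  Return false
  // once all data has been consumed.
  bool get_next_line (std::string *line)
  {
    for (;;)
      {
        const std::size_t avail
          = static_cast<std::size_t> (m_data_end - m_data_begin)
            - m_line_start_idx;
        const char *start = m_data_begin + m_line_start_idx;
        const char *nl
          = avail ? static_cast<const char *> (std::memchr (start, '\n',
                                                            avail))
                  : nullptr;
        if (nl)
          {
            line->assign (start, nl);
            m_line_start_idx += static_cast<std::size_t> (nl - start) + 1;
            return true;
          }
        // No newline yet.  Asking for more data may move the buffer, so
        // only the offset is kept and the pointers are recomputed on the
        // next iteration.
        if (!get_more_data ())
          {
            // Nothing was added, so START is still valid here.
            if (!avail)
              return false;
            line->assign (start, avail); // Final line, no trailing newline.
            m_line_start_idx += avail;
            return true;
          }
      }
  }

protected:
  // Maintained by the derived class; may change in get_more_data ().
  const char *m_data_begin = nullptr;
  const char *m_data_end = nullptr;
  virtual bool get_more_data () = 0;

private:
  // Byte offset of the current line, relative to m_data_begin.
  std::size_t m_line_start_idx = 0;
};

// Hypothetical derived class that reveals its content a few bytes at a
// time, standing in for incremental reads from a file.
class chunked_source : public toy_data_source
{
public:
  chunked_source (std::string content, std::size_t chunk)
    : m_content (std::move (content)), m_chunk (chunk) {}

protected:
  bool get_more_data () override
  {
    if (m_buf.size () == m_content.size ())
      return false;
    m_buf.append (m_content, m_buf.size (), m_chunk); // May reallocate.
    m_data_begin = m_buf.data ();
    m_data_end = m_data_begin + m_buf.size ();
    return true;
  }

private:
  std::string m_content, m_buf;
  std::size_t m_chunk;
};
```

A source backed by an in-memory buffer could, under the same assumptions,
set both pointers once and have get_more_data () always return false, leaving
the line-scanning code in the base class unchanged.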

gcc/ChangeLog:

	* input.cc (class file_cache_slot): Refactor functionality into a
	new base class...
	(class cache_data_source): ...here.
	(file_cache::forcibly_evict_file): Adapt for refactoring.
	(file_cache_slot::evict): Renamed to...
	(file_cache_slot::reset): ...this, and partially refactored into
	base class...
	(cache_data_source::reset): ...here.
	(file_cache_slot::get_full_file_content): Moved into base class...
	(cache_data_source::get_full_file_content): ...here.
	(file_cache_slot::create): Adapt for refactoring.
	(file_cache_slot::file_cache_slot): Refactor partially into...
	(cache_data_source::cache_data_source): ...here.
	(file_cache_slot::~file_cache_slot): Refactor partially into...
	(cache_data_source::~cache_data_source): ...here.
	(file_cache_slot::needs_read_p): Remove.
	(file_cache_slot::needs_grow_p): Remove.
	(file_cache_slot::maybe_grow): Adapt for refactoring.
	(file_cache_slot::read_data): Refactored, along with...
	(file_cache_slot::maybe_read_data): ...this, into...
	(file_cache_slot::get_more_data): ...here.
	(find_end_of_line): Change interface to take a pair of pointers,
	rather than a pointer + length.
	(file_cache_slot::get_next_line): Refactored into...
	(cache_data_source::get_next_line): ...here.
	(file_cache_slot::goto_next_line): Refactored into...
	(cache_data_source::goto_next_line): ...here.
	(file_cache_slot::read_line_num): Refactored into...
	(cache_data_source::read_line_num): ...here.
	(location_get_source_line): Fix const-correctness as necessitated by
	new interface.
---
 gcc/input.cc | 513 +++++++++++++++++++++++----------------------------
 1 file changed, 235 insertions(+), 278 deletions(-)

diff --git a/gcc/input.cc b/gcc/input.cc
index c2559614a99..9377020b460 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -55,34 +55,88 @@ file_cache::initialize_input_context (diagnostic_input_charset_callback ccb,
   in_context.should_skip_bom = should_skip_bom;
 }
 
-/* This is a cache used by get_next_line to store the content of a
-   file to be searched for file lines.  */
-class file_cache_slot
+/* This is an abstract interface for a class that provides data which we want to
+   look up by line number.  Concrete implementations will follow, which handle
+   the cases of reading the data from the input source files, or of reading it
+   from in-memory generated data buffers.  The design is driven with reading
+   from files in mind, in particular it is desirable to read only as much of a
+   file from disk as necessary.  It works like a simplified std::istream, i.e.
+   virtual function calls are only needed when we need to retrieve more data
+   from the underlying source.  */
+
+class cache_data_source
 {
-public:
-  file_cache_slot ();
-  ~file_cache_slot ();
 
-  bool read_line_num (size_t line_num,
-		      char ** line, ssize_t *line_len);
-
-  /* Accessors.  */
-  const char *get_file_path () const { return m_file_path; }
+public:
+  bool read_line_num (size_t line_num, const char **line, ssize_t *line_len);
   unsigned get_use_count () const { return m_use_count; }
+  void inc_use_count () { m_use_count++; }
+  bool get_next_line (const char **line, ssize_t *line_len);
+  bool goto_next_line ();
   bool missing_trailing_newline_p () const
   {
     return m_missing_trailing_newline;
   }
   char_span get_full_file_content ();
+  bool unused () const { return !m_data_begin; }
+  virtual void reset ();
+
+protected:
+  cache_data_source ();
+  virtual ~cache_data_source ();
+
+  /* These pointers delimit the data that we are processing.  They are
+     maintained by the derived classes, we only ask for more by calling
+     get_more_data().  That function should return TRUE if more data was
+     obtained.  Calling get_more_data () may invalidate these pointers
+     (i.e. reallocating them to a larger buffer).  */
+  const char *m_data_begin;
+  const char *m_data_end;
+  virtual bool get_more_data () = 0;
+
+  /* This is to be called by the derived classes when this object is
+     being activated.  */
+  void on_create (unsigned int use_count, size_t total_lines)
+  {
+    m_use_count = use_count;
+    m_total_lines = total_lines;
+  }
 
-  void inc_use_count () { m_use_count++; }
+private:
+  /* Non-copyable.  */
+  cache_data_source (const cache_data_source &) = delete;
+  cache_data_source& operator= (const cache_data_source &) = delete;
 
-  bool create (const file_cache::input_context &in_context,
-	       const char *file_path, FILE *fp, unsigned highest_use_count);
-  void evict ();
+  /* The number of times this data has been accessed.  This is used to designate
+     which entry to evict from the cache array when needed.  */
+  unsigned m_use_count;
 
- private:
-  /* These are information used to store a line boundary.  */
+  /* Could this file be missing a trailing newline on its final line?
+     Initially true (to cope with empty files), set to true/false
+     as each line is read.  */
+  bool m_missing_trailing_newline;
+
+  /* This is the total number of lines in the current data.  At the
+     moment, we try to get this information from the line map
+     subsystem.  Note that this is just a hint.  When using the C++
+     front-end, this hint is correct because the input file is then
+     completely tokenized before parsing starts; so the line map knows
+     the number of lines before compilation really starts.  For, e.g.,
+     the C front-end, it can happen that we start emitting diagnostics
+     before the line map has seen the end of the file.  */
+  size_t m_total_lines;
+
+  /* The number of the previous line read.  This starts at 1.  Zero
+     means we've read no line so far.  */
+  size_t m_line_num;
+
+  /* The index of the beginning of the current line.  */
+  size_t m_line_start_idx;
+
+  /* These are information used to store a line boundary.  Here and below, we
+     always store byte offsets, not pointers, since the underlying buffer may be
+     reallocated by the derived implementation unbeknownst to us after calling
+     get_more_data().  */
   class line_info
   {
   public:
@@ -90,13 +144,12 @@ public:
     size_t line_num;
 
     /* The position (byte count) of the beginning of the line,
-       relative to the file data pointer.  This starts at zero.  */
+       relative to M_DATA_BEGIN.  This starts at zero.  */
     size_t start_pos;
 
-    /* The position (byte count) of the last byte of the line.  This
-       normally points to the '\n' character, or to one byte after the
-       last byte of the file, if the file doesn't contain a '\n'
-       character.  */
+    /* The position (byte count) of the last byte of the line.  This normally
+       points to the '\n' character, or to M_DATA_END, if the data doesn't end
+       with a '\n' character.  */
     size_t end_pos;
 
     line_info (size_t l, size_t s, size_t e)
@@ -104,91 +157,54 @@ public:
     {}
 
     line_info ()
-      :line_num (0), start_pos (0), end_pos (0)
+      : line_num (0), start_pos (0), end_pos (0)
     {}
   };
 
-  bool needs_read_p () const;
-  bool needs_grow_p () const;
-  void maybe_grow ();
-  bool read_data ();
-  bool maybe_read_data ();
-  bool get_next_line (char **line, ssize_t *line_len);
-  bool read_next_line (char ** line, ssize_t *line_len);
-  bool goto_next_line ();
-
-  static const size_t buffer_size = 4 * 1024;
-  static const size_t line_record_size = 100;
-
-  /* The number of time this file has been accessed.  This is used
-     to designate which file cache to evict from the cache
-     array.  */
-  unsigned m_use_count;
-
-  /* The file_path is the key for identifying a particular file in
-     the cache.
-     For libcpp-using code, the underlying buffer for this field is
-     owned by the corresponding _cpp_file within the cpp_reader.  */
-  const char *m_file_path;
-
-  FILE *m_fp;
-
-  /* This points to the content of the file that we've read so
-     far.  */
-  char *m_data;
-
-  /* The allocated buffer to be freed may start a little earlier than DATA,
-     e.g. if a UTF8 BOM was skipped at the beginning.  */
-  int m_alloc_offset;
-
-  /*  The size of the DATA array above.*/
-  size_t m_size;
-
-  /* The number of bytes read from the underlying file so far.  This
-     must be less (or equal) than SIZE above.  */
-  size_t m_nb_read;
-
-  /* The index of the beginning of the current line.  */
-  size_t m_line_start_idx;
-
-  /* The number of the previous line read.  This starts at 1.  Zero
-     means we've read no line so far.  */
-  size_t m_line_num;
-
-  /* This is the total number of lines of the current file.  At the
-     moment, we try to get this information from the line map
-     subsystem.  Note that this is just a hint.  When using the C++
-     front-end, this hint is correct because the input file is then
-     completely tokenized before parsing starts; so the line map knows
-     the number of lines before compilation really starts.  For e.g,
-     the C front-end, it can happen that we start emitting diagnostics
-     before the line map has seen the end of the file.  */
-  size_t m_total_lines;
-
-  /* Could this file be missing a trailing newline on its final line?
-     Initially true (to cope with empty files), set to true/false
-     as each line is read.  */
-  bool m_missing_trailing_newline;
-
   /* This is a record of the beginning and end of the lines we've seen
      while reading the file.  This is useful to avoid walking the data
      from the beginning when we are asked to read a line that is
-     before LINE_START_IDX above.  Note that the maximum size of this
+     before M_LINE_START_IDX.  Note that the maximum size of this
      record is line_record_size, so that the memory consumption
      doesn't explode.  We thus scale total_lines down to
      line_record_size.  */
   vec<line_info, va_heap> m_line_record;
+  static const size_t line_record_size = 100;
+};
 
-  void offset_buffer (int offset)
-  {
-    gcc_assert (offset < 0 ? m_alloc_offset + offset >= 0
-		: (size_t) offset <= m_size);
-    gcc_assert (m_data);
-    m_alloc_offset += offset;
-    m_data += offset;
-    m_size -= offset;
-  }
+/* This is the implementation of cache_data_source for ordinary
+   source files.  */
+class file_cache_slot final : public cache_data_source
+{
+
+public:
+  file_cache_slot ();
+  ~file_cache_slot ();
+
+  const char *get_file_path () const { return m_file_path; }
+  bool create (const file_cache::input_context &in_context,
+	       const char *file_path, FILE *fp, unsigned highest_use_count);
+  void reset () override;
+
+protected:
+  bool get_more_data () override;
 
+private:
+  /* The file_path is the key for identifying a particular file in the cache.
+     For libcpp-using code, the underlying buffer for this field is owned by the
+     corresponding _cpp_file within the cpp_reader.  */
+  const char *m_file_path;
+
+  FILE *m_fp;
+
+  /* The base class M_DATA_BEGIN and M_DATA_END delimit the bytes that are ready
+     to process.  These two pointers here track a growable memory buffer, owned
+     by this object, where we store data as we read it from the file; we arrange
+     for the base class pointers to point to the right place within this
+     buffer.  */
+  char *m_buf_begin;
+  char *m_buf_end;
+  void maybe_grow ();
 };
 
 /* Current position in real source file.  */
@@ -391,26 +407,10 @@ file_cache::forcibly_evict_file (const char *file_path)
     /* Not found.  */
     return;
 
-  r->evict ();
+  r->reset ();
 }
 
-void
-file_cache_slot::evict ()
-{
-  m_file_path = NULL;
-  if (m_fp)
-    fclose (m_fp);
-  m_fp = NULL;
-  m_nb_read = 0;
-  m_line_start_idx = 0;
-  m_line_num = 0;
-  m_line_record.truncate (0);
-  m_use_count = 0;
-  m_total_lines = 0;
-  m_missing_trailing_newline = true;
-}
-
-/* Return the file cache that has been less used, recently, or the
+/* Return the cache that has been less used, recently, or the
    first empty one.  If HIGHEST_USE_COUNT is non-null,
    *HIGHEST_USE_COUNT is set to the highest use count of the entries
    in the cache table.  */
@@ -473,14 +473,14 @@ file_cache::add_file (const char *file_path)
    as decoded according to the input charset, encoded as UTF-8.  */
 
 char_span
-file_cache_slot::get_full_file_content ()
+cache_data_source::get_full_file_content ()
 {
-  char *line;
+  const char *line;
   ssize_t line_len;
   while (get_next_line (&line, &line_len))
     {
     }
-  return char_span (m_data, m_nb_read);
+  return char_span (m_data_begin, m_data_end - m_data_begin);
 }
 
 /* Populate this slot for use on FILE_PATH and FP, dropping any
@@ -491,22 +491,12 @@ file_cache_slot::create (const file_cache::input_context &in_context,
 			 const char *file_path, FILE *fp,
 			 unsigned highest_use_count)
 {
+  reset ();
+  on_create (highest_use_count + 1, total_lines_num (source_id {file_path}));
+  m_data_begin = m_buf_begin;
+  m_data_end = m_buf_begin;
   m_file_path = file_path;
-  if (m_fp)
-    fclose (m_fp);
   m_fp = fp;
-  if (m_alloc_offset)
-    offset_buffer (-m_alloc_offset);
-  m_nb_read = 0;
-  m_line_start_idx = 0;
-  m_line_num = 0;
-  m_line_record.truncate (0);
-  /* Ensure that this cache entry doesn't get evicted next time
-     add_file_to_cache_tab is called.  */
-  m_use_count = ++highest_use_count;
-  m_total_lines = total_lines_num (file_path);
-  m_missing_trailing_newline = true;
-
 
   /* Check the input configuration to determine if we need to do any
      transformations, such as charset conversion or BOM skipping.  */
@@ -519,20 +509,17 @@ file_cache_slot::create (const file_cache::input_context &in_context,
 	= cpp_get_converted_source (file_path, input_charset);
       if (!cs.data)
 	return false;
-      if (m_data)
-	XDELETEVEC (m_data);
-      m_data = cs.data;
-      m_nb_read = m_size = cs.len;
-      m_alloc_offset = cs.data - cs.to_free;
+      XDELETEVEC (m_buf_begin);
+      m_buf_begin = cs.to_free;
+      m_buf_end = cs.data + cs.len;
+      m_data_begin = cs.data;
+      m_data_end = m_buf_end;
     }
-  else if (in_context.should_skip_bom)
+  else if (in_context.should_skip_bom && get_more_data ())
     {
-      if (read_data ())
-	{
-	  const int offset = cpp_check_utf8_bom (m_data, m_nb_read);
-	  offset_buffer (offset);
-	  m_nb_read -= offset;
-	}
+      const int offset = cpp_check_utf8_bom (m_data_begin,
+					     m_data_end - m_data_begin);
+      m_data_begin += offset;
     }
 
   return true;
@@ -567,55 +554,60 @@ file_cache::lookup_or_add_file (const char *file_path)
   return r;
 }
 
-/* Default constructor for a cache of file used by caret
-   diagnostic.  */
-
-file_cache_slot::file_cache_slot ()
-: m_use_count (0), m_file_path (NULL), m_fp (NULL), m_data (0),
-  m_alloc_offset (0), m_size (0), m_nb_read (0), m_line_start_idx (0),
-  m_line_num (0), m_total_lines (0), m_missing_trailing_newline (true)
+cache_data_source::cache_data_source ()
+: m_data_begin (nullptr), m_data_end (nullptr),
+  m_use_count (0),
+  m_missing_trailing_newline (true),
+  m_total_lines (0),
+  m_line_num (0),
+  m_line_start_idx (0)
 {
   m_line_record.create (0);
 }
 
-/* Destructor for a cache of file used by caret diagnostic.  */
-
-file_cache_slot::~file_cache_slot ()
+cache_data_source::~cache_data_source ()
 {
-  if (m_fp)
-    {
-      fclose (m_fp);
-      m_fp = NULL;
-    }
-  if (m_data)
-    {
-      offset_buffer (-m_alloc_offset);
-      XDELETEVEC (m_data);
-      m_data = 0;
-    }
   m_line_record.release ();
 }
 
-/* Returns TRUE iff the cache would need to be filled with data coming
-   from the file.  That is, either the cache is empty or full or the
-   current line is empty.  Note that if the cache is full, it would
-   need to be extended and filled again.  */
-
-bool
-file_cache_slot::needs_read_p () const
+void
+cache_data_source::reset ()
 {
-  return m_fp && (m_nb_read == 0
-	  || m_nb_read == m_size
-	  || (m_line_start_idx >= m_nb_read - 1));
+  m_data_begin = nullptr;
+  m_data_end = nullptr;
+  m_use_count = 0;
+  m_missing_trailing_newline = true;
+  m_total_lines = 0;
+  m_line_num = 0;
+  m_line_start_idx = 0;
+  m_line_record.truncate (0);
 }
 
-/*  Return TRUE iff the cache is full and thus needs to be
-    extended.  */
+file_cache_slot::file_cache_slot ()
+: m_file_path (nullptr), m_fp (nullptr),
+  m_buf_begin (nullptr), m_buf_end (nullptr)
+{}
+
+file_cache_slot::~file_cache_slot ()
+{
+  if (m_fp)
+    fclose (m_fp);
+  XDELETEVEC (m_buf_begin);
+}
 
-bool
-file_cache_slot::needs_grow_p () const
+void
+file_cache_slot::reset ()
 {
-  return m_nb_read == m_size;
+  cache_data_source::reset ();
+  m_file_path = NULL;
+  if (m_fp)
+    {
+      fclose (m_fp);
+      m_fp = NULL;
+    }
+
+  /* Do not free the buffer here; we intend to reuse it the next time this
+     slot is activated.  */
 }
 
 /* Grow the cache if it needs to be extended.  */
@@ -623,22 +615,23 @@ file_cache_slot::needs_grow_p () const
 void
 file_cache_slot::maybe_grow ()
 {
-  if (!needs_grow_p ())
-    return;
-
-  if (!m_data)
+  if (!m_buf_begin)
     {
-      gcc_assert (m_size == 0 && m_alloc_offset == 0);
-      m_size = buffer_size;
-      m_data = XNEWVEC (char, m_size);
+      const size_t buffer_size = 4 * 1024;
+      m_buf_begin = XNEWVEC (char, buffer_size);
+      m_buf_end = m_buf_begin + buffer_size;
+      m_data_begin = m_buf_begin;
+      m_data_end = m_data_begin;
     }
-  else
+  else if (m_data_end == m_buf_end)
     {
-      const int offset = m_alloc_offset;
-      offset_buffer (-offset);
-      m_size *= 2;
-      m_data = XRESIZEVEC (char, m_data, m_size);
-      offset_buffer (offset);
+      const auto new_size = 2 * (m_buf_end - m_buf_begin);
+      const auto data_offset = m_data_begin - m_buf_begin;
+      const auto data_size = m_data_end - m_data_begin;
+      m_buf_begin = XRESIZEVEC (char, m_buf_begin, new_size);
+      m_buf_end = m_buf_begin + new_size;
+      m_data_begin = m_buf_begin + data_offset;
+      m_data_end = m_data_begin + data_size;
     }
 }
 
@@ -646,45 +639,28 @@ file_cache_slot::maybe_grow ()
     Returns TRUE iff new data could be read.  */
 
 bool
-file_cache_slot::read_data ()
+file_cache_slot::get_more_data ()
 {
-  if (feof (m_fp) || ferror (m_fp))
+  if (!m_fp || feof (m_fp) || ferror (m_fp))
     return false;
-
   maybe_grow ();
-
-  char * from = m_data + m_nb_read;
-  size_t to_read = m_size - m_nb_read;
-  size_t nb_read = fread (from, 1, to_read, m_fp);
-
-  if (ferror (m_fp))
-    return false;
-
-  m_nb_read += nb_read;
-  return !!nb_read;
-}
-
-/* Read new data iff the cache needs to be filled with more data
-   coming from the file FP.  Return TRUE iff the cache was filled with
-   mode data.  */
-
-bool
-file_cache_slot::maybe_read_data ()
-{
-  if (!needs_read_p ())
+  char *const dest = m_buf_begin + (m_data_end - m_buf_begin);
+  const auto nb_read = fread (dest, 1, m_buf_end - dest, m_fp);
+  if (ferror (m_fp) || !nb_read)
     return false;
-  return read_data ();
+  m_data_end += nb_read;
+  return true;
 }
 
-/* Helper function for file_cache_slot::get_next_line (), to find the end of
+/* Helper function for cache_data_source::get_next_line (), to find the end of
    the next line.  Returns with the memchr convention, i.e. nullptr if a line
    terminator was not found.  We need to determine line endings in the same
    manner that libcpp does: any of \n, \r\n, or \r is a line ending.  */
 
-static char *
-find_end_of_line (char *s, size_t len)
+static const char *
+find_end_of_line (const char *s, const char *end)
 {
-  for (const auto end = s + len; s != end; ++s)
+  for (; s != end; ++s)
     {
       if (*s == '\n')
 	return s;
@@ -707,41 +683,38 @@ find_end_of_line (char *s, size_t len)
   return nullptr;
 }
 
-/* Read a new line from file FP, using C as a cache for the data
-   coming from the file.  Upon successful completion, *LINE is set to
-   the beginning of the line found.  *LINE points directly in the
-   line cache and is only valid until the next call of get_next_line.
-   *LINE_LEN is set to the length of the line.  Note that the line
-   does not contain any terminal delimiter.  This function returns
-   true if some data was read or process from the cache, false
-   otherwise.  Note that subsequent calls to get_next_line might
-   make the content of *LINE invalid.  */
+/* Read a new line from the data source.  Upon successful completion, *LINE is
+   set to the beginning of the line found.  *LINE points directly in the line
+   cache and is only valid until the next call of get_next_line.  *LINE_LEN is
+   set to the length of the line.  Note that the line does not contain any
+   terminal delimiter.  This function returns true if some data was read or
+   processed from the cache, false otherwise.  Note that subsequent calls to
+   get_next_line might make the content of *LINE invalid.  */
 
 bool
-file_cache_slot::get_next_line (char **line, ssize_t *line_len)
+cache_data_source::get_next_line (const char **line, ssize_t *line_len)
 {
-  /* Fill the cache with data to process.  */
-  maybe_read_data ();
+  const char *line_start = m_data_begin + m_line_start_idx;
 
-  size_t remaining_size = m_nb_read - m_line_start_idx;
-  if (remaining_size == 0)
-    /* There is no more data to process.  */
-    return false;
-
-  char *line_start = m_data + m_line_start_idx;
+  /* Check if we are all done reading the file.  */
+  if (line_start == m_data_end)
+    {
+      if (!get_more_data ())
+	return false;
+      line_start = m_data_begin + m_line_start_idx;
+    }
 
-  char *next_line_start = NULL;
-  size_t len = 0;
-  char *line_end = find_end_of_line (line_start, remaining_size);
+  /* Find the end of the current line.  */
+  const char *next_line_start = NULL;
+  const char *line_end = find_end_of_line (line_start, m_data_end);
   if (line_end == NULL)
     {
       /* We haven't found an end-of-line delimiter in the cache.
 	 Fill the cache with more data from the file and look again.  */
-      while (maybe_read_data ())
+      while (get_more_data ())
 	{
-	  line_start = m_data + m_line_start_idx;
-	  remaining_size = m_nb_read - m_line_start_idx;
-	  line_end = find_end_of_line (line_start, remaining_size);
+	  line_start = m_data_begin + m_line_start_idx;
+	  line_end = find_end_of_line (line_start, m_data_end);
 	  if (line_end != NULL)
 	    {
 	      next_line_start = line_end + 1;
@@ -758,8 +731,8 @@ file_cache_slot::get_next_line (char **line, ssize_t *line_len)
 
 	     If the file ends in a \r, we didn't identify it as a line
 	     terminator above, so do that now instead.  */
-	  line_end = m_data + m_nb_read;
-	  if (m_nb_read && line_end[-1] == '\r')
+	  line_end = m_data_end;
+	  if (line_end != m_data_begin && line_end[-1] == '\r')
 	    {
 	      --line_end;
 	      m_missing_trailing_newline = false;
@@ -776,18 +749,11 @@ file_cache_slot::get_next_line (char **line, ssize_t *line_len)
       m_missing_trailing_newline = false;
     }
 
-  if (m_fp && ferror (m_fp))
-    return false;
-
   /* At this point, we've found the end of the of line.  It either points to
      the line terminator or to one byte after the last byte of the file.  */
-  gcc_assert (line_end != NULL);
-
-  len = line_end - line_start;
-
-  if (m_line_start_idx < m_nb_read)
-    *line = line_start;
-
+  const auto len = line_end - line_start;
+  *line = line_start;
+  *line_len = len;
   ++m_line_num;
 
   /* Before we update our line record, make sure the hint about the
@@ -809,7 +775,7 @@ file_cache_slot::get_next_line (char **line, ssize_t *line_len)
 	m_line_record.safe_push
 	  (file_cache_slot::line_info (m_line_num,
 				       m_line_start_idx,
-				       line_end - m_data));
+				       line_end - m_data_begin));
       else if (m_total_lines > line_record_size)
 	{
 	  /* ... otherwise, we just scale total_lines down to
@@ -820,23 +786,14 @@ file_cache_slot::get_next_line (char **line, ssize_t *line_len)
 	    m_line_record.safe_push
 	      (file_cache_slot::line_info (m_line_num,
 					   m_line_start_idx,
-					   line_end - m_data));
+					   line_end - m_data_begin));
 	}
     }
 
   /* Update m_line_start_idx so that it points to the next line to be
      read.  */
-  if (next_line_start)
-    m_line_start_idx = next_line_start - m_data;
-  else
-    /* We didn't find any terminal '\n'.  Let's consider that the end
-       of line is the end of the data in the cache.  The next
-       invocation of get_next_line will either read more data from the
-       underlying file or return false early because we've reached the
-       end of the file.  */
-    m_line_start_idx = m_nb_read;
-
-  *line_len = len;
+  m_line_start_idx
+    = (next_line_start ? next_line_start : m_data_end) - m_data_begin;
 
   return true;
 }
@@ -848,15 +805,15 @@ file_cache_slot::get_next_line (char **line, ssize_t *line_len)
    completion.  */
 
 bool
-file_cache_slot::goto_next_line ()
+cache_data_source::goto_next_line ()
 {
-  char *l;
+  const char *l;
   ssize_t len;
 
   return get_next_line (&l, &len);
 }
 
-/* Read an arbitrary line number LINE_NUM from the file cached in C.
+/* Read an arbitrary line number LINE_NUM from the data cache.
    If the line was read successfully, *LINE points to the beginning
    of the line in the file cache and *LINE_LEN is the length of the
    line.  *LINE is not nul-terminated, but may contain zero bytes.
@@ -864,8 +821,8 @@ file_cache_slot::goto_next_line ()
    This function returns bool if a line was read.  */
 
 bool
-file_cache_slot::read_line_num (size_t line_num,
-		       char ** line, ssize_t *line_len)
+cache_data_source::read_line_num (size_t line_num,
+				  const char ** line, ssize_t *line_len)
 {
   gcc_assert (line_num > 0);
 
@@ -873,7 +830,7 @@ file_cache_slot::read_line_num (size_t line_num,
     {
       /* We've been asked to read lines that are before m_line_num.
 	 So lets use our line record (if it's not empty) to try to
-	 avoid re-reading the file from the beginning again.  */
+	 avoid re-scanning the data from the beginning again.  */
 
       if (m_line_record.is_empty ())
 	{
@@ -882,7 +839,7 @@ file_cache_slot::read_line_num (size_t line_num,
 	}
       else
 	{
-	  file_cache_slot::line_info *i = NULL;
+	  line_info *i = NULL;
 	  if (m_total_lines <= line_record_size)
 	    {
 	      /* In languages where the input file is not totally
@@ -918,7 +875,7 @@ file_cache_slot::read_line_num (size_t line_num,
 	  if (i && i->line_num == line_num)
 	    {
 	      /* We have the start/end of the line.  */
-	      *line = m_data + i->start_pos;
+	      *line = m_data_begin + i->start_pos;
 	      *line_len = i->end_pos - i->start_pos;
 	      return true;
 	    }
@@ -957,7 +914,7 @@ file_cache_slot::read_line_num (size_t line_num,
 char_span
 location_get_source_line (const char *file_path, int line)
 {
-  char *buffer = NULL;
+  const char *buffer = NULL;
   ssize_t len;
 
   if (line == 0)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers
  2023-08-09 22:14       ` [PATCH v4 0/8] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
                           ` (2 preceding siblings ...)
  2023-08-09 22:14         ` [PATCH v4 3/8] diagnostics: Refactor class file_cache_slot Lewis Hyatt
@ 2023-08-09 22:14         ` Lewis Hyatt
  2023-08-15 16:15           ` David Malcolm
  2023-08-09 22:14         ` [PATCH v4 5/8] diagnostics: Support testing generated data in input.cc selftests Lewis Hyatt
                           ` (3 subsequent siblings)
  7 siblings, 1 reply; 36+ messages in thread
From: Lewis Hyatt @ 2023-08-09 22:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm, Lewis Hyatt

This patch enhances location_get_source_line(), the primary interface
provided by the diagnostics infrastructure for obtaining the line of source
code corresponding to a given location, so that it understands generated
data locations in addition to normal file-based locations. This involves
changing the argument to location_get_source_line() from a plain file name
to a source_id object that can represent either type of location.
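
To illustrate the idea outside of GCC, here is a minimal standalone sketch of
dispatching on a key that names either a file or an in-memory buffer. It is
not the code from this patch: toy_source_id and toy_get_source_line are
hypothetical names made up for the example, the accessors only mimic the
shape of the source_id calls used in input.cc, there is no caching, and only
'\n' is treated as a line terminator.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical stand-in for source_id: it names either a file on disk or a
// buffer of generated data that already lives in memory.
class toy_source_id
{
public:
  static toy_source_id from_file (const char *path)
  { return toy_source_id (path, 0, false); }
  static toy_source_id from_buffer (const char *data, std::size_t len)
  { return toy_source_id (data, len, true); }

  explicit operator bool () const { return m_ptr != nullptr; }
  bool is_buffer () const { return m_is_buffer; }
  const char *get_filename_or_buffer () const { return m_ptr; }
  std::size_t get_buffer_len () const { return m_len; }

private:
  toy_source_id (const char *ptr, std::size_t len, bool is_buffer)
    : m_ptr (ptr), m_len (len), m_is_buffer (is_buffer) {}

  const char *m_ptr;   // File name, or start of the generated data.
  std::size_t m_len;   // Only meaningful when m_is_buffer is true.
  bool m_is_buffer;
};

// Store line number LINE (1-based) of SRC in *OUT, without its terminator.
// Returns false if SRC is empty, unreadable, or has fewer than LINE lines.
static bool
toy_get_source_line (const toy_source_id &src, int line, std::string *out)
{
  if (!src || line <= 0)
    return false;

  std::string text;
  if (src.is_buffer ())
    // Generated data: use the bytes we were handed.
    text.assign (src.get_filename_or_buffer (), src.get_buffer_len ());
  else
    {
      // Ordinary file: read it from disk (the real cache reads lazily).
      std::ifstream in (src.get_filename_or_buffer (), std::ios::binary);
      if (!in)
	return false;
      text.assign (std::istreambuf_iterator<char> (in),
		   std::istreambuf_iterator<char> ());
    }

  // From here on the two kinds of source are handled identically.
  std::size_t pos = 0;
  for (int cur = 1; pos < text.size (); ++cur)
    {
      const std::size_t nl = text.find ('\n', pos);
      const std::size_t end = (nl == std::string::npos) ? text.size () : nl;
      if (cur == line)
	{
	  out->assign (text, pos, end - pos);
	  return true;
	}
      pos = (nl == std::string::npos) ? text.size () : nl + 1;
    }
  return false;
}
```

The point of the sketch is only that the caller hands over one key type and
the line-scanning logic does not care where the bytes came from, which is the
same split the patch makes between cache_data_source and its two derived
slot classes.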

gcc/ChangeLog:

	* input.cc (class data_cache_slot): New class.
	(file_cache::lookup_data): New function.
	(diagnostics_file_cache_forcibly_evict_data): New function.
	(file_cache::forcibly_evict_data): New function.
	(file_cache::evicted_cache_tab_entry): Generalize (via a template)
	to work for both file_cache_slot and data_cache_slot.
	(file_cache::add_file): Adapt for new interface to
	evicted_cache_tab_entry.
	(file_cache::add_data): New function.
	(data_cache_slot::create): New function.
	(file_cache::file_cache): Support the new m_data_slots member.
	(file_cache::~file_cache): Likewise.
	(file_cache::lookup_or_add_data): New function.
	(file_cache::lookup_or_add): New function that calls either
	lookup_or_add_data or lookup_or_add_file as appropriate.
	(location_get_source_line): Change the FILE_PATH argument to a
	source_id SRC, and use it to support obtaining source lines from
	generated data as well as from files.
	(location_compute_display_column): Support generated data using the
	new features of location_get_source_line.
	(dump_location_info): Likewise.
	* input.h (location_get_source_line): Adjust prototype. Add a new
	convenience overload taking an expanded_location.
	(class cache_data_source): Declare.
	(class data_cache_slot): Declare.
	(class file_cache): Declare new members.
	(diagnostics_file_cache_forcibly_evict_data): Declare.
---
 gcc/input.cc | 171 ++++++++++++++++++++++++++++++++++++++++-----------
 gcc/input.h  |  23 +++++--
 2 files changed, 153 insertions(+), 41 deletions(-)

diff --git a/gcc/input.cc b/gcc/input.cc
index 9377020b460..790279d4273 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -207,6 +207,28 @@ private:
   void maybe_grow ();
 };
 
+/* This is the implementation of cache_data_source for generated
+   data that is already in memory.  */
+class data_cache_slot final : public cache_data_source
+{
+public:
+  void create (const char *data, unsigned int data_len,
+	       unsigned int highest_use_count);
+  bool represents_data (const char *data, unsigned int) const
+  {
+    /* We can just use pointer equality here since the generated data lives in
+       memory in one persistent place.  It isn't anticipated there would be
+       several generated data buffers with the same content, so we don't mind
+       that in such a case we will store it twice.  */
+    return m_data_begin == data;
+  }
+
+protected:
+  /* In contrast to file_cache_slot, we do not own a buffer.  The buffer
+     passed to create() needs to outlive this object.  */
+  bool get_more_data () override { return false; }
+};
+
 /* Current position in real source file.  */
 
 location_t input_location = UNKNOWN_LOCATION;
@@ -382,6 +404,21 @@ file_cache::lookup_file (const char *file_path)
   return r;
 }
 
+data_cache_slot *
+file_cache::lookup_data (const char *data, unsigned int data_len)
+{
+  for (unsigned int i = 0; i != num_file_slots; ++i)
+    {
+      const auto slot = m_data_slots + i;
+      if (slot->represents_data (data, data_len))
+	{
+	  slot->inc_use_count ();
+	  return slot;
+	}
+    }
+  return nullptr;
+}
+
 /* Purge any mention of FILENAME from the cache of files used for
    printing source code.  For use in selftests when working
    with tempfiles.  */
@@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file (const char *file_path)
   global_dc->m_file_cache->forcibly_evict_file (file_path);
 }
 
+void
+diagnostics_file_cache_forcibly_evict_data (const char *data,
+					    unsigned int data_len)
+{
+  if (!global_dc->m_file_cache)
+    return;
+  global_dc->m_file_cache->forcibly_evict_data (data, data_len);
+}
+
 void
 file_cache::forcibly_evict_file (const char *file_path)
 {
@@ -410,36 +456,36 @@ file_cache::forcibly_evict_file (const char *file_path)
   r->reset ();
 }
 
+void
+file_cache::forcibly_evict_data (const char *data, unsigned int data_len)
+{
+  if (auto r = lookup_data (data, data_len))
+    r->reset ();
+}
+
 /* Return the cache that has been less used, recently, or the
    first empty one.  If HIGHEST_USE_COUNT is non-null,
    *HIGHEST_USE_COUNT is set to the highest use count of the entries
    in the cache table.  */
 
-file_cache_slot*
-file_cache::evicted_cache_tab_entry (unsigned *highest_use_count)
+template <class Slot>
+Slot *
+file_cache::evicted_cache_tab_entry (Slot *slots,
+				     unsigned int *highest_use_count)
 {
-  diagnostic_file_cache_init ();
-
-  file_cache_slot *to_evict = &m_file_slots[0];
+  auto to_evict = &slots[0];
   unsigned huc = to_evict->get_use_count ();
   for (unsigned i = 1; i < num_file_slots; ++i)
     {
-      file_cache_slot *c = &m_file_slots[i];
-      bool c_is_empty = (c->get_file_path () == NULL);
-
+      auto c = &slots[i];
       if (c->get_use_count () < to_evict->get_use_count ()
-	  || (to_evict->get_file_path () && c_is_empty))
+	  || (!to_evict->unused () && c->unused ()))
 	/* We evict C because it's either an entry with a lower use
 	   count or one that is empty.  */
 	to_evict = c;
 
       if (huc < c->get_use_count ())
 	huc = c->get_use_count ();
-
-      if (c_is_empty)
-	/* We've reached the end of the cache; subsequent elements are
-	   all empty.  */
-	break;
     }
 
   if (highest_use_count)
@@ -463,12 +509,23 @@ file_cache::add_file (const char *file_path)
     return NULL;
 
   unsigned highest_use_count = 0;
-  file_cache_slot *r = evicted_cache_tab_entry (&highest_use_count);
+  file_cache_slot *r = evicted_cache_tab_entry (m_file_slots,
+						&highest_use_count);
   if (!r->create (in_context, file_path, fp, highest_use_count))
     return NULL;
   return r;
 }
 
+data_cache_slot *
+file_cache::add_data (const char *data, unsigned int data_len)
+{
+  unsigned int highest_use_count = 0;
+  data_cache_slot *r = evicted_cache_tab_entry (m_data_slots,
+						&highest_use_count);
+  r->create (data, data_len, highest_use_count);
+  return r;
+}
+
 /* Get a borrowed char_span to the full content of this file
    as decoded according to the input charset, encoded as UTF-8.  */
 
@@ -525,10 +582,22 @@ file_cache_slot::create (const file_cache::input_context &in_context,
   return true;
 }
 
+void
+data_cache_slot::create (const char *data, unsigned int data_len,
+			 unsigned int highest_use_count)
+{
+  reset ();
+  on_create (highest_use_count + 1,
+	     total_lines_num (source_id {data, data_len}));
+  m_data_begin = data;
+  m_data_end = data + data_len;
+}
+
 /* file_cache's ctor.  */
 
 file_cache::file_cache ()
-: m_file_slots (new file_cache_slot[num_file_slots])
+  : m_file_slots (new file_cache_slot[num_file_slots]),
+    m_data_slots (new data_cache_slot[num_file_slots])
 {
   initialize_input_context (nullptr, false);
 }
@@ -537,6 +606,7 @@ file_cache::file_cache ()
 
 file_cache::~file_cache ()
 {
+  delete[] m_data_slots;
   delete[] m_file_slots;
 }
 
@@ -554,6 +624,24 @@ file_cache::lookup_or_add_file (const char *file_path)
   return r;
 }
 
+data_cache_slot *
+file_cache::lookup_or_add_data (const char *data, unsigned int data_len)
+{
+  data_cache_slot *r = lookup_data (data, data_len);
+  if (!r)
+    r = add_data (data, data_len);
+  return r;
+}
+
+cache_data_source *
+file_cache::lookup_or_add (source_id src)
+{
+  if (src.is_buffer ())
+    return lookup_or_add_data (src.get_filename_or_buffer (),
+			       src.get_buffer_len ());
+  return src ? lookup_or_add_file (src.get_filename_or_buffer ()) : nullptr;
+}
+
 cache_data_source::cache_data_source ()
 : m_data_begin (nullptr), m_data_end (nullptr),
   m_use_count (0),
@@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t line_num,
    If the function fails, a NULL char_span is returned.  */
 
 char_span
-location_get_source_line (const char *file_path, int line)
+location_get_source_line (source_id src, int line)
 {
-  const char *buffer = NULL;
-  ssize_t len;
-
-  if (line == 0)
-    return char_span (NULL, 0);
-
-  if (file_path == NULL)
-    return char_span (NULL, 0);
+  const char_span fail (nullptr, 0);
+  if (!src || line <= 0)
+    return fail;
 
   diagnostic_file_cache_init ();
+  const auto c = global_dc->m_file_cache->lookup_or_add (src);
+  if (!c)
+    return fail;
 
-  file_cache_slot *c = global_dc->m_file_cache->lookup_or_add_file (file_path);
-  if (c == NULL)
-    return char_span (NULL, 0);
-
+  const char *buffer = NULL;
+  ssize_t len;
   bool read = c->read_line_num (line, &buffer, &len);
   if (!read)
-    return char_span (NULL, 0);
+    return fail;
 
   return char_span (buffer, len);
 }
@@ -1193,9 +1277,9 @@ int
 location_compute_display_column (expanded_location exploc,
 				 const cpp_char_column_policy &policy)
 {
-  if (!(exploc.file && *exploc.file && exploc.line && exploc.column))
+  if (!(exploc.src && exploc.line && exploc.column))
     return exploc.column;
-  char_span line = location_get_source_line (exploc.file, exploc.line);
+  char_span line = location_get_source_line (exploc);
   /* If line is NULL, this function returns exploc.column which is the
      desired fallback.  */
   return cpp_byte_column_to_display_column (line.get_buffer (), line.length (),
@@ -1425,13 +1509,26 @@ dump_location_info (FILE *stream)
 	    {
 	      /* Beginning of a new source line: draw the line.  */
 
-	      char_span line_text = location_get_source_line (exploc.file,
-							      exploc.line);
+	      char_span line_text = location_get_source_line (exploc);
 	      if (!line_text)
 		break;
+
+	      const char *fn1, *fn2;
+	      if (exploc.src.is_buffer ())
+		{
+		  fn1 = ORDINARY_MAP_CONTAINING_FILE_NAME (line_table, map);
+		  fn2 = special_fname_generated ();
+		}
+	      else
+		{
+		  fn1 = exploc.file;
+		  fn2 = "";
+		}
+
 	      fprintf (stream,
-		       "%s:%3i|loc:%5i|%.*s\n",
-		       exploc.file, exploc.line,
+		       "%s%s:%3i|loc:%5i|%.*s\n",
+		       fn1, fn2,
+		       exploc.line,
 		       loc,
 		       (int)line_text.length (), line_text.get_buffer ());
 
@@ -1450,7 +1547,7 @@ dump_location_info (FILE *stream)
 	      if (len_loc < 5)
 		len_loc = 5;
 
-	      int indent = 6 + strlen (exploc.file) + len_lnum + len_loc;
+	      int indent = 6 + strlen (fn1) + strlen (fn2) + len_lnum + len_loc;
 
 	      /* Thousands.  */
 	      if (end_location > 999)
diff --git a/gcc/input.h b/gcc/input.h
index 5c578f1a9de..d30673f1089 100644
--- a/gcc/input.h
+++ b/gcc/input.h
@@ -114,15 +114,21 @@ class char_span
   size_t m_n_elts;
 };
 
-extern char_span location_get_source_line (const char *file_path, int line);
+extern char_span location_get_source_line (source_id src, int line);
+inline char_span location_get_source_line (expanded_location exploc)
+{
+  return location_get_source_line (exploc.src, exploc.line);
+}
 extern char *get_source_text_between (location_t, location_t);
 extern char_span get_source_file_content (const char *file_path);
 
 extern bool location_missing_trailing_newline (const char *file_path);
 
-/* Forward decl of slot within file_cache, so that the definition doesn't
+/* Forward decl of slots within file_cache, so that the definition doesn't
    need to be in this header.  */
+class cache_data_source;
 class file_cache_slot;
+class data_cache_slot;
 
 /* A cache of source files for use when emitting diagnostics
    (and in a few places in the C/C++ frontends).
@@ -140,7 +146,10 @@ class file_cache
   ~file_cache ();
 
   file_cache_slot *lookup_or_add_file (const char *file_path);
+  data_cache_slot *lookup_or_add_data (const char *data, unsigned int data_len);
+  cache_data_source *lookup_or_add (source_id src);
   void forcibly_evict_file (const char *file_path);
+  void forcibly_evict_data (const char *data, unsigned int data_len);
 
   /* See comments in diagnostic.h about the input conversion context.  */
   struct input_context
@@ -152,13 +161,17 @@ class file_cache
 				 bool should_skip_bom);
 
  private:
-  file_cache_slot *evicted_cache_tab_entry (unsigned *highest_use_count);
+  template <class Slot>
+  Slot *evicted_cache_tab_entry (Slot *slots, unsigned int *highest_use_count);
+
   file_cache_slot *add_file (const char *file_path);
+  data_cache_slot *add_data (const char *data, unsigned int data_len);
   file_cache_slot *lookup_file (const char *file_path);
+  data_cache_slot *lookup_data (const char *data, unsigned int data_len);
 
- private:
   static const size_t num_file_slots = 16;
   file_cache_slot *m_file_slots;
+  data_cache_slot *m_data_slots;
   input_context in_context;
 };
 
@@ -256,6 +269,8 @@ void dump_location_info (FILE *stream);
 void diagnostics_file_cache_fini (void);
 
 void diagnostics_file_cache_forcibly_evict_file (const char *file_path);
+void diagnostics_file_cache_forcibly_evict_data (const char *data,
+						 unsigned int data_len);
 
 class GTY(()) string_concat
 {

* [PATCH v4 5/8] diagnostics: Support testing generated data in input.cc selftests
  2023-08-09 22:14       ` [PATCH v4 0/8] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
                           ` (3 preceding siblings ...)
  2023-08-09 22:14         ` [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers Lewis Hyatt
@ 2023-08-09 22:14         ` Lewis Hyatt
  2023-08-15 16:27           ` David Malcolm
  2023-08-09 22:14         ` [PATCH v4 6/8] diagnostics: Full support for generated data locations Lewis Hyatt
                           ` (2 subsequent siblings)
  7 siblings, 1 reply; 36+ messages in thread
From: Lewis Hyatt @ 2023-08-09 22:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm, Lewis Hyatt

Add selftests for the new capabilities in input.cc related to source code
locations that are stored in memory rather than ordinary files.
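
As a rough illustration of the test-matrix idea (not part of the patch), the
sketch below runs a test case once per combination of settings and, when
requested, a second time with in-memory data. The names case_sketch and
for_each_case_sketch and the matrix values are hypothetical stand-ins; the
real for_each_line_table_case in input.cc uses its own set of range bits and
boundary locations.

```cpp
// Illustrative sketch only; names and matrix values are hypothetical.
#include <cassert>
#include <functional>

struct case_sketch
{
  int range_bits;
  int base_location;
  bool generated_data;   // true: content lives in memory, not in a file
};

// Run TESTCASE once per combination.  When TEST_GENERATED is set, each
// combination is run twice: first backed by a file, then by in-memory data.
// Returns the number of cases that were run.
inline int
for_each_case_sketch (const std::function<void (const case_sketch &)> &testcase,
                      bool test_generated)
{
  const int range_bits[] = {0, 5};
  const int base_locations[] = {0, 1000, 2000};
  int num_cases = 0;
  for (int bits : range_bits)
    for (int base : base_locations)
      for (int gen = 0; gen != 1 + test_generated; ++gen)
        {
          testcase (case_sketch {bits, base, gen != 0});
          ++num_cases;
        }
  return num_cases;
}
```

Defaulting the extra flag to false, as the patch does for
for_each_line_table_case, lets existing callers stay unchanged while
individual tests opt in to the generated-data variant.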

gcc/ChangeLog:

	* input.cc (temp_source_file::do_linemap_add): New function.
	(line_table_case::line_table_case): Add GENERATED_DATA argument.
	(line_table_test::line_table_test): Implement new M_GENERATED_DATA
	argument.
	(for_each_line_table_case): Optionally include generated data
	locations in the set of cases.
	(test_accessing_ordinary_linemaps): Test generated data locations.
	(test_make_location_nonpure_range_endpoints): Likewise.
	(test_line_offset_overflow): Likewise.
	(input_cc_tests): Likewise.
	* selftest.cc (named_temp_file::named_temp_file): Interpret a null
	SUFFIX argument as a request to use in-memory data.
	(named_temp_file::~named_temp_file): Support in-memory data.
	(temp_source_file::temp_source_file): Likewise.
	(temp_source_file::~temp_source_file): Likewise.
	* selftest.h (struct line_map_ordinary): Forward declare.
	(class named_temp_file): Add missing explicit to the constructor.
	(class temp_source_file): Add new members to support in-memory data.
	(class line_table_test): Likewise.
	(for_each_line_table_case): Adjust prototype.
---
 gcc/input.cc    | 81 +++++++++++++++++++++++++++++++++----------------
 gcc/selftest.cc | 53 +++++++++++++++++++++++++-------
 gcc/selftest.h  | 19 ++++++++++--
 3 files changed, 113 insertions(+), 40 deletions(-)

diff --git a/gcc/input.cc b/gcc/input.cc
index 790279d4273..8c4e40aaf23 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -2066,6 +2066,20 @@ get_num_source_ranges_for_substring (cpp_reader *pfile,
 
 /* Selftests of location handling.  */
 
+/* Wrapper around linemap_add to handle transparently adding either a tmp file,
+   or in-memory generated content.  */
+const line_map_ordinary *
+temp_source_file::do_linemap_add (int line)
+{
+  const line_map *map;
+  if (content_buf)
+    map = linemap_add (line_table, LC_GEN, false, content_buf,
+		       line, content_len);
+  else
+    map = linemap_add (line_table, LC_ENTER, false, get_filename (), line);
+  return linemap_check_ordinary (map);
+}
+
 /* Verify that compare() on linenum_type handles comparisons over the full
    range of the type.  */
 
@@ -2144,13 +2158,16 @@ assert_loceq (const char *exp_filename, int exp_linenum, int exp_colnum,
 class line_table_case
 {
 public:
-  line_table_case (int default_range_bits, int base_location)
+  line_table_case (int default_range_bits, int base_location,
+		   bool generated_data)
   : m_default_range_bits (default_range_bits),
-    m_base_location (base_location)
+    m_base_location (base_location),
+    m_generated_data (generated_data)
   {}
 
   int m_default_range_bits;
   int m_base_location;
+  bool m_generated_data;
 };
 
 /* Constructor.  Store the old value of line_table, and create a new
@@ -2167,6 +2184,7 @@ line_table_test::line_table_test ()
   gcc_assert (saved_line_table->round_alloc_size);
   line_table->round_alloc_size = saved_line_table->round_alloc_size;
   line_table->default_range_bits = 0;
+  m_generated_data = false;
 }
 
 /* Constructor.  Store the old value of line_table, and create a new
@@ -2188,6 +2206,7 @@ line_table_test::line_table_test (const line_table_case &case_)
       line_table->highest_location = case_.m_base_location;
       line_table->highest_line = case_.m_base_location;
     }
+  m_generated_data = case_.m_generated_data;
 }
 
 /* Destructor.  Restore the old value of line_table.  */
@@ -2207,7 +2226,10 @@ test_accessing_ordinary_linemaps (const line_table_case &case_)
   line_table_test ltt (case_);
 
   /* Build a simple linemap describing some locations. */
-  linemap_add (line_table, LC_ENTER, false, "foo.c", 0);
+  if (ltt.m_generated_data)
+    linemap_add (line_table, LC_GEN, false, "some data", 0, 10);
+  else
+    linemap_add (line_table, LC_ENTER, false, "foo.c", 0);
 
   linemap_line_start (line_table, 1, 100);
   location_t loc_a = linemap_position_for_column (line_table, 1);
@@ -2257,21 +2279,23 @@ test_accessing_ordinary_linemaps (const line_table_case &case_)
   linemap_add (line_table, LC_LEAVE, false, NULL, 0);
 
   /* Verify that we can recover the location info.  */
-  assert_loceq ("foo.c", 1, 1, loc_a);
-  assert_loceq ("foo.c", 1, 23, loc_b);
-  assert_loceq ("foo.c", 2, 1, loc_c);
-  assert_loceq ("foo.c", 2, 17, loc_d);
-  assert_loceq ("foo.c", 3, 700, loc_e);
-  assert_loceq ("foo.c", 4, 100, loc_back_to_short);
+  const auto fname
+    = (ltt.m_generated_data ? special_fname_generated () : "foo.c");
+  assert_loceq (fname, 1, 1, loc_a);
+  assert_loceq (fname, 1, 23, loc_b);
+  assert_loceq (fname, 2, 1, loc_c);
+  assert_loceq (fname, 2, 17, loc_d);
+  assert_loceq (fname, 3, 700, loc_e);
+  assert_loceq (fname, 4, 100, loc_back_to_short);
 
   /* In the very wide line, the initial location should be fully tracked.  */
-  assert_loceq ("foo.c", 5, 2000, loc_start_of_very_long_line);
+  assert_loceq (fname, 5, 2000, loc_start_of_very_long_line);
   /* ...but once we exceed LINE_MAP_MAX_COLUMN_NUMBER column-tracking should
      be disabled.  */
-  assert_loceq ("foo.c", 5, 0, loc_too_wide);
-  assert_loceq ("foo.c", 5, 0, loc_too_wide_2);
+  assert_loceq (fname, 5, 0, loc_too_wide);
+  assert_loceq (fname, 5, 0, loc_too_wide_2);
   /*...and column-tracking should be re-enabled for subsequent lines.  */
-  assert_loceq ("foo.c", 6, 10, loc_sane_again);
+  assert_loceq (fname, 6, 10, loc_sane_again);
 
   assert_loceq ("bar.c", 1, 150, loc_f);
 
@@ -2318,10 +2342,11 @@ test_make_location_nonpure_range_endpoints (const line_table_case &case_)
      with C++ frontend.
      ....................0000000001111111111222.
      ....................1234567890123456789012.  */
-  const char *content = "     r += !aaa == bbb;\n";
-  temp_source_file tmp (SELFTEST_LOCATION, ".C", content);
   line_table_test ltt (case_);
-  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 1);
+  const char *content = "     r += !aaa == bbb;\n";
+  temp_source_file tmp (SELFTEST_LOCATION, ".C", content, strlen (content),
+			ltt.m_generated_data);
+  tmp.do_linemap_add (1);
 
   const location_t c11 = linemap_position_for_column (line_table, 11);
   const location_t c12 = linemap_position_for_column (line_table, 12);
@@ -3978,7 +4003,8 @@ static const location_t boundary_locations[] = {
 /* Run TESTCASE multiple times, once for each case in our test matrix.  */
 
 void
-for_each_line_table_case (void (*testcase) (const line_table_case &))
+for_each_line_table_case (void (*testcase) (const line_table_case &),
+			  bool test_generated_data)
 {
   /* As noted above in the description of struct line_table_case,
      we want to explore a test matrix of interesting line_table
@@ -3997,16 +4023,19 @@ for_each_line_table_case (void (*testcase) (const line_table_case &))
       const int num_boundary_locations = ARRAY_SIZE (boundary_locations);
       for (int loc_idx = 0; loc_idx < num_boundary_locations; loc_idx++)
 	{
-	  line_table_case c (default_range_bits, boundary_locations[loc_idx]);
-
-	  testcase (c);
-
-	  num_cases_tested++;
+	  /* ...and try both normal files, and internally generated data.  */
+	  for (int gen = 0; gen != 1+test_generated_data; ++gen)
+	    {
+	      line_table_case c (default_range_bits,
+				 boundary_locations[loc_idx], gen);
+	      testcase (c);
+	      num_cases_tested++;
+	    }
 	}
     }
 
   /* Verify that we fully covered the test matrix.  */
-  ASSERT_EQ (num_cases_tested, 2 * 12);
+  ASSERT_EQ (num_cases_tested, 2 * 12 * (1+test_generated_data));
 }
 
 /* Verify that when presented with a consecutive pair of locations with
@@ -4017,7 +4046,7 @@ for_each_line_table_case (void (*testcase) (const line_table_case &))
 static void
 test_line_offset_overflow ()
 {
-  line_table_test ltt (line_table_case (5, 0));
+  line_table_test ltt (line_table_case (5, 0, false));
 
   linemap_add (line_table, LC_ENTER, false, "foo.c", 0);
   linemap_line_start (line_table, 1, 100);
@@ -4257,9 +4286,9 @@ input_cc_tests ()
   test_should_have_column_data_p ();
   test_unknown_location ();
   test_builtins ();
-  for_each_line_table_case (test_make_location_nonpure_range_endpoints);
+  for_each_line_table_case (test_make_location_nonpure_range_endpoints, true);
 
-  for_each_line_table_case (test_accessing_ordinary_linemaps);
+  for_each_line_table_case (test_accessing_ordinary_linemaps, true);
   for_each_line_table_case (test_lexer);
   for_each_line_table_case (test_lexer_string_locations_simple);
   for_each_line_table_case (test_lexer_string_locations_ebcdic);
diff --git a/gcc/selftest.cc b/gcc/selftest.cc
index 20c10bbd055..7126b9901dd 100644
--- a/gcc/selftest.cc
+++ b/gcc/selftest.cc
@@ -163,14 +163,21 @@ assert_str_startswith (const location &loc,
 
 named_temp_file::named_temp_file (const char *suffix)
 {
-  m_filename = make_temp_file (suffix);
-  ASSERT_NE (m_filename, NULL);
+  if (suffix)
+    {
+      m_filename = make_temp_file (suffix);
+      ASSERT_NE (m_filename, NULL);
+    }
+  else
+    m_filename = nullptr;
 }
 
 /* Destructor.  Delete the tempfile.  */
 
 named_temp_file::~named_temp_file ()
 {
+  if (!m_filename)
+    return;
   unlink (m_filename);
   diagnostics_file_cache_forcibly_evict_file (m_filename);
   free (m_filename);
@@ -183,7 +190,9 @@ named_temp_file::~named_temp_file ()
 temp_source_file::temp_source_file (const location &loc,
 				    const char *suffix,
 				    const char *content)
-: named_temp_file (suffix)
+: named_temp_file (suffix),
+  content_buf (nullptr),
+  content_len (0)
 {
   FILE *out = fopen (get_filename (), "w");
   if (!out)
@@ -192,19 +201,41 @@ temp_source_file::temp_source_file (const location &loc,
   fclose (out);
 }
 
-/* As above, but with a size, to allow for NUL bytes in CONTENT.  */
+/* As above, but with a size, to allow for NUL bytes in CONTENT.  When
+   IS_GENERATED==true, the data is kept in memory instead, for testing LC_GEN
+   maps.  */
 
 temp_source_file::temp_source_file (const location &loc,
 				    const char *suffix,
 				    const char *content,
-				    size_t sz)
-: named_temp_file (suffix)
+				    size_t sz,
+				    bool is_generated)
+: named_temp_file (is_generated ? nullptr : suffix),
+  content_buf (is_generated ? XNEWVEC (char, sz) : nullptr),
+  content_len (is_generated ? sz : 0)
 {
-  FILE *out = fopen (get_filename (), "w");
-  if (!out)
-    fail_formatted (loc, "unable to open tempfile: %s", get_filename ());
-  fwrite (content, sz, 1, out);
-  fclose (out);
+  if (is_generated)
+    {
+      gcc_assert (sz); /* Empty generated content is not supported.  */
+      memcpy (content_buf, content, sz);
+    }
+  else
+    {
+      FILE *out = fopen (get_filename (), "w");
+      if (!out)
+	fail_formatted (loc, "unable to open tempfile: %s", get_filename ());
+      fwrite (content, sz, 1, out);
+      fclose (out);
+    }
+}
+
+temp_source_file::~temp_source_file ()
+{
+  if (content_buf)
+    {
+      diagnostics_file_cache_forcibly_evict_data (content_buf, content_len);
+      XDELETEVEC (content_buf);
+    }
 }
 
 /* Avoid introducing locale-specific differences in the results
diff --git a/gcc/selftest.h b/gcc/selftest.h
index 20d522afda4..ede3b008145 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -25,6 +25,8 @@ along with GCC; see the file COPYING3.  If not see
 
 #if CHECKING_P
 
+struct line_map_ordinary;
+
 namespace selftest {
 
 /* A struct describing the source-location of a selftest, to make it
@@ -96,7 +98,7 @@ extern void assert_str_startswith (const location &loc,
 class named_temp_file
 {
  public:
-  named_temp_file (const char *suffix);
+  explicit named_temp_file (const char *suffix);
   ~named_temp_file ();
   const char *get_filename () const { return m_filename; }
 
@@ -113,7 +115,13 @@ class temp_source_file : public named_temp_file
   temp_source_file (const location &loc, const char *suffix,
 		    const char *content);
   temp_source_file (const location &loc, const char *suffix,
-		    const char *content, size_t sz);
+		    const char *content, size_t sz,
+		    bool is_generated = false);
+  ~temp_source_file ();
+
+  char *const content_buf;
+  const size_t content_len;
+  const line_map_ordinary *do_linemap_add (int line); /* In input.cc */
 };
 
 /* RAII-style class for avoiding introducing locale-specific differences
@@ -171,6 +179,10 @@ class line_table_test
 
   /* Destructor.  Restore the saved line_table.  */
   ~line_table_test ();
+
+  /* When this is enabled in the line_table_case, test storing all the data
+     in memory rather than a file.  */
+  bool m_generated_data;
 };
 
 /* Helper function for selftests that need a function decl.  */
@@ -183,7 +195,8 @@ extern tree make_fndecl (tree return_type,
 /* Run TESTCASE multiple times, once for each case in our test matrix.  */
 
 extern void
-for_each_line_table_case (void (*testcase) (const line_table_case &));
+for_each_line_table_case (void (*testcase) (const line_table_case &),
+			  bool test_generated_data = false);
 
 /* Read the contents of PATH into memory, returning a 0-terminated buffer
    that must be freed by the caller.

* [PATCH v4 6/8] diagnostics: Full support for generated data locations
  2023-08-09 22:14       ` [PATCH v4 0/8] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
                           ` (4 preceding siblings ...)
  2023-08-09 22:14         ` [PATCH v4 5/8] diagnostics: Support testing generated data in input.cc selftests Lewis Hyatt
@ 2023-08-09 22:14         ` Lewis Hyatt
  2023-08-15 16:39           ` David Malcolm
  2023-08-09 22:14         ` [PATCH v4 7/8] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings Lewis Hyatt
  2023-08-09 22:14         ` [PATCH v4 8/8] diagnostics: Support generated data locations in SARIF output Lewis Hyatt
  7 siblings, 1 reply; 36+ messages in thread
From: Lewis Hyatt @ 2023-08-09 22:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm, Lewis Hyatt

Previous patches in this series have laid the groundwork for supporting
source code locations in memory ("generated data") rather than ordinary
files. This patch completes the support by adding awareness of such
locations to all places that need to support them. The main changes are to
diagnostic-show-locus.cc; the others are primarily small tweaks such as
changing from the FILE to the SRC member when inspecting an
expanded_location.
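
To make the FILE-versus-SRC distinction concrete, here is a minimal
self-contained sketch (not the code from this series) of an identifier that
names either a file or an in-memory buffer, plus a line lookup for the buffer
case. The class source_id_sketch and the helper get_buffer_line_sketch are
hypothetical; only the accessor names loosely follow those used with
source_id in the patches. The sketch assumes a nonzero length is what marks a
buffer, which fits the series' rule that empty generated content is not
supported.

```cpp
// Illustrative sketch only; this is not the source_id class from the patch.
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Identifies a source either by file name or by an in-memory buffer.
class source_id_sketch
{
public:
  source_id_sketch () = default;
  explicit source_id_sketch (const char *filename)
    : m_ptr (filename) {}
  source_id_sketch (const char *buffer, std::size_t len)
    : m_ptr (buffer), m_len (len) {}

  bool is_buffer () const { return m_len != 0; }
  explicit operator bool () const { return m_ptr != nullptr; }
  const char *get_filename_or_buffer () const { return m_ptr; }
  std::size_t get_buffer_len () const { return m_len; }

private:
  const char *m_ptr = nullptr;
  std::size_t m_len = 0;   // nonzero only for in-memory buffers
};

// Return line LINE (1-based) of an in-memory source, without its newline.
// Returns an empty string if SRC is not a buffer or LINE is out of range.
inline std::string
get_buffer_line_sketch (const source_id_sketch &src, int line)
{
  if (!src || !src.is_buffer () || line <= 0)
    return std::string ();
  const char *p = src.get_filename_or_buffer ();
  const char *const end = p + src.get_buffer_len ();
  for (int cur = 1; p < end; ++cur)
    {
      const char *nl
        = static_cast<const char *> (std::memchr (p, '\n', end - p));
      const char *line_end = nl ? nl : end;
      if (cur == line)
        return std::string (p, line_end);
      p = nl ? nl + 1 : end;
    }
  return std::string ();
}
```

A caller that previously passed a file name and line number would instead
pass the identifier, so the same call site serves both ordinary files and
generated data. Note that this sketch cannot distinguish an empty line from a
failed lookup; the real code returns a char_span, which can.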

gcc/c-family/ChangeLog:

	* c-format.cc (get_corrected_substring): Use the new overload of
	location_get_source_line() to support generated data.
	* c-indentation.cc (get_visual_column): Likewise.
	(get_first_nws_vis_column): Change argument from a plain file name
	to a source_id.
	(detect_intervening_unindent): Likewise.
	(should_warn_for_misleading_indentation): Pass
	detect_intervening_unindent() the SRC field rather than the FILE
	field from the expanded_location.

gcc/ChangeLog:

	* gcc-rich-location.cc (blank_line_before_p): Use the new overload
	of location_get_source_line() to support generated data.
	* input.cc (get_source_text_between): Likewise.
	(get_substring_ranges_for_loc): Likewise.
	(get_source_file_content): Change the argument from a plain filename
	to a source_id.
	(location_missing_trailing_newline): Likewise.
	* input.h (get_source_file_content): Adjust prototype.
	(location_missing_trailing_newline): Likewise.
	* diagnostic-show-locus.cc (layout::calculate_x_offset_display): Use
	the new overload of location_get_source_line() to support generated
	data.
	(layout::print_line): Likewise.
	(class line_corrections): Change m_filename from a plain filename to
	a source_id.
	(source_line::source_line): Change argument from a plain filename to
	a source_id.
	(line_corrections::add_hint): Adapt to source_line change.
	(layout::print_trailing_fixits): Adapt to line_corrections change.
	(test_layout_x_offset_display_utf8): Test generated data too.
	(test_layout_x_offset_display_tab): Likewise.
	(test_diagnostic_show_locus_one_liner): Likewise.
	(test_diagnostic_show_locus_one_liner_utf8): Likewise.
	(test_add_location_if_nearby): Likewise.
	(test_diagnostic_show_locus_fixit_lines): Likewise.
	(test_fixit_consolidation): Likewise.
	(test_overlapped_fixit_printing): Likewise.
	(test_overlapped_fixit_printing_utf8): Likewise.
	(test_overlapped_fixit_printing_2): Likewise.
	(test_fixit_insert_containing_newline): Likewise.
	(test_fixit_insert_containing_newline_2): Likewise.
	(test_fixit_replace_containing_newline): Likewise.
	(test_fixit_deletion_affecting_newline): Likewise.
	(test_tab_expansion): Likewise.
	(test_escaping_bytes_1): Likewise.
	(test_escaping_bytes_2): Likewise.
	(test_line_numbers_multiline_range): Likewise.
	(diagnostic_show_locus_cc_tests): Likewise.
---
 gcc/c-family/c-format.cc      |   2 +-
 gcc/c-family/c-indentation.cc |   8 +-
 gcc/diagnostic-show-locus.cc  | 227 ++++++++++++++++++----------------
 gcc/gcc-rich-location.cc      |   2 +-
 gcc/input.cc                  |  21 ++--
 gcc/input.h                   |   6 +-
 6 files changed, 136 insertions(+), 130 deletions(-)

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index 529b1408179..929ec24622c 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -4537,7 +4537,7 @@ get_corrected_substring (const substring_loc &fmt_loc,
   if (caret.column > finish.column)
     return NULL;
 
-  char_span line = location_get_source_line (start.file, start.line);
+  char_span line = location_get_source_line (start);
   if (!line)
     return NULL;
 
diff --git a/gcc/c-family/c-indentation.cc b/gcc/c-family/c-indentation.cc
index fce74991aae..27a90d9cc15 100644
--- a/gcc/c-family/c-indentation.cc
+++ b/gcc/c-family/c-indentation.cc
@@ -50,7 +50,7 @@ get_visual_column (expanded_location exploc,
 		   unsigned int *first_nws,
 		   unsigned int tab_width)
 {
-  char_span line = location_get_source_line (exploc.file, exploc.line);
+  char_span line = location_get_source_line (exploc);
   if (!line)
     return false;
   if ((size_t)exploc.column > line.length ())
@@ -87,7 +87,7 @@ get_visual_column (expanded_location exploc,
    Otherwise, return false, leaving *FIRST_NWS untouched.  */
 
 static bool
-get_first_nws_vis_column (const char *file, int line_num,
+get_first_nws_vis_column (source_id file, int line_num,
 			  unsigned int *first_nws,
 			  unsigned int tab_width)
 {
@@ -158,7 +158,7 @@ get_first_nws_vis_column (const char *file, int line_num,
    Return true if such an unindent/outdent is detected.  */
 
 static bool
-detect_intervening_unindent (const char *file,
+detect_intervening_unindent (source_id file,
 			     int body_line,
 			     int next_stmt_line,
 			     unsigned int vis_column,
@@ -528,7 +528,7 @@ should_warn_for_misleading_indentation (const token_indent_info &guard_tinfo,
 
 	  /* Don't warn if there is an unindent between the two statements. */
 	  int vis_column = MIN (next_stmt_vis_column, body_vis_column);
-	  if (detect_intervening_unindent (body_exploc.file, body_exploc.line,
+	  if (detect_intervening_unindent (body_exploc.src, body_exploc.line,
 					   next_stmt_exploc.line,
 					   vis_column, tab_width))
 	    return false;
diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
index bf969ab6d6a..b75c272caae 100644
--- a/gcc/diagnostic-show-locus.cc
+++ b/gcc/diagnostic-show-locus.cc
@@ -1618,8 +1618,7 @@ layout::calculate_x_offset_display ()
       return;
     }
 
-  const char_span line = location_get_source_line (m_exploc.file,
-						   m_exploc.line);
+  const char_span line = location_get_source_line (m_exploc);
   if (!line)
     {
       /* Nothing to do, we couldn't find the source line.  */
@@ -2407,16 +2406,15 @@ class line_corrections
 {
 public:
   line_corrections (const char_display_policy &policy,
-		    const char *filename,
-		    linenum_type row)
-  : m_policy (policy), m_filename (filename), m_row (row)
+		    source_id src, linenum_type row)
+  : m_policy (policy), m_src (src), m_row (row)
   {}
   ~line_corrections ();
 
   void add_hint (const fixit_hint *hint);
 
   const char_display_policy &m_policy;
-  const char *m_filename;
+  source_id m_src;
   linenum_type m_row;
   auto_vec <correction *> m_corrections;
 };
@@ -2437,7 +2435,7 @@ line_corrections::~line_corrections ()
 class source_line
 {
 public:
-  source_line (const char *filename, int line);
+  source_line (source_id src, int line);
 
   char_span as_span () { return char_span (chars, width); }
 
@@ -2447,9 +2445,9 @@ public:
 
 /* source_line's ctor.  */
 
-source_line::source_line (const char *filename, int line)
+source_line::source_line (source_id src, int line)
 {
-  char_span span = location_get_source_line (filename, line);
+  char_span span = location_get_source_line (src, line);
   chars = span.get_buffer ();
   width = span.length ();
 }
@@ -2493,7 +2491,7 @@ line_corrections::add_hint (const fixit_hint *hint)
 				affected_bytes.start - 1);
 
 	  /* Try to read the source.  */
-	  source_line line (m_filename, m_row);
+	  source_line line (m_src, m_row);
 	  if (line.chars && between.finish < line.width)
 	    {
 	      /* Consolidate into the last correction:
@@ -2549,7 +2547,7 @@ layout::print_trailing_fixits (linenum_type row)
 {
   /* Build a list of correction instances for the line,
      potentially consolidating hints (for the sake of readability).  */
-  line_corrections corrections (m_policy, m_exploc.file, row);
+  line_corrections corrections (m_policy, m_exploc.src, row);
   for (unsigned int i = 0; i < m_fixit_hints.length (); i++)
     {
       const fixit_hint *hint = m_fixit_hints[i];
@@ -2787,7 +2785,7 @@ layout::show_ruler (int max_column) const
 void
 layout::print_line (linenum_type row)
 {
-  char_span line = location_get_source_line (m_exploc.file, row);
+  char_span line = location_get_source_line (m_exploc.src, row);
   if (!line)
     return;
 
@@ -2996,10 +2994,10 @@ test_layout_x_offset_display_utf8 (const line_table_case &case_)
      no multibyte characters earlier on the line.  */
   const int emoji_col = 102;
 
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
   line_table_test ltt (case_);
-
-  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 1);
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, 1 + line_bytes,
+			ltt.m_generated_data);
+  tmp.do_linemap_add (1);
 
   location_t line_end = linemap_position_for_column (line_table, line_bytes);
 
@@ -3007,17 +3005,23 @@ test_layout_x_offset_display_utf8 (const line_table_case &case_)
   if (line_end > LINE_MAP_MAX_LOCATION_WITH_COLS)
     return;
 
-  ASSERT_STREQ (tmp.get_filename (), LOCATION_FILE (line_end));
+  if (ltt.m_generated_data)
+    {
+      ASSERT_EQ (nullptr, tmp.get_filename ());
+      ASSERT_STREQ (special_fname_generated (), LOCATION_FILE (line_end));
+    }
+  else
+    ASSERT_STREQ (tmp.get_filename (), LOCATION_FILE (line_end));
   ASSERT_EQ (1, LOCATION_LINE (line_end));
   ASSERT_EQ (line_bytes, LOCATION_COLUMN (line_end));
 
-  char_span lspan = location_get_source_line (tmp.get_filename (), 1);
+  const expanded_location xloc = expand_location (line_end);
+  char_span lspan = location_get_source_line (xloc);
   ASSERT_EQ (line_display_cols,
 	     cpp_display_width (lspan.get_buffer (), lspan.length (),
 				def_policy ()));
   ASSERT_EQ (line_display_cols,
-	     location_compute_display_column (expand_location (line_end),
-					      def_policy ()));
+	     location_compute_display_column (xloc, def_policy ()));
   ASSERT_EQ (0, memcmp (lspan.get_buffer () + (emoji_col - 1),
 			"\xf0\x9f\x98\x82\xf0\x9f\x98\x82", 8));
 
@@ -3149,10 +3153,10 @@ test_layout_x_offset_display_tab (const line_table_case &case_)
      a space would have taken up.  */
   ASSERT_EQ (7, extra_width[10]);
 
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
   line_table_test ltt (case_);
-
-  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 1);
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, line_bytes + 1,
+			ltt.m_generated_data);
+  tmp.do_linemap_add (1);
 
   location_t line_end = linemap_position_for_column (line_table, line_bytes);
 
@@ -3161,7 +3165,8 @@ test_layout_x_offset_display_tab (const line_table_case &case_)
     return;
 
   /* Check that cpp_display_width handles the tabs as expected.  */
-  char_span lspan = location_get_source_line (tmp.get_filename (), 1);
+  const expanded_location xloc = expand_location (line_end);
+  char_span lspan = location_get_source_line (xloc);
   ASSERT_EQ ('\t', *(lspan.get_buffer () + (tab_col - 1)));
   for (int tabstop = 1; tabstop != num_tabstops; ++tabstop)
     {
@@ -3170,8 +3175,7 @@ test_layout_x_offset_display_tab (const line_table_case &case_)
 		 cpp_display_width (lspan.get_buffer (), lspan.length (),
 				    policy));
       ASSERT_EQ (line_bytes + extra_width[tabstop],
-		 location_compute_display_column (expand_location (line_end),
-						  policy));
+		 location_compute_display_column (xloc, policy));
     }
 
   /* Check that the tab is expanded to the expected number of spaces.  */
@@ -3795,10 +3799,10 @@ test_diagnostic_show_locus_one_liner (const line_table_case &case_)
      ....................0000000001111111.
      ....................1234567890123456.  */
   const char *content = "foo = bar.field;\n";
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
   line_table_test ltt (case_);
-
-  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 1);
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, strlen (content),
+			ltt.m_generated_data);
+  tmp.do_linemap_add (1);
 
   location_t line_end = linemap_position_for_column (line_table, 16);
 
@@ -3806,7 +3810,14 @@ test_diagnostic_show_locus_one_liner (const line_table_case &case_)
   if (line_end > LINE_MAP_MAX_LOCATION_WITH_COLS)
     return;
 
-  ASSERT_STREQ (tmp.get_filename (), LOCATION_FILE (line_end));
+  if (ltt.m_generated_data)
+    {
+      ASSERT_EQ (nullptr, tmp.get_filename ());
+      ASSERT_STREQ (special_fname_generated (), LOCATION_FILE (line_end));
+    }
+  else
+    ASSERT_STREQ (tmp.get_filename (), LOCATION_FILE (line_end));
+
   ASSERT_EQ (1, LOCATION_LINE (line_end));
   ASSERT_EQ (16, LOCATION_COLUMN (line_end));
 
@@ -4377,10 +4388,10 @@ test_diagnostic_show_locus_one_liner_utf8 (const line_table_case &case_)
     /* 0000000000000000000001111111111111111111222222222222222222222233333
        1111222233334444567890122223333456789999000011112222345678999900001
        Byte columns.  */
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
   line_table_test ltt (case_);
-
-  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 1);
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, strlen (content),
+			ltt.m_generated_data);
+  tmp.do_linemap_add (1);
 
   location_t line_end = linemap_position_for_column (line_table, 31);
 
@@ -4388,11 +4399,18 @@ test_diagnostic_show_locus_one_liner_utf8 (const line_table_case &case_)
   if (line_end > LINE_MAP_MAX_LOCATION_WITH_COLS)
     return;
 
-  ASSERT_STREQ (tmp.get_filename (), LOCATION_FILE (line_end));
+  if (ltt.m_generated_data)
+    {
+      ASSERT_EQ (nullptr, tmp.get_filename ());
+      ASSERT_STREQ (special_fname_generated (), LOCATION_FILE (line_end));
+    }
+  else
+    ASSERT_STREQ (tmp.get_filename (), LOCATION_FILE (line_end));
+
   ASSERT_EQ (1, LOCATION_LINE (line_end));
   ASSERT_EQ (31, LOCATION_COLUMN (line_end));
 
-  char_span lspan = location_get_source_line (tmp.get_filename (), 1);
+  char_span lspan = location_get_source_line (expand_location (line_end));
   ASSERT_EQ (25, cpp_display_width (lspan.get_buffer (), lspan.length (),
 				    def_policy ()));
   ASSERT_EQ (25, location_compute_display_column (expand_location (line_end),
@@ -4429,12 +4447,10 @@ test_add_location_if_nearby (const line_table_case &case_)
        "  double x;\n"                              /* line 4.  */
        "  double y;\n"                              /* line 5.  */
        ";\n");                                      /* line 6.  */
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
   line_table_test ltt (case_);
-
-  const line_map_ordinary *ord_map
-    = linemap_check_ordinary (linemap_add (line_table, LC_ENTER, false,
-					   tmp.get_filename (), 0));
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, strlen (content),
+			ltt.m_generated_data);
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
 
   linemap_line_start (line_table, 1, 100);
 
@@ -4493,12 +4509,10 @@ test_diagnostic_show_locus_fixit_lines (const line_table_case &case_)
        "\n"                                      /* line 4.  */
        "\n"                                      /* line 5.  */
        "                        : 0.0};\n");     /* line 6.  */
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
   line_table_test ltt (case_);
-
-  const line_map_ordinary *ord_map
-    = linemap_check_ordinary (linemap_add (line_table, LC_ENTER, false,
-					   tmp.get_filename (), 0));
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, strlen (content),
+			ltt.m_generated_data);
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
 
   linemap_line_start (line_table, 1, 100);
 
@@ -4589,8 +4603,10 @@ static void
 test_fixit_consolidation (const line_table_case &case_)
 {
   line_table_test ltt (case_);
-
-  linemap_add (line_table, LC_ENTER, false, "test.c", 1);
+  if (ltt.m_generated_data)
+    linemap_add (line_table, LC_GEN, false, "some content", 1, 13);
+  else
+    linemap_add (line_table, LC_ENTER, false, "test.c", 1);
 
   const location_t c10 = linemap_position_for_column (line_table, 10);
   const location_t c15 = linemap_position_for_column (line_table, 15);
@@ -4736,13 +4752,11 @@ test_overlapped_fixit_printing (const line_table_case &case_)
      ...123456789012345678901234567890123456789.  */
   const char *content
     = ("  foo *f = (foo *)ptr->field;\n");
-  temp_source_file tmp (SELFTEST_LOCATION, ".C", content);
   line_table_test ltt (case_);
+  temp_source_file tmp (SELFTEST_LOCATION, ".C", content, strlen (content),
+			ltt.m_generated_data);
 
-  const line_map_ordinary *ord_map
-    = linemap_check_ordinary (linemap_add (line_table, LC_ENTER, false,
-					   tmp.get_filename (), 0));
-
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
   linemap_line_start (line_table, 1, 100);
 
   const location_t final_line_end
@@ -4763,6 +4777,8 @@ test_overlapped_fixit_printing (const line_table_case &case_)
     = linemap_position_for_line_and_column (line_table, ord_map, 1, 28);
   const location_t expr = make_location (expr_start, expr_start, expr_finish);
 
+  const expanded_location xloc = expand_location (expr);
+
   /* Various examples of fix-it hints that aren't themselves consolidated,
      but for which the *printing* may need consolidation.  */
 
@@ -4806,7 +4822,7 @@ test_overlapped_fixit_printing (const line_table_case &case_)
     /* Add each hint in turn to a line_corrections instance,
        and verify that they are consolidated into one correction instance
        as expected.  */
-    line_corrections lc (policy, tmp.get_filename (), 1);
+    line_corrections lc (policy, xloc.src, xloc.line);
 
     /* The first replace hint by itself.  */
     lc.add_hint (hint_0);
@@ -4947,13 +4963,10 @@ test_overlapped_fixit_printing_utf8 (const line_table_case &case_)
        12344445555666677778901234566667777888899990123456789012333344445
        Byte columns.  */
 
-  temp_source_file tmp (SELFTEST_LOCATION, ".C", content);
   line_table_test ltt (case_);
-
-  const line_map_ordinary *ord_map
-    = linemap_check_ordinary (linemap_add (line_table, LC_ENTER, false,
-					   tmp.get_filename (), 0));
-
+  temp_source_file tmp (SELFTEST_LOCATION, ".C", content, strlen (content),
+			ltt.m_generated_data);
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
   linemap_line_start (line_table, 1, 100);
 
   const location_t final_line_end
@@ -4974,6 +4987,8 @@ test_overlapped_fixit_printing_utf8 (const line_table_case &case_)
     = linemap_position_for_line_and_column (line_table, ord_map, 1, 34);
   const location_t expr = make_location (expr_start, expr_start, expr_finish);
 
+  const expanded_location xloc = expand_location (expr);
+
   /* Various examples of fix-it hints that aren't themselves consolidated,
      but for which the *printing* may need consolidation.  */
 
@@ -5022,7 +5037,7 @@ test_overlapped_fixit_printing_utf8 (const line_table_case &case_)
     /* Add each hint in turn to a line_corrections instance,
        and verify that they are consolidated into one correction instance
        as expected.  */
-    line_corrections lc (policy, tmp.get_filename (), 1);
+    line_corrections lc (policy, xloc.src, xloc.line);
 
     /* The first replace hint by itself.  */
     lc.add_hint (hint_0);
@@ -5180,13 +5195,11 @@ test_overlapped_fixit_printing_2 (const line_table_case &case_)
      ...123456789012345678901234567890123456789.  */
   const char *content
     = ("int a5[][0][0] = { 1, 2 };\n");
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
-  line_table_test ltt (case_);
-
-  const line_map_ordinary *ord_map
-    = linemap_check_ordinary (linemap_add (line_table, LC_ENTER, false,
-					   tmp.get_filename (), 0));
 
+  line_table_test ltt (case_);
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, strlen (content),
+			ltt.m_generated_data);
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
   linemap_line_start (line_table, 1, 100);
 
   const location_t final_line_end
@@ -5271,10 +5284,10 @@ test_fixit_insert_containing_newline (const line_table_case &case_)
 			     "      x = a;\n"  /* line 2. */
 			     "    case 'b':\n" /* line 3. */
 			     "      x = b;\n");/* line 4. */
-
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", old_content);
   line_table_test ltt (case_);
-  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 3);
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", old_content,
+			strlen (old_content), false);
+  tmp.do_linemap_add (3);
 
   location_t case_start = linemap_position_for_column (line_table, 5);
   location_t case_finish = linemap_position_for_column (line_table, 13);
@@ -5342,12 +5355,11 @@ test_fixit_insert_containing_newline_2 (const line_table_case &case_)
 			     "{\n"              /* line 2. */
 			     " putchar (ch);\n" /* line 3. */
 			     "}\n");            /* line 4. */
-
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", old_content);
   line_table_test ltt (case_);
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", old_content,
+			strlen (old_content), ltt.m_generated_data);
 
-  const line_map_ordinary *ord_map = linemap_check_ordinary
-    (linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 0));
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
   linemap_line_start (line_table, 1, 100);
 
   /* The primary range is the "putchar" token.  */
@@ -5406,9 +5418,10 @@ test_fixit_replace_containing_newline (const line_table_case &case_)
     .........................1234567890123.  */
   const char *old_content = "foo = bar ();\n";
 
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", old_content);
   line_table_test ltt (case_);
-  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 1);
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", old_content,
+			strlen (old_content), ltt.m_generated_data);
+  tmp.do_linemap_add (1);
 
   /* Replace the " = " with "\n  = ", as if we were reformatting an
      overly long line.  */
@@ -5446,10 +5459,10 @@ test_fixit_deletion_affecting_newline (const line_table_case &case_)
   const char *old_content = ("foo = bar (\n"
 			     "      );\n");
 
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", old_content);
   line_table_test ltt (case_);
-  const line_map_ordinary *ord_map = linemap_check_ordinary
-    (linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 0));
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", old_content,
+			strlen (old_content), ltt.m_generated_data);
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
   linemap_line_start (line_table, 1, 100);
 
   /* Attempt to delete the " (\n...)".  */
@@ -5498,9 +5511,10 @@ test_tab_expansion (const line_table_case &case_)
   const int last_byte_col = 25;
   ASSERT_EQ (35, cpp_display_width (content, last_byte_col, policy));
 
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content);
   line_table_test ltt (case_);
-  linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 1);
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, strlen (content),
+			ltt.m_generated_data);
+  tmp.do_linemap_add (1);
 
   /* Don't attempt to run the tests if column data might be unavailable.  */
   location_t line_end = linemap_position_for_column (line_table, last_byte_col);
@@ -5547,15 +5561,14 @@ test_escaping_bytes_1 (const line_table_case &case_)
 {
   const char content[] = "before\0\1\2\3\v\x80\xff""after\n";
   const size_t sz = sizeof (content);
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, sz);
   line_table_test ltt (case_);
-  const line_map_ordinary *ord_map = linemap_check_ordinary
-    (linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 0));
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, sz,
+			ltt.m_generated_data);
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
   linemap_line_start (line_table, 1, 100);
 
   location_t finish
-    = linemap_position_for_line_and_column (line_table, ord_map, 1,
-					    strlen (content));
+    = linemap_position_for_line_and_column (line_table, ord_map, 1, sz);
 
   if (finish > LINE_MAP_MAX_LOCATION_WITH_COLS)
     return;
@@ -5603,15 +5616,14 @@ test_escaping_bytes_2 (const line_table_case &case_)
 {
   const char content[]  = "\0after\n";
   const size_t sz = sizeof (content);
-  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, sz);
   line_table_test ltt (case_);
-  const line_map_ordinary *ord_map = linemap_check_ordinary
-    (linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 0));
+  temp_source_file tmp (SELFTEST_LOCATION, ".c", content, sz,
+			ltt.m_generated_data);
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
   linemap_line_start (line_table, 1, 100);
 
   location_t finish
-    = linemap_position_for_line_and_column (line_table, ord_map, 1,
-					    strlen (content));
+    = linemap_position_for_line_and_column (line_table, ord_map, 1, sz);
 
   if (finish > LINE_MAP_MAX_LOCATION_WITH_COLS)
     return;
@@ -5663,8 +5675,7 @@ test_line_numbers_multiline_range ()
   temp_source_file tmp (SELFTEST_LOCATION, ".txt", pp_formatted_text (&pp));
   line_table_test ltt;
 
-  const line_map_ordinary *ord_map = linemap_check_ordinary
-    (linemap_add (line_table, LC_ENTER, false, tmp.get_filename (), 0));
+  const line_map_ordinary *ord_map = tmp.do_linemap_add (0);
   linemap_line_start (line_table, 1, 100);
 
   /* Create a multi-line location, starting at the "line" of line 9, with
@@ -5705,28 +5716,28 @@ diagnostic_show_locus_cc_tests ()
 
   test_display_widths ();
 
-  for_each_line_table_case (test_layout_x_offset_display_utf8);
-  for_each_line_table_case (test_layout_x_offset_display_tab);
+  for_each_line_table_case (test_layout_x_offset_display_utf8, true);
+  for_each_line_table_case (test_layout_x_offset_display_tab, true);
 
   test_get_line_bytes_without_trailing_whitespace ();
 
   test_diagnostic_show_locus_unknown_location ();
 
-  for_each_line_table_case (test_diagnostic_show_locus_one_liner);
-  for_each_line_table_case (test_diagnostic_show_locus_one_liner_utf8);
-  for_each_line_table_case (test_add_location_if_nearby);
-  for_each_line_table_case (test_diagnostic_show_locus_fixit_lines);
-  for_each_line_table_case (test_fixit_consolidation);
-  for_each_line_table_case (test_overlapped_fixit_printing);
-  for_each_line_table_case (test_overlapped_fixit_printing_utf8);
-  for_each_line_table_case (test_overlapped_fixit_printing_2);
-  for_each_line_table_case (test_fixit_insert_containing_newline);
-  for_each_line_table_case (test_fixit_insert_containing_newline_2);
-  for_each_line_table_case (test_fixit_replace_containing_newline);
-  for_each_line_table_case (test_fixit_deletion_affecting_newline);
-  for_each_line_table_case (test_tab_expansion);
-  for_each_line_table_case (test_escaping_bytes_1);
-  for_each_line_table_case (test_escaping_bytes_2);
+  for_each_line_table_case (test_diagnostic_show_locus_one_liner, true);
+  for_each_line_table_case (test_diagnostic_show_locus_one_liner_utf8, true);
+  for_each_line_table_case (test_add_location_if_nearby, true);
+  for_each_line_table_case (test_diagnostic_show_locus_fixit_lines, true);
+  for_each_line_table_case (test_fixit_consolidation, true);
+  for_each_line_table_case (test_overlapped_fixit_printing, true);
+  for_each_line_table_case (test_overlapped_fixit_printing_utf8, true);
+  for_each_line_table_case (test_overlapped_fixit_printing_2, true);
+  for_each_line_table_case (test_fixit_insert_containing_newline, true);
+  for_each_line_table_case (test_fixit_insert_containing_newline_2, true);
+  for_each_line_table_case (test_fixit_replace_containing_newline, true);
+  for_each_line_table_case (test_fixit_deletion_affecting_newline, true);
+  for_each_line_table_case (test_tab_expansion, true);
+  for_each_line_table_case (test_escaping_bytes_1, true);
+  for_each_line_table_case (test_escaping_bytes_2, true);
 
   test_line_numbers_multiline_range ();
 }
diff --git a/gcc/gcc-rich-location.cc b/gcc/gcc-rich-location.cc
index edecf07f81e..5a118925f77 100644
--- a/gcc/gcc-rich-location.cc
+++ b/gcc/gcc-rich-location.cc
@@ -78,7 +78,7 @@ static bool
 blank_line_before_p (location_t loc)
 {
   expanded_location exploc = expand_location (loc);
-  char_span line = location_get_source_line (exploc.file, exploc.line);
+  char_span line = location_get_source_line (exploc);
   if (!line)
     return false;
   if (line.length () < (size_t)exploc.column)
diff --git a/gcc/input.cc b/gcc/input.cc
index 8c4e40aaf23..a987435c733 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -1048,7 +1048,7 @@ get_source_text_between (location_t start, location_t end)
   /* For a single line we need to trim both edges.  */
   if (expstart.line == expend.line)
     {
-      char_span line = location_get_source_line (expstart.file, expstart.line);
+      char_span line = location_get_source_line (expstart);
       if (line.length () < 1)
 	return NULL;
       int s = expstart.column - 1;
@@ -1065,7 +1065,7 @@ get_source_text_between (location_t start, location_t end)
      parts of the start and end lines off depending on column values.  */
   for (int lnum = expstart.line; lnum <= expend.line; ++lnum)
     {
-      char_span line = location_get_source_line (expstart.file, lnum);
+      char_span line = location_get_source_line (expstart.src, lnum);
       if (line.length () < 1 && (lnum != expstart.line && lnum != expend.line))
 	continue;
 
@@ -1114,11 +1114,10 @@ get_source_text_between (location_t start, location_t end)
    as decoded according to the input charset, encoded as UTF-8.  */
 
 char_span
-get_source_file_content (const char *file_path)
+get_source_file_content (source_id src)
 {
   diagnostic_file_cache_init ();
-
-  file_cache_slot *c = global_dc->m_file_cache->lookup_or_add_file (file_path);
+  const auto c = global_dc->m_file_cache->lookup_or_add (src);
   return c->get_full_file_content ();
 }
 
@@ -1127,15 +1126,11 @@ get_source_file_content (const char *file_path)
    requesting a line number beyond the end of the file.  */
 
 bool
-location_missing_trailing_newline (const char *file_path)
+location_missing_trailing_newline (source_id src)
 {
   diagnostic_file_cache_init ();
-
-  file_cache_slot *c = global_dc->m_file_cache->lookup_or_add_file (file_path);
-  if (c == NULL)
-    return false;
-
-  return c->missing_trailing_newline_p ();
+  const auto c = global_dc->m_file_cache->lookup_or_add (src);
+  return c && c->missing_trailing_newline_p ();
 }
 
 /* Test if the location originates from the spelling location of a
@@ -1850,7 +1845,7 @@ get_substring_ranges_for_loc (cpp_reader *pfile,
       if (start.column > finish.column)
 	return "range endpoints are reversed";
 
-      char_span line = location_get_source_line (start.file, start.line);
+      char_span line = location_get_source_line (start);
       if (!line)
 	return "unable to read source line";
 
diff --git a/gcc/input.h b/gcc/input.h
index d30673f1089..a784f101ce7 100644
--- a/gcc/input.h
+++ b/gcc/input.h
@@ -119,10 +119,10 @@ inline char_span location_get_source_line (expanded_location exploc)
 {
   return location_get_source_line (exploc.src, exploc.line);
 }
-extern char *get_source_text_between (location_t, location_t);
-extern char_span get_source_file_content (const char *file_path);
 
-extern bool location_missing_trailing_newline (const char *file_path);
+extern char *get_source_text_between (location_t, location_t);
+extern char_span get_source_file_content (source_id src);
+extern bool location_missing_trailing_newline (source_id src);
 
 /* Forward decl of slots within file_cache, so that the definition doesn't
    need to be in this header.  */


* [PATCH v4 7/8] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings
  2023-08-09 22:14       ` [PATCH v4 0/8] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
                           ` (5 preceding siblings ...)
  2023-08-09 22:14         ` [PATCH v4 6/8] diagnostics: Full support for generated data locations Lewis Hyatt
@ 2023-08-09 22:14         ` Lewis Hyatt
  2023-08-09 22:14         ` [PATCH v4 8/8] diagnostics: Support generated data locations in SARIF output Lewis Hyatt
  7 siblings, 0 replies; 36+ messages in thread
From: Lewis Hyatt @ 2023-08-09 22:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm, Lewis Hyatt

Currently, the tokens obtained from a destringified _Pragma string do not get
assigned proper locations while they are being lexed.  After the tokens have
been obtained, they are reassigned the same location as the _Pragma token,
which is sufficient to make things like _Pragma("GCC diagnostic ignored...")
operate correctly, but this still results in inferior diagnostics, since the
diagnostics do not point to the problematic tokens.  Further, if a diagnostic
is issued by libcpp while lexing the tokens, as opposed to being issued by the
frontend while processing the pragma, then the patched-up location is not yet
in place, and the user instead sees an invalid location.  Depending on the
macro expansion history, that location may be near the _Pragma string or
potentially very far away from it.  For example:

=====
_Pragma("GCC diagnostic ignored \"oops")
=====

produces the diagnostic:

file.cpp:1:24: warning: missing terminating " character
    1 | _Pragma("GCC diagnostic ignored \"oops")
      |                        ^

with the caret in a nonsensical location, while this one:

=====
 #define S "GCC diagnostic ignored \"oops"
_Pragma(S)
=====

produces:

file.cpp:2:24: warning: missing terminating " character
    2 | _Pragma(S)
      |                        ^

with the caret in a nonsensical location and the relevant context completely
absent.

Fix this by assigning proper locations using the new LC_GEN type of linemap.
Now the tokens are given locations inside a generated content buffer, and the
macro expansion stack is modified to be aware that these tokens logically
belong to the "expansion" of the _Pragma directive. For the above examples we
now output:

======
In buffer generated from file.cpp:1:
<generated>:1:24: warning: missing terminating " character
    1 | GCC diagnostic ignored "oops
      |                        ^
file.cpp:1:1: note: in <_Pragma directive>
    1 | _Pragma("GCC diagnostic ignored \"oops")
      | ^~~~~~~
======

and

======
<generated>:1:24: warning: missing terminating " character
    1 | GCC diagnostic ignored "oops
      |                        ^
file.cpp:2:1: note: in <_Pragma directive>
    2 | _Pragma(S)
      | ^~~~~~~
======

Now the carets point to something meaningful and all relevant context appears
in the diagnostic.  For the second example, it would be nice if the macro
expansion trace also output "in expansion of macro S"; however, doing that
for the general case of macro expansions makes the logic very complicated,
since it has to be done after the fact, when the macro maps have already been
constructed.  It doesn't seem worth it for this case, given that the _Pragma
string has already been output once on the first line.
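
To make the column numbers above concrete, here is a minimal sketch of why
the caret lands on column 24 of the generated buffer.  It is an illustration
only, not the libcpp implementation: destringize() and
unterminated_quote_column() are hypothetical helpers written for this note,
they handle only ordinary (non-raw) string literals, and the real code lexes
the buffer with the normal lexer rather than scanning for quotes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not libcpp's destringize_and_run): given the spelling
   of an ordinary string literal LIT, including its surrounding quotes, write
   the destringized text to OUT: drop the outer quotes, turn \" into " and
   \\ into \.  OUT must have room for strlen (LIT) bytes.  Returns the number
   of bytes written, not counting the terminating NUL.  */
size_t
destringize (const char *lit, char *out)
{
  size_t len = strlen (lit);
  size_t n = 0;

  assert (len >= 2 && lit[0] == '"' && lit[len - 1] == '"');

  /* Skip the opening quote and stop before the closing one.  */
  for (size_t i = 1; i + 1 < len; ++i)
    {
      if (lit[i] == '\\' && (lit[i + 1] == '"' || lit[i + 1] == '\\'))
	++i;
      out[n++] = lit[i];
    }
  out[n] = '\0';
  return n;
}

/* Hypothetical helper: return the 1-based column of the last double quote in
   TEXT that is left without a matching closing quote, or 0 if all quotes are
   paired.  Escapes inside the nested string are ignored for simplicity.  */
size_t
unterminated_quote_column (const char *text)
{
  size_t open = 0;

  for (size_t i = 0; text[i]; ++i)
    if (text[i] == '"')
      open = open ? 0 : i + 1;
  return open;
}
```

Under these assumptions, the literal spelled "GCC diagnostic ignored \"oops"
destringizes to the 28-byte text shown on line 1 of the generated buffer, and
its unpaired quote is at column 24, which is the column reported in both
outputs above.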

gcc/ChangeLog:

	* tree-diagnostic.cc (maybe_unwind_expanded_macro_loc): Add awareness
	of _Pragma directive to the macro expansion trace.

libcpp/ChangeLog:

	* directives.cc (get_token_no_padding): Add argument to receive the
	virtual location of the token.
	(get__Pragma_string): Likewise.
	(do_pragma): Set pfile->directive_result->src_loc properly, it should
	not be a virtual location.
	(destringize_and_run): Update to provide proper locations for the
	_Pragma string tokens.  Support raw strings.
	(_cpp_do__Pragma): Adapt to changes to the helper functions.
	* errors.cc (cpp_diagnostic_at): Support
	cpp_reader::diagnostic_rebase_loc.
	(cpp_diagnostic_with_line): Likewise.
	* include/line-map.h (class rich_location): Add new member
	forget_cached_expanded_locations().
	* internal.h (struct _cpp__Pragma_state): Define new struct.
	(_cpp_rebase_diagnostic_location): Declare new function.
	(struct cpp_reader): Add diagnostic_rebase_loc member.
	(_cpp_push__Pragma_token_context): Declare new function.
	(_cpp_do__Pragma): Adjust prototype.
	* macro.cc (pragma_str): New static var.
	(builtin_macro): Adapt to new implementation of _Pragma processing.
	(_cpp_pop_context): Fix the logic for resetting
	pfile->top_most_macro_node, which previously was never triggered,
	although the error seems to have been harmless.
	(_cpp_push__Pragma_token_context): New function.
	(_cpp_rebase_diagnostic_location): New function.

gcc/c-family/ChangeLog:

	* c-ppoutput.cc (token_streamer::stream): Pass the virtual location of
	the _Pragma token to maybe_print_line(), not the spelling location.

libgomp/ChangeLog:

	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Adjust for new
	macro tracking output for _Pragma directives.
	* testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Likewise.

gcc/testsuite/ChangeLog:

	* c-c++-common/cpp/diagnostic-pragma-1.c: Adjust for new macro
	tracking output for _Pragma directives.
	* c-c++-common/cpp/pr57580.c: Likewise.
	* c-c++-common/gomp/pragma-3.c: Likewise.
	* c-c++-common/gomp/pragma-5.c: Likewise.
	* g++.dg/pch/operator-1.C: Likewise.
	* gcc.dg/cpp/pr28165.c: Likewise.
	* gcc.dg/cpp/pr35322.c: Likewise.
	* gcc.dg/dfp/pragma-float-const-decimal64-4.c: Likewise.
	* gcc.dg/dfp/pragma-float-const-decimal64-5.c: Likewise.
	* gcc.dg/dfp/pragma-float-const-decimal64-6.c: Likewise.
	* gcc.dg/gomp/macro-4.c: Likewise.
	* gcc.dg/pragma-message.c: Likewise.
	* c-c++-common/pragma-diag-17.c: New test.
	* c-c++-common/pragma-diag-18.c: New test.
	* g++.dg/cpp/pragma-raw-string.C: New test.
	* g++.dg/pch/LC_GEN-maps.C: New test.
	* g++.dg/pch/LC_GEN-maps.Hs: New test.
	* lib/prune.exp: Support pruning new _Pragma include trace.
---
 gcc/c-family/c-ppoutput.cc                    |   2 +-
 .../c-c++-common/cpp/diagnostic-pragma-1.c    |   1 +
 gcc/testsuite/c-c++-common/cpp/pr57580.c      |   2 +-
 gcc/testsuite/c-c++-common/gomp/pragma-3.c    |   3 +-
 gcc/testsuite/c-c++-common/gomp/pragma-5.c    |   3 +-
 gcc/testsuite/c-c++-common/pragma-diag-17.c   |  35 +++
 gcc/testsuite/c-c++-common/pragma-diag-18.c   |  18 ++
 gcc/testsuite/g++.dg/cpp/pragma-raw-string.C  |  16 +
 gcc/testsuite/g++.dg/pch/LC_GEN-maps.C        |  20 ++
 gcc/testsuite/g++.dg/pch/LC_GEN-maps.Hs       |   5 +
 gcc/testsuite/g++.dg/pch/operator-1.C         |   1 +
 gcc/testsuite/gcc.dg/cpp/pr28165.c            |   1 +
 gcc/testsuite/gcc.dg/cpp/pr35322.c            |   1 +
 .../dfp/pragma-float-const-decimal64-4.c      |   1 +
 .../dfp/pragma-float-const-decimal64-5.c      |   2 +-
 .../dfp/pragma-float-const-decimal64-6.c      |   2 +-
 gcc/testsuite/gcc.dg/gomp/macro-4.c           |   2 +-
 gcc/testsuite/gcc.dg/pragma-message.c         |   3 +-
 gcc/testsuite/lib/prune.exp                   |   1 +
 gcc/tree-diagnostic.cc                        |  18 +-
 libcpp/directives.cc                          | 278 ++++++++++++------
 libcpp/errors.cc                              |  16 +-
 libcpp/include/line-map.h                     |   1 +
 libcpp/internal.h                             |  32 +-
 libcpp/macro.cc                               | 126 +++++++-
 .../libgomp.oacc-c-c++-common/reduction-5.c   |   3 +-
 .../libgomp.oacc-c-c++-common/vred2d-128.c    |  40 ++-
 27 files changed, 491 insertions(+), 142 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/pragma-diag-17.c
 create mode 100644 gcc/testsuite/c-c++-common/pragma-diag-18.c
 create mode 100644 gcc/testsuite/g++.dg/cpp/pragma-raw-string.C
 create mode 100644 gcc/testsuite/g++.dg/pch/LC_GEN-maps.C
 create mode 100644 gcc/testsuite/g++.dg/pch/LC_GEN-maps.Hs

diff --git a/gcc/c-family/c-ppoutput.cc b/gcc/c-family/c-ppoutput.cc
index 4aa2bef2c0f..364bfe5ad43 100644
--- a/gcc/c-family/c-ppoutput.cc
+++ b/gcc/c-family/c-ppoutput.cc
@@ -280,7 +280,7 @@ token_streamer::stream (cpp_reader *pfile, const cpp_token *token,
 	  const char *space;
 	  const char *name;
 
-	  line_marker_emitted = maybe_print_line (token->src_loc);
+	  line_marker_emitted = maybe_print_line (loc);
 	  fputs ("#pragma ", print.outf);
 	  c_pp_lookup_pragma (token->val.pragma, &space, &name);
 	  if (space)
diff --git a/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-1.c b/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-1.c
index 9867c94a8dd..801c93935b8 100644
--- a/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-1.c
+++ b/gcc/testsuite/c-c++-common/cpp/diagnostic-pragma-1.c
@@ -1,4 +1,5 @@
 // { dg-do compile }
+// { dg-additional-options "-ftrack-macro-expansion=0" }
 
 #pragma GCC warning "warn-a" // { dg-warning warn-a }
 #pragma GCC error "err-b" // { dg-error err-b }
diff --git a/gcc/testsuite/c-c++-common/cpp/pr57580.c b/gcc/testsuite/c-c++-common/cpp/pr57580.c
index e77462b20de..b0e54d876d6 100644
--- a/gcc/testsuite/c-c++-common/cpp/pr57580.c
+++ b/gcc/testsuite/c-c++-common/cpp/pr57580.c
@@ -1,6 +1,6 @@
 /* PR preprocessor/57580 */
 /* { dg-do compile } */
-/* { dg-options "-save-temps" } */
+/* { dg-options "-save-temps -ftrack-macro-expansion=0" } */
 
 #define MSG 	\
   _Pragma("message(\"message0\")")	\
diff --git a/gcc/testsuite/c-c++-common/gomp/pragma-3.c b/gcc/testsuite/c-c++-common/gomp/pragma-3.c
index 3e1b2111c3d..e0cffb8aeea 100644
--- a/gcc/testsuite/c-c++-common/gomp/pragma-3.c
+++ b/gcc/testsuite/c-c++-common/gomp/pragma-3.c
@@ -8,7 +8,8 @@ void
 f (void)
 {
   const char *str = outer(inner(1,2)); /* { dg-line str_location } */
-  /* { dg-warning "35:'pragma omp error' encountered: Test" "" { target *-*-* } inner_location }
+  /* { dg-warning "1:'pragma omp error' encountered: Test" "" { target *-*-* } 1 }
+     { dg-note "35: in <_Pragma directive>" "" { target *-*-* } inner_location }
      { dg-note "20:in expansion of macro 'inner'" "" { target *-*-* } outer_location }
      { dg-note "21:in expansion of macro 'outer'" "" { target *-*-* } str_location } */
 }
diff --git a/gcc/testsuite/c-c++-common/gomp/pragma-5.c b/gcc/testsuite/c-c++-common/gomp/pragma-5.c
index 173c25e803a..787a334882d 100644
--- a/gcc/testsuite/c-c++-common/gomp/pragma-5.c
+++ b/gcc/testsuite/c-c++-common/gomp/pragma-5.c
@@ -8,7 +8,8 @@ void
 f (void)
 {
   const char *str = outer(inner(1,2)); /* { dg-line str_location } */
-  /* { dg-warning "35:'pragma omp error' encountered: Test" "" { target *-*-* } inner_location }
+  /* { dg-warning "4:'pragma omp error' encountered: Test" "" { target *-*-* } 1 }
+     { dg-note "35:in <_Pragma directive>" "" { target *-*-*} inner_location }
      { dg-note "20:in expansion of macro 'inner'" "" { target *-*-* } outer_location }
      { dg-note "21:in expansion of macro 'outer'" "" { target *-*-* } str_location } */
 }
diff --git a/gcc/testsuite/c-c++-common/pragma-diag-17.c b/gcc/testsuite/c-c++-common/pragma-diag-17.c
new file mode 100644
index 00000000000..b9539c9598b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pragma-diag-17.c
@@ -0,0 +1,35 @@
+/* Test virtual location aspects of _Pragmas, when an error is reported after
+   lexing the tokens from the _Pragma string.  */
+/* { dg-additional-options "-Wpragmas -Wunknown-pragmas" } */
+
+_Pragma("GCC diagnostic ignored \"oops1\"") /* { dg-note {1:in <_Pragma directive>} } */
+/* { dg-warning {24:'oops1' is not an option} "" { target *-*-* } 1 } */
+
+#define S2 "GCC diagnostic ignored \"oops2\""
+_Pragma(S2) /* { dg-note {1:in <_Pragma directive>} } */
+/* { dg-warning {24:'oops2' is not an option} "" { target *-*-* } 1 } */
+
+#define PP(x) _Pragma(x) /* { dg-note {15:in <_Pragma directive>} } */
+PP("GCC diagnostic ignored \"oops3\"") /* { dg-note {1:in expansion of macro 'PP'} } */
+/* { dg-warning {24:'oops3' is not an option} "" { target *-*-* } 1 } */
+
+#define X4 _Pragma("GCC diagnostic ignored \"oops4\"") /* { dg-note {12:in <_Pragma directive>} } */
+#define Y4 X4 /* { dg-note {12:in expansion of macro 'X4'} } */
+Y4 /* { dg-note {1:in expansion of macro 'Y4'} } */
+/* { dg-warning {24:'oops4' is not an option} "" { target *-*-* } 1 } */
+
+#define P5 _Pragma /* { dg-note {12:in <_Pragma directive>} } */
+#define S5 "GCC diagnostic ignored \"oops5\""
+#define Y5 P5(S5) /* { dg-note {12:in expansion of macro 'P5'} } */
+Y5 /* { dg-note {1:in expansion of macro 'Y5'} } */
+/* { dg-warning {24:'oops5' is not an option} "" { target *-*-* } 1 } */
+
+#define P6 _Pragma /* { dg-note {12:in <_Pragma directive>} } */
+#define X6 P6("GCC diagnostic ignored \"oops6\"") /* { dg-note {12:in expansion of macro 'P6'} } */
+X6 /* { dg-note {1:in expansion of macro 'X6'} } */
+/* { dg-warning {24:'oops6' is not an option} "" { target *-*-* } 1 } */
+
+_Pragma(__DATE__) /* { dg-warning {-:[-Wunknown-pragmas]} } */
+
+_Pragma("once") /* { dg-note {1:in <_Pragma directive>} } */
+/* { dg-warning {#pragma once in main file} "" { target *-*-*} 1 } */
diff --git a/gcc/testsuite/c-c++-common/pragma-diag-18.c b/gcc/testsuite/c-c++-common/pragma-diag-18.c
new file mode 100644
index 00000000000..5de0fbcb8f1
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pragma-diag-18.c
@@ -0,0 +1,18 @@
+/* Test virtual location aspects of _Pragmas, when an error is reported during
+   lexing of the _Pragma string itself or of the tokens within it.  */
+/* { dg-additional-options "-Wpragmas" } */
+
+#define X1 "\""
+_Pragma(X1) /* { dg-note {1:in <_Pragma directive>} } */
+/* { dg-warning {1:missing terminating " character} "" { target *-*-* } 1 } */
+
+#define X2a _Pragma("GCC warning \"hello\"") ( /* { dg-note {13:in <_Pragma directive>} } */
+#define X2b "GCC warning \"goodbye\"" )
+_Pragma X2a X2b /* { dg-note {9:in expansion of macro 'X2a'} } */
+/* { dg-note {1:in <_Pragma directive>} "" { target *-*-* } .-1 } */
+/* { dg-warning {13:hello} "" { target *-*-* } 1 } */
+/* { dg-warning {13:goodbye} "" { target *-*-* } 1 } */
+
+_Pragma() /* { dg-error {9:_Pragma takes a parenthesized string literal} } */
+/* { dg-note {1:in <_Pragma directive>} "" { target *-*-* } .-1 } */
+/* { dg-error {at end of input|'_Pragma' does not name a type} "" { target *-*-* } .-2 } */
diff --git a/gcc/testsuite/g++.dg/cpp/pragma-raw-string.C b/gcc/testsuite/g++.dg/cpp/pragma-raw-string.C
new file mode 100644
index 00000000000..5a495aadeec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp/pragma-raw-string.C
@@ -0,0 +1,16 @@
+/* Test that _Pragma with a raw string works correctly.  */
+/* { dg-do compile { target c++11 } } */
+/* { dg-additional-options "-Wunused-variable -Wpragmas" } */
+
+_Pragma(R"delim(GCC diagnostic push)delim")
+_Pragma(R"(GCC diagnostic ignored "-Wunused-variable")")
+void f1 () { int i; }
+_Pragma(R"(GCC diagnostic pop)")
+void f2 () { int i; } /* { dg-warning {18:-Wunused-variable} } */
+
+/* Make sure lines stay in sync if there is an embedded newline too.  */
+_Pragma(R"xyz(GCC diagnostic ignored R"(two
+line option?)")xyz")
+/* { dg-note {1:in <_Pragma directive>} "" { target *-*-* } .-2 } */
+/* { dg-warning {24:unknown option} "" { target *-*-* } 1 } */
+void f3 () { int i; } /* { dg-warning {18:-Wunused-variable} } */
diff --git a/gcc/testsuite/g++.dg/pch/LC_GEN-maps.C b/gcc/testsuite/g++.dg/pch/LC_GEN-maps.C
new file mode 100644
index 00000000000..4ce241579fe
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pch/LC_GEN-maps.C
@@ -0,0 +1,20 @@
+#include "LC_GEN-maps.H"
+
+/* The LC_GEN map was written to the PCH, but there is not currently a way to
+   observe that fact in normal user code.  Let's try to test it anyway, using
+   -fdump-internal-locations to inspect the line_maps object we received from
+   the PCH.  */
+
+/* { dg-additional-options -fdump-internal-locations } */
+/* { dg-allow-blank-lines-in-output "" } */
+
+/* These regexps themselves will also appear in the output of
+   -fdump-internal-locations, so we need to make sure they contain at least
+   some regexp special characters, even if not strictly necessary, so they
+   match the intended text only, and not themselves.  Also, we make the second
+   one intentionally match the whole output if it matches anything.  We could
+   use dg-excess-errors instead, but that outputs XFAILs which are not really
+   helpful for this test.  */
+
+/* { dg-regexp {reason: . \(LC_GEN\)} } */
+/* { dg-regexp {(.|[\n\r])*[d]ata: this string should end up in the "PCH"(.|[\n\r])*} } */
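The self-match concern described in the test comment above can be checked in
isolation.  What follows is a minimal standalone sketch and not part of the
patch: matches_itself is a hypothetical helper, and std::regex (ECMAScript
syntax) only approximates the Tcl regexps that DejaGnu actually uses.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not in GCC or DejaGnu): returns true if PATTERN,
// compiled as a regex, matches somewhere inside its own source text.  A
// dg-regexp whose text is echoed into the compiler output should return
// false here, so that it can only match the output it was written for.
static bool matches_itself (const std::string &pattern)
{
  return std::regex_search (pattern, std::regex (pattern));
}
```

Under these assumptions, a plain pattern such as "data: ..." would match its
own echoed text, while "[d]ata: ..." and "reason: . \(LC_GEN\)" would not,
because the bracket and backslash characters in the source text break the
literal match.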
diff --git a/gcc/testsuite/g++.dg/pch/LC_GEN-maps.Hs b/gcc/testsuite/g++.dg/pch/LC_GEN-maps.Hs
new file mode 100644
index 00000000000..76eefa7d1ae
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pch/LC_GEN-maps.Hs
@@ -0,0 +1,5 @@
+/* Evaluating the _Pragma directive here creates an LC_GEN map in the
+   line_maps object that will be stored in the PCH.  The test will make sure
+   that the buffer holding the de-stringified _Pragma string contents makes
+   its way there.  */
+_Pragma("this string should end up in the \"PCH\"")
diff --git a/gcc/testsuite/g++.dg/pch/operator-1.C b/gcc/testsuite/g++.dg/pch/operator-1.C
index 290b5f7ab21..bf1c8b07bdb 100644
--- a/gcc/testsuite/g++.dg/pch/operator-1.C
+++ b/gcc/testsuite/g++.dg/pch/operator-1.C
@@ -1,2 +1,3 @@
+/* { dg-additional-options "-ftrack-macro-expansion=0" } */
 #include "operator-1.H"
 int main(void){ major(0);} /* { dg-warning "Did not Work" } */
diff --git a/gcc/testsuite/gcc.dg/cpp/pr28165.c b/gcc/testsuite/gcc.dg/cpp/pr28165.c
index 71c7c1dba46..3e5e49ffa01 100644
--- a/gcc/testsuite/gcc.dg/cpp/pr28165.c
+++ b/gcc/testsuite/gcc.dg/cpp/pr28165.c
@@ -2,5 +2,6 @@
 /* PR preprocessor/28165 */
 
 /* { dg-do preprocess } */
+/* { dg-additional-options "-ftrack-macro-expansion=0" } */
 #pragma GCC system_header   /* { dg-warning "system_header" "ignored" } */
 _Pragma ("GCC system_header")   /* { dg-warning "system_header" "ignored" } */
diff --git a/gcc/testsuite/gcc.dg/cpp/pr35322.c b/gcc/testsuite/gcc.dg/cpp/pr35322.c
index 1af9605eac6..5bd5f69b73d 100644
--- a/gcc/testsuite/gcc.dg/cpp/pr35322.c
+++ b/gcc/testsuite/gcc.dg/cpp/pr35322.c
@@ -1,4 +1,5 @@
 /* Test case for PR 35322 -- _Pragma ICE.  */
 
 /* { dg-do preprocess } */
+/* { dg-additional-options "-ftrack-macro-expansion=0" } */
 _Pragma("GCC dependency") /* { dg-error "#pragma dependency expects" } */
diff --git a/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-4.c b/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-4.c
index af0398daf79..42fc28a4384 100644
--- a/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-4.c
+++ b/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-4.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-additional-options -ftrack-macro-expansion=0 } */
 
 /* N1312 7.1.1: The FLOAT_CONST_DECIMAL64 pragma.
    C99 6.4.4.2a (New).
diff --git a/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-5.c b/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-5.c
index 75e9525dda0..3aefede7b5d 100644
--- a/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-5.c
+++ b/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-std=c99 -pedantic" } */
+/* { dg-options "-std=c99 -pedantic -ftrack-macro-expansion=0" } */
 
 /* N1312 7.1.1: The FLOAT_CONST_DECIMAL64 pragma.
    C99 6.4.4.2a (New).
diff --git a/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-6.c b/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-6.c
index 03c1715bee6..6d70ce2bb8d 100644
--- a/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-6.c
+++ b/gcc/testsuite/gcc.dg/dfp/pragma-float-const-decimal64-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-std=c99 -pedantic-errors" } */
+/* { dg-options "-std=c99 -pedantic-errors -ftrack-macro-expansion=0" } */
 
 /* N1312 7.1.1: The FLOAT_CONST_DECIMAL64 pragma.
    C99 6.4.4.2a (New).
diff --git a/gcc/testsuite/gcc.dg/gomp/macro-4.c b/gcc/testsuite/gcc.dg/gomp/macro-4.c
index a4ed9a3980a..c6817d40125 100644
--- a/gcc/testsuite/gcc.dg/gomp/macro-4.c
+++ b/gcc/testsuite/gcc.dg/gomp/macro-4.c
@@ -1,6 +1,6 @@
 /* PR preprocessor/27746 */
 /* { dg-do compile } */
-/* { dg-options "-fopenmp -Wunknown-pragmas" } */
+/* { dg-options "-fopenmp -Wunknown-pragmas -ftrack-macro-expansion=0" } */
 
 #define p		_Pragma ("omp parallel")
 #define omp_p		_Pragma ("omp p")
diff --git a/gcc/testsuite/gcc.dg/pragma-message.c b/gcc/testsuite/gcc.dg/pragma-message.c
index 1b7cf09de0a..72fb0da6f44 100644
--- a/gcc/testsuite/gcc.dg/pragma-message.c
+++ b/gcc/testsuite/gcc.dg/pragma-message.c
@@ -45,8 +45,9 @@
 #define DO_PRAGMA(x) _Pragma (#x) /* { dg-line pragma_loc1 } */
 #define TODO(x) DO_PRAGMA(message ("TODO - " #x)) /* { dg-line pragma_loc2 } */
 TODO(Okay 4) /* { dg-message "in expansion of macro 'TODO'" } */
-/* { dg-message "TODO - Okay 4" "test4.1" { target *-*-* } pragma_loc1 } */
+/* { dg-message "1:TODO - Okay 4" "test4.1" { target *-*-* } 1 } */
 /* { dg-message "in expansion of macro 'DO_PRAGMA'" "test4.2" { target *-*-* } pragma_loc2 } */
+/* { dg-note {in <_Pragma directive>} "test4.3" { target *-*-* } pragma_loc1 } */
 
 #if 0
 #pragma message ("Not printed")
diff --git a/gcc/testsuite/lib/prune.exp b/gcc/testsuite/lib/prune.exp
index 8d37b24e59b..02ebf8b30d9 100644
--- a/gcc/testsuite/lib/prune.exp
+++ b/gcc/testsuite/lib/prune.exp
@@ -54,6 +54,7 @@ proc prune_gcc_output { text } {
 
     # Diagnostic inclusion stack
     regsub -all "(^|\n)(In file)?\[ \]+included from \[^\n\]*" $text "" text
+    regsub -all "(^|\n)In buffer generated from \[^\n\]*" $text "" text
     regsub -all "(^|\n)\[ \]+from \[^\n\]*" $text "" text
     regsub -all "(^|\n)(In|of) module( \[^\n \]*,)? imported at \[^\n\]*" $text "" text
 
diff --git a/gcc/tree-diagnostic.cc b/gcc/tree-diagnostic.cc
index 731e3559cd8..fd2773f3d8a 100644
--- a/gcc/tree-diagnostic.cc
+++ b/gcc/tree-diagnostic.cc
@@ -203,9 +203,12 @@ maybe_unwind_expanded_macro_loc (diagnostic_context *context,
 	const int resolved_def_loc_line = SOURCE_LINE (m, l0);
         if (ix == 0 && saved_location_line != resolved_def_loc_line)
           {
-            diagnostic_append_note (context, resolved_def_loc, 
-                                    "in definition of macro %qs",
-                                    linemap_map_get_macro_name (iter->map));
+	    const char *name = linemap_map_get_macro_name (iter->map);
+	    if (*name == '<')
+	      diagnostic_append_note (context, resolved_def_loc, "in %s", name);
+	    else
+	      diagnostic_append_note (context, resolved_def_loc,
+				      "in definition of macro %qs", name);
             /* At this step, as we've printed the context of the macro
                definition, we don't want to print the context of its
                expansion, otherwise, it'd be redundant.  */
@@ -220,9 +223,12 @@ maybe_unwind_expanded_macro_loc (diagnostic_context *context,
                                     MACRO_MAP_EXPANSION_POINT_LOCATION (iter->map),
                                     LRK_MACRO_DEFINITION_LOCATION, NULL);
 
-        diagnostic_append_note (context, resolved_exp_loc, 
-                                "in expansion of macro %qs",
-                                linemap_map_get_macro_name (iter->map));
+	const char *name = linemap_map_get_macro_name (iter->map);
+	if (*name == '<')
+	  diagnostic_append_note (context, resolved_exp_loc, "in %s", name);
+	else
+	  diagnostic_append_note (context, resolved_exp_loc,
+				  "in expansion of macro %qs", name);
       }
 }
 
diff --git a/libcpp/directives.cc b/libcpp/directives.cc
index dfd782b3fca..d2d83e6dc83 100644
--- a/libcpp/directives.cc
+++ b/libcpp/directives.cc
@@ -127,10 +127,10 @@ static void do_pragma_warning_or_error (cpp_reader *, bool error);
 static void do_pragma_warning (cpp_reader *);
 static void do_pragma_error (cpp_reader *);
 static void do_linemarker (cpp_reader *);
-static const cpp_token *get_token_no_padding (cpp_reader *);
-static const cpp_token *get__Pragma_string (cpp_reader *);
-static void destringize_and_run (cpp_reader *, const cpp_string *,
-				 location_t);
+static const cpp_token *get_token_no_padding (cpp_reader *,
+					      location_t * = nullptr);
+static const cpp_token *get__Pragma_string (cpp_reader *,
+					    location_t * = nullptr);
 static bool parse_answer (cpp_reader *, int, location_t, cpp_macro **);
 static cpp_hashnode *parse_assertion (cpp_reader *, int, cpp_macro **);
 static cpp_macro **find_answer (cpp_hashnode *, const cpp_macro *);
@@ -1504,14 +1504,12 @@ do_pragma (cpp_reader *pfile)
 {
   const struct pragma_entry *p = NULL;
   const cpp_token *token, *pragma_token;
-  location_t pragma_token_virt_loc = 0;
   cpp_token ns_token;
   unsigned int count = 1;
 
   pfile->state.prevent_expansion++;
 
-  pragma_token = token = cpp_get_token_with_location (pfile,
-						      &pragma_token_virt_loc);
+  pragma_token = token = cpp_get_token (pfile);
   ns_token = *token;
   if (token->type == CPP_NAME)
     {
@@ -1537,7 +1535,7 @@ do_pragma (cpp_reader *pfile)
     {
       if (p->is_deferred)
 	{
-	  pfile->directive_result.src_loc = pragma_token_virt_loc;
+	  pfile->directive_result.src_loc = pragma_token->src_loc;
 	  pfile->directive_result.type = CPP_PRAGMA;
 	  pfile->directive_result.flags = pragma_token->flags;
 	  pfile->directive_result.val.pragma = p->u.ident;
@@ -1830,11 +1828,11 @@ do_pragma_error (cpp_reader *pfile)
 
 /* Get a token but skip padding.  */
 static const cpp_token *
-get_token_no_padding (cpp_reader *pfile)
+get_token_no_padding (cpp_reader *pfile, location_t *virt_loc)
 {
   for (;;)
     {
-      const cpp_token *result = cpp_get_token (pfile);
+      const cpp_token *result = cpp_get_token_with_location (pfile, virt_loc);
       if (result->type != CPP_PADDING)
 	return result;
     }
@@ -1843,7 +1841,7 @@ get_token_no_padding (cpp_reader *pfile)
 /* Check syntax is "(string-literal)".  Returns the string on success,
    or NULL on failure.  */
 static const cpp_token *
-get__Pragma_string (cpp_reader *pfile)
+get__Pragma_string (cpp_reader *pfile, location_t *string_virt_loc)
 {
   const cpp_token *string;
   const cpp_token *paren;
@@ -1854,7 +1852,7 @@ get__Pragma_string (cpp_reader *pfile)
   if (paren->type != CPP_OPEN_PAREN)
     return NULL;
 
-  string = get_token_no_padding (pfile);
+  string = get_token_no_padding (pfile, string_virt_loc);
   if (string->type == CPP_EOF)
     _cpp_backup_tokens (pfile, 1);
   if (string->type != CPP_STRING && string->type != CPP_WSTRING
@@ -1874,55 +1872,105 @@ get__Pragma_string (cpp_reader *pfile)
 /* Destringize IN into a temporary buffer, by removing the first \ of
    \" and \\ sequences, and process the result as a #pragma directive.  */
 static void
-destringize_and_run (cpp_reader *pfile, const cpp_string *in,
-		     location_t expansion_loc)
-{
-  const unsigned char *src, *limit;
-  char *dest, *result;
-  cpp_context *saved_context;
-  cpp_token *saved_cur_token;
-  tokenrun *saved_cur_run;
-  cpp_token *toks;
-  int count;
-  const struct directive *save_directive;
-
-  dest = result = (char *) alloca (in->len - 1);
-  src = in->text + 1 + (in->text[0] == 'L');
-  limit = in->text + in->len - 1;
-  while (src < limit)
+destringize_and_run (cpp_reader *pfile, _cpp__Pragma_state *pstate)
+{
+  uchar *dest, *result;
+
+  /* Determine where the data starts, and what kind of string it is.  */
+  const cpp_string *const in = &pstate->string_tok->val.str;
+  const uchar *src = in->text;
+  bool is_raw_string = false;
+  for (;;)
     {
-      /* We know there is a character following the backslash.  */
-      if (*src == '\\' && (src[1] == '\\' || src[1] == '"'))
-	src++;
-      *dest++ = *src++;
+      switch (*src++)
+	{
+	case '\"': break;
+	case 'R': is_raw_string = true; continue;
+	case '\0': gcc_assert (false);
+	default: continue;
+	}
+      break;
     }
-  *dest = '\n';
 
-  /* Ugh; an awful kludge.  We are really not set up to be lexing
-     tokens when in the middle of a macro expansion.  Use a new
-     context to force cpp_get_token to lex, and so skip_rest_of_line
-     doesn't go beyond the end of the text.  Also, remember the
-     current lexing position so we can return to it later.
+  /* If we were given a raw string literal, we don't need to destringize it,
+     but we do need to strip off the prefix and the suffix.  */
+  if (is_raw_string)
+    {
+      cpp_string buf;
+      const bool ok
+	= cpp_interpret_string_notranslate (pfile, in, 1, &buf, CPP_STRING);
+      gcc_assert (ok);
 
-     Something like line-at-a-time lexing should remove the need for
-     this.  */
-  saved_context = pfile->context;
-  saved_cur_token = pfile->cur_token;
-  saved_cur_run = pfile->cur_run;
+      /* BUF.TEXT ends with a terminating null (which is counted in BUF.LEN).
+	 We want to end with a newline as required by cpp_push_buffer.  While it
+	 is not strictly necessary to null terminate our buffer, it is useful to
+	 do so for safety, so we reserve one extra byte.  The \n\0 sequence is
+	 appended after the else block.  */
+      result = _cpp_unaligned_alloc (pfile, buf.len + 1);
+      memcpy (result, buf.text, buf.len - 1);
+      dest = result + (buf.len - 1);
+      XDELETEVEC (buf.text);
+    }
+  else
+    {
+      const auto last_ptr = in->text + in->len - 1;
+      /* +2 for the trailing \n\0 as above.  */
+      dest = result = _cpp_unaligned_alloc (pfile, last_ptr - src + 1 + 2);
+      while (src < last_ptr)
+	{
+	  /* We know there is a character following the backslash.  */
+	  if (*src == '\\' && (src[1] == '\\' || src[1] == '"'))
+	    src++;
+	  *dest++ = *src++;
+	}
+    }
+  *dest++ = '\n';
+  *dest++ = '\0';
 
-  pfile->context = XCNEW (cpp_context);
+  /* We will now ask PFILE to interrupt what it was doing (obtaining tokens
+     either from the main context via lexing, or from a macro context), and get
+     tokens from the string argument instead.  We create a new isolated
+     cpp_context so that cpp_get_token will think it is working on the main
+     buffer and call cpp_lex_token accordingly.  Save all the relevant state so
+     we can return to the previous task once that is completed.
 
-  /* Inline run_directive, since we need to delay the _cpp_pop_buffer
-     until we've read all of the tokens that we want.  */
-  cpp_push_buffer (pfile, (const uchar *) result, dest - result,
-		   /* from_stage3 */ true);
-  /* ??? Antique Disgusting Hack.  What does this do?  */
-  if (pfile->buffer->prev)
-    pfile->buffer->file = pfile->buffer->prev->file;
+     Doing things this way is a bit of a kludge, but the alternative would be
+     to create a new context type to support lexing from a string, and that
+     would add overhead to every token parse, while _Pragma is relatively rarely
+     needed.  */
 
+  const auto saved_context = pfile->context;
+  const auto saved_cur_token = pfile->cur_token;
+  const auto saved_cur_run = pfile->cur_run;
+  pfile->context = XCNEW (cpp_context);
   start_directive (pfile);
+
+  /* Set up an LC_GEN line map to get valid locations for the tokens we are
+     about to lex.  We need to do this after calling start_directive, because
+     historically pfile->directive_line is what's been passed to
+     pfile->cb.def_pragma, and we are not proposing to change that now.  To
+     decide if we are in a system header or not, look at the location of the
+     _Pragma token.  So for instance if we have _Pragma(S) in the main file,
+     where S is a macro defined in a system header, we will decide we are not in
+     a system location.  */
+  const unsigned int buf_len = dest - result;
+  const int sysp = linemap_location_in_system_header_p (pfile->line_table,
+							pstate->pragma_loc);
+  linemap_add (pfile->line_table, LC_GEN, sysp, (const char *)result, 1,
+	       buf_len);
+  const auto col_hint = (uchar *) memchr (result, '\n', buf_len) - result;
+  linemap_line_start (pfile->line_table, 1, col_hint);
+
+  /* Push the buffer.  */
+  cpp_push_buffer (pfile, result, buf_len - 2, true);
+
+  /* This is needed to make _Pragma("once") work correctly, as it needs
+     pfile->buffer->file to be set to the current source file.  */
+  pfile->buffer->file = pfile->buffer->prev->file;
+
+  /* We are ready to start handling the directive as normal.  */
   _cpp_clean_line (pfile);
-  save_directive = pfile->directive;
+  const auto save_directive = pfile->directive;
   pfile->directive = &dtable[T_PRAGMA];
   do_pragma (pfile);
   if (pfile->directive_result.type == CPP_PRAGMA)
@@ -1931,85 +1979,127 @@ destringize_and_run (cpp_reader *pfile, const cpp_string *in,
   pfile->directive = save_directive;
 
   /* We always insert at least one token, the directive result.  It'll
-     either be a CPP_PADDING or a CPP_PRAGMA.  In the later case, we 
+     either be a CPP_PADDING or a CPP_PRAGMA.  In the latter case, we
      need to insert *all* of the tokens, including the CPP_PRAGMA_EOL.  */
 
   /* If we're not handling the pragma internally, read all of the tokens from
-     the string buffer now, while the string buffer is still installed.  */
-  /* ??? Note that the token buffer allocated here is leaked.  It's not clear
-     to me what the true lifespan of the tokens are.  It would appear that
-     the lifespan is the entire parse of the main input stream, in which case
-     this may not be wrong.  */
-  if (pfile->directive_result.type == CPP_PRAGMA)
-    {
-      int maxcount;
-
-      count = 1;
-      maxcount = 50;
-      toks = XNEWVEC (cpp_token, maxcount);
-      toks[0] = pfile->directive_result;
-      toks[0].src_loc = expansion_loc;
-
-      do
+     the string buffer now, while the string buffer is still installed, and then
+     push them as a new token context after.  This way, we can clean up the
+     temporarily modified state of the lexer now.  */
+
+  const bool is_deferred = (pfile->directive_result.type == CPP_PRAGMA);
+  if (is_deferred)
+    {
+      /* Using _cpp_buff allows us to arrange for this buffer to be freed when
+	 the new token context is popped, without adding any additional space
+	 overhead to the cpp_context structure.  In order to support
+	 track_macro_expansion==0, we need to store the cpp_token objects
+	 contiguously, and the virt locs separately.  (Note that these tokens
+	 may acquire a virtual loc here, in case the pragma allows macro
+	 expansion.  But they will not yet have virtual locs representing them
+	 as part of the expansion of the _Pragma directive; this will be handled
+	 later in _cpp_push__Pragma_token_context.)  */
+      const size_t init_count = 50;
+      _cpp_buff *tok_buff
+	= _cpp_get_buff (pfile, init_count * sizeof (cpp_token));
+      _cpp_buff *loc_buff
+	= _cpp_get_buff (pfile, init_count * sizeof (location_t));
+
+      /* Remember the base tok buff so we can chain the final loc buff after it
+	 once we are done collecting tokens.  */
+      const auto tok_buff0 = tok_buff;
+      pstate->buff_chain = &loc_buff->next;
+
+      /* DIRECTIVE_RESULT is the first token we return (a CPP_PRAGMA).  This
+	 location cannot result from macro expansion, so there is no virtual
+	 location to worry about.  */
+      auto tok_out = (cpp_token *) tok_buff->base;
+      *tok_out++ = pfile->directive_result;
+      auto loc_out = (location_t *) loc_buff->base;
+      *loc_out++ = pfile->directive_result.src_loc;
+      unsigned int ntoks = 1;
+
+      /* Finally get all the tokens.  */
+      for (;;)
 	{
-	  if (count == maxcount)
+	  if (tok_buff->limit - (uchar *)tok_out < (int)sizeof (cpp_token))
 	    {
-	      maxcount = maxcount * 3 / 2;
-	      toks = XRESIZEVEC (cpp_token, toks, maxcount);
+	      _cpp_extend_buff (pfile, &tok_buff,
+				tok_buff->limit - tok_buff->base);
+	      tok_out = ((cpp_token *)tok_buff->base) + ntoks;
 	    }
-	  toks[count] = *cpp_get_token (pfile);
-	  /* _Pragma is a builtin, so we're not within a macro-map, and so
-	     the token locations are set to bogus ordinary locations
-	     near to, but after that of the "_Pragma".
-	     Paper over this by setting them equal to the location of the
-	     _Pragma itself (PR preprocessor/69126).  */
-	  toks[count].src_loc = expansion_loc;
+
+	  if (loc_buff->limit - (uchar *)loc_out < (int)sizeof (location_t))
+	    {
+	      _cpp_extend_buff (pfile, &loc_buff,
+				loc_buff->limit - loc_buff->base);
+	      loc_out = ((location_t *)loc_buff->base) + ntoks;
+	    }
+
+	  const auto this_tok = tok_out;
+	  *tok_out++ = *cpp_get_token_with_location (pfile, loc_out++);
+	  ++ntoks;
+
 	  /* Macros have been already expanded by cpp_get_token
 	     if the pragma allowed expansion.  */
-	  toks[count++].flags |= NO_EXPAND;
+	  this_tok->flags |= NO_EXPAND;
+	  if (this_tok->type == CPP_PRAGMA_EOL)
+	    break;
 	}
-      while (toks[count-1].type != CPP_PRAGMA_EOL);
+
+      /* Finalize the buffers so they can be stored as one chain in a
+	 cpp_context and freed when that context is popped.  */
+      tok_buff0->next = loc_buff;
+      pstate->ntoks = ntoks;
+      pstate->tok_buff = tok_buff;
+      pstate->loc_buff = loc_buff;
     }
   else
     {
-      count = 1;
-      toks = &pfile->avoid_paste;
-
       /* If we handled the entire pragma internally, make sure we get the
 	 line number correct for the next token.  */
       if (pfile->cb.line_change)
 	pfile->cb.line_change (pfile, pfile->cur_token, false);
     }
 
-  /* Finish inlining run_directive.  */
+  /* Reset the old state before...  */
+  const auto map = linemap_add (pfile->line_table, LC_LEAVE, 0, nullptr, 0);
+  linemap_line_start
+    (pfile->line_table,
+     ORDINARY_MAP_STARTING_LINE_NUMBER (linemap_check_ordinary (map)),
+     127);
   pfile->buffer->file = NULL;
   _cpp_pop_buffer (pfile);
-
-  /* Reset the old macro state before ...  */
   XDELETE (pfile->context);
   pfile->context = saved_context;
   pfile->cur_token = saved_cur_token;
   pfile->cur_run = saved_cur_run;
 
-  /* ... inserting the new tokens we collected.  */
-  _cpp_push_token_context (pfile, NULL, toks, count);
+  /* ...inserting the new tokens we collected.  This is not a simple call to
+     _cpp_push_token_context, because we need to create virtual locations
+     for the tokens and push an extended token context to return them.  */
+  if (is_deferred)
+    _cpp_push__Pragma_token_context (pfile, pstate);
+  else
+    _cpp_push_token_context (pfile, nullptr, &pfile->avoid_paste, 1);
 }
 
+
 /* Handle the _Pragma operator.  Return 0 on error, 1 if ok.  */
+
 int
-_cpp_do__Pragma (cpp_reader *pfile, location_t expansion_loc)
+_cpp_do__Pragma (cpp_reader *pfile, _cpp__Pragma_state *pstate)
 {
   /* Make sure we don't invalidate the string token, if the closing parenthesis
    ended up on a different line.  */
   ++pfile->keep_tokens;
-  const cpp_token *string = get__Pragma_string (pfile);
+  pstate->string_tok = get__Pragma_string (pfile, &pstate->string_loc);
   --pfile->keep_tokens;
 
   pfile->directive_result.type = CPP_PADDING;
-
-  if (string)
+  if (pstate->string_tok)
     {
-      destringize_and_run (pfile, &string->val.str, expansion_loc);
+      destringize_and_run (pfile, pstate);
       return 1;
     }
   cpp_error (pfile, CPP_DL_ERROR,
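For readers following the non-raw branch of destringize_and_run above, the
transformation can be sketched on its own.  This is an illustration under
stated assumptions, not the libcpp code: destringize is a hypothetical name,
std::string stands in for the _cpp_unaligned_alloc buffer, and the input is
assumed to be a well-formed, non-raw string literal including its quotes and
any encoding prefix.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical standalone model of the non-raw destringizing step: strip the
// prefix and the surrounding quotes, drop the first backslash of each \" and
// \\ sequence, and terminate the result with a newline, as a buffer that is
// about to be lexed as a directive line would be.
static std::string destringize (const std::string &literal)
{
  // Skip any encoding prefix (L, u8, ...) up to and including the opening
  // quote.
  std::size_t pos = literal.find ('"');
  assert (pos != std::string::npos);
  ++pos;

  // Index of the closing quote, which is not copied.
  const std::size_t last = literal.size () - 1;

  std::string out;
  while (pos < last)
    {
      // In a well-formed literal a backslash is always followed by another
      // character, so looking at pos + 1 stays in bounds.
      if (literal[pos] == '\\'
	  && (literal[pos + 1] == '\\' || literal[pos + 1] == '"'))
	++pos;
      out += literal[pos++];
    }
  out += '\n';
  return out;
}
```

With this model, the X1 case from pragma-diag-18.c (a literal containing only
an escaped quote) destringizes to a lone quote character followed by a
newline, which is consistent with the "missing terminating" warning that the
test expects at column 1 of the generated buffer.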
diff --git a/libcpp/errors.cc b/libcpp/errors.cc
index 3269d076af2..54c1c282540 100644
--- a/libcpp/errors.cc
+++ b/libcpp/errors.cc
@@ -60,13 +60,11 @@ cpp_diagnostic_at (cpp_reader * pfile, enum cpp_diagnostic_level level,
 		   enum cpp_warning_reason reason, rich_location *richloc,
 		   const char *msgid, va_list *ap)
 {
-  bool ret;
-
   if (!pfile->cb.diagnostic)
     abort ();
-  ret = pfile->cb.diagnostic (pfile, level, reason, richloc, _(msgid), ap);
-
-  return ret;
+  if (pfile->diagnostic_rebase_loc)
+    _cpp_rebase_diagnostic_location (pfile, richloc);
+  return pfile->cb.diagnostic (pfile, level, reason, richloc, _(msgid), ap);
 }
 
 /* Print a diagnostic at the location of the previously lexed token.  */
@@ -197,16 +195,14 @@ cpp_diagnostic_with_line (cpp_reader * pfile, enum cpp_diagnostic_level level,
 			  location_t src_loc, unsigned int column,
 			  const char *msgid, va_list *ap)
 {
-  bool ret;
-  
   if (!pfile->cb.diagnostic)
     abort ();
   rich_location richloc (pfile->line_table, src_loc);
   if (column)
     richloc.override_column (column);
-  ret = pfile->cb.diagnostic (pfile, level, reason, &richloc, _(msgid), ap);
-
-  return ret;
+  if (pfile->diagnostic_rebase_loc)
+    _cpp_rebase_diagnostic_location (pfile, &richloc);
+  return pfile->cb.diagnostic (pfile, level, reason, &richloc, _(msgid), ap);
 }
 
 /* Print a warning or error, depending on the value of LEVEL.  */
diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
index 76617fe6129..ae32584c264 100644
--- a/libcpp/include/line-map.h
+++ b/libcpp/include/line-map.h
@@ -1812,6 +1812,7 @@ class rich_location
   location_range *get_range (unsigned int idx);
 
   expanded_location get_expanded_location (unsigned int idx);
+  void forget_cached_expanded_location () { m_have_expanded_location = false; }
 
   void
   override_column (int column);
diff --git a/libcpp/internal.h b/libcpp/internal.h
index 8b74d10c1a3..b6118d7128b 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -292,6 +292,28 @@ struct lexer_state
   unsigned char ignore__Pragma;
 };
 
+/* Because handling of _Pragma bounces back and forth between macro.cc and
+   directives.cc, it is useful to keep the needed state in one place.  */
+struct _cpp__Pragma_state
+{
+  const cpp_token *string_tok; /* The token for the argument string.  */
+
+  /* These locations are the virtual locations returned by
+     cpp_get_token_with_location, if the relevant tokens came from macro
+     expansions.  */
+  location_t pragma_loc; /* Location of the _Pragma token.  */
+  location_t string_loc; /* Location of the string arg.  */
+
+  /* The tokens lexed from the _Pragma string.  */
+  unsigned int ntoks;
+  _cpp_buff *tok_buff;
+  _cpp_buff *loc_buff;
+  _cpp_buff **buff_chain;
+};
+
+/* In macro.cc, implements pfile->diagnostic_rebase_loc handling.  */
+void _cpp_rebase_diagnostic_location (cpp_reader *, rich_location *);
+
 /* Special nodes - identifiers with predefined significance.  */
 struct spec_nodes
 {
@@ -601,6 +623,12 @@ struct cpp_reader
      zero of said file.  */
   location_t main_loc;
 
+  /* Location from which we would like to pretend a given token was
+     macro-expanded, if a diagnostic is issued.  Useful for improving
+     _Pragma diagnostics.  */
+  location_t diagnostic_rebase_loc;
+  cpp_hashnode *diagnostic_rebase_node;
+
   /* Returns true iff we should warn about UTF-8 bidirectional control
      characters.  */
   bool warn_bidi_p () const
@@ -701,6 +729,8 @@ extern const unsigned char *_cpp_builtin_macro_text (cpp_reader *,
 extern int _cpp_warn_if_unused_macro (cpp_reader *, cpp_hashnode *, void *);
 extern void _cpp_push_token_context (cpp_reader *, cpp_hashnode *,
 				     const cpp_token *, unsigned int);
+extern void _cpp_push__Pragma_token_context (cpp_reader *,
+					     _cpp__Pragma_state *);
 extern void _cpp_backup_tokens_direct (cpp_reader *, unsigned int);
 
 /* In identifiers.cc */
@@ -772,7 +802,7 @@ extern int _cpp_handle_directive (cpp_reader *, bool);
 extern void _cpp_define_builtin (cpp_reader *, const char *);
 extern char ** _cpp_save_pragma_names (cpp_reader *);
 extern void _cpp_restore_pragma_names (cpp_reader *, char **);
-extern int _cpp_do__Pragma (cpp_reader *, location_t);
+extern int _cpp_do__Pragma (cpp_reader *, _cpp__Pragma_state *);
 extern void _cpp_init_directives (cpp_reader *);
 extern void _cpp_init_internal_pragmas (cpp_reader *);
 extern void _cpp_do_file_change (cpp_reader *, enum lc_reason, const char *,
diff --git a/libcpp/macro.cc b/libcpp/macro.cc
index dada8fea835..864e7dabc38 100644
--- a/libcpp/macro.cc
+++ b/libcpp/macro.cc
@@ -93,6 +93,8 @@ struct macro_arg_saved_data {
 static const char *vaopt_paste_error =
   N_("'##' cannot appear at either end of __VA_OPT__");
 
+static const uchar pragma_str[] = N_("<_Pragma directive>");
+
 static void expand_arg (cpp_reader *, macro_arg *);
 
 /* A class for tracking __VA_OPT__ state while iterating over a
@@ -756,7 +758,31 @@ builtin_macro (cpp_reader *pfile, cpp_hashnode *node,
       if (pfile->state.in_directive || pfile->state.ignore__Pragma)
 	return 0;
 
-      return _cpp_do__Pragma (pfile, loc);
+      _cpp__Pragma_state pstate = {};
+      pstate.pragma_loc = loc;
+
+      /* The diagnostic_rebase stuff arranges that any diagnostics issued during
+	 lexing will point the user back to the _Pragma location.  */
+      const auto prev_rloc = pfile->diagnostic_rebase_loc;
+      const auto prev_rnode = pfile->diagnostic_rebase_node;
+      pfile->diagnostic_rebase_loc = loc;
+      pfile->diagnostic_rebase_node
+	= cpp_lookup (pfile, pragma_str, (sizeof pragma_str) - 1);
+
+      /* While lexing tokens, if we end up expanding some macros, we would
+	 like not to override top_most_macro_node; preserving it pointing
+	 to the _Pragma helps out the case of -ftrack-macro-expansion=0.
+	 Setting this flag causes in_macro_expansion_p to return TRUE,
+	 even though we are not technically in a macro context.  */
+      const bool prev_expand = pfile->about_to_expand_macro_p;
+      pfile->about_to_expand_macro_p = true;
+
+      /* Get the tokens, then reset everything back how it was.  */
+      const int res = _cpp_do__Pragma (pfile, &pstate);
+      pfile->about_to_expand_macro_p = prev_expand;
+      pfile->diagnostic_rebase_loc = prev_rloc;
+      pfile->diagnostic_rebase_node = prev_rnode;
+      return res;
     }
 
   buf = _cpp_builtin_macro_text (pfile, node, expand_loc);
@@ -2802,7 +2828,8 @@ _cpp_pop_context (cpp_reader *pfile)
 	  && macro_of_context (context->prev) != macro)
 	macro->flags &= ~NODE_DISABLED;
 
-      if (macro == pfile->top_most_macro_node && context->prev == NULL)
+      if (!pfile->about_to_expand_macro_p
+	  && context->prev == &pfile->base_context)
 	/* We are popping the context of the top-most macro node.  */
 	pfile->top_most_macro_node = NULL;
     }
@@ -2836,10 +2863,10 @@ reached_end_of_context (cpp_context *context)
 
 /* Consume the next token contained in the current context of PFILE,
    and return it in *TOKEN. It's "full location" is returned in
-   *LOCATION. If -ftrack-macro-location is in effeect, fFull location"
-   means the location encoding the locus of the token across macro
-   expansion; otherwise it's just is the "normal" location of the
-   token which (*TOKEN)->src_loc.  */
+   *LOCATION.  If -ftrack-macro-location is in effect, "full location"
+   means the virtual location encoding the locus of the token across macro
+   expansion; otherwise it's just the "normal" (spelling) location of the
+   token, which is (*TOKEN)->src_loc.  */
 static inline void
 consume_next_token_from_context (cpp_reader *pfile,
 				 const cpp_token ** token,
@@ -4137,3 +4164,90 @@ cpp_macro_definition (cpp_reader *pfile, cpp_hashnode *node,
   *buffer = '\0';
   return pfile->macro_buffer;
 }
+
+/* Handle the list of tokens lexed from a _Pragma string.  We need to create
+   virtual locations (reflecting the fact that these tokens are logically
+   within the expansion of the _Pragma string), and push an extended token
+   context.  */
+
+void
+_cpp_push__Pragma_token_context (cpp_reader *pfile,
+				 _cpp__Pragma_state *pstate)
+{
+  const auto node = cpp_lookup (pfile, pragma_str, (sizeof pragma_str) - 1);
+  const auto toks = (const cpp_token *) pstate->tok_buff->base;
+
+  /* If not tracking macro expansions, then just push a normal token context.
+     cpp_get_token () will return the user the location of the _Pragma
+     directive, so they will have a valid location for the _Pragma which is
+     outside the LC_GEN map.  */
+  if (!CPP_OPTION (pfile, track_macro_expansion))
+    {
+      _cpp_push_token_context (pfile, node, toks, pstate->ntoks);
+      /* Arrange to free the buffers when the context is popped.  */
+      pfile->context->buff = pstate->tok_buff;
+      return;
+    }
+
+  location_t *virt_locs = nullptr;
+  _cpp_buff *const macro_tokens = tokens_buff_new (pfile, pstate->ntoks,
+						   &virt_locs);
+  const auto map = linemap_enter_macro (pfile->line_table, node,
+					pstate->pragma_loc, pstate->ntoks);
+  const auto locs = (location_t *)pstate->loc_buff->base;
+  for (unsigned int i = 0; i != pstate->ntoks; ++i)
+    {
+      tokens_buff_add_token (macro_tokens, virt_locs, toks + i,
+			     locs[i], locs[i], map, i);
+    }
+
+  /* Chain tok_buff ahead of macro_tokens so both are freed together
+     when the context is popped.  pstate->buff_chain is the NEXT pointer
+     of the last buffer in the LOC_BUFF chain, so it looks like:
+     TOK_BUFF_1 -> ... -> TOK_BUFF_N -> ... -> LOC_BUFF_1 -> ... ->
+     LOC_BUFF_N -> MACRO_TOKENS_1 -> ... -> MACRO_TOKENS_N.  */
+  *pstate->buff_chain = macro_tokens;
+  push_extended_tokens_context (pfile, node, pstate->tok_buff, virt_locs,
+				(const cpp_token **) macro_tokens->base,
+				pstate->ntoks);
+}
+
+void
+_cpp_rebase_diagnostic_location (cpp_reader *pfile, rich_location *richloc)
+{
+  /* If we are here, it means a diagnostic is being generated while lexing
+     tokens outside a macro context, but pfile->diagnostic_rebase_loc indicates
+     a location from which we would like to pretend we are actually expanding a
+     macro.  This works around the fact that a macro map can only be generated
+     once we know how many tokens it will contain, but the number of tokens to
+     be lexed from, say, a _Pragma string, is not known ahead of time.  In the
+     case of _Pragma, _cpp_push__Pragma_token_context above handles creating the
+     proper macro map once all the tokens are available.  This function runs
+     earlier than that, while in the middle of lexing tokens, so it creates a
+     temporary macro map which serves only to improve the information content of
+     the diagnostic that's about to be generated.  */
+
+  const int nlocs = richloc->get_num_locations ();
+
+  if (CPP_OPTION (pfile, track_macro_expansion))
+    {
+      const auto map
+	= linemap_enter_macro (pfile->line_table, pfile->diagnostic_rebase_node,
+			       pfile->diagnostic_rebase_loc, nlocs);
+      for (int i = 0; i != nlocs; ++i)
+	{
+	  location_range &r = *richloc->get_range (i);
+	  r.m_loc = linemap_add_macro_token (map, i, r.m_loc, r.m_loc);
+	}
+    }
+  else
+    {
+      /* When not tracking macro expansion, then set the location to the
+	 expansion point for all tokens, which is what would be returned
+	 by cpp_get_token in the normal case.  */
+      for (int i = 0; i != nlocs; ++i)
+	richloc->get_range (i)->m_loc = pfile->invocation_location;
+    }
+
+  richloc->forget_cached_expanded_location ();
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
index ddccfe89e73..f518915492d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-5.c
@@ -46,7 +46,8 @@ main (void)
   /* Nvptx targets require a vector_length or 32 in to allow spinlocks with
      gangs.  */
   check_reduction (num_workers (nw) vector_length (vl), worker); /* { dg-line check_reduction_loc } */
-  /* { dg-warning "22:region is vector partitioned but does not contain vector partitioned code" "" { target *-*-* } pragma_loc }
+  /* { dg-warning "1:region is vector partitioned but does not contain vector partitioned code" "" { target *-*-* } 1 }
+     { dg-note "22:in <_Pragma directive>" "" { target *-*-* xfail offloading_enabled} pragma_loc }
      { dg-note "1:in expansion of macro 'DO_PRAGMA'" "" { target *-*-* xfail offloading_enabled } DO_PRAGMA_loc }
      { dg-note "3:in expansion of macro 'check_reduction'" "" { target *-*-* xfail offloading_enabled } check_reduction_loc }
      TODO See PR101551 for 'offloading_enabled' XFAILs.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vred2d-128.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vred2d-128.c
index 84e6d51670b..bd2567d96f8 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vred2d-128.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vred2d-128.c
@@ -40,46 +40,54 @@ int a1[n], a2[n];
 
 gentest (test1, "acc parallel loop gang vector_length (128) firstprivate (t1, t2)",
 	 "acc loop vector reduction(+:t1) reduction(-:t2)")
-/* { dg-warning {'t1' is used uninitialized} {} { target *-*-* } outer }
+/* { dg-warning {'t1' is used uninitialized} {} { target *-*-* } 1 }
+   { dg-note {in <_Pragma directive>} {} { target { ! offloading_enabled } } outer }
    { dg-note {'t1' was declared here} {} { target *-*-* } vars }
-   { dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-4 }
+   { dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-5 }
      TODO See PR101551 for 'offloading_enabled' differences.  */
-/* { dg-warning {'t2' is used uninitialized} {} { target *-*-* } outer }
+/* { dg-warning {'t2' is used uninitialized} {} { target *-*-* } 1 }
+   { DUPdg-note {in <_Pragma directive>} {} { target { ! offloading_enabled } } outer }
    { dg-note {'t2' was declared here} {} { target *-*-* } vars }
-   { DUP_dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-8 }
+   { DUP_dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-10 }
      TODO See PR101551 for 'offloading_enabled' differences.  */
 
 gentest (test2, "acc parallel loop gang vector_length (128) firstprivate (t1, t2)",
 	 "acc loop worker vector reduction(+:t1) reduction(-:t2)")
-/* { DUPdg-warning {'t1' is used uninitialized} {} { target *-*-* } outer }
+/* { DUPdg-warning {'t1' is used uninitialized} {} { target *-*-* } 1 }
+   { DUPdg-note {in <_Pragma directive>} {} { target { ! offloading_enabled } } outer }
    { DUP_dg-note {'t1' was declared here} {} { target *-*-* } vars }
-   { dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-4 }
+   { dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-5 }
      TODO See PR101551 for 'offloading_enabled' differences.  */
-/* { DUPdg-warning {'t2' is used uninitialized} {} { target *-*-* } outer }
+/* { DUPdg-warning {'t2' is used uninitialized} {} { target *-*-* } 1 }
+   { DUPdg-note {in <_Pragma directive>} {} { target { ! offloading_enabled } } outer }
    { DUP_dg-note {'t2' was declared here} {} { target *-*-* } vars }
-   { DUP_dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-8 }
+   { DUP_dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-10 }
      TODO See PR101551 for 'offloading_enabled' differences.  */
 
 gentest (test3, "acc parallel loop gang worker vector_length (128) firstprivate (t1, t2)",
 	 "acc loop vector reduction(+:t1) reduction(-:t2)")
-/* { DUPdg-warning {'t1' is used uninitialized} {} { target *-*-* } outer }
+/* { DUPdg-warning {'t1' is used uninitialized} {} { target *-*-* } 1 }
+   { DUPdg-note {in <_Pragma directive>} {} { target { ! offloading_enabled } } outer }
    { DUP_dg-note {'t1' was declared here} {} { target *-*-* } vars }
-   { dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-4 }
+   { dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-5 }
      TODO See PR101551 for 'offloading_enabled' differences.  */
-/* { DUPdg-warning {'t2' is used uninitialized} {} { target *-*-* } outer }
+/* { DUPdg-warning {'t2' is used uninitialized} {} { target *-*-* } 1 }
+   { DUPdg-note {in <_Pragma directive>} {} { target { ! offloading_enabled } } outer }
    { DUP_dg-note {'t2' was declared here} {} { target *-*-* } vars }
-   { DUP_dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-8 }
+   { DUP_dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-10 }
      TODO See PR101551 for 'offloading_enabled' differences.  */
 
 gentest (test4, "acc parallel loop firstprivate (t1, t2)",
 	 "acc loop reduction(+:t1) reduction(-:t2)")
-/* { DUPdg-warning {'t1' is used uninitialized} {} { target *-*-* } outer }
+/* { DUPdg-warning {'t1' is used uninitialized} {} { target *-*-* } 1 }
+   { DUPdg-note {in <_Pragma directive>} {} { target { ! offloading_enabled } } outer }
    { DUP_dg-note {'t1' was declared here} {} { target *-*-* } vars }
-   { dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-4 }
+   { dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-5 }
      TODO See PR101551 for 'offloading_enabled' differences.  */
-/* { DUPdg-warning {'t2' is used uninitialized} {} { target *-*-* } outer }
+/* { DUPdg-warning {'t2' is used uninitialized} {} { target *-*-* } 1 }
+   { DUPdg-note {in <_Pragma directive>} {} { target { ! offloading_enabled } } outer }
    { DUP_dg-note {'t2' was declared here} {} { target *-*-* } vars }
-   { DUP_dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-8 }
+   { DUP_dg-note {in expansion of macro 'gentest'} {} { target { ! offloading_enabled } } .-10 }
      TODO See PR101551 for 'offloading_enabled' differences.  */
 
 


* [PATCH v4 8/8] diagnostics: Support generated data locations in SARIF output
  2023-08-09 22:14       ` [PATCH v4 0/8] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
                           ` (6 preceding siblings ...)
  2023-08-09 22:14         ` [PATCH v4 7/8] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings Lewis Hyatt
@ 2023-08-09 22:14         ` Lewis Hyatt
  2023-08-15 17:04           ` David Malcolm
  7 siblings, 1 reply; 36+ messages in thread
From: Lewis Hyatt @ 2023-08-09 22:14 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm, Lewis Hyatt

The diagnostics routines for SARIF output need to read the source code back
in so that they can generate "snippet" and "contents" records, which means
they must be able to cope with generated data locations.  Add support for
that in diagnostic-format-sarif.cc.

gcc/ChangeLog:

	* diagnostic-format-sarif.cc (class sarif_builder): Adapt interface
	to support generated data locations.
	(sarif_builder::maybe_make_physical_location_object): Change the
	m_filenames hash_set to support generated data.
	(sarif_builder::make_artifact_location_object): Use a source_id rather
	than a plain file name.
	(sarif_builder::maybe_make_region_object): Adapt to
	expanded_location interface changes.
	(sarif_builder::maybe_make_region_object_for_context): Likewise.
	(sarif_builder::make_artifact_object): Likewise.
	(sarif_builder::make_run_object): Handle generated data.
	(sarif_builder::maybe_make_artifact_content_object): Likewise.
	(get_source_lines): Likewise.

gcc/testsuite/ChangeLog:

	* c-c++-common/diagnostic-format-sarif-file-5.c: New test.
---
 gcc/diagnostic-format-sarif.cc                | 88 +++++++++++--------
 .../diagnostic-format-sarif-file-5.c          | 31 +++++++
 2 files changed, 82 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c

diff --git a/gcc/diagnostic-format-sarif.cc b/gcc/diagnostic-format-sarif.cc
index 1eff71962d7..c7c0e5d4b0a 100644
--- a/gcc/diagnostic-format-sarif.cc
+++ b/gcc/diagnostic-format-sarif.cc
@@ -174,7 +174,7 @@ private:
   json::array *maybe_make_kinds_array (diagnostic_event::meaning m) const;
   json::object *maybe_make_physical_location_object (location_t loc);
   json::object *make_artifact_location_object (location_t loc);
-  json::object *make_artifact_location_object (const char *filename);
+  json::object *make_artifact_location_object (source_id src);
   json::object *make_artifact_location_object_for_pwd () const;
   json::object *maybe_make_region_object (location_t loc) const;
   json::object *maybe_make_region_object_for_context (location_t loc) const;
@@ -197,9 +197,9 @@ private:
   json::object *make_reporting_descriptor_object_for_cwe_id (int cwe_id) const;
   json::object *
   make_reporting_descriptor_reference_object_for_cwe_id (int cwe_id);
-  json::object *make_artifact_object (const char *filename);
-  json::object *maybe_make_artifact_content_object (const char *filename) const;
-  json::object *maybe_make_artifact_content_object (const char *filename,
+  json::object *make_artifact_object (source_id src);
+  json::object *maybe_make_artifact_content_object (source_id src) const;
+  json::object *maybe_make_artifact_content_object (source_id src,
 						    int start_line,
 						    int end_line) const;
   json::object *make_fix_object (const rich_location &rich_loc);
@@ -220,7 +220,11 @@ private:
      diagnostic group.  */
   sarif_result *m_cur_group_result;
 
-  hash_set <const char *> m_filenames;
+  /* If the second member is >0, then this is a buffer of generated content,
+     with that length, not a filename.  */
+  hash_set <pair_hash <nofree_ptr_hash <const char>,
+		       int_hash <unsigned int, -1U> >
+	    > m_filenames;
   bool m_seen_any_relative_paths;
   hash_set <free_string_hash> m_rule_id_set;
   json::array *m_rules_arr;
@@ -787,7 +791,8 @@ sarif_builder::maybe_make_physical_location_object (location_t loc)
   /* "artifactLocation" property (SARIF v2.1.0 section 3.29.3).  */
   json::object *artifact_loc_obj = make_artifact_location_object (loc);
   phys_loc_obj->set ("artifactLocation", artifact_loc_obj);
-  m_filenames.add (LOCATION_FILE (loc));
+  const auto src = LOCATION_SRC (loc);
+  m_filenames.add ({src.get_filename_or_buffer (), src.get_buffer_len ()});
 
   /* "region" property (SARIF v2.1.0 section 3.29.4).  */
   if (json::object *region_obj = maybe_make_region_object (loc))
@@ -811,7 +816,7 @@ sarif_builder::maybe_make_physical_location_object (location_t loc)
 json::object *
 sarif_builder::make_artifact_location_object (location_t loc)
 {
-  return make_artifact_location_object (LOCATION_FILE (loc));
+  return make_artifact_location_object (LOCATION_SRC (loc));
 }
 
 /* The ID value for use in "uriBaseId" properties (SARIF v2.1.0 section 3.4.4)
@@ -823,10 +828,13 @@ sarif_builder::make_artifact_location_object (location_t loc)
    or return NULL.  */
 
 json::object *
-sarif_builder::make_artifact_location_object (const char *filename)
+sarif_builder::make_artifact_location_object (source_id src)
 {
   json::object *artifact_loc_obj = new json::object ();
 
+  const auto filename = src.is_buffer ()
+    ? special_fname_generated () : src.get_filename_or_buffer ();
+
   /* "uri" property (SARIF v2.1.0 section 3.4.3).  */
   artifact_loc_obj->set ("uri", new json::string (filename));
 
@@ -912,9 +920,9 @@ sarif_builder::maybe_make_region_object (location_t loc) const
   expanded_location exploc_start = expand_location (start_loc);
   expanded_location exploc_finish = expand_location (finish_loc);
 
-  if (exploc_start.file !=exploc_caret.file)
+  if (exploc_start.src != exploc_caret.src)
     return NULL;
-  if (exploc_finish.file !=exploc_caret.file)
+  if (exploc_finish.src != exploc_caret.src)
     return NULL;
 
   json::object *region_obj = new json::object ();
@@ -963,9 +971,9 @@ sarif_builder::maybe_make_region_object_for_context (location_t loc) const
   expanded_location exploc_start = expand_location (start_loc);
   expanded_location exploc_finish = expand_location (finish_loc);
 
-  if (exploc_start.file !=exploc_caret.file)
+  if (exploc_start.src != exploc_caret.src)
     return NULL;
-  if (exploc_finish.file !=exploc_caret.file)
+  if (exploc_finish.src != exploc_caret.src)
     return NULL;
 
   json::object *region_obj = new json::object ();
@@ -979,9 +987,9 @@ sarif_builder::maybe_make_region_object_for_context (location_t loc) const
 
   /* "snippet" property (SARIF v2.1.0 section 3.30.13).  */
   if (json::object *artifact_content_obj
-	 = maybe_make_artifact_content_object (exploc_start.file,
-					       exploc_start.line,
-					       exploc_finish.line))
+      = maybe_make_artifact_content_object (exploc_start.src,
+					    exploc_start.line,
+					    exploc_finish.line))
     region_obj->set ("snippet", artifact_content_obj);
 
   return region_obj;
@@ -1298,7 +1306,10 @@ sarif_builder::make_run_object (sarif_invocation *invocation_obj,
   json::array *artifacts_arr = new json::array ();
   for (auto iter : m_filenames)
     {
-      json::object *artifact_obj = make_artifact_object (iter);
+      const auto src = iter.second
+	? source_id {iter.first, iter.second} /* Memory buffer.  */
+	: source_id {iter.first}; /* Filename.  */
+      json::object *artifact_obj = make_artifact_object (src);
       artifacts_arr->append (artifact_obj);
     }
   run_obj->set ("artifacts", artifacts_arr);
@@ -1472,37 +1483,37 @@ sarif_builder::maybe_make_cwe_taxonomy_object () const
 /* Make an artifact object (SARIF v2.1.0 section 3.24).  */
 
 json::object *
-sarif_builder::make_artifact_object (const char *filename)
+sarif_builder::make_artifact_object (source_id src)
 {
   json::object *artifact_obj = new json::object ();
 
   /* "location" property (SARIF v2.1.0 section 3.24.2).  */
-  json::object *artifact_loc_obj = make_artifact_location_object (filename);
+  json::object *artifact_loc_obj = make_artifact_location_object (src);
   artifact_obj->set ("location", artifact_loc_obj);
 
   /* "contents" property (SARIF v2.1.0 section 3.24.8).  */
   if (json::object *artifact_content_obj
-	= maybe_make_artifact_content_object (filename))
+	= maybe_make_artifact_content_object (src))
     artifact_obj->set ("contents", artifact_content_obj);
 
   /* "sourceLanguage" property (SARIF v2.1.0 section 3.24.10).  */
   if (m_context->m_client_data_hooks)
     if (const char *source_lang
 	= m_context->m_client_data_hooks->maybe_get_sarif_source_language
-	    (filename))
+	    (src.get_filename_or_buffer ()))
       artifact_obj->set ("sourceLanguage", new json::string (source_lang));
 
   return artifact_obj;
 }
 
 /* Make an artifactContent object (SARIF v2.1.0 section 3.3) for the
-   full contents of FILENAME.  */
+   full contents of SRC.  */
 
 json::object *
-sarif_builder::maybe_make_artifact_content_object (const char *filename) const
+sarif_builder::maybe_make_artifact_content_object (source_id src) const
 {
   /* Let input.cc handle any charset conversion.  */
-  char_span utf8_content = get_source_file_content (filename);
+  char_span utf8_content = get_source_file_content (src);
   if (!utf8_content)
     return NULL;
 
@@ -1518,10 +1529,12 @@ sarif_builder::maybe_make_artifact_content_object (const char *filename) const
 }
 
 /* Attempt to read the given range of lines from FILENAME; return
-   a freshly-allocated 0-terminated buffer containing them, or NULL.  */
+   a freshly-allocated buffer containing them, or NULL.
+   The buffer is null-terminated, but could also contain embedded null
+   bytes, so the char_span's length() accessor should be used.  */
 
-static char *
-get_source_lines (const char *filename,
+static char_span
+get_source_lines (source_id src,
 		  int start_line,
 		  int end_line)
 {
@@ -1529,9 +1542,9 @@ get_source_lines (const char *filename,
 
   for (int line = start_line; line <= end_line; line++)
     {
-      char_span line_content = location_get_source_line (filename, line);
+      char_span line_content = location_get_source_line (src, line);
       if (!line_content.get_buffer ())
-	return NULL;
+	return char_span (nullptr, 0);
       result.reserve (line_content.length () + 1);
       for (size_t i = 0; i < line_content.length (); i++)
 	result.quick_push (line_content[i]);
@@ -1539,33 +1552,34 @@ get_source_lines (const char *filename,
     }
   result.safe_push ('\0');
 
-  return xstrdup (result.address ());
+  return char_span (xstrdup (result.address ()), result.length () - 1);
 }
 
 /* Make an artifactContent object (SARIF v2.1.0 section 3.3) for the given
-   run of lines within FILENAME (including the endpoints).  */
+   run of lines in the source code identified by SRC (including the
+   endpoints).  */
 
 json::object *
-sarif_builder::maybe_make_artifact_content_object (const char *filename,
+sarif_builder::maybe_make_artifact_content_object (source_id src,
 						   int start_line,
 						   int end_line) const
 {
-  char *text_utf8 = get_source_lines (filename, start_line, end_line);
+  const char_span text_utf8 = get_source_lines (src, start_line, end_line);
 
   if (!text_utf8)
     return NULL;
 
   /* Don't add it if it's not valid UTF-8.  */
-  if (!cpp_valid_utf8_p(text_utf8, strlen(text_utf8)))
+  if (!cpp_valid_utf8_p (text_utf8.get_buffer (), text_utf8.length ()))
     {
-      free (text_utf8);
+      free (const_cast<char *> (text_utf8.get_buffer ()));
       return NULL;
     }
 
   json::object *artifact_content_obj = new json::object ();
-  artifact_content_obj->set ("text", new json::string (text_utf8));
-  free (text_utf8);
-
+  artifact_content_obj->set ("text", new json::string (text_utf8.get_buffer (),
+						       text_utf8.length ()));
+  free (const_cast<char *> (text_utf8.get_buffer ()));
   return artifact_content_obj;
 }
 
diff --git a/gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c b/gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c
new file mode 100644
index 00000000000..2ca6a069d3f
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/diagnostic-format-sarif-file-5.c
@@ -0,0 +1,31 @@
+/* The goal is to test SARIF output of generated data, such as a _Pragma string.
+   But SARIF output as of yet does not output macro definitions, so such
+   generated data buffers never end up in the typical SARIF output.  One way we
+   can achieve it is to use -fdump-internal-locations, which outputs top-level
+   diagnostic notes inside macro definitions, that SARIF will end up processing.
+   It also outputs a lot of other stuff to stderr (not to the SARIF file) that
+   is not relevant to this test, so we use a blanket dg-regexp to filter all of
+   that away.  */
+
+/* { dg-do compile } */
+/* { dg-options "-fdiagnostics-format=sarif-file -fdump-internal-locations" } */
+/* { dg-allow-blank-lines-in-output "" } */
+
+_Pragma("GCC diagnostic push")
+
+/* { dg-regexp {(.|[\n\r])*} } */
+
+/* Because of the way -fdump-internal-locations works, these regexes themselves
+   will end up in the sarif output also.  But due to the escaping, they don't
+   match themselves, so they still test what we need.  */
+
+/* Four of this pair are output for the tokens inside the
+   _Pragma string (3 plus a PRAGMA_EOL).  */
+
+/* { dg-final { scan-sarif-file "\"artifactLocation\": \{\"uri\": \"<generated>\"," } } */
+/* { dg-final { scan-sarif-file "\"snippet\": \{\"text\": \"GCC diagnostic push\\\\n\"" } } */
+
+/* One of this pair is output for the overall internal location.  */
+
+/* { dg-final { scan-sarif-file "\{\"location\": \{\"uri\": \"<generated>\"," } } */
+/* { dg-final { scan-sarif-file "\"contents\": \{\"text\": \"GCC diagnostic push\\\\n\\\\0" } } */

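The get_source_lines change in this patch returns a char_span instead of a
char *, since a generated buffer may contain embedded null bytes.  The
following is a minimal standalone sketch of why the length has to travel
with the data.  It is not code from the patch: join_lines and
c_string_length are hypothetical helpers, and std::string stands in for the
vector-plus-char_span pair used in diagnostic-format-sarif.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: join LINES into one string, terminating each
// line with '\n'.  The result's size () counts every byte, including
// any embedded '\0' bytes.
inline std::string
join_lines (const std::vector<std::string> &lines)
{
  std::string result;
  for (const std::string &line : lines)
    {
      result += line;
      result += '\n';
    }
  return result;
}

// Hypothetical helper: the length a C-string consumer would see,
// i.e. the number of bytes before the first '\0'.
inline std::size_t
c_string_length (const std::string &s)
{
  const std::size_t n = std::strlen (s.c_str ());
  assert (n <= s.size ());
  return n;
}
```

For a line such as "ab\0cd", the joined text has more bytes than
c_string_length reports, so any consumer that relies on strlen would
silently drop everything after the first null byte.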

* Re: [PATCH v4 1/8] libcpp: Add LC_GEN linemaps to support in-memory buffers
  2023-08-09 22:14         ` [PATCH v4 1/8] libcpp: Add LC_GEN linemaps to support in-memory buffers Lewis Hyatt
@ 2023-08-11 22:45           ` David Malcolm
  2023-08-13 20:18             ` Lewis Hyatt
  0 siblings, 1 reply; 36+ messages in thread
From: David Malcolm @ 2023-08-11 22:45 UTC (permalink / raw)
  To: Lewis Hyatt, gcc-patches

On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:

Hi Lewis, thanks for the patch...

> Add a new linemap reason LC_GEN which enables encoding the location of data
> that was generated during compilation and does not appear in any source file.
> There could be many use cases, such as, for instance, referring to the content
> of builtin macros (not yet implemented, but an easy lift after this one.) The
> first intended application is to create a place to store the input to a
> _Pragma directive, so that proper locations can be assigned to those
> tokens. This will be done in a subsequent commit.
> 
> The TO_FILE member of struct line_map_ordinary has been changed to a union
> named SRC which can be either a file name, or a pointer to a line_map_data
> struct describing the data. There is no space overhead added to the line
> maps data structures.
> 
> Outside libcpp, this patch includes only the minimal changes implied by the
> adjustment from TO_FILE to SRC in struct line_map_ordinary. Subsequent
> patches will implement the new functionality.
> 
> libcpp/ChangeLog:
> 
>         * include/line-map.h (enum lc_reason): Add LC_GEN.
>         (struct line_map_data): New struct.
>         (struct line_map_ordinary): Change TO_FILE from a char* to a union,
>         and rename to SRC.
>         (class source_id): New class.
>         (ORDINARY_MAP_GENERATED_DATA_P): New function.
>         (ORDINARY_MAP_GENERATED_DATA): New function.
>         (ORDINARY_MAP_GENERATED_DATA_LEN): New function.
>         (ORDINARY_MAP_SOURCE_ID): New function.
>         (ORDINARY_MAPS_SAME_FILE_P): New function.
>         (ORDINARY_MAP_CONTAINING_FILE_NAME): Declare.
>         (LINEMAP_FILE): Adapt to struct line_map_ordinary change.
>         (linemap_get_file_highest_location): Likewise.
>         * line-map.cc (source_id::operator==): New function.
>         (ORDINARY_MAP_CONTAINING_FILE_NAME): New function.
>         (linemap_add): Support creating LC_GEN maps.
>         (linemap_line_start): Support LC_GEN maps.
>         (linemap_check_files_exited): Likewise.
>         (linemap_position_for_loc_and_offset): Likewise.
>         (linemap_get_expansion_filename): Likewise.
>         (linemap_dump): Likewise.
>         (linemap_dump_location): Likewise.
>         (linemap_get_file_highest_location): Likewise.
>         * directives.cc (_cpp_do_file_change): Likewise.
> 
> gcc/c-family/ChangeLog:
> 
>         * c-common.cc (try_to_locate_new_include_insertion_point): Recognize
>         and ignore LC_GEN maps.
> 
> gcc/cp/ChangeLog:
> 
>         * module.cc (module_state::write_ordinary_maps): Recognize and
>         ignore LC_GEN maps, and adapt to interface change in struct
>         line_map_ordinary.
>         (module_state::read_ordinary_maps): Likewise.
> 
> gcc/ChangeLog:
> 
>         * diagnostic-show-locus.cc (compatible_locations_p): Adapt to
>         interface change in struct line_map_ordinary.
>         * input.cc (special_fname_generated): New function.
>         (dump_location_info): Support LC_GEN maps.
>         (get_substring_ranges_for_loc): Adapt to interface change in struct
>         line_map_ordinary.
>         * input.h (special_fname_generated): Declare.
> 
> gcc/go/ChangeLog:
> 
>         * go-linemap.cc (Gcc_linemap::to_string): Recognize and ignore
>         LC_GEN maps.
> ---
>  gcc/c-family/c-common.cc     |  11 ++-
>  gcc/cp/module.cc             |   8 +-
>  gcc/diagnostic-show-locus.cc |   2 +-
>  gcc/go/go-linemap.cc         |   3 +-
>  gcc/input.cc                 |  27 +++++-
>  gcc/input.h                  |   1 +
>  libcpp/directives.cc         |   4 +-
>  libcpp/include/line-map.h    | 144 ++++++++++++++++++++++++----
>  libcpp/line-map.cc           | 181 +++++++++++++++++++++++++----------
>  9 files changed, 299 insertions(+), 82 deletions(-)

[...snip...]

> 
> diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
> index 0514815b51f..a2aa6b4e0b5 100644
> --- a/gcc/diagnostic-show-locus.cc
> +++ b/gcc/diagnostic-show-locus.cc
> @@ -998,7 +998,7 @@ compatible_locations_p (location_t loc_a, location_t loc_b)
>          are in the same file.  */
>        const line_map_ordinary *ord_map_a = linemap_check_ordinary (map_a);
>        const line_map_ordinary *ord_map_b = linemap_check_ordinary (map_b);
> -      return ord_map_a->to_file == ord_map_b->to_file;
> +      return ORDINARY_MAPS_SAME_FILE_P (ord_map_a, ord_map_b);

My first thought here was: are buffers supported here, or does it have
to be a file?

It turns out that ORDINARY_MAPS_SAME_FILE_P works on both files and
buffers.

This suggests that it would be better named
ORDINARY_MAPS_SAME_SOURCE_ID_P.  But given the comment below, could this
instead be:

           return ord_map_a->same_source_id_p (ord_map_b);

?
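
As a rough illustration of the member-function form, here is a minimal
standalone sketch.  It is not code from the patch: toy_ordinary_map is a
hypothetical, heavily simplified stand-in for struct line_map_ordinary, and
the buffer comparison rule (same pointer and same length) is an assumption
made for the example.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified model of a map whose source is either a
// file name or a generated in-memory buffer.
struct toy_ordinary_map
{
  bool generated;    // true for an LC_GEN-style map
  const char *ptr;   // file name, or start of the buffer
  unsigned int len;  // buffer length; unused for files

  // Assumption: two buffers are the same source only if they are the
  // same memory with the same length; two files are the same source
  // if their names compare equal.  std::strcmp stands in for
  // filename_cmp here.
  bool same_source_id_p (const toy_ordinary_map *other) const
  {
    assert (other && ptr && other->ptr);
    if (generated != other->generated)
      return false;
    if (generated)
      return ptr == other->ptr && len == other->len;
    return std::strcmp (ptr, other->ptr) == 0;
  }
};
```

The call site then reads the same way for files and buffers, which is the
point of naming it after the source ID rather than the file.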

[...snip...]

> diff --git a/gcc/input.cc b/gcc/input.cc
> index eaf301ec7c1..c1735215b29 100644
> --- a/gcc/input.cc
> +++ b/gcc/input.cc

[...snip...]

> @@ -1814,11 +1835,11 @@ get_substring_ranges_for_loc (cpp_reader *pfile,
>        /* Bulletproofing.  We ought to only have different ordinary maps
>          for start vs finish due to line-length jumps.  */
>        if (start_ord_map != final_ord_map
> -         && start_ord_map->to_file != final_ord_map->to_file)
> +         && !ORDINARY_MAPS_SAME_FILE_P (start_ord_map, final_ord_map))

For the common case of comparing a pair of ordinary maps that have
filenames, this hunk is replacing pointer comparison with filename_cmp,
which ultimately does something like strcmp.  Should
ORDINARY_MAPS_SAME_FILE_P have a fast-path for pointer equality?


[...snip...]

> diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
> index 44fea0ea08e..e59123b18c5 100644
> --- a/libcpp/include/line-map.h
> +++ b/libcpp/include/line-map.h

[...snip...]

> @@ -662,6 +716,12 @@ ORDINARY_MAP_IN_SYSTEM_HEADER_P (const line_map_ordinary *ord_map)
>    return ord_map->sysp;
>  }
>  
> +/* TRUE if this line map contains generated data.  */
> +inline bool ORDINARY_MAP_GENERATED_DATA_P (const line_map_ordinary *ord_map)
> +{
> +  return ord_map->reason == LC_GEN;
> +}
> +
>  /* TRUE if this line map is for a module (not a source file).  */
>  
>  inline bool
> @@ -671,14 +731,46 @@ MAP_MODULE_P (const line_map *map)
>           && linemap_check_ordinary (map)->reason == LC_MODULE);
>  }
>  
> -/* Get the filename of ordinary map MAP.  */
> +/* Get the data contents of ordinary map MAP.  */
>  
>  inline const char *
>  ORDINARY_MAP_FILE_NAME (const line_map_ordinary *ord_map)
>  {
> -  return ord_map->to_file;
> +  linemap_assert (ord_map->reason != LC_GEN);
> +  return ord_map->src.file;
> +}
> +
> +inline const char *
> +ORDINARY_MAP_GENERATED_DATA (const line_map_ordinary *ord_map)
> +{
> +  linemap_assert (ord_map->reason == LC_GEN);
> +  return ord_map->src.data->data;
> +}
> +
> +inline unsigned int
> +ORDINARY_MAP_GENERATED_DATA_LEN (const line_map_ordinary *ord_map)
> +{
> +  linemap_assert (ord_map->reason == LC_GEN);
> +  return ord_map->src.data->len;
> +}
> +
> +inline source_id ORDINARY_MAP_SOURCE_ID (const line_map_ordinary *ord_map)
> +{
> +  if (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
> +    return source_id {ord_map->src.data->data, ord_map->src.data->len};
> +  return source_id {ord_map->src.file};
> +}
> +
> +/* If we just want to know whether two maps point to the same
> +   file/buffer or not.  */
> +inline bool
> +ORDINARY_MAPS_SAME_FILE_P (const line_map_ordinary *map1,
> +                          const line_map_ordinary *map2)
> +{
> +  return ORDINARY_MAP_SOURCE_ID (map1) == ORDINARY_MAP_SOURCE_ID (map2);
>  }
> 
> 

There are lots of existing BLOCK_CAPS inline functions in line-map.h
due to them originally being macros, but could the new ones above be
member functions of line_map_ordinary?

e.g.

inline const char *
line_map_ordinary::get_generated_data () const
{
  linemap_assert (reason == LC_GEN);
  return src.data->data;
}

Then again, the patch is matching the existing style, so you could save
this for a followup if you like.

> @@ -1093,21 +1185,28 @@ extern location_t linemap_line_start
>  extern line_map *line_map_new_raw (line_maps *, bool, unsigned);
>  
>  /* Add a mapping of logical source line to physical source file and
> -   line number. This function creates an "ordinary map", which is a
> +   line number.  This function creates an "ordinary map", which is a
>     map that records locations of tokens that are not part of macro
>     replacement-lists present at a macro expansion point.
>  
> -   The text pointed to by TO_FILE must have a lifetime
> -   at least as long as the lifetime of SET.  An empty
> -   TO_FILE means standard input.  If reason is LC_LEAVE, and
> -   TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their
> -   natural values considering the file we are returning to.
> +   The text pointed to by FILENAME_OR_BUFFER must have a lifetime at least as
> +   long as the lifetime of SET.  If reason is LC_LEAVE, and FILENAME_OR_BUFFER
> +   is NULL, then FILENAME_OR_BUFFER, TO_LINE and SYSP are given their natural
> +   values considering the file we are returning to.  If reason is LC_GEN, then
> +   FILENAME_OR_BUFFER is the actual content, and DATA_LEN>0 is the length of it.
> +   Otherwise FILENAME_OR_BUFFER is a file name and DATA_LEN is ignored.
> +
> +   If reason is LC_RENAME, and the map being renamed from is an LC_GEN map,
> +   then FILENAME_OR_BUFFER may be NULL and will be copied from the source
> +   map.
> +
> +   A call to this function can relocate the previous set of maps, so any stored
> +   line_map pointers should not be used.  */
>  
> -   A call to this function can relocate the previous set of
> -   maps, so any stored line_map pointers should not be used.  */
>  extern const line_map *linemap_add
>    (class line_maps *, enum lc_reason, unsigned int sysp,
> -   const char *to_file, linenum_type to_line);
> +   const char *filename_or_buffer, linenum_type to_line,
> +   unsigned int data_len = 0);

I haven't looked at the rest of the patches yet, but could the params
  const char *filename_or_buffer
and
  unsigned int data_len = 0 

be replaced by:
  source_id src

and, if so, does it simplify things?  Or do the various LC_* cases
complicate things?

>  
>  /* Create a macro map.  A macro map encodes source locations of tokens
>     that are part of a macro replacement-list, at a macro expansion
> @@ -1257,7 +1356,7 @@ linemap_position_for_loc_and_offset (class line_maps *set,
>  inline const char *
>  LINEMAP_FILE (const line_map_ordinary *ord_map)
>  {
> -  return ord_map->to_file;
> +  return ORDINARY_MAP_FILE_NAME (ord_map);
>  }

Presumably this adds the precondition that ORD_MAP isn't an LC_GEN map,
so please update the leading comment.

>  
>  /* Return the line number this map started encoding location from.  */

[...snip...]

> diff --git a/libcpp/line-map.cc b/libcpp/line-map.cc
> index e0f82e20571..e63916054e0 100644
> --- a/libcpp/line-map.cc
> +++ b/libcpp/line-map.cc
> @@ -48,6 +48,31 @@ static location_t linemap_macro_loc_to_exp_point (line_maps *,
>  extern unsigned num_expanded_macros_counter;
>  extern unsigned num_macro_tokens_counter;
>  
> +bool
> +source_id::operator== (source_id src) const
> +{
> +  return m_len == src.m_len
> +    && (is_buffer () || !m_filename_or_buffer || !src.m_filename_or_buffer
> +       ? m_filename_or_buffer == src.m_filename_or_buffer
> +       : !filename_cmp (m_filename_or_buffer, src.m_filename_or_buffer));
> +}

This function could really use a leading comment, and I'd much prefer
it if you converted to if statements rather than one big expression.

Am I right in thinking that for filenames, we use libiberty's
filename_cmp (which compares the contents of the buffers), whereas for
buffers we use pointer equality, and we assume that every buffer ptr is
different from every filename's ptr?

As noted above, do we need a fast path for pointer equality before
calling filename_cmp? 


> +
> +/* For a normal ordinary map, this is the same as ORDINARY_MAP_FILE_NAME;
> +   but for an LC_GEN map, it returns the file name from which the data
> +   originated, instead of asserting.  */
> +const char *
> +ORDINARY_MAP_CONTAINING_FILE_NAME (line_maps *set,
> +                                  const line_map_ordinary *ord_map)
> +{
> +  while (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
> +    {
> +      ord_map = linemap_included_from_linemap (set, ord_map);
> +      if (!ord_map)
> +       return "-";

How does the above early return happen?  Is it the "read from stdin"
case?

> +    }
> +  return ORDINARY_MAP_FILE_NAME (ord_map);
> +}
> +
>  /* Destructor for class line_maps.
>     Ensure non-GC-managed memory is released.  */
>  

[...snip...]

> @@ -505,21 +531,28 @@ LAST_SOURCE_LINE_LOCATION (const line_map_ordinary *map)
>  }
>  
>  /* Add a mapping of logical source line to physical source file and
> -   line number.
> +   line number.  This function creates an "ordinary map", which is a
> +   map that records locations of tokens that are not part of macro
> +   replacement-lists present at a macro expansion point.
> +
> +   The text pointed to by FILENAME_OR_BUFFER must have a lifetime at least as
> +   long as the lifetime of SET.  If reason is LC_LEAVE, and FILENAME_OR_BUFFER
> +   is NULL, then FILENAME_OR_BUFFER, TO_LINE and SYSP are given their natural
> +   values considering the file we are returning to.  If reason is LC_GEN, then
> +   FILENAME_OR_BUFFER is the actual content, and DATA_LEN>0 is the length of it.
> +   Otherwise FILENAME_OR_BUFFER is a file name and DATA_LEN is ignored.
>  
> -   The text pointed to by TO_FILE must have a lifetime
> -   at least as long as the final call to lookup_line ().  An empty
> -   TO_FILE means standard input.  If reason is LC_LEAVE, and
> -   TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their
> -   natural values considering the file we are returning to.
> +   If reason is LC_RENAME, and the map being renamed from is an LC_GEN map,
> +   then FILENAME_OR_BUFFER may be NULL and will be copied from the source
> +   map.
>  
> -   FROM_LINE should be monotonic increasing across calls to this
> -   function.  A call to this function can relocate the previous set of
> -   maps, so any stored line_map pointers should not be used.  */
> +   A call to this function can relocate the previous set of maps, so any stored
> +   line_map pointers should not be used.  */
>  
>  const struct line_map *
>  linemap_add (line_maps *set, enum lc_reason reason,
> -            unsigned int sysp, const char *to_file, linenum_type to_line)
> +            unsigned int sysp, const char *filename_or_buffer,
> +            linenum_type to_line, unsigned int data_len)

As noted above, would passing in a source_id make this simpler? 
Looking at the logic below, possibly not...

>  {
>    /* Generate a start_location above the current highest_location.
>       If possible, make the low range bits be zero.  */
> @@ -536,12 +569,24 @@ linemap_add (line_maps *set, enum lc_reason reason,
>  
>    /* When we enter the file for the first time reason cannot be
>       LC_RENAME.  */
> -  linemap_assert (!(set->depth == 0 && reason == LC_RENAME));
> +  line_map_data *data_to_reuse = nullptr;
> +  bool is_data_map = (reason == LC_GEN);
> +  if (reason == LC_RENAME || reason == LC_RENAME_VERBATIM)
> +    {
> +      linemap_assert (set->depth != 0);
> +      const auto prev = LINEMAPS_LAST_ORDINARY_MAP (set);
> +      linemap_assert (prev);
> +      if (prev->reason == LC_GEN)
> +       {
> +         data_to_reuse = prev->src.data;
> +         is_data_map = true;
> +       }
> +    }
>  
>    /* If we are leaving the main file, return a NULL map.  */
>    if (reason == LC_LEAVE
>        && MAIN_FILE_P (LINEMAPS_LAST_ORDINARY_MAP (set))
> -      && to_file == NULL)
> +      && filename_or_buffer == NULL)
>      {
>        set->depth--;
>        return NULL;
> @@ -557,8 +602,9 @@ linemap_add (line_maps *set, enum lc_reason reason,
>      = linemap_check_ordinary (new_linemap (set, start_location));
>    map->reason = reason;
>  
> -  if (to_file && *to_file == '\0' && reason != LC_RENAME_VERBATIM)
> -    to_file = "<stdin>";
> +  if (filename_or_buffer && *filename_or_buffer == '\0'
> +      && reason != LC_RENAME_VERBATIM && !is_data_map)
> +    filename_or_buffer = "<stdin>";
>  
>    if (reason == LC_RENAME_VERBATIM)
>      reason = LC_RENAME;
> @@ -577,21 +623,50 @@ linemap_add (line_maps *set, enum lc_reason reason,
>          that comes right before MAP in the same file.  */
>        from = linemap_included_from_linemap (set, map - 1);
>  
> -      /* A TO_FILE of NULL is special - we use the natural values.  */
> -      if (to_file == NULL)
> +      /* Not currently supporting a #include originating from an LC_GEN
> +        map, since there is no clear use case for this and it would complicate
> +        the logic here.  */
> +      linemap_assert (!ORDINARY_MAP_GENERATED_DATA_P (from));
> +
> +      /* A null FILENAME_OR_BUFFER is special - we use the natural
> +        values.  */
> +      if (!filename_or_buffer)
>         {
> -         to_file = ORDINARY_MAP_FILE_NAME (from);
> +         filename_or_buffer = from->src.file;
>           to_line = SOURCE_LINE (from, from[1].start_location);
>           sysp = ORDINARY_MAP_IN_SYSTEM_HEADER_P (from);
>         }
>        else
>         linemap_assert (filename_cmp (ORDINARY_MAP_FILE_NAME (from),
> -                                     to_file) == 0);
> +                                     filename_or_buffer) == 0);
>      }
>  
>    map->sysp = sysp;
> -  map->to_file = to_file;
>    map->to_line = to_line;
> +
> +  if (is_data_map)
> +    {
> +      /* All data maps should have reason == LC_GEN, even if they were
> +        an LC_RENAME, to keep it simple to check which maps contain
> +        data.  */
> +      map->reason = LC_GEN;
> +
> +      if (data_to_reuse)
> +       map->src.data = data_to_reuse;
> +      else
> +       {
> +         auto src_data
> +           = (line_map_data *)set->reallocator (nullptr,
> +                                                sizeof (line_map_data));
> +         src_data->data = filename_or_buffer;
> +         src_data->len = data_len;
> +         gcc_assert (data_len);
> +         map->src.data = src_data;
> +       }
> +    }
> +  else
> +    map->src.file = filename_or_buffer;
> +
>    LINEMAPS_ORDINARY_CACHE (set) = LINEMAPS_ORDINARY_USED (set) - 1;
>    /* Do not store range_bits here.  That's readjusted in
>       linemap_line_start.  */
> @@ -606,7 +681,7 @@ linemap_add (line_maps *set, enum lc_reason reason,
>       pure_location_p.  */
>    linemap_assert (pure_location_p (set, start_location));
>  
> -  if (reason == LC_ENTER)
> +  if (reason == LC_ENTER || reason == LC_GEN)
>      {
>        if (set->depth == 0)
>         map->included_from = 0;
> @@ -617,7 +692,7 @@ linemap_add (line_maps *set, enum lc_reason reason,
>               & ~((1 << map[-1].m_column_and_range_bits) - 1))
>              + map[-1].start_location);
>        set->depth++;
> -      if (set->trace_includes)
> +      if (set->trace_includes && reason == LC_ENTER)
>         trace_include (set, map);
>      }
>    else if (reason == LC_RENAME)
> @@ -859,12 +934,16 @@ linemap_line_start (line_maps *set, linenum_type to_line,
>               >= (((uint64_t) 1)
>                   << (CHAR_BIT * sizeof (linenum_type) - column_bits)))
>           || range_bits < map->m_range_bits)
> -       map = linemap_check_ordinary
> -               (const_cast <line_map *>
> -                 (linemap_add (set, LC_RENAME,
> -                               ORDINARY_MAP_IN_SYSTEM_HEADER_P (map),
> -                               ORDINARY_MAP_FILE_NAME (map),
> -                               to_line)));
> +       {
> +         const auto maybe_filename = ORDINARY_MAP_GENERATED_DATA_P (map)
> +           ? nullptr : map->src.file;
> +         map = linemap_check_ordinary
> +           (const_cast <line_map *>
> +            (linemap_add (set, LC_RENAME,
> +                          ORDINARY_MAP_IN_SYSTEM_HEADER_P (map),
> +                          maybe_filename,
> +                          to_line)));
> +       }
>        map->m_column_and_range_bits = column_bits;
>        map->m_range_bits = range_bits;
>        r = (MAP_START_LOCATION (map)

[...snip...]


Thanks again for the patch; hope this is constructive.
Dave


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 2/8] libcpp: diagnostics: Support generated data in expanded locations
  2023-08-09 22:14         ` [PATCH v4 2/8] libcpp: diagnostics: Support generated data in expanded locations Lewis Hyatt
@ 2023-08-11 23:02           ` David Malcolm
  2023-08-14 21:41             ` Lewis Hyatt
  0 siblings, 1 reply; 36+ messages in thread
From: David Malcolm @ 2023-08-11 23:02 UTC (permalink / raw)
  To: Lewis Hyatt, gcc-patches

On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> The previous patch in this series introduced the concept of LC_GEN line
> maps. This patch continues on the path to using them to improve _Pragma
> diagnostics, by adding a new source_id SRC member to struct
> expanded_location, which is populated by linemap_expand_location. This
> member allows call sites to detect and handle when a location refers to
> generated data rather than a plain file name.
> 
> The previous FILE member of expanded_location is preserved (although
> redundant with SRC), so that call sites which do not and never will care
> about generated data do not need to be concerned about it. Call sites that
> will care are modified here, to use SRC rather than FILE for comparing
> locations.

Thanks; this seems like a good approach.


[...snip...]

> diff --git a/gcc/edit-context.cc b/gcc/edit-context.cc
> index 6f5bc6b9d8f..15052aec417 100644
> --- a/gcc/edit-context.cc
> +++ b/gcc/edit-context.cc
> @@ -295,7 +295,7 @@ edit_context::apply_fixit (const fixit_hint *hint)
>  {
>    expanded_location start = expand_location (hint->get_start_loc ());
>    expanded_location next_loc = expand_location (hint->get_next_loc ());
> -  if (start.file != next_loc.file)
> +  if (start.src != next_loc.src || start.src.is_buffer ())
>      return false;
>    if (start.line != next_loc.line)
>      return false;

Thinking about fix-it hints, it makes sense to reject attempts to
create fix-it hints within generated strings, as we can't apply them or
visualize them.

Does anywhere in the patch kit do that?  Either of 
  rich_location::maybe_add_fixit
or
  rich_location::reject_impossible_fixit
would be good places to do that.


[...snip...]

> diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
> index e59123b18c5..76617fe6129 100644
> --- a/libcpp/include/line-map.h
> +++ b/libcpp/include/line-map.h
> @@ -1410,18 +1410,22 @@ linemap_location_before_p (class line_maps *set,
>  
>  typedef struct
>  {
> -  /* The name of the source file involved.  */
> -  const char *file;
> +  /* The file name of the location involved, or NULL if the location
> +     is not in an external file.  */
> +  const char *file = nullptr;
>  
> -  /* The line-location in the source file.  */
> -  int line;
> -
> -  int column;
> +  /* A source_id recording the file name and/or the in-memory content,
> +     as appropriate.  Users that need to handle in-memory content need
> +     to use this rather than FILE.  */
> +  source_id src;
>  
> -  void *data;
> +  /* The line-location in the source file.  */
> +  int line = 0;
> +  int column = 0;
> +  void *data = nullptr;
>  
> -  /* In a system header?. */
> -  bool sysp;
> +  /* In a system header?  */
> +  bool sysp = false;
>  } expanded_location;

I don't know if we've been using default member initialization yet, but
apparently it's C++11, and thus OK.
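
For reference, a minimal sketch of what default member initializers give a
struct like this one.  The type below is a hypothetical cut-down analogue of
expanded_location, not the real definition, and the helper function is
invented purely for illustration:

```cpp
#include <cassert>

// Hypothetical analogue of expanded_location (illustration only).
struct expanded_location_sketch
{
  const char *file = nullptr;  // default member initializers, C++11
  int line = 0;
  int column = 0;
  bool sysp = false;
};

// Hypothetical helper: every field is well defined even though no
// initializer is written at the point of declaration.
inline expanded_location_sketch
make_unknown_location ()
{
  expanded_location_sketch loc;
  return loc;
}
```

One caveat worth keeping in mind: under C++11 rules a class with default
member initializers is not an aggregate (that restriction was lifted in
C++14), so any existing brace-initialization of the struct would need
checking if C++11 is the baseline.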

[...snip...]


This patch looks good to me, but obviously it has dependencies on the
rest of the kit.

Dave


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 1/8] libcpp: Add LC_GEN linemaps to support in-memory buffers
  2023-08-11 22:45           ` David Malcolm
@ 2023-08-13 20:18             ` Lewis Hyatt
  0 siblings, 0 replies; 36+ messages in thread
From: Lewis Hyatt @ 2023-08-13 20:18 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 26228 bytes --]

On Fri, Aug 11, 2023 at 06:45:31PM -0400, David Malcolm wrote:
> On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> 
> Hi Lewis, thanks for the patch...
> 
> > Add a new linemap reason LC_GEN which enables encoding the location of data
> > that was generated during compilation and does not appear in any source file.
> > There could be many use cases, such as, for instance, referring to the content
> > of builtin macros (not yet implemented, but an easy lift after this one.) The
> > first intended application is to create a place to store the input to a
> > _Pragma directive, so that proper locations can be assigned to those
> > tokens. This will be done in a subsequent commit.
> > 
> > The TO_FILE member of struct line_map_ordinary has been changed to a union
> > named SRC which can be either a file name, or a pointer to a line_map_data
> > struct describing the data. There is no space overhead added to the line
> > maps data structures.
> > 
> > Outside libcpp, this patch includes only the minimal changes implied by the
> > adjustment from TO_FILE to SRC in struct line_map_ordinary. Subsequent
> > patches will implement the new functionality.
> > 
> > libcpp/ChangeLog:
> > 
> >         * include/line-map.h (enum lc_reason): Add LC_GEN.
> >         (struct line_map_data): New struct.
> >         (struct line_map_ordinary): Change TO_FILE from a char* to a union,
> >         and rename to SRC.
> >         (class source_id): New class.
> >         (ORDINARY_MAP_GENERATED_DATA_P): New function.
> >         (ORDINARY_MAP_GENERATED_DATA): New function.
> >         (ORDINARY_MAP_GENERATED_DATA_LEN): New function.
> >         (ORDINARY_MAP_SOURCE_ID): New function.
> >         (ORDINARY_MAPS_SAME_FILE_P): New function.
> >         (ORDINARY_MAP_CONTAINING_FILE_NAME): Declare.
> >         (LINEMAP_FILE): Adapt to struct line_map_ordinary change.
> >         (linemap_get_file_highest_location): Likewise.
> >         * line-map.cc (source_id::operator==): New function.
> >         (ORDINARY_MAP_CONTAINING_FILE_NAME): New function.
> >         (linemap_add): Support creating LC_GEN maps.
> >         (linemap_line_start): Support LC_GEN maps.
> >         (linemap_check_files_exited): Likewise.
> >         (linemap_position_for_loc_and_offset): Likewise.
> >         (linemap_get_expansion_filename): Likewise.
> >         (linemap_dump): Likewise.
> >         (linemap_dump_location): Likewise.
> >         (linemap_get_file_highest_location): Likewise.
> >         * directives.cc (_cpp_do_file_change): Likewise.
> > 
> > gcc/c-family/ChangeLog:
> > 
> >         * c-common.cc (try_to_locate_new_include_insertion_point): Recognize
> >         and ignore LC_GEN maps.
> > 
> > gcc/cp/ChangeLog:
> > 
> >         * module.cc (module_state::write_ordinary_maps): Recognize and
> >         ignore LC_GEN maps, and adapt to interface change in struct
> >         line_map_ordinary.
> >         (module_state::read_ordinary_maps): Likewise.
> > 
> > gcc/ChangeLog:
> > 
> >         * diagnostic-show-locus.cc (compatible_locations_p): Adapt to
> >         interface change in struct line_map_ordinary.
> >         * input.cc (special_fname_generated): New function.
> >         (dump_location_info): Support LC_GEN maps.
> >         (get_substring_ranges_for_loc): Adapt to interface change in struct
> >         line_map_ordinary.
> >         * input.h (special_fname_generated): Declare.
> > 
> > gcc/go/ChangeLog:
> > 
> >         * go-linemap.cc (Gcc_linemap::to_string): Recognize and ignore
> >         LC_GEN maps.
> > ---
> >  gcc/c-family/c-common.cc     |  11 ++-
> >  gcc/cp/module.cc             |   8 +-
> >  gcc/diagnostic-show-locus.cc |   2 +-
> >  gcc/go/go-linemap.cc         |   3 +-
> >  gcc/input.cc                 |  27 +++++-
> >  gcc/input.h                  |   1 +
> >  libcpp/directives.cc         |   4 +-
> >  libcpp/include/line-map.h    | 144 ++++++++++++++++++++++++----
> >  libcpp/line-map.cc           | 181 +++++++++++++++++++++++++----------
> >  9 files changed, 299 insertions(+), 82 deletions(-)
> 
> [...snip...]
> 
> > 
> > diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
> > index 0514815b51f..a2aa6b4e0b5 100644
> > --- a/gcc/diagnostic-show-locus.cc
> > +++ b/gcc/diagnostic-show-locus.cc
> > @@ -998,7 +998,7 @@ compatible_locations_p (location_t loc_a, location_t loc_b)
> >          are in the same file.  */
> >        const line_map_ordinary *ord_map_a = linemap_check_ordinary (map_a);
> >        const line_map_ordinary *ord_map_b = linemap_check_ordinary (map_b);
> > -      return ord_map_a->to_file == ord_map_b->to_file;
> > +      return ORDINARY_MAPS_SAME_FILE_P (ord_map_a, ord_map_b);
> 
> My first thought here was: are buffers supported here, or does it have
> to be a file?
> 
> It turns out that ORDINARY_MAPS_SAME_FILE_P works on both files and
> buffers.
> 
> This suggests that it would be better named as
> ORDINARY_MAPS_SAME_SOURCE_ID_P, but note the comment below, could this
> be:
> 
>            return ord_map_a->same_source_id_p (ord_map_b);
> 
> ?
> 
> [...snip...]
>

I think it's better not to switch to member functions just yet; I'll reply
to your comment below.  In the meantime I did rename this to
ORDINARY_MAPS_SAME_SOURCE_P.

> > diff --git a/gcc/input.cc b/gcc/input.cc
> > index eaf301ec7c1..c1735215b29 100644
> > --- a/gcc/input.cc
> > +++ b/gcc/input.cc
> 
> [...snip...]
> 
> > @@ -1814,11 +1835,11 @@ get_substring_ranges_for_loc (cpp_reader *pfile,
> >        /* Bulletproofing.  We ought to only have different ordinary maps
> >          for start vs finish due to line-length jumps.  */
> >        if (start_ord_map != final_ord_map
> > -         && start_ord_map->to_file != final_ord_map->to_file)
> > +         && !ORDINARY_MAPS_SAME_FILE_P (start_ord_map, final_ord_map))
> 
> For the common case of comparing a pair of ordinary maps that have
> filenames, this hunk is replacing pointer comparison with filename_cmp,
> which ultimately does something like strcmp.  Should
> ORDINARY_MAPS_SAME_FILE_P have a fast-path for pointer equality?
>

Yes, I think it should; added.  It seems that in most places currently we
call filename_cmp, but in some places we assume pointer equality is a
sufficient check.  I am pretty sure that code which *only* does pointer
equality for file names can be wrong in edge cases involving PCH; I
think the PCH process can produce multiple pointers to the same file name
under some circumstances.  (One, although atypical, would be if the same file
were included again after being brought in as a PCH.)  So it's an improvement
to add filename_cmp (), but it's even better to check pointer equality first.

> 
> [...snip...]
> 
> > diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
> > index 44fea0ea08e..e59123b18c5 100644
> > --- a/libcpp/include/line-map.h
> > +++ b/libcpp/include/line-map.h
> 
> [...snip...]
> 
> > @@ -662,6 +716,12 @@ ORDINARY_MAP_IN_SYSTEM_HEADER_P (const line_map_ordinary *ord_map)
> >    return ord_map->sysp;
> >  }
> >  
> > +/* TRUE if this line map contains generated data.  */
> > +inline bool ORDINARY_MAP_GENERATED_DATA_P (const line_map_ordinary *ord_map)
> > +{
> > +  return ord_map->reason == LC_GEN;
> > +}
> > +
> >  /* TRUE if this line map is for a module (not a source file).  */
> >  
> >  inline bool
> > @@ -671,14 +731,46 @@ MAP_MODULE_P (const line_map *map)
> >           && linemap_check_ordinary (map)->reason == LC_MODULE);
> >  }
> >  
> > -/* Get the filename of ordinary map MAP.  */
> > +/* Get the data contents of ordinary map MAP.  */
> >  
> >  inline const char *
> >  ORDINARY_MAP_FILE_NAME (const line_map_ordinary *ord_map)
> >  {
> > -  return ord_map->to_file;
> > +  linemap_assert (ord_map->reason != LC_GEN);
> > +  return ord_map->src.file;
> > +}
> > +
> > +inline const char *
> > +ORDINARY_MAP_GENERATED_DATA (const line_map_ordinary *ord_map)
> > +{
> > +  linemap_assert (ord_map->reason == LC_GEN);
> > +  return ord_map->src.data->data;
> > +}
> > +
> > +inline unsigned int
> > +ORDINARY_MAP_GENERATED_DATA_LEN (const line_map_ordinary *ord_map)
> > +{
> > +  linemap_assert (ord_map->reason == LC_GEN);
> > +  return ord_map->src.data->len;
> > +}
> > +
> > +inline source_id ORDINARY_MAP_SOURCE_ID (const line_map_ordinary *ord_map)
> > +{
> > +  if (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
> > +    return source_id {ord_map->src.data->data, ord_map->src.data->len};
> > +  return source_id {ord_map->src.file};
> > +}
> > +
> > +/* If we just want to know whether two maps point to the same
> > +   file/buffer or not.  */
> > +inline bool
> > +ORDINARY_MAPS_SAME_FILE_P (const line_map_ordinary *map1,
> > +                          const line_map_ordinary *map2)
> > +{
> > +  return ORDINARY_MAP_SOURCE_ID (map1) == ORDINARY_MAP_SOURCE_ID (map2);
> >  }
> > 
> > 
> 
> There are lots of existing BLOCK_CAPS inline functions in line-map.h
> due to them originally being macros, but could the new ones above be
> member functions of line_map_ordinary?
> 
> e.g.
> 
> inline const char *
> line_map_ordinary::get_generated_data () const
> {
>   linemap_assert (reason == LC_GEN);
>   return src.data->data;
> }
> 
> Then again, the patch is matching the existing style, so you could save
> this for a followup if you like.
>

Definitely agree that member functions would be a nicer interface.  One thing
to keep in mind is that the linemaps are allocated by ggc so that they can
work with PCH, and so they need to be POD types (I guess "standard layout
types" in newer C++ standards).  So they can never really be a full class,
and taking too many steps in that direction could be error prone (e.g.,
it's OK to add a member function, but not a constructor, etc.).  On the
other hand, it does already inherit from a base class with non-static
members, which I think makes it technically not a standard layout type as
is.  Revisiting this in a followup makes sense to me.
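
To make the distinction above concrete, here is a small sketch using the
standard type traits.  All of the types are hypothetical stand-ins rather
than the real line map structs, and which of these properties ggc and PCH
actually depend on is an assumption not settled in this thread:

```cpp
#include <type_traits>

// Hypothetical stand-in for a base class with a non-static member.
struct base_map_sketch
{
  unsigned int start_location;
};

// Plain data plus a non-virtual member function: still trivially
// default-constructible and standard-layout.
struct plain_with_method
{
  int reason;
  const char *file;
  bool is_gen () const { return reason == 1; }
};

// A user-provided constructor costs trivial default construction.
struct with_ctor
{
  int reason;
  with_ctor () : reason (0) {}
};

// Non-static data members in both base and derived class: no longer
// standard-layout, although still trivially copyable.
struct derived_map_sketch : base_map_sketch
{
  int reason;
};

static_assert (std::is_standard_layout<plain_with_method>::value,
	       "a member function does not affect layout");
static_assert (std::is_trivially_default_constructible<plain_with_method>::value,
	       "a member function does not affect triviality");
static_assert (!std::is_trivially_default_constructible<with_ctor>::value,
	       "a user-provided constructor is not trivial");
static_assert (!std::is_standard_layout<derived_map_sketch>::value,
	       "data members split across base and derived");
static_assert (std::is_trivially_copyable<derived_map_sketch>::value,
	       "inheritance alone keeps it trivially copyable");
```

The last two assertions correspond to the point about the existing base
class: inheriting from a base with data members already gives up standard
layout, while bitwise copying remains well defined.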

> > @@ -1093,21 +1185,28 @@ extern location_t linemap_line_start
> >  extern line_map *line_map_new_raw (line_maps *, bool, unsigned);
> >  
> >  /* Add a mapping of logical source line to physical source file and
> > -   line number. This function creates an "ordinary map", which is a
> > +   line number.  This function creates an "ordinary map", which is a
> >     map that records locations of tokens that are not part of macro
> >     replacement-lists present at a macro expansion point.
> >  
> > -   The text pointed to by TO_FILE must have a lifetime
> > -   at least as long as the lifetime of SET.  An empty
> > -   TO_FILE means standard input.  If reason is LC_LEAVE, and
> > -   TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their
> > -   natural values considering the file we are returning to.
> > +   The text pointed to by FILENAME_OR_BUFFER must have a lifetime at least as
> > +   long as the lifetime of SET.  If reason is LC_LEAVE, and FILENAME_OR_BUFFER
> > +   is NULL, then FILENAME_OR_BUFFER, TO_LINE and SYSP are given their natural
> > +   values considering the file we are returning to.  If reason is LC_GEN, then
> > +   FILENAME_OR_BUFFER is the actual content, and DATA_LEN>0 is the length of it.
> > +   Otherwise FILENAME_OR_BUFFER is a file name and DATA_LEN is ignored.
> > +
> > +   If reason is LC_RENAME, and the map being renamed from is an LC_GEN map,
> > +   then FILENAME_OR_BUFFER may be NULL and will be copied from the source
> > +   map.
> > +
> > +   A call to this function can relocate the previous set of maps, so any stored
> > +   line_map pointers should not be used.  */
> >  
> > -   A call to this function can relocate the previous set of
> > -   maps, so any stored line_map pointers should not be used.  */
> >  extern const line_map *linemap_add
> >    (class line_maps *, enum lc_reason, unsigned int sysp,
> > -   const char *to_file, linenum_type to_line);
> > +   const char *filename_or_buffer, linenum_type to_line,
> > +   unsigned int data_len = 0);
> 
> I haven't looked at the rest of the patches yet, but could the params
>   const char *filename_or_buffer
> and
>   unsigned int data_len = 0 
> 
> be replaced by:
>   source_id src
> 
> and, if so, does it simplify things?  Or do the various LC_* cases
> complicate things?
>

Oh yes, I should have done it this way; it's much better.  I didn't like how
the data_len argument needed to be separated from the buffer argument by the
to_line argument.  The fact that source_id is implicitly constructible from a
char* means it "just works" to change the argument to a source_id, without
touching all the call sites in every frontend.
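
A minimal sketch of why the existing call sites keep compiling.  The class
and function here (source_id_demo, add_map_demo, demo_call_sites) are
hypothetical miniatures written for illustration; they are not the real
source_id or linemap_add:

```cpp
#include <cassert>

// Hypothetical miniature of source_id (illustration only).
class source_id_demo
{
public:
  // Deliberately not "explicit": a plain file name converts implicitly.
  source_id_demo (const char *filename)
    : m_ptr (filename), m_len (0) {}

  // In-memory buffer of LEN bytes.
  source_id_demo (const char *buffer, unsigned int len)
    : m_ptr (buffer), m_len (len) {}

  bool is_buffer () const { return m_len != 0; }
  const char *get () const { return m_ptr; }

private:
  const char *m_ptr;
  unsigned int m_len;
};

// Hypothetical analogue of a linemap_add taking a source_id.  Returns
// whether the new map would describe generated data.
inline bool
add_map_demo (source_id_demo src, unsigned int to_line)
{
  (void) to_line;
  return src.is_buffer ();
}

inline void
demo_call_sites ()
{
  static const char pragma_text[] = "GCC diagnostic ignored \"oops";

  // Existing style of call: a string is passed and converted implicitly.
  assert (!add_map_demo ("file.cpp", 2));

  // New style of call: the buffer and its length travel together.
  assert (add_map_demo (source_id_demo (pragma_text,
					sizeof pragma_text - 1), 1));
}
```

The trade-off is the usual one for converting constructors: convenience at
call sites in exchange for conversions that are not visible where they
happen.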

> >  
> >  /* Create a macro map.  A macro map encodes source locations of tokens
> >     that are part of a macro replacement-list, at a macro expansion
> > @@ -1257,7 +1356,7 @@ linemap_position_for_loc_and_offset (class line_maps *set,
> >  inline const char *
> >  LINEMAP_FILE (const line_map_ordinary *ord_map)
> >  {
> > -  return ord_map->to_file;
> > +  return ORDINARY_MAP_FILE_NAME (ord_map);
> >  }
> 
> Presumably this adds the precondition that ORD_MAP isn't an LC_GEN map,
> so please update the leading comment.
>

Done.

> >  
> >  /* Return the line number this map started encoding location from.  */
> 
> [...snip...]
> 
> > diff --git a/libcpp/line-map.cc b/libcpp/line-map.cc
> > index e0f82e20571..e63916054e0 100644
> > --- a/libcpp/line-map.cc
> > +++ b/libcpp/line-map.cc
> > @@ -48,6 +48,31 @@ static location_t linemap_macro_loc_to_exp_point (line_maps *,
> >  extern unsigned num_expanded_macros_counter;
> >  extern unsigned num_macro_tokens_counter;
> >  
> > +bool
> > +source_id::operator== (source_id src) const
> > +{
> > +  return m_len == src.m_len
> > +    && (is_buffer () || !m_filename_or_buffer || !src.m_filename_or_buffer
> > +       ? m_filename_or_buffer == src.m_filename_or_buffer
> > +       : !filename_cmp (m_filename_or_buffer, src.m_filename_or_buffer));
> > +}
> 
> This function could really use a leading comment, and I'd much prefer
> it if you converted to if statements rather than one big expression.
> 
> Am I right in thinking that for filenames, we use libiberty's
> filename_cmp (which compares the contents of the buffers), whereas for
> buffers we use pointer equality, and we assume that every buffer ptr is
> different from every filename's ptr?
> 
> As noted above, do we need a fast path for pointer equality before
> calling filename_cmp? 
>

Added now.
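
For reference, the comparison rules discussed above can be sketched on their
own roughly as follows. This is only an illustration: source_id_sketch and
same_source are hypothetical names, and std::strcmp stands in for libiberty's
filename_cmp so that the example builds without libiberty.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical plain-struct version of source_id.
struct source_id_sketch
{
  const char *ptr;  // file name, or start of an in-memory buffer
  unsigned len;     // 0 for a file name, buffer length otherwise
};

bool
same_source (source_id_sketch a, source_id_sketch b)
{
  // A file name (len == 0) never equals a buffer, and buffers of
  // different lengths are different buffers.
  if (a.len != b.len)
    return false;

  // Fast path: identical pointers.  This also covers two NULL names.
  if (a.ptr == b.ptr)
    return true;

  // Buffers compare by pointer only, and a NULL name matches only NULL.
  if (a.len != 0 || !a.ptr || !b.ptr)
    return false;

  // Two distinct non-NULL file name pointers: compare the contents.
  // (Assumption: strcmp here, filename_cmp in the real code.)
  return std::strcmp (a.ptr, b.ptr) == 0;
}
```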

> 
> > +
> > +/* For a normal ordinary map, this is the same as ORDINARY_MAP_FILE_NAME;
> > +   but for an LC_GEN map, it returns the file name from which the data
> > +   originated, instead of asserting.  */
> > +const char *
> > +ORDINARY_MAP_CONTAINING_FILE_NAME (line_maps *set,
> > +                                  const line_map_ordinary *ord_map)
> > +{
> > +  while (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
> > +    {
> > +      ord_map = linemap_included_from_linemap (set, ord_map);
> > +      if (!ord_map)
> > +       return "-";
> 
> How does the above early return happen?  Is it the "read from stdin"
> case?
>

It can't happen the way that GCC uses line_maps. It could happen if someone
created their own line_maps instance and made the very first map an LC_GEN
map. We do that in some selftests, although we don't actually query
ORDINARY_MAP_CONTAINING_FILE_NAME in any of those that currently exist.
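
A simplified model of that walk, to show where the "-" comes from. The
map_sketch struct and containing_file_name function are hypothetical and only
mimic the shape of the loop; they are not libcpp's line_map_ordinary or
linemap_included_from_linemap.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified map type.
struct map_sketch
{
  bool generated;                   // true for an LC_GEN-style map
  const char *file;                 // file name; unused when generated
  const map_sketch *included_from;  // parent map, or nullptr at the top
};

// Walk up through generated maps until a real file is found.
const char *
containing_file_name (const map_sketch *map)
{
  while (map->generated)
    {
      map = map->included_from;
      // Only reachable if the outermost map is itself generated.
      if (!map)
        return "-";
    }
  return map->file;
}
```

With a generated map nested under a file map, the walk returns the file's
name; with a generated map that has no parent at all, it returns "-".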

> > +    }
> > +  return ORDINARY_MAP_FILE_NAME (ord_map);
> > +}
> > +
> >  /* Destructor for class line_maps.
> >     Ensure non-GC-managed memory is released.  */
> >  
> 
> [...snip...]
> 
> > @@ -505,21 +531,28 @@ LAST_SOURCE_LINE_LOCATION (const line_map_ordinary *map)
> >  }
> >  
> >  /* Add a mapping of logical source line to physical source file and
> > -   line number.
> > +   line number.  This function creates an "ordinary map", which is a
> > +   map that records locations of tokens that are not part of macro
> > +   replacement-lists present at a macro expansion point.
> > +
> > +   The text pointed to by FILENAME_OR_BUFFER must have a lifetime at least as
> > +   long as the lifetime of SET.  If reason is LC_LEAVE, and FILENAME_OR_BUFFER
> > +   is NULL, then FILENAME_OR_BUFFER, TO_LINE and SYSP are given their natural
> > +   values considering the file we are returning to.  If reason is LC_GEN, then
> > +   FILENAME_OR_BUFFER is the actual content, and DATA_LEN>0 is the length of it.
> > +   Otherwise FILENAME_OR_BUFFER is a file name and DATA_LEN is ignored.
> >  
> > -   The text pointed to by TO_FILE must have a lifetime
> > -   at least as long as the final call to lookup_line ().  An empty
> > -   TO_FILE means standard input.  If reason is LC_LEAVE, and
> > -   TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their
> > -   natural values considering the file we are returning to.
> > +   If reason is LC_RENAME, and the map being renamed from is an LC_GEN map,
> > +   then FILENAME_OR_BUFFER may be NULL and will be copied from the source
> > +   map.
> >  
> > -   FROM_LINE should be monotonic increasing across calls to this
> > -   function.  A call to this function can relocate the previous set of
> > -   maps, so any stored line_map pointers should not be used.  */
> > +   A call to this function can relocate the previous set of maps, so any stored
> > +   line_map pointers should not be used.  */
> >  
> >  const struct line_map *
> >  linemap_add (line_maps *set, enum lc_reason reason,
> > -            unsigned int sysp, const char *to_file, linenum_type to_line)
> > +            unsigned int sysp, const char *filename_or_buffer,
> > +            linenum_type to_line, unsigned int data_len)
> 
> As noted above, would passing in a source_id make this simpler? 
> Looking at the logic below, possibly not...
>

Yes, it seems to me that it is better with this change here.

> >  {
> >    /* Generate a start_location above the current highest_location.
> >       If possible, make the low range bits be zero.  */
> > @@ -536,12 +569,24 @@ linemap_add (line_maps *set, enum lc_reason reason,
> >  
> >    /* When we enter the file for the first time reason cannot be
> >       LC_RENAME.  */
> > -  linemap_assert (!(set->depth == 0 && reason == LC_RENAME));
> > +  line_map_data *data_to_reuse = nullptr;
> > +  bool is_data_map = (reason == LC_GEN);
> > +  if (reason == LC_RENAME || reason == LC_RENAME_VERBATIM)
> > +    {
> > +      linemap_assert (set->depth != 0);
> > +      const auto prev = LINEMAPS_LAST_ORDINARY_MAP (set);
> > +      linemap_assert (prev);
> > +      if (prev->reason == LC_GEN)
> > +       {
> > +         data_to_reuse = prev->src.data;
> > +         is_data_map = true;
> > +       }
> > +    }
> >  
> >    /* If we are leaving the main file, return a NULL map.  */
> >    if (reason == LC_LEAVE
> >        && MAIN_FILE_P (LINEMAPS_LAST_ORDINARY_MAP (set))
> > -      && to_file == NULL)
> > +      && filename_or_buffer == NULL)
> >      {
> >        set->depth--;
> >        return NULL;
> > @@ -557,8 +602,9 @@ linemap_add (line_maps *set, enum lc_reason reason,
> >      = linemap_check_ordinary (new_linemap (set, start_location));
> >    map->reason = reason;
> >  
> > -  if (to_file && *to_file == '\0' && reason != LC_RENAME_VERBATIM)
> > -    to_file = "<stdin>";
> > +  if (filename_or_buffer && *filename_or_buffer == '\0'
> > +      && reason != LC_RENAME_VERBATIM && !is_data_map)
> > +    filename_or_buffer = "<stdin>";
> >  
> >    if (reason == LC_RENAME_VERBATIM)
> >      reason = LC_RENAME;
> > @@ -577,21 +623,50 @@ linemap_add (line_maps *set, enum lc_reason reason,
> >          that comes right before MAP in the same file.  */
> >        from = linemap_included_from_linemap (set, map - 1);
> >  
> > -      /* A TO_FILE of NULL is special - we use the natural values.  */
> > -      if (to_file == NULL)
> > +      /* Not currently supporting a #include originating from an LC_GEN
> > +        map, since there is no clear use case for this and it would complicate
> > +        the logic here.  */
> > +      linemap_assert (!ORDINARY_MAP_GENERATED_DATA_P (from));
> > +
> > +      /* A null FILENAME_OR_BUFFER is special - we use the natural
> > +        values.  */
> > +      if (!filename_or_buffer)
> >         {
> > -         to_file = ORDINARY_MAP_FILE_NAME (from);
> > +         filename_or_buffer = from->src.file;
> >           to_line = SOURCE_LINE (from, from[1].start_location);
> >           sysp = ORDINARY_MAP_IN_SYSTEM_HEADER_P (from);
> >         }
> >        else
> >         linemap_assert (filename_cmp (ORDINARY_MAP_FILE_NAME (from),
> > -                                     to_file) == 0);
> > +                                     filename_or_buffer) == 0);
> >      }
> >  
> >    map->sysp = sysp;
> > -  map->to_file = to_file;
> >    map->to_line = to_line;
> > +
> > +  if (is_data_map)
> > +    {
> > +      /* All data maps should have reason == LC_GEN, even if they were
> > +        an LC_RENAME, to keep it simple to check which maps contain
> > +        data.  */
> > +      map->reason = LC_GEN;
> > +
> > +      if (data_to_reuse)
> > +       map->src.data = data_to_reuse;
> > +      else
> > +       {
> > +         auto src_data
> > +           = (line_map_data *)set->reallocator (nullptr,
> > +                                                sizeof (line_map_data));
> > +         src_data->data = filename_or_buffer;
> > +         src_data->len = data_len;
> > +         gcc_assert (data_len);
> > +         map->src.data = src_data;
> > +       }
> > +    }
> > +  else
> > +    map->src.file = filename_or_buffer;
> > +
> >    LINEMAPS_ORDINARY_CACHE (set) = LINEMAPS_ORDINARY_USED (set) - 1;
> >    /* Do not store range_bits here.  That's readjusted in
> >       linemap_line_start.  */
> > @@ -606,7 +681,7 @@ linemap_add (line_maps *set, enum lc_reason reason,
> >       pure_location_p.  */
> >    linemap_assert (pure_location_p (set, start_location));
> >  
> > -  if (reason == LC_ENTER)
> > +  if (reason == LC_ENTER || reason == LC_GEN)
> >      {
> >        if (set->depth == 0)
> >         map->included_from = 0;
> > @@ -617,7 +692,7 @@ linemap_add (line_maps *set, enum lc_reason reason,
> >               & ~((1 << map[-1].m_column_and_range_bits) - 1))
> >              + map[-1].start_location);
> >        set->depth++;
> > -      if (set->trace_includes)
> > +      if (set->trace_includes && reason == LC_ENTER)
> >         trace_include (set, map);
> >      }
> >    else if (reason == LC_RENAME)
> > @@ -859,12 +934,16 @@ linemap_line_start (line_maps *set, linenum_type to_line,
> >               >= (((uint64_t) 1)
> >                   << (CHAR_BIT * sizeof (linenum_type) - column_bits)))
> >           || range_bits < map->m_range_bits)
> > -       map = linemap_check_ordinary
> > -               (const_cast <line_map *>
> > -                 (linemap_add (set, LC_RENAME,
> > -                               ORDINARY_MAP_IN_SYSTEM_HEADER_P (map),
> > -                               ORDINARY_MAP_FILE_NAME (map),
> > -                               to_line)));
> > +       {
> > +         const auto maybe_filename = ORDINARY_MAP_GENERATED_DATA_P (map)
> > +           ? nullptr : map->src.file;
> > +         map = linemap_check_ordinary
> > +           (const_cast <line_map *>
> > +            (linemap_add (set, LC_RENAME,
> > +                          ORDINARY_MAP_IN_SYSTEM_HEADER_P (map),
> > +                          maybe_filename,
> > +                          to_line)));
> > +       }
> >        map->m_column_and_range_bits = column_bits;
> >        map->m_range_bits = range_bits;
> >        r = (MAP_START_LOCATION (map)
> 
> [...snip...]
> 
> 
> Thanks again for the patch; hope this is constructive
> Dave
> 

Thanks for looking at it. I attached a new version of patch 1 that reflects
all this feedback. There are also four trivial changes needed in later patches
to adapt to the new arguments for linemap_add(); for now I am just pasting
them below for reference.

-Lewis

-- >8 --

diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
index bdaa138fb2f..62c60645e88 100644
--- a/gcc/diagnostic-show-locus.cc
+++ b/gcc/diagnostic-show-locus.cc
@@ -4604,7 +4604,7 @@ test_fixit_consolidation (const line_table_case &case_)
 {
   line_table_test ltt (case_);
   if (ltt.m_generated_data)
-    linemap_add (line_table, LC_GEN, false, "some content", 1, 13);
+    linemap_add (line_table, LC_GEN, false, source_id("some content", 13), 1);
   else
     linemap_add (line_table, LC_ENTER, false, "test.c", 1);
 
diff --git a/gcc/input.cc b/gcc/input.cc
index 942e6b6f9c7..4c99df7a205 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -2068,8 +2068,9 @@ temp_source_file::do_linemap_add (int line)
 {
   const line_map *map;
   if (content_buf)
-    map = linemap_add (line_table, LC_GEN, false, content_buf,
-		       line, content_len);
+    map = linemap_add (line_table, LC_GEN, false,
+		       source_id(content_buf, content_len),
+		       line);
   else
     map = linemap_add (line_table, LC_ENTER, false, get_filename (), line);
   return linemap_check_ordinary (map);
@@ -2222,7 +2223,7 @@ test_accessing_ordinary_linemaps (const line_table_case &case_)
 
   /* Build a simple linemap describing some locations. */
   if (ltt.m_generated_data)
-    linemap_add (line_table, LC_GEN, false, "some data", 0, 10);
+    linemap_add (line_table, LC_GEN, false, source_id("some data", 10), 0);
   else
     linemap_add (line_table, LC_ENTER, false, "foo.c", 0);
 
diff --git a/libcpp/directives.cc b/libcpp/directives.cc
index d2d83e6dc83..854a5ea65f0 100644
--- a/libcpp/directives.cc
+++ b/libcpp/directives.cc
@@ -1956,8 +1956,8 @@ destringize_and_run (cpp_reader *pfile, _cpp__Pragma_state *pstate)
   const unsigned int buf_len = dest - result;
   const int sysp = linemap_location_in_system_header_p (pfile->line_table,
 							pstate->pragma_loc);
-  linemap_add (pfile->line_table, LC_GEN, sysp, (const char *)result, 1,
-	       buf_len);
+  linemap_add (pfile->line_table, LC_GEN, sysp,
+	       source_id((const char *)result, buf_len), 1);
   const auto col_hint = (uchar *) memchr (result, '\n', buf_len) - result;
   linemap_line_start (pfile->line_table, 1, col_hint);
 

[-- Attachment #2: Pragma_locs_v5_1_of_8.txt --]
[-- Type: text/plain, Size: 33546 bytes --]

From: Lewis Hyatt <lhyatt@gmail.com>
Date: Thu, 3 Aug 2023 12:44:14 -0400
Subject: [PATCH v5 1/8] libcpp: Add LC_GEN linemaps to support in-memory buffers

Add a new linemap reason LC_GEN which enables encoding the location of data
that was generated during compilation and does not appear in any source file.
There could be many use cases, for instance referring to the content
of builtin macros (not yet implemented, but an easy lift after this one). The
first intended application is to create a place to store the input to a
_Pragma directive, so that proper locations can be assigned to those
tokens. This will be done in a subsequent commit.

The TO_FILE member of struct line_map_ordinary has been changed to a union
named SRC which can be either a file name, or a pointer to a line_map_data
struct describing the data. There is no space overhead added to the line
maps data structures.

Outside libcpp, this patch includes only the minimal changes implied by the
adjustment from TO_FILE to SRC in struct line_map_ordinary. Subsequent
patches will implement the new functionality.

libcpp/ChangeLog:

	* include/line-map.h (enum lc_reason): Add LC_GEN.
	(struct line_map_data): New struct.
	(struct line_map_ordinary): Change TO_FILE from a char* to a union,
	and rename to SRC.
	(class source_id): New class.
	(ORDINARY_MAP_GENERATED_DATA_P): New function.
	(ORDINARY_MAP_GENERATED_DATA): New function.
	(ORDINARY_MAP_GENERATED_DATA_LEN): New function.
	(ORDINARY_MAP_SOURCE_ID): New function.
	(ORDINARY_MAPS_SAME_SOURCE_P): New function.
	(ORDINARY_MAP_CONTAINING_FILE_NAME): Declare.
	(LINEMAP_FILE): Adapt to struct line_map_ordinary change.
	(linemap_get_file_highest_location): Likewise.
	* line-map.cc (source_id::operator==): New function.
	(ORDINARY_MAP_CONTAINING_FILE_NAME): New function.
	(linemap_add): Support creating LC_GEN maps.
	(linemap_line_start): Support LC_GEN maps.
	(linemap_check_files_exited): Likewise.
	(linemap_position_for_loc_and_offset): Likewise.
	(linemap_get_expansion_filename): Likewise.
	(linemap_dump): Likewise.
	(linemap_dump_location): Likewise.
	(linemap_get_file_highest_location): Likewise.
	* directives.cc (_cpp_do_file_change): Likewise.

gcc/c-family/ChangeLog:

	* c-common.cc (try_to_locate_new_include_insertion_point): Recognize
	and ignore LC_GEN maps.

gcc/cp/ChangeLog:

	* module.cc (module_state::write_ordinary_maps): Recognize and
	ignore LC_GEN maps, and adapt to interface change in struct
	line_map_ordinary.
	(module_state::read_ordinary_maps): Likewise.

gcc/ChangeLog:

	* diagnostic-show-locus.cc (compatible_locations_p): Adapt to
	interface change in struct line_map_ordinary.
	* input.cc (special_fname_generated): New function.
	(dump_location_info): Support LC_GEN maps.
	(get_substring_ranges_for_loc): Adapt to interface change in struct
	line_map_ordinary.
	* input.h (special_fname_generated): Declare.

gcc/go/ChangeLog:

	* go-linemap.cc (Gcc_linemap::to_string): Recognize and ignore
	LC_GEN maps.

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 268462f900e..e60dc16937a 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -9208,19 +9208,22 @@ try_to_locate_new_include_insertion_point (const char *file, location_t loc)
       const line_map_ordinary *ord_map
 	= LINEMAPS_ORDINARY_MAP_AT (line_table, i);
 
+      if (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
+	continue;
+
       if (const line_map_ordinary *from
 	  = linemap_included_from_linemap (line_table, ord_map))
 	/* We cannot use pointer equality, because with preprocessed
 	   input all filename strings are unique.  */
-	if (0 == strcmp (from->to_file, file))
+	if (ORDINARY_MAP_SOURCE_ID (from) == file)
 	  {
 	    last_include_ord_map = from;
 	    last_ord_map_after_include = NULL;
 	  }
 
-      /* Likewise, use strcmp, and reject any line-zero introductory
-	 map.  */
-      if (ord_map->to_line && 0 == strcmp (ord_map->to_file, file))
+      /* Likewise, use strcmp (via the source_id comparison), and reject any
+	 line-zero introductory map.  */
+      if (ord_map->to_line && ORDINARY_MAP_SOURCE_ID (ord_map) == file)
 	{
 	  if (!first_ord_map_in_file)
 	    first_ord_map_in_file = ord_map;
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index ea362bdffa4..ff17cd57016 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -16250,6 +16250,8 @@ module_state::write_ordinary_maps (elf_out *to, range_t &info,
        iter != end; ++iter)
     if (iter->src != current)
       {
+	if (ORDINARY_MAP_GENERATED_DATA_P (iter->src))
+	  continue;
 	current = iter->src;
 	const char *fname = ORDINARY_MAP_FILE_NAME (iter->src);
 
@@ -16267,7 +16269,7 @@ module_state::write_ordinary_maps (elf_out *to, range_t &info,
 		   preprocessed input we could have multiple instances
 		   of the same name, and we'd rather not percolate
 		   that.  */
-		const_cast<line_map_ordinary *> (iter->src)->to_file = name;
+		const_cast<line_map_ordinary *> (iter->src)->src.file = name;
 		fname = NULL;
 		break;
 	      }
@@ -16295,6 +16297,8 @@ module_state::write_ordinary_maps (elf_out *to, range_t &info,
   for (auto iter = ord_loc_remap->begin (), end = ord_loc_remap->end ();
        iter != end; ++iter)
     {
+      if (ORDINARY_MAP_GENERATED_DATA_P (iter->src))
+	continue;
       dump (dumper::LOCATION)
 	&& dump ("Span:%u ordinary [%u+%u,+%u)->[%u,+%u)",
 		 iter - ord_loc_remap->begin (),
@@ -16456,7 +16460,7 @@ module_state::read_ordinary_maps (unsigned num_ord_locs, unsigned range_bits)
 	  map->m_range_bits = sec.u ();
 	  map->m_column_and_range_bits = sec.u () + map->m_range_bits;
 	  unsigned fnum = sec.u ();
-	  map->to_file = (fnum < filenames.length () ? filenames[fnum] : "");
+	  map->src.file = (fnum < filenames.length () ? filenames[fnum] : "");
 	  map->to_line = sec.u ();
 	  base = map;
 	}
diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
index 0514815b51f..5a422f90272 100644
--- a/gcc/diagnostic-show-locus.cc
+++ b/gcc/diagnostic-show-locus.cc
@@ -998,7 +998,7 @@ compatible_locations_p (location_t loc_a, location_t loc_b)
 	 are in the same file.  */
       const line_map_ordinary *ord_map_a = linemap_check_ordinary (map_a);
       const line_map_ordinary *ord_map_b = linemap_check_ordinary (map_b);
-      return ord_map_a->to_file == ord_map_b->to_file;
+      return ORDINARY_MAPS_SAME_SOURCE_P (ord_map_a, ord_map_b);
     }
 }
 
diff --git a/gcc/go/go-linemap.cc b/gcc/go/go-linemap.cc
index 1d72e79647d..02d4ce04181 100644
--- a/gcc/go/go-linemap.cc
+++ b/gcc/go/go-linemap.cc
@@ -84,7 +84,8 @@ Gcc_linemap::to_string(Location location)
   resolved_location =
       linemap_resolve_location (line_table, location.gcc_location(),
                                 LRK_SPELLING_LOCATION, &lmo);
-  if (lmo == NULL || resolved_location < RESERVED_LOCATION_COUNT)
+  if (lmo == NULL || resolved_location < RESERVED_LOCATION_COUNT
+      || ORDINARY_MAP_GENERATED_DATA_P (lmo))
     return "";
   const char *path = LINEMAP_FILE (lmo);
   if (!path)
diff --git a/gcc/input.cc b/gcc/input.cc
index eaf301ec7c1..a5db934836a 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -35,6 +35,12 @@ special_fname_builtin ()
   return _("<built-in>");
 }
 
+const char *
+special_fname_generated ()
+{
+  return _("<generated>");
+}
+
 /* Input charset configuration.  */
 static const char *default_charset_callback (const char *)
 {
@@ -1391,7 +1397,19 @@ dump_location_info (FILE *stream)
       fprintf (stream, "ORDINARY MAP: %i\n", idx);
       dump_location_range (stream,
 			   MAP_START_LOCATION (map), end_location);
-      fprintf (stream, "  file: %s\n", ORDINARY_MAP_FILE_NAME (map));
+
+      if (ORDINARY_MAP_GENERATED_DATA_P (map))
+	{
+	  fprintf (stream, "  file: %s%s\n",
+		   ORDINARY_MAP_CONTAINING_FILE_NAME (line_table, map),
+		   special_fname_generated ());
+	  fprintf (stream, "  data: %.*s\n",
+		   (int) ORDINARY_MAP_GENERATED_DATA_LEN (map),
+		   ORDINARY_MAP_GENERATED_DATA (map));
+	}
+      else
+	fprintf (stream, "  file: %s\n", LINEMAP_FILE (map));
+
       fprintf (stream, "  starting at line: %i\n",
 	       ORDINARY_MAP_STARTING_LINE_NUMBER (map));
       fprintf (stream, "  column and range bits: %i\n",
@@ -1417,6 +1435,9 @@ dump_location_info (FILE *stream)
       case LC_ENTER_MACRO:
 	reason = "LC_RENAME_MACRO";
 	break;
+      case LC_GEN:
+	reason = "LC_GEN";
+	break;
       default:
 	reason = "Unknown";
       }
@@ -1814,11 +1835,11 @@ get_substring_ranges_for_loc (cpp_reader *pfile,
       /* Bulletproofing.  We ought to only have different ordinary maps
 	 for start vs finish due to line-length jumps.  */
       if (start_ord_map != final_ord_map
-	  && start_ord_map->to_file != final_ord_map->to_file)
+	  && !ORDINARY_MAPS_SAME_SOURCE_P (start_ord_map, final_ord_map))
 	return "start and finish are spelled in different ordinary maps";
       /* The file from linemap_resolve_location ought to match that from
 	 expand_location_to_spelling_point.  */
-      if (start_ord_map->to_file != start.file)
+      if (ORDINARY_MAP_SOURCE_ID (start_ord_map) != start.file)
 	return "mismatching file after resolving linemap";
 
       location_t start_loc
diff --git a/gcc/input.h b/gcc/input.h
index d1087b7a9e8..1b81a995f86 100644
--- a/gcc/input.h
+++ b/gcc/input.h
@@ -34,6 +34,7 @@ extern GTY(()) class line_maps *saved_line_table;
 
 /* Returns the translated string referring to the special location.  */
 const char *special_fname_builtin ();
+const char *special_fname_generated ();
 
 /* line-map.cc reserves RESERVED_LOCATION_COUNT to the user.  Ensure
    both UNKNOWN_LOCATION and BUILTINS_LOCATION fit into that.  */
diff --git a/libcpp/directives.cc b/libcpp/directives.cc
index ee5419d1f40..dfd782b3fca 100644
--- a/libcpp/directives.cc
+++ b/libcpp/directives.cc
@@ -1165,7 +1165,7 @@ _cpp_do_file_change (cpp_reader *pfile, enum lc_reason reason,
 		     const char *to_file, linenum_type to_line,
 		     unsigned int sysp)
 {
-  linemap_assert (reason != LC_ENTER_MACRO);
+  linemap_assert (reason != LC_ENTER_MACRO && reason != LC_GEN);
 
   const line_map_ordinary *ord_map = NULL;
   if (!to_line && reason == LC_RENAME_VERBATIM)
@@ -1176,7 +1176,7 @@ _cpp_do_file_change (cpp_reader *pfile, enum lc_reason reason,
          preprocessed source.  */
       line_map_ordinary *last = LINEMAPS_LAST_ORDINARY_MAP (pfile->line_table);
       if (!ORDINARY_MAP_STARTING_LINE_NUMBER (last)
-	  && 0 == filename_cmp (to_file, ORDINARY_MAP_FILE_NAME (last))
+	  && ORDINARY_MAP_SOURCE_ID (last) == to_file
 	  && SOURCE_LINE (last, pfile->line_table->highest_line) == 2)
 	{
 	  ord_map = last;
diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
index 44fea0ea08e..4a03be2f9c7 100644
--- a/libcpp/include/line-map.h
+++ b/libcpp/include/line-map.h
@@ -75,6 +75,8 @@ enum lc_reason
   LC_RENAME_VERBATIM,	/* Likewise, but "" != stdin.  */
   LC_ENTER_MACRO,	/* Begin macro expansion.  */
   LC_MODULE,		/* A (C++) Module.  */
+  LC_GEN,		/* Internally generated source.  */
+
   /* FIXME: add support for stringize and paste.  */
   LC_HWM /* High Water Mark.  */
 };
@@ -355,6 +357,16 @@ typedef void *(*line_map_realloc) (void *, size_t);
    for a given requested allocation.  */
 typedef size_t (*line_map_round_alloc_size_func) (size_t);
 
+/* Struct to hold the data + size for in-memory data to be stored in a
+   line_map_ordinary.  Because this is used rarely, it is better to
+   dynamically allocate this struct just when needed, rather than adding
+   overhead to every line_map to store the extra field.  */
+struct GTY(()) line_map_data
+{
+  const char * GTY((string_length ("%h.len"))) data;
+  unsigned int len;
+};
+
 /* A line_map encodes a sequence of locations.
    There are two kinds of maps. Ordinary maps and macro expansion
    maps, a.k.a macro maps.
@@ -437,9 +449,15 @@ struct GTY((tag ("1"))) line_map_ordinary : public line_map {
 
   /* Pointer alignment boundary on both 32 and 64-bit systems.  */
 
-  const char *to_file;
-  linenum_type to_line;
+  /* SRC is either the file name, in the typical case, or a pointer to
+     a line_map_data which shows where to find the actual data, for the
+     case of an LC_GEN map.  */
+  union {
+    const char * GTY((tag ("false"))) file;
+    line_map_data * GTY((tag ("true"))) data;
+  } GTY((desc ("ORDINARY_MAP_GENERATED_DATA_P (&%1)"))) src;
 
+  linenum_type to_line;
   /* Location from whence this line map was included.  For regular
      #includes, this location will be the last location of a map.  For
      outermost file, this is 0.  For modules it could be anywhere
@@ -565,6 +583,42 @@ struct GTY((tag ("2"))) line_map_macro : public line_map {
 #define linemap_assert_fails(EXPR) (! (EXPR))
 #endif
 
+/* A source_id represents a location that contains source code, which is usually
+   the name of a file.  But if the buffer length is non-zero, then it refers
+   instead to an in-memory buffer.  This is used so that diagnostics can refer
+   to generated data as well as to normal source code.  */
+
+class source_id
+{
+public:
+  /* This constructor is for the typical case, where the source code lives in
+     a file.  It is not explicit, because this case is by far the most common
+     one, it is worthwhile to allow implicit construction from a string.  */
+  source_id (const char *filename = nullptr)
+    : m_filename_or_buffer (filename),
+      m_len (0)
+  {}
+
+  /* This constructor is for the in-memory data case.  */
+  source_id (const char *buffer, unsigned buffer_len)
+    : m_filename_or_buffer (buffer),
+      m_len (buffer_len)
+  {
+    linemap_assert (buffer_len > 0);
+  }
+
+  explicit operator bool () const { return m_filename_or_buffer; }
+  const char * get_filename_or_buffer () const { return m_filename_or_buffer; }
+  unsigned get_buffer_len () const { return m_len; }
+  bool is_buffer () const { return m_len; }
+  bool operator== (source_id src) const;
+  bool operator!= (source_id src) const { return !(*this == src); }
+
+private:
+  const char *m_filename_or_buffer;
+  unsigned m_len;
+};
+
 /* Get whether location LOC is an ordinary location.  */
 
 inline bool
@@ -662,6 +716,12 @@ ORDINARY_MAP_IN_SYSTEM_HEADER_P (const line_map_ordinary *ord_map)
   return ord_map->sysp;
 }
 
+/* TRUE if this line map contains generated data.  */
+inline bool ORDINARY_MAP_GENERATED_DATA_P (const line_map_ordinary *ord_map)
+{
+  return ord_map->reason == LC_GEN;
+}
+
 /* TRUE if this line map is for a module (not a source file).  */
 
 inline bool
@@ -671,14 +731,46 @@ MAP_MODULE_P (const line_map *map)
 	  && linemap_check_ordinary (map)->reason == LC_MODULE);
 }
 
-/* Get the filename of ordinary map MAP.  */
+/* Get the data contents of ordinary map MAP.  */
 
 inline const char *
 ORDINARY_MAP_FILE_NAME (const line_map_ordinary *ord_map)
 {
-  return ord_map->to_file;
+  linemap_assert (ord_map->reason != LC_GEN);
+  return ord_map->src.file;
+}
+
+inline const char *
+ORDINARY_MAP_GENERATED_DATA (const line_map_ordinary *ord_map)
+{
+  linemap_assert (ord_map->reason == LC_GEN);
+  return ord_map->src.data->data;
+}
+
+inline unsigned int
+ORDINARY_MAP_GENERATED_DATA_LEN (const line_map_ordinary *ord_map)
+{
+  linemap_assert (ord_map->reason == LC_GEN);
+  return ord_map->src.data->len;
+}
+
+inline source_id ORDINARY_MAP_SOURCE_ID (const line_map_ordinary *ord_map)
+{
+  if (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
+    return source_id {ord_map->src.data->data, ord_map->src.data->len};
+  return source_id {ord_map->src.file};
+}
+
+/* If we just want to know whether two maps point to the same
+   file/buffer or not.  */
+inline bool
+ORDINARY_MAPS_SAME_SOURCE_P (const line_map_ordinary *map1,
+			   const line_map_ordinary *map2)
+{
+  return ORDINARY_MAP_SOURCE_ID (map1) == ORDINARY_MAP_SOURCE_ID (map2);
 }
 
+
 /* Get the cpp macro whose expansion gave birth to macro map MAP.  */
 
 inline cpp_hashnode *
@@ -1092,22 +1184,26 @@ extern location_t linemap_line_start
 /* Allocate a raw block of line maps, zero initialized.  */
 extern line_map *line_map_new_raw (line_maps *, bool, unsigned);
 
-/* Add a mapping of logical source line to physical source file and
-   line number. This function creates an "ordinary map", which is a
-   map that records locations of tokens that are not part of macro
-   replacement-lists present at a macro expansion point.
+/* Add a mapping of logical source line to physical source file and line
+   number.  This function creates an "ordinary map", which is a map that
+   records locations of tokens that are not part of macro replacement-lists
+   present at a macro expansion point.
 
-   The text pointed to by TO_FILE must have a lifetime
-   at least as long as the lifetime of SET.  An empty
-   TO_FILE means standard input.  If reason is LC_LEAVE, and
-   TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their
+   SRC is a source_id indicating either the name of the file for the
+   location, or, if reason is LC_GEN, the in-memory data.  In either case
+   the data must have a lifetime at least as long as that of SET.  If reason
+   is LC_LEAVE, and SRC is NULL, then SRC, TO_LINE and SYSP are given their
    natural values considering the file we are returning to.
 
-   A call to this function can relocate the previous set of
-   maps, so any stored line_map pointers should not be used.  */
+   If reason is LC_RENAME, and the map being renamed from is an LC_GEN map,
+   then SRC may be NULL and will be copied from the source map.
+
+   A call to this function can relocate the previous set of maps, so any
+   stored line_map pointers should not be used.  */
+
 extern const line_map *linemap_add
   (class line_maps *, enum lc_reason, unsigned int sysp,
-   const char *to_file, linenum_type to_line);
+   source_id src, linenum_type to_line);
 
 /* Create a macro map.  A macro map encodes source locations of tokens
    that are part of a macro replacement-list, at a macro expansion
@@ -1253,11 +1349,12 @@ linemap_position_for_loc_and_offset (class line_maps *set,
 				     location_t loc,
 				     unsigned int offset);
 
-/* Return the file this map is for.  */
+/* Return the file this map is for.  ORD_MAP must not be an
+   LC_GEN map.  */
 inline const char *
 LINEMAP_FILE (const line_map_ordinary *ord_map)
 {
-  return ord_map->to_file;
+  return ORDINARY_MAP_FILE_NAME (ord_map);
 }
 
 /* Return the line number this map started encoding location from.  */
@@ -1277,6 +1374,13 @@ LINEMAP_SYSP (const line_map_ordinary *ord_map)
   return ord_map->sysp;
 }
 
+/* For a normal ordinary map, this is the same as ORDINARY_MAP_FILE_NAME;
+   but for an LC_GEN map, it returns the file name from which the data
+   originated, instead of asserting.  */
+const char *
+ORDINARY_MAP_CONTAINING_FILE_NAME (line_maps *set,
+				   const line_map_ordinary *ord_map);
+
 const struct line_map *first_map_in_common (line_maps *set,
 					    location_t loc0,
 					    location_t loc1,
@@ -2104,12 +2208,10 @@ struct linemap_stats
   long adhoc_table_entries_used;
 };
 
-/* Return the highest location emitted for a given file for which
-   there is a line map in SET.  FILE_NAME is the file name to
-   consider.  If the function returns TRUE, *LOC is set to the highest
-   location emitted for that file.  */
-bool linemap_get_file_highest_location (class line_maps * set,
-					const char *file_name,
+/* Return the highest location emitted for a given source ID for which there is
+   a line map in SET.  If the function returns TRUE, *LOC is set to the highest
+   location emitted for that source.  */
+bool linemap_get_file_highest_location (line_maps *set, source_id src,
 					location_t *loc);
 
 /* Compute and return statistics about the memory consumption of some
diff --git a/libcpp/line-map.cc b/libcpp/line-map.cc
index e0f82e20571..582ff3a6a58 100644
--- a/libcpp/line-map.cc
+++ b/libcpp/line-map.cc
@@ -48,6 +48,47 @@ static location_t linemap_macro_loc_to_exp_point (line_maps *,
 extern unsigned num_expanded_macros_counter;
 extern unsigned num_macro_tokens_counter;
 
+
+/* Determine if two source_id refer to the same location.  If both are
+   files, they should refer to the same file name.  If both are memory
+   buffers, they should be the same buffer.  Otherwise they are
+   different.  */
+
+bool
+source_id::operator== (source_id src) const
+{
+  if (m_len != src.m_len)
+    return false;
+
+  /* Use pointer equality for generated data buffers.  For file names, if
+     either of them is NULL, the other one should also be NULL.
+     (filename_cmp cannot take a NULL pointer.)  */
+  if (is_buffer () || !m_filename_or_buffer || !src.m_filename_or_buffer)
+    return m_filename_or_buffer == src.m_filename_or_buffer;
+
+  /* Pointer equality is usually sufficient for file names, but PCH may end
+     up using different pointers for the same file, so fall back also to
+     filename_cmp().  */
+  return m_filename_or_buffer == src.m_filename_or_buffer
+    || !filename_cmp (m_filename_or_buffer, src.m_filename_or_buffer);
+}
+
+/* For a normal ordinary map, this is the same as ORDINARY_MAP_FILE_NAME;
+   but for an LC_GEN map, it returns the file name from which the data
+   originated, instead of asserting.  */
+const char *
+ORDINARY_MAP_CONTAINING_FILE_NAME (line_maps *set,
+				   const line_map_ordinary *ord_map)
+{
+  while (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
+    {
+      ord_map = linemap_included_from_linemap (set, ord_map);
+      if (!ord_map)
+	return "-";
+    }
+  return ORDINARY_MAP_FILE_NAME (ord_map);
+}
+
 /* Destructor for class line_maps.
    Ensure non-GC-managed memory is released.  */
 
@@ -411,8 +452,9 @@ linemap_check_files_exited (line_maps *set)
   for (const line_map_ordinary *map = LINEMAPS_LAST_ORDINARY_MAP (set);
        ! MAIN_FILE_P (map);
        map = linemap_included_from_linemap (set, map))
-    fprintf (stderr, "line-map.cc: file \"%s\" entered but not left\n",
-	     ORDINARY_MAP_FILE_NAME (map));
+    fprintf (stderr, "line-map.cc: file \"%s%s\" entered but not left\n",
+	     ORDINARY_MAP_CONTAINING_FILE_NAME (set, map),
+	     ORDINARY_MAP_GENERATED_DATA_P (map) ? "<generated>" : "");
 }
 
 /* Create NUM zero-initialized maps of type MACRO_P.  */
@@ -504,22 +546,26 @@ LAST_SOURCE_LINE_LOCATION (const line_map_ordinary *map)
 	  + map->start_location);
 }
 
-/* Add a mapping of logical source line to physical source file and
-   line number.
+/* Add a mapping of logical source line to physical source file and line
+   number.  This function creates an "ordinary map", which is a map that
+   records locations of tokens that are not part of macro replacement-lists
+   present at a macro expansion point.
 
-   The text pointed to by TO_FILE must have a lifetime
-   at least as long as the final call to lookup_line ().  An empty
-   TO_FILE means standard input.  If reason is LC_LEAVE, and
-   TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their
+   SRC is a source_id indicating either the name of the file for the
+   location, or, if reason is LC_GEN, the in-memory data.  In either case
+   the data must have a lifetime at least as long as that of SET.  If reason
+   is LC_LEAVE, and SRC is NULL, then SRC, TO_LINE and SYSP are given their
    natural values considering the file we are returning to.
 
-   FROM_LINE should be monotonic increasing across calls to this
-   function.  A call to this function can relocate the previous set of
-   maps, so any stored line_map pointers should not be used.  */
+   If reason is LC_RENAME, and the map being renamed from is an LC_GEN map,
+   then SRC may be NULL and will be copied from the source map.
+
+   A call to this function can relocate the previous set of maps, so any
+   stored line_map pointers should not be used.  */
 
 const struct line_map *
-linemap_add (line_maps *set, enum lc_reason reason,
-	     unsigned int sysp, const char *to_file, linenum_type to_line)
+linemap_add (line_maps *set, enum lc_reason reason, unsigned int sysp,
+	     source_id src, linenum_type to_line)
 {
   /* Generate a start_location above the current highest_location.
      If possible, make the low range bits be zero.  */
@@ -536,12 +582,25 @@ linemap_add (line_maps *set, enum lc_reason reason,
 
   /* When we enter the file for the first time reason cannot be
      LC_RENAME.  */
-  linemap_assert (!(set->depth == 0 && reason == LC_RENAME));
+  line_map_data *data_to_reuse = nullptr;
+  bool is_data_map = (reason == LC_GEN);
+  linemap_assert (is_data_map == src.is_buffer ());
+  if (reason == LC_RENAME || reason == LC_RENAME_VERBATIM)
+    {
+      linemap_assert (set->depth != 0);
+      const auto prev = LINEMAPS_LAST_ORDINARY_MAP (set);
+      linemap_assert (prev);
+      if (prev->reason == LC_GEN)
+	{
+	  data_to_reuse = prev->src.data;
+	  is_data_map = true;
+	}
+    }
 
   /* If we are leaving the main file, return a NULL map.  */
   if (reason == LC_LEAVE
       && MAIN_FILE_P (LINEMAPS_LAST_ORDINARY_MAP (set))
-      && to_file == NULL)
+      && !src)
     {
       set->depth--;
       return NULL;
@@ -557,8 +616,8 @@ linemap_add (line_maps *set, enum lc_reason reason,
     = linemap_check_ordinary (new_linemap (set, start_location));
   map->reason = reason;
 
-  if (to_file && *to_file == '\0' && reason != LC_RENAME_VERBATIM)
-    to_file = "<stdin>";
+  if (!is_data_map && src == "" && reason != LC_RENAME_VERBATIM)
+    src = "<stdin>";
 
   if (reason == LC_RENAME_VERBATIM)
     reason = LC_RENAME;
@@ -577,21 +636,49 @@ linemap_add (line_maps *set, enum lc_reason reason,
 	 that comes right before MAP in the same file.  */
       from = linemap_included_from_linemap (set, map - 1);
 
-      /* A TO_FILE of NULL is special - we use the natural values.  */
-      if (to_file == NULL)
+      /* Not currently supporting a #include originating from an LC_GEN
+	 map, since there is no clear use case for this and it would complicate
+	 the logic here.  */
+      linemap_assert (!ORDINARY_MAP_GENERATED_DATA_P (from));
+
+      /* A null SRC is special - we use the natural
+	 values.  */
+      if (!src)
 	{
-	  to_file = ORDINARY_MAP_FILE_NAME (from);
+	  src = from->src.file;
 	  to_line = SOURCE_LINE (from, from[1].start_location);
 	  sysp = ORDINARY_MAP_IN_SYSTEM_HEADER_P (from);
 	}
       else
-	linemap_assert (filename_cmp (ORDINARY_MAP_FILE_NAME (from),
-				      to_file) == 0);
+	linemap_assert (ORDINARY_MAP_SOURCE_ID (from) == src);
     }
 
   map->sysp = sysp;
-  map->to_file = to_file;
   map->to_line = to_line;
+
+  if (is_data_map)
+    {
+      /* All data maps should have reason == LC_GEN, even if they were
+	 an LC_RENAME, to keep it simple to check which maps contain
+	 data.  */
+      map->reason = LC_GEN;
+
+      if (data_to_reuse)
+	map->src.data = data_to_reuse;
+      else
+	{
+	  auto src_data
+	    = (line_map_data *)set->reallocator (nullptr,
+						 sizeof (line_map_data));
+	  src_data->data = src.get_filename_or_buffer ();
+	  src_data->len = src.get_buffer_len ();
+	  linemap_assert (src_data->len);
+	  map->src.data = src_data;
+	}
+    }
+  else
+    map->src.file = src.get_filename_or_buffer ();
+
   LINEMAPS_ORDINARY_CACHE (set) = LINEMAPS_ORDINARY_USED (set) - 1;
   /* Do not store range_bits here.  That's readjusted in
      linemap_line_start.  */
@@ -606,7 +693,7 @@ linemap_add (line_maps *set, enum lc_reason reason,
      pure_location_p.  */
   linemap_assert (pure_location_p (set, start_location));
 
-  if (reason == LC_ENTER)
+  if (reason == LC_ENTER || reason == LC_GEN)
     {
       if (set->depth == 0)
 	map->included_from = 0;
@@ -617,7 +704,7 @@ linemap_add (line_maps *set, enum lc_reason reason,
 	      & ~((1 << map[-1].m_column_and_range_bits) - 1))
 	     + map[-1].start_location);
       set->depth++;
-      if (set->trace_includes)
+      if (set->trace_includes && reason == LC_ENTER)
 	trace_include (set, map);
     }
   else if (reason == LC_RENAME)
@@ -859,12 +946,16 @@ linemap_line_start (line_maps *set, linenum_type to_line,
 	      >= (((uint64_t) 1)
 		  << (CHAR_BIT * sizeof (linenum_type) - column_bits)))
 	  || range_bits < map->m_range_bits)
-	map = linemap_check_ordinary
-	        (const_cast <line_map *>
-		  (linemap_add (set, LC_RENAME,
-				ORDINARY_MAP_IN_SYSTEM_HEADER_P (map),
-				ORDINARY_MAP_FILE_NAME (map),
-				to_line)));
+	{
+	  const auto maybe_filename = ORDINARY_MAP_GENERATED_DATA_P (map)
+	    ? nullptr : map->src.file;
+	  map = linemap_check_ordinary
+	    (const_cast <line_map *>
+	     (linemap_add (set, LC_RENAME,
+			   ORDINARY_MAP_IN_SYSTEM_HEADER_P (map),
+			   maybe_filename,
+			   to_line)));
+	}
       map->m_column_and_range_bits = column_bits;
       map->m_range_bits = range_bits;
       r = (MAP_START_LOCATION (map)
@@ -1023,9 +1114,9 @@ linemap_position_for_loc_and_offset (line_maps *set,
 	     >= MAP_START_LOCATION (map + 1)); map++)
     /* If the next map is a different file, or starts in a higher line, we
        cannot encode the location there.  */
-    if ((map + 1)->reason != LC_RENAME
+    if (((map + 1)->reason != LC_RENAME && (map + 1)->reason != LC_GEN)
 	|| line < ORDINARY_MAP_STARTING_LINE_NUMBER (map + 1)
-	|| 0 != strcmp (LINEMAP_FILE (map + 1), LINEMAP_FILE (map)))
+	|| !ORDINARY_MAPS_SAME_SOURCE_P (map, map + 1))
       return loc;
 
   column += column_offset;
@@ -1283,7 +1374,7 @@ linemap_get_expansion_filename (line_maps *set,
 
   linemap_macro_loc_to_exp_point (set, location, &map);
 
-  return LINEMAP_FILE (map);
+  return ORDINARY_MAP_CONTAINING_FILE_NAME (set, map);
 }
 
 /* Return the name of the macro associated to MACRO_MAP.  */
@@ -1873,7 +1964,7 @@ linemap_dump (FILE *stream, class line_maps *set, unsigned ix, bool is_macro)
 {
   const char *const lc_reasons_v[LC_HWM]
       = { "LC_ENTER", "LC_LEAVE", "LC_RENAME", "LC_RENAME_VERBATIM",
-	  "LC_ENTER_MACRO", "LC_MODULE" };
+	  "LC_ENTER_MACRO", "LC_MODULE", "LC_GEN" };
   const line_map *map;
   unsigned reason;
 
@@ -1903,11 +1994,15 @@ linemap_dump (FILE *stream, class line_maps *set, unsigned ix, bool is_macro)
       const line_map_ordinary *includer_map
 	= linemap_included_from_linemap (set, ord_map);
 
-      fprintf (stream, "File: %s:%d\n", ORDINARY_MAP_FILE_NAME (ord_map),
+      fprintf (stream, "File: %s:%d\n",
+	       ORDINARY_MAP_GENERATED_DATA_P (ord_map) ? "<generated>"
+	       : ORDINARY_MAP_FILE_NAME (ord_map),
 	       ORDINARY_MAP_STARTING_LINE_NUMBER (ord_map));
       fprintf (stream, "Included from: [%d] %s\n",
 	       includer_map ? int (includer_map - set->info_ordinary.maps) : -1,
-	       includer_map ? ORDINARY_MAP_FILE_NAME (includer_map) : "None");
+	       includer_map ? ORDINARY_MAP_CONTAINING_FILE_NAME (set,
+								 includer_map)
+	       : "None");
     }
   else
     {
@@ -1931,7 +2026,7 @@ linemap_dump_location (line_maps *set,
 {
   const line_map_ordinary *map;
   location_t location;
-  const char *path = "", *from = "";
+  const char *path = "", *path_suffix = "", *from = "";
   int l = -1, c = -1, s = -1, e = -1;
 
   if (IS_ADHOC_LOC (loc))
@@ -1948,7 +2043,9 @@ linemap_dump_location (line_maps *set,
     linemap_assert (location < RESERVED_LOCATION_COUNT);
   else
     {
-      path = LINEMAP_FILE (map);
+      path = ORDINARY_MAP_CONTAINING_FILE_NAME (set, map);
+      if (ORDINARY_MAP_GENERATED_DATA_P (map))
+	path_suffix = "<generated>";
       l = SOURCE_LINE (map, location);
       c = SOURCE_COLUMN (map, location);
       s = LINEMAP_SYSP (map) != 0;
@@ -1959,24 +2056,23 @@ linemap_dump_location (line_maps *set,
 	{
 	  const line_map_ordinary *from_map
 	    = linemap_included_from_linemap (set, map);
-	  from = from_map ? LINEMAP_FILE (from_map) : "<NULL>";
+	  from = from_map ? ORDINARY_MAP_CONTAINING_FILE_NAME (set, from_map)
+	    : "<NULL>";
 	}
     }
 
   /* P: path, L: line, C: column, S: in-system-header, M: map address,
      E: macro expansion?, LOC: original location, R: resolved location   */
-  fprintf (stream, "{P:%s;F:%s;L:%d;C:%d;S:%d;M:%p;E:%d,LOC:%d,R:%d}",
-	   path, from, l, c, s, (void*)map, e, loc, location);
+  fprintf (stream, "{P:%s%s;F:%s;L:%d;C:%d;S:%d;M:%p;E:%d,LOC:%d,R:%d}",
+	   path, path_suffix, from, l, c, s, (void*)map, e, loc, location);
 }
 
-/* Return the highest location emitted for a given file for which
-   there is a line map in SET.  FILE_NAME is the file name to
-   consider.  If the function returns TRUE, *LOC is set to the highest
-   location emitted for that file.  */
+/* Return the highest location emitted for a given source ID for which there is
+   a line map in SET.  If the function returns TRUE, *LOC is set to the highest
+   location emitted for that source.  */
 
 bool
-linemap_get_file_highest_location (line_maps *set,
-				   const char *file_name,
+linemap_get_file_highest_location (line_maps *set, source_id src,
 				   location_t *loc)
 {
   /* If the set is empty or no ordinary map has been created then
@@ -1984,12 +2080,11 @@ linemap_get_file_highest_location (line_maps *set,
   if (set == NULL || set->info_ordinary.used == 0)
     return false;
 
-  /* Now look for the last ordinary map created for FILE_NAME.  */
+  /* Now look for the last ordinary map created for this file.  */
   int i;
   for (i = set->info_ordinary.used - 1; i >= 0; --i)
     {
-      const char *fname = set->info_ordinary.maps[i].to_file;
-      if (fname && !filename_cmp (fname, file_name))
+      if (ORDINARY_MAP_SOURCE_ID (set->info_ordinary.maps + i) == src)
 	break;
     }
 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 2/8] libcpp: diagnostics: Support generated data in expanded locations
  2023-08-11 23:02           ` David Malcolm
@ 2023-08-14 21:41             ` Lewis Hyatt
  0 siblings, 0 replies; 36+ messages in thread
From: Lewis Hyatt @ 2023-08-14 21:41 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

On Fri, Aug 11, 2023 at 07:02:49PM -0400, David Malcolm wrote:
> On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > The previous patch in this series introduced the concept of LC_GEN line
> > maps. This patch continues on the path to using them to improve _Pragma
> > diagnostics, by adding a new source_id SRC member to struct
> > expanded_location, which is populated by linemap_expand_location. This
> > member allows call sites to detect and handle when a location refers to
> > generated data rather than a plain file name.
> > 
> > The previous FILE member of expanded_location is preserved (although
> > redundant with SRC), so that call sites which do not and never will care
> > about generated data do not need to be concerned about it. Call sites that
> > will care are modified here, to use SRC rather than FILE for comparing
> > locations.
> 
> Thanks; this seems like a good approach.
> 
> 
> [...snip...]
> 
> > diff --git a/gcc/edit-context.cc b/gcc/edit-context.cc
> > index 6f5bc6b9d8f..15052aec417 100644
> > --- a/gcc/edit-context.cc
> > +++ b/gcc/edit-context.cc
> > @@ -295,7 +295,7 @@ edit_context::apply_fixit (const fixit_hint *hint)
> >  {
> >    expanded_location start = expand_location (hint->get_start_loc ());
> >    expanded_location next_loc = expand_location (hint->get_next_loc ());
> > -  if (start.file != next_loc.file)
> > +  if (start.src != next_loc.src || start.src.is_buffer ())
> >      return false;
> >    if (start.line != next_loc.line)
> >      return false;
> 
> Thinking about fix-it hints, it makes sense to reject attempts to
> create fix-it hints within generated strings, as we can't apply them or
> visualize them.
> 
> Does anywhere in the patch kit do that?  Either of 
>   rich_location::maybe_add_fixit
> or
>   rich_location::reject_impossible_fixit
> would be good places to do that.
>

So rich_location::reject_impossible_fixit does reject them for _Pragmas now,
because what the frontend sees and passes to it is a virtual location, and it
always rejects virtual locations. But it doesn't reject arbitrary generated
data locations that may be created in an ordinary non-virtual location. I
think this small change would be enough to reject those:

-- >8 --

diff --git a/libcpp/line-map.cc b/libcpp/line-map.cc
index 835e8e1b8cd..382594637ad 100644
--- a/libcpp/line-map.cc
+++ b/libcpp/line-map.cc
@@ -2545,7 +2545,8 @@ rich_location::maybe_add_fixit (location_t start,
     = linemap_client_expand_location_to_spelling_point (next_loc,
                                                        LOCATION_ASPECT_START);
   /* They must be within the same file...  */
-  if (exploc_start.src != exploc_next_loc.src)
+  if (exploc_start.src != exploc_next_loc.src
+      || exploc_start.src.is_buffer ())
     {
       stop_supporting_fixits ();
       return;

-- >8 --

However, there are many selftests in diagnostic-show-locus.cc that actually
verify we generate the fixit hints for generated data, so I would also need to
change those to skip the fix-it tests in that case. That looks like this:

-- >8 --

diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
index 62c60645e88..884c55e91e9 100644
--- a/gcc/diagnostic-show-locus.cc
+++ b/gcc/diagnostic-show-locus.cc
@@ -3824,6 +3824,8 @@ test_diagnostic_show_locus_one_liner (const line_table_case &case_)
   test_one_liner_simple_caret ();
   test_one_liner_caret_and_range ();
   test_one_liner_multiple_carets_and_ranges ();
+  if (!ltt.m_generated_data)
+    {
       test_one_liner_fixit_insert_before ();
       test_one_liner_fixit_insert_after ();
       test_one_liner_fixit_remove ();
@@ -3835,6 +3837,7 @@ test_diagnostic_show_locus_one_liner (const line_table_case &case_)
       test_one_liner_many_fixits_2 ();
       test_one_liner_labels ();
     }
+}

 /* Version of all one-liner tests exercising multibyte awareness.  For
    simplicity we stick to using two multibyte characters in the test, U+1F602
@@ -4419,6 +4422,8 @@ test_diagnostic_show_locus_one_liner_utf8 (const line_table_case &case_)
   test_one_liner_simple_caret_utf8 ();
   test_one_liner_caret_and_range_utf8 ();
   test_one_liner_multiple_carets_and_ranges_utf8 ();
+  if (!ltt.m_generated_data)
+    {
       test_one_liner_fixit_insert_before_utf8 ();
       test_one_liner_fixit_insert_after_utf8 ();
       test_one_liner_fixit_remove_utf8 ();
@@ -4428,6 +4433,7 @@ test_diagnostic_show_locus_one_liner_utf8 (const line_table_case &case_)
       test_one_liner_fixit_validation_adhoc_locations_utf8 ();
       test_one_liner_many_fixits_1_utf8 ();
       test_one_liner_many_fixits_2_utf8 ();
+    }
   test_one_liner_labels_utf8 ();
   test_one_liner_colorized_utf8 ();
 }
@@ -5726,15 +5732,15 @@ diagnostic_show_locus_cc_tests ()
   for_each_line_table_case (test_diagnostic_show_locus_one_liner, true);
   for_each_line_table_case (test_diagnostic_show_locus_one_liner_utf8, true);
   for_each_line_table_case (test_add_location_if_nearby, true);
-  for_each_line_table_case (test_diagnostic_show_locus_fixit_lines, true);
-  for_each_line_table_case (test_fixit_consolidation, true);
-  for_each_line_table_case (test_overlapped_fixit_printing, true);
-  for_each_line_table_case (test_overlapped_fixit_printing_utf8, true);
-  for_each_line_table_case (test_overlapped_fixit_printing_2, true);
-  for_each_line_table_case (test_fixit_insert_containing_newline, true);
-  for_each_line_table_case (test_fixit_insert_containing_newline_2, true);
-  for_each_line_table_case (test_fixit_replace_containing_newline, true);
-  for_each_line_table_case (test_fixit_deletion_affecting_newline, true);
+  for_each_line_table_case (test_diagnostic_show_locus_fixit_lines, false);
+  for_each_line_table_case (test_fixit_consolidation, false);
+  for_each_line_table_case (test_overlapped_fixit_printing, false);
+  for_each_line_table_case (test_overlapped_fixit_printing_utf8, false);
+  for_each_line_table_case (test_overlapped_fixit_printing_2, false);
+  for_each_line_table_case (test_fixit_insert_containing_newline, false);
+  for_each_line_table_case (test_fixit_insert_containing_newline_2, false);
+  for_each_line_table_case (test_fixit_replace_containing_newline, false);
+  for_each_line_table_case (test_fixit_deletion_affecting_newline, false);
   for_each_line_table_case (test_tab_expansion, true);
   for_each_line_table_case (test_escaping_bytes_1, true);
   for_each_line_table_case (test_escaping_bytes_2, true);

-- >8 --

(The above diff was generated with -w to avoid a lot of useless indentation
changes; it is only meant to illustrate what the change does.)
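
To make the rule concrete, here is a minimal sketch of the check discussed
above, assuming a simplified stand-in for source_id. The names src_sketch and
fixit_allowed_p are hypothetical and are not part of the patch; strcmp stands
in for filename_cmp, and "LEN == 0 means file name" is an assumption of the
sketch.

```cpp
#include <cstring>

// Hypothetical, simplified stand-in for libcpp's source_id: PTR is either a
// file name (LEN == 0) or an in-memory buffer of LEN bytes (LEN != 0).
struct src_sketch
{
  const char *ptr;
  unsigned len;

  bool is_buffer () const { return len != 0; }

  bool operator== (const src_sketch &other) const
  {
    if (len != other.len)
      return false;
    // Buffers compare by pointer identity; so do null file names.
    if (is_buffer () || !ptr || !other.ptr)
      return ptr == other.ptr;
    // File names compare by content (strcmp here; the patch uses
    // filename_cmp).
    return ptr == other.ptr || std::strcmp (ptr, other.ptr) == 0;
  }

  bool operator!= (const src_sketch &other) const
  {
    return !(*this == other);
  }
};

// A fix-it hint is only usable if both endpoints lie in the same source and
// that source is a real file rather than generated data.
inline bool
fixit_allowed_p (const src_sketch &start, const src_sketch &next)
{
  return !(start != next || start.is_buffer ());
}
```

With this model, two locations in the same _Pragma string compare equal but
are still rejected, because there is no file on disk to which the edit could
be applied.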

> 
> [...snip...]
> 
> > diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
> > index e59123b18c5..76617fe6129 100644
> > --- a/libcpp/include/line-map.h
> > +++ b/libcpp/include/line-map.h
> > @@ -1410,18 +1410,22 @@ linemap_location_before_p (class line_maps *set,
> >  
> >  typedef struct
> >  {
> > -  /* The name of the source file involved.  */
> > -  const char *file;
> > +  /* The file name of the location involved, or NULL if the location
> > +     is not in an external file.  */
> > +  const char *file = nullptr;
> >  
> > -  /* The line-location in the source file.  */
> > -  int line;
> > -
> > -  int column;
> > +  /* A source_id recording the file name and/or the in-memory content,
> > +     as appropriate.  Users that need to handle in-memory content need
> > +     to use this rather than FILE.  */
> > +  source_id src;
> >  
> > -  void *data;
> > +  /* The line-location in the source file.  */
> > +  int line = 0;
> > +  int column = 0;
> > +  void *data = nullptr;
> >  
> > -  /* In a system header?. */
> > -  bool sysp;
> > +  /* In a system header?  */
> > +  bool sysp = false;
> >  } expanded_location;
> 
> I don't know if we've been using default member initialization yet, but
> apparently it's C++11, and thus OK.
>

Thanks, I feel like it does make things more maintainable. FWIW, I did verify
it builds with gcc 4.8.5.

> [...snip...]
> 
> 
> This patch looks good to me, but obviously it has dependencies on the
> rest of the kit.
> 
> Dave
>

Thank you. Please let me know whether I should also apply the above tweaks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 3/8] diagnostics: Refactor class file_cache_slot
  2023-08-09 22:14         ` [PATCH v4 3/8] diagnostics: Refactor class file_cache_slot Lewis Hyatt
@ 2023-08-15 15:43           ` David Malcolm
  2023-08-15 17:58             ` Lewis Hyatt
  0 siblings, 1 reply; 36+ messages in thread
From: David Malcolm @ 2023-08-15 15:43 UTC (permalink / raw)
  To: Lewis Hyatt, gcc-patches

On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> Class file_cache_slot in input.cc is used to query specific lines of source
> code from a file when needed by diagnostics infrastructure. This will be
> extended in a subsequent patch to support obtaining the source code from
> in-memory generated buffers rather than from a file. The present patch
> refactors class file_cache_slot, putting most of the logic into a new base
> class cache_data_source, in preparation for reusing that code in the next
> patch. There is no change in functionality yet.
> 
> gcc/ChangeLog:
> 
>         * input.cc (class file_cache_slot): Refactor functionality into a
>         new base class...
>         (class cache_data_source): ...here.
>         (file_cache::forcibly_evict_file): Adapt for refactoring.
>         (file_cache_slot::evict): Renamed to...
>         (file_cache_slot::reset): ...this, and partially refactored into
>         base class...
>         (cache_data_source::reset): ...here.
>         (file_cache_slot::get_full_file_content): Moved into base class...
>         (cache_data_source::get_full_file_content): ...here.
>         (file_cache_slot::create): Adapt for refactoring.
>         (file_cache_slot::file_cache_slot): Refactor partially into...
>         (cache_data_source::cache_data_source): ...here.
>         (file_cache_slot::~file_cache_slot): Refactor partially into...
>         (cache_data_source::~cache_data_source): ...here.
>         (file_cache_slot::needs_read_p): Remove.
>         (file_cache_slot::needs_grow_p): Remove.
>         (file_cache_slot::maybe_grow): Adapt for refactoring.
>         (file_cache_slot::read_data): Refactored, along with...
>         (file_cache_slot::maybe_read_data): this, into...
>         (file_cache_slot::get_more_data): ...here.
>         (find_end_of_line): Change interface to take a pair of pointers,
>         rather than a pointer + length.
>         (file_cache_slot::get_next_line): Refactored into...
>         (cache_data_source::get_next_line): ...here.
>         (file_cache_slot::goto_next_line): Refactored into...
>         (cache_data_source::goto_next_line): ...here.
>         (file_cache_slot::read_line_num): Refactored into...
>         (cache_data_source::read_line_num): ...here.
>         (location_get_source_line): Fix const-correctness as necessitated by
>         new interface.
> ---
>  gcc/input.cc | 513 +++++++++++++++++++++++----------------------------
>  1 file changed, 235 insertions(+), 278 deletions(-)
> 

I confess I had to reread both this patch and patch 4/8 to make sense of
the refactoring; this is probably one of those cases where the change is
harder to read in patch form than as source, but I think I now understand
the new implementation.

Did you try testing this with valgrind (e.g. "make selftest-valgrind")?

I don't think we have any selftest coverage for "\r" in the line-break
handling; that would be good to add.

This patch is OK for trunk once the rest of the kit is approved.

Thanks
Dave


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers
  2023-08-09 22:14         ` [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers Lewis Hyatt
@ 2023-08-15 16:15           ` David Malcolm
  2023-08-15 18:15             ` Lewis Hyatt
  0 siblings, 1 reply; 36+ messages in thread
From: David Malcolm @ 2023-08-15 16:15 UTC (permalink / raw)
  To: Lewis Hyatt, gcc-patches

On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> This patch enhances location_get_source_line(), which is the primary
> interface provided by the diagnostics infrastructure to obtain the line of
> source code corresponding to a given location, so that it understands
> generated data locations in addition to normal file-based locations. This
> involves changing the argument to location_get_source_line() from a plain
> file name, to a source_id object that can represent either type of location.
> 
> gcc/ChangeLog:
> 
>         * input.cc (class data_cache_slot): New class.
>         (file_cache::lookup_data): New function.
>         (diagnostics_file_cache_forcibly_evict_data): New function.
>         (file_cache::forcibly_evict_data): New function.
>         (file_cache::evicted_cache_tab_entry): Generalize (via a template)
>         to work for both file_cache_slot and data_cache_slot.
>         (file_cache::add_file): Adapt for new interface to
>         evicted_cache_tab_entry.
>         (file_cache::add_data): New function.
>         (data_cache_slot::create): New function.
>         (file_cache::file_cache): Support the new m_data_slots member.
>         (file_cache::~file_cache): Likewise.
>         (file_cache::lookup_or_add_data): New function.
>         (file_cache::lookup_or_add): New function that calls either
>         lookup_or_add_data or lookup_or_add_file as appropriate.
>         (location_get_source_line): Change the FILE_PATH argument to a
>         source_id SRC, and use it to support obtaining source lines from
>         generated data as well as from files.
>         (location_compute_display_column): Support generated data using the
>         new features of location_get_source_line.
>         (dump_location_info): Likewise.
>         * input.h (location_get_source_line): Adjust prototype. Add a new
>         convenience overload taking an expanded_location.
>         (class cache_data_source): Declare.
>         (class data_cache_slot): Declare.
>         (class file_cache): Declare new members.
>         (diagnostics_file_cache_forcibly_evict_data): Declare.
> ---
>  gcc/input.cc | 171 ++++++++++++++++++++++++++++++++++++++++-----------
>  gcc/input.h  |  23 +++++--
>  2 files changed, 153 insertions(+), 41 deletions(-)
> 
> diff --git a/gcc/input.cc b/gcc/input.cc
> index 9377020b460..790279d4273 100644
> --- a/gcc/input.cc
> +++ b/gcc/input.cc
> @@ -207,6 +207,28 @@ private:
>    void maybe_grow ();
>  };
>  
> +/* This is the implementation of cache_data_source for generated
> +   data that is already in memory.  */
> +class data_cache_slot final : public cache_data_source

It occurred to me to ask: why are we caching access to a buffer that's
already in memory?  But I take it we're also caching the line-splitting
information, and providing the line-splitting algorithm with a consistent
interface to the data, right?
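
As a rough illustration of what "caching the line-splitting information" for
an in-memory buffer could look like, here is a minimal sketch. The class
buffer_line_index and its members are hypothetical; this is not the
data_cache_slot implementation, and it only treats '\n' as a line terminator.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical sketch: remembers where each line of an in-memory buffer
// starts, so that repeated line lookups do not rescan the buffer.
class buffer_line_index
{
public:
  buffer_line_index (const char *data, std::size_t len)
    : m_begin (data), m_end (data + len)
  {
  }

  // Look up the 1-based line LINE_NUM.  On success, set *LINE and *LINE_LEN
  // to the line's text (without the terminating '\n') and return true.
  bool get_line (std::size_t line_num, const char **line,
		 std::size_t *line_len)
  {
    build_index ();
    if (line_num == 0 || line_num > m_line_starts.size ())
      return false;
    const char *start = m_line_starts[line_num - 1];
    const char *stop = (line_num < m_line_starts.size ()
			? m_line_starts[line_num] : m_end);
    // Drop the newline that terminates this line, if there is one.
    if (stop > start && stop[-1] == '\n')
      --stop;
    *line = start;
    *line_len = static_cast<std::size_t> (stop - start);
    return true;
  }

private:
  // Scan the buffer once and record the start of every line.
  void build_index ()
  {
    if (m_indexed)
      return;
    m_indexed = true;
    const char *p = m_begin;
    while (p < m_end)
      {
	m_line_starts.push_back (p);
	const void *nl = std::memchr (p, '\n', m_end - p);
	if (!nl)
	  break;
	p = static_cast<const char *> (nl) + 1;
      }
  }

  const char *m_begin;
  const char *m_end;
  bool m_indexed = false;
  std::vector<const char *> m_line_starts;
};
```

The buffer itself is never copied; only the line boundaries are cached, which
is the part that would otherwise be recomputed for every diagnostic.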

[...snip...]

> @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file (const char *file_path)
>    global_dc->m_file_cache->forcibly_evict_file (file_path);
>  }
>  
> +void
> +diagnostics_file_cache_forcibly_evict_data (const char *data,
> +                                           unsigned int data_len)
> +{
> +  if (!global_dc->m_file_cache)
> +    return;
> +  global_dc->m_file_cache->forcibly_evict_data (data, data_len);

Maybe we should rename diagnostic_context's m_file_cache to
m_source_cache?  (and class file_cache for that matter?)  But if so,
that can/should be a followup/separate patch.

[...snip...]
 
> @@ -525,10 +582,22 @@ file_cache_slot::create (const file_cache::input_context &in_context,
>    return true;
>  }
>  
> +void
> +data_cache_slot::create (const char *data, unsigned int data_len,
> +                        unsigned int highest_use_count)
> +{
> +  reset ();
> +  on_create (highest_use_count + 1,
> +            total_lines_num (source_id {data, data_len}));
> +  m_data_begin = data;
> +  m_data_end = data + data_len;
> +}
> +
>  /* file_cache's ctor.  */
>  
>  file_cache::file_cache ()
> -: m_file_slots (new file_cache_slot[num_file_slots])
> +  : m_file_slots (new file_cache_slot[num_file_slots]),
> +    m_data_slots (new data_cache_slot[num_file_slots])

Should "num_file_slots" be renamed to "num_slots"?

I assume you're using the same value for both kinds of slot since the
file_cache::evicted_cache_tab_entry template uses this.  I suppose the
number could be passed in as an argument to that function if we wanted
to have different sizes for the two kinds, but I don't think it
matters.
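
For reference, passing the slot count as an argument might look roughly like
the sketch below. This is a simplified, hypothetical model of a slot-eviction
helper (pick_slot_to_evict, demo_slot, use_count and empty_p are made-up
names), not the actual file_cache::evicted_cache_tab_entry.

```cpp
#include <cstddef>

// Hypothetical sketch: choose which of NUM_SLOTS slots to reuse.  An empty
// slot is preferred; otherwise the slot with the lowest use count is chosen.
// If HIGHEST_USE_COUNT is non-null, it receives the largest use count seen.
template <typename Slot>
Slot *
pick_slot_to_evict (Slot *slots, std::size_t num_slots,
		    unsigned *highest_use_count)
{
  Slot *to_evict = &slots[0];
  unsigned huc = to_evict->use_count ();
  for (std::size_t i = 1; i < num_slots; ++i)
    {
      Slot *c = &slots[i];
      const bool better_emptiness = !to_evict->empty_p () && c->empty_p ();
      const bool same_emptiness = to_evict->empty_p () == c->empty_p ();
      if (better_emptiness
	  || (same_emptiness && c->use_count () < to_evict->use_count ()))
	to_evict = c;
      if (huc < c->use_count ())
	huc = c->use_count ();
    }
  if (highest_use_count)
    *highest_use_count = huc;
  return to_evict;
}

// Minimal slot type used only to exercise the template above.
struct demo_slot
{
  unsigned count;
  bool empty;

  unsigned use_count () const { return count; }
  bool empty_p () const { return empty; }
};
```

Because the count is a parameter, the same template could serve two slot
arrays of different sizes, should that ever be wanted.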

[...snip...]

> @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t line_num,
>     If the function fails, a NULL char_span is returned.  */
>  
>  char_span
> -location_get_source_line (const char *file_path, int line)
> +location_get_source_line (source_id src, int line)
>  {
> -  const char *buffer = NULL;
> -  ssize_t len;
> -
> -  if (line == 0)
> -    return char_span (NULL, 0);
> -
> -  if (file_path == NULL)
> -    return char_span (NULL, 0);
> +  const char_span fail (nullptr, 0);
> +  if (!src || line <= 0)
> +    return fail;

Looking at source_id's operator bool, are there effectively three kinds
of source_id?

(a) file names
(b) generated buffer
(c) NULL == m_filename_or_buffer

What does (c) mean?  Is it a "something's gone wrong/error" state?  Or
is this more a special-case of (a)? (in that the m_len for such a case
would be zero)

Should source_id's 2-param ctor have an assert that the ptr is non-
NULL?
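
To spell out the three states being asked about, here is a rough sketch.
It is not the source_id class from the patch; the class name, the member
names and the "length of zero means file name" convention are assumptions
made for illustration, based only on the description above:

```cpp
#include <cassert>

// Hypothetical stand-in for the patch's source_id; not the real class.
class source_id_sketch
{
public:
  // State (c): no source at all; operator bool returns false.
  source_id_sketch () : m_ptr (nullptr), m_len (0) {}

  // State (a): a NUL-terminated file name; m_len stays zero.
  explicit source_id_sketch (const char *filename)
    : m_ptr (filename), m_len (0) {}

  // State (b): an in-memory buffer of LEN bytes.
  source_id_sketch (const char *buffer, unsigned int len)
    : m_ptr (buffer), m_len (len)
  {
    // The assert suggested in the review: a buffer must not be null.
    assert (buffer != nullptr);
    // Assumption of this sketch only: a zero length would be
    // indistinguishable from a file name, so reject it.
    assert (len != 0);
  }

  // True for states (a) and (b), false for state (c).
  explicit operator bool () const { return m_ptr != nullptr; }

  // True only for state (b).
  bool is_buffer () const { return m_len != 0; }

  const char *ptr () const { return m_ptr; }
  unsigned int len () const { return m_len; }

private:
  const char *m_ptr;
  unsigned int m_len;
};
```

With the asserts in the 2-param ctor, state (c) can only be reached via
the default ctor (or by passing a null file name), which would make it
easier to say whether (c) is an error state or just "no file".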

[...snip...]

The patch is OK for trunk as-is, but note the question about the
source_id ctor above.

Thanks
Dave



^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 5/8] diagnostics: Support testing generated data in input.cc selftests
  2023-08-09 22:14         ` [PATCH v4 5/8] diagnostics: Support testing generated data in input.cc selftests Lewis Hyatt
@ 2023-08-15 16:27           ` David Malcolm
  0 siblings, 0 replies; 36+ messages in thread
From: David Malcolm @ 2023-08-15 16:27 UTC (permalink / raw)
  To: Lewis Hyatt, gcc-patches

On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> Add selftests for the new capabilities in input.cc related to source code
> locations that are stored in memory rather than ordinary files.
> 
> gcc/ChangeLog:
> 
>         * input.cc (temp_source_file::do_linemap_add): New function.
>         (line_table_case::line_table_case): Add GENERATED_DATA argument.
>         (line_table_test::line_table_test): Implement new M_GENERATED_DATA
>         argument.
>         (for_each_line_table_case): Optionally include generated data
>         locations in the set of cases.
>         (test_accessing_ordinary_linemaps): Test generated data locations.
>         (test_make_location_nonpure_range_endpoints): Likewise.
>         (test_line_offset_overflow): Likewise.
>         (input_cc_tests): Likewise.
>         * selftest.cc (named_temp_file::named_temp_file): Interpret a null
>         SUFFIX argument as a request to use in-memory data.
>         (named_temp_file::~named_temp_file): Support in-memory data.
>         (temp_source_file::temp_source_file): Likewise.
>         (temp_source_file::~temp_source_file): Likewise.
>         * selftest.h (struct line_map_ordinary): Forward declare.
>         (class named_temp_file): Add missing explicit to the constructor.
>         (class temp_source_file): Add new members to support in-memory data.
>         (class line_table_test): Likewise.
>         (for_each_line_table_case): Adjust prototype.
> ---
>  gcc/input.cc    | 81 +++++++++++++++++++++++++++++++++----------------
>  gcc/selftest.cc | 53 +++++++++++++++++++++++++-------
>  gcc/selftest.h  | 19 ++++++++++--
>  3 files changed, 113 insertions(+), 40 deletions(-)
> 

Thanks; looks good to me.

Dave


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 6/8] diagnostics: Full support for generated data locations
  2023-08-09 22:14         ` [PATCH v4 6/8] diagnostics: Full support for generated data locations Lewis Hyatt
@ 2023-08-15 16:39           ` David Malcolm
  0 siblings, 0 replies; 36+ messages in thread
From: David Malcolm @ 2023-08-15 16:39 UTC (permalink / raw)
  To: Lewis Hyatt, gcc-patches

On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> Previous patches in this series have laid the groundwork for supporting
> source code locations in memory ("generated data") rather than ordinary
> files. This patch completes the support by adding awareness of such
> locations to all places that need to support them. The main changes are to
> diagnostic-show-locus.cc; the others are primarily small tweaks such as
> changing from the FILE to the SRC member when inspecting an
> expanded_location.
> 
> gcc/c-family/ChangeLog:
> 
>         * c-format.cc (get_corrected_substring): Use the new overload of
>         location_get_source_line() to support generated data.
>         * c-indentation.cc (get_visual_column): Likewise.
>         (get_first_nws_vis_column): Change argument from a plain file name
>         to a source_id.
>         (detect_intervening_unindent): Likewise.
>         (should_warn_for_misleading_indentation): Pass
>         detect_intervening_unindent() the SRC field rather than the FILE
>         field from the expanded_location.
> 
> gcc/ChangeLog:
> 
>         * gcc-rich-location.cc (blank_line_before_p): Use the new overload
>         of location_get_source_line() to support generated data.
>         * input.cc (get_source_text_between): Likewise.
>         (get_substring_ranges_for_loc): Likewise.
>         (get_source_file_content): Change the argument from a plain filename
>         to a source_id.
>         (location_missing_trailing_newline): Likewise.
>         * input.h (get_source_file_content): Adjust prototype.
>         (location_missing_trailing_newline): Likewise.
>         * diagnostic-show-locus.cc (layout::calculate_x_offset_display): Use
>         the new overload of location_get_source_line() to support generated
>         data.
>         (layout::print_line): Likewise.
>         (class line_corrections): Change m_filename from a plain filename to
>         a source_id.
>         (source_line::source_line): Change argument from a plain filename to
>         a source_id.
>         (line_corrections::add_hint): Adapt to source_line change.
>         (layout::print_trailing_fixits): Adapt to line_corrections change.
>         (test_layout_x_offset_display_utf8): Test generated data too.
>         (test_layout_x_offset_display_tab): Likewise.
>         (test_diagnostic_show_locus_one_liner): Likewise.
>         (test_diagnostic_show_locus_one_liner_utf8): Likewise.
>         (test_add_location_if_nearby): Likewise.
>         (test_diagnostic_show_locus_fixit_lines): Likewise.
>         (test_fixit_consolidation): Likewise.
>         (test_overlapped_fixit_printing): Likewise.
>         (test_overlapped_fixit_printing_utf8): Likewise.
>         (test_overlapped_fixit_printing_2): Likewise.
>         (test_fixit_insert_containing_newline): Likewise.
>         (test_fixit_insert_containing_newline_2): Likewise.
>         (test_fixit_replace_containing_newline): Likewise.
>         (test_fixit_deletion_affecting_newline): Likewise.
>         (test_tab_expansion): Likewise.
>         (test_escaping_bytes_1): Likewise.
>         (test_escaping_bytes_2): Likewise.
>         (test_line_numbers_multiline_range): Likewise.
>         (diagnostic_show_locus_cc_tests): Likewise.
> ---
>  gcc/c-family/c-format.cc      |   2 +-
>  gcc/c-family/c-indentation.cc |   8 +-
>  gcc/diagnostic-show-locus.cc  | 227 ++++++++++++++++++----------------
>  gcc/gcc-rich-location.cc      |   2 +-
>  gcc/input.cc                  |  21 ++--
>  gcc/input.h                   |   6 +-
>  6 files changed, 136 insertions(+), 130 deletions(-)
> 

Looks OK for trunk as-is (assuming prerequisites, of course), but as I
think you noted elsewhere this probably needs revising if we're going
to reject applying fix-it-hints to locations in generated data buffers.

Thanks
Dave

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 8/8] diagnostics: Support generated data locations in SARIF output
  2023-08-09 22:14         ` [PATCH v4 8/8] diagnostics: Support generated data locations in SARIF output Lewis Hyatt
@ 2023-08-15 17:04           ` David Malcolm
  2023-08-15 17:51             ` Lewis Hyatt
  0 siblings, 1 reply; 36+ messages in thread
From: David Malcolm @ 2023-08-15 17:04 UTC (permalink / raw)
  To: Lewis Hyatt, gcc-patches

On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> The diagnostics routines for SARIF output need to read the source code back
> in, so that they can generate "snippet" and "content" records, so they need to
> be able to cope with generated data locations.  Add support for that in
> diagnostic-format-sarif.cc.
> 
> gcc/ChangeLog:
> 
>         * diagnostic-format-sarif.cc (class sarif_builder): Adapt interface
>         to support generated data locations.
>         (sarif_builder::maybe_make_physical_location_object): Change the
>         m_filenames hash_set to support generated data.
>         (sarif_builder::make_artifact_location_object): Use a source_id rather
>         than a plain file name.
>         (sarif_builder::maybe_make_region_object): Adapt to
>         expanded_location interface changes.
>         (sarif_builder::maybe_make_region_object_for_context): Likewise.
>         (sarif_builder::make_artifact_object): Likewise.
>         (sarif_builder::make_run_object): Handle generated data.
>         (sarif_builder::maybe_make_artifact_content_object): Likewise.
>         (get_source_lines): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>         * c-c++-common/diagnostic-format-sarif-file-5.c: New test.

I'm not sure if generated data is allowed as part of a SARIF artefact,
or if there's a more standard-compliant way of representing this; SARIF
says an artefact is a "sequence of bytes addressable via a URI".

Can you post a simple example of the generated .sarif JSON please?
e.g. from the new test, so that we can see what it looks like.

You could run it through:

  python -m json.tool 

to format it for easier reading.


Thanks
Dave


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 8/8] diagnostics: Support generated data locations in SARIF output
  2023-08-15 17:04           ` David Malcolm
@ 2023-08-15 17:51             ` Lewis Hyatt
  0 siblings, 0 replies; 36+ messages in thread
From: Lewis Hyatt @ 2023-08-15 17:51 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3534 bytes --]

On Tue, Aug 15, 2023 at 01:04:04PM -0400, David Malcolm wrote:
> On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > The diagnostics routines for SARIF output need to read the source code back
> > in, so that they can generate "snippet" and "content" records, so they need to
> > be able to cope with generated data locations.  Add support for that in
> > diagnostic-format-sarif.cc.
> > 
> > gcc/ChangeLog:
> > 
> >         * diagnostic-format-sarif.cc (class sarif_builder): Adapt interface
> >         to support generated data locations.
> >         (sarif_builder::maybe_make_physical_location_object): Change the
> >         m_filenames hash_set to support generated data.
> >         (sarif_builder::make_artifact_location_object): Use a source_id rather
> >         than a plain file name.
> >         (sarif_builder::maybe_make_region_object): Adapt to
> >         expanded_location interface changes.
> >         (sarif_builder::maybe_make_region_object_for_context): Likewise.
> >         (sarif_builder::make_artifact_object): Likewise.
> >         (sarif_builder::make_run_object): Handle generated data.
> >         (sarif_builder::maybe_make_artifact_content_object): Likewise.
> >         (get_source_lines): Likewise.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >         * c-c++-common/diagnostic-format-sarif-file-5.c: New test.
> 
> I'm not sure if generated data is allowed as part of a SARIF artefact,
> or if there's a more standard-compliant way of representing this; SARIF
> says an artefact is a "sequence of bytes addressable via a URI".
> 
> Can you post a simple example of the generated .sarif JSON please?
> e.g. from the new test, so that we can see what it looks like.
> 
> You could run it through:
> 
>   python -m json.tool 
> 
> to format it for easier reading.

For a simple example like:

_Pragma("GCC diagnostic ignored \"-Wnot-an-option\"")

for which the normal output is:

=====
In buffer generated from t.cpp:1:
<generated>:1:24: warning: unknown option after ‘#pragma GCC diagnostic’ kind [-Wpragmas]
    1 | GCC diagnostic ignored "-Wnot-an-option"
      |                        ^~~~~~~~~~~~~~~~~
t.cpp:1:1: note: in <_Pragma directive>
    1 | _Pragma("GCC diagnostic ignored \"-Wnot-an-option\"")
      | ^~~~~~~
=====

The SARIF output does not end up referencing any generated data locations,
because those are logically part of the "expansion" of the _Pragma
directive, and it doesn't output macro expansions.  In order for SARIF to
currently do something with generated data, it needs to see a generated data
location in a non-macro context. The only way to get GCC to do that, right
now, is with -fdump-internal-locations, which is what the new test case
does. That just unfortunately generates a larger amount of output. I attached
it, in case that's still helpful, for the following program:

=====
_Pragma("GCC diagnostic push")
=====

I guess there's potentially already a problem here because 'python -m
json.tool' is unhappy with this output and refuses to process it:

=====
Invalid \escape: line 1 column 3436 (char 3435)
=====

The related text is:
=====
{"location": {"uri": "<generated>", "uriBaseId": "PWD"},
"contents":{"text": "GCC diagnostic push\n\0"}
=====

And the \0 is not allowed it seems?

I also attached the output of 'python -m json.tool' anyway, after manually
removing the \0.
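
For reference on the escaping itself: JSON has no \0 escape, and control
characters below U+0020 have to be written either with one of the short
forms (such as \n or \t) or as \uXXXX.  A minimal sketch of an escaper
that does this follows.  The function name is hypothetical and this is
not the code in GCC's JSON support; it also says nothing about whether
the trailing NUL should be counted in the buffer length in the first
place, which is a separate question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: escape LEN bytes of DATA for use inside a JSON
// string literal (without the surrounding quotes).
std::string
json_escape_sketch (const char *data, std::size_t len)
{
  assert (data != nullptr || len == 0);
  std::string out;
  for (std::size_t i = 0; i < len; ++i)
    {
      unsigned char c = static_cast<unsigned char> (data[i]);
      switch (c)
        {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n"; break;
        case '\r': out += "\\r"; break;
        case '\t': out += "\\t"; break;
        default:
          if (c < 0x20)
            {
              // Remaining control characters, including NUL, need the
              // six-character \uXXXX form.
              char buf[8];
              std::snprintf (buf, sizeof buf, "\\u%04x",
                             static_cast<unsigned int> (c));
              out += buf;
            }
          else
            out += static_cast<char> (c);
          break;
        }
    }
  return out;
}
```

Fed the 21 bytes "GCC diagnostic push\n" plus the terminating NUL, this
produces a string ending in \n\u0000 rather than \n\0; fed only the first
20 bytes, the NUL never appears at all.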

Is it better to just skip these locations for now?

-Lewis

[-- Attachment #2: t.cpp.sarif --]
[-- Type: text/plain, Size: 5872 bytes --]

{"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json", "version": "2.1.0", "runs": [{"tool": {"driver": {"name": "GNU C++17", "fullName": "GNU C++17 (GCC) version 14.0.0 20230811 (experimental) (x86_64-pc-linux-gnu)", "version": "14.0.0 20230811 (experimental)", "informationUri": "https://gcc.gnu.org/gcc-14/", "rules": []}}, "invocations": [{"executionSuccessful": true, "toolExecutionNotifications": []}], "originalUriBaseIds": {"PWD": {"uri": "file:///home/lewis/"}}, "artifacts": [{"location": {"uri": "t.cpp", "uriBaseId": "PWD"}, "contents": {"text": "_Pragma(\"GCC diagnostic push\")\n"}, "sourceLanguage": "cplusplus"}, {"location": {"uri": "/usr/include/stdc-predef.h"}, "contents": {"text": "/* Copyright (C) 1991-2022 Free Software Foundation, Inc.\n   This file is part of the GNU C Library.\n\n   The GNU C Library is free software; you can redistribute it and/or\n   modify it under the terms of the GNU Lesser General Public\n   License as published by the Free Software Foundation; either\n   version 2.1 of the License, or (at your option) any later version.\n\n   The GNU C Library is distributed in the hope that it will be useful,\n   but WITHOUT ANY WARRANTY; without even the implied warranty of\n   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU\n   Lesser General Public License for more details.\n\n   You should have received a copy of the GNU Lesser General Public\n   License along with the GNU C Library; if not, see\n   <https://www.gnu.org/licenses/>.  */\n\n#ifndef\t_STDC_PREDEF_H\n#define\t_STDC_PREDEF_H\t1\n\n/* This header is separate from features.h so that the compiler can\n   include it implicitly at the start of every compilation.  
It must\n   not itself include <features.h> or any other header that includes\n   <features.h> because the implicit include comes before any feature\n   test macros that may be defined in a source file before it first\n   explicitly includes a system header.  GCC knows the name of this\n   header in order to preinclude it.  */\n\n/* glibc's intent is to support the IEC 559 math functionality, real\n   and complex.  If the GCC (4.9 and later) predefined macros\n   specifying compiler intent are available, use them to determine\n   whether the overall intent is to support these features; otherwise,\n   presume an older compiler has intent to support these features and\n   define these macros by default.  */\n\n#ifdef __GCC_IEC_559\n# if __GCC_IEC_559 > 0\n#  define __STDC_IEC_559__\t\t1\n#  define __STDC_IEC_60559_BFP__ \t201404L\n# endif\n#else\n# define __STDC_IEC_559__\t\t1\n# define __STDC_IEC_60559_BFP__ \t201404L\n#endif\n\n#ifdef __GCC_IEC_559_COMPLEX\n# if __GCC_IEC_559_COMPLEX > 0\n#  define __STDC_IEC_559_COMPLEX__\t1\n#  define __STDC_IEC_60559_COMPLEX__\t201404L\n# endif\n#else\n# define __STDC_IEC_559_COMPLEX__\t1\n# define __STDC_IEC_60559_COMPLEX__\t201404L\n#endif\n\n/* wchar_t uses Unicode 10.0.0.  
Version 10.0 of the Unicode Standard is\n   synchronized with ISO/IEC 10646:2017, fifth edition, plus\n   the following additions from Amendment 1 to the fifth edition:\n   - 56 emoji characters\n   - 285 hentaigana\n   - 3 additional Zanabazar Square characters */\n#define __STDC_ISO_10646__\t\t201706L\n\n#endif\n"}, "sourceLanguage": "cplusplus"}, {"location": {"uri": "<generated>", "uriBaseId": "PWD"}, "contents": {"text": "GCC diagnostic push\n\0"}, "sourceLanguage": "cplusplus"}], "results": [{"ruleId": "note", "level": "note", "message": {"text": "expansion point is location 258918"}, "locations": [{"physicalLocation": {"artifactLocation": {"uri": "t.cpp", "uriBaseId": "PWD"}, "region": {"startLine": 1, "startColumn": 1, "endColumn": 8}, "contextRegion": {"startLine": 1, "snippet": {"text": "_Pragma(\"GCC diagnostic push\")\n"}}}}]}, {"ruleId": "note", "level": "note", "message": {"text": "token 0 has ‘x-location == y-location == 259906’"}, "locations": [{"physicalLocation": {"artifactLocation": {"uri": "<generated>", "uriBaseId": "PWD"}, "region": {"startLine": 1, "startColumn": 1, "endColumn": 4}, "contextRegion": {"startLine": 1, "snippet": {"text": "GCC diagnostic push\n"}}}}]}, {"ruleId": "note", "level": "note", "message": {"text": "token 1 has ‘x-location == y-location == 260387’"}, "locations": [{"physicalLocation": {"artifactLocation": {"uri": "<generated>", "uriBaseId": "PWD"}, "region": {"startLine": 1, "startColumn": 16, "endColumn": 20}, "contextRegion": {"startLine": 1, "snippet": {"text": "GCC diagnostic push\n"}}}}]}, {"ruleId": "note", "level": "note", "message": {"text": "token 2 has ‘x-location == y-location == 260512’"}, "locations": [{"physicalLocation": {"artifactLocation": {"uri": "<generated>", "uriBaseId": "PWD"}, "region": {"startLine": 1, "startColumn": 20, "endColumn": 21}, "contextRegion": {"startLine": 1, "snippet": {"text": "GCC diagnostic push\n"}}}}]}, {"ruleId": "note", "level": "note", "message": {"text": "expansion point is 
location 189172"}, "locations": [{"physicalLocation": {"artifactLocation": {"uri": "/usr/include/stdc-predef.h"}, "region": {"startLine": 47, "startColumn": 6, "endColumn": 27}, "contextRegion": {"startLine": 47, "snippet": {"text": "# if __GCC_IEC_559_COMPLEX > 0\n"}}}}]}, {"ruleId": "note", "level": "note", "message": {"text": "token 0 has ‘x-location == y-location == 1’"}, "locations": [{}]}, {"ruleId": "note", "level": "note", "message": {"text": "expansion point is location 148204"}, "locations": [{"physicalLocation": {"artifactLocation": {"uri": "/usr/include/stdc-predef.h"}, "region": {"startLine": 37, "startColumn": 6, "endColumn": 19}, "contextRegion": {"startLine": 37, "snippet": {"text": "# if __GCC_IEC_559 > 0\n"}}}}]}, {"ruleId": "note", "level": "note", "message": {"text": "token 0 has ‘x-location == y-location == 1’"}, "locations": [{}]}]}]}

[-- Attachment #3: t.cpp.json --]
[-- Type: text/plain, Size: 12185 bytes --]

{
    "$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
    "runs": [
        {
            "artifacts": [
                {
                    "contents": {
                        "text": "_Pragma(\"GCC diagnostic push\")\n"
                    },
                    "location": {
                        "uri": "t.cpp",
                        "uriBaseId": "PWD"
                    },
                    "sourceLanguage": "cplusplus"
                },
                {
                    "contents": {
                        "text": "/* Copyright (C) 1991-2022 Free Software Foundation, Inc.\n   This file is part of the GNU C Library.\n\n   The GNU C Library is free software; you can redistribute it and/or\n   modify it under the terms of the GNU Lesser General Public\n   License as published by the Free Software Foundation; either\n   version 2.1 of the License, or (at your option) any later version.\n\n   The GNU C Library is distributed in the hope that it will be useful,\n   but WITHOUT ANY WARRANTY; without even the implied warranty of\n   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU\n   Lesser General Public License for more details.\n\n   You should have received a copy of the GNU Lesser General Public\n   License along with the GNU C Library; if not, see\n   <https://www.gnu.org/licenses/>.  */\n\n#ifndef\t_STDC_PREDEF_H\n#define\t_STDC_PREDEF_H\t1\n\n/* This header is separate from features.h so that the compiler can\n   include it implicitly at the start of every compilation..  It must\n   not itself include <features.h> or any other header that includes\n   <features.h> because the implicit include comes before any feature\n   test macros that may be defined in a source file before it first\n   explicitly includes a system header.  GCC knows the name of this\n   header in order to preinclude it.  */\n\n/* glibc's intent is to support the IEC 559 math functionality, real\n   and complex.  If the GCC (4.9 and later) predefined macros\n   specifying compiler intent are available, use them to determine\n   whether the overall intent is to support these features; otherwise,\n   presume an older compiler has intent to support these features and\n   define these macros by default.  
*/\n\n#ifdef __GCC_IEC_559\n# if __GCC_IEC_559 > 0\n#  define __STDC_IEC_559__\t\t1\n#  define __STDC_IEC_60559_BFP__ \t201404L\n# endif\n#else\n# define __STDC_IEC_559__\t\t1\n# define __STDC_IEC_60559_BFP__ \t201404L\n#endif\n\n#ifdef __GCC_IEC_559_COMPLEX\n# if __GCC_IEC_559_COMPLEX > 0\n#  define __STDC_IEC_559_COMPLEX__\t1\n#  define __STDC_IEC_60559_COMPLEX__\t201404L\n# endif\n#else\n# define __STDC_IEC_559_COMPLEX__\t1\n# define __STDC_IEC_60559_COMPLEX__\t201404L\n#endif\n\n/* wchar_t uses Unicode 10.0.0.  Version 10.0 of the Unicode Standard is\n   synchronized with ISO/IEC 10646:2017, fifth edition, plus\n   the following additions from Amendment 1 to the fifth edition:\n   - 56 emoji characters\n   - 285 hentaigana\n   - 3 additional Zanabazar Square characters */\n#define __STDC_ISO_10646__\t\t201706L\n\n#endif\n"
                    },
                    "location": {
                        "uri": "/usr/include/stdc-predef.h"
                    },
                    "sourceLanguage": "cplusplus"
                },
                {
                    "contents": {
                        "text": "GCC diagnostic push\n"
                    },
                    "location": {
                        "uri": "<generated>",
                        "uriBaseId": "PWD"
                    },
                    "sourceLanguage": "cplusplus"
                }
            ],
            "invocations": [
                {
                    "executionSuccessful": true,
                    "toolExecutionNotifications": []
                }
            ],
            "originalUriBaseIds": {
                "PWD": {
                    "uri": "file:///home/lewis/"
                }
            },
            "results": [
                {
                    "level": "note",
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {
                                    "uri": "t.cpp",
                                    "uriBaseId": "PWD"
                                },
                                "contextRegion": {
                                    "snippet": {
                                        "text": "_Pragma(\"GCC diagnostic push\")\n"
                                    },
                                    "startLine": 1
                                },
                                "region": {
                                    "endColumn": 8,
                                    "startColumn": 1,
                                    "startLine": 1
                                }
                            }
                        }
                    ],
                    "message": {
                        "text": "expansion point is location 258918"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {
                                    "uri": "<generated>",
                                    "uriBaseId": "PWD"
                                },
                                "contextRegion": {
                                    "snippet": {
                                        "text": "GCC diagnostic push\n"
                                    },
                                    "startLine": 1
                                },
                                "region": {
                                    "endColumn": 4,
                                    "startColumn": 1,
                                    "startLine": 1
                                }
                            }
                        }
                    ],
                    "message": {
                        "text": "token 0 has \u2018x-location == y-location == 259906\u2019"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {
                                    "uri": "<generated>",
                                    "uriBaseId": "PWD"
                                },
                                "contextRegion": {
                                    "snippet": {
                                        "text": "GCC diagnostic push\n"
                                    },
                                    "startLine": 1
                                },
                                "region": {
                                    "endColumn": 20,
                                    "startColumn": 16,
                                    "startLine": 1
                                }
                            }
                        }
                    ],
                    "message": {
                        "text": "token 1 has \u2018x-location == y-location == 260387\u2019"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {
                                    "uri": "<generated>",
                                    "uriBaseId": "PWD"
                                },
                                "contextRegion": {
                                    "snippet": {
                                        "text": "GCC diagnostic push\n"
                                    },
                                    "startLine": 1
                                },
                                "region": {
                                    "endColumn": 21,
                                    "startColumn": 20,
                                    "startLine": 1
                                }
                            }
                        }
                    ],
                    "message": {
                        "text": "token 2 has \u2018x-location == y-location == 260512\u2019"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {
                                    "uri": "/usr/include/stdc-predef.h"
                                },
                                "contextRegion": {
                                    "snippet": {
                                        "text": "# if __GCC_IEC_559_COMPLEX > 0\n"
                                    },
                                    "startLine": 47
                                },
                                "region": {
                                    "endColumn": 27,
                                    "startColumn": 6,
                                    "startLine": 47
                                }
                            }
                        }
                    ],
                    "message": {
                        "text": "expansion point is location 189172"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {}
                    ],
                    "message": {
                        "text": "token 0 has \u2018x-location == y-location == 1\u2019"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {
                                    "uri": "/usr/include/stdc-predef.h"
                                },
                                "contextRegion": {
                                    "snippet": {
                                        "text": "# if __GCC_IEC_559 > 0\n"
                                    },
                                    "startLine": 37
                                },
                                "region": {
                                    "endColumn": 19,
                                    "startColumn": 6,
                                    "startLine": 37
                                }
                            }
                        }
                    ],
                    "message": {
                        "text": "expansion point is location 148204"
                    },
                    "ruleId": "note"
                },
                {
                    "level": "note",
                    "locations": [
                        {}
                    ],
                    "message": {
                        "text": "token 0 has \u2018x-location == y-location == 1\u2019"
                    },
                    "ruleId": "note"
                }
            ],
            "tool": {
                "driver": {
                    "fullName": "GNU C++17 (GCC) version 14.0.0 20230811 (experimental) (x86_64-pc-linux-gnu)",
                    "informationUri": "https://gcc.gnu.org/gcc-14/",
                    "name": "GNU C++17",
                    "rules": [],
                    "version": "14.0.0 20230811 (experimental)"
                }
            }
        }
    ],
    "version": "2.1.0"
}

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v4 3/8] diagnostics: Refactor class file_cache_slot
  2023-08-15 15:43           ` David Malcolm
@ 2023-08-15 17:58             ` Lewis Hyatt
  2023-08-15 19:39               ` David Malcolm
  0 siblings, 1 reply; 36+ messages in thread
From: Lewis Hyatt @ 2023-08-15 17:58 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

On Tue, Aug 15, 2023 at 11:43:05AM -0400, David Malcolm wrote:
> On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > Class file_cache_slot in input.cc is used to query specific lines of source
> > code from a file when needed by diagnostics infrastructure. This will be
> > extended in a subsequent patch to support obtaining the source code from
> > in-memory generated buffers rather than from a file. The present patch
> > refactors class file_cache_slot, putting most of the logic into a new base
> > class cache_data_source, in preparation for reusing that code in the next
> > patch. There is no change in functionality yet.
> > 
> > gcc/ChangeLog:
> > 
> >         * input.cc (class file_cache_slot): Refactor functionality into a
> >         new base class...
> >         (class cache_data_source): ...here.
> >         (file_cache::forcibly_evict_file): Adapt for refactoring.
> >         (file_cache_slot::evict): Renamed to...
> >         (file_cache_slot::reset): ...this, and partially refactored into
> >         base class...
> >         (cache_data_source::reset): ...here.
> >         (file_cache_slot::get_full_file_content): Moved into base class...
> >         (cache_data_source::get_full_file_content): ...here.
> >         (file_cache_slot::create): Adapt for refactoring.
> >         (file_cache_slot::file_cache_slot): Refactor partially into...
> >         (cache_data_source::cache_data_source): ...here.
> >         (file_cache_slot::~file_cache_slot): Refactor partially into...
> >         (cache_data_source::~cache_data_source): ...here.
> >         (file_cache_slot::needs_read_p): Remove.
> >         (file_cache_slot::needs_grow_p): Remove.
> >         (file_cache_slot::maybe_grow): Adapt for refactoring.
> >         (file_cache_slot::read_data): Refactored, along with...
> >         (file_cache_slot::maybe_read_data): this, into...
> >         (file_cache_slot::get_more_data): ...here.
> >         (find_end_of_line): Change interface to take a pair of pointers,
> >         rather than a pointer + length.
> >         (file_cache_slot::get_next_line): Refactored into...
> >         (cache_data_source::get_next_line): ...here.
> >         (file_cache_slot::goto_next_line): Refactored into...
> >         (cache_data_source::goto_next_line): ...here.
> >         (file_cache_slot::read_line_num): Refactored into...
> >         (cache_data_source::read_line_num): ...here.
> >         (location_get_source_line): Fix const-correctness as necessitated by
> >         new interface.
> > ---
> >  gcc/input.cc | 513 +++++++++++++++++++++++----------------------------
> >  1 file changed, 235 insertions(+), 278 deletions(-)
> > 
> 
> I confess I had to reread both this and patch 4/8 to make sense of
> this; this is probably one of those cases where it's harder to read in
> patch form than as source, but I think I now understand the new
> implementation.

Yes, sorry about that. I hope at least splitting into two patches here made it
a little easier.

> 
> Did you try testing this with valgrind (e.g. "make selftest-valgrind")?
>

Oh interesting, was not aware of this. I think it shows that new leaks were
not introduced with the patch series.

BEFORE patch series:
==1572278==
-fself-test: 7634593 pass(es) in 22.799240 seconds
==1572278==
==1572278== HEAP SUMMARY:
==1572278==     in use at exit: 1,083,255 bytes in 2,394 blocks
==1572278==   total heap usage: 2,704,869 allocs, 2,702,475 frees, 1,257,334,536 bytes allocated
==1572278==
==1572278== 8,032 bytes in 1 blocks are possibly lost in loss record 639 of 657
==1572278==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1572278==    by 0x21FE1CB: xmalloc (xmalloc.c:149)
==1572278==    by 0x21B02E0: new_buff (lex.cc:4767)
==1572278==    by 0x21B02E0: _cpp_get_buff (lex.cc:4800)
==1572278==    by 0x21ACC80: cpp_create_reader(c_lang, ht*, line_maps*) (init.cc:289)
==1572278==    by 0xA64282: c_common_init_options(unsigned int, cl_decoded_option*) (c-opts.cc:237)
==1572278==    by 0x95E479: toplev::main(int, char**) (toplev.cc:2241)
==1572278==    by 0x960B2D: main (main.cc:39)
==1572278==
==1572278== LEAK SUMMARY:
==1572278==    definitely lost: 0 bytes in 0 blocks
==1572278==    indirectly lost: 0 bytes in 0 blocks
==1572278==      possibly lost: 8,032 bytes in 1 blocks
==1572278==    still reachable: 1,075,223 bytes in 2,393 blocks
==1572278==         suppressed: 0 bytes in 0 blocks
==1572278== Reachable blocks (those to which a pointer was found) are not shown.
==1572278== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1572278==
==1572278== For lists of detected and suppressed errors, rerun with: -s
==1572278== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

AFTER patch series:
==1594840==
-fself-test: 7638403 pass(es) in 23.671784 seconds
==1594840==
==1594840== HEAP SUMMARY:
==1594840==     in use at exit: 1,081,759 bytes in 2,367 blocks
==1594840==   total heap usage: 2,728,561 allocs, 2,726,194 frees, 1,272,214,526 bytes allocated
==1594840==
==1594840== 8,032 bytes in 1 blocks are possibly lost in loss record 609 of 628
==1594840==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==1594840==    by 0x2200CCB: xmalloc (xmalloc.c:149)
==1594840==    by 0x21B2440: new_buff (lex.cc:4767)
==1594840==    by 0x21B2440: _cpp_get_buff (lex.cc:4800)
==1594840==    by 0x21AEDA0: cpp_create_reader(c_lang, ht*, line_maps*) (init.cc:289)
==1594840==    by 0xA64592: c_common_init_options(unsigned int, cl_decoded_option*) (c-opts.cc:237)
==1594840==    by 0x95E529: toplev::main(int, char**) (toplev.cc:2241)
==1594840==    by 0x960BDD: main (main.cc:39)
==1594840==
==1594840== LEAK SUMMARY:
==1594840==    definitely lost: 0 bytes in 0 blocks
==1594840==    indirectly lost: 0 bytes in 0 blocks
==1594840==      possibly lost: 8,032 bytes in 1 blocks
==1594840==    still reachable: 1,073,727 bytes in 2,366 blocks
==1594840==         suppressed: 0 bytes in 0 blocks

> I don't think we have any selftest coverage for "\r" in the line-break
> handling; that would be good to add.
> 
> This patch is OK for trunk once the rest of the kit is approved.

Thank you. To be clear, were you suggesting to add selftest coverage for \r
endings now, or in a follow up?

-Lewis

* Re: [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers
  2023-08-15 16:15           ` David Malcolm
@ 2023-08-15 18:15             ` Lewis Hyatt
  2023-08-15 19:46               ` David Malcolm
  0 siblings, 1 reply; 36+ messages in thread
From: Lewis Hyatt @ 2023-08-15 18:15 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

On Tue, Aug 15, 2023 at 12:15:15PM -0400, David Malcolm wrote:
> On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > This patch enhances location_get_source_line(), which is the primary
> > interface provided by the diagnostics infrastructure to obtain the line of
> > source code corresponding to a given location, so that it understands
> > generated data locations in addition to normal file-based locations. This
> > involves changing the argument to location_get_source_line() from a plain
> > file name, to a source_id object that can represent either type of location.
> > 
> > gcc/ChangeLog:
> > 
> >         * input.cc (class data_cache_slot): New class.
> >         (file_cache::lookup_data): New function.
> >         (diagnostics_file_cache_forcibly_evict_data): New function.
> >         (file_cache::forcibly_evict_data): New function.
> >         (file_cache::evicted_cache_tab_entry): Generalize (via a template)
> >         to work for both file_cache_slot and data_cache_slot.
> >         (file_cache::add_file): Adapt for new interface to
> >         evicted_cache_tab_entry.
> >         (file_cache::add_data): New function.
> >         (data_cache_slot::create): New function.
> >         (file_cache::file_cache): Support the new m_data_slots member.
> >         (file_cache::~file_cache): Likewise.
> >         (file_cache::lookup_or_add_data): New function.
> >         (file_cache::lookup_or_add): New function that calls either
> >         lookup_or_add_data or lookup_or_add_file as appropriate.
> >         (location_get_source_line): Change the FILE_PATH argument to a
> >         source_id SRC, and use it to support obtaining source lines from
> >         generated data as well as from files.
> >         (location_compute_display_column): Support generated data using the
> >         new features of location_get_source_line.
> >         (dump_location_info): Likewise.
> >         * input.h (location_get_source_line): Adjust prototype. Add a new
> >         convenience overload taking an expanded_location.
> >         (class cache_data_source): Declare.
> >         (class data_cache_slot): Declare.
> >         (class file_cache): Declare new members.
> >         (diagnostics_file_cache_forcibly_evict_data): Declare.
> > ---
> >  gcc/input.cc | 171 ++++++++++++++++++++++++++++++++++++++++-----------
> >  gcc/input.h  |  23 +++++--
> >  2 files changed, 153 insertions(+), 41 deletions(-)
> > 
> > diff --git a/gcc/input.cc b/gcc/input.cc
> > index 9377020b460..790279d4273 100644
> > --- a/gcc/input.cc
> > +++ b/gcc/input.cc
> > @@ -207,6 +207,28 @@ private:
> >    void maybe_grow ();
> >  };
> >  
> > +/* This is the implementation of cache_data_source for generated
> > +   data that is already in memory.  */
> > +class data_cache_slot final : public cache_data_source
> 
> It occurred to me: why are we caching accessing a buffer that's already
> in memory - but we're also caching the line-splitting information, and
> providing the line-splitting algorithm with a consistent interface to
> the data, right?
>

Yeah, for the current _Pragma use case, multi-line buffers are not going to
be common, but they can occur. I was mainly motivated by the consistent
interface, and by the assumption that the overhead is not critical given a
diagnostic is being issued.
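
To make the "cache the line-splitting information" point concrete, here is a
minimal sketch of the idea, not the data_cache_slot from the patch. The class
name line_index_sketch and its members are made up for illustration, and it
only splits on '\n' for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration only; not the data_cache_slot from the patch.
// It records where each line of an in-memory buffer starts, so that later
// lookups by line number do not need to rescan the buffer.
class line_index_sketch
{
public:
  line_index_sketch (const char *begin, const char *end)
    : m_end (end)
  {
    const char *p = begin;
    while (p < end)
      {
        m_line_starts.push_back (p);
        while (p < end && *p != '\n')
          ++p;
        if (p < end)
          ++p;  // Step over the '\n'.
      }
  }

  std::size_t num_lines () const { return m_line_starts.size (); }

  // Return line LINE (1-based) without its trailing newline, or an empty
  // string if LINE is out of range.
  std::string get_line (std::size_t line) const
  {
    if (line == 0 || line > m_line_starts.size ())
      return std::string ();
    const char *start = m_line_starts[line - 1];
    const char *stop = (line < m_line_starts.size ()
                        ? m_line_starts[line] : m_end);
    if (stop > start && stop[-1] == '\n')
      --stop;
    return std::string (start, stop);
  }

private:
  const char *m_end;
  std::vector<const char *> m_line_starts;
};
```

The buffer itself is never copied; only the per-line start pointers are
stored, which is the part worth caching when the data is already in memory.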

> [...snip...]
> 
> > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file (const char *file_path)
> >    global_dc->m_file_cache->forcibly_evict_file (file_path);
> >  }
> >  
> > +void
> > +diagnostics_file_cache_forcibly_evict_data (const char *data,
> > +                                           unsigned int data_len)
> > +{
> > +  if (!global_dc->m_file_cache)
> > +    return;
> > +  global_dc->m_file_cache->forcibly_evict_data (data, data_len);
> 
> Maybe we should rename diagnostic_context's m_file_cache to
> m_source_cache?  (and class file_cache for that matter?)  But if so,
> that can/should be a followup/separate patch.
>

Yes, we should. Believe it or not, I was trying to minimize the size of the
patch :) So I didn't make such changes, but they will make things more
clear.

> [...snip...]
>  
> > @@ -525,10 +582,22 @@ file_cache_slot::create (const file_cache::input_context &in_context,
> >    return true;
> >  }
> >  
> > +void
> > +data_cache_slot::create (const char *data, unsigned int data_len,
> > +                        unsigned int highest_use_count)
> > +{
> > +  reset ();
> > +  on_create (highest_use_count + 1,
> > +            total_lines_num (source_id {data, data_len}));
> > +  m_data_begin = data;
> > +  m_data_end = data + data_len;
> > +}
> > +
> >  /* file_cache's ctor.  */
> >  
> >  file_cache::file_cache ()
> > -: m_file_slots (new file_cache_slot[num_file_slots])
> > +  : m_file_slots (new file_cache_slot[num_file_slots]),
> > +    m_data_slots (new data_cache_slot[num_file_slots])
> 
> Should "num_file_slots" be renamed to "num_slots"?
> 
> I assume you're using the same value for both kinds of slot since the
> file_cache::evicted_cache_tab_entry template uses this.  I suppose the
> number could be passed in as an argument to that function if we wanted
> to have different sizes for the two kinds, but I don't think it
> matters.
>

Yes that's right... would rename num_file_slots too.
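
For reference, passing the slot count as an argument could look roughly like
the sketch below. This is only an illustration under assumed names
(file_slot_sketch, data_slot_sketch, pick_eviction_candidate_sketch); the real
file_cache::evicted_cache_tab_entry in input.cc has a different interface.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical slot types; only the use count matters for this sketch.
struct file_slot_sketch { unsigned use_count = 0; };
struct data_slot_sketch { unsigned use_count = 0; };

// Return the slot with the lowest use count and report the highest use count
// seen.  The number of slots is a parameter, so the two kinds of slot would
// be free to use different sizes.
template <class Slot>
Slot *
pick_eviction_candidate_sketch (Slot *slots, std::size_t num_slots,
                                unsigned &highest_use_count)
{
  Slot *to_evict = &slots[0];
  highest_use_count = 0;
  for (std::size_t i = 0; i < num_slots; ++i)
    {
      if (slots[i].use_count < to_evict->use_count)
        to_evict = &slots[i];
      if (slots[i].use_count > highest_use_count)
        highest_use_count = slots[i].use_count;
    }
  return to_evict;
}
```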

> [...snip...]
> 
> > @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t line_num,
> >     If the function fails, a NULL char_span is returned.  */
> >  
> >  char_span
> > -location_get_source_line (const char *file_path, int line)
> > +location_get_source_line (source_id src, int line)
> >  {
> > -  const char *buffer = NULL;
> > -  ssize_t len;
> > -
> > -  if (line == 0)
> > -    return char_span (NULL, 0);
> > -
> > -  if (file_path == NULL)
> > -    return char_span (NULL, 0);
> > +  const char_span fail (nullptr, 0);
> > +  if (!src || line <= 0)
> > +    return fail;
> 
> Looking at source_id's operator bool, are there effectively three kinds
> of source_id?
> 
> (a) file names
> (b) generated buffer
> (c) NULL == m_filename_or_buffer
> 
> What does (c) mean?  Is it a "something's gone wrong/error" state?  Or
> is this more a special-case of (a)? (in that the m_len for such a case
> would be zero)
> 
> Should source_id's 2-param ctor have an assert that the ptr is non-
> NULL?
> 
> [...snip...]
> 
> The patch is OK for trunk as-is, but note the question about the
> source_id ctor above.
> 

Thanks. (c) has the same meaning as a NULL file name currently does, so a
default-constructed source_id is not an in-memory buffer, but is rather a
NULL filename. linemap_add(), for instance, will interpret a NULL filename
for an LC_LEAVE map as a request to copy it from the natural values being
returned to. I think the source_id constructor needs to accept a NULL
filename to remain backwards compatible. With the current design of
source_id, it is safe always to change a 'const char*' file name argument to
a source_id argument instead; it will work just how it did before because it
has an implicit constructor. But if the constructor were to assert that the
pointer is non-NULL, that would necessitate changing all call sites that
currently expect they can pass a NULL pointer there. (For example, there are
several calls to _cpp_do_file_change() within libcpp that take advantage of
being able to pass a NULL filename to linemap_add.)
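
A minimal sketch of the backwards-compatibility argument, assuming a
simplified stand-in for source_id (the real class is defined earlier in the
series and may differ; source_id_sketch, is_buffer and has_source_sketch are
names invented here):

```cpp
#include <cassert>

// Hypothetical stand-in for source_id; the real class may differ.
class source_id_sketch
{
public:
  // Implicit on purpose, so that existing 'const char *' call sites,
  // including those passing a NULL filename, keep compiling unchanged.
  source_id_sketch (const char *filename = nullptr)
    : m_ptr (filename), m_len (0) {}

  // Generated in-memory buffer of LEN bytes.
  source_id_sketch (const char *buffer, unsigned len)
    : m_ptr (buffer), m_len (len) {}

  // False only for the NULL-filename case.
  explicit operator bool () const { return m_ptr != nullptr; }

  // Assumption of this sketch: a zero length means "file name".
  bool is_buffer () const { return m_len != 0; }

private:
  const char *m_ptr;
  unsigned m_len;
};

// Stands in for an interface that used to take 'const char *file_path'.
static bool
has_source_sketch (source_id_sketch src)
{
  return static_cast<bool> (src);
}
```

A caller that used to pass a file name, or a NULL pointer, can call
has_source_sketch() unchanged, which is the property the implicit
constructor is there to preserve.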

-Lewis

* Re: [PATCH v4 3/8] diagnostics: Refactor class file_cache_slot
  2023-08-15 17:58             ` Lewis Hyatt
@ 2023-08-15 19:39               ` David Malcolm
  2023-08-23 21:22                 ` Lewis Hyatt
  0 siblings, 1 reply; 36+ messages in thread
From: David Malcolm @ 2023-08-15 19:39 UTC (permalink / raw)
  To: Lewis Hyatt; +Cc: gcc-patches

On Tue, 2023-08-15 at 13:58 -0400, Lewis Hyatt wrote:
> On Tue, Aug 15, 2023 at 11:43:05AM -0400, David Malcolm wrote:
> > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > > Class file_cache_slot in input.cc is used to query specific lines
> > > of source
> > > code from a file when needed by diagnostics infrastructure. This
> > > will be
> > > extended in a subsequent patch to support obtaining the source
> > > code from
> > > in-memory generated buffers rather than from a file. The present
> > > patch
> > > refactors class file_cache_slot, putting most of the logic into a
> > > new base
> > > class cache_data_source, in preparation for reusing that code in
> > > the next
> > > patch. There is no change in functionality yet.
> > > 

[...snip...]

> > 
> > I confess I had to reread both this and patch 4/8 to make sense of
> > this; this is probably one of those cases where it's harder to read
> > in
> > patch form than as source, but I think I now understand the new
> > implementation.
> 
> Yes, sorry about that. I hope at least splitting into two patches
> here made it
> a little easier.
> 
> > 
> > Did you try testing this with valgrind (e.g. "make selftest-
> > valgrind")?
> > 
> 
> Oh interesting, was not aware of this. I think it shows that new
> leaks were
> not introduced with the patch series.
> 

[...snip...]

> 
> 
> > I don't think we have any selftest coverage for "\r" in the line-
> > break
> > handling; that would be good to add.
> > 
> > This patch is OK for trunk once the rest of the kit is approved.
> 
> Thank you. To be clear, were you suggesting to add selftest coverage
> for \r
> endings now, or in a follow up?

The former, please, so that we can be sure that the patch doesn't
introduce any buffer overreads etc.
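
As a rough illustration of the kind of case such a test should cover, here is
a sketch of a pointer-pair line-end search that treats "\n", "\r\n" and a
lone "\r" as terminators. It is a hypothetical helper (find_line_end_sketch),
not the find_end_of_line in input.cc, whose exact behavior may differ. The
interesting case is a buffer that ends right after a '\r', where peeking at
the next byte would be an overread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not the find_end_of_line in input.cc.
// Given [BEGIN, END), return a pointer to the last byte of the first line
// terminator ("\n", "\r\n", or a lone "\r"), or nullptr if there is none.
static const char *
find_line_end_sketch (const char *begin, const char *end)
{
  for (const char *p = begin; p < end; ++p)
    {
      if (*p == '\n')
        return p;
      if (*p == '\r')
        {
          // Only look at p[1] if it is still inside the buffer.
          if (p + 1 < end && p[1] == '\n')
            return p + 1;
          return p;
        }
    }
  return nullptr;
}
```

Running checks like these under "make selftest-valgrind" is what would catch
a read past the end of the buffer.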

Thanks
Dave


* Re: [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers
  2023-08-15 18:15             ` Lewis Hyatt
@ 2023-08-15 19:46               ` David Malcolm
  2023-08-15 20:08                 ` Lewis Hyatt
  0 siblings, 1 reply; 36+ messages in thread
From: David Malcolm @ 2023-08-15 19:46 UTC (permalink / raw)
  To: Lewis Hyatt; +Cc: gcc-patches

On Tue, 2023-08-15 at 14:15 -0400, Lewis Hyatt wrote:
> On Tue, Aug 15, 2023 at 12:15:15PM -0400, David Malcolm wrote:
> > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > > This patch enhances location_get_source_line(), which is the
> > > primary
> > > interface provided by the diagnostics infrastructure to obtain
> > > the line of
> > > source code corresponding to a given location, so that it
> > > understands
> > > generated data locations in addition to normal file-based
> > > locations. This
> > > involves changing the argument to location_get_source_line() from
> > > a plain
> > > file name, to a source_id object that can represent either type
> > > of location.
> > > 

[...]

> > > 
> > > 
> > > diff --git a/gcc/input.cc b/gcc/input.cc
> > > index 9377020b460..790279d4273 100644
> > > --- a/gcc/input.cc
> > > +++ b/gcc/input.cc
> > > @@ -207,6 +207,28 @@ private:
> > >    void maybe_grow ();
> > >  };
> > >  
> > > +/* This is the implementation of cache_data_source for generated
> > > +   data that is already in memory.  */
> > > +class data_cache_slot final : public cache_data_source
> > 
> > It occurred to me: why are we caching accessing a buffer that's
> > already
> > in memory - but we're also caching the line-splitting information,
> > and
> > providing the line-splitting algorithm with a consistent interface
> > to
> > the data, right?
> > 
> 
> Yeah, for the current _Pragma use case, multi-line buffers are not
> going to
> be common, but they can occur. I was mainly motivated by the
> consistent
> interface, and by the assumption that the overhead is not critical
> given a
> diagnostic is being issued.

(nods)

> 
> > [...snip...]
> > 
> > > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file
> > > (const char *file_path)
> > >    global_dc->m_file_cache->forcibly_evict_file (file_path);
> > >  }
> > >  
> > > +void
> > > +diagnostics_file_cache_forcibly_evict_data (const char *data,
> > > +                                           unsigned int
> > > data_len)
> > > +{
> > > +  if (!global_dc->m_file_cache)
> > > +    return;
> > > +  global_dc->m_file_cache->forcibly_evict_data (data, data_len);
> > 
> > Maybe we should rename diagnostic_context's m_file_cache to
> > m_source_cache?  (and class file_cache for that matter?)  But if
> > so,
> > that can/should be a followup/separate patch.
> > 
> 
> Yes, we should. Believe it or not, I was trying to minimize the size
> of the
> patch :) 

:)

Thanks for splitting it up, BTW.

[...]


> > 
> > > @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t
> > > line_num,
> > >     If the function fails, a NULL char_span is returned.  */
> > >  
> > >  char_span
> > > -location_get_source_line (const char *file_path, int line)
> > > +location_get_source_line (source_id src, int line)
> > >  {
> > > -  const char *buffer = NULL;
> > > -  ssize_t len;
> > > -
> > > -  if (line == 0)
> > > -    return char_span (NULL, 0);
> > > -
> > > -  if (file_path == NULL)
> > > -    return char_span (NULL, 0);
> > > +  const char_span fail (nullptr, 0);
> > > +  if (!src || line <= 0)
> > > +    return fail;
> > 
> > Looking at source_id's operator bool, are there effectively three
> > kinds
> > of source_id?
> > 
> > (a) file names
> > (b) generated buffer
> > (c) NULL == m_filename_or_buffer
> > 
> > What does (c) mean?  Is it a "something's gone wrong/error" state? 
> > Or
> > is this more a special-case of (a)? (in that the m_len for such a
> > case
> > would be zero)
> > 
> > Should source_id's 2-param ctor have an assert that the ptr is non-
> > NULL?
> > 
> > [...snip...]
> > 
> > The patch is OK for trunk as-is, but note the question about the
> > source_id ctor above.
> > 
> 
> Thanks. (c) has the same meaning as a NULL file name currently does,
> so a
> default-constructed source_id is not an in-memory buffer, but is
> rather a
> NULL filename. linemap_add() for instance, will interpret a NULL
> filename
> for an LC_LEAVE map, as a request to copy it from the natural values
> being
> returned to. I think the source_id constructor needs to accept a NULL
> filename to remain backwards compatible. With the current design of
> source_id, it is safe always to change a 'const char*' file name
> argument to
> a source_id argument instead; it will work just how it did before
> because it
> has an implicit constructor. But if the constructor would assert on a
> non-NULL pointer, that would necessitate changing all call sites that
> currently expect they can pass a NULL pointer there. (For example,
> there are
> several calls to _cpp_do_file_change() within libcpp that take
> advantage of
> being able to pass a NULL filename to linemap_add.)

Yes, it's OK for this ctor to accept NULL;
   source_id (const char *filename = nullptr)
and I see you added the default arg.

I was referring to this ctor:
   source_id (const char *buffer, unsigned buffer_len)
Is it ever OK for "buffer" to be NULL in this 2-param ctor, or can we
assert that it's non-NULL in this ctor?  Does the generated data case
ever return NULL?

This is more of a patch 1 thing, of course.

Thanks
Dave


* Re: [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers
  2023-08-15 19:46               ` David Malcolm
@ 2023-08-15 20:08                 ` Lewis Hyatt
  2023-08-23 19:41                   ` Lewis Hyatt
  0 siblings, 1 reply; 36+ messages in thread
From: Lewis Hyatt @ 2023-08-15 20:08 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

On Tue, Aug 15, 2023 at 3:46 PM David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Tue, 2023-08-15 at 14:15 -0400, Lewis Hyatt wrote:
> > On Tue, Aug 15, 2023 at 12:15:15PM -0400, David Malcolm wrote:
> > > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > > > This patch enhances location_get_source_line(), which is the
> > > > primary
> > > > interface provided by the diagnostics infrastructure to obtain
> > > > the line of
> > > > source code corresponding to a given location, so that it
> > > > understands
> > > > generated data locations in addition to normal file-based
> > > > locations. This
> > > > involves changing the argument to location_get_source_line() from
> > > > a plain
> > > > file name, to a source_id object that can represent either type
> > > > of location.
> > > >
>
> [...]
>
> > > >
> > > >
> > > > diff --git a/gcc/input.cc b/gcc/input.cc
> > > > index 9377020b460..790279d4273 100644
> > > > --- a/gcc/input.cc
> > > > +++ b/gcc/input.cc
> > > > @@ -207,6 +207,28 @@ private:
> > > >    void maybe_grow ();
> > > >  };
> > > >
> > > > +/* This is the implementation of cache_data_source for generated
> > > > +   data that is already in memory.  */
> > > > +class data_cache_slot final : public cache_data_source
> > >
> > > It occurred to me: why are we caching accessing a buffer that's
> > > already
> > > in memory - but we're also caching the line-splitting information,
> > > and
> > > providing the line-splitting algorithm with a consistent interface
> > > to
> > > the data, right?
> > >
> >
> > Yeah, for the current _Pragma use case, multi-line buffers are not
> > going to
> > be common, but they can occur. I was mainly motivated by the
> > consistent
> > interface, and by the assumption that the overhead is not critical
> > given a
> > diagnostic is being issued.
>
> (nods)
>
> >
> > > [...snip...]
> > >
> > > > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file
> > > > (const char *file_path)
> > > >    global_dc->m_file_cache->forcibly_evict_file (file_path);
> > > >  }
> > > >
> > > > +void
> > > > +diagnostics_file_cache_forcibly_evict_data (const char *data,
> > > > +                                           unsigned int
> > > > data_len)
> > > > +{
> > > > +  if (!global_dc->m_file_cache)
> > > > +    return;
> > > > +  global_dc->m_file_cache->forcibly_evict_data (data, data_len);
> > >
> > > Maybe we should rename diagnostic_context's m_file_cache to
> > > m_source_cache?  (and class file_cache for that matter?)  But if
> > > so,
> > > that can/should be a followup/separate patch.
> > >
> >
> > Yes, we should. Believe it or not, I was trying to minimize the size
> > of the
> > patch :)
>
> :)
>
> Thanks for splitting it up, BTW.
>
> [...]
>
>
> > >
> > > > @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t
> > > > line_num,
> > > >     If the function fails, a NULL char_span is returned.  */
> > > >
> > > >  char_span
> > > > -location_get_source_line (const char *file_path, int line)
> > > > +location_get_source_line (source_id src, int line)
> > > >  {
> > > > -  const char *buffer = NULL;
> > > > -  ssize_t len;
> > > > -
> > > > -  if (line == 0)
> > > > -    return char_span (NULL, 0);
> > > > -
> > > > -  if (file_path == NULL)
> > > > -    return char_span (NULL, 0);
> > > > +  const char_span fail (nullptr, 0);
> > > > +  if (!src || line <= 0)
> > > > +    return fail;
> > >
> > > Looking at source_id's operator bool, are there effectively three
> > > kinds
> > > of source_id?
> > >
> > > (a) file names
> > > (b) generated buffer
> > > (c) NULL == m_filename_or_buffer
> > >
> > > What does (c) mean?  Is it a "something's gone wrong/error" state?
> > > Or
> > > is this more a special-case of (a)? (in that the m_len for such a
> > > case
> > > would be zero)
> > >
> > > Should source_id's 2-param ctor have an assert that the ptr is non-
> > > NULL?
> > >
> > > [...snip...]
> > >
> > > The patch is OK for trunk as-is, but note the question about the
> > > source_id ctor above.
> > >
> >
> > Thanks. (c) has the same meaning as a NULL file name currently does,
> > so a
> > default-constructed source_id is not an in-memory buffer, but is
> > rather a
> > NULL filename. linemap_add() for instance, will interpret a NULL
> > filename
> > for an LC_LEAVE map, as a request to copy it from the natural values
> > being
> > returned to. I think the source_id constructor needs to accept a NULL
> > filename to remain backwards compatible. With the current design of
> > source_id, it is safe always to change a 'const char*' file name
> > argument to
> > a source_id argument instead; it will work just how it did before
> > because it
> > has an implicit constructor. But if the constructor would assert on a
> > non-NULL pointer, that would necessitate changing all call sites that
> > currently expect they can pass a NULL pointer there. (For example,
> > there are
> > several calls to _cpp_do_file_change() within libcpp that take
> > advantage of
> > being able to pass a NULL filename to linemap_add.)
>
> Yes, it's OK for this ctor to accept NULL;
>    source_id (const char *filename = nullptr)
> and I see you added the default arg.
>
> I was referring to this ctor:
>    source_id (const char *buffer, unsigned buffer_len)
> Is it ever OK for "buffer" to be NULL in this 2-param ctor, or can we
> assert that it's non-NULL in this ctor?  Does the generated data case
> ever return NULL?
>

Oh, I see. This should never be NULL and I can add an assert for that.
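
Roughly like the following sketch, which uses the standard assert macro and a
made-up struct name (generated_buffer_sketch) purely for illustration; the
actual change would go in the real 2-param source_id constructor and would
presumably use whichever assertion macro is appropriate there.

```cpp
#include <cassert>

// Hypothetical illustration of the 2-param case only.
struct generated_buffer_sketch
{
  generated_buffer_sketch (const char *buffer, unsigned buffer_len)
    : m_buffer (buffer), m_len (buffer_len)
  {
    // A generated buffer must point at real data; NULL stays reserved for
    // the one-argument "no filename" case.
    assert (buffer != nullptr);
  }

  const char *m_buffer;
  unsigned m_len;
};
```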

-Lewis

* Re: [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers
  2023-08-15 20:08                 ` Lewis Hyatt
@ 2023-08-23 19:41                   ` Lewis Hyatt
  0 siblings, 0 replies; 36+ messages in thread
From: Lewis Hyatt @ 2023-08-23 19:41 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

On Tue, Aug 15, 2023 at 04:08:47PM -0400, Lewis Hyatt wrote:
> On Tue, Aug 15, 2023 at 3:46 PM David Malcolm <dmalcolm@redhat.com> wrote:
> >
> > On Tue, 2023-08-15 at 14:15 -0400, Lewis Hyatt wrote:
> > > On Tue, Aug 15, 2023 at 12:15:15PM -0400, David Malcolm wrote:
> > > > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > > > > This patch enhances location_get_source_line(), which is the
> > > > > primary
> > > > > interface provided by the diagnostics infrastructure to obtain
> > > > > the line of
> > > > > source code corresponding to a given location, so that it
> > > > > understands
> > > > > generated data locations in addition to normal file-based
> > > > > locations. This
> > > > > involves changing the argument to location_get_source_line() from
> > > > > a plain
> > > > > file name, to a source_id object that can represent either type
> > > > > of location.
> > > > >
> >
> > [...]
> >
> > > > >
> > > > >
> > > > > diff --git a/gcc/input.cc b/gcc/input.cc
> > > > > index 9377020b460..790279d4273 100644
> > > > > --- a/gcc/input.cc
> > > > > +++ b/gcc/input.cc
> > > > > @@ -207,6 +207,28 @@ private:
> > > > >    void maybe_grow ();
> > > > >  };
> > > > >
> > > > > +/* This is the implementation of cache_data_source for generated
> > > > > +   data that is already in memory.  */
> > > > > +class data_cache_slot final : public cache_data_source
> > > >
> > > > It occurred to me: why are we caching accessing a buffer that's
> > > > already
> > > > in memory - but we're also caching the line-splitting information,
> > > > and
> > > > providing the line-splitting algorithm with a consistent interface
> > > > to
> > > > the data, right?
> > > >
> > >
> > > Yeah, for the current _Pragma use case, multi-line buffers are not
> > > going to
> > > be common, but they can occur. I was mainly motivated by the
> > > consistent
> > > interface, and by the assumption that the overhead is not critical
> > > given a
> > > diagnostic is being issued.
> >
> > (nods)
> >
> > >
> > > > [...snip...]
> > > >
> > > > > @@ -397,6 +434,15 @@ diagnostics_file_cache_forcibly_evict_file
> > > > > (const char *file_path)
> > > > >    global_dc->m_file_cache->forcibly_evict_file (file_path);
> > > > >  }
> > > > >
> > > > > +void
> > > > > +diagnostics_file_cache_forcibly_evict_data (const char *data,
> > > > > +                                           unsigned int
> > > > > data_len)
> > > > > +{
> > > > > +  if (!global_dc->m_file_cache)
> > > > > +    return;
> > > > > +  global_dc->m_file_cache->forcibly_evict_data (data, data_len);
> > > >
> > > > Maybe we should rename diagnostic_context's m_file_cache to
> > > > m_source_cache?  (and class file_cache for that matter?)  But if
> > > > so,
> > > > that can/should be a followup/separate patch.
> > > >
> > >
> > > Yes, we should. Believe it or not, I was trying to minimize the size
> > > of the
> > > patch :)
> >
> > :)
> >
> > Thanks for splitting it up, BTW.
> >
> > [...]
> >
> >
> > > >
> > > > > @@ -912,26 +1000,22 @@ cache_data_source::read_line_num (size_t
> > > > > line_num,
> > > > >     If the function fails, a NULL char_span is returned.  */
> > > > >
> > > > >  char_span
> > > > > -location_get_source_line (const char *file_path, int line)
> > > > > +location_get_source_line (source_id src, int line)
> > > > >  {
> > > > > -  const char *buffer = NULL;
> > > > > -  ssize_t len;
> > > > > -
> > > > > -  if (line == 0)
> > > > > -    return char_span (NULL, 0);
> > > > > -
> > > > > -  if (file_path == NULL)
> > > > > -    return char_span (NULL, 0);
> > > > > +  const char_span fail (nullptr, 0);
> > > > > +  if (!src || line <= 0)
> > > > > +    return fail;
> > > >
> > > > Looking at source_id's operator bool, are there effectively three
> > > > kinds
> > > > of source_id?
> > > >
> > > > (a) file names
> > > > (b) generated buffer
> > > > (c) NULL == m_filename_or_buffer
> > > >
> > > > What does (c) mean?  Is it a "something's gone wrong/error" state?
> > > > Or
> > > > is this more a special-case of (a)? (in that the m_len for such a
> > > > case
> > > > would be zero)
> > > >
> > > > Should source_id's 2-param ctor have an assert that the ptr is non-
> > > > NULL?
> > > >
> > > > [...snip...]
> > > >
> > > > The patch is OK for trunk as-is, but note the question about the
> > > > source_id ctor above.
> > > >
> > >
> > > Thanks. (c) has the same meaning as a NULL file name currently does,
> > > so a
> > > default-constructed source_id is not an in-memory buffer, but is
> > > rather a
> > > NULL filename. linemap_add() for instance, will interpret a NULL
> > > filename
> > > for an LC_LEAVE map, as a request to copy it from the natural values
> > > being
> > > returned to. I think the source_id constructor needs to accept a NULL
> > > filename to remain backwards compatible. With the current design of
> > > source_id, it is safe always to change a 'const char*' file name
> > > argument to
> > > a source_id argument instead; it will work just how it did before
> > > because it
> > > has an implicit constructor. But if the constructor were to assert that the
> > > pointer is non-NULL, that would necessitate changing all call sites that
> > > currently expect they can pass a NULL pointer there. (For example,
> > > there are
> > > several calls to _cpp_do_file_change() within libcpp that take
> > > advantage of
> > > being able to pass a NULL filename to linemap_add.)
> >
> > Yes, it's OK for this ctor to accept NULL;
> >    source_id (const char *filename = nullptr)
> > and I see you added the default arg.
> >
> > I was referring to this ctor:
> >    source_id (const char *buffer, unsigned buffer_len)
> > Is it ever OK for "buffer" to be NULL in this 2-param ctor, or can we
> > assert that it's non-NULL in this ctor?  Does the generated data case
> > ever return NULL?
> >
> 
> Oh, I see. This should never be NULL and I can add an assert for that.
> 

This tweak (incremental to patch 1/8) accomplishes that.

-Lewis

-- >8 --

diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
index 395c4612dbe..ad20b140cce 100644
--- a/libcpp/include/line-map.h
+++ b/libcpp/include/line-map.h
@@ -604,7 +604,7 @@ public:
     : m_filename_or_buffer (buffer),
       m_len (buffer_len)
   {
-    linemap_assert (buffer_len > 0);
+    linemap_assert (buffer && buffer_len > 0);
   }
 
   explicit operator bool () const { return m_filename_or_buffer; }
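
For readers following the three source_id states discussed above (file name,
generated buffer, NULL file name), here is a minimal standalone sketch. It is
not the class from line-map.h: the name source_id_sketch and the is_buffer ()
helper are hypothetical, plain assert stands in for linemap_assert, and only
the two constructors, the m_filename_or_buffer and m_len members, and the
explicit operator bool are taken from the thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical simplification of the source_id discussed in this thread.
class source_id_sketch
{
public:
  // Deliberately implicit: any 'const char *' file name converts, and a
  // NULL file name stays legal (case (c) in the review).
  source_id_sketch (const char *filename = nullptr)
    : m_filename_or_buffer (filename), m_len (0)
  {
  }

  // Generated data: the buffer must exist and be non-empty, which is
  // what the incremental tweak asserts.
  source_id_sketch (const char *buffer, unsigned buffer_len)
    : m_filename_or_buffer (buffer), m_len (buffer_len)
  {
    assert (buffer && buffer_len > 0);
  }

  // False only for the NULL file name state.
  explicit operator bool () const { return m_filename_or_buffer != nullptr; }

  // Hypothetical helper: a nonzero length marks an in-memory buffer.
  bool is_buffer () const { return m_len > 0; }

private:
  const char *m_filename_or_buffer;
  unsigned m_len;
};
```

Under these assumptions a default-constructed object tests false, a file name
tests true with is_buffer () false, and only the two-argument constructor can
produce an object for which is_buffer () is true.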

* Re: [PATCH v4 3/8] diagnostics: Refactor class file_cache_slot
  2023-08-15 19:39               ` David Malcolm
@ 2023-08-23 21:22                 ` Lewis Hyatt
  0 siblings, 0 replies; 36+ messages in thread
From: Lewis Hyatt @ 2023-08-23 21:22 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

On Tue, Aug 15, 2023 at 03:39:40PM -0400, David Malcolm wrote:
> On Tue, 2023-08-15 at 13:58 -0400, Lewis Hyatt wrote:
> > On Tue, Aug 15, 2023 at 11:43:05AM -0400, David Malcolm wrote:
> > > On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> > > > Class file_cache_slot in input.cc is used to query specific lines
> > > > of source
> > > > code from a file when needed by diagnostics infrastructure. This
> > > > will be
> > > > extended in a subsequent patch to support obtaining the source
> > > > code from
> > > > in-memory generated buffers rather than from a file. The present
> > > > patch
> > > > refactors class file_cache_slot, putting most of the logic into a
> > > > new base
> > > > class cache_data_source, in preparation for reusing that code in
> > > > the next
> > > > patch. There is no change in functionality yet.
> > > > 
> 
> [...snip...]
> 
> > > 
> > > I confess I had to reread both this and patch 4/8 to make sense of
> > > this; this is probably one of those cases where it's harder to read
> > > in
> > > patch form than as source, but I think I now understand the new
> > > implementation.
> > 
> > Yes, sorry about that. I hope at least splitting into two patches
> > here made it
> > a little easier.
> > 
> > > 
> > > Did you try testing this with valgrind (e.g. "make selftest-
> > > valgrind")?
> > > 
> > 
> > Oh interesting, was not aware of this. I think it shows that new
> > leaks were
> > not introduced with the patch series.
> > 
> 
> [...snip...]
> 
> > 
> > 
> > > I don't think we have any selftest coverage for "\r" in the line-
> > > break
> > > handling; that would be good to add.
> > > 
> > > This patch is OK for trunk once the rest of the kit is approved.
> > 
> > Thank you. To be clear, were you suggesting to add selftest coverage
> > for \r
> > endings now, or in a follow up?
> 
> The former, please, so that we can be sure that the patch doesn't
> introduce any buffer overreads etc.
> 
> Thanks
> Dave
>

The following (incremental to patch 5/8 or after) adds selftest coverage for
alternate line endings. I hope things aren't too unclear this way; I can
resend updated versions of some or all of the patches from scratch, if useful.

AFAIK this is the current status of things:

Patch 1/8: Reviewed; the updated version incorporating feedback has not been
acked yet: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627250.html

Patch 2/8: OKed, pending tweak to reject fixit hints in generated data, which
was sent incrementally here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627405.html

Patch 3/8: OKed, pending new selftest attached to this email.

Patch 4/8: OKed, pending tweak to assert on non-NULL buffers which was sent
incrementally here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628283.html

Patch 5/8: OKed

Patch 6/8: OKed

Patch 7/8: Not reviewed yet

Patch 8/8: Waiting on additional feedback from you; perhaps SARIF need not
worry about this for now and should just ignore generated data locations.

Thanks again for taking the time to go through this; I hope it will prove
worth it.

-Lewis

-- >8 --

gcc/ChangeLog:

	* input.cc (test_reading_source_line): Test additional cases,
	including generated data and alternate line endings.
	(input_cc_tests): Adapt to test_reading_source_line() changes.

diff --git a/gcc/input.cc b/gcc/input.cc
index 4c99df7a205..72274732c6c 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -2392,30 +2392,51 @@ test_make_location_nonpure_range_endpoints (const line_table_case &case_)
 /* Verify reading of input files (e.g. for caret-based diagnostics).  */
 
 static void
-test_reading_source_line ()
+test_reading_source_line (bool generated, const char *e1, const char *e2)
 {
   /* Create a tempfile and write some text to it.  */
+  const char *line1 = "01234567890123456789";
+  const char *line2 = "This is the test text";
+  const char *line3 = "This is the 3rd line";
+  char content[72];
+  const int content_len = snprintf (content, sizeof (content),
+				    "%s%s%s%s%s",
+				    line1, e1, line2, e2, line3);
+  ASSERT_LT (content_len, (int)sizeof (content));
   temp_source_file tmp (SELFTEST_LOCATION, ".txt",
-			"01234567890123456789\n"
-			"This is the test text\n"
-			"This is the 3rd line");
+			content, content_len, generated);
 
-  /* Read back a specific line from the tempfile.  */
-  char_span source_line = location_get_source_line (tmp.get_filename (), 3);
+  /* Read back some specific lines from the tempfile, not all in order.  */
+  const source_id src = generated
+    ? source_id (tmp.content_buf, tmp.content_len)
+    : source_id (tmp.get_filename ());
+
+  char_span source_line = location_get_source_line (src, 1);
+  ASSERT_TRUE (source_line);
+  ASSERT_TRUE (source_line.get_buffer () != NULL);
+  /* N.B. If the line terminator is \r\n, the returned char_span will include
+     the \r as part of the line.  */
+  const size_t off1 = strlen (e1) - 1;
+  ASSERT_EQ (20 + off1, source_line.length ());
+  ASSERT_TRUE (!strncmp (line1, source_line.get_buffer (),
+			 source_line.length () - off1));
+
+  source_line = location_get_source_line (src, 3);
   ASSERT_TRUE (source_line);
   ASSERT_TRUE (source_line.get_buffer () != NULL);
   ASSERT_EQ (20, source_line.length ());
-  ASSERT_TRUE (!strncmp ("This is the 3rd line",
-			 source_line.get_buffer (), source_line.length ()));
+  ASSERT_TRUE (!strncmp (line3, source_line.get_buffer (),
+			 source_line.length ()));
 
-  source_line = location_get_source_line (tmp.get_filename (), 2);
+  source_line = location_get_source_line (src, 2);
   ASSERT_TRUE (source_line);
   ASSERT_TRUE (source_line.get_buffer () != NULL);
-  ASSERT_EQ (21, source_line.length ());
-  ASSERT_TRUE (!strncmp ("This is the test text",
-			 source_line.get_buffer (), source_line.length ()));
+  const size_t off2 = strlen (e2) - 1;
+  ASSERT_EQ (21 + off2, source_line.length ());
+  ASSERT_TRUE (!strncmp (line2, source_line.get_buffer (),
+			 source_line.length () - off2));
 
-  source_line = location_get_source_line (tmp.get_filename (), 4);
+  source_line = location_get_source_line (src, 4);
   ASSERT_FALSE (source_line);
   ASSERT_TRUE (source_line.get_buffer () == NULL);
 }
@@ -4311,7 +4332,11 @@ input_cc_tests ()
   for_each_line_table_case (test_lexer_string_locations_raw_string_unterminated);
   for_each_line_table_case (test_lexer_char_constants);
 
-  test_reading_source_line ();
+  const char *const line_endings[] = {"\n", "\r", "\r\n"};
+  for (bool generated : {false, true})
+    for (const char *e1 : line_endings)
+      for (const char *e2 : line_endings)
+	test_reading_source_line (generated, e1, e2);
 
   test_line_offset_overflow ();
 

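The length arithmetic in the selftest above (20 + off1, 21 + off2) follows
from the rule stated in its N.B. comment: a lone "\n" or "\r" is dropped from
the returned line, while for "\r\n" the '\r' stays in the line. The standalone
sketch below illustrates that rule only; it is not the code in input.cc, and
line_view, split_lines and line_lengths are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for char_span: a pointer/length pair into a buffer.
struct line_view
{
  const char *ptr;
  std::size_t len;
};

// Split DATA into lines; "\n", "\r" and "\r\n" each end a line.
// For "\r\n" the '\r' is kept as the last byte of the line, mirroring the
// N.B. comment in the selftest; a lone '\r' or '\n' is not kept.
static std::vector<line_view>
split_lines (const char *data, std::size_t data_len)
{
  assert (data != nullptr || data_len == 0);
  std::vector<line_view> lines;
  std::size_t start = 0;
  for (std::size_t i = 0; i < data_len; ++i)
    {
      if (data[i] != '\n' && data[i] != '\r')
        continue;
      std::size_t end = i; /* One past the last byte of the line.  */
      if (data[i] == '\r' && i + 1 < data_len && data[i + 1] == '\n')
        end = ++i; /* Keep the '\r'; the line stops at the '\n'.  */
      lines.push_back ({data + start, end - start});
      start = i + 1;
    }
  /* A final line without a terminator, as in the selftest's third line.  */
  if (start < data_len)
    lines.push_back ({data + start, data_len - start});
  return lines;
}

// Build the three-line selftest content with endings E1 and E2 and return
// the length of each line found.
static std::vector<std::size_t>
line_lengths (const char *e1, const char *e2)
{
  const std::string content = std::string ("01234567890123456789") + e1
			      + "This is the test text" + e2
			      + "This is the 3rd line";
  std::vector<std::size_t> lengths;
  for (const line_view &l : split_lines (content.data (), content.size ()))
    lengths.push_back (l.len);
  return lengths;
}
```

With this rule, line_lengths ("\r\n", "\r") yields 21, 21 and 20: the first
line gains one byte from the retained '\r', the second loses its lone '\r',
and the third has no terminator at all.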
Thread overview: 36+ messages
2023-07-21 23:08 [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
2023-07-21 23:08 ` [PATCH v3 1/4] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers Lewis Hyatt
2023-07-28 22:58   ` David Malcolm
2023-07-31 22:39     ` Lewis Hyatt
2023-08-09 22:14       ` [PATCH v4 0/8] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 1/8] libcpp: Add LC_GEN linemaps to support in-memory buffers Lewis Hyatt
2023-08-11 22:45           ` David Malcolm
2023-08-13 20:18             ` Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 2/8] libcpp: diagnostics: Support generated data in expanded locations Lewis Hyatt
2023-08-11 23:02           ` David Malcolm
2023-08-14 21:41             ` Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 3/8] diagnostics: Refactor class file_cache_slot Lewis Hyatt
2023-08-15 15:43           ` David Malcolm
2023-08-15 17:58             ` Lewis Hyatt
2023-08-15 19:39               ` David Malcolm
2023-08-23 21:22                 ` Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers Lewis Hyatt
2023-08-15 16:15           ` David Malcolm
2023-08-15 18:15             ` Lewis Hyatt
2023-08-15 19:46               ` David Malcolm
2023-08-15 20:08                 ` Lewis Hyatt
2023-08-23 19:41                   ` Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 5/8] diagnostics: Support testing generated data in input.cc selftests Lewis Hyatt
2023-08-15 16:27           ` David Malcolm
2023-08-09 22:14         ` [PATCH v4 6/8] diagnostics: Full support for generated data locations Lewis Hyatt
2023-08-15 16:39           ` David Malcolm
2023-08-09 22:14         ` [PATCH v4 7/8] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 8/8] diagnostics: Support generated data locations in SARIF output Lewis Hyatt
2023-08-15 17:04           ` David Malcolm
2023-08-15 17:51             ` Lewis Hyatt
2023-07-21 23:08 ` [PATCH v3 2/4] diagnostics: Handle generated data locations in edit_context Lewis Hyatt
2023-07-21 23:08 ` [PATCH v3 3/4] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings Lewis Hyatt
2023-07-21 23:08 ` [PATCH v3 4/4] diagnostics: Support generated data locations in SARIF output Lewis Hyatt
2023-07-28 22:22 ` [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens David Malcolm
2023-07-29 14:27   ` Lewis Hyatt
2023-07-29 16:03     ` David Malcolm
