public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Lewis Hyatt <lhyatt@gmail.com>
To: David Malcolm <dmalcolm@redhat.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH v4 1/8] libcpp: Add LC_GEN linemaps to support in-memory buffers
Date: Sun, 13 Aug 2023 16:18:53 -0400	[thread overview]
Message-ID: <20230813201853.GA22077@ldh-imac.local> (raw)
In-Reply-To: <39ff876be039f58c60ca3150cc3806b13d316d69.camel@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 26228 bytes --]

On Fri, Aug 11, 2023 at 06:45:31PM -0400, David Malcolm wrote:
> On Wed, 2023-08-09 at 18:14 -0400, Lewis Hyatt wrote:
> 
> Hi Lewis, thanks for the patch...
> 
> > Add a new linemap reason LC_GEN which enables encoding the location of data
> > that was generated during compilation and does not appear in any source file.
> > There could be many use cases, such as, for instance, referring to the content
> > of builtin macros (not yet implemented, but an easy lift after this one.) The
> > first intended application is to create a place to store the input to a
> > _Pragma directive, so that proper locations can be assigned to those
> > tokens. This will be done in a subsequent commit.
> > 
> > The TO_FILE member of struct line_map_ordinary has been changed to a union
> > named SRC which can be either a file name, or a pointer to a line_map_data
> > struct describing the data. There is no space overhead added to the line
> > maps data structures.
> > 
> > Outside libcpp, this patch includes only the minimal changes implied by the
> > adjustment from TO_FILE to SRC in struct line_map_ordinary. Subsequent
> > patches will implement the new functionality.
> > 
> > libcpp/ChangeLog:
> > 
> >         * include/line-map.h (enum lc_reason): Add LC_GEN.
> >         (struct line_map_data): New struct.
> >         (struct line_map_ordinary): Change TO_FILE from a char* to a union,
> >         and rename to SRC.
> >         (class source_id): New class.
> >         (ORDINARY_MAP_GENERATED_DATA_P): New function.
> >         (ORDINARY_MAP_GENERATED_DATA): New function.
> >         (ORDINARY_MAP_GENERATED_DATA_LEN): New function.
> >         (ORDINARY_MAP_SOURCE_ID): New function.
> >         (ORDINARY_MAPS_SAME_FILE_P): New function.
> >         (ORDINARY_MAP_CONTAINING_FILE_NAME): Declare.
> >         (LINEMAP_FILE): Adapt to struct line_map_ordinary change.
> >         (linemap_get_file_highest_location): Likewise.
> >         * line-map.cc (source_id::operator==): New function.
> >         (ORDINARY_MAP_CONTAINING_FILE_NAME): New function.
> >         (linemap_add): Support creating LC_GEN maps.
> >         (linemap_line_start): Support LC_GEN maps.
> >         (linemap_check_files_exited): Likewise.
> >         (linemap_position_for_loc_and_offset): Likewise.
> >         (linemap_get_expansion_filename): Likewise.
> >         (linemap_dump): Likewise.
> >         (linemap_dump_location): Likewise.
> >         (linemap_get_file_highest_location): Likewise.
> >         * directives.cc (_cpp_do_file_change): Likewise.
> > 
> > gcc/c-family/ChangeLog:
> > 
> >         * c-common.cc (try_to_locate_new_include_insertion_point): Recognize
> >         and ignore LC_GEN maps.
> > 
> > gcc/cp/ChangeLog:
> > 
> >         * module.cc (module_state::write_ordinary_maps): Recognize and
> >         ignore LC_GEN maps, and adapt to interface change in struct
> >         line_map_ordinary.
> >         (module_state::read_ordinary_maps): Likewise.
> > 
> > gcc/ChangeLog:
> > 
> >         * diagnostic-show-locus.cc (compatible_locations_p): Adapt to
> >         interface change in struct line_map_ordinary.
> >         * input.cc (special_fname_generated): New function.
> >         (dump_location_info): Support LC_GEN maps.
> >         (get_substring_ranges_for_loc): Adapt to interface change in struct
> >         line_map_ordinary.
> >         * input.h (special_fname_generated): Declare.
> > 
> > gcc/go/ChangeLog:
> > 
> >         * go-linemap.cc (Gcc_linemap::to_string): Recognize and ignore
> >         LC_GEN maps.
> > ---
> >  gcc/c-family/c-common.cc     |  11 ++-
> >  gcc/cp/module.cc             |   8 +-
> >  gcc/diagnostic-show-locus.cc |   2 +-
> >  gcc/go/go-linemap.cc         |   3 +-
> >  gcc/input.cc                 |  27 +++++-
> >  gcc/input.h                  |   1 +
> >  libcpp/directives.cc         |   4 +-
> >  libcpp/include/line-map.h    | 144 ++++++++++++++++++++++++----
> >  libcpp/line-map.cc           | 181 +++++++++++++++++++++++++----------
> >  9 files changed, 299 insertions(+), 82 deletions(-)
> 
> [...snip...]
> 
> > 
> > diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
> > index 0514815b51f..a2aa6b4e0b5 100644
> > --- a/gcc/diagnostic-show-locus.cc
> > +++ b/gcc/diagnostic-show-locus.cc
> > @@ -998,7 +998,7 @@ compatible_locations_p (location_t loc_a, location_t loc_b)
> >          are in the same file.  */
> >        const line_map_ordinary *ord_map_a = linemap_check_ordinary (map_a);
> >        const line_map_ordinary *ord_map_b = linemap_check_ordinary (map_b);
> > -      return ord_map_a->to_file == ord_map_b->to_file;
> > +      return ORDINARY_MAPS_SAME_FILE_P (ord_map_a, ord_map_b);
> 
> My first thought here was: are buffers supported here, or does it have
> to be a file?
> 
> It turns out that ORDINARY_MAPS_SAME_FILE_P works on both files and
> buffers.
> 
> This suggests that it would be better named as
> ORDINARY_MAPS_SAME_SOURCE_ID_P, but note the comment below, could this
> be:
> 
>            return ord_map_a->same_source_id_p (ord_map_b);
> 
> ?
> 
> [...snip...]
>

I think I would tend to feel that it's better not to switch to member
functions just yet, I'll reply to your comment below. In the meantime I did
rename this to ORDINARY_MAPS_SAME_SOURCE_P.

> > diff --git a/gcc/input.cc b/gcc/input.cc
> > index eaf301ec7c1..c1735215b29 100644
> > --- a/gcc/input.cc
> > +++ b/gcc/input.cc
> 
> [...snip...]
> 
> > @@ -1814,11 +1835,11 @@ get_substring_ranges_for_loc (cpp_reader *pfile,
> >        /* Bulletproofing.  We ought to only have different ordinary maps
> >          for start vs finish due to line-length jumps.  */
> >        if (start_ord_map != final_ord_map
> > -         && start_ord_map->to_file != final_ord_map->to_file)
> > +         && !ORDINARY_MAPS_SAME_FILE_P (start_ord_map, final_ord_map))
> 
> For the common case of comparing a pair of ordinary maps that have
> filenames, this hunk is replacing pointer comparison with filename_cmp,
> which ultimately does something like strcmp.  Should
> ORDINARY_MAPS_SAME_FILE_P have a fast-path for pointer equality?
>

Yes I think it should, added. So it seems that in most places currently, we
call filename_cmp, but in some places we assume pointer equality is a
sufficient check. I am pretty sure, that code which *only* does pointer
equality for file names can be wrong for edge cases in the case of PCH; I
think the PCH process can produce multiple pointers to the same file name
under some circumstances. (One, although atypical, would be if the same file
were included again after being brought in as a PCH.) So it's an improvement
to add filename_cmp() but it's also even better to check equality first.

> 
> [...snip...]
> 
> > diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
> > index 44fea0ea08e..e59123b18c5 100644
> > --- a/libcpp/include/line-map.h
> > +++ b/libcpp/include/line-map.h
> 
> [...snip...]
> 
> > @@ -662,6 +716,12 @@ ORDINARY_MAP_IN_SYSTEM_HEADER_P (const line_map_ordinary *ord_map)
> >    return ord_map->sysp;
> >  }
> >  
> > +/* TRUE if this line map contains generated data.  */
> > +inline bool ORDINARY_MAP_GENERATED_DATA_P (const line_map_ordinary *ord_map)
> > +{
> > +  return ord_map->reason == LC_GEN;
> > +}
> > +
> >  /* TRUE if this line map is for a module (not a source file).  */
> >  
> >  inline bool
> > @@ -671,14 +731,46 @@ MAP_MODULE_P (const line_map *map)
> >           && linemap_check_ordinary (map)->reason == LC_MODULE);
> >  }
> >  
> > -/* Get the filename of ordinary map MAP.  */
> > +/* Get the data contents of ordinary map MAP.  */
> >  
> >  inline const char *
> >  ORDINARY_MAP_FILE_NAME (const line_map_ordinary *ord_map)
> >  {
> > -  return ord_map->to_file;
> > +  linemap_assert (ord_map->reason != LC_GEN);
> > +  return ord_map->src.file;
> > +}
> > +
> > +inline const char *
> > +ORDINARY_MAP_GENERATED_DATA (const line_map_ordinary *ord_map)
> > +{
> > +  linemap_assert (ord_map->reason == LC_GEN);
> > +  return ord_map->src.data->data;
> > +}
> > +
> > +inline unsigned int
> > +ORDINARY_MAP_GENERATED_DATA_LEN (const line_map_ordinary *ord_map)
> > +{
> > +  linemap_assert (ord_map->reason == LC_GEN);
> > +  return ord_map->src.data->len;
> > +}
> > +
> > +inline source_id ORDINARY_MAP_SOURCE_ID (const line_map_ordinary *ord_map)
> > +{
> > +  if (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
> > +    return source_id {ord_map->src.data->data, ord_map->src.data->len};
> > +  return source_id {ord_map->src.file};
> > +}
> > +
> > +/* If we just want to know whether two maps point to the same
> > +   file/buffer or not.  */
> > +inline bool
> > +ORDINARY_MAPS_SAME_FILE_P (const line_map_ordinary *map1,
> > +                          const line_map_ordinary *map2)
> > +{
> > +  return ORDINARY_MAP_SOURCE_ID (map1) == ORDINARY_MAP_SOURCE_ID (map2);
> >  }
> > 
> > 
> 
> There are lots of existing BLOCK_CAPS inline functions in line-map.h
> due to them originally being macros, but could the new ones above be
> member functions of line_map_ordinary?
> 
> e.g.
> 
> inline const char *
> linemap_ordinary::get_generated_data () const
> {
>   linemap_assert (reason == LC_GEN);
>   return src.data->data;
> }
> 
> Then again, the patch is matching the existing style, so you could save
> this for a followup if you like.
>

Definitely agree that member functions would be a nicer interface. One thing
to keep in mind is that the linemaps are allocated by ggc so they can work
with PCH and so they need to be POD types (I guess "standard layout types"
in newer C++ standards). So they can never really be a full class, hence
taking too many steps in that direction could be potentially error prone?
(e.g., it's OK to add a member function, but not a constructor, etc.) On the
other hand, it does already inherit from a base class with non-static
members, which I think makes it technically not a standard layout type as
is. I think revisiting it subsequently makes sense to me.

> > @@ -1093,21 +1185,28 @@ extern location_t linemap_line_start
> >  extern line_map *line_map_new_raw (line_maps *, bool, unsigned);
> >  
> >  /* Add a mapping of logical source line to physical source file and
> > -   line number. This function creates an "ordinary map", which is a
> > +   line number.  This function creates an "ordinary map", which is a
> >     map that records locations of tokens that are not part of macro
> >     replacement-lists present at a macro expansion point.
> >  
> > -   The text pointed to by TO_FILE must have a lifetime
> > -   at least as long as the lifetime of SET.  An empty
> > -   TO_FILE means standard input.  If reason is LC_LEAVE, and
> > -   TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their
> > -   natural values considering the file we are returning to.
> > +   The text pointed to by FILENAME_OR_BUFFER must have a lifetime at least as
> > +   long as the lifetime of SET.  If reason is LC_LEAVE, and FILENAME_OR_BUFFER
> > +   is NULL, then FILENAME_OR_BUFFER, TO_LINE and SYSP are given their natural
> > +   values considering the file we are returning to.  If reason is LC_GEN, then
> > +   FILENAME_OR_BUFFER is the actual content, and DATA_LEN>0 is the length of it.
> > +   Otherwise FILENAME_OR_BUFFER is a file name and DATA_LEN is ignored.
> > +
> > +   If reason is LC_RENAME, and the map being renamed from is an LC_GEN map,
> > +   then FILENAME_OR_BUFFER may be NULL and will be copied from the source
> > +   map.
> > +
> > +   A call to this function can relocate the previous set of maps, so any stored
> > +   line_map pointers should not be used.  */
> >  
> > -   A call to this function can relocate the previous set of
> > -   maps, so any stored line_map pointers should not be used.  */
> >  extern const line_map *linemap_add
> >    (class line_maps *, enum lc_reason, unsigned int sysp,
> > -   const char *to_file, linenum_type to_line);
> > +   const char *filename_or_buffer, linenum_type to_line,
> > +   unsigned int data_len = 0);
> 
> I haven't looked at the rest of the patches yet, but could the params
>   const char *filename_or_buffer
> and
>   unsigned int data_len = 0 
> 
> be replaced by:
>   source_id src
> 
> and, if so, does it simplify things?  Or do the various LC_* cases
> complicated things?
>

Oh, I should have done it this way yes, it's much better, I didn't like how
the data_len argument needed to be separate from the buffer argument by the
to_line argument. The fact that source_id is implicitly constructible from a
char*, means it "just works" to change the argument to a source_id, without
touching all the call sites in every frontend.

> >  
> >  /* Create a macro map.  A macro map encodes source locations of tokens
> >     that are part of a macro replacement-list, at a macro expansion
> > @@ -1257,7 +1356,7 @@ linemap_position_for_loc_and_offset (class line_maps *set,
> >  inline const char *
> >  LINEMAP_FILE (const line_map_ordinary *ord_map)
> >  {
> > -  return ord_map->to_file;
> > +  return ORDINARY_MAP_FILE_NAME (ord_map);
> >  }
> 
> Presumably this adds the precondition that ORD_MAP isn't an LC_GEN map,
> so please update the leading comment.
>

Done.

> >  
> >  /* Return the line number this map started encoding location from.  */
> 
> [...snip...]
> 
> > diff --git a/libcpp/line-map.cc b/libcpp/line-map.cc
> > index e0f82e20571..e63916054e0 100644
> > --- a/libcpp/line-map.cc
> > +++ b/libcpp/line-map.cc
> > @@ -48,6 +48,31 @@ static location_t linemap_macro_loc_to_exp_point (line_maps *,
> >  extern unsigned num_expanded_macros_counter;
> >  extern unsigned num_macro_tokens_counter;
> >  
> > +bool
> > +source_id::operator== (source_id src) const
> > +{
> > +  return m_len == src.m_len
> > +    && (is_buffer () || !m_filename_or_buffer || !src.m_filename_or_buffer
> > +       ? m_filename_or_buffer == src.m_filename_or_buffer
> > +       : !filename_cmp (m_filename_or_buffer, src.m_filename_or_buffer));
> > +}
> 
> This function could really use a leading comment, and I'd much prefer
> it if you converted to if statements rather than one big expression.
> 
> Am I right in thinking that for filenames, we use libiberty's
> filename_cmp (which compares the contents of the buffers), whereas for
> buffers we use pointer equality, and we assume that every buffer ptr is
> different from every filename's ptr?
> 
> As noted above, do we need a fast path for pointer equality before
> calling filename_cmp? 
>

Added now.

> 
> > +
> > +/* For a normal ordinary map, this is the same as ORDINARY_MAP_FILE_NAME;
> > +   but for an LC_GEN map, it returns the file name from which the data
> > +   originated, instead of asserting.  */
> > +const char *
> > +ORDINARY_MAP_CONTAINING_FILE_NAME (line_maps *set,
> > +                                  const line_map_ordinary *ord_map)
> > +{
> > +  while (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
> > +    {
> > +      ord_map = linemap_included_from_linemap (set, ord_map);
> > +      if (!ord_map)
> > +       return "-";
> 
> How does the above early return happen?  Is it the "read from stdin"
> case?
>

It can't happen the way that GCC uses line_maps. It could happen if someone
created their own line_maps instance and made the very first map an LC_GEN
map. We do that in some selftests, although we don't actually query
ORDINARY_MAP_CONTAINING_FILE_NAME in any of those that currently exist.

> > +    }
> > +  return ORDINARY_MAP_FILE_NAME (ord_map);
> > +}
> > +
> >  /* Destructor for class line_maps.
> >     Ensure non-GC-managed memory is released.  */
> >  
> 
> [...snip...]
> 
> > @@ -505,21 +531,28 @@ LAST_SOURCE_LINE_LOCATION (const line_map_ordinary *map)
> >  }
> >  
> >  /* Add a mapping of logical source line to physical source file and
> > -   line number.
> > +   line number.  This function creates an "ordinary map", which is a
> > +   map that records locations of tokens that are not part of macro
> > +   replacement-lists present at a macro expansion point.
> > +
> > +   The text pointed to by FILENAME_OR_BUFFER must have a lifetime at least as
> > +   long as the lifetime of SET.  If reason is LC_LEAVE, and FILENAME_OR_BUFFER
> > +   is NULL, then FILENAME_OR_BUFFER, TO_LINE and SYSP are given their natural
> > +   values considering the file we are returning to.  If reason is LC_GEN, then
> > +   FILENAME_OR_BUFFER is the actual content, and DATA_LEN>0 is the length of it.
> > +   Otherwise FILENAME_OR_BUFFER is a file name and DATA_LEN is ignored.
> >  
> > -   The text pointed to by TO_FILE must have a lifetime
> > -   at least as long as the final call to lookup_line ().  An empty
> > -   TO_FILE means standard input.  If reason is LC_LEAVE, and
> > -   TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their
> > -   natural values considering the file we are returning to.
> > +   If reason is LC_RENAME, and the map being renamed from is an LC_GEN map,
> > +   then FILENAME_OR_BUFFER may be NULL and will be copied from the source
> > +   map.
> >  
> > -   FROM_LINE should be monotonic increasing across calls to this
> > -   function.  A call to this function can relocate the previous set of
> > -   maps, so any stored line_map pointers should not be used.  */
> > +   A call to this function can relocate the previous set of maps, so any stored
> > +   line_map pointers should not be used.  */
> >  
> >  const struct line_map *
> >  linemap_add (line_maps *set, enum lc_reason reason,
> > -            unsigned int sysp, const char *to_file, linenum_type to_line)
> > +            unsigned int sysp, const char *filename_or_buffer,
> > +            linenum_type to_line, unsigned int data_len)
> 
> As noted above, would passing in a source_id make this simpler? 
> Looking at the logic below, possibly not...
>

Seems to me it is better with this change here yes.

> >  {
> >    /* Generate a start_location above the current highest_location.
> >       If possible, make the low range bits be zero.  */
> > @@ -536,12 +569,24 @@ linemap_add (line_maps *set, enum lc_reason reason,
> >  
> >    /* When we enter the file for the first time reason cannot be
> >       LC_RENAME.  */
> > -  linemap_assert (!(set->depth == 0 && reason == LC_RENAME));
> > +  line_map_data *data_to_reuse = nullptr;
> > +  bool is_data_map = (reason == LC_GEN);
> > +  if (reason == LC_RENAME || reason == LC_RENAME_VERBATIM)
> > +    {
> > +      linemap_assert (set->depth != 0);
> > +      const auto prev = LINEMAPS_LAST_ORDINARY_MAP (set);
> > +      linemap_assert (prev);
> > +      if (prev->reason == LC_GEN)
> > +       {
> > +         data_to_reuse = prev->src.data;
> > +         is_data_map = true;
> > +       }
> > +    }
> >  
> >    /* If we are leaving the main file, return a NULL map.  */
> >    if (reason == LC_LEAVE
> >        && MAIN_FILE_P (LINEMAPS_LAST_ORDINARY_MAP (set))
> > -      && to_file == NULL)
> > +      && filename_or_buffer == NULL)
> >      {
> >        set->depth--;
> >        return NULL;
> > @@ -557,8 +602,9 @@ linemap_add (line_maps *set, enum lc_reason reason,
> >      = linemap_check_ordinary (new_linemap (set, start_location));
> >    map->reason = reason;
> >  
> > -  if (to_file && *to_file == '\0' && reason != LC_RENAME_VERBATIM)
> > -    to_file = "<stdin>";
> > +  if (filename_or_buffer && *filename_or_buffer == '\0'
> > +      && reason != LC_RENAME_VERBATIM && !is_data_map)
> > +    filename_or_buffer = "<stdin>";
> >  
> >    if (reason == LC_RENAME_VERBATIM)
> >      reason = LC_RENAME;
> > @@ -577,21 +623,50 @@ linemap_add (line_maps *set, enum lc_reason reason,
> >          that comes right before MAP in the same file.  */
> >        from = linemap_included_from_linemap (set, map - 1);
> >  
> > -      /* A TO_FILE of NULL is special - we use the natural values.  */
> > -      if (to_file == NULL)
> > +      /* Not currently supporting a #include originating from an LC_GEN
> > +        map, since there is no clear use case for this and it would complicate
> > +        the logic here.  */
> > +      linemap_assert (!ORDINARY_MAP_GENERATED_DATA_P (from));
> > +
> > +      /* A null FILENAME_OR_BUFFER is special - we use the natural
> > +        values.  */
> > +      if (!filename_or_buffer)
> >         {
> > -         to_file = ORDINARY_MAP_FILE_NAME (from);
> > +         filename_or_buffer = from->src.file;
> >           to_line = SOURCE_LINE (from, from[1].start_location);
> >           sysp = ORDINARY_MAP_IN_SYSTEM_HEADER_P (from);
> >         }
> >        else
> >         linemap_assert (filename_cmp (ORDINARY_MAP_FILE_NAME (from),
> > -                                     to_file) == 0);
> > +                                     filename_or_buffer) == 0);
> >      }
> >  
> >    map->sysp = sysp;
> > -  map->to_file = to_file;
> >    map->to_line = to_line;
> > +
> > +  if (is_data_map)
> > +    {
> > +      /* All data maps should have reason == LC_GEN, even if they were
> > +        an LC_RENAME, to keep it simple to check which maps contain
> > +        data.  */
> > +      map->reason = LC_GEN;
> > +
> > +      if (data_to_reuse)
> > +       map->src.data = data_to_reuse;
> > +      else
> > +       {
> > +         auto src_data
> > +           = (line_map_data *)set->reallocator (nullptr,
> > +                                                sizeof (line_map_data));
> > +         src_data->data = filename_or_buffer;
> > +         src_data->len = data_len;
> > +         gcc_assert (data_len);
> > +         map->src.data = src_data;
> > +       }
> > +    }
> > +  else
> > +    map->src.file = filename_or_buffer;
> > +
> >    LINEMAPS_ORDINARY_CACHE (set) = LINEMAPS_ORDINARY_USED (set) - 1;
> >    /* Do not store range_bits here.  That's readjusted in
> >       linemap_line_start.  */
> > @@ -606,7 +681,7 @@ linemap_add (line_maps *set, enum lc_reason reason,
> >       pure_location_p.  */
> >    linemap_assert (pure_location_p (set, start_location));
> >  
> > -  if (reason == LC_ENTER)
> > +  if (reason == LC_ENTER || reason == LC_GEN)
> >      {
> >        if (set->depth == 0)
> >         map->included_from = 0;
> > @@ -617,7 +692,7 @@ linemap_add (line_maps *set, enum lc_reason reason,
> >               & ~((1 << map[-1].m_column_and_range_bits) - 1))
> >              + map[-1].start_location);
> >        set->depth++;
> > -      if (set->trace_includes)
> > +      if (set->trace_includes && reason == LC_ENTER)
> >         trace_include (set, map);
> >      }
> >    else if (reason == LC_RENAME)
> > @@ -859,12 +934,16 @@ linemap_line_start (line_maps *set, linenum_type to_line,
> >               >= (((uint64_t) 1)
> >                   << (CHAR_BIT * sizeof (linenum_type) - column_bits)))
> >           || range_bits < map->m_range_bits)
> > -       map = linemap_check_ordinary
> > -               (const_cast <line_map *>
> > -                 (linemap_add (set, LC_RENAME,
> > -                               ORDINARY_MAP_IN_SYSTEM_HEADER_P (map),
> > -                               ORDINARY_MAP_FILE_NAME (map),
> > -                               to_line)));
> > +       {
> > +         const auto maybe_filename = ORDINARY_MAP_GENERATED_DATA_P (map)
> > +           ? nullptr : map->src.file;
> > +         map = linemap_check_ordinary
> > +           (const_cast <line_map *>
> > +            (linemap_add (set, LC_RENAME,
> > +                          ORDINARY_MAP_IN_SYSTEM_HEADER_P (map),
> > +                          maybe_filename,
> > +                          to_line)));
> > +       }
> >        map->m_column_and_range_bits = column_bits;
> >        map->m_range_bits = range_bits;
> >        r = (MAP_START_LOCATION (map)
> 
> [...snip...]
> 
> 
> Thanks again for the patch; hope this is constructive
> Dave
> 

Thanks for looking at it. I attached a new version of patch 1 that reflects
all this feedback. There are also 4 trivial changes needed to later patches,
to adapt to the new arguments for linemap_add(), which for now I am just
pasting here for reference.

-Lewis

-- >8 --

diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
index bdaa138fb2f..62c60645e88 100644
--- a/gcc/diagnostic-show-locus.cc
+++ b/gcc/diagnostic-show-locus.cc
@@ -4604,7 +4604,7 @@ test_fixit_consolidation (const line_table_case &case_)
 {
   line_table_test ltt (case_);
   if (ltt.m_generated_data)
-    linemap_add (line_table, LC_GEN, false, "some content", 1, 13);
+    linemap_add (line_table, LC_GEN, false, source_id("some content", 13), 1);
   else
     linemap_add (line_table, LC_ENTER, false, "test.c", 1);
 
diff --git a/gcc/input.cc b/gcc/input.cc
index 942e6b6f9c7..4c99df7a205 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -2068,8 +2068,9 @@ temp_source_file::do_linemap_add (int line)
 {
   const line_map *map;
   if (content_buf)
-    map = linemap_add (line_table, LC_GEN, false, content_buf,
-		       line, content_len);
+    map = linemap_add (line_table, LC_GEN, false,
+		       source_id(content_buf, content_len),
+		       line);
   else
     map = linemap_add (line_table, LC_ENTER, false, get_filename (), line);
   return linemap_check_ordinary (map);
@@ -2222,7 +2223,7 @@ test_accessing_ordinary_linemaps (const line_table_case &case_)
 
   /* Build a simple linemap describing some locations. */
   if (ltt.m_generated_data)
-    linemap_add (line_table, LC_GEN, false, "some data", 0, 10);
+    linemap_add (line_table, LC_GEN, false, source_id("some data", 10), 0);
   else
     linemap_add (line_table, LC_ENTER, false, "foo.c", 0);
 
diff --git a/libcpp/directives.cc b/libcpp/directives.cc
index d2d83e6dc83..854a5ea65f0 100644
--- a/libcpp/directives.cc
+++ b/libcpp/directives.cc
@@ -1956,8 +1956,8 @@ destringize_and_run (cpp_reader *pfile, _cpp__Pragma_state *pstate)
   const unsigned int buf_len = dest - result;
   const int sysp = linemap_location_in_system_header_p (pfile->line_table,
 							pstate->pragma_loc);
-  linemap_add (pfile->line_table, LC_GEN, sysp, (const char *)result, 1,
-	       buf_len);
+  linemap_add (pfile->line_table, LC_GEN, sysp,
+	       source_id((const char *)result, buf_len), 1);
   const auto col_hint = (uchar *) memchr (result, '\n', buf_len) - result;
   linemap_line_start (pfile->line_table, 1, col_hint);
 

[-- Attachment #2: Pragma_locs_v5_1_of_8.txt --]
[-- Type: text/plain, Size: 33546 bytes --]

From: Lewis Hyatt <lhyatt@gmail.com>
Date: Thu, 3 Aug 2023 12:44:14 -0400
Subject: [PATCH v5 1/8] libcpp: Add LC_GEN linemaps to support in-memory buffers

Add a new linemap reason LC_GEN which enables encoding the location of data
that was generated during compilation and does not appear in any source file.
There could be many use cases, such as, for instance, referring to the content
of builtin macros (not yet implemented, but an easy lift after this one.) The
first intended application is to create a place to store the input to a
_Pragma directive, so that proper locations can be assigned to those
tokens. This will be done in a subsequent commit.

The TO_FILE member of struct line_map_ordinary has been changed to a union
named SRC which can be either a file name, or a pointer to a line_map_data
struct describing the data. There is no space overhead added to the line
maps data structures.

Outside libcpp, this patch includes only the minimal changes implied by the
adjustment from TO_FILE to SRC in struct line_map_ordinary. Subsequent
patches will implement the new functionality.

libcpp/ChangeLog:

	* include/line-map.h (enum lc_reason): Add LC_GEN.
	(struct line_map_data): New struct.
	(struct line_map_ordinary): Change TO_FILE from a char* to a union,
	and rename to SRC.
	(class source_id): New class.
	(ORDINARY_MAP_GENERATED_DATA_P): New function.
	(ORDINARY_MAP_GENERATED_DATA): New function.
	(ORDINARY_MAP_GENERATED_DATA_LEN): New function.
	(ORDINARY_MAP_SOURCE_ID): New function.
	(ORDINARY_MAPS_SAME_SOURCE_P): New function.
	(ORDINARY_MAP_CONTAINING_FILE_NAME): Declare.
	(LINEMAP_FILE): Adapt to struct line_map_ordinary change.
	(linemap_get_file_highest_location): Likewise.
	* line-map.cc (source_id::operator==): New function.
	(ORDINARY_MAP_CONTAINING_FILE_NAME): New function.
	(linemap_add): Support creating LC_GEN maps.
	(linemap_line_start): Support LC_GEN maps.
	(linemap_check_files_exited): Likewise.
	(linemap_position_for_loc_and_offset): Likewise.
	(linemap_get_expansion_filename): Likewise.
	(linemap_dump): Likewise.
	(linemap_dump_location): Likewise.
	(linemap_get_file_highest_location): Likewise.
	* directives.cc (_cpp_do_file_change): Likewise.

gcc/c-family/ChangeLog:

	* c-common.cc (try_to_locate_new_include_insertion_point): Recognize
	and ignore LC_GEN maps.

gcc/cp/ChangeLog:

	* module.cc (module_state::write_ordinary_maps): Recognize and
	ignore LC_GEN maps, and adapt to interface change in struct
	line_map_ordinary.
	(module_state::read_ordinary_maps): Likewise.

gcc/ChangeLog:

	* diagnostic-show-locus.cc (compatible_locations_p): Adapt to
	interface change in struct line_map_ordinary.
	* input.cc (special_fname_generated): New function.
	(dump_location_info): Support LC_GEN maps.
	(get_substring_ranges_for_loc): Adapt to interface change in struct
	line_map_ordinary.
	* input.h (special_fname_generated): Declare.

gcc/go/ChangeLog:

	* go-linemap.cc (Gcc_linemap::to_string): Recognize and ignore
	LC_GEN maps.

diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 268462f900e..e60dc16937a 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -9208,19 +9208,22 @@ try_to_locate_new_include_insertion_point (const char *file, location_t loc)
       const line_map_ordinary *ord_map
 	= LINEMAPS_ORDINARY_MAP_AT (line_table, i);
 
+      if (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
+	continue;
+
       if (const line_map_ordinary *from
 	  = linemap_included_from_linemap (line_table, ord_map))
 	/* We cannot use pointer equality, because with preprocessed
 	   input all filename strings are unique.  */
-	if (0 == strcmp (from->to_file, file))
+	if (ORDINARY_MAP_SOURCE_ID (from) == file)
 	  {
 	    last_include_ord_map = from;
 	    last_ord_map_after_include = NULL;
 	  }
 
-      /* Likewise, use strcmp, and reject any line-zero introductory
-	 map.  */
-      if (ord_map->to_line && 0 == strcmp (ord_map->to_file, file))
+      /* Likewise, use strcmp (via the source_id comparison), and reject any
+	 line-zero introductory map.  */
+      if (ord_map->to_line && ORDINARY_MAP_SOURCE_ID (ord_map) == file)
 	{
 	  if (!first_ord_map_in_file)
 	    first_ord_map_in_file = ord_map;
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index ea362bdffa4..ff17cd57016 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -16250,6 +16250,8 @@ module_state::write_ordinary_maps (elf_out *to, range_t &info,
        iter != end; ++iter)
     if (iter->src != current)
       {
+	if (ORDINARY_MAP_GENERATED_DATA_P (iter->src))
+	  continue;
 	current = iter->src;
 	const char *fname = ORDINARY_MAP_FILE_NAME (iter->src);
 
@@ -16267,7 +16269,7 @@ module_state::write_ordinary_maps (elf_out *to, range_t &info,
 		   preprocessed input we could have multiple instances
 		   of the same name, and we'd rather not percolate
 		   that.  */
-		const_cast<line_map_ordinary *> (iter->src)->to_file = name;
+		const_cast<line_map_ordinary *> (iter->src)->src.file = name;
 		fname = NULL;
 		break;
 	      }
@@ -16295,6 +16297,8 @@ module_state::write_ordinary_maps (elf_out *to, range_t &info,
   for (auto iter = ord_loc_remap->begin (), end = ord_loc_remap->end ();
        iter != end; ++iter)
     {
+      if (ORDINARY_MAP_GENERATED_DATA_P (iter->src))
+	continue;
       dump (dumper::LOCATION)
 	&& dump ("Span:%u ordinary [%u+%u,+%u)->[%u,+%u)",
 		 iter - ord_loc_remap->begin (),
@@ -16456,7 +16460,7 @@ module_state::read_ordinary_maps (unsigned num_ord_locs, unsigned range_bits)
 	  map->m_range_bits = sec.u ();
 	  map->m_column_and_range_bits = sec.u () + map->m_range_bits;
 	  unsigned fnum = sec.u ();
-	  map->to_file = (fnum < filenames.length () ? filenames[fnum] : "");
+	  map->src.file = (fnum < filenames.length () ? filenames[fnum] : "");
 	  map->to_line = sec.u ();
 	  base = map;
 	}
diff --git a/gcc/diagnostic-show-locus.cc b/gcc/diagnostic-show-locus.cc
index 0514815b51f..5a422f90272 100644
--- a/gcc/diagnostic-show-locus.cc
+++ b/gcc/diagnostic-show-locus.cc
@@ -998,7 +998,7 @@ compatible_locations_p (location_t loc_a, location_t loc_b)
 	 are in the same file.  */
       const line_map_ordinary *ord_map_a = linemap_check_ordinary (map_a);
       const line_map_ordinary *ord_map_b = linemap_check_ordinary (map_b);
-      return ord_map_a->to_file == ord_map_b->to_file;
+      return ORDINARY_MAPS_SAME_SOURCE_P (ord_map_a, ord_map_b);
     }
 }
 
diff --git a/gcc/go/go-linemap.cc b/gcc/go/go-linemap.cc
index 1d72e79647d..02d4ce04181 100644
--- a/gcc/go/go-linemap.cc
+++ b/gcc/go/go-linemap.cc
@@ -84,7 +84,8 @@ Gcc_linemap::to_string(Location location)
   resolved_location =
       linemap_resolve_location (line_table, location.gcc_location(),
                                 LRK_SPELLING_LOCATION, &lmo);
-  if (lmo == NULL || resolved_location < RESERVED_LOCATION_COUNT)
+  if (lmo == NULL || resolved_location < RESERVED_LOCATION_COUNT
+      || ORDINARY_MAP_GENERATED_DATA_P (lmo))
     return "";
   const char *path = LINEMAP_FILE (lmo);
   if (!path)
diff --git a/gcc/input.cc b/gcc/input.cc
index eaf301ec7c1..a5db934836a 100644
--- a/gcc/input.cc
+++ b/gcc/input.cc
@@ -35,6 +35,12 @@ special_fname_builtin ()
   return _("<built-in>");
 }
 
+const char *
+special_fname_generated ()
+{
+  return _("<generated>");
+}
+
 /* Input charset configuration.  */
 static const char *default_charset_callback (const char *)
 {
@@ -1391,7 +1397,19 @@ dump_location_info (FILE *stream)
       fprintf (stream, "ORDINARY MAP: %i\n", idx);
       dump_location_range (stream,
 			   MAP_START_LOCATION (map), end_location);
-      fprintf (stream, "  file: %s\n", ORDINARY_MAP_FILE_NAME (map));
+
+      if (ORDINARY_MAP_GENERATED_DATA_P (map))
+	{
+	  fprintf (stream, "  file: %s%s\n",
+		   ORDINARY_MAP_CONTAINING_FILE_NAME (line_table, map),
+		   special_fname_generated ());
+	  fprintf (stream, "  data: %.*s\n",
+		   (int) ORDINARY_MAP_GENERATED_DATA_LEN (map),
+		   ORDINARY_MAP_GENERATED_DATA (map));
+	}
+      else
+	fprintf (stream, "  file: %s\n", LINEMAP_FILE (map));
+
       fprintf (stream, "  starting at line: %i\n",
 	       ORDINARY_MAP_STARTING_LINE_NUMBER (map));
       fprintf (stream, "  column and range bits: %i\n",
@@ -1417,6 +1435,9 @@ dump_location_info (FILE *stream)
       case LC_ENTER_MACRO:
 	reason = "LC_RENAME_MACRO";
 	break;
+      case LC_GEN:
+	reason = "LC_GEN";
+	break;
       default:
 	reason = "Unknown";
       }
@@ -1814,11 +1835,11 @@ get_substring_ranges_for_loc (cpp_reader *pfile,
       /* Bulletproofing.  We ought to only have different ordinary maps
 	 for start vs finish due to line-length jumps.  */
       if (start_ord_map != final_ord_map
-	  && start_ord_map->to_file != final_ord_map->to_file)
+	  && !ORDINARY_MAPS_SAME_SOURCE_P (start_ord_map, final_ord_map))
 	return "start and finish are spelled in different ordinary maps";
       /* The file from linemap_resolve_location ought to match that from
 	 expand_location_to_spelling_point.  */
-      if (start_ord_map->to_file != start.file)
+      if (ORDINARY_MAP_SOURCE_ID (start_ord_map) != start.file)
 	return "mismatching file after resolving linemap";
 
       location_t start_loc
diff --git a/gcc/input.h b/gcc/input.h
index d1087b7a9e8..1b81a995f86 100644
--- a/gcc/input.h
+++ b/gcc/input.h
@@ -34,6 +34,7 @@ extern GTY(()) class line_maps *saved_line_table;
 
 /* Returns the translated string referring to the special location.  */
 const char *special_fname_builtin ();
+const char *special_fname_generated ();
 
 /* line-map.cc reserves RESERVED_LOCATION_COUNT to the user.  Ensure
    both UNKNOWN_LOCATION and BUILTINS_LOCATION fit into that.  */
diff --git a/libcpp/directives.cc b/libcpp/directives.cc
index ee5419d1f40..dfd782b3fca 100644
--- a/libcpp/directives.cc
+++ b/libcpp/directives.cc
@@ -1165,7 +1165,7 @@ _cpp_do_file_change (cpp_reader *pfile, enum lc_reason reason,
 		     const char *to_file, linenum_type to_line,
 		     unsigned int sysp)
 {
-  linemap_assert (reason != LC_ENTER_MACRO);
+  linemap_assert (reason != LC_ENTER_MACRO && reason != LC_GEN);
 
   const line_map_ordinary *ord_map = NULL;
   if (!to_line && reason == LC_RENAME_VERBATIM)
@@ -1176,7 +1176,7 @@ _cpp_do_file_change (cpp_reader *pfile, enum lc_reason reason,
          preprocessed source.  */
       line_map_ordinary *last = LINEMAPS_LAST_ORDINARY_MAP (pfile->line_table);
       if (!ORDINARY_MAP_STARTING_LINE_NUMBER (last)
-	  && 0 == filename_cmp (to_file, ORDINARY_MAP_FILE_NAME (last))
+	  && ORDINARY_MAP_SOURCE_ID (last) == to_file
 	  && SOURCE_LINE (last, pfile->line_table->highest_line) == 2)
 	{
 	  ord_map = last;
diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
index 44fea0ea08e..4a03be2f9c7 100644
--- a/libcpp/include/line-map.h
+++ b/libcpp/include/line-map.h
@@ -75,6 +75,8 @@ enum lc_reason
   LC_RENAME_VERBATIM,	/* Likewise, but "" != stdin.  */
   LC_ENTER_MACRO,	/* Begin macro expansion.  */
   LC_MODULE,		/* A (C++) Module.  */
+  LC_GEN,		/* Internally generated source.  */
+
   /* FIXME: add support for stringize and paste.  */
   LC_HWM /* High Water Mark.  */
 };
@@ -355,6 +357,16 @@ typedef void *(*line_map_realloc) (void *, size_t);
    for a given requested allocation.  */
 typedef size_t (*line_map_round_alloc_size_func) (size_t);
 
+/* Struct to hold the data + size for in-memory data to be stored in a
+   line_map_ordinary.  Because this is used rarely, it is better to
+   dynamically allocate this struct just when needed, rather than adding
+   overhead to every line_map to store the extra field.  */
+struct GTY(()) line_map_data
+{
+  const char * GTY((string_length ("%h.len"))) data;
+  unsigned int len;
+};
+
 /* A line_map encodes a sequence of locations.
    There are two kinds of maps. Ordinary maps and macro expansion
    maps, a.k.a macro maps.
@@ -437,9 +449,15 @@ struct GTY((tag ("1"))) line_map_ordinary : public line_map {
 
   /* Pointer alignment boundary on both 32 and 64-bit systems.  */
 
-  const char *to_file;
-  linenum_type to_line;
+  /* SRC is either the file name, in the typical case, or a pointer to
+     a line_map_data which shows where to find the actual data, for the
+     case of an LC_GEN map.  */
+  union {
+    const char * GTY((tag ("false"))) file;
+    line_map_data * GTY((tag ("true"))) data;
+  } GTY((desc ("ORDINARY_MAP_GENERATED_DATA_P (&%1)"))) src;
 
+  linenum_type to_line;
   /* Location from whence this line map was included.  For regular
      #includes, this location will be the last location of a map.  For
      outermost file, this is 0.  For modules it could be anywhere
@@ -565,6 +583,42 @@ struct GTY((tag ("2"))) line_map_macro : public line_map {
 #define linemap_assert_fails(EXPR) (! (EXPR))
 #endif
 
+/* A source_id represents a location that contains source code, which is usually
+   the name of a file.  But if the buffer length is non-zero, then it refers
+   instead to an in-memory buffer.  This is used so that diagnostics can refer
+   to generated data as well as to normal source code.  */
+
+class source_id
+{
+public:
+  /* This constructor is for the typical case, where the source code lives in
+     a file.  It is not explicit, because this case is by far the most common
+     one, it is worthwhile to allow implicit construction from a string.  */
+  source_id (const char *filename = nullptr)
+    : m_filename_or_buffer (filename),
+      m_len (0)
+  {}
+
+  /* This constructor is for the in-memory data case.  */
+  source_id (const char *buffer, unsigned buffer_len)
+    : m_filename_or_buffer (buffer),
+      m_len (buffer_len)
+  {
+    linemap_assert (buffer_len > 0);
+  }
+
+  explicit operator bool () const { return m_filename_or_buffer; }
+  const char * get_filename_or_buffer () const { return m_filename_or_buffer; }
+  unsigned get_buffer_len () const { return m_len; }
+  bool is_buffer () const { return m_len; }
+  bool operator== (source_id src) const;
+  bool operator!= (source_id src) const { return !(*this == src); }
+
+private:
+  const char *m_filename_or_buffer;
+  unsigned m_len;
+};
+
 /* Get whether location LOC is an ordinary location.  */
 
 inline bool
@@ -662,6 +716,12 @@ ORDINARY_MAP_IN_SYSTEM_HEADER_P (const line_map_ordinary *ord_map)
   return ord_map->sysp;
 }
 
+/* TRUE if this line map contains generated data.  */
+inline bool ORDINARY_MAP_GENERATED_DATA_P (const line_map_ordinary *ord_map)
+{
+  return ord_map->reason == LC_GEN;
+}
+
 /* TRUE if this line map is for a module (not a source file).  */
 
 inline bool
@@ -671,14 +731,46 @@ MAP_MODULE_P (const line_map *map)
 	  && linemap_check_ordinary (map)->reason == LC_MODULE);
 }
 
-/* Get the filename of ordinary map MAP.  */
+/* Get the data contents of ordinary map MAP.  */
 
 inline const char *
 ORDINARY_MAP_FILE_NAME (const line_map_ordinary *ord_map)
 {
-  return ord_map->to_file;
+  linemap_assert (ord_map->reason != LC_GEN);
+  return ord_map->src.file;
+}
+
+inline const char *
+ORDINARY_MAP_GENERATED_DATA (const line_map_ordinary *ord_map)
+{
+  linemap_assert (ord_map->reason == LC_GEN);
+  return ord_map->src.data->data;
+}
+
+inline unsigned int
+ORDINARY_MAP_GENERATED_DATA_LEN (const line_map_ordinary *ord_map)
+{
+  linemap_assert (ord_map->reason == LC_GEN);
+  return ord_map->src.data->len;
+}
+
+inline source_id ORDINARY_MAP_SOURCE_ID (const line_map_ordinary *ord_map)
+{
+  if (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
+    return source_id {ord_map->src.data->data, ord_map->src.data->len};
+  return source_id {ord_map->src.file};
+}
+
+/* If we just want to know whether two maps point to the same
+   file/buffer or not.  */
+inline bool
+ORDINARY_MAPS_SAME_SOURCE_P (const line_map_ordinary *map1,
+			   const line_map_ordinary *map2)
+{
+  return ORDINARY_MAP_SOURCE_ID (map1) == ORDINARY_MAP_SOURCE_ID (map2);
 }
 
+
 /* Get the cpp macro whose expansion gave birth to macro map MAP.  */
 
 inline cpp_hashnode *
@@ -1092,22 +1184,26 @@ extern location_t linemap_line_start
 /* Allocate a raw block of line maps, zero initialized.  */
 extern line_map *line_map_new_raw (line_maps *, bool, unsigned);
 
-/* Add a mapping of logical source line to physical source file and
-   line number. This function creates an "ordinary map", which is a
-   map that records locations of tokens that are not part of macro
-   replacement-lists present at a macro expansion point.
+/* Add a mapping of logical source line to physical source file and line
+   number.  This function creates an "ordinary map", which is a map that
+   records locations of tokens that are not part of macro replacement-lists
+   present at a macro expansion point.
 
-   The text pointed to by TO_FILE must have a lifetime
-   at least as long as the lifetime of SET.  An empty
-   TO_FILE means standard input.  If reason is LC_LEAVE, and
-   TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their
+   SRC is a source_id indicating either the name of the file for the
+   location, or, if reason is LC_GEN, the in-memory data.  In either case
+   the data must have a lifetime at least as long as that of SET.  If reason
+   is LC_LEAVE, and SRC is NULL, then SRC, TO_LINE and SYSP are given their
    natural values considering the file we are returning to.
 
-   A call to this function can relocate the previous set of
-   maps, so any stored line_map pointers should not be used.  */
+   If reason is LC_RENAME, and the map being renamed from is an LC_GEN map,
+   then SRC may be NULL and will be copied from the source map.
+
+   A call to this function can relocate the previous set of maps, so any
+   stored line_map pointers should not be used.  */
+
 extern const line_map *linemap_add
   (class line_maps *, enum lc_reason, unsigned int sysp,
-   const char *to_file, linenum_type to_line);
+   source_id src, linenum_type to_line);
 
 /* Create a macro map.  A macro map encodes source locations of tokens
    that are part of a macro replacement-list, at a macro expansion
@@ -1253,11 +1349,12 @@ linemap_position_for_loc_and_offset (class line_maps *set,
 				     location_t loc,
 				     unsigned int offset);
 
-/* Return the file this map is for.  */
+/* Return the file this map is for.  ORD_MAP must not be an
+   LC_GEN map.  */
 inline const char *
 LINEMAP_FILE (const line_map_ordinary *ord_map)
 {
-  return ord_map->to_file;
+  return ORDINARY_MAP_FILE_NAME (ord_map);
 }
 
 /* Return the line number this map started encoding location from.  */
@@ -1277,6 +1374,13 @@ LINEMAP_SYSP (const line_map_ordinary *ord_map)
   return ord_map->sysp;
 }
 
+/* For a normal ordinary map, this is the same as ORDINARY_MAP_FILE_NAME;
+   but for an LC_GEN map, it returns the file name from which the data
+   originated, instead of asserting.  */
+const char *
+ORDINARY_MAP_CONTAINING_FILE_NAME (line_maps *set,
+				   const line_map_ordinary *ord_map);
+
 const struct line_map *first_map_in_common (line_maps *set,
 					    location_t loc0,
 					    location_t loc1,
@@ -2104,12 +2208,10 @@ struct linemap_stats
   long adhoc_table_entries_used;
 };
 
-/* Return the highest location emitted for a given file for which
-   there is a line map in SET.  FILE_NAME is the file name to
-   consider.  If the function returns TRUE, *LOC is set to the highest
-   location emitted for that file.  */
-bool linemap_get_file_highest_location (class line_maps * set,
-					const char *file_name,
+/* Return the highest location emitted for a given source ID for which there is
+   a line map in SET.  If the function returns TRUE, *LOC is set to the highest
+   location emitted for that source.  */
+bool linemap_get_file_highest_location (line_maps *set, source_id src,
 					location_t *loc);
 
 /* Compute and return statistics about the memory consumption of some
diff --git a/libcpp/line-map.cc b/libcpp/line-map.cc
index e0f82e20571..582ff3a6a58 100644
--- a/libcpp/line-map.cc
+++ b/libcpp/line-map.cc
@@ -48,6 +48,47 @@ static location_t linemap_macro_loc_to_exp_point (line_maps *,
 extern unsigned num_expanded_macros_counter;
 extern unsigned num_macro_tokens_counter;
 
+
+/* Determine if two source_id refer to the same location.  If both are
+   files, they should refer to the same file name.  If both are memory
+   buffers, they should be the same buffer.  Otherwise they are
+   different.  */
+
+bool
+source_id::operator== (source_id src) const
+{
+  if (m_len != src.m_len)
+    return false;
+
+  /* Use pointer equality for generated data buffers.  For file names, if
+     either of them is NULL, the other one should also be NULL.
+     (filename_cmp cannot take a NULL pointer.)  */
+  if (is_buffer () || !m_filename_or_buffer || !src.m_filename_or_buffer)
+    return m_filename_or_buffer == src.m_filename_or_buffer;
+
+  /* Pointer equality is usually sufficient for file names, but PCH may end
+     up using different pointers for the same file, so fall back also to
+     filename_cmp().  */
+  return m_filename_or_buffer == src.m_filename_or_buffer
+    || !filename_cmp (m_filename_or_buffer, src.m_filename_or_buffer);
+}
+
+/* For a normal ordinary map, this is the same as ORDINARY_MAP_FILE_NAME;
+   but for an LC_GEN map, it returns the file name from which the data
+   originated, instead of asserting.  */
+const char *
+ORDINARY_MAP_CONTAINING_FILE_NAME (line_maps *set,
+				   const line_map_ordinary *ord_map)
+{
+  while (ORDINARY_MAP_GENERATED_DATA_P (ord_map))
+    {
+      ord_map = linemap_included_from_linemap (set, ord_map);
+      if (!ord_map)
+	return "-";
+    }
+  return ORDINARY_MAP_FILE_NAME (ord_map);
+}
+
 /* Destructor for class line_maps.
    Ensure non-GC-managed memory is released.  */
 
@@ -411,8 +452,9 @@ linemap_check_files_exited (line_maps *set)
   for (const line_map_ordinary *map = LINEMAPS_LAST_ORDINARY_MAP (set);
        ! MAIN_FILE_P (map);
        map = linemap_included_from_linemap (set, map))
-    fprintf (stderr, "line-map.cc: file \"%s\" entered but not left\n",
-	     ORDINARY_MAP_FILE_NAME (map));
+    fprintf (stderr, "line-map.cc: file \"%s%s\" entered but not left\n",
+	     ORDINARY_MAP_CONTAINING_FILE_NAME (set, map),
+	     ORDINARY_MAP_GENERATED_DATA_P (map) ? "<generated>" : "");
 }
 
 /* Create NUM zero-initialized maps of type MACRO_P.  */
@@ -504,22 +546,26 @@ LAST_SOURCE_LINE_LOCATION (const line_map_ordinary *map)
 	  + map->start_location);
 }
 
-/* Add a mapping of logical source line to physical source file and
-   line number.
+/* Add a mapping of logical source line to physical source file and line
+   number.  This function creates an "ordinary map", which is a map that
+   records locations of tokens that are not part of macro replacement-lists
+   present at a macro expansion point.
 
-   The text pointed to by TO_FILE must have a lifetime
-   at least as long as the final call to lookup_line ().  An empty
-   TO_FILE means standard input.  If reason is LC_LEAVE, and
-   TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their
+   SRC is a source_id indicating either the name of the file for the
+   location, or, if reason is LC_GEN, the in-memory data.  In either case
+   the data must have a lifetime at least as long as that of SET.  If reason
+   is LC_LEAVE, and SRC is NULL, then SRC, TO_LINE and SYSP are given their
    natural values considering the file we are returning to.
 
-   FROM_LINE should be monotonic increasing across calls to this
-   function.  A call to this function can relocate the previous set of
-   maps, so any stored line_map pointers should not be used.  */
+   If reason is LC_RENAME, and the map being renamed from is an LC_GEN map,
+   then SRC may be NULL and will be copied from the source map.
+
+   A call to this function can relocate the previous set of maps, so any
+   stored line_map pointers should not be used.  */
 
 const struct line_map *
-linemap_add (line_maps *set, enum lc_reason reason,
-	     unsigned int sysp, const char *to_file, linenum_type to_line)
+linemap_add (line_maps *set, enum lc_reason reason, unsigned int sysp,
+	     source_id src, linenum_type to_line)
 {
   /* Generate a start_location above the current highest_location.
      If possible, make the low range bits be zero.  */
@@ -536,12 +582,25 @@ linemap_add (line_maps *set, enum lc_reason reason,
 
   /* When we enter the file for the first time reason cannot be
      LC_RENAME.  */
-  linemap_assert (!(set->depth == 0 && reason == LC_RENAME));
+  line_map_data *data_to_reuse = nullptr;
+  bool is_data_map = (reason == LC_GEN);
+  linemap_assert (is_data_map == src.is_buffer ());
+  if (reason == LC_RENAME || reason == LC_RENAME_VERBATIM)
+    {
+      linemap_assert (set->depth != 0);
+      const auto prev = LINEMAPS_LAST_ORDINARY_MAP (set);
+      linemap_assert (prev);
+      if (prev->reason == LC_GEN)
+	{
+	  data_to_reuse = prev->src.data;
+	  is_data_map = true;
+	}
+    }
 
   /* If we are leaving the main file, return a NULL map.  */
   if (reason == LC_LEAVE
       && MAIN_FILE_P (LINEMAPS_LAST_ORDINARY_MAP (set))
-      && to_file == NULL)
+      && !src)
     {
       set->depth--;
       return NULL;
@@ -557,8 +616,8 @@ linemap_add (line_maps *set, enum lc_reason reason,
     = linemap_check_ordinary (new_linemap (set, start_location));
   map->reason = reason;
 
-  if (to_file && *to_file == '\0' && reason != LC_RENAME_VERBATIM)
-    to_file = "<stdin>";
+  if (!is_data_map && src == "" && reason != LC_RENAME_VERBATIM)
+    src = "<stdin>";
 
   if (reason == LC_RENAME_VERBATIM)
     reason = LC_RENAME;
@@ -577,21 +636,49 @@ linemap_add (line_maps *set, enum lc_reason reason,
 	 that comes right before MAP in the same file.  */
       from = linemap_included_from_linemap (set, map - 1);
 
-      /* A TO_FILE of NULL is special - we use the natural values.  */
-      if (to_file == NULL)
+      /* Not currently supporting a #include originating from an LC_GEN
+	 map, since there is no clear use case for this and it would complicate
+	 the logic here.  */
+      linemap_assert (!ORDINARY_MAP_GENERATED_DATA_P (from));
+
+      /* A null SRC is special - we use the natural
+	 values.  */
+      if (!src)
 	{
-	  to_file = ORDINARY_MAP_FILE_NAME (from);
+	  src = from->src.file;
 	  to_line = SOURCE_LINE (from, from[1].start_location);
 	  sysp = ORDINARY_MAP_IN_SYSTEM_HEADER_P (from);
 	}
       else
-	linemap_assert (filename_cmp (ORDINARY_MAP_FILE_NAME (from),
-				      to_file) == 0);
+	linemap_assert (ORDINARY_MAP_SOURCE_ID (from) == src);
     }
 
   map->sysp = sysp;
-  map->to_file = to_file;
   map->to_line = to_line;
+
+  if (is_data_map)
+    {
+      /* All data maps should have reason == LC_GEN, even if they were
+	 an LC_RENAME, to keep it simple to check which maps contain
+	 data.  */
+      map->reason = LC_GEN;
+
+      if (data_to_reuse)
+	map->src.data = data_to_reuse;
+      else
+	{
+	  auto src_data
+	    = (line_map_data *)set->reallocator (nullptr,
+						 sizeof (line_map_data));
+	  src_data->data = src.get_filename_or_buffer ();
+	  src_data->len = src.get_buffer_len ();
+	  linemap_assert (src_data->len);
+	  map->src.data = src_data;
+	}
+    }
+  else
+    map->src.file = src.get_filename_or_buffer ();
+
   LINEMAPS_ORDINARY_CACHE (set) = LINEMAPS_ORDINARY_USED (set) - 1;
   /* Do not store range_bits here.  That's readjusted in
      linemap_line_start.  */
@@ -606,7 +693,7 @@ linemap_add (line_maps *set, enum lc_reason reason,
      pure_location_p.  */
   linemap_assert (pure_location_p (set, start_location));
 
-  if (reason == LC_ENTER)
+  if (reason == LC_ENTER || reason == LC_GEN)
     {
       if (set->depth == 0)
 	map->included_from = 0;
@@ -617,7 +704,7 @@ linemap_add (line_maps *set, enum lc_reason reason,
 	      & ~((1 << map[-1].m_column_and_range_bits) - 1))
 	     + map[-1].start_location);
       set->depth++;
-      if (set->trace_includes)
+      if (set->trace_includes && reason == LC_ENTER)
 	trace_include (set, map);
     }
   else if (reason == LC_RENAME)
@@ -859,12 +946,16 @@ linemap_line_start (line_maps *set, linenum_type to_line,
 	      >= (((uint64_t) 1)
 		  << (CHAR_BIT * sizeof (linenum_type) - column_bits)))
 	  || range_bits < map->m_range_bits)
-	map = linemap_check_ordinary
-	        (const_cast <line_map *>
-		  (linemap_add (set, LC_RENAME,
-				ORDINARY_MAP_IN_SYSTEM_HEADER_P (map),
-				ORDINARY_MAP_FILE_NAME (map),
-				to_line)));
+	{
+	  const auto maybe_filename = ORDINARY_MAP_GENERATED_DATA_P (map)
+	    ? nullptr : map->src.file;
+	  map = linemap_check_ordinary
+	    (const_cast <line_map *>
+	     (linemap_add (set, LC_RENAME,
+			   ORDINARY_MAP_IN_SYSTEM_HEADER_P (map),
+			   maybe_filename,
+			   to_line)));
+	}
       map->m_column_and_range_bits = column_bits;
       map->m_range_bits = range_bits;
       r = (MAP_START_LOCATION (map)
@@ -1023,9 +1114,9 @@ linemap_position_for_loc_and_offset (line_maps *set,
 	     >= MAP_START_LOCATION (map + 1)); map++)
     /* If the next map is a different file, or starts in a higher line, we
        cannot encode the location there.  */
-    if ((map + 1)->reason != LC_RENAME
+    if (((map + 1)->reason != LC_RENAME && (map + 1)->reason != LC_GEN)
 	|| line < ORDINARY_MAP_STARTING_LINE_NUMBER (map + 1)
-	|| 0 != strcmp (LINEMAP_FILE (map + 1), LINEMAP_FILE (map)))
+	|| !ORDINARY_MAPS_SAME_SOURCE_P (map, map + 1))
       return loc;
 
   column += column_offset;
@@ -1283,7 +1374,7 @@ linemap_get_expansion_filename (line_maps *set,
 
   linemap_macro_loc_to_exp_point (set, location, &map);
 
-  return LINEMAP_FILE (map);
+  return ORDINARY_MAP_CONTAINING_FILE_NAME (set, map);
 }
 
 /* Return the name of the macro associated to MACRO_MAP.  */
@@ -1873,7 +1964,7 @@ linemap_dump (FILE *stream, class line_maps *set, unsigned ix, bool is_macro)
 {
   const char *const lc_reasons_v[LC_HWM]
       = { "LC_ENTER", "LC_LEAVE", "LC_RENAME", "LC_RENAME_VERBATIM",
-	  "LC_ENTER_MACRO", "LC_MODULE" };
+	  "LC_ENTER_MACRO", "LC_MODULE", "LC_GEN" };
   const line_map *map;
   unsigned reason;
 
@@ -1903,11 +1994,15 @@ linemap_dump (FILE *stream, class line_maps *set, unsigned ix, bool is_macro)
       const line_map_ordinary *includer_map
 	= linemap_included_from_linemap (set, ord_map);
 
-      fprintf (stream, "File: %s:%d\n", ORDINARY_MAP_FILE_NAME (ord_map),
+      fprintf (stream, "File: %s:%d\n",
+	       ORDINARY_MAP_GENERATED_DATA_P (ord_map) ? "<generated>"
+	       : ORDINARY_MAP_FILE_NAME (ord_map),
 	       ORDINARY_MAP_STARTING_LINE_NUMBER (ord_map));
       fprintf (stream, "Included from: [%d] %s\n",
 	       includer_map ? int (includer_map - set->info_ordinary.maps) : -1,
-	       includer_map ? ORDINARY_MAP_FILE_NAME (includer_map) : "None");
+	       includer_map ? ORDINARY_MAP_CONTAINING_FILE_NAME (set,
+								 includer_map)
+	       : "None");
     }
   else
     {
@@ -1931,7 +2026,7 @@ linemap_dump_location (line_maps *set,
 {
   const line_map_ordinary *map;
   location_t location;
-  const char *path = "", *from = "";
+  const char *path = "", *path_suffix = "", *from = "";
   int l = -1, c = -1, s = -1, e = -1;
 
   if (IS_ADHOC_LOC (loc))
@@ -1948,7 +2043,9 @@ linemap_dump_location (line_maps *set,
     linemap_assert (location < RESERVED_LOCATION_COUNT);
   else
     {
-      path = LINEMAP_FILE (map);
+      path = ORDINARY_MAP_CONTAINING_FILE_NAME (set, map);
+      if (ORDINARY_MAP_GENERATED_DATA_P (map))
+	path_suffix = "<generated>";
       l = SOURCE_LINE (map, location);
       c = SOURCE_COLUMN (map, location);
       s = LINEMAP_SYSP (map) != 0;
@@ -1959,24 +2056,23 @@ linemap_dump_location (line_maps *set,
 	{
 	  const line_map_ordinary *from_map
 	    = linemap_included_from_linemap (set, map);
-	  from = from_map ? LINEMAP_FILE (from_map) : "<NULL>";
+	  from = from_map ? ORDINARY_MAP_CONTAINING_FILE_NAME (set, from_map)
+	    : "<NULL>";
 	}
     }
 
   /* P: path, L: line, C: column, S: in-system-header, M: map address,
      E: macro expansion?, LOC: original location, R: resolved location   */
-  fprintf (stream, "{P:%s;F:%s;L:%d;C:%d;S:%d;M:%p;E:%d,LOC:%d,R:%d}",
-	   path, from, l, c, s, (void*)map, e, loc, location);
+  fprintf (stream, "{P:%s%s;F:%s;L:%d;C:%d;S:%d;M:%p;E:%d,LOC:%d,R:%d}",
+	   path, path_suffix, from, l, c, s, (void*)map, e, loc, location);
 }
 
-/* Return the highest location emitted for a given file for which
-   there is a line map in SET.  FILE_NAME is the file name to
-   consider.  If the function returns TRUE, *LOC is set to the highest
-   location emitted for that file.  */
+/* Return the highest location emitted for a given source ID for which there is
+   a line map in SET.  If the function returns TRUE, *LOC is set to the highest
+   location emitted for that source.  */
 
 bool
-linemap_get_file_highest_location (line_maps *set,
-				   const char *file_name,
+linemap_get_file_highest_location (line_maps *set, source_id src,
 				   location_t *loc)
 {
   /* If the set is empty or no ordinary map has been created then
@@ -1984,12 +2080,11 @@ linemap_get_file_highest_location (line_maps *set,
   if (set == NULL || set->info_ordinary.used == 0)
     return false;
 
-  /* Now look for the last ordinary map created for FILE_NAME.  */
+  /* Now look for the last ordinary map created for this file.  */
   int i;
   for (i = set->info_ordinary.used - 1; i >= 0; --i)
     {
-      const char *fname = set->info_ordinary.maps[i].to_file;
-      if (fname && !filename_cmp (fname, file_name))
+      if (ORDINARY_MAP_SOURCE_ID (set->info_ordinary.maps + i) == src)
 	break;
     }
 

  reply	other threads:[~2023-08-13 20:18 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-21 23:08 [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
2023-07-21 23:08 ` [PATCH v3 1/4] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers Lewis Hyatt
2023-07-28 22:58   ` David Malcolm
2023-07-31 22:39     ` Lewis Hyatt
2023-08-09 22:14       ` [PATCH v4 0/8] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 1/8] libcpp: Add LC_GEN linemaps to support in-memory buffers Lewis Hyatt
2023-08-11 22:45           ` David Malcolm
2023-08-13 20:18             ` Lewis Hyatt [this message]
2023-08-09 22:14         ` [PATCH v4 2/8] libcpp: diagnostics: Support generated data in expanded locations Lewis Hyatt
2023-08-11 23:02           ` David Malcolm
2023-08-14 21:41             ` Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 3/8] diagnostics: Refactor class file_cache_slot Lewis Hyatt
2023-08-15 15:43           ` David Malcolm
2023-08-15 17:58             ` Lewis Hyatt
2023-08-15 19:39               ` David Malcolm
2023-08-23 21:22                 ` Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers Lewis Hyatt
2023-08-15 16:15           ` David Malcolm
2023-08-15 18:15             ` Lewis Hyatt
2023-08-15 19:46               ` David Malcolm
2023-08-15 20:08                 ` Lewis Hyatt
2023-08-23 19:41                   ` Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 5/8] diagnostics: Support testing generated data in input.cc selftests Lewis Hyatt
2023-08-15 16:27           ` David Malcolm
2023-08-09 22:14         ` [PATCH v4 6/8] diagnostics: Full support for generated data locations Lewis Hyatt
2023-08-15 16:39           ` David Malcolm
2023-08-09 22:14         ` [PATCH v4 7/8] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 8/8] diagnostics: Support generated data locations in SARIF output Lewis Hyatt
2023-08-15 17:04           ` David Malcolm
2023-08-15 17:51             ` Lewis Hyatt
2023-07-21 23:08 ` [PATCH v3 2/4] diagnostics: Handle generated data locations in edit_context Lewis Hyatt
2023-07-21 23:08 ` [PATCH v3 3/4] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings Lewis Hyatt
2023-07-21 23:08 ` [PATCH v3 4/4] diagnostics: Support generated data locations in SARIF output Lewis Hyatt
2023-07-28 22:22 ` [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens David Malcolm
2023-07-29 14:27   ` Lewis Hyatt
2023-07-29 16:03     ` David Malcolm

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230813201853.GA22077@ldh-imac.local \
    --to=lhyatt@gmail.com \
    --cc=dmalcolm@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).