public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: David Malcolm <dmalcolm@redhat.com>
To: Lewis Hyatt <lhyatt@gmail.com>, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 4/6] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers
Date: Sat, 05 Nov 2022 12:23:28 -0400	[thread overview]
Message-ID: <298deff4873718eb8b5e366007e8f2d4baa83bdb.camel@redhat.com> (raw)
In-Reply-To: <dbe5ba2a7d067eb725c0733ca3960fab969cf139.1667514153.git.lhyatt@gmail.com>

On Fri, 2022-11-04 at 09:44 -0400, Lewis Hyatt via Gcc-patches wrote:
> Add a new linemap reason LC_GEN which enables encoding the location
> of data
> that was generated during compilation and does not appear in any
> source file.
> There could be many use cases, such as, for instance, referring to
> the content
> of builtin macros (not yet implemented, but an easy lift after this
> one.) The
> first intended application is to create a place to store the input to
> a
> _Pragma directive, so that proper locations can be assigned to those
> tokens. This will be done in a subsequent commit.
> 
> The actual change needed to the line-maps API in libcpp is very
> minimal and
> requires no space overhead in the line map data structures (on 64-bit
> systems
> that is; one newly added data member to class line_map_ordinary sits
> inside
> former padding bytes.) An LC_GEN map is just an ordinary map like any
> other,
> but the TO_FILE member that normally points to the file name points
> instead to
> the actual data.  This works automatically with PCH as well, for the
> same
> reason that the file name makes its way into a PCH.
> 
> Outside libcpp, there are many small changes but most of them are to
> selftests, which are necessarily more sensitive to implementation
> details. From the perspective of the user (the "user", here, being a
> frontend
> using line maps or else the diagnostics infrastructure), the chief
> visible
> change is that the function location_get_source_line() should be
> passed an
> expanded_location object instead of a separate filename and line
> number.  This
> is not a big change because in most cases, this information came
> anyway from a
> call to expand_location and the needed expanded_location object is
> readily
> available. The new overload of location_get_source_line() uses the
> extra
> information in the expanded_location object to obtain the data from
> the
> in-memory buffer when it originated from an LC_GEN map.
> 
> Until the subsequent patch that starts using LC_GEN maps, none are
> yet
> generated within GCC, hence nothing is added to the testsuite here;
> but all
> relevant selftests have been extended to cover generated data maps in
> addition to normal files.

Thanks for this patch.


[...snip...]
> 
> diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> index 5890c18bdc3..2935d7fb236 100644
> --- a/gcc/c-family/c-common.cc
> +++ b/gcc/c-family/c-common.cc
> @@ -9183,11 +9183,14 @@ try_to_locate_new_include_insertion_point (const char *file, location_t loc)
>        const line_map_ordinary *ord_map
>         = LINEMAPS_ORDINARY_MAP_AT (line_table, i);
>  
> +      if (ord_map->reason == LC_GEN)
> +       continue;
> +
>        if (const line_map_ordinary *from
>           = linemap_included_from_linemap (line_table, ord_map))
>         /* We cannot use pointer equality, because with preprocessed
>            input all filename strings are unique.  */
> -       if (0 == strcmp (from->to_file, file))
> +       if (from->reason != LC_GEN && 0 == strcmp (from->to_file, file))
>           {
>             last_include_ord_map = from;
>             last_ord_map_after_include = NULL;

[...snip...]

I'm not a fan of having the "to_file" field change meaning based on
whether reason is LC_GEN.

How involved would it be to split line_map_ordinary into two
subclasses, so that we'd have this hierarchy (with indentation showing
inheritance):

line_map
  line_map_ordinary
    line_map_ordinary_file
    line_map_ordinary_generated
  line_map_macro

Alternatively, how about renaming "to_file" to be "data" (or "m_data"),
to emphasize that it might not be a filename, and that we have to check
everywhere we access that field.

Please can all those checks for LC_GEN go into an inline function so we
can write e.g.
  map->generated_p ()
or somesuch.

If I reading things right, patch 6 adds the sole usage of this in
destringize_and_run.  Would we ever want to discriminate between
different kinds of generated buffers?

[...snip...]

> @@ -796,10 +798,13 @@ diagnostic_report_current_module (diagnostic_context *context, location_t where)
>                  N_("of module"),
>                  N_("In module imported at"),   /* 6 */
>                  N_("imported at"),
> +                N_("In buffer generated from"),   /* 8 */
>                 };

We use the wording "destringized" in:

so maybe this should be "In buffer destringized from" ???  (I'm not
sure) 

[...snip...]

> diff --git a/gcc/input.cc b/gcc/input.cc
> index 483cb6e940d..3cf5480551d 100644
> --- a/gcc/input.cc
> +++ b/gcc/input.cc

[..snip...]

> @@ -58,7 +64,7 @@ public:
>    ~file_cache_slot ();

My initial thought reading the input.cc part of this patch was that I
want it to be very clear when a file_cache_slot is for a real file vs
when we're replaying generated data.  I'd hoped that this could have
been expressed via inheritance, but we preallocate all the cache slots
once in an array in file_cache's ctor and the slots get reused over
time.  So instead of that, can we please have some kind of:

   bool file_slot_p () const;
   bool generated_slot_p () const;

or somesuch, so that we can have clear assertions and conditionals
about the current state of a slot (I think the discriminating condition
is that generated_data_len > 0, right?)

If I'm reading things right, it looks like file_cache_slot::m_file_path
does double duty after this patch, and is either a filename, or a
pointer to the generated data.  If so, please can the patch rename it,
and have all usage guarded appropriately.  Can it be a union? (or does
the ctor prevent that?)

[...snip...]
 
> @@ -445,16 +461,23 @@ file_cache::evicted_cache_tab_entry (unsigned *highest_use_count)
>     num_file_slots files are cached.  */
>  
>  file_cache_slot*
> -file_cache::add_file (const char *file_path)
> +file_cache::add_file (const char *file_path, unsigned int generated_data_len)

Can we split this into two functions: one for files, and one for
generated data?  (add_file vs add_generated_data?)

>  {
>  
> -  FILE *fp = fopen (file_path, "r");
> -  if (fp == NULL)
> -    return NULL;
> +  FILE *fp;
> +  if (generated_data_len)
> +    fp = NULL;
> +  else
> +    {
> +      fp = fopen (file_path, "r");
> +      if (fp == NULL)
> +       return NULL;
> +    }
>  
>    unsigned highest_use_count = 0;
>    file_cache_slot *r = evicted_cache_tab_entry (&highest_use_count);
> -  if (!r->create (in_context, file_path, fp, highest_use_count))
> +  if (!r->create (in_context, file_path, fp, highest_use_count,
> +                 generated_data_len))
>      return NULL;
>    return r;
>  }

[...snip...]

> @@ -535,11 +571,12 @@ file_cache::~file_cache ()
>     it.  */
>  
>  file_cache_slot*
> -file_cache::lookup_or_add_file (const char *file_path)
> +file_cache::lookup_or_add_file (const char *file_path,
> +                               unsigned int generated_data_len)

Likewise, could this be split into:
  lookup_or_add_file
and
  lookup_or_add_generated
or somesuch?

>  {
>    file_cache_slot *r = lookup_file (file_path);

The patch doesn't seem to touch file_cache::lookup_file.  Is the
current implementation of that ideal (it looks like we're going to be
doing strcmp of generated buffers, when presumably for those we could
simply be doing pointer comparisons).

Maybe rename it to lookup_slot?

>    if (r == NULL)
> -    r = add_file (file_path);
> +    r = add_file (file_path, generated_data_len);
>    return r;
>  }
>  
> @@ -547,7 +584,8 @@ file_cache::lookup_or_add_file (const char *file_path)
>     diagnostic.  */
>  
>  file_cache_slot::file_cache_slot ()
> -: m_use_count (0), m_file_path (NULL), m_fp (NULL), m_data (0),
> +: m_use_count (0), m_file_path (NULL), m_fp (NULL),
> +  m_data (0), m_data_active (0),
>    m_alloc_offset (0), m_size (0), m_nb_read (0), m_line_start_idx (0),
>    m_line_num (0), m_total_lines (0), m_missing_trailing_newline (true)
>  {

[...snip...]



> diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
> index 50207cacc12..eb281809cbd 100644
> --- a/libcpp/include/line-map.h
> +++ b/libcpp/include/line-map.h
> @@ -75,6 +75,8 @@ enum lc_reason
>    LC_RENAME_VERBATIM,  /* Likewise, but "" != stdin.  */
>    LC_ENTER_MACRO,      /* Begin macro expansion.  */
>    LC_MODULE,           /* A (C++) Module.  */
> +  LC_GEN,              /* Internally generated source.  */
> +
>    /* FIXME: add support for stringize and paste.  */
>    LC_HWM /* High Water Mark.  */
>  };
> @@ -437,7 +439,11 @@ struct GTY((tag ("1"))) line_map_ordinary : public line_map {
>  
>    /* Pointer alignment boundary on both 32 and 64-bit systems.  */
>  
> -  const char *to_file;
> +  /* This GTY markup is in case this is an LC_GEN map, in which case
> +     to_file actually points to the generated data, which we do not
> +     want to require to be free of null bytes.  */
> +  const char * GTY((string_length ("%h.to_file_len"))) to_file;
> +  unsigned int to_file_len;
>    linenum_type to_line;

What's the intended interaction between this, the garbage-collector,
and PCH?  Is to_file to be allocated in the GC-managed heap, or can it
be outside of it?  Looking at patch 6 I see that this seems to be
allocated (in destringize_and_run) by _cpp_unaligned_alloc.  I don't
remember off the top of my head if that's valid.

>  
>    /* Location from whence this line map was included.  For regular
> @@ -1101,13 +1107,15 @@ extern line_map *line_map_new_raw (line_maps *, bool, unsigned);
>     at least as long as the lifetime of SET.  An empty
>     TO_FILE means standard input.  If reason is LC_LEAVE, and
>     TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their
> -   natural values considering the file we are returning to.
> +   natural values considering the file we are returning to.  If reason
> +   is LC_GEN, then TO_FILE is not a file name, but rather the actual
> +   content, and TO_FILE_LEN>0 is the length of it.
>  
>     A call to this function can relocate the previous set of
>     maps, so any stored line_map pointers should not be used.  */
>  extern const line_map *linemap_add
>    (class line_maps *, enum lc_reason, unsigned int sysp,
> -   const char *to_file, linenum_type to_line);
> +   const char *to_file, linenum_type to_line, unsigned int to_file_len = 0);
>  
>  /* Create a macro map.  A macro map encodes source locations of tokens
>     that are part of a macro replacement-list, at a macro expansion
> @@ -1304,7 +1312,8 @@ linemap_location_before_p (class line_maps *set,
>  
>  typedef struct
>  {
> -  /* The name of the source file involved.  */
> +  /* The name of the source file involved, or NULL if
> +     generated_data is non-NULL.  */
>    const char *file;
>  
>    /* The line-location in the source file.  */
> @@ -1316,6 +1325,10 @@ typedef struct
>  
>    /* In a system header?. */
>    bool sysp;
> +
> +  /* If generated data, the data and its length.  */
> +  unsigned int generated_data_len;
> +  const char *generated_data;
>  } expanded_location;

Probably worth noting that generated_data can contain NUL bytes, and
isn't necessarily NUL-terminated.


Thanks again for the patch; hope this is constructive
Dave


  reply	other threads:[~2022-11-05 16:23 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-04 13:44 [PATCH 0/6] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
2022-11-04 13:44 ` [PATCH 1/6] diagnostics: Fix macro tracking for ad-hoc locations Lewis Hyatt
2022-11-04 15:53   ` David Malcolm
2022-11-04 13:44 ` [PATCH 2/6] diagnostics: Use an inline function rather than hardcoding <built-in> string Lewis Hyatt
2022-11-04 15:55   ` David Malcolm
2022-11-04 13:44 ` [PATCH 3/6] libcpp: Fix paste error with unknown pragma after macro expansion Lewis Hyatt
2022-11-21 17:50   ` Jeff Law
2022-11-04 13:44 ` [PATCH 4/6] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers Lewis Hyatt
2022-11-05 16:23   ` David Malcolm [this message]
2022-11-05 17:28     ` Lewis Hyatt
2022-11-17 21:21     ` Lewis Hyatt
2023-01-05 22:34       ` Lewis Hyatt
2022-11-04 13:44 ` [PATCH 5/6] diagnostics: Support generated data in additional contexts Lewis Hyatt
2022-11-04 16:42   ` David Malcolm
2022-11-04 21:05     ` Lewis Hyatt
2022-11-05  1:54       ` [PATCH 5b/6] diagnostics: Remove null-termination requirement for json::string David Malcolm
2022-11-05  1:55       ` [PATCH 5a/6] diagnostics: Handle generated data locations in edit_context David Malcolm
2022-11-04 13:44 ` [PATCH 6/6] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings Lewis Hyatt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=298deff4873718eb8b5e366007e8f2d4baa83bdb.camel@redhat.com \
    --to=dmalcolm@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=lhyatt@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).