public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Lewis Hyatt <lhyatt@gmail.com>
To: gcc-patches@gcc.gnu.org
Cc: David Malcolm <dmalcolm@redhat.com>
Subject: Re: [PATCH 4/6] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers
Date: Thu, 5 Jan 2023 17:34:41 -0500	[thread overview]
Message-ID: <CAA_5UQ6dZvxOhiOF67oVRSrE5qZH5Jf7Y8NeaTAt9PWBfYMFEg@mail.gmail.com> (raw)
In-Reply-To: <20221117212115.GA38914@ldh-imac.local>

On Thu, Nov 17, 2022 at 4:21 PM Lewis Hyatt <lhyatt@gmail.com> wrote:
>
> On Sat, Nov 05, 2022 at 12:23:28PM -0400, David Malcolm wrote:
> > On Fri, 2022-11-04 at 09:44 -0400, Lewis Hyatt via Gcc-patches wrote:
> > [...snip...]
> > >
> > > diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> > > index 5890c18bdc3..2935d7fb236 100644
> > > --- a/gcc/c-family/c-common.cc
> > > +++ b/gcc/c-family/c-common.cc
> > > @@ -9183,11 +9183,14 @@ try_to_locate_new_include_insertion_point (const char *file, location_t loc)
> > >        const line_map_ordinary *ord_map
> > >         = LINEMAPS_ORDINARY_MAP_AT (line_table, i);
> > >
> > > +      if (ord_map->reason == LC_GEN)
> > > +       continue;
> > > +
> > >        if (const line_map_ordinary *from
> > >           = linemap_included_from_linemap (line_table, ord_map))
> > >         /* We cannot use pointer equality, because with preprocessed
> > >            input all filename strings are unique.  */
> > > -       if (0 == strcmp (from->to_file, file))
> > > +       if (from->reason != LC_GEN && 0 == strcmp (from->to_file, file))
> > >           {
> > >             last_include_ord_map = from;
> > >             last_ord_map_after_include = NULL;
> >
> > [...snip...]
> >
> > I'm not a fan of having the "to_file" field change meaning based on
> > whether reason is LC_GEN.
> >
> > How involved would it be to split line_map_ordinary into two
> > subclasses, so that we'd have this hierarchy (with indentation showing
> > inheritance):
> >
> > line_map
> >   line_map_ordinary
> >     line_map_ordinary_file
> >     line_map_ordinary_generated
> >   line_map_macro
> >
> > Alternatively, how about renaming "to_file" to be "data" (or "m_data"),
> > to emphasize that it might not be a filename, and that we have to check
> > everywhere we access that field.
> >
> > Please can all those checks for LC_GEN go into an inline function so we
> > can write e.g.
> >   map->generated_p ()
> > or somesuch.
> >
> > If I reading things right, patch 6 adds the sole usage of this in
> > destringize_and_run.  Would we ever want to discriminate between
> > different kinds of generated buffers?
> >
> > [...snip...]
> >
> > > @@ -796,10 +798,13 @@ diagnostic_report_current_module (diagnostic_context *context, location_t where)
> > >                  N_("of module"),
> > >                  N_("In module imported at"),   /* 6 */
> > >                  N_("imported at"),
> > > +                N_("In buffer generated from"),   /* 8 */
> > >                 };
> >
> > We use the wording "destringized" in:
> >
> > so maybe this should be "In buffer destringized from" ???  (I'm not
> > sure)
> >
> > [...snip...]
> >
> > > diff --git a/gcc/input.cc b/gcc/input.cc
> > > index 483cb6e940d..3cf5480551d 100644
> > > --- a/gcc/input.cc
> > > +++ b/gcc/input.cc
> >
> > [..snip...]
> >
> > > @@ -58,7 +64,7 @@ public:
> > >    ~file_cache_slot ();
> >
> > My initial thought reading the input.cc part of this patch was that I
> > want it to be very clear when a file_cache_slot is for a real file vs
> > when we're replaying generated data.  I'd hoped that this could have
> > been expressed via inheritance, but we preallocate all the cache slots
> > once in an array in file_cache's ctor and the slots get reused over
> > time.  So instead of that, can we please have some kind of:
> >
> >    bool file_slot_p () const;
> >    bool generated_slot_p () const;
> >
> > or somesuch, so that we can have clear assertions and conditionals
> > about the current state of a slot (I think the discriminating condition
> > is that generated_data_len > 0, right?)
> >
> > If I'm reading things right, it looks like file_cache_slot::m_file_path
> > does double duty after this patch, and is either a filename, or a
> > pointer to the generated data.  If so, please can the patch rename it,
> > and have all usage guarded appropriately.  Can it be a union? (or does
> > the ctor prevent that?)
> >
> > [...snip...]
> >
> > > @@ -445,16 +461,23 @@ file_cache::evicted_cache_tab_entry (unsigned *highest_use_count)
> > >     num_file_slots files are cached.  */
> > >
> > >  file_cache_slot*
> > > -file_cache::add_file (const char *file_path)
> > > +file_cache::add_file (const char *file_path, unsigned int generated_data_len)
> >
> > Can we split this into two functions: one for files, and one for
> > generated data?  (add_file vs add_generated_data?)
> >
> > >  {
> > >
> > > -  FILE *fp = fopen (file_path, "r");
> > > -  if (fp == NULL)
> > > -    return NULL;
> > > +  FILE *fp;
> > > +  if (generated_data_len)
> > > +    fp = NULL;
> > > +  else
> > > +    {
> > > +      fp = fopen (file_path, "r");
> > > +      if (fp == NULL)
> > > +       return NULL;
> > > +    }
> > >
> > >    unsigned highest_use_count = 0;
> > >    file_cache_slot *r = evicted_cache_tab_entry (&highest_use_count);
> > > -  if (!r->create (in_context, file_path, fp, highest_use_count))
> > > +  if (!r->create (in_context, file_path, fp, highest_use_count,
> > > +                 generated_data_len))
> > >      return NULL;
> > >    return r;
> > >  }
> >
> > [...snip...]
> >
> > > @@ -535,11 +571,12 @@ file_cache::~file_cache ()
> > >     it.  */
> > >
> > >  file_cache_slot*
> > > -file_cache::lookup_or_add_file (const char *file_path)
> > > +file_cache::lookup_or_add_file (const char *file_path,
> > > +                               unsigned int generated_data_len)
> >
> > Likewise, could this be split into:
> >   lookup_or_add_file
> > and
> >   lookup_or_add_generated
> > or somesuch?
> >
> > >  {
> > >    file_cache_slot *r = lookup_file (file_path);
> >
> > The patch doesn't seem to touch file_cache::lookup_file.  Is the
> > current implementation of that ideal (it looks like we're going to be
> > doing strcmp of generated buffers, when presumably for those we could
> > simply be doing pointer comparisons).
> >
> > Maybe rename it to lookup_slot?
> >
> > >    if (r == NULL)
> > > -    r = add_file (file_path);
> > > +    r = add_file (file_path, generated_data_len);
> > >    return r;
> > >  }
> > >
> > > @@ -547,7 +584,8 @@ file_cache::lookup_or_add_file (const char *file_path)
> > >     diagnostic.  */
> > >
> > >  file_cache_slot::file_cache_slot ()
> > > -: m_use_count (0), m_file_path (NULL), m_fp (NULL), m_data (0),
> > > +: m_use_count (0), m_file_path (NULL), m_fp (NULL),
> > > +  m_data (0), m_data_active (0),
> > >    m_alloc_offset (0), m_size (0), m_nb_read (0), m_line_start_idx (0),
> > >    m_line_num (0), m_total_lines (0), m_missing_trailing_newline (true)
> > >  {
> >
> > [...snip...]
> >
> >
> >
> > > diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
> > > index 50207cacc12..eb281809cbd 100644
> > > --- a/libcpp/include/line-map.h
> > > +++ b/libcpp/include/line-map.h
> > > @@ -75,6 +75,8 @@ enum lc_reason
> > >    LC_RENAME_VERBATIM,  /* Likewise, but "" != stdin.  */
> > >    LC_ENTER_MACRO,      /* Begin macro expansion.  */
> > >    LC_MODULE,           /* A (C++) Module.  */
> > > +  LC_GEN,              /* Internally generated source.  */
> > > +
> > >    /* FIXME: add support for stringize and paste.  */
> > >    LC_HWM /* High Water Mark.  */
> > >  };
> > > @@ -437,7 +439,11 @@ struct GTY((tag ("1"))) line_map_ordinary : public line_map {
> > >
> > >    /* Pointer alignment boundary on both 32 and 64-bit systems.  */
> > >
> > > -  const char *to_file;
> > > +  /* This GTY markup is in case this is an LC_GEN map, in which case
> > > +     to_file actually points to the generated data, which we do not
> > > +     want to require to be free of null bytes.  */
> > > +  const char * GTY((string_length ("%h.to_file_len"))) to_file;
> > > +  unsigned int to_file_len;
> > >    linenum_type to_line;
> >
> > What's the intended interaction between this, the garbage-collector,
> > and PCH?  Is to_file to be allocated in the GC-managed heap, or can it
> > be outside of it?  Looking at patch 6 I see that this seems to be
> > allocated (in destringize_and_run) by _cpp_unaligned_alloc.  I don't
> > remember off the top of my head if that's valid.
> >
> > >
> > >    /* Location from whence this line map was included.  For regular
> > > @@ -1101,13 +1107,15 @@ extern line_map *line_map_new_raw (line_maps *, bool, unsigned);
> > >     at least as long as the lifetime of SET.  An empty
> > >     TO_FILE means standard input.  If reason is LC_LEAVE, and
> > >     TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their
> > > -   natural values considering the file we are returning to.
> > > +   natural values considering the file we are returning to.  If reason
> > > +   is LC_GEN, then TO_FILE is not a file name, but rather the actual
> > > +   content, and TO_FILE_LEN>0 is the length of it.
> > >
> > >     A call to this function can relocate the previous set of
> > >     maps, so any stored line_map pointers should not be used.  */
> > >  extern const line_map *linemap_add
> > >    (class line_maps *, enum lc_reason, unsigned int sysp,
> > > -   const char *to_file, linenum_type to_line);
> > > +   const char *to_file, linenum_type to_line, unsigned int to_file_len = 0);
> > >
> > >  /* Create a macro map.  A macro map encodes source locations of tokens
> > >     that are part of a macro replacement-list, at a macro expansion
> > > @@ -1304,7 +1312,8 @@ linemap_location_before_p (class line_maps *set,
> > >
> > >  typedef struct
> > >  {
> > > -  /* The name of the source file involved.  */
> > > +  /* The name of the source file involved, or NULL if
> > > +     generated_data is non-NULL.  */
> > >    const char *file;
> > >
> > >    /* The line-location in the source file.  */
> > > @@ -1316,6 +1325,10 @@ typedef struct
> > >
> > >    /* In a system header?. */
> > >    bool sysp;
> > > +
> > > +  /* If generated data, the data and its length.  */
> > > +  unsigned int generated_data_len;
> > > +  const char *generated_data;
> > >  } expanded_location;
> >
> > Probably worth noting that generated_data can contain NUL bytes, and
> > isn't necessarily NUL-terminated.
> >
> >
> > Thanks again for the patch; hope this is constructive
> > Dave
> >
>
> Hi Dave-
>
> Thanks again for taking a look at this one, sorry it's so long. I redid this
> patch 4/6 taking into account all of your suggestions. It's attached here.
> Now, on the linemap side of things, I renamed the member variable from TO_FILE
> to DATA, and created inline accessor functions to get at it.  The inline
> accessors will assert that the linemap is of the correct type.  I checked all
> the call sites and adjusted as needed.  On the input.cc side of things, I
> switched it to use inheritance.  The logic for finding and caching lines
> resides in a base class, while the two derived classes handle retrieving the
> data from the necessary source (a file, or an in-memory buffer).  I think it's
> much nicer now, please let me know what you think? Thanks!
>
> BTW, the remaining patches downstream of this one do not need to be modified,
> except that the new testcase for 5c/6 (the SARIF output patch) needs one line
> changed since the output of -fdump-internal-locations now distinguishes LC_GEN
> maps as well. I can resend that and/or any of the others once the dust
> settles with this one if that's helpful.
>
> -Lewis

Hello-

I thought I might check in on this one to see if you'd still be able
to take a look at it sometime please, and if you like the new approach
better?
It is at https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606616.html
, but I am going to re-submit the 4 remaining patches to the list
shortly so they are all in one place cleanly with more logical
numbering. The other 3 are unchanged from before other than the
one-line change to SARIF output. Thanks!


-Lewis

  reply	other threads:[~2023-01-05 22:34 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-04 13:44 [PATCH 0/6] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
2022-11-04 13:44 ` [PATCH 1/6] diagnostics: Fix macro tracking for ad-hoc locations Lewis Hyatt
2022-11-04 15:53   ` David Malcolm
2022-11-04 13:44 ` [PATCH 2/6] diagnostics: Use an inline function rather than hardcoding <built-in> string Lewis Hyatt
2022-11-04 15:55   ` David Malcolm
2022-11-04 13:44 ` [PATCH 3/6] libcpp: Fix paste error with unknown pragma after macro expansion Lewis Hyatt
2022-11-21 17:50   ` Jeff Law
2022-11-04 13:44 ` [PATCH 4/6] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers Lewis Hyatt
2022-11-05 16:23   ` David Malcolm
2022-11-05 17:28     ` Lewis Hyatt
2022-11-17 21:21     ` Lewis Hyatt
2023-01-05 22:34       ` Lewis Hyatt [this message]
2022-11-04 13:44 ` [PATCH 5/6] diagnostics: Support generated data in additional contexts Lewis Hyatt
2022-11-04 16:42   ` David Malcolm
2022-11-04 21:05     ` Lewis Hyatt
2022-11-05  1:54       ` [PATCH 5b/6] diagnostics: Remove null-termination requirement for json::string David Malcolm
2022-11-05  1:55       ` [PATCH 5a/6] diagnostics: Handle generated data locations in edit_context David Malcolm
2022-11-04 13:44 ` [PATCH 6/6] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings Lewis Hyatt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAA_5UQ6dZvxOhiOF67oVRSrE5qZH5Jf7Y8NeaTAt9PWBfYMFEg@mail.gmail.com \
    --to=lhyatt@gmail.com \
    --cc=dmalcolm@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).