From: Lewis Hyatt <lhyatt@gmail.com>
To: gcc-patches@gcc.gnu.org
Cc: David Malcolm <dmalcolm@redhat.com>
Subject: Re: [PATCH 4/6] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers
Date: Thu, 5 Jan 2023 17:34:41 -0500 [thread overview]
Message-ID: <CAA_5UQ6dZvxOhiOF67oVRSrE5qZH5Jf7Y8NeaTAt9PWBfYMFEg@mail.gmail.com> (raw)
In-Reply-To: <20221117212115.GA38914@ldh-imac.local>
On Thu, Nov 17, 2022 at 4:21 PM Lewis Hyatt <lhyatt@gmail.com> wrote:
>
> On Sat, Nov 05, 2022 at 12:23:28PM -0400, David Malcolm wrote:
> > On Fri, 2022-11-04 at 09:44 -0400, Lewis Hyatt via Gcc-patches wrote:
> > [...snip...]
> > >
> > > diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> > > index 5890c18bdc3..2935d7fb236 100644
> > > --- a/gcc/c-family/c-common.cc
> > > +++ b/gcc/c-family/c-common.cc
> > > @@ -9183,11 +9183,14 @@ try_to_locate_new_include_insertion_point (const char *file, location_t loc)
> > > const line_map_ordinary *ord_map
> > > = LINEMAPS_ORDINARY_MAP_AT (line_table, i);
> > >
> > > + if (ord_map->reason == LC_GEN)
> > > + continue;
> > > +
> > > if (const line_map_ordinary *from
> > > = linemap_included_from_linemap (line_table, ord_map))
> > > /* We cannot use pointer equality, because with preprocessed
> > > input all filename strings are unique. */
> > > - if (0 == strcmp (from->to_file, file))
> > > + if (from->reason != LC_GEN && 0 == strcmp (from->to_file, file))
> > > {
> > > last_include_ord_map = from;
> > > last_ord_map_after_include = NULL;
> >
> > [...snip...]
> >
> > I'm not a fan of having the "to_file" field change meaning based on
> > whether reason is LC_GEN.
> >
> > How involved would it be to split line_map_ordinary into two
> > subclasses, so that we'd have this hierarchy (with indentation showing
> > inheritance):
> >
> > line_map
> > line_map_ordinary
> > line_map_ordinary_file
> > line_map_ordinary_generated
> > line_map_macro
> >
> > Alternatively, how about renaming "to_file" to be "data" (or "m_data"),
> > to emphasize that it might not be a filename, and that we have to check
> > everywhere we access that field.
> >
> > Please can all those checks for LC_GEN go into an inline function so we
> > can write e.g.
> > map->generated_p ()
> > or somesuch.
> >
> > If I reading things right, patch 6 adds the sole usage of this in
> > destringize_and_run. Would we ever want to discriminate between
> > different kinds of generated buffers?
> >
> > [...snip...]
> >
> > > @@ -796,10 +798,13 @@ diagnostic_report_current_module (diagnostic_context *context, location_t where)
> > > N_("of module"),
> > > N_("In module imported at"), /* 6 */
> > > N_("imported at"),
> > > + N_("In buffer generated from"), /* 8 */
> > > };
> >
> > We use the wording "destringized" in:
> >
> > so maybe this should be "In buffer destringized from" ??? (I'm not
> > sure)
> >
> > [...snip...]
> >
> > > diff --git a/gcc/input.cc b/gcc/input.cc
> > > index 483cb6e940d..3cf5480551d 100644
> > > --- a/gcc/input.cc
> > > +++ b/gcc/input.cc
> >
> > [..snip...]
> >
> > > @@ -58,7 +64,7 @@ public:
> > > ~file_cache_slot ();
> >
> > My initial thought reading the input.cc part of this patch was that I
> > want it to be very clear when a file_cache_slot is for a real file vs
> > when we're replaying generated data. I'd hoped that this could have
> > been expressed via inheritance, but we preallocate all the cache slots
> > once in an array in file_cache's ctor and the slots get reused over
> > time. So instead of that, can we please have some kind of:
> >
> > bool file_slot_p () const;
> > bool generated_slot_p () const;
> >
> > or somesuch, so that we can have clear assertions and conditionals
> > about the current state of a slot (I think the discriminating condition
> > is that generated_data_len > 0, right?)
> >
> > If I'm reading things right, it looks like file_cache_slot::m_file_path
> > does double duty after this patch, and is either a filename, or a
> > pointer to the generated data. If so, please can the patch rename it,
> > and have all usage guarded appropriately. Can it be a union? (or does
> > the ctor prevent that?)
> >
> > [...snip...]
> >
> > > @@ -445,16 +461,23 @@ file_cache::evicted_cache_tab_entry (unsigned *highest_use_count)
> > > num_file_slots files are cached. */
> > >
> > > file_cache_slot*
> > > -file_cache::add_file (const char *file_path)
> > > +file_cache::add_file (const char *file_path, unsigned int generated_data_len)
> >
> > Can we split this into two functions: one for files, and one for
> > generated data? (add_file vs add_generated_data?)
> >
> > > {
> > >
> > > - FILE *fp = fopen (file_path, "r");
> > > - if (fp == NULL)
> > > - return NULL;
> > > + FILE *fp;
> > > + if (generated_data_len)
> > > + fp = NULL;
> > > + else
> > > + {
> > > + fp = fopen (file_path, "r");
> > > + if (fp == NULL)
> > > + return NULL;
> > > + }
> > >
> > > unsigned highest_use_count = 0;
> > > file_cache_slot *r = evicted_cache_tab_entry (&highest_use_count);
> > > - if (!r->create (in_context, file_path, fp, highest_use_count))
> > > + if (!r->create (in_context, file_path, fp, highest_use_count,
> > > + generated_data_len))
> > > return NULL;
> > > return r;
> > > }
> >
> > [...snip...]
> >
> > > @@ -535,11 +571,12 @@ file_cache::~file_cache ()
> > > it. */
> > >
> > > file_cache_slot*
> > > -file_cache::lookup_or_add_file (const char *file_path)
> > > +file_cache::lookup_or_add_file (const char *file_path,
> > > + unsigned int generated_data_len)
> >
> > Likewise, could this be split into:
> > lookup_or_add_file
> > and
> > lookup_or_add_generated
> > or somesuch?
> >
> > > {
> > > file_cache_slot *r = lookup_file (file_path);
> >
> > The patch doesn't seem to touch file_cache::lookup_file. Is the
> > current implementation of that ideal (it looks like we're going to be
> > doing strcmp of generated buffers, when presumably for those we could
> > simply be doing pointer comparisons).
> >
> > Maybe rename it to lookup_slot?
> >
> > > if (r == NULL)
> > > - r = add_file (file_path);
> > > + r = add_file (file_path, generated_data_len);
> > > return r;
> > > }
> > >
> > > @@ -547,7 +584,8 @@ file_cache::lookup_or_add_file (const char *file_path)
> > > diagnostic. */
> > >
> > > file_cache_slot::file_cache_slot ()
> > > -: m_use_count (0), m_file_path (NULL), m_fp (NULL), m_data (0),
> > > +: m_use_count (0), m_file_path (NULL), m_fp (NULL),
> > > + m_data (0), m_data_active (0),
> > > m_alloc_offset (0), m_size (0), m_nb_read (0), m_line_start_idx (0),
> > > m_line_num (0), m_total_lines (0), m_missing_trailing_newline (true)
> > > {
> >
> > [...snip...]
> >
> >
> >
> > > diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
> > > index 50207cacc12..eb281809cbd 100644
> > > --- a/libcpp/include/line-map.h
> > > +++ b/libcpp/include/line-map.h
> > > @@ -75,6 +75,8 @@ enum lc_reason
> > > LC_RENAME_VERBATIM, /* Likewise, but "" != stdin. */
> > > LC_ENTER_MACRO, /* Begin macro expansion. */
> > > LC_MODULE, /* A (C++) Module. */
> > > + LC_GEN, /* Internally generated source. */
> > > +
> > > /* FIXME: add support for stringize and paste. */
> > > LC_HWM /* High Water Mark. */
> > > };
> > > @@ -437,7 +439,11 @@ struct GTY((tag ("1"))) line_map_ordinary : public line_map {
> > >
> > > /* Pointer alignment boundary on both 32 and 64-bit systems. */
> > >
> > > - const char *to_file;
> > > + /* This GTY markup is in case this is an LC_GEN map, in which case
> > > + to_file actually points to the generated data, which we do not
> > > + want to require to be free of null bytes. */
> > > + const char * GTY((string_length ("%h.to_file_len"))) to_file;
> > > + unsigned int to_file_len;
> > > linenum_type to_line;
> >
> > What's the intended interaction between this, the garbage-collector,
> > and PCH? Is to_file to be allocated in the GC-managed heap, or can it
> > be outside of it? Looking at patch 6 I see that this seems to be
> > allocated (in destringize_and_run) by _cpp_unaligned_alloc. I don't
> > remember off the top of my head if that's valid.
> >
> > >
> > > /* Location from whence this line map was included. For regular
> > > @@ -1101,13 +1107,15 @@ extern line_map *line_map_new_raw (line_maps *, bool, unsigned);
> > > at least as long as the lifetime of SET. An empty
> > > TO_FILE means standard input. If reason is LC_LEAVE, and
> > > TO_FILE is NULL, then TO_FILE, TO_LINE and SYSP are given their
> > > - natural values considering the file we are returning to.
> > > + natural values considering the file we are returning to. If reason
> > > + is LC_GEN, then TO_FILE is not a file name, but rather the actual
> > > + content, and TO_FILE_LEN>0 is the length of it.
> > >
> > > A call to this function can relocate the previous set of
> > > maps, so any stored line_map pointers should not be used. */
> > > extern const line_map *linemap_add
> > > (class line_maps *, enum lc_reason, unsigned int sysp,
> > > - const char *to_file, linenum_type to_line);
> > > + const char *to_file, linenum_type to_line, unsigned int to_file_len = 0);
> > >
> > > /* Create a macro map. A macro map encodes source locations of tokens
> > > that are part of a macro replacement-list, at a macro expansion
> > > @@ -1304,7 +1312,8 @@ linemap_location_before_p (class line_maps *set,
> > >
> > > typedef struct
> > > {
> > > - /* The name of the source file involved. */
> > > + /* The name of the source file involved, or NULL if
> > > + generated_data is non-NULL. */
> > > const char *file;
> > >
> > > /* The line-location in the source file. */
> > > @@ -1316,6 +1325,10 @@ typedef struct
> > >
> > > /* In a system header?. */
> > > bool sysp;
> > > +
> > > + /* If generated data, the data and its length. */
> > > + unsigned int generated_data_len;
> > > + const char *generated_data;
> > > } expanded_location;
> >
> > Probably worth noting that generated_data can contain NUL bytes, and
> > isn't necessarily NUL-terminated.
> >
> >
> > Thanks again for the patch; hope this is constructive
> > Dave
> >
>
> Hi Dave-
>
> Thanks again for taking a look at this one, sorry it's so long. I redid this
> patch 4/6 taking into account all of your suggestions. It's attached here.
> Now, on the linemap side of things, I renamed the member variable from TO_FILE
> to DATA, and created inline accessor functions to get at it. The inline
> accessors will assert that the linemap is of the correct type. I checked all
> the call sites and adjusted as needed. On the input.cc side of things, I
> switched it to use inheritance. The logic for finding and caching lines
> resides in a base class, while the two derived classes handle retrieving the
> data from the necessary source (a file, or an in-memory buffer). I think it's
> much nicer now, please let me know what you think? Thanks!
>
> BTW, the remaining patches downstream of this one do not need to be modified,
> except that the new testcase for 5c/6 (the SARIF output patch) needs one line
> changed since the output of -fdump-internal-locations now distinguishes LC_GEN
> maps as well. I can resend that and/or any of the others once the dust
> settles with this one if that's helpful.
>
> -Lewis
Hello-
I thought I might check in on this one to see if you'd still be able
to take a look at it sometime please, and if you like the new approach
better?
It is at https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606616.html
, but I am going to re-submit the 4 remaining patches to the list
shortly so they are all in one place cleanly with more logical
numbering. The other 3 are unchanged from before other than the
one-line change to SARIF output. Thanks!
-Lewis
next prev parent reply other threads:[~2023-01-05 22:34 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-04 13:44 [PATCH 0/6] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
2022-11-04 13:44 ` [PATCH 1/6] diagnostics: Fix macro tracking for ad-hoc locations Lewis Hyatt
2022-11-04 15:53 ` David Malcolm
2022-11-04 13:44 ` [PATCH 2/6] diagnostics: Use an inline function rather than hardcoding <built-in> string Lewis Hyatt
2022-11-04 15:55 ` David Malcolm
2022-11-04 13:44 ` [PATCH 3/6] libcpp: Fix paste error with unknown pragma after macro expansion Lewis Hyatt
2022-11-21 17:50 ` Jeff Law
2022-11-04 13:44 ` [PATCH 4/6] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers Lewis Hyatt
2022-11-05 16:23 ` David Malcolm
2022-11-05 17:28 ` Lewis Hyatt
2022-11-17 21:21 ` Lewis Hyatt
2023-01-05 22:34 ` Lewis Hyatt [this message]
2022-11-04 13:44 ` [PATCH 5/6] diagnostics: Support generated data in additional contexts Lewis Hyatt
2022-11-04 16:42 ` David Malcolm
2022-11-04 21:05 ` Lewis Hyatt
2022-11-05 1:54 ` [PATCH 5b/6] diagnostics: Remove null-termination requirement for json::string David Malcolm
2022-11-05 1:55 ` [PATCH 5a/6] diagnostics: Handle generated data locations in edit_context David Malcolm
2022-11-04 13:44 ` [PATCH 6/6] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings Lewis Hyatt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAA_5UQ6dZvxOhiOF67oVRSrE5qZH5Jf7Y8NeaTAt9PWBfYMFEg@mail.gmail.com \
--to=lhyatt@gmail.com \
--cc=dmalcolm@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).