public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Lewis Hyatt <lhyatt@gmail.com>
To: David Malcolm <dmalcolm@redhat.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH v3 1/4] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers
Date: Mon, 31 Jul 2023 18:39:15 -0400	[thread overview]
Message-ID: <CAA_5UQ77jeihxYOZ35pAKQ2h5LygnRLhzTGQQrYD=zVBb4cq2Q@mail.gmail.com> (raw)
In-Reply-To: <7b672a949cd1686d5c0a998cf88e6bbfed4f0cde.camel@redhat.com>

On Fri, Jul 28, 2023 at 6:58 PM David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Fri, 2023-07-21 at 19:08 -0400, Lewis Hyatt wrote:
> > Add a new linemap reason LC_GEN which enables encoding the location
> > of data
> > that was generated during compilation and does not appear in any
> > source file.
> > There could be many use cases, such as, for instance, referring to
> > the content
> > of builtin macros (not yet implemented, but an easy lift after this
> > one.) The
> > first intended application is to create a place to store the input to
> > a
> > _Pragma directive, so that proper locations can be assigned to those
> > tokens. This will be done in a subsequent commit.
> >
> > The actual change needed to the line-maps API in libcpp is not too
> > large and
> > requires no space overhead in the line map data structures (on 64-bit
> > systems
> > that is; one newly added data member to class line_map_ordinary sits
> > inside
> > former padding bytes.) An LC_GEN map is just an ordinary map like any
> > other,
> > but the TO_FILE member that normally points to the file name points
> > instead to
> > the actual data.  This works automatically with PCH as well, for the
> > same
> > reason that the file name makes its way into a PCH.  In order to
> > avoid
> > confusion, the member has been renamed from TO_FILE to DATA, and
> > associated
> > accessors adjusted.
> >
> > Outside libcpp, there are many small changes but most of them are to
> > selftests, which are necessarily more sensitive to implementation
> > details. From the perspective of the user (the "user", here, being a
> > frontend
> > using line maps or else the diagnostics infrastructure), the chief
> > visible
> > change is that the function location_get_source_line() should be
> > passed an
> > expanded_location object instead of a separate filename and line
> > number.  This
> > is not a big change because in most cases, this information came
> > anyway from a
> > call to expand_location and the needed expanded_location object is
> > readily
> > available. The new overload of location_get_source_line() uses the
> > extra
> > information in the expanded_location object to obtain the data from
> > the
> > in-memory buffer when it originated from an LC_GEN map.
> >
> > Until the subsequent patch that starts using LC_GEN maps, none are
> > yet
> > generated within GCC, hence nothing is added to the testsuite here;
> > but all
> > relevant selftests have been extended to cover generated data maps in
> > addition
> > to normal files.
>
> [..snip...]
>
> Thanks for the updated patch.
>
> Reading this patch, it felt a bit unnatural to me to have an
>   (exploded location, source line)
> pair where the exploded location seems to be representing "which source
> file or generated buffer", but the line/column info in that
> exploded_location is to be ignored in favor of the 2nd source line.
>
> I think we're missing a class: something that identifies either a
> specific source file, or a specific generated buffer.
>
> How about something like either:
>
> class source_id
> {
> public:
>   source_id (const char *filename)
>   : m_filename_or_buffer (filename),
>     m_len (0)
>   {
>   }
>
>   explicit source_id (const char *buffer, unsigned buffer_len)
>   : m_filename_or_buffer (buffer),
>     m_len (buffer_len)
>   {
>     linemap_assert (buffer_len > 0);
>   }
>
> private:
>   const char *m_filename_or_buffer;
>   unsigned m_len;  // where 0 means "it's a filename"
> };
>
> or:
>
> class source_id
> {
> public:
>   source_id (const char *filename)
>   : m_ptr (filename),
>     m_is_buffer (false)
>   {
>   }
>
>   explicit source_id (const linemap_ordinary *buffer_linemap)
>   : m_ptr (buffer_linemap),
>     m_is_buffer (true)
>   {
>   }
>
> private:
>   const void *m_ptr;
>   bool m_is_buffer;
> };
>
> and use one of these "source_id file" in place of "const char *file",
> rather than replacing such things with expanded_location?
>
> > diff --git a/gcc/c-family/c-indentation.cc b/gcc/c-family/c-indentation.cc
> > index e8d3dece770..4164fa0b1ba 100644
> > --- a/gcc/c-family/c-indentation.cc
> > +++ b/gcc/c-family/c-indentation.cc
> > @@ -50,7 +50,7 @@ get_visual_column (expanded_location exploc,
> >                  unsigned int *first_nws,
> >                  unsigned int tab_width)
> >  {
> > -  char_span line = location_get_source_line (exploc.file, exploc.line);
> > +  char_span line = location_get_source_line (exploc);
>
> ...so this might contine to be:
>
>   char_span line = location_get_source_line (exploc.file, exploc.line);
>
> ...but expanded_location's "file" field would become a source_id,
> rather than a const char *.  It looks like doing do might make a lot of
> "is this the same file or buffer?"  turn into comparisons of source_id
> instances.
>
> So I think expanded_location would become:
>
> typedef struct
> {
>   /* Either the name of the source file involved, or the
>      specific generated buffer.  */
>   source_id file;
>
>   /* The line-location in the source file.  */
>   int line;
>
>   int column;
>
>   void *data;
>
>   /* In a system header?. */
>   bool sysp;
> } expanded_location;
>
> and we wouldn't need to add these extra fields:
>
> > +
> > +  /* If generated data, the data and its length.  The data may contain embedded
> > +   nulls and need not be null-terminated.  */
> > +  unsigned int generated_data_len;
> > +  const char *generated_data;
> >  } expanded_location;
>
> and we could pass around source_id instances when identifying specific
> filenames/generated buffers.
>
> Does this idea simplify/clarify the patch, or make it more complicated?
>
> [...snip...]
>
> Thoughts?
> Dave
>

Thanks, this makes sense and I think on balance it makes the interface
nicer to do it this way. In the last patch of this series (for SARIF
output) I had found it necessary to use a
    typdef std::pair<const char*, unsigned int> filename_or_buffer;
which was the same thing in spirit as your source_id. It makes sense
to promote that to a real class and use it more widely.

I will send out an updated series with that change later after testing.

I don't think we can simply change "file" in expanded_location to be a
source_id, because this field is used in lots of places that don't
care about generated data buffers, and that interface change would
necessitate touching all of them. For example, gengtype.cc uses libcpp
and expanded_location, and there are a bunch of call sites in the
middle end and backend that do as well, plus e.g. the custom
diagnostics hooks in the C++ front end that print "inlined from
xyz.cc". I thought about making source_id implicitly convertible to
char*, but I think that is too error prone, plus it doesn't help with
the most common use of this field, which is to pass it to printf().
The approach I am thinking to take is to leave "file" as it is in
expanded_location, but to also add a "source_id src" field too. This
way, the only call sites that need to be touched are those that care
about this distinction, and so the number of changes is much more
limited. But it does still achieve the goal that we don't need to use
an expanded_location to communicate with input.cc, we can use a
source_id instead, and that makes the interface more natural.

With the new interface, the main change needed for diagnostics code
would be that instead of calling location_get_source_line(file_name,
line), you would need to call location_get_source_line(src_id, line).
In case both the source_id and the line number come from an
expanded_location, there could be a convenience overload like
location_get_source_line(exploc) also, but it wouldn't be necessary to
involved an expanded_location if the source_id and line are better
handled separately.

-Lewis

  reply	other threads:[~2023-07-31 22:39 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-21 23:08 [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
2023-07-21 23:08 ` [PATCH v3 1/4] diagnostics: libcpp: Add LC_GEN linemaps to support in-memory buffers Lewis Hyatt
2023-07-28 22:58   ` David Malcolm
2023-07-31 22:39     ` Lewis Hyatt [this message]
2023-08-09 22:14       ` [PATCH v4 0/8] diagnostics: libcpp: Overhaul locations for _Pragma tokens Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 1/8] libcpp: Add LC_GEN linemaps to support in-memory buffers Lewis Hyatt
2023-08-11 22:45           ` David Malcolm
2023-08-13 20:18             ` Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 2/8] libcpp: diagnostics: Support generated data in expanded locations Lewis Hyatt
2023-08-11 23:02           ` David Malcolm
2023-08-14 21:41             ` Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 3/8] diagnostics: Refactor class file_cache_slot Lewis Hyatt
2023-08-15 15:43           ` David Malcolm
2023-08-15 17:58             ` Lewis Hyatt
2023-08-15 19:39               ` David Malcolm
2023-08-23 21:22                 ` Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 4/8] diagnostics: Support obtaining source code lines from generated data buffers Lewis Hyatt
2023-08-15 16:15           ` David Malcolm
2023-08-15 18:15             ` Lewis Hyatt
2023-08-15 19:46               ` David Malcolm
2023-08-15 20:08                 ` Lewis Hyatt
2023-08-23 19:41                   ` Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 5/8] diagnostics: Support testing generated data in input.cc selftests Lewis Hyatt
2023-08-15 16:27           ` David Malcolm
2023-08-09 22:14         ` [PATCH v4 6/8] diagnostics: Full support for generated data locations Lewis Hyatt
2023-08-15 16:39           ` David Malcolm
2023-08-09 22:14         ` [PATCH v4 7/8] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings Lewis Hyatt
2023-08-09 22:14         ` [PATCH v4 8/8] diagnostics: Support generated data locations in SARIF output Lewis Hyatt
2023-08-15 17:04           ` David Malcolm
2023-08-15 17:51             ` Lewis Hyatt
2023-07-21 23:08 ` [PATCH v3 2/4] diagnostics: Handle generated data locations in edit_context Lewis Hyatt
2023-07-21 23:08 ` [PATCH v3 3/4] diagnostics: libcpp: Assign real locations to the tokens inside _Pragma strings Lewis Hyatt
2023-07-21 23:08 ` [PATCH v3 4/4] diagnostics: Support generated data locations in SARIF output Lewis Hyatt
2023-07-28 22:22 ` [PATCH v3 0/4] diagnostics: libcpp: Overhaul locations for _Pragma tokens David Malcolm
2023-07-29 14:27   ` Lewis Hyatt
2023-07-29 16:03     ` David Malcolm

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAA_5UQ77jeihxYOZ35pAKQ2h5LygnRLhzTGQQrYD=zVBb4cq2Q@mail.gmail.com' \
    --to=lhyatt@gmail.com \
    --cc=dmalcolm@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).