public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: David Malcolm <dmalcolm@redhat.com>
To: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 00/17] RFC: New source-location representation; Language Server Protocol
Date: Wed, 11 Oct 2017 19:32:00 -0000	[thread overview]
Message-ID: <1507747922.29092.49.camel@redhat.com> (raw)
In-Reply-To: <1500926714-56988-1-git-send-email-dmalcolm@redhat.com>

On Mon, 2017-07-24 at 16:04 -0400, David Malcolm wrote:
> We currently capture some source location information in the
> frontends, but there are many kinds of source entity for which we
> *don't*
> retain the location information after the initial parse.
> 
> For example, in the C/C++ frontends:
> 
> * we don't capture the locations of the individual parameters
>   within e.g. an extern function declaration, so we can't underline
>   the pertinent param when there's a mismatching type in a call to
>   that function decl.
> 
> * we don't capture the locations of attributes of a function,
>   so we can't underline these if they're wrong (e.g. a "noreturn" on
> a
>   function that does in fact return).
> 
> * we don't retain the locations of things like close parens and
>   semicolons for after parsing, so we can't offer fix-it hints for
>   adding new attributes, or, say the C++11 "override" feature.
> 
> * we can't at present implement many kinds of useful "cousins" of a
>   compiler on top of the GCC codebase (e.g. code refactoring tools,
>   code reformatting tools, IDE support daemons, etc), since much of
> the
>   useful location information is discarded at parse time.
> 
> This patch kit implements:
> 
> (a) a new, optional, representation of this location information,
>     enabled by a command-line flag
> 
> (b) improvements to various diagnostics to use this location
> information
>     if it's present, falling back to the status-quo (less accurate)
>     source locations otherwise
> 
> (b) a gcc-based implementation of Microsoft's Language Server
> Protocol,
>       https://github.com/Microsoft/language-server-protocol
>     allowing IDEs to connect to a gcc-based LSP server, making RPC
>     calls to query it for things like "where is this struct
> declared?".
>     This last part is very much just a proof-of-concept.
> 
> 
> ================================
> (a) The new location information
> ================================
> 
> Our existing "tree" type represents a node within an abstract syntax
> tree,
> but this is sometimes too abstract - sometimes we want the locations
> of the clauses and tokens that were abstracted away by the frontends.
> 
> In theory we could generate the full parse tree ("concrete syntax
> tree"),
> showing every production followed to parse the input, but it is
> likely
> to be unwieldy: large and difficult to navigate.
> 
> (aside: I found myself re-reading the Dragon book to refresh my mind
> on exactly what an AST vs a CST is; I also found this blog post to be
> very useful:
>   http://eli.thegreenplace.net/2009/02/16/abstract-vs-concrete-syntax
> -trees )
> 
> So the patch kit implements a middle-ground: an additional tree of
> parse
> information, much more concrete than our "tree" type, but not quite
> the
> full parse tree.
> 
> My working title for this system is "BLT" (and hence "class
> blt_node").
> I could claim that this is a acronym for "bonus location tree" (but
> it's actually a reference to a sandwich) - it needs a name, and that
> name needs to not clash with anything else in the source tree.
> "Parse Tree" would be "PT" which clashes with "points-to", and
> "Concrete Syntax Tree" would be "CST" which clashes with our
> abbreviation
> for "constant".  ("BLT" popped into my mind somewhere between "AST"
> and "CST"; ideas for better names welcome).
> 
> blt_nodes form a tree-like structure; a blt_node has a "kind",
> identifying the terminal or nonterminal it corresponds to
> (e.g. BLT_TRANSLATION_UNIT or BLT_DECLARATION_SPECIFIERS).
> This is just an enum, but allows for language-specific traversals,
> without introducing significant language-specific features in
> the shared "gcc" dir (it's just an enum of IDs).
> 
> There is a partial mapping between "tree" and blt_node: a blt_node
> can reference a tree, and a tree can reference a blt_node, though
> typically the mapping is very sparse; most don't.  This allows us
> to go from e.g. a function_decl in the "tree" world and navigate to
> pertinent parts of the syntax that was used to declare it.
> 
> All of this is enabled by a new "-fblt" command-line option; in the
> absense of -fblt, almost all of it becomes close to no-ops, and the
> relevant diagnostics fall back to using less accurate location
> information.
> 
> So it's a kind of optional, "on-the-side" record of how we parsed
> the source, with a sparse relationship to our tree type.
> 
> The patch kit implements it for the C and C++ frontends.
> 
> An example of a BLT dump for a C file can be seen here:
>   https://dmalcolm.fedorapeople.org/gcc/2017-07-24/fdump-blt.html
> It shows the tree structure using indentation (and colorization);
> source locations are printed, and, for each node where the
> location is different from the parent, the pertinent source range
> is printed and underlined inline.
> (BTW, does the colorization of dumps look useful for other
> dump formats?  similarly for the ASCII art for printing hierarchies)
> 
> 
> =====================
> (b) Examples of usage
> =====================
> 
> Patches 6-10 in the kit update various diagnostics to use
> the improved location information where available:
> 
> * C and C++: highlighting the pertinent parameter of a function
>   decl when there's a mismatched type in a call
> 
> * C and C++: highlighting the return type in the function defn
>   when compaining about mismatch in the body (e.g. highlighting
>   the "void" when we see a "return EXPR;" within a void function).
> 
> * C++: add a fix-it hint to -Wsuggest-override
> 
> I have plenty of ideas for other uses of this infrastructure
> (but which aren't implemented yet), e.g.:
> 
> * C++: highlight the "const" token (or suggest a fix-it hint)
>   when you have a missing "const" on the *definition* of a member
>   function that was declared as "const" (I make this mistake
>   all the time).
> 
> * C++: add a fix-it hint to -Wsuggest-final-methods
> 
> * highlight bogus attributes
> 
> * add fix-it hints suggesting missing attributes
> 
> ...etc, plus those "cousins of a compiler" ideas mentioned above.
> 
> Any other ideas?
> 
> 
> ============================
> (c) Language Server Protocol
> ============================
> 
> The later parts of the patch kit implement a proof-of-concept
> LSP server, making use of the extended location information,
> exposing it to IDEs.
> 
> LSP is an RPC protocol layered on top of JSON-RPC (and hence JSON
> and HTTP):
>   https://github.com/Microsoft/language-server-protocol
> so the patch kit implements a set of classes to support
> this (including a barebones HTTP server running inside cc1), and
> a toy IDE written in PyGTK to test it.
> 
> 
> =======
> Caveats
> =======
> 
> * There are plenty of FIXMEs and TODOs in the patch kit.
> 
> * I've entirely ignored tentative parsing in the C++ frontend for
> now.
> 
> * I haven't attempted to optimize it at all yet (so no performance
>   measurements yet).
> 
> * How much of the syntax tree ought to be captured?  I've focussed on
> the
>   stuff outside of function bodies, since we don't currently capture
> that
>   well, but to do "proper" IDE support we'd want to capture things
> more
>   deeply.  (I experimented with using it to fix some of our missing
>   location information for things like uses of constants and
> variables
>   as arguments at callsites, but it quickly turned into a much more
>   invasive patch).
> 
> * The LSP implementation is a just a proof-of-concept, to further
>   motivate capturing the extra data.  Turning it into a "proper" LSP
>   server implementation would be a *lot* more work, and I'm unlikely
> to
>   actually do that (but maybe someone on the list wants to take this
> on?)
> 
> I've successfully bootstrapped&regrtested the combination of the
> patches
> on x86_64-pc-linux-gnu; takes -fself-test from 39458 passes to 41574;
> adds 30 PASS results to gcc.sum; adds 182 PASS results to g++.sum.
> 
> Thoughts?
> Dave
> 
> David Malcolm (17):
>   Add param-type-mismatch.c/C testcases as a baseline
>   diagnostics: support prefixes within diagnostic_show_locus
>   Core of BLT implementation
>   C frontend: capture BLT information
>   C++ frontend: capture BLT information
>   C: use BLT to highlight parameter of callee decl for mismatching
> types
>   C++: use BLT to highlight parameter of callee decl for mismatching
>     types
>   C: highlight return types when complaining about mismatches
>   C++: highlight return types when complaining about mismatches
>   C++: provide fix-it hints in -Wsuggest-override
>   Add JSON implementation
>   Add server.h and server.c
>   Add http-server.h and http-server.c
>   Add implementation of JSON-RPC
>   Language Server Protocol: add lsp::server abstract base class
>   Language Server Protocol: proof-of-concept GCC implementation
>   Language Server Protocol: work-in-progess on testsuite

[...snip...]

When I presented this at Cauldron, both Jason and Nathan were unhappy
with this proposal: rather than adding a new on-the-side
representation, they both wanted me to use the old representation,
fixing and extending it as necessary (I hope I'm correctly
characterizing their opinion).

Jason also pointed out a way to achieve my #1 use case for this via the
existing representation (highlighting the pertinent parameter of a
function decl when there's a mismatched type in a call); I've
implemented his approach for this in trunk (r253411 for C and r253096
for C++).

Given the above, it seems unlikely that this will go forward at this
time.

In case someone wants to tackle this, I've archived the patches into
our git mirror, as git-only branches:

dmalcolm/lsp-v1.0 has the patches I posted (and a little extra):
https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/dmalcolm/lsp-v1.0


dmalcolm/lsp-v1.3 has the latest state of the extra work I did after posting:
https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/dmalcolm/lsp-v1.3


Dave

      parent reply	other threads:[~2017-10-11 18:52 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-24 19:31 David Malcolm
2017-07-24 19:31 ` [PATCH 13/17] Add http-server.h and http-server.c David Malcolm
2017-07-24 19:31 ` [PATCH 17/17] Language Server Protocol: work-in-progess on testsuite David Malcolm
2017-07-24 19:31 ` [PATCH 02/17] diagnostics: support prefixes within diagnostic_show_locus David Malcolm
2017-09-01 17:11   ` Jeff Law
2017-07-24 19:31 ` [PATCH 15/17] Language Server Protocol: add lsp::server abstract base class David Malcolm
2017-07-27  4:14   ` Trevor Saunders
2017-07-28 14:48     ` David Malcolm
2017-07-27  7:55   ` Richard Biener
2017-07-28 16:02     ` David Malcolm
2017-07-24 19:31 ` [PATCH 03/17] Core of BLT implementation David Malcolm
2017-09-01 17:32   ` Jeff Law
2017-07-24 19:31 ` [PATCH 12/17] Add server.h and server.c David Malcolm
2017-07-26 14:35   ` Oleg Endo
2017-07-26 14:50     ` David Malcolm
2017-07-26 21:00       ` Mike Stump
2017-09-01 17:09   ` Jeff Law
2017-07-24 19:31 ` [PATCH 11/17] Add JSON implementation David Malcolm
2017-09-01 17:06   ` Jeff Law
2017-07-24 19:31 ` [PATCH 10/17] C++: provide fix-it hints in -Wsuggest-override David Malcolm
2017-07-24 19:31 ` [PATCH 01/17] Add param-type-mismatch.c/C testcases as a baseline David Malcolm
2017-07-24 19:56   ` Eric Gallager
2017-09-01 17:00   ` Jeff Law
2017-07-24 19:38 ` [PATCH 06/17] C: use BLT to highlight parameter of callee decl for mismatching types David Malcolm
2017-07-24 19:38 ` [PATCH 07/17] C++: " David Malcolm
2017-07-24 19:38 ` [PATCH 14/17] Add implementation of JSON-RPC David Malcolm
2017-07-24 19:39 ` [PATCH 04/17] C frontend: capture BLT information David Malcolm
2017-07-27 19:58   ` Martin Sebor
2017-07-24 19:39 ` [PATCH 08/17] C: highlight return types when complaining about mismatches David Malcolm
2017-07-24 19:39 ` [PATCH 09/17] C++: " David Malcolm
2017-07-24 19:40 ` [PATCH 05/17] C++ frontend: capture BLT information David Malcolm
2017-07-24 19:41 ` [PATCH 16/17] Language Server Protocol: proof-of-concept GCC implementation David Malcolm
2017-07-26 17:10 ` [PATCH 00/17] RFC: New source-location representation; Language Server Protocol Jim Wilson
2017-07-28  5:12   ` Alexandre Oliva
2017-07-27 19:51 ` Martin Sebor
2017-07-28 14:28   ` David Malcolm
2017-07-28 18:32     ` Martin Sebor
2017-10-11 19:32 ` David Malcolm [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1507747922.29092.49.camel@redhat.com \
    --to=dmalcolm@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).