From: David Malcolm <dmalcolm@redhat.com>
To: Jeff Law <law@redhat.com>, gcc-patches@gcc.gnu.org
Cc: David Malcolm <dmalcolm@redhat.com>
Subject: [committed] libcpp: add callback for comment-handling
Date: Mon, 05 Jun 2017 20:58:00 -0000 [thread overview]
Message-ID: <1496698274-14253-1-git-send-email-dmalcolm@redhat.com> (raw)
In-Reply-To: <5c921d5d-0e76-0663-e834-45ccc7339bf4@redhat.com>
On Fri, 2017-05-12 at 12:47 -0600, Jeff Law wrote:
> On 05/02/2017 01:08 PM, David Malcolm wrote:
> > Currently the C/C++ frontends discard comments when parsing.
> > It's possible to set up libcpp to capture comments as tokens,
> > by setting CPP_OPTION (pfile, discard_comments) to false),
> > and this can be enabled using the -C command line option (see
> > also -CC), but c-family/c-lex.c then discards any CPP_COMMENT
> > tokens it sees, so they're not seen by the frontend parser.
> >
> > The following patch adds an (optional) callback to libcpp
> > for handling comments, giving the comment content, and the
> > location it was seen at. This approach allows arbitrary
> > logic to be wired up to comments, and avoids having to
> > copy the comment content to a new buffer (which the CPP_COMMENT
> > approach does).
> >
> > This could be used by plugins to chain up on the callback
> > e.g. to parse specially-formatted comments, e.g. for
> > documentation generation, or e.g. for GObject introspection
> > annotations [1].
> >
> > As a proof of concept, the patch uses this to add a spellchecker
> > for comments. It uses the Enchant meta-library:
> > https://abiword.github.io/enchant/
> > (essentially a wrapper around 8 different spellchecking libraries).
> > I didn't bother with the autotool detection for enchant, and
> > just hacked it in for now.
> >
> > Example output:
> >
> > test.c:3:46: warning: spellcheck_word: "evaulate"
> > When NONCONST_PRED is false the code will evaulate to constant
> > and
> > ^~~~~~~~
> > test.c:3:46: note: suggestion: "evaluate"
> > When NONCONST_PRED is false the code will evaulate to constant
> > and
> > ^~~~~~~~
> > evaluate
> > test.c:3:46: note: suggestion: "ululate"
> > When NONCONST_PRED is false the code will evaulate to constant
> > and
> > ^~~~~~~~
> > ululate
> > test.c:3:46: note: suggestion: "elevate"
> > When NONCONST_PRED is false the code will evaulate to constant
> > and
> > ^~~~~~~~
> > elevate
> >
> > License-wise, Enchant is LGPL 2.1 "or (at your option) any
> > later version." with a special exception to allow non-LGPL
> > spellchecking providers (e.g. to allow linking against an
> > OS-provided spellchecker).
> >
> > Various FIXMEs are present (e.g. hardcoded "en_US" for the
> > language to spellcheck against).
> > Also, the spellchecker has a lot of false positives e.g.
> > it doesn't grok URLs (and thus complains when it seens them);
> > similar for DejaGnu directives etc.
> >
> > Does enchant seem like a reasonable dependency for the compiler?
> > (it pulls in libpthread.so.0, libglib-2.0.so.0, libgmodule
> > -2.0.so.0).
> > Or would this be better pursued as a plugin? (if so, I'd
> > prefer the plugin to live in the source tree as an example,
> > rather than out-of-tree).
> >
> > Unrelated to spellchecking, I also added two new options:
> > -Wfixme and -Wtodo, for warning when comments containing
> > "FIXME" or "TODO" are encountered.
> > I use such comments a lot during development. I thought some
> > people might want a warning about them (I tend to just use grep
> > though). [TODO: document these in invoke.texi, add test cases]
> >
> > Thoughts? Does any of this sound useful?
> >
> > [not yet bootstrapped; as noted above, I haven't yet done
> > the autoconf stuff for handling Enchant]
> >
> > [1]
> > https://wiki.gnome.org/Projects/GObjectIntrospection/Annotations
> >
> > gcc/ChangeLog:
> > * Makefile.in (LIBS): Hack in -lenchant for now.
> > (OBJS): Add spellcheck-enchant.o.
> > * common.opt (Wfixme): New option.
> > (Wtodo): New option.
> > * spellcheck-enchant.c: New file.
> > * spellcheck-enchant.h: New file.
> >
> > gcc/c-family/ChangeLog:
> > * c-lex.c: Include spellcheck-enchant.h.
> > (init_c_lex): Wire up spellcheck_enchant_check_comment to the
> > comment callback.
> > * c-opts.c: Include spellcheck-enchant.h.
> > (c_common_post_options): Call spellcheck_enchant_init.
> > (c_common_finish): Call spellcheck_enchant_finish.
> >
> > libcpp/ChangeLog:
> > * include/cpplib.h (struct cpp_callbacks): Add "comment"
> > callback.
> > * lex.c (_cpp_lex_direct): Call the comment callback if non
> > -NULL.
> enchant seems a bit out of the sweet spot, particular just to catch
> mis-spellings in comments. But it might make an interesting plugin.
>
> IIRC from our meeting earlier this week, you had another use case
> that
> might have been more compelling, but I can't remember what it was.
(FWIW I think we were chatting about the GObjectIntrospection annotations
use-case)
> I do like the ability to at least capture the comments better and
> while
> we don't have a strong need for that capability now, we might in the
> future.
>
> Jeff
I stripped out all of the spellchecking stuff, and restricted the
scope of the patch to just adding a callback for plugins to hook into
if they're interested in handling comments. I added an example of such
a plugin in the testsuite.
Successfully bootstrapped®rtested on x86_64-pc-linux-gnu.
Committed to trunk as r248901.
gcc/testsuite/ChangeLog:
* g++.dg/plugin/comment_plugin.c: New test plugin.
* g++.dg/plugin/comments-1.C: New test file.
* g++.dg/plugin/plugin.exp (plugin_test_list): Add the above.
libcpp/ChangeLog:
* include/cpplib.h (struct cpp_callbacks): Add "comment"
callback.
* lex.c (_cpp_lex_direct): Call the comment callback if non-NULL.
---
| 63 ++++++++++++++++++++++++++++
| 49 ++++++++++++++++++++++
gcc/testsuite/g++.dg/plugin/plugin.exp | 1 +
libcpp/include/cpplib.h | 9 ++++
libcpp/lex.c | 7 ++++
5 files changed, 129 insertions(+)
create mode 100644 gcc/testsuite/g++.dg/plugin/comment_plugin.c
create mode 100644 gcc/testsuite/g++.dg/plugin/comments-1.C
--git a/gcc/testsuite/g++.dg/plugin/comment_plugin.c b/gcc/testsuite/g++.dg/plugin/comment_plugin.c
new file mode 100644
index 0000000..c3b08e3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/comment_plugin.c
@@ -0,0 +1,63 @@
+/* Test of cpp_callbacks::comments. */
+
+#include "gcc-plugin.h"
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "cpplib.h"
+#include "diagnostic.h"
+#include "c-family/c-pragma.h"
+
+int plugin_is_GPL_compatible;
+
+/* Test callback for cpp_callbacks::comments. */
+
+void
+my_comment_cb (cpp_reader *, source_location loc,
+ const unsigned char *content, size_t len)
+{
+ if (in_system_header_at (loc))
+ return;
+
+ /* CONTENT contains the opening slash-star (or slash-slash),
+ and for C-style comments contains the closing star-slash. */
+ gcc_assert (len >= 2);
+ gcc_assert (content[0] == '/');
+ gcc_assert (content[1] == '*' || content[1] == '/');
+ bool c_style = (content[1] == '*');
+ if (c_style)
+ {
+ gcc_assert (content[len - 2] == '*');
+ gcc_assert (content[len - 1] == '/');
+ }
+
+ if (c_style)
+ inform (loc, "got C-style comment; length=%i", len);
+ else
+ inform (loc, "got C++-style comment; length=%i", len);
+
+ /* Print the content of the comment.
+ For a C-style comment, the buffer CONTENT contains the opening
+ slash-star and closing star-slash, so we can't directly verify
+ it in the DejaGnu test without adding another comment, which
+ would trigger this callback again.
+ Hence we skip the syntactically-significant parts of the comment
+ when printing it. */
+ fprintf (stderr, "stripped content of comment: >");
+ /* Avoid printing trailing star-slash. */
+ if (c_style)
+ len -= 2;
+ for (size_t i = 2; i < len; i++)
+ fputc (content[i], stderr);
+ fprintf (stderr, "<\n");
+}
+
+int
+plugin_init (struct plugin_name_args *plugin_info,
+ struct plugin_gcc_version *version)
+{
+ cpp_callbacks *cb = cpp_get_callbacks (parse_in);
+ cb->comment = my_comment_cb;
+
+ return 0;
+}
--git a/gcc/testsuite/g++.dg/plugin/comments-1.C b/gcc/testsuite/g++.dg/plugin/comments-1.C
new file mode 100644
index 0000000..0821b14
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/comments-1.C
@@ -0,0 +1,49 @@
+/* Example of a one-line C-style comment. */
+#if 0
+{ dg-message "1: got C-style comment; length=45" "" { target *-*-* } .-2 }
+{ dg-begin-multiline-output "" }
+stripped content of comment: > Example of a one-line C-style comment. <
+{ dg-end-multiline-output "" }
+#endif
+
+ /*Another example of a one-line C-style comment.*/
+#if 0
+{ dg-message "6: got C-style comment; length=50" "" { target *-*-* } .-2 }
+{ dg-begin-multiline-output "" }
+stripped content of comment: >Another example of a one-line C-style comment.<
+{ dg-end-multiline-output "" }
+#endif
+
+/**/
+#if 0
+{ dg-message "1: got C-style comment; length=4" "" { target *-*-* } .-2 }
+{ dg-begin-multiline-output "" }
+stripped content of comment: ><
+{ dg-end-multiline-output "" }
+#endif
+
+/* Example of a
+ multi-line C-style comment. */
+#if 0
+{ dg-message "1: got C-style comment; length=50" "" { target *-*-* } .-3 }
+{ dg-begin-multiline-output "" }
+stripped content of comment: > Example of a
+ multi-line C-style comment. <
+{ dg-end-multiline-output "" }
+#endif
+
+// Example of a C++-style comment
+#if 0
+{ dg-message "1: got C\\+\\+-style comment; length=33" "" { target *-*-* } .-2 }
+{ dg-begin-multiline-output "" }
+stripped content of comment: > Example of a C++-style comment<
+{ dg-end-multiline-output "" }
+#endif
+
+//
+#if 0
+{ dg-message "1: got C\\+\\+-style comment; length=2" "" { target *-*-* } .-2 }
+{ dg-begin-multiline-output "" }
+stripped content of comment: ><
+{ dg-end-multiline-output "" }
+#endif
diff --git a/gcc/testsuite/g++.dg/plugin/plugin.exp b/gcc/testsuite/g++.dg/plugin/plugin.exp
index 94ebe93..e40cba3 100644
--- a/gcc/testsuite/g++.dg/plugin/plugin.exp
+++ b/gcc/testsuite/g++.dg/plugin/plugin.exp
@@ -68,6 +68,7 @@ set plugin_test_list [list \
{ show_template_tree_color_plugin.c \
show-template-tree-color.C \
show-template-tree-color-no-elide-type.C } \
+ { comment_plugin.c comments-1.C } \
]
foreach plugin_test $plugin_test_list {
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index b843992..66ef4d6 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -609,6 +609,15 @@ struct cpp_callbacks
/* Callback for providing suggestions for misspelled directives. */
const char *(*get_suggestion) (cpp_reader *, const char *, const char *const *);
+
+ /* Callback for when a comment is encountered, giving the location
+ of the opening slash, a pointer to the content (which is not
+ necessarily 0-terminated), and the length of the content.
+ The content contains the opening slash-star (or slash-slash),
+ and for C-style comments contains the closing star-slash. For
+ C++-style comments it does not include the terminating newline. */
+ void (*comment) (cpp_reader *, source_location, const unsigned char *,
+ size_t);
};
#ifdef VMS
diff --git a/libcpp/lex.c b/libcpp/lex.c
index 9edd2a6..40ff801 100644
--- a/libcpp/lex.c
+++ b/libcpp/lex.c
@@ -2889,6 +2889,13 @@ _cpp_lex_direct (cpp_reader *pfile)
if (fallthrough_comment_p (pfile, comment_start))
fallthrough_comment = true;
+ if (pfile->cb.comment)
+ {
+ size_t len = pfile->buffer->cur - comment_start;
+ pfile->cb.comment (pfile, result->src_loc, comment_start - 1,
+ len + 1);
+ }
+
if (!pfile->state.save_comments)
{
result->flags |= PREV_WHITE;
--
1.8.5.3
prev parent reply other threads:[~2017-06-05 20:58 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-05-02 19:06 [PATCH] RFC: spellchecker for comments, plus -Wfixme and -Wtodo David Malcolm
2017-05-02 21:58 ` Mike Stump
2017-06-08 9:34 ` [PATCH] testsuite: example plugin for spellchecking comments David Malcolm
2017-06-08 19:47 ` Mike Stump
2017-05-03 0:09 ` [PATCH] RFC: spellchecker for comments, plus -Wfixme and -Wtodo Trevor Saunders
2017-05-12 18:56 ` Jeff Law
2017-06-05 20:58 ` David Malcolm [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1496698274-14253-1-git-send-email-dmalcolm@redhat.com \
--to=dmalcolm@redhat.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=law@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).