public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] RFC: spellchecker for comments, plus -Wfixme and -Wtodo
@ 2017-05-02 19:06 David Malcolm
  2017-05-02 21:58 ` Mike Stump
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: David Malcolm @ 2017-05-02 19:06 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

Currently the C/C++ frontends discard comments when parsing.
It's possible to set up libcpp to capture comments as tokens,
by setting CPP_OPTION (pfile, discard_comments) to false),
and this can be enabled using the -C command line option (see
also -CC), but c-family/c-lex.c then discards any CPP_COMMENT
tokens it sees, so they're not seen by the frontend parser.

The following patch adds an (optional) callback to libcpp
for handling comments, giving the comment content, and the
location it was seen at.  This approach allows arbitrary
logic to be wired up to comments, and avoids having to
copy the comment content to a new buffer (which the CPP_COMMENT
approach does).

This could be used by plugins to chain up on the callback
e.g. to parse specially-formatted comments, e.g. for
documentation generation, or e.g. for GObject introspection
annotations [1].

As a proof of concept, the patch uses this to add a spellchecker
for comments.  It uses the Enchant meta-library:
   https://abiword.github.io/enchant/
(essentially a wrapper around 8 different spellchecking libraries).
I didn't bother with the autotool detection for enchant, and
just hacked it in for now.

Example output:

test.c:3:46: warning: spellcheck_word: "evaulate"
    When NONCONST_PRED is false the code will evaulate to constant and
                                              ^~~~~~~~
test.c:3:46: note: suggestion: "evaluate"
    When NONCONST_PRED is false the code will evaulate to constant and
                                              ^~~~~~~~
                                              evaluate
test.c:3:46: note: suggestion: "ululate"
    When NONCONST_PRED is false the code will evaulate to constant and
                                              ^~~~~~~~
                                              ululate
test.c:3:46: note: suggestion: "elevate"
    When NONCONST_PRED is false the code will evaulate to constant and
                                              ^~~~~~~~
                                              elevate

License-wise, Enchant is LGPL 2.1 "or (at your option) any
later version." with a special exception to allow non-LGPL
spellchecking providers (e.g. to allow linking against an
OS-provided spellchecker).

Various FIXMEs are present (e.g. hardcoded "en_US" for the
language to spellcheck against).
Also, the spellchecker has a lot of false positives e.g.
it doesn't grok URLs (and thus complains when it seens them);
similar for DejaGnu directives etc.

Does enchant seem like a reasonable dependency for the compiler?
(it pulls in libpthread.so.0, libglib-2.0.so.0, libgmodule-2.0.so.0).
Or would this be better pursued as a plugin?  (if so, I'd
prefer the plugin to live in the source tree as an example,
rather than out-of-tree).

Unrelated to spellchecking, I also added two new options:
-Wfixme and -Wtodo, for warning when comments containing
"FIXME" or "TODO" are encountered.
I use such comments a lot during development.  I thought some
people might want a warning about them (I tend to just use grep
though).  [TODO: document these in invoke.texi, add test cases]

Thoughts?  Does any of this sound useful?

[not yet bootstrapped; as noted above, I haven't yet done
the autoconf stuff for handling Enchant]

[1] https://wiki.gnome.org/Projects/GObjectIntrospection/Annotations

gcc/ChangeLog:
	* Makefile.in (LIBS): Hack in -lenchant for now.
	(OBJS): Add spellcheck-enchant.o.
	* common.opt (Wfixme): New option.
	(Wtodo): New option.
	* spellcheck-enchant.c: New file.
	* spellcheck-enchant.h: New file.

gcc/c-family/ChangeLog:
	* c-lex.c: Include spellcheck-enchant.h.
	(init_c_lex): Wire up spellcheck_enchant_check_comment to the
	comment callback.
	* c-opts.c: Include spellcheck-enchant.h.
	(c_common_post_options): Call spellcheck_enchant_init.
	(c_common_finish): Call spellcheck_enchant_finish.

libcpp/ChangeLog:
	* include/cpplib.h (struct cpp_callbacks): Add "comment"
	callback.
	* lex.c (_cpp_lex_direct): Call the comment callback if non-NULL.
---
 gcc/Makefile.in          |   3 +-
 gcc/c-family/c-lex.c     |   2 +
 gcc/c-family/c-opts.c    |   4 +
 gcc/common.opt           |   8 ++
 gcc/spellcheck-enchant.c | 294 +++++++++++++++++++++++++++++++++++++++++++++++
 gcc/spellcheck-enchant.h |  33 ++++++
 libcpp/include/cpplib.h  |   4 +
 libcpp/lex.c             |   7 ++
 8 files changed, 354 insertions(+), 1 deletion(-)
 create mode 100644 gcc/spellcheck-enchant.c
 create mode 100644 gcc/spellcheck-enchant.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index f675e07..6bb3dc0 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1046,7 +1046,7 @@ BUILD_LIBDEPS= $(BUILD_LIBIBERTY)
 # How to link with both our special library facilities
 # and the system's installed libraries.
 LIBS = @LIBS@ libcommon.a $(CPPLIB) $(LIBINTL) $(LIBICONV) $(LIBBACKTRACE) \
-	$(LIBIBERTY) $(LIBDECNUMBER) $(HOST_LIBS)
+	$(LIBIBERTY) $(LIBDECNUMBER) $(HOST_LIBS) -lenchant
 BACKENDLIBS = $(ISLLIBS) $(GMPLIBS) $(PLUGINLIBS) $(HOST_LIBS) \
 	$(ZLIB)
 # Any system libraries needed just for GNAT.
@@ -1450,6 +1450,7 @@ OBJS = \
 	simplify-rtx.o \
 	sparseset.o \
 	spellcheck.o \
+	spellcheck-enchant.o \
 	spellcheck-tree.o \
 	sreal.o \
 	stack-ptr-mod.o \
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index e1c8bdf..389302c 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "debug.h"
 
 #include "attribs.h"
+#include "spellcheck-enchant.h"
 
 /* We may keep statistics about how long which files took to compile.  */
 static int header_time, body_time;
@@ -82,6 +83,7 @@ init_c_lex (void)
   cb->has_attribute = c_common_has_attribute;
   cb->get_source_date_epoch = cb_get_source_date_epoch;
   cb->get_suggestion = cb_get_suggestion;
+  cb->comment = spellcheck_enchant_check_comment;
 
   /* Set the debug callbacks if we can use them.  */
   if ((debug_info_level == DINFO_LEVEL_VERBOSE
diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index ea0e01b..cc58dd0 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "plugin.h"		/* For PLUGIN_INCLUDE_FILE event.  */
 #include "mkdeps.h"
 #include "dumpfile.h"
+#include "spellcheck-enchant.h"
 
 #ifndef DOLLARS_IN_IDENTIFIERS
 # define DOLLARS_IN_IDENTIFIERS true
@@ -995,6 +996,7 @@ c_common_post_options (const char **pfilename)
     }
   else
     {
+      spellcheck_enchant_init ("en_US"); // FIXME: get from an option
       init_c_lex ();
 
       /* When writing a PCH file, avoid reading some other PCH file,
@@ -1177,6 +1179,8 @@ c_common_finish (void)
      with cpp_destroy ().  */
   cpp_finish (parse_in, deps_stream);
 
+  spellcheck_enchant_finish ();
+
   if (deps_stream && deps_stream != out_stream
       && (ferror (deps_stream) || fclose (deps_stream)))
     fatal_error (input_location, "closing dependency file %s: %m", deps_file);
diff --git a/gcc/common.opt b/gcc/common.opt
index 5817407..f17d823 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -585,6 +585,10 @@ Wfatal-errors
 Common Var(flag_fatal_errors)
 Exit on the first error occurred.
 
+Wfixme
+Common Var(warn_fixme) Warning
+Complain about comments containing 'FIXME'.
+
 Wframe-larger-than=
 Common RejectNegative Joined UInteger Warning
 -Wframe-larger-than=<number>	Warn if a function's stack frame requires more than <number> bytes.
@@ -737,6 +741,10 @@ Wsystem-headers
 Common Var(warn_system_headers) Warning
 Do not suppress warnings from system headers.
 
+Wtodo
+Common Var(warn_todo) Warning
+Complain about comments containing 'TODO'.
+
 Wtrampolines
 Common Var(warn_trampolines) Warning
 Warn whenever a trampoline is generated.
diff --git a/gcc/spellcheck-enchant.c b/gcc/spellcheck-enchant.c
new file mode 100644
index 0000000..30152c1
--- /dev/null
+++ b/gcc/spellcheck-enchant.c
@@ -0,0 +1,294 @@
+/* Spellchecking using the Enchant "meta-library".
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "diagnostic.h"
+#include "options.h"
+
+/* Enchant is LGPL 2.1 "or (at your option) any later version." with
+   a special exception to allow non-LGPL spellchecking providers.  */
+
+#include <enchant/enchant.h> // FIXME
+
+/* Class wrapping the result of enchant_dict_suggest, ensuring that
+   the suggestions are cleaned up.  */
+
+class suggestions
+{
+ public:
+  suggestions (EnchantDict *dict, const char *const word, ssize_t len)
+    : m_dict (dict), m_suggestions (NULL)
+    {
+      m_suggestions = enchant_dict_suggest (m_dict, word, len, &m_num);
+    }
+  ~suggestions ()
+    {
+      enchant_dict_free_string_list (m_dict, m_suggestions);
+    }
+
+  size_t get_count () const { return m_num; }
+  const char *get_suggestion (size_t index) const
+  {
+    return m_suggestions[index];
+  }
+
+ private:
+  EnchantDict *m_dict;
+  size_t m_num;
+  char **m_suggestions;
+};
+
+/* Callback for describing metadata about the Enchant provider in use.  */
+
+static void
+describe_broker_cb (const char * const provider_name,
+		    const char * const provider_desc,
+		    const char * const provider_dll_file,
+		    void *)
+{
+  inform (UNKNOWN_LOCATION,
+	  "spellcheck provider: %qs  description: %qs  file: %qs",
+	  provider_name, provider_desc, provider_dll_file);
+}
+
+/* A class for wrapping access to the Enchant API (both the broker
+   and the dict).  */
+
+class checker
+{
+ public:
+  checker (const char *language_tag);
+  ~checker ();
+
+  bool misspelled_p (const char *const word, ssize_t len);
+  suggestions get_suggestions (const char *const word, ssize_t len);
+
+  static checker *singleton;
+
+ private:
+  EnchantBroker *m_broker;
+  EnchantDict *m_dict;
+};
+
+/* checker's ctor.  */
+
+checker::checker (const char *language_tag)
+: m_broker (NULL), m_dict (NULL)
+{
+  m_broker = enchant_broker_init ();
+  if (!m_broker)
+    return;
+  m_dict = enchant_broker_request_dict (m_broker, language_tag);
+
+  enchant_broker_describe (m_broker, describe_broker_cb, NULL);
+
+  // FIXME: will probably want to support local wordlists also?
+}
+
+/* checker's dtor.  */
+
+checker::~checker ()
+{
+  if (m_dict)
+    enchant_broker_free_dict (m_broker, m_dict);
+  if (m_broker)
+    enchant_broker_free (m_broker);
+}
+
+/* Return true if WORD of length LEN appears to be misspelled.  */
+
+bool
+checker::misspelled_p (const char *const word, ssize_t len)
+{
+  int result = enchant_dict_check (m_dict, word, len);
+  return result != 0;
+}
+
+/* Get suggestions for misspelled WORD of length LEN.  */
+
+suggestions
+checker::get_suggestions (const char *const word, ssize_t len)
+{
+  return suggestions (m_dict, word, len);
+}
+
+/* Singleton instance of checker.  */
+
+checker *checker::singleton = NULL;
+
+/* Generate a location for a word of length LEN within ORD_MAP at the
+   given LINE AND COLUMN.  */
+
+static location_t
+make_word_location (const line_map_ordinary *ord_map, int line, int column,
+		    size_t len)
+{
+  location_t start_loc
+    = linemap_position_for_line_and_column (line_table, ord_map, line, column);
+  location_t end_loc
+    = linemap_position_for_line_and_column (line_table, ord_map, line,
+					    column + len - 1);
+  return make_location (start_loc, start_loc, end_loc);
+}
+
+/* Subroutine for Wfixme and Wtodo.
+
+   If the WORD of length LEN matches BAD_WORD, issue a warning controlled
+   by OPT at the source location given by ORD_MAP, LINE, and COLUMN, and
+   return true.
+
+   Otherwise return false.  */
+
+static inline bool
+maybe_warn_about_unfinished_code (const char *bad_word, int opt,
+				  const unsigned char *word, size_t len,
+				  const line_map_ordinary *ord_map,
+				  int line, int column)
+{
+  if (len == strlen (bad_word) && !strncmp ((const char *)word, bad_word, len))
+    {
+      location_t word_loc = make_word_location (ord_map, line, column, len);
+      warning_at (word_loc, opt, "%qs found in comment", bad_word);
+      return true;
+    }
+  return false;
+}
+
+/* Spellcheck the word at WORD of length LEN, with a location
+   within ORD_MAP at the given LINE AND COLUMN.
+   Issue warnings and suggestions for possibly misspelled words,
+   and for "FIXME" and "TODO" comments.  */
+
+static void
+spellcheck_word (const unsigned char *word, size_t len,
+		 const line_map_ordinary *ord_map, int line, int column)
+{
+  /* Complain about "FIXME" and "TODO" in comments.  */
+  if (maybe_warn_about_unfinished_code ("FIXME", OPT_Wfixme, word, len, ord_map,
+					line, column))
+    return;
+  if (maybe_warn_about_unfinished_code ("TODO", OPT_Wtodo, word, len, ord_map,
+					line, column))
+    return;
+
+  /* Don't bother spellchecking words with underscores or any upper case,
+     as these may refer to code entities.
+     FIXME: should we also merge in names from code?  */
+  for (size_t i = 0; i < len; i++)
+    {
+      if (word[i] == '_' || ISUPPER (word[i]))
+	return;
+    }
+
+  if (!checker::singleton->misspelled_p ((const char *)word, len))
+    return;
+
+  location_t word_loc = make_word_location (ord_map, line, column, len);
+
+  if (warning_at (word_loc, 0, "spellcheck_word: \"%.*s\"", (int)len, word))
+    {
+      suggestions s
+	= checker::singleton->get_suggestions ((const char *)word, len);
+      for (size_t i = 0; i < s.get_count (); i++)
+	{
+	  rich_location richloc (line_table, word_loc);
+	  richloc.add_fixit_replace (s.get_suggestion (i));
+	  inform_at_rich_loc (&richloc, "suggestion: \"%s\"",
+			      s.get_suggestion (i));
+	}
+    }
+}
+
+/* Initialize this module, using the given LANGUAGE_TAG as the
+   language to use for spellchecking (e.g. "en_US").  */
+
+void
+spellcheck_enchant_init (const char *language_tag)
+{
+  checker::singleton = new checker (language_tag);
+}
+
+/* Callback for cpp_callbacks::comments for spellchecking comments
+   using Enchant.  */
+
+void
+spellcheck_enchant_check_comment (cpp_reader *, source_location loc,
+				  const unsigned char *content, size_t len)
+{
+  /* Split into words, tracking the location, and call spellcheck_word
+     on each word.  */
+
+  // FIXME: sanity check LOC
+
+  const struct line_map *map = linemap_lookup (line_table, loc);
+  const line_map_ordinary *ord_map = linemap_check_ordinary (map);
+
+  int column = LOCATION_COLUMN (loc);
+  int line = LOCATION_LINE (loc);
+  int word_start_column = column;
+  int word_start_line = line;
+  const unsigned char *last_char = content + len;
+  const unsigned char *iter = content;
+  const unsigned char *word_start = NULL;
+  while (iter <= last_char)
+    {
+      unsigned char ch = *iter;
+      if (ISALPHA (ch) || ch == '_')
+	{
+	  if (!word_start)
+	    {
+	      word_start = iter;
+	      word_start_column = column;
+	      word_start_line = line;
+	    }
+	}
+      else
+	{
+	  if (word_start)
+	    {
+	      /* We have a word ending.  */
+	      spellcheck_word (word_start, iter - word_start, ord_map,
+			       word_start_line, word_start_column);
+	      word_start = NULL;
+	    }
+	}
+      if (ch == '\n')
+	{
+	  column = 1;
+	  line++;
+	}
+      else
+	column++;
+      iter++;
+    }
+  if (word_start)
+    // FIXME: doublecheck length here
+    spellcheck_word (word_start, iter - word_start - 1, ord_map,
+		     word_start_line, word_start_column);
+}
+
+/* Clean up this module.  */
+
+void
+spellcheck_enchant_finish (void)
+{
+  delete checker::singleton;
+}
diff --git a/gcc/spellcheck-enchant.h b/gcc/spellcheck-enchant.h
new file mode 100644
index 0000000..33aac0a
--- /dev/null
+++ b/gcc/spellcheck-enchant.h
@@ -0,0 +1,33 @@
+/* Spellchecking using the Enchant "meta-library".
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_SPELLCHECK_ENCHANT_H
+#define GCC_SPELLCHECK_ENCHANT_H
+
+extern void spellcheck_enchant_init (const char *language_tag);
+
+/* Callback for cpp_callbacks::comments for spellchecking comments
+   using Enchant.  */
+
+extern void spellcheck_enchant_check_comment (cpp_reader *, source_location,
+					      const unsigned char *, size_t);
+
+extern void spellcheck_enchant_finish (void);
+
+#endif  /* GCC_SPELLCHECK_ENCHANT_H  */
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index b843992..3138738 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -609,6 +609,10 @@ struct cpp_callbacks
 
   /* Callback for providing suggestions for misspelled directives.  */
   const char *(*get_suggestion) (cpp_reader *, const char *, const char *const *);
+
+  /* Callback for when a comment is encountered.  */
+  void (*comment) (cpp_reader *, source_location, const unsigned char *,
+		   size_t);
 };
 
 #ifdef VMS
diff --git a/libcpp/lex.c b/libcpp/lex.c
index 9edd2a6..40ff801 100644
--- a/libcpp/lex.c
+++ b/libcpp/lex.c
@@ -2889,6 +2889,13 @@ _cpp_lex_direct (cpp_reader *pfile)
       if (fallthrough_comment_p (pfile, comment_start))
 	fallthrough_comment = true;
 
+      if (pfile->cb.comment)
+	{
+	  size_t len = pfile->buffer->cur - comment_start;
+	  pfile->cb.comment (pfile, result->src_loc, comment_start - 1,
+			     len + 1);
+	}
+
       if (!pfile->state.save_comments)
 	{
 	  result->flags |= PREV_WHITE;
-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] RFC: spellchecker for comments, plus -Wfixme and -Wtodo
  2017-05-02 19:06 [PATCH] RFC: spellchecker for comments, plus -Wfixme and -Wtodo David Malcolm
@ 2017-05-02 21:58 ` Mike Stump
  2017-06-08  9:34   ` [PATCH] testsuite: example plugin for spellchecking comments David Malcolm
  2017-05-03  0:09 ` [PATCH] RFC: spellchecker for comments, plus -Wfixme and -Wtodo Trevor Saunders
  2017-05-12 18:56 ` Jeff Law
  2 siblings, 1 reply; 7+ messages in thread
From: Mike Stump @ 2017-05-02 21:58 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

On May 2, 2017, at 12:08 PM, David Malcolm <dmalcolm@redhat.com> wrote:
> 
> As a proof of concept, the patch uses this to add a spellchecker
> for comments.

:-)

> I didn't bother with the autotool detection for enchant, and
> just hacked it in for now.

Only other comment would be, rather then requiring it, would be nice to make it optional.  I can host gcc in an environment that is a bare metal target on newlib.  autoconf can smell it, and tell that a ton of things are missing.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] RFC: spellchecker for comments, plus -Wfixme and -Wtodo
  2017-05-02 19:06 [PATCH] RFC: spellchecker for comments, plus -Wfixme and -Wtodo David Malcolm
  2017-05-02 21:58 ` Mike Stump
@ 2017-05-03  0:09 ` Trevor Saunders
  2017-05-12 18:56 ` Jeff Law
  2 siblings, 0 replies; 7+ messages in thread
From: Trevor Saunders @ 2017-05-03  0:09 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

On Tue, May 02, 2017 at 03:08:01PM -0400, David Malcolm wrote:
> Currently the C/C++ frontends discard comments when parsing.
> It's possible to set up libcpp to capture comments as tokens,
> by setting CPP_OPTION (pfile, discard_comments) to false),
> and this can be enabled using the -C command line option (see
> also -CC), but c-family/c-lex.c then discards any CPP_COMMENT
> tokens it sees, so they're not seen by the frontend parser.
> 
> The following patch adds an (optional) callback to libcpp
> for handling comments, giving the comment content, and the
> location it was seen at.  This approach allows arbitrary
> logic to be wired up to comments, and avoids having to
> copy the comment content to a new buffer (which the CPP_COMMENT
> approach does).
> 
> This could be used by plugins to chain up on the callback
> e.g. to parse specially-formatted comments, e.g. for
> documentation generation, or e.g. for GObject introspection
> annotations [1].

Making that kind of task easier seems like a good thing.  One difficulty
will be associating the comment with the declaration its for. In C++ its
probably better to use attributes when possible but that won't work for
the documentation issue.

> Does enchant seem like a reasonable dependency for the compiler?
> (it pulls in libpthread.so.0, libglib-2.0.so.0, libgmodule-2.0.so.0).
> Or would this be better pursued as a plugin?  (if so, I'd
> prefer the plugin to live in the source tree as an example,
> rather than out-of-tree).

I'd kind of worry about what loading all that does to start up time, but
maybe that's premature optimization.  Anyway it seems like making things
plugins where reasonable will help modularity, so kinda seems like a
good idea anyway.

> Unrelated to spellchecking, I also added two new options:
> -Wfixme and -Wtodo, for warning when comments containing
> "FIXME" or "TODO" are encountered.
> I use such comments a lot during development.  I thought some
> people might want a warning about them (I tend to just use grep
> though).  [TODO: document these in invoke.texi, add test cases]

it seems useful if you don't have existing committed fixmes all over
already, or you can tel those apart easily.

It may also be worth suppoting more wordings, for some reason Mozilla
uses XXX a lot.

> Thoughts?  Does any of this sound useful?

didn't read the code careful, but the ideas seem useful.

Trev

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] RFC: spellchecker for comments, plus -Wfixme and -Wtodo
  2017-05-02 19:06 [PATCH] RFC: spellchecker for comments, plus -Wfixme and -Wtodo David Malcolm
  2017-05-02 21:58 ` Mike Stump
  2017-05-03  0:09 ` [PATCH] RFC: spellchecker for comments, plus -Wfixme and -Wtodo Trevor Saunders
@ 2017-05-12 18:56 ` Jeff Law
  2017-06-05 20:58   ` [committed] libcpp: add callback for comment-handling David Malcolm
  2 siblings, 1 reply; 7+ messages in thread
From: Jeff Law @ 2017-05-12 18:56 UTC (permalink / raw)
  To: David Malcolm, gcc-patches

On 05/02/2017 01:08 PM, David Malcolm wrote:
> Currently the C/C++ frontends discard comments when parsing.
> It's possible to set up libcpp to capture comments as tokens,
> by setting CPP_OPTION (pfile, discard_comments) to false),
> and this can be enabled using the -C command line option (see
> also -CC), but c-family/c-lex.c then discards any CPP_COMMENT
> tokens it sees, so they're not seen by the frontend parser.
> 
> The following patch adds an (optional) callback to libcpp
> for handling comments, giving the comment content, and the
> location it was seen at.  This approach allows arbitrary
> logic to be wired up to comments, and avoids having to
> copy the comment content to a new buffer (which the CPP_COMMENT
> approach does).
> 
> This could be used by plugins to chain up on the callback
> e.g. to parse specially-formatted comments, e.g. for
> documentation generation, or e.g. for GObject introspection
> annotations [1].
> 
> As a proof of concept, the patch uses this to add a spellchecker
> for comments.  It uses the Enchant meta-library:
>     https://abiword.github.io/enchant/
> (essentially a wrapper around 8 different spellchecking libraries).
> I didn't bother with the autotool detection for enchant, and
> just hacked it in for now.
> 
> Example output:
> 
> test.c:3:46: warning: spellcheck_word: "evaulate"
>      When NONCONST_PRED is false the code will evaulate to constant and
>                                                ^~~~~~~~
> test.c:3:46: note: suggestion: "evaluate"
>      When NONCONST_PRED is false the code will evaulate to constant and
>                                                ^~~~~~~~
>                                                evaluate
> test.c:3:46: note: suggestion: "ululate"
>      When NONCONST_PRED is false the code will evaulate to constant and
>                                                ^~~~~~~~
>                                                ululate
> test.c:3:46: note: suggestion: "elevate"
>      When NONCONST_PRED is false the code will evaulate to constant and
>                                                ^~~~~~~~
>                                                elevate
> 
> License-wise, Enchant is LGPL 2.1 "or (at your option) any
> later version." with a special exception to allow non-LGPL
> spellchecking providers (e.g. to allow linking against an
> OS-provided spellchecker).
> 
> Various FIXMEs are present (e.g. hardcoded "en_US" for the
> language to spellcheck against).
> Also, the spellchecker has a lot of false positives e.g.
> it doesn't grok URLs (and thus complains when it seens them);
> similar for DejaGnu directives etc.
> 
> Does enchant seem like a reasonable dependency for the compiler?
> (it pulls in libpthread.so.0, libglib-2.0.so.0, libgmodule-2.0.so.0).
> Or would this be better pursued as a plugin?  (if so, I'd
> prefer the plugin to live in the source tree as an example,
> rather than out-of-tree).
> 
> Unrelated to spellchecking, I also added two new options:
> -Wfixme and -Wtodo, for warning when comments containing
> "FIXME" or "TODO" are encountered.
> I use such comments a lot during development.  I thought some
> people might want a warning about them (I tend to just use grep
> though).  [TODO: document these in invoke.texi, add test cases]
> 
> Thoughts?  Does any of this sound useful?
> 
> [not yet bootstrapped; as noted above, I haven't yet done
> the autoconf stuff for handling Enchant]
> 
> [1] https://wiki.gnome.org/Projects/GObjectIntrospection/Annotations
> 
> gcc/ChangeLog:
> 	* Makefile.in (LIBS): Hack in -lenchant for now.
> 	(OBJS): Add spellcheck-enchant.o.
> 	* common.opt (Wfixme): New option.
> 	(Wtodo): New option.
> 	* spellcheck-enchant.c: New file.
> 	* spellcheck-enchant.h: New file.
> 
> gcc/c-family/ChangeLog:
> 	* c-lex.c: Include spellcheck-enchant.h.
> 	(init_c_lex): Wire up spellcheck_enchant_check_comment to the
> 	comment callback.
> 	* c-opts.c: Include spellcheck-enchant.h.
> 	(c_common_post_options): Call spellcheck_enchant_init.
> 	(c_common_finish): Call spellcheck_enchant_finish.
> 
> libcpp/ChangeLog:
> 	* include/cpplib.h (struct cpp_callbacks): Add "comment"
> 	callback.
> 	* lex.c (_cpp_lex_direct): Call the comment callback if non-NULL.
enchant seems a bit out of the sweet spot, particular just to catch 
mis-spellings in comments.  But it might make an interesting plugin.

IIRC from our meeting earlier this week, you had another use case that 
might have been more compelling, but I can't remember what it was.

I do like the ability to at least capture the comments better and while 
we don't have a strong need for that capability now, we might in the future.

Jeff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [committed] libcpp: add callback for comment-handling
  2017-05-12 18:56 ` Jeff Law
@ 2017-06-05 20:58   ` David Malcolm
  0 siblings, 0 replies; 7+ messages in thread
From: David Malcolm @ 2017-06-05 20:58 UTC (permalink / raw)
  To: Jeff Law, gcc-patches; +Cc: David Malcolm

On Fri, 2017-05-12 at 12:47 -0600, Jeff Law wrote:
> On 05/02/2017 01:08 PM, David Malcolm wrote:
> > Currently the C/C++ frontends discard comments when parsing.
> > It's possible to set up libcpp to capture comments as tokens,
> > by setting CPP_OPTION (pfile, discard_comments) to false),
> > and this can be enabled using the -C command line option (see
> > also -CC), but c-family/c-lex.c then discards any CPP_COMMENT
> > tokens it sees, so they're not seen by the frontend parser.
> > 
> > The following patch adds an (optional) callback to libcpp
> > for handling comments, giving the comment content, and the
> > location it was seen at.  This approach allows arbitrary
> > logic to be wired up to comments, and avoids having to
> > copy the comment content to a new buffer (which the CPP_COMMENT
> > approach does).
> > 
> > This could be used by plugins to chain up on the callback
> > e.g. to parse specially-formatted comments, e.g. for
> > documentation generation, or e.g. for GObject introspection
> > annotations [1].
> > 
> > As a proof of concept, the patch uses this to add a spellchecker
> > for comments.  It uses the Enchant meta-library:
> >     https://abiword.github.io/enchant/
> > (essentially a wrapper around 8 different spellchecking libraries).
> > I didn't bother with the autotool detection for enchant, and
> > just hacked it in for now.
> > 
> > Example output:
> > 
> > test.c:3:46: warning: spellcheck_word: "evaulate"
> >      When NONCONST_PRED is false the code will evaulate to constant
> > and
> >                                                ^~~~~~~~
> > test.c:3:46: note: suggestion: "evaluate"
> >      When NONCONST_PRED is false the code will evaulate to constant
> > and
> >                                                ^~~~~~~~
> >                                                evaluate
> > test.c:3:46: note: suggestion: "ululate"
> >      When NONCONST_PRED is false the code will evaulate to constant
> > and
> >                                                ^~~~~~~~
> >                                                ululate
> > test.c:3:46: note: suggestion: "elevate"
> >      When NONCONST_PRED is false the code will evaulate to constant
> > and
> >                                                ^~~~~~~~
> >                                                elevate
> > 
> > License-wise, Enchant is LGPL 2.1 "or (at your option) any
> > later version." with a special exception to allow non-LGPL
> > spellchecking providers (e.g. to allow linking against an
> > OS-provided spellchecker).
> > 
> > Various FIXMEs are present (e.g. hardcoded "en_US" for the
> > language to spellcheck against).
> > Also, the spellchecker has a lot of false positives e.g.
> > it doesn't grok URLs (and thus complains when it seens them);
> > similar for DejaGnu directives etc.
> > 
> > Does enchant seem like a reasonable dependency for the compiler?
> > (it pulls in libpthread.so.0, libglib-2.0.so.0, libgmodule
> > -2.0.so.0).
> > Or would this be better pursued as a plugin?  (if so, I'd
> > prefer the plugin to live in the source tree as an example,
> > rather than out-of-tree).
> > 
> > Unrelated to spellchecking, I also added two new options:
> > -Wfixme and -Wtodo, for warning when comments containing
> > "FIXME" or "TODO" are encountered.
> > I use such comments a lot during development.  I thought some
> > people might want a warning about them (I tend to just use grep
> > though).  [TODO: document these in invoke.texi, add test cases]
> > 
> > Thoughts?  Does any of this sound useful?
> > 
> > [not yet bootstrapped; as noted above, I haven't yet done
> > the autoconf stuff for handling Enchant]
> > 
> > [1] 
> > https://wiki.gnome.org/Projects/GObjectIntrospection/Annotations
> > 
> > gcc/ChangeLog:
> > 	* Makefile.in (LIBS): Hack in -lenchant for now.
> > 	(OBJS): Add spellcheck-enchant.o.
> > 	* common.opt (Wfixme): New option.
> > 	(Wtodo): New option.
> > 	* spellcheck-enchant.c: New file.
> > 	* spellcheck-enchant.h: New file.
> > 
> > gcc/c-family/ChangeLog:
> > 	* c-lex.c: Include spellcheck-enchant.h.
> > 	(init_c_lex): Wire up spellcheck_enchant_check_comment to the
> > 	comment callback.
> > 	* c-opts.c: Include spellcheck-enchant.h.
> > 	(c_common_post_options): Call spellcheck_enchant_init.
> > 	(c_common_finish): Call spellcheck_enchant_finish.
> > 
> > libcpp/ChangeLog:
> > 	* include/cpplib.h (struct cpp_callbacks): Add "comment"
> > 	callback.
> > 	* lex.c (_cpp_lex_direct): Call the comment callback if non
> > -NULL.
> enchant seems a bit out of the sweet spot, particular just to catch
> mis-spellings in comments.  But it might make an interesting plugin.
> 
> IIRC from our meeting earlier this week, you had another use case
> that
> might have been more compelling, but I can't remember what it was.

(FWIW I think we were chatting about the GObjectIntrospection annotations
use-case)

> I do like the ability to at least capture the comments better and
> while
> we don't have a strong need for that capability now, we might in the
> future.
> 
> Jeff

I stripped out all of the spellchecking stuff, and restricted the
scope of the patch to just adding a callback for plugins to hook into
if they're interested in handling comments.  I added an example of such
a plugin in the testsuite.

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu.

Committed to trunk as r248901.

gcc/testsuite/ChangeLog:
	* g++.dg/plugin/comment_plugin.c: New test plugin.
	* g++.dg/plugin/comments-1.C: New test file.
	* g++.dg/plugin/plugin.exp (plugin_test_list): Add the above.

libcpp/ChangeLog:
	* include/cpplib.h (struct cpp_callbacks): Add "comment"
	callback.
	* lex.c (_cpp_lex_direct): Call the comment callback if non-NULL.
---
 gcc/testsuite/g++.dg/plugin/comment_plugin.c | 63 ++++++++++++++++++++++++++++
 gcc/testsuite/g++.dg/plugin/comments-1.C     | 49 ++++++++++++++++++++++
 gcc/testsuite/g++.dg/plugin/plugin.exp       |  1 +
 libcpp/include/cpplib.h                      |  9 ++++
 libcpp/lex.c                                 |  7 ++++
 5 files changed, 129 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/plugin/comment_plugin.c
 create mode 100644 gcc/testsuite/g++.dg/plugin/comments-1.C

diff --git a/gcc/testsuite/g++.dg/plugin/comment_plugin.c b/gcc/testsuite/g++.dg/plugin/comment_plugin.c
new file mode 100644
index 0000000..c3b08e3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/comment_plugin.c
@@ -0,0 +1,63 @@
+/* Test of cpp_callbacks::comments.  */
+
+#include "gcc-plugin.h"
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "cpplib.h"
+#include "diagnostic.h"
+#include "c-family/c-pragma.h"
+
+int plugin_is_GPL_compatible;
+
+/* Test callback for cpp_callbacks::comments.  */
+
+void
+my_comment_cb (cpp_reader *, source_location loc,
+	       const unsigned char *content, size_t len)
+{
+  if (in_system_header_at (loc))
+    return;
+
+  /* CONTENT contains the opening slash-star (or slash-slash),
+     and for C-style comments contains the closing star-slash.  */
+  gcc_assert (len >= 2);
+  gcc_assert (content[0] == '/');
+  gcc_assert (content[1] == '*' || content[1] == '/');
+  bool c_style = (content[1] == '*');
+  if (c_style)
+    {
+      gcc_assert (content[len - 2] == '*');
+      gcc_assert (content[len - 1] == '/');
+    }
+
+  if (c_style)
+    inform (loc, "got C-style comment; length=%i", len);
+  else
+    inform (loc, "got C++-style comment; length=%i", len);
+
+  /* Print the content of the comment.
+     For a C-style comment, the buffer CONTENT contains the opening
+     slash-star and closing star-slash, so we can't directly verify
+     it in the DejaGnu test without adding another comment, which
+     would trigger this callback again.
+     Hence we skip the syntactically-significant parts of the comment
+     when printing it.  */
+  fprintf (stderr, "stripped content of comment: >");
+  /* Avoid printing trailing star-slash.  */
+  if (c_style)
+    len -= 2;
+  for (size_t i = 2; i < len; i++)
+    fputc (content[i], stderr);
+  fprintf (stderr, "<\n");
+}
+
+int
+plugin_init (struct plugin_name_args *plugin_info,
+	     struct plugin_gcc_version *version)
+{
+  cpp_callbacks *cb = cpp_get_callbacks (parse_in);
+  cb->comment = my_comment_cb;
+
+  return 0;
+}
diff --git a/gcc/testsuite/g++.dg/plugin/comments-1.C b/gcc/testsuite/g++.dg/plugin/comments-1.C
new file mode 100644
index 0000000..0821b14
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/comments-1.C
@@ -0,0 +1,49 @@
+/* Example of a one-line C-style comment.  */
+#if 0
+{ dg-message "1: got C-style comment; length=45" "" { target *-*-* } .-2 }
+{ dg-begin-multiline-output "" }
+stripped content of comment: > Example of a one-line C-style comment.  <
+{ dg-end-multiline-output "" }
+#endif
+
+     /*Another example of a one-line C-style comment.*/
+#if 0
+{ dg-message "6: got C-style comment; length=50" "" { target *-*-* } .-2 }
+{ dg-begin-multiline-output "" }
+stripped content of comment: >Another example of a one-line C-style comment.<
+{ dg-end-multiline-output "" }
+#endif
+
+/**/
+#if 0
+{ dg-message "1: got C-style comment; length=4" "" { target *-*-* } .-2 }
+{ dg-begin-multiline-output "" }
+stripped content of comment: ><
+{ dg-end-multiline-output "" }
+#endif
+
+/* Example of a
+   multi-line C-style comment.  */
+#if 0
+{ dg-message "1: got C-style comment; length=50" "" { target *-*-* } .-3 }
+{ dg-begin-multiline-output "" }
+stripped content of comment: > Example of a
+   multi-line C-style comment.  <
+{ dg-end-multiline-output "" }
+#endif
+
+// Example of a C++-style comment
+#if 0
+{ dg-message "1: got C\\+\\+-style comment; length=33" "" { target *-*-* } .-2 }
+{ dg-begin-multiline-output "" }
+stripped content of comment: > Example of a C++-style comment<
+{ dg-end-multiline-output "" }
+#endif
+
+//
+#if 0
+{ dg-message "1: got C\\+\\+-style comment; length=2" "" { target *-*-* } .-2 }
+{ dg-begin-multiline-output "" }
+stripped content of comment: ><
+{ dg-end-multiline-output "" }
+#endif
diff --git a/gcc/testsuite/g++.dg/plugin/plugin.exp b/gcc/testsuite/g++.dg/plugin/plugin.exp
index 94ebe93..e40cba3 100644
--- a/gcc/testsuite/g++.dg/plugin/plugin.exp
+++ b/gcc/testsuite/g++.dg/plugin/plugin.exp
@@ -68,6 +68,7 @@ set plugin_test_list [list \
     { show_template_tree_color_plugin.c \
     	  show-template-tree-color.C \
     	  show-template-tree-color-no-elide-type.C } \
+    { comment_plugin.c comments-1.C } \
 ]
 
 foreach plugin_test $plugin_test_list {
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index b843992..66ef4d6 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -609,6 +609,15 @@ struct cpp_callbacks
 
   /* Callback for providing suggestions for misspelled directives.  */
   const char *(*get_suggestion) (cpp_reader *, const char *, const char *const *);
+
+  /* Callback for when a comment is encountered, giving the location
+     of the opening slash, a pointer to the content (which is not
+     necessarily 0-terminated), and the length of the content.
+     The content contains the opening slash-star (or slash-slash),
+     and for C-style comments contains the closing star-slash.  For
+     C++-style comments it does not include the terminating newline.  */
+  void (*comment) (cpp_reader *, source_location, const unsigned char *,
+		   size_t);
 };
 
 #ifdef VMS
diff --git a/libcpp/lex.c b/libcpp/lex.c
index 9edd2a6..40ff801 100644
--- a/libcpp/lex.c
+++ b/libcpp/lex.c
@@ -2889,6 +2889,13 @@ _cpp_lex_direct (cpp_reader *pfile)
       if (fallthrough_comment_p (pfile, comment_start))
 	fallthrough_comment = true;
 
+      if (pfile->cb.comment)
+	{
+	  size_t len = pfile->buffer->cur - comment_start;
+	  pfile->cb.comment (pfile, result->src_loc, comment_start - 1,
+			     len + 1);
+	}
+
       if (!pfile->state.save_comments)
 	{
 	  result->flags |= PREV_WHITE;
-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] testsuite: example plugin for spellchecking comments
  2017-05-02 21:58 ` Mike Stump
@ 2017-06-08  9:34   ` David Malcolm
  2017-06-08 19:47     ` Mike Stump
  0 siblings, 1 reply; 7+ messages in thread
From: David Malcolm @ 2017-06-08  9:34 UTC (permalink / raw)
  To: Mike Stump; +Cc: gcc-patches, David Malcolm

On Tue, 2017-05-02 at 14:11 -0700, Mike Stump wrote:
> On May 2, 2017, at 12:08 PM, David Malcolm <dmalcolm@redhat.com>
> wrote:
> > 
> > As a proof of concept, the patch uses this to add a spellchecker
> > for comments.
> 
> :-)
> 
> > I didn't bother with the autotool detection for enchant, and
> > just hacked it in for now.
> 
> Only other comment would be, rather then requiring it, would be nice
> to make it optional.  I can host gcc in an environment that is a bare
> metal target on newlib.  autoconf can smell it, and tell that a ton
> of things are missing.

Given that Enchant seems a stretch as a hard dependency, it seems
better to either:
(a) have this code live as a plugin, or
(b) attempt to dynamically load libenchant on-demand (a "soft"
dependency ?)

This patch implements (a): the plugin approach.

We don't currently install any "official" plugins; ideally, if a
plugin, I'd like it to be one of the things that "make install"
installs: it add dependencies that are too much to impose on everyone's
cc1 et al (hence a plugin), but which a significant number of people
might want to opt in to (hence in-tree and "supported", for some meaning
of that term).

We don't have any mechanism or policies to allow that yet, so in the
meantime, here's a revised version of the spellchecker code which adds
it to the *testsuite*, as an example plugin.  (specifically the C++
testsuite, to support both C and C++ style comments without needing
to silence the compat flag).

Changes:
  * moved it all to a plugin, within the testsuite
  * added test coverage
  * added option -fplugin-arg-spellcheck-show-provider
  * added option -fplugin-arg-spellcheck-language
  * fixed bug with misspelling at very end of a comment
  * fixed bug with apostrophes breaking words
  * fixed quoting of suggestions
  * don't spellcheck within system headers
  * don't spellcheck DejaGnu directives
  * don't spellcheck URLs
  * removed the -Wfixme and -Wtodo idea

libenchant uses pkg-config to expose the flags needed to build
against it, so the patch adds a new "dg-pkgconfig" directive to
plugin-support.exp.  This attempts to use pkg-config to locate
the appropriate flags for building against libenchant.
If it fails, the test is marked as unsupported.

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/testsuite/ChangeLog:
	* g++.dg/plugin/plugin.exp (plugin_test_list): Add the
	following...
	* g++.dg/plugin/spellcheck-1.C: New test case.
	* g++.dg/plugin/spellcheck-2.C: New test case.
	* g++.dg/plugin/spellcheck-3.C: New test case.
	* g++.dg/plugin/spellcheck-4.C: New test case.
	* g++.dg/plugin/spellcheck-5.C: New test case.
	* g++.dg/plugin/spellcheck-6.C: New test case.
	* g++.dg/plugin/spellcheck.c: New test plugin.
	* lib/plugin-support.exp (plugin-get-options): Add handling
	for "dg-pkgconfig" to the special-cased directive handling.
	Verbosely log the resulting value of dg-extra-tool-flags.
	(plugin-test-execute): Handle "unsupported" errors within
	plugin-get-options by marking the testcase as unsupported and
	immediately returning.
---
 gcc/testsuite/g++.dg/plugin/plugin.exp     |   7 +
 gcc/testsuite/g++.dg/plugin/spellcheck-1.C |  14 ++
 gcc/testsuite/g++.dg/plugin/spellcheck-2.C |  13 ++
 gcc/testsuite/g++.dg/plugin/spellcheck-3.C |   2 +
 gcc/testsuite/g++.dg/plugin/spellcheck-4.C |  19 ++
 gcc/testsuite/g++.dg/plugin/spellcheck-5.C |   4 +
 gcc/testsuite/g++.dg/plugin/spellcheck-6.C |   5 +
 gcc/testsuite/g++.dg/plugin/spellcheck.c   | 361 +++++++++++++++++++++++++++++
 gcc/testsuite/lib/plugin-support.exp       |  32 ++-
 9 files changed, 456 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck-1.C
 create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck-2.C
 create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck-3.C
 create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck-4.C
 create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck-5.C
 create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck-6.C
 create mode 100644 gcc/testsuite/g++.dg/plugin/spellcheck.c

diff --git a/gcc/testsuite/g++.dg/plugin/plugin.exp b/gcc/testsuite/g++.dg/plugin/plugin.exp
index e40cba3..e3f030f 100644
--- a/gcc/testsuite/g++.dg/plugin/plugin.exp
+++ b/gcc/testsuite/g++.dg/plugin/plugin.exp
@@ -69,6 +69,13 @@ set plugin_test_list [list \
     	  show-template-tree-color.C \
     	  show-template-tree-color-no-elide-type.C } \
     { comment_plugin.c comments-1.C } \
+    { spellcheck.c \
+	  spellcheck-1.C \
+	  spellcheck-2.C \
+	  spellcheck-3.C \
+	  spellcheck-4.C \
+	  spellcheck-5.C \
+	  spellcheck-6.C } \
 ]
 
 foreach plugin_test $plugin_test_list {
diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck-1.C b/gcc/testsuite/g++.dg/plugin/spellcheck-1.C
new file mode 100644
index 0000000..242ab13
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/spellcheck-1.C
@@ -0,0 +1,14 @@
+/* When NONCONST_PRED is false the code will evaulate to constant.  */
+/* { dg-warning "46: possible spelling mistake in comment: 'evaulate'" "" { target *-*-* } .-1 } */
+
+// Example of a boogus C++-style comment
+/* { dg-warning "17: possible spelling mistake in comment: 'boogus'" "" { target *-*-* } .-1 } */
+
+/* Exercise the code path for the final word in a commment*/
+/* { dg-warning "51: possible spelling mistake in comment: 'commment'" "" { target *-*-* } .-1 } */
+
+// Exercise the code path for the final word in a C++ commment
+/* { dg-warning "55: possible spelling mistake in comment: 'commment'" "" { target *-*-* } .-1 } */
+
+/* Examples of words that are correctly spelled, containing quotes:
+   don't doesn't is isn't.  */
diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck-2.C b/gcc/testsuite/g++.dg/plugin/spellcheck-2.C
new file mode 100644
index 0000000..8f28f62
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/spellcheck-2.C
@@ -0,0 +1,13 @@
+/* { dg-options "-fdiagnostics-show-caret" } */
+
+/* Verify that misspelled words are underlined correctly.  We need
+   a word that the checker won't offer a suggestion for, to avoid
+   having to specify checker output within the test case.
+   Hence the use of splendiferouslitudinalistic here.  */
+#if 0
+{ dg-warning "possible spelling mistake in comment: 'splendiferouslitudinalistic'" "" { target *-*-* } .-2 }
+{ dg-begin-multiline-output "" }
+    Hence the use of splendiferouslitudinalistic here.
+                     ^~~~~~~~~~~~~~~~~~~~~~~~~~~
+{ dg-end-multiline-output "" }
+#endif
diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck-3.C b/gcc/testsuite/g++.dg/plugin/spellcheck-3.C
new file mode 100644
index 0000000..c497988
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/spellcheck-3.C
@@ -0,0 +1,2 @@
+// { dg-options "-fplugin-arg-spellcheck-show-provider" }
+// { dg-message "spellcheck provider: '.*'  description: '.*'  file: '.*'" "" { target *-*-* } 0 }
diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck-4.C b/gcc/testsuite/g++.dg/plugin/spellcheck-4.C
new file mode 100644
index 0000000..2038f10
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/spellcheck-4.C
@@ -0,0 +1,19 @@
+/* Verify that we don't complain about our legal boilerplate.  */
+
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck-5.C b/gcc/testsuite/g++.dg/plugin/spellcheck-5.C
new file mode 100644
index 0000000..c7f4714
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/spellcheck-5.C
@@ -0,0 +1,4 @@
+// { dg-options "-fplugin-arg-spellcheck-language=this_is_not_valid" }
+// { dg-error "unknown language tag: 'this_is_not_valid'" "" { target *-*-* } 0 }
+
+/* A test comment.  */
diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck-6.C b/gcc/testsuite/g++.dg/plugin/spellcheck-6.C
new file mode 100644
index 0000000..5a95881
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/spellcheck-6.C
@@ -0,0 +1,5 @@
+// Example of specifying language
+// { dg-options "-fplugin-arg-spellcheck-language=en_GB" }
+// Is it color or colour?
+// { dg-warning "possible spelling mistake in comment: 'color'" "" { target *-*-* } .-1 }
+// { dg-message "suggestion: 'colour'" "" { target *-*-* } .-2 }
diff --git a/gcc/testsuite/g++.dg/plugin/spellcheck.c b/gcc/testsuite/g++.dg/plugin/spellcheck.c
new file mode 100644
index 0000000..6851637
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/spellcheck.c
@@ -0,0 +1,361 @@
+/* Spellchecking using the Enchant "meta-library".
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* This plugin links against libenchant.
+   Use pkg-config to locate compilation and linkage flags for
+   libenchant, treating the testcase as unsupported if pkg-config or
+   enchant are not available.  */
+
+/* { dg-pkgconfig "enchant" } */
+
+#include "gcc-plugin.h"
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "diagnostic.h"
+#include "options.h"
+#include "c-family/c-pragma.h"
+
+/* The dg-pkgconfig above should have provided the necessary
+   -I flags to locate enchant.h (e.g. "-I/usr/include/enchant").  */
+
+#include <enchant.h>
+
+/* Enchant is LGPL 2.1 "or (at your option) any later version." with
+   a special exception to allow non-LGPL spellchecking providers.  */
+
+int plugin_is_GPL_compatible;
+
+/* Class wrapping the result of enchant_dict_suggest, ensuring that
+   the suggestions are cleaned up.  */
+
+class suggestions
+{
+ public:
+  suggestions (EnchantDict *dict, const char *const word, ssize_t len)
+    : m_dict (dict), m_suggestions (NULL)
+    {
+      m_suggestions = enchant_dict_suggest (m_dict, word, len, &m_num);
+    }
+  ~suggestions ()
+    {
+      enchant_dict_free_string_list (m_dict, m_suggestions);
+    }
+
+  size_t get_count () const { return m_num; }
+  const char *get_suggestion (size_t index) const
+  {
+    return m_suggestions[index];
+  }
+
+ private:
+  EnchantDict *m_dict;
+  size_t m_num;
+  char **m_suggestions;
+};
+
+/* Callback for describing metadata about the Enchant provider in use.  */
+
+static void
+describe_broker_cb (const char * const provider_name,
+		    const char * const provider_desc,
+		    const char * const provider_dll_file,
+		    void *)
+{
+  inform (UNKNOWN_LOCATION,
+	  "spellcheck provider: %qs  description: %qs  file: %qs",
+	  provider_name, provider_desc, provider_dll_file);
+}
+
+/* A class for wrapping access to the Enchant API (both the broker
+   and the dict).  */
+
+class checker
+{
+ public:
+  checker (const char *language_tag, bool show_provider);
+  ~checker ();
+
+  bool misspelled_p (const char *const word, ssize_t len);
+  suggestions get_suggestions (const char *const word, ssize_t len);
+
+  static checker *singleton;
+
+ private:
+  EnchantBroker *m_broker;
+  EnchantDict *m_dict;
+};
+
+/* Callback for describing the available languages.  */
+
+static void
+describe_dict_cb (const char * const lang_tag,
+		  const char * const provider_name,
+		  const char * const provider_desc,
+		  const char * const provider_file,
+		  void *)
+{
+  inform (UNKNOWN_LOCATION,
+	  "available language: %qs"
+	  " (provider: %qs  description: %qs  file: %qs)",
+	  lang_tag, provider_name, provider_desc, provider_file);
+}
+
+/* checker's ctor.  */
+
+checker::checker (const char *language_tag, bool show_provider)
+: m_broker (NULL), m_dict (NULL)
+{
+  m_broker = enchant_broker_init ();
+  if (!m_broker)
+    {
+      error_at (UNKNOWN_LOCATION, "enchant_broker_init failed");
+      return;
+    }
+  m_dict = enchant_broker_request_dict (m_broker, language_tag);
+  if (!m_dict)
+    {
+      error_at (UNKNOWN_LOCATION, "unknown language tag: %qs", language_tag);
+      enchant_broker_list_dicts (m_broker, describe_dict_cb, NULL);
+    }
+
+  if (show_provider)
+    enchant_broker_describe (m_broker, describe_broker_cb, NULL);
+
+  // TODO: will probably want to support local wordlists also?
+}
+
+/* checker's dtor.  */
+
+checker::~checker ()
+{
+  if (m_dict)
+    enchant_broker_free_dict (m_broker, m_dict);
+  if (m_broker)
+    enchant_broker_free (m_broker);
+}
+
+/* Return true if WORD of length LEN appears to be misspelled.  */
+
+bool
+checker::misspelled_p (const char *const word, ssize_t len)
+{
+  if (!m_dict)
+    return false;
+
+  int result = enchant_dict_check (m_dict, word, len);
+  return result != 0;
+}
+
+/* Get suggestions for misspelled WORD of length LEN.  */
+
+suggestions
+checker::get_suggestions (const char *const word, ssize_t len)
+{
+  gcc_assert (m_dict);
+  return suggestions (m_dict, word, len);
+}
+
+/* Singleton instance of checker.  */
+
+checker *checker::singleton = NULL;
+
+/* Generate a location for a word of length LEN within ORD_MAP at the
+   given LINE AND COLUMN.  */
+
+static location_t
+make_word_location (const line_map_ordinary *ord_map, int line, int column,
+		    size_t len)
+{
+  location_t start_loc
+    = linemap_position_for_line_and_column (line_table, ord_map, line, column);
+  location_t end_loc
+    = linemap_position_for_line_and_column (line_table, ord_map, line,
+					    column + len - 1);
+  return make_location (start_loc, start_loc, end_loc);
+}
+
+/* Spellcheck the word at WORD of length LEN, with a location
+   within ORD_MAP at the given LINE AND COLUMN.
+   Issue warnings and suggestions for possibly misspelled words.  */
+
+static void
+spellcheck_word (const unsigned char *word, size_t len,
+		 const line_map_ordinary *ord_map, int line, int column)
+{
+  /* Don't bother spellchecking words with underscores or any upper case,
+     as these may refer to code entities.
+     TODO: should we also merge in names from code?  */
+  for (size_t i = 0; i < len; i++)
+    {
+      if (word[i] == '_' || ISUPPER (word[i]))
+	return;
+    }
+
+  if (!checker::singleton->misspelled_p ((const char *)word, len))
+    return;
+
+  location_t word_loc = make_word_location (ord_map, line, column, len);
+
+  if (warning_at (word_loc, 0, "possible spelling mistake in comment: %q.*s", (int)len, word))
+    {
+      suggestions s
+	= checker::singleton->get_suggestions ((const char *)word, len);
+      for (size_t i = 0; i < s.get_count (); i++)
+	{
+	  rich_location richloc (line_table, word_loc);
+	  richloc.add_fixit_replace (s.get_suggestion (i));
+	  inform_at_rich_loc (&richloc, "suggestion: %qs",
+			      s.get_suggestion (i));
+	}
+    }
+}
+
+/* Callback for cpp_callbacks::comments for spellchecking comments
+   using Enchant.  */
+
+static void
+spellcheck_enchant_check_comment (cpp_reader *, source_location loc,
+				  const unsigned char *content, size_t len)
+{
+  /* Don't spellcheck within system headers.  */
+  if (in_system_header_at (loc))
+    return;
+
+  /* Split into words, tracking the location, and call spellcheck_word
+     on each word.  */
+
+  /* CONTENT contains the opening slash-star (or slash-slash),
+     and for C-style comments contains the closing star-slash.  */
+  gcc_assert (len >= 2);
+  gcc_assert (content[0] == '/');
+  gcc_assert (content[1] == '*' || content[1] == '/');
+  bool c_style = (content[1] == '*');
+  if (c_style)
+    {
+      gcc_assert (content[len - 2] == '*');
+      gcc_assert (content[len - 1] == '/');
+    }
+  const unsigned char *iter = content + 2;
+  const unsigned char *after_last_char = content + len;
+  if (c_style)
+    after_last_char -= 2;
+  const unsigned char *word_start = NULL;
+
+  const struct line_map *map = linemap_lookup (line_table, loc);
+  const line_map_ordinary *ord_map = linemap_check_ordinary (map);
+
+  int column = LOCATION_COLUMN (loc) + 2;
+  int line = LOCATION_LINE (loc);
+  int word_start_column = column;
+  int word_start_line = line;
+
+  bool within_url = false;
+  while (iter < after_last_char)
+    {
+      unsigned char ch = *iter;
+      if (ISALPHA (ch) || ch == '_' || ch == '\'')
+	{
+	  if (!word_start)
+	    {
+	      /* We have the start of a word.  */
+	      word_start = iter;
+	      word_start_column = column;
+	      word_start_line = line;
+	    }
+	}
+      else
+	{
+	  if (word_start)
+	    {
+	      /* We have a word ending.  */
+	      size_t word_len = iter - word_start;
+	      /* Don't check any comments that appear to contain DejaGnu
+		 directives.  */
+	      if (word_len >= 2)
+		if (0 == strncmp ((const char *)word_start, "dg", 2))
+		  return;
+
+	      /* Detect things that look like URLs.  */
+	      /* "www."  */
+	      if (word_len == 3 && ch == '.')
+		if (0 == strncmp ((const char *)word_start, "www", 3))
+		  within_url = true;
+
+	      /* "http:"  */
+	      if (word_len == 4 && ch == ':')
+		if (0 == strncmp ((const char *)word_start, "http", 4))
+		  within_url = true;
+
+	      if (!within_url)
+		spellcheck_word (word_start, word_len, ord_map,
+				 word_start_line, word_start_column);
+	      word_start = NULL;
+	    }
+	}
+      if (ISSPACE (ch))
+	within_url = false;
+      if (ch == '\n')
+	{
+	  column = 1;
+	  line++;
+	}
+      else
+	column++;
+      iter++;
+    }
+  if (word_start)
+    if (!within_url)
+      spellcheck_word (word_start, iter - word_start, ord_map,
+		       word_start_line, word_start_column);
+}
+
+int
+plugin_init (struct plugin_name_args *plugin_info,
+	     struct plugin_gcc_version *version)
+{
+  const char *plugin_name = plugin_info->base_name;
+  int argc = plugin_info->argc;
+  struct plugin_argument *argv = plugin_info->argv;
+
+  bool show_provider = false;
+  const char *language = "en_US";
+
+  /* Process the plugin arguments.  */
+  for (int i = 0; i < argc; ++i)
+    {
+      if (0 == strcmp (argv[i].key, "show-provider"))
+	show_provider = true;
+      else if (0 == strcmp (argv[i].key, "language"))
+	language = xstrdup (argv[i].value);
+      else
+	warning_at (UNKNOWN_LOCATION, 0,
+		    "plugin %qs: unrecognized argument %qs ignored",
+		    plugin_name, argv[i].key);
+    }
+
+  checker::singleton = new checker (language, show_provider);
+
+  cpp_callbacks *cb = cpp_get_callbacks (parse_in);
+  cb->comment = spellcheck_enchant_check_comment;
+
+  // TODO: register a cleanup hook, to delete checker::singleton
+
+  return 0;
+}
diff --git a/gcc/testsuite/lib/plugin-support.exp b/gcc/testsuite/lib/plugin-support.exp
index 0754e84..14cc628 100644
--- a/gcc/testsuite/lib/plugin-support.exp
+++ b/gcc/testsuite/lib/plugin-support.exp
@@ -42,12 +42,30 @@ proc plugin-get-options { src } {
 		unresolved "$src: $errmsg for \"$op\""
 		return
 	    }
+	} elseif { ![string compare "dg-pkgconfig" $cmd] } {
+	    # Attempt to run "pkg-config --cflags --libs $arg"
+	    # If successful, add stdout to options.
+	    # If a failure occurs (e.g. pkg-config is not installed,
+	    # or $arg is not found), the test case is unsupported;
+	    # generate an "unsupported" error for the caller to catch.
+	    set arg [lindex $op 2]
+	    verbose "dg-pkgconfig: $arg" 2
+	    #set cmdline "pkg-config --cflags --libs $arg"
+	    if {[catch {exec pkg-config --cflags --libs $arg} result] == 0} { 
+		verbose "pkg-config succeeded: $result" 2
+		append dg-extra-tool-flags $result
+	    } else { 
+		verbose "pkg-config failed: $result" 2
+		error "unsupported"
+	    } 
 	} else {
 	    # Ignore unrecognized dg- commands, but warn about them.
 	    warning "plugin.exp does not support $cmd"
 	}
    }
 
+    verbose "dg-extra-tool-flags: ${dg-extra-tool-flags}" 2
+
     # Return flags to use for compiling the plugin source file
     return ${dg-extra-tool-flags}
 }
@@ -74,7 +92,19 @@ proc plugin-test-execute { plugin_src plugin_tests } {
     verbose "Test the plugin $testcase" 1
 
     # Build the plugin itself
-    set extra_flags [plugin-get-options $plugin_src]
+
+    # Attempt to get extra_flags for building the plugin.
+    if [catch {set extra_flags [plugin-get-options $plugin_src]} reason] {
+	set saved_info $::errorInfo
+	if { $reason == "unsupported" } {
+	    # If the plugin is unsupported in this configuration,
+	    # then we're done.
+	    unsupported "$testcase"
+	    return;
+	}
+	# Otherwise re-emit the error
+	error $reason $saved_info
+    }
 
     # Note that the plugin test support currently only works when the GCC
     # build tree is available. (We make sure that is the case in plugin.exp.)
-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] testsuite: example plugin for spellchecking comments
  2017-06-08  9:34   ` [PATCH] testsuite: example plugin for spellchecking comments David Malcolm
@ 2017-06-08 19:47     ` Mike Stump
  0 siblings, 0 replies; 7+ messages in thread
From: Mike Stump @ 2017-06-08 19:47 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

On Jun 8, 2017, at 3:07 AM, David Malcolm <dmalcolm@redhat.com> wrote:
> 
> Given that Enchant seems a stretch as a hard dependency,

But, by using autoconf, it isn't a hard dependency.  If it is there, it is built, if it isn't, it isn't.  I think it should be trivial (10-20 lines) to do this.

If you want to ship this, I think it makes more sense to just do that.  Build and install in the usual way, conditionalizing the build as necessary.  I think the autoconf code is 10-20 lines.

We could drop this into the testsuite as a test of the infrastructure, if you want; but if you'd like to ship it, I'd rather just start it in the source tree.  If you'd prefer to not ship it for now and just have it live in the testsuite, the patch is Ok. I'd like to see it shipped.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-06-08 19:47 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-02 19:06 [PATCH] RFC: spellchecker for comments, plus -Wfixme and -Wtodo David Malcolm
2017-05-02 21:58 ` Mike Stump
2017-06-08  9:34   ` [PATCH] testsuite: example plugin for spellchecking comments David Malcolm
2017-06-08 19:47     ` Mike Stump
2017-05-03  0:09 ` [PATCH] RFC: spellchecker for comments, plus -Wfixme and -Wtodo Trevor Saunders
2017-05-12 18:56 ` Jeff Law
2017-06-05 20:58   ` [committed] libcpp: add callback for comment-handling David Malcolm

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).