public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] RFC: spellchecker for comments, plus -Wfixme and -Wtodo
@ 2017-05-02 19:06 David Malcolm
  2017-05-02 21:58 ` Mike Stump
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: David Malcolm @ 2017-05-02 19:06 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

Currently the C/C++ frontends discard comments when parsing.
It's possible to set up libcpp to capture comments as tokens,
by setting CPP_OPTION (pfile, discard_comments) to false),
and this can be enabled using the -C command line option (see
also -CC), but c-family/c-lex.c then discards any CPP_COMMENT
tokens it sees, so they're not seen by the frontend parser.

The following patch adds an (optional) callback to libcpp
for handling comments, giving the comment content, and the
location it was seen at.  This approach allows arbitrary
logic to be wired up to comments, and avoids having to
copy the comment content to a new buffer (which the CPP_COMMENT
approach does).

This could be used by plugins to chain up on the callback
e.g. to parse specially-formatted comments, e.g. for
documentation generation, or e.g. for GObject introspection
annotations [1].

As a proof of concept, the patch uses this to add a spellchecker
for comments.  It uses the Enchant meta-library:
   https://abiword.github.io/enchant/
(essentially a wrapper around 8 different spellchecking libraries).
I didn't bother with the autotool detection for enchant, and
just hacked it in for now.

Example output:

test.c:3:46: warning: spellcheck_word: "evaulate"
    When NONCONST_PRED is false the code will evaulate to constant and
                                              ^~~~~~~~
test.c:3:46: note: suggestion: "evaluate"
    When NONCONST_PRED is false the code will evaulate to constant and
                                              ^~~~~~~~
                                              evaluate
test.c:3:46: note: suggestion: "ululate"
    When NONCONST_PRED is false the code will evaulate to constant and
                                              ^~~~~~~~
                                              ululate
test.c:3:46: note: suggestion: "elevate"
    When NONCONST_PRED is false the code will evaulate to constant and
                                              ^~~~~~~~
                                              elevate

License-wise, Enchant is LGPL 2.1 "or (at your option) any
later version." with a special exception to allow non-LGPL
spellchecking providers (e.g. to allow linking against an
OS-provided spellchecker).

Various FIXMEs are present (e.g. hardcoded "en_US" for the
language to spellcheck against).
Also, the spellchecker has a lot of false positives e.g.
it doesn't grok URLs (and thus complains when it seens them);
similar for DejaGnu directives etc.

Does enchant seem like a reasonable dependency for the compiler?
(it pulls in libpthread.so.0, libglib-2.0.so.0, libgmodule-2.0.so.0).
Or would this be better pursued as a plugin?  (if so, I'd
prefer the plugin to live in the source tree as an example,
rather than out-of-tree).

Unrelated to spellchecking, I also added two new options:
-Wfixme and -Wtodo, for warning when comments containing
"FIXME" or "TODO" are encountered.
I use such comments a lot during development.  I thought some
people might want a warning about them (I tend to just use grep
though).  [TODO: document these in invoke.texi, add test cases]

Thoughts?  Does any of this sound useful?

[not yet bootstrapped; as noted above, I haven't yet done
the autoconf stuff for handling Enchant]

[1] https://wiki.gnome.org/Projects/GObjectIntrospection/Annotations

gcc/ChangeLog:
	* Makefile.in (LIBS): Hack in -lenchant for now.
	(OBJS): Add spellcheck-enchant.o.
	* common.opt (Wfixme): New option.
	(Wtodo): New option.
	* spellcheck-enchant.c: New file.
	* spellcheck-enchant.h: New file.

gcc/c-family/ChangeLog:
	* c-lex.c: Include spellcheck-enchant.h.
	(init_c_lex): Wire up spellcheck_enchant_check_comment to the
	comment callback.
	* c-opts.c: Include spellcheck-enchant.h.
	(c_common_post_options): Call spellcheck_enchant_init.
	(c_common_finish): Call spellcheck_enchant_finish.

libcpp/ChangeLog:
	* include/cpplib.h (struct cpp_callbacks): Add "comment"
	callback.
	* lex.c (_cpp_lex_direct): Call the comment callback if non-NULL.
---
 gcc/Makefile.in          |   3 +-
 gcc/c-family/c-lex.c     |   2 +
 gcc/c-family/c-opts.c    |   4 +
 gcc/common.opt           |   8 ++
 gcc/spellcheck-enchant.c | 294 +++++++++++++++++++++++++++++++++++++++++++++++
 gcc/spellcheck-enchant.h |  33 ++++++
 libcpp/include/cpplib.h  |   4 +
 libcpp/lex.c             |   7 ++
 8 files changed, 354 insertions(+), 1 deletion(-)
 create mode 100644 gcc/spellcheck-enchant.c
 create mode 100644 gcc/spellcheck-enchant.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index f675e07..6bb3dc0 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1046,7 +1046,7 @@ BUILD_LIBDEPS= $(BUILD_LIBIBERTY)
 # How to link with both our special library facilities
 # and the system's installed libraries.
 LIBS = @LIBS@ libcommon.a $(CPPLIB) $(LIBINTL) $(LIBICONV) $(LIBBACKTRACE) \
-	$(LIBIBERTY) $(LIBDECNUMBER) $(HOST_LIBS)
+	$(LIBIBERTY) $(LIBDECNUMBER) $(HOST_LIBS) -lenchant
 BACKENDLIBS = $(ISLLIBS) $(GMPLIBS) $(PLUGINLIBS) $(HOST_LIBS) \
 	$(ZLIB)
 # Any system libraries needed just for GNAT.
@@ -1450,6 +1450,7 @@ OBJS = \
 	simplify-rtx.o \
 	sparseset.o \
 	spellcheck.o \
+	spellcheck-enchant.o \
 	spellcheck-tree.o \
 	sreal.o \
 	stack-ptr-mod.o \
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index e1c8bdf..389302c 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "debug.h"
 
 #include "attribs.h"
+#include "spellcheck-enchant.h"
 
 /* We may keep statistics about how long which files took to compile.  */
 static int header_time, body_time;
@@ -82,6 +83,7 @@ init_c_lex (void)
   cb->has_attribute = c_common_has_attribute;
   cb->get_source_date_epoch = cb_get_source_date_epoch;
   cb->get_suggestion = cb_get_suggestion;
+  cb->comment = spellcheck_enchant_check_comment;
 
   /* Set the debug callbacks if we can use them.  */
   if ((debug_info_level == DINFO_LEVEL_VERBOSE
diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index ea0e01b..cc58dd0 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "plugin.h"		/* For PLUGIN_INCLUDE_FILE event.  */
 #include "mkdeps.h"
 #include "dumpfile.h"
+#include "spellcheck-enchant.h"
 
 #ifndef DOLLARS_IN_IDENTIFIERS
 # define DOLLARS_IN_IDENTIFIERS true
@@ -995,6 +996,7 @@ c_common_post_options (const char **pfilename)
     }
   else
     {
+      spellcheck_enchant_init ("en_US"); // FIXME: get from an option
       init_c_lex ();
 
       /* When writing a PCH file, avoid reading some other PCH file,
@@ -1177,6 +1179,8 @@ c_common_finish (void)
      with cpp_destroy ().  */
   cpp_finish (parse_in, deps_stream);
 
+  spellcheck_enchant_finish ();
+
   if (deps_stream && deps_stream != out_stream
       && (ferror (deps_stream) || fclose (deps_stream)))
     fatal_error (input_location, "closing dependency file %s: %m", deps_file);
diff --git a/gcc/common.opt b/gcc/common.opt
index 5817407..f17d823 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -585,6 +585,10 @@ Wfatal-errors
 Common Var(flag_fatal_errors)
 Exit on the first error occurred.
 
+Wfixme
+Common Var(warn_fixme) Warning
+Complain about comments containing 'FIXME'.
+
 Wframe-larger-than=
 Common RejectNegative Joined UInteger Warning
 -Wframe-larger-than=<number>	Warn if a function's stack frame requires more than <number> bytes.
@@ -737,6 +741,10 @@ Wsystem-headers
 Common Var(warn_system_headers) Warning
 Do not suppress warnings from system headers.
 
+Wtodo
+Common Var(warn_todo) Warning
+Complain about comments containing 'TODO'.
+
 Wtrampolines
 Common Var(warn_trampolines) Warning
 Warn whenever a trampoline is generated.
diff --git a/gcc/spellcheck-enchant.c b/gcc/spellcheck-enchant.c
new file mode 100644
index 0000000..30152c1
--- /dev/null
+++ b/gcc/spellcheck-enchant.c
@@ -0,0 +1,294 @@
+/* Spellchecking using the Enchant "meta-library".
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "diagnostic.h"
+#include "options.h"
+
+/* Enchant is LGPL 2.1 "or (at your option) any later version." with
+   a special exception to allow non-LGPL spellchecking providers.  */
+
+#include <enchant/enchant.h> // FIXME
+
+/* Class wrapping the result of enchant_dict_suggest, ensuring that
+   the suggestions are cleaned up.  */
+
+class suggestions
+{
+ public:
+  suggestions (EnchantDict *dict, const char *const word, ssize_t len)
+    : m_dict (dict), m_suggestions (NULL)
+    {
+      m_suggestions = enchant_dict_suggest (m_dict, word, len, &m_num);
+    }
+  ~suggestions ()
+    {
+      enchant_dict_free_string_list (m_dict, m_suggestions);
+    }
+
+  size_t get_count () const { return m_num; }
+  const char *get_suggestion (size_t index) const
+  {
+    return m_suggestions[index];
+  }
+
+ private:
+  EnchantDict *m_dict;
+  size_t m_num;
+  char **m_suggestions;
+};
+
+/* Callback for describing metadata about the Enchant provider in use.  */
+
+static void
+describe_broker_cb (const char * const provider_name,
+		    const char * const provider_desc,
+		    const char * const provider_dll_file,
+		    void *)
+{
+  inform (UNKNOWN_LOCATION,
+	  "spellcheck provider: %qs  description: %qs  file: %qs",
+	  provider_name, provider_desc, provider_dll_file);
+}
+
+/* A class for wrapping access to the Enchant API (both the broker
+   and the dict).  */
+
+class checker
+{
+ public:
+  checker (const char *language_tag);
+  ~checker ();
+
+  bool misspelled_p (const char *const word, ssize_t len);
+  suggestions get_suggestions (const char *const word, ssize_t len);
+
+  static checker *singleton;
+
+ private:
+  EnchantBroker *m_broker;
+  EnchantDict *m_dict;
+};
+
+/* checker's ctor.  */
+
+checker::checker (const char *language_tag)
+: m_broker (NULL), m_dict (NULL)
+{
+  m_broker = enchant_broker_init ();
+  if (!m_broker)
+    return;
+  m_dict = enchant_broker_request_dict (m_broker, language_tag);
+
+  enchant_broker_describe (m_broker, describe_broker_cb, NULL);
+
+  // FIXME: will probably want to support local wordlists also?
+}
+
+/* checker's dtor.  */
+
+checker::~checker ()
+{
+  if (m_dict)
+    enchant_broker_free_dict (m_broker, m_dict);
+  if (m_broker)
+    enchant_broker_free (m_broker);
+}
+
+/* Return true if WORD of length LEN appears to be misspelled.  */
+
+bool
+checker::misspelled_p (const char *const word, ssize_t len)
+{
+  int result = enchant_dict_check (m_dict, word, len);
+  return result != 0;
+}
+
+/* Get suggestions for misspelled WORD of length LEN.  */
+
+suggestions
+checker::get_suggestions (const char *const word, ssize_t len)
+{
+  return suggestions (m_dict, word, len);
+}
+
+/* Singleton instance of checker.  */
+
+checker *checker::singleton = NULL;
+
+/* Generate a location for a word of length LEN within ORD_MAP at the
+   given LINE AND COLUMN.  */
+
+static location_t
+make_word_location (const line_map_ordinary *ord_map, int line, int column,
+		    size_t len)
+{
+  location_t start_loc
+    = linemap_position_for_line_and_column (line_table, ord_map, line, column);
+  location_t end_loc
+    = linemap_position_for_line_and_column (line_table, ord_map, line,
+					    column + len - 1);
+  return make_location (start_loc, start_loc, end_loc);
+}
+
+/* Subroutine for Wfixme and Wtodo.
+
+   If the WORD of length LEN matches BAD_WORD, issue a warning controlled
+   by OPT at the source location given by ORD_MAP, LINE, and COLUMN, and
+   return true.
+
+   Otherwise return false.  */
+
+static inline bool
+maybe_warn_about_unfinished_code (const char *bad_word, int opt,
+				  const unsigned char *word, size_t len,
+				  const line_map_ordinary *ord_map,
+				  int line, int column)
+{
+  if (len == strlen (bad_word) && !strncmp ((const char *)word, bad_word, len))
+    {
+      location_t word_loc = make_word_location (ord_map, line, column, len);
+      warning_at (word_loc, opt, "%qs found in comment", bad_word);
+      return true;
+    }
+  return false;
+}
+
+/* Spellcheck the word at WORD of length LEN, with a location
+   within ORD_MAP at the given LINE AND COLUMN.
+   Issue warnings and suggestions for possibly misspelled words,
+   and for "FIXME" and "TODO" comments.  */
+
+static void
+spellcheck_word (const unsigned char *word, size_t len,
+		 const line_map_ordinary *ord_map, int line, int column)
+{
+  /* Complain about "FIXME" and "TODO" in comments.  */
+  if (maybe_warn_about_unfinished_code ("FIXME", OPT_Wfixme, word, len, ord_map,
+					line, column))
+    return;
+  if (maybe_warn_about_unfinished_code ("TODO", OPT_Wtodo, word, len, ord_map,
+					line, column))
+    return;
+
+  /* Don't bother spellchecking words with underscores or any upper case,
+     as these may refer to code entities.
+     FIXME: should we also merge in names from code?  */
+  for (size_t i = 0; i < len; i++)
+    {
+      if (word[i] == '_' || ISUPPER (word[i]))
+	return;
+    }
+
+  if (!checker::singleton->misspelled_p ((const char *)word, len))
+    return;
+
+  location_t word_loc = make_word_location (ord_map, line, column, len);
+
+  if (warning_at (word_loc, 0, "spellcheck_word: \"%.*s\"", (int)len, word))
+    {
+      suggestions s
+	= checker::singleton->get_suggestions ((const char *)word, len);
+      for (size_t i = 0; i < s.get_count (); i++)
+	{
+	  rich_location richloc (line_table, word_loc);
+	  richloc.add_fixit_replace (s.get_suggestion (i));
+	  inform_at_rich_loc (&richloc, "suggestion: \"%s\"",
+			      s.get_suggestion (i));
+	}
+    }
+}
+
+/* Initialize this module, using the given LANGUAGE_TAG as the
+   language to use for spellchecking (e.g. "en_US").  */
+
+void
+spellcheck_enchant_init (const char *language_tag)
+{
+  checker::singleton = new checker (language_tag);
+}
+
+/* Callback for cpp_callbacks::comments for spellchecking comments
+   using Enchant.  */
+
+void
+spellcheck_enchant_check_comment (cpp_reader *, source_location loc,
+				  const unsigned char *content, size_t len)
+{
+  /* Split into words, tracking the location, and call spellcheck_word
+     on each word.  */
+
+  // FIXME: sanity check LOC
+
+  const struct line_map *map = linemap_lookup (line_table, loc);
+  const line_map_ordinary *ord_map = linemap_check_ordinary (map);
+
+  int column = LOCATION_COLUMN (loc);
+  int line = LOCATION_LINE (loc);
+  int word_start_column = column;
+  int word_start_line = line;
+  const unsigned char *last_char = content + len;
+  const unsigned char *iter = content;
+  const unsigned char *word_start = NULL;
+  while (iter <= last_char)
+    {
+      unsigned char ch = *iter;
+      if (ISALPHA (ch) || ch == '_')
+	{
+	  if (!word_start)
+	    {
+	      word_start = iter;
+	      word_start_column = column;
+	      word_start_line = line;
+	    }
+	}
+      else
+	{
+	  if (word_start)
+	    {
+	      /* We have a word ending.  */
+	      spellcheck_word (word_start, iter - word_start, ord_map,
+			       word_start_line, word_start_column);
+	      word_start = NULL;
+	    }
+	}
+      if (ch == '\n')
+	{
+	  column = 1;
+	  line++;
+	}
+      else
+	column++;
+      iter++;
+    }
+  if (word_start)
+    // FIXME: doublecheck length here
+    spellcheck_word (word_start, iter - word_start - 1, ord_map,
+		     word_start_line, word_start_column);
+}
+
+/* Clean up this module.  */
+
+void
+spellcheck_enchant_finish (void)
+{
+  delete checker::singleton;
+}
diff --git a/gcc/spellcheck-enchant.h b/gcc/spellcheck-enchant.h
new file mode 100644
index 0000000..33aac0a
--- /dev/null
+++ b/gcc/spellcheck-enchant.h
@@ -0,0 +1,33 @@
+/* Spellchecking using the Enchant "meta-library".
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_SPELLCHECK_ENCHANT_H
+#define GCC_SPELLCHECK_ENCHANT_H
+
+extern void spellcheck_enchant_init (const char *language_tag);
+
+/* Callback for cpp_callbacks::comments for spellchecking comments
+   using Enchant.  */
+
+extern void spellcheck_enchant_check_comment (cpp_reader *, source_location,
+					      const unsigned char *, size_t);
+
+extern void spellcheck_enchant_finish (void);
+
+#endif  /* GCC_SPELLCHECK_ENCHANT_H  */
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index b843992..3138738 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -609,6 +609,10 @@ struct cpp_callbacks
 
   /* Callback for providing suggestions for misspelled directives.  */
   const char *(*get_suggestion) (cpp_reader *, const char *, const char *const *);
+
+  /* Callback for when a comment is encountered.  */
+  void (*comment) (cpp_reader *, source_location, const unsigned char *,
+		   size_t);
 };
 
 #ifdef VMS
diff --git a/libcpp/lex.c b/libcpp/lex.c
index 9edd2a6..40ff801 100644
--- a/libcpp/lex.c
+++ b/libcpp/lex.c
@@ -2889,6 +2889,13 @@ _cpp_lex_direct (cpp_reader *pfile)
       if (fallthrough_comment_p (pfile, comment_start))
 	fallthrough_comment = true;
 
+      if (pfile->cb.comment)
+	{
+	  size_t len = pfile->buffer->cur - comment_start;
+	  pfile->cb.comment (pfile, result->src_loc, comment_start - 1,
+			     len + 1);
+	}
+
       if (!pfile->state.save_comments)
 	{
 	  result->flags |= PREV_WHITE;
-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-06-08 19:47 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-02 19:06 [PATCH] RFC: spellchecker for comments, plus -Wfixme and -Wtodo David Malcolm
2017-05-02 21:58 ` Mike Stump
2017-06-08  9:34   ` [PATCH] testsuite: example plugin for spellchecking comments David Malcolm
2017-06-08 19:47     ` Mike Stump
2017-05-03  0:09 ` [PATCH] RFC: spellchecker for comments, plus -Wfixme and -Wtodo Trevor Saunders
2017-05-12 18:56 ` Jeff Law
2017-06-05 20:58   ` [committed] libcpp: add callback for comment-handling David Malcolm

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).