public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH 2/4] Add more spellcheck selftests
  2016-06-14 14:49 [PATCH 1/4] Introduce gcc_rich_location::add_fixit_misspelled_id David Malcolm
  2016-06-14 14:49 ` [PATCH 3/4] spellcheck.h: add best_match template; implement early-reject David Malcolm
  2016-06-14 14:49 ` [PATCH 4/4] C FE: suggest corrections for misspelled identifiers and type names David Malcolm
@ 2016-06-14 14:49 ` David Malcolm
  2016-06-14 20:59   ` Jeff Law
  2016-06-14 20:58 ` [PATCH 1/4] Introduce gcc_rich_location::add_fixit_misspelled_id Jeff Law
  3 siblings, 1 reply; 8+ messages in thread
From: David Malcolm @ 2016-06-14 14:49 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

The next patch in the kit reimplements find_closest_string and
find_closest_identifier, so it seems prudent to add some more
test coverage for these.  This patch also adds some more test
coverage for levenshtein_distance itself.

Successfully bootstrapped&regrtested in combination with the rest of
the kit on x86_64-pc-linux-gnu
Successful -fself-test of stage1 on powerpc-ibm-aix7.1.3.0 of
just this patch (on top of patch 1).

OK for trunk if it passes individual bootstrap&regrtest?

gcc/ChangeLog:
	* selftest-run-tests.c (selftest::run_tests): Call
	selftest::spellcheck_tree_c_tests.
	* selftest.h (selftest::spellcheck_tree_c_tests): New decl.
	* spellcheck-tree.c: Include selftest.h and stringpool.h.
	(selftest::test_find_closest_identifier): New function.
	(selftest::spellcheck_tree_c_tests): New function.
	* spellcheck.c (selftest::test_find_closest_string): Verify that
	the order of the vec does not affect the results for this case.
	(selftest::test_data): New array.
	(selftest::test_metric_conditions): New function.
	(selftest::spellcheck_c_tests): Add a test of case-comparison.
	Call selftest::test_metric_conditions.
---
 gcc/selftest-run-tests.c |  1 +
 gcc/selftest.h           |  1 +
 gcc/spellcheck-tree.c    | 49 ++++++++++++++++++++++++++++++++++++
 gcc/spellcheck.c         | 65 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 116 insertions(+)

diff --git a/gcc/selftest-run-tests.c b/gcc/selftest-run-tests.c
index 934e700..d4a9c0b 100644
--- a/gcc/selftest-run-tests.c
+++ b/gcc/selftest-run-tests.c
@@ -61,6 +61,7 @@ selftest::run_tests ()
   diagnostic_show_locus_c_tests ();
   fold_const_c_tests ();
   spellcheck_c_tests ();
+  spellcheck_tree_c_tests ();
   tree_cfg_c_tests ();
 
   /* This one relies on most of the above.  */
diff --git a/gcc/selftest.h b/gcc/selftest.h
index d1f8acc..7adea2f 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -54,6 +54,7 @@ extern void input_c_tests ();
 extern void pretty_print_c_tests ();
 extern void rtl_tests_c_tests ();
 extern void spellcheck_c_tests ();
+extern void spellcheck_tree_c_tests ();
 extern void tree_c_tests ();
 extern void tree_cfg_c_tests ();
 extern void vec_c_tests ();
diff --git a/gcc/spellcheck-tree.c b/gcc/spellcheck-tree.c
index eb6e72a..2d73b774 100644
--- a/gcc/spellcheck-tree.c
+++ b/gcc/spellcheck-tree.c
@@ -23,6 +23,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm.h"
 #include "tree.h"
 #include "spellcheck.h"
+#include "selftest.h"
+#include "stringpool.h"
 
 /* Calculate Levenshtein distance between two identifiers.  */
 
@@ -78,3 +80,50 @@ find_closest_identifier (tree target, const auto_vec<tree> *candidates)
 
   return best_identifier;
 }
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* Selftests.  */
+
+/* Verify that find_closest_identifier is sane.  */
+
+static void
+test_find_closest_identifier ()
+{
+  auto_vec<tree> candidates;
+
+  /* Verify that it can handle an empty vec.  */
+  ASSERT_EQ (NULL, find_closest_identifier (get_identifier (""), &candidates));
+
+  /* Verify that it works sanely for non-empty vecs.  */
+  tree apple = get_identifier ("apple");
+  tree banana = get_identifier ("banana");
+  tree cherry = get_identifier ("cherry");
+  candidates.safe_push (apple);
+  candidates.safe_push (banana);
+  candidates.safe_push (cherry);
+
+  ASSERT_EQ (apple, find_closest_identifier (get_identifier ("app"),
+					     &candidates));
+  ASSERT_EQ (banana, find_closest_identifier (get_identifier ("banyan"),
+					      &candidates));;
+  ASSERT_EQ (cherry, find_closest_identifier (get_identifier ("berry"),
+					      &candidates));
+  ASSERT_EQ (NULL,
+	     find_closest_identifier (get_identifier ("not like the others"),
+				      &candidates));
+}
+
+/* Run all of the selftests within this file.  */
+
+void
+spellcheck_tree_c_tests ()
+{
+  test_find_closest_identifier ();
+}
+
+} // namespace selftest
+
+#endif /* #if CHECKING_P */
diff --git a/gcc/spellcheck.c b/gcc/spellcheck.c
index 11018f0..e03f484 100644
--- a/gcc/spellcheck.c
+++ b/gcc/spellcheck.c
@@ -217,6 +217,69 @@ test_find_closest_string ()
   ASSERT_STREQ ("banana", find_closest_string ("banyan", &candidates));
   ASSERT_STREQ ("cherry", find_closest_string ("berry", &candidates));
   ASSERT_EQ (NULL, find_closest_string ("not like the others", &candidates));
+
+  /* The order of the vec can matter, but it should not matter for these
+     inputs.  */
+  candidates.truncate (0);
+  candidates.safe_push ("cherry");
+  candidates.safe_push ("banana");
+  candidates.safe_push ("apple");
+  ASSERT_STREQ ("apple", find_closest_string ("app", &candidates));
+  ASSERT_STREQ ("banana", find_closest_string ("banyan", &candidates));
+  ASSERT_STREQ ("cherry", find_closest_string ("berry", &candidates));
+  ASSERT_EQ (NULL, find_closest_string ("not like the others", &candidates));
+}
+
+/* Test data for test_metric_conditions.  */
+
+static const char * const test_data[] = {
+  "",
+  "foo"
+  "food",
+  "boo",
+  "1234567890123456789012345678901234567890123456789012345678901234567890"
+};
+
+/* Verify that levenshtein_distance appears to be a sane distance function,
+   i.e. the conditions for being a metric.  This is done directly for a
+   small set of examples, using test_data above.  This is O(N^3) in the size
+   of the array, due to the test for the triangle inequality, so we keep the
+   array small.  */
+
+static void
+test_metric_conditions ()
+{
+  const int num_test_cases = sizeof (test_data) / sizeof (test_data[0]);
+
+  for (int i = 0; i < num_test_cases; i++)
+    {
+      for (int j = 0; j < num_test_cases; j++)
+	{
+	  edit_distance_t dist_ij
+	    = levenshtein_distance (test_data[i], test_data[j]);
+
+	  /* Identity of indiscernibles: d(i, j) > 0 iff i == j.  */
+	  if (i == j)
+	    ASSERT_EQ (dist_ij, 0);
+	  else
+	    ASSERT_TRUE (dist_ij > 0);
+
+	  /* Symmetry: d(i, j) == d(j, i).  */
+	  edit_distance_t dist_ji
+	    = levenshtein_distance (test_data[j], test_data[i]);
+	  ASSERT_EQ (dist_ij, dist_ji);
+
+	  /* Triangle inequality.  */
+	  for (int k = 0; k < num_test_cases; k++)
+	    {
+	      edit_distance_t dist_ik
+		= levenshtein_distance (test_data[i], test_data[k]);
+	      edit_distance_t dist_jk
+		= levenshtein_distance (test_data[j], test_data[k]);
+	      ASSERT_TRUE (dist_ik <= dist_ij + dist_jk);
+	    }
+	}
+    }
 }
 
 /* Verify levenshtein_distance for a variety of pairs of pre-canned
@@ -239,8 +302,10 @@ spellcheck_c_tests ()
     ("Lorem ipsum dolor sit amet, consectetur adipiscing elit,",
      "All your base are belong to us",
      44);
+  levenshtein_distance_unit_test ("foo", "FOO", 3);
 
   test_find_closest_string ();
+  test_metric_conditions ();
 }
 
 } // namespace selftest
-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 4/4] C FE: suggest corrections for misspelled identifiers and type names
  2016-06-14 14:49 [PATCH 1/4] Introduce gcc_rich_location::add_fixit_misspelled_id David Malcolm
  2016-06-14 14:49 ` [PATCH 3/4] spellcheck.h: add best_match template; implement early-reject David Malcolm
@ 2016-06-14 14:49 ` David Malcolm
  2016-06-21 21:41   ` Jeff Law
  2016-06-14 14:49 ` [PATCH 2/4] Add more spellcheck selftests David Malcolm
  2016-06-14 20:58 ` [PATCH 1/4] Introduce gcc_rich_location::add_fixit_misspelled_id Jeff Law
  3 siblings, 1 reply; 8+ messages in thread
From: David Malcolm @ 2016-06-14 14:49 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

This patch introduces a new lookup_name_fuzzy function to the
C frontend and uses it to provides suggestions for various error
messages that may be due to misspellings, and also for the
warnings in implicit_decl_warning.

This latter part may be controversial.  So far, we've only provided
spelling suggestions during error-handling, and I think there's
a strong case for spending the extra cycles to give a good error
message given that the compilation is going to fail.  For a *warning*,
this tradeoff may not be so clear.  In my experience, the
"implicit declaration of function" warning usually means that
either the compile or link is going to fail, so it may be that these
cases are usually effectively errors (hence the suggestions in this
patch).

Alternatively, the call to lookup_name_fuzzy could be somehow guarded
so that we don't do work upfront to generate the suggestion
if the warning is going to be discarded internally within diagnostics.c.
(if the user has supplied -Wno-implicit-function-declaration, they're
presumably working with code that uses implicit function decls, I
suppose).

Amongst other things, builtins get offered as suggestions (hence the
suggestion of "nanl" for "name" in one existing test case, which I
initially found surprising).

The patch includes PR c/70339: "singed" vs "signed" hints, looking
at reserved words that could start typenames.

The patch also considers preprocessor macros when considering
spelling candidates: if the user misspells a macro name, this is
typically seen by the frontend as an unrecognized identifier, so
we can offer suggestions like this:

spellcheck-identifiers.c: In function 'test_4':
spellcheck-identifiers.c:64:10: warning: implicit declaration of
function 'IDENTIFIER_PTR'; did you mean 'IDENTIFIER_POINTER'?
[-Wimplicit-function-declaration]
   return IDENTIFIER_PTR (node);
          ^~~~~~~~~~~~~~
          IDENTIFIER_POINTER

Doing so uses the "best_match" class added in a prior patch, merging
in results between C scopes and the preprocessor, using the
optimizations there to minimize the work done.  Merging these
results required some minor surgery to class best_match.

In c-c++-common/attributes-1.c, the error message for:
  void* my_calloc(unsigned, unsigned) __attribute__((alloc_size(1,bar))); /* { dg-warning "outside range" } */
gains a suggestion of "carg", becoming:
  attributes-1.c:4:65: error: 'bar' undeclared here (not in a function); did you mean 'carg'?
  attributes-1.c:4:1: warning: alloc_size parameter outside range [-Wattributes]
This is an unhelpful suggestion, given that alloc_size expects an integer
value identifying a parameter, which the builtin
  double carg (double complex z);
is not.  It's not clear to me what the best way to fix this is:
if I'm reading things right, c_parser_attributes parses expressions
using c_parser_expr_list without regard to which attribute it's
handling, so there's (currently) no way to "tune" the attribute parser
based on the attribute (and I don't know if that's a complexity we
want to take on).

Successfully bootstrapped&regrtested in combination with the rest of
the kit on x86_64-pc-linux-gnu
Successful -fself-test of stage1 on powerpc-ibm-aix7.1.3.0 (in
combination with the rest of the kit).

OK for trunk?

gcc/c-family/ChangeLog:
	PR c/70339
	* c-common.h (enum lookup_name_fuzzy_kind): New enum.
	(lookup_name_fuzzy): New prototype.

gcc/c/ChangeLog:
	PR c/70339
	* c-decl.c: Include spellcheck-tree.h and gcc-rich-location.h.
	(implicit_decl_warning): When issuing warnings for implicit
	declarations, attempt to provide a suggestion via
	lookup_name_fuzzy.
	(undeclared_variable): Likewise when issuing errors.
	(lookup_name_in_scope): Likewise.
	(struct edit_distance_traits<cpp_hashnode *>): New struct.
	(best_macro_match): New typedef.
	(find_closest_macro_cpp_cb): New function.
	(lookup_name_fuzzy): New function.
	* c-parser.c: Include gcc-rich-location.h.
	(c_token_starts_typename): Split out case CPP_KEYWORD into...
	(c_keyword_starts_typename): ...this new function.
	(c_parser_declaration_or_fndef): When issuing errors about
	missing "struct" etc, add a fixit.  For other kinds of errors,
	attempt to provide a suggestion via lookup_name_fuzzy.
	(c_parser_parms_declarator): When looking ahead to detect typos in
	type names, also reject CPP_KEYWORD.
	(c_parser_parameter_declaration): When issuing errors about
	unknown type names, attempt to provide a suggestion via
	lookup_name_fuzzy.
	* c-tree.h (c_keyword_starts_typename): New prototype.

gcc/ChangeLog:
	PR c/70339
	* diagnostic-core.h (pedwarn_at_rich_loc): New prototype.
	* diagnostic.c (pedwarn_at_rich_loc): New function.
	* spellcheck.h (best_match::best_match): Add a
	"best_distance_so_far" optional parameter.
	(best_match::set_best_so_far): New method.
	(best_match::get_best_distance): New accessor.
	(best_match::get_best_candidate_length): New accessor.

gcc/testsuite/ChangeLog:
	PR c/70339
	* c-c++-common/attributes-1.c: Update dg-prune-output to include
	hint.
	* gcc.dg/diagnostic-token-ranges.c (undeclared_identifier): Update
	expected results due to builtin "nanl" now being suggested for
	"name".
	* gcc.dg/pr67580.c: Update expected messages.
	* gcc.dg/spellcheck-identifiers.c: New testcase.
	* gcc.dg/spellcheck-typenames.c: New testcase.
---
 gcc/c-family/c-common.h                        |   9 ++
 gcc/c/c-decl.c                                 | 190 ++++++++++++++++++++++++-
 gcc/c/c-parser.c                               | 144 +++++++++++++------
 gcc/c/c-tree.h                                 |   1 +
 gcc/diagnostic-core.h                          |   2 +
 gcc/diagnostic.c                               |  12 ++
 gcc/spellcheck.h                               |  22 ++-
 gcc/testsuite/c-c++-common/attributes-1.c      |   2 +-
 gcc/testsuite/gcc.dg/diagnostic-token-ranges.c |   3 +-
 gcc/testsuite/gcc.dg/pr67580.c                 |  18 +--
 gcc/testsuite/gcc.dg/spellcheck-identifiers.c  | 136 ++++++++++++++++++
 gcc/testsuite/gcc.dg/spellcheck-typenames.c    | 107 ++++++++++++++
 12 files changed, 579 insertions(+), 67 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/spellcheck-identifiers.c
 create mode 100644 gcc/testsuite/gcc.dg/spellcheck-typenames.c

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 4e6aa00..3ad5400 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -990,6 +990,15 @@ extern tree lookup_label (tree);
 extern tree lookup_name (tree);
 extern bool lvalue_p (const_tree);
 
+enum lookup_name_fuzzy_kind {
+  /* Names of types.  */
+  FUZZY_LOOKUP_TYPENAME,
+
+  /* Any name.  */
+  FUZZY_LOOKUP_NAME
+};
+extern tree lookup_name_fuzzy (tree, enum lookup_name_fuzzy_kind);
+
 extern bool vector_targets_convertible_p (const_tree t1, const_tree t2);
 extern bool vector_types_convertible_p (const_tree t1, const_tree t2, bool emit_lax_note);
 extern tree c_build_vec_perm_expr (location_t, tree, tree, tree, bool = true);
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 5c08c59..8b966fe 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -51,6 +51,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-family/c-ada-spec.h"
 #include "cilk.h"
 #include "builtins.h"
+#include "spellcheck-tree.h"
+#include "gcc-rich-location.h"
 
 /* In grokdeclarator, distinguish syntactic contexts of declarators.  */
 enum decl_context
@@ -3086,13 +3088,36 @@ implicit_decl_warning (location_t loc, tree id, tree olddecl)
   if (warn_implicit_function_declaration)
     {
       bool warned;
+      tree hint = NULL_TREE;
+      if (!olddecl)
+	hint = lookup_name_fuzzy (id, FUZZY_LOOKUP_NAME);
 
       if (flag_isoc99)
-	warned = pedwarn (loc, OPT_Wimplicit_function_declaration,
-			  "implicit declaration of function %qE", id);
+	if (hint)
+	  {
+	    gcc_rich_location richloc (loc);
+	    richloc.add_fixit_misspelled_id (loc, hint);
+	    warned = pedwarn_at_rich_loc
+	      (&richloc, OPT_Wimplicit_function_declaration,
+	       "implicit declaration of function %qE; did you mean %qE?",
+	       id, hint);
+	  }
+	else
+	  warned = pedwarn (loc, OPT_Wimplicit_function_declaration,
+			    "implicit declaration of function %qE", id);
       else
-	warned = warning_at (loc, OPT_Wimplicit_function_declaration,
-			     G_("implicit declaration of function %qE"), id);
+	if (hint)
+	  {
+	    gcc_rich_location richloc (loc);
+	    richloc.add_fixit_misspelled_id (loc, hint);
+	    warned = warning_at_rich_loc
+	      (&richloc, OPT_Wimplicit_function_declaration,
+	       G_("implicit declaration of function %qE;did you mean %qE?"),
+	       id, hint);
+	  }
+	else
+	  warned = warning_at (loc, OPT_Wimplicit_function_declaration,
+			       G_("implicit declaration of function %qE"), id);
       if (olddecl && warned)
 	locate_old_decl (olddecl);
     }
@@ -3408,13 +3433,38 @@ undeclared_variable (location_t loc, tree id)
 
   if (current_function_decl == 0)
     {
-      error_at (loc, "%qE undeclared here (not in a function)", id);
+      tree guessed_id = lookup_name_fuzzy (id, FUZZY_LOOKUP_NAME);
+      if (guessed_id)
+	{
+	  gcc_rich_location richloc (loc);
+	  richloc.add_fixit_misspelled_id (loc, guessed_id);
+	  error_at_rich_loc (&richloc,
+			     "%qE undeclared here (not in a function);"
+			     " did you mean %qE?",
+			     id, guessed_id);
+	}
+      else
+	error_at (loc, "%qE undeclared here (not in a function)", id);
       scope = current_scope;
     }
   else
     {
       if (!objc_diagnose_private_ivar (id))
-        error_at (loc, "%qE undeclared (first use in this function)", id);
+	{
+	  tree guessed_id = lookup_name_fuzzy (id, FUZZY_LOOKUP_NAME);
+	  if (guessed_id)
+	    {
+	      gcc_rich_location richloc (loc);
+	      richloc.add_fixit_misspelled_id (loc, guessed_id);
+	      error_at_rich_loc
+		(&richloc,
+		 "%qE undeclared (first use in this function);"
+		 " did you mean %qE?",
+		 id, guessed_id);
+	    }
+	  else
+	    error_at (loc, "%qE undeclared (first use in this function)", id);
+	}
       if (!already)
 	{
           inform (loc, "each undeclared identifier is reported only"
@@ -3904,6 +3954,134 @@ lookup_name_in_scope (tree name, struct c_scope *scope)
       return b->decl;
   return NULL_TREE;
 }
+
+/* Specialization of edit_distance_traits for preprocessor macros.  */
+
+template <>
+struct edit_distance_traits<cpp_hashnode *>
+{
+  static size_t get_length (cpp_hashnode *hashnode)
+  {
+    return hashnode->ident.len;
+  }
+
+  static const char *get_string (cpp_hashnode *hashnode)
+  {
+    return (const char *)hashnode->ident.str;
+  }
+};
+
+/* Specialization of best_match<> for finding the closest preprocessor
+   macro to a given identifier.  */
+
+typedef best_match<tree, cpp_hashnode *> best_macro_match;
+
+/* A callback for cpp_forall_identifiers, for use by lookup_name_fuzzy.
+   Process HASHNODE and update the best_macro_match instance pointed to be
+   USER_DATA.  */
+
+static int
+find_closest_macro_cpp_cb (cpp_reader *, cpp_hashnode *hashnode,
+			   void *user_data)
+{
+  if (hashnode->type != NT_MACRO)
+    return 1;
+
+  best_macro_match *bmm = (best_macro_match *)user_data;
+  bmm->consider (hashnode);
+
+  /* Keep iterating.  */
+  return 1;
+}
+
+/* Look for the closest match for NAME within the currently valid
+   scopes.
+
+   This finds the identifier with the lowest Levenshtein distance to
+   NAME.  If there are multiple candidates with equal minimal distance,
+   the first one found is returned.  Scopes are searched from innermost
+   outwards, and within a scope in reverse order of declaration, thus
+   benefiting candidates "near" to the current scope.
+
+   The function also looks for similar macro names to NAME, since a
+   misspelled macro name will not be expanded, and hence looks like an
+   identifier to the C frontend.
+
+   It also looks for start_typename keywords, to detect "singed" vs "signed"
+   typos.  */
+
+tree
+lookup_name_fuzzy (tree name, enum lookup_name_fuzzy_kind kind)
+{
+  gcc_assert (TREE_CODE (name) == IDENTIFIER_NODE);
+
+  best_match<tree, tree> bm (name);
+
+  /* Look within currently valid scopes.  */
+  for (c_scope *scope = current_scope; scope; scope = scope->outer)
+    for (c_binding *binding = scope->bindings; binding; binding = binding->prev)
+      {
+	if (!binding->id)
+	  continue;
+	/* Don't use bindings from implicitly declared functions,
+	   as they were likely misspellings themselves.  */
+	if (TREE_CODE (binding->decl) == FUNCTION_DECL)
+	  if (C_DECL_IMPLICIT (binding->decl))
+	    continue;
+	if (kind == FUZZY_LOOKUP_TYPENAME)
+	  if (TREE_CODE (binding->decl) != TYPE_DECL)
+	    continue;
+	bm.consider (binding->id);
+      }
+
+  /* Consider macros: if the user misspelled a macro name e.g. "SOME_MACRO"
+     as:
+       x = SOME_OTHER_MACRO (y);
+     then "SOME_OTHER_MACRO" will survive to the frontend and show up
+     as a misspelled identifier.
+
+     Use the best distance so far so that a candidate is only set if
+     a macro is better than anything so far.  This allows early rejection
+     (without calculating the edit distance) of macro names that must have
+     distance >= bm.get_best_distance (), and means that we only get a
+     non-NULL result for best_macro_match if it's better than any of
+     the identifiers already checked, which avoids needless creation
+     of identifiers for macro hashnodes.  */
+  best_macro_match bmm (name, bm.get_best_distance ());
+  cpp_forall_identifiers (parse_in, find_closest_macro_cpp_cb, &bmm);
+  cpp_hashnode *best_macro = bmm.get_best_meaningful_candidate ();
+  /* If a macro is the closest so far to NAME, use it, creating an
+     identifier tree node for it.  */
+  if (best_macro)
+    {
+      const char *id = (const char *)best_macro->ident.str;
+      tree macro_as_identifier
+	= get_identifier_with_length (id, best_macro->ident.len);
+      bm.set_best_so_far (macro_as_identifier,
+			  bmm.get_best_distance (),
+			  bmm.get_best_candidate_length ());
+    }
+
+  /* Try the "start_typename" keywords to detect
+     "singed" vs "signed" typos.  */
+  if (kind == FUZZY_LOOKUP_TYPENAME)
+    {
+      for (unsigned i = 0; i < num_c_common_reswords; i++)
+	{
+	  const c_common_resword *resword = &c_common_reswords[i];
+	  if (!c_keyword_starts_typename (resword->rid))
+	    continue;
+	  tree resword_identifier = ridpointers [resword->rid];
+	  if (!resword_identifier)
+	    continue;
+	  gcc_assert (TREE_CODE (resword_identifier) == IDENTIFIER_NODE);
+	  bm.consider (resword_identifier);
+	}
+    }
+
+  return bm.get_best_meaningful_candidate ();
+}
+
 \f
 /* Create the predefined scalar types of C,
    and some nodes representing standard constants (0, 1, (void *) 0).
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index ff32479..db36a00 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-family/c-indentation.h"
 #include "gimple-expr.h"
 #include "context.h"
+#include "gcc-rich-location.h"
 
 /* We need to walk over decls with incomplete struct/union/enum types
    after parsing the whole translation unit.
@@ -518,6 +519,48 @@ c_parser_peek_nth_token (c_parser *parser, unsigned int n)
   return &parser->tokens[n - 1];
 }
 
+bool
+c_keyword_starts_typename (enum rid keyword)
+{
+  switch (keyword)
+    {
+    case RID_UNSIGNED:
+    case RID_LONG:
+    case RID_SHORT:
+    case RID_SIGNED:
+    case RID_COMPLEX:
+    case RID_INT:
+    case RID_CHAR:
+    case RID_FLOAT:
+    case RID_DOUBLE:
+    case RID_VOID:
+    case RID_DFLOAT32:
+    case RID_DFLOAT64:
+    case RID_DFLOAT128:
+    case RID_BOOL:
+    case RID_ENUM:
+    case RID_STRUCT:
+    case RID_UNION:
+    case RID_TYPEOF:
+    case RID_CONST:
+    case RID_ATOMIC:
+    case RID_VOLATILE:
+    case RID_RESTRICT:
+    case RID_ATTRIBUTE:
+    case RID_FRACT:
+    case RID_ACCUM:
+    case RID_SAT:
+    case RID_AUTO_TYPE:
+      return true;
+    default:
+      if (keyword >= RID_FIRST_INT_N
+	  && keyword < RID_FIRST_INT_N + NUM_INT_N_ENTS
+	  && int_n_enabled_p[keyword - RID_FIRST_INT_N])
+	return true;
+      return false;
+    }
+}
+
 /* Return true if TOKEN can start a type name,
    false otherwise.  */
 static bool
@@ -541,43 +584,7 @@ c_token_starts_typename (c_token *token)
 	  gcc_unreachable ();
 	}
     case CPP_KEYWORD:
-      switch (token->keyword)
-	{
-	case RID_UNSIGNED:
-	case RID_LONG:
-	case RID_SHORT:
-	case RID_SIGNED:
-	case RID_COMPLEX:
-	case RID_INT:
-	case RID_CHAR:
-	case RID_FLOAT:
-	case RID_DOUBLE:
-	case RID_VOID:
-	case RID_DFLOAT32:
-	case RID_DFLOAT64:
-	case RID_DFLOAT128:
-	case RID_BOOL:
-	case RID_ENUM:
-	case RID_STRUCT:
-	case RID_UNION:
-	case RID_TYPEOF:
-	case RID_CONST:
-	case RID_ATOMIC:
-	case RID_VOLATILE:
-	case RID_RESTRICT:
-	case RID_ATTRIBUTE:
-	case RID_FRACT:
-	case RID_ACCUM:
-	case RID_SAT:
-	case RID_AUTO_TYPE:
-	  return true;
-	default:
-	  if (token->keyword >= RID_FIRST_INT_N
-	      && token->keyword < RID_FIRST_INT_N + NUM_INT_N_ENTS
-	      && int_n_enabled_p[token->keyword - RID_FIRST_INT_N])
-	    return true;
-	  return false;
-	}
+      return c_keyword_starts_typename (token->keyword);
     case CPP_LESS:
       if (c_dialect_objc ())
 	return true;
@@ -1655,15 +1662,50 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
       && (!nested || !lookup_name (c_parser_peek_token (parser)->value)))
     {
       tree name = c_parser_peek_token (parser)->value;
-      error_at (here, "unknown type name %qE", name);
-      /* Give a hint to the user.  This is not C++ with its implicit
-	 typedef.  */
+
+      /* Issue a warning about NAME being an unknown type name, perhaps
+	 with some kind of hint.
+	 If the user forgot a "struct" etc, suggest inserting
+	 it.  Otherwise, attempt to look for misspellings.  */
+      gcc_rich_location richloc (here);
       if (tag_exists_p (RECORD_TYPE, name))
-	inform (here, "use %<struct%> keyword to refer to the type");
+	{
+	  /* This is not C++ with its implicit typedef.  */
+	  richloc.add_fixit_insert (here, "struct");
+	  error_at_rich_loc (&richloc,
+			     "unknown type name %qE;"
+			     " use %<struct%> keyword to refer to the type",
+			     name);
+	}
       else if (tag_exists_p (UNION_TYPE, name))
-	inform (here, "use %<union%> keyword to refer to the type");
+	{
+	  richloc.add_fixit_insert (here, "union");
+	  error_at_rich_loc (&richloc,
+			     "unknown type name %qE;"
+			     " use %<union%> keyword to refer to the type",
+			     name);
+	}
       else if (tag_exists_p (ENUMERAL_TYPE, name))
-	inform (here, "use %<enum%> keyword to refer to the type");
+	{
+	  richloc.add_fixit_insert (here, "enum");
+	  error_at_rich_loc (&richloc,
+			     "unknown type name %qE;"
+			     " use %<enum%> keyword to refer to the type",
+			     name);
+	}
+      else
+	{
+	  tree hint = lookup_name_fuzzy (name, FUZZY_LOOKUP_TYPENAME);
+	  if (hint)
+	    {
+	      richloc.add_fixit_misspelled_id (here, hint);
+	      error_at_rich_loc (&richloc,
+				 "unknown type name %qE; did you mean %qE?",
+				 name, hint);
+	    }
+	  else
+	    error_at (here, "unknown type name %qE", name);
+	}
 
       /* Parse declspecs normally to get a correct pointer type, but avoid
          a further "fails to be a type name" error.  Refuse nested functions
@@ -3632,7 +3674,8 @@ c_parser_parms_declarator (c_parser *parser, bool id_list_ok, tree attrs)
       && c_parser_peek_2nd_token (parser)->type != CPP_NAME
       && c_parser_peek_2nd_token (parser)->type != CPP_MULT
       && c_parser_peek_2nd_token (parser)->type != CPP_OPEN_PAREN
-      && c_parser_peek_2nd_token (parser)->type != CPP_OPEN_SQUARE)
+      && c_parser_peek_2nd_token (parser)->type != CPP_OPEN_SQUARE
+      && c_parser_peek_2nd_token (parser)->type != CPP_KEYWORD)
     {
       tree list = NULL_TREE, *nextp = &list;
       while (c_parser_next_token_is (parser, CPP_NAME)
@@ -3807,7 +3850,18 @@ c_parser_parameter_declaration (c_parser *parser, tree attrs)
       c_parser_set_source_position_from_token (token);
       if (c_parser_next_tokens_start_typename (parser, cla_prefer_type))
 	{
-	  error_at (token->location, "unknown type name %qE", token->value);
+	  tree hint = lookup_name_fuzzy (token->value, FUZZY_LOOKUP_TYPENAME);
+	  if (hint)
+	    {
+	      gcc_assert (TREE_CODE (hint) == IDENTIFIER_NODE);
+	      gcc_rich_location richloc (token->location);
+	      richloc.add_fixit_misspelled_id (token->location, hint);
+	      error_at_rich_loc (&richloc,
+				 "unknown type name %qE; did you mean %qE?",
+				 token->value, hint);
+	    }
+	  else
+	    error_at (token->location, "unknown type name %qE", token->value);
 	  parser->error = true;
 	}
       /* ??? In some Objective-C cases '...' isn't applicable so there
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 8f10a13..46be53e 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -482,6 +482,7 @@ enum c_inline_static_type {
 \f
 /* in c-parser.c */
 extern void c_parse_init (void);
+extern bool c_keyword_starts_typename (enum rid keyword);
 
 /* in c-aux-info.c */
 extern void gen_aux_info_record (tree, int, int, int);
diff --git a/gcc/diagnostic-core.h b/gcc/diagnostic-core.h
index f783761..51df150 100644
--- a/gcc/diagnostic-core.h
+++ b/gcc/diagnostic-core.h
@@ -76,6 +76,8 @@ extern void fatal_error (location_t, const char *, ...) ATTRIBUTE_GCC_DIAG(2,3)
 /* Pass one of the OPT_W* from options.h as the second parameter.  */
 extern bool pedwarn (location_t, int, const char *, ...)
      ATTRIBUTE_GCC_DIAG(3,4);
+extern bool pedwarn_at_rich_loc (rich_location *, int, const char *, ...)
+     ATTRIBUTE_GCC_DIAG(3,4);
 extern bool permerror (location_t, const char *, ...) ATTRIBUTE_GCC_DIAG(2,3);
 extern bool permerror_at_rich_loc (rich_location *, const char *,
 				   ...) ATTRIBUTE_GCC_DIAG(2,3);
diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 8467aaa..9f419fb 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -1089,6 +1089,18 @@ pedwarn (location_t location, int opt, const char *gmsgid, ...)
   return ret;
 }
 
+/* Same as pedwarn, but using RICHLOC.  */
+
+bool
+pedwarn_at_rich_loc (rich_location *richloc, int opt, const char *gmsgid, ...)
+{
+  va_list ap;
+  va_start (ap, gmsgid);
+  bool ret = diagnostic_impl (richloc, opt, gmsgid, &ap, DK_PEDWARN);
+  va_end (ap);
+  return ret;
+}
+
 /* A "permissive" error at LOCATION: issues an error unless
    -fpermissive was given on the command line, in which case it issues
    a warning.  Use this for things that really should be errors but we
diff --git a/gcc/spellcheck.h b/gcc/spellcheck.h
index 7379399..035f4ac 100644
--- a/gcc/spellcheck.h
+++ b/gcc/spellcheck.h
@@ -69,11 +69,12 @@ class best_match
 
   /* Constructor.  */
 
-  best_match (goal_t goal)
+  best_match (GOAL_TYPE goal,
+	      edit_distance_t best_distance_so_far = MAX_EDIT_DISTANCE)
   : m_goal (goal_traits::get_string (goal)),
     m_goal_len (goal_traits::get_length (goal)),
     m_best_candidate (NULL),
-    m_best_distance (MAX_EDIT_DISTANCE)
+    m_best_distance (best_distance_so_far)
   {}
 
   /* Compare the edit distance between CANDIDATE and m_goal,
@@ -118,6 +119,20 @@ class best_match
       }
   }
 
+  /* Assuming that BEST_CANDIDATE is known to be better than
+     m_best_candidate, update (without recomputing the edit distance to
+     the goal).  */
+
+  void set_best_so_far (CANDIDATE_TYPE best_candidate,
+			edit_distance_t best_distance,
+			size_t best_candidate_len)
+  {
+    gcc_assert (best_distance < m_best_distance);
+    m_best_candidate = best_candidate;
+    m_best_distance = best_distance;
+    m_best_candidate_len = best_candidate_len;
+  }
+
   /* Get the best candidate so far, but applying a filter to ensure
      that we return NULL if none of the candidates are close to the goal,
      to avoid offering nonsensical suggestions to the user.  */
@@ -135,6 +150,9 @@ class best_match
     return m_best_candidate;
   }
 
+  edit_distance_t get_best_distance () const { return m_best_distance; }
+  size_t get_best_candidate_length () const { return m_best_candidate_len; }
+
  private:
   const char *m_goal;
   size_t m_goal_len;
diff --git a/gcc/testsuite/c-c++-common/attributes-1.c b/gcc/testsuite/c-c++-common/attributes-1.c
index 1657da1..c348526 100644
--- a/gcc/testsuite/c-c++-common/attributes-1.c
+++ b/gcc/testsuite/c-c++-common/attributes-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-prune-output "undeclared here \\(not in a function\\)|\[^\n\r\]* was not declared in this scope" } */
+/* { dg-prune-output "undeclared here \\(not in a function\\); did you mean .carg..|\[^\n\r\]* was not declared in this scope" } */
 
 void* my_calloc(unsigned, unsigned) __attribute__((alloc_size(1,bar))); /* { dg-warning "outside range" } */
 void* my_realloc(void*, unsigned) __attribute__((alloc_size(bar))); /* { dg-warning "outside range" } */
diff --git a/gcc/testsuite/gcc.dg/diagnostic-token-ranges.c b/gcc/testsuite/gcc.dg/diagnostic-token-ranges.c
index ac969e3..1939949 100644
--- a/gcc/testsuite/gcc.dg/diagnostic-token-ranges.c
+++ b/gcc/testsuite/gcc.dg/diagnostic-token-ranges.c
@@ -6,11 +6,12 @@
 
 void undeclared_identifier (void)
 {
-  name; /* { dg-error "'name' undeclared" } */
+  name; /* { dg-error "'name' undeclared .first use in this function.; did you mean .nanl." } */
 /*
 { dg-begin-multiline-output "" }
    name;
    ^~~~
+   nanl
 { dg-end-multiline-output "" }
 */
 }
diff --git a/gcc/testsuite/gcc.dg/pr67580.c b/gcc/testsuite/gcc.dg/pr67580.c
index 90e4b1b..c2760e5 100644
--- a/gcc/testsuite/gcc.dg/pr67580.c
+++ b/gcc/testsuite/gcc.dg/pr67580.c
@@ -8,12 +8,9 @@ enum E { A };
 void
 f (void)
 {
-  S s; /* { dg-error "unknown type name" } */
-/* { dg-message "use .struct. keyword to refer to the type" "" { target *-*-* } 11 } */
-  U u; /* { dg-error "unknown type name" } */
-/* { dg-message "use .union. keyword to refer to the type" "" { target *-*-* } 13 } */
-  E e; /* { dg-error "unknown type name" } */
-/* { dg-message "use .enum. keyword to refer to the type" "" { target *-*-* } 15 } */
+  S s; /* { dg-error "unknown type name .S.; use .struct. keyword to refer to the type" } */
+  U u; /* { dg-error "unknown type name .U.; use .union. keyword to refer to the type" } */
+  E e; /* { dg-error "unknown type name .E.; use .enum. keyword to refer to the type" } */
 }
 
 void
@@ -22,10 +19,7 @@ g (void)
   struct T { int i; };
   union V { int i; };
   enum F { J };
-  T t; /* { dg-error "unknown type name" } */
-/* { dg-message "use .struct. keyword to refer to the type" "" { target *-*-* } 25 } */
-  V v; /* { dg-error "unknown type name" } */
-/* { dg-message "use .union. keyword to refer to the type" "" { target *-*-* } 27 } */
-  F f; /* { dg-error "unknown type name" } */
-/* { dg-message "use .enum. keyword to refer to the type" "" { target *-*-* } 29 } */
+  T t; /* { dg-error "unknown type name .T.; use .struct. keyword to refer to the type" } */
+  V v; /* { dg-error "unknown type name .V.; use .union. keyword to refer to the type" } */
+  F f; /* { dg-error "unknown type name .F.; use .enum. keyword to refer to the type" } */
 }
diff --git a/gcc/testsuite/gcc.dg/spellcheck-identifiers.c b/gcc/testsuite/gcc.dg/spellcheck-identifiers.c
new file mode 100644
index 0000000..22a12d0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/spellcheck-identifiers.c
@@ -0,0 +1,136 @@
+/* { dg-do compile } */
+/* { dg-options "-Wimplicit-function-declaration -fdiagnostics-show-caret" } */
+
+typedef struct GtkWidget { int dummy; } GtkWidget;
+
+extern void gtk_widget_show_all (GtkWidget *w);
+
+void
+test_1 (GtkWidget *w)
+{
+  gtk_widget_showall (w); /* { dg-warning "3: implicit declaration of function .gtk_widget_showall.; did you mean .gtk_widget_show_all.?" } */
+  /* { dg-begin-multiline-output "" }
+   gtk_widget_showall (w);
+   ^~~~~~~~~~~~~~~~~~
+   gtk_widget_show_all
+   { dg-end-multiline-output "" } */
+
+  /* Ensure we don't try to suggest "gtk_widget_showall" for subsequent
+     corrections.  */
+  gtk_widget_showall_ (w); /* { dg-warning "3: implicit declaration of function .gtk_widget_showall_.; did you mean .gtk_widget_show_all.?" } */
+  /* { dg-begin-multiline-output "" }
+   gtk_widget_showall_ (w);
+   ^~~~~~~~~~~~~~~~~~~
+   gtk_widget_show_all
+   { dg-end-multiline-output "" } */
+
+  GtkWidgetShowAll (w); /* { dg-warning "3: implicit declaration of function .GtkWidgetShowAll.; did you mean .gtk_widget_show_all.?" } */
+  /* { dg-begin-multiline-output "" }
+   GtkWidgetShowAll (w);
+   ^~~~~~~~~~~~~~~~
+   gtk_widget_show_all
+   { dg-end-multiline-output "" } */
+}
+
+int
+test_2 (int param)
+{
+  return parma * parma; /* { dg-error "10: .parma. undeclared .first use in this function.; did you mean .param." } */
+  /* { dg-begin-multiline-output "" }
+   return parma * parma;
+          ^~~~~
+          param
+   { dg-end-multiline-output "" } */
+}
+
+#define MACRO(X) ((X))
+
+int
+test_3 (int i)
+{
+  return MACRAME (i); /* { dg-warning "10: implicit declaration of function .MACRAME.; did you mean .MACRO.?" } */
+  /* { dg-begin-multiline-output "" }
+   return MACRAME (i);
+          ^~~~~~~
+          MACRO
+   { dg-end-multiline-output "" } */
+}
+
+#define IDENTIFIER_POINTER(X) ((X))
+
+int
+test_4 (int node)
+{
+  return IDENTIFIER_PTR (node); /* { dg-warning "10: implicit declaration of function .IDENTIFIER_PTR.; did you mean .IDENTIFIER_POINTER.?" } */
+  /* { dg-begin-multiline-output "" }
+   return IDENTIFIER_PTR (node);
+          ^~~~~~~~~~~~~~
+          IDENTIFIER_POINTER
+   { dg-end-multiline-output "" } */
+}
+
+
+int
+test_5 (void)
+{
+  return __LINE_; /* { dg-error "10: .__LINE_. undeclared .first use in this function.; did you mean .__LINE__." } */
+  /* { dg-begin-multiline-output "" }
+   return __LINE_;
+          ^~~~~~~
+          __LINE__
+   { dg-end-multiline-output "" } */
+}
+
+#define MAX_ITEMS 100
+int array[MAX_ITEM]; /* { dg-error "11: .MAX_ITEM. undeclared here .not in a function.; did you mean .MAX_ITEMS." } */
+  /* { dg-begin-multiline-output "" }
+ int array[MAX_ITEM];
+           ^~~~~~~~
+           MAX_ITEMS
+   { dg-end-multiline-output "" } */
+
+
+enum foo {
+  FOO_FIRST,
+  FOO_SECOND
+};
+
+int
+test_6 (enum foo f)
+{
+  switch (f)
+    {
+    case FOO_FURST: /* { dg-error "10: .FOO_FURST. undeclared .first use in this function.; did you mean .FOO_FIRST." } */
+      break;
+  /* { dg-begin-multiline-output "" }
+     case FOO_FURST:
+          ^~~~~~~~~
+          FOO_FIRST
+   { dg-end-multiline-output "" } */
+
+    case FOO_SECCOND: /* { dg-error "10: .FOO_SECCOND. undeclared .first use in this function.; did you mean .FOO_SECOND." } */
+      break;
+  /* { dg-begin-multiline-output "" }
+     case FOO_SECCOND:
+          ^~~~~~~~~~~
+          FOO_SECOND
+   { dg-end-multiline-output "" } */
+
+    default:
+      break;
+    }
+}
+
+/* Verify that we offer names of builtins as suggestions.  */
+
+void
+test_7 (int i, int j)
+{
+  int buffer[100];
+  snprint (buffer, 100, "%i of %i", i, j); /* { dg-warning "3: implicit declaration of function .snprint.; did you mean .snprintf.." } */
+  /* { dg-begin-multiline-output "" }
+   snprint (buffer, 100, "%i of %i", i, j);
+   ^~~~~~~
+   snprintf
+   { dg-end-multiline-output "" } */
+}
diff --git a/gcc/testsuite/gcc.dg/spellcheck-typenames.c b/gcc/testsuite/gcc.dg/spellcheck-typenames.c
new file mode 100644
index 0000000..ae22ce3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/spellcheck-typenames.c
@@ -0,0 +1,107 @@
+/* { dg-do compile } */
+/* { dg-options "-fdiagnostics-show-caret" } */
+
+void test_1 (signed char e);
+
+/* PR c/70339.  */
+void test_2 (singed char e); /* { dg-error "14: unknown type name .singed.; did you mean .signed.?" } */
+/* { dg-begin-multiline-output "" }
+ void test_2 (singed char e);
+              ^~~~~~
+              signed
+   { dg-end-multiline-output "" } */
+
+void test_3 (car e); /* { dg-error "14: unknown type name .car.; did you mean .char.?" } */
+/* { dg-begin-multiline-output "" }
+ void test_3 (car e);
+              ^~~
+              char
+   { dg-end-multiline-output "" } */
+
+/* TODO: this one could be handled better.  */
+void test_4 (signed car e); /* { dg-error "25: before .e." } */
+/* { dg-begin-multiline-output "" }
+ void test_4 (signed car e);
+                         ^
+   { dg-end-multiline-output "" } */
+
+/* Verify that we handle misspelled typedef names.  */
+
+typedef struct something {} something_t;
+
+some_thing_t test_5; /* { dg-error "1: unknown type name .some_thing_t.; did you mean .something_t.?" } */
+  /* { dg-begin-multiline-output "" }
+ some_thing_t test_5;
+ ^~~~~~~~~~~~
+ something_t
+   { dg-end-multiline-output "" } */
+
+/* TODO: we don't yet handle misspelled struct names.  */
+struct some_thing test_6; /* { dg-error "storage size of .test_6. isn't known" } */
+  /* { dg-begin-multiline-output "" }
+ struct some_thing test_6;
+                   ^~~~~~
+   { dg-end-multiline-output "" } */
+
+typedef long int64_t;
+int64 i; /* { dg-error "unknown type name 'int64'; did you mean 'int64_t'?" } */
+/* { dg-begin-multiline-output "" }
+ int64 i;
+ ^~~~~
+ int64_t
+   { dg-end-multiline-output "" } */
+
+/* Verify that gcc doesn't offer nonsensical suggestions.  */
+
+nonsensical_suggestion_t var; /* { dg-bogus "did you mean" } */
+/* { dg-error "unknown type name" "" { target { *-*-* } } 56 } */
+/* { dg-begin-multiline-output "" }
+ nonsensical_suggestion_t var;
+ ^~~~~~~~~~~~~~~~~~~~~~~~
+   { dg-end-multiline-output "" } */
+
+
+/* In the following, we should suggest inserting "struct" (rather
+   than "did you mean 'float'") and provide a fixit hint.  */
+struct foo_t {
+  int i;
+};
+foo_t *foo_ptr; /* { dg-error "1: unknown type name .foo_t.; use .struct. keyword to refer to the type" } */
+/* { dg-begin-multiline-output "" }
+ foo_t *foo_ptr;
+ ^~~~~
+ struct
+   { dg-end-multiline-output "" } */
+
+
+/* Similarly for unions.  */
+union bar_t {
+  int i;
+  char j;
+};
+bar_t *bar_ptr; /* { dg-error "1: unknown type name .bar_t.; use .union. keyword to refer to the type" } */
+/* { dg-begin-multiline-output "" }
+ bar_t *bar_ptr;
+ ^~~~~
+ union
+   { dg-end-multiline-output "" } */
+
+
+/* Similarly for enums.  */
+enum baz {
+  BAZ_FIRST,
+  BAZ_SECOND
+};
+baz value; /* { dg-error "1: unknown type name .baz.; use .enum. keyword to refer to the type" } */
+/* { dg-begin-multiline-output "" }
+ baz value;
+ ^~~
+ enum
+   { dg-end-multiline-output "" } */
+
+/* TODO: it would be better to detect the "singed" vs "signed" typo here.  */
+singed char ch; /* { dg-error "8: before .char." } */
+/* { dg-begin-multiline-output "" }
+ singed char ch;
+        ^~~~
+   { dg-end-multiline-output "" } */
-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 3/4] spellcheck.h: add best_match template; implement early-reject
  2016-06-14 14:49 [PATCH 1/4] Introduce gcc_rich_location::add_fixit_misspelled_id David Malcolm
@ 2016-06-14 14:49 ` David Malcolm
  2016-06-14 21:06   ` Jeff Law
  2016-06-14 14:49 ` [PATCH 4/4] C FE: suggest corrections for misspelled identifiers and type names David Malcolm
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: David Malcolm @ 2016-06-14 14:49 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

There's a lot of repetition between find_closest_string and
find_closest_identifier, and the next patch adds more, so this
patch moves the logic into a new template class "best_match"
for locating the closest string from a sequence of candidates.

The patch also introduces a pair of early-reject optimizations
that weren't present in the older implementations, reducing the
number of calls to levenshtein_distance.
Introducing class best_match allows for these optimizations to be
in one place (best_match::consider), rather than having to implement
them twice.

Successfully bootstrapped&regrtested in combination with the rest of
the kit on x86_64-pc-linux-gnu
Successful -fself-test of stage1 on powerpc-ibm-aix7.1.3.0 of
just this patch (on top of patches 1 and 2).

OK for trunk if it passes individual bootstrap&regrtest?

gcc/c/ChangeLog:
	* c-typeck.c: Include spellcheck-tree.h rather than spellcheck.h.

gcc/cp/ChangeLog:
	* search.c: Include spellcheck-tree.h rather than spellcheck.h.

gcc/ChangeLog:
	* spellcheck-tree.c: Include spellcheck-tree.h rather than
	spellcheck.h.
	(find_closest_identifier): Reimplement in terms of
	best_match<tree,tree>.
	* spellcheck-tree.h: New file.
	* spellcheck.c (struct edit_distance_traits<const char *>): New
	struct.
	(find_closest_string): Reimplement in terms of
	best_match<const char *, const char *>.
	* spellcheck.h (levenshtein_distance): Move prototype of tree-based
	overload to spellcheck-tree.h.
	(find_closest_identifier): Likewise.
	(struct edit_distance_traits<T>): New template.
	(class best_match): New class.
---
 gcc/c/c-typeck.c      |   2 +-
 gcc/cp/search.c       |   2 +-
 gcc/spellcheck-tree.c |  24 ++---------
 gcc/spellcheck-tree.h |  51 +++++++++++++++++++++++
 gcc/spellcheck.c      |  42 +++++++++----------
 gcc/spellcheck.h      | 110 +++++++++++++++++++++++++++++++++++++++++++++++---
 6 files changed, 183 insertions(+), 48 deletions(-)
 create mode 100644 gcc/spellcheck-tree.h

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index f987508..f03c07b 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -47,7 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-family/c-ubsan.h"
 #include "cilk.h"
 #include "gomp-constants.h"
-#include "spellcheck.h"
+#include "spellcheck-tree.h"
 #include "gcc-rich-location.h"
 
 /* Possible cases of implicit bad conversions.  Used to select
diff --git a/gcc/cp/search.c b/gcc/cp/search.c
index f47833f..990c3fe 100644
--- a/gcc/cp/search.c
+++ b/gcc/cp/search.c
@@ -27,7 +27,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cp-tree.h"
 #include "intl.h"
 #include "toplev.h"
-#include "spellcheck.h"
+#include "spellcheck-tree.h"
 
 static int is_subobject_of_p (tree, tree);
 static tree dfs_lookup_base (tree, void *);
diff --git a/gcc/spellcheck-tree.c b/gcc/spellcheck-tree.c
index 2d73b774..63fb1a8 100644
--- a/gcc/spellcheck-tree.c
+++ b/gcc/spellcheck-tree.c
@@ -22,7 +22,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "tm.h"
 #include "tree.h"
-#include "spellcheck.h"
+#include "spellcheck-tree.h"
 #include "selftest.h"
 #include "stringpool.h"
 
@@ -53,32 +53,16 @@ find_closest_identifier (tree target, const auto_vec<tree> *candidates)
 {
   gcc_assert (TREE_CODE (target) == IDENTIFIER_NODE);
 
+  best_match<tree, tree> bm (target);
   int i;
   tree identifier;
-  tree best_identifier = NULL_TREE;
-  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
   FOR_EACH_VEC_ELT (*candidates, i, identifier)
     {
       gcc_assert (TREE_CODE (identifier) == IDENTIFIER_NODE);
-      edit_distance_t dist = levenshtein_distance (target, identifier);
-      if (dist < best_distance)
-	{
-	  best_distance = dist;
-	  best_identifier = identifier;
-	}
+      bm.consider (identifier);
     }
 
-  /* If more than half of the letters were misspelled, the suggestion is
-     likely to be meaningless.  */
-  if (best_identifier)
-    {
-      unsigned int cutoff = MAX (IDENTIFIER_LENGTH (target),
-				 IDENTIFIER_LENGTH (best_identifier)) / 2;
-      if (best_distance > cutoff)
-	return NULL_TREE;
-    }
-
-  return best_identifier;
+  return bm.get_best_meaningful_candidate ();
 }
 
 #if CHECKING_P
diff --git a/gcc/spellcheck-tree.h b/gcc/spellcheck-tree.h
new file mode 100644
index 0000000..0d5e253
--- /dev/null
+++ b/gcc/spellcheck-tree.h
@@ -0,0 +1,51 @@
+/* Find near-matches for identifiers.
+   Copyright (C) 2015-2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_SPELLCHECK_TREE_H
+#define GCC_SPELLCHECK_TREE_H
+
+#include "spellcheck.h"
+
+/* spellcheck-tree.c  */
+
+extern edit_distance_t
+levenshtein_distance (tree ident_s, tree ident_t);
+
+extern tree
+find_closest_identifier (tree target, const auto_vec<tree> *candidates);
+
+/* Specialization of edit_distance_traits for identifiers.  */
+
+template <>
+struct edit_distance_traits<tree>
+{
+  static size_t get_length (tree id)
+  {
+    gcc_assert (TREE_CODE (id) == IDENTIFIER_NODE);
+    return IDENTIFIER_LENGTH (id);
+  }
+
+  static const char *get_string (tree id)
+  {
+    gcc_assert (TREE_CODE (id) == IDENTIFIER_NODE);
+    return IDENTIFIER_POINTER (id);
+  }
+};
+
+#endif  /* GCC_SPELLCHECK_TREE_H  */
diff --git a/gcc/spellcheck.c b/gcc/spellcheck.c
index e03f484..2648f3a 100644
--- a/gcc/spellcheck.c
+++ b/gcc/spellcheck.c
@@ -121,6 +121,24 @@ levenshtein_distance (const char *s, const char *t)
   return levenshtein_distance (s, strlen (s), t, strlen (t));
 }
 
+/* Specialization of edit_distance_traits for C-style strings.  */
+
+template <>
+struct edit_distance_traits<const char *>
+{
+  static size_t get_length (const char *str)
+  {
+    gcc_assert (str);
+    return strlen (str);
+  }
+
+  static const char *get_string (const char *str)
+  {
+    gcc_assert (str);
+    return str;
+  }
+};
+
 /* Given TARGET, a non-NULL string, and CANDIDATES, a non-NULL ptr to
    an autovec of non-NULL strings, determine which element within
    CANDIDATES has the lowest edit distance to TARGET.  If there are
@@ -139,32 +157,14 @@ find_closest_string (const char *target,
 
   int i;
   const char *candidate;
-  const char *best_candidate = NULL;
-  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
-  size_t len_target = strlen (target);
+  best_match<const char *, const char *> bm (target);
   FOR_EACH_VEC_ELT (*candidates, i, candidate)
     {
       gcc_assert (candidate);
-      edit_distance_t dist
-	= levenshtein_distance (target, len_target,
-				candidate, strlen (candidate));
-      if (dist < best_distance)
-	{
-	  best_distance = dist;
-	  best_candidate = candidate;
-	}
-    }
-
-  /* If more than half of the letters were misspelled, the suggestion is
-     likely to be meaningless.  */
-  if (best_candidate)
-    {
-      unsigned int cutoff = MAX (len_target, strlen (best_candidate)) / 2;
-      if (best_distance > cutoff)
-	return NULL;
+      bm.consider (candidate);
     }
 
-  return best_candidate;
+  return bm.get_best_meaningful_candidate ();
 }
 
 #if CHECKING_P
diff --git a/gcc/spellcheck.h b/gcc/spellcheck.h
index 040c33e..7379399 100644
--- a/gcc/spellcheck.h
+++ b/gcc/spellcheck.h
@@ -35,12 +35,112 @@ extern const char *
 find_closest_string (const char *target,
 		     const auto_vec<const char *> *candidates);
 
-/* spellcheck-tree.c  */
+/* A traits class for describing a string-like type usable by
+   class best_match.
+   Specializations should provide the implementations of the following:
 
-extern edit_distance_t
-levenshtein_distance (tree ident_s, tree ident_t);
+     static size_t get_length (TYPE);
+     static const char *get_string (TYPE);
+
+   get_string should return a non-NULL ptr, which does not need to be
+   0-terminated.  */
+
+template <typename TYPE>
+struct edit_distance_traits {};
+
+/* A type for use when determining the best match against a string,
+   expressed as a template so that we can match against various
+   string-like types (const char *, frontend identifiers, and preprocessor
+   macros).
+
+   This type accumulates the best possible match against GOAL_TYPE for
+   a sequence of elements of CANDIDATE_TYPE, whilst minimizing the
+   number of calls to levenshtein_distance and to
+   edit_distance_traits<T>::get_length.  */
+
+template <typename GOAL_TYPE, typename CANDIDATE_TYPE>
+class best_match
+{
+ public:
+  typedef GOAL_TYPE goal_t;
+  typedef CANDIDATE_TYPE candidate_t;
+  typedef edit_distance_traits<goal_t> goal_traits;
+  typedef edit_distance_traits<candidate_t> candidate_traits;
+
+  /* Constructor.  */
+
+  best_match (goal_t goal)
+  : m_goal (goal_traits::get_string (goal)),
+    m_goal_len (goal_traits::get_length (goal)),
+    m_best_candidate (NULL),
+    m_best_distance (MAX_EDIT_DISTANCE)
+  {}
+
+  /* Compare the edit distance between CANDIDATE and m_goal,
+     and if it's the best so far, record it.  */
+
+  void consider (candidate_t candidate)
+  {
+    size_t candidate_len = candidate_traits::get_length (candidate);
+
+    /* Calculate a lower bound on the candidate's distance to the goal,
+       based on the difference in lengths; it will require at least
+       this many insertions/deletions.  */
+    edit_distance_t min_candidate_distance
+      = abs ((ssize_t)candidate_len - (ssize_t)m_goal_len);
+
+    /* If the candidate's length is sufficiently different to that
+       of the goal string, then the number of insertions/deletions
+       may be >= the best distance so far.  If so, we can reject
+       the candidate immediately without needing to compute
+       the exact distance, since it won't be an improvement.  */
+    if (min_candidate_distance >= m_best_distance)
+      return;
+
+    /* If the candidate will be unable to beat the criterion in
+       get_best_meaningful_candidate, reject it without computing
+       the exact distance.  */
+    unsigned int cutoff = MAX (m_goal_len, candidate_len) / 2;
+    if (min_candidate_distance > cutoff)
+      return;
+
+    /* Otherwise, compute the distance and see if the candidate
+       has beaten the previous best value.  */
+    edit_distance_t dist
+      = levenshtein_distance (m_goal, m_goal_len,
+			      candidate_traits::get_string (candidate),
+			      candidate_len);
+    if (dist < m_best_distance)
+      {
+	m_best_distance = dist;
+	m_best_candidate = candidate;
+	m_best_candidate_len = candidate_len;
+      }
+  }
+
+  /* Get the best candidate so far, but applying a filter to ensure
+     that we return NULL if none of the candidates are close to the goal,
+     to avoid offering nonsensical suggestions to the user.  */
+
+  candidate_t get_best_meaningful_candidate () const
+  {
+    /* If more than half of the letters were misspelled, the suggestion is
+       likely to be meaningless.  */
+    if (m_best_candidate)
+      {
+	unsigned int cutoff = MAX (m_goal_len, m_best_candidate_len) / 2;
+	if (m_best_distance > cutoff)
+	  return NULL;
+    }
+    return m_best_candidate;
+  }
 
-extern tree
-find_closest_identifier (tree target, const auto_vec<tree> *candidates);
+ private:
+  const char *m_goal;
+  size_t m_goal_len;
+  candidate_t m_best_candidate;
+  edit_distance_t m_best_distance;
+  size_t m_best_candidate_len;
+};
 
 #endif  /* GCC_SPELLCHECK_H  */
-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/4] Introduce gcc_rich_location::add_fixit_misspelled_id
@ 2016-06-14 14:49 David Malcolm
  2016-06-14 14:49 ` [PATCH 3/4] spellcheck.h: add best_match template; implement early-reject David Malcolm
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: David Malcolm @ 2016-06-14 14:49 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

There's a fair amount of repetition in the code to emit
fixits for misspelled identifiers, and I plan to add more
such fixits, so this patch moves it to a helper method.

Successfully bootstrapped&regrtested in combination with the rest of
the kit on x86_64-pc-linux-gnu
Successful -fself-test of stage1 on powerpc-ibm-aix7.1.3.0 of
just this patch by itself.

OK for trunk if it passes individual bootstrap&regrtest?

gcc/c/ChangeLog:
	* c-typeck.c (build_component_ref): Simplify fixit code by
	using gcc_rich_location::add_fixit_misspelled_id.
	(set_init_label): Likewise.

gcc/cp/ChangeLog:
	* typeck.c: Include "gcc-rich-location.h".
	(finish_class_member_access_expr): Simplify fixit code by
	using gcc_rich_location::add_fixit_misspelled_id.

gcc/ChangeLog:
	* gcc-rich-location.c
	(gcc_rich_location::add_fixit_misspelled_id): New method.
	* gcc-rich-location.h
	(gcc_rich_location::add_fixit_misspelled_id): Add decl.
---
 gcc/c/c-typeck.c        | 16 ++++------------
 gcc/cp/typeck.c         | 10 ++++------
 gcc/gcc-rich-location.c | 14 ++++++++++++++
 gcc/gcc-rich-location.h |  3 +++
 4 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index ea04d5e..f987508 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -2371,14 +2371,9 @@ build_component_ref (location_t loc, tree datum, tree component,
 		 we have a valid range for the component.  */
 	      location_t reported_loc
 		= (component_loc != UNKNOWN_LOCATION) ? component_loc : loc;
-	      rich_location rich_loc (line_table, reported_loc);
+	      gcc_rich_location rich_loc (reported_loc);
 	      if (component_loc != UNKNOWN_LOCATION)
-		{
-		  source_range component_range =
-		    get_range_from_loc (line_table, component_loc);
-		  rich_loc.add_fixit_replace (component_range,
-					      IDENTIFIER_POINTER (guessed_id));
-		}
+		rich_loc.add_fixit_misspelled_id (component_loc, guessed_id);
 	      error_at_rich_loc
 		(&rich_loc,
 		 "%qT has no member named %qE; did you mean %qE?",
@@ -8234,11 +8229,8 @@ set_init_label (location_t loc, tree fieldname, location_t fieldname_loc,
       tree guessed_id = lookup_field_fuzzy (constructor_type, fieldname);
       if (guessed_id)
 	{
-	  rich_location rich_loc (line_table, fieldname_loc);
-	  source_range component_range =
-	    get_range_from_loc (line_table, fieldname_loc);
-	  rich_loc.add_fixit_replace (component_range,
-				      IDENTIFIER_POINTER (guessed_id));
+	  gcc_rich_location rich_loc (fieldname_loc);
+	  rich_loc.add_fixit_misspelled_id (fieldname_loc, guessed_id);
 	  error_at_rich_loc
 	    (&rich_loc,
 	     "%qT has no member named %qE; did you mean %qE?",
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index f68c2a3..2ccd2da 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -36,6 +36,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-family/c-objc.h"
 #include "c-family/c-ubsan.h"
 #include "params.h"
+#include "gcc-rich-location.h"
 
 static tree cp_build_addr_expr_strict (tree, tsubst_flags_t);
 static tree cp_build_function_call (tree, tree, tsubst_flags_t);
@@ -2831,12 +2832,9 @@ finish_class_member_access_expr (cp_expr object, tree name, bool template_p,
 		  if (guessed_id)
 		    {
 		      location_t bogus_component_loc = input_location;
-		      rich_location rich_loc (line_table, bogus_component_loc);
-		      source_range bogus_component_range =
-			get_range_from_loc (line_table, bogus_component_loc);
-		      rich_loc.add_fixit_replace
-			(bogus_component_range,
-			 IDENTIFIER_POINTER (guessed_id));
+		      gcc_rich_location rich_loc (bogus_component_loc);
+		      rich_loc.add_fixit_misspelled_id (bogus_component_loc,
+							guessed_id);
 		      error_at_rich_loc
 			(&rich_loc,
 			 "%q#T has no member named %qE; did you mean %qE?",
diff --git a/gcc/gcc-rich-location.c b/gcc/gcc-rich-location.c
index a03ce0e..15c0700 100644
--- a/gcc/gcc-rich-location.c
+++ b/gcc/gcc-rich-location.c
@@ -60,3 +60,17 @@ gcc_rich_location::maybe_add_expr (tree t)
   if (EXPR_P (t))
     add_expr (t);
 }
+
+/* Add a fixit hint suggesting replacing the range at MISSPELLED_TOKEN_LOC
+   with the identifier HINT_ID.  */
+
+void
+gcc_rich_location::add_fixit_misspelled_id (location_t misspelled_token_loc,
+					    tree hint_id)
+{
+  gcc_assert (TREE_CODE (hint_id) == IDENTIFIER_NODE);
+
+  source_range misspelled_token_range
+    = get_range_from_loc (line_table, misspelled_token_loc);
+  add_fixit_replace (misspelled_token_range, IDENTIFIER_POINTER (hint_id));
+}
diff --git a/gcc/gcc-rich-location.h b/gcc/gcc-rich-location.h
index fd332de..9c8a790 100644
--- a/gcc/gcc-rich-location.h
+++ b/gcc/gcc-rich-location.h
@@ -42,6 +42,9 @@ class gcc_rich_location : public rich_location
 
   void
   maybe_add_expr (tree t);
+
+  void add_fixit_misspelled_id (location_t misspelled_token_loc,
+				tree hint_id);
 };
 
 #endif /* GCC_RICH_LOCATION_H */
-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/4] Introduce gcc_rich_location::add_fixit_misspelled_id
  2016-06-14 14:49 [PATCH 1/4] Introduce gcc_rich_location::add_fixit_misspelled_id David Malcolm
                   ` (2 preceding siblings ...)
  2016-06-14 14:49 ` [PATCH 2/4] Add more spellcheck selftests David Malcolm
@ 2016-06-14 20:58 ` Jeff Law
  3 siblings, 0 replies; 8+ messages in thread
From: Jeff Law @ 2016-06-14 20:58 UTC (permalink / raw)
  To: David Malcolm, gcc-patches

On 06/14/2016 09:15 AM, David Malcolm wrote:
> There's a fair amount of repetition in the code to emit
> fixits for misspelled identifiers, and I plan to add more
> such fixits, so this patch moves it to a helper method.
>
> Successfully bootstrapped&regrtested in combination with the rest of
> the kit on x86_64-pc-linux-gnu
> Successful -fself-test of stage1 on powerpc-ibm-aix7.1.3.0 of
> just this patch by itself.
>
> OK for trunk if it passes individual bootstrap&regrtest?
>
> gcc/c/ChangeLog:
> 	* c-typeck.c (build_component_ref): Simplify fixit code by
> 	using gcc_rich_location::add_fixit_misspelled_id.
> 	(set_init_label): Likewise.
>
> gcc/cp/ChangeLog:
> 	* typeck.c: Include "gcc-rich-location.h".
> 	(finish_class_member_access_expr): Simplify fixit code by
> 	using gcc_rich_location::add_fixit_misspelled_id.
>
> gcc/ChangeLog:
> 	* gcc-rich-location.c
> 	(gcc_rich_location::add_fixit_misspelled_id): New method.
> 	* gcc-rich-location.h
> 	(gcc_rich_location::add_fixit_misspelled_id): Add decl.
OK.
jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/4] Add more spellcheck selftests
  2016-06-14 14:49 ` [PATCH 2/4] Add more spellcheck selftests David Malcolm
@ 2016-06-14 20:59   ` Jeff Law
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff Law @ 2016-06-14 20:59 UTC (permalink / raw)
  To: David Malcolm, gcc-patches

On 06/14/2016 09:15 AM, David Malcolm wrote:
> The next patch in the kit reimplements find_closest_string and
> find_closest_identifier, so it seems prudent to add some more
> test coverage for these.  This patch also adds some more test
> coverage for levenshtein_distance itself.
>
> Successfully bootstrapped&regrtested in combination with the rest of
> the kit on x86_64-pc-linux-gnu
> Successful -fself-test of stage1 on powerpc-ibm-aix7.1.3.0 of
> just this patch (on top of patch 1).
>
> OK for trunk if it passes individual bootstrap&regrtest?
>
> gcc/ChangeLog:
> 	* selftest-run-tests.c (selftest::run_tests): Call
> 	selftest::spellcheck_tree_c_tests.
> 	* selftest.h (selftest::spellcheck_tree_c_tests): New decl.
> 	* spellcheck-tree.c: Include selftest.h and stringpool.h.
> 	(selftest::test_find_closest_identifier): New function.
> 	(selftest::spellcheck_tree_c_tests): New function.
> 	* spellcheck.c (selftest::test_find_closest_string): Verify that
> 	the order of the vec does not affect the results for this case.
> 	(selftest::test_data): New array.
> 	(selftest::test_metric_conditions): New function.
> 	(selftest::spellcheck_c_tests): Add a test of case-comparison.
> 	Call selftest::test_metric_conditions.
OK.
jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/4] spellcheck.h: add best_match template; implement early-reject
  2016-06-14 14:49 ` [PATCH 3/4] spellcheck.h: add best_match template; implement early-reject David Malcolm
@ 2016-06-14 21:06   ` Jeff Law
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff Law @ 2016-06-14 21:06 UTC (permalink / raw)
  To: David Malcolm, gcc-patches

On 06/14/2016 09:15 AM, David Malcolm wrote:
> There's a lot of repetition between find_closest_string and
> find_closest_identifier, and the next patch adds more, so this
> patch moves the logic into a new template class "best_match"
> for locating the closest string from a sequence of candidates.
>
> The patch also introduces a pair of early-reject optimizations
> that weren't present in the older implementations, reducing the
> number of calls to levenshtein_distance.
> Introducing class best_match allows for these optimizations to be
> in one place (best_match::consider), rather than having to implement
> them twice.
>
> Successfully bootstrapped&regrtested in combination with the rest of
> the kit on x86_64-pc-linux-gnu
> Successful -fself-test of stage1 on powerpc-ibm-aix7.1.3.0 of
> just this patch (on top of patches 1 and 2).
>
> OK for trunk if it passes individual bootstrap&regrtest?
>
> gcc/c/ChangeLog:
> 	* c-typeck.c: Include spellcheck-tree.h rather than spellcheck.h.
>
> gcc/cp/ChangeLog:
> 	* search.c: Include spellcheck-tree.h rather than spellcheck.h.
>
> gcc/ChangeLog:
> 	* spellcheck-tree.c: Include spellcheck-tree.h rather than
> 	spellcheck.h.
> 	(find_closest_identifier): Reimplement in terms of
> 	best_match<tree,tree>.
> 	* spellcheck-tree.h: New file.
> 	* spellcheck.c (struct edit_distance_traits<const char *>): New
> 	struct.
> 	(find_closest_string): Reimplement in terms of
> 	best_match<const char *, const char *>.
> 	* spellcheck.h (levenshtein_distance): Move prototype of tree-based
> 	overload to spellcheck-tree.h.
> 	(find_closest_identifier): Likewise.
> 	(struct edit_distance_traits<T>): New template.
> 	(class best_match): New class.
OK.
jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 4/4] C FE: suggest corrections for misspelled identifiers and type names
  2016-06-14 14:49 ` [PATCH 4/4] C FE: suggest corrections for misspelled identifiers and type names David Malcolm
@ 2016-06-21 21:41   ` Jeff Law
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff Law @ 2016-06-21 21:41 UTC (permalink / raw)
  To: David Malcolm, gcc-patches

On 06/14/2016 09:15 AM, David Malcolm wrote:
> This patch introduces a new lookup_name_fuzzy function to the
> C frontend and uses it to provides suggestions for various error
> messages that may be due to misspellings, and also for the
> warnings in implicit_decl_warning.
>
> This latter part may be controversial.  So far, we've only provided
> spelling suggestions during error-handling, and I think there's
> a strong case for spending the extra cycles to give a good error
> message given that the compilation is going to fail.  For a *warning*,
> this tradeoff may not be so clear.  In my experience, the
> "implicit declaration of function" warning usually means that
> either the compile or link is going to fail, so it may be that these
> cases are usually effectively errors (hence the suggestions in this
> patch).
>
> Alternatively, the call to lookup_name_fuzzy could be somehow guarded
> so that we don't do work upfront to generate the suggestion
> if the warning is going to be discarded internally within diagnostics.c.
> (if the user has supplied -Wno-implicit-function-declaration, they're
> presumably working with code that uses implicit function decls, I
> suppose).
>
> Amongst other things, builtins get offered as suggestions (hence the
> suggestion of "nanl" for "name" in one existing test case, which I
> initially found surprising).
>
> The patch includes PR c/70339: "singed" vs "signed" hints, looking
> at reserved words that could start typenames.
>
> The patch also considers preprocessor macros when considering
> spelling candidates: if the user misspells a macro name, this is
> typically seen by the frontend as an unrecognized identifier, so
> we can offer suggestions like this:
>
> spellcheck-identifiers.c: In function 'test_4':
> spellcheck-identifiers.c:64:10: warning: implicit declaration of
> function 'IDENTIFIER_PTR'; did you mean 'IDENTIFIER_POINTER'?
> [-Wimplicit-function-declaration]
>    return IDENTIFIER_PTR (node);
>           ^~~~~~~~~~~~~~
>           IDENTIFIER_POINTER
>
> Doing so uses the "best_match" class added in a prior patch, merging
> in results between C scopes and the preprocessor, using the
> optimizations there to minimize the work done.  Merging these
> results required some minor surgery to class best_match.
>
> In c-c++-common/attributes-1.c, the error message for:
>   void* my_calloc(unsigned, unsigned) __attribute__((alloc_size(1,bar))); /* { dg-warning "outside range" } */
> gains a suggestion of "carg", becoming:
>   attributes-1.c:4:65: error: 'bar' undeclared here (not in a function); did you mean 'carg'?
>   attributes-1.c:4:1: warning: alloc_size parameter outside range [-Wattributes]
> This is an unhelpful suggestion, given that alloc_size expects an integer
> value identifying a parameter, which the builtin
>   double carg (double complex z);
> is not.  It's not clear to me what the best way to fix this is:
> if I'm reading things right, c_parser_attributes parses expressions
> using c_parser_expr_list without regard to which attribute it's
> handling, so there's (currently) no way to "tune" the attribute parser
> based on the attribute (and I don't know if that's a complexity we
> want to take on).
>
> Successfully bootstrapped&regrtested in combination with the rest of
> the kit on x86_64-pc-linux-gnu
> Successful -fself-test of stage1 on powerpc-ibm-aix7.1.3.0 (in
> combination with the rest of the kit).
>
> OK for trunk?
>
> gcc/c-family/ChangeLog:
> 	PR c/70339
> 	* c-common.h (enum lookup_name_fuzzy_kind): New enum.
> 	(lookup_name_fuzzy): New prototype.
>
> gcc/c/ChangeLog:
> 	PR c/70339
> 	* c-decl.c: Include spellcheck-tree.h and gcc-rich-location.h.
> 	(implicit_decl_warning): When issuing warnings for implicit
> 	declarations, attempt to provide a suggestion via
> 	lookup_name_fuzzy.
> 	(undeclared_variable): Likewise when issuing errors.
> 	(lookup_name_in_scope): Likewise.
> 	(struct edit_distance_traits<cpp_hashnode *>): New struct.
> 	(best_macro_match): New typedef.
> 	(find_closest_macro_cpp_cb): New function.
> 	(lookup_name_fuzzy): New function.
> 	* c-parser.c: Include gcc-rich-location.h.
> 	(c_token_starts_typename): Split out case CPP_KEYWORD into...
> 	(c_keyword_starts_typename): ...this new function.
> 	(c_parser_declaration_or_fndef): When issuing errors about
> 	missing "struct" etc, add a fixit.  For other kinds of errors,
> 	attempt to provide a suggestion via lookup_name_fuzzy.
> 	(c_parser_parms_declarator): When looking ahead to detect typos in
> 	type names, also reject CPP_KEYWORD.
> 	(c_parser_parameter_declaration): When issuing errors about
> 	unknown type names, attempt to provide a suggestion via
> 	lookup_name_fuzzy.
> 	* c-tree.h (c_keyword_starts_typename): New prototype.
>
> gcc/ChangeLog:
> 	PR c/70339
> 	* diagnostic-core.h (pedwarn_at_rich_loc): New prototype.
> 	* diagnostic.c (pedwarn_at_rich_loc): New function.
> 	* spellcheck.h (best_match::best_match): Add a
> 	"best_distance_so_far" optional parameter.
> 	(best_match::set_best_so_far): New method.
> 	(best_match::get_best_distance): New accessor.
> 	(best_match::get_best_candidate_length): New accessor.
>
> gcc/testsuite/ChangeLog:
> 	PR c/70339
> 	* c-c++-common/attributes-1.c: Update dg-prune-output to include
> 	hint.
> 	* gcc.dg/diagnostic-token-ranges.c (undeclared_identifier): Update
> 	expected results due to builtin "nanl" now being suggested for
> 	"name".
> 	* gcc.dg/pr67580.c: Update expected messages.
> 	* gcc.dg/spellcheck-identifiers.c: New testcase.
> 	* gcc.dg/spellcheck-typenames.c: New testcase.
I think this is good and isn't terribly controversial in the modern era. 
  I suspect things would be different had you proposed something like 
this 15+ years ago when implicit declarations were common.

I wouldn't be terribly surprised if we get some flak if/when folks try 
to build old codebases with gcc-7.  So based on feedback in the spring 
as we approach the release, we may need to revisit whether or not to 
guard this somehow to avoid noise on old codebases.

jeff

Jeff

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-06-21 21:41 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-14 14:49 [PATCH 1/4] Introduce gcc_rich_location::add_fixit_misspelled_id David Malcolm
2016-06-14 14:49 ` [PATCH 3/4] spellcheck.h: add best_match template; implement early-reject David Malcolm
2016-06-14 21:06   ` Jeff Law
2016-06-14 14:49 ` [PATCH 4/4] C FE: suggest corrections for misspelled identifiers and type names David Malcolm
2016-06-21 21:41   ` Jeff Law
2016-06-14 14:49 ` [PATCH 2/4] Add more spellcheck selftests David Malcolm
2016-06-14 20:59   ` Jeff Law
2016-06-14 20:58 ` [PATCH 1/4] Introduce gcc_rich_location::add_fixit_misspelled_id Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).