[PATCH 00/22] RFC: Overhaul of diagnostics

public inbox for gcc-patches@gcc.gnu.org

* [PATCH 00/22] RFC: Overhaul of diagnostics
@ 2015-09-10 20:12 David Malcolm
  2015-09-10 20:12 ` [PATCH 01/22] Change of location_get_source_line signature David Malcolm
                   ` (22 more replies)
  0 siblings, 23 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

This is a followup to the ideas posted at:
  https://gcc.gnu.org/ml/gcc-patches/2015-03/msg00837.html

I've been experimenting with capturing and printing richer
information for our diagnostics, underlining pertinent source ranges
when printing them, and potentially providing "fix-it" hints.

Attached is a work-in-progress patch kit implementing these ideas.
I'm posting it now to get feedback: some parts of it may be ready to
commit, but other parts are definitely *not* ready yet.

Some screenshots:
 https://dmalcolm.fedorapeople.org/gcc/2015-09-04/ranges-in-format-string-diagnostics.html
 https://dmalcolm.fedorapeople.org/gcc/2015-09-10/spellcheck-with-fixits.html
(other screenshots can be seen in the individual patches).

Overview of the patches:

Patches 1-3:
    01: Change of location_get_source_line signature
    02: Testsuite: add dg-{begin|end}-multiline-output commands
    03: Move diagnostic_show_locus and friends out into a new source file
These patches are a preamble, setting things up for what's to come.

Patches 4-5:
    04: Reimplement diagnostic_show_locus, introducing rich_location classes
    05: Add overloads of inform, warning_at, etc that take a source_range
These patches introduce a "rich_location" class capable of storing
multiple ranges, and use it internally throughout the diagnostic
implementation, together with the code for printing such information.
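
To make the "multiple ranges" idea concrete, here is a minimal sketch of
how several ranges on one source line might be rendered as an annotation
line.  It is not the patch kit's implementation: the names demo_range and
demo_build_underline are hypothetical, and the choice of '~' for range
extents and '^' for carets is an assumption made purely for illustration.

```c
#include <string.h>

/* Hypothetical stand-in for one range on a single source line.
   Columns are 1-based and inclusive.  This is NOT the rich_location
   class from the patch kit, just an illustration of the idea.  */
struct demo_range
{
  int start_col;
  int finish_col;
  int caret_col;   /* 0 if this range has no caret.  */
};

/* Write an annotation line covering LINE_LEN columns into BUF, which
   must have room for LINE_LEN + 1 bytes.  Each range is underlined
   with '~' and each caret is drawn as '^' (assumed characters).
   Trailing spaces are trimmed.  */
void
demo_build_underline (char *buf, int line_len,
                      const struct demo_range *ranges, int num_ranges)
{
  memset (buf, ' ', (size_t) line_len);
  buf[line_len] = '\0';

  /* First pass: underline every range.  */
  for (int i = 0; i < num_ranges; i++)
    for (int col = ranges[i].start_col; col <= ranges[i].finish_col; col++)
      if (col >= 1 && col <= line_len)
        buf[col - 1] = '~';

  /* Second pass: draw carets last so no underline overwrites them.  */
  for (int i = 0; i < num_ranges; i++)
    {
      int caret = ranges[i].caret_col;
      if (caret >= 1 && caret <= line_len)
        buf[caret - 1] = '^';
    }

  /* Trim trailing spaces.  */
  int end = line_len;
  while (end > 0 && buf[end - 1] == ' ')
    end--;
  buf[end] = '\0';
}
```

For the line "x = foo + bar;", passing ranges for "foo" (columns 5-7),
"+" (column 9, with a caret) and "bar" (columns 11-13) yields an
annotation line with underlines beneath both operands and a caret
beneath the operator.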

These patches and the ones leading up to them survive bootstrap and
regression testing; after this point things start to get more
"experimental".

Patch 6:
    06: PR/62314: add ability to add fixit-hints
This adds some machinery to allow diagnostics to supply "fix-it" hints
to the user.

Patches 7-11:
    07: Implement token range tracking within libcpp and C/C++ FEs
    08: C frontend: use token ranges in various diagnostics
    09: C frontend: store and use token ranges in c_declspecs
    10: C++ FE: Use token ranges for various diagnostics
    11: Objective C: c/c-parser.c: use token ranges in two places
These patches capture source range information for *tokens* in
libcpp, C and C++, and start using them in place of
mere locations for various diagnostics, so that we get underlines.

Patches 12-15:
    12: Add source-ranges for trees
    13: gcc-rich-location.[ch]: add methods for working with tree ranges
    14: C: capture tree ranges for various expressions
    15: Add plugin to recursively dump the source-ranges in a tree
These patches capture source range information
for *tree expressions*.  The representation I'm using both enlarges
core tree structures *and* requires lots of extra tree nodes, so it's
clearly going to be unacceptable as-is; my aim here is to use this as
a starting point for trying to optimize the implementation (e.g. to
gather stats on real-world code).

Patch 16:
    16: C/C++ frontend: use tree ranges in various diagnostics
This patch uses the tree expression source range information
gathered above in various diagnostics, so that we get more
underlines.  This is probably just scratching the surface in terms
of the diagnostics we could improve.

Patches 17-20:
    17: libcpp: add location tracking within string literals
    18: Track locations within string literals in tree_string
    19: gcc-rich-location.[ch]: add debug methods for cpp_string_location
    20: Use rich locations in c-family/c-format.c
These patches extend string-literal parsing to capture per-character
source range information within string literals, then use it within
c-format.c to robustly print string ranges for printf-style format
string errors (even in the face of e.g. concatenation).  This
hasn't been optimized yet; I don't yet know the impact on
compile-time and memory (but I have ideas for optimizing it if
it's an issue).
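
As a rough illustration of why per-character tracking helps with
concatenation, the sketch below builds a table mapping each character of
a concatenated string back to the line and column it came from.  It is
only a sketch under stated assumptions: demo_loc, demo_literal and
demo_char_locations are hypothetical names, not code from these patches,
and it assumes the literals contain no escape sequences, so that every
character occupies exactly one source column.

```c
#include <stddef.h>

/* Hypothetical (line, column) pair; both are 1-based.  */
struct demo_loc
{
  int line;
  int column;
};

/* Hypothetical description of one string-literal token: its contents
   without the quotes, plus the position of its opening quote.  */
struct demo_literal
{
  const char *text;
  int line;
  int column;   /* Column of the opening '"'.  */
};

/* Record in LOCS the source location of each character of the
   concatenation of the NUM_LITS literals in LITS.  At most MAX_LOCS
   entries are written; the number written is returned.
   Assumption: no escape sequences, so character J of a literal sits
   J + 1 columns after that literal's opening quote.  */
size_t
demo_char_locations (const struct demo_literal *lits, size_t num_lits,
                     struct demo_loc *locs, size_t max_locs)
{
  size_t n = 0;
  for (size_t i = 0; i < num_lits; i++)
    for (size_t j = 0; lits[i].text[j] != '\0' && n < max_locs; j++)
      {
        locs[n].line = lits[i].line;
        locs[n].column = lits[i].column + 1 + (int) j;
        n++;
      }
  return n;
}
```

With such a table, a format-string checker that finds a bad directive at
index N of the concatenated string can look up entry N and report the
location within whichever literal the directive came from, rather than
just the start of the first literal.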

Patches 21-22:
    21: Use Levenshtein distance for various misspellings in C frontend
    22: Add fixit hints to spellchecker suggestions
These patches use Levenshtein distance to provide hints in
various places when the user misspells something,
and start adding the ability to add "fix-it" hints to diagnostics.
Some of this could be split out and applied independently of the
rich_location work.
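
For readers unfamiliar with the technique, here is a minimal sketch of
Levenshtein distance and of picking the closest candidate for a
misspelled name.  This is the textbook two-row dynamic-programming
formulation, not the code in spellcheck.c; demo_levenshtein and
demo_best_match are hypothetical names used only for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Return the Levenshtein distance between S and T: the minimum number
   of single-character insertions, deletions and substitutions needed
   to turn S into T.  Returns -1 if memory allocation fails.  */
int
demo_levenshtein (const char *s, const char *t)
{
  size_t len_s = strlen (s);
  size_t len_t = strlen (t);

  /* prev[j] is the distance between the first i-1 characters of S and
     the first j characters of T; curr[j] is the same for i.  */
  int *prev = malloc ((len_t + 1) * sizeof (int));
  int *curr = malloc ((len_t + 1) * sizeof (int));
  if (!prev || !curr)
    {
      free (prev);
      free (curr);
      return -1;
    }

  for (size_t j = 0; j <= len_t; j++)
    prev[j] = (int) j;

  for (size_t i = 1; i <= len_s; i++)
    {
      curr[0] = (int) i;
      for (size_t j = 1; j <= len_t; j++)
        {
          int cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
          int del = prev[j] + 1;
          int ins = curr[j - 1] + 1;
          int sub = prev[j - 1] + cost;
          int best = del < ins ? del : ins;
          curr[j] = best < sub ? best : sub;
        }
      int *tmp = prev;
      prev = curr;
      curr = tmp;
    }

  int result = prev[len_t];
  free (prev);
  free (curr);
  return result;
}

/* Return the entry of CANDIDATES (of length N) closest to WORD, or
   NULL if N is zero or a distance could not be computed.  Ties go to
   the earliest candidate.  */
const char *
demo_best_match (const char *word, const char *const *candidates, size_t n)
{
  const char *best = NULL;
  int best_dist = -1;
  for (size_t i = 0; i < n; i++)
    {
      int d = demo_levenshtein (word, candidates[i]);
      if (d >= 0 && (best_dist < 0 || d < best_dist))
        {
          best_dist = d;
          best = candidates[i];
        }
    }
  return best;
}
```

A real spellchecker would presumably also reject suggestions whose
distance is too large relative to the word length; this sketch omits
any such cutoff.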

Thoughts?

(the patches are on top of r227562)

 boehm-gc/testsuite/lib/boehm-gc.exp                |    1 +
 gcc/Makefile.in                                    |    5 +-
 gcc/box-drawing.c                                  |   99 ++
 gcc/box-drawing.h                                  |   43 +
 gcc/c-family/c-common.c                            |   37 +-
 gcc/c-family/c-common.h                            |    8 +-
 gcc/c-family/c-format.c                            |  117 +-
 gcc/c-family/c-indentation.c                       |   10 +-
 gcc/c-family/c-lex.c                               |   32 +-
 gcc/c-family/c-pragma.h                            |    4 +-
 gcc/c-family/c-pretty-print.c                      |    4 +
 gcc/c/c-convert.c                                  |   17 +-
 gcc/c/c-decl.c                                     |   58 +-
 gcc/c/c-errors.c                                   |   95 +-
 gcc/c/c-objc-common.c                              |    2 +-
 gcc/c/c-parser.c                                   |  172 ++-
 gcc/c/c-tree.h                                     |   26 +-
 gcc/c/c-typeck.c                                   |  273 +++--
 gcc/cp/error.c                                     |    5 +-
 gcc/cp/parser.c                                    |   54 +-
 gcc/cp/parser.h                                    |    2 +
 gcc/cp/typeck.c                                    |    3 +-
 gcc/diagnostic-color.c                             |    5 +-
 gcc/diagnostic-core.h                              |   15 +
 gcc/diagnostic-show-locus.c                        | 1116 ++++++++++++++++++++
 gcc/diagnostic.c                                   |  412 +++++---
 gcc/diagnostic.h                                   |   48 +-
 gcc/fortran/cpp.c                                  |   13 +-
 gcc/fortran/error.c                                |   43 +-
 gcc/gcc-rich-location.c                            |  235 +++++
 gcc/gcc-rich-location.h                            |   84 ++
 gcc/genmatch.c                                     |   27 +-
 gcc/gimplify.c                                     |    4 +
 gcc/input.c                                        |   21 +-
 gcc/input.h                                        |    2 +-
 gcc/intl.c                                         |    9 +
 gcc/objc/objc-act.c                                |    3 +-
 gcc/pretty-print.c                                 |   21 +
 gcc/pretty-print.h                                 |   25 +-
 gcc/print-tree.c                                   |   21 +
 gcc/rtl-error.c                                    |    3 +-
 gcc/spellcheck.c                                   |  126 +++
 gcc/spellcheck.h                                   |   32 +
 gcc/testsuite/g++.dg/diagnostic/token-ranges.C     |  104 ++
 gcc/testsuite/gcc.dg/diagnostic-token-ranges.c     |  135 +++
 gcc/testsuite/gcc.dg/diagnostic-tree-expr-ranges.c |  159 +++
 gcc/testsuite/gcc.dg/format/diagnostic-ranges.c    |  101 ++
 .../gcc.dg/plugin/diagnostic-test-expressions-1.c  |  562 ++++++++++
 .../plugin/diagnostic-test-show-locus-ascii-bw.c   |  157 +++
 .../diagnostic-test-show-locus-ascii-color.c       |   78 ++
 .../plugin/diagnostic-test-show-locus-utf-8-bw.c   |  101 ++
 .../diagnostic-test-show-locus-utf-8-color.c       |  105 ++
 .../gcc.dg/plugin/diagnostic-test-show-trees-1.c   |  106 ++
 .../plugin/diagnostic-test-string-literals-1.c     |  139 +++
 .../gcc.dg/plugin/diagnostic_plugin_show_trees.c   |  179 ++++
 .../plugin/diagnostic_plugin_test_show_locus.c     |  396 +++++++
 .../diagnostic_plugin_test_string_literals.c       |  215 ++++
 .../diagnostic_plugin_test_tree_expression_range.c |  162 +++
 gcc/testsuite/gcc.dg/plugin/plugin.exp             |   11 +
 gcc/testsuite/gcc.dg/spellcheck.c                  |   40 +
 gcc/testsuite/lib/gcc-dg.exp                       |    1 +
 gcc/testsuite/lib/multiline.exp                    |  241 +++++
 gcc/testsuite/lib/prune.exp                        |    5 +
 gcc/tree-core.h                                    |    5 +
 gcc/tree-diagnostic.c                              |    2 +-
 gcc/tree-pretty-print.c                            |    2 +-
 gcc/tree.c                                         |   66 +-
 gcc/tree.def                                       |    2 +
 gcc/tree.h                                         |   33 +
 libatomic/testsuite/lib/libatomic.exp              |    1 +
 libcpp/charset.c                                   |  357 ++++++-
 libcpp/directives.c                                |    4 +-
 libcpp/errors.c                                    |    7 +-
 libcpp/expr.c                                      |    2 +
 libcpp/include/cpplib.h                            |  144 ++-
 libcpp/include/line-map.h                          |  329 ++++++
 libcpp/internal.h                                  |    7 +-
 libcpp/lex.c                                       |   20 +-
 libcpp/line-map.c                                  |  334 ++++++
 libcpp/macro.c                                     |    1 +
 libgo/testsuite/lib/libgo.exp                      |    1 +
 libgomp/testsuite/lib/libgomp.exp                  |    1 +
 libitm/testsuite/lib/libitm.exp                    |    1 +
 libvtv/testsuite/lib/libvtv.exp                    |    1 +
 84 files changed, 7124 insertions(+), 525 deletions(-)
 create mode 100644 gcc/box-drawing.c
 create mode 100644 gcc/box-drawing.h
 create mode 100644 gcc/diagnostic-show-locus.c
 create mode 100644 gcc/gcc-rich-location.c
 create mode 100644 gcc/gcc-rich-location.h
 create mode 100644 gcc/spellcheck.c
 create mode 100644 gcc/spellcheck.h
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/token-ranges.C
 create mode 100644 gcc/testsuite/gcc.dg/diagnostic-token-ranges.c
 create mode 100644 gcc/testsuite/gcc.dg/diagnostic-tree-expr-ranges.c
 create mode 100644 gcc/testsuite/gcc.dg/format/diagnostic-ranges.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-1.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-ascii-bw.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-ascii-color.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-utf-8-bw.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-utf-8-color.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-trees-1.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_show_trees.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literals.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_tree_expression_range.c
 create mode 100644 gcc/testsuite/gcc.dg/spellcheck.c
 create mode 100644 gcc/testsuite/lib/multiline.exp

-- 
1.8.5.3


* [PATCH 01/22] Change of location_get_source_line signature
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
@ 2015-09-10 20:12 ` David Malcolm
  2015-09-14 19:28   ` Jeff Law
  2015-09-10 20:13 ` [PATCH 06/22] PR/62314: add ability to add fixit-hints David Malcolm
                   ` (21 subsequent siblings)
  22 siblings, 1 reply; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

location_get_source_line takes an expanded_location, but the column
is irrelevant; it just needs a filename and line number.

This change is used by, but independent of, the new implementation of
diagnostic_show_locus later in the kit, so I am breaking it out early.

gcc/ChangeLog:
	* input.h (location_get_source_line): Drop "expanded_location"
	param in favor of a file and line number.
	* input.c (location_get_source_line): Likewise.
	(dump_location_info): Update for change in signature of
	location_get_source_line.
	* diagnostic.c (diagnostic_print_caret_line): Likewise.

gcc/c-family/ChangeLog:
	* c-format.c (location_from_offset): Update for change in
	signature of location_get_source_line.
	* c-indentation.c (get_visual_column): Likewise.
	(line_contains_hash_if): Likewise.
---
 gcc/c-family/c-format.c      |  2 +-
 gcc/c-family/c-indentation.c | 10 +++-------
 gcc/diagnostic.c             |  3 ++-
 gcc/input.c                  | 14 ++++++++------
 gcc/input.h                  |  2 +-
 5 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index 2940f92..ab58076 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -132,7 +132,7 @@ location_from_offset (location_t loc, int offset)
 
   expanded_location s = expand_location_to_spelling_point (loc);
   int line_width;
-  const char *line = location_get_source_line (s, &line_width);
+  const char *line = location_get_source_line (s.file, s.line, &line_width);
   if (line == NULL)
     return loc;
   line += s.column - 1 ;
diff --git a/gcc/c-family/c-indentation.c b/gcc/c-family/c-indentation.c
index fdfe0a9..dd35223 100644
--- a/gcc/c-family/c-indentation.c
+++ b/gcc/c-family/c-indentation.c
@@ -45,7 +45,8 @@ get_visual_column (expanded_location exploc,
 		   unsigned int *first_nws = NULL)
 {
   int line_len;
-  const char *line = location_get_source_line (exploc, &line_len);
+  const char *line = location_get_source_line (exploc.file, exploc.line,
+					       &line_len);
   if (!line)
     return false;
   unsigned int vis_column = 0;
@@ -84,13 +85,8 @@ get_visual_column (expanded_location exploc,
 static bool
 line_contains_hash_if (const char *file, int line_num)
 {
-  expanded_location exploc;
-  exploc.file = file;
-  exploc.line = line_num;
-  exploc.column = 1;
-
   int line_len;
-  const char *line = location_get_source_line (exploc, &line_len);
+  const char *line = location_get_source_line (file, line_num, &line_len);
   if (!line)
     return false;
 
diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 01a8e35..74a40bb 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -378,7 +378,8 @@ diagnostic_print_caret_line (diagnostic_context * context,
   
   int cmax = MAX (xloc1.column, xloc2.column);
   int line_width;
-  const char *line = location_get_source_line (xloc1, &line_width);
+  const char *line = location_get_source_line (xloc1.file, xloc1.line,
+					       &line_width);
   if (line == NULL || cmax > line_width)
     return;
 
diff --git a/gcc/input.c b/gcc/input.c
index 59cab5c..e7302a4 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -684,27 +684,27 @@ read_line_num (fcache *c, size_t line_num,
   return read_next_line (c, line, line_len);
 }
 
-/* Return the physical source line that corresponds to xloc in a
+/* Return the physical source line that corresponds to FILE_PATH/LINE in a
    buffer that is statically allocated.  The newline is replaced by
    the null character.  Note that the line can contain several null
    characters, so LINE_LEN, if non-null, points to the actual length
    of the line.  */
 
 const char *
-location_get_source_line (expanded_location xloc,
+location_get_source_line (const char *file_path, int line,
 			  int *line_len)
 {
   static char *buffer;
   static ssize_t len;
 
-  if (xloc.line == 0)
+  if (line == 0)
     return NULL;
 
-  fcache *c = lookup_or_add_file_to_cache_tab (xloc.file);
+  fcache *c = lookup_or_add_file_to_cache_tab (file_path);
   if (c == NULL)
     return NULL;
 
-  bool read = read_line_num (c, xloc.line, &buffer, &len);
+  bool read = read_line_num (c, line, &buffer, &len);
 
   if (read && line_len)
     *line_len = len;
@@ -971,7 +971,9 @@ dump_location_info (FILE *stream)
 	      /* Beginning of a new source line: draw the line.  */
 
 	      int line_size;
-	      const char *line_text = location_get_source_line (exploc, &line_size);
+	      const char *line_text = location_get_source_line (exploc.file,
+								exploc.line,
+								&line_size);
 	      if (!line_text)
 		break;
 	      fprintf (stream,
diff --git a/gcc/input.h b/gcc/input.h
index 5ba4d3b..07d8544 100644
--- a/gcc/input.h
+++ b/gcc/input.h
@@ -38,7 +38,7 @@ extern char builtins_location_check[(BUILTINS_LOCATION
 
 extern bool is_location_from_builtin_token (source_location);
 extern expanded_location expand_location (source_location);
-extern const char *location_get_source_line (expanded_location xloc,
+extern const char *location_get_source_line (const char *file_path, int line,
 					     int *line_size);
 extern expanded_location expand_location_to_spelling_point (source_location);
 extern source_location expansion_point_location_if_in_system_header (source_location);
-- 
1.8.5.3


* [PATCH 03/22] Move diagnostic_show_locus and friends out into a new source file
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (8 preceding siblings ...)
  2015-09-10 20:13 ` [PATCH 13/22] gcc-rich-location.[ch]: add methods for working with tree ranges David Malcolm
@ 2015-09-10 20:13 ` David Malcolm
  2015-09-14 19:37   ` Jeff Law
  2015-09-10 20:28 ` [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs David Malcolm
                   ` (12 subsequent siblings)
  22 siblings, 1 reply; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

The function "diagnostic_show_locus" gains new functionality in the
next patch, so this preliminary patch breaks it out into a new source
file, diagnostic-show-locus.c, along with a couple of related functions.

gcc/ChangeLog:
	* Makefile.in (OBJS-libcommon): Add diagnostic-show-locus.o.
	* diagnostic.c (adjust_line): Move to diagnostic-show-locus.c.
	(diagnostic_show_locus): Likewise.
	(diagnostic_print_caret_line): Likewise.
	* diagnostic-show-locus.c: New file.
---
 gcc/Makefile.in             |   3 +-
 gcc/diagnostic-show-locus.c | 166 ++++++++++++++++++++++++++++++++++++++++++++
 gcc/diagnostic.c            | 129 ----------------------------------
 3 files changed, 168 insertions(+), 130 deletions(-)
 create mode 100644 gcc/diagnostic-show-locus.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 3d1c1e5..f183b22 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1512,7 +1512,8 @@ OBJS = \
 
 # Objects in libcommon.a, potentially used by all host binaries and with
 # no target dependencies.
-OBJS-libcommon = diagnostic.o diagnostic-color.o pretty-print.o intl.o \
+OBJS-libcommon = diagnostic.o diagnostic-color.o diagnostic-show-locus.o \
+	pretty-print.o intl.o \
 	vec.o input.o version.o hash-table.o ggc-none.o
 
 # Objects in libcommon-target.a, used by drivers and by the core
diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c
new file mode 100644
index 0000000..147a2b8
--- /dev/null
+++ b/gcc/diagnostic-show-locus.c
@@ -0,0 +1,166 @@
+/* Diagnostic subroutines for printing source-code
+   Copyright (C) 1999-2015 Free Software Foundation, Inc.
+   Contributed by Gabriel Dos Reis <gdr@codesourcery.com>
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "version.h"
+#include "demangle.h"
+#include "intl.h"
+#include "backtrace.h"
+#include "diagnostic.h"
+#include "diagnostic-color.h"
+
+#ifdef HAVE_TERMIOS_H
+# include <termios.h>
+#endif
+
+#ifdef GWINSZ_IN_SYS_IOCTL
+# include <sys/ioctl.h>
+#endif
+
+/* If LINE is longer than MAX_WIDTH, and COLUMN is not smaller than
+   MAX_WIDTH by some margin, then adjust the start of the line such
+   that the COLUMN is smaller than MAX_WIDTH minus the margin.  The
+   margin is either CARET_LINE_MARGIN characters or the difference
+   between the column and the length of the line, whatever is smaller.
+   The length of LINE is given by LINE_WIDTH.  */
+static const char *
+adjust_line (const char *line, int line_width,
+	     int max_width, int *column_p)
+{
+  int right_margin = CARET_LINE_MARGIN;
+  int column = *column_p;
+
+  gcc_checking_assert (line_width >= column);
+  right_margin = MIN (line_width - column, right_margin);
+  right_margin = max_width - right_margin;
+  if (line_width >= max_width && column > right_margin)
+    {
+      line += column - right_margin;
+      *column_p = right_margin;
+    }
+  return line;
+}
+
+/* Print the physical source line corresponding to the location of
+   this diagnostic, and a caret indicating the precise column.  This
+   function only prints two caret characters if the two locations
+   given by DIAGNOSTIC are on the same line according to
+   diagnostic_same_line().  */
+void
+diagnostic_show_locus (diagnostic_context * context,
+		       const diagnostic_info *diagnostic)
+{
+  if (!context->show_caret
+      || diagnostic_location (diagnostic, 0) <= BUILTINS_LOCATION
+      || diagnostic_location (diagnostic, 0) == context->last_location)
+    return;
+
+  context->last_location = diagnostic_location (diagnostic, 0);
+  expanded_location s0 = diagnostic_expand_location (diagnostic, 0);
+  expanded_location s1 = { };
+  /* Zero-initialized. This is checked later by diagnostic_print_caret_line.  */
+
+  if (diagnostic_location (diagnostic, 1) > BUILTINS_LOCATION)
+    s1 = diagnostic_expand_location (diagnostic, 1);
+
+  diagnostic_print_caret_line (context, s0, s1,
+			       context->caret_chars[0],
+			       context->caret_chars[1]);
+}
+
+/* Print (part) of the source line given by xloc1 with caret1 pointing
+   at the column.  If xloc2.column != 0 and it fits within the same
+   line as xloc1 according to diagnostic_same_line (), then caret2 is
+   printed at xloc2.colum.  Otherwise, the caller has to set up things
+   to print a second caret line for xloc2.  */
+void
+diagnostic_print_caret_line (diagnostic_context * context,
+			     expanded_location xloc1,
+			     expanded_location xloc2,
+			     char caret1, char caret2)
+{
+  if (!diagnostic_same_line (context, xloc1, xloc2))
+    /* This will mean ignore xloc2.  */
+    xloc2.column = 0;
+  else if (xloc1.column == xloc2.column)
+    xloc2.column++;
+
+  int cmax = MAX (xloc1.column, xloc2.column);
+  int line_width;
+  const char *line = location_get_source_line (xloc1.file, xloc1.line,
+					       &line_width);
+  if (line == NULL || cmax > line_width)
+    return;
+
+  /* Center the interesting part of the source line to fit in
+     max_width, and adjust all columns accordingly.  */
+  int max_width = context->caret_max_width;
+  int offset = (int) cmax;
+  line = adjust_line (line, line_width, max_width, &offset);
+  offset -= cmax;
+  cmax += offset;
+  xloc1.column += offset;
+  if (xloc2.column)
+    xloc2.column += offset;
+
+  /* Print the source line.  */
+  pp_newline (context->printer);
+  const char *saved_prefix = pp_get_prefix (context->printer);
+  pp_set_prefix (context->printer, NULL);
+  pp_space (context->printer);
+  while (max_width > 0 && line_width > 0)
+    {
+      char c = *line == '\t' ? ' ' : *line;
+      if (c == '\0')
+	c = ' ';
+      pp_character (context->printer, c);
+      max_width--;
+      line_width--;
+      line++;
+    }
+  pp_newline (context->printer);
+
+  /* Print the caret under the line.  */
+  const char *caret_cs, *caret_ce;
+  caret_cs = colorize_start (pp_show_color (context->printer), "caret");
+  caret_ce = colorize_stop (pp_show_color (context->printer));
+  int cmin = xloc2.column
+    ? MIN (xloc1.column, xloc2.column) : xloc1.column;
+  int caret_min = cmin == xloc1.column ? caret1 : caret2;
+  int caret_max = cmin == xloc1.column ? caret2 : caret1;
+
+  /* cmin is >= 1, but we indent with an extra space at the start like
+     we did above.  */
+  int i;
+  for (i = 0; i < cmin; i++)
+    pp_space (context->printer);
+  pp_printf (context->printer, "%s%c%s", caret_cs, caret_min, caret_ce);
+
+  if (xloc2.column)
+    {
+      for (i++; i < cmax; i++)
+	pp_space (context->printer);
+      pp_printf (context->printer, "%s%c%s", caret_cs, caret_max, caret_ce);
+    }
+  pp_set_prefix (context->printer, saved_prefix);
+  pp_needs_newline (context->printer) = true;
+}
diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 74a40bb..f40e469 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -308,135 +308,6 @@ diagnostic_build_prefix (diagnostic_context *context,
 			     locus_ce, text_cs, text, text_ce));
 }
 
-/* If LINE is longer than MAX_WIDTH, and COLUMN is not smaller than
-   MAX_WIDTH by some margin, then adjust the start of the line such
-   that the COLUMN is smaller than MAX_WIDTH minus the margin.  The
-   margin is either CARET_LINE_MARGIN characters or the difference
-   between the column and the length of the line, whatever is smaller.
-   The length of LINE is given by LINE_WIDTH.  */
-static const char *
-adjust_line (const char *line, int line_width,
-	     int max_width, int *column_p)
-{
-  int right_margin = CARET_LINE_MARGIN;
-  int column = *column_p;
-
-  gcc_checking_assert (line_width >= column);
-  right_margin = MIN (line_width - column, right_margin);
-  right_margin = max_width - right_margin;
-  if (line_width >= max_width && column > right_margin)
-    {
-      line += column - right_margin;
-      *column_p = right_margin;
-    }
-  return line;
-}
-
-/* Print the physical source line corresponding to the location of
-   this diagnostic, and a caret indicating the precise column.  This
-   function only prints two caret characters if the two locations
-   given by DIAGNOSTIC are on the same line according to
-   diagnostic_same_line().  */
-void
-diagnostic_show_locus (diagnostic_context * context,
-		       const diagnostic_info *diagnostic)
-{
-  if (!context->show_caret
-      || diagnostic_location (diagnostic, 0) <= BUILTINS_LOCATION
-      || diagnostic_location (diagnostic, 0) == context->last_location)
-    return;
-
-  context->last_location = diagnostic_location (diagnostic, 0);
-  expanded_location s0 = diagnostic_expand_location (diagnostic, 0);
-  expanded_location s1 = { }; 
-  /* Zero-initialized. This is checked later by diagnostic_print_caret_line.  */
-
-  if (diagnostic_location (diagnostic, 1) > BUILTINS_LOCATION)
-    s1 = diagnostic_expand_location (diagnostic, 1);
-
-  diagnostic_print_caret_line (context, s0, s1,
-			       context->caret_chars[0],
-			       context->caret_chars[1]);
-}
-
-/* Print (part) of the source line given by xloc1 with caret1 pointing
-   at the column.  If xloc2.column != 0 and it fits within the same
-   line as xloc1 according to diagnostic_same_line (), then caret2 is
-   printed at xloc2.colum.  Otherwise, the caller has to set up things
-   to print a second caret line for xloc2.  */
-void
-diagnostic_print_caret_line (diagnostic_context * context,
-			     expanded_location xloc1,
-			     expanded_location xloc2,
-			     char caret1, char caret2)
-{
-  if (!diagnostic_same_line (context, xloc1, xloc2))
-    /* This will mean ignore xloc2.  */
-    xloc2.column = 0;
-  else if (xloc1.column == xloc2.column)
-    xloc2.column++;
-  
-  int cmax = MAX (xloc1.column, xloc2.column);
-  int line_width;
-  const char *line = location_get_source_line (xloc1.file, xloc1.line,
-					       &line_width);
-  if (line == NULL || cmax > line_width)
-    return;
-
-  /* Center the interesting part of the source line to fit in
-     max_width, and adjust all columns accordingly.  */
-  int max_width = context->caret_max_width;
-  int offset = (int) cmax;
-  line = adjust_line (line, line_width, max_width, &offset);
-  offset -= cmax;
-  cmax += offset;
-  xloc1.column += offset;
-  if (xloc2.column)
-    xloc2.column += offset;
-
-  /* Print the source line.  */
-  pp_newline (context->printer);
-  const char *saved_prefix = pp_get_prefix (context->printer);
-  pp_set_prefix (context->printer, NULL);
-  pp_space (context->printer);
-  while (max_width > 0 && line_width > 0)
-    {
-      char c = *line == '\t' ? ' ' : *line;
-      if (c == '\0')
-	c = ' ';
-      pp_character (context->printer, c);
-      max_width--;
-      line_width--;
-      line++;
-    }
-  pp_newline (context->printer);
-
-  /* Print the caret under the line.  */
-  const char *caret_cs, *caret_ce;
-  caret_cs = colorize_start (pp_show_color (context->printer), "caret");
-  caret_ce = colorize_stop (pp_show_color (context->printer));
-  int cmin = xloc2.column 
-    ? MIN (xloc1.column, xloc2.column) : xloc1.column;
-  int caret_min = cmin == xloc1.column ? caret1 : caret2;
-  int caret_max = cmin == xloc1.column ? caret2 : caret1;
-
-  /* cmin is >= 1, but we indent with an extra space at the start like
-     we did above.  */
-  int i;
-  for (i = 0; i < cmin; i++)
-    pp_space (context->printer);
-  pp_printf (context->printer, "%s%c%s", caret_cs, caret_min, caret_ce);
-
-  if (xloc2.column)
-    {
-      for (i++; i < cmax; i++)
-	pp_space (context->printer);
-      pp_printf (context->printer, "%s%c%s", caret_cs, caret_max, caret_ce);
-    }
-  pp_set_prefix (context->printer, saved_prefix);
-  pp_needs_newline (context->printer) = true;
-}
-
 /* Functions at which to stop the backtrace print.  It's not
    particularly helpful to print the callers of these functions.  */
 
-- 
1.8.5.3


* [PATCH 13/22] gcc-rich-location.[ch]: add methods for working with tree ranges
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (7 preceding siblings ...)
  2015-09-10 20:13 ` [PATCH 11/22] Objective C: c/c-parser.c: use token ranges in two places David Malcolm
@ 2015-09-10 20:13 ` David Malcolm
  2015-09-10 20:13 ` [PATCH 03/22] Move diagnostic_show_locus and friends out into a new source file David Malcolm
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

gcc/ChangeLog:
	* gcc-rich-location.c (get_range_for_expr): New function.
	(gcc_rich_location::add_expr): New method.
	(gcc_rich_location::maybe_add_expr): New method.
	(gcc_rich_location::add_expr_with_caption_va): New method.
	(gcc_rich_location::add_expr_with_caption): New method.
	(gcc_rich_location::maybe_add_expr_with_caption): New method.
	* gcc-rich-location.h (gcc_rich_location::add_expr): New method.
	(gcc_rich_location::maybe_add_expr): New method.
	(gcc_rich_location::add_expr_with_caption_va): New method.
	(gcc_rich_location::add_expr_with_caption): New method.
	(gcc_rich_location::maybe_add_expr_with_caption): New method.
---
 gcc/gcc-rich-location.c | 99 +++++++++++++++++++++++++++++++++++++++++++++++++
 gcc/gcc-rich-location.h | 25 +++++++++++++
 2 files changed, 124 insertions(+)

diff --git a/gcc/gcc-rich-location.c b/gcc/gcc-rich-location.c
index bdb2915..003e8f0 100644
--- a/gcc/gcc-rich-location.c
+++ b/gcc/gcc-rich-location.c
@@ -41,6 +41,28 @@ along with GCC; see the file COPYING3.  If not see
 #include "cpplib.h"
 #include "diagnostic.h"
 
+/* Extract any source range information from EXPR and write it to *R.
+   Return true if meaningful range data was found.  */
+
+static bool
+get_range_for_expr (tree expr, location_range *r)
+{
+  if (EXPR_HAS_RANGE (expr))
+    {
+      source_range sr = EXPR_LOCATION_RANGE (expr);
+
+      /* Do we have meaningful data?  */
+      if (sr.m_start && sr.m_finish)
+	{
+	  r->m_start = expand_location (sr.m_start);
+	  r->m_finish = expand_location (sr.m_finish);
+	  return true;
+	}
+    }
+
+  return false;
+}
+
 /* Add a range covering [START, FINISH], with a caption given
    by translating and formatting GMSGID and any variadic args.  */
 
@@ -61,6 +83,83 @@ gcc_rich_location::add_range_with_caption (location_t start, location_t finish,
   va_end (ap);
 }
 
+/* Add a range to the rich_location, covering expression EXPR.  */
+
+void
+gcc_rich_location::add_expr (tree expr)
+{
+  gcc_assert (expr);
+
+  location_range r;
+  r.m_caption = NULL;
+  r.m_show_caret_p = false;
+  if (get_range_for_expr (expr, &r))
+    add_range (&r);
+}
+
+/* If T is an expression, add a range for it to the rich_location.  */
+
+void
+gcc_rich_location::maybe_add_expr (tree t)
+{
+  if (EXPR_P (t))
+    add_expr (t);
+}
+
+/* As per gcc_rich_location::add_expr, but adding a caption to the
+   resulting range.  */
+
+void
+gcc_rich_location::add_expr_with_caption_va (tree expr,
+					     diagnostic_context *context,
+					     const char *gmsgid, va_list *args)
+{
+  gcc_assert (expr);
+  gcc_assert (context);
+  gcc_assert (gmsgid);
+
+  location_range r;
+  r.m_caption = NULL;
+  if (get_range_for_expr (expr, &r))
+    {
+      /* We want to save the string information for later replay.
+	 The printer->format_decoder machinery assumes that we're working
+	 directly from va_args.  It's not possible to save a va_args
+	 for later reuse, since it makes use of a specific stack frame
+	 which needs to still be around.
+	 Hence we need to expand the formatting now, and save the result.  */
+
+      r.m_caption = expand_caption_va (context, gmsgid, args);
+      r.m_show_caret_p = false;
+      add_range (&r); /* This takes ownership of r.m_caption.  */
+    }
+}
+
+void
+gcc_rich_location::add_expr_with_caption (tree expr,
+					  diagnostic_context *context,
+					  const char *gmsgid, ...)
+{
+  va_list ap;
+  va_start (ap, gmsgid);
+  add_expr_with_caption_va (expr, context, gmsgid, &ap);
+  va_end (ap);
+}
+
+void
+gcc_rich_location::maybe_add_expr_with_caption (tree expr,
+						diagnostic_context *context,
+						const char *gmsgid, ...)
+{
+  if (!EXPR_P (expr))
+    return;
+
+  va_list ap;
+  va_start (ap, gmsgid);
+  add_expr_with_caption_va (expr, context, gmsgid, &ap);
+  va_end (ap);
+}
+
 /* Translate and expand the given GMSGID and ARGS into a caption
    (or NULL).
 
diff --git a/gcc/gcc-rich-location.h b/gcc/gcc-rich-location.h
index 795b60f..a7822ac 100644
--- a/gcc/gcc-rich-location.h
+++ b/gcc/gcc-rich-location.h
@@ -44,6 +44,31 @@ class gcc_rich_location : public rich_location
 			  const char *gmsgid, ...)
     ATTRIBUTE_GCC_DIAG(5,6);
 
+  /* Methods for adding ranges via gcc entities.  */
+  void
+  add_expr (tree expr);
+
+  void
+  maybe_add_expr (tree t);
+
+  void
+  add_expr_with_caption_va (tree expr,
+			    diagnostic_context *context,
+			    const char *gmsgid, va_list *args)
+    ATTRIBUTE_GCC_DIAG(4, 0);
+
+  void
+  add_expr_with_caption (tree expr,
+			 diagnostic_context *context,
+			 const char *gmsgid, ...)
+    ATTRIBUTE_GCC_DIAG(4,5);
+
+  void
+  maybe_add_expr_with_caption (tree expr,
+			       diagnostic_context *context,
+			       const char *gmsgid, ...)
+    ATTRIBUTE_GCC_DIAG(4,5);
+
  private:
   static char *
   expand_caption_va (diagnostic_context *context,
-- 
1.8.5.3

* [PATCH 10/22] C++ FE: Use token ranges for various diagnostics
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (3 preceding siblings ...)
  2015-09-10 20:13 ` [PATCH 02/22] Testsuite: add dg-{begin|end}-multiline-output commands David Malcolm
@ 2015-09-10 20:13 ` David Malcolm
  2015-09-10 20:13 ` [PATCH 09/22] C frontend: store and use token ranges in c_declspecs David Malcolm
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

Screenshot:
  https://dmalcolm.fedorapeople.org/gcc/2015-09-09/c++-token-ranges.html

gcc/cp/ChangeLog:
	* parser.c (cp_parser_string_literal): Show ranges of both
	string literals in the "unsupported concatenation" error.
	(cp_parser_primary_expression): Use token range rather than
	location for two errors.
	(cp_parser_namespace_alias_definition): Likewise for one error.
	(cp_parser_init_declarator): Likewise.
	(cp_parser_base_specifier): Likewise for two errors.
	(cp_parser_std_attribute): Likewise for an error and a warning.
	(cp_parser_function_definition_after_declarator): Likewise for an
	error.
	(cp_parser_explicit_template_declaration): Add "tok_range" param
	and use it for the errors.
	(cp_parser_template_declaration_after_export): Pass the range
	of the "template" token to
	cp_parser_explicit_template_declaration.

gcc/testsuite/ChangeLog:
	* g++.dg/diagnostic/token-ranges.C: New.
---
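Illustrative sketch, not part of the patch: the change to
cp_parser_string_literal keeps a running range covering every literal
seen so far, so that the error can underline the earlier literals as
well as the offending one.  The fragment below models that bookkeeping
with plain column numbers.  The types range_sketch, literal_token and
concat_error are hypothetical, kind 0 stands in for CPP_STRING, and
unlike the real parser this sketch stops at the first mismatch.

```cpp
#include <cstddef>
#include <vector>

struct range_sketch { int m_start; int m_finish; };

// KIND 0 is a plain literal; other values are distinct encoding prefixes.
struct literal_token { range_sketch range; int kind; };

struct concat_error
{
  bool found;
  range_sketch prior;      // everything before the offending literal
  range_sketch offending;  // the literal that triggered the error
};

static concat_error
check_concatenation (const std::vector<literal_token> &toks)
{
  concat_error result = { false, { 0, 0 }, { 0, 0 } };
  if (toks.empty ())
    return result;

  range_sketch prior = toks[0].range;
  int type = toks[0].kind;
  for (std::size_t i = 1; i < toks.size (); i++)
    {
      int curr = toks[i].kind;
      if (curr != type)
        {
          if (type == 0)
            type = curr;
          else if (curr != 0)
            {
              result.found = true;
              result.prior = prior;
              result.offending = toks[i].range;
              return result;
            }
        }
      // Extend the running range to cover this literal too.
      prior.m_finish = toks[i].range.m_finish;
    }
  return result;
}
```

With the three literals of the "s2" test case below at columns 18-21,
24-27 and 30-34, the prior range grows to 18-27 before the mismatch at
30-34, matching the tildes and caret in the expected output.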
 gcc/cp/parser.c                                |  45 +++++++----
 gcc/testsuite/g++.dg/diagnostic/token-ranges.C | 104 +++++++++++++++++++++++++
 2 files changed, 132 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/token-ranges.C

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 7c59c58..17b7de0 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -3734,6 +3734,8 @@ cp_parser_string_literal (cp_parser *parser, bool translate, bool wide_ok,
       gcc_obstack_init (&str_ob);
       count = 0;
 
+      source_range range_of_prior_literal = tok->range;
+
       do
 	{
 	  cp_lexer_consume_token (parser->lexer);
@@ -3767,11 +3769,17 @@ cp_parser_string_literal (cp_parser *parser, bool translate, bool wide_ok,
 	      if (type == CPP_STRING)
 		type = curr_type;
 	      else if (curr_type != CPP_STRING)
-		error_at (tok->location,
-			  "unsupported non-standard concatenation "
-			  "of string literals");
+                {
+                  rich_location rich_loc (tok->range);
+                  rich_loc.add_range (range_of_prior_literal);
+                  error_at_rich_loc (&rich_loc,
+                                     "unsupported non-standard concatenation "
+                                     "of string literals");
+                }
 	    }
 
+          range_of_prior_literal.m_finish = tok->range.m_finish;
+
 	  obstack_grow (&str_ob, &str, sizeof (cpp_string));
 
 	  tok = cp_lexer_peek_token (parser->lexer);
@@ -4549,7 +4557,7 @@ cp_parser_primary_expression (cp_parser *parser,
 	  cp_lexer_consume_token (parser->lexer);
 	  if (parser->local_variables_forbidden_p)
 	    {
-	      error_at (token->location,
+	      error_at (token->range,
 			"%<this%> may not be used in this context");
 	      return error_mark_node;
 	    }
@@ -4672,7 +4680,7 @@ cp_parser_primary_expression (cp_parser *parser,
 	      && (cp_lexer_peek_nth_token (parser->lexer, 2)->type
 	      	  == CPP_LESS))
 	    {
-	      error_at (token->location,
+	      error_at (token->range,
 			"a template declaration cannot appear at block scope");
 	      cp_parser_skip_to_end_of_block_or_statement (parser);
 	      return error_mark_node;
@@ -16829,7 +16837,7 @@ cp_parser_namespace_alias_definition (cp_parser* parser)
   if (!cp_parser_uncommitted_to_tentative_parse_p (parser)
       && cp_lexer_next_token_is (parser->lexer, CPP_OPEN_BRACE)) 
     {
-      error_at (token->location, "%<namespace%> definition is not allowed here");
+      error_at (token->range, "%<namespace%> definition is not allowed here");
       /* Skip the definition.  */
       cp_lexer_consume_token (parser->lexer);
       if (cp_parser_skip_to_closing_brace (parser))
@@ -17602,7 +17610,7 @@ cp_parser_init_declarator (cp_parser* parser,
 		      "an asm-specification is not allowed "
 		      "on a function-definition");
 	  if (attributes)
-	    error_at (attributes_start_token->location,
+	    error_at (attributes_start_token->range,
 		      "attributes are not allowed "
 		      "on a function-definition");
 	  /* This is a function-definition.  */
@@ -22016,10 +22024,10 @@ cp_parser_base_specifier (cp_parser* parser)
     {
       token = cp_lexer_peek_token (parser->lexer);
       if (!processing_template_decl)
-	error_at (token->location,
+	error_at (token->range,
 		  "keyword %<typename%> not allowed outside of templates");
       else
-	error_at (token->location,
+	error_at (token->range,
 		  "keyword %<typename%> not allowed in this context "
 		  "(the base class is implicitly a type)");
       cp_lexer_consume_token (parser->lexer);
@@ -22971,10 +22979,12 @@ cp_parser_std_attribute (cp_parser *parser)
 {
   tree attribute, attr_ns = NULL_TREE, attr_id = NULL_TREE, arguments;
   cp_token *token;
+  source_range name_range;
 
   /* First, parse name of the attribute, a.k.a attribute-token.  */
 
   token = cp_lexer_peek_token (parser->lexer);
+  name_range = token->range;
   if (token->type == CPP_NAME)
     attr_id = token->u.value;
   else if (token->type == CPP_KEYWORD)
@@ -23002,7 +23012,7 @@ cp_parser_std_attribute (cp_parser *parser)
 	attr_id = ridpointers[(int) token->keyword];
       else
 	{
-	  error_at (token->location,
+	  error_at (token->range,
 		    "expected an identifier for the attribute name");
 	  return error_mark_node;
 	}
@@ -23021,7 +23031,7 @@ cp_parser_std_attribute (cp_parser *parser)
       else if (cxx_dialect >= cxx11 && is_attribute_p ("deprecated", attr_id))
 	{
 	  if (cxx_dialect == cxx11)
-	    pedwarn (token->location, OPT_Wpedantic,
+	    pedwarn (name_range, OPT_Wpedantic,
 		     "%<deprecated%> is a C++14 feature;"
 		     " use %<gnu::deprecated%>");
 	  TREE_PURPOSE (TREE_PURPOSE (attribute)) = get_identifier ("gnu");
@@ -24383,7 +24393,7 @@ cp_parser_function_definition_after_declarator (cp_parser* parser,
 	 returned.  */
       cp_parser_identifier (parser);
       /* Issue an error message.  */
-      error_at (token->location,
+      error_at (token->range,
 		"named return values are no longer supported");
       /* Skip tokens until we reach the start of the function body.  */
       while (true)
@@ -24654,11 +24664,11 @@ cp_parser_template_introduction (cp_parser* parser, bool member_p)
 /* Parse a normal template-declaration following the template keyword.  */
 
 static void
-cp_parser_explicit_template_declaration (cp_parser* parser, bool member_p)
+cp_parser_explicit_template_declaration (cp_parser* parser, bool member_p,
+                                         source_range tok_range)
 {
   tree parameter_list;
   bool need_lang_pop;
-  location_t location = input_location;
 
   /* Look for the `<' token.  */
   if (!cp_parser_require (parser, CPP_LESS, RT_LESS))
@@ -24668,7 +24678,7 @@ cp_parser_explicit_template_declaration (cp_parser* parser, bool member_p)
       /* 14.5.2.2 [temp.mem]
 
          A local class shall not have member templates.  */
-      error_at (location,
+      error_at (tok_range,
                 "invalid declaration of member template in local class");
       cp_parser_skip_to_end_of_block_or_statement (parser);
       return;
@@ -24678,7 +24688,7 @@ cp_parser_explicit_template_declaration (cp_parser* parser, bool member_p)
      A template ... shall not have C linkage.  */
   if (current_lang_name == lang_name_c)
     {
-      error_at (location, "template with C linkage");
+      error_at (tok_range, "template with C linkage");
       /* Give it C++ linkage to avoid confusing other parts of the
          front end.  */
       push_lang_context (lang_name_cplusplus);
@@ -24737,8 +24747,9 @@ cp_parser_template_declaration_after_export (cp_parser* parser, bool member_p)
 {
   if (cp_lexer_next_token_is_keyword (parser->lexer, RID_TEMPLATE))
     {
+      source_range tok_range = cp_lexer_peek_token (parser->lexer)->range;
       cp_lexer_consume_token (parser->lexer);
-      cp_parser_explicit_template_declaration (parser, member_p);
+      cp_parser_explicit_template_declaration (parser, member_p, tok_range);
       return true;
     }
   else if (flag_concepts)
diff --git a/gcc/testsuite/g++.dg/diagnostic/token-ranges.C b/gcc/testsuite/g++.dg/diagnostic/token-ranges.C
new file mode 100644
index 0000000..bc830ad
--- /dev/null
+++ b/gcc/testsuite/g++.dg/diagnostic/token-ranges.C
@@ -0,0 +1,104 @@
+/* { dg-options "-fdiagnostics-show-caret -std=c++11 -Wpedantic" } */
+
+/* Verify that various diagnostics show source code ranges.  */
+
+/* These ones merely use token ranges; they don't use tree ranges.  */
+
+void bad_namespace () {
+  namespace foo { // { dg-error "'namespace' definition is not allowed here" }
+  }
+/* { dg-begin-multiline-output "" }
+   namespace foo {
+   ^~~~~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+
+void fn_defn_with_attribute ()
+  __attribute__((constructor (0))) // { dg-error "attributes are not allowed on a function-definition" }
+{
+  /* { dg-begin-multiline-output "" }
+   __attribute__((constructor (0)))
+   ^~~~~~~~~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+class foo {};
+class bar : public typename foo // { dg-error "keyword 'typename' not allowed outside of templates" }
+{
+};
+/* { dg-begin-multiline-output "" }
+ class bar : public typename foo
+                    ^~~~~~~~
+   { dg-end-multiline-output "" } */
+
+
+// C++11 attributes
+
+void bogus_scoped_attribute [[foo::400]] (); // { dg-error "expected an identifier for the attribute name" }
+/* { dg-begin-multiline-output "" }
+ void bogus_scoped_attribute [[foo::400]] ();
+                                    ^~~
+   { dg-end-multiline-output "" } */
+
+void meta_deprecation [[deprecated]] (); // { dg-warning "use 'gnu::deprecated'" }
+/* { dg-begin-multiline-output "" }
+ void meta_deprecation [[deprecated]] ();
+                         ^~~~~~~~~~
+   { dg-end-multiline-output "" } */
+
+
+int foo() return bar {} // { dg-error "named return values are no longer supported" }
+/* { dg-begin-multiline-output "" }
+ int foo() return bar {}
+           ^~~~~~
+   { dg-end-multiline-output "" } */
+
+template<typename T> void foo(T)
+{
+  struct A
+  {
+    template<int> struct B {} // { dg-error "local class" }
+
+/* { dg-begin-multiline-output "" }
+     template<int> struct B {}
+     ^~~~~~~~
+   { dg-end-multiline-output "" } */
+  };
+}
+
+extern "C" { template<typename T> void foo(T); } // { dg-error "C linkage" }
+/* { dg-begin-multiline-output "" }
+ extern "C" { template<typename T> void foo(T); }
+              ^~~~~~~~
+   { dg-end-multiline-output "" } */
+// TODO: It would be nice to inform the user of the location of the
+// relevant extern "C".
+
+const void *s = u8"a"  u"b";  // { dg-error "non-standard concatenation" }
+/* { dg-begin-multiline-output "" }
+ const void *s = u8"a"  u"b";
+                 ~~~~~  ^~~~
+   { dg-end-multiline-output "" } */
+
+const void *s2 = u"a"  u"b"  u8"c";  // { dg-error "non-standard concatenation" }
+/* { dg-begin-multiline-output "" }
+ const void *s2 = u"a"  u"b"  u8"c";
+                  ~~~~~~~~~~  ^~~~~
+  { dg-end-multiline-output "" } */
+
+
+void default_arg_of_this (void *ptr = this); // { dg-error "'this'" }
+/* { dg-begin-multiline-output "" }
+ void default_arg_of_this (void *ptr = this);
+                                       ^~~~
+  { dg-end-multiline-output "" } */
+
+void template_inside_fn ()
+{
+  int i = template < // { dg-error "cannot appear at block scope" }
+/* { dg-begin-multiline-output "" }
+   int i = template <
+           ^~~~~~~~
+  { dg-end-multiline-output "" } */
+}
-- 
1.8.5.3

* [PATCH 09/22] C frontend: store and use token ranges in c_declspecs
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (4 preceding siblings ...)
  2015-09-10 20:13 ` [PATCH 10/22] C++ FE: Use token ranges for various diagnostics David Malcolm
@ 2015-09-10 20:13 ` David Malcolm
  2015-09-10 20:13 ` [PATCH 20/22] Use rich locations in c-family/c-format.c David Malcolm
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

This patch replaces the source_location in c_declspecs with
a source_range, and uses it for various diagnostics to gain
underlines.
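
Illustrative sketch, not part of the patch: the conversion follows one
pattern throughout.  Callers that have a token pass its whole range,
callers that only have a point wrap it with
source_range::from_location, and callees that still need a single
location read m_start.  The fragment below shows that shape with
simplified stand-ins: source_range_sketch and in_system_header_sketch
are hypothetical, location_t is reduced to an unsigned int, and the
assumption that from_location sets both ends to the same point is
mine, not something this patch shows.

```cpp
typedef unsigned int location_t;  // simplified stand-in for libcpp's type

struct source_range_sketch
{
  location_t m_start;
  location_t m_finish;

  // Assumed behaviour: a point becomes a range of width one.
  static source_range_sketch from_location (location_t loc)
  {
    source_range_sketch r;
    r.m_start = loc;
    r.m_finish = loc;
    return r;
  }
};

// Hypothetical legacy-style query that still takes a single point.
static bool
in_system_header_sketch (location_t loc)
{
  return loc < 100;
}

// A converted callee: accepts a range, hands m_start to the older API.
static bool
range_in_system_header (source_range_sketch r)
{
  return in_system_header_sketch (r.m_start);
}
```

This is the same move as the in_system_header_at (loc.m_start) calls
in declspecs_add_type below.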

Screenshot:
 https://dmalcolm.fedorapeople.org/gcc/2015-09-09/bad-c-decls.html

gcc/c/ChangeLog:
	* c-decl.c (declspecs_add_addrspace): Convert param 1 from
	source_location to source_range.
	(declspecs_add_qual): Likewise.
	(declspecs_add_type): Likewise.
	(declspecs_add_scspec): Likewise.
	(declspecs_add_attrs): Likewise.
	(declspecs_add_alignas): Likewise.
	* c-parser.c (c_parser_declaration_or_fndef): Likewise for local
	"init_loc".
	(c_parser_declspecs): Likewise for local "loc", and for calls
	to declspecs_add_addrspace and declspecs_add_type.
	* c-tree.h (struct c_declspecs): Convert elements of "locations"
	array from source_location to source_range.
	(declspecs_add_qual): Convert param 1 from source_location to
	source_range.
	(declspecs_add_type): Likewise.
	(declspecs_add_scspec): Likewise.
	(declspecs_add_attrs): Likewise.
	(declspecs_add_addrspace): Likewise.
	(declspecs_add_alignas): Likewise.

gcc/testsuite/ChangeLog:
	* gcc.dg/diagnostic-token-ranges.c: Add tests of various
	kinds of bad declaration.
---
 gcc/c/c-decl.c                                 | 16 ++++++++--------
 gcc/c/c-parser.c                               | 23 +++++++++++++----------
 gcc/c/c-tree.h                                 | 14 +++++++-------
 gcc/testsuite/gcc.dg/diagnostic-token-ranges.c | 26 ++++++++++++++++++++++++++
 4 files changed, 54 insertions(+), 25 deletions(-)

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 9fe8aa4..b7f0241 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -9443,7 +9443,7 @@ build_null_declspecs (void)
    SPECS, returning SPECS.  */
 
 struct c_declspecs *
-declspecs_add_addrspace (source_location location,
+declspecs_add_addrspace (source_range location,
 			 struct c_declspecs *specs, addr_space_t as)
 {
   specs->non_sc_seen_p = true;
@@ -9466,7 +9466,7 @@ declspecs_add_addrspace (source_location location,
    returning SPECS.  */
 
 struct c_declspecs *
-declspecs_add_qual (source_location loc,
+declspecs_add_qual (source_range loc,
 		    struct c_declspecs *specs, tree qual)
 {
   enum rid i;
@@ -9509,7 +9509,7 @@ declspecs_add_qual (source_location loc,
    returning SPECS.  */
 
 struct c_declspecs *
-declspecs_add_type (location_t loc, struct c_declspecs *specs,
+declspecs_add_type (source_range loc, struct c_declspecs *specs,
 		    struct c_typespec spec)
 {
   tree type = spec.spec;
@@ -9747,7 +9747,7 @@ declspecs_add_type (location_t loc, struct c_declspecs *specs,
 	      break;
 	    case RID_COMPLEX:
 	      dupe = specs->complex_p;
-	      if (!in_system_header_at (loc))
+	      if (!in_system_header_at (loc.m_start))
 		pedwarn_c90 (loc, OPT_Wpedantic,
 			     "ISO C90 does not support complex types");
 	      if (specs->typespec_word == cts_auto_type)
@@ -9973,7 +9973,7 @@ declspecs_add_type (location_t loc, struct c_declspecs *specs,
 		}
 	      return specs;
 	    case RID_BOOL:
-	      if (!in_system_header_at (loc))
+	      if (!in_system_header_at (loc.m_start))
 		pedwarn_c90 (loc, OPT_Wpedantic,
 			     "ISO C90 does not support boolean types");
 	      if (specs->long_p)
@@ -10257,7 +10257,7 @@ declspecs_add_type (location_t loc, struct c_declspecs *specs,
    declaration specifiers SPECS, returning SPECS.  */
 
 struct c_declspecs *
-declspecs_add_scspec (source_location loc,
+declspecs_add_scspec (source_range loc,
 		      struct c_declspecs *specs,
 		      tree scspec)
 {
@@ -10376,7 +10376,7 @@ declspecs_add_scspec (source_location loc,
    returning SPECS.  */
 
 struct c_declspecs *
-declspecs_add_attrs (source_location loc, struct c_declspecs *specs, tree attrs)
+declspecs_add_attrs (source_range loc, struct c_declspecs *specs, tree attrs)
 {
   specs->attrs = chainon (attrs, specs->attrs);
   specs->locations[cdw_attributes] = loc;
@@ -10388,7 +10388,7 @@ declspecs_add_attrs (source_location loc, struct c_declspecs *specs, tree attrs)
    alignment is ALIGN) to the declaration specifiers SPECS, returning
    SPECS.  */
 struct c_declspecs *
-declspecs_add_alignas (source_location loc,
+declspecs_add_alignas (source_range loc,
 		       struct c_declspecs *specs, tree align)
 {
   int align_log;
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 1c93d39..6e6464b 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -1746,19 +1746,20 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
 	    {
 	      tree d;
 	      struct c_expr init;
-	      location_t init_loc;
+	      source_range init_loc;
 	      c_parser_consume_token (parser);
 	      if (auto_type_p)
 		{
 		  start_init (NULL_TREE, asm_name, global_bindings_p ());
-		  init_loc = c_parser_peek_token (parser)->location;
+		  init_loc = c_parser_peek_token (parser)->range;
 		  init = c_parser_expr_no_commas (parser, NULL);
 		  if (TREE_CODE (init.value) == COMPONENT_REF
 		      && DECL_C_BIT_FIELD (TREE_OPERAND (init.value, 1)))
 		    error_at (here,
 			      "%<__auto_type%> used with a bit-field"
 			      " initializer");
-		  init = convert_lvalue_to_rvalue (init_loc, init, true, true);
+		  init = convert_lvalue_to_rvalue (init_loc.m_start, init,
+						   true, true);
 		  tree init_type = TREE_TYPE (init.value);
 		  /* As with typeof, remove all qualifiers from atomic types.  */
 		  if (init_type != error_mark_node && TYPE_ATOMIC (init_type))
@@ -1808,14 +1809,15 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
 		    c_finish_omp_declare_simd (parser, d, NULL_TREE,
 					       omp_declare_simd_clauses);
 		  start_init (d, asm_name, global_bindings_p ());
-		  init_loc = c_parser_peek_token (parser)->location;
+		  init_loc = c_parser_peek_token (parser)->range;
 		  init = c_parser_initializer (parser);
 		  finish_init ();
 		}
 	      if (d != error_mark_node)
 		{
-		  maybe_warn_string_init (init_loc, TREE_TYPE (d), init);
-		  finish_decl (d, init_loc, init.value,
+		  maybe_warn_string_init (init_loc.m_start, TREE_TYPE (d),
+					  init);
+		  finish_decl (d, init_loc.m_start, init.value,
 			       init.original_type, asm_name);
 		}
 	    }
@@ -2228,7 +2230,7 @@ c_parser_declspecs (c_parser *parser, struct c_declspecs *specs,
       struct c_typespec t;
       tree attrs;
       tree align;
-      location_t loc = c_parser_peek_token (parser)->location;
+      source_range loc = c_parser_peek_token (parser)->range;
 
       /* If we cannot accept a type, exit if the next token must start
 	 one.  Also, if we already have seen a tagged definition,
@@ -2249,7 +2251,7 @@ c_parser_declspecs (c_parser *parser, struct c_declspecs *specs,
 	    {
 	      addr_space_t as
 		= name_token->keyword - RID_FIRST_ADDR_SPACE;
-	      declspecs_add_addrspace (name_token->location, specs, as);
+	      declspecs_add_addrspace (name_token->range, specs, as);
 	      c_parser_consume_token (parser);
 	      attrs_ok = true;
 	      continue;
@@ -2295,7 +2297,7 @@ c_parser_declspecs (c_parser *parser, struct c_declspecs *specs,
 	    }
 	  t.expr = NULL_TREE;
 	  t.expr_const_operands = true;
-	  declspecs_add_type (name_token->location, specs, t);
+	  declspecs_add_type (name_token->range, specs, t);
 	  continue;
 	}
       if (c_parser_next_token_is (parser, CPP_LESS))
@@ -3684,7 +3686,8 @@ c_parser_parameter_declaration (c_parser *parser, tree attrs)
   specs = build_null_declspecs ();
   if (attrs)
     {
-      declspecs_add_attrs (input_location, specs, attrs);
+      declspecs_add_attrs (source_range::from_location (input_location),
+			   specs, attrs);
       attrs = NULL_TREE;
     }
   c_parser_declspecs (parser, specs, true, true, true, true, false,
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 4b0ec22..0810a74 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -253,7 +253,7 @@ enum c_declspec_word {
    specifier is added, please update the enum c_declspec_word above
    accordingly.  */
 struct c_declspecs {
-  source_location locations[cdw_number_of_elements];
+  source_range locations[cdw_number_of_elements];
   /* The type specified, if a single type specifier such as a struct,
      union or enum specifier, typedef name or typeof specifies the
      whole type, or NULL_TREE if none or a keyword such as "void" or
@@ -544,19 +544,19 @@ extern struct c_declarator *build_id_declarator (tree);
 extern struct c_declarator *make_pointer_declarator (struct c_declspecs *,
 						     struct c_declarator *);
 extern struct c_declspecs *build_null_declspecs (void);
-extern struct c_declspecs *declspecs_add_qual (source_location,
+extern struct c_declspecs *declspecs_add_qual (source_range,
 					       struct c_declspecs *, tree);
-extern struct c_declspecs *declspecs_add_type (location_t,
+extern struct c_declspecs *declspecs_add_type (source_range,
 					       struct c_declspecs *,
 					       struct c_typespec);
-extern struct c_declspecs *declspecs_add_scspec (source_location,
+extern struct c_declspecs *declspecs_add_scspec (source_range,
 						 struct c_declspecs *, tree);
-extern struct c_declspecs *declspecs_add_attrs (source_location,
+extern struct c_declspecs *declspecs_add_attrs (source_range,
 						struct c_declspecs *, tree);
-extern struct c_declspecs *declspecs_add_addrspace (source_location,
+extern struct c_declspecs *declspecs_add_addrspace (source_range,
 						    struct c_declspecs *,
 						    addr_space_t);
-extern struct c_declspecs *declspecs_add_alignas (source_location,
+extern struct c_declspecs *declspecs_add_alignas (source_range,
 						  struct c_declspecs *, tree);
 extern struct c_declspecs *finish_declspecs (struct c_declspecs *);
 
diff --git a/gcc/testsuite/gcc.dg/diagnostic-token-ranges.c b/gcc/testsuite/gcc.dg/diagnostic-token-ranges.c
index 6bd9e0b..5f00563 100644
--- a/gcc/testsuite/gcc.dg/diagnostic-token-ranges.c
+++ b/gcc/testsuite/gcc.dg/diagnostic-token-ranges.c
@@ -107,3 +107,29 @@ void break_and_continue_in_wrong_places (void)
      ^~~~~~~~
     { dg-end-multiline-output "" } */
 }
+
+/* Various examples of bad type decls.  */
+
+int float bogus; /* { dg-error "two or more data types in declaration specifiers" } */
+/* { dg-begin-multiline-output "" }
+ int float bogus;
+     ^~~~~
+    { dg-end-multiline-output "" } */
+
+long long long bogus2; /* { dg-error "'long long long' is too long for GCC" } */
+/* { dg-begin-multiline-output "" }
+ long long long bogus2;
+           ^~~~
+    { dg-end-multiline-output "" } */
+
+long short bogus3; /* { dg-error "both 'long' and 'short' in declaration specifiers" } */
+/* { dg-begin-multiline-output "" }
+ long short bogus3;
+      ^~~~~
+    { dg-end-multiline-output "" } */
+
+signed unsigned bogus4; /* { dg-error "both 'signed' and 'unsigned' in declaration specifiers" } */
+/* { dg-begin-multiline-output "" }
+ signed unsigned bogus4;
+        ^~~~~~~~
+    { dg-end-multiline-output "" } */
-- 
1.8.5.3

* [PATCH 11/22] Objective C: c/c-parser.c: use token ranges in two places
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (6 preceding siblings ...)
  2015-09-10 20:13 ` [PATCH 20/22] Use rich locations in c-family/c-format.c David Malcolm
@ 2015-09-10 20:13 ` David Malcolm
  2015-09-10 20:13 ` [PATCH 13/22] gcc-rich-location.[ch]: add methods for working with tree ranges David Malcolm
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

I don't yet have an explicit test case for these.

gcc/c/ChangeLog:
	* c-parser.c (c_parser_declaration_or_fndef): Use token range
	rather than location for a couple of warnings.
---
 gcc/c/c-parser.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 6e6464b..9bb5200 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -1608,7 +1608,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
 	      return;
 	    if (specs->attrs)
 	      {
-		warning_at (c_parser_peek_token (parser)->location, 
+		warning_at (c_parser_peek_token (parser)->range,
 			    OPT_Wattributes,
 	       		    "prefix attributes are ignored for methods");
 		specs->attrs = NULL_TREE;
@@ -1643,7 +1643,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
 	      return;
 	    if (specs->attrs)
 	      {
-		warning_at (c_parser_peek_token (parser)->location, 
+		warning_at (c_parser_peek_token (parser)->range,
 			OPT_Wattributes,
 			"prefix attributes are ignored for implementations");
 		specs->attrs = NULL_TREE;
-- 
1.8.5.3

* [PATCH 08/22] C frontend: use token ranges in various diagnostics
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
  2015-09-10 20:12 ` [PATCH 01/22] Change of location_get_source_line signature David Malcolm
  2015-09-10 20:13 ` [PATCH 06/22] PR/62314: add ability to add fixit-hints David Malcolm
@ 2015-09-10 20:13 ` David Malcolm
  2015-09-10 20:13 ` [PATCH 02/22] Testsuite: add dg-{begin|end}-multiline-output commands David Malcolm
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

This patch makes use of token ranges in the C frontend to add underlines
to various diagnostics.
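
Illustrative sketch, not part of the patch: for a single token range,
the expected outputs in the testsuite show a caret under the first
column of the token followed by tildes up to its last column.  The
fragment below reproduces just that layout from 1-based, inclusive
column numbers.  make_underline is a hypothetical helper, and it
ignores tabs, colorization and multiple ranges, all of which the real
printing code has to handle.

```cpp
#include <string>

// Build the underline for a token spanning START_COL..FINISH_COL
// (1-based, inclusive): padding, then '^', then '~' for the rest.
static std::string
make_underline (int start_col, int finish_col)
{
  std::string line (start_col - 1, ' ');
  line += '^';
  line.append (finish_col - start_col, '~');
  return line;
}
```

For "int float bogus;" the token "float" occupies columns 5 to 9, so
make_underline (5, 9) yields four spaces followed by "^~~~~".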

Screenshot:
 https://dmalcolm.fedorapeople.org/gcc/2015-09-09/diagnostic-token-ranges.html

gcc/c/ChangeLog:
	* c-decl.c (undeclared_variable): Convert param "loc" from a
	location_t to a source_range.
	* c-parser.c (c_lex_one_token): Use token's range rather than
	location for "identifier conflicts with C++ keyword" warning.
	(c_parser_declaration_or_fndef): Convert local "here" from a
	location_t to a source_range.
	(c_parser_parms_list_declarator): Use token's range rather than
	location for "ISO C requires a named argument before ..." warning.
	(c_parser_parameter_declaration): Likewise for "unknown type name"
	error.
	(c_parser_asm_string_literal): Likewise for
	"wide string literal in asm" error.
	(c_parser_label): Likewise for label-before-declaration error, and
	add the label's range to the error.
	(c_parser_statement_after_labels): Pass the token's range rather
	than location to c_finish_bc_stmt.
	(c_parser_postfix_expression): Likewise for call to
	build_external_ref.
	(c_parser_omp_variable_list): Likewise for call to
	undeclared_variable.
	* c-tree.h (undeclared_variable): Convert initial param from
	location_t to source_range.
	(build_external_ref): Likewise.
	(c_finish_bc_stmt): Likewise.
	* c-typeck.c (build_external_ref): Likewise.
	(c_finish_bc_stmt): Likewise.

gcc/testsuite/ChangeLog:
	* gcc.dg/diagnostic-token-ranges.c: New file.
---
 gcc/c/c-decl.c                                 |   2 +-
 gcc/c/c-parser.c                               |  31 ++++---
 gcc/c/c-tree.h                                 |   6 +-
 gcc/c/c-typeck.c                               |  22 ++---
 gcc/testsuite/gcc.dg/diagnostic-token-ranges.c | 109 +++++++++++++++++++++++++
 5 files changed, 143 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/diagnostic-token-ranges.c

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index e6b6ba5..9fe8aa4 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -3409,7 +3409,7 @@ implicitly_declare (location_t loc, tree functionid)
    in an appropriate scope, which will suppress further errors for the
    same identifier.  The error message should be given location LOC.  */
 void
-undeclared_variable (location_t loc, tree id)
+undeclared_variable (source_range loc, tree id)
 {
   static bool already = false;
   struct c_scope *scope;
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 5d822ee..1c93d39 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -266,7 +266,7 @@ c_lex_one_token (c_parser *parser, c_token *token)
 
 	    if (rid_code == RID_CXX_COMPAT_WARN)
 	      {
-		warning_at (token->location,
+		warning_at (token->range,
 			    OPT_Wc___compat,
 			    "identifier %qE conflicts with C++ keyword",
 			    token->value);
@@ -1526,7 +1526,7 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
   tree prefix_attrs;
   tree all_prefix_attrs;
   bool diagnosed_no_specs = false;
-  location_t here = c_parser_peek_token (parser)->location;
+  source_range here = c_parser_peek_token (parser)->range;
 
   if (static_assert_ok
       && c_parser_next_token_is_keyword (parser, RID_STATIC_ASSERT))
@@ -3575,7 +3575,7 @@ c_parser_parms_list_declarator (c_parser *parser, tree attrs, tree expr)
         {
           /* Suppress -Wold-style-definition for this case.  */
           ret->types = error_mark_node;
-          error_at (c_parser_peek_token (parser)->location,
+          error_at (c_parser_peek_token (parser)->range,
                     "ISO C requires a named argument before %<...%>");
         }
       c_parser_consume_token (parser);
@@ -3670,7 +3670,7 @@ c_parser_parameter_declaration (c_parser *parser, tree attrs)
       c_parser_set_source_position_from_token (token);
       if (c_parser_next_tokens_start_typename (parser, cla_prefer_type))
 	{
-	  error_at (token->location, "unknown type name %qE", token->value);
+	  error_at (token->range, "unknown type name %qE", token->value);
 	  parser->error = true;
 	}
       /* ??? In some Objective-C cases '...' isn't applicable so there
@@ -3731,7 +3731,7 @@ c_parser_asm_string_literal (c_parser *parser)
     }
   else if (c_parser_next_token_is (parser, CPP_WSTRING))
     {
-      error_at (c_parser_peek_token (parser)->location,
+      error_at (c_parser_peek_token (parser)->range,
 		"wide string literal in %<asm%>");
       str = build_string (1, "");
       c_parser_consume_token (parser);
@@ -4749,6 +4749,7 @@ c_parser_all_labels (c_parser *parser)
 static void
 c_parser_label (c_parser *parser)
 {
+  source_range label_range = c_parser_peek_token (parser)->range;
   location_t loc1 = c_parser_peek_token (parser)->location;
   tree label = NULL_TREE;
   if (c_parser_next_token_is_keyword (parser, RID_CASE))
@@ -4799,9 +4800,11 @@ c_parser_label (c_parser *parser)
     {
       if (c_parser_next_tokens_start_declaration (parser))
 	{
-	  error_at (c_parser_peek_token (parser)->location,
-		    "a label can only be part of a statement and "
-		    "a declaration is not a statement");
+	  rich_location richloc (c_parser_peek_token (parser)->range);
+	  richloc.add_range (label_range);
+	  error_at_rich_loc (&richloc,
+			     "a label can only be part of a statement and "
+			     "a declaration is not a statement");
 	  c_parser_declaration_or_fndef (parser, /*fndef_ok*/ false,
 					 /*static_assert_ok*/ true,
 					 /*empty_ok*/ true, /*nested*/ true,
@@ -4963,6 +4966,7 @@ static void
 c_parser_statement_after_labels (c_parser *parser)
 {
   location_t loc = c_parser_peek_token (parser)->location;
+  source_range tok_range = c_parser_peek_token (parser)->range;
   tree stmt = NULL_TREE;
   bool in_if_block = parser->in_if_block;
   parser->in_if_block = false;
@@ -5034,11 +5038,11 @@ c_parser_statement_after_labels (c_parser *parser)
 	  goto expect_semicolon;
 	case RID_CONTINUE:
 	  c_parser_consume_token (parser);
-	  stmt = c_finish_bc_stmt (loc, &c_cont_label, false);
+	  stmt = c_finish_bc_stmt (tok_range, &c_cont_label, false);
 	  goto expect_semicolon;
 	case RID_BREAK:
 	  c_parser_consume_token (parser);
-	  stmt = c_finish_bc_stmt (loc, &c_break_label, true);
+	  stmt = c_finish_bc_stmt (tok_range, &c_break_label, true);
 	  goto expect_semicolon;
 	case RID_RETURN:
 	  c_parser_consume_token (parser);
@@ -7128,6 +7132,7 @@ c_parser_postfix_expression (c_parser *parser)
   struct c_expr expr, e1;
   struct c_type_name *t1, *t2;
   location_t loc = c_parser_peek_token (parser)->location;;
+  source_range src_range = c_parser_peek_token (parser)->range;
   expr.original_code = ERROR_MARK;
   expr.original_type = NULL;
   switch (c_parser_peek_token (parser)->type)
@@ -7172,7 +7177,7 @@ c_parser_postfix_expression (c_parser *parser)
 	  {
 	    tree id = c_parser_peek_token (parser)->value;
 	    c_parser_consume_token (parser);
-	    expr.value = build_external_ref (loc, id,
+	    expr.value = build_external_ref (src_range, id,
 					     (c_parser_peek_token (parser)->type
 					      == CPP_OPEN_PAREN),
 					     &expr.original_type);
@@ -10165,7 +10170,7 @@ c_parser_omp_variable_list (c_parser *parser,
 
       if (t == NULL_TREE)
 	{
-	  undeclared_variable (c_parser_peek_token (parser)->location,
+	  undeclared_variable (c_parser_peek_token (parser)->range,
 			       c_parser_peek_token (parser)->value);
 	  t = error_mark_node;
 	}
@@ -14933,7 +14938,7 @@ c_parser_cilk_clause_linear (c_parser *parser, tree clauses)
 
       if (var == NULL)
 	{
-	  undeclared_variable (c_parser_peek_token (parser)->location,
+	  undeclared_variable (c_parser_peek_token (parser)->range,
 			       c_parser_peek_token (parser)->value);
 	c_parser_consume_token (parser);
 	}
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index df1ebb6..4b0ec22 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -491,7 +491,7 @@ extern tree build_enumerator (location_t, location_t, struct c_enum_contents *,
 extern tree check_for_loop_decls (location_t, bool);
 extern void mark_forward_parm_decls (void);
 extern void declare_parm_level (void);
-extern void undeclared_variable (location_t, tree);
+extern void undeclared_variable (source_range, tree);
 extern tree lookup_label_for_goto (location_t, tree);
 extern tree declare_label (tree);
 extern tree define_label (location_t, tree);
@@ -595,7 +595,7 @@ extern void mark_exp_read (tree);
 extern tree composite_type (tree, tree);
 extern tree build_component_ref (location_t, tree, tree);
 extern tree build_array_ref (location_t, tree, tree);
-extern tree build_external_ref (location_t, tree, int, tree *);
+extern tree build_external_ref (source_range, tree, int, tree *);
 extern void pop_maybe_used (bool);
 extern struct c_expr c_expr_sizeof_expr (location_t, struct c_expr);
 extern struct c_expr c_expr_sizeof_type (location_t, struct c_type_name *);
@@ -636,7 +636,7 @@ extern tree c_finish_stmt_expr (location_t, tree);
 extern tree c_process_expr_stmt (location_t, tree);
 extern tree c_finish_expr_stmt (location_t, tree);
 extern tree c_finish_return (location_t, tree, tree);
-extern tree c_finish_bc_stmt (location_t, tree *, bool);
+extern tree c_finish_bc_stmt (source_range tok_range, tree *, bool);
 extern tree c_finish_goto_label (location_t, tree);
 extern tree c_finish_goto_ptr (location_t, tree);
 extern tree c_expr_to_decl (tree, bool *, bool *);
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index dc22396..a755a7e 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -2587,7 +2587,7 @@ build_array_ref (location_t loc, tree array, tree index)
    for CONST_DECLs defined as enum constants.  If the type of the
    identifier is not available, *TYPE is set to NULL.  */
 tree
-build_external_ref (location_t loc, tree id, int fun, tree *type)
+build_external_ref (source_range loc, tree id, int fun, tree *type)
 {
   tree ref;
   tree decl = lookup_name (id);
@@ -2604,7 +2604,7 @@ build_external_ref (location_t loc, tree id, int fun, tree *type)
     }
   else if (fun)
     /* Implicit function declaration.  */
-    ref = implicitly_declare (loc, id);
+    ref = implicitly_declare (loc.m_start, id);
   else if (decl == error_mark_node)
     /* Don't complain about something that's already been
        complained about.  */
@@ -2674,7 +2674,7 @@ build_external_ref (location_t loc, tree id, int fun, tree *type)
 	   && (!VAR_P (ref) || TREE_STATIC (ref))
 	   && ! TREE_PUBLIC (ref)
 	   && DECL_CONTEXT (ref) != current_function_decl)
-    record_inline_static (loc, current_function_decl, ref,
+    record_inline_static (loc.m_start, current_function_decl, ref,
 			  csi_internal);
 
   return ref;
@@ -9873,7 +9873,7 @@ c_finish_loop (location_t start_locus, tree cond, tree incr, tree body,
 }
 
 tree
-c_finish_bc_stmt (location_t loc, tree *label_p, bool is_break)
+c_finish_bc_stmt (source_range tok_range, tree *label_p, bool is_break)
 {
   bool skip;
   tree label = *label_p;
@@ -9890,7 +9890,7 @@ c_finish_bc_stmt (location_t loc, tree *label_p, bool is_break)
   if (!label)
     {
       if (!skip)
-	*label_p = label = create_artificial_label (loc);
+	*label_p = label = create_artificial_label (tok_range.m_start);
     }
   else if (TREE_CODE (label) == LABEL_DECL)
     ;
@@ -9898,21 +9898,23 @@ c_finish_bc_stmt (location_t loc, tree *label_p, bool is_break)
     {
     case 0:
       if (is_break)
-	error_at (loc, "break statement not within loop or switch");
+	error_at (tok_range, "break statement not within loop or switch");
       else
-	error_at (loc, "continue statement not within a loop");
+	error_at (tok_range, "continue statement not within a loop");
       return NULL_TREE;
 
     case 1:
       gcc_assert (is_break);
-      error_at (loc, "break statement used with OpenMP for loop");
+      error_at (tok_range, "break statement used with OpenMP for loop");
       return NULL_TREE;
 
     case 2:
       if (is_break) 
-	error ("break statement within %<#pragma simd%> loop body");
+	error_at (tok_range,
+		  "break statement within %<#pragma simd%> loop body");
       else 
-	error ("continue statement within %<#pragma simd%> loop body");
+	error_at (tok_range,
+		  "continue statement within %<#pragma simd%> loop body");
       return NULL_TREE;
 
     default:
diff --git a/gcc/testsuite/gcc.dg/diagnostic-token-ranges.c b/gcc/testsuite/gcc.dg/diagnostic-token-ranges.c
new file mode 100644
index 0000000..6bd9e0b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/diagnostic-token-ranges.c
@@ -0,0 +1,109 @@
+/* { dg-options "-fdiagnostics-show-caret -Wc++-compat" } */
+
+/* Verify that various diagnostics show source code ranges.  */
+
+/* These ones merely use token ranges; they don't use tree ranges.  */
+
+void undeclared_identifier (void)
+{
+  name; /* { dg-error "'name' undeclared" } */
+/*
+{ dg-begin-multiline-output "" }
+   name;
+   ^~~~
+{ dg-end-multiline-output "" }
+*/
+}
+
+void unknown_type_name (void)
+{
+  foo bar; /* { dg-error "unknown type name 'foo'" } */
+/*
+{ dg-begin-multiline-output "" }
+   foo bar;
+   ^~~
+{ dg-end-multiline-output "" }
+*/
+
+  qux *baz; /* { dg-error "unknown type name 'qux'" } */
+/*
+{ dg-begin-multiline-output "" }
+   qux *baz;
+   ^~~
+{ dg-end-multiline-output "" }
+*/
+}
+
+void test_identifier_conflicts_with_cplusplus (void)
+{
+  int new; /* { dg-warning "identifier 'new' conflicts with" } */
+/*
+{ dg-begin-multiline-output "" }
+   int new;
+       ^~~
+{ dg-end-multiline-output "" }
+*/
+}
+
+extern void
+bogus_varargs (...); /* { dg-error "ISO C requires a named argument before '...'" } */
+/*
+{ dg-begin-multiline-output "" }
+ bogus_varargs (...);
+                ^~~
+{ dg-end-multiline-output "" }
+*/
+
+extern void
+foo (unknown_type param); /* { dg-error "unknown type name 'unknown_type'" } */
+/*
+{ dg-begin-multiline-output "" }
+ foo (unknown_type param);
+      ^~~~~~~~~~~~
+{ dg-end-multiline-output "" }
+*/
+
+void wide_string_literal_in_asm (void)
+{
+  asm (L"nop"); /* { dg-error "wide string literal in 'asm'" } */
+/*
+{ dg-begin-multiline-output "" }
+   asm (L"nop");
+        ^~~~~~
+{ dg-end-multiline-output "" }
+*/
+}
+
+void label_in_front_of_decl (void)
+{
+ label:
+  int i; /* { dg-error "a label can only be part of a statement and a declaration is not a statement" } */
+/*
+{ dg-begin-multiline-output "" }
+  label:
+  ~~~~~
+   int i;
+   ^~~
+{ dg-end-multiline-output "" }
+*/
+  return;
+}
+
+void break_and_continue_in_wrong_places (void)
+{
+  if (0)
+    break; /* { dg-error "break statement not within loop or switch" } */
+/* { dg-begin-multiline-output "" }
+     break;
+     ^~~~~
+   { dg-end-multiline-output "" } */
+
+  if (1)
+    ;
+  else
+    continue; /* { dg-error "continue statement not within a loop" } */
+/* { dg-begin-multiline-output "" }
+     continue;
+     ^~~~~~~~
+    { dg-end-multiline-output "" } */
+}
-- 
1.8.5.3

* [PATCH 02/22] Testsuite: add dg-{begin|end}-multiline-output commands
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (2 preceding siblings ...)
  2015-09-10 20:13 ` [PATCH 08/22] C frontend: use token ranges in various diagnostics David Malcolm
@ 2015-09-10 20:13 ` David Malcolm
  2015-09-14 19:35   ` Jeff Law
  2015-09-10 20:13 ` [PATCH 10/22] C++ FE: Use token ranges for various diagnostics David Malcolm
                   ` (18 subsequent siblings)
  22 siblings, 1 reply; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

This patch adds an easy way to write tests for expected multiline
output.  For example we can test carets and underlines for
a particular diagnostic with:

/* { dg-begin-multiline-output "" }
 typedef struct _GMutex GMutex;
                ^~~~~~~
   { dg-end-multiline-output "" } */

It is used extensively by the rest of the patch kit.
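
The matching rule can be sketched as follows.  This is a simplified model
written in C purely for illustration; the real implementation is the Tcl in
multiline.exp, which builds a regex and also handles a right-margin case that
is omitted here.  The function names line_matches and find_multiline are
hypothetical.  Expected lines ending in '^' or '~' are treated as annotation
lines and must match exactly; any other expected line only has to match the
beginning of the output line, so that trailing DejaGnu comments are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if the output line ACTUAL satisfies EXPECTED.
   Annotation lines (ending in '^' or '~') must match exactly; quoted
   source lines need only match as a prefix.  */
static int
line_matches (const char *actual, const char *expected)
{
  size_t len = strlen (expected);
  if (len > 0 && (expected[len - 1] == '^' || expected[len - 1] == '~'))
    return strcmp (actual, expected) == 0;
  return strncmp (actual, expected, len) == 0;
}

/* Return the index of the first line in OUTPUT at which all N_EXPECTED
   lines of EXPECTED match consecutively, or -1 if there is no such
   position.  */
int
find_multiline (const char *const *output, size_t n_output,
		const char *const *expected, size_t n_expected)
{
  assert (output != NULL && expected != NULL);
  if (n_expected == 0 || n_expected > n_output)
    return -1;

  for (size_t i = 0; i + n_expected <= n_output; i++)
    {
      size_t j = 0;
      while (j < n_expected && line_matches (output[i + j], expected[j]))
	j++;
      if (j == n_expected)
	return (int) i;
    }
  return -1;
}
```

In the harness itself a successful match is pruned from the compiler output
and reported as a PASS; a missing match is reported as a FAIL.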

multiline.exp is used by prune.exp; hence we need to load it before
prune.exp via *load_gcc_lib* for the testsuites of the various
non-"gcc" support libraries (e.g. boehm-gc).

Question: which ChangeLog file should the change to
  libgo/testsuite/lib/libgo.exp
go into?

gcc/testsuite/ChangeLog:
	* lib/multiline.exp: New file.
	* lib/prune.exp: Load multiline.exp.
	(prune_gcc_output): Call into multiline.exp to handle any
	multiline output directives.

./ChangeLog:
	* libgo/testsuite/lib/libgo.exp: Load multiline.exp before
	prune.exp, using load_gcc_lib.

boehm-gc/ChangeLog:
	* testsuite/lib/boehm-gc.exp: Load multiline.exp before
	prune.exp, using load_gcc_lib.

libatomic/ChangeLog:
	* testsuite/lib/libatomic.exp: Load multiline.exp before
	prune.exp, using load_gcc_lib.

libgomp/ChangeLog:
	* testsuite/lib/libgomp.exp: Load multiline.exp before prune.exp,
	using load_gcc_lib.

libitm/ChangeLog:
	* testsuite/lib/libitm.exp: Load multiline.exp before prune.exp,
	using load_gcc_lib.

libvtv/ChangeLog:
	* testsuite/lib/libvtv.exp: Load multiline.exp before prune.exp,
	using load_gcc_lib.
---
 boehm-gc/testsuite/lib/boehm-gc.exp   |   1 +
 gcc/testsuite/lib/multiline.exp       | 241 ++++++++++++++++++++++++++++++++++
 gcc/testsuite/lib/prune.exp           |   5 +
 libatomic/testsuite/lib/libatomic.exp |   1 +
 libgo/testsuite/lib/libgo.exp         |   1 +
 libgomp/testsuite/lib/libgomp.exp     |   1 +
 libitm/testsuite/lib/libitm.exp       |   1 +
 libvtv/testsuite/lib/libvtv.exp       |   1 +
 8 files changed, 252 insertions(+)
 create mode 100644 gcc/testsuite/lib/multiline.exp

diff --git a/boehm-gc/testsuite/lib/boehm-gc.exp b/boehm-gc/testsuite/lib/boehm-gc.exp
index bafe7bb..d162035 100644
--- a/boehm-gc/testsuite/lib/boehm-gc.exp
+++ b/boehm-gc/testsuite/lib/boehm-gc.exp
@@ -31,6 +31,7 @@ load_gcc_lib target-utils.exp
 # For ${tool}_exit.
 load_gcc_lib gcc-defs.exp
 # For prune_gcc_output.
+load_gcc_lib multiline.exp
 load_gcc_lib prune.exp
 
 set dg-do-what-default run
diff --git a/gcc/testsuite/lib/multiline.exp b/gcc/testsuite/lib/multiline.exp
new file mode 100644
index 0000000..eb72143
--- /dev/null
+++ b/gcc/testsuite/lib/multiline.exp
@@ -0,0 +1,241 @@
+#   Copyright (C) 2015 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# Testing of multiline output
+
+# We have pre-existing testcases like this:
+#   |typedef struct _GMutex GMutex; // { dg-message "previously declared here"}
+# (using "|" here to indicate the start of a line),
+# generating output like this:
+#   |gcc/testsuite/g++.dg/diagnostic/wrong-tag-1.C:4:16: note: 'struct _GMutex' was previously declared here
+# where the location of the dg-message determines the expected line at
+# which the error should be reported.
+#
+# To handle rich error-reporting, we want to be able to verify that we
+# get output like this:
+#   |gcc/testsuite/g++.dg/diagnostic/wrong-tag-1.C:4:16: note: 'struct _GMutex' was previously declared here
+#   | typedef struct _GMutex GMutex; // { dg-message "previously declared here"}
+#   |                ^~~~~~~
+# where the compiler's first line of output is as before, but in
+# which it then echoes the source lines, adding annotations.
+#
+# We want to be able to write testcases that verify that the
+# emitted source-and-annotations are sane.
+#
+# A complication here is that the source lines contain comments
+# containing DejaGnu directives (such as the "dg-message" above).
+#
+# We punt this somewhat by only matching the beginnings of lines,
+# so that we can write e.g.
+#   |/* { dg-begin-multiline-output "" }
+#   | typedef struct _GMutex GMutex;
+#   |                ^~~~~~~
+#   |   { dg-end-multiline-output "" } */
+# to have the testsuite verify the expected output.
+
+############################################################################
+# Global variables.  Although global, these are intended to only be used from
+# within multiline.exp.
+############################################################################
+
+# The line number of the last dg-begin-multiline-output directive.
+set _multiline_last_beginning_line -1
+
+# A list of lists of strings.
+set _multiline_expected_outputs []
+
+############################################################################
+# Exported functions.
+############################################################################
+
+# Mark the beginning of an expected multiline output
+# All lines between this and the next dg-end-multiline-output are
+# expected to be seen.
+
+proc dg-begin-multiline-output { args } {
+    global _multiline_last_beginning_line
+    verbose "dg-begin-multiline-output: args: $args" 3
+    set line [expr [lindex $args 0] + 1]
+    set _multiline_last_beginning_line $line
+}
+
+# Mark the end of an expected multiline output
+# All lines up to here since the last dg-begin-multiline-output are
+# expected to be seen.
+
+proc dg-end-multiline-output { args } {
+    global _multiline_last_beginning_line
+    verbose "dg-end-multiline-output: args: $args" 3
+    set line [expr [lindex $args 0] - 1]
+    verbose "multiline output lines: $_multiline_last_beginning_line-$line" 3
+
+    upvar 1 prog prog
+    verbose "prog: $prog" 3
+    # "prog" now contains the filename
+    # Load it and split it into lines
+
+    set lines [_get_lines $prog $_multiline_last_beginning_line $line]
+    set _multiline_last_beginning_line -1
+
+    verbose "lines: $lines" 3
+    global _multiline_expected_outputs
+    lappend _multiline_expected_outputs $lines
+    verbose "within dg-end-multiline-output: _multiline_expected_outputs: $_multiline_expected_outputs" 3
+}
+
+# Hook to be called by prune.exp's prune_gcc_output to
+# look for the expected multiline outputs, pruning them,
+# reporting PASS for those that are found, and FAIL for
+# those that weren't found.
+#
+# It returns a pruned version of its output.
+#
+# It also clears the list of expected multiline outputs.
+
+proc handle-multiline-outputs { text } {
+    global _multiline_expected_outputs
+    set index 0
+    foreach multiline $_multiline_expected_outputs {
+	verbose "  multiline: $multiline" 4
+	set rexp [_build_multiline_regex $multiline $index]
+	verbose "rexp: ${rexp}" 4
+	# Escape newlines in $rexp so that we can print them in
+	# pass/fail results.
+	set escaped_regex [string map {"\n" "\\n"} $rexp]
+	verbose "escaped_regex: ${escaped_regex}" 4
+
+	# Use "regsub" to attempt to prune the pattern from $text
+	if {[regsub -line $rexp $text "" text]} {
+	    # Success; the multiline pattern was pruned.
+	    pass "expected multiline pattern $index was found: \"$escaped_regex\""
+	} else {
+	    fail "expected multiline pattern $index not found: \"$escaped_regex\""
+	}
+
+	set index [expr $index + 1]
+    }
+
+    # Clear the list of expected multiline outputs
+    set _multiline_expected_outputs []
+
+    return $text
+}
+
+############################################################################
+# Internal functions
+############################################################################
+
+# Load FILENAME and extract the lines from FIRST_LINE
+# to LAST_LINE (inclusive) as a list of strings.
+
+proc _get_lines { filename first_line last_line } {
+    verbose "_get_lines" 3
+    verbose "  filename: $filename" 3
+    verbose "  first_line: $first_line" 3
+    verbose "  last_line: $last_line" 3
+
+    set fp [open $filename r]
+    set file_data [read $fp]
+    close $fp
+    set data [split $file_data "\n"]
+    set linenum 1
+    set lines []
+    foreach line $data {
+	verbose "line $linenum: $line" 4
+	if { $linenum >= $first_line && $linenum <= $last_line } {
+	    lappend lines $line
+	}
+	set linenum [expr $linenum + 1]
+    }
+
+    return $lines
+}
+
+# Convert $multiline from a list of strings to a multiline regex
+# We need to support matching arbitrary followup text on each line,
+# to deal with comments containing DejaGnu directives.
+
+proc _build_multiline_regex { multiline index } {
+    verbose "_build_multiline_regex: $multiline $index" 4
+
+    set rexp ""
+    foreach line $multiline {
+	verbose "  line: $line" 4
+
+	# We need to escape "^" and other regexp metacharacters.
+	set line [string map {"^" "\\^"
+	                      "(" "\\("
+	                      ")" "\\)"
+	                      "[" "\\["
+	                      "]" "\\]"
+	                      "." "\\."
+	                      "\\" "\\\\"
+	                      "?" "\\?"
+	                      "+" "\\+"
+	                      "*" "\\*"
+	                      "|" "\\|"} $line]
+
+	append rexp $line
+	if {[string match "*^" $line] || [string match "*~" $line]} {
+	    # Assume a line containing a caret/range.  This must be
+	    # an exact match.
+	} elseif {[string match "*\\|" $line]} {
+	    # Assume a source line with a right-margin.  Support
+	    # arbitrary text in place of any whitespace before the
+	    # right-margin, to deal with comments containing
+	    # DejaGnu directives.
+
+	    # Remove final "\|":
+	    set rexp [string range $rexp 0 [expr [string length $rexp] - 3]]
+
+	    # Trim off trailing whitespace:
+	    set old_length [string length $rexp]
+	    set rexp [string trimright $rexp]
+	    set new_length [string length $rexp]
+
+	    # Replace the trimmed whitespace with "." chars to match anything:
+	    set ws [string repeat "." [expr $old_length - $new_length]]
+	    set rexp "${rexp}${ws}"
+
+	    # Add back the trailing '\|':
+	    set rexp "${rexp}\\|"
+	} else {
+	    # Assume that we have a quoted source line.
+	    # Support arbitrary followup text on each line,
+	    # to deal with comments containing DejaGnu
+	    # directives.
+	    append rexp ".*"
+	}
+	append rexp "\n"
+    }
+
+    # dg.exp's dg-test trims leading whitespace from the output
+    # in this line:
+    #   set comp_output [string trimleft $comp_output]
+    # so we can't rely on the exact leading whitespace for the
+    # first line in the *first* multiline regex.
+    #
+    # Trim leading whitespace from the regexp, replacing it with
+    # a "\s*", to match zero or more whitespace characters.
+    if { $index == 0 } {
+	set rexp [string trimleft $rexp]
+	set rexp "\\s*$rexp"
+    }
+
+    verbose "rexp: $rexp" 4
+
+    return $rexp
+}
diff --git a/gcc/testsuite/lib/prune.exp b/gcc/testsuite/lib/prune.exp
index 8e4c203..fa10043 100644
--- a/gcc/testsuite/lib/prune.exp
+++ b/gcc/testsuite/lib/prune.exp
@@ -16,6 +16,8 @@
 
 # Prune messages from gcc that aren't useful.
 
+load_lib multiline.exp
+
 if ![info exists TEST_ALWAYS_FLAGS] {
     set TEST_ALWAYS_FLAGS ""
 }
@@ -68,6 +70,9 @@ proc prune_gcc_output { text } {
     # Ignore harmless warnings from Xcode 4.0.
     regsub -all "(^|\n)\[^\n\]*ld: warning: could not create compact unwind for\[^\n\]*" $text "" text
 
+    # Call into multiline.exp to handle any multiline output directives.
+    set text [handle-multiline-outputs $text]
+
     #send_user "After:$text\n"
 
     return $text
diff --git a/libatomic/testsuite/lib/libatomic.exp b/libatomic/testsuite/lib/libatomic.exp
index 0491c18..cafab54 100644
--- a/libatomic/testsuite/lib/libatomic.exp
+++ b/libatomic/testsuite/lib/libatomic.exp
@@ -37,6 +37,7 @@ load_gcc_lib scandump.exp
 load_gcc_lib scanrtl.exp
 load_gcc_lib scantree.exp
 load_gcc_lib scanipa.exp
+load_gcc_lib multiline.exp
 load_gcc_lib prune.exp
 load_gcc_lib target-libpath.exp
 load_gcc_lib wrapper.exp
diff --git a/libgo/testsuite/lib/libgo.exp b/libgo/testsuite/lib/libgo.exp
index 7031f63..1b0f26a 100644
--- a/libgo/testsuite/lib/libgo.exp
+++ b/libgo/testsuite/lib/libgo.exp
@@ -39,6 +39,7 @@ proc load_gcc_lib { filename } {
     set loaded_libs($filename) ""
 }
 
+load_gcc_lib multiline.exp
 load_gcc_lib prune.exp
 load_gcc_lib target-libpath.exp
 load_gcc_lib wrapper.exp
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index f04b163..1040c29 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -14,6 +14,7 @@ load_lib dg.exp
 # loaded until ${tool}_target_compile is defined since it uses that
 # to determine default LTO options.
 
+load_gcc_lib multiline.exp
 load_gcc_lib prune.exp
 load_gcc_lib target-libpath.exp
 load_gcc_lib wrapper.exp
diff --git a/libitm/testsuite/lib/libitm.exp b/libitm/testsuite/lib/libitm.exp
index 1361d56..0416296 100644
--- a/libitm/testsuite/lib/libitm.exp
+++ b/libitm/testsuite/lib/libitm.exp
@@ -28,6 +28,7 @@ load_lib dg.exp
 # loaded until ${tool}_target_compile is defined since it uses that
 # to determine default LTO options.
 
+load_gcc_lib multiline.exp
 load_gcc_lib prune.exp
 load_gcc_lib target-libpath.exp
 load_gcc_lib wrapper.exp
diff --git a/libvtv/testsuite/lib/libvtv.exp b/libvtv/testsuite/lib/libvtv.exp
index aefcbd2..edf5fdd 100644
--- a/libvtv/testsuite/lib/libvtv.exp
+++ b/libvtv/testsuite/lib/libvtv.exp
@@ -28,6 +28,7 @@ load_lib dg.exp
 # loaded until ${tool}_target_compile is defined since it uses that
 # to determine default LTO options.
 
+load_gcc_lib multiline.exp
 load_gcc_lib prune.exp
 load_gcc_lib target-libpath.exp
 load_gcc_lib wrapper.exp
-- 
1.8.5.3

* [PATCH 20/22] Use rich locations in c-family/c-format.c
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (5 preceding siblings ...)
  2015-09-10 20:13 ` [PATCH 09/22] C frontend: store and use token ranges in c_declspecs David Malcolm
@ 2015-09-10 20:13 ` David Malcolm
  2015-09-10 20:13 ` [PATCH 11/22] Objective C: c/c-parser.c: use token ranges in two places David Malcolm
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

This is a proof-of-concept of

(a) using the string-literal location info to show locations of errors
in format strings, and

(b) underlining the corresponding argument where applicable (no more
having to guess what the compiler means by "argument 2")

In particular, this can handle format strings built using
concatenation, potentially split over multiple lines etc.
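
To make the idea concrete, here is a minimal sketch of finding the byte
offsets of one directive inside an already-concatenated format string.  It is
an assumption-laden simplification, not the logic in c-format.c: struct span
and find_directive are hypothetical names, only a fixed set of printf-style
conversion characters is recognized, and mapping the offsets back to source
locations (the part that needs the string-literal location info) is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Inclusive byte offsets of a directive within a format string.  */
struct span
{
  ptrdiff_t start;	/* offset of the '%' */
  ptrdiff_t end;	/* offset of the conversion character */
};

/* Find directive number WHICH (0-based) in FMT, skipping "%%", and store
   its offsets in *OUT.  A directive is taken to run from '%' up to the
   first recognized conversion character.  Return 0 on success, or -1 if
   there is no such directive or a directive is unterminated.  */
int
find_directive (const char *fmt, int which, struct span *out)
{
  static const char conv[] = "diouxXeEfFgGaAcspn";
  int seen = 0;

  assert (fmt != NULL && out != NULL);
  for (const char *p = fmt; *p != '\0'; p++)
    {
      if (*p != '%')
	continue;
      const char *start = p;
      p++;
      if (*p == '%')
	continue;		/* "%%" is a literal percent sign.  */
      while (*p != '\0' && strchr (conv, *p) == NULL)
	p++;			/* Skip flags, width, precision, length.  */
      if (*p == '\0')
	return -1;
      if (seen == which)
	{
	  out->start = start - fmt;
	  out->end = p - fmt;
	  return 0;
	}
      seen++;
    }
  return -1;
}
```

For a format written as "%i hello " "%s", the concatenated string is
"%i hello %s" and the second directive spans offsets 9 to 10, even though it
came from a separate string literal in the source.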

Screenshot:
  https://dmalcolm.fedorapeople.org/gcc/2015-09-04/ranges-in-format-string-diagnostics.html

I also attempted to add captions to the underlines, but they looked
too "busy", so I removed them for now.

In theory this could replace some of the work done by Manu on
PR c/52952 in e.g. r223470, since it adds handling of string
concatenation, but I didn't go as far as rewriting all of them yet.

gcc/c-family/ChangeLog:
	* c-format.c: Include gcc-rich-location.h.
	(check_format_arg): Pass in "format_tree" to
	check_format_info_main.
	(check_format_info_main): Add param "format_string_cst".  Generate
	a source_range and pass it to the call to check_format_types.
	(check_format_types): Replace location_t param with a
	source_range.  Generate param_range and pass it to
	format_type_warning where applicable.
	(format_type_warning): Convert first param from location_t to
	source_range, as the range of the format string, and add
	a "param_range" parameter.  Use them in the warnings.

gcc/ChangeLog:
	* gcc-rich-location.c (gcc_rich_location::set_caption): New
	method.
	* gcc-rich-location.h (gcc_rich_location::set_caption): New
	method.

gcc/testsuite/ChangeLog:
	* gcc.dg/format/diagnostic-ranges.c: New file.
---
 gcc/c-family/c-format.c                         | 115 +++++++++++++++---------
 gcc/gcc-rich-location.c                         |  17 ++++
 gcc/gcc-rich-location.h                         |   5 ++
 gcc/testsuite/gcc.dg/format/diagnostic-ranges.c | 101 +++++++++++++++++++++
 4 files changed, 195 insertions(+), 43 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/format/diagnostic-ranges.c

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index ab58076..5d8de29 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-format.h"
 #include "alloc-pool.h"
 #include "c-target.h"
+#include "gcc-rich-location.h"
 
 /* Handle attributes associated with format checking.  */
 
@@ -1023,6 +1024,7 @@ static void check_format_info (function_format_info *, tree);
 static void check_format_arg (void *, tree, unsigned HOST_WIDE_INT);
 static void check_format_info_main (format_check_results *,
 				    function_format_info *,
+				    tree,
 				    const char *, int, tree,
 				    unsigned HOST_WIDE_INT,
 				    object_allocator<format_wanted_type> &);
@@ -1036,8 +1038,12 @@ static void finish_dollar_format_checking (format_check_results *, int);
 static const format_flag_spec *get_flag_spec (const format_flag_spec *,
 					      int, const char *);
 
-static void check_format_types (location_t, format_wanted_type *);
-static void format_type_warning (location_t, format_wanted_type *, tree, tree);
+static void check_format_types (source_range fmt_source_range,
+				format_wanted_type *);
+static void format_type_warning (source_range fmt_source_range,
+				 source_range *param_range,
+				 format_wanted_type *, tree,
+				 tree);
 
 /* Decode a format type from a string, returning the type, or
    format_type_error if not valid, in which case the caller should print an
@@ -1689,12 +1695,13 @@ check_format_arg (void *ctx, tree format_tree,
   res->number_other++;
   object_allocator <format_wanted_type> fwt_pool ("format_wanted_type pool",
 						  10);
-  check_format_info_main (res, info, format_chars, format_length,
+  check_format_info_main (res, info, format_tree, format_chars, format_length,
 			  params, arg_num, fwt_pool);
 }
 
 
-/* Do the main part of checking a call to a format function.  FORMAT_CHARS
+/* Do the main part of checking a call to a format function.
+   FORMAT_STRING_CST is the STRING_CST format string.  FORMAT_CHARS
    is the NUL-terminated format string (which at this point may contain
    internal NUL characters); FORMAT_LENGTH is its length (excluding the
    terminating NUL character).  ARG_NUM is one less than the number of
@@ -1703,7 +1710,9 @@ check_format_arg (void *ctx, tree format_tree,
 
 static void
 check_format_info_main (format_check_results *res,
-			function_format_info *info, const char *format_chars,
+			function_format_info *info,
+			tree format_string_cst,
+			const char *format_chars,
 			int format_length, tree params,
 			unsigned HOST_WIDE_INT arg_num,
 			object_allocator <format_wanted_type> &fwt_pool)
@@ -1763,6 +1772,7 @@ check_format_info_main (format_check_results *res,
 	  ++format_chars;
 	  continue;
 	}
+      const char *start_of_this_format = format_chars;
       flag_chars[0] = 0;
 
       if ((fki->flags & (int) FMT_FLAG_USE_DOLLAR) && has_operand_number != 0)
@@ -2426,7 +2436,17 @@ check_format_info_main (format_check_results *res,
 	}
 
       if (first_wanted_type != 0)
-        check_format_types (format_string_loc, first_wanted_type);
+	{
+	  ptrdiff_t offset_to_format_start = (start_of_this_format - 1) - orig_format_chars;
+	  ptrdiff_t offset_to_format_end = (format_chars - 1) - orig_format_chars;
+	  cpp_string_location *strloc
+	    = TREE_STRING_LOCATION (format_string_cst);
+	  gcc_assert (strloc);
+	  source_range fmt_source_range
+	    = strloc->get_range_between_indices (offset_to_format_start,
+						 offset_to_format_end);
+	  check_format_types (fmt_source_range, first_wanted_type);
+	}
     }
 
   if (format_chars - orig_format_chars != format_length)
@@ -2446,10 +2466,11 @@ check_format_info_main (format_check_results *res,
 
 
 /* Check the argument types from a single format conversion (possibly
-   including width and precision arguments).  LOC is the location of
-   the format string.  */
+   including width and precision arguments).  FMT_SOURCE_RANGE is the
+   source range of this conversion within the format string.  */
 static void
-check_format_types (location_t loc, format_wanted_type *types)
+check_format_types (source_range fmt_source_range,
+		    format_wanted_type *types)
 {
   for (; types != 0; types = types->next)
     {
@@ -2476,7 +2497,7 @@ check_format_types (location_t loc, format_wanted_type *types)
       cur_param = types->param;
       if (!cur_param)
         {
-          format_type_warning (loc, types, wanted_type, NULL);
+          format_type_warning (fmt_source_range, NULL, types, wanted_type, NULL);
           continue;
         }
 
@@ -2486,6 +2507,7 @@ check_format_types (location_t loc, format_wanted_type *types)
       orig_cur_type = cur_type;
       char_type_flag = 0;
 
+      source_range param_range = EXPR_LOCATION_RANGE (cur_param);
       STRIP_NOPS (cur_param);
 
       /* Check the types of any additional pointer arguments
@@ -2550,7 +2572,8 @@ check_format_types (location_t loc, format_wanted_type *types)
 	    }
 	  else
 	    {
-              format_type_warning (loc, types, wanted_type, orig_cur_type);
+              format_type_warning (fmt_source_range, &param_range, types,
+				   wanted_type, orig_cur_type);
 	      break;
 	    }
 	}
@@ -2618,20 +2641,24 @@ check_format_types (location_t loc, format_wanted_type *types)
 	  && TYPE_PRECISION (cur_type) == TYPE_PRECISION (wanted_type))
 	continue;
       /* Now we have a type mismatch.  */
-      format_type_warning (loc, types, wanted_type, orig_cur_type);
+      format_type_warning (fmt_source_range, &param_range, types, wanted_type,
+			   orig_cur_type);
     }
 }
 
 
-/* Give a warning at LOC about a format argument of different type from that
-   expected.  WANTED_TYPE is the type the argument should have, possibly
-   stripped of pointer dereferences.  The description (such as "field
+/* Give a warning at FMT_SOURCE_RANGE about a format argument of different type
+   from that expected.  If non-NULL, PARAM_RANGE is the source range of the
+   relevant argument.  WANTED_TYPE is the type the argument should have,
+   possibly stripped of pointer dereferences.  The description (such as "field
    precision"), the placement in the format string, a possibly more
    friendly name of WANTED_TYPE, and the number of pointer dereferences
    are taken from TYPE.  ARG_TYPE is the type of the actual argument,
    or NULL if it is missing.  */
 static void
-format_type_warning (location_t loc, format_wanted_type *type,
+format_type_warning (source_range fmt_source_range,
+		     source_range *param_range,
+		     format_wanted_type *type,
 		     tree wanted_type, tree arg_type)
 {
   int kind = type->kind;
@@ -2640,7 +2667,6 @@ format_type_warning (location_t loc, format_wanted_type *type,
   int format_length = type->format_length;
   int pointer_count = type->pointer_count;
   int arg_num = type->arg_num;
-  unsigned int offset_loc = type->offset_loc;
 
   char *p;
   /* If ARG_TYPE is a typedef with a misleading name (for example,
@@ -2674,41 +2700,44 @@ format_type_warning (location_t loc, format_wanted_type *type,
       p[pointer_count + 1] = 0;
     }
 
-  loc = location_from_offset (loc, offset_loc);
-		      
+  gcc_rich_location richloc (fmt_source_range);
+
+  if (param_range)
+    richloc.add_range (*param_range);
+
   if (wanted_type_name)
     {
       if (arg_type)
-        warning_at (loc, OPT_Wformat_,
-		    "%s %<%s%.*s%> expects argument of type %<%s%s%>, "
-		    "but argument %d has type %qT",
-		    gettext (kind_descriptions[kind]),
-		    (kind == CF_KIND_FORMAT ? "%" : ""),
-		    format_length, format_start, 
-		    wanted_type_name, p, arg_num, arg_type);
+	warning_at_rich_loc (&richloc, OPT_Wformat_,
+			     "%s %<%s%.*s%> expects argument of type %<%s%s%>, "
+			     "but argument %d has type %qT",
+			     gettext (kind_descriptions[kind]),
+			     (kind == CF_KIND_FORMAT ? "%" : ""),
+			     format_length, format_start,
+			     wanted_type_name, p, arg_num, arg_type);
       else
-        warning_at (loc, OPT_Wformat_,
-		    "%s %<%s%.*s%> expects a matching %<%s%s%> argument",
-		    gettext (kind_descriptions[kind]),
-		    (kind == CF_KIND_FORMAT ? "%" : ""),
-		    format_length, format_start, wanted_type_name, p);
+	warning_at_rich_loc (&richloc, OPT_Wformat_,
+			     "%s %<%s%.*s%> expects a matching %<%s%s%> argument",
+			     gettext (kind_descriptions[kind]),
+			     (kind == CF_KIND_FORMAT ? "%" : ""),
+			     format_length, format_start, wanted_type_name, p);
     }
   else
     {
       if (arg_type)
-        warning_at (loc, OPT_Wformat_,
-		    "%s %<%s%.*s%> expects argument of type %<%T%s%>, "
-		    "but argument %d has type %qT",
-		    gettext (kind_descriptions[kind]),
-		    (kind == CF_KIND_FORMAT ? "%" : ""),
-		    format_length, format_start, 
-		    wanted_type, p, arg_num, arg_type);
+	warning_at_rich_loc (&richloc, OPT_Wformat_,
+			     "%s %<%s%.*s%> expects argument of type %<%T%s%>, "
+			     "but argument %d has type %qT",
+			     gettext (kind_descriptions[kind]),
+			     (kind == CF_KIND_FORMAT ? "%" : ""),
+			     format_length, format_start,
+			     wanted_type, p, arg_num, arg_type);
       else
-        warning_at (loc, OPT_Wformat_,
-		    "%s %<%s%.*s%> expects a matching %<%T%s%> argument",
-		    gettext (kind_descriptions[kind]),
-		    (kind == CF_KIND_FORMAT ? "%" : ""),
-		    format_length, format_start, wanted_type, p);
+	warning_at_rich_loc (&richloc, OPT_Wformat_,
+			     "%s %<%s%.*s%> expects a matching %<%T%s%> argument",
+			     gettext (kind_descriptions[kind]),
+			     (kind == CF_KIND_FORMAT ? "%" : ""),
+			     format_length, format_start, wanted_type, p);
     }
 }
 
diff --git a/gcc/gcc-rich-location.c b/gcc/gcc-rich-location.c
index ae0b3bb..dffa047 100644
--- a/gcc/gcc-rich-location.c
+++ b/gcc/gcc-rich-location.c
@@ -67,6 +67,23 @@ get_range_for_expr (tree expr, location_range *r)
    by translating and formatting GMSGID and any variadic args.  */
 
 void
+gcc_rich_location::set_caption (diagnostic_context *context,
+				const char *gmsgid, ...)
+{
+  gcc_assert (context);
+  gcc_assert (gmsgid);
+  gcc_assert (m_num_ranges > 0);
+
+  va_list ap;
+  va_start (ap, gmsgid);
+
+  free (m_ranges[0].m_caption);
+  m_ranges[0].m_caption = expand_caption_va (context, gmsgid, &ap);
+
+  va_end (ap);
+}
+
+void
 gcc_rich_location::add_range_with_caption (location_t start, location_t finish,
 					   diagnostic_context *context,
 					   const char *gmsgid, ...)
diff --git a/gcc/gcc-rich-location.h b/gcc/gcc-rich-location.h
index a7822ac..4a07bb9 100644
--- a/gcc/gcc-rich-location.h
+++ b/gcc/gcc-rich-location.h
@@ -39,6 +39,11 @@ class gcc_rich_location : public rich_location
   /* Methods for adding additional details.  */
 
   void
+  set_caption (diagnostic_context *context,
+	       const char *gmsgid, ...)
+    ATTRIBUTE_GCC_DIAG(3,4);
+
+  void
   add_range_with_caption (location_t start, location_t finish,
 			  diagnostic_context *context,
 			  const char *gmsgid, ...)
diff --git a/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c b/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c
new file mode 100644
index 0000000..f890f6a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/format/diagnostic-ranges.c
@@ -0,0 +1,101 @@
+/* { dg-options "-Wformat -fdiagnostics-show-caret" } */
+
+/* See PR 52952. */
+
+extern int printf (__const char *__restrict __format, ...);
+
+void test_mismatching_types (const char *msg)
+{
+  printf("hello %i", msg);  /* { dg-warning "format '%i' expects argument of type 'int', but argument 2 has type 'const char \\*' " } */
+
+/* { dg-begin-multiline-output "" }
+   printf("hello %i", msg);
+                 ^~   ~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_multiple_arguments (void)
+{
+  printf ("arg0: %i  arg1: %s arg 2: %i", /* { dg-warning "format '%s'" } */
+          100, 101, 102);
+/* { dg-begin-multiline-output "" }
+   printf ("arg0: %i  arg1: %s arg 2: %i",
+                            ^~
+           100, 101, 102);
+                ~~~          
+   { dg-end-multiline-output "" } */
+/* FIXME: why the trailing whitespace in the line above?  */
+}
+
+void multiline_format_string (void) {
+  printf ("before the fmt specifier"
+          "%"  /* { dg-warning "format '%d' expects a matching 'int' argument" } */
+          "d"
+          "after the fmt specifier");
+
+/* { dg-begin-multiline-output "" }
+           "%"
+            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+           "d"
+           ~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_hex (const char *msg)
+{
+  /* "%" is \x25
+     "i" is \x69 */
+  printf("hello \x25\x69", msg);  /* { dg-warning "format '%i' expects argument of type 'int', but argument 2 has type 'const char \\*' " } */
+
+/* { dg-begin-multiline-output "" }
+   printf("hello \x25\x69", msg);
+                 ^~~~~~~~   ~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_oct (const char *msg)
+{
+  /* "%" is octal 045
+     "i" is octal 151.  */
+  printf("hello \045\151", msg);  /* { dg-warning "format '%i' expects argument of type 'int', but argument 2 has type 'const char \\*' " } */
+
+/* { dg-begin-multiline-output "" }
+   printf("hello \045\151", msg);
+                 ^~~~~~~~   ~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_multiple (const char *msg)
+{
+  /* "%" is \x25 in hex
+     "i" is \151 in octal.  */
+  printf("prefix"  "\x25"  "\151"  "suffix",  /* { dg-warning "format '%i'" } */
+         msg);
+
+/* { dg-begin-multiline-output "" }
+   printf("prefix"  "\x25"  "\151"  "suffix",
+                     ^~~~~~~~~~~~
+          msg);
+          ~~~         
+  { dg-end-multiline-output "" } */
+/* FIXME: why the trailing whitespace in the line above?  */
+}
+
+void test_u8 (const char *msg)
+{
+  printf(u8"hello %i", msg);/* { dg-warning "format '%i' expects argument of type 'int', but argument 2 has type 'const char \\*' " } */
+/* { dg-begin-multiline-output "" }
+   printf(u8"hello %i", msg);
+                   ^~   ~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_spurious_percent (void)
+{
+  printf("hello world %"); /* { dg-warning "spurious trailing" } */
+
+/* { dg-begin-multiline-output "" }
+   printf("hello world %");
+                       ^
+   { dg-end-multiline-output "" } */
+}
-- 
1.8.5.3


* [PATCH 06/22] PR/62314: add ability to add fixit-hints
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
  2015-09-10 20:12 ` [PATCH 01/22] Change of location_get_source_line signature David Malcolm
@ 2015-09-10 20:13 ` David Malcolm
  2015-09-10 20:13 ` [PATCH 08/22] C frontend: use token ranges in various diagnostics David Malcolm
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

This patch adds the ability to add "fix-it hints" to a rich_location,
which will be displayed when the corresponding diagnostic is printed.

It does not itself make the compiler emit any fix-it hints (that comes
in a later patch), but it adds test coverage of the machinery and
printing, using the existing diagnostic_plugin_test_show_locus to inject
some meaningless fix-it hints and verify the output.
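
To illustrate the intended rendering, here is a minimal standalone sketch of
how hints are laid out on the line(s) below the annotation line. The "hint"
struct and "render_hints" function are simplified stand-ins invented for this
example (they are not the classes added by the patch), columns are treated as
the number of characters preceding the hint on its output line, and every
hint is assumed to touch a single line.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the fixit_hint classes.
struct hint
{
  enum kind { INSERT, REMOVE, REPLACE } m_kind;
  int m_start_column;   // Characters preceding the hint on its line.
  int m_finish_column;  // Last column affected (REMOVE only).
  std::string m_text;   // New content (INSERT and REPLACE only).
};

// Pad OUT with spaces so that the next character lands at DEST_COLUMN,
// starting a new line first if we are already past it.
static void
move_to_column (std::string &out, int *column, int dest_column)
{
  if (*column > dest_column)
    {
      out += '\n';
      *column = 0;
    }
  while (*column < dest_column)
    {
      out += ' ';
      ++*column;
    }
}

// Render HINTS in order, sharing a line where possible.
std::string
render_hints (const std::vector<hint> &hints)
{
  std::string out;
  int column = 0;
  for (std::size_t i = 0; i < hints.size (); i++)
    {
      const hint &h = hints[i];
      move_to_column (out, &column, h.m_start_column);
      if (h.m_kind == hint::REMOVE)
        {
          assert (h.m_finish_column >= h.m_start_column);
          // One '-' per removed column, advancing the shared COLUMN.
          for (; column <= h.m_finish_column; column++)
            out += '-';
        }
      else
        {
          out += h.m_text;
          column += static_cast<int> (h.m_text.size ());
        }
    }
  return out;
}
```

With two insertions of "{" and "}" at columns 20 and 24 this produces the
"{   }" line seen in the insertion test; a hint that starts to the left of
where the previous one ended is pushed onto a new line.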

gcc/ChangeLog:
	PR/62314
	* diagnostic-show-locus.c (colorizer::set_fixit_hint): New.
	(layout::move_to_column): New method.
	(layout::print_line): Print any fixit hints affecting the line.
	For now, add nasty linker kludge for the sake of
	diagnostic_plugin_test_show_locus.

gcc/testsuite/ChangeLog:
	PR/62314
	* gcc.dg/plugin/diagnostic-test-show-locus-ascii-bw.c
	(test_fixit_insert): New.
	(test_fixit_remove): New.
	(test_fixit_replace): New.
	* gcc.dg/plugin/diagnostic-test-show-locus-ascii-color.c
	(test_fixit_insert): New.
	(test_fixit_remove): New.
	(test_fixit_replace): New.
	* gcc.dg/plugin/diagnostic-test-show-locus-utf-8-bw.c
	(test_fixit_insert): New.
	(test_fixit_remove): New.
	(test_fixit_replace): New.
	* gcc.dg/plugin/diagnostic-test-show-locus-utf-8-color.c
	(test_fixit_insert): New.
	(test_fixit_remove): New.
	(test_fixit_replace): New.
	* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
	(test_show_locus): Add tests of rendering fixit hints.

libcpp/ChangeLog:
	PR/62314
	* include/line-map.h (source_range::intersects_line_p): New
	method.
	(rich_location::add_fixit_insert): New method.
	(rich_location::add_fixit_remove): New method.
	(rich_location::add_fixit_replace): New method.
	(rich_location::get_num_fixit_hints): New accessor.
	(rich_location::get_fixit_hint): New accessor.
	(rich_location::MAX_FIXIT_HINTS): New constant.
	(rich_location::m_num_fixit_hints): New field.
	(rich_location::m_fixit_hints): New field.
	(class fixit_hint): New class.
	(class fixit_insert): New class.
	(class fixit_remove): New class.
	(class fixit_replace): New class.
	* line-map.c (source_range::intersects_line_p): New method.
	(rich_location::rich_location): Add initialization of
	m_num_fixit_hints to both ctors.
	(rich_location::~rich_location): Delete the fixit hints.
	(rich_location::add_fixit_insert): New method.
	(rich_location::add_fixit_remove): New method.
	(rich_location::add_fixit_replace): New method.
	(fixit_insert::fixit_insert): New.
	(fixit_insert::~fixit_insert): New.
	(fixit_insert::affects_line_p): New.
	(fixit_remove::fixit_remove): New.
	(fixit_remove::affects_line_p): New.
	(fixit_replace::fixit_replace): New.
	(fixit_replace::~fixit_replace): New.
	(fixit_replace::affects_line_p): New.
---
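For reviewers: source_range::intersects_line_p, listed in the ChangeLog
above, is what decides whether a removal or replacement hint is printed under
a given source line. A minimal standalone sketch of the same test follows.
The simple_loc and simple_range types are hypothetical stand-ins for expanded
locations, and the sketch compares file names by value, whereas the patch
compares the "const char *" pointers directly, which presumably relies on the
line-map code returning the same pointer for the same file.

```cpp
#include <cassert>
#include <string>

// Hypothetical expanded location: a file name plus a 1-based line number.
struct simple_loc
{
  std::string m_file;
  int m_line;
};

// Hypothetical stand-in for source_range, using expanded locations.
struct simple_range
{
  simple_loc m_start;
  simple_loc m_finish;

  // Is there any part of this range on the given line of FILE?
  bool intersects_line_p (const std::string &file, int line) const
  {
    if (file != m_start.m_file || file != m_finish.m_file)
      return false;
    assert (m_start.m_line <= m_finish.m_line);
    return line >= m_start.m_line && line <= m_finish.m_line;
  }
};
```

A range spanning lines 3 to 5 of a file intersects lines 3, 4 and 5 of that
file and nothing else.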
 gcc/diagnostic-show-locus.c                        | 111 +++++++++++++++++-
 .../plugin/diagnostic-test-show-locus-ascii-bw.c   |  43 +++++++
 .../diagnostic-test-show-locus-ascii-color.c       |  43 +++++++
 .../plugin/diagnostic-test-show-locus-utf-8-bw.c   |  43 +++++++
 .../diagnostic-test-show-locus-utf-8-color.c       |  43 +++++++
 .../plugin/diagnostic_plugin_test_show_locus.c     |  35 ++++++
 libcpp/include/line-map.h                          |  93 +++++++++++++++
 libcpp/line-map.c                                  | 130 ++++++++++++++++++++-
 8 files changed, 538 insertions(+), 3 deletions(-)

diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c
index 9216c4c..0f58f6c 100644
--- a/gcc/diagnostic-show-locus.c
+++ b/gcc/diagnostic-show-locus.c
@@ -271,6 +271,7 @@ class colorizer
 
   void set_range (int range_idx) { set_state (range_idx); }
   void set_normal_text () { set_state (STATE_NORMAL_TEXT); }
+  void set_fixit_hint () { set_state (0); }
 
  private:
   void set_state (int state);
@@ -353,6 +354,9 @@ class layout
   void
   print_any_margin (int line, int column, enum print_line_kind kind);
 
+  void
+  move_to_column (int *column, int dest_column);
+
  private:
   diagnostic_context *m_context;
   pretty_printer *m_pp;
@@ -635,7 +639,10 @@ layout::layout (diagnostic_context * context,
    describing the ranges.
 
    Both lines (1) and (2) may contain a right-most margin containing a
-   vertical bar and a caption, describing a range.  */
+   vertical bar and a caption, describing a range.
+
+   In addition, if there are fixit hints affecting this source line,
+   there will be one or more further lines printed, showing them.  */
 
 void
 layout::print_line (int row)
@@ -749,6 +756,87 @@ layout::print_line (int row)
     }
   print_any_margin (row, x_bound, PRINT_LINE_KIND_ANNOTATION);
   pp_newline (m_pp);
+
+  /* Step 3: if there are any fixit hints on this source line, print them.
+     They are printed in order, attempting to combine them onto lines, but
+     starting new lines if necessary.  */
+  column = 0;
+
+  for (unsigned int i = 0; i < m_richloc->get_num_fixit_hints (); i++)
+    {
+      fixit_hint *hint = m_richloc->get_fixit_hint (i);
+      if (hint->affects_line_p (m_exploc.file, row))
+	{
+	  /* For now we assume each fixit hint can only touch one line.  */
+	  switch (hint->get_kind ())
+	    {
+	    case fixit_hint::INSERT:
+	      {
+		fixit_insert *insert = static_cast <fixit_insert *> (hint);
+		/* This assumes the insertion just affects one line.  */
+		int start_column
+		  = LOCATION_COLUMN (insert->get_location ());
+		move_to_column (&column, start_column);
+		m_colorizer.set_fixit_hint ();
+		pp_string (m_pp, insert->get_string ());
+		m_colorizer.set_normal_text ();
+		column += insert->get_length ();
+	      }
+	      break;
+
+	    case fixit_hint::REMOVE:
+	      {
+		fixit_remove *remove = static_cast <fixit_remove *> (hint);
+		/* This assumes the removal just affects one line.  */
+		source_range src_range = remove->get_range ();
+		int start_column = LOCATION_COLUMN (src_range.m_start);
+		int finish_column = LOCATION_COLUMN (src_range.m_finish);
+		move_to_column (&column, start_column);
+		for (; column <= finish_column; column++)
+		  {
+		    m_colorizer.set_fixit_hint ();
+		    pp_character (m_pp, '-');
+		    m_colorizer.set_normal_text ();
+		  }
+	      }
+	      break;
+
+	    case fixit_hint::REPLACE:
+	      {
+		fixit_replace *replace = static_cast <fixit_replace *> (hint);
+		int start_column
+		  = LOCATION_COLUMN (replace->get_range ().m_start);
+		move_to_column (&column, start_column);
+		m_colorizer.set_fixit_hint ();
+		pp_string (m_pp, replace->get_string ());
+		m_colorizer.set_normal_text ();
+		column += replace->get_length ();
+	      }
+	      break;
+
+	    default:
+	      gcc_unreachable ();
+	    }
+	}
+    }
+
+  /* Nasty workaround to convince the linker to add
+      rich_location::add_fixit_insert
+      rich_location::add_fixit_remove
+      rich_location::add_fixit_replace
+     to cc1 for use by diagnostic_plugin_test_show_locus,
+     before anything in cc1 is using them.
+
+     This conditional should never hold, but hopefully the compiler can't
+     figure that out.  */
+  if (0 == strcmp ("grotesque linking kludge", g_line_art.default_caret))
+    {
+      m_richloc->add_fixit_insert (UNKNOWN_LOCATION, "");
+      m_richloc->add_fixit_remove
+	(source_range::from_location (UNKNOWN_LOCATION));
+      m_richloc->add_fixit_replace
+	(source_range::from_location (UNKNOWN_LOCATION), "");
+    }
 }
 
 /* Given a line of source code, get the per_range_info for the first range
@@ -825,6 +913,27 @@ layout::print_any_margin (int line, int column, enum print_line_kind kind)
     pp_string (m_pp, g_line_art.rmargin_vbar);
 }
 
+/* Given *COLUMN as an x-coordinate, print spaces to position
+   successive output at DEST_COLUMN, printing a newline if necessary,
+   and updating *COLUMN.  */
+
+void
+layout::move_to_column (int *column, int dest_column)
+{
+  /* Start a new line if we need to.  */
+  if (*column > dest_column)
+    {
+      pp_newline (m_pp);
+      *column = 0;
+    }
+
+  while (*column < dest_column)
+    {
+      pp_space (m_pp);
+      (*column)++;
+    }
+}
+
 } /* End of anonymous namespace.  */
 
 /* For debugging layout issues in diagnostic_show_locus and friends,
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-ascii-bw.c b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-ascii-bw.c
index 8ffe2e0..8aef89d 100644
--- a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-ascii-bw.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-ascii-bw.c
@@ -112,3 +112,46 @@ void test6 (void)
    { dg-end-multiline-output "" } */
 #endif
 }
+
+/* Unit test for rendering of insertion fixit hints
+   (example taken from PR 62316).  */
+
+void test_fixit_insert (void)
+{
+#if 0
+   int a[2][2] = { 0, 1 , 2, 3 }; /* { dg-warning "insertion hints" } */
+/* { dg-begin-multiline-output "" }
+    int a[2][2] = { 0, 1 , 2, 3 };
+                    ^~~~
+                    {   }
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+/* Unit test for rendering of "remove" fixit hints.  */
+
+void test_fixit_remove (void)
+{
+#if 0
+  int a;; /* { dg-warning "example of a removal hint" } */
+/* { dg-begin-multiline-output "" }
+   int a;;
+         ^
+         -
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+/* Unit test for rendering of "replace" fixit hints.  */
+
+void test_fixit_replace (void)
+{
+#if 0
+  gtk_widget_showall (dlg); /* { dg-warning "example of a replacement hint" } */
+/* { dg-begin-multiline-output "" }
+   gtk_widget_showall (dlg);
+   ^~~~~~~~~~~~~~~~~~
+   gtk_widget_show_all
+   { dg-end-multiline-output "" } */
+#endif
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-ascii-color.c b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-ascii-color.c
index dba851d..f261e2a 100644
--- a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-ascii-color.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-ascii-color.c
@@ -33,3 +33,46 @@ void test2 (void)
    { dg-end-multiline-output "" } */
 #endif
 }
+
+/* Unit test for rendering of insertion fixit hints
+   (example taken from PR 62316).  */
+
+void test_fixit_insert (void)
+{
+#if 0
+   int a[2][2] = { 0, 1 , 2, 3 }; /* { dg-warning "insertion hints" } */
+/* { dg-begin-multiline-output "" }
+    int a[2][2] = { ^[[01;35m^[[K0, 1^[[m^[[K , 2, 3 };
+                    ^[[01;35m^[[K^~~~
+                    {^[[m^[[K   ^[[01;35m^[[K}^[[m^[[K
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+/* Unit test for rendering of "remove" fixit hints.  */
+
+void test_fixit_remove (void)
+{
+#if 0
+  int a;; /* { dg-warning "example of a removal hint" } */
+/* { dg-begin-multiline-output "" }
+   int a;^[[01;35m^[[K;^[[m^[[K
+         ^[[01;35m^[[K^
+         -^[[m^[[K
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+/* Unit test for rendering of "replace" fixit hints.  */
+
+void test_fixit_replace (void)
+{
+#if 0
+  gtk_widget_showall (dlg); /* { dg-warning "example of a replacement hint" } */
+/* { dg-begin-multiline-output "" }
+   ^[[01;35m^[[Kgtk_widget_showall^[[m^[[K (dlg);
+   ^[[01;35m^[[K^~~~~~~~~~~~~~~~~~
+   gtk_widget_show_all^[[m^[[K
+   { dg-end-multiline-output "" } */
+#endif
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-utf-8-bw.c b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-utf-8-bw.c
index 5fc8395..0e869fb 100644
--- a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-utf-8-bw.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-utf-8-bw.c
@@ -56,3 +56,46 @@ void test6 (void)
    { dg-end-multiline-output "" } */
 #endif
 }
+
+/* Unit test for rendering of insertion fixit hints
+   (example taken from PR 62316).  */
+
+void test_fixit_insert (void)
+{
+#if 0
+   int a[2][2] = { 0, 1 , 2, 3 }; /* { dg-warning "insertion hints" } */
+/* { dg-begin-multiline-output "" }
+    int a[2][2] = { 0, 1 , 2, 3 };
+                    ▲───
+                    {   }
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+/* Unit test for rendering of "remove" fixit hints.  */
+
+void test_fixit_remove (void)
+{
+#if 0
+  int a;; /* { dg-warning "example of a removal hint" } */
+/* { dg-begin-multiline-output "" }
+   int a;;
+         ▲
+         -
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+/* Unit test for rendering of "replace" fixit hints.  */
+
+void test_fixit_replace (void)
+{
+#if 0
+  gtk_widget_showall (dlg); /* { dg-warning "example of a replacement hint" } */
+/* { dg-begin-multiline-output "" }
+   gtk_widget_showall (dlg);
+   ▲─────────────────
+   gtk_widget_show_all
+   { dg-end-multiline-output "" } */
+#endif
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-utf-8-color.c b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-utf-8-color.c
index a8f4bc3..df861f6 100644
--- a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-utf-8-color.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-utf-8-color.c
@@ -60,3 +60,46 @@ void test6 (void)
    { dg-end-multiline-output "" } */
 #endif
 }
+
+/* Unit test for rendering of insertion fixit hints
+   (example taken from PR 62316).  */
+
+void test_fixit_insert (void)
+{
+#if 0
+   int a[2][2] = { 0, 1 , 2, 3 }; /* { dg-warning "insertion hints" } */
+/* { dg-begin-multiline-output "" }
+    int a[2][2] = { ^[[01;35m^[[K0, 1^[[m^[[K , 2, 3 };
+                    ^[[01;35m^[[K▲───
+                    {^[[m^[[K   ^[[01;35m^[[K}^[[m^[[K
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+/* Unit test for rendering of "remove" fixit hints.  */
+
+void test_fixit_remove (void)
+{
+#if 0
+  int a;; /* { dg-warning "example of a removal hint" } */
+/* { dg-begin-multiline-output "" }
+   int a;^[[01;35m^[[K;^[[m^[[K
+         ^[[01;35m^[[K▲
+         -^[[m^[[K
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+/* Unit test for rendering of "replace" fixit hints.  */
+
+void test_fixit_replace (void)
+{
+#if 0
+  gtk_widget_showall (dlg); /* { dg-warning "example of a replacement hint" } */
+/* { dg-begin-multiline-output "" }
+   ^[[01;35m^[[Kgtk_widget_showall^[[m^[[K (dlg);
+   ^[[01;35m^[[K▲─────────────────
+   gtk_widget_show_all^[[m^[[K
+   { dg-end-multiline-output "" } */
+#endif
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
index f724ef4..0162caa 100644
--- a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
@@ -306,6 +306,41 @@ test_show_locus (function *fun)
       gcc_rich_location richloc (src_range);
       warning_at_rich_loc (&richloc, 0, "test 6");
     }
+
+  /* Tests of rendering fixit hints.  */
+  if (0 == strcmp (fnname, "test_fixit_insert"))
+    {
+      const int line = fnstart_line + 2;
+      source_range src_range;
+      src_range.m_start = get_loc (line, 19);
+      src_range.m_finish = get_loc (line, 22);
+      gcc_rich_location richloc (src_range);
+      richloc.add_fixit_insert (src_range.m_start, "{");
+      richloc.add_fixit_insert (src_range.m_finish + 1, "}");
+      warning_at_rich_loc (&richloc, 0, "example of insertion hints");
+    }
+
+  if (0 == strcmp (fnname, "test_fixit_remove"))
+    {
+      const int line = fnstart_line + 2;
+      source_range src_range;
+      src_range.m_start = get_loc (line, 8);
+      src_range.m_finish = get_loc (line, 8);
+      gcc_rich_location richloc (src_range);
+      richloc.add_fixit_remove (src_range);
+      warning_at_rich_loc (&richloc, 0, "example of a removal hint");
+    }
+
+  if (0 == strcmp (fnname, "test_fixit_replace"))
+    {
+      const int line = fnstart_line + 2;
+      source_range src_range;
+      src_range.m_start = get_loc (line, 2);
+      src_range.m_finish = get_loc (line, 19);
+      gcc_rich_location richloc (src_range);
+      richloc.add_fixit_replace (src_range, "gtk_widget_show_all");
+      warning_at_rich_loc (&richloc, 0, "example of a replacement hint");
+    }
 }
 
 unsigned int
diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
index 53ba68b..2861b42 100644
--- a/libcpp/include/line-map.h
+++ b/libcpp/include/line-map.h
@@ -145,6 +145,9 @@ struct GTY(()) source_range
     result.m_finish = loc;
     return result;
   }
+
+  /* Is there any part of this range on the given line?  */
+  bool intersects_line_p (const char *file, int line) const;
 };
 
 /* Memory allocation function typedef.  Works like xrealloc.  */
@@ -1097,6 +1100,11 @@ enum buffer_ownership
   BUFFER_OWNERSHIP_BORROWED /* Make a copy.  */
 };
 
+class fixit_hint;
+  class fixit_insert;
+  class fixit_remove;
+  class fixit_replace;
+
 /* A "rich" source code location, for use when printing diagnostics.
    A rich_location has a "primary location", along with zero or more
    additional ranges.
@@ -1195,8 +1203,24 @@ class rich_location
   void
   override_column (int column);
 
+  /* Fix-it hints.  */
+  void
+  add_fixit_insert (source_location where,
+		    const char *new_content);
+
+  void
+  add_fixit_remove (source_range src_range);
+
+  void
+  add_fixit_replace (source_range src_range,
+		     const char *new_content);
+
+  unsigned int get_num_fixit_hints () const { return m_num_fixit_hints; }
+  fixit_hint *get_fixit_hint (int idx) const { return m_fixit_hints[idx]; }
+
 public:
   static const int MAX_RANGES = 3;
+  static const int MAX_FIXIT_HINTS = 2;
 
 protected:
   friend class range_iter;
@@ -1208,8 +1232,77 @@ protected:
 
   bool m_have_expanded_location;
   expanded_location m_expanded_location;
+
+  unsigned int m_num_fixit_hints;
+  fixit_hint *m_fixit_hints[MAX_FIXIT_HINTS];
 };
 
+class fixit_hint
+{
+public:
+  enum kind {INSERT, REMOVE, REPLACE};
+
+  virtual ~fixit_hint () {}
+
+  virtual enum kind get_kind () const = 0;
+  virtual bool affects_line_p (const char *file, int line) = 0;
+};
+
+class fixit_insert : public fixit_hint
+{
+ public:
+  fixit_insert (source_location where,
+		const char *new_content);
+  ~fixit_insert ();
+  enum kind get_kind () const { return INSERT; }
+  bool affects_line_p (const char *file, int line);
+
+  source_location get_location () const { return m_where; }
+  const char *get_string () const { return m_bytes; }
+  size_t get_length () const { return m_len; }
+
+ private:
+  source_location m_where;
+  char *m_bytes;
+  size_t m_len;
+};
+
+class fixit_remove : public fixit_hint
+{
+ public:
+  fixit_remove (source_range src_range);
+  ~fixit_remove () {}
+
+  enum kind get_kind () const { return REMOVE; }
+  bool affects_line_p (const char *file, int line);
+
+  source_range get_range () const { return m_src_range; }
+
+ private:
+  source_range m_src_range;
+};
+
+class fixit_replace : public fixit_hint
+{
+ public:
+  fixit_replace (source_range src_range,
+                 const char *new_content);
+  ~fixit_replace ();
+
+  enum kind get_kind () const { return REPLACE; }
+  bool affects_line_p (const char *file, int line);
+
+  source_range get_range () const { return m_src_range; }
+  const char *get_string () const { return m_bytes; }
+  size_t get_length () const { return m_len; }
+
+ private:
+  source_range m_src_range;
+  char *m_bytes;
+  size_t m_len;
+};
+
+
 inline
 rich_location::range_iter::range_iter (rich_location *richloc) :
   m_richloc (richloc),
diff --git a/libcpp/line-map.c b/libcpp/line-map.c
index 79d8eee..fec57fc 100644
--- a/libcpp/line-map.c
+++ b/libcpp/line-map.c
@@ -1747,6 +1747,28 @@ line_table_dump (FILE *stream, struct line_maps *set, unsigned int num_ordinary,
     }
 }
 
+/* struct source_range.  */
+
+/* Is there any part of this range on the given line?  */
+
+bool
+source_range::intersects_line_p (const char *file, int line) const
+{
+  expanded_location exploc_start
+    = linemap_client_expand_location_to_spelling_point (m_start);
+  if (file != exploc_start.file)
+    return false;
+  if (line < exploc_start.line)
+      return false;
+  expanded_location exploc_finish
+    = linemap_client_expand_location_to_spelling_point (m_finish);
+  if (file != exploc_finish.file)
+    return false;
+  if (line > exploc_finish.line)
+      return false;
+  return true;
+}
+
 /* class rich_location.  */
 
 /* Construct a rich_location with location LOC as its initial range.  */
@@ -1754,7 +1776,8 @@ line_table_dump (FILE *stream, struct line_maps *set, unsigned int num_ordinary,
 rich_location::rich_location (source_location loc) :
   m_loc (loc),
   m_num_ranges (0),
-  m_have_expanded_location (false)
+  m_have_expanded_location (false),
+  m_num_fixit_hints (0)
 {
   /* Set up the 0th range: */
   add_range (loc, loc, true);
@@ -1766,7 +1789,8 @@ rich_location::rich_location (source_location loc) :
 rich_location::rich_location (source_range src_range)
 : m_loc (src_range.m_start),
   m_num_ranges (0),
-  m_have_expanded_location (false)
+  m_have_expanded_location (false),
+  m_num_fixit_hints (0)
 {
   /* Set up the 0th range: */
   add_range (src_range, true);
@@ -1778,6 +1802,8 @@ rich_location::~rich_location ()
 {
   for (unsigned int i = 0; i < m_num_ranges; i++)
     free (m_ranges[i].m_caption);
+  for (unsigned int i = 0; i < m_num_fixit_hints; i++)
+    delete m_fixit_hints[i];
 }
 
 /* Get the first line of the rich_location, either that of
@@ -1954,3 +1980,103 @@ rich_location::set_range (unsigned int idx, source_range src_range,
       m_have_expanded_location = false;
     }
 }
+
+/* Add a fixit-hint, suggesting insertion of NEW_CONTENT
+   at WHERE.  */
+
+void
+rich_location::add_fixit_insert (source_location where,
+				 const char *new_content)
+{
+  linemap_assert (m_num_fixit_hints < MAX_FIXIT_HINTS);
+  m_fixit_hints[m_num_fixit_hints++]
+    = new fixit_insert (where, new_content);
+}
+
+/* Add a fixit-hint, suggesting removal of the content at
+   SRC_RANGE.  */
+
+void
+rich_location::add_fixit_remove (source_range src_range)
+{
+  linemap_assert (m_num_fixit_hints < MAX_FIXIT_HINTS);
+  m_fixit_hints[m_num_fixit_hints++] = new fixit_remove (src_range);
+}
+
+/* Add a fixit-hint, suggesting replacement of the content at
+   SRC_RANGE with NEW_CONTENT.  */
+
+void
+rich_location::add_fixit_replace (source_range src_range,
+				  const char *new_content)
+{
+  linemap_assert (m_num_fixit_hints < MAX_FIXIT_HINTS);
+  m_fixit_hints[m_num_fixit_hints++]
+    = new fixit_replace (src_range, new_content);
+}
+
+/* class fixit_insert.  */
+
+fixit_insert::fixit_insert (source_location where,
+			    const char *new_content)
+: m_where (where),
+  m_bytes (xstrdup (new_content)),
+  m_len (strlen (new_content))
+{
+}
+
+fixit_insert::~fixit_insert ()
+{
+  free (m_bytes);
+}
+
+/* Implementation of fixit_hint::affects_line_p for fixit_insert.  */
+
+bool
+fixit_insert::affects_line_p (const char *file, int line)
+{
+  expanded_location exploc
+    = linemap_client_expand_location_to_spelling_point (m_where);
+  if (file == exploc.file)
+    if (line == exploc.line)
+      return true;
+  return false;
+}
+
+/* class fixit_remove.  */
+
+fixit_remove::fixit_remove (source_range src_range)
+: m_src_range (src_range)
+{
+}
+
+/* Implementation of fixit_hint::affects_line_p for fixit_remove.  */
+
+bool
+fixit_remove::affects_line_p (const char *file, int line)
+{
+  return m_src_range.intersects_line_p (file, line);
+}
+
+/* class fixit_replace.  */
+
+fixit_replace::fixit_replace (source_range src_range,
+			      const char *new_content)
+: m_src_range (src_range),
+  m_bytes (xstrdup (new_content)),
+  m_len (strlen (new_content))
+{
+}
+
+fixit_replace::~fixit_replace ()
+{
+  free (m_bytes);
+}
+
+/* Implementation of fixit_hint::affects_line_p for fixit_replace.  */
+
+bool
+fixit_replace::affects_line_p (const char *file, int line)
+{
+  return m_src_range.intersects_line_p (file, line);
+}
-- 
1.8.5.3


* [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (9 preceding siblings ...)
  2015-09-10 20:13 ` [PATCH 03/22] Move diagnostic_show_locus and friends out into a new source file David Malcolm
@ 2015-09-10 20:28 ` David Malcolm
  2015-09-11 14:08   ` Michael Matz
  2015-09-15 10:20   ` Richard Biener
  2015-09-10 20:29 ` [PATCH 04/22] Reimplement diagnostic_show_locus, introducing rich_location classes David Malcolm
                   ` (11 subsequent siblings)
  22 siblings, 2 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:28 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

This patch adds source *range* information to libcpp's cpp_token, and to
c_token and cp_token in the C and C++ frontends respectively.

To minimize churn, I kept the existing location_t fields, though in
theory these are always just equal to the start of the source range.
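
As a rough illustration of the invariant described above (the location
equals the start of the range), here is a minimal sketch of a toy lexer
that records both.  Everything in it is hypothetical: "toy_token" and
"lex_words" are not libcpp names, tokens are simply whitespace-separated
words, and columns are assumed to be 1-based with an inclusive finish.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are NOT the real libcpp definitions.
struct source_range
{
  unsigned int m_start;   // 1-based column of the first character
  unsigned int m_finish;  // 1-based column of the last character (assumed inclusive)
};

struct toy_token
{
  std::string text;
  unsigned int location;  // kept alongside the range, as in the patch
  source_range range;
};

// Split LINE into whitespace-separated tokens, recording a range for each.
std::vector<toy_token>
lex_words (const std::string &line)
{
  std::vector<toy_token> result;
  std::string::size_type i = 0;
  while (i < line.size ())
    {
      if (std::isspace (static_cast<unsigned char> (line[i])))
	{
	  i++;
	  continue;
	}
      // The token's range begins here.
      std::string::size_type start = i;
      while (i < line.size ()
	     && !std::isspace (static_cast<unsigned char> (line[i])))
	i++;
      // The token's range ends at the last character consumed.
      toy_token tok;
      tok.text = line.substr (start, i - start);
      tok.range.m_start = static_cast<unsigned int> (start + 1);
      tok.range.m_finish = static_cast<unsigned int> (i);
      tok.location = tok.range.m_start;
      result.push_back (tok);
    }
  return result;
}
```

Every token produced this way has location == range.m_start, which is
why the separate location field is redundant in principle.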

cpplib.h's struct cpp_token had this comment:

  /* A preprocessing token.  This has been carefully packed and should
     occupy 16 bytes on 32-bit hosts and 24 bytes on 64-bit hosts.  */

Does anyone know why this was "carefully packed" and to what extent
this matters?  I'm adding an extra 8 bytes to it (or 4 if we eliminate
the existing location_t).  As far as I can see, these are
short-lived, and there are only relatively few alive at any time.
Or is it about making them fast to copy?
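
To make the size question concrete, here is a minimal sketch using
hypothetical stand-in structs (they only approximate the shape of
cpp_token and are not the real layout).  It assumes source_location is a
32-bit unsigned int, so a source_range is a pair of them.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical stand-ins; these are NOT the real libcpp definitions.
typedef unsigned int source_location;

struct source_range
{
  source_location m_start;
  source_location m_finish;
};

// Roughly the shape of a token before the patch: a location, two small
// fields, and a pointer-sized payload.
struct token_before
{
  source_location src_loc;
  unsigned char type;
  unsigned short flags;
  void *payload;
};

// The same token with a source_range added next to src_loc.
struct token_after
{
  source_location src_loc;
  source_range src_range;
  unsigned char type;
  unsigned short flags;
  void *payload;
};

// How many bytes the extra field costs per token on this host.
std::size_t
token_growth ()
{
  return sizeof (token_after) - sizeof (token_before);
}

// Print the sizes so the effect of padding on this host is visible.
void
report_token_sizes ()
{
  std::printf ("before: %lu bytes, after: %lu bytes\n",
	       static_cast<unsigned long> (sizeof (token_before)),
	       static_cast<unsigned long> (sizeof (token_after)));
}
```

On common ABIs the growth is typically sizeof (source_range), since the
new field fits without introducing extra padding; dropping src_loc in
favor of src_range.m_start would claw back one source_location.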

gcc/c-family/ChangeLog:
	* c-lex.c (c_lex_with_flags): Add "range" param, and write back
	to *range with the range of the libcpp token.
	* c-pragma.h (c_lex_with_flags): Add "range" param.

gcc/c/ChangeLog:
	* c-parser.c (struct c_token): Add "range" field.
	(c_lex_one_token): Write back to token->range in call to
	c_lex_with_flags.

gcc/cp/ChangeLog:
	* parser.c (eof_token): Add "range" field to initializer.
	(cp_lexer_get_preprocessor_token): Write back to token->range in
	call to c_lex_with_flags.
	* parser.h (struct cp_token): Add "range" field.

libcpp/ChangeLog:
	* include/cpplib.h (struct cpp_token): Add src_range field.
	* lex.c (_cpp_lex_direct): Set up the src_range on the token.
---
 gcc/c-family/c-lex.c    | 7 +++++--
 gcc/c-family/c-pragma.h | 4 ++--
 gcc/c/c-parser.c        | 6 +++++-
 gcc/cp/parser.c         | 5 +++--
 gcc/cp/parser.h         | 2 ++
 libcpp/include/cpplib.h | 4 +++-
 libcpp/lex.c            | 8 ++++++++
 7 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index 55ceb20..1334994 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -380,11 +380,13 @@ c_common_has_attribute (cpp_reader *pfile)
 }
 \f
 /* Read a token and return its type.  Fill *VALUE with its value, if
-   applicable.  Fill *CPP_FLAGS with the token's flags, if it is
+   applicable.  Fill *LOC and *RANGE with the source location and range
+   of the token.  Fill *CPP_FLAGS with the token's flags, if it is
    non-NULL.  */
 
 enum cpp_ttype
-c_lex_with_flags (tree *value, location_t *loc, unsigned char *cpp_flags,
+c_lex_with_flags (tree *value, location_t *loc, source_range *range,
+		  unsigned char *cpp_flags,
 		  int lex_flags)
 {
   static bool no_more_pch;
@@ -397,6 +399,7 @@ c_lex_with_flags (tree *value, location_t *loc, unsigned char *cpp_flags,
  retry:
   tok = cpp_get_token_with_location (parse_in, loc);
   type = tok->type;
+  *range = tok->src_range;
 
  retry_after_at:
   switch (type)
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index aa2b471..05df543 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -225,8 +225,8 @@ extern enum cpp_ttype pragma_lex (tree *);
 /* This is not actually available to pragma parsers.  It's merely a
    convenient location to declare this function for c-lex, after
    having enum cpp_ttype declared.  */
-extern enum cpp_ttype c_lex_with_flags (tree *, location_t *, unsigned char *,
-					int);
+extern enum cpp_ttype c_lex_with_flags (tree *, location_t *, source_range *,
+					unsigned char *, int);
 
 extern void c_pp_lookup_pragma (unsigned int, const char **, const char **);
 
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 11a2b0f..5d822ee 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -170,6 +170,8 @@ struct GTY (()) c_token {
   ENUM_BITFIELD (pragma_kind) pragma_kind : 8;
   /* The location at which this token was found.  */
   location_t location;
+  /* The source range at which this token was found.  */
+  source_range range;
   /* The value associated with this token, if any.  */
   tree value;
 };
@@ -239,7 +241,9 @@ c_lex_one_token (c_parser *parser, c_token *token)
 {
   timevar_push (TV_LEX);
 
-  token->type = c_lex_with_flags (&token->value, &token->location, NULL,
+  token->type = c_lex_with_flags (&token->value, &token->location,
+				  &token->range,
+				  NULL,
 				  (parser->lex_untranslated_string
 				   ? C_LEX_STRING_NO_TRANSLATE : 0));
   token->id_kind = C_ID_NONE;
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 67fbcda..7c59c58 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -58,7 +58,7 @@ along with GCC; see the file COPYING3.  If not see
 
 static cp_token eof_token =
 {
-  CPP_EOF, RID_MAX, 0, PRAGMA_NONE, false, false, false, 0, { NULL }
+  CPP_EOF, RID_MAX, 0, PRAGMA_NONE, false, false, false, 0, {0, 0}, { NULL }
 };
 
 /* The various kinds of non integral constant we encounter. */
@@ -764,7 +764,8 @@ cp_lexer_get_preprocessor_token (cp_lexer *lexer, cp_token *token)
 
    /* Get a new token from the preprocessor.  */
   token->type
-    = c_lex_with_flags (&token->u.value, &token->location, &token->flags,
+    = c_lex_with_flags (&token->u.value, &token->location,
+                        &token->range, &token->flags,
 			lexer == NULL ? 0 : C_LEX_STRING_NO_JOIN);
   token->keyword = RID_MAX;
   token->pragma_kind = PRAGMA_NONE;
diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
index 760467c..c7558a0 100644
--- a/gcc/cp/parser.h
+++ b/gcc/cp/parser.h
@@ -61,6 +61,8 @@ struct GTY (()) cp_token {
   BOOL_BITFIELD purged_p : 1;
   /* The location at which this token was found.  */
   location_t location;
+  /* The source range at which this token was found.  */
+  source_range range;
   /* The value associated with this token, if any.  */
   union cp_token_value {
     /* Used for CPP_NESTED_NAME_SPECIFIER and CPP_TEMPLATE_ID.  */
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index a2bdfa0..0b1a403 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -235,9 +235,11 @@ struct GTY(()) cpp_identifier {
 };
 
 /* A preprocessing token.  This has been carefully packed and should
-   occupy 16 bytes on 32-bit hosts and 24 bytes on 64-bit hosts.  */
+   occupy 16 bytes on 32-bit hosts and 24 bytes on 64-bit hosts.
+   FIXME: the above comment is no longer true with this patch.  */
 struct GTY(()) cpp_token {
   source_location src_loc;	/* Location of first char of token.  */
+  source_range src_range;	/* Source range covered by the token.  */
   ENUM_BITFIELD(cpp_ttype) type : CHAR_BIT;  /* token type */
   unsigned short flags;		/* flags - see above */
 
diff --git a/libcpp/lex.c b/libcpp/lex.c
index 0aa1090..a84a8c0 100644
--- a/libcpp/lex.c
+++ b/libcpp/lex.c
@@ -2365,6 +2365,9 @@ _cpp_lex_direct (cpp_reader *pfile)
     result->src_loc = linemap_position_for_column (pfile->line_table,
 					  CPP_BUF_COLUMN (buffer, buffer->cur));
 
+  /* The token's src_range begins here.  */
+  result->src_range.m_start = result->src_loc;
+
   switch (c)
     {
     case ' ': case '\t': case '\f': case '\v': case '\0':
@@ -2723,6 +2726,11 @@ _cpp_lex_direct (cpp_reader *pfile)
       break;
     }
 
+  /* The token's src_range ends here.  */
+  result->src_range.m_finish =
+    linemap_position_for_column (pfile->line_table,
+				 CPP_BUF_COLUMN (buffer, buffer->cur));
+
   return result;
 }
 
-- 
1.8.5.3


* [PATCH 05/22] Add overloads of inform, warning_at, etc that take a source_range
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (11 preceding siblings ...)
  2015-09-10 20:29 ` [PATCH 04/22] Reimplement diagnostic_show_locus, introducing rich_location classes David Malcolm
@ 2015-09-10 20:29 ` David Malcolm
  2015-09-10 20:30 ` [PATCH 15/22] Add plugin to recursively dump the source-ranges in a tree David Malcolm
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

Various followup patches convert variables from "location_t" to
"source_range".  For the places where we issue diagnostics using
these variables, it's useful to have overloaded variants of
"warning_at" etc that take a source_range, allowing these
diagnostics to display underlines without needing to be rewritten.
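
The overload pattern can be sketched in isolation as follows.  This is
a simplified illustration, not the code in the patch: the types are
stand-ins, "warning_va" is a hypothetical helper, and the functions
return the formatted text instead of reporting through a
diagnostic_context.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical stand-ins for illustration; not GCC's real types.
typedef unsigned int location_t;

struct source_range
{
  location_t m_start;
  location_t m_finish;
};

// Minimal stand-in for rich_location: it only records one range.
struct rich_location
{
  explicit rich_location (location_t loc)
  {
    m_range.m_start = loc;
    m_range.m_finish = loc;
  }
  explicit rich_location (source_range range) : m_range (range) {}

  source_range m_range;
};

// Shared worker: both overloads funnel into this, passing a va_list *.
static std::string
warning_va (const rich_location *richloc, const char *fmt, va_list *args)
{
  char msg[256];
  std::vsnprintf (msg, sizeof msg, fmt, *args);
  char out[320];
  std::snprintf (out, sizeof out, "%u-%u: warning: %s",
		 richloc->m_range.m_start, richloc->m_range.m_finish, msg);
  return out;
}

// Existing entry point: a single location.
std::string
warning_at (location_t loc, const char *fmt, ...)
{
  va_list ap;
  rich_location richloc (loc);
  va_start (ap, fmt);
  std::string ret = warning_va (&richloc, fmt, &ap);
  va_end (ap);
  return ret;
}

// New overload: same name, but the caller passes a range.
std::string
warning_at (source_range range, const char *fmt, ...)
{
  va_list ap;
  rich_location richloc (range);
  va_start (ap, fmt);
  std::string ret = warning_va (&richloc, fmt, &ap);
  va_end (ap);
  return ret;
}
```

A call site whose variable changes type from location_t to source_range
then compiles unchanged and picks up the range-aware overload.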

gcc/ChangeLog:
	* diagnostic-core.h (warning_at): Add an overloaded declaration
	taking a source_range.
	(error_at): Likewise.
	(pedwarn): Likewise.
	(permerror): Likewise.
	(inform): Likewise.
	* diagnostic.c (inform): Implement new overloaded variant taking
	a source_range.
	(warning_at): Likewise.
	(pedwarn): Likewise.
	(permerror): Likewise.
	(error_at): Likewise.

gcc/c/ChangeLog:
	* c-errors.c (pedwarn_c99): Move bulk of implementation to...
	(pedwarn_c99_va): New function.
	(pedwarn_c99): New overload, taking a source_range.
	(pedwarn_c90): Move bulk of implementation to...
	(pedwarn_c90_va): New function.
	(pedwarn_c90): New overload, taking a source_range.
	* c-tree.h (pedwarn_c90): New declaration of overloaded function
	taking a source_range.
	(pedwarn_c99): Likewise.
---
 gcc/c/c-errors.c      | 97 +++++++++++++++++++++++++++++++++++++++++----------
 gcc/c/c-tree.h        |  4 +++
 gcc/diagnostic-core.h |  7 ++++
 gcc/diagnostic.c      | 84 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 173 insertions(+), 19 deletions(-)

diff --git a/gcc/c/c-errors.c b/gcc/c/c-errors.c
index 0f8b933..d5a3d6b 100644
--- a/gcc/c/c-errors.c
+++ b/gcc/c/c-errors.c
@@ -30,26 +30,34 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "opts.h"
 
+static bool
+pedwarn_c99_va (rich_location *richloc, int opt,
+		const char *gmsgid, va_list *args)
+     ATTRIBUTE_GCC_DIAG(3,0);
+
+static void
+pedwarn_c90_va (rich_location *richloc, int opt,
+		const char *gmsgid, va_list *args)
+     ATTRIBUTE_GCC_DIAG(3,0);
+
 /* Issue an ISO C99 pedantic warning MSGID if -pedantic outside C11 mode,
    otherwise issue warning MSGID if -Wc99-c11-compat is specified.
    This function is supposed to be used for matters that are allowed in
    ISO C11 but not supported in ISO C99, thus we explicitly don't pedwarn
    when C11 is specified.  */
 
-bool
-pedwarn_c99 (location_t location, int opt, const char *gmsgid, ...)
+static bool
+pedwarn_c99_va (rich_location *richloc, int opt,
+		const char *gmsgid, va_list *args)
 {
   diagnostic_info diagnostic;
-  va_list ap;
   bool warned = false;
-  rich_location richloc (location);
 
-  va_start (ap, gmsgid);
   /* If desired, issue the C99/C11 compat warning, which is more specific
      than -pedantic.  */
   if (warn_c99_c11_compat > 0)
     {
-      diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc,
+      diagnostic_set_info (&diagnostic, gmsgid, args, richloc,
 			   (pedantic && !flag_isoc11)
 			   ? DK_PEDWARN : DK_WARNING);
       diagnostic.option_index = OPT_Wc99_c11_compat;
@@ -61,14 +69,43 @@ pedwarn_c99 (location_t location, int opt, const char *gmsgid, ...)
   /* For -pedantic outside C11, issue a pedwarn.  */
   else if (pedantic && !flag_isoc11)
     {
-      diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_PEDWARN);
+      diagnostic_set_info (&diagnostic, gmsgid, args, richloc, DK_PEDWARN);
       diagnostic.option_index = opt;
       warned = report_diagnostic (&diagnostic);
     }
-  va_end (ap);
   return warned;
 }
 
+/* As pedwarn_c99_va above, but at LOCATION.  */
+
+bool
+pedwarn_c99 (location_t location, int opt, const char *gmsgid, ...)
+{
+  va_list ap;
+  bool ret;
+  rich_location richloc (location);
+
+  va_start (ap, gmsgid);
+  ret = pedwarn_c99_va (&richloc, opt, gmsgid, &ap);
+  va_end (ap);
+  return ret;
+}
+
+/* As pedwarn_c99_va above, but at SRC_RANGE.  */
+
+bool
+pedwarn_c99 (source_range src_range, int opt, const char *gmsgid, ...)
+{
+  va_list ap;
+  bool ret;
+  rich_location richloc (src_range);
+
+  va_start (ap, gmsgid);
+  ret = pedwarn_c99_va (&richloc, opt, gmsgid, &ap);
+  va_end (ap);
+  return ret;
+}
+
 /* Issue an ISO C90 pedantic warning MSGID if -pedantic outside C99 mode,
    otherwise issue warning MSGID if -Wc90-c99-compat is specified, or if
    a specific option such as -Wlong-long is specified.
@@ -76,35 +113,33 @@ pedwarn_c99 (location_t location, int opt, const char *gmsgid, ...)
    ISO C99 but not supported in ISO C90, thus we explicitly don't pedwarn
    when C99 is specified.  (There is no flag_c90.)  */
 
-void
-pedwarn_c90 (location_t location, int opt, const char *gmsgid, ...)
+static void
+pedwarn_c90_va (rich_location *richloc, int opt,
+		const char *gmsgid, va_list *args)
 {
   diagnostic_info diagnostic;
-  va_list ap;
-  rich_location richloc (location);
 
-  va_start (ap, gmsgid);
   /* Warnings such as -Wvla are the most specific ones.  */
   if (opt != OPT_Wpedantic)
     {
       int opt_var = *(int *) option_flag_var (opt, &global_options);
       if (opt_var == 0)
-        goto out;
+	return;
       else if (opt_var > 0)
 	{
-	  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc,
+	  diagnostic_set_info (&diagnostic, gmsgid, args, richloc,
 			       (pedantic && !flag_isoc99)
 			       ? DK_PEDWARN : DK_WARNING);
 	  diagnostic.option_index = opt;
 	  report_diagnostic (&diagnostic);
-	  goto out;
+	  return;
 	}
     }
   /* Maybe we want to issue the C90/C99 compat warning, which is more
      specific than -pedantic.  */
   if (warn_c90_c99_compat > 0)
     {
-      diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc,
+      diagnostic_set_info (&diagnostic, gmsgid, args, richloc,
 			   (pedantic && !flag_isoc99)
 			   ? DK_PEDWARN : DK_WARNING);
       diagnostic.option_index = OPT_Wc90_c99_compat;
@@ -116,10 +151,34 @@ pedwarn_c90 (location_t location, int opt, const char *gmsgid, ...)
   /* For -pedantic outside C99, issue a pedwarn.  */
   else if (pedantic && !flag_isoc99)
     {
-      diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_PEDWARN);
+      diagnostic_set_info (&diagnostic, gmsgid, args, richloc, DK_PEDWARN);
       diagnostic.option_index = opt;
       report_diagnostic (&diagnostic);
     }
-out:
+}
+
+/* As pedwarn_c90_va above, but at LOCATION.  */
+
+void
+pedwarn_c90 (location_t location, int opt, const char *gmsgid, ...)
+{
+  va_list ap;
+  rich_location richloc (location);
+
+  va_start (ap, gmsgid);
+  pedwarn_c90_va (&richloc, opt, gmsgid, &ap);
+  va_end (ap);
+}
+
+/* As pedwarn_c90_va above, but at SRC_RANGE.  */
+
+void
+pedwarn_c90 (source_range src_range, int opt, const char *gmsgid, ...)
+{
+  va_list ap;
+  rich_location richloc (src_range);
+
+  va_start (ap, gmsgid);
+  pedwarn_c90_va (&richloc, opt, gmsgid, &ap);
   va_end (ap);
 }
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index a3979dd..df1ebb6 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -705,7 +705,11 @@ extern void c_bind (location_t, tree, bool);
 /* In c-errors.c */
 extern void pedwarn_c90 (location_t, int opt, const char *, ...)
     ATTRIBUTE_GCC_DIAG(3,4);
+extern void pedwarn_c90 (source_range, int opt, const char *, ...)
+    ATTRIBUTE_GCC_DIAG(3,4);
 extern bool pedwarn_c99 (location_t, int opt, const char *, ...)
     ATTRIBUTE_GCC_DIAG(3,4);
+extern bool pedwarn_c99 (source_range, int opt, const char *, ...)
+    ATTRIBUTE_GCC_DIAG(3,4);
 
 #endif /* ! GCC_C_TREE_H */
diff --git a/gcc/diagnostic-core.h b/gcc/diagnostic-core.h
index a8a7c37..419fdd9 100644
--- a/gcc/diagnostic-core.h
+++ b/gcc/diagnostic-core.h
@@ -63,12 +63,15 @@ extern bool warning_n (location_t, int, int, const char *, const char *, ...)
     ATTRIBUTE_GCC_DIAG(4,6) ATTRIBUTE_GCC_DIAG(5,6);
 extern bool warning_at (location_t, int, const char *, ...)
     ATTRIBUTE_GCC_DIAG(3,4);
+extern bool warning_at (source_range, int, const char *, ...)
+    ATTRIBUTE_GCC_DIAG(3,4);
 extern bool warning_at_rich_loc (rich_location *, int, const char *, ...)
     ATTRIBUTE_GCC_DIAG(3,4);
 extern void error (const char *, ...) ATTRIBUTE_GCC_DIAG(1,2);
 extern void error_n (location_t, int, const char *, const char *, ...)
     ATTRIBUTE_GCC_DIAG(3,5) ATTRIBUTE_GCC_DIAG(4,5);
 extern void error_at (location_t, const char *, ...) ATTRIBUTE_GCC_DIAG(2,3);
+extern void error_at (source_range, const char *, ...) ATTRIBUTE_GCC_DIAG(2,3);
 extern void error_at_rich_loc (rich_location *, const char *, ...)
   ATTRIBUTE_GCC_DIAG(2,3);
 extern void fatal_error (location_t, const char *, ...) ATTRIBUTE_GCC_DIAG(2,3)
@@ -76,11 +79,15 @@ extern void fatal_error (location_t, const char *, ...) ATTRIBUTE_GCC_DIAG(2,3)
 /* Pass one of the OPT_W* from options.h as the second parameter.  */
 extern bool pedwarn (location_t, int, const char *, ...)
      ATTRIBUTE_GCC_DIAG(3,4);
+extern bool pedwarn (source_range, int, const char *, ...)
+     ATTRIBUTE_GCC_DIAG(3,4);
 extern bool permerror (location_t, const char *, ...) ATTRIBUTE_GCC_DIAG(2,3);
+extern bool permerror (source_range, const char *, ...) ATTRIBUTE_GCC_DIAG(2,3);
 extern bool permerror_at_rich_loc (rich_location *, const char *,
 				   ...) ATTRIBUTE_GCC_DIAG(2,3);
 extern void sorry (const char *, ...) ATTRIBUTE_GCC_DIAG(1,2);
 extern void inform (location_t, const char *, ...) ATTRIBUTE_GCC_DIAG(2,3);
+extern void inform (source_range, const char *, ...) ATTRIBUTE_GCC_DIAG(2,3);
 extern void inform_at_rich_loc (rich_location *, const char *,
 				...) ATTRIBUTE_GCC_DIAG(2,3);
 extern void inform_n (location_t, int, const char *, const char *, ...)
diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 060f071..ebdea96 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -967,6 +967,21 @@ inform (location_t location, const char *gmsgid, ...)
   va_end (ap);
 }
 
+/* Same as "inform", but at SRC_RANGE.  */
+
+void
+inform (source_range src_range, const char *gmsgid, ...)
+{
+  diagnostic_info diagnostic;
+  va_list ap;
+  rich_location richloc (src_range);
+
+  va_start (ap, gmsgid);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_NOTE);
+  report_diagnostic (&diagnostic);
+  va_end (ap);
+}
+
 /* Same as "inform", but at RICHLOC.  */
 void
 inform_at_rich_loc (rich_location *richloc, const char *gmsgid, ...)
@@ -1038,6 +1053,24 @@ warning_at (location_t location, int opt, const char *gmsgid, ...)
   return ret;
 }
 
+/* Same as warning at, but using SRC_RANGE.  */
+
+bool
+warning_at (source_range src_range, int opt, const char *gmsgid, ...)
+{
+  diagnostic_info diagnostic;
+  va_list ap;
+  bool ret;
+  rich_location richloc (src_range);
+
+  va_start (ap, gmsgid);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_WARNING);
+  diagnostic.option_index = opt;
+  ret = report_diagnostic (&diagnostic);
+  va_end (ap);
+  return ret;
+}
+
 /* Same as warning at, but using RICHLOC.  */
 
 bool
@@ -1108,6 +1141,24 @@ pedwarn (location_t location, int opt, const char *gmsgid, ...)
   return ret;
 }
 
+/* Same as pedwarn, but using SRC_RANGE.  */
+
+bool
+pedwarn (source_range src_range, int opt, const char *gmsgid, ...)
+{
+  diagnostic_info diagnostic;
+  va_list ap;
+  bool ret;
+  rich_location richloc (src_range);
+
+  va_start (ap, gmsgid);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc,  DK_PEDWARN);
+  diagnostic.option_index = opt;
+  ret = report_diagnostic (&diagnostic);
+  va_end (ap);
+  return ret;
+}
+
 /* A "permissive" error at LOCATION: issues an error unless
    -fpermissive was given on the command line, in which case it issues
    a warning.  Use this for things that really should be errors but we
@@ -1132,6 +1183,25 @@ permerror (location_t location, const char *gmsgid, ...)
   return ret;
 }
 
+/* Same as "permerror", but at SRC_RANGE.  */
+
+bool
+permerror (source_range src_range, const char *gmsgid, ...)
+{
+  diagnostic_info diagnostic;
+  va_list ap;
+  bool ret;
+  rich_location richloc (src_range);
+
+  va_start (ap, gmsgid);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc,
+                       permissive_error_kind (global_dc));
+  diagnostic.option_index = permissive_error_option (global_dc);
+  ret = report_diagnostic (&diagnostic);
+  va_end (ap);
+  return ret;
+}
+
 /* Same as "permerror", but at RICHLOC.  */
 
 bool
@@ -1197,6 +1267,20 @@ error_at (location_t loc, const char *gmsgid, ...)
   va_end (ap);
 }
 
+/* Same as above, but use range LOC instead of input_location.  */
+void
+error_at (source_range loc, const char *gmsgid, ...)
+{
+  diagnostic_info diagnostic;
+  va_list ap;
+  rich_location richloc (loc);
+
+  va_start (ap, gmsgid);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_ERROR);
+  report_diagnostic (&diagnostic);
+  va_end (ap);
+}
+
 /* Same as above, but use RICH_LOC.  */
 
 void
-- 
1.8.5.3


* [PATCH 04/22] Reimplement diagnostic_show_locus, introducing rich_location classes
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (10 preceding siblings ...)
  2015-09-10 20:28 ` [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs David Malcolm
@ 2015-09-10 20:29 ` David Malcolm
  2015-09-11 13:44   ` Michael Matz
  2015-09-11 14:12   ` Michael Matz
  2015-09-10 20:29 ` [PATCH 05/22] Add overloads of inform, warning_at, etc that take a source_range David Malcolm
                   ` (10 subsequent siblings)
  22 siblings, 2 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

This patch rewrites diagnostic_show_locus so that it can display
underlined source ranges in addition to the main caret.

It does this by introducing a new "rich_location" class, containing
a location and (potentially) some source ranges.  These are to be
allocated on the stack when generating diagnostics.
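
A minimal sketch of that idea, under stated assumptions: the class
below is a simplified, hypothetical "simple_rich_location", not the
class added by this patch.  It has no captions, no caret flags and no
lazy expansion, and (unlike the real code) it refuses an extra range
rather than asserting.  The point is only that a fixed-capacity array
lets the whole object live on the stack.

```cpp
// Hypothetical stand-ins; these are NOT the real libcpp definitions.
typedef unsigned int source_location;

struct source_range
{
  source_location m_start;
  source_location m_finish;
};

class simple_rich_location
{
public:
  static const unsigned int MAX_RANGES = 3;

  // Range 0 is always the primary location.
  explicit simple_rich_location (source_location loc)
  : m_loc (loc), m_num_ranges (0)
  {
    source_range primary = { loc, loc };
    add_range (primary);
  }

  // Add a further range; returns false if the fixed capacity is used up.
  bool add_range (source_range range)
  {
    if (m_num_ranges >= MAX_RANGES)
      return false;
    m_ranges[m_num_ranges++] = range;
    return true;
  }

  source_location get_loc () const { return m_loc; }
  unsigned int get_num_ranges () const { return m_num_ranges; }
  source_range get_range (unsigned int idx) const { return m_ranges[idx]; }

private:
  source_location m_loc;
  unsigned int m_num_ranges;
  source_range m_ranges[MAX_RANGES];  // no heap allocation needed
};
```

A diagnostic site would construct one of these as a local variable, add
any secondary ranges, and pass its address to the reporting function.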

The patch reworks various diagnostics machinery to use a
rich_location * rather than a source_location.  The "override_column"
machinery is largely eliminated.

The patch unit-tests the new diagnostic printer using a plugin, which
injects calls to print various diagnostics on some dummy source code,
and verifies the expected output, for the 4 combinations of
ASCII vs UTF-8 output and black&white vs colored output; screenshots
can be seen here:
 https://dmalcolm.fedorapeople.org/gcc/2015-09-09/ascii-bw.html
 https://dmalcolm.fedorapeople.org/gcc/2015-09-09/ascii-color.html
 https://dmalcolm.fedorapeople.org/gcc/2015-09-09/utf-8-bw.html
 https://dmalcolm.fedorapeople.org/gcc/2015-09-09/utf-8-color.html

Diagnostics already have a "severity color": errors default to bold red,
warnings to bold magenta, notes to bold cyan.  I chose to use this
severity color when coloring range 0, and after some experiments it
became natural to also use the color for the caret, since this is
notionally part of range 0.  Hence the "caret" color name goes away
from diagnostic-color.c, and we gain two new color names: "range1" and
"range2", for additional ranges.  Based on the discussion in that file
I chose green and blue (both non-bold) for these ranges.

The patch also tweaks diagnostic_report_diagnostic, so that the final
option_text for warnings is colored, using the effective severity, e.g.:
  [-Wformat]
   ^^^^^^^^ colored with the "warning" color
or:
  [-Werror=format]
   ^^^^^^^^^^^^^^ colored with the "error" color
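
A minimal sketch of this coloring, assuming the default severity colors
mentioned above (bold magenta for warnings, bold red for errors) and
plain SGR escape sequences.  The names "format_option_text" and
"sgr_for" are hypothetical, and the real code takes its colors from
GCC_COLORS rather than hard-coding them.

```cpp
#include <string>

enum severity { SEV_WARNING, SEV_ERROR };

// SGR parameters for the assumed default severity colors.
static const char *
sgr_for (severity sev)
{
  return sev == SEV_ERROR ? "01;31" : "01;35";
}

// Build the trailing "[-Wfoo]" text.  Only the option name is colored;
// the brackets stay in the default color.
std::string
format_option_text (const std::string &option, severity sev, bool colorize)
{
  if (!colorize)
    return "[" + option + "]";
  return std::string ("[\033[") + sgr_for (sev) + "m" + option + "\033[m]";
}
```

The severity passed in would be the effective one, so a warning promoted
by -Werror= is colored as an error.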

This patch bootstraps and passes regression testing, but isn't ready
as-is:

  - The new implementation of diagnostic_show_locus doesn't yet have
    any logic for offsetting columns when displaying very wide source
    lines on a narrow terminal.  This is a regression compared to the
    existing implementation (AFAIK our test suite doesn't have
    coverage for it yet).

Other questions and notes:

  - The Fortran frontend has its own logic for printing multiple
    locations, repeatedly calling in to diagnostic_print_caret_line.
    I hope the new printing logic is suitable for use by Fortran, but I
    wanted to keep the job of
      "introducing range-capable printing logic"
    separate from that of
      "updating Fortran diagnostics to use it",
    since I'm not very familiar with Fortran, and what is desirable
    there.  Hence to faithfully preserve the existing behavior, I
    introduced a flag into the diagnostic_context:
      "frontend_calls_diagnostic_print_caret_line_p"
    which is set by the Fortran frontend, and makes
    diagnostic_show_locus use the existing printing logic.  Hopefully
    that's acceptable, say, as a migration path.

  - I tried dropping the "_at_rich_loc" suffix from "warning_at_rich_loc" etc,
    and just having an overload (e.g. of "warning_at"), but we then run
    into lots of calls to e.g.
       warning_at (0,
    where the "0" means "UNKNOWN_LOCATION", and this would become a
    compilation error, due to ambiguity of the overload (0 location_t
    vs a NULL rich_location *).
    The call sites of the above form could be changed to explicitly use
    UNKNOWN_LOCATION instead of 0, if desired.  I think I prefer this
    approach, though it touches more code.

  - There is a nasty hack in the test plugin due to a linker issue (due
    to it being the only user of a new C++ method, which makes it into
    libbackend.a but not cc1; see the notes in that file); help with
    resolving it cleanly would be appreciated.
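
The overload ambiguity mentioned in the notes above can be reproduced
with a small sketch.  The declarations are hypothetical stand-ins (in
particular, UNKNOWN_LOCATION is modelled as a typed constant here), and
the functions just report which overload was chosen.

```cpp
// Hypothetical stand-ins for illustration; not GCC's real declarations.
typedef unsigned int location_t;
const location_t UNKNOWN_LOCATION = 0;

struct rich_location
{
  location_t m_loc;
};

enum overload_kind { TOOK_LOCATION, TOOK_RICH_LOCATION };

overload_kind
warning_at (location_t)
{
  return TOOK_LOCATION;
}

overload_kind
warning_at (rich_location *)
{
  return TOOK_RICH_LOCATION;
}

// With both overloads visible, this call would be rejected as ambiguous:
//
//   warning_at (0);
//
// The literal 0 is an int, so reaching location_t needs an integral
// conversion and reaching rich_location * needs a null pointer
// conversion.  Both have the same rank, so neither overload wins.

// Spelling the location with its own type makes the first overload an
// exact match, so the call is unambiguous.
overload_kind
warn_at_unknown_location ()
{
  return warning_at (UNKNOWN_LOCATION);
}
```

This is why converting the "warning_at (0, ...)" call sites to use
UNKNOWN_LOCATION explicitly would allow the suffix to be dropped.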

Thoughts?

libcpp/ChangeLog:
	* errors.c (cpp_diagnostic): Update for change in signature
	of "error" callback.
	(cpp_diagnostic_with_line): Likewise, calling override_column
	on the rich_location.
	* include/cpplib.h (struct cpp_callbacks): Within "error"
	callback, convert param from source_location to rich_location *,
	and drop column_override param.
	* include/line-map.h (struct source_range): New struct.
	(struct location_range): New struct.
	(enum buffer_ownership): New enum.
	(class rich_location): New class.
	(linemap_client_expand_location_to_spelling_point): New declaration.
	* line-map.c (rich_location::rich_location): New ctors.
	(rich_location::~rich_location): New dtor.
	(rich_location::get_first_line): New method.
	(rich_location::get_last_line): New method.
	(rich_location::lazily_expand_location): New method.
	(rich_location::override_column): New method.
	(rich_location::add_range): New methods.
	(rich_location::set_range): New method.

gcc/ChangeLog:
	* Makefile.in (OBJS): Add gcc-rich-location.o.
	(OBJS-libcommon): Add box-drawing.o.
	* box-drawing.c: New file.
	* box-drawing.h: New file.
	* diagnostic-color.c (color_dict): Eliminate "caret"; add "range1"
	and "range2".
	(parse_gcc_colors): Update comment to describe default GCC_COLORS.
	* diagnostic-core.h (warning_at_rich_loc): New declaration.
	(error_at_rich_loc): New declaration.
	(permerror_at_rich_loc): New declaration.
	(inform_at_rich_loc): New declaration.
	* diagnostic-show-locus.c: Include box-drawing.h.
	(location_range::contains_point): New method.
	(get_state_at_point): New function.
	(get_x_bound_for_row): New function.
	(class colorizer): New class.
	(class per_range_info): New class.
	(class layout): New class.
	(colorizer::colorizer): New ctor.
	(colorizer::~colorizer): New dtor.
	(colorizer::set_state): New method.
	(colorizer::begin_state): New method.
	(colorizer::finish_state): New method.
	(per_range_info::per_range_info): New ctor.
	(per_range_info::add_uniquely_captioned_row): New method.
	(per_range_info::get_first_unique_row): New method.
	(per_range_info::get_last_unique_row): New method.
	(per_range_info::contains_line): New method.
	(get_line_width_without_trailing_whitespace): New method.
	(per_range_info::determine_location_for_caption): New method.
	(layout::layout): New ctor.
	(layout::print_line): New method.
	(layout::get_any_range): New method.
	(layout::print_any_margin): New method.
	(show_ruler): New function.
	(diagnostic_print_ranges): New function.
	(diagnostic_show_locus): Call new function diagnostic_print_ranges,
	falling back to diagnostic_print_caret_line if the frontend has
	set frontend_calls_diagnostic_print_caret_line_p on the
	diagnostic_context.
	(diagnostic_print_caret_line): Convert params caret1 and caret2
	from char to const char *.
	* diagnostic.c: Include "box-drawing.h".
	(diagnostic_initialize): Initialize caret_chars from g_line_art.
	(diagnostic_set_info_translated): Convert param from location_t
	to rich_location *.  Eliminate calls to set_location on the
	message in favor of storing the rich_location ptr there.
	(diagnostic_set_info): Convert param from location_t to
	rich_location *.
	(diagnostic_build): Break out array into...
	(diagnostic_kind_color): New variable.
	(diagnostic_get_color_for_kind): New function.
	(diagnostic_report_diagnostic): Colorize the option_text
	using the color for the severity.
	(diagnostic_append_note): Update for change in signature of
	diagnostic_set_info.
	(diagnostic_append_note_at_rich_loc): New function.
	(emit_diagnostic): Update for change in signature of
	diagnostic_set_info.
	(inform): Likewise.
	(inform_at_rich_loc): New function.
	(inform_n): Update for change in signature of diagnostic_set_info.
	(warning): Likewise.
	(warning_at): Likewise.
	(warning_at_rich_loc): New function.
	(warning_n): Update for change in signature of diagnostic_set_info.
	(pedwarn): Likewise.
	(permerror): Likewise.
	(permerror_at_rich_loc): New function.
	(error): Update for change in signature of diagnostic_set_info.
	(error_n): Likewise.
	(error_at): Likewise.
	(error_at_rich_loc): New function.
	(sorry): Update for change in signature of diagnostic_set_info.
	(fatal_error): Likewise.
	(internal_error): Likewise.
	(internal_error_no_backtrace): Likewise.
	(source_range::debug): New function.
	* diagnostic.h (struct diagnostic_info): Eliminate field
	"override_column".  Add field "richloc".
	(struct diagnostic_context): Convert "caret_chars" elements from
	char to const char *.  Add field
	"frontend_calls_diagnostic_print_caret_line_p".
	(diagnostic_override_column): Eliminate this macro.
	(diagnostic_set_info): Convert param from location_t to
	rich_location *.
	(diagnostic_set_info_translated): Likewise.
	(diagnostic_append_note_at_rich_loc): New function.
	(diagnostic_num_locations): New function.
	(diagnostic_expand_location): Get the location from the
	rich_location.
	(diagnostic_print_caret_line): Convert params "caret1" and
	"caret2" from char to const char *.
	(diagnostic_get_color_for_kind): New declaration.
	* gcc-rich-location.c: New file.
	* gcc-rich-location.h: New file.
	* genmatch.c (linemap_client_expand_location_to_spelling_point): New.
	(error_cb): Update for change in signature of "error" callback.
	(fatal_at): Likewise.
	(warning_at): Likewise.
	* input.c (linemap_client_expand_location_to_spelling_point): New.
	* intl.c: Include "box-drawing.h".
	(gcc_init_libintl): Initialize g_line_art.
	* pretty-print.c (text_info::set_range): New method.
	(text_info::get_location): New method.
	* pretty-print.h (MAX_LOCATIONS_PER_MESSAGE): Eliminate this macro.
	(struct text_info): Eliminate "locations" array in favor of
	"m_richloc", a rich_location *.
	(text_info::set_location): Add a "caret_p" param, and reimplement
	in terms of a call to set_range.
	(text_info::get_location): Eliminate inline implementation in favor of
	an out-of-line reimplementation.
	(text_info::set_range): New method.
	* rtl-error.c (diagnostic_for_asm): Update for change in signature
	of diagnostic_set_info.
	* tree-diagnostic.c (default_tree_printer): Update for new
	"caret_p" param for text_info::set_location.
	* tree-pretty-print.c (percent_K_format): Likewise.

gcc/c-family/ChangeLog:
	* c-common.c (c_cpp_error): Convert parameter from location_t to
	rich_location *.  Eliminate the "column_override" parameter and
	the call to diagnostic_override_column.
	Update the "done_lexing" clause to set range 0
	on the rich_location, rather than overwriting a location_t.
	* c-common.h (c_cpp_error): Convert parameter from location_t to
	rich_location *.  Eliminate the "column_override" parameter.

gcc/c/ChangeLog:
	* c-decl.c (warn_defaults_to): Update for change in signature
	of diagnostic_set_info.
	* c-errors.c (pedwarn_c99): Likewise.
	(pedwarn_c90): Likewise.
	* c-objc-common.c (c_tree_printer): Update for new "caret_p" param
	for text_info::set_location.

gcc/cp/ChangeLog:
	* error.c (cp_printer): Update for new "caret_p" param for
	text_info::set_location.
	(pedwarn_cxx98): Update for change in signature of
	diagnostic_set_info.

gcc/fortran/ChangeLog:
	* cpp.c (cb_cpp_error): Convert parameter from location_t to
	rich_location *.  Eliminate the "column_override" parameter.
	* error.c: Include "box-drawing.h".
	(gfc_warning): Update for change in signature of
	diagnostic_set_info.
	(gfc_format_decoder): Update handling of %C/%L for changes
	to struct text_info.
	(gfc_diagnostic_starter): Use richloc when determining whether to
	print one locus or two.
	(gfc_warning_now_at): Update for change in signature of
	diagnostic_set_info.
	(gfc_warning_now): Likewise.
	(gfc_error_now): Likewise.
	(gfc_fatal_error): Likewise.
	(gfc_error): Likewise.
	(gfc_internal_error): Likewise.
	(gfc_diagnostics_init): Update initialization of caret_chars from char
	to const char *.  Set frontend_calls_diagnostic_print_caret_line_p.
	(gfc_diagnostics_finish): Update resetting of caret_chars from char
	to const char *.

gcc/testsuite/ChangeLog:
	* gcc.dg/plugin/diagnostic-test-show-locus-ascii-bw.c: New file.
	* gcc.dg/plugin/diagnostic-test-show-locus-ascii-color.c: New file.
	* gcc.dg/plugin/diagnostic-test-show-locus-utf-8-bw.c: New file.
	* gcc.dg/plugin/diagnostic-test-show-locus-utf-8-color.c: New file.
	* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c: New file.
	* gcc.dg/plugin/plugin.exp (plugin_test_list): Add the above.
	* lib/gcc-dg.exp: Load multiline.exp.
---
 gcc/Makefile.in                                    |   3 +-
 gcc/box-drawing.c                                  |  99 +++
 gcc/box-drawing.h                                  |  43 +
 gcc/c-family/c-common.c                            |  15 +-
 gcc/c-family/c-common.h                            |   4 +-
 gcc/c/c-decl.c                                     |   3 +-
 gcc/c/c-errors.c                                   |  12 +-
 gcc/c/c-objc-common.c                              |   2 +-
 gcc/cp/error.c                                     |   5 +-
 gcc/diagnostic-color.c                             |   5 +-
 gcc/diagnostic-core.h                              |   8 +
 gcc/diagnostic-show-locus.c                        | 877 ++++++++++++++++++++-
 gcc/diagnostic.c                                   | 200 ++++-
 gcc/diagnostic.h                                   |  48 +-
 gcc/fortran/cpp.c                                  |  13 +-
 gcc/fortran/error.c                                |  43 +-
 gcc/gcc-rich-location.c                            |  93 +++
 gcc/gcc-rich-location.h                            |  54 ++
 gcc/genmatch.c                                     |  27 +-
 gcc/input.c                                        |   7 +
 gcc/intl.c                                         |   9 +
 gcc/pretty-print.c                                 |  21 +
 gcc/pretty-print.h                                 |  25 +-
 gcc/rtl-error.c                                    |   3 +-
 .../plugin/diagnostic-test-show-locus-ascii-bw.c   | 114 +++
 .../diagnostic-test-show-locus-ascii-color.c       |  35 +
 .../plugin/diagnostic-test-show-locus-utf-8-bw.c   |  58 ++
 .../diagnostic-test-show-locus-utf-8-color.c       |  62 ++
 .../plugin/diagnostic_plugin_test_show_locus.c     | 361 +++++++++
 gcc/testsuite/gcc.dg/plugin/plugin.exp             |   5 +
 gcc/testsuite/lib/gcc-dg.exp                       |   1 +
 gcc/tree-diagnostic.c                              |   2 +-
 gcc/tree-pretty-print.c                            |   2 +-
 libcpp/errors.c                                    |   7 +-
 libcpp/include/cpplib.h                            |   4 +-
 libcpp/include/line-map.h                          | 236 ++++++
 libcpp/line-map.c                                  | 208 +++++
 37 files changed, 2569 insertions(+), 145 deletions(-)
 create mode 100644 gcc/box-drawing.c
 create mode 100644 gcc/box-drawing.h
 create mode 100644 gcc/gcc-rich-location.c
 create mode 100644 gcc/gcc-rich-location.h
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-ascii-bw.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-ascii-color.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-utf-8-bw.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-utf-8-color.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index f183b22..c472696 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1255,6 +1255,7 @@ OBJS = \
 	fold-const.o \
 	function.o \
 	fwprop.o \
+	gcc-rich-location.o \
 	gcse.o \
 	gcse-common.o \
 	ggc-common.o \
@@ -1513,7 +1514,7 @@ OBJS = \
 # Objects in libcommon.a, potentially used by all host binaries and with
 # no target dependencies.
 OBJS-libcommon = diagnostic.o diagnostic-color.o diagnostic-show-locus.o \
-	pretty-print.o intl.o \
+	box-drawing.o pretty-print.o intl.o \
 	vec.o input.o version.o hash-table.o ggc-none.o
 
 # Objects in libcommon-target.a, used by drivers and by the core
diff --git a/gcc/box-drawing.c b/gcc/box-drawing.c
new file mode 100644
index 0000000..af0619e
--- /dev/null
+++ b/gcc/box-drawing.c
@@ -0,0 +1,99 @@
+/* Box-drawing, either using Unicode box-drawing chars, or as pure ASCII
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "box-drawing.h"
+
+/* Global singleton instance.  */
+box_drawing g_line_art;
+
+/* Strings for drawing the caret and ranges within diagnostics.
+   These are either UTF-8-encoded Unicode box-drawing characters,
+   or pure ASCII.  */
+
+/* Initializer for box_drawing.
+   This isn't a constructor since the singleton instance is
+   statically-allocated.  */
+void
+box_drawing::init (bool have_utf8)
+{
+  if (have_utf8)
+    {
+      /* For the caret, use:
+	   U+25B2 BLACK UP-POINTING TRIANGLE
+	 which in UTF-8 is: 0xE2 0x96 0xB2.  */
+      default_caret = "\xE2\x96\xB2";
+
+      /* Underlining text ranges.  */
+
+      /* The start of an underline: the south-western corner of a box:
+	   U+2514 BOX DRAWINGS LIGHT UP AND RIGHT
+	 in UTF-8: 0xE2 0x94 0x94.  */
+      underline_start = "\xE2\x94\x94";
+
+      /* Within an underline:
+	   U+2500 BOX DRAWINGS LIGHT HORIZONTAL
+	 which in UTF-8 is: 0xE2 0x94 0x80.  */
+      underline_hbar = "\xE2\x94\x80";
+
+      /* The end of an underline: the south-eastern corner of a box:
+	   U+2518 BOX DRAWINGS LIGHT UP AND LEFT
+	 UTF-8: 0xE2 0x94 0x98.  */
+      underline_end = "\xE2\x94\x98";
+
+      /* Vertical margins (when showing captions for ranges).  */
+
+      /* The top of a rmargin: the north-eastern corner of a box:
+	   U+2510 BOX DRAWINGS LIGHT DOWN AND LEFT
+	 UTF-8: 0xE2 0x94 0x90.  */
+      rmargin_start = "\xE2\x94\x90";
+
+      /* For vertical lines, use:
+	   U+2502 BOX DRAWINGS LIGHT VERTICAL
+	   which in UTF-8 is: 0xE2 0x94 0x82.  */
+      rmargin_vbar = "\xE2\x94\x82";
+
+      /* For the row within the rmargin containing the caption:
+	   U+251C BOX DRAWINGS LIGHT VERTICAL AND RIGHT
+	 UTF-8: 0xE2 0x94 0x9C.  */
+      rmargin_caption_row = "\xE2\x94\x9C";
+
+      /* Unless it's on the very last line, in which case:
+	   U+2534 BOX DRAWINGS LIGHT UP AND HORIZONTAL
+	 UTF-8: 0xE2 0x94 0xB4.  */
+      rmargin_caption_row_at_end = "\xE2\x94\xB4";
+
+      /* The end of a rmargin: the south-eastern corner of a box:
+	   U+2518 BOX DRAWINGS LIGHT UP AND LEFT
+	 UTF-8: 0xE2 0x94 0x98.  */
+      rmargin_end = "\xE2\x94\x98";
+    }
+  else
+    {
+      default_caret = "^";
+      underline_start = underline_hbar = underline_end = "~";
+
+      rmargin_start = "|";
+      rmargin_vbar = "|";
+      rmargin_caption_row = rmargin_caption_row_at_end = "+";
+      rmargin_end = "|";
+    }
+}
diff --git a/gcc/box-drawing.h b/gcc/box-drawing.h
new file mode 100644
index 0000000..7fef108
--- /dev/null
+++ b/gcc/box-drawing.h
@@ -0,0 +1,43 @@
+/* Box-drawing, either using Unicode box-drawing chars, or as pure ASCII
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* Strings for drawing the caret and ranges within diagnostics.
+   These are either UTF-8-encoded Unicode box-drawing characters,
+   or pure ASCII.  */
+
+class box_drawing
+{
+ public:
+  void init (bool have_utf8);
+
+  const char *default_caret;
+  const char *underline_start;
+  const char *underline_hbar;
+  const char *underline_end;
+
+  /* Right-hand-side margins.  */
+  const char *rmargin_start;
+  const char *rmargin_vbar;
+  const char *rmargin_caption_row;
+  const char *rmargin_caption_row_at_end;
+  const char *rmargin_end;
+};
+
+/* Singleton instance.  */
+extern box_drawing g_line_art;
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 9758b9e..c02ea39 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -10451,15 +10451,14 @@ c_option_controlling_cpp_error (int reason)
 /* Callback from cpp_error for PFILE to print diagnostics from the
    preprocessor.  The diagnostic is of type LEVEL, with REASON set
    to the reason code if LEVEL is represents a warning, at location
-   LOCATION unless this is after lexing and the compiler's location
-   should be used instead, with column number possibly overridden by
-   COLUMN_OVERRIDE if not zero; MSG is the translated message and AP
+   RICHLOC unless this is after lexing and the compiler's location
+   should be used instead; MSG is the translated message and AP
    the arguments.  Returns true if a diagnostic was emitted, false
    otherwise.  */
 
 bool
 c_cpp_error (cpp_reader *pfile ATTRIBUTE_UNUSED, int level, int reason,
-	     location_t location, unsigned int column_override,
+	     rich_location *richloc,
 	     const char *msg, va_list *ap)
 {
   diagnostic_info diagnostic;
@@ -10500,11 +10499,11 @@ c_cpp_error (cpp_reader *pfile ATTRIBUTE_UNUSED, int level, int reason,
       gcc_unreachable ();
     }
   if (done_lexing)
-    location = input_location;
+    richloc->set_range (0,
+			source_range::from_location (input_location),
+			true);
   diagnostic_set_info_translated (&diagnostic, msg, ap,
-				  location, dlevel);
-  if (column_override)
-    diagnostic_override_column (&diagnostic, column_override);
+				  richloc, dlevel);
   diagnostic_override_option_index (&diagnostic,
                                     c_option_controlling_cpp_error (reason));
   ret = report_diagnostic (&diagnostic);
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 74d1bc1..bb17fcc 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -981,9 +981,9 @@ extern void init_c_lex (void);
 
 extern void c_cpp_builtins (cpp_reader *);
 extern void c_cpp_builtins_optimize_pragma (cpp_reader *, tree, tree);
-extern bool c_cpp_error (cpp_reader *, int, int, location_t, unsigned int,
+extern bool c_cpp_error (cpp_reader *, int, int, rich_location *,
 			 const char *, va_list *)
-     ATTRIBUTE_GCC_DIAG(6,0);
+     ATTRIBUTE_GCC_DIAG(5,0);
 extern int c_common_has_attribute (cpp_reader *);
 
 extern bool parse_optimize_options (tree, bool);
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index b83c584..e6b6ba5 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -5273,9 +5273,10 @@ warn_defaults_to (location_t location, int opt, const char *gmsgid, ...)
 {
   diagnostic_info diagnostic;
   va_list ap;
+  rich_location richloc (location);
 
   va_start (ap, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, location,
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc,
                        flag_isoc99 ? DK_PEDWARN : DK_WARNING);
   diagnostic.option_index = opt;
   report_diagnostic (&diagnostic);
diff --git a/gcc/c/c-errors.c b/gcc/c/c-errors.c
index e5fbf05..0f8b933 100644
--- a/gcc/c/c-errors.c
+++ b/gcc/c/c-errors.c
@@ -42,13 +42,14 @@ pedwarn_c99 (location_t location, int opt, const char *gmsgid, ...)
   diagnostic_info diagnostic;
   va_list ap;
   bool warned = false;
+  rich_location richloc (location);
 
   va_start (ap, gmsgid);
   /* If desired, issue the C99/C11 compat warning, which is more specific
      than -pedantic.  */
   if (warn_c99_c11_compat > 0)
     {
-      diagnostic_set_info (&diagnostic, gmsgid, &ap, location,
+      diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc,
 			   (pedantic && !flag_isoc11)
 			   ? DK_PEDWARN : DK_WARNING);
       diagnostic.option_index = OPT_Wc99_c11_compat;
@@ -60,7 +61,7 @@ pedwarn_c99 (location_t location, int opt, const char *gmsgid, ...)
   /* For -pedantic outside C11, issue a pedwarn.  */
   else if (pedantic && !flag_isoc11)
     {
-      diagnostic_set_info (&diagnostic, gmsgid, &ap, location, DK_PEDWARN);
+      diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_PEDWARN);
       diagnostic.option_index = opt;
       warned = report_diagnostic (&diagnostic);
     }
@@ -80,6 +81,7 @@ pedwarn_c90 (location_t location, int opt, const char *gmsgid, ...)
 {
   diagnostic_info diagnostic;
   va_list ap;
+  rich_location richloc (location);
 
   va_start (ap, gmsgid);
   /* Warnings such as -Wvla are the most specific ones.  */
@@ -90,7 +92,7 @@ pedwarn_c90 (location_t location, int opt, const char *gmsgid, ...)
         goto out;
       else if (opt_var > 0)
 	{
-	  diagnostic_set_info (&diagnostic, gmsgid, &ap, location,
+	  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc,
 			       (pedantic && !flag_isoc99)
 			       ? DK_PEDWARN : DK_WARNING);
 	  diagnostic.option_index = opt;
@@ -102,7 +104,7 @@ pedwarn_c90 (location_t location, int opt, const char *gmsgid, ...)
      specific than -pedantic.  */
   if (warn_c90_c99_compat > 0)
     {
-      diagnostic_set_info (&diagnostic, gmsgid, &ap, location,
+      diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc,
 			   (pedantic && !flag_isoc99)
 			   ? DK_PEDWARN : DK_WARNING);
       diagnostic.option_index = OPT_Wc90_c99_compat;
@@ -114,7 +116,7 @@ pedwarn_c90 (location_t location, int opt, const char *gmsgid, ...)
   /* For -pedantic outside C99, issue a pedwarn.  */
   else if (pedantic && !flag_isoc99)
     {
-      diagnostic_set_info (&diagnostic, gmsgid, &ap, location, DK_PEDWARN);
+      diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_PEDWARN);
       diagnostic.option_index = opt;
       report_diagnostic (&diagnostic);
     }
diff --git a/gcc/c/c-objc-common.c b/gcc/c/c-objc-common.c
index 47fd7de..1e601f9 100644
--- a/gcc/c/c-objc-common.c
+++ b/gcc/c/c-objc-common.c
@@ -101,7 +101,7 @@ c_tree_printer (pretty_printer *pp, text_info *text, const char *spec,
     {
       t = va_arg (*text->args_ptr, tree);
       if (set_locus)
-	text->set_location (0, DECL_SOURCE_LOCATION (t));
+	text->set_location (0, DECL_SOURCE_LOCATION (t), true);
     }
 
   switch (*spec)
diff --git a/gcc/cp/error.c b/gcc/cp/error.c
index faf8744..19ca8c3 100644
--- a/gcc/cp/error.c
+++ b/gcc/cp/error.c
@@ -3554,7 +3554,7 @@ cp_printer (pretty_printer *pp, text_info *text, const char *spec,
 
   pp_string (pp, result);
   if (set_locus && t != NULL)
-    text->set_location (0, location_of (t));
+    text->set_location (0, location_of (t), true);
   return true;
 #undef next_tree
 #undef next_tcode
@@ -3668,9 +3668,10 @@ pedwarn_cxx98 (location_t location, int opt, const char *gmsgid, ...)
   diagnostic_info diagnostic;
   va_list ap;
   bool ret;
+  rich_location richloc (location);
 
   va_start (ap, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, location,
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc,
 		       (cxx_dialect == cxx98) ? DK_PEDWARN : DK_WARNING);
   diagnostic.option_index = opt;
   ret = report_diagnostic (&diagnostic);
diff --git a/gcc/diagnostic-color.c b/gcc/diagnostic-color.c
index 3fe49b2..d848dfc 100644
--- a/gcc/diagnostic-color.c
+++ b/gcc/diagnostic-color.c
@@ -164,7 +164,8 @@ static struct color_cap color_dict[] =
   { "warning", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_MAGENTA),
 	       7, false },
   { "note", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_CYAN), 4, false },
-  { "caret", SGR_SEQ (COLOR_BOLD COLOR_SEPARATOR COLOR_FG_GREEN), 5, false },
+  { "range1", SGR_SEQ (COLOR_FG_GREEN), 6, false },
+  { "range2", SGR_SEQ (COLOR_FG_BLUE), 6, false },
   { "locus", SGR_SEQ (COLOR_BOLD), 5, false },
   { "quote", SGR_SEQ (COLOR_BOLD), 5, false },
   { NULL, NULL, 0, false }
@@ -195,7 +196,7 @@ colorize_stop (bool show_color)
 }
 
 /* Parse GCC_COLORS.  The default would look like:
-   GCC_COLORS='error=01;31:warning=01;35:note=01;36:caret=01;32:locus=01:quote=01'
+   GCC_COLORS='error=01;31:warning=01;35:note=01;36:range1=32:range2=34:locus=01:quote=01'
    No character escaping is needed or supported.  */
 static bool
 parse_gcc_colors (void)
diff --git a/gcc/diagnostic-core.h b/gcc/diagnostic-core.h
index 66d2e42..a8a7c37 100644
--- a/gcc/diagnostic-core.h
+++ b/gcc/diagnostic-core.h
@@ -63,18 +63,26 @@ extern bool warning_n (location_t, int, int, const char *, const char *, ...)
     ATTRIBUTE_GCC_DIAG(4,6) ATTRIBUTE_GCC_DIAG(5,6);
 extern bool warning_at (location_t, int, const char *, ...)
     ATTRIBUTE_GCC_DIAG(3,4);
+extern bool warning_at_rich_loc (rich_location *, int, const char *, ...)
+    ATTRIBUTE_GCC_DIAG(3,4);
 extern void error (const char *, ...) ATTRIBUTE_GCC_DIAG(1,2);
 extern void error_n (location_t, int, const char *, const char *, ...)
     ATTRIBUTE_GCC_DIAG(3,5) ATTRIBUTE_GCC_DIAG(4,5);
 extern void error_at (location_t, const char *, ...) ATTRIBUTE_GCC_DIAG(2,3);
+extern void error_at_rich_loc (rich_location *, const char *, ...)
+  ATTRIBUTE_GCC_DIAG(2,3);
 extern void fatal_error (location_t, const char *, ...) ATTRIBUTE_GCC_DIAG(2,3)
      ATTRIBUTE_NORETURN;
 /* Pass one of the OPT_W* from options.h as the second parameter.  */
 extern bool pedwarn (location_t, int, const char *, ...)
      ATTRIBUTE_GCC_DIAG(3,4);
 extern bool permerror (location_t, const char *, ...) ATTRIBUTE_GCC_DIAG(2,3);
+extern bool permerror_at_rich_loc (rich_location *, const char *,
+				   ...) ATTRIBUTE_GCC_DIAG(2,3);
 extern void sorry (const char *, ...) ATTRIBUTE_GCC_DIAG(1,2);
 extern void inform (location_t, const char *, ...) ATTRIBUTE_GCC_DIAG(2,3);
+extern void inform_at_rich_loc (rich_location *, const char *,
+				...) ATTRIBUTE_GCC_DIAG(2,3);
 extern void inform_n (location_t, int, const char *, const char *, ...)
     ATTRIBUTE_GCC_DIAG(3,5) ATTRIBUTE_GCC_DIAG(4,5);
 extern void verbatim (const char *, ...) ATTRIBUTE_GCC_DIAG(1,2);
diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c
index 147a2b8..9216c4c 100644
--- a/gcc/diagnostic-show-locus.c
+++ b/gcc/diagnostic-show-locus.c
@@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "backtrace.h"
 #include "diagnostic.h"
 #include "diagnostic-color.h"
+#include "box-drawing.h"
 
 #ifdef HAVE_TERMIOS_H
 # include <termios.h>
@@ -36,6 +37,10 @@ along with GCC; see the file COPYING3.  If not see
 # include <sys/ioctl.h>
 #endif
 
+static void
+diagnostic_print_ranges (diagnostic_context *context,
+			 const diagnostic_info *diagnostic);
+
 /* If LINE is longer than MAX_WIDTH, and COLUMN is not smaller than
    MAX_WIDTH by some margin, then adjust the start of the line such
    that the COLUMN is smaller than MAX_WIDTH minus the margin.  The
@@ -60,11 +65,808 @@ adjust_line (const char *line, int line_width,
   return line;
 }
 
-/* Print the physical source line corresponding to the location of
-   this diagnostic, and a caret indicating the precise column.  This
-   function only prints two caret characters if the two locations
-   given by DIAGNOSTIC are on the same line according to
-   diagnostic_same_line().  */
+/* Is (column, row) within the given range?
+
+   Ranges are closed (both limits are within the range).
+
+   Example A: a single-line range:
+     start:  (col=22, line=2)
+     finish: (col=38, line=2)
+
+  |00000011111111112222222222333333333344444444444
+  |34567890123456789012345678901234567890123456789
+--+-----------------------------------------------
+01|bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
+02|bbbbbbbbbbbbbbbbbbbSwwwwwwwwwwwwwwwFaaaaaaaaaaa
+03|aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+   Example B: a multiline range with
+     start:  (col=14, line=3)
+     finish: (col=08, line=5)
+
+  |00000011111111112222222222333333333344444444444
+  |34567890123456789012345678901234567890123456789
+--+-----------------------------------------------
+01|bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
+02|bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
+03|bbbbbbbbbbbSwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww
+04|wwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwwww
+05|wwwwwFaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+06|aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+--+-----------------------------------------------
+
+   Legend:
+   - 'b' indicates a point *before* the range
+   - 'S' indicates the start of the range
+   - 'w' indicates a point within the range
+   - 'F' indicates the finish of the range (which is
+	 within it).
+   - 'a' indicates a subsequent point *after* the range.  */
+
+bool
+location_range::contains_point (int row, int column) const
+{
+  gcc_assert (m_start.line <= m_finish.line);
+  /* ...but the equivalent isn't true for the columns;
+     consider example B in the comment above.  */
+
+  if (row < m_start.line)
+    /* Points before the first line of the range are
+       outside it (corresponding to line 01 in example A
+       and lines 01 and 02 in example B above).  */
+    return false;
+
+  if (row == m_start.line)
+    /* On same line as start of range (corresponding
+       to line 02 in example A and line 03 in example B).  */
+    {
+      if (column < m_start.column)
+	/* Points on the starting line of the range, but
+	   before the column in which it begins.  */
+	return false;
+
+      if (row < m_finish.line)
+	/* This is a multiline range; the point
+	   is within it (corresponds to line 03 in example B
+	   from column 14 onwards) */
+	return true;
+      else
+	{
+	  /* This is a single-line range.  */
+	  gcc_assert (row == m_finish.line);
+	  return column <= m_finish.column;
+	}
+    }
+
+  /* The point is in a line beyond that containing the
+     start of the range: lines 03 onwards in example A,
+     and lines 04 onwards in example B.  */
+  gcc_assert (row > m_start.line);
+
+  if (row > m_finish.line)
+    /* The point is beyond the final line of the range
+       (lines 03 onwards in example A, and lines 06 onwards
+       in example B).  */
+    return false;
+
+  if (row < m_finish.line)
+    {
+      /* The point is in a line that's fully within a multiline
+	 range (e.g. line 04 in example B).  */
+      gcc_assert (m_start.line < m_finish.line);
+      return true;
+    }
+
+  gcc_assert (row == m_finish.line);
+
+  return column <= m_finish.column;
+}
+
+/* Return true if (ROW/COLUMN) is within a range of RICHLOC.
+   If it returns true, OUT_RANGE_IDX and OUT_DRAW_CARET_P are
+   written to, with the range index, and whether we should draw
+   the caret at (ROW/COLUMN) (as opposed to an underline).  */
+
+static bool
+get_state_at_point (/* Inputs.  */
+		    int row, int column, rich_location *richloc,
+		    int first_non_ws, int last_non_ws,
+		    /* Outputs.  */
+		    int *out_range_idx,
+		    bool *out_draw_caret_p)
+{
+  /* Within a multiline range, don't display any underline or caret
+     in any leading or trailing whitespace on a line.  */
+  if (column < first_non_ws || column > last_non_ws)
+    return false;
+
+  for (rich_location::range_iter iter = richloc->iter_ranges ();
+       !iter.at_end ();
+       iter.next())
+    {
+      const location_range *range = *iter;
+
+      if (0)
+	fprintf (stderr,
+		 "range ( (%i, %i), (%i, %i))->contains_point (%i, %i): %s\n",
+		 range->m_start.line,
+		 range->m_start.column,
+		 range->m_finish.line,
+		 range->m_finish.column,
+		 row,
+		 column,
+		 range->contains_point (row, column) ? "true" : "false");
+
+      if (range->contains_point (row, column))
+	{
+	  *out_range_idx = iter.index();
+
+	  /* Are we at the range's caret?  Is it visible? */
+	  *out_draw_caret_p = false;
+	  if (row == range->m_start.line
+	      && column == range->m_start.column)
+	    *out_draw_caret_p = range->m_show_caret_p;
+
+	  /* We are within a range.  */
+	  return true;
+	}
+    }
+
+  return false;
+}
+
+/* Get the column beyond the rightmost one that could contain a caret or
+   range marker, given that we stop rendering at trailing whitespace.  */
+
+static int
+get_x_bound_for_row (int row, int caret_column,
+		     rich_location *richloc,
+		     int last_non_ws)
+{
+  int result = caret_column + 1;
+
+  for (rich_location::range_iter iter = richloc->iter_ranges ();
+       !iter.at_end ();
+       iter.next ())
+    {
+      const location_range *range = *iter;
+      if (row >= range->m_start.line)
+	{
+	  if (range->m_finish.line == row)
+	    {
+	      /* On the final line within a range; ensure that
+		 we render up to the end of the range.  */
+	      if (result <= range->m_finish.column)
+		result = range->m_finish.column + 1;
+	    }
+	  else if (row < range->m_finish.line)
+	    {
+	      /* Within a multiline range; ensure that we render up to the
+		 last non-whitespace column.  */
+	      if (result <= last_non_ws)
+		result = last_non_ws + 1;
+	    }
+	}
+    }
+
+  return result;
+}
+
+/* Classes for rendering source code and diagnostics, within an
+   anonymous namespace.
+   The work is done by "class layout", which embeds and uses
+   "class colorizer" and "class per_range_info" to get things done.  */
+
+namespace {
+
+/* A class to inject colorization codes when printing the diagnostic locus,
+   tracking state as it goes.  */
+
+class colorizer
+{
+ public:
+  colorizer (diagnostic_context *context,
+	     const diagnostic_info *diagnostic);
+  ~colorizer ();
+
+  void set_range (int range_idx) { set_state (range_idx); }
+  void set_normal_text () { set_state (STATE_NORMAL_TEXT); }
+
+ private:
+  void set_state (int state);
+  void begin_state (int state);
+  void finish_state (int state);
+
+ private:
+  static const int STATE_NORMAL_TEXT = -1;
+
+  diagnostic_context *m_context;
+  const diagnostic_info *m_diagnostic;
+  int m_current_state;
+  const char *m_caret_cs;
+  const char *m_caret_ce;
+  const char *m_range1_cs;
+  const char *m_range2_cs;
+  const char *m_range_ce;
+};
+
+/* A class for use by "class layout" below for capturing information on
+   how to display a specific range within a rich_location.  */
+
+class per_range_info
+{
+ public:
+  per_range_info (int range_idx);
+
+  int get_range_index () const { return m_range_idx; }
+
+  void add_uniquely_captioned_row (int row);
+
+  int get_first_unique_row () const;
+  int get_last_unique_row () const;
+
+  bool contains_line (int line) const;
+
+  void determine_location_for_caption (const char *filename);
+
+  int get_caption_row () const { return m_caption_row; }
+  int get_caption_column () const { return m_caption_column; }
+
+ private:
+  int m_range_idx;
+  auto_vec<int> m_unique_rows;
+  int m_caption_row;
+  int m_caption_column;
+};
+
+/* A class to control the overall layout of a diagnostic.
+
+   The layout is determined within the constructor.
+   It is then printed by repeatedly calling the "print_line" method.
+   Each such call can print two lines: one for the source line itself,
+   and potentially an "annotation" line, containing carets/underlines.
+   Both such lines could be printed with a right-hand margin, containing
+   additional information.
+
+   For now, we just support the case of a single margin at once,
+   showing any captioned ranges that exclusively occupy their lines.
+
+   We also assume we have disjoint ranges.  */
+
+class layout
+{
+ public:
+  layout (diagnostic_context *context,
+	  const diagnostic_info *diagnostic);
+
+  void print_line (int row);
+
+ private:
+  enum print_line_kind {
+    PRINT_LINE_KIND_SOURCE,
+    PRINT_LINE_KIND_ANNOTATION};
+
+ private:
+  per_range_info *
+  get_any_range (int line);
+
+  void
+  print_any_margin (int line, int column, enum print_line_kind kind);
+
+ private:
+  diagnostic_context *m_context;
+  pretty_printer *m_pp;
+  diagnostic_t m_diagnostic_kind;
+  rich_location *m_richloc;
+  expanded_location m_exploc;
+  colorizer m_colorizer;
+  auto_vec <per_range_info> m_per_range_vec;
+};
+
+/* Implementation of "class colorizer".  */
+
+/* The constructor for "colorizer".  Lookup and store color codes for the
+   different kinds of things we might need to print.  */
+
+colorizer::colorizer (diagnostic_context *context,
+		      const diagnostic_info *diagnostic) :
+  m_context (context),
+  m_diagnostic (diagnostic),
+  m_current_state (STATE_NORMAL_TEXT)
+{
+  m_caret_ce = colorize_stop (pp_show_color (context->printer));
+  m_range1_cs = colorize_start (pp_show_color (context->printer), "range1");
+  m_range2_cs = colorize_start (pp_show_color (context->printer), "range2");
+  m_range_ce = colorize_stop (pp_show_color (context->printer));
+}
+
+/* The destructor for "colorizer".  If colorization is on, print a code to
+   turn it off.  */
+
+colorizer::~colorizer ()
+{
+  finish_state (m_current_state);
+}
+
+/* Update state, printing color codes if there's a state
+   change.  */
+
+void
+colorizer::set_state (int new_state)
+{
+  if (m_current_state != new_state)
+    {
+      finish_state (m_current_state);
+      m_current_state = new_state;
+      begin_state (new_state);
+    }
+}
+
+/* Turn on any colorization for STATE.  */
+
+void
+colorizer::begin_state (int state)
+{
+  switch (state)
+    {
+    case STATE_NORMAL_TEXT:
+      break;
+
+    case 0:
+      /* Make range 0 be the same color as the "kind" text
+	 (error vs warning vs note).  */
+      pp_string
+	(m_context->printer,
+	 colorize_start (pp_show_color (m_context->printer),
+			 diagnostic_get_color_for_kind (m_diagnostic->kind)));
+      break;
+
+    case 1:
+      pp_string (m_context->printer, m_range1_cs);
+      break;
+
+    case 2:
+      pp_string (m_context->printer, m_range2_cs);
+      break;
+
+    default:
+      /* We don't expect more than 3 ranges per diagnostic.  */
+      gcc_unreachable ();
+      break;
+    }
+}
+
+/* Turn off any colorization for STATE.  */
+
+void
+colorizer::finish_state (int state)
+{
+  switch (state)
+    {
+    case STATE_NORMAL_TEXT:
+      break;
+
+    case 0:
+      pp_string (m_context->printer, m_caret_ce);
+      break;
+
+    default:
+      /* Within a range.  */
+      gcc_assert (state > 0);
+      pp_string (m_context->printer, m_range_ce);
+      break;
+    }
+}
+
+/* Implementation of class per_range_info.  */
+
+/* The constructor for class per_range_info.  */
+
+per_range_info::per_range_info (int range_idx)
+: m_range_idx (range_idx),
+  m_unique_rows ()
+{
+}
+
+/* This range is the only range with a caption on row ROW; record it as
+   such.  */
+
+void
+per_range_info::add_uniquely_captioned_row (int row)
+{
+  m_unique_rows.safe_push (row);
+}
+
+/* Locate the first row for which this range is the only one with a caption.  */
+
+int
+per_range_info::get_first_unique_row () const
+{
+  gcc_assert (m_unique_rows.length () > 0);
+  return m_unique_rows[0];
+}
+
+/* Locate the last row for which this range is the only one with a caption.  */
+
+int
+per_range_info::get_last_unique_row () const
+{
+  gcc_assert (m_unique_rows.length () > 0);
+
+  return m_unique_rows[m_unique_rows.length () - 1];
+}
+
+/* Is LINE uniquely captioned by this range?  */
+
+bool
+per_range_info::contains_line (int line) const
+{
+  if (0 == m_unique_rows.length ())
+    return false;
+
+  return (line >= get_first_unique_row ()
+	  && line <= get_last_unique_row ());
+}
+
+/* Given a source line LINE of length LINE_WIDTH, determine the width
+   without any trailing whitespace.  */
+
+static int
+get_line_width_without_trailing_whitespace (const char *line, int line_width)
+{
+  int result = line_width;
+  while (result > 0)
+    {
+      char ch = line[result - 1];
+      if (ch == ' ' || ch == '\t')
+	result--;
+      else
+	break;
+    }
+  gcc_assert (result >= 0);
+  gcc_assert (result <= line_width);
+  gcc_assert (result == 0
+	      || (line[result - 1] != ' '
+		  && line[result - 1] != '\t'));
+  return result;
+}
+
+/* Determine which row to write this range's caption text in (if any).  */
+
+void
+per_range_info::determine_location_for_caption (const char *filename)
+{
+  if (0 == m_unique_rows.length ())
+    return;
+
+  /* Determine the widest line, using it as the column in which to render
+     the caption (m_caption_column).  */
+  int result = 0;
+  for (int row = get_first_unique_row ();
+       row <= get_last_unique_row ();
+       row++)
+    {
+      int line_width;
+      const char *line = location_get_source_line (filename, row, &line_width);
+      line_width = get_line_width_without_trailing_whitespace (line,
+							       line_width);
+      if (result <= line_width)
+	result = line_width + 1;
+    }
+  m_caption_column = result;
+
+  /* Determine which row to write the caption text in: the middle row
+     within the range, rounding down mathematically (and thus up
+     visually).  */
+  m_caption_row = (get_first_unique_row () + get_last_unique_row ()) / 2;
+}
+
+/* Implementation of class layout.  */
+
+/* Constructor for class layout.  Populate m_per_range_vec, determining
+   for each range where to draw its caption (if any).  */
+
+layout::layout (diagnostic_context * context,
+		const diagnostic_info *diagnostic)
+: m_context (context),
+  m_pp (context->printer),
+  m_diagnostic_kind (diagnostic->kind),
+  m_richloc (diagnostic->richloc),
+  m_exploc (m_richloc->lazily_expand_location ()),
+  m_colorizer (context, diagnostic),
+  m_per_range_vec ()
+{
+  for (rich_location::range_iter iter = m_richloc->iter_ranges ();
+       !iter.at_end ();
+       iter.next ())
+    {
+      per_range_info ri (iter.index ());
+      m_per_range_vec.safe_push (ri);
+    }
+
+  /* For each row, determine if it has a unique captioned range.  */
+  int last_line = m_richloc->get_last_line ();
+  for (int row = m_richloc->get_first_line ();
+       row <= last_line;
+       row++)
+    {
+      int num_captions_in_row = 0;
+      const location_range *first_caption = NULL;
+      int first_caption_idx = 0; /* silence "maybe-uninitialized" warning
+				    (PR 67196).  */
+      for (rich_location::range_iter iter = m_richloc->iter_ranges ();
+	   !iter.at_end ();
+	   iter.next ())
+	{
+	  const location_range *range = *iter;
+	  if (range->m_caption)
+	    {
+	      if (row >= range->m_start.line
+		  && row <= range->m_finish.line)
+		{
+		  num_captions_in_row++;
+		  if (!first_caption)
+		    {
+		      first_caption = range;
+		      first_caption_idx = iter.index ();
+		    }
+		}
+	    }
+	}
+      if (first_caption && num_captions_in_row == 1)
+	m_per_range_vec[first_caption_idx].add_uniquely_captioned_row (row);
+    }
+
+  /* At this stage, each per_range_info now has a list of lines for which its
+     location_range uniquely captions that source line.  */
+
+  /* Next, calculate the best position in which to draw each caption.  */
+  per_range_info *ri;
+  int i;
+  FOR_EACH_VEC_ELT (m_per_range_vec, i, ri)
+    ri->determine_location_for_caption (m_richloc->get_range (i)->m_start.file);
+}
+
+/* Print text describing a line of source code.
+   This typically prints two lines:
+
+   (1) the source code itself, colorized at any ranges, and
+   (2) an annotation line containing any carets/underlines
+   describing the ranges.
+
+   Both lines (1) and (2) may contain a right-most margin containing a
+   vertical bar and a caption, describing a range.  */
+
+void
+layout::print_line (int row)
+{
+  int line_width;
+  const char *line = location_get_source_line (m_exploc.file, row,
+					       &line_width);
+  if (!line)
+    return;
+
+  m_colorizer.set_normal_text ();
+
+  /* Step 1: print the source code line.  */
+
+  /* We will stop printing at any trailing whitespace.  */
+  line_width
+    = get_line_width_without_trailing_whitespace (line,
+						  line_width);
+  pp_space (m_pp);
+  int first_non_ws = INT_MAX;
+  int last_non_ws = 0;
+  int column;
+  for (column = 1; column <= line_width; column++)
+    {
+      bool in_range_p;
+      int range_idx;
+      bool draw_caret_p;
+      in_range_p = get_state_at_point (row, column,
+				       m_richloc,
+				       0, INT_MAX,
+				       &range_idx, &draw_caret_p);
+      if (in_range_p)
+	m_colorizer.set_range (range_idx);
+      else
+	m_colorizer.set_normal_text ();
+      char c = *line == '\t' ? ' ' : *line;
+      if (c == '\0')
+	c = ' ';
+      if (c != ' ')
+	{
+	  last_non_ws = column;
+	  if (first_non_ws == INT_MAX)
+	    first_non_ws = column;
+	}
+      pp_character (m_pp, c);
+      line++;
+    }
+  print_any_margin (row, column, PRINT_LINE_KIND_SOURCE);
+
+  pp_newline (m_pp);
+
+  /* Step 2: print a line consisting of the caret/underlines for the
+     given source line.  */
+  int x_bound = get_x_bound_for_row (row, m_exploc.column,
+				     m_richloc,
+				     last_non_ws);
+
+  pp_space (m_pp);
+  for (int column = 1; column < x_bound; column++)
+    {
+      bool in_range_p;
+      int range_idx;
+      bool draw_caret_p;
+      in_range_p = get_state_at_point (row, column,
+				       m_richloc,
+				       first_non_ws, last_non_ws,
+				       &range_idx, &draw_caret_p);
+      if (in_range_p)
+	{
+	  /* Within a range.  Draw either the caret or an underline.  */
+	  m_colorizer.set_range (range_idx);
+	  if (draw_caret_p)
+	    /* Draw the caret.  */
+	    pp_string (m_pp, m_context->caret_chars[range_idx]);
+	  else
+	    {
+	      /* Within a range, but not drawing the caret.  Draw an underline
+		 character. */
+	      const location_range *range
+		= m_richloc->get_range (range_idx);
+	      if (range->m_start.line != range->m_finish.line)
+		{
+		  /* Underline multiline ranges using southwest/southeast
+		     corners for the very first/last positions, and hbar
+		     everywhere else.  */
+		  if (row == range->m_start.line
+		      && column == range->m_start.column)
+		    /* Start of multiline range: use a southwest corner for
+		       the underline.  */
+		    pp_string (m_pp, g_line_art.underline_start);
+		  else if (row == range->m_finish.line
+			   && column == range->m_finish.column)
+		    /* End of multiline range: use a southeast corner for
+		       the underline.  */
+		    pp_string (m_pp, g_line_art.underline_end);
+		  else
+		    pp_string (m_pp, g_line_art.underline_hbar);
+		}
+	      else
+		/* Render single-line ranges with the hbar character
+		   throughout.  */
+		pp_string (m_pp, g_line_art.underline_hbar);
+	    }
+	}
+      else
+	{
+	  /* Not in a range.  */
+	  m_colorizer.set_normal_text ();
+	  pp_character (m_pp, ' ');
+	}
+    }
+  print_any_margin (row, x_bound, PRINT_LINE_KIND_ANNOTATION);
+  pp_newline (m_pp);
+}
+
+/* Given a line of source code, get the per_range_info for the first range
+   within it, or NULL if there are no ranges.  */
+
+per_range_info *
+layout::get_any_range (int line)
+{
+  int i;
+  per_range_info *ri;
+  FOR_EACH_VEC_ELT (m_per_range_vec, i, ri)
+    if (ri->contains_line (line))
+      return ri;
+  return NULL;
+}
+
+/* Helper function for layout::print_line, to print the right-most margin
+   area (for both kinds of line, source and caret/underline).  */
+
+void
+layout::print_any_margin (int line, int column, enum print_line_kind kind)
+{
+  /* Locate the range layout for this line, if any.  */
+  per_range_info *ri = get_any_range (line);
+
+  /* If we're not in a range, there's no margin.  */
+  if (!ri)
+    return;
+
+  /* Fill with whitespace to get to the appropriate x-coordinate for
+     the margin.  */
+  while (column++ < ri->get_caption_column ())
+    pp_space (m_pp);
+
+  /* Colorize based on the range.  */
+  int range_idx = ri->get_range_index ();
+  m_colorizer.set_range (range_idx);
+
+  /* Draw a vertical line, with corners pointing to the left
+     (falling back to '|' if unicode box-drawing is not available),
+     printing any caption on the *annotation* line for the
+     appropriate source line.  */
+
+  if (line == ri->get_caption_row ()
+      && kind == PRINT_LINE_KIND_ANNOTATION)
+    {
+      /* We are on the annotation line of the
+	 appropriate source line to print the caption.  */
+
+      /* Draw the line before the caption.  */
+      pp_string (m_pp,
+		 line == ri->get_last_unique_row ()
+		 ? g_line_art.rmargin_caption_row_at_end
+		 : g_line_art.rmargin_caption_row);
+
+      location_range *range
+	= m_richloc->get_range (ri->get_range_index ());
+      gcc_assert (range);
+
+      /* Display any caption.  */
+      if (range->m_caption)
+	pp_string (m_pp, range->m_caption);
+    }
+  else if (line == ri->get_first_unique_row ()
+	   && kind == PRINT_LINE_KIND_SOURCE)
+    /* Start of multiline range.  */
+    pp_string (m_pp, g_line_art.rmargin_start);
+  else if (line == ri->get_last_unique_row ()
+	   && kind == PRINT_LINE_KIND_ANNOTATION)
+    /* End of multiline range.  */
+    pp_string (m_pp, g_line_art.rmargin_end);
+  else
+    /* A normal row: a vertical line.  */
+    pp_string (m_pp, g_line_art.rmargin_vbar);
+}
+
+} /* End of anonymous namespace.  */
+
+/* For debugging layout issues in diagnostic_show_locus and friends,
+   render a ruler giving column numbers (after the 1-column indent).  */
+
+static void
+show_ruler (diagnostic_context *context, int max_width)
+{
+  /* Hundreds.  */
+  if (max_width > 99)
+    {
+      pp_space (context->printer);
+      for (int column = 1; column < max_width; column++)
+	if (0 == column % 10)
+	  pp_character (context->printer, '0' + (column / 100) % 10);
+	else
+	  pp_space (context->printer);
+      pp_newline (context->printer);
+    }
+
+  /* Tens.  */
+  pp_space (context->printer);
+  for (int column = 1; column < max_width; column++)
+    if (0 == column % 10)
+      pp_character (context->printer, '0' + (column / 10) % 10);
+    else
+      pp_space (context->printer);
+  pp_newline (context->printer);
+
+  /* Units.  */
+  pp_space (context->printer);
+  for (int column = 1; column < max_width; column++)
+    pp_character (context->printer, '0' + (column % 10));
+  pp_newline (context->printer);
+}
+
+/* Print the physical source code corresponding to the location of
+   this diagnostic, with additional annotations.
+   If CONTEXT has set frontend_calls_diagnostic_print_caret_line_p,
+   the code is printed using diagnostic_print_caret_line; otherwise
+   it is printed using diagnostic_print_ranges.  */
+
 void
 diagnostic_show_locus (diagnostic_context * context,
 		       const diagnostic_info *diagnostic)
@@ -75,16 +877,25 @@ diagnostic_show_locus (diagnostic_context * context,
     return;
 
   context->last_location = diagnostic_location (diagnostic, 0);
-  expanded_location s0 = diagnostic_expand_location (diagnostic, 0);
-  expanded_location s1 = { };
-  /* Zero-initialized. This is checked later by diagnostic_print_caret_line.  */
 
-  if (diagnostic_location (diagnostic, 1) > BUILTINS_LOCATION)
-    s1 = diagnostic_expand_location (diagnostic, 1);
+  if (context->frontend_calls_diagnostic_print_caret_line_p)
+    {
+      /* The GCC 5 routine.  */
+      expanded_location s0 = diagnostic_expand_location (diagnostic, 0);
+      expanded_location s1 = { };
+      /* Zero-initialized. This is checked later by
+	 diagnostic_print_caret_line.  */
+
+      if (diagnostic_num_locations (diagnostic) >= 2)
+	s1 = diagnostic->message.m_richloc->get_range (1)->m_start;
 
-  diagnostic_print_caret_line (context, s0, s1,
-			       context->caret_chars[0],
-			       context->caret_chars[1]);
+      diagnostic_print_caret_line (context, s0, s1,
+				   context->caret_chars[0],
+				   context->caret_chars[1]);
+    }
+  else
+    /* The GCC 6 routine.  */
+    diagnostic_print_ranges (context, diagnostic);
 }
 
 /* Print (part) of the source line given by xloc1 with caret1 pointing
@@ -96,7 +907,7 @@ void
 diagnostic_print_caret_line (diagnostic_context * context,
 			     expanded_location xloc1,
 			     expanded_location xloc2,
-			     char caret1, char caret2)
+			     const char *caret1, const char *caret2)
 {
   if (!diagnostic_same_line (context, xloc1, xloc2))
     /* This will mean ignore xloc2.  */
@@ -145,22 +956,52 @@ diagnostic_print_caret_line (diagnostic_context * context,
   caret_ce = colorize_stop (pp_show_color (context->printer));
   int cmin = xloc2.column
     ? MIN (xloc1.column, xloc2.column) : xloc1.column;
-  int caret_min = cmin == xloc1.column ? caret1 : caret2;
-  int caret_max = cmin == xloc1.column ? caret2 : caret1;
+  const char *caret_min = cmin == xloc1.column ? caret1 : caret2;
+  const char *caret_max = cmin == xloc1.column ? caret2 : caret1;
 
   /* cmin is >= 1, but we indent with an extra space at the start like
      we did above.  */
   int i;
   for (i = 0; i < cmin; i++)
     pp_space (context->printer);
-  pp_printf (context->printer, "%s%c%s", caret_cs, caret_min, caret_ce);
+  pp_printf (context->printer, "%s%s%s", caret_cs, caret_min, caret_ce);
 
   if (xloc2.column)
     {
       for (i++; i < cmax; i++)
 	pp_space (context->printer);
-      pp_printf (context->printer, "%s%c%s", caret_cs, caret_max, caret_ce);
+      pp_printf (context->printer, "%s%s%s", caret_cs, caret_max, caret_ce);
     }
   pp_set_prefix (context->printer, saved_prefix);
   pp_needs_newline (context->printer) = true;
 }
+
+/* Print all source lines covered by the locations and any ranges
+   within DIAGNOSTIC, displaying one or more carets and zero or more
+   underlines as appropriate, potentially with captions.  */
+
+static void
+diagnostic_print_ranges (diagnostic_context * context,
+			 const diagnostic_info *diagnostic)
+{
+  int max_width = context->caret_max_width;
+
+  pp_newline (context->printer);
+
+  const char *saved_prefix = pp_get_prefix (context->printer);
+  pp_set_prefix (context->printer, NULL);
+
+  if (0)
+    show_ruler (context, max_width);
+
+  {
+    layout layout (context, diagnostic);
+    int last_line = diagnostic->richloc->get_last_line ();
+    for (int row = diagnostic->richloc->get_first_line ();
+	 row <= last_line;
+	 row++)
+      layout.print_line (row);
+  }
+
+  pp_set_prefix (context->printer, saved_prefix);
+}
diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index f40e469..060f071 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "backtrace.h"
 #include "diagnostic.h"
 #include "diagnostic-color.h"
+#include "box-drawing.h"
 
 #ifdef HAVE_TERMIOS_H
 # include <termios.h>
@@ -145,8 +146,9 @@ diagnostic_initialize (diagnostic_context *context, int n_opts)
     context->classify_diagnostic[i] = DK_UNSPECIFIED;
   context->show_caret = false;
   diagnostic_set_caret_max_width (context, pp_line_cutoff (context->printer));
-  for (i = 0; i < MAX_LOCATIONS_PER_MESSAGE; i++)
-    context->caret_chars[i] = '^';
+  gcc_assert (g_line_art.default_caret);
+  for (i = 0; i < rich_location::MAX_RANGES; i++)
+    context->caret_chars[i] = g_line_art.default_caret;
   context->show_option_requested = false;
   context->abort_on_error = false;
   context->show_column = false;
@@ -235,16 +237,15 @@ diagnostic_finish (diagnostic_context *context)
    translated.  */
 void
 diagnostic_set_info_translated (diagnostic_info *diagnostic, const char *msg,
-				va_list *args, location_t location,
+				va_list *args, rich_location *richloc,
 				diagnostic_t kind)
 {
+  gcc_assert (richloc);
   diagnostic->message.err_no = errno;
   diagnostic->message.args_ptr = args;
   diagnostic->message.format_spec = msg;
-  diagnostic->message.set_location (0, location);
-  for (int i = 1; i < MAX_LOCATIONS_PER_MESSAGE; i++)
-    diagnostic->message.set_location (i, UNKNOWN_LOCATION);
-  diagnostic->override_column = 0;
+  diagnostic->message.m_richloc = richloc;
+  diagnostic->richloc = richloc;
   diagnostic->kind = kind;
   diagnostic->option_index = 0;
 }
@@ -253,10 +254,27 @@ diagnostic_set_info_translated (diagnostic_info *diagnostic, const char *msg,
    translated.  */
 void
 diagnostic_set_info (diagnostic_info *diagnostic, const char *gmsgid,
-		     va_list *args, location_t location,
+		     va_list *args, rich_location *richloc,
 		     diagnostic_t kind)
 {
-  diagnostic_set_info_translated (diagnostic, _(gmsgid), args, location, kind);
+  gcc_assert (richloc);
+  diagnostic_set_info_translated (diagnostic, _(gmsgid), args, richloc, kind);
+}
+
+static const char *const diagnostic_kind_color[] = {
+#define DEFINE_DIAGNOSTIC_KIND(K, T, C) (C),
+#include "diagnostic.def"
+#undef DEFINE_DIAGNOSTIC_KIND
+  NULL
+};
+
+/* Get a color name for diagnostics of type KIND.
+   Result could be NULL.  */
+
+const char *
+diagnostic_get_color_for_kind (diagnostic_t kind)
+{
+  return diagnostic_kind_color[kind];
 }
 
 /* Return a malloc'd string describing a location.  The caller is
@@ -271,12 +289,6 @@ diagnostic_build_prefix (diagnostic_context *context,
 #undef DEFINE_DIAGNOSTIC_KIND
     "must-not-happen"
   };
-  static const char *const diagnostic_kind_color[] = {
-#define DEFINE_DIAGNOSTIC_KIND(K, T, C) (C),
-#include "diagnostic.def"
-#undef DEFINE_DIAGNOSTIC_KIND
-    NULL
-  };
   gcc_assert (diagnostic->kind < DK_LAST_DIAGNOSTIC_KIND);
 
   const char *text = _(diagnostic_kind_text[diagnostic->kind]);
@@ -775,10 +787,14 @@ diagnostic_report_diagnostic (diagnostic_context *context,
 
       if (option_text)
 	{
+	  const char *cs
+	    = colorize_start (pp_show_color (context->printer),
+			      diagnostic_kind_color[diagnostic->kind]);
+	  const char *ce = colorize_stop (pp_show_color (context->printer));
 	  diagnostic->message.format_spec
 	    = ACONCAT ((diagnostic->message.format_spec,
 			" ", 
-			"[", option_text, "]",
+			"[", cs, option_text, ce, "]",
 			NULL));
 	  free (option_text);
 	}
@@ -858,9 +874,40 @@ diagnostic_append_note (diagnostic_context *context,
   diagnostic_info diagnostic;
   va_list ap;
   const char *saved_prefix;
+  rich_location richloc (location);
 
   va_start (ap, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, location, DK_NOTE);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_NOTE);
+  if (context->inhibit_notes_p)
+    {
+      va_end (ap);
+      return;
+    }
+  saved_prefix = pp_get_prefix (context->printer);
+  pp_set_prefix (context->printer,
+                 diagnostic_build_prefix (context, &diagnostic));
+  pp_newline (context->printer);
+  pp_format (context->printer, &diagnostic.message);
+  pp_output_formatted_text (context->printer);
+  pp_destroy_prefix (context->printer);
+  pp_set_prefix (context->printer, saved_prefix);
+  diagnostic_show_locus (context, &diagnostic);
+  va_end (ap);
+}
+
+/* Same as diagnostic_append_note, but at RICHLOC.  */
+
+void
+diagnostic_append_note_at_rich_loc (diagnostic_context *context,
+				    rich_location *richloc,
+				    const char * gmsgid, ...)
+{
+  diagnostic_info diagnostic;
+  va_list ap;
+  const char *saved_prefix;
+
+  va_start (ap, gmsgid);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, richloc, DK_NOTE);
   if (context->inhibit_notes_p)
     {
       va_end (ap);
@@ -885,16 +932,17 @@ emit_diagnostic (diagnostic_t kind, location_t location, int opt,
   diagnostic_info diagnostic;
   va_list ap;
   bool ret;
+  rich_location richloc (location);
 
   va_start (ap, gmsgid);
   if (kind == DK_PERMERROR)
     {
-      diagnostic_set_info (&diagnostic, gmsgid, &ap, location,
+      diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc,
 			   permissive_error_kind (global_dc));
       diagnostic.option_index = permissive_error_option (global_dc);
     }
   else {
-      diagnostic_set_info (&diagnostic, gmsgid, &ap, location, kind);
+      diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, kind);
       if (kind == DK_WARNING || kind == DK_PEDWARN)
 	diagnostic.option_index = opt;
   }
@@ -911,9 +959,23 @@ inform (location_t location, const char *gmsgid, ...)
 {
   diagnostic_info diagnostic;
   va_list ap;
+  rich_location richloc (location);
 
   va_start (ap, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, location, DK_NOTE);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_NOTE);
+  report_diagnostic (&diagnostic);
+  va_end (ap);
+}
+
+/* Same as "inform", but at RICHLOC.  */
+void
+inform_at_rich_loc (rich_location *richloc, const char *gmsgid, ...)
+{
+  diagnostic_info diagnostic;
+  va_list ap;
+
+  va_start (ap, gmsgid);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, richloc, DK_NOTE);
   report_diagnostic (&diagnostic);
   va_end (ap);
 }
@@ -926,11 +988,12 @@ inform_n (location_t location, int n, const char *singular_gmsgid,
 {
   diagnostic_info diagnostic;
   va_list ap;
+  rich_location richloc (location);
 
   va_start (ap, plural_gmsgid);
   diagnostic_set_info_translated (&diagnostic,
                                   ngettext (singular_gmsgid, plural_gmsgid, n),
-                                  &ap, location, DK_NOTE);
+                                  &ap, &richloc, DK_NOTE);
   report_diagnostic (&diagnostic);
   va_end (ap);
 }
@@ -944,9 +1007,10 @@ warning (int opt, const char *gmsgid, ...)
   diagnostic_info diagnostic;
   va_list ap;
   bool ret;
+  rich_location richloc (input_location);
 
   va_start (ap, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, input_location, DK_WARNING);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_WARNING);
   diagnostic.option_index = opt;
 
   ret = report_diagnostic (&diagnostic);
@@ -964,9 +1028,27 @@ warning_at (location_t location, int opt, const char *gmsgid, ...)
   diagnostic_info diagnostic;
   va_list ap;
   bool ret;
+  rich_location richloc (location);
 
   va_start (ap, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, location, DK_WARNING);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_WARNING);
+  diagnostic.option_index = opt;
+  ret = report_diagnostic (&diagnostic);
+  va_end (ap);
+  return ret;
+}
+
+/* Same as "warning_at", but using RICHLOC.  */
+
+bool
+warning_at_rich_loc (rich_location *richloc, int opt, const char *gmsgid, ...)
+{
+  diagnostic_info diagnostic;
+  va_list ap;
+  bool ret;
+
+  va_start (ap, gmsgid);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, richloc, DK_WARNING);
   diagnostic.option_index = opt;
   ret = report_diagnostic (&diagnostic);
   va_end (ap);
@@ -984,11 +1066,13 @@ warning_n (location_t location, int opt, int n, const char *singular_gmsgid,
   diagnostic_info diagnostic;
   va_list ap;
   bool ret;
+  rich_location richloc (location);
 
   va_start (ap, plural_gmsgid);
   diagnostic_set_info_translated (&diagnostic,
                                   ngettext (singular_gmsgid, plural_gmsgid, n),
-                                  &ap, location, DK_WARNING);
+                                  &ap, &richloc,
+                                  DK_WARNING);
   diagnostic.option_index = opt;
   ret = report_diagnostic (&diagnostic);
   va_end (ap);
@@ -1014,9 +1098,10 @@ pedwarn (location_t location, int opt, const char *gmsgid, ...)
   diagnostic_info diagnostic;
   va_list ap;
   bool ret;
+  rich_location richloc (location);
 
   va_start (ap, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, location,  DK_PEDWARN);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc,  DK_PEDWARN);
   diagnostic.option_index = opt;
   ret = report_diagnostic (&diagnostic);
   va_end (ap);
@@ -1036,9 +1121,28 @@ permerror (location_t location, const char *gmsgid, ...)
   diagnostic_info diagnostic;
   va_list ap;
   bool ret;
+  rich_location richloc (location);
+
+  va_start (ap, gmsgid);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc,
+                       permissive_error_kind (global_dc));
+  diagnostic.option_index = permissive_error_option (global_dc);
+  ret = report_diagnostic (&diagnostic);
+  va_end (ap);
+  return ret;
+}
+
+/* Same as "permerror", but at RICHLOC.  */
+
+bool
+permerror_at_rich_loc (rich_location *richloc, const char *gmsgid, ...)
+{
+  diagnostic_info diagnostic;
+  va_list ap;
+  bool ret;
 
   va_start (ap, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, location,
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, richloc,
                        permissive_error_kind (global_dc));
   diagnostic.option_index = permissive_error_option (global_dc);
   ret = report_diagnostic (&diagnostic);
@@ -1053,9 +1157,10 @@ error (const char *gmsgid, ...)
 {
   diagnostic_info diagnostic;
   va_list ap;
+  rich_location richloc (input_location);
 
   va_start (ap, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, input_location, DK_ERROR);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_ERROR);
   report_diagnostic (&diagnostic);
   va_end (ap);
 }
@@ -1068,11 +1173,12 @@ error_n (location_t location, int n, const char *singular_gmsgid,
 {
   diagnostic_info diagnostic;
   va_list ap;
+  rich_location richloc (location);
 
   va_start (ap, plural_gmsgid);
   diagnostic_set_info_translated (&diagnostic,
                                   ngettext (singular_gmsgid, plural_gmsgid, n),
-                                  &ap, location, DK_ERROR);
+                                  &ap, &richloc, DK_ERROR);
   report_diagnostic (&diagnostic);
   va_end (ap);
 }
@@ -1083,9 +1189,25 @@ error_at (location_t loc, const char *gmsgid, ...)
 {
   diagnostic_info diagnostic;
   va_list ap;
+  rich_location richloc (loc);
 
   va_start (ap, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, loc, DK_ERROR);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_ERROR);
+  report_diagnostic (&diagnostic);
+  va_end (ap);
+}
+
+/* Same as error_at, but use RICH_LOC.  */
+
+void
+error_at_rich_loc (rich_location *rich_loc, const char *gmsgid, ...)
+{
+  diagnostic_info diagnostic;
+  va_list ap;
+
+  va_start (ap, gmsgid);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, rich_loc,
+		       DK_ERROR);
   report_diagnostic (&diagnostic);
   va_end (ap);
 }
@@ -1098,9 +1220,10 @@ sorry (const char *gmsgid, ...)
 {
   diagnostic_info diagnostic;
   va_list ap;
+  rich_location richloc (input_location);
 
   va_start (ap, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, input_location, DK_SORRY);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_SORRY);
   report_diagnostic (&diagnostic);
   va_end (ap);
 }
@@ -1121,9 +1244,10 @@ fatal_error (location_t loc, const char *gmsgid, ...)
 {
   diagnostic_info diagnostic;
   va_list ap;
+  rich_location richloc (loc);
 
   va_start (ap, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, loc, DK_FATAL);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_FATAL);
   report_diagnostic (&diagnostic);
   va_end (ap);
 
@@ -1139,9 +1263,10 @@ internal_error (const char *gmsgid, ...)
 {
   diagnostic_info diagnostic;
   va_list ap;
+  rich_location richloc (input_location);
 
   va_start (ap, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, input_location, DK_ICE);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_ICE);
   report_diagnostic (&diagnostic);
   va_end (ap);
 
@@ -1156,9 +1281,10 @@ internal_error_no_backtrace (const char *gmsgid, ...)
 {
   diagnostic_info diagnostic;
   va_list ap;
+  rich_location richloc (input_location);
 
   va_start (ap, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &ap, input_location, DK_ICE_NOBT);
+  diagnostic_set_info (&diagnostic, gmsgid, &ap, &richloc, DK_ICE_NOBT);
   report_diagnostic (&diagnostic);
   va_end (ap);
 
@@ -1222,3 +1348,11 @@ real_abort (void)
 {
   abort ();
 }
+
+void
+source_range::debug (const char *msg) const
+{
+  rich_location richloc (m_start);
+  richloc.add_range (m_start, m_finish);
+  inform_at_rich_loc (&richloc, "%s", msg);
+}
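Note on the pattern used throughout the diagnostic.c hunks above: each
existing location_t entry point now builds a rich_location on the stack
and passes its address down, while the new *_at_rich_loc variants take a
caller-supplied rich_location directly.  The following is only a rough,
standalone sketch of that shape.  Every toy_* name is a stand-in invented
for illustration (none of them are GCC types or functions), and the core
routine returns a string instead of reporting a diagnostic.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Toy stand-ins, invented for this sketch; NOT the GCC types.
typedef unsigned int toy_location_t;

struct toy_rich_location
{
  explicit toy_rich_location (toy_location_t loc) : m_loc (loc) {}
  toy_location_t get_loc () const { return m_loc; }
  toy_location_t m_loc;
};

// Shared core: formats "<loc>: <message>" rather than emitting anything.
static std::string
toy_format_at_rich_loc (const toy_rich_location *richloc,
                        const char *fmt, va_list *ap)
{
  assert (richloc);
  char buf[256];
  std::vsnprintf (buf, sizeof buf, fmt, *ap);
  return std::to_string (richloc->get_loc ()) + ": " + buf;
}

// New-style entry point: the caller owns the rich location.
std::string
toy_error_at_rich_loc (toy_rich_location *richloc, const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  std::string result = toy_format_at_rich_loc (richloc, fmt, &ap);
  va_end (ap);
  return result;
}

// Legacy entry point: wraps the plain location in a stack-allocated
// rich location that lives only for the duration of the call.
std::string
toy_error_at (toy_location_t loc, const char *fmt, ...)
{
  toy_rich_location richloc (loc);
  va_list ap;
  va_start (ap, fmt);
  std::string result = toy_format_at_rich_loc (&richloc, fmt, &ap);
  va_end (ap);
  return result;
}
```

The point of the sketch is that the legacy signature is unchanged for
callers, so only the code that wants multiple ranges has to construct a
rich location itself.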
diff --git a/gcc/diagnostic.h b/gcc/diagnostic.h
index 1b9b7d4..48512b2 100644
--- a/gcc/diagnostic.h
+++ b/gcc/diagnostic.h
@@ -29,10 +29,12 @@ along with GCC; see the file COPYING3.  If not see
    list in diagnostic.def.  */
 struct diagnostic_info
 {
-  /* Text to be formatted. It also contains the location(s) for this
-     diagnostic.  */
+  /* Text to be formatted.  */
   text_info message;
-  unsigned int override_column;
+
+  /* The location at which the diagnostic is to be reported.  */
+  rich_location *richloc;
+
   /* Auxiliary data for client.  */
   void *x_data;
   /* The kind of diagnostic it is about.  */
@@ -107,7 +109,7 @@ struct diagnostic_context
   int caret_max_width;
 
   /* Characters used for caret diagnostics.  */
-  char caret_chars[MAX_LOCATIONS_PER_MESSAGE];
+  const char *caret_chars[rich_location::MAX_RANGES];
 
   /* True if we should print the command line option which controls
      each diagnostic, if known.  */
@@ -185,6 +187,11 @@ struct diagnostic_context
   int lock;
 
   bool inhibit_notes_p;
+
+  /* Does the frontend make calls to diagnostic_print_caret_line?
+     If so, we fall back to the old implementation of
+     diagnostic_show_locus.  */
+  bool frontend_calls_diagnostic_print_caret_line_p;
 };
 
 static inline void
@@ -256,10 +263,6 @@ extern diagnostic_context *global_dc;
 
 #define report_diagnostic(D) diagnostic_report_diagnostic (global_dc, D)
 
-/* Override the column number to be used for reporting a
-   diagnostic.  */
-#define diagnostic_override_column(DI, COL) (DI)->override_column = (COL)
-
 /* Override the option index to be used for reporting a
    diagnostic.  */
 #define diagnostic_override_option_index(DI, OPTIDX) \
@@ -283,13 +286,17 @@ extern bool diagnostic_report_diagnostic (diagnostic_context *,
 					  diagnostic_info *);
 #ifdef ATTRIBUTE_GCC_DIAG
 extern void diagnostic_set_info (diagnostic_info *, const char *, va_list *,
-				 location_t, diagnostic_t) ATTRIBUTE_GCC_DIAG(2,0);
+				 rich_location *, diagnostic_t) ATTRIBUTE_GCC_DIAG(2,0);
 extern void diagnostic_set_info_translated (diagnostic_info *, const char *,
-					    va_list *, location_t,
+					    va_list *, rich_location *,
 					    diagnostic_t)
      ATTRIBUTE_GCC_DIAG(2,0);
 extern void diagnostic_append_note (diagnostic_context *, location_t,
                                     const char *, ...) ATTRIBUTE_GCC_DIAG(3,4);
+extern void diagnostic_append_note_at_rich_loc (diagnostic_context *,
+						rich_location *,
+						const char *, ...)
+  ATTRIBUTE_GCC_DIAG(3,4);
 #endif
 extern char *diagnostic_build_prefix (diagnostic_context *, const diagnostic_info *);
 void default_diagnostic_starter (diagnostic_context *, diagnostic_info *);
@@ -310,6 +317,14 @@ diagnostic_location (const diagnostic_info * diagnostic, int which = 0)
   return diagnostic->message.get_location (which);
 }
 
+/* Return the number of locations to be printed in DIAGNOSTIC.  */
+
+static inline unsigned int
+diagnostic_num_locations (const diagnostic_info * diagnostic)
+{
+  return diagnostic->message.m_richloc->get_num_locations ();
+}
+
 /* Expand the location of this diagnostic. Use this function for
    consistency.  Parameter WHICH specifies which location. By default,
    expand the first one.  */
@@ -317,12 +332,7 @@ diagnostic_location (const diagnostic_info * diagnostic, int which = 0)
 static inline expanded_location
 diagnostic_expand_location (const diagnostic_info * diagnostic, int which = 0)
 {
-  expanded_location s
-    = expand_location_to_spelling_point (diagnostic_location (diagnostic,
-							      which));
-  if (which == 0 && diagnostic->override_column)
-    s.column = diagnostic->override_column;
-  return s;
+  return diagnostic->richloc->get_range (which)->m_start;
 }
 
 /* This is somehow the right-side margin of a caret line, that is, we
@@ -346,7 +356,11 @@ void
 diagnostic_print_caret_line (diagnostic_context * context,
 			     expanded_location xloc1,
 			     expanded_location xloc2,
-			     char caret1, char caret2);
+			     const char *caret1, const char *caret2);
+
+
+extern const char *
+diagnostic_get_color_for_kind (diagnostic_t kind);
 
 /* Pure text formatting support functions.  */
 extern char *file_name_as_prefix (diagnostic_context *, const char *);
diff --git a/gcc/fortran/cpp.c b/gcc/fortran/cpp.c
index daffc20..92dc584 100644
--- a/gcc/fortran/cpp.c
+++ b/gcc/fortran/cpp.c
@@ -149,9 +149,9 @@ static void cb_include (cpp_reader *, source_location, const unsigned char *,
 static void cb_ident (cpp_reader *, source_location, const cpp_string *);
 static void cb_used_define (cpp_reader *, source_location, cpp_hashnode *);
 static void cb_used_undef (cpp_reader *, source_location, cpp_hashnode *);
-static bool cb_cpp_error (cpp_reader *, int, int, location_t, unsigned int,
+static bool cb_cpp_error (cpp_reader *, int, int, rich_location *,
 			  const char *, va_list *)
-     ATTRIBUTE_GCC_DIAG(6,0);
+     ATTRIBUTE_GCC_DIAG(5,0);
 void pp_dir_change (cpp_reader *, const char *);
 
 static int dump_macro (cpp_reader *, cpp_hashnode *, void *);
@@ -1026,13 +1026,12 @@ cb_used_define (cpp_reader *pfile, source_location line ATTRIBUTE_UNUSED,
 /* Callback from cpp_error for PFILE to print diagnostics from the
    preprocessor.  The diagnostic is of type LEVEL, with REASON set
    to the reason code if LEVEL is represents a warning, at location
-   LOCATION, with column number possibly overridden by COLUMN_OVERRIDE
-   if not zero; MSG is the translated message and AP the arguments.
+   RICHLOC; MSG is the translated message and AP the arguments.
    Returns true if a diagnostic was emitted, false otherwise.  */
 
 static bool
 cb_cpp_error (cpp_reader *pfile ATTRIBUTE_UNUSED, int level, int reason,
-	      location_t location, unsigned int column_override,
+	      rich_location *richloc,
 	      const char *msg, va_list *ap)
 {
   diagnostic_info diagnostic;
@@ -1067,9 +1066,7 @@ cb_cpp_error (cpp_reader *pfile ATTRIBUTE_UNUSED, int level, int reason,
       gcc_unreachable ();
     }
   diagnostic_set_info_translated (&diagnostic, msg, ap,
-				  location, dlevel);
-  if (column_override)
-    diagnostic_override_column (&diagnostic, column_override);
+				  richloc, dlevel);
   if (reason == CPP_W_WARNING_DIRECTIVE)
     diagnostic_override_option_index (&diagnostic, OPT_Wcpp);
   ret = report_diagnostic (&diagnostic);
diff --git a/gcc/fortran/error.c b/gcc/fortran/error.c
index 3825751..b5b7beb 100644
--- a/gcc/fortran/error.c
+++ b/gcc/fortran/error.c
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "diagnostic.h"
 #include "diagnostic-color.h"
 #include "tree-diagnostic.h" /* tree_diagnostics_defaults */
+#include "box-drawing.h"
 
 #include <new> /* For placement-new */
 
@@ -773,6 +774,7 @@ gfc_warning (int opt, const char *gmsgid, va_list ap)
   va_copy (argp, ap);
 
   diagnostic_info diagnostic;
+  rich_location rich_loc (UNKNOWN_LOCATION);
   bool fatal_errors = global_dc->fatal_errors;
   pretty_printer *pp = global_dc->printer;
   output_buffer *tmp_buffer = pp->buffer;
@@ -787,7 +789,7 @@ gfc_warning (int opt, const char *gmsgid, va_list ap)
       --werrorcount;
     }
 
-  diagnostic_set_info (&diagnostic, gmsgid, &argp, UNKNOWN_LOCATION,
+  diagnostic_set_info (&diagnostic, gmsgid, &argp, &rich_loc,
 		       DK_WARNING);
   diagnostic.option_index = opt;
   bool ret = report_diagnostic (&diagnostic);
@@ -938,10 +940,12 @@ gfc_format_decoder (pretty_printer *pp,
 	/* If location[0] != UNKNOWN_LOCATION means that we already
 	   processed one of %C/%L.  */
 	int loc_num = text->get_location (0) == UNKNOWN_LOCATION ? 0 : 1;
-	text->set_location (loc_num,
-			    linemap_position_for_loc_and_offset (line_table,
-								 loc->lb->location,
-								 offset));
+	source_range range
+	  = source_range::from_location (
+	      linemap_position_for_loc_and_offset (line_table,
+						   loc->lb->location,
+						   offset));
+	text->set_range (loc_num, range, true);
 	pp_string (pp, result[loc_num]);
 	return true;
       }
@@ -1075,7 +1079,7 @@ gfc_diagnostic_starter (diagnostic_context *context,
 
   expanded_location s1 = diagnostic_expand_location (diagnostic);
   expanded_location s2;
-  bool one_locus = diagnostic_location (diagnostic, 1) == UNKNOWN_LOCATION;
+  bool one_locus = diagnostic->richloc->get_num_locations () < 2;
   bool same_locus = false;
 
   if (!one_locus) 
@@ -1173,10 +1177,11 @@ gfc_warning_now_at (location_t loc, int opt, const char *gmsgid, ...)
 {
   va_list argp;
   diagnostic_info diagnostic;
+  rich_location rich_loc (loc);
   bool ret;
 
   va_start (argp, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &argp, loc, DK_WARNING);
+  diagnostic_set_info (&diagnostic, gmsgid, &argp, &rich_loc, DK_WARNING);
   diagnostic.option_index = opt;
   ret = report_diagnostic (&diagnostic);
   va_end (argp);
@@ -1190,10 +1195,11 @@ gfc_warning_now (int opt, const char *gmsgid, ...)
 {
   va_list argp;
   diagnostic_info diagnostic;
+  rich_location rich_loc (UNKNOWN_LOCATION);
   bool ret;
 
   va_start (argp, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &argp, UNKNOWN_LOCATION,
+  diagnostic_set_info (&diagnostic, gmsgid, &argp, &rich_loc,
 		       DK_WARNING);
   diagnostic.option_index = opt;
   ret = report_diagnostic (&diagnostic);
@@ -1209,11 +1215,12 @@ gfc_error_now (const char *gmsgid, ...)
 {
   va_list argp;
   diagnostic_info diagnostic;
+  rich_location rich_loc (UNKNOWN_LOCATION);
 
   error_buffer.flag = true;
 
   va_start (argp, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &argp, UNKNOWN_LOCATION, DK_ERROR);
+  diagnostic_set_info (&diagnostic, gmsgid, &argp, &rich_loc, DK_ERROR);
   report_diagnostic (&diagnostic);
   va_end (argp);
 }
@@ -1226,9 +1233,10 @@ gfc_fatal_error (const char *gmsgid, ...)
 {
   va_list argp;
   diagnostic_info diagnostic;
+  rich_location rich_loc (UNKNOWN_LOCATION);
 
   va_start (argp, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &argp, UNKNOWN_LOCATION, DK_FATAL);
+  diagnostic_set_info (&diagnostic, gmsgid, &argp, &rich_loc, DK_FATAL);
   report_diagnostic (&diagnostic);
   va_end (argp);
 
@@ -1291,6 +1299,7 @@ gfc_error (const char *gmsgid, va_list ap)
     }
 
   diagnostic_info diagnostic;
+  rich_location richloc (UNKNOWN_LOCATION);
   bool fatal_errors = global_dc->fatal_errors;
   pretty_printer *pp = global_dc->printer;
   output_buffer *tmp_buffer = pp->buffer;
@@ -1306,7 +1315,7 @@ gfc_error (const char *gmsgid, va_list ap)
       --errorcount;
     }
 
-  diagnostic_set_info (&diagnostic, gmsgid, &argp, UNKNOWN_LOCATION, DK_ERROR);
+  diagnostic_set_info (&diagnostic, gmsgid, &argp, &richloc, DK_ERROR);
   report_diagnostic (&diagnostic);
 
   if (buffered_p)
@@ -1336,9 +1345,10 @@ gfc_internal_error (const char *gmsgid, ...)
 {
   va_list argp;
   diagnostic_info diagnostic;
+  rich_location rich_loc (UNKNOWN_LOCATION);
 
   va_start (argp, gmsgid);
-  diagnostic_set_info (&diagnostic, gmsgid, &argp, UNKNOWN_LOCATION, DK_ICE);
+  diagnostic_set_info (&diagnostic, gmsgid, &argp, &rich_loc, DK_ICE);
   report_diagnostic (&diagnostic);
   va_end (argp);
 
@@ -1470,8 +1480,9 @@ gfc_diagnostics_init (void)
   diagnostic_starter (global_dc) = gfc_diagnostic_starter;
   diagnostic_finalizer (global_dc) = gfc_diagnostic_finalizer;
   diagnostic_format_decoder (global_dc) = gfc_format_decoder;
-  global_dc->caret_chars[0] = '1';
-  global_dc->caret_chars[1] = '2';
+  global_dc->caret_chars[0] = "1";
+  global_dc->caret_chars[1] = "2";
+  global_dc->frontend_calls_diagnostic_print_caret_line_p = true;
   pp_warning_buffer = new (XNEW (output_buffer)) output_buffer ();
   pp_warning_buffer->flush_p = false;
   /* pp_error_buffer is statically allocated.  This simplifies memory
@@ -1488,6 +1499,6 @@ gfc_diagnostics_finish (void)
      defaults.  */
   diagnostic_starter (global_dc) = gfc_diagnostic_starter;
   diagnostic_finalizer (global_dc) = gfc_diagnostic_finalizer;
-  global_dc->caret_chars[0] = '^';
-  global_dc->caret_chars[1] = '^';
+  global_dc->caret_chars[0] = g_line_art.default_caret;
+  global_dc->caret_chars[1] = g_line_art.default_caret;
 }
diff --git a/gcc/gcc-rich-location.c b/gcc/gcc-rich-location.c
new file mode 100644
index 0000000..bdb2915
--- /dev/null
+++ b/gcc/gcc-rich-location.c
@@ -0,0 +1,93 @@
+/* Implementation of gcc_rich_location class
+   Copyright (C) 2014-2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "hash-set.h"
+#include "machmode.h"
+#include "vec.h"
+#include "double-int.h"
+#include "input.h"
+#include "alias.h"
+#include "symtab.h"
+#include "wide-int.h"
+#include "inchash.h"
+#include "tree-core.h"
+#include "tree.h"
+#include "diagnostic-core.h"
+#include "gcc-rich-location.h"
+#include "print-tree.h"
+#include "pretty-print.h"
+#include "intl.h"
+#include "cpplib.h"
+#include "diagnostic.h"
+
+/* Add a range covering [START, FINISH], with a caption given
+   by translating and formatting GMSGID and any variadic args.  */
+
+void
+gcc_rich_location::add_range_with_caption (location_t start, location_t finish,
+					   diagnostic_context *context,
+					   const char *gmsgid, ...)
+{
+  gcc_assert (context);
+  gcc_assert (gmsgid);
+
+  va_list ap;
+  va_start (ap, gmsgid);
+
+  char *caption = expand_caption_va (context, gmsgid, &ap);
+  add_range (start, finish, caption, BUFFER_OWNERSHIP_GIVEN, false);
+
+  va_end (ap);
+}
+
+/* Translate and expand the given GMSGID and ARGS into a caption
+   (or NULL).
+
+   If non-NULL, ownership of the result transfers to the caller,
+   which must free it.  */
+
+char *
+gcc_rich_location::expand_caption_va (diagnostic_context *context,
+				      const char *gmsgid, va_list *args)
+{
+  gcc_assert (context);
+  gcc_assert (gmsgid);
+
+  /* Only bother if show-caret is enabled.  */
+  if (!context->show_caret)
+    return NULL;
+
+  /* Format the text, and return a copy.  */
+  pretty_printer * const pp = context->printer;
+  char *result;
+  text_info text;
+  text.err_no = errno;
+  text.args_ptr = args;
+  text.format_spec = G_(gmsgid);
+  pp_format (pp, &text);
+  pp_output_formatted_text (pp);
+  result = xstrdup (pp_formatted_text (pp));
+  pp_clear_output_area (pp);
+  return result;
+}
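Note on expand_caption_va above: it returns either NULL (when show_caret
is off) or a freshly allocated copy of the formatted text that the caller
then owns.  A minimal standalone sketch of that contract follows, assuming
plain printf-style formatting in place of pp_format; toy_context and
toy_expand_caption are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for the single context field that is consulted.
struct toy_context
{
  bool show_caret;
};

// Returns a malloc'd caption, or NULL if carets are disabled.
// A non-NULL result is owned by the caller, which must free () it.
char *
toy_expand_caption (const toy_context *context, const char *fmt, ...)
{
  assert (context);
  assert (fmt);

  // Only bother formatting if the caption could actually be shown.
  if (!context->show_caret)
    return NULL;

  char buf[128];
  va_list ap;
  va_start (ap, fmt);
  std::vsnprintf (buf, sizeof buf, fmt, ap);
  va_end (ap);

  // Hand back a heap copy so the scratch buffer can be reused.
  char *result = static_cast<char *> (std::malloc (std::strlen (buf) + 1));
  if (result)
    std::strcpy (result, buf);
  return result;
}
```

In the patch itself the non-NULL result is handed to add_range with
BUFFER_OWNERSHIP_GIVEN, so presumably the rich location takes over the
job of freeing it.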
diff --git a/gcc/gcc-rich-location.h b/gcc/gcc-rich-location.h
new file mode 100644
index 0000000..795b60f
--- /dev/null
+++ b/gcc/gcc-rich-location.h
@@ -0,0 +1,54 @@
+/* Declarations relating to class gcc_rich_location
+   Copyright (C) 2014-2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_RICH_LOCATION_H
+#define GCC_RICH_LOCATION_H
+
+/* A gcc_rich_location is libcpp's rich_location with additional
+   helper methods for working with gcc's types.  */
+class gcc_rich_location : public rich_location
+{
+ public:
+  /* Constructors.  */
+
+  /* Constructing from a location.  */
+  gcc_rich_location (source_location loc) :
+    rich_location (loc) {}
+
+  /* Constructing from a source_range.  */
+  gcc_rich_location (source_range src_range) :
+    rich_location (src_range) {}
+
+
+  /* Methods for adding additional details.  */
+
+  void
+  add_range_with_caption (location_t start, location_t finish,
+			  diagnostic_context *context,
+			  const char *gmsgid, ...)
+    ATTRIBUTE_GCC_DIAG(5,6);
+
+ private:
+  static char *
+  expand_caption_va (diagnostic_context *context,
+		     const char *gmsgid, va_list *args)
+    ATTRIBUTE_GCC_DIAG(2, 0);
+};
+
+#endif /* GCC_RICH_LOCATION_H */
diff --git a/gcc/genmatch.c b/gcc/genmatch.c
index 7266637..e7d8b98 100644
--- a/gcc/genmatch.c
+++ b/gcc/genmatch.c
@@ -53,14 +53,23 @@ unsigned verbose;
 
 static struct line_maps *line_table;
 
+expanded_location
+linemap_client_expand_location_to_spelling_point (source_location loc)
+{
+  const struct line_map_ordinary *map;
+  loc = linemap_resolve_location (line_table, loc, LRK_SPELLING_LOCATION, &map);
+  return linemap_expand_location (line_table, map, loc);
+}
+
 static bool
 #if GCC_VERSION >= 4001
-__attribute__((format (printf, 6, 0)))
+__attribute__((format (printf, 5, 0)))
 #endif
-error_cb (cpp_reader *, int errtype, int, source_location location,
-	  unsigned int, const char *msg, va_list *ap)
+error_cb (cpp_reader *, int errtype, int, rich_location *richloc,
+	  const char *msg, va_list *ap)
 {
   const line_map_ordinary *map;
+  source_location location = richloc->get_loc ();
   linemap_resolve_location (line_table, location, LRK_SPELLING_LOCATION, &map);
   expanded_location loc = linemap_expand_location (line_table, map, location);
   fprintf (stderr, "%s:%d:%d %s: ", loc.file, loc.line, loc.column,
@@ -102,9 +111,10 @@ __attribute__((format (printf, 2, 3)))
 #endif
 fatal_at (const cpp_token *tk, const char *msg, ...)
 {
+  rich_location richloc (tk->src_loc);
   va_list ap;
   va_start (ap, msg);
-  error_cb (NULL, CPP_DL_FATAL, 0, tk->src_loc, 0, msg, &ap);
+  error_cb (NULL, CPP_DL_FATAL, 0, &richloc, msg, &ap);
   va_end (ap);
 }
 
@@ -114,9 +124,10 @@ __attribute__((format (printf, 2, 3)))
 #endif
 fatal_at (source_location loc, const char *msg, ...)
 {
+  rich_location richloc (loc);
   va_list ap;
   va_start (ap, msg);
-  error_cb (NULL, CPP_DL_FATAL, 0, loc, 0, msg, &ap);
+  error_cb (NULL, CPP_DL_FATAL, 0, &richloc, msg, &ap);
   va_end (ap);
 }
 
@@ -126,9 +137,10 @@ __attribute__((format (printf, 2, 3)))
 #endif
 warning_at (const cpp_token *tk, const char *msg, ...)
 {
+  rich_location richloc (tk->src_loc);
   va_list ap;
   va_start (ap, msg);
-  error_cb (NULL, CPP_DL_WARNING, 0, tk->src_loc, 0, msg, &ap);
+  error_cb (NULL, CPP_DL_WARNING, 0, &richloc, msg, &ap);
   va_end (ap);
 }
 
@@ -138,9 +150,10 @@ __attribute__((format (printf, 2, 3)))
 #endif
 warning_at (source_location loc, const char *msg, ...)
 {
+  rich_location richloc (loc);
   va_list ap;
   va_start (ap, msg);
-  error_cb (NULL, CPP_DL_WARNING, 0, loc, 0, msg, &ap);
+  error_cb (NULL, CPP_DL_WARNING, 0, &richloc, msg, &ap);
   va_end (ap);
 }
 
diff --git a/gcc/input.c b/gcc/input.c
index e7302a4..bdba20f 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -751,6 +751,13 @@ expand_location_to_spelling_point (source_location loc)
   return expand_location_1 (loc, /*expansion_point_p=*/false);
 }
 
+expanded_location
+linemap_client_expand_location_to_spelling_point (source_location loc)
+{
+  return expand_location_to_spelling_point (loc);
+}
+
+
 /* If LOCATION is in a system header and if it is a virtual location for
    a token coming from the expansion of a macro, unwind it to the
    location of the expansion point of the macro.  Otherwise, just return
diff --git a/gcc/intl.c b/gcc/intl.c
index a902446..110e1ea 100644
--- a/gcc/intl.c
+++ b/gcc/intl.c
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "intl.h"
+#include "box-drawing.h"
 
 #ifdef HAVE_LANGINFO_CODESET
 #include <langinfo.h>
@@ -86,6 +87,14 @@ gcc_init_libintl (void)
 	}
 #endif
     }
+
+  bool have_utf8 = false;
+#if defined HAVE_LANGINFO_CODESET
+  if (locale_utf8)
+    have_utf8 = true;
+#endif
+
+  g_line_art.init (have_utf8);
 }
 
 #if defined HAVE_WCHAR_H && defined HAVE_WORKING_MBSTOWCS && defined HAVE_WCSWIDTH
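Note on g_line_art.init (have_utf8) above: box-drawing.h is not part of
this excerpt, so the following is only a guess at the general idea, namely
a table of drawing strings chosen once from the locale.  Only the
default_caret field name appears in the patch; the other field names and
the UTF-8 glyph are placeholders picked for this sketch.  The ASCII values
match the characters seen in the expected test output ('^', '~', '|').

```cpp
#include <cassert>
#include <cstring>

// Hypothetical table of line-art strings.  Strings rather than single
// chars, so that a multi-byte UTF-8 sequence fits in one slot.
struct toy_line_art
{
  const char *default_caret;
  const char *underline;  // placeholder field name
  const char *vertical;   // placeholder field name

  void init (bool have_utf8)
  {
    default_caret = "^";
    underline = "~";
    // U+2502 BOX DRAWINGS LIGHT VERTICAL, encoded as UTF-8.
    vertical = have_utf8 ? "\xe2\x94\x82" : "|";
  }
};

// Reset every caret slot to the default, in the spirit of what
// gfc_diagnostics_finish does for caret_chars[0] and caret_chars[1].
void
toy_reset_carets (const char *caret_chars[], unsigned int n,
                  const toy_line_art &art)
{
  assert (caret_chars);
  for (unsigned int i = 0; i < n; i++)
    caret_chars[i] = art.default_caret;
}
```

This would also explain why caret_chars changes from char to
const char * in diagnostic.h.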
diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c
index fdc7b4d..02cf991 100644
--- a/gcc/pretty-print.c
+++ b/gcc/pretty-print.c
@@ -31,6 +31,27 @@ along with GCC; see the file COPYING3.  If not see
 #include <iconv.h>
 #endif
 
+/* Overwrite the range within this text_info's rich_location.
+   For use e.g. when implementing "+" in client format decoders.  */
+
+void
+text_info::set_range (unsigned int idx, source_range range, bool caret_p)
+{
+  gcc_checking_assert (m_richloc);
+  m_richloc->set_range (idx, range, caret_p);
+}
+
+location_t
+text_info::get_location (unsigned int index_of_location) const
+{
+  gcc_checking_assert (m_richloc);
+
+  if (index_of_location == 0)
+    return m_richloc->get_loc ();
+  else
+    return UNKNOWN_LOCATION;
+}
+
 // Default construct an output buffer.
 
 output_buffer::output_buffer ()
diff --git a/gcc/pretty-print.h b/gcc/pretty-print.h
index 36d4e37..d10272c 100644
--- a/gcc/pretty-print.h
+++ b/gcc/pretty-print.h
@@ -27,11 +27,6 @@ along with GCC; see the file COPYING3.  If not see
 /* Maximum number of format string arguments.  */
 #define PP_NL_ARGMAX   30
 
-/* Maximum number of locations associated to each message.  If
-   location 'i' is UNKNOWN_LOCATION, then location 'i+1' is not
-   valid.  */
-#define MAX_LOCATIONS_PER_MESSAGE 2
-
 /* The type of a text to be formatted according a format specification
    along with a list of things.  */
 struct text_info
@@ -40,21 +35,17 @@ struct text_info
   va_list *args_ptr;
   int err_no;  /* for %m */
   void **x_data;
+  rich_location *m_richloc;
 
-  inline void set_location (unsigned int index_of_location, location_t loc)
+  inline void set_location (unsigned int idx, location_t loc, bool caret_p)
   {
-    gcc_checking_assert (index_of_location < MAX_LOCATIONS_PER_MESSAGE);
-    this->locations[index_of_location] = loc;
+    source_range src_range;
+    src_range.m_start = loc;
+    src_range.m_finish = loc;
+    set_range (idx, src_range, caret_p);
   }
-
-  inline location_t get_location (unsigned int index_of_location) const
-  {
-    gcc_checking_assert (index_of_location < MAX_LOCATIONS_PER_MESSAGE);
-    return this->locations[index_of_location];
-  }
-
-private:
-  location_t locations[MAX_LOCATIONS_PER_MESSAGE];
+  void set_range (unsigned int idx, source_range range, bool caret_p);
+  location_t get_location (unsigned int index_of_location) const;
 };
 
 /* How often diagnostics are prefixed by their locations:
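Note on the text_info changes above: set_location now builds a degenerate
range (start == finish) and forwards to set_range, and get_location only
answers for index 0, returning UNKNOWN_LOCATION otherwise.  Here is a
small standalone sketch of that behaviour.  The toy_* types are invented
for illustration, and the way set_range grows the range count is an
assumption, since rich_location::set_range is not shown in this excerpt.

```cpp
#include <cassert>

typedef unsigned int toy_location_t;
const toy_location_t TOY_UNKNOWN_LOCATION = 0;

struct toy_source_range
{
  toy_location_t m_start;
  toy_location_t m_finish;

  // A single point is represented as a range with start == finish.
  static toy_source_range from_location (toy_location_t loc)
  {
    toy_source_range r;
    r.m_start = loc;
    r.m_finish = loc;
    return r;
  }
};

class toy_rich_location
{
 public:
  static const unsigned int MAX_RANGES = 3;

  explicit toy_rich_location (toy_location_t loc)
    : m_num_ranges (1)
  {
    m_ranges[0] = toy_source_range::from_location (loc);
  }

  // Overwrite range IDX, or append if IDX is one past the end
  // (assumed behaviour for this sketch).
  void set_range (unsigned int idx, toy_source_range range)
  {
    assert (idx < MAX_RANGES);
    assert (idx <= m_num_ranges);
    m_ranges[idx] = range;
    if (idx == m_num_ranges)
      m_num_ranges++;
  }

  unsigned int get_num_locations () const { return m_num_ranges; }
  toy_location_t get_loc () const { return m_ranges[0].m_start; }

 private:
  unsigned int m_num_ranges;
  toy_source_range m_ranges[MAX_RANGES];
};

// Mirrors the new text_info::get_location: only index 0 is answered.
toy_location_t
toy_get_location (const toy_rich_location &richloc, unsigned int idx)
{
  return idx == 0 ? richloc.get_loc () : TOY_UNKNOWN_LOCATION;
}
```

Under these assumptions a check such as get_num_locations () < 2, as now
used in gfc_diagnostic_starter, replaces the old test of whether
location 1 is UNKNOWN_LOCATION.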
diff --git a/gcc/rtl-error.c b/gcc/rtl-error.c
index 8b9b391..d28be1d 100644
--- a/gcc/rtl-error.c
+++ b/gcc/rtl-error.c
@@ -69,9 +69,10 @@ diagnostic_for_asm (const rtx_insn *insn, const char *msg, va_list *args_ptr,
 		    diagnostic_t kind)
 {
   diagnostic_info diagnostic;
+  rich_location richloc (location_for_asm (insn));
 
   diagnostic_set_info (&diagnostic, msg, args_ptr,
-		       location_for_asm (insn), kind);
+		       &richloc, kind);
   report_diagnostic (&diagnostic);
 }
 
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-ascii-bw.c b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-ascii-bw.c
new file mode 100644
index 0000000..8ffe2e0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-ascii-bw.c
@@ -0,0 +1,114 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdiagnostics-show-caret" } */
+
+/* This is a collection of unittests for diagnostic_show_locus;
+   see the overview in diagnostic_plugin_test_show_locus.c.
+
+   In particular, note the discussion of why we need a very long line here:
+01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
+   and that we can't use macros in this file.  */
+
+void test1 (void)
+{
+#if 0
+  myvar = myvar.x; /* { dg-warning "test 1" } */
+
+/* { dg-begin-multiline-output "" }
+   myvar = myvar.x;
+           ~~~~~^~
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+void test2 (void)
+{
+#if 0
+  myvar = myvar.x; /* { dg-warning "test 2" } */
+
+/* { dg-begin-multiline-output "" }
+   myvar = myvar.x;
+           ~~~~~^~
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+void test3 (void)
+{
+#if 0
+  x = first_function () + second_function ();  /* { dg-warning "test 3" } */
+
+/* { dg-begin-multiline-output "" }
+   x = first_function () + second_function ();
+       ~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+
+void test4 (void)
+{
+#if 0
+  x = (first_function ()
+       + second_function ()); /* { dg-warning "test 4" } */
+
+/* { dg-begin-multiline-output "" }
+   x = (first_function ()|
+        ~~~~~~~~~~~~~~~~~+type 'float'
+        + second_function ());                              |
+        ^ ~~~~~~~~~~~~~~~~~~                                +type 'void'
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+void test5 (void)
+{
+#if 0
+  x = (first_function_with_a_very_long_name (lorem, ipsum, dolor, sit, amet,
+                                            consectetur, adipiscing, elit,
+                                            sed, eiusmod, tempor,
+                                            incididunt, ut, labore, et,
+                                            dolore, magna, aliqua)
+       + second_function_with_a_very_long_name (lorem, ipsum, dolor, sit, /* { dg-warning "test 5" } */
+                                                amet, consectetur,
+                                                adipiscing, elit, sed,
+                                                eiusmod, tempor, incididunt,
+                                                ut, labore, et, dolore,
+                                                magna, aliqua));
+
+/* { dg-begin-multiline-output "" }
+   x = (first_function_with_a_very_long_name (lorem, ipsum, dolor, sit, amet,|
+        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
+                                             consectetur, adipiscing, elit,  |
+                                             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  |
+                                             sed, eiusmod, tempor,           |
+                                             ~~~~~~~~~~~~~~~~~~~~~           +type 'float'
+                                             incididunt, ut, labore, et,     |
+                                             ~~~~~~~~~~~~~~~~~~~~~~~~~~~     |
+                                             dolore, magna, aliqua)          |
+                                             ~~~~~~~~~~~~~~~~~~~~~~          |
+        + second_function_with_a_very_long_name (lorem, ipsum, dolor, sit,                              |
+        ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
+                                                 amet, consectetur,                                     |
+                                                 ~~~~~~~~~~~~~~~~~~                                     |
+                                                 adipiscing, elit, sed,                                 |
+                                                 ~~~~~~~~~~~~~~~~~~~~~~                                 +type 'void'
+                                                 eiusmod, tempor, incididunt,                           |
+                                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~                           |
+                                                 ut, labore, et, dolore,                                |
+                                                 ~~~~~~~~~~~~~~~~~~~~~~~                                |
+                                                 magna, aliqua));                                       |
+                                                 ~~~~~~~~~~~~~~                                         |
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+void test6 (void)
+{
+#if 0
+  float f = 98.6f; /* { dg-warning "test 6" } */
+/* { dg-begin-multiline-output "" }
+   float f = 98.6f;
+             ^~~~~
+   { dg-end-multiline-output "" } */
+#endif
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-ascii-color.c b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-ascii-color.c
new file mode 100644
index 0000000..dba851d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-ascii-color.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdiagnostics-show-caret -fplugin-arg-diagnostic_plugin_test_show_locus-color" } */
+
+/* This is a collection of unittests for diagnostic_show_locus;
+   see the overview in diagnostic_plugin_test_show_locus.c.
+
+   In particular, note the discussion of why we need a very long line here:
+01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
+   and that we can't use macros in this file.  */
+
+void test1 (void)
+{
+#if 0
+  myvar = myvar.x; /* { dg-warning "test 1" } */
+
+/* { dg-begin-multiline-output "" }
+   myvar = ^[[32m^[[Kmyvar^[[m^[[K^[[01;35m^[[K.^[[m^[[K^[[34m^[[Kx^[[m^[[K;
+           ^[[32m^[[K~~~~~^[[m^[[K^[[01;35m^[[K^^[[m^[[K^[[34m^[[K~
+^[[m^[[K
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+void test2 (void)
+{
+#if 0
+  myvar = myvar.x; /* { dg-warning "test 2" } */
+
+/* { dg-begin-multiline-output "" }
+   myvar = ^[[32m^[[Kmyvar^[[m^[[K^[[01;35m^[[K.^[[m^[[K^[[34m^[[Kx^[[m^[[K;
+           ^[[32m^[[K~~~~~^[[m^[[K^[[01;35m^[[K^^[[m^[[K^[[34m^[[K~
+^[[m^[[K
+   { dg-end-multiline-output "" } */
+#endif
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-utf-8-bw.c b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-utf-8-bw.c
new file mode 100644
index 0000000..5fc8395
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-utf-8-bw.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdiagnostics-show-caret -fplugin-arg-diagnostic_plugin_test_show_locus-force-utf8" } */
+
+/* This is a collection of unittests for diagnostic_show_locus;
+   see the overview in diagnostic_plugin_test_show_locus.c.
+
+   In particular, note the discussion of why we need a very long line here:
+01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
+   and that we can't use macros in this file.  */
+
+void test1 (void)
+{
+#if 0
+  myvar = myvar.x; /* { dg-warning "test 1" } */
+
+/* { dg-begin-multiline-output "" }
+   myvar = myvar.x;
+           ─────▲─
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+void test2 (void)
+{
+#if 0
+  myvar = myvar.x; /* { dg-warning "test 2" } */
+
+/* { dg-begin-multiline-output "" }
+   myvar = myvar.x;
+           ─────▲─
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+void test3 (void)
+{
+#if 0
+  x = first_function () + second_function ();  /* { dg-warning "test 3" } */
+
+/* { dg-begin-multiline-output "" }
+   x = first_function () + second_function ();
+       ───────────────── ▲ ──────────────────
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+/* TODO: missing test4 and test5 for now (regex issues).  */
+
+void test6 (void)
+{
+#if 0
+  float f = 98.6f; /* { dg-warning "test 6" } */
+/* { dg-begin-multiline-output "" }
+   float f = 98.6f;
+             ▲────
+   { dg-end-multiline-output "" } */
+#endif
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-utf-8-color.c b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-utf-8-color.c
new file mode 100644
index 0000000..a8f4bc3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-locus-utf-8-color.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdiagnostics-show-caret -fplugin-arg-diagnostic_plugin_test_show_locus-force-utf8 -fplugin-arg-diagnostic_plugin_test_show_locus-color" } */
+
+/* This is a collection of unittests for diagnostic_show_locus;
+   see the overview in diagnostic_plugin_test_show_locus.c.
+
+   In particular, note the discussion of why we need a very long line here:
+01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
+   and that we can't use macros in this file.  */
+
+void test1 (void)
+{
+#if 0
+  myvar = myvar.x; /* { dg-warning "test 1" } */
+
+/* { dg-begin-multiline-output "" }
+   myvar = ^[[32m^[[Kmyvar^[[m^[[K^[[01;35m^[[K.^[[m^[[K^[[34m^[[Kx^[[m^[[K;
+           ^[[32m^[[K─────^[[m^[[K^[[01;35m^[[K▲^[[m^[[K^[[34m^[[K─
+^[[m^[[K
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+void test2 (void)
+{
+#if 0
+  myvar = myvar.x; /* { dg-warning "test 2" } */
+
+/* { dg-begin-multiline-output "" }
+   myvar = ^[[32m^[[Kmyvar^[[m^[[K^[[01;35m^[[K.^[[m^[[K^[[34m^[[Kx^[[m^[[K;
+           ^[[32m^[[K─────^[[m^[[K^[[01;35m^[[K▲^[[m^[[K^[[34m^[[K─
+^[[m^[[K
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+void test3 (void)
+{
+#if 0
+  x = first_function () + second_function ();  /* { dg-warning "test 3" } */
+
+/* { dg-begin-multiline-output "" }
+   x = ^[[32m^[[Kfirst_function ()^[[m^[[K ^[[01;35m^[[K+^[[m^[[K ^[[34m^[[Ksecond_function ()^[[m^[[K;
+       ^[[32m^[[K─────────────────^[[m^[[K ^[[01;35m^[[K▲^[[m^[[K ^[[34m^[[K──────────────────
+^[[m^[[K
+   { dg-end-multiline-output "" } */
+#endif
+}
+
+/* TODO: missing test4 and test5 for now (regex issues).  */
+
+void test6 (void)
+{
+#if 0
+  float f = 98.6f; /* { dg-warning "test 6" } */
+/* { dg-begin-multiline-output "" }
+   float f = ^[[01;35m^[[K98.6f^[[m^[[K;
+             ^[[01;35m^[[K▲────
+^[[m^[[K
+   { dg-end-multiline-output "" } */
+#endif
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
new file mode 100644
index 0000000..f724ef4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
@@ -0,0 +1,361 @@
+/* { dg-options "-O" } */
+
+/* This plugin exercises the diagnostics-printing code.
+
+   The goal is to unit-test the range-printing code without needing any
+   correct range data within the compiler's IR.  We can't use any real
+   diagnostics for this, so we have to fake it, hence this plugin.
+
+   There are four test files used with this code:
+
+     diagnostic-test-show-locus-ascii-bw.c
+     ..........................-ascii-color.c
+     ..........................-utf-8-bw.c
+     ..........................-utf-8-color.c
+
+   to exercise the different combinations of:
+
+     - ASCII vs UTF-8
+     - uncolored vs colored output.
+
+   by supplying plugin arguments to hack in the desired behavior:
+
+     -fplugin-arg-diagnostic_plugin_test_show_locus-force-utf8
+     -fplugin-arg-diagnostic_plugin_test_show_locus-color
+
+   The test files contain functions, but the body of each
+   function is disabled using the preprocessor.  The plugin detects
+   the functions by name, and injects diagnostics within them, using
+   hard-coded locations relative to the top of each function.
+
+   The plugin uses a function "get_loc" below to map from line/column
+   numbers to source_location, and this relies on input_location being in
+   the same ordinary line_map as the locations in question.  The plugin
+   runs after parsing, so input_location will be at the end of the file.
+
+   This need for all of the test code to be in a single ordinary line map
+   means that each test file needs to have a very long line near the top
+   (potentially to cover the extra byte-count of UTF-8 or colorized data),
+   to ensure that further very long lines don't start a new linemap.
+   This also means that we can't use macros in the test files.  */
+
+#include "gcc-plugin.h"
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "stringpool.h"
+#include "toplev.h"
+#include "basic-block.h"
+#include "hash-table.h"
+#include "vec.h"
+#include "ggc.h"
+#include "basic-block.h"
+#include "tree-ssa-alias.h"
+#include "internal-fn.h"
+#include "gimple-fold.h"
+#include "tree-eh.h"
+#include "gimple-expr.h"
+#include "is-a.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "tree.h"
+#include "tree-pass.h"
+#include "intl.h"
+#include "plugin-version.h"
+#include "diagnostic.h"
+#include "context.h"
+#include "gcc-rich-location.h"
+#include "print-tree.h"
+#include "box-drawing.h"
+
+/* FIXME: (dmalcolm)
+   This plugin is currently the only user of
+     gcc_rich_location::add_range_with_caption
+   As such, the symbol is present in libbackend.a, but not in "cc1",
+   and running the plugin fails with a linker error:
+     ./diagnostic_plugin_test_show_locus.so: undefined symbol: _ZN17gcc_rich_location22add_range_with_captionEjjP18diagnostic_contextPKcz
+   which c++filt tells us is:
+     ./diagnostic_plugin_test_show_locus.so: undefined symbol: gcc_rich_location::add_range_with_caption(unsigned int, unsigned int, diagnostic_context*, char const*, ...)
+
+   I've tried various workarounds (adding DEBUG_FUNCTION to the
+   method, taking its address), but can't seem to fix it that way.
+   So as a nasty workaround, the following material is copied and pasted
+   from gcc-rich-location.c: */
+
+void
+gcc_rich_location::add_range_with_caption (location_t start, location_t finish,
+					   diagnostic_context *context,
+					   const char *gmsgid, ...)
+{
+  gcc_assert (context);
+  gcc_assert (gmsgid);
+
+  va_list ap;
+  va_start (ap, gmsgid);
+
+  char *caption = expand_caption_va (context, gmsgid, &ap);
+  add_range (start, finish, caption, BUFFER_OWNERSHIP_GIVEN, false);
+
+  va_end (ap);
+}
+
+char *
+gcc_rich_location::expand_caption_va (diagnostic_context *context,
+				      const char *gmsgid, va_list *args)
+{
+  gcc_assert (context);
+  gcc_assert (gmsgid);
+
+  /* Only bother if show-caret is enabled.  */
+  if (!context->show_caret)
+    return NULL;
+
+  /* Format the text, and return a copy.  */
+  pretty_printer * const pp = context->printer;
+  char *result;
+  text_info text;
+  text.err_no = errno;
+  text.args_ptr = args;
+  text.format_spec = G_(gmsgid);
+  pp_format (pp, &text);
+  pp_output_formatted_text (pp);
+  result = xstrdup (pp_formatted_text (pp));
+  pp_clear_output_area (pp);
+  return result;
+}
+
+/* FIXME: end of material taken from gcc-rich-location.c */
+
+int plugin_is_GPL_compatible;
+
+const pass_data pass_data_test_show_locus =
+{
+  GIMPLE_PASS, /* type */
+  "test_show_locus", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_ssa, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_test_show_locus : public gimple_opt_pass
+{
+public:
+  pass_test_show_locus(gcc::context *ctxt)
+    : gimple_opt_pass(pass_data_test_show_locus, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  bool gate (function *) { return true; }
+  virtual unsigned int execute (function *);
+
+}; // class pass_test_show_locus
+
+/* Given LINE_NUM and COL_NUM, generate a source_location in the
+   current file, relative to input_location.  This relies on the
+   location being expressible in the same ordinary line_map as
+   input_location (which is typically at the end of the source file
+   when this is called).  Hence the test files we compile with this
+   plugin must have an initial very long line (to avoid long lines
+   starting a new line map), and must not use macros.
+
+   COL_NUM uses the Emacs convention of 0-based column numbers.  */
+
+static source_location
+get_loc (unsigned int line_num, unsigned int col_num)
+{
+  /* Use input_location to get the relevant line_map */
+  const struct line_map_ordinary *line_map
+    = (const line_map_ordinary *)(linemap_lookup (line_table,
+						  input_location));
+
+  /* Convert from 0-based column numbers to 1-based column numbers.  */
+  source_location loc
+    = linemap_position_for_line_and_column (line_map,
+					    line_num, col_num + 1);
+
+  return loc;
+}
+
+/* Was "color" passed in as a plugin argument?  */
+static bool force_show_locus_color = false;
+
+/* We want to verify the colorized output of diagnostic_show_locus,
+   but turning on colorization for everything confuses "dg-warning" etc.
+   Hence we special-case it within this plugin by using this modified
+   version of default_diagnostic_finalizer, which, if "color" is
+   passed in as a plugin argument, turns on colorization, but just
+   for diagnostic_show_locus.  */
+
+static void
+custom_diagnostic_finalizer (diagnostic_context *context,
+			     diagnostic_info *diagnostic)
+{
+  bool old_show_color = pp_show_color (context->printer);
+  if (force_show_locus_color)
+    pp_show_color (context->printer) = true;
+  diagnostic_show_locus (context, diagnostic);
+  pp_show_color (context->printer) = old_show_color;
+
+  pp_destroy_prefix (context->printer);
+  pp_newline_and_flush (context->printer);
+}
+
+/* Exercise the diagnostic machinery to emit various warnings,
+   for use by diagnostic-test-show-locus-*.c.
+
+   We inject each warning relative to the start of a function,
+   which avoids lots of hardcoded absolute locations.  */
+
+static void
+test_show_locus (function *fun)
+{
+  tree fndecl = fun->decl;
+  tree identifier = DECL_NAME (fndecl);
+  const char *fnname = IDENTIFIER_POINTER (identifier);
+  location_t fnstart = fun->function_start_locus;
+  int fnstart_line = LOCATION_LINE (fnstart);
+
+  diagnostic_finalizer (global_dc) = custom_diagnostic_finalizer;
+
+  /* Test 1.  */
+  if (0 == strcmp (fnname, "test1"))
+    {
+      const int line = fnstart_line + 2;
+      rich_location richloc (get_loc (line, 15));
+      richloc.add_range (get_loc (line, 10), get_loc (line, 14));
+      richloc.add_range (get_loc (line, 16), get_loc (line, 16));
+      warning_at_rich_loc (&richloc, 0, "test 1");
+    }
+
+  /* Test 2: as before, but add captions.  */
+  if (0 == strcmp (fnname, "test2"))
+    {
+      const int line = fnstart_line + 2;
+      gcc_rich_location richloc (get_loc (line, 15));
+      richloc.add_range_with_caption (get_loc (line, 10), get_loc (line, 14),
+				      global_dc,
+				      "hello");
+      richloc.add_range_with_caption (get_loc (line, 16), get_loc (line, 16),
+				      global_dc,
+				      "world");
+      warning_at_rich_loc (&richloc, 0, "test 2");
+    }
+
+  /* Test 3.  */
+  if (0 == strcmp (fnname, "test3"))
+    {
+      const int line = fnstart_line + 2;
+      gcc_rich_location richloc (get_loc (line, 24));
+      richloc.add_range_with_caption (get_loc (line, 6),
+				      get_loc (line, 22),
+				      global_dc,
+				      "type %qT", float_type_node);
+      richloc.add_range_with_caption (get_loc (line, 26),
+				      get_loc (line, 43),
+				      global_dc,
+				      "type %qT", void_type_node);
+      warning_at_rich_loc (&richloc, 0, "test 3");
+    }
+
+  /* Test 4.  */
+  if (0 == strcmp (fnname, "test4"))
+    {
+      const int line = fnstart_line + 2;
+      gcc_rich_location richloc (get_loc (line + 1, 7));
+      richloc.add_range_with_caption (get_loc (line, 7),
+				      get_loc (line, 23),
+				      global_dc,
+				      "type %qT", float_type_node);
+      richloc.add_range_with_caption (get_loc (line + 1, 9),
+				      get_loc (line + 1, 26),
+				      global_dc,
+				      "type %qT", void_type_node);
+      warning_at_rich_loc (&richloc, 0, "test 4");
+    }
+
+  /* Test 5.  Multiline example with captions.  */
+  if (0 == strcmp (fnname, "test5"))
+    {
+      const int line = fnstart_line + 2;
+      gcc_rich_location richloc (get_loc (line + 5, 7));
+      richloc.add_range_with_caption (get_loc (line, 7),
+				      get_loc (line + 4, 65),
+				      global_dc,
+				      "type %qT", float_type_node);
+      richloc.add_range_with_caption (get_loc (line + 5, 9),
+				      get_loc (line + 10, 61),
+				      global_dc,
+				      "type %qT", void_type_node);
+      warning_at_rich_loc (&richloc, 0, "test 5");
+    }
+
+  /* Test 6: example of a single-range location where the range extends
+     beyond the caret.  */
+  if (0 == strcmp (fnname, "test6"))
+    {
+      const int line = fnstart_line + 2;
+      source_range src_range;
+      src_range.m_start = get_loc (line, 12);
+      src_range.m_finish = get_loc (line, 16);
+      gcc_rich_location richloc (src_range);
+      warning_at_rich_loc (&richloc, 0, "test 6");
+    }
+}
+
+unsigned int
+pass_test_show_locus::execute (function *fun)
+{
+  test_show_locus (fun);
+  return 0;
+}
+
+static gimple_opt_pass *
+make_pass_test_show_locus (gcc::context *ctxt)
+{
+  return new pass_test_show_locus (ctxt);
+}
+
+int
+plugin_init (struct plugin_name_args *plugin_info,
+	     struct plugin_gcc_version *version)
+{
+  struct register_pass_info pass_info;
+  const char *plugin_name = plugin_info->base_name;
+  int argc = plugin_info->argc;
+  struct plugin_argument *argv = plugin_info->argv;
+
+  if (!plugin_default_version_check (version, &gcc_version))
+    return 1;
+
+  for (int i = 0; i < argc; i++)
+    {
+      if (0 == strcmp (argv[i].key, "force-utf8"))
+	{
+	  /* Forcibly reinitialize box-drawing to use UTF-8, so that
+	     we can exercise this even though LANG=C.  */
+	  g_line_art.init (1);
+
+	  /* We also need to reset the carets within the diagnostic
+	     context to use the new default caret.  */
+	  for (int j = 0; j < rich_location::MAX_RANGES; j++)
+	    global_dc->caret_chars[j] = g_line_art.default_caret;
+	}
+      else if (0 == strcmp (argv[i].key, "color"))
+	force_show_locus_color = true;
+    }
+
+  pass_info.pass = make_pass_test_show_locus (g);
+  pass_info.reference_pass_name = "ssa";
+  pass_info.ref_pass_instance_number = 1;
+  pass_info.pos_op = PASS_POS_INSERT_AFTER;
+  register_callback (plugin_name, PLUGIN_PASS_MANAGER_SETUP, NULL,
+		     &pass_info);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
index 39fab6e..1017044 100644
--- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
+++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
@@ -63,6 +63,11 @@ set plugin_test_list [list \
     { start_unit_plugin.c start_unit-test-1.c } \
     { finish_unit_plugin.c finish_unit-test-1.c } \
     { wide-int_plugin.c wide-int-test-1.c } \
+    { diagnostic_plugin_test_show_locus.c \
+	  diagnostic-test-show-locus-ascii-bw.c \
+	  diagnostic-test-show-locus-ascii-color.c \
+	  diagnostic-test-show-locus-utf-8-bw.c \
+	  diagnostic-test-show-locus-utf-8-color.c } \
 ]
 
 foreach plugin_test $plugin_test_list {
diff --git a/gcc/testsuite/lib/gcc-dg.exp b/gcc/testsuite/lib/gcc-dg.exp
index 7c1ab85..8cc1d87 100644
--- a/gcc/testsuite/lib/gcc-dg.exp
+++ b/gcc/testsuite/lib/gcc-dg.exp
@@ -29,6 +29,7 @@ load_lib libgloss.exp
 load_lib target-libpath.exp
 load_lib torture-options.exp
 load_lib fortran-modules.exp
+load_lib multiline.exp
 
 # We set LC_ALL and LANG to C so that we get the same error messages as expected.
 setenv LC_ALL C
diff --git a/gcc/tree-diagnostic.c b/gcc/tree-diagnostic.c
index 135f142..02009d8 100644
--- a/gcc/tree-diagnostic.c
+++ b/gcc/tree-diagnostic.c
@@ -289,7 +289,7 @@ default_tree_printer (pretty_printer *pp, text_info *text, const char *spec,
     }
 
   if (set_locus)
-    text->set_location (0, DECL_SOURCE_LOCATION (t));
+    text->set_location (0, DECL_SOURCE_LOCATION (t), true);
 
   if (DECL_P (t))
     {
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 7cd1fe7..3c34d51 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -3602,7 +3602,7 @@ void
 percent_K_format (text_info *text)
 {
   tree t = va_arg (*text->args_ptr, tree), block;
-  text->set_location (0, EXPR_LOCATION (t));
+  text->set_location (0, EXPR_LOCATION (t), true);
   gcc_assert (pp_ti_abstract_origin (text) != NULL);
   block = TREE_BLOCK (t);
   *pp_ti_abstract_origin (text) = NULL;
diff --git a/libcpp/errors.c b/libcpp/errors.c
index a33196e..c351c11 100644
--- a/libcpp/errors.c
+++ b/libcpp/errors.c
@@ -57,7 +57,8 @@ cpp_diagnostic (cpp_reader * pfile, int level, int reason,
 
   if (!pfile->cb.error)
     abort ();
-  ret = pfile->cb.error (pfile, level, reason, src_loc, 0, _(msgid), ap);
+  rich_location richloc (src_loc);
+  ret = pfile->cb.error (pfile, level, reason, &richloc, _(msgid), ap);
 
   return ret;
 }
@@ -139,7 +140,9 @@ cpp_diagnostic_with_line (cpp_reader * pfile, int level, int reason,
   
   if (!pfile->cb.error)
     abort ();
-  ret = pfile->cb.error (pfile, level, reason, src_loc, column, _(msgid), ap);
+  rich_location richloc (src_loc);
+  richloc.override_column (column);
+  ret = pfile->cb.error (pfile, level, reason, &richloc, _(msgid), ap);
 
   return ret;
 }
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index 5eaea6b..a2bdfa0 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -573,9 +573,9 @@ struct cpp_callbacks
 
   /* Called to emit a diagnostic.  This callback receives the
      translated message.  */
-  bool (*error) (cpp_reader *, int, int, source_location, unsigned int,
+  bool (*error) (cpp_reader *, int, int, rich_location *,
 		 const char *, va_list *)
-       ATTRIBUTE_FPTR_PRINTF(6,0);
+       ATTRIBUTE_FPTR_PRINTF(5,0);
 
   /* Callbacks for when a macro is expanded, or tested (whether
      defined or not at the time) in #ifdef, #ifndef or "defined".  */
diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
index bc747c1..53ba68b 100644
--- a/libcpp/include/line-map.h
+++ b/libcpp/include/line-map.h
@@ -118,6 +118,35 @@ typedef unsigned int linenum_type;
   libcpp/location-example.txt.  */
 typedef unsigned int source_location;
 
+/* A range of source locations.
+
+   Ranges are closed:
+   m_start is the first location within the range, whereas
+   m_finish is the last location within the range.
+
+   We may need a more compact way to store these, but for now,
+   let's do it the simple way, as a pair.  */
+struct GTY(()) source_range
+{
+  source_location m_start;
+  source_location m_finish;
+
+  void debug (const char *msg) const;
+
+  /* We avoid using constructors, since various structs that
+     don't yet have constructors will embed instances of
+     source_range.  */
+
+  /* Make a source_range from a source_location.  */
+  static source_range from_location (source_location loc)
+  {
+    source_range result;
+    result.m_start = loc;
+    result.m_finish = loc;
+    return result;
+  }
+};
+
 /* Memory allocation function typedef.  Works like xrealloc.  */
 typedef void *(*line_map_realloc) (void *, size_t);
 
@@ -1015,6 +1044,204 @@ typedef struct
   bool sysp;
 } expanded_location;
 
+/* Both gcc and emacs number source *lines* starting at 1, but
+   they have differing conventions for *columns*.
+
+   GCC uses a 1-based convention for source columns,
+   whereas Emacs's M-x column-number-mode uses a 0-based convention.
+
+   For example, an error in the initial, left-hand
+   column of source line 3 is reported by GCC as:
+
+      some-file.c:3:1: error: ...etc...
+
+   On navigating to the location of that error in Emacs
+   (e.g. via "next-error"),
+   the locus is reported in the Mode Line
+   (assuming M-x column-number-mode) as:
+
+     some-file.c   10%   (3, 0)
+
+   i.e. "3:1:" in GCC corresponds to "(3, 0)" in Emacs.  */
+
+/* Ranges are closed:
+   m_start is the first location within the range, and
+   m_finish is the last location within the range.  */
+struct location_range
+{
+  expanded_location m_start;
+  expanded_location m_finish;
+
+  /* Should a caret be drawn for this range?  Typically this is
+     true for the 0th range, and false for subsequent ranges,
+     but the Fortran frontend overrides this for rendering things like:
+
+       x = x + y
+           1   2
+       Error: Shapes for operands at (1) and (2) are not conformable
+
+     where "1" and "2" are notionally carets.  */
+  bool m_show_caret_p;
+
+  /* Caption, if any.  If non-NULL, this is dynamically allocated
+     and must be freed.  */
+  char *m_caption;
+
+  bool contains_point (int row, int column) const;
+};
+
+/* Should the callee take ownership of char *, or make a copy?  */
+enum buffer_ownership
+{
+  BUFFER_OWNERSHIP_GIVEN,   /* Take ownership.  */
+  BUFFER_OWNERSHIP_BORROWED /* Make a copy.  */
+};
+
+/* A "rich" source code location, for use when printing diagnostics.
+   A rich_location has a "primary location", along with zero or more
+   additional ranges.
+
+   rich_location instances are intended to be allocated on the stack
+   when generating diagnostics, and to be short-lived.
+
+   The zeroth range can be thought of as an extension of the primary
+   location within the rich_location; additional ranges may be added
+   to help the user identify other pertinent clauses in a diagnostic.
+
+   Each range may be flagged for having a caret displayed
+   at its start; typically this is the case for the zeroth
+   range.
+
+   Each range may optionally have a caption.
+
+   This class is subclassed by gcc (class gcc_rich_location) to add
+   additional methods; see gcc/gcc-rich-location.h.  */
+
+class rich_location
+{
+ public:
+  /* Constructors.  */
+
+  /* Constructing from a location.  */
+  rich_location (source_location loc);
+
+  /* Constructing from a source_range.  */
+  rich_location (source_range src_range);
+
+  /* Destructor.  */
+  virtual ~rich_location ();
+
+  /* Accessors.  */
+  source_location get_loc () const { return m_loc; }
+
+  source_location *get_loc_addr () { return &m_loc; }
+
+  void
+  add_range (source_location start, source_location finish,
+	     bool show_caret_p = false);
+
+  void
+  add_range (source_location start, source_location finish,
+	     char *caption,
+	     enum buffer_ownership ownership,
+	     bool show_caret_p);
+
+  void
+  add_range (source_range src_range,
+	     bool show_caret_p = false);
+
+  void
+  add_range (source_range src_range,
+	     char *caption,
+	     enum buffer_ownership ownership);
+
+  /* The caption, if non-NULL, must already be a copy.  */
+  void
+  add_range (location_range *src_range);
+
+  void
+  set_range (unsigned int idx, source_range src_range,
+	     bool show_caret_p);
+
+  int get_first_line ();
+  int get_last_line ();
+
+  unsigned int get_num_locations () const { return m_num_ranges; }
+
+  location_range *get_range (unsigned int idx)
+  {
+    linemap_assert (idx < m_num_ranges);
+    return &m_ranges[idx];
+  }
+
+  expanded_location lazily_expand_location ();
+
+  class range_iter
+  {
+  public:
+    range_iter (rich_location *richloc);
+    bool at_end () const;
+    void next ();
+    location_range *operator * () const;
+    unsigned int index () const;
+
+  private:
+    rich_location *m_richloc;
+    unsigned int m_idx;
+  };
+
+  range_iter iter_ranges () { return range_iter (this); }
+
+  void
+  override_column (int column);
+
+public:
+  static const int MAX_RANGES = 3;
+
+protected:
+  friend class range_iter;
+
+  source_location m_loc;
+
+  unsigned int m_num_ranges;
+  location_range m_ranges[MAX_RANGES];
+
+  bool m_have_expanded_location;
+  expanded_location m_expanded_location;
+};
+
+inline
+rich_location::range_iter::range_iter (rich_location *richloc) :
+  m_richloc (richloc),
+  m_idx (0)
+{
+}
+
+inline bool
+rich_location::range_iter::at_end () const
+{
+  return m_idx >= m_richloc->m_num_ranges;
+}
+
+inline void
+rich_location::range_iter::next ()
+{
+  m_idx++;
+}
+
+inline location_range *
+rich_location::range_iter::operator * () const
+{
+  return m_richloc->get_range (m_idx);
+}
+
+inline unsigned int
+rich_location::range_iter::index () const
+{
+  return m_idx;
+}
+
+
 /* This is enum is used by the function linemap_resolve_location
    below.  The meaning of the values is explained in the comment of
    that function.  */
@@ -1158,4 +1385,13 @@ void linemap_dump (FILE *, struct line_maps *, unsigned, bool);
    specifies how many macro maps to dump.  */
 void line_table_dump (FILE *, struct line_maps *, unsigned int, unsigned int);
 
+/* The rich_location class requires a way to expand source_location instances.
+   We would directly use expand_location_to_spelling_point, which is
+   implemented in gcc/input.c, but we also need to use it for rich_location
+   within genmatch.c.
+   Hence we require client code of libcpp to implement the following
+   symbol.  */
+extern expanded_location
+linemap_client_expand_location_to_spelling_point (source_location);
+
 #endif /* !LIBCPP_LINE_MAP_H  */
diff --git a/libcpp/line-map.c b/libcpp/line-map.c
index d58cad2..79d8eee 100644
--- a/libcpp/line-map.c
+++ b/libcpp/line-map.c
@@ -1746,3 +1746,211 @@ line_table_dump (FILE *stream, struct line_maps *set, unsigned int num_ordinary,
       fprintf (stream, "\n");
     }
 }
+
+/* class rich_location.  */
+
+/* Construct a rich_location with location LOC as its initial range.  */
+
+rich_location::rich_location (source_location loc) :
+  m_loc (loc),
+  m_num_ranges (0),
+  m_have_expanded_location (false)
+{
+  /* Set up the 0th range: */
+  add_range (loc, loc, true);
+}
+
+/* Construct a rich_location with source_range SRC_RANGE as its
+   initial range.  */
+
+rich_location::rich_location (source_range src_range)
+: m_loc (src_range.m_start),
+  m_num_ranges (0),
+  m_have_expanded_location (false)
+{
+  /* Set up the 0th range: */
+  add_range (src_range, true);
+}
+
+/* The destructor for class rich_location.  */
+
+rich_location::~rich_location ()
+{
+  for (unsigned int i = 0; i < m_num_ranges; i++)
+    free (m_ranges[i].m_caption);
+}
+
+/* Get the first line of the rich_location, either that of
+   the primary location, or of one of the ranges.  */
+
+int
+rich_location::get_first_line ()
+{
+  lazily_expand_location ();
+  int result = m_expanded_location.line;
+  for (range_iter iter = iter_ranges (); !iter.at_end (); iter.next())
+    {
+      location_range *range = *iter;
+      if (result > range->m_start.line)
+	result = range->m_start.line;
+    }
+  return result;
+}
+
+/* Get the last line of the rich_location, either that of
+   the primary location, or of one of the ranges.  */
+
+int
+rich_location::get_last_line ()
+{
+  lazily_expand_location ();
+  int result = m_expanded_location.line;
+  for (range_iter iter = iter_ranges (); !iter.at_end (); iter.next())
+    {
+      location_range *range = *iter;
+      if (result < range->m_finish.line)
+	result = range->m_finish.line;
+    }
+  return result;
+}
+
+/* Get an expanded_location for this rich_location's primary
+   location.  */
+
+expanded_location
+rich_location::lazily_expand_location ()
+{
+  if (!m_have_expanded_location)
+    {
+      m_expanded_location
+	= linemap_client_expand_location_to_spelling_point (m_loc);
+      m_have_expanded_location = true;
+    }
+
+  return m_expanded_location;
+}
+
+/* Set the column of the primary location.  */
+
+void
+rich_location::override_column (int column)
+{
+  lazily_expand_location ();
+  m_expanded_location.column = column;
+}
+
+/* Add the given range, with no caption.  */
+
+void
+rich_location::add_range (source_location start, source_location finish,
+			  bool show_caret_p)
+{
+  linemap_assert (m_num_ranges < MAX_RANGES);
+
+  add_range (start, finish, NULL, BUFFER_OWNERSHIP_GIVEN, show_caret_p);
+}
+
+/* Add the given range, with a caption, taking a copy if OWNERSHIP
+   is BUFFER_OWNERSHIP_BORROWED, or assuming ownership if OWNERSHIP
+   is BUFFER_OWNERSHIP_GIVEN.  */
+
+void
+rich_location::add_range (source_location start, source_location finish,
+			  char *caption,
+			  enum buffer_ownership ownership,
+			  bool show_caret_p)
+{
+  linemap_assert (m_num_ranges < MAX_RANGES);
+
+  location_range *range = &m_ranges[m_num_ranges++];
+  range->m_start = linemap_client_expand_location_to_spelling_point (start);
+  range->m_finish = linemap_client_expand_location_to_spelling_point (finish);
+  range->m_show_caret_p = show_caret_p;
+  if (ownership == BUFFER_OWNERSHIP_BORROWED)
+    range->m_caption = caption ? xstrdup (caption) : NULL;
+  else
+    range->m_caption = caption;
+}
+
+/* Add the given range, with no caption.  */
+
+void
+rich_location::add_range (source_range src_range, bool show_caret_p)
+{
+  linemap_assert (m_num_ranges < MAX_RANGES);
+
+  add_range (src_range.m_start, src_range.m_finish,
+	     NULL, BUFFER_OWNERSHIP_GIVEN,
+	     show_caret_p);
+}
+
+/* Add the given range, with a caption, taking a copy if OWNERSHIP
+   is BUFFER_OWNERSHIP_BORROWED, or assuming ownership if OWNERSHIP
+   is BUFFER_OWNERSHIP_GIVEN.  */
+
+void
+rich_location::add_range (source_range src_range,
+			  char *caption,
+			  enum buffer_ownership ownership)
+{
+  linemap_assert (m_num_ranges < MAX_RANGES);
+
+  add_range (src_range.m_start, src_range.m_finish, caption, ownership,
+	     false);
+}
+
+void
+rich_location::add_range (location_range *src_range)
+{
+  linemap_assert (m_num_ranges < MAX_RANGES);
+
+  /* The caption, if non-NULL, must already be a copy.  */
+  m_ranges[m_num_ranges++] = *src_range;
+}
+
+/* Add or overwrite the range given by IDX.  It must either
+   overwrite an existing range, or add one *exactly* on the end of
+   the array.
+
+   This is primarily for use by gcc when implementing diagnostic
+   format decoders e.g. the "+" in the C/C++ frontends, for handling
+   format codes like "%q+D" (which writes the source location of a
+   tree back into range 0 of the rich_location).
+
+   If SHOW_CARET_P is true, then the range should be rendered with
+   a caret at its starting location.  This is for use by the
+   Fortran frontend, for implementing the "%C" and "%L" format
+   codes.  */
+
+void
+rich_location::set_range (unsigned int idx, source_range src_range,
+			  bool show_caret_p)
+{
+  linemap_assert (idx < MAX_RANGES);
+
+  /* We can either overwrite an existing range, or add one exactly
+     on the end of the array.  */
+  linemap_assert (idx <= m_num_ranges);
+
+  location_range *locrange = &m_ranges[idx];
+  locrange->m_start
+    = linemap_client_expand_location_to_spelling_point (src_range.m_start);
+  locrange->m_finish
+    = linemap_client_expand_location_to_spelling_point (src_range.m_finish);
+
+  locrange->m_show_caret_p = show_caret_p;
+
+  /* Are we adding a range onto the end?  */
+  if (idx == m_num_ranges)
+    {
+      locrange->m_caption = NULL;
+      m_num_ranges = idx + 1;
+    }
+
+  if (idx == 0)
+    {
+      m_loc = src_range.m_start;
+      /* Mark any cached value here as dirty.  */
+      m_have_expanded_location = false;
+    }
+}
-- 
1.8.5.3

* [PATCH 12/22] Add source-ranges for trees
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (13 preceding siblings ...)
  2015-09-10 20:30 ` [PATCH 15/22] Add plugin to recursively dump the source-ranges in a tree David Malcolm
@ 2015-09-10 20:30 ` David Malcolm
  2015-09-10 20:30 ` [PATCH 14/22] C: capture tree ranges for various expressions David Malcolm
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:30 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

This patch adds a way to associate source range information with tree
expressions and decls, for later use by diagnostics.

It's a poor implementation which is unacceptable on multiple grounds:
for starters, it adds a source_range (8 bytes) to struct tree_exp and
to struct tree_decl_minimal.  (It also doesn't bootstrap).

Adding the source_range fields above covers most expressions, but
doesn't help with references to decls and to constants.  Consider:

  int test (int foo)
  {
    return foo * 100;
           ^^^   ^^^
  }

I want gcc to retain information so that diagnostics can underline
the "foo" and "100" above, but all we have is a PARM_DECL and an
INTEGER_CST.  The former's location is that of its declaration at the
top of the function, and the latter has no location.

Hence the patch adds a new SOURCE_RANGE tree code.  This
is a kind of unary operator or wrapper, which wraps things that don't
have a range field themselves (further bloating the IR).

These wrapper nodes get thrown away during gimplification.

This works for simple cases, but isn't yet complete: there are plenty
of places where the frontends will fail if they see a SOURCE_RANGE.
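
To make the wrapper idea concrete, here's a toy sketch of it outside of
gcc.  Everything prefixed "toy_" is a made-up stand-in for illustration
(it is not the code in this patch, and the real tree types are far
richer); it only shows the shape of the logic: nodes that can't carry a
range get wrapped, nodes that can are updated in place, and the
wrappers can be stripped again later.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Toy stand-ins for GCC's types; none of these are the real definitions.
typedef unsigned int toy_location;   // 0 plays the role of UNKNOWN_LOCATION

struct toy_range
{
  toy_location start;
  toy_location finish;
};

enum toy_code { TOY_DECL, TOY_CST, TOY_MULT, TOY_SOURCE_RANGE };

struct toy_node
{
  toy_code code;
  toy_range range;                // only meaningful for expression nodes
  std::vector<toy_node *> ops;    // operands
};

// Owns every node; std::deque keeps addresses stable as it grows.
struct toy_arena
{
  std::deque<toy_node> nodes;

  toy_node *make (toy_code code,
                  std::vector<toy_node *> ops = std::vector<toy_node *> ())
  {
    toy_node n = { code, { 0, 0 }, ops };
    nodes.push_back (n);
    return &nodes.back ();
  }
};

// In this toy model only these codes have room for a range.
static bool
toy_expr_p (const toy_node *n)
{
  return n->code == TOY_MULT || n->code == TOY_SOURCE_RANGE;
}

// Record START..FINISH on *EXPR, wrapping decls and constants in a
// TOY_SOURCE_RANGE node since they have nowhere to store it.
static void
toy_set_source_range (toy_arena &arena, toy_node **expr,
                      toy_location start, toy_location finish)
{
  assert (expr && *expr);
  if (!toy_expr_p (*expr))
    *expr = arena.make (TOY_SOURCE_RANGE, { *expr });
  (*expr)->range.start = start;
  (*expr)->range.finish = finish;
}

// Discard any wrappers, as the gimplifier does in this patch.
static toy_node *
toy_strip_source_ranges (toy_node *expr)
{
  assert (expr);
  while (expr->code == TOY_SOURCE_RANGE)
    expr = expr->ops[0];
  return expr;
}
```

Note that each use of a decl gets its own wrapper, so the decl node
itself stays shared between uses; that's the point of wrapping rather
than storing the range on the decl.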

So, as I said, it's a poor implementation, but the followup patches
needed some way to record source range information for trees.

Some alternate ideas for how this could be implemented:

(a) somehow compress the location and range into the 4 bytes taken up
    by a source_location: are we really using all 32 bits?  I suspect
    that in real-world code the location, range start and range finish
    are close enough together that we can pack them into 32 bits for
    most cases, with some kind of lookaside for those that don't fit.

(b) introduce some kind of "DECL_USAGE" or "DECL_REF" tree node, an
    expression that references a decl, segregating decls from expression
    trees, putting the location information into the DECL_USAGE node.

(c) only bother tracking the information if -fdiagnostics-show-caret
    is enabled (note, though, that it's on by default; if we go down
    this path, maybe it's another thing for torture testing?).

(d) only track information temporarily (e.g. in c_expr, rather than in
    tree), discarding it as the tree is built, or perhaps special-casing
    some places where it's particularly worth preserving e.g. the ranges
    of the arguments at a callsite, so the user can easily identify
    whatever "argument 3" means.

etc.  Ideas?
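
As a rough illustration of idea (a), here's one way the packing could
look.  This is purely a sketch under assumptions I've made up for the
example: locations are treated as plain increasing integers (real
source_location values are encoded via the line maps, so this glosses
over a lot), the bit widths are arbitrary, and all of the "toy_" names
are hypothetical.  The top bit selects between an inline encoding
(start, plus small offsets to the caret and to the finish) and an index
into a lookaside table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout: 1 flag bit, 21 bits of start, and two 5-bit
// offsets from start (to the caret and to the finish).
const uint32_t TOY_LOOKASIDE_FLAG = 1u << 31;
const unsigned TOY_START_BITS = 21;
const unsigned TOY_DELTA_BITS = 5;

struct toy_loc_range
{
  uint32_t caret;
  uint32_t start;
  uint32_t finish;
};

class toy_range_packer
{
public:
  // Encode R in 32 bits, falling back to the lookaside table when the
  // three values are too large or too far apart to fit inline.
  uint32_t pack (const toy_loc_range &r)
  {
    bool ordered = r.start <= r.caret && r.caret <= r.finish;
    if (ordered
        && r.start < (1u << TOY_START_BITS)
        && r.caret - r.start < (1u << TOY_DELTA_BITS)
        && r.finish - r.start < (1u << TOY_DELTA_BITS))
      return ((r.start << (2 * TOY_DELTA_BITS))
              | ((r.caret - r.start) << TOY_DELTA_BITS)
              | (r.finish - r.start));

    assert (m_lookaside.size () < TOY_LOOKASIDE_FLAG);
    m_lookaside.push_back (r);
    return TOY_LOOKASIDE_FLAG | (uint32_t) (m_lookaside.size () - 1);
  }

  // Recover the three values from a packed word.
  toy_loc_range unpack (uint32_t packed) const
  {
    if (packed & TOY_LOOKASIDE_FLAG)
      return m_lookaside[packed & ~TOY_LOOKASIDE_FLAG];

    uint32_t mask = (1u << TOY_DELTA_BITS) - 1;
    toy_loc_range r;
    r.start = packed >> (2 * TOY_DELTA_BITS);
    r.caret = r.start + ((packed >> TOY_DELTA_BITS) & mask);
    r.finish = r.start + (packed & mask);
    return r;
  }

  size_t lookaside_size () const { return m_lookaside.size (); }

private:
  std::vector<toy_loc_range> m_lookaside;
};
```

Whether real-world ranges are compact enough for the inline case to
dominate is exactly the open question; the sketch just shows that the
fallback path keeps the scheme lossless.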

gcc/c-family/ChangeLog:
	* c-common.c (c_fully_fold_internal): Capture existing source_range,
	and store it on the result.
	* c-pretty-print.c (c_pretty_printer::expression): Handle
	SOURCE_RANGE.

gcc/c/ChangeLog:
	* c-typeck.c (array_to_pointer_conversion): Handle SOURCE_RANGE,
	and preserve any source range information.
	(build_function_call_vec): Handle SOURCE_RANGE.
	(lvalue_p): Likewise.
	(c_finish_return): Likewise.

gcc/ChangeLog:
	* gimplify.c (gimplify_expr): Throw away SOURCE_RANGEs.
	* print-tree.c (print_node): Print any source range information.
	* tree-core.h (struct tree_exp): Add a "range" field.
	(struct tree_decl_minimal): Likewise.
	* tree.c (build1_stat): Initialize EXPR_LOCATION_RANGE (t).
	(build_decl_stat): Add overload taking a source_range.
	(set_source_range): New functions.
	* tree.def (SOURCE_RANGE): New tree code.
	* tree.h (CAN_HAVE_RANGE_P): New.
	(EXPR_LOCATION_RANGE): New.
	(EXPR_RANGE_OR_LOC): New.
	(EXPR_HAS_RANGE): New.
	(DECL_LOCATION_RANGE): New.
	(build_decl_stat): New overload.
	(set_source_range): New decls.
---
 gcc/c-family/c-common.c       | 10 +++++++++-
 gcc/c-family/c-pretty-print.c |  4 ++++
 gcc/c/c-typeck.c              | 23 ++++++++++++++++++++++-
 gcc/gimplify.c                |  4 ++++
 gcc/print-tree.c              | 21 +++++++++++++++++++++
 gcc/tree-core.h               |  2 ++
 gcc/tree.c                    | 42 ++++++++++++++++++++++++++++++++++++++----
 gcc/tree.def                  |  2 ++
 gcc/tree.h                    | 21 +++++++++++++++++++++
 9 files changed, 123 insertions(+), 6 deletions(-)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index c02ea39..ff6f90f 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -1178,6 +1178,7 @@ c_fully_fold_internal (tree expr, bool in_init, bool *maybe_const_operands,
   bool op0_const_self = true, op1_const_self = true, op2_const_self = true;
   bool nowarning = TREE_NO_WARNING (expr);
   bool unused_p;
+  source_range old_range;
 
   /* This function is not relevant to C++ because C++ folds while
      parsing, and may need changes to be correct for C++ when C++
@@ -1193,6 +1194,9 @@ c_fully_fold_internal (tree expr, bool in_init, bool *maybe_const_operands,
       || code == SAVE_EXPR)
     return expr;
 
+  if (IS_EXPR_CODE_CLASS (kind))
+    old_range = EXPR_LOCATION_RANGE (expr);
+
   /* Operands of variable-length expressions (function calls) have
      already been folded, as have __builtin_* function calls, and such
      expressions cannot occur in constant expressions.  */
@@ -1617,7 +1621,11 @@ c_fully_fold_internal (tree expr, bool in_init, bool *maybe_const_operands,
       TREE_NO_WARNING (ret) = 1;
     }
   if (ret != expr)
-    protected_set_expr_location (ret, loc);
+    {
+      protected_set_expr_location (ret, loc);
+      if (IS_EXPR_CODE_CLASS (kind))
+	set_source_range (&ret, old_range.m_start, old_range.m_finish);
+    }
   return ret;
 }
 
diff --git a/gcc/c-family/c-pretty-print.c b/gcc/c-family/c-pretty-print.c
index e2809cf..c70cfe0 100644
--- a/gcc/c-family/c-pretty-print.c
+++ b/gcc/c-family/c-pretty-print.c
@@ -2319,6 +2319,10 @@ c_pretty_printer::expression (tree e)
       expression (C_MAYBE_CONST_EXPR_EXPR (e));
       break;
 
+    case SOURCE_RANGE:
+      expression (TREE_OPERAND (e, 0));
+      break;
+
     default:
       pp_unsupported_tree (this, e);
       break;
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index a755a7e..6c60dc8 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -1815,6 +1815,7 @@ array_to_pointer_conversion (location_t loc, tree exp)
   tree adr;
   tree restype = TREE_TYPE (type);
   tree ptrtype;
+  tree source_range = NULL;
 
   gcc_assert (TREE_CODE (type) == ARRAY_TYPE);
 
@@ -1828,6 +1829,12 @@ array_to_pointer_conversion (location_t loc, tree exp)
   if (INDIRECT_REF_P (exp))
     return convert (ptrtype, TREE_OPERAND (exp, 0));
 
+  if (TREE_CODE (exp) == SOURCE_RANGE)
+    {
+      source_range = exp;
+      exp = TREE_OPERAND (exp, 0);
+    }
+
   /* In C++ array compound literals are temporary objects unless they are
      const or appear in namespace scope, so they are destroyed too soon
      to use them for much of anything  (c++/53220).  */
@@ -1841,7 +1848,10 @@ array_to_pointer_conversion (location_t loc, tree exp)
     }
 
   adr = build_unary_op (loc, ADDR_EXPR, exp, 1);
-  return convert (ptrtype, adr);
+  tree result = convert (ptrtype, adr);
+  if (source_range)
+    set_source_range (&result, EXPR_LOCATION_RANGE (source_range));
+  return result;
 }
 
 /* Convert the function expression EXP to a pointer.  */
@@ -2867,6 +2877,9 @@ build_function_call_vec (location_t loc, vec<location_t> arg_loc,
   /* Strip NON_LVALUE_EXPRs, etc., since we aren't using as an lvalue.  */
   STRIP_TYPE_NOPS (function);
 
+  if (TREE_CODE (function) == SOURCE_RANGE)
+    function = TREE_OPERAND (function, 0);
+
   /* Convert anything with function type to a pointer-to-function.  */
   if (TREE_CODE (function) == FUNCTION_DECL)
     {
@@ -4306,6 +4319,9 @@ lvalue_p (const_tree ref)
     case BIND_EXPR:
       return TREE_CODE (TREE_TYPE (ref)) == ARRAY_TYPE;
 
+    case SOURCE_RANGE:
+      return lvalue_p (TREE_OPERAND (ref, 0));
+
     default:
       return 0;
     }
@@ -9466,6 +9482,7 @@ c_finish_return (location_t loc, tree retval, tree origtype)
 	    case NON_LVALUE_EXPR:
 	    case PLUS_EXPR:
 	    case POINTER_PLUS_EXPR:
+	    case SOURCE_RANGE:
 	      inner = TREE_OPERAND (inner, 0);
 	      continue;
 
@@ -9475,6 +9492,8 @@ c_finish_return (location_t loc, tree retval, tree origtype)
 		 don't give a warning.  */
 	      {
 		tree op1 = TREE_OPERAND (inner, 1);
+		if (TREE_CODE (op1) == SOURCE_RANGE)
+		  op1 = TREE_OPERAND (op1, 0);
 
 		while (!POINTER_TYPE_P (TREE_TYPE (op1))
 		       && (CONVERT_EXPR_P (op1)
@@ -9490,6 +9509,8 @@ c_finish_return (location_t loc, tree retval, tree origtype)
 
 	    case ADDR_EXPR:
 	      inner = TREE_OPERAND (inner, 0);
+	      if (TREE_CODE (inner) == SOURCE_RANGE)
+		inner = TREE_OPERAND (inner, 0);
 
 	      while (REFERENCE_CLASS_P (inner)
 		     && !INDIRECT_REF_P (inner))
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index b7a918b..47508b3 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7962,6 +7962,10 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	 at the toplevel.  */
       STRIP_USELESS_TYPE_CONVERSION (*expr_p);
 
+      /* For now, strip away source ranges here.  */
+      while (TREE_CODE (*expr_p) == SOURCE_RANGE)
+	*expr_p = TREE_OPERAND (*expr_p, 0);
+
       /* Remember the expr.  */
       save_expr = *expr_p;
 
diff --git a/gcc/print-tree.c b/gcc/print-tree.c
index ea50056..8b3794a 100644
--- a/gcc/print-tree.c
+++ b/gcc/print-tree.c
@@ -936,6 +936,27 @@ print_node (FILE *file, const char *prefix, tree node, int indent)
       expanded_location xloc = expand_location (EXPR_LOCATION (node));
       indent_to (file, indent+4);
       fprintf (file, "%s:%d:%d", xloc.file, xloc.line, xloc.column);
+
+      /* Print the range, if any */
+      source_range r = EXPR_LOCATION_RANGE (node);
+      if (r.m_start)
+	{
+	  xloc = expand_location (r.m_start);
+	  fprintf (file, " start: %s:%d:%d", xloc.file, xloc.line, xloc.column);
+	}
+      else
+	{
+	  fprintf (file, " start: unknown");
+	}
+      if (r.m_finish)
+	{
+	  xloc = expand_location (r.m_finish);
+	  fprintf (file, " finish: %s:%d:%d", xloc.file, xloc.line, xloc.column);
+	}
+      else
+	{
+	  fprintf (file, " finish: unknown");
+	}
     }
 
   fprintf (file, ">");
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 64d1fe4..6931ad9 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1235,6 +1235,7 @@ enum omp_clause_proc_bind_kind
 struct GTY(()) tree_exp {
   struct tree_typed typed;
   location_t locus;
+  source_range range;
   tree GTY ((special ("tree_exp"),
 	     desc ("TREE_CODE ((tree) &%0)")))
     operands[1];
@@ -1404,6 +1405,7 @@ struct GTY (()) tree_binfo {
 struct GTY(()) tree_decl_minimal {
   struct tree_common common;
   location_t locus;
+  source_range range;
   unsigned int uid;
   tree name;
   tree context;
diff --git a/gcc/tree.c b/gcc/tree.c
index ed64fe7..d1595c2 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -4320,6 +4320,8 @@ build1_stat (enum tree_code code, tree type, tree node MEM_STAT_DECL)
 
   TREE_TYPE (t) = type;
   SET_EXPR_LOCATION (t, UNKNOWN_LOCATION);
+  EXPR_LOCATION_RANGE (t).m_start = UNKNOWN_LOCATION;
+  EXPR_LOCATION_RANGE (t).m_finish = UNKNOWN_LOCATION;
   TREE_OPERAND (t, 0) = node;
   if (node && !TYPE_P (node))
     {
@@ -4641,19 +4643,20 @@ build_nt_call_vec (tree fn, vec<tree, va_gc> *args)
 /* Create a DECL_... node of code CODE, name NAME and data type TYPE.
    We do NOT enter this node in any sort of symbol table.
 
-   LOC is the location of the decl.
+   RANGE is the source location of the decl.
 
    layout_decl is used to set up the decl's storage layout.
    Other slots are initialized to 0 or null pointers.  */
 
 tree
-build_decl_stat (location_t loc, enum tree_code code, tree name,
-    		 tree type MEM_STAT_DECL)
+build_decl_stat (source_range range, enum tree_code code, tree name,
+		 tree type MEM_STAT_DECL)
 {
   tree t;
 
   t = make_node_stat (code PASS_MEM_STAT);
-  DECL_SOURCE_LOCATION (t) = loc;
+  DECL_SOURCE_LOCATION (t) = range.m_start;
+  DECL_LOCATION_RANGE (t) = range;
 
 /*  if (type == error_mark_node)
     type = integer_type_node; */
@@ -4669,6 +4672,16 @@ build_decl_stat (location_t loc, enum tree_code code, tree name,
   return t;
 }
 
+/* As "build_decl_stat" above, but for location LOC. */
+
+tree
+build_decl_stat (location_t loc, enum tree_code code, tree name,
+		 tree type MEM_STAT_DECL)
+{
+  return build_decl_stat (source_range::from_location (loc),
+			  code, name, type PASS_MEM_STAT);
+}
+
 /* Builds and returns function declaration with NAME and TYPE.  */
 
 tree
@@ -13646,5 +13659,26 @@ nonnull_arg_p (const_tree arg)
   return false;
 }
 
+void
+set_source_range (tree *expr, location_t start, location_t finish)
+{
+  /* Add wrapper nodes for e.g. mentions of a parm_decl
+     in an expression, constants, etc.  */
+  if (!EXPR_P (*expr))
+    {
+      tree wrapper = build1 (SOURCE_RANGE, TREE_TYPE (*expr), *expr);
+      SET_EXPR_LOCATION (wrapper, start);
+      *expr = wrapper;
+    }
+
+  EXPR_LOCATION_RANGE (*expr).m_start = start;
+  EXPR_LOCATION_RANGE (*expr).m_finish = finish;
+}
+
+void
+set_source_range (tree *expr, source_range src_range)
+{
+  set_source_range (expr, src_range.m_start, src_range.m_finish);
+}
 
 #include "gt-tree.h"
diff --git a/gcc/tree.def b/gcc/tree.def
index 56580af..6ad84d7 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1380,6 +1380,8 @@ DEFTREECODE (CILK_SPAWN_STMT, "cilk_spawn_stmt", tcc_statement, 1)
 /* Cilk Sync statement: Does not have any operands.  */
 DEFTREECODE (CILK_SYNC_STMT, "cilk_sync_stmt", tcc_statement, 0)
 
+DEFTREECODE (SOURCE_RANGE, "source_range", tcc_expression, 1)
+
 /*
 Local variables:
 mode:c
diff --git a/gcc/tree.h b/gcc/tree.h
index e500151..66419d4 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -1066,6 +1066,16 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
 #define EXPR_FILENAME(NODE) LOCATION_FILE (EXPR_CHECK ((NODE))->exp.locus)
 #define EXPR_LINENO(NODE) LOCATION_LINE (EXPR_CHECK (NODE)->exp.locus)
 
+#define CAN_HAVE_RANGE_P(NODE) (CAN_HAVE_LOCATION_P (NODE))
+#define EXPR_LOCATION_RANGE(NODE) (EXPR_CHECK ((NODE))->exp.range)
+#define EXPR_RANGE_OR_LOC(NODE, LOCUS) (CAN_HAVE_RANGE_P (NODE) \
+					? (NODE)->exp.range \
+					: source_range::from_location (LOCUS))
+#define EXPR_HAS_RANGE(NODE) \
+    (CAN_HAVE_RANGE_P (NODE) \
+     ? EXPR_LOCATION_RANGE (NODE).m_start != UNKNOWN_LOCATION \
+     : false)
+
 /* True if a tree is an expression or statement that can have a
    location.  */
 #define CAN_HAVE_LOCATION_P(NODE) ((NODE) && EXPR_P (NODE))
@@ -2092,6 +2102,9 @@ extern machine_mode element_mode (const_tree t);
 #define DECL_IS_BUILTIN(DECL) \
   (LOCATION_LOCUS (DECL_SOURCE_LOCATION (DECL)) <= BUILTINS_LOCATION)
 
+#define DECL_LOCATION_RANGE(NODE) \
+  (DECL_MINIMAL_CHECK (NODE)->decl_minimal.range)
+
 /*  For FIELD_DECLs, this is the RECORD_TYPE, UNION_TYPE, or
     QUAL_UNION_TYPE node that the field is a member of.  For VAR_DECL,
     PARM_DECL, FUNCTION_DECL, LABEL_DECL, RESULT_DECL, and CONST_DECL
@@ -3784,6 +3797,8 @@ extern tree build_tree_list_vec_stat (const vec<tree, va_gc> *MEM_STAT_DECL);
 #define build_tree_list_vec(v) build_tree_list_vec_stat (v MEM_STAT_INFO)
 extern tree build_decl_stat (location_t, enum tree_code,
 			     tree, tree MEM_STAT_DECL);
+extern tree build_decl_stat (source_range, enum tree_code,
+			     tree, tree MEM_STAT_DECL);
 extern tree build_fn_decl (const char *, tree);
 #define build_decl(l,c,t,q) build_decl_stat (l, c, t, q MEM_STAT_INFO)
 extern tree build_translation_unit_decl (tree);
@@ -5133,6 +5148,12 @@ type_with_alias_set_p (const_tree t)
   return false;
 }
 
+extern void
+set_source_range (tree *expr, location_t start, location_t finish);
+
+extern void
+set_source_range (tree *expr, source_range src_range);
+
 extern void gt_ggc_mx (tree &);
 extern void gt_pch_nx (tree &);
 extern void gt_pch_nx (tree &, gt_pointer_operator, void *);
-- 
1.8.5.3

* [PATCH 15/22] Add plugin to recursively dump the source-ranges in a tree
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (12 preceding siblings ...)
  2015-09-10 20:29 ` [PATCH 05/22] Add overloads of inform, warning_at, etc that take a source_range David Malcolm
@ 2015-09-10 20:30 ` David Malcolm
  2015-09-11  3:19   ` Martin Sebor
  2015-09-10 20:30 ` [PATCH 12/22] Add source-ranges for trees David Malcolm
                   ` (8 subsequent siblings)
  22 siblings, 1 reply; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:30 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

This patch adds a test plugin that recurses down an expression tree,
printing diagnostics showing the ranges of each node in the tree.

Screenshot:
 https://dmalcolm.fedorapeople.org/gcc/2015-09-09/show-trees.html
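
The recursion described above has roughly the following shape.  This
is a self-contained toy, not the plugin itself: the "toy_" types are
invented for the example, and it collects strings rather than emitting
diagnostics.  As in the plugin, a node without a usable range is
reported and not descended into.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for an expanded location and a tree node.
struct toy_point
{
  int line;
  int column;
};

struct toy_expr
{
  std::string code_name;          // e.g. "mult_expr"
  bool has_range;
  toy_point start;
  toy_point finish;
  std::vector<const toy_expr *> operands;
};

// Append one message for NODE to OUT, then recurse into its operands.
static void
toy_show_tree (const toy_expr *node, std::vector<std::string> *out)
{
  assert (out);
  if (!node)
    return;

  if (!node->has_range)
    {
      out->push_back ("range not found");
      return;
    }

  out->push_back (node->code_name + " at range "
                  + std::to_string (node->start.line) + ":"
                  + std::to_string (node->start.column) + "-"
                  + std::to_string (node->finish.line) + ":"
                  + std::to_string (node->finish.column));

  for (size_t i = 0; i < node->operands.size (); i++)
    toy_show_tree (node->operands[i], out);
}
```

The message format mirrors the "%s at range %i:%i-%i:%i" string used by
the plugin; the line and column numbers fed to it in any test of this
sketch are made up rather than taken from a real compile.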

This needs a linker hack, since it's the only user of
  gcc_rich_location::add_expr
which thus doesn't appear in "cc1" until later patches in the kit
add uses of it; is there a clean way to fix that?

gcc/testsuite/ChangeLog:
	* gcc.dg/plugin/diagnostic-test-show-trees-1.c: New file.
	* gcc.dg/plugin/diagnostic_plugin_show_trees.c: New file.
	* gcc.dg/plugin/plugin.exp (plugin_test_list): Add
	diagnostic_plugin_show_trees.c and diagnostic-test-show-trees-1.c.
---
 .../gcc.dg/plugin/diagnostic-test-show-trees-1.c   | 106 ++++++++++++
 .../gcc.dg/plugin/diagnostic_plugin_show_trees.c   | 179 +++++++++++++++++++++
 gcc/testsuite/gcc.dg/plugin/plugin.exp             |   2 +
 3 files changed, 287 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-trees-1.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_show_trees.c

diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-trees-1.c b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-trees-1.c
new file mode 100644
index 0000000..3088db2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-show-trees-1.c
@@ -0,0 +1,106 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdiagnostics-show-caret" } */
+
+/* This is an example file for use with
+   diagnostic_plugin_show_trees.c.
+
+   The plugin handles "__show_tree" by recursively dumping
+   the internal structure of the second input argument.
+
+   We want to accept an expression of any type.  To do this in C, we
+   use variadic arguments, but C requires at least one argument before
+   the ellipsis, so we have a dummy one.  */
+
+extern void __show_tree (int dummy, ...);
+
+extern double sqrt (double x);
+
+void test_quadratic (double a, double b, double c)
+{
+  __show_tree (0,
+     (-b + sqrt (b * b - 4 * a * c))
+     / (2 * a));
+
+/* { dg-begin-multiline-output "" }
+      (-b + sqrt (b * b - 4 * a * c))
+      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+      / (2 * a));
+      ^~~~~~~~~
+   { dg-end-multiline-output "" } */
+
+/* { dg-begin-multiline-output "" }
+      (-b + sqrt (b * b - 4 * a * c))
+      ~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~
+   { dg-end-multiline-output "" } */
+
+/* { dg-begin-multiline-output "" }
+      (-b + sqrt (b * b - 4 * a * c))
+            ^~~~~~~~~~~~~~~~~~~~~~~~
+   { dg-end-multiline-output "" } */
+
+/* { dg-begin-multiline-output "" }
+      (-b + sqrt (b * b - 4 * a * c))
+                  ~~~~~~^~~~~~~~~~~
+   { dg-end-multiline-output "" } */
+
+/* { dg-begin-multiline-output "" }
+      (-b + sqrt (b * b - 4 * a * c))
+                  ~~^~~
+   { dg-end-multiline-output "" } */
+
+/* { dg-begin-multiline-output "" }
+      (-b + sqrt (b * b - 4 * a * c))
+                  ^
+   { dg-end-multiline-output "" } */
+
+/* { dg-begin-multiline-output "" }
+      (-b + sqrt (b * b - 4 * a * c))
+                      ^
+   { dg-end-multiline-output "" } */
+
+/* { dg-begin-multiline-output "" }
+      (-b + sqrt (b * b - 4 * a * c))
+                          ~~~~~~^~~
+   { dg-end-multiline-output "" } */
+
+/* { dg-begin-multiline-output "" }
+      (-b + sqrt (b * b - 4 * a * c))
+                          ~~^~~
+   { dg-end-multiline-output "" } */
+
+/* { dg-begin-multiline-output "" }
+      (-b + sqrt (b * b - 4 * a * c))
+                          ^
+   { dg-end-multiline-output "" } */
+
+/* { dg-begin-multiline-output "" }
+      (-b + sqrt (b * b - 4 * a * c))
+                              ^
+   { dg-end-multiline-output "" } */
+
+/* { dg-begin-multiline-output "" }
+      (-b + sqrt (b * b - 4 * a * c))
+                                  ^
+   { dg-end-multiline-output "" } */
+
+/* { dg-begin-multiline-output "" }
+      (-b + sqrt (b * b - 4 * a * c))
+        ^
+   { dg-end-multiline-output "" } */
+
+/* { dg-begin-multiline-output "" }
+      / (2 * a));
+        ~~~^~~~
+   { dg-end-multiline-output "" } */
+
+/* { dg-begin-multiline-output "" }
+      / (2 * a));
+         ^
+   { dg-end-multiline-output "" } */
+
+/* { dg-begin-multiline-output "" }
+      / (2 * a));
+             ^
+   { dg-end-multiline-output "" } */
+
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_show_trees.c b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_show_trees.c
new file mode 100644
index 0000000..957c5bb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_show_trees.c
@@ -0,0 +1,179 @@
+/* This plugin recursively dumps the source-code location ranges of
+   expressions, at the pre-gimplification tree stage.  */
+/* { dg-options "-O" } */
+
+#include "gcc-plugin.h"
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "stringpool.h"
+#include "toplev.h"
+#include "basic-block.h"
+#include "hash-table.h"
+#include "vec.h"
+#include "ggc.h"
+#include "basic-block.h"
+#include "tree-ssa-alias.h"
+#include "internal-fn.h"
+#include "gimple-fold.h"
+#include "tree-eh.h"
+#include "gimple-expr.h"
+#include "is-a.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "tree.h"
+#include "tree-pass.h"
+#include "intl.h"
+#include "plugin-version.h"
+#include "diagnostic.h"
+#include "context.h"
+#include "gcc-rich-location.h"
+#include "print-tree.h"
+
+/*
+  Hack: fails with linker error:
+./diagnostic_plugin_show_trees.so: undefined symbol: _ZN17gcc_rich_location8add_exprEP9tree_node
+  since nothing in the tree is using gcc_rich_location::add_expr yet.
+
+  I've tried various workarounds (adding DEBUG_FUNCTION to the
+  method, taking its address), but can't seem to fix it that way.
+  So as a nasty workaround, the following material is copied&pasted
+  from gcc-rich-location.c: */
+
+static bool
+get_range_for_expr (tree expr, location_range *r)
+{
+  if (EXPR_HAS_RANGE (expr))
+    {
+      source_range sr = EXPR_LOCATION_RANGE (expr);
+
+      /* Do we have meaningful data?  */
+      if (sr.m_start && sr.m_finish)
+	{
+	  r->m_start = expand_location (sr.m_start);
+	  r->m_finish = expand_location (sr.m_finish);
+	  return true;
+	}
+    }
+
+  return false;
+}
+
+/* Add a range to the rich_location, covering expression EXPR. */
+
+void
+gcc_rich_location::add_expr (tree expr)
+{
+  gcc_assert (expr);
+
+  location_range r;
+  r.m_caption = NULL;
+  r.m_show_caret_p = false;
+  if (get_range_for_expr (expr, &r))
+    add_range (&r);
+}
+
+/* FIXME: end of material taken from gcc-rich-location.c */
+
+int plugin_is_GPL_compatible;
+
+static void
+show_tree (tree node)
+{
+  if (!CAN_HAVE_RANGE_P (node))
+    return;
+
+  gcc_rich_location richloc (EXPR_LOCATION (node));
+  richloc.add_expr (node);
+
+  if (richloc.get_num_locations () < 2)
+    {
+      error_at_rich_loc (&richloc, "range not found");
+      return;
+    }
+
+  enum tree_code code = TREE_CODE (node);
+  if (code == SOURCE_RANGE)
+    code = TREE_CODE (TREE_OPERAND (node, 0));
+
+  location_range *range = richloc.get_range (1);
+  inform_at_rich_loc (&richloc,
+		      "%s at range %i:%i-%i:%i",
+		      get_tree_code_name (code),
+		      range->m_start.line,
+		      range->m_start.column,
+		      range->m_finish.line,
+		      range->m_finish.column);
+
+  /* Recurse.  */
+  int min_idx = 0;
+  int max_idx = TREE_OPERAND_LENGTH (node);
+  switch (TREE_CODE (node))
+    {
+    case CALL_EXPR:
+      min_idx = 3;
+      break;
+
+    default:
+      break;
+    }
+
+  for (int i = min_idx; i < max_idx; i++)
+    show_tree (TREE_OPERAND (node, i));
+}
+
+tree
+cb_walk_tree_fn (tree * tp, int * walk_subtrees,
+		 void * data ATTRIBUTE_UNUSED)
+{
+  if (TREE_CODE (*tp) != CALL_EXPR)
+    return NULL_TREE;
+
+  tree call_expr = *tp;
+  tree fn = CALL_EXPR_FN (call_expr);
+  if (TREE_CODE (fn) != ADDR_EXPR)
+    return NULL_TREE;
+  fn = TREE_OPERAND (fn, 0);
+  if (TREE_CODE (fn) == SOURCE_RANGE)
+    fn = TREE_OPERAND (fn, 0);
+  if (TREE_CODE (fn) != FUNCTION_DECL)
+    return NULL_TREE;
+  if (strcmp (IDENTIFIER_POINTER (DECL_NAME (fn)), "__show_tree"))
+    return NULL_TREE;
+
+  /* Get arg 1; print it! */
+  tree arg = CALL_EXPR_ARG (call_expr, 1);
+
+  show_tree (arg);
+
+  return NULL_TREE;
+}
+
+static void
+callback (void *gcc_data, void *user_data)
+{
+  tree fndecl = (tree)gcc_data;
+  walk_tree (&DECL_SAVED_TREE (fndecl), cb_walk_tree_fn, NULL, NULL);
+}
+
+int
+plugin_init (struct plugin_name_args *plugin_info,
+	     struct plugin_gcc_version *version)
+{
+  struct register_pass_info pass_info;
+  const char *plugin_name = plugin_info->base_name;
+  int argc = plugin_info->argc;
+  struct plugin_argument *argv = plugin_info->argv;
+
+  if (!plugin_default_version_check (version, &gcc_version))
+    return 1;
+
+  register_callback (plugin_name,
+		     PLUGIN_PRE_GENERICIZE,
+		     callback,
+		     NULL);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
index 2d2e47e..91f6391 100644
--- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
+++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
@@ -70,6 +70,8 @@ set plugin_test_list [list \
 	  diagnostic-test-show-locus-utf-8-color.c } \
     { diagnostic_plugin_test_tree_expression_range.c \
 	  diagnostic-test-expressions-1.c } \
+    { diagnostic_plugin_show_trees.c \
+	  diagnostic-test-show-trees-1.c } \
 ]
 
 foreach plugin_test $plugin_test_list {
-- 
1.8.5.3

* [PATCH 14/22] C: capture tree ranges for various expressions
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (14 preceding siblings ...)
  2015-09-10 20:30 ` [PATCH 12/22] Add source-ranges for trees David Malcolm
@ 2015-09-10 20:30 ` David Malcolm
  2015-09-10 20:31 ` [PATCH 19/22] gcc-rich-location.[ch]: add debug methods for cpp_string_location David Malcolm
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:30 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

This patch updates the C frontend to capture source_range information
for various kinds of tree expression.

It adds a unit test via a plugin, to verify the source ranges are
correct for many kinds of expression.

Screenshot:
 https://dmalcolm.fedorapeople.org/gcc/2015-09-09/tree-range-unit-test.html

(There are some FIXMEs in here that probably need addressing)

gcc/c/ChangeLog:
	* c-convert.c (convert): Wrap, retaining any SOURCE_RANGE
	information, moving bulk of implementation to...
	(real_convert): New function.
	* c-parser.c (c_parser_expr_no_commas): Call set_source_range on
	the ret.value based on the range from the start of the LHS to the
	end of the RHS.
	(c_parser_conditional_expression): Call set_source_range on
	the ret.value based on the range from the start of the cond.value
	to the end of exp2.value.
	(c_parser_binary_expression): Call set_source_range on
	the stack values for TRUTH_ANDIF_EXPR and TRUTH_ORIF_EXPR.
	(c_parser_cast_expression): Call set_source_range on
	the ret.value based on the cast_loc through to the end of
	expr.value.
	(c_parser_unary_expression): Likewise, based on the
	op_loc through to the end of op.value.
	(c_parser_sizeof_expression): Likewise, based on the start of the
	sizeof token through to either the closing paren or the end of
	expr.value.
	(c_parser_postfix_expression): Likewise, using the token range,
	or from the open paren through to the close paren for
	parenthesized expressions.
	(c_parser_postfix_expression_after_primary): Likewise, for
	various kinds of expression.
	* c-typeck.c (parser_build_unary_op): Likewise, for prefix
	unary ops.
	(parser_build_binary_op): Likewise, running from the start of
	arg1.value through to the end of arg2.value.

gcc/testsuite/ChangeLog:
	* gcc.dg/plugin/diagnostic-test-expressions-1.c: New file.
	* gcc.dg/plugin/diagnostic_plugin_test_tree_expression_range.c:
	New file.
	* gcc.dg/plugin/plugin.exp (plugin_test_list): Add
	diagnostic_plugin_test_tree_expression_range.c and
	diagnostic-test-expressions-1.c.
---
 gcc/c/c-convert.c                                  |  17 +-
 gcc/c/c-parser.c                                   |  74 ++-
 gcc/c/c-typeck.c                                   |  10 +
 .../gcc.dg/plugin/diagnostic-test-expressions-1.c  | 562 +++++++++++++++++++++
 .../diagnostic_plugin_test_tree_expression_range.c | 162 ++++++
 gcc/testsuite/gcc.dg/plugin/plugin.exp             |   2 +
 6 files changed, 821 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-1.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_tree_expression_range.c

diff --git a/gcc/c/c-convert.c b/gcc/c/c-convert.c
index f023de9..c419d49 100644
--- a/gcc/c/c-convert.c
+++ b/gcc/c/c-convert.c
@@ -55,7 +55,8 @@ along with GCC; see the file COPYING3.  If not see
      In tree.c: get_narrower and get_unwidened.  */
 \f
 /* Subroutines of `convert'.  */
-
+static tree
+real_convert (tree type, tree expr);
 
 \f
 /* Create an expression whose value is that of EXPR,
@@ -67,6 +68,20 @@ along with GCC; see the file COPYING3.  If not see
 tree
 convert (tree type, tree expr)
 {
+  tree result = real_convert (type, expr);
+  if (TREE_CODE (expr) == SOURCE_RANGE)
+    {
+      set_source_range (&result, EXPR_LOCATION_RANGE (expr));
+      SET_EXPR_LOCATION (result, EXPR_LOCATION (expr));
+    }
+  return result;
+}
+
+/* FIXME.  */
+
+static tree
+real_convert (tree type, tree expr)
+{
   tree e = expr;
   enum tree_code code = TREE_CODE (type);
   const char *invalid_conv_diag;
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 9bb5200..4303496 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -6052,6 +6052,9 @@ c_parser_expr_no_commas (c_parser *parser, struct c_expr *after,
   ret.value = build_modify_expr (op_location, lhs.value, lhs.original_type,
 				 code, exp_location, rhs.value,
 				 rhs.original_type);
+  set_source_range (&ret.value,
+		    EXPR_LOCATION_RANGE (lhs.value).m_start,
+		    EXPR_LOCATION_RANGE (rhs.value).m_finish);
   if (code == NOP_EXPR)
     ret.original_code = MODIFY_EXPR;
   else
@@ -6082,7 +6085,7 @@ c_parser_conditional_expression (c_parser *parser, struct c_expr *after,
 				 tree omp_atomic_lhs)
 {
   struct c_expr cond, exp1, exp2, ret;
-  location_t cond_loc, colon_loc, middle_loc;
+  location_t start, cond_loc, colon_loc, middle_loc;
 
   gcc_assert (!after || c_dialect_objc ());
 
@@ -6090,6 +6093,10 @@ c_parser_conditional_expression (c_parser *parser, struct c_expr *after,
 
   if (c_parser_next_token_is_not (parser, CPP_QUERY))
     return cond;
+  if (cond.value != error_mark_node)
+    start = EXPR_LOCATION_RANGE (cond.value).m_start;
+  else
+    start = UNKNOWN_LOCATION;
   cond_loc = c_parser_peek_token (parser)->location;
   cond = convert_lvalue_to_rvalue (cond_loc, cond, true, true);
   c_parser_consume_token (parser);
@@ -6165,6 +6172,9 @@ c_parser_conditional_expression (c_parser *parser, struct c_expr *after,
 			   ? t1
 			   : NULL);
     }
+  set_source_range (&ret.value,
+		    start,
+		    EXPR_LOCATION_RANGE (exp2.value).m_finish);
   return ret;
 }
 
@@ -6317,6 +6327,7 @@ c_parser_binary_expression (c_parser *parser, struct c_expr *after,
     {
       enum c_parser_prec oprec;
       enum tree_code ocode;
+      source_range src_range;
       if (parser->error)
 	goto out;
       switch (c_parser_peek_token (parser)->type)
@@ -6405,6 +6416,7 @@ c_parser_binary_expression (c_parser *parser, struct c_expr *after,
       switch (ocode)
 	{
 	case TRUTH_ANDIF_EXPR:
+	  src_range = EXPR_LOCATION_RANGE (stack[sp].expr.value);
 	  stack[sp].expr
 	    = convert_lvalue_to_rvalue (stack[sp].loc,
 					stack[sp].expr, true, true);
@@ -6412,8 +6424,10 @@ c_parser_binary_expression (c_parser *parser, struct c_expr *after,
 	    (stack[sp].loc, default_conversion (stack[sp].expr.value));
 	  c_inhibit_evaluation_warnings += (stack[sp].expr.value
 					    == truthvalue_false_node);
+	  set_source_range (&stack[sp].expr.value, src_range);
 	  break;
 	case TRUTH_ORIF_EXPR:
+	  src_range = EXPR_LOCATION_RANGE (stack[sp].expr.value);
 	  stack[sp].expr
 	    = convert_lvalue_to_rvalue (stack[sp].loc,
 					stack[sp].expr, true, true);
@@ -6421,6 +6435,7 @@ c_parser_binary_expression (c_parser *parser, struct c_expr *after,
 	    (stack[sp].loc, default_conversion (stack[sp].expr.value));
 	  c_inhibit_evaluation_warnings += (stack[sp].expr.value
 					    == truthvalue_true_node);
+	  set_source_range (&stack[sp].expr.value, src_range);
 	  break;
 	default:
 	  break;
@@ -6489,6 +6504,9 @@ c_parser_cast_expression (c_parser *parser, struct c_expr *after)
 	expr = convert_lvalue_to_rvalue (expr_loc, expr, true, true);
       }
       ret.value = c_cast_expr (cast_loc, type_name, expr.value);
+      if (ret.value && expr.value)
+	set_source_range (&ret.value, cast_loc,
+			  EXPR_LOCATION_RANGE (expr.value).m_finish);
       ret.original_code = ERROR_MARK;
       ret.original_type = NULL;
       return ret;
@@ -6538,6 +6556,7 @@ c_parser_unary_expression (c_parser *parser)
   struct c_expr ret, op;
   location_t op_loc = c_parser_peek_token (parser)->location;
   location_t exp_loc;
+  location_t finish;
   ret.original_code = ERROR_MARK;
   ret.original_type = NULL;
   switch (c_parser_peek_token (parser)->type)
@@ -6577,8 +6596,10 @@ c_parser_unary_expression (c_parser *parser)
       c_parser_consume_token (parser);
       exp_loc = c_parser_peek_token (parser)->location;
       op = c_parser_cast_expression (parser, NULL);
+      finish = EXPR_LOCATION_RANGE (op.value).m_finish;
       op = convert_lvalue_to_rvalue (exp_loc, op, true, true);
       ret.value = build_indirect_ref (op_loc, op.value, RO_UNARY_STAR);
+      set_source_range (&ret.value, op_loc, finish);
       return ret;
     case CPP_PLUS:
       if (!c_dialect_objc () && !in_system_header_at (input_location))
@@ -6666,8 +6687,15 @@ static struct c_expr
 c_parser_sizeof_expression (c_parser *parser)
 {
   struct c_expr expr;
+  struct c_expr result;
   location_t expr_loc;
   gcc_assert (c_parser_next_token_is_keyword (parser, RID_SIZEOF));
+
+  location_t start;
+  location_t finish = UNKNOWN_LOCATION;
+
+  start = c_parser_peek_token (parser)->location;
+
   c_parser_consume_token (parser);
   c_inhibit_evaluation_warnings++;
   in_sizeof++;
@@ -6681,6 +6709,7 @@ c_parser_sizeof_expression (c_parser *parser)
       expr_loc = c_parser_peek_token (parser)->location;
       type_name = c_parser_type_name (parser);
       c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, "expected %<)%>");
+      finish = parser->tokens_buf[0].range.m_finish; // FIXME: better access API to last token
       if (type_name == NULL)
 	{
 	  struct c_expr ret;
@@ -6696,17 +6725,19 @@ c_parser_sizeof_expression (c_parser *parser)
 	  expr = c_parser_postfix_expression_after_paren_type (parser,
 							       type_name,
 							       expr_loc);
+	  finish = EXPR_LOCATION_RANGE (expr.value).m_finish;
 	  goto sizeof_expr;
 	}
       /* sizeof ( type-name ).  */
       c_inhibit_evaluation_warnings--;
       in_sizeof--;
-      return c_expr_sizeof_type (expr_loc, type_name);
+      result = c_expr_sizeof_type (expr_loc, type_name);
     }
   else
     {
       expr_loc = c_parser_peek_token (parser)->location;
       expr = c_parser_unary_expression (parser);
+      finish = EXPR_LOCATION_RANGE (expr.value).m_finish;
     sizeof_expr:
       c_inhibit_evaluation_warnings--;
       in_sizeof--;
@@ -6714,8 +6745,11 @@ c_parser_sizeof_expression (c_parser *parser)
       if (TREE_CODE (expr.value) == COMPONENT_REF
 	  && DECL_C_BIT_FIELD (TREE_OPERAND (expr.value, 1)))
 	error_at (expr_loc, "%<sizeof%> applied to a bit-field");
-      return c_expr_sizeof_expr (expr_loc, expr);
+      result = c_expr_sizeof_expr (expr_loc, expr);
     }
+  if (finish != UNKNOWN_LOCATION)
+    set_source_range (&result.value, start, finish);
+  return result;
 }
 
 /* Parse an alignof expression.  */
@@ -7135,13 +7169,14 @@ c_parser_postfix_expression (c_parser *parser)
   struct c_expr expr, e1;
   struct c_type_name *t1, *t2;
   location_t loc = c_parser_peek_token (parser)->location;;
-  source_range src_range = c_parser_peek_token (parser)->range;
+  source_range tok_range = c_parser_peek_token (parser)->range;
   expr.original_code = ERROR_MARK;
   expr.original_type = NULL;
   switch (c_parser_peek_token (parser)->type)
     {
     case CPP_NUMBER:
       expr.value = c_parser_peek_token (parser)->value;
+      set_source_range (&expr.value, tok_range);
       loc = c_parser_peek_token (parser)->location;
       c_parser_consume_token (parser);
       if (TREE_CODE (expr.value) == FIXED_CST
@@ -7156,6 +7191,7 @@ c_parser_postfix_expression (c_parser *parser)
     case CPP_CHAR32:
     case CPP_WCHAR:
       expr.value = c_parser_peek_token (parser)->value;
+      set_source_range (&expr.value, tok_range);
       c_parser_consume_token (parser);
       break;
     case CPP_STRING:
@@ -7164,6 +7200,7 @@ c_parser_postfix_expression (c_parser *parser)
     case CPP_WSTRING:
     case CPP_UTF8STRING:
       expr.value = c_parser_peek_token (parser)->value;
+      set_source_range (&expr.value, tok_range);
       expr.original_code = STRING_CST;
       c_parser_consume_token (parser);
       break;
@@ -7171,6 +7208,7 @@ c_parser_postfix_expression (c_parser *parser)
       gcc_assert (c_dialect_objc ());
       expr.value
 	= objc_build_string_object (c_parser_peek_token (parser)->value);
+      set_source_range (&expr.value, tok_range);
       c_parser_consume_token (parser);
       break;
     case CPP_NAME:
@@ -7180,10 +7218,11 @@ c_parser_postfix_expression (c_parser *parser)
 	  {
 	    tree id = c_parser_peek_token (parser)->value;
 	    c_parser_consume_token (parser);
-	    expr.value = build_external_ref (src_range, id,
+	    expr.value = build_external_ref (tok_range, id,
 					     (c_parser_peek_token (parser)->type
 					      == CPP_OPEN_PAREN),
 					     &expr.original_type);
+	    set_source_range (&expr.value, tok_range);
 	    break;
 	  }
 	case C_ID_CLASSNAME:
@@ -7272,6 +7311,7 @@ c_parser_postfix_expression (c_parser *parser)
       else
 	{
 	  /* A parenthesized expression.  */
+	  location_t loc_open_paren = c_parser_peek_token (parser)->location;
 	  c_parser_consume_token (parser);
 	  expr = c_parser_expression (parser);
 	  if (TREE_CODE (expr.value) == MODIFY_EXPR)
@@ -7279,6 +7319,8 @@ c_parser_postfix_expression (c_parser *parser)
 	  if (expr.original_code != C_MAYBE_CONST_EXPR)
 	    expr.original_code = ERROR_MARK;
 	  /* Don't change EXPR.ORIGINAL_TYPE.  */
+	  location_t loc_close_paren = c_parser_peek_token (parser)->location;
+	  set_source_range (&expr.value, loc_open_paren, loc_close_paren);
 	  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
 				     "expected %<)%>");
 	}
@@ -7869,6 +7911,8 @@ c_parser_postfix_expression_after_primary (c_parser *parser,
   vec<tree, va_gc> *exprlist;
   vec<tree, va_gc> *origtypes = NULL;
   vec<location_t> arg_loc = vNULL;
+  location_t start;
+  location_t finish;
 
   while (true)
     {
@@ -7905,7 +7949,10 @@ c_parser_postfix_expression_after_primary (c_parser *parser,
 		{
 		  c_parser_skip_until_found (parser, CPP_CLOSE_SQUARE,
 					     "expected %<]%>");
+		  start = EXPR_LOCATION_RANGE (expr.value).m_start;
+		  finish = parser->tokens_buf[0].range.m_finish; // FIXME: better access API to last token
 		  expr.value = build_array_ref (op_loc, expr.value, idx);
+		  set_source_range (&expr.value, start, finish);
 		}
 	    }
 	  expr.original_code = ERROR_MARK;
@@ -7948,9 +7995,14 @@ c_parser_postfix_expression_after_primary (c_parser *parser,
 			"%<memset%> used with constant zero length parameter; "
 			"this could be due to transposed parameters");
 
+	  start = EXPR_LOCATION_RANGE (expr.value).m_start;
+	  finish = parser->tokens_buf[0].range.m_finish; // FIXME: better access API to last token
 	  expr.value
 	    = c_build_function_call_vec (expr_loc, arg_loc, expr.value,
 					 exprlist, origtypes);
+	  set_source_range (&expr.value,
+			    start, finish);
+
 	  expr.original_code = ERROR_MARK;
 	  if (TREE_CODE (expr.value) == INTEGER_CST
 	      && TREE_CODE (orig_expr.value) == FUNCTION_DECL
@@ -7979,8 +8031,11 @@ c_parser_postfix_expression_after_primary (c_parser *parser,
               expr.original_type = NULL;
 	      return expr;
 	    }
+	  start = EXPR_LOCATION_RANGE (expr.value).m_start;
+	  finish = c_parser_peek_token (parser)->range.m_finish;
 	  c_parser_consume_token (parser);
 	  expr.value = build_component_ref (op_loc, expr.value, ident);
+	  set_source_range (&expr.value, start, finish);
 	  expr.original_code = ERROR_MARK;
 	  if (TREE_CODE (expr.value) != COMPONENT_REF)
 	    expr.original_type = NULL;
@@ -8008,12 +8063,15 @@ c_parser_postfix_expression_after_primary (c_parser *parser,
 	      expr.original_type = NULL;
 	      return expr;
 	    }
+	  start = EXPR_LOCATION_RANGE (expr.value).m_start;
+	  finish = c_parser_peek_token (parser)->range.m_finish;
 	  c_parser_consume_token (parser);
 	  expr.value = build_component_ref (op_loc,
 					    build_indirect_ref (op_loc,
 								expr.value,
 								RO_ARROW),
 					    ident);
+	  set_source_range (&expr.value, start, finish);
 	  expr.original_code = ERROR_MARK;
 	  if (TREE_CODE (expr.value) != COMPONENT_REF)
 	    expr.original_type = NULL;
@@ -8029,6 +8087,8 @@ c_parser_postfix_expression_after_primary (c_parser *parser,
 	  break;
 	case CPP_PLUS_PLUS:
 	  /* Postincrement.  */
+	  start = EXPR_LOCATION_RANGE (expr.value).m_start;
+	  finish = c_parser_peek_token (parser)->range.m_finish;
 	  c_parser_consume_token (parser);
 	  /* If the expressions have array notations, we expand them.  */
 	  if (flag_cilkplus
@@ -8040,11 +8100,14 @@ c_parser_postfix_expression_after_primary (c_parser *parser,
 	      expr.value = build_unary_op (op_loc,
 					   POSTINCREMENT_EXPR, expr.value, 0);
 	    }
+	  set_source_range (&expr.value, start, finish);
 	  expr.original_code = ERROR_MARK;
 	  expr.original_type = NULL;
 	  break;
 	case CPP_MINUS_MINUS:
 	  /* Postdecrement.  */
+	  start = EXPR_LOCATION_RANGE (expr.value).m_start;
+	  finish = c_parser_peek_token (parser)->range.m_finish;
 	  c_parser_consume_token (parser);
 	  /* If the expressions have array notations, we expand them.  */
 	  if (flag_cilkplus
@@ -8056,6 +8119,7 @@ c_parser_postfix_expression_after_primary (c_parser *parser,
 	      expr.value = build_unary_op (op_loc,
 					   POSTDECREMENT_EXPR, expr.value, 0);
 	    }
+	  set_source_range (&expr.value, start, finish);
 	  expr.original_code = ERROR_MARK;
 	  expr.original_type = NULL;
 	  break;
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 6c60dc8..4123f11 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -3408,6 +3408,12 @@ parser_build_unary_op (location_t loc, enum tree_code code, struct c_expr arg)
     overflow_warning (loc, result.value);
     }
 
+  /* We are typically called when parsing a prefix token at LOC acting on
+     ARG.  Reflect this by updating the source range of the result to
+     start at LOC and end at the end of ARG.  */
+  set_source_range (&result.value,
+		    loc, EXPR_LOCATION_RANGE (arg.value).m_finish);
+
   return result;
 }
 
@@ -3445,6 +3451,10 @@ parser_build_binary_op (location_t location, enum tree_code code,
   if (location != UNKNOWN_LOCATION)
     protected_set_expr_location (result.value, location);
 
+  set_source_range (&result.value,
+		    EXPR_LOCATION_RANGE (arg1.value).m_start,
+		    EXPR_LOCATION_RANGE (arg2.value).m_finish);
+
   /* Check for cases such as x+y<<z which users are likely
      to misinterpret.  */
   if (warn_parentheses)
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-1.c b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-1.c
new file mode 100644
index 0000000..7863c34
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-expressions-1.c
@@ -0,0 +1,562 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdiagnostics-show-caret" } */
+
+/* This is a collection of unit tests to verify that we're correctly
+   capturing the source code ranges of various kinds of expression.
+
+   It uses the various "diagnostic_test_*_expression_range_plugin"
+   plugins, which handle "__emit_expression_range" by generating a warning
+   at the given source range of the input argument.  Each of the
+   different plugins does this at a different phase of the internal
+   representation (tree, gimple, etc), so we can verify that the
+   source code range information is valid at each phase.
+
+   We want to accept an expression of any type.  To do this in C, we
+   use variadic arguments, but C requires at least one argument before
+   the ellipsis, so we have a dummy one.  */
+
+extern void __emit_expression_range (int dummy, ...);
+
+int global;
+
+void test_global (void)
+{
+  __emit_expression_range (0, global); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, global);
+                               ^~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_param (int param)
+{
+  __emit_expression_range (0, param); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, param);
+                               ^~~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_local (void)
+{
+  int local = 5;
+
+  __emit_expression_range (0, local); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, local);
+                               ^~~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_integer_constants (void)
+{
+  __emit_expression_range (0, 1234); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, 1234);
+                               ^~~~
+   { dg-end-multiline-output "" } */
+
+  /* Ensure that zero works.  */
+
+  __emit_expression_range (0, 0); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, 0);
+                               ^
+   { dg-end-multiline-output "" } */
+}
+
+void test_character_constants (void)
+{
+  __emit_expression_range (0, 'a'); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, 'a');
+                               ^~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_floating_constants (void)
+{
+  __emit_expression_range (0, 98.6); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, 98.6);
+                               ^~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, .6); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, .6);
+                               ^~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, 98.); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, 98.);
+                               ^~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, 6.022140857e23 ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, 6.022140857e23 );
+                               ^~~~~~~~~~~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, 98.6f ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, 98.6f );
+                               ^~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, 6.022140857e23l ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, 6.022140857e23l );
+                               ^~~~~~~~~~~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+enum test_enum {
+  TEST_ENUM_VALUE
+};
+
+void test_enumeration_constant (void)
+{
+  __emit_expression_range (0, TEST_ENUM_VALUE ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, TEST_ENUM_VALUE );
+                               ^~~~~~~~~~~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_parentheses (int a, int b)
+{
+  __emit_expression_range (0, (a + b) ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, (a + b) );
+                               ~~~^~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, (a + b) * (a - b) ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, (a + b) * (a - b) );
+                               ~~~~~~~~^~~~~~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, !(a && b) ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, !(a && b) );
+                               ^~~~~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_string_literal (void)
+{
+  __emit_expression_range (0, "0123456789"); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, "0123456789");
+                               ^~~~~~~~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+/* Postfix expressions.  ************************************************/
+
+void test_array_reference (int *arr)
+{
+  __emit_expression_range (0, arr[100] ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, arr[100] );
+                               ~~~^~~~~
+   { dg-end-multiline-output "" } */
+}
+
+int test_function_call (int p, int q, int r)
+{
+  __emit_expression_range (0, test_function_call (p, q, r) ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, test_function_call (p, q, r) );
+                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
+   { dg-end-multiline-output "" } */
+  return 0;
+}
+
+struct test_struct
+{
+  int field;
+};
+
+int test_structure_references (struct test_struct *ptr)
+{
+  struct test_struct local;
+  local.field = 42;
+
+  __emit_expression_range (0, local.field ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, local.field );
+                               ~~~~~^~~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, ptr->field ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, ptr->field );
+                               ~~~^~~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+int test_postfix_incdec (int i)
+{
+  __emit_expression_range (0, i++ ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, i++ );
+                               ~^~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, i-- ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, i-- );
+                               ~^~
+   { dg-end-multiline-output "" } */
+}
+
+/* Unary operators.  ****************************************************/
+
+int test_prefix_incdec (int i)
+{
+  __emit_expression_range (0, ++i ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, ++i );
+                               ^~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, --i ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, --i );
+                               ^~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_address_operator (void)
+{
+  __emit_expression_range (0, &global ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, &global );
+                               ^~~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_indirection (int *ptr)
+{
+  __emit_expression_range (0, *ptr ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, *ptr );
+                               ^~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_unary_plus (int i)
+{
+  __emit_expression_range (0, +i ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, +i );
+                               ^~
+   { dg-end-multiline-output "" } */
+}
+
+void test_unary_minus (int i)
+{
+  __emit_expression_range (0, -i ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, -i );
+                               ^~
+   { dg-end-multiline-output "" } */
+}
+
+void test_ones_complement (int i)
+{
+  __emit_expression_range (0, ~i ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, ~i );
+                               ^~
+   { dg-end-multiline-output "" } */
+}
+
+void test_logical_negation (int flag)
+{
+  __emit_expression_range (0, !flag ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, !flag );
+                               ^~~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_sizeof (int i)
+{
+  __emit_expression_range (0, sizeof i ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, sizeof i );
+                               ^~~~~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, sizeof (char) ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, sizeof (char) );
+                               ^~~~~~~~~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+/* Casts.  ****************************************************/
+
+void test_cast (void *ptr)
+{
+  __emit_expression_range (0, (int *)ptr ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, (int *)ptr );
+                               ^~~~~~~~~~
+   { dg-end-multiline-output "" } */
+
+}
+
+/* Binary operators.  *******************************************/
+
+void test_multiplicative_operators (int lhs, int rhs)
+{
+  __emit_expression_range (0, lhs * rhs ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, lhs * rhs );
+                               ~~~~^~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, lhs / rhs ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, lhs / rhs );
+                               ~~~~^~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, lhs % rhs ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, lhs % rhs );
+                               ~~~~^~~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_additive_operators (int lhs, int rhs)
+{
+  __emit_expression_range (0, lhs + rhs ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, lhs + rhs );
+                               ~~~~^~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, lhs - rhs ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, lhs - rhs );
+                               ~~~~^~~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_shift_operators (int lhs, int rhs)
+{
+  __emit_expression_range (0, lhs << rhs ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, lhs << rhs );
+                               ~~~~^~~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, lhs >> rhs ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, lhs >> rhs );
+                               ~~~~^~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_relational_operators (int lhs, int rhs)
+{
+  __emit_expression_range (0, lhs < rhs ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, lhs < rhs );
+                               ~~~~^~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, lhs > rhs ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, lhs > rhs );
+                               ~~~~^~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, lhs <= rhs ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, lhs <= rhs );
+                               ~~~~^~~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, lhs >= rhs ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, lhs >= rhs );
+                               ~~~~^~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_equality_operators (int lhs, int rhs)
+{
+  __emit_expression_range (0, lhs == rhs ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, lhs == rhs );
+                               ~~~~^~~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, lhs != rhs ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, lhs != rhs );
+                               ~~~~^~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_bitwise_binary_operators (int lhs, int rhs)
+{
+  __emit_expression_range (0, lhs & rhs ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, lhs & rhs );
+                               ~~~~^~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, lhs ^ rhs ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, lhs ^ rhs );
+                               ~~~~^~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, lhs | rhs ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, lhs | rhs );
+                               ~~~~^~~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_logical_operators (int lhs, int rhs)
+{
+  __emit_expression_range (0, lhs && rhs ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, lhs && rhs );
+                               ~~~~^~~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, lhs || rhs ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, lhs || rhs );
+                               ~~~~^~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+/* Conditional operator.  *******************************************/
+
+void test_conditional_operators (int flag, int on_true, int on_false)
+{
+  __emit_expression_range (0, flag ? on_true : on_false ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, flag ? on_true : on_false );
+                               ~~~~~~~~~~~~~~~^~~~~~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+/* Assignment expressions.  *******************************************/
+
+void test_assignment_expressions (int dest, int other)
+{
+  __emit_expression_range (0, dest = other ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, dest = other );
+                               ~~~~~^~~~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, dest *= other ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, dest *= other );
+                               ~~~~~^~~~~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, dest /= other ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, dest /= other );
+                               ~~~~~^~~~~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, dest %= other ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, dest %= other );
+                               ~~~~~^~~~~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, dest += other ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, dest += other );
+                               ~~~~~^~~~~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, dest -= other ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, dest -= other );
+                               ~~~~~^~~~~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, dest <<= other ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, dest <<= other );
+                               ~~~~~^~~~~~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, dest >>= other ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, dest >>= other );
+                               ~~~~~^~~~~~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, dest &= other ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, dest &= other );
+                               ~~~~~^~~~~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, dest ^= other ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, dest ^= other );
+                               ~~~~~^~~~~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0, dest |= other ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, dest |= other );
+                               ~~~~~^~~~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+/* Comma operator.  *******************************************/
+
+void test_comma_operator (int a, int b)
+{
+  __emit_expression_range (0, (a++, a + b) ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, (a++, a + b) );
+                               ~~~~^~~~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+/* Examples of non-trivial expressions.  ****************************/
+
+extern double sqrt (double x);
+
+void test_quadratic (double a, double b, double c)
+{
+  __emit_expression_range (0, b * b - 4 * a * c ); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+   __emit_expression_range (0, b * b - 4 * a * c );
+                               ~~~~~~^~~~~~~~~~~
+   { dg-end-multiline-output "" } */
+
+  __emit_expression_range (0,
+     (-b + sqrt (b * b - 4 * a * c))
+     / (2 * a)); /* { dg-warning "range" } */
+/* { dg-begin-multiline-output "" }
+      (-b + sqrt (b * b - 4 * a * c))
+      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+      / (2 * a));
+      ^~~~~~~~~
+   { dg-end-multiline-output "" } */
+
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_tree_expression_range.c b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_tree_expression_range.c
new file mode 100644
index 0000000..591ac25
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_tree_expression_range.c
@@ -0,0 +1,162 @@
+/* This plugin verifies the source-code location ranges of
+   expressions, at the pre-gimplification tree stage.  */
+/* { dg-options "-O" } */
+
+#include "gcc-plugin.h"
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "stringpool.h"
+#include "toplev.h"
+#include "basic-block.h"
+#include "hash-table.h"
+#include "vec.h"
+#include "ggc.h"
+#include "basic-block.h"
+#include "tree-ssa-alias.h"
+#include "internal-fn.h"
+#include "gimple-fold.h"
+#include "tree-eh.h"
+#include "gimple-expr.h"
+#include "is-a.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "tree.h"
+#include "tree-pass.h"
+#include "intl.h"
+#include "plugin-version.h"
+#include "diagnostic.h"
+#include "context.h"
+#include "gcc-rich-location.h"
+#include "print-tree.h"
+
+/*
+  Hack: fails with linker error:
+./diagnostic_plugin_test_tree_expression_range.so: undefined symbol: _ZN17gcc_rich_location8add_exprEP9tree_node
+  since nothing in the tree is using gcc_rich_location::add_expr yet.
+
+  I've tried various workarounds (adding DEBUG_FUNCTION to the
+  method, taking its address), but can't seem to fix it that way.
+  So as a nasty workaround, the following material is copied&pasted
+  from gcc-rich-location.c: */
+
+static bool
+get_range_for_expr (tree expr, location_range *r)
+{
+  if (EXPR_HAS_RANGE (expr))
+    {
+      source_range sr = EXPR_LOCATION_RANGE (expr);
+
+      /* Do we have meaningful data?  */
+      if (sr.m_start && sr.m_finish)
+	{
+	  r->m_start = expand_location (sr.m_start);
+	  r->m_finish = expand_location (sr.m_finish);
+	  return true;
+	}
+    }
+
+  return false;
+}
+
+/* Add a range to the rich_location, covering expression EXPR. */
+
+void
+gcc_rich_location::add_expr (tree expr)
+{
+  gcc_assert (expr);
+
+  location_range r;
+  r.m_caption = NULL;
+  r.m_show_caret_p = false;
+  if (get_range_for_expr (expr, &r))
+    add_range (&r);
+}
+
+/* FIXME: end of material taken from gcc-rich-location.c */
+
+
+int plugin_is_GPL_compatible;
+
+static void
+emit_warning (rich_location *richloc)
+{
+  if (richloc->get_num_locations () < 2)
+    {
+      error_at_rich_loc (richloc, "range not found");
+      return;
+    }
+
+  location_range *range = richloc->get_range (1);
+  warning_at_rich_loc (richloc, 0,
+		       "tree range %i:%i-%i:%i",
+		       range->m_start.line,
+		       range->m_start.column,
+		       range->m_finish.line,
+		       range->m_finish.column);
+}
+
+tree
+cb_walk_tree_fn (tree * tp, int * walk_subtrees,
+		 void * data ATTRIBUTE_UNUSED)
+{
+  if (TREE_CODE (*tp) != CALL_EXPR)
+    return NULL_TREE;
+
+  tree call_expr = *tp;
+  tree fn = CALL_EXPR_FN (call_expr);
+  if (TREE_CODE (fn) != ADDR_EXPR)
+    return NULL_TREE;
+  fn = TREE_OPERAND (fn, 0);
+  if (TREE_CODE (fn) == SOURCE_RANGE)
+    fn = TREE_OPERAND (fn, 0);
+  if (TREE_CODE (fn) != FUNCTION_DECL)
+    return NULL_TREE;
+  if (strcmp (IDENTIFIER_POINTER (DECL_NAME (fn)), "__emit_expression_range"))
+    return NULL_TREE;
+
+  /* Get arg 1; print it! */
+  //debug_tree (call_expr);
+
+  tree arg = CALL_EXPR_ARG (call_expr, 1);
+  //debug_tree (arg);
+
+  gcc_rich_location richloc (EXPR_LOCATION (arg));
+  richloc.add_expr (arg);
+  emit_warning (&richloc);
+
+  return NULL_TREE; //  should we be setting *walk_subtrees?
+}
+
+static void
+callback (void *gcc_data, void *user_data)
+{
+  //fprintf (stdout, "callback called!\n");
+  tree fndecl = (tree)gcc_data;
+
+  /* FIXME: is this actually going to be valid on all frontends
+     before genericize? */
+  walk_tree (&DECL_SAVED_TREE (fndecl), cb_walk_tree_fn, NULL, NULL);
+}
+
+int
+plugin_init (struct plugin_name_args *plugin_info,
+	     struct plugin_gcc_version *version)
+{
+  struct register_pass_info pass_info;
+  const char *plugin_name = plugin_info->base_name;
+  int argc = plugin_info->argc;
+  struct plugin_argument *argv = plugin_info->argv;
+
+  if (!plugin_default_version_check (version, &gcc_version))
+    return 1;
+
+  register_callback (plugin_name,
+		     PLUGIN_PRE_GENERICIZE,
+		     callback,
+		     NULL);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
index 1017044..2d2e47e 100644
--- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
+++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
@@ -68,6 +68,8 @@ set plugin_test_list [list \
 	  diagnostic-test-show-locus-ascii-color.c \
 	  diagnostic-test-show-locus-utf-8-bw.c \
 	  diagnostic-test-show-locus-utf-8-color.c } \
+    { diagnostic_plugin_test_tree_expression_range.c \
+	  diagnostic-test-expressions-1.c } \
 ]
 
 foreach plugin_test $plugin_test_list {
-- 
1.8.5.3

* [PATCH 19/22] gcc-rich-location.[ch]: add debug methods for cpp_string_location
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (15 preceding siblings ...)
  2015-09-10 20:30 ` [PATCH 14/22] C: capture tree ranges for various expressions David Malcolm
@ 2015-09-10 20:31 ` David Malcolm
  2015-09-10 20:31 ` [PATCH 18/22] Track locations within string literals in tree_string David Malcolm
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:31 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

gcc/ChangeLog:
	* gcc-rich-location.c (cpp_string_fragment_location::debug): New.
	(cpp_string_location::debug): New.
---
 gcc/gcc-rich-location.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/gcc/gcc-rich-location.c b/gcc/gcc-rich-location.c
index 003e8f0..ae0b3bb 100644
--- a/gcc/gcc-rich-location.c
+++ b/gcc/gcc-rich-location.c
@@ -190,3 +190,29 @@ gcc_rich_location::expand_caption_va (diagnostic_context *context,
   pp_clear_output_area (pp);
   return result;
 }
+
+/* Debugging method.  Print a diagnostic showing the given fragment.  */
+
+void
+cpp_string_fragment_location::debug (const char *msg) const
+{
+  rich_location richloc (get_covered_range ());
+  inform_at_rich_loc (&richloc, "%s", msg);
+}
+
+/* Debugging method.  Print diagnostics showing the fragment(s) that make
+   up the cpp_string_location.  */
+
+void
+cpp_string_location::debug () const
+{
+  char buf[64];
+  for (unsigned int i = 0; i < m_num_fraglocs; i++)
+    {
+      cpp_string_fragment_location *fragment = &m_fragloc_array[i];
+      snprintf (buf, sizeof (buf),
+		"fragment %i; %i cols per char",
+		i, fragment->m_cols_per_char);
+      fragment->debug (buf);
+    }
+}
-- 
1.8.5.3

* [PATCH 18/22] Track locations within string literals in tree_string
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (16 preceding siblings ...)
  2015-09-10 20:31 ` [PATCH 19/22] gcc-rich-location.[ch]: add debug methods for cpp_string_location David Malcolm
@ 2015-09-10 20:31 ` David Malcolm
  2015-09-10 20:32 ` [PATCH 17/22] libcpp: add location tracking within string literals David Malcolm
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:31 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

This patch uses the string-literal location generated in libcpp in the
previous patch, and stores it in tree_string (adding a new field there).
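
To illustrate the kind of lookup this enables, here is a minimal
standalone sketch of mapping character indices to source columns.  It is
a simplified model, not the libcpp implementation: Fragment, ColumnRange,
range_at_index and range_between_indices are hypothetical names, columns
are counted on a single line, and each fragment is assumed to be a run of
characters that all occupy the same number of columns.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for a source range: 1-based columns on one line.
struct ColumnRange {
  unsigned start;
  unsigned finish;  // inclusive
};

// Hypothetical model of one fragment of a string literal: a run of
// characters that each occupy the same number of source columns.
struct Fragment {
  unsigned first_col;      // column of the fragment's first character
  unsigned num_chars;      // number of characters in the run
  unsigned cols_per_char;  // e.g. 1 for "a", 4 for "\x35"
};

// Columns covered by the character at CHAR_IDX (0-based).
ColumnRange range_at_index(const std::vector<Fragment> &frags,
                           unsigned char_idx)
{
  for (const Fragment &f : frags) {
    if (char_idx < f.num_chars) {
      unsigned start = f.first_col + char_idx * f.cols_per_char;
      return {start, start + f.cols_per_char - 1};
    }
    // Not in this fragment: make the index relative to the next one.
    char_idx -= f.num_chars;
  }
  assert(false && "character index out of range");
  return {0, 0};
}

// Columns from the start of START_IDX to the end of FINISH_IDX.
// FINISH_IDX is inclusive, as in get_range_between_indices.
ColumnRange range_between_indices(const std::vector<Fragment> &frags,
                                  unsigned start_idx, unsigned finish_idx)
{
  return {range_at_index(frags, start_idx).start,
          range_at_index(frags, finish_idx).finish};
}
```

For a literal such as "01234\x35 789", taking column 1 to be the '0', the
fragments in this model would be {1, 5, 1}, {6, 1, 4} and {10, 4, 1};
indices 3 through 7 then map to columns 4 through 11.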

This hasn't been optimized.  A single unbroken run of one column per
character is probably the most common case, so we could store the
character range info only for those string literals that are exceptions
to this rule.
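
A rough sketch of how that optimization might look, as a standalone model
rather than a change to tree_string: StringNode, StringLocation,
build_string_node and column_at_index are hypothetical names, and the
model assumes that the starting column is always available so that the
trivial case needs no per-fragment data at all.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical model of a run of characters of equal column width.
struct Fragment {
  unsigned first_col;
  unsigned num_chars;
  unsigned cols_per_char;
};

struct StringLocation {
  std::vector<Fragment> fragments;

  // True for a single unbroken run with one column per character.
  bool trivial_p() const {
    return fragments.size() == 1 && fragments[0].cols_per_char == 1;
  }
};

// Stand-in for a string node: LOC is only allocated for the exceptions.
struct StringNode {
  unsigned first_col;  // column of the first character; always stored
  unsigned length;
  std::unique_ptr<StringLocation> loc;  // null in the trivial case
};

StringNode build_string_node(const StringLocation &loc, unsigned length)
{
  assert(!loc.fragments.empty());
  StringNode node{loc.fragments[0].first_col, length, nullptr};
  if (!loc.trivial_p())
    node.loc = std::make_unique<StringLocation>(loc);  // copy on demand
  return node;
}

// Starting column of the character at IDX (0-based).
unsigned column_at_index(const StringNode &node, unsigned idx)
{
  if (!node.loc)
    return node.first_col + idx;  // trivial: one column per character
  for (const Fragment &f : node.loc->fragments) {
    if (idx < f.num_chars)
      return f.first_col + idx * f.cols_per_char;
    idx -= f.num_chars;
  }
  assert(false && "character index out of range");
  return 0;
}
```

Whether this is worthwhile would depend on how often strings turn out to
be trivial, which is what the count_all and count_trivial statistics in
this patch are meant to measure.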

The patch adds unit testing via a plugin.

Screenshot:
 https://dmalcolm.fedorapeople.org/gcc/2015-09-09/string-literals.html

gcc/c-family/ChangeLog:
	* c-lex.c (ensure_string_has_location): New function.
	(lex_string): Call ensure_string_has_location on the cpp_string;
	pass in istr.loc to the call to build_string.
	(lex_charconst): Call ensure_string_has_location on the cpp_string.

gcc/cp/ChangeLog:
	* parser.c (cp_parser_string_literal): Call init_raw on the
	str.loc.  Pass in the istr.loc to the call to build_string.

gcc/testsuite/ChangeLog:
	* gcc.dg/plugin/diagnostic-test-string-literals-1.c: New file.
	* gcc.dg/plugin/diagnostic_plugin_test_string_literals.c: New file.
	* gcc.dg/plugin/plugin.exp (plugin_test_list): Add
	diagnostic_plugin_test_string_literals.c and
	diagnostic-test-string-literals-1.c.

gcc/ChangeLog:
	* tree-core.h (struct cpp_string_location): Add forward
	declaration.
	(struct tree_string): Add cpp_string_location * field "loc".
	* tree.c: Include cpplib.h.
	(build_string): Initialize TREE_STRING_LOCATION (s) to NULL.
	(cpp_string_location_stats): New global.
	(build_string): New overload.
	* tree.h (TREE_STRING_LOCATION): New macro.
	(cpp_string_location_stats): New struct and global.
	(build_string): Add overload taking an additional
	cpp_string_location * param.

libcpp/ChangeLog:
	* charset.c (cpp_string_location::get_range_between_indices): New
	method.
	* include/cpplib.h
	(cpp_string_location::get_range_between_indices): Likewise.
---
 gcc/c-family/c-lex.c                               |  23 ++-
 gcc/cp/parser.c                                    |   4 +-
 .../plugin/diagnostic-test-string-literals-1.c     | 139 +++++++++++++
 .../diagnostic_plugin_test_string_literals.c       | 215 +++++++++++++++++++++
 gcc/testsuite/gcc.dg/plugin/plugin.exp             |   2 +
 gcc/tree-core.h                                    |   3 +
 gcc/tree.c                                         |  24 +++
 gcc/tree.h                                         |  12 ++
 libcpp/charset.c                                   |  12 ++
 libcpp/include/cpplib.h                            |   2 +
 10 files changed, 434 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literals.c

diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index f457199..6eb8fcc 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -1076,6 +1076,19 @@ interpret_fixed (const cpp_token *token, unsigned int flags)
   return value;
 }
 
+/* FIXME: if STR lacks location info, initialize it from SRC_LOC.  */
+
+static void
+ensure_string_has_location (const cpp_string *str, source_location src_loc)
+{
+  if (!str->loc.m_fragloc_array)
+    {
+      cpp_string *mutstr = const_cast <cpp_string *> (str);
+      cpp_string_location *strloc = &mutstr->loc;
+      strloc->init_raw (src_loc, mutstr->len, 1, line_table);
+    }
+}
+
 /* Convert a series of STRING, WSTRING, STRING16, STRING32 and/or
    UTF8STRING tokens into a tree, performing string constant
    concatenation.  TOK is the first of these.  VALP is the location to
@@ -1107,7 +1120,9 @@ lex_string (const cpp_token *tok, tree *valp, bool objc_string, bool translate)
   /* Try to avoid the overhead of creating and destroying an obstack
      for the common case of just one string.  */
   cpp_string str = tok->val.str;
+  ensure_string_has_location (&str, tok->src_loc);
   cpp_string *strs = &str;
+  location_t str0_loc = tok->src_loc;
 
   /* objc_at_sign_was_seen is only used when doing Objective-C string
      concatenation.  It is 'true' if we have seen an '@' before the
@@ -1146,16 +1161,20 @@ lex_string (const cpp_token *tok, tree *valp, bool objc_string, bool translate)
 	  else
 	    error ("unsupported non-standard concatenation of string literals");
 	}
+      /* FALLTHROUGH */
 
     case CPP_STRING:
       if (!concats)
 	{
 	  gcc_obstack_init (&str_ob);
+	  ensure_string_has_location (&str, str0_loc);
 	  obstack_grow (&str_ob, &str, sizeof (cpp_string));
 	}
 
       concats++;
+      ensure_string_has_location (&tok->val.str, tok->src_loc);
       obstack_grow (&str_ob, &tok->val.str, sizeof (cpp_string));
+
       if (objc_string)
 	objc_at_sign_was_seen = false;
       goto retry;
@@ -1178,7 +1197,7 @@ lex_string (const cpp_token *tok, tree *valp, bool objc_string, bool translate)
        ? cpp_interpret_string : cpp_interpret_string_notranslate)
       (parse_in, strs, concats + 1, &istr, type))
     {
-      value = build_string (istr.len, (const char *) istr.text);
+      value = build_string (istr.len, (const char *) istr.text, &istr.loc);
       free (CONST_CAST (unsigned char *, istr.text));
     }
   else
@@ -1245,6 +1264,8 @@ lex_charconst (const cpp_token *token)
   unsigned int chars_seen;
   int unsignedp = 0;
 
+  ensure_string_has_location (&token->val.str, token->src_loc);
+
   result = cpp_interpret_charconst (parse_in, token,
 				    &chars_seen, &unsignedp);
 
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 17b7de0..62937ae 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -3716,6 +3716,7 @@ cp_parser_string_literal (cp_parser *parser, bool translate, bool wide_ok,
 
       str.text = (const unsigned char *)TREE_STRING_POINTER (string_tree);
       str.len = TREE_STRING_LENGTH (string_tree);
+      str.loc.init_raw (tok->location, str.len, 1, line_table);
       count = 1;
 
       if (curr_tok_is_userdef_p)
@@ -3742,6 +3743,7 @@ cp_parser_string_literal (cp_parser *parser, bool translate, bool wide_ok,
 	  count++;
 	  str.text = (const unsigned char *)TREE_STRING_POINTER (string_tree);
 	  str.len = TREE_STRING_LENGTH (string_tree);
+	  str.loc.init_raw (tok->location, str.len, 1, line_table);
 
 	  if (curr_tok_is_userdef_p)
 	    {
@@ -3810,7 +3812,7 @@ cp_parser_string_literal (cp_parser *parser, bool translate, bool wide_ok,
   if ((translate ? cpp_interpret_string : cpp_interpret_string_notranslate)
       (parse_in, strs, count, &istr, type))
     {
-      value = build_string (istr.len, (const char *)istr.text);
+      value = build_string (istr.len, (const char *)istr.text, &istr.loc);
       free (CONST_CAST (unsigned char *, istr.text));
 
       switch (type)
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c
new file mode 100644
index 0000000..ef3b8fe
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic-test-string-literals-1.c
@@ -0,0 +1,139 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdiagnostics-show-caret" } */
+
+/* This is a collection of unittests for ranges within string literals,
+   using diagnostic_plugin_test_string_literals, which handles
+   "__emit_string_literal_range" by generating a warning at the given
+   subset of a string literal.
+
+   The indices are 0-based.  It's easiest to verify things using string
+   literals that are runs of 0-based digits (to avoid having to count
+   characters).  */
+
+extern void __emit_string_literal_range (const char *literal,
+					 int start_idx, int end_idx);
+
+void test_simple_string_literal (void)
+{
+  __emit_string_literal_range ("0123456789", /* { dg-warning "range" } */
+			       6, 7);
+/* { dg-begin-multiline-output "" }
+   __emit_string_literal_range ("0123456789",
+                                       ^~
+   { dg-end-multiline-output "" } */
+}
+
+void test_concatenated_string_literal (void)
+{
+  __emit_string_literal_range ("01234" "56789", /* { dg-warning "range" } */
+			       3, 6);
+/* { dg-begin-multiline-output "" }
+   __emit_string_literal_range ("01234" "56789",
+                                    ^~~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_multiline_string_literal (void)
+{
+  __emit_string_literal_range ("01234" /* { dg-warning "range" } */
+                               "56789",
+                               3, 6);
+/* { dg-begin-multiline-output "" }
+   __emit_string_literal_range ("01234"
+                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+                                "56789",
+                                ~~~  
+   { dg-end-multiline-output "" } */
+  /* FIXME: why does the above need two trailing spaces?  */
+}
+
+/* Tests of various unicode encodings.
+
+   Digits 0 through 9 are unicode code points:
+      U+0030 DIGIT ZERO
+      ...
+      U+0039 DIGIT NINE
+   However, these are not always valid as UCN (see the comment in
+   libcpp/charset.c:_cpp_valid_ucn).
+
+   Hence we need to test UCN using an alternative unicode
+   representation of numbers; let's use Roman numerals,
+   (though these start at one, not zero):
+      U+2170 SMALL ROMAN NUMERAL ONE
+      ...
+      U+2174 SMALL ROMAN NUMERAL FIVE  ("v")
+      U+2175 SMALL ROMAN NUMERAL SIX   ("vi")
+      ...
+      U+2178 SMALL ROMAN NUMERAL NINE.  */
+
+void test_hex (void)
+{
+  /* Digits 0-9, expressing digit 5 in ASCII as "\x35"
+     and with a space in place of digit 6, to terminate the escaped
+     hex code.  */
+  __emit_string_literal_range ("01234\x35 789", /* { dg-warning "range" } */
+			       3, 7);
+/* { dg-begin-multiline-output "" }
+   __emit_string_literal_range ("01234\x35 789"
+                                    ^~~~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_oct (void)
+{
+  /* Digits 0-9, expressing digit 5 in ASCII as "\065"
+     and with a space in place of digit 6, to terminate the escaped
+     octal code.  */
+  __emit_string_literal_range ("01234\065 789", /* { dg-warning "range" } */
+			       3, 7);
+/* { dg-begin-multiline-output "" }
+   __emit_string_literal_range ("01234\065 789"
+                                    ^~~~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_multiple (void)
+{
+  /* Digits 0-9, expressing digit 5 in ASCII as hex "\x35"
+     digit 6 in ASCII as octal "\066", concatenating multiple strings.  */
+  __emit_string_literal_range ("01234"  "\x35"  "\066"  "789", /* { dg-warning "range" } */
+			       3, 8);
+/* { dg-begin-multiline-output "" }
+   __emit_string_literal_range ("01234"  "\x35"  "\066"  "789",
+                                    ^~~~~~~~~~~~~~~~~~~~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_ucn4 (void)
+{
+  /* Digits 0-9, expressing digits 5 and 6 as Roman numerals expressed
+     as UCN 4.  */
+  __emit_string_literal_range ("01234\u2174\u2175789", /* { dg-warning "range" } */
+			       4, 7);
+/* { dg-begin-multiline-output "" }
+   __emit_string_literal_range ("01234\u2174\u2175789",
+                                     ^~~~~~~~~~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_ucn8 (void)
+{
+  /* Digits 0-9, expressing digits 5 and 6 as Roman numerals as UCN 8.  */
+  __emit_string_literal_range ("01234\U00002174\U00002175789", /* { dg-warning "range" } */
+			       4, 7);
+/* { dg-begin-multiline-output "" }
+   __emit_string_literal_range ("01234\U00002174\U00002175789",
+                                     ^~~~~~~~~~~~~~~~~~~~~~
+   { dg-end-multiline-output "" } */
+}
+
+void test_u8 (void)
+{
+  /* Digits 0-9.  */
+  __emit_string_literal_range (u8"0123456789", /* { dg-warning "range" } */
+			       4, 7);
+/* { dg-begin-multiline-output "" }
+   __emit_string_literal_range (u8"0123456789",
+                                       ^~~~
+   { dg-end-multiline-output "" } */
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literals.c b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literals.c
new file mode 100644
index 0000000..c6b591e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literals.c
@@ -0,0 +1,215 @@
+/* This plugin uses the diagnostics code to verify tracking of source code
+   locations within string literals.  */
+/* { dg-options "-O" } */
+
+#include "gcc-plugin.h"
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "stringpool.h"
+#include "toplev.h"
+#include "basic-block.h"
+#include "hash-table.h"
+#include "vec.h"
+#include "ggc.h"
+#include "basic-block.h"
+#include "tree-ssa-alias.h"
+#include "internal-fn.h"
+#include "gimple-fold.h"
+#include "tree-eh.h"
+#include "gimple-expr.h"
+#include "is-a.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "tree.h"
+#include "tree-pass.h"
+#include "intl.h"
+#include "plugin-version.h"
+#include "diagnostic.h"
+#include "context.h"
+#include "gcc-rich-location.h"
+#include "print-tree.h"
+#include "cpplib.h"
+
+/* FIXME. hacking in a copy of this for now to get around linker issues.  */
+
+source_range
+cpp_string_location::get_range_between_indices (unsigned int start_idx,
+						unsigned int finish_idx) const
+{
+  /* This could be optimized if necessary.  */
+  source_range result;
+  result.m_start = get_loc_at_index (start_idx);
+  result.m_finish = get_range_at_index (finish_idx).m_finish;
+  return result;
+}
+
+int plugin_is_GPL_compatible;
+
+const pass_data pass_data_test_string_literals =
+{
+  GIMPLE_PASS, /* type */
+  "test_string_literals", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_ssa, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_test_string_literals : public gimple_opt_pass
+{
+public:
+  pass_test_string_literals(gcc::context *ctxt)
+    : gimple_opt_pass(pass_data_test_string_literals, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  bool gate (function *) { return true; }
+  virtual unsigned int execute (function *);
+
+}; // class pass_test_string_literals
+
+/* FIXME.  */
+
+static gcall *
+check_for_named_call (gimple stmt,
+		      const char *funcname, unsigned int num_args)
+{
+  gcc_assert (funcname);
+
+  gcall *call = dyn_cast <gcall *> (stmt);
+  if (!call)
+    return NULL;
+
+  tree fndecl = gimple_call_fndecl (call);
+  if (!fndecl)
+    return NULL;
+
+  if (strcmp (IDENTIFIER_POINTER (DECL_NAME (fndecl)), funcname))
+    return NULL;
+
+  if (gimple_call_num_args (call) != num_args)
+    {
+      error_at (stmt->location, "expected number of args: %i (got %i)",
+		num_args, gimple_call_num_args (call));
+      return NULL;
+    }
+
+  return call;
+}
+
+static void
+emit_warning (source_range src_range)
+{
+  rich_location richloc (src_range);
+  location_range *range = richloc.get_range (0);
+  warning_at_rich_loc (&richloc, 0,
+		       "range %i:%i-%i:%i",
+		       range->m_start.line,
+		       range->m_start.column,
+		       range->m_finish.line,
+		       range->m_finish.column);
+}
+
+/* Support code for verifying that we are correctly tracking ranges
+   within string literals, for use by diagnostic-test-string-literals-*.c.  */
+
+static void
+test_string_literals (gimple stmt)
+{
+  gcall *call = check_for_named_call (stmt, "__emit_string_literal_range", 3);
+  if (!call)
+    return;
+
+#if 0
+  for (int i = 0; i < 3; i++)
+    warning_at (EXPR_LOCATION (gimple_call_arg (call, i)), 0, "arg %i", i);
+#endif
+
+  /* We expect an ADDR_EXPR with a STRING_CST inside it for the
+     initial arg.  */
+  tree t_addr_string = gimple_call_arg (call, 0);
+  if (TREE_CODE (t_addr_string) != ADDR_EXPR)
+    {
+      error_at (call->location, "string literal required for arg 1");
+      return;
+    }
+
+  tree t_string = TREE_OPERAND (t_addr_string, 0);
+  if (TREE_CODE (t_string) != STRING_CST)
+    {
+      error_at (call->location, "string literal required for arg 1");
+      return;
+    }
+
+  tree t_start_idx = gimple_call_arg (call, 1);
+  if (TREE_CODE (t_start_idx) != INTEGER_CST)
+    {
+      error_at (call->location, "integer constant required for arg 2");
+      return;
+    }
+  int start_idx = TREE_INT_CST_LOW (t_start_idx);
+
+  tree t_end_idx = gimple_call_arg (call, 2);
+  if (TREE_CODE (t_end_idx) != INTEGER_CST)
+    {
+      error_at (call->location, "integer constant required for arg 3");
+      return;
+    }
+  int end_idx = TREE_INT_CST_LOW (t_end_idx);
+
+  cpp_string_location *strloc = TREE_STRING_LOCATION (t_string);
+  gcc_assert (strloc);
+  source_range src_range
+    = strloc->get_range_between_indices (start_idx, end_idx);
+  emit_warning (src_range);
+}
+
+unsigned int
+pass_test_string_literals::execute (function *fun)
+{
+  gimple_stmt_iterator gsi;
+  basic_block bb;
+
+  FOR_EACH_BB_FN (bb, fun)
+    for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+      {
+	gimple stmt = gsi_stmt (gsi);
+	test_string_literals (stmt);
+      }
+
+  return 0;
+}
+
+static gimple_opt_pass *
+make_pass_test_string_literals (gcc::context *ctxt)
+{
+  return new pass_test_string_literals (ctxt);
+}
+
+int
+plugin_init (struct plugin_name_args *plugin_info,
+	     struct plugin_gcc_version *version)
+{
+  struct register_pass_info pass_info;
+  const char *plugin_name = plugin_info->base_name;
+  int argc = plugin_info->argc;
+  struct plugin_argument *argv = plugin_info->argv;
+
+  if (!plugin_default_version_check (version, &gcc_version))
+    return 1;
+
+  pass_info.pass = make_pass_test_string_literals (g);
+  pass_info.reference_pass_name = "ssa";
+  pass_info.ref_pass_instance_number = 1;
+  pass_info.pos_op = PASS_POS_INSERT_AFTER;
+  register_callback (plugin_name, PLUGIN_PASS_MANAGER_SETUP, NULL,
+		     &pass_info);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
index 91f6391..97d7a41 100644
--- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
+++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
@@ -72,6 +72,8 @@ set plugin_test_list [list \
 	  diagnostic-test-expressions-1.c } \
     { diagnostic_plugin_show_trees.c \
 	  diagnostic-test-show-trees-1.c } \
+    { diagnostic_plugin_test_string_literals.c \
+	  diagnostic-test-string-literals-1.c } \
 ]
 
 foreach plugin_test $plugin_test_list {
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 6931ad9..7cda82f 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1166,9 +1166,12 @@ struct GTY(()) tree_fixed_cst {
   struct fixed_value * fixed_cst_ptr;
 };
 
+struct cpp_string_location;
+
 struct GTY(()) tree_string {
   struct tree_typed typed;
   int length;
+  cpp_string_location *loc;
   char str[1];
 };
 
diff --git a/gcc/tree.c b/gcc/tree.c
index d1595c2..81d1cbd 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -75,6 +75,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "print-tree.h"
 #include "ipa-utils.h"
+#include "cpplib.h"
 
 /* Tree code classes.  */
 
@@ -1931,12 +1932,35 @@ build_string (int len, const char *str)
   TREE_SET_CODE (s, STRING_CST);
   TREE_CONSTANT (s) = 1;
   TREE_STRING_LENGTH (s) = len;
+  TREE_STRING_LOCATION (s) = NULL;
   memcpy (s->string.str, str, len);
   s->string.str[len] = '\0';
 
   return s;
 }
 
+/* As above, but with per-character location information.  */
+
+struct cpp_string_location_stats cpp_string_location_stats;
+
+tree
+build_string (int len, const char *str, cpp_string_location *strloc)
+{
+  tree s = build_string (len, str);
+
+  /* Need to allocate a copy:  */
+  TREE_STRING_LOCATION (s) = ggc_alloc <cpp_string_location> ();
+  *TREE_STRING_LOCATION (s) = *strloc;
+
+  /* Maintain stats on string locations.  */
+  cpp_string_location_stats.count_all++;
+  if (strloc->trivial_p ())
+    cpp_string_location_stats.count_trivial++;
+
+  return s;
+}
+
+
 /* Return a newly constructed COMPLEX_CST node whose value is
    specified by the real and imaginary parts REAL and IMAG.
    Both REAL and IMAG should be constant nodes.  TYPE, if specified,
diff --git a/gcc/tree.h b/gcc/tree.h
index 66419d4..995937c 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -937,9 +937,20 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
 /* In a STRING_CST */
 /* In C terms, this is sizeof, not strlen.  */
 #define TREE_STRING_LENGTH(NODE) (STRING_CST_CHECK (NODE)->string.length)
+#define TREE_STRING_LOCATION(NODE) (STRING_CST_CHECK (NODE)->string.loc)
 #define TREE_STRING_POINTER(NODE) \
   ((const char *)(STRING_CST_CHECK (NODE)->string.str))
 
+extern struct cpp_string_location_stats
+{
+  /* How many have been used to construct STRING_CST.  */
+  int count_all;
+
+  /* How many of these consisted of a single run of 1-byte-per-char
+     bytes.  */
+  int count_trivial;
+} cpp_string_location_stats;
+
 /* In a COMPLEX_CST node.  */
 #define TREE_REALPART(NODE) (COMPLEX_CST_CHECK (NODE)->complex.real)
 #define TREE_IMAGPART(NODE) (COMPLEX_CST_CHECK (NODE)->complex.imag)
@@ -3791,6 +3802,7 @@ extern tree build_minus_one_cst (tree);
 extern tree build_all_ones_cst (tree);
 extern tree build_zero_cst (tree);
 extern tree build_string (int, const char *);
+extern tree build_string (int, const char *, cpp_string_location *);
 extern tree build_tree_list_stat (tree, tree MEM_STAT_DECL);
 #define build_tree_list(t, q) build_tree_list_stat (t, q MEM_STAT_INFO)
 extern tree build_tree_list_vec_stat (const vec<tree, va_gc> *MEM_STAT_DECL);
diff --git a/libcpp/charset.c b/libcpp/charset.c
index 3ae7916..f78cdf6 100644
--- a/libcpp/charset.c
+++ b/libcpp/charset.c
@@ -2028,6 +2028,18 @@ cpp_string_location::get_range_at_index (unsigned int char_idx) const
   return err;
 }
 
+/* FIXME.  Get the range from START_IDX to FINISH_IDX inclusive.  */
+source_range
+cpp_string_location::get_range_between_indices (unsigned int start_idx,
+						unsigned int finish_idx) const
+{
+  /* This could be optimized if necessary.  */
+  source_range result;
+  result.m_start = get_loc_at_index (start_idx);
+  result.m_finish = get_range_at_index (finish_idx).m_finish;
+  return result;
+}
+
 /* FIXME.  */
 bool
 cpp_string_location::trivial_p () const
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index a5e5df5..6023812 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -256,6 +256,8 @@ struct GTY(()) cpp_string_location {
 
   source_location get_loc_at_index (unsigned int idx) const;
   source_range get_range_at_index (unsigned int idx) const;
+  source_range get_range_between_indices (unsigned int start_idx,
+					  unsigned int finish_idx) const;
 
   void debug () const;
 
-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 133+ messages in thread

* [PATCH 16/22] C/C++ frontend: use tree ranges in various diagnostics
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (19 preceding siblings ...)
  2015-09-10 20:32 ` [PATCH 21/22] Use Levenshtein distance for various misspellings in C frontend David Malcolm
@ 2015-09-10 20:32 ` David Malcolm
  2015-09-10 20:50 ` [PATCH 22/22] Add fixit hints to spellchecker suggestions David Malcolm
  2015-09-14 17:49 ` [PATCH 00/22] RFC: Overhaul of diagnostics Bernd Schmidt
  22 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

Screenshot:
 https://dmalcolm.fedorapeople.org/gcc/2015-09-09/tree-ranges.html

This mostly affects the C frontend, but it touches c-common.h so the
C++ frontend needs a slight adjustment.
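
The recurring pattern in the c-typeck.c changes is to prefer the
expression's own range when it can carry one, and otherwise to fall
back to a single-point range built from the location the caller passed
in.  A minimal standalone sketch of that fallback follows.  It uses toy
stand-ins rather than GCC's real types: "toy_expr" and "range_for" are
hypothetical names invented for illustration, and location_t is reduced
to a plain column number.  Only the m_start/m_finish fields and the
from_location idea are taken from the patch.

```cpp
#include <cstddef>

// Toy stand-in for GCC's location_t: here just a column number.
typedef unsigned int location_t;

// Mirrors the shape of the patch's source_range (m_start/m_finish).
struct source_range
{
  location_t m_start;
  location_t m_finish;

  // Build a degenerate range covering a single location.
  static source_range from_location (location_t loc)
  {
    source_range r;
    r.m_start = loc;
    r.m_finish = loc;
    return r;
  }
};

// Hypothetical expression node: it may or may not carry a range.
struct toy_expr
{
  bool has_range;      // plays the role of CAN_HAVE_RANGE_P
  source_range range;  // plays the role of EXPR_LOCATION_RANGE
};

// Prefer the expression's range; otherwise fall back to FALLBACK_LOC.
static source_range
range_for (const toy_expr *expr, location_t fallback_loc)
{
  if (expr && expr->has_range)
    return expr->range;
  return source_range::from_location (fallback_loc);
}
```

With this sketch, an expression spanning columns 24-45 is reported with
that full range, whereas a NULL or range-less expression is reported at
the single fallback column.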

gcc/c-family/ChangeLog:
	* c-common.c: Include gcc-rich-location.h.
	(binary_op_error): Add params orig_op0 and orig_op1; use them
	to add ranges and captions to the error.
	* c-common.h (binary_op_error): Add params orig_op0 and orig_op1.

gcc/c/ChangeLog:
	* c-parser.c (c_parser_static_assert_declaration_no_semi):
	Convert locals "assert_loc" and "value_loc" from location_t to
	source_range, thus using ranges in diagnostics.
	* c-typeck.c (inform_declaration): Use DECL_LOCATION_RANGE
	of the decl, rather than DECL_SOURCE_LOCATION.
	(build_function_call_vec): Use the EXPR_LOCATION_RANGE of
	the function when issuing diagnostics, rather than "loc".
	(convert_for_assignment): When CAN_HAVE_RANGE_P (rhs), emit
	diagnostics at the range of rhs rather than at expr_loc.
	(c_finish_return): Likewise when CAN_HAVE_RANGE_P (retval).
	When issuing warnings about erroneous presence/absence of return
	values, show the location of current_function_decl using inform.
	(build_binary_op): Pass orig_op0 and orig_op1 to binary_op_error.

gcc/cp/ChangeLog:
	* typeck.c (cp_build_binary_op): Pass orig_op0 and orig_op1 to
	binary_op_error.

gcc/testsuite/ChangeLog:
	* gcc.dg/diagnostic-tree-expr-ranges.c: New file.
---
 gcc/c-family/c-common.c                            |   9 +-
 gcc/c-family/c-common.h                            |   3 +-
 gcc/c/c-parser.c                                   |   6 +-
 gcc/c/c-typeck.c                                   | 123 ++++++++++------
 gcc/cp/typeck.c                                    |   3 +-
 gcc/testsuite/gcc.dg/diagnostic-tree-expr-ranges.c | 159 +++++++++++++++++++++
 6 files changed, 250 insertions(+), 53 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/diagnostic-tree-expr-ranges.c

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index ff6f90f..77962fc 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -53,6 +53,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimplify.h"
 #include "wide-int-print.h"
 #include "gimple-expr.h"
+#include "gcc-rich-location.h"
 
 cpp_reader *parse_in;		/* Declared in c-pragma.h.  */
 
@@ -4340,6 +4341,7 @@ c_register_builtin_type (tree type, const char* name)
 
 void
 binary_op_error (location_t location, enum tree_code code,
+		 tree orig_op0, tree orig_op1,
 		 tree type0, tree type1)
 {
   const char *opname;
@@ -4391,7 +4393,12 @@ binary_op_error (location_t location, enum tree_code code,
     default:
       gcc_unreachable ();
     }
-  error_at (location,
+  gcc_rich_location richloc (location);
+  richloc.maybe_add_expr_with_caption (orig_op0, global_dc,
+				       "%qT", TREE_TYPE (orig_op0));
+  richloc.maybe_add_expr_with_caption (orig_op1, global_dc,
+				       "%qT", TREE_TYPE (orig_op1));
+  error_at_rich_loc (&richloc,
 	    "invalid operands to binary %s (have %qT and %qT)", opname,
 	    type0, type1);
 }
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index bb17fcc..b9a5d72 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -804,7 +804,8 @@ extern tree c_sizeof_or_alignof_type (location_t, tree, bool, bool, int);
 extern tree c_alignof_expr (location_t, tree);
 /* Print an error message for invalid operands to arith operation CODE.
    NOP_EXPR is used as a special case (see truthvalue_conversion).  */
-extern void binary_op_error (location_t, enum tree_code, tree, tree);
+extern void binary_op_error (location_t, enum tree_code, tree, tree,
+			     tree, tree);
 extern tree fix_string_type (tree);
 extern void constant_expression_warning (tree);
 extern void constant_expression_error (tree);
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 4303496..0c62496 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -2037,12 +2037,12 @@ c_parser_static_assert_declaration (c_parser *parser)
 static void
 c_parser_static_assert_declaration_no_semi (c_parser *parser)
 {
-  location_t assert_loc, value_loc;
+  source_range assert_loc, value_loc;
   tree value;
   tree string;
 
   gcc_assert (c_parser_next_token_is_keyword (parser, RID_STATIC_ASSERT));
-  assert_loc = c_parser_peek_token (parser)->location;
+  assert_loc = c_parser_peek_token (parser)->range;
   if (flag_isoc99)
     pedwarn_c99 (assert_loc, OPT_Wpedantic,
 		 "ISO C99 does not support %<_Static_assert%>");
@@ -2052,8 +2052,8 @@ c_parser_static_assert_declaration_no_semi (c_parser *parser)
   c_parser_consume_token (parser);
   if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
     return;
-  value_loc = c_parser_peek_token (parser)->location;
   value = c_parser_expr_no_commas (parser, NULL).value;
+  value_loc = EXPR_LOCATION_RANGE (value);
   parser->lex_untranslated_string = true;
   if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
     {
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 4123f11..506abb3 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -2851,7 +2851,7 @@ static void
 inform_declaration (tree decl)
 {
   if (decl && (TREE_CODE (decl) != FUNCTION_DECL || !DECL_IS_BUILTIN (decl)))
-    inform (DECL_SOURCE_LOCATION (decl), "declared here");
+    inform (DECL_LOCATION_RANGE (decl), "declared here");
 }
 
 /* Build a function call to function FUNCTION with parameters PARAMS.
@@ -2873,6 +2873,7 @@ build_function_call_vec (location_t loc, vec<location_t> arg_loc,
   int nargs;
   tree *argarray;
 
+  source_range func_range = EXPR_LOCATION_RANGE (function);
 
   /* Strip NON_LVALUE_EXPRs, etc., since we aren't using as an lvalue.  */
   STRIP_TYPE_NOPS (function);
@@ -2921,13 +2922,13 @@ build_function_call_vec (location_t loc, vec<location_t> arg_loc,
 		  function);
       else if (DECL_P (function))
 	{
-	  error_at (loc,
+	  error_at (func_range,
 		    "called object %qD is not a function or function pointer",
 		    function);
 	  inform_declaration (function);
 	}
       else
-	error_at (loc,
+	error_at (func_range,
 		  "called object is not a function or function pointer");
       return error_mark_node;
     }
@@ -2958,11 +2959,12 @@ build_function_call_vec (location_t loc, vec<location_t> arg_loc,
       /* This situation leads to run-time undefined behavior.  We can't,
 	 therefore, simply error unless we can prove that all possible
 	 executions of the program must execute the code.  */
-      warning_at (loc, 0, "function called through a non-compatible type");
+      warning_at (func_range, 0,
+		  "function called through a non-compatible type");
 
       if (VOID_TYPE_P (return_type)
 	  && TYPE_QUALS (return_type) != TYPE_UNQUALIFIED)
-	pedwarn (loc, 0,
+	pedwarn (func_range, 0,
 		 "function with qualified void return type called");
      }
 
@@ -2999,7 +3001,7 @@ build_function_call_vec (location_t loc, vec<location_t> arg_loc,
   if (VOID_TYPE_P (TREE_TYPE (result)))
     {
       if (TYPE_QUALS (TREE_TYPE (result)) != TYPE_UNQUALIFIED)
-	pedwarn (loc, 0,
+	pedwarn (func_range, 0,
 		 "function with qualified void return type called");
       return result;
     }
@@ -5734,6 +5736,13 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
   tree rname = NULL_TREE;
   bool objc_ok = false;
 
+  source_range rhs_range;
+
+  if (CAN_HAVE_RANGE_P (rhs))
+    rhs_range = EXPR_LOCATION_RANGE (rhs);
+  else
+    rhs_range = source_range::from_location (expr_loc);
+
   if (errtype == ic_argpass)
     {
       tree selector;
@@ -5756,14 +5765,14 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
   /* This macro is used to emit diagnostics to ensure that all format
      strings are complete sentences, visible to gettext and checked at
      compile time.  */
-#define PEDWARN_FOR_ASSIGNMENT(LOCATION, PLOC, OPT, AR, AS, IN, RE)	 \
+#define PEDWARN_FOR_ASSIGNMENT(LOCATION, RANGE, OPT, AR, AS, IN, RE)	 \
   do {                                                                   \
     switch (errtype)                                                     \
       {                                                                  \
       case ic_argpass:                                                   \
-        if (pedwarn (PLOC, OPT, AR, parmnum, rname))			 \
-          inform ((fundecl && !DECL_IS_BUILTIN (fundecl))	         \
-		  ? DECL_SOURCE_LOCATION (fundecl) : PLOC,		 \
+        if (pedwarn (RANGE, OPT, AR, parmnum, rname))                    \
+          inform ((fundecl && !DECL_IS_BUILTIN (fundecl))                \
+                 ? DECL_SOURCE_LOCATION (fundecl) : RANGE.m_start,       \
                   "expected %qT but argument is of type %qT",            \
                   type, rhstype);                                        \
         break;                                                           \
@@ -5785,14 +5794,14 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
      strings are complete sentences, visible to gettext and checked at
      compile time.  It is the same as PEDWARN_FOR_ASSIGNMENT but with an
      extra parameter to enumerate qualifiers.  */
-#define PEDWARN_FOR_QUALIFIERS(LOCATION, PLOC, OPT, AR, AS, IN, RE, QUALS) \
+#define PEDWARN_FOR_QUALIFIERS(LOCATION, RANGE, OPT, AR, AS, IN, RE, QUALS) \
   do {                                                                   \
     switch (errtype)                                                     \
       {                                                                  \
       case ic_argpass:                                                   \
-        if (pedwarn (PLOC, OPT, AR, parmnum, rname, QUALS))		 \
-          inform ((fundecl && !DECL_IS_BUILTIN (fundecl))	         \
-		  ? DECL_SOURCE_LOCATION (fundecl) : PLOC,		 \
+        if (pedwarn (RANGE, OPT, AR, parmnum, rname, QUALS))		 \
+          inform ((fundecl && !DECL_IS_BUILTIN (fundecl))                \
+                 ? DECL_SOURCE_LOCATION (fundecl) : RANGE.m_start,       \
                   "expected %qT but argument is of type %qT",            \
                   type, rhstype);                                        \
         break;                                                           \
@@ -5814,14 +5823,14 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
      strings are complete sentences, visible to gettext and checked at
      compile time.  It is the same as PEDWARN_FOR_QUALIFIERS but uses
      warning_at instead of pedwarn.  */
-#define WARNING_FOR_QUALIFIERS(LOCATION, PLOC, OPT, AR, AS, IN, RE, QUALS) \
+#define WARNING_FOR_QUALIFIERS(LOCATION, RANGE, OPT, AR, AS, IN, RE, QUALS) \
   do {                                                                   \
     switch (errtype)                                                     \
       {                                                                  \
       case ic_argpass:                                                   \
-        if (warning_at (PLOC, OPT, AR, parmnum, rname, QUALS))           \
+        if (warning_at (RANGE, OPT, AR, parmnum, rname, QUALS))          \
           inform ((fundecl && !DECL_IS_BUILTIN (fundecl))                \
-                  ? DECL_SOURCE_LOCATION (fundecl) : PLOC,               \
+                  ? DECL_SOURCE_LOCATION (fundecl) : RANGE.m_start,      \
                   "expected %qT but argument is of type %qT",            \
                   type, rhstype);                                        \
         break;                                                           \
@@ -5881,7 +5890,7 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 	  && TREE_CODE (type) == ENUMERAL_TYPE
 	  && TYPE_MAIN_VARIANT (checktype) != TYPE_MAIN_VARIANT (type))
 	{
-	  PEDWARN_FOR_ASSIGNMENT (location, expr_loc, OPT_Wc___compat,
+	  PEDWARN_FOR_ASSIGNMENT (location, rhs_range, OPT_Wc___compat,
 			          G_("enum conversion when passing argument "
 				     "%d of %qE is invalid in C++"),
 			          G_("enum conversion in assignment is "
@@ -6050,7 +6059,7 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 		     vice-versa.  */
 		  if (TYPE_QUALS_NO_ADDR_SPACE (ttl)
 		      & ~TYPE_QUALS_NO_ADDR_SPACE (ttr))
-		    PEDWARN_FOR_QUALIFIERS (location, expr_loc,
+		    PEDWARN_FOR_QUALIFIERS (location, rhs_range,
 					    OPT_Wdiscarded_qualifiers,
 					    G_("passing argument %d of %qE "
 					       "makes %q#v qualified function "
@@ -6067,7 +6076,7 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 		}
 	      else if (TYPE_QUALS_NO_ADDR_SPACE (ttr)
 		       & ~TYPE_QUALS_NO_ADDR_SPACE (ttl))
-		PEDWARN_FOR_QUALIFIERS (location, expr_loc,
+		PEDWARN_FOR_QUALIFIERS (location, rhs_range,
 				        OPT_Wdiscarded_qualifiers,
 				        G_("passing argument %d of %qE discards "
 					   "%qv qualifier from pointer target type"),
@@ -6158,7 +6167,7 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 	  switch (errtype)
 	    {
 	    case ic_argpass:
-	      error_at (expr_loc, "passing argument %d of %qE from pointer to "
+	      error_at (rhs_range, "passing argument %d of %qE from pointer to "
 			"non-enclosed address space", parmnum, rname);
 	      break;
 	    case ic_assign:
@@ -6187,7 +6196,7 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 	  switch (errtype)
 	  {
 	  case ic_argpass:
-	    warning_at (expr_loc, OPT_Wsuggest_attribute_format,
+	    warning_at (rhs_range, OPT_Wsuggest_attribute_format,
 			"argument %d of %qE might be "
 			"a candidate for a format attribute",
 			parmnum, rname);
@@ -6234,7 +6243,7 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 
 	      if (TYPE_QUALS_NO_ADDR_SPACE_NO_ATOMIC (ttr)
 		  & ~TYPE_QUALS_NO_ADDR_SPACE_NO_ATOMIC (ttl))
-		WARNING_FOR_QUALIFIERS (location, expr_loc,
+		WARNING_FOR_QUALIFIERS (location, rhs_range,
 				        OPT_Wdiscarded_array_qualifiers,
 				        G_("passing argument %d of %qE discards "
 					   "%qv qualifier from pointer target type"),
@@ -6252,7 +6261,7 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 		  (VOID_TYPE_P (ttr)
 		   && !null_pointer_constant
 		   && TREE_CODE (ttl) == FUNCTION_TYPE)))
-	    PEDWARN_FOR_ASSIGNMENT (location, expr_loc, OPT_Wpedantic,
+	    PEDWARN_FOR_ASSIGNMENT (location, rhs_range, OPT_Wpedantic,
 				    G_("ISO C forbids passing argument %d of "
 				       "%qE between function pointer "
 				       "and %<void *%>"),
@@ -6277,7 +6286,7 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 	      if (TYPE_QUALS_NO_ADDR_SPACE_NO_ATOMIC (ttr)
 		  & ~TYPE_QUALS_NO_ADDR_SPACE_NO_ATOMIC (ttl))
 		{
-		  PEDWARN_FOR_QUALIFIERS (location, expr_loc,
+		  PEDWARN_FOR_QUALIFIERS (location, rhs_range,
 				          OPT_Wdiscarded_qualifiers,
 				          G_("passing argument %d of %qE discards "
 					     "%qv qualifier from pointer target type"),
@@ -6296,7 +6305,7 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 		;
 	      /* If there is a mismatch, do warn.  */
 	      else if (warn_pointer_sign)
-		 PEDWARN_FOR_ASSIGNMENT (location, expr_loc, OPT_Wpointer_sign,
+		 PEDWARN_FOR_ASSIGNMENT (location, rhs_range, OPT_Wpointer_sign,
 				         G_("pointer targets in passing argument "
 					    "%d of %qE differ in signedness"),
 				         G_("pointer targets in assignment "
@@ -6315,7 +6324,7 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 		 where an ordinary one is wanted, but not vice-versa.  */
 	      if (TYPE_QUALS_NO_ADDR_SPACE (ttl)
 		  & ~TYPE_QUALS_NO_ADDR_SPACE (ttr))
-		PEDWARN_FOR_QUALIFIERS (location, expr_loc,
+		PEDWARN_FOR_QUALIFIERS (location, rhs_range,
 				        OPT_Wdiscarded_qualifiers,
 				        G_("passing argument %d of %qE makes "
 					   "%q#v qualified function pointer "
@@ -6332,7 +6341,7 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
       else
 	/* Avoid warning about the volatile ObjC EH puts on decls.  */
 	if (!objc_ok)
-	  PEDWARN_FOR_ASSIGNMENT (location, expr_loc,
+	  PEDWARN_FOR_ASSIGNMENT (location, rhs_range,
 			          OPT_Wincompatible_pointer_types,
 			          G_("passing argument %d of %qE from "
 				     "incompatible pointer type"),
@@ -6356,7 +6365,7 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
 	 or one that results from arithmetic, even including
 	 a cast to integer type.  */
       if (!null_pointer_constant)
-	PEDWARN_FOR_ASSIGNMENT (location, expr_loc,
+	PEDWARN_FOR_ASSIGNMENT (location, rhs_range,
 			        OPT_Wint_conversion,
 			        G_("passing argument %d of %qE makes "
 				   "pointer from integer without a cast"),
@@ -6371,7 +6380,7 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
     }
   else if (codel == INTEGER_TYPE && coder == POINTER_TYPE)
     {
-      PEDWARN_FOR_ASSIGNMENT (location, expr_loc,
+      PEDWARN_FOR_ASSIGNMENT (location, rhs_range,
 			      OPT_Wint_conversion,
 			      G_("passing argument %d of %qE makes integer "
 			         "from pointer without a cast"),
@@ -6396,7 +6405,7 @@ convert_for_assignment (location_t location, location_t expr_loc, tree type,
   switch (errtype)
     {
     case ic_argpass:
-      error_at (expr_loc, "incompatible type for argument %d of %qE", parmnum,
+      error_at (rhs_range, "incompatible type for argument %d of %qE", parmnum,
 		rname);
       inform ((fundecl && !DECL_IS_BUILTIN (fundecl))
 	      ? DECL_SOURCE_LOCATION (fundecl) : expr_loc,
@@ -9396,8 +9405,14 @@ c_finish_return (location_t loc, tree retval, tree origtype)
   bool npc = false;
   size_t rank = 0;
 
+  source_range expr_range;
+  if (retval && CAN_HAVE_RANGE_P (retval))
+    expr_range = EXPR_LOCATION_RANGE (retval);
+  else
+    expr_range = source_range::from_location (loc);
+
   if (TREE_THIS_VOLATILE (current_function_decl))
-    warning_at (loc, 0,
+    warning_at (expr_range, 0,
 		"function declared %<noreturn%> has a %<return%> statement");
 
   if (flag_cilkplus && contains_array_notation_expr (retval))
@@ -9408,14 +9423,15 @@ c_finish_return (location_t loc, tree retval, tree origtype)
 	return error_mark_node;
       if (rank >= 1)
 	{
-	  error_at (loc, "array notation expression cannot be used as a "
+	  error_at (expr_range, "array notation expression cannot be used as a "
 		    "return value");
 	  return error_mark_node;
 	}
     }
   if (flag_cilkplus && retval && contains_cilk_spawn_stmt (retval))
     {
-      error_at (loc, "use of %<_Cilk_spawn%> in a return statement is not "
+      error_at (expr_range,
+		"use of %<_Cilk_spawn%> in a return statement is not "
 		"allowed");
       return error_mark_node;
     }
@@ -9439,24 +9455,36 @@ c_finish_return (location_t loc, tree retval, tree origtype)
       if ((warn_return_type || flag_isoc99)
 	  && valtype != 0 && TREE_CODE (valtype) != VOID_TYPE)
 	{
+	  bool warned_here;
 	  if (flag_isoc99)
-	    pedwarn (loc, 0, "%<return%> with no value, in "
-		     "function returning non-void");
+	    warned_here = pedwarn
+	      (loc, 0,
+	       "%<return%> with no value, in function returning non-void");
 	  else
-	    warning_at (loc, OPT_Wreturn_type, "%<return%> with no value, "
-			"in function returning non-void");
-	  no_warning = true;
+	    warned_here =
+	      warning_at (loc, OPT_Wreturn_type,
+			  "%<return%> with no value, "
+			  "in function returning non-void");
+	  if (warned_here)
+	    inform (DECL_SOURCE_LOCATION (current_function_decl),
+		    "declared here");
 	}
     }
   else if (valtype == 0 || TREE_CODE (valtype) == VOID_TYPE)
     {
       current_function_returns_null = 1;
+      bool warned_here;
       if (TREE_CODE (TREE_TYPE (retval)) != VOID_TYPE)
-	pedwarn (loc, 0,
-		 "%<return%> with a value, in function returning void");
+	warned_here = pedwarn
+	  (expr_range, 0,
+	   "%<return%> with a value, in function returning void");
       else
-	pedwarn (loc, OPT_Wpedantic, "ISO C forbids "
-		 "%<return%> with expression, in function returning void");
+	warned_here = pedwarn
+	  (expr_range, OPT_Wpedantic, "ISO C forbids "
+	   "%<return%> with expression, in function returning void");
+      if (warned_here)
+	inform (DECL_SOURCE_LOCATION (current_function_decl),
+		"declared here");
     }
   else
     {
@@ -9532,11 +9560,11 @@ c_finish_return (location_t loc, tree retval, tree origtype)
 		  && DECL_CONTEXT (inner) == current_function_decl)
 		{
 		  if (TREE_CODE (inner) == LABEL_DECL)
-		    warning_at (loc, OPT_Wreturn_local_addr,
+		    warning_at (expr_range, OPT_Wreturn_local_addr,
 				"function returns address of label");
 		  else
 		    {
-		      warning_at (loc, OPT_Wreturn_local_addr,
+		      warning_at (expr_range, OPT_Wreturn_local_addr,
 				  "function returns address of local variable");
 		      tree zero = build_zero_cst (TREE_TYPE (res));
 		      t = build2 (COMPOUND_EXPR, TREE_TYPE (res), t, zero);
@@ -11055,7 +11083,7 @@ build_binary_op (location_t location, enum tree_code code,
       && (!tree_int_cst_equal (TYPE_SIZE (type0), TYPE_SIZE (type1))
 	  || !vector_types_compatible_elements_p (type0, type1)))
     {
-      binary_op_error (location, code, type0, type1);
+      binary_op_error (location, code, orig_op0, orig_op1, type0, type1);
       return error_mark_node;
     }
 
@@ -11294,7 +11322,8 @@ build_binary_op (location_t location, enum tree_code code,
 
   if (!result_type)
     {
-      binary_op_error (location, code, TREE_TYPE (op0), TREE_TYPE (op1));
+      binary_op_error (location, code, orig_op0, orig_op1,
+		       TREE_TYPE (op0), TREE_TYPE (op1));
       return error_mark_node;
     }
 
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index 388558c..b5a131c 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -4841,7 +4841,8 @@ cp_build_binary_op (location_t location,
 	      || !vector_types_compatible_elements_p (type0, type1))
 	    {
 	      if (complain & tf_error)
-		binary_op_error (location, code, type0, type1);
+		binary_op_error (location, code, orig_op0, orig_op1,
+				 type0, type1);
 	      return error_mark_node;
 	    }
 	  arithmetic_types_p = 1;
diff --git a/gcc/testsuite/gcc.dg/diagnostic-tree-expr-ranges.c b/gcc/testsuite/gcc.dg/diagnostic-tree-expr-ranges.c
new file mode 100644
index 0000000..10ab7db
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/diagnostic-tree-expr-ranges.c
@@ -0,0 +1,159 @@
+/* { dg-options "-fdiagnostics-show-caret -Wreturn-local-addr" } */
+
+/* Verify that various diagnostics show source code ranges.  */
+
+/* These ones make use of tree ranges.  */
+
+void test_nonconst_static_assert (int i)
+{
+  _Static_assert (i > 0, "message"); /* { dg-error "expression in static assertion is not constant" } */
+/*
+{ dg-begin-multiline-output "" }
+   _Static_assert (i > 0, "message");
+                   ^~~~~
+{ dg-end-multiline-output "" }
+*/
+}
+
+extern void test_callee (int first, const char *second, int third);
+
+void test_bad_argument_types (int first, int second, int third)
+{
+  test_callee (first, first + second + third, third); /* { dg-warning "passing argument 2 of 'test_callee' makes pointer from integer without a cast" }  */
+
+/*
+{ dg-begin-multiline-output "" }
+   test_callee (first, first + second + third, third);
+                       ^~~~~~~~~~~~~~~~~~~~~~
+{ dg-end-multiline-output "" }
+{ dg-begin-multiline-output "" }
+ extern void test_callee (int first, const char *second, int third);
+             ^
+{ dg-end-multiline-output "" }
+FIXME: we ought to highlight the specific param in the decl
+*/
+
+  test_callee (first, &first, third); /* { dg-warning "passing argument 2 of 'test_callee' from incompatible pointer type" } */
+
+/*
+{ dg-begin-multiline-output "" }
+   test_callee (first, &first, third);
+                       ^~~~~~
+{ dg-end-multiline-output "" }
+{ dg-begin-multiline-output "" }
+ extern void test_callee (int first, const char *second, int third);
+             ^
+{ dg-end-multiline-output "" }
+FIXME: again, we ought to highlight the specific param in the decl
+*/
+
+  test_callee ("hello world", "", third); /* { dg-warning "passing argument 1 of 'test_callee' makes integer from pointer without a cast" } */
+
+/*
+{ dg-begin-multiline-output "" }
+   test_callee ("hello world", "", third);
+                ^~~~~~~~~~~~~
+{ dg-end-multiline-output "" }
+{ dg-begin-multiline-output "" }
+ extern void test_callee (int first, const char *second, int third);
+             ^
+{ dg-end-multiline-output "" }
+FIXME: and again, we ought to highlight the specific param in the decl
+*/
+
+}
+
+/* Adapted from https://gcc.gnu.org/wiki/ClangDiagnosticsComparison */
+
+void call_of_non_function_ptr (char **argP, char **argQ)
+{
+  (argP - argQ)(); /* { dg-error "called object is not a function or function pointer" } */
+
+/* { dg-begin-multiline-output "" }
+   (argP - argQ)();
+   ^~~~~~~~~~~~~
+   { dg-end-multiline-output "" } */
+
+  argP();       /* { dg-error "called object 'argP' is not a function or function pointer" } */
+
+/* { dg-begin-multiline-output "" }
+   argP();
+   ^~~~
+   { dg-end-multiline-output "" }
+   { dg-begin-multiline-output "" }
+ void call_of_non_function_ptr (char **argP, char **argQ)
+                                       ^
+   { dg-end-multiline-output "" } */
+  /* TODO: underline all of "argP" in the decl above. */
+}
+
+/* Adapted from https://gcc.gnu.org/wiki/ClangDiagnosticsComparison */
+
+typedef float __m128;
+void bad_binary_op ()
+{
+  __m128 myvec[2];
+  int const *ptr;
+  myvec[1]/ptr; /* { dg-error "invalid operands to binary /" } */
+
+/*
+{ dg-begin-multiline-output "" }
+   myvec[1]/ptr;
+   ~~~~~~~~^~~~
+{ dg-end-multiline-output "" } */
+
+}
+
+struct s {};
+struct t {};
+extern struct s some_function (void);
+extern struct t some_other_function (void);
+
+int another_bad_binary_op (void)
+{
+  return (some_function ()
+	  + some_other_function ()); /* { dg-error "invalid operands to binary \\+" } */
+
+/* { dg-begin-multiline-output "" }
+   return (some_function ()|
+           ~~~~~~~~~~~~~~~~+'struct s'
+    + some_other_function ());                                                  |
+    ^ ~~~~~~~~~~~~~~~~~~~~~~                                                    +'struct t'
+   { dg-end-multiline-output "" } */
+}
+
+void surplus_return_when_void (void)
+{
+  return 500; /* { dg-warning "'return' with a value, in function returning void" } */
+/* { dg-begin-multiline-output "" }
+   return 500;
+          ^~~
+   { dg-end-multiline-output "" } */
+/* { dg-begin-multiline-output "" }
+ void surplus_return_when_void (void)
+      ^
+   { dg-end-multiline-output "" } */
+}
+
+int missing_return_value (void)
+{
+  return; /* { dg-warning "'return' with no value, in function returning non-void" } */
+/* { dg-begin-multiline-output "" }
+   return;
+   ^
+   { dg-end-multiline-output "" } */
+/* { dg-begin-multiline-output "" }
+ int missing_return_value (void)
+     ^
+   { dg-end-multiline-output "" } */
+}
+
+int *address_of_local (void)
+{
+  int i;
+  return &i; /* { dg-warning "function returns address of local variable" } */
+/* { dg-begin-multiline-output "" }
+   return &i;
+          ^~
+   { dg-end-multiline-output "" } */
+}
-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 133+ messages in thread

* [PATCH 17/22] libcpp: add location tracking within string literals
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (17 preceding siblings ...)
  2015-09-10 20:31 ` [PATCH 18/22] Track locations within string literals in tree_string David Malcolm
@ 2015-09-10 20:32 ` David Malcolm
  2015-09-10 20:32 ` [PATCH 21/22] Use Levenshtein distance for various misspellings in C frontend David Malcolm
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

This has not been optimized yet.
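
To illustrate what the methods named in the ChangeLog below are for,
here is a rough standalone sketch of per-character location tracking.
It is not the implementation in this patch: "toy_string_location" is a
hypothetical class that simply stores one source_range per execution
character in a std::vector, and location_t is reduced to a column
number.  The method names follow the ChangeLog, but their signatures
here are assumptions.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for GCC's location_t: here just a column number.
typedef unsigned int location_t;

struct source_range
{
  location_t m_start;
  location_t m_finish;
};

// Hypothetical simplification: one source_range per execution character.
class toy_string_location
{
public:
  // Record that the next character came from RANGE in the source,
  // e.g. a four-column escape such as \x41.
  void add_char_at (source_range range) { m_ranges.push_back (range); }

  // Record N consecutive one-column characters starting at START.
  void add_n_chars_at (location_t start, unsigned int n)
  {
    for (unsigned int i = 0; i < n; i++)
      {
        source_range r = { start + i, start + i };
        m_ranges.push_back (r);
      }
  }

  location_t get_loc_at_index (unsigned int idx) const
  { return m_ranges.at (idx).m_start; }

  source_range get_range_at_index (unsigned int idx) const
  { return m_ranges.at (idx); }

  // Both START_IDX and FINISH_IDX are inclusive.
  source_range get_range_between_indices (unsigned int start_idx,
                                          unsigned int finish_idx) const
  {
    source_range result;
    result.m_start = get_loc_at_index (start_idx);
    result.m_finish = get_range_at_index (finish_idx).m_finish;
    return result;
  }

  // True if every character occupies exactly one column and the
  // columns are contiguous, i.e. no escapes or multi-column characters.
  bool trivial_p () const
  {
    for (std::size_t i = 0; i < m_ranges.size (); i++)
      {
        if (m_ranges[i].m_start != m_ranges[i].m_finish)
          return false;
        if (i > 0 && m_ranges[i].m_start != m_ranges[i - 1].m_finish + 1)
          return false;
      }
    return true;
  }

private:
  std::vector<source_range> m_ranges;
};
```

For a literal such as "a\x41b" whose first character sits at column 11,
the three execution characters would map to columns 11, 12-15 and 16,
so the string is not trivial, and the range between indices 1 and 2
would run from column 12 to column 16.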

gcc/c-family/ChangeLog:
	* c-common.c (fname_as_string): Initialize loc field of "cstr",
	and call init_raw on strname.loc.
	* c-lex.c (cb_ident): Initialize loc field of "cstr".

libcpp/ChangeLog:
	* charset.c (struct _cpp_strbuf): Add cpp_string_location field
	"loc".
	(conversion_loop): Add "loc_reader" param and, if non-NULL, call its
	add_char_at method.
	(convert_utf8_utf16): Add "loc_reader" param and pass it to
	conversion_loop.
	(convert_utf8_utf32): Likewise.
	(convert_utf16_utf8): Likewise.
	(convert_utf32_utf8): Likewise.
	(convert_no_conversion): Add "loc_reader" param and, if non-NULL,
	call its add_n_chars_at method.
	(convert_using_iconv): Add dummy cpp_string_location_reader *
	param.
	(APPLY_CONVERSION): Add LOC_READER param.
	(cpp_host_to_exec_charset): Call init on tbuf's loc.
	(_cpp_valid_ucn): Add "char_range" and "loc_reader" params.  Write
	back to "char_range".
	(convert_ucn): Add "char_range" and "loc_reader" params, passing
	them to _cpp_valid_ucn call and to APPLY_CONVERSION site.
	(convert_hex): Add "char_range" and "loc_reader" params; use them
	to track source range information.
	(convert_oct): Likewise.
	(convert_escape): Add loc_reader param and use it to track source
	range information.
	(cpp_interpret_string): Initialize tbuf.loc.  Create an on-stack
	cpp_string_location_reader and use it to track source range
	information.
	(cpp_interpret_charconst): Initialize str.loc.
	(_cpp_convert_input): Initialize to.loc.  Add NULL when calling
	APPLY_CONVERSION.
	(cpp_string_location::init): New method.
	(cpp_string_location::init_raw): New method.
	(cpp_string_location::add_char_at): New method.
	(cpp_string_location::add_n_chars_at): New method.
	(cpp_string_location::get_loc_at_index): New method.
	(cpp_string_location::get_range_at_index): New method.
	(cpp_string_location::trivial_p): New method.
	(cpp_string_location_reader::cpp_string_location_reader): New ctors.
	(cpp_string_location_reader::get_next): New method.
	* directives.c (do_line): Initialize s.loc.
	(do_linemarker): Likewise.
	* expr.c (_cpp_parse_expr): Call init_raw on the token's str.loc.
	* include/cpplib.h (struct cpp_string_fragment_location): New struct.
	(struct cpp_string_location): New struct.
	(class cpp_string_location_reader): New class.
	(struct cpp_string): Add field "loc", a cpp_string_location.
	* internal.h (convert_f): Add cpp_string_location_reader * param.
	(_cpp_valid_ucn): Add source_range * param.
	* lex.c (forms_identifier_p): Add NULL argument to _cpp_valid_ucn.
	(lex_number): Initialize number->loc.
	(create_literal): Call init_raw on the token's str.loc.
	* macro.c (new_string_token): Call init on the token's str.loc.
---
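For readers skimming the diff, here is a minimal standalone sketch of the
fragment idea described in the cpplib.h comment: characters are appended one
range at a time, a run is extended while the next range is exactly the slot
the current fragment would cover next, and lookup walks the fragments.  It is
not the patch's code.  The names loc_t, range_t, fragment, string_loc and
build_printf_example are hypothetical, plain column numbers stand in for
source_location values, and std::vector replaces the hand-rolled array that
the patch grows via the line-table reallocator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned int loc_t;               /* stand-in for source_location */

struct range_t { loc_t start, finish; };  /* inclusive, like source_range */

/* A run of LEN chars starting at LOC, each COLS columns wide.  */
struct fragment
{
  loc_t loc;
  unsigned len;
  unsigned cols;

  range_t char_range (unsigned idx) const
  {
    range_t r;
    r.start = loc + idx * cols;
    r.finish = r.start + cols - 1;
    return r;
  }
};

struct string_loc
{
  std::vector<fragment> frags;

  /* Append one char.  Extend the last fragment when RANGE is exactly
     the slot that fragment would cover next; otherwise start a new one.  */
  void add_char_at (range_t range)
  {
    if (!frags.empty ())
      {
	fragment &cur = frags.back ();
	range_t next = cur.char_range (cur.len);
	if (range.start == next.start && range.finish == next.finish)
	  {
	    cur.len++;
	    return;
	  }
      }
    fragment f = { range.start, 1, range.finish + 1 - range.start };
    frags.push_back (f);
  }

  /* Walk the fragments, subtracting lengths until IDX falls inside one.
     Returns {0, 0} when IDX is past the end.  */
  range_t range_at (unsigned idx) const
  {
    for (std::size_t i = 0; i < frags.size (); i++)
      {
	if (idx < frags[i].len)
	  return frags[i].char_range (idx);
	idx -= frags[i].len;
      }
    range_t err = { 0, 0 };
    return err;
  }
};

/* Columns as in:  printf ("foo \x25\151 bar"  "baz",  */
string_loc
build_printf_example ()
{
  string_loc s;
  for (loc_t c = 9; c <= 12; c++)	/* "foo " */
    s.add_char_at ({ c, c });
  s.add_char_at ({ 13, 16 });		/* \x25 gives '%' */
  s.add_char_at ({ 17, 20 });		/* \151 gives 'i' */
  for (loc_t c = 21; c <= 24; c++)	/* " bar" */
    s.add_char_at ({ c, c });
  for (loc_t c = 29; c <= 31; c++)	/* "baz" */
    s.add_char_at ({ c, c });
  return s;
}
```

Under these assumptions build_printf_example () ends up with four fragments,
and range_at (5), the 'i' of "foo %i barbaz", maps back to columns 17-20,
i.e. the whole of the \151 escape.  Lookup is linear in the number of
fragments, which seems acceptable while most strings stay at one fragment.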
 gcc/c-family/c-common.c |   3 +-
 gcc/c-family/c-lex.c    |   2 +-
 libcpp/charset.c        | 345 ++++++++++++++++++++++++++++++++++++++++++------
 libcpp/directives.c     |   4 +-
 libcpp/expr.c           |   2 +
 libcpp/include/cpplib.h | 134 +++++++++++++++++++
 libcpp/internal.h       |   7 +-
 libcpp/lex.c            |  12 +-
 libcpp/macro.c          |   1 +
 9 files changed, 465 insertions(+), 45 deletions(-)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 77962fc..a430bee 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -935,7 +935,7 @@ fname_as_string (int pretty_p)
   const char *name = "top level";
   char *namep;
   int vrb = 2, len;
-  cpp_string cstr = { 0, 0 }, strname;
+  cpp_string cstr = { 0, 0, {NULL, 0, 0} }, strname;
 
   if (!pretty_p)
     {
@@ -952,6 +952,7 @@ fname_as_string (int pretty_p)
   snprintf (namep, len, "\"%s\"", name);
   strname.text = (unsigned char *) namep;
   strname.len = len - 1;
+  strname.loc.init_raw (UNKNOWN_LOCATION, len, 1, line_table);
 
   if (cpp_interpret_string (parse_in, &strname, 1, &cstr, CPP_STRING))
     {
diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
index 1334994..f457199 100644
--- a/gcc/c-family/c-lex.c
+++ b/gcc/c-family/c-lex.c
@@ -171,7 +171,7 @@ cb_ident (cpp_reader * ARG_UNUSED (pfile),
   if (!flag_no_ident)
     {
       /* Convert escapes in the string.  */
-      cpp_string cstr = { 0, 0 };
+      cpp_string cstr = { 0, 0, { NULL, 0, 0 } };
       if (cpp_interpret_string (pfile, str, 1, &cstr, CPP_STRING))
 	{
 	  targetm.asm_out.output_ident ((const char *) cstr.text);
diff --git a/libcpp/charset.c b/libcpp/charset.c
index 5a1c929..3ae7916 100644
--- a/libcpp/charset.c
+++ b/libcpp/charset.c
@@ -99,6 +99,7 @@ struct _cpp_strbuf
   uchar *text;
   size_t asize;
   size_t len;
+  cpp_string_location loc;
 };
 
 /* This is enough to hold any string that fits on a single 80-column
@@ -453,7 +454,8 @@ one_utf16_to_utf8 (iconv_t bigend, const uchar **inbufp, size_t *inbytesleftp,
 static inline bool
 conversion_loop (int (*const one_conversion)(iconv_t, const uchar **, size_t *,
 					     uchar **, size_t *),
-		 iconv_t cd, const uchar *from, size_t flen, struct _cpp_strbuf *to)
+		 iconv_t cd, const uchar *from, size_t flen, struct _cpp_strbuf *to,
+		 cpp_string_location_reader *loc_reader)
 {
   const uchar *inbuf;
   uchar *outbuf;
@@ -468,8 +470,13 @@ conversion_loop (int (*const one_conversion)(iconv_t, const uchar **, size_t *,
   for (;;)
     {
       do
-	rval = one_conversion (cd, &inbuf, &inbytesleft,
-			       &outbuf, &outbytesleft);
+	{
+	  rval = one_conversion (cd, &inbuf, &inbytesleft,
+				 &outbuf, &outbytesleft);
+	  if (loc_reader)
+	    to->loc.add_char_at (loc_reader->get_next (),
+				 loc_reader->get_line_maps ());
+	}
       while (inbytesleft && !rval);
 
       if (__builtin_expect (inbytesleft == 0, 1))
@@ -503,36 +510,37 @@ conversion_loop (int (*const one_conversion)(iconv_t, const uchar **, size_t *,
 /* These four use the custom conversion code above.  */
 static bool
 convert_utf8_utf16 (iconv_t cd, const uchar *from, size_t flen,
-		    struct _cpp_strbuf *to)
+		    struct _cpp_strbuf *to, cpp_string_location_reader *loc_reader)
 {
-  return conversion_loop (one_utf8_to_utf16, cd, from, flen, to);
+  return conversion_loop (one_utf8_to_utf16, cd, from, flen, to, loc_reader);
 }
 
 static bool
 convert_utf8_utf32 (iconv_t cd, const uchar *from, size_t flen,
-		    struct _cpp_strbuf *to)
+		    struct _cpp_strbuf *to, cpp_string_location_reader *loc_reader)
 {
-  return conversion_loop (one_utf8_to_utf32, cd, from, flen, to);
+  return conversion_loop (one_utf8_to_utf32, cd, from, flen, to, loc_reader);
 }
 
 static bool
 convert_utf16_utf8 (iconv_t cd, const uchar *from, size_t flen,
-		    struct _cpp_strbuf *to)
+		    struct _cpp_strbuf *to, cpp_string_location_reader *loc_reader)
 {
-  return conversion_loop (one_utf16_to_utf8, cd, from, flen, to);
+  return conversion_loop (one_utf16_to_utf8, cd, from, flen, to, loc_reader);
 }
 
 static bool
 convert_utf32_utf8 (iconv_t cd, const uchar *from, size_t flen,
-		    struct _cpp_strbuf *to)
+		    struct _cpp_strbuf *to, cpp_string_location_reader *loc_reader)
 {
-  return conversion_loop (one_utf32_to_utf8, cd, from, flen, to);
+  return conversion_loop (one_utf32_to_utf8, cd, from, flen, to, loc_reader);
 }
 
 /* Identity conversion, used when we have no alternative.  */
 static bool
 convert_no_conversion (iconv_t cd ATTRIBUTE_UNUSED,
-		       const uchar *from, size_t flen, struct _cpp_strbuf *to)
+		       const uchar *from, size_t flen, struct _cpp_strbuf *to,
+		       cpp_string_location_reader *loc_reader)
 {
   if (to->len + flen > to->asize)
     {
@@ -542,6 +550,7 @@ convert_no_conversion (iconv_t cd ATTRIBUTE_UNUSED,
     }
   memcpy (to->text + to->len, from, flen);
   to->len += flen;
+  to->loc.add_n_chars_at (flen, loc_reader);
   return true;
 }
 
@@ -559,7 +568,8 @@ convert_no_conversion (iconv_t cd ATTRIBUTE_UNUSED,
 
 static bool
 convert_using_iconv (iconv_t cd, const uchar *from, size_t flen,
-		     struct _cpp_strbuf *to)
+		     struct _cpp_strbuf *to,
+		     cpp_string_location_reader */*loc_reader*/)
 {
   ICONV_CONST char *inbuf;
   char *outbuf;
@@ -606,8 +616,8 @@ convert_using_iconv (iconv_t cd, const uchar *from, size_t flen,
 /* Arrange for the above custom conversion logic to be used automatically
    when conversion between a suitable pair of character sets is requested.  */
 
-#define APPLY_CONVERSION(CONVERTER, FROM, FLEN, TO) \
-   CONVERTER.func (CONVERTER.cd, FROM, FLEN, TO)
+#define APPLY_CONVERSION(CONVERTER, FROM, FLEN, TO, LOC_READER)	\
+  CONVERTER.func (CONVERTER.cd, FROM, FLEN, TO, LOC_READER)
 
 struct cpp_conversion
 {
@@ -792,8 +802,9 @@ cpp_host_to_exec_charset (cpp_reader *pfile, cppchar_t c)
   tbuf.asize = 1;
   tbuf.text = XNEWVEC (uchar, tbuf.asize);
   tbuf.len = 0;
+  tbuf.loc.init ();
 
-  if (!APPLY_CONVERSION (pfile->narrow_cset_desc, sbuf, 1, &tbuf))
+  if (!APPLY_CONVERSION (pfile->narrow_cset_desc, sbuf, 1, &tbuf, NULL))
     {
       cpp_errno (pfile, CPP_DL_ICE, "converting to execution character set");
       return 0;
@@ -985,7 +996,9 @@ ucn_valid_in_identifier (cpp_reader *pfile, cppchar_t c,
 bool
 _cpp_valid_ucn (cpp_reader *pfile, const uchar **pstr,
 		const uchar *limit, int identifier_pos,
-		struct normalize_state *nst, cppchar_t *cp)
+		struct normalize_state *nst, cppchar_t *cp,
+		source_range *char_range,
+		cpp_string_location_reader *loc_reader)
 {
   cppchar_t result, c;
   unsigned int length;
@@ -1021,6 +1034,8 @@ _cpp_valid_ucn (cpp_reader *pfile, const uchar **pstr,
       if (!ISXDIGIT (c))
 	break;
       str++;
+      if (char_range)
+	char_range->m_finish = loc_reader->get_next ().m_finish;
       result = (result << 4) + hex_value (c);
     }
   while (--length && str < limit);
@@ -1090,7 +1105,9 @@ _cpp_valid_ucn (cpp_reader *pfile, const uchar **pstr,
    An advanced pointer is returned.  Issues all relevant diagnostics.  */
 static const uchar *
 convert_ucn (cpp_reader *pfile, const uchar *from, const uchar *limit,
-	     struct _cpp_strbuf *tbuf, struct cset_converter cvt)
+	     struct _cpp_strbuf *tbuf, struct cset_converter cvt,
+	     source_range char_range,
+	     cpp_string_location_reader *loc_reader)
 {
   cppchar_t ucn;
   uchar buf[6];
@@ -1100,7 +1117,12 @@ convert_ucn (cpp_reader *pfile, const uchar *from, const uchar *limit,
   struct normalize_state nst = INITIAL_NORMALIZE_STATE;
 
   from++;  /* Skip u/U.  */
-  _cpp_valid_ucn (pfile, &from, limit, 0, &nst, &ucn);
+
+  /* The u/U is part of the spelling of this character.  */
+  char_range.m_finish = loc_reader->get_next ().m_finish;
+
+  _cpp_valid_ucn (pfile, &from, limit, 0, &nst,
+		  &ucn, &char_range, loc_reader);
 
   rval = one_cppchar_to_utf8 (ucn, &bufp, &bytesleft);
   if (rval)
@@ -1109,9 +1131,18 @@ convert_ucn (cpp_reader *pfile, const uchar *from, const uchar *limit,
       cpp_errno (pfile, CPP_DL_ERROR,
 		 "converting UCN to source character set");
     }
-  else if (!APPLY_CONVERSION (cvt, buf, 6 - bytesleft, tbuf))
-    cpp_errno (pfile, CPP_DL_ERROR,
-	       "converting UCN to execution character set");
+  else
+    {
+      /* Set up a cpp_string_location_reader to supply a
+	 location for the single character, covering all of
+	 char_range.  */
+      cpp_string_location_reader buf_loc_reader
+	(char_range.m_start, char_range.m_finish + 1 - char_range.m_start,
+	 loc_reader->get_line_maps ());
+      if (!APPLY_CONVERSION (cvt, buf, 6 - bytesleft, tbuf, &buf_loc_reader))
+	cpp_errno (pfile, CPP_DL_ERROR,
+		   "converting UCN to execution character set");
+    }
 
   return from;
 }
@@ -1174,7 +1205,9 @@ emit_numeric_escape (cpp_reader *pfile, cppchar_t n,
    number.  You can, e.g. generate surrogate pairs this way.  */
 static const uchar *
 convert_hex (cpp_reader *pfile, const uchar *from, const uchar *limit,
-	     struct _cpp_strbuf *tbuf, struct cset_converter cvt)
+	     struct _cpp_strbuf *tbuf, struct cset_converter cvt,
+	     source_range char_range,
+	     cpp_string_location_reader *loc_reader)
 {
   cppchar_t c, n = 0, overflow = 0;
   int digits_found = 0;
@@ -1185,13 +1218,19 @@ convert_hex (cpp_reader *pfile, const uchar *from, const uchar *limit,
     cpp_warning (pfile, CPP_W_TRADITIONAL,
 	         "the meaning of '\\x' is different in traditional C");
 
-  from++;  /* Skip 'x'.  */
+  /* Skip 'x'.  */
+  from++;
+
+  /* The 'x' is part of the spelling of this character.  */
+  char_range.m_finish = loc_reader->get_next ().m_finish;
+
   while (from < limit)
     {
       c = *from;
       if (! hex_p (c))
 	break;
       from++;
+      char_range.m_finish = loc_reader->get_next ().m_finish;
       overflow |= n ^ (n << 4 >> 4);
       n = (n << 4) + hex_value (c);
       digits_found = 1;
@@ -1213,6 +1252,9 @@ convert_hex (cpp_reader *pfile, const uchar *from, const uchar *limit,
 
   emit_numeric_escape (pfile, n, tbuf, cvt);
 
+  tbuf->loc.add_char_at (char_range,
+			 pfile->line_table);
+
   return from;
 }
 
@@ -1224,7 +1266,9 @@ convert_hex (cpp_reader *pfile, const uchar *from, const uchar *limit,
    number.  */
 static const uchar *
 convert_oct (cpp_reader *pfile, const uchar *from, const uchar *limit,
-	     struct _cpp_strbuf *tbuf, struct cset_converter cvt)
+	     struct _cpp_strbuf *tbuf, struct cset_converter cvt,
+	     source_range char_range,
+	     cpp_string_location_reader *loc_reader)
 {
   size_t count = 0;
   cppchar_t c, n = 0;
@@ -1238,6 +1282,7 @@ convert_oct (cpp_reader *pfile, const uchar *from, const uchar *limit,
       if (c < '0' || c > '7')
 	break;
       from++;
+      char_range.m_finish = loc_reader->get_next ().m_finish;
       overflow |= n ^ (n << 3 >> 3);
       n = (n << 3) + c - '0';
     }
@@ -1251,6 +1296,9 @@ convert_oct (cpp_reader *pfile, const uchar *from, const uchar *limit,
 
   emit_numeric_escape (pfile, n, tbuf, cvt);
 
+  tbuf->loc.add_char_at (char_range,
+			 pfile->line_table);
+
   return from;
 }
 
@@ -1260,7 +1308,8 @@ convert_oct (cpp_reader *pfile, const uchar *from, const uchar *limit,
    pointer.  Handles all relevant diagnostics.  */
 static const uchar *
 convert_escape (cpp_reader *pfile, const uchar *from, const uchar *limit,
-		struct _cpp_strbuf *tbuf, struct cset_converter cvt)
+		struct _cpp_strbuf *tbuf, struct cset_converter cvt,
+		cpp_string_location_reader *loc_reader)
 {
   /* Values of \a \b \e \f \n \r \t \v respectively.  */
 #if HOST_CHARSET == HOST_CHARSET_ASCII
@@ -1273,20 +1322,26 @@ convert_escape (cpp_reader *pfile, const uchar *from, const uchar *limit,
 
   uchar c;
 
+  /* Record the location of the backslash.  */
+  source_range char_range = loc_reader->get_next ();
+
   c = *from;
   switch (c)
     {
       /* UCNs, hex escapes, and octal escapes are processed separately.  */
     case 'u': case 'U':
-      return convert_ucn (pfile, from, limit, tbuf, cvt);
+      return convert_ucn (pfile, from, limit, tbuf, cvt,
+			  char_range, loc_reader);
 
     case 'x':
-      return convert_hex (pfile, from, limit, tbuf, cvt);
+      return convert_hex (pfile, from, limit, tbuf, cvt,
+			  char_range, loc_reader);
       break;
 
     case '0':  case '1':  case '2':  case '3':
     case '4':  case '5':  case '6':  case '7':
-      return convert_oct (pfile, from, limit, tbuf, cvt);
+      return convert_oct (pfile, from, limit, tbuf, cvt,
+			  char_range, loc_reader);
 
       /* Various letter escapes.  Get the appropriate host-charset
 	 value into C.  */
@@ -1339,7 +1394,7 @@ convert_escape (cpp_reader *pfile, const uchar *from, const uchar *limit,
     }
 
   /* Now convert what we have to the execution character set.  */
-  if (!APPLY_CONVERSION (cvt, &c, 1, tbuf))
+  if (!APPLY_CONVERSION (cvt, &c, 1, tbuf, loc_reader))
     cpp_errno (pfile, CPP_DL_ERROR,
 	       "converting escape sequence to execution character set");
 
@@ -1388,14 +1443,21 @@ cpp_interpret_string (cpp_reader *pfile, const cpp_string *from, size_t count,
   tbuf.asize = MAX (OUTBUF_BLOCK_SIZE, from->len);
   tbuf.text = XNEWVEC (uchar, tbuf.asize);
   tbuf.len = 0;
+  tbuf.loc.init ();
 
   for (i = 0; i < count; i++)
     {
+      cpp_string_location_reader loc_reader (&from[i].loc, pfile->line_table);
       p = from[i].text;
       if (*p == 'u')
 	{
-	  if (*++p == '8')
-	    p++;
+	  p++;
+	  loc_reader.get_next ();
+	  if (*p == '8')
+	    {
+	      p++;
+	      loc_reader.get_next ();
+	    }
 	}
       else if (*p == 'L' || *p == 'U') p++;
       if (*p == 'R')
@@ -1414,13 +1476,16 @@ cpp_interpret_string (cpp_reader *pfile, const cpp_string *from, size_t count,
 
 	  /* Raw strings are all normal characters; these can be fed
 	     directly to convert_cset.  */
-	  if (!APPLY_CONVERSION (cvt, p, limit - p, &tbuf))
+	  if (!APPLY_CONVERSION (cvt, p, limit - p, &tbuf, &loc_reader))
 	    goto fail;
 
 	  continue;
 	}
 
-      p++; /* Skip leading quote.  */
+      /* Skip leading quote.  */
+      p++;
+      loc_reader.get_next ();
+
       limit = from[i].text + from[i].len - 1; /* Skip trailing quote.  */
 
       for (;;)
@@ -1432,13 +1497,13 @@ cpp_interpret_string (cpp_reader *pfile, const cpp_string *from, size_t count,
 	    {
 	      /* We have a run of normal characters; these can be fed
 		 directly to convert_cset.  */
-	      if (!APPLY_CONVERSION (cvt, base, p - base, &tbuf))
+	      if (!APPLY_CONVERSION (cvt, base, p - base, &tbuf, &loc_reader))
 		goto fail;
 	    }
 	  if (p == limit)
 	    break;
 
-	  p = convert_escape (pfile, p + 1, limit, &tbuf, cvt);
+	  p = convert_escape (pfile, p + 1, limit, &tbuf, cvt, &loc_reader);
 	}
     }
   /* NUL-terminate the 'to' buffer and translate it to a cpp_string
@@ -1447,6 +1512,7 @@ cpp_interpret_string (cpp_reader *pfile, const cpp_string *from, size_t count,
   tbuf.text = XRESIZEVEC (uchar, tbuf.text, tbuf.len);
   to->text = tbuf.text;
   to->len = tbuf.len;
+  to->loc = tbuf.loc;
   return true;
 
  fail:
@@ -1611,7 +1677,7 @@ cppchar_t
 cpp_interpret_charconst (cpp_reader *pfile, const cpp_token *token,
 			 unsigned int *pchars_seen, int *unsignedp)
 {
-  cpp_string str = { 0, 0 };
+  cpp_string str = { 0, 0, {NULL, 0, 0} };
   bool wide = (token->type != CPP_CHAR && token->type != CPP_UTF8CHAR);
   int u8 = 2 * int(token->type == CPP_UTF8CHAR);
   cppchar_t result;
@@ -1719,14 +1785,16 @@ _cpp_convert_input (cpp_reader *pfile, const char *input_charset,
       to.text = input;
       to.asize = size;
       to.len = len;
+      to.loc.init ();
     }
   else
     {
       to.asize = MAX (65536, len);
       to.text = XNEWVEC (uchar, to.asize);
       to.len = 0;
+      to.loc.init ();
 
-      if (!APPLY_CONVERSION (input_cset, input, len, &to))
+      if (!APPLY_CONVERSION (input_cset, input, len, &to, NULL))
 	cpp_error (pfile, CPP_DL_ERROR,
 		   "failure to convert %s to %s",
 		   CPP_OPTION (pfile, input_charset), SOURCE_CHARSET);
@@ -1811,3 +1879,204 @@ _cpp_default_encoding (void)
 
   return current_encoding;
 }
+
+/* Implementation of class cpp_string_location and
+   class cpp_string_location_reader.
+   We put them in this source file in the hope that they can be
+   inlined into heavy users such as cpp_interpret_string without
+   requiring the compiler itself to be built with LTO.  */
+
+/* Initialize to the empty state, with no fragments.  */
+void
+cpp_string_location::init ()
+{
+  m_fragloc_array = NULL;
+  m_num_fraglocs = 0;
+  m_alloc_fraglocs = 0;
+}
+
+/* Initialize as one fragment: LEN chars at LOC, COLS_PER_CHAR cols each.  */
+void
+cpp_string_location::init_raw (source_location loc, int len, int cols_per_char,
+			       line_maps *line_table)
+{
+  line_map_realloc reallocator = (line_table->reallocator
+				  ? line_table->reallocator
+				  : (line_map_realloc) xrealloc);
+  m_fragloc_array = (cpp_string_fragment_location *)reallocator
+    (NULL,
+     sizeof (cpp_string_fragment_location));
+  m_fragloc_array[0].m_len = len;
+
+  /* LOC might be a macro location.  It only makes sense to do
+     column-by-column calculations on ordinary maps, so get the
+     corresponding location in an ordinary map.  */
+  source_location ordinary_loc
+    = linemap_resolve_location (line_table, loc,
+				LRK_SPELLING_LOCATION, NULL);
+  m_fragloc_array[0].m_loc = ordinary_loc;
+  m_fragloc_array[0].m_cols_per_char = cols_per_char;
+  m_num_fraglocs = 1;
+  m_alloc_fraglocs = 1;
+}
+
+
+/* Append a char covering RANGE, extending the last fragment if contiguous.  */
+void
+cpp_string_location::add_char_at (source_range range,
+				  line_maps *line_table)
+{
+  if (m_fragloc_array)
+    {
+      /* Is this a simple run-on character in the next column
+	 within the current fragment?  */
+      cpp_string_fragment_location *current_fragment
+	= get_current_fragment ();
+      source_range next_range = current_fragment->get_next_range ();
+      if (range.m_start == next_range.m_start
+	  && range.m_finish == next_range.m_finish)
+	/* If so, we can simply increase the length of the current
+	   fragment.  */
+	current_fragment->m_len++;
+      else
+	{
+	  /* We need to start a new fragment.  This may require growing
+	     the underlying array.  */
+	  if (++m_num_fraglocs > m_alloc_fraglocs)
+	    {
+	      m_alloc_fraglocs *= 2;
+	      line_map_realloc reallocator = (line_table->reallocator
+					      ? line_table->reallocator
+					      : (line_map_realloc) xrealloc);
+	      m_fragloc_array = (cpp_string_fragment_location *)reallocator
+		(m_fragloc_array,
+		 sizeof (cpp_string_fragment_location) * m_alloc_fraglocs);
+	    }
+	  current_fragment = get_current_fragment ();
+	  current_fragment->m_len = 1;
+	  current_fragment->m_loc = range.m_start;
+	  current_fragment->m_cols_per_char
+	    = range.m_finish + 1 - range.m_start;
+	}
+    }
+  else
+    {
+      /* Begin new fragment array.  */
+      line_map_realloc reallocator = (line_table->reallocator
+				      ? line_table->reallocator
+				      : (line_map_realloc) xrealloc);
+      m_fragloc_array = (cpp_string_fragment_location *)reallocator
+	(NULL, sizeof (cpp_string_fragment_location));
+      m_fragloc_array[0].m_len = 1;
+      m_fragloc_array[0].m_loc = range.m_start;
+      m_fragloc_array[0].m_cols_per_char
+	= range.m_finish + 1 - range.m_start;
+      m_num_fraglocs = 1;
+      m_alloc_fraglocs = 1;
+    }
+}
+
+/* Append FLEN chars located via LOC_READER; do nothing if it is NULL.  */
+void
+cpp_string_location::add_n_chars_at (int flen,
+				     cpp_string_location_reader *loc_reader)
+{
+  if (loc_reader)
+    while (flen--)
+      add_char_at (loc_reader->get_next (),
+		   loc_reader->get_line_maps ());
+}
+
+/* Return the start location of the char at CHAR_IDX, or 0 if out of range.  */
+source_location
+cpp_string_location::get_loc_at_index (unsigned int char_idx) const
+{
+  for (unsigned int fragment_idx = 0;
+       fragment_idx < m_num_fraglocs;
+       fragment_idx++)
+    {
+      cpp_string_fragment_location *fragment = &m_fragloc_array[fragment_idx];
+      if (char_idx < fragment->m_len)
+	return fragment->get_char_range (char_idx).m_start;
+      else
+	char_idx -= fragment->m_len;
+    }
+
+  /* Error: accessing beyond the end of the array.  */
+  return 0;
+}
+
+/* Return the range of the char at CHAR_IDX, or {0, 0} if out of range.  */
+source_range
+cpp_string_location::get_range_at_index (unsigned int char_idx) const
+{
+  for (unsigned int fragment_idx = 0;
+       fragment_idx < m_num_fraglocs;
+       fragment_idx++)
+    {
+      cpp_string_fragment_location *fragment = &m_fragloc_array[fragment_idx];
+      if (char_idx < fragment->m_len)
+	return fragment->get_char_range (char_idx);
+      else
+	char_idx -= fragment->m_len;
+    }
+
+  /* Error: accessing beyond the end of the array.  */
+  source_range err;
+  err.m_start = 0;
+  err.m_finish = 0;
+  return err;
+}
+
+/* Return true for a single fragment with one column per char.  */
+bool
+cpp_string_location::trivial_p () const
+{
+  if (m_num_fraglocs == 1)
+    if (m_fragloc_array[0].m_cols_per_char == 1)
+      return true;
+  return false;
+}
+
+/* Constructor for iterating through the locations in
+   cpp_string_location.  */
+
+cpp_string_location_reader::
+cpp_string_location_reader (const cpp_string_location *strloc,
+			    line_maps *line_table)
+{
+  /* As an optimization, we require that STRLOC must consist of a
+     single fragment.  */
+  linemap_assert (strloc->m_num_fraglocs == 1);
+  m_loc = strloc->m_fragloc_array[0].m_loc;
+  m_cols_per_char = strloc->m_fragloc_array[0].m_cols_per_char;
+  m_line_table = line_table;
+}
+
+/* Constructor for iterating through an arbitrary buffer.  */
+
+cpp_string_location_reader::
+cpp_string_location_reader (source_location src_loc,
+			    int cols_per_char,
+			    line_maps *line_table)
+: m_cols_per_char (cols_per_char),
+  m_line_table (line_table)
+{
+  /* SRC_LOC might be a macro location.  It only makes sense to do
+     column-by-column calculations on ordinary maps, so get the
+     corresponding location in an ordinary map.  */
+  m_loc
+    = linemap_resolve_location (line_table, src_loc,
+				LRK_SPELLING_LOCATION, NULL);
+}
+
+/* Return the range of the next char, then advance past it.  */
+source_range
+cpp_string_location_reader::get_next ()
+{
+  source_range result;
+  result.m_start = m_loc;
+  result.m_finish = m_loc + m_cols_per_char - 1;
+  m_loc += m_cols_per_char;
+  return result;
+}
diff --git a/libcpp/directives.c b/libcpp/directives.c
index 1e9bc3d..b783a7e 100644
--- a/libcpp/directives.c
+++ b/libcpp/directives.c
@@ -949,7 +949,7 @@ do_line (cpp_reader *pfile)
   token = cpp_get_token (pfile);
   if (token->type == CPP_STRING)
     {
-      cpp_string s = { 0, 0 };
+      cpp_string s = { 0, 0, { NULL, 0, 0 } };
       if (cpp_interpret_string_notranslate (pfile, &token->val.str, 1,
 					    &s, CPP_STRING))
 	new_file = (const char *)s.text;
@@ -1006,7 +1006,7 @@ do_linemarker (cpp_reader *pfile)
   token = cpp_get_token (pfile);
   if (token->type == CPP_STRING)
     {
-      cpp_string s = { 0, 0 };
+      cpp_string s = { 0, 0, { NULL, 0, 0 } };
       if (cpp_interpret_string_notranslate (pfile, &token->val.str,
 					    1, &s, CPP_STRING))
 	new_file = (const char *)s.text;
diff --git a/libcpp/expr.c b/libcpp/expr.c
index 3dc5c0b..f355646 100644
--- a/libcpp/expr.c
+++ b/libcpp/expr.c
@@ -1228,6 +1228,8 @@ _cpp_parse_expr (cpp_reader *pfile, bool is_if)
 			      "missing binary operator before token \"%s\"",
 			      cpp_token_as_text (pfile, op.token));
 	  want_value = false;
+	  ((cpp_token *)op.token)->val.str.loc.init_raw (op.loc, 1, 1, /* FIXME */
+							 pfile->line_table);
 	  top->value = eval_token (pfile, op.token, op.loc);
 	  continue;
 
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index 0b1a403..a5e5df5 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -173,10 +173,144 @@ enum c_lang {CLK_GNUC89 = 0, CLK_GNUC99, CLK_GNUC11,
 	     CLK_GNUCXX, CLK_CXX98, CLK_GNUCXX11, CLK_CXX11,
 	     CLK_GNUCXX14, CLK_CXX14, CLK_GNUCXX1Z, CLK_CXX1Z, CLK_ASM};
 
+/* Location of the individual chars in a cpp_string.
+   Specifically, this stores a run of m_len characters, starting at m_loc,
+   with a consistent number of columns per char.
+   See the description below for cpp_string_location.  */
+struct GTY(()) cpp_string_fragment_location {
+  source_location m_loc;
+  unsigned int m_len : 12;
+  unsigned int m_cols_per_char : 4;
+
+  source_range get_char_range (int idx) const
+  {
+    source_range result;
+    result.m_start = m_loc + (idx * m_cols_per_char);
+    result.m_finish = result.m_start + m_cols_per_char - 1;
+    return result;
+  }
+  source_range get_next_range () const
+  {
+    return get_char_range (m_len);
+  }
+  source_range get_covered_range () const
+  {
+    source_range result;
+    result.m_start = m_loc;
+    result.m_finish = m_loc + (m_len * m_cols_per_char) - 1;
+    return result;
+  }
+  void debug (const char *msg) const;
+};
+
+class cpp_string_location_reader;
+
+/* Location of the individual chars in a cpp_string.
+   This is stored as a dynamically-allocated array of fragments.
+   For example, consider this call to printf:
+
+     printf ("foo \x25\151 bar"  "baz",
+             "not an int");
+
+   The string constant for the first parameter is composed of
+   the concatenation of two string literals, with hexadecimal
+   encoding of a '%' and octal encoding of an 'i', giving a
+   resulting STRING_CST of:
+
+     "foo %i barbaz"
+
+   We want to efficiently record the range of locations in the
+   source file of each character so that we can emit warnings about
+   the type mismatch between format specifier "%i" and the non-int
+   second argument.
+
+   We record the locations as a series of fragments, where within
+   each fragment we have a contiguous run of input characters with
+   a consistent number of columns per character.  In the example
+   above the fragments are:
+
+    printf ("foo \x25\151 bar"  "baz",
+    .........^^^^....................: fragment 0: 4 chars at 1 col per char
+    .............^^^^^^^^............: fragment 1: 2 chars at 4 cols per char
+    .....................^^^^........: fragment 2: 4 chars at 1 col per char
+    .............................^^^.: fragment 3: 3 chars at 1 col per char
+
+   Note that the hex and octal chars both happen to be 4 cols per char
+   and are contiguous, hence both end up being in fragment 1, whereas the
+   "bar" and "baz" aren't contiguous and hence have to be in separate
+   fragments.
+
+   Note also that having a constant cols-per-char within each fragment
+   means that given an index into the fragment we can directly compute
+   the corresponding source_range.  */
+
+struct GTY(()) cpp_string_location {
+
+  void init ();
+  void init_raw (source_location loc, int len, int cols_per_char,
+		 line_maps *line_table);
+
+  void add_char_at (source_range range,
+		    line_maps *line_table);
+  void add_n_chars_at (int flen, cpp_string_location_reader *loc_reader);
+
+  source_location get_loc_at_index (unsigned int idx) const;
+  source_range get_range_at_index (unsigned int idx) const;
+
+  void debug () const;
+
+  bool trivial_p () const;
+
+ private:
+  cpp_string_fragment_location *get_current_fragment () const
+  {
+    return &m_fragloc_array[m_num_fraglocs - 1];
+  }
+
+  /* Fields.
+     Ideally we would make these fields private, but this isn't easily
+     doable since gengtype generates functions in gtype-desc.c that
+     access them.  */
+ public:
+
+  /* We seemingly can't use vec<> from libcpp, so do it "by hand"
+     here.  */
+  cpp_string_fragment_location *m_fragloc_array;
+  unsigned int m_num_fraglocs;
+  unsigned int m_alloc_fraglocs;
+};
+
+/* A class for iterating through the source-locations within a
+   string, either from a cpp_string_location, or a temporary buffer.  */
+class cpp_string_location_reader {
+ public:
+  /* Constructor for iterating through the locations in
+     cpp_string_location.
+     As an optimization, we require that STRLOC must consist of a
+     single fragment.  */
+  cpp_string_location_reader (const cpp_string_location *strloc,
+			      line_maps *line_table);
+
+  /* Constructor for iterating through an arbitrary buffer.  */
+  cpp_string_location_reader (source_location src_loc,
+			      int cols_per_char,
+			      line_maps *line_table);
+
+  source_range get_next ();
+
+  line_maps *get_line_maps () const { return m_line_table; }
+
+ private:
+  source_location m_loc;
+  int m_cols_per_char;
+  line_maps *m_line_table;
+};
+
 /* Payload of a NUMBER, STRING, CHAR or COMMENT token.  */
 struct GTY(()) cpp_string {
   unsigned int len;
   const unsigned char *text;
+  cpp_string_location loc;
 };
 
 /* Flags for the cpp_token structure.  */
diff --git a/libcpp/internal.h b/libcpp/internal.h
index abd464f..5be45f3 100644
--- a/libcpp/internal.h
+++ b/libcpp/internal.h
@@ -42,7 +42,8 @@ struct op;
 struct _cpp_strbuf;
 
 typedef bool (*convert_f) (iconv_t, const unsigned char *, size_t,
-			   struct _cpp_strbuf *);
+			   struct _cpp_strbuf *,
+			   cpp_string_location_reader *loc_reader);
 struct cset_converter
 {
   convert_f func;
@@ -747,7 +748,9 @@ struct normalize_state
 extern bool _cpp_valid_ucn (cpp_reader *, const unsigned char **,
 			    const unsigned char *, int,
 			    struct normalize_state *state,
-			    cppchar_t *);
+			    cppchar_t *,
+			    source_range *char_range,
+			    cpp_string_location_reader *loc_reader);
 extern void _cpp_destroy_iconv (cpp_reader *);
 extern unsigned char *_cpp_convert_input (cpp_reader *, const char *,
 					  unsigned char *, size_t, size_t,
diff --git a/libcpp/lex.c b/libcpp/lex.c
index a84a8c0..0a6bc1c 100644
--- a/libcpp/lex.c
+++ b/libcpp/lex.c
@@ -1247,7 +1247,7 @@ forms_identifier_p (cpp_reader *pfile, int first,
       cppchar_t s;
       buffer->cur += 2;
       if (_cpp_valid_ucn (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
-			  state, &s))
+			  state, &s, NULL, NULL))
 	return true;
       buffer->cur -= 2;
     }
@@ -1407,6 +1407,15 @@ lex_number (cpp_reader *pfile, cpp_string *number,
   const uchar *base;
   uchar *dest;
 
+  /* FIXME: should it really use a new "cpp_number", rather than cpp_string?
+     We need to init this, or we get a crash accessing uninited data
+     during GC, since,
+       struct GTY(()) cpp_token
+     has union cpp_token_u with
+       desc ("cpp_token_val_index (&%1)")))
+     and this gives CPP_TOKEN_FLD_STR for numbers (and strings).  */
+  number->loc.init ();
+
   base = pfile->buffer->cur - 1;
   do
     {
@@ -1446,6 +1455,7 @@ create_literal (cpp_reader *pfile, cpp_token *token, const uchar *base,
   token->type = type;
   token->val.str.len = len;
   token->val.str.text = dest;
+  token->val.str.loc.init_raw (token->src_loc, len, 1, pfile->line_table);
 }
 
 /* Subroutine of lex_raw_string: Append LEN chars from BASE to the buffer
diff --git a/libcpp/macro.c b/libcpp/macro.c
index 786c21b..b21e218 100644
--- a/libcpp/macro.c
+++ b/libcpp/macro.c
@@ -216,6 +216,7 @@ new_string_token (cpp_reader *pfile, unsigned char *text, unsigned int len)
   token->type = CPP_STRING;
   token->val.str.len = len;
   token->val.str.text = text;
+  token->val.str.loc.init ();
   token->flags = 0;
   return token;
 }
-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 133+ messages in thread

* [PATCH 21/22] Use Levenshtein distance for various misspellings in C frontend
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (18 preceding siblings ...)
  2015-09-10 20:32 ` [PATCH 17/22] libcpp: add location tracking within string literals David Malcolm
@ 2015-09-10 20:32 ` David Malcolm
  2015-09-10 21:11   ` Andi Kleen
  2015-09-11 15:31   ` Manuel López-Ibáñez
  2015-09-10 20:32 ` [PATCH 16/22] C/C++ frontend: use tree ranges in various diagnostics David Malcolm
                   ` (2 subsequent siblings)
  22 siblings, 2 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

Screenshot:
 https://dmalcolm.fedorapeople.org/gcc/2015-09-10/spellcheck.html

There are a couple of FIXMEs here:
* where to call levenshtein_distance_unit_tests
* should we attempt error-recovery in c-typeck.c:build_component_ref

gcc/ChangeLog:
	* Makefile.in (OBJS): Add spellcheck.o.
	* spellcheck.c: New file.
	* spellcheck.h: New file.

gcc/c-family/ChangeLog:
	* c-common.h (lookup_name_fuzzy): New decl.

gcc/c/ChangeLog:
	* c-decl.c: Include spellcheck.h.
	(lookup_name_fuzzy): New.
	* c-parser.c: Include spellcheck.h.
	(c_parser_declaration_or_fndef): If "unknown type name",
	attempt to suggest a close match using lookup_name_fuzzy.
	(c_parser_postfix_expression): Pass source range in calls to
	build_component_ref.
	(c_parser_postfix_expression_after_primary): Likewise.
	* c-tree.h (build_component_ref): Add source_range * param.
	* c-typeck.c: Include gcc-rich-location.h and spellcheck.h.
	(lookup_field_fuzzy_find_candidates): New function.
	(lookup_field_fuzzy): New function.
	(build_component_ref): Add "ident_range" param, use it
	when printing field-not-found error.  Use lookup_field_fuzzy
	to suggest close matches.

gcc/objc/ChangeLog:
	* objc-act.c (objc_build_component_ref): Pass NULL to new param
	of build_component_ref.

gcc/testsuite/ChangeLog:
	* gcc.dg/spellcheck.c: New file.
---
 gcc/Makefile.in                   |   1 +
 gcc/c-family/c-common.h           |   1 +
 gcc/c/c-decl.c                    |  37 +++++++++++
 gcc/c/c-parser.c                  |  30 ++++++---
 gcc/c/c-tree.h                    |   2 +-
 gcc/c/c-typeck.c                  |  92 +++++++++++++++++++++++++++-
 gcc/objc/objc-act.c               |   3 +-
 gcc/spellcheck.c                  | 126 ++++++++++++++++++++++++++++++++++++++
 gcc/spellcheck.h                  |  32 ++++++++++
 gcc/testsuite/gcc.dg/spellcheck.c |  36 +++++++++++
 10 files changed, 347 insertions(+), 13 deletions(-)
 create mode 100644 gcc/spellcheck.c
 create mode 100644 gcc/spellcheck.h
 create mode 100644 gcc/testsuite/gcc.dg/spellcheck.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index c472696..7fbd80a 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1392,6 +1392,7 @@ OBJS = \
 	shrink-wrap.o \
 	simplify-rtx.o \
 	sparseset.o \
+	spellcheck.o \
 	sreal.o \
 	stack-ptr-mod.o \
 	statistics.o \
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index b9a5d72..1903574 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -972,6 +972,7 @@ extern tree finish_label_address_expr (tree, location_t);
    different implementations.  Used in c-common.c.  */
 extern tree lookup_label (tree);
 extern tree lookup_name (tree);
+extern tree lookup_name_fuzzy (tree);
 extern bool lvalue_p (const_tree);
 
 extern bool vector_targets_convertible_p (const_tree t1, const_tree t2);
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index b7f0241..778c935 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-family/c-ada-spec.h"
 #include "cilk.h"
 #include "builtins.h"
+#include "spellcheck.h"
 
 /* In grokdeclarator, distinguish syntactic contexts of declarators.  */
 enum decl_context
@@ -3900,6 +3901,42 @@ lookup_name_in_scope (tree name, struct c_scope *scope)
       return b->decl;
   return 0;
 }
+
+/* Look for the closest match for NAME within the currently valid
+   scopes.
+
+   This finds the identifier with the lowest Levenshtein distance to
+   NAME.  If there are multiple candidates with equal minimal distance,
+   the first one found is returned.  Scopes are searched from innermost
+   outwards, and within a scope in reverse order of declaration, thus
+   benefiting candidates "near" to the current scope.  */
+
+tree
+lookup_name_fuzzy (tree name)
+{
+  gcc_assert (TREE_CODE (name) == IDENTIFIER_NODE);
+
+  c_binding *best_binding = NULL;
+  int best_distance = INT_MAX;
+
+  for (c_scope *scope = current_scope; scope; scope = scope->outer)
+    for (c_binding *binding = scope->bindings; binding; binding = binding->prev)
+      {
+	if (!binding->id)
+	  continue;
+	int dist = levenshtein_distance (name, binding->id);
+	if (dist < best_distance)
+	  {
+	    best_distance = dist;
+	    best_binding = binding;
+	  }
+      }
+  if (best_binding)
+    return best_binding->id;
+  else
+    return NULL;
+}
+
 \f
 /* Create the predefined scalar types of C,
    and some nodes representing standard constants (0, 1, (void *) 0).
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 0c62496..d134d85 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -66,6 +66,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "gomp-constants.h"
 #include "c-family/c-indentation.h"
+#include "spellcheck.h"
 
 \f
 /* Initialization routine for this file.  */
@@ -1543,8 +1544,14 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
           || c_parser_peek_2nd_token (parser)->type == CPP_MULT)
       && (!nested || !lookup_name (c_parser_peek_token (parser)->value)))
     {
-      error_at (here, "unknown type name %qE",
-                c_parser_peek_token (parser)->value);
+      tree hint = lookup_name_fuzzy (c_parser_peek_token (parser)->value);
+      if (hint)
+	error_at (here, "unknown type name %qE; did you mean %qE?",
+		  c_parser_peek_token (parser)->value,
+		  hint);
+      else
+	error_at (here, "unknown type name %qE",
+		  c_parser_peek_token (parser)->value);
 
       /* Parse declspecs normally to get a correct pointer type, but avoid
          a further "fails to be a type name" error.  Refuse nested functions
@@ -7425,7 +7432,8 @@ c_parser_postfix_expression (c_parser *parser)
 	    if (c_parser_next_token_is (parser, CPP_NAME))
 	      {
 		offsetof_ref = build_component_ref
-		  (loc, offsetof_ref, c_parser_peek_token (parser)->value);
+		  (loc, offsetof_ref, c_parser_peek_token (parser)->value,
+		   &c_parser_peek_token (parser)->range);
 		c_parser_consume_token (parser);
 		while (c_parser_next_token_is (parser, CPP_DOT)
 		       || c_parser_next_token_is (parser,
@@ -7453,7 +7461,8 @@ c_parser_postfix_expression (c_parser *parser)
 			  }
 			offsetof_ref = build_component_ref
 			  (loc, offsetof_ref,
-			   c_parser_peek_token (parser)->value);
+			   c_parser_peek_token (parser)->value,
+			   &c_parser_peek_token (parser)->range);
 			c_parser_consume_token (parser);
 		      }
 		    else
@@ -7913,6 +7922,7 @@ c_parser_postfix_expression_after_primary (c_parser *parser,
   vec<location_t> arg_loc = vNULL;
   location_t start;
   location_t finish;
+  source_range ident_range;
 
   while (true)
     {
@@ -8031,10 +8041,12 @@ c_parser_postfix_expression_after_primary (c_parser *parser,
               expr.original_type = NULL;
 	      return expr;
 	    }
+	  ident_range = c_parser_peek_token (parser)->range;
 	  start = EXPR_LOCATION_RANGE (expr.value).m_start;
-	  finish = c_parser_peek_token (parser)->range.m_finish;
+	  finish = ident_range.m_finish;
 	  c_parser_consume_token (parser);
-	  expr.value = build_component_ref (op_loc, expr.value, ident);
+	  expr.value = build_component_ref (op_loc, expr.value, ident,
+					    &ident_range);
 	  set_source_range (&expr.value, start, finish);
 	  expr.original_code = ERROR_MARK;
 	  if (TREE_CODE (expr.value) != COMPONENT_REF)
@@ -8063,14 +8075,16 @@ c_parser_postfix_expression_after_primary (c_parser *parser,
 	      expr.original_type = NULL;
 	      return expr;
 	    }
+	  ident_range = c_parser_peek_token (parser)->range;
 	  start = EXPR_LOCATION_RANGE (expr.value).m_start;
-	  finish = c_parser_peek_token (parser)->range.m_finish;
+	  finish = ident_range.m_finish;
 	  c_parser_consume_token (parser);
 	  expr.value = build_component_ref (op_loc,
 					    build_indirect_ref (op_loc,
 								expr.value,
 								RO_ARROW),
-					    ident);
+					    ident,
+					    &ident_range);
 	  set_source_range (&expr.value, start, finish);
 	  expr.original_code = ERROR_MARK;
 	  if (TREE_CODE (expr.value) != COMPONENT_REF)
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index 0810a74..4c3c637 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -593,7 +593,7 @@ extern struct c_expr convert_lvalue_to_rvalue (location_t, struct c_expr,
 					       bool, bool);
 extern void mark_exp_read (tree);
 extern tree composite_type (tree, tree);
-extern tree build_component_ref (location_t, tree, tree);
+extern tree build_component_ref (location_t, tree, tree, source_range *);
 extern tree build_array_ref (location_t, tree, tree);
 extern tree build_external_ref (source_range, tree, int, tree *);
 extern void pop_maybe_used (bool);
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 506abb3..507400b 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -54,6 +54,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-family/c-ubsan.h"
 #include "cilk.h"
 #include "gomp-constants.h"
+#include "gcc-rich-location.h"
+#include "spellcheck.h"
 
 /* Possible cases of implicit bad conversions.  Used to select
    diagnostic messages in convert_for_assignment.  */
@@ -2259,12 +2261,72 @@ lookup_field (tree type, tree component)
   return tree_cons (NULL_TREE, field, NULL_TREE);
 }
 
+/* Recursively append candidate IDENTIFIER_NODEs to CANDIDATES.  */
+
+static void
+lookup_field_fuzzy_find_candidates (tree type, tree component,
+				    vec<tree> *candidates)
+{
+  tree field;
+  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
+    {
+      if (DECL_NAME (field) == NULL_TREE
+	  && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
+	      || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))
+	{
+	  lookup_field_fuzzy_find_candidates (TREE_TYPE (field),
+					      component,
+					      candidates);
+	}
+
+      if (DECL_NAME (field))
+	candidates->safe_push (field);
+    }
+}
+
+/* Like "lookup_field", but find the closest match, rather than
+   necessarily an exact match.  */
+
+static tree
+lookup_field_fuzzy (tree type, tree component)
+{
+  gcc_assert (TREE_CODE (component) == IDENTIFIER_NODE);
+
+  /* FIXME: move this to a unittest suite. */
+  levenshtein_distance_unit_tests ();
+
+  /* First, gather a list of candidates.  */
+  auto_vec <tree> candidates;
+
+  lookup_field_fuzzy_find_candidates (type, component,
+				      &candidates);
+
+  /* Now determine which is closest.  */
+  int i;
+  tree field;
+  tree best_field = NULL;
+  int best_distance = INT_MAX;
+  FOR_EACH_VEC_ELT (candidates, i, field)
+    {
+      int dist = levenshtein_distance (component, DECL_NAME (field));
+      if (dist < best_distance)
+	{
+	  best_distance = dist;
+	  best_field = field;
+	}
+    }
+
+  return best_field;
+}
+
 /* Make an expression to refer to the COMPONENT field of structure or
    union value DATUM.  COMPONENT is an IDENTIFIER_NODE.  LOC is the
-   location of the COMPONENT_REF.  */
+   location of the COMPONENT_REF.  IDENT_RANGE points to the source range of
+   the identifier, or is NULL.  */
 
 tree
-build_component_ref (location_t loc, tree datum, tree component)
+build_component_ref (location_t loc, tree datum, tree component,
+		     source_range *ident_range)
 {
   tree type = TREE_TYPE (datum);
   enum tree_code code = TREE_CODE (type);
@@ -2294,7 +2356,31 @@ build_component_ref (location_t loc, tree datum, tree component)
 
       if (!field)
 	{
-	  error_at (loc, "%qT has no member named %qE", type, component);
+	  if (!ident_range)
+	    {
+	      error_at (loc, "%qT has no member named %qE",
+			type, component);
+	      return error_mark_node;
+	    }
+	  gcc_rich_location richloc (*ident_range);
+	  if (TREE_CODE (datum) == INDIRECT_REF)
+	    richloc.add_expr (TREE_OPERAND (datum, 0));
+	  else
+	    richloc.add_expr (datum);
+	  field = lookup_field_fuzzy (type, component);
+	  if (field)
+	    {
+	      error_at_rich_loc
+		(&richloc,
+		 "%qT has no member named %qE; did you mean %qE?",
+		 type, component, field);
+	      /* FIXME: error recovery: should we try to keep going,
+		 with "field"? (having issued an error, and hence no
+		 output).  */
+	    }
+	  else
+	    error_at_rich_loc (&richloc, "%qT has no member named %qE",
+			       type, component);
 	  return error_mark_node;
 	}
 
diff --git a/gcc/objc/objc-act.c b/gcc/objc/objc-act.c
index a1e32fc..04f2824 100644
--- a/gcc/objc/objc-act.c
+++ b/gcc/objc/objc-act.c
@@ -2665,7 +2665,8 @@ objc_build_component_ref (tree datum, tree component)
   return finish_class_member_access_expr (datum, component, false,
                                           tf_warning_or_error);
 #else
-  return build_component_ref (input_location, datum, component);
+  return build_component_ref (input_location, datum, component,
+			      NULL);
 #endif
 }
 
diff --git a/gcc/spellcheck.c b/gcc/spellcheck.c
new file mode 100644
index 0000000..2892381
--- /dev/null
+++ b/gcc/spellcheck.c
@@ -0,0 +1,126 @@
+/* Find near-matches for strings and identifiers.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "spellcheck.h"
+
+/* The Levenshtein distance is an "edit-distance": the minimal
+   number of one-character insertions, removals or substitutions
+   that are needed to change one string into another.
+
+   This implementation uses the Wagner-Fischer algorithm.
+
+   It is currently a naive implementation; various optimizations are
+   possible (e.g. removal of the M*N allocation in favor of a pair
+   of allocations of M and N).  */
+
+static int
+levenshtein_distance (const char *s, int m,
+		      const char *t, int n)
+{
+  const bool debug = false;
+
+  int *d = new int [(m + 1) * (n + 1)];
+  if (debug)
+    memset (d, 0xab, (m + 1) * (n + 1) * sizeof (int));
+
+#define D(I, J) (d[((I) * (n + 1)) + (J)])
+
+  for (int i = 0; i <= m; i++)
+    D(i, 0) = i;
+  for (int j = 0; j <= n; j++)
+    D(0, j) = j;
+
+  for (int j = 1; j <= n; j++)
+    for (int i = 1; i <= m; i++)
+      if (s[i - 1] == t[j - 1])
+	D(i, j) = D(i - 1, j - 1);
+      else
+	{
+	  int deletion     = D(i - 1, j    ) + 1;
+	  int insertion    = D(i,     j - 1) + 1;
+	  int substitution = D(i - 1, j - 1) + 1;
+	  int cheapest = MIN (deletion, insertion);
+	  cheapest = MIN (cheapest, substitution);
+	  D(i, j) = cheapest;
+	}
+
+  if (debug)
+    {
+      printf ("s=\"%s\" t=\"%s\"\n", s, t);
+      for (int j = 0; j <= n; j++)
+	{
+	  for (int i = 0; i <= m; i++)
+	    printf ("%i ", D(i, j));
+	  printf ("\n");
+	}
+    }
+
+  int result = D(m, n);
+#undef D
+
+  delete[] d;
+
+  return result;
+}
+
+/* Calculate Levenshtein distance between two nil-terminated strings.
+   This exists purely for the unit tests.  */
+
+int
+levenshtein_distance (const char *s, const char *t)
+{
+  return levenshtein_distance (s, strlen (s), t, strlen (t));
+}
+
+/* Unit tests for levenshtein_distance.  */
+
+void
+levenshtein_distance_unit_tests (void)
+{
+  gcc_assert (levenshtein_distance ("", "nonempty") == strlen ("nonempty"));
+  gcc_assert (levenshtein_distance ("nonempty", "") == strlen ("nonempty"));
+  gcc_assert (levenshtein_distance ("saturday", "sunday") == 3);
+  gcc_assert (levenshtein_distance ("sunday", "saturday") == 3);
+  gcc_assert (levenshtein_distance ("foo", "m_foo") == 2);
+  gcc_assert (levenshtein_distance ("m_foo", "foo") == 2);
+  gcc_assert (levenshtein_distance ("hello_world", "HelloWorld") == 3);
+  gcc_assert (levenshtein_distance
+	      ("the quick brown fox jumps over the lazy dog", "dog") == 40);
+  gcc_assert (levenshtein_distance
+	      ("the quick brown fox jumps over the lazy dog", "fox") == 40);
+}
+
+/* Calculate Levenshtein distance between two identifiers.  */
+
+int
+levenshtein_distance (tree ident_s, tree ident_t)
+{
+  gcc_assert (TREE_CODE (ident_s) == IDENTIFIER_NODE);
+  gcc_assert (TREE_CODE (ident_t) == IDENTIFIER_NODE);
+
+  return levenshtein_distance (IDENTIFIER_POINTER (ident_s),
+			       IDENTIFIER_LENGTH (ident_s),
+			       IDENTIFIER_POINTER (ident_t),
+			       IDENTIFIER_LENGTH (ident_t));
+}
diff --git a/gcc/spellcheck.h b/gcc/spellcheck.h
new file mode 100644
index 0000000..7900aa2
--- /dev/null
+++ b/gcc/spellcheck.h
@@ -0,0 +1,32 @@
+/* Find near-matches for strings and identifiers.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_SPELLCHECK_H
+#define GCC_SPELLCHECK_H
+
+extern void
+levenshtein_distance_unit_tests (void);
+
+extern int
+levenshtein_distance (const char *s, const char *t);
+
+extern int
+levenshtein_distance (tree ident_s, tree ident_t);
+
+#endif  /* GCC_SPELLCHECK_H  */
diff --git a/gcc/testsuite/gcc.dg/spellcheck.c b/gcc/testsuite/gcc.dg/spellcheck.c
new file mode 100644
index 0000000..e34ade8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/spellcheck.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-fdiagnostics-show-caret" } */
+
+struct foo
+{
+  int foo;
+  int bar;
+  int baz;
+};
+
+int test (struct foo *ptr)
+{
+  return ptr->m_bar; /* { dg-error "'struct foo' has no member named 'm_bar'; did you mean 'bar'?" } */
+
+/* { dg-begin-multiline-output "" }
+   return ptr->m_bar;
+          ~~~  ^~~~~
+   { dg-end-multiline-output "" } */
+}
+
+int test2 (void)
+{
+  struct foo instance = {};
+  return instance.m_bar; /* { dg-error "'struct foo' has no member named 'm_bar'; did you mean 'bar'?" } */
+
+/* { dg-begin-multiline-output "" }
+   return instance.m_bar;
+          ~~~~~~~~ ^~~~~
+   { dg-end-multiline-output "" } */
+}
+
+int64 foo; /* { dg-error "unknown type name 'int64'; did you mean 'int'?" } */
+/* { dg-begin-multiline-output "" }
+ int64 foo;
+ ^~~~~
+   { dg-end-multiline-output "" } */
-- 
1.8.5.3
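
The header comment in spellcheck.c mentions dropping the M*N allocation.
A minimal sketch of that idea, assuming only two rows of the
Wagner-Fischer table are kept at a time.  The names levenshtein_two_rows,
closest_match and two_rows_self_test are hypothetical and are not part
of the patch; closest_match mirrors the "first candidate with the lowest
distance wins" rule described for lookup_name_fuzzy:

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: Levenshtein distance using two rows of n + 1
// entries instead of the full (m + 1) * (n + 1) table.
static int
levenshtein_two_rows (const std::string &s, const std::string &t)
{
  const std::size_t m = s.size ();
  const std::size_t n = t.size ();

  // prev[j] holds D(i - 1, j); cur[j] holds D(i, j).
  std::vector<int> prev (n + 1), cur (n + 1);
  for (std::size_t j = 0; j <= n; j++)
    prev[j] = static_cast<int> (j);

  for (std::size_t i = 1; i <= m; i++)
    {
      cur[0] = static_cast<int> (i);
      for (std::size_t j = 1; j <= n; j++)
        {
          if (s[i - 1] == t[j - 1])
            cur[j] = prev[j - 1];
          else
            {
              int deletion = prev[j] + 1;
              int insertion = cur[j - 1] + 1;
              int substitution = prev[j - 1] + 1;
              cur[j] = std::min ({deletion, insertion, substitution});
            }
        }
      std::swap (prev, cur);
    }
  // After the final swap, prev is the row for i == m.
  return prev[n];
}

// Hypothetical helper: return the first candidate with the lowest
// distance to NAME, or NULL if CANDIDATES is empty.  The strict "<"
// means that ties go to the earliest candidate.
static const std::string *
closest_match (const std::string &name,
               const std::vector<std::string> &candidates)
{
  const std::string *best = nullptr;
  int best_distance = INT_MAX;
  for (const std::string &candidate : candidates)
    {
      int dist = levenshtein_two_rows (name, candidate);
      if (dist < best_distance)
        {
          best_distance = dist;
          best = &candidate;
        }
    }
  return best;
}

// A few of the values checked by levenshtein_distance_unit_tests.
static void
two_rows_self_test ()
{
  assert (levenshtein_two_rows ("", "nonempty") == 8);
  assert (levenshtein_two_rows ("saturday", "sunday") == 3);
  assert (levenshtein_two_rows ("m_foo", "foo") == 2);
}
```

This trades the single large allocation for two short ones; whether that
matters for identifier-sized inputs has not been measured here.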

^ permalink raw reply	[flat|nested] 133+ messages in thread

* [PATCH 22/22] Add fixit hints to spellchecker suggestions
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (20 preceding siblings ...)
  2015-09-10 20:32 ` [PATCH 16/22] C/C++ frontend: use tree ranges in various diagnostics David Malcolm
@ 2015-09-10 20:50 ` David Malcolm
  2015-09-14 17:49 ` [PATCH 00/22] RFC: Overhaul of diagnostics Bernd Schmidt
  22 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-10 20:50 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Malcolm

Screenshot:
 https://dmalcolm.fedorapeople.org/gcc/2015-09-10/spellcheck-with-fixits.html

gcc/c/ChangeLog:
	* c-parser.c (c_parser_declaration_or_fndef): Add fix-it
	hint to "did you mean" suggestion.
	* c-typeck.c (build_component_ref): Likewise.

gcc/testsuite/ChangeLog:
	* gcc.dg/spellcheck.c: Update expected output to show fix-it
	hints.
---
 gcc/c/c-parser.c                  | 12 +++++++++---
 gcc/c/c-typeck.c                  |  3 +++
 gcc/testsuite/gcc.dg/spellcheck.c |  6 +++++-
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index d134d85..1defa71 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -1546,9 +1546,15 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
     {
       tree hint = lookup_name_fuzzy (c_parser_peek_token (parser)->value);
       if (hint)
-	error_at (here, "unknown type name %qE; did you mean %qE?",
-		  c_parser_peek_token (parser)->value,
-		  hint);
+	{
+	  rich_location richloc (here);
+	  richloc.add_fixit_replace (here,
+				     IDENTIFIER_POINTER (hint));
+	  error_at_rich_loc (&richloc,
+			     "unknown type name %qE; did you mean %qE?",
+			     c_parser_peek_token (parser)->value,
+			     hint);
+	}
       else
 	error_at (here, "unknown type name %qE",
 		  c_parser_peek_token (parser)->value);
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 507400b..0d92135 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -2370,6 +2370,9 @@ build_component_ref (location_t loc, tree datum, tree component,
 	  field = lookup_field_fuzzy (type, component);
 	  if (field)
 	    {
+	      richloc.add_fixit_replace
+		(*ident_range,
+		 IDENTIFIER_POINTER (DECL_NAME (field)));
 	      error_at_rich_loc
 		(&richloc,
 		 "%qT has no member named %qE; did you mean %qE?",
diff --git a/gcc/testsuite/gcc.dg/spellcheck.c b/gcc/testsuite/gcc.dg/spellcheck.c
index e34ade8..892057e 100644
--- a/gcc/testsuite/gcc.dg/spellcheck.c
+++ b/gcc/testsuite/gcc.dg/spellcheck.c
@@ -15,6 +15,7 @@ int test (struct foo *ptr)
 /* { dg-begin-multiline-output "" }
    return ptr->m_bar;
           ~~~  ^~~~~
+               bar
    { dg-end-multiline-output "" } */
 }
 
@@ -26,11 +27,14 @@ int test2 (void)
 /* { dg-begin-multiline-output "" }
    return instance.m_bar;
           ~~~~~~~~ ^~~~~
+                   bar
    { dg-end-multiline-output "" } */
 }
 
-int64 foo; /* { dg-error "unknown type name 'int64'; did you mean 'int'?" } */
+#include <inttypes.h>
+int64 foo; /* { dg-error "unknown type name 'int64'; did you mean 'int64_t'?" } */
 /* { dg-begin-multiline-output "" }
  int64 foo;
  ^~~~~
+ int64_t
    { dg-end-multiline-output "" } */
-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 21/22] Use Levenshtein distance for various misspellings in C frontend
  2015-09-10 20:32 ` [PATCH 21/22] Use Levenshtein distance for various misspellings in C frontend David Malcolm
@ 2015-09-10 21:11   ` Andi Kleen
  2015-09-11 15:31   ` Manuel López-Ibáñez
  1 sibling, 0 replies; 133+ messages in thread
From: Andi Kleen @ 2015-09-10 21:11 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

David Malcolm <dmalcolm@redhat.com> writes:
> +	    {
> +	      error_at_rich_loc
> +		(&richloc,
> +		 "%qT has no member named %qE; did you mean %qE?",
> +		 type, component, field);
> +	      /* FIXME: error recovery: should we try to keep going,
> +		 with "field"? (having issued an error, and hence no
> +		 output).  */

It would be really interesting to keep going here (and not return
error_mark_node), and in other similar places too.

The reason is that typos often cause a lot of follow-on errors, and
these would all go away if the guess was right.

IMHO avoiding follow-on errors is even more useful than just
the hint, as the more irritating problem is multiple pages
of errors.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 15/22] Add plugin to recursively dump the source-ranges in a tree
  2015-09-10 20:30 ` [PATCH 15/22] Add plugin to recursively dump the source-ranges in a tree David Malcolm
@ 2015-09-11  3:19   ` Martin Sebor
  0 siblings, 0 replies; 133+ messages in thread
From: Martin Sebor @ 2015-09-11  3:19 UTC (permalink / raw)
  To: David Malcolm, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1373 bytes --]

On 09/10/2015 02:28 PM, David Malcolm wrote:
> This patch adds a test plugin that recurses down an expression tree,
> printing diagnostics showing the ranges of each node in the tree.
>
> Screenshot:
>   https://dmalcolm.fedorapeople.org/gcc/2015-09-09/show-trees.html
>
> This needs a linker hack, since it's the only user of
>    gcc_rich_location::add_expr
> which thus doesn't appear in "cc1" until later patches in the kit
> add uses of it; is there a clean way to fix that?
...
> +  Hack: fails with linker error:
> +./diagnostic_plugin_show_trees.so: undefined symbol: _ZN17gcc_rich_location8add_exprEP9tree_node
> +  since nothing in the tree is using gcc_rich_location::add_expr yet.
> +
> +  I've tried various workarounds (adding DEBUG_FUNCTION to the
> +  method, taking its address), but can't seem to fix it that way.
> +  So as a nasty workaround, the following material is copied&pasted
> +  from gcc-rich-location.c: */

I would expect this to work so long as cc1 is linked with
--export-dynamic (or -rdynamic which I think it is), and it
does in the attached example. I wonder what's different about
the way it's built (I haven't tried to reproduce it with gcc).

Martin

PS I've only skimmed the patch, but besides being in awe at how
you managed to structure it and not get lost in the dependencies,
I really like the output in the snapshots. Very cool!


[-- Attachment #2: t.c --]
[-- Type: text/x-csrc, Size: 1246 bytes --]

/*
$ (set -x; fl='-O2 -Wall -flto'; cat t.c && g++ -DFOO=1 -c $fl -o foo.o t.c && g++ -DBAR=1 -c $fl -o bar.o t.c && g++ -DMAIN=1 -c $fl t.c && g++ $fl -Wl,--export-dynamic -ldl foo.o bar.o t.o && g++ -DDSO=1 -fpic $fl -o plugin.so -shared t.c) && LD_LIBRARY_PATH=. ./a.out
+ fl='-O2 -Wall -flto'
+ cat t.c
*/
#include <dlfcn.h>
#include <stdio.h>

struct S {
    void foo ();
    int bar ();
};

#if FOO
void S::foo () { puts (__func__); }
#elif BAR
int S::bar () { puts (__func__); return 0; }
#elif DSO

extern "C" {
void foobar (S &s) {
    puts (__func__);
    s.foo ();
}
}

#elif MAIN

int main ()
{
    S s;

    void *dl = dlopen ("plugin.so", RTLD_LAZY);
    if (!dl) {
        fprintf (stderr, "dlopen: %s\n", dlerror ());
        return 1;
    }

    void (*f)(S&) = (void (*)(S&))dlsym (dl, "foobar");
    if (!f) {
        fprintf (stderr, "dlsym: %s\n", dlerror ());
        return 1;
    }

    f (s);

    dlclose (dl);

    return s.bar ();
}
#endif
/*
+ g++ -DFOO=1 -c -O2 -Wall -flto -o foo.o t.c
+ g++ -DBAR=1 -c -O2 -Wall -flto -o bar.o t.c
+ g++ -DMAIN=1 -c -O2 -Wall -flto t.c
+ g++ -O2 -Wall -flto -Wl,--export-dynamic -ldl foo.o bar.o t.o
+ g++ -DDSO=1 -fpic -O2 -Wall -flto -o plugin.so -shared t.c
foobar
foo
bar
$
*/

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 04/22] Reimplement diagnostic_show_locus, introducing rich_location classes
  2015-09-10 20:29 ` [PATCH 04/22] Reimplement diagnostic_show_locus, introducing rich_location classes David Malcolm
@ 2015-09-11 13:44   ` Michael Matz
  2015-09-11 14:12   ` Michael Matz
  1 sibling, 0 replies; 133+ messages in thread
From: Michael Matz @ 2015-09-11 13:44 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

Hi,

On Thu, 10 Sep 2015, David Malcolm wrote:

> +/* FIXME: (dmalcolm)
> +   This plugin is currently the only user of
> +     gcc_rich_location::add_range_with_caption
> +   As such, the symbol is present in libbackend.a, but not in "cc1",
> +   and running the plugin fails with a linker error:
> +     ./diagnostic_plugin_test_show_locus.so: undefined symbol: _ZN17gcc_rich_location22add_range_with_captionEjjP18diagnostic_contextPKcz
> +   which c++filt tells us is:
> +     ./diagnostic_plugin_test_show_locus.so: undefined symbol: gcc_rich_location::add_range_with_caption(unsigned int, unsigned int, diagnostic_context*, char const*, ...)
> +
> +   I've tried various workarounds (adding DEBUG_FUNCTION to the
> +   method, taking its address), but can't seem to fix it that way.
> +   So as a nasty workaround, the following material is copied&pasted
> +   from gcc-rich-location.c: */

You need to make cc1 use _anything_ defined in the source file 
gcc-rich-location.c.  E.g. it could be some global internal variable:

int _force_me_into_cc1_hack;

which you then refer to in e.g. diagnostic-color.c (or from wherever).


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
  2015-09-10 20:28 ` [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs David Malcolm
@ 2015-09-11 14:08   ` Michael Matz
  2015-09-14 19:41     ` Jeff Law
  2015-09-15 10:20   ` Richard Biener
  1 sibling, 1 reply; 133+ messages in thread
From: Michael Matz @ 2015-09-11 14:08 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

Hi,

On Thu, 10 Sep 2015, David Malcolm wrote:

> Does anyone know why this was "carefully packed" and to what extent
> this matters?  I'm adding an extra 8 bytes to it (or 4 if we eliminate
> the existing location_t).  As far as I can see, these are
> short-lived, and there are only relatively few alive at any time.

The C++ frontend stores _all_ tokens before starting to parse, so the size
of cp_token is not totally irrelevant.  It still might not matter much, 
though.


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 04/22] Reimplement diagnostic_show_locus, introducing rich_location classes
  2015-09-10 20:29 ` [PATCH 04/22] Reimplement diagnostic_show_locus, introducing rich_location classes David Malcolm
  2015-09-11 13:44   ` Michael Matz
@ 2015-09-11 14:12   ` Michael Matz
  2015-09-11 15:15     ` David Malcolm
  1 sibling, 1 reply; 133+ messages in thread
From: Michael Matz @ 2015-09-11 14:12 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

Hi,

On Thu, 10 Sep 2015, David Malcolm wrote:

> +/* A range of source locations.
> +
> +   Ranges are half-open:
> +   m_start is the first location within the range, whereas
> +   m_finish is the first location *after* the range.

I think you eventually decided that they are closed, not half-open, at 
least this:

> +  static source_range from_location (source_location loc)
> +  {
> +    source_range result;
> +    result.m_start = loc;
> +    result.m_finish = loc;

and this:

> +/* Ranges are closed
> +   m_start is the first location within the range, and
> +   m_finish is the last location within the range.  */

suggest so :)


Ciao,
Michael.

* Re: [PATCH 04/22] Reimplement diagnostic_show_locus, introducing rich_location classes
  2015-09-11 14:12   ` Michael Matz
@ 2015-09-11 15:15     ` David Malcolm
  0 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-11 15:15 UTC (permalink / raw)
  To: Michael Matz; +Cc: gcc-patches

On Fri, 2015-09-11 at 16:07 +0200, Michael Matz wrote:
> Hi,
> 
> On Thu, 10 Sep 2015, David Malcolm wrote:
> 
> > +/* A range of source locations.
> > +
> > +   Ranges are half-open:
> > +   m_start is the first location within the range, whereas
> > +   m_finish is the first location *after* the range.
> 
> I think you eventually decided that they are closed, not half-open, at 
> least this:

Oops.  Good catch; thanks.  Yes: in an early version of this work they
were half-open, but I found having both endpoints be within the range to
be much more convenient.
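
The closed convention discussed here can be illustrated with a minimal
sketch. It is not the patch's source_range: "location" is a hypothetical
plain-integer stand-in for source_location, and contains () and length ()
are helpers invented for the illustration. The point it shows is that
from_location () yields a one-element range only if m_finish is inclusive;
under a half-open reading the same assignment would describe an empty range.

```cpp
#include <cassert>

// Hypothetical stand-in for GCC's source_location: a plain integer.
typedef unsigned int location;

struct closed_range
{
  location m_start;   // first location within the range
  location m_finish;  // last location within the range (inclusive)

  // A single location forms a valid one-element range: start == finish.
  static closed_range from_location (location loc)
  {
    closed_range result;
    result.m_start = loc;
    result.m_finish = loc;
    return result;
  }

  // Both endpoints count as being inside the range.
  bool contains (location loc) const
  {
    return m_start <= loc && loc <= m_finish;
  }

  // Number of locations covered; at least 1 for a well-formed range.
  unsigned int length () const
  {
    assert (m_start <= m_finish);
    return m_finish - m_start + 1;
  }
};
```

Real location values are not guaranteed to be simple linear offsets, so
the arithmetic in length () is only meaningful for this toy type.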


> > +  static source_range from_location (source_location loc)
> > +  {
> > +    source_range result;
> > +    result.m_start = loc;
> > +    result.m_finish = loc;
> 
> and this:
> 
> > +/* Ranges are closed
> > +   m_start is the first location within the range, and
> > +   m_finish is the last location within the range.  */
> 
> suggest so :)


* Re: [PATCH 21/22] Use Levenshtein distance for various misspellings in C frontend
  2015-09-10 20:32 ` [PATCH 21/22] Use Levenshtein distance for various misspellings in C frontend David Malcolm
  2015-09-10 21:11   ` Andi Kleen
@ 2015-09-11 15:31   ` Manuel López-Ibáñez
  2015-09-15 15:25     ` [PATCH WIP] Use Levenshtein distance for various misspellings in C frontend v2 David Malcolm
  1 sibling, 1 reply; 133+ messages in thread
From: Manuel López-Ibáñez @ 2015-09-11 15:31 UTC (permalink / raw)
  To: David Malcolm, GCC Patches

On 10/09/15 22:28, David Malcolm wrote:
> There are a couple of FIXMEs here:
> * where to call levenshtein_distance_unit_tests

Should this be part of make check? Perhaps a small program that is compiled and 
linked with spellcheck.c? This would be possible if spellcheck.c did not depend 
on tree.h or tm.h, which I doubt it needs to.

> * should we attempt error-recovery in c-typeck.c:build_component_ref

I would say yes, but why not leave this discussion to a later patch? The 
current one seems useful enough.

> +
> +/* Look for the closest match for NAME within the currently valid
> +   scopes.
> +
> +   This finds the identifier with the lowest Levenshtein distance to
> +   NAME.  If there are multiple candidates with equal minimal distance,
> +   the first one found is returned.  Scopes are searched from innermost
> +   outwards, and within a scope in reverse order of declaration, thus
> +   benefiting candidates "near" to the current scope.  */
> +
> +tree
> +lookup_name_fuzzy (tree name)
> +{
> +  gcc_assert (TREE_CODE (name) == IDENTIFIER_NODE);
> +
> +  c_binding *best_binding = NULL;
> +  int best_distance = INT_MAX;
> +
> +  for (c_scope *scope = current_scope; scope; scope = scope->outer)
> +    for (c_binding *binding = scope->bindings; binding; binding = binding->prev)
> +      {
> +	if (!binding->id)
> +	  continue;
> +	int dist = levenshtein_distance (name, binding->id);
> +	if (dist < best_distance)

I guess 'dist' cannot be negative. Can it be zero? If not, wouldn't it be 
appropriate to exit as soon as it becomes 1?
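
A minimal sketch of that early exit, under stated assumptions: it works
on std::string rather than IDENTIFIER_NODEs, edit_distance () is a
textbook two-row Levenshtein implementation rather than the patch's
levenshtein_distance, and find_closest () is a hypothetical name.
Candidates are taken to be listed in search order (innermost scope first).
Once a candidate at distance 1 or less is seen, only an exact match could
beat it, so the search stops.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <string>
#include <vector>

// Textbook Levenshtein distance using two rows of the DP table.
static int
edit_distance (const std::string &s, const std::string &t)
{
  std::vector<int> prev (t.size () + 1), cur (t.size () + 1);
  for (std::size_t j = 0; j <= t.size (); j++)
    prev[j] = (int) j;  // distance from the empty prefix of S
  for (std::size_t i = 1; i <= s.size (); i++)
    {
      cur[0] = (int) i;  // distance to the empty prefix of T
      for (std::size_t j = 1; j <= t.size (); j++)
        {
          int subst = prev[j - 1] + (s[i - 1] == t[j - 1] ? 0 : 1);
          int del = prev[j] + 1;
          int ins = cur[j - 1] + 1;
          cur[j] = std::min (subst, std::min (del, ins));
        }
      prev.swap (cur);  // the row just computed becomes the previous row
    }
  return prev[t.size ()];
}

// Return the index of the first candidate with minimal distance to NAME,
// or -1 if CANDIDATES is empty.  Stops early at distance <= 1.
static int
find_closest (const std::string &name,
              const std::vector<std::string> &candidates)
{
  int best_index = -1;
  int best_distance = INT_MAX;
  for (std::size_t i = 0; i < candidates.size (); i++)
    {
      int dist = edit_distance (name, candidates[i]);
      if (dist < best_distance)
        {
          best_distance = dist;
          best_index = (int) i;
          if (dist <= 1)
            break;  // nothing later can be strictly better in practice
        }
    }
  return best_index;
}
```

With the candidates "inr" (inner scope) and "ins" (outer scope), a
lookup of "inp" stops at "inr", since both are at distance 1 and the
inner one is seen first.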

Is this code discriminating between types and names? That is, what happens for:

typedef int ins;

int foo(void)
{
    int inr;
    inp x;
}

> +/* Recursively append candidate IDENTIFIER_NODEs to CANDIDATES.  */
> +
> +static void
> +lookup_field_fuzzy_find_candidates (tree type, tree component,
> +				    vec<tree> *candidates)
> +{
> +  tree field;
> +  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
> +    {
> +      if (DECL_NAME (field) == NULL_TREE
> +	  && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
> +	      || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))
> +	{
> +	  lookup_field_fuzzy_find_candidates (TREE_TYPE (field),
> +					      component,
> +					      candidates);
> +	}
> +
> +      if (DECL_NAME (field))
> +	candidates->safe_push (field);
> +    }
> +}

This is appending inner-most, isn't it? Thus, given:

struct s{
     struct j { int aa; } kk;
     int aa;
};

void foo(struct s x)
{
     x.ab;
}

it will find s::j::aa before s::aa, no?

>   tree
> -build_component_ref (location_t loc, tree datum, tree component)
> +build_component_ref (location_t loc, tree datum, tree component,
> +		     source_range *ident_range)
>   {
>     tree type = TREE_TYPE (datum);
>     enum tree_code code = TREE_CODE (type);
> @@ -2294,7 +2356,31 @@ build_component_ref (location_t loc, tree datum, tree component)
>
>         if (!field)
>   	{
> -	  error_at (loc, "%qT has no member named %qE", type, component);
> +	  if (!ident_range)
> +	    {
> +	      error_at (loc, "%qT has no member named %qE",
> +			type, component);
> +	      return error_mark_node;
> +	    }
> +	  gcc_rich_location richloc (*ident_range);
> +	  if (TREE_CODE (datum) == INDIRECT_REF)
> +	    richloc.add_expr (TREE_OPERAND (datum, 0));
> +	  else
> +	    richloc.add_expr (datum);
> +	  field = lookup_field_fuzzy (type, component);
> +	  if (field)
> +	    {
> +	      error_at_rich_loc
> +		(&richloc,
> +		 "%qT has no member named %qE; did you mean %qE?",
> +		 type, component, field);
> +	      /* FIXME: error recovery: should we try to keep going,
> +		 with "field"? (having issued an error, and hence no
> +		 output).  */
> +	    }
> +	  else
> +	    error_at_rich_loc (&richloc, "%qT has no member named %qE",
> +			       type, component);
>   	  return error_mark_node;
>   	}

I don't understand why looking for a candidate or not depends on ident_range.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/spellcheck.c
> @@ -0,0 +1,36 @@
> +/* { dg-do compile } */
> +/* { dg-options "-fdiagnostics-show-caret" } */
> +
> +struct foo
> +{
> +  int foo;
> +  int bar;
> +  int baz;
> +};
> +
> +int test (struct foo *ptr)
> +{
> +  return ptr->m_bar; /* { dg-error "'struct foo' has no member named 'm_bar'; did you mean 'bar'?" } */
> +
> +/* { dg-begin-multiline-output "" }
> +   return ptr->m_bar;
> +          ~~~  ^~~~~
> +   { dg-end-multiline-output "" } */
> +}
> +
> +int test2 (void)
> +{
> +  struct foo instance = {};
> +  return instance.m_bar; /* { dg-error "'struct foo' has no member named 'm_bar'; did you mean 'bar'?" } */
> +
> +/* { dg-begin-multiline-output "" }
> +   return instance.m_bar;
> +          ~~~~~~~~ ^~~~~
> +   { dg-end-multiline-output "" } */
> +}
> +
> +int64 foo; /* { dg-error "unknown type name 'int64'; did you mean 'int'?" } */
> +/* { dg-begin-multiline-output "" }
> + int64 foo;
> + ^~~~~
> +   { dg-end-multiline-output "" } */
>


These tests could also test different scopes, clashes between types and fields 
and variables, and the correct behavior for nested struct/unions.

I wonder whether it would be worth it to extend existing tests if they now emit 
the "did you mean" part to be sure they are doing the right thing.

Cheers,

Manuel.



* Re: [PATCH 00/22] RFC: Overhaul of diagnostics
  2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
                   ` (21 preceding siblings ...)
  2015-09-10 20:50 ` [PATCH 22/22] Add fixit hints to spellchecker suggestions David Malcolm
@ 2015-09-14 17:49 ` Bernd Schmidt
  2015-09-14 19:44   ` Jeff Law
  22 siblings, 1 reply; 133+ messages in thread
From: Bernd Schmidt @ 2015-09-14 17:49 UTC (permalink / raw)
  To: David Malcolm, gcc-patches

On 09/10/2015 10:28 PM, David Malcolm wrote:
> Attached is a work-in-progress patch kit implementing these ideas.
> I posting it now to get feedback: some parts of it may be ready to
> commit, but other parts are definitely *not* ready yet.

It's hard to provide meaningful review under these conditions. My advice 
would be to resubmit the things that are ready now and can stand on 
their own so that we can get them out of the way first. Also, gather 
memory/time information before posting the patches if that seems likely 
to be important. For example, patch 21 looks quite cool but also 
potentially expensive, I'd probably want that to be restricted by param 
to identifiers of a maximum length (for both identifiers being compared).
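
One way such a restriction could look, as a rough sketch only: the cap
value, the names max_spellcheck_length and worth_comparing_p, and the use
of std::string are all hypothetical, and no such param exists in the patch
kit. The second check is an addition beyond the suggestion above: the
difference in lengths is a lower bound on the Levenshtein distance, so
pairs whose lengths differ by more than the acceptable distance can be
skipped without running the quadratic comparison at all.

```cpp
#include <cstddef>
#include <string>

// Hypothetical cap on identifier length; the value is arbitrary.
static const std::size_t max_spellcheck_length = 32;

// Return true if NAME and CANDIDATE are cheap enough to compare and
// could possibly be within MAX_DISTANCE edits of each other.
static bool
worth_comparing_p (const std::string &name, const std::string &candidate,
                   std::size_t max_distance)
{
  // Apply the length cap to both identifiers being compared.
  if (name.size () > max_spellcheck_length
      || candidate.size () > max_spellcheck_length)
    return false;

  // |len(name) - len(candidate)| is a lower bound on the edit distance.
  std::size_t diff = (name.size () > candidate.size ()
                      ? name.size () - candidate.size ()
                      : candidate.size () - name.size ());
  return diff <= max_distance;
}
```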

For the most part I declare myself agnostic as to whether this is an 
improvement or not, and leave that for others to comment on. I 
personally prefer single-line errors without much noise.

I see lots of unit tests implemented as plugins - have we decided that 
this is the mechanism we want to use for this kind of thing?

Patch 3 is ok as a purely mechanical move.

Bernd

* Re: [PATCH 01/22] Change of location_get_source_line signature
  2015-09-10 20:12 ` [PATCH 01/22] Change of location_get_source_line signature David Malcolm
@ 2015-09-14 19:28   ` Jeff Law
  2015-09-15 17:02     ` David Malcolm
  0 siblings, 1 reply; 133+ messages in thread
From: Jeff Law @ 2015-09-14 19:28 UTC (permalink / raw)
  To: David Malcolm, gcc-patches

On 09/10/2015 02:28 PM, David Malcolm wrote:
> location_get_source_line takes an expanded_location, but the column
> is irrelevant; it just needs a filename and line number.
>
> This change is used by, but independent of, the new implementation of
> diagnostic_show_locus later in the kit, so am breaking this out early.
>
> gcc/ChangeLog:
> 	* input.h (location_get_source_line): Drop "expanded_location"
> 	param in favor of a file and line number.
> 	* input.c (location_get_source_line): Likewise.
> 	(dump_location_info): Update for change in signature of
> 	location_get_source_line.
> 	* diagnostic.c (diagnostic_print_caret_line): Likewise.
>
> gcc/c-family/ChangeLog:
> 	* c-format.c (location_from_offset): Update for change in
> 	signature of location_get_source_line.
> 	* c-indentation.c (get_visual_column): Likewise.
> 	(line_contains_hash_if): Likewise.
This looks like a reasonable cleanup in and of itself.  It's OK for the 
trunk once you've done the usual bootstrap & regression test.

jeff

* Re: [PATCH 02/22] Testsuite: add dg-{begin|end}-multiline-output commands
  2015-09-10 20:13 ` [PATCH 02/22] Testsuite: add dg-{begin|end}-multiline-output commands David Malcolm
@ 2015-09-14 19:35   ` Jeff Law
  2015-09-14 22:17     ` Bernhard Reutner-Fischer
  0 siblings, 1 reply; 133+ messages in thread
From: Jeff Law @ 2015-09-14 19:35 UTC (permalink / raw)
  To: David Malcolm, gcc-patches

On 09/10/2015 02:28 PM, David Malcolm wrote:
> This patch adds an easy way to write tests for expected multiline
> output.  For example we can test carets and underlines for
> a particular diagnostic with:
>
> /* { dg-begin-multiline-output "" }
>   typedef struct _GMutex GMutex;
>                  ^~~~~~~
>     { dg-end-multiline-output "" } */
>
> It is used extensively by the rest of the patch kit.
And could be used to simplify/test the basic caret diagnostics as well.

>
> multiline.exp is used by prune.exp; hence we need to load it before
> prune.exp via *load_gcc_lib* for the testsuites of the various
> non-"gcc" support libraries (e.g. boehm-gc).
?!? Then why does prune.exp also load multiline.exp?  I must be missing 
something here.


>
> Question: which ChangeLog file should the change to
>    libgo/testsuite/lib/libgo.exp
> go into?
gcc/testsuite/ChangeLog is the nearest enclosing ChangeLog.  So that 
seems to be the right place.  That's also where Ian put changes to go-test.exp.



Jeff

* Re: [PATCH 03/22] Move diagnostic_show_locus and friends out into a new source file
  2015-09-10 20:13 ` [PATCH 03/22] Move diagnostic_show_locus and friends out into a new source file David Malcolm
@ 2015-09-14 19:37   ` Jeff Law
  2015-09-18 18:31     ` David Malcolm
  0 siblings, 1 reply; 133+ messages in thread
From: Jeff Law @ 2015-09-14 19:37 UTC (permalink / raw)
  To: David Malcolm, gcc-patches

On 09/10/2015 02:28 PM, David Malcolm wrote:
> The function "diagnostic_show_locus" gains new functionality in the
> next patch, so this preliminary patch breaks it out into a new source
> file, diagnostic-show-locus.c, along with a couple of related functions.
>
> gcc/ChangeLog:
> 	* Makefile.in (OBJS-libcommon): Add diagnostic-show-locus.o.
> 	* diagnostic.c (adjust_line): Move to diagnostic-show-locus.c.
> 	(diagnostic_show_locus): Likewise.
> 	(diagnostic_print_caret_line): Likewise.
> 	* diagnostic-show-locus.c: New file.
This is fine for the trunk.

So much for the easy stuff :-)

jeff

* Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
  2015-09-11 14:08   ` Michael Matz
@ 2015-09-14 19:41     ` Jeff Law
  0 siblings, 0 replies; 133+ messages in thread
From: Jeff Law @ 2015-09-14 19:41 UTC (permalink / raw)
  To: Michael Matz, David Malcolm; +Cc: gcc-patches

On 09/11/2015 07:55 AM, Michael Matz wrote:
> Hi,
>
> On Thu, 10 Sep 2015, David Malcolm wrote:
>
>> Does anyone know why this was "carefully packed" and to what extent
>> this matters?  I'm adding an extra 8 bytes to it (or 4 if we eliminate
>> the existing location_t).  As far as I can see, these are
>> short-lived, and there are only relative few alive at any time.
>
> The c++ frontend stores _all_ tokens before starting to parse, so the size
> of cp_token is not totally irrelevant.  It still might not matter much,
> though.
FWIW, Zack hasn't gone away totally (he was just posting about the 
explicit_bzero stuff) -- it couldn't hurt to ping him on the 
implications of changing the size of that structure.

jeff

* Re: [PATCH 00/22] RFC: Overhaul of diagnostics
  2015-09-14 17:49 ` [PATCH 00/22] RFC: Overhaul of diagnostics Bernd Schmidt
@ 2015-09-14 19:44   ` Jeff Law
  2015-09-15  1:11     ` David Malcolm
  0 siblings, 1 reply; 133+ messages in thread
From: Jeff Law @ 2015-09-14 19:44 UTC (permalink / raw)
  To: Bernd Schmidt, David Malcolm, gcc-patches

On 09/14/2015 11:43 AM, Bernd Schmidt wrote:
> It's hard to provide meaningful review under these conditions. My advice
> would be to resubmit the things that are ready now and can stand on
> their own so that we can get them out of the way first. Also, gather
> memory/time information before posting the patches if that seems likely
> to be important. For example, patch 21 looks quite cool but also
> potentially expensive, I'd probably want that to be restricted by param
> to identifiers of a maximum length (for both identifiers being compared).
I think David is looking for some feedback on some of this stuff. 
There are clearly some design/implementation issues in those middling 
patches.  The thought behind showing the later patches is so that folks 
can generally see where this work is trying to go.

One of my big worries is the memory consumption.

>
> For the most part I declare myself agnostic as to whether this is an
> improvement or not, and leave that for others to comment on. I
> personally prefer single-line errors without much noise.
I wasn't a fan of rich location diagnostics, carets, etc.  However, now 
that I'm doing more C++ bits, I'm seeing the utility of this kind of stuff.

>
> I see lots of unit tests implemented as plugins - have we decided that
> this is the mechanism we want to use for this kind of thing?
A lot of the plugin-based testing is stuff that's painful to test 
end-to-end.  Probably the best way to think of those tests is they're 
trying to directly test internal state.

Jeff

* Re: [PATCH 02/22] Testsuite: add dg-{begin|end}-multiline-output commands
  2015-09-14 19:35   ` Jeff Law
@ 2015-09-14 22:17     ` Bernhard Reutner-Fischer
  2015-09-14 22:45       ` Jeff Law
  0 siblings, 1 reply; 133+ messages in thread
From: Bernhard Reutner-Fischer @ 2015-09-14 22:17 UTC (permalink / raw)
  To: Jeff Law, David Malcolm, gcc-patches

On September 14, 2015 9:32:54 PM GMT+02:00, Jeff Law <law@redhat.com> wrote:
>On 09/10/2015 02:28 PM, David Malcolm wrote:

>>
>> multiline.exp is used by prune.exp; hence we need to load it before
>> prune.exp via *load_gcc_lib* for the testsuites of the various
>> non-"gcc" support libraries (e.g. boehm-gc).
>?!? Then why does prune.exp also load multiline.exp?  I  must be
>missing 
>something here.

https://gcc.gnu.org/ml/fortran/2012-03/msg00094.html

dejagnu has been able to handle libdirs fine for a couple of years now, but
relying on this was deemed too early for GCC-5. Maybe GCC-6 can bump the
required dejagnu version to allow for getting rid of all these superfluous
load_gcc_lib calls? *blink* :)

Thanks and cheers,

* Re: [PATCH 02/22] Testsuite: add dg-{begin|end}-multiline-output commands
  2015-09-14 22:17     ` Bernhard Reutner-Fischer
@ 2015-09-14 22:45       ` Jeff Law
  2015-09-15 17:53         ` dejagnu version update? Mike Stump
  0 siblings, 1 reply; 133+ messages in thread
From: Jeff Law @ 2015-09-14 22:45 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer, David Malcolm, gcc-patches

On 09/14/2015 02:38 PM, Bernhard Reutner-Fischer wrote:
> On September 14, 2015 9:32:54 PM GMT+02:00, Jeff Law <law@redhat.com>
> wrote:
>> On 09/10/2015 02:28 PM, David Malcolm wrote:
>
>>>
>>> multiline.exp is used by prune.exp; hence we need to load it
>>> before prune.exp via *load_gcc_lib* for the testsuites of the
>>> various non-"gcc" support libraries (e.g. boehm-gc).
>> ?!? Then why does prune.exp also load multiline.exp?  I  must be
>> missing something here.
>
> https://gcc.gnu.org/ml/fortran/2012-03/msg00094.html
>
> dejagnu can now handle libdirs fine since a couple of years but this
> was deemed too early for GCC-5. Maybe GCC-6 can bump the required
> dejagnu version to allow for getting rid of all these superfluous
> load_gcc_lib? *blink* :)
I'd support that as a direction.

Certainly dropping the 2001 version from our website in favor of 1.5 
(which is what I'm using anyway) would be a step forward.

jeff

* Re: [PATCH 00/22] RFC: Overhaul of diagnostics
  2015-09-14 19:44   ` Jeff Law
@ 2015-09-15  1:11     ` David Malcolm
  0 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-15  1:11 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bernd Schmidt, gcc-patches

On Mon, 2015-09-14 at 13:42 -0600, Jeff Law wrote:
> On 09/14/2015 11:43 AM, Bernd Schmidt wrote:
> > It's hard to provide meaningful review under these conditions. My advice
> > would be to resubmit the things that are ready now and can stand on
> > their own so that we can get them out of the way first. Also, gather
> > memory/time information before posting the patches if that seems likely
> > to be important. For example, patch 21 looks quite cool but also
> > potentially expensive, I'd probably want that to be restricted by param
> > to identifiers of a maximum length (for both identifiers being compared).
> I think David is looking for some feedback on some of this stuff. 
> There's clearly some design/implementation issues in those middling 
> patches.  The thought behind showing the later patches is so that folks 
> can generally see where this work is trying to go.

Indeed: my hope was that it would be helpful to see the kinds of
diagnostics I was hoping to be able to print, as that motivates both the
changes to diagnostic_show_locus, and efforts to try to capture and
store range information somehow within our IR.
I can post more screenshots if it will be helpful.

> One of my big worries is the memory consumption.

Yes.  Clearly the implementation I have in patch 12 isn't going to fly;
ideas welcome.   One thing I may try next is to only try to track the
ranges as the trees are constructed, immediately discarding them once
we've done that first level of error-checking... basically to not store
it beyond the frontends (for example to stuff it into c_expr in the C
FE).   That might be a useful compromise: hopefully letting us make a
lot of diagnostics more readable, without bloating the memory
requirements.  That's my hope, anyway :)

> > For the most part I declare myself agnostic as to whether this is an
> > improvement or not, and leave that for others to comment on. I
> > personally prefer single-line errors without much noise.
> I wasn't a fan of rich location diagnostics, carets, etc.  However, now 
> that I'm doing more C++ bits, I'm seeing the utility of this kind of stuff.

FWIW, I've mostly been holding off on adding ranges to the C++ FE in the
hope that the delayed folding branch will get merged soon (since
otherwise it's unclear what to base the changes on); hence I only touched
a few places where token ranges were in use; I didn't attempt tree
ranges.

> > I see lots of unit tests implemented as plugins - have we decided that
> > this is the mechanism we want to use for this kind of thing?
> A lot of the plugin-based testing is stuff that's painful to test 
> end-to-end.  Probably the best way to think of those tests is they're 
> trying to directly test internal state.

Right.  The new plugins allow us to exercise the underlying machinery
unit by unit, and this is good for sanity (in particular, mine).

The unit tests in this patch kit use source code, which is going to be
the case for some tests, and fits neatly into the
gcc.dg/plugin/plugin.exp pattern, but not every test fits this pattern.

By contrast, if we want to e.g. verify that gengtype generates sane
mark&sweep routines, that doesn't necessarily need specific source code.
This latter style of test is what I was thinking of in the other patch
kit I posted here:
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00765.html
("[PATCH 00/17] RFC: Adding a unit testing framework to gcc").
It's kind of a pain to write a plugin each time we poke at a data
structure, and run it with an empty source file, so my thinking was to
consolidate those tests that simply exercise internal data structures
into a single unit-test plugin, and run all the tests within it.

In particular, my hope is that this style of test could (a) help us
track down bugs earlier [1] and (b) be dramatically faster: I want us to
be measuring e.g. how many 100s or 1000s of unit tests per second we can
run, rather than having to fork/exec subprocesses for just a few tests
each time.

(Though that's probably a different discussion).

Thanks for the comments.  Hope the above sounds sane.
Dave

[1] I *hate* tracking down gengtype bugs; I'm keen to give us direct
test coverage for the code it generates, so we can track down bugs
immediately, rather than with multi-hour gdb sessions...

* Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
  2015-09-10 20:28 ` [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs David Malcolm
  2015-09-11 14:08   ` Michael Matz
@ 2015-09-15 10:20   ` Richard Biener
  2015-09-15 10:28     ` Jakub Jelinek
  1 sibling, 1 reply; 133+ messages in thread
From: Richard Biener @ 2015-09-15 10:20 UTC (permalink / raw)
  To: David Malcolm; +Cc: GCC Patches

On Thu, Sep 10, 2015 at 10:28 PM, David Malcolm <dmalcolm@redhat.com> wrote:
> This patch adds source *range* information to libcpp's cpp_token, and to
> c_token and cp_token in the C and C++ frontends respectively.
>
> To minimize churn, I kept the existing location_t fields, though in
> theory these are always just equal to the start of the source range.
>
> cpplib.h's struct cpp_token had this comment:
>
>   /* A preprocessing token.  This has been carefully packed and should
>      occupy 16 bytes on 32-bit hosts and 24 bytes on 64-bit hosts.  */
>
> Does anyone know why this was "carefully packed" and to what extent
> this matters?  I'm adding an extra 8 bytes to it (or 4 if we eliminate
> the existing location_t).  As far as I can see, these are
> short-lived, and there are only relative few alive at any time.
> Or is it about making them fast to copy?
>
> gcc/c-family/ChangeLog:
>         * c-lex.c (c_lex_with_flags): Add "range" param, and write back
>         to *range with the range of the libcpp token.
>         * c-pragma.h (c_lex_with_flags): Add "range" param.
>
> gcc/c/ChangeLog:
>         * c-parser.c (struct c_token): Add "range" field.
>         (c_lex_one_token): Write back to token->range in call to
>         c_lex_with_flags.
>
> gcc/cp/ChangeLog:
>         * parser.c (eof_token): Add "range" field to initializer.
>         (cp_lexer_get_preprocessor_token): Write back to token->range in
>         call to c_lex_with_flags.
>         * parser.h (struct cp_token): Add "range" field.
>
> libcpp/ChangeLog:
>         * include/cpplib.h (struct cpp_token): Add src_range field.
>         * lex.c (_cpp_lex_direct): Set up the src_range on the token.
> ---
>  gcc/c-family/c-lex.c    | 7 +++++--
>  gcc/c-family/c-pragma.h | 4 ++--
>  gcc/c/c-parser.c        | 6 +++++-
>  gcc/cp/parser.c         | 5 +++--
>  gcc/cp/parser.h         | 2 ++
>  libcpp/include/cpplib.h | 4 +++-
>  libcpp/lex.c            | 8 ++++++++
>  7 files changed, 28 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/c-family/c-lex.c b/gcc/c-family/c-lex.c
> index 55ceb20..1334994 100644
> --- a/gcc/c-family/c-lex.c
> +++ b/gcc/c-family/c-lex.c
> @@ -380,11 +380,13 @@ c_common_has_attribute (cpp_reader *pfile)
>  }
>
>  /* Read a token and return its type.  Fill *VALUE with its value, if
> -   applicable.  Fill *CPP_FLAGS with the token's flags, if it is
> +   applicable.  Fill *LOC and *RANGE with the source location and range
> +   of the token.  Fill *CPP_FLAGS with the token's flags, if it is
>     non-NULL.  */
>
>  enum cpp_ttype
> -c_lex_with_flags (tree *value, location_t *loc, unsigned char *cpp_flags,
> +c_lex_with_flags (tree *value, location_t *loc, source_range *range,
> +                 unsigned char *cpp_flags,
>                   int lex_flags)
>  {
>    static bool no_more_pch;
> @@ -397,6 +399,7 @@ c_lex_with_flags (tree *value, location_t *loc, unsigned char *cpp_flags,
>   retry:
>    tok = cpp_get_token_with_location (parse_in, loc);
>    type = tok->type;
> +  *range = tok->src_range;
>
>   retry_after_at:
>    switch (type)
> diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
> index aa2b471..05df543 100644
> --- a/gcc/c-family/c-pragma.h
> +++ b/gcc/c-family/c-pragma.h
> @@ -225,8 +225,8 @@ extern enum cpp_ttype pragma_lex (tree *);
>  /* This is not actually available to pragma parsers.  It's merely a
>     convenient location to declare this function for c-lex, after
>     having enum cpp_ttype declared.  */
> -extern enum cpp_ttype c_lex_with_flags (tree *, location_t *, unsigned char *,
> -                                       int);
> +extern enum cpp_ttype c_lex_with_flags (tree *, location_t *, source_range *,
> +                                       unsigned char *, int);
>
>  extern void c_pp_lookup_pragma (unsigned int, const char **, const char **);
>
> diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
> index 11a2b0f..5d822ee 100644
> --- a/gcc/c/c-parser.c
> +++ b/gcc/c/c-parser.c
> @@ -170,6 +170,8 @@ struct GTY (()) c_token {
>    ENUM_BITFIELD (pragma_kind) pragma_kind : 8;
>    /* The location at which this token was found.  */
>    location_t location;
> +  /* The source range at which this token was found.  */
> +  source_range range;
>    /* The value associated with this token, if any.  */
>    tree value;
>  };
> @@ -239,7 +241,9 @@ c_lex_one_token (c_parser *parser, c_token *token)
>  {
>    timevar_push (TV_LEX);
>
> -  token->type = c_lex_with_flags (&token->value, &token->location, NULL,
> +  token->type = c_lex_with_flags (&token->value, &token->location,
> +                                 &token->range,
> +                                 NULL,
>                                   (parser->lex_untranslated_string
>                                    ? C_LEX_STRING_NO_TRANSLATE : 0));
>    token->id_kind = C_ID_NONE;
> diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
> index 67fbcda..7c59c58 100644
> --- a/gcc/cp/parser.c
> +++ b/gcc/cp/parser.c
> @@ -58,7 +58,7 @@ along with GCC; see the file COPYING3.  If not see
>
>  static cp_token eof_token =
>  {
> -  CPP_EOF, RID_MAX, 0, PRAGMA_NONE, false, false, false, 0, { NULL }
> +  CPP_EOF, RID_MAX, 0, PRAGMA_NONE, false, false, false, 0, {0, 0}, { NULL }
>  };
>
>  /* The various kinds of non integral constant we encounter. */
> @@ -764,7 +764,8 @@ cp_lexer_get_preprocessor_token (cp_lexer *lexer, cp_token *token)
>
>     /* Get a new token from the preprocessor.  */
>    token->type
> -    = c_lex_with_flags (&token->u.value, &token->location, &token->flags,
> +    = c_lex_with_flags (&token->u.value, &token->location,
> +                        &token->range, &token->flags,
>                         lexer == NULL ? 0 : C_LEX_STRING_NO_JOIN);
>    token->keyword = RID_MAX;
>    token->pragma_kind = PRAGMA_NONE;
> diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
> index 760467c..c7558a0 100644
> --- a/gcc/cp/parser.h
> +++ b/gcc/cp/parser.h
> @@ -61,6 +61,8 @@ struct GTY (()) cp_token {
>    BOOL_BITFIELD purged_p : 1;
>    /* The location at which this token was found.  */
>    location_t location;
> +  /* The source range at which this token was found.  */
> +  source_range range;

Is it just me or does location now feel somewhat redundant with range?  Can't we
compress that somehow?

>    /* The value associated with this token, if any.  */
>    union cp_token_value {
>      /* Used for CPP_NESTED_NAME_SPECIFIER and CPP_TEMPLATE_ID.  */
> diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
> index a2bdfa0..0b1a403 100644
> --- a/libcpp/include/cpplib.h
> +++ b/libcpp/include/cpplib.h
> @@ -235,9 +235,11 @@ struct GTY(()) cpp_identifier {
>  };
>
>  /* A preprocessing token.  This has been carefully packed and should
> -   occupy 16 bytes on 32-bit hosts and 24 bytes on 64-bit hosts.  */
> +   occupy 16 bytes on 32-bit hosts and 24 bytes on 64-bit hosts.
> +   FIXME: the above comment is no longer true with this patch.  */
>  struct GTY(()) cpp_token {
>    source_location src_loc;     /* Location of first char of token.  */
> +  source_range src_range;      /* Source range covered by the token.  */
>    ENUM_BITFIELD(cpp_ttype) type : CHAR_BIT;  /* token type */
>    unsigned short flags;                /* flags - see above */
>
> diff --git a/libcpp/lex.c b/libcpp/lex.c
> index 0aa1090..a84a8c0 100644
> --- a/libcpp/lex.c
> +++ b/libcpp/lex.c
> @@ -2365,6 +2365,9 @@ _cpp_lex_direct (cpp_reader *pfile)
>      result->src_loc = linemap_position_for_column (pfile->line_table,
>                                           CPP_BUF_COLUMN (buffer, buffer->cur));
>
> +  /* The token's src_range begins here.  */
> +  result->src_range.m_start = result->src_loc;
> +
>    switch (c)
>      {
>      case ' ': case '\t': case '\f': case '\v': case '\0':
> @@ -2723,6 +2726,11 @@ _cpp_lex_direct (cpp_reader *pfile)
>        break;
>      }
>
> +  /* The token's src_range ends here.  */
> +  result->src_range.m_finish =
> +    linemap_position_for_column (pfile->line_table,
> +                                CPP_BUF_COLUMN (buffer, buffer->cur));
> +
>    return result;
>  }
>
> --
> 1.8.5.3
>

* Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
  2015-09-15 10:20   ` Richard Biener
@ 2015-09-15 10:28     ` Jakub Jelinek
  2015-09-15 10:48       ` Richard Biener
                         ` (2 more replies)
  0 siblings, 3 replies; 133+ messages in thread
From: Jakub Jelinek @ 2015-09-15 10:28 UTC (permalink / raw)
  To: Richard Biener; +Cc: David Malcolm, GCC Patches

On Tue, Sep 15, 2015 at 12:14:22PM +0200, Richard Biener wrote:
> > diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
> > index 760467c..c7558a0 100644
> > --- a/gcc/cp/parser.h
> > +++ b/gcc/cp/parser.h
> > @@ -61,6 +61,8 @@ struct GTY (()) cp_token {
> >    BOOL_BITFIELD purged_p : 1;
> >    /* The location at which this token was found.  */
> >    location_t location;
> > +  /* The source range at which this token was found.  */
> > +  source_range range;
> 
> Is it just me or does location now feel somewhat redundant with range?  Can't we
> compress that somehow?

For a token I'd expect it is redundant, I don't see how it would be useful
for a single preprocessing token to have more than start and end locations.
But generally, for expressions, 3 locations make sense.
If you have
abc + def
~~~~^~~~~
then having a range is useful.  In any case, I'm surprised that the ranges aren't encoded in
location_t (the data structures behind it, where we already stick also
BLOCK pointer).

	Jakub

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
  2015-09-15 10:28     ` Jakub Jelinek
@ 2015-09-15 10:48       ` Richard Biener
  2015-09-15 11:01         ` Jakub Jelinek
  2015-09-17 19:25         ` Jeff Law
  2015-09-15 12:09       ` Manuel López-Ibáñez
  2015-09-15 13:53       ` David Malcolm
  2 siblings, 2 replies; 133+ messages in thread
From: Richard Biener @ 2015-09-15 10:48 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: David Malcolm, GCC Patches

On Tue, Sep 15, 2015 at 12:20 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Sep 15, 2015 at 12:14:22PM +0200, Richard Biener wrote:
>> > diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
>> > index 760467c..c7558a0 100644
>> > --- a/gcc/cp/parser.h
>> > +++ b/gcc/cp/parser.h
>> > @@ -61,6 +61,8 @@ struct GTY (()) cp_token {
>> >    BOOL_BITFIELD purged_p : 1;
>> >    /* The location at which this token was found.  */
>> >    location_t location;
>> > +  /* The source range at which this token was found.  */
>> > +  source_range range;
>>
>> Is it just me or does location now feel somewhat redundant with range?  Can't we
>> compress that somehow?
>
> For a token I'd expect it is redundant, I don't see how it would be useful
> for a single preprocessing token to have more than start and end locations.
> But generally, for expressions, 3 locations make sense.
> If you have
> abc + def
> ~~~~^~~~~
> then having a range is useful.  In any case, I'm surprised that the ranges aren't encoded in
> location_t (the data structures behind it, where we already stick also
> BLOCK pointer).

Probably lack of encoding space ... I suppose upping location_t to
64bits could solve
some of that (with its own drawback on increasing size of core structures).

Richard.

>         Jakub

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
  2015-09-15 10:48       ` Richard Biener
@ 2015-09-15 11:01         ` Jakub Jelinek
  2015-09-16 20:29           ` David Malcolm
  2015-09-17 19:25         ` Jeff Law
  1 sibling, 1 reply; 133+ messages in thread
From: Jakub Jelinek @ 2015-09-15 11:01 UTC (permalink / raw)
  To: Richard Biener; +Cc: David Malcolm, GCC Patches

On Tue, Sep 15, 2015 at 12:33:58PM +0200, Richard Biener wrote:
> On Tue, Sep 15, 2015 at 12:20 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Tue, Sep 15, 2015 at 12:14:22PM +0200, Richard Biener wrote:
> >> > diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
> >> > index 760467c..c7558a0 100644
> >> > --- a/gcc/cp/parser.h
> >> > +++ b/gcc/cp/parser.h
> >> > @@ -61,6 +61,8 @@ struct GTY (()) cp_token {
> >> >    BOOL_BITFIELD purged_p : 1;
> >> >    /* The location at which this token was found.  */
> >> >    location_t location;
> >> > +  /* The source range at which this token was found.  */
> >> > +  source_range range;
> >>
> >> Is it just me or does location now feel somewhat redundant with range?  Can't we
> >> compress that somehow?
> >
> > For a token I'd expect it is redundant, I don't see how it would be useful
> > for a single preprocessing token to have more than start and end locations.
> > But generally, for expressions, 3 locations make sense.
> > If you have
> > abc + def
> > ~~~~^~~~~
> > then having a range is useful.  In any case, I'm surprised that the ranges aren't encoded in
> > location_t (the data structures behind it, where we already stick also
> > BLOCK pointer).
> 
> Probably lack of encoding space ... I suppose upping location_t to
> 64bits could solve
> some of that (with its own drawback on increasing size of core structures).

What I had in mind was just add
  source_location start, end;
to location_adhoc_data struct and use !IS_ADHOC_LOC locations to represent
just plain locations without block and without range (including the cases
where the range has both start and end equal to the locus) and IS_ADHOC_LOC
locations for the cases where either we have non-NULL block, or we have
some other range, or both.  But I haven't spent any time on that, so just
wondering if such an encoding has been considered.
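
A toy model of the encoding described above, purely for illustration.
Everything here (the toy_ names, the marker bit, the fixed-size table) is
an assumption made for the sketch and is not libcpp's actual layout; the
only point is that plain locations stay as-is and only locations with a
block or a non-trivial range pay for a table entry.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int toy_location;   /* stand-in for source_location */

#define TOY_ADHOC_BIT 0x80000000u    /* assumed marker for ad-hoc values */
#define TOY_MAX_ADHOC 64             /* arbitrary size for the sketch */

struct toy_adhoc_data
{
  toy_location locus;        /* the caret position */
  toy_location start, end;   /* the two extra fields suggested above */
  void *block;
};

static struct toy_adhoc_data toy_table[TOY_MAX_ADHOC];
static unsigned int toy_table_len;

int
toy_is_adhoc (toy_location loc)
{
  return (loc & TOY_ADHOC_BIT) != 0;
}

/* Return LOCUS itself when there is no block and the range is trivial;
   otherwise record the extra data and return an ad-hoc value.  */
toy_location
toy_encode (toy_location locus, toy_location start, toy_location end,
            void *block)
{
  assert (!toy_is_adhoc (locus));
  if (block == NULL && start == locus && end == locus)
    return locus;
  assert (toy_table_len < TOY_MAX_ADHOC);
  struct toy_adhoc_data d = { locus, start, end, block };
  toy_table[toy_table_len] = d;
  return TOY_ADHOC_BIT | toy_table_len++;
}

toy_location
toy_get_locus (toy_location loc)
{
  return toy_is_adhoc (loc) ? toy_table[loc & ~TOY_ADHOC_BIT].locus : loc;
}

toy_location
toy_get_start (toy_location loc)
{
  return toy_is_adhoc (loc) ? toy_table[loc & ~TOY_ADHOC_BIT].start : loc;
}

toy_location
toy_get_end (toy_location loc)
{
  return toy_is_adhoc (loc) ? toy_table[loc & ~TOY_ADHOC_BIT].end : loc;
}
```

With this shape, a caller never needs to know which kind of value it
holds: the accessors fall back to the locus for plain locations.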

	Jakub

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
  2015-09-15 10:28     ` Jakub Jelinek
  2015-09-15 10:48       ` Richard Biener
@ 2015-09-15 12:09       ` Manuel López-Ibáñez
  2015-09-15 12:18         ` Richard Biener
  2015-09-15 13:53       ` David Malcolm
  2 siblings, 1 reply; 133+ messages in thread
From: Manuel López-Ibáñez @ 2015-09-15 12:09 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener; +Cc: David Malcolm, GCC Patches

On 15/09/15 12:20, Jakub Jelinek wrote:
> On Tue, Sep 15, 2015 at 12:14:22PM +0200, Richard Biener wrote:
>>> diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
>>> index 760467c..c7558a0 100644
>>> --- a/gcc/cp/parser.h
>>> +++ b/gcc/cp/parser.h
>>> @@ -61,6 +61,8 @@ struct GTY (()) cp_token {
>>>     BOOL_BITFIELD purged_p : 1;
>>>     /* The location at which this token was found.  */
>>>     location_t location;
>>> +  /* The source range at which this token was found.  */
>>> +  source_range range;
>>
>> Is it just me or does location now feel somewhat redundant with range?  Can't we
>> compress that somehow?
>
> For a token I'd expect it is redundant, I don't see how it would be useful
> for a single preprocessing token to have more than start and end locations.

If memory usage is a concern, can't we easily find out the end location of a 
token just by simply re-lexing it from the start location? Many tokens are a 
single character.
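
To make the re-lexing idea concrete, here is a minimal sketch.  It is a
toy, not libcpp: toy_token_finish is a hypothetical helper that only
understands identifiers, runs of digits, and single-character tokens, and
it works on offsets into a plain buffer rather than on source_location
values.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return the offset of the last character of the token that begins at
   offset START in BUF.  Anything that is not an identifier or a run of
   digits is treated as a one-character token.  */
size_t
toy_token_finish (const char *buf, size_t start)
{
  assert (buf != NULL);
  size_t i = start;
  unsigned char c = (unsigned char) buf[i];

  if (isalpha (c) || c == '_')
    {
      /* Identifier: extend while the next char is alphanumeric or '_'.  */
      while (isalnum ((unsigned char) buf[i + 1]) || buf[i + 1] == '_')
        i++;
    }
  else if (isdigit (c))
    {
      /* Number (digits only in this toy): extend over further digits.  */
      while (isdigit ((unsigned char) buf[i + 1]))
        i++;
    }

  return i;
}
```

For the buffer "abc + def", starting at offset 0 yields the offset of
'c', and starting at the '+' yields the same offset back.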

> But generally, for expressions, 3 locations make sense.
> If you have
> abc + def
> ~~~~^~~~~
> then having a range is useful.

It seems you want to have a location for '+' plus left-most and right-most 
locations. However, we will need the location of 'a' and the location of 'd', 
not only the location of 'f'. Thus, we probably want to have (or build) a range 
for each operand, to be able to handle something like:

(a + b) + (c + d)
~~~~~~~ ^ ~~~~~~~

This does not require to track the ranges of every token, but it requires to 
track ranges of expressions when building them. Moreover, we want to store 
these ranges/locations in the expression node, since many operands (VAR_DECL, 
constants, etc) do not have a location. (In my humble opinion, this a more 
serious defect of GCC than not tracking a range for tokens 
https://gcc.gnu.org/bugzilla/PR43486)

Note also that we do not necessarily need to track ranges in libcpp to print 
ranges in diagnostics. The latter can be implemented and useful before the 
former. The example above:

void foo(void)
{
   float c,d;
   int * a,b;
   (a + b) + (c + d); //error: invalid operands to binary + (have ‘int *’ and 
‘float’)
}

could be implemented simply by building the ranges while parsing (as I did in 
https://gcc.gnu.org/ml/gcc-patches/2009-08/msg00174.html), no need to store 
them explicitly. My intuition is that many of the ranges needed by diagnostics 
could be dynamically generated from two locations and passed to the point where 
it is used (like we do with location_t). We could store them, but we do not 
need to. Some examples:

     int y = *SomeA.X;
             ^~~~~~~~
     myvec[1]/P;
     ~~~~~~~~^~
   struct point origin = { x: 0.0, y: 0.0 };
                           ~~ ^
                           .x =

Do we have a place to store the range for "myvec[1]" or for "x:" ? (honest 
question).
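
A rough sketch of what building the range while parsing could look like.
The types and the toy_build_binary helper are hypothetical, invented for
this illustration; locations are reduced to column numbers on one line.

```c
#include <assert.h>

typedef unsigned int toy_loc;   /* stand-in for location_t */

struct toy_range
{
  toy_loc start;    /* first column covered */
  toy_loc finish;   /* last column covered */
};

/* What the parser has in hand for an operand: where the caret goes,
   and the span to underline.  */
struct toy_expr
{
  toy_loc caret;
  struct toy_range range;
};

/* Combine two operands and an operator location into the result for
   the whole binary expression.  Nothing is stored anywhere: the range
   is computed from the operands and returned by value.  */
struct toy_expr
toy_build_binary (struct toy_expr lhs, toy_loc op_loc, struct toy_expr rhs)
{
  assert (lhs.range.finish < op_loc && op_loc < rhs.range.start);
  struct toy_expr result;
  result.caret = op_loc;
  result.range.start = lhs.range.start;
  result.range.finish = rhs.range.finish;
  return result;
}
```

For "abc + def", with "abc" at columns 1-3, '+' at column 5 and "def" at
columns 7-9, the result has its caret at column 5 and covers columns 1-9,
which is what the ~~~~^~~~~ underline needs.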

Cheers,

Manuel.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
  2015-09-15 12:09       ` Manuel López-Ibáñez
@ 2015-09-15 12:18         ` Richard Biener
  2015-09-15 12:57           ` Manuel López-Ibáñez
  2015-09-17 19:13           ` Jeff Law
  0 siblings, 2 replies; 133+ messages in thread
From: Richard Biener @ 2015-09-15 12:18 UTC (permalink / raw)
  To: Manuel López-Ibáñez
  Cc: Jakub Jelinek, David Malcolm, GCC Patches

On Tue, Sep 15, 2015 at 2:08 PM, Manuel López-Ibáñez
<lopezibanez@gmail.com> wrote:
> On 15/09/15 12:20, Jakub Jelinek wrote:
>>
>> On Tue, Sep 15, 2015 at 12:14:22PM +0200, Richard Biener wrote:
>>>>
>>>> diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
>>>> index 760467c..c7558a0 100644
>>>> --- a/gcc/cp/parser.h
>>>> +++ b/gcc/cp/parser.h
>>>> @@ -61,6 +61,8 @@ struct GTY (()) cp_token {
>>>>     BOOL_BITFIELD purged_p : 1;
>>>>     /* The location at which this token was found.  */
>>>>     location_t location;
>>>> +  /* The source range at which this token was found.  */
>>>> +  source_range range;
>>>
>>>
>>> Is it just me or does location now feel somewhat redundant with range?
>>> Can't we
>>> compress that somehow?
>>
>>
>> For a token I'd expect it is redundant, I don't see how it would be useful
>> for a single preprocessing token to have more than start and end
>> locations.
>
>
> If memory usage is a concern, can't we easily find out the end location of a
> token just by simply re-lexing it from the start location? Many tokens are a
> single character.
>
>> But generally, for expressions, 3 locations make sense.
>> If you have
>> abc + def
>> ~~~~^~~~~
>> then having a range is useful.
>
>
> It seems you want to have a location for '+' plus left-most and right-most
> locations. However, we will need the location of 'a' and the location of
> 'd', not only the location of 'f'. Thus, we probably want to have (or build)
> a range for each operand, to be able to handle something like:
>
> (a + b) + (c + d)
> ~~~~~~~ ^ ~~~~~~~
>
> This does not require to track the ranges of every token, but it requires to
> track ranges of expressions when building them. Moreover, we want to store
> these ranges/locations in the expression node, since many operands
> (VAR_DECL, constants, etc) do not have a location. (In my humble opinion,
> this is a more serious defect of GCC than not tracking a range for tokens
> https://gcc.gnu.org/bugzilla/PR43486)

Of course this boils down to "uses" of a VAR_DECL using the shared tree
node.  On GIMPLE some stmt kinds have separate locations for each operand
(PHI nodes), on GENERIC we'd have to invent a no-op expr tree code to
wrap such uses to be able to give them distinct locations (can't use sth
existing as frontends would need to ignore them in a different way than say
NOP_EXPRs or NON_LVALUE_EXPRs).

> Note also that we do not necessarily need to track ranges in libcpp to print
> ranges in diagnostics. The latter can be implemented and useful before the
> former. The example above:
>
> void foo(void)
> {
>   float c,d;
>   int * a,b;
>   (a + b) + (c + d); //error: invalid operands to binary + (have ‘int *’ and
> ‘float’)
> }
>
> could be implemented simply by building the ranges while parsing (as I did
> in https://gcc.gnu.org/ml/gcc-patches/2009-08/msg00174.html), no need to
> store them explicitly. My intuition is that many of the ranges needed by
> diagnostics could be dynamically generated from two locations and passed to
> the point where it is used (like we do with location_t). We could store
> them, but we do not need to. Some examples:
>
>     int y = *SomeA.X;
>             ^~~~~~~~
>     myvec[1]/P;
>     ~~~~~~~~^~
>   struct point origin = { x: 0.0, y: 0.0 };
>                           ~~ ^
>                           .x =
>
> Do we have a place to store the range for "myvec[1]" or for "x:" ? (honest
> question).
>
> Cheers,
>
> Manuel.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
  2015-09-15 12:18         ` Richard Biener
@ 2015-09-15 12:57           ` Manuel López-Ibáñez
  2015-09-17 19:11             ` Jeff Law
  2015-09-17 19:13           ` Jeff Law
  1 sibling, 1 reply; 133+ messages in thread
From: Manuel López-Ibáñez @ 2015-09-15 12:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, David Malcolm, GCC Patches

On 15 September 2015 at 14:18, Richard Biener
<richard.guenther@gmail.com> wrote:
> Of course this boils down to "uses" of a VAR_DECL using the shared tree
> node.  On GIMPLE some stmt kinds have separate locations for each operand
> (PHI nodes), on GENERIC we'd have to invent a no-op expr tree code to
> wrap such uses to be able to give them distinct locations (can't use sth
> existing as frontends would need to ignore them in a different way than say
> NOP_EXPRs or NON_LVALUE_EXPRs).
>

The problem with that approach (besides the waste of memory implied by
a whole tree node just to store one location_t) is keeping those
wrappers in place while making them transparent for most of the
compiler. According to Arnaud, folding made this approach infeasible:
https://gcc.gnu.org/ml/gcc-patches/2012-09/msg01222.html

The other two alternatives are to store the location of the operands
on the expressions themselves or to store them as on-the-side
data-structure, but they come with their own drawbacks. I was
initially more in favour of the wrapper solution, but after dealing
with NOP_EXPRs, having to deal also with LOC_EXPR would be a nightmare
(as you say, they will have to be ignored in a different way). The
other alternatives seem less invasive and the problems mentioned here
https://gcc.gnu.org/ml/gcc-patches/2012-11/msg00164.html do not seem
as serious as I thought (passing down the location of the operand is
becoming the norm anyway).

Cheers,

Manuel.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
  2015-09-15 10:28     ` Jakub Jelinek
  2015-09-15 10:48       ` Richard Biener
  2015-09-15 12:09       ` Manuel López-Ibáñez
@ 2015-09-15 13:53       ` David Malcolm
  2 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-15 13:53 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, GCC Patches

On Tue, 2015-09-15 at 12:20 +0200, Jakub Jelinek wrote:
> On Tue, Sep 15, 2015 at 12:14:22PM +0200, Richard Biener wrote:
> > > diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
> > > index 760467c..c7558a0 100644
> > > --- a/gcc/cp/parser.h
> > > +++ b/gcc/cp/parser.h
> > > @@ -61,6 +61,8 @@ struct GTY (()) cp_token {
> > >    BOOL_BITFIELD purged_p : 1;
> > >    /* The location at which this token was found.  */
> > >    location_t location;
> > > +  /* The source range at which this token was found.  */
> > > +  source_range range;
> > 
> > Is it just me or does location now feel somewhat redundant with range?  Can't we
> > compress that somehow?
> 
> For a token I'd expect it is redundant, I don't see how it would be useful
> for a single preprocessing token to have more than start and end locations.

Indeed.   Patch 7 of the patch kit is merely considering tokens (in each
of libcpp and the C and C++ FEs), and in each of these three cases, once
we have a range, the "location" is just the start of the range.   I kept
the location to keep the patch smaller.   I can look into eliminating
the location field from the 3 relevant structures.

(I considered that we could have just the location as the start, and
calculate the end from the length of the token, but AFAIK this can't
cover token concatenation by the preprocessor, and if we're going to
have ranges later on, it's simpler to use the same representation).

> But generally, for expressions, 3 locations make sense.
> If you have
> abc + def
> ~~~~^~~~~
> then having a range is useful.  In any case, I'm surprised that the ranges aren't encoded in
> location_t (the data structures behind it, where we already stick also
> BLOCK pointer).

I posted a few ideas I had for implementing ranges in:
"[PATCH 12/22] Add source-ranges for trees"
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00739.html

but putting them into the ad-hoc location data is one that hadn't
occurred to me.  Thanks; I'll look into it.

Dave

^ permalink raw reply	[flat|nested] 133+ messages in thread

* [PATCH WIP] Use Levenshtein distance for various misspellings in C frontend v2
  2015-09-11 15:31   ` Manuel López-Ibáñez
@ 2015-09-15 15:25     ` David Malcolm
  2015-09-15 16:25       ` Manuel López-Ibáñez
  2015-09-16  8:45       ` Richard Biener
  0 siblings, 2 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-15 15:25 UTC (permalink / raw)
  To: gcc-patches; +Cc: Manuel López-Ibáñez, David Malcolm

Updated patch attached, which is now independent of the rest of the
patch kit; see below.  Various other comments inline.

On Fri, 2015-09-11 at 17:30 +0200, Manuel López-Ibáñez wrote:
> On 10/09/15 22:28, David Malcolm wrote:
> > There are a couple of FIXMEs here:
> > * where to call levenshtein_distance_unit_tests
>
> Should this be part of make check? Perhaps a small program that is compiled and
> linked with spellcheck.c? This would be possible if spellcheck.c did not depend
> on tree.h or tm.h, which I doubt it needs to.

Ideally I'd like to put them into a unittest plugin I've been working on:
 https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00765.html
In the meantime, they only get run in an ENABLE_CHECKING build.

> > * should we attempt error-recovery in c-typeck.c:build_component_ref
>
> I would say yes, but why not leave this discussion to a later patch? The
> current one seems useful enough.

(nods)

> > +
> > +/* Look for the closest match for NAME within the currently valid
> > +   scopes.
> > +
> > +   This finds the identifier with the lowest Levenshtein distance to
> > +   NAME.  If there are multiple candidates with equal minimal distance,
> > +   the first one found is returned.  Scopes are searched from innermost
> > +   outwards, and within a scope in reverse order of declaration, thus
> > +   benefiting candidates "near" to the current scope.  */
> > +
> > +tree
> > +lookup_name_fuzzy (tree name)
> > +{
> > +  gcc_assert (TREE_CODE (name) == IDENTIFIER_NODE);
> > +
> > +  c_binding *best_binding = NULL;
> > +  int best_distance = INT_MAX;
> > +
> > +  for (c_scope *scope = current_scope; scope; scope = scope->outer)
> > +    for (c_binding *binding = scope->bindings; binding; binding = binding->prev)
> > +      {
> > +	if (!binding->id)
> > +	  continue;
> > +	int dist = levenshtein_distance (name, binding->id);
> > +	if (dist < best_distance)
>
> I guess 'dist' cannot be negative. Can it be zero? If not, wouldn't be
> appropriate to exit as soon as it becomes 1?

It can't be negative, so I've converted it to unsigned int, and introduced an
"edit_distance_t" typedef for it.

It would be appropriate to exit as soon as we reach 1 if we agree
that lookup_name_fuzzy isn't intended to find exact matches (since
otherwise we might fail to return an exact match if we see a
distance 1 match first).

I haven't implemented that early bailout in this iteration of the
patch; should I?

> Is this code discriminating between types and names? That is, what happens for:
>
> typedef int ins;
>
> int foo(void)
> {
>     int inr;
>     inp x;
> }

Thanks.  I've fixed that.

> > +/* Recursively append candidate IDENTIFIER_NODEs to CANDIDATES.  */
> > +
> > +static void
> > +lookup_field_fuzzy_find_candidates (tree type, tree component,
> > +				    vec<tree> *candidates)
> > +{
> > +  tree field;
> > +  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
> > +    {
> > +      if (DECL_NAME (field) == NULL_TREE
> > +	  && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
> > +	      || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))
> > +	{
> > +	  lookup_field_fuzzy_find_candidates (TREE_TYPE (field),
> > +					      component,
> > +					      candidates);
> > +	}
> > +
> > +      if (DECL_NAME (field))
> > +	candidates->safe_push (field);
> > +    }
> > +}
>
> This is appending inner-most, isn't it? Thus, given:

Yes.

> struct s{
>      struct j { int aa; } kk;
>      int aa;
> };
>
> void foo(struct s x)
> {
>      x.ab;
> }
>
> it will find s::j::aa before s::aa, no?

AIUI, it doesn't look inside the "kk", only for anonymous structs.

I added a test for this.

> >   tree
> > -build_component_ref (location_t loc, tree datum, tree component)
> > +build_component_ref (location_t loc, tree datum, tree component,
> > +		     source_range *ident_range)
> >   {
> >     tree type = TREE_TYPE (datum);
> >     enum tree_code code = TREE_CODE (type);
> > @@ -2294,7 +2356,31 @@ build_component_ref (location_t loc, tree datum, tree component)
> >
> >         if (!field)
> >   	{
> > -	  error_at (loc, "%qT has no member named %qE", type, component);
> > +	  if (!ident_range)
> > +	    {
> > +	      error_at (loc, "%qT has no member named %qE",
> > +			type, component);
> > +	      return error_mark_node;
> > +	    }
> > +	  gcc_rich_location richloc (*ident_range);
> > +	  if (TREE_CODE (datum) == INDIRECT_REF)
> > +	    richloc.add_expr (TREE_OPERAND (datum, 0));
> > +	  else
> > +	    richloc.add_expr (datum);
> > +	  field = lookup_field_fuzzy (type, component);
> > +	  if (field)
> > +	    {
> > +	      error_at_rich_loc
> > +		(&richloc,
> > +		 "%qT has no member named %qE; did you mean %qE?",
> > +		 type, component, field);
> > +	      /* FIXME: error recovery: should we try to keep going,
> > +		 with "field"? (having issued an error, and hence no
> > +		 output).  */
> > +	    }
> > +	  else
> > +	    error_at_rich_loc (&richloc, "%qT has no member named %qE",
> > +			       type, component);
> >   	  return error_mark_node;
> >   	}
>
> I don't understand why looking for a candidate or not depends on ident_range.

This is because the old patch was integrated with the source_range
ideas from the rest of the patch kit.  I've taken that out in the new
version.

> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/spellcheck.c
> > @@ -0,0 +1,36 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-fdiagnostics-show-caret" } */
> > +
> > +struct foo
> > +{
> > +  int foo;
> > +  int bar;
> > +  int baz;
> > +};
> > +
> > +int test (struct foo *ptr)
> > +{
> > +  return ptr->m_bar; /* { dg-error "'struct foo' has no member named 'm_bar'; did you mean 'bar'?" } */
> > +
> > +/* { dg-begin-multiline-output "" }
> > +   return ptr->m_bar;
> > +          ~~~  ^~~~~
> > +   { dg-end-multiline-output "" } */
> > +}
> > +
> > +int test2 (void)
> > +{
> > +  struct foo instance = {};
> > +  return instance.m_bar; /* { dg-error "'struct foo' has no member named 'm_bar'; did you mean 'bar'?" } */
> > +
> > +/* { dg-begin-multiline-output "" }
> > +   return instance.m_bar;
> > +          ~~~~~~~~ ^~~~~
> > +   { dg-end-multiline-output "" } */
> > +}
> > +
> > +int64 foo; /* { dg-error "unknown type name 'int64'; did you mean 'int'?" } */
> > +/* { dg-begin-multiline-output "" }
> > + int64 foo;
> > + ^~~~~
> > +   { dg-end-multiline-output "" } */
> >
>
>
> These tests could also test different scopes, clashes between types and fields
> and variables, and the correct behavior for nested struct/unions.

Thanks; added to TODO list below.

> I wonder whether it would be worth it to extend existing tests if now they emit
> the "do you mean" part to be sure they are doing the right thing.

Thanks; added to TODO list below.  These are passing now due to the
dg-error regexp not caring about the exact message.

Many of the field names in these tests are very short; it's not clear
to me that there's a good single suggestion that can be made if there
are several 1-char field names to choose from.

I noticed that the old patch could sometimes offer unhelpful
suggestions; I added a test for this:

  nonsensical_suggestion_t var;

where it would suggest something unrelated.  I suppressed that in
lookup_name_fuzzy by only offering a suggestion if the distance is at
most half of the length of what the user typed, and that seemed to work
well, albeit in the few cases I tried.  I suspect that we may
want a similar suppression for lookup_field_fuzzy.
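
The cutoff can be sketched in isolation as follows.  The helper name
toy_suggestion_is_plausible is made up for this illustration; it mirrors
the integer-division test in the patch's lookup_name_fuzzy but works on a
plain C string instead of an IDENTIFIER_NODE.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if a candidate at edit distance DISTANCE from TYPED is
   close enough to be worth suggesting: no more than half of the
   characters typed (rounding down) may differ.  */
bool
toy_suggestion_is_plausible (const char *typed, unsigned int distance)
{
  assert (typed != NULL);
  return distance <= strlen (typed) / 2;
}
```

With this rule "int64" (5 characters, distance 2 from "int") still gets
a suggestion, whereas "int64_t" (7 characters, distance 4 from "int")
does not.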

> Cheers,
>
> Manuel.

Thanks.

Updated version of the patch follows.

This version of the patch is independent of the rest of the kit,
and applies directly on top of trunk (r227562, specifically).

Changes since previous version:
- it's now independent of the rest of the patch kit.
- removal of tracking of fieldname range "ident_range" from calls
  to build_component_ref, just using the location_t.
- removal of show-caret/multiline tests from testcase
- introduced a typedef "edit_distance_t", using it to convert
  the underlying type from "int" to "unsigned int".
- "lookup_name_fuzzy" now only considers bindings of a TYPE_DECL,
  thus matching "ins" rather than "inr" for the example given by Manu
  here:
    https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00813.html
- lookup_name_fuzzy: don't offer a suggestion if the distance is too
  high, since such a suggestion is likely to be bogus
- added test coverage to try to cover the above
- reimplemented levenshtein_distance to avoid allocating and building
  an (m + 1) * (n + 1) matrix in favor of just tracking two rows
  at once
- made levenshtein_distance_unit_tests automatically run each test
  both ways; added some more tests
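
For readers unfamiliar with the two-row trick mentioned in the list
above, here is a standalone sketch.  It is not the code from
spellcheck.c: it takes plain C strings rather than trees, the function
name is invented, and the typedef is only a stand-in for the patch's
edit_distance_t.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int edit_distance_t;   /* stand-in for the patch's typedef */

static edit_distance_t
min3 (edit_distance_t a, edit_distance_t b, edit_distance_t c)
{
  edit_distance_t m = a < b ? a : b;
  return m < c ? m : c;
}

/* Levenshtein distance between S and T, keeping only two rows of the
   (m + 1) * (n + 1) matrix alive at any one time.  */
edit_distance_t
two_row_levenshtein (const char *s, const char *t)
{
  assert (s && t);
  size_t m = strlen (s);
  size_t n = strlen (t);

  edit_distance_t *prev = malloc ((n + 1) * sizeof *prev);
  edit_distance_t *curr = malloc ((n + 1) * sizeof *curr);
  assert (prev && curr);

  /* Row 0: turning "" into the first J chars of T takes J insertions.  */
  for (size_t j = 0; j <= n; j++)
    prev[j] = (edit_distance_t) j;

  for (size_t i = 1; i <= m; i++)
    {
      /* Turning the first I chars of S into "" takes I deletions.  */
      curr[0] = (edit_distance_t) i;
      for (size_t j = 1; j <= n; j++)
        {
          edit_distance_t cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
          curr[j] = min3 (prev[j] + 1,          /* deletion */
                          curr[j - 1] + 1,      /* insertion */
                          prev[j - 1] + cost);  /* substitution or match */
        }
      /* The row just filled in becomes the previous row.  */
      edit_distance_t *tmp = prev;
      prev = curr;
      curr = tmp;
    }

  edit_distance_t result = prev[n];
  free (prev);
  free (curr);
  return result;
}
```

Memory use is proportional to the length of the second string only, and
the result is symmetric in its arguments, which is why running each unit
test both ways is a cheap sanity check.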

I attempted the error-recovery in build_component_ref, but I found it
could make things worse.  For example, in
gcc/testsuite/gcc.dg/anon-struct-11.c:
  f3 (&e.D);		/* { dg-error "no member" } */
becomes:
  error: 'struct E' has no member named 'D'; did you mean 'b'?
but if we try to use "b", this then leads to these additional bogus
messages:
  warning: passing argument 1 of 'f3' from incompatible
    pointer type [-Wincompatible-pointer-types]
  note: expected 'D * {aka struct <anonymous> *}' but
    argument is of type 'char *'

Similarly, in gcc/testsuite/gcc.dg/c11-anon-struct-2.c:
  x.i = 0; /* { dg-error "has no member" } */
this becomes:
  error: 'struct s5' has no member named 'i'; did you mean 'a'?
which then leads to:
  error: incompatible types when assigning to type
   'struct <anonymous>' from type 'int'

So this version of the patch doesn't attempt to use the suggested
field.

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu; adds
9 PASSes to gcc.sum.

I'm posting it here as a work-in-progress.

Remaining work:
  * the FIXME about where to call levenshtein_distance_unit_tests;
there's an argument that this could be moved to libiberty (is C++
allowed in libiberty?); I'd prefer to get the unittest idea from
 https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00765.html
into trunk, and then move it into there.  Right now it's all
gcc_assert, so optimizes away in a production build.
  * more testcases as noted by Manu above
  * try existing testcases as noted by Manu above
  * possible early return when distance == 1
  * perhaps some kind of limit on the number of iterations inside
levenshtein_distance (e.g. governed by a param).
  * perhaps some ability to pass in a limit on the
distance we care about, so we can immediately reject distances
that will be above this

It also strikes me that sometimes a "misspelling" is a missing
header file, and that the most helpful thing to do might be to
suggest including that header file.  For instance given:
  $ cat /tmp/foo.c
  int64_t i;

  $ ./xgcc -B. /tmp/foo.c
  /tmp/foo.c:1:1: error: unknown type name ‘int64_t’
  int64_t i;
  ^
(where the suggestion of "int" is suppressed due to the distance
being too long) it might be helpful to print:
  /tmp/foo.c:1:1: error: unknown type name 'int64_t'; did you mean to include '<inttypes.h>'?
  int64_t i;
  ^
That does seem like a separate enhancement, though.

gcc/ChangeLog:
	* Makefile.in (OBJS): Add spellcheck.o.
	* spellcheck.c: New file.
	* spellcheck.h: New file.

gcc/c-family/ChangeLog:
	* c-common.h (lookup_name_fuzzy): New decl.

gcc/c/ChangeLog:
	* c-decl.c: Include spellcheck.h.
	(lookup_name_fuzzy): New.
	* c-parser.c: Include spellcheck.h.
	(c_parser_declaration_or_fndef): If "unknown type name",
	attempt to suggest a close match using lookup_name_fuzzy.
	* c-typeck.c: Include spellcheck.h.
	(lookup_field_fuzzy_find_candidates): New function.
	(lookup_field_fuzzy): New function.
	(build_component_ref): Use lookup_field_fuzzy to suggest close
	matches when printing field-not-found error.

gcc/testsuite/ChangeLog:
	* gcc.dg/spellcheck.c: New file.
---
 gcc/Makefile.in                   |   1 +
 gcc/c-family/c-common.h           |   1 +
 gcc/c/c-decl.c                    |  45 +++++++++++
 gcc/c/c-parser.c                  |  11 ++-
 gcc/c/c-typeck.c                  |  66 ++++++++++++++-
 gcc/spellcheck.c                  | 166 ++++++++++++++++++++++++++++++++++++++
 gcc/spellcheck.h                  |  35 ++++++++
 gcc/testsuite/gcc.dg/spellcheck.c |  49 +++++++++++
 8 files changed, 371 insertions(+), 3 deletions(-)
 create mode 100644 gcc/spellcheck.c
 create mode 100644 gcc/spellcheck.h
 create mode 100644 gcc/testsuite/gcc.dg/spellcheck.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 3d1c1e5..73a29b4 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1391,6 +1391,7 @@ OBJS = \
 	shrink-wrap.o \
 	simplify-rtx.o \
 	sparseset.o \
+	spellcheck.o \
 	sreal.o \
 	stack-ptr-mod.o \
 	statistics.o \
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 74d1bc1..e5f867c 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -971,6 +971,7 @@ extern tree finish_label_address_expr (tree, location_t);
    different implementations.  Used in c-common.c.  */
 extern tree lookup_label (tree);
 extern tree lookup_name (tree);
+extern tree lookup_name_fuzzy (tree);
 extern bool lvalue_p (const_tree);
 
 extern bool vector_targets_convertible_p (const_tree t1, const_tree t2);
diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index b83c584..d919019 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-family/c-ada-spec.h"
 #include "cilk.h"
 #include "builtins.h"
+#include "spellcheck.h"
 
 /* In grokdeclarator, distinguish syntactic contexts of declarators.  */
 enum decl_context
@@ -3900,6 +3901,50 @@ lookup_name_in_scope (tree name, struct c_scope *scope)
       return b->decl;
   return 0;
 }
+
+/* Look for the closest match for NAME within the currently valid
+   scopes.
+
+   This finds the identifier with the lowest Levenshtein distance to
+   NAME.  If there are multiple candidates with equal minimal distance,
+   the first one found is returned.  Scopes are searched from innermost
+   outwards, and within a scope in reverse order of declaration, thus
+   benefiting candidates "near" to the current scope.  */
+
+tree
+lookup_name_fuzzy (tree name)
+{
+  gcc_assert (TREE_CODE (name) == IDENTIFIER_NODE);
+
+  c_binding *best_binding = NULL;
+  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
+
+  for (c_scope *scope = current_scope; scope; scope = scope->outer)
+    for (c_binding *binding = scope->bindings; binding; binding = binding->prev)
+      {
+	if (!binding->id)
+	  continue;
+	if (TREE_CODE (binding->decl) != TYPE_DECL)
+	  continue;
+	edit_distance_t dist = levenshtein_distance (name, binding->id);
+	if (dist < best_distance)
+	  {
+	    best_distance = dist;
+	    best_binding = binding;
+	  }
+      }
+
+  if (!best_binding)
+    return NULL;
+
+  /* If more than half of the letters were misspelled, the suggestion is
+     likely to be meaningless.  */
+  if (best_distance > IDENTIFIER_LENGTH (name) / 2)
+    return NULL;
+
+  return best_binding->id;
+}
+
 \f
 /* Create the predefined scalar types of C,
    and some nodes representing standard constants (0, 1, (void *) 0).
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 11a2b0f..f04c88b 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -66,6 +66,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "builtins.h"
 #include "gomp-constants.h"
 #include "c-family/c-indentation.h"
+#include "spellcheck.h"
 
 \f
 /* Initialization routine for this file.  */
@@ -1539,8 +1540,14 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
           || c_parser_peek_2nd_token (parser)->type == CPP_MULT)
       && (!nested || !lookup_name (c_parser_peek_token (parser)->value)))
     {
-      error_at (here, "unknown type name %qE",
-                c_parser_peek_token (parser)->value);
+      tree hint = lookup_name_fuzzy (c_parser_peek_token (parser)->value);
+      if (hint)
+	error_at (here, "unknown type name %qE; did you mean %qE?",
+		  c_parser_peek_token (parser)->value,
+		  hint);
+      else
+	error_at (here, "unknown type name %qE",
+		  c_parser_peek_token (parser)->value);
 
       /* Parse declspecs normally to get a correct pointer type, but avoid
          a further "fails to be a type name" error.  Refuse nested functions
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index dc22396..3dded26 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-family/c-ubsan.h"
 #include "cilk.h"
 #include "gomp-constants.h"
+#include "spellcheck.h"
 
 /* Possible cases of implicit bad conversions.  Used to select
    diagnostic messages in convert_for_assignment.  */
@@ -2249,6 +2250,64 @@ lookup_field (tree type, tree component)
   return tree_cons (NULL_TREE, field, NULL_TREE);
 }
 
+/* Recursively append candidate IDENTIFIER_NODEs to CANDIDATES.  */
+
+static void
+lookup_field_fuzzy_find_candidates (tree type, tree component,
+				    vec<tree> *candidates)
+{
+  tree field;
+  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
+    {
+      if (DECL_NAME (field) == NULL_TREE
+	  && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
+	      || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))
+	{
+	  lookup_field_fuzzy_find_candidates (TREE_TYPE (field),
+					      component,
+					      candidates);
+	}
+
+      if (DECL_NAME (field))
+	candidates->safe_push (DECL_NAME (field));
+    }
+}
+
+/* Like "lookup_field", but find the closest matching IDENTIFIER_NODE,
+   rather than returning a TREE_LIST for an exact match.  */
+
+static tree
+lookup_field_fuzzy (tree type, tree component)
+{
+  gcc_assert (TREE_CODE (component) == IDENTIFIER_NODE);
+
+  /* FIXME: move this to a unittest suite. */
+  levenshtein_distance_unit_tests ();
+
+  /* First, gather a list of candidates.  */
+  auto_vec <tree> candidates;
+
+  lookup_field_fuzzy_find_candidates (type, component,
+				      &candidates);
+
+  /* Now determine which is closest.  */
+  int i;
+  tree identifier;
+  tree best_identifier = NULL;
+  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
+  FOR_EACH_VEC_ELT (candidates, i, identifier)
+    {
+      edit_distance_t dist = levenshtein_distance (component, identifier);
+      if (dist < best_distance)
+	{
+	  best_distance = dist;
+	  best_identifier = identifier;
+	}
+    }
+
+  return best_identifier;
+}
+
 /* Make an expression to refer to the COMPONENT field of structure or
    union value DATUM.  COMPONENT is an IDENTIFIER_NODE.  LOC is the
    location of the COMPONENT_REF.  */
@@ -2284,7 +2343,12 @@ build_component_ref (location_t loc, tree datum, tree component)
 
       if (!field)
 	{
-	  error_at (loc, "%qT has no member named %qE", type, component);
+	  tree guessed_id = lookup_field_fuzzy (type, component);
+	  if (guessed_id)
+	    error_at (loc, "%qT has no member named %qE; did you mean %qE?",
+		      type, component, guessed_id);
+	  else
+	    error_at (loc, "%qT has no member named %qE", type, component);
 	  return error_mark_node;
 	}
 
diff --git a/gcc/spellcheck.c b/gcc/spellcheck.c
new file mode 100644
index 0000000..c407aa0
--- /dev/null
+++ b/gcc/spellcheck.c
@@ -0,0 +1,166 @@
+/* Find near-matches for strings and identifiers.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "spellcheck.h"
+
+/* The Levenshtein distance is an "edit-distance": the minimal
+   number of one-character insertions, removals or substitutions
+   that are needed to change one string into another.
+
+   This implementation uses the Wagner-Fischer algorithm.  */
+
+static edit_distance_t
+levenshtein_distance (const char *s, int m,
+		      const char *t, int n)
+{
+  const bool debug = false;
+
+  if (debug)
+    {
+      printf ("s: \"%s\" (m=%i)\n", s, m);
+      printf ("t: \"%s\" (n=%i)\n", t, n);
+    }
+
+  if (m == 0)
+    return n;
+  if (n == 0)
+    return m;
+
+  /* We effectively build a matrix where each (i, j) contains the
+     Levenshtein distance between the prefix strings s[0:i]
+     and t[0:j].
+     Rather than actually build an (m + 1) * (n + 1) matrix,
+     we simply keep track of the last row, v0 and a new row, v1,
+     which avoids an (m + 1) * (n + 1) allocation and memory accesses
+     in favor of two (m + 1) allocations.  These could potentially be
+     statically-allocated if we impose a maximum length on the
+     strings of interest.  */
+  edit_distance_t *v0 = new edit_distance_t[m + 1];
+  edit_distance_t *v1 = new edit_distance_t[m + 1];
+
+  /* The first row is for the case of an empty target string, which
+     we can reach by deleting every character in the source string.  */
+  for (int i = 0; i < m + 1; i++)
+    v0[i] = i;
+
+  /* Build successive rows.  */
+  for (int i = 0; i < n; i++)
+    {
+      if (debug)
+	{
+	  printf ("i:%i v0 = ", i);
+	  for (int j = 0; j < m + 1; j++)
+	    printf ("%i ", v0[j]);
+	  printf ("\n");
+	}
+
+      /* The initial column is for the case of an empty source string; we
+	 can reach the prefix of the target string of length i + 1
+	 by inserting i + 1 characters.  */
+      v1[0] = i + 1;
+
+      /* Build the rest of the row by considering neighbours to
+	 the north, west and northwest.  */
+      for (int j = 0; j < m; j++)
+	{
+	  edit_distance_t cost = (s[j] == t[i] ? 0 : 1);
+	  edit_distance_t deletion     = v1[j] + 1;
+	  edit_distance_t insertion    = v0[j + 1] + 1;
+	  edit_distance_t substitution = v0[j] + cost;
+	  edit_distance_t cheapest = MIN (deletion, insertion);
+	  cheapest = MIN (cheapest, substitution);
+	  v1[j + 1] = cheapest;
+	}
+
+      /* Prepare to move on to next row.  */
+      for (int j = 0; j < m + 1; j++)
+	v0[j] = v1[j];
+    }
+
+  if (debug)
+    {
+      printf ("final v1 = ");
+      for (int j = 0; j < m + 1; j++)
+	printf ("%i ", v1[j]);
+      printf ("\n");
+    }
+
+  edit_distance_t result = v1[m];
+  delete[] v0;
+  delete[] v1;
+  return result;
+}
+
+/* Calculate Levenshtein distance between two NUL-terminated strings.
+   This exists purely for the unit tests.  */
+
+edit_distance_t
+levenshtein_distance (const char *s, const char *t)
+{
+  return levenshtein_distance (s, strlen (s), t, strlen (t));
+}
+
+/* Unit tests for levenshtein_distance.  */
+
+static void
+levenshtein_distance_unit_test (const char *a, const char *b,
+				edit_distance_t expected)
+{
+  /* Run every test both ways to ensure it's symmetric.  */
+  gcc_assert (levenshtein_distance (a, b) == expected);
+  gcc_assert (levenshtein_distance (b, a) == expected);
+}
+
+void
+levenshtein_distance_unit_tests (void)
+{
+  levenshtein_distance_unit_test ("", "nonempty", strlen ("nonempty"));
+  levenshtein_distance_unit_test ("saturday", "sunday", 3);
+  levenshtein_distance_unit_test ("foo", "m_foo", 2);
+  levenshtein_distance_unit_test ("hello_world", "HelloWorld", 3);
+  levenshtein_distance_unit_test
+    ("the quick brown fox jumps over the lazy dog", "dog", 40);
+  levenshtein_distance_unit_test
+    ("the quick brown fox jumps over the lazy dog",
+     "the quick brown dog jumps over the lazy fox",
+     4);
+  levenshtein_distance_unit_test
+    ("Lorem ipsum dolor sit amet, consectetur adipiscing elit,",
+     "All your base are belong to us",
+     44);
+}
+
+/* Calculate Levenshtein distance between two identifiers.  */
+
+edit_distance_t
+levenshtein_distance (tree ident_s, tree ident_t)
+{
+  gcc_assert (TREE_CODE (ident_s) == IDENTIFIER_NODE);
+  gcc_assert (TREE_CODE (ident_t) == IDENTIFIER_NODE);
+
+  return levenshtein_distance (IDENTIFIER_POINTER (ident_s),
+			       IDENTIFIER_LENGTH (ident_s),
+			       IDENTIFIER_POINTER (ident_t),
+			       IDENTIFIER_LENGTH (ident_t));
+}
diff --git a/gcc/spellcheck.h b/gcc/spellcheck.h
new file mode 100644
index 0000000..6f50b71
--- /dev/null
+++ b/gcc/spellcheck.h
@@ -0,0 +1,35 @@
+/* Find near-matches for strings and identifiers.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_SPELLCHECK_H
+#define GCC_SPELLCHECK_H
+
+typedef unsigned int edit_distance_t;
+const edit_distance_t MAX_EDIT_DISTANCE = UINT_MAX;
+
+extern void
+levenshtein_distance_unit_tests (void);
+
+extern edit_distance_t
+levenshtein_distance (const char *s, const char *t);
+
+extern edit_distance_t
+levenshtein_distance (tree ident_s, tree ident_t);
+
+#endif  /* GCC_SPELLCHECK_H  */
diff --git a/gcc/testsuite/gcc.dg/spellcheck.c b/gcc/testsuite/gcc.dg/spellcheck.c
new file mode 100644
index 0000000..53ebb86
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/spellcheck.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+
+struct foo
+{
+  int foo;
+  int bar;
+  int baz;
+};
+
+int test (struct foo *ptr)
+{
+  return ptr->m_bar; /* { dg-error "'struct foo' has no member named 'm_bar'; did you mean 'bar'?" } */
+}
+
+int test2 (void)
+{
+  struct foo instance = {0, 0, 0};
+  return instance.m_bar; /* { dg-error "'struct foo' has no member named 'm_bar'; did you mean 'bar'?" } */
+}
+
+#include <inttypes.h>
+int64 i; /* { dg-error "unknown type name 'int64'; did you mean 'int64_t'?" } */
+
+typedef int ins;
+int test3 (void)
+{
+    int inr;
+    inp x; /* { dg-error "unknown type name 'inp'; did you mean 'ins'?" } */
+}
+
+struct s {
+    struct j { int aa; } kk;
+    int ab;
+};
+
+void test4 (struct s x)
+{
+  x.ac;  /* { dg-error "'struct s' has no member named 'ac'; did you mean 'ab'?" } */
+}
+
+int test5 (struct foo *ptr)
+{
+  return sizeof (ptr->foa); /* { dg-error "'struct foo' has no member named 'foa'; did you mean 'foo'?" } */
+}
+
+/* Verify that gcc doesn't offer nonsensical suggestions.  */
+
+nonsensical_suggestion_t var; /* { dg-bogus "did you mean" } */
+/* { dg-error "unknown type name" "" { target { *-*-* } } 48 } */
-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH WIP] Use Levenshtein distance for various misspellings in C frontend v2
  2015-09-15 15:25     ` [PATCH WIP] Use Levenshtein distance for various misspellings in C frontend v2 David Malcolm
@ 2015-09-15 16:25       ` Manuel López-Ibáñez
  2015-09-16  8:45       ` Richard Biener
  1 sibling, 0 replies; 133+ messages in thread
From: Manuel López-Ibáñez @ 2015-09-15 16:25 UTC (permalink / raw)
  To: David Malcolm; +Cc: Gcc Patch List

On 15 September 2015 at 17:38, David Malcolm <dmalcolm@redhat.com> wrote:
> It would be appropriate to exit as soon as we reach 1 if we agree
> that lookup_name_fuzzy isn't intended to find exact matches (since
> otherwise we might fail to return an exact match if we see a
> distance 1 match first).
>
> I haven't implemented that early bailout in this iteration of the
> patch; should I?

Everything I say is merely a suggestion since I cannot approve patches,
so I would say: "whatever the approvers say!" :-)

Nevertheless, how would an exact match play out?

unknown type name 'inp'; did you mean 'inp'?
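
A minimal standalone sketch of how the search could behave if exact
matches are skipped: with distance 0 excluded, a distance of 1 cannot be
beaten, so the loop may stop there.  This is an illustration only, not
the patch's code; plain strings stand in for IDENTIFIER_NODEs, and
closest_match and self_check are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>
#include <vector>

typedef unsigned int edit_distance_t;

// Two-row Wagner-Fischer, following the shape of the patch's spellcheck.c.
static edit_distance_t
levenshtein_distance (const std::string &s, const std::string &t)
{
  const size_t m = s.size (), n = t.size ();
  if (m == 0)
    return n;
  if (n == 0)
    return m;
  std::vector<edit_distance_t> v0 (m + 1), v1 (m + 1);
  for (size_t j = 0; j <= m; j++)
    v0[j] = j;
  for (size_t i = 0; i < n; i++)
    {
      v1[0] = i + 1;
      for (size_t j = 0; j < m; j++)
        {
          edit_distance_t cost = (s[j] == t[i] ? 0 : 1);
          v1[j + 1] = std::min (std::min (v1[j] + 1, v0[j + 1] + 1),
                                v0[j] + cost);
        }
      v0.swap (v1);  // v0 now holds the row just computed.
    }
  return v0[m];
}

// Return the closest candidate to NAME, or "" if nothing is plausible.
static std::string
closest_match (const std::string &name,
               const std::vector<std::string> &candidates)
{
  const std::string *best = NULL;
  edit_distance_t best_distance = UINT_MAX;
  for (size_t i = 0; i < candidates.size (); i++)
    {
      edit_distance_t dist = levenshtein_distance (name, candidates[i]);
      if (dist == 0)
        continue;  // Never suggest the name that was just rejected.
      if (dist < best_distance)
        {
          best_distance = dist;
          best = &candidates[i];
        }
      if (best_distance == 1)
        break;  // Early bailout: no non-exact match can be closer.
    }
  // Same cutoff as the patch: give up if more than half the letters differ.
  if (!best || best_distance > name.size () / 2)
    return "";
  return *best;
}

// A few of the expected values from the patch's unit tests.
static void
self_check ()
{
  assert (levenshtein_distance ("", "nonempty") == 8);
  assert (levenshtein_distance ("saturday", "sunday") == 3);
  assert (levenshtein_distance ("foo", "m_foo") == 2);
}
```

Given the candidates "inp", "ins" and "int64_t", closest_match ("inp", ...)
skips the exact match and stops at "ins", so the self-referential
suggestion above would not be printed.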


> Remaining work:
>   * the FIXME about where to call levenshtein_distance_unit_tests;
> there's an argument that this could be moved to libiberty (is C++
> allowed in libiberty?); I'd prefer to get the unittest idea from
>  https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00765.html
> into trunk, and then move it into there.  Right now it's all
> gcc_assert, so optimizes away in a production build.

That would be gcc_checking_assert, no? gcc_assert() should still work
in release mode, AFAIU.

>   * try existing testcases as noted by Manu above

I think the most useful part of checking those is that we have really
wacky testcases and it may show cases where things go horribly wrong.
Plus, if the suggestion is perfect, then you have another testcase for
free. This is what I was doing with the Wformat precise locations.

> It also strikes me that sometimes a "misspelling" is a missing
> header file, and that the most helpful thing to do might be to
> suggest including that header file.  For instance given:
>   $ cat /tmp/foo.c
>   int64_t i;
>
>   $ ./xgcc -B. /tmp/foo.c
>   /tmp/foo.c:1:1: error: unknown type name ‘int64_t’
>   int64_t i;
>   ^
> (where the suggestion of "int" is suppressed due to the distance
> being too long) it might be helpful to print:
>   /tmp/foo.c:1:1: error: unknown type name 'int64_t'; did you mean to include '<inttypes.h>'?
>   int64_t i;
>   ^
> That does seem like a separate enhancement, though.

We already suggest header files for built-in functions:
https://gcc.gnu.org/PR59717
Doing the same for "standard" types would not be a stretch, but yes,
it is a separate thing.
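
For illustration, suggesting a header for a known standard type could be
as simple as a table lookup.  This is a hypothetical sketch and says
nothing about how PR59717 was implemented; std_type_hint and
get_header_for_type are made-up names, and the entries other than
int64_t are only examples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// One entry per well-known type name and a header that declares it.
struct std_type_hint
{
  const char *name;
  const char *header;
};

static const std_type_hint std_type_hints[] = {
  { "int64_t", "<inttypes.h>" },  // As in the example above; <stdint.h> works too.
  { "size_t", "<stddef.h>" },
  { "FILE", "<stdio.h>" },
  { "bool", "<stdbool.h>" },
};

// Return the header to suggest for NAME, or NULL if NAME is not in the table.
static const char *
get_header_for_type (const char *name)
{
  const size_t count = sizeof (std_type_hints) / sizeof (std_type_hints[0]);
  for (size_t i = 0; i < count; i++)
    if (strcmp (name, std_type_hints[i].name) == 0)
      return std_type_hints[i].header;
  return NULL;
}
```

A caller would try this lookup only after the fuzzy match has been
rejected, and fall back to the plain "unknown type name" error when it
returns NULL.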


> diff --git a/gcc/spellcheck.c b/gcc/spellcheck.c
> new file mode 100644
> index 0000000..c407aa0
> --- /dev/null
> +++ b/gcc/spellcheck.c
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "tree.h"
> +#include "spellcheck.h"

Why tm.h?

Great work!

Manuel.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 01/22] Change of location_get_source_line signature
  2015-09-14 19:28   ` Jeff Law
@ 2015-09-15 17:02     ` David Malcolm
  0 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-15 17:02 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Mon, 2015-09-14 at 13:26 -0600, Jeff Law wrote:
> On 09/10/2015 02:28 PM, David Malcolm wrote:
> > location_get_source_line takes an expanded_location, but the column
> > is irrelevant; it just needs a filename and line number.
> >
> > This change is used by, but independent of, the new implementation of
> > diagnostic_show_locus later in the kit, so am breaking this out early.
> >
> > gcc/ChangeLog:
> > 	* input.h (location_get_source_line): Drop "expanded_location"
> > 	param in favor of a file and line number.
> > 	* input.c (location_get_source_line): Likewise.
> > 	(dump_location_info): Update for change in signature of
> > 	location_get_source_line.
> > 	* diagnostic.c (diagnostic_print_caret_line): Likewise.
> >
> > gcc/c-family/ChangeLog:
> > 	* c-format.c (location_from_offset): Update for change in
> > 	signature of location_get_source_line.
> > 	* c-indentation.c (get_visual_column): Likewise.
> > 	(line_contains_hash_if): Likewise.
> This looks like a reasonable cleanup in and of itself.  It's OK for the 
> trunk once you've done the usual bootstrap & regression test.

Thanks; bootstrapped&regrtested; committed to trunk as r227800.


^ permalink raw reply	[flat|nested] 133+ messages in thread

* dejagnu version update?
  2015-09-14 22:45       ` Jeff Law
@ 2015-09-15 17:53         ` Mike Stump
  2015-09-15 19:23           ` David Malcolm
  2015-09-15 19:53           ` dejagnu version update? Bernhard Reutner-Fischer
  0 siblings, 2 replies; 133+ messages in thread
From: Mike Stump @ 2015-09-15 17:53 UTC (permalink / raw)
  To: Jeff Law
  Cc: Bernhard Reutner-Fischer, David Malcolm, gcc-patches List,
	GCC Development

On Sep 14, 2015, at 3:37 PM, Jeff Law <law@redhat.com> wrote:
>> Maybe GCC-6 can bump the required
>> dejagnu version to allow for getting rid of all these superfluous
>> load_gcc_lib? *blink* :)
> I'd support that as a direction.
> 
> Certainly dropping the 2001 version from our website in favor of 1.5 (which is what I'm using anyway) would be a step forward.

So, even ubuntu LTS is 1.5 now.  No harm in upgrading the website to 1.5.  I don’t know of any reason to not update and just require 1.5 at this point.  I’m not a fan of feature chasing dejagnu, but an update every 2-4 years isn’t unreasonable.

So, let’s do it this way…  Any serious and compelling reason to not update to 1.5?  If none, let’s update to 1.5 in another week or two, if no serious and compelling reasons not to.

My general plan is, slow cycle updates on dejagnu, maybe every 2 years.  LTS style releases should have the version in it before the requirement is updated.  I take this approach as I think this should be the maximal change rate of things like make, gcc, g++, ld, if possible.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-15 17:53         ` dejagnu version update? Mike Stump
@ 2015-09-15 19:23           ` David Malcolm
  2015-09-15 20:29             ` Jeff Law
  2015-09-15 19:53           ` dejagnu version update? Bernhard Reutner-Fischer
  1 sibling, 1 reply; 133+ messages in thread
From: David Malcolm @ 2015-09-15 19:23 UTC (permalink / raw)
  To: Mike Stump
  Cc: Jeff Law, Bernhard Reutner-Fischer, gcc-patches List, GCC Development

On Tue, 2015-09-15 at 10:39 -0700, Mike Stump wrote:
> On Sep 14, 2015, at 3:37 PM, Jeff Law <law@redhat.com> wrote:
> >> Maybe GCC-6 can bump the required
> >> dejagnu version to allow for getting rid of all these superfluous
> >> load_gcc_lib? *blink* :)
> > I'd support that as a direction.
> > 
> > Certainly dropping the 2001 version from our website in favor of 1.5
> (which is what I'm using anyway) would be a step forward.
> 
> So, even ubuntu LTS is 1.5 now.  No harm in upgrading the website to
> 1.5.  I don’t know of any reason to not update and just require 1.5 at
> this point.  I’m not a fan of feature chasing dejagnu, but an update
> every 2-4 years isn’t unreasonable.

FWIW, I believe RHEL 6 is at dejagnu-1.4.4   I don't know whether or not
that's an issue here.

> So, let’s do it this way…  Any serious and compelling reason to not
> update to 1.5?  If none, let’s update to 1.5 in another week or two,
> if no serious and compelling reasons not to.
> 
> My general plan is, slow cycle updates on dejagnu, maybe every 2
> years.  LTS style releases should have the version in it before the
> requirement is updated.  I take this approach as I think this should
> be the maximal change rate of things like make, gcc, g++, ld, if
> possible.


^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-15 17:53         ` dejagnu version update? Mike Stump
  2015-09-15 19:23           ` David Malcolm
@ 2015-09-15 19:53           ` Bernhard Reutner-Fischer
  2015-09-15 20:05             ` Jeff Law
  2015-09-16 13:17             ` Matthias Klose
  1 sibling, 2 replies; 133+ messages in thread
From: Bernhard Reutner-Fischer @ 2015-09-15 19:53 UTC (permalink / raw)
  To: Mike Stump, Jeff Law; +Cc: David Malcolm, gcc-patches List, GCC Development

On September 15, 2015 7:39:39 PM GMT+02:00, Mike Stump <mikestump@comcast.net> wrote:
>On Sep 14, 2015, at 3:37 PM, Jeff Law <law@redhat.com> wrote:
>>> Maybe GCC-6 can bump the required
>>> dejagnu version to allow for getting rid of all these superfluous
>>> load_gcc_lib? *blink* :)
>> I'd support that as a direction.
>> 
>> Certainly dropping the 2001 version from our website in favor of 1.5
>(which is what I'm using anyway) would be a step forward.
>
>So, even ubuntu LTS is 1.5 now.  No harm in upgrading the website to
>1.5.  I don’t know of any reason to not update and just require 1.5 at
>this point.  I’m not a fan of feature chasing dejagnu, but an update
>every 2-4 years isn’t unreasonable.
>
>So, let’s do it this way…  Any serious and compelling reason to not
>update to 1.5?  If none, let’s update to 1.5 in another week or two, if
>no serious and compelling reasons not to.
>
>My general plan is, slow cycle updates on dejagnu, maybe every 2 years.
>LTS style releases should have the version in it before the requirement
>is updated.  I take this approach as I think this should be the maximal
>change rate of things like make, gcc, g++, ld, if possible.

Yea, although this means that 1.5.3 (a version with the libdirs tweak) being just 5 months old will have to wait for another bump, I fear. For my part going to plain 1.5 is useless WRT the load_lib situation. I see no value in conditionalizing simplified libdir handling on a lucky user with recentish stuff so I'm just waiting another 2 or 4 years for this very minor cleanup.

Cheers,


^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-15 19:53           ` dejagnu version update? Bernhard Reutner-Fischer
@ 2015-09-15 20:05             ` Jeff Law
  2015-09-15 23:12               ` Mike Stump
  2015-09-16 13:17             ` Matthias Klose
  1 sibling, 1 reply; 133+ messages in thread
From: Jeff Law @ 2015-09-15 20:05 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer, Mike Stump
  Cc: David Malcolm, gcc-patches List, GCC Development

On 09/15/2015 01:23 PM, Bernhard Reutner-Fischer wrote:
> On September 15, 2015 7:39:39 PM GMT+02:00, Mike Stump
> <mikestump@comcast.net> wrote:
>> On Sep 14, 2015, at 3:37 PM, Jeff Law <law@redhat.com> wrote:
>>>> Maybe GCC-6 can bump the required dejagnu version to allow for
>>>> getting rid of all these superfluous load_gcc_lib? *blink* :)
>>> I'd support that as a direction.
>>>
>>> Certainly dropping the 2001 version from our website in favor of
>>> 1.5
>> (which is what I'm using anyway) would be a step forward.
>>
>> So, even ubuntu LTS is 1.5 now.  No harm in upgrading the website
>> to 1.5.  I don’t know of any reason to not update and just require
>> 1.5 at this point.  I’m not a fan of feature chasing dejagnu, but
>> an update every 2-4 years isn’t unreasonable.
>>
>> So, let’s do it this way…  Any serious and compelling reason to
>> not update to 1.5?  If none, let’s update to 1.5 in another week or
>> two, if no serious and compelling reasons not to.
>>
>> My general plan is, slow cycle updates on dejagnu, maybe every 2
>> years. LTS style releases should have the version in it before the
>> requirement is updated.  I take this approach as I think this
>> should be the maximal change rate of things like make, gcc, g++,
>> ld, if possible.
>
> Yea, although this means that 1.5.3 (a version with the libdirs
> tweak) being just 5 months old will have to wait for another bump, I
> fear. For my part going to plain 1.5 is useless WRT the load_lib
> situation. I see no value in conditionalizing simplified libdir
> handling on a lucky user with recentish stuff so I'm just waiting
> another 2 or 4 years for this very minor cleanup.
Given we haven't updated the dejagnu reqs since ~2001, I think stepping 
forward would be appropriate and I'd support moving all the way to 1.5.3 
with the expectation that we'll be on a cadence of no faster than 2 
years going forward.

jeff

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-15 19:23           ` David Malcolm
@ 2015-09-15 20:29             ` Jeff Law
  2015-09-15 21:15               ` Bernhard Reutner-Fischer
  0 siblings, 1 reply; 133+ messages in thread
From: Jeff Law @ 2015-09-15 20:29 UTC (permalink / raw)
  To: David Malcolm, Mike Stump
  Cc: Bernhard Reutner-Fischer, gcc-patches List, GCC Development

On 09/15/2015 01:21 PM, David Malcolm wrote:
> On Tue, 2015-09-15 at 10:39 -0700, Mike Stump wrote:
>> On Sep 14, 2015, at 3:37 PM, Jeff Law <law@redhat.com> wrote:
>>>> Maybe GCC-6 can bump the required
>>>> dejagnu version to allow for getting rid of all these superfluous
>>>> load_gcc_lib? *blink* :)
>>> I'd support that as a direction.
>>>
>>> Certainly dropping the 2001 version from our website in favor of 1.5
>> (which is what I'm using anyway) would be a step forward.
>>
>> So, even ubuntu LTS is 1.5 now.  No harm in upgrading the website to
>> 1.5.  I don’t know of any reason to not update and just require 1.5 at
>> this point.  I’m not a fan of feature chasing dejagnu, but an update
>> every 2-4 years isn’t unreasonable.
>
> FWIW, I believe RHEL 6 is at dejagnu-1.4.4   I don't know whether or not
> that's an issue here.
I'd consider it a non-issue.  Folks that want to do GCC development on 
RHEL 6 are probably few and far between and can probably update dejagnu 
if need be ;-)

If ubuntu, fedora, debian current releases were stuck at 1.4, then it'd 
be a bigger issue.

jeff

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-15 20:29             ` Jeff Law
@ 2015-09-15 21:15               ` Bernhard Reutner-Fischer
  2017-05-13 10:38                 ` Bernhard Reutner-Fischer
  0 siblings, 1 reply; 133+ messages in thread
From: Bernhard Reutner-Fischer @ 2015-09-15 21:15 UTC (permalink / raw)
  To: Jeff Law, David Malcolm, Mike Stump; +Cc: gcc-patches List, GCC Development

On September 15, 2015 10:05:27 PM GMT+02:00, Jeff Law <law@redhat.com> wrote:
>On 09/15/2015 01:21 PM, David Malcolm wrote:
>> On Tue, 2015-09-15 at 10:39 -0700, Mike Stump wrote:
>>> On Sep 14, 2015, at 3:37 PM, Jeff Law <law@redhat.com> wrote:
>>>>> Maybe GCC-6 can bump the required
>>>>> dejagnu version to allow for getting rid of all these superfluous
>>>>> load_gcc_lib? *blink* :)
>>>> I'd support that as a direction.
>>>>
>>>> Certainly dropping the 2001 version from our website in favor of
>1.5
>>> (which is what I'm using anyway) would be a step forward.
>>>
>>> So, even ubuntu LTS is 1.5 now.  No harm in upgrading the website to
>>> 1.5.  I don’t know of any reason to not update and just require 1.5
>at
>>> this point.  I’m not a fan of feature chasing dejagnu, but an update
>>> every 2-4 years isn’t unreasonable.
>>
>> FWIW, I believe RHEL 6 is at dejagnu-1.4.4   I don't know whether or
>not
>> that's an issue here.
>I'd consider it a non-issue.  Folks that want to do GCC development on 
>RHEL 6 are probably few and far between and can probably update dejagnu
>
>if need be ;-)
>
>If ubuntu, fedora, debian current releases were stuck at 1.4, then it'd
>
>be a bigger issue.

Debian sid has 1.5.3 fwiw, so I assume Debian 9 will have that too. Not sure if we can get it into Debian 8, I'm not intimately familiar with the policy. If OTOH GCC-6 requires it then that's probably a strong argument to let it bubble down to Debian 8 if need be.
Widespread other buildsystems should pose no problem, dunno about the big non-debian distros in this respect.

Thanks,

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-15 20:05             ` Jeff Law
@ 2015-09-15 23:12               ` Mike Stump
  2015-09-16  7:41                 ` Andreas Schwab
  0 siblings, 1 reply; 133+ messages in thread
From: Mike Stump @ 2015-09-15 23:12 UTC (permalink / raw)
  To: Jeff Law
  Cc: Bernhard Reutner-Fischer, David Malcolm, gcc-patches List,
	GCC Development

On Sep 15, 2015, at 1:04 PM, Jeff Law <law@redhat.com> wrote:
> Given we haven't updated the dejagnu reqs since ~2001, I think stepping forward would be appropriate and I'd support moving all the way to 1.5.3 with the expectation that we'll be on a cadence of no faster than 2 years going forward.

So, I actually picked 1.5, not at random, but because RHEL 7 has 1.5.1 and because ubuntu LTS has 1.5.  The point was for people to get the software for free via their normal update mechanisms.  We could go beyond 1.5 if we had a compelling need.  I just don’t see the need.  The software presently works with 1.4.4 and there aren’t any changes that require anything newer.  If the ubuntu people and the RHEL people can push 1.5.3 into their update chains, that would remove these two reasons to hold back at 1.5.  MacPorts has 1.5.3 already, not sure about brew.

If someone wanted to do something major, I’m fine with an on-demand dejagnu release and a bump of gcc and even the release branch to match it.

I don’t have a strong desire to not update past 1.5, if people really would like to.  Just a general, would be nice not to have to.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-15 23:12               ` Mike Stump
@ 2015-09-16  7:41                 ` Andreas Schwab
  2015-09-16 16:19                   ` Mike Stump
  0 siblings, 1 reply; 133+ messages in thread
From: Andreas Schwab @ 2015-09-16  7:41 UTC (permalink / raw)
  To: Mike Stump
  Cc: Jeff Law, Bernhard Reutner-Fischer, David Malcolm,
	gcc-patches List, GCC Development

Mike Stump <mikestump@comcast.net> writes:

> The software presently works with 1.4.4 and there aren’t any changes
> that require anything newer.

SLES 12 has 1.4.4.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH WIP] Use Levenshtein distance for various misspellings in C frontend v2
  2015-09-15 15:25     ` [PATCH WIP] Use Levenshtein distance for various misspellings in C frontend v2 David Malcolm
  2015-09-15 16:25       ` Manuel López-Ibáñez
@ 2015-09-16  8:45       ` Richard Biener
  2015-09-16 13:33         ` Michael Matz
  2015-09-17 19:32         ` Jeff Law
  1 sibling, 2 replies; 133+ messages in thread
From: Richard Biener @ 2015-09-16  8:45 UTC (permalink / raw)
  To: David Malcolm; +Cc: GCC Patches, Manuel López-Ibáñez

On Tue, Sep 15, 2015 at 5:38 PM, David Malcolm <dmalcolm@redhat.com> wrote:
> Updated patch attached, which is now independent of the rest of the
> patch kit; see below.  Various other comments inline.
>
> On Fri, 2015-09-11 at 17:30 +0200, Manuel López-Ibáñez wrote:
> On 10/09/15 22:28, David Malcolm wrote:
>> > There are a couple of FIXMEs here:
>> > * where to call levenshtein_distance_unit_tests
>>
>> Should this be part of make check? Perhaps a small program that is compiled and
>> linked with spellcheck.c? This would be possible if spellcheck.c did not depend
>> on tree.h or tm.h, which I doubt it needs to.
>
> Ideally I'd like to put them into a unittest plugin I've been working on:
>  https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00765.html
> In the meantime, they only get run in an ENABLE_CHECKING build.
>
>> > * should we attempt error-recovery in c-typeck.c:build_component_ref
>>
>> I would say yes, but why not leave this discussion to a later patch? The
>> current one seems useful enough.
>
> (nods)
>
>> > +
>> > +/* Look for the closest match for NAME within the currently valid
>> > +   scopes.
>> > +
>> > +   This finds the identifier with the lowest Levenshtein distance to
>> > +   NAME.  If there are multiple candidates with equal minimal distance,
>> > +   the first one found is returned.  Scopes are searched from innermost
>> > +   outwards, and within a scope in reverse order of declaration, thus
>> > +   benefiting candidates "near" to the current scope.  */
>> > +
>> > +tree
>> > +lookup_name_fuzzy (tree name)
>> > +{
>> > +  gcc_assert (TREE_CODE (name) == IDENTIFIER_NODE);
>> > +
>> > +  c_binding *best_binding = NULL;
>> > +  int best_distance = INT_MAX;
>> > +
>> > +  for (c_scope *scope = current_scope; scope; scope = scope->outer)
>> > +    for (c_binding *binding = scope->bindings; binding; binding = binding->prev)
>> > +      {
>> > +   if (!binding->id)
>> > +     continue;
>> > +   int dist = levenshtein_distance (name, binding->id);
>> > +   if (dist < best_distance)

Btw, this looks quite expensive - I'm sure we want to limit the effort
here a bit.
We should also not allow arbitrary "best" distances, and not do this for
very simple identifiers such as 'i'.  Say,

foo()
{
  int i;
  for (i =0; i<10; ++i)
   for (j = 0; j < 12; ++j)
    ;
}

I don't want us to suggest using 'i' instead of 'j' (a good hint is that
I used 'j' multiple times).
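
A minimal sketch of the kind of limit I have in mind, as standalone C++
rather than code against the C frontend.  The names worth_measuring,
max_useful_distance and kMinLengthForHint are made up for illustration,
and the threshold of 2 is an assumption, not something from the patch.
It relies only on the fact that an edit distance can never be smaller
than the difference in length of the two strings, so many candidates can
be rejected before any distance is computed:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical tuning knob (not from the patch): identifiers shorter
// than this never get a suggestion at all.
const std::size_t kMinLengthForHint = 2;

// Largest distance that would still be reported for an identifier of
// length LEN, mirroring the "at most half the letters" rule in the patch.
std::size_t max_useful_distance (std::size_t len)
{
  return len / 2;
}

// Return true if CANDIDATE is worth passing to the (expensive) distance
// computation for the misspelled identifier TYPED.
bool worth_measuring (const char *typed, const char *candidate)
{
  std::size_t typed_len = std::strlen (typed);
  std::size_t cand_len = std::strlen (candidate);

  // Very simple identifiers such as 'i' or 'j': never suggest.
  if (typed_len < kMinLengthForHint)
    return false;

  // The edit distance is at least the difference in lengths, so a
  // candidate that differs too much in length cannot be close enough.
  std::size_t diff = (typed_len > cand_len
                      ? typed_len - cand_len
                      : cand_len - typed_len);
  return diff <= max_useful_distance (typed_len);
}
```

Under these assumed thresholds 'j' is never compared against 'i', while
"foz" is still compared against "foo".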

So while the idea might be an improvement in selected cases it can cause
confusion as well.  And if using the suggestion for further parsing it can
cause worse followup errors (unless we can limit such "fixup" use to the
cases where we can parse the result without errors).  Consider

foo()
{
  foz = 1;
}

if we suggest 'foo' instead of foz then we'll get a more confusing followup
error if we actually use it.

But maybe you already handle all these cases (didn't look at the patch,
just saw the above expensive loop plus dropped some obvious concerns).

Richard.

>> I guess 'dist' cannot be negative. Can it be zero? If not, wouldn't it be
>> appropriate to exit as soon as it becomes 1?
>
> It can't be negative, so I've converted it to unsigned int, and introduced an
> "edit_distance_t" typedef for it.
>
> It would be appropriate to exit as soon as we reach 1 if we agree
> that lookup_name_fuzzy isn't intended to find exact matches (since
> otherwise we might fail to return an exact match if we see a
> distance 1 match first).
>
> I haven't implemented that early bailout in this iteration of the
> patch; should I?
>
>> Is this code discriminating between types and names? That is, what happens for:
>>
>> typedef int ins;
>>
>> int foo(void)
>> {
>>     int inr;
>>     inp x;
>> }
>
> Thanks.  I've fixed that.
>
>> > +/* Recursively append candidate IDENTIFIER_NODEs to CANDIDATES.  */
>> > +
>> > +static void
>> > +lookup_field_fuzzy_find_candidates (tree type, tree component,
>> > +                               vec<tree> *candidates)
>> > +{
>> > +  tree field;
>> > +  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
>> > +    {
>> > +      if (DECL_NAME (field) == NULL_TREE
>> > +     && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
>> > +         || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))
>> > +   {
>> > +     lookup_field_fuzzy_find_candidates (TREE_TYPE (field),
>> > +                                         component,
>> > +                                         candidates);
>> > +   }
>> > +
>> > +      if (DECL_NAME (field))
>> > +   candidates->safe_push (field);
>> > +    }
>> > +}
>>
>> This is appending inner-most, isn't it? Thus, given:
>
> Yes.
>
>> struct s{
>>      struct j { int aa; } kk;
>>      int aa;
>> };
>>
>> void foo(struct s x)
>> {
>>      x.ab;
>> }
>>
>> it will find s::j::aa before s::aa, no?
>
> AIUI, it doesn't look inside the "kk", only for anonymous structs.
>
> I added a test for this.
>
>> >   tree
>> > -build_component_ref (location_t loc, tree datum, tree component)
>> > +build_component_ref (location_t loc, tree datum, tree component,
>> > +                source_range *ident_range)
>> >   {
>> >     tree type = TREE_TYPE (datum);
>> >     enum tree_code code = TREE_CODE (type);
>> > @@ -2294,7 +2356,31 @@ build_component_ref (location_t loc, tree datum, tree component)
>> >
>> >         if (!field)
>> >     {
>> > -     error_at (loc, "%qT has no member named %qE", type, component);
>> > +     if (!ident_range)
>> > +       {
>> > +         error_at (loc, "%qT has no member named %qE",
>> > +                   type, component);
>> > +         return error_mark_node;
>> > +       }
>> > +     gcc_rich_location richloc (*ident_range);
>> > +     if (TREE_CODE (datum) == INDIRECT_REF)
>> > +       richloc.add_expr (TREE_OPERAND (datum, 0));
>> > +     else
>> > +       richloc.add_expr (datum);
>> > +     field = lookup_field_fuzzy (type, component);
>> > +     if (field)
>> > +       {
>> > +         error_at_rich_loc
>> > +           (&richloc,
>> > +            "%qT has no member named %qE; did you mean %qE?",
>> > +            type, component, field);
>> > +         /* FIXME: error recovery: should we try to keep going,
>> > +            with "field"? (having issued an error, and hence no
>> > +            output).  */
>> > +       }
>> > +     else
>> > +       error_at_rich_loc (&richloc, "%qT has no member named %qE",
>> > +                          type, component);
>> >       return error_mark_node;
>> >     }
>>
>> I don't understand why looking for a candidate or not depends on ident_range.
>
> This is because the old patch was integrated with the source_range
> ideas from the rest of the patch kit.  I've taken that out in the new
> version.
>
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.dg/spellcheck.c
>> > @@ -0,0 +1,36 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-fdiagnostics-show-caret" } */
>> > +
>> > +struct foo
>> > +{
>> > +  int foo;
>> > +  int bar;
>> > +  int baz;
>> > +};
>> > +
>> > +int test (struct foo *ptr)
>> > +{
>> > +  return ptr->m_bar; /* { dg-error "'struct foo' has no member named 'm_bar'; did you mean 'bar'?" } */
>> > +
>> > +/* { dg-begin-multiline-output "" }
>> > +   return ptr->m_bar;
>> > +          ~~~  ^~~~~
>> > +   { dg-end-multiline-output "" } */
>> > +}
>> > +
>> > +int test2 (void)
>> > +{
>> > +  struct foo instance = {};
>> > +  return instance.m_bar; /* { dg-error "'struct foo' has no member named 'm_bar'; did you mean 'bar'?" } */
>> > +
>> > +/* { dg-begin-multiline-output "" }
>> > +   return instance.m_bar;
>> > +          ~~~~~~~~ ^~~~~
>> > +   { dg-end-multiline-output "" } */
>> > +}
>> > +
>> > +int64 foo; /* { dg-error "unknown type name 'int64'; did you mean 'int'?" } */
>> > +/* { dg-begin-multiline-output "" }
>> > + int64 foo;
>> > + ^~~~~
>> > +   { dg-end-multiline-output "" } */
>> >
>>
>>
>> These tests could also test different scopes, clashes between types and fields
>> and variables, and the correct behavior for nested struct/unions.
>
> Thanks; added to TODO list below.
>
>> I wonder whether it would be worth it to extend existing tests if now they emit
>> the "do you mean" part to be sure they are doing the right thing.
>
> Thanks; added to TODO list below.  These are passing now due to the
> dg-error regexp not caring about the exact message.
>
> Many of the field names in these tests are very short; it's not clear
> to me that there's a good single suggestion that can be made if there
> are several 1-char field names to choose from.
>
> I noticed that the old patch could sometimes offer unhelpful
> suggestions; I added a test for this:
>
>   nonsensical_suggestion_t var;
>
> where it would suggest something unrelated.  I suppressed that in
> lookup_name_fuzzy by only offering a suggestion if the distance is less
> than half of the length of what the user typed and that seemed to work
> well, albeit in the few cases I tried.  I suspect that we may
> want a similar suppression for lookup_field_fuzzy.
>
>> Cheers,
>>
>> Manuel.
>
> Thanks.
>
> Updated version of the patch follows.
>
> This version of the patch is independent of the rest of the kit,
> and applies directly on top of trunk (r227562, specifically).
>
> Changes since previous version:
> - it's now independent of the rest of the patch kit.
> - removal of tracking of fieldname range "ident_range" from calls
>   to build_component_ref, just using the location_t.
> - removal of show-caret/multiline tests from testcase
> - introduced a typedef "edit_distance_t", using it to convert
>   the underlying type from "int" to "unsigned int".
> - "lookup_name_fuzzy" now only considers bindings of a TYPE_DECL,
>   thus matching "ins" rather than "inr" for the example given by Manu
>   here:
>     https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00813.html
> - lookup_name_fuzzy: don't offer a suggestion if the distance is too
>   high, since such a suggestion is likely to be bogus
> - added test coverage to try to cover the above
> - reimplemented levenshtein_distance to avoid allocating and building
>   an (m + 1) * (n + 1) matrix in favor of just tracking two rows
>   at once
> - made levenshtein_distance_unit_tests automatically run each test
>   both ways; added some more tests
>
> I attempted the error-recovery in build_component_ref, but I found it
> could make things worse.  For example, in
> gcc/testsuite/gcc.dg/anon-struct-11.c:
>   f3 (&e.D);            /* { dg-error "no member" } */
> becomes:
>   error: 'struct E' has no member named 'D'; did you mean 'b'?
> but if we try to use "b", this then leads to these additional bogus
> messages:
>   warning: passing argument 1 of 'f3' from incompatible
>     pointer type [-Wincompatible-pointer-types]
>   note: expected 'D * {aka struct <anonymous> *}' but
>     argument is of type 'char *'
>
> Similarly, in gcc/testsuite/gcc.dg/c11-anon-struct-2.c:
>   x.i = 0; /* { dg-error "has no member" } */
> this becomes:
>   error: 'struct s5' has no member named 'i'; did you mean 'a'?
> which then leads to:
>   error: incompatible types when assigning to type
>    'struct <anonymous>' from type 'int'
>
> So this version of the patch doesn't attempt to use the suggested
> field.
>
> Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu; adds
> 9 PASSes to gcc.sum.
>
> I'm posting it here as a work-in-progress.
>
> Remaining work:
>   * the FIXME about where to call levenshtein_distance_unit_tests;
> there's an argument that this could be moved to libiberty (is C++
> allowed in libiberty?); I'd prefer to get the unittest idea from
>  https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00765.html
> into trunk, and then move it into there.  Right now it's all
> gcc_assert, so it is optimized away in a production build.
>   * more testcases as noted by Manu above
>   * try existing testcases as noted by Manu above
>   * possible early return when distance == 1
>   * perhaps some kind of limit on the number of iterations inside
> levenshtein_distance (e.g. governed by a param).
>   * perhaps some ability to pass in a limit on the
> distance we care about, so we can immediately reject distances
> that will be above this
>
> It also strikes me that sometimes a "misspelling" is a missing
> header file, and that the most helpful thing to do might be to
> suggest including that header file.  For instance given:
>   $ cat /tmp/foo.c
>   int64_t i;
>
>   $ ./xgcc -B. /tmp/foo.c
>   /tmp/foo.c:1:1: error: unknown type name ‘int64_t’
>   int64_t i;
>   ^
> (where the suggestion of "int" is suppressed due to the distance
> being too high) it might be helpful to print:
>   /tmp/foo.c:1:1: error: unknown type name 'int64_t'; did you mean to include '<inttypes.h>'?
>   int64_t i;
>   ^
> That does seem like a separate enhancement, though.
>
> gcc/ChangeLog:
>         * Makefile.in (OBJS): Add spellcheck.o.
>         * spellcheck.c: New file.
>         * spellcheck.h: New file.
>
> gcc/c-family/ChangeLog:
>         * c-common.h (lookup_name_fuzzy): New decl.
>
> gcc/c/ChangeLog:
>         * c-decl.c: Include spellcheck.h.
>         (lookup_name_fuzzy): New.
>         * c-parser.c: Include spellcheck.h.
>         (c_parser_declaration_or_fndef): If "unknown type name",
>         attempt to suggest a close match using lookup_name_fuzzy.
>         * c-typeck.c: Include spellcheck.h.
>         (lookup_field_fuzzy_find_candidates): New function.
>         (lookup_field_fuzzy): New function.
>         (build_component_ref): Use lookup_field_fuzzy to suggest close
>         matches when printing field-not-found error.
>
> gcc/testsuite/ChangeLog:
>         * gcc.dg/spellcheck.c: New file.
> ---
>  gcc/Makefile.in                   |   1 +
>  gcc/c-family/c-common.h           |   1 +
>  gcc/c/c-decl.c                    |  45 +++++++++++
>  gcc/c/c-parser.c                  |  11 ++-
>  gcc/c/c-typeck.c                  |  66 ++++++++++++++-
>  gcc/spellcheck.c                  | 166 ++++++++++++++++++++++++++++++++++++++
>  gcc/spellcheck.h                  |  35 ++++++++
>  gcc/testsuite/gcc.dg/spellcheck.c |  49 +++++++++++
>  8 files changed, 371 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/spellcheck.c
>  create mode 100644 gcc/spellcheck.h
>  create mode 100644 gcc/testsuite/gcc.dg/spellcheck.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 3d1c1e5..73a29b4 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1391,6 +1391,7 @@ OBJS = \
>         shrink-wrap.o \
>         simplify-rtx.o \
>         sparseset.o \
> +       spellcheck.o \
>         sreal.o \
>         stack-ptr-mod.o \
>         statistics.o \
> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> index 74d1bc1..e5f867c 100644
> --- a/gcc/c-family/c-common.h
> +++ b/gcc/c-family/c-common.h
> @@ -971,6 +971,7 @@ extern tree finish_label_address_expr (tree, location_t);
>     different implementations.  Used in c-common.c.  */
>  extern tree lookup_label (tree);
>  extern tree lookup_name (tree);
> +extern tree lookup_name_fuzzy (tree);
>  extern bool lvalue_p (const_tree);
>
>  extern bool vector_targets_convertible_p (const_tree t1, const_tree t2);
> diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
> index b83c584..d919019 100644
> --- a/gcc/c/c-decl.c
> +++ b/gcc/c/c-decl.c
> @@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "c-family/c-ada-spec.h"
>  #include "cilk.h"
>  #include "builtins.h"
> +#include "spellcheck.h"
>
>  /* In grokdeclarator, distinguish syntactic contexts of declarators.  */
>  enum decl_context
> @@ -3900,6 +3901,50 @@ lookup_name_in_scope (tree name, struct c_scope *scope)
>        return b->decl;
>    return 0;
>  }
> +
> +/* Look for the closest match for NAME within the currently valid
> +   scopes.
> +
> +   This finds the identifier with the lowest Levenshtein distance to
> +   NAME.  If there are multiple candidates with equal minimal distance,
> +   the first one found is returned.  Scopes are searched from innermost
> +   outwards, and within a scope in reverse order of declaration, thus
> +   benefiting candidates "near" to the current scope.  */
> +
> +tree
> +lookup_name_fuzzy (tree name)
> +{
> +  gcc_assert (TREE_CODE (name) == IDENTIFIER_NODE);
> +
> +  c_binding *best_binding = NULL;
> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> +
> +  for (c_scope *scope = current_scope; scope; scope = scope->outer)
> +    for (c_binding *binding = scope->bindings; binding; binding = binding->prev)
> +      {
> +       if (!binding->id)
> +         continue;
> +       if (TREE_CODE (binding->decl) != TYPE_DECL)
> +         continue;
> +       edit_distance_t dist = levenshtein_distance (name, binding->id);
> +       if (dist < best_distance)
> +         {
> +           best_distance = dist;
> +           best_binding = binding;
> +         }
> +      }
> +
> +  if (!best_binding)
> +    return NULL;
> +
> +  /* If more than half of the letters were misspelled, the suggestion is
> +     likely to be meaningless.  */
> +  if (best_distance > IDENTIFIER_LENGTH (name) / 2 )
> +    return NULL;
> +
> +  return best_binding->id;
> +}
> +
>
>  /* Create the predefined scalar types of C,
>     and some nodes representing standard constants (0, 1, (void *) 0).
> diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
> index 11a2b0f..f04c88b 100644
> --- a/gcc/c/c-parser.c
> +++ b/gcc/c/c-parser.c
> @@ -66,6 +66,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "builtins.h"
>  #include "gomp-constants.h"
>  #include "c-family/c-indentation.h"
> +#include "spellcheck.h"
>
>
>  /* Initialization routine for this file.  */
> @@ -1539,8 +1540,14 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
>            || c_parser_peek_2nd_token (parser)->type == CPP_MULT)
>        && (!nested || !lookup_name (c_parser_peek_token (parser)->value)))
>      {
> -      error_at (here, "unknown type name %qE",
> -                c_parser_peek_token (parser)->value);
> +      tree hint = lookup_name_fuzzy (c_parser_peek_token (parser)->value);
> +      if (hint)
> +       error_at (here, "unknown type name %qE; did you mean %qE?",
> +                 c_parser_peek_token (parser)->value,
> +                 hint);
> +      else
> +       error_at (here, "unknown type name %qE",
> +                 c_parser_peek_token (parser)->value);
>
>        /* Parse declspecs normally to get a correct pointer type, but avoid
>           a further "fails to be a type name" error.  Refuse nested functions
> diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
> index dc22396..3dded26 100644
> --- a/gcc/c/c-typeck.c
> +++ b/gcc/c/c-typeck.c
> @@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "c-family/c-ubsan.h"
>  #include "cilk.h"
>  #include "gomp-constants.h"
> +#include "spellcheck.h"
>
>  /* Possible cases of implicit bad conversions.  Used to select
>     diagnostic messages in convert_for_assignment.  */
> @@ -2249,6 +2250,64 @@ lookup_field (tree type, tree component)
>    return tree_cons (NULL_TREE, field, NULL_TREE);
>  }
>
> +/* Recursively append candidate IDENTIFIER_NODEs to CANDIDATES.  */
> +
> +static void
> +lookup_field_fuzzy_find_candidates (tree type, tree component,
> +                                   vec<tree> *candidates)
> +{
> +  tree field;
> +  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
> +    {
> +      if (DECL_NAME (field) == NULL_TREE
> +         && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
> +             || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))
> +       {
> +         lookup_field_fuzzy_find_candidates (TREE_TYPE (field),
> +                                             component,
> +                                             candidates);
> +       }
> +
> +      if (DECL_NAME (field))
> +       candidates->safe_push (DECL_NAME (field));
> +    }
> +}
> +
> +/* Like "lookup_field", but find the closest matching IDENTIFIER_NODE,
> +   rather than returning a TREE_LIST for an exact match.  */
> +
> +static tree
> +lookup_field_fuzzy (tree type, tree component)
> +{
> +  gcc_assert (TREE_CODE (component) == IDENTIFIER_NODE);
> +
> +  /* FIXME: move this to a unittest suite. */
> +  levenshtein_distance_unit_tests ();
> +
> +  /* First, gather a list of candidates.  */
> +  auto_vec <tree> candidates;
> +
> +  lookup_field_fuzzy_find_candidates (type, component,
> +                                     &candidates);
> +
> +  /* Now determine which is closest.  */
> +  int i;
> +  tree identifier;
> +  tree best_identifier = NULL;
> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> +  FOR_EACH_VEC_ELT (candidates, i, identifier)
> +    {
> +      edit_distance_t dist = levenshtein_distance (component, identifier);
> +      if (dist < best_distance)
> +       {
> +         best_distance = dist;
> +         best_identifier = identifier;
> +       }
> +    }
> +
> +  return best_identifier;
> +}
> +
>  /* Make an expression to refer to the COMPONENT field of structure or
>     union value DATUM.  COMPONENT is an IDENTIFIER_NODE.  LOC is the
>     location of the COMPONENT_REF.  */
> @@ -2284,7 +2343,12 @@ build_component_ref (location_t loc, tree datum, tree component)
>
>        if (!field)
>         {
> -         error_at (loc, "%qT has no member named %qE", type, component);
> +         tree guessed_id = lookup_field_fuzzy (type, component);
> +         if (guessed_id)
> +           error_at (loc, "%qT has no member named %qE; did you mean %qE?",
> +                     type, component, guessed_id);
> +         else
> +           error_at (loc, "%qT has no member named %qE", type, component);
>           return error_mark_node;
>         }
>
> diff --git a/gcc/spellcheck.c b/gcc/spellcheck.c
> new file mode 100644
> index 0000000..c407aa0
> --- /dev/null
> +++ b/gcc/spellcheck.c
> @@ -0,0 +1,166 @@
> +/* Find near-matches for strings and identifiers.
> +   Copyright (C) 2015 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "tree.h"
> +#include "spellcheck.h"
> +
> +/* The Levenshtein distance is an "edit-distance": the minimal
> +   number of one-character insertions, removals or substitutions
> +   that are needed to change one string into another.
> +
> +   This implementation uses the Wagner-Fischer algorithm.  */
> +
> +static edit_distance_t
> +levenshtein_distance (const char *s, int m,
> +                     const char *t, int n)
> +{
> +  const bool debug = false;
> +
> +  if (debug)
> +    {
> +      printf ("s: \"%s\" (m=%i)\n", s, m);
> +      printf ("t: \"%s\" (n=%i)\n", t, n);
> +    }
> +
> +  if (m == 0)
> +    return n;
> +  if (n == 0)
> +    return m;
> +
> +  /* We effectively build a matrix where each (i, j) contains the
> +     Levenshtein distance between the prefix strings s[0:i]
> +     and t[0:j].
> +     Rather than actually build an (m + 1) * (n + 1) matrix,
> +     we simply keep track of the last row, v0 and a new row, v1,
> +     which avoids an (m + 1) * (n + 1) allocation and memory accesses
> +     in favor of two (m + 1) allocations.  These could potentially be
> +     statically-allocated if we impose a maximum length on the
> +     strings of interest.  */
> +  edit_distance_t *v0 = new edit_distance_t[m + 1];
> +  edit_distance_t *v1 = new edit_distance_t[m + 1];
> +
> +  /* The first row is for the case of an empty target string, which
> +     we can reach by deleting every character in the source string.  */
> +  for (int i = 0; i < m + 1; i++)
> +    v0[i] = i;
> +
> +  /* Build successive rows.  */
> +  for (int i = 0; i < n; i++)
> +    {
> +      if (debug)
> +       {
> +         printf ("i:%i v0 = ", i);
> +         for (int j = 0; j < m + 1; j++)
> +           printf ("%i ", v0[j]);
> +         printf ("\n");
> +       }
> +
> +      /* The initial column is for the case of an empty source string; we
> +        can reach prefixes of the target string of length i
> +        by inserting i characters.  */
> +      v1[0] = i + 1;
> +
> +      /* Build the rest of the row by considering neighbours to
> +        the north, west and northwest.  */
> +      for (int j = 0; j < m; j++)
> +       {
> +         edit_distance_t cost = (s[j] == t[i] ? 0 : 1);
> +         edit_distance_t deletion     = v1[j] + 1;
> +         edit_distance_t insertion    = v0[j + 1] + 1;
> +         edit_distance_t substitution = v0[j] + cost;
> +         edit_distance_t cheapest = MIN (deletion, insertion);
> +         cheapest = MIN (cheapest, substitution);
> +         v1[j + 1] = cheapest;
> +       }
> +
> +      /* Prepare to move on to next row.  */
> +      for (int j = 0; j < m + 1; j++)
> +       v0[j] = v1[j];
> +    }
> +
> +  if (debug)
> +    {
> +      printf ("final v1 = ");
> +      for (int j = 0; j < m + 1; j++)
> +       printf ("%i ", v1[j]);
> +      printf ("\n");
> +    }
> +
> +  edit_distance_t result = v1[m];
> +  delete[] v0;
> +  delete[] v1;
> +  return result;
> +}
> +
> +/* Calculate Levenshtein distance between two nil-terminated strings.
> +   This exists purely for the unit tests.  */
> +
> +edit_distance_t
> +levenshtein_distance (const char *s, const char *t)
> +{
> +  return levenshtein_distance (s, strlen (s), t, strlen (t));
> +}
> +
> +/* Unit tests for levenshtein_distance.  */
> +
> +static void
> +levenshtein_distance_unit_test (const char *a, const char *b,
> +                               edit_distance_t expected)
> +{
> +  /* Run every test both ways to ensure it's symmetric.  */
> +  gcc_assert (levenshtein_distance (a, b) == expected);
> +  gcc_assert (levenshtein_distance (b, a) == expected);
> +}
> +
> +void
> +levenshtein_distance_unit_tests (void)
> +{
> +  levenshtein_distance_unit_test ("", "nonempty", strlen ("nonempty"));
> +  levenshtein_distance_unit_test ("saturday", "sunday", 3);
> +  levenshtein_distance_unit_test ("foo", "m_foo", 2);
> +  levenshtein_distance_unit_test ("hello_world", "HelloWorld", 3);
> +  levenshtein_distance_unit_test
> +    ("the quick brown fox jumps over the lazy dog", "dog", 40);
> +  levenshtein_distance_unit_test
> +    ("the quick brown fox jumps over the lazy dog",
> +     "the quick brown dog jumps over the lazy fox",
> +     4);
> +  levenshtein_distance_unit_test
> +    ("Lorem ipsum dolor sit amet, consectetur adipiscing elit,",
> +     "All your base are belong to us",
> +     44);
> +}
> +
> +/* Calculate Levenshtein distance between two identifiers.  */
> +
> +edit_distance_t
> +levenshtein_distance (tree ident_s, tree ident_t)
> +{
> +  gcc_assert (TREE_CODE (ident_s) == IDENTIFIER_NODE);
> +  gcc_assert (TREE_CODE (ident_t) == IDENTIFIER_NODE);
> +
> +  return levenshtein_distance (IDENTIFIER_POINTER (ident_s),
> +                              IDENTIFIER_LENGTH (ident_s),
> +                              IDENTIFIER_POINTER (ident_t),
> +                              IDENTIFIER_LENGTH (ident_t));
> +}
> diff --git a/gcc/spellcheck.h b/gcc/spellcheck.h
> new file mode 100644
> index 0000000..6f50b71
> --- /dev/null
> +++ b/gcc/spellcheck.h
> @@ -0,0 +1,35 @@
> +/* Find near-matches for strings and identifiers.
> +   Copyright (C) 2015 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#ifndef GCC_SPELLCHECK_H
> +#define GCC_SPELLCHECK_H
> +
> +typedef unsigned int edit_distance_t;
> +const edit_distance_t MAX_EDIT_DISTANCE = UINT_MAX;
> +
> +extern void
> +levenshtein_distance_unit_tests (void);
> +
> +extern edit_distance_t
> +levenshtein_distance (const char *s, const char *t);
> +
> +extern edit_distance_t
> +levenshtein_distance (tree ident_s, tree ident_t);
> +
> +#endif  /* GCC_SPELLCHECK_H  */
> diff --git a/gcc/testsuite/gcc.dg/spellcheck.c b/gcc/testsuite/gcc.dg/spellcheck.c
> new file mode 100644
> index 0000000..53ebb86
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/spellcheck.c
> @@ -0,0 +1,49 @@
> +/* { dg-do compile } */
> +
> +struct foo
> +{
> +  int foo;
> +  int bar;
> +  int baz;
> +};
> +
> +int test (struct foo *ptr)
> +{
> +  return ptr->m_bar; /* { dg-error "'struct foo' has no member named 'm_bar'; did you mean 'bar'?" } */
> +}
> +
> +int test2 (void)
> +{
> +  struct foo instance = {0, 0, 0};
> +  return instance.m_bar; /* { dg-error "'struct foo' has no member named 'm_bar'; did you mean 'bar'?" } */
> +}
> +
> +#include <inttypes.h>
> +int64 i; /* { dg-error "unknown type name 'int64'; did you mean 'int64_t'?" } */
> +
> +typedef int ins;
> +int test3 (void)
> +{
> +    int inr;
> +    inp x; /* { dg-error "unknown type name 'inp'; did you mean 'ins'?" } */
> +}
> +
> +struct s {
> +    struct j { int aa; } kk;
> +    int ab;
> +};
> +
> +void test4 (struct s x)
> +{
> +  x.ac;  /* { dg-error "'struct s' has no member named 'ac'; did you mean 'ab'?" } */
> +}
> +
> +int test5 (struct foo *ptr)
> +{
> +  return sizeof (ptr->foa); /* { dg-error "'struct foo' has no member named 'foa'; did you mean 'foo'?" } */
> +}
> +
> +/* Verify that gcc doesn't offer nonsensical suggestions.  */
> +
> +nonsensical_suggestion_t var; /* { dg-bogus "did you mean" } */
> +/* { dg-error "unknown type name" "" { target { *-*-* } } 48 } */
> --
> 1.8.5.3
>
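
On the "limit on the distance we care about" item in the remaining-work
list quoted above, a rough sketch of one way it could look.  This is
standalone C++ using std::string, not the patch's tree-based interface;
bounded_levenshtein and its LIMIT parameter are hypothetical names.  It
keeps the two-row scheme from the patch and gives up as soon as every
entry of the current row is above the limit, which is safe because the
minimum of a row never decreases from one row to the next:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return the Levenshtein distance between S and T, or LIMIT + 1 once the
// distance is known to exceed LIMIT.  LIMIT must be below UINT_MAX.
unsigned bounded_levenshtein (const std::string &s, const std::string &t,
                              unsigned limit)
{
  const std::size_t m = s.size ();
  const std::size_t n = t.size ();

  // The distance is at least the difference in lengths.
  const std::size_t len_diff = m > n ? m - n : n - m;
  if (len_diff > limit)
    return limit + 1;

  // prev is the previous row, cur the row being built.
  std::vector<unsigned> prev (m + 1), cur (m + 1);
  for (std::size_t j = 0; j <= m; j++)
    prev[j] = static_cast<unsigned> (j);

  for (std::size_t i = 0; i < n; i++)
    {
      cur[0] = static_cast<unsigned> (i + 1);
      unsigned row_min = cur[0];
      for (std::size_t j = 0; j < m; j++)
        {
          unsigned cost = (s[j] == t[i]) ? 0u : 1u;
          cur[j + 1] = std::min ({cur[j] + 1, prev[j + 1] + 1,
                                  prev[j] + cost});
          row_min = std::min (row_min, cur[j + 1]);
        }

      // Every later entry is derived from this row, so none can be
      // smaller than row_min.
      if (row_min > limit)
        return limit + 1;

      prev.swap (cur);
    }

  return std::min (prev[m], limit + 1);
}
```

A caller would pass something like half the length of the misspelled
identifier as the limit and treat any result above it as "no
suggestion".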

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-15 19:53           ` dejagnu version update? Bernhard Reutner-Fischer
  2015-09-15 20:05             ` Jeff Law
@ 2015-09-16 13:17             ` Matthias Klose
  2015-09-16 15:46               ` Bernhard Reutner-Fischer
  1 sibling, 1 reply; 133+ messages in thread
From: Matthias Klose @ 2015-09-16 13:17 UTC (permalink / raw)
  To: gcc-patches

On 09/15/2015 09:23 PM, Bernhard Reutner-Fischer wrote:
> On September 15, 2015 7:39:39 PM GMT+02:00, Mike Stump <mikestump@comcast.net> wrote:
>> On Sep 14, 2015, at 3:37 PM, Jeff Law <law@redhat.com> wrote:
>>>> Maybe GCC-6 can bump the required
>>>> dejagnu version to allow for getting rid of all these superfluous
>>>> load_gcc_lib? *blink* :)
>>> I'd support that as a direction.
>>>
>>> Certainly dropping the 2001 version from our website in favor of 1.5
>> (which is what I'm using anyway) would be a step forward.
>>
>> So, even ubuntu LTS is 1.5 now.  No harm in upgrading the website to
>> 1.5.  I don’t know of any reason to not update and just require 1.5 at
>> this point.  I’m not a fan of feature chasing dejagnu, but an update
>> every 2-4 years isn’t unreasonable.
>>
>> So, let’s do it this way…  Any serious and compelling reason to not
>> update to 1.5?  If none, let’s update to 1.5 in another week or two, if
>> no serious and compelling reasons not to.
>>
>> My general plan is, slow cycle updates on dejagnu, maybe every 2 years.
>> LTS style releases should have the version in it before the requirement
>> is updated.  I take this approach as I think this should be the maximal
>> change rate of things like make, gcc, g++, ld, if possible.
> 
> Yea, although this means that 1.5.3 (a Version with the libdirs tweak) being just 5 months old will have to wait another bump, I fear. For my part going to plain 1.5 is useless WRT the load_lib situation. I see no value in conditionalizing simplified libdir handling on a lucky user with recentish stuff so i'm just waiting another 2 or 4 years for this very minor cleanup.

Is this libdirs tweak backportable to 1.5.1 (Debian stable), or 1.5 (Ubuntu LTS)?

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH WIP] Use Levenshtein distance for various misspellings in C frontend v2
  2015-09-16  8:45       ` Richard Biener
@ 2015-09-16 13:33         ` Michael Matz
  2015-09-16 14:00           ` Richard Biener
  2015-09-17 19:32         ` Jeff Law
  1 sibling, 1 reply; 133+ messages in thread
From: Michael Matz @ 2015-09-16 13:33 UTC (permalink / raw)
  To: Richard Biener
  Cc: David Malcolm, GCC Patches, Manuel López-Ibáñez

Hi,

On Wed, 16 Sep 2015, Richard Biener wrote:

> Btw, this looks quite expensive - I'm sure we want to limit the effort
> here a bit.

I'm not so sure.  It's only used for printing an error, so walking all 
available decls is expensive but IMHO not too much so.

> I don't want us to suggest using 'i' instead of j (a good hint is that I 
> used 'j' multiple times).

Well, there will always be cases where the suggestion is actually wrong.
How do you propose to deal with this?  The above case could be solved by
not giving hints when the Levenshtein distance is as long as the string
length (which makes sense, because then there's no relation at all between
the string and the suggestion).
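
To make that cutoff concrete, here is a minimal sketch.  It is not code
from David's patch: the names edit_distance and suggestion_is_plausible
are made up for illustration, and I am assuming "the string length" means
the length of the misspelled identifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Levenshtein distance between S and T, computed with a single
   dynamic-programming row.  Returns SIZE_MAX if the row cannot be
   allocated.  Hypothetical helper, not GCC's implementation.  */
size_t
edit_distance (const char *s, const char *t)
{
  assert (s != NULL && t != NULL);
  size_t len_s = strlen (s);
  size_t len_t = strlen (t);
  size_t *row = malloc ((len_t + 1) * sizeof *row);
  if (row == NULL)
    return SIZE_MAX;

  /* Distance from the empty prefix of S to each prefix of T.  */
  for (size_t j = 0; j <= len_t; j++)
    row[j] = j;

  for (size_t i = 1; i <= len_s; i++)
    {
      size_t diag = row[0];	/* Value for (i-1, j-1).  */
      row[0] = i;
      for (size_t j = 1; j <= len_t; j++)
	{
	  size_t above = row[j];	/* Value for (i-1, j).  */
	  size_t best = diag + (s[i - 1] == t[j - 1] ? 0 : 1);
	  if (above + 1 < best)
	    best = above + 1;		/* Deletion.  */
	  if (row[j - 1] + 1 < best)
	    best = row[j - 1] + 1;	/* Insertion.  */
	  row[j] = best;
	  diag = above;
	}
    }

  size_t result = row[len_t];
  free (row);
  return result;
}

/* Assumed cutoff: reject a candidate once the distance reaches the
   length of the misspelled name, since nothing of it is left then.  */
bool
suggestion_is_plausible (const char *typo, const char *candidate)
{
  return edit_distance (typo, candidate) < strlen (typo);
}
```

With this rule 'i' is never offered for 'j' (distance 1, length 1), while
'foo' is still offered for 'foa' (distance 1, length 3).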

> So while the idea might be an improvement to selected cases it can cause 
> confusion as well.  And if using the suggestion for further parsing it 
> can cause worse followup errors (unless we can limit such "fixup" use to 
> the cases where we can parse the result without errors).  Consider
> 
> foo()
> {
>   foz = 1;
> }
> 
> if we suggest 'foo' instead of foz then we'll get a more confusing followup
> error if we actually use it.

This particular case could be solved by ruling out candidates of the wrong 
kind (here, something that can be assigned to, vs. a function).  But it 
might actually be too early in parsing to say that there will be an 
assignment.  I don't think _this_ problem should block the patch.
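
A rough sketch of such a filter, assuming the caller already knows which
kind of entity it expects (which, as said, may not be true this early in
parsing).  The enum, the struct and the function name are all
hypothetical; none of this is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of what a name refers to.  */
enum decl_kind { KIND_VARIABLE, KIND_FUNCTION, KIND_TYPE };

/* A possible replacement for the misspelled name.  */
struct candidate
{
  const char *name;
  enum decl_kind kind;
  size_t distance;	/* Edit distance to the typo, computed elsewhere.  */
};

/* Return the closest candidate of kind WANTED, or NULL if there is
   none.  Candidates of any other kind are skipped entirely.  */
const struct candidate *
best_candidate_of_kind (const struct candidate *cands, size_t n,
			enum decl_kind wanted)
{
  assert (cands != NULL || n == 0);
  const struct candidate *best = NULL;
  for (size_t i = 0; i < n; i++)
    {
      if (cands[i].kind != wanted)
	continue;
      if (best == NULL || cands[i].distance < best->distance)
	best = &cands[i];
    }
  return best;
}
```

For the 'foz = 1;' example, asking for KIND_VARIABLE would skip the
function 'foo' even though it is the closest name.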

Ciao,
Michael.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH WIP] Use Levenshtein distance for various misspellings in C frontend v2
  2015-09-16 13:33         ` Michael Matz
@ 2015-09-16 14:00           ` Richard Biener
  2015-09-16 15:49             ` Manuel López-Ibáñez
  0 siblings, 1 reply; 133+ messages in thread
From: Richard Biener @ 2015-09-16 14:00 UTC (permalink / raw)
  To: Michael Matz
  Cc: David Malcolm, GCC Patches, Manuel López-Ibáñez

On Wed, Sep 16, 2015 at 3:22 PM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Wed, 16 Sep 2015, Richard Biener wrote:
>
>> Btw, this looks quite expensive - I'm sure we want to limit the effort
>> here a bit.
>
> I'm not so sure.  It's only used for printing an error, so walking all
> available decls is expensive but IMHO not too much so.

Well, as we're not stopping at the very first error, creating an
artificial testcase that hits this quite badly should be possible.
Maybe only try this for the first error and not for followups?

>> I don't want us to suggest using 'i' instead of j (a good hint is that I
>> used 'j' multiple times).
>
> Well, there will always be cases where the suggestion is actually wrong.
> How do you propose to deal with this?  The above case could be solved by
> not giving hints when the levenshtein distance is as long as the string
> length (which makes sense, because then there's no relation at all between
> the string and the suggestion).
>
>> So while the idea might be an improvement to selected cases it can cause
>> confusion as well.  And if using the suggestion for further parsing it
>> can cause worse followup errors (unless we can limit such "fixup" use to
>> the cases where we can parse the result without errors).  Consider
>>
>> foo()
>> {
>>   foz = 1;
>> }
>>
>> if we suggest 'foo' instead of foz then we'll get a more confusing followup
>> error if we actually use it.
>
> This particular case could be solved by ruling out candidaten of the wrong
> kind (here, something that can be assigned to, vs. a function).  But it
> might actually be too early in parsing to say that there will be an
> assignment.  I don't think _this_ problem should block the patch.

I wonder if we can tentatively parse with the choice at hand, only allowing
(and even suggesting?) it if that works out.

Richard.

>
> Ciao,
> Michael.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-16 13:17             ` Matthias Klose
@ 2015-09-16 15:46               ` Bernhard Reutner-Fischer
  0 siblings, 0 replies; 133+ messages in thread
From: Bernhard Reutner-Fischer @ 2015-09-16 15:46 UTC (permalink / raw)
  To: Matthias Klose, gcc-patches

On September 16, 2015 3:01:47 PM GMT+02:00, Matthias Klose <doko@ubuntu.com> wrote:
>On 09/15/2015 09:23 PM, Bernhard Reutner-Fischer wrote:
>> On September 15, 2015 7:39:39 PM GMT+02:00, Mike Stump
><mikestump@comcast.net> wrote:
>>> On Sep 14, 2015, at 3:37 PM, Jeff Law <law@redhat.com> wrote:
>>>>> Maybe GCC-6 can bump the required
>>>>> dejagnu version to allow for getting rid of all these superfluous
>>>>> load_gcc_lib? *blink* :)
>>>> I'd support that as a direction.
>>>>
>>>> Certainly dropping the 2001 version from our website in favor of
>1.5
>>> (which is what I'm using anyway) would be a step forward.
>>>
>>> So, even ubuntu LTS is 1.5 now.  No harm in upgrading the website to
>>> 1.5.  I don’t know of any reason to not update and just require 1.5
>at
>>> this point.  I’m not a fan of feature chasing dejagnu, but an update
>>> every 2-4 years isn’t unreasonable.
>>>
>>> So, let’s do it this way…  Any serious and compelling reason to not
>>> update to 1.5?  If none, let’s update to 1.5 in another week or two,
>if
>>> no serious and compelling reasons not to.
>>>
>>> My general plan is, slow cycle updates on dejagnu, maybe every 2
>years.
>>> LTS style releases should have the version in it before the
>requirement
>>> is updated.  I take this approach as I think this should be the
>maximal
>>> change rate of things like make, gcc, g++, ld, if possible.
>> 
>> Yea, although this means that 1.5.3 (a Version with the libdirs
>tweak) being just 5 months old will have to wait another bump, I fear.
>For my part going to plain 1.5 is useless WRT the load_lib situation. I
>see no value in conditionalizing simplified libdir handling on a lucky
>user with recentish stuff so i'm just waiting another 2 or 4 years for
>this very minor cleanup.
>
>is this libdirs tweak backportable to 1.5.1 (Debian stable), or 1.5
>(Ubuntu LTS)?

Should be trivial, yes:
http://git.savannah.gnu.org/cgit/dejagnu.git/commit/?id=5481f29161477520c691d525653323b82fa47ad7

Thanks,


^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH WIP] Use Levenshtein distance for various misspellings in C frontend v2
  2015-09-16 14:00           ` Richard Biener
@ 2015-09-16 15:49             ` Manuel López-Ibáñez
  2015-09-17  8:46               ` Richard Biener
  0 siblings, 1 reply; 133+ messages in thread
From: Manuel López-Ibáñez @ 2015-09-16 15:49 UTC (permalink / raw)
  To: Richard Biener; +Cc: Michael Matz, David Malcolm, GCC Patches

On 16 September 2015 at 15:33, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Wed, Sep 16, 2015 at 3:22 PM, Michael Matz <matz@suse.de> wrote:
>>> if we suggest 'foo' instead of foz then we'll get a more confusing followup
>>> error if we actually use it.
>>
>> This particular case could be solved by ruling out candidaten of the wrong
>> kind (here, something that can be assigned to, vs. a function).  But it
>> might actually be too early in parsing to say that there will be an
>> assignment.  I don't think _this_ problem should block the patch.

Indeed. The patch by David does not try to fix up the code; it merely
suggests a possible candidate. The follow-up errors should be the same
before and after. Such suggestions will never be 100% right: even if
the suggestion makes the code compile and run, it may still be the
wrong one. A wrong suggestion is far less serious than a wrong
uninitialized or -Warray-bounds warning, and we can live with those. Why
does this need to be perfect from the very beginning?

BTW, there is a PR for this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52277

> I wonder if we can tentatively parse with the choice at hand, only allowing
> (and even suggesting?) it if that works out.

This would require queuing the error, fixing up the wrong name and
continuing to parse. If there is another error, ignore that one and emit
the original error without a suggestion. The problem here is that we do
not know whether the additional error is actually caused by the fix-up we
did or is an already existing error. It would be equally terrible
to emit errors caused by the fix-up or to emit just a single error for
the typo. We would need to roll back the tentative parse and do a
definitive parse anyway. This does not seem possible at the moment
because the parsers maintain a lot of global state that is not easy to
roll back. We cannot simply create a copy of the parser state and
throw it away later to continue as if the tentative parse had not
happened.
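
For illustration only, this is roughly what the snapshot-and-rollback
scheme would look like if all parser state lived in one copyable struct,
which is exactly the assumption that does not hold for our parsers today.
Every name here (parser_state, parse_fn, toy_parse_assignment, ...) is
invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy parser state; the real parsers keep far more, much of it global.  */
struct parser_state
{
  size_t position;
  unsigned error_count;
};

/* Parses the rest of the statement with NAME substituted for the typo.  */
typedef void (*parse_fn) (struct parser_state *state, const char *name);

/* Try CANDIDATE tentatively: snapshot the state, parse, check whether
   any new error was recorded, then always roll back.  */
bool
suggestion_survives_tentative_parse (struct parser_state *state,
				     parse_fn parse_rest,
				     const char *candidate)
{
  assert (state != NULL && parse_rest != NULL && candidate != NULL);
  struct parser_state saved = *state;
  parse_rest (state, candidate);
  bool clean = (state->error_count == saved.error_count);
  *state = saved;
  return clean;
}

/* Toy stand-in for parsing "NAME = 1;": pretend 'foo' names a function
   and therefore cannot be assigned to.  */
void
toy_parse_assignment (struct parser_state *state, const char *name)
{
  state->position++;
  if (strcmp (name, "foo") == 0)
    state->error_count++;
}
```

Note that even in this toy version a new error only says that something
went wrong after the substitution; it cannot tell a fix-up-induced error
from one that was already in the code.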

I'm not even sure if, in general, one can stop at the statement level
or we would need to parse the whole function (or translation unit) to
be able to tell if the suggestion is a valid candidate.

Cheers,

Manuel.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-16  7:41                 ` Andreas Schwab
@ 2015-09-16 16:19                   ` Mike Stump
  2015-09-16 16:32                     ` Ramana Radhakrishnan
  0 siblings, 1 reply; 133+ messages in thread
From: Mike Stump @ 2015-09-16 16:19 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Jeff Law, Bernhard Reutner-Fischer, David Malcolm,
	gcc-patches List, GCC Development

On Sep 16, 2015, at 12:29 AM, Andreas Schwab <schwab@suse.de> wrote:
> Mike Stump <mikestump@comcast.net> writes:
> 
>> The software presently works with 1.4.4 and there aren’t any changes
>> that require anything newer.
> 
> SLES 12 has 1.4.4.

Would be nice to cover them as well, but their update schedule, 3-4 years, means that their next update is 2018.  They didn’t update to a 3 year old stable release of dejagnu for their last OS, meaning they are on a > 7 year update cycle.  I love embedded and really long term support cycles (20 years), but, don’t think we should cater to the 20 year cycle just yet.  :-)  Since 7 is substantially longer than 2, I don’t think we should worry about it.  If they had updated at the time, they would have had 3 years of engineering and testing before the release and _had_ 1.5.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-16 16:19                   ` Mike Stump
@ 2015-09-16 16:32                     ` Ramana Radhakrishnan
  2015-09-16 16:39                       ` Jeff Law
  2015-09-16 18:04                       ` Mike Stump
  0 siblings, 2 replies; 133+ messages in thread
From: Ramana Radhakrishnan @ 2015-09-16 16:32 UTC (permalink / raw)
  To: Mike Stump, Andreas Schwab
  Cc: Jeff Law, Bernhard Reutner-Fischer, David Malcolm,
	gcc-patches List, GCC Development



On 16/09/15 17:14, Mike Stump wrote:
> On Sep 16, 2015, at 12:29 AM, Andreas Schwab <schwab@suse.de> wrote:
>> Mike Stump <mikestump@comcast.net> writes:
>>
>>> The software presently works with 1.4.4 and there aren’t any changes
>>> that require anything newer.
>>
>> SLES 12 has 1.4.4.
> 
> Would be nice to cover them as well, but their update schedule, 3-4 years, means that their next update is 2018.  They didn’t update to a 3 year old stable release of dejagnu for their last OS, meaning they are on a > 7 year update cycle.  I love embedded and really long term support cycles (20 years), but, don’t think we should cater to the 20 year cycle just yet.  :-)  Since 7 is substantially longer than 2, I don’t think we should worry about it.  If they had updated at the time, they would have had 3 years of engineering and testing before the release and _had_ 1.5.
> 

Sorry about the obvious (possibly dumb) question. 

Can't we just import a copy of dejagnu each year and install it as part of the source tree? I can't imagine installing dejagnu adds a huge amount of time to build and regression test time. The advantage is that everyone is guaranteed to be on the same version. I fully expect resistance due to specific issues with specific versions of tcl and expect, but if folks aren't aware of this .....

regards
Ramana

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-16 16:32                     ` Ramana Radhakrishnan
@ 2015-09-16 16:39                       ` Jeff Law
  2015-09-16 17:26                         ` Trevor Saunders
                                           ` (2 more replies)
  2015-09-16 18:04                       ` Mike Stump
  1 sibling, 3 replies; 133+ messages in thread
From: Jeff Law @ 2015-09-16 16:39 UTC (permalink / raw)
  To: Ramana Radhakrishnan, Mike Stump, Andreas Schwab
  Cc: Bernhard Reutner-Fischer, David Malcolm, gcc-patches List,
	GCC Development

On 09/16/2015 10:25 AM, Ramana Radhakrishnan wrote:
>
>
> On 16/09/15 17:14, Mike Stump wrote:
>> On Sep 16, 2015, at 12:29 AM, Andreas Schwab <schwab@suse.de>
>> wrote:
>>> Mike Stump <mikestump@comcast.net> writes:
>>>
>>>> The software presently works with 1.4.4 and there aren’t any
>>>> changes that require anything newer.
>>>
>>> SLES 12 has 1.4.4.
>>
>> Would be nice to cover them as well, but their update schedule, 3-4
>> years, means that their next update is 2018.  They didn’t update to
>> a 3 year old stable release of dejagnu for their last OS, meaning
>> they are on a > 7 year update cycle.  I love embedded and really
>> long term support cycles (20 years), but, don’t think we should
>> cater to the 20 year cycle just yet.  :-)  Since 7 is substantially
>> longer than 2, I don’t think we should worry about it.  If they had
>> updated at the time, they would have had 3 years of engineering and
>> testing before the release and _had_ 1.5.
>>
>
> Sorry about the obvious (possibly dumb) question.
>
> Can't we just import a copy of dejagnu each year and install it as
> part of the source tree ? I can't imagine installing dejagnu is
> adding a huge amount of time to build and regression test time ?
> Advantage is that everyone is guaranteed to be on the same version. I
> fully expect resistance due to specific issues with specific versions
> of tcl and expect, but if folks aren't aware of this .....
That should work -- certainly that's the way we used to do things at 
Cygnus.  Some of that code may have bitrotted as single tree builds have 
fallen out-of-favor through the years.

As to whether or not it's a good idea, I'm torn -- I don't like copying 
code from other repos because of the long term maintenance concerns.

I'd rather just move to 1.5 and get on with things.  If some systems 
don't have a new enough version, I'm comfortable telling developers on 
those platforms that they need to update.  It's not like every *user* 
needs dejagnu, it's just for the testing side of things.


jeff

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-16 16:39                       ` Jeff Law
@ 2015-09-16 17:26                         ` Trevor Saunders
  2015-09-16 17:46                         ` David Malcolm
  2015-09-17 13:57                         ` Richard Earnshaw
  2 siblings, 0 replies; 133+ messages in thread
From: Trevor Saunders @ 2015-09-16 17:26 UTC (permalink / raw)
  To: Jeff Law
  Cc: Ramana Radhakrishnan, Mike Stump, Andreas Schwab,
	Bernhard Reutner-Fischer, David Malcolm, gcc-patches List,
	GCC Development

On Wed, Sep 16, 2015 at 10:36:47AM -0600, Jeff Law wrote:
> On 09/16/2015 10:25 AM, Ramana Radhakrishnan wrote:
> >
> >
> >On 16/09/15 17:14, Mike Stump wrote:
> >>On Sep 16, 2015, at 12:29 AM, Andreas Schwab <schwab@suse.de>
> >>wrote:
> >>>Mike Stump <mikestump@comcast.net> writes:
> >>>
> >>>>The software presently works with 1.4.4 and there aren’t any
> >>>>changes that require anything newer.
> >>>
> >>>SLES 12 has 1.4.4.
> >>
> >>Would be nice to cover them as well, but their update schedule, 3-4
> >>years, means that their next update is 2018.  They didn’t update to
> >>a 3 year old stable release of dejagnu for their last OS, meaning
> >>they are on a > 7 year update cycle.  I love embedded and really
> >>long term support cycles (20 years), but, don’t think we should
> >>cater to the 20 year cycle just yet.  :-)  Since 7 is substantially
> >>longer than 2, I don’t think we should worry about it.  If they had
> >>updated at the time, they would have had 3 years of engineering and
> >>testing before the release and _had_ 1.5.
> >>
> >
> >Sorry about the obvious (possibly dumb) question.
> >
> >Can't we just import a copy of dejagnu each year and install it as
> >part of the source tree ? I can't imagine installing dejagnu is
> >adding a huge amount of time to build and regression test time ?
> >Advantage is that everyone is guaranteed to be on the same version. I
> >fully expect resistance due to specific issues with specific versions
> >of tcl and expect, but if folks aren't aware of this .....
> That should work -- certainly that's the way we used to do things at Cygnus.
> Some of that code may have bitrotted as single tree builds have fallen
> out-of-favor through the years.
> 
> As to whether or  not its a good idea.  I'm torn -- I don't like copying
> code from other repos because of the long term maintenance concerns.

yeah, there's definitely history showing that sharing code by copying is
not a great idea, e.g. top level files getting out of sync.  I'm hopeful
git submodules will make this better soon, but the UI isn't really great
yet.

> I'd rather just move to 1.5 and get on with things.  If some systems don't
> have a new enough version, I'm comfortable telling developers on those
> platforms that they need to update.  It's not like every *user* needs
> dejagnu, it's just for the testing side of things.

yeah, it seems like a poor idea to slow down progress we make for all
users to benefit a few people who want to develop on rather old
machines.

Trev

> 
> 
> jeff

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-16 16:39                       ` Jeff Law
  2015-09-16 17:26                         ` Trevor Saunders
@ 2015-09-16 17:46                         ` David Malcolm
  2015-09-16 19:09                           ` Bernhard Reutner-Fischer
  2015-09-17  0:07                           ` Segher Boessenkool
  2015-09-17 13:57                         ` Richard Earnshaw
  2 siblings, 2 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-16 17:46 UTC (permalink / raw)
  To: Jeff Law
  Cc: Ramana Radhakrishnan, Mike Stump, Andreas Schwab,
	Bernhard Reutner-Fischer, gcc-patches List, GCC Development

On Wed, 2015-09-16 at 10:36 -0600, Jeff Law wrote:
> On 09/16/2015 10:25 AM, Ramana Radhakrishnan wrote:
> >
> >
> > On 16/09/15 17:14, Mike Stump wrote:
> >> On Sep 16, 2015, at 12:29 AM, Andreas Schwab <schwab@suse.de>
> >> wrote:
> >>> Mike Stump <mikestump@comcast.net> writes:
> >>>
> >>>> The software presently works with 1.4.4 and there aren’t any
> >>>> changes that require anything newer.
> >>>
> >>> SLES 12 has 1.4.4.
> >>
> >> Would be nice to cover them as well, but their update schedule, 3-4
> >> years, means that their next update is 2018.  They didn’t update to
> >> a 3 year old stable release of dejagnu for their last OS, meaning
> >> they are on a > 7 year update cycle.  I love embedded and really
> >> long term support cycles (20 years), but, don’t think we should
> >> cater to the 20 year cycle just yet.  :-)  Since 7 is substantially
> >> longer than 2, I don’t think we should worry about it.  If they had
> >> updated at the time, they would have had 3 years of engineering and
> >> testing before the release and _had_ 1.5.
> >>
> >
> > Sorry about the obvious (possibly dumb) question.
> >
> > Can't we just import a copy of dejagnu each year and install it as
> > part of the source tree ? I can't imagine installing dejagnu is
> > adding a huge amount of time to build and regression test time ?
> > Advantage is that everyone is guaranteed to be on the same version. I
> > fully expect resistance due to specific issues with specific versions
> > of tcl and expect, but if folks aren't aware of this .....
> That should work -- certainly that's the way we used to do things at 
> Cygnus.  Some of that code may have bitrotted as single tree builds have 
> fallen out-of-favor through the years.
> 
> As to whether or  not its a good idea.  I'm torn -- I don't like copying 
> code from other repos because of the long term maintenance concerns.
> 
> I'd rather just move to 1.5 and get on with things. 

AIUI, we specifically need >= 1.5.3 (or a version with a backport) to
get support for multiple load_lib paths mentioned by Bernhard, which is
what motivated this thread (on gcc-patches, before it spread to the gcc
list):
 https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01196.html
 https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00983.html

> If some systems 
> don't have a new enough version, I'm comfortable telling developers on 
> those platforms that they need to update.  It's not like every *user* 
> needs dejagnu, it's just for the testing side of things.
> 
> 
> jeff


^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-16 16:32                     ` Ramana Radhakrishnan
  2015-09-16 16:39                       ` Jeff Law
@ 2015-09-16 18:04                       ` Mike Stump
  2015-09-16 18:58                         ` Bernhard Reutner-Fischer
  2015-09-16 19:37                         ` Ramana Radhakrishnan
  1 sibling, 2 replies; 133+ messages in thread
From: Mike Stump @ 2015-09-16 18:04 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: Andreas Schwab, Jeff Law, Bernhard Reutner-Fischer,
	David Malcolm, gcc-patches List, GCC Development

On Sep 16, 2015, at 9:25 AM, Ramana Radhakrishnan <ramana.radhakrishnan@foss.arm.com> wrote:
> 
> Sorry about the obvious (possibly dumb) question.

> Can't we just import a copy of dejagnu each year and install it as part of the source tree?

TL;DR: No.

We could, and indeed, some people do engineering that way.  We instead depend upon package managers, software updates and backwards compatibility to manage the issue.  This is, generally speaking, a better way to do software.

In the olden days, back before shared libraries, X11 was the straw that broke the camel’s back.  Without shared libraries, everyone replicated large portions of the X11 system inside each binary, causing massive bloat just in terms of disk space.  This cost was also reflected in distribution size.  As multiple binaries were run, each program would replicate all the code and read-only data of the X11 window system, causing larger than optimal usage of RAM.  The problem was so fundamental and so compelling to fix that dllhell[1] was born.  It was the price we paid for the solution to the original problem.

Now, the good news is that dllhell is trivial to avoid and engineer around so that it doesn’t exist.  People that can program their way out of a paper bag can do it without even thinking about it.  So, where we are today, it is a non-issue as it is a solved problem.  Since it is solved, we don’t need to do things like unshare programs, libraries, kernels or executables anymore.  Indeed, it is so completely and fundamentally solved that we can now use the desire to unshare as an indicator that someone in the food chain doesn’t have a clue what they are doing.  Windows is notorious for instances where people have not yet attained the right skill set.

In our case, since we can contribute anything we want, to any package we want, because we are open source, we can avoid, fix and do the good engineering required to avoid dllhell.  This makes it so that fundamentally, we never have to unshare, by design.  This is but one benefit of open source.

One day, we will advance and configure && make will automatically fetch and install the required components that we need for a build, using the single command that, on every system, resolves dependencies and installs dependent software.  We aren’t there yet, but that is where we need to go.  Once we get that, we should depend on it and use it, and never look back; then discussions like this never take place again, because the first person that wanted 1.5.3 would just put it in, update the 1.4.4 string to 1.5.3 and Bob’s your uncle.  No discussion, no asking, no worries.  That’s the path forward.

The current problem is that everyone wants to solve the dependency problem with their own tool and that tool is different on every system.  The entire software ecosystem would be better off if all these people contributed what they need to the design of a replacement and we all moved to one system.

1 - https://en.wikipedia.org/wiki/DLL_Hell

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-16 18:04                       ` Mike Stump
@ 2015-09-16 18:58                         ` Bernhard Reutner-Fischer
  2015-09-16 19:37                         ` Ramana Radhakrishnan
  1 sibling, 0 replies; 133+ messages in thread
From: Bernhard Reutner-Fischer @ 2015-09-16 18:58 UTC (permalink / raw)
  To: Mike Stump, Ramana Radhakrishnan
  Cc: Andreas Schwab, Jeff Law, David Malcolm, gcc-patches List,
	GCC Development

On September 16, 2015 7:57:03 PM GMT+02:00, Mike Stump <mikestump@comcast.net> wrote:
>On Sep 16, 2015, at 9:25 AM, Ramana Radhakrishnan
><ramana.radhakrishnan@foss.arm.com> wrote:
>> 
>> Sorry about the obvious (possibly dumb) question.
>
>> Can't we just import a copy of dejagnu each year and install it as
>part of the source tree?
>
>TL;DR: No.
>
>We could, and indeed, some people do engineering that way.  We instead
>depend upon package managers, software updates and backwards
>compatibility to manage the issue.  This is generally speaking, a
>better way to do software. In the olden days, back before shared
>libraries, X11 was the straw that broke the camels back. 

[Well some thus later had KGI, GGI and fresco (the interviews thing), but that's another story for sure ;) ]

Either way. Importing doesn't make sense at all.

Establishing and maintaining duplicated gcc_load_lib cascades doesn't either, IMO. If folks feel maintaining them is less hassle than forcing a new dejagnu, then that's fine with me (although we do require pretty recent libs anyway, and developers will usually likewise use rather recent binutils et al. for obvious reasons).

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-16 17:46                         ` David Malcolm
@ 2015-09-16 19:09                           ` Bernhard Reutner-Fischer
  2015-09-16 19:51                             ` Mike Stump
  2015-09-17  0:07                           ` Segher Boessenkool
  1 sibling, 1 reply; 133+ messages in thread
From: Bernhard Reutner-Fischer @ 2015-09-16 19:09 UTC (permalink / raw)
  To: David Malcolm, Jeff Law
  Cc: Ramana Radhakrishnan, Mike Stump, Andreas Schwab,
	gcc-patches List, GCC Development

On September 16, 2015 7:39:42 PM GMT+02:00, David Malcolm <dmalcolm@redhat.com> wrote:
>On Wed, 2015-09-16 at 10:36 -0600, Jeff Law wrote:
>> On 09/16/2015 10:25 AM, Ramana Radhakrishnan wrote:
>> >
>> >
>> > On 16/09/15 17:14, Mike Stump wrote:
>> >> On Sep 16, 2015, at 12:29 AM, Andreas Schwab <schwab@suse.de>
>> >> wrote:
>> >>> Mike Stump <mikestump@comcast.net> writes:
>> >>>
>> >>>> The software presently works with 1.4.4 and there aren’t any
>> >>>> changes that require anything newer.
>> >>>
>> >>> SLES 12 has 1.4.4.
>> >>
>> >> Would be nice to cover them as well, but their update schedule,
>3-4
>> >> years, means that their next update is 2018.  They didn’t update
>to
>> >> a 3 year old stable release of dejagnu for their last OS, meaning
>> >> they are on a > 7 year update cycle.  I love embedded and really
>> >> long term support cycles (20 years), but, don’t think we should
>> >> cater to the 20 year cycle just yet.  :-)  Since 7 is
>substantially
>> >> longer than 2, I don’t think we should worry about it.  If they
>had
>> >> updated at the time, they would have had 3 years of engineering
>and
>> >> testing before the release and _had_ 1.5.
>> >>
>> >
>> > Sorry about the obvious (possibly dumb) question.
>> >
>> > Can't we just import a copy of dejagnu each year and install it as
>> > part of the source tree ? I can't imagine installing dejagnu is
>> > adding a huge amount of time to build and regression test time ?
>> > Advantage is that everyone is guaranteed to be on the same version.
>I
>> > fully expect resistance due to specific issues with specific
>versions
>> > of tcl and expect, but if folks aren't aware of this .....
>> That should work -- certainly that's the way we used to do things at 
>> Cygnus.  Some of that code may have bitrotted as single tree builds
>have 
>> fallen out-of-favor through the years.
>> 
>> As to whether or  not its a good idea.  I'm torn -- I don't like
>copying 
>> code from other repos because of the long term maintenance concerns.
>> 
>> I'd rather just move to 1.5 and get on with things. 
>
>AIUI, we specifically need >= 1.5.3 (or a version with a backport) to
>get support for multiple load_lib paths mentioned by Bernhard, which is
>what motivated this thread (on gcc-patches, before it spread to the gcc
>list):
> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01196.html
> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00983.html

And
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01395.html

Where Joseph said he'd wait some more. I had thought I asked longer ago than that; time flies when one is having fun.

I'd require 1.5.3 just to avoid the time folks need to work around those silly ordering gotchas and load cascades that propagate through the tree. Admittedly not my call, but a pity IMHO.

Thanks,

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-16 18:04                       ` Mike Stump
  2015-09-16 18:58                         ` Bernhard Reutner-Fischer
@ 2015-09-16 19:37                         ` Ramana Radhakrishnan
  1 sibling, 0 replies; 133+ messages in thread
From: Ramana Radhakrishnan @ 2015-09-16 19:37 UTC (permalink / raw)
  To: Mike Stump
  Cc: Ramana Radhakrishnan, Andreas Schwab, Jeff Law,
	Bernhard Reutner-Fischer, David Malcolm, gcc-patches List,
	GCC Development

On Wed, Sep 16, 2015 at 11:27 PM, Mike Stump <mikestump@comcast.net> wrote:
> On Sep 16, 2015, at 9:25 AM, Ramana Radhakrishnan <ramana.radhakrishnan@foss.arm.com> wrote:
>>
>> Sorry about the obvious (possibly dumb) question.
>
>> Can't we just import a copy of dejagnu each year and install it as part of the source tree?
>
> TL;DR: No.

[snip]

Thanks for the refresher on dll hell ;)  My original inspiration for
thinking about the import, just as I was leaving for the day, was the
whole raft of target libraries we now build with gcc that are imported
(for convenience, coupling with compiler features, etc.): why not do
the same with what is essentially required for any developer?
I also see that the coupling for dejagnu is probably best left to
packaging systems; however, we have situations where developers face
version skew for dejagnu with the systems they are forced to develop
and test on. The question was really whether we as a community thought
that this cost with dejagnu was worth it, and for someone to ask that
obvious question.

I am happy to settle for dealing with dejagnu the same way as other
prerequisites like gmp, mpfr and mpc, i.e. put out a hard error when
testing if the right version of dejagnu is not found. For bonus points,
update download_prerequisites to get the right version in the source
tree for people using older distributions who are not in a position
to upgrade their packages.

regards
Ramana

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-16 19:09                           ` Bernhard Reutner-Fischer
@ 2015-09-16 19:51                             ` Mike Stump
  0 siblings, 0 replies; 133+ messages in thread
From: Mike Stump @ 2015-09-16 19:51 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer
  Cc: David Malcolm, Jeff Law, Ramana Radhakrishnan, Andreas Schwab,
	gcc-patches List, GCC Development

On Sep 16, 2015, at 12:02 PM, Bernhard Reutner-Fischer <rep.dot.nop@gmail.com> wrote:
> Where Joseph said he'd wait some more.. I had thought I asked longer ago than that, time flies if one has fun.
> 
> I'd just require 1.5.3 just to avoid the time needed by folks to work around those silly ordering gotchas and load cascades that propagate through the tree. Admittedly not my call but a pity IMHO.

If maintenance is a burden for those that usually maintain these things, we can by fiat just bump up to 1.5.3. It isn’t the end of the world if we do.  It just seems the cost isn’t that high to me, however I’m happy to defer to the people in the trench.  Since 1.5 isn’t going to ease what appears to be the main issue, I think status quo or 1.5.3 make the most sense.  So, let’s phrase it this way, if you work in the trench and are impacted, do you want to see the bump to 1.5.3 now to ease the burden?

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
  2015-09-15 11:01         ` Jakub Jelinek
@ 2015-09-16 20:29           ` David Malcolm
  2015-09-17 16:54             ` David Malcolm
  0 siblings, 1 reply; 133+ messages in thread
From: David Malcolm @ 2015-09-16 20:29 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, GCC Patches

On Tue, 2015-09-15 at 12:48 +0200, Jakub Jelinek wrote:
> On Tue, Sep 15, 2015 at 12:33:58PM +0200, Richard Biener wrote:
> > On Tue, Sep 15, 2015 at 12:20 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> > > On Tue, Sep 15, 2015 at 12:14:22PM +0200, Richard Biener wrote:
> > >> > diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
> > >> > index 760467c..c7558a0 100644
> > >> > --- a/gcc/cp/parser.h
> > >> > +++ b/gcc/cp/parser.h
> > >> > @@ -61,6 +61,8 @@ struct GTY (()) cp_token {
> > >> >    BOOL_BITFIELD purged_p : 1;
> > >> >    /* The location at which this token was found.  */
> > >> >    location_t location;
> > >> > +  /* The source range at which this token was found.  */
> > >> > +  source_range range;
> > >>
> > >> Is it just me or does location now feel somewhat redundant with range?  Can't we
> > >> compress that somehow?
> > >
> > > For a token I'd expect it is redundant, I don't see how it would be useful
> > > for a single preprocessing token to have more than start and end locations.
> > > But generally, for expressions, 3 locations make sense.
> > > If you have
> > > abc + def
> > > ~~~~^~~~~
> > > then having a range is useful.  In any case, I'm surprised that the ranges aren't encoded in
> > > location_t (the data structures behind it, where we already stick also
> > > BLOCK pointer).
> > 
> > Probably lack of encoding space ... I suppose upping location_t to
> > 64bits could solve
> > some of that (with its own drawback on increasing size of core structures).
> 
> What I had in mind was just add
>   source_location start, end;
> to location_adhoc_data struct and use !IS_ADHOC_LOC locations to represent
> just plain locations without block and without range (including the cases
> where the range has both start and end equal to the locus) and IS_ADHOC_LOC
> locations for the cases where either we have non-NULL block, or we have
> some other range, or both.  But I haven't spent any time on that, so just
> wondering if such an encoding has been considered.

I've been attempting to implement that.

Am I right in thinking that the ad-hoc locations never get purged? i.e.
that once we've registered an ad-hoc location, that slot within
location_adhoc_data_map is forever associated with that (locus, block)
pair?  [or in the proposed model, the (locus, src_range, block)
triple?].

If so, it may make more sense to put the ranges into ad-hoc locations,
but only *after tokenization*: in this approach, the src_range would be
a field within the tokens (like in patch 07/22), in the hope that the
tokens are short-lived  (which AIUI is the case for libcpp and C, though
not for C++), presumably also killing the "location" field within
tokens.  We then stuff the range into the location_t when building trees
(maybe putting a src_range into c_expr to further delay populating
location_adhoc_data_map).

That way we avoid bloating the location_adhoc_data_map during lexing,
whilst preserving the range information, and we can stuff the ranges
into the 32-bit location_t within tree/gimple etc (albeit paying a cost
within the location_adhoc_data_map).

Thoughts?  Hope this sounds sane.
Dave
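
A minimal sketch of the encoding discussed above, for illustration only.  It assumes the high bit of a 32-bit location marks an ad-hoc slot and that slots are never purged; loc_t, range_t, adhoc_entry and adhoc_table are hypothetical names, not the real libcpp types.  Plain locations (no block, range equal to the locus) never allocate a slot, and identical (locus, src_range, block) triples share one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical stand-ins for illustration; not the real libcpp types.
typedef std::uint32_t loc_t;
const loc_t ADHOC_BIT = 0x80000000u;  // assumed: high bit marks an ad-hoc slot

struct range_t { loc_t start, finish; };

struct adhoc_entry {
  loc_t locus;
  range_t range;
  const void *block;  // opaque stand-in for a BLOCK pointer
};

class adhoc_table {
  typedef std::tuple<loc_t, loc_t, loc_t, std::uintptr_t> slot_key;

public:
  // Return a 32-bit value standing for (locus, range, block).  A slot is
  // only allocated when there is a block or a range wider than the locus.
  loc_t combine(loc_t locus, range_t range, const void *block) {
    assert((locus & ADHOC_BIT) == 0);
    if (block == nullptr && range.start == locus && range.finish == locus)
      return locus;
    slot_key key(locus, range.start, range.finish,
                 reinterpret_cast<std::uintptr_t>(block));
    std::map<slot_key, loc_t>::const_iterator it = m_index.find(key);
    if (it != m_index.end())
      return ADHOC_BIT | it->second;  // reuse the existing slot
    loc_t slot = static_cast<loc_t>(m_entries.size());
    adhoc_entry e = {locus, range, block};
    m_entries.push_back(e);
    m_index[key] = slot;
    return ADHOC_BIT | slot;
  }

  static bool is_adhoc(loc_t loc) { return (loc & ADHOC_BIT) != 0; }

  loc_t get_locus(loc_t loc) const {
    return is_adhoc(loc) ? entry(loc).locus : loc;
  }

  // A plain location is treated as a range covering just its own locus.
  range_t get_range(loc_t loc) const {
    if (is_adhoc(loc))
      return entry(loc).range;
    range_t r = {loc, loc};
    return r;
  }

  // Assumed: slots are never freed, so this only ever grows.
  std::size_t size() const { return m_entries.size(); }

private:
  const adhoc_entry &entry(loc_t loc) const {
    std::size_t slot = loc & ~ADHOC_BIT;
    assert(slot < m_entries.size());
    return m_entries[slot];
  }

  std::vector<adhoc_entry> m_entries;
  std::map<slot_key, loc_t> m_index;
};
```

Under these assumptions, registering a range per token during lexing would grow the table by up to one slot per distinct token, which is the bloat that keeping src_range inside the tokens is meant to avoid.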

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-16 17:46                         ` David Malcolm
  2015-09-16 19:09                           ` Bernhard Reutner-Fischer
@ 2015-09-17  0:07                           ` Segher Boessenkool
  1 sibling, 0 replies; 133+ messages in thread
From: Segher Boessenkool @ 2015-09-17  0:07 UTC (permalink / raw)
  To: David Malcolm
  Cc: Jeff Law, Ramana Radhakrishnan, Mike Stump, Andreas Schwab,
	Bernhard Reutner-Fischer, gcc-patches List, GCC Development

On Wed, Sep 16, 2015 at 01:39:42PM -0400, David Malcolm wrote:
> AIUI, we specifically need >= 1.5.3 (or a version with a backport) to
> get support for multiple load_lib paths mentioned by Bernhard, which is
> what motivated this thread (on gcc-patches, before it spread to the gcc
> list):

We also need it to avoid the "-jN check loses most results in the summary"
problem; or it seems we need to avoid 1.5.2 only for that, if I read the
log correctly.


Segher

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH WIP] Use Levenshtein distance for various misspellings in C frontend v2
  2015-09-16 15:49             ` Manuel López-Ibáñez
@ 2015-09-17  8:46               ` Richard Biener
  0 siblings, 0 replies; 133+ messages in thread
From: Richard Biener @ 2015-09-17  8:46 UTC (permalink / raw)
  To: Manuel López-Ibáñez
  Cc: Michael Matz, David Malcolm, GCC Patches

On Wed, Sep 16, 2015 at 5:45 PM, Manuel López-Ibáñez
<lopezibanez@gmail.com> wrote:
> On 16 September 2015 at 15:33, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Wed, Sep 16, 2015 at 3:22 PM, Michael Matz <matz@suse.de> wrote:
>>>> if we suggest 'foo' instead of foz then we'll get a more confusing followup
>>>> error if we actually use it.
>>>
>>> This particular case could be solved by ruling out candidates of the wrong
>>> kind (here, something that can be assigned to, vs. a function).  But it
>>> might actually be too early in parsing to say that there will be an
>>> assignment.  I don't think _this_ problem should block the patch.
>
> Indeed. The patch by David does not try to fix-up the code, it merely
> suggests a possible candidate. The follow-up errors should be the same
> before and after. Such suggestions will never be 100% right, even if
> the suggestion makes the code compile and run, it may still be the
> wrong one. A wrong suggestion is far less serious than a wrong
> uninitialized or Warray-bounds warning and we can live with those. Why
> this needs to be perfect from the very beginning?
>
> BTW, there is a PR for this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52277
>
>> I wonder if we can tentatively parse with the choice at hand, only allowing
>> (and even suggesting?) it if that works out.
>
> This would require to queue the error, fix-up the wrong name and
> continue parsing. If there is another error, ignore that one and emit
> the original error without suggestion. The problem here is that we do
> not know if the additional error is actually caused by the fix-up we
> did or it is an already existing error. It would be equally terrible
> to emit errors caused by the fix-up or emit just a single error for
> the typo. We would need to roll-back the tentative parse and do a
> definitive parse anyway. This does not seem possible at the moment
> because the parsers maintain a lot of global state that is not easy to
> roll-back. We cannot simply create a copy of the parser state and
> throw it away later to continue as if the tentative parse has not
> happened.
>
> I'm not even sure if, in general, one can stop at the statement level
> or we would need to parse the whole function (or translation unit) to
> be able to tell if the suggestion is a valid candidate.

I was suggesting to only tentatively finish parsing the "current construct".
No idea how to best figure that out to the extent needed to make the tentative
parse useful.  Say we have "a + s.foz" and the field foz is not there
but foo is: if we continue parsing with 'foo' instead, but 'foo' has
a type that makes "a + s.foo" invalid, then we probably shouldn't suggest
it.  It _might_ be reasonably "easy" to implement that, but I'm not sure.
There might be a field named fz (with same or bigger Levenshtein distance)
with the correct type.  Of course it might have been that I misspelled
's' and meant 'r' instead which has a field foz of correct type... (and 's'
is available as well).

I agree that we don't have to solve all this in the first iteration.

Richard.
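
A rough sketch of the "a + s.foz" case above, for illustration only: candidates whose type would make the expression invalid are filtered out before picking the nearest name.  The field struct, its "arithmetic" flag and suggest_field are hypothetical and stand in for whatever type check the front end would really do; the distance is a plain two-row Levenshtein.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Plain Levenshtein distance using two rows of the DP table.
std::size_t edit_distance(const std::string &a, const std::string &b) {
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j)
    prev[j] = j;  // turning "" into b[0..j) costs j insertions
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;   // turning a[0..i) into "" costs i deletions
    for (std::size_t j = 1; j <= b.size(); ++j) {
      std::size_t subst = prev[j - 1] + (a[i - 1] != b[j - 1] ? 1 : 0);
      cur[j] = std::min(subst, std::min(prev[j], cur[j - 1]) + 1);
    }
    prev.swap(cur);
  }
  return prev[b.size()];
}

// Hypothetical model of a struct field: its name, and whether its type
// could appear as an operand of '+'.
struct field {
  std::string name;
  bool arithmetic;
};

// Return the nearest field to TYPO, or nullptr if none qualifies.  When
// NEED_ARITHMETIC is set, fields of an unsuitable type are skipped.
// On equal distance the earlier field wins.
const field *suggest_field(const std::string &typo,
                           const std::vector<field> &fields,
                           bool need_arithmetic) {
  assert(!typo.empty());
  const field *best = nullptr;
  std::size_t best_distance = 0;
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (need_arithmetic && !fields[i].arithmetic)
      continue;
    std::size_t d = edit_distance(typo, fields[i].name);
    if (best == nullptr || d < best_distance) {
      best = &fields[i];
      best_distance = d;
    }
  }
  return best;
}
```

With fields foo (not usable in "a + s.foo") and fz (usable), both at distance 1 from "foz", the unfiltered search picks foo and the filtered one picks fz.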

> Cheers,
>
> Manuel.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-16 16:39                       ` Jeff Law
  2015-09-16 17:26                         ` Trevor Saunders
  2015-09-16 17:46                         ` David Malcolm
@ 2015-09-17 13:57                         ` Richard Earnshaw
  2 siblings, 0 replies; 133+ messages in thread
From: Richard Earnshaw @ 2015-09-17 13:57 UTC (permalink / raw)
  To: Jeff Law, Ramana Radhakrishnan, Mike Stump, Andreas Schwab
  Cc: Bernhard Reutner-Fischer, David Malcolm, gcc-patches List,
	GCC Development

On 16/09/15 17:36, Jeff Law wrote:
> On 09/16/2015 10:25 AM, Ramana Radhakrishnan wrote:
>>
>>
>> On 16/09/15 17:14, Mike Stump wrote:
>>> On Sep 16, 2015, at 12:29 AM, Andreas Schwab <schwab@suse.de>
>>> wrote:
>>>> Mike Stump <mikestump@comcast.net> writes:
>>>>
>>>>> The software presently works with 1.4.4 and there aren’t any
>>>>> changes that require anything newer.
>>>>
>>>> SLES 12 has 1.4.4.
>>>
>>> Would be nice to cover them as well, but their update schedule, 3-4
>>> years, means that their next update is 2018.  They didn’t update to
>>> a 3 year old stable release of dejagnu for their last OS, meaning
>>> they are on a > 7 year update cycle.  I love embedded and really
>>> long term support cycles (20 years), but, don’t think we should
>>> cater to the 20 year cycle just yet.  :-)  Since 7 is substantially
>>> longer than 2, I don’t think we should worry about it.  If they had
>>> updated at the time, they would have had 3 years of engineering and
>>> testing before the release and _had_ 1.5.
>>>
>>
>> Sorry about the obvious (possibly dumb) question.
>>
>> Can't we just import a copy of dejagnu each year and install it as
>> part of the source tree ? I can't imagine installing dejagnu is
>> adding a huge amount of time to build and regression test time ?
>> Advantage is that everyone is guaranteed to be on the same version. I
>> fully expect resistance due to specific issues with specific versions
>> of tcl and expect, but if folks aren't aware of this .....
> That should work -- certainly that's the way we used to do things at
> Cygnus.  Some of that code may have bitrotted as single tree builds have
> fallen out-of-favor through the years.
> 
> As to whether or  not its a good idea.  I'm torn -- I don't like copying
> code from other repos because of the long term maintenance concerns.
> 
> I'd rather just move to 1.5 and get on with things.  If some systems
> don't have a new enough version, I'm comfortable telling developers on
> those platforms that they need to update.  It's not like every *user*
> needs dejagnu, it's just for the testing side of things.
> 
> 
> jeff

I don't see it as a major issue to have your own private build of
dejagnu rather than the system supplied one.  The only local change you
need is to add it to the front of your path before testing.

Dejagnu does not heavily depend on system libraries, it is not built
directly into GCC, and it is pretty independent of the version of expect
that you have on your machine (likely the system version will serve fine).
So why don't we just migrate to the latest and greatest version as our
standard and be done with these old versions that are lying around?

R.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
  2015-09-16 20:29           ` David Malcolm
@ 2015-09-17 16:54             ` David Malcolm
  2015-09-17 19:15               ` Jeff Law
  0 siblings, 1 reply; 133+ messages in thread
From: David Malcolm @ 2015-09-17 16:54 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, GCC Patches

On Wed, 2015-09-16 at 16:21 -0400, David Malcolm wrote:
> On Tue, 2015-09-15 at 12:48 +0200, Jakub Jelinek wrote:
> > On Tue, Sep 15, 2015 at 12:33:58PM +0200, Richard Biener wrote:
> > > On Tue, Sep 15, 2015 at 12:20 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> > > > On Tue, Sep 15, 2015 at 12:14:22PM +0200, Richard Biener wrote:
> > > >> > diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
> > > >> > index 760467c..c7558a0 100644
> > > >> > --- a/gcc/cp/parser.h
> > > >> > +++ b/gcc/cp/parser.h
> > > >> > @@ -61,6 +61,8 @@ struct GTY (()) cp_token {
> > > >> >    BOOL_BITFIELD purged_p : 1;
> > > >> >    /* The location at which this token was found.  */
> > > >> >    location_t location;
> > > >> > +  /* The source range at which this token was found.  */
> > > >> > +  source_range range;
> > > >>
> > > >> Is it just me or does location now feel somewhat redundant with range?  Can't we
> > > >> compress that somehow?
> > > >
> > > > For a token I'd expect it is redundant, I don't see how it would be useful
> > > > for a single preprocessing token to have more than start and end locations.
> > > > But generally, for expressions, 3 locations make sense.
> > > > If you have
> > > > abc + def
> > > > ~~~~^~~~~
> > > > then having a range is useful.  In any case, I'm surprised that the ranges aren't encoded in
> > > > location_t (the data structures behind it, where we already stick also
> > > > BLOCK pointer).
> > > 
> > > Probably lack of encoding space ... I suppose upping location_t to
> > > > 64bits could solve
> > > some of that (with its own drawback on increasing size of core structures).
> > 
> > What I had in mind was just add
> >   source_location start, end;
> > to location_adhoc_data struct and use !IS_ADHOC_LOC locations to represent
> > just plain locations without block and without range (including the cases
> > where the range has both start and end equal to the locus) and IS_ADHOC_LOC
> > locations for the cases where either we have non-NULL block, or we have
> > some other range, or both.  But I haven't spent any time on that, so just
> > wondering if such an encoding has been considered.
> 
> I've been attempting to implement that.
> 
> Am I right in thinking that the ad-hoc locations never get purged? i.e.
> that once we've registered an ad-hoc location, that slot within
> location_adhoc_data_map is forever associated with that (locus, block)
> pair?  [or in the proposed model, the (locus, src_range, block)
> triple?].
> 
> If so, it may make more sense to put the ranges into ad-hoc locations,
> but only *after tokenization*: in this approach, the src_range would be
> a field within the tokens (like in patch 07/22), in the hope that the
> tokens are short-lived  (which AIUI is the case for libcpp and C, though
> not for C++), presumably also killing the "location" field within
> tokens.  We then stuff the range into the location_t when building trees
> (maybe putting a src_range into c_expr to further delay populating
> location_adhoc_data_map).
> 
> That way we avoid bloating the location_adhoc_data_map during lexing,
> whilst preserving the range information, and we can stuff the ranges
> into the 32-bit location_t within tree/gimple etc (albeit paying a cost
> within the location_adhoc_data_map).
> 
> Thoughts?  Hope this sounds sane.
> Dave

FWIW, I have a (very messy) implementation of this working for the C
frontend, which gives us source ranges on expressions without needing to
add any new tree nodes, or add any fields to existing tree structs.

The approach I'm using:

* ranges are stored directly as fields within cpp_token and c_token
(maybe we can ignore cp_token for now)

* ranges are stashed in the C FE, both (a) within the "struct c_expr"
and (b) within the location_t of each tree expression node as a new
field in the adhoc map.

Doing it twice may seem slightly redundant, but I think both are needed:
  (a) doing it within c_expr allows us to support ranges for constants
and VAR_DECL etc during parsing, without needing any kind of new tree
wrapper node
  (b) doing it in the ad-hoc map allows the ranges for expressions to
survive the parse and be usable in diagnostics later.

So this gives us (in the C FE): ranges for everything during parsing,
and ranges for expressions afterwards, with no new tree nodes or new
fields within tree nodes.

I'm working on cleaning it up into a much more minimal set of patches
that I hope are reviewable.

Hopefully this sounds like a good approach.

I've also been looking at ways to mitigate bloat of the ad-hoc map, by
using some extra bits of location_t for representing short ranges
directly.

Dave
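
One way the "extra bits" idea might look, purely as a sketch under stated assumptions: the low 5 bits of a location hold the length of a short range that starts at the caret, and longer ranges fall back to the ad-hoc map.  The bit width, the names and the caret-equals-start restriction are all hypothetical, not taken from the patch kit.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t loc_t;  // hypothetical stand-in for location_t

const unsigned RANGE_BITS = 5;                      // assumed width
const loc_t LENGTH_MASK = (1u << RANGE_BITS) - 1;   // 0x1f
// Keep the high bit clear so packed values never look like ad-hoc ones.
const loc_t MAX_PACKABLE_START = (1u << (31 - RANGE_BITS)) - 1;

// Try to pack the range [start, finish] into one 32-bit value.  Returns
// false when the range is too long or the start is too large, in which
// case the caller would need the ad-hoc map instead.
bool pack_short_range(loc_t start, loc_t finish, loc_t *out) {
  assert(out != nullptr);
  if (finish < start)
    return false;
  if (finish - start > LENGTH_MASK)
    return false;
  if (start > MAX_PACKABLE_START)
    return false;
  *out = (start << RANGE_BITS) | (finish - start);
  return true;
}

loc_t packed_start(loc_t packed) { return packed >> RANGE_BITS; }

loc_t packed_finish(loc_t packed) {
  return packed_start(packed) + (packed & LENGTH_MASK);
}
```

The trade-off in this sketch is that every bit given to the length shrinks the space of representable start positions by half, so the width would need to be chosen against real token-length data.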

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
  2015-09-15 12:57           ` Manuel López-Ibáñez
@ 2015-09-17 19:11             ` Jeff Law
  0 siblings, 0 replies; 133+ messages in thread
From: Jeff Law @ 2015-09-17 19:11 UTC (permalink / raw)
  To: Manuel López-Ibáñez, Richard Biener
  Cc: Jakub Jelinek, David Malcolm, GCC Patches

On 09/15/2015 06:54 AM, Manuel López-Ibáñez wrote:
> On 15 September 2015 at 14:18, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> Of course this boils down to "uses" of a VAR_DECL using the shared tree
>> node.  On GIMPLE some stmt kinds have separate locations for each operand
>> (PHI nodes), on GENERIC we'd have to invent a no-op expr tree code to
>> wrap such uses to be able to give them distinct locations (can't use sth
>> existing as frontends would need to ignore them in a different way than say
>> NOP_EXPRs or NON_LVALUE_EXPRs).
>>
>
> The problem with that approach (besides the waste of memory implied by
> a whole tree node just to store one location_t) is keeping those
> wrappers in place while making them transparent for most of the
> compiler. According to Arnaud, folding made this approach infeasible:
> https://gcc.gnu.org/ml/gcc-patches/2012-09/msg01222.html
>
> The other two alternatives are to store the location of the operands
> on the expressions themselves or to store them as on-the-side
> data-structure, but they come with their own drawbacks. I was
> initially more in favour of the wrapper solution, but after dealing
> with NOP_EXPRs, having to deal also with LOC_EXPR would be a nightmare
> (as you say, they will have to be ignored in a different way). The
> other alternatives seem less invasive and the problems mentioned here
> https://gcc.gnu.org/ml/gcc-patches/2012-11/msg00164.html do not seem
> as serious as I thought (passing down the location of the operand is
> becoming  the norm anyway).
I suspect any on-the-side data structure to handle this is ultimately 
doomed to failure.  Storing the location info for the operands in the 
expression means that anything that modifies an operand would have to 
have access to the expression so that location information could be 
updated.  Ugh.

As painful as it will be, the right way is to stop using DECL nodes like 
this and instead be using another node that isn't shared.  That allows 
attaching location information.  David and I kicked this around before 
he posted his patch and didn't come up with anything better IIRC.

These wrapper nodes are definitely going to get in the way of folders 
and all kinds of things.  So it's not something that's going to be easy 
to add without digging into and modifying a lot of code.

I've always considered this a wart, but fixing that wart hasn't seemed 
worth the effort until recently.

Jeff
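
To make the shared-node problem concrete, here is a toy model (hypothetical types, not GCC's tree structures): a declaration is one shared node, so it can only carry the location of the declaration itself, while a per-use wrapper can carry the location of each use.  The stripping helper hints at the cost: every consumer that wants the underlying decl has to look through the wrapper.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

typedef std::uint32_t loc_t;  // hypothetical stand-in for location_t

enum node_kind { DECL_NODE, USE_WRAPPER_NODE };  // hypothetical codes

struct node {
  node_kind kind;
  loc_t location;       // DECL_NODE: declaration site; wrapper: use site
  std::string name;     // meaningful for DECL_NODE only
  const node *operand;  // meaningful for USE_WRAPPER_NODE only
};

// One shared node per declaration.
node make_decl(const std::string &name, loc_t decl_loc) {
  node n = {DECL_NODE, decl_loc, name, nullptr};
  return n;
}

// A fresh, unshared node per use, recording where that use appeared.
node wrap_use(const node *decl, loc_t use_loc) {
  assert(decl != nullptr);
  node n = {USE_WRAPPER_NODE, use_loc, std::string(), decl};
  return n;
}

// What folders and other consumers would have to do before inspecting
// an operand: look through any wrappers to reach the real node.
const node *strip_use_wrappers(const node *n) {
  while (n != nullptr && n->kind == USE_WRAPPER_NODE)
    n = n->operand;
  return n;
}
```

Two uses of the same variable then have distinct locations while still resolving to the same declaration node.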

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
  2015-09-15 12:18         ` Richard Biener
  2015-09-15 12:57           ` Manuel López-Ibáñez
@ 2015-09-17 19:13           ` Jeff Law
  1 sibling, 0 replies; 133+ messages in thread
From: Jeff Law @ 2015-09-17 19:13 UTC (permalink / raw)
  To: Richard Biener, Manuel López-Ibáñez
  Cc: Jakub Jelinek, David Malcolm, GCC Patches

On 09/15/2015 06:18 AM, Richard Biener wrote:

>
> Of course this boils down to "uses" of a VAR_DECL using the shared tree
> node.  On GIMPLE some stmt kinds have separate locations for each operand
> (PHI nodes), on GENERIC we'd have to invent a no-op expr tree code to
> wrap such uses to be able to give them distinct locations (can't use sth
> existing as frontends would need to ignore them in a different way than say
> NOP_EXPRs or NON_LVALUE_EXPRs).
Exactly.

Jeff

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
  2015-09-17 16:54             ` David Malcolm
@ 2015-09-17 19:15               ` Jeff Law
  2015-09-17 20:06                 ` David Malcolm
  0 siblings, 1 reply; 133+ messages in thread
From: Jeff Law @ 2015-09-17 19:15 UTC (permalink / raw)
  To: David Malcolm, Jakub Jelinek; +Cc: Richard Biener, GCC Patches

On 09/17/2015 10:49 AM, David Malcolm wrote:

> FWIW, I have a (very messy) implementation of this working for the C
> frontend, which gives us source ranges on expressions without needing to
> add any new tree nodes, or add any fields to existing tree structs.
>
> The approach I'm using:
>
> * ranges are stored directly as fields within cpp_token and c_token
> (maybe we can ignore cp_token for now)
>
> * ranges are stashed in the C FE, both (a) within the "struct c_expr"
> and (b) within the location_t of each tree expression node as a new
> field in the adhoc map.
>
> Doing it twice may seem slightly redundant, but I think both are needed:
>    (a) doing it within c_expr allows us to support ranges for constants
> and VAR_DECL etc during parsing, without needing any kind of new tree
> wrapper node
>    (b) doing it in the ad-hoc map allows the ranges for expressions to
> survive the parse and be usable in diagnostics later.
>
> So this gives us (in the C FE): ranges for everything during parsing,
> and ranges for expressions afterwards, with no new tree nodes or new
> fields within tree nodes.
>
> I'm working on cleaning it up into a much more minimal set of patches
> that I hope are reviewable.
>
> Hopefully this sounds like a good approach
So is the assumption here that the location information is or is not 
supposed to survive through the gimple optimizers?   If I understand 
what you're doing correctly, I think the location information gets 
invalidated by const/copy propagations.

Though perhaps that's not a major problem because we're typically 
propagating an SSA_NAME, not a _DECL node.  Hmm.

jeff

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
  2015-09-15 10:48       ` Richard Biener
  2015-09-15 11:01         ` Jakub Jelinek
@ 2015-09-17 19:25         ` Jeff Law
  1 sibling, 0 replies; 133+ messages in thread
From: Jeff Law @ 2015-09-17 19:25 UTC (permalink / raw)
  To: Richard Biener, Jakub Jelinek; +Cc: David Malcolm, GCC Patches

On 09/15/2015 04:33 AM, Richard Biener wrote:
> On Tue, Sep 15, 2015 at 12:20 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Tue, Sep 15, 2015 at 12:14:22PM +0200, Richard Biener wrote:
>>>> diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
>>>> index 760467c..c7558a0 100644
>>>> --- a/gcc/cp/parser.h
>>>> +++ b/gcc/cp/parser.h
>>>> @@ -61,6 +61,8 @@ struct GTY (()) cp_token {
>>>>     BOOL_BITFIELD purged_p : 1;
>>>>     /* The location at which this token was found.  */
>>>>     location_t location;
>>>> +  /* The source range at which this token was found.  */
>>>> +  source_range range;
>>>
>>> Is it just me or does location now feel somewhat redundant with range?  Can't we
>>> compress that somehow?
>>
>> For a token I'd expect it is redundant, I don't see how it would be useful
>> for a single preprocessing token to have more than start and end locations.
>> But generally, for expressions, 3 locations make sense.
>> If you have
>> abc + def
>> ~~~~^~~~~
>> then having a range is useful.  In any case, I'm surprised that the ranges aren't encoded in
>> location_t (the data structures behind it, where we already stick also
>> BLOCK pointer).
>
> Probably lack of encoding space ... I suppose upping location_t to
> 64bits could solve
> some of that (with its own drawback on increasing size of core structures).
If we're going to 64 bits, then we might consider making it a pointer so 
that we don't have to spend so much time building up an encoding scheme.

Jeff

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH WIP] Use Levenshtein distance for various misspellings in C frontend v2
  2015-09-16  8:45       ` Richard Biener
  2015-09-16 13:33         ` Michael Matz
@ 2015-09-17 19:32         ` Jeff Law
  2015-09-17 20:05           ` David Malcolm
  2015-10-30 12:30           ` [PATCH 0/2] Levenshtein-based suggestions (v3) David Malcolm
  1 sibling, 2 replies; 133+ messages in thread
From: Jeff Law @ 2015-09-17 19:32 UTC (permalink / raw)
  To: Richard Biener, David Malcolm
  Cc: GCC Patches, Manuel López-Ibáñez

On 09/16/2015 02:34 AM, Richard Biener wrote:
>
> Btw, this looks quite expensive - I'm sure we want to limit the effort
> here a bit.
A limiter is reasonable, though as it's been pointed out this only fires 
during error processing, so we probably have more leeway to take time 
and see if we can do better error recovery.

FWIW, I've used this algorithm in totally unrelated projects and while 
it seems expensive, it's worked out quite nicely.

>
> So while the idea might be an improvement to selected cases it can cause
> confusion as well.  And if using the suggestion for further parsing it can
> cause worse followup errors (unless we can limit such "fixup" use to the
> cases where we can parse the result without errors).  Consider
>
> foo()
> {
>    foz = 1;
> }
>
> if we suggest 'foo' instead of foz then we'll get a more confusing followup
> error if we actually use it.
True.  This kind of problem is probably inherent in this kind of "I'm 
going to assume you meant..." error recovery mechanisms.

And just to be clear, even in a successful recovery scenario, we still 
issue an error.  The error recovery is just meant to try and give the 
user a hint what might have gone wrong and gracefully handle the case 
where they just made a minor goof.  Obviously the idea here is to cut 
down on the number of iterations of edit-compile cycle one has to do :-)


Jeff

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH WIP] Use Levenshtein distance for various misspellings in C frontend v2
  2015-09-17 19:32         ` Jeff Law
@ 2015-09-17 20:05           ` David Malcolm
  2015-09-17 20:52             ` Manuel López-Ibáñez
  2015-10-30 12:30           ` [PATCH 0/2] Levenshtein-based suggestions (v3) David Malcolm
  1 sibling, 1 reply; 133+ messages in thread
From: David Malcolm @ 2015-09-17 20:05 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, GCC Patches, Manuel López-Ibáñez

On Thu, 2015-09-17 at 13:31 -0600, Jeff Law wrote:
> On 09/16/2015 02:34 AM, Richard Biener wrote:
> >
> > Btw, this looks quite expensive - I'm sure we want to limit the effort
> > here a bit.
> A limiter is reasonable, though as it's been pointed out this only fires 
> during error processing, so we probably have more leeway to take time 
> and see if we can do better error recovery.
> 
> FWIW, I've used this algorithm in totally unrelated projects and while 
> it seems expensive, it's worked out quite nicely.
> 
> >
> > So while the idea might be an improvement to selected cases it can cause
> > confusion as well.  And if using the suggestion for further parsing it can
> > cause worse followup errors (unless we can limit such "fixup" use to the
> > cases where we can parse the result without errors).  Consider
> >
> > foo()
> > {
> >    foz = 1;
> > }
> >
> > if we suggest 'foo' instead of foz then we'll get a more confusing followup
> > error if we actually use it.
> True.  This kind of problem is probably inherent in this kind of "I'm 
> going to assume you meant..." error recovery mechanisms.
> 
> And just to be clear, even in a successful recovery scenario, we still 
> issue an error.  The error recovery is just meant to try and give the 
> user a hint what might have gone wrong and gracefully handle the case 
> where they just made a minor goof.  

(nods)

> Obviously the idea here is to cut 
> down on the number of iterations of edit-compile cycle one has to do :-)

In my mind it's more about saving the user from having to locate the
field they really meant within the corresponding structure declaration
(either by grep, or by some cross-referencing tool).

A lot of the time I find myself wishing that the compiler had issued a
note saying "here's the declaration of the struct in question", which
would make it easy for me to go straight there in Emacs.

I wonder what proportion of our users use a cross-referencing tool or
have an IDE that can find this stuff for them, vs those that rely on
grep, and if that should mean something for our diagnostics (I tend to
just rely on grep).

This is rather tangential to this RFE, of course.

Dave

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
  2015-09-17 19:15               ` Jeff Law
@ 2015-09-17 20:06                 ` David Malcolm
  0 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-17 20:06 UTC (permalink / raw)
  To: Jeff Law; +Cc: Jakub Jelinek, Richard Biener, GCC Patches

On Thu, 2015-09-17 at 13:14 -0600, Jeff Law wrote:
> On 09/17/2015 10:49 AM, David Malcolm wrote:
> 
> > FWIW, I have a (very messy) implementation of this working for the C
> > frontend, which gives us source ranges on expressions without needing to
> > add any new tree nodes, or add any fields to existing tree structs.
> >
> > The approach I'm using:
> >
> > * ranges are stored directly as fields within cpp_token and c_token
> > (maybe we can ignore cp_token for now)
> >
> > * ranges are stashed in the C FE, both (a) within the "struct c_expr"
> > and (b) within the location_t of each tree expression node as a new
> > field in the adhoc map.
> >
> > Doing it twice may seem slightly redundant, but I think both are needed:
> >    (a) doing it within c_expr allows us to support ranges for constants
> > and VAR_DECL etc during parsing, without needing any kind of new tree
> > wrapper node
> >    (b) doing it in the ad-hoc map allows the ranges for expressions to
> > survive the parse and be usable in diagnostics later.
> >
> > So this gives us (in the C FE): ranges for everything during parsing,
> > and ranges for expressions afterwards, with no new tree nodes or new
> > fields within tree nodes.
> >
> > I'm working on cleaning it up into a much more minimal set of patches
> > that I hope are reviewable.
> >
> > Hopefully this sounds like a good approach
> So is the assumption here that the location information is or is not 
> supposed to survive through the gimple optimizers?

To be honest I hadn't given much thought to that stage; my hope is that
most of the diagnostics have been issued by the time we're optimizing.

In the proposed implementation the range information is "baked in" to
the location_t (via the ad-hoc lookaside), so it's carried along
wherever the location_t goes, and ought to have the same chances of
survival within the gimple optimizers as the existing location
information does.
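
To illustrate what "baked in" means here, the following is a toy model of an ad-hoc lookaside, not GCC's actual line-map code. It assumes a 32-bit location_t whose top bit marks an ad-hoc value; `ADHOC_BIT`, `adhoc_table`, `combine`, `get_locus` and `get_range` are hypothetical names chosen for this sketch, and the real table also deduplicates entries, which is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-ins; the real types live in libcpp and differ in detail.
typedef std::uint32_t location_t;
struct source_range { location_t start; location_t finish; };

// Assumption for this sketch: the top bit flags an ad-hoc location.
const location_t ADHOC_BIT = 0x80000000u;

class adhoc_table
{
  struct entry { location_t locus; source_range range; };
  std::vector<entry> entries;

public:
  // Store LOCUS together with RANGE and hand back a single location_t
  // that can be passed around like any other location.
  location_t combine (location_t locus, source_range range)
  {
    entries.push_back ({locus, range});
    std::size_t index = entries.size () - 1;
    return ADHOC_BIT | static_cast<location_t> (index);
  }

  static bool is_adhoc (location_t loc) { return (loc & ADHOC_BIT) != 0; }

  // Recover the plain source point, whether or not LOC is ad-hoc.
  location_t get_locus (location_t loc) const
  {
    return is_adhoc (loc) ? entries[loc & ~ADHOC_BIT].locus : loc;
  }

  // A plain location is treated as a range covering just that point.
  source_range get_range (location_t loc) const
  {
    if (is_adhoc (loc))
      return entries[loc & ~ADHOC_BIT].range;
    source_range point = {loc, loc};
    return point;
  }
};
```

In this model the range needs no extra storage in the tree: any pass that copies the location_t carries the range with it, and any pass that drops the location_t loses both the point and the range together.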

>    If I understand 
> what you're doing correctly, I think the location information gets 
> invalidated by const/copy propagations.
> 
> Though perhaps that's not a major problem because we're typically 
> propagating an SSA_NAME, not a _DECL node.  Hmm.

Well, if the location_t is being invalidated by an optimization, we're
already losing source *point* information: file/line/column.  Given
that, losing range information as well seems like no great loss.

Or am I missing something?

(I am attempting to preserve the source_range data when block ptrs are
baked in to the ad-hoc locations, if that's what you're referring to)

Dave

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH WIP] Use Levenshtein distance for various misspellings in C frontend v2
  2015-09-17 20:05           ` David Malcolm
@ 2015-09-17 20:52             ` Manuel López-Ibáñez
  0 siblings, 0 replies; 133+ messages in thread
From: Manuel López-Ibáñez @ 2015-09-17 20:52 UTC (permalink / raw)
  To: David Malcolm; +Cc: Jeff Law, Richard Biener, GCC Patches

On 17 September 2015 at 21:57, David Malcolm <dmalcolm@redhat.com> wrote:
> In my mind it's more about saving the user from having to locate the
> field they really meant within the corresponding structure declaration
> (either by grep, or by some cross-referencing tool).

I think it is more than that. After a long coding session, one can
start to wonder why the compiler cannot find type_of_unknwon_predicate
or firstColourInColumn (ah! it was type_of_unknown_predicate and
firstColorInColumn!).

Or when we extend this to options (PR67613), why I get

error: unrecognized command line option '-Weffic++'

when I just read it in the manual!

Cheers,

Manuel.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 03/22] Move diagnostic_show_locus and friends out into a new source file
  2015-09-14 19:37   ` Jeff Law
@ 2015-09-18 18:31     ` David Malcolm
  0 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-09-18 18:31 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

On Mon, 2015-09-14 at 13:35 -0600, Jeff Law wrote:
> On 09/10/2015 02:28 PM, David Malcolm wrote:
> > The function "diagnostic_show_locus" gains new functionality in the
> > next patch, so this preliminary patch breaks it out into a new source
> > file, diagnostic-show-locus.c, along with a couple of related functions.
> >
> > gcc/ChangeLog:
> > 	* Makefile.in (OBJS-libcommon): Add diagnostic-show-locus.o.
> > 	* diagnostic.c (adjust_line): Move to diagnostic-show-locus.c.
> > 	(diagnostic_show_locus): Likewise.
> > 	(diagnostic_print_caret_line): Likewise.
> > 	* diagnostic-show-locus.c: New file.
> This is fine for the trunk.

Thanks; bootstrapped&regrtested; committed to trunk as r227915.

> So much for the easy stuff :-)

FWIW, I'm working on a much simpler version of the patch kit, addressing
some of the issues already raised.

> jeff
> 


^ permalink raw reply	[flat|nested] 133+ messages in thread

* [PATCH 2/2] C FE: suggest corrections for misspelled field names
  2015-10-30 12:30           ` [PATCH 0/2] Levenshtein-based suggestions (v3) David Malcolm
@ 2015-10-30 12:30             ` David Malcolm
  2015-10-30 12:36             ` [PATCH 1/2] Implement Levenshtein distance David Malcolm
  2015-11-02  6:44             ` [PATCH 0/2] Levenshtein-based suggestions (v3) Jeff Law
  2 siblings, 0 replies; 133+ messages in thread
From: David Malcolm @ 2015-10-30 12:30 UTC (permalink / raw)
  To: Jeff Law
  Cc: Richard Biener, Manuel López-Ibáñez, GCC Patches,
	David Malcolm

This is similar to the field-name part of the v2 patch:
 https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01090.html
with the following changes:
  - don't call unit tests from lookup_field_fuzzy
    (instead, see patch 1 in the kit)
  - use a cutoff: if more than half of the letters
    were misspelled, the suggestion is likely to
    be meaningless, so don't offer it.
  - more test coverage
  - deferral of the hints for type-name lookup (this can
    wait for a later patch, since it seemed more
    controversial)
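
As a rough illustration of the cutoff rule in the list above, here is a minimal standalone sketch. It is not the code in the patch: `scored_candidate` and `pick_suggestion` are hypothetical names, and the edit distances are assumed to have been computed already (for example by levenshtein_distance from patch 1) and are simply passed in.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type: a field name paired with its precomputed edit
// distance from the misspelled name.
struct scored_candidate
{
  std::string name;
  unsigned distance;
};

// Return the name of the closest candidate, or an empty string when no
// candidate is close enough to be worth suggesting.
std::string
pick_suggestion (const std::string &typo,
                 const std::vector<scored_candidate> &candidates)
{
  const scored_candidate *best = nullptr;
  unsigned best_distance = UINT_MAX;
  for (const scored_candidate &c : candidates)
    if (c.distance < best_distance)
      {
        // Strict "<" means the first candidate wins on a tie.
        best_distance = c.distance;
        best = &c;
      }
  if (!best)
    return "";

  // If more than half of the letters of the longer name would have to
  // change, the suggestion is likely to be meaningless.
  const std::size_t cutoff
    = std::max (typo.size (), best->name.size ()) / 2;
  if (best_distance > cutoff)
    return "";
  return best->name;
}
```

With "m_bar" as the typo and "bar" at distance 2, the cutoff is 5 / 2 = 2, so the suggestion is still offered; a best candidate at distance 3 or more would be dropped.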

gcc/c/ChangeLog:
	* c-typeck.c: Include spellcheck.h.
	(lookup_field_fuzzy_find_candidates): New function.
	(lookup_field_fuzzy): New function.
	(build_component_ref): If the field was not found, try using
	lookup_field_fuzzy and potentially offer a suggestion.

gcc/testsuite/ChangeLog:
	* gcc.dg/spellcheck-fields.c: New file.
---
 gcc/c/c-typeck.c                         | 74 +++++++++++++++++++++++++++++++-
 gcc/testsuite/gcc.dg/spellcheck-fields.c | 63 +++++++++++++++++++++++++++
 2 files changed, 136 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/spellcheck-fields.c

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 61c5313..0660610 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-family/c-ubsan.h"
 #include "cilk.h"
 #include "gomp-constants.h"
+#include "spellcheck.h"
 
 /* Possible cases of implicit bad conversions.  Used to select
    diagnostic messages in convert_for_assignment.  */
@@ -2249,6 +2250,72 @@ lookup_field (tree type, tree component)
   return tree_cons (NULL_TREE, field, NULL_TREE);
 }
 
+/* Recursively append candidate IDENTIFIER_NODEs to CANDIDATES.  */
+
+static void
+lookup_field_fuzzy_find_candidates (tree type, tree component,
+				    vec<tree> *candidates)
+{
+  tree field;
+  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
+    {
+      if (DECL_NAME (field) == NULL_TREE
+	  && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
+	      || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))
+	{
+	  lookup_field_fuzzy_find_candidates (TREE_TYPE (field),
+					      component,
+					      candidates);
+	}
+
+      if (DECL_NAME (field))
+	candidates->safe_push (DECL_NAME (field));
+    }
+}
+
+/* Like "lookup_field", but find the closest matching IDENTIFIER_NODE,
+   rather than returning a TREE_LIST for an exact match.  */
+
+static tree
+lookup_field_fuzzy (tree type, tree component)
+{
+  gcc_assert (TREE_CODE (component) == IDENTIFIER_NODE);
+
+  /* First, gather a list of candidates.  */
+  auto_vec <tree> candidates;
+
+  lookup_field_fuzzy_find_candidates (type, component,
+				      &candidates);
+
+  /* Now determine which is closest.  */
+  int i;
+  tree identifier;
+  tree best_identifier = NULL;
+  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
+  FOR_EACH_VEC_ELT (candidates, i, identifier)
+    {
+      gcc_assert (TREE_CODE (identifier) == IDENTIFIER_NODE);
+      edit_distance_t dist = levenshtein_distance (component, identifier);
+      if (dist < best_distance)
+	{
+	  best_distance = dist;
+	  best_identifier = identifier;
+	}
+    }
+
+  /* If more than half of the letters were misspelled, the suggestion is
+     likely to be meaningless.  */
+  if (best_identifier)
+    {
+      unsigned int cutoff = MAX (IDENTIFIER_LENGTH (component),
+				 IDENTIFIER_LENGTH (best_identifier)) / 2;
+      if (best_distance > cutoff)
+	return NULL;
+    }
+
+  return best_identifier;
+}
+
 /* Make an expression to refer to the COMPONENT field of structure or
    union value DATUM.  COMPONENT is an IDENTIFIER_NODE.  LOC is the
    location of the COMPONENT_REF.  */
@@ -2284,7 +2351,12 @@ build_component_ref (location_t loc, tree datum, tree component)
 
       if (!field)
 	{
-	  error_at (loc, "%qT has no member named %qE", type, component);
+	  tree guessed_id = lookup_field_fuzzy (type, component);
+	  if (guessed_id)
+	    error_at (loc, "%qT has no member named %qE; did you mean %qE?",
+		      type, component, guessed_id);
+	  else
+	    error_at (loc, "%qT has no member named %qE", type, component);
 	  return error_mark_node;
 	}
 
diff --git a/gcc/testsuite/gcc.dg/spellcheck-fields.c b/gcc/testsuite/gcc.dg/spellcheck-fields.c
new file mode 100644
index 0000000..01be550
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/spellcheck-fields.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+
+struct foo
+{
+  int foo;
+  int bar;
+  int baz;
+};
+
+int test (struct foo *ptr)
+{
+  return ptr->m_bar; /* { dg-error "'struct foo' has no member named 'm_bar'; did you mean 'bar'?" } */
+}
+
+int test2 (void)
+{
+  struct foo instance = {0, 0, 0};
+  return instance.m_bar; /* { dg-error "'struct foo' has no member named 'm_bar'; did you mean 'bar'?" } */
+}
+
+struct s {
+    struct j { int aa; } kk;
+    int ab;
+};
+
+void test3 (struct s x)
+{
+  x.ac;  /* { dg-error "'struct s' has no member named 'ac'; did you mean 'ab'?" } */
+}
+
+int test4 (struct foo *ptr)
+{
+  return sizeof (ptr->foa); /* { dg-error "'struct foo' has no member named 'foa'; did you mean 'foo'?" } */
+}
+
+/* Verify that we don't offer nonsensical suggestions.  */
+
+int test5 (struct foo *ptr)
+{
+  return ptr->this_is_unlike_any_of_the_fields;   /* { dg-bogus "did you mean" } */
+  /* { dg-error "has no member named" "" { target *-*-* } 40 } */
+}
+
+union u
+{
+  int color;
+  int shape;
+};
+
+int test6 (union u *ptr)
+{
+  return ptr->colour; /* { dg-error "'union u' has no member named 'colour'; did you mean 'color'?" } */
+}
+
+struct has_anon
+{
+  struct { int color; } s;
+};
+
+int test7 (struct has_anon *ptr)
+{
+  return ptr->s.colour; /* { dg-error "'struct <anonymous>' has no member named 'colour'; did you mean 'color'?" } */
+}
-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 133+ messages in thread

* [PATCH 0/2] Levenshtein-based suggestions (v3)
  2015-09-17 19:32         ` Jeff Law
  2015-09-17 20:05           ` David Malcolm
@ 2015-10-30 12:30           ` David Malcolm
  2015-10-30 12:30             ` [PATCH 2/2] C FE: suggest corrections for misspelled field names David Malcolm
                               ` (2 more replies)
  1 sibling, 3 replies; 133+ messages in thread
From: David Malcolm @ 2015-10-30 12:30 UTC (permalink / raw)
  To: Jeff Law
  Cc: Richard Biener, Manuel López-Ibáñez, GCC Patches,
	David Malcolm

On Thu, 2015-09-17 at 13:31 -0600, Jeff Law wrote:
> On 09/16/2015 02:34 AM, Richard Biener wrote:
> >
> > Btw, this looks quite expensive - I'm sure we want to limit the effort
> > here a bit.
> A limiter is reasonable, though as it's been pointed out this only fires 
> during error processing, so we probably have more leeway to take time 
> and see if we can do better error recovery.
> 
> FWIW, I've used this algorithm in totally unrelated projects and while 
> it seems expensive, it's worked out quite nicely.
> 
> >
> > So while the idea might be an improvement to selected cases it can cause
> > confusion as well.  And if using the suggestion for further parsing it can
> > cause worse followup errors (unless we can limit such "fixup" use to the
> > cases where we can parse the result without errors).  Consider
> >
> > foo()
> > {
> >    foz = 1;
> > }
> >
> > if we suggest 'foo' instead of foz then we'll get a more confusing followup
> > error if we actually use it.
> True.  This kind of problem is probably inherent in this kind of "I'm 
> going assume you meant..." error recovery mechanisms.
> 
> And just to be clear, even in a successful recovery scenario, we still 
> issue an error.  The error recovery is just meant to try and give the 
> user a hint what might have gone wrong and gracefully handle the case 
> where they just made a minor goof.  Obviously the idea here is to cut 
> down on the number of iterations of edit-compile cycle one has to do :-)
> 
> 
> Jeff

The typename suggestion seems to be at least somewhat controversial,
whereas (I hope) the misspelled field names suggestion is more
acceptable.

Hence I'm focusing on the field name lookup for now; other uses of the
algorithm (e.g. the typename lookup) could be done in followup patches,
but I'm deferring them for now in the hope of getting the simplest case
into trunk as a first step.  Similarly, for simplicity, I didn't
implement any attempt at error-recovery using the hint.

The following patch kit is in two parts (for ease of review; they would
be applied together):

  patch 1: Implement Levenshtein distance
  patch 2: C FE: suggest corrections for misspelled field names

I didn't implement a limiter, on the grounds that this only fires
once per "has no member named" error, and so is unlikely to slow
things down noticeably.

Successfully bootstrapped&regrtested the combination of these two
on x86_64-pc-linux-gnu (adds 11 new PASS results to gcc.sum)

OK for trunk?

 gcc/Makefile.in                                  |   1 +
 gcc/c/c-typeck.c                                 |  70 +++++++++++-
 gcc/spellcheck.c                                 | 136 +++++++++++++++++++++++
 gcc/spellcheck.h                                 |  32 ++++++
 gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c |   9 ++
 gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c |  64 +++++++++++
 gcc/testsuite/gcc.dg/plugin/plugin.exp           |   1 +
 gcc/testsuite/gcc.dg/spellcheck-fields.c         |  63 +++++++++++
 8 files changed, 375 insertions(+), 1 deletion(-)
 create mode 100644 gcc/spellcheck.c
 create mode 100644 gcc/spellcheck.h
 create mode 100644 gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c
 create mode 100644 gcc/testsuite/gcc.dg/spellcheck-fields.c

-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 133+ messages in thread

* [PATCH 1/2] Implement Levenshtein distance
  2015-10-30 12:30           ` [PATCH 0/2] Levenshtein-based suggestions (v3) David Malcolm
  2015-10-30 12:30             ` [PATCH 2/2] C FE: suggest corrections for misspelled field names David Malcolm
@ 2015-10-30 12:36             ` David Malcolm
  2015-11-02 10:56               ` Mikael Morin
  2015-11-02  6:44             ` [PATCH 0/2] Levenshtein-based suggestions (v3) Jeff Law
  2 siblings, 1 reply; 133+ messages in thread
From: David Malcolm @ 2015-10-30 12:36 UTC (permalink / raw)
  To: Jeff Law
  Cc: Richard Biener, Manuel López-Ibáñez, GCC Patches,
	David Malcolm

This patch adds an implementation of Levenshtein distance to gcc,
along with unit testing of the algorithm.

The unit testing is implemented via a plugin within gcc.dg/plugin.
(I'd prefer to do this via the unit testing patches I've been
proposing in a separate patch kit, but to avoid depending on that
this kit does it within a custom plugin.)

The plugin actually fails until followup patches are applied, with:

 cc1: error: cannot load plugin ./levenshtein_plugin.so
 ./levenshtein_plugin.so: undefined symbol: _Z20levenshtein_distancePKcS0_

due to nothing in the tree initially using the API, but I've broken
it out in the hope that it makes review easier.

gcc/ChangeLog:
	* Makefile.in (OBJS): Add spellcheck.o.
	* spellcheck.c: New file.
	* spellcheck.h: New file.

gcc/testsuite/ChangeLog:
	* gcc.dg/plugin/levenshtein-test-1.c: New file.
	* gcc.dg/plugin/levenshtein_plugin.c: New file.
	* gcc.dg/plugin/plugin.exp (plugin_test_list): Add
	levenshtein_plugin.c.
---
 gcc/Makefile.in                                  |   1 +
 gcc/spellcheck.c                                 | 136 +++++++++++++++++++++++
 gcc/spellcheck.h                                 |  32 ++++++
 gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c |   9 ++
 gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c |  64 +++++++++++
 gcc/testsuite/gcc.dg/plugin/plugin.exp           |   1 +
 6 files changed, 243 insertions(+)
 create mode 100644 gcc/spellcheck.c
 create mode 100644 gcc/spellcheck.h
 create mode 100644 gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 2685b38..9fb643e 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1394,6 +1394,7 @@ OBJS = \
 	shrink-wrap.o \
 	simplify-rtx.o \
 	sparseset.o \
+	spellcheck.o \
 	sreal.o \
 	stack-ptr-mod.o \
 	statistics.o \
diff --git a/gcc/spellcheck.c b/gcc/spellcheck.c
new file mode 100644
index 0000000..532df58
--- /dev/null
+++ b/gcc/spellcheck.c
@@ -0,0 +1,136 @@
+/* Find near-matches for strings and identifiers.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "spellcheck.h"
+
+/* The Levenshtein distance is an "edit-distance": the minimal
+   number of one-character insertions, removals or substitutions
+   that are needed to change one string into another.
+
+   This implementation uses the Wagner-Fischer algorithm.  */
+
+static edit_distance_t
+levenshtein_distance (const char *s, int m,
+		      const char *t, int n)
+{
+  const bool debug = false;
+
+  if (debug)
+    {
+      printf ("s: \"%s\" (m=%i)\n", s, m);
+      printf ("t: \"%s\" (n=%i)\n", t, n);
+    }
+
+  if (m == 0)
+    return n;
+  if (n == 0)
+    return m;
+
+  /* We effectively build a matrix where each (i, j) contains the
+     Levenshtein distance between the prefix strings s[0:i]
+     and t[0:j].
+     Rather than actually build an (m + 1) * (n + 1) matrix,
+     we simply keep track of the last row, v0 and a new row, v1,
+     which avoids an (m + 1) * (n + 1) allocation and memory accesses
+     in favor of two (m + 1) allocations.  These could potentially be
+     statically-allocated if we impose a maximum length on the
+     strings of interest.  */
+  edit_distance_t *v0 = new edit_distance_t[m + 1];
+  edit_distance_t *v1 = new edit_distance_t[m + 1];
+
+  /* The first row is for the case of an empty target string, which
+     we can reach by deleting every character in the source string.  */
+  for (int i = 0; i < m + 1; i++)
+    v0[i] = i;
+
+  /* Build successive rows.  */
+  for (int i = 0; i < n; i++)
+    {
+      if (debug)
+	{
+	  printf ("i:%i v0 = ", i);
+	  for (int j = 0; j < m + 1; j++)
+	    printf ("%i ", v0[j]);
+	  printf ("\n");
+	}
+
+      /* The initial column is for the case of an empty source string; we
+	 can reach prefixes of the target string of length i
+	 by inserting i characters.  */
+      v1[0] = i + 1;
+
+      /* Build the rest of the row by considering neighbours to
+	 the north, west and northwest.  */
+      for (int j = 0; j < m; j++)
+	{
+	  edit_distance_t cost = (s[j] == t[i] ? 0 : 1);
+	  edit_distance_t deletion     = v1[j] + 1;
+	  edit_distance_t insertion    = v0[j + 1] + 1;
+	  edit_distance_t substitution = v0[j] + cost;
+	  edit_distance_t cheapest = MIN (deletion, insertion);
+	  cheapest = MIN (cheapest, substitution);
+	  v1[j + 1] = cheapest;
+	}
+
+      /* Prepare to move on to next row.  */
+      for (int j = 0; j < m + 1; j++)
+	v0[j] = v1[j];
+    }
+
+  if (debug)
+    {
+      printf ("final v1 = ");
+      for (int j = 0; j < m + 1; j++)
+	printf ("%i ", v1[j]);
+      printf ("\n");
+    }
+
+  edit_distance_t result = v1[m];
+  delete[] v0;
+  delete[] v1;
+  return result;
+}
+
+/* Calculate Levenshtein distance between two nil-terminated strings.
+   This exists purely for the unit tests.  */
+
+edit_distance_t
+levenshtein_distance (const char *s, const char *t)
+{
+  return levenshtein_distance (s, strlen (s), t, strlen (t));
+}
+
+/* Calculate Levenshtein distance between two identifiers.  */
+
+edit_distance_t
+levenshtein_distance (tree ident_s, tree ident_t)
+{
+  gcc_assert (TREE_CODE (ident_s) == IDENTIFIER_NODE);
+  gcc_assert (TREE_CODE (ident_t) == IDENTIFIER_NODE);
+
+  return levenshtein_distance (IDENTIFIER_POINTER (ident_s),
+			       IDENTIFIER_LENGTH (ident_s),
+			       IDENTIFIER_POINTER (ident_t),
+			       IDENTIFIER_LENGTH (ident_t));
+}
diff --git a/gcc/spellcheck.h b/gcc/spellcheck.h
new file mode 100644
index 0000000..58355d6
--- /dev/null
+++ b/gcc/spellcheck.h
@@ -0,0 +1,32 @@
+/* Find near-matches for strings and identifiers.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_SPELLCHECK_H
+#define GCC_SPELLCHECK_H
+
+typedef unsigned int edit_distance_t;
+const edit_distance_t MAX_EDIT_DISTANCE = UINT_MAX;
+
+extern edit_distance_t
+levenshtein_distance (const char *s, const char *t);
+
+extern edit_distance_t
+levenshtein_distance (tree ident_s, tree ident_t);
+
+#endif  /* GCC_SPELLCHECK_H  */
diff --git a/gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c b/gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c
new file mode 100644
index 0000000..ac49992
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c
@@ -0,0 +1,9 @@
+/* Placeholder C source file for unit-testing gcc/spellcheck.c.  */
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+int
+main (int argc, char **argv)
+{
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c b/gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c
new file mode 100644
index 0000000..3e7dc78
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c
@@ -0,0 +1,64 @@
+/* Plugin for unittesting gcc/spellcheck.h.  */
+
+#include "config.h"
+#include "gcc-plugin.h"
+#include "system.h"
+#include "coretypes.h"
+#include "spellcheck.h"
+#include "diagnostic.h"
+
+int plugin_is_GPL_compatible;
+
+static void
+levenshtein_distance_unit_test_oneway (const char *a, const char *b,
+				       edit_distance_t expected)
+{
+  edit_distance_t actual = levenshtein_distance (a, b);
+  if (actual != expected)
+    error ("levenshtein_distance (\"%s\", \"%s\") : expected: %i got %i",
+	   a, b, expected, actual);
+}
+
+
+static void
+levenshtein_distance_unit_test (const char *a, const char *b,
+				edit_distance_t expected)
+{
+  /* Run every test both ways to ensure it's symmetric.  */
+  levenshtein_distance_unit_test_oneway (a, b, expected);
+  levenshtein_distance_unit_test_oneway (b, a, expected);
+}
+
+/* Callback handler for the PLUGIN_FINISH event; run
+   levenshtein_distance unit tests here.  */
+
+static void
+on_finish (void */*gcc_data*/, void */*user_data*/)
+{
+  levenshtein_distance_unit_test ("", "nonempty", strlen ("nonempty"));
+  levenshtein_distance_unit_test ("saturday", "sunday", 3);
+  levenshtein_distance_unit_test ("foo", "m_foo", 2);
+  levenshtein_distance_unit_test ("hello_world", "HelloWorld", 3);
+  levenshtein_distance_unit_test
+    ("the quick brown fox jumps over the lazy dog", "dog", 40);
+  levenshtein_distance_unit_test
+    ("the quick brown fox jumps over the lazy dog",
+     "the quick brown dog jumps over the lazy fox",
+     4);
+  levenshtein_distance_unit_test
+    ("Lorem ipsum dolor sit amet, consectetur adipiscing elit,",
+     "All your base are belong to us",
+     44);
+}
+
+int
+plugin_init (struct plugin_name_args *plugin_info,
+	     struct plugin_gcc_version */*version*/)
+{
+  register_callback (plugin_info->base_name,
+		     PLUGIN_FINISH,
+		     on_finish,
+		     NULL); /* void *user_data */
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
index 39fab6e..80fc539 100644
--- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
+++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
@@ -63,6 +63,7 @@ set plugin_test_list [list \
     { start_unit_plugin.c start_unit-test-1.c } \
     { finish_unit_plugin.c finish_unit-test-1.c } \
     { wide-int_plugin.c wide-int-test-1.c } \
+    { levenshtein_plugin.c levenshtein-test-1.c } \
 ]
 
 foreach plugin_test $plugin_test_list {
-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 0/2] Levenshtein-based suggestions (v3)
  2015-10-30 12:30           ` [PATCH 0/2] Levenshtein-based suggestions (v3) David Malcolm
  2015-10-30 12:30             ` [PATCH 2/2] C FE: suggest corrections for misspelled field names David Malcolm
  2015-10-30 12:36             ` [PATCH 1/2] Implement Levenshtein distance David Malcolm
@ 2015-11-02  6:44             ` Jeff Law
  2015-11-13  2:08               ` David Malcolm
  2 siblings, 1 reply; 133+ messages in thread
From: Jeff Law @ 2015-11-02  6:44 UTC (permalink / raw)
  To: David Malcolm
  Cc: Richard Biener, Manuel López-Ibáñez, GCC Patches

On 10/30/2015 06:47 AM, David Malcolm wrote:

> The typename suggestion seems to be at least somewhat controversial,
> whereas (I hope) the misspelled field names suggestion is more
> acceptable.
>
> Hence I'm focusing on the field name lookup for now; other uses of the
> algorithm (e.g. the typename lookup) could be done in followup patches,
> but I'm deferring them for now in the hope of getting the simplest case
> into trunk as a first step.  Similarly, for simplicity, I didn't
> implement any attempt at error-recovery using the hint.
>
> The following patch kit is in two parts (for ease of review; they would
> be applied together):
>
>    patch 1: Implement Levenshtein distance
>    patch 2: C FE: suggest corrections for misspelled field names
>
> I didn't implement a limiter, on the grounds that this only fires
> once per "has no member named" error, and so is unlikely to slow
> things down noticeably.
>
> Successfully bootstrapped&regrtested the combination of these two
> on x86_64-pc-linux-gnu (adds 11 new PASS results to gcc.sum)
>
> OK for trunk?
>
>   gcc/Makefile.in                                  |   1 +
>   gcc/c/c-typeck.c                                 |  70 +++++++++++-
>   gcc/spellcheck.c                                 | 136 +++++++++++++++++++++++
>   gcc/spellcheck.h                                 |  32 ++++++
>   gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c |   9 ++
>   gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c |  64 +++++++++++
>   gcc/testsuite/gcc.dg/plugin/plugin.exp           |   1 +
>   gcc/testsuite/gcc.dg/spellcheck-fields.c         |  63 +++++++++++
>   8 files changed, 375 insertions(+), 1 deletion(-)
>   create mode 100644 gcc/spellcheck.c
>   create mode 100644 gcc/spellcheck.h
>   create mode 100644 gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c
>   create mode 100644 gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c
>   create mode 100644 gcc/testsuite/gcc.dg/spellcheck-fields.c
I'm going to assume you got Levenshtein's algorithm reasonably correct.

This is OK for the trunk.  Obviously I'd like to see it extend into the 
other front-ends (C++ in particular).  Then I'd like to see it extend 
beyond just misspelled field names.

jeff

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 1/2] Implement Levenshtein distance
  2015-10-30 12:36             ` [PATCH 1/2] Implement Levenshtein distance David Malcolm
@ 2015-11-02 10:56               ` Mikael Morin
  0 siblings, 0 replies; 133+ messages in thread
From: Mikael Morin @ 2015-11-02 10:56 UTC (permalink / raw)
  To: David Malcolm, Jeff Law
  Cc: Richard Biener, Manuel López-Ibáñez, GCC Patches

Hello,

On 30/10/2015 13:47, David Malcolm wrote:
> This patch adds an implementation of Levenshtein distance to gcc,
> along with unit testing of the algorithm.
>
I noticed two nits while looking at it.


> diff --git a/gcc/spellcheck.c b/gcc/spellcheck.c
> new file mode 100644
> index 0000000..532df58
> --- /dev/null
> +++ b/gcc/spellcheck.c
> @@ -0,0 +1,136 @@
> +/* Find near-matches for strings and identifiers.
> +   Copyright (C) 2015 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "tree.h"
> +#include "spellcheck.h"
> +
> +/* The Levenshtein distance is an "edit-distance": the minimal
> +   number of one-character insertions, removals or substitutions
> +   that are needed to change one string into another.
> +
> +   This implementation uses the Wagner-Fischer algorithm.  */
> +
You forgot to explain that m and n are the lengths of s and t 
respectively.  You may want to just use a more descriptive name for them.

> +static edit_distance_t
> +levenshtein_distance (const char *s, int m,
> +		      const char *t, int n)
> +{
> +  const bool debug = false;
> +
> +  if (debug)
> +    {
> +      printf ("s: \"%s\" (m=%i)\n", s, m);
> +      printf ("t: \"%s\" (n=%i)\n", t, n);
> +    }
> +
> +  if (m == 0)
> +    return n;
> +  if (n == 0)
> +    return m;
> +
> +  /* We effectively build a matrix where each (i, j) contains the
> +     Levenshtein distance between the prefix strings s[0:i]
> +     and t[0:j].
The code seems to use s[0:j] and t[0:i] instead, doesn't it?
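
To make the index question concrete, here is a minimal standalone sketch of the same two-row scheme with descriptive names. It is not the patch's code: `edit_distance`, `source_len`, `target_len`, `prev_row` and `next_row` are names assumed for illustration, and it uses std::string and std::vector rather than raw arrays. In this form column j ranges over prefixes of the source string and row i over prefixes of the target.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: Levenshtein distance between SOURCE and TARGET,
// keeping only two rows of the Wagner-Fischer matrix.
unsigned
edit_distance (const std::string &source, const std::string &target)
{
  const std::size_t source_len = source.size ();
  const std::size_t target_len = target.size ();

  // prev_row[j] holds the distance between source[0:j] and the target
  // prefix handled so far.  Row 0 is the empty target prefix, reached
  // by deleting every character of source[0:j].
  std::vector<unsigned> prev_row (source_len + 1);
  std::vector<unsigned> next_row (source_len + 1);
  for (std::size_t j = 0; j <= source_len; j++)
    prev_row[j] = static_cast<unsigned> (j);

  for (std::size_t i = 0; i < target_len; i++)
    {
      // Empty source prefix: insert the first i + 1 target characters.
      next_row[0] = static_cast<unsigned> (i + 1);

      for (std::size_t j = 0; j < source_len; j++)
        {
          unsigned cost = (source[j] == target[i]) ? 0 : 1;
          unsigned west = next_row[j] + 1;          // same row, shorter source prefix
          unsigned north = prev_row[j + 1] + 1;     // previous row, same source prefix
          unsigned northwest = prev_row[j] + cost;  // substitute or match
          next_row[j + 1] = std::min ({west, north, northwest});
        }

      // The row just built becomes the previous row for the next pass.
      prev_row.swap (next_row);
    }

  return prev_row[source_len];
}
```

Under these assumptions, edit_distance ("saturday", "sunday") evaluates to 3 and edit_distance ("foo", "m_foo") to 2, matching the expectations in the plugin's unit tests.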

Thanks
Mikael

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 0/2] Levenshtein-based suggestions (v3)
  2015-11-02  6:44             ` [PATCH 0/2] Levenshtein-based suggestions (v3) Jeff Law
@ 2015-11-13  2:08               ` David Malcolm
  2015-11-13  6:57                 ` Marek Polacek
  0 siblings, 1 reply; 133+ messages in thread
From: David Malcolm @ 2015-11-13  2:08 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, Manuel López-Ibáñez, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2737 bytes --]

On Sun, 2015-11-01 at 23:44 -0700, Jeff Law wrote:
> On 10/30/2015 06:47 AM, David Malcolm wrote:
> 
> > The typename suggestion seems to be at least somewhat controversial,
> > whereas (I hope) the misspelled field names suggestion is more
> > acceptable.
> >
> > Hence I'm focusing on the field name lookup for now; other uses of the
> > algorithm (e.g. the typename lookup) could be done in followup patches,
> > but I'm deferring them for now in the hope of getting the simplest case
> > into trunk as a first step.  Similarly, for simplicity, I didn't
> > implement any attempt at error-recovery using the hint.
> >
> > The following patch kit is in two parts (for ease of review; they would
> > be applied together):
> >
> >    patch 1: Implement Levenshtein distance
> >    patch 2: C FE: suggest corrections for misspelled field names
> >
> > I didn't implement a limiter, on the grounds that this only fires
> > once per "has no member named" error, and so is unlikely to slow
> > things down noticeably.
> >
> > Successfully bootstrapped&regrtested the combination of these two
> > on x86_64-pc-linux-gnu (adds 11 new PASS results to gcc.sum)
> >
> > OK for trunk?
> >
> >   gcc/Makefile.in                                  |   1 +
> >   gcc/c/c-typeck.c                                 |  70 +++++++++++-
> >   gcc/spellcheck.c                                 | 136 +++++++++++++++++++++++
> >   gcc/spellcheck.h                                 |  32 ++++++
> >   gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c |   9 ++
> >   gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c |  64 +++++++++++
> >   gcc/testsuite/gcc.dg/plugin/plugin.exp           |   1 +
> >   gcc/testsuite/gcc.dg/spellcheck-fields.c         |  63 +++++++++++
> >   8 files changed, 375 insertions(+), 1 deletion(-)
> >   create mode 100644 gcc/spellcheck.c
> >   create mode 100644 gcc/spellcheck.h
> >   create mode 100644 gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c
> >   create mode 100644 gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c
> >   create mode 100644 gcc/testsuite/gcc.dg/spellcheck-fields.c
> I'm going to assume you got levenshtein's algorithm reasonably correct.
> 
> This is OK for the trunk.  

Thanks.

FWIW I applied some fixes for the nits identified by Mikael in:
  https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00046.html
renaming params "m" and "n" to "len_s" and "len_t", and fixing the
comment - under the "obvious" rule.

I've committed the combination of the two patches (with the nit fixes)
as r230284; attached is what I committed (for reference).

> Obviously I'd like to see it extend into the 
> other front-ends (C++ in particular).  Then I'd like to see it extend 
> beyond just misspelled field names.

(nods)

[-- Attachment #2: 0001-Implement-Levenshtein-distance-use-in-C-FE-for-missp.patch --]
[-- Type: text/x-patch, Size: 15549 bytes --]

From 7d22e0182f7d21f2b18a64530e7f94dd36cec7b0 Mon Sep 17 00:00:00 2001
From: David Malcolm <dmalcolm@redhat.com>
Date: Thu, 29 Oct 2015 15:29:26 -0400
Subject: [PATCH] Implement Levenshtein distance; use in C FE for
 misspelled field names

This is the combination of:
  [PATCH 1/2] Implement Levenshtein distance
  [PATCH 2/2] C FE: suggest corrections for misspelled field names
plus some nit fixes to spellcheck.c.

gcc/ChangeLog:
	* Makefile.in (OBJS): Add spellcheck.o.
	* spellcheck.c: New file.
	* spellcheck.h: New file.

gcc/c/ChangeLog:
	* c-typeck.c: Include spellcheck.h.
	(lookup_field_fuzzy_find_candidates): New function.
	(lookup_field_fuzzy): New function.
	(build_component_ref): If the field was not found, try using
	lookup_field_fuzzy and potentially offer a suggestion.

gcc/testsuite/ChangeLog:
	* gcc.dg/plugin/levenshtein-test-1.c: New file.
	* gcc.dg/plugin/levenshtein_plugin.c: New file.
	* gcc.dg/plugin/plugin.exp (plugin_test_list): Add
	levenshtein_plugin.c.
	* gcc.dg/spellcheck-fields.c: New file.
---
 gcc/Makefile.in                                  |   1 +
 gcc/c/c-typeck.c                                 |  74 +++++++++++-
 gcc/spellcheck.c                                 | 136 +++++++++++++++++++++++
 gcc/spellcheck.h                                 |  32 ++++++
 gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c |   9 ++
 gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c |  64 +++++++++++
 gcc/testsuite/gcc.dg/plugin/plugin.exp           |   1 +
 gcc/testsuite/gcc.dg/spellcheck-fields.c         |  63 +++++++++++
 8 files changed, 379 insertions(+), 1 deletion(-)
 create mode 100644 gcc/spellcheck.c
 create mode 100644 gcc/spellcheck.h
 create mode 100644 gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c
 create mode 100644 gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c
 create mode 100644 gcc/testsuite/gcc.dg/spellcheck-fields.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 34d2356..f17234d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1403,6 +1403,7 @@ OBJS = \
 	shrink-wrap.o \
 	simplify-rtx.o \
 	sparseset.o \
+	spellcheck.o \
 	sreal.o \
 	stack-ptr-mod.o \
 	statistics.o \
diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 4335a87..eb4e1fc 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "c-family/c-ubsan.h"
 #include "cilk.h"
 #include "gomp-constants.h"
+#include "spellcheck.h"
 
 /* Possible cases of implicit bad conversions.  Used to select
    diagnostic messages in convert_for_assignment.  */
@@ -2242,6 +2243,72 @@ lookup_field (tree type, tree component)
   return tree_cons (NULL_TREE, field, NULL_TREE);
 }
 
+/* Recursively append candidate IDENTIFIER_NODEs to CANDIDATES.  */
+
+static void
+lookup_field_fuzzy_find_candidates (tree type, tree component,
+				    vec<tree> *candidates)
+{
+  tree field;
+  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
+    {
+      if (DECL_NAME (field) == NULL_TREE
+	  && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
+	      || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))
+	{
+	  lookup_field_fuzzy_find_candidates (TREE_TYPE (field),
+					      component,
+					      candidates);
+	}
+
+      if (DECL_NAME (field))
+	candidates->safe_push (DECL_NAME (field));
+    }
+}
+
+/* Like "lookup_field", but find the closest matching IDENTIFIER_NODE,
+   rather than returning a TREE_LIST for an exact match.  */
+
+static tree
+lookup_field_fuzzy (tree type, tree component)
+{
+  gcc_assert (TREE_CODE (component) == IDENTIFIER_NODE);
+
+  /* First, gather a list of candidates.  */
+  auto_vec <tree> candidates;
+
+  lookup_field_fuzzy_find_candidates (type, component,
+				      &candidates);
+
+  /* Now determine which is closest.  */
+  int i;
+  tree identifier;
+  tree best_identifier = NULL;
+  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
+  FOR_EACH_VEC_ELT (candidates, i, identifier)
+    {
+      gcc_assert (TREE_CODE (identifier) == IDENTIFIER_NODE);
+      edit_distance_t dist = levenshtein_distance (component, identifier);
+      if (dist < best_distance)
+	{
+	  best_distance = dist;
+	  best_identifier = identifier;
+	}
+    }
+
+  /* If more than half of the letters were misspelled, the suggestion is
+     likely to be meaningless.  */
+  if (best_identifier)
+    {
+      unsigned int cutoff = MAX (IDENTIFIER_LENGTH (component),
+				 IDENTIFIER_LENGTH (best_identifier)) / 2;
+      if (best_distance > cutoff)
+	return NULL;
+    }
+
+  return best_identifier;
+}
+
 /* Make an expression to refer to the COMPONENT field of structure or
    union value DATUM.  COMPONENT is an IDENTIFIER_NODE.  LOC is the
    location of the COMPONENT_REF.  */
@@ -2277,7 +2344,12 @@ build_component_ref (location_t loc, tree datum, tree component)
 
       if (!field)
 	{
-	  error_at (loc, "%qT has no member named %qE", type, component);
+	  tree guessed_id = lookup_field_fuzzy (type, component);
+	  if (guessed_id)
+	    error_at (loc, "%qT has no member named %qE; did you mean %qE?",
+		      type, component, guessed_id);
+	  else
+	    error_at (loc, "%qT has no member named %qE", type, component);
 	  return error_mark_node;
 	}
 
diff --git a/gcc/spellcheck.c b/gcc/spellcheck.c
new file mode 100644
index 0000000..31ce322
--- /dev/null
+++ b/gcc/spellcheck.c
@@ -0,0 +1,136 @@
+/* Find near-matches for strings and identifiers.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "tree.h"
+#include "spellcheck.h"
+
+/* The Levenshtein distance is an "edit-distance": the minimal
+   number of one-character insertions, removals or substitutions
+   that are needed to change one string into another.
+
+   This implementation uses the Wagner-Fischer algorithm.  */
+
+static edit_distance_t
+levenshtein_distance (const char *s, int len_s,
+		      const char *t, int len_t)
+{
+  const bool debug = false;
+
+  if (debug)
+    {
+      printf ("s: \"%s\" (len_s=%i)\n", s, len_s);
+      printf ("t: \"%s\" (len_t=%i)\n", t, len_t);
+    }
+
+  if (len_s == 0)
+    return len_t;
+  if (len_t == 0)
+    return len_s;
+
+  /* We effectively build a matrix where each (i, j) contains the
+     Levenshtein distance between the prefix strings s[0:j]
+     and t[0:i].
+     Rather than actually build an (len_t + 1) * (len_s + 1) matrix,
+     we simply keep track of the last row, v0 and a new row, v1,
+     which avoids an (len_t + 1) * (len_s + 1) allocation and memory accesses
+     in favor of two (len_s + 1) allocations.  These could potentially be
+     statically-allocated if we impose a maximum length on the
+     strings of interest.  */
+  edit_distance_t *v0 = new edit_distance_t[len_s + 1];
+  edit_distance_t *v1 = new edit_distance_t[len_s + 1];
+
+  /* The first row is for the case of an empty target string, which
+     we can reach by deleting every character in the source string.  */
+  for (int i = 0; i < len_s + 1; i++)
+    v0[i] = i;
+
+  /* Build successive rows.  */
+  for (int i = 0; i < len_t; i++)
+    {
+      if (debug)
+	{
+	  printf ("i:%i v0 = ", i);
+	  for (int j = 0; j < len_s + 1; j++)
+	    printf ("%i ", v0[j]);
+	  printf ("\n");
+	}
+
+      /* The initial column is for the case of an empty source string; we
+	 can reach prefixes of the target string of length i
+	 by inserting i characters.  */
+      v1[0] = i + 1;
+
+      /* Build the rest of the row by considering neighbours to
+	 the north, west and northwest.  */
+      for (int j = 0; j < len_s; j++)
+	{
+	  edit_distance_t cost = (s[j] == t[i] ? 0 : 1);
+	  edit_distance_t deletion     = v1[j] + 1;
+	  edit_distance_t insertion    = v0[j + 1] + 1;
+	  edit_distance_t substitution = v0[j] + cost;
+	  edit_distance_t cheapest = MIN (deletion, insertion);
+	  cheapest = MIN (cheapest, substitution);
+	  v1[j + 1] = cheapest;
+	}
+
+      /* Prepare to move on to next row.  */
+      for (int j = 0; j < len_s + 1; j++)
+	v0[j] = v1[j];
+    }
+
+  if (debug)
+    {
+      printf ("final v1 = ");
+      for (int j = 0; j < len_s + 1; j++)
+	printf ("%i ", v1[j]);
+      printf ("\n");
+    }
+
+  edit_distance_t result = v1[len_s];
+  delete[] v0;
+  delete[] v1;
+  return result;
+}
+
+/* Calculate Levenshtein distance between two nil-terminated strings.
+   This exists purely for the unit tests.  */
+
+edit_distance_t
+levenshtein_distance (const char *s, const char *t)
+{
+  return levenshtein_distance (s, strlen (s), t, strlen (t));
+}
+
+/* Calculate Levenshtein distance between two identifiers.  */
+
+edit_distance_t
+levenshtein_distance (tree ident_s, tree ident_t)
+{
+  gcc_assert (TREE_CODE (ident_s) == IDENTIFIER_NODE);
+  gcc_assert (TREE_CODE (ident_t) == IDENTIFIER_NODE);
+
+  return levenshtein_distance (IDENTIFIER_POINTER (ident_s),
+			       IDENTIFIER_LENGTH (ident_s),
+			       IDENTIFIER_POINTER (ident_t),
+			       IDENTIFIER_LENGTH (ident_t));
+}
diff --git a/gcc/spellcheck.h b/gcc/spellcheck.h
new file mode 100644
index 0000000..58355d6
--- /dev/null
+++ b/gcc/spellcheck.h
@@ -0,0 +1,32 @@
+/* Find near-matches for strings and identifiers.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_SPELLCHECK_H
+#define GCC_SPELLCHECK_H
+
+typedef unsigned int edit_distance_t;
+const edit_distance_t MAX_EDIT_DISTANCE = UINT_MAX;
+
+extern edit_distance_t
+levenshtein_distance (const char *s, const char *t);
+
+extern edit_distance_t
+levenshtein_distance (tree ident_s, tree ident_t);
+
+#endif  /* GCC_SPELLCHECK_H  */
diff --git a/gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c b/gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c
new file mode 100644
index 0000000..ac49992
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/levenshtein-test-1.c
@@ -0,0 +1,9 @@
+/* Placeholder C source file for unit-testing gcc/spellcheck.c.  */
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+int
+main (int argc, char **argv)
+{
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c b/gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c
new file mode 100644
index 0000000..3e7dc78
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/plugin/levenshtein_plugin.c
@@ -0,0 +1,64 @@
+/* Plugin for unittesting gcc/spellcheck.h.  */
+
+#include "config.h"
+#include "gcc-plugin.h"
+#include "system.h"
+#include "coretypes.h"
+#include "spellcheck.h"
+#include "diagnostic.h"
+
+int plugin_is_GPL_compatible;
+
+static void
+levenshtein_distance_unit_test_oneway (const char *a, const char *b,
+				       edit_distance_t expected)
+{
+  edit_distance_t actual = levenshtein_distance (a, b);
+  if (actual != expected)
+    error ("levenshtein_distance (\"%s\", \"%s\") : expected: %i got %i",
+	   a, b, expected, actual);
+}
+
+
+static void
+levenshtein_distance_unit_test (const char *a, const char *b,
+				edit_distance_t expected)
+{
+  /* Run every test both ways to ensure it's symmetric.  */
+  levenshtein_distance_unit_test_oneway (a, b, expected);
+  levenshtein_distance_unit_test_oneway (b, a, expected);
+}
+
+/* Callback handler for the PLUGIN_FINISH event; run
+   levenshtein_distance unit tests here.  */
+
+static void
+on_finish (void */*gcc_data*/, void */*user_data*/)
+{
+  levenshtein_distance_unit_test ("", "nonempty", strlen ("nonempty"));
+  levenshtein_distance_unit_test ("saturday", "sunday", 3);
+  levenshtein_distance_unit_test ("foo", "m_foo", 2);
+  levenshtein_distance_unit_test ("hello_world", "HelloWorld", 3);
+  levenshtein_distance_unit_test
+    ("the quick brown fox jumps over the lazy dog", "dog", 40);
+  levenshtein_distance_unit_test
+    ("the quick brown fox jumps over the lazy dog",
+     "the quick brown dog jumps over the lazy fox",
+     4);
+  levenshtein_distance_unit_test
+    ("Lorem ipsum dolor sit amet, consectetur adipiscing elit,",
+     "All your base are belong to us",
+     44);
+}
+
+int
+plugin_init (struct plugin_name_args *plugin_info,
+	     struct plugin_gcc_version */*version*/)
+{
+  register_callback (plugin_info->base_name,
+		     PLUGIN_FINISH,
+		     on_finish,
+		     NULL); /* void *user_data */
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/plugin/plugin.exp b/gcc/testsuite/gcc.dg/plugin/plugin.exp
index 941bccc..ce0a18d 100644
--- a/gcc/testsuite/gcc.dg/plugin/plugin.exp
+++ b/gcc/testsuite/gcc.dg/plugin/plugin.exp
@@ -66,6 +66,7 @@ set plugin_test_list [list \
     { diagnostic_plugin_test_show_locus.c \
 	  diagnostic-test-show-locus-bw.c \
 	  diagnostic-test-show-locus-color.c } \
+    { levenshtein_plugin.c levenshtein-test-1.c } \
 ]
 
 foreach plugin_test $plugin_test_list {
diff --git a/gcc/testsuite/gcc.dg/spellcheck-fields.c b/gcc/testsuite/gcc.dg/spellcheck-fields.c
new file mode 100644
index 0000000..01be550
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/spellcheck-fields.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+
+struct foo
+{
+  int foo;
+  int bar;
+  int baz;
+};
+
+int test (struct foo *ptr)
+{
+  return ptr->m_bar; /* { dg-error "'struct foo' has no member named 'm_bar'; did you mean 'bar'?" } */
+}
+
+int test2 (void)
+{
+  struct foo instance = {0, 0, 0};
+  return instance.m_bar; /* { dg-error "'struct foo' has no member named 'm_bar'; did you mean 'bar'?" } */
+}
+
+struct s {
+    struct j { int aa; } kk;
+    int ab;
+};
+
+void test3 (struct s x)
+{
+  x.ac;  /* { dg-error "'struct s' has no member named 'ac'; did you mean 'ab'?" } */
+}
+
+int test4 (struct foo *ptr)
+{
+  return sizeof (ptr->foa); /* { dg-error "'struct foo' has no member named 'foa'; did you mean 'foo'?" } */
+}
+
+/* Verify that we don't offer nonsensical suggestions.  */
+
+int test5 (struct foo *ptr)
+{
+  return ptr->this_is_unlike_any_of_the_fields;   /* { dg-bogus "did you mean" } */
+  /* { dg-error "has no member named" "" { target *-*-* } 40 } */
+}
+
+union u
+{
+  int color;
+  int shape;
+};
+
+int test6 (union u *ptr)
+{
+  return ptr->colour; /* { dg-error "'union u' has no member named 'colour'; did you mean 'color'?" } */
+}
+
+struct has_anon
+{
+  struct { int color; } s;
+};
+
+int test7 (struct has_anon *ptr)
+{
+  return ptr->s.colour; /* { dg-error "'struct <anonymous>' has no member named 'colour'; did you mean 'color'?" } */
+}
-- 
1.8.5.3



* Re: [PATCH 0/2] Levenshtein-based suggestions (v3)
  2015-11-13  2:08               ` David Malcolm
@ 2015-11-13  6:57                 ` Marek Polacek
  2015-11-13 12:16                   ` David Malcolm
  0 siblings, 1 reply; 133+ messages in thread
From: Marek Polacek @ 2015-11-13  6:57 UTC (permalink / raw)
  To: David Malcolm
  Cc: Jeff Law, Richard Biener, Manuel López-Ibáñez,
	GCC Patches

Probably coming too late, sorry.

On Thu, Nov 12, 2015 at 09:08:36PM -0500, David Malcolm wrote:
> index 4335a87..eb4e1fc 100644
> --- a/gcc/c/c-typeck.c
> +++ b/gcc/c/c-typeck.c
> @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "c-family/c-ubsan.h"
>  #include "cilk.h"
>  #include "gomp-constants.h"
> +#include "spellcheck.h"
>  
>  /* Possible cases of implicit bad conversions.  Used to select
>     diagnostic messages in convert_for_assignment.  */
> @@ -2242,6 +2243,72 @@ lookup_field (tree type, tree component)
>    return tree_cons (NULL_TREE, field, NULL_TREE);
>  }
>  
> +/* Recursively append candidate IDENTIFIER_NODEs to CANDIDATES.  */
> +
> +static void
> +lookup_field_fuzzy_find_candidates (tree type, tree component,
> +				    vec<tree> *candidates)
> +{
> +  tree field;
> +  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))

I'd prefer declaring field in the for loop, so
  for (tree field = TYPE_FIELDS...

> +	  && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
> +	      || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))

This is RECORD_OR_UNION_TYPE_P (TREE_TYPE (field)).

> +	{
> +	  lookup_field_fuzzy_find_candidates (TREE_TYPE (field),
> +					      component,
> +					      candidates);
> +	}

Lose the brackets around a single statement.

> +      if (DECL_NAME (field))
> +	candidates->safe_push (DECL_NAME (field));
> +    }
> +}
> +
> +/* Like "lookup_field", but find the closest matching IDENTIFIER_NODE,
> +   rather than returning a TREE_LIST for an exact match.  */
> +
> +static tree
> +lookup_field_fuzzy (tree type, tree component)
> +{
> +  gcc_assert (TREE_CODE (component) == IDENTIFIER_NODE);
> +
> +  /* First, gather a list of candidates.  */
> +  auto_vec <tree> candidates;
> +
> +  lookup_field_fuzzy_find_candidates (type, component,
> +				      &candidates);
> +
> +  /* Now determine which is closest.  */
> +  int i;
> +  tree identifier;
> +  tree best_identifier = NULL;

NULL_TREE

> +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> +  FOR_EACH_VEC_ELT (candidates, i, identifier)
> +    {
> +      gcc_assert (TREE_CODE (identifier) == IDENTIFIER_NODE);
> +      edit_distance_t dist = levenshtein_distance (component, identifier);
> +      if (dist < best_distance)
> +	{
> +	  best_distance = dist;
> +	  best_identifier = identifier;
> +	}
> +    }
> +
> +  /* If more than half of the letters were misspelled, the suggestion is
> +     likely to be meaningless.  */
> +  if (best_identifier)
> +    {
> +      unsigned int cutoff = MAX (IDENTIFIER_LENGTH (component),
> +				 IDENTIFIER_LENGTH (best_identifier)) / 2;
> +      if (best_distance > cutoff)
> +	return NULL;

NULL_TREE

> +/* The Levenshtein distance is an "edit-distance": the minimal
> +   number of one-character insertions, removals or substitutions
> +   that are needed to change one string into another.
> +
> +   This implementation uses the Wagner-Fischer algorithm.  */
> +
> +static edit_distance_t
> +levenshtein_distance (const char *s, int len_s,
> +		      const char *t, int len_t)
> +{
> +  const bool debug = false;
> +
> +  if (debug)
> +    {
> +      printf ("s: \"%s\" (len_s=%i)\n", s, len_s);
> +      printf ("t: \"%s\" (len_t=%i)\n", t, len_t);
> +    }

Did you leave this debug stuff here intentionally?

> +      /* Build the rest of the row by considering neighbours to
> +	 the north, west and northwest.  */
> +      for (int j = 0; j < len_s; j++)
> +	{
> +	  edit_distance_t cost = (s[j] == t[i] ? 0 : 1);
> +	  edit_distance_t deletion     = v1[j] + 1;
> +	  edit_distance_t insertion    = v0[j + 1] + 1;

The formatting doesn't look right here.

	Marek


* Re: [PATCH 0/2] Levenshtein-based suggestions (v3)
  2015-11-13  6:57                 ` Marek Polacek
@ 2015-11-13 12:16                   ` David Malcolm
  2015-11-13 15:11                     ` Marek Polacek
  0 siblings, 1 reply; 133+ messages in thread
From: David Malcolm @ 2015-11-13 12:16 UTC (permalink / raw)
  To: Marek Polacek
  Cc: Jeff Law, Richard Biener, Manuel López-Ibáñez,
	GCC Patches

[-- Attachment #1: Type: text/plain, Size: 5015 bytes --]

On Fri, 2015-11-13 at 07:57 +0100, Marek Polacek wrote:
> Probably coming too late, sorry.

> On Thu, Nov 12, 2015 at 09:08:36PM -0500, David Malcolm wrote:
> > index 4335a87..eb4e1fc 100644
> > --- a/gcc/c/c-typeck.c
> > +++ b/gcc/c/c-typeck.c
> > @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "c-family/c-ubsan.h"
> >  #include "cilk.h"
> >  #include "gomp-constants.h"
> > +#include "spellcheck.h"
> >  
> >  /* Possible cases of implicit bad conversions.  Used to select
> >     diagnostic messages in convert_for_assignment.  */
> > @@ -2242,6 +2243,72 @@ lookup_field (tree type, tree component)
> >    return tree_cons (NULL_TREE, field, NULL_TREE);
> >  }
> >  
> > +/* Recursively append candidate IDENTIFIER_NODEs to CANDIDATES.  */
> > +
> > +static void
> > +lookup_field_fuzzy_find_candidates (tree type, tree component,
> > +				    vec<tree> *candidates)
> > +{
> > +  tree field;
> > +  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
> 
> I'd prefer declaring field in the for loop, so
>   for (tree field = TYPE_FIELDS...
> 
> > +	  && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
> > +	      || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))
> 
> This is RECORD_OR_UNION_TYPE_P (TREE_TYPE (field)).

I based this code on the code in lookup_field right above it;
I copied-and-pasted that conditional, so presumably it should also be
changed in lookup_field (which has the condition twice)?

FWIW I notice RECORD_OR_UNION_TYPE_P also covers QUAL_UNION_TYPE.

/* Nonzero if TYPE is a record or union type.  */
#define RECORD_OR_UNION_TYPE_P(TYPE)		\
  (TREE_CODE (TYPE) == RECORD_TYPE		\
   || TREE_CODE (TYPE) == UNION_TYPE		\
   || TREE_CODE (TYPE) == QUAL_UNION_TYPE)

FWIW I've made the change in the attached patch (both to the new
function, and to lookup_field).
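
To show the difference in isolation, here is a small sketch that uses a made-up enum in place of GCC's tree codes.  mock_code, open_coded_check and macro_style_check are hypothetical names for illustration only, not GCC APIs.  The two predicates disagree on exactly one input, the QUAL_UNION_TYPE stand-in.

```cpp
// Stand-in for the relevant tree codes; not GCC's real enum.
enum mock_code
{
  MOCK_RECORD_TYPE,
  MOCK_UNION_TYPE,
  MOCK_QUAL_UNION_TYPE,
  MOCK_INTEGER_TYPE
};

// Mirrors the open-coded condition copied from lookup_field.
bool
open_coded_check (mock_code code)
{
  return code == MOCK_RECORD_TYPE || code == MOCK_UNION_TYPE;
}

// Mirrors the shape of RECORD_OR_UNION_TYPE_P, which has a third
// alternative.
bool
macro_style_check (mock_code code)
{
  return (code == MOCK_RECORD_TYPE
	  || code == MOCK_UNION_TYPE
	  || code == MOCK_QUAL_UNION_TYPE);
}
```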

> > +	{
> > +	  lookup_field_fuzzy_find_candidates (TREE_TYPE (field),
> > +					      component,
> > +					      candidates);
> > +	}
> 
> Lose the brackets around a single statement.

Done.

> > +      if (DECL_NAME (field))
> > +	candidates->safe_push (DECL_NAME (field));
> > +    }
> > +}
> > +
> > +/* Like "lookup_field", but find the closest matching IDENTIFIER_NODE,
> > +   rather than returning a TREE_LIST for an exact match.  */
> > +
> > +static tree
> > +lookup_field_fuzzy (tree type, tree component)
> > +{
> > +  gcc_assert (TREE_CODE (component) == IDENTIFIER_NODE);
> > +
> > +  /* First, gather a list of candidates.  */
> > +  auto_vec <tree> candidates;
> > +
> > +  lookup_field_fuzzy_find_candidates (type, component,
> > +				      &candidates);
> > +
> > +  /* Now determine which is closest.  */
> > +  int i;
> > +  tree identifier;
> > +  tree best_identifier = NULL;
> 
> NULL_TREE

Fixed.

> > +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> > +  FOR_EACH_VEC_ELT (candidates, i, identifier)
> > +    {
> > +      gcc_assert (TREE_CODE (identifier) == IDENTIFIER_NODE);
> > +      edit_distance_t dist = levenshtein_distance (component, identifier);
> > +      if (dist < best_distance)
> > +	{
> > +	  best_distance = dist;
> > +	  best_identifier = identifier;
> > +	}
> > +    }
> > +
> > +  /* If more than half of the letters were misspelled, the suggestion is
> > +     likely to be meaningless.  */
> > +  if (best_identifier)
> > +    {
> > +      unsigned int cutoff = MAX (IDENTIFIER_LENGTH (component),
> > +				 IDENTIFIER_LENGTH (best_identifier)) / 2;
> > +      if (best_distance > cutoff)
> > +	return NULL;
> 
> NULL_TREE

Fixed.

> > +/* The Levenshtein distance is an "edit-distance": the minimal
> > +   number of one-character insertions, removals or substitutions
> > +   that are needed to change one string into another.
> > +
> > +   This implementation uses the Wagner-Fischer algorithm.  */
> > +
> > +static edit_distance_t
> > +levenshtein_distance (const char *s, int len_s,
> > +		      const char *t, int len_t)
> > +{
> > +  const bool debug = false;
> > +
> > +  if (debug)
> > +    {
> > +      printf ("s: \"%s\" (len_s=%i)\n", s, len_s);
> > +      printf ("t: \"%s\" (len_t=%i)\n", t, len_t);
> > +    }
> 
> Did you leave this debug stuff here intentionally?

I find it useful, but I believe it's against our policy, so I've deleted
it in the attached patch.

> > +      /* Build the rest of the row by considering neighbours to
> > +	 the north, west and northwest.  */
> > +      for (int j = 0; j < len_s; j++)
> > +	{
> > +	  edit_distance_t cost = (s[j] == t[i] ? 0 : 1);
> > +	  edit_distance_t deletion     = v1[j] + 1;
> > +	  edit_distance_t insertion    = v0[j + 1] + 1;
> 
> The formatting doesn't look right here.

It's correct; it's "diff" inserting two spaces before a tab combined
with our mixed spaces+tab convention: the "for" is at column 6 (6
spaces), whereas the other lines are at column 8 (1 tab), which looks
weird in a diff.
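
The effect can be reproduced with a small sketch that computes the display column of the first visible character on a line, assuming tab stops every 8 columns.  first_text_column is a hypothetical helper written for this mail, and the "> > +" prefix stands for the quoting plus the diff marker.

```cpp
#include <cstddef>
#include <string>

// Return the 0-based display column of the first character at or after
// index SKIP that is neither a space nor a tab.  Tabs advance to the
// next multiple of TAB_WIDTH.
std::size_t
first_text_column (const std::string &line, std::size_t skip = 0,
		   std::size_t tab_width = 8)
{
  std::size_t col = 0;
  for (std::size_t k = 0; k < line.size (); k++)
    {
      char c = line[k];
      if (k >= skip && c != ' ' && c != '\t')
	return col;
      if (c == '\t')
	col = (col / tab_width + 1) * tab_width;
      else
	col++;
    }
  return col;
}
```

In the source file, first_text_column ("      for") is 6 and first_text_column ("\t{") is 8, so the brace sits two columns to the right of the "for".  With the five-character prefix, first_text_column ("> > +      for", 5) is 11 while first_text_column ("> > +\t{", 5) is still 8, because the tab absorbs the prefix; the brace now appears three columns to the left of the "for".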

Patch attached; only tested lightly so far (compiles, and passes
spellcheck subset of tests).

OK for trunk if it passes bootstrap&regrtest?



[-- Attachment #2: 0001-Cleanups-of-spellchecking-code.patch --]
[-- Type: text/x-patch, Size: 4265 bytes --]

From b8ed3cbe9cc000416941e0108036f24f4483cdb0 Mon Sep 17 00:00:00 2001
From: David Malcolm <dmalcolm@redhat.com>
Date: Fri, 13 Nov 2015 07:22:16 -0500
Subject: [PATCH] Cleanups of spellchecking code

gcc/c/ChangeLog:
	* c-typeck.c (lookup_field): Use RECORD_OR_UNION_TYPE_P
	in two places.
	(lookup_field_fuzzy_find_candidates): Use RECORD_OR_UNION_TYPE_P;
	formatting cleanups.
	(lookup_field_fuzzy): Use NULL_TREE rather than NULL.

gcc/ChangeLog:
	* spellcheck.c (levenshtein_distance): Remove debug code.
---
 gcc/c/c-typeck.c | 24 +++++++++---------------
 gcc/spellcheck.c | 24 ------------------------
 2 files changed, 9 insertions(+), 39 deletions(-)

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index eb4e1fc..b084ca5 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -2166,8 +2166,7 @@ lookup_field (tree type, tree component)
 	      while (DECL_NAME (field_array[bot]) == NULL_TREE)
 		{
 		  field = field_array[bot++];
-		  if (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
-		      || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE)
+		  if (RECORD_OR_UNION_TYPE_P (TREE_TYPE (field)))
 		    {
 		      tree anon = lookup_field (TREE_TYPE (field), component);
 
@@ -2213,8 +2212,7 @@ lookup_field (tree type, tree component)
       for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
 	{
 	  if (DECL_NAME (field) == NULL_TREE
-	      && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
-		  || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))
+	      && RECORD_OR_UNION_TYPE_P (TREE_TYPE (field)))
 	    {
 	      tree anon = lookup_field (TREE_TYPE (field), component);
 
@@ -2249,17 +2247,13 @@ static void
 lookup_field_fuzzy_find_candidates (tree type, tree component,
 				    vec<tree> *candidates)
 {
-  tree field;
-  for (field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
+  for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field))
     {
       if (DECL_NAME (field) == NULL_TREE
-	  && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
-	      || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))
-	{
-	  lookup_field_fuzzy_find_candidates (TREE_TYPE (field),
-					      component,
-					      candidates);
-	}
+	  && RECORD_OR_UNION_TYPE_P (TREE_TYPE (field)))
+	lookup_field_fuzzy_find_candidates (TREE_TYPE (field),
+					    component,
+					    candidates);
 
       if (DECL_NAME (field))
 	candidates->safe_push (DECL_NAME (field));
@@ -2283,7 +2277,7 @@ lookup_field_fuzzy (tree type, tree component)
   /* Now determine which is closest.  */
   int i;
   tree identifier;
-  tree best_identifier = NULL;
+  tree best_identifier = NULL_TREE;
   edit_distance_t best_distance = MAX_EDIT_DISTANCE;
   FOR_EACH_VEC_ELT (candidates, i, identifier)
     {
@@ -2303,7 +2297,7 @@ lookup_field_fuzzy (tree type, tree component)
       unsigned int cutoff = MAX (IDENTIFIER_LENGTH (component),
 				 IDENTIFIER_LENGTH (best_identifier)) / 2;
       if (best_distance > cutoff)
-	return NULL;
+	return NULL_TREE;
     }
 
   return best_identifier;
diff --git a/gcc/spellcheck.c b/gcc/spellcheck.c
index 32854cf..6432e385 100644
--- a/gcc/spellcheck.c
+++ b/gcc/spellcheck.c
@@ -34,14 +34,6 @@ edit_distance_t
 levenshtein_distance (const char *s, int len_s,
 		      const char *t, int len_t)
 {
-  const bool debug = false;
-
-  if (debug)
-    {
-      printf ("s: \"%s\" (len_s=%i)\n", s, len_s);
-      printf ("t: \"%s\" (len_t=%i)\n", t, len_t);
-    }
-
   if (len_s == 0)
     return len_t;
   if (len_t == 0)
@@ -67,14 +59,6 @@ levenshtein_distance (const char *s, int len_s,
   /* Build successive rows.  */
   for (int i = 0; i < len_t; i++)
     {
-      if (debug)
-	{
-	  printf ("i:%i v0 = ", i);
-	  for (int j = 0; j < len_s + 1; j++)
-	    printf ("%i ", v0[j]);
-	  printf ("\n");
-	}
-
       /* The initial column is for the case of an empty source string; we
 	 can reach prefixes of the target string of length i
 	 by inserting i characters.  */
@@ -98,14 +82,6 @@ levenshtein_distance (const char *s, int len_s,
 	v0[j] = v1[j];
     }
 
-  if (debug)
-    {
-      printf ("final v1 = ");
-      for (int j = 0; j < len_s + 1; j++)
-	printf ("%i ", v1[j]);
-      printf ("\n");
-    }
-
   edit_distance_t result = v1[len_s];
   delete[] v0;
   delete[] v1;
-- 
1.8.5.3


^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 0/2] Levenshtein-based suggestions (v3)
  2015-11-13 12:16                   ` David Malcolm
@ 2015-11-13 15:11                     ` Marek Polacek
  2015-11-13 15:44                       ` Bernd Schmidt
  0 siblings, 1 reply; 133+ messages in thread
From: Marek Polacek @ 2015-11-13 15:11 UTC (permalink / raw)
  To: David Malcolm
  Cc: Jeff Law, Richard Biener, Manuel López-Ibáñez,
	GCC Patches

On Fri, Nov 13, 2015 at 07:16:08AM -0500, David Malcolm wrote:
> > > +	  && (TREE_CODE (TREE_TYPE (field)) == RECORD_TYPE
> > > +	      || TREE_CODE (TREE_TYPE (field)) == UNION_TYPE))
> > 
> > This is RECORD_OR_UNION_TYPE_P (TREE_TYPE (field)).
> 
> I based this code on the code in lookup_field right above it;
> I copied-and-pasted that conditional, so presumably it should also be
> changed in lookup_field (which has the condition twice)?
> 
> FWIW I notice RECORD_OR_UNION_TYPE_P also covers QUAL_UNION_TYPE.
> 
> /* Nonzero if TYPE is a record or union type.  */
> #define RECORD_OR_UNION_TYPE_P(TYPE)		\
>   (TREE_CODE (TYPE) == RECORD_TYPE		\
>    || TREE_CODE (TYPE) == UNION_TYPE		\
>    || TREE_CODE (TYPE) == QUAL_UNION_TYPE)
> 
> FWIW I've made the change in the attached patch (both to the new
> function, and to lookup_field).

Sorry, I changed my mind.  Since QUAL_UNION_TYPE is an Ada-only thing and
we check (RECORD_TYPE || UNION_TYPE) in a lot of places in the C FE,
introducing RECORD_OR_UNION_TYPE_P everywhere would unnecessarily slow
things down.  I think we should have a C FE-only macro, maybe called
RECORD_OR_UNION_TYPE_P that only checks for those two types, but this is
something that I can deal with later on.

So I think please just drop these changes for now.  Sorry again.

> > > +  const bool debug = false;
> > > +
> > > +  if (debug)
> > > +    {
> > > +      printf ("s: \"%s\" (len_s=%i)\n", s, len_s);
> > > +      printf ("t: \"%s\" (len_t=%i)\n", t, len_t);
> > > +    }
> > 
> > Did you leave this debug stuff here intentionally?
> 
> I find it useful, but I believe it's against our policy, so I've deleted
> it in the attached patch.

Probably.  But you could surely have a separate DEBUG_FUNCTION that can be
called from gdb.
 
> > > +      /* Build the rest of the row by considering neighbours to
> > > +	 the north, west and northwest.  */
> > > +      for (int j = 0; j < len_s; j++)
> > > +	{
> > > +	  edit_distance_t cost = (s[j] == t[i] ? 0 : 1);
> > > +	  edit_distance_t deletion     = v1[j] + 1;
> > > +	  edit_distance_t insertion    = v0[j + 1] + 1;
> > 
> > The formatting doesn't look right here.
> 
> It's correct; it's "diff" inserting two spaces before a tab combined
> with our mixed spaces+tab convention: the "for" is at column 6 (6
> spaces), whereas the other lines are at column 8 (1 tab), which looks
> weird in a diff.

Sorry, what I had in mind were the spaces after "deletion" and "insertion"
before "=".  Not a big deal, of course.
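
For anyone following the quoted loop, a minimal standalone sketch of the
two-row computation may help.  It is plain C, not the code in
gcc/spellcheck.c: the names levenshtein_sketch and suggestion_is_plausible,
the malloc-based rows and the -1 error return are assumptions made for this
illustration.  The cutoff helper mirrors the "best_distance > cutoff" test
in the lookup_field_fuzzy hunk earlier in the thread.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int edit_distance_t;

/* Two-row Levenshtein distance between S (length LEN_S) and T (length
   LEN_T).  Returns (edit_distance_t) -1 if allocation fails; that error
   convention is an assumption of this sketch.  */
edit_distance_t
levenshtein_sketch (const char *s, int len_s, const char *t, int len_t)
{
  assert (len_s >= 0 && len_t >= 0);
  if (len_s == 0)
    return len_t;
  if (len_t == 0)
    return len_s;

  /* v0 is the previous row, v1 the row being built.  Each row has one
     entry per prefix of S, including the empty prefix.  */
  edit_distance_t *v0 = malloc ((len_s + 1) * sizeof *v0);
  edit_distance_t *v1 = malloc ((len_s + 1) * sizeof *v1);
  if (!v0 || !v1)
    {
      free (v0);
      free (v1);
      return (edit_distance_t) -1;
    }

  /* Row for the empty target prefix: delete j characters from S.  */
  for (int j = 0; j <= len_s; j++)
    v0[j] = j;

  /* Build successive rows, one per character of T.  */
  for (int i = 0; i < len_t; i++)
    {
      /* Empty source prefix: insert i + 1 characters.  */
      v1[0] = i + 1;

      /* Consider neighbours to the west, north and northwest.  */
      for (int j = 0; j < len_s; j++)
        {
          edit_distance_t cost = (s[j] == t[i] ? 0 : 1);
          edit_distance_t deletion = v1[j] + 1;
          edit_distance_t insertion = v0[j + 1] + 1;
          edit_distance_t substitution = v0[j] + cost;
          edit_distance_t best = deletion < insertion ? deletion : insertion;
          v1[j + 1] = best < substitution ? best : substitution;
        }

      /* The finished row becomes the previous row.  */
      memcpy (v0, v1, (len_s + 1) * sizeof *v0);
    }

  edit_distance_t result = v1[len_s];
  free (v0);
  free (v1);
  return result;
}

/* Reject a candidate whose distance exceeds half the longer length,
   following the cutoff in the quoted lookup_field_fuzzy hunk.  */
int
suggestion_is_plausible (const char *typed, const char *candidate)
{
  int len_typed = (int) strlen (typed);
  int len_cand = (int) strlen (candidate);
  unsigned int cutoff
    = (unsigned int) (len_typed > len_cand ? len_typed : len_cand) / 2;
  return levenshtein_sketch (typed, len_typed, candidate, len_cand) <= cutoff;
}
```

With this sketch, "colour" against "color" is one edit away and would be
offered as a suggestion, whereas "foo" against "barbaz" shares no characters
and falls outside the cutoff.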
 
> Patch attached; only tested lightly so far (compiles, and passes
> spellcheck subset of tests).
> 
> OK for trunk if it passes bootstrap&regrtest?

Ok modulo the RECORD_OR_UNION_TYPE_P changes, thanks.

	Marek

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 0/2] Levenshtein-based suggestions (v3)
  2015-11-13 15:11                     ` Marek Polacek
@ 2015-11-13 15:44                       ` Bernd Schmidt
  2015-11-13 15:53                         ` Marek Polacek
  0 siblings, 1 reply; 133+ messages in thread
From: Bernd Schmidt @ 2015-11-13 15:44 UTC (permalink / raw)
  To: Marek Polacek, David Malcolm
  Cc: Jeff Law, Richard Biener, Manuel López-Ibáñez,
	GCC Patches

On 11/13/2015 04:11 PM, Marek Polacek wrote:
> Sorry, I changed my mind.  Since QUAL_UNION_TYPE is Ada-only thing and
> we check (RECORD_TYPE || UNION_TYPE) in a lot of places in the C FE,
> introducing RECORD_OR_UNION_TYPE_P everywhere would unnecessarily slow
> things down.

I don't think so, the three codes are adjacent so we should be 
generating "(unsigned)(code - RECORD_TYPE) < 3".


Bernd

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 0/2] Levenshtein-based suggestions (v3)
  2015-11-13 15:44                       ` Bernd Schmidt
@ 2015-11-13 15:53                         ` Marek Polacek
  2015-11-13 15:56                           ` Jakub Jelinek
  0 siblings, 1 reply; 133+ messages in thread
From: Marek Polacek @ 2015-11-13 15:53 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: David Malcolm, Jeff Law, Richard Biener,
	Manuel López-Ibáñez, GCC Patches

On Fri, Nov 13, 2015 at 04:44:21PM +0100, Bernd Schmidt wrote:
> On 11/13/2015 04:11 PM, Marek Polacek wrote:
> >Sorry, I changed my mind.  Since QUAL_UNION_TYPE is Ada-only thing and
> >we check (RECORD_TYPE || UNION_TYPE) in a lot of places in the C FE,
> >introducing RECORD_OR_UNION_TYPE_P everywhere would unnecessarily slow
> >things down.
> 
> I don't think so, the three codes are adjacent so we should be generating
> "(unsigned)(code - RECORD_TYPE) < 3".

Interesting.  Yeah, if we change the RECORD_OR_UNION_TYPE_P macro to this
form, then we don't need a separate version for the C FE.

I'll look at this cleanup in the next week.

	Marek

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 0/2] Levenshtein-based suggestions (v3)
  2015-11-13 15:53                         ` Marek Polacek
@ 2015-11-13 15:56                           ` Jakub Jelinek
  2015-11-13 16:02                             ` Marek Polacek
  0 siblings, 1 reply; 133+ messages in thread
From: Jakub Jelinek @ 2015-11-13 15:56 UTC (permalink / raw)
  To: Marek Polacek
  Cc: Bernd Schmidt, David Malcolm, Jeff Law, Richard Biener,
	Manuel López-Ibáñez, GCC Patches

On Fri, Nov 13, 2015 at 04:53:05PM +0100, Marek Polacek wrote:
> On Fri, Nov 13, 2015 at 04:44:21PM +0100, Bernd Schmidt wrote:
> > On 11/13/2015 04:11 PM, Marek Polacek wrote:
> > >Sorry, I changed my mind.  Since QUAL_UNION_TYPE is Ada-only thing and
> > >we check (RECORD_TYPE || UNION_TYPE) in a lot of places in the C FE,
> > >introducing RECORD_OR_UNION_TYPE_P everywhere would unnecessarily slow
> > >things down.
> > 
> > I don't think so, the three codes are adjacent so we should be generating
> > "(unsigned)(code - RECORD_TYPE) < 3".
> 
> Interesting.  Yeah, if we change the RECORD_OR_UNION_TYPE_P macro to this
> form, then we don't need a separate version for the C FE.

Why?  The compiler should do that already, or do you care about
-O0 builds or host compilers other than gcc that aren't able to do this?
The disadvantage of writing it manually that way is that you need to assert
somewhere that the 3 values indeed are consecutive, while
when the (host?) compiler performs this optimization, it does that only if
they are consecutive, if they are not, the code will be just less efficient.
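
To make the trade-off concrete, here is a small sketch of both forms.  The
enum and every function name below are hypothetical stand-ins chosen for
illustration; they are not GCC's tree codes, and the only property borrowed
from the discussion is that the three record/union codes are adjacent.

```c
#include <assert.h>

/* Hypothetical stand-in for a tree-code enum; names and values are for
   illustration only.  */
enum demo_code
{
  DEMO_ENUMERAL_TYPE,
  DEMO_RECORD_TYPE,
  DEMO_UNION_TYPE,
  DEMO_QUAL_UNION_TYPE,
  DEMO_FUNCTION_TYPE
};

/* Form 1: the three-way comparison, as in the quoted macro.  An
   optimizing compiler may turn this into a range check by itself.  */
int
is_record_or_union_compare (enum demo_code code)
{
  return (code == DEMO_RECORD_TYPE
          || code == DEMO_UNION_TYPE
          || code == DEMO_QUAL_UNION_TYPE);
}

/* The hand-written range check is only valid while the three codes stay
   consecutive, so that assumption has to be asserted somewhere.  */
_Static_assert (DEMO_UNION_TYPE == DEMO_RECORD_TYPE + 1
                && DEMO_QUAL_UNION_TYPE == DEMO_RECORD_TYPE + 2,
                "record/union codes must be consecutive");

/* Form 2: a single unsigned comparison.  Codes below DEMO_RECORD_TYPE
   wrap around to large unsigned values and so fail the test.  */
int
is_record_or_union_range (enum demo_code code)
{
  return (unsigned) (code - DEMO_RECORD_TYPE) < 3;
}

/* Check that both forms agree on every code in the demo enum.  */
void
check_forms_agree (void)
{
  for (int c = DEMO_ENUMERAL_TYPE; c <= DEMO_FUNCTION_TYPE; c++)
    assert (is_record_or_union_compare ((enum demo_code) c)
            == is_record_or_union_range ((enum demo_code) c));
}
```

Form 1 stays correct if someone reorders the enum and merely loses the
optimization; form 2 needs the static assertion to stay correct at all.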

	Jakub

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: [PATCH 0/2] Levenshtein-based suggestions (v3)
  2015-11-13 15:56                           ` Jakub Jelinek
@ 2015-11-13 16:02                             ` Marek Polacek
  0 siblings, 0 replies; 133+ messages in thread
From: Marek Polacek @ 2015-11-13 16:02 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Bernd Schmidt, David Malcolm, Jeff Law, Richard Biener,
	Manuel López-Ibáñez, GCC Patches

On Fri, Nov 13, 2015 at 04:56:30PM +0100, Jakub Jelinek wrote:
> On Fri, Nov 13, 2015 at 04:53:05PM +0100, Marek Polacek wrote:
> > On Fri, Nov 13, 2015 at 04:44:21PM +0100, Bernd Schmidt wrote:
> > > I don't think so, the three codes are adjacent so we should be generating
> > > "(unsigned)(code - RECORD_TYPE) < 3".
> > 
> > Interesting.  Yeah, if we change the RECORD_OR_UNION_TYPE_P macro to this
> > form, then we don't need a separate version for the C FE.
> 
> Why?  The compiler should do that already, or do you care about
> -O0 builds or host compilers other than gcc that aren't able to do this?

I don't.

> The disadvantage of writing it manually that way is that you need to assert
> somewhere that the 3 values indeed are consecutive, while
> when the (host?) compiler performs this optimization, it does that only if
> they are consecutive, if they are not, the code will be just less efficient.

Ok, I understand now what Bernd meant.  I didn't realize the compiler already
does such optimization with those _TYPEs...

	Marek

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2015-09-15 21:15               ` Bernhard Reutner-Fischer
@ 2017-05-13 10:38                 ` Bernhard Reutner-Fischer
  2017-05-13 11:06                   ` Jakub Jelinek
  0 siblings, 1 reply; 133+ messages in thread
From: Bernhard Reutner-Fischer @ 2017-05-13 10:38 UTC (permalink / raw)
  To: Jeff Law, David Malcolm, Mike Stump; +Cc: gcc-patches List, GCC Development

On Tue, Sep 15, 2015 at 10:50:12PM +0200, Bernhard Reutner-Fischer wrote:
> On September 15, 2015 10:05:27 PM GMT+02:00, Jeff Law <law@redhat.com> wrote:
> >On 09/15/2015 01:21 PM, David Malcolm wrote:
> >> On Tue, 2015-09-15 at 10:39 -0700, Mike Stump wrote:
> >>> On Sep 14, 2015, at 3:37 PM, Jeff Law <law@redhat.com> wrote:
> >>>>> Maybe GCC-6 can bump the required
> >>>>> dejagnu version to allow for getting rid of all these superfluous
> >>>>> load_gcc_lib? *blink* :)
> >>>> I'd support that as a direction.
> >>>>
> >>>> Certainly dropping the 2001 version from our website in favor of
> >1.5
> >>> (which is what I'm using anyway) would be a step forward.
> >>>
> >>> So, even ubuntu LTS is 1.5 now.  No harm in upgrading the website to
> >>> 1.5.  I don't know of any reason to not update and just require 1.5
> >at
> >>> this point.  I'm not a fan of feature chasing dejagnu, but an update
> >>> every 2-4 years isn't unreasonable.
> >>
> >> FWIW, I believe RHEL 6 is at dejagnu-1.4.4   I don't know whether or
> >not
> >> that's an issue here.
> >I'd consider it a non-issue.  Folks that want to do GCC development on 
> >RHEL 6 are probably few and far between and can probably update dejagnu
> >
> >if need be ;-)
> >
> >If ubuntu, fedora, debian current releases were stuck at 1.4, then it'd
> >
> >be a bigger issue.
> 
> Debian sid has 1.5.3 fwiw, so I assume Debian 9 will have that too. Not sure if we can get it into Debian 8, I'm not intimately familiar with the policy. If OTOH GCC-6 requires it then that's probably a strong argument to let it bubble down to Debian 8 if need be.

So Debian 9 will have dejagnu-1.6.

(Ubuntu 16.10 allegedly has 1.6 too)

I guess neither redhat
(https://access.redhat.com/downloads/content/dejagnu/ redirects to a
login page but there seem to be 1.5.1 packages) nor SuSE did update dejagnu in the meantime.

Someone should poke gentoo to bump their dejagnu-1.5 to current -1.6

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2017-05-13 10:38                 ` Bernhard Reutner-Fischer
@ 2017-05-13 11:06                   ` Jakub Jelinek
  2017-05-13 21:12                     ` Jeff Law
  2017-05-16  9:56                     ` Jonathan Wakely
  0 siblings, 2 replies; 133+ messages in thread
From: Jakub Jelinek @ 2017-05-13 11:06 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer
  Cc: Jeff Law, David Malcolm, Mike Stump, gcc-patches List, GCC Development

On Sat, May 13, 2017 at 12:24:12PM +0200, Bernhard Reutner-Fischer wrote:
> I guess neither redhat
> (https://access.redhat.com/downloads/content/dejagnu/ redirects to a
> login page but there seem to be 1.5.1 packages) nor SuSE did update dejagnu in the meantime.

Fedora has dejagnu-1.6 in Fedora 25 and later, dejagnu-1.5.3 in Fedora 24, older
Fedora versions are EOL.  RHEL 7 has dejagnu-1.5.1, RHEL 6 as well as RHEL 5 has
dejagnu-1.4.4, older RHEL versions are EOL.

	Jakub

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2017-05-13 11:06                   ` Jakub Jelinek
@ 2017-05-13 21:12                     ` Jeff Law
  2017-05-14 23:10                       ` NightStrike
  2017-05-16  9:56                     ` Jonathan Wakely
  1 sibling, 1 reply; 133+ messages in thread
From: Jeff Law @ 2017-05-13 21:12 UTC (permalink / raw)
  To: Jakub Jelinek, Bernhard Reutner-Fischer
  Cc: David Malcolm, Mike Stump, gcc-patches List, GCC Development

On 05/13/2017 04:38 AM, Jakub Jelinek wrote:
> On Sat, May 13, 2017 at 12:24:12PM +0200, Bernhard Reutner-Fischer wrote:
>> I guess neither redhat
>> (https://access.redhat.com/downloads/content/dejagnu/ redirects to a
>> login page but there seem to be 1.5.1 packages) nor SuSE did update dejagnu in the meantime.
> 
> Fedora has dejagnu-1.6 in Fedora 25 and later, dejagnu-1.5.3 in Fedora 24, older
> Fedora versions are EOL.  RHEL 7 has dejagnu-1.5.1, RHEL 6 as well as RHEL 5 has
> dejagnu-1.4.4, older RHEL versions are EOL.
RHEL-5 is old enough that IMHO it ought not figure into this discussion. 
  RHEL-6 is probably close to if not past that same point as well.

Jeff

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2017-05-13 21:12                     ` Jeff Law
@ 2017-05-14 23:10                       ` NightStrike
  2017-05-15  8:14                         ` Richard Biener
  0 siblings, 1 reply; 133+ messages in thread
From: NightStrike @ 2017-05-14 23:10 UTC (permalink / raw)
  To: Jeff Law
  Cc: Jakub Jelinek, Bernhard Reutner-Fischer, David Malcolm,
	Mike Stump, gcc-patches List, GCC Development

On Sat, May 13, 2017 at 4:39 PM, Jeff Law <law@redhat.com> wrote:
> On 05/13/2017 04:38 AM, Jakub Jelinek wrote:
>>
>> On Sat, May 13, 2017 at 12:24:12PM +0200, Bernhard Reutner-Fischer wrote:
>>>
>>> I guess neither redhat
>>> (https://access.redhat.com/downloads/content/dejagnu/ redirects to a
>>> login page but there seem to be 1.5.1 packages) nor SuSE did update
>>> dejagnu in the meantime.
>>
>>
>> Fedora has dejagnu-1.6 in Fedora 25 and later, dejagnu-1.5.3 in Fedora 24,
>> older
>> Fedora versions are EOL.  RHEL 7 has dejagnu-1.5.1, RHEL 6 as well as RHEL
>> 5 has
>> dejagnu-1.4.4, older RHEL versions are EOL.
>
> RHEL-5 is old enough that IMHO it ought not figure into this discussion.
> RHEL-6 is probably close to if not past that same point as well.

FWIW, I still run the testsuite on RHEL 6.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2017-05-14 23:10                       ` NightStrike
@ 2017-05-15  8:14                         ` Richard Biener
  2017-05-15 19:24                           ` Mike Stump
  0 siblings, 1 reply; 133+ messages in thread
From: Richard Biener @ 2017-05-15  8:14 UTC (permalink / raw)
  To: NightStrike
  Cc: Jeff Law, Jakub Jelinek, Bernhard Reutner-Fischer, David Malcolm,
	Mike Stump, gcc-patches List, GCC Development

On Mon, May 15, 2017 at 12:09 AM, NightStrike <nightstrike@gmail.com> wrote:
> On Sat, May 13, 2017 at 4:39 PM, Jeff Law <law@redhat.com> wrote:
>> On 05/13/2017 04:38 AM, Jakub Jelinek wrote:
>>>
>>> On Sat, May 13, 2017 at 12:24:12PM +0200, Bernhard Reutner-Fischer wrote:
>>>>
>>>> I guess neither redhat
>>>> (https://access.redhat.com/downloads/content/dejagnu/ redirects to a
>>>> login page but there seem to be 1.5.1 packages) nor SuSE did update
>>>> dejagnu in the meantime.
>>>
>>>
>>> Fedora has dejagnu-1.6 in Fedora 25 and later, dejagnu-1.5.3 in Fedora 24,
>>> older
>>> Fedora versions are EOL.  RHEL 7 has dejagnu-1.5.1, RHEL 6 as well as RHEL
>>> 5 has
>>> dejagnu-1.4.4, older RHEL versions are EOL.
>>
>> RHEL-5 is old enough that IMHO it ought not figure into this discussion.
>> RHEL-6 is probably close to if not past that same point as well.
>
> FWIW, I still run the testsuite on RHEL 6.

Both SLE-11 and SLE-12 use dejagnu 1.4.4, so does openSUSE Leap 42.[12].
Tumbleweed uses 1.6 so new SLE will inherit that.  But I still do all
of my testing on systems with just dejagnu 1.4.4.

Richard.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2017-05-15  8:14                         ` Richard Biener
@ 2017-05-15 19:24                           ` Mike Stump
  2017-05-15 20:52                             ` Andreas Schwab
  0 siblings, 1 reply; 133+ messages in thread
From: Mike Stump @ 2017-05-15 19:24 UTC (permalink / raw)
  To: Richard Biener
  Cc: NightStrike, Jeff Law, Jakub Jelinek, Bernhard Reutner-Fischer,
	David Malcolm, gcc-patches List, GCC Development

On May 15, 2017, at 1:06 AM, Richard Biener <richard.guenther@gmail.com> wrote:
> 
> Both SLE-11 and SLE-12 use dejagnu 1.4.4, so does openSUSE Leap 42.[12].
> Tumbleweed uses 1.6 so new SLE will inherit that.  But I still do all
> of my testing on systems with just dejagnu 1.4.4.

So dejagnu is independent of most things and downloads and installs in seconds, upgrading it shouldn't pose a problem for anyone that can build gcc.

That said, a little surprising that SLE is lagging everyone else so hard.  Looking at the 42.2 EOL plans, that would put switching dejagnu versions at around 13 months from now, if we waited.

So, how much would you mind, for trunk to require a newer dejagnu?  If just a little, I'm inclined to not wait and support updating now.  If please god no, then I don't see the harm in waiting 13 months.  Leap 42.3 is out in 3 months, so the sooner update time would be just 3 months.  Could you jump to Leap 42.3 at that time?

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2017-05-15 19:24                           ` Mike Stump
@ 2017-05-15 20:52                             ` Andreas Schwab
  0 siblings, 0 replies; 133+ messages in thread
From: Andreas Schwab @ 2017-05-15 20:52 UTC (permalink / raw)
  To: Mike Stump
  Cc: Richard Biener, NightStrike, Jeff Law, Jakub Jelinek,
	Bernhard Reutner-Fischer, David Malcolm, gcc-patches List,
	GCC Development

On Mai 15 2017, Mike Stump <mikestump@comcast.net> wrote:

> That said, a little surprising that SLE is lagging everyone else so
> hard.

DejaGnu doesn't exactly have frequent releases.  Missing just one
release can easily put you more than 5 years behind.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2017-05-13 11:06                   ` Jakub Jelinek
  2017-05-13 21:12                     ` Jeff Law
@ 2017-05-16  9:56                     ` Jonathan Wakely
  2017-05-16 12:16                       ` Bernhard Reutner-Fischer
  1 sibling, 1 reply; 133+ messages in thread
From: Jonathan Wakely @ 2017-05-16  9:56 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Bernhard Reutner-Fischer, Jeff Law, David Malcolm, Mike Stump,
	gcc-patches List, GCC Development

On 13 May 2017 at 11:38, Jakub Jelinek wrote:
> On Sat, May 13, 2017 at 12:24:12PM +0200, Bernhard Reutner-Fischer wrote:
>> I guess neither redhat
>> (https://access.redhat.com/downloads/content/dejagnu/ redirects to a
>> login page but there seem to be 1.5.1 packages) nor SuSE did update dejagnu in the meantime.
>
> Fedora has dejagnu-1.6 in Fedora 25 and later, dejagnu-1.5.3 in Fedora 24, older
> Fedora versions are EOL.  RHEL 7 has dejagnu-1.5.1, RHEL 6 as well as RHEL 5 has
> dejagnu-1.4.4, older RHEL versions are EOL.

FWIW 1.5.3 has a fix which I rely on for libstdc++ testing, but since
newer versions are trivial to install by hand I'll be OK if we only
bump the minimum to 1.5.0

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2017-05-16  9:56                     ` Jonathan Wakely
@ 2017-05-16 12:16                       ` Bernhard Reutner-Fischer
  2017-05-16 12:35                         ` Jonathan Wakely
  0 siblings, 1 reply; 133+ messages in thread
From: Bernhard Reutner-Fischer @ 2017-05-16 12:16 UTC (permalink / raw)
  To: Jonathan Wakely, Jakub Jelinek
  Cc: Jeff Law, David Malcolm, Mike Stump, gcc-patches List, GCC Development

On 16 May 2017 11:54:18 CEST, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>On 13 May 2017 at 11:38, Jakub Jelinek wrote:
>> On Sat, May 13, 2017 at 12:24:12PM +0200, Bernhard Reutner-Fischer
>wrote:
>>> I guess neither redhat
>>> (https://access.redhat.com/downloads/content/dejagnu/ redirects to a
>>> login page but there seem to be 1.5.1 packages) nor SuSE did update
>dejagnu in the meantime.
>>
>> Fedora has dejagnu-1.6 in Fedora 25 and later, dejagnu-1.5.3 in
>Fedora 24, older
>> Fedora versions are EOL.  RHEL 7 has dejagnu-1.5.1, RHEL 6 as well as
>RHEL 5 has
>> dejagnu-1.4.4, older RHEL versions are EOL.
>
>FWIW 1.5.3 has a fix which I rely on for libstdc++ testing, but since
>newer versions are trivial to install by hand I'll be OK if we only
>bump the minimum to 1.5.0

1.5.0 wouldn't buy us anything as the "libdirs" handling is only in 1.5.2 and later.

thanks

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2017-05-16 12:16                       ` Bernhard Reutner-Fischer
@ 2017-05-16 12:35                         ` Jonathan Wakely
  2017-05-16 12:55                           ` Bernhard Reutner-Fischer
  2017-05-16 19:09                           ` Mike Stump
  0 siblings, 2 replies; 133+ messages in thread
From: Jonathan Wakely @ 2017-05-16 12:35 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer
  Cc: Jakub Jelinek, Jeff Law, David Malcolm, Mike Stump,
	gcc-patches List, GCC Development

On 16 May 2017 at 13:13, Bernhard Reutner-Fischer wrote:
> 1.5.0 wouldn't buy us anything as the "libdirs" handling is only in 1.5.2 and later.

Ah I missed that in the earlier discussion.

The change I care about in 1.5.3 is
http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=5256bd82343000c76bc0e48139003f90b6184347

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2017-05-16 12:35                         ` Jonathan Wakely
@ 2017-05-16 12:55                           ` Bernhard Reutner-Fischer
  2017-05-16 18:41                             ` Matthias Klose
  2017-05-16 19:09                           ` Mike Stump
  1 sibling, 1 reply; 133+ messages in thread
From: Bernhard Reutner-Fischer @ 2017-05-16 12:55 UTC (permalink / raw)
  To: Jonathan Wakely
  Cc: Jakub Jelinek, Jeff Law, David Malcolm, Mike Stump,
	gcc-patches List, GCC Development

On 16 May 2017 at 14:16, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
> On 16 May 2017 at 13:13, Bernhard Reutner-Fischer wrote:
>> 1.5.0 wouldn't buy us anything as the "libdirs" handling is only in 1.5.2 and later.
>
> Ah I missed that in the earlier discussion.
>
> The change I care about in 1.5.3 is
> http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=5256bd82343000c76bc0e48139003f90b6184347

the libdirs handling is
http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=5481f29161477520c691d525653323b82fa47ad7
and applies cleanly to everything 1.5.x-ish. Didn't try if it applies to 1.4.4.

thanks,

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2017-05-16 12:55                           ` Bernhard Reutner-Fischer
@ 2017-05-16 18:41                             ` Matthias Klose
  0 siblings, 0 replies; 133+ messages in thread
From: Matthias Klose @ 2017-05-16 18:41 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer, Jonathan Wakely
  Cc: Jakub Jelinek, Jeff Law, David Malcolm, Mike Stump,
	gcc-patches List, GCC Development

On 16.05.2017 05:35, Bernhard Reutner-Fischer wrote:
> On 16 May 2017 at 14:16, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>> On 16 May 2017 at 13:13, Bernhard Reutner-Fischer wrote:
>>> 1.5.0 wouldn't buy us anything as the "libdirs" handling is only in 1.5.2 and later.
>>
>> Ah I missed that in the earlier discussion.
>>
>> The change I care about in 1.5.3 is
>> http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=5256bd82343000c76bc0e48139003f90b6184347
> 
> the libdirs handling is
> http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=5481f29161477520c691d525653323b82fa47ad7
> and applies cleanly to everything 1.5.x-ish. Didn't try if it applies to 1.4.4.

this patch is part of dejagnu in Ubuntu 14.04 LTS.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2017-05-16 12:35                         ` Jonathan Wakely
  2017-05-16 12:55                           ` Bernhard Reutner-Fischer
@ 2017-05-16 19:09                           ` Mike Stump
  2018-08-04 16:32                             ` Bernhard Reutner-Fischer
  1 sibling, 1 reply; 133+ messages in thread
From: Mike Stump @ 2017-05-16 19:09 UTC (permalink / raw)
  To: Jonathan Wakely
  Cc: Bernhard Reutner-Fischer, Jakub Jelinek, Jeff Law, David Malcolm,
	gcc-patches List, GCC Development

On May 16, 2017, at 5:16 AM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
> The change I care about in 1.5.3

So, we haven't talked much about the version people want most.  If we update, might as well get something that more people care about.  1.5.3 is in ubuntu LTS 16.04 and Fedora 24, so it's been around awhile.  SUSE is said to be using 1.6, in the post 1.4.4 systems.  People stated they want 1.5.2 and 1.5.3, so, I'm inclined to say, let's shoot for 1.5.3 when we do update.

As for the machines in the FSF compile farm, nah, tail wagging the dog.  I'd rather just update the requirement, and the owners or users of those machines can install a new dejagnu, if they are using one that is too old and they want to support testing gcc.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2017-05-16 19:09                           ` Mike Stump
@ 2018-08-04 16:32                             ` Bernhard Reutner-Fischer
  2018-08-06 14:33                               ` Jonathan Wakely
                                                 ` (2 more replies)
  0 siblings, 3 replies; 133+ messages in thread
From: Bernhard Reutner-Fischer @ 2018-08-04 16:32 UTC (permalink / raw)
  To: Mike Stump
  Cc: Jonathan Wakely, Jakub Jelinek, Jeff Law, David Malcolm,
	GCC Patches, GCC Development

On Tue, 16 May 2017 at 21:08, Mike Stump <mikestump@comcast.net> wrote:
>
> On May 16, 2017, at 5:16 AM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
> > The change I care about in 1.5.3
>
> So, we haven't talked much about the version people want most.  If we update, might as well get something that more people care about.  1.5.3 is in ubuntu LTS 16.04 and Fedora 24, so it's been around awhile.  SUSE is said to be using 1.6, in the post 1.4.4 systems.  People stated they want 1.5.2 and 1.5.3, so, I'm inclined to say, let's shoot for 1.5.3 when we do update.
>
> As for the machines in the FSF compile farm, nah, tail wagging the dog.  I'd rather just update the requirement, and the owners or users of those machines can install a new dejagnu, if they are using one that is too old and they want to support testing gcc.

So.. let me ping that, again, now that another year has passed :)

PS: Recap: https://gcc.gnu.org/ml/fortran/2012-03/msg00094.html was
later applied as
http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=5481f29161477520c691d525653323b82fa47ad7
and was part of the dejagnu-1.5.2 release from 2015. Jonathan requires
1.5.3 for libstdc++ testing.
The libdirs fix would allow us to remove the 150 occurrences of the
load_gcc_lib hack, refer to the patch to the fortran list back then.
AFAIR this is still not fixed: +# BUG: gcc-dg calls
gcc-set-multilib-library-path but does not load gcc-defs!

debian-stable (i think 9 ATM), Ubuntu LTS ship versions recent enough
to contain both fixes. Commercial distros seem to ship fixed versions,
too.

thanks,

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2018-08-04 16:32                             ` Bernhard Reutner-Fischer
@ 2018-08-06 14:33                               ` Jonathan Wakely
  2018-08-06 15:26                               ` Mike Stump
  2021-10-27 23:00                               ` Bernhard Reutner-Fischer
  2 siblings, 0 replies; 133+ messages in thread
From: Jonathan Wakely @ 2018-08-06 14:33 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer
  Cc: Mike Stump, Jakub Jelinek, Jeff Law, David Malcolm, gcc-patches, gcc

On Sat, 4 Aug 2018 at 17:32, Bernhard Reutner-Fischer wrote:
> debian-stable (i think 9 ATM), Ubuntu LTS ship versions recent enough
> to contain both fixes. Commercial distros seem to ship fixed versions,
> too.

The CentOS 7.4.1708 version on gcc112 doesn't seem to be fixed.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2018-08-04 16:32                             ` Bernhard Reutner-Fischer
  2018-08-06 14:33                               ` Jonathan Wakely
@ 2018-08-06 15:26                               ` Mike Stump
  2018-08-07 16:34                                 ` Segher Boessenkool
  2021-10-27 23:00                               ` Bernhard Reutner-Fischer
  2 siblings, 1 reply; 133+ messages in thread
From: Mike Stump @ 2018-08-06 15:26 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer
  Cc: Jonathan Wakely, Jakub Jelinek, Jeff Law, David Malcolm,
	GCC Patches, GCC Development

On Aug 4, 2018, at 9:32 AM, Bernhard Reutner-Fischer <rep.dot.nop@gmail.com> wrote:
> On Tue, 16 May 2017 at 21:08, Mike Stump <mikestump@comcast.net> wrote:
>> 
>> On May 16, 2017, at 5:16 AM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>>> The change I care about in 1.5.3
>> 
>> So, we haven't talked much about the version people want most.  If we update, might as well get something that more people care about.  1.5.3 is in ubuntu LTS 16.04 and Fedora 24, so it's been around awhile.  SUSE is said to be using 1.6, in the post 1.4.4 systems.  People stated they want 1.5.2 and 1.5.3, so, I'm inclined to say, let's shoot for 1.5.3 when we do update.
>> 
>> As for the machines in the FSF compile farm, nah, tail wagging the dog.  I'd rather just update the requirement, and the owners or users of those machines can install a new dejagnu, if they are using one that is too old and they want to support testing gcc.
> 
> So.. let me ping that, again, now that another year has passed :)

Putting on my random engineer hat, does Centos 7 have a patch in it?  My system says 1.5.1.

Since g++ already requires 1.5.3, it makes no sense to bump to anything older than 1.5.3, so let's bump to 1.5.3.  Those packaging systems and OSes that wanted to update by now, have had their chance to update.  Those that punt until we bump the requirement, well, they will now have to bump.  :-)

Ok to update to 1.5.3.

I'll pre-approve the patches to simplify and remove workarounds from the testsuite that cater to older versions.

If an RM wants to push the approval to sometime later (post a release branch creation point for example), let's give them a few days to request deferral.  I don't want to impact any next release in a way an RM doesn't want.  RM approval for back ports, I think we don't want to back port to a previous release, but I'm happy to defer to RM; if they want to do it.

^ permalink raw reply	[flat|nested] 133+ messages in thread

* Re: dejagnu version update?
  2018-08-06 15:26                               ` Mike Stump
@ 2018-08-07 16:34                                 ` Segher Boessenkool
  2018-08-08 11:18                                   ` Bernhard Reutner-Fischer
  0 siblings, 1 reply; 133+ messages in thread
From: Segher Boessenkool @ 2018-08-07 16:34 UTC (permalink / raw)
  To: Mike Stump
  Cc: Bernhard Reutner-Fischer, Jonathan Wakely, Jakub Jelinek,
	Jeff Law, David Malcolm, GCC Patches, GCC Development

On Mon, Aug 06, 2018 at 08:25:49AM -0700, Mike Stump wrote:
> Since g++ already requires 1.5.3, it makes no sense to bump to anything older than 1.5.3, so let's bump to 1.5.3.  Those packaging systems and OSes that wanted to update by now, have had their chance to update.  Those that punt until we bump the requirement, well, they will now have to bump.  :-)

"g++ requires it"?  In what way?  I haven't seen any issues with older
dejagnu versions.

> Ok to update to 1.5.3.

1.5.3 is only three years old, and not all distros carry it.  This is
rather aggressive...


Segher


* Re: dejagnu version update?
  2018-08-07 16:34                                 ` Segher Boessenkool
@ 2018-08-08 11:18                                   ` Bernhard Reutner-Fischer
  2018-08-08 13:35                                     ` Richard Earnshaw (lists)
                                                       ` (2 more replies)
  0 siblings, 3 replies; 133+ messages in thread
From: Bernhard Reutner-Fischer @ 2018-08-08 11:18 UTC (permalink / raw)
  To: Segher Boessenkool, Mike Stump
  Cc: Jonathan Wakely, Jakub Jelinek, Jeff Law, David Malcolm,
	GCC Patches, GCC Development

On 7 August 2018 18:34:30 CEST, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>On Mon, Aug 06, 2018 at 08:25:49AM -0700, Mike Stump wrote:
>> Since g++ already requires 1.5.3, it makes no sense to bump to
>anything older than 1.5.3, so let's bump to 1.5.3.  Those packaging
>systems and OSes that wanted to update by now, have had their chance to
>update.  Those that punt until we bump the requirement, well, they will
>now have to bump.  :-)
>
>"g++ requires it"?  In what way?  I haven't seen any issues with older
>dejagnu versions.

I think http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=5256bd82343000c76bc0e48139003f90b6184347

>
>> Ok to update to 1.5.3.
>
>1.5.3 is only three years old, and not all distros carry it.  This is
>rather aggressive...

How come?
If one wants to develop on a distro that is notoriously outdated then you have to obtain the missing pieces yourself. I wouldn't call three years "aggressive".


* Re: dejagnu version update?
  2018-08-08 11:18                                   ` Bernhard Reutner-Fischer
@ 2018-08-08 13:35                                     ` Richard Earnshaw (lists)
  2018-08-08 14:37                                     ` Michael Matz
  2018-08-08 16:45                                     ` Segher Boessenkool
  2 siblings, 0 replies; 133+ messages in thread
From: Richard Earnshaw (lists) @ 2018-08-08 13:35 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer, Segher Boessenkool, Mike Stump
  Cc: Jonathan Wakely, Jakub Jelinek, Jeff Law, David Malcolm,
	GCC Patches, GCC Development

On 08/08/18 12:17, Bernhard Reutner-Fischer wrote:
> On 7 August 2018 18:34:30 CEST, Segher Boessenkool <segher@kernel.crashing.org> wrote:
>> On Mon, Aug 06, 2018 at 08:25:49AM -0700, Mike Stump wrote:
>>> Since g++ already requires 1.5.3, it makes no sense to bump to
>> anything older than 1.5.3, so let's bump to 1.5.3.  Those packaging
>> systems and OSes that wanted to update by now, have had their chance to
>> update.  Those that punt until we bump the requirement, well, they will
>> now have to bump.  :-)
>>
>> "g++ requires it"?  In what way?  I haven't seen any issues with older
>> dejagnu versions.
> 
> I think http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=5256bd82343000c76bc0e48139003f90b6184347
> 
>>
>>> Ok to update to 1.5.3.
>>
>> 1.5.3 is only three years old, and not all distros carry it.  This is
>> rather aggressive...
> 
> How come?
> If one wants to develop on a distro that is notoriously outdated then you have to obtain the missing pieces yourself. I wouldn't call three years "aggressive".
> 

I would.

IT departments don't upgrade every machine each time a new distribution
comes out.  They expect to install one version (plus the security
updates, of course) on that machine for its lifetime.  Assuming new
distros are released every couple of years (quite aggressive) and that
IT groups also start installing the new version immediately it is
released on those new machines (extremely aggressive), you've got a 5
year life-cycle for software if you work on the basis that a machine is
expected to last three years.

So in practice, I think 6 years is more like the timeframe that needs
to be considered for these things, and even that is quite aggressive.
Some machines have to run older versions of the OS simply because other
software running on them *has* to use an older OS release.
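
The arithmetic behind the 5-year figure above can be sketched as follows.
This is only an illustration of the reasoning in this message; the function
name and parameters are hypothetical and not taken from any GCC tooling.

```python
def software_age_at_retirement(release_interval_years,
                               machine_lifetime_years,
                               adoption_delay_years=0):
    """Worst-case age, in years, of distro software on a machine at retirement.

    Worst case: the machine is installed just before the next distro release,
    so the distro on it is already a full release interval old (plus any
    delay before IT adopts a new release), and it then stays in service
    for the whole machine lifetime without a distro upgrade.
    """
    return release_interval_years + adoption_delay_years + machine_lifetime_years


# Figures from the message: a release every 2 years, immediate adoption,
# machines kept for 3 years.
print(software_age_at_retirement(2, 3))
```

Any adoption delay, or a machine kept beyond three years, pushes the figure
past five, which is the direction of the 6-year estimate above.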

R.


* Re: dejagnu version update?
  2018-08-08 11:18                                   ` Bernhard Reutner-Fischer
  2018-08-08 13:35                                     ` Richard Earnshaw (lists)
@ 2018-08-08 14:37                                     ` Michael Matz
  2018-08-08 16:45                                     ` Segher Boessenkool
  2 siblings, 0 replies; 133+ messages in thread
From: Michael Matz @ 2018-08-08 14:37 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer
  Cc: Segher Boessenkool, Mike Stump, Jonathan Wakely, Jakub Jelinek,
	Jeff Law, David Malcolm, GCC Patches, GCC Development

Hi,

On Wed, 8 Aug 2018, Bernhard Reutner-Fischer wrote:

> How come?
> 
> If one wants to develop on a distro that is notoriously outdated then 
> you have to obtain the missing pieces yourself.

It's not about developing on a "notoriously outdated" distro, but about 
_testing_ on it.  There are very good reasons to test the quality of a 
compiler also on older distros.

> I wouldn't call three years "aggressive".

But even independent of the above, I would.


Ciao,
Michael.


* Re: dejagnu version update?
  2018-08-08 11:18                                   ` Bernhard Reutner-Fischer
  2018-08-08 13:35                                     ` Richard Earnshaw (lists)
  2018-08-08 14:37                                     ` Michael Matz
@ 2018-08-08 16:45                                     ` Segher Boessenkool
  2 siblings, 0 replies; 133+ messages in thread
From: Segher Boessenkool @ 2018-08-08 16:45 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer
  Cc: Mike Stump, Jonathan Wakely, Jakub Jelinek, Jeff Law,
	David Malcolm, GCC Patches, GCC Development

On Wed, Aug 08, 2018 at 01:17:49PM +0200, Bernhard Reutner-Fischer wrote:
> On 7 August 2018 18:34:30 CEST, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> >On Mon, Aug 06, 2018 at 08:25:49AM -0700, Mike Stump wrote:
> >> Since g++ already requires 1.5.3, it makes no sense to bump to
> >anything older than 1.5.3, so let's bump to 1.5.3.  Those packaging
> >systems and OSes that wanted to update by now, have had their chance to
> >update.  Those that punt until we bump the requirement, well, they will
> >now have to bump.  :-)
> >
> >"g++ requires it"?  In what way?  I haven't seen any issues with older
> >dejagnu versions.
> 
> I think http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=5256bd82343000c76bc0e48139003f90b6184347

Ugh.

If there is a conflict between the test-specific options and the testsuite
run options, sometimes you should pick one, sometimes the other, and often
skipping the test is best.  Older dejagnu picked the run options, and now
newer dejagnu picks the test-specific options, so now we cannot rely on
*either* behaviour.  At least for many years to come: we share most
testcases with older GCC versions, which do not require dejagnu 1.5.3!
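
A toy model of the conflict described above, for illustration only. It is
not DejaGnu code: it assumes the compiler applies a "last option wins" rule,
and that the only difference between the old and new behaviour is whether
the run options are placed after or before the test-specific options. The
function names and the example flags are hypothetical.

```python
def build_flags(test_options, run_options, prepend_run_options):
    """Return the option list as it would reach the compiler."""
    if prepend_run_options:
        # Modelled "newer" behaviour: run options first, test options last.
        return run_options + test_options
    # Modelled "older" behaviour: test options first, run options last.
    return test_options + run_options


def effective_value(flags, prefix):
    """Return the value of the last flag starting with prefix (last one wins)."""
    value = None
    for flag in flags:
        if flag.startswith(prefix):
            value = flag[len(prefix):]
    return value


test_options = ["-mfloat-abi=soft"]   # requested by the testcase
run_options = ["-mfloat-abi=hard"]    # requested for the whole testsuite run

older = build_flags(test_options, run_options, prepend_run_options=False)
newer = build_flags(test_options, run_options, prepend_run_options=True)

print(effective_value(older, "-mfloat-abi="))  # the run option wins
print(effective_value(newer, "-mfloat-abi="))  # the test option wins
```

The same testcase therefore ends up compiled differently depending on the
installed DejaGnu, which is why neither outcome can be relied upon while
both versions are in use.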

What a mess.


Segher


* Re: dejagnu version update?
  2018-08-04 16:32                             ` Bernhard Reutner-Fischer
  2018-08-06 14:33                               ` Jonathan Wakely
  2018-08-06 15:26                               ` Mike Stump
@ 2021-10-27 23:00                               ` Bernhard Reutner-Fischer
  2021-10-28 19:11                                 ` Jeff Law
  2 siblings, 1 reply; 133+ messages in thread
From: Bernhard Reutner-Fischer @ 2021-10-27 23:00 UTC (permalink / raw)
  To: Mike Stump
  Cc: rep.dot.nop, Jonathan Wakely, Jakub Jelinek, Jeff Law,
	David Malcolm, GCC Patches, GCC Development, Rainer Orth

On Sat, 4 Aug 2018 18:32:24 +0200
Bernhard Reutner-Fischer <rep.dot.nop@gmail.com> wrote:

> On Tue, 16 May 2017 at 21:08, Mike Stump <mikestump@comcast.net> wrote:
> >
> > On May 16, 2017, at 5:16 AM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:  
> > > The change I care about in 1.5.3  
> >
> > So, we haven't talked much about the version people want most.  If we update, might as well get something that more people care about.  1.5.3 is in Ubuntu LTS 16.04 and Fedora 24, so it's been around a while.  SUSE is said to be using 1.6, in the post 1.4.4 systems.  People stated they want 1.5.2 and 1.5.3, so, I'm inclined to say, let's shoot for 1.5.3 when we do update.
> >
> > As for the machines in the FSF compile farm, nah, tail wagging the dog.  I'd rather just update the requirement, and the owners or users of those machines can install a new dejagnu, if they are using one that is too old and they want to support testing gcc.  
> 
> So.. let me ping that, again, now that another year has passed :)

or another 3 or 4 :)
> 
> PS: Recap: https://gcc.gnu.org/ml/fortran/2012-03/msg00094.html was
> later applied as
> http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=5481f29161477520c691d525653323b82fa47ad7
> and was part of the dejagnu-1.5.2 release from 2015. Jonathan requires
> 1.5.3 for libstdc++ testing.
(i.e.
http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=5256bd82343000c76bc0e48139003f90b6184347 )
> The libdirs fix would allow us to remove the 150 occurrences of the
> load_gcc_lib hack, refer to the patch to the fortran list back then.
> AFAIR this is still not fixed: +# BUG: gcc-dg calls
> gcc-set-multilib-library-path but does not load gcc-defs!
> 
> debian-stable (i think 9 ATM), Ubuntu LTS ship versions recent enough
> to contain both fixes. Commercial distros seem to ship fixed versions,
> too.

It seems in May 2020 there was a thread on gcc with about the same
subject: https://gcc.gnu.org/pipermail/gcc/2020-May/232427.html
where Mike indicates that he has approved bumping the required minimum
version to 1.5.3.
So who's in the position to update the
https://gcc.gnu.org/install/prerequisites.html
to s/1.4.4/1.5.3/g && git commit -m 'bump dejagnu required version' ?

Just asking patiently and politely.
I don't want to rush anybody into such a bump :)

But as you may remember, folks routinely run afoul of using too old
versions (without the 5256bd8 multilib prepending for example, recently
someone doing ARM stuff IIRC) so a bump would just be fair IMHO.

Maybe now, for gcc-12, is the time to bump prerequisites to 1.5.3?

thanks and sorry for my impatience (and, once again, the noise).
cheers,


* Re: dejagnu version update?
  2021-10-27 23:00                               ` Bernhard Reutner-Fischer
@ 2021-10-28 19:11                                 ` Jeff Law
  2021-10-29  0:41                                   ` [PATCH] Bump required minimum DejaGnu version to 1.5.3 Bernhard Reutner-Fischer
  0 siblings, 1 reply; 133+ messages in thread
From: Jeff Law @ 2021-10-28 19:11 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer, Mike Stump
  Cc: Jonathan Wakely, Jakub Jelinek, David Malcolm, GCC Patches,
	GCC Development, Rainer Orth



On 10/27/2021 5:00 PM, Bernhard Reutner-Fischer wrote:
> On Sat, 4 Aug 2018 18:32:24 +0200
> Bernhard Reutner-Fischer <rep.dot.nop@gmail.com> wrote:
>
>> On Tue, 16 May 2017 at 21:08, Mike Stump <mikestump@comcast.net> wrote:
>>> On May 16, 2017, at 5:16 AM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>>>> The change I care about in 1.5.3
>>> So, we haven't talked much about the version people want most.  If we update, might as well get something that more people care about.  1.5.3 is in Ubuntu LTS 16.04 and Fedora 24, so it's been around a while.  SUSE is said to be using 1.6, in the post 1.4.4 systems.  People stated they want 1.5.2 and 1.5.3, so, I'm inclined to say, let's shoot for 1.5.3 when we do update.
>>>
>>> As for the machines in the FSF compile farm, nah, tail wagging the dog.  I'd rather just update the requirement, and the owners or users of those machines can install a new dejagnu, if they are using one that is too old and they want to support testing gcc.
>> So.. let me ping that, again, now that another year has passed :)
> or another 3 or 4 :)
>> PS: Recap: https://gcc.gnu.org/ml/fortran/2012-03/msg00094.html was
>> later applied as
>> http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=5481f29161477520c691d525653323b82fa47ad7
>> and was part of the dejagnu-1.5.2 release from 2015. Jonathan requires
>> 1.5.3 for libstdc++ testing.
> (i.e.
> http://git.savannah.gnu.org/gitweb/?p=dejagnu.git;a=commit;h=5256bd82343000c76bc0e48139003f90b6184347 )
>> The libdirs fix would allow us to remove the 150 occurrences of the
>> load_gcc_lib hack, refer to the patch to the fortran list back then.
>> AFAIR this is still not fixed: +# BUG: gcc-dg calls
>> gcc-set-multilib-library-path but does not load gcc-defs!
>>
>> debian-stable (i think 9 ATM), Ubuntu LTS ship versions recent enough
>> to contain both fixes. Commercial distros seem to ship fixed versions,
>> too.
> It seems in May 2020 there was a thread on gcc with about the same
> subject: https://gcc.gnu.org/pipermail/gcc/2020-May/232427.html
> where Mike indicates that he has approved bumping the required minimum
> version to 1.5.3.
> So who's in the position to update the
> https://gcc.gnu.org/install/prerequisites.html
> to s/1.4.4/1.5.3/g && git commit -m 'bump dejagnu required version' ?
All kinds of people.  Submit a patch and I bet it'll get approved. More 
than anything I suspect it's out-of-sight-out-of-mind at this point 
holding us back.

jeff



* [PATCH] Bump required minimum DejaGnu version to 1.5.3
  2021-10-28 19:11                                 ` Jeff Law
@ 2021-10-29  0:41                                   ` Bernhard Reutner-Fischer
  2021-10-29  7:32                                     ` Richard Biener
  0 siblings, 1 reply; 133+ messages in thread
From: Bernhard Reutner-Fischer @ 2021-10-29  0:41 UTC (permalink / raw)
  To: gcc-patches, gcc
  Cc: Bernhard Reutner-Fischer, Bernhard Reutner-Fischer, Rainer Orth,
	Mike Stump, Jeff Law

From: Bernhard Reutner-Fischer <aldot@gcc.gnu.org>

Bump required DejaGnu version to 1.5.3 (or later).
Ok for trunk?

gcc/ChangeLog:

	* doc/install.texi: Bump required minimum DejaGnu version.
---
 gcc/doc/install.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 36c8280d7da..094469b9a4e 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -452,7 +452,7 @@ Necessary when modifying @command{gperf} input files, e.g.@:
 @file{gcc/cp/cfns.gperf} to regenerate its associated header file, e.g.@:
 @file{gcc/cp/cfns.h}.
 
-@item DejaGnu 1.4.4
+@item DejaGnu version 1.5.3 (or later)
 @itemx Expect
 @itemx Tcl
 @c Once Tcl 8.5 or higher is required, remove any obsolete
-- 
2.33.0



* Re: [PATCH] Bump required minimum DejaGnu version to 1.5.3
  2021-10-29  0:41                                   ` [PATCH] Bump required minimum DejaGnu version to 1.5.3 Bernhard Reutner-Fischer
@ 2021-10-29  7:32                                     ` Richard Biener
  2021-11-04 11:55                                       ` Segher Boessenkool
  0 siblings, 1 reply; 133+ messages in thread
From: Richard Biener @ 2021-10-29  7:32 UTC (permalink / raw)
  To: Bernhard Reutner-Fischer
  Cc: GCC Patches, GCC Development, Bernhard Reutner-Fischer

On Fri, Oct 29, 2021 at 2:42 AM Bernhard Reutner-Fischer via
Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> From: Bernhard Reutner-Fischer <aldot@gcc.gnu.org>
>
> Bump required DejaGnu version to 1.5.3 (or later).
> Ok for trunk?

OK.

Thanks,
Richard.

> gcc/ChangeLog:
>
>         * doc/install.texi: Bump required minimum DejaGnu version.
> ---
>  gcc/doc/install.texi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index 36c8280d7da..094469b9a4e 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -452,7 +452,7 @@ Necessary when modifying @command{gperf} input files, e.g.@:
>  @file{gcc/cp/cfns.gperf} to regenerate its associated header file, e.g.@:
>  @file{gcc/cp/cfns.h}.
>
> -@item DejaGnu 1.4.4
> +@item DejaGnu version 1.5.3 (or later)
>  @itemx Expect
>  @itemx Tcl
>  @c Once Tcl 8.5 or higher is required, remove any obsolete
> --
> 2.33.0
>


* Re: [PATCH] Bump required minimum DejaGnu version to 1.5.3
  2021-10-29  7:32                                     ` Richard Biener
@ 2021-11-04 11:55                                       ` Segher Boessenkool
  2021-11-04 12:22                                         ` Martin Liška
  2021-11-04 12:41                                         ` Richard Biener
  0 siblings, 2 replies; 133+ messages in thread
From: Segher Boessenkool @ 2021-11-04 11:55 UTC (permalink / raw)
  To: Richard Biener
  Cc: Bernhard Reutner-Fischer, GCC Development, GCC Patches,
	Bernhard Reutner-Fischer

On Fri, Oct 29, 2021 at 09:32:21AM +0200, Richard Biener via Gcc-patches wrote:
> On Fri, Oct 29, 2021 at 2:42 AM Bernhard Reutner-Fischer via
> Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >
> > From: Bernhard Reutner-Fischer <aldot@gcc.gnu.org>
> >
> > Bump required DejaGnu version to 1.5.3 (or later).
> > Ok for trunk?
> 
> OK.

If we really want to require such a new version of DejaGnu (most
machines I use have 1.5.1 or older), can we include it with GCC please?


Segher


* Re: [PATCH] Bump required minimum DejaGnu version to 1.5.3
  2021-11-04 11:55                                       ` Segher Boessenkool
@ 2021-11-04 12:22                                         ` Martin Liška
  2021-11-04 19:09                                           ` Segher Boessenkool
  2021-11-04 12:41                                         ` Richard Biener
  1 sibling, 1 reply; 133+ messages in thread
From: Martin Liška @ 2021-11-04 12:22 UTC (permalink / raw)
  To: Segher Boessenkool, Richard Biener
  Cc: Bernhard Reutner-Fischer, GCC Patches, Bernhard Reutner-Fischer,
	GCC Development

On 11/4/21 12:55, Segher Boessenkool wrote:
> On Fri, Oct 29, 2021 at 09:32:21AM +0200, Richard Biener via Gcc-patches wrote:
>> On Fri, Oct 29, 2021 at 2:42 AM Bernhard Reutner-Fischer via
>> Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>>
>>> From: Bernhard Reutner-Fischer <aldot@gcc.gnu.org>
>>>
>>> Bump required DejaGnu version to 1.5.3 (or later).
>>> Ok for trunk?
>>
>> OK.
> 
> If we really want to require such a new version of DejaGnu (most
> machines I use have 1.5.1 or older), can we include it with GCC please?

Do you mean in contrib/download_prerequisites?

Note that version 1.5.1 is 8 years old.  What legacy system do you use that
has such an old version?

Martin

> 
> 
> Segher
> 



* Re: [PATCH] Bump required minimum DejaGnu version to 1.5.3
  2021-11-04 11:55                                       ` Segher Boessenkool
  2021-11-04 12:22                                         ` Martin Liška
@ 2021-11-04 12:41                                         ` Richard Biener
  2021-11-04 13:50                                           ` Jonathan Wakely
  1 sibling, 1 reply; 133+ messages in thread
From: Richard Biener @ 2021-11-04 12:41 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Bernhard Reutner-Fischer, GCC Development, GCC Patches,
	Bernhard Reutner-Fischer

On Thu, Nov 4, 2021 at 12:57 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Fri, Oct 29, 2021 at 09:32:21AM +0200, Richard Biener via Gcc-patches wrote:
> > On Fri, Oct 29, 2021 at 2:42 AM Bernhard Reutner-Fischer via
> > Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > From: Bernhard Reutner-Fischer <aldot@gcc.gnu.org>
> > >
> > > Bump required DejaGnu version to 1.5.3 (or later).
> > > Ok for trunk?
> >
> > OK.
>
> If we really want to require such a new version of DejaGnu (most
> machines I use have 1.5.1 or older), can we include it with GCC please?

I checked before approving that all regularly supported SLES releases have
1.5.3 or newer (in fact they even have 1.6+).  Only before SLE12 SP2 you
had the chance to run into 1.4.4.  I guess you run into old versions on
big-endian ppc-linux which tend to be quite old if you rely on enterprise OS?

Richard.

>
> Segher


* Re: [PATCH] Bump required minimum DejaGnu version to 1.5.3
  2021-11-04 12:41                                         ` Richard Biener
@ 2021-11-04 13:50                                           ` Jonathan Wakely
  0 siblings, 0 replies; 133+ messages in thread
From: Jonathan Wakely @ 2021-11-04 13:50 UTC (permalink / raw)
  To: Richard Biener
  Cc: Segher Boessenkool, Bernhard Reutner-Fischer, GCC Patches,
	Bernhard Reutner-Fischer, GCC Development

On Thu, 4 Nov 2021 at 12:42, Richard Biener via Gcc <gcc@gcc.gnu.org> wrote:
>
> On Thu, Nov 4, 2021 at 12:57 PM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >
> > On Fri, Oct 29, 2021 at 09:32:21AM +0200, Richard Biener via Gcc-patches wrote:
> > > On Fri, Oct 29, 2021 at 2:42 AM Bernhard Reutner-Fischer via
> > > Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > From: Bernhard Reutner-Fischer <aldot@gcc.gnu.org>
> > > >
> > > > Bump required DejaGnu version to 1.5.3 (or later).
> > > > Ok for trunk?
> > >
> > > OK.
> >
> > If we really want to require such a new version of DejaGnu (most
> > machines I use have 1.5.1 or older), can we include it with GCC please?
>
> I checked before approving that all regularly supported SLES releases have
> 1.5.3 or newer (in fact they even have 1.6+).  Only before SLE12 SP2 you
> had the chance to run into 1.4.4.  I guess you run into old versions on
> big-endian ppc-linux which tend to be quite old if you rely on enterprise OS?

Like most of the ones in the compile farm, which run CentOS 7 and have
1.5.1. I've installed a newer version in /opt/cfarm on most of the
machines that need it.

I'm still in favour of updating the minimum version, because otherwise
we have FAILs for correct tests. The old version is just not good
enough.
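
For reference, a minimal sketch of checking a DejaGnu version against the
proposed 1.5.3 minimum. The per-distro versions are the ones reported in
this thread; the helper names are hypothetical and the version string is
assumed to be plain dotted integers.

```python
MINIMUM_DEJAGNU = "1.5.3"


def parse_version(text, width=3):
    """Turn '1.5.3' into (1, 5, 3), padding short versions such as '1.6' with zeros."""
    parts = [int(part) for part in text.split(".")]
    while len(parts) < width:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum=MINIMUM_DEJAGNU):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


# Versions as reported by participants in this thread.
reported = {
    "CentOS 7": "1.5.1",
    "Ubuntu LTS 16.04": "1.5.3",
    "Fedora 24": "1.5.3",
    "SUSE": "1.6",
}

for distro, version in reported.items():
    status = "ok" if meets_minimum(version) else "too old"
    print(f"{distro}: DejaGnu {version} is {status}")
```

Comparing tuples of integers rather than raw strings keeps a version such as
1.10 from sorting below 1.5.3.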


* Re: [PATCH] Bump required minimum DejaGnu version to 1.5.3
  2021-11-04 12:22                                         ` Martin Liška
@ 2021-11-04 19:09                                           ` Segher Boessenkool
  2021-11-05  9:33                                             ` Richard Biener
  0 siblings, 1 reply; 133+ messages in thread
From: Segher Boessenkool @ 2021-11-04 19:09 UTC (permalink / raw)
  To: Martin Liška
  Cc: Richard Biener, Bernhard Reutner-Fischer, GCC Patches,
	Bernhard Reutner-Fischer, GCC Development

On Thu, Nov 04, 2021 at 01:22:24PM +0100, Martin Liška wrote:
> On 11/4/21 12:55, Segher Boessenkool wrote:
> >On Fri, Oct 29, 2021 at 09:32:21AM +0200, Richard Biener via Gcc-patches 
> >wrote:
> >>On Fri, Oct 29, 2021 at 2:42 AM Bernhard Reutner-Fischer via
> >>Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >>>
> >>>From: Bernhard Reutner-Fischer <aldot@gcc.gnu.org>
> >>>
> >>>Bump required DejaGnu version to 1.5.3 (or later).
> >>>Ok for trunk?
> >>
> >>OK.
> >
> >If we really want to require such a new version of DejaGnu (most
> >machines I use have 1.5.1 or older), can we include it with GCC please?
> 
> Do you mean in contrib/download_prerequisites?

I was thinking of including it as actual code, so we can make modifications
where we need to / want to as well.  But your idea is much less contentious :-)

> Note the version 1.5.1 is 8 years old, what legacy system do you use that 
> has such
> an old version?

CentOS 7.  Some of those systems cannot run CentOS 8.  And CentOS 8 will
reach EoL in less than two months, and CentOS Stream is not an option at
all (and even if it were, it cannot work on many of the machines).

Everything else on CentOS 7 is supported by GCC (it is the oldest
supported for pretty much everything, but still).  It would be bad for
DejaGnu to be the limiting factor :-/


Segher


* Re: [PATCH] Bump required minimum DejaGnu version to 1.5.3
  2021-11-04 19:09                                           ` Segher Boessenkool
@ 2021-11-05  9:33                                             ` Richard Biener
  2021-11-05 11:39                                               ` Jonathan Wakely
  0 siblings, 1 reply; 133+ messages in thread
From: Richard Biener @ 2021-11-05  9:33 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Martin Liška, Bernhard Reutner-Fischer, GCC Patches,
	Bernhard Reutner-Fischer, GCC Development

On Thu, Nov 4, 2021 at 8:12 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Thu, Nov 04, 2021 at 01:22:24PM +0100, Martin Liška wrote:
> > On 11/4/21 12:55, Segher Boessenkool wrote:
> > >On Fri, Oct 29, 2021 at 09:32:21AM +0200, Richard Biener via Gcc-patches
> > >wrote:
> > >>On Fri, Oct 29, 2021 at 2:42 AM Bernhard Reutner-Fischer via
> > >>Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > >>>
> > >>>From: Bernhard Reutner-Fischer <aldot@gcc.gnu.org>
> > >>>
> > >>>Bump required DejaGnu version to 1.5.3 (or later).
> > >>>Ok for trunk?
> > >>
> > >>OK.
> > >
> > >If we really want to require such a new version of DejaGnu (most
> > >machines I use have 1.5.1 or older), can we include it with GCC please?
> >
> > Do you mean in contrib/download_prerequisites?
>
> I was thinking as actual code, so we can make modifications where we
> need to / want to as well.  But your idea is much less contentious :-)
>
> > Note the version 1.5.1 is 8 years old, what legacy system do you use that
> > has such
> > an old version?
>
> CentOS 7.  Some of those systems cannot run CentOS 8.  And CentOS 8 will
> reach EoL in less than two months, and CentOS Stream is not an option at
> all (and even if it were, it cannot work on many of the machines).
>
> Everything else on CentOS 7 is supported by GCC (it is the oldest
> supported for pretty much everything, but still).  It would be bad for
> DejaGnu to be the limiting factor :-/

So just contribute updated dejagnu packages to CentOS 7 "backports" or
whatever means exists there?  Btw, openSUSE Tumbleweed still has
ppc64 (non-le) support and I bet Debian has that as well.

Richard.

>
> Segher


* Re: [PATCH] Bump required minimum DejaGnu version to 1.5.3
  2021-11-05  9:33                                             ` Richard Biener
@ 2021-11-05 11:39                                               ` Jonathan Wakely
  0 siblings, 0 replies; 133+ messages in thread
From: Jonathan Wakely @ 2021-11-05 11:39 UTC (permalink / raw)
  To: Richard Biener
  Cc: Segher Boessenkool, Bernhard Reutner-Fischer, GCC Patches,
	Bernhard Reutner-Fischer, GCC Development

On Fri, 5 Nov 2021 at 09:35, Richard Biener via Gcc <gcc@gcc.gnu.org> wrote:
> So just contribute updated dejagnu packages to CentOS 7 "backports" or
> whatever means exists there?

Yes, we could add a newer dejagnu to EPEL.


end of thread (2021-11-05 11:39 UTC)

Thread overview: 133+ messages
2015-09-10 20:12 [PATCH 00/22] RFC: Overhaul of diagnostics David Malcolm
2015-09-10 20:12 ` [PATCH 01/22] Change of location_get_source_line signature David Malcolm
2015-09-14 19:28   ` Jeff Law
2015-09-15 17:02     ` David Malcolm
2015-09-10 20:13 ` [PATCH 06/22] PR/62314: add ability to add fixit-hints David Malcolm
2015-09-10 20:13 ` [PATCH 08/22] C frontend: use token ranges in various diagnostics David Malcolm
2015-09-10 20:13 ` [PATCH 02/22] Testsuite: add dg-{begin|end}-multiline-output commands David Malcolm
2015-09-14 19:35   ` Jeff Law
2015-09-14 22:17     ` Bernhard Reutner-Fischer
2015-09-14 22:45       ` Jeff Law
2015-09-15 17:53         ` dejagnu version update? Mike Stump
2015-09-15 19:23           ` David Malcolm
2015-09-15 20:29             ` Jeff Law
2015-09-15 21:15               ` Bernhard Reutner-Fischer
2017-05-13 10:38                 ` Bernhard Reutner-Fischer
2017-05-13 11:06                   ` Jakub Jelinek
2017-05-13 21:12                     ` Jeff Law
2017-05-14 23:10                       ` NightStrike
2017-05-15  8:14                         ` Richard Biener
2017-05-15 19:24                           ` Mike Stump
2017-05-15 20:52                             ` Andreas Schwab
2017-05-16  9:56                     ` Jonathan Wakely
2017-05-16 12:16                       ` Bernhard Reutner-Fischer
2017-05-16 12:35                         ` Jonathan Wakely
2017-05-16 12:55                           ` Bernhard Reutner-Fischer
2017-05-16 18:41                             ` Matthias Klose
2017-05-16 19:09                           ` Mike Stump
2018-08-04 16:32                             ` Bernhard Reutner-Fischer
2018-08-06 14:33                               ` Jonathan Wakely
2018-08-06 15:26                               ` Mike Stump
2018-08-07 16:34                                 ` Segher Boessenkool
2018-08-08 11:18                                   ` Bernhard Reutner-Fischer
2018-08-08 13:35                                     ` Richard Earnshaw (lists)
2018-08-08 14:37                                     ` Michael Matz
2018-08-08 16:45                                     ` Segher Boessenkool
2021-10-27 23:00                               ` Bernhard Reutner-Fischer
2021-10-28 19:11                                 ` Jeff Law
2021-10-29  0:41                                   ` [PATCH] Bump required minimum DejaGnu version to 1.5.3 Bernhard Reutner-Fischer
2021-10-29  7:32                                     ` Richard Biener
2021-11-04 11:55                                       ` Segher Boessenkool
2021-11-04 12:22                                         ` Martin Liška
2021-11-04 19:09                                           ` Segher Boessenkool
2021-11-05  9:33                                             ` Richard Biener
2021-11-05 11:39                                               ` Jonathan Wakely
2021-11-04 12:41                                         ` Richard Biener
2021-11-04 13:50                                           ` Jonathan Wakely
2015-09-15 19:53           ` dejagnu version update? Bernhard Reutner-Fischer
2015-09-15 20:05             ` Jeff Law
2015-09-15 23:12               ` Mike Stump
2015-09-16  7:41                 ` Andreas Schwab
2015-09-16 16:19                   ` Mike Stump
2015-09-16 16:32                     ` Ramana Radhakrishnan
2015-09-16 16:39                       ` Jeff Law
2015-09-16 17:26                         ` Trevor Saunders
2015-09-16 17:46                         ` David Malcolm
2015-09-16 19:09                           ` Bernhard Reutner-Fischer
2015-09-16 19:51                             ` Mike Stump
2015-09-17  0:07                           ` Segher Boessenkool
2015-09-17 13:57                         ` Richard Earnshaw
2015-09-16 18:04                       ` Mike Stump
2015-09-16 18:58                         ` Bernhard Reutner-Fischer
2015-09-16 19:37                         ` Ramana Radhakrishnan
2015-09-16 13:17             ` Matthias Klose
2015-09-16 15:46               ` Bernhard Reutner-Fischer
2015-09-10 20:13 ` [PATCH 10/22] C++ FE: Use token ranges for various diagnostics David Malcolm
2015-09-10 20:13 ` [PATCH 09/22] C frontend: store and use token ranges in c_declspecs David Malcolm
2015-09-10 20:13 ` [PATCH 20/22] Use rich locations in c-family/c-format.c David Malcolm
2015-09-10 20:13 ` [PATCH 11/22] Objective C: c/c-parser.c: use token ranges in two places David Malcolm
2015-09-10 20:13 ` [PATCH 13/22] gcc-rich-location.[ch]: add methods for working with tree ranges David Malcolm
2015-09-10 20:13 ` [PATCH 03/22] Move diagnostic_show_locus and friends out into a new source file David Malcolm
2015-09-14 19:37   ` Jeff Law
2015-09-18 18:31     ` David Malcolm
2015-09-10 20:28 ` [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs David Malcolm
2015-09-11 14:08   ` Michael Matz
2015-09-14 19:41     ` Jeff Law
2015-09-15 10:20   ` Richard Biener
2015-09-15 10:28     ` Jakub Jelinek
2015-09-15 10:48       ` Richard Biener
2015-09-15 11:01         ` Jakub Jelinek
2015-09-16 20:29           ` David Malcolm
2015-09-17 16:54             ` David Malcolm
2015-09-17 19:15               ` Jeff Law
2015-09-17 20:06                 ` David Malcolm
2015-09-17 19:25         ` Jeff Law
2015-09-15 12:09       ` Manuel López-Ibáñez
2015-09-15 12:18         ` Richard Biener
2015-09-15 12:57           ` Manuel López-Ibáñez
2015-09-17 19:11             ` Jeff Law
2015-09-17 19:13           ` Jeff Law
2015-09-15 13:53       ` David Malcolm
2015-09-10 20:29 ` [PATCH 04/22] Reimplement diagnostic_show_locus, introducing rich_location classes David Malcolm
2015-09-11 13:44   ` Michael Matz
2015-09-11 14:12   ` Michael Matz
2015-09-11 15:15     ` David Malcolm
2015-09-10 20:29 ` [PATCH 05/22] Add overloads of inform, warning_at, etc that take a source_range David Malcolm
2015-09-10 20:30 ` [PATCH 15/22] Add plugin to recursively dump the source-ranges in a tree David Malcolm
2015-09-11  3:19   ` Martin Sebor
2015-09-10 20:30 ` [PATCH 12/22] Add source-ranges for trees David Malcolm
2015-09-10 20:30 ` [PATCH 14/22] C: capture tree ranges for various expressions David Malcolm
2015-09-10 20:31 ` [PATCH 19/22] gcc-rich-location.[ch]: add debug methods for cpp_string_location David Malcolm
2015-09-10 20:31 ` [PATCH 18/22] Track locations within string literals in tree_string David Malcolm
2015-09-10 20:32 ` [PATCH 17/22] libcpp: add location tracking within string literals David Malcolm
2015-09-10 20:32 ` [PATCH 21/22] Use Levenshtein distance for various misspellings in C frontend David Malcolm
2015-09-10 21:11   ` Andi Kleen
2015-09-11 15:31   ` Manuel López-Ibáñez
2015-09-15 15:25     ` [PATCH WIP] Use Levenshtein distance for various misspellings in C frontend v2 David Malcolm
2015-09-15 16:25       ` Manuel López-Ibáñez
2015-09-16  8:45       ` Richard Biener
2015-09-16 13:33         ` Michael Matz
2015-09-16 14:00           ` Richard Biener
2015-09-16 15:49             ` Manuel López-Ibáñez
2015-09-17  8:46               ` Richard Biener
2015-09-17 19:32         ` Jeff Law
2015-09-17 20:05           ` David Malcolm
2015-09-17 20:52             ` Manuel López-Ibáñez
2015-10-30 12:30           ` [PATCH 0/2] Levenshtein-based suggestions (v3) David Malcolm
2015-10-30 12:30             ` [PATCH 2/2] C FE: suggest corrections for misspelled field names David Malcolm
2015-10-30 12:36             ` [PATCH 1/2] Implement Levenshtein distance David Malcolm
2015-11-02 10:56               ` Mikael Morin
2015-11-02  6:44             ` [PATCH 0/2] Levenshtein-based suggestions (v3) Jeff Law
2015-11-13  2:08               ` David Malcolm
2015-11-13  6:57                 ` Marek Polacek
2015-11-13 12:16                   ` David Malcolm
2015-11-13 15:11                     ` Marek Polacek
2015-11-13 15:44                       ` Bernd Schmidt
2015-11-13 15:53                         ` Marek Polacek
2015-11-13 15:56                           ` Jakub Jelinek
2015-11-13 16:02                             ` Marek Polacek
2015-09-10 20:32 ` [PATCH 16/22] C/C++ frontend: use tree ranges in various diagnostics David Malcolm
2015-09-10 20:50 ` [PATCH 22/22] Add fixit hints to spellchecker suggestions David Malcolm
2015-09-14 17:49 ` [PATCH 00/22] RFC: Overhaul of diagnostics Bernd Schmidt
2015-09-14 19:44   ` Jeff Law
2015-09-15  1:11     ` David Malcolm
