public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Move class substring_loc from c-family into gcc
@ 2016-08-25 15:40 David Malcolm
  2016-08-25 15:54 ` [PATCH] Fixups for Martin's gimple-ssa-sprintf.c patch David Malcolm
  2016-09-02 23:57 ` [PATCH] Move class substring_loc from c-family into gcc Martin Sebor
  0 siblings, 2 replies; 11+ messages in thread
From: David Malcolm @ 2016-08-25 15:40 UTC (permalink / raw)
  To: gcc-patches; +Cc: Martin Sebor, David Malcolm

This patch is intended to help Martin's new middle-end sprintf format
warning.

It moves most of the on-demand locations-within-strings
code in c-family into gcc, into a new substring-locations.c file to
go with substring-locations.h: class substring_loc, representing
a source caret and source range within a STRING_CST, along with
format_warning for handling emitting a warning relating to it.

The actual work of reconstructing the source locations within
a string seems inherently frontend-specific, so the patch introduces a
langhook to do this, implementing it for C using the existing code
(and thus hiding the details of string-concatenation as being
specific to the c-family).  Attempts to get substring location
from other frontends will fail, and the format_warning_* calls
handle such failures gracefully by simply using the location of
the string as a whole for the warning.

I've tested it with Martin's gimple-ssa-sprintf patch, and I'm able to
emit carets and underlines in the correct places in C code from the
middle end with this approach (patch to follow).

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
	* Makefile.in (OBJS): Add substring-locations.o.
	* langhooks-def.h (class substring_loc): New forward decl.
	(lhd_get_substring_location): New decl.
	(LANG_HOOKS_GET_SUBSTRING_LOCATION): New macro.
	(LANG_HOOKS_INITIALIZER): Add LANG_HOOKS_GET_SUBSTRING_LOCATION.
	* langhooks.c (lhd_get_substring_location): New function.
	* langhooks.h (class substring_loc): New forward decl.
	(struct lang_hooks): Add field get_substring_location.
	* substring-locations.c: New file, taking definition of
	format_warning_va and format_warning_at_substring from
	c-family/c-format.c, making them non-static.
	* substring-locations.h: Include <cpplib.h>.
	(class substring_loc): Move class here from c-family/c-common.h.
	Add comments.
	(format_warning_va): New decl.
	(format_warning_at_substring): New decl.
	(get_source_location_for_substring): Add comment.

gcc/c-family/ChangeLog:
	* c-common.c (get_cpp_ttype_from_string_type): Handle being passed
	a POINTER_TYPE.
	(substring_loc::get_location): Move to substring-locations.c,
	keeping implementation as...
	(c_get_substring_location): New function, from the above, reworked
	to use accessors rather than member lookup.
	* c-common.h (class substring_loc): Move to substring-locations.h,
	replacing with a forward decl.
	(c_get_substring_location): New decl.
	* c-format.c: Include "substring-locations.h".
	(format_warning_va): Move to substring-locations.c.
	(format_warning_at_substring): Likewise.

gcc/c/ChangeLog:
	* c-lang.c (LANG_HOOKS_GET_SUBSTRING_LOCATION): Use
	c_get_substring_location for this new langhook.

gcc/testsuite/ChangeLog:
	* gcc.dg/plugin/diagnostic_plugin_test_string_literals.c: Include
	"substring-locations.h".
---
 gcc/Makefile.in                                    |   1 +
 gcc/c-family/c-common.c                            |  23 +--
 gcc/c-family/c-common.h                            |  32 +---
 gcc/c-family/c-format.c                            | 157 +----------------
 gcc/c/c-lang.c                                     |   3 +
 gcc/langhooks-def.h                                |   8 +-
 gcc/langhooks.c                                    |   8 +
 gcc/langhooks.h                                    |   9 +
 gcc/substring-locations.c                          | 195 +++++++++++++++++++++
 gcc/substring-locations.h                          |  67 +++++++
 .../diagnostic_plugin_test_string_literals.c       |   1 +
 11 files changed, 308 insertions(+), 196 deletions(-)
 create mode 100644 gcc/substring-locations.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 1b7464b..769efcf 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1444,6 +1444,7 @@ OBJS = \
 	store-motion.o \
 	streamer-hooks.o \
 	stringpool.o \
+	substring-locations.o \
 	target-globals.o \
 	targhooks.o \
 	timevar.o \
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 3feb910..f3e44c2 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -1122,6 +1122,9 @@ static enum cpp_ttype
 get_cpp_ttype_from_string_type (tree string_type)
 {
   gcc_assert (string_type);
+  if (TREE_CODE (string_type) == POINTER_TYPE)
+    string_type = TREE_TYPE (string_type);
+
   if (TREE_CODE (string_type) != ARRAY_TYPE)
     return CPP_OTHER;
 
@@ -1148,23 +1151,23 @@ get_cpp_ttype_from_string_type (tree string_type)
 
 GTY(()) string_concat_db *g_string_concat_db;
 
-/* Attempt to determine the source location of the substring.
-   If successful, return NULL and write the source location to *OUT_LOC.
-   Otherwise return an error message.  Error messages are intended
-   for GCC developers (to help debugging) rather than for end-users.  */
+/* Implementation of LANG_HOOKS_GET_SUBSTRING_LOCATION.  */
 
 const char *
-substring_loc::get_location (location_t *out_loc) const
+c_get_substring_location (const substring_loc &substr_loc,
+			  location_t *out_loc)
 {
-  gcc_assert (out_loc);
-
-  enum cpp_ttype tok_type = get_cpp_ttype_from_string_type (m_string_type);
+  enum cpp_ttype tok_type
+    = get_cpp_ttype_from_string_type (substr_loc.get_string_type ());
   if (tok_type == CPP_OTHER)
     return "unrecognized string type";
 
   return get_source_location_for_substring (parse_in, g_string_concat_db,
-					    m_fmt_string_loc, tok_type,
-					    m_caret_idx, m_start_idx, m_end_idx,
+					    substr_loc.get_fmt_string_loc (),
+					    tok_type,
+					    substr_loc.get_caret_idx (),
+					    substr_loc.get_start_idx (),
+					    substr_loc.get_end_idx (),
 					    out_loc);
 }
 
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index bc22baa..4470b4f 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1131,35 +1131,9 @@ extern const char *cb_get_suggestion (cpp_reader *, const char *,
 
 extern GTY(()) string_concat_db *g_string_concat_db;
 
-/* libcpp can calculate location information about a range of characters
-   within a string literal, but doing so is non-trivial.
-
-   This class encapsulates such a source location, so that it can be
-   passed around (e.g. within c-format.c).  It is effectively a deferred
-   call into libcpp.  If needed by a diagnostic, the actual source_range
-   can be calculated by calling the get_range method.  */
-
-class substring_loc
-{
- public:
-  substring_loc (location_t fmt_string_loc, tree string_type,
-		 int caret_idx, int start_idx, int end_idx)
-  : m_fmt_string_loc (fmt_string_loc), m_string_type (string_type),
-    m_caret_idx (caret_idx), m_start_idx (start_idx), m_end_idx (end_idx) {}
-
-  void set_caret_index (int caret_idx) { m_caret_idx = caret_idx; }
-
-  const char *get_location (location_t *out_loc) const;
-
-  location_t get_fmt_string_loc () const { return m_fmt_string_loc; }
-
- private:
-  location_t m_fmt_string_loc;
-  tree m_string_type;
-  int m_caret_idx;
-  int m_start_idx;
-  int m_end_idx;
-};
+class substring_loc;
+extern const char *c_get_substring_location (const substring_loc &substr_loc,
+					     location_t *out_loc);
 
 /* In c-gimplify.c  */
 extern void c_genericize (tree);
diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index d373819..ac40ac3 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-format.h"
 #include "diagnostic.h"
+#include "substring-locations.h"
 #include "selftest.h"
 
 /* Handle attributes associated with format checking.  */
@@ -67,162 +68,6 @@ static int first_target_format_type;
 static const char *format_name (int format_num);
 static int format_flags (int format_num);
 
-/* Emit a warning governed by option OPT, using GMSGID as the format
-   string and AP as its arguments.
-
-   Attempt to obtain precise location information within a string
-   literal from FMT_LOC.
-
-   Case 1: if substring location is available, and is within the range of
-   the format string itself, the primary location of the
-   diagnostic is the substring range obtained from FMT_LOC, with the
-   caret at the *end* of the substring range.
-
-   For example:
-
-     test.c:90:10: warning: problem with '%i' here [-Wformat=]
-     printf ("hello %i", msg);
-                    ~^
-
-   Case 2: if the substring location is available, but is not within
-   the range of the format string, the primary location is that of the
-   format string, and an note is emitted showing the substring location.
-
-   For example:
-     test.c:90:10: warning: problem with '%i' here [-Wformat=]
-     printf("hello " INT_FMT " world", msg);
-            ^~~~~~~~~~~~~~~~~~~~~~~~~
-     test.c:19: note: format string is defined here
-     #define INT_FMT "%i"
-                      ~^
-
-   Case 3: if precise substring information is unavailable, the primary
-   location is that of the whole string passed to FMT_LOC's constructor.
-   For example:
-
-     test.c:90:10: warning: problem with '%i' here [-Wformat=]
-     printf(fmt, msg);
-            ^~~
-
-   For each of cases 1-3, if param_range is non-NULL, then it is used
-   as a secondary range within the warning.  For example, here it
-   is used with case 1:
-
-     test.c:90:16: warning: '%s' here but arg 2 has 'long' type [-Wformat=]
-     printf ("foo %s bar", long_i + long_j);
-                  ~^       ~~~~~~~~~~~~~~~
-
-   and here with case 2:
-
-     test.c:90:16: warning: '%s' here but arg 2 has 'long' type [-Wformat=]
-     printf ("foo " STR_FMT " bar", long_i + long_j);
-             ^~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~~~~
-     test.c:89:16: note: format string is defined here
-     #define STR_FMT "%s"
-                      ~^
-
-   and with case 3:
-
-     test.c:90:10: warning: '%i' here, but arg 2 is "const char *' [-Wformat=]
-     printf(fmt, msg);
-            ^~~  ~~~
-
-   If CORRECTED_SUBSTRING is non-NULL, use it for cases 1 and 2 to provide
-   a fix-it hint, suggesting that it should replace the text within the
-   substring range.  For example:
-
-     test.c:90:10: warning: problem with '%i' here [-Wformat=]
-     printf ("hello %i", msg);
-                    ~^
-                    %s
-
-   Return true if a warning was emitted, false otherwise.  */
-
-ATTRIBUTE_GCC_DIAG (5,0)
-static bool
-format_warning_va (const substring_loc &fmt_loc, source_range *param_range,
-		   const char *corrected_substring,
-		   int opt, const char *gmsgid, va_list *ap)
-{
-  bool substring_within_range = false;
-  location_t primary_loc;
-  location_t fmt_substring_loc = UNKNOWN_LOCATION;
-  source_range fmt_loc_range
-    = get_range_from_loc (line_table, fmt_loc.get_fmt_string_loc ());
-  const char *err = fmt_loc.get_location (&fmt_substring_loc);
-  source_range fmt_substring_range
-    = get_range_from_loc (line_table, fmt_substring_loc);
-  if (err)
-    /* Case 3: unable to get substring location.  */
-    primary_loc = fmt_loc.get_fmt_string_loc ();
-  else
-    {
-      if (fmt_substring_range.m_start >= fmt_loc_range.m_start
-	  && fmt_substring_range.m_finish <= fmt_loc_range.m_finish)
-	/* Case 1.  */
-	{
-	  substring_within_range = true;
-	  primary_loc = fmt_substring_loc;
-	}
-      else
-	/* Case 2.  */
-	{
-	  substring_within_range = false;
-	  primary_loc = fmt_loc.get_fmt_string_loc ();
-	}
-    }
-
-  rich_location richloc (line_table, primary_loc);
-
-  if (param_range)
-    {
-      location_t param_loc = make_location (param_range->m_start,
-					    param_range->m_start,
-					    param_range->m_finish);
-      richloc.add_range (param_loc, false);
-    }
-
-  if (!err && corrected_substring && substring_within_range)
-    richloc.add_fixit_replace (fmt_substring_range, corrected_substring);
-
-  diagnostic_info diagnostic;
-  diagnostic_set_info (&diagnostic, gmsgid, ap, &richloc, DK_WARNING);
-  diagnostic.option_index = opt;
-  bool warned = report_diagnostic (&diagnostic);
-
-  if (!err && fmt_substring_loc && !substring_within_range)
-    /* Case 2.  */
-    if (warned)
-      {
-	rich_location substring_richloc (line_table, fmt_substring_loc);
-	if (corrected_substring)
-	  substring_richloc.add_fixit_replace (fmt_substring_range,
-					       corrected_substring);
-	inform_at_rich_loc (&substring_richloc,
-			    "format string is defined here");
-      }
-
-  return warned;
-}
-
-/* Variadic call to format_warning_va.  */
-
-ATTRIBUTE_GCC_DIAG (5,0)
-static bool
-format_warning_at_substring (const substring_loc &fmt_loc,
-			     source_range *param_range,
-			     const char *corrected_substring,
-			     int opt, const char *gmsgid, ...)
-{
-  va_list ap;
-  va_start (ap, gmsgid);
-  bool warned = format_warning_va (fmt_loc, param_range, corrected_substring,
-				   opt, gmsgid, &ap);
-  va_end (ap);
-
-  return warned;
-}
-
 /* Emit a warning as per format_warning_va, but construct the substring_loc
    for the character at offset (CHAR_IDX - 1) within a string constant
    FORMAT_STRING_CST at FMT_STRING_LOC.  */
diff --git a/gcc/c/c-lang.c b/gcc/c/c-lang.c
index b26be6a..b4096d0 100644
--- a/gcc/c/c-lang.c
+++ b/gcc/c/c-lang.c
@@ -43,6 +43,9 @@ enum c_language_kind c_language = clk_c;
 #define LANG_HOOKS_RUN_LANG_SELFTESTS selftest::run_c_tests
 #endif /* #if CHECKING_P */
 
+#undef LANG_HOOKS_GET_SUBSTRING_LOCATION
+#define LANG_HOOKS_GET_SUBSTRING_LOCATION c_get_substring_location
+
 /* Each front end provides its own lang hook initializer.  */
 struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;
 
diff --git a/gcc/langhooks-def.h b/gcc/langhooks-def.h
index 10d910c..cf5f91d 100644
--- a/gcc/langhooks-def.h
+++ b/gcc/langhooks-def.h
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "hooks.h"
 
 struct diagnostic_info;
+class substring_loc;
 
 /* Note to creators of new hooks:
 
@@ -81,6 +82,9 @@ extern void lhd_omp_firstprivatize_type_sizes (struct gimplify_omp_ctx *,
 					       tree);
 extern bool lhd_omp_mappable_type (tree);
 
+extern const char *lhd_get_substring_location (const substring_loc &,
+					       location_t *out_loc);
+
 #define LANG_HOOKS_NAME			"GNU unknown"
 #define LANG_HOOKS_IDENTIFIER_SIZE	sizeof (struct lang_identifier)
 #define LANG_HOOKS_INIT			hook_bool_void_false
@@ -121,6 +125,7 @@ extern bool lhd_omp_mappable_type (tree);
 #define LANG_HOOKS_EH_USE_CXA_END_CLEANUP	false
 #define LANG_HOOKS_DEEP_UNSHARING	false
 #define LANG_HOOKS_RUN_LANG_SELFTESTS   lhd_do_nothing
+#define LANG_HOOKS_GET_SUBSTRING_LOCATION lhd_get_substring_location
 
 /* Attribute hooks.  */
 #define LANG_HOOKS_ATTRIBUTE_TABLE		NULL
@@ -323,7 +328,8 @@ extern void lhd_end_section (void);
   LANG_HOOKS_BLOCK_MAY_FALLTHRU, \
   LANG_HOOKS_EH_USE_CXA_END_CLEANUP, \
   LANG_HOOKS_DEEP_UNSHARING, \
-  LANG_HOOKS_RUN_LANG_SELFTESTS \
+  LANG_HOOKS_RUN_LANG_SELFTESTS, \
+  LANG_HOOKS_GET_SUBSTRING_LOCATION \
 }
 
 #endif /* GCC_LANG_HOOKS_DEF_H */
diff --git a/gcc/langhooks.c b/gcc/langhooks.c
index 3256a9d..538d9f9 100644
--- a/gcc/langhooks.c
+++ b/gcc/langhooks.c
@@ -693,6 +693,14 @@ lhd_enum_underlying_base_type (const_tree enum_type)
 					 TYPE_UNSIGNED (enum_type));
 }
 
+/* Default implementation of LANG_HOOKS_GET_SUBSTRING_LOCATION.  */
+
+const char *
+lhd_get_substring_location (const substring_loc &, location_t *)
+{
+  return "unimplemented";
+}
+
 /* Returns true if the current lang_hooks represents the GNU C frontend.  */
 
 bool
diff --git a/gcc/langhooks.h b/gcc/langhooks.h
index 44c258e..c109c8c 100644
--- a/gcc/langhooks.h
+++ b/gcc/langhooks.h
@@ -34,6 +34,8 @@ typedef void (*lang_print_tree_hook) (FILE *, tree, int indent);
 enum classify_record
   { RECORD_IS_STRUCT, RECORD_IS_CLASS, RECORD_IS_INTERFACE };
 
+class substring_loc;
+
 /* The following hooks are documented in langhooks.c.  Must not be
    NULL.  */
 
@@ -513,6 +515,13 @@ struct lang_hooks
   /* Run all lang-specific selftests.  */
   void (*run_lang_selftests) (void);
 
+  /* Attempt to determine the source location of the substring.
+     If successful, return NULL and write the source location to *OUT_LOC.
+     Otherwise return an error message.  Error messages are intended
+     for GCC developers (to help debugging) rather than for end-users.  */
+  const char *(*get_substring_location) (const substring_loc &,
+					 location_t *out_loc);
+
   /* Whenever you add entries here, make sure you adjust langhooks-def.h
      and langhooks.c accordingly.  */
 };
diff --git a/gcc/substring-locations.c b/gcc/substring-locations.c
new file mode 100644
index 0000000..60bf1b0
--- /dev/null
+++ b/gcc/substring-locations.c
@@ -0,0 +1,195 @@
+/* Source locations within string literals.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "diagnostic.h"
+#include "cpplib.h"
+#include "tree.h"
+#include "langhooks.h"
+#include "substring-locations.h"
+
+/* Emit a warning governed by option OPT, using GMSGID as the format
+   string and AP as its arguments.
+
+   Attempt to obtain precise location information within a string
+   literal from FMT_LOC.
+
+   Case 1: if substring location is available, and is within the range of
+   the format string itself, the primary location of the
+   diagnostic is the substring range obtained from FMT_LOC, with the
+   caret at the *end* of the substring range.
+
+   For example:
+
+     test.c:90:10: warning: problem with '%i' here [-Wformat=]
+     printf ("hello %i", msg);
+                    ~^
+
+   Case 2: if the substring location is available, but is not within
+   the range of the format string, the primary location is that of the
+   format string, and an note is emitted showing the substring location.
+
+   For example:
+     test.c:90:10: warning: problem with '%i' here [-Wformat=]
+     printf("hello " INT_FMT " world", msg);
+            ^~~~~~~~~~~~~~~~~~~~~~~~~
+     test.c:19: note: format string is defined here
+     #define INT_FMT "%i"
+                      ~^
+
+   Case 3: if precise substring information is unavailable, the primary
+   location is that of the whole string passed to FMT_LOC's constructor.
+   For example:
+
+     test.c:90:10: warning: problem with '%i' here [-Wformat=]
+     printf(fmt, msg);
+            ^~~
+
+   For each of cases 1-3, if param_range is non-NULL, then it is used
+   as a secondary range within the warning.  For example, here it
+   is used with case 1:
+
+     test.c:90:16: warning: '%s' here but arg 2 has 'long' type [-Wformat=]
+     printf ("foo %s bar", long_i + long_j);
+                  ~^       ~~~~~~~~~~~~~~~
+
+   and here with case 2:
+
+     test.c:90:16: warning: '%s' here but arg 2 has 'long' type [-Wformat=]
+     printf ("foo " STR_FMT " bar", long_i + long_j);
+             ^~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~~~~
+     test.c:89:16: note: format string is defined here
+     #define STR_FMT "%s"
+                      ~^
+
+   and with case 3:
+
+     test.c:90:10: warning: '%i' here, but arg 2 is "const char *' [-Wformat=]
+     printf(fmt, msg);
+            ^~~  ~~~
+
+   If CORRECTED_SUBSTRING is non-NULL, use it for cases 1 and 2 to provide
+   a fix-it hint, suggesting that it should replace the text within the
+   substring range.  For example:
+
+     test.c:90:10: warning: problem with '%i' here [-Wformat=]
+     printf ("hello %i", msg);
+                    ~^
+                    %s
+
+   Return true if a warning was emitted, false otherwise.  */
+
+ATTRIBUTE_GCC_DIAG (5,0)
+bool
+format_warning_va (const substring_loc &fmt_loc,
+		   const source_range *param_range,
+		   const char *corrected_substring,
+		   int opt, const char *gmsgid, va_list *ap)
+{
+  bool substring_within_range = false;
+  location_t primary_loc;
+  location_t fmt_substring_loc = UNKNOWN_LOCATION;
+  source_range fmt_loc_range
+    = get_range_from_loc (line_table, fmt_loc.get_fmt_string_loc ());
+  const char *err = fmt_loc.get_location (&fmt_substring_loc);
+  source_range fmt_substring_range
+    = get_range_from_loc (line_table, fmt_substring_loc);
+  if (err)
+    /* Case 3: unable to get substring location.  */
+    primary_loc = fmt_loc.get_fmt_string_loc ();
+  else
+    {
+      if (fmt_substring_range.m_start >= fmt_loc_range.m_start
+	  && fmt_substring_range.m_finish <= fmt_loc_range.m_finish)
+	/* Case 1.  */
+	{
+	  substring_within_range = true;
+	  primary_loc = fmt_substring_loc;
+	}
+      else
+	/* Case 2.  */
+	{
+	  substring_within_range = false;
+	  primary_loc = fmt_loc.get_fmt_string_loc ();
+	}
+    }
+
+  rich_location richloc (line_table, primary_loc);
+
+  if (param_range)
+    {
+      location_t param_loc = make_location (param_range->m_start,
+					    param_range->m_start,
+					    param_range->m_finish);
+      richloc.add_range (param_loc, false);
+    }
+
+  if (!err && corrected_substring && substring_within_range)
+    richloc.add_fixit_replace (fmt_substring_range, corrected_substring);
+
+  diagnostic_info diagnostic;
+  diagnostic_set_info (&diagnostic, gmsgid, ap, &richloc, DK_WARNING);
+  diagnostic.option_index = opt;
+  bool warned = report_diagnostic (&diagnostic);
+
+  if (!err && fmt_substring_loc && !substring_within_range)
+    /* Case 2.  */
+    if (warned)
+      {
+	rich_location substring_richloc (line_table, fmt_substring_loc);
+	if (corrected_substring)
+	  substring_richloc.add_fixit_replace (fmt_substring_range,
+					       corrected_substring);
+	inform_at_rich_loc (&substring_richloc,
+			    "format string is defined here");
+      }
+
+  return warned;
+}
+
+/* Variadic call to format_warning_va.  */
+
+bool
+format_warning_at_substring (const substring_loc &fmt_loc,
+			     const source_range *param_range,
+			     const char *corrected_substring,
+			     int opt, const char *gmsgid, ...)
+{
+  va_list ap;
+  va_start (ap, gmsgid);
+  bool warned = format_warning_va (fmt_loc, param_range, corrected_substring,
+				   opt, gmsgid, &ap);
+  va_end (ap);
+
+  return warned;
+}
+
+/* Attempt to determine the source location of the substring.
+   If successful, return NULL and write the source location to *OUT_LOC.
+   Otherwise return an error message.  Error messages are intended
+   for GCC developers (to help debugging) rather than for end-users.  */
+
+const char *
+substring_loc::get_location (location_t *out_loc) const
+{
+  gcc_assert (out_loc);
+  return lang_hooks.get_substring_location (*this, out_loc);
+}
diff --git a/gcc/substring-locations.h b/gcc/substring-locations.h
index f839c74..bb0de4f 100644
--- a/gcc/substring-locations.h
+++ b/gcc/substring-locations.h
@@ -20,6 +20,73 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_SUBSTRING_LOCATIONS_H
 #define GCC_SUBSTRING_LOCATIONS_H
 
+#include <cpplib.h>
+
+/* libcpp can calculate location information about a range of characters
+   within a string literal, but doing so is non-trivial.
+
+   This class encapsulates such a source location, so that it can be
+   passed around (e.g. within c-format.c).  It is effectively a deferred
+   call into libcpp.
+
+   If needed by a diagnostic, the actual location can be calculated by
+   calling the get_location method.  This calls a langhook, since
+   this is inherently frontend-specific (it reparses the strings to
+   get the location information on-demand, rather than storing the
+   location information in the initial lex for every string).
+
+   substring_loc::get_location returns NULL if it succeeds, or an
+   error message if it fails.  Error messages are intended for GCC
+   developers (to help debugging) rather than for end-users.  */
+
+class substring_loc
+{
+ public:
+  /* Constructor.  FMT_STRING_LOC is the location of the string as
+     a whole.  STRING_TYPE is the type of the string.  It should be an
+     ARRAY_TYPE of INTEGER_TYPE, or a POINTER_TYPE to such an ARRAY_TYPE.
+     CARET_IDX, START_IDX, and END_IDX are offsets from the start
+     of the string data.  */
+  substring_loc (location_t fmt_string_loc, tree string_type,
+		 int caret_idx, int start_idx, int end_idx)
+  : m_fmt_string_loc (fmt_string_loc), m_string_type (string_type),
+    m_caret_idx (caret_idx), m_start_idx (start_idx), m_end_idx (end_idx) {}
+
+  void set_caret_index (int caret_idx) { m_caret_idx = caret_idx; }
+
+  const char *get_location (location_t *out_loc) const;
+
+  location_t get_fmt_string_loc () const { return m_fmt_string_loc; }
+  tree get_string_type () const { return m_string_type; }
+  int get_caret_idx () const { return m_caret_idx; }
+  int get_start_idx () const { return m_start_idx; }
+  int get_end_idx () const { return m_end_idx; }
+
+ private:
+  location_t m_fmt_string_loc;
+  tree m_string_type;
+  int m_caret_idx;
+  int m_start_idx;
+  int m_end_idx;
+};
+
+/* Functions for emitting a warning about a format string.  */
+
+extern bool format_warning_va (const substring_loc &fmt_loc,
+			       const source_range *param_range,
+			       const char *corrected_substring,
+			       int opt, const char *gmsgid, va_list *ap)
+  ATTRIBUTE_GCC_DIAG (5,0);
+
+extern bool format_warning_at_substring (const substring_loc &fmt_loc,
+					 const source_range *param_range,
+					 const char *corrected_substring,
+					 int opt, const char *gmsgid, ...)
+  ATTRIBUTE_GCC_DIAG (5,0);
+
+/* Implementation detail, for use when implementing
+   LANG_HOOKS_GET_SUBSTRING_LOCATION.  */
+
 extern const char *get_source_location_for_substring (cpp_reader *pfile,
 						      string_concat_db *concats,
 						      location_t strloc,
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literals.c b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literals.c
index dff999c..99a504d 100644
--- a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literals.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literals.c
@@ -33,6 +33,7 @@
 #include "print-tree.h"
 #include "cpplib.h"
 #include "c-family/c-pragma.h"
+#include "substring-locations.h"
 
 int plugin_is_GPL_compatible;
 
-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] Fixups for Martin's gimple-ssa-sprintf.c patch
  2016-08-25 15:40 [PATCH] Move class substring_loc from c-family into gcc David Malcolm
@ 2016-08-25 15:54 ` David Malcolm
  2016-08-25 16:30   ` Martin Sebor
  2016-09-02 23:57 ` [PATCH] Move class substring_loc from c-family into gcc Martin Sebor
  1 sibling, 1 reply; 11+ messages in thread
From: David Malcolm @ 2016-08-25 15:54 UTC (permalink / raw)
  To: Martin Sebor; +Cc: gcc-patches, David Malcolm

Martin: here are the fixups for your patch I needed to apply to make
it work with mine.  I couldn't actually get any of your existing test
cases to emit locations within the string literals, due to them all
being embedded in macro expansions (possibly relating to PR c/77328),
so I added a simple testcase using -fdiagnostics-show-caret, which
does successfully show a range within the string.

Posting in the hope that it's helpful; I haven't attempted a bootstrap
with it.

gcc/ChangeLog:
	* gimple-ssa-sprintf.c (class substring_loc): Delete.
	(g_string_concat_db): Delete.
	(substring_loc::get_location): Delete.
	(format_warning_va): Delete.
	(format_directive): Update use of substring_loc ctor to supply
	type of format string.
	(add_bytes): Likewise.

gcc/testsuite/ChangeLog:
	* gcc.dg/tree-ssa/builtin-sprintf-warn-4.c: New test case.
---
 gcc/gimple-ssa-sprintf.c                           | 129 +--------------------
 .../gcc.dg/tree-ssa/builtin-sprintf-warn-4.c       |  17 +++
 2 files changed, 21 insertions(+), 125 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index ff8697c..fa63047 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -349,129 +349,6 @@ get_format_string (tree format, location_t *ploc)
   return fmtstr;
 }
 
-/* Describes a location range outlining a substring within a string
-   literal.  */
-
-class substring_loc
-{
- public:
-  substring_loc (location_t fmt_string_loc,
-		 int caret_idx, int start_idx, int end_idx)
-    : m_fmt_string_loc (fmt_string_loc),
-    m_caret_idx (caret_idx), m_start_idx (start_idx), m_end_idx (end_idx)
-  { }
-
-  const char *get_location (location_t *out_loc) const;
-
-  location_t get_fmt_string_loc () const { return m_fmt_string_loc; }
-
- private:
-  location_t m_fmt_string_loc;
-  int m_caret_idx;
-  int m_start_idx;
-  int m_end_idx;
-};
-
-/* The global record of string concatentations, for use in extracting
-   locations within string literals.  */
-
-GTY(()) string_concat_db *g_string_concat_db;
-
-/* Attempt to determine the source location of the substring.
-   If successful, return NULL and write the source location to *OUT_LOC.
-   Otherwise return an error message.  Error messages are intended
-   for GCC developers (to help debugging) rather than for end-users.  */
-
-const char *
-substring_loc::get_location (location_t *out_loc) const
-{
-  gcc_assert (out_loc);
-
-  if (!g_string_concat_db)
-    g_string_concat_db
-      = new (ggc_alloc <string_concat_db> ()) string_concat_db ();
-
-  static struct cpp_reader* parse_in;
-  if (!parse_in)
-    {
-      /* Create and initialize a preprocessing reader.  */
-      parse_in = cpp_create_reader (CLK_GNUC99, ident_hash, line_table);
-      cpp_init_iconv (parse_in);
-    }
-
-  return get_source_location_for_substring (parse_in, g_string_concat_db,
-					    m_fmt_string_loc, CPP_STRING,
-					    m_caret_idx, m_start_idx, m_end_idx,
-					    out_loc);
-}
-
-static ATTRIBUTE_GCC_DIAG (5,0) bool
-format_warning_va (const substring_loc &fmt_loc,
-		   const source_range *param_range,
-		   const char *corrected_substring,
-		   int opt, const char *gmsgid, va_list *ap)
-{
-  bool substring_within_range = false;
-  location_t primary_loc;
-  location_t fmt_substring_loc = UNKNOWN_LOCATION;
-  source_range fmt_loc_range
-    = get_range_from_loc (line_table, fmt_loc.get_fmt_string_loc ());
-  const char *err = fmt_loc.get_location (&fmt_substring_loc);
-  source_range fmt_substring_range
-    = get_range_from_loc (line_table, fmt_substring_loc);
-  if (err)
-    /* Case 3: unable to get substring location.  */
-    primary_loc = fmt_loc.get_fmt_string_loc ();
-  else
-    {
-      if (fmt_substring_range.m_start >= fmt_loc_range.m_start
-	  && fmt_substring_range.m_finish <= fmt_loc_range.m_finish)
-	/* Case 1.  */
-	{
-	  substring_within_range = true;
-	  primary_loc = fmt_substring_loc;
-	}
-      else
-	/* Case 2.  */
-	{
-	  substring_within_range = false;
-	  primary_loc = fmt_loc.get_fmt_string_loc ();
-	}
-    }
-
-  rich_location richloc (line_table, primary_loc);
-
-  if (param_range)
-    {
-      location_t param_loc = make_location (param_range->m_start,
-					    param_range->m_start,
-					    param_range->m_finish);
-      richloc.add_range (param_loc, false);
-    }
-
-  if (!err && corrected_substring && substring_within_range)
-    richloc.add_fixit_replace (fmt_substring_range, corrected_substring);
-
-  diagnostic_info diagnostic;
-  diagnostic_set_info (&diagnostic, gmsgid, ap, &richloc, DK_WARNING);
-  diagnostic.option_index = opt;
-  bool warned = report_diagnostic (&diagnostic);
-
-  if (!err && fmt_substring_loc && !substring_within_range)
-    /* Case 2.  */
-    if (warned)
-      {
-	rich_location substring_richloc (line_table, fmt_substring_loc);
-	if (corrected_substring)
-	  substring_richloc.add_fixit_replace (fmt_substring_range,
-					       corrected_substring);
-	inform_at_rich_loc (&substring_richloc,
-			    "format string is defined here");
-      }
-
-  return warned;
-}
-
 static ATTRIBUTE_GCC_DIAG (5,0) bool
 fmtwarn (const substring_loc &fmt_loc,
 	 const source_range *param_range,
@@ -1695,7 +1572,8 @@ format_directive (const pass_sprintf_length::call_info &info,
 
   /* Create a location for the whole directive from the % to the format
      specifier.  */
-  substring_loc dirloc (info.fmtloc, offset, offset, offset + cvtlen - 1);
+  substring_loc dirloc (info.fmtloc, TREE_TYPE (info.format), offset, offset,
+			offset + cvtlen - 1);
 
   /* Also create a location range for the argument if possible.
      This doesn't work for integer literals or function calls.  */
@@ -1920,7 +1798,8 @@ add_bytes (const pass_sprintf_length::call_info &info,
       size_t len = strlen (info.fmtstr + off);
 
       substring_loc loc
-	(info.fmtloc, off - !len, len ? off : 0, off + len - !!len);
+	(info.fmtloc, TREE_TYPE (info.format), off - !len, len ? off : 0,
+	 off + len - !!len);
 
       /* Is the output of the last directive the result of the argument
 	 being within a range whose lower bound would fit in the buffer
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c
new file mode 100644
index 0000000..b10a9cd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-4.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-std=c99 -O2 -Wformat -Wformat-length=1 -fdiagnostics-show-caret" } */
+
+void test (void)
+{
+  char a[4];
+  __builtin_sprintf (a, "abc%sghi", "def"); /* { dg-warning "29: .%s. directive writing 3 bytes into a region of size 1" } */
+  /* { dg-begin-multiline-output "" }
+   __builtin_sprintf (a, "abc%sghi", "def");
+                             ^~      ~~~~~
+     { dg-end-multiline-output "" } */
+
+  /* { dg-begin-multiline-output "" }
+   __builtin_sprintf (a, "abc%sghi", "def");
+   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+     { dg-end-multiline-output "" } */
+}
-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fixups for Martin's gimple-ssa-sprintf.c patch
  2016-08-25 15:54 ` [PATCH] Fixups for Martin's gimple-ssa-sprintf.c patch David Malcolm
@ 2016-08-25 16:30   ` Martin Sebor
  2016-08-31 16:23     ` Martin Sebor
  0 siblings, 1 reply; 11+ messages in thread
From: Martin Sebor @ 2016-08-25 16:30 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

On 08/25/2016 10:23 AM, David Malcolm wrote:
> Martin: here are the fixups for your patch I needed to apply to make
> it work with mine.  I couldn't actually get any of your existing test
> cases to emit locations within the string literals, due to them all
> being embedded in macro expansions (possibly relating to PR c/77328),
> so I added a simple testcase using -fdiagnostics-show-caret, which
> does successfully show a range within the string.
>
> Posting in the hope that it's helpful; I haven't attempted a bootstrap
> with it.

Thanks!  Let me go through it, test it and get back to you when I'm
done (sometime next week after I get back from PTO).

Martin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fixups for Martin's gimple-ssa-sprintf.c patch
  2016-08-25 16:30   ` Martin Sebor
@ 2016-08-31 16:23     ` Martin Sebor
  2016-08-31 16:27       ` David Malcolm
  0 siblings, 1 reply; 11+ messages in thread
From: Martin Sebor @ 2016-08-31 16:23 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

On 08/25/2016 10:30 AM, Martin Sebor wrote:
> On 08/25/2016 10:23 AM, David Malcolm wrote:
>> Martin: here are the fixups for your patch I needed to apply to make
>> it work with mine.  I couldn't actually get any of your existing test
>> cases to emit locations within the string literals, due to them all
>> being embedded in macro expansions (possibly relating to PR c/77328),
>> so I added a simple testcase using -fdiagnostics-show-caret, which
>> does successfully show a range within the string.
>>
>> Posting in the hope that it's helpful; I haven't attempted a bootstrap
>> with it.

I've tried the patch but the changes don't compile because
substring_loc is not declared.  I see the class defined in
c-family/c-common.h which I can't include here in the middle
end.  Am I missing some another patch?

Thanks
Martin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fixups for Martin's gimple-ssa-sprintf.c patch
  2016-08-31 16:23     ` Martin Sebor
@ 2016-08-31 16:27       ` David Malcolm
  2016-08-31 16:48         ` Martin Sebor
  0 siblings, 1 reply; 11+ messages in thread
From: David Malcolm @ 2016-08-31 16:27 UTC (permalink / raw)
  To: Martin Sebor; +Cc: gcc-patches

On Wed, 2016-08-31 at 10:23 -0600, Martin Sebor wrote:
> On 08/25/2016 10:30 AM, Martin Sebor wrote:
> > On 08/25/2016 10:23 AM, David Malcolm wrote:
> > > Martin: here are the fixups for your patch I needed to apply to
> > > make
> > > it work with mine.  I couldn't actually get any of your existing
> > > test
> > > cases to emit locations within the string literals, due to them
> > > all
> > > being embedded in macro expansions (possibly relating to PR
> > > c/77328),
> > > so I added a simple testcase using -fdiagnostics-show-caret,
> > > which
> > > does successfully show a range within the string.
> > > 
> > > Posting in the hope that it's helpful; I haven't attempted a
> > > bootstrap
> > > with it.
> 
> I've tried the patch but the changes don't compile because
> substring_loc is not declared.  I see the class defined in
> c-family/c-common.h which I can't include here in the middle
> end.  Am I missing some another patch?

The fixup patch is on top of
  https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01811.html
which moves class substring_loc to gcc/substring-locations.h.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Fixups for Martin's gimple-ssa-sprintf.c patch
  2016-08-31 16:27       ` David Malcolm
@ 2016-08-31 16:48         ` Martin Sebor
  0 siblings, 0 replies; 11+ messages in thread
From: Martin Sebor @ 2016-08-31 16:48 UTC (permalink / raw)
  To: David Malcolm; +Cc: gcc-patches

On 08/31/2016 10:26 AM, David Malcolm wrote:
> On Wed, 2016-08-31 at 10:23 -0600, Martin Sebor wrote:
>> On 08/25/2016 10:30 AM, Martin Sebor wrote:
>>> On 08/25/2016 10:23 AM, David Malcolm wrote:
>>>> Martin: here are the fixups for your patch I needed to apply to
>>>> make
>>>> it work with mine.  I couldn't actually get any of your existing
>>>> test
>>>> cases to emit locations within the string literals, due to them
>>>> all
>>>> being embedded in macro expansions (possibly relating to PR
>>>> c/77328),
>>>> so I added a simple testcase using -fdiagnostics-show-caret,
>>>> which
>>>> does successfully show a range within the string.
>>>>
>>>> Posting in the hope that it's helpful; I haven't attempted a
>>>> bootstrap
>>>> with it.
>>
>> I've tried the patch but the changes don't compile because
>> substring_loc is not declared.  I see the class defined in
>> c-family/c-common.h which I can't include here in the middle
>> end.  Am I missing some another patch?
>
> The fixup patch is on top of
>    https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01811.html
> which moves class substring_loc to gcc/substring-locations.h.

Great!  With that patch my pass builds fine, all my tests pass,
and based on a quick comparison the output of the diagnostics
looks unchanged.

Thanks
Martin

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Move class substring_loc from c-family into gcc
  2016-08-25 15:40 [PATCH] Move class substring_loc from c-family into gcc David Malcolm
  2016-08-25 15:54 ` [PATCH] Fixups for Martin's gimple-ssa-sprintf.c patch David Malcolm
@ 2016-09-02 23:57 ` Martin Sebor
  2016-09-03  3:03   ` David Malcolm
  2016-09-03  9:22   ` Manuel López-Ibáñez
  1 sibling, 2 replies; 11+ messages in thread
From: Martin Sebor @ 2016-09-02 23:57 UTC (permalink / raw)
  To: David Malcolm, gcc-patches

I've successfully tested the patch below by incorporating it into
the -Wformat-length pass I've been working on.  I'd like to submit
the latest and hopefully close to final version of my work for
review without duplicating this code and it might be helpful if
it was possible to build my latest patch without having to find
and install this one first.  Could someone review and approve
David's patch?

Thanks
Martin

On 08/25/2016 10:09 AM, David Malcolm wrote:
> This patch is intended to help Martin's new middle-end sprintf format
> warning.
>
> It moves most of the on-demand locations-within-strings
> code in c-family into gcc, into a new substring-locations.c file to
> go with substring-locations.h: class substring_loc, representing
> a source caret and source range within a STRING_CST, along with
> format_warning for handling emitting a warning relating to it.
>
> The actual work of reconstructing the source locations within
> a string seems inherently frontend-specific, so the patch introduces a
> langhook to do this, implementing it for C using the existing code
> (and thus hiding the details of string-concatenation as being
> specific to the c-family).  Attempts to get substring location
> from other frontends will fail, and the format_warning_* calls
> handle such failures gracefully by simply using the location of
> the string as a whole for the warning.
>
> I've tested it with Martin's gimple-ssa-sprintf patch, and I'm able to
> emit carets and underlines in the correct places in C code from the
> middle end with this approach (patch to follow).
>
> Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu.
>
> OK for trunk?
>
> gcc/ChangeLog:
> 	* Makefile.in (OBJS): Add substring-locations.o.
> 	* langhooks-def.h (class substring_loc): New forward decl.
> 	(lhd_get_substring_location): New decl.
> 	(LANG_HOOKS_GET_SUBSTRING_LOCATION): New macro.
> 	(LANG_HOOKS_INITIALIZER): Add LANG_HOOKS_GET_SUBSTRING_LOCATION.
> 	* langhooks.c (lhd_get_substring_location): New function.
> 	* langhooks.h (class substring_loc): New forward decl.
> 	(struct lang_hooks): Add field get_substring_location.
> 	* substring-locations.c: New file, taking definition of
> 	format_warning_va and format_warning_at_substring from
> 	c-family/c-format.c, making them non-static.
> 	* substring-locations.h: Include <cpplib.h>.
> 	(class substring_loc): Move class here from c-family/c-common.h.
> 	Add comments.
> 	(format_warning_va): New decl.
> 	(format_warning_at_substring): New decl.
> 	(get_source_location_for_substring): Add comment.
>
> gcc/c-family/ChangeLog:
> 	* c-common.c (get_cpp_ttype_from_string_type): Handle being passed
> 	a POINTER_TYPE.
> 	(substring_loc::get_location): Move to substring-locations.c,
> 	keeping implementation as...
> 	(c_get_substring_location): New function, from the above, reworked
> 	to use accessors rather than member lookup.
> 	* c-common.h (class substring_loc): Move to substring-locations.h,
> 	replacing with a forward decl.
> 	(c_get_substring_location): New decl.
> 	* c-format.c: Include "substring-locations.h".
> 	(format_warning_va): Move to substring-locations.c.
> 	(format_warning_at_substring): Likewise.
>
> gcc/c/ChangeLog:
> 	* c-lang.c (LANG_HOOKS_GET_SUBSTRING_LOCATION): Use
> 	c_get_substring_location for this new langhook.
>
> gcc/testsuite/ChangeLog:
> 	* gcc.dg/plugin/diagnostic_plugin_test_string_literals.c: Include
> 	"substring-locations.h".
> ---
>   gcc/Makefile.in                                    |   1 +
>   gcc/c-family/c-common.c                            |  23 +--
>   gcc/c-family/c-common.h                            |  32 +---
>   gcc/c-family/c-format.c                            | 157 +----------------
>   gcc/c/c-lang.c                                     |   3 +
>   gcc/langhooks-def.h                                |   8 +-
>   gcc/langhooks.c                                    |   8 +
>   gcc/langhooks.h                                    |   9 +
>   gcc/substring-locations.c                          | 195 +++++++++++++++++++++
>   gcc/substring-locations.h                          |  67 +++++++
>   .../diagnostic_plugin_test_string_literals.c       |   1 +
>   11 files changed, 308 insertions(+), 196 deletions(-)
>   create mode 100644 gcc/substring-locations.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 1b7464b..769efcf 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1444,6 +1444,7 @@ OBJS = \
>   	store-motion.o \
>   	streamer-hooks.o \
>   	stringpool.o \
> +	substring-locations.o \
>   	target-globals.o \
>   	targhooks.o \
>   	timevar.o \
> diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
> index 3feb910..f3e44c2 100644
> --- a/gcc/c-family/c-common.c
> +++ b/gcc/c-family/c-common.c
> @@ -1122,6 +1122,9 @@ static enum cpp_ttype
>   get_cpp_ttype_from_string_type (tree string_type)
>   {
>     gcc_assert (string_type);
> +  if (TREE_CODE (string_type) == POINTER_TYPE)
> +    string_type = TREE_TYPE (string_type);
> +
>     if (TREE_CODE (string_type) != ARRAY_TYPE)
>       return CPP_OTHER;
>
> @@ -1148,23 +1151,23 @@ get_cpp_ttype_from_string_type (tree string_type)
>
>   GTY(()) string_concat_db *g_string_concat_db;
>
> -/* Attempt to determine the source location of the substring.
> -   If successful, return NULL and write the source location to *OUT_LOC.
> -   Otherwise return an error message.  Error messages are intended
> -   for GCC developers (to help debugging) rather than for end-users.  */
> +/* Implementation of LANG_HOOKS_GET_SUBSTRING_LOCATION.  */
>
>   const char *
> -substring_loc::get_location (location_t *out_loc) const
> +c_get_substring_location (const substring_loc &substr_loc,
> +			  location_t *out_loc)
>   {
> -  gcc_assert (out_loc);
> -
> -  enum cpp_ttype tok_type = get_cpp_ttype_from_string_type (m_string_type);
> +  enum cpp_ttype tok_type
> +    = get_cpp_ttype_from_string_type (substr_loc.get_string_type ());
>     if (tok_type == CPP_OTHER)
>       return "unrecognized string type";
>
>     return get_source_location_for_substring (parse_in, g_string_concat_db,
> -					    m_fmt_string_loc, tok_type,
> -					    m_caret_idx, m_start_idx, m_end_idx,
> +					    substr_loc.get_fmt_string_loc (),
> +					    tok_type,
> +					    substr_loc.get_caret_idx (),
> +					    substr_loc.get_start_idx (),
> +					    substr_loc.get_end_idx (),
>   					    out_loc);
>   }
>
> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> index bc22baa..4470b4f 100644
> --- a/gcc/c-family/c-common.h
> +++ b/gcc/c-family/c-common.h
> @@ -1131,35 +1131,9 @@ extern const char *cb_get_suggestion (cpp_reader *, const char *,
>
>   extern GTY(()) string_concat_db *g_string_concat_db;
>
> -/* libcpp can calculate location information about a range of characters
> -   within a string literal, but doing so is non-trivial.
> -
> -   This class encapsulates such a source location, so that it can be
> -   passed around (e.g. within c-format.c).  It is effectively a deferred
> -   call into libcpp.  If needed by a diagnostic, the actual source_range
> -   can be calculated by calling the get_range method.  */
> -
> -class substring_loc
> -{
> - public:
> -  substring_loc (location_t fmt_string_loc, tree string_type,
> -		 int caret_idx, int start_idx, int end_idx)
> -  : m_fmt_string_loc (fmt_string_loc), m_string_type (string_type),
> -    m_caret_idx (caret_idx), m_start_idx (start_idx), m_end_idx (end_idx) {}
> -
> -  void set_caret_index (int caret_idx) { m_caret_idx = caret_idx; }
> -
> -  const char *get_location (location_t *out_loc) const;
> -
> -  location_t get_fmt_string_loc () const { return m_fmt_string_loc; }
> -
> - private:
> -  location_t m_fmt_string_loc;
> -  tree m_string_type;
> -  int m_caret_idx;
> -  int m_start_idx;
> -  int m_end_idx;
> -};
> +class substring_loc;
> +extern const char *c_get_substring_location (const substring_loc &substr_loc,
> +					     location_t *out_loc);
>
>   /* In c-gimplify.c  */
>   extern void c_genericize (tree);
> diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
> index d373819..ac40ac3 100644
> --- a/gcc/c-family/c-format.c
> +++ b/gcc/c-family/c-format.c
> @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
>   #include "langhooks.h"
>   #include "c-format.h"
>   #include "diagnostic.h"
> +#include "substring-locations.h"
>   #include "selftest.h"
>
>   /* Handle attributes associated with format checking.  */
> @@ -67,162 +68,6 @@ static int first_target_format_type;
>   static const char *format_name (int format_num);
>   static int format_flags (int format_num);
>
> -/* Emit a warning governed by option OPT, using GMSGID as the format
> -   string and AP as its arguments.
> -
> -   Attempt to obtain precise location information within a string
> -   literal from FMT_LOC.
> -
> -   Case 1: if substring location is available, and is within the range of
> -   the format string itself, the primary location of the
> -   diagnostic is the substring range obtained from FMT_LOC, with the
> -   caret at the *end* of the substring range.
> -
> -   For example:
> -
> -     test.c:90:10: warning: problem with '%i' here [-Wformat=]
> -     printf ("hello %i", msg);
> -                    ~^
> -
> -   Case 2: if the substring location is available, but is not within
> -   the range of the format string, the primary location is that of the
> -   format string, and an note is emitted showing the substring location.
> -
> -   For example:
> -     test.c:90:10: warning: problem with '%i' here [-Wformat=]
> -     printf("hello " INT_FMT " world", msg);
> -            ^~~~~~~~~~~~~~~~~~~~~~~~~
> -     test.c:19: note: format string is defined here
> -     #define INT_FMT "%i"
> -                      ~^
> -
> -   Case 3: if precise substring information is unavailable, the primary
> -   location is that of the whole string passed to FMT_LOC's constructor.
> -   For example:
> -
> -     test.c:90:10: warning: problem with '%i' here [-Wformat=]
> -     printf(fmt, msg);
> -            ^~~
> -
> -   For each of cases 1-3, if param_range is non-NULL, then it is used
> -   as a secondary range within the warning.  For example, here it
> -   is used with case 1:
> -
> -     test.c:90:16: warning: '%s' here but arg 2 has 'long' type [-Wformat=]
> -     printf ("foo %s bar", long_i + long_j);
> -                  ~^       ~~~~~~~~~~~~~~~
> -
> -   and here with case 2:
> -
> -     test.c:90:16: warning: '%s' here but arg 2 has 'long' type [-Wformat=]
> -     printf ("foo " STR_FMT " bar", long_i + long_j);
> -             ^~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~~~~
> -     test.c:89:16: note: format string is defined here
> -     #define STR_FMT "%s"
> -                      ~^
> -
> -   and with case 3:
> -
> -     test.c:90:10: warning: '%i' here, but arg 2 is "const char *' [-Wformat=]
> -     printf(fmt, msg);
> -            ^~~  ~~~
> -
> -   If CORRECTED_SUBSTRING is non-NULL, use it for cases 1 and 2 to provide
> -   a fix-it hint, suggesting that it should replace the text within the
> -   substring range.  For example:
> -
> -     test.c:90:10: warning: problem with '%i' here [-Wformat=]
> -     printf ("hello %i", msg);
> -                    ~^
> -                    %s
> -
> -   Return true if a warning was emitted, false otherwise.  */
> -
> -ATTRIBUTE_GCC_DIAG (5,0)
> -static bool
> -format_warning_va (const substring_loc &fmt_loc, source_range *param_range,
> -		   const char *corrected_substring,
> -		   int opt, const char *gmsgid, va_list *ap)
> -{
> -  bool substring_within_range = false;
> -  location_t primary_loc;
> -  location_t fmt_substring_loc = UNKNOWN_LOCATION;
> -  source_range fmt_loc_range
> -    = get_range_from_loc (line_table, fmt_loc.get_fmt_string_loc ());
> -  const char *err = fmt_loc.get_location (&fmt_substring_loc);
> -  source_range fmt_substring_range
> -    = get_range_from_loc (line_table, fmt_substring_loc);
> -  if (err)
> -    /* Case 3: unable to get substring location.  */
> -    primary_loc = fmt_loc.get_fmt_string_loc ();
> -  else
> -    {
> -      if (fmt_substring_range.m_start >= fmt_loc_range.m_start
> -	  && fmt_substring_range.m_finish <= fmt_loc_range.m_finish)
> -	/* Case 1.  */
> -	{
> -	  substring_within_range = true;
> -	  primary_loc = fmt_substring_loc;
> -	}
> -      else
> -	/* Case 2.  */
> -	{
> -	  substring_within_range = false;
> -	  primary_loc = fmt_loc.get_fmt_string_loc ();
> -	}
> -    }
> -
> -  rich_location richloc (line_table, primary_loc);
> -
> -  if (param_range)
> -    {
> -      location_t param_loc = make_location (param_range->m_start,
> -					    param_range->m_start,
> -					    param_range->m_finish);
> -      richloc.add_range (param_loc, false);
> -    }
> -
> -  if (!err && corrected_substring && substring_within_range)
> -    richloc.add_fixit_replace (fmt_substring_range, corrected_substring);
> -
> -  diagnostic_info diagnostic;
> -  diagnostic_set_info (&diagnostic, gmsgid, ap, &richloc, DK_WARNING);
> -  diagnostic.option_index = opt;
> -  bool warned = report_diagnostic (&diagnostic);
> -
> -  if (!err && fmt_substring_loc && !substring_within_range)
> -    /* Case 2.  */
> -    if (warned)
> -      {
> -	rich_location substring_richloc (line_table, fmt_substring_loc);
> -	if (corrected_substring)
> -	  substring_richloc.add_fixit_replace (fmt_substring_range,
> -					       corrected_substring);
> -	inform_at_rich_loc (&substring_richloc,
> -			    "format string is defined here");
> -      }
> -
> -  return warned;
> -}
> -
> -/* Variadic call to format_warning_va.  */
> -
> -ATTRIBUTE_GCC_DIAG (5,0)
> -static bool
> -format_warning_at_substring (const substring_loc &fmt_loc,
> -			     source_range *param_range,
> -			     const char *corrected_substring,
> -			     int opt, const char *gmsgid, ...)
> -{
> -  va_list ap;
> -  va_start (ap, gmsgid);
> -  bool warned = format_warning_va (fmt_loc, param_range, corrected_substring,
> -				   opt, gmsgid, &ap);
> -  va_end (ap);
> -
> -  return warned;
> -}
> -
>   /* Emit a warning as per format_warning_va, but construct the substring_loc
>      for the character at offset (CHAR_IDX - 1) within a string constant
>      FORMAT_STRING_CST at FMT_STRING_LOC.  */
> diff --git a/gcc/c/c-lang.c b/gcc/c/c-lang.c
> index b26be6a..b4096d0 100644
> --- a/gcc/c/c-lang.c
> +++ b/gcc/c/c-lang.c
> @@ -43,6 +43,9 @@ enum c_language_kind c_language = clk_c;
>   #define LANG_HOOKS_RUN_LANG_SELFTESTS selftest::run_c_tests
>   #endif /* #if CHECKING_P */
>
> +#undef LANG_HOOKS_GET_SUBSTRING_LOCATION
> +#define LANG_HOOKS_GET_SUBSTRING_LOCATION c_get_substring_location
> +
>   /* Each front end provides its own lang hook initializer.  */
>   struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;
>
> diff --git a/gcc/langhooks-def.h b/gcc/langhooks-def.h
> index 10d910c..cf5f91d 100644
> --- a/gcc/langhooks-def.h
> +++ b/gcc/langhooks-def.h
> @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
>   #include "hooks.h"
>
>   struct diagnostic_info;
> +class substring_loc;
>
>   /* Note to creators of new hooks:
>
> @@ -81,6 +82,9 @@ extern void lhd_omp_firstprivatize_type_sizes (struct gimplify_omp_ctx *,
>   					       tree);
>   extern bool lhd_omp_mappable_type (tree);
>
> +extern const char *lhd_get_substring_location (const substring_loc &,
> +					       location_t *out_loc);
> +
>   #define LANG_HOOKS_NAME			"GNU unknown"
>   #define LANG_HOOKS_IDENTIFIER_SIZE	sizeof (struct lang_identifier)
>   #define LANG_HOOKS_INIT			hook_bool_void_false
> @@ -121,6 +125,7 @@ extern bool lhd_omp_mappable_type (tree);
>   #define LANG_HOOKS_EH_USE_CXA_END_CLEANUP	false
>   #define LANG_HOOKS_DEEP_UNSHARING	false
>   #define LANG_HOOKS_RUN_LANG_SELFTESTS   lhd_do_nothing
> +#define LANG_HOOKS_GET_SUBSTRING_LOCATION lhd_get_substring_location
>
>   /* Attribute hooks.  */
>   #define LANG_HOOKS_ATTRIBUTE_TABLE		NULL
> @@ -323,7 +328,8 @@ extern void lhd_end_section (void);
>     LANG_HOOKS_BLOCK_MAY_FALLTHRU, \
>     LANG_HOOKS_EH_USE_CXA_END_CLEANUP, \
>     LANG_HOOKS_DEEP_UNSHARING, \
> -  LANG_HOOKS_RUN_LANG_SELFTESTS \
> +  LANG_HOOKS_RUN_LANG_SELFTESTS, \
> +  LANG_HOOKS_GET_SUBSTRING_LOCATION \
>   }
>
>   #endif /* GCC_LANG_HOOKS_DEF_H */
> diff --git a/gcc/langhooks.c b/gcc/langhooks.c
> index 3256a9d..538d9f9 100644
> --- a/gcc/langhooks.c
> +++ b/gcc/langhooks.c
> @@ -693,6 +693,14 @@ lhd_enum_underlying_base_type (const_tree enum_type)
>   					 TYPE_UNSIGNED (enum_type));
>   }
>
> +/* Default implementation of LANG_HOOKS_GET_SUBSTRING_LOCATION.  */
> +
> +const char *
> +lhd_get_substring_location (const substring_loc &, location_t *)
> +{
> +  return "unimplemented";
> +}
> +
>   /* Returns true if the current lang_hooks represents the GNU C frontend.  */
>
>   bool
> diff --git a/gcc/langhooks.h b/gcc/langhooks.h
> index 44c258e..c109c8c 100644
> --- a/gcc/langhooks.h
> +++ b/gcc/langhooks.h
> @@ -34,6 +34,8 @@ typedef void (*lang_print_tree_hook) (FILE *, tree, int indent);
>   enum classify_record
>     { RECORD_IS_STRUCT, RECORD_IS_CLASS, RECORD_IS_INTERFACE };
>
> +class substring_loc;
> +
>   /* The following hooks are documented in langhooks.c.  Must not be
>      NULL.  */
>
> @@ -513,6 +515,13 @@ struct lang_hooks
>     /* Run all lang-specific selftests.  */
>     void (*run_lang_selftests) (void);
>
> +  /* Attempt to determine the source location of the substring.
> +     If successful, return NULL and write the source location to *OUT_LOC.
> +     Otherwise return an error message.  Error messages are intended
> +     for GCC developers (to help debugging) rather than for end-users.  */
> +  const char *(*get_substring_location) (const substring_loc &,
> +					 location_t *out_loc);
> +
>     /* Whenever you add entries here, make sure you adjust langhooks-def.h
>        and langhooks.c accordingly.  */
>   };
> diff --git a/gcc/substring-locations.c b/gcc/substring-locations.c
> new file mode 100644
> index 0000000..60bf1b0
> --- /dev/null
> +++ b/gcc/substring-locations.c
> @@ -0,0 +1,195 @@
> +/* Source locations within string literals.
> +   Copyright (C) 2016 Free Software Foundation, Inc.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "diagnostic.h"
> +#include "cpplib.h"
> +#include "tree.h"
> +#include "langhooks.h"
> +#include "substring-locations.h"
> +
> +/* Emit a warning governed by option OPT, using GMSGID as the format
> +   string and AP as its arguments.
> +
> +   Attempt to obtain precise location information within a string
> +   literal from FMT_LOC.
> +
> +   Case 1: if substring location is available, and is within the range of
> +   the format string itself, the primary location of the
> +   diagnostic is the substring range obtained from FMT_LOC, with the
> +   caret at the *end* of the substring range.
> +
> +   For example:
> +
> +     test.c:90:10: warning: problem with '%i' here [-Wformat=]
> +     printf ("hello %i", msg);
> +                    ~^
> +
> +   Case 2: if the substring location is available, but is not within
> +   the range of the format string, the primary location is that of the
> +   format string, and an note is emitted showing the substring location.
> +
> +   For example:
> +     test.c:90:10: warning: problem with '%i' here [-Wformat=]
> +     printf("hello " INT_FMT " world", msg);
> +            ^~~~~~~~~~~~~~~~~~~~~~~~~
> +     test.c:19: note: format string is defined here
> +     #define INT_FMT "%i"
> +                      ~^
> +
> +   Case 3: if precise substring information is unavailable, the primary
> +   location is that of the whole string passed to FMT_LOC's constructor.
> +   For example:
> +
> +     test.c:90:10: warning: problem with '%i' here [-Wformat=]
> +     printf(fmt, msg);
> +            ^~~
> +
> +   For each of cases 1-3, if param_range is non-NULL, then it is used
> +   as a secondary range within the warning.  For example, here it
> +   is used with case 1:
> +
> +     test.c:90:16: warning: '%s' here but arg 2 has 'long' type [-Wformat=]
> +     printf ("foo %s bar", long_i + long_j);
> +                  ~^       ~~~~~~~~~~~~~~~
> +
> +   and here with case 2:
> +
> +     test.c:90:16: warning: '%s' here but arg 2 has 'long' type [-Wformat=]
> +     printf ("foo " STR_FMT " bar", long_i + long_j);
> +             ^~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~~~~
> +     test.c:89:16: note: format string is defined here
> +     #define STR_FMT "%s"
> +                      ~^
> +
> +   and with case 3:
> +
> +     test.c:90:10: warning: '%i' here, but arg 2 is "const char *' [-Wformat=]
> +     printf(fmt, msg);
> +            ^~~  ~~~
> +
> +   If CORRECTED_SUBSTRING is non-NULL, use it for cases 1 and 2 to provide
> +   a fix-it hint, suggesting that it should replace the text within the
> +   substring range.  For example:
> +
> +     test.c:90:10: warning: problem with '%i' here [-Wformat=]
> +     printf ("hello %i", msg);
> +                    ~^
> +                    %s
> +
> +   Return true if a warning was emitted, false otherwise.  */
> +
> +ATTRIBUTE_GCC_DIAG (5,0)
> +bool
> +format_warning_va (const substring_loc &fmt_loc,
> +		   const source_range *param_range,
> +		   const char *corrected_substring,
> +		   int opt, const char *gmsgid, va_list *ap)
> +{
> +  bool substring_within_range = false;
> +  location_t primary_loc;
> +  location_t fmt_substring_loc = UNKNOWN_LOCATION;
> +  source_range fmt_loc_range
> +    = get_range_from_loc (line_table, fmt_loc.get_fmt_string_loc ());
> +  const char *err = fmt_loc.get_location (&fmt_substring_loc);
> +  source_range fmt_substring_range
> +    = get_range_from_loc (line_table, fmt_substring_loc);
> +  if (err)
> +    /* Case 3: unable to get substring location.  */
> +    primary_loc = fmt_loc.get_fmt_string_loc ();
> +  else
> +    {
> +      if (fmt_substring_range.m_start >= fmt_loc_range.m_start
> +	  && fmt_substring_range.m_finish <= fmt_loc_range.m_finish)
> +	/* Case 1.  */
> +	{
> +	  substring_within_range = true;
> +	  primary_loc = fmt_substring_loc;
> +	}
> +      else
> +	/* Case 2.  */
> +	{
> +	  substring_within_range = false;
> +	  primary_loc = fmt_loc.get_fmt_string_loc ();
> +	}
> +    }
> +
> +  rich_location richloc (line_table, primary_loc);
> +
> +  if (param_range)
> +    {
> +      location_t param_loc = make_location (param_range->m_start,
> +					    param_range->m_start,
> +					    param_range->m_finish);
> +      richloc.add_range (param_loc, false);
> +    }
> +
> +  if (!err && corrected_substring && substring_within_range)
> +    richloc.add_fixit_replace (fmt_substring_range, corrected_substring);
> +
> +  diagnostic_info diagnostic;
> +  diagnostic_set_info (&diagnostic, gmsgid, ap, &richloc, DK_WARNING);
> +  diagnostic.option_index = opt;
> +  bool warned = report_diagnostic (&diagnostic);
> +
> +  if (!err && fmt_substring_loc && !substring_within_range)
> +    /* Case 2.  */
> +    if (warned)
> +      {
> +	rich_location substring_richloc (line_table, fmt_substring_loc);
> +	if (corrected_substring)
> +	  substring_richloc.add_fixit_replace (fmt_substring_range,
> +					       corrected_substring);
> +	inform_at_rich_loc (&substring_richloc,
> +			    "format string is defined here");
> +      }
> +
> +  return warned;
> +}
> +
> +/* Variadic call to format_warning_va.  */
> +
> +bool
> +format_warning_at_substring (const substring_loc &fmt_loc,
> +			     const source_range *param_range,
> +			     const char *corrected_substring,
> +			     int opt, const char *gmsgid, ...)
> +{
> +  va_list ap;
> +  va_start (ap, gmsgid);
> +  bool warned = format_warning_va (fmt_loc, param_range, corrected_substring,
> +				   opt, gmsgid, &ap);
> +  va_end (ap);
> +
> +  return warned;
> +}
> +
> +/* Attempt to determine the source location of the substring.
> +   If successful, return NULL and write the source location to *OUT_LOC.
> +   Otherwise return an error message.  Error messages are intended
> +   for GCC developers (to help debugging) rather than for end-users.  */
> +
> +const char *
> +substring_loc::get_location (location_t *out_loc) const
> +{
> +  gcc_assert (out_loc);
> +  return lang_hooks.get_substring_location (*this, out_loc);
> +}
> diff --git a/gcc/substring-locations.h b/gcc/substring-locations.h
> index f839c74..bb0de4f 100644
> --- a/gcc/substring-locations.h
> +++ b/gcc/substring-locations.h
> @@ -20,6 +20,73 @@ along with GCC; see the file COPYING3.  If not see
>   #ifndef GCC_SUBSTRING_LOCATIONS_H
>   #define GCC_SUBSTRING_LOCATIONS_H
>
> +#include <cpplib.h>
> +
> +/* libcpp can calculate location information about a range of characters
> +   within a string literal, but doing so is non-trivial.
> +
> +   This class encapsulates such a source location, so that it can be
> +   passed around (e.g. within c-format.c).  It is effectively a deferred
> +   call into libcpp.
> +
> +   If needed by a diagnostic, the actual location can be calculated by
> +   calling the get_location method.  This calls a langhook, since
> +   this is inherently frontend-specific (it reparses the strings to
> +   get the location information on-demand, rather than storing the
> +   location information in the initial lex for every string).
> +
> +   substring_loc::get_location returns NULL if it succeeds, or an
> +   error message if it fails.  Error messages are intended for GCC
> +   developers (to help debugging) rather than for end-users.  */
> +
> +class substring_loc
> +{
> + public:
> +  /* Constructor.  FMT_STRING_LOC is the location of the string as
> +     a whole.  STRING_TYPE is the type of the string.  It should be an
> +     ARRAY_TYPE of INTEGER_TYPE, or a POINTER_TYPE to such an ARRAY_TYPE.
> +     CARET_IDX, START_IDX, and END_IDX are offsets from the start
> +     of the string data.  */
> +  substring_loc (location_t fmt_string_loc, tree string_type,
> +		 int caret_idx, int start_idx, int end_idx)
> +  : m_fmt_string_loc (fmt_string_loc), m_string_type (string_type),
> +    m_caret_idx (caret_idx), m_start_idx (start_idx), m_end_idx (end_idx) {}
> +
> +  void set_caret_index (int caret_idx) { m_caret_idx = caret_idx; }
> +
> +  const char *get_location (location_t *out_loc) const;
> +
> +  location_t get_fmt_string_loc () const { return m_fmt_string_loc; }
> +  tree get_string_type () const { return m_string_type; }
> +  int get_caret_idx () const { return m_caret_idx; }
> +  int get_start_idx () const { return m_start_idx; }
> +  int get_end_idx () const { return m_end_idx; }
> +
> + private:
> +  location_t m_fmt_string_loc;
> +  tree m_string_type;
> +  int m_caret_idx;
> +  int m_start_idx;
> +  int m_end_idx;
> +};
> +
> +/* Functions for emitting a warning about a format string.  */
> +
> +extern bool format_warning_va (const substring_loc &fmt_loc,
> +			       const source_range *param_range,
> +			       const char *corrected_substring,
> +			       int opt, const char *gmsgid, va_list *ap)
> +  ATTRIBUTE_GCC_DIAG (5,0);
> +
> +extern bool format_warning_at_substring (const substring_loc &fmt_loc,
> +					 const source_range *param_range,
> +					 const char *corrected_substring,
> +					 int opt, const char *gmsgid, ...)
> +  ATTRIBUTE_GCC_DIAG (5,0);
> +
> +/* Implementation detail, for use when implementing
> +   LANG_HOOKS_GET_SUBSTRING_LOCATION.  */
> +
>   extern const char *get_source_location_for_substring (cpp_reader *pfile,
>   						      string_concat_db *concats,
>   						      location_t strloc,
> diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literals.c b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literals.c
> index dff999c..99a504d 100644
> --- a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literals.c
> +++ b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literals.c
> @@ -33,6 +33,7 @@
>   #include "print-tree.h"
>   #include "cpplib.h"
>   #include "c-family/c-pragma.h"
> +#include "substring-locations.h"
>
>   int plugin_is_GPL_compatible;
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Move class substring_loc from c-family into gcc
  2016-09-02 23:57 ` [PATCH] Move class substring_loc from c-family into gcc Martin Sebor
@ 2016-09-03  3:03   ` David Malcolm
  2016-09-03  9:22   ` Manuel López-Ibáñez
  1 sibling, 0 replies; 11+ messages in thread
From: David Malcolm @ 2016-09-03  3:03 UTC (permalink / raw)
  To: Martin Sebor, gcc-patches

On Fri, 2016-09-02 at 16:55 -0600, Martin Sebor wrote:
> I've successfully tested the patch below by incorporating it into
> the -Wformat-length pass I've been working on.  I'd like to submit
> the latest and hopefully close to final version of my work for
> review without duplicating this code and it might be helpful if
> it was possible to build my latest patch without having to find
> and install this one first.  Could someone review and approve
> David's patch?

I'm not quite sure what the boundaries of my "diagnostics"/"libcpp"
maintainer rights are, and on whether I could self-approve this one -
the addition of the langhook gave me pause.  But
 https://gcc.gnu.org/ml/gcc-patches/2016-08/msg02161.html
suggests that maybe I'm being too reticent, along with your positive
feedback on the patch; it is all about diagnostics infrastructure,
after all.  So if no-one complains I'll commit it early next week.

> Thanks
> Martin
> 
> On 08/25/2016 10:09 AM, David Malcolm wrote:
> > This patch is intended to help Martin's new middle-end sprintf
> > format
> > warning.
> > 
> > It moves most of the on-demand locations-within-strings
> > code in c-family into gcc, into a new substring-locations.c file to
> > go with substring-locations.h: class substring_loc, representing
> > a source caret and source range within a STRING_CST, along with
> > format_warning for handling emitting a warning relating to it.
> > 
> > The actual work of reconstructing the source locations within
> > a string seems inherently frontend-specific, so the patch
> > introduces a
> > langhook to do this, implementing it for C using the existing code
> > (and thus hiding the details of string-concatenation as being
> > specific to the c-family).  Attempts to get substring location
> > from other frontends will fail, and the format_warning_* calls
> > handle such failures gracefully by simply using the location of
> > the string as a whole for the warning.
> > 
> > I've tested it with Martin's gimple-ssa-sprintf patch, and I'm able
> > to
> > emit carets and underlines in the correct places in C code from the
> > middle end with this approach (patch to follow).
> > 
> > Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu.
> > 
> > OK for trunk?
> > 
> > gcc/ChangeLog:
> > 	* Makefile.in (OBJS): Add substring-locations.o.
> > 	* langhooks-def.h (class substring_loc): New forward decl.
> > 	(lhd_get_substring_location): New decl.
> > 	(LANG_HOOKS_GET_SUBSTRING_LOCATION): New macro.
> > 	(LANG_HOOKS_INITIALIZER): Add
> > LANG_HOOKS_GET_SUBSTRING_LOCATION.
> > 	* langhooks.c (lhd_get_substring_location): New function.
> > 	* langhooks.h (class substring_loc): New forward decl.
> > 	(struct lang_hooks): Add field get_substring_location.
> > 	* substring-locations.c: New file, taking definition of
> > 	format_warning_va and format_warning_at_substring from
> > 	c-family/c-format.c, making them non-static.
> > 	* substring-locations.h: Include <cpplib.h>.
> > 	(class substring_loc): Move class here from c-family/c
> > -common.h.
> > 	Add comments.
> > 	(format_warning_va): New decl.
> > 	(format_warning_at_substring): New decl.
> > 	(get_source_location_for_substring): Add comment.
> > 
> > gcc/c-family/ChangeLog:
> > 	* c-common.c (get_cpp_ttype_from_string_type): Handle being
> > passed
> > 	a POINTER_TYPE.
> > 	(substring_loc::get_location): Move to substring-locations.c,
> > 	keeping implementation as...
> > 	(c_get_substring_location): New function, from the above,
> > reworked
> > 	to use accessors rather than member lookup.
> > 	* c-common.h (class substring_loc): Move to substring
> > -locations.h,
> > 	replacing with a forward decl.
> > 	(c_get_substring_location): New decl.
> > 	* c-format.c: Include "substring-locations.h".
> > 	(format_warning_va): Move to substring-locations.c.
> > 	(format_warning_at_substring): Likewise.
> > 
> > gcc/c/ChangeLog:
> > 	* c-lang.c (LANG_HOOKS_GET_SUBSTRING_LOCATION): Use
> > 	c_get_substring_location for this new langhook.
> > 
> > gcc/testsuite/ChangeLog:
> > 	* gcc.dg/plugin/diagnostic_plugin_test_string_literals.c:
> > Include
> > 	"substring-locations.h".
> > ---
> >   gcc/Makefile.in                                    |   1 +
> >   gcc/c-family/c-common.c                            |  23 +--
> >   gcc/c-family/c-common.h                            |  32 +---
> >   gcc/c-family/c-format.c                            | 157 +-------
> > ---------
> >   gcc/c/c-lang.c                                     |   3 +
> >   gcc/langhooks-def.h                                |   8 +-
> >   gcc/langhooks.c                                    |   8 +
> >   gcc/langhooks.h                                    |   9 +
> >   gcc/substring-locations.c                          | 195
> > +++++++++++++++++++++
> >   gcc/substring-locations.h                          |  67 +++++++
> >   .../diagnostic_plugin_test_string_literals.c       |   1 +
> >   11 files changed, 308 insertions(+), 196 deletions(-)
> >   create mode 100644 gcc/substring-locations.c
> > 
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index 1b7464b..769efcf 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1444,6 +1444,7 @@ OBJS = \
> >   	store-motion.o \
> >   	streamer-hooks.o \
> >   	stringpool.o \
> > +	substring-locations.o \
> >   	target-globals.o \
> >   	targhooks.o \
> >   	timevar.o \
> > diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
> > index 3feb910..f3e44c2 100644
> > --- a/gcc/c-family/c-common.c
> > +++ b/gcc/c-family/c-common.c
> > @@ -1122,6 +1122,9 @@ static enum cpp_ttype
> >   get_cpp_ttype_from_string_type (tree string_type)
> >   {
> >     gcc_assert (string_type);
> > +  if (TREE_CODE (string_type) == POINTER_TYPE)
> > +    string_type = TREE_TYPE (string_type);
> > +
> >     if (TREE_CODE (string_type) != ARRAY_TYPE)
> >       return CPP_OTHER;
> > 
> > @@ -1148,23 +1151,23 @@ get_cpp_ttype_from_string_type (tree
> > string_type)
> > 
> >   GTY(()) string_concat_db *g_string_concat_db;
> > 
> > -/* Attempt to determine the source location of the substring.
> > -   If successful, return NULL and write the source location to
> > *OUT_LOC.
> > -   Otherwise return an error message.  Error messages are intended
> > -   for GCC developers (to help debugging) rather than for end
> > -users.  */
> > +/* Implementation of LANG_HOOKS_GET_SUBSTRING_LOCATION.  */
> > 
> >   const char *
> > -substring_loc::get_location (location_t *out_loc) const
> > +c_get_substring_location (const substring_loc &substr_loc,
> > +			  location_t *out_loc)
> >   {
> > -  gcc_assert (out_loc);
> > -
> > -  enum cpp_ttype tok_type = get_cpp_ttype_from_string_type
> > (m_string_type);
> > +  enum cpp_ttype tok_type
> > +    = get_cpp_ttype_from_string_type (substr_loc.get_string_type
> > ());
> >     if (tok_type == CPP_OTHER)
> >       return "unrecognized string type";
> > 
> >     return get_source_location_for_substring (parse_in,
> > g_string_concat_db,
> > -					    m_fmt_string_loc,
> > tok_type,
> > -					    m_caret_idx,
> > m_start_idx, m_end_idx,
> > +					   
> >  substr_loc.get_fmt_string_loc (),
> > +					    tok_type,
> > +					   
> >  substr_loc.get_caret_idx (),
> > +					   
> >  substr_loc.get_start_idx (),
> > +					    substr_loc.get_end_idx
> > (),
> >   					    out_loc);
> >   }
> > 
> > diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> > index bc22baa..4470b4f 100644
> > --- a/gcc/c-family/c-common.h
> > +++ b/gcc/c-family/c-common.h
> > @@ -1131,35 +1131,9 @@ extern const char *cb_get_suggestion
> > (cpp_reader *, const char *,
> > 
> >   extern GTY(()) string_concat_db *g_string_concat_db;
> > 
> > -/* libcpp can calculate location information about a range of
> > characters
> > -   within a string literal, but doing so is non-trivial.
> > -
> > -   This class encapsulates such a source location, so that it can
> > be
> > -   passed around (e.g. within c-format.c).  It is effectively a
> > deferred
> > -   call into libcpp.  If needed by a diagnostic, the actual
> > source_range
> > -   can be calculated by calling the get_range method.  */
> > -
> > -class substring_loc
> > -{
> > - public:
> > -  substring_loc (location_t fmt_string_loc, tree string_type,
> > -		 int caret_idx, int start_idx, int end_idx)
> > -  : m_fmt_string_loc (fmt_string_loc), m_string_type
> > (string_type),
> > -    m_caret_idx (caret_idx), m_start_idx (start_idx), m_end_idx
> > (end_idx) {}
> > -
> > -  void set_caret_index (int caret_idx) { m_caret_idx = caret_idx;
> > }
> > -
> > -  const char *get_location (location_t *out_loc) const;
> > -
> > -  location_t get_fmt_string_loc () const { return
> > m_fmt_string_loc; }
> > -
> > - private:
> > -  location_t m_fmt_string_loc;
> > -  tree m_string_type;
> > -  int m_caret_idx;
> > -  int m_start_idx;
> > -  int m_end_idx;
> > -};
> > +class substring_loc;
> > +extern const char *c_get_substring_location (const substring_loc
> > &substr_loc,
> > +					     location_t *out_loc);
> > 
> >   /* In c-gimplify.c  */
> >   extern void c_genericize (tree);
> > diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
> > index d373819..ac40ac3 100644
> > --- a/gcc/c-family/c-format.c
> > +++ b/gcc/c-family/c-format.c
> > @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not
> > see
> >   #include "langhooks.h"
> >   #include "c-format.h"
> >   #include "diagnostic.h"
> > +#include "substring-locations.h"
> >   #include "selftest.h"
> > 
> >   /* Handle attributes associated with format checking.  */
> > @@ -67,162 +68,6 @@ static int first_target_format_type;
> >   static const char *format_name (int format_num);
> >   static int format_flags (int format_num);
> > 
> > -/* Emit a warning governed by option OPT, using GMSGID as the
> > format
> > -   string and AP as its arguments.
> > -
> > -   Attempt to obtain precise location information within a string
> > -   literal from FMT_LOC.
> > -
> > -   Case 1: if substring location is available, and is within the
> > range of
> > -   the format string itself, the primary location of the
> > -   diagnostic is the substring range obtained from FMT_LOC, with
> > the
> > -   caret at the *end* of the substring range.
> > -
> > -   For example:
> > -
> > -     test.c:90:10: warning: problem with '%i' here [-Wformat=]
> > -     printf ("hello %i", msg);
> > -                    ~^
> > -
> > -   Case 2: if the substring location is available, but is not
> > within
> > -   the range of the format string, the primary location is that of
> > the
> > -   format string, and an note is emitted showing the substring
> > location.
> > -
> > -   For example:
> > -     test.c:90:10: warning: problem with '%i' here [-Wformat=]
> > -     printf("hello " INT_FMT " world", msg);
> > -            ^~~~~~~~~~~~~~~~~~~~~~~~~
> > -     test.c:19: note: format string is defined here
> > -     #define INT_FMT "%i"
> > -                      ~^
> > -
> > -   Case 3: if precise substring information is unavailable, the
> > primary
> > -   location is that of the whole string passed to FMT_LOC's
> > constructor.
> > -   For example:
> > -
> > -     test.c:90:10: warning: problem with '%i' here [-Wformat=]
> > -     printf(fmt, msg);
> > -            ^~~
> > -
> > -   For each of cases 1-3, if param_range is non-NULL, then it is
> > used
> > -   as a secondary range within the warning.  For example, here it
> > -   is used with case 1:
> > -
> > -     test.c:90:16: warning: '%s' here but arg 2 has 'long' type [
> > -Wformat=]
> > -     printf ("foo %s bar", long_i + long_j);
> > -                  ~^       ~~~~~~~~~~~~~~~
> > -
> > -   and here with case 2:
> > -
> > -     test.c:90:16: warning: '%s' here but arg 2 has 'long' type [
> > -Wformat=]
> > -     printf ("foo " STR_FMT " bar", long_i + long_j);
> > -             ^~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~~~~
> > -     test.c:89:16: note: format string is defined here
> > -     #define STR_FMT "%s"
> > -                      ~^
> > -
> > -   and with case 3:
> > -
> > -     test.c:90:10: warning: '%i' here, but arg 2 is "const char *'
> > [-Wformat=]
> > -     printf(fmt, msg);
> > -            ^~~  ~~~
> > -
> > -   If CORRECTED_SUBSTRING is non-NULL, use it for cases 1 and 2 to
> > provide
> > -   a fix-it hint, suggesting that it should replace the text
> > within the
> > -   substring range.  For example:
> > -
> > -     test.c:90:10: warning: problem with '%i' here [-Wformat=]
> > -     printf ("hello %i", msg);
> > -                    ~^
> > -                    %s
> > -
> > -   Return true if a warning was emitted, false otherwise.  */
> > -
> > -ATTRIBUTE_GCC_DIAG (5,0)
> > -static bool
> > -format_warning_va (const substring_loc &fmt_loc, source_range
> > *param_range,
> > -		   const char *corrected_substring,
> > -		   int opt, const char *gmsgid, va_list *ap)
> > -{
> > -  bool substring_within_range = false;
> > -  location_t primary_loc;
> > -  location_t fmt_substring_loc = UNKNOWN_LOCATION;
> > -  source_range fmt_loc_range
> > -    = get_range_from_loc (line_table, fmt_loc.get_fmt_string_loc
> > ());
> > -  const char *err = fmt_loc.get_location (&fmt_substring_loc);
> > -  source_range fmt_substring_range
> > -    = get_range_from_loc (line_table, fmt_substring_loc);
> > -  if (err)
> > -    /* Case 3: unable to get substring location.  */
> > -    primary_loc = fmt_loc.get_fmt_string_loc ();
> > -  else
> > -    {
> > -      if (fmt_substring_range.m_start >= fmt_loc_range.m_start
> > -	  && fmt_substring_range.m_finish <=
> > fmt_loc_range.m_finish)
> > -	/* Case 1.  */
> > -	{
> > -	  substring_within_range = true;
> > -	  primary_loc = fmt_substring_loc;
> > -	}
> > -      else
> > -	/* Case 2.  */
> > -	{
> > -	  substring_within_range = false;
> > -	  primary_loc = fmt_loc.get_fmt_string_loc ();
> > -	}
> > -    }
> > -
> > -  rich_location richloc (line_table, primary_loc);
> > -
> > -  if (param_range)
> > -    {
> > -      location_t param_loc = make_location (param_range->m_start,
> > -					    param_range->m_start,
> > -					    param_range
> > ->m_finish);
> > -      richloc.add_range (param_loc, false);
> > -    }
> > -
> > -  if (!err && corrected_substring && substring_within_range)
> > -    richloc.add_fixit_replace (fmt_substring_range,
> > corrected_substring);
> > -
> > -  diagnostic_info diagnostic;
> > -  diagnostic_set_info (&diagnostic, gmsgid, ap, &richloc,
> > DK_WARNING);
> > -  diagnostic.option_index = opt;
> > -  bool warned = report_diagnostic (&diagnostic);
> > -
> > -  if (!err && fmt_substring_loc && !substring_within_range)
> > -    /* Case 2.  */
> > -    if (warned)
> > -      {
> > -	rich_location substring_richloc (line_table,
> > fmt_substring_loc);
> > -	if (corrected_substring)
> > -	  substring_richloc.add_fixit_replace
> > (fmt_substring_range,
> > -					      
> >  corrected_substring);
> > -	inform_at_rich_loc (&substring_richloc,
> > -			    "format string is defined here");
> > -      }
> > -
> > -  return warned;
> > -}
> > -
> > -/* Variadic call to format_warning_va.  */
> > -
> > -ATTRIBUTE_GCC_DIAG (5,0)
> > -static bool
> > -format_warning_at_substring (const substring_loc &fmt_loc,
> > -			     source_range *param_range,
> > -			     const char *corrected_substring,
> > -			     int opt, const char *gmsgid, ...)
> > -{
> > -  va_list ap;
> > -  va_start (ap, gmsgid);
> > -  bool warned = format_warning_va (fmt_loc, param_range,
> > corrected_substring,
> > -				   opt, gmsgid, &ap);
> > -  va_end (ap);
> > -
> > -  return warned;
> > -}
> > -
> >   /* Emit a warning as per format_warning_va, but construct the
> > substring_loc
> >      for the character at offset (CHAR_IDX - 1) within a string
> > constant
> >      FORMAT_STRING_CST at FMT_STRING_LOC.  */
> > diff --git a/gcc/c/c-lang.c b/gcc/c/c-lang.c
> > index b26be6a..b4096d0 100644
> > --- a/gcc/c/c-lang.c
> > +++ b/gcc/c/c-lang.c
> > @@ -43,6 +43,9 @@ enum c_language_kind c_language = clk_c;
> >   #define LANG_HOOKS_RUN_LANG_SELFTESTS selftest::run_c_tests
> >   #endif /* #if CHECKING_P */
> > 
> > +#undef LANG_HOOKS_GET_SUBSTRING_LOCATION
> > +#define LANG_HOOKS_GET_SUBSTRING_LOCATION c_get_substring_location
> > +
> >   /* Each front end provides its own lang hook initializer.  */
> >   struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;
> > 
> > diff --git a/gcc/langhooks-def.h b/gcc/langhooks-def.h
> > index 10d910c..cf5f91d 100644
> > --- a/gcc/langhooks-def.h
> > +++ b/gcc/langhooks-def.h
> > @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not
> > see
> >   #include "hooks.h"
> > 
> >   struct diagnostic_info;
> > +class substring_loc;
> > 
> >   /* Note to creators of new hooks:
> > 
> > @@ -81,6 +82,9 @@ extern void lhd_omp_firstprivatize_type_sizes
> > (struct gimplify_omp_ctx *,
> >   					       tree);
> >   extern bool lhd_omp_mappable_type (tree);
> > 
> > +extern const char *lhd_get_substring_location (const substring_loc
> > &,
> > +					       location_t
> > *out_loc);
> > +
> >   #define LANG_HOOKS_NAME			"GNU unknown"
> >   #define LANG_HOOKS_IDENTIFIER_SIZE	sizeof (struct
> > lang_identifier)
> >   #define LANG_HOOKS_INIT			hook_bool_void_fal
> > se
> > @@ -121,6 +125,7 @@ extern bool lhd_omp_mappable_type (tree);
> >   #define LANG_HOOKS_EH_USE_CXA_END_CLEANUP	false
> >   #define LANG_HOOKS_DEEP_UNSHARING	false
> >   #define LANG_HOOKS_RUN_LANG_SELFTESTS   lhd_do_nothing
> > +#define LANG_HOOKS_GET_SUBSTRING_LOCATION
> > lhd_get_substring_location
> > 
> >   /* Attribute hooks.  */
> >   #define LANG_HOOKS_ATTRIBUTE_TABLE		NULL
> > @@ -323,7 +328,8 @@ extern void lhd_end_section (void);
> >     LANG_HOOKS_BLOCK_MAY_FALLTHRU, \
> >     LANG_HOOKS_EH_USE_CXA_END_CLEANUP, \
> >     LANG_HOOKS_DEEP_UNSHARING, \
> > -  LANG_HOOKS_RUN_LANG_SELFTESTS \
> > +  LANG_HOOKS_RUN_LANG_SELFTESTS, \
> > +  LANG_HOOKS_GET_SUBSTRING_LOCATION \
> >   }
> > 
> >   #endif /* GCC_LANG_HOOKS_DEF_H */
> > diff --git a/gcc/langhooks.c b/gcc/langhooks.c
> > index 3256a9d..538d9f9 100644
> > --- a/gcc/langhooks.c
> > +++ b/gcc/langhooks.c
> > @@ -693,6 +693,14 @@ lhd_enum_underlying_base_type (const_tree
> > enum_type)
> >   					 TYPE_UNSIGNED
> > (enum_type));
> >   }
> > 
> > +/* Default implementation of LANG_HOOKS_GET_SUBSTRING_LOCATION. 
> >  */
> > +
> > +const char *
> > +lhd_get_substring_location (const substring_loc &, location_t *)
> > +{
> > +  return "unimplemented";
> > +}
> > +
> >   /* Returns true if the current lang_hooks represents the GNU C
> > frontend.  */
> > 
> >   bool
> > diff --git a/gcc/langhooks.h b/gcc/langhooks.h
> > index 44c258e..c109c8c 100644
> > --- a/gcc/langhooks.h
> > +++ b/gcc/langhooks.h
> > @@ -34,6 +34,8 @@ typedef void (*lang_print_tree_hook) (FILE *,
> > tree, int indent);
> >   enum classify_record
> >     { RECORD_IS_STRUCT, RECORD_IS_CLASS, RECORD_IS_INTERFACE };
> > 
> > +class substring_loc;
> > +
> >   /* The following hooks are documented in langhooks.c.  Must not
> > be
> >      NULL.  */
> > 
> > @@ -513,6 +515,13 @@ struct lang_hooks
> >     /* Run all lang-specific selftests.  */
> >     void (*run_lang_selftests) (void);
> > 
> > +  /* Attempt to determine the source location of the substring.
> > +     If successful, return NULL and write the source location to
> > *OUT_LOC.
> > +     Otherwise return an error message.  Error messages are
> > intended
> > +     for GCC developers (to help debugging) rather than for end
> > -users.  */
> > +  const char *(*get_substring_location) (const substring_loc &,
> > +					 location_t *out_loc);
> > +
> >     /* Whenever you add entries here, make sure you adjust
> > langhooks-def.h
> >        and langhooks.c accordingly.  */
> >   };
> > diff --git a/gcc/substring-locations.c b/gcc/substring-locations.c
> > new file mode 100644
> > index 0000000..60bf1b0
> > --- /dev/null
> > +++ b/gcc/substring-locations.c
> > @@ -0,0 +1,195 @@
> > +/* Source locations within string literals.
> > +   Copyright (C) 2016 Free Software Foundation, Inc.
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify it
> > under
> > +the terms of the GNU General Public License as published by the
> > Free
> > +Software Foundation; either version 3, or (at your option) any
> > later
> > +version.
> > +
> > +GCC is distributed in the hope that it will be useful, but WITHOUT
> > ANY
> > +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> > License
> > +for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3.  If not see
> > +<http://www.gnu.org/licenses/>.  */
> > +
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "diagnostic.h"
> > +#include "cpplib.h"
> > +#include "tree.h"
> > +#include "langhooks.h"
> > +#include "substring-locations.h"
> > +
> > +/* Emit a warning governed by option OPT, using GMSGID as the
> > format
> > +   string and AP as its arguments.
> > +
> > +   Attempt to obtain precise location information within a string
> > +   literal from FMT_LOC.
> > +
> > +   Case 1: if substring location is available, and is within the
> > range of
> > +   the format string itself, the primary location of the
> > +   diagnostic is the substring range obtained from FMT_LOC, with
> > the
> > +   caret at the *end* of the substring range.
> > +
> > +   For example:
> > +
> > +     test.c:90:10: warning: problem with '%i' here [-Wformat=]
> > +     printf ("hello %i", msg);
> > +                    ~^
> > +
> > +   Case 2: if the substring location is available, but is not
> > within
> > +   the range of the format string, the primary location is that of
> > the
> > +   format string, and an note is emitted showing the substring
> > location.
> > +
> > +   For example:
> > +     test.c:90:10: warning: problem with '%i' here [-Wformat=]
> > +     printf("hello " INT_FMT " world", msg);
> > +            ^~~~~~~~~~~~~~~~~~~~~~~~~
> > +     test.c:19: note: format string is defined here
> > +     #define INT_FMT "%i"
> > +                      ~^
> > +
> > +   Case 3: if precise substring information is unavailable, the
> > primary
> > +   location is that of the whole string passed to FMT_LOC's
> > constructor.
> > +   For example:
> > +
> > +     test.c:90:10: warning: problem with '%i' here [-Wformat=]
> > +     printf(fmt, msg);
> > +            ^~~
> > +
> > +   For each of cases 1-3, if param_range is non-NULL, then it is
> > used
> > +   as a secondary range within the warning.  For example, here it
> > +   is used with case 1:
> > +
> > +     test.c:90:16: warning: '%s' here but arg 2 has 'long' type [
> > -Wformat=]
> > +     printf ("foo %s bar", long_i + long_j);
> > +                  ~^       ~~~~~~~~~~~~~~~
> > +
> > +   and here with case 2:
> > +
> > +     test.c:90:16: warning: '%s' here but arg 2 has 'long' type [
> > -Wformat=]
> > +     printf ("foo " STR_FMT " bar", long_i + long_j);
> > +             ^~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~~~~
> > +     test.c:89:16: note: format string is defined here
> > +     #define STR_FMT "%s"
> > +                      ~^
> > +
> > +   and with case 3:
> > +
> > +     test.c:90:10: warning: '%i' here, but arg 2 is "const char *'
> > [-Wformat=]
> > +     printf(fmt, msg);
> > +            ^~~  ~~~
> > +
> > +   If CORRECTED_SUBSTRING is non-NULL, use it for cases 1 and 2 to
> > provide
> > +   a fix-it hint, suggesting that it should replace the text
> > within the
> > +   substring range.  For example:
> > +
> > +     test.c:90:10: warning: problem with '%i' here [-Wformat=]
> > +     printf ("hello %i", msg);
> > +                    ~^
> > +                    %s
> > +
> > +   Return true if a warning was emitted, false otherwise.  */
> > +
> > +ATTRIBUTE_GCC_DIAG (5,0)
> > +bool
> > +format_warning_va (const substring_loc &fmt_loc,
> > +		   const source_range *param_range,
> > +		   const char *corrected_substring,
> > +		   int opt, const char *gmsgid, va_list *ap)
> > +{
> > +  bool substring_within_range = false;
> > +  location_t primary_loc;
> > +  location_t fmt_substring_loc = UNKNOWN_LOCATION;
> > +  source_range fmt_loc_range
> > +    = get_range_from_loc (line_table, fmt_loc.get_fmt_string_loc
> > ());
> > +  const char *err = fmt_loc.get_location (&fmt_substring_loc);
> > +  source_range fmt_substring_range
> > +    = get_range_from_loc (line_table, fmt_substring_loc);
> > +  if (err)
> > +    /* Case 3: unable to get substring location.  */
> > +    primary_loc = fmt_loc.get_fmt_string_loc ();
> > +  else
> > +    {
> > +      if (fmt_substring_range.m_start >= fmt_loc_range.m_start
> > +	  && fmt_substring_range.m_finish <=
> > fmt_loc_range.m_finish)
> > +	/* Case 1.  */
> > +	{
> > +	  substring_within_range = true;
> > +	  primary_loc = fmt_substring_loc;
> > +	}
> > +      else
> > +	/* Case 2.  */
> > +	{
> > +	  substring_within_range = false;
> > +	  primary_loc = fmt_loc.get_fmt_string_loc ();
> > +	}
> > +    }
> > +
> > +  rich_location richloc (line_table, primary_loc);
> > +
> > +  if (param_range)
> > +    {
> > +      location_t param_loc = make_location (param_range->m_start,
> > +					    param_range->m_start,
> > +					    param_range
> > ->m_finish);
> > +      richloc.add_range (param_loc, false);
> > +    }
> > +
> > +  if (!err && corrected_substring && substring_within_range)
> > +    richloc.add_fixit_replace (fmt_substring_range,
> > corrected_substring);
> > +
> > +  diagnostic_info diagnostic;
> > +  diagnostic_set_info (&diagnostic, gmsgid, ap, &richloc,
> > DK_WARNING);
> > +  diagnostic.option_index = opt;
> > +  bool warned = report_diagnostic (&diagnostic);
> > +
> > +  if (!err && fmt_substring_loc && !substring_within_range)
> > +    /* Case 2.  */
> > +    if (warned)
> > +      {
> > +	rich_location substring_richloc (line_table,
> > fmt_substring_loc);
> > +	if (corrected_substring)
> > +	  substring_richloc.add_fixit_replace
> > (fmt_substring_range,
> > +					      
> >  corrected_substring);
> > +	inform_at_rich_loc (&substring_richloc,
> > +			    "format string is defined here");
> > +      }
> > +
> > +  return warned;
> > +}
> > +
> > +/* Variadic call to format_warning_va.  */
> > +
> > +bool
> > +format_warning_at_substring (const substring_loc &fmt_loc,
> > +			     const source_range *param_range,
> > +			     const char *corrected_substring,
> > +			     int opt, const char *gmsgid, ...)
> > +{
> > +  va_list ap;
> > +  va_start (ap, gmsgid);
> > +  bool warned = format_warning_va (fmt_loc, param_range,
> > corrected_substring,
> > +				   opt, gmsgid, &ap);
> > +  va_end (ap);
> > +
> > +  return warned;
> > +}
> > +
> > +/* Attempt to determine the source location of the substring.
> > +   If successful, return NULL and write the source location to
> > *OUT_LOC.
> > +   Otherwise return an error message.  Error messages are intended
> > +   for GCC developers (to help debugging) rather than for end
> > -users.  */
> > +
> > +const char *
> > +substring_loc::get_location (location_t *out_loc) const
> > +{
> > +  gcc_assert (out_loc);
> > +  return lang_hooks.get_substring_location (*this, out_loc);
> > +}
> > diff --git a/gcc/substring-locations.h b/gcc/substring-locations.h
> > index f839c74..bb0de4f 100644
> > --- a/gcc/substring-locations.h
> > +++ b/gcc/substring-locations.h
> > @@ -20,6 +20,73 @@ along with GCC; see the file COPYING3.  If not
> > see
> >   #ifndef GCC_SUBSTRING_LOCATIONS_H
> >   #define GCC_SUBSTRING_LOCATIONS_H
> > 
> > +#include <cpplib.h>
> > +
> > +/* libcpp can calculate location information about a range of
> > characters
> > +   within a string literal, but doing so is non-trivial.
> > +
> > +   This class encapsulates such a source location, so that it can
> > be
> > +   passed around (e.g. within c-format.c).  It is effectively a
> > deferred
> > +   call into libcpp.
> > +
> > +   If needed by a diagnostic, the actual location can be
> > calculated by
> > +   calling the get_location method.  This calls a langhook, since
> > +   this is inherently frontend-specific (it reparses the strings
> > to
> > +   get the location information on-demand, rather than storing the
> > +   location information in the initial lex for every string).
> > +
> > +   substring_loc::get_location returns NULL if it succeeds, or an
> > +   error message if it fails.  Error messages are intended for GCC
> > +   developers (to help debugging) rather than for end-users.  */
> > +
> > +class substring_loc
> > +{
> > + public:
> > +  /* Constructor.  FMT_STRING_LOC is the location of the string as
> > +     a whole.  STRING_TYPE is the type of the string.  It should
> > be an
> > +     ARRAY_TYPE of INTEGER_TYPE, or a POINTER_TYPE to such an
> > ARRAY_TYPE.
> > +     CARET_IDX, START_IDX, and END_IDX are offsets from the start
> > +     of the string data.  */
> > +  substring_loc (location_t fmt_string_loc, tree string_type,
> > +		 int caret_idx, int start_idx, int end_idx)
> > +  : m_fmt_string_loc (fmt_string_loc), m_string_type
> > (string_type),
> > +    m_caret_idx (caret_idx), m_start_idx (start_idx), m_end_idx
> > (end_idx) {}
> > +
> > +  void set_caret_index (int caret_idx) { m_caret_idx = caret_idx;
> > }
> > +
> > +  const char *get_location (location_t *out_loc) const;
> > +
> > +  location_t get_fmt_string_loc () const { return
> > m_fmt_string_loc; }
> > +  tree get_string_type () const { return m_string_type; }
> > +  int get_caret_idx () const { return m_caret_idx; }
> > +  int get_start_idx () const { return m_start_idx; }
> > +  int get_end_idx () const { return m_end_idx; }
> > +
> > + private:
> > +  location_t m_fmt_string_loc;
> > +  tree m_string_type;
> > +  int m_caret_idx;
> > +  int m_start_idx;
> > +  int m_end_idx;
> > +};
> > +
> > +/* Functions for emitting a warning about a format string.  */
> > +
> > +extern bool format_warning_va (const substring_loc &fmt_loc,
> > +			       const source_range *param_range,
> > +			       const char *corrected_substring,
> > +			       int opt, const char *gmsgid,
> > va_list *ap)
> > +  ATTRIBUTE_GCC_DIAG (5,0);
> > +
> > +extern bool format_warning_at_substring (const substring_loc
> > &fmt_loc,
> > +					 const source_range
> > *param_range,
> > +					 const char
> > *corrected_substring,
> > +					 int opt, const char
> > *gmsgid, ...)
> > +  ATTRIBUTE_GCC_DIAG (5,0);
> > +
> > +/* Implementation detail, for use when implementing
> > +   LANG_HOOKS_GET_SUBSTRING_LOCATION.  */
> > +
> >   extern const char *get_source_location_for_substring (cpp_reader
> > *pfile,
> >   						     
> >  string_concat_db *concats,
> >   						      location_t
> > strloc,
> > diff --git
> > a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literal
> > s.c
> > b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literal
> > s.c
> > index dff999c..99a504d 100644
> > ---
> > a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literal
> > s.c
> > +++
> > b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literal
> > s.c
> > @@ -33,6 +33,7 @@
> >   #include "print-tree.h"
> >   #include "cpplib.h"
> >   #include "c-family/c-pragma.h"
> > +#include "substring-locations.h"
> > 
> >   int plugin_is_GPL_compatible;
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Move class substring_loc from c-family into gcc
  2016-09-02 23:57 ` [PATCH] Move class substring_loc from c-family into gcc Martin Sebor
  2016-09-03  3:03   ` David Malcolm
@ 2016-09-03  9:22   ` Manuel López-Ibáñez
  2016-09-06 22:07     ` Martin Sebor
  2016-09-07 17:19     ` [committed] " David Malcolm
  1 sibling, 2 replies; 11+ messages in thread
From: Manuel López-Ibáñez @ 2016-09-03  9:22 UTC (permalink / raw)
  To: Martin Sebor, David Malcolm, GCC Patches

On 02/09/16 23:55, Martin Sebor wrote:
>> diff --git a/gcc/substring-locations.h b/gcc/substring-locations.h
>> index f839c74..bb0de4f 100644
>> --- a/gcc/substring-locations.h
>> +++ b/gcc/substring-locations.h
>> @@ -20,6 +20,73 @@ along with GCC; see the file COPYING3.  If not see
>>   #ifndef GCC_SUBSTRING_LOCATIONS_H
>>   #define GCC_SUBSTRING_LOCATIONS_H
>>
>> +#include <cpplib.h>
>> +

Is this header file going to be used in the middle-end? If so, then it is 
suspicious that it includes cpplib.h. Otherwise, perhaps it should live in 
c-family/

I'm not complaining about substring-locations.c because libcpp is already 
linked with everything else even for other non-C languages, like Ada, but the 
above is leaking all cpplib.h into the rest of the compiler, which defeats the 
purpose of this in coretypes.h

/* Provide forward struct declaration so that we don't have to include
    all of cpplib.h whenever a random prototype includes a pointer.
    Note that the cpp_reader and cpp_token typedefs remain part of
    cpplib.h.  */

struct cpp_reader;
struct cpp_token;


Cheers,
	Manuel.





^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Move class substring_loc from c-family into gcc
  2016-09-03  9:22   ` Manuel López-Ibáñez
@ 2016-09-06 22:07     ` Martin Sebor
  2016-09-07 17:19     ` [committed] " David Malcolm
  1 sibling, 0 replies; 11+ messages in thread
From: Martin Sebor @ 2016-09-06 22:07 UTC (permalink / raw)
  To: Manuel López-Ibáñez, David Malcolm, GCC Patches

On 09/03/2016 03:13 AM, Manuel López-Ibáñez wrote:
> On 02/09/16 23:55, Martin Sebor wrote:
>>> diff --git a/gcc/substring-locations.h b/gcc/substring-locations.h
>>> index f839c74..bb0de4f 100644
>>> --- a/gcc/substring-locations.h
>>> +++ b/gcc/substring-locations.h
>>> @@ -20,6 +20,73 @@ along with GCC; see the file COPYING3.  If not see
>>>   #ifndef GCC_SUBSTRING_LOCATIONS_H
>>>   #define GCC_SUBSTRING_LOCATIONS_H
>>>
>>> +#include <cpplib.h>
>>> +
>
> Is this header file going to be used in the middle-end? If so, then it
> is suspicious that it includes cpplib.h. Otherwise, perhaps it should
> live in c-family/

I'm not sure if you meant this question for me or for David but
I believe the main (only?) reason for this patch is let to make
use, in the -Wformat-length middle-end pass, of the same range
location API as in c-family/c-format in the C/C++ front ends.
David, please correct me if I mischaracterized it.

Martin

>
> I'm not complaining about substring-locations.c because libcpp is
> already linked with everything else even for other non-C languages, like
> Ada, but the above is leaking all cpplib.h into the rest of the
> compiler, which defeats the purpose of this in coretypes.h
>
> /* Provide forward struct declaration so that we don't have to include
>     all of cpplib.h whenever a random prototype includes a pointer.
>     Note that the cpp_reader and cpp_token typedefs remain part of
>     cpplib.h.  */
>
> struct cpp_reader;
> struct cpp_token;
>
>
> Cheers,
>      Manuel.
>
>
>
>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [committed] Move class substring_loc from c-family into gcc
  2016-09-03  9:22   ` Manuel López-Ibáñez
  2016-09-06 22:07     ` Martin Sebor
@ 2016-09-07 17:19     ` David Malcolm
  1 sibling, 0 replies; 11+ messages in thread
From: David Malcolm @ 2016-09-07 17:19 UTC (permalink / raw)
  To: Manuel López-Ibáñez, Martin Sebor, GCC Patches
  Cc: David Malcolm

On Sat, 2016-09-03 at 10:13 +0100, Manuel López-Ibáñez wrote:
> On 02/09/16 23:55, Martin Sebor wrote:
> > > diff --git a/gcc/substring-locations.h b/gcc/substring
> > > -locations.h
> > > index f839c74..bb0de4f 100644
> > > --- a/gcc/substring-locations.h
> > > +++ b/gcc/substring-locations.h
> > > @@ -20,6 +20,73 @@ along with GCC; see the file COPYING3.  If not
> > > see
> > >   #ifndef GCC_SUBSTRING_LOCATIONS_H
> > >   #define GCC_SUBSTRING_LOCATIONS_H
> > > 
> > > +#include <cpplib.h>
> > > +
> 
> Is this header file going to be used in the middle-end? If so, then
> it is
> suspicious that it includes cpplib.h. Otherwise, perhaps it should
> live in
> c-family/

The include of cpplib.h turned out not to be necessary, so I removed it.
I also reworded the comment for class substring_loc; here's the patch
I ended up committing.

Successfully bootstrapped&regrtested on x86_64-pc-linux-gnu.

Committed to trunk as r240028.

gcc/ChangeLog:
	* Makefile.in (OBJS): Add substring-locations.o.
	* langhooks-def.h (class substring_loc): New forward decl.
	(lhd_get_substring_location): New decl.
	(LANG_HOOKS_GET_SUBSTRING_LOCATION): New macro.
	(LANG_HOOKS_INITIALIZER): Add LANG_HOOKS_GET_SUBSTRING_LOCATION.
	* langhooks.c (lhd_get_substring_location): New function.
	* langhooks.h (class substring_loc): New forward decl.
	(struct lang_hooks): Add field get_substring_location.
	* substring-locations.c: New file, taking definition of
	format_warning_va and format_warning_at_substring from
	c-family/c-format.c, making them non-static.
	* substring-locations.h (class substring_loc): Move class here
	from c-family/c-common.h.  Add and rewrite comments.
	(format_warning_va): New decl.
	(format_warning_at_substring): New decl.
	(get_source_location_for_substring): Add comment.

gcc/c-family/ChangeLog:
	* c-common.c (get_cpp_ttype_from_string_type): Handle being passed
	a POINTER_TYPE.
	(substring_loc::get_location): Move to substring-locations.c,
	keeping implementation as...
	(c_get_substring_location): New function, from the above, reworked
	to use accessors rather than member lookup.
	* c-common.h (class substring_loc): Move to substring-locations.h,
	replacing with a forward decl.
	(c_get_substring_location): New decl.
	* c-format.c: Include "substring-locations.h".
	(format_warning_va): Move to substring-locations.c.
	(format_warning_at_substring): Likewise.

gcc/c/ChangeLog:
	* c-lang.c (LANG_HOOKS_GET_SUBSTRING_LOCATION): Use
	c_get_substring_location for this new langhook.

gcc/testsuite/ChangeLog:
	* gcc.dg/plugin/diagnostic_plugin_test_string_literals.c: Include
	"substring-locations.h".
---
 gcc/Makefile.in                                    |   1 +
 gcc/c-family/c-common.c                            |  23 +--
 gcc/c-family/c-common.h                            |  32 +---
 gcc/c-family/c-format.c                            | 157 +----------------
 gcc/c/c-lang.c                                     |   3 +
 gcc/langhooks-def.h                                |   8 +-
 gcc/langhooks.c                                    |   8 +
 gcc/langhooks.h                                    |   9 +
 gcc/substring-locations.c                          | 195 +++++++++++++++++++++
 gcc/substring-locations.h                          |  71 ++++++++
 .../diagnostic_plugin_test_string_literals.c       |   1 +
 11 files changed, 312 insertions(+), 196 deletions(-)
 create mode 100644 gcc/substring-locations.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 7c18285..332c85e 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1443,6 +1443,7 @@ OBJS = \
 	store-motion.o \
 	streamer-hooks.o \
 	stringpool.o \
+	substring-locations.o \
 	target-globals.o \
 	targhooks.o \
 	timevar.o \
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 399ba97..ec1d87a 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -1122,6 +1122,9 @@ static enum cpp_ttype
 get_cpp_ttype_from_string_type (tree string_type)
 {
   gcc_assert (string_type);
+  if (TREE_CODE (string_type) == POINTER_TYPE)
+    string_type = TREE_TYPE (string_type);
+
   if (TREE_CODE (string_type) != ARRAY_TYPE)
     return CPP_OTHER;
 
@@ -1148,23 +1151,23 @@ get_cpp_ttype_from_string_type (tree string_type)
 
 GTY(()) string_concat_db *g_string_concat_db;
 
-/* Attempt to determine the source location of the substring.
-   If successful, return NULL and write the source location to *OUT_LOC.
-   Otherwise return an error message.  Error messages are intended
-   for GCC developers (to help debugging) rather than for end-users.  */
+/* Implementation of LANG_HOOKS_GET_SUBSTRING_LOCATION.  */
 
 const char *
-substring_loc::get_location (location_t *out_loc) const
+c_get_substring_location (const substring_loc &substr_loc,
+			  location_t *out_loc)
 {
-  gcc_assert (out_loc);
-
-  enum cpp_ttype tok_type = get_cpp_ttype_from_string_type (m_string_type);
+  enum cpp_ttype tok_type
+    = get_cpp_ttype_from_string_type (substr_loc.get_string_type ());
   if (tok_type == CPP_OTHER)
     return "unrecognized string type";
 
   return get_source_location_for_substring (parse_in, g_string_concat_db,
-					    m_fmt_string_loc, tok_type,
-					    m_caret_idx, m_start_idx, m_end_idx,
+					    substr_loc.get_fmt_string_loc (),
+					    tok_type,
+					    substr_loc.get_caret_idx (),
+					    substr_loc.get_start_idx (),
+					    substr_loc.get_end_idx (),
 					    out_loc);
 }
 
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 42ce969..1d923c9 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -1132,35 +1132,9 @@ extern const char *cb_get_suggestion (cpp_reader *, const char *,
 
 extern GTY(()) string_concat_db *g_string_concat_db;
 
-/* libcpp can calculate location information about a range of characters
-   within a string literal, but doing so is non-trivial.
-
-   This class encapsulates such a source location, so that it can be
-   passed around (e.g. within c-format.c).  It is effectively a deferred
-   call into libcpp.  If needed by a diagnostic, the actual source_range
-   can be calculated by calling the get_range method.  */
-
-class substring_loc
-{
- public:
-  substring_loc (location_t fmt_string_loc, tree string_type,
-		 int caret_idx, int start_idx, int end_idx)
-  : m_fmt_string_loc (fmt_string_loc), m_string_type (string_type),
-    m_caret_idx (caret_idx), m_start_idx (start_idx), m_end_idx (end_idx) {}
-
-  void set_caret_index (int caret_idx) { m_caret_idx = caret_idx; }
-
-  const char *get_location (location_t *out_loc) const;
-
-  location_t get_fmt_string_loc () const { return m_fmt_string_loc; }
-
- private:
-  location_t m_fmt_string_loc;
-  tree m_string_type;
-  int m_caret_idx;
-  int m_start_idx;
-  int m_end_idx;
-};
+class substring_loc;
+extern const char *c_get_substring_location (const substring_loc &substr_loc,
+					     location_t *out_loc);
 
 /* In c-gimplify.c  */
 extern void c_genericize (tree);
diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index ad434f8..6efa4dd 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "c-format.h"
 #include "diagnostic.h"
+#include "substring-locations.h"
 #include "selftest.h"
 
 /* Handle attributes associated with format checking.  */
@@ -67,162 +68,6 @@ static int first_target_format_type;
 static const char *format_name (int format_num);
 static int format_flags (int format_num);
 
-/* Emit a warning governed by option OPT, using GMSGID as the format
-   string and AP as its arguments.
-
-   Attempt to obtain precise location information within a string
-   literal from FMT_LOC.
-
-   Case 1: if substring location is available, and is within the range of
-   the format string itself, the primary location of the
-   diagnostic is the substring range obtained from FMT_LOC, with the
-   caret at the *end* of the substring range.
-
-   For example:
-
-     test.c:90:10: warning: problem with '%i' here [-Wformat=]
-     printf ("hello %i", msg);
-                    ~^
-
-   Case 2: if the substring location is available, but is not within
-   the range of the format string, the primary location is that of the
-   format string, and an note is emitted showing the substring location.
-
-   For example:
-     test.c:90:10: warning: problem with '%i' here [-Wformat=]
-     printf("hello " INT_FMT " world", msg);
-            ^~~~~~~~~~~~~~~~~~~~~~~~~
-     test.c:19: note: format string is defined here
-     #define INT_FMT "%i"
-                      ~^
-
-   Case 3: if precise substring information is unavailable, the primary
-   location is that of the whole string passed to FMT_LOC's constructor.
-   For example:
-
-     test.c:90:10: warning: problem with '%i' here [-Wformat=]
-     printf(fmt, msg);
-            ^~~
-
-   For each of cases 1-3, if param_range is non-NULL, then it is used
-   as a secondary range within the warning.  For example, here it
-   is used with case 1:
-
-     test.c:90:16: warning: '%s' here but arg 2 has 'long' type [-Wformat=]
-     printf ("foo %s bar", long_i + long_j);
-                  ~^       ~~~~~~~~~~~~~~~
-
-   and here with case 2:
-
-     test.c:90:16: warning: '%s' here but arg 2 has 'long' type [-Wformat=]
-     printf ("foo " STR_FMT " bar", long_i + long_j);
-             ^~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~~~~
-     test.c:89:16: note: format string is defined here
-     #define STR_FMT "%s"
-                      ~^
-
-   and with case 3:
-
-     test.c:90:10: warning: '%i' here, but arg 2 is "const char *' [-Wformat=]
-     printf(fmt, msg);
-            ^~~  ~~~
-
-   If CORRECTED_SUBSTRING is non-NULL, use it for cases 1 and 2 to provide
-   a fix-it hint, suggesting that it should replace the text within the
-   substring range.  For example:
-
-     test.c:90:10: warning: problem with '%i' here [-Wformat=]
-     printf ("hello %i", msg);
-                    ~^
-                    %s
-
-   Return true if a warning was emitted, false otherwise.  */
-
-ATTRIBUTE_GCC_DIAG (5,0)
-static bool
-format_warning_va (const substring_loc &fmt_loc, source_range *param_range,
-		   const char *corrected_substring,
-		   int opt, const char *gmsgid, va_list *ap)
-{
-  bool substring_within_range = false;
-  location_t primary_loc;
-  location_t fmt_substring_loc = UNKNOWN_LOCATION;
-  source_range fmt_loc_range
-    = get_range_from_loc (line_table, fmt_loc.get_fmt_string_loc ());
-  const char *err = fmt_loc.get_location (&fmt_substring_loc);
-  source_range fmt_substring_range
-    = get_range_from_loc (line_table, fmt_substring_loc);
-  if (err)
-    /* Case 3: unable to get substring location.  */
-    primary_loc = fmt_loc.get_fmt_string_loc ();
-  else
-    {
-      if (fmt_substring_range.m_start >= fmt_loc_range.m_start
-	  && fmt_substring_range.m_finish <= fmt_loc_range.m_finish)
-	/* Case 1.  */
-	{
-	  substring_within_range = true;
-	  primary_loc = fmt_substring_loc;
-	}
-      else
-	/* Case 2.  */
-	{
-	  substring_within_range = false;
-	  primary_loc = fmt_loc.get_fmt_string_loc ();
-	}
-    }
-
-  rich_location richloc (line_table, primary_loc);
-
-  if (param_range)
-    {
-      location_t param_loc = make_location (param_range->m_start,
-					    param_range->m_start,
-					    param_range->m_finish);
-      richloc.add_range (param_loc, false);
-    }
-
-  if (!err && corrected_substring && substring_within_range)
-    richloc.add_fixit_replace (fmt_substring_range, corrected_substring);
-
-  diagnostic_info diagnostic;
-  diagnostic_set_info (&diagnostic, gmsgid, ap, &richloc, DK_WARNING);
-  diagnostic.option_index = opt;
-  bool warned = report_diagnostic (&diagnostic);
-
-  if (!err && fmt_substring_loc && !substring_within_range)
-    /* Case 2.  */
-    if (warned)
-      {
-	rich_location substring_richloc (line_table, fmt_substring_loc);
-	if (corrected_substring)
-	  substring_richloc.add_fixit_replace (fmt_substring_range,
-					       corrected_substring);
-	inform_at_rich_loc (&substring_richloc,
-			    "format string is defined here");
-      }
-
-  return warned;
-}
-
-/* Variadic call to format_warning_va.  */
-
-ATTRIBUTE_GCC_DIAG (5,0)
-static bool
-format_warning_at_substring (const substring_loc &fmt_loc,
-			     source_range *param_range,
-			     const char *corrected_substring,
-			     int opt, const char *gmsgid, ...)
-{
-  va_list ap;
-  va_start (ap, gmsgid);
-  bool warned = format_warning_va (fmt_loc, param_range, corrected_substring,
-				   opt, gmsgid, &ap);
-  va_end (ap);
-
-  return warned;
-}
-
 /* Emit a warning as per format_warning_va, but construct the substring_loc
    for the character at offset (CHAR_IDX - 1) within a string constant
    FORMAT_STRING_CST at FMT_STRING_LOC.  */
diff --git a/gcc/c/c-lang.c b/gcc/c/c-lang.c
index b26be6a..b4096d0 100644
--- a/gcc/c/c-lang.c
+++ b/gcc/c/c-lang.c
@@ -43,6 +43,9 @@ enum c_language_kind c_language = clk_c;
 #define LANG_HOOKS_RUN_LANG_SELFTESTS selftest::run_c_tests
 #endif /* #if CHECKING_P */
 
+#undef LANG_HOOKS_GET_SUBSTRING_LOCATION
+#define LANG_HOOKS_GET_SUBSTRING_LOCATION c_get_substring_location
+
 /* Each front end provides its own lang hook initializer.  */
 struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;
 
diff --git a/gcc/langhooks-def.h b/gcc/langhooks-def.h
index 10d910c..cf5f91d 100644
--- a/gcc/langhooks-def.h
+++ b/gcc/langhooks-def.h
@@ -24,6 +24,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "hooks.h"
 
 struct diagnostic_info;
+class substring_loc;
 
 /* Note to creators of new hooks:
 
@@ -81,6 +82,9 @@ extern void lhd_omp_firstprivatize_type_sizes (struct gimplify_omp_ctx *,
 					       tree);
 extern bool lhd_omp_mappable_type (tree);
 
+extern const char *lhd_get_substring_location (const substring_loc &,
+					       location_t *out_loc);
+
 #define LANG_HOOKS_NAME			"GNU unknown"
 #define LANG_HOOKS_IDENTIFIER_SIZE	sizeof (struct lang_identifier)
 #define LANG_HOOKS_INIT			hook_bool_void_false
@@ -121,6 +125,7 @@ extern bool lhd_omp_mappable_type (tree);
 #define LANG_HOOKS_EH_USE_CXA_END_CLEANUP	false
 #define LANG_HOOKS_DEEP_UNSHARING	false
 #define LANG_HOOKS_RUN_LANG_SELFTESTS   lhd_do_nothing
+#define LANG_HOOKS_GET_SUBSTRING_LOCATION lhd_get_substring_location
 
 /* Attribute hooks.  */
 #define LANG_HOOKS_ATTRIBUTE_TABLE		NULL
@@ -323,7 +328,8 @@ extern void lhd_end_section (void);
   LANG_HOOKS_BLOCK_MAY_FALLTHRU, \
   LANG_HOOKS_EH_USE_CXA_END_CLEANUP, \
   LANG_HOOKS_DEEP_UNSHARING, \
-  LANG_HOOKS_RUN_LANG_SELFTESTS \
+  LANG_HOOKS_RUN_LANG_SELFTESTS, \
+  LANG_HOOKS_GET_SUBSTRING_LOCATION \
 }
 
 #endif /* GCC_LANG_HOOKS_DEF_H */
diff --git a/gcc/langhooks.c b/gcc/langhooks.c
index 3256a9d..538d9f9 100644
--- a/gcc/langhooks.c
+++ b/gcc/langhooks.c
@@ -693,6 +693,14 @@ lhd_enum_underlying_base_type (const_tree enum_type)
 					 TYPE_UNSIGNED (enum_type));
 }
 
+/* Default implementation of LANG_HOOKS_GET_SUBSTRING_LOCATION.  */
+
+const char *
+lhd_get_substring_location (const substring_loc &, location_t *)
+{
+  return "unimplemented";
+}
+
 /* Returns true if the current lang_hooks represents the GNU C frontend.  */
 
 bool
diff --git a/gcc/langhooks.h b/gcc/langhooks.h
index 44c258e..c109c8c 100644
--- a/gcc/langhooks.h
+++ b/gcc/langhooks.h
@@ -34,6 +34,8 @@ typedef void (*lang_print_tree_hook) (FILE *, tree, int indent);
 enum classify_record
   { RECORD_IS_STRUCT, RECORD_IS_CLASS, RECORD_IS_INTERFACE };
 
+class substring_loc;
+
 /* The following hooks are documented in langhooks.c.  Must not be
    NULL.  */
 
@@ -513,6 +515,13 @@ struct lang_hooks
   /* Run all lang-specific selftests.  */
   void (*run_lang_selftests) (void);
 
+  /* Attempt to determine the source location of the substring.
+     If successful, return NULL and write the source location to *OUT_LOC.
+     Otherwise return an error message.  Error messages are intended
+     for GCC developers (to help debugging) rather than for end-users.  */
+  const char *(*get_substring_location) (const substring_loc &,
+					 location_t *out_loc);
+
   /* Whenever you add entries here, make sure you adjust langhooks-def.h
      and langhooks.c accordingly.  */
 };
diff --git a/gcc/substring-locations.c b/gcc/substring-locations.c
new file mode 100644
index 0000000..60bf1b0
--- /dev/null
+++ b/gcc/substring-locations.c
@@ -0,0 +1,195 @@
+/* Source locations within string literals.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "diagnostic.h"
+#include "cpplib.h"
+#include "tree.h"
+#include "langhooks.h"
+#include "substring-locations.h"
+
+/* Emit a warning governed by option OPT, using GMSGID as the format
+   string and AP as its arguments.
+
+   Attempt to obtain precise location information within a string
+   literal from FMT_LOC.
+
+   Case 1: if substring location is available, and is within the range of
+   the format string itself, the primary location of the
+   diagnostic is the substring range obtained from FMT_LOC, with the
+   caret at the *end* of the substring range.
+
+   For example:
+
+     test.c:90:10: warning: problem with '%i' here [-Wformat=]
+     printf ("hello %i", msg);
+                    ~^
+
+   Case 2: if the substring location is available, but is not within
+   the range of the format string, the primary location is that of the
+   format string, and an note is emitted showing the substring location.
+
+   For example:
+     test.c:90:10: warning: problem with '%i' here [-Wformat=]
+     printf("hello " INT_FMT " world", msg);
+            ^~~~~~~~~~~~~~~~~~~~~~~~~
+     test.c:19: note: format string is defined here
+     #define INT_FMT "%i"
+                      ~^
+
+   Case 3: if precise substring information is unavailable, the primary
+   location is that of the whole string passed to FMT_LOC's constructor.
+   For example:
+
+     test.c:90:10: warning: problem with '%i' here [-Wformat=]
+     printf(fmt, msg);
+            ^~~
+
+   For each of cases 1-3, if param_range is non-NULL, then it is used
+   as a secondary range within the warning.  For example, here it
+   is used with case 1:
+
+     test.c:90:16: warning: '%s' here but arg 2 has 'long' type [-Wformat=]
+     printf ("foo %s bar", long_i + long_j);
+                  ~^       ~~~~~~~~~~~~~~~
+
+   and here with case 2:
+
+     test.c:90:16: warning: '%s' here but arg 2 has 'long' type [-Wformat=]
+     printf ("foo " STR_FMT " bar", long_i + long_j);
+             ^~~~~~~~~~~~~~~~~~~~~  ~~~~~~~~~~~~~~~
+     test.c:89:16: note: format string is defined here
+     #define STR_FMT "%s"
+                      ~^
+
+   and with case 3:
+
+     test.c:90:10: warning: '%i' here, but arg 2 is "const char *' [-Wformat=]
+     printf(fmt, msg);
+            ^~~  ~~~
+
+   If CORRECTED_SUBSTRING is non-NULL, use it for cases 1 and 2 to provide
+   a fix-it hint, suggesting that it should replace the text within the
+   substring range.  For example:
+
+     test.c:90:10: warning: problem with '%i' here [-Wformat=]
+     printf ("hello %i", msg);
+                    ~^
+                    %s
+
+   Return true if a warning was emitted, false otherwise.  */
+
+ATTRIBUTE_GCC_DIAG (5,0)
+bool
+format_warning_va (const substring_loc &fmt_loc,
+		   const source_range *param_range,
+		   const char *corrected_substring,
+		   int opt, const char *gmsgid, va_list *ap)
+{
+  bool substring_within_range = false;
+  location_t primary_loc;
+  location_t fmt_substring_loc = UNKNOWN_LOCATION;
+  source_range fmt_loc_range
+    = get_range_from_loc (line_table, fmt_loc.get_fmt_string_loc ());
+  const char *err = fmt_loc.get_location (&fmt_substring_loc);
+  source_range fmt_substring_range
+    = get_range_from_loc (line_table, fmt_substring_loc);
+  if (err)
+    /* Case 3: unable to get substring location.  */
+    primary_loc = fmt_loc.get_fmt_string_loc ();
+  else
+    {
+      if (fmt_substring_range.m_start >= fmt_loc_range.m_start
+	  && fmt_substring_range.m_finish <= fmt_loc_range.m_finish)
+	/* Case 1.  */
+	{
+	  substring_within_range = true;
+	  primary_loc = fmt_substring_loc;
+	}
+      else
+	/* Case 2.  */
+	{
+	  substring_within_range = false;
+	  primary_loc = fmt_loc.get_fmt_string_loc ();
+	}
+    }
+
+  rich_location richloc (line_table, primary_loc);
+
+  if (param_range)
+    {
+      location_t param_loc = make_location (param_range->m_start,
+					    param_range->m_start,
+					    param_range->m_finish);
+      richloc.add_range (param_loc, false);
+    }
+
+  if (!err && corrected_substring && substring_within_range)
+    richloc.add_fixit_replace (fmt_substring_range, corrected_substring);
+
+  diagnostic_info diagnostic;
+  diagnostic_set_info (&diagnostic, gmsgid, ap, &richloc, DK_WARNING);
+  diagnostic.option_index = opt;
+  bool warned = report_diagnostic (&diagnostic);
+
+  if (!err && fmt_substring_loc && !substring_within_range)
+    /* Case 2.  */
+    if (warned)
+      {
+	rich_location substring_richloc (line_table, fmt_substring_loc);
+	if (corrected_substring)
+	  substring_richloc.add_fixit_replace (fmt_substring_range,
+					       corrected_substring);
+	inform_at_rich_loc (&substring_richloc,
+			    "format string is defined here");
+      }
+
+  return warned;
+}
+
+/* Variadic call to format_warning_va.  */
+
+bool
+format_warning_at_substring (const substring_loc &fmt_loc,
+			     const source_range *param_range,
+			     const char *corrected_substring,
+			     int opt, const char *gmsgid, ...)
+{
+  va_list ap;
+  va_start (ap, gmsgid);
+  bool warned = format_warning_va (fmt_loc, param_range, corrected_substring,
+				   opt, gmsgid, &ap);
+  va_end (ap);
+
+  return warned;
+}
+
+/* Attempt to determine the source location of the substring.
+   If successful, return NULL and write the source location to *OUT_LOC.
+   Otherwise return an error message.  Error messages are intended
+   for GCC developers (to help debugging) rather than for end-users.  */
+
+const char *
+substring_loc::get_location (location_t *out_loc) const
+{
+  gcc_assert (out_loc);
+  return lang_hooks.get_substring_location (*this, out_loc);
+}
diff --git a/gcc/substring-locations.h b/gcc/substring-locations.h
index f839c74..f8788c9 100644
--- a/gcc/substring-locations.h
+++ b/gcc/substring-locations.h
@@ -20,6 +20,77 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_SUBSTRING_LOCATIONS_H
 #define GCC_SUBSTRING_LOCATIONS_H
 
+/* The substring_loc class encapsulates information on the source location
+   of a range of characters within a STRING_CST.
+
+   If needed by a diagnostic, the actual location_t of the substring_loc
+   can be calculated by calling its get_location method.  This calls a
+   langhook, since this is inherently frontend-specific.  For the C family
+   of frontends, it calls back into libcpp to reparse the strings.  This
+   gets the location information "on demand", rather than storing the
+   location information in the initial lex for every string.  Thus the
+   substring_loc can also be thought of as a deferred call into libcpp,
+   to allow the non-trivial work of reparsing the string to be delayed
+   until we actually need it (to emit a diagnostic for a particular range
+   of characters).
+
+   substring_loc::get_location returns NULL if it succeeds, or an
+   error message if it fails.  Error messages are intended for GCC
+   developers (to help debugging) rather than for end-users.
+
+   The easiest way to use a substring_loc is via the format_warning_* APIs,
+   which gracefully handle failure of substring_loc::get_location by using
+   the location of the string as a whole if substring-information is
+   unavailable.  */
+
+class substring_loc
+{
+ public:
+  /* Constructor.  FMT_STRING_LOC is the location of the string as
+     a whole.  STRING_TYPE is the type of the string.  It should be an
+     ARRAY_TYPE of INTEGER_TYPE, or a POINTER_TYPE to such an ARRAY_TYPE.
+     CARET_IDX, START_IDX, and END_IDX are offsets from the start
+     of the string data.  */
+  substring_loc (location_t fmt_string_loc, tree string_type,
+		 int caret_idx, int start_idx, int end_idx)
+  : m_fmt_string_loc (fmt_string_loc), m_string_type (string_type),
+    m_caret_idx (caret_idx), m_start_idx (start_idx), m_end_idx (end_idx) {}
+
+  void set_caret_index (int caret_idx) { m_caret_idx = caret_idx; }
+
+  const char *get_location (location_t *out_loc) const;
+
+  location_t get_fmt_string_loc () const { return m_fmt_string_loc; }
+  tree get_string_type () const { return m_string_type; }
+  int get_caret_idx () const { return m_caret_idx; }
+  int get_start_idx () const { return m_start_idx; }
+  int get_end_idx () const { return m_end_idx; }
+
+ private:
+  location_t m_fmt_string_loc;
+  tree m_string_type;
+  int m_caret_idx;
+  int m_start_idx;
+  int m_end_idx;
+};
+
+/* Functions for emitting a warning about a format string.  */
+
+extern bool format_warning_va (const substring_loc &fmt_loc,
+			       const source_range *param_range,
+			       const char *corrected_substring,
+			       int opt, const char *gmsgid, va_list *ap)
+  ATTRIBUTE_GCC_DIAG (5,0);
+
+extern bool format_warning_at_substring (const substring_loc &fmt_loc,
+					 const source_range *param_range,
+					 const char *corrected_substring,
+					 int opt, const char *gmsgid, ...)
+  ATTRIBUTE_GCC_DIAG (5,0);
+
+/* Implementation detail, for use when implementing
+   LANG_HOOKS_GET_SUBSTRING_LOCATION.  */
+
 extern const char *get_source_location_for_substring (cpp_reader *pfile,
 						      string_concat_db *concats,
 						      location_t strloc,
diff --git a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literals.c b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literals.c
index dff999c..99a504d 100644
--- a/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literals.c
+++ b/gcc/testsuite/gcc.dg/plugin/diagnostic_plugin_test_string_literals.c
@@ -33,6 +33,7 @@
 #include "print-tree.h"
 #include "cpplib.h"
 #include "c-family/c-pragma.h"
+#include "substring-locations.h"
 
 int plugin_is_GPL_compatible;
 
-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-09-07 17:03 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-25 15:40 [PATCH] Move class substring_loc from c-family into gcc David Malcolm
2016-08-25 15:54 ` [PATCH] Fixups for Martin's gimple-ssa-sprintf.c patch David Malcolm
2016-08-25 16:30   ` Martin Sebor
2016-08-31 16:23     ` Martin Sebor
2016-08-31 16:27       ` David Malcolm
2016-08-31 16:48         ` Martin Sebor
2016-09-02 23:57 ` [PATCH] Move class substring_loc from c-family into gcc Martin Sebor
2016-09-03  3:03   ` David Malcolm
2016-09-03  9:22   ` Manuel López-Ibáñez
2016-09-06 22:07     ` Martin Sebor
2016-09-07 17:19     ` [committed] " David Malcolm

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).