public inbox for gcc-patches@gcc.gnu.org
* [PATCH] warn for strlen of arrays with missing nul (PR 86552)
@ 2018-07-19 20:09 Martin Sebor
  2018-07-25 23:38 ` PING " Martin Sebor
  0 siblings, 1 reply; 53+ messages in thread
From: Martin Sebor @ 2018-07-19 20:09 UTC
  To: Gcc Patch List

[-- Attachment #1: Type: text/plain, Size: 775 bytes --]

In the discussion of my patch for PR 86532, Bernd noted that
GCC silently accepts constant character arrays with no
terminating nul as arguments to strlen (and other string
functions).

The attached patch is a first step toward detecting these kinds
of bugs in strlen calls: it issues a -Wstringop-overflow warning
for them.  The next step is to modify the handlers of all other
built-in functions to detect the same problem (not part of this
patch).
Yet another step is to detect these problems in arguments
initialized using the non-string form:

   const char a[] = { 'a', 'b', 'c' };
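
For illustration, both kinds of declaration in the sketch below produce a
character array with no terminating nul, so passing either one to strlen
reads past its end.  This patch diagnoses the string-literal form at compile
time; the braced form is left for a later step.  The runtime check shown is
only a sketch: has_terminating_nul is a hypothetical helper (not part of GCC)
that bounds its search with memchr instead of calling strlen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: return true if a nul byte
   occurs within the first N bytes of ARR.  Unlike strlen, memchr
   never reads beyond the N bytes it is given.  */
static bool
has_terminating_nul (const char *arr, size_t n)
{
  assert (arr != NULL);
  return memchr (arr, '\0', n) != NULL;
}

/* The literal fills all 5 elements, leaving no room for the nul
   (valid in C, though an error in C++).  */
static const char a[5] = "12345";

/* Braced-initializer form: the size is deduced as 3, with no nul.  */
static const char b[] = { 'a', 'b', 'c' };

/* Room for the nul: the size is deduced as 4.  */
static const char c[] = "abc";
```

Here has_terminating_nul (a, sizeof a) and has_terminating_nul (b, sizeof b)
are false while has_terminating_nul (c, sizeof c) is true.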

This patch is meant to apply on top of the one for bug 86532
(I tested it with an earlier version of that patch, so some
code in the diff context does not appear in the latest
version of the other diff).

Martin


[-- Attachment #2: gcc-86552.diff --]
[-- Type: text/x-patch, Size: 19657 bytes --]

PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays

gcc/ChangeLog:

	PR tree-optimization/86552
	* builtins.c (warn_string_no_nul): New function.
	(string_length): Add argument and use it.
	(c_strlen): Same.
	(expand_builtin_strlen): Detect missing nul.
	(fold_builtin_1): Adjust.
	* builtins.h (c_strlen): Add argument.
	* expr.c (string_constant): Add arguments.  Detect missing nul
	terminator and outermost declaration it's missing in.
	* expr.h (string_constant): Add argument.
	* fold-const.c (c_getstr): Revert test.

gcc/testsuite/ChangeLog:

	PR tree-optimization/86552
	* gcc.dg/warn-string-no-nul.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 03cf012..9885c4b 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, int);
 static rtx expand_builtin_expect (tree, rtx);
 static tree fold_builtin_constant_p (tree);
 static tree fold_builtin_classify_type (tree);
-static tree fold_builtin_strlen (location_t, tree, tree);
+static tree fold_builtin_strlen (location_t, tree, tree, tree);
 static tree fold_builtin_inf (location_t, tree, int);
 static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
 static bool validate_arg (const_tree, enum tree_code code);
@@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
   return n;
 }
 
+/* For a call expression EXP to a function that expects a string argument,
+   issue a diagnostic due to it being called with an argument NONSTR
+   that is a character array with no terminating NUL.  */
+
+static void
+warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
+{
+  loc = expansion_point_location_if_in_system_header (loc);
+
+  bool warned;
+  if (exp)
+    {
+      if (!fndecl)
+	fndecl = get_callee_fndecl (exp);
+      warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%K%qD argument missing terminating nul",
+			   exp, fndecl);
+    }
+  else
+    {
+      gcc_assert (fndecl);
+      warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%qD argument missing terminating nul",
+			   fndecl);
+    }
+
+  if (warned && DECL_P (nonstr))
+    inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared here");
+}
+
 /* Compute the length of a null-terminated character string or wide
    character string handling character sizes of 1, 2, and 4 bytes.
    TREE_STRING_LENGTH is not the right way because it evaluates to
@@ -567,13 +597,17 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
    accesses.  Note that this implies the result is not going to be emitted
    into the instruction stream.
 
+   When ARR is non-null and the string is not properly nul-terminated,
+   set *ARR to the declaration of the outermost constant object whose
+   initializer (or one of its elements) is not nul-terminated.
+
    The value returned is of type `ssizetype'.
 
    Unfortunately, string_constant can't access the values of const char
    arrays with initializers, so neither can we do so here.  */
 
 tree
-c_strlen (tree src, int only_value)
+c_strlen (tree src, int only_value, tree *arr /* = NULL */)
 {
   STRIP_NOPS (src);
   if (TREE_CODE (src) == COND_EXPR
@@ -581,24 +615,31 @@ c_strlen (tree src, int only_value)
     {
       tree len1, len2;
 
-      len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
-      len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
+      len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arr);
+      len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arr);
       if (tree_int_cst_equal (len1, len2))
 	return len1;
     }
 
   if (TREE_CODE (src) == COMPOUND_EXPR
       && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
-    return c_strlen (TREE_OPERAND (src, 1), only_value);
+    return c_strlen (TREE_OPERAND (src, 1), only_value, arr);
 
   location_t loc = EXPR_LOC_OR_LOC (src, input_location);
 
   /* Offset from the beginning of the string in bytes.  */
   tree byteoff;
-  src = string_constant (src, &byteoff);
-  if (src == 0)
+  /* True if the array is nul-terminated, false otherwise.  */
+  bool nulterm;
+  src = string_constant (src, &byteoff, &nulterm, arr);
+  if (!src)
     return NULL_TREE;
 
+  /* Clear *ARR when the string is nul-terminated.  It should be
+     of no interest to callers.  */
+  if (nulterm && arr)
+    *arr = NULL_TREE;
+
   /* Determine the size of the string element.  */
   unsigned eltsize
     = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (src))));
@@ -650,7 +691,8 @@ c_strlen (tree src, int only_value)
       offsave = fold_convert (ssizetype, offsave);
       tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
 				      build_int_cst (ssizetype, len * eltsize));
-      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
+      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize),
+				     offsave);
       return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
 			      build_zero_cst (ssizetype));
     }
@@ -690,7 +732,7 @@ c_strlen (tree src, int only_value)
      Since ELTOFF is our starting index into the string, no further
      calculation is needed.  */
   unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
-				maxelts - eltoff);
+				strelts - eltoff);
 
   return ssize_int (len);
 }
@@ -2855,7 +2897,6 @@ expand_builtin_strlen (tree exp, rtx target,
 
   struct expand_operand ops[4];
   rtx pat;
-  tree len;
   tree src = CALL_EXPR_ARG (exp, 0);
   rtx src_reg;
   rtx_insn *before_strlen;
@@ -2864,20 +2905,37 @@ expand_builtin_strlen (tree exp, rtx target,
   unsigned int align;
 
   /* If the length can be computed at compile-time, return it.  */
-  len = c_strlen (src, 0);
+  tree array;
+  tree len = c_strlen (src, 0, &array);
   if (len)
-    return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+    {
+      if (array)
+	{
+	  warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array);
+	  return NULL_RTX;
+	}
+      return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+    }
 
   /* If the length can be computed at compile-time and is constant
      integer, but there are side-effects in src, evaluate
      src for side-effects, then return len.
      E.g. x = strlen (i++ ? "xfoo" + 1 : "bar");
      can be optimized into: i++; x = 3;  */
-  len = c_strlen (src, 1);
-  if (len && TREE_CODE (len) == INTEGER_CST)
+  len = c_strlen (src, 1, &array);
+  if (len)
     {
-      expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL);
-      return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+      if (array)
+	{
+	  warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array);
+	  return NULL_RTX;
+	}
+
+      if (TREE_CODE (len) == INTEGER_CST)
+	{
+	  expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL);
+	  return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+	}
     }
 
   align = get_pointer_alignment (src) / BITS_PER_UNIT;
@@ -8238,19 +8296,27 @@ fold_builtin_classify_type (tree arg)
   return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));
 }
 
-/* Fold a call to __builtin_strlen with argument ARG.  */
+/* Fold a strlen call to FNDECL of TYPE, and with argument ARG.  */
 
 static tree
-fold_builtin_strlen (location_t loc, tree type, tree arg)
+fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
 {
   if (!validate_arg (arg, POINTER_TYPE))
     return NULL_TREE;
   else
     {
-      tree len = c_strlen (arg, 0);
-
+      tree arr = NULL_TREE;
+      tree len = c_strlen (arg, 0, &arr);
       if (len)
-	return fold_convert_loc (loc, type, len);
+	{
+	  /* To avoid warning multiple times about non-nul-terminated
+	     strings, only warn if their length has been determined
+	     and the call is being folded.  */
+	  if (arr)
+	    warn_string_no_nul (loc, NULL_TREE, fndecl, arr);
+
+	  return fold_convert_loc (loc, type, len);
+	}
 
       return NULL_TREE;
     }
@@ -9158,7 +9224,7 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
       return fold_builtin_classify_type (arg0);
 
     case BUILT_IN_STRLEN:
-      return fold_builtin_strlen (loc, type, arg0);
+      return fold_builtin_strlen (loc, fndecl, type, arg0);
 
     CASE_FLT_FN (BUILT_IN_FABS):
     CASE_FLT_FN_FLOATN_NX (BUILT_IN_FABS):
diff --git a/gcc/builtins.h b/gcc/builtins.h
index c922904..9446c09 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -57,7 +57,7 @@ extern unsigned int get_object_alignment (tree);
 extern bool get_pointer_alignment_1 (tree, unsigned int *,
 				     unsigned HOST_WIDE_INT *);
 extern unsigned int get_pointer_alignment (tree);
-extern tree c_strlen (tree, int);
+extern tree c_strlen (tree, int, tree * = NULL);
 extern void expand_builtin_setjmp_setup (rtx, rtx);
 extern void expand_builtin_setjmp_receiver (rtx);
 extern void expand_builtin_update_setjmp_buf (rtx);
diff --git a/gcc/expr.c b/gcc/expr.c
index 79ead3d..79bcbbe 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11271,10 +11271,14 @@ is_aligning_offset (const_tree offset, const_tree exp)
 /* Return the tree node if an ARG corresponds to a string constant or zero
    if it doesn't.  If we return nonzero, set *PTR_OFFSET to the (possibly
    non-constant) offset in bytes within the string that ARG is accessing.
+   If NULTERM is non-null, also accept character sequences that aren't
+   nul-terminated strings.  In that case, set *NULTERM to true if ARG
+   refers to a nul-terminated string and to false otherwise.
    The type of the offset is sizetype.  */
 
 tree
-string_constant (tree arg, tree *ptr_offset)
+string_constant (tree arg, tree *ptr_offset, bool *nulterm /* = NULL */,
+		 tree *decl /* = NULL */)
 {
   tree array;
   STRIP_NOPS (arg);
@@ -11335,7 +11339,7 @@ string_constant (tree arg, tree *ptr_offset)
 	return NULL_TREE;
 
       tree offset;
-      if (tree str = string_constant (arg0, &offset))
+      if (tree str = string_constant (arg0, &offset, nulterm, decl))
 	{
 	  tree type = TREE_TYPE (arg1);
 	  *ptr_offset = fold_build2 (PLUS_EXPR, type, offset, arg1);
@@ -11357,12 +11361,10 @@ string_constant (tree arg, tree *ptr_offset)
       if (TREE_CODE (TREE_TYPE (array)) != ARRAY_TYPE)
 	return NULL_TREE;
 
-      while (TREE_CODE (chartype) == ARRAY_TYPE
-	     || TREE_CODE (chartype) == POINTER_TYPE)
-	chartype = TREE_TYPE (chartype);
+      gcc_assert (TREE_CODE (chartype) == POINTER_TYPE);
 
-      if (TREE_CODE (chartype) != INTEGER_TYPE)
-	return NULL;
+      while (TREE_CODE (chartype) != INTEGER_TYPE)
+	chartype = TREE_TYPE (chartype);
 
       /* Set the non-constant offset to the non-constant index scaled
 	 by the size of the character type.  */
@@ -11374,6 +11376,8 @@ string_constant (tree arg, tree *ptr_offset)
   if (TREE_CODE (array) == STRING_CST)
     {
       *ptr_offset = fold_convert (sizetype, offset);
+      if (decl)
+	*decl = NULL_TREE;
       return array;
     }
 
@@ -11420,6 +11424,38 @@ string_constant (tree arg, tree *ptr_offset)
   if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
     return NULL_TREE;
 
+  unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size);
+
+  /* When ARG refers to an aggregate (of arrays) determine the size
+     of the character array within the aggregate.  */
+  tree ref = arg;
+  tree reftype = TREE_TYPE (arg);
+  while (TREE_CODE (ref) == ARRAY_REF)
+    {
+      reftype = TREE_TYPE (ref);
+      ref = TREE_OPERAND (ref, 0);
+    }
+
+  if (TREE_CODE (ref) == COMPONENT_REF)
+    reftype = TREE_TYPE (ref);
+
+  while (TREE_CODE (reftype) == ARRAY_TYPE)
+    {
+      tree next = TREE_TYPE (reftype);
+      if (TREE_CODE (next) == INTEGER_TYPE)
+	{
+	  if (tree size = TYPE_SIZE_UNIT (reftype))
+	    if (tree_fits_uhwi_p (size))
+	      array_elts = tree_to_uhwi (size);
+	  break;
+	}
+
+      reftype = TREE_TYPE (reftype);
+    }
+
+  if (decl)
+    *decl = array;
+
   /* Avoid returning a string that doesn't fit in the array
      it is stored in, like
      const char a[4] = "abcde";
@@ -11430,7 +11466,9 @@ string_constant (tree arg, tree *ptr_offset)
      but not to strlen().  */
   unsigned HOST_WIDE_INT length
     = strnlen (TREE_STRING_POINTER (init), TREE_STRING_LENGTH (init));
-  if (compare_tree_int (array_size, length + 1) < 0)
+  if (nulterm)
+    *nulterm = array_elts > length;
+  else if (array_elts <= length)
     return NULL_TREE;
 
   *ptr_offset = offset;
diff --git a/gcc/expr.h b/gcc/expr.h
index cf047d4..e630979 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -288,7 +288,7 @@ expand_normal (tree exp)
 
 /* Return the tree node and offset if a given argument corresponds to
    a string constant.  */
-extern tree string_constant (tree, tree *);
+extern tree string_constant (tree, tree *, bool * = NULL, tree * = NULL);
 
 /* Two different ways of generating switch statements.  */
 extern int try_casesi (tree, tree, tree, tree, rtx, rtx, rtx, profile_probability);
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 15bbf95..b318fc77 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -14638,8 +14638,7 @@ c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */,
 	 NUL-terminated strings.  */
       *strsize = string_size;
     }
-  else if (string_size < string_length
-	   || string[string_length - 1] != '\0')
+  else if (string[string_length - 1] != '\0')
     {
       /* Support only properly NUL-terminated strings but handle
 	 consecutive strings within the same array, such as the six
diff --git a/gcc/testsuite/gcc.dg/warn-string-no-nul.c b/gcc/testsuite/gcc.dg/warn-string-no-nul.c
new file mode 100644
index 0000000..e470ade
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/warn-string-no-nul.c
@@ -0,0 +1,200 @@
+/* PR tree-optimization/86552 - missing warning for reading past the end
+   of non-string arrays
+   { dg-do compile }
+   { dg-options "-O2 -Wall -ftrack-macro-expansion=0" } */
+
+extern __SIZE_TYPE__ strlen (const char*);
+
+const char a[5] = "12345";   /* { dg-message "declared here" } */
+
+int i0 = 0;
+
+void sink (int, ...);
+
+#define CONCAT(a, b)   a ## b
+#define CAT(a, b)      CONCAT(a, b)
+
+#define T(str)					\
+  __attribute__ ((noipa))			\
+  void CAT (test_, __LINE__) (void) {		\
+    sink (strlen (str));			\
+  } typedef void dummy_type
+
+T (a);                /* { dg-warning "argument missing terminating nul" }  */
+T (&a[0]);            /* { dg-warning "nul" }  */
+T (&a[0] + 1);        /* { dg-warning "nul" }  */
+T (&a[1]);            /* { dg-warning "nul" }  */
+T (&a[i0]);           /* { dg-warning "nul" }  */
+
+
+const char b[][5] = { /* { dg-message "declared here" } */
+  "12", "123", "1234", "54321"
+};
+
+T (b[0]);
+T (b[1]);
+T (b[2]);
+T (b[3]);             /* { dg-warning "nul" }  */
+
+T (&b[2][1]);
+T (&b[2][1] + 1);
+T (&b[2][i0]);
+T (&b[2][1] + i0);
+
+T (&b[3][1]);         /* { dg-warning "nul" }  */
+T (&b[3][1] + 1);     /* { dg-warning "nul" }  */
+T (&b[3][i0]);        /* { dg-warning "nul" }  */
+T (&b[3][1] + i0);    /* { dg-warning "nul" }  */
+
+
+struct A { char a[5], b[5]; };
+
+const struct A s = { "1234", "12345" };
+
+T (s.a);
+T (&s.a[0]);
+T (&s.a[0] + 1);
+T (&s.a[0] + i0);
+T (&s.a[1]);
+T (&s.a[1] + 1);
+T (&s.a[1] + i0);
+
+T (s.b);              /* { dg-warning "nul" }  */
+T (&s.b[0]);          /* { dg-warning "nul" }  */
+T (&s.b[0] + 1);      /* { dg-warning "nul" }  */
+T (&s.b[0] + i0);     /* { dg-warning "nul" }  */
+T (&s.b[1]);          /* { dg-warning "nul" }  */
+T (&s.b[1] + 1);      /* { dg-warning "nul" }  */
+T (&s.b[1] + i0);     /* { dg-warning "nul" }  */
+
+struct B { struct A a[2]; };
+
+const struct B ba[] = {
+  { { { "123", "12345" }, { "12345", "123" } } },
+  { { { "12345", "123" }, { "123", "12345" } } },
+  { { { "1", "12" },      { "123", "1234" } } },
+  { { { "123", "1234" },  { "12345", "12" } } }
+};
+
+T (ba[0].a[0].a);
+T (&ba[0].a[0].a[0]);
+T (&ba[0].a[0].a[0] + 1);
+T (&ba[0].a[0].a[0] + i0);
+T (&ba[0].a[0].a[1]);
+T (&ba[0].a[0].a[1] + 1);
+T (&ba[0].a[0].a[1] + i0);
+
+T (ba[0].a[0].b);           /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[0].a[1].a);           /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[0].a[1].b);
+T (&ba[0].a[1].b[0]);
+T (&ba[0].a[1].b[0] + 1);
+T (&ba[0].a[1].b[0] + i0);
+T (&ba[0].a[1].b[1]);
+T (&ba[0].a[1].b[1] + 1);
+T (&ba[0].a[1].b[1] + i0);
+
+
+T (ba[1].a[0].a);           /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[1].a[0].b);
+T (&ba[1].a[0].b[0]);
+T (&ba[1].a[0].b[0] + 1);
+T (&ba[1].a[0].b[0] + i0);
+T (&ba[1].a[0].b[1]);
+T (&ba[1].a[0].b[1] + 1);
+T (&ba[1].a[0].b[1] + i0);
+
+T (ba[1].a[1].a);
+T (&ba[1].a[1].a[0]);
+T (&ba[1].a[1].a[0] + 1);
+T (&ba[1].a[1].a[0] + i0);
+T (&ba[1].a[1].a[1]);
+T (&ba[1].a[1].a[1] + 1);
+T (&ba[1].a[1].a[1] + i0);
+
+T (ba[1].a[1].b);           /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1] + i0);  /* { dg-warning "nul" }  */
+
+
+T (ba[2].a[0].a);
+T (&ba[2].a[0].a[0]);
+T (&ba[2].a[0].a[0] + 1);
+T (&ba[2].a[0].a[0] + i0);
+T (&ba[2].a[0].a[1]);
+T (&ba[2].a[0].a[1] + 1);
+T (&ba[2].a[0].a[1] + i0);
+
+T (ba[2].a[0].b);
+T (&ba[2].a[0].b[0]);
+T (&ba[2].a[0].b[0] + 1);
+T (&ba[2].a[0].b[0] + i0);
+T (&ba[2].a[0].b[1]);
+T (&ba[2].a[0].b[1] + 1);
+T (&ba[2].a[0].b[1] + i0);
+
+T (ba[2].a[1].a);
+T (&ba[2].a[1].a[0]);
+T (&ba[2].a[1].a[0] + 1);
+T (&ba[2].a[1].a[0] + i0);
+T (&ba[2].a[1].a[1]);
+T (&ba[2].a[1].a[1] + 1);
+T (&ba[2].a[1].a[1] + i0);
+
+
+T (ba[3].a[0].a);
+T (&ba[3].a[0].a[0]);
+T (&ba[3].a[0].a[0] + 1);
+T (&ba[3].a[0].a[0] + i0);
+T (&ba[3].a[0].a[1]);
+T (&ba[3].a[0].a[1] + 1);
+T (&ba[3].a[0].a[1] + i0);
+
+T (ba[3].a[0].b);
+T (&ba[3].a[0].b[0]);
+T (&ba[3].a[0].b[0] + 1);
+T (&ba[3].a[0].b[0] + i0);
+T (&ba[3].a[0].b[1]);
+T (&ba[3].a[0].b[1] + 1);
+T (&ba[3].a[0].b[1] + i0);
+
+T (ba[3].a[1].a);           /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0]);       /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1]);       /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[3].a[1].b);
+T (&ba[3].a[1].b[0]);
+T (&ba[3].a[1].b[0] + 1);
+T (&ba[3].a[1].b[0] + i0);
+T (&ba[3].a[1].b[1]);
+T (&ba[3].a[1].b[1] + 1);
+T (&ba[3].a[1].b[1] + i0);
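
The core of the check this patch adds to string_constant compares the number
of elements in the innermost character array with the length of its
initializer: the nul fits only if the string is strictly shorter than the
array.  Below is a simplified C model of that comparison, applied to a few
declarations from the new test.  nul_terminated and bounded_length are
hypothetical names used only for illustration; bounded_length stands in for
the POSIX strnlen that the patch calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Portable stand-in for POSIX strnlen: the length of S, but never
   more than MAXLEN.  */
static size_t
bounded_length (const char *s, size_t maxlen)
{
  const char *p = memchr (s, '\0', maxlen);
  return p ? (size_t) (p - s) : maxlen;
}

/* Simplified model (an assumption, not GCC code) of the test added to
   string_constant: an array of ARRAY_ELTS characters whose initializer
   INIT spans INIT_LEN bytes holds a nul-terminated string only if the
   string is shorter than the array.  */
static bool
nul_terminated (size_t array_elts, const char *init, size_t init_len)
{
  return array_elts > bounded_length (init, init_len);
}

/* Declarations taken from gcc.dg/warn-string-no-nul.c.  */
struct A { char a[5], b[5]; };
static const struct A s = { "1234", "12345" };
static const char b[][5] = { "12", "123", "1234", "54321" };
```

Measured against the 10-byte struct s instead of its 5-element member s.b,
the string "12345" would appear to be terminated.  That is why the patch walks
ARRAY_REF and COMPONENT_REF operands down to the innermost character array
before comparing.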



* PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)
  2018-07-19 20:09 [PATCH] warn for strlen of arrays with missing nul (PR 86552) Martin Sebor
@ 2018-07-25 23:38 ` Martin Sebor
  2018-07-30 19:18   ` Martin Sebor
  2018-08-13 21:23   ` [PATCH 0/6] improve handling of char arrays with missing nul (PR 86552, 86711, 86714) Martin Sebor
  0 siblings, 2 replies; 53+ messages in thread
From: Martin Sebor @ 2018-07-25 23:38 UTC
  To: Gcc Patch List

Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html

The fix for bug 86532 has been checked in, so this enhancement
can now be applied on top of it (with only minor adjustments).



* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)
  2018-07-25 23:38 ` PING " Martin Sebor
@ 2018-07-30 19:18   ` Martin Sebor
  2018-08-02  2:44     ` PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) ) Martin Sebor
  2018-08-13 21:23   ` [PATCH 0/6] improve handling of char arrays with missing nul (PR 86552, 86711, 86714) Martin Sebor
  1 sibling, 1 reply; 53+ messages in thread
From: Martin Sebor @ 2018-07-30 19:18 UTC
  To: Gcc Patch List

[-- Attachment #1: Type: text/plain, Size: 1598 bytes --]

Attached is an updated version of the patch that handles more
instances of calling strlen() on a constant array that is not
a nul-terminated string.

No functions other than strlen are explicitly handled yet,
and neither are constant arrays with braced-initializer lists
like const char a[] = { 'a', 'b', 'c' }.  I am testing an
independent solution for those (bug 86552).  Once it is in,
the warning will be able to detect such arrays as well.

Tested on x86_64-linux.
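
One of the additional cases the updated patch handles is a conditional
argument such as strlen (c ? x : y).  In the attached diff, c_strlen now
evaluates each arm separately and, when both yield the same constant length,
reports whichever arm lacks a nul.  The following is a simplified C model of
that rule; the strlen_info type and the cond_strlen function are hypothetical
and only mirror the logic, not GCC's tree-based interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for what c_strlen reports about one operand:
   LEN is the constant length, or -1 if unknown; NONSTR names the
   array that lacks a nul, or is NULL if the operand is a proper
   string.  */
struct strlen_info
{
  long len;
  const char *nonstr;
};

/* Sketch (an assumption, not GCC code) of the COND_EXPR handling in
   the updated c_strlen: the length is known only when both arms
   agree, and the first arm found to lack a nul is reported so the
   caller can warn.  */
static struct strlen_info
cond_strlen (struct strlen_info then_arm, struct strlen_info else_arm)
{
  struct strlen_info r = { -1, NULL };
  if (then_arm.len >= 0 && then_arm.len == else_arm.len)
    {
      r.len = then_arm.len;
      r.nonstr = then_arm.nonstr ? then_arm.nonstr : else_arm.nonstr;
    }
  return r;
}
```

Reporting the array whenever the lengths agree mirrors the updated c_strlen,
where a non-null *ARR makes expand_builtin_strlen warn instead of expanding
the constant length.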



[-- Attachment #2: gcc-86552.diff --]
[-- Type: text/x-patch, Size: 33470 bytes --]

PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays

gcc/ChangeLog:

	PR tree-optimization/86552
	* builtins.h (warn_string_no_nul): Declare.
	(c_strlen): Add argument.
	* builtins.c (warn_string_no_nul): New function.
	(fold_builtin_strlen): Add argument.  Detect missing nul.
	(fold_builtin_1): Adjust.
	(string_length): Add argument and use it.
	(c_strlen): Same.
	(expand_builtin_strlen): Detect missing nul.
	* expr.c (string_constant): Add arguments.  Detect missing nul
	terminator and outermost declaration it's missing in.
	* expr.h (string_constant): Add argument.
	* fold-const.c (c_getstr): Change argument to bool*, rename
	other arguments.
	* fold-const-call.c (fold_const_call): Detect missing nul.
	* gimple-fold.c (get_range_strlen): Add argument.
	(get_maxval_strlen): Adjust.
	* gimple-fold.h (get_range_strlen): Add argument.

gcc/testsuite/ChangeLog:

	PR tree-optimization/86552
	* gcc.dg/warn-string-no-nul.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index aa3e0d8..f4924d5 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, int);
 static rtx expand_builtin_expect (tree, rtx);
 static tree fold_builtin_constant_p (tree);
 static tree fold_builtin_classify_type (tree);
-static tree fold_builtin_strlen (location_t, tree, tree);
+static tree fold_builtin_strlen (location_t, tree, tree, tree);
 static tree fold_builtin_inf (location_t, tree, int);
 static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
 static bool validate_arg (const_tree, enum tree_code code);
@@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
   return n;
 }
 
+/* For a call expression EXP to a function that expects a string argument,
+   issue a diagnostic due to it being called with an argument NONSTR
+   that is a character array with no terminating NUL.  */
+
+void
+warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
+{
+  loc = expansion_point_location_if_in_system_header (loc);
+
+  bool warned;
+  if (exp)
+    {
+      if (!fndecl)
+	fndecl = get_callee_fndecl (exp);
+      warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%K%qD argument missing terminating nul",
+			   exp, fndecl);
+    }
+  else
+    {
+      gcc_assert (fndecl);
+      warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%qD argument missing terminating nul",
+			   fndecl);
+    }
+
+  if (warned && DECL_P (nonstr))
+    inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared here");
+}
+
 /* Compute the length of a null-terminated character string or wide
    character string handling character sizes of 1, 2, and 4 bytes.
    TREE_STRING_LENGTH is not the right way because it evaluates to
@@ -567,37 +597,60 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
    accesses.  Note that this implies the result is not going to be emitted
    into the instruction stream.
 
+   When ARR is non-null and the string is not properly nul-terminated,
+   set *ARR to the declaration of the outermost constant object whose
+   initializer (or one of its elements) is not nul-terminated.
+
    The value returned is of type `ssizetype'.
 
    Unfortunately, string_constant can't access the values of const char
    arrays with initializers, so neither can we do so here.  */
 
 tree
-c_strlen (tree src, int only_value)
+c_strlen (tree src, int only_value, tree *arr /* = NULL */)
 {
   STRIP_NOPS (src);
+
+  /* Used to detect non-nul-terminated strings in subexpressions
+     of a conditional expression.  When ARR is null, point it at
+     one of the elements for simplicity.  */
+  tree arrs[] = { NULL_TREE, NULL_TREE };
+  if (!arr)
+    arr = arrs;
+
   if (TREE_CODE (src) == COND_EXPR
       && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
     {
-      tree len1, len2;
-
-      len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
-      len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
+      tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs);
+      tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1);
       if (tree_int_cst_equal (len1, len2))
-	return len1;
+	{
+	  *arr = arrs[0] ? arrs[0] : arrs[1];
+	  return len1;
+	}
     }
 
   if (TREE_CODE (src) == COMPOUND_EXPR
       && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
-    return c_strlen (TREE_OPERAND (src, 1), only_value);
+    return c_strlen (TREE_OPERAND (src, 1), only_value, arr);
 
   location_t loc = EXPR_LOC_OR_LOC (src, input_location);
 
   /* Offset from the beginning of the string in bytes.  */
   tree byteoff;
-  src = string_constant (src, &byteoff);
-  if (src == 0)
-    return NULL_TREE;
+  /* True if the array is nul-terminated, false otherwise.  */
+  bool nulterm;
+  src = string_constant (src, &byteoff, &nulterm, arr);
+  if (!src)
+    {
+      *arr = arrs[0] ? arrs[0] : arrs[1];
+      return NULL_TREE;
+    }
+
+  /* Clear *ARR when the string is nul-terminated.  It should be
+     of no interest to callers.  */
+  if (nulterm)
+    *arr = NULL_TREE;
 
   /* Determine the size of the string element.  */
   unsigned eltsize
@@ -621,6 +674,12 @@ c_strlen (tree src, int only_value)
 	maxelts = maxelts / eltsize - 1;
       }
 
+  /* Unless the caller is prepared to handle it by passing in a non-null
+     ARR, fail if the terminating nul doesn't fit in the array the string
+     is stored in (as in const char a[3] = "123").  */
+  if (!arr && maxelts < strelts)
+    return NULL_TREE;
+
   /* PTR can point to the byte representation of any string type, including
      char* and wchar_t*.  */
   const char *ptr = TREE_STRING_POINTER (src);
@@ -650,7 +709,8 @@ c_strlen (tree src, int only_value)
       offsave = fold_convert (ssizetype, offsave);
       tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
 				      build_int_cst (ssizetype, len * eltsize));
-      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
+      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize),
+				     offsave);
       return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
 			      build_zero_cst (ssizetype));
     }
@@ -690,7 +750,7 @@ c_strlen (tree src, int only_value)
      Since ELTOFF is our starting index into the string, no further
      calculation is needed.  */
   unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
-				maxelts - eltoff);
+				strelts - eltoff);
 
   return ssize_int (len);
 }
@@ -2855,7 +2915,6 @@ expand_builtin_strlen (tree exp, rtx target,
 
   struct expand_operand ops[4];
   rtx pat;
-  tree len;
   tree src = CALL_EXPR_ARG (exp, 0);
   rtx src_reg;
   rtx_insn *before_strlen;
@@ -2864,20 +2923,39 @@ expand_builtin_strlen (tree exp, rtx target,
   unsigned int align;
 
   /* If the length can be computed at compile-time, return it.  */
-  len = c_strlen (src, 0);
+  tree array;
+  tree len = c_strlen (src, 0, &array);
   if (len)
-    return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+    {
+      if (array)
+	{
+	  /* ARRAY refers to the non-nul-terminated constant array
+	     whose length is being computed.  */
+	  warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array);
+	  return NULL_RTX;
+	}
+      return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+    }
 
   /* If the length can be computed at compile-time and is constant
      integer, but there are side-effects in src, evaluate
      src for side-effects, then return len.
      E.g. x = strlen (i++ ? "xfoo" + 1 : "bar");
      can be optimized into: i++; x = 3;  */
-  len = c_strlen (src, 1);
-  if (len && TREE_CODE (len) == INTEGER_CST)
+  len = c_strlen (src, 1, &array);
+  if (len)
     {
-      expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL);
-      return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+      if (array)
+	{
+	  warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array);
+	  return NULL_RTX;
+	}
+
+      if (TREE_CODE (len) == INTEGER_CST)
+	{
+	  expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL);
+	  return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+	}
     }
 
   align = get_pointer_alignment (src) / BITS_PER_UNIT;
@@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg)
   return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));
 }
 
-/* Fold a call to __builtin_strlen with argument ARG.  */
+/* Fold a call to strlen FNDECL with argument ARG and result TYPE.  */
 
 static tree
-fold_builtin_strlen (location_t loc, tree type, tree arg)
+fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
 {
   if (!validate_arg (arg, POINTER_TYPE))
     return NULL_TREE;
   else
     {
-      tree len = c_strlen (arg, 0);
-
+      tree arr = NULL_TREE;
+      tree len = c_strlen (arg, 0, &arr);
       if (len)
-	return fold_convert_loc (loc, type, len);
+	{
+	  if (loc == UNKNOWN_LOCATION && EXPR_HAS_LOCATION (arg))
+	    loc = EXPR_LOCATION (arg);
+
+	  /* To avoid warning multiple times about non-nul-terminated
+	     strings only warn if their length has been determined
+	     and it's being folded.  */
+	  if (arr)
+	    warn_string_no_nul (loc, NULL_TREE, fndecl, arr);
+
+	  return fold_convert_loc (loc, type, len);
+	}
 
       return NULL_TREE;
     }
@@ -9175,7 +9264,7 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
       return fold_builtin_classify_type (arg0);
 
     case BUILT_IN_STRLEN:
-      return fold_builtin_strlen (loc, type, arg0);
+      return fold_builtin_strlen (loc, fndecl, type, arg0);
 
     CASE_FLT_FN (BUILT_IN_FABS):
     CASE_FLT_FN_FLOATN_NX (BUILT_IN_FABS):
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 2e0a2f9..73b0b0b 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -58,7 +58,7 @@ extern bool get_pointer_alignment_1 (tree, unsigned int *,
 				     unsigned HOST_WIDE_INT *);
 extern unsigned int get_pointer_alignment (tree);
 extern unsigned string_length (const void*, unsigned, unsigned);
-extern tree c_strlen (tree, int);
+extern tree c_strlen (tree, int, tree * = NULL);
 extern void expand_builtin_setjmp_setup (rtx, rtx);
 extern void expand_builtin_setjmp_receiver (rtx);
 extern void expand_builtin_update_setjmp_buf (rtx);
@@ -103,7 +103,7 @@ extern bool target_char_cst_p (tree t, char *p);
 
 extern internal_fn associated_internal_fn (tree);
 extern internal_fn replacement_internal_fn (gcall *);
-
+extern void warn_string_no_nul (location_t, tree, tree, tree);
 extern tree max_object_size ();
 
 #endif /* GCC_BUILTINS_H */
diff --git a/gcc/expr.c b/gcc/expr.c
index de6709d..edbd7f8 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11271,10 +11271,14 @@ is_aligning_offset (const_tree offset, const_tree exp)
 /* Return the tree node if an ARG corresponds to a string constant or zero
    if it doesn't.  If we return nonzero, set *PTR_OFFSET to the (possibly
    non-constant) offset in bytes within the string that ARG is accessing.
+   If NULTERM is non-null, consider valid even sequences of characters that
+   aren't nul-terminated strings.  In that case, set *NULTERM if ARG refers
+   to a nul-terminated string and clear it otherwise.
    The type of the offset is sizetype.  */
 
 tree
-string_constant (tree arg, tree *ptr_offset)
+string_constant (tree arg, tree *ptr_offset, bool *nulterm /* = NULL */,
+		 tree *decl /* = NULL */)
 {
   tree array;
   STRIP_NOPS (arg);
@@ -11328,7 +11332,7 @@ string_constant (tree arg, tree *ptr_offset)
 	return NULL_TREE;
 
       tree offset;
-      if (tree str = string_constant (arg0, &offset))
+      if (tree str = string_constant (arg0, &offset, nulterm, decl))
 	{
 	  /* Avoid pointers to arrays (see bug 86622).  */
 	  if (POINTER_TYPE_P (TREE_TYPE (arg))
@@ -11368,6 +11372,10 @@ string_constant (tree arg, tree *ptr_offset)
   if (TREE_CODE (array) == STRING_CST)
     {
       *ptr_offset = fold_convert (sizetype, offset);
+      if (nulterm)
+	*nulterm = true;
+      if (decl)
+	*decl = NULL_TREE;
       return array;
     }
 
@@ -11414,6 +11422,49 @@ string_constant (tree arg, tree *ptr_offset)
   if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
     return NULL_TREE;
 
+  unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size);
+
+  /* When ARG refers to an aggregate (of arrays) try to determine
+     the size of the character array within the aggregate.  */
+  tree ref = arg;
+  tree reftype = TREE_TYPE (arg);
+
+  if (TREE_CODE (ref) == MEM_REF)
+    {
+      ref = TREE_OPERAND (ref, 0);
+      if (TREE_CODE (ref) == ADDR_EXPR)
+	{
+	  ref = TREE_OPERAND (ref, 0);
+	  reftype = TREE_TYPE (ref);
+	}
+    }
+  else
+    while (TREE_CODE (ref) == ARRAY_REF)
+      {
+	reftype = TREE_TYPE (ref);
+	ref = TREE_OPERAND (ref, 0);
+      }
+
+  if (TREE_CODE (ref) == COMPONENT_REF)
+    reftype = TREE_TYPE (ref);
+
+  while (TREE_CODE (reftype) == ARRAY_TYPE)
+    {
+      tree next = TREE_TYPE (reftype);
+      if (TREE_CODE (next) == INTEGER_TYPE)
+	{
+	  if (tree size = TYPE_SIZE_UNIT (reftype))
+	    if (tree_fits_uhwi_p (size))
+	      array_elts = tree_to_uhwi (size);
+	  break;
+	}
+
+      reftype = TREE_TYPE (reftype);
+    }
+
+  if (decl)
+    *decl = array;
+
   /* Avoid returning a string that doesn't fit in the array
      it is stored in, like
      const char a[4] = "abcde";
@@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
   unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
   length = string_length (TREE_STRING_POINTER (init), charsize,
 			  length / charsize);
-  if (compare_tree_int (array_size, length + 1) < 0)
+  if (nulterm)
+    *nulterm = array_elts > length;
+  else if (array_elts <= length)
     return NULL_TREE;
 
   *ptr_offset = offset;
diff --git a/gcc/expr.h b/gcc/expr.h
index cf047d4..e630979 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -288,7 +288,7 @@ expand_normal (tree exp)
 
 /* Return the tree node and offset if a given argument corresponds to
    a string constant.  */
-extern tree string_constant (tree, tree *);
+extern tree string_constant (tree, tree *, bool * = NULL, tree * = NULL);
 
 /* Two different ways of generating switch statements.  */
 extern int try_casesi (tree, tree, tree, tree, rtx, rtx, rtx, profile_probability);
diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
index 06a42060..849a443 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -1199,9 +1199,14 @@ fold_const_call (combined_fn fn, tree type, tree arg)
   switch (fn)
     {
     case CFN_BUILT_IN_STRLEN:
-      if (const char *str = c_getstr (arg))
-	return build_int_cst (type, strlen (str));
-      return NULL_TREE;
+      {
+	bool nulterm;
+	if (const char *str = c_getstr (arg, NULL, &nulterm))
+	  if (nulterm)
+	    return build_int_cst (type, strlen (str));
+
+	return NULL_TREE;
+      }
 
     CASE_CFN_NAN:
     CASE_FLT_FN_FLOATN_NX (CFN_BUILT_IN_NAN):
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index b318fc77..ecbc38c 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -14577,23 +14577,23 @@ fold_build_pointer_plus_hwi_loc (location_t loc, tree ptr, HOST_WIDE_INT off)
 /* Return a pointer P to a NUL-terminated string representing the sequence
    of constant characters referred to by SRC (or a subsequence of such
    characters within it if SRC is a reference to a string plus some
-   constant offset).  If STRLEN is non-null, store stgrlen(P) in *STRLEN.
-   If STRSIZE is non-null, store in *STRSIZE the size of the array
-   the string is stored in; in that case, even though P points to a NUL
-   terminated string, SRC need not refer to one.  This can happen when
-   SRC refers to a constant character array initialized to all non-NUL
-   values, as in the C declaration: char a[4] = "1234";  */
+   constant offset).  If STRSIZE is non-null, store the size of the string
+   literal in *STRSIZE, including any embedded or terminating nuls.
+   If NULTERM is non-null, set *NULTERM if the referenced string is
+   guaranteed to contain a terminating NUL and clear it otherwise.
+   The latter can happen in the case of valid C declarations such as:
+   const char a[3] = "123";  */
 
 const char *
-c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */,
-	  unsigned HOST_WIDE_INT *strsize /* = NULL */)
+c_getstr (tree src, unsigned HOST_WIDE_INT *strsize /* = NULL */,
+	  bool *nulterm /* = NULL */)
 {
   tree offset_node;
 
-  if (strlen)
-    *strlen = 0;
+  if (strsize)
+    *strsize = 0;
 
-  src = string_constant (src, &offset_node);
+  src = string_constant (src, &offset_node, nulterm);
   if (src == 0)
     return NULL;
 
@@ -14606,47 +14606,42 @@ c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */,
 	offset = tree_to_uhwi (offset_node);
     }
 
-  /* STRING_LENGTH is the size of the string literal, including any
-     embedded NULs.  STRING_SIZE is the size of the array the string
+  /* STRING_SIZE is the size of the string literal, including any
+     embedded NULs.  ARRAY_SIZE is the size of the array the string
      literal is stored in.  */
-  unsigned HOST_WIDE_INT string_length = TREE_STRING_LENGTH (src);
-  unsigned HOST_WIDE_INT string_size = string_length;
+  unsigned HOST_WIDE_INT string_size = TREE_STRING_LENGTH (src);
+  unsigned HOST_WIDE_INT array_size = string_size;
   tree type = TREE_TYPE (src);
   if (tree size = TYPE_SIZE_UNIT (type))
     if (tree_fits_shwi_p (size))
-      string_size = tree_to_uhwi (size);
+      array_size = tree_to_uhwi (size);
+
+  const char *string = TREE_STRING_POINTER (src);
 
-  if (strlen)
+  if (strsize)
     {
-      /* Compute and store the length of the substring at OFFSET.
+      /* Compute and store the size of the substring at OFFSET.
 	 All offsets past the initial length refer to null strings.  */
-      if (offset <= string_length)
-	*strlen = string_length - offset;
+      if (offset <= string_size)
+	*strsize = string_size - offset;
       else
-	*strlen = 0;
+	*strsize = 0;
     }
 
-  const char *string = TREE_STRING_POINTER (src);
-
-  if (string_length == 0
-      || offset >= string_size)
+  if (string_size == 0
+      || offset >= array_size)
     return NULL;
 
-  if (strsize)
-    {
-      /* Support even constant character arrays that aren't proper
-	 NUL-terminated strings.  */
-      *strsize = string_size;
-    }
-  else if (string[string_length - 1] != '\0')
+  if (!nulterm && string[string_size - 1] != '\0')
     {
-      /* Support only properly NUL-terminated strings but handle
-	 consecutive strings within the same array, such as the six
-	 substrings in "1\0002\0003".  */
+      /* When NULTERM is null, support only properly nul-terminated
+	 strings but handle consecutive strings within the same array,
+	 such as the six substrings in "1\0002\0003".  Otherwise, let
+	 the caller deal with non-nul-terminated arrays.  */
       return NULL;
     }
 
-  return offset <= string_length ? string + offset : "";
+  return offset <= string_size ? string + offset : "";
 }
 
 /* Given a tree T, compute which bits in T may be nonzero.  */
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index 1b9ccc0..a58a4a2 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -188,7 +188,7 @@ extern tree const_unop (enum tree_code, tree, tree);
 extern tree const_binop (enum tree_code, tree, tree, tree);
 extern bool negate_mathfn_p (combined_fn);
 extern const char *c_getstr (tree, unsigned HOST_WIDE_INT * = NULL,
-			     unsigned HOST_WIDE_INT * = NULL);
+			     bool * = NULL);
 extern wide_int tree_nonzero_bits (const_tree);
 
 /* Return OFF converted to a pointer offset type suitable as offset for
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index c3fa570..9eefb37 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1275,11 +1275,13 @@ gimple_fold_builtin_memset (gimple_stmt_iterator *gsi, tree c, tree len)
    Set *FLEXP to true if the range of the string lengths has been
    obtained from the upper bound of an array at the end of a struct.
    Such an array may hold a string that's longer than its upper bound
-   due to it being used as a poor-man's flexible array member.  */
+   due to it being used as a poor-man's flexible array member.
+   Clear *NULTERM if ARG refers to a constant array that is known
+   not to be nul-terminated.  */
 
 static bool
 get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
-		  int fuzzy, bool *flexp)
+		  int fuzzy, bool *flexp, bool *nulterm)
 {
   tree var, val = NULL_TREE;
   gimple *def_stmt;
@@ -1301,7 +1303,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 	      if (TREE_CODE (aop0) == INDIRECT_REF
 		  && TREE_CODE (TREE_OPERAND (aop0, 0)) == SSA_NAME)
 		return get_range_strlen (TREE_OPERAND (aop0, 0),
-					 length, visited, type, fuzzy, flexp);
+					 length, visited, type, fuzzy, flexp,
+					 nulterm);
 	    }
 	  else if (TREE_CODE (TREE_OPERAND (op, 0)) == COMPONENT_REF && fuzzy)
 	    {
@@ -1329,13 +1332,18 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 	    return false;
 	}
       else
-	val = c_strlen (arg, 1);
+	{
+	  tree arr;
+	  val = c_strlen (arg, 1, &arr);
+	  if (val && arr)
+	    *nulterm = false;
+	}
 
       if (!val && fuzzy)
 	{
 	  if (TREE_CODE (arg) == ADDR_EXPR)
 	    return get_range_strlen (TREE_OPERAND (arg, 0), length,
-				     visited, type, fuzzy, flexp);
+				     visited, type, fuzzy, flexp, nulterm);
 
 	  if (TREE_CODE (arg) == ARRAY_REF)
 	    {
@@ -1477,7 +1485,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
             || gimple_assign_unary_nop_p (def_stmt))
           {
             tree rhs = gimple_assign_rhs1 (def_stmt);
-	    return get_range_strlen (rhs, length, visited, type, fuzzy, flexp);
+	    return get_range_strlen (rhs, length, visited, type, fuzzy, flexp,
+				     nulterm);
           }
 	else if (gimple_assign_rhs_code (def_stmt) == COND_EXPR)
 	  {
@@ -1486,7 +1495,7 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 
 	    for (unsigned int i = 0; i < 2; i++)
 	      if (!get_range_strlen (ops[i], length, visited, type, fuzzy,
-				     flexp))
+				     flexp, nulterm))
 		{
 		  if (fuzzy == 2)
 		    *maxlen = build_all_ones_cst (size_type_node);
@@ -1513,7 +1522,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
             if (arg == gimple_phi_result (def_stmt))
               continue;
 
-	    if (!get_range_strlen (arg, length, visited, type, fuzzy, flexp))
+	    if (!get_range_strlen (arg, length, visited, type, fuzzy, flexp,
+				   nulterm))
 	      {
 		if (fuzzy == 2)
 		  *maxlen = build_all_ones_cst (size_type_node);
@@ -1545,19 +1555,27 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
    and false if PHIs and COND_EXPRs are to be handled optimistically,
    if we can determine string length minimum and maximum; it will use
    the minimum from the ones where it can be determined.
-   STRICT false should be only used for warning code.  */
+   STRICT false should be only used for warning code.
+   When non-null, clear *NULTERM if ARG refers to a constant array
+   that is known not to be nul-terminated.  Otherwise set it to true.  */
 
 bool
-get_range_strlen (tree arg, tree minmaxlen[2], bool strict)
+get_range_strlen (tree arg, tree minmaxlen[2], bool strict /* = false */,
+		  bool *nulterm /* = NULL */)
 {
   bitmap visited = NULL;
 
   minmaxlen[0] = NULL_TREE;
   minmaxlen[1] = NULL_TREE;
 
+  bool nultermbuf;
+  if (!nulterm)
+    nulterm = &nultermbuf;
+  *nulterm = true;
+
   bool flexarray = false;
   if (!get_range_strlen (arg, minmaxlen, &visited, 1, strict ? 1 : 2,
-			 &flexarray))
+			 &flexarray, nulterm))
     {
       minmaxlen[0] = NULL_TREE;
       minmaxlen[1] = NULL_TREE;
@@ -1576,7 +1594,7 @@ get_maxval_strlen (tree arg, int type)
   tree len[2] = { NULL_TREE, NULL_TREE };
 
   bool dummy;
-  if (!get_range_strlen (arg, len, &visited, type, 0, &dummy))
+  if (!get_range_strlen (arg, len, &visited, type, 0, &dummy, NULL))
     len[1] = NULL_TREE;
   if (visited)
     BITMAP_FREE (visited);
@@ -3496,12 +3514,14 @@ static bool
 gimple_fold_builtin_strlen (gimple_stmt_iterator *gsi)
 {
   gimple *stmt = gsi_stmt (*gsi);
+  tree arg = gimple_call_arg (stmt, 0);
 
   wide_int minlen;
   wide_int maxlen;
 
   tree lenrange[2];
-  if (!get_range_strlen (gimple_call_arg (stmt, 0), lenrange, true)
+  bool nulterm;
+  if (!get_range_strlen (arg, lenrange, true, &nulterm)
       && lenrange[0] && TREE_CODE (lenrange[0]) == INTEGER_CST
       && lenrange[1] && TREE_CODE (lenrange[1]) == INTEGER_CST)
     {
@@ -3523,6 +3543,10 @@ gimple_fold_builtin_strlen (gimple_stmt_iterator *gsi)
 
   if (minlen == maxlen)
     {
+      if (!nulterm)
+	warn_string_no_nul (gimple_location (stmt), NULL_TREE,
+			    gimple_call_fndecl (stmt), arg);
+
       lenrange[0] = force_gimple_operand_gsi (gsi, lenrange[0], true, NULL,
 					      true, GSI_SAME_STMT);
       replace_call_with_value (gsi, lenrange[0]);
diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
index 04e9bfa..fe11728 100644
--- a/gcc/gimple-fold.h
+++ b/gcc/gimple-fold.h
@@ -25,7 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 extern tree create_tmp_reg_or_ssa_name (tree, gimple *stmt = NULL);
 extern tree canonicalize_constructor_val (tree, tree);
 extern tree get_symbol_constant_value (tree);
-extern bool get_range_strlen (tree, tree[2], bool = false);
+extern bool get_range_strlen (tree, tree[2], bool = false, bool * = NULL);
 extern tree get_maxval_strlen (tree, int);
 extern void gimplify_and_update_call_from_tree (gimple_stmt_iterator *, tree);
 extern bool fold_stmt (gimple_stmt_iterator *);
diff --git a/gcc/testsuite/gcc.dg/warn-string-no-nul.c b/gcc/testsuite/gcc.dg/warn-string-no-nul.c
new file mode 100644
index 0000000..838528f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/warn-string-no-nul.c
@@ -0,0 +1,239 @@
+/* PR tree-optimization/86552 - missing warning for reading past the end
+   of non-string arrays
+   { dg-do compile }
+   { dg-options "-O2 -Wall -ftrack-macro-expansion=0" } */
+
+extern __SIZE_TYPE__ strlen (const char*);
+
+const char a[5] = "12345";   /* { dg-message "declared here" } */
+
+int i0 = 0;
+int i1 = 1;
+
+void sink (int, ...);
+
+#define CONCAT(a, b)   a ## b
+#define CAT(a, b)      CONCAT(a, b)
+
+#define T(str)					\
+  __attribute__ ((noipa))			\
+  void CAT (test_, __LINE__) (void) {		\
+    sink (strlen (str));			\
+  } typedef void dummy_type
+
+T (a);                /* { dg-warning "argument missing terminating nul" }  */
+T (&a[0]);            /* { dg-warning "nul" }  */
+T (&a[0] + 1);        /* { dg-warning "nul" }  */
+T (&a[1]);            /* { dg-warning "nul" }  */
+T (&a[i0]);           /* { dg-warning "nul" }  */
+T (&a[i0] + 1);       /* { dg-warning "nul" }  */
+
+
+const char b[][5] = { /* { dg-message "declared here" } */
+  "12", "123", "1234", "54321"
+};
+
+T (b[0]);
+T (b[1]);
+T (b[2]);
+T (b[3]);             /* { dg-warning "nul" }  */
+T (b[i0]);
+
+T (&b[2][1]);
+T (&b[2][1] + 1);
+T (&b[2][i0]);
+T (&b[2][1] + i0);
+
+T (&b[3][1]);         /* { dg-warning "nul" }  */
+T (&b[3][1] + 1);     /* { dg-warning "nul" }  */
+T (&b[3][i0]);        /* { dg-warning "nul" }  */
+T (&b[3][1] + i0);    /* { dg-warning "nul" }  */
+T (&b[3][i0] + i1);   /* { dg-warning "nul" }  */
+
+T (i0 ? "" : b[0]);
+T (i0 ? "" : b[1]);
+T (i0 ? "" : b[2]);
+T (i0 ? "" : b[3]);               /* { dg-warning "nul" }  */
+T (i0 ? b[0] : "");
+T (i0 ? b[1] : "");
+T (i0 ? b[2] : "");
+T (i0 ? b[3] : "");               /* { dg-warning "nul" }  */
+
+T (i0 ? "1234" : b[3]);           /* { dg-warning "nul" }  */
+T (i0 ? b[3] : "1234");           /* { dg-warning "nul" }  */
+
+T (i0 ? a : b[3]);                /* { dg-warning "nul" }  */
+T (i0 ? b[0] : b[2]);
+T (i0 ? b[2] : b[3]);             /* { dg-warning "nul" }  */
+T (i0 ? b[3] : b[2]);             /* { dg-warning "nul" }  */
+
+T (i0 ? b[0] : &b[3][0] + 1);     /* { dg-warning "nul" }  */
+T (i0 ? b[1] : &b[3][1] + i0);    /* { dg-warning "nul" }  */
+
+/* It's possible to detect the missing nul in the following two
+   expressions but GCC doesn't do it yet.  */
+T (i0 ? &b[3][1] + i0 : b[2]);    /* { dg-warning "nul" "bug" { xfail *-*-* } }  */
+T (i0 ? &b[3][i0] : &b[3][i1]);   /* { dg-warning "nul" "bug" { xfail *-*-* } }  */
+
+
+struct A { char a[5], b[5]; };
+
+const struct A s = { "1234", "12345" };
+
+T (s.a);
+T (&s.a[0]);
+T (&s.a[0] + 1);
+T (&s.a[0] + i0);
+T (&s.a[1]);
+T (&s.a[1] + 1);
+T (&s.a[1] + i0);
+
+T (s.b);              /* { dg-warning "nul" }  */
+T (&s.b[0]);          /* { dg-warning "nul" }  */
+T (&s.b[0] + 1);      /* { dg-warning "nul" }  */
+T (&s.b[0] + i0);     /* { dg-warning "nul" }  */
+T (&s.b[1]);          /* { dg-warning "nul" }  */
+T (&s.b[1] + 1);      /* { dg-warning "nul" }  */
+T (&s.b[1] + i0);     /* { dg-warning "nul" }  */
+
+struct B { struct A a[2]; };
+
+const struct B ba[] = {
+  { { { "123", "12345" }, { "12345", "123" } } },
+  { { { "12345", "123" }, { "123", "12345" } } },
+  { { { "1", "12" },      { "123", "1234" } } },
+  { { { "123", "1234" },  { "12345", "12" } } }
+};
+
+T (ba[0].a[0].a);
+T (&ba[0].a[0].a[0]);
+T (&ba[0].a[0].a[0] + 1);
+T (&ba[0].a[0].a[0] + i0);
+T (&ba[0].a[0].a[1]);
+T (&ba[0].a[0].a[1] + 1);
+T (&ba[0].a[0].a[1] + i0);
+
+T (ba[0].a[0].b);           /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[0].a[1].a);           /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[0].a[1].b);
+T (&ba[0].a[1].b[0]);
+T (&ba[0].a[1].b[0] + 1);
+T (&ba[0].a[1].b[0] + i0);
+T (&ba[0].a[1].b[1]);
+T (&ba[0].a[1].b[1] + 1);
+T (&ba[0].a[1].b[1] + i0);
+
+
+T (ba[1].a[0].a);           /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[1].a[0].b);
+T (&ba[1].a[0].b[0]);
+T (&ba[1].a[0].b[0] + 1);
+T (&ba[1].a[0].b[0] + i0);
+T (&ba[1].a[0].b[1]);
+T (&ba[1].a[0].b[1] + 1);
+T (&ba[1].a[0].b[1] + i0);
+
+T (ba[1].a[1].a);
+T (&ba[1].a[1].a[0]);
+T (&ba[1].a[1].a[0] + 1);
+T (&ba[1].a[1].a[0] + i0);
+T (&ba[1].a[1].a[1]);
+T (&ba[1].a[1].a[1] + 1);
+T (&ba[1].a[1].a[1] + i0);
+
+T (ba[1].a[1].b);           /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1] + i0);  /* { dg-warning "nul" }  */
+
+
+T (ba[2].a[0].a);
+T (&ba[2].a[0].a[0]);
+T (&ba[2].a[0].a[0] + 1);
+T (&ba[2].a[0].a[0] + i0);
+T (&ba[2].a[0].a[1]);
+T (&ba[2].a[0].a[1] + 1);
+T (&ba[2].a[0].a[1] + i0);
+
+T (ba[2].a[0].b);
+T (&ba[2].a[0].b[0]);
+T (&ba[2].a[0].b[0] + 1);
+T (&ba[2].a[0].b[0] + i0);
+T (&ba[2].a[0].b[1]);
+T (&ba[2].a[0].b[1] + 1);
+T (&ba[2].a[0].b[1] + i0);
+
+T (ba[2].a[1].a);
+T (&ba[2].a[1].a[0]);
+T (&ba[2].a[1].a[0] + 1);
+T (&ba[2].a[1].a[0] + i0);
+T (&ba[2].a[1].a[1]);
+T (&ba[2].a[1].a[1] + 1);
+T (&ba[2].a[1].a[1] + i0);
+
+
+T (ba[3].a[0].a);
+T (&ba[3].a[0].a[0]);
+T (&ba[3].a[0].a[0] + 1);
+T (&ba[3].a[0].a[0] + i0);
+T (&ba[3].a[0].a[1]);
+T (&ba[3].a[0].a[1] + 1);
+T (&ba[3].a[0].a[1] + i0);
+
+T (ba[3].a[0].b);
+T (&ba[3].a[0].b[0]);
+T (&ba[3].a[0].b[0] + 1);
+T (&ba[3].a[0].b[0] + i0);
+T (&ba[3].a[0].b[1]);
+T (&ba[3].a[0].b[1] + 1);
+T (&ba[3].a[0].b[1] + i0);
+
+T (ba[3].a[1].a);           /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0]);	    /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1]);	    /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[3].a[1].b);
+T (&ba[3].a[1].b[0]);	
+T (&ba[3].a[1].b[0] + 1);
+T (&ba[3].a[1].b[0] + i0);
+T (&ba[3].a[1].b[1]);	
+T (&ba[3].a[1].b[1] + 1);
+T (&ba[3].a[1].b[1] + i0);
+
+
+T (i0 ? ba[0].a[0].a : ba[0].a[0].b);           /* { dg-warning "nul" }  */
+T (i0 ? ba[0].a[0].b : ba[0].a[0].a);           /* { dg-warning "nul" }  */
+
+T (i0 ? &ba[0].a[0].a[0] : &ba[3].a[1].a[0]);   /* { dg-warning "nul" }  */
+T (i0 ? &ba[3].a[1].a[1] :  ba[0].a[0].a);      /* { dg-warning "nul" }  */
+
+T (i0 ? ba[0].a[0].a : ba[0].a[1].b);
+T (i0 ? ba[0].a[1].b : ba[0].a[0].a);

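The warning hinges on one property of each constant array: whether its initializer leaves room for a terminating nul within the array's bounds. The sketch below checks that property at run time for declarations modeled on the test above, assuming single-byte `char` elements; `has_terminating_nul` and `check_test_arrays` are hypothetical helpers for illustration, not part of the patch, which makes the equivalent determination at compile time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Declarations modeled on the test above.  A[5] is initialized with
   five non-nul characters, so no terminating nul fits in it.  */
static const char a[5] = "12345";
static const char b[][5] = { "12", "123", "1234", "54321" };

struct A { char a[5], b[5]; };
static const struct A s = { "1234", "12345" };

/* Hypothetical helper (not part of the patch): return true if the SIZE
   bytes at ARR include a nul, i.e., if a string read from ARR ends
   within the array.  This is a run-time analogue of the
   array_elts > length test the patch adds to string_constant.  */
static bool
has_terminating_nul (const char *arr, size_t size)
{
  return memchr (arr, '\0', size) != NULL;
}

/* The assertions mirror the test's expectations: the arrays it expects
   a "missing terminating nul" warning for are the ones without a nul.  */
static void
check_test_arrays (void)
{
  assert (!has_terminating_nul (a, sizeof a));
  assert (has_terminating_nul (b[2], sizeof b[2]));
  assert (!has_terminating_nul (b[3], sizeof b[3]));
  assert (has_terminating_nul (s.a, sizeof s.a));
  assert (!has_terminating_nul (s.b, sizeof s.b));
}
```
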

* PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-07-30 19:18   ` Martin Sebor
@ 2018-08-02  2:44     ` Martin Sebor
  2018-08-02 13:26       ` Bernd Edlinger
  2018-08-17  5:15       ` Jeff Law
  0 siblings, 2 replies; 53+ messages in thread
From: Martin Sebor @ 2018-08-02  2:44 UTC
  To: Gcc Patch List; +Cc: Bernd Edlinger

[-- Attachment #1: Type: text/plain, Size: 2244 bytes --]

Because the foundation of the patch is detecting and avoiding
overly aggressive folding of unterminated char arrays, besides
issuing a warning for such arguments to strlen the patch also
fixes pr86711 (wrong folding of memchr) and pr86714
(tree-ssa-forwprop.c confused by too long initializer).
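
The memchr fix follows from the same root cause: an array with no terminating nul holds no zero byte at all, so searching its full extent for a nul must fail, and folding such a call as if the array were a nul-terminated string can produce the wrong answer. The sketch below only illustrates the expected semantics with standard `memchr`; the arrays `s1` and `s4` and the function `check_memchr_on_unterminated_arrays` are illustrative assumptions, not the contents of the new memchr-1.c test.

```c
#include <assert.h>
#include <string.h>

/* Neither array has room for a terminating nul: every byte of each is
   a nonzero character from its initializer.  */
static const char s1[1] = "1";
static const char s4[4] = "1234";

static void
check_memchr_on_unterminated_arrays (void)
{
  /* No byte of either array is zero, so searching the whole array for
     a nul must fail.  A compiler that folded these calls as if the
     arrays were nul-terminated strings might return non-null here.  */
  assert (memchr (s1, '\0', sizeof s1) == NULL);
  assert (memchr (s4, '\0', sizeof s4) == NULL);

  /* Searching for a character that is present still succeeds.  */
  assert (memchr (s4, '3', sizeof s4) == s4 + 2);
}
```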

The substance of the attached updated patch is unchanged,
I have just added test cases for the two additional bugs.

Bernd, as I mentioned Wednesday, the patch supersedes
yours here:
https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html

Martin

On 07/30/2018 01:17 PM, Martin Sebor wrote:
> Attached is an updated version of the patch that handles more
> instances of calling strlen() on a constant array that is not
> a nul-terminated string.
>
> No other functions except strlen are explicitly handled yet,
> and neither are constant arrays with braced-initializer lists
> like const char a[] = { 'a', 'b', 'c' };  I am testing
> an independent solution for those (bug 86552).  Once those
> are handled the warning will be able to detect those as well.
>
> Tested on x86_64-linux.
>
> On 07/25/2018 05:38 PM, Martin Sebor wrote:
>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html
>>
>> The fix for bug 86532 has been checked in so this enhancement
>> can now be applied on top of it (with only minor adjustments).
>>
>> On 07/19/2018 02:08 PM, Martin Sebor wrote:
>>> In the discussion of my patch for pr86532 Bernd noted that
>>> GCC silently accepts constant character arrays with no
>>> terminating nul as arguments to strlen (and other string
>>> functions).
>>>
>>> The attached patch is a first step in detecting these kinds
>>> of bugs in strlen calls by issuing -Wstringop-overflow.
>>> The next step is to modify all other handlers of built-in
>>> functions to detect the same problem (not part of this patch).
>>> Yet another step is to detect these problems in arguments
>>> initialized using the non-string form:
>>>
>>>   const char a[] = { 'a', 'b', 'c' };
>>>
>>> This patch is meant to apply on top of the one for bug 86532
>>> (I tested it with an earlier version of that patch so there
>>> is code in the context that does not appear in the latest
>>> version of the other diff).
>>>
>>> Martin
>>>
>>
>


[-- Attachment #2: gcc-86552.diff --]
[-- Type: text/x-patch, Size: 36212 bytes --]

PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too long initializer
PR tree-optimization/86711 - wrong folding of memchr
PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays

gcc/ChangeLog:

	PR tree-optimization/86714
	PR tree-optimization/86711
	PR tree-optimization/86552
	* builtins.h (warn_string_no_nul): Declare.
	(c_strlen): Add argument.
	* builtins.c (warn_string_no_nul): New function.
	(fold_builtin_strlen): Add argument.  Detect missing nul.
	(fold_builtin_1): Adjust.
	(string_length): Add argument and use it.
	(c_strlen): Same.
	(expand_builtin_strlen): Detect missing nul.
	* expr.c (string_constant): Add arguments.  Detect missing nul
	terminator and outermost declaration it's missing in.
	* expr.h (string_constant): Add argument.
	* fold-const.c (c_getstr): Change argument to bool*, rename
	other arguments.
	* fold-const-call.c (fold_const_call): Detect missing nul.
	* gimple-fold.c (get_range_strlen): Add argument.
	(get_maxval_strlen): Adjust.
	* gimple-fold.h (get_range_strlen): Add argument.

gcc/testsuite/ChangeLog:

	PR tree-optimization/86714
	PR tree-optimization/86711
	PR tree-optimization/86552
	* gcc.c-torture/execute/memchr-1.c: New test.
	* gcc.c-torture/execute/pr86714.c: New test.
	* gcc.dg/warn-strlen-no-nul.c: New test.
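
The c_strlen change in the builtins.c diff that follows handles a conditional argument such as `i0 ? b[0] : b[3]` by computing the length of each arm separately: the conditional path returns a length only when both arms agree, and it reports whichever arm (if any) refers to an unterminated array. A simplified C model of that merge step is sketched below; `struct strlen_result`, `merge_cond_arms`, `check_merge_examples`, and the use of -1 for an unknown length (NULL_TREE in GCC) are assumptions for illustration, not GCC interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for what c_strlen computes for one operand of
   a conditional expression: LEN is the constant length, or -1 when it
   is unknown (c_strlen returns NULL_TREE then), and NONSTR names the
   unterminated array the operand refers to, or is null.  */
struct strlen_result
{
  long len;
  const char *nonstr;
};

/* Model of the COND_EXPR case: the length is known only if both arms
   agree on it, and the reported array is the first arm's unterminated
   array if it has one, otherwise the second arm's.  When the lengths
   differ the real code falls through to the rest of c_strlen; this
   model simply reports an unknown length.  */
static struct strlen_result
merge_cond_arms (struct strlen_result arm1, struct strlen_result arm2)
{
  struct strlen_result res = { -1, NULL };
  if (arm1.len == arm2.len)
    {
      res.len = arm1.len;
      res.nonstr = arm1.nonstr ? arm1.nonstr : arm2.nonstr;
    }
  return res;
}

/* Usage: equal lengths fold and the unterminated arm is still reported;
   different lengths give no constant and nothing reported by this step.  */
static void
check_merge_examples (void)
{
  struct strlen_result term = { 3, NULL };
  struct strlen_result unterm = { 3, "b" };
  struct strlen_result longer = { 5, "a" };

  struct strlen_result r = merge_cond_arms (term, unterm);
  assert (r.len == 3 && r.nonstr == unterm.nonstr);

  r = merge_cond_arms (term, longer);
  assert (r.len == -1 && r.nonstr == NULL);
}
```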

diff --git a/gcc/builtins.c b/gcc/builtins.c
index aa3e0d8..f4924d5 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, int);
 static rtx expand_builtin_expect (tree, rtx);
 static tree fold_builtin_constant_p (tree);
 static tree fold_builtin_classify_type (tree);
-static tree fold_builtin_strlen (location_t, tree, tree);
+static tree fold_builtin_strlen (location_t, tree, tree, tree);
 static tree fold_builtin_inf (location_t, tree, int);
 static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
 static bool validate_arg (const_tree, enum tree_code code);
@@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
   return n;
 }
 
+/* For a call EXP (or, when EXP is null, a call to FNDECL) to a function
+   expecting a string argument, issue a diagnostic because its argument
+   NONSTR is a character array with no terminating NUL.  */
+
+void
+warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
+{
+  loc = expansion_point_location_if_in_system_header (loc);
+
+  bool warned;
+  if (exp)
+    {
+      if (!fndecl)
+	fndecl = get_callee_fndecl (exp);
+      warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%K%qD argument missing terminating nul",
+			   exp, fndecl);
+    }
+  else
+    {
+      gcc_assert (fndecl);
+      warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%qD argument missing terminating nul",
+			   fndecl);
+    }
+
+  if (warned && DECL_P (nonstr))
+    inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared here");
+}
+
 /* Compute the length of a null-terminated character string or wide
    character string handling character sizes of 1, 2, and 4 bytes.
    TREE_STRING_LENGTH is not the right way because it evaluates to
@@ -567,37 +597,60 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
    accesses.  Note that this implies the result is not going to be emitted
    into the instruction stream.
 
+   When ARR is non-null and the string is not properly nul-terminated,
+   set *ARR to the declaration of the outermost constant object whose
+   initializer (or one of its elements) is not nul-terminated.
+
    The value returned is of type `ssizetype'.
 
    Unfortunately, string_constant can't access the values of const char
    arrays with initializers, so neither can we do so here.  */
 
 tree
-c_strlen (tree src, int only_value)
+c_strlen (tree src, int only_value, tree *arr /* = NULL */)
 {
   STRIP_NOPS (src);
+
+  /* Used to detect non-nul-terminated strings in subexpressions
+     of a conditional expression.  When ARR is null, point it at
+     one of the elements for simplicity.  */
+  tree arrs[] = { NULL_TREE, NULL_TREE };
+  if (!arr)
+    arr = arrs;
+
   if (TREE_CODE (src) == COND_EXPR
       && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
     {
-      tree len1, len2;
-
-      len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
-      len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
+      tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs);
+      tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1);
       if (tree_int_cst_equal (len1, len2))
-	return len1;
+	{
+	  *arr = arrs[0] ? arrs[0] : arrs[1];
+	  return len1;
+	}
     }
 
   if (TREE_CODE (src) == COMPOUND_EXPR
       && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
-    return c_strlen (TREE_OPERAND (src, 1), only_value);
+    return c_strlen (TREE_OPERAND (src, 1), only_value, arr);
 
   location_t loc = EXPR_LOC_OR_LOC (src, input_location);
 
   /* Offset from the beginning of the string in bytes.  */
   tree byteoff;
-  src = string_constant (src, &byteoff);
-  if (src == 0)
-    return NULL_TREE;
+  /* Set to true if the array is nul-terminated, false otherwise.  */
+  bool nulterm;
+  src = string_constant (src, &byteoff, &nulterm, arr);
+  if (!src)
+    {
+      *arr = arrs[0] ? arrs[0] : arrs[1];
+      return NULL_TREE;
+    }
+
+  /* Clear *ARR when the string is nul-terminated.  It should be
+     of no interest to callers.  */
+  if (nulterm)
+    *arr = NULL_TREE;
 
   /* Determine the size of the string element.  */
   unsigned eltsize
@@ -621,6 +674,12 @@ c_strlen (tree src, int only_value)
 	maxelts = maxelts / eltsize - 1;
       }
 
+  /* Unless the caller is prepared to handle it by passing in a non-null
+     ARR, fail if the terminating nul doesn't fit in the array the string
+     is stored in (as in const char a[3] = "123").  */
+  if (!arr && maxelts < strelts)
+    return NULL_TREE;
+
   /* PTR can point to the byte representation of any string type, including
      char* and wchar_t*.  */
   const char *ptr = TREE_STRING_POINTER (src);
@@ -650,7 +709,8 @@ c_strlen (tree src, int only_value)
       offsave = fold_convert (ssizetype, offsave);
       tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
 				      build_int_cst (ssizetype, len * eltsize));
-      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
+      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize),
+				     offsave);
       return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
 			      build_zero_cst (ssizetype));
     }
@@ -690,7 +750,7 @@ c_strlen (tree src, int only_value)
      Since ELTOFF is our starting index into the string, no further
      calculation is needed.  */
   unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
-				maxelts - eltoff);
+				strelts - eltoff);
 
   return ssize_int (len);
 }
@@ -2855,7 +2915,6 @@ expand_builtin_strlen (tree exp, rtx target,
 
   struct expand_operand ops[4];
   rtx pat;
-  tree len;
   tree src = CALL_EXPR_ARG (exp, 0);
   rtx src_reg;
   rtx_insn *before_strlen;
@@ -2864,20 +2923,39 @@ expand_builtin_strlen (tree exp, rtx target,
   unsigned int align;
 
   /* If the length can be computed at compile-time, return it.  */
-  len = c_strlen (src, 0);
+  tree array;
+  tree len = c_strlen (src, 0, &array);
   if (len)
-    return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+    {
+      if (array)
+	{
+	  /* ARRAY refers to the constant array without a terminating nul
+	     whose length is being computed.  */
+	  warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array);
+	  return NULL_RTX;
+	}
+      return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+    }
 
   /* If the length can be computed at compile-time and is constant
      integer, but there are side-effects in src, evaluate
      src for side-effects, then return len.
      E.g. x = strlen (i++ ? "xfoo" + 1 : "bar");
      can be optimized into: i++; x = 3;  */
-  len = c_strlen (src, 1);
-  if (len && TREE_CODE (len) == INTEGER_CST)
+  len = c_strlen (src, 1, &array);
+  if (len)
     {
-      expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL);
-      return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+      if (array)
+	{
+	  warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array);
+	  return NULL_RTX;
+	}
+
+      if (TREE_CODE (len) == INTEGER_CST)
+	{
+	  expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL);
+	  return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+	}
     }
 
   align = get_pointer_alignment (src) / BITS_PER_UNIT;
@@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg)
   return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));
 }
 
-/* Fold a call to __builtin_strlen with argument ARG.  */
+/* Fold a call to strlen FNDECL with argument ARG to a value of TYPE.  */
 
 static tree
-fold_builtin_strlen (location_t loc, tree type, tree arg)
+fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
 {
   if (!validate_arg (arg, POINTER_TYPE))
     return NULL_TREE;
   else
     {
-      tree len = c_strlen (arg, 0);
-
+      tree arr = NULL_TREE;
+      tree len = c_strlen (arg, 0, &arr);
       if (len)
-	return fold_convert_loc (loc, type, len);
+	{
+	  if (loc == UNKNOWN_LOCATION && EXPR_HAS_LOCATION (arg))
+	    loc = EXPR_LOCATION (arg);
+
+	  /* To avoid warning multiple times about non-nul-terminated
+	     strings, only warn if their length has been determined
+	     and it's being folded.  */
+	  if (arr)
+	    warn_string_no_nul (loc, NULL_TREE, fndecl, arr);
+
+	  return fold_convert_loc (loc, type, len);
+	}
 
       return NULL_TREE;
     }
@@ -9175,7 +9264,7 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
       return fold_builtin_classify_type (arg0);
 
     case BUILT_IN_STRLEN:
-      return fold_builtin_strlen (loc, type, arg0);
+      return fold_builtin_strlen (loc, fndecl, type, arg0);
 
     CASE_FLT_FN (BUILT_IN_FABS):
     CASE_FLT_FN_FLOATN_NX (BUILT_IN_FABS):
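To illustrate the intended contract of the new ARR argument to c_strlen for a conditional expression, here is a minimal stand-alone C model. It is not GCC code: struct const_array and the model_* functions are hypothetical stand-ins for trees and for c_strlen. Each arm of COND ? A : B is measured separately, the first arm that lacks a terminating nul is reported (as with *ARR = ARRS[0] ? ARRS[0] : ARRS[1]), and a constant length is produced only when both arms agree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a constant character array declaration.  */
struct const_array
{
  const char *name;   /* declaration name, for diagnostics */
  const char *data;   /* initializer bytes */
  size_t size;        /* number of elements in the array */
};

/* Model of c_strlen on a single array: return the number of characters
   before the first nul, never reading past the array.  Set *UNTERM to A
   when the array has no nul and to NULL otherwise.  */
static size_t
model_strlen (const struct const_array *a, const struct const_array **unterm)
{
  const char *nul = memchr (a->data, '\0', a->size);
  *unterm = nul ? NULL : a;
  return nul ? (size_t) (nul - a->data) : a->size;
}

/* Model of the COND_EXPR case, COND ? A : B.  Always report the first
   unterminated arm, mirroring *ARR = ARRS[0] ? ARRS[0] : ARRS[1], but
   return a constant length only when both arms agree; return -1 when
   the length is unknown.  */
static long
model_cond_strlen (const struct const_array *a, const struct const_array *b,
                   const struct const_array **unterm)
{
  assert (a && b && unterm);
  const struct const_array *arrs[2];
  size_t len1 = model_strlen (a, &arrs[0]);
  size_t len2 = model_strlen (b, &arrs[1]);
  *unterm = arrs[0] ? arrs[0] : arrs[1];
  return len1 == len2 ? (long) len1 : -1;
}
```

In the model, as in the patch, the unterminated arm is recorded even when the two lengths differ and no constant can be returned; whether a diagnostic is then issued is left to the caller.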
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 2e0a2f9..73b0b0b 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -58,7 +58,7 @@ extern bool get_pointer_alignment_1 (tree, unsigned int *,
 				     unsigned HOST_WIDE_INT *);
 extern unsigned int get_pointer_alignment (tree);
 extern unsigned string_length (const void*, unsigned, unsigned);
-extern tree c_strlen (tree, int);
+extern tree c_strlen (tree, int, tree * = NULL);
 extern void expand_builtin_setjmp_setup (rtx, rtx);
 extern void expand_builtin_setjmp_receiver (rtx);
 extern void expand_builtin_update_setjmp_buf (rtx);
@@ -103,7 +103,7 @@ extern bool target_char_cst_p (tree t, char *p);
 
 extern internal_fn associated_internal_fn (tree);
 extern internal_fn replacement_internal_fn (gcall *);
-
+extern void warn_string_no_nul (location_t, tree, tree, tree);
 extern tree max_object_size ();
 
 #endif /* GCC_BUILTINS_H */
diff --git a/gcc/expr.c b/gcc/expr.c
index de6709d..edbd7f8 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11271,10 +11271,14 @@ is_aligning_offset (const_tree offset, const_tree exp)
 /* Return the tree node if an ARG corresponds to a string constant or zero
    if it doesn't.  If we return nonzero, set *PTR_OFFSET to the (possibly
    non-constant) offset in bytes within the string that ARG is accessing.
+   If NULTERM is non-null, also accept sequences of characters that
+   aren't nul-terminated strings.  In that case, set *NULTERM if ARG
+   refers to a nul-terminated string and clear it otherwise.
    The type of the offset is sizetype.  */
 
 tree
-string_constant (tree arg, tree *ptr_offset)
+string_constant (tree arg, tree *ptr_offset, bool *nulterm /* = NULL */,
+		 tree *decl /* = NULL */)
 {
   tree array;
   STRIP_NOPS (arg);
@@ -11328,7 +11332,7 @@ string_constant (tree arg, tree *ptr_offset)
 	return NULL_TREE;
 
       tree offset;
-      if (tree str = string_constant (arg0, &offset))
+      if (tree str = string_constant (arg0, &offset, nulterm, decl))
 	{
 	  /* Avoid pointers to arrays (see bug 86622).  */
 	  if (POINTER_TYPE_P (TREE_TYPE (arg))
@@ -11368,6 +11372,10 @@ string_constant (tree arg, tree *ptr_offset)
   if (TREE_CODE (array) == STRING_CST)
     {
       *ptr_offset = fold_convert (sizetype, offset);
+      if (nulterm)
+	*nulterm = true;
+      if (decl)
+	*decl = NULL_TREE;
       return array;
     }
 
@@ -11414,6 +11422,49 @@ string_constant (tree arg, tree *ptr_offset)
   if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
     return NULL_TREE;
 
+  unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size);
+
+  /* When ARG refers to an aggregate (of arrays) try to determine
+     the size of the character array within the aggregate.  */
+  tree ref = arg;
+  tree reftype = TREE_TYPE (arg);
+
+  if (TREE_CODE (ref) == MEM_REF)
+    {
+      ref = TREE_OPERAND (ref, 0);
+      if (TREE_CODE (ref) == ADDR_EXPR)
+	{
+	  ref = TREE_OPERAND (ref, 0);
+	  reftype = TREE_TYPE (ref);
+	}
+    }
+  else
+    while (TREE_CODE (ref) == ARRAY_REF)
+      {
+	reftype = TREE_TYPE (ref);
+	ref = TREE_OPERAND (ref, 0);
+      }
+
+  if (TREE_CODE (ref) == COMPONENT_REF)
+    reftype = TREE_TYPE (ref);
+
+  while (TREE_CODE (reftype) == ARRAY_TYPE)
+    {
+      tree next = TREE_TYPE (reftype);
+      if (TREE_CODE (next) == INTEGER_TYPE)
+	{
+	  if (tree size = TYPE_SIZE_UNIT (reftype))
+	    if (tree_fits_uhwi_p (size))
+	      array_elts = tree_to_uhwi (size);
+	  break;
+	}
+
+      reftype = TREE_TYPE (reftype);
+    }
+
+  if (decl)
+    *decl = array;
+
   /* Avoid returning a string that doesn't fit in the array
      it is stored in, like
      const char a[4] = "abcde";
@@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
   unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
   length = string_length (TREE_STRING_POINTER (init), charsize,
 			  length / charsize);
-  if (compare_tree_int (array_size, length + 1) < 0)
+  if (nulterm)
+    *nulterm = array_elts > length;
+  else if (array_elts <= length)
     return NULL_TREE;
 
   *ptr_offset = offset;
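The core of the string_constant change is the comparison *NULTERM = ARRAY_ELTS > LENGTH: the initializer's characters up to its first nul must leave room for at least one more element in the array. A sketch of that test for single-byte characters, assuming the initializer bytes and the array size are already known (bounded_length and array_is_nul_terminated are hypothetical names; wide characters and the offset are ignored):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count the characters before the first nul among the first MAXELTS
   bytes of STR, a bounded strlen for single-byte characters.  */
static size_t
bounded_length (const char *str, size_t maxelts)
{
  size_t n = 0;
  while (n < maxelts && str[n] != '\0')
    ++n;
  return n;
}

/* An array of ARRAY_ELTS elements initialized from the INIT_LEN bytes
   at INIT holds a nul-terminated string only if it has room for at
   least one element past the initializer's characters.  */
static bool
array_is_nul_terminated (const char *init, size_t init_len, size_t array_elts)
{
  assert (init != NULL);
  return array_elts > bounded_length (init, init_len);
}
```

For example, const char a[3] = "123" yields 3 > 3, which is false, while const char b[5] = "1234" yields 5 > 4, which is true.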
diff --git a/gcc/expr.h b/gcc/expr.h
index cf047d4..e630979 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -288,7 +288,7 @@ expand_normal (tree exp)
 
 /* Return the tree node and offset if a given argument corresponds to
    a string constant.  */
-extern tree string_constant (tree, tree *);
+extern tree string_constant (tree, tree *, bool * = NULL, tree * = NULL);
 
 /* Two different ways of generating switch statements.  */
 extern int try_casesi (tree, tree, tree, tree, rtx, rtx, rtx, profile_probability);
diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
index 06a42060..849a443 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -1199,9 +1199,14 @@ fold_const_call (combined_fn fn, tree type, tree arg)
   switch (fn)
     {
     case CFN_BUILT_IN_STRLEN:
-      if (const char *str = c_getstr (arg))
-	return build_int_cst (type, strlen (str));
-      return NULL_TREE;
+      {
+	bool nulterm;
+	if (const char *str = c_getstr (arg, NULL, &nulterm))
+	  if (nulterm)
+	    return build_int_cst (type, strlen (str));
+
+	return NULL_TREE;
+      }
 
     CASE_CFN_NAN:
     CASE_FLT_FN_FLOATN_NX (CFN_BUILT_IN_NAN):
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index b318fc77..ecbc38c 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -14577,23 +14577,23 @@ fold_build_pointer_plus_hwi_loc (location_t loc, tree ptr, HOST_WIDE_INT off)
 /* Return a pointer P to a NUL-terminated string representing the sequence
    of constant characters referred to by SRC (or a subsequence of such
    characters within it if SRC is a reference to a string plus some
-   constant offset).  If STRLEN is non-null, store stgrlen(P) in *STRLEN.
-   If STRSIZE is non-null, store in *STRSIZE the size of the array
-   the string is stored in; in that case, even though P points to a NUL
-   terminated string, SRC need not refer to one.  This can happen when
-   SRC refers to a constant character array initialized to all non-NUL
-   values, as in the C declaration: char a[4] = "1234";  */
+   constant offset).  If STRSIZE is non-null, store the size of the string
+   literal in *STRSIZE, including any embedded or terminating nuls.
+   If NULTERM is non-null, set *NULTERM if the referenced string is
+   guaranteed to contain a terminating NUL and clear it otherwise.  A
+   missing NUL is possible even in valid C declarations such as:
+   const char a[3] = "123";  */
 
 const char *
-c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */,
-	  unsigned HOST_WIDE_INT *strsize /* = NULL */)
+c_getstr (tree src, unsigned HOST_WIDE_INT *strsize /* = NULL */,
+	  bool *nulterm /* = NULL */)
 {
   tree offset_node;
 
-  if (strlen)
-    *strlen = 0;
+  if (strsize)
+    *strsize = 0;
 
-  src = string_constant (src, &offset_node);
+  src = string_constant (src, &offset_node, nulterm);
   if (src == 0)
     return NULL;
 
@@ -14606,47 +14606,42 @@ c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */,
 	offset = tree_to_uhwi (offset_node);
     }
 
-  /* STRING_LENGTH is the size of the string literal, including any
-     embedded NULs.  STRING_SIZE is the size of the array the string
+  /* STRING_SIZE is the size of the string literal, including any
+     embedded NULs.  ARRAY_SIZE is the size of the array the string
      literal is stored in.  */
-  unsigned HOST_WIDE_INT string_length = TREE_STRING_LENGTH (src);
-  unsigned HOST_WIDE_INT string_size = string_length;
+  unsigned HOST_WIDE_INT string_size = TREE_STRING_LENGTH (src);
+  unsigned HOST_WIDE_INT array_size = string_size;
   tree type = TREE_TYPE (src);
   if (tree size = TYPE_SIZE_UNIT (type))
     if (tree_fits_shwi_p (size))
-      string_size = tree_to_uhwi (size);
+      array_size = tree_to_uhwi (size);
+
+  const char *string = TREE_STRING_POINTER (src);
 
-  if (strlen)
+  if (strsize)
     {
-      /* Compute and store the length of the substring at OFFSET.
+      /* Compute and store the size of the substring at OFFSET.
 	 All offsets past the initial length refer to null strings.  */
-      if (offset <= string_length)
-	*strlen = string_length - offset;
+      if (offset <= string_size)
+	*strsize = string_size - offset;
       else
-	*strlen = 0;
+	*strsize = 0;
     }
 
-  const char *string = TREE_STRING_POINTER (src);
-
-  if (string_length == 0
-      || offset >= string_size)
+  if (string_size == 0
+      || offset >= array_size)
     return NULL;
 
-  if (strsize)
-    {
-      /* Support even constant character arrays that aren't proper
-	 NUL-terminated strings.  */
-      *strsize = string_size;
-    }
-  else if (string[string_length - 1] != '\0')
+  if (!nulterm && string[string_size - 1] != '\0')
     {
-      /* Support only properly NUL-terminated strings but handle
-	 consecutive strings within the same array, such as the six
-	 substrings in "1\0002\0003".  */
+      /* When NULTERM is null, support only properly nul-terminated
+	 strings but handle consecutive strings within the same array,
+	 such as the six substrings in "1\0002\0003".  Otherwise, let
+	 the caller deal with non-nul-terminated arrays.  */
       return NULL;
     }
 
-  return offset <= string_length ? string + offset : "";
+  return offset <= string_size ? string + offset : "";
 }
 
 /* Given a tree T, compute which bits in T may be nonzero.  */
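The revised c_getstr contract can be summarized by the small C model below. It is a sketch, not GCC code: model_getstr is a hypothetical name, plain pointers and sizes replace trees, and the WANT_UNTERMINATED flag stands in for passing a non-null NULTERM. One deliberate deviation is noted in the comments: the model uses < where the patch uses <= in the final test, so it never hands back a pointer just past its own buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model (not GCC code) of the revised c_getstr contract.
   STRING holds a literal of STRING_SIZE bytes, counting embedded and
   terminating nuls, stored in an array of ARRAY_SIZE bytes and accessed
   at byte OFFSET.  WANT_UNTERMINATED stands in for a non-null NULTERM.  */
static const char *
model_getstr (const char *string, size_t string_size, size_t array_size,
              size_t offset, size_t *strsize, bool want_unterminated)
{
  assert (string != NULL);

  /* Size of the substring at OFFSET; offsets past the end of the
     literal refer to empty strings.  */
  if (strsize)
    *strsize = offset <= string_size ? string_size - offset : 0;

  if (string_size == 0 || offset >= array_size)
    return NULL;

  /* Unless the caller can handle them, reject literals whose last
     byte is not a nul.  */
  if (!want_unterminated && string[string_size - 1] != '\0')
    return NULL;

  /* The patch tests OFFSET <= STRING_SIZE here; the model uses < so
     that it never returns a pointer just past its buffer.  */
  return offset < string_size ? string + offset : "";
}
```

With const char a[3] = "123", a caller that does not opt in gets NULL (but still learns the size, 3), while one that opts in gets a pointer into the array and must bound its own reads.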
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index 1b9ccc0..a58a4a2 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -188,7 +188,7 @@ extern tree const_unop (enum tree_code, tree, tree);
 extern tree const_binop (enum tree_code, tree, tree, tree);
 extern bool negate_mathfn_p (combined_fn);
 extern const char *c_getstr (tree, unsigned HOST_WIDE_INT * = NULL,
-			     unsigned HOST_WIDE_INT * = NULL);
+			     bool * = NULL);
 extern wide_int tree_nonzero_bits (const_tree);
 
 /* Return OFF converted to a pointer offset type suitable as offset for
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index c3fa570..9eefb37 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1275,11 +1275,13 @@ gimple_fold_builtin_memset (gimple_stmt_iterator *gsi, tree c, tree len)
    Set *FLEXP to true if the range of the string lengths has been
    obtained from the upper bound of an array at the end of a struct.
    Such an array may hold a string that's longer than its upper bound
-   due to it being used as a poor-man's flexible array member.  */
+   due to it being used as a poor-man's flexible array member.
+   Clear *NULTERM if ARG refers to a constant array that is known
+   not to be nul-terminated.  */
 
 static bool
 get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
-		  int fuzzy, bool *flexp)
+		  int fuzzy, bool *flexp, bool *nulterm)
 {
   tree var, val = NULL_TREE;
   gimple *def_stmt;
@@ -1301,7 +1303,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 	      if (TREE_CODE (aop0) == INDIRECT_REF
 		  && TREE_CODE (TREE_OPERAND (aop0, 0)) == SSA_NAME)
 		return get_range_strlen (TREE_OPERAND (aop0, 0),
-					 length, visited, type, fuzzy, flexp);
+					 length, visited, type, fuzzy, flexp,
+					 nulterm);
 	    }
 	  else if (TREE_CODE (TREE_OPERAND (op, 0)) == COMPONENT_REF && fuzzy)
 	    {
@@ -1329,13 +1332,18 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 	    return false;
 	}
       else
-	val = c_strlen (arg, 1);
+	{
+	  tree arr;
+	  val = c_strlen (arg, 1, &arr);
+	  if (val && arr)
+	    *nulterm = false;
+	}
 
       if (!val && fuzzy)
 	{
 	  if (TREE_CODE (arg) == ADDR_EXPR)
 	    return get_range_strlen (TREE_OPERAND (arg, 0), length,
-				     visited, type, fuzzy, flexp);
+				     visited, type, fuzzy, flexp, nulterm);
 
 	  if (TREE_CODE (arg) == ARRAY_REF)
 	    {
@@ -1477,7 +1485,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
             || gimple_assign_unary_nop_p (def_stmt))
           {
             tree rhs = gimple_assign_rhs1 (def_stmt);
-	    return get_range_strlen (rhs, length, visited, type, fuzzy, flexp);
+	    return get_range_strlen (rhs, length, visited, type, fuzzy, flexp,
+				     nulterm);
           }
 	else if (gimple_assign_rhs_code (def_stmt) == COND_EXPR)
 	  {
@@ -1486,7 +1495,7 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 
 	    for (unsigned int i = 0; i < 2; i++)
 	      if (!get_range_strlen (ops[i], length, visited, type, fuzzy,
-				     flexp))
+				     flexp, nulterm))
 		{
 		  if (fuzzy == 2)
 		    *maxlen = build_all_ones_cst (size_type_node);
@@ -1513,7 +1522,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
             if (arg == gimple_phi_result (def_stmt))
               continue;
 
-	    if (!get_range_strlen (arg, length, visited, type, fuzzy, flexp))
+	    if (!get_range_strlen (arg, length, visited, type, fuzzy, flexp,
+				   nulterm))
 	      {
 		if (fuzzy == 2)
 		  *maxlen = build_all_ones_cst (size_type_node);
@@ -1545,19 +1555,27 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
    and false if PHIs and COND_EXPRs are to be handled optimistically,
    if we can determine string length minimum and maximum; it will use
    the minimum from the ones where it can be determined.
-   STRICT false should be only used for warning code.  */
+   STRICT false should be only used for warning code.
+   When non-null, clear *NULTERM if ARG refers to a constant array
+   that is known not to be nul-terminated.  Otherwise set it to true.  */
 
 bool
-get_range_strlen (tree arg, tree minmaxlen[2], bool strict)
+get_range_strlen (tree arg, tree minmaxlen[2], bool strict /* = false */,
+		  bool *nulterm /* = NULL */)
 {
   bitmap visited = NULL;
 
   minmaxlen[0] = NULL_TREE;
   minmaxlen[1] = NULL_TREE;
 
+  bool nultermbuf;
+  if (!nulterm)
+    nulterm = &nultermbuf;
+  *nulterm = true;
+
   bool flexarray = false;
   if (!get_range_strlen (arg, minmaxlen, &visited, 1, strict ? 1 : 2,
-			 &flexarray))
+			 &flexarray, nulterm))
     {
       minmaxlen[0] = NULL_TREE;
       minmaxlen[1] = NULL_TREE;
@@ -1576,7 +1594,7 @@ get_maxval_strlen (tree arg, int type)
   tree len[2] = { NULL_TREE, NULL_TREE };
 
   bool dummy;
-  if (!get_range_strlen (arg, len, &visited, type, 0, &dummy))
+  if (!get_range_strlen (arg, len, &visited, type, 0, &dummy, NULL))
     len[1] = NULL_TREE;
   if (visited)
     BITMAP_FREE (visited);
@@ -3496,12 +3514,14 @@ static bool
 gimple_fold_builtin_strlen (gimple_stmt_iterator *gsi)
 {
   gimple *stmt = gsi_stmt (*gsi);
+  tree arg = gimple_call_arg (stmt, 0);
 
   wide_int minlen;
   wide_int maxlen;
 
   tree lenrange[2];
-  if (!get_range_strlen (gimple_call_arg (stmt, 0), lenrange, true)
+  bool nulterm;
+  if (!get_range_strlen (arg, lenrange, true, &nulterm)
       && lenrange[0] && TREE_CODE (lenrange[0]) == INTEGER_CST
       && lenrange[1] && TREE_CODE (lenrange[1]) == INTEGER_CST)
     {
@@ -3523,6 +3543,10 @@ gimple_fold_builtin_strlen (gimple_stmt_iterator *gsi)
 
   if (minlen == maxlen)
     {
+      if (!nulterm)
+	warn_string_no_nul (gimple_location (stmt), NULL_TREE,
+			    gimple_call_fndecl (stmt), arg);
+
       lenrange[0] = force_gimple_operand_gsi (gsi, lenrange[0], true, NULL,
 					      true, GSI_SAME_STMT);
       replace_call_with_value (gsi, lenrange[0]);
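The NULTERM plumbing through get_range_strlen amounts to an "all operands terminated" flag that is computed alongside the length range. A minimal C model, assuming each PHI or COND_EXPR operand is a known constant array (model_range_strlen is hypothetical, and it assumes an unterminated array's length is taken to be its size):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model (not GCC code) of how get_range_strlen combines the
   operands of a PHI or COND_EXPR.  ARRS[I] is a constant array of SIZES[I]
   bytes.  Store the smallest and largest string lengths in RANGE and
   clear *NULTERM if any operand lacks a terminating nul.  */
static void
model_range_strlen (const char *const *arrs, const size_t *sizes, size_t n,
                    size_t range[2], bool *nulterm)
{
  assert (n > 0);
  *nulterm = true;
  for (size_t i = 0; i < n; ++i)
    {
      const char *nul = memchr (arrs[i], '\0', sizes[i]);
      /* Assumption: an unterminated array's length is its size.  */
      size_t len = nul ? (size_t) (nul - arrs[i]) : sizes[i];
      if (!nul)
        *nulterm = false;
      if (i == 0 || len < range[0])
        range[0] = len;
      if (i == 0 || len > range[1])
        range[1] = len;
    }
}
```

In gimple_fold_builtin_strlen the warning is issued only when the range collapses to a single value (minlen == maxlen) and the flag has been cleared, so a range such as [2, 5] by itself does not trigger it.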
diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
index 04e9bfa..fe11728 100644
--- a/gcc/gimple-fold.h
+++ b/gcc/gimple-fold.h
@@ -25,7 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 extern tree create_tmp_reg_or_ssa_name (tree, gimple *stmt = NULL);
 extern tree canonicalize_constructor_val (tree, tree);
 extern tree get_symbol_constant_value (tree);
-extern bool get_range_strlen (tree, tree[2], bool = false);
+extern bool get_range_strlen (tree, tree[2], bool = false, bool * = NULL);
 extern tree get_maxval_strlen (tree, int);
 extern void gimplify_and_update_call_from_tree (gimple_stmt_iterator *, tree);
 extern bool fold_stmt (gimple_stmt_iterator *);
diff --git a/gcc/testsuite/gcc.c-torture/execute/memchr-1.c b/gcc/testsuite/gcc.c-torture/execute/memchr-1.c
new file mode 100644
index 0000000..2614bee
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/memchr-1.c
@@ -0,0 +1,43 @@
+/* PR tree-optimization/86711 - wrong folding of memchr
+
+   Verify that memchr() of arrays initialized with string literals
+   where the nul doesn't fit in the array doesn't find the nul.  */
+
+extern void* memchr (const void*, int, __SIZE_TYPE__);
+
+#define A(expr) \
+  ((expr) \
+  ? (void)0							\
+  : (__builtin_printf ("assertion failed on line %i: %s\n",	\
+		       __LINE__, #expr), \
+     __builtin_abort ()))
+
+static const char a1[4] = "1234";
+static const char a2[2][4] = { "1234", "5678" };
+
+static const char a3[2][4] = { "1234", "567" };
+
+int main ()
+{
+  volatile int i = 0;
+
+  A (memchr (a1, 0, sizeof a1) == 0);
+  A (memchr (a1 + 1, 0, sizeof a1 - 1) == 0);
+  A (memchr (a1 + 3, 0, sizeof a1 - 3) == 0);
+  A (memchr (a1 + i, 0, sizeof a1) == 0);
+
+  A (memchr (a2, 0, sizeof a2) == 0);
+
+  A (memchr (a2[0], 0, sizeof a2[0]) == 0);
+  A (memchr (a2[1], 0, sizeof a2[1]) == 0);
+
+  A (memchr (a2[0] + 1, 0, sizeof a2[0] - 1) == 0);
+  A (memchr (a2[1] + 2, 0, sizeof a2[1] - 2) == 0);
+  A (memchr (a2[1] + 3, 0, sizeof a2[1] - 3) == 0);
+
+  A (memchr (a2[i], 0, sizeof a2[i]) == 0);
+  A (memchr (a2[i] + 1, 0, sizeof a2[i] - 1) == 0);
+
+  /* This one must find it.  */
+  A (memchr (a3, 0, sizeof a3) == &a3[1][3]);
+}
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr86714.c b/gcc/testsuite/gcc.c-torture/execute/pr86714.c
new file mode 100644
index 0000000..3ad6852
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr86714.c
@@ -0,0 +1,26 @@
+/* PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too
+   long initializer
+
+   The excessively long initializer for a[0] is undefined but this
+   test verifies that the excess elements are not considered a part
+   of the value of the array as a matter of QoI.  */
+
+const char a[2][3] = { "1234", "xyz" };
+char b[6];
+
+void *pb = b;
+
+int main ()
+{
+   __builtin_memcpy (b, a, 4);
+   __builtin_memset (b + 4, 'a', 2);
+
+   if (b[0] != '1' || b[1] != '2' || b[2] != '3'
+       || b[3] != 'x' || b[4] != 'a' || b[5] != 'a')
+     __builtin_abort ();
+
+   if (__builtin_memcmp (pb, "123xaa", 6))
+     __builtin_abort ();
+
+   return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c b/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c
new file mode 100644
index 0000000..838528f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c
@@ -0,0 +1,239 @@
+/* PR tree-optimization/86552 - missing warning for reading past the end
+   of non-string arrays
+   { dg-do compile }
+   { dg-options "-O2 -Wall -ftrack-macro-expansion=0" } */
+
+extern __SIZE_TYPE__ strlen (const char*);
+
+const char a[5] = "12345";   /* { dg-message "declared here" } */
+
+int i0 = 0;
+int i1 = 1;
+
+void sink (int, ...);
+
+#define CONCAT(a, b)   a ## b
+#define CAT(a, b)      CONCAT(a, b)
+
+#define T(str)					\
+  __attribute__ ((noipa))			\
+  void CAT (test_, __LINE__) (void) {		\
+    sink (strlen (str));			\
+  } typedef void dummy_type
+
+T (a);                /* { dg-warning "argument missing terminating nul" }  */
+T (&a[0]);            /* { dg-warning "nul" }  */
+T (&a[0] + 1);        /* { dg-warning "nul" }  */
+T (&a[1]);            /* { dg-warning "nul" }  */
+T (&a[i0]);           /* { dg-warning "nul" }  */
+T (&a[i0] + 1);       /* { dg-warning "nul" }  */
+
+
+const char b[][5] = { /* { dg-message "declared here" } */
+  "12", "123", "1234", "54321"
+};
+
+T (b[0]);
+T (b[1]);
+T (b[2]);
+T (b[3]);             /* { dg-warning "nul" }  */
+T (b[i0]);
+
+T (&b[2][1]);
+T (&b[2][1] + 1);
+T (&b[2][i0]);
+T (&b[2][1] + i0);
+
+T (&b[3][1]);         /* { dg-warning "nul" }  */
+T (&b[3][1] + 1);     /* { dg-warning "nul" }  */
+T (&b[3][i0]);        /* { dg-warning "nul" }  */
+T (&b[3][1] + i0);    /* { dg-warning "nul" }  */
+T (&b[3][i0] + i1);   /* { dg-warning "nul" }  */
+
+T (i0 ? "" : b[0]);
+T (i0 ? "" : b[1]);
+T (i0 ? "" : b[2]);
+T (i0 ? "" : b[3]);               /* { dg-warning "nul" }  */
+T (i0 ? b[0] : "");
+T (i0 ? b[1] : "");
+T (i0 ? b[2] : "");
+T (i0 ? b[3] : "");               /* { dg-warning "nul" }  */
+
+T (i0 ? "1234" : b[3]);           /* { dg-warning "nul" }  */
+T (i0 ? b[3] : "1234");           /* { dg-warning "nul" }  */
+
+T (i0 ? a : b[3]);                /* { dg-warning "nul" }  */
+T (i0 ? b[0] : b[2]);
+T (i0 ? b[2] : b[3]);             /* { dg-warning "nul" }  */
+T (i0 ? b[3] : b[2]);             /* { dg-warning "nul" }  */
+
+T (i0 ? b[0] : &b[3][0] + 1);     /* { dg-warning "nul" }  */
+T (i0 ? b[1] : &b[3][1] + i0);    /* { dg-warning "nul" }  */
+
+/* It's possible to detect the missing nul in the following two
+   expressions but GCC doesn't do it yet.  */
+T (i0 ? &b[3][1] + i0 : b[2]);    /* { dg-warning "nul" "bug" { xfail *-*-* } }  */
+T (i0 ? &b[3][i0] : &b[3][i1]);   /* { dg-warning "nul" "bug" { xfail *-*-* } }  */
+
+
+struct A { char a[5], b[5]; };
+
+const struct A s = { "1234", "12345" };
+
+T (s.a);
+T (&s.a[0]);
+T (&s.a[0] + 1);
+T (&s.a[0] + i0);
+T (&s.a[1]);
+T (&s.a[1] + 1);
+T (&s.a[1] + i0);
+
+T (s.b);              /* { dg-warning "nul" }  */
+T (&s.b[0]);          /* { dg-warning "nul" }  */
+T (&s.b[0] + 1);      /* { dg-warning "nul" }  */
+T (&s.b[0] + i0);     /* { dg-warning "nul" }  */
+T (&s.b[1]);          /* { dg-warning "nul" }  */
+T (&s.b[1] + 1);      /* { dg-warning "nul" }  */
+T (&s.b[1] + i0);     /* { dg-warning "nul" }  */
+
+struct B { struct A a[2]; };
+
+const struct B ba[] = {
+  { { { "123", "12345" }, { "12345", "123" } } },
+  { { { "12345", "123" }, { "123", "12345" } } },
+  { { { "1", "12" },      { "123", "1234" } } },
+  { { { "123", "1234" },  { "12345", "12" } } }
+};
+
+T (ba[0].a[0].a);
+T (&ba[0].a[0].a[0]);
+T (&ba[0].a[0].a[0] + 1);
+T (&ba[0].a[0].a[0] + i0);
+T (&ba[0].a[0].a[1]);
+T (&ba[0].a[0].a[1] + 1);
+T (&ba[0].a[0].a[1] + i0);
+
+T (ba[0].a[0].b);           /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[0].a[1].a);           /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[0].a[1].b);
+T (&ba[0].a[1].b[0]);
+T (&ba[0].a[1].b[0] + 1);
+T (&ba[0].a[1].b[0] + i0);
+T (&ba[0].a[1].b[1]);
+T (&ba[0].a[1].b[1] + 1);
+T (&ba[0].a[1].b[1] + i0);
+
+
+T (ba[1].a[0].a);           /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[1].a[0].b);
+T (&ba[1].a[0].b[0]);
+T (&ba[1].a[0].b[0] + 1);
+T (&ba[1].a[0].b[0] + i0);
+T (&ba[1].a[0].b[1]);
+T (&ba[1].a[0].b[1] + 1);
+T (&ba[1].a[0].b[1] + i0);
+
+T (ba[1].a[1].a);
+T (&ba[1].a[1].a[0]);
+T (&ba[1].a[1].a[0] + 1);
+T (&ba[1].a[1].a[0] + i0);
+T (&ba[1].a[1].a[1]);
+T (&ba[1].a[1].a[1] + 1);
+T (&ba[1].a[1].a[1] + i0);
+
+T (ba[1].a[1].b);           /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1] + i0);  /* { dg-warning "nul" }  */
+
+
+T (ba[2].a[0].a);
+T (&ba[2].a[0].a[0]);
+T (&ba[2].a[0].a[0] + 1);
+T (&ba[2].a[0].a[0] + i0);
+T (&ba[2].a[0].a[1]);
+T (&ba[2].a[0].a[1] + 1);
+T (&ba[2].a[0].a[1] + i0);
+
+T (ba[2].a[0].b);
+T (&ba[2].a[0].b[0]);
+T (&ba[2].a[0].b[0] + 1);
+T (&ba[2].a[0].b[0] + i0);
+T (&ba[2].a[0].b[1]);
+T (&ba[2].a[0].b[1] + 1);
+T (&ba[2].a[0].b[1] + i0);
+
+T (ba[2].a[1].a);
+T (&ba[2].a[1].a[0]);
+T (&ba[2].a[1].a[0] + 1);
+T (&ba[2].a[1].a[0] + i0);
+T (&ba[2].a[1].a[1]);
+T (&ba[2].a[1].a[1] + 1);
+T (&ba[2].a[1].a[1] + i0);
+
+
+T (ba[3].a[0].a);
+T (&ba[3].a[0].a[0]);
+T (&ba[3].a[0].a[0] + 1);
+T (&ba[3].a[0].a[0] + i0);
+T (&ba[3].a[0].a[1]);
+T (&ba[3].a[0].a[1] + 1);
+T (&ba[3].a[0].a[1] + i0);
+
+T (ba[3].a[0].b);
+T (&ba[3].a[0].b[0]);
+T (&ba[3].a[0].b[0] + 1);
+T (&ba[3].a[0].b[0] + i0);
+T (&ba[3].a[0].b[1]);
+T (&ba[3].a[0].b[1] + 1);
+T (&ba[3].a[0].b[1] + i0);
+
+T (ba[3].a[1].a);           /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0]);       /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0] + i0);  /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1]);       /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1] + i0);  /* { dg-warning "nul" }  */
+
+T (ba[3].a[1].b);
+T (&ba[3].a[1].b[0]);
+T (&ba[3].a[1].b[0] + 1);
+T (&ba[3].a[1].b[0] + i0);
+T (&ba[3].a[1].b[1]);
+T (&ba[3].a[1].b[1] + 1);
+T (&ba[3].a[1].b[1] + i0);
+
+
+T (i0 ? ba[0].a[0].a : ba[0].a[0].b);           /* { dg-warning "nul" }  */
+T (i0 ? ba[0].a[0].a : ba[0].a[0].b);           /* { dg-warning "nul" }  */
+
+T (i0 ? &ba[0].a[0].a[0] : &ba[3].a[1].a[0]);   /* { dg-warning "nul" }  */
+T (i0 ? &ba[3].a[1].a[1] :  ba[0].a[0].a);      /* { dg-warning "nul" }  */
+
+T (i0 ? ba[0].a[0].a : ba[0].a[1].b);
+T (i0 ? ba[0].a[1].b : ba[0].a[0].a);


* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-02  2:44     ` PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) ) Martin Sebor
@ 2018-08-02 13:26       ` Bernd Edlinger
  2018-08-02 18:56         ` Bernd Edlinger
                           ` (2 more replies)
  2018-08-17  5:15       ` Jeff Law
  1 sibling, 3 replies; 53+ messages in thread
From: Bernd Edlinger @ 2018-08-02 13:26 UTC
  To: Martin Sebor, Gcc Patch List

On 08/02/18 04:44, Martin Sebor wrote:
> Since the foundation of the patch is detecting and avoiding
> the overly aggressive folding of unterminated char arrays,
> besides issuing a warning for such arguments to strlen,
> the patch also fixes pr86711 - wrong folding of memchr, and
> pr86714 - tree-ssa-forwprop.c confused by too long initializer.
> 
> The substance of the attached updated patch is unchanged,
> I have just added test cases for the two additional bugs.
> 
> Bernd, as I mentioned Wednesday, the patch supersedes
> yours here:
> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html
> 

No problem, but I hope you understand that I still stand by
my patch.

So we have two patches now:
- mine, fixing a wrong code bug,
- yours, implementing a new warning and fixing a wrong
code bug at the same time.

I will add a few comments to your patch below.

> Martin
> 
> On 07/30/2018 01:17 PM, Martin Sebor wrote:
>> Attached is an updated version of the patch that handles more
>> instances of calling strlen() on a constant array that is not
>> a nul-terminated string.
>>
>> No other functions except strlen are explicitly handled yet,
>> and neither are constant arrays with braced-initializer lists
>> like const char a[] = { 'a', 'b', 'c' };  I am testing
>> an independent solution for those (bug 86552).  Once those
>> are handled the warning will be able to detect those as well.
>>
>> Tested on x86_64-linux.
>>
>> On 07/25/2018 05:38 PM, Martin Sebor wrote:
>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html
>>>
>>> The fix for bug 86532 has been checked in so this enhancement
>>> can now be applied on top of it (with only minor adjustments).
>>>
>>> On 07/19/2018 02:08 PM, Martin Sebor wrote:
>>>> In the discussion of my patch for pr86532 Bernd noted that
>>>> GCC silently accepts constant character arrays with no
>>>> terminating nul as arguments to strlen (and other string
>>>> functions).
>>>>
>>>> The attached patch is a first step in detecting these kinds
>>>> of bugs in strlen calls by issuing -Wstringop-overflow.
>>>> The next step is to modify all other handlers of built-in
>>>> functions to detect the same problem (not part of this patch).
>>>> Yet another step is to detect these problems in arguments
>>>> initialized using the non-string form:
>>>>
>>>>   const char a[] = { 'a', 'b', 'c' };
>>>>
>>>> This patch is meant to apply on top of the one for bug 86532
>>>> (I tested it with an earlier version of that patch so there
>>>> is code in the context that does not appear in the latest
>>>> version of the other diff).
>>>>
>>>> Martin
>>>>
>>>
>>
> 
> PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too long initializer
> PR tree-optimization/86711 - wrong folding of memchr
> PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/86714
> 	PR tree-optimization/86711
> 	PR tree-optimization/86552
> 	* builtins.h (warn_string_no_nul): Declare.
> 	(c_strlen): Add argument.
> 	* builtins.c (warn_string_no_nul): New function.
> 	(fold_builtin_strlen): Add argument.  Detect missing nul.
> 	(fold_builtin_1): Adjust.
> 	(string_length): Add argument and use it.
> 	(c_strlen): Same.
> 	(expand_builtin_strlen): Detect missing nul.
> 	* expr.c (string_constant): Add arguments.  Detect missing nul
> 	terminator and outermost declaration it's missing in.
> 	* expr.h (string_constant): Add argument.
> 	* fold-const.c (c_getstr): Change argument to bool*, rename
> 	other arguments.
> 	* fold-const-call.c (fold_const_call): Detect missing nul.
> 	* gimple-fold.c (get_range_strlen): Add argument.
> 	(get_maxval_strlen): Adjust.
> 	* gimple-fold.h (get_range_strlen): Add argument.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/86714
> 	PR tree-optimization/86711
> 	PR tree-optimization/86552
> 	* gcc.c-torture/execute/memchr-1.c: New test.
> 	* gcc.c-torture/execute/pr86714.c: New test.
> 	* gcc.dg/warn-strlen-no-nul.c: New test.
> 
> diff --git a/gcc/builtins.c b/gcc/builtins.c
> index aa3e0d8..f4924d5 100644
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, int);
>  static rtx expand_builtin_expect (tree, rtx);
>  static tree fold_builtin_constant_p (tree);
>  static tree fold_builtin_classify_type (tree);
> -static tree fold_builtin_strlen (location_t, tree, tree);
> +static tree fold_builtin_strlen (location_t, tree, tree, tree);
>  static tree fold_builtin_inf (location_t, tree, int);
>  static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
>  static bool validate_arg (const_tree, enum tree_code code);
> @@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
>    return n;
>  }
>  
> +/* For a call expression EXP to a function that expects a string argument,
> +   issue a diagnostic due to it being called with an argument NONSTR
> +   that is a character array with no terminating NUL.  */
> +
> +void
> +warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
> +{
> +  loc = expansion_point_location_if_in_system_header (loc);
> +
> +  bool warned;
> +  if (exp)
> +    {
> +      if (!fndecl)
> +	fndecl = get_callee_fndecl (exp);
> +      warned = warning_at (loc, OPT_Wstringop_overflow_,
> +			   "%K%qD argument missing terminating nul",
> +			   exp, fndecl);
> +    }
> +  else
> +    {
> +      gcc_assert (fndecl);
> +      warned = warning_at (loc, OPT_Wstringop_overflow_,
> +			   "%qD argument missing terminating nul",
> +			   fndecl);
> +    }
> +
> +  if (warned && DECL_P (nonstr))
> +    inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared here");
> +}
> +
>  /* Compute the length of a null-terminated character string or wide
>     character string handling character sizes of 1, 2, and 4 bytes.
>     TREE_STRING_LENGTH is not the right way because it evaluates to
> @@ -567,37 +597,60 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
>     accesses.  Note that this implies the result is not going to be emitted
>     into the instruction stream.
>  
> +   When ARR is non-null and the string is not properly nul-terminated,
> +   set *ARR to the declaration of the outermost constant object whose
> +   initializer (or one of its elements) is not nul-terminated.
> +
>     The value returned is of type `ssizetype'.
>  
>     Unfortunately, string_constant can't access the values of const char
>     arrays with initializers, so neither can we do so here.  */
>  
>  tree
> -c_strlen (tree src, int only_value)
> +c_strlen (tree src, int only_value, tree *arr /* = NULL */)
>  {
>    STRIP_NOPS (src);
> +
> +  /* Used to detect non-nul-terminated strings in subexpressions
> +     of a conditional expression.  When ARR is null, point it at
> +     one of the elements for simplicity.  */
> +  tree arrs[] = { NULL_TREE, NULL_TREE };
> +  if (!arr)
> +    arr = arrs;

now arr is always != NULL
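A minimal stand-alone sketch of the pattern being pointed out, using a hypothetical function `classify_out` and plain ints instead of GCC trees: once a null out-parameter is redirected to a local scratch array, it can never be null again, and it aliases the first scratch slot that the function itself also uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for c_strlen's ARR parameter handling.
   Returns 0 if the caller's OUT is used, 1 if OUT was redirected
   to the local scratch slot (so OUT and scratch[0] alias).  */
static int
classify_out (int *out)
{
  int scratch[2] = { 0, 0 };    /* Plays the role of ARRS.  */
  if (!out)
    out = scratch;              /* Like "arr = arrs".  */

  /* From here on OUT can never be null, so a later test such as
     "if (!arr && maxelts < strelts)" is dead code.  */
  assert (out != NULL);

  return out == &scratch[0];
}
```

Calling `classify_out (NULL)` yields 1, showing the aliasing; passing the address of a caller's variable yields 0.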

> +
>    if (TREE_CODE (src) == COND_EXPR
>        && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
>      {
> -      tree len1, len2;
> -
> -      len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
> -      len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
> +      tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs);
> +      tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1);
>        if (tree_int_cst_equal (len1, len2))
> -	return len1;
> +	{

Funny: if called with a null ARR, *arr and arrs[0] alias each other.

> +	  *arr = arrs[0] ? arrs[0] : arrs[1];
> +	  return len1;
> +	}
>      }
>  
>    if (TREE_CODE (src) == COMPOUND_EXPR
>        && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
> -    return c_strlen (TREE_OPERAND (src, 1), only_value);
> +    return c_strlen (TREE_OPERAND (src, 1), only_value, arr);
>  
>    location_t loc = EXPR_LOC_OR_LOC (src, input_location);
>  
>    /* Offset from the beginning of the string in bytes.  */
>    tree byteoff;
> -  src = string_constant (src, &byteoff);
> -  if (src == 0)
> -    return NULL_TREE;
> +  /* Set if array is nul-terminated, false otherwise.  */
> +  bool nulterm;

Note that arr is always non-null: it points either to the caller's
variable or to arrs[0].

> +  src = string_constant (src, &byteoff, &nulterm, arr);
> +  if (!src)
> +    {
> +      *arr = arrs[0] ? arrs[0] : arrs[1];
> +      return NULL_TREE;
> +    }
> +
> +  /* Clear *ARR when the string is nul-terminated.  It should be
> +     of no interest to callers.  */
> +  if (nulterm)
> +    *arr = NULL_TREE;
>  
>    /* Determine the size of the string element.  */
>    unsigned eltsize
> @@ -621,6 +674,12 @@ c_strlen (tree src, int only_value)
>  	maxelts = maxelts / eltsize - 1;
>        }
>  
> +  /* Unless the caller is prepared to handle it by passing in a non-null
> +     ARR, fail if the terminating nul doesn't fit in the array the string
> +     is stored in (as in const char a[3] = "123";).  */

Note that arr is always non-null, so this branch is never taken.

> +  if (!arr && maxelts < strelts)
> +    return NULL_TREE;
> +
>    /* PTR can point to the byte representation of any string type, including
>       char* and wchar_t*.  */
>    const char *ptr = TREE_STRING_POINTER (src);
> @@ -650,7 +709,8 @@ c_strlen (tree src, int only_value)
>        offsave = fold_convert (ssizetype, offsave);
>        tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
>  				      build_int_cst (ssizetype, len * eltsize));

This computation is wrong: it does not compute in units of eltsize.
I am, however, not sure it is a good idea for this function to try
to compute the strlen of wide character strings.

That said, please fix this computation first, in a separate patch,
instead of just fixing the indentation.  (I know I pointed out that
the lines here are too long, but that was before I realized that the
whole length computation here is wrong.)
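To make the unit issue concrete, here is a sketch with hypothetical helpers (not GCC code). It assumes STRELTS counts the elements before the terminating nul and that the byte offset is a multiple of ELTSIZE. Subtracting a byte offset from a byte total yields bytes, which equals the element count only when ELTSIZE is 1.

```c
#include <assert.h>
#include <stddef.h>

/* Length in elements of the substring starting BYTEOFF bytes into
   a string of STRELTS elements, each ELTSIZE bytes wide.  Returns 0
   for offsets past the end of the string.  */
static size_t
substr_len_elts (size_t strelts, size_t eltsize, size_t byteoff)
{
  assert (eltsize > 0 && byteoff % eltsize == 0);
  size_t eltoff = byteoff / eltsize;    /* Convert bytes to elements.  */
  return eltoff <= strelts ? strelts - eltoff : 0;
}

/* The byte-based variant: the result is in bytes.  */
static size_t
substr_len_bytes (size_t strelts, size_t eltsize, size_t byteoff)
{
  size_t total = strelts * eltsize;
  return byteoff <= total ? total - byteoff : 0;
}
```

For a 3-element string of 4-byte characters at byte offset 4, the element-based helper gives 2 while the byte-based one gives 8.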

> -      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
> +      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize),
> +				     offsave);
>        return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
>  			      build_zero_cst (ssizetype));
>      }
> @@ -690,7 +750,7 @@ c_strlen (tree src, int only_value)
>       Since ELTOFF is our starting index into the string, no further
>       calculation is needed.  */

What are you fixing here?  I think that was another bug.
If this fixes something, it should go in a separate patch
that handles only this.
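For context, a stand-alone sketch of a bounded, element-wise length scan. The sketch assumes string_length works this way: it counts ELTSIZE-byte elements until it finds an all-zero element or reaches the bound. This is a re-implementation under that assumption, not the GCC source. The bound matters because passing the array size (maxelts) instead of the number of elements actually stored (strelts) lets the scan run past the stored data when the array is larger.

```c
#include <assert.h>

/* Count the elements of ELTSIZE bytes at PTR before the first element
   whose bytes are all zero, examining at most MAXELTS elements.  */
static unsigned
bounded_elt_len (const void *ptr, unsigned eltsize, unsigned maxelts)
{
  assert (eltsize == 1 || eltsize == 2 || eltsize == 4);
  const unsigned char *p = ptr;
  unsigned n;
  for (n = 0; n < maxelts; n++, p += eltsize)
    {
      int zero = 1;
      for (unsigned i = 0; i < eltsize; i++)
        if (p[i])
          zero = 0;
      if (zero)
        break;      /* Found a nul element.  */
    }
  return n;
}
```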

>    unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
> -				maxelts - eltoff);
> +				strelts - eltoff);
>  
>    return ssize_int (len);
>  }
> @@ -2855,7 +2915,6 @@ expand_builtin_strlen (tree exp, rtx target,
>  
>    struct expand_operand ops[4];
>    rtx pat;
> -  tree len;
>    tree src = CALL_EXPR_ARG (exp, 0);
>    rtx src_reg;
>    rtx_insn *before_strlen;
> @@ -2864,20 +2923,39 @@ expand_builtin_strlen (tree exp, rtx target,
>    unsigned int align;
>  
>    /* If the length can be computed at compile-time, return it.  */
> -  len = c_strlen (src, 0);
> +  tree array;
> +  tree len = c_strlen (src, 0, &array);

You know that c_strlen tries to compute the length of wide character
strings, but strlen does not do that: strlen (L"abc") should give 1
(or 0 on a big-endian machine).
I wonder if that is correct.
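The byte-order point can be checked with a small program. The byte-oriented strlen stops at the first zero byte of the first wchar_t element of L"abc". The cast is only there to show the byte-wise behavior, and the helper assumes sizeof (wchar_t) > 1 and that unsigned int shares wchar_t's byte order.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* What the byte-oriented strlen reports for the bytes of L"abc".  */
static size_t
narrow_strlen_of_wide (void)
{
  static const wchar_t ws[] = L"abc";
  return strlen ((const char *) ws);
}

/* The value expected from the host's byte order.  */
static size_t
expected_for_host (void)
{
  assert (sizeof (wchar_t) > 1);
  const unsigned int one = 1;
  /* On a little-endian host the first byte of L'a' is 'a' (result 1);
     on a big-endian host it is 0 (result 0).  */
  return *(const unsigned char *) &one == 1 ? 1 : 0;
}
```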

>    if (len)
> -    return expand_expr (len, target, target_mode, EXPAND_NORMAL);
> +    {
> +      if (array)
> +	{
> +	  /* Array refers to the non-nul terminated constant array
> +	     whose length is attempted to be computed.  */

I really wonder if it would not make more sense to have a
nonterminated_string_constant_p instead.

The last time I wanted to implement a warning in expand, I faced the
problem that inlined functions would get one warning per invocation.
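A rough sketch of the check such a predicate would perform, using the hypothetical name `nonterminated_array_p` and plain C arrays instead of GCC trees. It reports whether no nul appears within the declared size of a constant array, as in const char a[3] = "123";. A real predicate would have to inspect STRING_CSTs and their declared types; this only illustrates the test itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical analogue of nonterminated_string_constant_p: true if
   the ARRAY_SIZE bytes at ARRAY contain no nul byte, so strlen on the
   array would read past its end.  */
static bool
nonterminated_array_p (const char *array, size_t array_size)
{
  assert (array != NULL || array_size == 0);
  /* Short-circuit the empty case so memchr never sees a null pointer.  */
  return array_size == 0 || memchr (array, '\0', array_size) == NULL;
}
```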

> +	  warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array);
> +	  return NULL_RTX;
> +	}
> +      return expand_expr (len, target, target_mode, EXPAND_NORMAL);
> +    }
>  
>    /* If the length can be computed at compile-time and is constant
>       integer, but there are side-effects in src, evaluate
>       src for side-effects, then return len.
>       E.g. x = strlen (i++ ? "xfoo" + 1 : "bar");
>       can be optimized into: i++; x = 3;  */
> -  len = c_strlen (src, 1);
> -  if (len && TREE_CODE (len) == INTEGER_CST)
> +  len = c_strlen (src, 1, &array);
> +  if (len)
>      {
> -      expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL);
> -      return expand_expr (len, target, target_mode, EXPAND_NORMAL);
> +      if (array)
> +	{
> +	  warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array);
> +	  return NULL_RTX;
> +	}
> +
> +      if (TREE_CODE (len) == INTEGER_CST)
> +	{
> +	  expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL);
> +	  return expand_expr (len, target, target_mode, EXPAND_NORMAL);
> +	}
>      }
>  
>    align = get_pointer_alignment (src) / BITS_PER_UNIT;
> @@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg)
>    return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));
>  }
>  
> -/* Fold a call to __builtin_strlen with argument ARG.  */
> +/* Fold a strlen call to FNDECL of TYPE, and with argument ARG.  */
>  
>  static tree
> -fold_builtin_strlen (location_t loc, tree type, tree arg)
> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>  {
>    if (!validate_arg (arg, POINTER_TYPE))
>      return NULL_TREE;
>    else
>      {
> -      tree len = c_strlen (arg, 0);
> -
> +      tree arr = NULL_TREE;
> +      tree len = c_strlen (arg, 0, &arr);

Is it possible to write a test case where strlen (L"test") reaches this point?
What will c_strlen return then?

>        if (len)
> -	return fold_convert_loc (loc, type, len);
> +	{
> +	  if (loc == UNKNOWN_LOCATION && EXPR_HAS_LOCATION (arg))
> +	    loc = EXPR_LOCATION (arg);
> +
> +	  /* To avoid warning multiple times about non-nul-terminated
> +	     strings only warn if their length has been determined
> +	     and it's being folded.  */
> +	  if (arr)
> +	    warn_string_no_nul (loc, NULL_TREE, fndecl, arr);
> +
> +	  return fold_convert_loc (loc, type, len);
> +	}
>  
>        return NULL_TREE;
>      }
> @@ -9175,7 +9264,7 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
>        return fold_builtin_classify_type (arg0);
>  
>      case BUILT_IN_STRLEN:
> -      return fold_builtin_strlen (loc, type, arg0);
> +      return fold_builtin_strlen (loc, fndecl, type, arg0);
>  
>      CASE_FLT_FN (BUILT_IN_FABS):
>      CASE_FLT_FN_FLOATN_NX (BUILT_IN_FABS):
> diff --git a/gcc/builtins.h b/gcc/builtins.h
> index 2e0a2f9..73b0b0b 100644
> --- a/gcc/builtins.h
> +++ b/gcc/builtins.h
> @@ -58,7 +58,7 @@ extern bool get_pointer_alignment_1 (tree, unsigned int *,
>  				     unsigned HOST_WIDE_INT *);
>  extern unsigned int get_pointer_alignment (tree);
>  extern unsigned string_length (const void*, unsigned, unsigned);
> -extern tree c_strlen (tree, int);
> +extern tree c_strlen (tree, int, tree * = NULL);
>  extern void expand_builtin_setjmp_setup (rtx, rtx);
>  extern void expand_builtin_setjmp_receiver (rtx);
>  extern void expand_builtin_update_setjmp_buf (rtx);
> @@ -103,7 +103,7 @@ extern bool target_char_cst_p (tree t, char *p);
>  
>  extern internal_fn associated_internal_fn (tree);
>  extern internal_fn replacement_internal_fn (gcall *);
> -
> +extern void warn_string_no_nul (location_t, tree, tree, tree);
>  extern tree max_object_size ();
>  
>  #endif /* GCC_BUILTINS_H */
> diff --git a/gcc/expr.c b/gcc/expr.c
> index de6709d..edbd7f8 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -11271,10 +11271,14 @@ is_aligning_offset (const_tree offset, const_tree exp)
>  /* Return the tree node if an ARG corresponds to a string constant or zero
>     if it doesn't.  If we return nonzero, set *PTR_OFFSET to the (possibly
>     non-constant) offset in bytes within the string that ARG is accessing.
> +   If NULTERM is non-null, consider valid even sequences of characters that
> +   aren't nul-terminated strings.  In that case, set *NULTERM if ARG refers
> +   to a nul-terminated string and clear it otherwise.
>     The type of the offset is sizetype.  */
>  
>  tree
> -string_constant (tree arg, tree *ptr_offset)
> +string_constant (tree arg, tree *ptr_offset, bool *nulterm /* = NULL */,
> +		 tree *decl /* = NULL */)
>  {
>    tree array;
>    STRIP_NOPS (arg);
> @@ -11328,7 +11332,7 @@ string_constant (tree arg, tree *ptr_offset)
>  	return NULL_TREE;
>  
>        tree offset;
> -      if (tree str = string_constant (arg0, &offset))
> +      if (tree str = string_constant (arg0, &offset, nulterm, decl))
>  	{
>  	  /* Avoid pointers to arrays (see bug 86622).  */
>  	  if (POINTER_TYPE_P (TREE_TYPE (arg))
> @@ -11368,6 +11372,10 @@ string_constant (tree arg, tree *ptr_offset)
>    if (TREE_CODE (array) == STRING_CST)

Well, actually I think there _are_ STRING_CSTs that are not nul-terminated.
Maybe not in C, but in Fortran, Ada, Go...

>      {
>        *ptr_offset = fold_convert (sizetype, offset);
> +      if (nulterm)
> +	*nulterm = true;
> +      if (decl)
> +	*decl = NULL_TREE;
>        return array;
>      }
>  
> @@ -11414,6 +11422,49 @@ string_constant (tree arg, tree *ptr_offset)
>    if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
>      return NULL_TREE;
>  
> +  unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size);
> +

I don't understand why this is necessary at all.
It looks way too complicated, to say the least.

TREE_TYPE (init) already has the type of the member.

> +  /* When ARG refers to an aggregate (of arrays) try to determine
> +     the size of the character array within the aggregate.  */
> +  tree ref = arg;
> +  tree reftype = TREE_TYPE (arg);
> +
> +  if (TREE_CODE (ref) == MEM_REF)
> +    {
> +      ref = TREE_OPERAND (ref, 0);
> +      if (TREE_CODE (ref) == ADDR_EXPR)
> +	{
> +	  ref = TREE_OPERAND (ref, 0);
> +	  reftype = TREE_TYPE (ref);
> +	}
> +    }
> +  else
> +    while (TREE_CODE (ref) == ARRAY_REF)
> +      {
> +	reftype = TREE_TYPE (ref);
> +	ref = TREE_OPERAND (ref, 0);
> +      }
> +
> +  if (TREE_CODE (ref) == COMPONENT_REF)
> +    reftype = TREE_TYPE (ref);
> +
> +  while (TREE_CODE (reftype) == ARRAY_TYPE)
> +    {
> +      tree next = TREE_TYPE (reftype);
> +      if (TREE_CODE (next) == INTEGER_TYPE)
> +	{
> +	  if (tree size = TYPE_SIZE_UNIT (reftype))
> +	    if (tree_fits_uhwi_p (size))
> +	      array_elts = tree_to_uhwi (size);

So array_elts is measured in bytes.

> +	  break;
> +	}
> +
> +      reftype = TREE_TYPE (reftype);
> +    }
> +
> +  if (decl)
> +    *decl = array;
> +
>    /* Avoid returning a string that doesn't fit in the array
>       it is stored in, like
>       const char a[4] = "abcde";
> @@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
>    unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
>    length = string_length (TREE_STRING_POINTER (init), charsize,
>  			  length / charsize);

Some callers, especially those where the wrong code happens, expect
to be able to access the STRING_CST up to TREE_STRING_LENGTH,
but using string_length assumes they stop at the first nul char.
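The gap between the two notions of length can be seen in plain C. Here sizeof stands in for TREE_STRING_LENGTH and strlen stands in for string_length with a one-byte element size. This is an analogy only: a STRING_CST's length need not equal the declared array size. The literal is the one used in the c_getstr comment of this patch.

```c
#include <assert.h>
#include <string.h>

/* "1\0002\0003" holds '1', NUL, '2', NUL, '3' plus the implicit NUL.  */
static const char seq[] = "1\0002\0003";

/* Bytes a caller may read from the literal (like TREE_STRING_LENGTH).  */
static size_t
seq_storage_size (void)
{
  return sizeof seq;
}

/* Length of the nul-terminated substring starting at byte OFF
   (like string_length with ELTSIZE 1).  */
static size_t
seq_len_at (size_t off)
{
  assert (off < sizeof seq);
  return strlen (seq + off);
}
```

The literal occupies 6 bytes, but a scan from the start stops after 1 character.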

> -  if (compare_tree_int (array_size, length + 1) < 0)
> +  if (nulterm)

But here you compare bytes with length, which is measured in chars.

> +    *nulterm = array_elts > length;
> +  else if (array_elts <= length)
>      return NULL_TREE;
>  
>    *ptr_offset = offset;
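A minimal sketch of a unit-consistent version of the nul-fit check, with hypothetical names. ARRAY_BYTES is the array size in bytes (as obtained from TYPE_SIZE_UNIT) and LENGTH counts characters of CHARSIZE bytes each, so the byte count is converted to elements before comparing. It assumes ARRAY_BYTES is a multiple of CHARSIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if an array of ARRAY_BYTES bytes has room for LENGTH characters
   of CHARSIZE bytes each plus a terminating nul character.  */
static bool
fits_with_nul (size_t array_bytes, size_t charsize, size_t length)
{
  assert (charsize > 0 && array_bytes % charsize == 0);
  size_t array_elts = array_bytes / charsize;   /* Bytes to elements.  */
  return array_elts > length;
}
```

With 4-byte characters, a 12-byte array holding 3 characters has no room for the nul, although the mixed-unit comparison 12 > 3 would say it does.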
> diff --git a/gcc/expr.h b/gcc/expr.h
> index cf047d4..e630979 100644
> --- a/gcc/expr.h
> +++ b/gcc/expr.h
> @@ -288,7 +288,7 @@ expand_normal (tree exp)
>  
>  /* Return the tree node and offset if a given argument corresponds to
>     a string constant.  */
> -extern tree string_constant (tree, tree *);
> +extern tree string_constant (tree, tree *, bool * = NULL, tree * = NULL);
>  
>  /* Two different ways of generating switch statements.  */
>  extern int try_casesi (tree, tree, tree, tree, rtx, rtx, rtx, profile_probability);
> diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
> index 06a42060..849a443 100644
> --- a/gcc/fold-const-call.c
> +++ b/gcc/fold-const-call.c
> @@ -1199,9 +1199,14 @@ fold_const_call (combined_fn fn, tree type, tree arg)
>    switch (fn)
>      {
>      case CFN_BUILT_IN_STRLEN:
> -      if (const char *str = c_getstr (arg))
> -	return build_int_cst (type, strlen (str));
> -      return NULL_TREE;
> +      {
> +	bool nulterm;
> +	if (const char *str = c_getstr (arg, NULL, &nulterm))
> +	  if (nulterm)
> +	    return build_int_cst (type, strlen (str));
> +
> +	return NULL_TREE;
> +      }
>  
>      CASE_CFN_NAN:
>      CASE_FLT_FN_FLOATN_NX (CFN_BUILT_IN_NAN):
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index b318fc77..ecbc38c 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -14577,23 +14577,23 @@ fold_build_pointer_plus_hwi_loc (location_t loc, tree ptr, HOST_WIDE_INT off)
>  /* Return a pointer P to a NUL-terminated string representing the sequence
>     of constant characters referred to by SRC (or a subsequence of such
>     characters within it if SRC is a reference to a string plus some
> -   constant offset).  If STRLEN is non-null, store stgrlen(P) in *STRLEN.
> -   If STRSIZE is non-null, store in *STRSIZE the size of the array
> -   the string is stored in; in that case, even though P points to a NUL
> -   terminated string, SRC need not refer to one.  This can happen when
> -   SRC refers to a constant character array initialized to all non-NUL
> -   values, as in the C declaration: char a[4] = "1234";  */
> +   constant offset).  If STRSIZE is non-null, store the size of the string
> +   literal in *STRSIZE, including any embedded or terminating nuls.
> +   If NULTERM is non-null, set *NULTERM if the referenced string is
> +   guaranteed to contain a terminating NUL.  Otherwise clear it.  This
> +   can happen in the case of valid C declarations such as:
> +   const char a[3] = "123";  */
>  
>  const char *
> -c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */,
> -	  unsigned HOST_WIDE_INT *strsize /* = NULL */)
> +c_getstr (tree src, unsigned HOST_WIDE_INT *strsize /* = NULL */,
> +	  bool *nulterm /* = NULL */)
>  {
>    tree offset_node;
>  
> -  if (strlen)
> -    *strlen = 0;
> +  if (strsize)
> +    *strsize = 0;
>  
> -  src = string_constant (src, &offset_node);
> +  src = string_constant (src, &offset_node, nulterm);
>    if (src == 0)
>      return NULL;
>  
> @@ -14606,47 +14606,42 @@ c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */,
>  	offset = tree_to_uhwi (offset_node);
>      }
>  
> -  /* STRING_LENGTH is the size of the string literal, including any
> -     embedded NULs.  STRING_SIZE is the size of the array the string
> +  /* STRING_SIZE is the size of the string literal, including any
> +     embedded NULs.  ARRAY_SIZE is the size of the array the string
>       literal is stored in.  */
> -  unsigned HOST_WIDE_INT string_length = TREE_STRING_LENGTH (src);
> -  unsigned HOST_WIDE_INT string_size = string_length;
> +  unsigned HOST_WIDE_INT string_size = TREE_STRING_LENGTH (src);
> +  unsigned HOST_WIDE_INT array_size = string_size;
>    tree type = TREE_TYPE (src);
>    if (tree size = TYPE_SIZE_UNIT (type))
>      if (tree_fits_shwi_p (size))
> -      string_size = tree_to_uhwi (size);
> +      array_size = tree_to_uhwi (size);
> +
> +  const char *string = TREE_STRING_POINTER (src);
>  
> -  if (strlen)
> +  if (strsize)
>      {
> -      /* Compute and store the length of the substring at OFFSET.
> +      /* Compute and store the size of the substring at OFFSET.
>  	 All offsets past the initial length refer to null strings.  */
> -      if (offset <= string_length)
> -	*strlen = string_length - offset;

this should be offset < string_length.

> +      if (offset <= string_size)
> +	*strsize = string_size - offset;
>        else
> -	*strlen = 0;

This should be 1: you may access the NUL byte of "".

> +	*strsize = 0;
>      }
>  
> -  const char *string = TREE_STRING_POINTER (src);
> -
> -  if (string_length == 0
> -      || offset >= string_size)
> +  if (string_size == 0
> +      || offset >= array_size)
>      return NULL;
>  
> -  if (strsize)
> -    {
> -      /* Support even constant character arrays that aren't proper
> -	 NUL-terminated strings.  */
> -      *strsize = string_size;
> -    }
> -  else if (string[string_length - 1] != '\0')

Well, this is broken for wide character strings,
but I hope we can get rid of STRING_CSTs that are
not explicitly nul-terminated.
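To see why checking only the last byte breaks for wide strings: the terminator is a whole element of ELTSIZE zero bytes. On a little-endian target, a non-nul element such as L'a' stored as 'a', 0, 0, 0 still ends in a zero byte. A sketch with a hypothetical helper that checks the entire last element:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the last ELTSIZE-byte element of the SIZE-byte buffer STR
   consists of zero bytes only, i.e. is a nul of that width.  */
static bool
ends_with_nul_elt (const char *str, size_t size, size_t eltsize)
{
  assert (eltsize > 0 && size >= eltsize && size % eltsize == 0);
  for (size_t i = size - eltsize; i < size; i++)
    if (str[i] != '\0')
      return false;
  return true;
}
```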

> +  if (!nulterm && string[string_size - 1] != '\0')
>      {
> -      /* Support only properly NUL-terminated strings but handle
> -	 consecutive strings within the same array, such as the six
> -	 substrings in "1\0002\0003".  */
> +      /* When NULTERM is null, support only properly nul-terminated
> +	 strings but handle consecutive strings within the same array,
> +	 such as the six substrings in "1\0002\0003".  Otherwise, let
> +	 the caller deal with non-nul-terminated arrays.  */
>        return NULL;
>      }
>  
> -  return offset <= string_length ? string + offset : "";

this should be offset < string_size.

> +  return offset <= string_size ? string + offset : "";
>  }
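The boundary conditions suggested in the comments on c_getstr above can be sketched as follows, using hypothetical helpers over a plain byte buffer of STRING_SIZE bytes (nuls included). It covers only the offset comparisons, not the earlier NULL return for offsets past the array. An offset inside the data yields the remaining size and STR + OFFSET. Any other offset yields "", whose single nul byte is still readable, so its size is 1.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the substring at OFFSET: bytes left in the data, or 1 for
   the nul of "" once OFFSET reaches the end.  */
static size_t
substr_size (size_t string_size, size_t offset)
{
  return offset < string_size ? string_size - offset : 1;
}

/* Pointer to the substring at OFFSET, never one past the data.  */
static const char *
substr_ptr (const char *str, size_t string_size, size_t offset)
{
  assert (str != NULL);
  return offset < string_size ? str + offset : "";
}
```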
>  
>  /* Given a tree T, compute which bits in T may be nonzero.  */
> diff --git a/gcc/fold-const.h b/gcc/fold-const.h
> index 1b9ccc0..a58a4a2 100644
> --- a/gcc/fold-const.h
> +++ b/gcc/fold-const.h
> @@ -188,7 +188,7 @@ extern tree const_unop (enum tree_code, tree, tree);
>  extern tree const_binop (enum tree_code, tree, tree, tree);
>  extern bool negate_mathfn_p (combined_fn);
>  extern const char *c_getstr (tree, unsigned HOST_WIDE_INT * = NULL,
> -			     unsigned HOST_WIDE_INT * = NULL);
> +			     bool * = NULL);
>  extern wide_int tree_nonzero_bits (const_tree);
>  
>  /* Return OFF converted to a pointer offset type suitable as offset for
> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> index c3fa570..9eefb37 100644
> --- a/gcc/gimple-fold.c
> +++ b/gcc/gimple-fold.c
> @@ -1275,11 +1275,13 @@ gimple_fold_builtin_memset (gimple_stmt_iterator *gsi, tree c, tree len)
>     Set *FLEXP to true if the range of the string lengths has been
>     obtained from the upper bound of an array at the end of a struct.
>     Such an array may hold a string that's longer than its upper bound
> -   due to it being used as a poor-man's flexible array member.  */
> +   due to it being used as a poor-man's flexible array member.
> +   Clear *NULTERM if ARG refers to a constant array that is known
> +   not to be nul-terminated.  */
>  
>  static bool
>  get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
> -		  int fuzzy, bool *flexp)
> +		  int fuzzy, bool *flexp, bool *nulterm)
>  {
>    tree var, val = NULL_TREE;
>    gimple *def_stmt;
> @@ -1301,7 +1303,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
>  	      if (TREE_CODE (aop0) == INDIRECT_REF
>  		  && TREE_CODE (TREE_OPERAND (aop0, 0)) == SSA_NAME)
>  		return get_range_strlen (TREE_OPERAND (aop0, 0),
> -					 length, visited, type, fuzzy, flexp);
> +					 length, visited, type, fuzzy, flexp,
> +					 nulterm);
>  	    }
>  	  else if (TREE_CODE (TREE_OPERAND (op, 0)) == COMPONENT_REF && fuzzy)
>  	    {
> @@ -1329,13 +1332,18 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
>  	    return false;
>  	}
>        else
> -	val = c_strlen (arg, 1);
> +	{
> +	  tree arr;
> +	  val = c_strlen (arg, 1, &arr);
> +	  if (val && arr)
> +	    *nulterm = false;
> +	}
>  
>        if (!val && fuzzy)
>  	{
>  	  if (TREE_CODE (arg) == ADDR_EXPR)
>  	    return get_range_strlen (TREE_OPERAND (arg, 0), length,
> -				     visited, type, fuzzy, flexp);
> +				     visited, type, fuzzy, flexp, nulterm);
>  
>  	  if (TREE_CODE (arg) == ARRAY_REF)
>  	    {
> @@ -1477,7 +1485,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
>              || gimple_assign_unary_nop_p (def_stmt))
>            {
>              tree rhs = gimple_assign_rhs1 (def_stmt);
> -	    return get_range_strlen (rhs, length, visited, type, fuzzy, flexp);
> +	    return get_range_strlen (rhs, length, visited, type, fuzzy, flexp,
> +				     nulterm);
>            }
>  	else if (gimple_assign_rhs_code (def_stmt) == COND_EXPR)
>  	  {
> @@ -1486,7 +1495,7 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
>  
>  	    for (unsigned int i = 0; i < 2; i++)
>  	      if (!get_range_strlen (ops[i], length, visited, type, fuzzy,
> -				     flexp))
> +				     flexp, nulterm))
>  		{
>  		  if (fuzzy == 2)
>  		    *maxlen = build_all_ones_cst (size_type_node);
> @@ -1513,7 +1522,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
>              if (arg == gimple_phi_result (def_stmt))
>                continue;
>  
> -	    if (!get_range_strlen (arg, length, visited, type, fuzzy, flexp))
> +	    if (!get_range_strlen (arg, length, visited, type, fuzzy, flexp,
> +				   nulterm))
>  	      {
>  		if (fuzzy == 2)
>  		  *maxlen = build_all_ones_cst (size_type_node);
> @@ -1545,19 +1555,27 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
>     and false if PHIs and COND_EXPRs are to be handled optimistically,
>     if we can determine string length minimum and maximum; it will use
>     the minimum from the ones where it can be determined.
> -   STRICT false should be only used for warning code.  */
> +   STRICT false should be only used for warning code.
> +   When non-null, clear *NULTERM if ARG refers to a constant array
> +   that is known not to be nul-terminated.  Otherwise set it to true.  */
>  
>  bool
> -get_range_strlen (tree arg, tree minmaxlen[2], bool strict)
> +get_range_strlen (tree arg, tree minmaxlen[2], bool strict /* = false */,
> +		  bool *nulterm /* = NULL */)
>  {
>    bitmap visited = NULL;
>  
>    minmaxlen[0] = NULL_TREE;
>    minmaxlen[1] = NULL_TREE;
>  
> +  bool nultermbuf;
> +  if (!nulterm)
> +    nulterm = &nultermbuf;
> +  *nulterm = true;
> +
>    bool flexarray = false;
>    if (!get_range_strlen (arg, minmaxlen, &visited, 1, strict ? 1 : 2,
> -			 &flexarray))
> +			 &flexarray, nulterm))
>      {
>        minmaxlen[0] = NULL_TREE;
>        minmaxlen[1] = NULL_TREE;
> @@ -1576,7 +1594,7 @@ get_maxval_strlen (tree arg, int type)
>    tree len[2] = { NULL_TREE, NULL_TREE };
>  
>    bool dummy;
> -  if (!get_range_strlen (arg, len, &visited, type, 0, &dummy))
> +  if (!get_range_strlen (arg, len, &visited, type, 0, &dummy, NULL))
>      len[1] = NULL_TREE;
>    if (visited)
>      BITMAP_FREE (visited);
> @@ -3496,12 +3514,14 @@ static bool
>  gimple_fold_builtin_strlen (gimple_stmt_iterator *gsi)
>  {
>    gimple *stmt = gsi_stmt (*gsi);
> +  tree arg = gimple_call_arg (stmt, 0);
>  
>    wide_int minlen;
>    wide_int maxlen;
>  
>    tree lenrange[2];
> -  if (!get_range_strlen (gimple_call_arg (stmt, 0), lenrange, true)
> +  bool nulterm;
> +  if (!get_range_strlen (arg, lenrange, true, &nulterm)
>        && lenrange[0] && TREE_CODE (lenrange[0]) == INTEGER_CST
>        && lenrange[1] && TREE_CODE (lenrange[1]) == INTEGER_CST)
>      {
> @@ -3523,6 +3543,10 @@ gimple_fold_builtin_strlen (gimple_stmt_iterator *gsi)
>  
>    if (minlen == maxlen)
>      {
> +      if (!nulterm)
> +	warn_string_no_nul (gimple_location (stmt), NULL_TREE,
> +			    gimple_call_fndecl (stmt), arg);
> +
>        lenrange[0] = force_gimple_operand_gsi (gsi, lenrange[0], true, NULL,
>  					      true, GSI_SAME_STMT);
>        replace_call_with_value (gsi, lenrange[0]);
> diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
> index 04e9bfa..fe11728 100644
> --- a/gcc/gimple-fold.h
> +++ b/gcc/gimple-fold.h
> @@ -25,7 +25,7 @@ along with GCC; see the file COPYING3.  If not see
>  extern tree create_tmp_reg_or_ssa_name (tree, gimple *stmt = NULL);
>  extern tree canonicalize_constructor_val (tree, tree);
>  extern tree get_symbol_constant_value (tree);
> -extern bool get_range_strlen (tree, tree[2], bool = false);
> +extern bool get_range_strlen (tree, tree[2], bool = false, bool * = NULL);
>  extern tree get_maxval_strlen (tree, int);
>  extern void gimplify_and_update_call_from_tree (gimple_stmt_iterator *, tree);
>  extern bool fold_stmt (gimple_stmt_iterator *);
> diff --git a/gcc/testsuite/gcc.c-torture/execute/memchr-1.c b/gcc/testsuite/gcc.c-torture/execute/memchr-1.c
> new file mode 100644
> index 0000000..2614bee
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/memchr-1.c
> @@ -0,0 +1,43 @@
> +/* PR tree-optimization/86711 - wrong folding of memchr
> +
> +   Verify that memchr() of arrays initialized with string literals
> +   where the nul doesn't fit in the array doesn't find the nul.  */
> +
> +extern void* memchr (const void*, int, __SIZE_TYPE__);
> +
> +#define A(expr) \
> +  ((expr) \
> +  ? (void)0							\
> +  : (__builtin_printf ("assertion failed on line %i: %s\n",	\
> +		       __LINE__, #expr), \
> +     __builtin_abort ()))
> +
> +static const char a1[4] = "1234";
> +static const char a2[2][4] = { "1234", "5678" };
> +
> +static const char a3[2][4] = { "1234", "567" };
> +
> +int main ()
> +{
> +  volatile int i = 0;
> +
> +  A (memchr (a1, 0, sizeof a1) == 0);
> +  A (memchr (a1 + 1, 0, sizeof a1 - 1) == 0);
> +  A (memchr (a1 + 3, 0, sizeof a1 - 3) == 0);
> +  A (memchr (a1 + i, 0, sizeof a1) == 0);
> +
> +  A (memchr (a2, 0, sizeof a2) == 0);
> +
> +  A (memchr (a2[0], 0, sizeof a2[0]) == 0);
> +  A (memchr (a2[1], 0, sizeof a2[1]) == 0);
> +
> +  A (memchr (a2[0] + 1, 0, sizeof a2[0] - 1) == 0);
> +  A (memchr (a2[1] + 2, 0, sizeof a2[1] - 2) == 0);
> +  A (memchr (a2[1] + 3, 0, sizeof a2[1] - 3) == 0);
> +
> +  A (memchr (a2[i], 0, sizeof a2[i]) == 0);
> +  A (memchr (a2[i] + 1, 0, sizeof a2[i] - 1) == 0);
> +
> +  /* This one must find it.  */
> +  A (memchr (a3, 0, sizeof a3) == &a3[1][3]);
> +}
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr86714.c b/gcc/testsuite/gcc.c-torture/execute/pr86714.c
> new file mode 100644
> index 0000000..3ad6852
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr86714.c
> @@ -0,0 +1,26 @@
> +/* PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too
> +   long initializer
> +
> +   The excessively long initializer for a[0] is undefined but this
> +   test verifies that the excess elements are not considered a part
> +   of the value of the array as a matter of QoI.  */
> +
> +const char a[2][3] = { "1234", "xyz" };
> +char b[6];
> +
> +void *pb = b;
> +
> +int main ()
> +{
> +   __builtin_memcpy (b, a, 4);
> +   __builtin_memset (b + 4, 'a', 2);
> +
> +   if (b[0] != '1' || b[1] != '2' || b[2] != '3'
> +       || b[3] != 'x' || b[4] != 'a' || b[5] != 'a')
> +     __builtin_abort ();
> +
> +   if (__builtin_memcmp (pb, "123xaa", 6))
> +     __builtin_abort ();
> +
> +   return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c b/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c
> new file mode 100644
> index 0000000..838528f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c
> @@ -0,0 +1,239 @@
> +/* PR tree-optimization/86552 - missing warning for reading past the end
> +   of non-string arrays
> +   { dg-do compile }
> +   { dg-options "-O2 -Wall -ftrack-macro-expansion=0" } */
> +
> +extern __SIZE_TYPE__ strlen (const char*);
> +
> +const char a[5] = "12345";   /* { dg-message "declared here" } */
> +
> +int i0 = 0;
> +int i1 = 1;
> +
> +void sink (int, ...);
> +
> +#define CONCAT(a, b)   a ## b
> +#define CAT(a, b)      CONCAT(a, b)
> +
> +#define T(str)					\
> +  __attribute__ ((noipa))			\
> +  void CAT (test_, __LINE__) (void) {		\
> +    sink (strlen (str));			\
> +  } typedef void dummy_type
> +
> +T (a);                /* { dg-warning "argument missing terminating nul" }  */
> +T (&a[0]);            /* { dg-warning "nul" }  */
> +T (&a[0] + 1);        /* { dg-warning "nul" }  */
> +T (&a[1]);            /* { dg-warning "nul" }  */
> +T (&a[i0]);           /* { dg-warning "nul" }  */
> +T (&a[i0] + 1);       /* { dg-warning "nul" }  */
> +
> +
> +const char b[][5] = { /* { dg-message "declared here" } */
> +  "12", "123", "1234", "54321"
> +};
> +
> +T (b[0]);
> +T (b[1]);
> +T (b[2]);
> +T (b[3]);             /* { dg-warning "nul" }  */
> +T (b[i0]);
> +
> +T (&b[2][1]);
> +T (&b[2][1] + 1);
> +T (&b[2][i0]);
> +T (&b[2][1] + i0);
> +
> +T (&b[3][1]);         /* { dg-warning "nul" }  */
> +T (&b[3][1] + 1);     /* { dg-warning "nul" }  */
> +T (&b[3][i0]);        /* { dg-warning "nul" }  */
> +T (&b[3][1] + i0);    /* { dg-warning "nul" }  */
> +T (&b[3][i0] + i1);   /* { dg-warning "nul" }  */
> +
> +T (i0 ? "" : b[0]);
> +T (i0 ? "" : b[1]);
> +T (i0 ? "" : b[2]);
> +T (i0 ? "" : b[3]);               /* { dg-warning "nul" }  */
> +T (i0 ? b[0] : "");
> +T (i0 ? b[1] : "");
> +T (i0 ? b[2] : "");
> +T (i0 ? b[3] : "");               /* { dg-warning "nul" }  */
> +
> +T (i0 ? "1234" : b[3]);           /* { dg-warning "nul" }  */
> +T (i0 ? b[3] : "1234");           /* { dg-warning "nul" }  */
> +
> +T (i0 ? a : b[3]);                /* { dg-warning "nul" }  */
> +T (i0 ? b[0] : b[2]);
> +T (i0 ? b[2] : b[3]);             /* { dg-warning "nul" }  */
> +T (i0 ? b[3] : b[2]);             /* { dg-warning "nul" }  */
> +
> +T (i0 ? b[0] : &b[3][0] + 1);     /* { dg-warning "nul" }  */
> +T (i0 ? b[1] : &b[3][1] + i0);    /* { dg-warning "nul" }  */
> +
> +/* It's possible to detect the missing nul in the following two
> +   expressions but GCC doesn't do it yet.  */
> +T (i0 ? &b[3][1] + i0 : b[2]);    /* { dg-warning "nul" "bug" { xfail *-*-* } }  */
> +T (i0 ? &b[3][i0] : &b[3][i1]);   /* { dg-warning "nul" "bug" { xfail *-*-* } }  */
> +
> +
> +struct A { char a[5], b[5]; };
> +
> +const struct A s = { "1234", "12345" };
> +
> +T (s.a);
> +T (&s.a[0]);
> +T (&s.a[0] + 1);
> +T (&s.a[0] + i0);
> +T (&s.a[1]);
> +T (&s.a[1] + 1);
> +T (&s.a[1] + i0);
> +
> +T (s.b);              /* { dg-warning "nul" }  */
> +T (&s.b[0]);          /* { dg-warning "nul" }  */
> +T (&s.b[0] + 1);      /* { dg-warning "nul" }  */
> +T (&s.b[0] + i0);     /* { dg-warning "nul" }  */
> +T (&s.b[1]);          /* { dg-warning "nul" }  */
> +T (&s.b[1] + 1);      /* { dg-warning "nul" }  */
> +T (&s.b[1] + i0);     /* { dg-warning "nul" }  */
> +
> +struct B { struct A a[2]; };
> +
> +const struct B ba[] = {
> +  { { { "123", "12345" }, { "12345", "123" } } },
> +  { { { "12345", "123" }, { "123", "12345" } } },
> +  { { { "1", "12" },      { "123", "1234" } } },
> +  { { { "123", "1234" },  { "12345", "12" } } }
> +};
> +
> +T (ba[0].a[0].a);
> +T (&ba[0].a[0].a[0]);
> +T (&ba[0].a[0].a[0] + 1);
> +T (&ba[0].a[0].a[0] + i0);
> +T (&ba[0].a[0].a[1]);
> +T (&ba[0].a[0].a[1] + 1);
> +T (&ba[0].a[0].a[1] + i0);
> +
> +T (ba[0].a[0].b);           /* { dg-warning "nul" }  */
> +T (&ba[0].a[0].b[0]);       /* { dg-warning "nul" }  */
> +T (&ba[0].a[0].b[0] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[0].a[0].b[0] + i0);  /* { dg-warning "nul" }  */
> +T (&ba[0].a[0].b[1]);       /* { dg-warning "nul" }  */
> +T (&ba[0].a[0].b[1] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[0].a[0].b[1] + i0);  /* { dg-warning "nul" }  */
> +
> +T (ba[0].a[1].a);           /* { dg-warning "nul" }  */
> +T (&ba[0].a[1].a[0]);       /* { dg-warning "nul" }  */
> +T (&ba[0].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[0].a[1].a[0] + i0);  /* { dg-warning "nul" }  */
> +T (&ba[0].a[1].a[1]);       /* { dg-warning "nul" }  */
> +T (&ba[0].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[0].a[1].a[1] + i0);  /* { dg-warning "nul" }  */
> +
> +T (ba[0].a[1].b);
> +T (&ba[0].a[1].b[0]);
> +T (&ba[0].a[1].b[0] + 1);
> +T (&ba[0].a[1].b[0] + i0);
> +T (&ba[0].a[1].b[1]);
> +T (&ba[0].a[1].b[1] + 1);
> +T (&ba[0].a[1].b[1] + i0);
> +
> +
> +T (ba[1].a[0].a);           /* { dg-warning "nul" }  */
> +T (&ba[1].a[0].a[0]);       /* { dg-warning "nul" }  */
> +T (&ba[1].a[0].a[0] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[1].a[0].a[0] + i0);  /* { dg-warning "nul" }  */
> +T (&ba[1].a[0].a[1]);       /* { dg-warning "nul" }  */
> +T (&ba[1].a[0].a[1] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[1].a[0].a[1] + i0);  /* { dg-warning "nul" }  */
> +
> +T (ba[1].a[0].b);
> +T (&ba[1].a[0].b[0]);
> +T (&ba[1].a[0].b[0] + 1);
> +T (&ba[1].a[0].b[0] + i0);
> +T (&ba[1].a[0].b[1]);
> +T (&ba[1].a[0].b[1] + 1);
> +T (&ba[1].a[0].b[1] + i0);
> +
> +T (ba[1].a[1].a);
> +T (&ba[1].a[1].a[0]);
> +T (&ba[1].a[1].a[0] + 1);
> +T (&ba[1].a[1].a[0] + i0);
> +T (&ba[1].a[1].a[1]);
> +T (&ba[1].a[1].a[1] + 1);
> +T (&ba[1].a[1].a[1] + i0);
> +
> +T (ba[1].a[1].b);           /* { dg-warning "nul" }  */
> +T (&ba[1].a[1].b[0]);       /* { dg-warning "nul" }  */
> +T (&ba[1].a[1].b[0] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[1].a[1].b[0] + i0);  /* { dg-warning "nul" }  */
> +T (&ba[1].a[1].b[1]);       /* { dg-warning "nul" }  */
> +T (&ba[1].a[1].b[1] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[1].a[1].b[1] + i0);  /* { dg-warning "nul" }  */
> +
> +
> +T (ba[2].a[0].a);
> +T (&ba[2].a[0].a[0]);
> +T (&ba[2].a[0].a[0] + 1);
> +T (&ba[2].a[0].a[0] + i0);
> +T (&ba[2].a[0].a[1]);
> +T (&ba[2].a[0].a[1] + 1);
> +T (&ba[2].a[0].a[1] + i0);
> +
> +T (ba[2].a[0].b);
> +T (&ba[2].a[0].b[0]);
> +T (&ba[2].a[0].b[0] + 1);
> +T (&ba[2].a[0].b[0] + i0);
> +T (&ba[2].a[0].b[1]);
> +T (&ba[2].a[0].b[1] + 1);
> +T (&ba[2].a[0].b[1] + i0);
> +
> +T (ba[2].a[1].a);
> +T (&ba[2].a[1].a[0]);
> +T (&ba[2].a[1].a[0] + 1);
> +T (&ba[2].a[1].a[0] + i0);
> +T (&ba[2].a[1].a[1]);
> +T (&ba[2].a[1].a[1] + 1);
> +T (&ba[2].a[1].a[1] + i0);
> +
> +
> +T (ba[3].a[0].a);
> +T (&ba[3].a[0].a[0]);
> +T (&ba[3].a[0].a[0] + 1);
> +T (&ba[3].a[0].a[0] + i0);
> +T (&ba[3].a[0].a[1]);
> +T (&ba[3].a[0].a[1] + 1);
> +T (&ba[3].a[0].a[1] + i0);
> +
> +T (ba[3].a[0].b);
> +T (&ba[3].a[0].b[0]);
> +T (&ba[3].a[0].b[0] + 1);
> +T (&ba[3].a[0].b[0] + i0);
> +T (&ba[3].a[0].b[1]);
> +T (&ba[3].a[0].b[1] + 1);
> +T (&ba[3].a[0].b[1] + i0);
> +
> +T (ba[3].a[1].a);           /* { dg-warning "nul" }  */
> +T (&ba[3].a[1].a[0]);	    /* { dg-warning "nul" }  */
> +T (&ba[3].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[3].a[1].a[0] + i0);  /* { dg-warning "nul" }  */
> +T (&ba[3].a[1].a[1]);	    /* { dg-warning "nul" }  */
> +T (&ba[3].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
> +T (&ba[3].a[1].a[1] + i0);  /* { dg-warning "nul" }  */
> +
> +T (ba[3].a[1].b);
> +T (&ba[3].a[1].b[0]);	
> +T (&ba[3].a[1].b[0] + 1);
> +T (&ba[3].a[1].b[0] + i0);
> +T (&ba[3].a[1].b[1]);	
> +T (&ba[3].a[1].b[1] + 1);
> +T (&ba[3].a[1].b[1] + i0);
> +
> +
> +T (i0 ? ba[0].a[0].a : ba[0].a[0].b);           /* { dg-warning "nul" }  */
> +T (i0 ? ba[0].a[0].a : ba[0].a[0].b);           /* { dg-warning "nul" }  */
> +
> +T (i0 ? &ba[0].a[0].a[0] : &ba[3].a[1].a[0]);   /* { dg-warning "nul" }  */
> +T (i0 ? &ba[3].a[1].a[1] :  ba[0].a[0].a);      /* { dg-warning "nul" }  */
> +
> +T (i0 ? ba[0].a[0].a : ba[0].a[1].b);
> +T (i0 ? ba[0].a[1].b : ba[0].a[0].a);


Bernd.
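The warn-strlen-no-nul.c test above relies on a C rule. A string literal that exactly fills a char array initializes it without a terminating nul (C11 6.7.9p14), so `a[5] = "12345"` and `b[3]` contain no nul at all. The sketch below shows how such arrays can be recognized at run time by searching only the array's own bytes. The helper name `is_nul_terminated` is hypothetical and not part of GCC; the patch itself makes the equivalent decision on constant initializers during folding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return true if the SIZE-byte array at A
   contains a nul, i.e. if strlen (A) stays within the array.
   memchr examines only A[0] .. A[SIZE - 1], so an unterminated
   array is detected without reading past its end.  */
static bool
is_nul_terminated (const char *a, size_t size)
{
  assert (a != NULL);
  return memchr (a, '\0', size) != NULL;
}

/* Declarations copied from warn-strlen-no-nul.c.  */
static const char a[5] = "12345";   /* no room for the nul */
static const char b[][5] = { "12", "123", "1234", "54321" };
```

With these declarations `a` and `b[3]` are reported as unterminated, which matches the calls the test expects to be diagnosed.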


* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
From: Bernd Edlinger @ 2018-08-02 18:56 UTC (permalink / raw)
  To: Martin Sebor, Gcc Patch List

On 08/02/18 15:26, Bernd Edlinger wrote:
>>
>>    /* If the length can be computed at compile-time, return it.  */
>> -  len = c_strlen (src, 0);
>> +  tree array;
>> +  tree len = c_strlen (src, 0, &array);
> 
> You know, c_strlen tries to compute wide character lengths,
> but strlen does not do that: strlen (L"abc") should give 1
> (or 0 on a big-endian machine).
> I wonder if that is correct.
> 
[snip]
>>
>>  static tree
>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>  {
>>    if (!validate_arg (arg, POINTER_TYPE))
>>      return NULL_TREE;
>>    else
>>      {
>> -      tree len = c_strlen (arg, 0);
>> -
>> +      tree arr = NULL_TREE;
>> +      tree len = c_strlen (arg, 0, &arr);
> 
> Is it possible to write a test case where strlen(L"test") reaches this point?
> what will c_strlen return then?
> 

Yes, of course it is:

$ cat y.c
int f(char *x)
{
   return __builtin_strlen(x);
}

int main ()
{
   return f((char*)&L"abcdef"[0]);
}
$ gcc -O3 -S y.c
$ cat y.s
main:
.LFB1:
	.cfi_startproc
	movl	$6, %eax
	ret
	.cfi_endproc

The reason is that c_strlen tries to fold wide-character strings at all.
I do not know when that was introduced; was it already there before your last patches?
Is it possible to revert the last few patches cleanly?


Bernd.
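At run time strlen counts bytes up to the first zero byte, whatever type the pointer originally had. When wchar_t is at least two bytes wide, the first element of L"abcdef" stores 'a' in one byte and zeros in the others. A library strlen on those bytes therefore returns 1 on a little-endian target and 0 on a big-endian one, while wcslen returns 6. Folding the call to 6, as the y.s output above shows, matches neither result. The sketch below shows the run-time behavior; the helper name is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: call the library strlen on the bytes of a
   wide string, as f() in y.c does through its char * parameter.
   Reading the pointer from a volatile object hides the wide literal
   from the compiler, so the call cannot be folded at compile time.  */
static size_t
strlen_of_wide_bytes (const wchar_t *ws)
{
  assert (ws != NULL);
  const char *volatile p = (const char *) ws;
  return strlen (p);
}
```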


* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
From: Martin Sebor @ 2018-08-02 20:34 UTC (permalink / raw)
  To: Bernd Edlinger, Gcc Patch List

On 08/02/2018 12:56 PM, Bernd Edlinger wrote:
> [snip]
>
> The reason is that c_strlen tries to fold wide chars at all.
> I do not know when that was introduced, was that already before your last patches?

The function itself was introduced in 1992 if not earlier,
before wide strings even existed.  AFAICS, it has always
accepted strings of all widths.  Until r241489 (in GCC 7)
it computed their length in bytes, not characters.  I don't
know if that was on purpose or if it was just never changed
to compute the length in characters when wide strings were
first introduced.  From the name I assume it's the latter.
The difference wasn't detected until sprintf tests were added
for wide string directives.  The ChangeLog description for
the change reads: "Correctly handle wide strings."  I didn't
consider pathological cases like strlen (L"abc").  It
shouldn't be difficult to change it to handle this case.

Martin
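Computing the length in characters rather than bytes means treating each array element as a unit and stopping at the first element whose bytes are all zero. The sketch below is a standalone illustration of that idea. It is loosely modeled on the `string_length (ptr, eltsize, maxelts)` helper whose signature appears in the builtins.c hunk further down, but the name `elt_string_length` and its body are illustrative, not GCC's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative sketch: return the number of ELTSIZE-byte elements
   at PTR that precede the first all-zero element, examining at most
   MAXELTS elements.  A result equal to MAXELTS means no terminator
   was found, which is how an unterminated array shows up.  */
static size_t
elt_string_length (const void *ptr, size_t eltsize, size_t maxelts)
{
  static const unsigned char zero[8];   /* all-zero element pattern */
  assert (ptr != NULL && eltsize >= 1 && eltsize <= sizeof zero);

  const unsigned char *p = ptr;
  size_t n;
  for (n = 0; n < maxelts; n++)
    if (memcmp (p + n * eltsize, zero, eltsize) == 0)
      break;
  return n;
}
```

With an ELTSIZE of 1 this behaves like a bounded strlen. With sizeof (wchar_t) it counts wide characters the way wcslen does.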


* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
From: Bernd Edlinger @ 2018-08-03 13:01 UTC (permalink / raw)
  To: Martin Sebor, Gcc Patch List

On 08/02/18 22:34, Martin Sebor wrote:
> [snip]
> 
> The function itself was introduced in 1992 if not earlier,
> before wide strings even existed.  AFAICS, it has always
> accepted strings of all widths.  Until r241489 (in GCC 7)
> it computed their length in bytes, not characters.  I don't
> know if that was on purpose or if it was just never changed
> to compute the length in characters when wide strings were
> first introduced.  From the name I assume it's the latter.
> The difference wasn't detected until sprintf tests were added
> for wide string directives.  The ChangeLog description for
> the change reads: Correctly handle wide strings.  I didn't
> consider pathological cases like strlen (L"abc").  It
> shouldn't be difficult to change to fix this case.
> 

Oh, oh, oh....

$ cat y3.c
int main ()
{
   char c[100];
   int x = __builtin_sprintf (c, "%S", L"\uFFFF");

   __builtin_printf("%d %ld\n", x,__builtin_strlen(c));
}

$ gcc-4.8 -O3 -std=c99 y3.c
$ ./a.out
-1 0
$ gcc -O3 y3.c
$ ./a.out
1 0
$ echo $LANG
de_DE.UTF-8

I would have expected L"\uFFFF" to be converted to UTF-8
or another encoding, so the return value of sprintf is
far from obvious, and probably locale-dependent.

Why do you think it is a good idea to seize every
opportunity to optimize totally unnecessary things, like
using the return value of the sprintf function as it is?

Did you never consider that this adds a significant
maintenance burden on the rest of us as well?


Bernd.


* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
From: Martin Sebor @ 2018-08-03 19:59 UTC (permalink / raw)
  To: Bernd Edlinger, Gcc Patch List

On 08/03/2018 07:00 AM, Bernd Edlinger wrote:
> [snip]
>
> Oh, oh, oh....
>
> $ cat y3.c
> int main ()
> {
>    char c[100];
>    int x = __builtin_sprintf (c, "%S", L"\uFFFF");
>
>    __builtin_printf("%d %ld\n", x,__builtin_strlen(c));
> }
>
> $ gcc-4.8 -O3 -std=c99 y3.c
> $ ./a.out
> -1 0
> $ gcc -O3 y3.c
> $ ./a.out
> 1 0
> $ echo $LANG
> de_DE.UTF-8
>
> I would have expected L"\uFFFF" to converted to UTF-8
> or another encoding, so the return value if sprintf is
> far from obvious, and probably language dependent.
>
> Why do you think it is a good idea to use really every
> opportunity to optimize totally unnecessary things like
> using the return value from the sprintf function as it is?
>
> Did you never think this adds a significant maintenance
> burden on the rest of us as well?

Your condescending tone is uncalled for, and you clearly speak
out of ignorance.  I don't owe you an explanation but as I have
said multiple times: most of my work, including the sprintf pass,
is primarily motivated by detecting bugs like buffer overflow.
Optimization is only a secondary goal (but bug detection depends
on it).  It may come as a shock to you but mistakes happen.
That's why it's important to make an effort to detect them.
This one is a simple typo (handling %S the same way as %s
instead of %ls).

If you are incapable of a professional tone I would suggest
you go harass someone else.

Martin
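For reference, POSIX defines %S as the XSI equivalent of %ls. That conversion turns each wide character into multibyte form, as if by wcrtomb, in the current locale. If a character cannot be represented, this is an encoding error and sprintf returns a negative value. y3.c never calls setlocale, so it runs in the "C" locale whatever $LANG says (C11 7.11.1.1), which is consistent with the -1 from the GCC 4.8 build. The sketch below counts how many bytes a %ls conversion would produce. The helper is hypothetical, and it ignores the final shift sequence of state-dependent encodings.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: return the number of bytes that "%ls" (or its
   XSI synonym "%S") would write for WS in the current locale, or -1
   if some character has no representation there.  Any final shift
   sequence of a state-dependent encoding is not counted.  */
static long
ls_output_bytes (const wchar_t *ws)
{
  assert (ws != NULL);

  mbstate_t state;
  memset (&state, 0, sizeof state);   /* initial conversion state */

  char buf[MB_LEN_MAX];
  long total = 0;
  for (; *ws != L'\0'; ++ws)
    {
      size_t n = wcrtomb (buf, *ws, &state);
      if (n == (size_t) -1)
        return -1;                    /* encoding error */
      total += (long) n;
    }
  return total;
}
```

In the "C" locale, L"abc" yields 3, the same count snprintf reports for "%ls". Whether L"\uFFFF" converts at all depends on the locale in effect at run time.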


* [PATCH 0/6] improve handling of char arrays with missing nul (PR 86552, 86711, 86714)
From: Martin Sebor @ 2018-08-13 21:23 UTC (permalink / raw)
  To: Gcc Patch List, Jeff Law

To make reviewing the changes easier I've split up the patch
into a series:

1. Detection of constant arrays that are not nul-terminated, to
    prevent early folding.  This resolves PR 86711 - wrong folding
    of memchr, and prevents PR 86714 - tree-ssa-forwprop.c confused
    by too long initializer, but doesn't warn.

2. Warn for reads past unterminated constant character arrays.
    This adds warnings for string functions called with such arrays
    to resolve PR 86552 - missing warning for reading past the end
    of non-string arrays.  Now that GCC transforms braced-initializer
    lists into STRING_CSTs (even those with no nul), the warning is
    capable of diagnosing even those.

    2.1 strlen
    2.2 strcpy
    2.3 sprintf
    2.4 stpcpy
    2.5 strnlen

There are many more string functions where unterminated arrays
(constant or otherwise) should be diagnosed.  I plan to continue
working on those (the constant ones first), but I want to post this
updated patch for review now, mainly so that the wrong-code bug
(PR 86711) can be resolved and the basic detection infrastructure
agreed on.

An open question in my mind is what GCC should do with such calls
after issuing a warning: replace them with traps?  Fold them into
constants?  Or continue to pass them through to the corresponding
library functions?

Martin

On 07/25/2018 05:38 PM, Martin Sebor wrote:
> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html
>
> The fix for bug 86532 has been checked in so this enhancement
> can now be applied on top of it (with only minor adjustments).
>
> On 07/19/2018 02:08 PM, Martin Sebor wrote:
>> In the discussion of my patch for pr86532 Bernd noted that
>> GCC silently accepts constant character arrays with no
>> terminating nul as arguments to strlen (and other string
>> functions).
>>
>> The attached patch is a first step in detecting these kinds
>> of bugs in strlen calls by issuing -Wstringop-overflow.
>> The next step is to modify all other handlers of built-in
>> functions to detect the same problem (not part of this patch).
>> Yet another step is to detect these problems in arguments
>> initialized using the non-string form:
>>
>>   const char a[] = { 'a', 'b', 'c' };
>>
>> This patch is meant to apply on top of the one for bug 86532
>> (I tested it with an earlier version of that patch so there
>> is code in the context that does not appear in the latest
>> version of the other diff).
>>
>> Martin
>>
>


* [PATCH 1/6] prevent folding of unterminated const arrays in memchr calls (PR 86711, 86714)
From: Martin Sebor @ 2018-08-13 21:25 UTC (permalink / raw)
  To: Gcc Patch List, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 278 bytes --]

The attached changes implement the detection of constant arrays
that are not nul-terminated and prevent incorrect or early folding
of such arrays.  This resolves PR 86711 - wrong folding of memchr,
and prevents PR 86714 - tree-ssa-forwprop.c confused by too long
initializer.
No warnings are issued.
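The diff below threads an optional out-parameter, NONSTR, through c_strlen. When the caller passes null, the function points it at local storage so the rest of the body can store through it unconditionally; get_range_strlen does the same with NULTERM. The sketch below applies that pattern to a bounded length computation. `bounded_strlen` is a hypothetical name, and unlike GCC, which records the declaration of the outermost unterminated object, it simply reports the array pointer itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: return the length of the string in the
   SIZE-byte array S without reading past its end.  When NONSTR is
   non-null, set *NONSTR to S if the array has no terminating nul
   and to null otherwise.  */
static size_t
bounded_strlen (const char *s, size_t size, const char **nonstr)
{
  assert (s != NULL);

  /* Let the caller omit NONSTR: redirect it to local storage so the
     code below can assign through it unconditionally.  */
  const char *dummy;
  if (!nonstr)
    nonstr = &dummy;

  const char *nul = memchr (s, '\0', size);
  *nonstr = nul ? NULL : s;
  return nul ? (size_t) (nul - s) : size;
}
```

For an array like a1[4] = "1234" from memchr-1.c, the sketch returns the full array size and flags the array, instead of pretending a nul was found.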

[-- Attachment #2: gcc-86552-1.diff --]
[-- Type: text/x-patch, Size: 31615 bytes --]

PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too long initializer
PR tree-optimization/86711 - wrong folding of memchr

gcc/ChangeLog:

	PR tree-optimization/86714
	PR tree-optimization/86711
	* builtins.h (c_strlen): Add argument.
	* builtins.c (c_strlen): Add argument and use it.
	* expr.c (string_constant): Add arguments.  Detect missing nul
	terminator and outermost declaration it's missing in.
	* expr.h (string_constant): Add argument.
	* fold-const.c (c_getstr): Change argument to tree*, rename
	other arguments.
	* fold-const-call.c (fold_const_call): Avoid folding calls with
	unterminated arrays.
	* gimple-fold.c (get_range_strlen): Add argument.
	(get_maxval_strlen): Adjust.
	* gimple-fold.h (get_range_strlen): Add argument.

gcc/testsuite/ChangeLog:

	PR tree-optimization/86714
	PR tree-optimization/86711
	* gcc.c-torture/execute/memchr-1.c: New test.
	* gcc.c-torture/execute/pr86714.c: New test.
	* gcc.dg/strlenopt-56.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 39611de..a7aa4b2 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -567,37 +567,55 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
    accesses.  Note that this implies the result is not going to be emitted
    into the instruction stream.
 
+   When NONSTR is non-null and the string is not properly nul-terminated,
+   set *NONSTR to the declaration of the outermost constant object whose
+   initializer (or one of its elements) is not nul-terminated.
+
    The value returned is of type `ssizetype'.
 
    Unfortunately, string_constant can't access the values of const char
    arrays with initializers, so neither can we do so here.  */
 
 tree
-c_strlen (tree src, int only_value)
+c_strlen (tree src, int only_value, tree *nonstr /* = NULL */)
 {
   STRIP_NOPS (src);
+
+  /* Used to detect non-nul-terminated strings in subexpressions
+     of a conditional expression.  When NONSTR is null, point it
+     arbitrarily at one of the elements for simplicity.  */
+  tree nstrs[] = { NULL_TREE, NULL_TREE };
+  if (!nonstr)
+    nonstr = nstrs;
+
   if (TREE_CODE (src) == COND_EXPR
       && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
     {
-      tree len1, len2;
-
-      len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
-      len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
+      tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, nstrs);
+      tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, nstrs + 1);
       if (tree_int_cst_equal (len1, len2))
-	return len1;
+	{
+	  *nonstr = nstrs[0] ? nstrs[0] : nstrs[1];
+	  return len1;
+	}
     }
 
   if (TREE_CODE (src) == COMPOUND_EXPR
       && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
-    return c_strlen (TREE_OPERAND (src, 1), only_value);
+    return c_strlen (TREE_OPERAND (src, 1), only_value, nonstr);
 
   location_t loc = EXPR_LOC_OR_LOC (src, input_location);
 
   /* Offset from the beginning of the string in bytes.  */
   tree byteoff;
-  src = string_constant (src, &byteoff);
-  if (src == 0)
-    return NULL_TREE;
+  src = string_constant (src, &byteoff, nonstr);
+  if (!src)
+    {
+      /* On failure set *NONSTR to the first non-null NSTRS element
+	 if one is non-null, or to null.  */
+      *nonstr = nstrs[0] ? nstrs[0] : nstrs[1];
+      return NULL_TREE;
+    }
 
   /* Determine the size of the string element.  */
   unsigned eltsize
@@ -641,21 +659,25 @@ c_strlen (tree src, int only_value)
       if (!maxelts)
 	return ssize_int (0);
 
-      /* We don't know the starting offset, but we do know that the string
-	 has no internal zero bytes.  If the offset falls within the bounds
-	 of the string subtract the offset from the length of the string,
-	 and return that.  Otherwise the length is zero.  Take care to
-	 use SAVE_EXPR in case the OFFSET has side-effects.  */
-      tree offsave = TREE_SIDE_EFFECTS (byteoff) ? save_expr (byteoff) : byteoff;
+      /* We don't know the starting offset, but we do know that
+	 the string has no internal NUL characters.  If the byte
+	 offset falls within the bounds of the string subtract
+	 the offset from the length of the string in bytes, and
+	 return the result.  Otherwise the length is zero.  Take
+	 care to use SAVE_EXPR in case the OFFSET has side-effects.  */
+      tree offsave
+	= TREE_SIDE_EFFECTS (byteoff) ? save_expr (byteoff) : byteoff;
       offsave = fold_convert (ssizetype, offsave);
       tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
 				      build_int_cst (ssizetype, len * eltsize));
-      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
+      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize),
+				     offsave);
       return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
 			      build_zero_cst (ssizetype));
     }
 
-  /* Offset from the beginning of the string in elements.  */
+  /* Offset from the beginning of the string in elements, i.e., in
+     (possibly wide) characters.  */
   HOST_WIDE_INT eltoff;
 
   /* We have a known offset into the string.  Start searching there for
@@ -683,14 +705,11 @@ c_strlen (tree src, int only_value)
       return NULL_TREE;
     }
 
-  /* Use strlen to search for the first zero byte.  Since any strings
-     constructed with build_string will have nulls appended, we win even
-     if we get handed something like (char[4])"abcd".
-
-     Since ELTOFF is our starting index into the string, no further
-     calculation is needed.  */
+  /* Search at most STRELTS - ELTOFF characters for the first (possibly
+     wide) NUL character starting at the byte offset.  Return the length
+     of the substring.  */
   unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
-				maxelts - eltoff);
+				strelts - eltoff);
 
   return ssize_int (len);
 }
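The c_strlen change above bounds the search by STRELTS, the number of elements actually present in the literal, instead of MAXELTS. Below is a minimal standalone sketch of a bounded search over elements of 1, 2, or 4 bytes, which is the job string_length is documented to do. The helper name bounded_elt_length is hypothetical and this is not GCC's code. A result equal to the bound means that no nul was found, which is how an unterminated array shows up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the elements of ELTSIZE bytes (1, 2,
   or 4) that precede the first all-zero element in PTR, examining
   at most MAXELTS elements.  Returns MAXELTS when no zero element
   is found within the bound.  */
static size_t
bounded_elt_length (const void *ptr, size_t eltsize, size_t maxelts)
{
  assert (eltsize == 1 || eltsize == 2 || eltsize == 4);

  static const unsigned char zero[4];   /* an all-zero element */
  const unsigned char *p = (const unsigned char *) ptr;

  size_t n;
  for (n = 0; n < maxelts; ++n, p += eltsize)
    if (memcmp (p, zero, eltsize) == 0)
      break;
  return n;
}
```

For example, bounded_elt_length ("abcd", 1, 4) returns 4 even though the literal has a fifth, nul byte, because the bound stops the search first.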
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 2e0a2f9..27e6959 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -58,7 +58,7 @@ extern bool get_pointer_alignment_1 (tree, unsigned int *,
 				     unsigned HOST_WIDE_INT *);
 extern unsigned int get_pointer_alignment (tree);
 extern unsigned string_length (const void*, unsigned, unsigned);
-extern tree c_strlen (tree, int);
+extern tree c_strlen (tree, int, tree * = NULL);
 extern void expand_builtin_setjmp_setup (rtx, rtx);
 extern void expand_builtin_setjmp_receiver (rtx);
 extern void expand_builtin_update_setjmp_buf (rtx);
diff --git a/gcc/expr.c b/gcc/expr.c
index de6709d..d228af5 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11271,10 +11271,13 @@ is_aligning_offset (const_tree offset, const_tree exp)
 /* Return the tree node if an ARG corresponds to a string constant or zero
    if it doesn't.  If we return nonzero, set *PTR_OFFSET to the (possibly
    non-constant) offset in bytes within the string that ARG is accessing.
+   If NONSTR is non-null, also accept sequences of characters that
+   aren't nul-terminated strings.  In that case, if ARG refers to such
+   a sequence, set *NONSTR to its declaration; otherwise clear it.
    The type of the offset is sizetype.  */
 
 tree
-string_constant (tree arg, tree *ptr_offset)
+string_constant (tree arg, tree *ptr_offset, tree *nonstr /* = NULL */)
 {
   tree array;
   STRIP_NOPS (arg);
@@ -11328,7 +11331,7 @@ string_constant (tree arg, tree *ptr_offset)
 	return NULL_TREE;
 
       tree offset;
-      if (tree str = string_constant (arg0, &offset))
+      if (tree str = string_constant (arg0, &offset, nonstr))
 	{
 	  /* Avoid pointers to arrays (see bug 86622).  */
 	  if (POINTER_TYPE_P (TREE_TYPE (arg))
@@ -11368,6 +11371,8 @@ string_constant (tree arg, tree *ptr_offset)
   if (TREE_CODE (array) == STRING_CST)
     {
       *ptr_offset = fold_convert (sizetype, offset);
+      if (nonstr)
+	*nonstr = NULL_TREE;
       return array;
     }
 
@@ -11414,20 +11419,34 @@ string_constant (tree arg, tree *ptr_offset)
   if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
     return NULL_TREE;
 
-  /* Avoid returning a string that doesn't fit in the array
-     it is stored in, like
+  /* Avoid returning an array that has no room for a terminating
+     nul, like
      const char a[4] = "abcde";
-     but do handle those that fit even if they have excess
+     but do handle those that are strings even if they have excess
      initializers, such as in
      const char a[4] = "abc\000\000";
      The excess elements contribute to TREE_STRING_LENGTH()
      but not to strlen().  */
   unsigned HOST_WIDE_INT charsize
     = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (init))));
+  /* Compute the lower bound on the number of elements (not bytes) in
+     the array that the string is used to initialize.  The actual size
+     of the array may be greater if the string is shorter, but what
+     matters is whether the literal, including the terminating nul,
+     fits in the array.  */
+  unsigned HOST_WIDE_INT array_elts
+    = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (init))) / charsize;
+
+  /* Compute the string length in (wide) characters.  */
   unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
   length = string_length (TREE_STRING_POINTER (init), charsize,
 			  length / charsize);
-  if (compare_tree_int (array_size, length + 1) < 0)
+  /* If the caller can handle unterminated arrays (NONSTR is non-null),
+     set *NONSTR to the array if it's unterminated and clear it
+     otherwise.  If NONSTR is null, fail for an unterminated array.  */
+  if (nonstr)
+    *nonstr = array_elts > length ? NULL_TREE : array;
+  else if (array_elts <= length)
     return NULL_TREE;
 
   *ptr_offset = offset;
diff --git a/gcc/expr.h b/gcc/expr.h
index cf047d4..d4d2564 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -288,7 +288,7 @@ expand_normal (tree exp)
 
 /* Return the tree node and offset if a given argument corresponds to
    a string constant.  */
-extern tree string_constant (tree, tree *);
+extern tree string_constant (tree, tree *, tree * = NULL);
 
 /* Two different ways of generating switch statements.  */
 extern int try_casesi (tree, tree, tree, tree, rtx, rtx, rtx, profile_probability);
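The test string_constant now applies compares two values: the number of elements in the array, and the length of the literal up to its first nul. The array is terminated only if the first is greater. Below is a narrow-character sketch of that comparison, applied to the declarations the patch comments use as examples. The helper name array_is_terminated is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the check: LIT points to a narrow string
   literal of LITSIZE bytes including its terminating nul (the value
   TREE_STRING_LENGTH gives), used to initialize an array of
   ARRAY_ELTS chars.  */
static bool
array_is_terminated (size_t array_elts, const char *lit, size_t litsize)
{
  assert (lit && litsize > 0);

  /* Length of the literal up to its first nul, like strnlen.  */
  size_t length = 0;
  while (length < litsize && lit[length] != '\0')
    ++length;

  /* The array holds a nul only if it has at least one element
     past the last non-nul character of the literal.  */
  return array_elts > length;
}
```

For const char a[4] = "abc\000\000"; the literal is longer than the array, but its length up to the first nul is 3, so the array still holds a nul. For const char a[3] = "123"; it does not.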
diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
index 06a42060..f6bab7b 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -1199,9 +1199,14 @@ fold_const_call (combined_fn fn, tree type, tree arg)
   switch (fn)
     {
     case CFN_BUILT_IN_STRLEN:
-      if (const char *str = c_getstr (arg))
-	return build_int_cst (type, strlen (str));
-      return NULL_TREE;
+      {
+	tree nonstr = NULL_TREE;
+	if (const char *str = c_getstr (arg, NULL, &nonstr))
+	  if (!nonstr)
+	    return build_int_cst (type, strlen (str));
+
+	return NULL_TREE;
+      }
 
     CASE_CFN_NAN:
     CASE_FLT_FN_FLOATN_NX (CFN_BUILT_IN_NAN):
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index b318fc77..97a35f5 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -14577,24 +14577,26 @@ fold_build_pointer_plus_hwi_loc (location_t loc, tree ptr, HOST_WIDE_INT off)
 /* Return a pointer P to a NUL-terminated string representing the sequence
    of constant characters referred to by SRC (or a subsequence of such
    characters within it if SRC is a reference to a string plus some
-   constant offset).  If STRLEN is non-null, store stgrlen(P) in *STRLEN.
-   If STRSIZE is non-null, store in *STRSIZE the size of the array
-   the string is stored in; in that case, even though P points to a NUL
-   terminated string, SRC need not refer to one.  This can happen when
-   SRC refers to a constant character array initialized to all non-NUL
-   values, as in the C declaration: char a[4] = "1234";  */
+   constant offset).  If STRSIZE is non-null, store in *STRSIZE the size
+   of the literal starting at that offset, including any embedded or
+   terminating nuls.  If NONSTR is non-null, set *NONSTR to the
+   declaration of the array if it is not nul-terminated, and clear
+   it otherwise.  That can happen with valid C declarations such as:
+     const char a[3] = "123";  */
 
 const char *
-c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */,
-	  unsigned HOST_WIDE_INT *strsize /* = NULL */)
+c_getstr (tree src, unsigned HOST_WIDE_INT *strsize /* = NULL */,
+	  tree *nonstr /* = NULL */)
 {
   tree offset_node;
 
-  if (strlen)
-    *strlen = 0;
+  if (strsize)
+    *strsize = 0;
 
-  src = string_constant (src, &offset_node);
-  if (src == 0)
+  /* Set to non-null if SRC refers to an unterminated array.  */
+  tree mynonstr;
+  src = string_constant (src, &offset_node, &mynonstr);
+  if (src == NULL_TREE)
     return NULL;
 
   unsigned HOST_WIDE_INT offset = 0;
@@ -14606,47 +14608,45 @@ c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */,
 	offset = tree_to_uhwi (offset_node);
     }
 
-  /* STRING_LENGTH is the size of the string literal, including any
-     embedded NULs.  STRING_SIZE is the size of the array the string
-     literal is stored in.  */
-  unsigned HOST_WIDE_INT string_length = TREE_STRING_LENGTH (src);
-  unsigned HOST_WIDE_INT string_size = string_length;
+  /* STRING_SIZE is the size of the string literal, including any
+     embedded and trailing NULs, in bytes.  ARRAY_SIZE is the size
+     of the array the string literal is stored in, in bytes.  */
+  unsigned HOST_WIDE_INT string_size = TREE_STRING_LENGTH (src);
+  unsigned HOST_WIDE_INT array_size = string_size;
   tree type = TREE_TYPE (src);
   if (tree size = TYPE_SIZE_UNIT (type))
     if (tree_fits_shwi_p (size))
-      string_size = tree_to_uhwi (size);
+      array_size = tree_to_uhwi (size);
+
+  /* Pointer to the (possibly wide) string representation.  */
+  const char *strdata = TREE_STRING_POINTER (src);
 
-  if (strlen)
+  if (strsize)
     {
-      /* Compute and store the length of the substring at OFFSET.
+      /* Compute and store the size of the substring at OFFSET.
 	 All offsets past the initial length refer to null strings.  */
-      if (offset <= string_length)
-	*strlen = string_length - offset;
+      if (offset <= string_size)
+	*strsize = string_size - offset;
       else
-	*strlen = 0;
+	*strsize = 0;
     }
 
-  const char *string = TREE_STRING_POINTER (src);
-
-  if (string_length == 0
-      || offset >= string_size)
+  if (string_size == 0
+      || offset >= array_size)
     return NULL;
 
-  if (strsize)
-    {
-      /* Support even constant character arrays that aren't proper
-	 NUL-terminated strings.  */
-      *strsize = string_size;
-    }
-  else if (string[string_length - 1] != '\0')
+  if (nonstr)
+    *nonstr = mynonstr;
+  else if (mynonstr)
     {
-      /* Support only properly NUL-terminated strings but handle
-	 consecutive strings within the same array, such as the six
-	 substrings in "1\0002\0003".  */
+      /* When NONSTR is null, support only properly nul-terminated
+	 strings but handle consecutive strings within the same array,
+	 such as the six substrings in "1\0002\0003".  Otherwise, let
+	 the caller deal with non-nul-terminated arrays.  */
       return NULL;
     }
 
-  return offset <= string_length ? string + offset : "";
+  return offset <= string_size ? strdata + offset : "";
 }
 
 /* Given a tree T, compute which bits in T may be nonzero.  */
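With these changes, fold_const_call folds strlen only when c_getstr reports no unterminated array. The sketch below shows the same policy on a plain byte buffer: it produces a length only if a nul exists between the offset and the end of the array. Otherwise the call should be left alone, and this patch warns about it instead. The name try_fold_strlen is hypothetical. The sketch also simplifies c_getstr, which decides termination for the array as a whole rather than from the offset onward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: compute strlen (DATA + OFFSET) for a constant
   array of ARRAY_SIZE bytes without reading past its end.  Returns
   false, leaving *LEN unchanged, when OFFSET is out of bounds or no
   nul follows it within the array, i.e., when folding would be
   wrong.  */
static bool
try_fold_strlen (const char *data, size_t array_size, size_t offset,
		 size_t *len)
{
  assert (data && len);
  if (offset >= array_size)
    return false;

  const char *nul = memchr (data + offset, '\0', array_size - offset);
  if (!nul)
    return false;   /* unterminated: do not fold */

  *len = (size_t) (nul - (data + offset));
  return true;
}
```

As in the c_getstr comment, an array such as "1\0002\0003" holds several consecutive strings, and each offset into it yields a foldable length.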
diff --git a/gcc/fold-const.h b/gcc/fold-const.h
index 1b9ccc0..e3fec20 100644
--- a/gcc/fold-const.h
+++ b/gcc/fold-const.h
@@ -188,7 +188,7 @@ extern tree const_unop (enum tree_code, tree, tree);
 extern tree const_binop (enum tree_code, tree, tree, tree);
 extern bool negate_mathfn_p (combined_fn);
 extern const char *c_getstr (tree, unsigned HOST_WIDE_INT * = NULL,
-			     unsigned HOST_WIDE_INT * = NULL);
+			     tree * = NULL);
 extern wide_int tree_nonzero_bits (const_tree);
 
 /* Return OFF converted to a pointer offset type suitable as offset for
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 506a296..5c88e33 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1275,11 +1275,13 @@ gimple_fold_builtin_memset (gimple_stmt_iterator *gsi, tree c, tree len)
    Set *FLEXP to true if the range of the string lengths has been
    obtained from the upper bound of an array at the end of a struct.
    Such an array may hold a string that's longer than its upper bound
-   due to it being used as a poor-man's flexible array member.  */
+   due to it being used as a poor-man's flexible array member.
+   Set *NONSTR to the declaration of the constant array ARG refers
+   to if the array is known not to be nul-terminated.  */
 
 static bool
 get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
-		  int fuzzy, bool *flexp)
+		  int fuzzy, bool *flexp, tree *nonstr)
 {
   tree var, val = NULL_TREE;
   gimple *def_stmt;
@@ -1301,7 +1303,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 	      if (TREE_CODE (aop0) == INDIRECT_REF
 		  && TREE_CODE (TREE_OPERAND (aop0, 0)) == SSA_NAME)
 		return get_range_strlen (TREE_OPERAND (aop0, 0),
-					 length, visited, type, fuzzy, flexp);
+					 length, visited, type, fuzzy, flexp,
+					 nonstr);
 	    }
 	  else if (TREE_CODE (TREE_OPERAND (op, 0)) == COMPONENT_REF && fuzzy)
 	    {
@@ -1329,13 +1332,20 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 	    return false;
 	}
       else
-	val = c_strlen (arg, 1);
+	{
+	  /* Determine the string length.  If NONSTR is non-null, also
+	     consider non-terminated arrays.  */
+	  tree tmparr = NULL_TREE;
+	  val = c_strlen (arg, 1, nonstr ? &tmparr : NULL);
+	  if (val && tmparr)
+	    *nonstr = tmparr;
+	}
 
       if (!val && fuzzy)
 	{
 	  if (TREE_CODE (arg) == ADDR_EXPR)
 	    return get_range_strlen (TREE_OPERAND (arg, 0), length,
-				     visited, type, fuzzy, flexp);
+				     visited, type, fuzzy, flexp, nonstr);
 
 	  if (TREE_CODE (arg) == ARRAY_REF)
 	    {
@@ -1477,7 +1487,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
             || gimple_assign_unary_nop_p (def_stmt))
           {
             tree rhs = gimple_assign_rhs1 (def_stmt);
-	    return get_range_strlen (rhs, length, visited, type, fuzzy, flexp);
+	    return get_range_strlen (rhs, length, visited, type, fuzzy, flexp,
+				     nonstr);
           }
 	else if (gimple_assign_rhs_code (def_stmt) == COND_EXPR)
 	  {
@@ -1486,7 +1497,7 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
 
 	    for (unsigned int i = 0; i < 2; i++)
 	      if (!get_range_strlen (ops[i], length, visited, type, fuzzy,
-				     flexp))
+				     flexp, nonstr))
 		{
 		  if (fuzzy == 2)
 		    *maxlen = build_all_ones_cst (size_type_node);
@@ -1513,7 +1524,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
             if (arg == gimple_phi_result (def_stmt))
               continue;
 
-	    if (!get_range_strlen (arg, length, visited, type, fuzzy, flexp))
+	    if (!get_range_strlen (arg, length, visited, type, fuzzy, flexp,
+				   nonstr))
 	      {
 		if (fuzzy == 2)
 		  *maxlen = build_all_ones_cst (size_type_node);
@@ -1545,19 +1557,28 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type,
    and false if PHIs and COND_EXPRs are to be handled optimistically,
    if we can determine string length minimum and maximum; it will use
    the minimum from the ones where it can be determined.
-   STRICT false should be only used for warning code.  */
+   STRICT false should be only used for warning code.
+   When NONSTR is non-null, set *NONSTR to the declaration of
+   the constant array ARG refers to if the array is known not to
+   be nul-terminated, and clear it otherwise.  */
 
 bool
-get_range_strlen (tree arg, tree minmaxlen[2], bool strict)
+get_range_strlen (tree arg, tree minmaxlen[2], bool strict /* = false */,
+		  tree *nonstr /* = NULL */)
 {
   bitmap visited = NULL;
 
   minmaxlen[0] = NULL_TREE;
   minmaxlen[1] = NULL_TREE;
 
+  tree nonstrbuf;
+  if (!nonstr)
+    nonstr = &nonstrbuf;
+  *nonstr = NULL_TREE;
+
   bool flexarray = false;
   if (!get_range_strlen (arg, minmaxlen, &visited, 1, strict ? 1 : 2,
-			 &flexarray))
+			 &flexarray, nonstr))
     {
       minmaxlen[0] = NULL_TREE;
       minmaxlen[1] = NULL_TREE;
@@ -1576,12 +1597,15 @@ get_maxval_strlen (tree arg, int type)
   tree len[2] = { NULL_TREE, NULL_TREE };
 
   bool dummy;
-  if (!get_range_strlen (arg, len, &visited, type, 0, &dummy))
+  /* Set to non-null if ARG refers to an unterminated array.  */
+  tree nonstr = NULL_TREE;
+  if (!get_range_strlen (arg, len, &visited, type, 0, &dummy, &nonstr))
     len[1] = NULL_TREE;
   if (visited)
     BITMAP_FREE (visited);
 
-  return len[1];
+  /* Fail if the constant array isn't nul-terminated.  */
+  return nonstr ? NULL_TREE : len[1];
 }
 
 
@@ -3495,12 +3519,15 @@ static bool
 gimple_fold_builtin_strlen (gimple_stmt_iterator *gsi)
 {
   gimple *stmt = gsi_stmt (*gsi);
+  tree arg = gimple_call_arg (stmt, 0);
 
   wide_int minlen;
   wide_int maxlen;
 
+  /* Set to non-null if ARG refers to an unterminated array.  */
+  tree nonstr;
   tree lenrange[2];
-  if (!get_range_strlen (gimple_call_arg (stmt, 0), lenrange, true)
+  if (!get_range_strlen (arg, lenrange, true, &nonstr)
       && lenrange[0] && TREE_CODE (lenrange[0]) == INTEGER_CST
       && lenrange[1] && TREE_CODE (lenrange[1]) == INTEGER_CST)
     {
diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
index 04e9bfa..9bfc468 100644
--- a/gcc/gimple-fold.h
+++ b/gcc/gimple-fold.h
@@ -25,7 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 extern tree create_tmp_reg_or_ssa_name (tree, gimple *stmt = NULL);
 extern tree canonicalize_constructor_val (tree, tree);
 extern tree get_symbol_constant_value (tree);
-extern bool get_range_strlen (tree, tree[2], bool = false);
+extern bool get_range_strlen (tree, tree[2], bool = false, tree * = NULL);
 extern tree get_maxval_strlen (tree, int);
 extern void gimplify_and_update_call_from_tree (gimple_stmt_iterator *, tree);
 extern bool fold_stmt (gimple_stmt_iterator *);
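Two places have to combine what they learn about several possible arguments: c_strlen (for a COND_EXPR) and get_range_strlen (for COND_EXPR and PHI operands). The sketch below shows that combination for two operands, using a hypothetical struct in place of GCC trees. The length is known only when both operands agree. An unterminated array referenced by either operand is reported; like the NSTRS pair in c_strlen, the sketch prefers the first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for what is known about one operand: its
   string length (-1 if unknown) and the name of the unterminated
   array it refers to (NULL if none).  GCC uses trees for both.  */
struct strinfo_sketch
{
  long len;
  const char *nonstr;
};

/* Combine the results for two operands of a conditional or PHI.  */
static struct strinfo_sketch
merge_arms (struct strinfo_sketch a, struct strinfo_sketch b)
{
  struct strinfo_sketch r;

  /* A single length is known only when both operands agree.  */
  r.len = (a.len >= 0 && a.len == b.len) ? a.len : -1;

  /* Report an unterminated array from either operand, preferring
     the first one.  */
  r.nonstr = a.nonstr ? a.nonstr : b.nonstr;
  return r;
}
```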
diff --git a/gcc/testsuite/gcc.c-torture/execute/memchr-1.c b/gcc/testsuite/gcc.c-torture/execute/memchr-1.c
new file mode 100644
index 0000000..ec37632
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/memchr-1.c
@@ -0,0 +1,153 @@
+/* PR tree-optimization/86711 - wrong folding of memchr
+
+   Verify that memchr() of arrays initialized with string literals
+   where the nul doesn't fit in the array doesn't find the nul.  */
+typedef __SIZE_TYPE__  size_t;
+typedef __WCHAR_TYPE__ wchar_t;
+
+extern void* memchr (const void*, int, size_t);
+
+#define A(expr)							\
+  ((expr)							\
+   ? (void)0							\
+   : (__builtin_printf ("assertion failed on line %i: %s\n",	\
+			__LINE__, #expr),			\
+      __builtin_abort ()))
+
+static const char c = '1';
+static const char s1[1] = "1";
+static const char s4[4] = "1234";
+
+static const char s4_2[2][4] = { "1234", "5678" };
+static const char s5_3[3][5] = { "12345", "6789", "01234" };
+
+volatile int v0 = 0;
+volatile int v1 = 1;
+volatile int v2 = 2;
+volatile int v3 = 3;
+volatile int v4 = 4;
+
+void test_narrow (void)
+{
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+  int i3 = i2 + 1;
+  int i4 = i3 + 1;
+
+  A (memchr ("" + 1, 0, 0) == 0);
+
+  A (memchr (&c, 0, sizeof c) == 0);
+  A (memchr (&c + 1, 0, sizeof c - 1) == 0);
+  A (memchr (&c + i1, 0, sizeof c - i1) == 0);
+  A (memchr (&c + v1, 0, sizeof c - v1) == 0);
+
+  A (memchr (s1, 0, sizeof s1) == 0);
+  A (memchr (s1 + 1, 0, sizeof s1 - 1) == 0);
+  A (memchr (s1 + i1, 0, sizeof s1 - i1) == 0);
+  A (memchr (s1 + v1, 0, sizeof s1 - v1) == 0);
+
+  A (memchr (&s1, 0, sizeof s1) == 0);
+  A (memchr (&s1 + 1, 0, sizeof s1 - 1) == 0);
+  A (memchr (&s1 + i1, 0, sizeof s1 - i1) == 0);
+  A (memchr (&s1 + v1, 0, sizeof s1 - v1) == 0);
+
+  A (memchr (&s1[0], 0, sizeof s1) == 0);
+  A (memchr (&s1[0] + 1, 0, sizeof s1 - 1) == 0);
+  A (memchr (&s1[0] + i1, 0, sizeof s1 - i1) == 0);
+  A (memchr (&s1[0] + v1, 0, sizeof s1 - v1) == 0);
+
+  A (memchr (&s1[i0], 0, sizeof s1) == 0);
+  A (memchr (&s1[i0] + 1, 0, sizeof s1 - 1) == 0);
+  A (memchr (&s1[i0] + i1, 0, sizeof s1 - i1) == 0);
+  A (memchr (&s1[i0] + v1, 0, sizeof s1 - v1) == 0);
+
+  A (memchr (&s1[v0], 0, sizeof s1) == 0);
+  A (memchr (&s1[v0] + 1, 0, sizeof s1 - 1) == 0);
+  A (memchr (&s1[v0] + i1, 0, sizeof s1 - i1) == 0);
+  A (memchr (&s1[v0] + v1, 0, sizeof s1 - v1) == 0);
+
+
+  A (memchr (s4 + i0, 0, sizeof s4 - i0) == 0);
+  A (memchr (s4 + i1, 0, sizeof s4 - i1) == 0);
+  A (memchr (s4 + i2, 0, sizeof s4 - i2) == 0);
+  A (memchr (s4 + i3, 0, sizeof s4 - i3) == 0);
+  A (memchr (s4 + i4, 0, sizeof s4 - i4) == 0);
+
+  A (memchr (s4 + v0, 0, sizeof s4 - v0) == 0);
+  A (memchr (s4 + v1, 0, sizeof s4 - v1) == 0);
+  A (memchr (s4 + v2, 0, sizeof s4 - v2) == 0);
+  A (memchr (s4 + v3, 0, sizeof s4 - v3) == 0);
+  A (memchr (s4 + v4, 0, sizeof s4 - v4) == 0);
+
+
+  A (memchr (s4_2, 0, sizeof s4_2) == 0);
+
+  A (memchr (s4_2[0], 0, sizeof s4_2[0]) == 0);
+  A (memchr (s4_2[1], 0, sizeof s4_2[1]) == 0);
+
+  A (memchr (s4_2[0] + 1, 0, sizeof s4_2[0] - 1) == 0);
+  A (memchr (s4_2[1] + 2, 0, sizeof s4_2[1] - 2) == 0);
+  A (memchr (s4_2[1] + 3, 0, sizeof s4_2[1] - 3) == 0);
+
+  A (memchr (s4_2[v0], 0, sizeof s4_2[v0]) == 0);
+  A (memchr (s4_2[v0] + 1, 0, sizeof s4_2[v0] - 1) == 0);
+
+
+  /* The following calls must find the nul.  */
+  A (memchr ("", 0, 1) != 0);
+  A (memchr (s5_3, 0, sizeof s5_3) == &s5_3[1][4]);
+
+  A (memchr (&s5_3[0][0] + i0, 0, sizeof s5_3 - i0) == &s5_3[1][4]);
+  A (memchr (&s5_3[0][0] + i1, 0, sizeof s5_3 - i1) == &s5_3[1][4]);
+  A (memchr (&s5_3[0][0] + i2, 0, sizeof s5_3 - i2) == &s5_3[1][4]);
+  A (memchr (&s5_3[0][0] + i4, 0, sizeof s5_3 - i4) == &s5_3[1][4]);
+
+  A (memchr (&s5_3[1][i0], 0, sizeof s5_3[1] - i0) == &s5_3[1][4]);
+}
+
+static const wchar_t wc = L'1';
+static const wchar_t ws1[] = L"1";
+static const wchar_t ws4[] = L"\x00123456\x12005678\x12340078\x12345600";
+
+void test_wide (void)
+{
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+  int i3 = i2 + 1;
+  int i4 = i3 + 1;
+
+  A (memchr (L"" + 1, 0, 0) == 0);
+  A (memchr (&wc + 1, 0, 0) == 0);
+  A (memchr (L"\x12345678", 0, sizeof (wchar_t)) == 0);
+
+  const size_t nb = sizeof ws4;
+  const size_t nwb = sizeof (wchar_t);
+
+  const char *pws1 = (const char*)ws1;
+  const char *pws4 = (const char*)ws4;
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+  A (memchr (ws1, 0, sizeof ws1) == pws1 + 1);
+
+  A (memchr (&ws4[0], 0, nb) == pws4 + 3);
+  A (memchr (&ws4[1], 0, nb - 1 * nwb) == pws4 + 1 * nwb + 2);
+  A (memchr (&ws4[2], 0, nb - 2 * nwb) == pws4 + 2 * nwb + 1);
+  A (memchr (&ws4[3], 0, nb - 3 * nwb) == pws4 + 3 * nwb + 0);
+#else
+  A (memchr (ws1, 0, sizeof ws1) == pws1 + 0);
+
+  A (memchr (&ws4[0], 0, nb) == pws4 + 0);
+  A (memchr (&ws4[1], 0, nb - 1 * nwb) == pws4 + 1 * nwb + 1);
+  A (memchr (&ws4[2], 0, nb - 2 * nwb) == pws4 + 2 * nwb + 2);
+  A (memchr (&ws4[3], 0, nb - 3 * nwb) == pws4 + 3 * nwb + 3);
+#endif
+}
+
+
+int main ()
+{
+  test_narrow ();
+  test_wide ();
+}
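The expected offsets in test_wide depend on how each element of ws4 is laid out in memory. The sketch below computes the offset of the first zero byte in the in-memory representation of a 32-bit value. It assumes a 4-byte wchar_t and a host that is either little- or big-endian. For the four ws4 elements it gives 3, 2, 1, 0 on little-endian hosts and 0, 1, 2, 3 on big-endian ones. first_zero_byte and host_is_little_endian are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: byte offset of the first zero byte in the
   in-memory representation of V, or -1 if there is none.  */
static int
first_zero_byte (uint32_t v)
{
  unsigned char b[sizeof v];
  memcpy (b, &v, sizeof v);
  const unsigned char *p = memchr (b, 0, sizeof b);
  return p ? (int) (p - b) : -1;
}

/* Nonzero when the host stores the least significant byte first.  */
static int
host_is_little_endian (void)
{
  const uint32_t one = 1;
  unsigned char first;
  memcpy (&first, &one, 1);
  return first == 1;
}
```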
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr86714.c b/gcc/testsuite/gcc.c-torture/execute/pr86714.c
new file mode 100644
index 0000000..3ad6852
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr86714.c
@@ -0,0 +1,26 @@
+/* PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too
+   long initializer
+
+   The excessively long initializer for a[0] is undefined but this
+   test verifies that the excess elements are not considered a part
+   of the value of the array as a matter of QoI.  */
+
+const char a[2][3] = { "1234", "xyz" };
+char b[6];
+
+void *pb = b;
+
+int main ()
+{
+   __builtin_memcpy (b, a, 4);
+   __builtin_memset (b + 4, 'a', 2);
+
+   if (b[0] != '1' || b[1] != '2' || b[2] != '3'
+       || b[3] != 'x' || b[4] != 'a' || b[5] != 'a')
+     __builtin_abort ();
+
+   if (__builtin_memcmp (pb, "123xaa", 6))
+     __builtin_abort ();
+
+   return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/strlenopt-56.c b/gcc/testsuite/gcc.dg/strlenopt-56.c
new file mode 100644
index 0000000..e0e8068
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/strlenopt-56.c
@@ -0,0 +1,93 @@
+/* PR tree-optimization/86711 - wrong folding of memchr
+
+   Verify that calls to memchr() with constant arrays initialized
+   with wide string literals are folded.
+
+   { dg-do compile }
+   { dg-options "-O1 -Wall -fdump-tree-optimized" } */
+
+#include "strlenopt.h"
+
+typedef __WCHAR_TYPE__ wchar_t;
+
+extern void* memchr (const void*, int, size_t);
+
+#define CONCAT(x, y) x ## y
+#define CAT(x, y) CONCAT (x, y)
+#define FAILNAME(name) CAT (call_ ## name ##_on_line_, __LINE__)
+
+#define FAIL(name) do {				\
+    extern void FAILNAME (name) (void);		\
+    FAILNAME (name)();				\
+  } while (0)
+
+/* Macro to emit a call to a function named
+   call_in_true_branch_not_eliminated_on_line_NNN()
+   for each call that's expected to be eliminated.  The dg-final
+   scan-tree-dump-times directive at the bottom of the test verifies
+   that no such call appears in the output.  */
+#define ELIM(expr)							\
+  if (!(expr)) FAIL (in_true_branch_not_eliminated); else (void)0
+
+#define T(s, n) ELIM (strlen (s) == n)
+
+
+static const wchar_t wc = L'1';
+static const wchar_t ws1[] = L"1";
+static const wchar_t wsx[] = L"\x12345678";
+static const wchar_t ws4[] = L"\x00123456\x12005678\x12340078\x12345600";
+
+void test_wide (void)
+{
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+  int i3 = i2 + 1;
+  int i4 = i3 + 1;
+
+  ELIM (memchr (L"" + 1, 0, 0) == 0);
+  ELIM (memchr (&wc + 1, 0, 0) == 0);
+  ELIM (memchr (L"\x12345678", 0, sizeof (wchar_t)) == 0);
+
+  const size_t nb = sizeof ws4;
+  const size_t nwb = sizeof (wchar_t);
+
+  const char *pws1 = (const char*)ws1;
+  const char *pws4 = (const char*)ws4;
+  const char *pwsx = (const char*)wsx;
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+  ELIM (memchr (ws1, 0, sizeof ws1) == pws1 + 1);
+  ELIM (memchr (wsx, 0, sizeof wsx) == pwsx + sizeof *wsx);
+
+  ELIM (memchr (&ws4[0], 0, nb) == pws4 + 3);
+  ELIM (memchr (&ws4[1], 0, nb - 1 * nwb) == pws4 + 1 * nwb + 2);
+  ELIM (memchr (&ws4[2], 0, nb - 2 * nwb) == pws4 + 2 * nwb + 1);
+  ELIM (memchr (&ws4[3], 0, nb - 3 * nwb) == pws4 + 3 * nwb + 0);
+  ELIM (memchr (&ws4[4], 0, nb - 4 * nwb) == pws4 + 4 * nwb + 0);
+
+  ELIM (memchr (&ws4[i0], 0, nb) == pws4 + 3);
+  ELIM (memchr (&ws4[i1], 0, nb - 1 * nwb) == pws4 + 1 * nwb + 2);
+  ELIM (memchr (&ws4[i2], 0, nb - 2 * nwb) == pws4 + 2 * nwb + 1);
+  ELIM (memchr (&ws4[i3], 0, nb - 3 * nwb) == pws4 + 3 * nwb + 0);
+  ELIM (memchr (&ws4[i4], 0, nb - 4 * nwb) == pws4 + 4 * nwb + 0);
+#else
+  ELIM (memchr (ws1, 0, sizeof ws1) == pws1 + 0);
+  ELIM (memchr (wsx, 0, sizeof wsx) == pwsx + sizeof *wsx);
+
+  ELIM (memchr (&ws4[0], 0, nb) == pws4 + 0);
+  ELIM (memchr (&ws4[1], 0, nb - 1 * nwb) == pws4 + 1 * nwb + 1);
+  ELIM (memchr (&ws4[2], 0, nb - 2 * nwb) == pws4 + 2 * nwb + 2);
+  ELIM (memchr (&ws4[3], 0, nb - 3 * nwb) == pws4 + 3 * nwb + 3);
+  ELIM (memchr (&ws4[4], 0, nb - 4 * nwb) == pws4 + 4 * nwb + 0);
+
+  ELIM (memchr (&ws4[i0], 0, nb) == pws4 + 0);
+  ELIM (memchr (&ws4[i1], 0, nb - 1 * nwb) == pws4 + 1 * nwb + 1);
+  ELIM (memchr (&ws4[i2], 0, nb - 2 * nwb) == pws4 + 2 * nwb + 2);
+  ELIM (memchr (&ws4[i3], 0, nb - 3 * nwb) == pws4 + 3 * nwb + 3);
+  ELIM (memchr (&ws4[i4], 0, nb - 4 * nwb) == pws4 + 4 * nwb + 0);
+#endif
+}
+
+/* { dg-final { scan-tree-dump-times "memchr" 0 "optimized" } }
+   { dg-final { scan-tree-dump-times "call_in_true_branch_not_eliminated" 0 "optimized" } } */


* [PATCH 3/6] detect unterminated const arrays in strcpy calls (PR 86552)
From: Martin Sebor @ 2018-08-13 21:27 UTC
  To: Gcc Patch List, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 108 bytes --]

The attached changes implement the detection of past-the-end reads
by strcpy due to unterminated arguments.
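For context: a strcpy call whose source is declared as const char a[3] = "123"; reads past the end of a, and that is the kind of read the new warning targets. Code that really needs to copy such an array can bound the read by the array size. Below is a sketch using a hypothetical copy_bounded helper, which is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the string held in SRC, an array of
   SRCSIZE bytes that may lack a terminating nul, into DST of DSTSIZE
   bytes.  Unlike strcpy it never reads past SRC + SRCSIZE, and it
   always nul-terminates DST, truncating if necessary.  Returns the
   number of characters copied, not counting the nul.  */
static size_t
copy_bounded (char *dst, size_t dstsize, const char *src, size_t srcsize)
{
  assert (dst && src && dstsize > 0);

  /* Find the nul, if any, within the bounds of the source array.  */
  const char *nul = memchr (src, '\0', srcsize);
  size_t n = nul ? (size_t) (nul - src) : srcsize;

  if (n > dstsize - 1)
    n = dstsize - 1;   /* leave room for the terminating nul */

  memcpy (dst, src, n);
  dst[n] = '\0';
  return n;
}
```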

[-- Attachment #2: gcc-86552-3.diff --]
[-- Type: text/x-patch, Size: 16724 bytes --]

PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays

gcc/ChangeLog:

	* builtins.c (unterminated_array): New.
	(expand_builtin_strcpy): Adjust.
	(expand_builtin_strcpy_args): Detect unterminated arrays.
	* gimple-fold.c (get_maxval_strlen): Add argument.  Detect
	unterminated arrays.
	(gimple_fold_builtin_strcpy): Detect unterminated arrays.
	* gimple-fold.h (get_maxval_strlen): Add argument.

gcc/testsuite/ChangeLog:

	* gcc.dg/warn-strcpy-no-nul.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 78ced93..a77f25c 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -132,7 +132,7 @@ static rtx expand_builtin_mempcpy (tree, rtx);
 static rtx expand_builtin_mempcpy_args (tree, tree, tree, rtx, tree, int);
 static rtx expand_builtin_strcat (tree, rtx);
 static rtx expand_builtin_strcpy (tree, rtx);
-static rtx expand_builtin_strcpy_args (tree, tree, rtx);
+static rtx expand_builtin_strcpy_args (tree, tree, tree, rtx);
 static rtx expand_builtin_stpcpy (tree, rtx, machine_mode);
 static rtx expand_builtin_stpncpy (tree, rtx);
 static rtx expand_builtin_strncat (tree, rtx);
@@ -580,6 +580,34 @@ warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
     inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared here");
 }
 
+/* If EXP refers to an unterminated constant character array return
+   the declaration of the object of which the array is a member or
+   element.  Otherwise return null.  */
+
+static tree
+unterminated_array (tree exp)
+{
+  if (TREE_CODE (exp) == SSA_NAME)
+    {
+      gimple *stmt = SSA_NAME_DEF_STMT (exp);
+      if (!is_gimple_assign (stmt))
+	return NULL_TREE;
+
+      tree rhs1 = gimple_assign_rhs1 (stmt);
+      tree_code code = gimple_assign_rhs_code (stmt);
+      if (code != POINTER_PLUS_EXPR)
+	return NULL_TREE;
+
+      exp = rhs1;
+    }
+
+  tree nonstr;
+  if (c_strlen (exp, 1, &nonstr) && nonstr)
+    return nonstr;
+
+  return NULL_TREE;
+}
+
 /* Compute the length of a null-terminated character string or wide
    character string handling character sizes of 1, 2, and 4 bytes.
    TREE_STRING_LENGTH is not the right way because it evaluates to
@@ -3902,7 +3930,7 @@ expand_builtin_strcpy (tree exp, rtx target)
 		    src, destsize);
     }
 
-  if (rtx ret = expand_builtin_strcpy_args (dest, src, target))
+  if (rtx ret = expand_builtin_strcpy_args (exp, dest, src, target))
     {
       /* Check to see if the argument was declared attribute nonstring
 	 and if so, issue a warning since at this point it's not known
@@ -3922,8 +3950,17 @@ expand_builtin_strcpy (tree exp, rtx target)
    expand_builtin_strcpy.  */
 
 static rtx
-expand_builtin_strcpy_args (tree dest, tree src, rtx target)
+expand_builtin_strcpy_args (tree exp, tree dest, tree src, rtx target)
 {
+  /* Detect strcpy calls with unterminated arrays.  */
+  if (tree nonstr = unterminated_array (src))
+    {
+      /* NONSTR refers to the unterminated constant array.  */
+      if (!TREE_NO_WARNING (exp))
+	warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, nonstr);
+      return NULL_RTX;
+    }
+
   return expand_movstr (dest, src, target, /*endp=*/0);
 }
 
@@ -3983,7 +4020,7 @@ expand_builtin_stpcpy_1 (tree exp, rtx target, machine_mode mode)
 
 	  if (CONST_INT_P (len_rtx))
 	    {
-	      ret = expand_builtin_strcpy_args (dst, src, target);
+	      ret = expand_builtin_strcpy_args (exp, dst, src, target);
 
 	      if (ret)
 		{
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 5c88e33..3fb8d85 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1591,21 +1591,30 @@ get_range_strlen (tree arg, tree minmaxlen[2], bool strict /* = false */,
 }
 
 tree
-get_maxval_strlen (tree arg, int type)
+get_maxval_strlen (tree arg, int type, tree *nonstr /* = NULL */)
 {
   bitmap visited = NULL;
   tree len[2] = { NULL_TREE, NULL_TREE };
 
   bool dummy;
   /* Set to non-null if ARG refers to an unterminated array.  */
-  tree nonstr = NULL_TREE;
-  if (!get_range_strlen (arg, len, &visited, type, 0, &dummy, &nonstr))
+  tree mynonstr = NULL_TREE;
+  if (!get_range_strlen (arg, len, &visited, type, 0, &dummy, &mynonstr))
     len[1] = NULL_TREE;
   if (visited)
     BITMAP_FREE (visited);
 
+  if (nonstr)
+    {
+      /* For callers prepared to handle unterminated arrays, set
+       *NONSTR to point to the declaration of the array and return
+       the maximum length/size.  */
+      *nonstr = mynonstr;
+      return len[1];
+    }
+
   /* Fail if the constant array isn't nul-terminated.  */
-  return nonstr ? NULL_TREE : len[1];
+  return mynonstr ? NULL_TREE : len[1];
 }
 
 
@@ -1648,10 +1657,21 @@ gimple_fold_builtin_strcpy (gimple_stmt_iterator *gsi,
   if (!fn)
     return false;
 
-  tree len = get_maxval_strlen (src, 0);
+  /* Set to non-null if SRC refers to an unterminated array.  */
+  tree nonstr;
+  tree len = get_maxval_strlen (src, 0, &nonstr);
   if (!len)
     return false;
 
+  if (nonstr)
+    {
+      /* Avoid folding calls with unterminated arrays.  */
+      if (!gimple_no_warning_p (stmt))
+	warn_string_no_nul (loc, NULL_TREE, gimple_call_fndecl (stmt), nonstr);
+      gimple_set_no_warning (stmt, true);
+      return false;
+    }
+
   len = fold_convert_loc (loc, size_type_node, len);
   len = size_binop_loc (loc, PLUS_EXPR, len, build_int_cst (size_type_node, 1));
   len = force_gimple_operand_gsi (gsi, len, true,
diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h
index 9bfc468..9ffe75bf 100644
--- a/gcc/gimple-fold.h
+++ b/gcc/gimple-fold.h
@@ -26,7 +26,7 @@ extern tree create_tmp_reg_or_ssa_name (tree, gimple *stmt = NULL);
 extern tree canonicalize_constructor_val (tree, tree);
 extern tree get_symbol_constant_value (tree);
 extern bool get_range_strlen (tree, tree[2], bool = false, tree * = NULL);
-extern tree get_maxval_strlen (tree, int);
+extern tree get_maxval_strlen (tree, int, tree * = NULL);
 extern void gimplify_and_update_call_from_tree (gimple_stmt_iterator *, tree);
 extern bool fold_stmt (gimple_stmt_iterator *);
 extern bool fold_stmt (gimple_stmt_iterator *, tree (*) (tree));
diff --git a/gcc/testsuite/gcc.dg/warn-strcpy-no-nul.c b/gcc/testsuite/gcc.dg/warn-strcpy-no-nul.c
new file mode 100644
index 0000000..b06ec52
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/warn-strcpy-no-nul.c
@@ -0,0 +1,324 @@
+/* PR tree-optimization/86552 - missing warning for reading past the end
+   of non-string arrays
+   { dg-do compile }
+   { dg-options "-O2 -Wall -Wno-array-bounds -ftrack-macro-expansion=0" } */
+
+extern char* strcpy (char*, const char*);
+
+const char a[5] = "12345";   /* { dg-message "declared here" } */
+
+int v0 = 0;
+int v1 = 1;
+int v2 = 1;
+int v3 = 1;
+
+void sink (char*, ...);
+
+#define T(str) sink (strcpy (d, str))
+
+void test_one_dim_array (char *d)
+{
+  T (a);                /* { dg-warning "argument missing terminating nul" } */
+  T (&a[0]);            /* { dg-warning "nul" } */
+  T (&a[0] + 1);        /* { dg-warning "nul" } */
+  T (&a[1]);            /* { dg-warning "nul" } */
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+
+  T (&a[i0]);           /* { dg-warning "nul" } */
+  T (&a[i0] + 1);       /* { dg-warning "nul" } */
+  T (&a[i1]);           /* { dg-warning "nul" } */
+
+  T (&a[v0]);           /* { dg-warning "nul" } */
+  T (&a[v0] + 1);       /* { dg-warning "nul" } */
+  T (&a[v0] + v1);      /* { dg-warning "nul" } */
+}
+
+const char b[][5] = { /* { dg-message "declared here" } */
+  "12", "123", "1234", "54321"
+};
+
+void test_two_dim_array (char *d)
+{
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+  int i3 = i2 + 1;
+
+  T (b[0]);
+  T (b[1]);
+  T (b[2]);
+  T (b[3]);             /* { dg-warning "nul" } */
+  T (b[i0]);
+  T (b[i1]);
+  T (b[i2]);
+  T (b[i3]);            /* { dg-warning "nul" } */
+  T (b[v0]);
+  T (b[v3]);
+
+  T (&b[2][1]);
+  T (&b[2][1] + 1);
+  T (&b[2][v0]);
+  T (&b[2][1] + v0);
+
+  T (&b[i2][i1]);
+  T (&b[i2][i1] + i1);
+  T (&b[i2][v0]);
+  T (&b[i2][i1] + v0);
+
+  T (&b[3][1]);         /* { dg-warning "nul" } */
+  T (&b[3][1] + 1);     /* { dg-warning "nul" } */
+  T (&b[3][v0]);        /* { dg-warning "nul" } */
+  T (&b[3][1] + v0);    /* { dg-warning "nul" } */
+  T (&b[3][v0] + v1);   /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+
+  T (&b[i3][i1]);       /* { dg-warning "nul" } */
+  T (&b[i3][i1] + i1);  /* { dg-warning "nul" } */
+  T (&b[i3][v0]);       /* { dg-warning "nul" } */
+  T (&b[i3][i1] + v0);  /* { dg-warning "nul" } */
+  T (&b[i3][v0] + v1);  /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+
+  T (v0 ? "" : b[0]);
+  T (v0 ? "" : b[1]);
+  T (v0 ? "" : b[2]);
+  T (v0 ? "" : b[3]);               /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+  T (v0 ? b[0] : "");
+  T (v0 ? b[1] : "");
+  T (v0 ? b[2] : "");
+  T (v0 ? b[3] : "");               /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+
+  T (v0 ? "1234" : b[3]);           /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+  T (v0 ? b[3] : "1234");           /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+
+  T (v0 ? a : b[3]);                /* { dg-warning "nul" } */
+  T (v0 ? b[0] : b[2]);
+  T (v0 ? b[2] : b[3]);             /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+  T (v0 ? b[3] : b[2]);             /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+
+  T (v0 ? b[0] : &b[3][0] + 1);     /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+  T (v0 ? b[1] : &b[3][1] + v0);    /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+
+  /* It's possible to detect the missing nul in the following two
+     expressions but GCC doesn't do it yet.  */
+  T (v0 ? &b[3][1] + v0 : b[2]);    /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+  T (v0 ? &b[3][v0] : &b[3][v1]);   /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+}
+
+struct A { char a[5], b[5]; };
+
+const struct A s = { "1234", "12345" };
+
+void test_struct_member (char *d)
+{
+  int i0 = 0;
+  int i1 = i0 + 1;
+
+  T (s.a);
+  T (&s.a[0]);
+  T (&s.a[0] + 1);
+  T (&s.a[0] + i0);
+  T (&s.a[1]);
+  T (&s.a[1] + 1);
+  T (&s.a[1] + i0);
+
+  T (&s.a[i0]);
+  T (&s.a[i0] + 1);
+  T (&s.a[i0] + v0);
+  T (&s.a[i1]);
+  T (&s.a[i1] + 1);
+  T (&s.a[i1] + v0);
+
+  T (s.a);
+  T (&s.a[0]);
+  T (&s.a[0] + 1);
+  T (&s.a[0] + v0);
+  T (&s.a[1]);
+  T (&s.a[1] + 1);
+  T (&s.a[1] + v0);
+
+  T (&s.a[i0]);
+  T (&s.a[i0] + 1);
+  T (&s.a[i0] + v0);
+  T (&s.a[i1]);
+  T (&s.a[i1] + 1);
+  T (&s.a[i1] + v0);
+
+  T (&s.a[v0]);
+  T (&s.a[v0] + 1);
+  T (&s.a[v0] + v0);
+  T (&s.a[v1]);
+  T (&s.a[v1] + 1);
+  T (&s.a[v1] + v0);
+
+  T (s.b);              /* { dg-warning "nul" } */
+  T (&s.b[0]);          /* { dg-warning "nul" } */
+  T (&s.b[0] + 1);      /* { dg-warning "nul" } */
+  T (&s.b[0] + i0);     /* { dg-warning "nul" } */
+  T (&s.b[1]);          /* { dg-warning "nul" } */
+  T (&s.b[1] + 1);      /* { dg-warning "nul" } */
+  T (&s.b[1] + i0);     /* { dg-warning "nul" } */
+
+  T (s.b);              /* { dg-warning "nul" } */
+  T (&s.b[0]);          /* { dg-warning "nul" } */
+  T (&s.b[0] + 1);      /* { dg-warning "nul" } */
+  T (&s.b[0] + v0);     /* { dg-warning "nul" } */
+  T (&s.b[1]);          /* { dg-warning "nul" } */
+  T (&s.b[1] + 1);      /* { dg-warning "nul" } */
+  T (&s.b[1] + v0);     /* { dg-warning "nul" } */
+
+  T (s.b);              /* { dg-warning "nul" } */
+  T (&s.b[v0]);         /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+  T (&s.b[v0] + 1);     /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+  T (&s.b[v0] + v0);    /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+  T (&s.b[v1]);         /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+  T (&s.b[v1] + 1);     /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+  T (&s.b[v1] + v0);    /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+}
+
+struct B { struct A a[2]; };
+
+const struct B ba[] = {
+  { { { "123", "12345" }, { "12345", "123" } } },
+  { { { "12345", "123" }, { "123", "12345" } } },
+  { { { "1", "12" },      { "123", "1234" } } },
+  { { { "123", "1234" },  { "12345", "12" } } }
+};
+
+void test_array_of_structs (char *d)
+{
+  T (ba[0].a[0].a);
+  T (&ba[0].a[0].a[0]);
+  T (&ba[0].a[0].a[0] + 1);
+  T (&ba[0].a[0].a[0] + v0);
+  T (&ba[0].a[0].a[1]);
+  T (&ba[0].a[0].a[1] + 1);
+  T (&ba[0].a[0].a[1] + v0);
+
+  T (ba[0].a[0].b);           /* { dg-warning "nul" } */
+  T (&ba[0].a[0].b[0]);       /* { dg-warning "nul" } */
+  T (&ba[0].a[0].b[0] + 1);   /* { dg-warning "nul" } */
+  T (&ba[0].a[0].b[0] + v0);  /* { dg-warning "nul" } */
+  T (&ba[0].a[0].b[1]);       /* { dg-warning "nul" } */
+  T (&ba[0].a[0].b[1] + 1);   /* { dg-warning "nul" } */
+  T (&ba[0].a[0].b[1] + v0);  /* { dg-warning "nul" } */
+
+  T (ba[0].a[1].a);           /* { dg-warning "nul" } */
+  T (&ba[0].a[1].a[0]);       /* { dg-warning "nul" } */
+  T (&ba[0].a[1].a[0] + 1);   /* { dg-warning "nul" } */
+  T (&ba[0].a[1].a[0] + v0);  /* { dg-warning "nul" } */
+  T (&ba[0].a[1].a[1]);       /* { dg-warning "nul" } */
+  T (&ba[0].a[1].a[1] + 1);   /* { dg-warning "nul" } */
+  T (&ba[0].a[1].a[1] + v0);  /* { dg-warning "nul" } */
+
+  T (ba[0].a[1].b);
+  T (&ba[0].a[1].b[0]);
+  T (&ba[0].a[1].b[0] + 1);
+  T (&ba[0].a[1].b[0] + v0);
+  T (&ba[0].a[1].b[1]);
+  T (&ba[0].a[1].b[1] + 1);
+  T (&ba[0].a[1].b[1] + v0);
+
+
+  T (ba[1].a[0].a);           /* { dg-warning "nul" } */
+  T (&ba[1].a[0].a[0]);       /* { dg-warning "nul" } */
+  T (&ba[1].a[0].a[0] + 1);   /* { dg-warning "nul" } */
+  T (&ba[1].a[0].a[0] + v0);  /* { dg-warning "nul" } */
+  T (&ba[1].a[0].a[1]);       /* { dg-warning "nul" } */
+  T (&ba[1].a[0].a[1] + 1);   /* { dg-warning "nul" } */
+  T (&ba[1].a[0].a[1] + v0);  /* { dg-warning "nul" } */
+
+  T (ba[1].a[0].b);
+  T (&ba[1].a[0].b[0]);
+  T (&ba[1].a[0].b[0] + 1);
+  T (&ba[1].a[0].b[0] + v0);
+  T (&ba[1].a[0].b[1]);
+  T (&ba[1].a[0].b[1] + 1);
+  T (&ba[1].a[0].b[1] + v0);
+
+  T (ba[1].a[1].a);
+  T (&ba[1].a[1].a[0]);
+  T (&ba[1].a[1].a[0] + 1);
+  T (&ba[1].a[1].a[0] + v0);
+  T (&ba[1].a[1].a[1]);
+  T (&ba[1].a[1].a[1] + 1);
+  T (&ba[1].a[1].a[1] + v0);
+
+  T (ba[1].a[1].b);           /* { dg-warning "nul" } */
+  T (&ba[1].a[1].b[0]);       /* { dg-warning "nul" } */
+  T (&ba[1].a[1].b[0] + 1);   /* { dg-warning "nul" } */
+  T (&ba[1].a[1].b[0] + v0);  /* { dg-warning "nul" } */
+  T (&ba[1].a[1].b[1]);       /* { dg-warning "nul" } */
+  T (&ba[1].a[1].b[1] + 1);   /* { dg-warning "nul" } */
+  T (&ba[1].a[1].b[1] + v0);  /* { dg-warning "nul" } */
+
+
+  T (ba[2].a[0].a);
+  T (&ba[2].a[0].a[0]);
+  T (&ba[2].a[0].a[0] + 1);
+  T (&ba[2].a[0].a[0] + v0);
+  T (&ba[2].a[0].a[1]);
+  T (&ba[2].a[0].a[1] + 1);
+  T (&ba[2].a[0].a[1] + v0);
+
+  T (ba[2].a[0].b);
+  T (&ba[2].a[0].b[0]);
+  T (&ba[2].a[0].b[0] + 1);
+  T (&ba[2].a[0].b[0] + v0);
+  T (&ba[2].a[0].b[1]);
+  T (&ba[2].a[0].b[1] + 1);
+  T (&ba[2].a[0].b[1] + v0);
+
+  T (ba[2].a[1].a);
+  T (&ba[2].a[1].a[0]);
+  T (&ba[2].a[1].a[0] + 1);
+  T (&ba[2].a[1].a[0] + v0);
+  T (&ba[2].a[1].a[1]);
+  T (&ba[2].a[1].a[1] + 1);
+  T (&ba[2].a[1].a[1] + v0);
+
+
+  T (ba[3].a[0].a);
+  T (&ba[3].a[0].a[0]);
+  T (&ba[3].a[0].a[0] + 1);
+  T (&ba[3].a[0].a[0] + v0);
+  T (&ba[3].a[0].a[1]);
+  T (&ba[3].a[0].a[1] + 1);
+  T (&ba[3].a[0].a[1] + v0);
+
+  T (ba[3].a[0].b);
+  T (&ba[3].a[0].b[0]);
+  T (&ba[3].a[0].b[0] + 1);
+  T (&ba[3].a[0].b[0] + v0);
+  T (&ba[3].a[0].b[1]);
+  T (&ba[3].a[0].b[1] + 1);
+  T (&ba[3].a[0].b[1] + v0);
+
+  T (ba[3].a[1].a);           /* { dg-warning "nul" } */
+  T (&ba[3].a[1].a[0]);       /* { dg-warning "nul" } */
+  T (&ba[3].a[1].a[0] + 1);   /* { dg-warning "nul" } */
+  T (&ba[3].a[1].a[0] + v0);  /* { dg-warning "nul" } */
+  T (&ba[3].a[1].a[1]);       /* { dg-warning "nul" } */
+  T (&ba[3].a[1].a[1] + 1);   /* { dg-warning "nul" } */
+  T (&ba[3].a[1].a[1] + v0);  /* { dg-warning "nul" } */
+
+  T (ba[3].a[1].b);
+  T (&ba[3].a[1].b[0]);
+  T (&ba[3].a[1].b[0] + 1);
+  T (&ba[3].a[1].b[0] + v0);
+  T (&ba[3].a[1].b[1]);
+  T (&ba[3].a[1].b[1] + 1);
+  T (&ba[3].a[1].b[1] + v0);
+
+
+  T (v0 ? ba[0].a[0].a : ba[0].a[0].b);           /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+  T (v0 ? ba[0].a[0].a : ba[0].a[0].b);           /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+
+  T (v0 ? &ba[0].a[0].a[0] : &ba[3].a[1].a[0]);   /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+  T (v0 ? &ba[3].a[1].a[1] :  ba[0].a[0].a);      /* { dg-warning "nul" "bug ???" { xfail *-*-* } } */
+
+  T (v0 ? ba[0].a[0].a : ba[0].a[1].b);
+  T (v0 ? ba[0].a[1].b : ba[0].a[0].a);
+}
+
+/* { dg-prune-output " reading \[1-9\]\[0-9\]? bytes from a region " } */
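
For illustration only, here is a minimal standalone C sketch (not GCC code) of the contract that get_maxval_strlen gets from this patch. All names in it (struct const_array, maxval_strlen, fold_strcpy, NO_LENGTH) are hypothetical: a const_array stands in for a constant array declaration whose contents the compiler can see, NO_LENGTH plays the role of NULL_TREE, and fold_strcpy imitates the new check in gimple_fold_builtin_strcpy that warns and declines to fold when the source is unterminated. The sketch assumes the whole array is known and ignores offsets into it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a constant character array declaration:
   its name, its contents, and its declared size in bytes.  */
struct const_array
{
  const char *name;
  const char *data;
  size_t size;
};

/* Plays the role of NULL_TREE: "no usable length".  */
#define NO_LENGTH ((size_t) -1)

/* Sketch of the get_maxval_strlen contract after the patch.  When
   NONSTR is null, an unterminated array makes the call fail, as
   before.  When NONSTR is non-null, *NONSTR is set to the array if
   it is unterminated (or to null otherwise) and the length bound is
   returned either way.  */
static size_t
maxval_strlen (const struct const_array *arr,
               const struct const_array **nonstr)
{
  const char *nul = memchr (arr->data, '\0', arr->size);
  size_t len = nul ? (size_t) (nul - arr->data) : arr->size;

  if (nonstr)
    {
      *nonstr = nul ? NULL : arr;
      return len;
    }

  return nul ? len : NO_LENGTH;
}

/* Imitates gimple_fold_builtin_strcpy: fold strcpy into a copy of
   LEN + 1 bytes only when the source is known to be nul-terminated;
   otherwise warn and leave the call alone.  Returns 1 when folded.  */
static int
fold_strcpy (const struct const_array *src, size_t *copy_size)
{
  const struct const_array *nonstr;
  size_t len = maxval_strlen (src, &nonstr);

  /* Kept for parity with the real code, where the length may be
     unknown; this sketch never takes the branch.  */
  if (len == NO_LENGTH)
    return 0;

  if (nonstr)
    {
      fprintf (stderr, "warning: '%s' argument missing terminating nul\n",
               nonstr->name);
      return 0;
    }

  *copy_size = len + 1;
  return 1;
}
```

With the test's declarations, a[5] = "12345" and b[3] = "54321" are unterminated while b[2] = "1234" is not, so only the latter folds (to a 5-byte copy). Callers that don't pass the out-argument keep the old fail-on-unterminated behavior, which is presumably why the new parameter defaults to NULL.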

* [PATCH 4/6] detect unterminated const arrays in sprintf calls (PR 86552)
  2018-08-13 21:23   ` [PATCH 0/6] improve handling of char arrays with missing nul (PR 86552, 86711, 86714) Martin Sebor
  2018-08-13 21:25     ` [PATCH 1/6] prevent folding of unterminated const arrays in memchr calls (PR " Martin Sebor
  2018-08-13 21:27     ` [PATCH 3/6] detect unterminated const arrays in strcpy calls (PR 86552) Martin Sebor
@ 2018-08-13 21:28     ` Martin Sebor
  2018-08-30 22:55       ` Jeff Law
  2018-08-13 21:29     ` [PATCH 5/6] detect unterminated const arrays in stpcpy " Martin Sebor
                       ` (3 subsequent siblings)
  6 siblings, 1 reply; 53+ messages in thread
From: Martin Sebor @ 2018-08-13 21:28 UTC
  To: Gcc Patch List, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 150 bytes --]

The attached changes detect past-the-end reads by the sprintf family
of functions when the argument to a %s directive is an unterminated
constant array.
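
A minimal sketch of the rule the patch applies to %s arguments, written as a standalone C function rather than GCC internals. The function warn_unterminated_s and its parameters are hypothetical. It assumes the minimum length computed for an unterminated array equals its size, and it uses -1 to mean "no precision"; under those assumptions it mirrors the test slen.range.min < (unsigned HOST_WIDE_INT)dir.prec[0] in format_string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the check the patch adds for %s directives.
   UNTERMINATED is true when the argument may refer to a constant
   array with no terminating nul; MIN_LEN is the smallest number of
   bytes the argument is known to provide (assumed here to be the size
   of the shortest such array); PREC_MIN is the lower bound of the
   directive's precision, or -1 when it has none.  */
static bool
warn_unterminated_s (bool unterminated, size_t min_len, long prec_min)
{
  if (!unterminated)
    return false;

  /* Without a precision, %s reads until it finds a nul, which the
     array doesn't have.  */
  if (prec_min < 0)
    return true;

  /* With a precision, warn only if even its lower bound lets the read
     run past the end of the array.  */
  return min_len < (size_t) prec_min;
}
```

For the conditional s0 = i0 < 0 ? a : b in the test below, the shorter array (a, 5 bytes) determines MIN_LEN, so "%.5s" stays quiet while "%.6s" warns even though b alone would be fine.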

[-- Attachment #2: gcc-86552-4.diff --]
[-- Type: text/x-patch, Size: 18501 bytes --]

PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays

gcc/ChangeLog:

	* gimple-ssa-sprintf.c (struct fmtresult): Add new member and
	initialize it.
	(get_string_length): Detect unterminated arrays.
	(format_string): Same.
	(format_directive): Warn about unterminated arrays.

gcc/testsuite/ChangeLog:

	* gcc.dg/warn-sprintf-no-nul.c: New test.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index c652c55..95ab692 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -648,7 +648,7 @@ struct fmtresult
   /* Construct a FMTRESULT object with all counters initialized
      to MIN.  KNOWNRANGE is set when MIN is valid.  */
   fmtresult (unsigned HOST_WIDE_INT min = HOST_WIDE_INT_MAX)
-  : argmin (), argmax (),
+  : argmin (), argmax (), nonstr (),
     knownrange (min < HOST_WIDE_INT_MAX),
     nullp ()
   {
@@ -662,7 +662,7 @@ struct fmtresult
      KNOWNRANGE is set when both MIN and MAX are valid.   */
   fmtresult (unsigned HOST_WIDE_INT min, unsigned HOST_WIDE_INT max,
 	     unsigned HOST_WIDE_INT likely = HOST_WIDE_INT_MAX)
-  : argmin (), argmax (),
+  : argmin (), argmax (), nonstr (),
     knownrange (min < HOST_WIDE_INT_MAX && max < HOST_WIDE_INT_MAX),
     nullp ()
   {
@@ -689,6 +689,10 @@ struct fmtresult
      results in on output for an argument in the range above.  */
   result_range range;
 
+  /* Non-null when the argument of a string directive is not
+     a nul-terminated string.  */
+  tree nonstr;
+
   /* True when the range above is obtained from a known value of
      a directive's argument or its bounds and not the result of
      heuristics that depend on warning levels.  */
@@ -2129,10 +2133,12 @@ get_string_length (tree str)
   if (!str)
     return fmtresult ();
 
-  if (tree slen = c_strlen (str, 1))
+  tree arr;
+  if (tree slen = c_strlen (str, 1, &arr))
     {
       /* Simply return the length of the string.  */
       fmtresult res (tree_to_shwi (slen));
+      res.nonstr = arr;
       return res;
     }
 
@@ -2140,9 +2146,11 @@ get_string_length (tree str)
      by STR.  Strings of unknown lengths are bounded by the sizes of
      arrays that subexpressions of STR may refer to.  Pointers that
      aren't known to point any such arrays result in LENRANGE[1] set
-     to SIZE_MAX.  */
+     to SIZE_MAX.  NONSTR is set to the declaration of the constant
+     array that is known not to be nul-terminated.  */
   tree lenrange[2];
-  bool flexarray = get_range_strlen (str, lenrange);
+  tree nonstr;
+  bool flexarray = get_range_strlen (str, lenrange, false, &nonstr);
 
   if (lenrange [0] || lenrange [1])
     {
@@ -2165,6 +2173,7 @@ get_string_length (tree str)
 	max = HOST_WIDE_INT_M1U;
 
       fmtresult res (min, max);
+      res.nonstr = nonstr;
 
       /* Set RES.KNOWNRANGE to true if and only if all strings referenced
 	 by STR are known to be bounded (though not necessarily by their
@@ -2422,6 +2431,11 @@ format_string (const directive &dir, tree arg, vr_values *)
       res.range.unlikely = res.range.max;
     }
 
+  /* If the argument isn't a nul-terminated string and the number
+     of bytes on output isn't bounded by precision, set NONSTR.  */
+  if (slen.nonstr && slen.range.min < (unsigned HOST_WIDE_INT)dir.prec[0])
+    res.nonstr = slen.nonstr;
+
   /* Bump up the byte counters if WIDTH is greater.  */
   return res.adjust_for_width_or_precision (dir.width);
 }
@@ -2988,6 +3002,18 @@ format_directive (const sprintf_dom_walker::call_info &info,
 			  fmtres.range.min, fmtres.range.max);
     }
 
+  if (!warned && fmtres.nonstr)
+    {
+      warned = fmtwarn (dirloc, argloc, NULL, info.warnopt (),
+			"%<%.*s%> directive argument is not a nul-terminated "
+			"string",
+			dirlen,
+			target_to_host (hostdir, sizeof hostdir, dir.beg));
+      if (warned && DECL_P (fmtres.nonstr))
+	inform (DECL_SOURCE_LOCATION (fmtres.nonstr),
+		"referenced argument declared here");
+    }
+
   if (warned && fmtres.range.min < fmtres.range.likely
       && fmtres.range.likely < fmtres.range.max)
     inform_n (info.fmtloc, fmtres.range.likely,
@@ -4033,6 +4059,8 @@ sprintf_dom_walker::handle_gimple_call (gimple_stmt_iterator *gsi)
   format_result res = format_result ();
 
   bool success = compute_format_length (info, &res);
+  if (res.warned)
+    gimple_set_no_warning (info.callstmt, true);
 
   /* When optimizing and the printf return value optimization is enabled,
      attempt to substitute the computed result for the return value of
diff --git a/gcc/testsuite/gcc.dg/warn-sprintf-no-nul.c b/gcc/testsuite/gcc.dg/warn-sprintf-no-nul.c
new file mode 100644
index 0000000..b331bb5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/warn-sprintf-no-nul.c
@@ -0,0 +1,90 @@
+/* PR tree-optimization/86552 - missing warning for reading past the end
+   of non-string arrays
+   Exercise non-string detection in sprintf.
+   { dg-do compile }
+   { dg-options "-O2 -Wno-array-bounds -Wall -ftrack-macro-expansion=0" } */
+
+#include "range.h"
+
+typedef __WCHAR_TYPE__ wchar_t;
+
+extern int sprintf (char*, const char*, ...);
+
+extern char *dst;
+
+int i0 = 0;
+int i1 = 1;
+
+void sink (int, ...);
+
+#define CONCAT(a, b)   a ## b
+#define CAT(a, b)      CONCAT(a, b)
+
+#define T(fmt, ...)				\
+  sink (sprintf (dst, fmt, __VA_ARGS__))
+
+const char a[5] = "12345";    /* { dg-message "declared here" } */
+const char b[6] = "123456";   /* { dg-message "declared here" } */
+const char a2[][3] = {
+  "", "1", "12", "123", "123\000"   /* { dg-warning "initializer-string for array of chars is too long" } */
+};
+
+
+void test_narrow (void)
+{
+  /* Verify that precision suppresses the warning when it's no
+     greater than the size of the array.  */
+  T ("%.0s%.1s%.2s%.3s%.4s%.5s", a, a, a, a, a, a);
+
+  T ("%s", a);          /* { dg-warning ".%s. directive argument is not a nul-terminated string" } */
+  T ("%.6s", a);        /* { dg-warning ".%.6s. directive argument is not a nul-terminated string" } */
+
+  /* Exercise conditional expressions involving strings and non-strings.  */
+  const char *s0 = i0 < 0 ? a2[0] : a2[3];
+  T ("%s", s0);         /* { dg-warning ".%s. directive argument is not a nul-terminated string" } */
+  s0 = i0 < 0 ? "123456" : a2[4];
+  T ("%s", s0);         /* { dg-warning ".%s. directive argument is not a nul-terminated string" } */
+
+  const char *s1 = i0 < 0 ? a2[3] : a2[0];
+  T ("%s", s1);         /* { dg-warning ".%s. directive argument is not a nul-terminated string" } */
+
+  const char *s2 = i0 < 0 ? a2[3] : a2[4];
+  T ("%s", s2);         /* { dg-warning ".%s. directive argument is not a nul-terminated string" } */
+
+  s0 = i0 < 0 ? a : b;
+  T ("%.5s", s0);
+
+  /* Verify that the warning triggers even if precision prevents
+     reading past the end of one of the non-terminated arrays but
+     not the other.  */
+  T ("%.6s", s0);       /* { dg-warning ".%.6s. directive argument is not a nul-terminated string" } */
+
+  s0 = i0 < 0 ? b : a;
+  T ("%.7s", s0);       /* { dg-warning ".%.7s. directive argument is not a nul-terminated string" } */
+
+  /* Verify that at -Wformat-overflow=1 the lower bound of precision
+     given by a range is used to determine whether or not to warn.  */
+  int r = SR (4, 5);
+
+  T ("%.*s", r, a);
+  T ("%.*s", r, b);
+
+  r = SR (5, 6);
+  T ("%.*s", r, a);
+  T ("%.*s", r, b);
+
+  r = SR (6, 7);
+  T ("%.*s", r, a);     /* { dg-warning ".%.\\\*s. directive argument is not a nul-terminated string" } */
+  T ("%.*s", r, b);
+}
+
+
+const wchar_t wa[5] = L"12345";   /* { dg-message "declared here" } */
+
+void test_wide (void)
+{
+  T ("%.0ls%.1ls%.2ls%.3ls%.4ls%.5ls", wa, wa, wa, wa, wa, wa);
+
+  T ("%ls", wa);        /* { dg-warning ".%ls. directive argument is not a nul-terminated string" } */
+  T ("%.6ls", wa);      /* { dg-warning ".%.6ls. directive argument is not a nul-terminated string" } */
+}


* [PATCH 5/6] detect unterminated const arrays in stpcpy calls (PR 86552)
  2018-08-13 21:23   ` [PATCH 0/6] improve handling of char arrays with missing nul (PR 86552, 86711, 86714) Martin Sebor
                       ` (2 preceding siblings ...)
  2018-08-13 21:28     ` [PATCH 4/6] detect unterminated const arrays in sprintf " Martin Sebor
@ 2018-08-13 21:29     ` Martin Sebor
  2018-08-30 23:07       ` Jeff Law
  2018-09-14 18:39       ` Jeff Law
  2018-08-13 21:29     ` [PATCH 6/6] detect unterminated const arrays in strnlen " Martin Sebor
                       ` (2 subsequent siblings)
  6 siblings, 2 replies; 53+ messages in thread
From: Martin Sebor @ 2018-08-13 21:29 UTC (permalink / raw)
  To: Gcc Patch List, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 109 bytes --]

The attached changes implement the detection of past-the-end reads
by stpcpy due to unterminated arguments.
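To illustrate the class of bug this targets, here is a minimal sketch in plain C (not GCC code). It assumes the caller knows the size of the source array. `is_terminated` and `stpcpy_bounded` are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* In C (unlike C++) a string literal may exactly fill a char array; the
   terminating nul is then dropped, so A is not a string and
   stpcpy (d, a) reads past its end.  */
static const char a[5] = "12345";   /* unterminated */
static const char s[6] = "12345";   /* terminated */

/* Hypothetical helper: true if the N bytes at P contain a nul, i.e. a
   string function reading from P stays within the N-byte object.  */
static bool is_terminated (const char *p, size_t n)
{
  return memchr (p, '\0', n) != NULL;
}

/* Hypothetical bounded replacement for stpcpy (D, SRC) when SRC is an
   array of N bytes that may lack a nul: copy at most N bytes, always
   terminate D (which must have room for N + 1 bytes), and return a
   pointer to the terminating nul in D, as stpcpy does.  */
static char *stpcpy_bounded (char *d, const char *src, size_t n)
{
  assert (d != NULL && src != NULL);
  const char *nul = memchr (src, '\0', n);
  size_t len = nul ? (size_t) (nul - src) : n;
  memcpy (d, src, len);
  d[len] = '\0';
  return d + len;
}
```

Called as `stpcpy_bounded (d, a, sizeof a)`, the copy stops at the end of `a` instead of reading past it the way `stpcpy (d, a)` would.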


[-- Attachment #2: gcc-86552-5.diff --]
[-- Type: text/x-patch, Size: 15499 bytes --]

PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays

gcc/ChangeLog:

	* builtins.c (unterminated_array): Handle ARRAY_REF.
	(expand_builtin_stpcpy_1): Detect unterminated char arrays.
	* builtins.h (unterminated_array): Declare extern.
	* gimple-fold.c (gimple_fold_builtin_stpcpy): Detect unterminated
	  arrays.
	(gimple_fold_builtin_sprintf): Propagate NO_WARNING to transformed
	calls.

gcc/testsuite/ChangeLog:

	* gcc.dg/warn-stpcpy-no-nul.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index a77f25c..2f493d3 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -584,7 +584,7 @@ warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
    the declaration of the object of which the array is a member or
    element.  Otherwise return null.  */
 
-static tree
+tree
 unterminated_array (tree exp)
 {
   if (TREE_CODE (exp) == SSA_NAME)
@@ -595,7 +595,10 @@ unterminated_array (tree exp)
 
       tree rhs1 = gimple_assign_rhs1 (stmt);
       tree_code code = gimple_assign_rhs_code (stmt);
-      if (code != POINTER_PLUS_EXPR)
+      if (code == ADDR_EXPR
+	  && TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF)
+	rhs1 = rhs1;
+      else if (code != POINTER_PLUS_EXPR)
 	return NULL_TREE;
 
       exp = rhs1;
@@ -4004,9 +4007,14 @@ expand_builtin_stpcpy_1 (tree exp, rtx target, machine_mode mode)
 	 compile-time, not an expression containing a string.  This is
 	 because the latter will potentially produce pessimized code
 	 when used to produce the return value.  */
-      if (! c_getstr (src) || ! (len = c_strlen (src, 0)))
+      tree nonstr = NULL_TREE;
+      if (!c_getstr (src, NULL, &nonstr)
+	  || !(len = c_strlen (src, 0, &nonstr)))
 	return expand_movstr (dst, src, target, /*endp=*/2);
 
+      if (nonstr && !TREE_NO_WARNING (exp))
+	warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, nonstr);
+
       lenp1 = size_binop_loc (loc, PLUS_EXPR, len, ssize_int (1));
       ret = expand_builtin_mempcpy_args (dst, src, lenp1,
 					 target, exp, /*endp=*/2);
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 73b0b0b..f722dd8 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -103,6 +103,7 @@ extern bool target_char_cst_p (tree t, char *p);
 
 extern internal_fn associated_internal_fn (tree);
 extern internal_fn replacement_internal_fn (gcall *);
+extern tree unterminated_array (tree);
 extern void warn_string_no_nul (location_t, tree, tree, tree);
 extern tree max_object_size ();
 
diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 3fb8d85..70cb4ef 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -2787,7 +2787,7 @@ gimple_fold_builtin_stpcpy (gimple_stmt_iterator *gsi)
   location_t loc = gimple_location (stmt);
   tree dest = gimple_call_arg (stmt, 0);
   tree src = gimple_call_arg (stmt, 1);
-  tree fn, len, lenp1;
+  tree fn, lenp1;
 
   /* If the result is unused, replace stpcpy with strcpy.  */
   if (gimple_call_lhs (stmt) == NULL_TREE)
@@ -2800,10 +2800,25 @@ gimple_fold_builtin_stpcpy (gimple_stmt_iterator *gsi)
       return true;
     }
 
-  len = c_strlen (src, 1);
+  /* Set to non-null if ARG refers to an unterminated array.  */
+  tree nonstr;
+  tree len = c_strlen (src, 1, &nonstr);
   if (!len
       || TREE_CODE (len) != INTEGER_CST)
-    return false;
+    {
+      nonstr = unterminated_array (src);
+      if (!nonstr)
+	return false;
+    }
+
+  if (nonstr)
+    {
+      /* Avoid folding calls with unterminated arrays.  */
+      if (!gimple_no_warning_p (stmt))
+	warn_string_no_nul (loc, NULL_TREE, gimple_call_fndecl (stmt), nonstr);
+      gimple_set_no_warning (stmt, true);
+      return false;
+    }
 
   if (optimize_function_for_size_p (cfun)
       /* If length is zero it's small enough.  */
@@ -3066,6 +3081,12 @@ gimple_fold_builtin_sprintf (gimple_stmt_iterator *gsi)
 	 'format' is known to contain no % formats.  */
       gimple_seq stmts = NULL;
       gimple *repl = gimple_build_call (fn, 2, dest, fmt);
+
+      /* Propagate the NO_WARNING bit to avoid issuing the same
+	 warning more than once.  */
+      if (gimple_no_warning_p (stmt))
+	gimple_set_no_warning (repl, true);
+
       gimple_seq_add_stmt_without_update (&stmts, repl);
       if (gimple_call_lhs (stmt))
 	{
@@ -3114,6 +3135,12 @@ gimple_fold_builtin_sprintf (gimple_stmt_iterator *gsi)
       /* Convert sprintf (str1, "%s", str2) into strcpy (str1, str2).  */
       gimple_seq stmts = NULL;
       gimple *repl = gimple_build_call (fn, 2, dest, orig);
+
+      /* Propagate the NO_WARNING bit to avoid issuing the same
+	 warning more than once.  */
+      if (gimple_no_warning_p (stmt))
+	gimple_set_no_warning (repl, true);
+
       gimple_seq_add_stmt_without_update (&stmts, repl);
       if (gimple_call_lhs (stmt))
 	{
diff --git a/gcc/testsuite/gcc.dg/warn-stpcpy-no-nul.c b/gcc/testsuite/gcc.dg/warn-stpcpy-no-nul.c
new file mode 100644
index 0000000..78c4a7f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/warn-stpcpy-no-nul.c
@@ -0,0 +1,324 @@
+/* PR tree-optimization/86552 - missing warning for reading past the end
+   of non-string arrays
+   { dg-do compile }
+   { dg-options "-O2 -Wall -Wno-array-bounds -ftrack-macro-expansion=0" } */
+
+extern char* stpcpy (char*, const char*);
+
+const char a[5] = "12345";   /* { dg-message "declared here" } */
+
+int v0 = 0;
+int v1 = 1;
+int v2 = 1;
+int v3 = 1;
+
+void sink (char*, ...);
+
+#define T(str) sink (stpcpy (d, str))
+
+void test_one_dim_array (char *d)
+{
+  T (a);                /* { dg-warning "argument missing terminating nul" }  */
+  T (&a[0]);            /* { dg-warning "nul" }  */
+  T (&a[0] + 1);        /* { dg-warning "nul" }  */
+  T (&a[1]);            /* { dg-warning "nul" }  */
+
+  int i0 = 0;
+  int i1 = i0 + 1;
+
+  T (&a[i0]);           /* { dg-warning "nul" }  */
+  T (&a[i0] + 1);       /* { dg-warning "nul" }  */
+  T (&a[i1]);           /* { dg-warning "nul" }  */
+
+  T (&a[v0]);           /* { dg-warning "nul" }  */
+  T (&a[v0] + 1);       /* { dg-warning "nul" }  */
+  T (&a[v0] + v1);      /* { dg-warning "nul" }  */
+}
+
+const char b[][5] = { /* { dg-message "declared here" } */
+  "12", "123", "1234", "54321"
+};
+
+void test_two_dim_array (char *d)
+{
+  int i0 = 0;
+  int i1 = i0 + 1;
+  int i2 = i1 + 1;
+  int i3 = i2 + 1;
+
+  T (b[0]);
+  T (b[1]);
+  T (b[2]);
+  T (b[3]);             /* { dg-warning "nul" }  */
+  T (b[i0]);
+  T (b[i1]);
+  T (b[i2]);
+  T (b[i3]);            /* { dg-warning "nul" }  */
+  T (b[v0]);
+  T (b[v3]);
+
+  T (&b[2][1]);
+  T (&b[2][1] + 1);
+  T (&b[2][v0]);
+  T (&b[2][1] + v0);
+
+  T (&b[i2][i1]);
+  T (&b[i2][i1] + i1);
+  T (&b[i2][v0]);
+  T (&b[i2][i1] + v0);
+
+  T (&b[3][1]);         /* { dg-warning "nul" }  */
+  T (&b[3][1] + 1);     /* { dg-warning "nul" }  */
+  T (&b[3][v0]);        /* { dg-warning "nul" }  */
+  T (&b[3][1] + v0);    /* { dg-warning "nul" }  */
+  T (&b[3][v0] + v1);   /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+
+  T (&b[i3][i1]);       /* { dg-warning "nul" }  */
+  T (&b[i3][i1] + i1);  /* { dg-warning "nul" }  */
+  T (&b[i3][v0]);       /* { dg-warning "nul" }  */
+  T (&b[i3][i1] + v0);  /* { dg-warning "nul" }  */
+  T (&b[i3][v0] + v1);  /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+
+  T (v0 ? "" : b[0]);
+  T (v0 ? "" : b[1]);
+  T (v0 ? "" : b[2]);
+  T (v0 ? "" : b[3]);               /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+  T (v0 ? b[0] : "");
+  T (v0 ? b[1] : "");
+  T (v0 ? b[2] : "");
+  T (v0 ? b[3] : "");               /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+
+  T (v0 ? "1234" : b[3]);           /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+  T (v0 ? b[3] : "1234");           /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+
+  T (v0 ? a : b[3]);                /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+  T (v0 ? b[0] : b[2]);
+  T (v0 ? b[2] : b[3]);             /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+  T (v0 ? b[3] : b[2]);             /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+
+  T (v0 ? b[0] : &b[3][0] + 1);     /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+  T (v0 ? b[1] : &b[3][1] + v0);    /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+
+  /* It's possible to detect the missing nul in the following two
+     expressions but GCC doesn't do it yet.  */
+  T (v0 ? &b[3][1] + v0 : b[2]);    /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+  T (v0 ? &b[3][v0] : &b[3][v1]);   /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+}
+
+struct A { char a[5], b[5]; };
+
+const struct A s = { "1234", "12345" };
+
+void test_struct_member (char *d)
+{
+  int i0 = 0;
+  int i1 = i0 + 1;
+
+  T (s.a);
+  T (&s.a[0]);
+  T (&s.a[0] + 1);
+  T (&s.a[0] + i0);
+  T (&s.a[1]);
+  T (&s.a[1] + 1);
+  T (&s.a[1] + i0);
+
+  T (&s.a[i0]);
+  T (&s.a[i0] + 1);
+  T (&s.a[i0] + v0);
+  T (&s.a[i1]);
+  T (&s.a[i1] + 1);
+  T (&s.a[i1] + v0);
+
+  T (s.a);
+  T (&s.a[0]);
+  T (&s.a[0] + 1);
+  T (&s.a[0] + v0);
+  T (&s.a[1]);
+  T (&s.a[1] + 1);
+  T (&s.a[1] + v0);
+
+  T (&s.a[i0]);
+  T (&s.a[i0] + 1);
+  T (&s.a[i0] + v0);
+  T (&s.a[i1]);
+  T (&s.a[i1] + 1);
+  T (&s.a[i1] + v0);
+
+  T (&s.a[v0]);
+  T (&s.a[v0] + 1);
+  T (&s.a[v0] + v0);
+  T (&s.a[v1]);
+  T (&s.a[v1] + 1);
+  T (&s.a[v1] + v0);
+
+  T (s.b);              /* { dg-warning "nul" }  */
+  T (&s.b[0]);          /* { dg-warning "nul" }  */
+  T (&s.b[0] + 1);      /* { dg-warning "nul" }  */
+  T (&s.b[0] + i0);     /* { dg-warning "nul" }  */
+  T (&s.b[1]);          /* { dg-warning "nul" }  */
+  T (&s.b[1] + 1);      /* { dg-warning "nul" }  */
+  T (&s.b[1] + i0);     /* { dg-warning "nul" }  */
+
+  T (s.b);              /* { dg-warning "nul" }  */
+  T (&s.b[0]);          /* { dg-warning "nul" }  */
+  T (&s.b[0] + 1);      /* { dg-warning "nul" }  */
+  T (&s.b[0] + v0);     /* { dg-warning "nul" }  */
+  T (&s.b[1]);          /* { dg-warning "nul" }  */
+  T (&s.b[1] + 1);      /* { dg-warning "nul" }  */
+  T (&s.b[1] + v0);     /* { dg-warning "nul" }  */
+
+  T (s.b);              /* { dg-warning "nul" }  */
+  T (&s.b[v0]);         /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+  T (&s.b[v0] + 1);     /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+  T (&s.b[v0] + v0);    /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+  T (&s.b[v1]);         /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+  T (&s.b[v1] + 1);     /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+  T (&s.b[v1] + v0);    /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+}
+
+struct B { struct A a[2]; };
+
+const struct B ba[] = {
+  { { { "123", "12345" }, { "12345", "123" } } },
+  { { { "12345", "123" }, { "123", "12345" } } },
+  { { { "1", "12" },      { "123", "1234" } } },
+  { { { "123", "1234" },  { "12345", "12" } } }
+};
+
+void test_array_of_structs (char *d)
+{
+  T (ba[0].a[0].a);
+  T (&ba[0].a[0].a[0]);
+  T (&ba[0].a[0].a[0] + 1);
+  T (&ba[0].a[0].a[0] + v0);
+  T (&ba[0].a[0].a[1]);
+  T (&ba[0].a[0].a[1] + 1);
+  T (&ba[0].a[0].a[1] + v0);
+
+  T (ba[0].a[0].b);           /* { dg-warning "nul" }  */
+  T (&ba[0].a[0].b[0]);       /* { dg-warning "nul" }  */
+  T (&ba[0].a[0].b[0] + 1);   /* { dg-warning "nul" }  */
+  T (&ba[0].a[0].b[0] + v0);  /* { dg-warning "nul" }  */
+  T (&ba[0].a[0].b[1]);       /* { dg-warning "nul" }  */
+  T (&ba[0].a[0].b[1] + 1);   /* { dg-warning "nul" }  */
+  T (&ba[0].a[0].b[1] + v0);  /* { dg-warning "nul" }  */
+
+  T (ba[0].a[1].a);           /* { dg-warning "nul" }  */
+  T (&ba[0].a[1].a[0]);       /* { dg-warning "nul" }  */
+  T (&ba[0].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
+  T (&ba[0].a[1].a[0] + v0);  /* { dg-warning "nul" }  */
+  T (&ba[0].a[1].a[1]);       /* { dg-warning "nul" }  */
+  T (&ba[0].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
+  T (&ba[0].a[1].a[1] + v0);  /* { dg-warning "nul" }  */
+
+  T (ba[0].a[1].b);
+  T (&ba[0].a[1].b[0]);
+  T (&ba[0].a[1].b[0] + 1);
+  T (&ba[0].a[1].b[0] + v0);
+  T (&ba[0].a[1].b[1]);
+  T (&ba[0].a[1].b[1] + 1);
+  T (&ba[0].a[1].b[1] + v0);
+
+
+  T (ba[1].a[0].a);           /* { dg-warning "nul" }  */
+  T (&ba[1].a[0].a[0]);       /* { dg-warning "nul" }  */
+  T (&ba[1].a[0].a[0] + 1);   /* { dg-warning "nul" }  */
+  T (&ba[1].a[0].a[0] + v0);  /* { dg-warning "nul" }  */
+  T (&ba[1].a[0].a[1]);       /* { dg-warning "nul" }  */
+  T (&ba[1].a[0].a[1] + 1);   /* { dg-warning "nul" }  */
+  T (&ba[1].a[0].a[1] + v0);  /* { dg-warning "nul" }  */
+
+  T (ba[1].a[0].b);
+  T (&ba[1].a[0].b[0]);
+  T (&ba[1].a[0].b[0] + 1);
+  T (&ba[1].a[0].b[0] + v0);
+  T (&ba[1].a[0].b[1]);
+  T (&ba[1].a[0].b[1] + 1);
+  T (&ba[1].a[0].b[1] + v0);
+
+  T (ba[1].a[1].a);
+  T (&ba[1].a[1].a[0]);
+  T (&ba[1].a[1].a[0] + 1);
+  T (&ba[1].a[1].a[0] + v0);
+  T (&ba[1].a[1].a[1]);
+  T (&ba[1].a[1].a[1] + 1);
+  T (&ba[1].a[1].a[1] + v0);
+
+  T (ba[1].a[1].b);           /* { dg-warning "nul" }  */
+  T (&ba[1].a[1].b[0]);       /* { dg-warning "nul" }  */
+  T (&ba[1].a[1].b[0] + 1);   /* { dg-warning "nul" }  */
+  T (&ba[1].a[1].b[0] + v0);  /* { dg-warning "nul" }  */
+  T (&ba[1].a[1].b[1]);       /* { dg-warning "nul" }  */
+  T (&ba[1].a[1].b[1] + 1);   /* { dg-warning "nul" }  */
+  T (&ba[1].a[1].b[1] + v0);  /* { dg-warning "nul" }  */
+
+
+  T (ba[2].a[0].a);
+  T (&ba[2].a[0].a[0]);
+  T (&ba[2].a[0].a[0] + 1);
+  T (&ba[2].a[0].a[0] + v0);
+  T (&ba[2].a[0].a[1]);
+  T (&ba[2].a[0].a[1] + 1);
+  T (&ba[2].a[0].a[1] + v0);
+
+  T (ba[2].a[0].b);
+  T (&ba[2].a[0].b[0]);
+  T (&ba[2].a[0].b[0] + 1);
+  T (&ba[2].a[0].b[0] + v0);
+  T (&ba[2].a[0].b[1]);
+  T (&ba[2].a[0].b[1] + 1);
+  T (&ba[2].a[0].b[1] + v0);
+
+  T (ba[2].a[1].a);
+  T (&ba[2].a[1].a[0]);
+  T (&ba[2].a[1].a[0] + 1);
+  T (&ba[2].a[1].a[0] + v0);
+  T (&ba[2].a[1].a[1]);
+  T (&ba[2].a[1].a[1] + 1);
+  T (&ba[2].a[1].a[1] + v0);
+
+
+  T (ba[3].a[0].a);
+  T (&ba[3].a[0].a[0]);
+  T (&ba[3].a[0].a[0] + 1);
+  T (&ba[3].a[0].a[0] + v0);
+  T (&ba[3].a[0].a[1]);
+  T (&ba[3].a[0].a[1] + 1);
+  T (&ba[3].a[0].a[1] + v0);
+
+  T (ba[3].a[0].b);
+  T (&ba[3].a[0].b[0]);
+  T (&ba[3].a[0].b[0] + 1);
+  T (&ba[3].a[0].b[0] + v0);
+  T (&ba[3].a[0].b[1]);
+  T (&ba[3].a[0].b[1] + 1);
+  T (&ba[3].a[0].b[1] + v0);
+
+  T (ba[3].a[1].a);           /* { dg-warning "nul" }  */
+  T (&ba[3].a[1].a[0]);	      /* { dg-warning "nul" }  */
+  T (&ba[3].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
+  T (&ba[3].a[1].a[0] + v0);  /* { dg-warning "nul" }  */
+  T (&ba[3].a[1].a[1]);	      /* { dg-warning "nul" }  */
+  T (&ba[3].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
+  T (&ba[3].a[1].a[1] + v0);  /* { dg-warning "nul" }  */
+
+  T (ba[3].a[1].b);
+  T (&ba[3].a[1].b[0]);	
+  T (&ba[3].a[1].b[0] + 1);
+  T (&ba[3].a[1].b[0] + v0);
+  T (&ba[3].a[1].b[1]);	
+  T (&ba[3].a[1].b[1] + 1);
+  T (&ba[3].a[1].b[1] + v0);
+
+
+  T (v0 ? ba[0].a[0].a : ba[0].a[0].b);           /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+  T (v0 ? ba[0].a[0].a : ba[0].a[0].b);           /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+
+  T (v0 ? &ba[0].a[0].a[0] : &ba[3].a[1].a[0]);   /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+  T (v0 ? &ba[3].a[1].a[1] :  ba[0].a[0].a);      /* { dg-warning "nul" "bug ???" { xfail *-*-* } }  */
+
+  T (v0 ? ba[0].a[0].a : ba[0].a[1].b);
+  T (v0 ? ba[0].a[1].b : ba[0].a[0].a);
+}
+
+/* { dg-prune-output " reading \[1-9\]\[0-9\]? bytes from a region " } */


* [PATCH 6/6] detect unterminated const arrays in strnlen calls (PR 86552)
  2018-08-13 21:23   ` [PATCH 0/6] improve handling of char arrays with missing nul (PR 86552, 86711, 86714) Martin Sebor
                       ` (3 preceding siblings ...)
  2018-08-13 21:29     ` [PATCH 5/6] detect unterminated const arrays in stpcpy " Martin Sebor
@ 2018-08-13 21:29     ` Martin Sebor
  2018-08-30 23:25       ` Jeff Law
  2018-10-01 21:49       ` Jeff Law
  2018-08-14  3:21     ` [PATCH 2/6] detect unterminated const arrays in strlen " Martin Sebor
  2018-08-15  6:02     ` [PATCH 0/6] improve handling of char arrays with missing nul (PR 86552, 86711, 86714) Jeff Law
  6 siblings, 2 replies; 53+ messages in thread
From: Martin Sebor @ 2018-08-13 21:29 UTC (permalink / raw)
  To: Gcc Patch List, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 131 bytes --]

The attached changes implement the detection of past-the-end reads
by strnlen due to unterminated arguments and excessive bounds.
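As a rough illustration, the rule behind the new strnlen diagnostics can be modeled in plain C. This is a simplified sketch, not GCC code. When the offset into an unterminated array of SIZE bytes is a constant OFF, only SIZE - OFF bytes remain, and a larger bound "exceeds" the array. When the offset is not constant, only an upper bound on the remaining size is known, and the patch reports that the bound "may exceed" it. This mirrors the `tree_int_cst_lt (len, bound) || !exact` condition in the patch. `check_strnlen_bound` and `strnlen_clamped` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcomes, named after the two warning forms.  */
enum bound_diag { BOUND_OK, BOUND_EXCEEDS, BOUND_MAY_EXCEED };

/* Simplified model: SIZE is the size of the unterminated array,
   OFF_KNOWN says whether the offset OFF into it is a constant,
   and BOUND is the strnlen bound.  */
static enum bound_diag
check_strnlen_bound (size_t size, bool off_known, size_t off, size_t bound)
{
  if (!off_known)
    return BOUND_MAY_EXCEED;        /* remaining size is at most SIZE */
  size_t avail = off < size ? size - off : 0;
  return bound > avail ? BOUND_EXCEEDS : BOUND_OK;
}

/* Hypothetical safe alternative: clamp the bound to the AVAIL bytes
   that actually exist at P before searching for the nul.  */
static size_t strnlen_clamped (const char *p, size_t avail, size_t bound)
{
  assert (p != NULL);
  size_t n = bound < avail ? bound : avail;
  const char *nul = memchr (p, '\0', n);
  return nul ? (size_t) (nul - p) : n;
}
```

For example, with `const char a[5] = "12345"`, the call `strnlen (&a[1], sizeof a)` corresponds to `check_strnlen_bound (5, true, 1, 5)`, which yields `BOUND_EXCEEDS`. This matches the "specified bound 5 exceeds the size 4" expectation in the test below.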


[-- Attachment #2: gcc-86552-6.diff --]
[-- Type: text/x-patch, Size: 20156 bytes --]

PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays

gcc/ChangeLog:
	* builtins.c (expand_builtin_strnlen): Detect, avoid expanding,
	and diagnose unterminated arrays.

gcc/testsuite/ChangeLog:
	* gcc.dg/warn-strnlen-no-nul.c: New.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 2f493d3..46df2ea 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -582,11 +582,16 @@ warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
 
 /* If EXP refers to an unterminated constant character array return
    the declaration of the object of which the array is a member or
-   element.  Otherwise return null.  */
+   element and if SIZE is not null, set *SIZE to the size of
+   the unterminated array and set *EXACT if the size is exact or
+   clear it otherwise.  Otherwise return null.  */
 
 tree
-unterminated_array (tree exp)
+unterminated_array (tree exp, tree *size /* = NULL */, bool *exact /* = NULL */)
 {
+  /* Offset from the beginning of the array or null.  */
+  tree off = NULL_TREE;
+
   if (TREE_CODE (exp) == SSA_NAME)
     {
       gimple *stmt = SSA_NAME_DEF_STMT (exp);
@@ -595,18 +600,43 @@ unterminated_array (tree exp)
 
       tree rhs1 = gimple_assign_rhs1 (stmt);
       tree_code code = gimple_assign_rhs_code (stmt);
-      if (code == ADDR_EXPR
-	  && TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF)
-	rhs1 = rhs1;
-      else if (code != POINTER_PLUS_EXPR)
+      if ((code == ADDR_EXPR
+	   && TREE_CODE (TREE_OPERAND (rhs1, 0)) == ARRAY_REF)
+	  || code == POINTER_PLUS_EXPR)
+	{
+	  /* Store the index or offset.  */
+	  off = gimple_assign_rhs2 (stmt);
+	  exp = rhs1;
+	}
+      else
 	return NULL_TREE;
-
-      exp = rhs1;
     }
 
   tree nonstr;
-  if (c_strlen (exp, 1, &nonstr) && nonstr)
-    return nonstr;
+  tree len = c_strlen (exp, 1, &nonstr);
+  if (len && nonstr)
+    {
+      if (size)
+	{
+	  if (off)
+	    {
+	      if (TREE_CODE (off) == INTEGER_CST)
+		{
+		  /* Subtract the offset from the size of the array.  */
+		  *exact = true;
+		  off = fold_convert (ssizetype, off);
+		  len = fold_build2 (MINUS_EXPR, ssizetype, len, off);
+		}
+	      else
+		*exact = false;
+	    }
+	  else
+	    *exact = true;
+
+	  *size = len;
+	}
+      return nonstr;
+    }
 
   return NULL_TREE;
 }
@@ -3068,7 +3098,8 @@ expand_builtin_strnlen (tree exp, rtx target, machine_mode target_mode)
   tree maxobjsize = max_object_size ();
   tree func = get_callee_fndecl (exp);
 
-  tree len = c_strlen (src, 0);
+  tree nonstr;
+  tree len = c_strlen (src, 0, &nonstr);
 
   if (TREE_CODE (bound) == INTEGER_CST)
     {
@@ -3080,8 +3111,41 @@ expand_builtin_strnlen (tree exp, rtx target, machine_mode target_mode)
 			 exp, func, bound, maxobjsize))
 	  TREE_NO_WARNING (exp) = true;
 
+      bool exact = true;
       if (!len || TREE_CODE (len) != INTEGER_CST)
-	return NULL_RTX;
+	{
+	  /* Clear EXACT if LEN may be less than SRC suggests,
+	     such as in
+	       strnlen (&a[i], sizeof a)
+	     where the value of i is unknown.  Unless i's value is
+	     zero, the call is unsafe because the bound is greater. */
+	  nonstr = unterminated_array (src, &len, &exact);
+	  if (!nonstr)
+	    return NULL_RTX;
+	}
+
+      if (nonstr
+	  && !TREE_NO_WARNING (exp)
+	  && (tree_int_cst_lt (len, bound)
+	      || !exact))
+	{
+	  location_t warnloc
+	    = expansion_point_location_if_in_system_header (loc);
+
+	  if (warning_at (warnloc, OPT_Wstringop_overflow_,
+			  exact
+			  ? G_("%K%qD specified bound %E exceeds the size %E "
+			       "of unterminated array")
+			  : G_("%K%qD specified bound %E may exceed the size "
+			       "of at most %E of unterminated array"),
+			  exp, func, bound, len))
+	    {
+	      inform (DECL_SOURCE_LOCATION (nonstr),
+		      "referenced argument declared here");
+	      TREE_NO_WARNING (exp) = true;
+	      return NULL_RTX;
+	    }
+	}
 
       len = fold_convert_loc (loc, size_type_node, len);
       len = fold_build2_loc (loc, MIN_EXPR, size_type_node, len, bound);
@@ -3107,6 +3171,18 @@ expand_builtin_strnlen (tree exp, rtx target, machine_mode target_mode)
   if (!len || TREE_CODE (len) != INTEGER_CST)
     return NULL_RTX;
 
+  if (!TREE_NO_WARNING (exp)
+      && wi::ltu_p (wi::to_wide (len), min)
+      && warning_at (loc, OPT_Wstringop_overflow_,
+		     "%K%qD specified bound [%wu, %wu] "
+		     "exceeds the size %E of unterminated array",
+		     exp, func, min.to_uhwi (), max.to_uhwi (), len))
+    {
+      inform (DECL_SOURCE_LOCATION (nonstr),
+	      "referenced argument declared here");
+      TREE_NO_WARNING (exp) = true;
+    }
+
   if (wi::gtu_p (min, wi::to_wide (len)))
     return expand_expr (len, target, target_mode, EXPAND_NORMAL);
 
diff --git a/gcc/builtins.h b/gcc/builtins.h
index f722dd8..c55fa6b 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -103,7 +103,7 @@ extern bool target_char_cst_p (tree t, char *p);
 
 extern internal_fn associated_internal_fn (tree);
 extern internal_fn replacement_internal_fn (gcall *);
-extern tree unterminated_array (tree);
+extern tree unterminated_array (tree, tree * = NULL, bool * = NULL);
 extern void warn_string_no_nul (location_t, tree, tree, tree);
 extern tree max_object_size ();
 
diff --git a/gcc/testsuite/gcc.dg/warn-strnlen-no-nul.c b/gcc/testsuite/gcc.dg/warn-strnlen-no-nul.c
new file mode 100644
index 0000000..09a527e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/warn-strnlen-no-nul.c
@@ -0,0 +1,356 @@
+/* PR tree-optimization/86552 - missing warning for reading past the end
+   of non-string arrays
+   { dg-do compile }
+   { dg-options "-O2 -Wall -ftrack-macro-expansion=0" } */
+
+typedef __SIZE_TYPE__ size_t;
+extern size_t strnlen (const char*, size_t);
+
+const char a[5] = "12345";   /* { dg-message "declared here" } */
+enum { asz = sizeof a };
+
+int v0 = 0;
+int v1 = 1;
+
+void sink (int, ...);
+
+#define CONCAT(a, b)   a ## b
+#define CAT(a, b)      CONCAT(a, b)
+
+#define T(str, n)					\
+  __attribute__ ((noipa))				\
+  void CAT (test_, __LINE__) (void) {			\
+    int i0 = 0, i1 = i0 + 1, i2 = i1 + 1, i3 = i2 + 1;	\
+    sink (strnlen (str, n), i0, i1, i2, i3);		\
+  } typedef void dummy_type
+
+T (a, asz);
+T (a, asz - 1);
+T (a, asz - 2);
+T (a, asz - 5);
+T (&a[0], asz);
+T (&a[0] + 1, asz);            /* { dg-warning "specified bound 5 exceeds the size 4 of unterminated array" } */
+T (&a[1], asz);                /* { dg-warning "specified bound 5 exceeds the size 4 of unterminated array" } */
+T (&a[1], asz - 1);
+T (&a[v0], asz);               /* { dg-warning "specified bound 5 may exceed the size of at most 5 of unterminated array" } */
+T (&a[v0] + 1, asz);           /* { dg-warning "specified bound 5 may exceed the size of at most 5 of unterminated array" } */
+
+T (a, asz + 1);                /* { dg-warning "specified bound 6 exceeds the size 5 " } */
+T (&a[0], asz + 1);            /* { dg-warning "unterminated" } */
+T (&a[0] + 1, asz - 1);
+T (&a[0] + 1, asz + 1);        /* { dg-warning "unterminated" } */
+T (&a[1], asz + 1);            /* { dg-warning "unterminated" } */
+T (&a[v0], asz + 1);           /* { dg-warning "unterminated" } */
+T (&a[v0] + 1, asz + 1);       /* { dg-warning "unterminated" } */
+
+
+const char b[][5] = { /* { dg-message "declared here" } */
+  "12", "123", "1234", "54321"
+};
+enum { bsz = sizeof b[0] };
+
+T (b[0], bsz);
+T (b[1], bsz);
+T (b[2], bsz);
+T (b[3], bsz);
+
+T (b[0], bsz - 1);
+T (b[1], bsz - 1);
+T (b[2], bsz - 1);
+T (b[3], bsz - 1);
+
+T (b[0], bsz + 1);
+T (b[1], bsz + 1);
+T (b[2], bsz + 1);
+T (b[3], bsz + 1);            /* { dg-warning "unterminated" } */
+
+T (b[i0], bsz);
+T (b[i1], bsz);
+T (b[i2], bsz);
+T (b[i3], bsz);
+
+T (b[i0], bsz + 1);
+T (b[i1], bsz + 1);
+T (b[i2], bsz + 1);
+T (b[i3], bsz + 1);           /* { dg-warning "unterminated" } */
+
+T (b[v0], bsz);
+T (b[v0], bsz + 1);
+
+T (&b[i2][i1], bsz);
+T (&b[i2][i1] + i1, bsz);
+T (&b[i2][v0], bsz);
+T (&b[i2][i1] + v0, bsz);
+
+T (&b[i2][i1], bsz + 1);
+T (&b[i2][i1] + i1, bsz + 1);
+T (&b[i2][v0], bsz + 1);
+T (&b[i2][i1] + v0, bsz + 1);
+
+T (&b[2][1], bsz);
+T (&b[2][1] + i1, bsz);
+T (&b[2][i0], bsz);
+T (&b[2][1] + i0, bsz);
+T (&b[2][1] + v0, bsz);
+T (&b[2][v0], bsz);
+
+T (&b[2][1], bsz + 1);
+T (&b[2][1] + i1, bsz + 1);
+T (&b[2][i0], bsz + 1);
+T (&b[2][1] + i0, bsz + 1);
+T (&b[2][1] + v0, bsz + 1);
+T (&b[2][v0], bsz + 1);
+
+T (&b[3][1], bsz);                /* { dg-warning "unterminated" } */
+T (&b[3][1], bsz - 1);
+T (&b[3][1] + 1, bsz);            /* { dg-warning "unterminated" } */
+T (&b[3][1] + 1, bsz - 1);        /* { dg-warning "unterminated" } */
+T (&b[3][1] + 1, bsz - 2);
+T (&b[3][1] + i1, bsz);           /* { dg-warning "unterminated" } */
+T (&b[3][1] + i1, bsz - i1);      /* { dg-warning "unterminated" } */
+T (&b[3][1] + i1, bsz - i2);
+T (&b[3][v0], bsz);
+T (&b[3][1] + v0, bsz);           /* { dg-warning "specified bound 5 may exceed the size of at most 4 of unterminated array" } */
+T (&b[3][v0] + v1, bsz);          /* { dg-warning "specified bound 5 may exceed the size of at most 4 of unterminated array" "pr?????" { xfail *-*-* } } */
+
+T (&b[3][1], bsz + 1);            /* { dg-warning "unterminated" } */
+T (&b[3][1] + 1, bsz + 1);        /* { dg-warning "unterminated" } */
+T (&b[3][1] + i1, bsz + 1);       /* { dg-warning "unterminated" } */
+T (&b[3][v0], bsz + 1);           /* { dg-warning "unterminated" "pr86936" { xfail *-*-* } } */
+T (&b[3][1] + v0, bsz + 1);       /* { dg-warning "unterminated" } */
+T (&b[3][v0] + v1, bsz + 1);      /* { dg-warning "unterminated" "pr86936" { xfail *-*-* } } */
+
+T (&b[i3][i1], bsz);              /* { dg-warning "unterminated" } */
+T (&b[i3][i1] + 1, bsz);          /* { dg-warning "unterminated" } */
+T (&b[i3][i1] + i1, bsz);         /* { dg-warning "specified bound 5 exceeds the size 3 of unterminated array" } */
+T (&b[i3][v0], bsz);
+T (&b[i3][i1] + v0, bsz);         /* { dg-warning "specified bound 5 may exceed the size of at most 4 of unterminated array" } */
+T (&b[i3][v0] + v1, bsz);
+
+T (&b[i3][i1], bsz + 1);          /* { dg-warning "unterminated" } */
+T (&b[i3][i1] + 1, bsz + 1);      /* { dg-warning "unterminated" } */
+T (&b[i3][i1] + i1, bsz + 1);     /* { dg-warning "unterminated" } */
+T (&b[i3][v0], bsz + 1);          /* { dg-warning "unterminated" "pr86919" { xfail *-*-* } } */
+T (&b[i3][i1] + v0, bsz + 1);     /* { dg-warning "unterminated" } */
+T (&b[i3][v0] + v1, bsz + 1);     /* { dg-warning "unterminated" "pr86919" { xfail *-*-* } } */
+
+T (v0 ? "" : b[0], bsz);
+T (v0 ? "" : b[1], bsz);
+T (v0 ? "" : b[2], bsz);
+T (v0 ? "" : b[3], bsz);
+T (v0 ? b[0] : "", bsz);
+T (v0 ? b[1] : "", bsz);
+T (v0 ? b[2] : "", bsz);
+T (v0 ? b[3] : "", bsz);
+
+T (v0 ? "" : b[0], bsz + 1);
+T (v0 ? "" : b[1], bsz + 1);
+T (v0 ? "" : b[2], bsz + 1);
+T (v0 ? "" : b[3], bsz + 1);      /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+T (v0 ? b[0] : "", bsz + 1);
+T (v0 ? b[1] : "", bsz + 1);
+T (v0 ? b[2] : "", bsz + 1);
+T (v0 ? b[3] : "", bsz + 1);      /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+
+T (v0 ? "" : b[i0], bsz);
+T (v0 ? "" : b[i1], bsz);
+T (v0 ? "" : b[i2], bsz);
+T (v0 ? "" : b[i3], bsz);
+T (v0 ? b[i0] : "", bsz);
+T (v0 ? b[i1] : "", bsz);
+T (v0 ? b[i2] : "", bsz);
+T (v0 ? b[i3] : "", bsz);
+
+T (v0 ? "" : b[i0], bsz + 1);
+T (v0 ? "" : b[i1], bsz + 1);
+T (v0 ? "" : b[i2], bsz + 1);
+T (v0 ? "" : b[i3], bsz + 1);     /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+T (v0 ? b[i0] : "", bsz + 1);
+T (v0 ? b[i1] : "", bsz + 1);
+T (v0 ? b[i2] : "", bsz + 1);
+T (v0 ? b[i3] : "", bsz + 1);     /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+
+T (v0 ? "1234" : b[3], bsz);
+T (v0 ? "1234" : b[i3], bsz);
+T (v0 ? b[3] : "1234", bsz);
+T (v0 ? b[i3] : "1234", bsz);
+
+T (v0 ? a : b[3], bsz);
+T (v0 ? b[0] : b[2], bsz);
+T (v0 ? b[2] : b[3], bsz);
+T (v0 ? b[3] : b[2], bsz);
+
+T (v0 ? "1234" : b[3], bsz + 1);  /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+T (v0 ? "1234" : b[i3], bsz + 1); /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+T (v0 ? b[3] : "1234", bsz + 1);  /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+T (v0 ? b[i3] : "1234", bsz + 1); /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+
+T (v0 ? a : b[3], bsz + 1);       /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+T (v0 ? b[0] : b[2], bsz + 1);
+T (v0 ? b[2] : b[3], bsz + 1);    /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+T (v0 ? b[3] : b[2], bsz + 1);    /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+
+struct A { char a[5], b[5]; };
+
+const struct A s = { "1234", "12345" };
+
+T (s.a, asz);
+T (&s.a[0], asz);
+T (&s.a[0] + 1, asz);
+T (&s.a[0] + v0, asz);
+T (&s.a[1], asz);
+T (&s.a[1] + 1, asz);
+T (&s.a[1] + v0, asz);
+
+T (&s.a[i0], asz);
+T (&s.a[i0] + i1, asz);
+T (&s.a[i0] + v0, asz);
+T (&s.a[i1], asz);
+T (&s.a[i1] + i1, asz);
+T (&s.a[i1] + v0, asz);
+
+T (s.a, asz + 1);
+T (&s.a[0], asz + 1);
+T (&s.a[0] + 1, asz + 1);
+T (&s.a[0] + v0, asz + 1);
+T (&s.a[1], asz + 1);
+T (&s.a[1] + 1, asz + 1);
+T (&s.a[1] + v0, asz + 1);
+
+T (&s.a[i0], asz + 1);
+T (&s.a[i0] + i1, asz + 1);
+T (&s.a[i0] + v0, asz + 1);
+T (&s.a[i1], asz + 1);
+T (&s.a[i1] + i1, asz + 1);
+T (&s.a[i1] + v0, asz + 1);
+
+T (s.b, bsz);
+T (&s.b[0], bsz);
+T (&s.b[0] + 1, bsz);             /* { dg-warning "unterminated" } */
+T (&s.b[0] + v0, bsz);            /* { dg-warning "unterminated" } */
+T (&s.b[1], bsz);                 /* { dg-warning "unterminated" } */
+T (&s.b[1] + 1, bsz);             /* { dg-warning "unterminated" } */
+T (&s.b[1] + v0, bsz);            /* { dg-warning "unterminated" } */
+
+T (&s.b[i0], bsz);
+T (&s.b[i0] + i1, bsz);           /* { dg-warning "unterminated" } */
+T (&s.b[i0] + v0, bsz);           /* { dg-warning "unterminated" } */
+T (&s.b[i1], bsz);                /* { dg-warning "unterminated" } */
+T (&s.b[i1] + i1, bsz);           /* { dg-warning "unterminated" } */
+T (&s.b[i1] + v0, bsz);           /* { dg-warning "unterminated" } */
+
+T (s.b, bsz + 1);                 /* { dg-warning "unterminated" } */
+T (&s.b[0], bsz + 1);             /* { dg-warning "unterminated" } */
+T (&s.b[0] + 1, bsz + 1);         /* { dg-warning "unterminated" } */
+T (&s.b[0] + v0, bsz + 1);        /* { dg-warning "unterminated" } */
+T (&s.b[1], bsz + 1);             /* { dg-warning "unterminated" } */
+T (&s.b[1] + 1, bsz + 1);         /* { dg-warning "unterminated" } */
+T (&s.b[1] + v0, bsz + 1);        /* { dg-warning "unterminated" } */
+
+T (&s.b[i0], bsz + 1);            /* { dg-warning "unterminated" } */
+T (&s.b[i0] + i1, bsz + 1);       /* { dg-warning "unterminated" } */
+T (&s.b[i0] + v0, bsz + 1);       /* { dg-warning "unterminated" } */
+T (&s.b[i1], bsz + 1);            /* { dg-warning "unterminated" } */
+T (&s.b[i1] + i1, bsz + 1);       /* { dg-warning "unterminated" } */
+T (&s.b[i1] + v0, bsz + 1);       /* { dg-warning "unterminated" } */
+
+struct B { struct A a[2]; };
+
+const struct B ba[] = {
+  { { { "123", "12345" }, { "12345", "123" } } },
+  { { { "12345", "123" }, { "123", "12345" } } },
+  { { { "1", "12" },      { "123", "1234" } } },
+  { { { "123", "1234" },  { "12345", "12" } } }
+};
+
+T (ba[0].a[0].a, asz + 1);
+T (&ba[0].a[0].a[0], asz + 1);
+T (&ba[0].a[0].a[0] + 1, asz + 1);
+T (&ba[0].a[0].a[0] + v0, asz + 1);
+T (&ba[0].a[0].a[1], asz + 1);
+T (&ba[0].a[0].a[1] + 1, asz + 1);
+T (&ba[0].a[0].a[1] + v0, asz + 1);
+
+T (ba[0].a[0].b, bsz);
+T (&ba[0].a[0].b[0], bsz);
+T (&ba[0].a[0].b[0] + 1, bsz);        /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[0] + 1, bsz - 1);
+T (&ba[0].a[0].b[0] + v0, bsz);       /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[1], bsz);            /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[1], bsz - 1);
+T (&ba[0].a[0].b[1] + 1, bsz - 1);    /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[1] + 1, bsz - 2);
+T (&ba[0].a[0].b[1] + 1, bsz);        /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[1] + v0, bsz);       /* { dg-warning "unterminated" } */
+
+T (ba[0].a[0].b, bsz + 1);            /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[0], bsz + 1);        /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[0] + 1, bsz + 1);    /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[0] + v0, bsz + 1);   /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[1], bsz + 1);        /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[1] + 1, bsz + 1);    /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[1] + v0, bsz + 1);   /* { dg-warning "unterminated" } */
+
+T (ba[0].a[1].a, asz + 1);            /* { dg-warning "unterminated" } */
+T (&ba[0].a[1].a[0], asz + 1);        /* { dg-warning "unterminated" } */
+T (&ba[0].a[1].a[0] + 1, asz + 1);    /* { dg-warning "unterminated" } */
+T (&ba[0].a[1].a[0] + v0, asz + 1);   /* { dg-warning "unterminated" } */
+T (&ba[0].a[1].a[1], asz + 1);        /* { dg-warning "unterminated" } */
+T (&ba[0].a[1].a[1] + 1, asz + 1);    /* { dg-warning "unterminated" } */
+T (&ba[0].a[1].a[1] + v0, asz + 1);   /* { dg-warning "unterminated" } */
+
+T (ba[0].a[1].b, bsz + 1);
+T (&ba[0].a[1].b[0], bsz + 1);
+T (&ba[0].a[1].b[0] + 1, bsz + 1);
+T (&ba[0].a[1].b[0] + v0, bsz + 1);
+T (&ba[0].a[1].b[1], bsz + 1);
+T (&ba[0].a[1].b[1] + 1, bsz + 1);
+T (&ba[0].a[1].b[1] + v0, bsz + 1);
+
+T (ba[1].a[0].a, asz);
+T (&ba[1].a[0].a[0], asz);
+T (&ba[1].a[0].a[0] + 1, asz);        /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[0] + v0, asz);       /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[1], asz);            /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[1] + 1, asz);        /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[1] + v0, asz);       /* { dg-warning "unterminated" } */
+
+T (ba[1].a[0].a, asz + 1);            /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[0], asz + 1);        /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[0] + 1, asz + 1);    /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[0] + v0, asz + 1);   /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[1], asz + 1);        /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[1] + 1, asz + 1);    /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[1] + v0, asz + 1);   /* { dg-warning "unterminated" } */
+
+T (ba[1].a[0].b, bsz);
+T (&ba[1].a[0].b[0], bsz);
+T (&ba[1].a[0].b[0] + 1, bsz);
+T (&ba[1].a[0].b[0] + v0, bsz);
+T (&ba[1].a[0].b[1], bsz);
+T (&ba[1].a[0].b[1] + 1, bsz);
+T (&ba[1].a[0].b[1] + v0, bsz);
+
+T (ba[1].a[1].a, asz);
+T (&ba[1].a[1].a[0], asz);
+T (&ba[1].a[1].a[0] + 1, asz);
+T (&ba[1].a[1].a[0] + v0, asz);
+T (&ba[1].a[1].a[1], asz);
+T (&ba[1].a[1].a[1] + 1, asz);
+T (&ba[1].a[1].a[1] + v0, asz);
+
+T (ba[1].a[1].b, bsz);
+T (&ba[1].a[1].b[0], bsz);
+T (&ba[1].a[1].b[0] + 1, bsz);        /* { dg-warning "unterminated" } */
+T (&ba[1].a[1].b[0] + 1, bsz - 1);
+T (&ba[1].a[1].b[0] + v0, bsz);       /* { dg-warning "unterminated" } */
+T (&ba[1].a[1].b[1], bsz);            /* { dg-warning "unterminated" } */
+T (&ba[1].a[1].b[1], bsz - 1);
+T (&ba[1].a[1].b[1] + 1, bsz);        /* { dg-warning "unterminated" } */
+T (&ba[1].a[1].b[1] + 1, bsz - 1);    /* { dg-warning "unterminated" } */
+T (&ba[1].a[1].b[1] + 1, bsz - 2);
+T (&ba[1].a[1].b[1] + 1, bsz - i2);
+T (&ba[1].a[1].b[1] + v0, bsz);       /* { dg-warning "unterminated" } */
+
+/* Prune out warnings with no location (pr?????).
+   { dg-prune-output "cc1:" } */
diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index 1d813b4..b747c35 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -336,8 +336,12 @@ get_stridx (tree exp)
 	return idx;
     }
 
-  s = string_constant (exp, &o);
+  /* Set if EXP refers to a constant array that is not a nul-terminated
+     string, otherwise clear.  */
+  tree nonstr;
+  s = string_constant (exp, &o, &nonstr);
   if (s != NULL_TREE
+      && nonstr == NULL_TREE
       && (o == NULL_TREE || tree_fits_shwi_p (o))
       && TREE_STRING_LENGTH (s) > 0)
     {


* [PATCH 2/6] detect unterminated const arrays in strlen calls (PR 86552)
  2018-08-13 21:23   ` [PATCH 0/6] improve handling of char arrays with missing nul (PR 86552, 86711, 86714) Martin Sebor
                       ` (4 preceding siblings ...)
  2018-08-13 21:29     ` [PATCH 6/6] detect unterminated const arrays in strnlen " Martin Sebor
@ 2018-08-14  3:21     ` Martin Sebor
  2018-08-30 22:15       ` Jeff Law
  2018-08-15  6:02     ` [PATCH 0/6] improve handling of char arrays with missing nul (PR 86552, 86711, 86714) Jeff Law
  6 siblings, 1 reply; 53+ messages in thread
From: Martin Sebor @ 2018-08-14  3:21 UTC (permalink / raw)
  To: Gcc Patch List, Jeff Law

[-- Attachment #1: Type: text/plain, Size: 181 bytes --]

[PATCH 2/6] detect unterminated const arrays in strlen calls (PR 86552)

The attached changes implement the detection of past-the-end reads
by strlen due to unterminated arguments.
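To make the check concrete, here is a minimal sketch in C. It assumes the array's size and the constant offset into it are already known, as they are for the constant arrays this patch handles. The helper name `reads_past_end` is hypothetical and not part of GCC. It only models the question the warning asks: is there a nul between the offset and the end of the array?

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not GCC code: return true if the SIZE-byte
   array ARR has no nul at or after byte offset OFF, i.e. if
   strlen (ARR + OFF) would read past the end of ARR.  */
static bool
reads_past_end (const char *arr, size_t size, size_t off)
{
  if (off >= size)
    return true;   /* No bytes left in bounds, so no terminator either.  */
  return memchr (arr + off, '\0', size - off) == NULL;
}

/* Same initializer as the 2-D array in the patch's test: "54321"
   fills all five bytes of b[3], leaving no room for a nul.  */
const char b[][5] = { "12", "123", "1234", "54321" };
```

With this `b`, `reads_past_end (b[3], sizeof b[3], 1)` is true, matching the expected warning for `T (&b[3][1])` in the test, while `b[2]` ("1234") still has room for its terminator.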

[-- Attachment #2: gcc-86552-2.diff --]
[-- Type: text/x-patch, Size: 16085 bytes --]

PR tree-optimization/86552 - missing warning for reading past the end

gcc/ChangeLog:

	* builtins.c (warn_string_no_nul): New function.
	(expand_builtin_strlen): Warn for unterminated arrays.
	(fold_builtin_strlen): Add argument.  Warn for unterminated arrays.
	(fold_builtin_1): Adjust call to fold_builtin_strlen.
	* builtins.h (warn_string_no_nul): New function.

gcc/testsuite/ChangeLog:

	* gcc.dg/warn-strlen-no-nul.c: New test.

diff --git a/gcc/builtins.c b/gcc/builtins.c
index a7aa4b2..78ced93 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, int);
 static rtx expand_builtin_expect (tree, rtx);
 static tree fold_builtin_constant_p (tree);
 static tree fold_builtin_classify_type (tree);
-static tree fold_builtin_strlen (location_t, tree, tree);
+static tree fold_builtin_strlen (location_t, tree, tree, tree);
 static tree fold_builtin_inf (location_t, tree, int);
 static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
 static bool validate_arg (const_tree, enum tree_code code);
@@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
   return n;
 }
 
+/* For a call expression EXP to a function that expects a string argument,
+   issue a diagnostic because it is called with an argument NONSTR
+   that is a character array with no terminating NUL.  */
+
+void
+warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
+{
+  loc = expansion_point_location_if_in_system_header (loc);
+
+  bool warned;
+  if (exp)
+    {
+      if (!fndecl)
+	fndecl = get_callee_fndecl (exp);
+      warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%K%qD argument missing terminating nul",
+			   exp, fndecl);
+    }
+  else
+    {
+      gcc_assert (fndecl);
+      warned = warning_at (loc, OPT_Wstringop_overflow_,
+			   "%qD argument missing terminating nul",
+			   fndecl);
+    }
+
+  if (warned && DECL_P (nonstr))
+    inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared here");
+}
+
 /* Compute the length of a null-terminated character string or wide
    character string handling character sizes of 1, 2, and 4 bytes.
    TREE_STRING_LENGTH is not the right way because it evaluates to
@@ -2874,7 +2904,6 @@ expand_builtin_strlen (tree exp, rtx target,
 
   struct expand_operand ops[4];
   rtx pat;
-  tree len;
   tree src = CALL_EXPR_ARG (exp, 0);
   rtx src_reg;
   rtx_insn *before_strlen;
@@ -2883,20 +2912,39 @@ expand_builtin_strlen (tree exp, rtx target,
   unsigned int align;
 
   /* If the length can be computed at compile-time, return it.  */
-  len = c_strlen (src, 0);
+  tree array;
+  tree len = c_strlen (src, 0, &array);
   if (len)
-    return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+    {
+      if (array)
+	{
+	  /* Array refers to the non-nul terminated constant array
+	     whose length is attempted to be computed.  */
+	  warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array);
+	  return NULL_RTX;
+	}
+      return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+    }
 
   /* If the length can be computed at compile-time and is constant
      integer, but there are side-effects in src, evaluate
      src for side-effects, then return len.
      E.g. x = strlen (i++ ? "xfoo" + 1 : "bar");
      can be optimized into: i++; x = 3;  */
-  len = c_strlen (src, 1);
-  if (len && TREE_CODE (len) == INTEGER_CST)
+  len = c_strlen (src, 1, &array);
+  if (len)
     {
-      expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL);
-      return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+      if (array)
+	{
+	  warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array);
+	  return NULL_RTX;
+	}
+
+      if (TREE_CODE (len) == INTEGER_CST)
+	{
+	  expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL);
+	  return expand_expr (len, target, target_mode, EXPAND_NORMAL);
+	}
     }
 
   align = get_pointer_alignment (src) / BITS_PER_UNIT;
@@ -8339,19 +8387,30 @@ fold_builtin_classify_type (tree arg)
   return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));
 }
 
-/* Fold a call to __builtin_strlen with argument ARG.  */
+/* Fold a strlen call to FNDECL of TYPE, and with argument ARG.  */
 
 static tree
-fold_builtin_strlen (location_t loc, tree type, tree arg)
+fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
 {
   if (!validate_arg (arg, POINTER_TYPE))
     return NULL_TREE;
   else
     {
-      tree len = c_strlen (arg, 0);
-
+      tree nonstr = NULL_TREE;
+      tree len = c_strlen (arg, 0, &nonstr);
       if (len)
-	return fold_convert_loc (loc, type, len);
+	{
+	  if (loc == UNKNOWN_LOCATION && EXPR_HAS_LOCATION (arg))
+	    loc = EXPR_LOCATION (arg);
+
+	  /* To avoid warning multiple times about unterminated
+	     arrays only warn if its length has been determined
+	     and is being folded to a constant.  */
+	  if (nonstr)
+	    warn_string_no_nul (loc, NULL_TREE, fndecl, nonstr);
+
+	  return fold_convert_loc (loc, type, len);
+	}
 
       return NULL_TREE;
     }
@@ -9259,7 +9318,7 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0)
       return fold_builtin_classify_type (arg0);
 
     case BUILT_IN_STRLEN:
-      return fold_builtin_strlen (loc, type, arg0);
+      return fold_builtin_strlen (loc, fndecl, type, arg0);
 
     CASE_FLT_FN (BUILT_IN_FABS):
     CASE_FLT_FN_FLOATN_NX (BUILT_IN_FABS):
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 27e6959..73b0b0b 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -103,7 +103,7 @@ extern bool target_char_cst_p (tree t, char *p);
 
 extern internal_fn associated_internal_fn (tree);
 extern internal_fn replacement_internal_fn (gcall *);
-
+extern void warn_string_no_nul (location_t, tree, tree, tree);
 extern tree max_object_size ();
 
 #endif /* GCC_BUILTINS_H */
diff --git a/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c b/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c
new file mode 100644
index 0000000..c2b0438
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c
@@ -0,0 +1,304 @@
+/* PR tree-optimization/86552 - missing warning for reading past the end
+   of non-string arrays
+   { dg-do compile }
+   { dg-options "-O2 -Wall -ftrack-macro-expansion=0" } */
+
+extern __SIZE_TYPE__ strlen (const char*);
+
+const char a[5] = "12345";   /* { dg-message "declared here" } */
+
+int v0 = 0;
+int v1 = 1;
+
+void sink (int, ...);
+
+#define CONCAT(a, b)   a ## b
+#define CAT(a, b)      CONCAT(a, b)
+
+#define T(str)						\
+  __attribute__ ((noipa))				\
+  void CAT (test_, __LINE__) (void) {			\
+    int i0 = 0, i1 = i0 + 1, i2 = i1 + 1, i3 = i2 + 1;	\
+    sink (strlen (str), i0, i1, i2, i3);		\
+  } typedef void dummy_type
+
+T (a);                /* { dg-warning "argument missing terminating nul" }  */
+T (&a[0]);            /* { dg-warning "nul" }  */
+T (&a[0] + 1);        /* { dg-warning "nul" }  */
+T (&a[1]);            /* { dg-warning "nul" }  */
+T (&a[v0]);           /* { dg-warning "nul" }  */
+T (&a[v0] + 1);       /* { dg-warning "nul" }  */
+
+
+const char b[][5] = { /* { dg-message "declared here" } */
+  "12", "123", "1234", "54321"
+};
+
+T (b[0]);
+T (b[1]);
+T (b[2]);
+T (b[3]);             /* { dg-warning "nul" }  */
+
+T (b[i0]);
+T (b[i1]);
+T (b[i2]);
+T (b[i3]);            /* { dg-warning "nul" }  */
+
+T (b[v0]);
+
+T (&b[i2][i1]);
+T (&b[i2][i1] + i1);
+T (&b[i2][v0]);
+T (&b[i2][i1] + v0);
+
+T (&b[2][1]);
+T (&b[2][1] + i1);
+T (&b[2][i0]);
+T (&b[2][1] + i0);
+
+T (&b[2][1]);
+T (&b[2][1] + v0);
+T (&b[2][v0]);
+
+T (&b[3][1]);           /* { dg-warning "nul" }  */
+T (&b[3][1] + 1);       /* { dg-warning "nul" }  */
+T (&b[3][1] + i1);      /* { dg-warning "nul" }  */
+T (&b[3][v0]);          /* { dg-warning "nul" }  */
+T (&b[3][1] + v0);      /* { dg-warning "nul" }  */
+T (&b[3][v0] + v1);     /* { dg-warning "nul" }  */
+
+T (&b[i3][i1]);         /* { dg-warning "nul" }  */
+T (&b[i3][i1] + 1);     /* { dg-warning "nul" }  */
+T (&b[i3][i1] + i1);    /* { dg-warning "nul" }  */
+T (&b[i3][v0]);         /* { dg-warning "nul" "pr86919" { xfail *-*-* } }  */
+T (&b[i3][i1] + v0);    /* { dg-warning "nul" "pr86919" { xfail *-*-* } }  */
+T (&b[i3][v0] + v1);    /* { dg-warning "nul" "pr86919" { xfail *-*-* } }  */
+
+T (v0 ? "" : b[0]);
+T (v0 ? "" : b[1]);
+T (v0 ? "" : b[2]);
+T (v0 ? "" : b[3]);               /* { dg-warning "nul" }  */
+T (v0 ? b[0] : "");
+T (v0 ? b[1] : "");
+T (v0 ? b[2] : "");
+T (v0 ? b[3] : "");               /* { dg-warning "nul" }  */
+
+T (v0 ? "" : b[i0]);
+T (v0 ? "" : b[i1]);
+T (v0 ? "" : b[i2]);
+/* The following is diagnosed but the warning location is wrong
+   (the PRE pass loses it).  */
+T (v0 ? "" : b[i3]);              /* { dg-warning "nul" "pr?????" { xfail *-*-* } }  */
+T (v0 ? b[i0] : "");
+T (v0 ? b[i1] : "");
+T (v0 ? b[i2] : "");
+T (v0 ? b[i3] : "");              /* { dg-warning "nul" "pr?????" { xfail *-*-* } }  */
+
+T (v0 ? "1234" : b[3]);           /* { dg-warning "nul" }  */
+T (v0 ? "1234" : b[i3]);          /* { dg-warning "nul" "pr?????" { xfail *-*-* } }  */
+T (v0 ? b[3] : "1234");           /* { dg-warning "nul" }  */
+T (v0 ? b[i3] : "1234");          /* { dg-warning "nul" "pr?????" { xfail *-*-* } }  */
+
+T (v0 ? a : b[3]);                /* { dg-warning "nul" }  */
+T (v0 ? b[0] : b[2]);
+T (v0 ? b[2] : b[3]);             /* { dg-warning "nul" }  */
+T (v0 ? b[3] : b[2]);             /* { dg-warning "nul" }  */
+
+T (v0 ? a : b[i3]);               /* { dg-warning "nul" "pr?????" { xfail *-*-* } }  */
+T (v0 ? b[i0] : b[i2]);
+T (v0 ? b[i2] : b[i3]);           /* { dg-warning "nul" "pr?????" { xfail *-*-* } }  */
+T (v0 ? b[i3] : b[i2]);           /* { dg-warning "nul" "pr?????" { xfail *-*-* } }  */
+
+T (v0 ? b[0] : &b[3][0] + 1);     /* { dg-warning "nul" }  */
+T (v0 ? b[0] : &b[3][0] + i1);    /* { dg-warning "nul" }  */
+T (v0 ? b[1] : &b[3][1] + v0);    /* { dg-warning "nul" }  */
+
+T (v0 ? b[i0] : &b[i3][i0] + i1);    /* { dg-warning "nul" }  */
+T (v0 ? b[i0] : &b[i3][i0] + i1);    /* { dg-warning "nul" }  */
+T (v0 ? b[i1] : &b[i3][i1] + v0);    /* { dg-warning "nul" "pr?????" { xfail *-*-* } }  */
+
+/* It's possible to detect the missing nul in the following two
+   expressions but GCC doesn't do it yet.  */
+T (v0 ? &b[3][1] + v0 : b[2]);    /* { dg-warning "nul" "bug" { xfail *-*-* } }  */
+T (v0 ? &b[3][v0] : &b[3][v1]);   /* { dg-warning "nul" "bug" { xfail *-*-* } }  */
+
+
+struct A { char a[5], b[5]; };
+
+const struct A s = { "1234", "12345" };
+
+T (s.a);
+T (&s.a[0]);
+T (&s.a[0] + 1);
+T (&s.a[0] + v0);
+T (&s.a[1]);
+T (&s.a[1] + 1);
+T (&s.a[1] + v0);
+
+T (&s.a[i0]);
+T (&s.a[i0] + i1);
+T (&s.a[i0] + v0);
+T (&s.a[i1]);
+T (&s.a[i1] + i1);
+T (&s.a[i1] + v0);
+
+T (s.b);              /* { dg-warning "nul" }  */
+T (&s.b[0]);          /* { dg-warning "nul" }  */
+T (&s.b[0] + 1);      /* { dg-warning "nul" }  */
+T (&s.b[0] + v0);     /* { dg-warning "nul" }  */
+T (&s.b[1]);          /* { dg-warning "nul" }  */
+T (&s.b[1] + 1);      /* { dg-warning "nul" }  */
+T (&s.b[1] + i0);     /* { dg-warning "nul" }  */
+T (&s.b[1] + v0);     /* { dg-warning "nul" }  */
+
+T (&s.b[i0]);         /* { dg-warning "nul" }  */
+T (&s.b[i0] + i1);    /* { dg-warning "nul" }  */
+T (&s.b[i0] + v0);    /* { dg-warning "nul" "pr86919" { xfail *-*-* } }  */
+T (&s.b[i1]);         /* { dg-warning "nul" }  */
+T (&s.b[i1] + i1);    /* { dg-warning "nul" }  */
+T (&s.b[i1] + v0);    /* { dg-warning "nul" "pr86919" { xfail *-*-* } }  */
+
+struct B { struct A a[2]; };
+
+const struct B ba[] = {
+  { { { "123", "12345" }, { "12345", "123" } } },
+  { { { "12345", "123" }, { "123", "12345" } } },
+  { { { "1", "12" },      { "123", "1234" } } },
+  { { { "123", "1234" },  { "12345", "12" } } }
+};
+
+T (ba[0].a[0].a);
+T (&ba[0].a[0].a[0]);
+T (&ba[0].a[0].a[0] + 1);
+T (&ba[0].a[0].a[0] + v0);
+T (&ba[0].a[0].a[1]);
+T (&ba[0].a[0].a[1] + 1);
+T (&ba[0].a[0].a[1] + v0);
+
+T (ba[0].a[0].b);           /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[0] + v0);  /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[0].b[1] + v0);  /* { dg-warning "nul" }  */
+
+T (ba[0].a[1].a);           /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[0] + v0);  /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1]);       /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[0].a[1].a[1] + v0);  /* { dg-warning "nul" }  */
+
+T (ba[0].a[1].b);
+T (&ba[0].a[1].b[0]);
+T (&ba[0].a[1].b[0] + 1);
+T (&ba[0].a[1].b[0] + v0);
+T (&ba[0].a[1].b[1]);
+T (&ba[0].a[1].b[1] + 1);
+T (&ba[0].a[1].b[1] + v0);
+
+
+T (ba[1].a[0].a);           /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[0] + v0);  /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[0].a[1] + v0);  /* { dg-warning "nul" }  */
+
+T (ba[1].a[0].b);
+T (&ba[1].a[0].b[0]);
+T (&ba[1].a[0].b[0] + 1);
+T (&ba[1].a[0].b[0] + v0);
+T (&ba[1].a[0].b[1]);
+T (&ba[1].a[0].b[1] + 1);
+T (&ba[1].a[0].b[1] + v0);
+
+T (ba[1].a[1].a);
+T (&ba[1].a[1].a[0]);
+T (&ba[1].a[1].a[0] + 1);
+T (&ba[1].a[1].a[0] + v0);
+T (&ba[1].a[1].a[1]);
+T (&ba[1].a[1].a[1] + 1);
+T (&ba[1].a[1].a[1] + v0);
+
+T (ba[1].a[1].b);           /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[0] + v0);  /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1]);       /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[1].a[1].b[1] + v0);  /* { dg-warning "nul" }  */
+
+
+T (ba[2].a[0].a);
+T (&ba[2].a[0].a[0]);
+T (&ba[2].a[0].a[0] + 1);
+T (&ba[2].a[0].a[0] + v0);
+T (&ba[2].a[0].a[1]);
+T (&ba[2].a[0].a[1] + 1);
+T (&ba[2].a[0].a[1] + v0);
+
+T (ba[2].a[0].b);
+T (&ba[2].a[0].b[0]);
+T (&ba[2].a[0].b[0] + 1);
+T (&ba[2].a[0].b[0] + v0);
+T (&ba[2].a[0].b[1]);
+T (&ba[2].a[0].b[1] + 1);
+T (&ba[2].a[0].b[1] + v0);
+
+T (ba[2].a[1].a);
+T (&ba[2].a[1].a[0]);
+T (&ba[2].a[1].a[0] + 1);
+T (&ba[2].a[1].a[0] + v0);
+T (&ba[2].a[1].a[1]);
+T (&ba[2].a[1].a[1] + 1);
+T (&ba[2].a[1].a[1] + v0);
+
+
+T (ba[3].a[0].a);
+T (&ba[3].a[0].a[0]);
+T (&ba[3].a[0].a[0] + 1);
+T (&ba[3].a[0].a[0] + v0);
+T (&ba[3].a[0].a[1]);
+T (&ba[3].a[0].a[1] + 1);
+T (&ba[3].a[0].a[1] + v0);
+
+T (ba[3].a[0].b);
+T (&ba[3].a[0].b[0]);
+T (&ba[3].a[0].b[0] + 1);
+T (&ba[3].a[0].b[0] + v0);
+T (&ba[3].a[0].b[1]);
+T (&ba[3].a[0].b[1] + 1);
+T (&ba[3].a[0].b[1] + v0);
+
+T (ba[3].a[1].a);           /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0]);	    /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0] + 1);   /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[0] + v0);  /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1]);	    /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1] + 1);   /* { dg-warning "nul" }  */
+T (&ba[3].a[1].a[1] + v0);  /* { dg-warning "nul" }  */
+
+T (ba[3].a[1].b);
+T (&ba[3].a[1].b[0]);	
+T (&ba[3].a[1].b[0] + 1);
+T (&ba[3].a[1].b[0] + v0);
+T (&ba[3].a[1].b[1]);	
+T (&ba[3].a[1].b[1] + 1);
+T (&ba[3].a[1].b[1] + v0);
+
+
+T (v0 ? ba[0].a[0].a : ba[0].a[0].b);           /* { dg-warning "nul" }  */
+T (v0 ? ba[0].a[0].a : ba[0].a[0].b);           /* { dg-warning "nul" }  */
+
+T (v0 ? &ba[0].a[0].a[0] : &ba[3].a[1].a[0]);   /* { dg-warning "nul" }  */
+T (v0 ? &ba[3].a[1].a[1] :  ba[0].a[0].a);      /* { dg-warning "nul" }  */
+
+T (v0 ? ba[0].a[0].a : ba[0].a[1].b);
+T (v0 ? ba[0].a[1].b : ba[0].a[0].a);
+
+/* Prune out warnings with no location (pr?????).
+   { dg-prune-output "cc1:" } */


* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-03 13:01             ` Bernd Edlinger
  2018-08-03 19:59               ` Martin Sebor
@ 2018-08-15  5:31               ` Jeff Law
  1 sibling, 0 replies; 53+ messages in thread
From: Jeff Law @ 2018-08-15  5:31 UTC (permalink / raw)
  To: Bernd Edlinger, Martin Sebor, Gcc Patch List

On 08/03/2018 07:00 AM, Bernd Edlinger wrote:
> On 08/02/18 22:34, Martin Sebor wrote:
>> On 08/02/2018 12:56 PM, Bernd Edlinger wrote:
>>> On 08/02/18 15:26, Bernd Edlinger wrote:
>>>>>
>>>>>    /* If the length can be computed at compile-time, return it.  */
>>>>> -  len = c_strlen (src, 0);
>>>>> +  tree array;
>>>>> +  tree len = c_strlen (src, 0, &array);
>>>>
>>>> You know that c_strlen tries to compute wide character sizes,
>>>> but strlen does not do that: strlen (L"abc") should give 1
>>>> (or 0 on a BE machine).
>>>> I wonder if that is correct.
>>>>
>>> [snip]
>>>>>
>>>>>  static tree
>>>>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>>>>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>>>>  {
>>>>>    if (!validate_arg (arg, POINTER_TYPE))
>>>>>      return NULL_TREE;
>>>>>    else
>>>>>      {
>>>>> -      tree len = c_strlen (arg, 0);
>>>>> -
>>>>> +      tree arr = NULL_TREE;
>>>>> +      tree len = c_strlen (arg, 0, &arr);
>>>>
>>>> Is it possible to write a test case where strlen(L"test") reaches this point?
>>>> what will c_strlen return then?
>>>>
>>>
>>> Yes, of course it is:
>>>
>>> $ cat y.c
>>> int f(char *x)
>>> {
>>>    return __builtin_strlen(x);
>>> }
>>>
>>> int main ()
>>> {
>>>    return f((char*)&L"abcdef"[0]);
>>> }
>>> $ gcc -O3 -S y.c
>>> $ cat y.s
>>> main:
>>> .LFB1:
>>>     .cfi_startproc
>>>     movl    $6, %eax
>>>     ret
>>>     .cfi_endproc
>>>
>>> The reason is that c_strlen tries to fold wide-character strings at all.
>>> I do not know when that was introduced; was it already there before your last patches?
>>
>> The function itself was introduced in 1992 if not earlier,
>> before wide strings even existed.  AFAICS, it has always
>> accepted strings of all widths.  Until r241489 (in GCC 7)
>> it computed their length in bytes, not characters.  I don't
>> know if that was on purpose or if it was just never changed
>> to compute the length in characters when wide strings were
>> first introduced.  From the name I assume it's the latter.
>> The difference wasn't detected until sprintf tests were added
>> for wide string directives.  The ChangeLog description for
>> the change reads: Correctly handle wide strings.  I didn't
>> consider pathological cases like strlen (L"abc").  It
>> shouldn't be difficult to change to fix this case.
>>
> 
> Oh, oh, oh....
> 
> $ cat y3.c
> int main ()
> {
>    char c[100];
>    int x = __builtin_sprintf (c, "%S", L"\uFFFF");
> 
>    __builtin_printf("%d %ld\n", x,__builtin_strlen(c));
> }
> 
> $ gcc-4.8 -O3 -std=c99 y3.c
> $ ./a.out
> -1 0
> $ gcc -O3 y3.c
> $ ./a.out
> 1 0
> $ echo $LANG
> de_DE.UTF-8
> 
> I would have expected L"\uFFFF" to be converted to UTF-8
> or another encoding, so the return value of sprintf is
> far from obvious, and probably language dependent.
FWIW, Martin has a patch (under review) that I think will fix this and
includes a testcase that is likely inspired by the code above.

> 
> Why do you think it is a good idea to use every
> opportunity to optimize totally unnecessary things, like
> using the return value of the sprintf function as-is?
> 
> Did you never think this adds a significant maintenance
> burden on the rest of us as well?
It largely came along for free during the implementation of the sprintf
warnings.  That's changed a bit over time, but it's still the case that
the sprintf warnings do all the analysis necessary to optimize the
sprintf return value.

As both Martin and I have stated before, the real goal is getting good
warnings from sprintf.  Optimization is a distant second.

jeff
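Bernd's point that strlen (L"abc") should give 1 (or 0 on a big-endian machine) follows from strlen counting bytes up to the first zero byte, not wide characters. The small C sketch below assumes a wchar_t wider than one byte with L'a' encoded as 0x61, which holds for the usual UTF-16 and UTF-32 targets. The volatile access only keeps the compiler from computing the result at compile time, the way the quoted y.c example was folded to 6. Both function names are hypothetical.

```c
#include <stddef.h>

/* Count bytes up to the first zero byte, as strlen would, reading
   through a volatile pointer so the result is not folded.  */
static size_t
byte_strlen (const volatile char *p)
{
  size_t n = 0;
  while (p[n] != '\0')
    ++n;
  return n;
}

/* Return 1 on a little-endian host and 0 on a big-endian one.  */
static int
little_endian (void)
{
  const unsigned one = 1;
  return *(const unsigned char *) &one == 1;
}
```

On a little-endian host the first element of L"abc" is stored as 0x61 followed by zero bytes, so `byte_strlen ((const volatile char *) L"abc")` stops after one byte; on a big-endian host the zero bytes come first and the count is 0.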


* Re: [PATCH 0/6] improve handling of char arrays with missing nul (PR 86552, 86711, 86714)
  2018-08-13 21:23   ` [PATCH 0/6] improve handling of char arrays with missing nul (PR 86552, 86711, 86714) Martin Sebor
                       ` (5 preceding siblings ...)
  2018-08-14  3:21     ` [PATCH 2/6] detect unterminated const arrays in strlen " Martin Sebor
@ 2018-08-15  6:02     ` Jeff Law
  2018-08-15 14:47       ` Martin Sebor
  6 siblings, 1 reply; 53+ messages in thread
From: Jeff Law @ 2018-08-15  6:02 UTC (permalink / raw)
  To: Martin Sebor, Gcc Patch List

On 08/13/2018 03:23 PM, Martin Sebor wrote:
> To make reviewing the changes easier I've split up the patch
> into a series:
[ ... ]
I'm about done for the night and thus won't get into the series (and as
you know Bernd has a competing patch in this space).  But I did want to
chime in on two things...

> 
> There are many more string functions where unterminated arrays (constant
> or otherwise) should be diagnosed.  I plan to continue to work on
> those (starting with the constant ones), but I want to post this
> updated patch for review now, mainly so that the wrong code bug
> (PR 86711) can be resolved and the basic detection infrastructure
> agreed on.
Yes, I think we definitely want to focus on the wrong code bug first.

> 
> An open question in my mind is what should GCC do with such calls
> after issuing a warning: replace them with traps?  Fold them into
> constants?  Or continue to pass them through to the corresponding
> library functions?
My personal preference is to turn them into traps.  I don't think we
have to preserve the call itself in this case.  I think the sequencing
is to insert the trap before the call point, split the block after the
trap, remove the outgoing edges, and let DCE clean up the rest.  At least I
think that's the sequencing.

Jeff


* Re: [PATCH 0/6] improve handling of char arrays with missing nul (PR 86552, 86711, 86714)
  2018-08-15  6:02     ` [PATCH 0/6] improve handling of char arrays with missing nul (PR 86552, 86711, 86714) Jeff Law
@ 2018-08-15 14:47       ` Martin Sebor
  2018-08-15 15:42         ` Jeff Law
  0 siblings, 1 reply; 53+ messages in thread
From: Martin Sebor @ 2018-08-15 14:47 UTC (permalink / raw)
  To: Jeff Law, Gcc Patch List

On 08/15/2018 12:02 AM, Jeff Law wrote:
> On 08/13/2018 03:23 PM, Martin Sebor wrote:
>> To make reviewing the changes easier I've split up the patch
>> into a series:
> [ ... ]
> I'm about done for the night and thus won't get into the series (and as
> you know Bernd has a competing patch in this space).  But I did want to
> chime in on two things...
>
>>
>> There are many more string functions where unterminated arrays (constant
>> or otherwise) should be diagnosed.  I plan to continue to work on
>> those (starting with the constant ones), but I want to post this
>> updated patch for review now, mainly so that the wrong code bug
>> (PR 86711) can be resolved and the basic detection infrastructure
>> agreed on.
> Yes, I think we definitely want to focus on the wrong code bug first.
>
>>
>> An open question in my mind is what should GCC do with such calls
>> after issuing a warning: replace them with traps?  Fold them into
>> constants?  Or continue to pass them through to the corresponding
>> library functions?
> My personal preference is to turn them into traps.  I don't think we
> have to preserve the call itself in this case.   I think the sequencing
> is to insert the trap before the call point, split the block after the
> trap, remove the outgoing edges, let DCE clean up the rest.  At least I
> think that's the sequencing.

That sounds fine to me.  It would be close in its effects to
what _FORTIFY_SOURCE does.

It would be helpful to get a broader consensus on this and start
adopting the same consistent solution in all contexts.  The question
has come up a few times, most recently also in PR 86519 (folding
memcmp(a, "a", 3)) where GCC ends up calling the library function.

FWIW, if there are other preferences it might be worthwhile to
consider providing an option to control the behavior in these
cases.  There may also be interactions with or implications for
the sanitizers to consider.

Once there is agreement on what the solution should be I can look
into implementing it at some point in the future.

Martin


* Re: [PATCH 0/6] improve handling of char arrays with missing nul (PR 86552, 86711, 86714)
  2018-08-15 14:47       ` Martin Sebor
@ 2018-08-15 15:42         ` Jeff Law
  2018-08-24 10:13           ` Richard Biener
  0 siblings, 1 reply; 53+ messages in thread
From: Jeff Law @ 2018-08-15 15:42 UTC (permalink / raw)
  To: Martin Sebor, Gcc Patch List

On 08/15/2018 08:47 AM, Martin Sebor wrote:
> On 08/15/2018 12:02 AM, Jeff Law wrote:
>> On 08/13/2018 03:23 PM, Martin Sebor wrote:
>>> To make reviewing the changes easier I've split up the patch
>>> into a series:
>> [ ... ]
>> I'm about done for the night and thus won't get into the series (and as
>> you know Bernd has a competing patch in this space).  But I did want to
>> chime in on two things...
>>
>>>
>>> There are many more string functions where unterminated (constant
>>> or otherwise) should be diagnosed.  I plan to continue to work on
>>> those (with the constant ones first)  but I want to post this
>>> updated patch for review now, mainly so that the wrong code bug
>>> (PR 86711) can be resolved and the basic detection infrastructure
>>> agreed on.
>> Yes, I think we definitely want to focus on the wrong code bug first.
>>
>>>
>>> An open question in my mind is what should GCC do with such calls
>>> after issuing a warning: replace them with traps?  Fold them into
>>> constants?  Or continue to pass them through to the corresponding
>>> library functions?
>> My personal preference is to turn them into traps.  I don't think we
>> have to preserve the call itself in this case.   I think the sequencing
>> is to insert the trap before the call point, split the block after the
>> trap, remove the outgoing edges, let DCE clean up the rest.  At least I
>> think that's the sequencing.
> 
> That sounds fine to me.  It would be close in its effects to
> what _FORTIFY_SOURCE does.
The bad guys are exceedingly resourceful in how they exploit undefined
behavior.  By trapping immediately they don't have any window to do
anything nefarious.

> 
> It would be helpful to get a broader consensus on this and start
> adopting the same consistent solution in all contexts.  The question
> has come up a few times, most recently also in PR 86519 (folding
> memcmp(a, "a", 3)) where GCC ends up calling the library function.
Yup.  We've got a mish-mash of strategies here.

> 
> FWIW, if there are other preferences it might be worthwhile to
> consider providing an option to control the behavior in these
> cases.  There may also be interactions with or implications for
> the sanitizers to consider.
There are some (Marc Glisse, IIRC) who would prefer to see the control
path to the undefined behavior zapped entirely.  We didn't initially do
that because the path may have other observable side effects.  However,
there may be cases where it makes sense.

> 
> Once there is agreement on what the solution should be I can look
> into implementing it at some point in the future.
ACK.  Certainly lower priority than the stuff in flight right now.

jeff


* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-02  2:44     ` PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) ) Martin Sebor
  2018-08-02 13:26       ` Bernd Edlinger
@ 2018-08-17  5:15       ` Jeff Law
  2018-08-17 14:38         ` Martin Sebor
  1 sibling, 1 reply; 53+ messages in thread
From: Jeff Law @ 2018-08-17  5:15 UTC (permalink / raw)
  To: Martin Sebor, Gcc Patch List; +Cc: Bernd Edlinger

On 08/01/2018 08:44 PM, Martin Sebor wrote:
> Since the foundation of the patch is detecting and avoiding
> the overly aggressive folding of unterminated char arrays,
> besides issuing a warning for such arguments to strlen,
> the patch also fixes pr86711 - wrong folding of memchr, and
> pr86714 - tree-ssa-forwprop.c confused by too long initializer.
> 
> The substance of the attached updated patch is unchanged,
> I have just added test cases for the two additional bugs.
> 
Just to be absolutely sure.  The patch attached to this message is
superseded by the later 6-part patchkit that fixes 86714, 86711 and 86552.

I'm just trying to make sure I'm looking at the right kits from both you
and Bernd for the next step.

Thanks,
Jeff


* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-17  5:15       ` Jeff Law
@ 2018-08-17 14:38         ` Martin Sebor
  0 siblings, 0 replies; 53+ messages in thread
From: Martin Sebor @ 2018-08-17 14:38 UTC (permalink / raw)
  To: Jeff Law, Gcc Patch List; +Cc: Bernd Edlinger

On 08/16/2018 11:14 PM, Jeff Law wrote:
> On 08/01/2018 08:44 PM, Martin Sebor wrote:
>> Since the foundation of the patch is detecting and avoiding
>> the overly aggressive folding of unterminated char arrays,
>> besides issuing a warning for such arguments to strlen,
>> the patch also fixes pr86711 - wrong folding of memchr, and
>> pr86714 - tree-ssa-forwprop.c confused by too long initializer.
>>
>> The substance of the attached updated patch is unchanged,
>> I have just added test cases for the two additional bugs.
>>
> Just to be absolutely sure.  The patch attached to this message is
> superseded by the later 6-part patchkit that fixes 86714, 86711 and 86552.
>
> I'm just trying to make sure I'm looking at the right kits from both you
> and Bernd for the next step.

Yes, I broke up this patch into a series, with the basic
"infrastructure" in part 1:

[PATCH 1/6] prevent folding of unterminated const arrays (PR 86711, 86714)
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00769.html

and the strlen missing nul detection in part 2:
[PATCH 2/6] detect unterminated const arrays in strlen calls (PR 86552)
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00778.html

The rest of the series (parts 3 through 6) add the missing
nul detection to a few other built-ins.

Martin


* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-02 13:26       ` Bernd Edlinger
  2018-08-02 18:56         ` Bernd Edlinger
@ 2018-08-24  6:36         ` Jeff Law
  2018-08-24 12:28           ` Bernd Edlinger
  2018-08-24 16:51         ` Jeff Law
  2 siblings, 1 reply; 53+ messages in thread
From: Jeff Law @ 2018-08-24  6:36 UTC (permalink / raw)
  To: Bernd Edlinger, Martin Sebor, Gcc Patch List

On 08/02/2018 07:26 AM, Bernd Edlinger wrote:
> On 08/02/18 04:44, Martin Sebor wrote:
>> Since the foundation of the patch is detecting and avoiding
>> the overly aggressive folding of unterminated char arrays,
>> besides issuing a warning for such arguments to strlen,
>> the patch also fixes pr86711 - wrong folding of memchr, and
>> pr86714 - tree-ssa-forwprop.c confused by too long initializer.
>>
>> The substance of the attached updated patch is unchanged,
>> I have just added test cases for the two additional bugs.
>>
>> Bernd, as I mentioned Wednesday, the patch supersedes
>> yours here:
>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html
>>
> 
> No problem, but I hope you understand, that I still uphold
> my patch.
> 
> So we have two patches now:
> - mine, fixing a wrong code bug,
> - yours, implementing a new warning and fixing a wrong
> code bug at the same time.
> 
> I will add a few comments to your patch below.
[ ... ]

So a lot of the comments are out of date, presumably because Martin
fixed the issues you pointed out in his second version of the patch.
But there are still some useful nuggets in your comments that remain
relevant.

FYI it appears that sometimes you comment above a chunk of code, and
other times below.  That makes it exceedingly difficult to figure out
the issue you're trying to raise.


> 
>> Martin
>>
>> On 07/30/2018 01:17 PM, Martin Sebor wrote:
>>> Attached is an updated version of the patch that handles more
>>> instances of calling strlen() on a constant array that is not
>>> a nul-terminated string.
>>>
>>> No other functions except strlen are explicitly handled yet,
>>> and neither are constant arrays with braced-initializer lists
>>> like const char a[] = { 'a', 'b', 'c' };  I am testing
>>> an independent solution for those (bug 86552).  Once those
>>> are handled the warning will be able to detect those as well.
>>>
>>> Tested on x86_64-linux.
>>>
>>> On 07/25/2018 05:38 PM, Martin Sebor wrote:
>>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html
>>>>
>>>> The fix for bug 86532 has been checked in so this enhancement
>>>> can now be applied on top of it (with only minor adjustments).
>>>>
>>>> On 07/19/2018 02:08 PM, Martin Sebor wrote:
>>>>> In the discussion of my patch for pr86532 Bernd noted that
>>>>> GCC silently accepts constant character arrays with no
>>>>> terminating nul as arguments to strlen (and other string
>>>>> functions).
>>>>>
>>>>> The attached patch is a first step in detecting these kinds
>>>>> of bugs in strlen calls by issuing -Wstringop-overflow.
>>>>> The next step is to modify all other handlers of built-in
>>>>> functions to detect the same problem (not part of this patch).
>>>>> Yet another step is to detect these problems in arguments
>>>>> initialized using the non-string form:
>>>>>
>>>>>   const char a[] = { 'a', 'b', 'c' };
>>>>>
>>>>> This patch is meant to apply on top of the one for bug 86532
>>>>> (I tested it with an earlier version of that patch so there
>>>>> is code in the context that does not appear in the latest
>>>>> version of the other diff).
>>>>>
>>>>> Martin
>>>>>
>>>>
>>>
>>
>> PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too long initializer
>> PR tree-optimization/86711 - wrong folding of memchr
>> PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays
>>
>> gcc/ChangeLog:
>>
>> 	PR tree-optimization/86714
>> 	PR tree-optimization/86711
>> 	PR tree-optimization/86552
>> 	* builtins.h (warn_string_no_nul): Declare..
>> 	(c_strlen): Add argument.
>> 	* builtins.c (warn_string_no_nul): New function.
>> 	(fold_builtin_strlen): Add argument.  Detect missing nul.
>> 	(fold_builtin_1): Adjust.
>> 	(string_length): Add argument and use it.
>> 	(c_strlen): Same.
>> 	(expand_builtin_strlen): Detect missing nul.
>> 	* expr.c (string_constant): Add arguments.  Detect missing nul
>> 	terminator and outermost declaration it's missing in.
>> 	* expr.h (string_constant): Add argument.
>> 	* fold-const.c (c_getstr): Change argument to bool*, rename
>> 	other arguments.
>> 	* fold-const-call.c (fold_const_call): Detect missing nul.
>> 	* gimple-fold.c (get_range_strlen): Add argument.
>> 	(get_maxval_strlen): Adjust.
>> 	* gimple-fold.h (get_range_strlen): Add argument.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	PR tree-optimization/86714
>> 	PR tree-optimization/86711
>> 	PR tree-optimization/86552
>> 	* gcc.c-torture/execute/memchr-1.c: New test.
>> 	* gcc.c-torture/execute/pr86714.c: New test.
>> 	* gcc.dg/warn-strlen-no-nul.c: New test.
>>
>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>> index aa3e0d8..f4924d5 100644
>> --- a/gcc/builtins.c
>> +++ b/gcc/builtins.c
>> @@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, int);
>>  static rtx expand_builtin_expect (tree, rtx);
>>  static tree fold_builtin_constant_p (tree);
>>  static tree fold_builtin_classify_type (tree);
>> -static tree fold_builtin_strlen (location_t, tree, tree);
>> +static tree fold_builtin_strlen (location_t, tree, tree, tree);
>>  static tree fold_builtin_inf (location_t, tree, int);
>>  static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
>>  static bool validate_arg (const_tree, enum tree_code code);
>> @@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
>>    return n;
>>  }
>>  
>> +/* For a call expression EXP to a function that expects a string argument,
>> +   issue a diagnostic due to it being a called with an argument NONSTR
>> +   that is a character array with no terminating NUL.  */
>> +
>> +void
>> +warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
>> +{
>> +  loc = expansion_point_location_if_in_system_header (loc);
>> +
>> +  bool warned;
>> +  if (exp)
>> +    {
>> +      if (!fndecl)
>> +	fndecl = get_callee_fndecl (exp);
>> +      warned = warning_at (loc, OPT_Wstringop_overflow_,
>> +			   "%K%qD argument missing terminating nul",
>> +			   exp, fndecl);
>> +    }
>> +  else
>> +    {
>> +      gcc_assert (fndecl);
>> +      warned = warning_at (loc, OPT_Wstringop_overflow_,
>> +			   "%qD argument missing terminating nul",
>> +			   fndecl);
>> +    }
>> +
>> +  if (warned && DECL_P (nonstr))
>> +    inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared here");
>> +}
>> +
>>  /* Compute the length of a null-terminated character string or wide
>>     character string handling character sizes of 1, 2, and 4 bytes.
>>     TREE_STRING_LENGTH is not the right way because it evaluates to
>> @@ -567,37 +597,60 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
>>     accesses.  Note that this implies the result is not going to be emitted
>>     into the instruction stream.
>>  
>> +   When ARR is non-null and the string is not properly nul-terminated,
>> +   set *ARR to the declaration of the outermost constant object whose
>> +   initializer (or one of its elements) is not nul-terminated.
>> +
>>     The value returned is of type `ssizetype'.
>>  
>>     Unfortunately, string_constant can't access the values of const char
>>     arrays with initializers, so neither can we do so here.  */
>>  
>>  tree
>> -c_strlen (tree src, int only_value)
>> +c_strlen (tree src, int only_value, tree *arr /* = NULL */)
>>  {
>>    STRIP_NOPS (src);
>> +
>> +  /* Used to detect non-nul-terminated strings in subexpressions
>> +     of a conditional expression.  When ARR is null, point it at
>> +     one of the elements for simplicity.  */
>> +  tree arrs[] = { NULL_TREE, NULL_TREE };
>> +  if (!arr)
>> +    arr = arrs;
> 
> now arr is always != NULL
Right.  It's renamed in the follow-up patch from Martin, but yes, it's
always non-null.  Note in this case your comment is after the code
you're referring to.
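
The idiom in the hunk above points an optional out-parameter at local scratch storage so that the rest of the function can store through it unconditionally. It looks like this in plain C. This is a minimal sketch with a hypothetical `scan_len` function, not code from the patch. It also shows why a later `!arr`-style test, like the `if (!arr && maxelts < strelts)` check further down, can never be true once the redirection has happened.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: return the length of the string in BUF, looking
   at no more than N bytes.  If BUF has no nul within N bytes, return N
   and, when the caller asked for it, set *UNTERM to BUF; otherwise set
   *UNTERM to NULL.  */
size_t scan_len (const char *buf, size_t n, const char **unterm)
{
  /* Scratch slot used when the caller passes a null pointer, so the
     code below can store through UNTERM without testing it.  */
  const char *scratch = NULL;
  if (!unterm)
    unterm = &scratch;

  /* From here on UNTERM is never null, so a later "if (!unterm)"
     test would be dead code.  */
  const char *nul = memchr (buf, '\0', n);
  *unterm = nul ? NULL : buf;
  return nul ? (size_t) (nul - buf) : n;
}
```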

> 
>> +
>>    if (TREE_CODE (src) == COND_EXPR
>>        && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
>>      {
>> -      tree len1, len2;
>> -
>> -      len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
>> -      len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
>> +      tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs);
>> +      tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1);
>>        if (tree_int_cst_equal (len1, len2))
>> -	return len1;
>> +	{
> 
> funny, if called with NULL *arr and arrs[0] alias each other.

> 
>> +	  *arr = arrs[0] ? arrs[0] : arrs[1];
>> +	  return len1;
>> +	}
>>      }

And in this case it looks like your comment is before the code you're
commenting about.  It was fairly obvious in this case because the code
prior to your "funny, if called..." comment didn't reference arr or arrs
at all.

And more importantly, why are you concerned about the aliasing?
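
A small model of the COND_EXPR merge suggests why the aliasing is benign. The right-hand side of `*arr = arrs[0] ? arrs[0] : arrs[1]` is evaluated before the store. When `arr` points at `arrs[0]`, the result simply lands in the local array, which in c_strlen nobody reads afterwards. `merge_unterminated` is a hypothetical stand-in written for this illustration, not a GCC function.

```c
#include <stddef.h>

/* Hypothetical model of the COND_EXPR merge in c_strlen: ARRS holds the
   unterminated arrays (or NULL) found in the two arms, and ARR is where
   the merged result goes.  ARR may point at ARRS[0] itself.  */
void merge_unterminated (const char **arr, const char *arrs[2])
{
  /* The right-hand side is fully evaluated before the store, so the
     result is the same whether or not ARR aliases ARRS[0].  */
  *arr = arrs[0] ? arrs[0] : arrs[1];
}
```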

>>  
>>    if (TREE_CODE (src) == COMPOUND_EXPR
>>        && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
>> -    return c_strlen (TREE_OPERAND (src, 1), only_value);
>> +    return c_strlen (TREE_OPERAND (src, 1), only_value, arr);
>>  
>>    location_t loc = EXPR_LOC_OR_LOC (src, input_location);
>>  
>>    /* Offset from the beginning of the string in bytes.  */
>>    tree byteoff;
>> -  src = string_constant (src, &byteoff);
>> -  if (src == 0)
>> -    return NULL_TREE;
>> +  /* Set if array is nul-terminated, false otherwise.  */
>> +  bool nulterm;
> 
> note arr is always != null or pointing to arrs[0].
> 
>> +  src = string_constant (src, &byteoff, &nulterm, arr);
>> +  if (!src)
>> +    {
>> +      *arr = arrs[0] ? arrs[0] : arrs[1];
>> +      return NULL_TREE;
>> +    }
>> +
>> +  /* Clear *ARR when the string is nul-terminated.  It should be
>> +     of no interest to callers.  */
>> +  if (nulterm)
>> +    *arr = NULL_TREE;
>>  
>>    /* Determine the size of the string element.  */
>>    unsigned eltsize
>> @@ -621,6 +674,12 @@ c_strlen (tree src, int only_value)
>>  	maxelts = maxelts / eltsize - 1;
>>        }
>>  
>> +  /* Unless the caller is prepared to handle it by passing in a non-null
>> +     ARR, fail if the terminating nul doesn't fit in the array the string
>> +     is stored in (as in const char a[3] = "123";  */
> 
> note arr is always != NULL, thus this if is never taken.
> 
>> +  if (!arr && maxelts < strelts)
>> +    return NULL_TREE;
>> +
Right.  And I think that check is gone in the second version of Martin's
patch.


>>    /* PTR can point to the byte representation of any string type, including
>>       char* and wchar_t*.  */
>>    const char *ptr = TREE_STRING_POINTER (src);
>> @@ -650,7 +709,8 @@ c_strlen (tree src, int only_value)
>>        offsave = fold_convert (ssizetype, offsave);
>>        tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
>>  				      build_int_cst (ssizetype, len * eltsize));
> 
> this computation is wrong, it computes not in units of eltsize,
> I am however not sure if it is really good that this function tries to
> compute strlen of wide character strings.
Note that in the second version of Martin's patch eltsize will always be
1 when we get here.  Furthermore, the multiplication by eltsize is gone.
It looks like you switched back to commenting after the code for that
comment, but then immediately thereafter...


> 
> That said, please fix this computation first, in a different patch
> instead of just fixing the indentation. (I know I pointed that lines are too
> long here, but that was before I realized that the whole length computation
> here is wrong).
> 
>> -      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
>> +      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize),
>> +				     offsave);
>>        return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
>>  			      build_zero_cst (ssizetype));
>>      }
>> @@ -690,7 +750,7 @@ c_strlen (tree src, int only_value)
>>       Since ELTOFF is our starting index into the string, no further
>>       calculation is needed.  */
The comment shown above appears to refer to the code below the comment.
Again, this makes it exceedingly confusing to understand your comments
and take appropriate action.

> 
> What are you fixing here, I think that was another bug.
> If this fixes something then it should be in a different patch,
> just handling this.
> 
>>    unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
>> -				maxelts - eltoff);
>> +				strelts - eltoff);
>>  
>>    return ssize_int (len);
>>  }
I'm guessing the comment in this case refers to the code after.
Presumably questioning the change from maxelts to strelts.
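
Whichever bound is passed, a scan like `string_length (ptr, eltsize, maxelts)` looks at no more than that many elements. The choice between `maxelts` and `strelts` therefore determines how far past the start of the data the scan may go. The function below is a hypothetical reimplementation of such a bounded scan for 1-, 2- and 4-byte elements. It is written only to illustrate the role of the bound and is not GCC's `string_length`.

```c
/* Hypothetical sketch of a bounded scan: return the index of the first
   element of ELTSIZE bytes (1, 2 or 4) whose bytes are all zero,
   examining at most MAXELTS elements.  Return MAXELTS when no such
   element is found within the bound.  */
unsigned bounded_length (const void *ptr, unsigned eltsize, unsigned maxelts)
{
  const unsigned char *p = ptr;
  unsigned n = 0;
  for (; n < maxelts; n++, p += eltsize)
    {
      /* An element is a terminator only if every one of its bytes
         is zero.  */
      int all_zero = 1;
      for (unsigned i = 0; i < eltsize; i++)
        if (p[i] != 0)
          {
            all_zero = 0;
            break;
          }
      if (all_zero)
        break;
    }
  return n;
}
```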

>> @@ -2855,7 +2915,6 @@ expand_builtin_strlen (tree exp, rtx target,
>>  
>>    struct expand_operand ops[4];
>>    rtx pat;
>> -  tree len;
>>    tree src = CALL_EXPR_ARG (exp, 0);
>>    rtx src_reg;
>>    rtx_insn *before_strlen;
>> @@ -2864,20 +2923,39 @@ expand_builtin_strlen (tree exp, rtx target,
>>    unsigned int align;
>>  
>>    /* If the length can be computed at compile-time, return it.  */
>> -  len = c_strlen (src, 0);
>> +  tree array;
>> +  tree len = c_strlen (src, 0, &array);
> 
> You know the c_strlen tries to compute wide character sizes,
> but strlen does not do that, strlen (L"abc") should give 1
> (or 0 on a BE machine)
> I wonder if that is correct.
So I think this is fixed by your change, which restored the default
behavior of c_strlen to count bytes.  That makes c_strlen match the
behavior of the strlen library call.

So for something like L"abc", the right value is 1 or 0 for LE and BE
targets respectively.
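
You can observe the byte-wise behavior directly by handing the object representation of a wide string to the narrow `strlen`. This sketch assumes `wchar_t` is wider than one byte, which is true on common targets, and a plain little- or big-endian byte order. The helper names are made up for the example.

```c
#include <string.h>
#include <wchar.h>

/* Apply the narrow strlen to the bytes of a wide string.  strlen stops
   at the first zero byte, so the result depends on byte order rather
   than on the number of wide characters.  */
size_t narrow_strlen_of_wide (const wchar_t *ws)
{
  return strlen ((const char *) ws);
}

/* Return 1 on a little-endian target, 0 on a big-endian one.  */
int little_endian (void)
{
  const unsigned int one = 1;
  return *(const unsigned char *) &one == 1;
}
```

On a little-endian target the first byte of L"abc" is 'a' followed by a zero byte, so the result is 1. On a big-endian target the first byte is already zero, so the result is 0.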

> 
>>    if (len)
>> -    return expand_expr (len, target, target_mode, EXPAND_NORMAL);
>> +    {
>> +      if (array)
>> +	{
>> +	  /* Array refers to the non-nul terminated constant array
>> +	     whose length is attempted to be computed.  */
> 
> I really wonder if it would not make more sense to have a
> nonterminated_string_constant_p instead.
Rather than a boolean I think we want a tree * so that we can bubble up
more information to the caller WRT non-terminated strings.

> 
> Last time I wanted to implement a warning in expand I faced the
> problem that inlined functions will get one warning per invocation?
Yea, but that's just the way things are.  I don't think it's really an
issue for the patches we're looking at right now.

>> @@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg)
>>    return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));
>>  }
>>  
>> -/* Fold a call to __builtin_strlen with argument ARG.  */
>> +/* Fold a strlen call to FNDECL of TYPE, and with argument ARG.  */
>>  
>>  static tree
>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>  {
>>    if (!validate_arg (arg, POINTER_TYPE))
>>      return NULL_TREE;
>>    else
>>      {
>> -      tree len = c_strlen (arg, 0);
>> -
>> +      tree arr = NULL_TREE;
>> +      tree len = c_strlen (arg, 0, &arr);
> 
> Is it possible to write a test case where strlen(L"test") reaches this point?
> what will c_strlen return then?
Given your fix to have c_strlen count bytes by default, I think we're OK
here.

> 
>> @@ -11328,7 +11332,7 @@ string_constant (tree arg, tree *ptr_offset)
>>  	return NULL_TREE;
>>  
>>        tree offset;
>> -      if (tree str = string_constant (arg0, &offset))
>> +      if (tree str = string_constant (arg0, &offset, nulterm, decl))
>>  	{
>>  	  /* Avoid pointers to arrays (see bug 86622).  */
>>  	  if (POINTER_TYPE_P (TREE_TYPE (arg))
>> @@ -11368,6 +11372,10 @@ string_constant (tree arg, tree *ptr_offset)
>>    if (TREE_CODE (array) == STRING_CST)
> 
> Well, actually I think there _are_ STRING_CSTs which are not null terminated.
> Maybe not in C. But Fortran, Ada, Go...
> 
>>      {
>>        *ptr_offset = fold_convert (sizetype, offset);
>> +      if (nulterm)
>> +	*nulterm = true;
>> +      if (decl)
>> +	*decl = NULL_TREE;
>>        return array;
>>      }
So your comment seems to be referring to the code above the comment as
well as below.  Confusing.  Consistency really helps.

Are we at a point where we're ready to declare STRING_CSTs as always
being properly terminated?  If not the hunk of code above needs some
rethinking since we can't guarantee the string is properly terminated.
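
In C, at least, it is easy to produce unterminated arrays from string literals. C11 6.7.9p14 stores the terminating null character only "if there is room". How the front end represents such an initializer as a STRING_CST is exactly the open question here. The sketch below only shows the source-level behavior and uses a hypothetical `has_nul` helper.

```c
#include <stddef.h>
#include <string.h>

/* In C (unlike C++), a string literal may initialize an array that has
   no room for the terminating nul; the nul is simply not stored.  */
const char unterminated[3] = "abc";

/* Return nonzero when the N bytes at P contain a nul.  */
int has_nul (const void *p, size_t n)
{
  return memchr (p, '\0', n) != NULL;
}
```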


>>  
>> @@ -11414,6 +11422,49 @@ string_constant (tree arg, tree *ptr_offset)
>>    if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
>>      return NULL_TREE;
>>  
>> +  unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size);
>> +
> 
> I don't understand why this is necessary at all.
> It looks way too complicated, to say the least.
> 
> TREE_TYPE (init) has already the type of the member.
> 
>> +  /* When ARG refers to an aggregate (of arrays) try to determine
>> +     the size of the character array within the aggregate.  */
>> +  tree ref = arg;
>> +  tree reftype = TREE_TYPE (arg);
>> +
>> +  if (TREE_CODE (ref) == MEM_REF)
>> +    {
>> +      ref = TREE_OPERAND (ref, 0);
>> +      if (TREE_CODE (ref) == ADDR_EXPR)
>> +	{
>> +	  ref = TREE_OPERAND (ref, 0);
>> +	  reftype = TREE_TYPE (ref);
>> +	}
>> +    }
>> +  else
>> +    while (TREE_CODE (ref) == ARRAY_REF)
>> +      {
>> +	reftype = TREE_TYPE (ref);
>> +	ref = TREE_OPERAND (ref, 0);
>> +      }
>> +
>> +  if (TREE_CODE (ref) == COMPONENT_REF)
>> +    reftype = TREE_TYPE (ref);
>> +
>> +  while (TREE_CODE (reftype) == ARRAY_TYPE)
>> +    {
>> +      tree next = TREE_TYPE (reftype);
>> +      if (TREE_CODE (next) == INTEGER_TYPE)
>> +	{
>> +	  if (tree size = TYPE_SIZE_UNIT (reftype))
>> +	    if (tree_fits_uhwi_p (size))
>> +	      array_elts = tree_to_uhwi (size);
> 
> so array_elts is measured in bytes.
So I'm guessing your comment about the code looking way too complicated
is referring to the code *after* your comment.  That code is not in the
v2 patch.  At least not 01/06 which addresses the codegen/opt issues.  I
haven't checked the full kit to see if this code appears in a subsequent
patch.


> 
>> +	  break;
>> +	}
>> +
>> +      reftype = TREE_TYPE (reftype);
>> +    }
>> +
>> +  if (decl)
>> +    *decl = array;
>> +
>>    /* Avoid returning a string that doesn't fit in the array
>>       it is stored in, like
>>       const char a[4] = "abcde";
>> @@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
>>    unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
>>    length = string_length (TREE_STRING_POINTER (init), charsize,
>>  			  length / charsize);
> 
> Some callers, especially those where the wrong code happens, expect
> to be able to access the STRING_CST up to TREE_STRING_LENGTH,
> but using string_length assumes they stop at the first nul char.
Example please?
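
The following illustrates the distinction but is not an example of the specific callers Bernd has in mind, which are not identified here. A literal with an embedded nul stores more bytes than a nul-terminated scan of it reports. Code that reads up to the stored length and code that stops at the first nul therefore disagree about its extent. The names below are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* A literal with an embedded nul: 6 bytes are stored ("ab", '\0', "cd",
   and the final '\0'), but a nul-terminated view stops after 2.  */
static const char embedded[] = "ab\0cd";

/* Extent as seen by code that uses the full stored length.  */
size_t stored_bytes (void)
{
  return sizeof embedded;
}

/* Extent as seen by code that stops at the first nul.  */
size_t length_to_first_nul (void)
{
  return strlen (embedded);
}
```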


> 
>> -  if (compare_tree_int (array_size, length + 1) < 0)
>> +  if (nulterm)
> 
> but here you compare bytes with length which is measured in chars.
> 
>> +    *nulterm = array_elts > length;
>> +  else if (array_elts <= length)
>>      return NULL_TREE;
>>  
>>    *ptr_offset = offset;
Seems wrong.  But my eyes are glazing over badly.  I'm going to have to
look at this and your subsequent comments tomorrow.
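
A wide array with exactly enough room for its characters makes Bernd's units concern concrete. As he notes, `array_elts` is measured in bytes (it may be set from TYPE_SIZE_UNIT), while `length` comes from `string_length` in characters. The sketch below models the two comparisons with hypothetical helpers. When `wchar_t` is wider than one byte, only the element-based comparison reports the array as unterminated.

```c
#include <stddef.h>
#include <wchar.h>

/* A wide array with no room for the terminating nul (valid C).  */
const wchar_t wide[3] = L"abc";

/* Number of leading nonzero elements, examining at most N elements.  */
size_t wide_len_bounded (const wchar_t *s, size_t n)
{
  size_t i = 0;
  while (i < n && s[i] != 0)
    i++;
  return i;
}

/* Consistent units: array size and string length both in elements.  */
int terminated_by_elements (void)
{
  size_t elts = sizeof wide / sizeof wide[0];
  return elts > wide_len_bounded (wide, elts);
}

/* Mixed units: array size in bytes vs. string length in characters.
   With wchar_t wider than one byte this wrongly reports "terminated".  */
int terminated_by_bytes (void)
{
  size_t elts = sizeof wide / sizeof wide[0];
  return sizeof wide > wide_len_bounded (wide, elts);
}
```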

Jeff


* Re: [PATCH 0/6] improve handling of char arrays with missing nul (PR 86552, 86711, 86714)
  2018-08-15 15:42         ` Jeff Law
@ 2018-08-24 10:13           ` Richard Biener
  0 siblings, 0 replies; 53+ messages in thread
From: Richard Biener @ 2018-08-24 10:13 UTC (permalink / raw)
  To: Jeff Law; +Cc: Martin Sebor, GCC Patches

On Wed, Aug 15, 2018 at 5:42 PM Jeff Law <law@redhat.com> wrote:
>
> On 08/15/2018 08:47 AM, Martin Sebor wrote:
> > On 08/15/2018 12:02 AM, Jeff Law wrote:
> >> On 08/13/2018 03:23 PM, Martin Sebor wrote:
> >>> To make reviewing the changes easier I've split up the patch
> >>> into a series:
> >> [ ... ]
> >> I'm about done for the night and thus won't get into the series (and as
> >> you know Bernd has a competing patch in this space).  But I did want to
> >> chime in on two things...
> >>
> >>>
> >>> There are many more string functions where unterminated (constant
> >>> or otherwise) should be diagnosed.  I plan to continue to work on
> >>> those (with the constant ones first)  but I want to post this
> >>> updated patch for review now, mainly so that the wrong code bug
> >>> (PR 86711) can be resolved and the basic detection infrastructure
> >>> agreed on.
> >> Yes, I think we definitely want to focus on the wrong code bug first.
> >>
> >>>
> >>> An open question in my mind is what should GCC do with such calls
> >>> after issuing a warning: replace them with traps?  Fold them into
> >>> constants?  Or continue to pass them through to the corresponding
> >>> library functions?
> >> My personal preference is to turn them into traps.  I don't think we
> >> have to preserve the call itself in this case.   I think the sequencing
> >> is to insert the trap before the call point, split the block after the
> >> trap, remove the outgoing edges, let DCE clean up the rest.  At least I
> >> think that's the sequencing.
> >
> > That sounds fine to me.  It would be close in its effects to
> > what _FORTIFY_SOURCE does.
> The bad guys are exceedingly resourceful in how they exploit undefined
> behavior.  By trapping immediately they don't have any window to do
> anything nefarious.
>
> >
> > It would be helpful to get a broader consensus on this and start
> > adopting the same consistent solution in all contexts.  The question
> > has come up a few times, most recently also in PR 86519 (folding
> > memcmp(a, "a", 3)) where GCC ends up calling the library function.
> Yup.  We've got a mish-mash of strategies here.

Folding cannot easily turn something "regular" like memcmp into a
noreturn call.  Not all callers expect that to happen.  So what you'd
need to do is ensure GF_CALL_CTRL_ALTERING is not set on the replacement
trap().  The next fixup_cfg () pass will fix things for you then.

> >
> > FWIW, if there are other preferences it might be worthwhile to
> > consider providing an option to control the behavior in these
> > cases.  There may also be interactions with or implications for
> > the sanitizers to consider.
> There's some (Marc Glisse IIRC) that would prefer to see the control
> path to the undefined behavior zapped entirely.  We didn't initially do
> that because the path may have other observable side effects.  However,
> there may be cases where it makes sense.

You can't remove observable side-effects, and given that there exist
things like signal handlers for SIGSEGV, even changing a memcmp
to __builtin_trap() may change observable behavior.

This is why some places in GCC simply refuse to optimize "broken"
cases but keep calling the library.
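
Standard POSIX calls are enough to show the observability point: a program can install a SIGSEGV handler and regain control after a fault. In the sketch below, `raise (SIGSEGV)` stands in for a library call that faults on an out-of-bounds read. A compiler-inserted trap would typically deliver a different signal, and this handler would not catch it. On x86, GCC's `__builtin_trap()` executes `ud2`, which Linux reports as SIGILL. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>

static sigjmp_buf recover_point;

/* SIGSEGV handler: jump back to the recovery point instead of dying.  */
static void on_segv (int sig)
{
  (void) sig;
  siglongjmp (recover_point, 1);
}

/* Run an operation that faults with SIGSEGV while the handler is
   installed, and report whether the program regained control (1).
   sigsetjmp saves the signal mask so SIGSEGV is unblocked again after
   the jump, which lets the function be called repeatedly.  */
int survives_segv (void)
{
  void (*old) (int) = signal (SIGSEGV, on_segv);
  int recovered = 0;
  if (sigsetjmp (recover_point, 1) == 0)
    raise (SIGSEGV);   /* stand-in for a faulting library call */
  else
    recovered = 1;
  signal (SIGSEGV, old);
  return recovered;
}
```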

Richard.

> >
> > Once there is agreement on what the solution should be I can look
> > into implementing it at some point in the future.
> ACK.  Certainly lower priority than the stuff in flight right now.
>
> jeff


* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-24  6:36         ` Jeff Law
@ 2018-08-24 12:28           ` Bernd Edlinger
  2018-08-24 16:04             ` Jeff Law
  0 siblings, 1 reply; 53+ messages in thread
From: Bernd Edlinger @ 2018-08-24 12:28 UTC (permalink / raw)
  To: Jeff Law, Martin Sebor, Gcc Patch List

On 08/24/18 08:36, Jeff Law wrote:
> On 08/02/2018 07:26 AM, Bernd Edlinger wrote:
>> On 08/02/18 04:44, Martin Sebor wrote:
>>> Since the foundation of the patch is detecting and avoiding
>>> the overly aggressive folding of unterminated char arrays,
>>> besides issuing a warning for such arguments to strlen,
>>> the patch also fixes pr86711 - wrong folding of memchr, and
>>> pr86714 - tree-ssa-forwprop.c confused by too long initializer.
>>>
>>> The substance of the attached updated patch is unchanged,
>>> I have just added test cases for the two additional bugs.
>>>
>>> Bernd, as I mentioned Wednesday, the patch supersedes
>>> yours here:
>>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html
>>>
>>
>> No problem, but I hope you understand, that I still uphold
>> my patch.
>>
>> So we have two patches now:
>> - mine, fixing a wrong code bug,
>> - yours, implementing a new warning and fixing a wrong
>> code bug at the same time.
>>
>> I will add a few comments to your patch below.
> [ ... ]
> 
> So a lot of the comments are out of date, presumably because Martin
> fixed the issues you pointed out in his second version of the patch.
> But there are still some useful nuggets in your comments that are still
> relevant.
> 

Yes, possibly, and therefore I would like updated patches to summarize
the changes.

> FYI it appears that sometimes you comment above a chunk of code, and
> other times below.  That makes it exceedingly difficult to figure out
> the issue you're trying to raise.
> 

Okydoky.


> 
>>
>>> Martin
>>>
>>> On 07/30/2018 01:17 PM, Martin Sebor wrote:
>>>> Attached is an updated version of the patch that handles more
>>>> instances of calling strlen() on a constant array that is not
>>>> a nul-terminated string.
>>>>
>>>> No other functions except strlen are explicitly handled yet,
>>>> and neither are constant arrays with braced-initializer lists
>>>> like const char a[] = { 'a', 'b', 'c' };  I am testing
>>>> an independent solution for those (bug 86552).  Once those
>>>> are handled the warning will be able to detect those as well.
>>>>
>>>> Tested on x86_64-linux.
>>>>
>>>> On 07/25/2018 05:38 PM, Martin Sebor wrote:
>>>>> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html
>>>>>
>>>>> The fix for bug 86532 has been checked in so this enhancement
>>>>> can now be applied on top of it (with only minor adjustments).
>>>>>
>>>>> On 07/19/2018 02:08 PM, Martin Sebor wrote:
>>>>>> In the discussion of my patch for pr86532 Bernd noted that
>>>>>> GCC silently accepts constant character arrays with no
>>>>>> terminating nul as arguments to strlen (and other string
>>>>>> functions).
>>>>>>
>>>>>> The attached patch is a first step in detecting these kinds
>>>>>> of bugs in strlen calls by issuing -Wstringop-overflow.
>>>>>> The next step is to modify all other handlers of built-in
>>>>>> functions to detect the same problem (not part of this patch).
>>>>>> Yet another step is to detect these problems in arguments
>>>>>> initialized using the non-string form:
>>>>>>
>>>>>>    const char a[] = { 'a', 'b', 'c' };
>>>>>>
>>>>>> This patch is meant to apply on top of the one for bug 86532
>>>>>> (I tested it with an earlier version of that patch so there
>>>>>> is code in the context that does not appear in the latest
>>>>>> version of the other diff).
>>>>>>
>>>>>> Martin
>>>>>>
>>>>>
>>>>
>>>
>>> PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too long initializer
>>> PR tree-optimization/86711 - wrong folding of memchr
>>> PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays
>>>
>>> gcc/ChangeLog:
>>>
>>> 	PR tree-optimization/86714
>>> 	PR tree-optimization/86711
>>> 	PR tree-optimization/86552
>>> 	* builtins.h (warn_string_no_nul): Declare..
>>> 	(c_strlen): Add argument.
>>> 	* builtins.c (warn_string_no_nul): New function.
>>> 	(fold_builtin_strlen): Add argument.  Detect missing nul.
>>> 	(fold_builtin_1): Adjust.
>>> 	(string_length): Add argument and use it.
>>> 	(c_strlen): Same.
>>> 	(expand_builtin_strlen): Detect missing nul.
>>> 	* expr.c (string_constant): Add arguments.  Detect missing nul
>>> 	terminator and outermost declaration it's missing in.
>>> 	* expr.h (string_constant): Add argument.
>>> 	* fold-const.c (c_getstr): Change argument to bool*, rename
>>> 	other arguments.
>>> 	* fold-const-call.c (fold_const_call): Detect missing nul.
>>> 	* gimple-fold.c (get_range_strlen): Add argument.
>>> 	(get_maxval_strlen): Adjust.
>>> 	* gimple-fold.h (get_range_strlen): Add argument.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 	PR tree-optimization/86714
>>> 	PR tree-optimization/86711
>>> 	PR tree-optimization/86552
>>> 	* gcc.c-torture/execute/memchr-1.c: New test.
>>> 	* gcc.c-torture/execute/pr86714.c: New test.
>>> 	* gcc.dg/warn-strlen-no-nul.c: New test.
>>>
>>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>>> index aa3e0d8..f4924d5 100644
>>> --- a/gcc/builtins.c
>>> +++ b/gcc/builtins.c
>>> @@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, int);
>>>   static rtx expand_builtin_expect (tree, rtx);
>>>   static tree fold_builtin_constant_p (tree);
>>>   static tree fold_builtin_classify_type (tree);
>>> -static tree fold_builtin_strlen (location_t, tree, tree);
>>> +static tree fold_builtin_strlen (location_t, tree, tree, tree);
>>>   static tree fold_builtin_inf (location_t, tree, int);
>>>   static tree rewrite_call_expr (location_t, tree, int, tree, int, ...);
>>>   static bool validate_arg (const_tree, enum tree_code code);
>>> @@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
>>>     return n;
>>>   }
>>>   
>>> +/* For a call expression EXP to a function that expects a string argument,
>>> +   issue a diagnostic due to it being a called with an argument NONSTR
>>> +   that is a character array with no terminating NUL.  */
>>> +
>>> +void
>>> +warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr)
>>> +{
>>> +  loc = expansion_point_location_if_in_system_header (loc);
>>> +
>>> +  bool warned;
>>> +  if (exp)
>>> +    {
>>> +      if (!fndecl)
>>> +	fndecl = get_callee_fndecl (exp);
>>> +      warned = warning_at (loc, OPT_Wstringop_overflow_,
>>> +			   "%K%qD argument missing terminating nul",
>>> +			   exp, fndecl);
>>> +    }
>>> +  else
>>> +    {
>>> +      gcc_assert (fndecl);
>>> +      warned = warning_at (loc, OPT_Wstringop_overflow_,
>>> +			   "%qD argument missing terminating nul",
>>> +			   fndecl);
>>> +    }
>>> +
>>> +  if (warned && DECL_P (nonstr))
>>> +    inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared here");
>>> +}
>>> +
>>>   /* Compute the length of a null-terminated character string or wide
>>>      character string handling character sizes of 1, 2, and 4 bytes.
>>>      TREE_STRING_LENGTH is not the right way because it evaluates to
>>> @@ -567,37 +597,60 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts)
>>>      accesses.  Note that this implies the result is not going to be emitted
>>>      into the instruction stream.
>>>   
>>> +   When ARR is non-null and the string is not properly nul-terminated,
>>> +   set *ARR to the declaration of the outermost constant object whose
>>> +   initializer (or one of its elements) is not nul-terminated.
>>> +
>>>      The value returned is of type `ssizetype'.
>>>   
>>>      Unfortunately, string_constant can't access the values of const char
>>>      arrays with initializers, so neither can we do so here.  */
>>>   
>>>   tree
>>> -c_strlen (tree src, int only_value)
>>> +c_strlen (tree src, int only_value, tree *arr /* = NULL */)
>>>   {
>>>     STRIP_NOPS (src);
>>> +
>>> +  /* Used to detect non-nul-terminated strings in subexpressions
>>> +     of a conditional expression.  When ARR is null, point it at
>>> +     one of the elements for simplicity.  */
>>> +  tree arrs[] = { NULL_TREE, NULL_TREE };
>>> +  if (!arr)
>>> +    arr = arrs;
>>
>> now arr is always != NULL
> Right.  It's renamed in the follow-up patch from Martin, but yes, it's
> always non-null.  Note in this case your comment is after the code
> you're referring to.
> 
>>
>>> +
>>>     if (TREE_CODE (src) == COND_EXPR
>>>         && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
>>>       {
>>> -      tree len1, len2;
>>> -
>>> -      len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
>>> -      len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
>>> +      tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs);
>>> +      tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1);
>>>         if (tree_int_cst_equal (len1, len2))
>>> -	return len1;
>>> +	{
>>
>> funny, if called with NULL *arr and arrs[0] alias each other.
> 
>>
>>> +	  *arr = arrs[0] ? arrs[0] : arrs[1];
>>> +	  return len1;
>>> +	}
>>>       }
> 
> And in this case it looks like your comment is before the code you're
> commenting about.  It was fairly obvious in this case because the code
> prior to your "funny, if called..." comment didn't reference arr or arrs
> at all.
> 
> And more importantly, why are you concerned about the aliasing?
> 

It is just that *arr = arrs[0] does nothing, but it looks as if the author
was not aware of that.  It may be okay, but it causes head-scratching.
Without the context you would think this does something different.

> 
> 
> 
> 
>>>   
>>>     if (TREE_CODE (src) == COMPOUND_EXPR
>>>         && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
>>> -    return c_strlen (TREE_OPERAND (src, 1), only_value);
>>> +    return c_strlen (TREE_OPERAND (src, 1), only_value, arr);
>>>   
>>>     location_t loc = EXPR_LOC_OR_LOC (src, input_location);
>>>   
>>>     /* Offset from the beginning of the string in bytes.  */
>>>     tree byteoff;
>>> -  src = string_constant (src, &byteoff);
>>> -  if (src == 0)
>>> -    return NULL_TREE;
>>> +  /* Set if array is nul-terminated, false otherwise.  */
>>> +  bool nulterm;
>>
>> note arr is always != null or pointing to arrs[0].
>>
>>> +  src = string_constant (src, &byteoff, &nulterm, arr);
>>> +  if (!src)
>>> +    {
>>> +      *arr = arrs[0] ? arrs[0] : arrs[1];
>>> +      return NULL_TREE;
>>> +    }
>>> +
>>> +  /* Clear *ARR when the string is nul-terminated.  It should be
>>> +     of no interest to callers.  */
>>> +  if (nulterm)
>>> +    *arr = NULL_TREE;
>>>   
>>>     /* Determine the size of the string element.  */
>>>     unsigned eltsize
>>> @@ -621,6 +674,12 @@ c_strlen (tree src, int only_value)
>>>   	maxelts = maxelts / eltsize - 1;
>>>         }
>>>   
>>> +  /* Unless the caller is prepared to handle it by passing in a non-null
>>> +     ARR, fail if the terminating nul doesn't fit in the array the string
>>> +     is stored in (as in const char a[3] = "123";  */
>>
>> note arr is always != NULL, thus this if is never taken.
>>
>>> +  if (!arr && maxelts < strelts)
>>> +    return NULL_TREE;
>>> +
> Right.  And I think that check is gone in the second version of Martin's
> patch.
> 
> 
>>>     /* PTR can point to the byte representation of any string type, including
>>>        char* and wchar_t*.  */
>>>     const char *ptr = TREE_STRING_POINTER (src);
>>> @@ -650,7 +709,8 @@ c_strlen (tree src, int only_value)
>>>         offsave = fold_convert (ssizetype, offsave);
>>>         tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
>>>   				      build_int_cst (ssizetype, len * eltsize));
>>
>> this computation is wrong: it does not compute in units of eltsize,
>> I am however not sure if it is really good that this function tries to
>> compute strlen of wide character strings.
> Note that in the second version of Martin's patch eltsize will always be
> 1 when we get here.  Furthermore, the multiplication by eltsize is gone.
>   It looks like you switched back to commenting after the code for that
> comment, but then immediately thereafter...
> 

Actually, I had no idea that the second patch resolved some of my comments,
or which ones.

I had the impression that it just split a large patch into several
smaller ones without reworking anything at the same time.

Once again, a summary of what was changed would be helpful.

> 
>>
>> That said, please fix this computation first, in a different patch
>> instead of just fixing the indentation. (I know I pointed out that lines are too
>> long here, but that was before I realized that the whole length computation
>> here is wrong).
>>
>>> -      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave);
>>> +      tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize),
>>> +				     offsave);
>>>         return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp,
>>>   			      build_zero_cst (ssizetype));
>>>       }
>>> @@ -690,7 +750,7 @@ c_strlen (tree src, int only_value)
>>>        Since ELTOFF is our starting index into the string, no further
>>>        calculation is needed.  */
> The comment shown above appears to refer to the code below the comment.
> Again, this makes it exceedingly confusing to understand your comments
> and take appropriate action.
> 
> 
> 
> 
>>
>> What are you fixing here?  I think that was another bug.
>> If this fixes something then it should be in a different patch,
>> just handling this.
>>
>>>     unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
>>> -				maxelts - eltoff);
>>> +				strelts - eltoff);
>>>   
>>>     return ssize_int (len);
>>>   }
> I'm guessing the comment in this case refers to the code after.
> Presumably questioning the change from maxelts to strelts.
> 
> 

Yes.  I was thinking that could be a patch of its own.

> 
> 
>>> @@ -2855,7 +2915,6 @@ expand_builtin_strlen (tree exp, rtx target,
>>>   
>>>     struct expand_operand ops[4];
>>>     rtx pat;
>>> -  tree len;
>>>     tree src = CALL_EXPR_ARG (exp, 0);
>>>     rtx src_reg;
>>>     rtx_insn *before_strlen;
>>> @@ -2864,20 +2923,39 @@ expand_builtin_strlen (tree exp, rtx target,
>>>     unsigned int align;
>>>   
>>>     /* If the length can be computed at compile-time, return it.  */
>>> -  len = c_strlen (src, 0);
>>> +  tree array;
>>> +  tree len = c_strlen (src, 0, &array);
>>
>> You know the c_strlen tries to compute wide character sizes,
>> but strlen does not do that, strlen (L"abc") should give 1
>> (or 0 on a BE machine)
>> I wonder if that is correct.
> So I think this is fixed by your change which restored the default
> behavior of c_strlen to count bytes.  Which restores the behavior of
> c_strlen to match that of the strlen library call.
> 
> So for something like L"abc", the right value is 1 or 0 for LE and BE
> targets respectively.
> 

Well, yes, although I have changed my mind on strlen(L"abc") in the meantime.

This may look as if I am contradicting myself, but it is more that I am learning.

The installed patch for the ELTSIZE parameter was partly inspired by the
idea of "not folding invalid stuff" that was forming in my mind
at that time, even before I wrote it down and sent it to this list.

So the patch declines to return a value for strlen(L"abc"), since
it is an invalid call.

Previously it counted bytes in a wide character string, but I doubt
that was the original intention.
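The "1 or 0 on LE and BE targets" figure quoted above follows from the library strlen scanning bytes and knowing nothing about the element size. A minimal sketch of that behavior, assuming a 4-byte wchar_t (a 2-byte wchar_t gives the same 1/0 results). The helper and array names are hypothetical, and the wide string's object representation is written out by hand so the result does not depend on the host's byte order; it deliberately avoids passing a wchar_t* to strlen, which is the invalid call being discussed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of what the C library strlen sees when handed a
   wide string: it counts bytes up to the first zero byte.  MAXBYTES
   bounds the scan so it never leaves the object.  */
static size_t bytewise_strlen(const unsigned char *p, size_t maxbytes)
{
  assert(p != NULL);
  size_t n = 0;
  while (n < maxbytes && p[n] != 0)
    n++;
  return n;
}

/* Object representations of L"abc" with a 4-byte wchar_t, spelled out
   for a little-endian and a big-endian target.  */
static const unsigned char wabc_le[] = { 'a', 0, 0, 0,  'b', 0, 0, 0,
                                         'c', 0, 0, 0,  0, 0, 0, 0 };
static const unsigned char wabc_be[] = { 0, 0, 0, 'a',  0, 0, 0, 'b',
                                         0, 0, 0, 'c',  0, 0, 0, 0 };
```

On the little-endian layout the scan stops after the low byte of 'a' and returns 1; on the big-endian layout the very first byte is zero and it returns 0.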


> 
> 
>>
>>>     if (len)
>>> -    return expand_expr (len, target, target_mode, EXPAND_NORMAL);
>>> +    {
>>> +      if (array)
>>> +	{
>>> +	  /* Array refers to the non-nul terminated constant array
>>> +	     whose length is attempted to be computed.  */
>>
>> I really wonder if it would not make more sense to have a
>> nonterminated_string_constant_p instead.
> Rather than a boolean I think we want a tree * so that we can bubble up
> more information to the caller WRT non-terminated strings.
> 
>>
>> Last time I wanted to implement a warning in expand I faced the
>> problem that inlined functions will get one warning per invocation?
> Yea, but that's just the way things are.  I don't think it's really an
> issue for the patches we're looking at right now.
> 

No.

>>> @@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg)
>>>     return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));
>>>   }
>>>   
>>> -/* Fold a call to __builtin_strlen with argument ARG.  */
>>> +/* Fold a strlen call to FNDECL of TYPE, and with argument ARG.  */
>>>   
>>>   static tree
>>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>>   {
>>>     if (!validate_arg (arg, POINTER_TYPE))
>>>       return NULL_TREE;
>>>     else
>>>       {
>>> -      tree len = c_strlen (arg, 0);
>>> -
>>> +      tree arr = NULL_TREE;
>>> +      tree len = c_strlen (arg, 0, &arr);
>>
>> Is it possible to write a test case where strlen(L"test") reaches this point?
>> what will c_strlen return then?
> Given your fix to have c_strlen count bytes by default, I think we're OK
> here.
> 

Yes.  But my fix was still incomplete.
So incremental progress is necessary here.

>>
>>> @@ -11328,7 +11332,7 @@ string_constant (tree arg, tree *ptr_offset)
>>>   	return NULL_TREE;
>>>   
>>>         tree offset;
>>> -      if (tree str = string_constant (arg0, &offset))
>>> +      if (tree str = string_constant (arg0, &offset, nulterm, decl))
>>>   	{
>>>   	  /* Avoid pointers to arrays (see bug 86622).  */
>>>   	  if (POINTER_TYPE_P (TREE_TYPE (arg))
>>> @@ -11368,6 +11372,10 @@ string_constant (tree arg, tree *ptr_offset)
>>>     if (TREE_CODE (array) == STRING_CST)
>>
>> Well, actually I think there _are_ STRING_CSTs which are not null terminated.
>> Maybe not in C. But Fortran, Ada, Go...
>>
>>>       {
>>>         *ptr_offset = fold_convert (sizetype, offset);
>>> +      if (nulterm)
>>> +	*nulterm = true;
>>> +      if (decl)
>>> +	*decl = NULL_TREE;
>>>         return array;
>>>       }
> So your comment seems to be referring to the code above the comment as
> well as below.  Confusing.  Consistency really helps.
> 
> Are we at a point where we're ready to declare STRING_CSTs as always
> being properly terminated?  If not the hunk of code above needs some
> rethinking since we can't guarantee the string is properly terminated.
> 

No.  That is still in flux.
And I am not sure whether "properly terminated" will survive.

I am digging down to the foundations now,
and the correctness of the code in question depends heavily on the
results.  Therefore I would like to fix this from the ground up.

Adding lots of new code that is based on wrong assumptions
will not help.


> 
>>>   
>>> @@ -11414,6 +11422,49 @@ string_constant (tree arg, tree *ptr_offset)
>>>     if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
>>>       return NULL_TREE;
>>>   
>>> +  unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size);
>>> +
>>
>> I don't understand why this is necessary at all.
>> It looks way too complicated, to say the least.
>>
>> TREE_TYPE (init) has already the type of the member.
>>
>>> +  /* When ARG refers to an aggregate (of arrays) try to determine
>>> +     the size of the character array within the aggregate.  */
>>> +  tree ref = arg;
>>> +  tree reftype = TREE_TYPE (arg);
>>> +
>>> +  if (TREE_CODE (ref) == MEM_REF)
>>> +    {
>>> +      ref = TREE_OPERAND (ref, 0);
>>> +      if (TREE_CODE (ref) == ADDR_EXPR)
>>> +	{
>>> +	  ref = TREE_OPERAND (ref, 0);
>>> +	  reftype = TREE_TYPE (ref);
>>> +	}
>>> +    }
>>> +  else
>>> +    while (TREE_CODE (ref) == ARRAY_REF)
>>> +      {
>>> +	reftype = TREE_TYPE (ref);
>>> +	ref = TREE_OPERAND (ref, 0);
>>> +      }
>>> +
>>> +  if (TREE_CODE (ref) == COMPONENT_REF)
>>> +    reftype = TREE_TYPE (ref);
>>> +
>>> +  while (TREE_CODE (reftype) == ARRAY_TYPE)
>>> +    {
>>> +      tree next = TREE_TYPE (reftype);
>>> +      if (TREE_CODE (next) == INTEGER_TYPE)
>>> +	{
>>> +	  if (tree size = TYPE_SIZE_UNIT (reftype))
>>> +	    if (tree_fits_uhwi_p (size))
>>> +	      array_elts = tree_to_uhwi (size);
>>
>> so array_elts is measured in bytes.
> So I'm guessing your comment about the code looking way too complicated
> is referring to the code *after* your comment.  That code is not in the
> v2 patch.  At least not 01/06 which addresses the codegen/opt issues.  I
> haven't checked the full kit to see if this code appears in a subsequent
> patch.
> 

No, I meant the code between
/* When ARG refers to an aggregate (of arrays) try to determine
    the size of the character array within the aggregate.  */

and here.  It is wrong (it assumes C semantics on GIMPLE).

> 
>>
>>> +	  break;
>>> +	}
>>> +
>>> +      reftype = TREE_TYPE (reftype);
>>> +    }
>>> +
>>> +  if (decl)
>>> +    *decl = array;
>>> +
>>>     /* Avoid returning a string that doesn't fit in the array
>>>        it is stored in, like
>>>        const char a[4] = "abcde";
>>> @@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
>>>     unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
>>>     length = string_length (TREE_STRING_POINTER (init), charsize,
>>>   			  length / charsize);
>>
>> Some callers especially those where the wrong code happens, expect
>> to be able to access the STRING_CST up to TREE_STRING_LENGTH,
>> but using string_length assumes they stop at the first nul char.
> Example please?
> 

tree-ssa-forwprop.c:
               str1 = string_constant (src1, &off1);
               if (str1 == NULL_TREE)
                 break;
               if (!tree_fits_uhwi_p (off1)
                   || compare_tree_int (off1, TREE_STRING_LENGTH (str1) - 1) > 0
                   || compare_tree_int (len1, TREE_STRING_LENGTH (str1)
                                              - tree_to_uhwi (off1)) > 0



Bernd.


* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-24 12:28           ` Bernd Edlinger
@ 2018-08-24 16:04             ` Jeff Law
  2018-08-24 21:56               ` Bernd Edlinger
  0 siblings, 1 reply; 53+ messages in thread
From: Jeff Law @ 2018-08-24 16:04 UTC (permalink / raw)
  To: Bernd Edlinger, Martin Sebor, Gcc Patch List

On 08/24/2018 06:27 AM, Bernd Edlinger wrote:
[ Lots of snipping throughout ]


>>>
>>>> +
>>>>     if (TREE_CODE (src) == COND_EXPR
>>>>         && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
>>>>       {
>>>> -      tree len1, len2;
>>>> -
>>>> -      len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
>>>> -      len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
>>>> +      tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs);
>>>> +      tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1);
>>>>         if (tree_int_cst_equal (len1, len2))
>>>> -	return len1;
>>>> +	{
>>>
>>> funny, if called with NULL *arr and arrs[0] alias each other.
>>
>>>
>>>> +	  *arr = arrs[0] ? arrs[0] : arrs[1];
>>>> +	  return len1;
>>>> +	}
>>>>       }
>>
>> And in this case it looks like your comment is before the code you're
>> commenting about.  It was fairly obvious in this case because the code
>> prior to your "funny, if called..." comment didn't reference arr or arrs
>> at all.
>>
>> And more importantly, why are you concerned about the aliasing?
>>
> 
> It is just that *arr = arrs[0] does nothing, but it looks as if the author
> was not aware of that.  It may be okay, but it causes head-scratching.
> Without the context you would think this does something different.
*arr = arrs[0] ? arrs[0] : arrs[1];

Yes, there's potentially a dead store when arr points to arrs[0] because
of the earlier initialization when NULL was passed for arr.  But
otherwise we'd be checking repeatedly that arr != NULL.
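The trade-off described above can be seen in isolation in the following hypothetical reduction. It is not GCC code: pick_nonstr, sub0 and sub1 are invented names, with sub0/sub1 standing in for what the two recursive c_strlen calls on the arms of a COND_EXPR store through arrs and arrs + 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of c_strlen's handling of its ARR out-parameter.  */
static const char *pick_nonstr(const char *sub0, const char *sub1,
                               const char **arr)
{
  const char *arrs[] = { NULL, NULL };

  /* When the caller passes NULL, redirect ARR to a local slot so the
     code below never has to test ARR for null again.  */
  if (!arr)
    arr = arrs;
  assert(arr != NULL);

  /* In c_strlen the recursive calls fill these in.  */
  arrs[0] = sub0;
  arrs[1] = sub1;

  const char *chosen = arrs[0] ? arrs[0] : arrs[1];

  /* If ARR == &ARRS[0], this either stores ARRS[0] into itself or writes
     a slot nobody reads again: a dead store, but a harmless one.  */
  *arr = chosen;
  return chosen;
}
```

The redirect costs at most one redundant store in the NULL case, in exchange for not repeating `if (arr)` at every exit from the function.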


>>>>     /* PTR can point to the byte representation of any string type, including
>>>>        char* and wchar_t*.  */
>>>>     const char *ptr = TREE_STRING_POINTER (src);
>>>> @@ -650,7 +709,8 @@ c_strlen (tree src, int only_value)
>>>>         offsave = fold_convert (ssizetype, offsave);
>>>>         tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
>>>>   				      build_int_cst (ssizetype, len * eltsize));
>>>
>>> this computation is wrong: it does not compute in units of eltsize,
>>> I am however not sure if it is really good that this function tries to
>>> compute strlen of wide character strings.
>> Note that in the second version of Martin's patch eltsize will always be
>> 1 when we get here.  Furthermore, the multiplication by eltsize is gone.
>>   It looks like you switched back to commenting after the code for that
>> comment, but then immediately thereafter...
>>
> 
> Actually, I had no idea that the second patch resolved some of my comments,
> or which ones.
It happens.  Multiple long threads make it difficult to follow.

> 
> I had the impression that it just split a large patch into
> several smaller ones without reworking anything at the same time.
> 
> Once again, a summary of what was changed would be helpful.
Agreed.


>>> What are you fixing here?  I think that was another bug.
>>> If this fixes something then it should be in a different patch,
>>> just handling this.
>>>
>>>>     unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
>>>> -				maxelts - eltoff);
>>>> +				strelts - eltoff);
>>>>   
>>>>     return ssize_int (len);
>>>>   }
>> I'm guessing the comment in this case refers to the code after.
>> Presumably questioning the change from maxelts to strelts.
>>
>>
> 
> Yes.  I was thinking that could be a patch of its own.
Funny.  I found myself in agreement and was going to extract this,
run it through the usual tests, and get it onto the trunk
immediately.  Then I realized you'd already fixed this in the patch that
added the eltsize parameter to c_strlen, which has already been committed
:-)



> 
> Well, yes, although I have changed my mind on strlen(L"abc") in the meantime.
> 
> This may look as if I am contradicting myself, but it is more that I am learning.
> 
> The installed patch for the ELTSIZE parameter was partly inspired by the
> idea of "not folding invalid stuff" that was forming in my mind
> at that time, even before I wrote it down and sent it to this list.
ACK.  And I think consensus seems to be forming around that
concept, which is good.

If we ultimately decide not to fold strlen of a wide character string,
then that'll be an easy enough change to make.  In the meantime, making
c_strlen consistent with how the C library strlen works is a good thing IMHO.


>>>> @@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg)
>>>>     return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));
>>>>   }
>>>>   
>>>> -/* Fold a call to __builtin_strlen with argument ARG.  */
>>>> +/* Fold a strlen call to FNDECL of TYPE, and with argument ARG.  */
>>>>   
>>>>   static tree
>>>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>>>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>>>   {
>>>>     if (!validate_arg (arg, POINTER_TYPE))
>>>>       return NULL_TREE;
>>>>     else
>>>>       {
>>>> -      tree len = c_strlen (arg, 0);
>>>> -
>>>> +      tree arr = NULL_TREE;
>>>> +      tree len = c_strlen (arg, 0, &arr);
>>>
>>> Is it possible to write a test case where strlen(L"test") reaches this point?
>>> what will c_strlen return then?
>> Given your fix to have c_strlen count bytes by default, I think we're OK
>> here.
>>
> 
> Yes.  But my fix was still incomplete.
> So incremental progress is necessary here.
That's fine.  I'm generally a fan of incremental improvements.

>>>
>>>>       {
>>>>         *ptr_offset = fold_convert (sizetype, offset);
>>>> +      if (nulterm)
>>>> +	*nulterm = true;
>>>> +      if (decl)
>>>> +	*decl = NULL_TREE;
>>>>         return array;
>>>>       }
>> So your comment seems to be referring to the code above the comment as
>> well as below.  Confusing.  Consistency really helps.
>>
>> Are we at a point where we're ready to declare STRING_CSTs as always
>> being properly terminated?  If not the hunk of code above needs some
>> rethinking since we can't guarantee the string is properly terminated.
>>
> 
> No.  That is still in flux.
> And I am not sure whether "properly terminated" will survive.
> 
> I am digging down to the foundations now,
> and the correctness of the code in question depends heavily on the
> results.  Therefore I would like to fix this from the ground up.
> 
> Adding lots of new code that is based on wrong assumptions
> will not help.
OK.  So clearly this hunk needs rethinking.  I'm not sure if/how
critical the code above is -- we may be able to get away with deferring
this chunk.  I'll have to look at that more deeply.
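For reference, the source-level case that the nulterm output of string_constant is meant to flag can be reproduced in C with an array exactly as long as its string initializer. A minimal sketch: the helper names are hypothetical, and it shows C semantics only, making no claim about how GCC represents the initializer's STRING_CST internally (the part Bernd is reworking).

```c
#include <assert.h>
#include <stddef.h>

/* Valid C (ill-formed in C++): the initializer fills all three elements,
   so no terminating nul is stored.  Calling strlen on this array is
   undefined behavior; it is the case the proposed warning targets.  */
static const char unterminated[3] = "abc";
static const char terminated[4] = "abc";

/* Length of the string in S, scanning at most SIZE elements.  A result
   equal to SIZE means no nul was found inside the array.  */
static size_t bounded_strlen(const char *s, size_t size)
{
  assert(s != NULL);
  size_t n = 0;
  while (n < size && s[n] != '\0')
    n++;
  return n;
}

/* Nonzero when the SIZE-element array S contains a terminating nul.  */
static int is_nul_terminated(const char *s, size_t size)
{
  return bounded_strlen(s, size) < size;
}
```

Both arrays report a length of 3, but only the four-element one has its terminator inside the object.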

> 
> 
>>
>>>>   
>>>> @@ -11414,6 +11422,49 @@ string_constant (tree arg, tree *ptr_offset)
>>>>     if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
>>>>       return NULL_TREE;
>>>>   
>>>> +  unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size);
>>>> +
>>>
>>> I don't understand why this is necessary at all.
>>> It looks way too complicated, to say the least.
>>>
>>> TREE_TYPE (init) has already the type of the member.
>>>
>>>> +  /* When ARG refers to an aggregate (of arrays) try to determine
>>>> +     the size of the character array within the aggregate.  */
>>>> +  tree ref = arg;
>>>> +  tree reftype = TREE_TYPE (arg);
>>>> +
>>>> +  if (TREE_CODE (ref) == MEM_REF)
>>>> +    {
>>>> +      ref = TREE_OPERAND (ref, 0);
>>>> +      if (TREE_CODE (ref) == ADDR_EXPR)
>>>> +	{
>>>> +	  ref = TREE_OPERAND (ref, 0);
>>>> +	  reftype = TREE_TYPE (ref);
>>>> +	}
>>>> +    }
>>>> +  else
>>>> +    while (TREE_CODE (ref) == ARRAY_REF)
>>>> +      {
>>>> +	reftype = TREE_TYPE (ref);
>>>> +	ref = TREE_OPERAND (ref, 0);
>>>> +      }
>>>> +
>>>> +  if (TREE_CODE (ref) == COMPONENT_REF)
>>>> +    reftype = TREE_TYPE (ref);
>>>> +
>>>> +  while (TREE_CODE (reftype) == ARRAY_TYPE)
>>>> +    {
>>>> +      tree next = TREE_TYPE (reftype);
>>>> +      if (TREE_CODE (next) == INTEGER_TYPE)
>>>> +	{
>>>> +	  if (tree size = TYPE_SIZE_UNIT (reftype))
>>>> +	    if (tree_fits_uhwi_p (size))
>>>> +	      array_elts = tree_to_uhwi (size);
>>>
>>> so array_elts is measured in bytes.
>> So I'm guessing your comment about the code looking way too complicated
>> is referring to the code *after* your comment.  That code is not in the
>> v2 patch.  At least not 01/06 which addresses the codegen/opt issues.  I
>> haven't checked the full kit to see if this code appears in a subsequent
>> patch.
>>
> 
> No, I meant the code between
> /* When ARG refers to an aggregate (of arrays) try to determine
>     the size of the character array within the aggregate.  */
> 
> and here.  It is wrong (it assumes C semantics on GIMPLE).
Ah.  Yea.  I think we're broadly in agreement we can't do that for
anything which affects optimization/codegen.  Given those bits aren't in
Martin's v2 patch I think we can move on.


> 
>>
>>>
>>>> +	  break;
>>>> +	}
>>>> +
>>>> +      reftype = TREE_TYPE (reftype);
>>>> +    }
>>>> +
>>>> +  if (decl)
>>>> +    *decl = array;
>>>> +
>>>>     /* Avoid returning a string that doesn't fit in the array
>>>>        it is stored in, like
>>>>        const char a[4] = "abcde";
>>>> @@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
>>>>     unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
>>>>     length = string_length (TREE_STRING_POINTER (init), charsize,
>>>>   			  length / charsize);
>>>
>>> Some callers especially those where the wrong code happens, expect
>>> to be able to access the STRING_CST up to TREE_STRING_LENGTH,
>>> but using string_length assumes they stop at the first nul char.
>> Example please?
>>
> 
> tree-ssa-forwprop.c:
>                str1 = string_constant (src1, &off1);
>                if (str1 == NULL_TREE)
>                  break;
>                if (!tree_fits_uhwi_p (off1)
>                    || compare_tree_int (off1, TREE_STRING_LENGTH (str1) - 1) > 0
>                    || compare_tree_int (len1, TREE_STRING_LENGTH (str1)
>                                               - tree_to_uhwi (off1)) > 0
I understand that you're concerned about reading past the end of the
returned STRING_CST via the subsequent memcpy in tree-ssa-forwprop.c
which starts at src1+off1 and reads len1 bytes.

But I'm not sure how that can happen in the V2 patch from Martin.  In
fact, one of the fundamental goals of the V2 patch is to avoid this
exact problem.
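The forwprop guard quoted above boils down to a plain bounds check, sketched below as a hypothetical model (not the forwprop code itself). The check is only as safe as its input: it assumes TREE_STRING_LENGTH is an accurate bound on the bytes that really belong to the object, which is what Bernd's concern and the V2 patch are both about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the quoted guard.  STR_BYTES plays the role of
   TREE_STRING_LENGTH (str1), OFF of off1 and LEN of len1.  Returns true
   when reading LEN bytes starting at OFF stays within STR_BYTES.  */
static bool read_within_constant(size_t str_bytes, size_t off, size_t len)
{
  assert(str_bytes > 0);
  /* Mirrors the off1 > TREE_STRING_LENGTH (str1) - 1 test.  */
  if (off > str_bytes - 1)
    return false;
  /* Mirrors the len1 > TREE_STRING_LENGTH (str1) - off1 test.  */
  if (len > str_bytes - off)
    return false;
  return true;
}
```

For "abc" (4 bytes including the nul), reading 4 bytes from offset 0 passes, while reading 4 bytes from offset 1 or anything from offset 4 is rejected.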

Jeff


* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-02 13:26       ` Bernd Edlinger
  2018-08-02 18:56         ` Bernd Edlinger
  2018-08-24  6:36         ` Jeff Law
@ 2018-08-24 16:51         ` Jeff Law
  2018-08-24 17:26           ` Bernd Edlinger
  2 siblings, 1 reply; 53+ messages in thread
From: Jeff Law @ 2018-08-24 16:51 UTC (permalink / raw)
  To: Bernd Edlinger, Martin Sebor, Gcc Patch List

On 08/02/2018 07:26 AM, Bernd Edlinger wrote:
> 
>> -  if (compare_tree_int (array_size, length + 1) < 0)
>> +  if (nulterm)
> but here you compare bytes with length, which is measured in chars.
> 
>> +    *nulterm = array_elts > length;
>> +  else if (array_elts <= length)
>>      return NULL_TREE;
>>  
>>    *ptr_offset = offset;
Actually, in the V2 patch, length is measured in elements.  So I think
this is a non-issue.
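The units question can be made concrete with a small C sketch. It assumes array_bytes corresponds to the array's TYPE_SIZE_UNIT and that length is counted in characters (excluding the nul), as in the V2 patch; nul_fits is a hypothetical name. Dividing by the character size before comparing is what keeps a 6-byte array of 2-byte characters from appearing to hold a 3-character string plus its nul; a raw byte comparison (6 > 3) would wrongly say it does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: ARRAY_BYTES is the size of the array in bytes,
   CHARSIZE the size of one (possibly wide) character, and LENGTH the
   string length in characters, excluding the terminating nul.
   The nul fits only if the array has more elements than LENGTH.  */
static bool nul_fits(size_t array_bytes, size_t charsize, size_t length)
{
  size_t array_elts = array_bytes / charsize;   /* bytes -> elements */
  return array_elts > length;
}
```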



>> -      /* Compute and store the length of the substring at OFFSET.
>> +      /* Compute and store the size of the substring at OFFSET.
>>  	 All offsets past the initial length refer to null strings.  */
>> -      if (offset <= string_length)
>> -	*strlen = string_length - offset;
> this should be offset < string_length.
> 
>> +      if (offset <= string_size)
>> +	*strsize = string_size - offset;
>>        else
>> -	*strlen = 0;
> this should be 1; you may access the NUL byte of "".
> 
>> +	*strsize = 0;
>>      }
Agreed in both cases based on the defined behavior for the function (NUL
is included).
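The corrected size computation can be sketched as a standalone C function. The name substring_size is hypothetical, and string_size is assumed to count the bytes of the constant including its nul (as TREE_STRING_LENGTH does for an ordinary string literal); offsets are in bytes.

```c
#include <stddef.h>

/* Hypothetical model of the corrected rule: STRING_SIZE is the number of
   bytes in the constant, including the terminating nul, and OFFSET is in
   bytes.  The result is the number of readable bytes at OFFSET, nul
   included.  */
static size_t substring_size(size_t string_size, size_t offset)
{
  if (offset < string_size)
    return string_size - offset;
  /* Past the end the caller gets "", whose single nul byte is readable.  */
  return 1;
}
```

With the original <= test, an offset equal to string_size would report 0 readable bytes, while the corrected version reports the single nul of the empty string returned for such offsets.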


>>  
>> -  const char *string = TREE_STRING_POINTER (src);
>> -
>> -  if (string_length == 0
>> -      || offset >= string_size)
>> +  if (string_size == 0
>> +      || offset >= array_size)
>>      return NULL;
>>  
>> -  if (strsize)
>> -    {
>> -      /* Support even constant character arrays that aren't proper
>> -	 NUL-terminated strings.  */
>> -      *strsize = string_size;
>> -    }
>> -  else if (string[string_length - 1] != '\0')
> Well, this is broken for wide character strings.
> But I hope we can get rid of STRING_CSTs which are
> not explicitly null terminated.
This hunk is gone in the V2 patch.  So I'm not going to worry about it
right now.



> 
>> +  if (!nulterm && string[string_size - 1] != '\0')
>>      {
>> -      /* Support only properly NUL-terminated strings but handle
>> -	 consecutive strings within the same array, such as the six
>> -	 substrings in "1\0002\0003".  */
>> +      /* When NULTERM is null, support only properly nul-terminated
>> +	 strings but handle consecutive strings within the same array,
>> +	 such as the six substrings in "1\0002\0003".  Otherwise, let
>> +	 the caller deal with non-nul-terminated arrays.  */
>>        return NULL;
>>      }
>>  
>> -  return offset <= string_length ? string + offset : "";
> this should be offset < string_size.
> 
>> +  return offset <= string_size ? string + offset : "";
Agreed.
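Read together with the guard above it (string_size == 0 || offset >= array_size), the agreed-upon return can be modeled in plain C. This is a sketch, not the GCC function: getstr_tail is a hypothetical name, string is assumed to hold exactly string_size initializer bytes, and the rest of the array_size-byte array is assumed to be zero-filled.

```c
#include <stddef.h>

/* Hypothetical model of the tail of c_getstr: STRING holds STRING_SIZE
   bytes of initializer and the enclosing array is ARRAY_SIZE bytes long.
   Offsets inside the initializer point into STRING; offsets in the
   zero-filled remainder of the array map to "".  */
static const char *getstr_tail(const char *string, size_t string_size,
                               size_t array_size, size_t offset)
{
  if (string_size == 0 || offset >= array_size)
    return NULL;
  return offset < string_size ? string + offset : "";
}
```

With <=, an offset equal to string_size would yield a pointer one past the last initialized byte, which must not be dereferenced.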

jeff

* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-24 16:51         ` Jeff Law
@ 2018-08-24 17:26           ` Bernd Edlinger
  2018-08-24 23:54             ` Jeff Law
  0 siblings, 1 reply; 53+ messages in thread
From: Bernd Edlinger @ 2018-08-24 17:26 UTC (permalink / raw)
  To: Jeff Law, Martin Sebor, Gcc Patch List

On 08/24/18 18:51, Jeff Law wrote:
> On 08/02/2018 07:26 AM, Bernd Edlinger wrote:
>>
>>> -  if (compare_tree_int (array_size, length + 1) < 0)
>>> +  if (nulterm)
>> but here you compare bytes with length, which is measured in chars.
>>
>>> +    *nulterm = array_elts > length;
>>> +  else if (array_elts <= length)
>>>       return NULL_TREE;
>>>   
>>>     *ptr_offset = offset;
> Actually, in the V2 patch, length is measured in elements.  So I think
> this is a non-issue.
> 
> 
> 
>>> -      /* Compute and store the length of the substring at OFFSET.
>>> +      /* Compute and store the size of the substring at OFFSET.
>>>   	 All offsets past the initial length refer to null strings.  */
>>> -      if (offset <= string_length)
>>> -	*strlen = string_length - offset;
>> this should be offset < string_length.
>>
>>> +      if (offset <= string_size)
>>> +	*strsize = string_size - offset;
>>>         else
>>> -	*strlen = 0;
>> this should be 1; you may access the NUL byte of "".
>>
>>> +	*strsize = 0;
>>>       }
> Agreed in both cases based on the defined behavior for the function (NUL
> is included).
> 
> 
>>>   
>>> -  const char *string = TREE_STRING_POINTER (src);
>>> -
>>> -  if (string_length == 0
>>> -      || offset >= string_size)
>>> +  if (string_size == 0
>>> +      || offset >= array_size)
>>>       return NULL;
>>>   
>>> -  if (strsize)
>>> -    {
>>> -      /* Support even constant character arrays that aren't proper
>>> -	 NUL-terminated strings.  */
>>> -      *strsize = string_size;
>>> -    }
>>> -  else if (string[string_length - 1] != '\0')
>> Well, this is broken for wide character strings.
>> But I hope we can get rid of STRING_CSTs which are
>> not explicitly null terminated.

I am afraid that is not going to happen.
Maybe we can get STRING_CSTs that are never longer
than the TYPE_SIZE_UNIT, but c_strlen and c_getstr
need to take care that the string is zero-terminated.

string_constant should not promise that the string is zero-terminated.
But instead it can promise that:
1) the STRING_CST is valid up to TREE_STRING_LENGTH
2) mem_size is >= TREE_STRING_LENGTH
3) memory between TREE_STRING_LENGTH and mem_size is ZERO.

It will not guarantee anything about zero termination any more.
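The three guarantees can be captured in a small C model. It is only a sketch: struct str_model, read_byte and nul_terminated_at are hypothetical, plain pointers and sizes stand in for the STRING_CST and the object it initializes, and mem_size is assumed to satisfy guarantee 2.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the proposed string_constant guarantees:
   BYTES is valid for STRING_LENGTH bytes (1), MEM_SIZE >= STRING_LENGTH
   (2), and bytes in [STRING_LENGTH, MEM_SIZE) read as zero (3).  */
struct str_model
{
  const unsigned char *bytes;
  size_t string_length;   /* plays the role of TREE_STRING_LENGTH */
  size_t mem_size;        /* size of the object being initialized */
};

/* Read byte I of the object; only meaningful for I < MEM_SIZE.  */
static unsigned char read_byte(const struct str_model *m, size_t i)
{
  return i < m->string_length ? m->bytes[i] : 0;
}

/* The guarantees say nothing about termination, so look for a nul
   between OFF and MEM_SIZE explicitly.  */
static bool nul_terminated_at(const struct str_model *m, size_t off)
{
  for (size_t i = off; i < m->mem_size; i++)
    if (read_byte(m, i) == 0)
      return true;
  return false;
}
```

For example, under these rules const char a[3] = "abc"; would be modeled with string_length and mem_size both 3. That satisfies all three guarantees, yet nul_terminated_at returns false for it, which is why c_strlen and c_getstr would still have to check termination themselves.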

But that will again need an adjustment to the string_constant
and likely to my 86711/86714/87053 patch set.

Sorry for all this back and forth.

> This hunk is gone in the V2 patch.  So I'm not going to worry about it
> right now.
> 
> 

Hmm.  In my tree I wanted to drop this check first, but that
did not work right. So I skip the check if strsize is not NULL.
But if it is NULL, I check that the element size is 1; if it is
not 1, I return NULL, because it is not a character string.
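The order of checks described here can be sketched as a C function. The name getstr_policy and its parameters are hypothetical; it only models the decision order (no termination check when the caller asks for the size, otherwise narrow strings only, and then a trailing nul), not the actual patch.

```c
#include <stddef.h>

/* Hypothetical model: when the caller asks for the size (STRSIZE
   non-null) no termination check is done; otherwise only narrow
   strings (ELTSIZE == 1) whose last byte is a nul are accepted.  */
static const char *getstr_policy(const char *string, size_t string_size,
                                 size_t eltsize, size_t *strsize)
{
  if (string_size == 0)
    return NULL;
  if (strsize)
    {
      *strsize = string_size;   /* caller handles unterminated data */
      return string;
    }
  if (eltsize != 1)
    return NULL;                /* not a character string */
  if (string[string_size - 1] != '\0')
    return NULL;                /* not nul-terminated */
  return string;
}
```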

I had several iterations here, and it might still be dependent
on semantic properties of STRING_CST that may not be guaranteed
in light of my recent discoveries.


In the end, the best approach might be to either merge my patch
with Martin's, or step-wise, first fixing wrong code, and then
implementing warnings without fixing wrong code.

Anyway, the smaller the patch, the better for the review.


Bernd.

* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-24 16:04             ` Jeff Law
@ 2018-08-24 21:56               ` Bernd Edlinger
  0 siblings, 0 replies; 53+ messages in thread
From: Bernd Edlinger @ 2018-08-24 21:56 UTC (permalink / raw)
  To: Jeff Law, Martin Sebor, Gcc Patch List

On 08/24/18 18:04, Jeff Law wrote:
> On 08/24/2018 06:27 AM, Bernd Edlinger wrote:
> [ Lots of snipping throughout ]
> 
> 
>>>>
>>>>> +
>>>>>      if (TREE_CODE (src) == COND_EXPR
>>>>>          && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0))))
>>>>>        {
>>>>> -      tree len1, len2;
>>>>> -
>>>>> -      len1 = c_strlen (TREE_OPERAND (src, 1), only_value);
>>>>> -      len2 = c_strlen (TREE_OPERAND (src, 2), only_value);
>>>>> +      tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs);
>>>>> +      tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1);
>>>>>          if (tree_int_cst_equal (len1, len2))
>>>>> -	return len1;
>>>>> +	{
>>>>
>>>> Funny: if called with NULL, *arr and arrs[0] alias each other.
>>>
>>>>
>>>>> +	  *arr = arrs[0] ? arrs[0] : arrs[1];
>>>>> +	  return len1;
>>>>> +	}
>>>>>        }
>>>
>>> And in this case it looks like your comment is before the code you're
>>> commenting about.  It was fairly obvious in this case because the code
>>> prior to your "funny, if called..." comment didn't reference arr or arrs
>>> at all.
>>>
>>> And more importantly, why are you concerned about the aliasing?
>>>
>>
>> It is just that *arr = arrs[0] does nothing, but it looks like the author
>> was not aware of it.  It may be okay, but it causes head-scratching.
>> If you don't have the context, you will think this does something different.
> *arr = arrs[0] ? arrs[0] : arrs[1];
> 
> Yes, there's potentially a dead store when arr points to arrs[0] because
> of the earlier initialization when NULL was passed for arr.  But
> otherwise we'd be checking repeatedly that arr != NULL.
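The pattern under discussion, redirecting a null out-parameter to local storage so later code never has to test it, can be sketched in C as below. bounded_len and cond_len are hypothetical helpers that only loosely follow the shape of the c_strlen hunk (the two arms of a conditional are measured separately, and an unterminated arm is reported through the out-parameter); (size_t)-1 is an assumed failure value.

```c
#include <stddef.h>

/* Hypothetical helper: length of S within SIZE bytes.  If S has no nul
   there, *UNTERM is set to S, otherwise to NULL.  */
static size_t bounded_len(const char *s, size_t size, const char **unterm)
{
  size_t n = 0;
  while (n < size && s[n] != '\0')
    n++;
  *unterm = n == size ? s : NULL;
  return n;
}

/* Length of "cond ? A : B" when both arms agree, else (size_t)-1.
   A null UNTERM is redirected to ARRS, so *UNTERM then aliases ARRS[0]
   and the final store may just rewrite ARRS[0] with its own value.
   That possible dead store is the price of not testing UNTERM again.  */
static size_t cond_len(const char *a, size_t asize,
                       const char *b, size_t bsize, const char **unterm)
{
  const char *arrs[2] = { NULL, NULL };
  if (!unterm)
    unterm = arrs;
  size_t len1 = bounded_len(a, asize, &arrs[0]);
  size_t len2 = bounded_len(b, bsize, &arrs[1]);
  if (len1 != len2)
    return (size_t)-1;
  *unterm = arrs[0] ? arrs[0] : arrs[1];
  return len1;
}
```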
> 
> 
>>>>>      /* PTR can point to the byte representation of any string type, including
>>>>>         char* and wchar_t*.  */
>>>>>      const char *ptr = TREE_STRING_POINTER (src);
>>>>> @@ -650,7 +709,8 @@ c_strlen (tree src, int only_value)
>>>>>          offsave = fold_convert (ssizetype, offsave);
>>>>>          tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave,
>>>>>    				      build_int_cst (ssizetype, len * eltsize));
>>>>
>>>> This computation is wrong: it does not compute in units of eltsize.
>>>> However, I am not sure it is really good that this function tries to
>>>> compute strlen of wide character strings.
>>> Note that in the second version of Martin's patch eltsize will always be
>>> 1 when we get here.  Furthermore, the multiplication by eltsize is gone.
>>>    It looks like you switched back to commenting after the code for that
>>> comment, but then immediately thereafter...
>>>
>>
>> Actually I had no idea that the second patch did resolve some of my comments
>> and which those were.
> It happens.  Multiple long threads make it difficult to follow.
> 
>>
>> I had the impression that it was just a large patch split into
>> several smaller ones without any rework at the same time.
>>
>> Once again, a summary of what was changed would be helpful.
> Agreed.
> 
> 
>>>> What are you fixing here?  I think that was another bug.
>>>> If this fixes something then it should be in a different patch,
>>>> just handling this.
>>>>
>>>>>      unsigned len = string_length (ptr + eltoff * eltsize, eltsize,
>>>>> -				maxelts - eltoff);
>>>>> +				strelts - eltoff);
>>>>>    
>>>>>      return ssize_int (len);
>>>>>    }
>>> I'm guessing the comment in this case refers to the code after.
>>> Presumably questioning the change from maxelts to strelts.
>>>
>>>
>>
>> Yes.  I was thinking that could be a patch of its own.
> Funny.  I found myself in agreement and was going to extract this out
> and run it through the usual tests and get it onto the trunk
> immediately.  Then I realized you'd already fixed this in the patch that
> added the eltsize parameter to c_strlen which has already been committed
> :-)
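The maxelts/strelts question comes down to the bound passed to string_length. As a rough model only (count_elts is a hypothetical stand-in, not GCC's string_length, and element sizes above 8 bytes are assumed not to occur), counting characters of eltsize bytes up to a bound looks like this. The bound must not exceed the number of elements actually stored in the constant, or the loop may read past it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: count elements of ELTSIZE bytes starting at PTR,
   stopping at the first all-zero element or after MAXELTS elements.
   ELTSIZE is assumed to be between 1 and 8 in this sketch.  */
static size_t count_elts(const void *ptr, size_t eltsize, size_t maxelts)
{
  static const unsigned char zero[8];
  const unsigned char *p = ptr;
  size_t n = 0;
  while (n < maxelts && memcmp(p + n * eltsize, zero, eltsize) != 0)
    n++;
  return n;
}
```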
> 
> 
> 
>>
>> Well, yes, although I have changed my mind on strlen(L"abc") in the meantime.
>>
>> This may appear as if I contradict myself, but it is more that I am learning.
>>
>> The installed patch for the ELTSIZE parameter was in part inspired by the
>> thought about "not folding invalid stuff" that was forming in my mind
>> at that time, even before I wrote it down and sent it to this list.
> ACK.  And I think there seems to be consensus forming around that
> concept which is good.
> 
> If we ultimately decide not to fold strlen of a wide character string,
> then that'll be an easy enough change to make.  In the meantime, being
> consistent with how the C library strlen works is a good thing IMHO.
> 
> 
>>>>> @@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg)
>>>>>      return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg)));
>>>>>    }
>>>>>    
>>>>> -/* Fold a call to __builtin_strlen with argument ARG.  */
>>>>> +/* Fold a strlen call to FNDECL of TYPE, and with argument ARG.  */
>>>>>    
>>>>>    static tree
>>>>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>>>>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>>>>    {
>>>>>      if (!validate_arg (arg, POINTER_TYPE))
>>>>>        return NULL_TREE;
>>>>>      else
>>>>>        {
>>>>> -      tree len = c_strlen (arg, 0);
>>>>> -
>>>>> +      tree arr = NULL_TREE;
>>>>> +      tree len = c_strlen (arg, 0, &arr);
>>>>
>>>> Is it possible to write a test case where strlen(L"test") reaches this point?
>>>> what will c_strlen return then?
>>> Given your fix to have c_strlen count bytes by default, I think we're OK
>>> here.
>>>
>>
>> Yes.  But my fix was still incomplete.
>> So incremental progress is necessary here.
> That's fine.  I'm generally a fan of incremental improvements.
> 
>>>>
>>>>>        {
>>>>>          *ptr_offset = fold_convert (sizetype, offset);
>>>>> +      if (nulterm)
>>>>> +	*nulterm = true;
>>>>> +      if (decl)
>>>>> +	*decl = NULL_TREE;
>>>>>          return array;
>>>>>        }
>>> So your comment seems to be referring to the code above the comment as
>>> well as below.  Confusing.  Consistency really helps.
>>>
>>> Are we at a point where we're ready to declare STRING_CSTs as always
>>> being properly terminated?  If not the hunk of code above needs some
>>> rethinking since we can't guarantee the string is properly terminated.
>>>
>>
>> No.  That is still in flux.
>> And I am not sure if the "properly terminated" will survive.
>>
>> I am digging deep into the foundations now.
>> And the correctness of the code in question depends heavily on the
>> results.  Therefore I would like to fix this from the ground up.
>>
>> Adding lots of new code that is based on wrong assumptions
>> will not help.
> OK.  So clearly this hunk needs rethinking.  I'm not sure if/how
> critical the code above is -- we may be able to get away with deferring
> this chunk.  I'll have to look at that more deeply.
> 

Yes. Note one interesting thing.  The new version of the STRING_CST
consistency check in varasm.c does some magic.
This time I think Richard generally agreed on the approach.

The surprise is that it fixes both PR86711 and PR86714 without any other patch.

I really think that is the preferred way to go ahead.

If you prevent inconsistent input to the string_constant function
then it is a lot easier to prevent inconsistent output.

So I am starting to think that at least my proposed fix for PR86711/86714
did not yet get at the root cause of the problem.
The root cause was probably the semantic differences between
different uses of STRING_CSTs.


>>
>>
>>>
>>>>>    
>>>>> @@ -11414,6 +11422,49 @@ string_constant (tree arg, tree *ptr_offset)
>>>>>      if (!array_size || TREE_CODE (array_size) != INTEGER_CST)
>>>>>        return NULL_TREE;
>>>>>    
>>>>> +  unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size);
>>>>> +
>>>>
>>>> I don't understand why this is necessary at all.
>>>> It looks way too complicated, to say the least.
>>>>
>>>> TREE_TYPE (init) already has the type of the member.
>>>>
>>>>> +  /* When ARG refers to an aggregate (of arrays) try to determine
>>>>> +     the size of the character array within the aggregate.  */
>>>>> +  tree ref = arg;
>>>>> +  tree reftype = TREE_TYPE (arg);
>>>>> +
>>>>> +  if (TREE_CODE (ref) == MEM_REF)
>>>>> +    {
>>>>> +      ref = TREE_OPERAND (ref, 0);
>>>>> +      if (TREE_CODE (ref) == ADDR_EXPR)
>>>>> +	{
>>>>> +	  ref = TREE_OPERAND (ref, 0);
>>>>> +	  reftype = TREE_TYPE (ref);
>>>>> +	}
>>>>> +    }
>>>>> +  else
>>>>> +    while (TREE_CODE (ref) == ARRAY_REF)
>>>>> +      {
>>>>> +	reftype = TREE_TYPE (ref);
>>>>> +	ref = TREE_OPERAND (ref, 0);
>>>>> +      }
>>>>> +
>>>>> +  if (TREE_CODE (ref) == COMPONENT_REF)
>>>>> +    reftype = TREE_TYPE (ref);
>>>>> +
>>>>> +  while (TREE_CODE (reftype) == ARRAY_TYPE)
>>>>> +    {
>>>>> +      tree next = TREE_TYPE (reftype);
>>>>> +      if (TREE_CODE (next) == INTEGER_TYPE)
>>>>> +	{
>>>>> +	  if (tree size = TYPE_SIZE_UNIT (reftype))
>>>>> +	    if (tree_fits_uhwi_p (size))
>>>>> +	      array_elts = tree_to_uhwi (size);
>>>>
>>>> So array_elts is measured in bytes.
>>> So I'm guessing your comment about the code looking way too complicated
>>> is referring to the code *after* your comment.  That code is not in the
>>> v2 patch.  At least not 01/06 which addresses the codegen/opt issues.  I
>>> haven't checked the full kit to see if this code appears in a subsequent
>>> patch.
>>>
>>
>> No I meant the code between
>> /* When ARG refers to an aggregate (of arrays) try to determine
>>      the size of the character array within the aggregate.  */
>>
>> and here.  It is wrong (assuming C semantics on GIMPLE).
> Ah.  Yea.  I think we're broadly in agreement we can't do that for
> anything which affects optimization/codegen.  Given those bits aren't in
> Martin's v2 patch I think we can move on.
> 
> 

Yes, with the new semantics of STRING_CST we can guarantee that
STRING_CSTs have a _valid_ TYPE_SIZE_UNIT, even in cases of flexible array members
where previously the TYPE_SIZE_UNIT was NULL.  That makes a big difference here.


So I would hold off on this topic until we have the semantic issues resolved.



>>
>>>
>>>>
>>>>> +	  break;
>>>>> +	}
>>>>> +
>>>>> +      reftype = TREE_TYPE (reftype);
>>>>> +    }
>>>>> +
>>>>> +  if (decl)
>>>>> +    *decl = array;
>>>>> +
>>>>>      /* Avoid returning a string that doesn't fit in the array
>>>>>         it is stored in, like
>>>>>         const char a[4] = "abcde";
>>>>> @@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
>>>>>      unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
>>>>>      length = string_length (TREE_STRING_POINTER (init), charsize,
>>>>>    			  length / charsize);
>>>>
>>>> Some callers, especially those where the wrong code happens, expect
>>>> to be able to access the STRING_CST up to TREE_STRING_LENGTH,
>>>> but using string_length assumes they stop at the first nul char.
>>> Example please?
>>>
>>
>> tree-ssa-forwprop.c:
>>                 str1 = string_constant (src1, &off1);
>>                 if (str1 == NULL_TREE)
>>                   break;
>>                 if (!tree_fits_uhwi_p (off1)
>>                     || compare_tree_int (off1, TREE_STRING_LENGTH (str1) - 1) > 0
>>                     || compare_tree_int (len1, TREE_STRING_LENGTH (str1)
>>                                                - tree_to_uhwi (off1)) > 0
> I understand that you're concerned about reading past the end of the
> returned STRING_CST via the subsequent memcpy in tree-ssa-forwprop.c
> which starts at src1+off1 and reads len1 bytes.
> 
> But I'm not sure how that can happen in the V2 patch from Martin.  In
> fact, one of the fundamental goals of the V2 patch is to avoid this
> exact problem.
> 

Here I quote from Martin's 1/6 patch:
> +  /* Compute the lower bound number of elements (not bytes) in the array
> +     that the string is used to initialize.  The actual size of the array
> +     will be may be greater if the string is shorter, but the important
> +     data point is whether the literal, including the terminating nul,
> +     fits in the array. */
> +  unsigned HOST_WIDE_INT array_elts
> +    = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (init))) / charsize;
> +
> +  /* Compute the string length in (wide) characters.  */

So prior to my STRING_CST patch this will ICE on a flexible array member,
because those have TYPE_SIZE_UNIT = NULL, and tree_to_uhwi will ICE.

I used:

  compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)),
                    TREE_STRING_LENGTH (init))
  
and this will not ICE with NULL, but treats it like infinity
and returns 1.

So my version did not ICE in that case.
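The difference described here, failing on a missing size versus comparing against it, can be sketched in C by modeling TYPE_SIZE_UNIT as a pointer that is null when the size is unknown. This is an assumption for illustration only: GCC's trees are not plain integers, and compare_size is a hypothetical name whose null handling simply follows the description of compare_tree_int above.

```c
#include <stddef.h>

/* Hypothetical model: SIZE points to a known size, or is NULL when the
   size is unknown (as for a flexible array member).  Returns -1, 0 or 1
   like a three-way comparison, treating an unknown size as larger than
   any LEN instead of failing.  */
static int compare_size(const unsigned long *size, unsigned long len)
{
  if (!size)
    return 1;      /* unknown size: behaves like infinity */
  return *size < len ? -1 : *size > len;
}
```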

Once we have the new STRING_CST semantics in place it will make
this defined, but it is at the same time completely unnecessary.


Bernd.

* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-24 17:26           ` Bernd Edlinger
@ 2018-08-24 23:54             ` Jeff Law
  2018-08-25  6:32               ` Bernd Edlinger
  0 siblings, 1 reply; 53+ messages in thread
From: Jeff Law @ 2018-08-24 23:54 UTC (permalink / raw)
  To: Bernd Edlinger, Martin Sebor, Gcc Patch List

On 08/24/2018 11:26 AM, Bernd Edlinger wrote:
> On 08/24/18 18:51, Jeff Law wrote:
>>> Well, this is broken for wide character strings.
>>> But I hope we can get rid of STRING_CSTs which are
>>> not explicitly null terminated.
> 
> I am afraid that is not going to happen.
> Maybe we can get STRING_CSTs that are never longer
> than the TYPE_SIZE_UNIT, but c_strlen and c_getstr
> need to take care that the string is zero-terminated.
> 
> string_constant should not promise that the string is zero-terminated.
> But instead it can promise that:
> 1) the STRING_CST is valid up to TREE_STRING_LENGTH
> 2) mem_size is >= TREE_STRING_LENGTH
> 3) memory between TREE_STRING_LENGTH and mem_size is ZERO.
> 
> It will not guarantee anything about zero termination any more.
Interesting because those conditions would be sufficient to deal with a
regression I stumbled over after fixing Martin's patch to not assume
that all STRING_CSTs are NUL terminated.

But I need to think about this a bit more.  Essentially the question
we'd need to ask is whether or not these are sufficient in general or
just in specific cases.

I tend to think they're not sufficient in general. If a string returned
by string_constant that didn't have a terminating NUL, but which did
pass the tests above, were ultimately passed to the runtime's str*
routines, then the call may run off the end of the string.  We'd like to
be able to warn for that.

So ISTM those rules are only valid in contexts where we know the result
isn't going to be passed to str* and friends within the C library.

I do think they're sufficient to avoid problems with the
tree-ssa-forwprop code we've looked at.  So what may make the most sense
is to have that routine indicate it's willing to accept unterminated
strings, then check the conditions above before optimizing the code.
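The suggestion to let the forwprop code opt in to unterminated constants and then check the conditions itself might look like this in a plain C model. usable_constant and all of its parameters are hypothetical; nul_terminated stands for whatever indication string_constant would return, and the range check relies on the third proposed guarantee (bytes past string_length read as zero).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an opt-in caller.  STRING_LENGTH and MEM_SIZE
   follow the proposed guarantees; NUL_TERMINATED says whether a nul
   occurs within MEM_SIZE.  Callers that only copy bytes may accept
   unterminated constants; callers feeding str* routines may not.  */
static bool usable_constant(size_t string_length, size_t mem_size,
                            bool nul_terminated, bool accept_unterminated,
                            size_t off, size_t len)
{
  if (mem_size < string_length)
    return false;                 /* guarantee 2 violated */
  if (!nul_terminated && !accept_unterminated)
    return false;                 /* str*-style caller: refuse */
  /* Bytes past STRING_LENGTH read as zero, so all of MEM_SIZE is usable.  */
  return off <= mem_size && len <= mem_size - off;
}
```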

> 
> In the end, the best approach might be to either merge my patch
> with Martin's, or step-wise, first fixing wrong code, and then
> implementing warnings without fixing wrong code.
Unsure at this time.  I've been working with both.  I suspect that if we
went with yours that we'd then turn around and layer Martin's on top of
it because of the desire to signal to callers that we have an
unterminated string and have the callers take appropriate action.  Which
begs the question of whether or not we just go with Martin's -- ie, is
there really any value in using both.  I haven't seen indications there
is value in that approach, but I'm still poking at things.

Jeff

* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-24 23:54             ` Jeff Law
@ 2018-08-25  6:32               ` Bernd Edlinger
  2018-08-25 17:33                 ` Jeff Law
  0 siblings, 1 reply; 53+ messages in thread
From: Bernd Edlinger @ 2018-08-25  6:32 UTC (permalink / raw)
  To: Jeff Law, Martin Sebor, Gcc Patch List

On 08/25/18 01:54, Jeff Law wrote:
> On 08/24/2018 11:26 AM, Bernd Edlinger wrote:
>> On 08/24/18 18:51, Jeff Law wrote:
>>>> Well, this is broken for wide character strings.
>>>> But I hope we can get rid of STRING_CSTs which are
>>>> not explicitly null terminated.
>>
>> I am afraid that is not going to happen.
>> Maybe we can get STRING_CSTs that are never longer
>> than the TYPE_SIZE_UNIT, but c_strlen and c_getstr
>> need to take care that the string is zero-terminated.
>>
>> string_constant should not promise that the string is zero-terminated.
>> But instead it can promise that:
>> 1) the STRING_CST is valid up to TREE_STRING_LENGTH
>> 2) mem_size is >= TREE_STRING_LENGTH
>> 3) memory between TREE_STRING_LENGTH and mem_size is ZERO.
>>
>> It will not guarantee anything about zero termination any more.
> Interesting because those conditions would be sufficient to deal with a
> regression I stumbled over after fixing Martin's patch to not assume
> that all STRING_CSTs are NUL terminated.
> 
> But I need to think about this a bit more.  Essentially the question
> we'd need to ask is whether or not these are sufficient in general or
> just in specific cases.
> 
> I tend to think they're not sufficient in general. If a string returned
> by string_constant that didn't have a terminating NUL, but which did
> pass the tests above, were ultimately passed to the runtime's str*
> routines, then the call may run off the end of the string.  We'd like to
> be able to warn for that.
> 
> So ISTM those rules are only valid in contexts where we know the result
> isn't going to be passed to str* and friends within the C library.
> 
> I do think they're sufficient to avoid problems with the
> tree-ssa-forwprop code we've looked at.  So what may make the most sense
> is to have that routine indicate it's willing to accept unterminated
> strings, then check the conditions above before optimizing the code.
> 

There are not too many callers of string_constant.
Not all need zero termination.

But I think if they are interested in zero-termination
they should simply call c_strlen or c_getstr.


>>
>> In the end, the best approach might be to either merge my patch
>> with Martin's, or step-wise, first fixing wrong code, and then
>> implementing warnings without fixing wrong code.
> Unsure at this time.  I've been working with both.  I suspect that if we
> went with yours that we'd then turn around and layer Martin's on top of
> it because of the desire to signal to callers that we have an
> unterminated string and have the callers take appropriate action.  Which
> begs the question of whether or not we just go with Martin's -- ie, is
> there really any value in using both.  I haven't seen indications there
> is value in that approach, but I'm still poking at things.
> 

Well, you call it "layer one patch over the other";
I call it "incremental improvements".


Bernd.


* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-25  6:32               ` Bernd Edlinger
@ 2018-08-25 17:33                 ` Jeff Law
  2018-08-25 18:36                   ` Bernd Edlinger
  0 siblings, 1 reply; 53+ messages in thread
From: Jeff Law @ 2018-08-25 17:33 UTC (permalink / raw)
  To: Bernd Edlinger, Martin Sebor, Gcc Patch List

On 08/25/2018 12:32 AM, Bernd Edlinger wrote:
> On 08/25/18 01:54, Jeff Law wrote:
>> On 08/24/2018 11:26 AM, Bernd Edlinger wrote:
>>> On 08/24/18 18:51, Jeff Law wrote:
>>>>> Well, this is broken for wide character strings.
>>>>> But I hope we can get rid of STRING_CSTs which are
>>>>> not explicitly null terminated.
>>>
>>> I am afraid that is not going to happen.
>>> Maybe we can get STRING_CSTs that are never longer
>>> than the TYPE_SIZE_UNIT, but c_strlen and c_getstr
>>> need to take care that the string is zero-terminated.
>>>
>>> string_constant should not promise that the string is zero-terminated.
>>> But instead it can promise that:
>>> 1) the STRING_CST is valid up to TREE_STRING_LENGTH
>>> 2) mem_size is >= TREE_STRING_LENGTH
>>> 3) memory between TREE_STRING_LENGTH and mem_size is ZERO.
>>>
>>> It will not guarantee anything about zero termination any more.
>> Interesting because those conditions would be sufficient to deal with a
>> regression I stumbled over after fixing Martin's patch to not assume
>> that all STRING_CSTs are NUL terminated.
>>
>> But I need to think about this a bit more.  Essentially the question
>> we'd need to ask is whether or not these are sufficient in general or
>> just in specific cases.
>>
>> I tend to think they're not sufficient in general. If a string returned
>> by string_constant that didn't have a terminating NUL, but which did
>> pass the tests above, were ultimately passed to the runtime's str*
>> routines, then the call may run off the end of the string.  We'd like to
>> be able to warn for that.
>>
>> So ISTM those rules are only valid in contexts where we know the result
>> isn't going to be passed to str* and friends within the C library.
>>
>> I do think they're sufficient to avoid problems with the
>> tree-ssa-forwprop code we've looked at.  So what may make the most sense
>> is to have that routine indicate it's willing to accept unterminated
>> strings, then check the conditions above before optimizing the code.
>>
> 
> There are not too many callers of string_constant.
> Not all need zero termination.
Right.  And in retrospect we probably should have avoided default
parameter overloads and just fixed the callers.  But that can be a
follow-up.

> 
> But I think if they are interested in zero-termination
> they should simply call c_strlen or c_getstr.
Perhaps.


> 
> 
>>>
>>> In the end, the best approach might be to either merge my patch
>>> with Martin's, or step-wise, first fixing wrong code, and then
>>> implementing warnings without fixing wrong code.
>> Unsure at this time.  I've been working with both.  I suspect that if we
>> went with yours that we'd then turn around and layer Martin's on top of
>> it because of the desire to signal to callers that we have an
>> unterminated string and have the callers take appropriate action.  Which
>> begs the question of whether or not we just go with Martin's -- ie, is
>> there really any value in using both.  I haven't seen indications there
>> is value in that approach, but I'm still poking at things.
>>
> 
> Well, you call it "layer one patch over the other";
> I call it "incremental improvements".
It is (of course) a case-by-case basis.  The way I try to look at these
things is to ask whether or not the first patch under consideration
would have any value/purpose after the second patch was installed.  If
so, then it may make sense to include both.  If not, then we really just
want one patch.

Jeff

* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-25 17:33                 ` Jeff Law
@ 2018-08-25 18:36                   ` Bernd Edlinger
  2018-08-25 19:02                     ` Jeff Law
  0 siblings, 1 reply; 53+ messages in thread
From: Bernd Edlinger @ 2018-08-25 18:36 UTC (permalink / raw)
  To: Jeff Law, Martin Sebor, Gcc Patch List

On 08/25/18 19:32, Jeff Law wrote:
> On 08/25/2018 12:32 AM, Bernd Edlinger wrote:
>> On 08/25/18 01:54, Jeff Law wrote:
>>> On 08/24/2018 11:26 AM, Bernd Edlinger wrote:
>>>> On 08/24/18 18:51, Jeff Law wrote:
>>>>>> Well, this is broken for wide character strings.
>>>>>> But I hope we can get rid of STRING_CSTs which are
>>>>>> not explicitly null terminated.
>>>>
>>>> I am afraid that is not going to happen.
>>>> Maybe we can get STRING_CSTs that are never longer
>>>> than the TYPE_SIZE_UNIT, but c_strlen and c_getstr
>>>> need to take care that the string is zero-terminated.
>>>>
>>>> string_constant should not promise that the string is zero-terminated.
>>>> But instead it can promise that:
>>>> 1) the STRING_CST is valid up to TREE_STRING_LENGTH
>>>> 2) mem_size is >= TREE_STRING_LENGTH
>>>> 3) memory between TREE_STRING_LENGTH and mem_size is ZERO.
>>>>
>>>> It will not guarantee anything about zero termination any more.
>>> Interesting because those conditions would be sufficient to deal with a
>>> regression I stumbled over after fixing Martin's patch to not assume
>>> that all STRING_CSTs are NUL terminated.
>>>
>>> But I need to think about this a bit more.  Essentially the question
>>> we'd need to ask is whether or not these are sufficient in general or
>>> just in specific cases.
>>>
>>> I tend to think they're not sufficient in general. If a string returned
>>> by string_constant that didn't have a terminating NUL, but which did
>>> pass the tests above, were ultimately passed to the runtime's str*
>>> routines, then the call may run off the end of the string.  We'd like to
>>> be able to warn for that.
>>>
>>> So ISTM those rules are only valid in contexts where we know the result
>>> isn't going to be passed to str* and friends within the C library.
>>>
>>> I do think they're sufficient to avoid problems with the
>>> tree-ssa-forwprop code we've looked at.  So what may make the most sense
>>> is to have that routine indicate it's willing to accept unterminated
>>> strings, then check the conditions above before optimizing the code.
>>>
>>
>> There are not too many callers of string_constant.
>> Not all need zero termination.
> Right.  And in retrospect we probably should have avoided default
> parameter overloads and just fixed the callers.  But that can be a
> follow-up.
> 
>>
>> But I think if the are interested in zero-termination
>> they should simply call c_strlen or c_getstr.
> Perhaps.
> 
> 
>>
>>
>>>>
>>>> In the end, the best approach might be to either merge my patch
>>>> with Martins, or step-wise, first fixing wrong code, and then
>>>> implementing warnings without fixing wrong code.
>>> Unsure at this time.  I've been working with both.  I suspect that if we
>>> went with yours that we'd then turn around and layer Martin's on top of
>>> it because of the desire to signal to callers that we have an
>>> unterminated string and have the callers take appropriate action.  Which
>>> begs the question of whether or not we just go with Martin's -- ie, is
>>> there really any value in using both.  I haven't seen indications there
>>> is value in that approach, but I'm still poking at things.
>>>
>>
>> Well, ya call it "layer one patch over the other"
>> I call it "incremental improvements".
> It is (of course) a case by case basis.  The way I try to look at these
> things is to ask whether or not the first patch under consideration
> would have any value/purpose after the second patch was installed.  If
> so, then it may make sense to include both.  If not, then we really just
> want one patch.
> 

Agreed.  I think the question is which of the possible STRING_CST
semantics we want to have in the end (the middle-end).
Everything builds on top of the semantic properties of STRING_CSTs.
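The three properties proposed earlier for string_constant are: bytes valid up to TREE_STRING_LENGTH, mem_size >= TREE_STRING_LENGTH, and zeros in between. Together they are enough to decide whether an object holds a terminated string. Below is a minimal plain-C sketch. It uses a hypothetical `struct string_cst_model` in place of a real STRING_CST and is not GCC's tree API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a STRING_CST plus the size of the object it
   initializes; this is not GCC's tree API.  */
struct string_cst_model
{
  const char *bytes;  /* Valid for LENGTH bytes (property 1).  */
  size_t length;      /* Plays the role of TREE_STRING_LENGTH.  */
  size_t mem_size;    /* Size of the initialized object in bytes.  */
};

/* Return true if the object holds a nul-terminated string.  Bytes in
   [LENGTH, MEM_SIZE) are assumed to be zero (property 3).  */
static bool
string_cst_is_nul_terminated (const struct string_cst_model *s)
{
  assert (s->mem_size >= s->length);  /* Property 2.  */

  /* A nul among the explicit bytes terminates the string.  */
  if (memchr (s->bytes, '\0', s->length) != NULL)
    return true;

  /* Otherwise only the implicit zero padding past LENGTH can.  */
  return s->mem_size > s->length;
}
```

Under this model, `const char a[3] = "abc";` corresponds to { "abc", 3, 3 } and is reported as unterminated. `const char b[4] = "abc";` could be modeled as { "abc", 3, 4 } and is terminated by the zero padding. Code that passes the bytes on to a str* function still needs this check.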

My first attempt to fix the STRING_CST semantics was aimed at making
string_constant happy.

My second attempt tries to make Richard happy.  Looking at both
patches, I think the second one is better and simpler.


BTW I need to correct one statement in my last e-mail:

On 08/24/18 23:55, Bernd Edlinger wrote:
> here I quote form martins 1/6 patch:
>> +  /* Compute the lower bound number of elements (not bytes) in the array
>> +     that the string is used to initialize.  The actual size of the array
>> +     will be may be greater if the string is shorter, but the important
>> +     data point is whether the literal, including the terminating nul,
>> +     fits in the array. */
>> +  unsigned HOST_WIDE_INT array_elts
>> +    = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (init))) / charsize;
>> +
>> +  /* Compute the string length in (wide) characters.  */
> 
> So prior to my STRING_CST patch this will ICE on a flexible array member,
> because those have TYPE_SIZE_UNIT = NULL, and tree_to_uhwi will ICE.
> 
> I used:
> 
>   compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)),
>                     TREE_STRING_LENGTH (init))
> 
> and this will not ICE with NULL, but consider it like infinity,
> and return 1.
> 
> So my version did not ICE in that case.
> 

Oooops,

actually both versions will likely ICE when TYPE_SIZE_UNIT is NULL.
I did try to test that case, but I must have done something wrong.

I hope we can get rid of the incomplete types in the middle-end.

So maybe just inject "if (!tree_fits_uhwi_p (...)) return NULL_TREE;"
here?  Or maybe just defer until we have clarity about the semantics.
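The guard suggested above can be illustrated without GCC internals. In this sketch `struct array_type_model` and `array_elts_if_known` are hypothetical. A NULL `size_unit` stands in for the NULL TYPE_SIZE_UNIT of a flexible array member. The function checks for it before dividing instead of assuming the size is a known constant:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an array type: SIZE_UNIT is NULL when the size
   is not known, as TYPE_SIZE_UNIT is for a flexible array member.  */
struct array_type_model
{
  const unsigned long long *size_unit;
};

/* Store the number of CHARSIZE-byte elements in *ELTS and return true,
   or return false without touching *ELTS when the size is unknown.
   Checking first plays the role of the suggested tree_fits_uhwi_p
   guard; dereferencing unconditionally is the analogue of calling
   tree_to_uhwi on a NULL TYPE_SIZE_UNIT.  */
static bool
array_elts_if_known (const struct array_type_model *type,
                     unsigned long long charsize,
                     unsigned long long *elts)
{
  if (type->size_unit == NULL || charsize == 0)
    return false;

  *elts = *type->size_unit / charsize;
  return true;
}
```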


Bernd.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-25 18:36                   ` Bernd Edlinger
@ 2018-08-25 19:02                     ` Jeff Law
  2018-08-25 19:32                       ` Bernd Edlinger
  0 siblings, 1 reply; 53+ messages in thread
From: Jeff Law @ 2018-08-25 19:02 UTC (permalink / raw)
  To: Bernd Edlinger, Martin Sebor, Gcc Patch List

On 08/25/2018 12:36 PM, Bernd Edlinger wrote:

>>>>
>>>
>>> Well, ya call it "layer one patch over the other"
>>> I call it "incremental improvements".
>> It is (of course) a case by case basis.  The way I try to look at these
>> things is to ask whether or not the first patch under consideration
>> would have any value/purpose after the second patch was installed.  If
>> so, then it may make sense to include both.  If not, then we really just
>> want one patch.
>>
> 
> Agreed.  I think the question is which of the possible STRING_CST
> semantics we want to have in the end (the middle-end).
> Everything builds on top of the semantic properties of STRING_CSTs.
This certainly plays a role.  I bumped pretty hard against the
STRING_CST semantics issue with Martin's patch.  I'm hoping that making
those more consistent will ultimately simplify things and avoid the
problems I'm stumbling over.

Of course, that means more delays in getting this sorted out.  I really
thought I had a viable plan a couple days ago, but I'm having to rethink
in light of some of the issues raised.


> 
> My first attempt of fix the STRING_CST semantic was trying to make
> string_constant happy.
> 
> My second attempt is trying to make Richard happy.  And when I look
> at both patches, I think the second one is better, and more simple.
I've generally found that Richie's advice results in a
cleaner implementation ;-)
> 
> 
> BTW I need to correct on statement in my last e-mail:
> 
> On 08/24/18 23:55, Bernd Edlinger wrote:>
>> here I quote form martins 1/6 patch:
>>> +  /* Compute the lower bound number of elements (not bytes) in the array
>>> +     that the string is used to initialize.  The actual size of the array
>>> +     will be may be greater if the string is shorter, but the important
>>> +     data point is whether the literal, including the terminating nul,
>>> +     fits in the array. */
>>> +  unsigned HOST_WIDE_INT array_elts
>>> +    = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (init))) / charsize;
>>> +
>>> +  /* Compute the string length in (wide) characters.  */
>>
>> So prior to my STRING_CST patch this will ICE on a flexible array member,
>> because those have TYPE_SIZE_UNIT = NULL, and tree_to_uhwi will ICE.
>>
>> I used:
>>
>>   compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)),
>>                     TREE_STRING_LENGTH (init))
>>
>> and this will not ICE with NULL, but consider it like infinity,
>> and return 1.
>>
>> So my version did not ICE in that case.
>>
> 
> Oooops,
> 
> actually both versions will likely ICE on TYPE_SIZE_UNIT == NULL.
> I actually tried to test that case, but have done something wrong.
> 
> I hope we can get rid if the incomplete types in the middle-end.
> 
> So maybe just inject "if (!tree_fits_uhwi_p (...)) return NULL_TREE;"
> here?   Or maybe just defer until we have clarity about the semantics.
Not sure.  I've largely tried not to let VLA issues drive anything here.
I'm not a fan of them for a variety of reasons, and thus I tend to look
at all the VLA stuff as exceptional cases.

jeff

* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-25 19:02                     ` Jeff Law
@ 2018-08-25 19:32                       ` Bernd Edlinger
  2018-08-25 20:42                         ` Martin Sebor
  2018-08-25 23:22                         ` Jeff Law
  0 siblings, 2 replies; 53+ messages in thread
From: Bernd Edlinger @ 2018-08-25 19:32 UTC (permalink / raw)
  To: Jeff Law, Martin Sebor, Gcc Patch List

On 08/25/18 21:02, Jeff Law wrote:
> On 08/25/2018 12:36 PM, Bernd Edlinger wrote:
> 
>>>>>
>>>>
>>>> Well, ya call it "layer one patch over the other"
>>>> I call it "incremental improvements".
>>> It is (of course) a case by case basis.  The way I try to look at these
>>> things is to ask whether or not the first patch under consideration
>>> would have any value/purpose after the second patch was installed.  If
>>> so, then it may make sense to include both.  If not, then we really just
>>> want one patch.
>>>
>>
>> Agreed.  I think the question is which of the possible STRING_CST
>> semantics we want to have in the end (the middle-end).
>> Everything builds on top of the semantic properties of STRING_CSTs.
> This certainly plays a role.  I bumped pretty hard against the
> STRING_CST semantics issue with Martin's patch.  I'm hoping that making
> those more consistent will ultimately simplify things and avoid the
> problems I'm stumbling over.
> 
> Of course, that means more delays in getting this sorted out.  I really
> thought I had a viable plan a couple days ago, but I'm having to rethink
> in light of some of the issues raised.
> 

I think we should slow down.

> 
>>
>> My first attempt of fix the STRING_CST semantic was trying to make
>> string_constant happy.
>>
>> My second attempt is trying to make Richard happy.  And when I look
>> at both patches, I think the second one is better, and more simple.
> In general I've found that Richie's advice generally results in a
> cleaner implementation ;-)
>>
>>
>> BTW I need to correct on statement in my last e-mail:
>>
>> On 08/24/18 23:55, Bernd Edlinger wrote:>
>>> here I quote form martins 1/6 patch:
>>>> +  /* Compute the lower bound number of elements (not bytes) in the array
>>>> +     that the string is used to initialize.  The actual size of the array
>>>> +     will be may be greater if the string is shorter, but the important
>>>> +     data point is whether the literal, including the terminating nul,
>>>> +     fits in the array. */
>>>> +  unsigned HOST_WIDE_INT array_elts
>>>> +    = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (init))) / charsize;
>>>> +
>>>> +  /* Compute the string length in (wide) characters.  */
>>>
>>> So prior to my STRING_CST patch this will ICE on a flexible array member,
>>> because those have TYPE_SIZE_UNIT = NULL, and tree_to_uhwi will ICE.
>>>
>>> I used:
>>>
>>>    compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)),
>>>                      TREE_STRING_LENGTH (init))
>>>
>>> and this will not ICE with NULL, but consider it like infinity,
>>> and return 1.
>>>
>>> So my version did not ICE in that case.
>>>
>>
>> Oooops,
>>
>> actually both versions will likely ICE on TYPE_SIZE_UNIT == NULL.
>> I actually tried to test that case, but have done something wrong.
>>
>> I hope we can get rid if the incomplete types in the middle-end.
>>
>> So maybe just inject "if (!tree_fits_uhwi_p (...)) return NULL_TREE;"
>> here?   Or maybe just defer until we have clarity about the semantics.
> Not sure.  I've tried largely to not let VLA issues drive anything here.
>   I'm not a fan of them for a variety of reasons and thus I tend to look
> at all the VLA stuff as exceptional cases.
> 

We should have a test case with flexible array members; those behave
slightly differently from VLAs.

#include <string.h>

struct s {
   int i;
   char x[];
};

const struct s s = { 1, "test" };

int f (void)
{
   return strlen (s.x);
}


By the way, since you are changing so much in Martin's patch, it might
be good to post it to this list again when it's ready, so that I can
have a look at it and help out with review comments.  (Please put me on CC.)


Bernd.

* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-25 19:32                       ` Bernd Edlinger
@ 2018-08-25 20:42                         ` Martin Sebor
  2018-08-26 10:20                           ` Bernd Edlinger
  2018-08-25 23:22                         ` Jeff Law
  1 sibling, 1 reply; 53+ messages in thread
From: Martin Sebor @ 2018-08-25 20:42 UTC (permalink / raw)
  To: Bernd Edlinger, Jeff Law, Gcc Patch List

On 08/25/2018 01:32 PM, Bernd Edlinger wrote:
> On 08/25/18 21:02, Jeff Law wrote:
>> On 08/25/2018 12:36 PM, Bernd Edlinger wrote:
>>
>>>>>>
>>>>>
>>>>> Well, ya call it "layer one patch over the other"
>>>>> I call it "incremental improvements".
>>>> It is (of course) a case by case basis.  The way I try to look at these
>>>> things is to ask whether or not the first patch under consideration
>>>> would have any value/purpose after the second patch was installed.  If
>>>> so, then it may make sense to include both.  If not, then we really just
>>>> want one patch.
>>>>
>>>
>>> Agreed.  I think the question is which of the possible STRING_CST
>>> semantics we want to have in the end (the middle-end).
>>> Everything builds on top of the semantic properties of STRING_CSTs.
>> This certainly plays a role.  I bumped pretty hard against the
>> STRING_CST semantics issue with Martin's patch.  I'm hoping that making
>> those more consistent will ultimately simplify things and avoid the
>> problems I'm stumbling over.
>>
>> Of course, that means more delays in getting this sorted out.  I really
>> thought I had a viable plan a couple days ago, but I'm having to rethink
>> in light of some of the issues raised.
>>
>
> I think we should slow down.

That's coming from someone who's been piling on revisions upon
revisions of your own work here as you change your mind about
whether STRING_CST should or shouldn't have a nul byte at
the end.  You've been doing nothing but slowing us down for
the last five weeks.

There's nothing in this basic patch that cannot be easily adjusted
if something changes in the future.  But there is quite a bit of
useful work already that builds on the basic infrastructure in it
that could move forward and that wouldn't be significantly affected
by changes to the underlying representation.

Martin

* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-25 19:32                       ` Bernd Edlinger
  2018-08-25 20:42                         ` Martin Sebor
@ 2018-08-25 23:22                         ` Jeff Law
  1 sibling, 0 replies; 53+ messages in thread
From: Jeff Law @ 2018-08-25 23:22 UTC (permalink / raw)
  To: Bernd Edlinger, Martin Sebor, Gcc Patch List

On 08/25/2018 01:32 PM, Bernd Edlinger wrote:
> On 08/25/18 21:02, Jeff Law wrote:
>> On 08/25/2018 12:36 PM, Bernd Edlinger wrote:
>>
>>>>>>
>>>>>
>>>>> Well, ya call it "layer one patch over the other"
>>>>> I call it "incremental improvements".
>>>> It is (of course) a case by case basis.  The way I try to look at these
>>>> things is to ask whether or not the first patch under consideration
>>>> would have any value/purpose after the second patch was installed.  If
>>>> so, then it may make sense to include both.  If not, then we really just
>>>> want one patch.
>>>>
>>>
>>> Agreed.  I think the question is which of the possible STRING_CST
>>> semantics we want to have in the end (the middle-end).
>>> Everything builds on top of the semantic properties of STRING_CSTs.
>> This certainly plays a role.  I bumped pretty hard against the
>> STRING_CST semantics issue with Martin's patch.  I'm hoping that making
>> those more consistent will ultimately simplify things and avoid the
>> problems I'm stumbling over.
>>
>> Of course, that means more delays in getting this sorted out.  I really
>> thought I had a viable plan a couple days ago, but I'm having to rethink
>> in light of some of the issues raised.
>>
> 
> I think we should slow down.
> 
>>
>>>
>>> My first attempt of fix the STRING_CST semantic was trying to make
>>> string_constant happy.
>>>
>>> My second attempt is trying to make Richard happy.  And when I look
>>> at both patches, I think the second one is better, and more simple.
>> In general I've found that Richie's advice generally results in a
>> cleaner implementation ;-)
>>>
>>>
>>> BTW I need to correct on statement in my last e-mail:
>>>
>>> On 08/24/18 23:55, Bernd Edlinger wrote:>
>>>> here I quote form martins 1/6 patch:
>>>>> +  /* Compute the lower bound number of elements (not bytes) in the array
>>>>> +     that the string is used to initialize.  The actual size of the array
>>>>> +     will be may be greater if the string is shorter, but the important
>>>>> +     data point is whether the literal, including the terminating nul,
>>>>> +     fits in the array. */
>>>>> +  unsigned HOST_WIDE_INT array_elts
>>>>> +    = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (init))) / charsize;
>>>>> +
>>>>> +  /* Compute the string length in (wide) characters.  */
>>>>
>>>> So prior to my STRING_CST patch this will ICE on a flexible array member,
>>>> because those have TYPE_SIZE_UNIT = NULL, and tree_to_uhwi will ICE.
>>>>
>>>> I used:
>>>>
>>>>    compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)),
>>>>                      TREE_STRING_LENGTH (init))
>>>>
>>>> and this will not ICE with NULL, but consider it like infinity,
>>>> and return 1.
>>>>
>>>> So my version did not ICE in that case.
>>>>
>>>
>>> Oooops,
>>>
>>> actually both versions will likely ICE on TYPE_SIZE_UNIT == NULL.
>>> I actually tried to test that case, but have done something wrong.
>>>
>>> I hope we can get rid if the incomplete types in the middle-end.
>>>
>>> So maybe just inject "if (!tree_fits_uhwi_p (...)) return NULL_TREE;"
>>> here?   Or maybe just defer until we have clarity about the semantics.
>> Not sure.  I've tried largely to not let VLA issues drive anything here.
>>   I'm not a fan of them for a variety of reasons and thus I tend to look
>> at all the VLA stuff as exceptional cases.
>>
> 
> We should have a test case with flexible array members, those behave
> slightly different than VLAs.
> 
> struct {
>    int i;
>    char x[];
> } s;
> 
> const struct s s = { 1, "test" };
> 
> int f()
> {
>    return strlen(s.x);
> }
> 
> 
> By the way, when you change so much on Martin's patch it might be
> good to post it again on this list, when it's ready, so maybe I can have a
> look at it and help out with some review comments?  (please put me on CC)
Actually very little changed other than mechanical stuff.  The one
problem with that code in isolation is a single chunk that assumed
STRING_CSTs are always NUL terminated.

But I also have to evaluate your patches in the same area and also look
at the known follow-ups to see how the various patches are likely to
interact.

And each time core assumptions/issues are reexamined, parts of the
evaluation have to be redone.  In some cases the changes simplify it;
in others they make the analysis more complex.

Jeff
> 
> 
> Bernd.
> 

* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-25 20:42                         ` Martin Sebor
@ 2018-08-26 10:20                           ` Bernd Edlinger
  0 siblings, 0 replies; 53+ messages in thread
From: Bernd Edlinger @ 2018-08-26 10:20 UTC (permalink / raw)
  To: Martin Sebor, Jeff Law, Gcc Patch List

On 08/25/18 22:42, Martin Sebor wrote:
> On 08/25/2018 01:32 PM, Bernd Edlinger wrote:
>> On 08/25/18 21:02, Jeff Law wrote:
>>> On 08/25/2018 12:36 PM, Bernd Edlinger wrote:
>>>
>>>>>>>
>>>>>>
>>>>>> Well, ya call it "layer one patch over the other"
>>>>>> I call it "incremental improvements".
>>>>> It is (of course) a case by case basis.  The way I try to look at these
>>>>> things is to ask whether or not the first patch under consideration
>>>>> would have any value/purpose after the second patch was installed.  If
>>>>> so, then it may make sense to include both.  If not, then we really just
>>>>> want one patch.
>>>>>
>>>>
>>>> Agreed.  I think the question is which of the possible STRING_CST
>>>> semantics we want to have in the end (the middle-end).
>>>> Everything builds on top of the semantic properties of STRING_CSTs.
>>> This certainly plays a role.  I bumped pretty hard against the
>>> STRING_CST semantics issue with Martin's patch.  I'm hoping that making
>>> those more consistent will ultimately simplify things and avoid the
>>> problems I'm stumbling over.
>>>
>>> Of course, that means more delays in getting this sorted out.  I really
>>> thought I had a viable plan a couple days ago, but I'm having to rethink
>>> in light of some of the issues raised.
>>>
>>
>> I think we should slow down.
> 
> That's coming from someone whose been piling on revisions upon
> revisions of your own work here as you change your mind about
> whether STRING_CST should or shouldn't have a nul byte at
> the end.  You've been doing nothing but slowing us down for
> the last five weeks.
> 

Yes, you know, I may be stubborn, but my point of view regarding
these matters is not set in stone.

And it happens that I listen to what others say, including you.

And if after careful consideration I find myself in agreement with
what was said, I simply take the liberty of changing my point of view.

I for one have learned a lot out of this discussion and out of
writing and repeatedly revising my patches.

I truly hope others have gained some new insights as well.


Thanks
Bernd.

* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) )
  2018-08-02 18:56         ` Bernd Edlinger
  2018-08-02 20:34           ` Martin Sebor
@ 2018-08-29 17:17           ` Jeff Law
  1 sibling, 0 replies; 53+ messages in thread
From: Jeff Law @ 2018-08-29 17:17 UTC (permalink / raw)
  To: Bernd Edlinger, Martin Sebor, Gcc Patch List

On 08/02/2018 12:56 PM, Bernd Edlinger wrote:
> On 08/02/18 15:26, Bernd Edlinger wrote:
>>>
>>>    /* If the length can be computed at compile-time, return it.  */
>>> -  len = c_strlen (src, 0);
>>> +  tree array;
>>> +  tree len = c_strlen (src, 0, &array);
>>
>> You know the c_strlen tries to compute wide character sizes,
>> but strlen does not do that, strlen (L"abc") should give 1
>> (or 0 on a BE machine)
>> I wonder if that is correct.
>>
> [snip]
>>>
>>>  static tree
>>> -fold_builtin_strlen (location_t loc, tree type, tree arg)
>>> +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg)
>>>  {
>>>    if (!validate_arg (arg, POINTER_TYPE))
>>>      return NULL_TREE;
>>>    else
>>>      {
>>> -      tree len = c_strlen (arg, 0);
>>> -
>>> +      tree arr = NULL_TREE;
>>> +      tree len = c_strlen (arg, 0, &arr);
>>
>> Is it possible to write a test case where strlen(L"test") reaches this point?
>> what will c_strlen return then?
>>
> 
> Yes, of course it is:
> 
> $ cat y.c
> int f(char *x)
> {
>    return __builtin_strlen(x);
> }
> 
> int main ()
> {
>    return f((char*)&L"abcdef"[0]);
> }
FWIW, I've twiddled this a bit and included it in Martin's patch for
86711/86714.  The proper return value is 0 or 1 depending on endianness.
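The endianness dependence can be observed at run time. This sketch assumes `sizeof (wchar_t) > 1`, which is 2 or 4 on common targets. The function names are illustrative only. Reading `L"abcdef"` through a `char *` stops at the first zero byte. On a little-endian target that is the second byte, and on a big-endian target it is the first:

```c
#include <stddef.h>
#include <string.h>

/* What __builtin_strlen ((char *) &L"abcdef"[0]) computes at run time:
   strlen over the object representation of a wide string.  The
   trailing L'\0' guarantees a zero byte inside the array.  */
static size_t
narrow_strlen_of_wide (void)
{
  static const wchar_t w[] = L"abcdef";
  return strlen ((const char *) w);
}

/* 1 if the low-order byte of L'a' comes first (little-endian), 0 if a
   zero byte comes first (big-endian); assumes sizeof (wchar_t) > 1.  */
static size_t
expected_narrow_strlen (void)
{
  const wchar_t a = L'a';
  return *(const unsigned char *) &a != 0 ? 1 : 0;
}
```

A compile-time fold of the same call has to produce the value the target would compute here, not the wide-character length 6 that c_strlen would report.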


Jeff

* Re: [PATCH 2/6] detect unterminated const arrays in strlen calls (PR 86552)
  2018-08-14  3:21     ` [PATCH 2/6] detect unterminated const arrays in strlen " Martin Sebor
@ 2018-08-30 22:15       ` Jeff Law
  2018-08-31  2:25         ` Martin Sebor
  0 siblings, 1 reply; 53+ messages in thread
From: Jeff Law @ 2018-08-30 22:15 UTC (permalink / raw)
  To: Martin Sebor, Gcc Patch List

On 08/13/2018 09:21 PM, Martin Sebor wrote:
> [PATCH 2/6] detect unterminated const arrays in strlen calls (PR 86552)
> 
> The attached changes implement the detection of past-the-end reads
> by strlen due to unterminated arguments.
> 
> gcc-86552-2.diff
> 
> 
> PR tree-optimization/86552 - missing warning for reading past the end
> 
> gcc/ChangeLog:
> 
> 	* builtins.c (warn_string_no_nul): New function.
> 	(expand_builtin_strlen): Warn for unterminated arrays.
> 	(fold_builtin_strlen): Add argument.  Warn for unterminated arrays.
> 	(fold_builtin_1): Adjust call to fold_builtin_strlen.
> 	* builtins.h (warn_string_no_nul): New function.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/warn-strlen-no-nul.c: New test.
So this has a dependency on parts of the 1/6 patch that haven't been
committed yet.

Ignoring that for the moment (since I have those parts in my tree :-)...

There are minor API changes to functions we need to use.  Those are
trivially fixed up.

With that taken care of I get one XPASS from the new test:


> +T (v0 ? &b[3][v0] : &b[3][v1]);   /* { dg-warning "nul" "bug" { xfail *-*-* } }  */
I haven't dug into why this now passes.  It could well be the various
refinements we've made over the last couple weeks.

Given that I've got the patch in my tree I'll take care of posting the
final version of the patch and committing it once I've committed the
prereqs.

Jeff

* Re: [PATCH 3/6] detect unterminated const arrays in strcpy calls (PR 86552)
  2018-08-13 21:27     ` [PATCH 3/6] detect unterminated const arrays in strcpy calls (PR 86552) Martin Sebor
@ 2018-08-30 22:31       ` Jeff Law
  0 siblings, 0 replies; 53+ messages in thread
From: Jeff Law @ 2018-08-30 22:31 UTC (permalink / raw)
  To: Martin Sebor, Gcc Patch List

On 08/13/2018 03:27 PM, Martin Sebor wrote:
> The attached changes implement the detection of past-the-end reads
> by strcpy due to unterminated arguments.
> 
> gcc-86552-3.diff
> 
> 
> PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays
> 
> gcc/ChangeLog:
> 
> 	* builtins.c (unterminated_array): New.
> 	(expand_builtin_strcpy): Adjust.
> 	(expand_builtin_strcpy_args): Detect unterminated arrays.
> 	* gimple-fold.c (get_maxval_strlen): Add argument.  Detect
> 	unterminated arrays.
> 	* gimple-fold.h (get_maxval_strlen): Add argument.
> 	(gimple_fold_builtin_strcpy): Detec unterminated arrays.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/warn-strcpy-no-nul.c: New test.
From a review standpoint this is essentially in the same state as patch
#2.  It depends on bits that haven't been installed (yet) and needs
trivial API updates.  There's one test that is an XPASS which is clearly
derived from the same test that is an XPASS in patch #2.


While reviewing I noticed that get_maxval_strlen didn't have a function
comment.  So I added one.  get_maxval_strlen will likely need further
refinement of its comment or code once get_range_strlen gets revamped.

As with patch #2 in this series, I'll own posting the final patch and
committing the bits.

jeff

* Re: [PATCH 4/6] detect unterminated const arrays in sprintf calls (PR 86552)
  2018-08-13 21:28     ` [PATCH 4/6] detect unterminated const arrays in sprintf " Martin Sebor
@ 2018-08-30 22:55       ` Jeff Law
  0 siblings, 0 replies; 53+ messages in thread
From: Jeff Law @ 2018-08-30 22:55 UTC (permalink / raw)
  To: Martin Sebor, Gcc Patch List

On 08/13/2018 03:28 PM, Martin Sebor wrote:
> The attached changes implement the detection of past-the-end reads
> by the sprintf family of functions due to unterminated arguments to
> %s directives.
> 
> gcc-86552-4.diff
> 
> 
> PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays
> 
> gcc/ChangeLog:
> 
> 	* gimple-ssa-sprintf.c (struct fmtresult): Add new member and
> 	initialize it.
> 	(get_string_length): Detect unterminated arrays.
> 	(format_string): Same.
> 	(format_directive): Warn about unterminated arrays.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/warn-sprintf-no-nul.c: New test.
Largely the same state as #2 and #3.

I am getting a failure from the test though.  It looks like the sprintf
code is turning an offending sprintf call into a strcpy call and we end
up getting a warning from both.

> @@ -2988,6 +3002,18 @@ format_directive (const sprintf_dom_walker::call_info &info,
>  			  fmtres.range.min, fmtres.range.max);
>      }
>  
> +  if (!warned && fmtres.nonstr)
> +    {
> +      warned = fmtwarn (dirloc, argloc, NULL, info.warnopt (),
> +			"%<%.*s%> directive argument is not a nul-terminated "
> +			"string",
> +			dirlen,
> +			target_to_host (hostdir, sizeof hostdir, dir.beg));
> +      if (warned && DECL_P (fmtres.nonstr))
> +	inform (DECL_SOURCE_LOCATION (fmtres.nonstr),
> +		"referenced argument declared here");
> +    }
> +
ISTM that returning false from this point should address the issue,
essentially preventing the sprintf->strcpy transformation when the
directive argument is not NUL terminated.
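The effect of the suggested early return can be modeled in a few lines. Everything here is hypothetical (`struct fmtresult_model`, `format_directive_model`, `may_fold_to_strcpy`). It only mirrors the idea that a false result from the directive handler keeps the call from being rewritten into strcpy, so the unterminated argument is diagnosed once:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-reduced model of the result of processing a %s
   directive.  NONSTR names the argument known not to be nul-terminated,
   or is NULL (a stand-in for fmtres.nonstr).  */
struct fmtresult_model
{
  const char *nonstr;
};

/* Model of the suggested change: after warning about an unterminated
   argument, return false so the call is not treated as foldable.  */
static bool
format_directive_model (const struct fmtresult_model *res, bool *warned)
{
  *warned = false;
  if (res->nonstr != NULL)
    {
      *warned = true;  /* Stands in for the fmtwarn () call.  */
      return false;
    }
  return true;
}

/* Caller model: fold sprintf (d, "%s", s) into strcpy (d, s) only when
   the directive was processed successfully, so an unterminated S is
   not diagnosed a second time by the strcpy checks.  */
static bool
may_fold_to_strcpy (const struct fmtresult_model *res)
{
  bool warned;
  return format_directive_model (res, &warned);
}
```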


I'll own this just like #2 and #3.

jeff

* Re: [PATCH 5/6] detect unterminated const arrays in stpcpy calls (PR 86552)
  2018-08-13 21:29     ` [PATCH 5/6] detect unterminated const arrays in stpcpy " Martin Sebor
@ 2018-08-30 23:07       ` Jeff Law
  2018-09-14 18:39       ` Jeff Law
  1 sibling, 0 replies; 53+ messages in thread
From: Jeff Law @ 2018-08-30 23:07 UTC (permalink / raw)
  To: Martin Sebor, Gcc Patch List

On 08/13/2018 03:28 PM, Martin Sebor wrote:
> The attached changes implement the detection of past-the-end reads
> by stpcpy due to unterminated arguments.
> 
> 
> gcc-86552-5.diff
> 
> 
> PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays
> 
> gcc/ChangeLog:
> 
> 	* builtins.c (unterminated_array): Handle ARRAY_REF.
> 	(expand_builtin_stpcpy_1): Detect unterminated char arrays.
> 	* builtins.h (unterminated_array): Declare extern.
> 	* gimple-fold.c (gimple_fold_builtin_stpcpy): Detect unterminated
> 	  arrays.
> 	(gimple_fold_builtin_sprintf): Propagate NO_WARNING to transformed
> 	calls.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/warn-stpcpy-no-nul.c: New test.
Same story and resolution as #2, #3 and #4.

Jeff

* Re: [PATCH 6/6] detect unterminated const arrays in strnlen calls (PR 86552)
  2018-08-13 21:29     ` [PATCH 6/6] detect unterminated const arrays in strnlen " Martin Sebor
@ 2018-08-30 23:25       ` Jeff Law
  2018-10-01 21:49       ` Jeff Law
  1 sibling, 0 replies; 53+ messages in thread
From: Jeff Law @ 2018-08-30 23:25 UTC (permalink / raw)
  To: Martin Sebor, Gcc Patch List

On 08/13/2018 03:29 PM, Martin Sebor wrote:
> The attached changes implement the detection of past-the-end reads
> by strncpy due to unterminated arguments and excessive bounds.
> 
> 
> gcc-86552-6.diff
> 
> 
> PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays
> 
> gcc/ChangeLog:
> 	* builtins.c (expand_builtin_strnlen): Detect, avoid expanding,
> 	and diagnose unterminated arrays.
> 
> gcc/testsuite/ChangeLog:
> 	* gcc.dg/warn-strnlen-no-nul.c: New.
This will have the same state and resolution as #2-#5.

jeff

* Re: [PATCH 2/6] detect unterminated const arrays in strlen calls (PR 86552)
  2018-08-30 22:15       ` Jeff Law
@ 2018-08-31  2:25         ` Martin Sebor
  0 siblings, 0 replies; 53+ messages in thread
From: Martin Sebor @ 2018-08-31  2:25 UTC (permalink / raw)
  To: Jeff Law, Gcc Patch List

On 08/30/2018 04:15 PM, Jeff Law wrote:
> On 08/13/2018 09:21 PM, Martin Sebor wrote:
>> [PATCH 2/6] detect unterminated const arrays in strlen calls (PR 86552)
>>
>> The attached changes implement the detection of past-the-end reads
>> by strlen due to unterminated arguments.
>>
>> gcc-86552-2.diff
>>
>>
>> PR tree-optimization/86552 - missing warning for reading past the end
>>
>> gcc/ChangeLog:
>>
>> 	* builtins.c (warn_string_no_nul): New function.
>> 	(expand_builtin_strlen): Warn for unterminated arrays.
>> 	(fold_builtin_strlen): Add argument.  Warn for unterminated arrays.
>> 	(fold_builtin_1): Adjust call to fold_builtin_strlen.
>> 	* builtins.h (warn_string_no_nul): New function.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* gcc.dg/warn-strlen-no-nul.c: New test.
> So this has a dependency on parts of the 1/6 patch that haven't been
> committed yet.
>
> Ignoring that for the moment (since I have those parts in my tree :-)...
>
> There are minor API changes to functions we need to use.  Those are
> trivially fixed up.
>
> With that taken care of I get one XPASS from the new test:
>
>
>> +T (v0 ? &b[3][v0] : &b[3][v1]);   /* { dg-warning "nul" "bug" { xfail *-*-* } }  */
> I haven't dug into why this now passes.  It could well be the various
> refinements we've made over the last couple weeks.

I'm not sure what's letting it succeed.  get_range_strlen() can
tell the array isn't nul-terminated but it's only called from
gimple_fold_builtin_strlen() which doesn't warn.   The next
chance to warn is handle_builtin_strlen() but it doesn't call
get_range_strlen().  The next opportunity to warn after that
is expand_builtin_strlen() and it doesn't call get_range_strlen()
either.  There definitely are more opportunities to warn as
the many xfails in the warn-strlen-no-nul.c test indicate.
I didn't want to make the initial patch too big and intrusive
by handling all those cases but it's something I'd like to do
in a followup.
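For context, the arrays in question are ones like `b` in the new
warn-strnlen-no-nul.c test. Its last initializer, "54321", exactly
fills a char[5]. C (unlike C++) accepts this and simply omits the
terminating nul, so strlen (b[3]) reads past the end of b[3]. The
sketch below is only an illustration: `has_nul` is a hypothetical
runtime helper, not a GCC function. It shows which rows are terminated.
GCC's diagnostic makes the same determination at compile time from the
constant initializer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same initializer as in the test: "54321" exactly fills char[5],
   so in C the terminating nul is silently dropped for b[3].  */
static const char b[][5] = { "12", "123", "1234", "54321" };

/* Hypothetical helper: true if the N bytes starting at P contain a
   nul, i.e. strlen (P) would stay within them.  memchr never looks
   beyond P + N, so the check itself cannot overrun.  */
static bool has_nul (const char *p, size_t n)
{
  return memchr (p, '\0', n) != NULL;
}
```

Only b[3], and any pointer into it, lacks a nul within its row, which
is why the xfailed conditional &b[3][v0] / &b[3][v1] case is a
legitimate candidate for the warning.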

>
> Given that I've got the patch in my tree I'll take care of posting the
> final version of the patch and committing it once I've committed the
> prereqs.

Sounds good.  Thanks for handling that!

Martin

* Re: [PATCH 5/6] detect unterminated const arrays in stpcpy calls (PR 86552)
  2018-08-13 21:29     ` [PATCH 5/6] detect unterminated const arrays in stpcpy " Martin Sebor
  2018-08-30 23:07       ` Jeff Law
@ 2018-09-14 18:39       ` Jeff Law
  1 sibling, 0 replies; 53+ messages in thread
From: Jeff Law @ 2018-09-14 18:39 UTC (permalink / raw)
  To: Martin Sebor, Gcc Patch List

On 8/13/18 3:28 PM, Martin Sebor wrote:
> The attached changes implement the detection of past-the-end reads
> by stpcpy due to unterminated arguments.
> 
> 
> gcc-86552-5.diff
> 
> PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays
> 
> gcc/ChangeLog:
> 
> 	* builtins.c (unterminated_array): Handle ARRAY_REF.
> 	(expand_builtin_stpcpy_1): Detect unterminated char arrays.
> 	* builtins.h (unterminated_array): Declare extern.
> 	* gimple-fold.c (gimple_fold_builtin_stpcpy): Detect unterminated
> 	  arrays.
> 	(gimple_fold_builtin_sprintf): Propagate NO_WARNING to transformed
> 	calls.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.dg/warn-stpcpy-no-nul.c: New test.
So with this patch I just added initialization for a NONSTR passed down
to c_strlen.  Otherwise it just worked on top of all the recent changes.
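The stpcpy case works the same way. stpcpy copies the source up to and
including its nul and returns a pointer to the copied nul in the
destination, so an unterminated source makes it read past the end of
the source array. The sketch below assumes the caller knows the size
of the source array. `bounded_stpcpy` is a hypothetical helper, not
part of the patch or of libc. It refuses an unterminated source
instead of overrunning it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a libc or GCC function).  Copies the string
   in SRC, which lives in an array of SRC_SIZE bytes, to DST the way
   stpcpy does and returns a pointer to the nul written to DST.  Returns
   NULL without copying if SRC has no nul within SRC_SIZE bytes, which
   is the unterminated case the new warning targets.  DST must have room
   for at least strlen (SRC) + 1 bytes.  */
static char *bounded_stpcpy (char *dst, const char *src, size_t src_size)
{
  const char *nul = memchr (src, '\0', src_size);
  if (!nul)
    return NULL;

  size_t len = (size_t) (nul - src);
  memcpy (dst, src, len + 1);   /* Copy the terminating nul too.  */
  return dst + len;
}
```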

I'll install it on the trunk momentarily.

I'll probably stop here today to let the testers run through another
cycle.  What's left of this kit is #4 (sprintf) and #6 (strnlen).

Jeff

* Re: [PATCH 6/6] detect unterminated const arrays in strnlen calls (PR 86552)
  2018-08-13 21:29     ` [PATCH 6/6] detect unterminated const arrays in strnlen " Martin Sebor
  2018-08-30 23:25       ` Jeff Law
@ 2018-10-01 21:49       ` Jeff Law
  1 sibling, 0 replies; 53+ messages in thread
From: Jeff Law @ 2018-10-01 21:49 UTC (permalink / raw)
  To: Martin Sebor, Gcc Patch List

[-- Attachment #1: Type: text/plain, Size: 1474 bytes --]

On 8/13/18 3:29 PM, Martin Sebor wrote:
> The attached changes implement the detection of past-the-end reads
> by strnlen due to unterminated arguments and excessive bounds.
> 
> 
> gcc-86552-6.diff
> 
> PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays
> 
> gcc/ChangeLog:
> 	* builtins.c (expand_builtin_strnlen): Detect, avoid expanding,
> 	and diagnose unterminated arrays.
> 
> gcc/testsuite/ChangeLog:
> 	* gcc.dg/warn-strnlen-no-nul.c: New.
So the changes to c_strlen's API allow us to simplify the changes you
made to unterminated_array.  Essentially we get to drop the code which
tears apart EXP before handing things off to c_strlen -- that's all
handled inside c_strlen/string_constant now.


c_strlen returns NULL for an unterminated array or anything it can't
handle.  So we check for NULL return value and a non-NULL data.decl to
see if we had an unterminated array.  We can get the length of the
unterminated string and the offset via the c_strlen_data we pass to
c_strlen in that case.
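The calling convention described above can be modeled outside GCC. A
NULL return together with a non-NULL decl means "unterminated". The
sketch below is a simplified stand-in and not GCC's code:
`strlen_info`, `const_strlen`, and `unterminated` are hypothetical
analogues of c_strlen_data, c_strlen, and unterminated_array. They work
on a plain byte array with a constant offset and use -1 in place of
NULL_TREE.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical analogue of c_strlen_data: DECL names the unterminated
   object, LEN is the number of bytes from the offset to its end.  */
struct strlen_info
{
  const char *decl;
  long len;
};

/* Hypothetical analogue of c_strlen for a constant array ARR of SIZE
   bytes named NAME, read from constant offset OFF.  Returns the string
   length, or -1 (standing in for NULL_TREE).  If no nul occurs between
   OFF and the end, also records NAME and the remaining size in *INFO.  */
static long const_strlen (const char *arr, size_t size, size_t off,
                          const char *name, struct strlen_info *info)
{
  if (off >= size)
    return -1;                          /* Can't handle: no info.  */

  const char *nul = memchr (arr + off, '\0', size - off);
  if (nul)
    return (long) (nul - (arr + off));

  info->decl = name;
  info->len = (long) (size - off);      /* Constant offset folded in.  */
  return -1;
}

/* Caller side: only a -1 result combined with a recorded decl means
   "unterminated"; *SIZE_OUT then receives the remaining size.  */
static const char *unterminated (const char *arr, size_t size, size_t off,
                                 const char *name, long *size_out)
{
  struct strlen_info info;
  memset (&info, 0, sizeof info);       /* Zero-initialize first.  */
  if (const_strlen (arr, size, off, name, &info) < 0 && info.decl)
    {
      *size_out = info.len;
      return info.decl;
    }
  return NULL;
}
```

With the test's `a[5] = "12345"`, offset 1 yields a remaining size of
4. That matches the "exceeds the size 4" message expected for
&a[1].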

If the offset is a pure constant, then it will already be accounted for
in data->len.  So we no longer need to adjust it.  If the offset is
SSA_NAME + INTEGER_CST, we adjust the length by INTEGER_CST and bubble
up exact = false.
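The offset rules in the paragraph above reduce to a small decision
table. The sketch below restates it in plain C. The `off_kind` enum and
`adjust_size` function are hypothetical stand-ins for inspecting
TREE_CODE (data.off) and are not part of GCC's API.

```c
#include <stdbool.h>

/* Hypothetical classification of the offset reported by c_strlen.  */
enum off_kind
{
  OFF_NONE,          /* No offset recorded.  */
  OFF_CONST,         /* INTEGER_CST: already folded into LEN.  */
  OFF_VAR_PLUS_CST,  /* SSA_NAME + INTEGER_CST.  */
  OFF_OTHER          /* Anything else.  */
};

/* LEN is the size reported for the unterminated array and CST the
   constant addend of an SSA_NAME + CST offset.  Returns the possibly
   adjusted size and sets *EXACT to whether it is exact (true) or only
   an upper bound (false).  */
static long adjust_size (enum off_kind kind, long len, long cst, bool *exact)
{
  switch (kind)
    {
    case OFF_NONE:
    case OFF_CONST:
      *exact = true;
      return len;
    case OFF_VAR_PLUS_CST:
      *exact = false;
      return len - cst;   /* Subtract the known constant part.  */
    default:
      *exact = false;
      return len;
    }
}
```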


I think that summarizes the relatively minor changes I ended up making.

Bootstrapped and regression tested on x86_64.  Installing on the trunk.

Jeff
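The first new check in expand_builtin_strnlen warns when the argument
is an unterminated array and either the bound exceeds the array's
remaining size or that size is only an upper bound. In the second
case it uses the "may exceed ... at most" wording. Below is a
plain-C restatement of that predicate with hypothetical names, checked
against a few cases from warn-strnlen-no-nul.c. It ignores
TREE_NO_WARNING and the separate range-bound check further down.

```c
#include <stdbool.h>

/* Which of the two new messages, if any, would be issued.  */
enum strnlen_diag { DIAG_NONE, DIAG_EXCEEDS, DIAG_MAY_EXCEED };

/* UNTERMINATED: the argument refers to an unterminated constant array.
   SIZE: bytes from the argument to the end of that array, exact or an
   upper bound per EXACT.  BOUND: strnlen's second argument.  */
static enum strnlen_diag
classify_strnlen (bool unterminated, unsigned long size, bool exact,
                  unsigned long bound)
{
  /* Mirrors: decl && (len < bound || !exact).  */
  if (!unterminated || (exact && size >= bound))
    return DIAG_NONE;
  return exact ? DIAG_EXCEEDS : DIAG_MAY_EXCEED;
}
```

For example, T (&a[1], asz) has size 4 and bound 5 and gets the
"exceeds" message. T (&a[v0], asz) has an inexact size of at most 5
and gets the "may exceed" message.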

[-- Attachment #2: P --]
[-- Type: text/plain, Size: 21381 bytes --]

commit ab9a04daf8adffdb00fd085e6f217efeb42875ce
Author: Jeff Law <law@torsion.usersys.redhat.com>
Date:   Thu Aug 30 19:24:34 2018 -0400

            * builtins.c (unterminated_array): Add new arguments.
            If argument is not terminated, bubble up size and exact
            state to callers.
            (expand_builtin_strnlen): Detect, avoid expanding
            and diagnose unterminated arrays.
            (c_strlen): Fill in offset of start of unterminated strings.
            * builtins.h (unterminated_array): Update prototype.
    
            * gcc.dg/warn-strnlen-no-nul.c: New.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b43cc388fa8..05c6f558246 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,14 @@
+2018-10-01  Martin Sebor  <msebor@redhat.com>
+	    Jeff Law  <law@redhat.com>
+
+	* builtins.c (unterminated_array): Add new arguments.
+	If argument is not terminated, bubble up size and exact
+	state to callers.
+	(expand_builtin_strnlen): Detect, avoid expanding
+	and diagnose unterminated arrays.
+	(c_strlen): Fill in offset of start of unterminated strings.
+	* builtins.h (unterminated_array): Update prototype.
+
 2018-10-01  Carl Love  <cel@us.ibm.com>
 
 	PR 69431
diff --git a/gcc/builtins.c b/gcc/builtins.c
index fe411efd9a9..2cb1996dad3 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -565,15 +565,50 @@ warn_string_no_nul (location_t loc, const char *fn, tree arg, tree decl)
 
 /* If EXP refers to an unterminated constant character array return
    the declaration of the object of which the array is a member or
-   element.  Otherwise return null.  */
+   element and if SIZE is not null, set *SIZE to the size of
+   the unterminated array and set *EXACT if the size is exact or
+   clear it otherwise.  Otherwise return null.  */
 
 tree
-unterminated_array (tree exp)
+unterminated_array (tree exp, tree *size /* = NULL */, bool *exact /* = NULL */)
 {
+  /* C_STRLEN will return NULL and set DECL in the info
+     structure if EXP references an unterminated array.  */
   c_strlen_data data;
   memset (&data, 0, sizeof (c_strlen_data));
-  c_strlen (exp, 1, &data);
-  return data.decl;
+  tree len = c_strlen (exp, 1, &data);
+  if (len == NULL_TREE && data.len && data.decl)
+     {
+       if (size)
+	{
+	  len = data.len;
+	  if (data.off)
+	    {
+	      /* Constant offsets are already accounted for in data.len, but
+		 not in a SSA_NAME + CST expression.  */
+	      if (TREE_CODE (data.off) == INTEGER_CST)
+		*exact = true;
+	      else if (TREE_CODE (data.off) == PLUS_EXPR
+		       && TREE_CODE (TREE_OPERAND (data.off, 1)) == INTEGER_CST)
+		{
+		  /* Subtract the offset from the size of the array.  */
+		  *exact = false;
+		  tree temp = TREE_OPERAND (data.off, 1);
+		  temp = fold_convert (ssizetype, temp);
+		  len = fold_build2 (MINUS_EXPR, ssizetype, len, temp);
+		}
+	      else
+		*exact = false;
+	    }
+	  else
+	    *exact = true;
+
+	  *size = len;
+	}
+       return data.decl;
+     }
+
+  return NULL_TREE;
 }
 
 /* Compute the length of a null-terminated character string or wide
@@ -685,6 +720,7 @@ c_strlen (tree src, int only_value, c_strlen_data *data, unsigned eltsize)
       else if (len >= maxelts)
 	{
 	  data->decl = decl;
+	  data->off = byteoff;
 	  data->len = ssize_int (len);
 	  return NULL_TREE;
 	}
@@ -755,6 +791,7 @@ c_strlen (tree src, int only_value, c_strlen_data *data, unsigned eltsize)
   if (len >= maxelts - eltoff)
     {
       data->decl = decl;
+      data->off = byteoff;
       data->len = ssize_int (len);
       return NULL_TREE;
     }
@@ -3037,9 +3074,11 @@ expand_builtin_strnlen (tree exp, rtx target, machine_mode target_mode)
   tree maxobjsize = max_object_size ();
   tree func = get_callee_fndecl (exp);
 
-  tree len = c_strlen (src, 0);
   /* FIXME: Change c_strlen() to return sizetype instead of ssizetype
      so these conversions aren't necessary.  */
+  c_strlen_data data;
+  memset (&data, 0, sizeof (c_strlen_data));
+  tree len = c_strlen (src, 0, &data, 1);
   if (len)
     len = fold_convert_loc (loc, TREE_TYPE (bound), len);
 
@@ -3053,7 +3092,43 @@ expand_builtin_strnlen (tree exp, rtx target, machine_mode target_mode)
 			 exp, func, bound, maxobjsize))
 	  TREE_NO_WARNING (exp) = true;
 
+      bool exact = true;
       if (!len || TREE_CODE (len) != INTEGER_CST)
+	{
+	  /* Clear EXACT if LEN may be less than SRC suggests,
+	     such as in
+	       strnlen (&a[i], sizeof a)
+	     where the value of i is unknown.  Unless i's value is
+	     zero, the call is unsafe because the bound is greater. */
+	  data.decl = unterminated_array (src, &len, &exact);
+	  if (!data.decl)
+	    return NULL_RTX;
+	}
+
+      if (data.decl
+	  && !TREE_NO_WARNING (exp)
+	  && ((tree_int_cst_lt (len, bound))
+	      || !exact))
+	{
+	  location_t warnloc
+	    = expansion_point_location_if_in_system_header (loc);
+
+	  if (warning_at (warnloc, OPT_Wstringop_overflow_,
+			  exact
+			  ? G_("%K%qD specified bound %E exceeds the size %E "
+			       "of unterminated array")
+			  : G_("%K%qD specified bound %E may exceed the size "
+			       "of at most %E of unterminated array"),
+			  exp, func, bound, len))
+	    {
+	      inform (DECL_SOURCE_LOCATION (data.decl),
+		      "referenced argument declared here");
+	      TREE_NO_WARNING (exp) = true;
+	      return NULL_RTX;
+	    }
+	}
+
+      if (!len)
 	return NULL_RTX;
 
       len = fold_build2_loc (loc, MIN_EXPR, size_type_node, len, bound);
@@ -3079,6 +3154,18 @@ expand_builtin_strnlen (tree exp, rtx target, machine_mode target_mode)
   if (!len || TREE_CODE (len) != INTEGER_CST)
     return NULL_RTX;
 
+  if (!TREE_NO_WARNING (exp)
+      && wi::ltu_p (wi::to_wide (len), min)
+      && warning_at (loc, OPT_Wstringop_overflow_,
+		     "%K%qD specified bound [%wu, %wu] "
+		     "exceeds the size %E of unterminated array",
+		     exp, func, min.to_uhwi (), max.to_uhwi (), len))
+    {
+      inform (DECL_SOURCE_LOCATION (data.decl),
+	      "referenced argument declared here");
+      TREE_NO_WARNING (exp) = true;
+    }
+
   if (wi::gtu_p (min, wi::to_wide (len)))
     return expand_expr (len, target, target_mode, EXPAND_NORMAL);
 
diff --git a/gcc/builtins.h b/gcc/builtins.h
index 3801251f372..cf4f9b1b264 100644
--- a/gcc/builtins.h
+++ b/gcc/builtins.h
@@ -111,7 +111,7 @@ extern internal_fn associated_internal_fn (tree);
 extern internal_fn replacement_internal_fn (gcall *);
 
 extern void warn_string_no_nul (location_t, const char *, tree, tree);
-extern tree unterminated_array (tree);
+extern tree unterminated_array (tree, tree * = NULL, bool * = NULL);
 extern tree max_object_size ();
 
 #endif /* GCC_BUILTINS_H */
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index f322cc56caa..3a906bff938 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2018-10-01  Martin Sebor  <msebor@redhat.com>
+
+	* gcc.dg/warn-strnlen-no-nul.c: New.
+
 2018-10-01  Carl Love  <cel@us.ibm.com>
 
 	PR 69431
diff --git a/gcc/testsuite/gcc.dg/warn-strnlen-no-nul.c b/gcc/testsuite/gcc.dg/warn-strnlen-no-nul.c
new file mode 100644
index 00000000000..09a527ea337
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/warn-strnlen-no-nul.c
@@ -0,0 +1,356 @@
+/* PR tree-optimization/86552 - missing warning for reading past the end
+   of non-string arrays
+   { dg-do compile }
+   { dg-options "-O2 -Wall -ftrack-macro-expansion=0" } */
+
+typedef __SIZE_TYPE__ size_t;
+extern size_t strnlen (const char*, size_t);
+
+const char a[5] = "12345";   /* { dg-message "declared here" } */
+enum { asz = sizeof a };
+
+int v0 = 0;
+int v1 = 1;
+
+void sink (int, ...);
+
+#define CONCAT(a, b)   a ## b
+#define CAT(a, b)      CONCAT(a, b)
+
+#define T(str, n)					\
+  __attribute__ ((noipa))				\
+  void CAT (test_, __LINE__) (void) {			\
+    int i0 = 0, i1 = i0 + 1, i2 = i1 + 1, i3 = i2 + 1;	\
+    sink (strnlen (str, n), i0, i1, i2, i3);		\
+  } typedef void dummy_type
+
+T (a, asz);
+T (a, asz - 1);
+T (a, asz - 2);
+T (a, asz - 5);
+T (&a[0], asz);
+T (&a[0] + 1, asz);            /* { dg-warning "specified bound 5 exceeds the size 4 of unterminated array" } */
+T (&a[1], asz);                /* { dg-warning "specified bound 5 exceeds the size 4 of unterminated array" } */
+T (&a[1], asz - 1);
+T (&a[v0], asz);               /* { dg-warning "specified bound 5 may exceed the size of at most 5 of unterminated array" } */
+T (&a[v0] + 1, asz);           /* { dg-warning "specified bound 5 may exceed the size of at most 5 of unterminated array" } */
+
+T (a, asz + 1);                /* { dg-warning "specified bound 6 exceeds the size 5 " } */
+T (&a[0], asz + 1);            /* { dg-warning "unterminated" } */
+T (&a[0] + 1, asz - 1);
+T (&a[0] + 1, asz + 1);        /* { dg-warning "unterminated" } */
+T (&a[1], asz + 1);            /* { dg-warning "unterminated" } */
+T (&a[v0], asz + 1);           /* { dg-warning "unterminated" } */
+T (&a[v0] + 1, asz + 1);       /* { dg-warning "unterminated" } */
+
+
+const char b[][5] = { /* { dg-message "declared here" } */
+  "12", "123", "1234", "54321"
+};
+enum { bsz = sizeof b[0] };
+
+T (b[0], bsz);
+T (b[1], bsz);
+T (b[2], bsz);
+T (b[3], bsz);
+
+T (b[0], bsz - 1);
+T (b[1], bsz - 1);
+T (b[2], bsz - 1);
+T (b[3], bsz - 1);
+
+T (b[0], bsz + 1);
+T (b[1], bsz + 1);
+T (b[2], bsz + 1);
+T (b[3], bsz + 1);            /* { dg-warning "unterminated" } */
+
+T (b[i0], bsz);
+T (b[i1], bsz);
+T (b[i2], bsz);
+T (b[i3], bsz);
+
+T (b[i0], bsz + 1);
+T (b[i1], bsz + 1);
+T (b[i2], bsz + 1);
+T (b[i3], bsz + 1);           /* { dg-warning "unterminated" } */
+
+T (b[v0], bsz);
+T (b[v0], bsz + 1);
+
+T (&b[i2][i1], bsz);
+T (&b[i2][i1] + i1, bsz);
+T (&b[i2][v0], bsz);
+T (&b[i2][i1] + v0, bsz);
+
+T (&b[i2][i1], bsz + 1);
+T (&b[i2][i1] + i1, bsz + 1);
+T (&b[i2][v0], bsz + 1);
+T (&b[i2][i1] + v0, bsz + 1);
+
+T (&b[2][1], bsz);
+T (&b[2][1] + i1, bsz);
+T (&b[2][i0], bsz);
+T (&b[2][1] + i0, bsz);
+T (&b[2][1] + v0, bsz);
+T (&b[2][v0], bsz);
+
+T (&b[2][1], bsz + 1);
+T (&b[2][1] + i1, bsz + 1);
+T (&b[2][i0], bsz + 1);
+T (&b[2][1] + i0, bsz + 1);
+T (&b[2][1] + v0, bsz + 1);
+T (&b[2][v0], bsz + 1);
+
+T (&b[3][1], bsz);                /* { dg-warning "unterminated" } */
+T (&b[3][1], bsz - 1);
+T (&b[3][1] + 1, bsz);            /* { dg-warning "unterminated" } */
+T (&b[3][1] + 1, bsz - 1);        /* { dg-warning "unterminated" } */
+T (&b[3][1] + 1, bsz - 2);
+T (&b[3][1] + i1, bsz);           /* { dg-warning "unterminated" } */
+T (&b[3][1] + i1, bsz - i1);      /* { dg-warning "unterminated" } */
+T (&b[3][1] + i1, bsz - i2);
+T (&b[3][v0], bsz);
+T (&b[3][1] + v0, bsz);           /* { dg-warning "specified bound 5 may exceed the size of at most 4 of unterminated array" } */
+T (&b[3][v0] + v1, bsz);          /* { dg-warning "specified bound 5 may exceed the size of at most 4 of unterminated array" "pr?????" { xfail *-*-* } } */
+
+T (&b[3][1], bsz + 1);            /* { dg-warning "unterminated" } */
+T (&b[3][1] + 1, bsz + 1);        /* { dg-warning "unterminated" } */
+T (&b[3][1] + i1, bsz + 1);       /* { dg-warning "unterminated" } */
+T (&b[3][v0], bsz + 1);           /* { dg-warning "unterminated" "pr86936" { xfail *-*-* } } */
+T (&b[3][1] + v0, bsz + 1);       /* { dg-warning "unterminated" } */
+T (&b[3][v0] + v1, bsz + 1);      /* { dg-warning "unterminated" "pr86936" { xfail *-*-* } } */
+
+T (&b[i3][i1], bsz);              /* { dg-warning "unterminated" } */
+T (&b[i3][i1] + 1, bsz);          /* { dg-warning "unterminated" } */
+T (&b[i3][i1] + i1, bsz);         /* { dg-warning "specified bound 5 exceeds the size 3 of unterminated array" } */
+T (&b[i3][v0], bsz);
+T (&b[i3][i1] + v0, bsz);         /* { dg-warning "specified bound 5 may exceed the size of at most 4 of unterminated array" } */
+T (&b[i3][v0] + v1, bsz);
+
+T (&b[i3][i1], bsz + 1);          /* { dg-warning "unterminated" } */
+T (&b[i3][i1] + 1, bsz + 1);      /* { dg-warning "unterminated" } */
+T (&b[i3][i1] + i1, bsz + 1);     /* { dg-warning "unterminated" } */
+T (&b[i3][v0], bsz + 1);          /* { dg-warning "unterminated" "pr86919" { xfail *-*-* } } */
+T (&b[i3][i1] + v0, bsz + 1);     /* { dg-warning "unterminated" } */
+T (&b[i3][v0] + v1, bsz + 1);     /* { dg-warning "unterminated" "pr86919" { xfail *-*-* } } */
+
+T (v0 ? "" : b[0], bsz);
+T (v0 ? "" : b[1], bsz);
+T (v0 ? "" : b[2], bsz);
+T (v0 ? "" : b[3], bsz);
+T (v0 ? b[0] : "", bsz);
+T (v0 ? b[1] : "", bsz);
+T (v0 ? b[2] : "", bsz);
+T (v0 ? b[3] : "", bsz);
+
+T (v0 ? "" : b[0], bsz + 1);
+T (v0 ? "" : b[1], bsz + 1);
+T (v0 ? "" : b[2], bsz + 1);
+T (v0 ? "" : b[3], bsz + 1);      /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+T (v0 ? b[0] : "", bsz + 1);
+T (v0 ? b[1] : "", bsz + 1);
+T (v0 ? b[2] : "", bsz + 1);
+T (v0 ? b[3] : "", bsz + 1);      /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+
+T (v0 ? "" : b[i0], bsz);
+T (v0 ? "" : b[i1], bsz);
+T (v0 ? "" : b[i2], bsz);
+T (v0 ? "" : b[i3], bsz);
+T (v0 ? b[i0] : "", bsz);
+T (v0 ? b[i1] : "", bsz);
+T (v0 ? b[i2] : "", bsz);
+T (v0 ? b[i3] : "", bsz);
+
+T (v0 ? "" : b[i0], bsz + 1);
+T (v0 ? "" : b[i1], bsz + 1);
+T (v0 ? "" : b[i2], bsz + 1);
+T (v0 ? "" : b[i3], bsz + 1);     /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+T (v0 ? b[i0] : "", bsz + 1);
+T (v0 ? b[i1] : "", bsz + 1);
+T (v0 ? b[i2] : "", bsz + 1);
+T (v0 ? b[i3] : "", bsz + 1);     /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+
+T (v0 ? "1234" : b[3], bsz);
+T (v0 ? "1234" : b[i3], bsz);
+T (v0 ? b[3] : "1234", bsz);
+T (v0 ? b[i3] : "1234", bsz);
+
+T (v0 ? a : b[3], bsz);
+T (v0 ? b[0] : b[2], bsz);
+T (v0 ? b[2] : b[3], bsz);
+T (v0 ? b[3] : b[2], bsz);
+
+T (v0 ? "1234" : b[3], bsz + 1);  /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+T (v0 ? "1234" : b[i3], bsz + 1); /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+T (v0 ? b[3] : "1234", bsz + 1);  /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+T (v0 ? b[i3] : "1234", bsz + 1); /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+
+T (v0 ? a : b[3], bsz + 1);       /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+T (v0 ? b[0] : b[2], bsz + 1);
+T (v0 ? b[2] : b[3], bsz + 1);    /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+T (v0 ? b[3] : b[2], bsz + 1);    /* { dg-warning "unterminated" "pr86937" { xfail *-*-* } } */
+
+struct A { char a[5], b[5]; };
+
+const struct A s = { "1234", "12345" };
+
+T (s.a, asz);
+T (&s.a[0], asz);
+T (&s.a[0] + 1, asz);
+T (&s.a[0] + v0, asz);
+T (&s.a[1], asz);
+T (&s.a[1] + 1, asz);
+T (&s.a[1] + v0, asz);
+
+T (&s.a[i0], asz);
+T (&s.a[i0] + i1, asz);
+T (&s.a[i0] + v0, asz);
+T (&s.a[i1], asz);
+T (&s.a[i1] + i1, asz);
+T (&s.a[i1] + v0, asz);
+
+T (s.a, asz + 1);
+T (&s.a[0], asz + 1);
+T (&s.a[0] + 1, asz + 1);
+T (&s.a[0] + v0, asz + 1);
+T (&s.a[1], asz + 1);
+T (&s.a[1] + 1, asz + 1);
+T (&s.a[1] + v0, asz + 1);
+
+T (&s.a[i0], asz + 1);
+T (&s.a[i0] + i1, asz + 1);
+T (&s.a[i0] + v0, asz + 1);
+T (&s.a[i1], asz + 1);
+T (&s.a[i1] + i1, asz + 1);
+T (&s.a[i1] + v0, asz + 1);
+
+T (s.b, bsz);
+T (&s.b[0], bsz);
+T (&s.b[0] + 1, bsz);             /* { dg-warning "unterminated" } */
+T (&s.b[0] + v0, bsz);            /* { dg-warning "unterminated" } */
+T (&s.b[1], bsz);                 /* { dg-warning "unterminated" } */
+T (&s.b[1] + 1, bsz);             /* { dg-warning "unterminated" } */
+T (&s.b[1] + v0, bsz);            /* { dg-warning "unterminated" } */
+
+T (&s.b[i0], bsz);
+T (&s.b[i0] + i1, bsz);           /* { dg-warning "unterminated" } */
+T (&s.b[i0] + v0, bsz);           /* { dg-warning "unterminated" } */
+T (&s.b[i1], bsz);                /* { dg-warning "unterminated" } */
+T (&s.b[i1] + i1, bsz);           /* { dg-warning "unterminated" } */
+T (&s.b[i1] + v0, bsz);           /* { dg-warning "unterminated" } */
+
+T (s.b, bsz + 1);                 /* { dg-warning "unterminated" } */
+T (&s.b[0], bsz + 1);             /* { dg-warning "unterminated" } */
+T (&s.b[0] + 1, bsz + 1);         /* { dg-warning "unterminated" } */
+T (&s.b[0] + v0, bsz + 1);        /* { dg-warning "unterminated" } */
+T (&s.b[1], bsz + 1);             /* { dg-warning "unterminated" } */
+T (&s.b[1] + 1, bsz + 1);         /* { dg-warning "unterminated" } */
+T (&s.b[1] + v0, bsz + 1);        /* { dg-warning "unterminated" } */
+
+T (&s.b[i0], bsz + 1);            /* { dg-warning "unterminated" } */
+T (&s.b[i0] + i1, bsz + 1);       /* { dg-warning "unterminated" } */
+T (&s.b[i0] + v0, bsz + 1);       /* { dg-warning "unterminated" } */
+T (&s.b[i1], bsz + 1);            /* { dg-warning "unterminated" } */
+T (&s.b[i1] + i1, bsz + 1);       /* { dg-warning "unterminated" } */
+T (&s.b[i1] + v0, bsz + 1);       /* { dg-warning "unterminated" } */
+
+struct B { struct A a[2]; };
+
+const struct B ba[] = {
+  { { { "123", "12345" }, { "12345", "123" } } },
+  { { { "12345", "123" }, { "123", "12345" } } },
+  { { { "1", "12" },      { "123", "1234" } } },
+  { { { "123", "1234" },  { "12345", "12" } } }
+};
+
+T (ba[0].a[0].a, asz + 1);
+T (&ba[0].a[0].a[0], asz + 1);
+T (&ba[0].a[0].a[0] + 1, asz + 1);
+T (&ba[0].a[0].a[0] + v0, asz + 1);
+T (&ba[0].a[0].a[1], asz + 1);
+T (&ba[0].a[0].a[1] + 1, asz + 1);
+T (&ba[0].a[0].a[1] + v0, asz + 1);
+
+T (ba[0].a[0].b, bsz);
+T (&ba[0].a[0].b[0], bsz);
+T (&ba[0].a[0].b[0] + 1, bsz);        /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[0] + 1, bsz - 1);
+T (&ba[0].a[0].b[0] + v0, bsz);       /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[1], bsz);            /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[1], bsz - 1);
+T (&ba[0].a[0].b[1] + 1, bsz - 1);    /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[1] + 1, bsz - 2);
+T (&ba[0].a[0].b[1] + 1, bsz);        /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[1] + v0, bsz);       /* { dg-warning "unterminated" } */
+
+T (ba[0].a[0].b, bsz + 1);            /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[0], bsz + 1);        /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[0] + 1, bsz + 1);    /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[0] + v0, bsz + 1);   /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[1], bsz + 1);        /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[1] + 1, bsz + 1);    /* { dg-warning "unterminated" } */
+T (&ba[0].a[0].b[1] + v0, bsz + 1);   /* { dg-warning "unterminated" } */
+
+T (ba[0].a[1].a, asz + 1);            /* { dg-warning "unterminated" } */
+T (&ba[0].a[1].a[0], asz + 1);        /* { dg-warning "unterminated" } */
+T (&ba[0].a[1].a[0] + 1, asz + 1);    /* { dg-warning "unterminated" } */
+T (&ba[0].a[1].a[0] + v0, asz + 1);   /* { dg-warning "unterminated" } */
+T (&ba[0].a[1].a[1], asz + 1);        /* { dg-warning "unterminated" } */
+T (&ba[0].a[1].a[1] + 1, asz + 1);    /* { dg-warning "unterminated" } */
+T (&ba[0].a[1].a[1] + v0, asz + 1);   /* { dg-warning "unterminated" } */
+
+T (ba[0].a[1].b, bsz + 1);
+T (&ba[0].a[1].b[0], bsz + 1);
+T (&ba[0].a[1].b[0] + 1, bsz + 1);
+T (&ba[0].a[1].b[0] + v0, bsz + 1);
+T (&ba[0].a[1].b[1], bsz + 1);
+T (&ba[0].a[1].b[1] + 1, bsz + 1);
+T (&ba[0].a[1].b[1] + v0, bsz + 1);
+
+T (ba[1].a[0].a, asz);
+T (&ba[1].a[0].a[0], asz);
+T (&ba[1].a[0].a[0] + 1, asz);        /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[0] + v0, asz);       /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[1], asz);            /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[1] + 1, asz);        /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[1] + v0, asz);       /* { dg-warning "unterminated" } */
+
+T (ba[1].a[0].a, asz + 1);            /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[0], asz + 1);        /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[0] + 1, asz + 1);    /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[0] + v0, asz + 1);   /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[1], asz + 1);        /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[1] + 1, asz + 1);    /* { dg-warning "unterminated" } */
+T (&ba[1].a[0].a[1] + v0, asz + 1);   /* { dg-warning "unterminated" } */
+
+T (ba[1].a[0].b, bsz);
+T (&ba[1].a[0].b[0], bsz);
+T (&ba[1].a[0].b[0] + 1, bsz);
+T (&ba[1].a[0].b[0] + v0, bsz);
+T (&ba[1].a[0].b[1], bsz);
+T (&ba[1].a[0].b[1] + 1, bsz);
+T (&ba[1].a[0].b[1] + v0, bsz);
+
+T (ba[1].a[1].a, asz);
+T (&ba[1].a[1].a[0], asz);
+T (&ba[1].a[1].a[0] + 1, asz);
+T (&ba[1].a[1].a[0] + v0, asz);
+T (&ba[1].a[1].a[1], asz);
+T (&ba[1].a[1].a[1] + 1, asz);
+T (&ba[1].a[1].a[1] + v0, asz);
+
+T (ba[1].a[1].b, bsz);
+T (&ba[1].a[1].b[0], bsz);
+T (&ba[1].a[1].b[0] + 1, bsz);        /* { dg-warning "unterminated" } */
+T (&ba[1].a[1].b[0] + 1, bsz - 1);
+T (&ba[1].a[1].b[0] + v0, bsz);       /* { dg-warning "unterminated" } */
+T (&ba[1].a[1].b[1], bsz);            /* { dg-warning "unterminated" } */
+T (&ba[1].a[1].b[1], bsz - 1);
+T (&ba[1].a[1].b[1] + 1, bsz);        /* { dg-warning "unterminated" } */
+T (&ba[1].a[1].b[1] + 1, bsz - 1);    /* { dg-warning "unterminated" } */
+T (&ba[1].a[1].b[1] + 1, bsz - 2);
+T (&ba[1].a[1].b[1] + 1, bsz - i2);
+T (&ba[1].a[1].b[1] + v0, bsz);       /* { dg-warning "unterminated" } */
+
+/* Prune out warnings with no location (pr?????).
+   { dg-prune-output "cc1:" } */

* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)
  2018-08-01 16:34     ` Martin Sebor
  2018-08-01 17:16       ` Bernd Edlinger
@ 2018-08-01 20:33       ` Martin Sebor
  1 sibling, 0 replies; 53+ messages in thread
From: Martin Sebor @ 2018-08-01 20:33 UTC (permalink / raw)
  To: Bernd Edlinger, gcc-patches

On 08/01/2018 10:34 AM, Martin Sebor wrote:
>>> If you care about detecting bugs I would expect you to be
>>> supportive rather than dismissive of this work, and helpful
>>> in bringing it to fruition rather than putting it down or
>>> questioning my priorities.  Especially since the work was
>>> prompted by your own (valid) complaint that GCC doesn't
>>> diagnose them.
>>>
>>
>> You don't really listen to what I am saying, I did not say
>> that we need another warning instead of fixing the wrong
>> optimization issue at hand.
>>
>> But I am in good company, you don't listen to Jakub and Richi
>> either.
>
> I certainly intend to fix bugs I'm responsible for introducing.
> I always do if given the chance.  I assume you are referring
> to bug 86711 (and 86714).  Fixing the underlying problem has
> been on my mind since you first mentioned it, and on my to-do
> list since last week (bug 86688).

I've started looking into fixing 86711 but as it turns out,
by avoiding folding non-nul-terminated strings, this patch
already fixes it as well as producing the output you expect
for the test case in 86714, and also fixes 86688.

So unless you intend to pursue your patch I will assign all
these bugs to myself, add the test cases to this patch, and
resubmit it.  (I would normally prefer to deal with each bug
independently, but since I already have a working patch that
does the right thing I'd just as soon save the time and effort
and not try to break it up).

Martin

* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)
  2018-08-01 16:34     ` Martin Sebor
@ 2018-08-01 17:16       ` Bernd Edlinger
  2018-08-01 20:33       ` Martin Sebor
  1 sibling, 0 replies; 53+ messages in thread
From: Bernd Edlinger @ 2018-08-01 17:16 UTC (permalink / raw)
  To: Martin Sebor, gcc-patches

On 08/01/18 18:34, Martin Sebor wrote:
>>> If you care about detecting bugs I would expect you to be
>>> supportive rather than dismissive of this work, and helpful
>>> in bringing it to fruition rather that putting it down or
>>> questioning my priorities.  Especially since the work was
>>> prompted by your own (valid) complaint that GCC doesn't
>>> diagnose them.
>>>
>>
>> You don't really listen to what I am saying, I did not say
>> that we need another warning instead of fixing the wrong
>> optimization issue at hand.
>>
>> But I am in good company, you don't listen to Jakub and Richi
>> either.
> 
> I certainly intend to fix bugs I'm responsible for introducing.
> I always do if given the chance.  I assume you are referring
> to bug 86711 (and 86714).  Fixing the underlying problem has
> been on my mind since you first mentioned it, and on my to-do
> list since last week (bug 86688).  You have now submitted
> a patch for both of the former, plus a follow-on patch, but
> you didn't assign either of the bugs to yourself, or indicate
> if the patch fixes 86688, or if you intend to work on it too.
> I haven't reviewed the patches in any detail except to note
> that they touch the same area as mine and likely conflict.
> I'm not sure what I should do now.  Work on fixing these bugs
> myself?  (I would prefer to.)  Try to rebase my work on top
> of yours to see what the conflicts are and try to resolve
> them in my ongoing work?  Or just keep working on my
> stuff and deal with the conflicts after your patches have
> been committed?  Or continue to debate conflicting priorities
> and try to resolve them first?
> 
> (Those are mostly rhetorical questions.)  The point is that
> if you would just let me fix my bugs we would not have this
> conundrum.  Your test cases are helpful.  But as I have said
> over and over, submitting patches for the same code at the same
> time and even undoing some prior work with no coordination is
> a recipe for confusion and conflict.  I don't recall this
> happening in the past and I don't really understand what
> triggered it in this case.  This isn't an area that normally
> sees a lot of activity.
> 

Martin,

I am truly sorry for this confusion.  I would ask you to
please do your work a bit more slowly, so that we can talk
over the direction in which we want to go.
For instance, there should not be so many new warnings at the
moment, when we should actually be looking at correctness and
reliability issues.  I definitely do not want to revert your
work, but I will have to hedge it where it goes too far; that
does not mean that it will be worthless.

What set off my alarm bells is the speed at which new, buggy
features have been implemented recently, while at the same time
concerns raised by several global reviewers were not honored.
That is not a good thing.

To me it is a serious problem when those global reviewers
do not seem to agree on the way these features are implemented.

To be honest, I do not believe in democracy, or majority decisions.
But I always slow down when there is no consensus, and look for a
solution that is acceptable for all the key players.


Bernd.

* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)
  2018-08-01 14:21   ` Bernd Edlinger
@ 2018-08-01 16:34     ` Martin Sebor
  2018-08-01 17:16       ` Bernd Edlinger
  2018-08-01 20:33       ` Martin Sebor
  0 siblings, 2 replies; 53+ messages in thread
From: Martin Sebor @ 2018-08-01 16:34 UTC (permalink / raw)
  To: Bernd Edlinger, gcc-patches

>> If you care about detecting bugs I would expect you to be
>> supportive rather than dismissive of this work, and helpful
>> in bringing it to fruition rather than putting it down or
>> questioning my priorities.  Especially since the work was
>> prompted by your own (valid) complaint that GCC doesn't
>> diagnose them.
>>
>
> You don't really listen to what I am saying, I did not say
> that we need another warning instead of fixing the wrong
> optimization issue at hand.
>
> But I am in good company, you don't listen to Jakub and Richi
> either.

I certainly intend to fix bugs I'm responsible for introducing.
I always do if given the chance.  I assume you are referring
to bug 86711 (and 86714).  Fixing the underlying problem has
been on my mind since you first mentioned it, and on my to-do
list since last week (bug 86688).  You have now submitted
a patch for both of the former, plus a follow-on patch, but
you didn't assign either of the bugs to yourself, or indicate
whether the patch fixes 86688, or whether you intend to work on it too.
I haven't reviewed the patches in any detail except to note
that they touch the same area as mine and likely conflict.
I'm not sure what I should do now.  Work on fixing these bugs
myself?  (I would prefer to.)  Try to rebase my work on top
of yours to see what the conflicts are and try to resolve
them in my ongoing work?  Or just keep working on my
stuff and deal with the conflicts after your patches have
been committed?  Or continue to debate conflicting priorities
and try to resolve them first?

(Those are mostly rhetorical questions.)  The point is that
if you would just let me fix my bugs we would not have this
conundrum.  Your test cases are helpful.  But as I have said
over and over, submitting patches for the same code at the same
time and even undoing some prior work with no coordination is
a recipe for confusion and conflict.  I don't recall this
happening in the past and I don't really understand what
triggered it in this case.  This isn't an area that normally
sees a lot of activity.

Martin


* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)
  2018-07-31  3:52 ` Martin Sebor
@ 2018-08-01 14:21   ` Bernd Edlinger
  2018-08-01 16:34     ` Martin Sebor
  0 siblings, 1 reply; 53+ messages in thread
From: Bernd Edlinger @ 2018-08-01 14:21 UTC (permalink / raw)
  To: Martin Sebor, gcc-patches

On 07/31/18 05:51, Martin Sebor wrote:
> On 07/30/2018 03:11 PM, Bernd Edlinger wrote:
>> Hi,
>>
>>> @@ -621,6 +674,12 @@ c_strlen (tree src, int only_value)
>>>     maxelts = maxelts / eltsize - 1;
>>>       }
>>>
>>> +  /* Unless the caller is prepared to handle it by passing in a non-null
>>> +     ARR, fail if the terminating nul doesn't fit in the array the string
>>> +     is stored in (as in const char a[3] = "123";  */
>>> +  if (!arr && maxelts < strelts)
>>> +    return NULL_TREE;
>>> +
>>
>> this is c_strlen, how is the caller ever supposed to handle strings that are not nul-terminated???
>> especially if you do this above?
> 
> Callers that pass in a non-null ARR handle them by issuing
> a warning.  The rest get back a null result.  It should be
> evident from the rest of the patch.  It can be debated what
> each caller should do when it detects such a missing nul
> where one is expected.  Different approaches may be more
> or less appropriate for different callers/functions (e.g.,
> strcpy vs strlen).
> 

Sorry, but right at the beginning you have "if (!arr) arr = arrs;",
so ARR is never null by the time of the check above.

>>> +c_strlen (tree src, int only_value, tree *arr /* = NULL */)
>>> {
>>>   STRIP_NOPS (src);
>>> +
>>> +  /* Used to detect non-nul-terminated strings in subexpressions
>>> +     of a conditional expression.  When ARR is null, point it at
>>> +     one of the elements for simplicity.  */
>>> +  tree arrs[] = { NULL_TREE, NULL_TREE };
>>> +  if (!arr)
>>> +    arr = arrs;
>>
>>> @@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
>>>   unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
>>>   length = string_length (TREE_STRING_POINTER (init), charsize,
>>>               length / charsize);
>>> -  if (compare_tree_int (array_size, length + 1) < 0)
>>> +  if (nulterm)
>>> +    *nulterm = array_elts > length;
>>> +  else if (array_elts <= length)
>>>     return NULL_TREE;
>>
>> I don't understand why you can't use
>> compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)), TREE_STRING_LENGTH (init))
>> instead of this convoluted code above ???
>>
>> Sorry, this patch does not look like it will be ready any time soon.
> 
> I'm open to technical comments on the substance of my changes
> but I'm not interested in your opinion of the readiness of
> the patch (whatever that might mean), certainly not if you
> have formed it after skimming a random handful of lines out
> of a 600 line patch.
> 

Sorry, again.  I just meant you should fix the issues, and
maybe make the patch a bit smaller.

>> But actually I am totally puzzled by your priorities.
>> This is what I see right now:
>>
>> 1) We have missing warnings.
>> 2) We have wrong code bugs.
>> 3) Apparently we have a specification error in the C language standard (*)
>>
>>
>> Why are you prioritizing 1) over 2), thus blocking my attempts to fix a wrong-code
>> issue, and why do you not tackle 3) in your WG14?
> 
> My priorities are none of your concern.
> 

Sorry, again, but your priorities seem to conflict with mine.

> Your "attempts to fix" issues interfere with my work on a number
> of projects.  You are not being helpful -- instead, by submitting
> changes that you know full well conflict with mine, you are
> impeding and undermining my work.  That is why I object to them.
> 
>> (*) which means that GCC is currently removing code from assertions
>> as I pointed out here: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01695.html
>>
>> This happens because GCC follows the language standards literally right now.
>>
>> I would say too literally, and it proves that the language standard's logic is
>> flawed IMHO.
> 
> I have no idea what your point is about standards, but bugs
> like the one in the example, including those arising from
> uninitialized arrays, could be detected with only minor
> enhancements to the tree-ssa-strlen pass.  Implementing some
> of this is among the projects I'm expected and expecting to
> work on for GCC 9.  This patch is a small step in that
> direction.
> 
> If you care about detecting bugs I would expect you to be
> supportive rather than dismissive of this work, and helpful
> in bringing it to fruition rather than putting it down or
> questioning my priorities.  Especially since the work was
> prompted by your own (valid) complaint that GCC doesn't
> diagnose them.
> 

You don't really listen to what I am saying.  I did not say
that we need another warning instead of fixing the wrong
optimization issue at hand.

But I am in good company, you don't listen to Jakub and Richi
either.


Bernd.

> Martin
> 


* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)
  2018-07-30 21:12 PING [PATCH] warn for strlen of arrays with missing nul (PR 86552) Bernd Edlinger
@ 2018-07-31  3:52 ` Martin Sebor
  2018-08-01 14:21   ` Bernd Edlinger
  0 siblings, 1 reply; 53+ messages in thread
From: Martin Sebor @ 2018-07-31  3:52 UTC (permalink / raw)
  To: Bernd Edlinger, gcc-patches

On 07/30/2018 03:11 PM, Bernd Edlinger wrote:
> Hi,
>
>> @@ -621,6 +674,12 @@ c_strlen (tree src, int only_value)
>> 	maxelts = maxelts / eltsize - 1;
>>       }
>>
>> +  /* Unless the caller is prepared to handle it by passing in a non-null
>> +     ARR, fail if the terminating nul doesn't fit in the array the string
>> +     is stored in (as in const char a[3] = "123";  */
>> +  if (!arr && maxelts < strelts)
>> +    return NULL_TREE;
>> +
>
> this is c_strlen, how is the caller ever supposed to handle strings that are not nul-terminated???
> especially if you do this above?

Callers that pass in a non-null ARR handle them by issuing
a warning.  The rest get back a null result.  It should be
evident from the rest of the patch.  It can be debated what
each caller should do when it detects such a missing nul
where one is expected.  Different approaches may be more
or less appropriate for different callers/functions (e.g.,
strcpy vs strlen).

>> +c_strlen (tree src, int only_value, tree *arr /* = NULL */)
>> {
>>   STRIP_NOPS (src);
>> +
>> +  /* Used to detect non-nul-terminated strings in subexpressions
>> +     of a conditional expression.  When ARR is null, point it at
>> +     one of the elements for simplicity.  */
>> +  tree arrs[] = { NULL_TREE, NULL_TREE };
>> +  if (!arr)
>> +    arr = arrs;
>
>> @@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
>>   unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
>>   length = string_length (TREE_STRING_POINTER (init), charsize,
>> 			  length / charsize);
>> -  if (compare_tree_int (array_size, length + 1) < 0)
>> +  if (nulterm)
>> +    *nulterm = array_elts > length;
>> +  else if (array_elts <= length)
>>     return NULL_TREE;
>
> I don't understand why you can't use
> compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)), TREE_STRING_LENGTH (init))
> instead of this convoluted code above ???
>
> Sorry, this patch does not look like it will be ready any time soon.

I'm open to technical comments on the substance of my changes
but I'm not interested in your opinion of the readiness of
the patch (whatever that might mean), certainly not if you
have formed it after skimming a random handful of lines out
of a 600 line patch.

> But actually I am totally puzzled by your priorities.
> This is what I see right now:
>
> 1) We have missing warnings.
> 2) We have wrong code bugs.
> 3) Apparently we have a specification error in the C language standard (*)
>
>
> Why are you prioritizing 1) over 2), thus blocking my attempts to fix a wrong-code
> issue, and why do you not tackle 3) in your WG14?

My priorities are none of your concern.

Your "attempts to fix" issues interfere with my work on a number
of projects.  You are not being helpful -- instead, by submitting
changes that you know full well conflict with mine, you are
impeding and undermining my work.  That is why I object to them.

> (*) which means that GCC is currently removing code from assertions
> as I pointed out here: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01695.html
>
> This happens because GCC follows the language standards literally right now.
>
> I would say too literally, and it proves that the language standard's logic is
> flawed IMHO.

I have no idea what your point is about standards, but bugs
like the one in the example, including those arising from
uninitialized arrays, could be detected with only minor
enhancements to the tree-ssa-strlen pass.  Implementing some
of this is among the projects I'm expected and expecting to
work on for GCC 9.  This patch is a small step in that
direction.

If you care about detecting bugs I would expect you to be
supportive rather than dismissive of this work, and helpful
in bringing it to fruition rather than putting it down or
questioning my priorities.  Especially since the work was
prompted by your own (valid) complaint that GCC doesn't
diagnose them.

Martin


* Re: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552)
@ 2018-07-30 21:12 Bernd Edlinger
  2018-07-31  3:52 ` Martin Sebor
  0 siblings, 1 reply; 53+ messages in thread
From: Bernd Edlinger @ 2018-07-30 21:12 UTC (permalink / raw)
  To: Martin Sebor, gcc-patches

Hi,

>@@ -621,6 +674,12 @@ c_strlen (tree src, int only_value)
> 	maxelts = maxelts / eltsize - 1;
>       }
> 
>+  /* Unless the caller is prepared to handle it by passing in a non-null
>+     ARR, fail if the terminating nul doesn't fit in the array the string
>+     is stored in (as in const char a[3] = "123";  */
>+  if (!arr && maxelts < strelts)
>+    return NULL_TREE;
>+

this is c_strlen, how is the caller ever supposed to handle strings that are not nul-terminated???
especially if you do this above?

>+c_strlen (tree src, int only_value, tree *arr /* = NULL */)
> {
>   STRIP_NOPS (src);
>+
>+  /* Used to detect non-nul-terminated strings in subexpressions
>+     of a conditional expression.  When ARR is null, point it at
>+     one of the elements for simplicity.  */
>+  tree arrs[] = { NULL_TREE, NULL_TREE };
>+  if (!arr)
>+    arr = arrs;

>@@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset)
>   unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init);
>   length = string_length (TREE_STRING_POINTER (init), charsize,
> 			  length / charsize);
>-  if (compare_tree_int (array_size, length + 1) < 0)
>+  if (nulterm)
>+    *nulterm = array_elts > length;
>+  else if (array_elts <= length)
>     return NULL_TREE;

I don't understand why you can't use
compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)), TREE_STRING_LENGTH (init))
instead of this convoluted code above ???

Sorry, this patch does not look like it will be ready any time soon.


But actually I am totally puzzled by your priorities.
This is what I see right now:

1) We have missing warnings.
2) We have wrong code bugs.
3) Apparently we have a specification error in the C language standard (*)


Why are you prioritizing 1) over 2), thus blocking my attempts to fix a wrong-code
issue, and why do you not tackle 3) in your WG14?



(*) which means that GCC is currently removing code from assertions
as I pointed out here: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01695.html

This happens because GCC follows the language standards literally right now.

I would say too literally, and it proves that the language standard's logic is
flawed IMHO.

Thanks
Bernd.


end of thread, other threads:[~2018-10-01 21:48 UTC | newest]

Thread overview: 53+ messages
2018-07-19 20:09 [PATCH] warn for strlen of arrays with missing nul (PR 86552) Martin Sebor
2018-07-25 23:38 ` PING " Martin Sebor
2018-07-30 19:18   ` Martin Sebor
2018-08-02  2:44     ` PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) ) Martin Sebor
2018-08-02 13:26       ` Bernd Edlinger
2018-08-02 18:56         ` Bernd Edlinger
2018-08-02 20:34           ` Martin Sebor
2018-08-03 13:01             ` Bernd Edlinger
2018-08-03 19:59               ` Martin Sebor
2018-08-15  5:31               ` Jeff Law
2018-08-29 17:17           ` Jeff Law
2018-08-24  6:36         ` Jeff Law
2018-08-24 12:28           ` Bernd Edlinger
2018-08-24 16:04             ` Jeff Law
2018-08-24 21:56               ` Bernd Edlinger
2018-08-24 16:51         ` Jeff Law
2018-08-24 17:26           ` Bernd Edlinger
2018-08-24 23:54             ` Jeff Law
2018-08-25  6:32               ` Bernd Edlinger
2018-08-25 17:33                 ` Jeff Law
2018-08-25 18:36                   ` Bernd Edlinger
2018-08-25 19:02                     ` Jeff Law
2018-08-25 19:32                       ` Bernd Edlinger
2018-08-25 20:42                         ` Martin Sebor
2018-08-26 10:20                           ` Bernd Edlinger
2018-08-25 23:22                         ` Jeff Law
2018-08-17  5:15       ` Jeff Law
2018-08-17 14:38         ` Martin Sebor
2018-08-13 21:23   ` [PATCH 0/6] improve handling of char arrays with missing nul (PR 86552, 86711, 86714) Martin Sebor
2018-08-13 21:25     ` [PATCH 1/6] prevent folding of unterminated const arrays in memchr calls (PR " Martin Sebor
2018-08-13 21:27     ` [PATCH 3/6] detect unterminated const arrays in strcpy calls (PR 86552) Martin Sebor
2018-08-30 22:31       ` Jeff Law
2018-08-13 21:28     ` [PATCH 4/6] detect unterminated const arrays in sprintf " Martin Sebor
2018-08-30 22:55       ` Jeff Law
2018-08-13 21:29     ` [PATCH 5/6] detect unterminated const arrays in stpcpy " Martin Sebor
2018-08-30 23:07       ` Jeff Law
2018-09-14 18:39       ` Jeff Law
2018-08-13 21:29     ` [PATCH 6/6] detect unterminated const arrays in strnlen " Martin Sebor
2018-08-30 23:25       ` Jeff Law
2018-10-01 21:49       ` Jeff Law
2018-08-14  3:21     ` [PATCH 2/6] detect unterminated const arrays in strlen " Martin Sebor
2018-08-30 22:15       ` Jeff Law
2018-08-31  2:25         ` Martin Sebor
2018-08-15  6:02     ` [PATCH 0/6] improve handling of char arrays with missing nul (PR 86552, 86711, 86714) Jeff Law
2018-08-15 14:47       ` Martin Sebor
2018-08-15 15:42         ` Jeff Law
2018-08-24 10:13           ` Richard Biener
2018-07-30 21:12 PING [PATCH] warn for strlen of arrays with missing nul (PR 86552) Bernd Edlinger
2018-07-31  3:52 ` Martin Sebor
2018-08-01 14:21   ` Bernd Edlinger
2018-08-01 16:34     ` Martin Sebor
2018-08-01 17:16       ` Bernd Edlinger
2018-08-01 20:33       ` Martin Sebor
