From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 93613 invoked by alias); 2 Aug 2018 02:44:21 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 93574 invoked by uid 89); 2 Aug 2018 02:44:17 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-25.0 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=flexible, Attached X-HELO: mail-qt0-f195.google.com Received: from mail-qt0-f195.google.com (HELO mail-qt0-f195.google.com) (209.85.216.195) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 02 Aug 2018 02:44:11 +0000 Received: by mail-qt0-f195.google.com with SMTP id t5-v6so780576qtn.3 for ; Wed, 01 Aug 2018 19:44:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:references:from:cc:message-id:date:user-agent :mime-version:in-reply-to; bh=N5sM8pbsjpiGRT2UUpODY5VIlDAmgyrK1VZlTVoWXcg=; b=IRAeFu61oRFEhCasEGgKyfXSUuAu9BxsouJbquALdVRSQL3QEOTRLUgj0vche2ZRsV Hm6fsppa0X2l9uQYUbYrCO9jT91JqmhxREVKxosq/f2/hLGAuwRo5X2H+F+3vGMHLdEZ ReKILvdKyf/zD1iNNcCwwJhB/Xb7ZDGJ6XHGQyky/KJrJJaHp74h7pFIrQE2dJHk29Bg DV684QydFSQ01gCmoUD5XsgaGOEpoqTzSsopc1G2YSFuKAvMOoFUmmqGjijBP+iCS7YJ iF7wUs2F34C8ny/l/nQntYIxmZyIKjfK+/jJCyiT+L9WZ9NidMczERbSfib8Hjf9gHEz Oa5g== Return-Path: Received: from localhost.localdomain (97-118-124-30.hlrn.qwest.net. [97.118.124.30]) by smtp.gmail.com with ESMTPSA id k5-v6sm406704qkf.53.2018.08.01.19.44.07 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 01 Aug 2018 19:44:08 -0700 (PDT) Subject: PING [PATCH] warn for strlen of arrays with missing nul (PR 86552, 86711, 86714) ) To: Gcc Patch List References: <805ad9c9-e6e4-a159-7fe1-6bb5562b8fe0@gmail.com> From: Martin Sebor Cc: Bernd Edlinger Message-ID: <3136e337-a551-3853-c536-b3b9ff0acdb9@gmail.com> Date: Thu, 02 Aug 2018 02:44:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: <805ad9c9-e6e4-a159-7fe1-6bb5562b8fe0@gmail.com> Content-Type: multipart/mixed; boundary="------------027912AD819A10832B584E8D" X-IsSubscribed: yes X-SW-Source: 2018-08/txt/msg00155.txt.bz2 This is a multi-part message in MIME format. --------------027912AD819A10832B584E8D Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-length: 2244 Since the foundation of the patch is detecting and avoiding the overly aggressive folding of unterminated char arrays, besides issuing a warning for such arguments to strlen, the patch also fixes pr86711 - wrong folding of memchr, and pr86714 - tree-ssa-forwprop.c confused by too long initializer. The substance of the attached updated patch is unchanged, I have just added test cases for the two additional bugs. Bernd, as I mentioned Wednesday, the patch supersedes yours here: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html Martin On 07/30/2018 01:17 PM, Martin Sebor wrote: > Attached is an updated version of the patch that handles more > instances of calling strlen() on a constant array that is not > a nul-terminated string. > > No other functions except strlen are explicitly handled yet, > and neither are constant arrays with braced-initializer lists > like const char a[] = { 'a', 'b', 'c' }; I am testing > an independent solution for those (bug 86552). Once those > are handled the warning will be able to detect those as well. > > Tested on x86_64-linux. > > On 07/25/2018 05:38 PM, Martin Sebor wrote: >> Ping: https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01124.html >> >> The fix for bug 86532 has been checked in so this enhancement >> can now be applied on top of it (with only minor adjustments). >> >> On 07/19/2018 02:08 PM, Martin Sebor wrote: >>> In the discussion of my patch for pr86532 Bernd noted that >>> GCC silently accepts constant character arrays with no >>> terminating nul as arguments to strlen (and other string >>> functions). >>> >>> The attached patch is a first step in detecting these kinds >>> of bugs in strlen calls by issuing -Wstringop-overflow. >>> The next step is to modify all other handlers of built-in >>> functions to detect the same problem (not part of this patch). >>> Yet another step is to detect these problems in arguments >>> initialized using the non-string form: >>> >>> const char a[] = { 'a', 'b', 'c' }; >>> >>> This patch is meant to apply on top of the one for bug 86532 >>> (I tested it with an earlier version of that patch so there >>> is code in the context that does not appear in the latest >>> version of the other diff). >>> >>> Martin >>> >> > --------------027912AD819A10832B584E8D Content-Type: text/x-patch; name="gcc-86552.diff" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="gcc-86552.diff" Content-length: 36212 PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too long initializer PR tree-optimization/86711 - wrong folding of memchr PR tree-optimization/86552 - missing warning for reading past the end of non-string arrays gcc/ChangeLog: PR tree-optimization/86714 PR tree-optimization/86711 PR tree-optimization/86552 * builtins.h (warn_string_no_nul): Declare.. (c_strlen): Add argument. * builtins.c (warn_string_no_nul): New function. (fold_builtin_strlen): Add argument. Detect missing nul. (fold_builtin_1): Adjust. (string_length): Add argument and use it. (c_strlen): Same. (expand_builtin_strlen): Detect missing nul. * expr.c (string_constant): Add arguments. Detect missing nul terminator and outermost declaration it's missing in. * expr.h (string_constant): Add argument. * fold-const.c (c_getstr): Change argument to bool*, rename other arguments. * fold-const-call.c (fold_const_call): Detect missing nul. * gimple-fold.c (get_range_strlen): Add argument. (get_maxval_strlen): Adjust. * gimple-fold.h (get_range_strlen): Add argument. gcc/testsuite/ChangeLog: PR tree-optimization/86714 PR tree-optimization/86711 PR tree-optimization/86552 * gcc.c-torture/execute/memchr-1.c: New test. * gcc.c-torture/execute/pr86714.c: New test. * gcc.dg/warn-strlen-no-nul.c: New test. diff --git a/gcc/builtins.c b/gcc/builtins.c index aa3e0d8..f4924d5 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -150,7 +150,7 @@ static tree stabilize_va_list_loc (location_t, tree, int); static rtx expand_builtin_expect (tree, rtx); static tree fold_builtin_constant_p (tree); static tree fold_builtin_classify_type (tree); -static tree fold_builtin_strlen (location_t, tree, tree); +static tree fold_builtin_strlen (location_t, tree, tree, tree); static tree fold_builtin_inf (location_t, tree, int); static tree rewrite_call_expr (location_t, tree, int, tree, int, ...); static bool validate_arg (const_tree, enum tree_code code); @@ -550,6 +550,36 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts) return n; } +/* For a call expression EXP to a function that expects a string argument, + issue a diagnostic due to it being a called with an argument NONSTR + that is a character array with no terminating NUL. */ + +void +warn_string_no_nul (location_t loc, tree exp, tree fndecl, tree nonstr) +{ + loc = expansion_point_location_if_in_system_header (loc); + + bool warned; + if (exp) + { + if (!fndecl) + fndecl = get_callee_fndecl (exp); + warned = warning_at (loc, OPT_Wstringop_overflow_, + "%K%qD argument missing terminating nul", + exp, fndecl); + } + else + { + gcc_assert (fndecl); + warned = warning_at (loc, OPT_Wstringop_overflow_, + "%qD argument missing terminating nul", + fndecl); + } + + if (warned && DECL_P (nonstr)) + inform (DECL_SOURCE_LOCATION (nonstr), "referenced argument declared here"); +} + /* Compute the length of a null-terminated character string or wide character string handling character sizes of 1, 2, and 4 bytes. TREE_STRING_LENGTH is not the right way because it evaluates to @@ -567,37 +597,60 @@ string_length (const void *ptr, unsigned eltsize, unsigned maxelts) accesses. Note that this implies the result is not going to be emitted into the instruction stream. + When ARR is non-null and the string is not properly nul-terminated, + set *ARR to the declaration of the outermost constant object whose + initializer (or one of its elements) is not nul-terminated. + The value returned is of type `ssizetype'. Unfortunately, string_constant can't access the values of const char arrays with initializers, so neither can we do so here. */ tree -c_strlen (tree src, int only_value) +c_strlen (tree src, int only_value, tree *arr /* = NULL */) { STRIP_NOPS (src); + + /* Used to detect non-nul-terminated strings in subexpressions + of a conditional expression. When ARR is null, point it at + one of the elements for simplicity. */ + tree arrs[] = { NULL_TREE, NULL_TREE }; + if (!arr) + arr = arrs; + if (TREE_CODE (src) == COND_EXPR && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0)))) { - tree len1, len2; - - len1 = c_strlen (TREE_OPERAND (src, 1), only_value); - len2 = c_strlen (TREE_OPERAND (src, 2), only_value); + tree len1 = c_strlen (TREE_OPERAND (src, 1), only_value, arrs); + tree len2 = c_strlen (TREE_OPERAND (src, 2), only_value, arrs + 1); if (tree_int_cst_equal (len1, len2)) - return len1; + { + *arr = arrs[0] ? arrs[0] : arrs[1]; + return len1; + } } if (TREE_CODE (src) == COMPOUND_EXPR && (only_value || !TREE_SIDE_EFFECTS (TREE_OPERAND (src, 0)))) - return c_strlen (TREE_OPERAND (src, 1), only_value); + return c_strlen (TREE_OPERAND (src, 1), only_value, arr); location_t loc = EXPR_LOC_OR_LOC (src, input_location); /* Offset from the beginning of the string in bytes. */ tree byteoff; - src = string_constant (src, &byteoff); - if (src == 0) - return NULL_TREE; + /* Set if array is nul-terminated, false otherwise. */ + bool nulterm; + src = string_constant (src, &byteoff, &nulterm, arr); + if (!src) + { + *arr = arrs[0] ? arrs[0] : arrs[1]; + return NULL_TREE; + } + + /* Clear *ARR when the string is nul-terminated. It should be + of no interest to callers. */ + if (nulterm) + *arr = NULL_TREE; /* Determine the size of the string element. */ unsigned eltsize @@ -621,6 +674,12 @@ c_strlen (tree src, int only_value) maxelts = maxelts / eltsize - 1; } + /* Unless the caller is prepared to handle it by passing in a non-null + ARR, fail if the terminating nul doesn't fit in the array the string + is stored in (as in const char a[3] = "123"; */ + if (!arr && maxelts < strelts) + return NULL_TREE; + /* PTR can point to the byte representation of any string type, including char* and wchar_t*. */ const char *ptr = TREE_STRING_POINTER (src); @@ -650,7 +709,8 @@ c_strlen (tree src, int only_value) offsave = fold_convert (ssizetype, offsave); tree condexp = fold_build2_loc (loc, LE_EXPR, boolean_type_node, offsave, build_int_cst (ssizetype, len * eltsize)); - tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), offsave); + tree lenexp = size_diffop_loc (loc, ssize_int (strelts * eltsize), + offsave); return fold_build3_loc (loc, COND_EXPR, ssizetype, condexp, lenexp, build_zero_cst (ssizetype)); } @@ -690,7 +750,7 @@ c_strlen (tree src, int only_value) Since ELTOFF is our starting index into the string, no further calculation is needed. */ unsigned len = string_length (ptr + eltoff * eltsize, eltsize, - maxelts - eltoff); + strelts - eltoff); return ssize_int (len); } @@ -2855,7 +2915,6 @@ expand_builtin_strlen (tree exp, rtx target, struct expand_operand ops[4]; rtx pat; - tree len; tree src = CALL_EXPR_ARG (exp, 0); rtx src_reg; rtx_insn *before_strlen; @@ -2864,20 +2923,39 @@ expand_builtin_strlen (tree exp, rtx target, unsigned int align; /* If the length can be computed at compile-time, return it. */ - len = c_strlen (src, 0); + tree array; + tree len = c_strlen (src, 0, &array); if (len) - return expand_expr (len, target, target_mode, EXPAND_NORMAL); + { + if (array) + { + /* Array refers to the non-nul terminated constant array + whose length is attempted to be computed. */ + warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array); + return NULL_RTX; + } + return expand_expr (len, target, target_mode, EXPAND_NORMAL); + } /* If the length can be computed at compile-time and is constant integer, but there are side-effects in src, evaluate src for side-effects, then return len. E.g. x = strlen (i++ ? "xfoo" + 1 : "bar"); can be optimized into: i++; x = 3; */ - len = c_strlen (src, 1); - if (len && TREE_CODE (len) == INTEGER_CST) + len = c_strlen (src, 1, &array); + if (len) { - expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL); - return expand_expr (len, target, target_mode, EXPAND_NORMAL); + if (array) + { + warn_string_no_nul (EXPR_LOCATION (exp), exp, NULL_TREE, array); + return NULL_RTX; + } + + if (TREE_CODE (len) == INTEGER_CST) + { + expand_expr (src, const0_rtx, VOIDmode, EXPAND_NORMAL); + return expand_expr (len, target, target_mode, EXPAND_NORMAL); + } } align = get_pointer_alignment (src) / BITS_PER_UNIT; @@ -8255,19 +8333,30 @@ fold_builtin_classify_type (tree arg) return build_int_cst (integer_type_node, type_to_class (TREE_TYPE (arg))); } -/* Fold a call to __builtin_strlen with argument ARG. */ +/* Fold a strlen call to FNDECL of TYPE, and with argument ARG. */ static tree -fold_builtin_strlen (location_t loc, tree type, tree arg) +fold_builtin_strlen (location_t loc, tree fndecl, tree type, tree arg) { if (!validate_arg (arg, POINTER_TYPE)) return NULL_TREE; else { - tree len = c_strlen (arg, 0); - + tree arr = NULL_TREE; + tree len = c_strlen (arg, 0, &arr); if (len) - return fold_convert_loc (loc, type, len); + { + if (loc == UNKNOWN_LOCATION && EXPR_HAS_LOCATION (arg)) + loc = EXPR_LOCATION (arg); + + /* To avoid warning multiple times about non-nul-terminated + strings only warn if their length has been determined + and it's being folded. */ + if (arr) + warn_string_no_nul (loc, NULL_TREE, fndecl, arr); + + return fold_convert_loc (loc, type, len); + } return NULL_TREE; } @@ -9175,7 +9264,7 @@ fold_builtin_1 (location_t loc, tree fndecl, tree arg0) return fold_builtin_classify_type (arg0); case BUILT_IN_STRLEN: - return fold_builtin_strlen (loc, type, arg0); + return fold_builtin_strlen (loc, fndecl, type, arg0); CASE_FLT_FN (BUILT_IN_FABS): CASE_FLT_FN_FLOATN_NX (BUILT_IN_FABS): diff --git a/gcc/builtins.h b/gcc/builtins.h index 2e0a2f9..73b0b0b 100644 --- a/gcc/builtins.h +++ b/gcc/builtins.h @@ -58,7 +58,7 @@ extern bool get_pointer_alignment_1 (tree, unsigned int *, unsigned HOST_WIDE_INT *); extern unsigned int get_pointer_alignment (tree); extern unsigned string_length (const void*, unsigned, unsigned); -extern tree c_strlen (tree, int); +extern tree c_strlen (tree, int, tree * = NULL); extern void expand_builtin_setjmp_setup (rtx, rtx); extern void expand_builtin_setjmp_receiver (rtx); extern void expand_builtin_update_setjmp_buf (rtx); @@ -103,7 +103,7 @@ extern bool target_char_cst_p (tree t, char *p); extern internal_fn associated_internal_fn (tree); extern internal_fn replacement_internal_fn (gcall *); - +extern void warn_string_no_nul (location_t, tree, tree, tree); extern tree max_object_size (); #endif /* GCC_BUILTINS_H */ diff --git a/gcc/expr.c b/gcc/expr.c index de6709d..edbd7f8 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -11271,10 +11271,14 @@ is_aligning_offset (const_tree offset, const_tree exp) /* Return the tree node if an ARG corresponds to a string constant or zero if it doesn't. If we return nonzero, set *PTR_OFFSET to the (possibly non-constant) offset in bytes within the string that ARG is accessing. + If NULTERM is non-null, consider valid even sequences of characters that + aren't nul-terminated strings. In that case, set NULTERM if ARG refers + to such a sequence and clear it otherwise. The type of the offset is sizetype. */ tree -string_constant (tree arg, tree *ptr_offset) +string_constant (tree arg, tree *ptr_offset, bool *nulterm /* = NULL */, + tree *decl /* = NULL */) { tree array; STRIP_NOPS (arg); @@ -11328,7 +11332,7 @@ string_constant (tree arg, tree *ptr_offset) return NULL_TREE; tree offset; - if (tree str = string_constant (arg0, &offset)) + if (tree str = string_constant (arg0, &offset, nulterm, decl)) { /* Avoid pointers to arrays (see bug 86622). */ if (POINTER_TYPE_P (TREE_TYPE (arg)) @@ -11368,6 +11372,10 @@ string_constant (tree arg, tree *ptr_offset) if (TREE_CODE (array) == STRING_CST) { *ptr_offset = fold_convert (sizetype, offset); + if (nulterm) + *nulterm = true; + if (decl) + *decl = NULL_TREE; return array; } @@ -11414,6 +11422,49 @@ string_constant (tree arg, tree *ptr_offset) if (!array_size || TREE_CODE (array_size) != INTEGER_CST) return NULL_TREE; + unsigned HOST_WIDE_INT array_elts = tree_to_uhwi (array_size); + + /* When ARG refers to an aggregate (of arrays) try to determine + the size of the character array within the aggregate. */ + tree ref = arg; + tree reftype = TREE_TYPE (arg); + + if (TREE_CODE (ref) == MEM_REF) + { + ref = TREE_OPERAND (ref, 0); + if (TREE_CODE (ref) == ADDR_EXPR) + { + ref = TREE_OPERAND (ref, 0); + reftype = TREE_TYPE (ref); + } + } + else + while (TREE_CODE (ref) == ARRAY_REF) + { + reftype = TREE_TYPE (ref); + ref = TREE_OPERAND (ref, 0); + } + + if (TREE_CODE (ref) == COMPONENT_REF) + reftype = TREE_TYPE (ref); + + while (TREE_CODE (reftype) == ARRAY_TYPE) + { + tree next = TREE_TYPE (reftype); + if (TREE_CODE (next) == INTEGER_TYPE) + { + if (tree size = TYPE_SIZE_UNIT (reftype)) + if (tree_fits_uhwi_p (size)) + array_elts = tree_to_uhwi (size); + break; + } + + reftype = TREE_TYPE (reftype); + } + + if (decl) + *decl = array; + /* Avoid returning a string that doesn't fit in the array it is stored in, like const char a[4] = "abcde"; @@ -11427,7 +11478,9 @@ string_constant (tree arg, tree *ptr_offset) unsigned HOST_WIDE_INT length = TREE_STRING_LENGTH (init); length = string_length (TREE_STRING_POINTER (init), charsize, length / charsize); - if (compare_tree_int (array_size, length + 1) < 0) + if (nulterm) + *nulterm = array_elts > length; + else if (array_elts <= length) return NULL_TREE; *ptr_offset = offset; diff --git a/gcc/expr.h b/gcc/expr.h index cf047d4..e630979 100644 --- a/gcc/expr.h +++ b/gcc/expr.h @@ -288,7 +288,7 @@ expand_normal (tree exp) /* Return the tree node and offset if a given argument corresponds to a string constant. */ -extern tree string_constant (tree, tree *); +extern tree string_constant (tree, tree *, bool * = NULL, tree * = NULL); /* Two different ways of generating switch statements. */ extern int try_casesi (tree, tree, tree, tree, rtx, rtx, rtx, profile_probability); diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c index 06a42060..849a443 100644 --- a/gcc/fold-const-call.c +++ b/gcc/fold-const-call.c @@ -1199,9 +1199,14 @@ fold_const_call (combined_fn fn, tree type, tree arg) switch (fn) { case CFN_BUILT_IN_STRLEN: - if (const char *str = c_getstr (arg)) - return build_int_cst (type, strlen (str)); - return NULL_TREE; + { + bool nulterm; + if (const char *str = c_getstr (arg, NULL, &nulterm)) + if (nulterm) + return build_int_cst (type, strlen (str)); + + return NULL_TREE; + } CASE_CFN_NAN: CASE_FLT_FN_FLOATN_NX (CFN_BUILT_IN_NAN): diff --git a/gcc/fold-const.c b/gcc/fold-const.c index b318fc77..ecbc38c 100644 --- a/gcc/fold-const.c +++ b/gcc/fold-const.c @@ -14577,23 +14577,23 @@ fold_build_pointer_plus_hwi_loc (location_t loc, tree ptr, HOST_WIDE_INT off) /* Return a pointer P to a NUL-terminated string representing the sequence of constant characters referred to by SRC (or a subsequence of such characters within it if SRC is a reference to a string plus some - constant offset). If STRLEN is non-null, store stgrlen(P) in *STRLEN. - If STRSIZE is non-null, store in *STRSIZE the size of the array - the string is stored in; in that case, even though P points to a NUL - terminated string, SRC need not refer to one. This can happen when - SRC refers to a constant character array initialized to all non-NUL - values, as in the C declaration: char a[4] = "1234"; */ + constant offset). If STRSIZE is non-null, store the size of the string + literal in *STRSIZE, including any embedded or terminating nuls. If + If NULLTERM is non-null, set *NULLTERM if the referenced string is + guaranteed to contain a terminating NUL. Otherwise clear it. This + can happen in the case of valid C declarations such as: + const char a[3] = "123"; */ const char * -c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */, - unsigned HOST_WIDE_INT *strsize /* = NULL */) +c_getstr (tree src, unsigned HOST_WIDE_INT *strsize /* = NULL */, + bool *nulterm /* = NULL */) { tree offset_node; - if (strlen) - *strlen = 0; + if (strsize) + *strsize = 0; - src = string_constant (src, &offset_node); + src = string_constant (src, &offset_node, nulterm); if (src == 0) return NULL; @@ -14606,47 +14606,42 @@ c_getstr (tree src, unsigned HOST_WIDE_INT *strlen /* = NULL */, offset = tree_to_uhwi (offset_node); } - /* STRING_LENGTH is the size of the string literal, including any - embedded NULs. STRING_SIZE is the size of the array the string + /* STRING_SIZE is the size of the string literal, including any + embedded NULs. ARRAY_SIZE is the size of the array the string literal is stored in. */ - unsigned HOST_WIDE_INT string_length = TREE_STRING_LENGTH (src); - unsigned HOST_WIDE_INT string_size = string_length; + unsigned HOST_WIDE_INT string_size = TREE_STRING_LENGTH (src); + unsigned HOST_WIDE_INT array_size = string_size; tree type = TREE_TYPE (src); if (tree size = TYPE_SIZE_UNIT (type)) if (tree_fits_shwi_p (size)) - string_size = tree_to_uhwi (size); + array_size = tree_to_uhwi (size); + + const char *string = TREE_STRING_POINTER (src); - if (strlen) + if (strsize) { - /* Compute and store the length of the substring at OFFSET. + /* Compute and store the size of the substring at OFFSET. All offsets past the initial length refer to null strings. */ - if (offset <= string_length) - *strlen = string_length - offset; + if (offset <= string_size) + *strsize = string_size - offset; else - *strlen = 0; + *strsize = 0; } - const char *string = TREE_STRING_POINTER (src); - - if (string_length == 0 - || offset >= string_size) + if (string_size == 0 + || offset >= array_size) return NULL; - if (strsize) - { - /* Support even constant character arrays that aren't proper - NUL-terminated strings. */ - *strsize = string_size; - } - else if (string[string_length - 1] != '\0') + if (!nulterm && string[string_size - 1] != '\0') { - /* Support only properly NUL-terminated strings but handle - consecutive strings within the same array, such as the six - substrings in "1\0002\0003". */ + /* When NULTERM is null, support only properly nul-terminated + strings but handle consecutive strings within the same array, + such as the six substrings in "1\0002\0003". Otherwise, let + the caller deal with non-nul-terminated arrays. */ return NULL; } - return offset <= string_length ? string + offset : ""; + return offset <= string_size ? string + offset : ""; } /* Given a tree T, compute which bits in T may be nonzero. */ diff --git a/gcc/fold-const.h b/gcc/fold-const.h index 1b9ccc0..a58a4a2 100644 --- a/gcc/fold-const.h +++ b/gcc/fold-const.h @@ -188,7 +188,7 @@ extern tree const_unop (enum tree_code, tree, tree); extern tree const_binop (enum tree_code, tree, tree, tree); extern bool negate_mathfn_p (combined_fn); extern const char *c_getstr (tree, unsigned HOST_WIDE_INT * = NULL, - unsigned HOST_WIDE_INT * = NULL); + bool * = NULL); extern wide_int tree_nonzero_bits (const_tree); /* Return OFF converted to a pointer offset type suitable as offset for diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c index c3fa570..9eefb37 100644 --- a/gcc/gimple-fold.c +++ b/gcc/gimple-fold.c @@ -1275,11 +1275,13 @@ gimple_fold_builtin_memset (gimple_stmt_iterator *gsi, tree c, tree len) Set *FLEXP to true if the range of the string lengths has been obtained from the upper bound of an array at the end of a struct. Such an array may hold a string that's longer than its upper bound - due to it being used as a poor-man's flexible array member. */ + due to it being used as a poor-man's flexible array member. + Clear *NULTERM if ARG refers to a constant array that is known + not be nul-terminated. */ static bool get_range_strlen (tree arg, tree length[2], bitmap *visited, int type, - int fuzzy, bool *flexp) + int fuzzy, bool *flexp, bool *nulterm) { tree var, val = NULL_TREE; gimple *def_stmt; @@ -1301,7 +1303,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type, if (TREE_CODE (aop0) == INDIRECT_REF && TREE_CODE (TREE_OPERAND (aop0, 0)) == SSA_NAME) return get_range_strlen (TREE_OPERAND (aop0, 0), - length, visited, type, fuzzy, flexp); + length, visited, type, fuzzy, flexp, + nulterm); } else if (TREE_CODE (TREE_OPERAND (op, 0)) == COMPONENT_REF && fuzzy) { @@ -1329,13 +1332,18 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type, return false; } else - val = c_strlen (arg, 1); + { + tree arr; + val = c_strlen (arg, 1, &arr); + if (val && arr) + *nulterm = false; + } if (!val && fuzzy) { if (TREE_CODE (arg) == ADDR_EXPR) return get_range_strlen (TREE_OPERAND (arg, 0), length, - visited, type, fuzzy, flexp); + visited, type, fuzzy, flexp, nulterm); if (TREE_CODE (arg) == ARRAY_REF) { @@ -1477,7 +1485,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type, || gimple_assign_unary_nop_p (def_stmt)) { tree rhs = gimple_assign_rhs1 (def_stmt); - return get_range_strlen (rhs, length, visited, type, fuzzy, flexp); + return get_range_strlen (rhs, length, visited, type, fuzzy, flexp, + nulterm); } else if (gimple_assign_rhs_code (def_stmt) == COND_EXPR) { @@ -1486,7 +1495,7 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type, for (unsigned int i = 0; i < 2; i++) if (!get_range_strlen (ops[i], length, visited, type, fuzzy, - flexp)) + flexp, nulterm)) { if (fuzzy == 2) *maxlen = build_all_ones_cst (size_type_node); @@ -1513,7 +1522,8 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type, if (arg == gimple_phi_result (def_stmt)) continue; - if (!get_range_strlen (arg, length, visited, type, fuzzy, flexp)) + if (!get_range_strlen (arg, length, visited, type, fuzzy, flexp, + nulterm)) { if (fuzzy == 2) *maxlen = build_all_ones_cst (size_type_node); @@ -1545,19 +1555,27 @@ get_range_strlen (tree arg, tree length[2], bitmap *visited, int type, and false if PHIs and COND_EXPRs are to be handled optimistically, if we can determine string length minimum and maximum; it will use the minimum from the ones where it can be determined. - STRICT false should be only used for warning code. */ + STRICT false should be only used for warning code. + When non-null, clear *NULTERM if ARG refers to a constant array + that is known not be nul-terminated. Otherwise set it to true. */ bool -get_range_strlen (tree arg, tree minmaxlen[2], bool strict) +get_range_strlen (tree arg, tree minmaxlen[2], bool strict /* = false */, + bool *nulterm /* = NULL */) { bitmap visited = NULL; minmaxlen[0] = NULL_TREE; minmaxlen[1] = NULL_TREE; + bool nultermbuf; + if (!nulterm) + nulterm = &nultermbuf; + *nulterm = true; + bool flexarray = false; if (!get_range_strlen (arg, minmaxlen, &visited, 1, strict ? 1 : 2, - &flexarray)) + &flexarray, nulterm)) { minmaxlen[0] = NULL_TREE; minmaxlen[1] = NULL_TREE; @@ -1576,7 +1594,7 @@ get_maxval_strlen (tree arg, int type) tree len[2] = { NULL_TREE, NULL_TREE }; bool dummy; - if (!get_range_strlen (arg, len, &visited, type, 0, &dummy)) + if (!get_range_strlen (arg, len, &visited, type, 0, &dummy, NULL)) len[1] = NULL_TREE; if (visited) BITMAP_FREE (visited); @@ -3496,12 +3514,14 @@ static bool gimple_fold_builtin_strlen (gimple_stmt_iterator *gsi) { gimple *stmt = gsi_stmt (*gsi); + tree arg = gimple_call_arg (stmt, 0); wide_int minlen; wide_int maxlen; tree lenrange[2]; - if (!get_range_strlen (gimple_call_arg (stmt, 0), lenrange, true) + bool nulterm; + if (!get_range_strlen (arg, lenrange, true, &nulterm) && lenrange[0] && TREE_CODE (lenrange[0]) == INTEGER_CST && lenrange[1] && TREE_CODE (lenrange[1]) == INTEGER_CST) { @@ -3523,6 +3543,10 @@ gimple_fold_builtin_strlen (gimple_stmt_iterator *gsi) if (minlen == maxlen) { + if (!nulterm) + warn_string_no_nul (gimple_location (stmt), NULL_TREE, + gimple_call_fndecl (stmt), arg); + lenrange[0] = force_gimple_operand_gsi (gsi, lenrange[0], true, NULL, true, GSI_SAME_STMT); replace_call_with_value (gsi, lenrange[0]); diff --git a/gcc/gimple-fold.h b/gcc/gimple-fold.h index 04e9bfa..fe11728 100644 --- a/gcc/gimple-fold.h +++ b/gcc/gimple-fold.h @@ -25,7 +25,7 @@ along with GCC; see the file COPYING3. If not see extern tree create_tmp_reg_or_ssa_name (tree, gimple *stmt = NULL); extern tree canonicalize_constructor_val (tree, tree); extern tree get_symbol_constant_value (tree); -extern bool get_range_strlen (tree, tree[2], bool = false); +extern bool get_range_strlen (tree, tree[2], bool = false, bool * = NULL); extern tree get_maxval_strlen (tree, int); extern void gimplify_and_update_call_from_tree (gimple_stmt_iterator *, tree); extern bool fold_stmt (gimple_stmt_iterator *); diff --git a/gcc/testsuite/gcc.c-torture/execute/memchr-1.c b/gcc/testsuite/gcc.c-torture/execute/memchr-1.c new file mode 100644 index 0000000..2614bee --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/memchr-1.c @@ -0,0 +1,43 @@ +/* PR tree-optimization/86711 - wrong folding of memchr + + Verify that memchr() of arrays initialized with string literals + where the nul doesn't fit in the array doesn't find the nul. */ + +extern void* memchr (const void*, int, __SIZE_TYPE__); + +#define A(expr) \ + ((expr) \ + ? (void)0 \ + : (__builtin_printf ("assertion failed on line %i: %s\n", \ + __LINE__, #expr), \ + __builtin_abort ())) + +static const char a1[4] = "1234"; +static const char a2[2][4] = { "1234", "5678" }; + +static const char a3[2][4] = { "1234", "567" }; + +int main () +{ + volatile int i = 0; + + A (memchr (a1, 0, sizeof a1) == 0); + A (memchr (a1 + 1, 0, sizeof a1 - 1) == 0); + A (memchr (a1 + 3, 0, sizeof a1 - 3) == 0); + A (memchr (a1 + i, 0, sizeof a1) == 0); + + A (memchr (a2, 0, sizeof a2) == 0); + + A (memchr (a2[0], 0, sizeof a2[0]) == 0); + A (memchr (a2[1], 0, sizeof a2[1]) == 0); + + A (memchr (a2[0] + 1, 0, sizeof a2[0] - 1) == 0); + A (memchr (a2[1] + 2, 0, sizeof a2[1] - 2) == 0); + A (memchr (a2[1] + 3, 0, sizeof a2[1] - 3) == 0); + + A (memchr (a2[i], 0, sizeof a2[i]) == 0); + A (memchr (a2[i] + 1, 0, sizeof a2[i] - 1) == 0); + + /* This one must find it. */ + A (memchr (a3, 0, sizeof a3) == &a3[1][3]); +} diff --git a/gcc/testsuite/gcc.c-torture/execute/pr86714.c b/gcc/testsuite/gcc.c-torture/execute/pr86714.c new file mode 100644 index 0000000..3ad6852 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr86714.c @@ -0,0 +1,26 @@ +/* PR tree-optimization/86714 - tree-ssa-forwprop.c confused by too + long initializer + + The excessively long initializer for a[0] is undefined but this + test verifies that the excess elements are not considered a part + of the value of the array as a matter of QoI. */ + +const char a[2][3] = { "1234", "xyz" }; +char b[6]; + +void *pb = b; + +int main () +{ + __builtin_memcpy (b, a, 4); + __builtin_memset (b + 4, 'a', 2); + + if (b[0] != '1' || b[1] != '2' || b[2] != '3' + || b[3] != 'x' || b[4] != 'a' || b[5] != 'a') + __builtin_abort (); + + if (__builtin_memcmp (pb, "123xaa", 6)) + __builtin_abort (); + + return 0; +} diff --git a/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c b/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c new file mode 100644 index 0000000..838528f --- /dev/null +++ b/gcc/testsuite/gcc.dg/warn-strlen-no-nul.c @@ -0,0 +1,239 @@ +/* PR tree-optimization/86552 - missing warning for reading past the end + of non-string arrays + { dg-do compile } + { dg-options "-O2 -Wall -ftrack-macro-expansion=0" } */ + +extern __SIZE_TYPE__ strlen (const char*); + +const char a[5] = "12345"; /* { dg-message "declared here" } */ + +int i0 = 0; +int i1 = 1; + +void sink (int, ...); + +#define CONCAT(a, b) a ## b +#define CAT(a, b) CONCAT(a, b) + +#define T(str) \ + __attribute__ ((noipa)) \ + void CAT (test_, __LINE__) (void) { \ + sink (strlen (str)); \ + } typedef void dummy_type + +T (a); /* { dg-warning "argument missing terminating nul" } */ +T (&a[0]); /* { dg-warning "nul" } */ +T (&a[0] + 1); /* { dg-warning "nul" } */ +T (&a[1]); /* { dg-warning "nul" } */ +T (&a[i0]); /* { dg-warning "nul" } */ +T (&a[i0] + 1); /* { dg-warning "nul" } */ + + +const char b[][5] = { /* { dg-message "declared here" } */ + "12", "123", "1234", "54321" +}; + +T (b[0]); +T (b[1]); +T (b[2]); +T (b[3]); /* { dg-warning "nul" } */ +T (b[i0]); + +T (&b[2][1]); +T (&b[2][1] + 1); +T (&b[2][i0]); +T (&b[2][1] + i0); + +T (&b[3][1]); /* { dg-warning "nul" } */ +T (&b[3][1] + 1); /* { dg-warning "nul" } */ +T (&b[3][i0]); /* { dg-warning "nul" } */ +T (&b[3][1] + i0); /* { dg-warning "nul" } */ +T (&b[3][i0] + i1); /* { dg-warning "nul" } */ + +T (i0 ? "" : b[0]); +T (i0 ? "" : b[1]); +T (i0 ? "" : b[2]); +T (i0 ? "" : b[3]); /* { dg-warning "nul" } */ +T (i0 ? b[0] : ""); +T (i0 ? b[1] : ""); +T (i0 ? b[2] : ""); +T (i0 ? b[3] : ""); /* { dg-warning "nul" } */ + +T (i0 ? "1234" : b[3]); /* { dg-warning "nul" } */ +T (i0 ? b[3] : "1234"); /* { dg-warning "nul" } */ + +T (i0 ? a : b[3]); /* { dg-warning "nul" } */ +T (i0 ? b[0] : b[2]); +T (i0 ? b[2] : b[3]); /* { dg-warning "nul" } */ +T (i0 ? b[3] : b[2]); /* { dg-warning "nul" } */ + +T (i0 ? b[0] : &b[3][0] + 1); /* { dg-warning "nul" } */ +T (i0 ? b[1] : &b[3][1] + i0); /* { dg-warning "nul" } */ + +/* It's possible to detect the missing nul in the following two + expressions but GCC doesn't do it yet. */ +T (i0 ? &b[3][1] + i0 : b[2]); /* { dg-warning "nul" "bug" { xfail *-*-* } } */ +T (i0 ? &b[3][i0] : &b[3][i1]); /* { dg-warning "nul" "bug" { xfail *-*-* } } */ + + +struct A { char a[5], b[5]; }; + +const struct A s = { "1234", "12345" }; + +T (s.a); +T (&s.a[0]); +T (&s.a[0] + 1); +T (&s.a[0] + i0); +T (&s.a[1]); +T (&s.a[1] + 1); +T (&s.a[1] + i0); + +T (s.b); /* { dg-warning "nul" } */ +T (&s.b[0]); /* { dg-warning "nul" } */ +T (&s.b[0] + 1); /* { dg-warning "nul" } */ +T (&s.b[0] + i0); /* { dg-warning "nul" } */ +T (&s.b[1]); /* { dg-warning "nul" } */ +T (&s.b[1] + 1); /* { dg-warning "nul" } */ +T (&s.b[1] + i0); /* { dg-warning "nul" } */ + +struct B { struct A a[2]; }; + +const struct B ba[] = { + { { { "123", "12345" }, { "12345", "123" } } }, + { { { "12345", "123" }, { "123", "12345" } } }, + { { { "1", "12" }, { "123", "1234" } } }, + { { { "123", "1234" }, { "12345", "12" } } } +}; + +T (ba[0].a[0].a); +T (&ba[0].a[0].a[0]); +T (&ba[0].a[0].a[0] + 1); +T (&ba[0].a[0].a[0] + i0); +T (&ba[0].a[0].a[1]); +T (&ba[0].a[0].a[1] + 1); +T (&ba[0].a[0].a[1] + i0); + +T (ba[0].a[0].b); /* { dg-warning "nul" } */ +T (&ba[0].a[0].b[0]); /* { dg-warning "nul" } */ +T (&ba[0].a[0].b[0] + 1); /* { dg-warning "nul" } */ +T (&ba[0].a[0].b[0] + i0); /* { dg-warning "nul" } */ +T (&ba[0].a[0].b[1]); /* { dg-warning "nul" } */ +T (&ba[0].a[0].b[1] + 1); /* { dg-warning "nul" } */ +T (&ba[0].a[0].b[1] + i0); /* { dg-warning "nul" } */ + +T (ba[0].a[1].a); /* { dg-warning "nul" } */ +T (&ba[0].a[1].a[0]); /* { dg-warning "nul" } */ +T (&ba[0].a[1].a[0] + 1); /* { dg-warning "nul" } */ +T (&ba[0].a[1].a[0] + i0); /* { dg-warning "nul" } */ +T (&ba[0].a[1].a[1]); /* { dg-warning "nul" } */ +T (&ba[0].a[1].a[1] + 1); /* { dg-warning "nul" } */ +T (&ba[0].a[1].a[1] + i0); /* { dg-warning "nul" } */ + +T (ba[0].a[1].b); +T (&ba[0].a[1].b[0]); +T (&ba[0].a[1].b[0] + 1); +T (&ba[0].a[1].b[0] + i0); +T (&ba[0].a[1].b[1]); +T (&ba[0].a[1].b[1] + 1); +T (&ba[0].a[1].b[1] + i0); + + +T (ba[1].a[0].a); /* { dg-warning "nul" } */ +T (&ba[1].a[0].a[0]); /* { dg-warning "nul" } */ +T (&ba[1].a[0].a[0] + 1); /* { dg-warning "nul" } */ +T (&ba[1].a[0].a[0] + i0); /* { dg-warning "nul" } */ +T (&ba[1].a[0].a[1]); /* { dg-warning "nul" } */ +T (&ba[1].a[0].a[1] + 1); /* { dg-warning "nul" } */ +T (&ba[1].a[0].a[1] + i0); /* { dg-warning "nul" } */ + +T (ba[1].a[0].b); +T (&ba[1].a[0].b[0]); +T (&ba[1].a[0].b[0] + 1); +T (&ba[1].a[0].b[0] + i0); +T (&ba[1].a[0].b[1]); +T (&ba[1].a[0].b[1] + 1); +T (&ba[1].a[0].b[1] + i0); + +T (ba[1].a[1].a); +T (&ba[1].a[1].a[0]); +T (&ba[1].a[1].a[0] + 1); +T (&ba[1].a[1].a[0] + i0); +T (&ba[1].a[1].a[1]); +T (&ba[1].a[1].a[1] + 1); +T (&ba[1].a[1].a[1] + i0); + +T (ba[1].a[1].b); /* { dg-warning "nul" } */ +T (&ba[1].a[1].b[0]); /* { dg-warning "nul" } */ +T (&ba[1].a[1].b[0] + 1); /* { dg-warning "nul" } */ +T (&ba[1].a[1].b[0] + i0); /* { dg-warning "nul" } */ +T (&ba[1].a[1].b[1]); /* { dg-warning "nul" } */ +T (&ba[1].a[1].b[1] + 1); /* { dg-warning "nul" } */ +T (&ba[1].a[1].b[1] + i0); /* { dg-warning "nul" } */ + + +T (ba[2].a[0].a); +T (&ba[2].a[0].a[0]); +T (&ba[2].a[0].a[0] + 1); +T (&ba[2].a[0].a[0] + i0); +T (&ba[2].a[0].a[1]); +T (&ba[2].a[0].a[1] + 1); +T (&ba[2].a[0].a[1] + i0); + +T (ba[2].a[0].b); +T (&ba[2].a[0].b[0]); +T (&ba[2].a[0].b[0] + 1); +T (&ba[2].a[0].b[0] + i0); +T (&ba[2].a[0].b[1]); +T (&ba[2].a[0].b[1] + 1); +T (&ba[2].a[0].b[1] + i0); + +T (ba[2].a[1].a); +T (&ba[2].a[1].a[0]); +T (&ba[2].a[1].a[0] + 1); +T (&ba[2].a[1].a[0] + i0); +T (&ba[2].a[1].a[1]); +T (&ba[2].a[1].a[1] + 1); +T (&ba[2].a[1].a[1] + i0); + + +T (ba[3].a[0].a); +T (&ba[3].a[0].a[0]); +T (&ba[3].a[0].a[0] + 1); +T (&ba[3].a[0].a[0] + i0); +T (&ba[3].a[0].a[1]); +T (&ba[3].a[0].a[1] + 1); +T (&ba[3].a[0].a[1] + i0); + +T (ba[3].a[0].b); +T (&ba[3].a[0].b[0]); +T (&ba[3].a[0].b[0] + 1); +T (&ba[3].a[0].b[0] + i0); +T (&ba[3].a[0].b[1]); +T (&ba[3].a[0].b[1] + 1); +T (&ba[3].a[0].b[1] + i0); + +T (ba[3].a[1].a); /* { dg-warning "nul" } */ +T (&ba[3].a[1].a[0]); /* { dg-warning "nul" } */ +T (&ba[3].a[1].a[0] + 1); /* { dg-warning "nul" } */ +T (&ba[3].a[1].a[0] + i0); /* { dg-warning "nul" } */ +T (&ba[3].a[1].a[1]); /* { dg-warning "nul" } */ +T (&ba[3].a[1].a[1] + 1); /* { dg-warning "nul" } */ +T (&ba[3].a[1].a[1] + i0); /* { dg-warning "nul" } */ + +T (ba[3].a[1].b); +T (&ba[3].a[1].b[0]); +T (&ba[3].a[1].b[0] + 1); +T (&ba[3].a[1].b[0] + i0); +T (&ba[3].a[1].b[1]); +T (&ba[3].a[1].b[1] + 1); +T (&ba[3].a[1].b[1] + i0); + + +T (i0 ? ba[0].a[0].a : ba[0].a[0].b); /* { dg-warning "nul" } */ +T (i0 ? ba[0].a[0].a : ba[0].a[0].b); /* { dg-warning "nul" } */ + +T (i0 ? &ba[0].a[0].a[0] : &ba[3].a[1].a[0]); /* { dg-warning "nul" } */ +T (i0 ? &ba[3].a[1].a[1] : ba[0].a[0].a); /* { dg-warning "nul" } */ + +T (i0 ? ba[0].a[0].a : ba[0].a[1].b); +T (i0 ? ba[0].a[1].b : ba[0].a[0].a); --------------027912AD819A10832B584E8D--