From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 16143 invoked by alias); 20 Jul 2018 11:35:10 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 16132 invoked by uid 89); 20 Jul 2018 11:35:09 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-24.3 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_LAZY_DOMAIN_SECURITY autolearn=ham version=3.3.2 spammy= X-HELO: smtp.CeBiTec.Uni-Bielefeld.DE Received: from smtp.CeBiTec.Uni-Bielefeld.DE (HELO smtp.CeBiTec.Uni-Bielefeld.DE) (129.70.160.84) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 20 Jul 2018 11:35:07 +0000 Received: from localhost (localhost.CeBiTec.Uni-Bielefeld.DE [127.0.0.1]) by smtp.CeBiTec.Uni-Bielefeld.DE (Postfix) with ESMTP id 59CEAF92; Fri, 20 Jul 2018 13:35:04 +0200 (CEST) Received: from smtp.CeBiTec.Uni-Bielefeld.DE ([127.0.0.1]) by localhost (malfoy.CeBiTec.Uni-Bielefeld.DE [127.0.0.1]) (amavisd-new, port 10024) with LMTP id Q7BQgJCRYQJM; Fri, 20 Jul 2018 13:34:58 +0200 (CEST) Received: from lokon.CeBiTec.Uni-Bielefeld.DE (lokon.CeBiTec.Uni-Bielefeld.DE [129.70.161.152]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by smtp.CeBiTec.Uni-Bielefeld.DE (Postfix) with ESMTPS id 40583F91; Fri, 20 Jul 2018 13:34:58 +0200 (CEST) Received: (from ro@localhost) by lokon.CeBiTec.Uni-Bielefeld.DE (8.15.2+Sun/8.15.2/Submit) id w6KBYvlN005676; Fri, 20 Jul 2018 13:34:57 +0200 (MEST) From: Rainer Orth To: Martin Sebor Cc: Jeff Law , Richard Biener , Gcc Patch List , Richard Biener Subject: Re: [PATCH] restore -Warray-bounds for string literals (PR 83776) References: <097132cc-18d7-3133-b425-e3ee9da28e77@gmail.com> <2735fb28-a404-e95f-2adb-87c7f124d8c0@redhat.com> <33633f6b-801c-21d7-a99f-f22d634cd13e@gmail.com> Date: Fri, 20 Jul 2018 11:35:00 -0000 In-Reply-To: <33633f6b-801c-21d7-a99f-f22d634cd13e@gmail.com> (Martin Sebor's message of "Thu, 19 Jul 2018 17:36:51 -0600") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (usg-unix-v) MIME-Version: 1.0 Content-Type: text/plain X-IsSubscribed: yes X-SW-Source: 2018-07/txt/msg01176.txt.bz2 Hi Martin, > On 07/19/2018 03:51 PM, Jeff Law wrote: >> On 07/13/2018 05:45 PM, Martin Sebor wrote: >>>> >>>> + offset_int ofr[] = { >>>> + wi::to_offset (fold_convert (ptrdiff_type_node, vr->min)), >>>> + wi::to_offset (fold_convert (ptrdiff_type_node, vr->max)) >>>> + }; >>>> >>>> huh. Do you maybe want to use widest_int for ofr[]? What's >>>> wrong with wi::to_offset (vr->min)? Building another intermediate >>>> tree node here looks just bogus. >>> >>> I need to convert size_type indices to signed to keep their sign >>> if it's negative and include it in warnings. I've moved the code >>> into a conditional where it's used to minimize the cost but other >>> than that I don't know how else to convert it. >>> >>>> >>>> What are you trying to do in this loop anyways? >>> >>> The loop It determines the range of the final index/offset for >>> a series of POINTER_PLUS_EXPRs. It handles cases like this: >>> >>> int g (int i, int j, int k) >>> { >>> if (i < 1) i = 1; >>> if (j < 1) j = 1; >>> if (k < 1) k = 1; >>> >>> const char *p0 = "123"; >>> const char *p1 = p0 + i; >>> const char *p2 = p1 + j; >>> const char *p3 = p2 + k; >>> >>> // p2[3] and p3[1] are out of bounds >>> return p0[4] + p1[3] + p2[2] + p3[1]; >>> } >>> >>>> I suppose >>>> look at >>>> >>>> p_1 = &obj[i_2]; // already bounds-checked, but with ignore_off_by_one >>>> ... = MEM[p_1 + CST]; >>>> >>>> ? But then >>>> >>>> + if (TREE_CODE (varoff) != SSA_NAME) >>>> + break; >>>> >>>> you should at least handle INTEGER_CSTs here? >>> >>> It's already handled (and set in CSTOFF). There should be no >>> more constant offsets after that (I haven't come across any.) >>> >>>> >>>> + if (!vr || vr->type == VR_UNDEFINED || !vr->min || !vr->max) >>>> + break; >>>> >>>> please use positive tests here, VR_RANGE || VR_ANTI_RANGE. As usual >>>> the anti-range handling looks odd. Please add comments so we can follow >>>> what you were thinking when writing range merging code. Even better if you >>>> >>>> can stick to use existing code rather than always re-inventing the wheel... >>>> >>> >>> The anti-range handling is to conservatively add >>> [-MAXOBJSIZE -1, MAXOBJSIZE] to OFFRANGE. I've added comments >>> to make it clear. I'd be more than happy to reuse existing >>> code if I knew where to find it (if it exists). It sure would >>> save me lots of work to have a library of utility functions to >>> call instead of rolling my own code each time. >> Finding stuff is never easy :( GCC's just gotten so big it's virtually >> impossible for anyone to know all the APIs. >> >> The suggestion I'd have would be to (when possible) factor this stuff >> into routines you can reuse. We (as a whole) have this tendency to >> open-code all kinds of things rather than factoring the code into >> reusable routines. >> >> In addition to increasing the probability that you'll be able to reuse >> code later, just reading a new function's header tends to make me (as a >> reviewer) internally ask if there's already a routine we should be using >> instead. When it's open-coded it's significantly harder to spot those >> cases (at least for me). >> >> >>> >>>> >>>> I think I commented on earlier variants but this doesn't seem to resemble >>>> any of them. >>> >>> I've reworked the patch (sorry) to also handle arrays. For GCC >>> 9 it seems I might as well do both in one go. >>> >>> Attached is an updated patch with these changes. >>> >>> Martin >>> >>> gcc-83776.diff >>> >>> >>> PR tree-optimization/84047 - missing -Warray-bounds on an out-of-bounds >>> index into an array >>> PR tree-optimization/83776 - missing -Warray-bounds indexing past the >>> end of a string literal >>> >>> gcc/ChangeLog: >>> >>> PR tree-optimization/84047 >>> PR tree-optimization/83776 >>> * tree-vrp.c (vrp_prop::check_mem_ref): New function. >>> (check_array_bounds): Call it. >>> * /gcc/tree-sra.c (get_access_for_expr): Fail for out-of-bounds >>> array offsets. >>> >>> gcc/testsuite/ChangeLog: >>> >>> PR tree-optimization/83776 >>> PR tree-optimization/84047 >>> * gcc.dg/Warray-bounds-29.c: New test. >>> * gcc.dg/Warray-bounds-30.c: New test. >>> * gcc.dg/Warray-bounds-31.c: New test. >>> * gcc.dg/Warray-bounds-32.c: New test. >>> >> >>> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c >>> index 3e30f6b..8221a06 100644 >>> --- a/gcc/tree-sra.c >>> +++ b/gcc/tree-sra.c >>> @@ -3110,6 +3110,19 @@ get_access_for_expr (tree expr) >>> || !DECL_P (base)) >>> return NULL; >>> >>> + /* Fail for out-of-bounds array offsets. */ >>> + tree basetype = TREE_TYPE (base); >>> + if (TREE_CODE (basetype) == ARRAY_TYPE) >>> + { >>> + if (offset < 0) >>> + return NULL; >>> + >>> + if (tree size = DECL_SIZE (base)) >>> + if (tree_fits_uhwi_p (size) >>> + && tree_to_uhwi (size) < (unsigned HOST_WIDE_INT) offset) >>> + return NULL; >>> + } >>> + >>> if (!bitmap_bit_p (candidate_bitmap, DECL_UID (base))) >>> return NULL; >> So I'm a bit curious about this hunk. Did we end up creating an access >> structure that walked off the end of the array? Presumably you >> suppressing SRA at this point so that you can see the array access later >> and give a suitable warning. Right? > > Yes, but I didn't make a note of the test case that triggered > it and I'm not able to trigger the code path now, so the change > might not be necessary. I've removed it from the patch. If it > turns out that it is needed after all I'll commit it later. > >>> diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c >>> index a7453ab..27021760 100644 >>> --- a/gcc/tree-vrp.c >>> +++ b/gcc/tree-vrp.c >>> @@ -4930,21 +4931,280 @@ vrp_prop::check_array_ref (location_t location, tree ref, >>> } >>> } >>> >>> +/* Checks one MEM_REF in REF, located at LOCATION, for out-of-bounds >>> + references to string constants. If VRP can determine that the array >>> + subscript is a constant, check if it is outside valid range. >>> + If the array subscript is a RANGE, warn if it is non-overlapping >>> + with valid range. >>> + IGNORE_OFF_BY_ONE is true if the MEM_REF is inside an ADDR_EXPR >>> + (used to allow one-past-the-end indices for code that takes >>> + the address of the just-past-the-end element of an array). */ >>> + >>> +void >>> +vrp_prop::check_mem_ref (location_t location, tree ref, bool ignore_off_by_one) >>> +{ >>> + if (TREE_NO_WARNING (ref)) >>> + return; >>> + >>> + tree arg = TREE_OPERAND (ref, 0); >>> + /* The constant and variable offset of the reference. */ >>> + tree cstoff = TREE_OPERAND (ref, 1); >>> + tree varoff = NULL_TREE; >>> + >>> + const offset_int maxobjsize = tree_to_shwi (max_object_size ()); >>> + >>> + /* The array or string constant bounds in bytes. Initially set >>> + to [-MAXOBJSIZE - 1, MAXOBJSIZE] until a tighter bound is >>> + determined. */ >>> + offset_int arrbounds[2]; >>> + arrbounds[1] = maxobjsize; >>> + arrbounds[0] = -arrbounds[1] - 1; >> I realize that arrbounds[1] == maxobjsize at this point. But is there >> any reason not to use maxobjsize in the assignment to arrbounds[0] so >> that its computation matches the comment. It would also seem advisable >> to set the min first, then the max since that's the ordering in the >> comment and in the array. > > Done. > >> >> >>> + >>> + /* The minimum and maximum intermediate offset. For a reference >>> + to be valid, not only does the final offset/subscript must be >>> + in bounds but all intermediate offsets must be as well. */ >>> + offset_int ioff = wi::to_offset (fold_convert (ptrdiff_type_node, >>> cstoff)); >>> + offset_int extrema[2] = { 0, wi::abs (ioff) }; >> You should probably tone back the comment here since the intermediate >> offsets do not have to be in bounds. > > Done. > >> >> >> [ Big snip ] >> >>> + /* The type of the object being referred to. It can be an array, >>> + string literal, or a non-array type when the MEM_REF represents >>> + a reference/subscript via a pointer to an object that is not >>> + an element of an array. References to members of structs and >>> + unions are excluded because MEM_REF doesn't make it possible >>> + to identify the member where the reference orginated. */ >> s/orginated/originated/ >> >> OK with those changes. > > Committed without the SRA change as 262893. your patch has caused quite a number of testsuite failures: +FAIL: c-c++-common/Warray-bounds-2.c -std=gnu++11 (test for excess errors) +FAIL: c-c++-common/Warray-bounds-2.c -std=gnu++14 (test for excess errors) +FAIL: c-c++-common/Warray-bounds-2.c -std=gnu++98 (test for excess errors) Excess errors: /vol/gcc/src/hg/trunk/local/gcc/testsuite/c-c++-common/Warray-bounds-2.c:200:11: warning: array subscript -1 is outside array bounds of 'Array [2]' [-Warray-bounds] +FAIL: c-c++-common/Warray-bounds-2.c -Wc++-compat (test for excess errors) on 32 and 64-bit Solaris/SPARC and x86, also Linux/x86_64 +XPASS: gcc.dg/Warray-bounds-31.c (test for warnings, line 26) +XPASS: gcc.dg/Warray-bounds-31.c (test for warnings, line 40) +FAIL: gcc.dg/Warray-bounds-31.c (test for excess errors) +XPASS: gcc.dg/Warray-bounds-31.c bug 84047 (test for warnings, line 72) +XPASS: gcc.dg/Warray-bounds-31.c bug 84047 (test for warnings, line 90) Excess errors: /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/Warray-bounds-31.c:100:3: warning: array subscript -2147483648 is outside array bounds of 'char[9]' [-Warray-bounds] 32-bit Solaris and Linux only. +FAIL: gcc.dg/Warray-bounds-32.c (test for warnings, line 28) Please fix. Rainer -- ----------------------------------------------------------------------------- Rainer Orth, Center for Biotechnology, Bielefeld University