From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 91716 invoked by alias); 12 Nov 2019 19:55:48 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 91708 invoked by uid 89); 12 Nov 2019 19:55:48 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-14.6 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.1 spammy=H*i:sk:ad9c7f8, H*MI:sk:ad9c7f8, H*f:sk:ad9c7f8 X-HELO: mail-yw1-f49.google.com Received: from mail-yw1-f49.google.com (HELO mail-yw1-f49.google.com) (209.85.161.49) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 12 Nov 2019 19:55:45 +0000 Received: by mail-yw1-f49.google.com with SMTP id g77so6877821ywb.10 for ; Tue, 12 Nov 2019 11:55:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=PR4cqYZJJX8vFAKncbZtk6yUlAlxq4Jp55o/v5dy5qo=; b=sjgpTFaZS8ednFQ5at4Yu8dC3u87RqVO9CZ2z/srP/xOmL6An9JzRh/lrdbsJt/7e1 iku4gwOzPwkZcWibp7u39NKORg2vEHX1T8hj3NLPMV1UNvnlzU/yRAabSYl6q8sZCyts o50AJS3Me90avIB00DY5GfKRmREB36FUv4ZeKPdbIP/nBFKXXPuGGNpI59QKA/eJHyIs /k93cjbYpXxFmccIBW3Q/zDY8/qNVk8hkaJCZLxWl9XkJOaQss+QbQusU9OBI8r8fW6R zr8n7EWJUucaKGI4zJn1ijq0OCo+RbPGIb8JknhTvWE2eRwVQC08pxsYiBUg9twJE+xI ODXQ== Return-Path: Received: from [192.168.0.41] (97-118-98-145.hlrn.qwest.net. [97.118.98.145]) by smtp.gmail.com with ESMTPSA id x201sm20519257ywx.34.2019.11.12.11.55.41 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 12 Nov 2019 11:55:42 -0800 (PST) Subject: Re: [PATCH] include size and offset in -Wstringop-overflow To: Jeff Law , Richard Biener Cc: gcc-patches References: <9d3210bf-e1bd-81ac-b1a0-5c66c8ec6db7@gmail.com> <6f6bdef6-0d0d-0303-b870-10ff2b036444@redhat.com> <41e6f241-2361-487d-1bb0-c376df10a116@gmail.com> <517e840d-7b24-7bbd-1981-0db5e3c9b036@redhat.com> <9d892990-40c7-2350-3fc8-6caa36636090@redhat.com> From: Martin Sebor Message-ID: Date: Tue, 12 Nov 2019 20:21:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.6.1 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit X-IsSubscribed: yes X-SW-Source: 2019-11/txt/msg00961.txt.bz2 On 11/12/19 10:54 AM, Jeff Law wrote: > On 11/12/19 1:15 AM, Richard Biener wrote: >> On Tue, Nov 12, 2019 at 6:10 AM Jeff Law wrote: >>> >>> On 11/6/19 3:34 PM, Martin Sebor wrote: >>>> On 11/6/19 2:06 PM, Martin Sebor wrote: >>>>> On 11/6/19 1:39 PM, Jeff Law wrote: >>>>>> On 11/6/19 1:27 PM, Martin Sebor wrote: >>>>>>> On 11/6/19 11:55 AM, Jeff Law wrote: >>>>>>>> On 11/6/19 11:00 AM, Martin Sebor wrote: >>>>>>>>> The -Wstringop-overflow warnings for single-byte and multi-byte >>>>>>>>> stores mention the amount of data being stored and the amount of >>>>>>>>> space remaining in the destination, such as: >>>>>>>>> >>>>>>>>> warning: writing 4 bytes into a region of size 0 [-Wstringop-overflow=] >>>>>>>>> >>>>>>>>> 123 | *p = 0; >>>>>>>>> | ~~~^~~ >>>>>>>>> note: destination object declared here >>>>>>>>> 45 | char b[N]; >>>>>>>>> | ^ >>>>>>>>> >>>>>>>>> A warning like this can take some time to analyze. First, the size >>>>>>>>> of the destination isn't mentioned and may not be easy to tell from >>>>>>>>> the sources. In the note above, when N's value is the result of >>>>>>>>> some non-trivial computation, chasing it down may be a small project >>>>>>>>> in and of itself. Second, it's also not clear why the region size >>>>>>>>> is zero. It could be because the offset is exactly N, or because >>>>>>>>> it's negative, or because it's in some range greater than N. >>>>>>>>> >>>>>>>>> Mentioning both the size of the destination object and the offset >>>>>>>>> makes the existing messages clearer, are will become essential when >>>>>>>>> GCC starts diagnosing overflow into allocated buffers (as my >>>>>>>>> follow-on patch does). >>>>>>>>> >>>>>>>>> The attached patch enhances -Wstringop-overflow to do this by >>>>>>>>> letting compute_objsize return the offset to its caller, doing >>>>>>>>> something similar in get_stridx, and adding a new function to >>>>>>>>> the strlen pass to issue this enhanced warning (eventually, I'd >>>>>>>>> like the function to replace the -Wstringop-overflow handler in >>>>>>>>> builtins.c). With the change, the note above might read something >>>>>>>>> like: >>>>>>>>> >>>>>>>>> note: at offset 11 to object ‘b’ with size 8 declared here >>>>>>>>> 45 | char b[N]; >>>>>>>>> | ^ >>>>>>>>> >>>>>>>>> Tested on x86_64-linux. >>>>>>>>> >>>>>>>>> Martin >>>>>>>>> >>>>>>>>> gcc-store-offset.diff >>>>>>>>> >>>>>>>>> gcc/ChangeLog: >>>>>>>>> >>>>>>>>> * builtins.c (compute_objsize): Add an argument and set it to >>>>>>>>> offset >>>>>>>>> into destination. >>>>>>>>> * builtins.h (compute_objsize): Add an argument. >>>>>>>>> * tree-object-size.c (addr_object_size): Add an argument and >>>>>>>>> set it >>>>>>>>> to offset into destination. >>>>>>>>> (compute_builtin_object_size): Same. >>>>>>>>> * tree-object-size.h (compute_builtin_object_size): Add an >>>>>>>>> argument. >>>>>>>>> * tree-ssa-strlen.c (get_addr_stridx): Add an argument and >>>>>>>>> set it >>>>>>>>> to offset into destination. >>>>>>>>> (maybe_warn_overflow): New function. >>>>>>>>> (handle_store): Call maybe_warn_overflow to issue warnings. >>>>>>>>> >>>>>>>>> gcc/testsuite/ChangeLog: >>>>>>>>> >>>>>>>>> * c-c++-common/Wstringop-overflow-2.c: Adjust text of expected >>>>>>>>> messages. >>>>>>>>> * g++.dg/warn/Wstringop-overflow-3.C: Same. >>>>>>>>> * gcc.dg/Wstringop-overflow-17.c: Same. >>>>>>>>> >>>>>>>> >>>>>>>>> Index: gcc/tree-ssa-strlen.c >>>>>>>>> =================================================================== >>>>>>>>> --- gcc/tree-ssa-strlen.c (revision 277886) >>>>>>>>> +++ gcc/tree-ssa-strlen.c (working copy) >>>>>>>>> @@ -189,6 +189,52 @@ struct laststmt_struct >>>>>>>>> static int get_stridx_plus_constant (strinfo *, unsigned >>>>>>>>> HOST_WIDE_INT, tree); >>>>>>>>> static void handle_builtin_stxncpy (built_in_function, >>>>>>>>> gimple_stmt_iterator *); >>>>>>>>> +/* Sets MINMAX to either the constant value or the range VAL >>>>>>>>> is in >>>>>>>>> + and returns true on success. */ >>>>>>>>> + >>>>>>>>> +static bool >>>>>>>>> +get_range (tree val, wide_int minmax[2], const vr_values *rvals = >>>>>>>>> NULL) >>>>>>>>> +{ >>>>>>>>> + if (tree_fits_uhwi_p (val)) >>>>>>>>> + { >>>>>>>>> + minmax[0] = minmax[1] = wi::to_wide (val); >>>>>>>>> + return true; >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> + if (TREE_CODE (val) != SSA_NAME) >>>>>>>>> + return false; >>>>>>>>> + >>>>>>>>> + if (rvals) >>>>>>>>> + { >>>>>>>>> + gimple *def = SSA_NAME_DEF_STMT (val); >>>>>>>>> + if (gimple_assign_single_p (def) >>>>>>>>> + && gimple_assign_rhs_code (def) == INTEGER_CST) >>>>>>>>> + { >>>>>>>>> + /* get_value_range returns [0, N] for constant >>>>>>>>> assignments. */ >>>>>>>>> + val = gimple_assign_rhs1 (def); >>>>>>>>> + minmax[0] = minmax[1] = wi::to_wide (val); >>>>>>>>> + return true; >>>>>>>>> + } >>>>>>>> Umm, something seems really off with this hunk. If the SSA_NAME is >>>>>>>> set >>>>>>>> via a simple constant assignment, then the range ought to be a >>>>>>>> singleton >>>>>>>> ie [CONST,CONST]. Is there are particular test were this is not >>>>>>>> true? >>>>>>>> >>>>>>>> The only way offhand I could see this happening is if originally >>>>>>>> the RHS >>>>>>>> wasn't a constant, but due to optimizations it either simplified >>>>>>>> into a >>>>>>>> constant or a constant was propagated into an SSA_NAME appearing on >>>>>>>> the >>>>>>>> RHS. This would have to happen between the last range analysis and >>>>>>>> the >>>>>>>> point where you're making this query. >>>>>>> >>>>>>> Yes, I think that's right. Here's an example where it happens: >>>>>>> >>>>>>> void f (void) >>>>>>> { >>>>>>> char s[] = "1234"; >>>>>>> unsigned n = strlen (s); >>>>>>> char vla[n]; // or malloc (n) >>>>>>> vla[n] = 0; // n = [4, 4] >>>>>>> ... >>>>>>> } >>>>>>> >>>>>>> The strlen call is folded to 4 but that's not propagated to >>>>>>> the access until sometime after the strlen pass is done. >>>>>> Hmm. Are we calling set_range_info in that case? That goes behind the >>>>>> back of pass instance of vr_values. If so, that might argue we want to >>>>>> be setting it in vr_values too. >>>>> >>>>> No, set_range_info is only called for ranges. In this case, >>>>> handle_builtin_strlen replaces the strlen() call with 4: >>>>> >>>>> s = "1234"; >>>>> _1 = __builtin_strlen (&s); >>>>> n_2 = (unsigned int) _1; >>>>> a.1_8 = __builtin_alloca_with_align (_1, 8); >>>>> (*a.1_8)[n_2] = 0; >>>>> >>>>> When the access is made, the __builtin_alloca_with_align call >>>>> is found as the destination and the _1 SSA_NAME is used to >>>>> get its size. We get back the range [4, 4]. >>>> >>>> By the way, I glossed over one detail. The above doesn't work >>>> exactly as is because the allocation size is the SSA_NAME _1 >>>> (with the range [4, 4]) but the index is the SSA_NAME n_2 (with >>>> the range [0, 4]; the range is [0, 4] because it was set based >>>> on the size of the argument to the strlen() call well before >>>> the strlen pass even ran). >>> Which would tend to argue that we should forward propagate the constant >>> to the uses of _1. That should expose that the RHS of the assignment to >>> n_2 is a constant as well. >>> >>> >>>> >>>> To make it work across assignments we need to propagate the strlen >>>> results down the CFG somehow. I'm hoping the on-demand VRP will >>>> do this automagically. >>> It would, but it's probably more heavyweight than we need. We just need >>> to forward propagate the discovered constant to the use points and pick >>> up any secondary opportunities that arise. >> >> Yes. And the usual way of doing this is to keep a constant-and-copy >> lattice (and for copies you'd need to track availability) and before optimizing >> a stmt substitute its operands with the lattice contents. >> >> forwprop has a scheme that can be followed doing a RPO walk, strlen >> does a DOM walk, there you can follow what DOM/PRE elimination do >> (for tracking copy availability - if you just track constants you can >> elide that). > I'm also note sure handling copies is all that interesting here and if > we just handle constants/invariants, then it's pretty easy. > > Whenever we replace a strlen call with a const, we put the LHS (assuming > its an SSA_NAME) of the statement on a worklist. > > We pull items off the worklist and propagate the value to the use points > and try to simplify the resulting statement. If the RHS of the use > point simplified to a constant, then put the LHS of the use statement > onto the worklist. Iterate until the list is empty. > > That would capture everything of interest I suspect and ought to be cheap. This sounds like a significant project of its own, and well beyond the scope of the simple infrastructure enhancement I'm making here: all this does is improve the accuracy of the diagnostics. Is implementing it a precondition for accepting this patch? If yes, since I don't think I have the time for it for GCC 10 I need to decide whether to drop just this improvement or all of the buffer overflow checks that depend on it. Let me know which you prefer. Martin > > jeff > > > > I think whenever we substitute a constant or SSA_NAME for a strlen call, > we can just replace uses of the LHS of the assignment with the > const/copy. Any statements we propagate into are put on a worklist. > > We pull statements off the worklist and try to simplify their RHS. If > the RHS simplifies to a const/copy, then then we repeat the process of > propagating to the use points and > > x = ; > > If we find collapses to a constant or copy we can just record > it in SSA_NAME_VALUE. As we walk through statements, we can propagate >> >> Richard. >> >>> jeff >>> >> >