From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 86359 invoked by alias); 14 Nov 2017 08:48:57 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 86346 invoked by uid 89); 14 Nov 2017 08:48:57 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=BAYES_00,KB_WAM_FROM_NAME_SINGLEWORD,RP_MATCHES_RCVD,SPF_PASS autolearn=no version=3.3.2 spammy=perspective X-HELO: mx2.suse.de Received: from mx2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 14 Nov 2017 08:48:55 +0000 Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254]) by mx2.suse.de (Postfix) with ESMTP id 00334AAC8; Tue, 14 Nov 2017 08:48:52 +0000 (UTC) Date: Tue, 14 Nov 2017 09:13:00 -0000 From: Richard Biener To: Martin Sebor cc: Jeff Law , Gcc Patch List Subject: Re: [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets (PR 82455) In-Reply-To: <90f1f522-35b0-4989-bf77-b79948054281@gmail.com> Message-ID: References: <79634da6-bf31-b7f0-15f5-0436fc21a51a@gmail.com> <5FF222AD-B155-434B-9C65-721009D1964E@suse.de> <1064925a-ac64-502e-d0dd-85c27e7432f6@gmail.com> <6d6e6b84-e4b0-069c-30fa-58e45b2cd4c7@redhat.com> <90f1f522-35b0-4989-bf77-b79948054281@gmail.com> User-Agent: Alpine 2.20 (LSU 67 2015-01-07) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII X-SW-Source: 2017-11/txt/msg01045.txt.bz2 On Mon, 13 Nov 2017, Martin Sebor wrote: > On 11/10/2017 01:00 AM, Richard Biener wrote: > > On Thu, 9 Nov 2017, Jeff Law wrote: > > > > > On 11/02/2017 05:48 AM, Richard Biener wrote: > > > > > > > > > > > There were elaborate transforms of ptr + CST to ptr->a.b.c[3] in the > > > > past. We have ripped out _most_ of them because of bad interaction > > > > with dependence analysis and _b_o_s warnings. > > > > > > > > But for example PRE might still end up propagating > > > > > > > > _1 = &ptr->a.b.c; > > > > _2 = (*_1)[i_3]; > > > > > > > > as > > > > > > > > _2 = ptr->a.b.c[i_3]; > > > > > > > > But it's not so much GCC building up GIMPLE expressions that would > > > > cause false positives but "valid" C code and "invalid" C code > > > > represented exactly the same in GCC. Let's say > > > I think this is a key point. > > > > It's the usual issue with an optimizing compiler vs. a static analyzer. > > We try to get rid of the little semantic details of the input languages > > that in the end do not matter for code-generation but that makes > > using those semantic details hard (sometimes the little details > > are useful, like signed overflow being undefined). > > > > For GIMPLE it's also often the case that we didn't really thoroughly > > specify the semantics of the IL - like is an aggregate copy a > > block copy (that's how we expand it to RTL) or a memberwise copy? > > SRA treats it like the latter in some cases but memcpy folding > > turns memcpy into aggregate assignments ... (now think about padding). > > > > It's not that GCC doesn't have its set of existing issues with > > respect to interpreting GIMPLE semantic as it seems fit in one way > > here and in another way there. I'm just always nervous when adding > > new "interpretations" where I know the non-existing formal definition > > of GIMPLE leaves things unspecified. > > > > For example we _do_ use array bounds and array accesses (but _not_ > > and for now _nowhere_ if they appear in address computations!) > > to derive niter information. At the same time, because of this > > exploitation, we try very very hard to never (ok, PRE above as a > > couter-example) create an actual array access when dereferencing > > a pointer that is constructed by taking the address of an array-ref. > > That's why Martin added the warning to forwprop because that pass, > > when forwarding such addresses, gets rid of the array-ref. > > > > > > > Or, if that's not it, what exactly is your concern with this > > > > > enhancement? If it's that it's implemented in forwprop, what > > > > > would be a better place, e.g., earlier in the optimization > > > > > phase? If it's something something else, I'd appreciate it > > > > > if you could explain what. > > > > > > > > For one implementing this in forwprop looks like a move in the > > > > wrong direction. I'd like to have separate warning passes or > > > > at most amend warnings from optimization passes, not add new ones. > > > I tend to agree. That's one of the reasons why I pushed Aldy away from > > > doing this kind of stuff within VRP. > > > > > > What I envision is a pass which does a dominator walk through the > > > blocks. It gathers context sensitive range information as it does the > > > walk. > > > > > > As we encounter array references, we try to check them against the > > > current range information. We could also try to warn about certain > > > pointer computations, though we have to be more careful with those. > > > > > > Though I certainly still worry that the false positive cases which led > > > Aldy, Andrew and myself to look at path sensitive ranges arent' resolved > > > and will limit the utility of doing more array range checking. > > > > I fear while this might be a little bit cleaner you'd still have to > > do this very very early in the optimization pipeline (see all the > > hard time we had with __builtin_object_size) and thus you won't catch > > very many cases unless you start doing an IPA pass and handle propagating > > through memory. Which is when you arrived at a full-blown static > > analyzer. > > I have a different concern with the general idea of moving these > kinds of warnings into passes of their own. It would unavoidably > result in duplicating some code from the optimization passes (at > a minimum, the GIMPLE traversal, but likely more than that). It > would also require pretty tight coupling between the warning pass > and the optimization passes as the latter can defeat the work of > the former (as happens in this case). But most significantly, > the separation isn't feasible in cases where the optimization > pass computes and maintains non-trivial internal state (like, > say, tree-object-size or tree-ssa-strlen). Well, as I said elsewhere the goal is to modularize the "optimizations" enough so you can run their analysis phase together from a "warning pass" and get access to their internal state. Like what Jeff is doing with VRP. I'm currently working on some region-based value-numbering where the iteration scheme is supposed to allow pluggin in, say, a value-range lattice. It's currently not allowing path separation (simulating jump-threading) at the moment but you have to start somewhere ... > I think of these kinds of "basic" warnings as error checking and > handling. It makes no more sense to me to have a separate pass > check for input errors than it would to have one to check the > validity of other kinds of input within GCC (such as trees of > the wrong kind being passed from one function to another due to > a GCC programmer's error). They go hand in hand. > > To be sure, there are more sophisticated kinds of checkers that > depend on an in-depth analysis of the program that's outside of > what most existing optimization passes do. The sprintf warning > comes to mind as an example. But even those can be (and the > sprintf pass has been) used to do both: detect and diagnose > errors _and_ optimize code. And even these passes (or especially > they) do or would benefit from data computed by other optimization > passes to deliver better results (whether it be avoiding false > positives or improving the code they generate). Sure - no disagreement here. But then those warnings have to operate within the limitations of the GIMPLE IL which was designed for an optimizing compiler with a variety of input source languages. That is, you simply can't implement every diagnostic you'd like to have when coming from a source level perspective. It seems to be you're somewhat pushing the boundary here ;) Richard. > That said, it would make perfect sense to me to have the warning > code live in separate files from the optimization code, and keep > the separation of concerns and responsibilities crisp and well > defined by their APIs.