From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-466702-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 86359 invoked by alias); 14 Nov 2017 08:48:57 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 86346 invoked by uid 89); 14 Nov 2017 08:48:57 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.7 required=5.0 tests=BAYES_00,KB_WAM_FROM_NAME_SINGLEWORD,RP_MATCHES_RCVD,SPF_PASS autolearn=no version=3.3.2 spammy=perspective
X-HELO: mx2.suse.de
Received: from mx2.suse.de (HELO mx2.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 14 Nov 2017 08:48:55 +0000
Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254])	by mx2.suse.de (Postfix) with ESMTP id 00334AAC8;	Tue, 14 Nov 2017 08:48:52 +0000 (UTC)
Date: Tue, 14 Nov 2017 09:13:00 -0000
From: Richard Biener <rguenther@suse.de>
To: Martin Sebor <msebor@gmail.com>
cc: Jeff Law <law@redhat.com>, Gcc Patch List <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets (PR 82455)
In-Reply-To: <90f1f522-35b0-4989-bf77-b79948054281@gmail.com>
Message-ID: <alpine.LSU.2.20.1711140943290.12252@zhemvz.fhfr.qr>
References: <b63d021f-ceca-e26f-848a-40182de004c9@gmail.com> <alpine.LSU.2.20.1710301238080.8202@zhemvz.fhfr.qr> <79634da6-bf31-b7f0-15f5-0436fc21a51a@gmail.com> <C174CF6C-E18C-4575-AA5E-AB7D52A6C1D8@suse.de> <ba57f7d9-fd4b-0692-462f-cf4fcf9b6f93@gmail.com> <5FF222AD-B155-434B-9C65-721009D1964E@suse.de> <1064925a-ac64-502e-d0dd-85c27e7432f6@gmail.com> <alpine.LSU.2.20.1711021227140.12252@zhemvz.fhfr.qr> <6d6e6b84-e4b0-069c-30fa-58e45b2cd4c7@redhat.com> <alpine.LSU.2.20.1711100853280.12252@zhemvz.fhfr.qr> <90f1f522-35b0-4989-bf77-b79948054281@gmail.com>
User-Agent: Alpine 2.20 (LSU 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-SW-Source: 2017-11/txt/msg01045.txt.bz2

On Mon, 13 Nov 2017, Martin Sebor wrote:

> On 11/10/2017 01:00 AM, Richard Biener wrote:
> > On Thu, 9 Nov 2017, Jeff Law wrote:
> > 
> > > On 11/02/2017 05:48 AM, Richard Biener wrote:
> > > 
> > > > 
> > > > There were elaborate transforms of ptr + CST to ptr->a.b.c[3] in the
> > > > past.  We have ripped out _most_ of them because of bad interaction
> > > > with dependence analysis and _b_o_s warnings.
> > > > 
> > > > But for example PRE might still end up propagating
> > > > 
> > > >  _1 = &ptr->a.b.c;
> > > >  _2 = (*_1)[i_3];
> > > > 
> > > > as
> > > > 
> > > >  _2 = ptr->a.b.c[i_3];
> > > > 
> > > > But it's not so much GCC building up GIMPLE expressions that would
> > > > cause false positives but "valid" C code and "invalid" C code
> > > > represented exactly the same in GCC.  Let's say
> > > I think this is a key point.
> > 
> > It's the usual issue with an optimizing compiler vs. a static analyzer.
> > We try to get rid of the little semantic details of the input languages
> > that in the end do not matter for code-generation but that makes
> > using those semantic details hard (sometimes the little details
> > are useful, like signed overflow being undefined).
> > 
> > For GIMPLE it's also often the case that we didn't really thoroughly
> > specify the semantics of the IL - like is an aggregate copy a
> > block copy (that's how we expand it to RTL) or a memberwise copy?
> > SRA treats it like the latter in some cases but memcpy folding
> > turns memcpy into aggregate assignments ... (now think about padding).
> > 
> > It's not that GCC doesn't have its set of existing issues with
> > respect to interpreting GIMPLE semantic as it seems fit in one way
> > here and in another way there.  I'm just always nervous when adding
> > new "interpretations" where I know the non-existing formal definition
> > of GIMPLE leaves things unspecified.
> > 
> > For example we _do_ use array bounds and array accesses (but _not_
> > and for now _nowhere_ if they appear in address computations!)
> > to derive niter information.  At the same time, because of this
> > exploitation, we try very very hard to never (ok, PRE above as a
> > couter-example) create an actual array access when dereferencing
> > a pointer that is constructed by taking the address of an array-ref.
> > That's why Martin added the warning to forwprop because that pass,
> > when forwarding such addresses, gets rid of the array-ref.
> > 
> > > > > Or, if that's not it, what exactly is your concern with this
> > > > > enhancement?  If it's that it's implemented in forwprop, what
> > > > > would be a better place, e.g., earlier in the optimization
> > > > > phase?  If it's something something else, I'd appreciate it
> > > > > if you could explain what.
> > > > 
> > > > For one implementing this in forwprop looks like a move in the
> > > > wrong direction.  I'd like to have separate warning passes or
> > > > at most amend warnings from optimization passes, not add new ones.
> > > I tend to agree.  That's one of the reasons why I pushed Aldy away from
> > > doing this kind of stuff within VRP.
> > > 
> > > What I envision is a pass which does a dominator walk through the
> > > blocks.  It gathers context sensitive range information as it does the
> > > walk.
> > > 
> > > As we encounter array references, we try to check them against the
> > > current range information.  We could also try to warn about certain
> > > pointer computations, though we have to be more careful with those.
> > > 
> > > Though I certainly still worry that the false positive cases which led
> > > Aldy, Andrew and myself to look at path sensitive ranges arent' resolved
> > > and will limit the utility of doing more array range checking.
> > 
> > I fear while this might be a little bit cleaner you'd still have to
> > do this very very early in the optimization pipeline (see all the
> > hard time we had with __builtin_object_size) and thus you won't catch
> > very many cases unless you start doing an IPA pass and handle propagating
> > through memory.  Which is when you arrived at a full-blown static
> > analyzer.
> 
> I have a different concern with the general idea of moving these
> kinds of warnings into passes of their own.  It would unavoidably
> result in duplicating some code from the optimization passes (at
> a minimum, the GIMPLE traversal, but likely more than that).  It
> would also require pretty tight coupling between the warning pass
> and the optimization passes as the latter can defeat the work of
> the former (as happens in this case).  But most significantly,
> the separation isn't feasible in cases where the optimization
> pass computes and maintains non-trivial internal state (like,
> say, tree-object-size or tree-ssa-strlen).

Well, as I said elsewhere the goal is to modularize the "optimizations"
enough so you can run their analysis phase together from a
"warning pass" and get access to their internal state.  Like what
Jeff is doing with VRP.  I'm currently working on some region-based
value-numbering where the iteration scheme is supposed to allow
pluggin in, say, a value-range lattice.  It's currently not
allowing path separation (simulating jump-threading) at the moment
but you have to start somewhere ...

> I think of these kinds of "basic" warnings as error checking and
> handling.  It makes no more sense to me to have a separate pass
> check for input errors than it would to have one to check the
> validity of other kinds of input within GCC (such as trees of
> the wrong kind being passed from one function to another due to
> a GCC programmer's error).  They go hand in hand.
> 
> To be sure, there are more sophisticated kinds of checkers that
> depend on an in-depth analysis of the program that's outside of
> what most existing optimization passes do.  The sprintf warning
> comes to mind as an example.  But even those can be (and the
> sprintf pass has been) used to do both: detect and diagnose
> errors _and_ optimize code.  And even these passes (or especially
> they) do or would benefit from data computed by other optimization
> passes to deliver better results (whether it be avoiding false
> positives or improving the code they generate).

Sure - no disagreement here.  But then those warnings have to
operate within the limitations of the GIMPLE IL which was designed
for an optimizing compiler with a variety of input source languages.
That is, you simply can't implement every diagnostic you'd like
to have when coming from a source level perspective.

It seems to be you're somewhat pushing the boundary here ;)

Richard.

> That said, it would make perfect sense to me to have the warning
> code live in separate files from the optimization code, and keep
> the separation of concerns and responsibilities crisp and well
> defined by their APIs.