From: Richard Biener <rguenther@suse.de>
To: Martin Sebor <msebor@gmail.com>
Cc: Jeff Law <law@redhat.com>, Gcc Patch List <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets (PR 82455)
Date: Tue, 14 Nov 2017 09:13:00 -0000 [thread overview]
Message-ID: <alpine.LSU.2.20.1711140943290.12252@zhemvz.fhfr.qr> (raw)
In-Reply-To: <90f1f522-35b0-4989-bf77-b79948054281@gmail.com>
On Mon, 13 Nov 2017, Martin Sebor wrote:
> On 11/10/2017 01:00 AM, Richard Biener wrote:
> > On Thu, 9 Nov 2017, Jeff Law wrote:
> >
> > > On 11/02/2017 05:48 AM, Richard Biener wrote:
> > >
> > > >
> > > > There were elaborate transforms of ptr + CST to ptr->a.b.c[3] in the
> > > > past. We have ripped out _most_ of them because of bad interaction
> > > > with dependence analysis and _b_o_s warnings.
> > > >
> > > > But for example PRE might still end up propagating
> > > >
> > > > _1 = &ptr->a.b.c;
> > > > _2 = (*_1)[i_3];
> > > >
> > > > as
> > > >
> > > > _2 = ptr->a.b.c[i_3];
> > > >
> > > > But it's not so much GCC building up GIMPLE expressions that would
> > > > cause false positives but "valid" C code and "invalid" C code
> > > > represented exactly the same in GCC. Let's say
> > > I think this is a key point.
> >
> > It's the usual issue with an optimizing compiler vs. a static analyzer.
> > We try to get rid of the little semantic details of the input languages
> > that in the end do not matter for code-generation but that makes
> > using those semantic details hard (sometimes the little details
> > are useful, like signed overflow being undefined).
> >
> > For GIMPLE it's also often the case that we didn't really thoroughly
> > specify the semantics of the IL - like is an aggregate copy a
> > block copy (that's how we expand it to RTL) or a memberwise copy?
> > SRA treats it like the latter in some cases but memcpy folding
> > turns memcpy into aggregate assignments ... (now think about padding).
> >
> > It's not that GCC doesn't have its set of existing issues with
> > respect to interpreting GIMPLE semantic as it seems fit in one way
> > here and in another way there. I'm just always nervous when adding
> > new "interpretations" where I know the non-existing formal definition
> > of GIMPLE leaves things unspecified.
> >
> > For example we _do_ use array bounds and array accesses (but _not_
> > and for now _nowhere_ if they appear in address computations!)
> > to derive niter information. At the same time, because of this
> > exploitation, we try very very hard to never (ok, PRE above as a
> > couter-example) create an actual array access when dereferencing
> > a pointer that is constructed by taking the address of an array-ref.
> > That's why Martin added the warning to forwprop because that pass,
> > when forwarding such addresses, gets rid of the array-ref.
> >
> > > > > Or, if that's not it, what exactly is your concern with this
> > > > > enhancement? If it's that it's implemented in forwprop, what
> > > > > would be a better place, e.g., earlier in the optimization
> > > > > phase? If it's something something else, I'd appreciate it
> > > > > if you could explain what.
> > > >
> > > > For one implementing this in forwprop looks like a move in the
> > > > wrong direction. I'd like to have separate warning passes or
> > > > at most amend warnings from optimization passes, not add new ones.
> > > I tend to agree. That's one of the reasons why I pushed Aldy away from
> > > doing this kind of stuff within VRP.
> > >
> > > What I envision is a pass which does a dominator walk through the
> > > blocks. It gathers context sensitive range information as it does the
> > > walk.
> > >
> > > As we encounter array references, we try to check them against the
> > > current range information. We could also try to warn about certain
> > > pointer computations, though we have to be more careful with those.
> > >
> > > Though I certainly still worry that the false positive cases which led
> > > Aldy, Andrew and myself to look at path sensitive ranges arent' resolved
> > > and will limit the utility of doing more array range checking.
> >
> > I fear while this might be a little bit cleaner you'd still have to
> > do this very very early in the optimization pipeline (see all the
> > hard time we had with __builtin_object_size) and thus you won't catch
> > very many cases unless you start doing an IPA pass and handle propagating
> > through memory. Which is when you arrived at a full-blown static
> > analyzer.
>
> I have a different concern with the general idea of moving these
> kinds of warnings into passes of their own. It would unavoidably
> result in duplicating some code from the optimization passes (at
> a minimum, the GIMPLE traversal, but likely more than that). It
> would also require pretty tight coupling between the warning pass
> and the optimization passes as the latter can defeat the work of
> the former (as happens in this case). But most significantly,
> the separation isn't feasible in cases where the optimization
> pass computes and maintains non-trivial internal state (like,
> say, tree-object-size or tree-ssa-strlen).
Well, as I said elsewhere the goal is to modularize the "optimizations"
enough so you can run their analysis phase together from a
"warning pass" and get access to their internal state. Like what
Jeff is doing with VRP. I'm currently working on some region-based
value-numbering where the iteration scheme is supposed to allow
pluggin in, say, a value-range lattice. It's currently not
allowing path separation (simulating jump-threading) at the moment
but you have to start somewhere ...
> I think of these kinds of "basic" warnings as error checking and
> handling. It makes no more sense to me to have a separate pass
> check for input errors than it would to have one to check the
> validity of other kinds of input within GCC (such as trees of
> the wrong kind being passed from one function to another due to
> a GCC programmer's error). They go hand in hand.
>
> To be sure, there are more sophisticated kinds of checkers that
> depend on an in-depth analysis of the program that's outside of
> what most existing optimization passes do. The sprintf warning
> comes to mind as an example. But even those can be (and the
> sprintf pass has been) used to do both: detect and diagnose
> errors _and_ optimize code. And even these passes (or especially
> they) do or would benefit from data computed by other optimization
> passes to deliver better results (whether it be avoiding false
> positives or improving the code they generate).
Sure - no disagreement here. But then those warnings have to
operate within the limitations of the GIMPLE IL which was designed
for an optimizing compiler with a variety of input source languages.
That is, you simply can't implement every diagnostic you'd like
to have when coming from a source level perspective.
It seems to be you're somewhat pushing the boundary here ;)
Richard.
> That said, it would make perfect sense to me to have the warning
> code live in separate files from the optimization code, and keep
> the separation of concerns and responsibilities crisp and well
> defined by their APIs.
next prev parent reply other threads:[~2017-11-14 8:48 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-29 16:15 Martin Sebor
2017-10-30 11:55 ` Richard Biener
2017-10-30 15:21 ` Martin Sebor
2017-10-30 19:59 ` Richard Biener
2017-10-30 20:40 ` Martin Sebor
2017-10-30 21:23 ` Richard Biener
2017-10-30 21:49 ` Martin Sebor
2017-11-02 11:48 ` Richard Biener
2017-11-10 1:12 ` Jeff Law
2017-11-10 8:25 ` Richard Biener
2017-11-14 0:04 ` Jeff Law
2017-11-14 9:11 ` Richard Biener
2017-11-15 1:52 ` Jeff Law
2017-11-14 5:22 ` Martin Sebor
2017-11-14 9:13 ` Richard Biener [this message]
2017-11-15 1:54 ` Jeff Law
2017-10-30 22:16 ` Jeff Law
2017-10-30 23:30 ` Martin Sebor
2017-10-31 4:32 ` Jeff Law
2017-11-01 22:21 ` Martin Sebor
2017-11-02 11:27 ` Richard Biener
2017-10-30 22:16 ` Jeff Law
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LSU.2.20.1711140943290.12252@zhemvz.fhfr.qr \
--to=rguenther@suse.de \
--cc=gcc-patches@gcc.gnu.org \
--cc=law@redhat.com \
--cc=msebor@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).