Subject: Re: [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets (PR 82455)
To: Richard Biener <rguenther@suse.de>
Cc: Martin Sebor <msebor@gmail.com>, Gcc Patch List <gcc-patches@gcc.gnu.org>
References: <b63d021f-ceca-e26f-848a-40182de004c9@gmail.com> <alpine.LSU.2.20.1710301238080.8202@zhemvz.fhfr.qr> <79634da6-bf31-b7f0-15f5-0436fc21a51a@gmail.com> <C174CF6C-E18C-4575-AA5E-AB7D52A6C1D8@suse.de> <ba57f7d9-fd4b-0692-462f-cf4fcf9b6f93@gmail.com> <5FF222AD-B155-434B-9C65-721009D1964E@suse.de> <1064925a-ac64-502e-d0dd-85c27e7432f6@gmail.com> <alpine.LSU.2.20.1711021227140.12252@zhemvz.fhfr.qr> <6d6e6b84-e4b0-069c-30fa-58e45b2cd4c7@redhat.com> <alpine.LSU.2.20.1711100853280.12252@zhemvz.fhfr.qr>
From: Jeff Law <law@redhat.com>
Message-ID: <c78111fc-0411-a5e8-3cd8-dac69adb5458@redhat.com>
Date: Tue, 14 Nov 2017 00:04:00 -0000
In-Reply-To: <alpine.LSU.2.20.1711100853280.12252@zhemvz.fhfr.qr>

On 11/10/2017 01:00 AM, Richard Biener wrote:
> 
> It's the usual issue with an optimizing compiler vs. a static analyzer.
> We try to get rid of the little semantic details of the input languages
> that in the end do not matter for code-generation but that makes
> using those semantic details hard (sometimes the little details
> are useful, like signed overflow being undefined).
> 
> For GIMPLE it's also often the case that we didn't really thoroughly
> specify the semantics of the IL - like is an aggregate copy a
> block copy (that's how we expand it to RTL) or a memberwise copy?
> SRA treats it like the latter in some cases but memcpy folding
> turns memcpy into aggregate assignments ... (now think about padding).
Understood far too well.  In fact, I was looking at the aggregate copy
stuff not terribly long ago and concluded that depending on either
particular behavior was undesirable. Something (glibc IIRC) was
depending on the padding being copied because they were actually shoving
live data into the pad.  Ugh.
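
The block-copy vs. memberwise-copy question can be made concrete with a small C sketch. It models a hypothetical `struct S { char c; int i; }` as raw bytes, so the result doesn't depend on what a compiler happens to do with padding when storing to real struct members. The function names are made up for illustration and are not GCC code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical aggregate: on common ABIs there are padding bytes
   between 'c' and 'i' so that 'i' is suitably aligned.  */
struct S { char c; int i; };

/* Block copy: every byte of the object, padding included.  */
void block_copy(unsigned char *dst, const unsigned char *src)
{
  memcpy(dst, src, sizeof(struct S));
}

/* Memberwise copy: only the bytes of the named members are copied;
   padding bytes in DST keep whatever they held before.  */
void memberwise_copy(unsigned char *dst, const unsigned char *src)
{
  memcpy(dst + offsetof(struct S, c), src + offsetof(struct S, c),
         sizeof(char));
  memcpy(dst + offsetof(struct S, i), src + offsetof(struct S, i),
         sizeof(int));
}

/* Number of bytes in which two object representations differ.  */
size_t differing_bytes(const unsigned char *a, const unsigned char *b)
{
  assert(a != NULL && b != NULL);
  size_t n = 0;
  for (size_t k = 0; k < sizeof(struct S); k++)
    n += a[k] != b[k];
  return n;
}

/* Bytes of struct S not covered by any member (3 on typical ABIs
   with a 4-byte int).  */
size_t padding_bytes(void)
{
  return sizeof(struct S) - sizeof(char) - sizeof(int);
}
```

If the source's padding is seeded with a marker value (standing in for live data kept in the pad), `block_copy` reproduces it byte for byte, while `memberwise_copy` leaves the destination's old padding bytes in place. Code that relies on either behavior breaks when the compiler picks the other one.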

> 
> It's not that GCC doesn't have its set of existing issues with
> respect to interpreting GIMPLE semantics as it sees fit in one way
> here and in another way there.  I'm just always nervous when adding
> new "interpretations" where I know the non-existing formal definition
> of GIMPLE leaves things unspecified.
Right.  No disagreement from me.  We have these issues and address
representation is just one of a class of things which we can represent
in gimple that don't "properly" map back to the source language.  And
that's probably inherent in the lowering from the source language to
something like GIMPLE.


> 
> For example, we _do_ use array bounds and array accesses (but _not_,
> and for now _nowhere_, when they appear in address computations!)
> to derive niter information.  At the same time, because of this
> exploitation, we try very very hard to never (ok, PRE above as a
> counter-example) create an actual array access when dereferencing
> a pointer that is constructed by taking the address of an array-ref.
> That's why Martin added the warning to forwprop because that pass,
> when forwarding such addresses, gets rid of the array-ref.
Right.  IIRC there are some BZs around this issue that come up from
release to release, related to how we've changed this code over the last
few years.
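
A hypothetical illustration of the two cases Richard contrasts, assuming a global `int arr4[4]`: a direct access `arr4[i]` inside a loop, where the array bound can feed the iteration-count estimate, and a load through a pointer formed as `&arr4[base]`, where forwarding the address into the load can leave only a base-plus-byte-offset memory reference behind. The comments describe what an optimizer may do; they are not a claim about exactly what GCC emits.

```c
#include <assert.h>

/* Hypothetical global array with 4 elements.  */
int arr4[4] = {1, 2, 3, 4};

/* Direct array access: arr4[i] is only valid for 0 <= i < 4, so a
   compiler may conclude that the loop body runs at most 4 times
   (an upper bound on the iteration count).  */
int sum_prefix(int n)
{
  assert(n <= 4);   /* spells out the bound the array access implies */
  int s = 0;
  for (int i = 0; i < n; i++)
    s += arr4[i];
  return s;
}

/* Access through a pointer formed by taking the address of an array
   element.  Once the address is forwarded into the load, the IL may
   only see "load from &arr4 plus a byte offset", with the array-ref
   (and thus the direct link to arr4's bounds) gone.  */
int elem_via_address(int base, int off)
{
  assert(base >= 0 && base <= 4);              /* &arr4[base] is valid */
  assert(base + off >= 0 && base + off < 4);   /* the load is in bounds */
  int *p = &arr4[base];
  return p[off];   /* base == 2, off == 3 would be an out-of-bounds offset */
}
```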


> 
>>>> Or, if that's not it, what exactly is your concern with this
>>>> enhancement?  If it's that it's implemented in forwprop, what
>>>> would be a better place, e.g., earlier in the optimization
>>>> phase?  If it's something else, I'd appreciate it
>>>> if you could explain what.
>>>
>>> For one implementing this in forwprop looks like a move in the
>>> wrong direction.  I'd like to have separate warning passes or
>>> at most amend warnings from optimization passes, not add new ones.
>> I tend to agree.  That's one of the reasons why I pushed Aldy away from
>> doing this kind of stuff within VRP.
>>
>> What I envision is a pass which does a dominator walk through the
>> blocks.  It gathers context sensitive range information as it does the walk.
>>
>> As we encounter array references, we try to check them against the
>> current range information.  We could also try to warn about certain
>> pointer computations, though we have to be more careful with those.
>>
>> Though I certainly still worry that the false positive cases which led
>> Aldy, Andrew and myself to look at path sensitive ranges aren't resolved
>> and will limit the utility of doing more array range checking.
> 
> I fear that while this might be a little bit cleaner, you'd still have to
> do this very very early in the optimization pipeline (see all the
> hard time we had with __builtin_object_size) and thus you won't catch
> very many cases unless you start doing an IPA pass and handle propagating
> through memory.  Which is when you arrived at a full-blown static
> analyzer.
Could be.  We won't know until we give it a whirl.  FWIW, we saw far
more problems due to the lack of path sensitivity than anything.
Nothing I'm suggesting in this thread addresses that problem.
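
A toy version of the dominator-walk checker described above might look like the following. The IR is invented for illustration (blocks in a dominator tree, each carrying the range that a dominating condition establishes for a single index variable `i`); a real pass would work on GIMPLE with GCC's own range machinery. Ranges are passed down by value, so leaving a subtree automatically discards the facts learned inside it, which is what makes the information context sensitive. To limit false positives, this sketch only reports accesses whose entire index range lies outside the array.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Closed interval [lo, hi] of values the index variable i may take.  */
typedef struct { long lo, hi; } range_t;

/* Hypothetical toy IR block: a node in the dominator tree.  */
typedef struct block {
  range_t on_entry;        /* range for i established on entry, e.g. by a dominating test */
  int has_access;          /* nonzero if the block contains an access arr[i] */
  long array_len;          /* number of elements in arr */
  struct block *kids[4];   /* children in the dominator tree */
  size_t nkids;
} block_t;

static range_t intersect(range_t x, range_t y)
{
  range_t r = { x.lo > y.lo ? x.lo : y.lo, x.hi < y.hi ? x.hi : y.hi };
  return r;
}

/* Walk the dominator tree from B with range CUR known for i and return
   the number of accesses whose whole index range is out of bounds.  */
int check_bounds(const block_t *b, range_t cur)
{
  assert(b != NULL && b->nkids <= 4);
  range_t r = intersect(cur, b->on_entry);
  if (r.lo > r.hi)
    return 0;              /* empty range: unreachable under these facts */
  int warnings = 0;
  if (b->has_access && (r.hi < 0 || r.lo >= b->array_len))
    warnings++;
  for (size_t k = 0; k < b->nkids; k++)
    warnings += check_bounds(b->kids[k], r);
  return warnings;
}

/* Models, with int arr[4]:
     if (0 <= i && i < 4) use arr[i];
     if (i >= 10)         use arr[i];   <- always out of bounds  */
int demo(void)
{
  block_t in_bounds = { {0, 3}, 1, 4, {NULL}, 0 };
  block_t too_big   = { {10, LONG_MAX}, 1, 4, {NULL}, 0 };
  block_t entry     = { {LONG_MIN, LONG_MAX}, 0, 0, {&in_bounds, &too_big}, 2 };
  return check_bounds(&entry, entry.on_entry);
}
```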

Doing a really good job for warnings with path sensitivity shares a lot
of properties with jump threading.  Specifically that you need to
propagate range knowledge along a path, often past join points in the
CFG (i.e., you're propagating beyond the dominance frontier).
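
A hand-written example of the path-sensitivity problem, using a hypothetical `int tbl[4]`: the access is only reached when `i == 2`, but the two tests of `safe` are separated by a join point where the facts from both paths merge. `lookup_threaded` is written by hand to show roughly what threading the join achieves; it is not actual GCC output.

```c
#include <assert.h>

int tbl[4] = {10, 20, 30, 40};

/* The access tbl[i] is reached only on the path where i == 2, but the
   two tests of SAFE are separated by a join point.  */
int lookup(int safe)
{
  int i = 7;             /* out of bounds for tbl[4] */
  if (safe)
    i = 2;
  /* Join point: merging both incoming paths gives i in [2, 7], a
     range that includes out-of-bounds values.  */
  if (safe)
    {
      assert(i == 2);    /* holds on every path that gets here */
      return tbl[i];
    }
  return -1;
}

/* Hand-written equivalent of threading the join: each path keeps its
   own value of i, the second test disappears, and the access uses a
   constant, in-bounds index.  */
int lookup_threaded(int safe)
{
  if (safe)
    return tbl[2];
  return -1;
}
```

A checker that only sees the merged range `[2, 7]` at the access either stays quiet or reports a false positive; following the path through the first `if (safe)` proves the access safe.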

Once you do a good job there, I'd strongly suspect that IPA issues would
then dominate.
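
To illustrate the IPA point, here is a hypothetical case where the pointer to an array reaches the accessing function through memory (a struct field). An intraprocedural checker looking at `read_through_memory` alone has no idea how large the object behind `h->p` is; spotting a bad index there needs the caller's store of `table` into `h.p` (and the index) propagated interprocedurally, unless inlining happens first. All names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

int table[4] = {5, 6, 7, 8};

/* The pointer to TABLE travels through memory (a struct field).  */
struct holder { int *p; };

/* Seen in isolation, nothing here says how many elements h->p points
   to; the bound is only known in the caller.  */
int read_through_memory(const struct holder *h, int i)
{
  assert(h != NULL && h->p != NULL);
  return h->p[i];
}

/* A call such as caller(6) would read past the end of TABLE, but
   diagnosing it at compile time requires propagating both the constant
   index and the pointee of h.p into read_through_memory.  */
int caller(int i)
{
  struct holder h = { table };
  return read_through_memory(&h, i);
}
```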

Jeff