From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 123092 invoked by alias); 14 Nov 2017 01:21:09 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 123078 invoked by uid 89); 14 Nov 2017 01:21:08 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.2 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,KB_WAM_FROM_NAME_SINGLEWORD,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=no version=3.3.2 spammy=direction, crisp, thoroughly, nervous X-HELO: mail-ot0-f180.google.com Received: from mail-ot0-f180.google.com (HELO mail-ot0-f180.google.com) (74.125.82.180) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 14 Nov 2017 01:21:06 +0000 Received: by mail-ot0-f180.google.com with SMTP id s4so6118762ote.4 for ; Mon, 13 Nov 2017 17:21:06 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:subject:to:references:cc:message-id:date :user-agent:mime-version:in-reply-to:content-transfer-encoding; bh=IqGzo0oL04qpxHUwq8IwWM3Ih2ZEjytWmC5sRXNs6ds=; b=OWkLkHp71ztnUvDXE1M23LLVe2qXdd0fjvMoiFsg9VeKp7xVzShRnWunQXggyXv+xn l+uPm0lnWPlMXnuyFJmRkpjtxr22RMXXtbXdI+sIWsGofdCxzQDwk6HkG4nrLUHjQBPs qkWjxtJrhIWuewaoKTSXahC5mF3WPWzI8ytu+a+njlGSNNxMJYgEvj14GM7r5tbljCfT v2e9oMmn7d48VlEw+EXTm3bfOEU0B6cMgdNT9R7KMW9UpHzMNyQkx7vWuX6Q0nYqVsxT +4uPQUV4uKIzlmDpczd0M25nF/zzxIRYSQ8zlWrUD/mX30nDzI/YvmhMDRQax0llz2Kq PLog== X-Gm-Message-State: AJaThX7Gw+9CnK+FKo1QEq40Tgvnp3lKUVylnygb/GE2+3MlLCZdqvC9 b/nqOu1gNk7ClEz4ZuZRE2hGVw== X-Google-Smtp-Source: AGs4zMZpObB4fowOH62+h9o4DGekhXvq+N8UeKMMXCbY2ksJIuhtE1ixh5JreV+T85QvmjjBoETrRw== X-Received: by 10.157.85.150 with SMTP id m22mr7221357oth.218.1510622464597; Mon, 13 Nov 2017 17:21:04 -0800 (PST) Received: from localhost.localdomain (75-171-240-43.hlrn.qwest.net. [75.171.240.43]) by smtp.gmail.com with ESMTPSA id f11sm3065236otd.4.2017.11.13.17.21.02 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 13 Nov 2017 17:21:03 -0800 (PST) From: Martin Sebor Subject: Re: [PATCH] enhance -Warray-bounds to detect out-of-bounds offsets (PR 82455) To: Richard Biener , Jeff Law References: <79634da6-bf31-b7f0-15f5-0436fc21a51a@gmail.com> <5FF222AD-B155-434B-9C65-721009D1964E@suse.de> <1064925a-ac64-502e-d0dd-85c27e7432f6@gmail.com> <6d6e6b84-e4b0-069c-30fa-58e45b2cd4c7@redhat.com> Cc: Gcc Patch List Message-ID: <90f1f522-35b0-4989-bf77-b79948054281@gmail.com> Date: Tue, 14 Nov 2017 05:22:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-IsSubscribed: yes X-SW-Source: 2017-11/txt/msg01037.txt.bz2 On 11/10/2017 01:00 AM, Richard Biener wrote: > On Thu, 9 Nov 2017, Jeff Law wrote: > >> On 11/02/2017 05:48 AM, Richard Biener wrote: >> >>> >>> There were elaborate transforms of ptr + CST to ptr->a.b.c[3] in the >>> past. We have ripped out _most_ of them because of bad interaction >>> with dependence analysis and _b_o_s warnings. >>> >>> But for example PRE might still end up propagating >>> >>> _1 = &ptr->a.b.c; >>> _2 = (*_1)[i_3]; >>> >>> as >>> >>> _2 = ptr->a.b.c[i_3]; >>> >>> But it's not so much GCC building up GIMPLE expressions that would >>> cause false positives but "valid" C code and "invalid" C code >>> represented exactly the same in GCC. Let's say >> I think this is a key point. > > It's the usual issue with an optimizing compiler vs. a static analyzer. > We try to get rid of the little semantic details of the input languages > that in the end do not matter for code-generation but that makes > using those semantic details hard (sometimes the little details > are useful, like signed overflow being undefined). > > For GIMPLE it's also often the case that we didn't really thoroughly > specify the semantics of the IL - like is an aggregate copy a > block copy (that's how we expand it to RTL) or a memberwise copy? > SRA treats it like the latter in some cases but memcpy folding > turns memcpy into aggregate assignments ... (now think about padding). > > It's not that GCC doesn't have its set of existing issues with > respect to interpreting GIMPLE semantic as it seems fit in one way > here and in another way there. I'm just always nervous when adding > new "interpretations" where I know the non-existing formal definition > of GIMPLE leaves things unspecified. > > For example we _do_ use array bounds and array accesses (but _not_ > and for now _nowhere_ if they appear in address computations!) > to derive niter information. At the same time, because of this > exploitation, we try very very hard to never (ok, PRE above as a > couter-example) create an actual array access when dereferencing > a pointer that is constructed by taking the address of an array-ref. > That's why Martin added the warning to forwprop because that pass, > when forwarding such addresses, gets rid of the array-ref. > >>>> Or, if that's not it, what exactly is your concern with this >>>> enhancement? If it's that it's implemented in forwprop, what >>>> would be a better place, e.g., earlier in the optimization >>>> phase? If it's something something else, I'd appreciate it >>>> if you could explain what. >>> >>> For one implementing this in forwprop looks like a move in the >>> wrong direction. I'd like to have separate warning passes or >>> at most amend warnings from optimization passes, not add new ones. >> I tend to agree. That's one of the reasons why I pushed Aldy away from >> doing this kind of stuff within VRP. >> >> What I envision is a pass which does a dominator walk through the >> blocks. It gathers context sensitive range information as it does the walk. >> >> As we encounter array references, we try to check them against the >> current range information. We could also try to warn about certain >> pointer computations, though we have to be more careful with those. >> >> Though I certainly still worry that the false positive cases which led >> Aldy, Andrew and myself to look at path sensitive ranges arent' resolved >> and will limit the utility of doing more array range checking. > > I fear while this might be a little bit cleaner you'd still have to > do this very very early in the optimization pipeline (see all the > hard time we had with __builtin_object_size) and thus you won't catch > very many cases unless you start doing an IPA pass and handle propagating > through memory. Which is when you arrived at a full-blown static > analyzer. I have a different concern with the general idea of moving these kinds of warnings into passes of their own. It would unavoidably result in duplicating some code from the optimization passes (at a minimum, the GIMPLE traversal, but likely more than that). It would also require pretty tight coupling between the warning pass and the optimization passes as the latter can defeat the work of the former (as happens in this case). But most significantly, the separation isn't feasible in cases where the optimization pass computes and maintains non-trivial internal state (like, say, tree-object-size or tree-ssa-strlen). I think of these kinds of "basic" warnings as error checking and handling. It makes no more sense to me to have a separate pass check for input errors than it would to have one to check the validity of other kinds of input within GCC (such as trees of the wrong kind being passed from one function to another due to a GCC programmer's error). They go hand in hand. To be sure, there are more sophisticated kinds of checkers that depend on an in-depth analysis of the program that's outside of what most existing optimization passes do. The sprintf warning comes to mind as an example. But even those can be (and the sprintf pass has been) used to do both: detect and diagnose errors _and_ optimize code. And even these passes (or especially they) do or would benefit from data computed by other optimization passes to deliver better results (whether it be avoiding false positives or improving the code they generate). That said, it would make perfect sense to me to have the warning code live in separate files from the optimization code, and keep the separation of concerns and responsibilities crisp and well defined by their APIs. Martin