From: Richard Biener
Date: Thu, 18 Apr 2019 12:31:00 -0000
Subject: Re: C provenance semantics proposal
To: "Uecker, Martin"
Cc: "gcc@gcc.gnu.org", "Peter.Sewell@cl.cam.ac.uk", "law@redhat.com", "cl-c-memory-object-model@lists.cam.ac.uk"

On Thu, Apr 18, 2019 at 1:57 PM Uecker, Martin wrote:
>
> On Thursday, 18.04.2019, at 11:56 +0200, Richard Biener wrote:
> > On Thu, Apr 18, 2019 at 11:31 AM Richard Biener
> > wrote:
> > >
> > > On Wed, Apr 17, 2019 at 4:12 PM Uecker, Martin
> > > wrote:
> > > >
> > > > On Wednesday, 17.04.2019, at 15:34 +0200, Richard Biener wrote:
> > > > > On Wed, Apr 17, 2019 at 2:56 PM Uecker, Martin
> > > > > wrote:
> > ....
> > > > Let's consider this example:
> > > >
> > > > int x;
> > > > int y;
> > > > uintptr_t pi = (uintptr_t)&x;
> > > > uintptr_t pj = (uintptr_t)&y;
> > > >
> > > > if (pi + 4 == pj) {
> > > >
> > > >   int* p = (int*)pj; // can be one-after pointer of 'x'
> > > >   p[-1] = 1;         // well defined?
> > > > }
> > > >
> > > > If I understand correctly, a pointer obtained from
> > > > pi + 4 would have an "anything" provenance (which is
> > > > fine). But the pointer obtained from 'pj' would have the
> > > > provenance of 'y', so the access to 'x' would not
> > > > be allowed.
> > >
> > > Correct. This is the most difficult case for us to handle
> > > exactly, also because of this (also valid for the proposal?):
> > >
> > > int x;
> > > int y;
> > > uintptr_t pi = (uintptr_t)&x;
> > > uintptr_t pj = (uintptr_t)&y;
> > >
> > > if (pi + 4 == pj) {
> > >
> > >   int* p = (int*)(pi + 4); // can be one-after pointer of 'x'
> > >   p[-1] = 1;               // well defined?
> > > }
> > >
> > > While this is well handled by GCC in the written form (as you
> > > say, pi + 4 yields "anything" provenance), GCC itself
> > > may transform it into the first variant by noticing
> > > the conditional equivalence and substituting pj for
> > > pi + 4.
>
> Integers are just integers in the proposal, so conditional
> equivalence is not a problem for them. In my opinion this
> is a strength of the proposal.
> Tracking provenance for integers would mean that all
> computations would be affected by such subtle semantic
> issues (where you cannot even replace an integer by an
> equivalent one). In this proposal this is limited to
> pointers, where it at least makes some sense.
>
> > > > But according to the preferred version of
> > > > our proposal, the pointer could also be used to
> > > > access 'x' because it is also exposed.
> > > >
> > > > GCC could make pj have an "anything" provenance
> > > > even though it is not modified. (This would break
> > > > some optimization such as the one for Matlab.)
> > > >
> > > > Maybe one could also refine this optimization to check
> > > > for additional conditions which rule out the case
> > > > that there is another object the pointer could point
> > > > to.
> > >
> > > The only feasible solution would be to not track
> > > provenance through non-pointers and make
> > > conversions of non-pointers to pointers have
> > > "anything" provenance.
>
> This would be one solution, yes. But you could
> reattach the same provenance if you know that the
> pointer points into the middle of an object (so it is
> not a first or one-after pointer) or if you know
> that there is no exposed object directly adjacent
> to this object, etc.
>
> > > The additional issue that appears here, though,
> > > is that we cannot even turn (int *)(uintptr_t)p
> > > into p anymore, since with the conditional
> > > substitution we can then still arrive at
> > > effectively (&y)[-1] = 1, which is of course
> > > undefined behavior.
> > >
> > > That is, your proposal makes
> > >
> > >   ((int *)(uintptr_t)&y)[-1] = 1
> > >
> > > well-defined (if &y - 1 == &x) but keeps
> > >
> > >   (&y)[-1] = 1
> > >
> > > as undefined, which strikes me as a little bit
> > > inconsistent. If that's true, it's IMHO worth
> > > a defect report and second consideration.
>
> This is true. But I would not call it inconsistent.
> It is just unusual if you expect that casts to integers
> and back are no-ops. In this proposal a round-trip has
> the effect of stripping the original provenance and
> attaching a new one (which could be the same as the
> old one).

Well, the standard explicitly says that if you convert a pointer to
an integer (with the same or more precision) and back, you get the
same pointer back. That suggests (int *)(uintptr_t)&y is a semantic
no-op?

> While in this specific scenario this might seem
> unreasonable, there are other examples where you may
> want to be able to get from one object to the others,
> and using casts to integers would then be the
> blessed way to express this.

Sure, no arguing about this. So far it has all been in the hands of
implementors to make uses of this idiom work; now users will be able
to wield the standard's sword :/

> In my opinion, this is also intuitive:
> By casting to an integer one then gets simple discrete
> pointer semantics where one does not have provenance.
>
> > Similarly, that
> >
> > int x;
> > int y;
> > uintptr_t pj = (uintptr_t)&y;
> >
> > if (&x + 1 == &y) {
> >
> >   int* p = (int*)pj; // can be one-after pointer of 'x'
> >   p[-1] = 1;         // well defined?
> > }
> >
> > is undefined, but that it becomes well-defined when I add a no-op
> >
> > (uintptr_t)&x;
> >
> > is undesirable. Can this no-op
> > stmt appear in another function? Or even in
> > another translation unit (if x and y are global variables)?
> > And does such a stmt have to be present (in another
> > TU) to make the example valid in this case?
>
> Without that statement, the example is not valid, as the
> address of 'x' is not exposed. With the statement it
> becomes valid, and it does not matter where this statement
> appears. Again, I agree that the fact that such a statement
> has a side effect is something one needs to get used to.
>
> But taking the address already has a side effect, which could be
> surprising, doesn't it?
> If I understood your answer above correctly, for GCC you
> get this side effect already without the cast:
>
> &x;

Well, yes. But for GCC the important issue is whether this
address-taking is still done after optimization (at the point where we
use provenance info to compute points-to sets). So this plain stmt
wouldn't survive and would not make the example valid. It's of course
a lot harder to write this down into standard wording ;) (if not
impossible...)

I guess there has to be a data dependence between an address-taken
operation and recreating that address (or a derived one to the same
object). That is, we're trying to support delta-compressing pointers
as often used in shared memory data structures. But as you've seen
already, conditional "dependences" are prone to break.

> For the statement to appear elsewhere, the address must
> escape first. I would expect a compiler to treat a
> cast to an integer identically to an escaped address.

Sure, (uintptr_t)&a also takes the address of a, and passing that
integer to a function makes the address of the object a escape.

> > To me all this makes requiring exposure through a cast
> > to a non-pointer (or accessing its representation) not
> > in any way more "useful" for an optimizing compiler than
> > modeling exposure through address-taking.
>
> There would be a difference for cases like this:
>
> int x[3];
> int y;
>
> x[0] = 1;
> uintptr_t pj = (uintptr_t)&y;
>
> if (pi + 4 == pj) {
>
>   int* p = (int*)(pi + 4);
>   p[-1] = 1;
> }
>
> Here 'x' is not exposed in our proposal, so the assignment
> via 'p' is invalid, but the address is taken implicitly.

Via the x[0], yes. Unfortunate details of the C standard ;)

> Another example is storage allocated via malloc/alloca,
> where there is always a pointer involved but which is
> not automatically exposed in our proposal.

True, but the compiler nevertheless has to assume it is exposed once
that pointer escapes the current function (or TU).
It's hard to make the validity decision at parsing time, and at
optimization time a stmt like (uintptr_t)ptr; is gone very quickly.

Richard.

>
> Best,
> Martin