From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-456307-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 64371 invoked by alias); 20 Jun 2017 08:18:26 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 64355 invoked by uid 89); 20 Jun 2017 08:18:24 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=excessive
X-HELO: mx1.suse.de
Received: from mx2.suse.de (HELO mx1.suse.de) (195.135.220.15) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 20 Jun 2017 08:18:23 +0000
Received: from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254])	by mx1.suse.de (Postfix) with ESMTP id D97F6ABC8;	Tue, 20 Jun 2017 08:18:20 +0000 (UTC)
Date: Tue, 20 Jun 2017 08:18:00 -0000
From: Richard Biener <rguenther@suse.de>
To: Jakub Jelinek <jakub@redhat.com>
cc: =?ISO-8859-15?Q?Martin_Li=A8ka?= <mliska@suse.cz>, gcc-patches@gcc.gnu.org
Subject: Re: [RFC PATCH] -fsanitize=pointer-overflow support (PR sanitizer/80998)
In-Reply-To: <20170620081348.GE2123@tucnak>
Message-ID: <alpine.LSU.2.20.1706201015531.22867@zhemvz.fhfr.qr>
References: <20170619182515.GA2123@tucnak> <alpine.LSU.2.20.1706200928120.22867@zhemvz.fhfr.qr> <20170620081348.GE2123@tucnak>
User-Agent: Alpine 2.20 (LSU 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
X-SW-Source: 2017-06/txt/msg01402.txt.bz2

On Tue, 20 Jun 2017, Jakub Jelinek wrote:

> On Tue, Jun 20, 2017 at 09:41:43AM +0200, Richard Biener wrote:
> > On Mon, 19 Jun 2017, Jakub Jelinek wrote:
> > 
> > > Hi!
> > > 
> > > The following patch adds -fsanitize=pointer-overflow support,
> > > which adds instrumentation (included in -fsanitize=undefined) that checks
> > > that pointer arithmetics doesn't wrap.  If the offset on ptr p+ off when treating
> > > it as signed value is non-negative, we check whether the result is bigger
> > > (uintptr_t comparison) than ptr, if it is negative in ssizetype, we check
> > > whether the result is smaller than ptr, otherwise we check at runtime
> > > whether (ssizetype) off < 0 and do the check based on that.
> > > The patch checks both POINTER_PLUS_EXPR, as well as e.g. ADDR_EXPR of
> > > handled components, and even handled components themselves (exception
> > > is for constant offset when the base is an automatic non-VLA decl or
> > > decl that binds to current function where we can at compile time for
> > > sure guarantee it will fit).
> > 
> > Does this "properly" interact with any array-bound sanitizing we do?
> > Say, for
> > 
> >  &a->b[i].c.d
> > 
> > ?
> 
> It doesn't interact with it right now at all, and I think it shouldn't
> for the case you wrote, &a->b[i].c.d even if i is within bounds of the
> declared array the pointer still could point to something that wraps around.
> struct S { struct T { struct U { int d; } c; } b[0x1000000]; } *a;
> could still in a buggy program point to something that is actually much
> shorter like a = (struct S *) malloc (0x10000 * sizeof (int));
> or similar and there could be a wrap around.
> 
> What I have considered, but haven't implemented yet, is checking if there
> is UBSAN_BOUNDS sanitization and it is actually array or structure with
> array etc. (i.e. there is no pointer dereference) - for
> &q.b[i].c.d
> if the bounds check is present and we are sure about the object size
> (i.e. automatic variable or locally defined file scope var).
> 
> > > Martin has said he'll write the sanopt part of optimization
> > > (if UBSAN_PTR for some pointer is dominated by UBSAN_PTR for the same
> > > pointer and the offset is constant in both cases and equal or absolute value
> > > bigger and same sign in the dominating UBSAN_PTR, we can avoid the dominated
> > > check).  
> > > 
> > > For the cases where there is a dereference (i.e. not ADDR_EXPR of the
> > > handled component or POINTER_PLUS_EXPR), I wonder if we couldn't ignore
> > > say constant offsets in range <-4096, 4096> or something similar, hoping
> > > people don't have anything mapped at the page 0 and -pagesize in hosted
> > > env.  Thoughts on that?
> > 
> > Not sure what the problem is here?
> 
> It would be an attempt to avoid sanitizing int foo (int *p) { return p[10] + p[-5]; }
> (when the offset is constant and small and we dereference it).
> If there is no page mapped at NULL or at the highest page in the virtual
> address space, then the above will crash in case p + 10 or p - 5 wraps
> around.

Ah, so merely an optimization to avoid excessive instrumentation then,
yes, this might work (maybe make 4096 a --param configurable to be able
to disable it).

> > > I've bootstrapped/regtested the patch on x86_64-linux and i686-linux
> > > and additionally bootstrapped/regtested with bootstrap-ubsan on both too.
> > > The latter revealed a couple of issues I'd like to discuss:
> > > 
> > > 1) libcpp/symtab.c contains a couple of spots reduced into:
> > > #define DELETED ((char *) -1)
> > > void bar (char *);
> > > void
> > > foo (char *p)
> > > {
> > >   if (p && p != DELETED)
> > >     bar (p);
> > > }
> > > where we fold it early into if ((p p+ -1) <= (char *) -3)
> > > and as the instrumentation is done during ubsan pass, if p is NULL,
> > > we diagnose this as invalid pointer overflow from NULL to 0xffff*f.
> > > Shall we change the folder so that during GENERIC folding it
> > > actually does the addition and comparison in pointer_sized_int
> > > instead (my preference), or shall I move the UBSAN_PTR instrumentation
> > > earlier into the FEs (but then I still risk stuff is folded earlier)?
> > 
> > Aww, so we turn the pointer test into a range test ;)  That it uses
> > a pointer type rather than an unsigned integer type is a bug, probably
> > caused by pointers being TYPE_UNSIGNED.
> > 
> > Not sure if the folding itself is worthwhile to keep though, thus an
> > option would be to not generate range tests from pointers?
> 
> I'll have a look.  Maybe only do it during reassoc and not earlier.

It certainly looks somewhat premature in fold-const.c.

> > > 3) not really related to this patch, but something I also saw during the
> > > bootstrap-ubsan on i686-linux:
> > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147426384 - 2147475412 cannot be represented in type 'int'
> > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147426384 - 2147478324 cannot be represented in type 'int'
> > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147450216 - 2147451580 cannot be represented in type 'int'
> > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147450216 - 2147465664 cannot be represented in type 'int'
> > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147469348 - 2147451544 cannot be represented in type 'int'
> > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147482364 - 2147475376 cannot be represented in type 'int'
> > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147483624 - 2147475376 cannot be represented in type 'int'
> > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147483628 - 2147451544 cannot be represented in type 'int'
> > > ../../gcc/memory-block.cc:59:4: runtime error: signed integer overflow: -2147426384 - 2147475376 cannot be represented in type 'int'
> > > ../../gcc/memory-block.cc:59:4: runtime error: signed integer overflow: -2147450216 - 2147451544 cannot be represented in type 'int'
> > > The problem here is that we lower pointer subtraction, e.g.
> > > long foo (char *p, char *q) { return q - p; }
> > > as return (ptrdiff_t) ((ssizetype) q - (ssizetype) p);
> > > and even for a valid testcase where we have an array across
> > > the middle of the virtual address space, say the first one above
> > > is (char *) 0x8000dfb0 - (char *) 0x7fffdfd4 subtraction, even if
> > > there is 128KB array starting at 0x7fffd000, it will yield
> > > UB (not in the source, but in whatever the compiler lowered it into).
> > > So, shall we instead do the subtraction in sizetype and only then
> > > cast?  For sizeof (*ptr) > 1 I think we have some outstanding PR,
> > > and it is more difficult to find out in what types to compute it.
> > > Or do we want to introduce POINTER_DIFF_EXPR?
> > 
> > Just use uintptr_t for the difference computation (well, an unsigned
> > integer type of desired precision -- mind address-spaces), then cast
> > the result to signed.
> 
> Ok (of course, will handle this separately from the rest).

Yes.  Note I didn't look at the actual patch (yet).

Richard.