From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-456305-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 42203 invoked by alias); 20 Jun 2017 08:14:02 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 42176 invoked by uid 89); 20 Jun 2017 08:14:01 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.9 required=5.0 tests=BAYES_00,KAM_LAZY_DOMAIN_SECURITY,SPF_HELO_PASS,T_RP_MATCHES_RCVD autolearn=no version=3.3.2 spammy=qb
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 20 Jun 2017 08:13:54 +0000
Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15])	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))	(No client certificate requested)	by mx1.redhat.com (Postfix) with ESMTPS id 511D28046D;	Tue, 20 Jun 2017 08:13:53 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 511D28046D
Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx04.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=jakub@redhat.com
DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com 511D28046D
Received: from tucnak.zalov.cz (ovpn-116-143.ams2.redhat.com [10.36.116.143])	by smtp.corp.redhat.com (Postfix) with ESMTPS id C939D8B2A6;	Tue, 20 Jun 2017 08:13:52 +0000 (UTC)
Received: from tucnak.zalov.cz (localhost [127.0.0.1])	by tucnak.zalov.cz (8.15.2/8.15.2) with ESMTP id v5K8DnEP007367;	Tue, 20 Jun 2017 10:13:50 +0200
Received: (from jakub@localhost)	by tucnak.zalov.cz (8.15.2/8.15.2/Submit) id v5K8Dm1M007366;	Tue, 20 Jun 2017 10:13:48 +0200
Date: Tue, 20 Jun 2017 08:14:00 -0000
From: Jakub Jelinek <jakub@redhat.com>
To: Richard Biener <rguenther@suse.de>
Cc: Martin =?utf-8?B?TGnFoWth?= <mliska@suse.cz>, gcc-patches@gcc.gnu.org
Subject: Re: [RFC PATCH] -fsanitize=pointer-overflow support (PR sanitizer/80998)
Message-ID: <20170620081348.GE2123@tucnak>
Reply-To: Jakub Jelinek <jakub@redhat.com>
References: <20170619182515.GA2123@tucnak> <alpine.LSU.2.20.1706200928120.22867@zhemvz.fhfr.qr>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <alpine.LSU.2.20.1706200928120.22867@zhemvz.fhfr.qr>
User-Agent: Mutt/1.7.1 (2016-10-04)
X-IsSubscribed: yes
X-SW-Source: 2017-06/txt/msg01400.txt.bz2

On Tue, Jun 20, 2017 at 09:41:43AM +0200, Richard Biener wrote:
> On Mon, 19 Jun 2017, Jakub Jelinek wrote:
> 
> > Hi!
> > 
> > The following patch adds -fsanitize=pointer-overflow support,
> > which adds instrumentation (included in -fsanitize=undefined) that checks
> > that pointer arithmetics doesn't wrap.  If the offset on ptr p+ off when treating
> > it as signed value is non-negative, we check whether the result is bigger
> > (uintptr_t comparison) than ptr, if it is negative in ssizetype, we check
> > whether the result is smaller than ptr, otherwise we check at runtime
> > whether (ssizetype) off < 0 and do the check based on that.
> > The patch checks both POINTER_PLUS_EXPR, as well as e.g. ADDR_EXPR of
> > handled components, and even handled components themselves (exception
> > is for constant offset when the base is an automatic non-VLA decl or
> > decl that binds to current function where we can at compile time for
> > sure guarantee it will fit).
> 
> Does this "properly" interact with any array-bound sanitizing we do?
> Say, for
> 
>  &a->b[i].c.d
> 
> ?

It doesn't interact with it right now at all, and I think it shouldn't
for the case you wrote, &a->b[i].c.d even if i is within bounds of the
declared array the pointer still could point to something that wraps around.
struct S { struct T { struct U { int d; } c; } b[0x1000000]; } *a;
could still in a buggy program point to something that is actually much
shorter like a = (struct S *) malloc (0x10000 * sizeof (int));
or similar and there could be a wrap around.

What I have considered, but haven't implemented yet, is checking if there
is UBSAN_BOUNDS sanitization and it is actually array or structure with
array etc. (i.e. there is no pointer dereference) - for
&q.b[i].c.d
if the bounds check is present and we are sure about the object size
(i.e. automatic variable or locally defined file scope var).

> > Martin has said he'll write the sanopt part of optimization
> > (if UBSAN_PTR for some pointer is dominated by UBSAN_PTR for the same
> > pointer and the offset is constant in both cases and equal or absolute value
> > bigger and same sign in the dominating UBSAN_PTR, we can avoid the dominated
> > check).  
> > 
> > For the cases where there is a dereference (i.e. not ADDR_EXPR of the
> > handled component or POINTER_PLUS_EXPR), I wonder if we couldn't ignore
> > say constant offsets in range <-4096, 4096> or something similar, hoping
> > people don't have anything mapped at the page 0 and -pagesize in hosted
> > env.  Thoughts on that?
> 
> Not sure what the problem is here?

It would be an attempt to avoid sanitizing int foo (int *p) { return p[10] + p[-5]; }
(when the offset is constant and small and we dereference it).
If there is no page mapped at NULL or at the highest page in the virtual
address space, then the above will crash in case p + 10 or p - 5 wraps
around.

> > I've bootstrapped/regtested the patch on x86_64-linux and i686-linux
> > and additionally bootstrapped/regtested with bootstrap-ubsan on both too.
> > The latter revealed a couple of issues I'd like to discuss:
> > 
> > 1) libcpp/symtab.c contains a couple of spots reduced into:
> > #define DELETED ((char *) -1)
> > void bar (char *);
> > void
> > foo (char *p)
> > {
> >   if (p && p != DELETED)
> >     bar (p);
> > }
> > where we fold it early into if ((p p+ -1) <= (char *) -3)
> > and as the instrumentation is done during ubsan pass, if p is NULL,
> > we diagnose this as invalid pointer overflow from NULL to 0xffff*f.
> > Shall we change the folder so that during GENERIC folding it
> > actually does the addition and comparison in pointer_sized_int
> > instead (my preference), or shall I move the UBSAN_PTR instrumentation
> > earlier into the FEs (but then I still risk stuff is folded earlier)?
> 
> Aww, so we turn the pointer test into a range test ;)  That it uses
> a pointer type rather than an unsigned integer type is a bug, probably
> caused by pointers being TYPE_UNSIGNED.
> 
> Not sure if the folding itself is worthwhile to keep though, thus an
> option would be to not generate range tests from pointers?

I'll have a look.  Maybe only do it during reassoc and not earlier.

> > 3) not really related to this patch, but something I also saw during the
> > bootstrap-ubsan on i686-linux:
> > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147426384 - 2147475412 cannot be represented in type 'int'
> > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147426384 - 2147478324 cannot be represented in type 'int'
> > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147450216 - 2147451580 cannot be represented in type 'int'
> > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147450216 - 2147465664 cannot be represented in type 'int'
> > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147469348 - 2147451544 cannot be represented in type 'int'
> > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147482364 - 2147475376 cannot be represented in type 'int'
> > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147483624 - 2147475376 cannot be represented in type 'int'
> > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147483628 - 2147451544 cannot be represented in type 'int'
> > ../../gcc/memory-block.cc:59:4: runtime error: signed integer overflow: -2147426384 - 2147475376 cannot be represented in type 'int'
> > ../../gcc/memory-block.cc:59:4: runtime error: signed integer overflow: -2147450216 - 2147451544 cannot be represented in type 'int'
> > The problem here is that we lower pointer subtraction, e.g.
> > long foo (char *p, char *q) { return q - p; }
> > as return (ptrdiff_t) ((ssizetype) q - (ssizetype) p);
> > and even for a valid testcase where we have an array across
> > the middle of the virtual address space, say the first one above
> > is (char *) 0x8000dfb0 - (char *) 0x7fffdfd4 subtraction, even if
> > there is 128KB array starting at 0x7fffd000, it will yield
> > UB (not in the source, but in whatever the compiler lowered it into).
> > So, shall we instead do the subtraction in sizetype and only then
> > cast?  For sizeof (*ptr) > 1 I think we have some outstanding PR,
> > and it is more difficult to find out in what types to compute it.
> > Or do we want to introduce POINTER_DIFF_EXPR?
> 
> Just use uintptr_t for the difference computation (well, an unsigned
> integer type of desired precision -- mind address-spaces), then cast
> the result to signed.

Ok (of course, will handle this separately from the rest).

	Jakub