From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Subject: Re: RFC: stack/heap collision vulnerability and mitigation with GCC
To: "Richard Earnshaw (lists)", gcc-patches
References: <9dbdb66f-a9ec-c04a-8d83-e1597213e2da@arm.com> <6a46678a-01b7-50e8-c5e1-65ca25a5f662@redhat.com> <0afa3ebe-3a80-9324-c107-e1e39e747730@redhat.com>
From: Jeff Law
Date: Thu, 22 Jun 2017 15:30:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.1.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-IsSubscribed: yes
X-SW-Source: 2017-06/txt/msg01680.txt.bz2

On 06/22/2017 03:53 AM, Richard Earnshaw (lists) wrote:
> On 21/06/17 18:25, Jeff Law wrote:
>> On 06/21/2017 02:41 AM, Richard Earnshaw (lists) wrote:
>>
>>>> But the stack pointer might have already been advanced into the guard
>>>> page by the caller. For the sake of argument assume the guard page is
>>>> 0xf1000 and assume that our stack pointer at entry is 0xf1010 and that
>>>> the caller hasn't touched the 0xf1000 page.
>>>
>>> Then make sure the caller does touch the 0xf1000 page. If it's
>>> allocated that much stack it should be forced to do the probe and not
>>> rely on all it's children having to do it because it can't be bothered.
>> That needs to be mandated at the ABI level if it's going to happen. The
>> threat model assumes that the caller adheres to the ABI, but was not
>> necessarily compiled with -fstack-check.
>
> The base ABI would never mandate stack probes. Some systems may locate
> the stack in such a way that it can never collide with the heap, making
> guard pages and probes completely unnecessary (but perhaps at the
> expense of limiting the theoretical maximum stack size). It might say
> "if you do stack probes, use this model", but even then it would need to
> parameterize the whole model as there are just too many OS configuration
> options to consider (page size, size of guard zone, for example).

I'm not suggesting mandating stack probes in the ABI, but that certain
changes to the ABI can be made which would in turn make stack probing
far more efficient.

For example the PPC ABI mandates that *sp hold the outer frame address.
That turns out to be amazingly useful from a probing standpoint -- we
always know that *sp was written. That knowledge allows us to eliminate
98.5% of the prologue probing for glibc on PPC.
I'm not suggesting you do exactly the same thing, but I do think there
are things you could do at the ABI level which would drastically improve
the generated code when stack probing is enabled and which would have
minimal cost.

>
>>
>> I'm all for making the common path fast and letting the uncommon cases
>> pay additional penalties. That mindset has driven the work-to-date.
>>
>> But I don't think I have the liberty to change existing ABIs to
>> facilitate lower overhead approaches. But I think ARM does given it
>> owns the ABI for aarch64 and I would happily exploit whatever guarantees
>> we can derive from an updated ABI.
>>
>> So if you want the caller to touch the page, you need to amend the ABI.
>> I'd think touching the lowest address of the alloca area and outgoing
>> args, if large would be sufficient.
>>
>>
>
> I can't help but feel there's a bit of a goode olde mediaeval witch hunt
> going on here. As Wilco points out, we can never defend against a
> function that is built without probe operations but skips the entire
> guard zone. The only defence there is a larger guard zone, but how big
> do you make it?

No witch hunt at all. It just happens to be the case that x86 hits *sp
when it stores the return pointer and that ppc always stores the
backchain into *sp when it allocates additional stack space. As a
result, on those targets we know the offset between the stack pointer
and the most recent probe is zero at the start of the callee's prologue.
That allows us to avoid the vast majority of explicit probes.

aarch64's ABI and ISA don't provide us with any such guarantees and we
have to make very conservative assumptions, which lead to much more
explicit probing.

s390 is in a similar situation to aarch64 in that it has to make worst
case assumptions at prologue entry in the callee. I suspect many others
will be too (but I haven't investigated each architecture as I've been
focused strictly on RHEL targets).
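To make the "offset to the most recent probe" argument concrete, here is
a minimal sketch in C. It is a simplified model, not GCC's actual
prologue logic: the function name explicit_probes, its parameter names,
and the 4096-byte probe interval are all assumptions made for
illustration. It counts how many explicit probes a callee would need for
a given frame size, depending on how far the stack pointer may already
be from the last address known to have been written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe interval; real guard sizes are OS-dependent. */
#define PROBE_INTERVAL 4096u

/*
 * Simplified model (an assumption, not GCC's algorithm): a probe is
 * needed each time the span of untouched stack below the last known
 * write would reach a full probe interval.
 *
 * entry_offset: worst-case distance between the stack pointer and the
 *               most recent write at the start of the callee's prologue
 * frame_size:   number of bytes the callee allocates
 * interval:     maximum distance allowed between two touched addresses
 */
static size_t explicit_probes(size_t entry_offset, size_t frame_size,
                              size_t interval)
{
    assert(interval > 0);
    return (entry_offset + frame_size) / interval;
}
```

In this model a 256-byte frame needs no probe when the entry offset is
known to be zero (the x86 and ppc case), but needs one when the callee
has to assume the caller left almost a full interval untouched, as in
explicit_probes(PROBE_INTERVAL - 1, 256, PROBE_INTERVAL).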
>
> So we can design a half-way house probing scheme which doesn't really
> solve the problem and is perhaps so expensive that most people will turn
> it off. Or we can design something that addresses the problem
> scientifically if applied everywhere and has almost zero impact on the
> code size or performance. The latter would probably be so cheap that
> most people would never notice that it was even on at all. Yes, you'd
> need a system recompile to deploy it in full, but even a fairly limited
> rebuild of critical libraries (libc, libstdc++) would help.
>
> Whichever route we take, this wouldn't be an ABI break. New code will
> still interoperate with old code; you just don't get full heap
> protection if you mix old and new.

As the port maintainers, I think you have significant say in how this
plays out. You could (for example) say you simply aren't going to worry
about cases where there's more than 1k of outgoing argument space, or
where the caller has a large alloca and wasn't compiled with
-fstack-check, and set the initial offset to 1k. That would provide a
great deal of coverage, but still leaves folks using the aarch64 port
vulnerable in certain corner cases.

Red Hat would have to look at that carefully and decide to either leave
customers vulnerable to the corner cases or pay the penalty of getting
full protection. I honestly do not know where we'd land on that
question, but we'd certainly have to evaluate the pros and cons.

Or you could go with something like PPC64 and mandate something in the
ABI. For example, you could mandate that the last word of dynamically
allocated stack space must be written to and that the last word of the
outgoing args space must be written to if the outgoing args space is > 1k.
Note this would remain compatible with existing code since you're just
forcing the caller to write into those areas at allocation time and the
onus is on the uncommon code (allocas and really big outgoing args) to
pay a tiny penalty and allow the common code (small frames in callee) to
run fast.

I suspect doing something like that would probably eliminate more than
90% of the explicit probes in glibc with minimal cost. Let's face it, a
single write into alloca space is cheap compared to just the setup for
alloca and a write into the outgoing args for >1k outgoing args isn't
likely to happen enough to ever matter.

And something like that is 100% compatible with existing code. You can
mix and match freely. Obviously non-compliant code has a vulnerability,
but as compliant compilers got into the wild the amount of non-compliant
code would quickly drop.

Jeff
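
The caller-side rule proposed above can be sketched as a small predicate
in C. This only illustrates the rule as worded in this message; it is
not an ABI text or a GCC implementation. The enum, the function name
caller_must_touch, and the constant name are hypothetical, and treating
a zero-sized dynamic allocation as needing no write is an added
assumption. The 1k threshold is the one suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Threshold from the proposal: outgoing argument areas larger than 1k. */
#define OUTGOING_ARGS_THRESHOLD 1024u

/* Hypothetical classification of the stack areas the rule talks about. */
enum stack_area { AREA_DYNAMIC, AREA_OUTGOING_ARGS };

/*
 * Returns true when, under the sketched rule, the caller would have to
 * write to the last word of the area at allocation time.
 */
static bool caller_must_touch(enum stack_area kind, size_t size)
{
    switch (kind) {
    case AREA_DYNAMIC:
        /* Assumption: an empty allocation has no word to write. */
        return size > 0;
    case AREA_OUTGOING_ARGS:
        return size > OUTGOING_ARGS_THRESHOLD;
    }
    assert(0 && "unknown stack area kind");
    return false;
}
```

In this sketch, fixed frames and small outgoing argument areas pay
nothing; only alloca-style allocations and argument areas above the
threshold incur the single extra write.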