From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 115369 invoked by alias); 19 Jun 2017 22:08:57 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 115358 invoked by uid 89); 19 Jun 2017 22:08:56 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_HELO_PASS,T_RP_MATCHES_RCVD autolearn=ham version=3.3.2 spammy=
X-HELO: mx1.redhat.com
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 19 Jun 2017 22:08:46 +0000
Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E9904C04B31F; Mon, 19 Jun 2017 22:08:49 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com E9904C04B31F
Authentication-Results: ext-mx07.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx07.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=law@redhat.com
DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com E9904C04B31F
Received: from localhost.localdomain (ovpn-116-41.phx2.redhat.com [10.3.116.41]) by smtp.corp.redhat.com (Postfix) with ESMTP id 2B7E717134; Mon, 19 Jun 2017 22:08:47 +0000 (UTC)
Subject: Re: RFC: stack/heap collision vulnerability and mitigation with GCC
To: Richard Biener, Jakub Jelinek, Eric Botcazou
Cc: gcc-patches
References: <20170619172932.GV2123@tucnak> <759F8732-F3ED-4778-9CD6-9A4DF1015D44@gmail.com> <3FD871AF-91A5-4C77-B5CF-A1E66C02E486@gmail.com>
From: Jeff Law
Message-ID: <930b90bc-3977-5ce4-0a38-32c0b31ff072@redhat.com>
Date: Mon, 19 Jun 2017 22:08:00 -0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.1.0
MIME-Version: 1.0
In-Reply-To: <3FD871AF-91A5-4C77-B5CF-A1E66C02E486@gmail.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
X-SW-Source: 2017-06/txt/msg01383.txt.bz2

On 06/19/2017 12:02 PM, Richard Biener wrote:
> On June 19, 2017 8:00:19 PM GMT+02:00, Richard Biener wrote:
>> On June 19, 2017 7:29:32 PM GMT+02:00, Jakub Jelinek wrote:
>>> On Mon, Jun 19, 2017 at 11:07:06AM -0600, Jeff Law wrote:
>>>> After much poking around I concluded that we really need to
>>>> implement allocation and probing via a "moving sp" strategy.
>>>> Probing into unallocated areas runs afoul of valgrind, so that's
>>>> a non-starter.
>>>>
>>>> Allocating stack space, then probing the pages within the space
>>>> is vulnerable to async signal delivery between the allocation
>>>> point and the probe point. If that occurs the signal handler
>>>> could end up running on a stack that has collided with the heap.
>>>>
>>>> Ideally we would allocate and probe a page as an atomic unit
>>>> (which is feasible on PPC). Alternatively, due to ISA
>>>> restrictions, allocate a page, then probe the page as distinct
>>>> instructions. The latter still has a race, but we'd have to take
>>>> the async signal in a single instruction window.
>>>
>>> And if the allocation is only a page at a time, the single insn
>>> race window can be mitigated in the kernel (probe (read-only is
>>> fine) the word at the stack when setting up a signal frame for
>>> async signal).
>>>
>>>> So, time to open the discussion to questions & comments.
>>>>
>>>> I've got patches I need to cleanup and post for comments that
>>>> implement this for x86, ppc, aarch64 and s390. x86 and ppc are
>>>> IMHO in good shape. There's an unhandled case for s390. I've got
>>>> evaluation still to do on aarch64.
>>>
>>> In the patches Jeff is going to post, we have (at least for
>>> -fasynchronous-unwind-tables which is on by default on e.g. x86)
>>> precise unwind info even with the new stack check mode.
>>> ira.c currently has:
>>>        /* We need the frame pointer to catch stack overflow exceptions if
>>>           the stack pointer is moving (as for the alloca case just above).  */
>>>        || (STACK_CHECK_MOVING_SP
>>>            && flag_stack_check
>>>            && flag_exceptions
>>>            && cfun->can_throw_non_call_exceptions)
>>> For alloca we have a frame pointer for other reasons, the question
>>> is if we really need this hunk even if we provided proper unwind
>>> info even for the Ada -fstack-check mode. Or, if we provide proper
>>> unwind info for -fasynchronous-unwind-tables, if the above could
>>> not be also && !flag_asynchronous_unwind_tables. Eric, what exactly
>>> is the reason for the above, is it just lack of proper CFI notes,
>>> or something different?
>>>
>>> Also, on i?86 orq $0, (%rsp) or orl $0, (%esp) is used to probe
>>> stack, while it is shorter, is it actually faster or as slow as
>>> movq $0, (%rsp) or movl $0, (%esp) ?
>>
>> It at least has the chance of bypassing all of the store queue in
>> CPUs and thus cause no cacheline allocation or trigger prefetching.
>>
>> Not sure if any of that is done though.
>>
>> Performance counters might tell.
>>
>> Otherwise incrementing SP by 4095 and then pushing al would work as
>> well (and be similarly short as the or).
>
> Oh, and using push intelligently with first bumping to
> SP & 4096-1 + 4095 would solve the signal atomicity as well. Might be
> larger and somewhat interfere with CPUs stack engine. Who knows...
Happy to rely on Honza or Uros for guidance on that. Though we do have
to maintain proper stack alignment, right?

jeff
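
For readers following the thread, the "moving sp" strategy quoted above can be modelled in a few lines of C. This is only a sketch under stated assumptions, not code from the patches: PROBE_INTERVAL (4096), the probe_result struct and alloc_moving_sp are hypothetical names, the stack pointer is simulated as a plain integer, and leaving the final sub-page residual unprobed is an assumption of the model. It shows that when allocation and probing alternate one page at a time, the stack pointer is never more than one page beyond the last probed address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROBE_INTERVAL 4096u   /* assumed page size / probe interval */

/* Hypothetical bookkeeping for the simulation. */
typedef struct {
    uintptr_t sp;         /* simulated stack pointer after the allocation */
    size_t probes;        /* number of probes issued */
    size_t max_unprobed;  /* largest distance sp ever got below the last probe */
} probe_result;

/* Model of "allocate a page, then probe it" with a moving stack pointer.
   Assumption: the word at the incoming sp has already been touched. */
static probe_result alloc_moving_sp(uintptr_t sp, size_t size)
{
    probe_result r = { sp, 0, 0 };
    uintptr_t last_probe = sp;

    assert(sp >= size);   /* the simulated stack must not wrap around */

    while (size >= PROBE_INTERVAL) {
        r.sp -= PROBE_INTERVAL;               /* allocate one page */
        if ((size_t)(last_probe - r.sp) > r.max_unprobed)
            r.max_unprobed = (size_t)(last_probe - r.sp);
        last_probe = r.sp;                    /* probe, e.g. orq $0, (%rsp) */
        r.probes++;
        size -= PROBE_INTERVAL;
    }

    r.sp -= size;                             /* residual, less than one page */
    if ((size_t)(last_probe - r.sp) > r.max_unprobed)
        r.max_unprobed = (size_t)(last_probe - r.sp);
    return r;
}
```

In this model, alloc_moving_sp(0x100000, 10000) issues two probes and max_unprobed never exceeds PROBE_INTERVAL. An async signal can therefore only arrive with at most one unprobed page below the last touched address, which is the single-instruction window discussed above.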