From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/6)]
From: Jeff Law
To: Tamar Christina
Cc: "gcc-patches@gcc.gnu.org", nd, James Greenhalgh, Richard Earnshaw, Marcus Shawcroft
Date: Fri, 13 Jul 2018 16:46:00 -0000
Message-ID: <4dcce101-0b0c-c2df-414a-94880dbeb8b2@redhat.com>
In-Reply-To: <20180712173958.GA24275@arm.com>
References: <20180711112037.GA15738@arm.com> <0321b0d1-b768-87eb-3573-70547a1c3595@redhat.com> <20180712173958.GA24275@arm.com>
X-SW-Source: 2018-07/txt/msg00736.txt.bz2

On 07/12/2018 11:39 AM, Tamar Christina wrote:
>>> +
>>> +  /* Round size to the nearest multiple of guard_size, and calculate the
>>> +     residual as the difference between the original size and the rounded
>>> +     size.  */
>>> +  HOST_WIDE_INT rounded_size = size & -guard_size;
>>> +  HOST_WIDE_INT residual = size - rounded_size;
>>> +
>>> +  /* We can handle a small number of allocations/probes inline.  Otherwise
>>> +     punt to a loop.  */
>>> +  if (rounded_size <= STACK_CLASH_MAX_UNROLL_PAGES * guard_size)
>>> +    {
>>> +      for (HOST_WIDE_INT i = 0; i < rounded_size; i += guard_size)
>>> +        {
>>> +          aarch64_sub_sp (NULL, temp2, guard_size, true);
>>> +          emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
>>> +                                           STACK_CLASH_CALLER_GUARD));
>>> +        }
>> So the only concern with this code is that it'll be inefficient and
>> possibly incorrect for probe sizes larger than ARITH_FACTOR.
>> Ultimately, that's a case I don't think we really care that much about.
>> I wouldn't lose sleep if the port clamped the requested probing interval
>> at ARITH_FACTOR.
>>
> I'm a bit confused here; ARITH_FACTOR seems to have to do with the Ada
> stack probing implementation, which isn't used by this new code aside
> from the part that emits the actual probe when doing a variable or large
> allocation in aarch64_output_probe_stack_range.
>
> Clamping the probing interval at ARITH_FACTOR means we can't do 64KB
> probing intervals.

It may have been a misunderstanding on my part.  My understanding is that
ARITH_FACTOR represents the largest immediate constant we could handle in
this code using a single insn.  Anything above ARITH_FACTOR needed a
scratch register, and I couldn't convince myself that we had a suitable
scratch register available.

But I'm really not well versed on the aarch64 architecture or the various
implementation details in aarch64.c, so I'm happy to defer to you and
others @ ARM on what's best to do here.

>> That can be problematic in a couple of cases.  First, it can run afoul
>> of combine-stack-adjustments.  Essentially that pass will combine a
>> series of stack adjustments into a single insn, so your unrolled probing
>> turns into something like this:
>>
>>   multi-page stack adjust
>>   probe first page
>>   probe second page
>>   probe third page
>>   and so on...
>>
> This is something we actually do want, particularly in the case of 4KB
> pages, as the immediates would fit in the store.  It's one of the things
> we were thinking about for future improvements.
>
>> That violates the design constraint that we never allocate more than a
>> page at a time.
> Would there be a big issue here if we didn't adhere to this constraint?

Yes, because it enlarges the window for exploitation.  Consider signal
delivery occurring after the adjustment but before the probes.  The signal
handler could then be running on a clashed heap/stack.

>
>> Do you happen to have a libc.so and ld.so compiled with this code?  I've
>> got a scanner here which will look at a DSO and flag obviously invalid
>> stack allocations.
>> If you've got a suitable libc.so and ld.so, I can run them through the
>> scanner.
>>
>> Along similar lines, have you run the glibc testsuite with this stuff
>> enabled by default?  That proved useful to find codegen issues,
>> particularly with the register inheritance in the epilogue.
>>
> I'm running one now; I'll send the two binaries once I get the results
> back from the run.  Thanks!

Great.  Looking forward to getting those .so files so I can throw them
into the scanner.

>>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>>> index 830f97603487d6ed07c4dc854a7898c4d17894d1..d1ed54c7160ab682c78d5950947608244d293025 100644
>>> --- a/gcc/config/aarch64/aarch64.md
>>> +++ b/gcc/config/aarch64/aarch64.md
>>> @@ -3072,7 +3072,7 @@
>>>
>>>  (define_insn "cmp<mode>"
>>>    [(set (reg:CC CC_REGNUM)
>>> -        (compare:CC (match_operand:GPI 0 "register_operand" "r,r,r")
>>> +        (compare:CC (match_operand:GPI 0 "register_operand" "rk,r,r")
>>>                      (match_operand:GPI 1 "aarch64_plus_operand" "r,I,J")))]
>>>    ""
>>>    "@
>> I don't think I needed this, but I can certainly understand how it might
>> be necessary.  The only one I know we needed was the probe_stack_range
>> constraint change.
>>
> It's not strictly needed, but it allows us to do the comparison with the
> stack pointer in the loop without needing to emit sp to a temporary first.
> So it's just a tiny optimization. :)

Understood.  A similar tweak for cmp was done in a patch from Richard E in
his patches for spectre v1 mitigation.  I'm certainly not objecting :-)

Jeff
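[Editorial note: the point of the "rk" constraint change can be made concrete. The instruction sequences below are a hand-written sketch, not compiler output, and the register numbers are arbitrary: with operand 0 constrained only as "r", the stack pointer is not a valid input to the cmp pattern and must be copied to a scratch register first; with "rk" (the "k" constraint matches sp on AArch64), the comparison can name sp directly.]

```
	/* Sketch only -- not actual compiler output.  x10/x11 arbitrary.  */

	/* Operand 0 constrained "r": sp must be moved to a scratch first.  */
	mov	x10, sp
	cmp	x10, x11	// compare the copy of sp against the loop limit
	b.ne	.Lloop

	/* Operand 0 constrained "rk": sp is accepted directly.  */
	cmp	sp, x11		// one insn, no scratch register needed
	b.ne	.Lloop
```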