public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* RE: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)]
@ 2018-08-28 12:18 Tamar Christina
  2018-08-28 20:58 ` Richard Sandiford
  0 siblings, 1 reply; 8+ messages in thread
From: Tamar Christina @ 2018-08-28 12:18 UTC (permalink / raw)
  To: Tamar Christina, Jeff Law
  Cc: gcc-patches, nd, James Greenhalgh, Richard Earnshaw, Marcus Shawcroft

[-- Attachment #1: Type: text/plain, Size: 7620 bytes --]

Hi All,

As requested, this patch series now contains basic SVE support; following that,
I am updating this patch to remove the errors/warnings generated when SVE is used.

The series now consists of 7 patches but I will only send updates for those that changed.


Ok for trunk?

Thanks,
Tamar

gcc/
2018-08-28  Jeff Law  <law@redhat.com>
	    Richard Sandiford <richard.sandiford@linaro.org>
	    Tamar Christina  <tamar.christina@arm.com>

	PR target/86486
	* config/aarch64/aarch64.md
	(probe_stack_range): Add k (SP) constraint.
	* config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
	STACK_CLASH_MAX_UNROLL_PAGES): New.
	* config/aarch64/aarch64.c (aarch64_output_probe_stack_range): Emit
	stack probes for stack clash.
	(aarch64_allocate_and_probe_stack_space): New.
	(aarch64_expand_prologue): Use it.
	(aarch64_expand_epilogue): Likewise and update IP regs re-use criteria.
	(aarch64_sub_sp): Add emit_move_imm optional param.

gcc/testsuite/
2018-08-28  Jeff Law  <law@redhat.com>
	    Richard Sandiford <richard.sandiford@linaro.org>
	    Tamar Christina  <tamar.christina@arm.com>

	PR target/86486
	* gcc.target/aarch64/stack-check-12.c: New.
	* gcc.target/aarch64/stack-check-13.c: New.
	* gcc.target/aarch64/stack-check-cfa-1.c: New.
	* gcc.target/aarch64/stack-check-cfa-2.c: New.
	* gcc.target/aarch64/stack-check-prologue-1.c: New.
	* gcc.target/aarch64/stack-check-prologue-10.c: New.
	* gcc.target/aarch64/stack-check-prologue-11.c: New.
	* gcc.target/aarch64/stack-check-prologue-2.c: New.
	* gcc.target/aarch64/stack-check-prologue-3.c: New.
	* gcc.target/aarch64/stack-check-prologue-4.c: New.
	* gcc.target/aarch64/stack-check-prologue-5.c: New.
	* gcc.target/aarch64/stack-check-prologue-6.c: New.
	* gcc.target/aarch64/stack-check-prologue-7.c: New.
	* gcc.target/aarch64/stack-check-prologue-8.c: New.
	* gcc.target/aarch64/stack-check-prologue-9.c: New.
	* gcc.target/aarch64/stack-check-prologue.h: New.
	* lib/target-supports.exp
	(check_effective_target_supports_stack_clash_protection): Add AArch64.


> -----Original Message-----
> From: gcc-patches-owner@gcc.gnu.org <gcc-patches-owner@gcc.gnu.org>
> On Behalf Of Tamar Christina
> Sent: Tuesday, August 7, 2018 11:10
> To: Jeff Law <law@redhat.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; James Greenhalgh
> <James.Greenhalgh@arm.com>; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>
> Subject: RE: [PATCH][GCC][AArch64] Updated stack-clash implementation
> supporting 64k probes. [patch (1/6)]
> 
> Hi All,
> 
> This is a re-spin of the patch to address review comments.
> It mostly just adds more comments and corrects typos.
> 
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> gcc/
> 2018-08-07  Jeff Law  <law@redhat.com>
> 	    Richard Sandiford <richard.sandiford@linaro.org>
> 	    Tamar Christina  <tamar.christina@arm.com>
> 
> 	PR target/86486
> 	* config/aarch64/aarch64.md (cmp<mode>,
> 	probe_stack_range): Add k (SP) constraint.
> 	* config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
> 	STACK_CLASH_MAX_UNROLL_PAGES): New.
> 	* config/aarch64/aarch64.c (aarch64_output_probe_stack_range):
> Emit
> 	stack probes for stack clash.
> 	(aarch64_allocate_and_probe_stack_space): New.
> 	(aarch64_expand_prologue): Use it.
> 	(aarch64_expand_epilogue): Likewise and update IP regs re-use
> criteria.
> 	(aarch64_sub_sp): Add emit_move_imm optional param.
> 
> gcc/testsuite/
> 2018-08-07  Jeff Law  <law@redhat.com>
> 	    Richard Sandiford <richard.sandiford@linaro.org>
> 	    Tamar Christina  <tamar.christina@arm.com>
> 
> 	PR target/86486
> 	* gcc.target/aarch64/stack-check-12.c: New.
> 	* gcc.target/aarch64/stack-check-13.c: New.
> 	* gcc.target/aarch64/stack-check-cfa-1.c: New.
> 	* gcc.target/aarch64/stack-check-cfa-2.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-1.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-10.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-11.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-2.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-3.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-4.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-5.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-6.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-7.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-8.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-9.c: New.
> 	* gcc.target/aarch64/stack-check-prologue.h: New.
> 	* lib/target-supports.exp
> 	(check_effective_target_supports_stack_clash_protection): Add
> AArch64.
> 
> > -----Original Message-----
> > From: Jeff Law <law@redhat.com>
> > Sent: Friday, August 3, 2018 19:05
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; James Greenhalgh
> > <James.Greenhalgh@arm.com>; Richard Earnshaw
> > <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> > <Marcus.Shawcroft@arm.com>
> > Subject: Re: [PATCH][GCC][AArch64] Updated stack-clash implementation
> > supporting 64k probes. [patch (1/6)]
> >
> > On 07/25/2018 05:09 AM, Tamar Christina wrote:
> > > Hi All,
> > >
> > > Attached is an updated patch that clarifies some of the comments in
> > > the patch and adds comments to the individual testcases as requested.
> > >
> > > Ok for trunk?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/
> > > 2018-07-25  Jeff Law  <law@redhat.com>
> > > 	    Richard Sandiford <richard.sandiford@linaro.org>
> > > 	    Tamar Christina  <tamar.christina@arm.com>
> > >
> > > 	PR target/86486
> > > 	* config/aarch64/aarch64.md (cmp<mode>,
> > > 	probe_stack_range): Add k (SP) constraint.
> > > 	* config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
> > > 	STACK_CLASH_MAX_UNROLL_PAGES): New.
> > > 	* config/aarch64/aarch64.c (aarch64_output_probe_stack_range):
> > Emit
> > > 	stack probes for stack clash.
> > > 	(aarch64_allocate_and_probe_stack_space): New.
> > > 	(aarch64_expand_prologue): Use it.
> > > 	(aarch64_expand_epilogue): Likewise and update IP regs re-use
> > criteria.
> > > 	(aarch64_sub_sp): Add emit_move_imm optional param.
> > >
> > > gcc/testsuite/
> > > 2018-07-25  Jeff Law  <law@redhat.com>
> > > 	    Richard Sandiford <richard.sandiford@linaro.org>
> > > 	    Tamar Christina  <tamar.christina@arm.com>
> > >
> > > 	PR target/86486
> > > 	* gcc.target/aarch64/stack-check-12.c: New.
> > > 	* gcc.target/aarch64/stack-check-13.c: New.
> > > 	* gcc.target/aarch64/stack-check-cfa-1.c: New.
> > > 	* gcc.target/aarch64/stack-check-cfa-2.c: New.
> > > 	* gcc.target/aarch64/stack-check-prologue-1.c: New.
> > > 	* gcc.target/aarch64/stack-check-prologue-10.c: New.
> > > 	* gcc.target/aarch64/stack-check-prologue-11.c: New.
> > > 	* gcc.target/aarch64/stack-check-prologue-2.c: New.
> > > 	* gcc.target/aarch64/stack-check-prologue-3.c: New.
> > > 	* gcc.target/aarch64/stack-check-prologue-4.c: New.
> > > 	* gcc.target/aarch64/stack-check-prologue-5.c: New.
> > > 	* gcc.target/aarch64/stack-check-prologue-6.c: New.
> > > 	* gcc.target/aarch64/stack-check-prologue-7.c: New.
> > > 	* gcc.target/aarch64/stack-check-prologue-8.c: New.
> > > 	* gcc.target/aarch64/stack-check-prologue-9.c: New.
> > > 	* gcc.target/aarch64/stack-check-prologue.h: New.
> > > 	* lib/target-supports.exp
> > > 	(check_effective_target_supports_stack_clash_protection): Add
> > AArch64.
> > OK on my end.  AArch64 maintainers have the final say since this is
> > all
> > AArch64 specific bits.
> >
> > jeff

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: rb9150.patch --]
[-- Type: text/x-diff; name="rb9150.patch", Size: 30762 bytes --]

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index c1218503bab19323eee1cca8b7e4bea8fbfcf573..bfb6d92f5e665b14926514b864489cb3f2793336 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -84,6 +84,14 @@
 
 #define LONG_DOUBLE_TYPE_SIZE	128
 
+/* This value is the number of bytes a caller is allowed to drop the stack
+   before probing has to be done for stack clash protection.  */
+#define STACK_CLASH_CALLER_GUARD 1024
+
+/* This value controls how many pages we manually unroll the loop for when
+   generating stack clash probes.  */
+#define STACK_CLASH_MAX_UNROLL_PAGES 4
+
 /* The architecture reserves all bits of the address for hardware use,
    so the vbit must go into the delta field of pointers to member
    functions.  This is the same config as that in the AArch32
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1e8d8104c066a265120ab776f7ab5a959d3512b6..06451f38b11822ea77323438fe8c7e373eb9e614 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2769,10 +2769,11 @@ aarch64_add_sp (rtx temp1, rtx temp2, poly_int64 delta, bool emit_move_imm)
    if nonnull.  */
 
 static inline void
-aarch64_sub_sp (rtx temp1, rtx temp2, poly_int64 delta, bool frame_related_p)
+aarch64_sub_sp (rtx temp1, rtx temp2, poly_int64 delta, bool frame_related_p,
+		bool emit_move_imm = true)
 {
   aarch64_add_offset (Pmode, stack_pointer_rtx, stack_pointer_rtx, -delta,
-		      temp1, temp2, frame_related_p);
+		      temp1, temp2, frame_related_p, emit_move_imm);
 }
 
 /* Set DEST to (vec_series BASE STEP).  */
@@ -3791,10 +3792,7 @@ aarch64_emit_probe_stack_range (HOST_WIDE_INT first, poly_int64 poly_size)
 {
   HOST_WIDE_INT size;
   if (!poly_size.is_constant (&size))
-    {
-      sorry ("stack probes for SVE frames");
       return;
-    }
 
   rtx reg1 = gen_rtx_REG (Pmode, PROBE_STACK_FIRST_REG);
 
@@ -3932,13 +3930,33 @@ aarch64_output_probe_stack_range (rtx reg1, rtx reg2)
   /* Loop.  */
   ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, loop_lab);
 
+  HOST_WIDE_INT stack_clash_probe_interval
+    = 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
+
   /* TEST_ADDR = TEST_ADDR + PROBE_INTERVAL.  */
   xops[0] = reg1;
-  xops[1] = GEN_INT (PROBE_INTERVAL);
+  HOST_WIDE_INT interval;
+  if (flag_stack_clash_protection)
+    interval = stack_clash_probe_interval;
+  else
+    interval = PROBE_INTERVAL;
+
+  gcc_assert (aarch64_uimm12_shift (interval));
+  xops[1] = GEN_INT (interval);
+
   output_asm_insn ("sub\t%0, %0, %1", xops);
 
-  /* Probe at TEST_ADDR.  */
-  output_asm_insn ("str\txzr, [%0]", xops);
+  /* If doing stack clash protection then we probe up by the ABI specified
+     amount.  We do this because we're dropping full pages at a time in the
+     loop.  But if we're doing non-stack clash probing, probe at SP 0.  */
+  if (flag_stack_clash_protection)
+    xops[1] = GEN_INT (STACK_CLASH_CALLER_GUARD);
+  else
+    xops[1] = CONST0_RTX (GET_MODE (xops[1]));
+
+  /* Probe at TEST_ADDR.  If we're inside the loop it is always safe to probe
+     by this amount for each iteration.  */
+  output_asm_insn ("str\txzr, [%0, %1]", xops);
 
   /* Test if TEST_ADDR == LAST_ADDR.  */
   xops[1] = reg2;
@@ -4752,6 +4770,175 @@ aarch64_set_handled_components (sbitmap components)
       cfun->machine->reg_is_wrapped_separately[regno] = true;
 }
 
+/* Allocate POLY_SIZE bytes of stack space using TEMP1 and TEMP2 as scratch
+   registers.  If POLY_SIZE is not large enough to require a probe this function
+   will only adjust the stack.  When allocating the stack space
+   FRAME_RELATED_P is then used to indicate if the allocation is frame related.
+   FINAL_ADJUSTMENT_P indicates whether we are allocating the outgoing
+   arguments.  If we are then we ensure that any allocation larger than the ABI
+   defined buffer needs a probe so that the invariant of having a 1KB buffer is
+   maintained.
+
+   We emit barriers after each stack adjustment to prevent optimizations from
+   breaking the invariant that we never drop the stack more than a page.  This
+   invariant is needed to make it easier to correctly handle asynchronous
+   events: e.g. if we were to allow the stack to be dropped by more than a page
+   and then probe multiple pages upwards, and a signal were taken somewhere in
+   between, then the signal handler would not know the state of the stack and
+   could make no assumptions about which pages have been probed.  */
+
+static void
+aarch64_allocate_and_probe_stack_space (rtx temp1, rtx temp2,
+					poly_int64 poly_size,
+					bool frame_related_p,
+					bool final_adjustment_p)
+{
+  HOST_WIDE_INT guard_size
+    = 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
+  HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
+  /* When doing the final adjustment for the outgoing argument size we can't
+     assume that LR was saved at position 0.  So subtract its offset from the
+     ABI safe buffer so that we don't accidentally allow an adjustment that
+     would result in an allocation larger than the ABI buffer without
+     probing.  */
+  HOST_WIDE_INT min_probe_threshold
+    = final_adjustment_p
+      ? guard_used_by_caller - cfun->machine->frame.reg_offset[LR_REGNUM]
+      : guard_size - guard_used_by_caller;
+  poly_int64 frame_size = cfun->machine->frame.frame_size;
+
+  /* We should always have a positive probe threshold.  */
+  gcc_assert (min_probe_threshold > 0);
+
+  if (flag_stack_clash_protection && !final_adjustment_p)
+    {
+      poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
+      poly_int64 final_adjust = cfun->machine->frame.final_adjust;
+
+      if (known_eq (frame_size, 0))
+	{
+	  dump_stack_clash_frame_info (NO_PROBE_NO_FRAME, false);
+	}
+      else if (known_lt (initial_adjust, guard_size - guard_used_by_caller)
+	       && known_lt (final_adjust, guard_used_by_caller))
+	{
+	  dump_stack_clash_frame_info (NO_PROBE_SMALL_FRAME, true);
+	}
+    }
+
+  HOST_WIDE_INT size;
+  /* If SIZE is not large enough to require probing, just adjust the stack and
+     exit.  */
+  if (!poly_size.is_constant (&size)
+      || known_lt (poly_size, min_probe_threshold)
+      || !flag_stack_clash_protection)
+    {
+      aarch64_sub_sp (temp1, temp2, poly_size, frame_related_p);
+      return;
+    }
+
+  if (dump_file)
+    fprintf (dump_file,
+	     "Stack clash AArch64 prologue: " HOST_WIDE_INT_PRINT_DEC " bytes"
+	     ", probing will be required.\n", size);
+
+  /* Round size to the nearest multiple of guard_size, and calculate the
+     residual as the difference between the original size and the rounded
+     size.  */
+  HOST_WIDE_INT rounded_size = ROUND_DOWN (size, guard_size);
+  HOST_WIDE_INT residual = size - rounded_size;
+
+  /* We can handle a small number of allocations/probes inline.  Otherwise
+     punt to a loop.  */
+  if (rounded_size <= STACK_CLASH_MAX_UNROLL_PAGES * guard_size)
+    {
+      for (HOST_WIDE_INT i = 0; i < rounded_size; i += guard_size)
+	{
+	  aarch64_sub_sp (NULL, temp2, guard_size, true);
+	  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
+					   STACK_CLASH_CALLER_GUARD));
+	  emit_insn (gen_blockage ());
+	}
+      dump_stack_clash_frame_info (PROBE_INLINE, size != rounded_size);
+    }
+  else
+    {
+      /* Compute the ending address.  */
+      aarch64_add_offset (Pmode, temp1, stack_pointer_rtx, -rounded_size,
+			  temp1, NULL, false, true);
+      rtx_insn *insn = get_last_insn ();
+
+      /* For the initial allocation, we don't have a frame pointer
+	 set up, so we always need CFI notes.  If we're doing the
+	 final allocation, then we may have a frame pointer, in which
+	 case it is the CFA, otherwise we need CFI notes.
+
+	 We can determine which allocation we are doing by looking at
+	 the value of FRAME_RELATED_P since the final allocations are not
+	 frame related.  */
+      if (frame_related_p)
+	{
+	  /* We want the CFA independent of the stack pointer for the
+	     duration of the loop.  */
+	  add_reg_note (insn, REG_CFA_DEF_CFA,
+			plus_constant (Pmode, temp1, rounded_size));
+	  RTX_FRAME_RELATED_P (insn) = 1;
+	}
+
+      /* This allocates and probes the stack.  Note that this re-uses some of
+	 the existing Ada stack protection code.  However we are guaranteed not
+	 to enter the non-loop or residual branches of that code.
+
+	 The non-loop part won't be entered because if our allocation amount
+	 doesn't require a loop, the case above would handle it.
+
+	 The residual amount won't be entered because TEMP1 is a multiple of
+	 the allocation size.  The residual will always be 0.  As such, the only
+	 part we are actually using from that code is the loop setup.  The
+	 actual probing is done in aarch64_output_probe_stack_range.  */
+      insn = emit_insn (gen_probe_stack_range (stack_pointer_rtx,
+					       stack_pointer_rtx, temp1));
+
+      /* Now reset the CFA register if needed.  */
+      if (frame_related_p)
+	{
+	  add_reg_note (insn, REG_CFA_DEF_CFA,
+			plus_constant (Pmode, stack_pointer_rtx, rounded_size));
+	  RTX_FRAME_RELATED_P (insn) = 1;
+	}
+
+      emit_insn (gen_blockage ());
+      dump_stack_clash_frame_info (PROBE_LOOP, size != rounded_size);
+    }
+
+  /* Handle any residuals.  Residuals of at least min_probe_threshold have to
+     be probed.  This maintains the requirement that each page is probed at
+     least once.  For initial probing we probe only if the allocation is
+     more than guard_size - buffer, and for the outgoing arguments we probe
+     if the amount is larger than buffer.  guard_size - buffer + buffer ==
+     guard_size.  This means that for any allocation large enough to trigger
+     a probe here, we'll emit at least one probe; for any allocation too small
+     for this code to emit anything, the page would already have been probed
+     by the saving of FP/LR, either by this function or by any callees.  If
+     we don't have any callees then we won't have more stack adjustments and so
+     are still safe.  */
+  if (residual)
+    {
+      aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
+      if (residual >= min_probe_threshold)
+	{
+	  if (dump_file)
+	    fprintf (dump_file,
+		     "Stack clash AArch64 prologue residuals: "
+		     HOST_WIDE_INT_PRINT_DEC " bytes, probing will be required."
+		     "\n", residual);
+	  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
+					   STACK_CLASH_CALLER_GUARD));
+	  emit_insn (gen_blockage ());
+	}
+    }
+}
+
 /* Add a REG_CFA_EXPRESSION note to INSN to say that register REG
    is saved at BASE + OFFSET.  */
 
@@ -4779,7 +4966,7 @@ aarch64_add_cfa_expression (rtx_insn *insn, unsigned int reg,
 	|  local variables              | <-- frame_pointer_rtx
 	|                               |
 	+-------------------------------+
-	|  padding0                     | \
+	|  padding                      | \
 	+-------------------------------+  |
 	|  callee-saved registers       |  | frame.saved_regs_size
 	+-------------------------------+  |
@@ -4798,7 +4985,23 @@ aarch64_add_cfa_expression (rtx_insn *insn, unsigned int reg,
 
    Dynamic stack allocations via alloca() decrease stack_pointer_rtx
    but leave frame_pointer_rtx and hard_frame_pointer_rtx
-   unchanged.  */
+   unchanged.
+
+   By default for stack-clash we assume the guard is 64KB, but this value is
+   configurable to either 4KB or 64KB.  We also force the guard size to
+   be the same as the probing interval and both values are kept in sync.
+
+   With those assumptions the callee can allocate up to 63KB (or 3KB depending
+   on the guard size) of stack space without probing.
+
+   When probing is needed, we emit a probe at the start of the prologue
+   and every PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE bytes thereafter.
+
+   We have to track how much space has been allocated; the only stores
+   to the stack we track as implicit probes are the FP/LR stores.
+
+   For outgoing arguments we probe if the size is larger than 1KB, such that
+   the ABI specified buffer is maintained for the next callee.  */
 
 /* Generate the prologue instructions for entry into a function.
    Establish the stack frame by decreasing the stack pointer with a
@@ -4849,7 +5052,16 @@ aarch64_expand_prologue (void)
   rtx ip0_rtx = gen_rtx_REG (Pmode, IP0_REGNUM);
   rtx ip1_rtx = gen_rtx_REG (Pmode, IP1_REGNUM);
 
-  aarch64_sub_sp (ip0_rtx, ip1_rtx, initial_adjust, true);
+  /* In theory we should never have both an initial adjustment
+     and a callee save adjustment.  Verify that is the case since the
+     code below does not handle it for -fstack-clash-protection.  */
+  gcc_assert (known_eq (initial_adjust, 0) || callee_adjust == 0);
+
+  /* Will only probe if the initial adjustment is larger than the guard
+     less the amount of the guard reserved for use by the caller's
+     outgoing args.  */
+  aarch64_allocate_and_probe_stack_space (ip0_rtx, ip1_rtx, initial_adjust,
+					  true, false);
 
   if (callee_adjust != 0)
     aarch64_push_regs (reg1, reg2, callee_adjust);
@@ -4905,7 +5117,11 @@ aarch64_expand_prologue (void)
 			     callee_adjust != 0 || emit_frame_chain);
   aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
 			     callee_adjust != 0 || emit_frame_chain);
-  aarch64_sub_sp (ip1_rtx, ip0_rtx, final_adjust, !frame_pointer_needed);
+
+  /* We may need to probe the final adjustment if it is larger than the guard
+     that is assumed by the callee.  */
+  aarch64_allocate_and_probe_stack_space (ip1_rtx, ip0_rtx, final_adjust,
+					  !frame_pointer_needed, true);
 }
 
 /* Return TRUE if we can use a simple_return insn.
@@ -4949,10 +5165,21 @@ aarch64_expand_epilogue (bool for_sibcall)
   /* A stack clash protection prologue may not have left IP0_REGNUM or
      IP1_REGNUM in a usable state.  The same is true for allocations
      with an SVE component, since we then need both temporary registers
-     for each allocation.  */
+     for each allocation.  For stack clash we are in a usable state if
+     the adjustment is less than GUARD_SIZE - GUARD_USED_BY_CALLER.  */
+  HOST_WIDE_INT guard_size
+    = 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
+  HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
+
+  /* We can re-use the registers when the allocation amount is smaller than
+     guard_size - guard_used_by_caller because we won't be doing any probes
+     then.  In such situations the register should remain live with the correct
+     value.  */
   bool can_inherit_p = (initial_adjust.is_constant ()
-			&& final_adjust.is_constant ()
-			&& !flag_stack_clash_protection);
+			&& final_adjust.is_constant ())
+			&& (!flag_stack_clash_protection
+			     || known_lt (initial_adjust,
+					  guard_size - guard_used_by_caller));
 
   /* We need to add memory barrier to prevent read from deallocated stack.  */
   bool need_barrier_p
@@ -4980,8 +5207,10 @@ aarch64_expand_epilogue (bool for_sibcall)
 			hard_frame_pointer_rtx, -callee_offset,
 			ip1_rtx, ip0_rtx, callee_adjust == 0);
   else
-    aarch64_add_sp (ip1_rtx, ip0_rtx, final_adjust,
-		    !can_inherit_p || df_regs_ever_live_p (IP1_REGNUM));
+     /* The case where we need to re-use the register here is very rare, so
+	avoid the complicated condition and just always emit a move if the
+	immediate doesn't fit.  */
+     aarch64_add_sp (ip1_rtx, ip0_rtx, final_adjust, true);
 
   aarch64_restore_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM,
 				callee_adjust != 0, &cfi_ops);
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 22d20eae5c57de81827b3f0f676635a8fff2f054..b8da13f14fa9990e8fdc3c71ed407c8afc65a324 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6453,7 +6453,7 @@
 )
 
 (define_insn "probe_stack_range"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=rk")
 	(unspec_volatile:DI [(match_operand:DI 1 "register_operand" "0")
 			     (match_operand:DI 2 "register_operand" "r")]
 			      UNSPECV_PROBE_STACK_RANGE))]
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-12.c b/gcc/testsuite/gcc.target/aarch64/stack-check-12.c
new file mode 100644
index 0000000000000000000000000000000000000000..4e3abcbcef2eb216be9f0e01b4f1713c33f8b0b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-12.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -fno-asynchronous-unwind-tables -fno-unwind-tables" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+extern void arf (unsigned long int *, unsigned long int *);
+void
+frob ()
+{
+  unsigned long int num[10000];
+  unsigned long int den[10000];
+  arf (den, num);
+}
+
+/* This verifies that the scheduler did not break the dependencies
+   by adjusting the offsets within the probe and that the scheduler
+   did not reorder around the stack probes.  */
+/* { dg-final { scan-assembler-times {sub\tsp, sp, #65536\n\tstr\txzr, \[sp, 1024\]} 2 } } */
+/* There is some residual allocation, but we only care that it is not probed.  */
+/* { dg-final { scan-assembler-times {str\txzr, } 2 } } */
+
+
+
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-13.c b/gcc/testsuite/gcc.target/aarch64/stack-check-13.c
new file mode 100644
index 0000000000000000000000000000000000000000..1fcbae6e3fc6c8423883542d16735e2a8ca0e013
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-13.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -fno-asynchronous-unwind-tables -fno-unwind-tables" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define ARG32(X) X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X
+#define ARG192(X) ARG32(X),ARG32(X),ARG32(X),ARG32(X),ARG32(X),ARG32(X)
+void out1(ARG192(__int128));
+int t1(int);
+
+int t3(int x)
+{
+  if (x < 1000)
+    return t1 (x) + 1;
+
+  out1 (ARG192(1));
+  return 0;
+}
+
+
+
+/* This test creates a large (> 1k) outgoing argument area that needs
+   to be probed.  We don't test the exact size of the space or the
+   exact offset to make the test a little less sensitive to trivial
+   output changes.  */
+/* { dg-final { scan-assembler-times "sub\\tsp, sp, #....\\n\\tstr\\txzr, \\\[sp" 1 } } */
+
+
+
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-1.c b/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..6885894a97e0a53cf87fc3ff9ded156014864c4d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -funwind-tables" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 128*1024
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 65536} 1 } } */
+/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 131072} 1 } } */
+/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 0} 1 } } */
+
+/* Checks that the CFA notes are correct for every sp adjustment.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-2.c b/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..5796a53be0676bace2197e9d07a63b4b1757fd0d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -funwind-tables" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 1280*1024 + 512
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {\.cfi_def_cfa [0-9]+, 1310720} 1 } } */
+/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 1311232} 1 } } */
+/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 1310720} 1 } } */
+/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 0} 1 } } */
+
+/* Checks that the CFA notes are correct for every sp adjustment.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-1.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..d2bfb788c6ff731ed0592e16813147e3e58b4df2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 128
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr,} 0 } } */
+
+/* SIZE is smaller than guard-size - 1Kb so no probe expected.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-10.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-10.c
new file mode 100644
index 0000000000000000000000000000000000000000..c9c9a1b9161b2bec19e61bf648b26938dcf001b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-10.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE (6 * 64 * 1024) + (1 * 63 * 1024) + 512
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 2 } } */
+
+/* SIZE is more than 4x guard-size and remainder larger than guard-size - 1Kb,
+   1 probe expected in a loop and 1 residual probe.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-11.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-11.c
new file mode 100644
index 0000000000000000000000000000000000000000..741f2f5fadc6960f1d2c34e1e93589cfcc6e1697
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-11.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE (6 * 64 * 1024) + (1 * 32 * 1024)
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
+
+/* SIZE is more than 4x guard-size and remainder smaller than guard-size - 1Kb,
+   so 1 probe expected in a loop and no residual probe.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-2.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..61c52a251a7bf4f2d145e456c86049230d372ba4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 2 * 1024
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr,} 0 } } */
+
+/* SIZE is smaller than guard-size - 1Kb so no probe expected.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-3.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..0bef3c5b60c2abab6c28abd41444b5e3569c3652
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-3.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 63 * 1024
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr,} 1 } } */
+
+/* SIZE is exactly guard-size - 1Kb, boundary condition so 1 probe expected.
+*/
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-4.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..5b8693a051c321cb5a2f701cd3272e3970a6a4de
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-4.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 63 * 1024 + 512
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
+
+/* SIZE is more than guard-size - 1Kb and remainder is less than 1kB,
+   1 probe expected.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-5.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..2ee16350127c2e201da7d990dbcb042691b52348
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-5.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 64 * 1024
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
+
+/* SIZE is more than guard-size - 1Kb and remainder is zero,
+   1 probe expected, boundary condition.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-6.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-6.c
new file mode 100644
index 0000000000000000000000000000000000000000..3c9b606cbe0e3b4f75c86c22dd1f69dde7e36310
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-6.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 65 * 1024
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
+
+/* SIZE is more than guard-size - 1Kb and remainder is equal to 1kB,
+   1 probe expected.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-7.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-7.c
new file mode 100644
index 0000000000000000000000000000000000000000..6324c0367fada8a2726e689d39e46ad5f8e130b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-7.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 127 * 1024
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 2 } } */
+
+/* SIZE is more than 1x guard-size and remainder equal to guard-size - 1Kb,
+   2 probes expected, unrolled, no loop.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-8.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-8.c
new file mode 100644
index 0000000000000000000000000000000000000000..333f5fcc3607ee633c1b9374f6d5e8ac4a954b37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-8.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 128 * 1024
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 2 } } */
+
+/* SIZE is exactly 2x guard-size, no remainder; 2 probes, unrolled, no loop.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-9.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-9.c
new file mode 100644
index 0000000000000000000000000000000000000000..a3ff89b558139e56d2d69c93307fcab79c89a103
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-9.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 6 * 64 * 1024
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
+
+/* SIZE is more than 4x guard-size and no remainder, 1 probe expected in a loop
+   and no residual probe.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue.h b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue.h
new file mode 100644
index 0000000000000000000000000000000000000000..b7e06aedb81d7692ebd587b23d1065436b1c7218
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue.h
@@ -0,0 +1,5 @@
+int f_test (int x)
+{
+  char arr[SIZE];
+  return arr[x];
+}
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index b04ceb6508e77b1e7d489207652d8e5d4ea8cf35..7f33ce8a3efabc8a9144a83ee326120415fd38f7 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9330,14 +9330,9 @@ proc check_effective_target_autoincdec { } {
 # 
 proc check_effective_target_supports_stack_clash_protection { } {
 
-   # Temporary until the target bits are fully ACK'd.
-#  if { [istarget aarch*-*-*] } {
-#	return 1
-#  }
-
     if { [istarget x86_64-*-*] || [istarget i?86-*-*] 
 	  || [istarget powerpc*-*-*] || [istarget rs6000*-*-*]
-	  || [istarget s390*-*-*] } {
+	  || [istarget aarch64*-*-*] || [istarget s390*-*-*] } {
 	return 1
     }
   return 0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)]
  2018-08-28 12:18 [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)] Tamar Christina
@ 2018-08-28 20:58 ` Richard Sandiford
  2018-09-07 16:03   ` Tamar Christina
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Sandiford @ 2018-08-28 20:58 UTC (permalink / raw)
  To: Tamar Christina
  Cc: Jeff Law, gcc-patches, nd, James Greenhalgh, Richard Earnshaw,
	Marcus Shawcroft

Tamar Christina <Tamar.Christina@arm.com> writes:
> +  HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
> +  /* When doing the final adjustment for the outgoing argument size we can't
> +     assume that LR was saved at position 0.  So subtract it's offset from the
> +     ABI safe buffer so that we don't accidentally allow an adjustment that
> +     would result in an allocation larger than the ABI buffer without
> +     probing.  */
> +  HOST_WIDE_INT min_probe_threshold
> +    = final_adjustment_p
> +      ? guard_used_by_caller - cfun->machine->frame.reg_offset[LR_REGNUM]
> +      : guard_size - guard_used_by_caller;
[...]
> +  if (residual)
> +    {
> +      aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
> +      if (residual >= min_probe_threshold)
> +	{
> +	  if (dump_file)
> +	    fprintf (dump_file,
> +		     "Stack clash AArch64 prologue residuals: "
> +		     HOST_WIDE_INT_PRINT_DEC " bytes, probing will be required."
> +		     "\n", residual);
> +	  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
> +					   STACK_CLASH_CALLER_GUARD));

reg_offsets are nonnegative, so if LR_REGNUM isn't saved at position 0,
min_probe_threshold will be less than STACK_CLASH_CALLER_GUARD.  It looks
like the probe would then write above the region.

Using >= rather than > means that the same thing could happen when
LR_REGNUM is at position 0, if the residual is exactly
STACK_CLASH_CALLER_GUARD.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)]
  2018-08-28 20:58 ` Richard Sandiford
@ 2018-09-07 16:03   ` Tamar Christina
  2018-09-11 14:49     ` Richard Sandiford
  2018-09-11 15:55     ` James Greenhalgh
  0 siblings, 2 replies; 8+ messages in thread
From: Tamar Christina @ 2018-09-07 16:03 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Jeff Law, gcc-patches, nd, James Greenhalgh, Richard Earnshaw,
	Marcus Shawcroft

[-- Attachment #1: Type: text/plain, Size: 4058 bytes --]

Hi Richard,

The 08/28/2018 21:58, Richard Sandiford wrote:
> Tamar Christina <Tamar.Christina@arm.com> writes:
> > +  HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
> > +  /* When doing the final adjustment for the outgoing argument size we can't
> > +     assume that LR was saved at position 0.  So subtract it's offset from the
> > +     ABI safe buffer so that we don't accidentally allow an adjustment that
> > +     would result in an allocation larger than the ABI buffer without
> > +     probing.  */
> > +  HOST_WIDE_INT min_probe_threshold
> > +    = final_adjustment_p
> > +      ? guard_used_by_caller - cfun->machine->frame.reg_offset[LR_REGNUM]
> > +      : guard_size - guard_used_by_caller;
> [...]
> > +  if (residual)
> > +    {
> > +      aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
> > +      if (residual >= min_probe_threshold)
> > +	{
> > +	  if (dump_file)
> > +	    fprintf (dump_file,
> > +		     "Stack clash AArch64 prologue residuals: "
> > +		     HOST_WIDE_INT_PRINT_DEC " bytes, probing will be required."
> > +		     "\n", residual);
> > +	  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
> > +					   STACK_CLASH_CALLER_GUARD));
> 
> reg_offsets are nonnegative, so if LR_REGNUM isn't saved at position 0,
> min_probe_threshold will be less than STACK_CLASH_CALLER_GUARD.  It looks
> like the probe would then write above the region.
> 
> Using >= rather than > means that the same thing could happen when
> LR_REGNUM is at position 0, if the residual is exactly
> STACK_CLASH_CALLER_GUARD.

That's true. While addressing this we changed how the residuals are probed.

To address a comment you raised offline about saving LR when calling a
no-return function through a tail call with -fomit-frame-pointer: I think this
is safe, as the code in frame_layout (lines 4131-4136) ensures that R30 is saved.

I have added two new tests to check for this, so that if it does change in the future they
would fail. 

Attached is the updated patch and new changelog

Ok for trunk?
Thanks,
Tamar

gcc/
2018-09-07  Jeff Law  <law@redhat.com>
	    Richard Sandiford <richard.sandiford@linaro.org>
	    Tamar Christina  <tamar.christina@arm.com>

	PR target/86486
	* config/aarch64/aarch64.md
	(probe_stack_range): Add k (SP) constraint.
	* config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
	STACK_CLASH_MAX_UNROLL_PAGES): New.
	* config/aarch64/aarch64.c (aarch64_output_probe_stack_range): Emit
	stack probes for stack clash.
	(aarch64_allocate_and_probe_stack_space): New.
	(aarch64_expand_prologue): Use it.
	(aarch64_expand_epilogue): Likewise and update IP regs re-use criteria.
	(aarch64_sub_sp): Add emit_move_imm optional param.

gcc/testsuite/
2018-09-07  Jeff Law  <law@redhat.com>
	    Richard Sandiford <richard.sandiford@linaro.org>
	    Tamar Christina  <tamar.christina@arm.com>

	PR target/86486
	* gcc.target/aarch64/stack-check-12.c: New.
	* gcc.target/aarch64/stack-check-13.c: New.
	* gcc.target/aarch64/stack-check-cfa-1.c: New.
	* gcc.target/aarch64/stack-check-cfa-2.c: New.
	* gcc.target/aarch64/stack-check-prologue-1.c: New.
	* gcc.target/aarch64/stack-check-prologue-10.c: New.
	* gcc.target/aarch64/stack-check-prologue-11.c: New.
	* gcc.target/aarch64/stack-check-prologue-12.c: New.
	* gcc.target/aarch64/stack-check-prologue-13.c: New.
	* gcc.target/aarch64/stack-check-prologue-14.c: New.
	* gcc.target/aarch64/stack-check-prologue-15.c: New.
	* gcc.target/aarch64/stack-check-prologue-2.c: New.
	* gcc.target/aarch64/stack-check-prologue-3.c: New.
	* gcc.target/aarch64/stack-check-prologue-4.c: New.
	* gcc.target/aarch64/stack-check-prologue-5.c: New.
	* gcc.target/aarch64/stack-check-prologue-6.c: New.
	* gcc.target/aarch64/stack-check-prologue-7.c: New.
	* gcc.target/aarch64/stack-check-prologue-8.c: New.
	* gcc.target/aarch64/stack-check-prologue-9.c: New.
	* gcc.target/aarch64/stack-check-prologue.h: New.
	* lib/target-supports.exp
	(check_effective_target_supports_stack_clash_protection): Add AArch64.

> 
> Thanks,
> Richard

-- 

[-- Attachment #2: rb9150.patch --]
[-- Type: text/x-diff, Size: 35598 bytes --]

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index c1218503bab19323eee1cca8b7e4bea8fbfcf573..bfb6d92f5e665b14926514b864489cb3f2793336 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -84,6 +84,14 @@
 
 #define LONG_DOUBLE_TYPE_SIZE	128
 
+/* This value is the number of bytes a caller is allowed to drop the stack
+   before probing has to be done for stack clash protection.  */
+#define STACK_CLASH_CALLER_GUARD 1024
+
+/* This value controls how many pages we manually unroll the loop for when
+   generating stack clash probes.  */
+#define STACK_CLASH_MAX_UNROLL_PAGES 4
+
 /* The architecture reserves all bits of the address for hardware use,
    so the vbit must go into the delta field of pointers to member
    functions.  This is the same config as that in the AArch32
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1e8d8104c066a265120ab776f7ab5a959d3512b6..283f3372798c84ef74356128acf2a5be7b4ce1ad 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2769,10 +2769,11 @@ aarch64_add_sp (rtx temp1, rtx temp2, poly_int64 delta, bool emit_move_imm)
    if nonnull.  */
 
 static inline void
-aarch64_sub_sp (rtx temp1, rtx temp2, poly_int64 delta, bool frame_related_p)
+aarch64_sub_sp (rtx temp1, rtx temp2, poly_int64 delta, bool frame_related_p,
+		bool emit_move_imm = true)
 {
   aarch64_add_offset (Pmode, stack_pointer_rtx, stack_pointer_rtx, -delta,
-		      temp1, temp2, frame_related_p);
+		      temp1, temp2, frame_related_p, emit_move_imm);
 }
 
 /* Set DEST to (vec_series BASE STEP).  */
@@ -3932,13 +3933,33 @@ aarch64_output_probe_stack_range (rtx reg1, rtx reg2)
   /* Loop.  */
   ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, loop_lab);
 
+  HOST_WIDE_INT stack_clash_probe_interval
+    = 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
+
   /* TEST_ADDR = TEST_ADDR + PROBE_INTERVAL.  */
   xops[0] = reg1;
-  xops[1] = GEN_INT (PROBE_INTERVAL);
+  HOST_WIDE_INT interval;
+  if (flag_stack_clash_protection)
+    interval = stack_clash_probe_interval;
+  else
+    interval = PROBE_INTERVAL;
+
+  gcc_assert (aarch64_uimm12_shift (interval));
+  xops[1] = GEN_INT (interval);
+
   output_asm_insn ("sub\t%0, %0, %1", xops);
 
-  /* Probe at TEST_ADDR.  */
-  output_asm_insn ("str\txzr, [%0]", xops);
+  /* If doing stack clash protection then we probe up by the ABI specified
+     amount.  We do this because we're dropping full pages at a time in the
+     loop.  But if we're doing non-stack clash probing, probe at SP 0.  */
+  if (flag_stack_clash_protection)
+    xops[1] = GEN_INT (STACK_CLASH_CALLER_GUARD);
+  else
+    xops[1] = CONST0_RTX (GET_MODE (xops[1]));
+
+  /* Probe at TEST_ADDR.  If we're inside the loop it is always safe to probe
+     by this amount for each iteration.  */
+  output_asm_insn ("str\txzr, [%0, %1]", xops);
 
   /* Test if TEST_ADDR == LAST_ADDR.  */
   xops[1] = reg2;
@@ -4752,6 +4773,188 @@ aarch64_set_handled_components (sbitmap components)
       cfun->machine->reg_is_wrapped_separately[regno] = true;
 }
 
+/* Allocate POLY_SIZE bytes of stack space using TEMP1 and TEMP2 as scratch
+   registers.  If POLY_SIZE is not large enough to require a probe this function
+   will only adjust the stack.  When allocating the stack space
+   FRAME_RELATED_P is then used to indicate if the allocation is frame related.
+   FINAL_ADJUSTMENT_P indicates whether we are allocating the outgoing
+   arguments.  If we are then we ensure that any allocation larger than the ABI
+   defined buffer needs a probe so that the invariant of having a 1KB buffer is
+   maintained.
+
+   We emit barriers after each stack adjustment to prevent optimizations from
+   breaking the invariant that we never drop the stack more than a page.  This
+   invariant is needed to make it easier to correctly handle asynchronous
+   events, e.g. if we were to allow the stack to be dropped by more than a page
+   and then have multiple probes up and we take a signal somewhere in between
+   then the signal handler doesn't know the state of the stack and can make no
+   assumptions about which pages have been probed.  */
+
+static void
+aarch64_allocate_and_probe_stack_space (rtx temp1, rtx temp2,
+					poly_int64 poly_size,
+					bool frame_related_p,
+					bool final_adjustment_p)
+{
+  HOST_WIDE_INT guard_size
+    = 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
+  HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
+  /* When doing the final adjustment for the outgoing argument size we can't
+     assume that LR was saved at position 0.  So subtract its offset from the
+     ABI safe buffer so that we don't accidentally allow an adjustment that
+     would result in an allocation larger than the ABI buffer without
+     probing.  */
+  HOST_WIDE_INT min_probe_threshold
+    = final_adjustment_p
+      ? guard_used_by_caller - cfun->machine->frame.reg_offset[LR_REGNUM]
+      : guard_size - guard_used_by_caller;
+
+  poly_int64 frame_size = cfun->machine->frame.frame_size;
+
+  /* We should always have a positive probe threshold.  */
+  gcc_assert (min_probe_threshold > 0);
+
+  if (flag_stack_clash_protection && !final_adjustment_p)
+    {
+      poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
+      poly_int64 final_adjust = cfun->machine->frame.final_adjust;
+
+      if (known_eq (frame_size, 0))
+	{
+	  dump_stack_clash_frame_info (NO_PROBE_NO_FRAME, false);
+	}
+      else if (known_lt (initial_adjust, guard_size - guard_used_by_caller)
+	       && known_lt (final_adjust, guard_used_by_caller))
+	{
+	  dump_stack_clash_frame_info (NO_PROBE_SMALL_FRAME, true);
+	}
+    }
+
+  HOST_WIDE_INT size;
+  /* If SIZE is not large enough to require probing, just adjust the stack and
+     exit.  */
+  if (!poly_size.is_constant (&size)
+      || known_lt (poly_size, min_probe_threshold)
+      || !flag_stack_clash_protection)
+    {
+      aarch64_sub_sp (temp1, temp2, poly_size, frame_related_p);
+      return;
+    }
+
+  if (dump_file)
+    fprintf (dump_file,
+	     "Stack clash AArch64 prologue: " HOST_WIDE_INT_PRINT_DEC " bytes"
+	     ", probing will be required.\n", size);
+
+  /* Round size to the nearest multiple of guard_size, and calculate the
+     residual as the difference between the original size and the rounded
+     size.  */
+  HOST_WIDE_INT rounded_size = ROUND_DOWN (size, guard_size);
+  HOST_WIDE_INT residual = size - rounded_size;
+
+  /* We can handle a small number of allocations/probes inline.  Otherwise
+     punt to a loop.  */
+  if (rounded_size <= STACK_CLASH_MAX_UNROLL_PAGES * guard_size)
+    {
+      for (HOST_WIDE_INT i = 0; i < rounded_size; i += guard_size)
+	{
+	  aarch64_sub_sp (NULL, temp2, guard_size, true);
+	  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
+					   guard_used_by_caller));
+	  emit_insn (gen_blockage ());
+	}
+      dump_stack_clash_frame_info (PROBE_INLINE, size != rounded_size);
+    }
+  else
+    {
+      /* Compute the ending address.  */
+      aarch64_add_offset (Pmode, temp1, stack_pointer_rtx, -rounded_size,
+			  temp1, NULL, false, true);
+      rtx_insn *insn = get_last_insn ();
+
+      /* For the initial allocation, we don't have a frame pointer
+	 set up, so we always need CFI notes.  If we're doing the
+	 final allocation, then we may have a frame pointer, in which
+	 case it is the CFA, otherwise we need CFI notes.
+
+	 We can determine which allocation we are doing by looking at
+	 the value of FRAME_RELATED_P since the final allocations are not
+	 frame related.  */
+      if (frame_related_p)
+	{
+	  /* We want the CFA independent of the stack pointer for the
+	     duration of the loop.  */
+	  add_reg_note (insn, REG_CFA_DEF_CFA,
+			plus_constant (Pmode, temp1, rounded_size));
+	  RTX_FRAME_RELATED_P (insn) = 1;
+	}
+
+      /* This allocates and probes the stack.  Note that this re-uses some of
+	 the existing Ada stack protection code.  However we are guaranteed not
+	 to enter the non loop or residual branches of that code.
+
+	 The non-loop part won't be entered because if our allocation amount
+	 doesn't require a loop, the case above would handle it.
+
+	 The residual amount won't be entered because TEMP1 is a multiple of
+	 the allocation size.  The residual will always be 0.  As such, the only
+	 part we are actually using from that code is the loop setup.  The
+	 actual probing is done in aarch64_output_probe_stack_range.  */
+      insn = emit_insn (gen_probe_stack_range (stack_pointer_rtx,
+					       stack_pointer_rtx, temp1));
+
+      /* Now reset the CFA register if needed.  */
+      if (frame_related_p)
+	{
+	  add_reg_note (insn, REG_CFA_DEF_CFA,
+			plus_constant (Pmode, stack_pointer_rtx, rounded_size));
+	  RTX_FRAME_RELATED_P (insn) = 1;
+	}
+
+      emit_insn (gen_blockage ());
+      dump_stack_clash_frame_info (PROBE_LOOP, size != rounded_size);
+    }
+
+  /* Handle any residuals.  Residuals of at least MIN_PROBE_THRESHOLD have to
+     be probed.  This maintains the requirement that each page is probed at
+     least once.  For initial probing we probe only if the allocation is
+     more than GUARD_SIZE - buffer, and for the outgoing arguments we probe
+     if the amount is larger than buffer.  GUARD_SIZE - buffer + buffer ==
+     GUARD_SIZE.  This ensures that for any allocation that is large enough to
+     trigger a probe here, we'll have at least one, and if they're not large
+     enough for this code to emit anything for them, the page would have been
+     probed by the saving of FP/LR either by this function or any callees.  If
+     we don't have any callees then we won't have more stack adjustments and so
+     are still safe.  */
+  if (residual)
+    {
+      HOST_WIDE_INT residual_probe_offset = guard_used_by_caller;
+      /* If we're doing final adjustments, and we've done any full page
+	 allocations then any residual needs to be probed.  */
+      if (final_adjustment_p && rounded_size != 0)
+	min_probe_threshold = 0;
+      /* If doing a small final adjustment, we always probe at offset 0.
+	 This is done to avoid issues when LR is not at position 0 or when
+	 the final adjustment is smaller than the probing offset.  */
+      else if (final_adjustment_p && rounded_size == 0)
+	residual_probe_offset = 0;
+
+      aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
+      if (residual >= min_probe_threshold)
+	{
+	  if (dump_file)
+	    fprintf (dump_file,
+		     "Stack clash AArch64 prologue residuals: "
+		     HOST_WIDE_INT_PRINT_DEC " bytes, probing will be required."
+		     "\n", residual);
+
+	  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
+					   residual_probe_offset));
+	  emit_insn (gen_blockage ());
+	}
+    }
+}
+
 /* Add a REG_CFA_EXPRESSION note to INSN to say that register REG
    is saved at BASE + OFFSET.  */
 
@@ -4779,7 +4982,7 @@ aarch64_add_cfa_expression (rtx_insn *insn, unsigned int reg,
 	|  local variables              | <-- frame_pointer_rtx
 	|                               |
 	+-------------------------------+
-	|  padding0                     | \
+	|  padding                      | \
 	+-------------------------------+  |
 	|  callee-saved registers       |  | frame.saved_regs_size
 	+-------------------------------+  |
@@ -4798,7 +5001,23 @@ aarch64_add_cfa_expression (rtx_insn *insn, unsigned int reg,
 
    Dynamic stack allocations via alloca() decrease stack_pointer_rtx
    but leave frame_pointer_rtx and hard_frame_pointer_rtx
-   unchanged.  */
+   unchanged.
+
+   By default for stack-clash we assume the guard is at least 64KB, but this
+   value is configurable to either 4KB or 64KB.  We also force the guard size to
+   be the same as the probing interval and both values are kept in sync.
+
+   With those assumptions the callee can allocate up to 63KB (or 3KB depending
+   on the guard size) of stack space without probing.
+
+   When probing is needed, we emit a probe at the start of the prologue
+   and every PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE bytes thereafter.
+
+   We have to track how much space has been allocated and the only stores
+   to the stack we track as implicit probes are the FP/LR stores.
+
+   For outgoing arguments we probe if the size is larger than 1KB, such that
+   the ABI specified buffer is maintained for the next callee.  */
 
 /* Generate the prologue instructions for entry into a function.
    Establish the stack frame by decreasing the stack pointer with a
@@ -4849,7 +5068,16 @@ aarch64_expand_prologue (void)
   rtx ip0_rtx = gen_rtx_REG (Pmode, IP0_REGNUM);
   rtx ip1_rtx = gen_rtx_REG (Pmode, IP1_REGNUM);
 
-  aarch64_sub_sp (ip0_rtx, ip1_rtx, initial_adjust, true);
+  /* In theory we should never have both an initial adjustment
+     and a callee save adjustment.  Verify that is the case since the
+     code below does not handle it for -fstack-clash-protection.  */
+  gcc_assert (known_eq (initial_adjust, 0) || callee_adjust == 0);
+
+  /* Will only probe if the initial adjustment is larger than the guard
+     less the amount of the guard reserved for use by the caller's
+     outgoing args.  */
+  aarch64_allocate_and_probe_stack_space (ip0_rtx, ip1_rtx, initial_adjust,
+					  true, false);
 
   if (callee_adjust != 0)
     aarch64_push_regs (reg1, reg2, callee_adjust);
@@ -4905,7 +5133,11 @@ aarch64_expand_prologue (void)
 			     callee_adjust != 0 || emit_frame_chain);
   aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM, V31_REGNUM,
 			     callee_adjust != 0 || emit_frame_chain);
-  aarch64_sub_sp (ip1_rtx, ip0_rtx, final_adjust, !frame_pointer_needed);
+
+  /* We may need to probe the final adjustment if it is larger than the guard
+     that is assumed by the callee.  */
+  aarch64_allocate_and_probe_stack_space (ip1_rtx, ip0_rtx, final_adjust,
+					  !frame_pointer_needed, true);
 }
 
 /* Return TRUE if we can use a simple_return insn.
@@ -4949,10 +5181,21 @@ aarch64_expand_epilogue (bool for_sibcall)
   /* A stack clash protection prologue may not have left IP0_REGNUM or
      IP1_REGNUM in a usable state.  The same is true for allocations
      with an SVE component, since we then need both temporary registers
-     for each allocation.  */
+     for each allocation.  For stack clash we are in a usable state if
+     the adjustment is less than GUARD_SIZE - GUARD_USED_BY_CALLER.  */
+  HOST_WIDE_INT guard_size
+    = 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_GUARD_SIZE);
+  HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
+
+  /* We can re-use the registers when the allocation amount is smaller than
+     guard_size - guard_used_by_caller because we won't be doing any probes
+     then.  In such situations the register should remain live with the correct
+     value.  */
   bool can_inherit_p = (initial_adjust.is_constant ()
-			&& final_adjust.is_constant ()
-			&& !flag_stack_clash_protection);
+			&& final_adjust.is_constant ())
+			&& (!flag_stack_clash_protection
+			     || known_lt (initial_adjust,
+					  guard_size - guard_used_by_caller));
 
   /* We need to add memory barrier to prevent read from deallocated stack.  */
   bool need_barrier_p
@@ -4980,8 +5223,10 @@ aarch64_expand_epilogue (bool for_sibcall)
 			hard_frame_pointer_rtx, -callee_offset,
 			ip1_rtx, ip0_rtx, callee_adjust == 0);
   else
-    aarch64_add_sp (ip1_rtx, ip0_rtx, final_adjust,
-		    !can_inherit_p || df_regs_ever_live_p (IP1_REGNUM));
+     /* The case where we need to re-use the register here is very rare, so
+	avoid the complicated condition and just always emit a move if the
+	immediate doesn't fit.  */
+     aarch64_add_sp (ip1_rtx, ip0_rtx, final_adjust, true);
 
   aarch64_restore_callee_saves (DImode, callee_offset, R0_REGNUM, R30_REGNUM,
 				callee_adjust != 0, &cfi_ops);
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 22d20eae5c57de81827b3f0f676635a8fff2f054..b8da13f14fa9990e8fdc3c71ed407c8afc65a324 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -6453,7 +6453,7 @@
 )
 
 (define_insn "probe_stack_range"
-  [(set (match_operand:DI 0 "register_operand" "=r")
+  [(set (match_operand:DI 0 "register_operand" "=rk")
 	(unspec_volatile:DI [(match_operand:DI 1 "register_operand" "0")
 			     (match_operand:DI 2 "register_operand" "r")]
 			      UNSPECV_PROBE_STACK_RANGE))]
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-12.c b/gcc/testsuite/gcc.target/aarch64/stack-check-12.c
new file mode 100644
index 0000000000000000000000000000000000000000..4e3abcbcef2eb216be9f0e01b4f1713c33f8b0b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-12.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -fno-asynchronous-unwind-tables -fno-unwind-tables" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+extern void arf (unsigned long int *, unsigned long int *);
+void
+frob ()
+{
+  unsigned long int num[10000];
+  unsigned long int den[10000];
+  arf (den, num);
+}
+
+/* This verifies that the scheduler did not break the dependencies
+   by adjusting the offsets within the probe and that the scheduler
+   did not reorder around the stack probes.  */
+/* { dg-final { scan-assembler-times {sub\tsp, sp, #65536\n\tstr\txzr, \[sp, 1024\]} 2 } } */
+/* There is some residual allocation, but we only care that it is not probed.  */
+/* { dg-final { scan-assembler-times {str\txzr, } 2 } } */
+
+
+
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-13.c b/gcc/testsuite/gcc.target/aarch64/stack-check-13.c
new file mode 100644
index 0000000000000000000000000000000000000000..1fcbae6e3fc6c8423883542d16735e2a8ca0e013
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-13.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -fno-asynchronous-unwind-tables -fno-unwind-tables" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define ARG32(X) X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X,X
+#define ARG192(X) ARG32(X),ARG32(X),ARG32(X),ARG32(X),ARG32(X),ARG32(X)
+void out1(ARG192(__int128));
+int t1(int);
+
+int t3(int x)
+{
+  if (x < 1000)
+    return t1 (x) + 1;
+
+  out1 (ARG192(1));
+  return 0;
+}
+
+
+
+/* This test creates a large (> 1k) outgoing argument area that needs
+   to be probed.  We don't test the exact size of the space or the
+   exact offset to make the test a little less sensitive to trivial
+   output changes.  */
+/* { dg-final { scan-assembler-times "sub\\tsp, sp, #....\\n\\tstr\\txzr, \\\[sp" 1 } } */
+
+
+
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-1.c b/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..6885894a97e0a53cf87fc3ff9ded156014864c4d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -funwind-tables" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 128*1024
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 65536} 1 } } */
+/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 131072} 1 } } */
+/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 0} 1 } } */
+
+/* Checks that the CFA notes are correct for every sp adjustment.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-2.c b/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..5796a53be0676bace2197e9d07a63b4b1757fd0d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-cfa-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -funwind-tables" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 1280*1024 + 512
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {\.cfi_def_cfa [0-9]+, 1310720} 1 } } */
+/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 1311232} 1 } } */
+/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 1310720} 1 } } */
+/* { dg-final { scan-assembler-times {\.cfi_def_cfa_offset 0} 1 } } */
+
+/* Checks that the CFA notes are correct for every sp adjustment.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-1.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-1.c
new file mode 100644
index 0000000000000000000000000000000000000000..d2bfb788c6ff731ed0592e16813147e3e58b4df2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 128
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr,} 0 } } */
+
+/* SIZE is smaller than guard-size - 1Kb so no probe expected.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-10.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-10.c
new file mode 100644
index 0000000000000000000000000000000000000000..c9c9a1b9161b2bec19e61bf648b26938dcf001b9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-10.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE (6 * 64 * 1024) + (1 * 63 * 1024) + 512
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 2 } } */
+
+/* SIZE is more than 4x guard-size and remainder larger than guard-size - 1Kb,
+   1 probe expected in a loop and 1 residual probe.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-11.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-11.c
new file mode 100644
index 0000000000000000000000000000000000000000..741f2f5fadc6960f1d2c34e1e93589cfcc6e1697
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-11.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE (6 * 64 * 1024) + (1 * 32 * 1024)
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
+
+/* SIZE is more than 4x guard-size and the remainder is smaller than
+   guard-size - 1Kb, 1 probe expected in a loop and no residual probe.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-12.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-12.c
new file mode 100644
index 0000000000000000000000000000000000000000..ece68003ade48799a0817c103a307a30537d6872
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-12.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -fomit-frame-pointer -momit-leaf-frame-pointer" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+void
+f (void)
+{
+  volatile int x[16384 + 1000];
+  x[0] = 0;
+}
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
+
+/* SIZE is more than 1 guard-size, but only one 64KB page is used, expect only 1
+   probe.  Leaf function omitting the leaf frame pointer.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-13.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-13.c
new file mode 100644
index 0000000000000000000000000000000000000000..0fc900c6943ee92609ce2d83c5c54d6397ce35c5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-13.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -fomit-frame-pointer -momit-leaf-frame-pointer" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+void h (void) __attribute__ ((noreturn));
+
+void
+f (void)
+{
+  volatile int x[16384 + 1000];
+  x[30]=0;
+  h ();
+}
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
+/* { dg-final { scan-assembler-times {str\s+x30, \[sp\]} 1 } } */
+
+/* SIZE is more than 1 guard-size, but only one 64KB page is used, expect only 1
+   probe.  Leaf function omitting the leaf frame pointer, tail call to noreturn which
+   may only omit an epilogue and not a prologue.  Checking for LR saving.  */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-14.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-14.c
new file mode 100644
index 0000000000000000000000000000000000000000..ea733f861e77a647cf5c661d23b116aabdfa31c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-14.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -fomit-frame-pointer -momit-leaf-frame-pointer" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+void h (void) __attribute__ ((noreturn));
+
+void
+f (void)
+{
+  volatile int x[16384 + 1000];
+  if (x[0])
+     h ();
+  x[345] = 1;
+  h ();
+}
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
+/* { dg-final { scan-assembler-times {str\s+x30, \[sp\]} 1 } } */
+
+/* SIZE is more than 1 guard-size, two 64k pages used, expect only 1 explicit
+   probe at 1024 and one implicit probe due to LR being saved.  Leaf function
+   and omitting leaf pointers, tail call to noreturn which may only omit an
+   epilogue and not a prologue and control flow in between.  Checking for
+   LR saving.  */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-15.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-15.c
new file mode 100644
index 0000000000000000000000000000000000000000..63df4a5609a2377a7b0688bd0871d2757c9f1a51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-15.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16 -fomit-frame-pointer -momit-leaf-frame-pointer" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+void g (volatile int *x) ;
+void h (void) __attribute__ ((noreturn));
+
+void
+f (void)
+{
+  volatile int x[16384 + 1000];
+  g (x);
+  h ();
+}
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
+/* { dg-final { scan-assembler-times {str\s+x30, \[sp\]} 1 } } */
+
+/* SIZE is more than 1 guard-size, two 64k pages used, expect only 1 explicit
+   probe at 1024 and one implicit probe due to LR being saved.  Leaf function
+   omitting the leaf frame pointer, normal function call followed by a tail call to
+   noreturn which may only omit an epilogue and not a prologue and control flow
+   in between.  Checking for LR saving.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-2.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-2.c
new file mode 100644
index 0000000000000000000000000000000000000000..61c52a251a7bf4f2d145e456c86049230d372ba4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-2.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 2 * 1024
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr,} 0 } } */
+
+/* SIZE is smaller than guard-size - 1Kb so no probe expected.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-3.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-3.c
new file mode 100644
index 0000000000000000000000000000000000000000..0bef3c5b60c2abab6c28abd41444b5e3569c3652
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-3.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 63 * 1024
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr,} 1 } } */
+
+/* SIZE is exactly guard-size - 1Kb, a boundary condition, so 1 probe is
+   expected.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-4.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..5b8693a051c321cb5a2f701cd3272e3970a6a4de
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-4.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 63 * 1024 + 512
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
+
+/* SIZE is more than guard-size - 1Kb and remainder is less than 1kB,
+   1 probe expected.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-5.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-5.c
new file mode 100644
index 0000000000000000000000000000000000000000..2ee16350127c2e201da7d990dbcb042691b52348
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-5.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 64 * 1024
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
+
+/* SIZE is more than guard-size - 1Kb and remainder is zero,
+   1 probe expected, boundary condition.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-6.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-6.c
new file mode 100644
index 0000000000000000000000000000000000000000..3c9b606cbe0e3b4f75c86c22dd1f69dde7e36310
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-6.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 65 * 1024
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
+
+/* SIZE is more than guard-size - 1Kb and remainder is equal to 1kB,
+   1 probe expected.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-7.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-7.c
new file mode 100644
index 0000000000000000000000000000000000000000..6324c0367fada8a2726e689d39e46ad5f8e130b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-7.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 127 * 1024
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 2 } } */
+
+/* SIZE is more than 1x guard-size and the remainder is equal to
+   guard-size - 1Kb, 2 probes expected, unrolled, no loop.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-8.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-8.c
new file mode 100644
index 0000000000000000000000000000000000000000..333f5fcc3607ee633c1b9374f6d5e8ac4a954b37
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-8.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 128 * 1024
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 2 } } */
+
+/* SIZE is exactly 2x guard-size and no remainder, 2 probes expected,
+   unrolled, no loop.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-9.c b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-9.c
new file mode 100644
index 0000000000000000000000000000000000000000..a3ff89b558139e56d2d69c93307fcab79c89a103
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue-9.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection --param stack-clash-protection-guard-size=16" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+#define SIZE 6 * 64 * 1024
+#include "stack-check-prologue.h"
+
+/* { dg-final { scan-assembler-times {str\s+xzr, \[sp, 1024\]} 1 } } */
+
+/* SIZE is more than 4x guard-size and no remainder, 1 probe expected in a loop
+   and no residual probe.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/stack-check-prologue.h b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue.h
new file mode 100644
index 0000000000000000000000000000000000000000..b7e06aedb81d7692ebd587b23d1065436b1c7218
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stack-check-prologue.h
@@ -0,0 +1,5 @@
+int f_test (int x)
+{
+  char arr[SIZE];
+  return arr[x];
+}
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index b04ceb6508e77b1e7d489207652d8e5d4ea8cf35..7f33ce8a3efabc8a9144a83ee326120415fd38f7 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9330,14 +9330,9 @@ proc check_effective_target_autoincdec { } {
 # 
 proc check_effective_target_supports_stack_clash_protection { } {
 
-   # Temporary until the target bits are fully ACK'd.
-#  if { [istarget aarch*-*-*] } {
-#	return 1
-#  }
-
     if { [istarget x86_64-*-*] || [istarget i?86-*-*] 
 	  || [istarget powerpc*-*-*] || [istarget rs6000*-*-*]
-	  || [istarget s390*-*-*] } {
+	  || [istarget aarch64*-*-*] || [istarget s390*-*-*] } {
 	return 1
     }
   return 0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)]
  2018-09-07 16:03   ` Tamar Christina
@ 2018-09-11 14:49     ` Richard Sandiford
  2018-09-11 15:48       ` Tamar Christina
  2018-09-11 15:55     ` James Greenhalgh
  1 sibling, 1 reply; 8+ messages in thread
From: Richard Sandiford @ 2018-09-11 14:49 UTC (permalink / raw)
  To: Tamar Christina
  Cc: Jeff Law, gcc-patches, nd, James Greenhalgh, Richard Earnshaw,
	Marcus Shawcroft

Tamar Christina <Tamar.Christina@arm.com> writes:
> Hi Richard,
>
> The 08/28/2018 21:58, Richard Sandiford wrote:
>> Tamar Christina <Tamar.Christina@arm.com> writes:
>> > +  HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
>> > + /* When doing the final adjustment for the outgoing argument size
>> > we can't
>> > + assume that LR was saved at position 0.  So subtract it's offset
>> > from the
>> > +     ABI safe buffer so that we don't accidentally allow an adjustment that
>> > +     would result in an allocation larger than the ABI buffer without
>> > +     probing.  */
>> > +  HOST_WIDE_INT min_probe_threshold
>> > +    = final_adjustment_p
>> > +      ? guard_used_by_caller - cfun->machine->frame.reg_offset[LR_REGNUM]
>> > +      : guard_size - guard_used_by_caller;
>> [...]
>> > +  if (residual)
>> > +    {
>> > +      aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
>> > +      if (residual >= min_probe_threshold)
>> > +	{
>> > +	  if (dump_file)
>> > +	    fprintf (dump_file,
>> > +		     "Stack clash AArch64 prologue residuals: "
>> > +		     HOST_WIDE_INT_PRINT_DEC " bytes, probing will be required."
>> > +		     "\n", residual);
>> > +	  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
>> > +					   STACK_CLASH_CALLER_GUARD));
>> 
>> reg_offsets are nonnegative, so if LR_REGNUM isn't saved at position 0,
>> min_probe_threshold will be less than STACK_CLASH_CALLER_GUARD.  It looks
>> like the probe would then write above the region.
>> 
>> Using >= rather than > means that the same thing could happen when
>> LR_REGNUM is at position 0, if the residual is exactly
>> STACK_CLASH_CALLER_GUARD.
>
> That's true. While addressing this we changed how the residuals are probed.
>
> To address a comment you raised offline about the saving of LR when
> calling a no-return function using a tail call and
> -fomit-frame-pointer, I think this should be safe as the code in
> frame_layout (line 4131-4136) would ensure that R30 is saved.

That line number range doesn't seem to match up with current sources.
But my point was that "X is a non-leaf function" does not directly
imply "X saves R30_REGNUM".  It might happen to imply that indirectly
for all cases at the moment, but it's not something that the AArch64
code enforces itself.  AFAICT the only time we force R30 to be saved
explicitly is when emitting a chain:

  if (cfun->machine->frame.emit_frame_chain)
    {
      /* FP and LR are placed in the linkage record.  */
      cfun->machine->frame.reg_offset[R29_REGNUM] = 0;
      cfun->machine->frame.wb_candidate1 = R29_REGNUM;
      cfun->machine->frame.reg_offset[R30_REGNUM] = UNITS_PER_WORD;
      cfun->machine->frame.wb_candidate2 = R30_REGNUM;
      offset = 2 * UNITS_PER_WORD;
    }

Otherwise we only save R30_REGNUM if the df machinery says R30 is live.
And in principle, it should be correct to use a sibcall pattern that
doesn't clobber R30 for a noreturn call even in functions that require
a frame.  We don't do that yet, and it probably doesn't make sense from
a QoI perspective on AArch64, but I don't think it's invalid in terms
of rtl semantics.  There's no real reason why a noreturn function has to
save the address of its caller unless the target forces it to, such as
for frame chains above.

In this patch we're relying on the link between the two concepts for a
security feature, so I think we should either enforce it explicitly or
add an assert like:

  gcc_assert (crtl->is_leaf
	      || (cfun->machine->frame.reg_offset[R30_REGNUM]
		  != SLOT_NOT_REQUIRED));

to aarch64_expand_prologue.  (I don't think we should make the assert
dependent on stack-clash, since the point is that we're assuming this
happens regardless of stack-clash.)

> I have added two new tests to check for this, so that if it does
> change in the future they would fail.

Thanks.

Richard

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)]
  2018-09-11 14:49     ` Richard Sandiford
@ 2018-09-11 15:48       ` Tamar Christina
  2018-09-12 17:22         ` Richard Sandiford
  0 siblings, 1 reply; 8+ messages in thread
From: Tamar Christina @ 2018-09-11 15:48 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Jeff Law, gcc-patches, nd, James Greenhalgh, Richard Earnshaw,
	Marcus Shawcroft

Hi Richard,

> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Tuesday, September 11, 2018 15:49
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Jeff Law <law@redhat.com>; gcc-patches@gcc.gnu.org; nd
> <nd@arm.com>; James Greenhalgh <James.Greenhalgh@arm.com>;
> Richard Earnshaw <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>
> Subject: Re: [PATCH][GCC][AArch64] Updated stack-clash implementation
> supporting 64k probes. [patch (1/7)]
> 
> Tamar Christina <Tamar.Christina@arm.com> writes:
> > Hi Richard,
> >
> > The 08/28/2018 21:58, Richard Sandiford wrote:
> >> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> > +  HOST_WIDE_INT guard_used_by_caller =
> STACK_CLASH_CALLER_GUARD;
> >> > + /* When doing the final adjustment for the outgoing argument size
> >> > we can't
> >> > + assume that LR was saved at position 0.  So subtract it's offset
> >> > from the
> >> > +     ABI safe buffer so that we don't accidentally allow an adjustment
> that
> >> > +     would result in an allocation larger than the ABI buffer without
> >> > +     probing.  */
> >> > +  HOST_WIDE_INT min_probe_threshold
> >> > +    = final_adjustment_p
> >> > +      ? guard_used_by_caller - cfun->machine-
> >frame.reg_offset[LR_REGNUM]
> >> > +      : guard_size - guard_used_by_caller;
> >> [...]
> >> > +  if (residual)
> >> > +    {
> >> > +      aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
> >> > +      if (residual >= min_probe_threshold)
> >> > +	{
> >> > +	  if (dump_file)
> >> > +	    fprintf (dump_file,
> >> > +		     "Stack clash AArch64 prologue residuals: "
> >> > +		     HOST_WIDE_INT_PRINT_DEC " bytes, probing will be
> required."
> >> > +		     "\n", residual);
> >> > +	  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
> >> > +					   STACK_CLASH_CALLER_GUARD));
> >>
> >> reg_offsets are nonnegative, so if LR_REGNUM isn't saved at position
> >> 0, min_probe_threshold will be less than STACK_CLASH_CALLER_GUARD.
> >> It looks like the probe would then write above the region.
> >>
> >> Using >= rather than > means that the same thing could happen when
> >> LR_REGNUM is at position 0, if the residual is exactly
> >> STACK_CLASH_CALLER_GUARD.
> >
> > That's true. While addressing this we changed how the residuals are
> probed.
> >
> > To address a comment you raised offline about the saving of LR when
> > calling a no-return function using a tail call and
> > -fomit-frame-pointer, I think this should be safe as the code in
> > frame_layout (line 4131-4136) would ensure that R30 is saved.
> 
> That line number range doesn't seem to match up with current sources.
> But my point was that "X is a non-leaf function" does not directly imply "X
> saves R30_REGNUM".  It might happen to imply that indirectly for all cases at
> the moment, but it's not something that the AArch64 code enforces itself.
> AFAICT the only time we force R30 to be saved explicitly is when emitting a
> chain:
> 
>   if (cfun->machine->frame.emit_frame_chain)
>     {
>       /* FP and LR are placed in the linkage record.  */
>       cfun->machine->frame.reg_offset[R29_REGNUM] = 0;
>       cfun->machine->frame.wb_candidate1 = R29_REGNUM;
>       cfun->machine->frame.reg_offset[R30_REGNUM] = UNITS_PER_WORD;
>       cfun->machine->frame.wb_candidate2 = R30_REGNUM;
>       offset = 2 * UNITS_PER_WORD;
>     }
> 
> Otherwise we only save R30_REGNUM if the df machinery says R30 is live.
> And in principle, it should be correct to use a sibcall pattern that doesn't
> clobber R30 for a noreturn call even in functions that require a frame.  We
> don't do that yet, and it probably doesn't make sense from a QoI perspective
> on AArch64, but I don't think it's invalid in terms of rtl semantics.  There's no
> real reason why a noreturn function has to save the address of its caller
> unless the target forces it to, such as for frame chains above.
> 
> In this patch we're relying on the link between the two concepts for a
> security feature, so I think we should either enforce it explicitly or add an
> assert like:
> 
>   gcc_assert (crtl->is_leaf
> 	      || (cfun->machine->frame.reg_offset[R30_REGNUM]
> 		  != SLOT_NOT_REQUIRED));
> 
> to aarch64_expand_prologue.  (I don't think we should make the assert
> dependent on stack-clash, since the point is that we're assuming this
> happens regardless of stack-clash.)

I agree that the assert would be a good idea, though I'm not sure always enabling it is.
I'm not sure what languages that don't use stack-clash-protection but do their own
probing, like Ada or Go, do.

Regards,
Tamar

> 
> > I have added two new tests to check for this, so that if it does
> > change in the future they would fail.
> 
> Thanks.
> 
> Richard

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)]
  2018-09-07 16:03   ` Tamar Christina
  2018-09-11 14:49     ` Richard Sandiford
@ 2018-09-11 15:55     ` James Greenhalgh
  2018-10-09  6:38       ` Tamar Christina
  1 sibling, 1 reply; 8+ messages in thread
From: James Greenhalgh @ 2018-09-11 15:55 UTC (permalink / raw)
  To: Tamar Christina
  Cc: Richard Sandiford, Jeff Law, gcc-patches, nd, Richard Earnshaw,
	Marcus Shawcroft

On Fri, Sep 07, 2018 at 11:03:28AM -0500, Tamar Christina wrote:
> Hi Richard,
> 
> The 08/28/2018 21:58, Richard Sandiford wrote:
> > Tamar Christina <Tamar.Christina@arm.com> writes:
> > > +  HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
> > > +  /* When doing the final adjustment for the outgoing argument size we can't
> > > +     assume that LR was saved at position 0.  So subtract it's offset from the
> > > +     ABI safe buffer so that we don't accidentally allow an adjustment that
> > > +     would result in an allocation larger than the ABI buffer without
> > > +     probing.  */
> > > +  HOST_WIDE_INT min_probe_threshold
> > > +    = final_adjustment_p
> > > +      ? guard_used_by_caller - cfun->machine->frame.reg_offset[LR_REGNUM]
> > > +      : guard_size - guard_used_by_caller;
> > [...]
> > > +  if (residual)
> > > +    {
> > > +      aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
> > > +      if (residual >= min_probe_threshold)
> > > +	{
> > > +	  if (dump_file)
> > > +	    fprintf (dump_file,
> > > +		     "Stack clash AArch64 prologue residuals: "
> > > +		     HOST_WIDE_INT_PRINT_DEC " bytes, probing will be required."
> > > +		     "\n", residual);
> > > +	  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
> > > +					   STACK_CLASH_CALLER_GUARD));
> > 
> > reg_offsets are nonnegative, so if LR_REGNUM isn't saved at position 0,
> > min_probe_threshold will be less than STACK_CLASH_CALLER_GUARD.  It looks
> > like the probe would then write above the region.
> > 
> > Using >= rather than > means that the same thing could happen when
> > LR_REGNUM is at position 0, if the residual is exactly
> > STACK_CLASH_CALLER_GUARD.
> 
> That's true. While addressing this we changed how the residuals are probed.
> 
> To address a comment you raised offline about the saving of LR when calling
> a no-return function using a tail call and -fomit-frame-pointer, I think this should
> be safe as the code in frame_layout (line 4131-4136) would ensure that R30 is saved.
> 
> I have added two new tests to check for this, so that if it does change in the future they
> would fail. 
> 
> Attached is the updated patch and new changelog
> 
> Ok for trunk?

I'm happy with this patch version; I'd have preferred a FORNOW comment on this:

> +  /* If SIZE is not large enough to require probing, just adjust the stack and
> +     exit.  */
> +  if (!poly_size.is_constant (&size)
> +      || known_lt (poly_size, min_probe_threshold)
> +      || !flag_stack_clash_protection)

as you don't fix it until 2/7, but that is a minor point.

I'm happy with you responding to Richard S' request for an assert either in
this patch, or tacked on as an 8/7.

OK.

Thanks,
James

> Thanks,
> Tamar
> 
> gcc/
> 2018-09-07  Jeff Law  <law@redhat.com>
> 	    Richard Sandiford <richard.sandiford@linaro.org>
> 	    Tamar Christina  <tamar.christina@arm.com>
> 
> 	PR target/86486
> 	* config/aarch64/aarch64.md
> 	(probe_stack_range): Add k (SP) constraint.
> 	* config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
> 	STACK_CLASH_MAX_UNROLL_PAGES): New.
> 	* config/aarch64/aarch64.c (aarch64_output_probe_stack_range): Emit
> 	stack probes for stack clash.
> 	(aarch64_allocate_and_probe_stack_space): New.
> 	(aarch64_expand_prologue): Use it.
> 	(aarch64_expand_epilogue): Likewise and update IP regs re-use criteria.
> 	(aarch64_sub_sp): Add emit_move_imm optional param.
> 
> gcc/testsuite/
> 2018-09-07  Jeff Law  <law@redhat.com>
> 	    Richard Sandiford <richard.sandiford@linaro.org>
> 	    Tamar Christina  <tamar.christina@arm.com>
> 
> 	PR target/86486
> 	* gcc.target/aarch64/stack-check-12.c: New.
> 	* gcc.target/aarch64/stack-check-13.c: New.
> 	* gcc.target/aarch64/stack-check-cfa-1.c: New.
> 	* gcc.target/aarch64/stack-check-cfa-2.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-1.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-10.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-11.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-12.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-13.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-14.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-15.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-2.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-3.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-4.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-5.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-6.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-7.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-8.c: New.
> 	* gcc.target/aarch64/stack-check-prologue-9.c: New.
> 	* gcc.target/aarch64/stack-check-prologue.h: New.
> 	* lib/target-supports.exp
> 	(check_effective_target_supports_stack_clash_protection): Add AArch64.
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)]
  2018-09-11 15:48       ` Tamar Christina
@ 2018-09-12 17:22         ` Richard Sandiford
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Sandiford @ 2018-09-12 17:22 UTC (permalink / raw)
  To: Tamar Christina
  Cc: Jeff Law, gcc-patches, nd, James Greenhalgh, Richard Earnshaw,
	Marcus Shawcroft

Tamar Christina <Tamar.Christina@arm.com> writes:
> Hi Richard,
>
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Tuesday, September 11, 2018 15:49
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: Jeff Law <law@redhat.com>; gcc-patches@gcc.gnu.org; nd
>> <nd@arm.com>; James Greenhalgh <James.Greenhalgh@arm.com>;
>> Richard Earnshaw <Richard.Earnshaw@arm.com>; Marcus Shawcroft
>> <Marcus.Shawcroft@arm.com>
>> Subject: Re: [PATCH][GCC][AArch64] Updated stack-clash implementation
>> supporting 64k probes. [patch (1/7)]
>> 
>> Tamar Christina <Tamar.Christina@arm.com> writes:
>> > Hi Richard,
>> >
>> > The 08/28/2018 21:58, Richard Sandiford wrote:
>> >> Tamar Christina <Tamar.Christina@arm.com> writes:
>> >> > +  HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
>> >> > +  /* When doing the final adjustment for the outgoing argument size we can't
>> >> > +     assume that LR was saved at position 0.  So subtract it's offset from the
>> >> > +     ABI safe buffer so that we don't accidentally allow an adjustment that
>> >> > +     would result in an allocation larger than the ABI buffer without
>> >> > +     probing.  */
>> >> > +  HOST_WIDE_INT min_probe_threshold
>> >> > +    = final_adjustment_p
>> >> > +      ? guard_used_by_caller - cfun->machine->frame.reg_offset[LR_REGNUM]
>> >> > +      : guard_size - guard_used_by_caller;
>> >> [...]
>> >> > +  if (residual)
>> >> > +    {
>> >> > +      aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
>> >> > +      if (residual >= min_probe_threshold)
>> >> > +	{
>> >> > +	  if (dump_file)
>> >> > +	    fprintf (dump_file,
>> >> > +		     "Stack clash AArch64 prologue residuals: "
>> >> > +		     HOST_WIDE_INT_PRINT_DEC " bytes, probing will be required."
>> >> > +		     "\n", residual);
>> >> > +	  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
>> >> > +					   STACK_CLASH_CALLER_GUARD));
>> >>
>> >> reg_offsets are nonnegative, so if LR_REGNUM isn't saved at position
>> >> 0, min_probe_threshold will be less than STACK_CLASH_CALLER_GUARD.
>> >> It looks like the probe would then write above the region.
>> >>
>> >> Using >= rather than > means that the same thing could happen when
>> >> LR_REGNUM is at position 0, if the residual is exactly
>> >> STACK_CLASH_CALLER_GUARD.
>> >
>> > That's true. While addressing this we changed how the residuals are
>> > probed.
>> >
>> > To address a comment you raised offline about the saving of LR when
>> > calling a no-return function using a tail call and
>> > -fomit-frame-pointer, I think this should be safe as the code in
>> > frame_layout (line 4131-4136) would ensure that R30 is saved.
>> 
>> That line number range doesn't seem to match up with current sources.
>> But my point was that "X is a non-leaf function" does not directly imply "X
>> saves R30_REGNUM".  It might happen to imply that indirectly for all cases at
>> the moment, but it's not something that the AArch64 code enforces itself.
>> AFAICT the only time we force R30 to be saved explicitly is when emitting a
>> chain:
>> 
>>   if (cfun->machine->frame.emit_frame_chain)
>>     {
>>       /* FP and LR are placed in the linkage record.  */
>>       cfun->machine->frame.reg_offset[R29_REGNUM] = 0;
>>       cfun->machine->frame.wb_candidate1 = R29_REGNUM;
>>       cfun->machine->frame.reg_offset[R30_REGNUM] = UNITS_PER_WORD;
>>       cfun->machine->frame.wb_candidate2 = R30_REGNUM;
>>       offset = 2 * UNITS_PER_WORD;
>>     }
>> 
>> Otherwise we only save R30_REGNUM if the df machinery says R30 is live.
>> And in principle, it should be correct to use a sibcall pattern that doesn't
>> clobber R30 for a noreturn call even in functions that require a frame.  We
>> don't do that yet, and it probably doesn't make sense from a QoI perspective
>> on AArch64, but I don't think it's invalid in terms of rtl semantics.  There's no
>> real reason why a noreturn function has to save the address of its caller
>> unless the target forces it to, such as for frame chains above.
>> 
>> In this patch we're relying on the link between the two concepts for a
>> security feature, so I think we should either enforce it explicitly or add an
>> assert like:
>> 
>>   gcc_assert (crtl->is_leaf
>> 	      || (cfun->machine->frame.reg_offset[R30_REGNUM]
>> 		  != SLOT_NOT_REQUIRED));
>> 
>> to aarch64_expand_prologue.  (I don't think we should make the assert
>> dependent on stack-clash, since the point is that we're assuming this
>> happens regardless of stack-clash.)
>
> I agree that the assert would be a good idea, though I'm not sure
> always enabling it is a good idea. I'm not sure what other languages
> that don't use stack-clash protection and do their own probing, like
> Ada or Go, would do.

I think the argument was that R30 will always be saved in non-leaf
frames without any specific help from the target (which I have my
doubts about, but might be true in practice).  The mechanism by which
that happens doesn't include a test for stack-clash, so I think we
should have the courage of our convictions and not test it in the
assert either.

If we're not comfortable asserting without the stack-clash check
as things stand, I think we should instead make the layout code
force R30 to be saved for stack-clash.  Then including stack-clash
in the assert makes sense, since it ties to a specific piece of
the layout code.

Thanks,
Richard


* RE: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)]
  2018-09-11 15:55     ` James Greenhalgh
@ 2018-10-09  6:38       ` Tamar Christina
  0 siblings, 0 replies; 8+ messages in thread
From: Tamar Christina @ 2018-10-09  6:38 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Richard Sandiford, Jeff Law, gcc-patches, nd, Richard Earnshaw,
	Marcus Shawcroft

Hi All,

I'm looking for permission to backport this patch to the GCC-8 branch
to fix PR86486.

OK for backport?

Thanks,
Tamar

> -----Original Message-----
> From: James Greenhalgh <james.greenhalgh@arm.com>
> Sent: Tuesday, September 11, 2018 16:56
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Jeff Law
> <law@redhat.com>; gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard
> Earnshaw <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>
> Subject: Re: [PATCH][GCC][AArch64] Updated stack-clash implementation
> supporting 64k probes. [patch (1/7)]
> 
> On Fri, Sep 07, 2018 at 11:03:28AM -0500, Tamar Christina wrote:
> > Hi Richard,
> >
> > The 08/28/2018 21:58, Richard Sandiford wrote:
> > > Tamar Christina <Tamar.Christina@arm.com> writes:
> > > > +  HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
> > > > +  /* When doing the final adjustment for the outgoing argument size we can't
> > > > +     assume that LR was saved at position 0.  So subtract it's offset from the
> > > > +     ABI safe buffer so that we don't accidentally allow an adjustment that
> > > > +     would result in an allocation larger than the ABI buffer without
> > > > +     probing.  */
> > > > +  HOST_WIDE_INT min_probe_threshold
> > > > +    = final_adjustment_p
> > > > +      ? guard_used_by_caller - cfun->machine->frame.reg_offset[LR_REGNUM]
> > > > +      : guard_size - guard_used_by_caller;
> > > [...]
> > > > +  if (residual)
> > > > +    {
> > > > +      aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
> > > > +      if (residual >= min_probe_threshold)
> > > > +	{
> > > > +	  if (dump_file)
> > > > +	    fprintf (dump_file,
> > > > +		     "Stack clash AArch64 prologue residuals: "
> > > > +		     HOST_WIDE_INT_PRINT_DEC " bytes, probing will be required."
> > > > +		     "\n", residual);
> > > > +	  emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
> > > > +					   STACK_CLASH_CALLER_GUARD));
> > >
> > > reg_offsets are nonnegative, so if LR_REGNUM isn't saved at position
> > > 0, min_probe_threshold will be less than STACK_CLASH_CALLER_GUARD.
> > > It looks like the probe would then write above the region.
> > >
> > > Using >= rather than > means that the same thing could happen when
> > > LR_REGNUM is at position 0, if the residual is exactly
> > > STACK_CLASH_CALLER_GUARD.
> >
> > That's true. While addressing this we changed how the residuals are
> > probed.
> >
> > To address a comment you raised offline about the saving of LR when
> > calling a no-return function using a tail call and
> > -fomit-frame-pointer, I think this should be safe as the code in
> > frame_layout (line 4131-4136) would ensure that R30 is saved.
> >
> > I have added two new tests to check for this, so that if it does
> > change in the future they would fail.
> >
> > Attached is the updated patch and new changelog
> >
> > Ok for trunk?
> 
> I'm happy with this patch version; I'd have preferred a FORNOW comment on
> this:
> 
> > +  /* If SIZE is not large enough to require probing, just adjust the stack and
> > +     exit.  */
> > +  if (!poly_size.is_constant (&size)
> > +      || known_lt (poly_size, min_probe_threshold)
> > +      || !flag_stack_clash_protection)
> 
> as you don't fix it until 2/7, but that is a minor point.
> 
> I'm happy with you responding to Richard S' request for an assert either in
> this patch, or tacked on as an 8/7.
> 
> OK.
> 
> Thanks,
> James
> 
> > Thanks,
> > Tamar
> >
> > gcc/
> > 2018-09-07  Jeff Law  <law@redhat.com>
> > 	    Richard Sandiford <richard.sandiford@linaro.org>
> > 	    Tamar Christina  <tamar.christina@arm.com>
> >
> > 	PR target/86486
> > 	* config/aarch64/aarch64.md
> > 	(probe_stack_range): Add k (SP) constraint.
> > 	* config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
> > 	STACK_CLASH_MAX_UNROLL_PAGES): New.
> > 	* config/aarch64/aarch64.c (aarch64_output_probe_stack_range): Emit
> > 	stack probes for stack clash.
> > 	(aarch64_allocate_and_probe_stack_space): New.
> > 	(aarch64_expand_prologue): Use it.
> > 	(aarch64_expand_epilogue): Likewise and update IP regs re-use criteria.
> > 	(aarch64_sub_sp): Add emit_move_imm optional param.
> >
> > gcc/testsuite/
> > 2018-09-07  Jeff Law  <law@redhat.com>
> > 	    Richard Sandiford <richard.sandiford@linaro.org>
> > 	    Tamar Christina  <tamar.christina@arm.com>
> >
> > 	PR target/86486
> > 	* gcc.target/aarch64/stack-check-12.c: New.
> > 	* gcc.target/aarch64/stack-check-13.c: New.
> > 	* gcc.target/aarch64/stack-check-cfa-1.c: New.
> > 	* gcc.target/aarch64/stack-check-cfa-2.c: New.
> > 	* gcc.target/aarch64/stack-check-prologue-1.c: New.
> > 	* gcc.target/aarch64/stack-check-prologue-10.c: New.
> > 	* gcc.target/aarch64/stack-check-prologue-11.c: New.
> > 	* gcc.target/aarch64/stack-check-prologue-12.c: New.
> > 	* gcc.target/aarch64/stack-check-prologue-13.c: New.
> > 	* gcc.target/aarch64/stack-check-prologue-14.c: New.
> > 	* gcc.target/aarch64/stack-check-prologue-15.c: New.
> > 	* gcc.target/aarch64/stack-check-prologue-2.c: New.
> > 	* gcc.target/aarch64/stack-check-prologue-3.c: New.
> > 	* gcc.target/aarch64/stack-check-prologue-4.c: New.
> > 	* gcc.target/aarch64/stack-check-prologue-5.c: New.
> > 	* gcc.target/aarch64/stack-check-prologue-6.c: New.
> > 	* gcc.target/aarch64/stack-check-prologue-7.c: New.
> > 	* gcc.target/aarch64/stack-check-prologue-8.c: New.
> > 	* gcc.target/aarch64/stack-check-prologue-9.c: New.
> > 	* gcc.target/aarch64/stack-check-prologue.h: New.
> > 	* lib/target-supports.exp
> > 	(check_effective_target_supports_stack_clash_protection): Add AArch64.
> >


end of thread, other threads:[~2018-10-09  6:38 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-28 12:18 [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)] Tamar Christina
2018-08-28 20:58 ` Richard Sandiford
2018-09-07 16:03   ` Tamar Christina
2018-09-11 14:49     ` Richard Sandiford
2018-09-11 15:48       ` Tamar Christina
2018-09-12 17:22         ` Richard Sandiford
2018-09-11 15:55     ` James Greenhalgh
2018-10-09  6:38       ` Tamar Christina
