[PATCH][RFA/RFC] Stack clash mitigation patch 01/08

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH][RFA/RFC] Stack clash mitigation patch 01/08
@ 2017-07-11 21:20 Jeff Law
  2017-07-13  0:32 ` Segher Boessenkool
  2017-07-14  0:38 ` David Malcolm
  0 siblings, 2 replies; 13+ messages in thread
From: Jeff Law @ 2017-07-11 21:20 UTC (permalink / raw)
  To: gcc-patches@gcc.gnu.org >> gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1561 bytes --]

This is the first patch in the stack-clash mitigation patches.

It introduces a new style of stack probing -fstack-check=clash,
including documentation of the new option, how it differs from
-fstack-check=specific, etc.

FWIW -fstack-check=specific is dreadfully named.  I haven't tried to
address that.

It also introduces some dejagnu bits that are later used in tests.  The
idea was to introduce dejagnu functions which describe aspects of the
target and have the tests adjust their expectations based on those
dejagnu functions rather than on a target name.

Finally, this patch introduces one new test of note.  Some targets have
call instructions that store a return pointer into the stack and we take
advantage of that ISA feature to avoid some explicit probes.

This optimization is restricted to cases where the caller does not have
a frame of its own (because there's no reasonable way to tear that frame
down on the return path).

However, a sufficiently smart compiler could realize that a call to a
noreturn function could be converted into a jump, even if the caller has
a frame because that frame need not be torn down.  Thus it would be
possible for a function calling a noreturn function to advance the stack
into the guard without actually touching the guard page, which breaks
the assumption that the call instruction would touch the guard
triggering a fault for that case.

GCC doesn't currently optimize that case for various reasons, but it
seemed prudent to go ahead and explicitly verify that with a test.

Thoughts?  Ok for the trunk?


[-- Attachment #2: P1 --]
[-- Type: text/plain, Size: 10476 bytes --]


	* common.opt (-fstack-check=clash): New option.
	* flag-types.h (enum stack_check_type): Improve comments.
	(STACK_CLASH_BUILTIN_STACK_CHECK): New stack_check_type.
	* opts.c (common_handle_option): Handle -fstack-check=clash.
	* doc/invoke.texi (-fstack-check=clash): Document new option.
	(-fstack-check): Note additional problem with -fstack-check=generic.
	Note differences between "clash" and "specific", fallbacks
	and recommendations based on expected use.

testsuite/

	* gcc.dg/stack-check-2.c: New test.
	* lib/target-supports.exp
	(check_effective_target_stack_clash_protected): New function.
	(check_effective_target_frame_pointer_for_non_leaf): Likewise.
	(check_effective_target_caller_implicit_probes): Likewise.



commit 018fffb569512eccd6b77410e28caa159009df5d
Author: Jeff Law <law@torsion.usersys.redhat.com>
Date:   Thu Jun 29 16:36:21 2017 -0400

    Recongize new option + test of noreturn functions
    Dejagnu primitivies for stack probe checking

diff --git a/gcc/common.opt b/gcc/common.opt
index e81165c..8eec29f 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2300,7 +2300,7 @@ Apply variable expansion when loops are unrolled.
 
 fstack-check=
 Common Report RejectNegative Joined
--fstack-check=[no|generic|specific]	Insert stack checking code into the program.
+-fstack-check=[no|generic|clash|specific]	Insert stack checking code into the program.
 
 fstack-check
 Common Alias(fstack-check=, specific, no)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3e5cee8..e72f3e9 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -11324,8 +11324,9 @@ generation of code to ensure that they see the stack being extended.
 
 You can additionally specify a string parameter: @samp{no} means no
 checking, @samp{generic} means force the use of old-style checking,
-@samp{specific} means use the best checking method and is equivalent
-to bare @option{-fstack-check}.
+@samp{clash} means use a checking method designed to prevent stack clash
+style attacks, @samp{specific} means use target specific
+checking methods and is equivalent to bare @option{-fstack-check}.
 
 Old-style checking is a generic mechanism that requires no specific
 target support in the compiler but comes with the following drawbacks:
@@ -11333,7 +11334,8 @@ target support in the compiler but comes with the following drawbacks:
 @enumerate
 @item
 Modified allocation strategy for large objects: they are always
-allocated dynamically if their size exceeds a fixed threshold.
+allocated dynamically if their size exceeds a fixed threshold.  Note this
+may change the semantics of some code.
 
 @item
 Fixed limit on the size of the static frame of functions: when it is
@@ -11345,8 +11347,25 @@ Inefficiency: because of both the modified allocation strategy and the
 generic implementation, code performance is hampered.
 @end enumerate
 
-Note that old-style stack checking is also the fallback method for
-@samp{specific} if no target support has been added in the compiler.
+Note that old-style stack checking is the fallback method for @samp{clash}
+and @samp{specific} if no target support for either of those has been added
+in the compiler.
+
+Also note that @samp{clash} requires target dependent prologue sequences that
+are different than @samp{specific}.  However, some targets may use
+@samp{specific} style prologues if they have not defined @samp{clash} style
+prologues at the loss of some functionality (such as @command{valgrind}) and
+protection in certain difficult to trigger corner cases.
+
+@samp{specific} style checking is designed for Ada's needs to detect
+infinite recursion and stack overflows.  @samp{specific} is an excellent
+choice when compiling Ada code.  It is not generally sufficient to protect
+against attacks that jump the stack guard page.
+
+@samp{clash} is designed to prevent jumping the stack guard page as seen in
+stack clash style attacks.  However, it does not guarantee sufficient stack
+space to handle a signal in the event that the program hits the stack guard
+and is thus incompatible with Ada's needs.
 
 @item -fstack-limit-register=@var{reg}
 @itemx -fstack-limit-symbol=@var{sym}
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 5faade5..9fad822 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -178,11 +178,27 @@ enum stack_check_type
 
   /* Check the stack and rely on the target configuration files to
      check the static frame of functions, i.e. use the generic
-     mechanism only for dynamic stack allocations.  */
+     mechanism only for dynamic stack allocations.
+
+     This approach attempts to guarantee enough stack space is always
+     available to handle a signal and assumes the entire program is
+     compiled with stack checking.  */
   STATIC_BUILTIN_STACK_CHECK,
 
+  /* Check the stack and rely on the target configuration files to
+     check the static frame of functions, i.e. use the generic
+     mechanism only for dynamic stack allocations.
+
+     This approach attempts to make code immune to attacks which jump
+     the stack guard page and stack clash style attacks.  It works
+     is mixed code, but does not guarantee the ability to handle a
+     signal.  */
+  STACK_CLASH_BUILTIN_STACK_CHECK,
+
   /* Check the stack and entirely rely on the target configuration
-     files, i.e. do not use the generic mechanism at all.  */
+     files, i.e. do not use the generic mechanism at all.  This
+     does not prevent stack guard jumping and stack clash style
+     attacks.  */
   FULL_BUILTIN_STACK_CHECK
 };
 
diff --git a/gcc/opts.c b/gcc/opts.c
index 7460c2b..61f5bb0 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -2243,6 +2243,19 @@ common_handle_option (struct gcc_options *opts,
 	opts->x_flag_stack_check = STACK_CHECK_BUILTIN
 			   ? FULL_BUILTIN_STACK_CHECK
 			   : GENERIC_STACK_CHECK;
+      else if (!strcmp (arg, "clash"))
+	{
+	  /* This is the stack checking method, designed to prevent
+	     stack-clash attacks.  */
+	  if (!STACK_GROWS_DOWNWARD)
+	    sorry ("-fstack-check=clash not implemented on this target");
+	  else
+	    opts->x_flag_stack_check = (STACK_CHECK_BUILTIN
+				        ? FULL_BUILTIN_STACK_CHECK
+					: (STACK_CHECK_STATIC_BUILTIN
+					   ? STACK_CLASH_BUILTIN_STACK_CHECK
+					   : GENERIC_STACK_CHECK));
+	}
       else if (!strcmp (arg, "specific"))
 	/* This is the new stack checking method.  */
 	opts->x_flag_stack_check = STACK_CHECK_BUILTIN
diff --git a/gcc/testsuite/gcc.dg/stack-check-2.c b/gcc/testsuite/gcc.dg/stack-check-2.c
new file mode 100644
index 0000000..8be7dee
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/stack-check-2.c
@@ -0,0 +1,66 @@
+/* The goal here is to ensure that we never consider a call to a noreturn
+   function as a potential tail call.
+
+   Right now GCC discovers potential tail calls by looking at the
+   predecessors of the exit block.  A call to a non-return function
+   has no successors and thus can never match that first filter.
+
+   But that could change one day and we want to catch it.  The problem
+   is the compiler could potentially optimize a tail call to a nonreturn
+   function, even if the caller has a frame.  That breaks the assumption
+   that calls probe *sp when saving the return address that some targets
+   depend on to elide stack probes.  */
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-check=clash -fdump-tree-tailc -fdump-tree-optimized" } */
+/* { dg-require-effective-target stack_clash_protected } */
+
+extern void foo (void) __attribute__ ((__noreturn__));
+
+
+void
+test_direct_1 (void)
+{
+  foo ();
+}
+
+void
+test_direct_2 (void)
+{
+  return foo ();
+}
+
+void (*indirect)(void)__attribute__ ((noreturn));
+
+
+void
+test_indirect_1 ()
+{
+  (*indirect)();
+}
+
+void
+test_indirect_2 (void)
+{
+  return (*indirect)();;
+}
+
+
+typedef void (*pvfn)() __attribute__ ((noreturn));
+
+void (*indirect_casted)(void);
+
+void
+test_indirect_casted_1 ()
+{
+  (*(pvfn)indirect_casted)();
+}
+
+void
+test_indirect_casted_2 (void)
+{
+  return (*(pvfn)indirect_casted)();
+}
+/* { dg-final { scan-tree-dump-not "tail call" "tailc" } } */
+/* { dg-final { scan-tree-dump-not "tail call" "optimized" } } */
+
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index fe5e777..61ffb68 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8468,3 +8468,61 @@ proc check_effective_target_arm_coproc4_ok { } {
     return [check_cached_effective_target arm_coproc4_ok \
 		check_effective_target_arm_coproc4_ok_nocache]
 }
+
+# Return 1 if the target has support for stack probing designed
+# to avoid stack-clash style attacks
+#
+# This is used to restrict the stack-clash mitigation tests to
+# just those targets that have been explicitly supported
+# 
+# In addition to the prologue work on those targets, each target's
+# properties should be described in the functions below so that
+# tests do not become a mess of unreadable target conditions.
+# 
+proc check_effective_target_stack_clash_protected { } {
+  if { [istarget aarch*-*-*] || [istarget x86_64-*-*]
+       || [istarget i?86-*-*] || [istarget s390*-*-*]
+       || [istarget powerpc*-*-*] || [istarget rs6000*-*-*] } {
+	return 1
+  }
+  return 0
+}
+
+# Return 1 if the target creates a frame pointer for non-leaf functions
+# Note we ignore cases where we apply tail call optimization here.
+proc check_effective_target_frame_pointer_for_non_leaf { } {
+  if { [istarget aarch*-*-*] } {
+	return 1
+  }
+  return 0
+}
+
+# Return 1 if the target's calling sequence or its ABI
+# create implicit stack probes at *sp at function
+# entry.
+proc check_effective_target_caller_implicit_probes { } {
+  if { [istarget x86_64-*-*] || [istarget i?86-*-*]
+       || [istarget powerpc*-*-*] || [istarget rs6000*-*-*] } {
+	return 1
+  }
+
+  # s390's ABI has a register save area allocated by the
+  # caller for use by the callee.  The mere existence does
+  # not constitute a probe by the caller, but when the slots
+  # used by the callee those stores are implicit probes
+  if { [istarget s390*-*-*] } {
+	return 1
+  }
+
+  return 0
+}
+
+# The stack realignment often causes residual stack allocations and
+# can also make it difficult to elide loops or residual allocations
+# for dynamic allocations
+proc check_effective_target_callee_realigns_stack { } {
+  if { [istarget x86_64-*-*] || [istarget i?86-*-*] } {
+	return 1
+  }
+  return 0
+}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08
  2017-07-11 21:20 [PATCH][RFA/RFC] Stack clash mitigation patch 01/08 Jeff Law
@ 2017-07-13  0:32 ` Segher Boessenkool
  2017-07-13  3:56   ` Jeff Law
  2017-07-14  0:38 ` David Malcolm
  1 sibling, 1 reply; 13+ messages in thread
From: Segher Boessenkool @ 2017-07-13  0:32 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches@gcc.gnu.org >> gcc-patches

Hi!

On Tue, Jul 11, 2017 at 03:19:57PM -0600, Jeff Law wrote:
> It introduces a new style of stack probing -fstack-check=clash,
> including documentation of the new option, how it differs from
> -fstack-check=specific, etc.
> 
> FWIW -fstack-check=specific is dreadfully named.  I haven't tried to
> address that.

-fstack-check=clash is itself not such a super name either.  It's not
checking stack, and it isn't clashing: it just does a store to every
page the stack will touch (minus the first and last or something).

How does this work for targets that want to enable this by default?  How
does that interact with checking for stack size overflow?

> However, a sufficiently smart compiler could realize that a call to a
> noreturn function could be converted into a jump, even if the caller has
> a frame because that frame need not be torn down.

Currently GCC will never make a tail call to a noreturn function (see
calls.c:can_implement_as_sibling_call_p).  It would be nice (for code
size, if nothing else) if that was changed.  But you know all this :-)

> @@ -11333,7 +11334,8 @@ target support in the compiler but comes with the following drawbacks:
>  @enumerate
>  @item
>  Modified allocation strategy for large objects: they are always
> -allocated dynamically if their size exceeds a fixed threshold.
> +allocated dynamically if their size exceeds a fixed threshold.  Note this
> +may change the semantics of some code.

How so?  Semantics of valid (portable) code, as well?  It would be nice
to have some detail here, not just a threat :-)

> +@samp{clash} is designed to prevent jumping the stack guard page as seen in
> +stack clash style attacks.  However, it does not guarantee sufficient stack
> +space to handle a signal in the event that the program hits the stack guard
> +and is thus incompatible with Ada's needs.

Is there some way we could have both?

In principle stack-clash protection is completely orthogonal to
-fstack-check.

>    /* Check the stack and entirely rely on the target configuration
> -     files, i.e. do not use the generic mechanism at all.  */
> +     files, i.e. do not use the generic mechanism at all.  This
> +     does not prevent stack guard jumping and stack clash style
> +     attacks.  */
>    FULL_BUILTIN_STACK_CHECK
>  };

> +      else if (!strcmp (arg, "clash"))
> +	{
> +	  /* This is the stack checking method, designed to prevent
> +	     stack-clash attacks.  */
> +	  if (!STACK_GROWS_DOWNWARD)
> +	    sorry ("-fstack-check=clash not implemented on this target");
> +	  else
> +	    opts->x_flag_stack_check = (STACK_CHECK_BUILTIN
> +				        ? FULL_BUILTIN_STACK_CHECK
> +					: (STACK_CHECK_STATIC_BUILTIN
> +					   ? STACK_CLASH_BUILTIN_STACK_CHECK
> +					   : GENERIC_STACK_CHECK));
> +	}

So targets that define STACK_CHECK_BUILTIN (spu and alpha) do not get
stack clash protection if you asked for it specifically, without warning,
if I read that correctly?

> +# Return 1 if the target has support for stack probing designed
> +# to avoid stack-clash style attacks
> +#
> +# This is used to restrict the stack-clash mitigation tests to
> +# just those targets that have been explicitly supported

Missing full stop, twice.  More later in the file.

> +proc check_effective_target_stack_clash_protected { } {

The name is maybe not so great: nothing is protected until you actually
use the option.  "supported", maybe?

> +# Return 1 if the target's calling sequence or its ABI
> +# create implicit stack probes at *sp at function
> +# entry.
> +proc check_effective_target_caller_implicit_probes { } {

"At function entry" isn't really true for Power ("when setting up a
stack frame", instead -- and you are required to set one up before
calling anything).

> +# The stack realignment often causes residual stack allocations and
> +# can also make it difficult to elide loops or residual allocations
> +# for dynamic allocations
> +proc check_effective_target_callee_realigns_stack { } {

Does this return 1 if the callee always realigns the stack?  Or maybe
realigns the stack?


Segher

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08
  2017-07-13  0:32 ` Segher Boessenkool
@ 2017-07-13  3:56   ` Jeff Law
  2017-07-13 21:32     ` Segher Boessenkool
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Law @ 2017-07-13  3:56 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches@gcc.gnu.org >> gcc-patches

On 07/12/2017 06:31 PM, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Jul 11, 2017 at 03:19:57PM -0600, Jeff Law wrote:
>> It introduces a new style of stack probing -fstack-check=clash,
>> including documentation of the new option, how it differs from
>> -fstack-check=specific, etc.
>>
>> FWIW -fstack-check=specific is dreadfully named.  I haven't tried to
>> address that.
> 
> -fstack-check=clash is itself not such a super name either.  It's not
> checking stack, and it isn't clashing: it just does a store to every
> page the stack will touch (minus the first and last or something).
Yea.  I don't particularly like it either.  Suggestions?  I considered
"probe" as well, but "specific" also does probing.  In the end I used
the part of the marketing name of the exploits.

I also considered trying to separate the fact that we are doing stack
probing from the exact method of probing.  So something like:

-fstack-check -fstack-check-probe=frob

And the current default would map to

-fstack-check -fstack-check-probe=ahead <probing ahead of need>

I didn't particularly like that either and using something like
-fstack-check -fstack-check-probe=frob is a bit cumbersome.  It's also
the case that we're changing two key aspects relative to
-fstack-check=specific.

1. We never probe ahead of need.  ie, if a function requests N bytes of
stack space, then we'll never probe beyond N bytes.  In contrast
-fstack-check=specific will tend to probe 2-3 pages beyond the N byte
request.

2. We probe as each page is allocated.  in contrast most
-fstack-check=specific implementations allocate all the space, then
probe into the space.

THe concept I keep coming back to is that we're changing probing policy.
 as-needed vs ahead-of-need for how much to probe.  And "as-allocated"
rather than "after-allocation" for when we emit the probe.

Certainly open to ideas on the interface aspects.

> 
> How does this work for targets that want to enable this by default?  How
> does that interact with checking for stack size overflow?
I don't currently have a way to enable it by default -- for my tests I
just slam the value I want into the initializer in common.opt :-)

It's independent of stack size overflow checking.  They could (in
theory) even co-exist on ports that support stack size overflow
checking, but I didn't test that.

> 
>> However, a sufficiently smart compiler could realize that a call to a
>> noreturn function could be converted into a jump, even if the caller has
>> a frame because that frame need not be torn down.
> 
> Currently GCC will never make a tail call to a noreturn function (see
> calls.c:can_implement_as_sibling_call_p).  It would be nice (for code
> size, if nothing else) if that was changed.  But you know all this :-)
:-)  Initially when I started looking at this issue I fully expected to
need to explicitly reject tail calls to noreturn functions to make
things safe.  Imagine my surprise when I found out that we look at the
preds of the exit block to find opportunities.  Which obviously doesn't
work with noreturn calls.

On the theory that someone might want to change that one day, there is a
test in the suite that will complain if we have a tail optimized call to
a noreturn function.  At the least it will spark a discussion about the
validity of such an optimization in a world where jumping the guard is a
concern.


> 
>> @@ -11333,7 +11334,8 @@ target support in the compiler but comes with the following drawbacks:
>>  @enumerate
>>  @item
>>  Modified allocation strategy for large objects: they are always
>> -allocated dynamically if their size exceeds a fixed threshold.
>> +allocated dynamically if their size exceeds a fixed threshold.  Note this
>> +may change the semantics of some code.
> 
> How so?  Semantics of valid (portable) code, as well?  It would be nice
> to have some detail here, not just a threat :-)
I'd have to go back and run the testsuite with generic checking enabled
by default.  I didn't dig deeply (because generic testing just isn't
interesting for so many reasons).  But what happened was we had a large
auto object, which gets turned into an alloca'd object with generic
checking.

We created that alloca'd object at the wrong lexical scope which mucked
up its expected persistence.  I'm sure I'd spot it trivially once I set
up the test again.

> 
>> +@samp{clash} is designed to prevent jumping the stack guard page as seen in
>> +stack clash style attacks.  However, it does not guarantee sufficient stack
>> +space to handle a signal in the event that the program hits the stack guard
>> +and is thus incompatible with Ada's needs.
> 
> Is there some way we could have both?
With kernel support, yes.  The kernel would set up a reserved stack area
contiguous with the guard and the two areas would move in unison.  Once
they're unable to move, the kernel could map in the reserved page(s) and
trigger the signal handler.


> 
> In principle stack-clash protection is completely orthogonal to
> -fstack-check.
To some degree, yes.  But on other levels they are tightly related.

In fact, the indirection with get_stack_check_protect allows us to use
the existing -fstack-check=specific prologue code in many targets to
give a fairly high amount of stack-clash protection.

The biggest problem is that -fstack-check=specific code allocates all
the stack it needs.  Then after allocation is complete it goes back and
probes the pages it allocated.

That leaves the target vulnerable to an async signal arriving after
allocation, but before probing and the signal handler running on a
clashed stack.


> 
>>    /* Check the stack and entirely rely on the target configuration
>> -     files, i.e. do not use the generic mechanism at all.  */
>> +     files, i.e. do not use the generic mechanism at all.  This
>> +     does not prevent stack guard jumping and stack clash style
>> +     attacks.  */
>>    FULL_BUILTIN_STACK_CHECK
>>  };
> 
>> +      else if (!strcmp (arg, "clash"))
>> +	{
>> +	  /* This is the stack checking method, designed to prevent
>> +	     stack-clash attacks.  */
>> +	  if (!STACK_GROWS_DOWNWARD)
>> +	    sorry ("-fstack-check=clash not implemented on this target");
>> +	  else
>> +	    opts->x_flag_stack_check = (STACK_CHECK_BUILTIN
>> +				        ? FULL_BUILTIN_STACK_CHECK
>> +					: (STACK_CHECK_STATIC_BUILTIN
>> +					   ? STACK_CLASH_BUILTIN_STACK_CHECK
>> +					   : GENERIC_STACK_CHECK));
>> +	}
> 
> So targets that define STACK_CHECK_BUILTIN (spu and alpha) do not get
> stack clash protection if you asked for it specifically, without warning,
> if I read that correctly?
That's an unknown.  I'd have to dig into the guts of the alpha and spu
to understand precisely how their STACK_CHECK_BUILTIN works.


> 
>> +# Return 1 if the target has support for stack probing designed
>> +# to avoid stack-clash style attacks
>> +#
>> +# This is used to restrict the stack-clash mitigation tests to
>> +# just those targets that have been explicitly supported
> 
> Missing full stop, twice.  More later in the file.
Will fix.

> 
>> +proc check_effective_target_stack_clash_protected { } {
> 
> The name is maybe not so great: nothing is protected until you actually
> use the option.  "supported", maybe?
I hate all the names...  "supports_stack_clash_protection" perhaps?


> 
>> +# Return 1 if the target's calling sequence or its ABI
>> +# create implicit stack probes at *sp at function
>> +# entry.
>> +proc check_effective_target_caller_implicit_probes { } {
> 
> "At function entry" isn't really true for Power ("when setting up a
> stack frame", instead -- and you are required to set one up before
> calling anything).
I think it's close enough -- I'll ponder a better name.  s390x doesn't
really have caller implicit probes either, but stack saves in the callee
act like them (because the caller allocates the space for the callee's
save area).


> 
>> +# The stack realignment often causes residual stack allocations and
>> +# can also make it difficult to elide loops or residual allocations
>> +# for dynamic allocations
>> +proc check_effective_target_callee_realigns_stack { } {
> 
> Does this return 1 if the callee always realigns the stack?  Or maybe
> realigns the stack?
Yea, I can clarify that it's the "callee may realign the stack".

Thanks for the comments questions.  I've got more to respond to, but
probably won't get to it tonight.

jeff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08
  2017-07-13  3:56   ` Jeff Law
@ 2017-07-13 21:32     ` Segher Boessenkool
  2017-07-13 22:38       ` Jeff Law
  2017-07-14  7:40       ` Jakub Jelinek
  0 siblings, 2 replies; 13+ messages in thread
From: Segher Boessenkool @ 2017-07-13 21:32 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches@gcc.gnu.org >> gcc-patches

Hi again,

On Wed, Jul 12, 2017 at 09:56:09PM -0600, Jeff Law wrote:
> >> FWIW -fstack-check=specific is dreadfully named.  I haven't tried to
> >> address that.
> > 
> > -fstack-check=clash is itself not such a super name either.  It's not
> > checking stack, and it isn't clashing: it just does a store to every
> > page the stack will touch (minus the first and last or something).
> Yea.  I don't particularly like it either.  Suggestions?  I considered
> "probe" as well, but "specific" also does probing.  In the end I used
> the part of the marketing name of the exploits.

I don't think it should be inside -fstack-check at all.  Sure, the
mechanisms implementing it overlap a bit (more on some targets, less
on others), but how will a user ask for clash protection _and_ for
stack checking?

So something like -fstack-clash-protection, -fstack-touch-all-pages,
together with whatever -fstack-check option is wanted (and yes the
existing ones have very non-specific, non-descriptive, non-obvious names).

> 1. We never probe ahead of need.  ie, if a function requests N bytes of
> stack space, then we'll never probe beyond N bytes.  In contrast
> -fstack-check=specific will tend to probe 2-3 pages beyond the N byte
> request.

s/need/immediate need/.  Nod.

> 2. We probe as each page is allocated.  in contrast most
> -fstack-check=specific implementations allocate all the space, then
> probe into the space.

Right, and that leaves some dangerous openings for exploitation.

> Certainly open to ideas on the interface aspects.

The interface is much harder to change than any aspect of the GCC
implementation.  Have to get it right at once, almost.

> > How does this work for targets that want to enable this by default?  How
> > does that interact with checking for stack size overflow?
> I don't currently have a way to enable it by default -- for my tests I
> just slam the value I want into the initializer in common.opt :-)
> 
> It's independent of stack size overflow checking.  They could (in
> theory) even co-exist on ports that support stack size overflow
> checking, but I didn't test that.

Okay, that was my impression as well.  But that interface won't allow
it (unless every -fstack-check=X gets an -fstack-check=X+clash twin).

> We created that alloca'd object at the wrong lexical scope which mucked
> up its expected persistence.  I'm sure I'd spot it trivially once I set
> up the test again.

That sounds like it might be a bug even.

> >>    /* Check the stack and entirely rely on the target configuration
> >> -     files, i.e. do not use the generic mechanism at all.  */
> >> +     files, i.e. do not use the generic mechanism at all.  This
> >> +     does not prevent stack guard jumping and stack clash style
> >> +     attacks.  */
> >>    FULL_BUILTIN_STACK_CHECK
> >>  };
> > 
> >> +      else if (!strcmp (arg, "clash"))
> >> +	{
> >> +	  /* This is the stack checking method, designed to prevent
> >> +	     stack-clash attacks.  */
> >> +	  if (!STACK_GROWS_DOWNWARD)
> >> +	    sorry ("-fstack-check=clash not implemented on this target");
> >> +	  else
> >> +	    opts->x_flag_stack_check = (STACK_CHECK_BUILTIN
> >> +				        ? FULL_BUILTIN_STACK_CHECK
> >> +					: (STACK_CHECK_STATIC_BUILTIN
> >> +					   ? STACK_CLASH_BUILTIN_STACK_CHECK
> >> +					   : GENERIC_STACK_CHECK));
> >> +	}
> > 
> > So targets that define STACK_CHECK_BUILTIN (spu and alpha) do not get
> > stack clash protection if you asked for it specifically, without warning,
> > if I read that correctly?
> That's an unknown.  I'd have to dig into the guts of the alpha and spu
> to understand precisely how their STACK_CHECK_BUILTIN works.

Both just define it to 1.  So this code then specialises to

      else if (!strcmp (arg, "clash"))
	{
	  opts->x_flag_stack_check = FULL_BUILTIN_STACK_CHECK;
	}

which it says above does *not* protect against stack clash attacks.
Which seems backward.

> >> +proc check_effective_target_stack_clash_protected { } {
> > 
> > The name is maybe not so great: nothing is protected until you actually
> > use the option.  "supported", maybe?
> I hate all the names...  "supports_stack_clash_protection" perhaps?

Works for me!

> >> +# Return 1 if the target's calling sequence or its ABI
> >> +# create implicit stack probes at *sp at function
> >> +# entry.
> >> +proc check_effective_target_caller_implicit_probes { } {
> > 
> > "At function entry" isn't really true for Power ("when setting up a
> > stack frame", instead -- and you are required to set one up before
> > calling anything).
> I think it's close enough -- I'll ponder a better name.  s390x doesn't
> really have caller implicit probes either, but stack saves in the callee
> act like them (because the caller allocates the space for the callee's
> save area).

Oh the name is fine, but maybe expand the comment a bit.  Or even just
s/at function entry/for every function/ ?


Segher

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08
  2017-07-13 21:32     ` Segher Boessenkool
@ 2017-07-13 22:38       ` Jeff Law
  2017-07-13 23:37         ` Segher Boessenkool
  2017-07-14  7:40       ` Jakub Jelinek
  1 sibling, 1 reply; 13+ messages in thread
From: Jeff Law @ 2017-07-13 22:38 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches@gcc.gnu.org >> gcc-patches

On 07/13/2017 03:32 PM, Segher Boessenkool wrote:
> Hi again,
> 
> On Wed, Jul 12, 2017 at 09:56:09PM -0600, Jeff Law wrote:
>>>> FWIW -fstack-check=specific is dreadfully named.  I haven't tried to
>>>> address that.
>>>
>>> -fstack-check=clash is itself not such a super name either.  It's not
>>> checking stack, and it isn't clashing: it just does a store to every
>>> page the stack will touch (minus the first and last or something).
>> Yea.  I don't particularly like it either.  Suggestions?  I considered
>> "probe" as well, but "specific" also does probing.  In the end I used
>> the part of the marketing name of the exploits.
> 
> I don't think it should be inside -fstack-check at all.  Sure, the
> mechanisms implementing it overlap a bit (more on some targets, less
> on others), but how will a user ask for clash protection _and_ for
> stack checking?
The biggest problem with separating them is we would end up with a fair
amount code that ultimately looks like

if (flag_stack_check == STATIC_BUILTIN_STACK_CHECK
    || flag_whatever_the_new_thing_is)
  {
    probe the stack
  }


We can certainly do that.  It'll touch a few more places (particularly
in backends that have checks for Ada stack overflow, but for which I
haven't written stack clash protection), but the changes would be simple
and repetitive.




> So something like -fstack-clash-protection, -fstack-touch-all-pages,
> together with whatever -fstack-check option is wanted (and yes the
> existing ones have very non-specific, non-descriptive, non-obvious names).
Given the heavy use of "stack clash" in the press I'd lean towards
-fstack-clash-protection.  THat makes it easier for folks to find the
right option to protect themselves.

>> Certainly open to ideas on the interface aspects.
> 
> The interface is much harder to change than any aspect of the GCC
> implementation.  Have to get it right at once, almost.
Yup.  It gets backed in even faster than normal in this case I suspect.
Red Hat will almost certainly pick up the bits and start backporting
them to RHEL 7, RHEL 6 and RHEL 5.  So the flag will be out in the wild
before gcc-8 hits the streets.

If you go back to my original message from several weeks ago, getting
the UI right and the basic concepts were my primary goals.  Details like
what probe instruction to use are, IMHO, much less important.


> 
>>> How does this work for targets that want to enable this by default?  How
>>> does that interact with checking for stack size overflow?
>> I don't currently have a way to enable it by default -- for my tests I
>> just slam the value I want into the initializer in common.opt :-)
>>
>> It's independent of stack size overflow checking.  They could (in
>> theory) even co-exist on ports that support stack size overflow
>> checking, but I didn't test that.
> 
> Okay, that was my impression as well.  But that interface won't allow
> it (unless every -fstack-check=X gets an -fstack-check=X+clash twin).
THere may have been some mis-communication here, but it's a good place
to spend some time thinking about interplay between options.

First, there's -fstack-limit-*.  Essentially there's an RTX which
defines the limit for the stack pointer.  If you cross that limit a trap
gets executed.  PPC, s390 and likely other ports have support for this.
It should interoperate with -fstack-clash-protection and/or -fstack-check.

Some ports have their own probing option.  x86 comes to mind.  The x86
uses that for stack probing on windows where extending the stack has to
happen explicitly (which is why they're not subject to stack-clash style
attacks).  The options probably shouldn't be mixed.


There's -fstack-check and -fstack-clash-protection.  I think with the
direction we're going they are fundamentally incompatible because
neither the compiler nor kernel do anything to guarantee enough stack is
available after hitting the guard for -fstack-clash-protection.

And there's shrink wrapping.  This was originally raised by the aarch64
guys and deserves some thought.  I'm particularly worried about aarch64
if it was to shrink-wrap some particular register saves that we wanted
to use as an implicit probe.

There may be others that I'm missing.



> 
>> We created that alloca'd object at the wrong lexical scope which mucked
>> up its expected persistence.  I'm sure I'd spot it trivially once I set
>> up the test again.
> 
> That sounds like it might be a bug even.
That was my thought as well.  It's c-torture/execute/20071029-1.c.


> 
>>>>    /* Check the stack and entirely rely on the target configuration
>>>> -     files, i.e. do not use the generic mechanism at all.  */
>>>> +     files, i.e. do not use the generic mechanism at all.  This
>>>> +     does not prevent stack guard jumping and stack clash style
>>>> +     attacks.  */
>>>>    FULL_BUILTIN_STACK_CHECK
>>>>  };
>>>
>>>> +      else if (!strcmp (arg, "clash"))
>>>> +	{
>>>> +	  /* This is the stack checking method, designed to prevent
>>>> +	     stack-clash attacks.  */
>>>> +	  if (!STACK_GROWS_DOWNWARD)
>>>> +	    sorry ("-fstack-check=clash not implemented on this target");
>>>> +	  else
>>>> +	    opts->x_flag_stack_check = (STACK_CHECK_BUILTIN
>>>> +				        ? FULL_BUILTIN_STACK_CHECK
>>>> +					: (STACK_CHECK_STATIC_BUILTIN
>>>> +					   ? STACK_CLASH_BUILTIN_STACK_CHECK
>>>> +					   : GENERIC_STACK_CHECK));
>>>> +	}
>>>
>>> So targets that define STACK_CHECK_BUILTIN (spu and alpha) do not get
>>> stack clash protection if you asked for it specifically, without warning,
>>> if I read that correctly?
>> That's an unknown.  I'd have to dig into the guts of the alpha and spu
>> to understand precisely how their STACK_CHECK_BUILTIN works.
> 
> Both just define it to 1.  So this code then specialises to
> 
>       else if (!strcmp (arg, "clash"))
> 	{
> 	  opts->x_flag_stack_check = FULL_BUILTIN_STACK_CHECK;
> 	}
> 
> which it says above does *not* protect against stack clash attacks.
> Which seems backward.
Right.  That was one of the other reasons why my initial patches left
those enums alone and introduced a new flag -- probing policy.

Those enum constants, depending on how you read them almost imply levels
of backend support for Ada style stack overflow checking.   That was one
of the reasons why early (not posted) versions of my patch left that
stuff alone -- instead they introduced the separate concept of probing
policy.  Jakub argued internally to put things under -fstack-check.






> 
>>>> +proc check_effective_target_stack_clash_protected { } {
>>>
>>> The name is maybe not so great: nothing is protected until you actually
>>> use the option.  "supported", maybe?
>> I hate all the names...  "supports_stack_clash_protection" perhaps?
> 
> Works for me!
> 
>>>> +# Return 1 if the target's calling sequence or its ABI
>>>> +# create implicit stack probes at *sp at function
>>>> +# entry.
>>>> +proc check_effective_target_caller_implicit_probes { } {
>>>
>>> "At function entry" isn't really true for Power ("when setting up a
>>> stack frame", instead -- and you are required to set one up before
>>> calling anything).
>> I think it's close enough -- I'll ponder a better name.  s390x doesn't
>> really have caller implicit probes either, but stack saves in the callee
>> act like them (because the caller allocates the space for the callee's
>> save area).
> 
> Oh the name is fine, but maybe expand the comment a bit.  Or even just
> s/at function entry/for every function/ ?
I'll use "at function entry" if I can't come up with something better
and expand the comments.

None of the options are strictly correct for s390, but "at function
entry" is less incorrect than "for every function".

jeff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08
  2017-07-13 22:38       ` Jeff Law
@ 2017-07-13 23:37         ` Segher Boessenkool
  2017-07-14  4:36           ` Jeff Law
  0 siblings, 1 reply; 13+ messages in thread
From: Segher Boessenkool @ 2017-07-13 23:37 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches@gcc.gnu.org >> gcc-patches

On Thu, Jul 13, 2017 at 04:38:00PM -0600, Jeff Law wrote:
> On 07/13/2017 03:32 PM, Segher Boessenkool wrote:
> >>> -fstack-check=clash is itself not such a super name either.  It's not
> >>> checking stack, and it isn't clashing: it just does a store to every
> >>> page the stack will touch (minus the first and last or something).
> >> Yea.  I don't particularly like it either.  Suggestions?  I considered
> >> "probe" as well, but "specific" also does probing.  In the end I used
> >> the part of the marketing name of the exploits.
> > 
> > I don't think it should be inside -fstack-check at all.  Sure, the
> > mechanisms implementing it overlap a bit (more on some targets, less
> > on others), but how will a user ask for clash protection _and_ for
> > stack checking?
> The biggest problem with separating them is we would end up with a fair
> amount code that ultimately looks like
> 
> if (flag_stack_check == STATIC_BUILTIN_STACK_CHECK
>     || flag_whatever_the_new_thing_is)
>   {
>     probe the stack
>   }

But only in backends, right?  And some backends will be actually
simpler, afaics.

> We can certainly do that.  It'll touch a few more places (particularly
> in backends that have checks for Ada stack overflow, but for which I
> haven't written stack clash protection), but the changes would be simple
> and repetitive.

Sounds good :-)

> >> Certainly open to ideas on the interface aspects.
> > 
> > The interface is much harder to change than any aspect of the GCC
> > implementation.  Have to get it right at once, almost.
> Yup.  It gets backed in even faster than normal in this case I suspect.
> Red Hat will almost certainly pick up the bits and start backporting
> them to RHEL 7, RHEL 6 and RHEL 5.  So the flag will be out in the wild
> before gcc-8 hits the streets.

Are you planning to backport it for mainline GCC as well?

> >> It's independent of stack size overflow checking.  They could (in
> >> theory) even co-exist on ports that support stack size overflow
> >> checking, but I didn't test that.
> > 
> > Okay, that was my impression as well.  But that interface won't allow
> > it (unless every -fstack-check=X gets an -fstack-check=X+clash twin).
> THere may have been some mis-communication here, but it's a good place
> to spend some time thinking about interplay between options.
> 
> First, there's -fstack-limit-*.  Essentially there's an RTX which
> defines the limit for the stack pointer.  If you cross that limit a trap
> gets executed.  PPC, s390 and likely other ports have support for this.
> It should interoperate with -fstack-clash-protection and/or -fstack-check.

Yup.

> There's -fstack-check and -fstack-clash-protection.  I think with the
> direction we're going they are fundamentally incompatible because
> neither the compiler nor kernel do anything to guarantee enough stack is
> available after hitting the guard for -fstack-clash-protection.

Hrm, I have to think about that.

> And there's shrink wrapping.  This was originally raised by the aarch64
> guys and deserves some thought.  I'm particularly worried about aarch64
> if it was to shrink-wrap some particular register saves that we wanted
> to use as an implicit probe.

For normal shrink-wrapping, the volatile reg stores will always happen
in the same prologue that sets up the stack frame as well, so there
won't be any problem (except the usual stack ties you might need in the
prologue, etc.) (*)  For separate shrink-wrapping, yeah you'll need any
bb that could potentially store to the stack require the component for
the register you use as implicit probe.  This is pretty nasty.

> >> We created that alloca'd object at the wrong lexical scope which mucked
> >> up its expected persistence.  I'm sure I'd spot it trivially once I set
> >> up the test again.
> > 
> > That sounds like it might be a bug even.
> That was my thought as well.  It's c-torture/execute/20071029-1.c.

Ah cool, thanks for digging it up.


Segher


(*) Related, I know you don't want to scan generated code to see if
all probes are in place, but it seems you pretty much have to if you
want to make sure no later pass deletes the probes.  Well, something
for targets to worry about :-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08
  2017-07-11 21:20 [PATCH][RFA/RFC] Stack clash mitigation patch 01/08 Jeff Law
  2017-07-13  0:32 ` Segher Boessenkool
@ 2017-07-14  0:38 ` David Malcolm
  2017-07-14  4:40   ` Jeff Law
  1 sibling, 1 reply; 13+ messages in thread
From: David Malcolm @ 2017-07-14  0:38 UTC (permalink / raw)
  To: Jeff Law, gcc-patches

On Tue, 2017-07-11 at 15:19 -0600, Jeff Law wrote:

[...]

> diff --git a/gcc/opts.c b/gcc/opts.c
> index 7460c2b..61f5bb0 100644
> --- a/gcc/opts.c
> +++ b/gcc/opts.c
> @@ -2243,6 +2243,19 @@ common_handle_option (struct gcc_options
> *opts,
>         opts->x_flag_stack_check = STACK_CHECK_BUILTIN
>                            ? FULL_BUILTIN_STACK_CHECK
>                            : GENERIC_STACK_CHECK;
> +      else if (!strcmp (arg, "clash"))
> +       {
> +         /* This is the stack checking method, designed to prevent
> +            stack-clash attacks.  */
> +         if (!STACK_GROWS_DOWNWARD)
> +           sorry ("-fstack-check=clash not implemented on this
> target");

Minor nitpick: shouldn't options be quoted in diagnostics?

So this should be:

  sorry ("%<-fstack-check=clash%> not implemented on this target");

(or whatever it ends up being called)

[...]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08
  2017-07-13 23:37         ` Segher Boessenkool
@ 2017-07-14  4:36           ` Jeff Law
  2017-07-14 16:34             ` Segher Boessenkool
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Law @ 2017-07-14  4:36 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches@gcc.gnu.org >> gcc-patches

On 07/13/2017 05:37 PM, Segher Boessenkool wrote:
> On Thu, Jul 13, 2017 at 04:38:00PM -0600, Jeff Law wrote:
>> On 07/13/2017 03:32 PM, Segher Boessenkool wrote:
>>>>> -fstack-check=clash is itself not such a super name either.  It's not
>>>>> checking stack, and it isn't clashing: it just does a store to every
>>>>> page the stack will touch (minus the first and last or something).
>>>> Yea.  I don't particularly like it either.  Suggestions?  I considered
>>>> "probe" as well, but "specific" also does probing.  In the end I used
>>>> the part of the marketing name of the exploits.
>>>
>>> I don't think it should be inside -fstack-check at all.  Sure, the
>>> mechanisms implementing it overlap a bit (more on some targets, less
>>> on others), but how will a user ask for clash protection _and_ for
>>> stack checking?
>> The biggest problem with separating them is we would end up with a fair
>> amount code that ultimately looks like
>>
>> if (flag_stack_check == STATIC_BUILTIN_STACK_CHECK
>>     || flag_whatever_the_new_thing_is)
>>   {
>>     probe the stack
>>   }
> 
> But only in backends, right?  And some backends will be actually
> simpler, afaics.
I suspect we might end up with one in explow.c as well.  But yes, it'd
primarily be buried in the targets.


>> Yup.  It gets baked in even faster than normal in this case I suspect.
>> Red Hat will almost certainly pick up the bits and start backporting
>> them to RHEL 7, RHEL 6 and RHEL 5.  So the flag will be out in the wild
>> before gcc-8 hits the streets.
> 
> Are you planning to backport it for mainline GCC as well?
The hope is we reach a consensus for mainline GCC, then we backport
whatever goes into mainline to the relevant RHEL releases.  THe
backports into RHEL would likely happen before gcc-8 hits the streets.

I've got a little time before I have to invert that process and go into
RHEL first.   I *really* want to avoid that for a multitude of reasons.


> 
>> There's -fstack-check and -fstack-clash-protection.  I think with the
>> direction we're going they are fundamentally incompatible because
>> neither the compiler nor kernel do anything to guarantee enough stack is
>> available after hitting the guard for -fstack-clash-protection.
> 
> Hrm, I have to think about that.
Even if the kernel implements the reserved page stuff mentioned earlier
in the thread, I'm not sure when we'd be able to reliably depend on that
capability (or even check for it).

ISTM that -fstack-check continues to be Ada centric and we have a new
option to deal with stack-clash protection and the two simply just
aren't allowed to be enabled together.


> 
>> And there's shrink wrapping.  This was originally raised by the aarch64
>> guys and deserves some thought.  I'm particularly worried about aarch64
>> if it was to shrink-wrap some particular register saves that we wanted
>> to use as an implicit probe.
> 
> For normal shrink-wrapping, the volatile reg stores will always happen
> in the same prologue that sets up the stack frame as well, so there
> won't be any problem (except the usual stack ties you might need in the
> prologue, etc.) (*)  For separate shrink-wrapping, yeah you'll need any
> bb that could potentially store to the stack require the component for
> the register you use as implicit probe.  This is pretty nasty.
I'm going to try and look at this tomorrow to get a feel for the aarch64
implementation separate shrink wrapping.

For ppc I'm not too worried -- the implicit backchain probes are so
effective at eliminating explicit probes that I didn't bother writing
any code to track the register saves as implicit probes.   I'm pretty
sure it just works on ppc with separate shrink wrapping.

> 
>>>> We created that alloca'd object at the wrong lexical scope which mucked
>>>> up its expected persistence.  I'm sure I'd spot it trivially once I set
>>>> up the test again.
>>>
>>> That sounds like it might be a bug even.
>> That was my thought as well.  It's c-torture/execute/20071029-1.c.
> 
> Ah cool, thanks for digging it up.
NP.  As expected, it was trivial to slam in the initializer and rerun
the testsuite.  As soon as I saw the failure and looked at the source I
knew I'd found it again.

The other thing to remember about -fstack-check=generic is that once
your frame gets big, it just gives up.  That's why it takes large
automatic objects and turns them into alloca'd objects -- to reduce the
size of the prologue allocated frame.

I looked at -fstack-check=generic just long enough to realize it wasn't
really an option to cover s390.  Then I promptly moved onto more useful
pursuits.




> (*) Related, I know you don't want to scan generated code to see if
> all probes are in place, but it seems you pretty much have to if you
> want to make sure no later pass deletes the probes.  Well, something
> for targets to worry about :-)
Right.   The dumping that's currently done just gives us a view into
what the prologue code wanted to emit.  Passes after prologue generation
can get in the way -- combine-stack-adjustments comes immediately to
mind.  end-to-end testing absolutely requires scanning the final rtl
dump or the assembly code or DSOs/executables.  My plan was to fault
that in as-needed.

For c-s-a, we've introduced a new note that gets attached to the
allocation step when protecting against stack-clash.  If c-s-a sees that
note it will refuse to combine the stack adjustment with any others.

I wrote a quick and dirty x86 specific test for that.  You'll find those
bits in patch #5.

FOr some architectures you can get a good sense of problematical
sequences with objdump and some greps.  It's far from automated and
nowhere near something I'd submit into GCC :-)


Jeff


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08
  2017-07-14  0:38 ` David Malcolm
@ 2017-07-14  4:40   ` Jeff Law
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff Law @ 2017-07-14  4:40 UTC (permalink / raw)
  To: David Malcolm, gcc-patches

On 07/13/2017 06:37 PM, David Malcolm wrote:
> On Tue, 2017-07-11 at 15:19 -0600, Jeff Law wrote:
> 
> [...]
> 
>> diff --git a/gcc/opts.c b/gcc/opts.c
>> index 7460c2b..61f5bb0 100644
>> --- a/gcc/opts.c
>> +++ b/gcc/opts.c
>> @@ -2243,6 +2243,19 @@ common_handle_option (struct gcc_options
>> *opts,
>>         opts->x_flag_stack_check = STACK_CHECK_BUILTIN
>>                            ? FULL_BUILTIN_STACK_CHECK
>>                            : GENERIC_STACK_CHECK;
>> +      else if (!strcmp (arg, "clash"))
>> +       {
>> +         /* This is the stack checking method, designed to prevent
>> +            stack-clash attacks.  */
>> +         if (!STACK_GROWS_DOWNWARD)
>> +           sorry ("-fstack-check=clash not implemented on this
>> target");
> 
> Minor nitpick: shouldn't options be quoted in diagnostics?
> 
> So this should be:
> 
>   sorry ("%<-fstack-check=clash%> not implemented on this target");
> 
> (or whatever it ends up being called)
Thanks.  Fixed for the next version.

jeff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08
  2017-07-13 21:32     ` Segher Boessenkool
  2017-07-13 22:38       ` Jeff Law
@ 2017-07-14  7:40       ` Jakub Jelinek
  2017-07-14  7:46         ` Jeff Law
  1 sibling, 1 reply; 13+ messages in thread
From: Jakub Jelinek @ 2017-07-14  7:40 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Jeff Law, gcc-patches@gcc.gnu.org >> gcc-patches

On Thu, Jul 13, 2017 at 04:32:02PM -0500, Segher Boessenkool wrote:
> I don't think it should be inside -fstack-check at all.  Sure, the
> mechanisms implementing it overlap a bit (more on some targets, less
> on others), but how will a user ask for clash protection _and_ for
> stack checking?

Are we willing to implement that?  What would we do in that case?
Probe in each function we emit both the current page that
-fstack-check=clash would probe and a page or how many below that how
-fstack-check=specific or generic would probe?
Even if yes, we can still use -fstack-check=clash,specific
for that (similarly to how we support -fsanitize=shift,unreachable).
But it is still a stack check kind.

	Jakub

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08
  2017-07-14  7:40       ` Jakub Jelinek
@ 2017-07-14  7:46         ` Jeff Law
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff Law @ 2017-07-14  7:46 UTC (permalink / raw)
  To: Jakub Jelinek, Segher Boessenkool
  Cc: gcc-patches@gcc.gnu.org >> gcc-patches

On 07/14/2017 01:40 AM, Jakub Jelinek wrote:
> On Thu, Jul 13, 2017 at 04:32:02PM -0500, Segher Boessenkool wrote:
>> I don't think it should be inside -fstack-check at all.  Sure, the
>> mechanisms implementing it overlap a bit (more on some targets, less
>> on others), but how will a user ask for clash protection _and_ for
>> stack checking?
> 
> Are we willing to implement that?  What would we do in that case?
I'd change every existing target that has a backend stack probing
implementatoin to use a moving-sp style.  ie, allocate page, probe page,
allocate page, probe page.  We'd then want to use the -fstack-check
routines rather than the new routines.

But in the end I don't think using the options together makes all that
much sense.  We really should look at -fstack-check as Ada specific,
even though its implementation is in the target files and middle end.


Jeff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08
  2017-07-14  4:36           ` Jeff Law
@ 2017-07-14 16:34             ` Segher Boessenkool
  2017-07-14 16:53               ` Jeff Law
  0 siblings, 1 reply; 13+ messages in thread
From: Segher Boessenkool @ 2017-07-14 16:34 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches@gcc.gnu.org >> gcc-patches

On Thu, Jul 13, 2017 at 10:35:55PM -0600, Jeff Law wrote:
> >> There's -fstack-check and -fstack-clash-protection.  I think with the
> >> direction we're going they are fundamentally incompatible because
> >> neither the compiler nor kernel do anything to guarantee enough stack is
> >> available after hitting the guard for -fstack-clash-protection.
> > 
> > Hrm, I have to think about that.
> Even if the kernel implements the reserved page stuff mentioned earlier
> in the thread, I'm not sure when we'd be able to reliably depend on that
> capability (or even check for it).
> 
> ISTM that -fstack-check continues to be Ada centric and we have a new
> option to deal with stack-clash protection and the two simply just
> aren't allowed to be enabled together.

So this essentially means Ada programs cannot get stack clash protection
at all?  And if -fstack-check is really only useful for Ada, we shouldn't
mix up that option with the stack clash thing.

> >> And there's shrink wrapping.  This was originally raised by the aarch64
> >> guys and deserves some thought.  I'm particularly worried about aarch64
> >> if it was to shrink-wrap some particular register saves that we wanted
> >> to use as an implicit probe.
> > 
> > For normal shrink-wrapping, the volatile reg stores will always happen
> > in the same prologue that sets up the stack frame as well, so there
> > won't be any problem (except the usual stack ties you might need in the
> > prologue, etc.) (*)  For separate shrink-wrapping, yeah you'll need any
> > bb that could potentially store to the stack require the component for
> > the register you use as implicit probe.  This is pretty nasty.
> I'm going to try and look at this tomorrow to get a feel for the aarch64
> implementation separate shrink wrapping.
> 
> For ppc I'm not too worried -- the implicit backchain probes are so
> effective at eliminating explicit probes that I didn't bother writing
> any code to track the register saves as implicit probes.   I'm pretty
> sure it just works on ppc with separate shrink wrapping.

With what is in trunk so far, yeah ;-)  In principle the "set up a stack
frame" part of the prologue can be made a separate shrink-wrapping
component and then we'd have to worry.  But it would be complicated to
do that and there is no real benefit I see so far, so it won't happen.
Nothing for you to have to worry about in any case :-)


Segher

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH][RFA/RFC] Stack clash mitigation patch 01/08
  2017-07-14 16:34             ` Segher Boessenkool
@ 2017-07-14 16:53               ` Jeff Law
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff Law @ 2017-07-14 16:53 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches@gcc.gnu.org >> gcc-patches

On 07/14/2017 10:34 AM, Segher Boessenkool wrote:
> On Thu, Jul 13, 2017 at 10:35:55PM -0600, Jeff Law wrote:
>>>> There's -fstack-check and -fstack-clash-protection.  I think with the
>>>> direction we're going they are fundamentally incompatible because
>>>> neither the compiler nor kernel do anything to guarantee enough stack is
>>>> available after hitting the guard for -fstack-clash-protection.
>>>
>>> Hrm, I have to think about that.
>> Even if the kernel implements the reserved page stuff mentioned earlier
>> in the thread, I'm not sure when we'd be able to reliably depend on that
>> capability (or even check for it).
>>
>> ISTM that -fstack-check continues to be Ada centric and we have a new
>> option to deal with stack-clash protection and the two simply just
>> aren't allowed to be enabled together.
> 
> So this essentially means Ada programs cannot get stack clash protection
> at all?  And if -fstack-check is really only useful for Ada, we shouldn't
> mix up that option with the stack clash thing.
It's not a simple yes/no.


For Ada the model is that the entire application will be compiled with
-fstack-check and that each function that allocates space probes 2-3
pages beyond their immediate need to ensure space to run the signal
handler (damn the large register files :-)

Those are absolutely critical to keep in mind.  In combination they
allow prologues to skip the first 2-3 page probes (they were done
earlier in the call chain).

So from a standpoint of probing each and every allocated page, Ada with
-fstack-check will do that if and only if every function is compiled
with -fstack-check.

But the actual allocation/probing tends to look like this in the prologues

allocate all the stack space
probe third page
probe fourth page
probe fifth page
etc

So there's a race condition between the allocation point and when the
probes execute which means an async signal handler could be running on a
clashed stack/heap.

So an Ada program fully compiled with -fstack-check will mostly be
protected from a stack-clash style attack.

They could achieve better protection by fixing the
-fstack-check=specific prologue sequences so that they allocate and
probe a page at a time.  That (mostly) closes the issue with signal
handlers running on a clashed stack/heap.

They could achieve a level of protection in mixed code environments by
not skipping page probes.  But that would result in significantly worse
code given the current implementation of -fstack-check.


> 
> With what is in trunk so far, yeah ;-)  In principle the "set up a stack
> frame" part of the prologue can be made a separate shrink-wrapping
> component and then we'd have to worry.  But it would be complicated to
> do that and there is no real benefit I see so far, so it won't happen.
> Nothing for you to have to worry about in any case :-)
That's consistent with my thoughts as well.  You could separately wrap
the allocation, but it's not likely to be that beneficial and it would
seem to add a fair amount of complexity.

jeff

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-07-14 16:53 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-11 21:20 [PATCH][RFA/RFC] Stack clash mitigation patch 01/08 Jeff Law
2017-07-13  0:32 ` Segher Boessenkool
2017-07-13  3:56   ` Jeff Law
2017-07-13 21:32     ` Segher Boessenkool
2017-07-13 22:38       ` Jeff Law
2017-07-13 23:37         ` Segher Boessenkool
2017-07-14  4:36           ` Jeff Law
2017-07-14 16:34             ` Segher Boessenkool
2017-07-14 16:53               ` Jeff Law
2017-07-14  7:40       ` Jakub Jelinek
2017-07-14  7:46         ` Jeff Law
2017-07-14  0:38 ` David Malcolm
2017-07-14  4:40   ` Jeff Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).