public inbox for gcc-patches@gcc.gnu.org
* RFC: stack/heap collision vulnerability and mitigation with GCC
@ 2017-06-19 17:07 Jeff Law
  2017-06-19 17:29 ` Jakub Jelinek
                   ` (4 more replies)
  0 siblings, 5 replies; 66+ messages in thread
From: Jeff Law @ 2017-06-19 17:07 UTC (permalink / raw)
  To: gcc-patches

As some of you are likely aware, Qualys has just published fairly
detailed information on using stack/heap clashes as an attack vector.
Eric B, Michael M -- sorry I couldn't say more when I contacted you
about -fstack-check and some PPC-specific stuff.  This has been under
embargo for the last month.


--


http://www.openwall.com/lists/oss-security/2017/06/19/1


Obviously various vulnerabilities pointed out in that advisory are being
mitigated, particularly those found within glibc.  But those are really
just scratching the surface of this issue.

At its core, this chained attack relies first upon using various
techniques to bring the stack and heap close together.  Then the
exploits rely on large stack allocations to "jump the guard".  Once the
guard has been jumped, the stack and heap have collided and all hell
breaks loose.

The "jump the guard" step can be mitigated with help from the compiler.
We just have to ensure that, as we allocate chunks of stack space, we
touch each allocated page.  That ensures the guard page is hit.
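
As a minimal C illustration (the sizes are arbitrary, the guard page is
assumed to be 4kB, and there is no probing of any kind -- this is a
sketch of the failure mode, not anyone's actual exploit):

  #include <string.h>

  void
  process (const char *input)
  {
    char buf[65536];	/* frame far larger than the guard page */

    /* With no probing, the prologue can drop the stack pointer 64kB in
       a single adjustment, moving it straight past a 4kB guard page
       without ever touching it.  The stores below then land in
       whatever happens to be mapped beyond the guard -- e.g. the
       heap.  */
    strncpy (buf, input, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
  }

With probing, at least one store is emitted into every page the
prologue allocates, so the guard page cannot be skipped over.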

This sounds a whole lot like -fstack-check and initially that's what
folks were hoping could be used to eliminate this class of problems.

--

Unfortunately, -fstack-check is actually not well suited for our purposes.

Some background.  -fstack-check was designed primarily for Ada's needs.
It assumes the whole program is compiled with -fstack-check and it is
designed to ensure there is enough stack space left so that if the
program hits the guard (say via infinite recursion) the program can
safely call into a signal handler and raise an exception.

To ensure there's always enough space to meet that design requirement,
-fstack-check probes stack space ahead of the actual need of the code.

The assumption that all code was compiled with -fstack-check allows for
elision of some stack probes as they are assumed to have been probed by
earlier callers in the call chain.  This elision is safe in an
environment where all callers use -fstack-check, but fatally flawed in a
mixed environment.
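
As a concrete (hypothetical) scenario: function A is compiled without
-fstack-check and allocates several pages without ever probing them,
then calls B, which is compiled with -fstack-check.  B's frame is small
enough that its probes are elided on the assumption that its callers
already probed ahead of their own needs.  Nothing in the chain ever
touches the guard page.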

Most ports first probe by pages for whatever space is requested, then
after all probing is done, they actually allocate space.  This runs
afoul of valgrind in various unpleasant ways (including crashing
valgrind on two targets).

Only x86-linux currently uses a "moving sp" allocation and probing
strategy; i.e., it actually allocates space, then probes the space.
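
To make the ordering difference concrete, here is a rough C sketch.
The "stack pointer" is just an ordinary pointer so the example
compiles; it is not the real register, and the helper names are made
up:

  #include <stddef.h>

  #define PAGE_SIZE 4096

  /* Probe-then-allocate, as most ports do today: the probes store
     below the current stack pointer, into space that has not been
     allocated yet -- which is exactly what valgrind objects to.  */
  static char *
  probe_then_allocate (char *sp, size_t size)
  {
    for (size_t off = PAGE_SIZE; off <= size; off += PAGE_SIZE)
      *(volatile char *) (sp - off) = 0;  /* store into unallocated space */
    return sp - size;			  /* allocate it all at the end */
  }

  /* "Moving sp", as x86-linux does: allocate a page, then probe it,
     so every store lands in space the stack pointer already covers.  */
  static char *
  moving_sp (char *sp, size_t size)
  {
    while (size >= PAGE_SIZE)
      {
	sp -= PAGE_SIZE;		  /* allocate one page */
	*(volatile char *) sp = 0;	  /* probe the just-allocated page */
	size -= PAGE_SIZE;
      }
    return sp - size;			  /* residual allocation */
  }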

--

After much poking around I concluded that we really need to implement
allocation and probing via a "moving sp" strategy.   Probing into
unallocated areas runs afoul of valgrind, so that's a non-starter.

Allocating stack space, then probing the pages within the space is
vulnerable to async signal delivery between the allocation point and the
probe point.  If that occurs the signal handler could end up running on
a stack that has collided with the heap.

Ideally we would allocate and probe a page as an atomic unit (which is
feasible on PPC).  Alternatively, where ISA restrictions require it,
allocate a page, then probe the page as distinct instructions.  The
latter still has a race, but the async signal would have to arrive
within that single-instruction window.

A key point to remember is that we can never have an allocation
(potentially spread across more than one allocation site) that is
larger than a page without probing each page it covers.

Furthermore, we cannot assume that earlier functions in the call stack
were compiled with stack checking enabled.  Thus we cannot make any
assumptions about which pages other functions in the call stack have or
have not probed.

Finally, we need not ensure the ability to handle a signal at stack
overflow.  It is fine for the kernel to halt the process immediately if
it detects a reference to the guard page.


--

With all that in mind, we also want to be as efficient as possible, and
I think we do pretty well on x86 and ppc.  On x86, the call instruction
itself stores into the stack, and on ppc, stack is only supposed to be
allocated via the store-with-base-register-modification instructions,
which also store into *sp.

Those "implicit probes" allow us to greatly reduce the amount of probing
we do on those architectures.  If a function allocates less than a page
of space, no probing is needed -- this covers the vast majority of
functions.  Furthermore, if we allocate N pages + M bytes of residuals,
we need only explicitly probe the N pages, but not any of the residual
allocation.
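
As a worked example (assuming 4k pages): a 9000-byte frame is two full
pages plus 808 residual bytes.  We emit explicit probes only for the
two pages; the final 808-byte adjustment advances the stack pointer
less than a page past the last probe, so it cannot skip over the guard
page on its own.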

For glibc, we end up creating probes in ~1.5% of the functions on those
two architectures.  We could probably do even better on PPC, but we
currently assume 4k pages, which is overly conservative on that target.

aarch64 is significantly worse.  There are no implicit probes we can
exploit.  Furthermore, the prologue may allocate stack space 3-4 times.
So we have to track the distance to the most recent probe and, when
that distance grows too large, emit a probe.  Of course we have to make
worst-case assumptions at function entry.

s390 is much like aarch64 in that it doesn't have implicit probes.
However, it has simpler prologue code.


Dynamic (alloca) space is handled fairly generically with simple code
to allocate a page and probe the just-allocated page.
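
For a run-time size this is just the "moving sp" loop sketched earlier
applied one page at a time, followed by the residual adjustment.  For
example (again assuming 4k pages), alloca (10000) becomes two
allocate-and-probe steps of 4096 bytes each plus a final unprobed
1808-byte adjustment.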


Michael Matz has suggested some generic support so that we don't have
to write target-specific code for each and every target we support.
The idea is to have a helper function which allocates and probes stack
space.  The port can then call that helper function from within its
prologue generator.  I think this is wise -- I wouldn't want to go
through this exercise on every port.


--


So, time to open the discussion to questions & comments.

I've got patches I need to clean up and post for comments that
implement this for x86, ppc, aarch64 and s390.  x86 and ppc are IMHO in
good shape.  There's an unhandled case for s390.  I've still got
evaluation to do on aarch64.



Jeff

* Re: RFC: stack/heap collision vulnerability and mitigation with GCC
@ 2017-06-20 23:22 Wilco Dijkstra
  2017-06-21  8:34 ` Richard Earnshaw (lists)
  2017-06-21 17:05 ` Jeff Law
  0 siblings, 2 replies; 66+ messages in thread
From: Wilco Dijkstra @ 2017-06-20 23:22 UTC (permalink / raw)
  To: Jeff Law, Richard Earnshaw, GCC Patches; +Cc: nd

Jeff Law wrote:
> But the stack pointer might have already been advanced into the guard
> page by the caller.   For the sake of argument assume the guard page is
> 0xf1000 and assume that our stack pointer at entry is 0xf1010 and that
> the caller hasn't touched the 0xf1000 page.
>
> If FrameSize >= 32, then the stores are going to hit the 0xf0000 page
> rather than the 0xf1000 page.   That's jumping the guard.  Thus we have
> to emit a probe prior to this stack allocation.

That's an incorrect ABI that allows adjusting the frame by 4080+32!  A
correct one might allow, say, 1024 bytes for outgoing arguments.  That
means that when you call a function, there are still
guard-page-size - 1024 bytes left that you can use to allocate locals.
With a 4K guard page that allows leaf functions up to 3KB of locals
and, depending on the frame, 2-3KB of locals plus up to 1024 bytes of
outgoing arguments, without inserting any probes beyond the normal
frame stores.
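
To spell out the arithmetic behind those numbers (as an illustration,
assuming the 1024-byte limit): on entry to a function, the caller can
have consumed at most 1024 unprobed bytes of the guard margin, so with
a 4K guard page 4096 - 1024 = 3072 bytes remain for locals before any
probe is needed; with a 64K guard page the budget grows to
65536 - 1024 = 64512 bytes.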

This design means almost no functions need additional probes. Assuming we're
also increasing the guard page size to 64KB, it's cheap even for large functions.

Wilco


Thread overview: 66+ messages
2017-06-19 17:07 RFC: stack/heap collision vulnerability and mitigation with GCC Jeff Law
2017-06-19 17:29 ` Jakub Jelinek
2017-06-19 17:45   ` Jeff Law
2017-06-19 17:51     ` Jakub Jelinek
2017-06-19 21:51       ` Jeff Law
2017-06-20  8:03       ` Uros Bizjak
2017-06-20 10:18         ` Richard Biener
2017-06-20 11:10           ` Uros Bizjak
2017-06-20 12:13             ` Florian Weimer
2017-06-20 12:17               ` Uros Bizjak
2017-06-20 12:20                 ` Uros Bizjak
2017-06-20 12:27                   ` Richard Biener
2017-06-20 21:57                     ` Jeff Law
2017-06-20 15:59                 ` Jeff Law
2017-06-19 18:00   ` Richard Biener
2017-06-19 18:02     ` Richard Biener
2017-06-19 18:15       ` Florian Weimer
2017-06-19 21:57         ` Jeff Law
2017-06-19 22:08       ` Jeff Law
2017-06-20  7:50   ` Eric Botcazou
2017-06-19 17:51 ` Joseph Myers
2017-06-19 17:55   ` Jakub Jelinek
2017-06-19 18:21   ` Florian Weimer
2017-06-19 21:56     ` Joseph Myers
2017-06-19 22:05       ` Jeff Law
2017-06-19 22:10         ` Florian Weimer
2017-06-19 19:05   ` Jeff Law
2017-06-19 19:45     ` Jakub Jelinek
2017-06-19 21:41       ` Jeff Law
2017-06-20  8:27     ` Richard Earnshaw (lists)
2017-06-20 15:50       ` Jeff Law
2017-06-19 18:12 ` Richard Kenner
2017-06-19 22:05   ` Jeff Law
2017-06-19 22:07     ` Richard Kenner
2017-06-20  8:21   ` Eric Botcazou
2017-06-20 15:50     ` Jeff Law
2017-06-20 19:48     ` Jakub Jelinek
2017-06-20 20:37       ` Eric Botcazou
2017-06-20 20:46         ` Jeff Law
2017-06-20  8:17 ` Eric Botcazou
2017-06-20 21:52   ` Jeff Law
2017-06-20 22:20     ` Eric Botcazou
2017-06-21 17:31       ` Jeff Law
2017-06-21 19:07     ` Florian Weimer
2017-06-21  7:56   ` Andreas Schwab
2017-06-20  9:27 ` Richard Earnshaw (lists)
2017-06-20 21:39   ` Jeff Law
2017-06-21  8:41     ` Richard Earnshaw (lists)
2017-06-21 17:25       ` Jeff Law
2017-06-22  9:53         ` Richard Earnshaw (lists)
2017-06-22 15:30           ` Jeff Law
2017-06-22 16:07             ` Szabolcs Nagy
2017-06-22 16:15               ` Jeff Law
2017-06-28  6:45           ` Florian Weimer
2017-07-13 23:21             ` Jeff Law
2017-07-18 19:54               ` Florian Weimer
2017-06-20 23:22 Wilco Dijkstra
2017-06-21  8:34 ` Richard Earnshaw (lists)
2017-06-21  8:44   ` Andreas Schwab
2017-06-21  8:46     ` Richard Earnshaw (lists)
2017-06-21  8:46       ` Richard Earnshaw (lists)
2017-06-21  9:03   ` Wilco Dijkstra
2017-06-21 17:05 ` Jeff Law
2017-06-21 17:47   ` Wilco Dijkstra
2017-06-22 16:10     ` Jeff Law
2017-06-22 22:57       ` Wilco Dijkstra
