public inbox for gcc-cvs@sourceware.org
* [gcc(refs/users/aoliva/heads/testme)] strub: introduce downward scrubbing
@ 2023-06-08 10:59 Alexandre Oliva
From: Alexandre Oliva @ 2023-06-08 10:59 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:26e7ad9c8e48d0e9e471515e859030615fe60e20

commit 26e7ad9c8e48d0e9e471515e859030615fe60e20
Author: Alexandre Oliva <oliva@adacore.com>
Date:   Thu Jun 8 01:43:08 2023 -0300

    strub: introduce downward scrubbing
    
    This patch causes strubbing to proceed downward on platforms where
    the stack grows downward.  The intent is to keep scrubbing from
    corrupting data unrelated to the stack in case the watermark itself
    is corrupted.  The underlying assumption that makes this more secure
    is that, if the watermark is corrupted to an address below the stack
    range, scrubbing will hit a stack guard page before causing any
    damage.
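
    As a minimal sketch (not part of the patch itself), the two loop
    shapes below mirror the directions __strub_leave now uses, with
    base and end delimiting the range to clear, and end on the
    stack-pointer side when the stack grows downward:

        /* Stack grows upward: scrub from base toward end.  */
        static void
        scrub_upward (void **base, void **end)
        {
          do
            *base++ = 0;
          while (base < end);
        }

        /* Stack grows downward: scrub from end (near sp) toward base,
           so a corrupted base extends the walk into the guard page
           below the stack instead of past unrelated memory.  */
        static void
        scrub_downward (void **base, void **end)
        {
          do
            *--end = 0;
          while (base < end);
        }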
    
    The length computation in __strub_leave is moved into the dynamic
    array case only, and performed in a safer way that guards against
    some attack scenarios that hand it misaligned pointers.  In the
    case without a dynamic array, we may scrub a few extra bytes at
    the stack-top end of the scrub range.
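
    For illustration (hypothetical addresses, 8-byte pointers), the
    byte-wise computation the patch adopts truncates instead of
    misbehaving when handed misaligned pointers:

        void **base = (void **) 0x1000;
        void **end  = (void **) 0x1009;  /* misaligned by one byte */

        /* Undefined: 0x9 is not a multiple of sizeof (void *), so
           the direct pointer subtraction has no exact result.  */
        ptrdiff_t exact = end - base;

        /* The patch's form: yields 1, leaving at most a partial word
           unscrubbed rather than overrunning the range.  */
        ptrdiff_t len = ((char *) end - (char *) base) / sizeof (void *);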
    
    This also documents and streamlines the target settings that
    enable the use of dynamic arrays and memset in __strub_leave,
    drops the watermark (mis)optimization in the strub_enter builtin
    expander, and removes some obsolete preprocessor directives.
    
    
    for  gcc/ChangeLog
    
            * builtins.cc (expand_builtin_strub_enter): Drop
            misoptimization.
            (expand_builtin_strub_leave): Strub in the same direction the
            stack grows.
            * doc/tm.texi.in (TARGET_STRUB_USE_DYNAMIC_ARRAY): New.
            (TARGET_STRUB_MAY_USE_MEMSET): New.
            * doc/tm.texi: Rebuilt.
    
    for  libgcc/ChangeLog
    
            * strub.c (TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY): Drop in favor
            of...
            (TARGET_STRUB_USE_DYNAMIC_ARRAY): ... this.
            (TARGET_STRUB_MAY_USE_MEMSET): Rephrase default setting.
            (__strub_leave): Strub in the same direction the
            stack grows.  Simplify dynamic array and memset settings.
            Compute array length in a more conservative and security-aware
            way.

Diff:
---
 gcc/builtins.cc    | 38 +++++++++++-------------
 gcc/doc/tm.texi    | 19 ++++++++++++
 gcc/doc/tm.texi.in | 19 ++++++++++++
 libgcc/strub.c     | 85 +++++++++++++++++++++++++++++++++++++++---------------
 4 files changed, 116 insertions(+), 45 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index eccec5f2ed0..a6dc923de57 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -5315,23 +5315,7 @@ expand_builtin_strub_enter (tree exp)
   if (optimize < 1 || flag_no_inline)
     return NULL_RTX;
 
-  rtx stktop = NULL_RTX;
-
-#if 1 || defined RED_ZONE_SIZE
-  if (tree wmptr = (optimize
-		    ? strub_watermark_parm (current_function_decl)
-		    : NULL_TREE))
-    {
-      tree wmtype = TREE_TYPE (TREE_TYPE (wmptr));
-      tree wmtree = fold_build2 (MEM_REF, wmtype, wmptr,
-				 build_int_cst (TREE_TYPE (wmptr), 0));
-      rtx wmark = expand_expr (wmtree, NULL_RTX, ptr_mode, EXPAND_MEMORY);
-      stktop = force_reg (ptr_mode, wmark);
-    }
-#endif
-
-  if (!stktop)
-    stktop = expand_builtin_stack_address ();
+  rtx stktop = expand_builtin_stack_address ();
 
   tree wmptr = CALL_EXPR_ARG (exp, 0);
   tree wmtype = TREE_TYPE (TREE_TYPE (wmptr));
@@ -5417,7 +5401,6 @@ expand_builtin_strub_update (tree exp)
 			   profile_probability::very_likely ());
   emit_move_insn (wmark, stktop);
 
-#if 1 || defined RED_ZONE_SIZE
   /* If this is an inlined strub function, also bump the watermark for the
      enclosing function.  This avoids a problem with the following scenario: A
      calls B and B calls C, and both B and C get inlined into A.  B allocates
@@ -5442,7 +5425,6 @@ expand_builtin_strub_update (tree exp)
 			       profile_probability::very_likely ());
       emit_move_insn (wmark, stktop);
     }
-#endif
 
   emit_label (lab);
 
@@ -5463,7 +5445,6 @@ expand_builtin_strub_leave (tree exp)
 
   rtx stktop = NULL_RTX;
 
-#if 1 || defined RED_ZONE_SIZE
   if (tree wmptr = (optimize
 		    ? strub_watermark_parm (current_function_decl)
 		    : NULL_TREE))
@@ -5474,7 +5455,6 @@ expand_builtin_strub_leave (tree exp)
       rtx wmark = expand_expr (wmtree, NULL_RTX, ptr_mode, EXPAND_MEMORY);
       stktop = force_reg (ptr_mode, wmark);
     }
-#endif
 
   if (!stktop)
     stktop = expand_builtin_stack_address ();
@@ -5516,6 +5496,13 @@ expand_builtin_strub_leave (tree exp)
 
       rtx zero = force_operand (const0_rtx, NULL_RTX);
       int ulen = GET_MODE_SIZE (ptr_mode);
+
+      /* ??? It would be nice to use setmem or similar patterns here,
+	 but they do not necessarily obey the stack growth direction,
+	 which has security implications.  We also have to avoid calls
+	 (memset, bzero or any machine-specific ones), which are
+	 likely unsafe here (see TARGET_STRUB_MAY_USE_MEMSET).  */
+#ifndef STACK_GROWS_DOWNWARD
       rtx incr = plus_constant (Pmode, base, ulen);
       rtx dstm = gen_rtx_MEM (ptr_mode, base);
 
@@ -5523,6 +5510,15 @@ expand_builtin_strub_leave (tree exp)
       emit_label (loop);
       emit_move_insn (dstm, zero);
       emit_move_insn (base, force_operand (incr, NULL_RTX));
+#else
+      rtx decr = plus_constant (Pmode, end, -ulen);
+      rtx dstm = gen_rtx_MEM (ptr_mode, end);
+
+      rtx_code_label *loop = gen_label_rtx ();
+      emit_label (loop);
+      emit_move_insn (end, force_operand (decr, NULL_RTX));
+      emit_move_insn (dstm, zero);
+#endif
       do_compare_rtx_and_jump (base, end, LT, STACK_UNSIGNED,
 			       Pmode, NULL_RTX, NULL, loop,
 			       profile_probability::very_likely ());
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 95ba56e05ae..463e98ce371 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -3399,6 +3399,25 @@ in DWARF 2 debug information.  The default is zero.  A different value
 may reduce the size of debug information on some ports.
 @end defmac
 
+@defmac TARGET_STRUB_USE_DYNAMIC_ARRAY
+If defined to nonzero, @code{__strub_leave} will allocate a dynamic
+array covering the stack range that needs scrubbing before clearing it.
+Allocating the array tends to make scrubbing slower, but it enables the
+scrubbing to be safely implemented with a @code{memset} call, which could
+make up for the difference.
+@end defmac
+
+@defmac TARGET_STRUB_MAY_USE_MEMSET
+If defined to nonzero, enable @code{__strub_leave} to be optimized so as
+to call @code{memset} for stack scrubbing.  This is only enabled by
+default if @code{TARGET_STRUB_USE_DYNAMIC_ARRAY} is enabled; it's not
+advisable to enable it otherwise, since @code{memset} would then likely
+overwrite its own stack frame, but it might work if the target ABI
+enables @code{memset} to not use the stack at all, not even for
+arguments or its return address, and its implementation is trivial
+enough that it doesn't use a stack frame.
+@end defmac
+
 @node Exception Handling
 @subsection Exception Handling Support
 @cindex exception handling
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 4ac96dc357d..369384a1ad6 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2648,6 +2648,25 @@ in DWARF 2 debug information.  The default is zero.  A different value
 may reduce the size of debug information on some ports.
 @end defmac
 
+@defmac TARGET_STRUB_USE_DYNAMIC_ARRAY
+If defined to nonzero, @code{__strub_leave} will allocate a dynamic
+array covering the stack range that needs scrubbing before clearing it.
+Allocating the array tends to make scrubbing slower, but it enables the
+scrubbing to be safely implemented with a @code{memset} call, which could
+make up for the difference.
+@end defmac
+
+@defmac TARGET_STRUB_MAY_USE_MEMSET
+If defined to nonzero, enable @code{__strub_leave} to be optimized so as
+to call @code{memset} for stack scrubbing.  This is only enabled by
+default if @code{TARGET_STRUB_USE_DYNAMIC_ARRAY} is enabled; it's not
+advisable to enable it otherwise, since @code{memset} would then likely
+overwrite its own stack frame, but it might work if the target ABI
+enables @code{memset} to not use the stack at all, not even for
+arguments or its return address, and its implementation is trivial
+enough that it doesn't use a stack frame.
+@end defmac
+
 @node Exception Handling
 @subsection Exception Handling Support
 @cindex exception handling
diff --git a/libgcc/strub.c b/libgcc/strub.c
index 90d3e82067b..2a2b930b603 100644
--- a/libgcc/strub.c
+++ b/libgcc/strub.c
@@ -57,56 +57,93 @@ __strub_update (void **watermark)
     *watermark = sp;
 }
 
-#ifndef TARGET_STRUB_USE_DYNAMIC_ARRAY
-# define TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY 1
+#if TARGET_STRUB_USE_DYNAMIC_ARRAY && ! defined TARGET_STRUB_MAY_USE_MEMSET
+# define TARGET_STRUB_MAY_USE_MEMSET 1
 #endif
 
-#ifndef TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY
-# ifdef TARGET_STRUB_MAY_USE_MEMSET
-#  define TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY 1
-# else
-#  define TARGET_STRUB_MAY_USE_MEMSET 1
-# endif
+#if defined __x86_64__ && __OPTIMIZE__
+# define TARGET_STRUB_DISABLE_RED_ZONE \
+  /* __attribute__ ((__target__ ("no-red-zone"))) // not needed when optimizing */
+#elif !defined RED_ZONE_SIZE || defined __i386__
+# define TARGET_STRUB_DISABLE_RED_ZONE
 #endif
 
-/* Leave a stack scrubbing context, restoring and updating SAVED, and
-   clearing the stack between top and watermark.  */
+#ifndef TARGET_STRUB_DISABLE_RED_ZONE
+/* Dummy function, called to force the caller to not be a leaf function, so
+   that it can't use the red zone.  */
+static void ATTRIBUTE_STRUB_CALLABLE
+__attribute__ ((__noinline__, __noipa__))
+__strub_dummy_force_no_leaf (void)
+{
+}
+#endif
+
+/* Leave a stack scrubbing context, clearing the stack between its top and
+   *MARK.  */
 void ATTRIBUTE_STRUB_CALLABLE
 #if ! TARGET_STRUB_MAY_USE_MEMSET
 __attribute__ ((__optimize__ ("-fno-tree-loop-distribute-patterns")))
 #endif
+#ifdef TARGET_STRUB_DISABLE_RED_ZONE
+TARGET_STRUB_DISABLE_RED_ZONE
+#endif
 __strub_leave (void **mark)
 {
   void *sp = __builtin_stack_address ();
 
   void **base, **end;
 #ifndef STACK_GROWS_DOWNWARD
-  base = sp;
+  base = sp; /* ??? Do we need an offset here?  */
   end = *mark;
 #else
   base = *mark;
-  end = sp;
+  end = sp; /* ??? Does any platform require an offset here?  */
 #endif
 
-  ptrdiff_t len = end - base;
-  if (len <= 0)
+  if (! (base < end))
     return;
 
-#if ! TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY
+#if TARGET_STRUB_USE_DYNAMIC_ARRAY
+  /* Compute the length without assuming the pointers are both sufficiently
+     aligned.  They should be, but pointer differences expected to be exact may
+     yield unexpected results when the assumption doesn't hold.  Given the
+     potential security implications, compute the length without that
+     expectation.  If the pointers are misaligned, we may leave a partial
+     unscrubbed word behind.  */
+  ptrdiff_t len = ((char *)end - (char *)base) / sizeof (void *);
   /* Allocate a dynamically-sized array covering the desired range, so that we
      can safely call memset on it.  */
   void *ptr[len];
   base = &ptr[0];
   end = &ptr[len];
-#else
-  void **ptr = end;
-#endif /* TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY */
+#elifndef TARGET_STRUB_DISABLE_RED_ZONE
+  /* Prevent the use of the red zone, by making this function non-leaf through
+     an unreachable call that, because of the asm stmt, the compiler will
+     consider reachable.  */
+  asm goto ("" : : : : no_leaf);
+  if (0)
+    {
+    no_leaf:
+      __strub_dummy_force_no_leaf ();
+      return;
+    }
+#endif
 
-  /* ldist turns this into a memset.  Without the dynamic array above, that call
-     is likely unsafe: possibly tail-called, and likely scribbling over its own
-     stack frame.  */
-  while (base < end)
+  /* ldist may turn these loops into a memset (thus the conditional
+     -fno-tree-loop-distribute-patterns above).  Without the dynamic array
+     above, that call would likely be unsafe: possibly tail-called, and likely
+     scribbling over its own stack frame.  */
+#ifndef STACK_GROWS_DOWNWARD
+  do
     *base++ = 0;
-
-  asm ("" : : "m" (ptr));
+  while (base < end);
+  /* Make sure the stack overwrites are not optimized away.  */
+  asm ("" : : "m" (end[0]));
+#else
+  do
+    *--end = 0;
+  while (base < end);
+  /* Make sure the stack overwrites are not optimized away.  */
+  asm ("" : : "m" (base[0]));
+#endif
 }
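
As a hedged illustration of how a port might opt in to the settings
documented above (the header name is hypothetical), a target's libgcc
configuration could define:

    /* hypothetical libgcc/config/<target>/strub-target.h */
    #define TARGET_STRUB_USE_DYNAMIC_ARRAY 1
    /* strub.c then defaults TARGET_STRUB_MAY_USE_MEMSET to 1, so
       __strub_leave may scrub the dynamically-sized array with a
       memset call.  */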

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [gcc(refs/users/aoliva/heads/testme)] strub: introduce downward scrubbing
@ 2023-06-09  8:07 Alexandre Oliva
  0 siblings, 0 replies; 7+ messages in thread
From: Alexandre Oliva @ 2023-06-09  8:07 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:d85320a13af688ae8f30e6496a266cac8fe236d6

commit d85320a13af688ae8f30e6496a266cac8fe236d6
Author: Alexandre Oliva <oliva@adacore.com>
Date:   Thu Jun 8 01:43:59 2023 -0300

    strub: introduce downward scrubbing
    
    This patch causes strubbing to proceed downward on platforms in which
    the stack grows downward.  This is intended to avoid getting data
    unrelated to the stack corrupted by scrubbing in case the watermark is
    corrupted.  The underlying assumption that makes this more secure is
    that, if the watermark is corrupted to an address below the stack
    range, scrubbing will hit a stack guard page before causing any
    damage.
    
    The length computation in __strub_leave is moved into the dynamic
    array case only, and performed in a safer way to guard against some
    attack scenarios that get it misaligned pointers.  In the case without
    a dynamic array, we may scrub an extra few bytes at the stack-top-end
    of the scrub range.
    
    This also documents and streamlines the target settings to enable the
    use of dynamic arrays and memset in __strub_leave, drops the watermark
    (mis)optimization in the strub_enter builtin expander, and some
    obsolete preprocessor directives.
    
    
    for  gcc/ChangeLog
    
            * builtins.cc (expand_builtin_strub_enter): Drop
            misoptimization.
            (expand_builtin_strub_leave): Strub in the same direction the
            stack grows.
            * doc/tm.texi.in (TARGET_STRUB_USE_DYNAMIC_ARRAY): New.
            (TARGET_STRUB_MAY_USE_MEMSET): New.
            * doc/tm.texi: Rebuilt.
    
    for  libgcc/ChangeLog
    
            * strub.c (TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY): Drop in favor
            of...
            (TARGET_STRUB_USE_DYNAMIC_ARRAY): ... this.
            (TARGET_STRUB_MAŶ_USE_MEMSET): Rephrase default setting.
            (__strub_leave): Strub in the same direction the
            stack grows.  Simplify dynamic array and memset settings.
            Compute array length in a more conservative and security-aware
            way.

Diff:
---
 gcc/builtins.cc    | 38 +++++++++++-------------
 gcc/doc/tm.texi    | 19 ++++++++++++
 gcc/doc/tm.texi.in | 19 ++++++++++++
 libgcc/strub.c     | 85 +++++++++++++++++++++++++++++++++++++++---------------
 4 files changed, 116 insertions(+), 45 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 0283519c211..db4aed40014 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -5314,23 +5314,7 @@ expand_builtin_strub_enter (tree exp)
   if (optimize < 1 || flag_no_inline)
     return NULL_RTX;
 
-  rtx stktop = NULL_RTX;
-
-#if 1 || defined RED_ZONE_SIZE
-  if (tree wmptr = (optimize
-		    ? strub_watermark_parm (current_function_decl)
-		    : NULL_TREE))
-    {
-      tree wmtype = TREE_TYPE (TREE_TYPE (wmptr));
-      tree wmtree = fold_build2 (MEM_REF, wmtype, wmptr,
-				 build_int_cst (TREE_TYPE (wmptr), 0));
-      rtx wmark = expand_expr (wmtree, NULL_RTX, ptr_mode, EXPAND_MEMORY);
-      stktop = force_reg (ptr_mode, wmark);
-    }
-#endif
-
-  if (!stktop)
-    stktop = expand_builtin_stack_address ();
+  rtx stktop = expand_builtin_stack_address ();
 
   tree wmptr = CALL_EXPR_ARG (exp, 0);
   tree wmtype = TREE_TYPE (TREE_TYPE (wmptr));
@@ -5416,7 +5400,6 @@ expand_builtin_strub_update (tree exp)
 			   profile_probability::very_likely ());
   emit_move_insn (wmark, stktop);
 
-#if 1 || defined RED_ZONE_SIZE
   /* If this is an inlined strub function, also bump the watermark for the
      enclosing function.  This avoids a problem with the following scenario: A
      calls B and B calls C, and both B and C get inlined into A.  B allocates
@@ -5441,7 +5424,6 @@ expand_builtin_strub_update (tree exp)
 			       profile_probability::very_likely ());
       emit_move_insn (wmark, stktop);
     }
-#endif
 
   emit_label (lab);
 
@@ -5462,7 +5444,6 @@ expand_builtin_strub_leave (tree exp)
 
   rtx stktop = NULL_RTX;
 
-#if 1 || defined RED_ZONE_SIZE
   if (tree wmptr = (optimize
 		    ? strub_watermark_parm (current_function_decl)
 		    : NULL_TREE))
@@ -5473,7 +5454,6 @@ expand_builtin_strub_leave (tree exp)
       rtx wmark = expand_expr (wmtree, NULL_RTX, ptr_mode, EXPAND_MEMORY);
       stktop = force_reg (ptr_mode, wmark);
     }
-#endif
 
   if (!stktop)
     stktop = expand_builtin_stack_address ();
@@ -5515,6 +5495,13 @@ expand_builtin_strub_leave (tree exp)
 
       rtx zero = force_operand (const0_rtx, NULL_RTX);
       int ulen = GET_MODE_SIZE (ptr_mode);
+
+      /* ??? It would be nice to use setmem or similar patterns here,
+	 but they do not necessarily obey the stack growth direction,
+	 which has security implications.  We also have to avoid calls
+	 (memset, bzero or any machine-specific ones), which are
+	 likely unsafe here (see TARGET_STRUB_MAY_USE_MEMSET).  */
+#ifndef STACK_GROWS_DOWNWARD
       rtx incr = plus_constant (Pmode, base, ulen);
       rtx dstm = gen_rtx_MEM (ptr_mode, base);
 
@@ -5522,6 +5509,15 @@ expand_builtin_strub_leave (tree exp)
       emit_label (loop);
       emit_move_insn (dstm, zero);
       emit_move_insn (base, force_operand (incr, NULL_RTX));
+#else
+      rtx decr = plus_constant (Pmode, end, -ulen);
+      rtx dstm = gen_rtx_MEM (ptr_mode, end);
+
+      rtx_code_label *loop = gen_label_rtx ();
+      emit_label (loop);
+      emit_move_insn (end, force_operand (decr, NULL_RTX));
+      emit_move_insn (dstm, zero);
+#endif
       do_compare_rtx_and_jump (base, end, LT, STACK_UNSIGNED,
 			       Pmode, NULL_RTX, NULL, loop,
 			       profile_probability::very_likely ());
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index a660e33739b..1b73098c65c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -3399,6 +3399,25 @@ in DWARF 2 debug information.  The default is zero.  A different value
 may reduce the size of debug information on some ports.
 @end defmac
 
+@defmac TARGET_STRUB_USE_DYNAMIC_ARRAY
+If defined to nonzero, @code{__strub_leave} will allocate a dynamic
+array covering the stack range that needs scrubbing before clearing it.
+Allocating the array tends to make scrubbing slower, but it enables the
+scrubbing to be safely implemented with a @code{memet} call, which could
+make up for the difference.
+@end defmac
+
+@defmac TARGET_STRUB_MAY_USE_MEMSET
+If defined to nonzero, enable @code{__strub_leave} to be optimized so as
+to call @code{memset} for stack scrubbing.  This is only enabled by
+default if @code{TARGET_STRUB_USE_DYNAMIC_ARRAY} is enabled; it's not
+advisable to enable it otherwise, since @code{memset} would then likely
+overwrite its own stack frame, but it might work if the target ABI
+enables @code{memset} to not use the stack at all, not even for
+arguments or its return address, and its implementation is trivial
+enough that it doesn't use a stack frame.
+@end defmac
+
 @node Exception Handling
 @subsection Exception Handling Support
 @cindex exception handling
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f7ab5d48a63..e9e195efbca 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2648,6 +2648,25 @@ in DWARF 2 debug information.  The default is zero.  A different value
 may reduce the size of debug information on some ports.
 @end defmac
 
+@defmac TARGET_STRUB_USE_DYNAMIC_ARRAY
+If defined to nonzero, @code{__strub_leave} will allocate a dynamic
+array covering the stack range that needs scrubbing before clearing it.
+Allocating the array tends to make scrubbing slower, but it enables the
+scrubbing to be safely implemented with a @code{memet} call, which could
+make up for the difference.
+@end defmac
+
+@defmac TARGET_STRUB_MAY_USE_MEMSET
+If defined to nonzero, enable @code{__strub_leave} to be optimized so as
+to call @code{memset} for stack scrubbing.  This is only enabled by
+default if @code{TARGET_STRUB_USE_DYNAMIC_ARRAY} is enabled; it's not
+advisable to enable it otherwise, since @code{memset} would then likely
+overwrite its own stack frame, but it might work if the target ABI
+enables @code{memset} to not use the stack at all, not even for
+arguments or its return address, and its implementation is trivial
+enough that it doesn't use a stack frame.
+@end defmac
+
 @node Exception Handling
 @subsection Exception Handling Support
 @cindex exception handling
diff --git a/libgcc/strub.c b/libgcc/strub.c
index 90d3e82067b..2a2b930b603 100644
--- a/libgcc/strub.c
+++ b/libgcc/strub.c
@@ -57,56 +57,93 @@ __strub_update (void **watermark)
     *watermark = sp;
 }
 
-#ifndef TARGET_STRUB_USE_DYNAMIC_ARRAY
-# define TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY 1
+#if TARGET_STRUB_USE_DYNAMIC_ARRAY && ! defined TARGET_STRUB_MAY_USE_MEMSET
+# define TARGET_STRUB_MAY_USE_MEMSET 1
 #endif
 
-#ifndef TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY
-# ifdef TARGET_STRUB_MAY_USE_MEMSET
-#  define TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY 1
-# else
-#  define TARGET_STRUB_MAY_USE_MEMSET 1
-# endif
+#if defined __x86_64__ && __OPTIMIZE__
+# define TARGET_STRUB_DISABLE_RED_ZONE \
+  /* __attribute__ ((__target__ ("no-red-zone"))) // not needed when optimizing */
+#elif !defined RED_ZONE_SIZE || defined __i386__
+# define TARGET_STRUB_DISABLE_RED_ZONE
 #endif
 
-/* Leave a stack scrubbing context, restoring and updating SAVED, and
-   clearing the stack between top and watermark.  */
+#ifndef TARGET_STRUB_DISABLE_RED_ZONE
+/* Dummy function, called to force the caller to not be a leaf function, so
+   that it can't use the red zone.  */
+static void ATTRIBUTE_STRUB_CALLABLE
+__attribute__ ((__noinline__, __noipa__))
+__strub_dummy_force_no_leaf (void)
+{
+}
+#endif
+
+/* Leave a stack scrubbing context, clearing the stack between its top and
+   *MARK.  */
 void ATTRIBUTE_STRUB_CALLABLE
 #if ! TARGET_STRUB_MAY_USE_MEMSET
 __attribute__ ((__optimize__ ("-fno-tree-loop-distribute-patterns")))
 #endif
+#ifdef TARGET_STRUB_DISABLE_RED_ZONE
+TARGET_STRUB_DISABLE_RED_ZONE
+#endif
 __strub_leave (void **mark)
 {
   void *sp = __builtin_stack_address ();
 
   void **base, **end;
 #ifndef STACK_GROWS_DOWNWARD
-  base = sp;
+  base = sp; /* ??? Do we need an offset here?  */
   end = *mark;
 #else
   base = *mark;
-  end = sp;
+  end = sp; /* ??? Does any platform require an offset here?  */
 #endif
 
-  ptrdiff_t len = end - base;
-  if (len <= 0)
+  if (! (base < end))
     return;
 
-#if ! TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY
+#if TARGET_STRUB_USE_DYNAMIC_ARRAY
+  /* Compute the length without assuming the pointers are both sufficiently
+     aligned.  They should be, but pointer differences expected to be exact may
+     yield unexpected results when the assumption doesn't hold.  Given the
+     potential security implications, compute the length without that
+     expectation.  If the pointers are misaligned, we may leave a partial
+     unscrubbed word behind.  */
+  ptrdiff_t len = ((char *)end - (char *)base) / sizeof (void *);
   /* Allocate a dynamically-sized array covering the desired range, so that we
      can safely call memset on it.  */
   void *ptr[len];
   base = &ptr[0];
   end = &ptr[len];
-#else
-  void **ptr = end;
-#endif /* TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY */
+#elifndef TARGET_STRUB_DISABLE_RED_ZONE
+  /* Prevent the use of the red zone, by making this function non-leaf through
+     an unreachable call that, because of the asm stmt, the compiler will
+     consider reachable.  */
+  asm goto ("" : : : : no_leaf);
+  if (0)
+    {
+    no_leaf:
+      __strub_dummy_force_no_leaf ();
+      return;
+    }
+#endif
 
-  /* ldist turns this into a memset.  Without the dynamic array above, that call
-     is likely unsafe: possibly tail-called, and likely scribbling over its own
-     stack frame.  */
-  while (base < end)
+  /* ldist may turn these loops into a memset (thus the conditional
+     -fno-tree-loop-distribute-patterns above).  Without the dynamic array
+     above, that call would likely be unsafe: possibly tail-called, and likely
+     scribbling over its own stack frame.  */
+#ifndef STACK_GROWS_DOWNWARD
+  do
     *base++ = 0;
-
-  asm ("" : : "m" (ptr));
+  while (base < end);
+  /* Make sure the stack overwrites are not optimized away.  */
+  asm ("" : : "m" (end[0]));
+#else
+  do
+    *--end = 0;
+  while (base < end);
+  /* Make sure the stack overwrites are not optimized away.  */
+  asm ("" : : "m" (base[0]));
+#endif
 }

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [gcc(refs/users/aoliva/heads/testme)] strub: introduce downward scrubbing
@ 2023-06-09  6:26 Alexandre Oliva
  0 siblings, 0 replies; 7+ messages in thread
From: Alexandre Oliva @ 2023-06-09  6:26 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:54b333f564ed199b92129e7309735cb68ac13858

commit 54b333f564ed199b92129e7309735cb68ac13858
Author: Alexandre Oliva <oliva@adacore.com>
Date:   Thu Jun 8 01:43:59 2023 -0300

    strub: introduce downward scrubbing
    
    This patch causes strubbing to proceed downward on platforms in which
    the stack grows downward.  This is intended to avoid getting data
    unrelated to the stack corrupted by scrubbing in case the watermark is
    corrupted.  The underlying assumption that makes this more secure is
    that, if the watermark is corrupted to an address below the stack
    range, scrubbing will hit a stack guard page before causing any
    damage.
    
    The length computation in __strub_leave is moved into the dynamic
    array case only, and performed in a safer way to guard against some
    attack scenarios that get it misaligned pointers.  In the case without
    a dynamic array, we may scrub an extra few bytes at the stack-top-end
    of the scrub range.
    
    This also documents and streamlines the target settings to enable the
    use of dynamic arrays and memset in __strub_leave, drops the watermark
    (mis)optimization in the strub_enter builtin expander, and some
    obsolete preprocessor directives.
    
    
    for  gcc/ChangeLog
    
            * builtins.cc (expand_builtin_strub_enter): Drop
            misoptimization.
            (expand_builtin_strub_leave): Strub in the same direction the
            stack grows.
            * doc/tm.texi.in (TARGET_STRUB_USE_DYNAMIC_ARRAY): New.
            (TARGET_STRUB_MAY_USE_MEMSET): New.
            * doc/tm.texi: Rebuilt.
    
    for  libgcc/ChangeLog
    
            * strub.c (TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY): Drop in favor
            of...
            (TARGET_STRUB_USE_DYNAMIC_ARRAY): ... this.
            (TARGET_STRUB_MAŶ_USE_MEMSET): Rephrase default setting.
            (__strub_leave): Strub in the same direction the
            stack grows.  Simplify dynamic array and memset settings.
            Compute array length in a more conservative and security-aware
            way.

Diff:
---
 gcc/builtins.cc    | 38 +++++++++++-------------
 gcc/doc/tm.texi    | 19 ++++++++++++
 gcc/doc/tm.texi.in | 19 ++++++++++++
 libgcc/strub.c     | 85 +++++++++++++++++++++++++++++++++++++++---------------
 4 files changed, 116 insertions(+), 45 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 0283519c211..db4aed40014 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -5314,23 +5314,7 @@ expand_builtin_strub_enter (tree exp)
   if (optimize < 1 || flag_no_inline)
     return NULL_RTX;
 
-  rtx stktop = NULL_RTX;
-
-#if 1 || defined RED_ZONE_SIZE
-  if (tree wmptr = (optimize
-		    ? strub_watermark_parm (current_function_decl)
-		    : NULL_TREE))
-    {
-      tree wmtype = TREE_TYPE (TREE_TYPE (wmptr));
-      tree wmtree = fold_build2 (MEM_REF, wmtype, wmptr,
-				 build_int_cst (TREE_TYPE (wmptr), 0));
-      rtx wmark = expand_expr (wmtree, NULL_RTX, ptr_mode, EXPAND_MEMORY);
-      stktop = force_reg (ptr_mode, wmark);
-    }
-#endif
-
-  if (!stktop)
-    stktop = expand_builtin_stack_address ();
+  rtx stktop = expand_builtin_stack_address ();
 
   tree wmptr = CALL_EXPR_ARG (exp, 0);
   tree wmtype = TREE_TYPE (TREE_TYPE (wmptr));
@@ -5416,7 +5400,6 @@ expand_builtin_strub_update (tree exp)
 			   profile_probability::very_likely ());
   emit_move_insn (wmark, stktop);
 
-#if 1 || defined RED_ZONE_SIZE
   /* If this is an inlined strub function, also bump the watermark for the
      enclosing function.  This avoids a problem with the following scenario: A
      calls B and B calls C, and both B and C get inlined into A.  B allocates
@@ -5441,7 +5424,6 @@ expand_builtin_strub_update (tree exp)
 			       profile_probability::very_likely ());
       emit_move_insn (wmark, stktop);
     }
-#endif
 
   emit_label (lab);
 
@@ -5462,7 +5444,6 @@ expand_builtin_strub_leave (tree exp)
 
   rtx stktop = NULL_RTX;
 
-#if 1 || defined RED_ZONE_SIZE
   if (tree wmptr = (optimize
 		    ? strub_watermark_parm (current_function_decl)
 		    : NULL_TREE))
@@ -5473,7 +5454,6 @@ expand_builtin_strub_leave (tree exp)
       rtx wmark = expand_expr (wmtree, NULL_RTX, ptr_mode, EXPAND_MEMORY);
       stktop = force_reg (ptr_mode, wmark);
     }
-#endif
 
   if (!stktop)
     stktop = expand_builtin_stack_address ();
@@ -5515,6 +5495,13 @@ expand_builtin_strub_leave (tree exp)
 
       rtx zero = force_operand (const0_rtx, NULL_RTX);
       int ulen = GET_MODE_SIZE (ptr_mode);
+
+      /* ??? It would be nice to use setmem or similar patterns here,
+	 but they do not necessarily obey the stack growth direction,
+	 which has security implications.  We also have to avoid calls
+	 (memset, bzero or any machine-specific ones), which are
+	 likely unsafe here (see TARGET_STRUB_MAY_USE_MEMSET).  */
+#ifndef STACK_GROWS_DOWNWARD
       rtx incr = plus_constant (Pmode, base, ulen);
       rtx dstm = gen_rtx_MEM (ptr_mode, base);
 
@@ -5522,6 +5509,15 @@ expand_builtin_strub_leave (tree exp)
       emit_label (loop);
       emit_move_insn (dstm, zero);
       emit_move_insn (base, force_operand (incr, NULL_RTX));
+#else
+      rtx decr = plus_constant (Pmode, end, -ulen);
+      rtx dstm = gen_rtx_MEM (ptr_mode, end);
+
+      rtx_code_label *loop = gen_label_rtx ();
+      emit_label (loop);
+      emit_move_insn (end, force_operand (decr, NULL_RTX));
+      emit_move_insn (dstm, zero);
+#endif
       do_compare_rtx_and_jump (base, end, LT, STACK_UNSIGNED,
 			       Pmode, NULL_RTX, NULL, loop,
 			       profile_probability::very_likely ());
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index a660e33739b..1b73098c65c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -3399,6 +3399,25 @@ in DWARF 2 debug information.  The default is zero.  A different value
 may reduce the size of debug information on some ports.
 @end defmac
 
+@defmac TARGET_STRUB_USE_DYNAMIC_ARRAY
+If defined to nonzero, @code{__strub_leave} will allocate a dynamic
+array covering the stack range that needs scrubbing before clearing it.
+Allocating the array tends to make scrubbing slower, but it enables the
+scrubbing to be safely implemented with a @code{memet} call, which could
+make up for the difference.
+@end defmac
+
+@defmac TARGET_STRUB_MAY_USE_MEMSET
+If defined to nonzero, enable @code{__strub_leave} to be optimized so as
+to call @code{memset} for stack scrubbing.  This is only enabled by
+default if @code{TARGET_STRUB_USE_DYNAMIC_ARRAY} is enabled; it's not
+advisable to enable it otherwise, since @code{memset} would then likely
+overwrite its own stack frame, but it might work if the target ABI
+enables @code{memset} to not use the stack at all, not even for
+arguments or its return address, and its implementation is trivial
+enough that it doesn't use a stack frame.
+@end defmac
+
 @node Exception Handling
 @subsection Exception Handling Support
 @cindex exception handling
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f7ab5d48a63..e9e195efbca 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2648,6 +2648,25 @@ in DWARF 2 debug information.  The default is zero.  A different value
 may reduce the size of debug information on some ports.
 @end defmac
 
+@defmac TARGET_STRUB_USE_DYNAMIC_ARRAY
+If defined to nonzero, @code{__strub_leave} will allocate a dynamic
+array covering the stack range that needs scrubbing before clearing it.
+Allocating the array tends to make scrubbing slower, but it enables the
+scrubbing to be safely implemented with a @code{memet} call, which could
+make up for the difference.
+@end defmac
+
+@defmac TARGET_STRUB_MAY_USE_MEMSET
+If defined to nonzero, enable @code{__strub_leave} to be optimized so as
+to call @code{memset} for stack scrubbing.  This is only enabled by
+default if @code{TARGET_STRUB_USE_DYNAMIC_ARRAY} is enabled; it's not
+advisable to enable it otherwise, since @code{memset} would then likely
+overwrite its own stack frame, but it might work if the target ABI
+enables @code{memset} to not use the stack at all, not even for
+arguments or its return address, and its implementation is trivial
+enough that it doesn't use a stack frame.
+@end defmac
+
 @node Exception Handling
 @subsection Exception Handling Support
 @cindex exception handling
diff --git a/libgcc/strub.c b/libgcc/strub.c
index 90d3e82067b..2a2b930b603 100644
--- a/libgcc/strub.c
+++ b/libgcc/strub.c
@@ -57,56 +57,93 @@ __strub_update (void **watermark)
     *watermark = sp;
 }
 
-#ifndef TARGET_STRUB_USE_DYNAMIC_ARRAY
-# define TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY 1
+#if TARGET_STRUB_USE_DYNAMIC_ARRAY && ! defined TARGET_STRUB_MAY_USE_MEMSET
+# define TARGET_STRUB_MAY_USE_MEMSET 1
 #endif
 
-#ifndef TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY
-# ifdef TARGET_STRUB_MAY_USE_MEMSET
-#  define TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY 1
-# else
-#  define TARGET_STRUB_MAY_USE_MEMSET 1
-# endif
+#if defined __x86_64__ && __OPTIMIZE__
+# define TARGET_STRUB_DISABLE_RED_ZONE \
+  /* __attribute__ ((__target__ ("no-red-zone"))) // not needed when optimizing */
+#elif !defined RED_ZONE_SIZE || defined __i386__
+# define TARGET_STRUB_DISABLE_RED_ZONE
 #endif
 
-/* Leave a stack scrubbing context, restoring and updating SAVED, and
-   clearing the stack between top and watermark.  */
+#ifndef TARGET_STRUB_DISABLE_RED_ZONE
+/* Dummy function, called to force the caller to not be a leaf function, so
+   that it can't use the red zone.  */
+static void ATTRIBUTE_STRUB_CALLABLE
+__attribute__ ((__noinline__, __noipa__))
+__strub_dummy_force_no_leaf (void)
+{
+}
+#endif
+
+/* Leave a stack scrubbing context, clearing the stack between its top and
+   *MARK.  */
 void ATTRIBUTE_STRUB_CALLABLE
 #if ! TARGET_STRUB_MAY_USE_MEMSET
 __attribute__ ((__optimize__ ("-fno-tree-loop-distribute-patterns")))
 #endif
+#ifdef TARGET_STRUB_DISABLE_RED_ZONE
+TARGET_STRUB_DISABLE_RED_ZONE
+#endif
 __strub_leave (void **mark)
 {
   void *sp = __builtin_stack_address ();
 
   void **base, **end;
 #ifndef STACK_GROWS_DOWNWARD
-  base = sp;
+  base = sp; /* ??? Do we need an offset here?  */
   end = *mark;
 #else
   base = *mark;
-  end = sp;
+  end = sp; /* ??? Does any platform require an offset here?  */
 #endif
 
-  ptrdiff_t len = end - base;
-  if (len <= 0)
+  if (! (base < end))
     return;
 
-#if ! TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY
+#if TARGET_STRUB_USE_DYNAMIC_ARRAY
+  /* Compute the length without assuming the pointers are both sufficiently
+     aligned.  They should be, but pointer differences expected to be exact may
+     yield unexpected results when the assumption doesn't hold.  Given the
+     potential security implications, compute the length without that
+     expectation.  If the pointers are misaligned, we may leave a partial
+     unscrubbed word behind.  */
+  ptrdiff_t len = ((char *)end - (char *)base) / sizeof (void *);
   /* Allocate a dynamically-sized array covering the desired range, so that we
      can safely call memset on it.  */
   void *ptr[len];
   base = &ptr[0];
   end = &ptr[len];
-#else
-  void **ptr = end;
-#endif /* TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY */
+#elifndef TARGET_STRUB_DISABLE_RED_ZONE
+  /* Prevent the use of the red zone, by making this function non-leaf through
+     an unreachable call that, because of the asm stmt, the compiler will
+     consider reachable.  */
+  asm goto ("" : : : : no_leaf);
+  if (0)
+    {
+    no_leaf:
+      __strub_dummy_force_no_leaf ();
+      return;
+    }
+#endif
 
-  /* ldist turns this into a memset.  Without the dynamic array above, that call
-     is likely unsafe: possibly tail-called, and likely scribbling over its own
-     stack frame.  */
-  while (base < end)
+  /* ldist may turn these loops into a memset (thus the conditional
+     -fno-tree-loop-distribute-patterns above).  Without the dynamic array
+     above, that call would likely be unsafe: possibly tail-called, and likely
+     scribbling over its own stack frame.  */
+#ifndef STACK_GROWS_DOWNWARD
+  do
     *base++ = 0;
-
-  asm ("" : : "m" (ptr));
+  while (base < end);
+  /* Make sure the stack overwrites are not optimized away.  */
+  asm ("" : : "m" (end[0]));
+#else
+  do
+    *--end = 0;
+  while (base < end);
+  /* Make sure the stack overwrites are not optimized away.  */
+  asm ("" : : "m" (base[0]));
+#endif
 }

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [gcc(refs/users/aoliva/heads/testme)] strub: introduce downward scrubbing
@ 2023-06-09  6:17 Alexandre Oliva
  0 siblings, 0 replies; 7+ messages in thread
From: Alexandre Oliva @ 2023-06-09  6:17 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:d49ef2974f0659880d5d75f2673aca490b3fb14f

commit d49ef2974f0659880d5d75f2673aca490b3fb14f
Author: Alexandre Oliva <oliva@adacore.com>
Date:   Thu Jun 8 01:43:59 2023 -0300

    strub: introduce downward scrubbing
    
    This patch causes strubbing to proceed downward on platforms in which
    the stack grows downward.  This is intended to avoid getting data
    unrelated to the stack corrupted by scrubbing in case the watermark is
    corrupted.  The underlying assumption that makes this more secure is
    that, if the watermark is corrupted to an address below the stack
    range, scrubbing will hit a stack guard page before causing any
    damage.
    
    The length computation in __strub_leave is moved into the dynamic
    array case only, and performed in a safer way to guard against some
    attack scenarios that get it misaligned pointers.  In the case without
    a dynamic array, we may scrub an extra few bytes at the stack-top-end
    of the scrub range.
    
    This also documents and streamlines the target settings to enable the
    use of dynamic arrays and memset in __strub_leave, drops the watermark
    (mis)optimization in the strub_enter builtin expander, and some
    obsolete preprocessor directives.
    
    
    for  gcc/ChangeLog
    
            * builtins.cc (expand_builtin_strub_enter): Drop
            misoptimization.
            (expand_builtin_strub_leave): Strub in the same direction the
            stack grows.
            * doc/tm.texi.in (TARGET_STRUB_USE_DYNAMIC_ARRAY): New.
            (TARGET_STRUB_MAY_USE_MEMSET): New.
            * doc/tm.texi: Rebuilt.
    
    for  libgcc/ChangeLog
    
            * strub.c (TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY): Drop in favor
            of...
            (TARGET_STRUB_USE_DYNAMIC_ARRAY): ... this.
            (TARGET_STRUB_MAŶ_USE_MEMSET): Rephrase default setting.
            (__strub_leave): Strub in the same direction the
            stack grows.  Simplify dynamic array and memset settings.
            Compute array length in a more conservative and security-aware
            way.

Diff:
---
 gcc/builtins.cc    | 38 +++++++++++-------------
 gcc/doc/tm.texi    | 19 ++++++++++++
 gcc/doc/tm.texi.in | 19 ++++++++++++
 libgcc/strub.c     | 85 +++++++++++++++++++++++++++++++++++++++---------------
 4 files changed, 116 insertions(+), 45 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 0283519c211..db4aed40014 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -5314,23 +5314,7 @@ expand_builtin_strub_enter (tree exp)
   if (optimize < 1 || flag_no_inline)
     return NULL_RTX;
 
-  rtx stktop = NULL_RTX;
-
-#if 1 || defined RED_ZONE_SIZE
-  if (tree wmptr = (optimize
-		    ? strub_watermark_parm (current_function_decl)
-		    : NULL_TREE))
-    {
-      tree wmtype = TREE_TYPE (TREE_TYPE (wmptr));
-      tree wmtree = fold_build2 (MEM_REF, wmtype, wmptr,
-				 build_int_cst (TREE_TYPE (wmptr), 0));
-      rtx wmark = expand_expr (wmtree, NULL_RTX, ptr_mode, EXPAND_MEMORY);
-      stktop = force_reg (ptr_mode, wmark);
-    }
-#endif
-
-  if (!stktop)
-    stktop = expand_builtin_stack_address ();
+  rtx stktop = expand_builtin_stack_address ();
 
   tree wmptr = CALL_EXPR_ARG (exp, 0);
   tree wmtype = TREE_TYPE (TREE_TYPE (wmptr));
@@ -5416,7 +5400,6 @@ expand_builtin_strub_update (tree exp)
 			   profile_probability::very_likely ());
   emit_move_insn (wmark, stktop);
 
-#if 1 || defined RED_ZONE_SIZE
   /* If this is an inlined strub function, also bump the watermark for the
      enclosing function.  This avoids a problem with the following scenario: A
      calls B and B calls C, and both B and C get inlined into A.  B allocates
@@ -5441,7 +5424,6 @@ expand_builtin_strub_update (tree exp)
 			       profile_probability::very_likely ());
       emit_move_insn (wmark, stktop);
     }
-#endif
 
   emit_label (lab);
 
@@ -5462,7 +5444,6 @@ expand_builtin_strub_leave (tree exp)
 
   rtx stktop = NULL_RTX;
 
-#if 1 || defined RED_ZONE_SIZE
   if (tree wmptr = (optimize
 		    ? strub_watermark_parm (current_function_decl)
 		    : NULL_TREE))
@@ -5473,7 +5454,6 @@ expand_builtin_strub_leave (tree exp)
       rtx wmark = expand_expr (wmtree, NULL_RTX, ptr_mode, EXPAND_MEMORY);
       stktop = force_reg (ptr_mode, wmark);
     }
-#endif
 
   if (!stktop)
     stktop = expand_builtin_stack_address ();
@@ -5515,6 +5495,13 @@ expand_builtin_strub_leave (tree exp)
 
       rtx zero = force_operand (const0_rtx, NULL_RTX);
       int ulen = GET_MODE_SIZE (ptr_mode);
+
+      /* ??? It would be nice to use setmem or similar patterns here,
+	 but they do not necessarily obey the stack growth direction,
+	 which has security implications.  We also have to avoid calls
+	 (memset, bzero or any machine-specific ones), which are
+	 likely unsafe here (see TARGET_STRUB_MAY_USE_MEMSET).  */
+#ifndef STACK_GROWS_DOWNWARD
       rtx incr = plus_constant (Pmode, base, ulen);
       rtx dstm = gen_rtx_MEM (ptr_mode, base);
 
@@ -5522,6 +5509,15 @@ expand_builtin_strub_leave (tree exp)
       emit_label (loop);
       emit_move_insn (dstm, zero);
       emit_move_insn (base, force_operand (incr, NULL_RTX));
+#else
+      rtx decr = plus_constant (Pmode, end, -ulen);
+      rtx dstm = gen_rtx_MEM (ptr_mode, end);
+
+      rtx_code_label *loop = gen_label_rtx ();
+      emit_label (loop);
+      emit_move_insn (end, force_operand (decr, NULL_RTX));
+      emit_move_insn (dstm, zero);
+#endif
       do_compare_rtx_and_jump (base, end, LT, STACK_UNSIGNED,
 			       Pmode, NULL_RTX, NULL, loop,
 			       profile_probability::very_likely ());
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index a660e33739b..1b73098c65c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -3399,6 +3399,25 @@ in DWARF 2 debug information.  The default is zero.  A different value
 may reduce the size of debug information on some ports.
 @end defmac
 
+@defmac TARGET_STRUB_USE_DYNAMIC_ARRAY
+If defined to nonzero, @code{__strub_leave} will allocate a dynamic
+array covering the stack range that needs scrubbing before clearing it.
+Allocating the array tends to make scrubbing slower, but it enables the
+scrubbing to be safely implemented with a @code{memet} call, which could
+make up for the difference.
+@end defmac
+
+@defmac TARGET_STRUB_MAY_USE_MEMSET
+If defined to nonzero, enable @code{__strub_leave} to be optimized so as
+to call @code{memset} for stack scrubbing.  This is only enabled by
+default if @code{TARGET_STRUB_USE_DYNAMIC_ARRAY} is enabled; it's not
+advisable to enable it otherwise, since @code{memset} would then likely
+overwrite its own stack frame, but it might work if the target ABI
+enables @code{memset} to not use the stack at all, not even for
+arguments or its return address, and its implementation is trivial
+enough that it doesn't use a stack frame.
+@end defmac
+
 @node Exception Handling
 @subsection Exception Handling Support
 @cindex exception handling
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f7ab5d48a63..e9e195efbca 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2648,6 +2648,25 @@ in DWARF 2 debug information.  The default is zero.  A different value
 may reduce the size of debug information on some ports.
 @end defmac
 
+@defmac TARGET_STRUB_USE_DYNAMIC_ARRAY
+If defined to nonzero, @code{__strub_leave} will allocate a dynamic
+array covering the stack range that needs scrubbing before clearing it.
+Allocating the array tends to make scrubbing slower, but it enables the
+scrubbing to be safely implemented with a @code{memet} call, which could
+make up for the difference.
+@end defmac
+
+@defmac TARGET_STRUB_MAY_USE_MEMSET
+If defined to nonzero, enable @code{__strub_leave} to be optimized so as
+to call @code{memset} for stack scrubbing.  This is only enabled by
+default if @code{TARGET_STRUB_USE_DYNAMIC_ARRAY} is enabled; it's not
+advisable to enable it otherwise, since @code{memset} would then likely
+overwrite its own stack frame, but it might work if the target ABI
+enables @code{memset} to not use the stack at all, not even for
+arguments or its return address, and its implementation is trivial
+enough that it doesn't use a stack frame.
+@end defmac
+
 @node Exception Handling
 @subsection Exception Handling Support
 @cindex exception handling
diff --git a/libgcc/strub.c b/libgcc/strub.c
index 90d3e82067b..2a2b930b603 100644
--- a/libgcc/strub.c
+++ b/libgcc/strub.c
@@ -57,56 +57,93 @@ __strub_update (void **watermark)
     *watermark = sp;
 }
 
-#ifndef TARGET_STRUB_USE_DYNAMIC_ARRAY
-# define TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY 1
+#if TARGET_STRUB_USE_DYNAMIC_ARRAY && ! defined TARGET_STRUB_MAY_USE_MEMSET
+# define TARGET_STRUB_MAY_USE_MEMSET 1
 #endif
 
-#ifndef TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY
-# ifdef TARGET_STRUB_MAY_USE_MEMSET
-#  define TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY 1
-# else
-#  define TARGET_STRUB_MAY_USE_MEMSET 1
-# endif
+#if defined __x86_64__ && __OPTIMIZE__
+# define TARGET_STRUB_DISABLE_RED_ZONE \
+  /* __attribute__ ((__target__ ("no-red-zone"))) // not needed when optimizing */
+#elif !defined RED_ZONE_SIZE || defined __i386__
+# define TARGET_STRUB_DISABLE_RED_ZONE
 #endif
 
-/* Leave a stack scrubbing context, restoring and updating SAVED, and
-   clearing the stack between top and watermark.  */
+#ifndef TARGET_STRUB_DISABLE_RED_ZONE
+/* Dummy function, called to force the caller to not be a leaf function, so
+   that it can't use the red zone.  */
+static void ATTRIBUTE_STRUB_CALLABLE
+__attribute__ ((__noinline__, __noipa__))
+__strub_dummy_force_no_leaf (void)
+{
+}
+#endif
+
+/* Leave a stack scrubbing context, clearing the stack between its top and
+   *MARK.  */
 void ATTRIBUTE_STRUB_CALLABLE
 #if ! TARGET_STRUB_MAY_USE_MEMSET
 __attribute__ ((__optimize__ ("-fno-tree-loop-distribute-patterns")))
 #endif
+#ifdef TARGET_STRUB_DISABLE_RED_ZONE
+TARGET_STRUB_DISABLE_RED_ZONE
+#endif
 __strub_leave (void **mark)
 {
   void *sp = __builtin_stack_address ();
 
   void **base, **end;
 #ifndef STACK_GROWS_DOWNWARD
-  base = sp;
+  base = sp; /* ??? Do we need an offset here?  */
   end = *mark;
 #else
   base = *mark;
-  end = sp;
+  end = sp; /* ??? Does any platform require an offset here?  */
 #endif
 
-  ptrdiff_t len = end - base;
-  if (len <= 0)
+  if (! (base < end))
     return;
 
-#if ! TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY
+#if TARGET_STRUB_USE_DYNAMIC_ARRAY
+  /* Compute the length without assuming the pointers are both sufficiently
+     aligned.  They should be, but pointer differences expected to be exact may
+     yield unexpected results when the assumption doesn't hold.  Given the
+     potential security implications, compute the length without that
+     expectation.  If the pointers are misaligned, we may leave a partial
+     unscrubbed word behind.  */
+  ptrdiff_t len = ((char *)end - (char *)base) / sizeof (void *);
   /* Allocate a dynamically-sized array covering the desired range, so that we
      can safely call memset on it.  */
   void *ptr[len];
   base = &ptr[0];
   end = &ptr[len];
-#else
-  void **ptr = end;
-#endif /* TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY */
+#elifndef TARGET_STRUB_DISABLE_RED_ZONE
+  /* Prevent the use of the red zone, by making this function non-leaf through
+     an unreachable call that, because of the asm stmt, the compiler will
+     consider reachable.  */
+  asm goto ("" : : : : no_leaf);
+  if (0)
+    {
+    no_leaf:
+      __strub_dummy_force_no_leaf ();
+      return;
+    }
+#endif
 
-  /* ldist turns this into a memset.  Without the dynamic array above, that call
-     is likely unsafe: possibly tail-called, and likely scribbling over its own
-     stack frame.  */
-  while (base < end)
+  /* ldist may turn these loops into a memset (thus the conditional
+     -fno-tree-loop-distribute-patterns above).  Without the dynamic array
+     above, that call would likely be unsafe: possibly tail-called, and likely
+     scribbling over its own stack frame.  */
+#ifndef STACK_GROWS_DOWNWARD
+  do
     *base++ = 0;
-
-  asm ("" : : "m" (ptr));
+  while (base < end);
+  /* Make sure the stack overwrites are not optimized away.  */
+  asm ("" : : "m" (end[0]));
+#else
+  do
+    *--end = 0;
+  while (base < end);
+  /* Make sure the stack overwrites are not optimized away.  */
+  asm ("" : : "m" (base[0]));
+#endif
 }

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [gcc(refs/users/aoliva/heads/testme)] strub: introduce downward scrubbing
@ 2023-06-08 10:44 Alexandre Oliva
  0 siblings, 0 replies; 7+ messages in thread
From: Alexandre Oliva @ 2023-06-08 10:44 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:c46fded1c6f46a99d06b61cc1f07c4ab0f8a027f

commit c46fded1c6f46a99d06b61cc1f07c4ab0f8a027f
Author: Alexandre Oliva <oliva@adacore.com>
Date:   Thu Jun 8 01:43:59 2023 -0300

    strub: introduce downward scrubbing
    
    This patch causes strubbing to proceed downward on platforms in which
    the stack grows downward.  This is intended to avoid getting data
    unrelated to the stack corrupted by scrubbing in case the watermark is
    corrupted.  The underlying assumption that makes this more secure is
    that, if the watermark is corrupted to an address below the stack
    range, scrubbing will hit a stack guard page before causing any
    damage.
    
    The length computation in __strub_leave is moved into the dynamic
    array case only, and performed in a safer way to guard against some
    attack scenarios that get it misaligned pointers.  In the case without
    a dynamic array, we may scrub an extra few bytes at the stack-top-end
    of the scrub range.
    
    This also documents and streamlines the target settings to enable the
    use of dynamic arrays and memset in __strub_leave, drops the watermark
    (mis)optimization in the strub_enter builtin expander, and some
    obsolete preprocessor directives.
    
    
    for  gcc/ChangeLog
    
            * builtins.cc (expand_builtin_strub_enter): Drop
            misoptimization.
            (expand_builtin_strub_leave): Strub in the same direction the
            stack grows.
            * doc/tm.texi.in (TARGET_STRUB_USE_DYNAMIC_ARRAY): New.
            (TARGET_STRUB_MAY_USE_MEMSET): New.
            * doc/tm.texi: Rebuilt.
    
    for  libgcc/ChangeLog
    
            * strub.c (TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY): Drop in favor
            of...
            (TARGET_STRUB_USE_DYNAMIC_ARRAY): ... this.
            (TARGET_STRUB_MAŶ_USE_MEMSET): Rephrase default setting.
            (__strub_leave): Strub in the same direction the
            stack grows.  Simplify dynamic array and memset settings.
            Compute array length in a more conservative and security-aware
            way.

Diff:
---
 gcc/builtins.cc    | 38 +++++++++++-------------
 gcc/doc/tm.texi    | 19 ++++++++++++
 gcc/doc/tm.texi.in | 19 ++++++++++++
 libgcc/strub.c     | 85 +++++++++++++++++++++++++++++++++++++++---------------
 4 files changed, 116 insertions(+), 45 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index eccec5f2ed0..a6dc923de57 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -5315,23 +5315,7 @@ expand_builtin_strub_enter (tree exp)
   if (optimize < 1 || flag_no_inline)
     return NULL_RTX;
 
-  rtx stktop = NULL_RTX;
-
-#if 1 || defined RED_ZONE_SIZE
-  if (tree wmptr = (optimize
-		    ? strub_watermark_parm (current_function_decl)
-		    : NULL_TREE))
-    {
-      tree wmtype = TREE_TYPE (TREE_TYPE (wmptr));
-      tree wmtree = fold_build2 (MEM_REF, wmtype, wmptr,
-				 build_int_cst (TREE_TYPE (wmptr), 0));
-      rtx wmark = expand_expr (wmtree, NULL_RTX, ptr_mode, EXPAND_MEMORY);
-      stktop = force_reg (ptr_mode, wmark);
-    }
-#endif
-
-  if (!stktop)
-    stktop = expand_builtin_stack_address ();
+  rtx stktop = expand_builtin_stack_address ();
 
   tree wmptr = CALL_EXPR_ARG (exp, 0);
   tree wmtype = TREE_TYPE (TREE_TYPE (wmptr));
@@ -5417,7 +5401,6 @@ expand_builtin_strub_update (tree exp)
 			   profile_probability::very_likely ());
   emit_move_insn (wmark, stktop);
 
-#if 1 || defined RED_ZONE_SIZE
   /* If this is an inlined strub function, also bump the watermark for the
      enclosing function.  This avoids a problem with the following scenario: A
      calls B and B calls C, and both B and C get inlined into A.  B allocates
@@ -5442,7 +5425,6 @@ expand_builtin_strub_update (tree exp)
 			       profile_probability::very_likely ());
       emit_move_insn (wmark, stktop);
     }
-#endif
 
   emit_label (lab);
 
@@ -5463,7 +5445,6 @@ expand_builtin_strub_leave (tree exp)
 
   rtx stktop = NULL_RTX;
 
-#if 1 || defined RED_ZONE_SIZE
   if (tree wmptr = (optimize
 		    ? strub_watermark_parm (current_function_decl)
 		    : NULL_TREE))
@@ -5474,7 +5455,6 @@ expand_builtin_strub_leave (tree exp)
       rtx wmark = expand_expr (wmtree, NULL_RTX, ptr_mode, EXPAND_MEMORY);
       stktop = force_reg (ptr_mode, wmark);
     }
-#endif
 
   if (!stktop)
     stktop = expand_builtin_stack_address ();
@@ -5516,6 +5496,13 @@ expand_builtin_strub_leave (tree exp)
 
       rtx zero = force_operand (const0_rtx, NULL_RTX);
       int ulen = GET_MODE_SIZE (ptr_mode);
+
+      /* ??? It would be nice to use setmem or similar patterns here,
+	 but they do not necessarily obey the stack growth direction,
+	 which has security implications.  We also have to avoid calls
+	 (memset, bzero or any machine-specific ones), which are
+	 likely unsafe here (see TARGET_STRUB_MAY_USE_MEMSET).  */
+#ifndef STACK_GROWS_DOWNWARD
       rtx incr = plus_constant (Pmode, base, ulen);
       rtx dstm = gen_rtx_MEM (ptr_mode, base);
 
@@ -5523,6 +5510,15 @@ expand_builtin_strub_leave (tree exp)
       emit_label (loop);
       emit_move_insn (dstm, zero);
       emit_move_insn (base, force_operand (incr, NULL_RTX));
+#else
+      rtx decr = plus_constant (Pmode, end, -ulen);
+      rtx dstm = gen_rtx_MEM (ptr_mode, end);
+
+      rtx_code_label *loop = gen_label_rtx ();
+      emit_label (loop);
+      emit_move_insn (end, force_operand (decr, NULL_RTX));
+      emit_move_insn (dstm, zero);
+#endif
       do_compare_rtx_and_jump (base, end, LT, STACK_UNSIGNED,
 			       Pmode, NULL_RTX, NULL, loop,
 			       profile_probability::very_likely ());
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 95ba56e05ae..463e98ce371 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -3399,6 +3399,25 @@ in DWARF 2 debug information.  The default is zero.  A different value
 may reduce the size of debug information on some ports.
 @end defmac
 
+@defmac TARGET_STRUB_USE_DYNAMIC_ARRAY
+If defined to nonzero, @code{__strub_leave} will allocate a dynamic
+array covering the stack range that needs scrubbing before clearing it.
+Allocating the array tends to make scrubbing slower, but it enables the
+scrubbing to be safely implemented with a @code{memset} call, which could
+make up for the difference.
+@end defmac
+
+@defmac TARGET_STRUB_MAY_USE_MEMSET
+If defined to nonzero, enable @code{__strub_leave} to be optimized so as
+to call @code{memset} for stack scrubbing.  This is only enabled by
+default if @code{TARGET_STRUB_USE_DYNAMIC_ARRAY} is enabled; it's not
+advisable to enable it otherwise, since @code{memset} would then likely
+overwrite its own stack frame, but it might work if the target ABI
+enables @code{memset} to not use the stack at all, not even for
+arguments or its return address, and its implementation is trivial
+enough that it doesn't use a stack frame.
+@end defmac
+
 @node Exception Handling
 @subsection Exception Handling Support
 @cindex exception handling
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 4ac96dc357d..369384a1ad6 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2648,6 +2648,25 @@ in DWARF 2 debug information.  The default is zero.  A different value
 may reduce the size of debug information on some ports.
 @end defmac
 
+@defmac TARGET_STRUB_USE_DYNAMIC_ARRAY
+If defined to nonzero, @code{__strub_leave} will allocate a dynamic
+array covering the stack range that needs scrubbing before clearing it.
+Allocating the array tends to make scrubbing slower, but it enables the
+scrubbing to be safely implemented with a @code{memset} call, which could
+make up for the difference.
+@end defmac
+
+@defmac TARGET_STRUB_MAY_USE_MEMSET
+If defined to nonzero, enable @code{__strub_leave} to be optimized so as
+to call @code{memset} for stack scrubbing.  This is only enabled by
+default if @code{TARGET_STRUB_USE_DYNAMIC_ARRAY} is enabled; it's not
+advisable to enable it otherwise, since @code{memset} would then likely
+overwrite its own stack frame, but it might work if the target ABI
+enables @code{memset} to not use the stack at all, not even for
+arguments or its return address, and its implementation is trivial
+enough that it doesn't use a stack frame.
+@end defmac
+
 @node Exception Handling
 @subsection Exception Handling Support
 @cindex exception handling
diff --git a/libgcc/strub.c b/libgcc/strub.c
index 90d3e82067b..2a2b930b603 100644
--- a/libgcc/strub.c
+++ b/libgcc/strub.c
@@ -57,56 +57,93 @@ __strub_update (void **watermark)
     *watermark = sp;
 }
 
-#ifndef TARGET_STRUB_USE_DYNAMIC_ARRAY
-# define TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY 1
+#if TARGET_STRUB_USE_DYNAMIC_ARRAY && ! defined TARGET_STRUB_MAY_USE_MEMSET
+# define TARGET_STRUB_MAY_USE_MEMSET 1
 #endif
 
-#ifndef TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY
-# ifdef TARGET_STRUB_MAY_USE_MEMSET
-#  define TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY 1
-# else
-#  define TARGET_STRUB_MAY_USE_MEMSET 1
-# endif
+#if defined __x86_64__ && __OPTIMIZE__
+# define TARGET_STRUB_DISABLE_RED_ZONE \
+  /* __attribute__ ((__target__ ("no-red-zone"))) // not needed when optimizing */
+#elif !defined RED_ZONE_SIZE || defined __i386__
+# define TARGET_STRUB_DISABLE_RED_ZONE
 #endif
 
-/* Leave a stack scrubbing context, restoring and updating SAVED, and
-   clearing the stack between top and watermark.  */
+#ifndef TARGET_STRUB_DISABLE_RED_ZONE
+/* Dummy function, called to force the caller to not be a leaf function, so
+   that it can't use the red zone.  */
+static void ATTRIBUTE_STRUB_CALLABLE
+__attribute__ ((__noinline__, __noipa__))
+__strub_dummy_force_no_leaf (void)
+{
+}
+#endif
+
+/* Leave a stack scrubbing context, clearing the stack between its top and
+   *MARK.  */
 void ATTRIBUTE_STRUB_CALLABLE
 #if ! TARGET_STRUB_MAY_USE_MEMSET
 __attribute__ ((__optimize__ ("-fno-tree-loop-distribute-patterns")))
 #endif
+#ifdef TARGET_STRUB_DISABLE_RED_ZONE
+TARGET_STRUB_DISABLE_RED_ZONE
+#endif
 __strub_leave (void **mark)
 {
   void *sp = __builtin_stack_address ();
 
   void **base, **end;
 #ifndef STACK_GROWS_DOWNWARD
-  base = sp;
+  base = sp; /* ??? Do we need an offset here?  */
   end = *mark;
 #else
   base = *mark;
-  end = sp;
+  end = sp; /* ??? Does any platform require an offset here?  */
 #endif
 
-  ptrdiff_t len = end - base;
-  if (len <= 0)
+  if (! (base < end))
     return;
 
-#if ! TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY
+#if TARGET_STRUB_USE_DYNAMIC_ARRAY
+  /* Compute the length without assuming the pointers are both sufficiently
+     aligned.  They should be, but pointer differences expected to be exact may
+     yield unexpected results when the assumption doesn't hold.  Given the
+     potential security implications, compute the length without that
+     expectation.  If the pointers are misaligned, we may leave a partial
+     unscrubbed word behind.  */
+  ptrdiff_t len = ((char *)end - (char *)base) / sizeof (void *);
   /* Allocate a dynamically-sized array covering the desired range, so that we
      can safely call memset on it.  */
   void *ptr[len];
   base = &ptr[0];
   end = &ptr[len];
-#else
-  void **ptr = end;
-#endif /* TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY */
+#elifndef TARGET_STRUB_DISABLE_RED_ZONE
+  /* Prevent the use of the red zone, by making this function non-leaf through
+     an unreachable call that, because of the asm stmt, the compiler will
+     consider reachable.  */
+  asm goto ("" : : : : no_leaf);
+  if (0)
+    {
+    no_leaf:
+      __strub_dummy_force_no_leaf ();
+      return;
+    }
+#endif
 
-  /* ldist turns this into a memset.  Without the dynamic array above, that call
-     is likely unsafe: possibly tail-called, and likely scribbling over its own
-     stack frame.  */
-  while (base < end)
+  /* ldist may turn these loops into a memset (thus the conditional
+     -fno-tree-loop-distribute-patterns above).  Without the dynamic array
+     above, that call would likely be unsafe: possibly tail-called, and likely
+     scribbling over its own stack frame.  */
+#ifndef STACK_GROWS_DOWNWARD
+  do
     *base++ = 0;
-
-  asm ("" : : "m" (ptr));
+  while (base < end);
+  /* Make sure the stack overwrites are not optimized away.  */
+  asm ("" : : "m" (end[0]));
+#else
+  do
+    *--end = 0;
+  while (base < end);
+  /* Make sure the stack overwrites are not optimized away.  */
+  asm ("" : : "m" (base[0]));
+#endif
 }
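
The two TARGET_STRUB_* macros documented above are meant to be set by
libgcc target configuration.  As a minimal sketch, assuming a
hypothetical target whose memset provably avoids the stack, opting in
could look like this (the macro name is from the patch; its placement in
a target's libgcc configuration header is an assumption):

    /* Hypothetical target fragment: enable the dynamic-array strategy;
       strub.c then defaults TARGET_STRUB_MAY_USE_MEMSET to 1 as well.  */
    #define TARGET_STRUB_USE_DYNAMIC_ARRAY 1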

* [gcc(refs/users/aoliva/heads/testme)] strub: introduce downward scrubbing
@ 2023-06-08  4:48 Alexandre Oliva
  0 siblings, 0 replies; 7+ messages in thread
From: Alexandre Oliva @ 2023-06-08  4:48 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:37f768c40cdd5ff5af869369c335b62df8e578ff

commit 37f768c40cdd5ff5af869369c335b62df8e578ff
Author: Alexandre Oliva <oliva@adacore.com>
Date:   Thu Jun 8 01:43:08 2023 -0300

    strub: introduce downward scrubbing
    
    This patch causes strubbing to proceed downward on platforms in which
    the stack grows downward.  This is intended to avoid getting data
    unrelated to the stack corrupted by scrubbing in case the watermark is
    corrupted.  The underlying assumption that makes this more secure is
    that, if the watermark is corrupted to an address below the stack
    range, scrubbing will hit a stack guard page before causing any
    damage.
    
    The length computation in __strub_leave is moved into the dynamic
    array case only, and performed in a safer way to guard against some
    attack scenarios that hand it misaligned pointers.  In the case without
    a dynamic array, we may scrub an extra few bytes at the stack-top-end
    of the scrub range.
    
    This also documents and streamlines the target settings to enable the
    use of dynamic arrays and memset in __strub_leave, drops the watermark
    (mis)optimization in the strub_enter builtin expander, and some
    obsolete preprocessor directives.
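    
    As a minimal sketch of the new direction (a hand-written
    illustration under the patch's assumptions, not code from the
    patch itself), the downward scrub amounts to:
    
        static void scrub_down (void **low, void **high)
        {
          /* LOW derives from the possibly-corrupted watermark; HIGH is
             the trusted current stack top.  Writing from HIGH downward
             means a bogus LOW makes the loop walk into the stack guard
             page and fault before it can reach unrelated memory.  */
          while (low < high)
            *--high = 0;
        }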
    
    
    for  gcc/ChangeLog
    
            * builtins.cc (expand_builtin_strub_enter): Drop
            misoptimization.
            (expand_builtin_strub_leave): Strub in the same direction the
            stack grows.
            * doc/tm.texi.in (TARGET_STRUB_USE_DYNAMIC_ARRAY): New.
            (TARGET_STRUB_MAY_USE_MEMSET): New.
            * doc/tm.texi: Rebuilt.
    
    for  libgcc/ChangeLog
    
            * strub.c (TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY): Drop in favor
            of...
            (TARGET_STRUB_USE_DYNAMIC_ARRAY): ... this.
            (TARGET_STRUB_MAY_USE_MEMSET): Rephrase default setting.
            (__strub_leave): Strub in the same direction the
            stack grows.  Simplify dynamic array and memset settings.
            Compute array length in a more conservative and security-aware
            way.
    
    Change-Id: I6293f2d503bbe12e9f58260c56b14e76494f0634
    TN: U611-048

Diff:
---
 gcc/builtins.cc    | 38 +++++++++++-------------
 gcc/doc/tm.texi    | 19 ++++++++++++
 gcc/doc/tm.texi.in | 19 ++++++++++++
 libgcc/strub.c     | 85 +++++++++++++++++++++++++++++++++++++++---------------
 4 files changed, 116 insertions(+), 45 deletions(-)

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index eccec5f2ed0..a6dc923de57 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -5315,23 +5315,7 @@ expand_builtin_strub_enter (tree exp)
   if (optimize < 1 || flag_no_inline)
     return NULL_RTX;
 
-  rtx stktop = NULL_RTX;
-
-#if 1 || defined RED_ZONE_SIZE
-  if (tree wmptr = (optimize
-		    ? strub_watermark_parm (current_function_decl)
-		    : NULL_TREE))
-    {
-      tree wmtype = TREE_TYPE (TREE_TYPE (wmptr));
-      tree wmtree = fold_build2 (MEM_REF, wmtype, wmptr,
-				 build_int_cst (TREE_TYPE (wmptr), 0));
-      rtx wmark = expand_expr (wmtree, NULL_RTX, ptr_mode, EXPAND_MEMORY);
-      stktop = force_reg (ptr_mode, wmark);
-    }
-#endif
-
-  if (!stktop)
-    stktop = expand_builtin_stack_address ();
+  rtx stktop = expand_builtin_stack_address ();
 
   tree wmptr = CALL_EXPR_ARG (exp, 0);
   tree wmtype = TREE_TYPE (TREE_TYPE (wmptr));
@@ -5417,7 +5401,6 @@ expand_builtin_strub_update (tree exp)
 			   profile_probability::very_likely ());
   emit_move_insn (wmark, stktop);
 
-#if 1 || defined RED_ZONE_SIZE
   /* If this is an inlined strub function, also bump the watermark for the
      enclosing function.  This avoids a problem with the following scenario: A
      calls B and B calls C, and both B and C get inlined into A.  B allocates
@@ -5442,7 +5425,6 @@ expand_builtin_strub_update (tree exp)
 			       profile_probability::very_likely ());
       emit_move_insn (wmark, stktop);
     }
-#endif
 
   emit_label (lab);
 
@@ -5463,7 +5445,6 @@ expand_builtin_strub_leave (tree exp)
 
   rtx stktop = NULL_RTX;
 
-#if 1 || defined RED_ZONE_SIZE
   if (tree wmptr = (optimize
 		    ? strub_watermark_parm (current_function_decl)
 		    : NULL_TREE))
@@ -5474,7 +5455,6 @@ expand_builtin_strub_leave (tree exp)
       rtx wmark = expand_expr (wmtree, NULL_RTX, ptr_mode, EXPAND_MEMORY);
       stktop = force_reg (ptr_mode, wmark);
     }
-#endif
 
   if (!stktop)
     stktop = expand_builtin_stack_address ();
@@ -5516,6 +5496,13 @@ expand_builtin_strub_leave (tree exp)
 
       rtx zero = force_operand (const0_rtx, NULL_RTX);
       int ulen = GET_MODE_SIZE (ptr_mode);
+
+      /* ??? It would be nice to use setmem or similar patterns here,
+	 but they do not necessarily obey the stack growth direction,
+	 which has security implications.  We also have to avoid calls
+	 (memset, bzero or any machine-specific ones), which are
+	 likely unsafe here (see TARGET_STRUB_MAY_USE_MEMSET).  */
+#ifndef STACK_GROWS_DOWNWARD
       rtx incr = plus_constant (Pmode, base, ulen);
       rtx dstm = gen_rtx_MEM (ptr_mode, base);
 
@@ -5523,6 +5510,15 @@ expand_builtin_strub_leave (tree exp)
       emit_label (loop);
       emit_move_insn (dstm, zero);
       emit_move_insn (base, force_operand (incr, NULL_RTX));
+#else
+      rtx decr = plus_constant (Pmode, end, -ulen);
+      rtx dstm = gen_rtx_MEM (ptr_mode, end);
+
+      rtx_code_label *loop = gen_label_rtx ();
+      emit_label (loop);
+      emit_move_insn (end, force_operand (decr, NULL_RTX));
+      emit_move_insn (dstm, zero);
+#endif
       do_compare_rtx_and_jump (base, end, LT, STACK_UNSIGNED,
 			       Pmode, NULL_RTX, NULL, loop,
 			       profile_probability::very_likely ());
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 95ba56e05ae..463e98ce371 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -3399,6 +3399,25 @@ in DWARF 2 debug information.  The default is zero.  A different value
 may reduce the size of debug information on some ports.
 @end defmac
 
+@defmac TARGET_STRUB_USE_DYNAMIC_ARRAY
+If defined to nonzero, @code{__strub_leave} will allocate a dynamic
+array covering the stack range that needs scrubbing before clearing it.
+Allocating the array tends to make scrubbing slower, but it enables the
+scrubbing to be safely implemented with a @code{memset} call, which could
+make up for the difference.
+@end defmac
+
+@defmac TARGET_STRUB_MAY_USE_MEMSET
+If defined to nonzero, enable @code{__strub_leave} to be optimized so as
+to call @code{memset} for stack scrubbing.  This is only enabled by
+default if @code{TARGET_STRUB_USE_DYNAMIC_ARRAY} is enabled; it's not
+advisable to enable it otherwise, since @code{memset} would then likely
+overwrite its own stack frame, but it might work if the target ABI
+enables @code{memset} to not use the stack at all, not even for
+arguments or its return address, and its implementation is trivial
+enough that it doesn't use a stack frame.
+@end defmac
+
 @node Exception Handling
 @subsection Exception Handling Support
 @cindex exception handling
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 4ac96dc357d..369384a1ad6 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2648,6 +2648,25 @@ in DWARF 2 debug information.  The default is zero.  A different value
 may reduce the size of debug information on some ports.
 @end defmac
 
+@defmac TARGET_STRUB_USE_DYNAMIC_ARRAY
+If defined to nonzero, @code{__strub_leave} will allocate a dynamic
+array covering the stack range that needs scrubbing before clearing it.
+Allocating the array tends to make scrubbing slower, but it enables the
+scrubbing to be safely implemented with a @code{memset} call, which could
+make up for the difference.
+@end defmac
+
+@defmac TARGET_STRUB_MAY_USE_MEMSET
+If defined to nonzero, enable @code{__strub_leave} to be optimized so as
+to call @code{memset} for stack scrubbing.  This is only enabled by
+default if @code{TARGET_STRUB_USE_DYNAMIC_ARRAY} is enabled; it's not
+advisable to enable it otherwise, since @code{memset} would then likely
+overwrite its own stack frame, but it might work if the target ABI
+enables @code{memset} to not use the stack at all, not even for
+arguments or its return address, and its implementation is trivial
+enough that it doesn't use a stack frame.
+@end defmac
+
 @node Exception Handling
 @subsection Exception Handling Support
 @cindex exception handling
diff --git a/libgcc/strub.c b/libgcc/strub.c
index 90d3e82067b..2a2b930b603 100644
--- a/libgcc/strub.c
+++ b/libgcc/strub.c
@@ -57,56 +57,93 @@ __strub_update (void **watermark)
     *watermark = sp;
 }
 
-#ifndef TARGET_STRUB_USE_DYNAMIC_ARRAY
-# define TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY 1
+#if TARGET_STRUB_USE_DYNAMIC_ARRAY && ! defined TARGET_STRUB_MAY_USE_MEMSET
+# define TARGET_STRUB_MAY_USE_MEMSET 1
 #endif
 
-#ifndef TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY
-# ifdef TARGET_STRUB_MAY_USE_MEMSET
-#  define TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY 1
-# else
-#  define TARGET_STRUB_MAY_USE_MEMSET 1
-# endif
+#if defined __x86_64__ && __OPTIMIZE__
+# define TARGET_STRUB_DISABLE_RED_ZONE \
+  /* __attribute__ ((__target__ ("no-red-zone"))) // not needed when optimizing */
+#elif !defined RED_ZONE_SIZE || defined __i386__
+# define TARGET_STRUB_DISABLE_RED_ZONE
 #endif
 
-/* Leave a stack scrubbing context, restoring and updating SAVED, and
-   clearing the stack between top and watermark.  */
+#ifndef TARGET_STRUB_DISABLE_RED_ZONE
+/* Dummy function, called to force the caller to not be a leaf function, so
+   that it can't use the red zone.  */
+static void ATTRIBUTE_STRUB_CALLABLE
+__attribute__ ((__noinline__, __noipa__))
+__strub_dummy_force_no_leaf (void)
+{
+}
+#endif
+
+/* Leave a stack scrubbing context, clearing the stack between its top and
+   *MARK.  */
 void ATTRIBUTE_STRUB_CALLABLE
 #if ! TARGET_STRUB_MAY_USE_MEMSET
 __attribute__ ((__optimize__ ("-fno-tree-loop-distribute-patterns")))
 #endif
+#ifdef TARGET_STRUB_DISABLE_RED_ZONE
+TARGET_STRUB_DISABLE_RED_ZONE
+#endif
 __strub_leave (void **mark)
 {
   void *sp = __builtin_stack_address ();
 
   void **base, **end;
 #ifndef STACK_GROWS_DOWNWARD
-  base = sp;
+  base = sp; /* ??? Do we need an offset here?  */
   end = *mark;
 #else
   base = *mark;
-  end = sp;
+  end = sp; /* ??? Does any platform require an offset here?  */
 #endif
 
-  ptrdiff_t len = end - base;
-  if (len <= 0)
+  if (! (base < end))
     return;
 
-#if ! TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY
+#if TARGET_STRUB_USE_DYNAMIC_ARRAY
+  /* Compute the length without assuming the pointers are both sufficiently
+     aligned.  They should be, but pointer differences expected to be exact may
+     yield unexpected results when the assumption doesn't hold.  Given the
+     potential security implications, compute the length without that
+     expectation.  If the pointers are misaligned, we may leave a partial
+     unscrubbed word behind.  */
+  ptrdiff_t len = ((char *)end - (char *)base) / sizeof (void *);
   /* Allocate a dynamically-sized array covering the desired range, so that we
      can safely call memset on it.  */
   void *ptr[len];
   base = &ptr[0];
   end = &ptr[len];
-#else
-  void **ptr = end;
-#endif /* TARGET_STRUB_DONT_USE_DYNAMIC_ARRAY */
+#elifndef TARGET_STRUB_DISABLE_RED_ZONE
+  /* Prevent the use of the red zone, by making this function non-leaf through
+     an unreachable call that, because of the asm stmt, the compiler will
+     consider reachable.  */
+  asm goto ("" : : : : no_leaf);
+  if (0)
+    {
+    no_leaf:
+      __strub_dummy_force_no_leaf ();
+      return;
+    }
+#endif
 
-  /* ldist turns this into a memset.  Without the dynamic array above, that call
-     is likely unsafe: possibly tail-called, and likely scribbling over its own
-     stack frame.  */
-  while (base < end)
+  /* ldist may turn these loops into a memset (thus the conditional
+     -fno-tree-loop-distribute-patterns above).  Without the dynamic array
+     above, that call would likely be unsafe: possibly tail-called, and likely
+     scribbling over its own stack frame.  */
+#ifndef STACK_GROWS_DOWNWARD
+  do
     *base++ = 0;
-
-  asm ("" : : "m" (ptr));
+  while (base < end);
+  /* Make sure the stack overwrites are not optimized away.  */
+  asm ("" : : "m" (end[0]));
+#else
+  do
+    *--end = 0;
+  while (base < end);
+  /* Make sure the stack overwrites are not optimized away.  */
+  asm ("" : : "m" (base[0]));
+#endif
 }
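
For reference, the caller-side protocol around these runtime functions
is roughly the following (a hand-written sketch: the compiler emits
these calls itself, and the __strub_enter signature is inferred from
its builtin expander rather than shown in this patch):

    extern void __strub_enter (void **);
    extern void __strub_update (void **);
    extern void __strub_leave (void **);

    void caller (void)
    {
      void *watermark;
      __strub_enter (&watermark);  /* record the current stack top */
      /* ... strub-protected code runs here, calling __strub_update
	 to push the watermark as its stack use grows ... */
      __strub_leave (&watermark);  /* scrub from the watermark back */
    }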
