public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/3]  Add __builtin_load_no_speculate
@ 2018-01-04 13:58 Richard Earnshaw
  2018-01-04 13:58 ` [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate() Richard Earnshaw
                   ` (7 more replies)
  0 siblings, 8 replies; 70+ messages in thread
From: Richard Earnshaw @ 2018-01-04 13:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 4444 bytes --]


Recently, Google Project Zero disclosed several classes of attack
against speculative execution. One of these, known as variant-1
(CVE-2017-5753), allows explicit bounds checks to be bypassed under
speculation, providing an arbitrary read gadget. Further details can
be found on the GPZ blog [1] and the documentation that is included
with the first patch.

This patch set adds a new builtin function for GCC to provide a
mechanism for limiting speculation by a CPU after a bounds-checked
memory access.  I've tried to design this in such a way that it can be
used for any target where this might be necessary.  The patch set
provides a generic implementation of the builtin and then
target-specific support for Arm and AArch64.  Other architectures can
utilize the internal infrastructure as needed.

Most of the details of the builtin and the hooks that need to be
implemented to support it are described in the updates to the manual,
but a short summary is given below.

TYP __builtin_load_no_speculate
        (const volatile TYP *ptr,
         const volatile void *lower,
         const volatile void *upper,
         TYP failval,
         const volatile void *cmpptr)

where TYP can be any integral type (signed or unsigned char, short,
int, long, etc.) or any pointer type.

The builtin implements the following logical behaviour:

inline TYP __builtin_load_no_speculate
         (const volatile TYP *ptr,
          const volatile void *lower,
          const volatile void *upper,
          TYP failval,
          const volatile void *cmpptr)
{
  TYP result;

  if (cmpptr >= lower && cmpptr < upper)
    result = *ptr;
  else
    result = failval;
  return result;
}

In addition, the specification of the builtin ensures that subsequent
speculation using *ptr may continue only if cmpptr lies within the
specified bounds.

Some optimizations are permitted to make the builtin easier to use.
The final two arguments may both be omitted (C++-style): failval then
defaults to 0, and if cmpptr is omitted, ptr is used when expanding
the range check.  In addition, either lower or upper (but not both)
may be a literal NULL, in which case the expansion ignores that
boundary condition.
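To illustrate how calls to the builtin might look, here is a sketch of
a bounds-checked table lookup.  Since the builtin itself requires a
patched compiler, the sketch substitutes a plain-C stand-in with the
same logical behaviour (and no speculation barrier); the function and
variable names are illustrative only, not part of the patch:

```c
#include <stddef.h>

/* Plain-C stand-in for the builtin's logical behaviour.  The real
   builtin additionally inhibits speculative use of the loaded value
   when cmpptr is out of bounds.  */
static inline int
load_no_speculate_int (const volatile int *ptr,
                       const volatile void *lower,
                       const volatile void *upper,
                       int failval,
                       const volatile void *cmpptr)
{
  if (cmpptr >= lower && cmpptr < upper)
    return *ptr;
  return failval;
}

int
lookup (const int *table, size_t nelems, size_t untrusted_index)
{
  if (untrusted_index >= nelems)
    return 0;
  /* Under speculation the bounds check above may be bypassed; the
     builtin re-checks the pointer so the out-of-bounds value cannot
     feed further speculation.  */
  return load_no_speculate_int (&table[untrusted_index],
                                table, table + nelems,
                                0, &table[untrusted_index]);
}
```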

The patch set is constructed as follows:
1 - generic modifications to GCC providing the builtin function for all
    architectures and expanding to an implementation that gives only the
    logical behaviour of the builtin.  A warning is generated when this
    expansion path is used: the code will execute correctly, but without
    protection against speculative use.
2 - AArch64 support
3 - AArch32 support (arm) for A32 and thumb2 states.

These patches can be used with the header file that Arm recently
published here: https://github.com/ARM-software/speculation-barrier.

Kernel patches are also being developed, e.g.:
https://lkml.org/lkml/2018/1/3/754.  The intent is that eventually
code like this will be able to use support directly from the compiler
in a portable manner.
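A hypothetical portable wrapper (the macro name and fallback below are
illustrative only, not taken from Arm's header or the kernel patches)
could key off the feature-test macro the patch defines:

```c
/* Hypothetical wrapper in the spirit of Arm's speculation-barrier
   header; the names here are illustrative, not the header's API.  */
#ifdef __HAVE_LOAD_NO_SPECULATE
# define load_no_speculate(ptr, lo, hi) \
    __builtin_load_no_speculate ((ptr), (lo), (hi))
#else
/* Fallback: architecturally correct, but offers no speculation
   protection (mirroring the compiler's default expansion).  */
# define load_no_speculate(ptr, lo, hi)                 \
    (((const void *) (ptr) >= (const void *) (lo)       \
      && (const void *) (ptr) < (const void *) (hi))    \
     ? *(ptr) : 0)
#endif
```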

Similar patches are also being developed for LLVM and will be posted
to their development lists shortly.

[1] More information on the topic can be found here:
https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
Arm specific information can be found here: https://www.arm.com/security-update



Richard Earnshaw (3):
  [builtins] Generic support for __builtin_load_no_speculate()
  [aarch64] Implement support for __builtin_load_no_speculate.
  [arm] Implement support for the de-speculation intrinsic

 gcc/builtin-types.def         |  16 +++++
 gcc/builtins.c                |  99 +++++++++++++++++++++++++
 gcc/builtins.def              |  22 ++++++
 gcc/c-family/c-common.c       | 164 ++++++++++++++++++++++++++++++++++++++++++
 gcc/c-family/c-cppbuiltin.c   |   5 +-
 gcc/config/aarch64/aarch64.c  |  92 ++++++++++++++++++++++++
 gcc/config/aarch64/aarch64.md |  28 ++++++++
 gcc/config/arm/arm.c          | 107 +++++++++++++++++++++++++++
 gcc/config/arm/arm.md         |  40 ++++++++++-
 gcc/config/arm/unspecs.md     |   1 +
 gcc/doc/cpp.texi              |   4 ++
 gcc/doc/extend.texi           |  53 ++++++++++++++
 gcc/doc/tm.texi               |   6 ++
 gcc/doc/tm.texi.in            |   2 +
 gcc/target.def                |  20 ++++++
 gcc/targhooks.c               |  69 ++++++++++++++++++
 gcc/targhooks.h               |   3 +
 17 files changed, 729 insertions(+), 2 deletions(-)





* [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
  2018-01-04 13:58 [PATCH 0/3] Add __builtin_load_no_speculate Richard Earnshaw
@ 2018-01-04 13:58 ` Richard Earnshaw
  2018-01-08  2:20   ` Bill Schmidt
  2018-01-04 13:59 ` [PATCH 2/3] [aarch64] Implement support for __builtin_load_no_speculate Richard Earnshaw
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 70+ messages in thread
From: Richard Earnshaw @ 2018-01-04 13:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 2238 bytes --]


This patch adds generic support for the new builtin
__builtin_load_no_speculate.  It provides overload resolution across
the different access sizes and a default fall-back expansion for
targets that do not implement a mechanism for inhibiting speculation.
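The size-to-variant mapping performed by the overload resolution can be
sketched in plain C (the enum names below are made up for illustration;
the real code in resolve_overloaded_builtin computes
orig_code + exact_log2 (n) + 1):

```c
/* Model of the overload resolution: the pointee size selects one of
   the _1.._16 variants, located log2(size) + 1 entries after the
   generic _N entry.  Enum names are illustrative only.  */
enum fake_fncode {
  LOAD_NO_SPECULATE_N,   /* generic, user-visible entry point */
  LOAD_NO_SPECULATE_1,
  LOAD_NO_SPECULATE_2,
  LOAD_NO_SPECULATE_4,
  LOAD_NO_SPECULATE_8,
  LOAD_NO_SPECULATE_16
};

static enum fake_fncode
resolve_variant (unsigned size)
{
  unsigned lg = 0;
  while ((1u << lg) < size)
    lg++;
  /* Only power-of-two sizes from 1 to 16 bytes are accepted.  */
  if (size == 0 || (1u << lg) != size || size > 16)
    return LOAD_NO_SPECULATE_N;        /* unresolved */
  return (enum fake_fncode) (LOAD_NO_SPECULATE_N + lg + 1);
}
```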

	* builtin-types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):
	New builtin type signature.
	(BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
	(BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
	(BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
	(BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
	* builtins.def (BUILT_IN_LOAD_NO_SPECULATE_N): New builtin.
	(BUILT_IN_LOAD_NO_SPECULATE_1): Likewise.
	(BUILT_IN_LOAD_NO_SPECULATE_2): Likewise.
	(BUILT_IN_LOAD_NO_SPECULATE_4): Likewise.
	(BUILT_IN_LOAD_NO_SPECULATE_8): Likewise.
	(BUILT_IN_LOAD_NO_SPECULATE_16): Likewise.
	* target.def (inhibit_load_speculation): New hook.
	* doc/tm.texi.in (TARGET_INHIBIT_LOAD_SPECULATION): Add to
	documentation.
	* doc/tm.texi: Regenerated.
	* doc/cpp.texi: Document __HAVE_LOAD_NO_SPECULATE.
	* doc/extend.texi: Document __builtin_load_no_speculate.
	* c-family/c-common.c (load_no_speculate_resolve_size): New function.
	(load_no_speculate_resolve_params): New function.
	(load_no_speculate_resolve_return): New function.
	(resolve_overloaded_builtin): Handle overloading
	__builtin_load_no_speculate.
	* builtins.c (expand_load_no_speculate): New function.
	(expand_builtin): Handle new no-speculation builtins.
	* targhooks.h (default_inhibit_load_speculation): Declare.
	* targhooks.c (default_inhibit_load_speculation): New function.
---
 gcc/builtin-types.def       |  16 +++++
 gcc/builtins.c              |  99 ++++++++++++++++++++++++++
 gcc/builtins.def            |  22 ++++++
 gcc/c-family/c-common.c     | 164 ++++++++++++++++++++++++++++++++++++++++++++
 gcc/c-family/c-cppbuiltin.c |   5 +-
 gcc/doc/cpp.texi            |   4 ++
 gcc/doc/extend.texi         |  53 ++++++++++++++
 gcc/doc/tm.texi             |   6 ++
 gcc/doc/tm.texi.in          |   2 +
 gcc/target.def              |  20 ++++++
 gcc/targhooks.c             |  69 +++++++++++++++++++
 gcc/targhooks.h             |   3 +
 12 files changed, 462 insertions(+), 1 deletion(-)


[-- Attachment #2: 0001-builtins-Generic-support-for-__builtin_load_no_specu.patch --]
[-- Type: text/x-patch; name="0001-builtins-Generic-support-for-__builtin_load_no_specu.patch", Size: 22762 bytes --]

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index bb50e60..259aacd 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -785,6 +785,22 @@ DEF_FUNCTION_TYPE_VAR_3 (BT_FN_SSIZE_STRING_SIZE_CONST_STRING_VAR,
 DEF_FUNCTION_TYPE_VAR_3 (BT_FN_INT_FILEPTR_INT_CONST_STRING_VAR,
 			 BT_INT, BT_FILEPTR, BT_INT, BT_CONST_STRING)
 
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+			 BT_I1, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+			 BT_I2, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+			 BT_I4, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+			 BT_I8, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+			 BT_I16, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+
 DEF_FUNCTION_TYPE_VAR_4 (BT_FN_INT_STRING_INT_SIZE_CONST_STRING_VAR,
 			 BT_INT, BT_STRING, BT_INT, BT_SIZE, BT_CONST_STRING)
 
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 98eb804..1bdbc64 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6602,6 +6602,97 @@ expand_stack_save (void)
   return ret;
 }
 
+/* Expand a call to __builtin_load_no_speculate_<N>.  MODE represents the
+   size of the first argument to that call.  We emit a warning if the
+   result isn't used (IGNORE != 0), since the implementation might
+   rely on the value being used to correctly inhibit speculation.  */
+static rtx
+expand_load_no_speculate (machine_mode mode, tree exp, rtx target, int ignore)
+{
+  rtx ptr, op0, op1, op2, op3, op4;
+  unsigned nargs = call_expr_nargs (exp);
+
+  if (ignore)
+    {
+      warning_at (input_location, 0,
+		  "result of __builtin_load_no_speculate must be used to "
+		  "ensure correct operation");
+      target = NULL;
+    }
+
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  tree arg1 = CALL_EXPR_ARG (exp, 1);
+  tree arg2 = CALL_EXPR_ARG (exp, 2);
+
+  ptr = expand_expr (arg0, NULL_RTX, ptr_mode, EXPAND_SUM);
+  op0 = validize_mem (gen_rtx_MEM (mode, convert_memory_address (Pmode, ptr)));
+
+  set_mem_align (op0, MAX (GET_MODE_ALIGNMENT (mode),
+			   get_pointer_alignment (arg0)));
+  set_mem_alias_set (op0, get_alias_set (TREE_TYPE (TREE_TYPE (arg0))));
+
+  /* Mark the memory access as volatile.  We don't want the optimizers to
+     move it or otherwise substitute an alternative value.  */
+  MEM_VOLATILE_P (op0) = 1;
+
+  if (integer_zerop (tree_strip_nop_conversions (arg1)))
+    op1 = NULL;
+  else
+    {
+      op1 = expand_normal (arg1);
+      if (GET_MODE (op1) != ptr_mode && GET_MODE (op1) != VOIDmode)
+	op1 = convert_modes (ptr_mode, VOIDmode, op1,
+			     TYPE_UNSIGNED (TREE_TYPE (arg1)));
+    }
+
+  if (integer_zerop (tree_strip_nop_conversions (arg2)))
+    op2 = NULL;
+  else
+    {
+      op2 = expand_normal (arg2);
+      if (GET_MODE (op2) != ptr_mode && GET_MODE (op2) != VOIDmode)
+	op2 = convert_modes (ptr_mode, VOIDmode, op2,
+			     TYPE_UNSIGNED (TREE_TYPE (arg2)));
+    }
+
+  if (nargs > 3)
+    {
+      tree arg3 = CALL_EXPR_ARG (exp, 3);
+      op3 = expand_normal (arg3);
+      if (CONST_INT_P (op3))
+	op3 = gen_int_mode (INTVAL (op3), mode);
+      else if (GET_MODE (op3) != mode && GET_MODE (op3) != VOIDmode)
+	op3 = convert_modes (mode, VOIDmode, op3,
+			     TYPE_UNSIGNED (TREE_TYPE (arg3)));
+    }
+  else
+    op3 = const0_rtx;
+
+  if (nargs > 4)
+    {
+      tree arg4 = CALL_EXPR_ARG (exp, 4);
+      op4 = expand_normal (arg4);
+      if (GET_MODE (op4) != ptr_mode && GET_MODE (op4) != VOIDmode)
+	op4 = convert_modes (ptr_mode, VOIDmode, op4,
+			     TYPE_UNSIGNED (TREE_TYPE (arg4)));
+    }
+  else
+    op4 = ptr;
+
+  if (op1 == NULL && op2 == NULL)
+    {
+      error_at (input_location,
+		"at least one speculation bound must be non-NULL");
+      /* Ensure we don't crash later.  */
+      op1 = op4;
+    }
+
+  if (target == NULL)
+    target = gen_reg_rtx (mode);
+
+  return targetm.inhibit_load_speculation (mode, target, op0, op1, op2, op3,
+					   op4);
+}
 
 /* Expand an expression EXP that calls a built-in function,
    with result going to TARGET if that's convenient
@@ -7732,6 +7823,14 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
 	 folding.  */
       break;
 
+    case BUILT_IN_LOAD_NO_SPECULATE_1:
+    case BUILT_IN_LOAD_NO_SPECULATE_2:
+    case BUILT_IN_LOAD_NO_SPECULATE_4:
+    case BUILT_IN_LOAD_NO_SPECULATE_8:
+    case BUILT_IN_LOAD_NO_SPECULATE_16:
+      mode = get_builtin_sync_mode (fcode - BUILT_IN_LOAD_NO_SPECULATE_1);
+      return expand_load_no_speculate (mode, exp, target, ignore);
+
     default:	/* just do library call, if unknown builtin */
       break;
     }
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 671097e..761c063 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -1017,6 +1017,28 @@ DEF_BUILTIN (BUILT_IN_EMUTLS_REGISTER_COMMON,
 	     true, true, true, ATTR_NOTHROW_LEAF_LIST, false,
 	     !targetm.have_tls)
 
+/* Suppressing speculation.  Users are expected to use the first (N)
+   variant, which will be translated internally into one of the other
+   types.  */
+DEF_GCC_BUILTIN (BUILT_IN_LOAD_NO_SPECULATE_N, "load_no_speculate",
+		 BT_FN_VOID_VAR, ATTR_NULL)
+
+DEF_GCC_BUILTIN (BUILT_IN_LOAD_NO_SPECULATE_1, "load_no_speculate_1",
+		 BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+		 ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_LOAD_NO_SPECULATE_2, "load_no_speculate_2",
+		 BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+		 ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_LOAD_NO_SPECULATE_4, "load_no_speculate_4",
+		 BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+		 ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_LOAD_NO_SPECULATE_8, "load_no_speculate_8",
+		 BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+		 ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_LOAD_NO_SPECULATE_16, "load_no_speculate_16",
+		 BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+		 ATTR_NULL)
+
 /* Exception support.  */
 DEF_BUILTIN_STUB (BUILT_IN_UNWIND_RESUME, "__builtin_unwind_resume")
 DEF_BUILTIN_STUB (BUILT_IN_CXA_END_CLEANUP, "__builtin_cxa_end_cleanup")
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 197a71f..c213ffd 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -6456,6 +6456,146 @@ builtin_type_for_size (int size, bool unsignedp)
   return type ? type : error_mark_node;
 }
 
+/* Work out the size of the object pointed to by the first argument
+   of a call to __builtin_load_no_speculate.  Only pointers to
+   integral types and pointers are permitted.  Return 0 if the
+   argument type is not supported or if the size is too large.  */
+static int
+load_no_speculate_resolve_size (tree function, vec<tree, va_gc> *params)
+{
+  /* Type of the argument.  */
+  tree type;
+  int size;
+
+  if (vec_safe_is_empty (params))
+    {
+      error ("too few arguments to function %qE", function);
+      return 0;
+    }
+
+  type = TREE_TYPE ((*params)[0]);
+
+  if (!POINTER_TYPE_P (type))
+    goto incompatible;
+
+  type = TREE_TYPE (type);
+
+  if (TREE_CODE (type) == ARRAY_TYPE)
+    {
+      /* Force array-to-pointer decay for C++.  */
+      gcc_assert (c_dialect_cxx());
+      (*params)[0] = default_conversion ((*params)[0]);
+      type = TREE_TYPE ((*params)[0]);
+    }
+
+  if (!INTEGRAL_TYPE_P (type) && !POINTER_TYPE_P (type))
+    goto incompatible;
+
+  if (!COMPLETE_TYPE_P (type))
+   goto incompatible;
+
+  size = tree_to_uhwi (TYPE_SIZE_UNIT (type));
+  if (size == 1 || size == 2 || size == 4 || size == 8 || size == 16)
+    return size;
+
+ incompatible:
+  /* Issue the diagnostic only if the argument is valid, otherwise
+     it would be redundant at best and could be misleading.  */
+  if (type != error_mark_node)
+    error ("operand type %qT is incompatible with argument %d of %qE",
+	   type, 1, function);
+
+  return 0;
+}
+
+/* Validate and coerce PARAMS, the arguments to ORIG_FUNCTION to fit
+   the prototype for FUNCTION.  The first three arguments are
+   mandatory, but shouldn't need casting as they are all pointers and
+   we've already established that the first argument is a pointer to a
+   permitted type.  The two optional arguments may need to be
+   fabricated if they have been omitted.  */
+static bool
+load_no_speculate_resolve_params (location_t loc, tree orig_function,
+				  tree function,
+				  vec<tree, va_gc> *params)
+{
+  function_args_iterator iter;
+
+  function_args_iter_init (&iter, TREE_TYPE (function));
+  tree arg_type = function_args_iter_cond (&iter);
+  unsigned parmnum;
+  tree val;
+
+  if (params->length () < 3)
+    {
+      error_at (loc, "too few arguments to function %qE", orig_function);
+      return false;
+    }
+  else if (params->length () > 5)
+    {
+      error_at (loc, "too many arguments to function %qE", orig_function);
+      return false;
+    }
+
+  /* Required arguments.  These must all be pointers.  */
+  for (parmnum = 0; parmnum < 3; parmnum++)
+    {
+      arg_type = function_args_iter_cond (&iter);
+      val = (*params)[parmnum];
+      if (TREE_CODE (TREE_TYPE (val)) == ARRAY_TYPE)
+	val = default_conversion (val);
+      if (TREE_CODE (TREE_TYPE (val)) != POINTER_TYPE)
+	goto bad_arg;
+      (*params)[parmnum] = val;
+    }
+
+  /* Optional integer value.  */
+  arg_type = function_args_iter_cond (&iter);
+  if (params->length () >= 4)
+    {
+      val = (*params)[parmnum];
+      val = convert (arg_type, val);
+      (*params)[parmnum] = val;
+    }
+  else
+    return true;
+
+  /* Optional pointer to compare against.  */
+  parmnum = 4;
+  arg_type = function_args_iter_cond (&iter);
+  if (params->length () == 5)
+    {
+      val = (*params)[parmnum];
+      if (TREE_CODE (TREE_TYPE (val)) == ARRAY_TYPE)
+	val = default_conversion (val);
+      if (TREE_CODE (TREE_TYPE (val)) != POINTER_TYPE)
+	goto bad_arg;
+      (*params)[parmnum] = val;
+    }
+
+  return true;
+
+ bad_arg:
+  error_at (loc, "expecting argument of type %qT for argument %u", arg_type,
+	    parmnum);
+  return false;
+}
+
+/* Cast the result of the builtin back to the type pointed to by the
+   first argument, preserving any qualifiers that it might have.  */
+static tree
+load_no_speculate_resolve_return (tree first_param, tree result)
+{
+  tree ptype = TREE_TYPE (TREE_TYPE (first_param));
+  tree rtype = TREE_TYPE (result);
+  ptype = TYPE_MAIN_VARIANT (ptype);
+
+  if (tree_int_cst_equal (TYPE_SIZE (ptype), TYPE_SIZE (rtype)))
+    return convert (ptype, result);
+
+  return result;
+}
+
 /* A helper function for resolve_overloaded_builtin in resolving the
    overloaded __sync_ builtins.  Returns a positive power of 2 if the
    first operand of PARAMS is a pointer to a supported data type.
@@ -7110,6 +7250,30 @@ resolve_overloaded_builtin (location_t loc, tree function,
   /* Handle BUILT_IN_NORMAL here.  */
   switch (orig_code)
     {
+    case BUILT_IN_LOAD_NO_SPECULATE_N:
+      {
+	int n = load_no_speculate_resolve_size (function, params);
+	tree new_function, first_param, result;
+	enum built_in_function fncode;
+
+	if (n == 0)
+	  return error_mark_node;
+
+	fncode = (enum built_in_function)((int)orig_code + exact_log2 (n) + 1);
+	new_function = builtin_decl_explicit (fncode);
+	first_param = (*params)[0];
+	if (!load_no_speculate_resolve_params (loc, function, new_function,
+					       params))
+	  return error_mark_node;
+
+	result = build_function_call_vec (loc, vNULL, new_function, params,
+					  NULL);
+	if (result == error_mark_node)
+	  return result;
+
+	return load_no_speculate_resolve_return (first_param, result);
+      }
+
     case BUILT_IN_ATOMIC_EXCHANGE:
     case BUILT_IN_ATOMIC_COMPARE_EXCHANGE:
     case BUILT_IN_ATOMIC_LOAD:
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 9e33aed..fb06ee7 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -1361,7 +1361,10 @@ c_cpp_builtins (cpp_reader *pfile)
     cpp_define (pfile, "__WCHAR_UNSIGNED__");
 
   cpp_atomic_builtins (pfile);
-    
+
+  /* Show support for __builtin_load_no_speculate ().  */
+  cpp_define (pfile, "__HAVE_LOAD_NO_SPECULATE");
+
 #ifdef DWARF2_UNWIND_INFO
   if (dwarf2out_do_cfi_asm ())
     cpp_define (pfile, "__GCC_HAVE_DWARF2_CFI_ASM");
diff --git a/gcc/doc/cpp.texi b/gcc/doc/cpp.texi
index 94437d5..9dca2e2 100644
--- a/gcc/doc/cpp.texi
+++ b/gcc/doc/cpp.texi
@@ -2381,6 +2381,10 @@ If GCC cannot determine the current date, it will emit a warning message
 These macros are defined when the target processor supports atomic compare
 and swap operations on operands 1, 2, 4, 8 or 16 bytes in length, respectively.
 
+@item __HAVE_LOAD_NO_SPECULATE
+This macro is defined with the value 1 to show that this version of GCC
+supports @code{__builtin_load_no_speculate}.
+
 @item __GCC_HAVE_DWARF2_CFI_ASM
 This macro is defined when the compiler is emitting DWARF CFI directives
 to the assembler.  When this is defined, it is possible to emit those same
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 2a553ad..7a71f34 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -10968,6 +10968,7 @@ the built-in function returns -1.
 @findex __builtin_islessequal
 @findex __builtin_islessgreater
 @findex __builtin_isunordered
+@findex __builtin_load_no_speculate
 @findex __builtin_powi
 @findex __builtin_powif
 @findex __builtin_powil
@@ -11614,6 +11615,58 @@ check its compatibility with @var{size}.
 
 @end deftypefn
 
+@deftypefn {Built-in Function} @var{type} __builtin_load_no_speculate (const volatile @var{type} *ptr, const volatile void *lower_bound, const volatile void *upper_bound, @var{type} failval, const volatile void *cmpptr)
+The @code{__builtin_load_no_speculate} function provides a means to
+limit the extent to which a processor can continue speculative
+execution with the result of loading a value stored at @var{ptr}.
+Logically, the builtin implements the following behavior:
+
+@smallexample
+inline @var{type} __builtin_load_no_speculate
+    (const volatile @var{type} *ptr,
+     const volatile void *lower_bound,
+     const volatile void *upper_bound,
+     @var{type} failval,
+     const volatile void *cmpptr)
+@{
+  @var{type} result;
+  if (cmpptr >= lower_bound && cmpptr < upper_bound)
+    result = *ptr;
+  else
+    result = failval;
+  return result;
+@}
+@end smallexample
+
+but in addition target-specific code will be inserted to ensure that
+speculation using @code{*ptr} cannot occur when @var{cmpptr} lies outside of
+the specified bounds.
+
+@var{type} may be any integral type (signed or unsigned @code{char},
+@code{short}, @code{int}, etc.) or a pointer to any type.
+
+The final argument, @var{cmpptr}, may be omitted.  If you do this,
+then the compiler will use @var{ptr} for comparison against the upper
+and lower bounds.  Furthermore, if you omit @var{cmpptr}, you may also
+omit @var{failval} and the compiler will use @code{(@var{type})0} for
+the out-of-bounds result.
+
+Additionally, when it is known that one of the bounds checks can never fail,
+you can use a literal @code{NULL} argument and the compiler will
+generate code that only checks the other boundary condition.  It is generally
+only safe to do this when your code contains a loop construct where the only
+boundary of interest is the one beyond the termination condition.  You cannot
+omit both boundary conditions in this way.
+
+The logical behaviour of the builtin is supported for all architectures, but
+on machines where target-specific support for inhibiting speculation is not
+implemented, or not necessary, the compiler will emit a warning.
+
+The pre-processor macro @code{__HAVE_LOAD_NO_SPECULATE} is defined with the
+value 1 on all implementations of GCC that support this builtin.
+
+@end deftypefn
+
 @deftypefn {Built-in Function} int __builtin_types_compatible_p (@var{type1}, @var{type2})
 
 You can use the built-in function @code{__builtin_types_compatible_p} to
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 9793a0e..7309ccb 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11922,6 +11922,12 @@ maintainer is familiar with.
 
 @end defmac
 
+@deftypefn {Target Hook} rtx TARGET_INHIBIT_LOAD_SPECULATION (machine_mode @var{mode}, rtx @var{result}, rtx @var{mem}, rtx @var{lower_bound}, rtx @var{upper_bound}, rtx @var{fail_result}, rtx @var{cmpptr})
+Generate a target-specific code sequence that implements @code{__builtin_load_no_speculate}, returning the result in @var{result}. If @var{cmpptr} is greater than, or equal to, @var{lower_bound} and less than @var{upper_bound} then @var{mem}, a @code{MEM} of type @var{mode}, should be returned, otherwise @var{failval} should be returned.  The expansion must ensure that subsequent speculation by the processor using the @var{mem} cannot occur if @var{cmpptr} lies outside of the specified bounds.  At most one of @var{lower_bound} and @var{upper_bound} can be @code{NULL_RTX}, indicating that code for that bounds check should not be generated.
+
+The default implementation implements the logic of the builtin but cannot provide the target-specific code necessary to inhibit speculation.  A warning will be emitted to that effect.
+@end deftypefn
+
 @deftypefn {Target Hook} void TARGET_RUN_TARGET_SELFTESTS (void)
 If selftests are enabled, run any selftests for this target.
 @end deftypefn
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 7bcfb37..d34e4bf 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8075,4 +8075,6 @@ maintainer is familiar with.
 
 @end defmac
 
+@hook TARGET_INHIBIT_LOAD_SPECULATION
+
 @hook TARGET_RUN_TARGET_SELFTESTS
diff --git a/gcc/target.def b/gcc/target.def
index e9eacc8..375eb0a 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4214,6 +4214,26 @@ DEFHOOK
  hook_bool_void_true)
 
 DEFHOOK
+(inhibit_load_speculation,
+ "Generate a target-specific code sequence that implements\
+ @code{__builtin_load_no_speculate}, returning the result in @var{result}.\
+ If @var{cmpptr} is greater than, or equal to, @var{lower_bound} and less\
+ than @var{upper_bound} then @var{mem}, a @code{MEM} of type @var{mode},\
+ should be returned, otherwise @var{failval} should be returned.  The\
+ expansion must ensure that subsequent speculation by the processor using\
+ the @var{mem} cannot occur if @var{cmpptr} lies outside of the specified\
+ bounds.  At most one of @var{lower_bound} and @var{upper_bound} can be\
+ @code{NULL_RTX}, indicating that code for that bounds check should not be\
+ generated.\n\
+ \n\
+ The default implementation implements the logic of the builtin\
+ but cannot provide the target-specific code necessary to inhibit\
+ speculation.  A warning will be emitted to that effect.",
+ rtx, (machine_mode mode, rtx result, rtx mem, rtx lower_bound,
+       rtx upper_bound, rtx fail_result, rtx cmpptr),
+ default_inhibit_load_speculation)
+
+DEFHOOK
 (can_use_doloop_p,
  "Return true if it is possible to use low-overhead loops (@code{doloop_end}\n\
 and @code{doloop_begin}) for a particular loop.  @var{iterations} gives the\n\
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 653567c..24d9c7b 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "real.h"
 #include "langhooks.h"
+#include "dojump.h"
 
 bool
 default_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED,
@@ -2307,4 +2308,72 @@ default_stack_clash_protection_final_dynamic_probe (rtx residual ATTRIBUTE_UNUSE
   return 0;
 }
 
+/* Default implementation of the load-and-inhibit-speculation builtin.
+   This version does not have, or know of, the target-specific
+   mechanisms necessary to inhibit speculation, so it simply emits a
+   code sequence that implements the architectural aspects of the
+   builtin.  */
+rtx
+default_inhibit_load_speculation (machine_mode mode ATTRIBUTE_UNUSED,
+				  rtx result,
+				  rtx mem,
+				  rtx lower_bound,
+				  rtx upper_bound,
+				  rtx fail_result,
+				  rtx cmpptr)
+{
+  rtx_code_label *done_label = gen_label_rtx ();
+  rtx_code_label *inrange_label = gen_label_rtx ();
+  warning_at
+    (input_location, 0,
+     "this target does not support anti-speculation operations.  "
+     "Your program will still execute correctly, but speculation "
+     "will not be inhibited");
+
+  /* We don't have any despeculation barriers, but if we mark the branch
+     probabilities to be always predicting the out-of-bounds path, then
+     there's a higher chance that the compiler will order code so that
+     static prediction will fall through a safe path.  */
+  if (lower_bound == NULL)
+    {
+      do_compare_rtx_and_jump (cmpptr, upper_bound, LTU, true, ptr_mode,
+			       NULL, NULL, inrange_label,
+			       profile_probability::never ());
+      emit_move_insn (result, fail_result);
+      emit_jump (done_label);
+      emit_label (inrange_label);
+      emit_move_insn (result, mem);
+      emit_label (done_label);
+    }
+  else if (upper_bound == NULL)
+    {
+      do_compare_rtx_and_jump (cmpptr, lower_bound, GEU, true, ptr_mode,
+			       NULL, NULL, inrange_label,
+			       profile_probability::never ());
+      emit_move_insn (result, fail_result);
+      emit_jump (done_label);
+      emit_label (inrange_label);
+      emit_move_insn (result, mem);
+      emit_label (done_label);
+    }
+  else
+    {
+      rtx_code_label *oob_label = gen_label_rtx ();
+      do_compare_rtx_and_jump (cmpptr, lower_bound, LTU, true, ptr_mode,
+			       NULL, NULL, oob_label,
+			       profile_probability::always ());
+      do_compare_rtx_and_jump (cmpptr, upper_bound, GEU, true, ptr_mode,
+			       NULL, NULL, inrange_label,
+			       profile_probability::never ());
+      emit_label (oob_label);
+      emit_move_insn (result, fail_result);
+      emit_jump (done_label);
+      emit_label (inrange_label);
+      emit_move_insn (result, mem);
+      emit_label (done_label);
+    }
+
+  return result;
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index e753e58..c55b43f 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -286,4 +286,7 @@ extern enum flt_eval_method
 default_excess_precision (enum excess_precision_type ATTRIBUTE_UNUSED);
 extern bool default_stack_clash_protection_final_dynamic_probe (rtx);
 
+extern rtx
+default_inhibit_load_speculation (machine_mode, rtx, rtx, rtx, rtx, rtx, rtx);
+
 #endif /* GCC_TARGHOOKS_H */


* [PATCH 2/3] [aarch64] Implement support for __builtin_load_no_speculate.
  2018-01-04 13:58 [PATCH 0/3] Add __builtin_load_no_speculate Richard Earnshaw
  2018-01-04 13:58 ` [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate() Richard Earnshaw
@ 2018-01-04 13:59 ` Richard Earnshaw
  2018-01-05  9:51   ` Richard Biener
  2018-01-04 13:59 ` [PATCH 3/3] [arm] Implement support for the de-speculation intrinsic Richard Earnshaw
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 70+ messages in thread
From: Richard Earnshaw @ 2018-01-04 13:59 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 780 bytes --]


This patch implements support for __builtin_load_no_speculate on
AArch64.  On this architecture we inhibit speculation by emitting a
combination of CSEL and a hint instruction that ensures the CSEL is
fully resolved when the operands to the CSEL may involve a speculative
load.
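The selection step of that sequence can be pictured with a plain-C
model (illustrative only; the actual expansion is the nospeculate
patterns added to aarch64.md, where the hint instruction follows the
CSEL to force it to resolve):

```c
/* Plain-C model of a CSEL: pick one of two values based on a flag
   comparison.  On hardware, the patch emits a hint after the CSEL so
   the select must resolve before its result can feed further
   speculation; that barrier has no analogue in portable C.  */
static inline long long
csel_ge (long long when_ge, long long when_lt,
         unsigned long long lhs, unsigned long long rhs)
{
  /* CSEL Xd, Xn, Xm, GE: Xd = (lhs >= rhs) ? Xn : Xm.  */
  return lhs >= rhs ? when_ge : when_lt;
}
```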

	* config/aarch64/aarch64.c (aarch64_print_operand): Handle zero passed
	to 'H' operand qualifier.
	(aarch64_inhibit_load_speculation): New function.
	(TARGET_INHIBIT_LOAD_SPECULATION): Redefine.
	* config/aarch64/aarch64.md (UNSPECV_NOSPECULATE): New unspec_volatile
	code.
	(nospeculate<ALLI:mode>, nospeculateti): New patterns.
---
 gcc/config/aarch64/aarch64.c  | 92 +++++++++++++++++++++++++++++++++++++++++++
 gcc/config/aarch64/aarch64.md | 28 +++++++++++++
 2 files changed, 120 insertions(+)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0002-aarch64-Implement-support-for-__builtin_load_no_spec.patch --]
[-- Type: text/x-patch; name="0002-aarch64-Implement-support-for-__builtin_load_no_spec.patch", Size: 5273 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 93e9d9f9..7410921 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5315,6 +5315,14 @@ aarch64_print_operand (FILE *f, rtx x, int code)
       break;
 
     case 'H':
+      /* Print the higher numbered register of a pair (TImode) of regs.  */
+      if (x == const0_rtx
+	  || (CONST_DOUBLE_P (x) && aarch64_float_const_zero_rtx_p (x)))
+	{
+	  asm_fprintf (f, "xzr");
+	  break;
+	}
+
       if (!REG_P (x) || !GP_REGNUM_P (REGNO (x) + 1))
 	{
 	  output_operand_lossage ("invalid operand for '%%%c'", code);
@@ -15115,6 +15123,87 @@ aarch64_sched_can_speculate_insn (rtx_insn *insn)
     }
 }
 
+static rtx
+aarch64_inhibit_load_speculation (machine_mode mode, rtx result, rtx mem,
+				  rtx lower_bound, rtx upper_bound,
+				  rtx fail_result, rtx cmpptr)
+{
+  rtx cond, comparison;
+  rtx target = gen_reg_rtx (mode);
+  rtx tgt2 = result;
+
+  if (!register_operand (cmpptr, ptr_mode))
+    cmpptr = force_reg (ptr_mode, cmpptr);
+
+  if (!register_operand (tgt2, mode))
+    tgt2 = gen_reg_rtx (mode);
+
+  if (upper_bound == NULL)
+    {
+      if (!register_operand (lower_bound, ptr_mode))
+	lower_bound = force_reg (ptr_mode, lower_bound);
+
+      cond = aarch64_gen_compare_reg (LTU, cmpptr, lower_bound);
+      comparison = gen_rtx_LTU (VOIDmode, cond, const0_rtx);
+    }
+  else if (lower_bound == NULL)
+    {
+      if (!register_operand (upper_bound, ptr_mode))
+	upper_bound = force_reg (ptr_mode, upper_bound);
+
+      cond = aarch64_gen_compare_reg (GEU, cmpptr, upper_bound);
+      comparison = gen_rtx_GEU (VOIDmode, cond, const0_rtx);
+    }
+  else
+    {
+      if (!register_operand (lower_bound, ptr_mode))
+	lower_bound = force_reg (ptr_mode, lower_bound);
+
+      if (!register_operand (upper_bound, ptr_mode))
+	upper_bound = force_reg (ptr_mode, upper_bound);
+
+      rtx cond1 = aarch64_gen_compare_reg (GEU, cmpptr, lower_bound);
+      rtx comparison1 = gen_rtx_GEU (ptr_mode, cond1, const0_rtx);
+      rtx failcond = GEN_INT (aarch64_get_condition_code (comparison1) ^ 1);
+      cond = gen_rtx_REG (CCmode, CC_REGNUM);
+      if (ptr_mode == SImode)
+	emit_insn (gen_ccmpsi (cond1, cond, cmpptr, upper_bound, comparison1,
+			       failcond));
+      else
+	emit_insn (gen_ccmpdi (cond1, cond, cmpptr, upper_bound, comparison1,
+			       failcond));
+      comparison = gen_rtx_GEU (VOIDmode, cond, const0_rtx);
+    }
+
+  rtx_code_label *label = gen_label_rtx ();
+  emit_jump_insn (gen_condjump (comparison, cond, label));
+  emit_move_insn (target, mem);
+  emit_label (label);
+
+  insn_code icode;
+
+  switch (mode)
+    {
+    case E_QImode: icode = CODE_FOR_nospeculateqi; break;
+    case E_HImode: icode = CODE_FOR_nospeculatehi; break;
+    case E_SImode: icode = CODE_FOR_nospeculatesi; break;
+    case E_DImode: icode = CODE_FOR_nospeculatedi; break;
+    case E_TImode: icode = CODE_FOR_nospeculateti; break;
+    default:
+      gcc_unreachable ();
+    }
+
+  if (! insn_operand_matches (icode, 4, fail_result))
+    fail_result = force_reg (mode, fail_result);
+
+  emit_insn (GEN_FCN (icode) (tgt2, comparison, cond, target, fail_result));
+
+  if (tgt2 != result)
+    emit_move_insn (result, tgt2);
+
+  return result;
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -15554,6 +15643,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT aarch64_constant_alignment
 
+#undef TARGET_INHIBIT_LOAD_SPECULATION
+#define TARGET_INHIBIT_LOAD_SPECULATION aarch64_inhibit_load_speculation
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f1e2a07..1a1f398 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -153,6 +153,7 @@
     UNSPECV_SET_FPSR		; Represent assign of FPSR content.
     UNSPECV_BLOCKAGE		; Represent a blockage
     UNSPECV_PROBE_STACK_RANGE	; Represent stack range probing.
+    UNSPECV_NOSPECULATE		; Inhibit speculation
   ]
 )
 
@@ -5797,6 +5798,33 @@
   DONE;
 })
 
+(define_insn "nospeculate<ALLI:mode>"
+  [(set (match_operand:ALLI 0 "register_operand" "=r")
+        (unspec_volatile:ALLI
+         [(match_operator 1 "aarch64_comparison_operator"
+	   [(match_operand 2 "cc_register" "") (const_int 0)])
+	  (match_operand:ALLI 3 "register_operand" "r")
+	  (match_operand:ALLI 4 "aarch64_reg_or_zero" "rZ")]
+	 UNSPECV_NOSPECULATE))]
+  ""
+  "csel\\t%<w>0, %<w>3, %<w>4, %M1\;hint\t#0x14\t// CSDB"
+  [(set_attr "type" "csel")
+   (set_attr "length" "8")]
+)
+
+(define_insn "nospeculateti"
+  [(set (match_operand:TI 0 "register_operand" "=r")
+        (unspec_volatile:TI
+         [(match_operator 1 "aarch64_comparison_operator"
+	   [(match_operand 2 "cc_register" "") (const_int 0)])
+	  (match_operand:TI 3 "register_operand" "r")
+	  (match_operand:TI 4 "aarch64_reg_or_zero" "rZ")]
+	 UNSPECV_NOSPECULATE))]
+  ""
+  "csel\\t%x0, %x3, %x4, %M1\;csel\\t%H0, %H3, %H4, %M1\;hint\t#0x14\t// CSDB"
+  [(set_attr "type" "csel")
+   (set_attr "length" "12")]
+)
 ;; AdvSIMD Stuff
 (include "aarch64-simd.md")
 


* [PATCH 3/3] [arm] Implement support for the de-speculation intrinsic
  2018-01-04 13:58 [PATCH 0/3] Add __builtin_load_no_speculate Richard Earnshaw
  2018-01-04 13:58 ` [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate() Richard Earnshaw
  2018-01-04 13:59 ` [PATCH 2/3] [aarch64] Implement support for __builtin_load_no_speculate Richard Earnshaw
@ 2018-01-04 13:59 ` Richard Earnshaw
  2018-01-04 18:20 ` [PATCH 0/3] Add __builtin_load_no_speculate Joseph Myers
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 70+ messages in thread
From: Richard Earnshaw @ 2018-01-04 13:59 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 826 bytes --]


This patch implements despeculation on ARM.  We only support it when
generating ARM or Thumb2 code (we need conditional execution), and only
for sizes up to DImode.  For unsupported cases we fall back to the
generic code-generation sequence so that a suitable warning is emitted.
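
For the double-bounds case, the patch below recasts the out-of-bounds
test (cmpptr < lower || cmpptr >= upper) as (cmpptr < lower ||
upper <= cmpptr) so that both comparisons can share one cmp/cmpcs
flag-setting sequence.  A quick sanity check of that equivalence, with
unsigned integers standing in for pointers (illustration only, not part
of the patch):

```c
#include <assert.h>

/* Straightforward form of the out-of-bounds test.  */
static int
oob_direct (unsigned p, unsigned lo, unsigned hi)
{
  return p < lo || p >= hi;
}

/* Recast form used by the arm expansion: p >= hi becomes hi <= p.  */
static int
oob_recast (unsigned p, unsigned lo, unsigned hi)
{
  return p < lo || hi <= p;
}
```

The two functions agree for every input, which is why the recast can be
implemented with the shared flag-setting sequence described in the
patch's comments.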

	* config/arm/arm.c (arm_inhibit_load_speculation): New function
	(TARGET_INHIBIT_LOAD_SPECULATION): Redefine.
	* config/arm/unspec.md (VUNSPEC_NOSPECULATE): New unspec_volatile code.
	* config/arm/arm.md (cmp_ior): Make this pattern callable.
	(nospeculate<QHSI:mode>, nospeculatedi): New patterns.
---
 gcc/config/arm/arm.c      | 107 ++++++++++++++++++++++++++++++++++++++++++++++
 gcc/config/arm/arm.md     |  40 ++++++++++++++++-
 gcc/config/arm/unspecs.md |   1 +
 3 files changed, 147 insertions(+), 1 deletion(-)


[-- Attachment #2: 0003-arm-Implement-support-for-the-de-speculation-intrins.patch --]
[-- Type: text/x-patch; name="0003-arm-Implement-support-for-the-de-speculation-intrins.patch", Size: 6829 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 11e35ad..d1fc0b9 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -321,6 +321,8 @@ static unsigned int arm_hard_regno_nregs (unsigned int, machine_mode);
 static bool arm_hard_regno_mode_ok (unsigned int, machine_mode);
 static bool arm_modes_tieable_p (machine_mode, machine_mode);
 static HOST_WIDE_INT arm_constant_alignment (const_tree, HOST_WIDE_INT);
+static rtx arm_inhibit_load_speculation (machine_mode, rtx, rtx, rtx, rtx,
+					 rtx, rtx);
 \f
 /* Table of machine attributes.  */
 static const struct attribute_spec arm_attribute_table[] =
@@ -804,6 +806,9 @@ static const struct attribute_spec arm_attribute_table[] =
 
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT arm_constant_alignment
+
+#undef TARGET_INHIBIT_LOAD_SPECULATION
+#define TARGET_INHIBIT_LOAD_SPECULATION arm_inhibit_load_speculation
 \f
 /* Obstack for minipool constant handling.  */
 static struct obstack minipool_obstack;
@@ -31523,6 +31528,108 @@ arm_constant_alignment (const_tree exp, HOST_WIDE_INT align)
   return align;
 }
 
+static rtx
+arm_inhibit_load_speculation (machine_mode mode, rtx result, rtx mem,
+				  rtx lower_bound, rtx upper_bound,
+				  rtx fail_result, rtx cmpptr)
+{
+  rtx cond, comparison;
+
+  /* We can't support this for Thumb1 as we have no suitable conditional
+     move operations.  Nor do we support it for TImode.  In both of
+     these cases we fall back to the generic code sequence, which will
+     emit a suitable warning for us.  */
+  if (mode == TImode || TARGET_THUMB1)
+    return default_inhibit_load_speculation (mode, result, mem, lower_bound,
+					     upper_bound, fail_result, cmpptr);
+
+
+  rtx target = gen_reg_rtx (mode);
+  rtx tgt2 = result;
+
+  if (!register_operand (tgt2, mode))
+    tgt2 = gen_reg_rtx (mode);
+
+  if (!register_operand (cmpptr, ptr_mode))
+    cmpptr = force_reg (ptr_mode, cmpptr);
+
+  if (upper_bound == NULL)
+    {
+      if (!register_operand (lower_bound, ptr_mode))
+	lower_bound = force_reg (ptr_mode, lower_bound);
+
+      cond = arm_gen_compare_reg (LTU, cmpptr, lower_bound, NULL);
+      comparison = gen_rtx_LTU (VOIDmode, cond, const0_rtx);
+    }
+  else if (lower_bound == NULL)
+    {
+      if (!register_operand (upper_bound, ptr_mode))
+	upper_bound = force_reg (ptr_mode, upper_bound);
+
+      cond = arm_gen_compare_reg (GEU, cmpptr, upper_bound, NULL);
+      comparison = gen_rtx_GEU (VOIDmode, cond, const0_rtx);
+    }
+  else
+    {
+      /* We want to generate code for
+	   result = (cmpptr < lower || cmpptr >= upper) ? 0 : *ptr;
+	 Which can be recast to
+	   result = (cmpptr < lower || upper <= cmpptr) ? 0 : *ptr;
+	 which can be implemented as
+	   cmp   cmpptr, lower
+	   cmpcs upper, cmpptr
+	   bls   1f
+	   ldr   result, [ptr]
+	  1:
+	   movls result, #0
+	 with suitable IT instructions as needed for thumb2.  Later
+	 optimization passes may make the load conditional.  */
+
+      if (!register_operand (lower_bound, ptr_mode))
+	lower_bound = force_reg (ptr_mode, lower_bound);
+
+      if (!register_operand (upper_bound, ptr_mode))
+	upper_bound = force_reg (ptr_mode, upper_bound);
+
+      rtx comparison1 = gen_rtx_LTU (SImode, cmpptr, lower_bound);
+      rtx comparison2 = gen_rtx_LEU (SImode, upper_bound, cmpptr);
+      cond = gen_rtx_REG (arm_select_dominance_cc_mode (comparison1,
+							comparison2,
+							DOM_CC_X_OR_Y),
+			  CC_REGNUM);
+      emit_insn (gen_cmp_ior (cmpptr, lower_bound, upper_bound, cmpptr,
+			      comparison1, comparison2, cond));
+      comparison = gen_rtx_NE (SImode, cond, const0_rtx);
+    }
+
+  rtx_code_label *label = gen_label_rtx ();
+  emit_jump_insn (gen_arm_cond_branch (label, comparison, cond));
+  emit_move_insn (target, mem);
+  emit_label (label);
+
+  insn_code icode;
+
+  switch (mode)
+    {
+    case E_QImode: icode = CODE_FOR_nospeculateqi; break;
+    case E_HImode: icode = CODE_FOR_nospeculatehi; break;
+    case E_SImode: icode = CODE_FOR_nospeculatesi; break;
+    case E_DImode: icode = CODE_FOR_nospeculatedi; break;
+    default:
+      gcc_unreachable ();
+    }
+
+  if (! insn_operand_matches (icode, 4, fail_result))
+    fail_result = force_reg (mode, fail_result);
+
+  emit_insn (GEN_FCN (icode) (tgt2, comparison, cond, target, fail_result));
+
+  if (tgt2 != result)
+    emit_move_insn (result, tgt2);
+
+  return result;
+}
+
 #if CHECKING_P
 namespace selftest {
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index d60c5af..e700fdf 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9488,7 +9488,7 @@
    (set_attr "type" "multiple")]
 )
 
-(define_insn "*cmp_ior"
+(define_insn "cmp_ior"
   [(set (match_operand 6 "dominant_cc_register" "")
 	(compare
 	 (ior:SI
@@ -12015,6 +12015,44 @@
   [(set_attr "length" "4")
    (set_attr "type" "coproc")])
 
+(define_insn "nospeculate<QHSI:mode>"
+ [(set (match_operand:QHSI 0 "s_register_operand" "=l,l,r")
+        (unspec_volatile:QHSI
+         [(match_operator 1 "arm_comparison_operator"
+	   [(match_operand 2 "cc_register" "") (const_int 0)])
+	  (match_operand:QHSI 3 "s_register_operand" "0,0,0")
+	  (match_operand:QHSI 4 "arm_not_operand" "I,K,r")]
+	 VUNSPEC_NOSPECULATE))]
+  "TARGET_32BIT"
+  {
+  if (TARGET_THUMB)
+    return \"it\\t%d1\;mov%d1\\t%0, %4\;.inst 0xf3af8014\t%@ CSDB\";
+  return \"mov%d1\\t%0, %4\;.inst 0xe320f014\t%@ CSDB\";
+  }
+  [(set_attr "type" "mov_imm,mvn_imm,mov_reg")
+   (set_attr "conds" "use")
+   (set_attr "length" "8")]
+)
+
+(define_insn "nospeculatedi"
+ [(set (match_operand:DI 0 "s_register_operand" "=r")
+        (unspec_volatile:DI
+         [(match_operator 1 "arm_comparison_operator"
+	   [(match_operand 2 "cc_register" "") (const_int 0)])
+	  (match_operand:DI 3 "s_register_operand" "0")
+	  (match_operand:DI 4 "arm_rhs_operand" "rI")]
+	 VUNSPEC_NOSPECULATE))]
+  "TARGET_32BIT"
+  {
+  if (TARGET_THUMB)
+    return \"it\\t%d1\;mov%d1\\t%Q0, %Q4\;it\\t%d1\;mov%d1\\t%R0, %R4\;.inst 0xf3af8014\t%@ CSDB\";
  return \"mov%d1\\t%Q0, %Q4\;mov%d1\\t%R0, %R4\;.inst 0xe320f014\t%@ CSDB\";
+  }
+  [(set_attr "type" "mov_reg")
+   (set_attr "conds" "use")
+   (set_attr "length" "12")]
+)
+
 ;; Vector bits common to IWMMXT and Neon
 (include "vec-common.md")
 ;; Load the Intel Wireless Multimedia Extension patterns
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index c474f4b..727a5ab 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -168,6 +168,7 @@
   VUNSPEC_MCRR2		; Represent the coprocessor mcrr2 instruction.
   VUNSPEC_MRRC		; Represent the coprocessor mrrc instruction.
   VUNSPEC_MRRC2		; Represent the coprocessor mrrc2 instruction.
+  VUNSPEC_NOSPECULATE	; Represent a despeculation sequence.
 ])
 
 ;; Enumerators for NEON unspecs.


* Re: [PATCH 0/3]  Add __builtin_load_no_speculate
  2018-01-04 13:58 [PATCH 0/3] Add __builtin_load_no_speculate Richard Earnshaw
                   ` (2 preceding siblings ...)
  2018-01-04 13:59 ` [PATCH 3/3] [arm] Implement support for the de-speculation intrinsic Richard Earnshaw
@ 2018-01-04 18:20 ` Joseph Myers
  2018-01-05 17:26   ` Jeff Law
  2018-01-05  9:44 ` Richard Biener
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 70+ messages in thread
From: Joseph Myers @ 2018-01-04 18:20 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: gcc-patches

On Thu, 4 Jan 2018, Richard Earnshaw wrote:

> 1 - generic modifications to GCC providing the builtin function for all
>     architectures and expanding to an implementation that gives the
>     logical behaviour of the builtin only.  If this expansion path is
>     used, a warning is generated that the code will execute correctly
>     but without protection against speculative use.

Presumably it would make sense to have a standard way for architectures 
with no speculative execution to just use the generic version, but without 
the warning?  E.g., split up default_inhibit_load_speculation so that it 
generates the warning then calls another function with the same prototype, 
so such architectures can just define the hook to use that other function?
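
A toy sketch of that split (every name here other than
default_inhibit_load_speculation is hypothetical, and the real hook's
rtx-based signature is reduced to int for illustration): the warning
lives only in the default hook, which delegates to a silent helper:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical silent expansion: the plain logical behaviour, with no
   barrier and no warning, for targets that never speculate.  */
static int
inhibit_load_speculation_nowarn (int safe_value)
{
  return safe_value;
}

/* Default hook: warn that no protection is provided, then share the
   expansion code above.  */
static int
default_inhibit_load_speculation (int safe_value)
{
  fprintf (stderr,
           "warning: this target provides no speculation protection\n");
  return inhibit_load_speculation_nowarn (safe_value);
}
```

An architecture with no speculative execution would then define
TARGET_INHIBIT_LOAD_SPECULATION to point at the silent helper and never
see the warning.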

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-04 13:58 [PATCH 0/3] Add __builtin_load_no_speculate Richard Earnshaw
                   ` (3 preceding siblings ...)
  2018-01-04 18:20 ` [PATCH 0/3] Add __builtin_load_no_speculate Joseph Myers
@ 2018-01-05  9:44 ` Richard Biener
  2018-01-05 10:40   ` Richard Earnshaw (lists)
  2018-01-05 16:37   ` Jeff Law
  2018-01-05 11:37 ` Alexander Monakov
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 70+ messages in thread
From: Richard Biener @ 2018-01-05  9:44 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: GCC Patches

On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
<Richard.Earnshaw@arm.com> wrote:
>
> Recently, Google Project Zero disclosed several classes of attack
> against speculative execution. One of these, known as variant-1
> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
> speculation, providing an arbitrary read gadget. Further details can
> be found on the GPZ blog [1] and the documentation that is included
> with the first patch.
>
> This patch set adds a new builtin function for GCC to provide a
> mechanism for limiting speculation by a CPU after a bounds-checked
> memory access.  I've tried to design this in such a way that it can be
> used for any target where this might be necessary.  The patch set
> provides a generic implementation of the builtin and then
> target-specific support for Arm and AArch64.  Other architectures can
> utilize the internal infrastructure as needed.
>
> Most of the details of the builtin and the hooks that need to be
> implemented to support it are described in the updates to the manual,
> but a short summary is given below.
>
> TYP __builtin_load_no_speculate
>         (const volatile TYP *ptr,
>          const volatile void *lower,
>          const volatile void *upper,
>          TYP failval,
>          const volatile void *cmpptr)
>
> Where TYP can be any integral type (signed or unsigned char, int,
> short, long, etc) or any pointer type.
>
> The builtin implements the following logical behaviour:
>
> inline TYP __builtin_load_no_speculate
>          (const volatile TYP *ptr,
>           const volatile void *lower,
>           const volatile void *upper,
>           TYP failval,
>           const volatile void *cmpptr)
> {
>   TYP result;
>
>   if (cmpptr >= lower && cmpptr < upper)
>     result = *ptr;
>   else
>     result = failval;
>   return result;
> }
>
> in addition the specification of the builtin ensures that future
> speculation using *ptr may only continue iff cmpptr lies within the
> bounds specified.

I fail to see how the vulnerability doesn't affect aggregate copies
or bitfield accesses.  The vulnerability leaks not the loaded value
but (part of) the address, by populating the CPU cache side-channel.

So why isn't this just

 T __builtin_load_no_speculate (T *);

?  Why that "fallback" and why the lower/upper bounds?

Did you talk with other compiler vendors (intel, llvm, ...?)?

Richard.

> Some optimizations are permitted to make the builtin easier to use.
> The final two arguments can both be omitted (c++ style): failval will
> default to 0 in this case and if cmpptr is omitted ptr will be used
> for expansions of the range check.  In addition either lower or upper
> (but not both) may be a literal NULL and the expansion will then
> ignore that boundary condition when expanding.
>
> The patch set is constructed as follows:
> 1 - generic modifications to GCC providing the builtin function for all
>     architectures and expanding to an implementation that gives the
>     logical behaviour of the builtin only.  If this expansion path is
>     used, a warning is generated that the code will execute correctly
>     but without protection against speculative use.
> 2 - AArch64 support
> 3 - AArch32 support (arm) for A32 and thumb2 states.
>
> These patches can be used with the header file that Arm recently
> published here: https://github.com/ARM-software/speculation-barrier.
>
> Kernel patches are also being developed, eg:
> https://lkml.org/lkml/2018/1/3/754.  The intent is that eventually
> code like this will be able to use support directly from the compiler
> in a portable manner.
>
> Similar patches are also being developed for LLVM and will be posted
> to their development lists shortly.
>
> [1] More information on the topic can be found here:
> https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
> Arm specific information can be found here: https://www.arm.com/security-update
>
>
>
> Richard Earnshaw (3):
>   [builtins] Generic support for __builtin_load_no_speculate()
>   [aarch64] Implement support for __builtin_load_no_speculate.
>   [arm] Implement support for the de-speculation intrinsic
>
>  gcc/builtin-types.def         |  16 +++++
>  gcc/builtins.c                |  99 +++++++++++++++++++++++++
>  gcc/builtins.def              |  22 ++++++
>  gcc/c-family/c-common.c       | 164 ++++++++++++++++++++++++++++++++++++++++++
>  gcc/c-family/c-cppbuiltin.c   |   5 +-
>  gcc/config/aarch64/aarch64.c  |  92 ++++++++++++++++++++++++
>  gcc/config/aarch64/aarch64.md |  28 ++++++++
>  gcc/config/arm/arm.c          | 107 +++++++++++++++++++++++++++
>  gcc/config/arm/arm.md         |  40 ++++++++++-
>  gcc/config/arm/unspecs.md     |   1 +
>  gcc/doc/cpp.texi              |   4 ++
>  gcc/doc/extend.texi           |  53 ++++++++++++++
>  gcc/doc/tm.texi               |   6 ++
>  gcc/doc/tm.texi.in            |   2 +
>  gcc/target.def                |  20 ++++++
>  gcc/targhooks.c               |  69 ++++++++++++++++++
>  gcc/targhooks.h               |   3 +
>  17 files changed, 729 insertions(+), 2 deletions(-)
>
>
>


* Re: [PATCH 2/3] [aarch64] Implement support for __builtin_load_no_speculate.
  2018-01-04 13:59 ` [PATCH 2/3] [aarch64] Implement support for __builtin_load_no_speculate Richard Earnshaw
@ 2018-01-05  9:51   ` Richard Biener
  2018-01-05 10:48     ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 70+ messages in thread
From: Richard Biener @ 2018-01-05  9:51 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: GCC Patches

On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
<Richard.Earnshaw@arm.com> wrote:
>
> This patch implements support for __builtin_load_no_speculate on
> AArch64.  On this architecture we inhibit speculation by emitting a
> combination of CSEL and a hint instruction that ensures the CSEL is
> fully resolved when the operands to the CSEL may involve a speculative
> load.

Missing a high-level explanation here, but it looks like you do something like

  if (access is not in bounds)
    no-speculate-access;
  else
     regular access;

with the upper/lower bounds.  Is this because 'no-speculate-access' is
prohibitively
expensive even when factoring in the condition computation and the condjump?
(trying to reverse-engineer what the actual assembly will look like
from the patch; there isn't even a testcase :/)

>         * config/aarch64/aarch64.c (aarch64_print_operand): Handle zero passed
>         to 'H' operand qualifier.
>         (aarch64_inhibit_load_speculation): New function.
>         (TARGET_INHIBIT_LOAD_SPECULATION): Redefine.
>         * config/aarch64/aarch64.md (UNSPECV_NOSPECULATE): New unspec_volatile
>         code.
>         (nospeculate<ALLI:mode>, nospeculateti): New patterns.
> ---
>  gcc/config/aarch64/aarch64.c  | 92 +++++++++++++++++++++++++++++++++++++++++++
>  gcc/config/aarch64/aarch64.md | 28 +++++++++++++
>  2 files changed, 120 insertions(+)
>


* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-05  9:44 ` Richard Biener
@ 2018-01-05 10:40   ` Richard Earnshaw (lists)
  2018-01-05 10:47     ` Richard Biener
  2018-01-05 16:41     ` Jeff Law
  2018-01-05 16:37   ` Jeff Law
  1 sibling, 2 replies; 70+ messages in thread
From: Richard Earnshaw (lists) @ 2018-01-05 10:40 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On 05/01/18 09:44, Richard Biener wrote:
> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
> <Richard.Earnshaw@arm.com> wrote:
>>
>> Recently, Google Project Zero disclosed several classes of attack
>> against speculative execution. One of these, known as variant-1
>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>> speculation, providing an arbitrary read gadget. Further details can
>> be found on the GPZ blog [1] and the documentation that is included
>> with the first patch.
>>
>> This patch set adds a new builtin function for GCC to provide a
>> mechanism for limiting speculation by a CPU after a bounds-checked
>> memory access.  I've tried to design this in such a way that it can be
>> used for any target where this might be necessary.  The patch set
>> provides a generic implementation of the builtin and then
>> target-specific support for Arm and AArch64.  Other architectures can
>> utilize the internal infrastructure as needed.
>>
>> Most of the details of the builtin and the hooks that need to be
>> implemented to support it are described in the updates to the manual,
>> but a short summary is given below.
>>
>> TYP __builtin_load_no_speculate
>>         (const volatile TYP *ptr,
>>          const volatile void *lower,
>>          const volatile void *upper,
>>          TYP failval,
>>          const volatile void *cmpptr)
>>
>> Where TYP can be any integral type (signed or unsigned char, int,
>> short, long, etc) or any pointer type.
>>
>> The builtin implements the following logical behaviour:
>>
>> inline TYP __builtin_load_no_speculate
>>          (const volatile TYP *ptr,
>>           const volatile void *lower,
>>           const volatile void *upper,
>>           TYP failval,
>>           const volatile void *cmpptr)
>> {
>>   TYP result;
>>
>>   if (cmpptr >= lower && cmpptr < upper)
>>     result = *ptr;
>>   else
>>     result = failval;
>>   return result;
>> }
>>
>> in addition the specification of the builtin ensures that future
>> speculation using *ptr may only continue iff cmpptr lies within the
>> bounds specified.
> 
> I fail to see how the vulnerability doesn't affect aggregate copies
> or bitfield accesses.  The vulnerability leaks not the loaded value
> but (part of) the address, by populating the CPU cache side-channel.
> 

It's not quite as simple as that.  You'll need to read the white paper
on Arm's web site to get a full grasp of this (linked from
https://www.arm.com/security-update).

> So why isn't this just
> 
>  T __builtin_load_no_speculate (T *);
> 
> ?  Why that "fallback" and why the lower/upper bounds?

Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem
to restrict subsequent speculation.  The CSEL needs a condition, and the
compiler must not be able to optimize it away based on logical
reachability.  The fallback is used for the other operand of the CSEL
instruction.

> 
> Did you talk with other compiler vendors (intel, llvm, ...?)?

As stated we'll shortly be submitting similar patches for LLVM.

R.

> 
> Richard.
> 
>> Some optimizations are permitted to make the builtin easier to use.
>> The final two arguments can both be omitted (c++ style): failval will
>> default to 0 in this case and if cmpptr is omitted ptr will be used
>> for expansions of the range check.  In addition either lower or upper
>> (but not both) may be a literal NULL and the expansion will then
>> ignore that boundary condition when expanding.
>>
>> The patch set is constructed as follows:
>> 1 - generic modifications to GCC providing the builtin function for all
>>     architectures and expanding to an implementation that gives the
>>     logical behaviour of the builtin only.  If this expansion path is
>>     used, a warning is generated that the code will execute correctly
>>     but without protection against speculative use.
>> 2 - AArch64 support
>> 3 - AArch32 support (arm) for A32 and thumb2 states.
>>
>> These patches can be used with the header file that Arm recently
>> published here: https://github.com/ARM-software/speculation-barrier.
>>
>> Kernel patches are also being developed, eg:
>> https://lkml.org/lkml/2018/1/3/754.  The intent is that eventually
>> code like this will be able to use support directly from the compiler
>> in a portable manner.
>>
>> Similar patches are also being developed for LLVM and will be posted
>> to their development lists shortly.
>>
>> [1] More information on the topic can be found here:
>> https://googleprojectzero.blogspot.co.uk/2018/01/reading-privileged-memory-with-side.html
>> Arm specific information can be found here: https://www.arm.com/security-update
>>
>>
>>
>> Richard Earnshaw (3):
>>   [builtins] Generic support for __builtin_load_no_speculate()
>>   [aarch64] Implement support for __builtin_load_no_speculate.
>>   [arm] Implement support for the de-speculation intrinsic
>>
>>  gcc/builtin-types.def         |  16 +++++
>>  gcc/builtins.c                |  99 +++++++++++++++++++++++++
>>  gcc/builtins.def              |  22 ++++++
>>  gcc/c-family/c-common.c       | 164 ++++++++++++++++++++++++++++++++++++++++++
>>  gcc/c-family/c-cppbuiltin.c   |   5 +-
>>  gcc/config/aarch64/aarch64.c  |  92 ++++++++++++++++++++++++
>>  gcc/config/aarch64/aarch64.md |  28 ++++++++
>>  gcc/config/arm/arm.c          | 107 +++++++++++++++++++++++++++
>>  gcc/config/arm/arm.md         |  40 ++++++++++-
>>  gcc/config/arm/unspecs.md     |   1 +
>>  gcc/doc/cpp.texi              |   4 ++
>>  gcc/doc/extend.texi           |  53 ++++++++++++++
>>  gcc/doc/tm.texi               |   6 ++
>>  gcc/doc/tm.texi.in            |   2 +
>>  gcc/target.def                |  20 ++++++
>>  gcc/targhooks.c               |  69 ++++++++++++++++++
>>  gcc/targhooks.h               |   3 +
>>  17 files changed, 729 insertions(+), 2 deletions(-)
>>
>>
>>


* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-05 10:40   ` Richard Earnshaw (lists)
@ 2018-01-05 10:47     ` Richard Biener
  2018-01-05 10:59       ` Richard Earnshaw (lists)
  2018-01-05 16:50       ` Jeff Law
  2018-01-05 16:41     ` Jeff Law
  1 sibling, 2 replies; 70+ messages in thread
From: Richard Biener @ 2018-01-05 10:47 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: GCC Patches

On Fri, Jan 5, 2018 at 11:40 AM, Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com> wrote:
> On 05/01/18 09:44, Richard Biener wrote:
>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>> <Richard.Earnshaw@arm.com> wrote:
>>>
>>> Recently, Google Project Zero disclosed several classes of attack
>>> against speculative execution. One of these, known as variant-1
>>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>>> speculation, providing an arbitrary read gadget. Further details can
>>> be found on the GPZ blog [1] and the documentation that is included
>>> with the first patch.
>>>
>>> This patch set adds a new builtin function for GCC to provide a
>>> mechanism for limiting speculation by a CPU after a bounds-checked
>>> memory access.  I've tried to design this in such a way that it can be
>>> used for any target where this might be necessary.  The patch set
>>> provides a generic implementation of the builtin and then
>>> target-specific support for Arm and AArch64.  Other architectures can
>>> utilize the internal infrastructure as needed.
>>>
>>> Most of the details of the builtin and the hooks that need to be
>>> implemented to support it are described in the updates to the manual,
>>> but a short summary is given below.
>>>
>>> TYP __builtin_load_no_speculate
>>>         (const volatile TYP *ptr,
>>>          const volatile void *lower,
>>>          const volatile void *upper,
>>>          TYP failval,
>>>          const volatile void *cmpptr)
>>>
>>> Where TYP can be any integral type (signed or unsigned char, int,
>>> short, long, etc) or any pointer type.
>>>
>>> The builtin implements the following logical behaviour:
>>>
>>> inline TYP __builtin_load_no_speculate
>>>          (const volatile TYP *ptr,
>>>           const volatile void *lower,
>>>           const volatile void *upper,
>>>           TYP failval,
>>>           const volatile void *cmpptr)
>>> {
>>>   TYP result;
>>>
>>>   if (cmpptr >= lower && cmpptr < upper)
>>>     result = *ptr;
>>>   else
>>>     result = failval;
>>>   return result;
>>> }
>>>
>>> in addition the specification of the builtin ensures that future
>>> speculation using *ptr may only continue iff cmpptr lies within the
>>> bounds specified.
>>
>> I fail to see how the vulnerability doesn't affect aggregate copies
>> or bitfield accesses.  The vulnerability doesn't cause the loaded
>> value to leak but (part of) the address by populating the CPU cache
>> side-channel.
>>
>
> It's not quite as simple as that.  You'll need to read the white paper
> on Arm's web site to get a full grasp of this (linked from
> https://www.arm.com/security-update).
>
>> So why isn't this just
>>
>>  T __builtin_load_no_speculate (T *);
>>
>> ?  Why that "fallback" and why the lower/upper bounds?
>
> Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem
> to restrict subsequent speculation, the CSEL needs a condition and the
> compiler must not be able to optimize it away based on logical
> reachability.  The fallback is used for the other operand of the CSEL
> instruction.

But the condition could be just 'true' in the instruction encoding?  That is,
why do you do both the jump-around and the csel?  Maybe I misunderstood
the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
the compiler messing up things.

>>
>> Did you talk with other compiler vendors (intel, llvm, ...?)?
>
> As stated we'll shortly be submitting similar patches for LLVM.

How do you solve the aggregate load issue?  I would imagine
that speculating stores (by pre-fetching the affected cache line!)
could be another attack vector?  Not sure if any CPU speculatively
does that (but I see no good reason why a CPU shouldn't do that).

Does ARM have a barrier like instruction suitable for use in the
kernel patches that are floating around?

Richard.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] [aarch64] Implement support for __builtin_load_no_speculate.
  2018-01-05  9:51   ` Richard Biener
@ 2018-01-05 10:48     ` Richard Earnshaw (lists)
  2018-01-05 16:31       ` Jeff Law
  0 siblings, 1 reply; 70+ messages in thread
From: Richard Earnshaw (lists) @ 2018-01-05 10:48 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On 05/01/18 09:51, Richard Biener wrote:
> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
> <Richard.Earnshaw@arm.com> wrote:
>>
>> This patch implements support for __builtin_load_no_speculate on
>> AArch64.  On this architecture we inhibit speculation by emitting a
>> combination of CSEL and a hint instruction that ensures the CSEL is
>> fully resolved when the operands to the CSEL may involve a speculative
>> load.
> 
>> Missing a high-level explanation here but it looks like you do something like
> 
>   if (access is not in bounds)
>     no-speculate-access;
>   else
>      regular access;

For aarch64 we end up with a sequence like

	CMP	'bounds'
	B<cond> out-of-range
	LDR	x1, [ptr]
out-of-range:
	CSEL	x1, FAILVAL, x1, <cond>
	CSDB

The CSEL+CSDB combination ensures that even if we do fetch x1 from an
out-of-range location the value won't persist beyond the end of the code
sequence and thus can't be used for further speculation.
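To make that concrete, the builtin's logical behaviour can be modelled in plain C together with a hypothetical use on a classic variant-1 gadget. This is a sketch only: volatile qualifiers are dropped, all names are illustrative, and the real protection comes from the CSEL+CSDB pair the compiler emits, which this model does not provide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain-C model of the builtin's documented logical behaviour.  The real
   builtin additionally guarantees, via CSEL+CSDB, that an out-of-range
   *ptr cannot feed further speculation.  */
static inline uint8_t
load_no_speculate_u8 (const uint8_t *ptr, const void *lower,
                      const void *upper, uint8_t failval,
                      const void *cmpptr)
{
  if ((const char *) cmpptr >= (const char *) lower
      && (const char *) cmpptr < (const char *) upper)
    return *ptr;
  return failval;
}

/* Hypothetical rewrite of a classic variant-1 gadget: the bounds of the
   range check are passed to the builtin so the speculation barrier is
   tied to the real condition.  */
uint8_t
read_array (const uint8_t *arr, size_t len, size_t untrusted)
{
  if (untrusted < len)
    return load_no_speculate_u8 (&arr[untrusted], arr, arr + len,
                                 0, &arr[untrusted]);
  return 0;
}
```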

> 
> with the upper/lower bounds.  Is this because 'no-speculate-access' is
> prohibitively
> expensive even when factoring in the condition computation and the condjump?
> (trying to reverse engineer how the actual assembly will look like
> from the patch,

It's all in the white paper.

> there isn't even a testcase :/)
> 
>>         * config/aarch64/aarch64.c (aarch64_print_operand): Handle zero passed
>>         to 'H' operand qualifier.
>>         (aarch64_inhibit_load_speculation): New function.
>>         (TARGET_INHIBIT_LOAD_SPECULATION): Redefine.
>>         * config/aarch64/aarch64.md (UNSPECV_NOSPECULATE): New unspec_volatile
>>         code.
>>         (nospeculate<ALLI:mode>, nospeculateti): New patterns.
>> ---
>>  gcc/config/aarch64/aarch64.c  | 92 +++++++++++++++++++++++++++++++++++++++++++
>>  gcc/config/aarch64/aarch64.md | 28 +++++++++++++
>>  2 files changed, 120 insertions(+)
>>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-05 10:47     ` Richard Biener
@ 2018-01-05 10:59       ` Richard Earnshaw (lists)
  2018-01-05 11:35         ` Jakub Jelinek
  2018-01-05 16:50       ` Jeff Law
  1 sibling, 1 reply; 70+ messages in thread
From: Richard Earnshaw (lists) @ 2018-01-05 10:59 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On 05/01/18 10:47, Richard Biener wrote:
> On Fri, Jan 5, 2018 at 11:40 AM, Richard Earnshaw (lists)
> <Richard.Earnshaw@arm.com> wrote:
>> On 05/01/18 09:44, Richard Biener wrote:
>>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>>> <Richard.Earnshaw@arm.com> wrote:
>>>>
>>>> Recently, Google Project Zero disclosed several classes of attack
>>>> against speculative execution. One of these, known as variant-1
>>>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>>>> speculation, providing an arbitrary read gadget. Further details can
>>>> be found on the GPZ blog [1] and the documentation that is included
>>>> with the first patch.
>>>>
>>>> This patch set adds a new builtin function for GCC to provide a
>>>> mechanism for limiting speculation by a CPU after a bounds-checked
>>>> memory access.  I've tried to design this in such a way that it can be
>>>> used for any target where this might be necessary.  The patch set
>>>> provides a generic implementation of the builtin and then
>>>> target-specific support for Arm and AArch64.  Other architectures can
>>>> utilize the internal infrastructure as needed.
>>>>
>>>> Most of the details of the builtin and the hooks that need to be
>>>> implemented to support it are described in the updates to the manual,
>>>> but a short summary is given below.
>>>>
>>>> TYP __builtin_load_no_speculate
>>>>         (const volatile TYP *ptr,
>>>>          const volatile void *lower,
>>>>          const volatile void *upper,
>>>>          TYP failval,
>>>>          const volatile void *cmpptr)
>>>>
>>>> Where TYP can be any integral type (signed or unsigned char, int,
>>>> short, long, etc) or any pointer type.
>>>>
>>>> The builtin implements the following logical behaviour:
>>>>
>>>> inline TYP __builtin_load_no_speculate
>>>>          (const volatile TYP *ptr,
>>>>           const volatile void *lower,
>>>>           const volatile void *upper,
>>>>           TYP failval,
>>>>           const volatile void *cmpptr)
>>>> {
>>>>   TYP result;
>>>>
>>>>   if (cmpptr >= lower && cmpptr < upper)
>>>>     result = *ptr;
>>>>   else
>>>>     result = failval;
>>>>   return result;
>>>> }
>>>>
>>>> in addition the specification of the builtin ensures that future
>>>> speculation using *ptr may only continue iff cmpptr lies within the
>>>> bounds specified.
>>>
>>> I fail to see how the vulnerability doesn't affect aggregate copies
>>> or bitfield accesses.  The vulnerability doesn't cause the loaded
>>> value to leak but (part of) the address by populating the CPU cache
>>> side-channel.
>>>
>>
>> It's not quite as simple as that.  You'll need to read the white paper
>> on Arm's web site to get a full grasp of this (linked from
>> https://www.arm.com/security-update).
>>
>>> So why isn't this just
>>>
>>>  T __builtin_load_no_speculate (T *);
>>>
>>> ?  Why that "fallback" and why the lower/upper bounds?
>>
>> Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem
>> to restrict subsequent speculation, the CSEL needs a condition and the
>> compiler must not be able to optimize it away based on logical
>> reachability.  The fallback is used for the other operand of the CSEL
>> instruction.
> 
> But the condition could be just 'true' in the instruction encoding?  That is,
> why do you do both the jump-around and the csel?  Maybe I misunderstood
> the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
> the compiler messing up things.
> 

That's why the intrinsic takes explicit bounds, not a simple bool
expressing the condition.  It prevents the optimizers from replacing
the condition by value range propagation.  That does restrict the
flexibility of the builtin somewhat, but it does allow it to work
correctly at all optimization levels.

>>>
>>> Did you talk with other compiler vendors (intel, llvm, ...?)?
>>
>> As stated we'll shortly be submitting similar patches for LLVM.
> 
> How do you solve the aggregate load issue?  I would imagine
> that speculating stores (by pre-fetching the affected cache line!)
> could be another attack vector?  Not sure if any CPU speculatively
> does that (but I see no good reason why a CPU shouldn't do that).
> 
> Does ARM have a barrier like instruction suitable for use in the
> kernel patches that are floating around?
> 
> Richard.
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-05 10:59       ` Richard Earnshaw (lists)
@ 2018-01-05 11:35         ` Jakub Jelinek
  2018-01-05 11:58           ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 70+ messages in thread
From: Jakub Jelinek @ 2018-01-05 11:35 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: Richard Biener, GCC Patches

On Fri, Jan 05, 2018 at 10:59:21AM +0000, Richard Earnshaw (lists) wrote:
> > But the condition could be just 'true' in the instruction encoding?  That is,
> > why do you do both the jump-around and the csel?  Maybe I misunderstood
> > the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
> > the compiler messing up things.
> > 
> 
> That's why the intrinsic takes explicit bounds not a simple bool
> expressing the conditions.  It prevents the optimizers from replacing
> the condition by value range propagation.  That does restrict the

Preventing optimizers from replacing the condition can be done in many ways,
IFN, UNSPEC, ...
The question is if you really need it at the assembly level, or if you can
just do an unconditional branch or branch conditional on say comparison of
two constants, without any dynamic bounds.

	Jakub

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-04 13:58 [PATCH 0/3] Add __builtin_load_no_speculate Richard Earnshaw
                   ` (4 preceding siblings ...)
  2018-01-05  9:44 ` Richard Biener
@ 2018-01-05 11:37 ` Alexander Monakov
  2018-01-05 11:53   ` Richard Earnshaw (lists)
  2018-01-05 17:24 ` Jeff Law
  2018-01-17 14:56 ` [PATCH 0/3] [v2] Implement __builtin_speculation_safe_load Richard Earnshaw
  7 siblings, 1 reply; 70+ messages in thread
From: Alexander Monakov @ 2018-01-05 11:37 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: gcc-patches

I believe the proposed behavior of the new builtin is over-specialized.
In principle the following snippet may be open to exploitation too:

  if (predicate)
    foo = arr[(secret >> untrusted) & 64];

assuming the attacker has a means to observe which part of 'arr' is brought into
cache, but cannot set 'predicate' to true (so has to rely on the speculative
population of the cache); and likewise if a store is predicated-off rather than
a load.

So shouldn't, for generality, the new builtin work "as if" by cleansing the
address rather than the result of the load, like the following? It would also be
applicable to stores then.

On Thu, 4 Jan 2018, Richard Earnshaw wrote:
> inline TYP __builtin_load_no_speculate
>          (const volatile TYP *ptr,
>           const volatile void *lower,
>           const volatile void *upper,
>           TYP failval,
>           const volatile void *cmpptr)
> {
>   TYP result;
> 
>   if (cmpptr >= lower && cmpptr < upper)
>     result = *ptr;
>   else
>     result = failval;
>   return result;
> }

{
  if (!(cmpptr >= lower && cmpptr < upper))
    ptr = NULL;

  return *ptr;
}
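The two candidate semantics can be contrasted in a runnable plain-C model (no barrier, names illustrative; the sketch above forces the pointer to NULL, which is replaced here by a dummy location so the model stays executable):

```c
#include <assert.h>

static int safe_zero;   /* stand-in for the NULL in the sketch above */

/* Proposed semantics: cleanse the RESULT of the load.  */
static int
load_clean_result (const int *ptr, const void *lower, const void *upper,
                   int failval)
{
  if ((const char *) ptr >= (const char *) lower
      && (const char *) ptr < (const char *) upper)
    return *ptr;
  return failval;
}

/* Alternative semantics: cleanse the ADDRESS instead, which would also
   cover stores, since the redirected pointer is safe to access.  */
static int
load_clean_addr (const int *ptr, const void *lower, const void *upper)
{
  if (!((const char *) ptr >= (const char *) lower
        && (const char *) ptr < (const char *) upper))
    ptr = &safe_zero;
  return *ptr;
}
```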

Alexander

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-05 11:37 ` Alexander Monakov
@ 2018-01-05 11:53   ` Richard Earnshaw (lists)
  2018-01-05 13:08     ` Alexander Monakov
  0 siblings, 1 reply; 70+ messages in thread
From: Richard Earnshaw (lists) @ 2018-01-05 11:53 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-patches

On 05/01/18 11:37, Alexander Monakov wrote:
> I believe the proposed behavior of the new builtin is over-specialized.
> In principle the following snippet may be open to exploitation too:
> 
>   if (predicate)
>     foo = arr[(secret >> untrusted) & 64];
> 
> assuming the attacker has a means to observe which part of 'arr' is
> brought into cache, but cannot set 'predicate' to true (so has to rely
> on the speculative population of the cache); and likewise if a store is
> predicated-off rather than a load.
> 
> So shouldn't, for generality, the new builtin work "as if" by cleansing
> the address rather than the result of the load, like the following? It
> would also be applicable to stores then.

This is quite tricky.  For ARM we have to have a speculated load.  So
one way to recast this might be to push 'untrusted' into a stack slot
that was then speculatively loaded back by the builtin.  To do that
you'd need to find a way of expressing predicate as a range check, so
something like

  if (predicate)
   {
     foo = arr[(secret >> __builtin_load_no_speculate (&untrusted,
				predicate_base, predicate_upper, 0,
				predicate_val)) & 64];
   }
You could use similar techniques with stores as well.
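As a purely hypothetical sketch of the store case (not code from the patch set): the builtin call de-speculates the index, so the guarded store always targets an in-range slot even under a mispredicted branch. Here a plain helper stands in for the builtin and only models its logical behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the builtin: yields idx when it lies in [0, len),
   otherwise 0.  The real builtin would also pin this choice under
   speculation.  */
static size_t
clean_index (size_t idx, size_t len)
{
  return idx < len ? idx : 0;
}

/* Predicated store protected by cleansing the index, not the value.  */
void
store_guarded (int *arr, size_t len, size_t untrusted, int val)
{
  if (untrusted < len)
    arr[clean_index (untrusted, len)] = val;
}
```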

I'm open to suggestions as to how we might express the conditions
better.  It's certainly the least pleasant part of this proposal, but
the seemingly obvious '_Bool condition' parameter won't work because the
compiler will just substitute true in too many situations where it
thinks it has proved that it knows the conditions.

R.


> 
> On Thu, 4 Jan 2018, Richard Earnshaw wrote:
>> inline TYP __builtin_load_no_speculate
>>          (const volatile TYP *ptr,
>>           const volatile void *lower,
>>           const volatile void *upper,
>>           TYP failval,
>>           const volatile void *cmpptr)
>> {
>>   TYP result;
>> 
>>   if (cmpptr >= lower && cmpptr < upper)
>>     result = *ptr;
>>   else
>>     result = failval;
>>   return result;
>> }
> 
> {
>   if (!(cmpptr >= lower && cmpptr < upper))
>     ptr = NULL;
> 
>   return *ptr;
> }
> 
> Alexander

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-05 11:35         ` Jakub Jelinek
@ 2018-01-05 11:58           ` Richard Earnshaw (lists)
  2018-01-05 16:44             ` Jeff Law
  0 siblings, 1 reply; 70+ messages in thread
From: Richard Earnshaw (lists) @ 2018-01-05 11:58 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, GCC Patches

On 05/01/18 11:35, Jakub Jelinek wrote:
> On Fri, Jan 05, 2018 at 10:59:21AM +0000, Richard Earnshaw (lists) wrote:
>>> But the condition could be just 'true' in the instruction encoding?  That is,
>>> why do you do both the jump-around and the csel?  Maybe I misunderstood
>>> the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
>>> the compiler messing up things.
>>>
>>
>> That's why the intrinsic takes explicit bounds not a simple bool
>> expressing the conditions.  It prevents the optimizers from replacing
>> the condition by value range propagation.  That does restrict the
> 
> Preventing optimizers from replacing the condition can be done in many ways,
> IFN, UNSPEC, ...
> The question is if you really need it at the assembly level, or if you can
> just do an unconditional branch or branch conditional on say comparison of
> two constants, without any dynamic bounds.
> 
> 	Jakub
> 

The bounds /have/ to reflect the real speculation condition.  Otherwise
the CSEL becomes ineffective.

R.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-05 11:53   ` Richard Earnshaw (lists)
@ 2018-01-05 13:08     ` Alexander Monakov
  2018-01-09 10:47       ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 70+ messages in thread
From: Alexander Monakov @ 2018-01-05 13:08 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: gcc-patches

On Fri, 5 Jan 2018, Richard Earnshaw (lists) wrote:
> This is quite tricky.  For ARM we have to have a speculated load.

Sorry, I don't follow. On ARM, it is surprising that the CSEL-CSDB-LDR
sequence wouldn't work (applying CSEL to the address rather than the
loaded value), and if it wouldn't, then ARM-specific lowering of the
builtin can handle that anyhow, right? (by spilling the pointer)

(on x86 the current Intel's recommendation is to emit LFENCE prior to the load)

Is the main issue expressing the CSEL condition in the source code? Perhaps it is
possible to introduce

  int guard = __builtin_nontransparent(predicate);

  if (predicate)
    foo = __builtin_load_no_speculate(&arr[addr], guard);

... or maybe even

  if (predicate)
    foo = arr[__builtin_loadspecbarrier(addr, guard)];

where internally __builtin_nontransparent is the same as

  guard = predicate;
  asm volatile("" : "+g"(guard));

although admittedly this is not perfect since it forces evaluation of 'guard'
before the branch.
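The "nontransparent" wrapper can be written today with GCC extended asm: an empty asm with the value as a read/write operand is opaque to the optimizers, so value-range propagation cannot fold the condition to a constant. A minimal sketch of that idea:

```c
#include <assert.h>

/* Empty extended asm: costs nothing at run time, but the compiler must
   assume 'v' may have been changed, so it cannot constant-fold it or
   prove anything about its range.  */
static inline int
nontransparent (int v)
{
  __asm__ volatile ("" : "+g" (v));
  return v;
}
```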

Alexander

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 2/3] [aarch64] Implement support for __builtin_load_no_speculate.
  2018-01-05 10:48     ` Richard Earnshaw (lists)
@ 2018-01-05 16:31       ` Jeff Law
  0 siblings, 0 replies; 70+ messages in thread
From: Jeff Law @ 2018-01-05 16:31 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Richard Biener; +Cc: GCC Patches

On 01/05/2018 03:48 AM, Richard Earnshaw (lists) wrote:
> On 05/01/18 09:51, Richard Biener wrote:
>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>> <Richard.Earnshaw@arm.com> wrote:
>>>
>>> This patch implements support for __builtin_load_no_speculate on
>>> AArch64.  On this architecture we inhibit speculation by emitting a
>>> combination of CSEL and a hint instruction that ensures the CSEL is
>>> fully resolved when the operands to the CSEL may involve a speculative
>>> load.
>>
>> Missing a high-level explanation here but it looks like you do something like
>>
>>   if (access is not in bounds)
>>     no-speculate-access;
>>   else
>>      regular access;
> 
> For aarch64 we end up with a sequence like
> 
> 	CMP	'bounds'
> 	B<cond> out-of-range
> 	LDR	x1, [ptr]
> out-of-range:
> 	CSEL	x1, FAILVAL, x1, <cond>
> 	CSDB
> 
> The CSEL+CSDB combination ensures that even if we do fetch x1 from an
> out-of-range location the value won't persist beyond the end of the code
> sequence and thus can't be used for further speculation.
If I try to look at this in an architecture-independent way, the code
starting at out-of-range is just a conditional move and a fence.  Given
those primitives any target ought to be able to define
__builtin_load_no_speculate -- which is obviously important :-)
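In that spirit, a target-independent fallback might take the following shape. The speculation barrier itself is target-specific (CSDB on AArch64; other targets would substitute their own), so an empty compiler barrier stands in for it here: this models the conditional-move-plus-fence shape only and provides no actual protection.

```c
#include <assert.h>

static long
load_no_speculate_fallback (const long *ptr, const void *lower,
                            const void *upper, long failval)
{
  long result = failval;
  if ((const char *) ptr >= (const char *) lower
      && (const char *) ptr < (const char *) upper)
    result = *ptr;
  /* A real implementation would emit the target's speculation barrier
     here; this empty asm only keeps 'result' from being optimized
     across the selection.  */
  __asm__ volatile ("" : "+r" (result));
  return result;
}
```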

Jeff

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-05  9:44 ` Richard Biener
  2018-01-05 10:40   ` Richard Earnshaw (lists)
@ 2018-01-05 16:37   ` Jeff Law
  1 sibling, 0 replies; 70+ messages in thread
From: Jeff Law @ 2018-01-05 16:37 UTC (permalink / raw)
  To: Richard Biener, Richard Earnshaw; +Cc: GCC Patches

On 01/05/2018 02:44 AM, Richard Biener wrote:
> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
> <Richard.Earnshaw@arm.com> wrote:
>>
>> Recently, Google Project Zero disclosed several classes of attack
>> against speculative execution. One of these, known as variant-1
>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>> speculation, providing an arbitrary read gadget. Further details can
>> be found on the GPZ blog [1] and the documentation that is included
>> with the first patch.
>>
>> This patch set adds a new builtin function for GCC to provide a
>> mechanism for limiting speculation by a CPU after a bounds-checked
>> memory access.  I've tried to design this in such a way that it can be
>> used for any target where this might be necessary.  The patch set
>> provides a generic implementation of the builtin and then
>> target-specific support for Arm and AArch64.  Other architectures can
>> utilize the internal infrastructure as needed.
>>
>> Most of the details of the builtin and the hooks that need to be
>> implemented to support it are described in the updates to the manual,
>> but a short summary is given below.
>>
>> TYP __builtin_load_no_speculate
>>         (const volatile TYP *ptr,
>>          const volatile void *lower,
>>          const volatile void *upper,
>>          TYP failval,
>>          const volatile void *cmpptr)
>>
>> Where TYP can be any integral type (signed or unsigned char, int,
>> short, long, etc) or any pointer type.
>>
>> The builtin implements the following logical behaviour:
>>
>> inline TYP __builtin_load_no_speculate
>>          (const volatile TYP *ptr,
>>           const volatile void *lower,
>>           const volatile void *upper,
>>           TYP failval,
>>           const volatile void *cmpptr)
>> {
>>   TYP result;
>>
>>   if (cmpptr >= lower && cmpptr < upper)
>>     result = *ptr;
>>   else
>>     result = failval;
>>   return result;
>> }
>>
>> in addition the specification of the builtin ensures that future
>> speculation using *ptr may only continue iff cmpptr lies within the
>> bounds specified.
> 
> I fail to see how the vulnerability doesn't affect aggregate copies
> or bitfield accesses.  The vulnerability doesn't cause the loaded
> value to leak but (part of) the address by populating the CPU cache
> side-channel.
> 
> So why isn't this just
> 
>  T __builtin_load_no_speculate (T *);
> 
> ?  Why that "fallback" and why the lower/upper bounds?
> 
> Did you talk with other compiler vendors (intel, llvm, ...?)?
I think "fallback" could potentially be any value; I don't think its
actual value matters much -- it could just as easily be "0" across the
board or some indeterminate value (as long as the indeterminate value
doesn't itself present an information leak).

The bounds are used to build a condition for the csel to select between
the loaded value and the fail value.  You need some way to select
between them.

If I've got anything wrong, I'm sure Richard E. will correct me.

Jeff

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-05 10:40   ` Richard Earnshaw (lists)
  2018-01-05 10:47     ` Richard Biener
@ 2018-01-05 16:41     ` Jeff Law
  1 sibling, 0 replies; 70+ messages in thread
From: Jeff Law @ 2018-01-05 16:41 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Richard Biener; +Cc: GCC Patches

On 01/05/2018 03:40 AM, Richard Earnshaw (lists) wrote:
> On 05/01/18 09:44, Richard Biener wrote:
>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>> <Richard.Earnshaw@arm.com> wrote:
>>>
>>> Recently, Google Project Zero disclosed several classes of attack
>>> against speculative execution. One of these, known as variant-1
>>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>>> speculation, providing an arbitrary read gadget. Further details can
>>> be found on the GPZ blog [1] and the documentation that is included
>>> with the first patch.
>>>
>>> This patch set adds a new builtin function for GCC to provide a
>>> mechanism for limiting speculation by a CPU after a bounds-checked
>>> memory access.  I've tried to design this in such a way that it can be
>>> used for any target where this might be necessary.  The patch set
>>> provides a generic implementation of the builtin and then
>>> target-specific support for Arm and AArch64.  Other architectures can
>>> utilize the internal infrastructure as needed.
>>>
>>> Most of the details of the builtin and the hooks that need to be
>>> implemented to support it are described in the updates to the manual,
>>> but a short summary is given below.
>>>
>>> TYP __builtin_load_no_speculate
>>>         (const volatile TYP *ptr,
>>>          const volatile void *lower,
>>>          const volatile void *upper,
>>>          TYP failval,
>>>          const volatile void *cmpptr)
>>>
>>> Where TYP can be any integral type (signed or unsigned char, int,
>>> short, long, etc) or any pointer type.
>>>
>>> The builtin implements the following logical behaviour:
>>>
>>> inline TYP __builtin_load_no_speculate
>>>          (const volatile TYP *ptr,
>>>           const volatile void *lower,
>>>           const volatile void *upper,
>>>           TYP failval,
>>>           const volatile void *cmpptr)
>>> {
>>>   TYP result;
>>>
>>>   if (cmpptr >= lower && cmpptr < upper)
>>>     result = *ptr;
>>>   else
>>>     result = failval;
>>>   return result;
>>> }
>>>
>>> in addition the specification of the builtin ensures that future
>>> speculation using *ptr may only continue iff cmpptr lies within the
>>> bounds specified.
>>
>> I fail to see how the vulnerability doesn't affect aggregate copies
>> or bitfield accesses.  The vulnerability doesn't cause the loaded
>> value to leak but (part of) the address by populating the CPU cache
>> side-channel.
>>
> 
> It's not quite as simple as that.  You'll need to read the white paper
> on Arm's web site to get a full grasp of this (linked from
> https://www.arm.com/security-update).
I actually recommend reading the original papers referenced by Google in
addition to the ARM whitepaper.  This stuff is far from intuitive.  I've
been loosely involved for a month or so, but it really didn't start to
click for me until right before the holiday break.



jeff

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-05 11:58           ` Richard Earnshaw (lists)
@ 2018-01-05 16:44             ` Jeff Law
  0 siblings, 0 replies; 70+ messages in thread
From: Jeff Law @ 2018-01-05 16:44 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Jakub Jelinek; +Cc: Richard Biener, GCC Patches

On 01/05/2018 04:57 AM, Richard Earnshaw (lists) wrote:
> On 05/01/18 11:35, Jakub Jelinek wrote:
>> On Fri, Jan 05, 2018 at 10:59:21AM +0000, Richard Earnshaw (lists) wrote:
>>>> But the condition could be just 'true' in the instruction encoding?  That is,
>>>> why do you do both the jump-around and the csel?  Maybe I misunderstood
>>>> the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
>>>> the compiler messing up things.
>>>>
>>>
>>> That's why the intrinsic takes explicit bounds not a simple bool
>>> expressing the conditions.  It prevents the optimizers from replacing
>>> the condition by value range propagation.  That does restrict the
>>
>> Preventing optimizers from replacing the condition can be done in many ways,
>> IFN, UNSPEC, ...
>> The question is if you really need it at the assembly level, or if you can
>> just do an unconditional branch or branch conditional on say comparison of
>> two constants, without any dynamic bounds.
>>
>> 	Jakub
>>
> 
> The bounds /have/ to reflect the real speculation condition.  Otherwise
> the CSEL becomes ineffective.
I think this is probably the key that Jakub and Richard B. were missing.
It certainly hadn't clicked for me when we spoke earlier.

Jeff


* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-05 10:47     ` Richard Biener
  2018-01-05 10:59       ` Richard Earnshaw (lists)
@ 2018-01-05 16:50       ` Jeff Law
  1 sibling, 0 replies; 70+ messages in thread
From: Jeff Law @ 2018-01-05 16:50 UTC (permalink / raw)
  To: Richard Biener, Richard Earnshaw (lists); +Cc: GCC Patches

On 01/05/2018 03:47 AM, Richard Biener wrote:
> On Fri, Jan 5, 2018 at 11:40 AM, Richard Earnshaw (lists)
> <Richard.Earnshaw@arm.com> wrote:
>> On 05/01/18 09:44, Richard Biener wrote:
>>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>>> <Richard.Earnshaw@arm.com> wrote:
>>>>
>>>> Recently, Google Project Zero disclosed several classes of attack
>>>> against speculative execution. One of these, known as variant-1
>>>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>>>> speculation, providing an arbitrary read gadget. Further details can
>>>> be found on the GPZ blog [1] and the documentation that is included
>>>> with the first patch.
>>>>
>>>> This patch set adds a new builtin function for GCC to provide a
>>>> mechanism for limiting speculation by a CPU after a bounds-checked
>>>> memory access.  I've tried to design this in such a way that it can be
>>>> used for any target where this might be necessary.  The patch set
>>>> provides a generic implementation of the builtin and then
>>>> target-specific support for Arm and AArch64.  Other architectures can
>>>> utilize the internal infrastructure as needed.
>>>>
>>>> Most of the details of the builtin and the hooks that need to be
>>>> implemented to support it are described in the updates to the manual,
>>>> but a short summary is given below.
>>>>
>>>> TYP __builtin_load_no_speculate
>>>>         (const volatile TYP *ptr,
>>>>          const volatile void *lower,
>>>>          const volatile void *upper,
>>>>          TYP failval,
>>>>          const volatile void *cmpptr)
>>>>
>>>> Where TYP can be any integral type (signed or unsigned char, int,
>>>> short, long, etc) or any pointer type.
>>>>
>>>> The builtin implements the following logical behaviour:
>>>>
>>>> inline TYP __builtin_load_no_speculate
>>>>          (const volatile TYP *ptr,
>>>>           const volatile void *lower,
>>>>           const volatile void *upper,
>>>>           TYP failval,
>>>>           const volatile void *cmpptr)
>>>> {
>>>>   TYP result;
>>>>
>>>>   if (cmpptr >= lower && cmpptr < upper)
>>>>     result = *ptr;
>>>>   else
>>>>     result = failval;
>>>>   return result;
>>>> }
>>>>
>>>> In addition, the specification of the builtin ensures that future
>>>> speculation using *ptr may continue only if cmpptr lies within the
>>>> bounds specified.
>>>
>>> I fail to see how the vulnerability doesn't affect aggregate copies
>>> or bitfield accesses.  The vulnerability doesn't leak the loaded
>>> value itself, but (part of) the address, by populating the CPU cache
>>> side channel.
>>>
>>
>> It's not quite as simple as that.  You'll need to read the white paper
>> on Arm's web site to get a full grasp of this (linked from
>> https://www.arm.com/security-update).
>>
>>> So why isn't this just
>>>
>>>  T __builtin_load_no_speculate (T *);
>>>
>>> ?  Why that "fallback" and why the lower/upper bounds?
>>
>> Because, for Arm, we need to use both CSEL and CSDB (AArch64) in tandem
>> to restrict subsequent speculation, the CSEL needs a condition and the
>> compiler must not be able to optimize it away based on logical
>> reachability.  The fallback is used for the other operand of the CSEL
>> instruction.
> 
> But the condition could be just 'true' in the instruction encoding?  That is,
> why do you do both the jump-around and the csel?  Maybe I misunderstood
> the RTL you appear to generate.  I'd have expected an UNSPEC to avoid
> the compiler messing up things.
> 
>>>
>>> Did you talk with other compiler vendors (intel, llvm, ...?)?
>>
>> As stated we'll shortly be submitting similar patches for LLVM.
> 
> How do you solve the aggregate load issue?  I would imagine
> that speculating stores (by pre-fetching the affected cache line!)
> could be another attack vector?  Not sure if any CPU speculatively
> does that (but I see no good reason why a CPU shouldn't do that).
Without going into details, I've speculated to the appropriate folks
that there are other vectors for these kinds of attacks, some more
exploitable than others.  Attacking the cache via loads/stores is just
one instance of a class of problems IMHO.

I fully expect we'll be fighting this class of problems for a while.
This is just the tip of the iceberg.

Jeff


* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-04 13:58 [PATCH 0/3] Add __builtin_load_no_speculate Richard Earnshaw
                   ` (5 preceding siblings ...)
  2018-01-05 11:37 ` Alexander Monakov
@ 2018-01-05 17:24 ` Jeff Law
  2018-01-09 11:44   ` Richard Earnshaw (lists)
  2018-01-17 14:56 ` [PATCH 0/3] [v2] Implement __builtin_speculation_safe_load Richard Earnshaw
  7 siblings, 1 reply; 70+ messages in thread
From: Jeff Law @ 2018-01-05 17:24 UTC (permalink / raw)
  To: Richard Earnshaw, gcc-patches

On 01/04/2018 06:58 AM, Richard Earnshaw wrote:
> 
> Recently, Google Project Zero disclosed several classes of attack
> against speculative execution. One of these, known as variant-1
> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
> speculation, providing an arbitrary read gadget. Further details can
> be found on the GPZ blog [1] and the documentation that is included
> with the first patch.
So I think it's important for anyone reading this stuff to read
Richard's paragraph carefully --  "an arbitrary read gadget".

I fully expect that over the course of time we're going to see other
arbitrary read gadgets found.  Those gadgets may have lower
bandwidth, have a higher degree of jitter, or be harder to exploit, but
they're out there just waiting to be discovered.

So I don't expect this to be the only mitigation we have to put into place.

Anyway...


> 
> Some optimizations are permitted to make the builtin easier to use.
> The final two arguments can both be omitted (C++ style): failval will
> default to 0 in this case, and if cmpptr is omitted, ptr will be used
> when expanding the range check.  In addition, either lower or upper
> (but not both) may be a literal NULL, and the expansion will then
> ignore that boundary condition.
So what are the cases where FAILVAL is useful rather than just using
zero (or some other constant) all the time?

Similarly under what conditions would one want to use CMPPTR rather than
PTR?

I wandered down through the LKML thread but didn't see anything which
would shed light on those two questions.
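[Editorial note: one scenario where CMPPTR would plausibly differ from PTR, sketched as an illustration rather than as an answer from the thread: the pointer whose bounds were checked is the base of an object, while the address actually dereferenced is derived from it. The stand-in below implements only the builtin's logical behaviour, with no speculation guarantee; all names are invented.]

```c
#include <stddef.h>
#include <stdint.h>

struct entry { uint32_t key; uint8_t payload; };
struct entry entries[64];

/* Stand-in implementing only the logical semantics of the builtin.  */
static inline uint8_t
load_no_speculate_u8 (const uint8_t *ptr, const void *lower,
                      const void *upper, uint8_t failval,
                      const void *cmpptr)
{
  if ((uintptr_t) cmpptr >= (uintptr_t) lower
      && (uintptr_t) cmpptr < (uintptr_t) upper)
    return *ptr;
  return failval;
}

uint8_t
read_payload (size_t idx, size_t limit)
{
  struct entry *e = &entries[idx];
  /* The bounds check applies to E itself (cmpptr), while the address
     actually loaded is a derived member address (ptr != cmpptr).  */
  return load_no_speculate_u8 (&e->payload, entries, entries + limit,
                               0, e);
}
```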

jeff
> 


* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-04 18:20 ` [PATCH 0/3] Add __builtin_load_no_speculate Joseph Myers
@ 2018-01-05 17:26   ` Jeff Law
  2018-01-09 15:30     ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 70+ messages in thread
From: Jeff Law @ 2018-01-05 17:26 UTC (permalink / raw)
  To: Joseph Myers, Richard Earnshaw; +Cc: gcc-patches

On 01/04/2018 11:20 AM, Joseph Myers wrote:
> On Thu, 4 Jan 2018, Richard Earnshaw wrote:
> 
>> 1 - generic modifications to GCC providing the builtin function for all
>>     architectures and expanding to an implementation that gives the
>>     logical behaviour of the builtin only.  A warning is generated if
>>     this expansion path is used that code will execute correctly but
>>     without providing protection against speculative use.
> 
> Presumably it would make sense to have a standard way for architectures 
> with no speculative execution to just use the generic version, but without 
> the warning?  E.g., split up default_inhibit_load_speculation so that it 
> generates the warning then calls another function with the same prototype, 
> so such architectures can just define the hook to use that other function?
> 
Seems wise.  There are still architectures that don't speculate, or don't
speculate enough to trigger these problems.

Jeff


* Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
  2018-01-04 13:58 ` [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate() Richard Earnshaw
@ 2018-01-08  2:20   ` Bill Schmidt
  2018-01-08  6:55     ` Jeff Law
  2018-01-08 14:19     ` Richard Earnshaw (lists)
  0 siblings, 2 replies; 70+ messages in thread
From: Bill Schmidt @ 2018-01-08  2:20 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: gcc-patches

Hi Richard,

Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't
have a way to force resolution of a condition prior to continuing speculation, so this
will just introduce another comparison that we would speculate past.  For our mitigation
we will have to introduce an instruction that halts all speculation at that point, and place
it in front of all dangerous loads.  I wish it were otherwise.

Thanks,
Bill

> On Jan 4, 2018, at 7:58 AM, Richard Earnshaw <richard.earnshaw@arm.com> wrote:
> 
> 
> This patch adds generic support for the new builtin
> __builtin_load_no_speculate.  It provides the overloading of the
> different access sizes and a default fall-back expansion for targets
> that do not support a mechanism for inhibiting speculation.
> 
> 	* builtin_types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):
> 	New builtin type signature.
> 	(BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
> 	(BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
> 	(BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
> 	(BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
> 	* builtins.def (BUILT_IN_LOAD_NO_SPECULATE_N): New builtin.
> 	(BUILT_IN_LOAD_NO_SPECULATE_1): Likewise.
> 	(BUILT_IN_LOAD_NO_SPECULATE_2): Likewise.
> 	(BUILT_IN_LOAD_NO_SPECULATE_4): Likewise.
> 	(BUILT_IN_LOAD_NO_SPECULATE_8): Likewise.
> 	(BUILT_IN_LOAD_NO_SPECULATE_16): Likewise.
> 	* target.def (inhibit_load_speculation): New hook.
> 	* doc/tm.texi.in (TARGET_INHIBIT_LOAD_SPECULATION): Add to
> 	documentation.
> 	* doc/tm.texi: Regenerated.
> 	* doc/cpp.texi: Document __HAVE_LOAD_NO_SPECULATE.
> 	* doc/extend.texi: Document __builtin_load_no_speculate.
> 	* c-family/c-common.c (load_no_speculate_resolve_size): New function.
> 	(load_no_speculate_resolve_params): New function.
> 	(load_no_speculate_resolve_return): New function.
> 	(resolve_overloaded_builtin): Handle overloading
> 	__builtin_load_no_speculate.
> 	* builtins.c (expand_load_no_speculate): New function.
> 	(expand_builtin): Handle new no-speculation builtins.
> 	* targhooks.h (default_inhibit_load_speculation): Declare.
> 	* targhooks.c (default_inhibit_load_speculation): New function.
> ---
> gcc/builtin-types.def       |  16 +++++
> gcc/builtins.c              |  99 ++++++++++++++++++++++++++
> gcc/builtins.def            |  22 ++++++
> gcc/c-family/c-common.c     | 164 ++++++++++++++++++++++++++++++++++++++++++++
> gcc/c-family/c-cppbuiltin.c |   5 +-
> gcc/doc/cpp.texi            |   4 ++
> gcc/doc/extend.texi         |  53 ++++++++++++++
> gcc/doc/tm.texi             |   6 ++
> gcc/doc/tm.texi.in          |   2 +
> gcc/target.def              |  20 ++++++
> gcc/targhooks.c             |  69 +++++++++++++++++++
> gcc/targhooks.h             |   3 +
> 12 files changed, 462 insertions(+), 1 deletion(-)
> 
> <0001-builtins-Generic-support-for-__builtin_load_no_specu.patch>


* Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
  2018-01-08  2:20   ` Bill Schmidt
@ 2018-01-08  6:55     ` Jeff Law
  2018-01-08 11:52       ` Richard Biener
  2018-01-08 14:23       ` Bill Schmidt
  2018-01-08 14:19     ` Richard Earnshaw (lists)
  1 sibling, 2 replies; 70+ messages in thread
From: Jeff Law @ 2018-01-08  6:55 UTC (permalink / raw)
  To: Bill Schmidt, Richard Earnshaw; +Cc: gcc-patches

On 01/07/2018 07:20 PM, Bill Schmidt wrote:
> Hi Richard,
> 
> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't
> have a way to force resolution of a condition prior to continuing speculation, so this
> will just introduce another comparison that we would speculate past.  For our mitigation
> we will have to introduce an instruction that halts all speculation at that point, and place
> it in front of all dangerous loads.  I wish it were otherwise.
So could you have an expander for __builtin_load_no_speculate that just
emits the magic insn that halts all speculation and essentially ignores
the additional stuff that __builtin_load_no_speculate might be able to
do on other platforms?

jeff


* Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
  2018-01-08  6:55     ` Jeff Law
@ 2018-01-08 11:52       ` Richard Biener
  2018-01-08 14:23       ` Bill Schmidt
  1 sibling, 0 replies; 70+ messages in thread
From: Richard Biener @ 2018-01-08 11:52 UTC (permalink / raw)
  To: Jeff Law; +Cc: Bill Schmidt, Richard Earnshaw, GCC Patches

On Mon, Jan 8, 2018 at 5:47 AM, Jeff Law <law@redhat.com> wrote:
> On 01/07/2018 07:20 PM, Bill Schmidt wrote:
>> Hi Richard,
>>
>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't
>> have a way to force resolution of a condition prior to continuing speculation, so this
>> will just introduce another comparison that we would speculate past.  For our mitigation
>> we will have to introduce an instruction that halts all speculation at that point, and place
>> it in front of all dangerous loads.  I wish it were otherwise.
> So could you have an expander for __builtin_load_no_speculate that just
> emits the magic insn that halts all speculation and essentially ignores
> the additional stuff that __builtin_load_no_speculate might be able to
> do on other platforms?

I think you at least need to expand the builtin semantically, given that as
designed it might consume the condition entirely in the source code.

I also think the user documentation in extend.texi should contain examples on
how to actually use the builtin to mitigate the Spectre attack, that
is, code before
and after using it.

And somebody might want to set up a spectre.html page and some NEWS item
at some point.

Richard.

>
> jeff


* Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
  2018-01-08  2:20   ` Bill Schmidt
  2018-01-08  6:55     ` Jeff Law
@ 2018-01-08 14:19     ` Richard Earnshaw (lists)
  2018-01-08 16:07       ` Bill Schmidt
  1 sibling, 1 reply; 70+ messages in thread
From: Richard Earnshaw (lists) @ 2018-01-08 14:19 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches

On 08/01/18 02:20, Bill Schmidt wrote:
> Hi Richard,
> 
> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't
> have a way to force resolution of a condition prior to continuing speculation, so this
> will just introduce another comparison that we would speculate past.  For our mitigation
> we will have to introduce an instruction that halts all speculation at that point, and place
> it in front of all dangerous loads.  I wish it were otherwise.

So can't you make the builtin expand to (in pseudo code):

	if (bounds_check)
	  {
	    __asm ("barrier");
	    result = *ptr;
	  }
	else
	  result = failval;

R.

> 
> Thanks,
> Bill
> 
>> On Jan 4, 2018, at 7:58 AM, Richard Earnshaw <richard.earnshaw@arm.com> wrote:
>>
>>
>> This patch adds generic support for the new builtin
>> __builtin_load_no_speculate.  It provides the overloading of the
>> different access sizes and a default fall-back expansion for targets
>> that do not support a mechanism for inhibiting speculation.
>>
>> 	* builtin_types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):
>> 	New builtin type signature.
>> 	(BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>> 	(BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>> 	(BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>> 	(BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>> 	* builtins.def (BUILT_IN_LOAD_NO_SPECULATE_N): New builtin.
>> 	(BUILT_IN_LOAD_NO_SPECULATE_1): Likewise.
>> 	(BUILT_IN_LOAD_NO_SPECULATE_2): Likewise.
>> 	(BUILT_IN_LOAD_NO_SPECULATE_4): Likewise.
>> 	(BUILT_IN_LOAD_NO_SPECULATE_8): Likewise.
>> 	(BUILT_IN_LOAD_NO_SPECULATE_16): Likewise.
>> 	* target.def (inhibit_load_speculation): New hook.
>> 	* doc/tm.texi.in (TARGET_INHIBIT_LOAD_SPECULATION): Add to
>> 	documentation.
>> 	* doc/tm.texi: Regenerated.
>> 	* doc/cpp.texi: Document __HAVE_LOAD_NO_SPECULATE.
>> 	* doc/extend.texi: Document __builtin_load_no_speculate.
>> 	* c-family/c-common.c (load_no_speculate_resolve_size): New function.
>> 	(load_no_speculate_resolve_params): New function.
>> 	(load_no_speculate_resolve_return): New function.
>> 	(resolve_overloaded_builtin): Handle overloading
>> 	__builtin_load_no_speculate.
>> 	* builtins.c (expand_load_no_speculate): New function.
>> 	(expand_builtin): Handle new no-speculation builtins.
>> 	* targhooks.h (default_inhibit_load_speculation): Declare.
>> 	* targhooks.c (default_inhibit_load_speculation): New function.
>> ---
>> gcc/builtin-types.def       |  16 +++++
>> gcc/builtins.c              |  99 ++++++++++++++++++++++++++
>> gcc/builtins.def            |  22 ++++++
>> gcc/c-family/c-common.c     | 164 ++++++++++++++++++++++++++++++++++++++++++++
>> gcc/c-family/c-cppbuiltin.c |   5 +-
>> gcc/doc/cpp.texi            |   4 ++
>> gcc/doc/extend.texi         |  53 ++++++++++++++
>> gcc/doc/tm.texi             |   6 ++
>> gcc/doc/tm.texi.in          |   2 +
>> gcc/target.def              |  20 ++++++
>> gcc/targhooks.c             |  69 +++++++++++++++++++
>> gcc/targhooks.h             |   3 +
>> 12 files changed, 462 insertions(+), 1 deletion(-)
>>
>> <0001-builtins-Generic-support-for-__builtin_load_no_specu.patch>
> 


* Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
  2018-01-08  6:55     ` Jeff Law
  2018-01-08 11:52       ` Richard Biener
@ 2018-01-08 14:23       ` Bill Schmidt
  2018-01-08 15:38         ` Richard Earnshaw (lists)
  2018-01-08 19:42         ` Jeff Law
  1 sibling, 2 replies; 70+ messages in thread
From: Bill Schmidt @ 2018-01-08 14:23 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Earnshaw, GCC Patches, Segher Boessenkool


> On Jan 7, 2018, at 10:47 PM, Jeff Law <law@redhat.com> wrote:
> 
> On 01/07/2018 07:20 PM, Bill Schmidt wrote:
>> Hi Richard,
>> 
>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't
>> have a way to force resolution of a condition prior to continuing speculation, so this
>> will just introduce another comparison that we would speculate past.  For our mitigation
>> we will have to introduce an instruction that halts all speculation at that point, and place
>> it in front of all dangerous loads.  I wish it were otherwise.
> So could you have an expander for __builtin_load_no_speculate that just
> emits the magic insn that halts all speculation and essentially ignores
> the additional stuff that __builtin_load_no_speculate might be able to
> do on other platforms?

This is possible, but the builtin documentation is completely misleading for
powerpc.  We will not provide the semantics that this builtin claims to provide.
So at a minimum we would need the documentation to indicate that the additional
range-checking is target-specific behavior as well, not just the speculation code.
At that point it isn't really a very target-neutral solution.

What about other targets?  This builtin seems predicated on specific behavior
of the ARM architecture; I don't know whether other targets have a guaranteed
speculation-rectifying conditional test.

For POWER, all we would need, or be able to exploit, is 

	void __builtin_speculation_barrier ()

or some such.  If there are two dangerous loads in one block, a single call
to this suffices, but a generic solution involving range checks for specific
loads would require one per load.
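[Editorial note: the barrier-style mitigation Bill describes can be sketched as below. The barrier here is a stand-in (an empty compiler-level asm), not the actual speculation-halting POWER instruction, which the thread does not name; the function and array names are invented.]

```c
#include <stddef.h>
#include <stdint.h>

static inline void
speculation_barrier (void)
{
  /* Stand-in only: a compiler barrier, not an architectural
     speculation barrier.  */
  __asm__ volatile ("" ::: "memory");
}

uint8_t a[32], b[32];

unsigned
sum_two (size_t i, size_t j, size_t limit)
{
  if (i < limit && j < limit)
    {
      /* A single barrier covers both dependent loads in this block,
         where a per-load guarded builtin would be used twice.  */
      speculation_barrier ();
      return a[i] + b[j];
    }
  return 0;
}
```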

Thanks,
Bill  

> 
> jeff
> 


* Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
  2018-01-08 14:23       ` Bill Schmidt
@ 2018-01-08 15:38         ` Richard Earnshaw (lists)
  2018-01-08 17:02           ` Bill Schmidt
  2018-01-08 19:42         ` Jeff Law
  1 sibling, 1 reply; 70+ messages in thread
From: Richard Earnshaw (lists) @ 2018-01-08 15:38 UTC (permalink / raw)
  To: Bill Schmidt, Jeff Law; +Cc: GCC Patches, Segher Boessenkool

On 08/01/18 14:19, Bill Schmidt wrote:
> 
>> On Jan 7, 2018, at 10:47 PM, Jeff Law <law@redhat.com> wrote:
>>
>> On 01/07/2018 07:20 PM, Bill Schmidt wrote:
>>> Hi Richard,
>>>
>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't
>>> have a way to force resolution of a condition prior to continuing speculation, so this
>>> will just introduce another comparison that we would speculate past.  For our mitigation
>>> we will have to introduce an instruction that halts all speculation at that point, and place
>>> it in front of all dangerous loads.  I wish it were otherwise.
>> So could you have an expander for __builtin_load_no_speculate that just
>> emits the magic insn that halts all speculation and essentially ignores
>> the additional stuff that __builtin_load_no_speculate might be able to
>> do on other platforms?
> 
> This is possible, but the builtin documentation is completely misleading for
> powerpc.  We will not provide the semantics that this builtin claims to provide.
> So at a minimum we would need the documentation to indicate that the additional
> range-checking is target-specific behavior as well, not just the speculation code.
> At that point it isn't really a very target-neutral solution.
> 
> What about other targets?  This builtin seems predicated on specific behavior
> of ARM architecture; I don't know whether other targets have a guaranteed
> speculation-rectifying conditional test.
> 
> For POWER, all we would need, or be able to exploit, is 
> 
> 	void __builtin_speculation_barrier ()
> 
> or some such.  If there are two dangerous loads in one block, a single call
> to this suffices, but a generic solution involving range checks for specific
> loads would require one per load.
> 

Do you have any data to suggest that multiple /independent/ vulnerable
accesses occur under a single guarding condition more often than 'once
in a blue moon'?  It seems to me that would be highly unlikely.


R.


* Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
  2018-01-08 14:19     ` Richard Earnshaw (lists)
@ 2018-01-08 16:07       ` Bill Schmidt
  2018-01-09 10:26         ` Richard Earnshaw (lists)
  2018-01-10 23:28         ` Jeff Law
  0 siblings, 2 replies; 70+ messages in thread
From: Bill Schmidt @ 2018-01-08 16:07 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: gcc-patches

On Jan 8, 2018, at 8:06 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
> 
> On 08/01/18 02:20, Bill Schmidt wrote:
>> Hi Richard,
>> 
>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't
>> have a way to force resolution of a condition prior to continuing speculation, so this
>> will just introduce another comparison that we would speculate past.  For our mitigation
>> we will have to introduce an instruction that halts all speculation at that point, and place
>> it in front of all dangerous loads.  I wish it were otherwise.
> 
> So can't you make the builtin expand to (in pseudo code):
> 
> 	if (bounds_check)
> 	  {
> 	    __asm ("barrier");
> 	    result = *ptr;
> 	  }
> 	else
> 	  result = failval;

Could, but this just generates unnecessary code for Power.  We would instead generate

	__asm ("barrier");
	result = *ptr;

without any checks.  We would ignore everything but the first argument.

Thanks,
Bill

> 
> R.
> 
>> 
>> Thanks,
>> Bill
>> 
>>> On Jan 4, 2018, at 7:58 AM, Richard Earnshaw <richard.earnshaw@arm.com> wrote:
>>> 
>>> 
>>> This patch adds generic support for the new builtin
>>> __builtin_load_no_speculate.  It provides the overloading of the
>>> different access sizes and a default fall-back expansion for targets
>>> that do not support a mechanism for inhibiting speculation.
>>> 
>>> 	* builtin_types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):
>>> 	New builtin type signature.
>>> 	(BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>> 	(BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>> 	(BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>> 	(BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>> 	* builtins.def (BUILT_IN_LOAD_NO_SPECULATE_N): New builtin.
>>> 	(BUILT_IN_LOAD_NO_SPECULATE_1): Likewise.
>>> 	(BUILT_IN_LOAD_NO_SPECULATE_2): Likewise.
>>> 	(BUILT_IN_LOAD_NO_SPECULATE_4): Likewise.
>>> 	(BUILT_IN_LOAD_NO_SPECULATE_8): Likewise.
>>> 	(BUILT_IN_LOAD_NO_SPECULATE_16): Likewise.
>>> 	* target.def (inhibit_load_speculation): New hook.
>>> 	* doc/tm.texi.in (TARGET_INHIBIT_LOAD_SPECULATION): Add to
>>> 	documentation.
>>> 	* doc/tm.texi: Regenerated.
>>> 	* doc/cpp.texi: Document __HAVE_LOAD_NO_SPECULATE.
>>> 	* doc/extend.texi: Document __builtin_load_no_speculate.
>>> 	* c-family/c-common.c (load_no_speculate_resolve_size): New function.
>>> 	(load_no_speculate_resolve_params): New function.
>>> 	(load_no_speculate_resolve_return): New function.
>>> 	(resolve_overloaded_builtin): Handle overloading
>>> 	__builtin_load_no_speculate.
>>> 	* builtins.c (expand_load_no_speculate): New function.
>>> 	(expand_builtin): Handle new no-speculation builtins.
>>> 	* targhooks.h (default_inhibit_load_speculation): Declare.
>>> 	* targhooks.c (default_inhibit_load_speculation): New function.
>>> ---
>>> gcc/builtin-types.def       |  16 +++++
>>> gcc/builtins.c              |  99 ++++++++++++++++++++++++++
>>> gcc/builtins.def            |  22 ++++++
>>> gcc/c-family/c-common.c     | 164 ++++++++++++++++++++++++++++++++++++++++++++
>>> gcc/c-family/c-cppbuiltin.c |   5 +-
>>> gcc/doc/cpp.texi            |   4 ++
>>> gcc/doc/extend.texi         |  53 ++++++++++++++
>>> gcc/doc/tm.texi             |   6 ++
>>> gcc/doc/tm.texi.in          |   2 +
>>> gcc/target.def              |  20 ++++++
>>> gcc/targhooks.c             |  69 +++++++++++++++++++
>>> gcc/targhooks.h             |   3 +
>>> 12 files changed, 462 insertions(+), 1 deletion(-)
>>> 
>>> <0001-builtins-Generic-support-for-__builtin_load_no_specu.patch>
>> 
> 


* Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
  2018-01-08 15:38         ` Richard Earnshaw (lists)
@ 2018-01-08 17:02           ` Bill Schmidt
  0 siblings, 0 replies; 70+ messages in thread
From: Bill Schmidt @ 2018-01-08 17:02 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: Jeff Law, GCC Patches, Segher Boessenkool

On Jan 8, 2018, at 9:23 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
> 
> On 08/01/18 14:19, Bill Schmidt wrote:
>> 
>>> On Jan 7, 2018, at 10:47 PM, Jeff Law <law@redhat.com> wrote:
>>> 
>>> On 01/07/2018 07:20 PM, Bill Schmidt wrote:
>>>> Hi Richard,
>>>> 
>>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't
>>>> have a way to force resolution of a condition prior to continuing speculation, so this
>>>> will just introduce another comparison that we would speculate past.  For our mitigation
>>>> we will have to introduce an instruction that halts all speculation at that point, and place
>>>> it in front of all dangerous loads.  I wish it were otherwise.
>>> So could you have an expander for __builtin_load_no_speculate that just
>>> emits the magic insn that halts all speculation and essentially ignores
>>> the additional stuff that __builtin_load_no_speculate might be able to
>>> do on other platforms?
>> 
>> This is possible, but the builtin documentation is completely misleading for
>> powerpc.  We will not provide the semantics that this builtin claims to provide.
>> So at a minimum we would need the documentation to indicate that the additional
>> range-checking is target-specific behavior as well, not just the speculation code.
>> At that point it isn't really a very target-neutral solution.
>> 
>> What about other targets?  This builtin seems predicated on specific behavior
>> of ARM architecture; I don't know whether other targets have a guaranteed
>> speculation-rectifying conditional test.
>> 
>> For POWER, all we would need, or be able to exploit, is 
>> 
>> 	void __builtin_speculation_barrier ()
>> 
>> or some such.  If there are two dangerous loads in one block, a single call
>> to this suffices, but a generic solution involving range checks for specific
>> loads would require one per load.
>> 
> 
> Do you have any data to suggest that multiple /independent/ vulnerable
> accesses occur under a single guarding condition more often than 'once
> in a blue moon'?  It seems to me that would be highly unlikely.

No, I agree with that.

This is more thinking ahead to the problem of trying to identify these
cases automatically.  Anything we do there is going to be necessarily
conservative, so more loads may look dangerous than a human user would
identify.  But for something like that you probably aren't looking at
using __builtin_load_no_speculate anyhow.

Thanks,
Bill

> 
> 
> R.


* Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
  2018-01-08 14:23       ` Bill Schmidt
  2018-01-08 15:38         ` Richard Earnshaw (lists)
@ 2018-01-08 19:42         ` Jeff Law
  2018-01-08 21:08           ` Bill Schmidt
  1 sibling, 1 reply; 70+ messages in thread
From: Jeff Law @ 2018-01-08 19:42 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: Richard Earnshaw, GCC Patches, Segher Boessenkool

On 01/08/2018 07:19 AM, Bill Schmidt wrote:
> 
>> On Jan 7, 2018, at 10:47 PM, Jeff Law <law@redhat.com> wrote:
>>
>> On 01/07/2018 07:20 PM, Bill Schmidt wrote:
>>> Hi Richard,
>>>
>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't
>>> have a way to force resolution of a condition prior to continuing speculation, so this
>>> will just introduce another comparison that we would speculate past.  For our mitigation
>>> we will have to introduce an instruction that halts all speculation at that point, and place
>>> it in front of all dangerous loads.  I wish it were otherwise.
>> So could you have an expander for __builtin_load_no_speculate that just
>> emits the magic insn that halts all speculation and essentially ignores
>> the additional stuff that __builtin_load_no_speculate might be able to
>> do on other platforms?
> 
> This is possible, but the builtin documentation is completely misleading for
> powerpc.  We will not provide the semantics that this builtin claims to provide.
> So at a minimum we would need the documentation to indicate that the additional
> range-checking is target-specific behavior as well, not just the speculation code.
> At that point it isn't really a very target-neutral solution.
> 
> What about other targets?  This builtin seems predicated on specific behavior
> of ARM architecture; I don't know whether other targets have a guaranteed
> speculation-rectifying conditional test.
> 
> For POWER, all we would need, or be able to exploit, is 
> 
> 	void __builtin_speculation_barrier ()
> 
> or some such.  If there are two dangerous loads in one block, a single call
> to this suffices, but a generic solution involving range checks for specific
> loads would require one per load.
So my concern is that if we have multiple builtins to deal with this
problem, then we're forcing the pain of figuring out which one to use
onto the users.

I'd really like there to be a single builtin to address this problem.
Otherwise the kernel (and anyone else that wants to use this stuff) is
stuck with using both, conditional compilation or something along those
lines which seems like a huge lose.
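To make the burden concrete, here is a sketch of the per-target wrapper
users would otherwise have to maintain.  Everything except the portable
fallback branch is hypothetical: the HAVE_* macros and the non-fallback
builtins are invented purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-target dispatch: exactly the kind of conditional
   compilation a single generic builtin would avoid.  */
#if defined(HAVE_LOAD_NO_SPECULATE)        /* hypothetical macro */
# define GUARDED_LOAD(p, lo, hi, fail) \
    __builtin_load_no_speculate ((p), (lo), (hi), (fail))
#elif defined(HAVE_SPECULATION_BARRIER)    /* hypothetical macro */
# define GUARDED_LOAD(p, lo, hi, fail) \
    ({ __builtin_speculation_barrier (); *(p); })
#else  /* Portable fallback: logical behaviour only, no protection.  */
# define GUARDED_LOAD(p, lo, hi, fail)              \
    (((const char *) (p) >= (const char *) (lo)     \
      && (const char *) (p) < (const char *) (hi))  \
     ? *(p) : (fail))
#endif

static int data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
```

A single generic builtin lets the user write the GUARDED_LOAD call sites
once, with the target-specific sequence chosen by the compiler instead.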

We'd really like them to be able to add one appropriate
__builtin_whatever call at the key site(s) that does the right thing
regardless of the architecture.

I think that implies we're likely to have arguments that are unused on
some architectures.  I can live with that.  But it also implies we need
better language around the semantics.

As you mention -- there's some belief that we're going to want to do
something for automatic detection in the future.  I think a builtin for
that could well be different from what we provide to the kernel folks in
the immediate term.  I want to focus first on getting a builtin the
kernel guys can use today as needed, though.

Jeff


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
  2018-01-08 19:42         ` Jeff Law
@ 2018-01-08 21:08           ` Bill Schmidt
  2018-01-10 23:48             ` Jeff Law
  0 siblings, 1 reply; 70+ messages in thread
From: Bill Schmidt @ 2018-01-08 21:08 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Earnshaw, GCC Patches, Segher Boessenkool

On Jan 8, 2018, at 1:40 PM, Jeff Law <law@redhat.com> wrote:
> 
> On 01/08/2018 07:19 AM, Bill Schmidt wrote:
>> 
>>> On Jan 7, 2018, at 10:47 PM, Jeff Law <law@redhat.com> wrote:
>>> 
>>> On 01/07/2018 07:20 PM, Bill Schmidt wrote:
>>>> Hi Richard,
>>>> 
>>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't
>>>> have a way to force resolution of a condition prior to continuing speculation, so this
>>>> will just introduce another comparison that we would speculate past.  For our mitigation
>>>> we will have to introduce an instruction that halts all speculation at that point, and place
>>>> it in front of all dangerous loads.  I wish it were otherwise.
>>> So could you have an expander for __builtin_load_no_speculate that just
>>> emits the magic insn that halts all speculation and essentially ignores
>>> the additional stuff that __builtin_load_no_speculate might be able to
>>> do on other platforms?
>> 
>> This is possible, but the builtin documentation is completely misleading for
>> powerpc.  We will not provide the semantics that this builtin claims to provide.
>> So at a minimum we would need the documentation to indicate that the additional
>> range-checking is target-specific behavior as well, not just the speculation code.
>> At that point it isn't really a very target-neutral solution.
>> 
>> What about other targets?  This builtin seems predicated on specific behavior
>> of ARM architecture; I don't know whether other targets have a guaranteed
>> speculation-rectifying conditional test.
>> 
>> For POWER, all we would need, or be able to exploit, is 
>> 
>> 	void __builtin_speculation_barrier ()
>> 
>> or some such.  If there are two dangerous loads in one block, a single call
>> to this suffices, but a generic solution involving range checks for specific
>> loads would require one per load.
> So my concern is that if we have multiple builtins to deal with this
> problem, then we're forcing the pain of figuring out which one to use
> onto the users.
> 
> I'd really like there to be a single builtin to address this problem.
> Otherwise the kernel (and anyone else that wants to use this stuff) is
> stuck with using both, conditional compilation or something along those
> lines which seems like a huge lose.
> 
> We'd really like them to be able to add one appropriate
> __builtin_whatever call at the key site(s) that does the right thing
> regardless of the architecture.
> 
> I think that implies we're likely to have arguments that are unused on
> some architectures.  I can live with that.  But it also implies we need
> better language around the semantics.
> 
> As you mention -- there's some belief that we're going to want to do
> something for automatic detection in the future.  I think a builtin for
> that could well be different from what we provide to the kernel folks in
> the immediate term.  I want to focus first on getting a builtin the
> kernel guys can use today as needed, though.

Hi Jeff,

I agree 100% with this approach.  I just wanted to raise the point in case
other architectures have different needs.  Power can work around this
by just ignoring 4 of the 5 arguments.  As long as nobody else needs
*additional* arguments, this should work out just fine.  But I want to be clear
that the only guarantee of the semantics for everybody is that "speculation 
stops here," while on some processors it may be "speculation stops here
if out of range."  If we can write this into the documentation, then I'm fine
writing a target expander for Power as discussed.

I had a brief interchange with Richi last week, and he suggested that for
the automatic detection we might look into flagging MEM_REFs rather
than inserting a built-in; a target hook can still handle such a flag.  That
has some advantages and some disadvantages that I can think of, so
we'll have to talk that out on the list over time after we get through the
crisis mode reactions.

Thanks!

Bill

> 
> Jeff

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
  2018-01-08 16:07       ` Bill Schmidt
@ 2018-01-09 10:26         ` Richard Earnshaw (lists)
  2018-01-09 17:11           ` Bill Schmidt
  2018-01-10 23:28         ` Jeff Law
  1 sibling, 1 reply; 70+ messages in thread
From: Richard Earnshaw (lists) @ 2018-01-09 10:26 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches

On 08/01/18 16:01, Bill Schmidt wrote:
> On Jan 8, 2018, at 8:06 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
>>
>> On 08/01/18 02:20, Bill Schmidt wrote:
>>> Hi Richard,
>>>
>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't
>>> have a way to force resolution of a condition prior to continuing speculation, so this
>>> will just introduce another comparison that we would speculate past.  For our mitigation
>>> we will have to introduce an instruction that halts all speculation at that point, and place
>>> it in front of all dangerous loads.  I wish it were otherwise.
>>
>> So can't you make the builtin expand to (in pseudo code):
>>
>> 	if (bounds_check)
>> 	  {
>> 	    __asm ("barrier");
>> 	    result = *ptr;
>>          }
>>        else
>> 	  result = failval;
> 
> Could, but this just generates unnecessary code for Power.  We would instead generate
> 
> 	__asm ("barrier");
> 	result = *ptr;
> 
> without any checks.  We would ignore everything but the first argument.

You can't do that with the builtin as it is currently specified, since
it also has defined behaviour for the result it returns.  You can,
however, expand the code as normal RTL and let the optimizers remove any
redundant code if they can make that deduction and you don't need the
additional behaviour.

R.

> 
> Thanks,
> Bill
> 
>>
>> R.
>>
>>>
>>> Thanks,
>>> Bill
>>>
>>>> On Jan 4, 2018, at 7:58 AM, Richard Earnshaw <richard.earnshaw@arm.com> wrote:
>>>>
>>>>
>>>> This patch adds generic support for the new builtin
>>>> __builtin_load_no_speculate.  It provides the overloading of the
>>>> different access sizes and a default fall-back expansion for targets
>>>> that do not support a mechanism for inhibiting speculation.
>>>>
>>>> 	* builtin_types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):
>>>> 	New builtin type signature.
>>>> 	(BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>>> 	(BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>>> 	(BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>>> 	(BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>>> 	* builtins.def (BUILT_IN_LOAD_NO_SPECULATE_N): New builtin.
>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_1): Likewise.
>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_2): Likewise.
>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_4): Likewise.
>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_8): Likewise.
>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_16): Likewise.
>>>> 	* target.def (inhibit_load_speculation): New hook.
>>>> 	* doc/tm.texi.in (TARGET_INHIBIT_LOAD_SPECULATION): Add to
>>>> 	documentation.
>>>> 	* doc/tm.texi: Regenerated.
>>>> 	* doc/cpp.texi: Document __HAVE_LOAD_NO_SPECULATE.
>>>> 	* doc/extend.texi: Document __builtin_load_no_speculate.
>>>> 	* c-family/c-common.c (load_no_speculate_resolve_size): New function.
>>>> 	(load_no_speculate_resolve_params): New function.
>>>> 	(load_no_speculate_resolve_return): New function.
>>>> 	(resolve_overloaded_builtin): Handle overloading
>>>> 	__builtin_load_no_speculate.
>>>> 	* builtins.c (expand_load_no_speculate): New function.
>>>> 	(expand_builtin): Handle new no-speculation builtins.
>>>> 	* targhooks.h (default_inhibit_load_speculation): Declare.
>>>> 	* targhooks.c (default_inhibit_load_speculation): New function.
>>>> ---
>>>> gcc/builtin-types.def       |  16 +++++
>>>> gcc/builtins.c              |  99 ++++++++++++++++++++++++++
>>>> gcc/builtins.def            |  22 ++++++
>>>> gcc/c-family/c-common.c     | 164 ++++++++++++++++++++++++++++++++++++++++++++
>>>> gcc/c-family/c-cppbuiltin.c |   5 +-
>>>> gcc/doc/cpp.texi            |   4 ++
>>>> gcc/doc/extend.texi         |  53 ++++++++++++++
>>>> gcc/doc/tm.texi             |   6 ++
>>>> gcc/doc/tm.texi.in          |   2 +
>>>> gcc/target.def              |  20 ++++++
>>>> gcc/targhooks.c             |  69 +++++++++++++++++++
>>>> gcc/targhooks.h             |   3 +
>>>> 12 files changed, 462 insertions(+), 1 deletion(-)
>>>>
>>>> <0001-builtins-Generic-support-for-__builtin_load_no_specu.patch>
>>>
>>
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-05 13:08     ` Alexander Monakov
@ 2018-01-09 10:47       ` Richard Earnshaw (lists)
  2018-01-09 13:41         ` Alexander Monakov
  2018-01-10 23:48         ` Jeff Law
  0 siblings, 2 replies; 70+ messages in thread
From: Richard Earnshaw (lists) @ 2018-01-09 10:47 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-patches

On 05/01/18 13:08, Alexander Monakov wrote:
> On Fri, 5 Jan 2018, Richard Earnshaw (lists) wrote:
>> This is quite tricky.  For ARM we have to have a speculated load.
> 
> Sorry, I don't follow. On ARM, it is surprising that the CSEL-CSDB-LDR sequence
> wouldn't work (applying CSEL to the address rather than loaded value), and
> if it wouldn't, then ARM-specific lowering of the builtin can handle that
> anyhow, right? (by spilling the pointer)

The load has to feed /in/ to the csel/csdb sequence, not come after it.

> 
> (on x86 Intel's current recommendation is to emit LFENCE prior to the load)

That can be supported in the way you expand the builtin.  The builtin
expander is given a (MEM (ptr)), but it's up to the back-end where to
put that in the expanded sequence to materialize the load, so you could
write (sorry, don't know x86 asm very well, but I think this is how
you'd put it)

	lfence
	mov	(ptr), dest

with branches around that as appropriate to support the remainder of the
builtin's behaviour.
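A hedged sketch of that expansion, written as GCC extended inline asm
rather than an expander (the x86-64 branch shows the lfence-then-load
sequence; the fallback branch exists only so the sketch stays compilable
elsewhere, and the real placement would of course be up to the
back-end):

```c
#include <assert.h>

/* Sketch only: serialize with LFENCE, then perform the load, as in the
   sequence above.  */
static long
load_after_lfence (const long *ptr)
{
  long dest;
#if defined(__x86_64__)
  __asm__ volatile ("lfence\n\t"
                    "movq (%1), %0"
                    : "=r" (dest)
                    : "r" (ptr)
                    : "memory");
#else
  dest = *ptr;   /* portable fallback; no speculation guarantee */
#endif
  return dest;
}
```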

> Is the main issue expressing the CSEL condition in the source code? Perhaps it is
> possible to introduce
> 
>   int guard = __builtin_nontransparent(predicate);
> 
>   if (predicate)
>     foo = __builtin_load_no_speculate(&arr[addr], guard);
> 
> ... or maybe even
> 
>   if (predicate)
>     foo = arr[__builtin_loadspecbarrier(addr, guard)];
> 
> where internally __builtin_nontransparent is the same as
> 
>   guard = predicate;
>   asm volatile("" : "+g"(guard));
> 
> although admittedly this is not perfect since it forces evaluation of 'guard'
> before the branch.

As I explained to Bernd last night, I think this is likely to be unsafe.
If there's some control path before __builtin_nontransparent that allows
'predicate' to be simplified (e.g. by value range propagation), then your
guard doesn't protect against the speculation that you think it does.
Changing all the optimizers to guarantee that wouldn't happen (and
guaranteeing that all future optimizers won't introduce new problems of
that nature) is, I suspect, very non-trivial.
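A minimal, hypothetical sketch of that hazard: once earlier control flow
lets the compiler prove the predicate, the opaque asm copy guards a
compile-time constant rather than the original bounds check.  (The
architectural result is still correct, which is all a functional test
can observe; the problem would only show up in speculative behaviour.)

```c
#include <assert.h>
#include <stddef.h>

static int arr[16] = {0, 1, 2, 3, 4, 5, 6, 7,
                      8, 9, 10, 11, 12, 13, 14, 15};

/* Sketch only: after the early return, value-range propagation knows
   idx < 16, so 'predicate' can fold to a constant 1.  The asm below
   then merely copies a constant, and the branch it feeds no longer
   corresponds to the condition the hardware speculates on.  */
static int
read_guarded (size_t idx)
{
  if (idx >= 16)
    return 0;

  int predicate = idx < 16;              /* provably true here */
  int guard = predicate;
  __asm__ volatile ("" : "+g" (guard));  /* opaque copy of the predicate */

  if (guard)
    return arr[idx];
  return 0;
}
```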

R.

> 
> Alexander
> 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-05 17:24 ` Jeff Law
@ 2018-01-09 11:44   ` Richard Earnshaw (lists)
  0 siblings, 0 replies; 70+ messages in thread
From: Richard Earnshaw (lists) @ 2018-01-09 11:44 UTC (permalink / raw)
  To: Jeff Law, gcc-patches

On 05/01/18 17:24, Jeff Law wrote:
> On 01/04/2018 06:58 AM, Richard Earnshaw wrote:
>>
>> Recently, Google Project Zero disclosed several classes of attack
>> against speculative execution. One of these, known as variant-1
>> (CVE-2017-5753), allows explicit bounds checks to be bypassed under
>> speculation, providing an arbitrary read gadget. Further details can
>> be found on the GPZ blog [1] and the documentation that is included
>> with the first patch.
> So I think it's important for anyone reading this stuff to read
> Richard's paragraph carefully --  "an arbitrary read gadget".
> 
> I fully expect that over the course of time we're going to see other
> arbitrary read gadgets to be found.  Those gadgets may have lower
> bandwidth, have a higher degree of jitter or be harder to exploit, but
> they're out there just waiting to be discovered.
> 
> So I don't expect this to be the only mitigation we have to put into place.
> 
> Anyway...
> 
> 
>>
>> Some optimizations are permitted to make the builtin easier to use.
>> The final two arguments can both be omitted (c++ style): failval will
>> default to 0 in this case and if cmpptr is omitted ptr will be used
>> for expansions of the range check.  In addition either lower or upper
>> (but not both) may be a literal NULL and the expansion will then
>> ignore that boundary condition when expanding.
> So what are the cases where FAILVAL is useful rather than just using
> zero (or some other constant) all the time?

So some background first.  My initial attempts to define a builtin were
based entirely around trying to specify it without any hard functional
behaviour.  The specification was a mess.  Things
just became so much easier to specify when I moved to defining a logical
behaviour on top of the intended side effects on speculation.  Having
done that it seemed sensible to permit the user to use the builtin in
more creative ways by allowing it to select the failure value.  The idea
was that code such as

  if (ptr >= base && ptr < limit) // bounds check
    return *ptr;
  return FAILVAL;

could simply be rewritten as

  return __builtin_load_no_speculate (ptr, base, limit, FAILVAL);

and now you don't have to worry about writing the condition out twice or
any other such nonsense.  In this case the builtin does exactly what you
want it to do.  (It's also easier now to write test cases that check the
correctness of the builtin's expansion, since you can test for a
behaviour of the code's execution, even if you can't check the
speculation behaviour properly.)
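For reference, the logical (architectural) behaviour being described can
be modelled in plain C.  This is only a sketch of the result the builtin
computes; the function name is illustrative, and it provides none of the
speculation guarantees:

```c
#include <assert.h>
#include <stddef.h>

/* Model of the builtin's logical behaviour for int loads (illustrative
   only; no speculation barrier is implied).  */
static int
load_no_speculate_model (const int *ptr, const void *lower,
                         const void *upper, int failval)
{
  const char *p = (const char *) ptr;
  if (p >= (const char *) lower && p < (const char *) upper)
    return *ptr;
  return failval;
}

/* The bounds-checked pattern above, rewritten as a single call so the
   condition is not written out twice.  */
static int
lookup (const int *base, size_t n, const int *ptr, int failval)
{
  return load_no_speculate_model (ptr, base, base + n, failval);
}

static int table[4] = {10, 20, 30, 40};
```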



> 
> Similarly under what conditions would one want to use CMPPTR rather than
> PTR?

This was at the request of some kernel folk.  They have some cases where
CMPPTR may be a pointer to an atomic type and want to do something like

  if (cmpptr >= lower && cmpptr < upper)
    val = __atomic_read_and_inc (cmpptr);

The atomics are complicated enough already and rewriting them all to
additionally inhibit speculation based on the result would be a
nightmare.  By separating out cmpptr from ptr you can now write

  if (cmpptr >= lower && cmpptr < upper)
    {
      TYPE tmp_val = __atomic_read_and_inc (cmpptr);
      val = builtin_load_no_speculate (&tmp_val, lower, upper, 0,
                                       cmpptr);
    }

It's meaningless in this case to check the bounds of tmp_val - it's just
a value pushed onto the stack in order to satisfy the requirement that
we need a load that is being speculatively executed; but we can still
test against the speculation conditions, which are still the range check
for cmpptr.

There may be other similar cases as well where you have an object that
you want to clean up that is somehow derived from cmpptr but is not
cmpptr itself.  You do have to be more careful with the extended form,
since it is possible to write sequences that don't inhibit speculation
in the way you might think they do, but with greater flexibility also
comes greater risk.  I don't think there's ever a problem if ptr is
somehow derived from dereferencing cmpptr.

R.

> 
> I wandered down through the LKML thread but didn't see anything which
> would shed light on those two questions.
> 
> jeff
>>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-09 10:47       ` Richard Earnshaw (lists)
@ 2018-01-09 13:41         ` Alexander Monakov
  2018-01-09 13:50           ` Richard Earnshaw (lists)
  2018-01-10 23:48         ` Jeff Law
  1 sibling, 1 reply; 70+ messages in thread
From: Alexander Monakov @ 2018-01-09 13:41 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: gcc-patches

On Tue, 9 Jan 2018, Richard Earnshaw (lists) wrote:
> > Sorry, I don't follow. On ARM, it is surprising that the CSEL-CSDB-LDR sequence
> > wouldn't work (applying CSEL to the address rather than loaded value), and
> > if it wouldn't, then ARM-specific lowering of the builtin can handle that
> > anyhow, right? (by spilling the pointer)
> 
> The load has to feed /in/ to the csel/csdb sequence, not come after it.

Again, I'm sorry, but I have to insist that what you're saying here contradicts
the documentation linked from https://developer.arm.com/support/security-update
The PDF currently says, in "Details of the CSDB barrier":

    Until the barrier completes:
    1) For any load, store, data or instruction preload, RW2, appearing in
    program order *after the barrier* [...]

    2) For any indirect branch (B2), appearing in program order
    *after the barrier* [...]

    [...] the speculative execution of RW2/B2 does not influence the
    allocations of entries in a cache [...]

It doesn't say anything about the behavior of CSDB being dependent on the loads
encountered prior to it.  (and imho it doesn't make sense for a hardware
implementation to work that way)

>> As I explained to Bernd last night, I think this is likely to be unsafe.
>> If there's some control path before __builtin_nontransparent that allows
>> 'predicate' to be simplified (e.g. by value range propagation), then your
> guard doesn't protect against the speculation that you think it does.
> Changing all the optimizers to guarantee that wouldn't happen (and
> guaranteeing that all future optimizers won't introduce new problems of
> that nature) is, I suspect, very non-trivial.

But note that in that case the compiler could have performed the same
simplification in the original code as well, emitting straight-line machine code
lacking speculatively executable parts in the first place.

Alexander

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-09 13:41         ` Alexander Monakov
@ 2018-01-09 13:50           ` Richard Earnshaw (lists)
  2018-01-09 14:37             ` Alexander Monakov
  0 siblings, 1 reply; 70+ messages in thread
From: Richard Earnshaw (lists) @ 2018-01-09 13:50 UTC (permalink / raw)
  To: Alexander Monakov; +Cc: gcc-patches

On 09/01/18 13:27, Alexander Monakov wrote:
> On Tue, 9 Jan 2018, Richard Earnshaw (lists) wrote:
>> > Sorry, I don't follow. On ARM, it is surprising that the CSEL-CSDB-LDR sequence
>> > wouldn't work (applying CSEL to the address rather than loaded value), and
>> > if it wouldn't, then ARM-specific lowering of the builtin can handle that
>> > anyhow, right? (by spilling the pointer)
>> 
>> The load has to feed /in/ to the csel/csdb sequence, not come after it.
> 
> Again, I'm sorry, but I have to insist that what you're saying here
> contradicts
> the documentation linked from
> https://developer.arm.com/support/security-update
> The PDF currently says, in "Details of the CSDB barrier":
> 
>     Until the barrier completes:
>     1) For any load, store, data or instruction preload, RW2, appearing in
>     program order *after the barrier* [...]
> 
>     2) For any indirect branch (B2), appearing in program order
>     *after the barrier* [...]
> 
>     [...] the speculative execution of RW2/B2 does not influence the
>     allocations of entries in a cache [...]
> 
> It doesn't say anything about the behavior of CSDB being dependent on
> the loads
> encountered prior to it.  (and imho it doesn't make sense for a hardware
> implementation to work that way)

Read condition 1 i) again.


i) the conditional select instruction has a register data dependency on
a load R1, that has been executed speculatively, for one of its input
registers, and

To be executed speculatively it must be *after* a predicted operation
that has not yet resolved.  Once it has resolved it is no longer
speculative; it's committed (one way or another).

R.

> 
>> As I explained to Bernd last night, I think this is likely to be unsafe.
>> If there's some control path before __builtin_nontransparent that allows
>> 'predicate' to be simplified (e.g. by value range propagation), then your
>> guard doesn't protect against the speculation that you think it does.
>> Changing all the optimizers to guarantee that wouldn't happen (and
>> guaranteeing that all future optimizers won't introduce new problems of
>> that nature) is, I suspect, very non-trivial.
> 
> But note that in that case the compiler could have performed the same
> simplification in the original code as well, emitting straight-line
> machine code
> lacking speculatively executable parts in the first place.
> 
> Alexander

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-09 13:50           ` Richard Earnshaw (lists)
@ 2018-01-09 14:37             ` Alexander Monakov
  0 siblings, 0 replies; 70+ messages in thread
From: Alexander Monakov @ 2018-01-09 14:37 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: gcc-patches

On Tue, 9 Jan 2018, Richard Earnshaw (lists) wrote:
> Read condition 1 i) again.
> 
> 
> i) the conditional select instruction has a register data dependency on
> a load R1, that has been executed speculatively, for one of its input
> registers, and

Sorry - I don't know how I missed that. I understand the motivation for the
behavior of the new built-in now; the CSDB specification is surprising, but
that is off-topic I guess. Seems quite unfortunate that CSDB is constraining
the generic built-in like this, though.

Thank you for clearing this up.

I suggest that the user documentation in extend.texi should explicitly call out
that the load itself still may be speculatively executed - only its consumers
can be expected to be fenced off. For example, after

+but in addition target-specific code will be inserted to ensure that
+speculation using @code{*ptr} cannot occur when @var{cmpptr} lies outside of
+the specified bounds.

it can go on to say e.g.,

  However note that the load of @code{*ptr} itself and the conditional branch
  leading to it may still be subject to speculative execution (and the load may
  have observable effects on the cache as a result).

Thanks.
Alexander

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-05 17:26   ` Jeff Law
@ 2018-01-09 15:30     ` Richard Earnshaw (lists)
  2018-01-11  0:13       ` Jeff Law
  0 siblings, 1 reply; 70+ messages in thread
From: Richard Earnshaw (lists) @ 2018-01-09 15:30 UTC (permalink / raw)
  To: Jeff Law, Joseph Myers; +Cc: gcc-patches

On 05/01/18 17:26, Jeff Law wrote:
> On 01/04/2018 11:20 AM, Joseph Myers wrote:
>> On Thu, 4 Jan 2018, Richard Earnshaw wrote:
>>
>>> 1 - generic modifications to GCC providing the builtin function for all
>>>     architectures and expanding to an implementation that gives the
>>>     logical behaviour of the builtin only.  A warning is generated if
>>>     this expansion path is used that code will execute correctly but
>>>     without providing protection against speculative use.
>>
>> Presumably it would make sense to have a standard way for architectures 
>> with no speculative execution to just use the generic version, but without 
>> the warning?  E.g., split up default_inhibit_load_speculation so that it 
>> generates the warning then calls another function with the same prototype, 
>> so such architectures can just define the hook to use that other function?
>>
> Seems wise.  There's still architectures that don't speculate or don't
> speculate enough to trigger these problems.
> 
> Jeff
> 

I'm in two minds about that.  There are certainly cases where you might
want to use the generic expansion as part of handling a case where you
have a more standard speculation barrier; in those cases you might want
to emit your barrier and then let the generic code expand and provide
the logical behaviour of the builtin.

However, if you don't have a barrier you need to ask yourself: will my
architecture ever have an implementation that does speculate?  Unless
you can be certain that it won't, you probably need to be warned that
some code the programmer thinks might be vulnerable to Spectre has not
been compiled with that protection; otherwise, if you run that code on a
later implementation that does speculate, it might be vulnerable.

Let me give an example: we use the generic code expansion if we
encounter the builtin when generating code for Thumb on pre-ARMv7
devices.  We don't have instructions in 'thumb1' to guard against
speculation, and we really want to warn users that they've done this (it
might be a mistake in how they're invoking the compiler).

I could add an extra parameter to the helper (bool warn_unimplemented),
which would be true if called directly from the expanders; back-ends
could then implement a trivial variant that just suppresses the warning.
They could also then use the generic expansion if all that was needed
was a subsequent fence instruction.

R.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
  2018-01-09 10:26         ` Richard Earnshaw (lists)
@ 2018-01-09 17:11           ` Bill Schmidt
  2018-01-10 23:36             ` Jeff Law
  0 siblings, 1 reply; 70+ messages in thread
From: Bill Schmidt @ 2018-01-09 17:11 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: gcc-patches

On Jan 9, 2018, at 4:21 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
> 
> On 08/01/18 16:01, Bill Schmidt wrote:
>> On Jan 8, 2018, at 8:06 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
>>> 
>>> On 08/01/18 02:20, Bill Schmidt wrote:
>>>> Hi Richard,
>>>> 
>>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't
>>>> have a way to force resolution of a condition prior to continuing speculation, so this
>>>> will just introduce another comparison that we would speculate past.  For our mitigation
>>>> we will have to introduce an instruction that halts all speculation at that point, and place
>>>> it in front of all dangerous loads.  I wish it were otherwise.
>>> 
>>> So can't you make the builtin expand to (in pseudo code):
>>> 
>>> 	if (bounds_check)
>>> 	  {
>>> 	    __asm ("barrier");
>>> 	    result = *ptr;
>>>         }
>>>       else
>>> 	  result = failval;
>> 
>> Could, but this just generates unnecessary code for Power.  We would instead generate
>> 
>> 	__asm ("barrier");
>> 	result = *ptr;
>> 
>> without any checks.  We would ignore everything but the first argument.
> 
> You can't do that with the builtin as it is currently specified, since
> it also has defined behaviour for the result it returns.  You can,
> however, expand the code as normal RTL and let the optimizers remove any
> redundant code if they can make that deduction and you don't need the
> additional behaviour.

But that's my original point.  "As currently specified" is overspecified for
our architecture, and expanding the extra code *hoping* it will go away
is not something I feel we should do.  If our hopes are dashed, we end up
with yet worse performance.  If we're going to use something generic
for everyone, then I argue that the semantics may not be the same for
all targets, and that this needs to be specified in the documentation.

I'm just looking for a solution that works for everyone.

Thanks,
Bill

> 
> R.
> 
>> 
>> Thanks,
>> Bill
>> 
>>> 
>>> R.
>>> 
>>>> 
>>>> Thanks,
>>>> Bill
>>>> 
>>>>> On Jan 4, 2018, at 7:58 AM, Richard Earnshaw <richard.earnshaw@arm.com> wrote:
>>>>> 
>>>>> 
>>>>> This patch adds generic support for the new builtin
>>>>> __builtin_load_no_speculate.  It provides the overloading of the
>>>>> different access sizes and a default fall-back expansion for targets
>>>>> that do not support a mechanism for inhibiting speculation.
>>>>> 
>>>>> 	* builtin_types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):
>>>>> 	New builtin type signature.
>>>>> 	(BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>>>> 	(BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>>>> 	(BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>>>> 	(BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>>>> 	* builtins.def (BUILT_IN_LOAD_NO_SPECULATE_N): New builtin.
>>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_1): Likewise.
>>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_2): Likewise.
>>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_4): Likewise.
>>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_8): Likewise.
>>>>> 	(BUILT_IN_LOAD_NO_SPECULATE_16): Likewise.
>>>>> 	* target.def (inhibit_load_speculation): New hook.
>>>>> 	* doc/tm.texi.in (TARGET_INHIBIT_LOAD_SPECULATION): Add to
>>>>> 	documentation.
>>>>> 	* doc/tm.texi: Regenerated.
>>>>> 	* doc/cpp.texi: Document __HAVE_LOAD_NO_SPECULATE.
>>>>> 	* doc/extend.texi: Document __builtin_load_no_speculate.
>>>>> 	* c-family/c-common.c (load_no_speculate_resolve_size): New function.
>>>>> 	(load_no_speculate_resolve_params): New function.
>>>>> 	(load_no_speculate_resolve_return): New function.
>>>>> 	(resolve_overloaded_builtin): Handle overloading
>>>>> 	__builtin_load_no_speculate.
>>>>> 	* builtins.c (expand_load_no_speculate): New function.
>>>>> 	(expand_builtin): Handle new no-speculation builtins.
>>>>> 	* targhooks.h (default_inhibit_load_speculation): Declare.
>>>>> 	* targhooks.c (default_inhibit_load_speculation): New function.
>>>>> ---
>>>>> gcc/builtin-types.def       |  16 +++++
>>>>> gcc/builtins.c              |  99 ++++++++++++++++++++++++++
>>>>> gcc/builtins.def            |  22 ++++++
>>>>> gcc/c-family/c-common.c     | 164 ++++++++++++++++++++++++++++++++++++++++++++
>>>>> gcc/c-family/c-cppbuiltin.c |   5 +-
>>>>> gcc/doc/cpp.texi            |   4 ++
>>>>> gcc/doc/extend.texi         |  53 ++++++++++++++
>>>>> gcc/doc/tm.texi             |   6 ++
>>>>> gcc/doc/tm.texi.in          |   2 +
>>>>> gcc/target.def              |  20 ++++++
>>>>> gcc/targhooks.c             |  69 +++++++++++++++++++
>>>>> gcc/targhooks.h             |   3 +
>>>>> 12 files changed, 462 insertions(+), 1 deletion(-)
>>>>> 
>>>>> <0001-builtins-Generic-support-for-__builtin_load_no_specu.patch>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
  2018-01-08 16:07       ` Bill Schmidt
  2018-01-09 10:26         ` Richard Earnshaw (lists)
@ 2018-01-10 23:28         ` Jeff Law
  2018-01-12 16:16           ` Richard Earnshaw (lists)
  1 sibling, 1 reply; 70+ messages in thread
From: Jeff Law @ 2018-01-10 23:28 UTC (permalink / raw)
  To: Bill Schmidt, Richard Earnshaw (lists); +Cc: gcc-patches

On 01/08/2018 09:01 AM, Bill Schmidt wrote:
> On Jan 8, 2018, at 8:06 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
>>
>> On 08/01/18 02:20, Bill Schmidt wrote:
>>> Hi Richard,
>>>
>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't
>>> have a way to force resolution of a condition prior to continuing speculation, so this
>>> will just introduce another comparison that we would speculate past.  For our mitigation
>>> we will have to introduce an instruction that halts all speculation at that point, and place
>>> it in front of all dangerous loads.  I wish it were otherwise.
>>
>> So can't you make the builtin expand to (in pseudo code):
>>
>> 	if (bounds_check)
>> 	  {
>> 	    __asm ("barrier");
>> 	    result = *ptr;
>>          }
>>        else
>> 	  result = failval;
> 
> Could, but this just generates unnecessary code for Power.  We would instead generate
> 
> 	__asm ("barrier");
> 	result = *ptr;
> 
> without any checks.  We would ignore everything but the first argument.
So how bad would it be to drop the whole concept of failval and make the
result indeterminate in the out-of-bounds case?


Would that give us enough freedom to generate an appropriate sequence
for aarch64 and ppc?  It feels like these two architectures are
essentially on opposite sides of the spectrum and if we can find a
reasonable way to handle them that we'd likely have semantics we can use
on just about any architecture.


jeff

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
  2018-01-09 17:11           ` Bill Schmidt
@ 2018-01-10 23:36             ` Jeff Law
  0 siblings, 0 replies; 70+ messages in thread
From: Jeff Law @ 2018-01-10 23:36 UTC (permalink / raw)
  To: Bill Schmidt, Richard Earnshaw (lists); +Cc: gcc-patches

On 01/09/2018 10:11 AM, Bill Schmidt wrote:
> On Jan 9, 2018, at 4:21 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
>>
>> On 08/01/18 16:01, Bill Schmidt wrote:
>>> On Jan 8, 2018, at 8:06 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
>>>>
>>>> On 08/01/18 02:20, Bill Schmidt wrote:
>>>>> Hi Richard,
>>>>>
>>>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't
>>>>> have a way to force resolution of a condition prior to continuing speculation, so this
>>>>> will just introduce another comparison that we would speculate past.  For our mitigation
>>>>> we will have to introduce an instruction that halts all speculation at that point, and place
>>>>> it in front of all dangerous loads.  I wish it were otherwise.
>>>>
>>>> So can't you make the builtin expand to (in pseudo code):
>>>>
>>>> 	if (bounds_check)
>>>> 	  {
>>>> 	    __asm ("barrier");
>>>> 	    result = *ptr;
>>>>         }
>>>>       else
>>>> 	  result = failval;
>>>
>>> Could, but this just generates unnecessary code for Power.  We would instead generate
>>>
>>> 	__asm ("barrier");
>>> 	result = *ptr;
>>>
>>> without any checks.  We would ignore everything but the first argument.
>>
>> You can't do that with the builtin as it is currently specified as it
>> also has a defined behaviour for the result it returns.  You can,
>> however, expand the code as normal RTL and let the optimizers remove any
>> redundant code if they can make that deduction and you don't need the
>> additional behaviour.
> 
> But that's my original point.  "As currently specified" is overspecified for
> our architecture, and expanding the extra code *hoping* it will go away
> is not something I feel we should do.  If our hopes are dashed, we end up
> with yet worse performance.  If we're going to use something generic
> for everyone, then I argue that the semantics may not be the same for
> all targets, and that this needs to be specified in the documentation.
Other than in the case where bounds_check is a compile-time constant, I
don't see that the compiler is likely to drop the else clause above.  So
relying on the compiler to optimize away the redundant code doesn't seem
like a viable option.

However, if we relax the semantics on failval, then I think we get
enough freedom to generate the desired code on PPC.
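For reference, the architectural (non-speculative) result of the v1 builtin, as specified in the cover letter, can be modelled in plain C. This is only a sketch of the specified result value, not any target's expansion, and the speculation-inhibiting behaviour itself cannot be expressed here; the function name is hypothetical:

```c
#include <stdint.h>

/* Model of the architectural result of the proposed v1 builtin:
   *ptr when cmpptr lies in [lower, upper), failval otherwise.
   The speculation barrier is not (and cannot be) modelled.  */
static int32_t
load_no_speculate_model (const int32_t *ptr, const void *lower,
                         const void *upper, int32_t failval,
                         const void *cmpptr)
{
  const char *p = (const char *) cmpptr;
  if (p >= (const char *) lower && p < (const char *) upper)
    return *ptr;
  return failval;
}
```

The else arm is what carries failval: unless the bounds comparison folds to a compile-time constant, nothing entitles the compiler to delete it.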

> I'm just looking for a solution that works for everyone.
Likewise.  It'd be unfortunate if we end up with two distinct builtins
and the kernel buys have to write some level of abstraction on top of
them to select between them based on the target.


jeff

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
  2018-01-08 21:08           ` Bill Schmidt
@ 2018-01-10 23:48             ` Jeff Law
  0 siblings, 0 replies; 70+ messages in thread
From: Jeff Law @ 2018-01-10 23:48 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: Richard Earnshaw, GCC Patches, Segher Boessenkool

On 01/08/2018 02:03 PM, Bill Schmidt wrote:
> 
> I agree 100% with this approach.  I just wanted to raise the point in case
> other architectures have different needs.  Power can work around this
> by just ignoring 4 of the 5 arguments.  As long as nobody else needs
> *additional* arguments, this should work out just fine.  But I want to be clear
> that the only guarantee of the semantics for everybody is that "speculation 
> stops here," while on some processors it may be "speculation stops here
> if out of range."  If we can write this into the documentation, then I'm fine
> writing a target expander for Power as discussed.
My recollection of other micro-architectures and how they handle
conditional moves makes me believe that the test and conditional move
may be enough to stop the rampant speculation that causes the problems.
They're just enough of a fence to provide a level of mitigation.

So I see value in providing those arguments for architectures other than
ARM/AArch64.

What I think we're really trying to nail down is how crisply the
semantics of this builtin are, particularly around the need to test and
provide a failval.

I'm actually going to be in a meeting with another chip vendor tomorrow
AM and will make a point to discuss this with them and see what
direction they want to go.  I suspect they're closer to PPC in terms of
the semantics they want, but need to verify.


> 
> I had a brief interchange with Richi last week, and he suggested that for
> the automatic detection we might look into flagging MEM_REFs rather
> than inserting a built-in; a target hook can still handle such a flag.  That
> has some advantages and some disadvantages that I can think of, so
> we'll have to talk that out on the list over time after we get through the
> crisis mode reactions.
I think someone could likely spend a huge amount of time in this space.
Both with the analysis around finding potentially vulnerable sequences,
generating appropriate mitigation code and optimizing that to be as
painless as possible.  But as you say, this is something we should hash
out post-crisis.

jeff

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-09 10:47       ` Richard Earnshaw (lists)
  2018-01-09 13:41         ` Alexander Monakov
@ 2018-01-10 23:48         ` Jeff Law
  1 sibling, 0 replies; 70+ messages in thread
From: Jeff Law @ 2018-01-10 23:48 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Alexander Monakov; +Cc: gcc-patches

On 01/09/2018 03:47 AM, Richard Earnshaw (lists) wrote:
> On 05/01/18 13:08, Alexander Monakov wrote:
>> On Fri, 5 Jan 2018, Richard Earnshaw (lists) wrote:
>>> This is quite tricky.  For ARM we have to have a speculated load.
>>
>> Sorry, I don't follow. On ARM, it is surprising that CSEL-CSDB-LDR sequence
>> wouldn't work (applying CSEL to the address rather than loaded value), and
>> if it wouldn't, then ARM-specific lowering of the builtin can handle that
>> anyhow, right? (by spilling the pointer)
> 
> The load has to feed /in/ to the csel/csdb sequence, not come after it.
> 
>>
>> (on x86 the current Intel's recommendation is to emit LFENCE prior to the load)
> 
> That can be supported in the way you expand the builtin.  The builtin
> expander is given a (MEM (ptr)) , but it's up to the back-end where to
> put that in the expanded sequence to materialize the load, so you could
> write (sorry, don't know x86 asm very well, but I think this is how
> you'd put it)
> 
> 	lfence
> 	mov	(ptr), dest
> 
> with branches around that as appropriate to support the remainder of the
> builtin's behaviour.
I think the argument is going to be that they don't want the branches
around to support the other test + failval semantics.  Essentially the
same position as IBM has with PPC.

> 
>> Is the main issue expressing the CSEL condition in the source code? Perhaps it is
>> possible to introduce
>>
>>   int guard = __builtin_nontransparent(predicate);
>>
>>   if (predicate)
>>     foo = __builtin_load_no_speculate(&arr[addr], guard);
>>
>> ... or maybe even
>>
>>   if (predicate)
>>     foo = arr[__builtin_loadspecbarrier(addr, guard)];
>>
>> where internally __builtin_nontransparent is the same as
>>
>>   guard = predicate;
>>   asm volatile("" : "+g"(guard));
>>
>> although admittedly this is not perfect since it forces evaluation of 'guard'
>> before the branch.
> 
> As I explained to Bernd last night, I think this is likely be unsafe.
> If there's some control path before __builtin_nontransparent that allows
> 'predicate' to be simplified (eg by value range propagation), then your
> guard doesn't protect against the speculation that you think it does.
> Changing all the optimizers to guarantee that wouldn't happen (and
> guaranteeing that all future optimizers won't introduce new problems of
> that nature) is, I suspect, very non-trivial.
Agreed.  Whatever PREDICATE happens to be, the compiler is going to go
through extreme measures to try and collapse PREDICATE down to a
compile-time constant, including splitting paths to the point where
PREDICATE is used in the conditional so that on one side it's constant
and the other it's non-constant.  It seems like this approach is likely
to be compromised by the optimizers.
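Written out as a sketch (helper name hypothetical), Alexander's guard idiom makes the concern concrete: the empty asm hides the value's origin from passes that run after it, but nothing stops earlier passes from simplifying the predicate before it ever reaches the asm:

```c
/* Launder PRED through an empty asm so later optimization passes
   cannot see where the value came from.  */
static inline int
nontransparent (int pred)
{
  __asm__ volatile ("" : "+g" (pred));
  return pred;
}

static int
lookup (const int *arr, unsigned idx, unsigned len)
{
  int pred = idx < len;
  /* Hazard: if value-range propagation proves idx < len on some
     cloned path, PRED is folded to a constant *before* this call,
     and the "guard" no longer reflects the runtime check.  */
  if (nontransparent (pred))
    return arr[idx];
  return 0;
}
```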


Jeff

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-09 15:30     ` Richard Earnshaw (lists)
@ 2018-01-11  0:13       ` Jeff Law
  0 siblings, 0 replies; 70+ messages in thread
From: Jeff Law @ 2018-01-11  0:13 UTC (permalink / raw)
  To: Richard Earnshaw (lists), Joseph Myers; +Cc: gcc-patches

On 01/09/2018 08:29 AM, Richard Earnshaw (lists) wrote:
> I'm in two minds about that.  There are certainly cases where you might
> want to use the generic expansion as part of handling a case where you
> have a more standard speculation barrier; in those cases you might want
> to emit your barrier and then let the generic code expand and provide
> the logical behaviour of the builtin.
> 
> However, if you don't have a barrier you need to ask yourself, will my
> architecture ever have an implementation that does do speculation?
> Unless you can be certain that it won't, you probably need to be warned
> that some code the programmer thinks might be vulnerable to spectre has
> not been compiled with that protection, otherwise if you run that code
> on a later implementation that does speculate, it might be vulnerable.
> 
> Let me give an example, we use the generic code expansion if we
> encounter the builtin when generating code for Thumb on pre-ARMv7
> devices.  We don't have instructions in 'thumb1' to guard against
> speculation and we really want to warn users that they've done this (it
> might be a mistake in how they're invoking the compiler).
> 
> I could add an extra parameter to the helper (bool warn_unimplemented),
> which would be true if called directly from the expanders, but would now
> allow back-ends to implement a trivial variant that just suppressed the
> warning.  They could also then use the generic expansion if all that was
> needed was a subsequent fence instruction.
Yea, I see your side as well on this -- advisable or not I suspect we're
going to see folks saying "my embedded chip doesn't have these problems,
I don't want to pay any of these costs" and I don't want to be warned
about a problem I know can't happen (today).

Anyway, I think relatively speaking this is minor compared to the stuff
we're discussing around the semantics of the builtin.  I can live with
iterating on this aspect based on feedback.

jeff

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
  2018-01-10 23:28         ` Jeff Law
@ 2018-01-12 16:16           ` Richard Earnshaw (lists)
  2018-01-12 22:48             ` Bill Schmidt
  0 siblings, 1 reply; 70+ messages in thread
From: Richard Earnshaw (lists) @ 2018-01-12 16:16 UTC (permalink / raw)
  To: Jeff Law, Bill Schmidt; +Cc: gcc-patches

On 10/01/18 23:26, Jeff Law wrote:
> On 01/08/2018 09:01 AM, Bill Schmidt wrote:
>> On Jan 8, 2018, at 8:06 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
>>>
>>> On 08/01/18 02:20, Bill Schmidt wrote:
>>>> Hi Richard,
>>>>
>>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't
>>>> have a way to force resolution of a condition prior to continuing speculation, so this
>>>> will just introduce another comparison that we would speculate past.  For our mitigation
>>>> we will have to introduce an instruction that halts all speculation at that point, and place
>>>> it in front of all dangerous loads.  I wish it were otherwise.
>>>
>>> So can't you make the builtin expand to (in pseudo code):
>>>
>>> 	if (bounds_check)
>>> 	  {
>>> 	    __asm ("barrier");
>>> 	    result = *ptr;
>>>          }
>>>        else
>>> 	  result = failval;
>>
>> Could, but this just generates unnecessary code for Power.  We would instead generate
>>
>> 	__asm ("barrier");
>> 	result = *ptr;
>>
>> without any checks.  We would ignore everything but the first argument.
> So how bad would it be to drop the whole concept of failval and make the
> result indeterminate in the out of bounds case.
> 

Indeterminate on its own is too loose.  An arbitrary value fetched from
memory is 'indeterminate', so we'd need tighter wording than that; see
below.

> 
> Would that give us enough freedom to generate an appropriate sequence
> for aarch64 and ppc?  It feels like these two architectures are
> essentially on opposite sides of the spectrum and if we can find a
> reasonable way to handle them that we'd likely have semantics we can use
> on just about any architecture.
> 
> 
> jeff
> 

So if we changed the specification (using the same parameter names) to:

1) When the call to the builtin is not being speculatively executed the
   result is *ptr if lower_bound <= cmpptr < upper_bound and undefined
   otherwise.
2) When code is being speculatively executed either:
   a) execution of subsequent instructions that depend on the result
      will be prevented until it can be proven that the call to the
      builtin is not being speculatively executed (ie until execution
      can continue under point 1), or
   b) speculation may continue using *ptr as the result when
      lower_bound <= cmpptr < upper_bound, or an unspecified constant
      value (eg 0) if cmpptr lies outside that range.

Then I think that meets those requirements.
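A hedged sketch of what a caller would look like under this relaxed specification. The builtin is still only a proposal, so it is modelled here by a trivial macro (which also folds cmpptr into ptr for simplicity); on a real implementation the macro would be the builtin and the in-range load would carry the barrier:

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for the proposed builtin: in range yields *ptr, out of
   range an unspecified constant (0 here), as in point 2b above.  */
#define LOAD_NO_SPECULATE(ptr, lower, upper)                    \
  (((const char *) (ptr) >= (const char *) (lower)              \
    && (const char *) (ptr) < (const char *) (upper))           \
   ? *(ptr) : 0)

/* Because the architectural out-of-range result is undefined
   (point 1), the caller must keep its own bounds check; the builtin
   only bounds what a speculating CPU can observe.  */
int32_t
get_element (const int32_t *arr, size_t idx, size_t len)
{
  if (idx < len)
    return LOAD_NO_SPECULATE (&arr[idx], arr, arr + len);
  return -1;
}
```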

We could still implement the existing relaxations on permitting the
builtin to be called with only one of upper and lower.

My concern is that this would make it quite hard for programmers to
reason safely about the behaviour overall.

R.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate()
  2018-01-12 16:16           ` Richard Earnshaw (lists)
@ 2018-01-12 22:48             ` Bill Schmidt
  0 siblings, 0 replies; 70+ messages in thread
From: Bill Schmidt @ 2018-01-12 22:48 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: Jeff Law, gcc-patches

Hi Richard and Jeff,

Sorry I missed this earlier today, it somehow ended up in my spam folder...

> On Jan 12, 2018, at 10:08 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
> 
> On 10/01/18 23:26, Jeff Law wrote:
>> On 01/08/2018 09:01 AM, Bill Schmidt wrote:
>>> On Jan 8, 2018, at 8:06 AM, Richard Earnshaw (lists) <Richard.Earnshaw@arm.com> wrote:
>>>> 
>>>> On 08/01/18 02:20, Bill Schmidt wrote:
>>>>> Hi Richard,
>>>>> 
>>>>> Unfortunately, I don't see any way that this will be useful for the ppc targets.  We don't
>>>>> have a way to force resolution of a condition prior to continuing speculation, so this
>>>>> will just introduce another comparison that we would speculate past.  For our mitigation
>>>>> we will have to introduce an instruction that halts all speculation at that point, and place
>>>>> it in front of all dangerous loads.  I wish it were otherwise.
>>>> 
>>>> So can't you make the builtin expand to (in pseudo code):
>>>> 
>>>> 	if (bounds_check)
>>>> 	  {
>>>> 	    __asm ("barrier");
>>>> 	    result = *ptr;
>>>>         }
>>>>       else
>>>> 	  result = failval;
>>> 
>>> Could, but this just generates unnecessary code for Power.  We would instead generate
>>> 
>>> 	__asm ("barrier");
>>> 	result = *ptr;
>>> 
>>> without any checks.  We would ignore everything but the first argument.
>> So how bad would it be to drop the whole concept of failval and make the
>> result indeterminate in the out-of-bounds case?
>> 
> 
> Indeterminate on its own is too loose.  An arbitrary value fetched from
> memory is 'indeterminate', so we'd need tighter wording than that; see
> below.
> 
>> 
>> Would that give us enough freedom to generate an appropriate sequence
>> for aarch64 and ppc?  It feels like these two architectures are
>> essentially on opposite sides of the spectrum and if we can find a
>> reasonable way to handle them that we'd likely have semantics we can use
>> on just about any architecture.
>> 
>> 
>> jeff
>> 
> 
> So if we changed the specification (using the same parameter names) to:
> 
> 1) When the call to the builtin is not being speculatively executed the
>   result is *ptr if lower_bound <= cmpptr < upper_bound and undefined
>   otherwise.
> 2) When code is being speculatively executed either:
>   a) execution of subsequent instructions that depend on the result
>      will be prevented until it can be proven that the call to the
>      builtin is not being speculatively executed (ie until execution
>      can continue under point 1), or
>   b) speculation may continue using *ptr as the result when
>      lower_bound <= cmpptr < upper_bound, or an unspecified constant
>      value (eg 0) if cmpptr lies outside that range.
> 
> Then I think that meets those requirements.

This is quite acceptable for Power.  Thanks, Richard, for your flexibility!

Bill

> 
> We could still implement the existing relaxations on permitting the
> builtin to be called with only one of upper and lower.
> 
> My concern is that this would make it quite hard for programmers to
> reason safely about the behaviour overall.
> 
> R.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH 0/3] [v2] Implement __builtin_speculation_safe_load
  2018-01-04 13:58 [PATCH 0/3] Add __builtin_load_no_speculate Richard Earnshaw
                   ` (6 preceding siblings ...)
  2018-01-05 17:24 ` Jeff Law
@ 2018-01-17 14:56 ` Richard Earnshaw
  2018-01-17 14:56   ` [PATCH 2/3] [aarch64] Implement support for __builtin_speculation_safe_load Richard Earnshaw
                     ` (3 more replies)
  7 siblings, 4 replies; 70+ messages in thread
From: Richard Earnshaw @ 2018-01-17 14:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 3263 bytes --]


This patch series is version 2 for a patch to protect against
speculative use of a load instruction.  It's based on a roll-up of the
feedback from the first version.

What's changed since v1?

First, and foremost, the API has changed to make it possible to reduce
the amount of code that needs to be generated on architectures that
provide an unconditional speculation barrier.  Although the builtin
still requires the additional operands (and does some checking to
ensure that they aren't completely unreasonable), back-end expansion
can ignore them and simply emit a load instruction along with the
barrier operation.

Secondly, I've dropped the failval parameter.  With the new API, which
leaves an out-of-bounds access undefined, this parameter no longer
makes sense.

Thirdly, based on off-list comments, I've dropped the ability to make
upper_bound NULL.  As a bounds value it was not particularly logical
and coding it required special handling in the back-end (wrong code
would be generated otherwise).  By removing it the bounds are now all
natural.  (NULL is still supported as a lower bound, nothing can be
less than address 0 --- at least on unsigned addressing-mode machines,
which appear to be all that GCC supports --- so it is a simple back-end
optimization to drop the check in this case.)
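The back-end simplification described here is plain unsigned-comparison algebra; as a sketch (not target code), the range check with a NULL lower bound collapses to a single compare:

```c
#include <stdint.h>

/* Full form of the bounds test: lower <= cmpptr < upper.  */
static int
in_range (uintptr_t cmpptr, uintptr_t lower, uintptr_t upper)
{
  return cmpptr >= lower && cmpptr < upper;
}

/* With lower == 0 the first comparison is always true on a machine
   with unsigned addressing, so only cmpptr < upper remains.  */
static int
in_range_null_lower (uintptr_t cmpptr, uintptr_t upper)
{
  return cmpptr < upper;
}
```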

Finally, I've changed the name of the builtin.  This was mostly for
Arm's benefit since we already have some folk building code with the
old API and I need to be able to transition them cleanly to the new
one.  This is a one-off change, I'm not intending to support such
renames if we have to iterate again on this builtin before the initial
commit (ie I won't do it again, promise).

I've updated the documentation for the changes, please read that for
additional details (in patch 1).  Richi commented that he thought some
examples would help: I agree, but feel that putting them in
extend.texi with the builtin itself doesn't really fit with the rest
of that section - I think a web or wiki page on the subject would be a
better bet.  That can be kept up-to-date more easily than the
documentation that comes with the compiler.

R.

Richard Earnshaw (3):
  [builtins] Generic support for __builtin_speculation_safe_load()
  [aarch64] Implement support for __builtin_speculation_safe_load
  [arm] Implement support for the de-speculation intrinsic

 gcc/builtin-types.def         |  16 +++++
 gcc/builtins.c                |  81 ++++++++++++++++++++++
 gcc/builtins.def              |  17 +++++
 gcc/c-family/c-common.c       | 152 ++++++++++++++++++++++++++++++++++++++++++
 gcc/c-family/c-cppbuiltin.c   |   5 +-
 gcc/config/aarch64/aarch64.c  |  81 ++++++++++++++++++++++
 gcc/config/aarch64/aarch64.md |  28 ++++++++
 gcc/config/arm/arm.c          | 100 +++++++++++++++++++++++++++
 gcc/config/arm/arm.md         |  40 ++++++++++-
 gcc/config/arm/unspecs.md     |   1 +
 gcc/doc/cpp.texi              |   4 ++
 gcc/doc/extend.texi           |  68 +++++++++++++++++++
 gcc/doc/tm.texi               |   9 +++
 gcc/doc/tm.texi.in            |   2 +
 gcc/target.def                |  34 ++++++++++
 gcc/targhooks.c               |  59 ++++++++++++++++
 gcc/targhooks.h               |   3 +
 17 files changed, 698 insertions(+), 2 deletions(-)

[-- Attachment #2: Type: text/plain, Size: 1 bytes --]



^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH 3/3] [arm] Implement support for the de-speculation intrinsic
  2018-01-17 14:56 ` [PATCH 0/3] [v2] Implement __builtin_speculation_safe_load Richard Earnshaw
  2018-01-17 14:56   ` [PATCH 2/3] [aarch64] Implement support for __builtin_speculation_safe_load Richard Earnshaw
@ 2018-01-17 14:56   ` Richard Earnshaw
  2018-01-17 14:59   ` [PATCH 1/3] [builtins] Generic support for __builtin_speculation_safe_load() Richard Earnshaw
  2018-01-17 17:17   ` [PATCH 0/3] [v2] Implement __builtin_speculation_safe_load Joseph Myers
  3 siblings, 0 replies; 70+ messages in thread
From: Richard Earnshaw @ 2018-01-17 14:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 822 bytes --]


This patch implements despeculation on ARM.  We only support it when
generating ARM or Thumb2 code (we need conditional execution); and we
only support it for sizes up to DImode.  For unsupported cases we
fall back to the generic code generation sequence so that a suitable
failure warning is emitted.

	* config/arm/arm.c (arm_speculation_safe_load): New function
	(TARGET_SPECULATION_SAFE_LOAD): Redefine.
	* config/arm/unspec.md (VUNSPEC_NOSPECULATE): New unspec_volatile code.
	* config/arm/arm.md (cmp_ior): Make this pattern callable.
	(nospeculate<QHSI:mode>, nospeculatedi): New patterns.
---
 gcc/config/arm/arm.c      | 100 ++++++++++++++++++++++++++++++++++++++++++++++
 gcc/config/arm/arm.md     |  40 ++++++++++++++++++-
 gcc/config/arm/unspecs.md |   1 +
 3 files changed, 140 insertions(+), 1 deletion(-)


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0003-arm-Implement-support-for-the-de-speculation-intrins.patch --]
[-- Type: text/x-patch; name="0003-arm-Implement-support-for-the-de-speculation-intrins.patch", Size: 6674 bytes --]

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 11e35ad..f28ad2b 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -321,6 +321,8 @@ static unsigned int arm_hard_regno_nregs (unsigned int, machine_mode);
 static bool arm_hard_regno_mode_ok (unsigned int, machine_mode);
 static bool arm_modes_tieable_p (machine_mode, machine_mode);
 static HOST_WIDE_INT arm_constant_alignment (const_tree, HOST_WIDE_INT);
+static rtx arm_speculation_safe_load (machine_mode, rtx, rtx, rtx, rtx, rtx,
+				      bool);
 \f
 /* Table of machine attributes.  */
 static const struct attribute_spec arm_attribute_table[] =
@@ -804,6 +806,9 @@ static const struct attribute_spec arm_attribute_table[] =
 
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT arm_constant_alignment
+
+#undef TARGET_SPECULATION_SAFE_LOAD
+#define TARGET_SPECULATION_SAFE_LOAD arm_speculation_safe_load
 \f
 /* Obstack for minipool constant handling.  */
 static struct obstack minipool_obstack;
@@ -31523,6 +31528,101 @@ arm_constant_alignment (const_tree exp, HOST_WIDE_INT align)
   return align;
 }
 
+static rtx
+arm_speculation_safe_load (machine_mode mode, rtx result, rtx mem,
+				  rtx lower_bound, rtx upper_bound,
+			   rtx cmpptr, bool warn)
+{
+  rtx cond, comparison;
+
+  /* We can't support this for Thumb1 as we have no suitable conditional
+     move operations.  Nor do we support it for TImode.  For both
+     these cases fall back to the generic code sequence which will emit
+     a suitable warning for us.  */
+  if (mode == TImode || TARGET_THUMB1)
+    return default_speculation_safe_load (mode, result, mem, lower_bound,
+					  upper_bound, cmpptr, warn);
+
+
+  rtx target = gen_reg_rtx (mode);
+  rtx tgt2 = result;
+
+  if (!register_operand (tgt2, mode))
+    tgt2 = gen_reg_rtx (mode);
+
+  if (!register_operand (cmpptr, ptr_mode))
+    cmpptr = force_reg (ptr_mode, cmpptr);
+
+  /* There's no point in comparing against a lower bound that is NULL, all
+     addresses are greater than or equal to that.  */
+  if (lower_bound == const0_rtx)
+    {
+      if (!register_operand (upper_bound, ptr_mode))
+	upper_bound = force_reg (ptr_mode, upper_bound);
+
+      cond = arm_gen_compare_reg (GEU, cmpptr, upper_bound, NULL);
+      comparison = gen_rtx_GEU (VOIDmode, cond, const0_rtx);
+    }
+  else
+    {
+      /* We want to generate code for
+	   result = (cmpptr < lower || cmpptr >= upper) ? 0 : *ptr;
+	 Which can be recast to
+	   result = (cmpptr < lower || upper <= cmpptr) ? 0 : *ptr;
+	 which can be implemented as
+	   cmp   cmpptr, lower
+	   cmpcs upper, cmpptr
+	   bls   1f
+	   ldr   result, [ptr]
+	  1:
+	   movls result, #0
+	 with suitable IT instructions as needed for thumb2.  Later
+	 optimization passes may make the load conditional.  */
+
+      if (!register_operand (lower_bound, ptr_mode))
+	lower_bound = force_reg (ptr_mode, lower_bound);
+
+      if (!register_operand (upper_bound, ptr_mode))
+	upper_bound = force_reg (ptr_mode, upper_bound);
+
+      rtx comparison1 = gen_rtx_LTU (SImode, cmpptr, lower_bound);
+      rtx comparison2 = gen_rtx_LEU (SImode, upper_bound, cmpptr);
+      cond = gen_rtx_REG (arm_select_dominance_cc_mode (comparison1,
+							comparison2,
+							DOM_CC_X_OR_Y),
+			  CC_REGNUM);
+      emit_insn (gen_cmp_ior (cmpptr, lower_bound, upper_bound, cmpptr,
+			      comparison1, comparison2, cond));
+      comparison = gen_rtx_NE (SImode, cond, const0_rtx);
+    }
+
+  rtx_code_label *label = gen_label_rtx ();
+  emit_jump_insn (gen_arm_cond_branch (label, comparison, cond));
+  emit_move_insn (target, mem);
+  emit_label (label);
+
+  insn_code icode;
+
+  /* We don't support TImode on Arm, but that can't currently be generated
+     for integral types on this architecture.  */
+  switch (mode)
+    {
+    case E_QImode: icode = CODE_FOR_nospeculateqi; break;
+    case E_HImode: icode = CODE_FOR_nospeculatehi; break;
+    case E_SImode: icode = CODE_FOR_nospeculatesi; break;
+    case E_DImode: icode = CODE_FOR_nospeculatedi; break;
+    default:
+      gcc_unreachable ();
+    }
+
+  emit_insn (GEN_FCN (icode) (tgt2, comparison, cond, target, const0_rtx));
+
+  if (tgt2 != result)
+    emit_move_insn (result, tgt2);
+
+  return result;
+}
+
 #if CHECKING_P
 namespace selftest {
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index d60c5af..e700fdf 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9488,7 +9488,7 @@
    (set_attr "type" "multiple")]
 )
 
-(define_insn "*cmp_ior"
+(define_insn "cmp_ior"
   [(set (match_operand 6 "dominant_cc_register" "")
 	(compare
 	 (ior:SI
@@ -12015,6 +12015,44 @@
   [(set_attr "length" "4")
    (set_attr "type" "coproc")])
 
+(define_insn "nospeculate<QHSI:mode>"
+ [(set (match_operand:QHSI 0 "s_register_operand" "=l,l,r")
+        (unspec_volatile:QHSI
+         [(match_operator 1 "arm_comparison_operator"
+	   [(match_operand 2 "cc_register" "") (const_int 0)])
+	  (match_operand:QHSI 3 "s_register_operand" "0,0,0")
+	  (match_operand:QHSI 4 "arm_not_operand" "I,K,r")]
+	 VUNSPEC_NOSPECULATE))]
+  "TARGET_32BIT"
+  {
+  if (TARGET_THUMB)
+    return \"it\\t%d1\;mov%d1\\t%0, %4\;.inst 0xf3af8014\t%@ CSDB\";
+  return \"mov%d1\\t%0, %4\;.inst 0xe320f014\t%@ CSDB\";
+  }
+  [(set_attr "type" "mov_imm,mvn_imm,mov_reg")
+   (set_attr "conds" "use")
+   (set_attr "length" "8")]
+)
+
+(define_insn "nospeculatedi"
+ [(set (match_operand:DI 0 "s_register_operand" "=r")
+        (unspec_volatile:DI
+         [(match_operator 1 "arm_comparison_operator"
+	   [(match_operand 2 "cc_register" "") (const_int 0)])
+	  (match_operand:DI 3 "s_register_operand" "0")
+	  (match_operand:DI 4 "arm_rhs_operand" "rI")]
+	 VUNSPEC_NOSPECULATE))]
+  "TARGET_32BIT"
+  {
+  if (TARGET_THUMB)
+    return \"it\\t%d1\;mov%d1\\t%Q0, %Q4\;it\\t%d1\;mov%d1\\t%R0, %R4\;.inst 0xf3af8014\t%@ CSDB\";
+  return  \"mov%d1\\t%Q0, %Q4\;mov%d1\\t%R0, %R4\;.inst 0xe320f014\t%@ CSDB\";
+  }
+  [(set_attr "type" "mov_reg")
+   (set_attr "conds" "use")
+   (set_attr "length" "12")]
+)
+
 ;; Vector bits common to IWMMXT and Neon
 (include "vec-common.md")
 ;; Load the Intel Wireless Multimedia Extension patterns
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index c474f4b..727a5ab 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -168,6 +168,7 @@
   VUNSPEC_MCRR2		; Represent the coprocessor mcrr2 instruction.
   VUNSPEC_MRRC		; Represent the coprocessor mrrc instruction.
   VUNSPEC_MRRC2		; Represent the coprocessor mrrc2 instruction.
+  VUNSPEC_NOSPECULATE	; Represent a despeculation sequence.
 ])
 
 ;; Enumerators for NEON unspecs.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH 2/3] [aarch64] Implement support for __builtin_speculation_safe_load
  2018-01-17 14:56 ` [PATCH 0/3] [v2] Implement __builtin_speculation_safe_load Richard Earnshaw
@ 2018-01-17 14:56   ` Richard Earnshaw
  2018-01-17 14:56   ` [PATCH 3/3] [arm] Implement support for the de-speculation intrinsic Richard Earnshaw
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 70+ messages in thread
From: Richard Earnshaw @ 2018-01-17 14:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 780 bytes --]


This patch implements support for __builtin_speculation_safe_load on
AArch64.  On this architecture we inhibit speculation by emitting a
combination of CSEL and a hint instruction that ensures the CSEL is
fully resolved when the operands to the CSEL may involve a speculative
load.
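
The architectural (non-speculative) semantics that the CSEL + CSDB
sequence provides can be sketched in portable C.  This is a hypothetical
model for illustration only: the real implementation works on RTL, and
the CSDB barrier that stops speculative forwarding of the unselected
value has no C equivalent.

```c
#include <stdint.h>

/* Hypothetical C model of the non-speculative semantics of a 4-byte
   speculation-safe load: if cmpptr lies inside [lower, upper), return
   the loaded value, otherwise force the result to zero (mirroring the
   CSEL selecting XZR).  */
static int32_t
model_speculation_safe_load_4 (const volatile int32_t *ptr,
                               const void *lower, const void *upper,
                               const void *cmpptr)
{
  int32_t value = *ptr;   /* the (possibly speculative) load */
  int in_bounds = (const char *) cmpptr >= (const char *) lower
                  && (const char *) cmpptr < (const char *) upper;
  /* CSEL-like select; on hardware a CSDB hint after this point
     prevents further speculation based on the unselected value.  */
  return in_bounds ? value : 0;
}
```

On real hardware the select alone is not sufficient; it is the trailing
CSDB that guarantees later instructions cannot consume the speculatively
loaded value when the bounds check mis-speculates.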

	* config/aarch64/aarch64.c (aarch64_print_operand): Handle zero passed
	to 'H' operand qualifier.
	(aarch64_speculation_safe_load): New function.
	(TARGET_SPECULATION_SAFE_LOAD): Redefine.
	* config/aarch64/aarch64.md (UNSPECV_NOSPECULATE): New unspec_volatile
	code.
	(nospeculate<ALLI:mode>, nospeculateti): New patterns.
---
 gcc/config/aarch64/aarch64.c  | 81 +++++++++++++++++++++++++++++++++++++++++++
 gcc/config/aarch64/aarch64.md | 28 +++++++++++++++
 2 files changed, 109 insertions(+)


[-- Attachment #2: 0002-aarch64-Implement-support-for-__builtin_speculation_.patch --]
[-- Type: text/x-patch; name="0002-aarch64-Implement-support-for-__builtin_speculation_.patch", Size: 4896 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 93e9d9f9..6591d19 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5315,6 +5315,14 @@ aarch64_print_operand (FILE *f, rtx x, int code)
       break;
 
     case 'H':
+       /* Print the higher numbered register of a pair (TImode) of regs.  */
+      if (x == const0_rtx
+	  || (CONST_DOUBLE_P (x) && aarch64_float_const_zero_rtx_p (x)))
+	{
+	  asm_fprintf (f, "xzr");
+	  break;
+	}
+
       if (!REG_P (x) || !GP_REGNUM_P (REGNO (x) + 1))
 	{
 	  output_operand_lossage ("invalid operand for '%%%c'", code);
@@ -15115,6 +15123,76 @@ aarch64_sched_can_speculate_insn (rtx_insn *insn)
     }
 }
 
+static rtx
+aarch64_speculation_safe_load (machine_mode mode, rtx result, rtx mem,
+			       rtx lower_bound, rtx upper_bound, rtx cmpptr,
+			       bool warn ATTRIBUTE_UNUSED)
+{
+  rtx cond, comparison;
+  rtx target = gen_reg_rtx (mode);
+  rtx tgt2 = result;
+
+  if (!register_operand (cmpptr, ptr_mode))
+    cmpptr = force_reg (ptr_mode, cmpptr);
+
+  if (!register_operand (tgt2, mode))
+    tgt2 = gen_reg_rtx (mode);
+
+  if (lower_bound == const0_rtx)
+    {
+      if (!register_operand (upper_bound, ptr_mode))
+	upper_bound = force_reg (ptr_mode, upper_bound);
+
+      cond = aarch64_gen_compare_reg (GEU, cmpptr, upper_bound);
+      comparison = gen_rtx_GEU (VOIDmode, cond, const0_rtx);
+    }
+  else
+    {
+      if (!register_operand (lower_bound, ptr_mode))
+	lower_bound = force_reg (ptr_mode, lower_bound);
+
+      if (!register_operand (upper_bound, ptr_mode))
+	upper_bound = force_reg (ptr_mode, upper_bound);
+
+      rtx cond1 = aarch64_gen_compare_reg (GEU, cmpptr, lower_bound);
+      rtx comparison1 = gen_rtx_GEU (ptr_mode, cond1, const0_rtx);
+      rtx failcond = GEN_INT (aarch64_get_condition_code (comparison1)^1);
+      cond = gen_rtx_REG (CCmode, CC_REGNUM);
+      if (ptr_mode == SImode)
+	emit_insn (gen_ccmpsi (cond1, cond, cmpptr, upper_bound, comparison1,
+			       failcond));
+      else
+	emit_insn (gen_ccmpdi (cond1, cond, cmpptr, upper_bound, comparison1,
+			       failcond));
+      comparison = gen_rtx_GEU (VOIDmode, cond, const0_rtx);
+    }
+
+  rtx_code_label *label = gen_label_rtx ();
+  emit_jump_insn (gen_condjump (comparison, cond, label));
+  emit_move_insn (target, mem);
+  emit_label (label);
+
+  insn_code icode;
+
+  switch (mode)
+    {
+    case E_QImode: icode = CODE_FOR_nospeculateqi; break;
+    case E_HImode: icode = CODE_FOR_nospeculatehi; break;
+    case E_SImode: icode = CODE_FOR_nospeculatesi; break;
+    case E_DImode: icode = CODE_FOR_nospeculatedi; break;
+    case E_TImode: icode = CODE_FOR_nospeculateti; break;
+    default:
+      gcc_unreachable ();
+    }
+
+  emit_insn (GEN_FCN (icode) (tgt2, comparison, cond, target, const0_rtx));
+
+  if (tgt2 != result)
+    emit_move_insn (result, tgt2);
+
+  return result;
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -15554,6 +15632,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT aarch64_constant_alignment
 
+#undef TARGET_SPECULATION_SAFE_LOAD
+#define TARGET_SPECULATION_SAFE_LOAD aarch64_speculation_safe_load
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f1e2a07..1a1f398 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -153,6 +153,7 @@
     UNSPECV_SET_FPSR		; Represent assign of FPSR content.
     UNSPECV_BLOCKAGE		; Represent a blockage
     UNSPECV_PROBE_STACK_RANGE	; Represent stack range probing.
+    UNSPECV_NOSPECULATE		; Inhibit speculation
   ]
 )
 
@@ -5797,6 +5798,33 @@
   DONE;
 })
 
+(define_insn "nospeculate<ALLI:mode>"
+  [(set (match_operand:ALLI 0 "register_operand" "=r")
+        (unspec_volatile:ALLI
+         [(match_operator 1 "aarch64_comparison_operator"
+	   [(match_operand 2 "cc_register" "") (const_int 0)])
+	  (match_operand:ALLI 3 "register_operand" "r")
+	  (match_operand:ALLI 4 "aarch64_reg_or_zero" "rZ")]
+	 UNSPECV_NOSPECULATE))]
+  ""
+  "csel\\t%<w>0, %<w>3, %<w>4, %M1\;hint\t#0x14\t// CSDB"
+  [(set_attr "type" "csel")
+   (set_attr "length" "8")]
+)
+
+(define_insn "nospeculateti"
+  [(set (match_operand:TI 0 "register_operand" "=r")
+        (unspec_volatile:TI
+         [(match_operator 1 "aarch64_comparison_operator"
+	   [(match_operand 2 "cc_register" "") (const_int 0)])
+	  (match_operand:TI 3 "register_operand" "r")
+	  (match_operand:TI 4 "aarch64_reg_or_zero" "rZ")]
+	 UNSPECV_NOSPECULATE))]
+  ""
+  "csel\\t%x0, %x3, %x4, %M1\;csel\\t%H0, %H3, %H4, %M1\;hint\t#0x14\t// CSDB"
+  [(set_attr "type" "csel")
+   (set_attr "length" "12")]
+)
 ;; AdvSIMD Stuff
 (include "aarch64-simd.md")
 

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH 1/3] [builtins] Generic support for __builtin_speculation_safe_load()
  2018-01-17 14:56 ` [PATCH 0/3] [v2] Implement __builtin_speculation_safe_load Richard Earnshaw
  2018-01-17 14:56   ` [PATCH 2/3] [aarch64] Implement support for __builtin_speculation_safe_load Richard Earnshaw
  2018-01-17 14:56   ` [PATCH 3/3] [arm] Implement support for the de-speculation intrinsic Richard Earnshaw
@ 2018-01-17 14:59   ` Richard Earnshaw
  2018-01-17 17:22     ` Joseph Myers
  2018-01-18 12:48     ` Richard Biener
  2018-01-17 17:17   ` [PATCH 0/3] [v2] Implement __builtin_speculation_safe_load Joseph Myers
  3 siblings, 2 replies; 70+ messages in thread
From: Richard Earnshaw @ 2018-01-17 14:59 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 2290 bytes --]


This patch adds generic support for the new builtin
__builtin_speculation_safe_load.  It provides the overloading of the
different access sizes and a default fall-back expansion for targets
that do not support a mechanism for inhibiting speculation.
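
As a usage sketch (hypothetical example code, not part of the patch), a
bounds-checked table lookup might be hardened as below, falling back to
a plain load when the builtin is unavailable so the code also compiles
with compilers that predate this patch:

```c
#include <stddef.h>
#include <stdint.h>

static uint8_t table[256];

/* Hypothetical example: the architectural result is table[idx] when
   idx < limit; under mis-speculation past the bounds check, the
   builtin limits what the processor can do with the loaded value.  */
uint8_t
lookup (size_t idx, size_t limit)
{
  if (idx >= limit)
    return 0;
#ifdef __HAVE_SPECULATION_SAFE_LOAD
  /* cmpptr is omitted, so it defaults to the load address; the bounds
     are [table, table + limit).  */
  return __builtin_speculation_safe_load (&table[idx], table,
                                          table + limit);
#else
  return table[idx];
#endif
}
```

The feature-test macro makes the hardening opportunistic: code built
with an older compiler keeps its original behaviour, while a compiler
with this patch applied substitutes the speculation-safe sequence.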

	* builtin-types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):
	New builtin type signature.
	(BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
	(BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
	(BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
	(BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
	* builtins.def (BUILT_IN_SPECULATION_SAFE_LOAD_N): New builtin.
	(BUILT_IN_SPECULATION_SAFE_LOAD_1): Likewise.
	(BUILT_IN_SPECULATION_SAFE_LOAD_2): Likewise.
	(BUILT_IN_SPECULATION_SAFE_LOAD_4): Likewise.
	(BUILT_IN_SPECULATION_SAFE_LOAD_8): Likewise.
	(BUILT_IN_SPECULATION_SAFE_LOAD_16): Likewise.
	* target.def (speculation_safe_load): New hook.
	* doc/tm.texi.in (TARGET_SPECULATION_SAFE_LOAD): Add to
	documentation.
	* doc/tm.texi: Regenerated.
	* doc/cpp.texi: Document __HAVE_SPECULATION_SAFE_LOAD.
	* doc/extend.texi: Document __builtin_speculation_safe_load.
	* c-family/c-common.c (speculation_safe_load_resolve_size): New
	function.
	(speculation_safe_load_resolve_params): New function.
	(speculation_safe_load_resolve_return): New function.
	(resolve_overloaded_builtin): Handle overloading
	__builtin_speculation_safe_load.
	* builtins.c (expand_speculation_safe_load): New function.
	(expand_builtin): Handle new speculation-safe builtins.
	* targhooks.h (default_speculation_safe_load): Declare.
	* targhooks.c (default_speculation_safe_load): New function.
---
 gcc/builtin-types.def       |  16 +++++
 gcc/builtins.c              |  81 +++++++++++++++++++++++
 gcc/builtins.def            |  17 +++++
 gcc/c-family/c-common.c     | 152 ++++++++++++++++++++++++++++++++++++++++++++
 gcc/c-family/c-cppbuiltin.c |   5 +-
 gcc/doc/cpp.texi            |   4 ++
 gcc/doc/extend.texi         |  68 ++++++++++++++++++++
 gcc/doc/tm.texi             |   9 +++
 gcc/doc/tm.texi.in          |   2 +
 gcc/target.def              |  34 ++++++++++
 gcc/targhooks.c             |  59 +++++++++++++++++
 gcc/targhooks.h             |   3 +
 12 files changed, 449 insertions(+), 1 deletion(-)


[-- Attachment #2: 0001-builtins-Generic-support-for-__builtin_speculation_s.patch --]
[-- Type: text/x-patch; name="0001-builtins-Generic-support-for-__builtin_speculation_s.patch", Size: 24260 bytes --]

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index bb50e60..492d4f6 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -785,6 +785,22 @@ DEF_FUNCTION_TYPE_VAR_3 (BT_FN_SSIZE_STRING_SIZE_CONST_STRING_VAR,
 DEF_FUNCTION_TYPE_VAR_3 (BT_FN_INT_FILEPTR_INT_CONST_STRING_VAR,
 			 BT_INT, BT_FILEPTR, BT_INT, BT_CONST_STRING)
 
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+			 BT_I1, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+		         BT_I2, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+			 BT_I4, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+			 BT_I8, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_VAR_3 (BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR,
+			 BT_I16, BT_CONST_VOLATILE_PTR,  BT_CONST_VOLATILE_PTR,
+			 BT_CONST_VOLATILE_PTR)
+
 DEF_FUNCTION_TYPE_VAR_4 (BT_FN_INT_STRING_INT_SIZE_CONST_STRING_VAR,
 			 BT_INT, BT_STRING, BT_INT, BT_SIZE, BT_CONST_STRING)
 
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 98eb804..c0a15d1 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -6602,6 +6602,79 @@ expand_stack_save (void)
   return ret;
 }
 
+/* Expand a call to __builtin_speculation_safe_load_<N>.  MODE
+   represents the size of the first argument to that call.  We emit a
+   warning if the result isn't used (IGNORE != 0), since the
+   implementation might rely on the value being used to correctly
+   inhibit speculation.  */
+static rtx
+expand_speculation_safe_load (machine_mode mode, tree exp, rtx target,
+			      int ignore)
+{
+  rtx ptr, mem, lower, upper, cmpptr;
+  unsigned nargs = call_expr_nargs (exp);
+
+  if (ignore)
+    {
+      warning_at (input_location, 0,
+		  "result of __builtin_speculation_safe_load must be used to "
+		  "ensure correct operation");
+      target = NULL;
+    }
+
+  tree arg0 = CALL_EXPR_ARG (exp, 0);
+  tree arg1 = CALL_EXPR_ARG (exp, 1);
+  tree arg2 = CALL_EXPR_ARG (exp, 2);
+
+  ptr = expand_expr (arg0, NULL_RTX, ptr_mode, EXPAND_SUM);
+  mem = validize_mem (gen_rtx_MEM (mode, convert_memory_address (Pmode, ptr)));
+
+  set_mem_align (mem, MAX (GET_MODE_ALIGNMENT (mode),
+			   get_pointer_alignment (arg0)));
+  set_mem_alias_set (mem, get_alias_set (TREE_TYPE (TREE_TYPE (arg0))));
+
+  /* Mark the memory access as volatile.  We don't want the optimizers to
+     move it or otherwise substitute an alternative value.  */
+  MEM_VOLATILE_P (mem) = 1;
+
+  lower = expand_normal (arg1);
+  if (GET_MODE (lower) != ptr_mode && GET_MODE (lower) != VOIDmode)
+    lower = convert_modes (ptr_mode, VOIDmode, lower,
+			   TYPE_UNSIGNED (TREE_TYPE (arg1)));
+
+  /* Ensure that the upper bound is not NULL.  The builtin is not portable
+     unless we enforce this.  */
+  if (integer_zerop (tree_strip_nop_conversions (arg2)))
+    error_at (input_location, "third argument (upper bound) cannot be NULL");
+
+  upper = expand_normal (arg2);
+  if (GET_MODE (upper) != ptr_mode && GET_MODE (upper) != VOIDmode)
+    upper = convert_modes (ptr_mode, VOIDmode, upper,
+			   TYPE_UNSIGNED (TREE_TYPE (arg2)));
+
+  if (nargs > 3)
+    {
+      tree arg3 = CALL_EXPR_ARG (exp, 3);
+      /* Ensure that cmpptr is not NULL.  The builtin is not portable
+	 unless we enforce this.  */
+      if (integer_zerop (tree_strip_nop_conversions (arg3)))
+	error_at (input_location,
+		  "fourth argument, if present, must be non-NULL");
+
+      cmpptr = expand_normal (arg3);
+      if (GET_MODE (cmpptr) != ptr_mode && GET_MODE (cmpptr) != VOIDmode)
+	cmpptr = convert_modes (ptr_mode, VOIDmode, cmpptr,
+				TYPE_UNSIGNED (TREE_TYPE (arg3)));
+    }
+  else
+    cmpptr = ptr;
+
+  if (target == NULL)
+    target = gen_reg_rtx (mode);
+
+  return targetm.speculation_safe_load (mode, target, mem, lower, upper,
+					cmpptr, true);
+}
 
 /* Expand an expression EXP that calls a built-in function,
    with result going to TARGET if that's convenient
@@ -7732,6 +7805,14 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode,
 	 folding.  */
       break;
 
+    case BUILT_IN_SPECULATION_SAFE_LOAD_1:
+    case BUILT_IN_SPECULATION_SAFE_LOAD_2:
+    case BUILT_IN_SPECULATION_SAFE_LOAD_4:
+    case BUILT_IN_SPECULATION_SAFE_LOAD_8:
+    case BUILT_IN_SPECULATION_SAFE_LOAD_16:
+      mode = get_builtin_sync_mode (fcode - BUILT_IN_SPECULATION_SAFE_LOAD_1);
+      return expand_speculation_safe_load (mode, exp, target, ignore);
+
     default:	/* just do library call, if unknown builtin */
       break;
     }
diff --git a/gcc/builtins.def b/gcc/builtins.def
index 671097e..16fa3e3 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -1017,6 +1017,23 @@ DEF_BUILTIN (BUILT_IN_EMUTLS_REGISTER_COMMON,
 	     true, true, true, ATTR_NOTHROW_LEAF_LIST, false,
 	     !targetm.have_tls)
 
+/* Suppressing speculation.  Users are expected to use the first (N)
+   variant, which will be translated internally into one of the other
+   types.  */
+DEF_GCC_BUILTIN (BUILT_IN_SPECULATION_SAFE_LOAD_N, "speculation_safe_load",
+		 BT_FN_VOID_VAR, ATTR_NULL)
+
+DEF_GCC_BUILTIN (BUILT_IN_SPECULATION_SAFE_LOAD_1, "speculation_safe_load_1",
+		 BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_SPECULATION_SAFE_LOAD_2, "speculation_safe_load_2",
+		 BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_SPECULATION_SAFE_LOAD_4, "speculation_safe_load_4",
+		 BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_SPECULATION_SAFE_LOAD_8, "speculation_safe_load_8",
+		 BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_SPECULATION_SAFE_LOAD_16, "speculation_safe_load_16",
+		 BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, ATTR_NULL)
+
 /* Exception support.  */
 DEF_BUILTIN_STUB (BUILT_IN_UNWIND_RESUME, "__builtin_unwind_resume")
 DEF_BUILTIN_STUB (BUILT_IN_CXA_END_CLEANUP, "__builtin_cxa_end_cleanup")
diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 197a71f..f8b1fdf 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -6456,6 +6456,134 @@ builtin_type_for_size (int size, bool unsignedp)
   return type ? type : error_mark_node;
 }
 
+/* Work out the size of the object pointed to by the first argument
+   of a call to __builtin_speculation_safe_load.  Only pointers to
+   integral types and pointers are permitted.  Return 0 if the
+   argument type is not supported or if the size is too large.  */
+static int
+speculation_safe_load_resolve_size (tree function, vec<tree, va_gc> *params)
+{
+  /* Type of the argument.  */
+  tree type;
+  int size;
+
+  if (vec_safe_is_empty (params))
+    {
+      error ("too few arguments to function %qE", function);
+      return 0;
+    }
+
+  type = TREE_TYPE ((*params)[0]);
+
+  if (!POINTER_TYPE_P (type))
+    goto incompatible;
+
+  type = TREE_TYPE (type);
+
+  if (TREE_CODE (type) == ARRAY_TYPE)
+    {
+      /* Force array-to-pointer decay for c++.  */
+      gcc_assert (c_dialect_cxx ());
+      (*params)[0] = default_conversion ((*params)[0]);
+      type = TREE_TYPE ((*params)[0]);
+    }
+
+  if (!INTEGRAL_TYPE_P (type) && !POINTER_TYPE_P (type))
+    goto incompatible;
+
+  if (!COMPLETE_TYPE_P (type))
+   goto incompatible;
+
+  size = tree_to_uhwi (TYPE_SIZE_UNIT (type));
+  if (size == 1 || size == 2 || size == 4 || size == 8 || size == 16)
+    return size;
+
+ incompatible:
+  /* Issue the diagnostic only if the argument is valid, otherwise
+     it would be redundant at best and could be misleading.  */
+  if (type != error_mark_node)
+    error ("operand type %qT is incompatible with argument %d of %qE",
+	   type, 1, function);
+
+  return 0;
+}
+
+/* Validate and coerce PARAMS, the arguments to ORIG_FUNCTION to fit
+   the prototype for FUNCTION.  The first three arguments are
+   mandatory, but shouldn't need casting as they are all pointers and
+   we've already established that the first argument is a pointer to a
+   permitted type.  The two optional arguments may need to be
+   fabricated if they have been omitted.  */
+static bool
+speculation_safe_load_resolve_params (location_t loc, tree orig_function,
+				      tree function,
+				      vec<tree, va_gc> *params)
+{
+  function_args_iterator iter;
+
+  function_args_iter_init (&iter, TREE_TYPE (function));
+  tree arg_type = function_args_iter_cond (&iter);
+  unsigned parmnum;
+  tree val;
+
+  if (params->length () < 3)
+    {
+      error_at (loc, "too few arguments to function %qE", orig_function);
+      return false;
+    }
+  else if (params->length () > 4)
+    {
+      error_at (loc, "too many arguments to function %qE", orig_function);
+      return false;
+    }
+
+  /* Required arguments.  These must all be pointers.  */
+  for (parmnum = 0; parmnum < 3; parmnum++)
+    {
+      arg_type = function_args_iter_cond (&iter);
+      val = (*params)[parmnum];
+      if (TREE_CODE (TREE_TYPE (val)) == ARRAY_TYPE)
+	val = default_conversion (val);
+      if (TREE_CODE (TREE_TYPE (val)) != POINTER_TYPE)
+	goto bad_arg;
+      (*params)[parmnum] = val;
+    }
+
+  /* Optional pointer to compare against.  */
+  arg_type = function_args_iter_cond (&iter);
+  if (params->length () == 4)
+    {
+      val = (*params)[parmnum];
+      if (TREE_CODE (TREE_TYPE (val)) == ARRAY_TYPE)
+	val = default_conversion (val);
+      if (TREE_CODE (TREE_TYPE (val)) != POINTER_TYPE)
+	goto bad_arg;
+      (*params)[parmnum] = val;
+    }
+
+  return true;
+
+ bad_arg:
+  error_at (loc, "expecting argument of type %qT for argument %u", arg_type,
+	    parmnum);
+  return false;
+}
+
+/* Cast the result of the builtin back to the type pointed to by the
+   first argument, preserving any qualifiers that it might have.  */
+static tree
+speculation_safe_load_resolve_return (tree first_param, tree result)
+{
+  tree ptype = TREE_TYPE (TREE_TYPE (first_param));
+  tree rtype = TREE_TYPE (result);
+  ptype = TYPE_MAIN_VARIANT (ptype);
+
+  if (tree_int_cst_equal (TYPE_SIZE (ptype), TYPE_SIZE (rtype)))
+    return convert (ptype, result);
+
+  return result;
+}
+
 /* A helper function for resolve_overloaded_builtin in resolving the
    overloaded __sync_ builtins.  Returns a positive power of 2 if the
    first operand of PARAMS is a pointer to a supported data type.
@@ -7110,6 +7238,30 @@ resolve_overloaded_builtin (location_t loc, tree function,
   /* Handle BUILT_IN_NORMAL here.  */
   switch (orig_code)
     {
+    case BUILT_IN_SPECULATION_SAFE_LOAD_N:
+      {
+	int n = speculation_safe_load_resolve_size (function, params);
+	tree new_function, first_param, result;
+	enum built_in_function fncode;
+
+	if (n == 0)
+	  return error_mark_node;
+
+	fncode = (enum built_in_function)((int)orig_code + exact_log2 (n) + 1);
+	new_function = builtin_decl_explicit (fncode);
+	first_param = (*params)[0];
+	if (!speculation_safe_load_resolve_params (loc, function, new_function,
+						   params))
+	  return error_mark_node;
+
+	result = build_function_call_vec (loc, vNULL, new_function, params,
+					  NULL);
+	if (result == error_mark_node)
+	  return result;
+
+	return speculation_safe_load_resolve_return (first_param, result);
+      }
+
     case BUILT_IN_ATOMIC_EXCHANGE:
     case BUILT_IN_ATOMIC_COMPARE_EXCHANGE:
     case BUILT_IN_ATOMIC_LOAD:
diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 9e33aed..61f1a2e 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -1361,7 +1361,10 @@ c_cpp_builtins (cpp_reader *pfile)
     cpp_define (pfile, "__WCHAR_UNSIGNED__");
 
   cpp_atomic_builtins (pfile);
-    
+
+  /* Show support for __builtin_speculation_safe_load ().  */
+  cpp_define (pfile, "__HAVE_SPECULATION_SAFE_LOAD");
+
 #ifdef DWARF2_UNWIND_INFO
   if (dwarf2out_do_cfi_asm ())
     cpp_define (pfile, "__GCC_HAVE_DWARF2_CFI_ASM");
diff --git a/gcc/doc/cpp.texi b/gcc/doc/cpp.texi
index 94437d5..e1980fe 100644
--- a/gcc/doc/cpp.texi
+++ b/gcc/doc/cpp.texi
@@ -2381,6 +2381,10 @@ If GCC cannot determine the current date, it will emit a warning message
 These macros are defined when the target processor supports atomic compare
 and swap operations on operands 1, 2, 4, 8 or 16 bytes in length, respectively.
 
+@item __HAVE_SPECULATION_SAFE_LOAD
+This macro is defined with the value 1 to show that this version of GCC
+supports @code{__builtin_speculation_safe_load}.
+
 @item __GCC_HAVE_DWARF2_CFI_ASM
 This macro is defined when the compiler is emitting DWARF CFI directives
 to the assembler.  When this is defined, it is possible to emit those same
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 2a553ad..cbee943 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -10971,6 +10971,7 @@ the built-in function returns -1.
 @findex __builtin_powi
 @findex __builtin_powif
 @findex __builtin_powil
+@findex __builtin_speculation_safe_load
 @findex _Exit
 @findex _exit
 @findex abort
@@ -11614,6 +11615,73 @@ check its compatibility with @var{size}.
 
 @end deftypefn
 
+@deftypefn {Built-in Function} @var{type} __builtin_speculation_safe_load (const volatile @var{type} *ptr, const volatile void *lower_bound, const volatile void *upper_bound, const volatile void *cmpptr)
+
+Modern processors use sophisticated techniques to improve their
+overall performance.  For example, the processor may begin to execute
+instructions that it guesses are likely to be needed in the near
+future: this is known as speculative execution.  If the guess turns
+out to be correct then significant time can be saved.  If the guess
+turns out to be wrong the instruction's calculations are discarded and
+the correct ones are then executed and the program will continue as
+the programmer intended.  However, in some circumstances it is
+possible for the discarded operations to leave traces of what happened
+in a manner that can later be discovered by timing subsequent
+operations.  Speculative memory operations, for example, interact with
+the cache memory found on many processors in a way which could
+allow further carefully controlled speculative instructions to reveal
+information.
+
+The @code{__builtin_speculation_safe_load} function provides a means
+to limit the extent to which a processor can continue speculative
+execution with the result of loading a value stored at @var{ptr}.  The
+boundary conditions, described by @var{cmpptr}, @var{lower_bound} and
+@var{upper_bound}, define the conditions under which execution after
+the load can continue safely:
+
+@enumerate
+@item
+When the call to the builtin is not being speculatively executed the
+result is @code{*ptr} if @code{lower_bound <= cmpptr < upper_bound}.
+The behavior is undefined if @code{cmpptr} is outside of that range.
+
+@item
+When code is being speculatively executed either:
+@itemize
+@item
+execution of subsequent instructions that depend on the result will
+be prevented until it can be proven that the call to the builtin is
+not being speculatively executed (i.e.@: until execution can continue under
+point 1), or
+
+@item
+speculation may continue using @code{*ptr} as the result when
+@code{lower_bound <= cmpptr < upper_bound}, or an unspecified constant
+value (e.g.@: zero) if @code{cmpptr} lies outside that range.
+@end itemize
+@end enumerate
+
+The type of the result, @var{type}, may be any integral type (signed,
+or unsigned, @code{char}, @code{short}, @code{int}, etc) or a pointer
+to any type.
+
+The final argument, @var{cmpptr}, may be omitted if it is the same as
+@var{ptr}.
+
+The builtin is supported for all architectures, but on machines where
+target-specific support for inhibiting speculation is not implemented,
+or not necessary, the compiler will emit a warning.
+
+@emph{Note:} it is important that the boundary conditions used
+accurately describe the conditions under which speculation may be
+occurring, otherwise the builtin will not provide protection against
+speculative use of the result.
+
+The pre-processor macro @code{__HAVE_SPECULATION_SAFE_LOAD} is defined with the
+value 1 on all implementations of GCC that support this builtin.
+
+@end deftypefn
+
 @deftypefn {Built-in Function} int __builtin_types_compatible_p (@var{type1}, @var{type2})
 
 You can use the built-in function @code{__builtin_types_compatible_p} to
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 9793a0e..9f2206d 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11922,6 +11922,15 @@ maintainer is familiar with.
 
 @end defmac
 
+@deftypefn {Target Hook} rtx TARGET_SPECULATION_SAFE_LOAD (machine_mode @var{mode}, rtx @var{result}, rtx @var{mem}, rtx @var{lower_bound}, rtx @var{upper_bound}, rtx @var{cmpptr}, bool @var{warn})
+Generate a target-specific code sequence that implements @code{__builtin_speculation_safe_load}, returning @var{mem}, a @code{MEM} of type @var{mode} in @var{result}. 
+ In the abstract machine, the built-in must only be called when @var{cmpptr} is greater than or equal to @var{lower_bound} and less than @var{upper_bound}.  The behaviour is undefined otherwise (and may generate memory faults). 
+ On real hardware, however, the builtin may be reached while the CPU is speculatively executing code.  It is this built-in's responsibility to ensure that, if speculating when outside the specified boundary conditions, the contents of @var{mem} are not visible to subsequent instructions. 
+ There are two common techniques that can be used to implement this builtin. The first is to emit a target-specific barrier instruction sequence that is guaranteed to ensure that speculative execution cannot continue.  On architectures with such a sequence the bounds checks can be ignored and the expansion can be simply a load and the barrier. 
+ On some targets, however, unsafe speculation can be inhibited by overwriting the result when the speculation bounds are exceeded.  This must be done using a code sequence that does not introduce further speculative behavior. An example of a target using this approach is AArch64. 
+ The default implementation implements the logic of the builtin but cannot provide the target-specific code necessary to inhibit speculation.  If @var{warn} is true a warning will be emitted to that effect.
+@end deftypefn
+
 @deftypefn {Target Hook} void TARGET_RUN_TARGET_SELFTESTS (void)
 If selftests are enabled, run any selftests for this target.
 @end deftypefn
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 7bcfb37..3393457 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8075,4 +8075,6 @@ maintainer is familiar with.
 
 @end defmac
 
+@hook TARGET_SPECULATION_SAFE_LOAD
+
 @hook TARGET_RUN_TARGET_SELFTESTS
diff --git a/gcc/target.def b/gcc/target.def
index e9eacc8..0ba42a1 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4214,6 +4214,40 @@ DEFHOOK
  hook_bool_void_true)
 
 DEFHOOK
+(speculation_safe_load,
+ "Generate a target-specific code sequence that implements\
+ @code{__builtin_speculation_safe_load}, returning @var{mem}, a @code{MEM} of\
+ type @var{mode} in @var{result}.\
+ \n\
+ In the abstract machine, the built-in must only be called when @var{cmpptr}\
+ is greater than or equal to @var{lower_bound} and less than @var{upper_bound}.  The\
+ behaviour is undefined otherwise (and may generate memory faults).\
+ \n\
+ On real hardware, however, the builtin may be reached while the CPU is\
+ speculatively executing code.  It is this built-in's responsibility to ensure\
+ that, if speculating when outside the specified boundary conditions, the\
+ contents of @var{mem} are not visible to subsequent instructions.\
+ \n\
+ There are two common techniques that can be used to implement this builtin.\
+ The first is to emit a target-specific barrier instruction sequence that is\
+ guaranteed to ensure that speculative execution cannot continue.  On\
+ architectures with such a sequence the bounds checks can be ignored and\
+ the expansion can be simply a load and the barrier.\
+ \n\
+ On some targets, however, unsafe speculation can be inhibited by overwriting\
+ the result when the speculation bounds are exceeded.  This must be done\
+ using a code sequence that does not introduce further speculative behavior.\
+ An example of a target using this approach is AArch64.\
+ \n\
+ The default implementation implements the logic of the builtin\
+ but cannot provide the target-specific code necessary to inhibit\
+ speculation.  If @var{warn} is true, a warning will be emitted to that\
+ effect.",
+ rtx, (machine_mode mode, rtx result, rtx mem, rtx lower_bound,
+       rtx upper_bound, rtx cmpptr, bool warn),
+ default_speculation_safe_load)
+
+DEFHOOK
 (can_use_doloop_p,
  "Return true if it is possible to use low-overhead loops (@code{doloop_end}\n\
 and @code{doloop_begin}) for a particular loop.  @var{iterations} gives the\n\
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 653567c..228b966 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -82,6 +82,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "real.h"
 #include "langhooks.h"
+#include "dojump.h"
 
 bool
 default_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED,
@@ -2307,4 +2308,62 @@ default_stack_clash_protection_final_dynamic_probe (rtx residual ATTRIBUTE_UNUSE
   return 0;
 }
 
+/* Default implementation of the speculation-safe-load builtin.
+   This version does not have, or know of, the target-specific
+   mechanisms necessary to inhibit speculation, so it simply emits a
+   code sequence that implements the architectural aspects of the
+   builtin.  */
+rtx
+default_speculation_safe_load (machine_mode mode ATTRIBUTE_UNUSED,
+			       rtx result, rtx mem, rtx lower_bound,
+			       rtx upper_bound, rtx cmpptr, bool warn)
+{
+  rtx_code_label *done_label = gen_label_rtx ();
+  rtx_code_label *inrange_label = gen_label_rtx ();
+
+  if (warn)
+    warning_at
+      (input_location, 0,
+       "this target does not support anti-speculation operations.  "
+       "Your program will still execute correctly, but speculation "
+       "will not be inhibited");
+
+  /* We don't have any speculation barriers, but if we mark the branch
+     probabilities to be always predicting the out-of-bounds path, then
+     there's a higher chance that the compiler will order code so that
+     static prediction will fall through a safe path.  */
+  if (lower_bound == const0_rtx)
+    {
+      /* There's no point in checking against a lower bound of zero, simply
+	 skip that case as it tells us nothing about the speculation
+	 condition.  */
+      do_compare_rtx_and_jump (cmpptr, upper_bound, LTU, true, ptr_mode,
+			       NULL, NULL, inrange_label,
+			       profile_probability::never ());
+      emit_move_insn (result, GEN_INT (0));
+      emit_jump (done_label);
+      emit_label (inrange_label);
+      emit_move_insn (result, mem);
+      emit_label (done_label);
+    }
+  else
+    {
+      rtx_code_label *oob_label = gen_label_rtx ();
+      do_compare_rtx_and_jump (cmpptr, lower_bound, LTU, true, ptr_mode,
+			       NULL, NULL, oob_label,
+			       profile_probability::always ());
+      do_compare_rtx_and_jump (cmpptr, upper_bound, GEU, true, ptr_mode,
+			       NULL, NULL, inrange_label,
+			       profile_probability::never ());
+      emit_label (oob_label);
+      emit_move_insn (result, GEN_INT (0));
+      emit_jump (done_label);
+      emit_label (inrange_label);
+      emit_move_insn (result, mem);
+      emit_label (done_label);
+    }
+
+  return result;
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index e753e58..76ba150 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -286,4 +286,7 @@ extern enum flt_eval_method
 default_excess_precision (enum excess_precision_type ATTRIBUTE_UNUSED);
 extern bool default_stack_clash_protection_final_dynamic_probe (rtx);
 
+extern rtx
+default_speculation_safe_load (machine_mode, rtx, rtx, rtx, rtx, rtx, bool);
+
 #endif /* GCC_TARGHOOKS_H */

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 0/3] [v2] Implement __builtin_speculation_safe_load
  2018-01-17 14:56 ` [PATCH 0/3] [v2] Implement __builtin_speculation_safe_load Richard Earnshaw
                     ` (2 preceding siblings ...)
  2018-01-17 14:59   ` [PATCH 1/3] [builtins] Generic support for __builtin_speculation_safe_load() Richard Earnshaw
@ 2018-01-17 17:17   ` Joseph Myers
  3 siblings, 0 replies; 70+ messages in thread
From: Joseph Myers @ 2018-01-17 17:17 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: gcc-patches

This patch series seems to be missing testcases (generic and 
architecture-specific).  Generic ones should include testing the error 
cases that are diagnosed.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH 1/3] [builtins] Generic support for __builtin_speculation_safe_load()
  2018-01-17 17:22     ` Joseph Myers
@ 2018-01-17 17:22       ` Jakub Jelinek
  0 siblings, 0 replies; 70+ messages in thread
From: Jakub Jelinek @ 2018-01-17 17:22 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Richard Earnshaw, gcc-patches

On Wed, Jan 17, 2018 at 05:17:29PM +0000, Joseph Myers wrote:
> On Wed, 17 Jan 2018, Richard Earnshaw wrote:
> 
> > +  if (TREE_CODE (type) == ARRAY_TYPE)
> > +    {
> > +      /* Force array-to-pointer decay for c++.  */
> > +      gcc_assert (c_dialect_cxx ());
> 
> What's the basis for the assertion?  Why can't you have a pointer-to-array 
> passed in C?

Yeah, please see e.g. the PR82112 patches for a reason why something like
this doesn't work in C, try it with -std=gnu90 and
struct S { int a[10]; } bar (void);
...
  __whatever_builtin (bar ().a, ...);

	Jakub


* Re: [PATCH 1/3] [builtins] Generic support for __builtin_speculation_safe_load()
  2018-01-17 14:59   ` [PATCH 1/3] [builtins] Generic support for __builtin_speculation_safe_load() Richard Earnshaw
@ 2018-01-17 17:22     ` Joseph Myers
  2018-01-17 17:22       ` Jakub Jelinek
  2018-01-18 12:48     ` Richard Biener
  1 sibling, 1 reply; 70+ messages in thread
From: Joseph Myers @ 2018-01-17 17:22 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: gcc-patches

On Wed, 17 Jan 2018, Richard Earnshaw wrote:

> +  if (TREE_CODE (type) == ARRAY_TYPE)
> +    {
> +      /* Force array-to-pointer decay for c++.  */
> +      gcc_assert (c_dialect_cxx ());

What's the basis for the assertion?  Why can't you have a pointer-to-array 
passed in C?

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: [PATCH 1/3] [builtins] Generic support for __builtin_speculation_safe_load()
  2018-01-17 14:59   ` [PATCH 1/3] [builtins] Generic support for __builtin_speculation_safe_load() Richard Earnshaw
  2018-01-17 17:22     ` Joseph Myers
@ 2018-01-18 12:48     ` Richard Biener
  2018-01-18 12:52       ` Richard Biener
  2018-01-18 14:17       ` Richard Earnshaw (lists)
  1 sibling, 2 replies; 70+ messages in thread
From: Richard Biener @ 2018-01-18 12:48 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: GCC Patches

On Wed, Jan 17, 2018 at 3:55 PM, Richard Earnshaw
<Richard.Earnshaw@arm.com> wrote:
>
> This patch adds generic support for the new builtin
> __builtin_speculation_safe_load.  It provides the overloading of the
> different access sizes and a default fall-back expansion for targets
> that do not support a mechanism for inhibiting speculation.

+  if (ignore)
+    {
+      warning_at (input_location, 0,
+                 "result of __builtin_speculation_safe_load must be used to "
+                 "ensure correct operation");
+      target = NULL;
+    }

This warning cannot be disabled but these kind of cases could appear via
path isolation or missed optimizations since memory optimizations do not
recognize such speculative loads.  So - should it not be emitted way
earlier instead, like from the FEs or during some early warning pass?

In which case it could be an error as well, no?

+  set_mem_align (mem, MAX (GET_MODE_ALIGNMENT (mode),
+                          get_pointer_alignment (arg0)));

err...  so there's no way to do an unaligned speculation safe load?  Using
just get_pointer_alignment would be safe here.  Other builtins/IFNs explicitly
pass down the alignment and FEs generate such alignment from their
language constraints.

+  set_mem_alias_set (mem, get_alias_set (TREE_TYPE (TREE_TYPE (arg0))));

sorry, but pointer types are arbitrary in GIMPLE so this is clearly
wrong.  Without
more information you have to use zero.  Thus,

__builtin_speculation_safe_load ((int *)p, ...);

will do the wrong thing if the type of p is not int *.

+  /* Mark the memory access as volatile.  We don't want the optimizers to
+     move it or otherwise substitute an alternative value.  */
+  MEM_VOLATILE_P (mem) = 1;

how is moving or substituting in any way related to inhibiting speculation?
Why's this done for all targets rather than only those that need any such
mitigation?

I btw miss implementation for x86_64 (at least) where the agreed upon mitigation
is to insert 'lfence' before the load.

Any reason the builtin expansion is not fully left to the targets?

+/* Suppressing speculation.  Users are expected to use the first (N)
+   variant, which will be translated internally into one of the other
+   types.  */
+DEF_GCC_BUILTIN (BUILT_IN_SPECULATION_SAFE_LOAD_N, "speculation_safe_load",
+                BT_FN_VOID_VAR, ATTR_NULL)
+
+DEF_GCC_BUILTIN (BUILT_IN_SPECULATION_SAFE_LOAD_1, "speculation_safe_load_1",
+                BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, ATTR_NULL)
+DEF_GCC_BUILTIN (BUILT_IN_SPECULATION_SAFE_LOAD_2, "speculation_safe_load_2",
+                BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, ATTR_NULL)

any reason you are not simply using an internal function for the
non-_N variants?
Looks like you're closely following the atomic/sync builtins but those
predate IFNs I think.
They also have more constraints on alignment and use a restrictive alias set.

+rtx
+default_speculation_safe_load (machine_mode mode ATTRIBUTE_UNUSED,
+                              rtx result, rtx mem, rtx lower_bound,
+                              rtx upper_bound, rtx cmpptr, bool warn)
+{
+  rtx_code_label *done_label = gen_label_rtx ();
+  rtx_code_label *inrange_label = gen_label_rtx ();
+
+  if (warn)
+    warning_at
+      (input_location, 0,
+       "this target does not support anti-speculation operations.  "
+       "Your program will still execute correctly, but speculation "
+       "will not be inhibited");

so there's no way to inhibit this warning than changing all targets?

Iff it is correct for a target to expand this to an unconditional load with a
preceding fence, why not have the generic target hook implementation
do exactly that - emit an unconditional move?  Maybe I misunderstand
the clear words in the hook docs, but the builtin docs also say
the behavior is undefined if the builtin is executed when the condition
is false - which means traps are allowed.

Richard.


>         * builtin-types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):
>         New builtin type signature.
>         (BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>         (BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>         (BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>         (BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>         * builtins.def (BUILT_IN_SPECULATION_SAFE_LOAD_N): New builtin.
>         (BUILT_IN_SPECULATION_SAFE_LOAD_1): Likewise.
>         (BUILT_IN_SPECULATION_SAFE_LOAD_2): Likewise.
>         (BUILT_IN_SPECULATION_SAFE_LOAD_4): Likewise.
>         (BUILT_IN_SPECULATION_SAFE_LOAD_8): Likewise.
>         (BUILT_IN_SPECULATION_SAFE_LOAD_16): Likewise.
>         * target.def (speculation_safe_load): New hook.
>         * doc/tm.texi.in (TARGET_SPECULATION_SAFE_LOAD): Add to
>         documentation.
>         * doc/tm.texi: Regenerated.
>         * doc/cpp.texi: Document __HAVE_SPECULATION_SAFE_LOAD.
>         * doc/extend.texi: Document __builtin_speculation_safe_load.
>         * c-family/c-common.c (speculation_safe_load_resolve_size): New
>         function.
>         (speculation_safe_load_resolve_params): New function.
>         (speculation_safe_load_resolve_return): New function.
>         (resolve_overloaded_builtin): Handle overloading
>         __builtin_speculation_safe_load.
>         * builtins.c (expand_speculation_safe_load): New function.
>         (expand_builtin): Handle new speculation-safe builtins.
>         * targhooks.h (default_speculation_safe_load): Declare.
>         * targhooks.c (default_speculation_safe_load): New function.
> ---
>  gcc/builtin-types.def       |  16 +++++
>  gcc/builtins.c              |  81 +++++++++++++++++++++++
>  gcc/builtins.def            |  17 +++++
>  gcc/c-family/c-common.c     | 152 ++++++++++++++++++++++++++++++++++++++++++++
>  gcc/c-family/c-cppbuiltin.c |   5 +-
>  gcc/doc/cpp.texi            |   4 ++
>  gcc/doc/extend.texi         |  68 ++++++++++++++++++++
>  gcc/doc/tm.texi             |   9 +++
>  gcc/doc/tm.texi.in          |   2 +
>  gcc/target.def              |  34 ++++++++++
>  gcc/targhooks.c             |  59 +++++++++++++++++
>  gcc/targhooks.h             |   3 +
>  12 files changed, 449 insertions(+), 1 deletion(-)
>


* Re: [PATCH 1/3] [builtins] Generic support for __builtin_speculation_safe_load()
  2018-01-18 12:48     ` Richard Biener
@ 2018-01-18 12:52       ` Richard Biener
  2018-01-18 14:17       ` Richard Earnshaw (lists)
  1 sibling, 0 replies; 70+ messages in thread
From: Richard Biener @ 2018-01-18 12:52 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: GCC Patches

On Thu, Jan 18, 2018 at 1:44 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Wed, Jan 17, 2018 at 3:55 PM, Richard Earnshaw
> <Richard.Earnshaw@arm.com> wrote:
>>
>> This patch adds generic support for the new builtin
>> __builtin_speculation_safe_load.  It provides the overloading of the
>> different access sizes and a default fall-back expansion for targets
>> that do not support a mechanism for inhibiting speculation.
>
> +  if (ignore)
> +    {
> +      warning_at (input_location, 0,
> +                 "result of __builtin_speculation_safe_load must be used to "
> +                 "ensure correct operation");
> +      target = NULL;
> +    }
>
> This warning cannot be disabled but these kind of cases could appear via
> path isolation or missed optimizations since memory optimizations do not
> recognize such speculative loads.  So - should it not be emitted way
> earlier instead, like from the FEs or during some early warning pass?
>
> In which case it could be an error as well, no?
>
> +  set_mem_align (mem, MAX (GET_MODE_ALIGNMENT (mode),
> +                          get_pointer_alignment (arg0)));
>
> err...  so there's no way to do an unaligned speculation safe load?  Using
> just get_pointer_alignment would be safe here.  Other builtins/IFNs explicitly
> pass down the alignment and FEs generate such alignment from their
> language constraints.
>
> +  set_mem_alias_set (mem, get_alias_set (TREE_TYPE (TREE_TYPE (arg0))));
>
> sorry, but pointer types are arbitrary in GIMPLE so this is clearly
> wrong.  Without
> more information you have to use zero.  Thus,
>
> __builtin_speculation_safe_load ((int *)p, ...);
>
> will do the wrong thing if the type of p is not int *.
>
> +  /* Mark the memory access as volatile.  We don't want the optimizers to
> +     move it or otherwise substitute an alternative value.  */
> +  MEM_VOLATILE_P (mem) = 1;
>
> how is moving or substituting in any way related to inhibiting speculation?
> Why's this done for all targets rather than only those that need any such
> mitigation?
>
> I btw miss implementation for x86_64 (at least) where the agreed upon mitigation
> is to insert 'lfence' before the load.
>
> Any reason the builtin expansion is not fully left to the targets?
>
> +/* Suppressing speculation.  Users are expected to use the first (N)
> +   variant, which will be translated internally into one of the other
> +   types.  */
> +DEF_GCC_BUILTIN (BUILT_IN_SPECULATION_SAFE_LOAD_N, "speculation_safe_load",
> +                BT_FN_VOID_VAR, ATTR_NULL)
> +
> +DEF_GCC_BUILTIN (BUILT_IN_SPECULATION_SAFE_LOAD_1, "speculation_safe_load_1",
> +                BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, ATTR_NULL)
> +DEF_GCC_BUILTIN (BUILT_IN_SPECULATION_SAFE_LOAD_2, "speculation_safe_load_2",
> +                BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, ATTR_NULL)
>
> any reason you are not simply using an internal function for the
> non-_N variants?
> Looks like you're closely following the atomic/sync builtins but those
> predate IFNs I think.
> They also have more constraints on alignment and use a restrictive alias set.
>
> +rtx
> +default_speculation_safe_load (machine_mode mode ATTRIBUTE_UNUSED,
> +                              rtx result, rtx mem, rtx lower_bound,
> +                              rtx upper_bound, rtx cmpptr, bool warn)
> +{
> +  rtx_code_label *done_label = gen_label_rtx ();
> +  rtx_code_label *inrange_label = gen_label_rtx ();
> +
> +  if (warn)
> +    warning_at
> +      (input_location, 0,
> +       "this target does not support anti-speculation operations.  "
> +       "Your program will still execute correctly, but speculation "
> +       "will not be inhibited");
>
> so there's no way to inhibit this warning than changing all targets?
>
> Iff it is correct for a target to expand this to an unconditional load with a
> preceding fence, why not have the generic target hook implementation
> do exactly that - emit an unconditional move?  Maybe I misunderstand
> the clear words in the hook docs, but the builtin docs also say
> the behavior is undefined if the builtin is executed when the condition
> is false - which means traps are allowed.

Oh, and I think I'd like to see the builtins folded to regular MEMs very early
if the target doesn't provide TARGET_SPECULATION_SAFE_LOAD.  Ideally
directly from where _n is disambiguated in the FE.

Richard.

> Richard.
>
>
>>         * builtin-types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):
>>         New builtin type signature.
>>         (BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>         (BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>         (BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>         (BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>         * builtins.def (BUILT_IN_SPECULATION_SAFE_LOAD_N): New builtin.
>>         (BUILT_IN_SPECULATION_SAFE_LOAD_1): Likewise.
>>         (BUILT_IN_SPECULATION_SAFE_LOAD_2): Likewise.
>>         (BUILT_IN_SPECULATION_SAFE_LOAD_4): Likewise.
>>         (BUILT_IN_SPECULATION_SAFE_LOAD_8): Likewise.
>>         (BUILT_IN_SPECULATION_SAFE_LOAD_16): Likewise.
>>         * target.def (speculation_safe_load): New hook.
>>         * doc/tm.texi.in (TARGET_SPECULATION_SAFE_LOAD): Add to
>>         documentation.
>>         * doc/tm.texi: Regenerated.
>>         * doc/cpp.texi: Document __HAVE_SPECULATION_SAFE_LOAD.
>>         * doc/extend.texi: Document __builtin_speculation_safe_load.
>>         * c-family/c-common.c (speculation_safe_load_resolve_size): New
>>         function.
>>         (speculation_safe_load_resolve_params): New function.
>>         (speculation_safe_load_resolve_return): New function.
>>         (resolve_overloaded_builtin): Handle overloading
>>         __builtin_speculation_safe_load.
>>         * builtins.c (expand_speculation_safe_load): New function.
>>         (expand_builtin): Handle new speculation-safe builtins.
>>         * targhooks.h (default_speculation_safe_load): Declare.
>>         * targhooks.c (default_speculation_safe_load): New function.
>> ---
>>  gcc/builtin-types.def       |  16 +++++
>>  gcc/builtins.c              |  81 +++++++++++++++++++++++
>>  gcc/builtins.def            |  17 +++++
>>  gcc/c-family/c-common.c     | 152 ++++++++++++++++++++++++++++++++++++++++++++
>>  gcc/c-family/c-cppbuiltin.c |   5 +-
>>  gcc/doc/cpp.texi            |   4 ++
>>  gcc/doc/extend.texi         |  68 ++++++++++++++++++++
>>  gcc/doc/tm.texi             |   9 +++
>>  gcc/doc/tm.texi.in          |   2 +
>>  gcc/target.def              |  34 ++++++++++
>>  gcc/targhooks.c             |  59 +++++++++++++++++
>>  gcc/targhooks.h             |   3 +
>>  12 files changed, 449 insertions(+), 1 deletion(-)
>>


* Re: [PATCH 1/3] [builtins] Generic support for __builtin_speculation_safe_load()
  2018-01-18 12:48     ` Richard Biener
  2018-01-18 12:52       ` Richard Biener
@ 2018-01-18 14:17       ` Richard Earnshaw (lists)
  2018-01-18 14:59         ` Richard Biener
  1 sibling, 1 reply; 70+ messages in thread
From: Richard Earnshaw (lists) @ 2018-01-18 14:17 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

On 18/01/18 12:44, Richard Biener wrote:
> On Wed, Jan 17, 2018 at 3:55 PM, Richard Earnshaw
> <Richard.Earnshaw@arm.com> wrote:
>>
>> This patch adds generic support for the new builtin
>> __builtin_speculation_safe_load.  It provides the overloading of the
>> different access sizes and a default fall-back expansion for targets
>> that do not support a mechanism for inhibiting speculation.
> 
> +  if (ignore)
> +    {
> +      warning_at (input_location, 0,
> +                 "result of __builtin_speculation_safe_load must be used to "
> +                 "ensure correct operation");
> +      target = NULL;
> +    }
> 
> This warning cannot be disabled but these kind of cases could appear via
> path isolation or missed optimizations since memory optimizations do not
> recognize such speculative loads.  So - should it not be emitted way
> earlier instead, like from the FEs or during some early warning pass?

If you think so, and can recommend where to put it.  I'm not
particularly familiar with the front-end code, being mostly a back-end
maintainer.  Otherwise we could probably drop this entirely, but that
gives scope for users to do the Wrong Thing(tm)!

> 
> In which case it could be an error as well, no?
> 
> +  set_mem_align (mem, MAX (GET_MODE_ALIGNMENT (mode),
> +                          get_pointer_alignment (arg0)));
> 
> err...  so there's no way to do an unaligned speculation safe load?  Using
>> just get_pointer_alignment would be safe here.  Other builtins/IFNs explicitly
> pass down the alignment and FEs generate such alignment from their
> language constraints.

No, last time I looked GCC didn't support taking the address of an
unaligned object, and casting a pointer to char to a pointer to int
implies you know that it is correctly aligned.

> 
> +  set_mem_alias_set (mem, get_alias_set (TREE_TYPE (TREE_TYPE (arg0))));
> 
> sorry, but pointer types are arbitrary in GIMPLE so this is clearly
> wrong.  Without
> more information you have to use zero.  Thus,
> 
> __builtin_speculation_safe_load ((int *)p, ...);
> 
> will do the wrong thing if the type of p is not int *.

Ok, I'll fix that.

> 
> +  /* Mark the memory access as volatile.  We don't want the optimizers to
>> +     move it or otherwise substitute an alternative value.  */
> +  MEM_VOLATILE_P (mem) = 1;
> 
> how is moving or substituting in any way related to inhibiting speculation?
> Why's this done for all targets rather than only those that need any such
> mitigation?

I can move that, but it seemed right to do it here.

> 
> I btw miss implementation for x86_64 (at least) where the agreed upon mitigation
> is to insert 'lfence' before the load.

/I'm/ not planning on doing implementations for other targets.  I don't
know those back-ends.  Sorry.  It really needs the attention of a port
expert.

> 
> Any reason the builtin expansion is not fully left to the targets?

Because that would be a significant impediment to deploying the builtin
in generic code.  The compiler would have to error out if it couldn't
handle the builtin at all.  By providing a generic implementation we can
support everything, even if you do get a warning.

> 
> +/* Suppressing speculation.  Users are expected to use the first (N)
> +   variant, which will be translated internally into one of the other
> +   types.  */
> +DEF_GCC_BUILTIN (BUILT_IN_SPECULATION_SAFE_LOAD_N, "speculation_safe_load",
> +                BT_FN_VOID_VAR, ATTR_NULL)
> +
> +DEF_GCC_BUILTIN (BUILT_IN_SPECULATION_SAFE_LOAD_1, "speculation_safe_load_1",
> +                BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, ATTR_NULL)
> +DEF_GCC_BUILTIN (BUILT_IN_SPECULATION_SAFE_LOAD_2, "speculation_safe_load_2",
> +                BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, ATTR_NULL)
> 
> any reason you are not simply using an internal function for the
> non-_N variants?
> Looks like you're closely following the atomic/sync builtins but those
> predate IFNs I think.
> They also have more constraints on alignment and use a restrictive alias set.

Wouldn't that make it more difficult to do backports to older versions
of GCC?  I know some folk are interested in going back as far as
gcc-4.8, and maybe even earlier.

> 
> +rtx
> +default_speculation_safe_load (machine_mode mode ATTRIBUTE_UNUSED,
> +                              rtx result, rtx mem, rtx lower_bound,
> +                              rtx upper_bound, rtx cmpptr, bool warn)
> +{
> +  rtx_code_label *done_label = gen_label_rtx ();
> +  rtx_code_label *inrange_label = gen_label_rtx ();
> +
> +  if (warn)
> +    warning_at
> +      (input_location, 0,
> +       "this target does not support anti-speculation operations.  "
> +       "Your program will still execute correctly, but speculation "
> +       "will not be inhibited");
> 
> so there's no way to inhibit this warning than changing all targets?

No.  It's trivial to bind this to a hook that just passes all the other
arguments through to the generic code with the warning suppressed - we
could even add an additional definition to do that.  But binding to it
directly as the default means that port maintainers may not notice that
they need to take action here, even if that is to confirm that they've
nothing to worry about.

> 
> Iff it is correct for a target to expand this to an unconditional load with a
>> preceding fence, why not have the generic target hook implementation
> do exactly that - emit an unconditional move?  Maybe I misunderstand
> the clear words in the hook docs, but the builtin docs also say
> the behavior is undefined if the builtin is executed when the condition
> is false - which means traps are allowed.
> 

Certainly a permitted implementation of the builtin would conditionally
nullify the pointer if it were out-of-bounds (rather than nullifying the
result).  That would then fault if reached when not speculating.

As to why that isn't the default implementation, three reasons.  Firstly
the code here is mostly unchanged from the first implementation I did
which required the conditional behaviour.  Secondly, on some systems
with simple static branch predictors and speculation the default
expansion may be "good enough" without an additional barrier.  Finally,
not all architectures define a suitable barrier so there's nothing to
expand to in that case.

R.

> Richard.
> 
> 
>>         * builtin-types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):
>>         New builtin type signature.
>>         (BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>         (BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>         (BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>         (BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>         * builtins.def (BUILT_IN_SPECULATION_SAFE_LOAD_N): New builtin.
>>         (BUILT_IN_SPECULATION_SAFE_LOAD_1): Likewise.
>>         (BUILT_IN_SPECULATION_SAFE_LOAD_2): Likewise.
>>         (BUILT_IN_SPECULATION_SAFE_LOAD_4): Likewise.
>>         (BUILT_IN_SPECULATION_SAFE_LOAD_8): Likewise.
>>         (BUILT_IN_SPECULATION_SAFE_LOAD_16): Likewise.
>>         * target.def (speculation_safe_load): New hook.
>>         * doc/tm.texi.in (TARGET_SPECULATION_SAFE_LOAD): Add to
>>         documentation.
>>         * doc/tm.texi: Regenerated.
>>         * doc/cpp.texi: Document __HAVE_SPECULATION_SAFE_LOAD.
>>         * doc/extend.texi: Document __builtin_speculation_safe_load.
>>         * c-family/c-common.c (speculation_safe_load_resolve_size): New
>>         function.
>>         (speculation_safe_load_resolve_params): New function.
>>         (speculation_safe_load_resolve_return): New function.
>>         (resolve_overloaded_builtin): Handle overloading
>>         __builtin_speculation_safe_load.
>>         * builtins.c (expand_speculation_safe_load): New function.
>>         (expand_builtin): Handle new speculation-safe builtins.
>>         * targhooks.h (default_speculation_safe_load): Declare.
>>         * targhooks.c (default_speculation_safe_load): New function.
>> ---
>>  gcc/builtin-types.def       |  16 +++++
>>  gcc/builtins.c              |  81 +++++++++++++++++++++++
>>  gcc/builtins.def            |  17 +++++
>>  gcc/c-family/c-common.c     | 152 ++++++++++++++++++++++++++++++++++++++++++++
>>  gcc/c-family/c-cppbuiltin.c |   5 +-
>>  gcc/doc/cpp.texi            |   4 ++
>>  gcc/doc/extend.texi         |  68 ++++++++++++++++++++
>>  gcc/doc/tm.texi             |   9 +++
>>  gcc/doc/tm.texi.in          |   2 +
>>  gcc/target.def              |  34 ++++++++++
>>  gcc/targhooks.c             |  59 +++++++++++++++++
>>  gcc/targhooks.h             |   3 +
>>  12 files changed, 449 insertions(+), 1 deletion(-)
>>


* Re: [PATCH 1/3] [builtins] Generic support for __builtin_speculation_safe_load()
  2018-01-18 14:17       ` Richard Earnshaw (lists)
@ 2018-01-18 14:59         ` Richard Biener
  0 siblings, 0 replies; 70+ messages in thread
From: Richard Biener @ 2018-01-18 14:59 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: GCC Patches

On Thu, Jan 18, 2018 at 3:07 PM, Richard Earnshaw (lists)
<Richard.Earnshaw@arm.com> wrote:
> On 18/01/18 12:44, Richard Biener wrote:
>> On Wed, Jan 17, 2018 at 3:55 PM, Richard Earnshaw
>> <Richard.Earnshaw@arm.com> wrote:
>>>
>>> This patch adds generic support for the new builtin
>>> __builtin_speculation_safe_load.  It provides the overloading of the
>>> different access sizes and a default fall-back expansion for targets
>>> that do not support a mechanism for inhibiting speculation.
>>
>> +  if (ignore)
>> +    {
>> +      warning_at (input_location, 0,
>> +                 "result of __builtin_speculation_safe_load must be used to "
>> +                 "ensure correct operation");
>> +      target = NULL;
>> +    }
>>
>> This warning cannot be disabled but these kind of cases could appear via
>> path isolation or missed optimizations since memory optimizations do not
>> recognize such speculative loads.  So - should it not be emitted way
>> earlier instead, like from the FEs or during some early warning pass?
>
> If you think so, and can recommend where to put it.  I'm not
> particularly familiar with the front-end code, being mostly a back-end
> maintainer.  Otherwise we could probably drop this entirely, but that
> gives scope for users to do the Wrong Thing(tm)!

I'd say the same place you do the __builtin_X -> __builtin_X_{1,2,4,8}
disambiguation.

>>
>> In which case it could be an error as well, no?
>>
>> +  set_mem_align (mem, MAX (GET_MODE_ALIGNMENT (mode),
>> +                          get_pointer_alignment (arg0)));
>>
>> err...  so there's no way to do an unaligned speculation safe load?  Using
>> just get_pointer_alignment would be safe here.   Other builtins/IFNs explicitly
>> pass down the alignment and FEs generate such alignment from their
>> language constraints.
>
> No; last time I looked, GCC didn't support taking the address of an
> unaligned object, and casting a pointer to char to a pointer to int implies
> you know that it is correctly aligned.

Sure, just take the address of a packed struct member.

__builtin_speculation_safe_load (&p->x, ...);

where's the cast here?

>
>>
>> +  set_mem_alias_set (mem, get_alias_set (TREE_TYPE (TREE_TYPE (arg0))));
>>
>> sorry, but pointer types are arbitrary in GIMPLE so this is clearly
>> wrong.  Without
>> more information you have to use zero.  Thus,
>>
>> __builtin_speculation_safe_load ((int *)p, ...);
>>
>> will do the wrong thing if the type of p is not int *.
>
> Ok, I'll fix that.
>
>>
>> +  /* Mark the memory access as volatile.  We don't want the optimizers to
>> +     move it or otherwise substitute an alternative value.  */
>> +  MEM_VOLATILE_P (mem) = 1;
>>
>> how is moving or substituting in any way related to inhibiting speculation?
>> Why's this done for all targets rather than only those that need any such
>> mitigation?
>
> I can move that, but it seemed right to do it here.
>
>>
>> I btw miss implementation for x86_64 (at least) where the agreed upon mitigation
>> is to insert 'lfence' before the load.
>
> /I'm/ not planning on doing implementations for other targets.  I don't
> know those back-ends.  Sorry.  It really needs the attention of a port
> expert.

Ok, fair enough.  But I'd not like to backport and release an
incomplete feature.

>>
>> Any reason the builtin expansion is not fully left to the targets?
>
> Because that would be a significant impediment to deploying the builtin
> in generic code.  The compiler would have to error out if it couldn't
> handle the builtin at all.  By providing a generic implementation we can
> support everything, even if you do get a warning.

I meant you are doing part of the expansion in generic code, including
expanding the MEM, setting it volatile, expanding the bounds (even if
a target only emits a barrier and the load).

So I'd expected expand_speculation_safe_load to just call the hook.
_Maybe_ do some initial MEM expansion.

>>
>> +/* Suppressing speculation.  Users are expected to use the first (N)
>> +   variant, which will be translated internally into one of the other
>> +   types.  */
>> +DEF_GCC_BUILTIN (BUILT_IN_SPECULATION_SAFE_LOAD_N, "speculation_safe_load",
>> +                BT_FN_VOID_VAR, ATTR_NULL)
>> +
>> +DEF_GCC_BUILTIN (BUILT_IN_SPECULATION_SAFE_LOAD_1, "speculation_safe_load_1",
>> +                BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, ATTR_NULL)
>> +DEF_GCC_BUILTIN (BUILT_IN_SPECULATION_SAFE_LOAD_2, "speculation_safe_load_2",
>> +                BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR, ATTR_NULL)
>>
>> any reason you are not simply using an internal function for the
>> non-_N variants?
>> Looks like you're closely following the atomic/sync builtins but those
>> predate IFNs I think.
>> They also have more constraints on alignment and use a restrictive alias set.
>
> Wouldn't that make it more difficult to do backports to older versions
> of GCC?  I know some folk are interested in going back as far as
> gcc-4.8, and maybe even earlier.

Possibly.

>>
>> +rtx
>> +default_speculation_safe_load (machine_mode mode ATTRIBUTE_UNUSED,
>> +                              rtx result, rtx mem, rtx lower_bound,
>> +                              rtx upper_bound, rtx cmpptr, bool warn)
>> +{
>> +  rtx_code_label *done_label = gen_label_rtx ();
>> +  rtx_code_label *inrange_label = gen_label_rtx ();
>> +
>> +  if (warn)
>> +    warning_at
>> +      (input_location, 0,
>> +       "this target does not support anti-speculation operations.  "
>> +       "Your program will still execute correctly, but speculation "
>> +       "will not be inhibited");
>>
>> so there's no way to inhibit this warning other than changing all targets?
>
> No.  It's trivial to bind this to a hook that just passes all the other
> arguments through to the generic code with the warning suppressed - we
> could even add an additional definition to do that.  But binding to it
> directly as the default means that port maintainers may not notice that
> they need to take action here; even if that is to confirm that they've
> nothing to worry about.

The machinery to do that would be to fail the build (of GCC!), not to
bother the users...

>> Iff it is correct for a target to expand this to an unconditional load with a
>> preceding fence why not have the generic target hook implementation
>> do exactly that - emit an unconditional move?  Maybe I misunderstand
>> the clear words in the hook docs, but the builtin docs also say
>> the behavior is undefined if the builtin is executed when the condition
>> is false - which means traps are allowed.
>>
>
> Certainly a permitted implementation of the builtin would conditionally
> nullify the pointer if it were out-of-bounds (rather than nullifying the
> result).  That would then fault if reached when not speculating.
>
> As to why that isn't the default implementation, three reasons.  Firstly
> the code here is mostly unchanged from the first implementation I did
> which required the conditional behaviour.  Secondly, on some systems
> with simple static branch predictors and speculation the default
> expansion may be "good enough" without an additional barrier.  Finally,
> not all architectures define a suitable barrier so there's nothing to
> expand to in that case.

But that's mostly speculation?  Either we add an optab for this kind of
barrier and use it when available, or we leave it all to the target, in which
case the default should expand like for an unaffected arch.

Richard.

> R.
>
>> Richard.
>>
>>
>>>         * builtin-types.def (BT_FN_I1_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR):
>>>         New builtin type signature.
>>>         (BT_FN_I2_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>>         (BT_FN_I4_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>>         (BT_FN_I8_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>>         (BT_FN_I16_CONST_VPTR_CONST_VPTR_CONST_VPTR_VAR): Likewise.
>>>         * builtins.def (BUILT_IN_SPECULATION_SAFE_LOAD_N): New builtin.
>>>         (BUILT_IN_SPECULATION_SAFE_LOAD_1): Likewise.
>>>         (BUILT_IN_SPECULATION_SAFE_LOAD_2): Likewise.
>>>         (BUILT_IN_SPECULATION_SAFE_LOAD_4): Likewise.
>>>         (BUILT_IN_SPECULATION_SAFE_LOAD_8): Likewise.
>>>         (BUILT_IN_SPECULATION_SAFE_LOAD_16): Likewise.
>>>         * target.def (speculation_safe_load): New hook.
>>>         * doc/tm.texi.in (TARGET_SPECULATION_SAFE_LOAD): Add to
>>>         documentation.
>>>         * doc/tm.texi: Regenerated.
>>>         * doc/cpp.texi: Document __HAVE_SPECULATION_SAFE_LOAD.
>>>         * doc/extend.texi: Document __builtin_speculation_safe_load.
>>>         * c-family/c-common.c (speculation_safe_load_resolve_size): New
>>>         function.
>>>         (speculation_safe_load_resolve_params): New function.
>>>         (speculation_safe_load_resolve_return): New function.
>>>         (resolve_overloaded_builtin): Handle overloading
>>>         __builtin_speculation_safe_load.
>>>         * builtins.c (expand_speculation_safe_load): New function.
>>>         (expand_builtin): Handle new speculation-safe builtins.
>>>         * targhooks.h (default_speculation_safe_load): Declare.
>>>         * targhooks.c (default_speculation_safe_load): New function.
>>> ---
>>>  gcc/builtin-types.def       |  16 +++++
>>>  gcc/builtins.c              |  81 +++++++++++++++++++++++
>>>  gcc/builtins.def            |  17 +++++
>>>  gcc/c-family/c-common.c     | 152 ++++++++++++++++++++++++++++++++++++++++++++
>>>  gcc/c-family/c-cppbuiltin.c |   5 +-
>>>  gcc/doc/cpp.texi            |   4 ++
>>>  gcc/doc/extend.texi         |  68 ++++++++++++++++++++
>>>  gcc/doc/tm.texi             |   9 +++
>>>  gcc/doc/tm.texi.in          |   2 +
>>>  gcc/target.def              |  34 ++++++++++
>>>  gcc/targhooks.c             |  59 +++++++++++++++++
>>>  gcc/targhooks.h             |   3 +
>>>  12 files changed, 449 insertions(+), 1 deletion(-)
>>>
>


* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-09 17:53       ` Richard Earnshaw (lists)
  2018-01-09 18:00         ` Bernd Edlinger
@ 2018-01-09 22:54         ` Bernd Edlinger
  1 sibling, 0 replies; 70+ messages in thread
From: Bernd Edlinger @ 2018-01-09 22:54 UTC (permalink / raw)
  To: Richard Earnshaw (lists), gcc-patches

On 01/09/18 18:50, Richard Earnshaw (lists) wrote:
> On 09/01/18 17:36, Bernd Edlinger wrote:
>> Richard Earnshaw wrote:
>>   > Let me give an example, we use the generic code expansion if we
>>   > encounter the builtin when generating code for Thumb on pre-ARMv7
>>   > devices.  We don't have instructions in 'thumb1' to guard against
>>   > speculation and we really want to warn users that they've done this (it
>>   > might be a mistake in how they're invoking the compiler).
>>
>> Instead of this warning in one of the unsupported cases, could you use
>> the DSB SYS + ISB  barrier as the white paper suggests?
>>
>>
>> Bernd.
> 
> Thumb1 doesn't have those instructions.
> 


I don't know if it helps, but a few years ago I wrote some
assembler macros that allow temporarily switching from THUMB to ARM
mode and back on an ARMv5 target; they used to work quite well:

# define HAL_ARM_MODE() ";"                              \
               " .align 2;"                               \
               " bx pc;"                                  \
               " nop;"                                    \
               " .code 32;"

# define HAL_THUMB_MODE(_scratch_) ";"                   \
               " orr " #_scratch_ ",pc,#1;"               \
               " bx " #_scratch_ ";"                      \
               " .code 16;"

#define HAL_DISABLE_INTERRUPTS(_old_)           \
     asm volatile (                              \
         HAL_ARM_MODE()                          \
         "mrs %0,cpsr;"                          \
         "orr r3,%0,%1;"                         \
         "msr cpsr,r3"                           \
         HAL_THUMB_MODE(r3)                      \
         : "=r"(_old_)                           \
         : "i"(CPSR_INTR_MASK)                   \
         : "r3"                                  \
         );

#define HAL_ENABLE_INTERRUPTS()                 \
     asm volatile (                              \
         HAL_ARM_MODE()                          \
         "mrs r3,cpsr;"                          \
         "bic r3,r3,%0;"                         \
         "msr cpsr,r3"                           \
         HAL_THUMB_MODE(r3)                      \
         :                                       \
         : "i"(CPSR_INTR_MASK)                   \
         : "r3"                                  \
         );


So the switch-back macro just needed a scratch register,
that's all.


Bernd.


* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-09 18:21             ` Richard Earnshaw (lists)
@ 2018-01-09 18:22               ` Bernd Edlinger
  0 siblings, 0 replies; 70+ messages in thread
From: Bernd Edlinger @ 2018-01-09 18:22 UTC (permalink / raw)
  To: Richard Earnshaw (lists), gcc-patches

On 01/09/18 19:08, Richard Earnshaw (lists) wrote:
> 
> But we probably don't need to.
> 
>   int x __attribute__((mode(TI))) = 0;
> 
> $ arm-none-elf-gcc timode.c
> 
> /tmp/ti.c:1:1: error: unable to emulate ‘TI’
>   int x __attribute__((mode(TI))) = 0;
>   ^~~
> 

Yes, good, then you should probably do this:

+  if (TARGET_THUMB1)
+    return default_inhibit_load_speculation (mode, result, mem, 
lower_bound,
+					     upper_bound, fail_result, cmpptr);


Bernd.


* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-09 18:01           ` Richard Earnshaw (lists)
@ 2018-01-09 18:21             ` Richard Earnshaw (lists)
  2018-01-09 18:22               ` Bernd Edlinger
  0 siblings, 1 reply; 70+ messages in thread
From: Richard Earnshaw (lists) @ 2018-01-09 18:21 UTC (permalink / raw)
  To: Bernd Edlinger, gcc-patches

On 09/01/18 18:00, Richard Earnshaw (lists) wrote:
> On 09/01/18 17:57, Bernd Edlinger wrote:
>> On 01/09/18 18:50, Richard Earnshaw (lists) wrote:
>>> On 09/01/18 17:36, Bernd Edlinger wrote:
>>>> Richard Earnshaw wrote:
>>>>   > Let me give an example, we use the generic code expansion if we
>>>>   > encounter the builtin when generating code for Thumb on pre-ARMv7
>>>>   > devices.  We don't have instructions in 'thumb1' to guard against
>>>>   > speculation and we really want to warn users that they've done this (it
>>>>   > might be a mistake in how they're invoking the compiler).
>>>>
>>>> Instead of this warning in one of the unsupported cases, could you use
>>>> the DSB SYS + ISB  barrier as the white paper suggests?
>>>>
>>>>
>>>> Bernd.
>>>
>>> Thumb1 doesn't have those instructions.
>>> 
>> 
>> Ah, well, and in the case mode == TImode ?
>> 
>> Bernd.
> 
> I don't think we can ever get a TImode integral type on AArch32.
> Perhaps we should just ICE in that case...
> 
> R.

But we probably don't need to.

 int x __attribute__((mode(TI))) = 0;

$ arm-none-elf-gcc timode.c

/tmp/ti.c:1:1: error: unable to emulate ‘TI’
 int x __attribute__((mode(TI))) = 0;
 ^~~


* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-09 18:00         ` Bernd Edlinger
@ 2018-01-09 18:01           ` Richard Earnshaw (lists)
  2018-01-09 18:21             ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 70+ messages in thread
From: Richard Earnshaw (lists) @ 2018-01-09 18:01 UTC (permalink / raw)
  To: Bernd Edlinger, gcc-patches

On 09/01/18 17:57, Bernd Edlinger wrote:
> On 01/09/18 18:50, Richard Earnshaw (lists) wrote:
>> On 09/01/18 17:36, Bernd Edlinger wrote:
>>> Richard Earnshaw wrote:
>>>   > Let me give an example, we use the generic code expansion if we
>>>   > encounter the builtin when generating code for Thumb on pre-ARMv7
>>>   > devices.  We don't have instructions in 'thumb1' to guard against
>>>   > speculation and we really want to warn users that they've done this (it
>>>   > might be a mistake in how they're invoking the compiler).
>>>
>>> Instead of this warning in one of the unsupported cases, could you use
>>> the DSB SYS + ISB  barrier as the white paper suggests?
>>>
>>>
>>> Bernd.
>>
>> Thumb1 doesn't have those instructions.
>> 
> 
> Ah, well, and in the case mode == TImode ?
> 
> Bernd.

I don't think we can ever get a TImode integral type on AArch32.
Perhaps we should just ICE in that case...

R.


* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-09 17:53       ` Richard Earnshaw (lists)
@ 2018-01-09 18:00         ` Bernd Edlinger
  2018-01-09 18:01           ` Richard Earnshaw (lists)
  2018-01-09 22:54         ` Bernd Edlinger
  1 sibling, 1 reply; 70+ messages in thread
From: Bernd Edlinger @ 2018-01-09 18:00 UTC (permalink / raw)
  To: Richard Earnshaw (lists), gcc-patches

On 01/09/18 18:50, Richard Earnshaw (lists) wrote:
> On 09/01/18 17:36, Bernd Edlinger wrote:
>> Richard Earnshaw wrote:
>>   > Let me give an example, we use the generic code expansion if we
>>   > encounter the builtin when generating code for Thumb on pre-ARMv7
>>   > devices.  We don't have instructions in 'thumb1' to guard against
>>   > speculation and we really want to warn users that they've done this (it
>>   > might be a mistake in how they're invoking the compiler).
>>
>> Instead of this warning in one of the unsupported cases, could you use
>> the DSB SYS + ISB  barrier as the white paper suggests?
>>
>>
>> Bernd.
>
> Thumb1 doesn't have those instructions.
> 

Ah, well, and in the case mode == TImode ?

Bernd.


* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-09 17:44     ` Bernd Edlinger
@ 2018-01-09 17:53       ` Richard Earnshaw (lists)
  2018-01-09 18:00         ` Bernd Edlinger
  2018-01-09 22:54         ` Bernd Edlinger
  0 siblings, 2 replies; 70+ messages in thread
From: Richard Earnshaw (lists) @ 2018-01-09 17:53 UTC (permalink / raw)
  To: Bernd Edlinger, gcc-patches

On 09/01/18 17:36, Bernd Edlinger wrote:
> Richard Earnshaw wrote:
>  > Let me give an example, we use the generic code expansion if we
>  > encounter the builtin when generating code for Thumb on pre-ARMv7
>  > devices.  We don't have instructions in 'thumb1' to guard against
>  > speculation and we really want to warn users that they've done this (it
>  > might be a mistake in how they're invoking the compiler).
> 
> Instead of this warning in one of the unsupported cases, could you use
> the DSB SYS + ISB  barrier as the white paper suggests?
> 
> 
> Bernd.

Thumb1 doesn't have those instructions.

R.


* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-08 17:52   ` Richard Earnshaw (lists)
@ 2018-01-09 17:44     ` Bernd Edlinger
  2018-01-09 17:53       ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 70+ messages in thread
From: Bernd Edlinger @ 2018-01-09 17:44 UTC (permalink / raw)
  To: Richard Earnshaw (lists), gcc-patches

Richard Earnshaw wrote:
 > Let me give an example, we use the generic code expansion if we
 > encounter the builtin when generating code for Thumb on pre-ARMv7
 > devices.  We don't have instructions in 'thumb1' to guard against
 > speculation and we really want to warn users that they've done this (it
 > might be a mistake in how they're invoking the compiler).

Instead of this warning in one of the unsupported cases, could you use 
the DSB SYS + ISB  barrier as the white paper suggests?


Bernd.


* Re: [PATCH 0/3] Add __builtin_load_no_speculate
  2018-01-08 16:17 ` Bernd Edlinger
@ 2018-01-08 17:52   ` Richard Earnshaw (lists)
  2018-01-09 17:44     ` Bernd Edlinger
  0 siblings, 1 reply; 70+ messages in thread
From: Richard Earnshaw (lists) @ 2018-01-08 17:52 UTC (permalink / raw)
  To: Bernd Edlinger, gcc-patches

On 08/01/18 16:10, Bernd Edlinger wrote:
> I thought about your new builtin again, and I wonder if
> something like that might work as well?
> 
> 
> cat despec.s
> 	.arch armv7-a
> 	.eabi_attribute 28, 1
> 	.eabi_attribute 20, 1
> 	.eabi_attribute 21, 1
> 	.eabi_attribute 23, 3
> 	.eabi_attribute 24, 1
> 	.eabi_attribute 25, 1
> 	.eabi_attribute 26, 2
> 	.eabi_attribute 30, 4
> 	.eabi_attribute 34, 1
> 	.eabi_attribute 18, 4
> 	.section	.text.startup,"ax",%progbits
> 	.align	2
> 	.global	despec_i
> 	.syntax unified
> 	.arm
> 	.fpu vfpv3-d16
> 
> despec_i:
> 	cmp r0,#0
> 	beq L0
> 	ldr r0,[r1]
> 	moveq r0,r2
> 	nop {0x14} @ CSDB
> 	str r0,[r1]
> 	mov r0,#1
> 	bx lr
> L0:	mov r0,#0
> 	bx lr
> 
> cat test.c
> extern int despec_i(int predicate, int *untrusted, int fallback);
> #define N 8
> int a[N] = {1,2,3,4,5,7,8,9};
> int a2[0x200];
> int test(int untrust)
> {
>    int x = 0;
>    if (despec_i(untrust >= 0 && untrust < N, &untrust, 0))
>    {
>       int v = a[untrust] & 0x1 ? 0x100 : 0x0;
>       x = a2[v];
>    }
>    return x;
> }
> 
> 
> So this should feed the predicate through the builtin, and
> clear the untrusted value when the condition has been
> mis-predicted.
> 
> Wouldn't that be more flexible to use?
> Or am I missing something?

Yes, if you modified your test case to be something like:


int test(int untrust)
{
   int x = 0;

   if (untrust < 0)
     return x;

   if (despec_i(untrust >= 0 && untrust < N, &untrust, 0))
   {
      int v = a[untrust] & 0x1 ? 0x100 : 0x0;
      x = a2[v];
   }
   return x;
}

then the compiler can (and will) optimize the condition to

  if (despec (true && untrust < N, &untrust, 0))

and suddenly you don't have the protection against speculative execution
that you thought you had.  That makes the API exceedingly dangerous as
we can't easily detect and inhibit all the sources of information that
the compiler might use to make such deductions.  What happens, for
example if your original case is inlined into a wider function that has
additional tests?

R.


> 
> 
> As a side note: I noticed that "nop {0x14}" seems to produce the correct
> assembler opcode without the need for using a .insn code.
> 
> 
> Bernd.
> 


* Re: [PATCH 0/3] Add __builtin_load_no_speculate
       [not found] <9f3b02c6-09a7-273c-64b3-4d00a40f99bf@hotmail.de>
@ 2018-01-08 16:17 ` Bernd Edlinger
  2018-01-08 17:52   ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 70+ messages in thread
From: Bernd Edlinger @ 2018-01-08 16:17 UTC (permalink / raw)
  To: Richard Earnshaw, gcc-patches

I thought about your new builtin again, and I wonder if
something like that might work as well?


cat despec.s
	.arch armv7-a
	.eabi_attribute 28, 1
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 2
	.eabi_attribute 30, 4
	.eabi_attribute 34, 1
	.eabi_attribute 18, 4
	.section	.text.startup,"ax",%progbits
	.align	2
	.global	despec_i
	.syntax unified
	.arm
	.fpu vfpv3-d16

despec_i:
	cmp r0,#0
	beq L0
	ldr r0,[r1]
	moveq r0,r2
	nop {0x14} @ CSDB
	str r0,[r1]
	mov r0,#1
	bx lr
L0:	mov r0,#0
	bx lr

cat test.c
extern int despec_i(int predicate, int *untrusted, int fallback);
#define N 8
int a[N] = {1,2,3,4,5,7,8,9};
int a2[0x200];
int test(int untrust)
{
   int x = 0;
   if (despec_i(untrust >= 0 && untrust < N, &untrust, 0))
   {
      int v = a[untrust] & 0x1 ? 0x100 : 0x0;
      x = a2[v];
   }
   return x;
}


So this should feed the predicate through the builtin, and
clear the untrusted value when the condition has been
mis-predicted.

Wouldn't that be more flexible to use?
Or am I missing something?


As a side note: I noticed that "nop {0x14}" seems to produce the correct
assembler opcode without the need for using a .insn code.


Bernd.


* Re: [PATCH 0/3] Add __builtin_load_no_speculate
@ 2018-01-08 15:47 Nick Clifton
  0 siblings, 0 replies; 70+ messages in thread
From: Nick Clifton @ 2018-01-08 15:47 UTC (permalink / raw)
  To: gcc-patches

Hi Guys,

  It seems to me that it might be worth taking a step back here,
  and consider adding a security framework to gcc.  Mitigations
  for CVEs in the past have resulted in individual patches being
  added to gcc, often in a target-specific manner, and with no
  real framework to support them, document them, or indicate to
  an external tool that they have been applied.

  In addition, security fixes often result in the generation of
  less optimal code, and so it might be useful to have a way to
  tell other parts of gcc that a given particular sequence should
  not be altered.

  Not that I am an expert in this area, but I do think that it is
  something that should be discussed...

Cheers
  Nick




* Re: [PATCH 0/3] Add __builtin_load_no_speculate
@ 2018-01-07 23:11 Bernd Edlinger
  0 siblings, 0 replies; 70+ messages in thread
From: Bernd Edlinger @ 2018-01-07 23:11 UTC (permalink / raw)
  To: Richard Earnshaw, gcc-patches

Hi Richard,

I wonder how to use this builtin correctly.
Frankly, it sounds way too complicated.

Do you expect we need to use this stuff on every
array access now, or is that just a theoretical
thing that can only happen with Java byte-code
interpreters?

If I assume there is an array of int, then
the correct way to check the array bounds
would be something like:

int a[N];

int x = __builtin_load_no_speculate(ptr, a, a+N);

But what if ptr points to the last byte of a[N-1]?
This would load 3 bytes past the array, right?

Then you would have to write:

x = __builtin_load_no_speculate(ptr, a, (char*)(a+N) - 3);

which is just horrible; wouldn't it be better to fold that
into the condition?

I mean, instead of...

if (cmpptr >= lower && cmpptr < upper)

...use something like:

if (cmpptr >= lower && cmpptr <= upper - sizeof(TYP))


And what happens if the untrusted pointer is unaligned?
Wouldn't that also give quite surprising values to speculate
with?


Bernd.


end of thread, other threads:[~2018-01-18 14:50 UTC | newest]

Thread overview: 70+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-04 13:58 [PATCH 0/3] Add __builtin_load_no_speculate Richard Earnshaw
2018-01-04 13:58 ` [PATCH 1/3] [builtins] Generic support for __builtin_load_no_speculate() Richard Earnshaw
2018-01-08  2:20   ` Bill Schmidt
2018-01-08  6:55     ` Jeff Law
2018-01-08 11:52       ` Richard Biener
2018-01-08 14:23       ` Bill Schmidt
2018-01-08 15:38         ` Richard Earnshaw (lists)
2018-01-08 17:02           ` Bill Schmidt
2018-01-08 19:42         ` Jeff Law
2018-01-08 21:08           ` Bill Schmidt
2018-01-10 23:48             ` Jeff Law
2018-01-08 14:19     ` Richard Earnshaw (lists)
2018-01-08 16:07       ` Bill Schmidt
2018-01-09 10:26         ` Richard Earnshaw (lists)
2018-01-09 17:11           ` Bill Schmidt
2018-01-10 23:36             ` Jeff Law
2018-01-10 23:28         ` Jeff Law
2018-01-12 16:16           ` Richard Earnshaw (lists)
2018-01-12 22:48             ` Bill Schmidt
2018-01-04 13:59 ` [PATCH 2/3] [aarch64] Implement support for __builtin_load_no_speculate Richard Earnshaw
2018-01-05  9:51   ` Richard Biener
2018-01-05 10:48     ` Richard Earnshaw (lists)
2018-01-05 16:31       ` Jeff Law
2018-01-04 13:59 ` [PATCH 3/3] [arm] Implement support for the de-speculation intrinsic Richard Earnshaw
2018-01-04 18:20 ` [PATCH 0/3] Add __builtin_load_no_speculate Joseph Myers
2018-01-05 17:26   ` Jeff Law
2018-01-09 15:30     ` Richard Earnshaw (lists)
2018-01-11  0:13       ` Jeff Law
2018-01-05  9:44 ` Richard Biener
2018-01-05 10:40   ` Richard Earnshaw (lists)
2018-01-05 10:47     ` Richard Biener
2018-01-05 10:59       ` Richard Earnshaw (lists)
2018-01-05 11:35         ` Jakub Jelinek
2018-01-05 11:58           ` Richard Earnshaw (lists)
2018-01-05 16:44             ` Jeff Law
2018-01-05 16:50       ` Jeff Law
2018-01-05 16:41     ` Jeff Law
2018-01-05 16:37   ` Jeff Law
2018-01-05 11:37 ` Alexander Monakov
2018-01-05 11:53   ` Richard Earnshaw (lists)
2018-01-05 13:08     ` Alexander Monakov
2018-01-09 10:47       ` Richard Earnshaw (lists)
2018-01-09 13:41         ` Alexander Monakov
2018-01-09 13:50           ` Richard Earnshaw (lists)
2018-01-09 14:37             ` Alexander Monakov
2018-01-10 23:48         ` Jeff Law
2018-01-05 17:24 ` Jeff Law
2018-01-09 11:44   ` Richard Earnshaw (lists)
2018-01-17 14:56 ` [PATCH 0/3] [v2] Implement __builtin_speculation_safe_load Richard Earnshaw
2018-01-17 14:56   ` [PATCH 2/3] [aarch64] Implement support for __builtin_speculation_safe_load Richard Earnshaw
2018-01-17 14:56   ` [PATCH 3/3] [arm] Implement support for the de-speculation intrinsic Richard Earnshaw
2018-01-17 14:59   ` [PATCH 1/3] [builtins] Generic support for __builtin_speculation_safe_load() Richard Earnshaw
2018-01-17 17:22     ` Joseph Myers
2018-01-17 17:22       ` Jakub Jelinek
2018-01-18 12:48     ` Richard Biener
2018-01-18 12:52       ` Richard Biener
2018-01-18 14:17       ` Richard Earnshaw (lists)
2018-01-18 14:59         ` Richard Biener
2018-01-17 17:17   ` [PATCH 0/3] [v2] Implement __builtin_speculation_safe_load Joseph Myers
2018-01-07 23:11 [PATCH 0/3] Add __builtin_load_no_speculate Bernd Edlinger
2018-01-08 15:47 Nick Clifton
     [not found] <9f3b02c6-09a7-273c-64b3-4d00a40f99bf@hotmail.de>
2018-01-08 16:17 ` Bernd Edlinger
2018-01-08 17:52   ` Richard Earnshaw (lists)
2018-01-09 17:44     ` Bernd Edlinger
2018-01-09 17:53       ` Richard Earnshaw (lists)
2018-01-09 18:00         ` Bernd Edlinger
2018-01-09 18:01           ` Richard Earnshaw (lists)
2018-01-09 18:21             ` Richard Earnshaw (lists)
2018-01-09 18:22               ` Bernd Edlinger
2018-01-09 22:54         ` Bernd Edlinger
