public inbox for gcc-patches@gcc.gnu.org
* [PATCH 3/5] x86: Add -mfunction-return=
  2018-01-07 22:59 [PATCH 0/5] x86: CVE-2017-5715, aka Spectre H.J. Lu
                   ` (2 preceding siblings ...)
  2018-01-07 22:59 ` [PATCH 5/5] x86: Add 'V' register operand modifier H.J. Lu
@ 2018-01-07 22:59 ` H.J. Lu
  2018-01-08 10:01   ` Martin Liška
  2018-01-11 23:00   ` Jeff Law
  2018-01-07 22:59 ` [PATCH 2/5] x86: Add -mindirect-branch-loop= H.J. Lu
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-07 22:59 UTC (permalink / raw)
  To: gcc-patches

Add -mfunction-return= option to convert function return to call and
return thunks.  The default is 'keep', which keeps function return
unmodified.  'thunk' converts function return to call and return thunk.
'thunk-inline' converts function return to inlined call and return thunk.
'thunk-extern' converts function return to external call and return
thunk provided in a separate object file.  You can control this behavior
for a specific function by using the function attribute function_return.

Function return thunk is the same as memory thunk for -mindirect-branch=
where the return address is at the top of the stack:

__x86_return_thunk:
	call L2
L1:
	lfence
	jmp L1
L2:
	lea 8(%rsp), %rsp|lea 4(%esp), %esp
	ret

and function return becomes

	jmp __x86_return_thunk

-mindirect-branch= tests are updated with -mfunction-return=keep to
avoid false test failures when -mfunction-return=thunk is added to
RUNTESTFLAGS for "make check".

gcc/

	* config/i386/i386-protos.h (ix86_output_function_return): New.
	* config/i386/i386.c (ix86_set_indirect_branch_type): Also
	set function_return_type.
	(indirect_thunk_name): Add ret_p to indicate thunk for function
	return.
	(output_indirect_thunk_function): Pass false to
	indirect_thunk_name.
	(ix86_output_indirect_branch): Likewise.
	(output_indirect_thunk_function): Create alias for function
	return thunk if regno < 0.
	(ix86_output_function_return): New function.
	(ix86_handle_fndecl_attribute): Handle function_return.
	(ix86_attribute_table): Add function_return.
	* config/i386/i386.h (machine_function): Add
	function_return_type.
	* config/i386/i386.md (simple_return_internal): Use
	ix86_output_function_return.
	(simple_return_internal_long): Likewise.
	* config/i386/i386.opt (mfunction-return=): New option.
	(indirect_branch): Mention -mfunction-return=.
	* doc/extend.texi: Document function_return function attribute.
	* doc/invoke.texi: Document -mfunction-return= option.

gcc/testsuite/

	* gcc.target/i386/indirect-thunk-1.c (dg-options): Add
	-mfunction-return=keep.
	* gcc.target/i386/indirect-thunk-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-8.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-7.c: Likewise.
	* gcc.target/i386/ret-thunk-1.c: New test.
	* gcc.target/i386/ret-thunk-10.c: Likewise.
	* gcc.target/i386/ret-thunk-11.c: Likewise.
	* gcc.target/i386/ret-thunk-12.c: Likewise.
	* gcc.target/i386/ret-thunk-13.c: Likewise.
	* gcc.target/i386/ret-thunk-14.c: Likewise.
	* gcc.target/i386/ret-thunk-15.c: Likewise.
	* gcc.target/i386/ret-thunk-16.c: Likewise.
	* gcc.target/i386/ret-thunk-2.c: Likewise.
	* gcc.target/i386/ret-thunk-3.c: Likewise.
	* gcc.target/i386/ret-thunk-4.c: Likewise.
	* gcc.target/i386/ret-thunk-5.c: Likewise.
	* gcc.target/i386/ret-thunk-6.c: Likewise.
	* gcc.target/i386/ret-thunk-7.c: Likewise.
	* gcc.target/i386/ret-thunk-8.c: Likewise.
	* gcc.target/i386/ret-thunk-9.c: Likewise.
---
 gcc/config/i386/i386-protos.h                      |   1 +
 gcc/config/i386/i386.c                             | 146 ++++++++++++++++++++-
 gcc/config/i386/i386.h                             |   3 +
 gcc/config/i386/i386.md                            |   9 +-
 gcc/config/i386/i386.opt                           |   6 +-
 gcc/doc/extend.texi                                |   9 ++
 gcc/doc/invoke.texi                                |  14 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-1.c   |   2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-2.c   |   2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-3.c   |   2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-4.c   |   2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-5.c   |   2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-6.c   |   2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-7.c   |   2 +-
 .../gcc.target/i386/indirect-thunk-attr-1.c        |   2 +-
 .../gcc.target/i386/indirect-thunk-attr-2.c        |   2 +-
 .../gcc.target/i386/indirect-thunk-attr-3.c        |   2 +-
 .../gcc.target/i386/indirect-thunk-attr-4.c        |   2 +-
 .../gcc.target/i386/indirect-thunk-attr-5.c        |   2 +-
 .../gcc.target/i386/indirect-thunk-attr-6.c        |   2 +-
 .../gcc.target/i386/indirect-thunk-attr-7.c        |   2 +-
 .../gcc.target/i386/indirect-thunk-attr-8.c        |   2 +-
 .../gcc.target/i386/indirect-thunk-bnd-1.c         |   2 +-
 .../gcc.target/i386/indirect-thunk-bnd-2.c         |   2 +-
 .../gcc.target/i386/indirect-thunk-bnd-3.c         |   2 +-
 .../gcc.target/i386/indirect-thunk-bnd-4.c         |   2 +-
 .../gcc.target/i386/indirect-thunk-extern-1.c      |   2 +-
 .../gcc.target/i386/indirect-thunk-extern-2.c      |   2 +-
 .../gcc.target/i386/indirect-thunk-extern-3.c      |   2 +-
 .../gcc.target/i386/indirect-thunk-extern-4.c      |   2 +-
 .../gcc.target/i386/indirect-thunk-extern-5.c      |   2 +-
 .../gcc.target/i386/indirect-thunk-extern-6.c      |   2 +-
 .../gcc.target/i386/indirect-thunk-extern-7.c      |   2 +-
 .../gcc.target/i386/indirect-thunk-inline-1.c      |   2 +-
 .../gcc.target/i386/indirect-thunk-inline-2.c      |   2 +-
 .../gcc.target/i386/indirect-thunk-inline-3.c      |   2 +-
 .../gcc.target/i386/indirect-thunk-inline-4.c      |   2 +-
 .../gcc.target/i386/indirect-thunk-inline-5.c      |   2 +-
 .../gcc.target/i386/indirect-thunk-inline-6.c      |   2 +-
 .../gcc.target/i386/indirect-thunk-inline-7.c      |   2 +-
 gcc/testsuite/gcc.target/i386/ret-thunk-1.c        |  12 ++
 gcc/testsuite/gcc.target/i386/ret-thunk-10.c       |  22 ++++
 gcc/testsuite/gcc.target/i386/ret-thunk-11.c       |  22 ++++
 gcc/testsuite/gcc.target/i386/ret-thunk-12.c       |  21 +++
 gcc/testsuite/gcc.target/i386/ret-thunk-13.c       |  21 +++
 gcc/testsuite/gcc.target/i386/ret-thunk-14.c       |  21 +++
 gcc/testsuite/gcc.target/i386/ret-thunk-15.c       |  21 +++
 gcc/testsuite/gcc.target/i386/ret-thunk-16.c       |  18 +++
 gcc/testsuite/gcc.target/i386/ret-thunk-2.c        |  12 ++
 gcc/testsuite/gcc.target/i386/ret-thunk-3.c        |  12 ++
 gcc/testsuite/gcc.target/i386/ret-thunk-4.c        |  12 ++
 gcc/testsuite/gcc.target/i386/ret-thunk-5.c        |  14 ++
 gcc/testsuite/gcc.target/i386/ret-thunk-6.c        |  13 ++
 gcc/testsuite/gcc.target/i386/ret-thunk-7.c        |  13 ++
 gcc/testsuite/gcc.target/i386/ret-thunk-8.c        |  14 ++
 gcc/testsuite/gcc.target/i386/ret-thunk-9.c        |  23 ++++
 56 files changed, 476 insertions(+), 49 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-9.c

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index bf11cc426f9..fb86f00b3a6 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -306,6 +306,7 @@ extern enum attr_cpu ix86_schedule;
 
 extern const char * ix86_output_call_insn (rtx_insn *insn, rtx call_op);
 extern const char * ix86_output_indirect_jmp (rtx call_op, bool ret_p);
+extern const char * ix86_output_function_return (bool long_p);
 extern bool ix86_operands_ok_for_move_multiple (rtx *operands, bool load,
 						machine_mode mode);
 extern int ix86_min_insn_size (rtx_insn *);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f327a6b1b62..b75fc48a4da 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5833,6 +5833,31 @@ ix86_set_indirect_branch_type (tree fndecl)
       else
 	cfun->machine->indirect_branch_type = ix86_indirect_branch;
     }
+
+  if (cfun->machine->function_return_type == indirect_branch_unset)
+    {
+      tree attr = lookup_attribute ("function_return",
+				    DECL_ATTRIBUTES (fndecl));
+      if (attr != NULL)
+	{
+	  tree args = TREE_VALUE (attr);
+	  if (args == NULL)
+	    gcc_unreachable ();
+	  tree cst = TREE_VALUE (args);
+	  if (strcmp (TREE_STRING_POINTER (cst), "keep") == 0)
+	    cfun->machine->function_return_type = indirect_branch_keep;
+	  else if (strcmp (TREE_STRING_POINTER (cst), "thunk") == 0)
+	    cfun->machine->function_return_type = indirect_branch_thunk;
+	  else if (strcmp (TREE_STRING_POINTER (cst), "thunk-inline") == 0)
+	    cfun->machine->function_return_type = indirect_branch_thunk_inline;
+	  else if (strcmp (TREE_STRING_POINTER (cst), "thunk-extern") == 0)
+	    cfun->machine->function_return_type = indirect_branch_thunk_extern;
+	  else
+	    gcc_unreachable ();
+	}
+      else
+	cfun->machine->function_return_type = ix86_function_return;
+    }
 }
 
 /* Establish appropriate back-end context for processing the function
@@ -10695,8 +10720,12 @@ static int indirect_thunks_bnd_used;
 /* Fills in the label name that should be used for the indirect thunk.  */
 
 static void
-indirect_thunk_name (char name[32], int regno, bool need_bnd_p)
+indirect_thunk_name (char name[32], int regno, bool need_bnd_p,
+		     bool ret_p)
 {
+  if (regno >= 0 && ret_p)
+    gcc_unreachable ();
+
   if (USE_HIDDEN_LINKONCE)
     {
       const char *bnd = need_bnd_p ? "_bnd" : "";
@@ -10711,7 +10740,10 @@ indirect_thunk_name (char name[32], int regno, bool need_bnd_p)
 		   bnd, reg_prefix, reg_names[regno]);
 	}
       else
-	sprintf (name, "__x86_indirect_thunk%s", bnd);
+	{
+	  const char *ret = ret_p ? "return" : "indirect";
+	  sprintf (name, "__x86_%s_thunk%s", ret, bnd);
+	}
     }
   else
     {
@@ -10724,10 +10756,20 @@ indirect_thunk_name (char name[32], int regno, bool need_bnd_p)
 	}
       else
 	{
-	  if (need_bnd_p)
-	    ASM_GENERATE_INTERNAL_LABEL (name, "LITB", 0);
+	  if (ret_p)
+	    {
+	      if (need_bnd_p)
+		ASM_GENERATE_INTERNAL_LABEL (name, "LRTB", 0);
+	      else
+		ASM_GENERATE_INTERNAL_LABEL (name, "LRT", 0);
+	    }
 	  else
-	    ASM_GENERATE_INTERNAL_LABEL (name, "LIT", 0);
+	    {
+	      if (need_bnd_p)
+		ASM_GENERATE_INTERNAL_LABEL (name, "LITB", 0);
+	      else
+		ASM_GENERATE_INTERNAL_LABEL (name, "LIT", 0);
+	    }
 	}
     }
 }
@@ -10808,7 +10850,7 @@ output_indirect_thunk_function (bool need_bnd_p, int regno)
   tree decl;
 
   /* Create __x86_indirect_thunk/__x86_indirect_thunk_bnd.  */
-  indirect_thunk_name (name, regno, need_bnd_p);
+  indirect_thunk_name (name, regno, need_bnd_p, false);
   decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
 		     get_identifier (name),
 		     build_function_type_list (void_type_node, NULL_TREE));
@@ -10851,6 +10893,35 @@ output_indirect_thunk_function (bool need_bnd_p, int regno)
 	ASM_OUTPUT_LABEL (asm_out_file, name);
       }
 
+  if (regno < 0)
+    {
+      /* Create alias for __x86_return_thunk/__x86_return_thunk_bnd.  */
+      char alias[32];
+
+      indirect_thunk_name (alias, regno, need_bnd_p, true);
+      ASM_OUTPUT_DEF (asm_out_file, alias, name);
+#if TARGET_MACHO
+      if (TARGET_MACHO)
+	{
+	  fputs ("\t.weak_definition\t", asm_out_file);
+	  assemble_name (asm_out_file, alias);
+	  fputs ("\n\t.private_extern\t", asm_out_file);
+	  assemble_name (asm_out_file, alias);
+	  putc ('\n', asm_out_file);
+	}
+#else
+      if (USE_HIDDEN_LINKONCE)
+	{
+	  fputs ("\t.globl\t", asm_out_file);
+	  assemble_name (asm_out_file, alias);
+	  putc ('\n', asm_out_file);
+	  fputs ("\t.hidden\t", asm_out_file);
+	  assemble_name (asm_out_file, alias);
+	  putc ('\n', asm_out_file);
+	}
+#endif
+    }
+
   DECL_INITIAL (decl) = make_node (BLOCK);
   current_function_decl = decl;
   allocate_struct_function (decl, false);
@@ -28410,7 +28481,7 @@ ix86_output_indirect_branch (rtx call_op, const char *xasm,
 		indirect_thunk_needed = true;
 	    }
 	}
-      indirect_thunk_name (thunk_name_buf, regno, need_bnd_p);
+      indirect_thunk_name (thunk_name_buf, regno, need_bnd_p, false);
       thunk_name = thunk_name_buf;
     }
   else
@@ -28539,6 +28610,43 @@ ix86_output_indirect_jmp (rtx call_op, bool ret_p)
     return "%!jmp\t%A0";
 }
 
+const char *
+ix86_output_function_return (bool long_p)
+{
+  if (cfun->machine->function_return_type != indirect_branch_keep)
+    {
+      char thunk_name[32];
+      bool need_bnd_p = ix86_bnd_prefixed_insn_p (current_output_insn);
+
+      if (cfun->machine->function_return_type
+	  != indirect_branch_thunk_inline)
+	{
+	  bool need_thunk = (cfun->machine->function_return_type
+			     == indirect_branch_thunk);
+	  indirect_thunk_name (thunk_name, -1, need_bnd_p, true);
+	  if (need_bnd_p)
+	    {
+	      indirect_thunk_bnd_needed |= need_thunk;
+	      fprintf (asm_out_file, "\tbnd jmp\t%s\n", thunk_name);
+	    }
+	  else
+	    {
+	      indirect_thunk_needed |= need_thunk;
+	      fprintf (asm_out_file, "\tjmp\t%s\n", thunk_name);
+	    }
+	}
+      else
+	output_indirect_thunk (need_bnd_p, -1);
+
+      return "";
+    }
+
+  if (!long_p || ix86_bnd_prefixed_insn_p (current_output_insn))
+    return "%!ret";
+
+  return "rep%; ret";
+}
+
 /* Output the assembly for a call instruction.  */
 
 const char *
@@ -40849,6 +40957,28 @@ ix86_handle_fndecl_attribute (tree *node, tree name, tree args, int,
 	}
     }
 
+  if (is_attribute_p ("function_return", name))
+    {
+      tree cst = TREE_VALUE (args);
+      if (TREE_CODE (cst) != STRING_CST)
+	{
+	  warning (OPT_Wattributes,
+		   "%qE attribute requires a string constant argument",
+		   name);
+	  *no_add_attrs = true;
+	}
+      else if (strcmp (TREE_STRING_POINTER (cst), "keep") != 0
+	       && strcmp (TREE_STRING_POINTER (cst), "thunk") != 0
+	       && strcmp (TREE_STRING_POINTER (cst), "thunk-inline") != 0
+	       && strcmp (TREE_STRING_POINTER (cst), "thunk-extern") != 0)
+	{
+	  warning (OPT_Wattributes,
+		   "argument to %qE attribute is not "
+		   "(keep|thunk|thunk-inline|thunk-extern)", name);
+	  *no_add_attrs = true;
+	}
+    }
+
   return NULL_TREE;
 }
 
@@ -45263,6 +45393,8 @@ static const struct attribute_spec ix86_attribute_table[] =
     ix86_handle_fndecl_attribute, NULL },
   { "indirect_branch", 1, 1, true, false, false, false,
     ix86_handle_fndecl_attribute, NULL },
+  { "function_return", 1, 1, true, false, false, false,
+    ix86_handle_fndecl_attribute, NULL },
 
   /* End element.  */
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 51a920298a4..05cc9647755 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2575,6 +2575,9 @@ struct GTY(()) machine_function {
      "indirect_jump" or "tablejump".  */
   BOOL_BITFIELD has_local_indirect_jump : 1;
 
+  /* How to generate function return.  */
+  ENUM_BITFIELD(indirect_branch) function_return_type : 3;
+
   /* If true, the current function is a function specified with
      the "interrupt" or "no_caller_saved_registers" attribute.  */
   BOOL_BITFIELD no_caller_saved_registers : 1;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a7573c468ae..6c832a867c8 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -13050,7 +13050,7 @@
 (define_insn "simple_return_internal"
   [(simple_return)]
   "reload_completed"
-  "%!ret"
+  "* return ix86_output_function_return (false);"
   [(set_attr "length" "1")
    (set_attr "atom_unit" "jeu")
    (set_attr "length_immediate" "0")
@@ -13072,12 +13072,7 @@
   [(simple_return)
    (unspec [(const_int 0)] UNSPEC_REP)]
   "reload_completed"
-{
-  if (ix86_bnd_prefixed_insn_p (insn))
-    return "%!ret";
-
-  return "rep%; ret";
-}
+  "* return ix86_output_function_return (true);"
   [(set_attr "length" "2")
    (set_attr "atom_unit" "jeu")
    (set_attr "length_immediate" "0")
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index f3e43da0aed..2b62363973d 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1026,9 +1026,13 @@ mindirect-branch=
 Target Report RejectNegative Joined Enum(indirect_branch) Var(ix86_indirect_branch) Init(indirect_branch_keep)
 Convert indirect call and jump.
 
+mfunction-return=
+Target Report RejectNegative Joined Enum(indirect_branch) Var(ix86_function_return) Init(indirect_branch_keep)
+Convert function return.
+
 Enum
 Name(indirect_branch) Type(enum indirect_branch)
-Known indirect branch choices (for use with the -mindirect-branch= option):
+Known indirect branch choices (for use with the -mindirect-branch=/-mfunction-return= options):
 
 EnumValue
 Enum(indirect_branch) String(keep) Value(indirect_branch_keep)
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 6e48d4108a2..7690a2bdb7d 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -5745,6 +5745,15 @@ indirect call and jump to inlined call and return thunk.
 @samp{thunk-extern} converts indirect call and jump to external call
 and return thunk provided in a separate object file.
 
+@item function_return("@var{choice}")
+@cindex @code{function_return} function attribute, x86
+On x86 targets, the @code{function_return} attribute causes the compiler
+to convert function return with @var{choice}.  @samp{keep} keeps function
+return unmodified.  @samp{thunk} converts function return to call and
+return thunk.  @samp{thunk-inline} converts function return to inlined
+call and return thunk.  @samp{thunk-extern} converts function return to
+external call and return thunk provided in a separate object file.
+
 @item nocf_check
 @cindex @code{nocf_check} function attribute
 The @code{nocf_check} attribute on a function is used to inform the
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ee5248aba59..fa72a929751 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1227,7 +1227,8 @@ See RS/6000 and PowerPC Options.
 -mstack-protector-guard-offset=@var{offset} @gol
 -mstack-protector-guard-symbol=@var{symbol} -mmitigate-rop @gol
 -mgeneral-regs-only -mcall-ms2sysv-xlogues @gol
--mindirect-branch=@var{choice} -mindirect-branch-loop=@var{choice}}
+-mindirect-branch=@var{choice} -mindirect-branch-loop=@var{choice}
+-mfunction-return=@var{choice}}
 
 @emph{x86 Windows Options}
 @gccoptlist{-mconsole  -mcygwin  -mno-cygwin  -mdll @gol
@@ -26776,6 +26777,17 @@ to external call and return thunk provided in a separate object file.
 You can control this behavior for a specific function by using the
 function attribute @code{indirect_branch}.  @xref{Function Attributes}.
 
+@item -mfunction-return=@var{choice}
+@opindex -mfunction-return
+Convert function return with @var{choice}.  The default is @samp{keep},
+which keeps function return unmodified.  @samp{thunk} converts function
+return to call and return thunk.  @samp{thunk-inline} converts function
+return to inlined call and return thunk.  @samp{thunk-extern} converts
+function return to external call and return thunk provided in a separate
+object file.  You can control this behavior for a specific function by
+using the function attribute @code{function_return}.
+@xref{Function Attributes}.
+
 @item -mindirect-branch-loop=@var{choice}
 @opindex -mindirect-branch-loop
 Control loop filler in call and return thunk for indirect call and jump.
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-1.c
index 08827448325..bd29d466f5c 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-1.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-2.c
index 1344b6bc0e9..cf6a6736524 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-2.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-3.c
index dcc9ef75df6..46925434933 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-3.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-4.c
index 2502860b6e6..ee95d21c08e 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-4.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-5.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-5.c
index 58c81d16316..7ea4af5bb71 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-5.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target *-*-linux* } } */
-/* { dg-options "-O2 -fpic -fno-plt -mindirect-branch=thunk" } */
+/* { dg-options "-O2 -mfunction-return=keep -fpic -fno-plt -mindirect-branch=thunk" } */
 
 extern void bar (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-6.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-6.c
index b4c528a5b30..cbf2159ce50 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-6.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target *-*-linux* } } */
-/* { dg-options "-O2 -fpic -fno-plt -mindirect-branch=thunk" } */
+/* { dg-options "-O2 -mfunction-return=keep -fpic -fno-plt -mindirect-branch=thunk" } */
 
 extern void bar (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-7.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-7.c
index 4553ef0b622..44130a4175d 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-7.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fno-pic" } */
 
 void func0 (void);
 void func1 (void);
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-1.c
index b8e9851c76f..2e3e3a5ca8b 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-1.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-2.c
index 1d6d18c2aba..c88fd33c494 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-2.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-3.c
index af167840b81..21b69728796 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-3.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-4.c
index 146124894a0..0bd6aab2fd6 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-4.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-5.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-5.c
index 0833606046b..98785a38248 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-5.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-6.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-6.c
index 2eba0fbd9b2..a498a39e404 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-6.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-7.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-7.c
index f58427eae11..66f295d1eb6 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-7.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -fno-pic" } */
 
 void func0 (void);
 void func1 (void);
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-8.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-8.c
index 6960fa0bbfb..c246f974610 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-8.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-8.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fno-pic" } */
 
 void func0 (void);
 void func1 (void);
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-1.c
index 21b25ec5bbf..154eb7adfcd 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-1.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! x32 } } } */
-/* { dg-options "-O2 -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fno-pic" } */
 
 void (*dispatch) (char *);
 char buf[10];
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-2.c
index 7bf7e6a1095..62d9831af27 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-2.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! x32 } } } */
-/* { dg-options "-O2 -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fno-pic" } */
 
 void (*dispatch) (char *);
 char buf[10];
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-3.c
index 14c60f232db..b5a601a4da3 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-3.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { *-*-linux* && { ! x32 } } } } */
-/* { dg-options "-O2 -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fpic -fno-plt" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fpic -fno-plt" } */
 
 void bar (char *);
 char buf[10];
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-4.c
index 4fd6f360801..c36a821c8e5 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-4.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { *-*-linux* && { ! x32 } } } } */
-/* { dg-options "-O2 -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fpic -fno-plt" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fpic -fno-plt" } */
 
 void bar (char *);
 char buf[10];
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-1.c
index 49f27b49465..637fc3d3f4e 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-1.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk-extern -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-extern -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-2.c
index a1e3eb6fc74..ff9efe03fe6 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-2.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk-extern -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-extern -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-3.c
index 395634e7e5c..2686a5f2db4 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-3.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk-extern -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-extern -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-4.c
index fd3f63379a1..f07f6b214ad 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-4.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk-extern -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-extern -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-5.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-5.c
index ba2f92b6f34..21740ac5b7f 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-5.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target *-*-linux* } } */
-/* { dg-options "-O2 -fpic -fno-plt -mindirect-branch=thunk-extern" } */
+/* { dg-options "-O2 -mfunction-return=keep -fpic -fno-plt -mindirect-branch=thunk-extern" } */
 
 extern void bar (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-6.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-6.c
index 0c5a2d472c6..a77c1f470b8 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-6.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target *-*-linux* } } */
-/* { dg-options "-O2 -fpic -fno-plt -mindirect-branch=thunk-extern" } */
+/* { dg-options "-O2 -mfunction-return=keep -fpic -fno-plt -mindirect-branch=thunk-extern" } */
 
 extern void bar (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-7.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-7.c
index 665252327aa..e64910fd4aa 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-7.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk-extern -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-extern -fno-pic" } */
 
 void func0 (void);
 void func1 (void);
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-1.c
index 3ace8d1b031..efa0096e1e0 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-1.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk-inline -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-inline -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-2.c
index 6c97b96f1f2..775d0b8c53e 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-2.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk-inline -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-inline -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-3.c
index 8f6759cbf06..788271f049f 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-3.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk-inline -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-inline -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-4.c
index b07d08cab0f..ef8a2c746a7 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-4.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk-inline -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-inline -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-5.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-5.c
index 10794886b1b..848ceefca02 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-5.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target *-*-linux* } } */
-/* { dg-options "-O2 -fpic -fno-plt -mindirect-branch=thunk-inline" } */
+/* { dg-options "-O2 -mfunction-return=keep -fpic -fno-plt -mindirect-branch=thunk-inline" } */
 
 extern void bar (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-6.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-6.c
index a26ec4b06ed..64608100782 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-6.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target *-*-linux* } } */
-/* { dg-options "-O2 -fpic -fno-plt -mindirect-branch=thunk-inline" } */
+/* { dg-options "-O2 -mfunction-return=keep -fpic -fno-plt -mindirect-branch=thunk-inline" } */
 
 extern void bar (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-7.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-7.c
index 77253af17c6..3c2758360f5 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-7.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk-inline -fno-pic" } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-inline -fno-pic" } */
 
 void func0 (void);
 void func1 (void);
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-1.c b/gcc/testsuite/gcc.target/i386/ret-thunk-1.c
new file mode 100644
index 00000000000..07f382c21b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfunction-return=thunk" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_return_thunk" } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-10.c b/gcc/testsuite/gcc.target/i386/ret-thunk-10.c
new file mode 100644
index 00000000000..d7313d631aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-10.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfunction-return=thunk-inline -mindirect-branch=thunk -fno-pic" } */
+
+extern void (*bar) (void);
+
+int
+foo (void)
+{
+  bar ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*__x86_return_thunk" } } */
+/* { dg-final { scan-assembler-times {\tlfence} 2 } } */
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?bar" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } }  } } */
+/* { dg-final { scan-assembler "__x86_indirect_thunk:" { target { ! x32 } }  } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target { x32 } }  } } */
+/* { dg-final { scan-assembler "__x86_indirect_thunk_(r|e)ax:" { target { x32 } }  } } */
+/* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-11.c b/gcc/testsuite/gcc.target/i386/ret-thunk-11.c
new file mode 100644
index 00000000000..66efd2adfd3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-11.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfunction-return=thunk-extern -mindirect-branch=thunk -fno-pic" } */
+
+extern void (*bar) (void);
+
+int
+foo (void)
+{
+  bar ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_return_thunk" } } */
+/* { dg-final { scan-assembler-times {\tlfence} 1 } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?bar" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "__x86_indirect_thunk:" { target { ! x32 } }  } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target { x32 } }  } } */
+/* { dg-final { scan-assembler "__x86_indirect_thunk_(r|e)ax:" { target { x32 } }  } } */
+/* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-12.c b/gcc/testsuite/gcc.target/i386/ret-thunk-12.c
new file mode 100644
index 00000000000..b5306581e91
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-12.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fno-pic" } */
+
+extern void (*bar) (void);
+
+int
+foo (void)
+{
+  bar ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*__x86_return_thunk" } } */
+/* { dg-final { scan-assembler-times {\tlfence} 1 } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "__x86_indirect_thunk:" { target { ! x32 } }  } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target { x32 } }  } } */
+/* { dg-final { scan-assembler "__x86_indirect_thunk_(r|e)ax:" { target { x32 } }  } } */
+/* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-13.c b/gcc/testsuite/gcc.target/i386/ret-thunk-13.c
new file mode 100644
index 00000000000..3759246d7ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-13.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-inline -fno-pic" } */
+
+extern void (*bar) (void);
+extern int foo (void) __attribute__ ((function_return("thunk")));
+
+int
+foo (void)
+{
+  bar ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_return_thunk" } } */
+/* { dg-final { scan-assembler-times {\tlfence} 2 } } */
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?bar" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler-times "jmp\[ \t\]*\.LIND" 3 } } */
+/* { dg-final { scan-assembler-times "call\[ \t\]*\.LIND" 3 } } */
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*__x86_indirect_thunk" } } */
+/* { dg-final { scan-assembler-not "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target { x32 } }  } } */
+/* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-14.c b/gcc/testsuite/gcc.target/i386/ret-thunk-14.c
new file mode 100644
index 00000000000..6827fa250aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-14.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-extern -fno-pic" } */
+
+extern void (*bar) (void);
+
+__attribute__ ((function_return("thunk-inline")))
+int
+foo (void)
+{
+  bar ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {\tlfence} 1 } } */
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*__x86_return_thunk" } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?bar" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target { x32 } }  } } */
+/* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-15.c b/gcc/testsuite/gcc.target/i386/ret-thunk-15.c
new file mode 100644
index 00000000000..0437ca2453b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-15.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=keep -fno-pic" } */
+
+extern void (*bar) (void);
+
+__attribute__ ((function_return("thunk-extern"), indirect_branch("thunk")))
+int
+foo (void)
+{
+  bar ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_return_thunk" } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-times {\tlfence} 1 } } */
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?bar" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-16.c b/gcc/testsuite/gcc.target/i386/ret-thunk-16.c
new file mode 100644
index 00000000000..173fe243d7b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-16.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfunction-return=thunk-inline -mindirect-branch=thunk-extern -fno-pic" } */
+
+extern void (*bar) (void);
+
+__attribute__ ((function_return("keep"), indirect_branch("keep")))
+int
+foo (void)
+{
+  bar ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not "__x86_indirect_thunk" } } */
+/* { dg-final { scan-assembler-not "__x86_return_thunk" } } */
+/* { dg-final { scan-assembler-not {\tlfence} } } */
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-not "call\[ \t\]*\.LIND" } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-2.c b/gcc/testsuite/gcc.target/i386/ret-thunk-2.c
new file mode 100644
index 00000000000..5516813a290
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfunction-return=thunk-inline" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*__x86_return_thunk" } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-3.c b/gcc/testsuite/gcc.target/i386/ret-thunk-3.c
new file mode 100644
index 00000000000..9f1ade857ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfunction-return=thunk-extern" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_return_thunk" } } */
+/* { dg-final { scan-assembler-not {\tlfence} } } */
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-not "call\[ \t\]*\.LIND" } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-4.c b/gcc/testsuite/gcc.target/i386/ret-thunk-4.c
new file mode 100644
index 00000000000..abecde0a550
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-4.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfunction-return=keep" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*__x86_return_thunk" } } */
+/* { dg-final { scan-assembler-not {\tlfence} } } */
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-not "call\[ \t\]*\.LIND" } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-5.c b/gcc/testsuite/gcc.target/i386/ret-thunk-5.c
new file mode 100644
index 00000000000..3b51a9931db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-5.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfunction-return=keep" } */
+
+extern void foo (void) __attribute__ ((function_return("thunk")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_return_thunk" } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-6.c b/gcc/testsuite/gcc.target/i386/ret-thunk-6.c
new file mode 100644
index 00000000000..52160e0ee77
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-6.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfunction-return=keep" } */
+
+__attribute__ ((function_return("thunk-inline")))
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*__x86_return_thunk" } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-7.c b/gcc/testsuite/gcc.target/i386/ret-thunk-7.c
new file mode 100644
index 00000000000..65819c2ab76
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-7.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfunction-return=keep" } */
+
+__attribute__ ((function_return("thunk-extern")))
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_return_thunk" } } */
+/* { dg-final { scan-assembler-not {\tlfence} } } */
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-not "call\[ \t\]*\.LIND" } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-8.c b/gcc/testsuite/gcc.target/i386/ret-thunk-8.c
new file mode 100644
index 00000000000..a6a1bbc054b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-8.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfunction-return=thunk-inline" } */
+
+extern void foo (void) __attribute__ ((function_return("keep")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*__x86_return_thunk" } } */
+/* { dg-final { scan-assembler-not {\tlfence} } } */
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-not "call\[ \t\]*\.LIND" } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-9.c b/gcc/testsuite/gcc.target/i386/ret-thunk-9.c
new file mode 100644
index 00000000000..adf83df4776
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-9.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mfunction-return=thunk -mindirect-branch=thunk -fno-pic" } */
+
+extern void (*bar) (void);
+
+int
+foo (void)
+{
+  bar ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_return_thunk" } } */
+/* { dg-final { scan-assembler-not "__x86_return_thunk:" } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "__x86_indirect_thunk:" } } */
+/* { dg-final { scan-assembler-times {\tlfence} 1 { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?bar" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler-times {\tlfence} 2 { target { x32 } } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target { x32 } } } } */
+/* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
-- 
2.14.3


* [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-07 22:59 [PATCH 0/5] x86: CVE-2017-5715, aka Spectre H.J. Lu
                   ` (3 preceding siblings ...)
  2018-01-07 22:59 ` [PATCH 3/5] x86: Add -mfunction-return= H.J. Lu
@ 2018-01-07 22:59 ` H.J. Lu
  2018-01-08  8:23   ` Florian Weimer
  2018-01-11 21:49   ` Jeff Law
  2018-01-07 23:36 ` [PATCH 0/5] x86: CVE-2017-5715, aka Spectre Jeff Law
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-07 22:59 UTC (permalink / raw)
  To: gcc-patches

Add the -mindirect-branch-loop= option to control the loop filler in call
and return thunks generated by -mindirect-branch=.  'lfence', 'pause' and
'nop' each use the instruction of the same name as the loop filler.  The
default is 'lfence'.
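
The filler selection can be illustrated with a small standalone sketch.
This is not the code in i386.c; the names loop_filler, filler_mnemonic and
format_trap_loop are hypothetical and only mirror the idea that the filler
is the single instruction placed inside the thunk's "label, filler, jmp
label" loop.

```c
#include <stdio.h>

/* Hypothetical names for illustration; the patch itself uses
   enum indirect_branch_loop and ix86_indirect_branch_loop.  */
enum loop_filler { FILLER_LFENCE, FILLER_PAUSE, FILLER_NOP };

/* Map a filler choice to the mnemonic placed inside the loop.
   Unknown values fall back to the default, lfence.  */
static const char *
filler_mnemonic (enum loop_filler f)
{
  switch (f)
    {
    case FILLER_PAUSE:
      return "pause";
    case FILLER_NOP:
      return "nop";
    case FILLER_LFENCE:
    default:
      return "lfence";
    }
}

/* Write the loop "L1: <filler>; jmp L1" into BUF.
   Returns what snprintf returns.  */
static int
format_trap_loop (char *buf, size_t size, enum loop_filler f)
{
  return snprintf (buf, size, "L1:\n\t%s\n\tjmp\tL1\n",
		   filler_mnemonic (f));
}
```

Calling format_trap_loop with FILLER_PAUSE yields the loop body that
-mindirect-branch-loop=pause is described as selecting; only the filler
line changes between the three choices.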

gcc/

	* config/i386/i386-opts.h (indirect_branch_loop): New.
	* config/i386/i386.c (output_indirect_thunk): Support
	-mindirect-branch-loop=.
	* config/i386/i386.opt (mindirect-branch-loop=): New option.
	(indirect_branch_loop): New.
	(lfence): Likewise.
	(pause): Likewise.
	(nop): Likewise.
	* doc/invoke.texi: Document -mindirect-branch-loop= option.

gcc/testsuite/

	* gcc.target/i386/indirect-thunk-loop-1.c: New test.
	* gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
---
 gcc/config/i386/i386-opts.h                           |  6 ++++++
 gcc/config/i386/i386.c                                | 19 +++++++++++++++++--
 gcc/config/i386/i386.opt                              | 17 +++++++++++++++++
 gcc/doc/invoke.texi                                   |  9 ++++++++-
 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-1.c | 19 +++++++++++++++++++
 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-2.c | 19 +++++++++++++++++++
 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-3.c | 19 +++++++++++++++++++
 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-4.c | 19 +++++++++++++++++++
 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-5.c | 19 +++++++++++++++++++
 9 files changed, 143 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-5.c

diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
index f14cbeee7a1..52f7cb979cc 100644
--- a/gcc/config/i386/i386-opts.h
+++ b/gcc/config/i386/i386-opts.h
@@ -114,4 +114,10 @@ enum indirect_branch {
   indirect_branch_thunk_extern
 };
 
+enum indirect_branch_loop {
+  indirect_branch_loop_lfence,
+  indirect_branch_loop_pause,
+  indirect_branch_loop_nop
+};
+
 #endif
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ac4d1f62f50..f327a6b1b62 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10753,8 +10753,23 @@ output_indirect_thunk (bool need_bnd_p, int regno)
 
   ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel1);
 
-  /* lfence .  */
-  fprintf (asm_out_file, "\tlfence\n");
+  switch (ix86_indirect_branch_loop)
+    {
+    case indirect_branch_loop_lfence:
+      /* lfence.  */
+      fprintf (asm_out_file, "\tlfence\n");
+      break;
+    case indirect_branch_loop_pause:
+      /* pause.  */
+      fprintf (asm_out_file, "\tpause\n");
+      break;
+    case indirect_branch_loop_nop:
+      /* nop.  */
+      fprintf (asm_out_file, "\tnop\n");
+      break;
+    default:
+      gcc_unreachable ();
+    }
 
   /* Jump.  */
   fputs ("\tjmp\t", asm_out_file);
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 22c806206e4..f3e43da0aed 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1041,3 +1041,20 @@ Enum(indirect_branch) String(thunk-inline) Value(indirect_branch_thunk_inline)
 
 EnumValue
 Enum(indirect_branch) String(thunk-extern) Value(indirect_branch_thunk_extern)
+
+mindirect-branch-loop=
+Target Report RejectNegative Joined Enum(indirect_branch_loop) Var(ix86_indirect_branch_loop) Init(indirect_branch_loop_lfence)
+Control loop filler in call and return thunk for indirect call and jump.
+
+Enum
+Name(indirect_branch_loop) Type(enum indirect_branch_loop)
+Known loop choices (for use with the -mindirect-branch-loop= option):
+
+EnumValue
+Enum(indirect_branch_loop) String(lfence) Value(indirect_branch_loop_lfence)
+
+EnumValue
+Enum(indirect_branch_loop) String(pause) Value(indirect_branch_loop_pause)
+
+EnumValue
+Enum(indirect_branch_loop) String(nop) Value(indirect_branch_loop_nop)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 46461d1ada3..ee5248aba59 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1227,7 +1227,7 @@ See RS/6000 and PowerPC Options.
 -mstack-protector-guard-offset=@var{offset} @gol
 -mstack-protector-guard-symbol=@var{symbol} -mmitigate-rop @gol
 -mgeneral-regs-only -mcall-ms2sysv-xlogues @gol
--mindirect-branch=@var{choice}}
+-mindirect-branch=@var{choice} -mindirect-branch-loop=@var{choice}}
 
 @emph{x86 Windows Options}
 @gccoptlist{-mconsole  -mcygwin  -mno-cygwin  -mdll @gol
@@ -26776,6 +26776,13 @@ to external call and return thunk provided in a separate object file.
 You can control this behavior for a specific function by using the
 function attribute @code{indirect_branch}.  @xref{Function Attributes}.
 
+@item -mindirect-branch-loop=@var{choice}
+@opindex -mindirect-branch-loop
+Control loop filler in call and return thunk for indirect call and jump.
+@samp{lfence} uses @code{lfence} as loop filler.  @samp{pause} uses
+@code{pause} as loop filler.  @samp{nop} uses @code{nop} as loop filler.
+The default is @samp{lfence}.
+
 @end table
 
 These @samp{-m} switches are supported in addition to the above
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-1.c
new file mode 100644
index 00000000000..1b0e2c58775
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk -mindirect-branch-loop=pause -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tpause} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-2.c
new file mode 100644
index 00000000000..feace47a765
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk -mindirect-branch-loop=nop -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch[256];
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch[offset](offset);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tnop} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-3.c
new file mode 100644
index 00000000000..ad2165fa7aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-3.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk -mindirect-branch-loop=lfence -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-4.c
new file mode 100644
index 00000000000..4ba997da966
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-4.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk-inline -mindirect-branch-loop=pause -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tpause} } } */
+/* { dg-final { scan-assembler-not "__x86_indirect_thunk" } } */
+/* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-5.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-5.c
new file mode 100644
index 00000000000..10fb2193f5e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk-extern -mindirect-branch-loop=pause -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler-not {\t(lfence|pause|nop)} } } */
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-not "call\[ \t\]*\.LIND" } } */
-- 
2.14.3


* [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-07 22:59 [PATCH 0/5] x86: CVE-2017-5715, aka Spectre H.J. Lu
  2018-01-07 22:59 ` [PATCH 4/5] x86: Add -mindirect-branch-register H.J. Lu
@ 2018-01-07 22:59 ` H.J. Lu
  2018-01-08 10:56   ` Martin Liška
  2018-01-11 22:54   ` Jeff Law
  2018-01-07 22:59 ` [PATCH 5/5] x86: Add 'V' register operand modifier H.J. Lu
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-07 22:59 UTC (permalink / raw)
  To: gcc-patches

Add -mindirect-branch= option to convert indirect call and jump to call
and return thunks.  The default is 'keep', which keeps indirect call and
jump unmodified.  'thunk' converts indirect call and jump to call and
return thunk.  'thunk-inline' converts indirect call and jump to inlined
call and return thunk.  'thunk-extern' converts indirect call and jump to
external call and return thunk provided in a separate object file.  You
can control this behavior for a specific function by using the function
attribute indirect_branch.
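
As an illustration of the per-function attribute, here is a minimal sketch
that is not part of the patch.  The names INDIRECT_BRANCH, twice, invoke and
invoke_plain are hypothetical.  The __has_attribute guard and the __CET__
check are assumptions added so the file still builds with compilers that lack
the attribute or that reject thunks together with -fcf-protection; they are
not required by the patch itself.

```c
#include <stdio.h>

/* Assumption: expand to the attribute only when the compiler reports
   support for it and control-flow protection is off; otherwise expand
   to nothing so the example still builds.  */
#if defined(__has_attribute)
# if __has_attribute(indirect_branch) && !defined(__CET__)
#  define INDIRECT_BRANCH(choice) __attribute__((indirect_branch(choice)))
# endif
#endif
#ifndef INDIRECT_BRANCH
# define INDIRECT_BRANCH(choice)
#endif

typedef int (*handler_t)(int);

int twice(int x) { return 2 * x; }

/* The indirect call below is requested to go through a call and return
   thunk, regardless of the -mindirect-branch= setting on the command line.  */
INDIRECT_BRANCH("thunk")
int invoke(handler_t h, int x)
{
  return h(x);
}

/* The indirect call below is requested to stay unmodified, even if the
   file is compiled with -mindirect-branch=thunk.  */
INDIRECT_BRANCH("keep")
int invoke_plain(handler_t h, int x)
{
  return h(x);
}
```

Both functions return the same values; only the generated code for the
indirect call is expected to differ.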

Two kinds of thunks are generated.  The memory thunk, where the function
address is at the top of the stack:

__x86_indirect_thunk:
	call L2
L1:
	lfence
	jmp L1
L2:
	lea 8(%rsp), %rsp|lea 4(%esp), %esp
	ret

Indirect jmp via memory, "jmp mem", is converted to

	push memory
	jmp __x86_indirect_thunk

Indirect call via memory, "call mem", is converted to

	jmp L2
L1:
	push [mem]
	jmp __x86_indirect_thunk
L2:
	call L1

The register thunk, where the function address is in a register, reg:

__x86_indirect_thunk_reg:
	call	L2
L1:
	lfence
	jmp	L1
L2:
	movq	%reg, (%rsp)|movl    %reg, (%esp)
	ret

where reg is one of (r|e)ax, (r|e)dx, (r|e)cx, (r|e)bx, (r|e)si, (r|e)di,
(r|e)bp, r8, r9, r10, r11, r12, r13, r14 and r15.

Indirect jmp via register, "jmp reg", is converted to

	jmp __x86_indirect_thunk_reg

Indirect call via register, "call reg", is converted to

	call __x86_indirect_thunk_reg
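
The naming scheme for the thunk symbols described above can be sketched as
follows.  This is a simplified, stand-alone illustration modelled on
indirect_thunk_name in the patch, not the patch code itself: thunk_symbol and
its parameters are hypothetical, and the "_bnd" variants and the internal
labels used when hidden linkonce is unavailable are left out.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the thunk symbol name for a branch target.
   REG is the register name without its width prefix ("ax", "r8", ...),
   or NULL for the memory thunk.  LEGACY is nonzero for ax..bp, which
   take an "r" prefix in 64-bit mode and an "e" prefix in 32-bit mode;
   r8..r15 take no prefix.  */
void thunk_symbol(char *buf, size_t len, const char *reg,
                  int legacy, int is_64bit)
{
  if (reg == NULL)
    snprintf(buf, len, "__x86_indirect_thunk");
  else
    snprintf(buf, len, "__x86_indirect_thunk_%s%s",
             legacy ? (is_64bit ? "r" : "e") : "", reg);
}
```

For example, a "call *%rax" in 64-bit code would be redirected to the symbol
built from ("ax", legacy, 64-bit), and a branch through memory to the
unsuffixed symbol.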

gcc/

	* config/i386/i386-opts.h (indirect_branch): New.
	* config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise.
	* config/i386/i386.c (ix86_using_red_zone): Disallow red-zone
	with local indirect jump when converting indirect call and jump.
	(ix86_set_indirect_branch_type): New.
	(ix86_set_current_function): Call ix86_set_indirect_branch_type.
	(indirectlabelno): New.
	(indirect_thunk_needed): Likewise.
	(indirect_thunk_bnd_needed): Likewise.
	(indirect_thunks_used): Likewise.
	(indirect_thunks_bnd_used): Likewise.
	(INDIRECT_LABEL): Likewise.
	(indirect_thunk_name): Likewise.
	(output_indirect_thunk): Likewise.
	(output_indirect_thunk_function): Likewise.
	(ix86_output_indirect_branch): Likewise.
	(ix86_output_indirect_jmp): Likewise.
	(ix86_code_end): Call output_indirect_thunk_function if needed.
	(ix86_output_call_insn): Call ix86_output_indirect_branch if
	needed.
	(ix86_handle_fndecl_attribute): Handle indirect_branch.
	(ix86_attribute_table): Add indirect_branch.
	* config/i386/i386.h (machine_function): Add indirect_branch_type
	and has_local_indirect_jump.
	* config/i386/i386.md (indirect_jump): Set has_local_indirect_jump
	to true.
	(tablejump): Likewise.
	(*indirect_jump): Use ix86_output_indirect_jmp.
	(*tablejump_1): Likewise.
	(simple_return_indirect_internal): Likewise.
	* config/i386/i386.opt (mindirect-branch=): New option.
	(indirect_branch): New.
	(keep): Likewise.
	(thunk): Likewise.
	(thunk-inline): Likewise.
	(thunk-extern): Likewise.
	* doc/extend.texi: Document indirect_branch function attribute.
	* doc/invoke.texi: Document -mindirect-branch= option.

gcc/testsuite/

	* gcc.target/i386/indirect-thunk-1.c: New test.
	* gcc.target/i386/indirect-thunk-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-8.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-7.c: Likewise.
---
 gcc/config/i386/i386-opts.h                        |   8 +
 gcc/config/i386/i386-protos.h                      |   1 +
 gcc/config/i386/i386.c                             | 495 ++++++++++++++++++++-
 gcc/config/i386/i386.h                             |   7 +
 gcc/config/i386/i386.md                            |   8 +-
 gcc/config/i386/i386.opt                           |  20 +
 gcc/doc/extend.texi                                |  10 +
 gcc/doc/invoke.texi                                |  14 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-1.c   |  19 +
 gcc/testsuite/gcc.target/i386/indirect-thunk-2.c   |  19 +
 gcc/testsuite/gcc.target/i386/indirect-thunk-3.c   |  20 +
 gcc/testsuite/gcc.target/i386/indirect-thunk-4.c   |  20 +
 gcc/testsuite/gcc.target/i386/indirect-thunk-5.c   |  16 +
 gcc/testsuite/gcc.target/i386/indirect-thunk-6.c   |  17 +
 gcc/testsuite/gcc.target/i386/indirect-thunk-7.c   |  43 ++
 .../gcc.target/i386/indirect-thunk-attr-1.c        |  22 +
 .../gcc.target/i386/indirect-thunk-attr-2.c        |  20 +
 .../gcc.target/i386/indirect-thunk-attr-3.c        |  21 +
 .../gcc.target/i386/indirect-thunk-attr-4.c        |  20 +
 .../gcc.target/i386/indirect-thunk-attr-5.c        |  22 +
 .../gcc.target/i386/indirect-thunk-attr-6.c        |  21 +
 .../gcc.target/i386/indirect-thunk-attr-7.c        |  44 ++
 .../gcc.target/i386/indirect-thunk-attr-8.c        |  41 ++
 .../gcc.target/i386/indirect-thunk-bnd-1.c         |  19 +
 .../gcc.target/i386/indirect-thunk-bnd-2.c         |  20 +
 .../gcc.target/i386/indirect-thunk-bnd-3.c         |  18 +
 .../gcc.target/i386/indirect-thunk-bnd-4.c         |  19 +
 .../gcc.target/i386/indirect-thunk-extern-1.c      |  19 +
 .../gcc.target/i386/indirect-thunk-extern-2.c      |  19 +
 .../gcc.target/i386/indirect-thunk-extern-3.c      |  20 +
 .../gcc.target/i386/indirect-thunk-extern-4.c      |  20 +
 .../gcc.target/i386/indirect-thunk-extern-5.c      |  16 +
 .../gcc.target/i386/indirect-thunk-extern-6.c      |  17 +
 .../gcc.target/i386/indirect-thunk-extern-7.c      |  43 ++
 .../gcc.target/i386/indirect-thunk-inline-1.c      |  18 +
 .../gcc.target/i386/indirect-thunk-inline-2.c      |  18 +
 .../gcc.target/i386/indirect-thunk-inline-3.c      |  19 +
 .../gcc.target/i386/indirect-thunk-inline-4.c      |  19 +
 .../gcc.target/i386/indirect-thunk-inline-5.c      |  15 +
 .../gcc.target/i386/indirect-thunk-inline-6.c      |  16 +
 .../gcc.target/i386/indirect-thunk-inline-7.c      |  42 ++
 41 files changed, 1289 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-extern-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-extern-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-extern-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-extern-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-extern-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-extern-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-extern-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-inline-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-inline-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-inline-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-inline-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-inline-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-inline-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-inline-7.c

diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
index f245c1573cf..f14cbeee7a1 100644
--- a/gcc/config/i386/i386-opts.h
+++ b/gcc/config/i386/i386-opts.h
@@ -106,4 +106,12 @@ enum prefer_vector_width {
     PVW_AVX512
 };
 
+enum indirect_branch {
+  indirect_branch_unset = 0,
+  indirect_branch_keep,
+  indirect_branch_thunk,
+  indirect_branch_thunk_inline,
+  indirect_branch_thunk_extern
+};
+
 #endif
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 0e49652898c..bf11cc426f9 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -305,6 +305,7 @@ extern enum attr_cpu ix86_schedule;
 #endif
 
 extern const char * ix86_output_call_insn (rtx_insn *insn, rtx call_op);
+extern const char * ix86_output_indirect_jmp (rtx call_op, bool ret_p);
 extern bool ix86_operands_ok_for_move_multiple (rtx *operands, bool load,
 						machine_mode mode);
 extern int ix86_min_insn_size (rtx_insn *);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8696f931806..ac4d1f62f50 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2724,12 +2724,19 @@ make_pass_insert_endbranch (gcc::context *ctxt)
   return new pass_insert_endbranch (ctxt);
 }
 
-/* Return true if a red-zone is in use.  */
+/* Return true if a red-zone is in use.  We can't use red-zone when
+   there are local indirect jumps, like "indirect_jump" or "tablejump",
+   which jump to another place in the function, since "call" in the
+   indirect thunk pushes the return address onto stack, destroying
+   red-zone.  */
 
 bool
 ix86_using_red_zone (void)
 {
-  return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI;
+  return (TARGET_RED_ZONE
+	  && !TARGET_64BIT_MS_ABI
+	  && (!cfun->machine->has_local_indirect_jump
+	      || cfun->machine->indirect_branch_type == indirect_branch_keep));
 }
 \f
 /* Return a string that documents the current -m options.  The caller is
@@ -5797,6 +5804,37 @@ ix86_set_func_type (tree fndecl)
     }
 }
 
+/* Set the indirect_branch_type field from the function FNDECL.  */
+
+static void
+ix86_set_indirect_branch_type (tree fndecl)
+{
+  if (cfun->machine->indirect_branch_type == indirect_branch_unset)
+    {
+      tree attr = lookup_attribute ("indirect_branch",
+				    DECL_ATTRIBUTES (fndecl));
+      if (attr != NULL)
+	{
+	  tree args = TREE_VALUE (attr);
+	  if (args == NULL)
+	    gcc_unreachable ();
+	  tree cst = TREE_VALUE (args);
+	  if (strcmp (TREE_STRING_POINTER (cst), "keep") == 0)
+	    cfun->machine->indirect_branch_type = indirect_branch_keep;
+	  else if (strcmp (TREE_STRING_POINTER (cst), "thunk") == 0)
+	    cfun->machine->indirect_branch_type = indirect_branch_thunk;
+	  else if (strcmp (TREE_STRING_POINTER (cst), "thunk-inline") == 0)
+	    cfun->machine->indirect_branch_type = indirect_branch_thunk_inline;
+	  else if (strcmp (TREE_STRING_POINTER (cst), "thunk-extern") == 0)
+	    cfun->machine->indirect_branch_type = indirect_branch_thunk_extern;
+	  else
+	    gcc_unreachable ();
+	}
+      else
+	cfun->machine->indirect_branch_type = ix86_indirect_branch;
+    }
+}
+
 /* Establish appropriate back-end context for processing the function
    FNDECL.  The argument might be NULL to indicate processing at top
    level, outside of any function scope.  */
@@ -5812,7 +5850,10 @@ ix86_set_current_function (tree fndecl)
 	 one is extern inline and one isn't.  Call ix86_set_func_type
 	 to set the func_type field.  */
       if (fndecl != NULL_TREE)
-	ix86_set_func_type (fndecl);
+	{
+	  ix86_set_func_type (fndecl);
+	  ix86_set_indirect_branch_type (fndecl);
+	}
       return;
     }
 
@@ -5832,6 +5873,7 @@ ix86_set_current_function (tree fndecl)
     }
 
   ix86_set_func_type (fndecl);
+  ix86_set_indirect_branch_type (fndecl);
 
   tree new_tree = DECL_FUNCTION_SPECIFIC_TARGET (fndecl);
   if (new_tree == NULL_TREE)
@@ -10639,6 +10681,181 @@ ix86_setup_frame_addresses (void)
 # endif
 #endif
 
+static int indirectlabelno;
+static bool indirect_thunk_needed = false;
+static bool indirect_thunk_bnd_needed = false;
+
+static int indirect_thunks_used;
+static int indirect_thunks_bnd_used;
+
+#ifndef INDIRECT_LABEL
+# define INDIRECT_LABEL "LIND"
+#endif
+
+/* Fills in the label name that should be used for the indirect thunk.  */
+
+static void
+indirect_thunk_name (char name[32], int regno, bool need_bnd_p)
+{
+  if (USE_HIDDEN_LINKONCE)
+    {
+      const char *bnd = need_bnd_p ? "_bnd" : "";
+      if (regno >= 0)
+	{
+	  const char *reg_prefix;
+	  if (LEGACY_INT_REGNO_P (regno))
+	    reg_prefix = TARGET_64BIT ? "r" : "e";
+	  else
+	    reg_prefix = "";
+	  sprintf (name, "__x86_indirect_thunk%s_%s%s",
+		   bnd, reg_prefix, reg_names[regno]);
+	}
+      else
+	sprintf (name, "__x86_indirect_thunk%s", bnd);
+    }
+  else
+    {
+      if (regno >= 0)
+	{
+	  if (need_bnd_p)
+	    ASM_GENERATE_INTERNAL_LABEL (name, "LITBR", regno);
+	  else
+	    ASM_GENERATE_INTERNAL_LABEL (name, "LITR", regno);
+	}
+      else
+	{
+	  if (need_bnd_p)
+	    ASM_GENERATE_INTERNAL_LABEL (name, "LITB", 0);
+	  else
+	    ASM_GENERATE_INTERNAL_LABEL (name, "LIT", 0);
+	}
+    }
+}
+
+static void
+output_indirect_thunk (bool need_bnd_p, int regno)
+{
+  char indirectlabel1[32];
+  char indirectlabel2[32];
+
+  ASM_GENERATE_INTERNAL_LABEL (indirectlabel1, INDIRECT_LABEL,
+			       indirectlabelno++);
+  ASM_GENERATE_INTERNAL_LABEL (indirectlabel2, INDIRECT_LABEL,
+			       indirectlabelno++);
+
+  /* Call.  */
+  if (need_bnd_p)
+    fputs ("\tbnd call\t", asm_out_file);
+  else
+    fputs ("\tcall\t", asm_out_file);
+  assemble_name_raw (asm_out_file, indirectlabel2);
+  fputc ('\n', asm_out_file);
+
+  ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel1);
+
+  /* lfence.  */
+  fprintf (asm_out_file, "\tlfence\n");
+
+  /* Jump.  */
+  fputs ("\tjmp\t", asm_out_file);
+  assemble_name_raw (asm_out_file, indirectlabel1);
+  fputc ('\n', asm_out_file);
+
+  ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel2);
+
+  if (regno >= 0)
+    {
+      /* MOV.  */
+      rtx xops[2];
+      xops[0] = gen_rtx_MEM (word_mode, stack_pointer_rtx);
+      xops[1] = gen_rtx_REG (word_mode, regno);
+      output_asm_insn ("mov\t{%1, %0|%0, %1}", xops);
+    }
+  else
+    {
+      /* LEA.  */
+      rtx xops[2];
+      xops[0] = stack_pointer_rtx;
+      xops[1] = plus_constant (Pmode, stack_pointer_rtx, UNITS_PER_WORD);
+      output_asm_insn ("lea\t{%E1, %0|%0, %E1}", xops);
+    }
+
+  if (need_bnd_p)
+    fputs ("\tbnd ret\n", asm_out_file);
+  else
+    fputs ("\tret\n", asm_out_file);
+}
+
+static void
+output_indirect_thunk_function (bool need_bnd_p, int regno)
+{
+  char name[32];
+  tree decl;
+
+  /* Create __x86_indirect_thunk/__x86_indirect_thunk_bnd.  */
+  indirect_thunk_name (name, regno, need_bnd_p);
+  decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
+		     get_identifier (name),
+		     build_function_type_list (void_type_node, NULL_TREE));
+  DECL_RESULT (decl) = build_decl (BUILTINS_LOCATION, RESULT_DECL,
+				   NULL_TREE, void_type_node);
+  TREE_PUBLIC (decl) = 1;
+  TREE_STATIC (decl) = 1;
+  DECL_IGNORED_P (decl) = 1;
+
+#if TARGET_MACHO
+  if (TARGET_MACHO)
+    {
+      switch_to_section (darwin_sections[picbase_thunk_section]);
+      fputs ("\t.weak_definition\t", asm_out_file);
+      assemble_name (asm_out_file, name);
+      fputs ("\n\t.private_extern\t", asm_out_file);
+      assemble_name (asm_out_file, name);
+      putc ('\n', asm_out_file);
+      ASM_OUTPUT_LABEL (asm_out_file, name);
+      DECL_WEAK (decl) = 1;
+    }
+  else
+#endif
+    if (USE_HIDDEN_LINKONCE)
+      {
+	cgraph_node::create (decl)->set_comdat_group (DECL_ASSEMBLER_NAME (decl));
+
+	targetm.asm_out.unique_section (decl, 0);
+	switch_to_section (get_named_section (decl, NULL, 0));
+
+	targetm.asm_out.globalize_label (asm_out_file, name);
+	fputs ("\t.hidden\t", asm_out_file);
+	assemble_name (asm_out_file, name);
+	putc ('\n', asm_out_file);
+	ASM_DECLARE_FUNCTION_NAME (asm_out_file, name, decl);
+      }
+    else
+      {
+	switch_to_section (text_section);
+	ASM_OUTPUT_LABEL (asm_out_file, name);
+      }
+
+  DECL_INITIAL (decl) = make_node (BLOCK);
+  current_function_decl = decl;
+  allocate_struct_function (decl, false);
+  init_function_start (decl);
+  /* We're about to hide the function body from callees of final_* by
+     emitting it directly; tell them we're a thunk, if they care.  */
+  cfun->is_thunk = true;
+  first_function_block_is_cold = false;
+  /* Make sure unwind info is emitted for the thunk if needed.  */
+  final_start_function (emit_barrier (), asm_out_file, 1);
+
+  output_indirect_thunk (need_bnd_p, regno);
+
+  final_end_function ();
+  init_insn_lengths ();
+  free_after_compilation (cfun);
+  set_cfun (NULL);
+  current_function_decl = NULL;
+}
+
 static int pic_labels_used;
 
 /* Fills in the label name that should be used for a pc thunk for
@@ -10665,11 +10882,32 @@ ix86_code_end (void)
   rtx xops[2];
   int regno;
 
+  if (indirect_thunk_needed)
+    output_indirect_thunk_function (false, -1);
+  if (indirect_thunk_bnd_needed)
+    output_indirect_thunk_function (true, -1);
+
+  for (regno = FIRST_REX_INT_REG; regno <= LAST_REX_INT_REG; regno++)
+    {
+      int i = regno - FIRST_REX_INT_REG + LAST_INT_REG + 1;
+      if ((indirect_thunks_used & (1 << i)))
+	output_indirect_thunk_function (false, regno);
+
+      if ((indirect_thunks_bnd_used & (1 << i)))
+	output_indirect_thunk_function (true, regno);
+    }
+
   for (regno = FIRST_INT_REG; regno <= LAST_INT_REG; regno++)
     {
       char name[32];
       tree decl;
 
+      if ((indirect_thunks_used & (1 << regno)))
+	output_indirect_thunk_function (false, regno);
+
+      if ((indirect_thunks_bnd_used & (1 << regno)))
+	output_indirect_thunk_function (true, regno);
+
       if (!(pic_labels_used & (1 << regno)))
 	continue;
 
@@ -28119,12 +28357,182 @@ ix86_nopic_noplt_attribute_p (rtx call_op)
   return false;
 }
 
+static void
+ix86_output_indirect_branch (rtx call_op, const char *xasm,
+			     bool sibcall_p)
+{
+  char thunk_name_buf[32];
+  char *thunk_name;
+  char push_buf[64];
+  bool need_bnd_p = ix86_bnd_prefixed_insn_p (current_output_insn);
+  int regno;
+
+  if (REG_P (call_op))
+    regno = REGNO (call_op);
+  else
+    regno = -1;
+
+  if (cfun->machine->indirect_branch_type
+      != indirect_branch_thunk_inline)
+    {
+      if (cfun->machine->indirect_branch_type == indirect_branch_thunk)
+	{
+	  if (regno >= 0)
+	    {
+	      int i = regno;
+	      if (i >= FIRST_REX_INT_REG)
+		i -= (FIRST_REX_INT_REG - LAST_INT_REG - 1);
+	      if (need_bnd_p)
+		indirect_thunks_bnd_used |= 1 << i;
+	      else
+		indirect_thunks_used |= 1 << i;
+	    }
+	  else
+	    {
+	      if (need_bnd_p)
+		indirect_thunk_bnd_needed = true;
+	      else
+		indirect_thunk_needed = true;
+	    }
+	}
+      indirect_thunk_name (thunk_name_buf, regno, need_bnd_p);
+      thunk_name = thunk_name_buf;
+    }
+  else
+    thunk_name = NULL;
+
+  snprintf (push_buf, sizeof (push_buf), "push{%c}\t%s",
+	    TARGET_64BIT ? 'q' : 'l', xasm);
+
+  if (sibcall_p)
+    {
+      if (regno < 0)
+	output_asm_insn (push_buf, &call_op);
+      if (thunk_name != NULL)
+	{
+	  if (need_bnd_p)
+	    fprintf (asm_out_file, "\tbnd jmp\t%s\n", thunk_name);
+	  else
+	    fprintf (asm_out_file, "\tjmp\t%s\n", thunk_name);
+	}
+      else
+	output_indirect_thunk (need_bnd_p, regno);
+    }
+  else
+    {
+      if (regno >= 0 && thunk_name != NULL)
+	{
+	  if (need_bnd_p)
+	    fprintf (asm_out_file, "\tbnd call\t%s\n", thunk_name);
+	  else
+	    fprintf (asm_out_file, "\tcall\t%s\n", thunk_name);
+	  return;
+	}
+
+      char indirectlabel1[32];
+      char indirectlabel2[32];
+
+      ASM_GENERATE_INTERNAL_LABEL (indirectlabel1,
+				   INDIRECT_LABEL,
+				   indirectlabelno++);
+      ASM_GENERATE_INTERNAL_LABEL (indirectlabel2,
+				   INDIRECT_LABEL,
+				   indirectlabelno++);
+
+      /* Jump.  */
+      if (need_bnd_p)
+	fputs ("\tbnd jmp\t", asm_out_file);
+      else
+	fputs ("\tjmp\t", asm_out_file);
+      assemble_name_raw (asm_out_file, indirectlabel2);
+      fputc ('\n', asm_out_file);
+
+      ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel1);
+
+      if (MEM_P (call_op))
+	{
+	  struct ix86_address parts;
+	  rtx addr = XEXP (call_op, 0);
+	  if (ix86_decompose_address (addr, &parts)
+	      && parts.base == stack_pointer_rtx)
+	    {
+	      /* Since call will adjust stack by -UNITS_PER_WORD,
+		 we must convert "disp(stack, index, scale)" to
+		 "disp+UNITS_PER_WORD(stack, index, scale)".  */
+	      if (parts.index)
+		{
+		  addr = gen_rtx_MULT (Pmode, parts.index,
+				       GEN_INT (parts.scale));
+		  addr = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
+				       addr);
+		}
+	      else
+		addr = stack_pointer_rtx;
+
+	      rtx disp;
+	      if (parts.disp != NULL_RTX)
+		disp = plus_constant (Pmode, parts.disp,
+				      UNITS_PER_WORD);
+	      else
+		disp = GEN_INT (UNITS_PER_WORD);
+
+	      addr = gen_rtx_PLUS (Pmode, addr, disp);
+	      call_op = gen_rtx_MEM (GET_MODE (call_op), addr);
+	    }
+	}
+
+      if (regno < 0)
+	output_asm_insn (push_buf, &call_op);
+
+      if (thunk_name != NULL)
+	{
+	  if (need_bnd_p)
+	    fprintf (asm_out_file, "\tbnd jmp\t%s\n", thunk_name);
+	  else
+	    fprintf (asm_out_file, "\tjmp\t%s\n", thunk_name);
+	}
+      else
+	output_indirect_thunk (need_bnd_p, regno);
+
+      ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel2);
+
+      /* Call.  */
+      if (need_bnd_p)
+	fputs ("\tbnd call\t", asm_out_file);
+      else
+	fputs ("\tcall\t", asm_out_file);
+      assemble_name_raw (asm_out_file, indirectlabel1);
+      fputc ('\n', asm_out_file);
+    }
+}
+
+const char *
+ix86_output_indirect_jmp (rtx call_op, bool ret_p)
+{
+  if (cfun->machine->indirect_branch_type != indirect_branch_keep)
+    {
+      /* We can't have red-zone if this isn't a function return since
+	 "call" in the indirect thunk pushes the return address onto
+	 stack, destroying red-zone.  */
+      if (!ret_p && ix86_red_zone_size != 0)
+	gcc_unreachable ();
+
+      ix86_output_indirect_branch (call_op, "%0", true);
+      return "";
+    }
+  else
+    return "%!jmp\t%A0";
+}
+
 /* Output the assembly for a call instruction.  */
 
 const char *
 ix86_output_call_insn (rtx_insn *insn, rtx call_op)
 {
   bool direct_p = constant_call_address_operand (call_op, VOIDmode);
+  bool output_indirect_p
+    = (!TARGET_SEH
+       && cfun->machine->indirect_branch_type != indirect_branch_keep);
   bool seh_nop_p = false;
   const char *xasm;
 
@@ -28134,10 +28542,21 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op)
 	{
 	  if (ix86_nopic_noplt_attribute_p (call_op))
 	    {
+	      direct_p = false;
 	      if (TARGET_64BIT)
-		xasm = "%!jmp\t{*%p0@GOTPCREL(%%rip)|[QWORD PTR %p0@GOTPCREL[rip]]}";
+		{
+		  if (output_indirect_p)
+		    xasm = "{%p0@GOTPCREL(%%rip)|[QWORD PTR %p0@GOTPCREL[rip]]}";
+		  else
+		    xasm = "%!jmp\t{*%p0@GOTPCREL(%%rip)|[QWORD PTR %p0@GOTPCREL[rip]]}";
+		}
 	      else
-		xasm = "%!jmp\t{*%p0@GOT|[DWORD PTR %p0@GOT]}";
+		{
+		  if (output_indirect_p)
+		    xasm = "{%p0@GOT|[DWORD PTR %p0@GOT]}";
+		  else
+		    xasm = "%!jmp\t{*%p0@GOT|[DWORD PTR %p0@GOT]}";
+		}
 	    }
 	  else
 	    xasm = "%!jmp\t%P0";
@@ -28147,9 +28566,17 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op)
       else if (TARGET_SEH)
 	xasm = "%!rex.W jmp\t%A0";
       else
-	xasm = "%!jmp\t%A0";
+	{
+	  if (output_indirect_p)
+	    xasm = "%0";
+	  else
+	    xasm = "%!jmp\t%A0";
+	}
 
-      output_asm_insn (xasm, &call_op);
+      if (output_indirect_p && !direct_p)
+	ix86_output_indirect_branch (call_op, xasm, true);
+      else
+	output_asm_insn (xasm, &call_op);
       return "";
     }
 
@@ -28187,18 +28614,37 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op)
     {
       if (ix86_nopic_noplt_attribute_p (call_op))
 	{
+	  direct_p = false;
 	  if (TARGET_64BIT)
-	    xasm = "%!call\t{*%p0@GOTPCREL(%%rip)|[QWORD PTR %p0@GOTPCREL[rip]]}";
+	    {
+	      if (output_indirect_p)
+		xasm = "{%p0@GOTPCREL(%%rip)|[QWORD PTR %p0@GOTPCREL[rip]]}";
+	      else
+		xasm = "%!call\t{*%p0@GOTPCREL(%%rip)|[QWORD PTR %p0@GOTPCREL[rip]]}";
+	    }
 	  else
-	    xasm = "%!call\t{*%p0@GOT|[DWORD PTR %p0@GOT]}";
+	    {
+	      if (output_indirect_p)
+		xasm = "{%p0@GOT|[DWORD PTR %p0@GOT]}";
+	      else
+		xasm = "%!call\t{*%p0@GOT|[DWORD PTR %p0@GOT]}";
+	    }
 	}
       else
 	xasm = "%!call\t%P0";
     }
   else
-    xasm = "%!call\t%A0";
+    {
+      if (output_indirect_p)
+	xasm = "%0";
+      else
+	xasm = "%!call\t%A0";
+    }
 
-  output_asm_insn (xasm, &call_op);
+  if (output_indirect_p && !direct_p)
+    ix86_output_indirect_branch (call_op, xasm, false);
+  else
+    output_asm_insn (xasm, &call_op);
 
   if (seh_nop_p)
     return "nop";
@@ -40356,7 +40802,7 @@ ix86_handle_struct_attribute (tree *node, tree name, tree, int,
 }
 
 static tree
-ix86_handle_fndecl_attribute (tree *node, tree name, tree, int,
+ix86_handle_fndecl_attribute (tree *node, tree name, tree args, int,
 			      bool *no_add_attrs)
 {
   if (TREE_CODE (*node) != FUNCTION_DECL)
@@ -40365,6 +40811,29 @@ ix86_handle_fndecl_attribute (tree *node, tree name, tree, int,
                name);
       *no_add_attrs = true;
     }
+
+  if (is_attribute_p ("indirect_branch", name))
+    {
+      tree cst = TREE_VALUE (args);
+      if (TREE_CODE (cst) != STRING_CST)
+	{
+	  warning (OPT_Wattributes,
+		   "%qE attribute requires a string constant argument",
+		   name);
+	  *no_add_attrs = true;
+	}
+      else if (strcmp (TREE_STRING_POINTER (cst), "keep") != 0
+	       && strcmp (TREE_STRING_POINTER (cst), "thunk") != 0
+	       && strcmp (TREE_STRING_POINTER (cst), "thunk-inline") != 0
+	       && strcmp (TREE_STRING_POINTER (cst), "thunk-extern") != 0)
+	{
+	  warning (OPT_Wattributes,
+		   "argument to %qE attribute is not "
+		   "(keep|thunk|thunk-inline|thunk-extern)", name);
+	  *no_add_attrs = true;
+	}
+    }
+
   return NULL_TREE;
 }
 
@@ -44777,6 +45246,8 @@ static const struct attribute_spec ix86_attribute_table[] =
     ix86_handle_no_caller_saved_registers_attribute, NULL },
   { "naked", 0, 0, true, false, false, false,
     ix86_handle_fndecl_attribute, NULL },
+  { "indirect_branch", 1, 1, true, false, false, false,
+    ix86_handle_fndecl_attribute, NULL },
 
   /* End element.  */
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 93b7a2c5915..51a920298a4 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -2568,6 +2568,13 @@ struct GTY(()) machine_function {
   /* Function type.  */
   ENUM_BITFIELD(function_type) func_type : 2;
 
+  /* How to generate indirect branch.  */
+  ENUM_BITFIELD(indirect_branch) indirect_branch_type : 3;
+
+  /* If true, the current function has local indirect jumps, like
+     "indirect_jump" or "tablejump".  */
+  BOOL_BITFIELD has_local_indirect_jump : 1;
+
   /* If true, the current function is a function specified with
      the "interrupt" or "no_caller_saved_registers" attribute.  */
   BOOL_BITFIELD no_caller_saved_registers : 1;
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 3f587806407..a7573c468ae 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12313,12 +12313,13 @@
 {
   if (TARGET_X32)
     operands[0] = convert_memory_address (word_mode, operands[0]);
+  cfun->machine->has_local_indirect_jump = true;
 })
 
 (define_insn "*indirect_jump"
   [(set (pc) (match_operand:W 0 "indirect_branch_operand" "rBw"))]
   ""
-  "%!jmp\t%A0"
+  "* return ix86_output_indirect_jmp (operands[0], false);"
   [(set_attr "type" "ibr")
    (set_attr "length_immediate" "0")
    (set_attr "maybe_prefix_bnd" "1")])
@@ -12362,13 +12363,14 @@
 
   if (TARGET_X32)
     operands[0] = convert_memory_address (word_mode, operands[0]);
+  cfun->machine->has_local_indirect_jump = true;
 })
 
 (define_insn "*tablejump_1"
   [(set (pc) (match_operand:W 0 "indirect_branch_operand" "rBw"))
    (use (label_ref (match_operand 1)))]
   ""
-  "%!jmp\t%A0"
+  "* return ix86_output_indirect_jmp (operands[0], false);"
   [(set_attr "type" "ibr")
    (set_attr "length_immediate" "0")
    (set_attr "maybe_prefix_bnd" "1")])
@@ -13097,7 +13099,7 @@
   [(simple_return)
    (use (match_operand 0 "register_operand" "r"))]
   "reload_completed"
-  "%!jmp\t%A0"
+  "* return ix86_output_indirect_jmp (operands[0], true);"
   [(set_attr "type" "ibr")
    (set_attr "length_immediate" "0")
    (set_attr "maybe_prefix_bnd" "1")])
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 09aaa97c2fc..22c806206e4 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1021,3 +1021,23 @@ indirect jump.
 mforce-indirect-call
 Target Report Var(flag_force_indirect_call) Init(0)
 Make all function calls indirect.
+
+mindirect-branch=
+Target Report RejectNegative Joined Enum(indirect_branch) Var(ix86_indirect_branch) Init(indirect_branch_keep)
+Convert indirect call and jump.
+
+Enum
+Name(indirect_branch) Type(enum indirect_branch)
+Known indirect branch choices (for use with the -mindirect-branch= option):
+
+EnumValue
+Enum(indirect_branch) String(keep) Value(indirect_branch_keep)
+
+EnumValue
+Enum(indirect_branch) String(thunk) Value(indirect_branch_thunk)
+
+EnumValue
+Enum(indirect_branch) String(thunk-inline) Value(indirect_branch_thunk_inline)
+
+EnumValue
+Enum(indirect_branch) String(thunk-extern) Value(indirect_branch_thunk_extern)
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 5f0f4b86cb2..6e48d4108a2 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -5735,6 +5735,16 @@ Specify which floating-point unit to use.  You must specify the
 @code{target("fpmath=sse+387")} because the comma would separate
 different options.
 
+@item indirect_branch("@var{choice}")
+@cindex @code{indirect_branch} function attribute, x86
+On x86 targets, the @code{indirect_branch} attribute causes the compiler
+to convert indirect calls and jumps according to @var{choice}.  @samp{keep}
+leaves indirect calls and jumps unmodified.  @samp{thunk} converts them
+to a call and return thunk.  @samp{thunk-inline} converts them to an
+inlined call and return thunk.  @samp{thunk-extern} converts them to an
+external call and return thunk provided in a separate object
+file.
+
 @item nocf_check
 @cindex @code{nocf_check} function attribute
 The @code{nocf_check} attribute on a function is used to inform the
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c6025382dbb..46461d1ada3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1226,7 +1226,8 @@ See RS/6000 and PowerPC Options.
 -mstack-protector-guard-reg=@var{reg} @gol
 -mstack-protector-guard-offset=@var{offset} @gol
 -mstack-protector-guard-symbol=@var{symbol} -mmitigate-rop @gol
--mgeneral-regs-only  -mcall-ms2sysv-xlogues}
+-mgeneral-regs-only -mcall-ms2sysv-xlogues @gol
+-mindirect-branch=@var{choice}}
 
 @emph{x86 Windows Options}
 @gccoptlist{-mconsole  -mcygwin  -mno-cygwin  -mdll @gol
@@ -26764,6 +26765,17 @@ Generate code that uses only the general-purpose registers.  This
 prevents the compiler from using floating-point, vector, mask and bound
 registers.
 
+@item -mindirect-branch=@var{choice}
+@opindex mindirect-branch
+Convert indirect calls and jumps according to @var{choice}.  The default
+is @samp{keep}, which leaves indirect calls and jumps unmodified.
+@samp{thunk} converts them to a call and return thunk.
+@samp{thunk-inline} converts them to an inlined call and return thunk.
+@samp{thunk-extern} converts them to an external call and return thunk
+provided in a separate object file.
+You can control this behavior for a specific function by using the
+function attribute @code{indirect_branch}.  @xref{Function Attributes}.
+
 @end table
 
 These @samp{-m} switches are supported in addition to the above
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-1.c
new file mode 100644
index 00000000000..08827448325
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-2.c
new file mode 100644
index 00000000000..1344b6bc0e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch[256];
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch[offset](offset);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-3.c
new file mode 100644
index 00000000000..dcc9ef75df6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-3.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+
+int
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+  return 0;
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-4.c
new file mode 100644
index 00000000000..2502860b6e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-4.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch[256];
+
+int
+male_indirect_jump (long offset)
+{
+  dispatch[offset](offset);
+  return 0;
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-5.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-5.c
new file mode 100644
index 00000000000..58c81d16316
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-5.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fpic -fno-plt -mindirect-branch=thunk" } */
+
+extern void bar (void);
+
+void
+foo (void)
+{
+  bar ();
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*bar@GOT" } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-6.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-6.c
new file mode 100644
index 00000000000..b4c528a5b30
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-6.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fpic -fno-plt -mindirect-branch=thunk" } */
+
+extern void bar (void);
+
+int
+foo (void)
+{
+  bar ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*bar@GOT" } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" } } */
+/* { dg-final { scan-assembler-times "jmp\[ \t\]*\.LIND" 2 } } */
+/* { dg-final { scan-assembler-times "call\[ \t\]*\.LIND" 2 } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-7.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-7.c
new file mode 100644
index 00000000000..4553ef0b622
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-7.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk -fno-pic" } */
+
+void func0 (void);
+void func1 (void);
+void func2 (void);
+void func3 (void);
+void func4 (void);
+void func4 (void);
+void func5 (void);
+
+void
+bar (int i)
+{
+  switch (i)
+    {
+    default:
+      func0 ();
+      break;
+    case 1:
+      func1 ();
+      break;
+    case 2:
+      func2 ();
+      break;
+    case 3:
+      func3 ();
+      break;
+    case 4:
+      func4 ();
+      break;
+    case 5:
+      func5 ();
+      break;
+    }
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*\.L\[0-9\]+\\(,%" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-1.c
new file mode 100644
index 00000000000..b8e9851c76f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+
+extern void male_indirect_jump (long)
+  __attribute__ ((indirect_branch("thunk")));
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-2.c
new file mode 100644
index 00000000000..1d6d18c2aba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch[256];
+
+__attribute__ ((indirect_branch("thunk")))
+void
+male_indirect_jump (long offset)
+{
+  dispatch[offset](offset);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-3.c
new file mode 100644
index 00000000000..af167840b81
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-3.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+extern int male_indirect_jump (long)
+  __attribute__ ((indirect_branch("thunk-inline")));
+
+int
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+  return 0;
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler-times "jmp\[ \t\]*\.LIND" 2 } } */
+/* { dg-final { scan-assembler-times "call\[ \t\]*\.LIND" 2 } } */
+/* { dg-final { scan-assembler-not "__x86_indirect_thunk" } } */
+/* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-4.c
new file mode 100644
index 00000000000..146124894a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-4.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch[256];
+
+__attribute__ ((indirect_branch("thunk-inline")))
+int
+male_indirect_jump (long offset)
+{
+  dispatch[offset](offset);
+  return 0;
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler-times "jmp\[ \t\]*\.LIND" 2 } } */
+/* { dg-final { scan-assembler-times "call\[ \t\]*\.LIND" 2 } } */
+/* { dg-final { scan-assembler-not "__x86_indirect_thunk" } } */
+/* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-5.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-5.c
new file mode 100644
index 00000000000..0833606046b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-5.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+extern int male_indirect_jump (long)
+  __attribute__ ((indirect_branch("thunk-extern")));
+
+int
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+  return 0;
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler-times "jmp\[ \t\]*\.LIND" 1 { target { ! x32 } } } } */
+/* { dg-final { scan-assembler-times "call\[ \t\]*\.LIND" 1 { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler-not {\t(lfence|pause)} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-6.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-6.c
new file mode 100644
index 00000000000..2eba0fbd9b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-6.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch[256];
+
+__attribute__ ((indirect_branch("thunk-extern")))
+int
+male_indirect_jump (long offset)
+{
+  dispatch[offset](offset);
+  return 0;
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler-times "jmp\[ \t\]*\.LIND" 1 { target { ! x32 } } } } */
+/* { dg-final { scan-assembler-times "call\[ \t\]*\.LIND" 1 { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler-not {\t(lfence|pause)} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-7.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-7.c
new file mode 100644
index 00000000000..f58427eae11
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-7.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-pic" } */
+
+void func0 (void);
+void func1 (void);
+void func2 (void);
+void func3 (void);
+void func4 (void);
+void func4 (void);
+void func5 (void);
+
+__attribute__ ((indirect_branch("thunk-extern")))
+void
+bar (int i)
+{
+  switch (i)
+    {
+    default:
+      func0 ();
+      break;
+    case 1:
+      func1 ();
+      break;
+    case 2:
+      func2 ();
+      break;
+    case 3:
+      func3 ();
+      break;
+    case 4:
+      func4 ();
+      break;
+    case 5:
+      func5 ();
+      break;
+    }
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*\.L\[0-9\]+\\(,%" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" } } */
+/* { dg-final { scan-assembler-not {\t(lfence|pause)} } } */
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-not "call\[ \t\]*\.LIND" } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-8.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-8.c
new file mode 100644
index 00000000000..6960fa0bbfb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-8.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk -fno-pic" } */
+
+void func0 (void);
+void func1 (void);
+void func2 (void);
+void func3 (void);
+void func4 (void);
+void func4 (void);
+void func5 (void);
+
+__attribute__ ((indirect_branch("keep")))
+void
+bar (int i)
+{
+  switch (i)
+    {
+    default:
+      func0 ();
+      break;
+    case 1:
+      func1 ();
+      break;
+    case 2:
+      func2 ();
+      break;
+    case 3:
+      func3 ();
+      break;
+    case 4:
+      func4 ();
+      break;
+    case 5:
+      func5 ();
+      break;
+    }
+}
+
+/* { dg-final { scan-assembler-not "__x86_indirect_thunk" } } */
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-not "call\[ \t\]*\.LIND" } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-1.c
new file mode 100644
index 00000000000..21b25ec5bbf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target { ! x32 } } } */
+/* { dg-options "-O2 -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fno-pic" } */
+
+void (*dispatch) (char *);
+char buf[10];
+
+void
+foo (void)
+{
+  dispatch (buf);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "pushq\[ \t\]%rax" { target x32 } } } */
+/* { dg-final { scan-assembler "bnd jmp\[ \t\]*__x86_indirect_thunk_bnd" } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "bnd call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "bnd ret" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-2.c
new file mode 100644
index 00000000000..7bf7e6a1095
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { ! x32 } } } */
+/* { dg-options "-O2 -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fno-pic" } */
+
+void (*dispatch) (char *);
+char buf[10];
+
+int
+foo (void)
+{
+  dispatch (buf);
+  return 0;
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "pushq\[ \t\]%rax" { target x32 } } } */
+/* { dg-final { scan-assembler "bnd jmp\[ \t\]*__x86_indirect_thunk_bnd" } } */
+/* { dg-final { scan-assembler "bnd jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "bnd call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "bnd ret" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-3.c
new file mode 100644
index 00000000000..14c60f232db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-3.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { *-*-linux* && { ! x32 } } } } */
+/* { dg-options "-O2 -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fpic -fno-plt" } */
+
+void bar (char *);
+char buf[10];
+
+void
+foo (void)
+{
+  bar (buf);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*bar@GOT" } } */
+/* { dg-final { scan-assembler "bnd jmp\[ \t\]*__x86_indirect_thunk_bnd" } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "bnd call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "bnd ret" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-4.c
new file mode 100644
index 00000000000..4fd6f360801
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-4.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target { *-*-linux* && { ! x32 } } } } */
+/* { dg-options "-O2 -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fpic -fno-plt" } */
+
+void bar (char *);
+char buf[10];
+
+int
+foo (void)
+{
+  bar (buf);
+  return 0;
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*bar@GOT" } } */
+/* { dg-final { scan-assembler "bnd jmp\[ \t\]*__x86_indirect_thunk" } } */
+/* { dg-final { scan-assembler "bnd jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-times "bnd call\[ \t\]*\.LIND" 2 } } */
+/* { dg-final { scan-assembler "bnd ret" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-1.c
new file mode 100644
index 00000000000..49f27b49465
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk-extern -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler-not {\t(lfence|pause)} } } */
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-not "call\[ \t\]*\.LIND" } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-2.c
new file mode 100644
index 00000000000..a1e3eb6fc74
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk-extern -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch[256];
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch[offset](offset);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler-not {\t(lfence|pause)} } } */
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-not "call\[ \t\]*\.LIND" } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-3.c
new file mode 100644
index 00000000000..395634e7e5c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-3.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk-extern -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+
+int
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+  return 0;
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler-times "jmp\[ \t\]*\.LIND" 1 { target { ! x32 } } } } */
+/* { dg-final { scan-assembler-times "call\[ \t\]*\.LIND" 1 { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler-not {\t(lfence|pause)} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-4.c
new file mode 100644
index 00000000000..fd3f63379a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-4.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk-extern -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch[256];
+
+int
+male_indirect_jump (long offset)
+{
+  dispatch[offset](offset);
+  return 0;
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler-times "jmp\[ \t\]*\.LIND" 1 { target { ! x32 } } } } */
+/* { dg-final { scan-assembler-times "call\[ \t\]*\.LIND" 1 { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler-not {\t(lfence|pause)} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-5.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-5.c
new file mode 100644
index 00000000000..ba2f92b6f34
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-5.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fpic -fno-plt -mindirect-branch=thunk-extern" } */
+
+extern void bar (void);
+
+void
+foo (void)
+{
+  bar ();
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*bar@GOT" } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" } } */
+/* { dg-final { scan-assembler-not {\t(lfence|pause)} } } */
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-not "call\[ \t\]*\.LIND" } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-6.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-6.c
new file mode 100644
index 00000000000..0c5a2d472c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-6.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fpic -fno-plt -mindirect-branch=thunk-extern" } */
+
+extern void bar (void);
+
+int
+foo (void)
+{
+  bar ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*bar@GOT" } } */
+/* { dg-final { scan-assembler-times "jmp\[ \t\]*\.LIND" 1 } } */
+/* { dg-final { scan-assembler-times "call\[ \t\]*\.LIND" 1 } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" } } */
+/* { dg-final { scan-assembler-not {\t(lfence|pause)} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-7.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-7.c
new file mode 100644
index 00000000000..665252327aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-7.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk-extern -fno-pic" } */
+
+void func0 (void);
+void func1 (void);
+void func2 (void);
+void func3 (void);
+void func4 (void);
+void func4 (void);
+void func5 (void);
+
+void
+bar (int i)
+{
+  switch (i)
+    {
+    default:
+      func0 ();
+      break;
+    case 1:
+      func1 ();
+      break;
+    case 2:
+      func2 ();
+      break;
+    case 3:
+      func3 ();
+      break;
+    case 4:
+      func4 ();
+      break;
+    case 5:
+      func5 ();
+      break;
+    }
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*\.L\[0-9\]+\\(,%" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler-not {\t(lfence|pause)} } } */
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-not "call\[ \t\]*\.LIND" } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-1.c
new file mode 100644
index 00000000000..3ace8d1b031
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk-inline -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-not "__x86_indirect_thunk" } } */
+/* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-2.c
new file mode 100644
index 00000000000..6c97b96f1f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk-inline -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch[256];
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch[offset](offset);
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-not "__x86_indirect_thunk" } } */
+/* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-3.c
new file mode 100644
index 00000000000..8f6759cbf06
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-3.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk-inline -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+
+int
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+  return 0;
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler-times "jmp\[ \t\]*\.LIND" 2 } } */
+/* { dg-final { scan-assembler-times "call\[ \t\]*\.LIND" 2 } } */
+/* { dg-final { scan-assembler-not "__x86_indirect_thunk" } } */
+/* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-4.c
new file mode 100644
index 00000000000..b07d08cab0f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-4.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk-inline -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch[256];
+
+int
+male_indirect_jump (long offset)
+{
+  dispatch[offset](offset);
+  return 0;
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler-times "jmp\[ \t\]*\.LIND" 2 } } */
+/* { dg-final { scan-assembler-times "call\[ \t\]*\.LIND" 2 } } */
+/* { dg-final { scan-assembler-not "__x86_indirect_thunk" } } */
+/* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-5.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-5.c
new file mode 100644
index 00000000000..10794886b1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-5.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fpic -fno-plt -mindirect-branch=thunk-inline" } */
+
+extern void bar (void);
+
+void
+foo (void)
+{
+  bar ();
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*bar@GOT" } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-not "__x86_indirect_thunk" } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-6.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-6.c
new file mode 100644
index 00000000000..a26ec4b06ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-6.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fpic -fno-plt -mindirect-branch=thunk-inline" } */
+
+extern void bar (void);
+
+int
+foo (void)
+{
+  bar ();
+  return 0;
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*bar@GOT" } } */
+/* { dg-final { scan-assembler-times "jmp\[ \t\]*\.LIND" 2 } } */
+/* { dg-final { scan-assembler-times "call\[ \t\]*\.LIND" 2 } } */
+/* { dg-final { scan-assembler-not "__x86_indirect_thunk" } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-7.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-7.c
new file mode 100644
index 00000000000..77253af17c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-7.c
@@ -0,0 +1,42 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk-inline -fno-pic" } */
+
+void func0 (void);
+void func1 (void);
+void func2 (void);
+void func3 (void);
+void func4 (void);
+void func4 (void);
+void func5 (void);
+
+void
+bar (int i)
+{
+  switch (i)
+    {
+    default:
+      func0 ();
+      break;
+    case 1:
+      func1 ();
+      break;
+    case 2:
+      func2 ();
+      break;
+    case 3:
+      func3 ();
+      break;
+    case 4:
+      func4 ();
+      break;
+    case 5:
+      func5 ();
+      break;
+    }
+}
+
+/* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*\.L\[0-9\]+\\(,%" { target { ! x32 } } } } */
+/* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-not "__x86_indirect_thunk" } } */
-- 
2.14.3

* [PATCH 5/5] x86: Add 'V' register operand modifier
  2018-01-07 22:59 [PATCH 0/5] x86: CVE-2017-5715, aka Spectre H.J. Lu
  2018-01-07 22:59 ` [PATCH 4/5] x86: Add -mindirect-branch-register H.J. Lu
  2018-01-07 22:59 ` [PATCH 1/5] x86: Add -mindirect-branch= H.J. Lu
@ 2018-01-07 22:59 ` H.J. Lu
  2018-01-11 23:17   ` Jeff Law
  2018-01-07 22:59 ` [PATCH 3/5] x86: Add -mfunction-return= H.J. Lu
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 135+ messages in thread
From: H.J. Lu @ 2018-01-07 22:59 UTC (permalink / raw)
  To: gcc-patches

Add 'V', a special modifier which prints the name of the full integer
register without '%'.  For

extern void (*func_p) (void);

void
foo (void)
{
  asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p));
}

it generates:

foo:
	movq	func_p(%rip), %rax
	call	__x86_indirect_thunk_rax
	ret
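
The effect of the modifier on the printed operand can be modelled outside
the compiler.  The following is a minimal sketch, not GCC code:
model_print_reg, base_names and the op_size/word_size parameters are
hypothetical names invented for illustration, only four registers and the
AT&T dialect are modelled, and word_size stands in for
GET_MODE_SIZE (word_mode).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of the patched print_reg logic.
   Only four integer registers are modelled; this is not GCC code.  */
static const char *const base_names[] = { "ax", "dx", "cx", "bx" };

/* Write the operand text for register REGNO into OUT (AT&T dialect).
   CODE is the operand modifier (0 for none), OP_SIZE is the size of the
   operand in bytes, and WORD_SIZE (4 or 8) is the size of a full integer
   register.  Returns 0 on success, -1 for an unmodelled register or size.  */
static int
model_print_reg (char *out, size_t len, unsigned regno, int code,
                 int op_size, int word_size)
{
  assert (out != NULL);

  if (regno >= sizeof base_names / sizeof base_names[0])
    return -1;

  /* 'V' always selects the full word-sized register.  */
  int size = (code == 'V') ? word_size : op_size;

  const char *width;
  if (size == 8)
    width = "r";
  else if (size == 4)
    width = "e";
  else if (size == 2)
    width = "";
  else
    return -1;

  /* 'V' also suppresses the '%' prefix, so the name can be pasted into
     a symbol such as __x86_indirect_thunk_rax.  */
  const char *prefix = (code == 'V') ? "" : "%";

  snprintf (out, len, "%s%s%s", prefix, width, base_names[regno]);
  return 0;
}
```

In this model, register 0 with code 'V' and an 8-byte word yields "rax",
while the same register with no modifier and a 4-byte operand yields
"%eax".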

gcc/

	* config/i386/i386.c (print_reg): Print the name of the full
	integer register without '%'.
	(ix86_print_operand): Handle 'V'.
	* doc/extend.texi: Document 'V' modifier.

gcc/testsuite/

	* gcc.target/i386/indirect-thunk-register-4.c: New test.
---
 gcc/config/i386/i386.c                                    | 13 ++++++++++++-
 gcc/doc/extend.texi                                       |  3 +++
 gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c | 13 +++++++++++++
 3 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b75fc48a4da..2627e57a47e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -17591,6 +17591,7 @@ put_condition_code (enum rtx_code code, machine_mode mode, bool reverse,
    If CODE is 'h', pretend the reg is the 'high' byte register.
    If CODE is 'y', print "st(0)" instead of "st", if the reg is stack op.
    If CODE is 'd', duplicate the operand for AVX instruction.
+   If CODE is 'V', print naked full integer register name without %.
  */
 
 void
@@ -17601,7 +17602,7 @@ print_reg (rtx x, int code, FILE *file)
   unsigned int regno;
   bool duplicated;
 
-  if (ASSEMBLER_DIALECT == ASM_ATT)
+  if (ASSEMBLER_DIALECT == ASM_ATT && code != 'V')
     putc ('%', file);
 
   if (x == pc_rtx)
@@ -17653,6 +17654,14 @@ print_reg (rtx x, int code, FILE *file)
       return;
     }
 
+  if (code == 'V')
+    {
+      if (GENERAL_REGNO_P (regno))
+	msize = GET_MODE_SIZE (word_mode);
+      else
+	error ("'V' modifier on non-integer register");
+    }
+
   duplicated = code == 'd' && TARGET_AVX;
 
   switch (msize)
@@ -17772,6 +17781,7 @@ print_reg (rtx x, int code, FILE *file)
    & -- print some in-use local-dynamic symbol name.
    H -- print a memory address offset by 8; used for sse high-parts
    Y -- print condition for XOP pcom* instruction.
+   V -- print naked full integer register name without %.
    + -- print a branch hint as 'cs' or 'ds' prefix
    ; -- print a semicolon (after prefixes due to bug in older gas).
    ~ -- print "i" if TARGET_AVX2, "f" otherwise.
@@ -17995,6 +18005,7 @@ ix86_print_operand (FILE *file, rtx x, int code)
 	case 'X':
 	case 'P':
 	case 'p':
+	case 'V':
 	  break;
 
 	case 's':
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 7690a2bdb7d..97a420155a4 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -9273,6 +9273,9 @@ The table below shows the list of supported modifiers and their effects.
 @tab @code{2}
 @end multitable
 
+@code{V} is a special modifier which prints the name of the full integer
+register without @code{%}.
+
 @anchor{x86floatingpointasmoperands}
 @subsubsection x86 Floating-Point @code{asm} Operands
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c
new file mode 100644
index 00000000000..f0cd9b75be8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=keep -fno-pic" } */
+
+extern void (*func_p) (void);
+
+void
+foo (void)
+{
+  asm("call __x86_indirect_thunk_%V0" : : "a" (func_p));
+}
+
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_eax" { target ia32 } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_rax" { target { ! ia32 } } } } */
-- 
2.14.3


* [PATCH 4/5] x86: Add -mindirect-branch-register
  2018-01-07 22:59 [PATCH 0/5] x86: CVE-2017-5715, aka Spectre H.J. Lu
@ 2018-01-07 22:59 ` H.J. Lu
  2018-01-11 23:11   ` Jeff Law
  2018-01-07 22:59 ` [PATCH 1/5] x86: Add -mindirect-branch= H.J. Lu
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 135+ messages in thread
From: H.J. Lu @ 2018-01-07 22:59 UTC (permalink / raw)
  To: gcc-patches

Add -mindirect-branch-register to force indirect calls and jumps to go
through a register.  This is implemented by disabling the patterns that
allow an indirect branch via memory, in the same way as for TARGET_X32.

-mindirect-branch= and -mfunction-return= tests are updated with
-mno-indirect-branch-register to avoid false test failures when
-mindirect-branch-register is added to RUNTESTFLAGS for "make check".
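
As an illustration of the kind of source this option affects, here is a
minimal sketch modelled on the tests in this series.  The names handler_t,
handlers, twice, negate and call_handler are hypothetical.  The
expectation, not verified here, is that with -mindirect-branch-register
the call target is loaded into a register first instead of being used
directly from memory.

```c
#include <assert.h>

typedef long (*handler_t) (long);

static long twice (long x) { return 2 * x; }
static long negate (long x) { return -x; }

/* Global table of call targets.  A call through it is an indirect
   branch whose target address lives in memory.  */
handler_t handlers[2] = { twice, negate };

/* Dispatch through the table.  To inspect the generated code, one could
   compile with something like
     gcc -O2 -fno-pic -mindirect-branch=thunk -mindirect-branch-register -S
   and compare against -mno-indirect-branch-register.  */
long
call_handler (unsigned idx, long arg)
{
  assert (idx < 2);
  return handlers[idx] (arg);
}
```

The behaviour of the program is the same either way; the option only
changes how the indirect call is emitted.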

gcc/

	* config/i386/constraints.md (Bs): Disallow memory operand for
	-mindirect-branch-register.
	(Bw): Likewise.
	* config/i386/predicates.md (indirect_branch_operand): Likewise.
	(GOT_memory_operand): Likewise.
	(call_insn_operand): Likewise.
	(sibcall_insn_operand): Likewise.
	(GOT32_symbol_operand): Likewise.
	* config/i386/i386.md (indirect_jump): Call convert_memory_address
	for -mindirect-branch-register.
	(tablejump): Likewise.
	(*sibcall_memory): Likewise.
	(*sibcall_value_memory): Likewise.
	Disallow peepholes of indirect call and jump via memory for
	-mindirect-branch-register.
	(*call_pop): Replace m with Bw.
	(*call_value_pop): Likewise.
	(*sibcall_pop_memory): Replace m with Bs.
	* config/i386/i386.opt (mindirect-branch-register): New option.
	* doc/invoke.texi: Document -mindirect-branch-register option.

gcc/testsuite/

	* gcc.target/i386/indirect-thunk-1.c (dg-options): Add
	-mno-indirect-branch-register.
	* gcc.target/i386/indirect-thunk-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-bnd-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-inline-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-loop-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
	* gcc.target/i386/ret-thunk-10.c: Likewise.
	* gcc.target/i386/ret-thunk-11.c: Likewise.
	* gcc.target/i386/ret-thunk-12.c: Likewise.
	* gcc.target/i386/ret-thunk-13.c: Likewise.
	* gcc.target/i386/ret-thunk-14.c: Likewise.
	* gcc.target/i386/ret-thunk-15.c: Likewise.
	* gcc.target/i386/ret-thunk-9.c: Likewise.
	* gcc.target/i386/indirect-thunk-register-1.c: New test.
	* gcc.target/i386/indirect-thunk-register-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-register-3.c: Likewise.
---
 gcc/config/i386/constraints.md                     | 12 +++++---
 gcc/config/i386/i386.md                            | 34 ++++++++++++++--------
 gcc/config/i386/i386.opt                           |  4 +++
 gcc/config/i386/predicates.md                      | 21 ++++++++-----
 gcc/doc/invoke.texi                                |  6 +++-
 gcc/testsuite/gcc.target/i386/indirect-thunk-1.c   |  2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-2.c   |  2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-3.c   |  2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-4.c   |  2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-5.c   |  2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-6.c   |  2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-7.c   |  2 +-
 .../gcc.target/i386/indirect-thunk-attr-1.c        |  2 +-
 .../gcc.target/i386/indirect-thunk-attr-2.c        |  2 +-
 .../gcc.target/i386/indirect-thunk-attr-3.c        |  2 +-
 .../gcc.target/i386/indirect-thunk-attr-4.c        |  2 +-
 .../gcc.target/i386/indirect-thunk-attr-5.c        |  2 +-
 .../gcc.target/i386/indirect-thunk-attr-6.c        |  2 +-
 .../gcc.target/i386/indirect-thunk-attr-7.c        |  2 +-
 .../gcc.target/i386/indirect-thunk-bnd-1.c         |  2 +-
 .../gcc.target/i386/indirect-thunk-bnd-2.c         |  2 +-
 .../gcc.target/i386/indirect-thunk-bnd-3.c         |  2 +-
 .../gcc.target/i386/indirect-thunk-bnd-4.c         |  2 +-
 .../gcc.target/i386/indirect-thunk-extern-1.c      |  2 +-
 .../gcc.target/i386/indirect-thunk-extern-2.c      |  2 +-
 .../gcc.target/i386/indirect-thunk-extern-3.c      |  2 +-
 .../gcc.target/i386/indirect-thunk-extern-4.c      |  2 +-
 .../gcc.target/i386/indirect-thunk-extern-5.c      |  2 +-
 .../gcc.target/i386/indirect-thunk-extern-6.c      |  2 +-
 .../gcc.target/i386/indirect-thunk-extern-7.c      |  2 +-
 .../gcc.target/i386/indirect-thunk-inline-1.c      |  2 +-
 .../gcc.target/i386/indirect-thunk-inline-2.c      |  2 +-
 .../gcc.target/i386/indirect-thunk-inline-3.c      |  2 +-
 .../gcc.target/i386/indirect-thunk-inline-4.c      |  2 +-
 .../gcc.target/i386/indirect-thunk-inline-5.c      |  2 +-
 .../gcc.target/i386/indirect-thunk-inline-6.c      |  2 +-
 .../gcc.target/i386/indirect-thunk-inline-7.c      |  2 +-
 .../gcc.target/i386/indirect-thunk-loop-1.c        |  2 +-
 .../gcc.target/i386/indirect-thunk-loop-2.c        |  2 +-
 .../gcc.target/i386/indirect-thunk-loop-3.c        |  2 +-
 .../gcc.target/i386/indirect-thunk-loop-4.c        |  2 +-
 .../gcc.target/i386/indirect-thunk-loop-5.c        |  2 +-
 .../gcc.target/i386/indirect-thunk-register-1.c    | 22 ++++++++++++++
 .../gcc.target/i386/indirect-thunk-register-2.c    | 20 +++++++++++++
 .../gcc.target/i386/indirect-thunk-register-3.c    | 19 ++++++++++++
 gcc/testsuite/gcc.target/i386/ret-thunk-10.c       |  2 +-
 gcc/testsuite/gcc.target/i386/ret-thunk-11.c       |  2 +-
 gcc/testsuite/gcc.target/i386/ret-thunk-12.c       |  2 +-
 gcc/testsuite/gcc.target/i386/ret-thunk-13.c       |  2 +-
 gcc/testsuite/gcc.target/i386/ret-thunk-14.c       |  2 +-
 gcc/testsuite/gcc.target/i386/ret-thunk-15.c       |  2 +-
 gcc/testsuite/gcc.target/i386/ret-thunk-9.c        |  2 +-
 52 files changed, 158 insertions(+), 68 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-register-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-register-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-register-3.c

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 5f6c6d65332..5592c43073e 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -225,16 +225,20 @@
 
 (define_constraint "Bs"
   "@internal Sibcall memory operand."
-  (ior (and (not (match_test "TARGET_X32"))
+  (ior (and (not (match_test "TARGET_X32
+			      || ix86_indirect_branch_thunk_register"))
 	    (match_operand 0 "sibcall_memory_operand"))
-       (and (match_test "TARGET_X32 && Pmode == DImode")
+       (and (match_test "TARGET_X32 && Pmode == DImode
+			 && !ix86_indirect_branch_thunk_register")
 	    (match_operand 0 "GOT_memory_operand"))))
 
 (define_constraint "Bw"
   "@internal Call memory operand."
-  (ior (and (not (match_test "TARGET_X32"))
+  (ior (and (not (match_test "TARGET_X32
+			      || ix86_indirect_branch_thunk_register"))
 	    (match_operand 0 "memory_operand"))
-       (and (match_test "TARGET_X32 && Pmode == DImode")
+       (and (match_test "TARGET_X32 && Pmode == DImode
+			 && !ix86_indirect_branch_thunk_register")
 	    (match_operand 0 "GOT_memory_operand"))))
 
 (define_constraint "Bz"
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6c832a867c8..c0efc380e37 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12311,7 +12311,7 @@
   [(set (pc) (match_operand 0 "indirect_branch_operand"))]
   ""
 {
-  if (TARGET_X32)
+  if (TARGET_X32 || ix86_indirect_branch_thunk_register)
     operands[0] = convert_memory_address (word_mode, operands[0]);
   cfun->machine->has_local_indirect_jump = true;
 })
@@ -12361,7 +12361,7 @@
 					 OPTAB_DIRECT);
     }
 
-  if (TARGET_X32)
+  if (TARGET_X32 || ix86_indirect_branch_thunk_register)
     operands[0] = convert_memory_address (word_mode, operands[0]);
   cfun->machine->has_local_indirect_jump = true;
 })
@@ -12606,7 +12606,7 @@
   [(call (mem:QI (match_operand:W 0 "memory_operand" "m"))
 	 (match_operand 1))
    (unspec [(const_int 0)] UNSPEC_PEEPSIB)]
-  "!TARGET_X32"
+  "!TARGET_X32 && !ix86_indirect_branch_thunk_register"
   "* return ix86_output_call_insn (insn, operands[0]);"
   [(set_attr "type" "call")])
 
@@ -12615,7 +12615,9 @@
 	(match_operand:W 1 "memory_operand"))
    (call (mem:QI (match_dup 0))
 	 (match_operand 3))]
-  "!TARGET_X32 && SIBLING_CALL_P (peep2_next_insn (1))
+  "!TARGET_X32
+   && !ix86_indirect_branch_thunk_register
+   && SIBLING_CALL_P (peep2_next_insn (1))
    && !reg_mentioned_p (operands[0],
 			CALL_INSN_FUNCTION_USAGE (peep2_next_insn (1)))"
   [(parallel [(call (mem:QI (match_dup 1))
@@ -12628,7 +12630,9 @@
    (unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
    (call (mem:QI (match_dup 0))
 	 (match_operand 3))]
-  "!TARGET_X32 && SIBLING_CALL_P (peep2_next_insn (2))
+  "!TARGET_X32
+   && !ix86_indirect_branch_thunk_register
+   && SIBLING_CALL_P (peep2_next_insn (2))
    && !reg_mentioned_p (operands[0],
 			CALL_INSN_FUNCTION_USAGE (peep2_next_insn (2)))"
   [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
@@ -12650,7 +12654,7 @@
 })
 
 (define_insn "*call_pop"
-  [(call (mem:QI (match_operand:SI 0 "call_insn_operand" "lmBz"))
+  [(call (mem:QI (match_operand:SI 0 "call_insn_operand" "lBwBz"))
 	 (match_operand 1))
    (set (reg:SI SP_REG)
 	(plus:SI (reg:SI SP_REG)
@@ -12670,7 +12674,7 @@
   [(set_attr "type" "call")])
 
 (define_insn "*sibcall_pop_memory"
-  [(call (mem:QI (match_operand:SI 0 "memory_operand" "m"))
+  [(call (mem:QI (match_operand:SI 0 "memory_operand" "Bs"))
 	 (match_operand 1))
    (set (reg:SI SP_REG)
 	(plus:SI (reg:SI SP_REG)
@@ -12724,7 +12728,9 @@
   [(set (match_operand:W 0 "register_operand")
         (match_operand:W 1 "memory_operand"))
    (set (pc) (match_dup 0))]
-  "!TARGET_X32 && peep2_reg_dead_p (2, operands[0])"
+  "!TARGET_X32
+   && !ix86_indirect_branch_thunk_register
+   && peep2_reg_dead_p (2, operands[0])"
   [(set (pc) (match_dup 1))])
 
 ;; Call subroutine, returning value in operand 0
@@ -12805,7 +12811,7 @@
  	(call (mem:QI (match_operand:W 1 "memory_operand" "m"))
 	      (match_operand 2)))
    (unspec [(const_int 0)] UNSPEC_PEEPSIB)]
-  "!TARGET_X32"
+  "!TARGET_X32 && !ix86_indirect_branch_thunk_register"
   "* return ix86_output_call_insn (insn, operands[1]);"
   [(set_attr "type" "callv")])
 
@@ -12815,7 +12821,9 @@
    (set (match_operand 2)
    (call (mem:QI (match_dup 0))
 		 (match_operand 3)))]
-  "!TARGET_X32 && SIBLING_CALL_P (peep2_next_insn (1))
+  "!TARGET_X32
+   && !ix86_indirect_branch_thunk_register
+   && SIBLING_CALL_P (peep2_next_insn (1))
    && !reg_mentioned_p (operands[0],
 			CALL_INSN_FUNCTION_USAGE (peep2_next_insn (1)))"
   [(parallel [(set (match_dup 2)
@@ -12830,7 +12838,9 @@
    (set (match_operand 2)
 	(call (mem:QI (match_dup 0))
 	      (match_operand 3)))]
-  "!TARGET_X32 && SIBLING_CALL_P (peep2_next_insn (2))
+  "!TARGET_X32
+   && !ix86_indirect_branch_thunk_register
+   && SIBLING_CALL_P (peep2_next_insn (2))
    && !reg_mentioned_p (operands[0],
 			CALL_INSN_FUNCTION_USAGE (peep2_next_insn (2)))"
   [(unspec_volatile [(const_int 0)] UNSPECV_BLOCKAGE)
@@ -12855,7 +12865,7 @@
 
 (define_insn "*call_value_pop"
   [(set (match_operand 0)
-	(call (mem:QI (match_operand:SI 1 "call_insn_operand" "lmBz"))
+	(call (mem:QI (match_operand:SI 1 "call_insn_operand" "lBwBz"))
 	      (match_operand 2)))
    (set (reg:SI SP_REG)
 	(plus:SI (reg:SI SP_REG)
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 2b62363973d..aca7e60a81d 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1046,6 +1046,10 @@ Enum(indirect_branch) String(thunk-inline) Value(indirect_branch_thunk_inline)
 EnumValue
 Enum(indirect_branch) String(thunk-extern) Value(indirect_branch_thunk_extern)
 
+mindirect-branch-register
+Target Report Var(ix86_indirect_branch_thunk_register) Init(0)
+Force indirect call and jump via register.
+
 mindirect-branch-loop=
 Target Report RejectNegative Joined Enum(indirect_branch_loop) Var(ix86_indirect_branch_loop) Init(indirect_branch_loop_lfence)
 Control loop filler in call and return thunk for indirect call and jump.
diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
index 6fb2b4daf67..5ae443231b8 100644
--- a/gcc/config/i386/predicates.md
+++ b/gcc/config/i386/predicates.md
@@ -665,7 +665,8 @@
 ;; Test for a valid operand for indirect branch.
 (define_predicate "indirect_branch_operand"
   (ior (match_operand 0 "register_operand")
-       (and (not (match_test "TARGET_X32"))
+       (and (not (match_test "TARGET_X32
+			      || ix86_indirect_branch_thunk_register"))
 	    (match_operand 0 "memory_operand"))))
 
 ;; Return true if OP is a memory operands that can be used in sibcalls.
@@ -694,7 +695,8 @@
 
 ;; Return true if OP is a GOT memory operand.
 (define_predicate "GOT_memory_operand"
-  (match_operand 0 "memory_operand")
+  (and (match_test "!ix86_indirect_branch_thunk_register")
+       (match_operand 0 "memory_operand"))
 {
   op = XEXP (op, 0);
   return (GET_CODE (op) == CONST
@@ -708,9 +710,11 @@
   (ior (match_test "constant_call_address_operand
 		     (op, mode == VOIDmode ? mode : Pmode)")
        (match_operand 0 "call_register_no_elim_operand")
-       (ior (and (not (match_test "TARGET_X32"))
+       (ior (and (not (match_test "TARGET_X32
+				   || ix86_indirect_branch_thunk_register"))
 		 (match_operand 0 "memory_operand"))
-	    (and (match_test "TARGET_X32 && Pmode == DImode")
+	    (and (match_test "TARGET_X32 && Pmode == DImode
+			      && !ix86_indirect_branch_thunk_register")
 		 (match_operand 0 "GOT_memory_operand")))))
 
 ;; Similarly, but for tail calls, in which we cannot allow memory references.
@@ -718,14 +722,17 @@
   (ior (match_test "constant_call_address_operand
 		     (op, mode == VOIDmode ? mode : Pmode)")
        (match_operand 0 "register_no_elim_operand")
-       (ior (and (not (match_test "TARGET_X32"))
+       (ior (and (not (match_test "TARGET_X32
+				   || ix86_indirect_branch_thunk_register"))
 		 (match_operand 0 "sibcall_memory_operand"))
-	    (and (match_test "TARGET_X32 && Pmode == DImode")
+	    (and (match_test "TARGET_X32 && Pmode == DImode
+			      && !ix86_indirect_branch_thunk_register")
 		 (match_operand 0 "GOT_memory_operand")))))
 
 ;; Return true if OP is a 32-bit GOT symbol operand.
 (define_predicate "GOT32_symbol_operand"
-  (match_test "GET_CODE (op) == CONST
+  (match_test "!ix86_indirect_branch_thunk_register
+	       && GET_CODE (op) == CONST
                && GET_CODE (XEXP (op, 0)) == UNSPEC
                && XINT (XEXP (op, 0), 1) == UNSPEC_GOT"))
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index fa72a929751..32d68bb0784 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1228,7 +1228,7 @@ See RS/6000 and PowerPC Options.
 -mstack-protector-guard-symbol=@var{symbol} -mmitigate-rop @gol
 -mgeneral-regs-only -mcall-ms2sysv-xlogues @gol
 -mindirect-branch=@var{choice} -mindirect-branch-loop=@var{choice}
--mfunction-return==@var{choice}}
+-mfunction-return=@var{choice} -mindirect-branch-register}
 
 @emph{x86 Windows Options}
 @gccoptlist{-mconsole  -mcygwin  -mno-cygwin  -mdll @gol
@@ -26795,6 +26795,10 @@ Control loop filler in call and return thunk for indirect call and jump.
 @code{pause} as loop filler.  @samp{nop} uses @code{nop} as loop filler.
 The default is @samp{lfence}.
 
+@item -mindirect-branch-register
+@opindex -mindirect-branch-register
+Force indirect call and jump via register.
+
 @end table
 
 These @samp{-m} switches are supported in addition to the above
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-1.c
index bd29d466f5c..f4f2b7debe0 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-1.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-2.c
index cf6a6736524..d4e5dadd966 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-2.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-3.c
index 46925434933..9802fae5d04 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-3.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-4.c
index ee95d21c08e..fad3105b50d 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-4.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-5.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-5.c
index 7ea4af5bb71..e44f2ff5682 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-5.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target *-*-linux* } } */
-/* { dg-options "-O2 -mfunction-return=keep -fpic -fno-plt -mindirect-branch=thunk" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -fpic -fno-plt -mindirect-branch=thunk" } */
 
 extern void bar (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-6.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-6.c
index cbf2159ce50..f1e03a30854 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-6.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target *-*-linux* } } */
-/* { dg-options "-O2 -mfunction-return=keep -fpic -fno-plt -mindirect-branch=thunk" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -fpic -fno-plt -mindirect-branch=thunk" } */
 
 extern void bar (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-7.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-7.c
index 44130a4175d..fc91a334459 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-7.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk -fno-pic" } */
 
 void func0 (void);
 void func1 (void);
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-1.c
index 2e3e3a5ca8b..a8ab95b6451 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-1.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-2.c
index c88fd33c494..467d62324d5 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-2.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-3.c
index 21b69728796..02223f8d0f4 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-3.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-4.c
index 0bd6aab2fd6..a80b46af934 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-4.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-5.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-5.c
index 98785a38248..4bb1c5f9220 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-5.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-6.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-6.c
index a498a39e404..4e33a638862 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-6.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-7.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-7.c
index 66f295d1eb6..427ba3ddbb4 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-7.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -fno-pic" } */
 
 void func0 (void);
 void func1 (void);
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-1.c
index 154eb7adfcd..3399ad56a7f 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-1.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! x32 } } } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fno-pic" } */
 
 void (*dispatch) (char *);
 char buf[10];
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-2.c
index 62d9831af27..daa9528f7bd 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-2.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! x32 } } } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fno-pic" } */
 
 void (*dispatch) (char *);
 char buf[10];
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-3.c
index b5a601a4da3..647ec5a4ade 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-3.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { *-*-linux* && { ! x32 } } } } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fpic -fno-plt" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fpic -fno-plt" } */
 
 void bar (char *);
 char buf[10];
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-4.c
index c36a821c8e5..3a7a1cea8bc 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-4.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { *-*-linux* && { ! x32 } } } } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fpic -fno-plt" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk -fcheck-pointer-bounds -mmpx -fpic -fno-plt" } */
 
 void bar (char *);
 char buf[10];
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-1.c
index 637fc3d3f4e..5c20a35ecec 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-1.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-extern -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk-extern -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-2.c
index ff9efe03fe6..b2fb6e1bcd2 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-2.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-extern -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk-extern -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-3.c
index 2686a5f2db4..9c84547cd7c 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-3.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-extern -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk-extern -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-4.c
index f07f6b214ad..457849564bb 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-4.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-extern -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk-extern -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-5.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-5.c
index 21740ac5b7f..5c07e02df6a 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-5.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target *-*-linux* } } */
-/* { dg-options "-O2 -mfunction-return=keep -fpic -fno-plt -mindirect-branch=thunk-extern" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -fpic -fno-plt -mindirect-branch=thunk-extern" } */
 
 extern void bar (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-6.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-6.c
index a77c1f470b8..3eb440693a0 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-6.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target *-*-linux* } } */
-/* { dg-options "-O2 -mfunction-return=keep -fpic -fno-plt -mindirect-branch=thunk-extern" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -fpic -fno-plt -mindirect-branch=thunk-extern" } */
 
 extern void bar (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-7.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-7.c
index e64910fd4aa..d4747ea0764 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-7.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-extern -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk-extern -fno-pic" } */
 
 void func0 (void);
 void func1 (void);
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-1.c
index efa0096e1e0..f7fad345ca4 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-1.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-inline -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk-inline -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-2.c
index 775d0b8c53e..91388544a20 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-2.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-inline -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk-inline -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-3.c
index 788271f049f..69f03e6472e 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-3.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-inline -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk-inline -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-4.c
index ef8a2c746a7..226b776abcf 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-4.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-inline -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk-inline -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-5.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-5.c
index 848ceefca02..b9120017c10 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-5.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target *-*-linux* } } */
-/* { dg-options "-O2 -mfunction-return=keep -fpic -fno-plt -mindirect-branch=thunk-inline" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -fpic -fno-plt -mindirect-branch=thunk-inline" } */
 
 extern void bar (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-6.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-6.c
index 64608100782..fbd6f9ec457 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-6.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-6.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target *-*-linux* } } */
-/* { dg-options "-O2 -mfunction-return=keep -fpic -fno-plt -mindirect-branch=thunk-inline" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -fpic -fno-plt -mindirect-branch=thunk-inline" } */
 
 extern void bar (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-7.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-7.c
index 3c2758360f5..2553c56f97f 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-7.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-inline-7.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-inline -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk-inline -fno-pic" } */
 
 void func0 (void);
 void func1 (void);
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-1.c
index 1b0e2c58775..c266ca6f2da 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-1.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk -mindirect-branch-loop=pause -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mindirect-branch=thunk -mindirect-branch-loop=pause -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-2.c
index feace47a765..f7c1cf6c45a 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-2.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk -mindirect-branch-loop=nop -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mindirect-branch=thunk -mindirect-branch-loop=nop -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-3.c
index ad2165fa7aa..ef5c4b84312 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-3.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk -mindirect-branch-loop=lfence -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mindirect-branch=thunk -mindirect-branch-loop=lfence -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-4.c
index 4ba997da966..941fcdaffb1 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-4.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk-inline -mindirect-branch-loop=pause -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mindirect-branch=thunk-inline -mindirect-branch-loop=pause -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-5.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-5.c
index 10fb2193f5e..0c5ace58358 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-5.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-loop-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mindirect-branch=thunk-extern -mindirect-branch-loop=pause -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mindirect-branch=thunk-extern -mindirect-branch-loop=pause -fno-pic" } */
 
 typedef void (*dispatch_t)(long offset);
 
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-register-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-register-1.c
new file mode 100644
index 00000000000..06a1f9fa84e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-register-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk -mindirect-branch-register -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "mov\[ \t\](%eax|%rax), \\((%esp|%rsp)\\)" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
+/* { dg-final { scan-assembler-not "push(?:l|q)\[ \t\]*_?dispatch"  } } */
+/* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" } } */
+/* { dg-final { scan-assembler-not "__x86_indirect_thunk\n" } } */
+/* { dg-final { scan-assembler-not "__x86_indirect_thunk_bnd\n" } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-register-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-register-2.c
new file mode 100644
index 00000000000..428d6f9e986
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-register-2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk-inline -mindirect-branch-register -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler "mov\[ \t\](%eax|%rax), \\((%esp|%rsp)\\)" } } */
+/* { dg-final { scan-assembler {\tlfence} } } */
+/* { dg-final { scan-assembler-not "push(?:l|q)\[ \t\]*_?dispatch"  } } */
+/* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" } } */
+/* { dg-final { scan-assembler-not "__x86_indirect_thunk" } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-register-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-register-3.c
new file mode 100644
index 00000000000..28dcdcf2855
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-register-3.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mindirect-branch=thunk-extern -mindirect-branch-register -fno-pic" } */
+
+typedef void (*dispatch_t)(long offset);
+
+dispatch_t dispatch;
+
+void
+male_indirect_jump (long offset)
+{
+  dispatch(offset);
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" } } */
+/* { dg-final { scan-assembler-not "push(?:l|q)\[ \t\]*_?dispatch"  } } */
+/* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" } } */
+/* { dg-final { scan-assembler-not {\t(lfence|pause|nop)} } } */
+/* { dg-final { scan-assembler-not "jmp\[ \t\]*\.LIND" } } */
+/* { dg-final { scan-assembler-not "call\[ \t\]*\.LIND" } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-10.c b/gcc/testsuite/gcc.target/i386/ret-thunk-10.c
index d7313d631aa..da8029bad49 100644
--- a/gcc/testsuite/gcc.target/i386/ret-thunk-10.c
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-10.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=thunk-inline -mindirect-branch=thunk -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=thunk-inline -mindirect-branch=thunk -fno-pic" } */
 
 extern void (*bar) (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-11.c b/gcc/testsuite/gcc.target/i386/ret-thunk-11.c
index 66efd2adfd3..6964997871d 100644
--- a/gcc/testsuite/gcc.target/i386/ret-thunk-11.c
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-11.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=thunk-extern -mindirect-branch=thunk -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=thunk-extern -mindirect-branch=thunk -fno-pic" } */
 
 extern void (*bar) (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-12.c b/gcc/testsuite/gcc.target/i386/ret-thunk-12.c
index b5306581e91..ff0234bd17d 100644
--- a/gcc/testsuite/gcc.target/i386/ret-thunk-12.c
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-12.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk -fno-pic" } */
 
 extern void (*bar) (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-13.c b/gcc/testsuite/gcc.target/i386/ret-thunk-13.c
index 3759246d7ff..a5b16472051 100644
--- a/gcc/testsuite/gcc.target/i386/ret-thunk-13.c
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-13.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-inline -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk-inline -fno-pic" } */
 
 extern void (*bar) (void);
 extern int foo (void) __attribute__ ((function_return("thunk")));
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-14.c b/gcc/testsuite/gcc.target/i386/ret-thunk-14.c
index 6827fa250aa..219d71548bf 100644
--- a/gcc/testsuite/gcc.target/i386/ret-thunk-14.c
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-14.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=thunk-extern -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=thunk-extern -fno-pic" } */
 
 extern void (*bar) (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-15.c b/gcc/testsuite/gcc.target/i386/ret-thunk-15.c
index 0437ca2453b..bad6b16820d 100644
--- a/gcc/testsuite/gcc.target/i386/ret-thunk-15.c
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-15.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=keep -mindirect-branch=keep -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=keep -mindirect-branch=keep -fno-pic" } */
 
 extern void (*bar) (void);
 
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-9.c b/gcc/testsuite/gcc.target/i386/ret-thunk-9.c
index adf83df4776..21a0e6bde3d 100644
--- a/gcc/testsuite/gcc.target/i386/ret-thunk-9.c
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-9.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -mfunction-return=thunk -mindirect-branch=thunk -fno-pic" } */
+/* { dg-options "-O2 -mno-indirect-branch-register -mfunction-return=thunk -mindirect-branch=thunk -fno-pic" } */
 
 extern void (*bar) (void);
 
-- 
2.14.3


* [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
@ 2018-01-07 22:59 H.J. Lu
  2018-01-07 22:59 ` [PATCH 4/5] x86: Add -mindirect-branch-register H.J. Lu
                   ` (7 more replies)
  0 siblings, 8 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-07 22:59 UTC (permalink / raw)
  To: gcc-patches

This set of patches for GCC 8 mitigates variant #2 of the speculative execution
vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre.  They
convert indirect branches to call and return thunks to avoid speculative execution
via indirect call and jmp.
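
For illustration only, here is a minimal sketch of the kind of source these
options act on, modeled on the dispatch tests in this series.  The names
handler_t, handlers and run_handler are hypothetical and do not appear in
the patches.  On a compiler with this series applied, building it with
something like "-O2 -mindirect-branch=thunk -mfunction-return=thunk -fno-pic"
should route the indirect call and the function return through thunks
instead of emitting a plain indirect call and ret; without those options it
is ordinary C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handler type, used only for this sketch.  */
typedef long (*handler_t) (long);

static long twice (long x) { return 2 * x; }
static long negate (long x) { return -x; }

/* Non-const table indexed at run time, so the call below stays an
   indirect branch rather than being resolved at compile time.  */
static handler_t handlers[] = { twice, negate };

/* The call through handlers[idx] is the indirect branch that
   -mindirect-branch= is meant to convert; the return from this
   function is what -mfunction-return= is meant to convert.  */
long
run_handler (size_t idx, long arg)
{
  assert (idx < sizeof handlers / sizeof handlers[0]);
  return handlers[idx] (arg);
}
```

Calling run_handler (0, 21) dispatches to twice and run_handler (1, 5)
dispatches to negate; the observable behavior is the same with or without
the thunk options, only the emitted branch sequences differ.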


H.J. Lu (5):
  x86: Add -mindirect-branch=
  x86: Add -mindirect-branch-loop=
  x86: Add -mfunction-return=
  x86: Add -mindirect-branch-register
  x86: Add 'V' register operand modifier

 gcc/config/i386/constraints.md                     |  12 +-
 gcc/config/i386/i386-opts.h                        |  14 +
 gcc/config/i386/i386-protos.h                      |   2 +
 gcc/config/i386/i386.c                             | 655 ++++++++++++++++++++-
 gcc/config/i386/i386.h                             |  10 +
 gcc/config/i386/i386.md                            |  51 +-
 gcc/config/i386/i386.opt                           |  45 ++
 gcc/config/i386/predicates.md                      |  21 +-
 gcc/doc/extend.texi                                |  22 +
 gcc/doc/invoke.texi                                |  37 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-1.c   |  19 +
 gcc/testsuite/gcc.target/i386/indirect-thunk-2.c   |  19 +
 gcc/testsuite/gcc.target/i386/indirect-thunk-3.c   |  20 +
 gcc/testsuite/gcc.target/i386/indirect-thunk-4.c   |  20 +
 gcc/testsuite/gcc.target/i386/indirect-thunk-5.c   |  16 +
 gcc/testsuite/gcc.target/i386/indirect-thunk-6.c   |  17 +
 gcc/testsuite/gcc.target/i386/indirect-thunk-7.c   |  43 ++
 .../gcc.target/i386/indirect-thunk-attr-1.c        |  22 +
 .../gcc.target/i386/indirect-thunk-attr-2.c        |  20 +
 .../gcc.target/i386/indirect-thunk-attr-3.c        |  21 +
 .../gcc.target/i386/indirect-thunk-attr-4.c        |  20 +
 .../gcc.target/i386/indirect-thunk-attr-5.c        |  22 +
 .../gcc.target/i386/indirect-thunk-attr-6.c        |  21 +
 .../gcc.target/i386/indirect-thunk-attr-7.c        |  44 ++
 .../gcc.target/i386/indirect-thunk-attr-8.c        |  41 ++
 .../gcc.target/i386/indirect-thunk-bnd-1.c         |  19 +
 .../gcc.target/i386/indirect-thunk-bnd-2.c         |  20 +
 .../gcc.target/i386/indirect-thunk-bnd-3.c         |  18 +
 .../gcc.target/i386/indirect-thunk-bnd-4.c         |  19 +
 .../gcc.target/i386/indirect-thunk-extern-1.c      |  19 +
 .../gcc.target/i386/indirect-thunk-extern-2.c      |  19 +
 .../gcc.target/i386/indirect-thunk-extern-3.c      |  20 +
 .../gcc.target/i386/indirect-thunk-extern-4.c      |  20 +
 .../gcc.target/i386/indirect-thunk-extern-5.c      |  16 +
 .../gcc.target/i386/indirect-thunk-extern-6.c      |  17 +
 .../gcc.target/i386/indirect-thunk-extern-7.c      |  43 ++
 .../gcc.target/i386/indirect-thunk-inline-1.c      |  18 +
 .../gcc.target/i386/indirect-thunk-inline-2.c      |  18 +
 .../gcc.target/i386/indirect-thunk-inline-3.c      |  19 +
 .../gcc.target/i386/indirect-thunk-inline-4.c      |  19 +
 .../gcc.target/i386/indirect-thunk-inline-5.c      |  15 +
 .../gcc.target/i386/indirect-thunk-inline-6.c      |  16 +
 .../gcc.target/i386/indirect-thunk-inline-7.c      |  42 ++
 .../gcc.target/i386/indirect-thunk-loop-1.c        |  19 +
 .../gcc.target/i386/indirect-thunk-loop-2.c        |  19 +
 .../gcc.target/i386/indirect-thunk-loop-3.c        |  19 +
 .../gcc.target/i386/indirect-thunk-loop-4.c        |  19 +
 .../gcc.target/i386/indirect-thunk-loop-5.c        |  19 +
 .../gcc.target/i386/indirect-thunk-register-1.c    |  22 +
 .../gcc.target/i386/indirect-thunk-register-2.c    |  20 +
 .../gcc.target/i386/indirect-thunk-register-3.c    |  19 +
 .../gcc.target/i386/indirect-thunk-register-4.c    |  13 +
 gcc/testsuite/gcc.target/i386/ret-thunk-1.c        |  12 +
 gcc/testsuite/gcc.target/i386/ret-thunk-10.c       |  22 +
 gcc/testsuite/gcc.target/i386/ret-thunk-11.c       |  22 +
 gcc/testsuite/gcc.target/i386/ret-thunk-12.c       |  21 +
 gcc/testsuite/gcc.target/i386/ret-thunk-13.c       |  21 +
 gcc/testsuite/gcc.target/i386/ret-thunk-14.c       |  21 +
 gcc/testsuite/gcc.target/i386/ret-thunk-15.c       |  21 +
 gcc/testsuite/gcc.target/i386/ret-thunk-16.c       |  18 +
 gcc/testsuite/gcc.target/i386/ret-thunk-2.c        |  12 +
 gcc/testsuite/gcc.target/i386/ret-thunk-3.c        |  12 +
 gcc/testsuite/gcc.target/i386/ret-thunk-4.c        |  12 +
 gcc/testsuite/gcc.target/i386/ret-thunk-5.c        |  14 +
 gcc/testsuite/gcc.target/i386/ret-thunk-6.c        |  13 +
 gcc/testsuite/gcc.target/i386/ret-thunk-7.c        |  13 +
 gcc/testsuite/gcc.target/i386/ret-thunk-8.c        |  14 +
 gcc/testsuite/gcc.target/i386/ret-thunk-9.c        |  23 +
 68 files changed, 2004 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-attr-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-bnd-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-extern-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-extern-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-extern-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-extern-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-extern-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-extern-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-extern-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-inline-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-inline-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-inline-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-inline-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-inline-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-inline-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-inline-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-loop-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-register-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-register-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-register-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/ret-thunk-9.c

-- 
2.14.3


* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-07 22:59 [PATCH 0/5] x86: CVE-2017-5715, aka Spectre H.J. Lu
                   ` (4 preceding siblings ...)
  2018-01-07 22:59 ` [PATCH 2/5] x86: Add -mindirect-branch-loop= H.J. Lu
@ 2018-01-07 23:36 ` Jeff Law
  2018-01-08  0:30   ` H.J. Lu
                     ` (2 more replies)
  2018-01-08  4:22 ` Sandra Loosemore
  2018-01-08 17:33 ` Florian Weimer
  7 siblings, 3 replies; 135+ messages in thread
From: Jeff Law @ 2018-01-07 23:36 UTC (permalink / raw)
  To: H.J. Lu, gcc-patches

On 01/07/2018 03:58 PM, H.J. Lu wrote:
> This set of patches for GCC 8 mitigates variant #2 of the speculative execution
> vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre.  They
> convert indirect branches to call and return thunks to avoid speculative execution
> via indirect call and jmp.
> 
> 
> H.J. Lu (5):
>   x86: Add -mindirect-branch=
>   x86: Add -mindirect-branch-loop=
>   x86: Add -mfunction-return=
>   x86: Add -mindirect-branch-register
>   x86: Add 'V' register operand modifier
> 
>  gcc/config/i386/constraints.md                     |  12 +-
>  gcc/config/i386/i386-opts.h                        |  14 +
>  gcc/config/i386/i386-protos.h                      |   2 +
>  gcc/config/i386/i386.c                             | 655 ++++++++++++++++++++-
>  gcc/config/i386/i386.h                             |  10 +
>  gcc/config/i386/i386.md                            |  51 +-
>  gcc/config/i386/i386.opt                           |  45 ++
>  gcc/config/i386/predicates.md                      |  21 +-
>  gcc/doc/extend.texi                                |  22 +
>  gcc/doc/invoke.texi                                |  37 +-
My fundamental problem with this patchkit is that it is 100% x86/x86_64
specific.

ISTM we want a target independent mechanism (ie, new standard patterns,
options, etc) then an x86/x86_64 implementation using that target
independent framework (ie, the actual implementation of those new
standard patterns).

jeff


* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-07 23:36 ` [PATCH 0/5] x86: CVE-2017-5715, aka Spectre Jeff Law
@ 2018-01-08  0:30   ` H.J. Lu
  2018-01-08 10:01     ` Martin Liška
  2018-01-09 18:54     ` Jeff Law
  2018-01-08 14:36   ` Alan Modra
  2018-01-08 21:00   ` David Woodhouse
  2 siblings, 2 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-08  0:30 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

On Sun, Jan 7, 2018 at 3:36 PM, Jeff Law <law@redhat.com> wrote:
> On 01/07/2018 03:58 PM, H.J. Lu wrote:
>> This set of patches for GCC 8 mitigates variant #2 of the speculative execution
>> vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre.  They
>> convert indirect branches to call and return thunks to avoid speculative execution
>> via indirect call and jmp.
>>
>>
>> H.J. Lu (5):
>>   x86: Add -mindirect-branch=
>>   x86: Add -mindirect-branch-loop=
>>   x86: Add -mfunction-return=
>>   x86: Add -mindirect-branch-register
>>   x86: Add 'V' register operand modifier
>>
>>  gcc/config/i386/constraints.md                     |  12 +-
>>  gcc/config/i386/i386-opts.h                        |  14 +
>>  gcc/config/i386/i386-protos.h                      |   2 +
>>  gcc/config/i386/i386.c                             | 655 ++++++++++++++++++++-
>>  gcc/config/i386/i386.h                             |  10 +
>>  gcc/config/i386/i386.md                            |  51 +-
>>  gcc/config/i386/i386.opt                           |  45 ++
>>  gcc/config/i386/predicates.md                      |  21 +-
>>  gcc/doc/extend.texi                                |  22 +
>>  gcc/doc/invoke.texi                                |  37 +-
> My fundamental problem with this patchkit is that it is 100% x86/x86_64
> specific.
>
> ISTM we want a target independent mechanism (ie, new standard patterns,
> options, etc) then an x86/x86_64 implementation using that target
> independent framework (ie, the actual implementation of those new
> standard patterns).
>

My patch set is implemented with some constraints:

1. They need to be backportable to GCC 7/6/5/4.x.
2. They should work with all compiler optimizations.
3. They need to generate code sequences that are x86 specific and can't be
changed in any shape or form.  And the generated code is quite the opposite
of what a good optimizing compiler should generate.

Given these conditions, I kept the existing indirect call, jump and
return patterns.
I generate different code sequences for these patterns during the final pass,
when the assembly code is emitted.

I guess that I could add a late target independent RTL pass to convert
indirect call, jump and return patterns to something else.  But I am not sure
if that is what you are looking for.

-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-07 22:59 [PATCH 0/5] x86: CVE-2017-5715, aka Spectre H.J. Lu
                   ` (5 preceding siblings ...)
  2018-01-07 23:36 ` [PATCH 0/5] x86: CVE-2017-5715, aka Spectre Jeff Law
@ 2018-01-08  4:22 ` Sandra Loosemore
  2018-01-08  8:00   ` Markus Trippelsdorf
                     ` (2 more replies)
  2018-01-08 17:33 ` Florian Weimer
  7 siblings, 3 replies; 135+ messages in thread
From: Sandra Loosemore @ 2018-01-08  4:22 UTC (permalink / raw)
  To: H.J. Lu, gcc-patches

On 01/07/2018 03:58 PM, H.J. Lu wrote:
> This set of patches for GCC 8 mitigates variant #2 of the speculative execution
> vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre.  They
> convert indirect branches to call and return thunks to avoid speculative execution
> via indirect call and jmp.

I have a general documentation issue with all the new command-line 
options and attributes added by this patch set:  the documentation is 
very implementor-speaky and doesn't explain what user-level problem 
they're trying to solve.

E.g. to take just one example

> +@item function_return("@var{choice}")
> +@cindex @code{function_return} function attribute, x86
> +On x86 targets, the @code{function_return} attribute causes the compiler
> +to convert function return with @var{choice}.  @samp{keep} keeps function
> +return unmodified.  @samp{thunk} converts function return to call and
> +return thunk.  @samp{thunk-inline} converts function return to inlined
> +call and return thunk.  @samp{thunk-extern} converts function return to
> +external call and return thunk provided in a separate object file.

Why would you want to mess with call and return code generation in this 
way?  The documentation doesn't say.

For thunk-extern, is the programmer supposed to provide the thunk?  How 
would you go about implementing the missing bit of code?  What should it 
do?  I'm a compiler implementor and I wouldn't even know how to use this 
feature by reading the manual; how would an ordinary application 
programmer who isn't familiar with GCC internals know how to use it?

If the goal here is to tell GCC to produce code that is protected 
against the Spectre vulnerability, perhaps simplify this to adding just 
one option that controls all the things you've given separate options 
and attributes for?

-Sandra

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-08  4:22 ` Sandra Loosemore
@ 2018-01-08  8:00   ` Markus Trippelsdorf
  2018-01-08 11:40     ` H.J. Lu
  2018-01-08 11:40   ` H.J. Lu
  2018-01-08 17:40   ` Florian Weimer
  2 siblings, 1 reply; 135+ messages in thread
From: Markus Trippelsdorf @ 2018-01-08  8:00 UTC (permalink / raw)
  To: Sandra Loosemore; +Cc: H.J. Lu, gcc-patches, chandlerc

On 2018.01.07 at 21:07 -0700, Sandra Loosemore wrote:
> On 01/07/2018 03:58 PM, H.J. Lu wrote:
> > This set of patches for GCC 8 mitigates variant #2 of the speculative execution
> > vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre.  They
> > convert indirect branches to call and return thunks to avoid speculative execution
> > via indirect call and jmp.
> 
> I have a general documentation issue with all the new command-line 
> options and attributes added by this patch set:  the documentation is 
> very implementor-speaky and doesn't explain what user-level problem 
> they're trying to solve.
> 
> E.g. to take just one example
> 
> > +@item function_return("@var{choice}")
> > +@cindex @code{function_return} function attribute, x86
> > +On x86 targets, the @code{function_return} attribute causes the compiler
> > +to convert function return with @var{choice}.  @samp{keep} keeps function
> > +return unmodified.  @samp{thunk} converts function return to call and
> > +return thunk.  @samp{thunk-inline} converts function return to inlined
> > +call and return thunk.  @samp{thunk-extern} converts function return to
> > +external call and return thunk provided in a separate object file.
> 
> Why would you want to mess with call and return code generation in this 
> way?  The documentation doesn't say.
> 
> For thunk-extern, is the programmer supposed to provide the thunk?  How 
> would you go about implementing the missing bit of code?  What should it 
> do?  I'm compiler implementor and I wouldn't even know how to use this 
> feature by reading the manual; how would an ordinary application 
> programmer who isn't familiar with GCC internals know how to use it?
> 
> If the goal here is to tell GCC to produce code that is protected 
> against the Spectre vulnerability, perhaps simplify this to adding just 
> one option that controls all the things you've given separate options 
> and attributes for?

Also it would be good to coordinate with the LLVM guys: They currently
use -mretpoline and -mretpoline_external_thunk. 
I agree with Sandra that a single master option like -mretpoline would
be better.

-- 
Markus

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-07 22:59 ` [PATCH 2/5] x86: Add -mindirect-branch-loop= H.J. Lu
@ 2018-01-08  8:23   ` Florian Weimer
  2018-01-08 11:27     ` H.J. Lu
  2018-01-08 21:01     ` David Woodhouse
  2018-01-11 21:49   ` Jeff Law
  1 sibling, 2 replies; 135+ messages in thread
From: Florian Weimer @ 2018-01-08  8:23 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches

* H. J. Lu:

> Add -mindirect-branch-loop= option to control loop filler in call and
> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
> as loop filler.  The default is 'lfence'.

Why is the loop needed?  Doesn't ud2 or cpuid stop speculative
execution?

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-08  0:30   ` H.J. Lu
@ 2018-01-08 10:01     ` Martin Liška
  2018-01-09 18:55       ` Jeff Law
  2018-01-09 18:54     ` Jeff Law
  1 sibling, 1 reply; 135+ messages in thread
From: Martin Liška @ 2018-01-08 10:01 UTC (permalink / raw)
  To: H.J. Lu, Jeff Law; +Cc: GCC Patches

On 01/08/2018 01:29 AM, H.J. Lu wrote:
> 1. They need to be backportable to GCC 7/6/5/4.x.

I must admit this is a very important constraint. To be honest, we're planning
to backport the patchset to GCC 4.3.

Martin

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 3/5] x86: Add -mfunction-return=
  2018-01-07 22:59 ` [PATCH 3/5] x86: Add -mfunction-return= H.J. Lu
@ 2018-01-08 10:01   ` Martin Liška
  2018-01-08 12:00     ` H.J. Lu
  2018-01-11 23:00   ` Jeff Law
  1 sibling, 1 reply; 135+ messages in thread
From: Martin Liška @ 2018-01-08 10:01 UTC (permalink / raw)
  To: H.J. Lu, gcc-patches

On 01/07/2018 11:59 PM, H.J. Lu wrote:
> Function return thunk is the same as memory thunk for -mindirect-branch=
> where the return address is at the top of the stack:
> 
> __x86_return_thunk:
> 	call L2
> L1:
> 	lfence
> 	jmp L1
> L2:
> 	lea 8(%rsp), %rsp|lea 4(%esp), %esp
> 	ret
> 
> and function return becomes
> 
> 	jmp __x86_return_thunk

Hello.

Can you please explain the usage of the option in more detail? Is it to prevent
speculative execution of the 'ret' instruction (which is an indirect branch), as
described in [1]? The paper mentions that return stack predictors are commonly
implemented in some form. It looks like the current version of the Linux patches
does not use the option.

Thanks,
Martin

[1] https://support.google.com/faqs/answer/7625886

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-07 22:59 ` [PATCH 1/5] x86: Add -mindirect-branch= H.J. Lu
@ 2018-01-08 10:56   ` Martin Liška
  2018-01-08 11:59     ` H.J. Lu
  2018-01-09 19:05     ` Jeff Law
  2018-01-11 22:54   ` Jeff Law
  1 sibling, 2 replies; 135+ messages in thread
From: Martin Liška @ 2018-01-08 10:56 UTC (permalink / raw)
  To: H.J. Lu, gcc-patches

On 01/07/2018 11:59 PM, H.J. Lu wrote:
> +static void
> +output_indirect_thunk_function (bool need_bnd_p, int regno)
> +{
> +  char name[32];
> +  tree decl;
> +
> +  /* Create __x86_indirect_thunk/__x86_indirect_thunk_bnd.  */
> +  indirect_thunk_name (name, regno, need_bnd_p);
> +  decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
> +		     get_identifier (name),
> +		     build_function_type_list (void_type_node, NULL_TREE));
> +  DECL_RESULT (decl) = build_decl (BUILTINS_LOCATION, RESULT_DECL,
> +				   NULL_TREE, void_type_node);
> +  TREE_PUBLIC (decl) = 1;
> +  TREE_STATIC (decl) = 1;
> +  DECL_IGNORED_P (decl) = 1;
> +
> +#if TARGET_MACHO
> +  if (TARGET_MACHO)
> +    {
> +      switch_to_section (darwin_sections[picbase_thunk_section]);
> +      fputs ("\t.weak_definition\t", asm_out_file);
> +      assemble_name (asm_out_file, name);
> +      fputs ("\n\t.private_extern\t", asm_out_file);
> +      assemble_name (asm_out_file, name);
> +      putc ('\n', asm_out_file);
> +      ASM_OUTPUT_LABEL (asm_out_file, name);
> +      DECL_WEAK (decl) = 1;
> +    }
> +  else
> +#endif
> +    if (USE_HIDDEN_LINKONCE)
> +      {
> +	cgraph_node::create (decl)->set_comdat_group (DECL_ASSEMBLER_NAME (decl));
> +
> +	targetm.asm_out.unique_section (decl, 0);
> +	switch_to_section (get_named_section (decl, NULL, 0));
> +
> +	targetm.asm_out.globalize_label (asm_out_file, name);
> +	fputs ("\t.hidden\t", asm_out_file);
> +	assemble_name (asm_out_file, name);
> +	putc ('\n', asm_out_file);
> +	ASM_DECLARE_FUNCTION_NAME (asm_out_file, name, decl);
> +      }
> +    else
> +      {
> +	switch_to_section (text_section);
> +	ASM_OUTPUT_LABEL (asm_out_file, name);
> +      }
> +
> +  DECL_INITIAL (decl) = make_node (BLOCK);
> +  current_function_decl = decl;
> +  allocate_struct_function (decl, false);
> +  init_function_start (decl);
> +  /* We're about to hide the function body from callees of final_* by
> +     emitting it directly; tell them we're a thunk, if they care.  */
> +  cfun->is_thunk = true;
> +  first_function_block_is_cold = false;
> +  /* Make sure unwind info is emitted for the thunk if needed.  */
> +  final_start_function (emit_barrier (), asm_out_file, 1);
> +
> +  output_indirect_thunk (need_bnd_p, regno);
> +
> +  final_end_function ();
> +  init_insn_lengths ();
> +  free_after_compilation (cfun);
> +  set_cfun (NULL);
> +  current_function_decl = NULL;
> +}
> +

I'm wondering whether thunk creation could be generalized in a target-independent way.
I guess we could emit the function declaration without direct writes to asm_out_file,
and the emission of the function body could potentially be a target hook?

What about emitting the body of the function with RTL instructions instead of writing
assembly directly? My knowledge of RTL is quite limited, but maybe it could bring some
generalization and reusability for other targets?

Thank you,
Martin

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-08  8:23   ` Florian Weimer
@ 2018-01-08 11:27     ` H.J. Lu
  2018-01-08 19:05       ` Florian Weimer
  2018-01-08 21:01     ` David Woodhouse
  1 sibling, 1 reply; 135+ messages in thread
From: H.J. Lu @ 2018-01-08 11:27 UTC (permalink / raw)
  To: Florian Weimer; +Cc: GCC Patches

On Mon, Jan 8, 2018 at 12:20 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
> * H. J. Lu:
>
>> Add -mindirect-branch-loop= option to control loop filler in call and
>> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>> as loop filler.  The default is 'lfence'.
>
> Why is the loop needed?  Doesn't ud2 or cpuid stop speculative
> execution?

My understanding is that a loop works better.

-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-08  8:00   ` Markus Trippelsdorf
@ 2018-01-08 11:40     ` H.J. Lu
  0 siblings, 0 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-08 11:40 UTC (permalink / raw)
  To: Markus Trippelsdorf; +Cc: Sandra Loosemore, GCC Patches, Chandler Carruth

On Sun, Jan 7, 2018 at 10:55 PM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
> On 2018.01.07 at 21:07 -0700, Sandra Loosemore wrote:
>> On 01/07/2018 03:58 PM, H.J. Lu wrote:
>> > This set of patches for GCC 8 mitigates variant #2 of the speculative execution
>> > vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre.  They
>> > convert indirect branches to call and return thunks to avoid speculative execution
>> > via indirect call and jmp.
>>
>> I have a general documentation issue with all the new command-line
>> options and attributes added by this patch set:  the documentation is
>> very implementor-speaky and doesn't explain what user-level problem
>> they're trying to solve.
>>
>> E.g. to take just one example
>>
>> > +@item function_return("@var{choice}")
>> > +@cindex @code{function_return} function attribute, x86
>> > +On x86 targets, the @code{function_return} attribute causes the compiler
>> > +to convert function return with @var{choice}.  @samp{keep} keeps function
>> > +return unmodified.  @samp{thunk} converts function return to call and
>> > +return thunk.  @samp{thunk-inline} converts function return to inlined
>> > +call and return thunk.  @samp{thunk-extern} converts function return to
>> > +external call and return thunk provided in a separate object file.
>>
>> Why would you want to mess with call and return code generation in this
>> way?  The documentation doesn't say.
>>
>> For thunk-extern, is the programmer supposed to provide the thunk?  How
>> would you go about implementing the missing bit of code?  What should it
>> do?  I'm compiler implementor and I wouldn't even know how to use this
>> feature by reading the manual; how would an ordinary application
>> programmer who isn't familiar with GCC internals know how to use it?
>>
>> If the goal here is to tell GCC to produce code that is protected
>> against the Spectre vulnerability, perhaps simplify this to adding just
>> one option that controls all the things you've given separate options
>> and attributes for?
>
> Also it would be good to coordinate with the LLVM guys: They currently
> use -mretpoline and -mretpoline_external_thunk.
> I agree with Sandra that a single master option like -mretpoline would
> be better.

Our current goal is to compile the Linux kernel.  We won't change the generated
code.  We will change the command-line options only if we add a late generic RTL
pass.

-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-08  4:22 ` Sandra Loosemore
  2018-01-08  8:00   ` Markus Trippelsdorf
@ 2018-01-08 11:40   ` H.J. Lu
  2018-01-11 13:28     ` H.J. Lu
  2018-01-08 17:40   ` Florian Weimer
  2 siblings, 1 reply; 135+ messages in thread
From: H.J. Lu @ 2018-01-08 11:40 UTC (permalink / raw)
  To: Sandra Loosemore; +Cc: GCC Patches

On Sun, Jan 7, 2018 at 8:07 PM, Sandra Loosemore
<sandra@codesourcery.com> wrote:
> On 01/07/2018 03:58 PM, H.J. Lu wrote:
>>
>> This set of patches for GCC 8 mitigates variant #2 of the speculative
>> execution
>> vulnerabilities on x86 processors identified by CVE-2017-5715, aka
>> Spectre.  They
>> convert indirect branches to call and return thunks to avoid speculative
>> execution
>> via indirect call and jmp.
>
>
> I have a general documentation issue with all the new command-line options
> and attributes added by this patch set:  the documentation is very
> implementor-speaky and doesn't explain what user-level problem they're
> trying to solve.

Do you have any suggestions?

> E.g. to take just one example
>
>> +@item function_return("@var{choice}")
>> +@cindex @code{function_return} function attribute, x86
>> +On x86 targets, the @code{function_return} attribute causes the compiler
>> +to convert function return with @var{choice}.  @samp{keep} keeps function
>> +return unmodified.  @samp{thunk} converts function return to call and
>> +return thunk.  @samp{thunk-inline} converts function return to inlined
>> +call and return thunk.  @samp{thunk-extern} converts function return to
>> +external call and return thunk provided in a separate object file.
>
>
> Why would you want to mess with call and return code generation in this way?
> The documentation doesn't say.
>
> For thunk-extern, is the programmer supposed to provide the thunk?  How
> would you go about implementing the missing bit of code?  What should it do?
> I'm compiler implementor and I wouldn't even know how to use this feature by
> reading the manual; how would an ordinary application programmer who isn't
> familiar with GCC internals know how to use it?

This option was requested by Linux kernel people.  The Linux kernel may
choose different thunks at kernel load-time.  If a programmer doesn't know
how to write an external thunk, he/she shouldn't use it.

> If the goal here is to tell GCC to produce code that is protected against
> the Spectre vulnerability, perhaps simplify this to adding just one option
> that controls all the things you've given separate options and attributes
> for?

-mindirect-branch=thunk does the job.  Other options/choices are for
fine tuning.
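
For illustration only, here is the kind of code the option is aimed at:
a call through a function pointer, which is an indirect branch.  This is
a minimal sketch; the names handler_fn, handlers and dispatch are made up
for the example and are not part of the patch set.  The compile flag is
mentioned only in the comment, so the code itself builds with any C
compiler.

```c
/* A plain indirect call.  Without the option the compiler may emit
   something like `call *%rax` for the call in dispatch ().  Per this
   thread, building with -mindirect-branch=thunk (on a GCC that carries
   the patch set) routes the call through a thunk instead.  */

typedef int (*handler_fn) (int);

static int add_one (int v) { return v + 1; }
static int times_two (int v) { return v * 2; }

/* Hypothetical handler table; the target is chosen at run time, so the
   call below is an indirect branch.  */
static const handler_fn handlers[] = { add_one, times_two };

/* Call handler number WHICH with ARG; return -1 if WHICH is out of
   range.  */
int
dispatch (unsigned int which, int arg)
{
  if (which >= sizeof handlers / sizeof handlers[0])
    return -1;
  return handlers[which] (arg);
}
```

Comparing the assembly from a build with and without the option should
show the difference at the call site in dispatch ().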

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-08 10:56   ` Martin Liška
@ 2018-01-08 11:59     ` H.J. Lu
  2018-01-08 12:07       ` Jakub Jelinek
  2018-01-09 19:05     ` Jeff Law
  1 sibling, 1 reply; 135+ messages in thread
From: H.J. Lu @ 2018-01-08 11:59 UTC (permalink / raw)
  To: Martin Liška; +Cc: GCC Patches

On Mon, Jan 8, 2018 at 2:10 AM, Martin Liška <mliska@suse.cz> wrote:
> On 01/07/2018 11:59 PM, H.J. Lu wrote:
>> +static void
>> +output_indirect_thunk_function (bool need_bnd_p, int regno)
>> +{
>> +  char name[32];
>> +  tree decl;
>> +
>> +  /* Create __x86_indirect_thunk/__x86_indirect_thunk_bnd.  */
>> +  indirect_thunk_name (name, regno, need_bnd_p);
>> +  decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
>> +                  get_identifier (name),
>> +                  build_function_type_list (void_type_node, NULL_TREE));
>> +  DECL_RESULT (decl) = build_decl (BUILTINS_LOCATION, RESULT_DECL,
>> +                                NULL_TREE, void_type_node);
>> +  TREE_PUBLIC (decl) = 1;
>> +  TREE_STATIC (decl) = 1;
>> +  DECL_IGNORED_P (decl) = 1;
>> +
>> +#if TARGET_MACHO
>> +  if (TARGET_MACHO)
>> +    {
>> +      switch_to_section (darwin_sections[picbase_thunk_section]);
>> +      fputs ("\t.weak_definition\t", asm_out_file);
>> +      assemble_name (asm_out_file, name);
>> +      fputs ("\n\t.private_extern\t", asm_out_file);
>> +      assemble_name (asm_out_file, name);
>> +      putc ('\n', asm_out_file);
>> +      ASM_OUTPUT_LABEL (asm_out_file, name);
>> +      DECL_WEAK (decl) = 1;
>> +    }
>> +  else
>> +#endif
>> +    if (USE_HIDDEN_LINKONCE)
>> +      {
>> +     cgraph_node::create (decl)->set_comdat_group (DECL_ASSEMBLER_NAME (decl));
>> +
>> +     targetm.asm_out.unique_section (decl, 0);
>> +     switch_to_section (get_named_section (decl, NULL, 0));
>> +
>> +     targetm.asm_out.globalize_label (asm_out_file, name);
>> +     fputs ("\t.hidden\t", asm_out_file);
>> +     assemble_name (asm_out_file, name);
>> +     putc ('\n', asm_out_file);
>> +     ASM_DECLARE_FUNCTION_NAME (asm_out_file, name, decl);
>> +      }
>> +    else
>> +      {
>> +     switch_to_section (text_section);
>> +     ASM_OUTPUT_LABEL (asm_out_file, name);
>> +      }
>> +
>> +  DECL_INITIAL (decl) = make_node (BLOCK);
>> +  current_function_decl = decl;
>> +  allocate_struct_function (decl, false);
>> +  init_function_start (decl);
>> +  /* We're about to hide the function body from callees of final_* by
>> +     emitting it directly; tell them we're a thunk, if they care.  */
>> +  cfun->is_thunk = true;
>> +  first_function_block_is_cold = false;
>> +  /* Make sure unwind info is emitted for the thunk if needed.  */
>> +  final_start_function (emit_barrier (), asm_out_file, 1);
>> +
>> +  output_indirect_thunk (need_bnd_p, regno);
>> +
>> +  final_end_function ();
>> +  init_insn_lengths ();
>> +  free_after_compilation (cfun);
>> +  set_cfun (NULL);
>> +  current_function_decl = NULL;
>> +}
>> +
>
> I'm wondering whether thunk creation can be a good target-independent generalization? I guess
> we can emit the function declaration without direct writes to asm_out_file? And the emission
> of function body can be potentially a target hook?
>
> What about emitting body of the function with RTL instructions instead of direct assembly write?
> My knowledge of RTL is quite small, but maybe it can bring some generalization and reusability
> for other targets?

Thunks are x86 specific and they are created the same way as 32-bit PIC thunks.
I don't see how a target hook would be used.

-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 3/5] x86: Add -mfunction-return=
  2018-01-08 10:01   ` Martin Liška
@ 2018-01-08 12:00     ` H.J. Lu
  2018-01-08 21:29       ` David Woodhouse
  0 siblings, 1 reply; 135+ messages in thread
From: H.J. Lu @ 2018-01-08 12:00 UTC (permalink / raw)
  To: Martin Liška; +Cc: GCC Patches

On Mon, Jan 8, 2018 at 1:56 AM, Martin Liška <mliska@suse.cz> wrote:
> On 01/07/2018 11:59 PM, H.J. Lu wrote:
>> Function return thunk is the same as memory thunk for -mindirect-branch=
>> where the return address is at the top of the stack:
>>
>> __x86_return_thunk:
>>       call L2
>> L1:
>>       lfence
>>       jmp L1
>> L2:
>>       lea 8(%rsp), %rsp|lea 4(%esp), %esp
>>       ret
>>
>> and function return becomes
>>
>>       jmp __x86_return_thunk
>
> Hello.
>
> Can you please explain more usage of the option? Is to prevent a speculative
> execution of 'ret' instruction (which is an indirect call), as described in [1]?
> The paper mentions that return stack predictors are commonly implemented in some form.
> Looks that current version of Linux patches does not use the option.
>

This option is requested by Linux kernel people.  It may be used in
the future.

-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-08 11:59     ` H.J. Lu
@ 2018-01-08 12:07       ` Jakub Jelinek
  2018-01-08 12:12         ` H.J. Lu
  0 siblings, 1 reply; 135+ messages in thread
From: Jakub Jelinek @ 2018-01-08 12:07 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Martin Liška, GCC Patches

On Mon, Jan 08, 2018 at 03:55:52AM -0800, H.J. Lu wrote:
> > I'm wondering whether thunk creation can be a good target-independent generalization? I guess
> > we can emit the function declaration without direct writes to asm_out_file? And the emission
> > of function body can be potentially a target hook?
> >
> > What about emitting body of the function with RTL instructions instead of direct assembly write?
> > My knowledge of RTL is quite small, but maybe it can bring some generalization and reusability
> > for other targets?
> 
> Thunks are x86 specific and they are created the same way as 32-bit PIC thunks.
> I don't see how a target hook is used.

Talking about PIC thunks, those have, I believe, a '.' character in their
symbols, so that they can't be confused with user functions.  Any reason these
retpoline thunks don't?

	Jakub

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-08 12:07       ` Jakub Jelinek
@ 2018-01-08 12:12         ` H.J. Lu
  2018-01-08 17:25           ` Michael Matz
  2018-01-08 21:30           ` Andi Kleen
  0 siblings, 2 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-08 12:12 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Martin Liška, GCC Patches

On Mon, Jan 8, 2018 at 4:00 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Jan 08, 2018 at 03:55:52AM -0800, H.J. Lu wrote:
>> > I'm wondering whether thunk creation can be a good target-independent generalization? I guess
>> > we can emit the function declaration without direct writes to asm_out_file? And the emission
>> > of function body can be potentially a target hook?
>> >
>> > What about emitting body of the function with RTL instructions instead of direct assembly write?
>> > My knowledge of RTL is quite small, but maybe it can bring some generalization and reusability
>> > for other targets?
>>
>> Thunks are x86 specific and they are created the same way as 32-bit PIC thunks.
>> I don't see how a target hook is used.
>
> Talking about PIC thunks, those have I believe . character in their symbols,
> so that they can't be confused with user functions.  Any reason these
> retpoline thunks aren't?
>

They used to have '.'.  It was changed at the last minute since the kernel needs
to export them as regular symbols.

-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-07 23:36 ` [PATCH 0/5] x86: CVE-2017-5715, aka Spectre Jeff Law
  2018-01-08  0:30   ` H.J. Lu
@ 2018-01-08 14:36   ` Alan Modra
  2018-01-08 15:04     ` H.J. Lu
  2018-01-11  0:19     ` Jeff Law
  2018-01-08 21:00   ` David Woodhouse
  2 siblings, 2 replies; 135+ messages in thread
From: Alan Modra @ 2018-01-08 14:36 UTC (permalink / raw)
  To: Jeff Law; +Cc: H.J. Lu, gcc-patches

On Sun, Jan 07, 2018 at 04:36:20PM -0700, Jeff Law wrote:
> On 01/07/2018 03:58 PM, H.J. Lu wrote:
> > This set of patches for GCC 8 mitigates variant #2 of the speculative execution
> > vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre.
[snip]
> My fundamental problem with this patchkit is that it is 100% x86/x86_64
> specific.

It's possible that x86 needs spectre variant 2 mitigation that isn't
necessary on other modern processors like ARM and PowerPC, so let's
not rush into general solutions designed around x86.

Here's a quick overview of Spectre.  For more, see
https://spectreattack.com/spectre.pdf
https://googleprojectzero.blogspot.com.au/2018/01/reading-privileged-memory-with-side.html
https://developer.arm.com/-/media/Files/pdf/Cache_Speculation_Side-channels.pdf

The simplest example of ideal "gadget" code that can be exploited by
an attacker who can control the value of x, perhaps as a parameter to
some service provided by the victim, is:

	char *array1, *array2;
	y = array2[array1[x] * cache_line_size];

The idea being that with the appropriate x, array1[x] can load any
byte of interest in the victim, with the array2 load evicting a cache
line detectable by the attacker.  The value of the byte of interest
can then be inferred by which cache line was affected.

Typical code of course checks user input.

	if (x < array1_size)
	  y = array2[array1[x] * cache_line_size];

Spectre variant 1 preloads the branch predictor to make the condition
predict as true.  Then when the out-of-range value of x is given,
speculative execution runs the gadget code affecting the cache.  Even
though this speculative execution is rolled back, the cache remains
affected.

Spectre variant 2 preloads the branch target predictor for indirect
branches so that some indirect branch in the victim, eg. a PLT call,
speculatively executes gadget code found somewhere in the victim.


So, to mitigate Spectre variant 1, ensure that speculative execution
doesn't get as far as the array2 load.  You could do that by modifying
the above code to

	if (x < array1_size)
          {
	    /* speculation barrier goes here */
	    y = array2[array1[x] * cache_line_size];
	  }

But you could also do

	if (x < array1_size)
          {
	    tmp = array1[x] * cache_line_size;
	    /* speculation barrier goes here */
	    y = array2[tmp];
	  }

This has the advantage of killing variant 2 attacks for this gadget
too.  If you ensure there are no gadgets anywhere, then variant 2
attacks are not possible.  Besides compiler changes to prevent gadgets
being emitted you also need compiler and linker changes to not emit
read-only data in executable segments, because data might just happen
to be a gadget when executed.

However, x86 has the additional problem of variable length
instructions.  Gadgets might be hiding in code when executed at an
offset from the start of the "real" instructions.  Which is why x86 is
more at risk from this attack than other processors, and why x86 needs
something like the posted variant 2 mitigation, slowing down all
indirect branches.
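
As a rough, compilable version of the second barrier placement above,
here is a minimal C sketch.  The names speculation_barrier and
read_checked, and the CACHE_LINE_SIZE value of 64, are assumptions made
for this example and not part of the patch set.  On x86 the barrier is
an lfence; on other targets it falls back to a compiler-only barrier,
which does not stop hardware speculation.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64  /* assumed line size, for illustration only */

/* Hypothetical helper: on x86, lfence keeps later loads from being
   issued before earlier instructions complete.  On other targets this
   is only a compiler barrier and does NOT fence the CPU.  */
static inline void
speculation_barrier (void)
{
#if defined (__x86_64__) || defined (__i386__)
  __asm__ volatile ("lfence" ::: "memory");
#else
  __asm__ volatile ("" ::: "memory");
#endif
}

/* Bounds-checked version of the gadget: the dependent array2 load is
   issued only after the barrier.  Returns 0 when X is out of range.  */
uint8_t
read_checked (const uint8_t *array1, size_t array1_size,
	      const uint8_t *array2, size_t x)
{
  uint8_t y = 0;
  if (x < array1_size)
    {
      size_t tmp = (size_t) array1[x] * CACHE_LINE_SIZE;
      speculation_barrier ();
      y = array2[tmp];
    }
  return y;
}
```

The caller has to make sure array2 covers every index array1 can
produce, i.e. at least (largest array1 value + 1) * CACHE_LINE_SIZE
bytes.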

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-08 14:36   ` Alan Modra
@ 2018-01-08 15:04     ` H.J. Lu
  2018-01-08 15:07       ` Jakub Jelinek
  2018-01-11  0:19     ` Jeff Law
  1 sibling, 1 reply; 135+ messages in thread
From: H.J. Lu @ 2018-01-08 15:04 UTC (permalink / raw)
  To: Alan Modra; +Cc: Jeff Law, GCC Patches

On Mon, Jan 8, 2018 at 6:23 AM, Alan Modra <amodra@gmail.com> wrote:
> On Sun, Jan 07, 2018 at 04:36:20PM -0700, Jeff Law wrote:
>> On 01/07/2018 03:58 PM, H.J. Lu wrote:
>> > This set of patches for GCC 8 mitigates variant #2 of the speculative execution
>> > vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre.
> [snip]
>> My fundamental problem with this patchkit is that it is 100% x86/x86_64
>> specific.
>
> It's possible that x86 needs spectre variant 2 mitigation that isn't
> necessary on other modern processors like ARM and PowerPC, so let's
> not rush into general solutions designed around x86..
>
> Here's a quick overview of Spectre.  For more, see
> https://spectreattack.com/spectre.pdf
> https://googleprojectzero.blogspot.com.au/2018/01/reading-privileged-memory-with-side.html
> https://developer.arm.com/-/media/Files/pdf/Cache_Speculation_Side-channels.pdf
>
> The simplest example of ideal "gadget" code that can be exploited by
> an attacker who can control the value of x, perhaps as a parameter to
> some service provided by the victim is:
>
>         char *array1, *array2;
>         y = array2[array1[x] * cache_line_size];
>
> The idea being that with the appropriate x, array1[x] can load any
> byte of interest in the victim, with the array2 load evicting a cache
> line detectable by the attacker.  The value of the byte of interest
> can then be inferred by which cache line was affected.
>
> Typical code of course checks user input.
>
>         if (x < array1_size)
>           y = array2[array1[x] * cache_line_size];
>
> Spectre variant 1 preloads the branch predictor to make the condition
> predict as true.  Then when the out-of-range value of x is given,
> speculative execution runs the gadget code affecting the cache.  Even
> though this speculative execution is rolled back, the cache remains
> affected..
>
> Spectre variant 2 preloads the branch target predictor for indirect
> branches so that some indirect branch in the victim, eg. a PLT call,
> speculatively executes gadget code found somewhere in the victim.
>
>
> So, to mitigate Spectre variant 1, ensure that speculative execution
> doesn't get as far as the array2 load.  You could do that by modifying
> the above code to
>
>         if (x < array1_size)
>           {
>             /* speculation barrier goes here */
>             y = array2[array1[x] * cache_line_size];
>           }
>
> But you could also do
>
>         if (x < array1_size)
>           {
>             tmp = array1[x] * cache_line_size;
>             /* speculation barrier goes here */
>             y = array2[tmp];
>           }
>
> This has the advantage of killing variant 2 attacks for this gadget
> too.  If you ensure there are no gadgets anywhere, then variant 2
> attacks are not possible.  Besides compiler changes to prevent gadgets
> being emitted you also need compiler and linker changes to not emit
> read-only data in executable segments, because data might just happen
> to be a gadget when executed.

See:

https://sourceware.org/ml/binutils/2017-11/msg00369.html

-- 
H.J.


* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-08 15:04     ` H.J. Lu
@ 2018-01-08 15:07       ` Jakub Jelinek
  2018-01-08 16:19         ` H.J. Lu
  2018-01-08 17:14         ` Michael Matz
  0 siblings, 2 replies; 135+ messages in thread
From: Jakub Jelinek @ 2018-01-08 15:07 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Alan Modra, Jeff Law, GCC Patches

On Mon, Jan 08, 2018 at 07:00:11AM -0800, H.J. Lu wrote:
> See:
> 
> https://sourceware.org/ml/binutils/2017-11/msg00369.html

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x000000 0x00000000 0x00000000 0x00200 0x00200 R   0x200000
  LOAD           0x000fd0 0x00200fd0 0x00200fd0 0x0002b 0x0002b R E 0x200000
  LOAD           0x001000 0x00201000 0x00201000 0x00058 0x00058 R   0x200000
  LOAD           0x200f80 0x00400f80 0x00400f80 0x000a0 0x000a0 RW  0x200000
  DYNAMIC        0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 RW  0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
  GNU_RELRO      0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 R   0x1

Uh, 3 read-only LOADs instead of 2?  Shouldn't then all the read-only
non-executable sections be emitted together, so that you have a R, then R E,
then RW PT_LOADs?
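
The counting argument can be sketched with a toy model.  This is only
an illustration: it assumes a new PT_LOAD is started whenever the
permission flags change between consecutive sections, ignores
alignment and RELRO, and uses hypothetical names (DEMO_PF_*,
count_load_segments).  The flag values match the ELF PF_X/PF_W/PF_R
bits.

```c
#include <stddef.h>

/* Same bit values as the ELF PF_X, PF_W and PF_R flags. */
enum { DEMO_PF_X = 1, DEMO_PF_W = 2, DEMO_PF_R = 4 };

/* Toy model: consecutive sections with identical permissions share one
   PT_LOAD; every change of permissions starts a new one. */
size_t count_load_segments(const unsigned *section_flags, size_t n)
{
    size_t segments = 0;

    for (size_t i = 0; i < n; i++) {
        if (i == 0 || section_flags[i] != section_flags[i - 1])
            segments++;
    }
    return segments;
}
```

Under this model the order R, R E, R, RW needs four PT_LOADs, while
grouping the read-only sections first (R, R, R E, RW) needs three.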

	Jakub


* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-08 15:07       ` Jakub Jelinek
@ 2018-01-08 16:19         ` H.J. Lu
  2018-01-08 16:32           ` Jakub Jelinek
  2018-01-08 17:14         ` Michael Matz
  1 sibling, 1 reply; 135+ messages in thread
From: H.J. Lu @ 2018-01-08 16:19 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Alan Modra, Jeff Law, GCC Patches

On Mon, Jan 8, 2018 at 7:06 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Jan 08, 2018 at 07:00:11AM -0800, H.J. Lu wrote:
>> See:
>>
>> https://sourceware.org/ml/binutils/2017-11/msg00369.html
>
> Program Headers:
>   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>   LOAD           0x000000 0x00000000 0x00000000 0x00200 0x00200 R   0x200000
>   LOAD           0x000fd0 0x00200fd0 0x00200fd0 0x0002b 0x0002b R E 0x200000
>   LOAD           0x001000 0x00201000 0x00201000 0x00058 0x00058 R   0x200000
>   LOAD           0x200f80 0x00400f80 0x00400f80 0x000a0 0x000a0 RW  0x200000
>   DYNAMIC        0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 RW  0x4
>   GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
>   GNU_RELRO      0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 R   0x1
>
> Uh, 3 read-only LOADs instead of 2?  Shouldn't then all the read-only
> non-executable sections be emitted together, so that you have a R, then R E,
> then RW PT_LOADs?

It is done on purpose since the second RO segment will be merged with the RELRO
segment at load time:

Elf file type is EXEC (Executable file)
Entry point 0x401ea0
There are 11 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x00400034 0x00400034 0x00160 0x00160 R   0x4
  INTERP         0x000194 0x00400194 0x00400194 0x0001a 0x0001a R   0x1
      [Requesting program interpreter: /libx32/ld-linux-x32.so.2]
  LOAD           0x000000 0x00400000 0x00400000 0x0037c 0x0037c R   0x1000
  LOAD           0x000e68 0x00401e68 0x00401e68 0x00195 0x00195 R E 0x1000
  LOAD           0x001000 0x00402000 0x00402000 0x00124 0x00124 R   0x1000
  LOAD           0x001ef0 0x00402ef0 0x00402ef0 0x00134 0x00138 RW  0x1000
  DYNAMIC        0x001ef8 0x00402ef8 0x00402ef8 0x000f8 0x000f8 RW  0x4
  NOTE           0x0001b0 0x004001b0 0x004001b0 0x00044 0x00044 R   0x4
  GNU_EH_FRAME   0x001008 0x00402008 0x00402008 0x00034 0x00034 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
  GNU_RELRO      0x001ef0 0x00402ef0 0x00402ef0 0x00110 0x00110 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym
.dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
   03     .init .plt .text .fini
   04     .rodata .eh_frame_hdr .eh_frame
   05     .init_array .fini_array .dynamic .got .got.plt .data .bss
   06     .dynamic
   07     .note.ABI-tag .note.gnu.build-id
   08     .eh_frame_hdr
   09
   10     .init_array .fini_array .dynamic .got



-- 
H.J.


* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-08 16:19         ` H.J. Lu
@ 2018-01-08 16:32           ` Jakub Jelinek
  0 siblings, 0 replies; 135+ messages in thread
From: Jakub Jelinek @ 2018-01-08 16:32 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Alan Modra, Jeff Law, GCC Patches

On Mon, Jan 08, 2018 at 08:17:27AM -0800, H.J. Lu wrote:
> On Mon, Jan 8, 2018 at 7:06 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Mon, Jan 08, 2018 at 07:00:11AM -0800, H.J. Lu wrote:
> >> See:
> >>
> >> https://sourceware.org/ml/binutils/2017-11/msg00369.html
> >
> > Program Headers:
> >   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
> >   LOAD           0x000000 0x00000000 0x00000000 0x00200 0x00200 R   0x200000
> >   LOAD           0x000fd0 0x00200fd0 0x00200fd0 0x0002b 0x0002b R E 0x200000
> >   LOAD           0x001000 0x00201000 0x00201000 0x00058 0x00058 R   0x200000
> >   LOAD           0x200f80 0x00400f80 0x00400f80 0x000a0 0x000a0 RW  0x200000
> >   DYNAMIC        0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 RW  0x4
> >   GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
> >   GNU_RELRO      0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 R   0x1
> >
> > Uh, 3 read-only LOADs instead of 2?  Shouldn't then all the read-only
> > non-executable sections be emitted together, so that you have a R, then R E,
> > then RW PT_LOADs?
> 
> It is done on purpose since the second RO segment will be merged with the RELRO
> segment at load time:

That doesn't look like an advantage over not introducing it.

> Elf file type is EXEC (Executable file)
> Entry point 0x401ea0
> There are 11 program headers, starting at offset 52
> 
> Program Headers:
>   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>   PHDR           0x000034 0x00400034 0x00400034 0x00160 0x00160 R   0x4
>   INTERP         0x000194 0x00400194 0x00400194 0x0001a 0x0001a R   0x1
>       [Requesting program interpreter: /libx32/ld-linux-x32.so.2]
>   LOAD           0x000000 0x00400000 0x00400000 0x0037c 0x0037c R   0x1000
>   LOAD           0x000e68 0x00401e68 0x00401e68 0x00195 0x00195 R E 0x1000
>   LOAD           0x001000 0x00402000 0x00402000 0x00124 0x00124 R   0x1000
>   LOAD           0x001ef0 0x00402ef0 0x00402ef0 0x00134 0x00138 RW  0x1000
>   DYNAMIC        0x001ef8 0x00402ef8 0x00402ef8 0x000f8 0x000f8 RW  0x4
>   NOTE           0x0001b0 0x004001b0 0x004001b0 0x00044 0x00044 R   0x4
>   GNU_EH_FRAME   0x001008 0x00402008 0x00402008 0x00034 0x00034 R   0x4
>   GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
>   GNU_RELRO      0x001ef0 0x00402ef0 0x00402ef0 0x00110 0x00110 R   0x1

   PHDR           0x000034 0x00400034 0x00400034 0x00160 0x00160 R   0x4
   INTERP         0x000194 0x00400194 0x00400194 0x0001a 0x0001a R   0x1
       [Requesting program interpreter: /libx32/ld-linux-x32.so.2]
   LOAD           0x000000 0x00400000 0x00400000 0x004a0 0x004a0 R   0x1000
   LOAD           0x000e68 0x00401e68 0x00401e68 0x00195 0x00195 R E 0x1000
   LOAD           0x001ef0 0x00402ef0 0x00402ef0 0x00134 0x00138 RW  0x1000
   DYNAMIC        0x001ef8 0x00402ef8 0x00402ef8 0x000f8 0x000f8 RW  0x4
   NOTE           0x0001b0 0x004001b0 0x004001b0 0x00044 0x00044 R   0x4
   GNU_EH_FRAME   0x001008 0x00402008 0x00402008 0x00034 0x00034 R   0x4
   GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
   GNU_RELRO      0x001ef0 0x00402ef0 0x00402ef0 0x00110 0x00110 R   0x1

you could even move the second PT_LOAD earlier to make the gaps on disk
smaller.

	Jakub


* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-08 15:07       ` Jakub Jelinek
  2018-01-08 16:19         ` H.J. Lu
@ 2018-01-08 17:14         ` Michael Matz
  1 sibling, 0 replies; 135+ messages in thread
From: Michael Matz @ 2018-01-08 17:14 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: H.J. Lu, Alan Modra, Jeff Law, GCC Patches

Hi,

On Mon, 8 Jan 2018, Jakub Jelinek wrote:

> On Mon, Jan 08, 2018 at 07:00:11AM -0800, H.J. Lu wrote:
> > See:
> > 
> > https://sourceware.org/ml/binutils/2017-11/msg00369.html
> 
> Program Headers:
>   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>   LOAD           0x000000 0x00000000 0x00000000 0x00200 0x00200 R   0x200000
>   LOAD           0x000fd0 0x00200fd0 0x00200fd0 0x0002b 0x0002b R E 0x200000
>   LOAD           0x001000 0x00201000 0x00201000 0x00058 0x00058 R   0x200000
>   LOAD           0x200f80 0x00400f80 0x00400f80 0x000a0 0x000a0 RW  0x200000
>   DYNAMIC        0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 RW  0x4
>   GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
>   GNU_RELRO      0x200f80 0x00400f80 0x00400f80 0x00080 0x00080 R   0x1
> 
> Uh, 3 read-only LOADs instead of 2?  Shouldn't then all the read-only
> non-executable sections be emitted together, so that you have a R, then R E,
> then RW PT_LOADs?

See also my subthread starting at H.J.'s first version of the set:
  https://sourceware.org/ml/binutils/2017-11/msg00218.html
where some of the issues are hashed through.


Ciao,
Michael.


* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-08 12:12         ` H.J. Lu
@ 2018-01-08 17:25           ` Michael Matz
  2018-01-08 17:25             ` H.J. Lu
  2018-01-08 21:30           ` Andi Kleen
  1 sibling, 1 reply; 135+ messages in thread
From: Michael Matz @ 2018-01-08 17:25 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Jakub Jelinek, Martin Liška, GCC Patches

Hi,

On Mon, 8 Jan 2018, H.J. Lu wrote:

> On Mon, Jan 8, 2018 at 4:00 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Mon, Jan 08, 2018 at 03:55:52AM -0800, H.J. Lu wrote:
> >> > I'm wondering whether thunk creation can be a good target-independent generalization? I guess
> >> > we can emit the function declaration without direct writes to asm_out_file? And the emission
> >> > of function body can be potentially a target hook?
> >> >
> >> > What about emitting body of the function with RTL instructions instead of direct assembly write?
> >> > My knowledge of RTL is quite small, but maybe it can bring some generalization and reusability
> >> > for other targets?
> >>
> >> Thunks are x86 specific and they are created the same way as 32-bit PIC thunks.
> >> I don't see how a target hook is used.
> >
> > Talking about PIC thunks, those have I believe . character in their symbols,
> > so that they can't be confused with user functions.  Any reason these
> > retpoline thunks aren't?
> >
> 
> They used to have '.'.  It was changed at the last minute since kernel 
> needs to export them as regular symbols.

That can be done via asm aliases or direct assembler use; the kernel 
doesn't absolutely have to access them via C compatible symbol names.
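
One way this might look, as a sketch only: a GNU C asm label binds a
C-visible name to an assembler-level symbol that contains dots.  The
names below (dotted_thunk_demo, demo.indirect_thunk.rax) are
hypothetical and are not the symbols emitted by the patch; the example
assumes GCC or Clang.

```c
/* The C identifier is dotted_thunk_demo; the symbol seen by the assembler
   and linker is "demo.indirect_thunk.rax", which is not a valid C name. */
int dotted_thunk_demo(void) __asm__("demo.indirect_thunk.rax");

int dotted_thunk_demo(void)
{
    return 42;
}
```

C callers use dotted_thunk_demo() as usual, while the object file only
carries the dotted symbol name.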


Ciao,
Michael.


* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-08 17:25           ` Michael Matz
@ 2018-01-08 17:25             ` H.J. Lu
  2018-01-08 17:51               ` Woodhouse, David
  0 siblings, 1 reply; 135+ messages in thread
From: H.J. Lu @ 2018-01-08 17:25 UTC (permalink / raw)
  To: Michael Matz, Woodhouse, David
  Cc: Jakub Jelinek, Martin Liška, GCC Patches

On Mon, Jan 8, 2018 at 9:18 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Mon, 8 Jan 2018, H.J. Lu wrote:
>
>> On Mon, Jan 8, 2018 at 4:00 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> > On Mon, Jan 08, 2018 at 03:55:52AM -0800, H.J. Lu wrote:
>> >> > I'm wondering whether thunk creation can be a good target-independent generalization? I guess
>> >> > we can emit the function declaration without direct writes to asm_out_file? And the emission
>> >> > of function body can be potentially a target hook?
>> >> >
>> >> > What about emitting body of the function with RTL instructions instead of direct assembly write?
>> >> > My knowledge of RTL is quite small, but maybe it can bring some generalization and reusability
>> >> > for other targets?
>> >>
>> >> Thunks are x86 specific and they are created the same way as 32-bit PIC thunks.
>> >> I don't see how a target hook is used.
>> >
>> > Talking about PIC thunks, those have I believe . character in their symbols,
>> > so that they can't be confused with user functions.  Any reason these
>> > retpoline thunks aren't?
>> >
>>
>> They used to have '.'.  It was changed at the last minute since kernel
>> needs to export them as regular symbols.
>
> That can be done via asm aliases or direct assembler use; the kernel
> doesn't absolutely have to access them via C compatible symbol names.
>

Hi David,

Can you comment on this?


-- 
H.J.


* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-07 22:59 [PATCH 0/5] x86: CVE-2017-5715, aka Spectre H.J. Lu
                   ` (6 preceding siblings ...)
  2018-01-08  4:22 ` Sandra Loosemore
@ 2018-01-08 17:33 ` Florian Weimer
  2018-01-08 20:49   ` David Woodhouse
  7 siblings, 1 reply; 135+ messages in thread
From: Florian Weimer @ 2018-01-08 17:33 UTC (permalink / raw)
  To: H.J. Lu; +Cc: gcc-patches

* H. J. Lu:

> This set of patches for GCC 8 mitigates variant #2 of the
> speculative execution vulnerabilities on x86 processors identified
> by CVE-2017-5715, aka Spectre.  They convert indirect branches to
> call and return thunks to avoid speculative execution via indirect
> call and jmp.

Would it make sense to add a mode which relies on an empty return
stack cache?  Or will CPUs use the regular branch predictor if the
return stack is empty?

With an empty return stack cache and no branch predictor, a simple
PUSH/RET sequence cannot be predicted, so the complex CALL sequence
with a speculation barrier is not needed.


* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-08  4:22 ` Sandra Loosemore
  2018-01-08  8:00   ` Markus Trippelsdorf
  2018-01-08 11:40   ` H.J. Lu
@ 2018-01-08 17:40   ` Florian Weimer
  2 siblings, 0 replies; 135+ messages in thread
From: Florian Weimer @ 2018-01-08 17:40 UTC (permalink / raw)
  To: Sandra Loosemore; +Cc: H.J. Lu, gcc-patches

* Sandra Loosemore:

> I have a general documentation issue with all the new command-line 
> options and attributes added by this patch set:  the documentation is 
> very implementor-speaky and doesn't explain what user-level problem 
> they're trying to solve.

Agreed.  Ideally, the documentation would also list the CPU
models/model groups where it is known to have the desired effect, and
if firmware updates are needed.

For some users, it may be useful to be able to advertise that they
have built their binaries with hardening, but another group of users
is interested in hardening which actually works to stop all potential
exploits.


* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-08 17:25             ` H.J. Lu
@ 2018-01-08 17:51               ` Woodhouse, David
  2018-01-08 23:10                 ` Michael Matz
  0 siblings, 1 reply; 135+ messages in thread
From: Woodhouse, David @ 2018-01-08 17:51 UTC (permalink / raw)
  To: H.J. Lu, Michael Matz; +Cc: Jakub Jelinek, Martin Liška, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2345 bytes --]

On Mon, 2018-01-08 at 09:25 -0800, H.J. Lu wrote:
> On Mon, Jan 8, 2018 at 9:18 AM, Michael Matz <matz@suse.de> wrote:
> > 
> > Hi,
> > 
> > On Mon, 8 Jan 2018, H.J. Lu wrote:
> > 
> > > 
> > > On Mon, Jan 8, 2018 at 4:00 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> > > > 
> > > > On Mon, Jan 08, 2018 at 03:55:52AM -0800, H.J. Lu wrote:
> > > > > 
> > > > > > 
> > > > > > I'm wondering whether thunk creation can be a good target-independent generalization? I guess
> > > > > > we can emit the function declaration without direct writes to asm_out_file? And the emission
> > > > > > of function body can be potentially a target hook?
> > > > > > 
> > > > > > What about emitting body of the function with RTL instructions instead of direct assembly write?
> > > > > > My knowledge of RTL is quite small, but maybe it can bring some generalization and reusability
> > > > > > for other targets?
> > > > > Thunks are x86 specific and they are created the same way as 32-bit PIC thunks.
> > > > > I don't see how a target hook is used.
> > > > Talking about PIC thunks, those have I believe . character in their symbols,
> > > > so that they can't be confused with user functions.  Any reason these
> > > > retpoline thunks aren't?
> > > > 
> > > They used to have '.'.  It was changed at the last minute since kernel
> > > needs to export them as regular symbols.
> > That can be done via asm aliases or direct assembler use; the kernel
> > doesn't absolutely have to access them via C compatible symbol names.
> > 
> Hi David,
> 
> Can you comment on this?

It ends up being a real pain for the CONFIG_TRIM_UNUSED_SYMBOLS
mechanism in the kernel, which really doesn't cope well with the dots.
It *does* assume that exported symbols have C-compatible names.
MODVERSIONS too, although we had a simpler "just shut up the warnings"
solution for that. It was CONFIG_TRIM_UNUSED_SYMBOLS which was the
really horrid one.

I went a little way down the rabbit-hole of trying to make it cope, but
it was far from pretty:

https://patchwork.kernel.org/patch/10148081/

If there's a way to make it work sanely, I'm up for that. But if the
counter-argument is "But someone might genuinely want to make their own
C function called __x86_indirect_thunk_rax"... I'm not so receptive to
that argument :)


* Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-08 11:27     ` H.J. Lu
@ 2018-01-08 19:05       ` Florian Weimer
  2018-01-08 19:36         ` H.J. Lu
  0 siblings, 1 reply; 135+ messages in thread
From: Florian Weimer @ 2018-01-08 19:05 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches

* H. J. Lu:

> On Mon, Jan 8, 2018 at 12:20 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>> * H. J. Lu:
>>
>>> Add -mindirect-branch-loop= option to control loop filler in call and
>>> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
>>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>>> as loop filler.  The default is 'lfence'.
>>
>> Why is the loop needed?  Doesn't ud2 or cpuid stop speculative
>> execution?
>
> My understanding is that a loop works better.

Better how?

What about far jumps?  I think they prevent some forms of prefetch on
i386, so perhaps long mode is similar in that regard?


* Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-08 19:05       ` Florian Weimer
@ 2018-01-08 19:36         ` H.J. Lu
  0 siblings, 0 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-08 19:36 UTC (permalink / raw)
  To: Florian Weimer; +Cc: GCC Patches

On Mon, Jan 8, 2018 at 10:32 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
> * H. J. Lu:
>
>> On Mon, Jan 8, 2018 at 12:20 AM, Florian Weimer <fw@deneb.enyo.de> wrote:
>>> * H. J. Lu:
>>>
>>>> Add -mindirect-branch-loop= option to control loop filler in call and
>>>> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
>>>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>>>> as loop filler.  The default is 'lfence'.
>>>
>>> Why is the loop needed?  Doesn't ud2 or cpuid stop speculative
>>> execution?
>>
>> My understanding is that a loop works better.
>
> Better how?
>
> What about far jumps?  I think they prevent some forms of prefetch on
> i386, so perhaps long mode is similar in that regard?

These are more expensive and we can't guarantee that they are
effective, hence the short loop.

-- 
H.J.


* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-08 17:33 ` Florian Weimer
@ 2018-01-08 20:49   ` David Woodhouse
  0 siblings, 0 replies; 135+ messages in thread
From: David Woodhouse @ 2018-01-08 20:49 UTC (permalink / raw)
  To: Florian Weimer, H.J. Lu; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 898 bytes --]

On Mon, 2018-01-08 at 09:27 +0100, Florian Weimer wrote:
> * H. J. Lu:
> 
> > 
> > This set of patches for GCC 8 mitigates variant #2 of the
> > speculative execution vulnerabilities on x86 processors identified
> > by CVE-2017-5715, aka Spectre.  They convert indirect branches to
> > call and return thunks to avoid speculative execution via indirect
> > call and jmp.
> Would it make sense to add a mode which relies on an empty return
> stack cache?  Or will CPUs use the regular branch predictor if the
> return stack is empty?
> 
> With an empty return stack cache and no branch predictor, a simple
> PUSH/RET sequence cannot be predicted, so the complex CALL sequence
> with a speculation barrier is not needed.

Some CPUs will use the regular branch predictor if the RSB is empty.
Others just round-robin the RSB and will use the *oldest* entry if they
underflow.
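
The wrap-around case can be illustrated with a toy model.  This is not
a description of any real CPU: the depth (RSB_DEPTH), the structure
and the function names are all assumptions made for the sketch, which
treats the return stack buffer as a plain ring buffer.

```c
#include <stddef.h>

#define RSB_DEPTH 4 /* assumed depth, for illustration only */

struct rsb {
    unsigned long slot[RSB_DEPTH];
    size_t top;
};

/* A call records its return address, overwriting the oldest slot once
   the ring is full. */
void rsb_call(struct rsb *r, unsigned long ret_addr)
{
    r->top = (r->top + 1) % RSB_DEPTH;
    r->slot[r->top] = ret_addr;
}

/* A return takes its prediction from the current slot and steps back.
   Nothing tracks whether the slot is still valid, so after more returns
   than RSB_DEPTH the prediction is whatever stale value is left there. */
unsigned long rsb_ret(struct rsb *r)
{
    unsigned long predicted = r->slot[r->top];

    r->top = (r->top + RSB_DEPTH - 1) % RSB_DEPTH;
    return predicted;
}
```

With five nested calls and a depth of four, the first four returns are
predicted correctly in this model, and the fifth is predicted from a
leftover slot rather than from the real return address.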



* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-07 23:36 ` [PATCH 0/5] x86: CVE-2017-5715, aka Spectre Jeff Law
  2018-01-08  0:30   ` H.J. Lu
  2018-01-08 14:36   ` Alan Modra
@ 2018-01-08 21:00   ` David Woodhouse
  2018-01-11 21:19     ` Florian Weimer
  2 siblings, 1 reply; 135+ messages in thread
From: David Woodhouse @ 2018-01-08 21:00 UTC (permalink / raw)
  To: Jeff Law, H.J. Lu, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 849 bytes --]

On Sun, 2018-01-07 at 16:36 -0700, Jeff Law wrote:
> 
> My fundamental problem with this patchkit is that it is 100% x86/x86_64
> specific.
> 
> ISTM we want a target independent mechanism (ie, new standard patterns,
> options, etc) then an x86/x86_64 implementation using that target
> independent framework (ie, the actual implementation of those new
> standard patterns).

From the kernel point of view, I'm not too worried about GCC internal
implementation details. What would be really useful to agree in short
order is the command-line options that invoke this behaviour, and the
ABI for the thunks. 

Once that's done, we can push the patches to Linus and people can build
safe kernels, and we can build with HJ's existing patch set for the
time being. And you can bikeshed the rest to your collective hearts'
content... :)


* Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-08  8:23   ` Florian Weimer
  2018-01-08 11:27     ` H.J. Lu
@ 2018-01-08 21:01     ` David Woodhouse
  2018-01-08 21:10       ` H.J. Lu
  1 sibling, 1 reply; 135+ messages in thread
From: David Woodhouse @ 2018-01-08 21:01 UTC (permalink / raw)
  To: Florian Weimer, H.J. Lu; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1120 bytes --]

On Mon, 2018-01-08 at 09:20 +0100, Florian Weimer wrote:
> * H. J. Lu:
> 
> > Add -mindirect-branch-loop= option to control loop filler in call and
> > return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
> > as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
> > as loop filler.  The default is 'lfence'.
> 
> Why is the loop needed?  Doesn't ud2 or cpuid stop speculative
> execution?

The idea is not to stop it per se, but to capture it. We trick the
speculative execution into *thinking* it's going to return back to that
endless loop, which prevents it from doing the branch prediction which
would otherwise have got into trouble.

There has been a fair amount of bikeshedding of precisely what goes in
there already, and '1: pause; jmp 1b' is the best option that hasn't
been shot down in flames by the CPU architects.

HJ, do we still actually need the options for lfence and nop? I thought
those were originally just for testing and could possibly be dropped
now?

Not that I care for Linux since I'm providing my own external thunk
anyway...


* Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-08 21:01     ` David Woodhouse
@ 2018-01-08 21:10       ` H.J. Lu
  0 siblings, 0 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-08 21:10 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Florian Weimer, GCC Patches

On Mon, Jan 8, 2018 at 1:00 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> On Mon, 2018-01-08 at 09:20 +0100, Florian Weimer wrote:
>> * H. J. Lu:
>>
>> > Add -mindirect-branch-loop= option to control loop filler in call and
>> > return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
>> > as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>> > as loop filler.  The default is 'lfence'.
>>
>> Why is the loop needed?  Doesn't ud2 or cpuid stop speculative
>> execution?
>
> The idea is not to stop it per se, but to capture it. We trick the
> speculative execution into *thinking* it's going to return back to that
> endless loop, which prevents it from doing the branch prediction which
> would otherwise have got into trouble.
>
> There has been a fair amount of bikeshedding of precisely what goes in
> there already, and '1: pause; jmp 1b' is the best option that hasn't
> been shot down in flames by the CPU architects.
>
> HJ, do we still actually need the options for lfence and nop? I thought
> those were originally just for testing and could possibly be dropped
> now?

This is a trial change.  It may be useful later.  But I can drop it and
hardcode it to "pause".

-- 
H.J.


* Re: [PATCH 3/5] x86: Add -mfunction-return=
  2018-01-08 12:00     ` H.J. Lu
@ 2018-01-08 21:29       ` David Woodhouse
  0 siblings, 0 replies; 135+ messages in thread
From: David Woodhouse @ 2018-01-08 21:29 UTC (permalink / raw)
  To: H.J. Lu, Martin Liška; +Cc: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 2799 bytes --]

On Mon, 2018-01-08 at 03:59 -0800, H.J. Lu wrote:
> On Mon, Jan 8, 2018 at 1:56 AM, Martin Liška <mliska@suse.cz> wrote:
> > 
> > On 01/07/2018 11:59 PM, H.J. Lu wrote:
> > > 
> > > Function return thunk is the same as memory thunk for -mindirect-branch=
> > > where the return address is at the top of the stack:
> > > 
> > > __x86_return_thunk:
> > >       call L2
> > > L1:
> > >       lfence
> > >       jmp L1
> > > L2:
> > >       lea 8(%rsp), %rsp|lea 4(%esp), %esp
> > >       ret
> > > 
> > > and function return becomes
> > > 
> > >       jmp __x86_return_thunk
> > Hello.
> > 
> > Can you please explain more usage of the option? Is to prevent a speculative
> > execution of 'ret' instruction (which is an indirect call), as described in [1]?
> > The paper mentions that return stack predictors are commonly implemented in some form.
> > Looks that current version of Linux patches does not use the option.
> > 
> This option is requested by Linux kernel people.  It may be used in
> the future.

Right. Essentially the new set of vulnerabilities is all about
speculative execution. Instructions which *don't* get committed, and
it's supposed to be like they never happen, actually *do* have side-
effects and can leak information.

This is *particularly* problematic for Intel CPUs where the CPU
architects said "ah, screw it, let's not do memory permission checks in
advance; as long as we make sure it's done before we *commit* an
instruction it'll be fine". With the result that you can now basically
read *all* of kernel memory, and hence all of physical memory, directly
from userspace on Intel CPUs. Oops :)

The fix for *that* one is to actually remove the kernel pages from the
page tables while running userspace, instead of just setting the
permissions to prevent access. Hence the whole Kernel Page Table
Isolation thing.

The next interesting attack is the so-called "variant 2" where the
attacker pollutes the branch predictors so that in *kernel* mode the
CPU *speculatively* runs... well, whatever the attacker wants. This is
one that affects lots of vendors, not just Intel. We mitigate this by
eliminating *all* the indirect branches in the kernel, to make it
immune to such an attack.

This is all very well, but *some* CPUs also pull in predictions from
the generic branch target predictor when the return stack buffer has
underflowed (e.g. if there was a call stack of more than X depth).
Hence, in some cases we may yet end up needing this -mfunction-return=
thunk too. As you (Martin) note, we don't use it *yet*. The full set of
mitigations for the various attacks is still being put together, and
the optimal choice for each CPU family does end up being different.
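
The underflow case can be made concrete with a toy model. What follows is
only a sketch under stated assumptions: RSB_DEPTH is a made-up capacity,
the helper names are hypothetical, and real return predictors are far more
involved. It just counts how many returns find the buffer empty after a
call chain of a given depth, which is the situation where some CPUs may
fall back to the generic branch target predictor.

```c
#include <stddef.h>

/* Toy model only: RSB_DEPTH is an assumed capacity, not the value for
   any particular CPU.  */
#define RSB_DEPTH 16

struct rsb {
  size_t entries;   /* number of valid return predictions held */
};

/* A call pushes a return prediction; once the buffer is full, the
   oldest prediction is lost, so the count saturates.  */
void rsb_call (struct rsb *r)
{
  if (r->entries < RSB_DEPTH)
    r->entries++;
}

/* A return pops a prediction.  Returns 1 if the buffer was already
   empty (underflow), 0 otherwise.  */
int rsb_ret (struct rsb *r)
{
  if (r->entries == 0)
    return 1;
  r->entries--;
  return 0;
}

/* Call DEPTH levels deep, then return all the way out, counting how
   many of those returns hit an empty buffer.  */
size_t rsb_underflows (size_t depth)
{
  struct rsb r = { 0 };
  size_t i, underflows = 0;

  for (i = 0; i < depth; i++)
    rsb_call (&r);
  for (i = 0; i < depth; i++)
    underflows += (size_t) rsb_ret (&r);
  return underflows;
}
```

In this model a chain no deeper than RSB_DEPTH never underflows, while each
extra level of depth costs one return that is no longer covered by the
buffer.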

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-08 12:12         ` H.J. Lu
  2018-01-08 17:25           ` Michael Matz
@ 2018-01-08 21:30           ` Andi Kleen
  2018-01-08 21:35             ` H.J. Lu
  1 sibling, 1 reply; 135+ messages in thread
From: Andi Kleen @ 2018-01-08 21:30 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Jakub Jelinek, Martin Liška, GCC Patches

"H.J. Lu" <hjl.tools@gmail.com> writes:
>>
>> Talking about PIC thunks, those have I believe . character in their symbols,
>> so that they can't be confused with user functions.  Any reason these
>> retpoline thunks aren't?
>>
>
> They used to have '.'.  It was changed at the last minute since kernel needs to
> export them as regular symbols.

The kernel doesn't actually need that to export the symbols.

While symbol CRCs cannot be generated for symbols with '.', CRCs are not
needed and there were already patches to hide the resulting warnings.

-Andi

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-08 21:30           ` Andi Kleen
@ 2018-01-08 21:35             ` H.J. Lu
  2018-01-08 21:44               ` David Woodhouse
  0 siblings, 1 reply; 135+ messages in thread
From: H.J. Lu @ 2018-01-08 21:35 UTC (permalink / raw)
  To: Andi Kleen, Woodhouse, David
  Cc: Jakub Jelinek, Martin Liška, GCC Patches

On Mon, Jan 8, 2018 at 8:46 AM, Andi Kleen <ak@linux.intel.com> wrote:
> "H.J. Lu" <hjl.tools@gmail.com> writes:
>>>
>>> Talking about PIC thunks, those have I believe . character in their symbols,
>>> so that they can't be confused with user functions.  Any reason these
>>> retpoline thunks aren't?
>>>
>>
>> They used to have '.'.  It was changed at the last minute since kernel needs to
>> export them as regular symbols.
>
> The kernel doesn't actually need that to export the symbols.
>
> While symbol CRCs cannot be generated for symbols with '.', CRCs are not
> needed and there were already patches to hide the resulting warnings.
>

Andi, can you work it out with David?

-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-08 21:35             ` H.J. Lu
@ 2018-01-08 21:44               ` David Woodhouse
  2018-01-08 22:36                 ` Andi Kleen
  0 siblings, 1 reply; 135+ messages in thread
From: David Woodhouse @ 2018-01-08 21:44 UTC (permalink / raw)
  To: H.J. Lu, Andi Kleen; +Cc: Jakub Jelinek, Martin Liška, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1128 bytes --]

On Mon, 2018-01-08 at 13:32 -0800, H.J. Lu wrote:
> On Mon, Jan 8, 2018 at 8:46 AM, Andi Kleen <ak@linux.intel.com> wrote:
> > 
> > "H.J. Lu" <hjl.tools@gmail.com> writes:
> > > 
> > > > 
> > > > 
> > > > Talking about PIC thunks, those have I believe . character in their symbols,
> > > > so that they can't be confused with user functions.  Any reason these
> > > > retpoline thunks aren't?
> > > > 
> > > They used to have '.'.  It was changed at the last minute since kernel needs to
> > > export them as regular symbols.
> > The kernel doesn't actually need that to export the symbols.
> > 
> > While symbol CRCs cannot be generated for symbols with '.', CRCs are not
> > needed and there were already patches to hide the resulting warnings.
> > 
> Andi, can you work it out with David?

It wasn't CONFIG_MODVERSIONS but CONFIG_TRIM_UNUSED_SYMBOLS which was
the straw that broke the camel's back on that one. I'm open to a
solution for that one, but I couldn't see one that didn't make my eyes
bleed. Except for making the symbols not have dots in.

https://patchwork.kernel.org/patch/10148081/

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-08 21:44               ` David Woodhouse
@ 2018-01-08 22:36                 ` Andi Kleen
  2018-01-08 23:02                   ` David Woodhouse
  0 siblings, 1 reply; 135+ messages in thread
From: Andi Kleen @ 2018-01-08 22:36 UTC (permalink / raw)
  To: David Woodhouse; +Cc: H.J. Lu, Jakub Jelinek, Martin Liška, GCC Patches

On Mon, Jan 08, 2018 at 09:35:26PM +0000, David Woodhouse wrote:
> On Mon, 2018-01-08 at 13:32 -0800, H.J. Lu wrote:
> > On Mon, Jan 8, 2018 at 8:46 AM, Andi Kleen <ak@linux.intel.com> wrote:
> > > 
> > > "H.J. Lu" <hjl.tools@gmail.com> writes:
> > > > 
> > > > > 
> > > > > 
> > > > > Talking about PIC thunks, those have I believe . character in their symbols,
> > > > > so that they can't be confused with user functions.  Any reason these
> > > > > retpoline thunks aren't?
> > > > > 
> > > > They used to have '.'.  It was changed at the last minute since kernel needs to
> > > > export them as regular symbols.
> > > The kernel doesn't actually need that to export the symbols.
> > > 
> > > While symbol CRCs cannot be generated for symbols with '.', CRCs are not
> > > needed and there were already patches to hide the resulting warnings.
> > > 
> > Andi, can you work it out with David?
> 
> It wasn't CONFIG_MODVERSIONS but CONFIG_TRIM_UNUSED_SYMBOLS which was
> the straw that broke the camel's back on that one. I'm open to a
> solution for that one, but I couldn't see one that didn't make my eyes
> bleed. Except for making the symbols not have dots in.
> 
> https://patchwork.kernel.org/patch/10148081/

I guess we can stay with the underscore version in the compiler for now.

In theory it could conflict with something used in C, but the risk
is probably low.

-Andi


^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-08 22:36                 ` Andi Kleen
@ 2018-01-08 23:02                   ` David Woodhouse
  0 siblings, 0 replies; 135+ messages in thread
From: David Woodhouse @ 2018-01-08 23:02 UTC (permalink / raw)
  To: Andi Kleen; +Cc: H.J. Lu, Jakub Jelinek, Martin Liška, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1849 bytes --]

On Mon, 2018-01-08 at 14:27 -0800, Andi Kleen wrote:
> On Mon, Jan 08, 2018 at 09:35:26PM +0000, David Woodhouse wrote:
> > On Mon, 2018-01-08 at 13:32 -0800, H.J. Lu wrote:
> > > On Mon, Jan 8, 2018 at 8:46 AM, Andi Kleen <ak@linux.intel.com> wrote:
> > > > 
> > > > "H.J. Lu" <hjl.tools@gmail.com> writes:
> > > > > 
> > > > > > 
> > > > > > 
> > > > > > Talking about PIC thunks, those have I believe . character in their symbols,
> > > > > > so that they can't be confused with user functions.  Any reason these
> > > > > > retpoline thunks aren't?
> > > > > > 
> > > > > They used to have '.'.  It was changed at the last minute since kernel needs to
> > > > > export them as regular symbols.
> > > > The kernel doesn't actually need that to export the symbols.
> > > > 
> > > > While symbol CRCs cannot be generated for symbols with '.', CRCs are not
> > > > needed and there were already patches to hide the resulting warnings.
> > > > 
> > > Andi, can you work it out with David?
> > 
> > It wasn't CONFIG_MODVERSIONS but CONFIG_TRIM_UNUSED_SYMBOLS which was
> > the straw that broke the camel's back on that one. I'm open to a
> > solution for that one, but I couldn't see one that didn't make my eyes
> > bleed. Except for making the symbols not have dots in.
> > 
> > https://patchwork.kernel.org/patch/10148081/
> 
> I guess we can stay with the underscore version in the compiler for now.
> 
> In theory it could conflict with something used in C, but the risk
> is probably low.

If it makes anyone happier, we could perhaps stick with the dot version
for the *inline* thunks but only use underscores for the external one?

But really, any "innocent user" claiming to be *surprised* after
writing their own C function and calling it __x86_indirect_thunk_rax is
surely taking the piss.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-08 17:51               ` Woodhouse, David
@ 2018-01-08 23:10                 ` Michael Matz
  2018-01-09  0:46                   ` Woodhouse, David
  0 siblings, 1 reply; 135+ messages in thread
From: Michael Matz @ 2018-01-08 23:10 UTC (permalink / raw)
  To: Woodhouse, David; +Cc: hjl.tools, gcc-patches, jakub, mliska

Hi,

On Mon, 8 Jan 2018, Woodhouse, David wrote:

> > > That can be done via asm aliases or direct assembler use; the kernel
> > > doesn't absolutely have to access them via C compatible symbol names.
> > > 
> > Hi David,
> > 
> > Can you comment on this?
> 
> It ends up being a real pain for the CONFIG_TRIM_UNUSED_SYMBOLS
> mechanism in the kernel, which really doesn't cope well with the dots.
> It *does* assume that exported symbols have C-compatible names.
> MODVERSIONS too, although we had a simpler "just shut up the warnings"
> solution for that. It was CONFIG_TRIM_UNUSED_SYMBOLS which was the
> really horrid one.
> 
> I went a little way down the rabbit-hole of trying to make it cope, but
> it was far from pretty:
> 
> https://patchwork.kernel.org/patch/10148081/
> 
> If there's a way to make it work sanely, I'm up for that. But if the
> counter-argument is "But someone might genuinely want to make their own
> C function called __x86_indirect_thunk_rax"... I'm not so receptive to
> that argument :)

Well, the naming of the extern thunk isn't so important that the above 
might not be a reason to just go with underscores.  I'll certainly not 
object to the patch on that basis.  But do keep in mind that GCC already 
uses '.' for other compiler generated symbols, and we're likely to 
continue doing this.  So eventually you'll want to fix your trim_unused 
infrastructure to cope with that.  (Perhaps by just ignoring those 
symbols?  It's not that they must be trimmed if unused, as the user didn't 
write them on his own to start with, right?)
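
For reference, the asm-alias route mentioned earlier in the thread could
look roughly like the GNU C sketch below. The symbol name
"demo.dotted_helper" and the DEMO_ASM_NAME macro are hypothetical and used
only for illustration; this is not what the patch or the kernel does. The
point is that C code refers to an ordinary identifier while the
assembler-level symbol carries the dots.

```c
/* Asm labels are a GNU C extension; fall back to the plain C name
   on compilers that do not define __GNUC__.  */
#if defined(__GNUC__)
# define DEMO_ASM_NAME(n) __asm__ (n)
#else
# define DEMO_ASM_NAME(n)
#endif

/* The declaration carries the asm label; the definition that follows
   is then emitted under the dotted symbol name.  */
int dotted_helper (int x) DEMO_ASM_NAME ("demo.dotted_helper");

int dotted_helper (int x)
{
  return x + 1;
}

/* C callers keep using the C-level identifier.  */
int call_dotted (int x)
{
  return dotted_helper (x) * 2;
}
```

A user function could never collide with such a symbol by accident, since
the dotted name cannot be spelled as a C identifier.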


Ciao,
Michael.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-08 23:10                 ` Michael Matz
@ 2018-01-09  0:46                   ` Woodhouse, David
  0 siblings, 0 replies; 135+ messages in thread
From: Woodhouse, David @ 2018-01-09  0:46 UTC (permalink / raw)
  To: Michael Matz; +Cc: hjl.tools, gcc-patches, jakub, mliska

[-- Attachment #1: Type: text/plain, Size: 3233 bytes --]

On Tue, 2018-01-09 at 00:02 +0100, Michael Matz wrote:
> Hi,
> 
> On Mon, 8 Jan 2018, Woodhouse, David wrote:
> 
> > 
> > > 
> > > > 
> > > > That can be done via asm aliases or direct assembler use; the kernel
> > > > doesn't absolutely have to access them via C compatible symbol names.
> > > > 
> > > Hi David,
> > > 
> > > Can you comment on this?
> >
> > It ends up being a real pain for the CONFIG_TRIM_UNUSED_SYMBOLS
> > mechanism in the kernel, which really doesn't cope well with the dots.
> > It *does* assume that exported symbols have C-compatible names.
> > MODVERSIONS too, although we had a simpler "just shut up the warnings"
> > solution for that. It was CONFIG_TRIM_UNUSED_SYMBOLS which was the
> > really horrid one.
> > 
> > I went a little way down the rabbit-hole of trying to make it cope, but
> > it was far from pretty:
> > 
> > https://patchwork.kernel.org/patch/10148081/
> > 
> > If there's a way to make it work sanely, I'm up for that. But if the
> > counter-argument is "But someone might genuinely want to make their own
> > C function called __x86_indirect_thunk_rax"... I'm not so receptive to
> > that argument :)
>
> Well, the naming of the extern thunk isn't so important that the above 
> might not be a reason to just go with underscores.  I'll certainly not 
> object to the patch on that basis.  But do keep in mind that GCC already 
> uses '.' for other compiler generated symbols, and we're likely to 
> continue doing this.  So eventually you'll want to fix your trim_unused 
> infrastructure to cope with that.  (Perhaps by just ignoring those 
> symbols?  It's not that they must be trimmed if unused, as the user didn't 
> write them on his own to start with, right?)

This is only for the symbols which are exported to loadable kernel modules.

When the compiler-generated symbols are emitted inline in a COMDAT
section, we basically never even notice them. The module probably has
its own copy, and that's fine.

These indirect thunks are special because we really care about
modifying them at run time according to which CPU we happen to be
running on and which other mitigations for the Spectre problem are in
use. That's why we asked for the -mindirect-branch=thunk-extern option
and provided our own copy of the thunks — and why we're exporting *our*
version to loadable modules.

I don't think it's hugely likely that we'll need to cope with other
compiler-generated symbols in quite the same way, in the near future.

The CONFIG_TRIM_UNUSED_SYMBOLS option is an optimisation to avoid
exporting functions which aren't actually used by any of the loadable
modules that were built in the currently-active kernel configuration.
Lots of in-kernel text can thus be dropped from the image once it's
known that it will never be used. And that's the code that has
difficulty with the dots.

And yes, it's perfectly OK to drop, for example, the %rsp-based thunk
completely if nobody happens to use it today (actually, why am I
creating that one in the first place? :)

There are various ways to address the problem, none of them very
pretty. The nicest was to just *not* have dots in the symbols. 
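
The "C-compatible name" assumption can be illustrated with a small check.
This is only a sketch: is_c_identifier is a hypothetical helper, not the
kernel's actual code, and the dotted spelling in the usage note is an
assumed example of what a dotted thunk name would look like.

```c
#include <ctype.h>

/* Return 1 if NAME is spelled like a C identifier
   ([A-Za-z_][A-Za-z0-9_]*), 0 otherwise.  Illustration only; this is
   not the check the kernel build actually performs.  */
int is_c_identifier (const char *name)
{
  const unsigned char *p = (const unsigned char *) name;

  /* First character: a letter or underscore (rejects the empty string).  */
  if (!(isalpha (*p) || *p == '_'))
    return 0;
  /* Remaining characters: letters, digits or underscores.  */
  for (p++; *p != '\0'; p++)
    if (!(isalnum (*p) || *p == '_'))
      return 0;
  return 1;
}
```

With this helper, "__x86_indirect_thunk_rax" is accepted and a dotted
spelling such as "__x86.indirect_thunk.rax" is rejected, which is the
property that tooling built around C-compatible export names relies on.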

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-08  0:30   ` H.J. Lu
  2018-01-08 10:01     ` Martin Liška
@ 2018-01-09 18:54     ` Jeff Law
  2018-01-09 19:26       ` H.J. Lu
  2018-01-10 10:44       ` Eric Botcazou
  1 sibling, 2 replies; 135+ messages in thread
From: Jeff Law @ 2018-01-09 18:54 UTC (permalink / raw)
  To: H.J. Lu; +Cc: GCC Patches

On 01/07/2018 05:29 PM, H.J. Lu wrote:
> On Sun, Jan 7, 2018 at 3:36 PM, Jeff Law <law@redhat.com> wrote:
>> On 01/07/2018 03:58 PM, H.J. Lu wrote:
>>> This set of patches for GCC 8 mitigates variant #2 of the speculative execution
>>> vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre.  They
>>> convert indirect branches to call and return thunks to avoid speculative execution
>>> via indirect call and jmp.
>>>
>>>
>>> H.J. Lu (5):
>>>   x86: Add -mindirect-branch=
>>>   x86: Add -mindirect-branch-loop=
>>>   x86: Add -mfunction-return=
>>>   x86: Add -mindirect-branch-register
>>>   x86: Add 'V' register operand modifier
>>>
>>>  gcc/config/i386/constraints.md                     |  12 +-
>>>  gcc/config/i386/i386-opts.h                        |  14 +
>>>  gcc/config/i386/i386-protos.h                      |   2 +
>>>  gcc/config/i386/i386.c                             | 655 ++++++++++++++++++++-
>>>  gcc/config/i386/i386.h                             |  10 +
>>>  gcc/config/i386/i386.md                            |  51 +-
>>>  gcc/config/i386/i386.opt                           |  45 ++
>>>  gcc/config/i386/predicates.md                      |  21 +-
>>>  gcc/doc/extend.texi                                |  22 +
>>>  gcc/doc/invoke.texi                                |  37 +-
>> My fundamental problem with this patchkit is that it is 100% x86/x86_64
>> specific.
>>
>> ISTM we want a target independent mechanism (ie, new standard patterns,
>> options, etc) then an x86/x86_64 implementation using that target
>> independent framework (ie, the actual implementation of those new
>> standard patterns).
>>
> 
> My patch set is implemented with some constraints:
> 
> 1. They need to be backportable to GCC 7/6/5/4.x.
> 2. They should work with all compiler optimizations.
> 3. They need to generate code sequences that are x86 specific, which can't be
> changed in any shape or form.  And the generated code is quite the opposite
> of what a good optimizing compiler should generate.
> 
> Given these conditions, I kept the existing indirect call, jump and
> return patterns.
> I generated different code sequences for these patterns during the final pass
> when generating assembly code.
> 
> I guess that I could add a late target independent RTL pass to convert
> indirect call, jump and return patterns to something else.  But I am not sure
> if that is what you are looking for.
I don't see how those constraints are incompatible with doing most of
this work at a higher level in the compiler.  You just surround the
resulting RTL bits with appropriate barriers to prevent them from
getting mucked up by the optimizers.  Actually I think you're going to
need one barrier in the middle of the sequence to keep bbro at bay
within the sequence as a whole.

It's really just a couple of new primitives to emit a jump as a call and
one to slam in a new return address.  Given those I think you can do the
entire implementation as RTL at expansion time and you've got a damn
good shot at protecting most architectures from these kinds of attacks.

jeff

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-08 10:01     ` Martin Liška
@ 2018-01-09 18:55       ` Jeff Law
  0 siblings, 0 replies; 135+ messages in thread
From: Jeff Law @ 2018-01-09 18:55 UTC (permalink / raw)
  To: Martin Liška, H.J. Lu; +Cc: GCC Patches

On 01/08/2018 03:01 AM, Martin Liška wrote:
> On 01/08/2018 01:29 AM, H.J. Lu wrote:
>> 1. They need to be backportable to GCC 7/6/5/4.x.
> 
> I must admit this is a very important constraint. To be honest, we're planning
> to backport the patchset to GCC 4.3.
Red Hat would likely be backporting a ways as well.  But I don't think
doing this at expand rather than in the targets would affect
backportability all that much, nor do I think that backporting to 10
year old compilers like SuSE and Red Hat do should drive the design
decisions :-)

jeff

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-08 10:56   ` Martin Liška
  2018-01-08 11:59     ` H.J. Lu
@ 2018-01-09 19:05     ` Jeff Law
  2018-01-09 19:11       ` H.J. Lu
  1 sibling, 1 reply; 135+ messages in thread
From: Jeff Law @ 2018-01-09 19:05 UTC (permalink / raw)
  To: Martin Liška, H.J. Lu, gcc-patches

On 01/08/2018 03:10 AM, Martin Liška wrote:
> On 01/07/2018 11:59 PM, H.J. Lu wrote:
>> +static void
>> +output_indirect_thunk_function (bool need_bnd_p, int regno)
>> +{
>> +  char name[32];
>> +  tree decl;
>> +
>> +  /* Create __x86_indirect_thunk/__x86_indirect_thunk_bnd.  */
>> +  indirect_thunk_name (name, regno, need_bnd_p);
>> +  decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
>> +		     get_identifier (name),
>> +		     build_function_type_list (void_type_node, NULL_TREE));
>> +  DECL_RESULT (decl) = build_decl (BUILTINS_LOCATION, RESULT_DECL,
>> +				   NULL_TREE, void_type_node);
>> +  TREE_PUBLIC (decl) = 1;
>> +  TREE_STATIC (decl) = 1;
>> +  DECL_IGNORED_P (decl) = 1;
>> +
>> +#if TARGET_MACHO
>> +  if (TARGET_MACHO)
>> +    {
>> +      switch_to_section (darwin_sections[picbase_thunk_section]);
>> +      fputs ("\t.weak_definition\t", asm_out_file);
>> +      assemble_name (asm_out_file, name);
>> +      fputs ("\n\t.private_extern\t", asm_out_file);
>> +      assemble_name (asm_out_file, name);
>> +      putc ('\n', asm_out_file);
>> +      ASM_OUTPUT_LABEL (asm_out_file, name);
>> +      DECL_WEAK (decl) = 1;
>> +    }
>> +  else
>> +#endif
>> +    if (USE_HIDDEN_LINKONCE)
>> +      {
>> +	cgraph_node::create (decl)->set_comdat_group (DECL_ASSEMBLER_NAME (decl));
>> +
>> +	targetm.asm_out.unique_section (decl, 0);
>> +	switch_to_section (get_named_section (decl, NULL, 0));
>> +
>> +	targetm.asm_out.globalize_label (asm_out_file, name);
>> +	fputs ("\t.hidden\t", asm_out_file);
>> +	assemble_name (asm_out_file, name);
>> +	putc ('\n', asm_out_file);
>> +	ASM_DECLARE_FUNCTION_NAME (asm_out_file, name, decl);
>> +      }
>> +    else
>> +      {
>> +	switch_to_section (text_section);
>> +	ASM_OUTPUT_LABEL (asm_out_file, name);
>> +      }
>> +
>> +  DECL_INITIAL (decl) = make_node (BLOCK);
>> +  current_function_decl = decl;
>> +  allocate_struct_function (decl, false);
>> +  init_function_start (decl);
>> +  /* We're about to hide the function body from callees of final_* by
>> +     emitting it directly; tell them we're a thunk, if they care.  */
>> +  cfun->is_thunk = true;
>> +  first_function_block_is_cold = false;
>> +  /* Make sure unwind info is emitted for the thunk if needed.  */
>> +  final_start_function (emit_barrier (), asm_out_file, 1);
>> +
>> +  output_indirect_thunk (need_bnd_p, regno);
>> +
>> +  final_end_function ();
>> +  init_insn_lengths ();
>> +  free_after_compilation (cfun);
>> +  set_cfun (NULL);
>> +  current_function_decl = NULL;
>> +}
>> +
> 
> I'm wondering whether thunk creation can be a good target-independent generalization? I guess
> we can emit the function declaration without direct writes to asm_out_file? And the emission
> of function body can be potentially a target hook?
> 
> What about emitting body of the function with RTL instructions instead of direct assembly write?
> My knowledge of RTL is quite small, but maybe it can bring some generalization and reusability
> for other targets?
That's the key point I'm trying to make.  We should be looking at
generalizing this stuff where it makes sense.

jeff

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-09 19:05     ` Jeff Law
@ 2018-01-09 19:11       ` H.J. Lu
  0 siblings, 0 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-09 19:11 UTC (permalink / raw)
  To: Jeff Law; +Cc: Martin Liška, GCC Patches

On Tue, Jan 9, 2018 at 10:55 AM, Jeff Law <law@redhat.com> wrote:
> On 01/08/2018 03:10 AM, Martin Liška wrote:
>> On 01/07/2018 11:59 PM, H.J. Lu wrote:
>>> +static void
>>> +output_indirect_thunk_function (bool need_bnd_p, int regno)
>>> +{
>>> +  char name[32];
>>> +  tree decl;
>>> +
>>> +  /* Create __x86_indirect_thunk/__x86_indirect_thunk_bnd.  */
>>> +  indirect_thunk_name (name, regno, need_bnd_p);
>>> +  decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
>>> +                 get_identifier (name),
>>> +                 build_function_type_list (void_type_node, NULL_TREE));
>>> +  DECL_RESULT (decl) = build_decl (BUILTINS_LOCATION, RESULT_DECL,
>>> +                               NULL_TREE, void_type_node);
>>> +  TREE_PUBLIC (decl) = 1;
>>> +  TREE_STATIC (decl) = 1;
>>> +  DECL_IGNORED_P (decl) = 1;
>>> +
>>> +#if TARGET_MACHO
>>> +  if (TARGET_MACHO)
>>> +    {
>>> +      switch_to_section (darwin_sections[picbase_thunk_section]);
>>> +      fputs ("\t.weak_definition\t", asm_out_file);
>>> +      assemble_name (asm_out_file, name);
>>> +      fputs ("\n\t.private_extern\t", asm_out_file);
>>> +      assemble_name (asm_out_file, name);
>>> +      putc ('\n', asm_out_file);
>>> +      ASM_OUTPUT_LABEL (asm_out_file, name);
>>> +      DECL_WEAK (decl) = 1;
>>> +    }
>>> +  else
>>> +#endif
>>> +    if (USE_HIDDEN_LINKONCE)
>>> +      {
>>> +    cgraph_node::create (decl)->set_comdat_group (DECL_ASSEMBLER_NAME (decl));
>>> +
>>> +    targetm.asm_out.unique_section (decl, 0);
>>> +    switch_to_section (get_named_section (decl, NULL, 0));
>>> +
>>> +    targetm.asm_out.globalize_label (asm_out_file, name);
>>> +    fputs ("\t.hidden\t", asm_out_file);
>>> +    assemble_name (asm_out_file, name);
>>> +    putc ('\n', asm_out_file);
>>> +    ASM_DECLARE_FUNCTION_NAME (asm_out_file, name, decl);
>>> +      }
>>> +    else
>>> +      {
>>> +    switch_to_section (text_section);
>>> +    ASM_OUTPUT_LABEL (asm_out_file, name);
>>> +      }
>>> +
>>> +  DECL_INITIAL (decl) = make_node (BLOCK);
>>> +  current_function_decl = decl;
>>> +  allocate_struct_function (decl, false);
>>> +  init_function_start (decl);
>>> +  /* We're about to hide the function body from callees of final_* by
>>> +     emitting it directly; tell them we're a thunk, if they care.  */
>>> +  cfun->is_thunk = true;
>>> +  first_function_block_is_cold = false;
>>> +  /* Make sure unwind info is emitted for the thunk if needed.  */
>>> +  final_start_function (emit_barrier (), asm_out_file, 1);
>>> +
>>> +  output_indirect_thunk (need_bnd_p, regno);
>>> +
>>> +  final_end_function ();
>>> +  init_insn_lengths ();
>>> +  free_after_compilation (cfun);
>>> +  set_cfun (NULL);
>>> +  current_function_decl = NULL;
>>> +}
>>> +
>>
>> I'm wondering whether thunk creation can be a good target-independent generalization? I guess
>> we can emit the function declaration without direct writes to asm_out_file? And the emission
>> of function body can be potentially a target hook?
>>
>> What about emitting body of the function with RTL instructions instead of direct assembly write?
>> My knowledge of RTL is quite small, but maybe it can bring some generalization and reusability
>> for other targets?
> That's the key point I'm trying to make.  We should be looking at
> generalizing this stuff where it makes sense.
>

Thunks are x86 specific and they are created the same way as 32-bit PIC thunks.
I don't see how a target hook would be used.

-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-09 18:54     ` Jeff Law
@ 2018-01-09 19:26       ` H.J. Lu
  2018-01-10 10:44       ` Eric Botcazou
  1 sibling, 0 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-09 19:26 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

On Tue, Jan 9, 2018 at 10:52 AM, Jeff Law <law@redhat.com> wrote:
> On 01/07/2018 05:29 PM, H.J. Lu wrote:
>> On Sun, Jan 7, 2018 at 3:36 PM, Jeff Law <law@redhat.com> wrote:
>>> On 01/07/2018 03:58 PM, H.J. Lu wrote:
>>>> This set of patches for GCC 8 mitigates variant #2 of the speculative execution
>>>> vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre.  They
>>>> convert indirect branches to call and return thunks to avoid speculative execution
>>>> via indirect call and jmp.
>>>>
>>>>
>>>> H.J. Lu (5):
>>>>   x86: Add -mindirect-branch=
>>>>   x86: Add -mindirect-branch-loop=
>>>>   x86: Add -mfunction-return=
>>>>   x86: Add -mindirect-branch-register
>>>>   x86: Add 'V' register operand modifier
>>>>
>>>>  gcc/config/i386/constraints.md                     |  12 +-
>>>>  gcc/config/i386/i386-opts.h                        |  14 +
>>>>  gcc/config/i386/i386-protos.h                      |   2 +
>>>>  gcc/config/i386/i386.c                             | 655 ++++++++++++++++++++-
>>>>  gcc/config/i386/i386.h                             |  10 +
>>>>  gcc/config/i386/i386.md                            |  51 +-
>>>>  gcc/config/i386/i386.opt                           |  45 ++
>>>>  gcc/config/i386/predicates.md                      |  21 +-
>>>>  gcc/doc/extend.texi                                |  22 +
>>>>  gcc/doc/invoke.texi                                |  37 +-
>>> My fundamental problem with this patchkit is that it is 100% x86/x86_64
>>> specific.
>>>
>>> ISTM we want a target independent mechanism (ie, new standard patterns,
>>> options, etc) then an x86/x86_64 implementation using that target
>>> independent framework (ie, the actual implementation of those new
>>> standard patterns).
>>>
>>
>> My patch set is implemented with some constraints:
>>
>> 1. They need to be backportable to GCC 7/6/5/4.x.
>> 2. They should work with all compiler optimizations.
>> 3. They need to generate code sequences that are x86 specific, which can't be
>> changed in any shape or form.  And the generated code is quite the opposite
>> of what a good optimizing compiler should generate.
>>
>> Given these conditions, I kept the existing indirect call, jump and
>> return patterns.
>> I generated different code sequences for these patterns during the final pass
>> when generating assembly code.
>>
>> I guess that I could add a late target independent RTL pass to convert
>> indirect call, jump and return patterns to something else.  But I am not sure
>> if that is what you are looking for.
> I don't see how those constraints are incompatible with doing most of
> this work at a higher level in the compiler.  You just surround the
> resulting RTL bits with appropriate barriers to prevent them from
> getting mucked up by the optimizers.  Actually I think you're going to
> need one barrier in the middle of the sequence to keep bbro at bay
> within the sequence as a whole.
>
> It's really just a couple of new primitives to emit a jump as a call and
> one to slam in a new return address.  Given those I think you can do the
> entire implementation as RTL at expansion time and you've got a damn
> good shot at protecting most architectures from these kinds of attacks.

Doing this at RTL expansion time may not work well with the RTL optimization
passes or with IRA/LRA.  We may wind up undoing all kinds of optimizations
as well as code sequences.


-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-09 18:54     ` Jeff Law
  2018-01-09 19:26       ` H.J. Lu
@ 2018-01-10 10:44       ` Eric Botcazou
  2018-01-10 12:53         ` H.J. Lu
  2018-01-10 13:12         ` Richard Biener
  1 sibling, 2 replies; 135+ messages in thread
From: Eric Botcazou @ 2018-01-10 10:44 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, H.J. Lu

> It's really just a couple of new primitives to emit a jump as a call and
> one to slam in a new return address.  Given those I think you can do the
> entire implementation as RTL at expansion time and you've got a damn
> good shot at protecting most architectures from these kinds of attacks.

I think that you're a bit optimistic here and that implementing a generic and 
robust framework at the RTL level might require some time.  Given the time and 
(back-)portability constraints, it might be wiser to rush into
architecture-specific countermeasures than to rush into a half-baked RTL
framework.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-10 10:44       ` Eric Botcazou
@ 2018-01-10 12:53         ` H.J. Lu
  2018-01-10 13:12         ` Richard Biener
  1 sibling, 0 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-10 12:53 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Jeff Law, GCC Patches

On Wed, Jan 10, 2018 at 2:18 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> It's really just a couple of new primitives to emit a jump as a call and
>> one to slam in a new return address.  Given those I think you can do the
>> entire implementation as RTL at expansion time and you've got a damn
>> good shot at protecting most architectures from these kinds of attacks.
>
> I think that you're a bit optimistic here and that implementing a generic and
> robust framework at the RTL level might require some time.  Given the time and
> (back-)portability constraints, it might be wiser to rush into
> architecture-specific countermeasures than to rush into a half-baked RTL
> framework.
>

We have tried to implement this in target-independent IR with a different
compiler.  We run into a couple issues:

1. Some optimizations aren't performed since optimizers don't understand
our code sequences.
2. Some passes insert instructions between our code sequences, which
leads to invalid codes.

All of them can be resolved, given enough time.  I don't know how long
it will take to make a generic RTL approach as robust as the x86 backend
specific implementation, which just converts indirect branches and returns
to different, functionally equivalent code sequences.

-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-10 10:44       ` Eric Botcazou
  2018-01-10 12:53         ` H.J. Lu
@ 2018-01-10 13:12         ` Richard Biener
  2018-01-10 13:35           ` Jakub Jelinek
  1 sibling, 1 reply; 135+ messages in thread
From: Richard Biener @ 2018-01-10 13:12 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Jeff Law, GCC Patches, H.J. Lu

On Wed, Jan 10, 2018 at 11:18 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> It's really just a couple of new primitives to emit a jump as a call and
>> one to slam in a new return address.  Given those I think you can do the
>> entire implementation as RTL at expansion time and you've got a damn
>> good shot at protecting most architectures from these kinds of attacks.
>
> I think that you're a bit optimistic here and that implementing a generic and
> robust framework at the RTL level might require some time.  Given the time and
> (back-)portability constraints, it might be wiser to rush into architecture-
> specific countermeasures than to rush into a half-baked RTL framework.

Let me also say that while it might be nice to commonize code, introducing these
mitigations as late as possible to not disrupt optimization is important.  So I
don't see a very strong motivation in trying very hard to make this more
middle-endish, apart from maybe sharing helper functions where possible.

Richard.

> --
> Eric Botcazou

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-10 13:12         ` Richard Biener
@ 2018-01-10 13:35           ` Jakub Jelinek
  2018-01-10 13:55             ` H.J. Lu
  2018-01-11  1:30             ` Jeff Law
  0 siblings, 2 replies; 135+ messages in thread
From: Jakub Jelinek @ 2018-01-10 13:35 UTC (permalink / raw)
  To: Richard Biener; +Cc: Eric Botcazou, Jeff Law, GCC Patches, H.J. Lu

On Wed, Jan 10, 2018 at 02:08:48PM +0100, Richard Biener wrote:
> On Wed, Jan 10, 2018 at 11:18 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
> >> It's really just a couple of new primitives to emit a jump as a call and
> >> one to slam in a new return address.  Given those I think you can do the
> >> entire implementation as RTL at expansion time and you've got a damn
> >> good shot at protecting most architectures from these kinds of attacks.
> >
> > I think that you're a bit optimistic here and that implementing a generic and
> > robust framework at the RTL level might require some time.  Given the time and
> > (back-)portability constraints, it might be wiser to rush into architecture-
> > specific countermeasures than to rush into a half-baked RTL framework.
> 
> Let me also say that while it might be nice to commonize code introducing these
> mitigations as late as possible to not disrupt optimization is important.  So I
> don't see a very strong motivation in trying very hard to make this more
> middle-endish, apart from maybe sharing helper functions where possible.

That and perhaps a common option to handle the cases that are common to
multiple backends (i.e. move some options from -m* namespace to -f*).
I'd say the decision about the options and ABI of what we emit is more
important than where we actually emit it, we can easily change where we do
that over time, but not the options nor the ABI.

	Jakub

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-10 13:35           ` Jakub Jelinek
@ 2018-01-10 13:55             ` H.J. Lu
  2018-01-11 10:16               ` Richard Biener
  2018-01-11  1:30             ` Jeff Law
  1 sibling, 1 reply; 135+ messages in thread
From: H.J. Lu @ 2018-01-10 13:55 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Richard Biener, Eric Botcazou, Jeff Law, GCC Patches

On Wed, Jan 10, 2018 at 5:14 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Jan 10, 2018 at 02:08:48PM +0100, Richard Biener wrote:
>> On Wed, Jan 10, 2018 at 11:18 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> >> It's really just a couple of new primitives to emit a jump as a call and
>> >> one to slam in a new return address.  Given those I think you can do the
>> >> entire implementation as RTL at expansion time and you've got a damn
>> >> good shot at protecting most architectures from these kinds of attacks.
>> >
>> > I think that you're a bit optimistic here and that implementing a generic and
>> > robust framework at the RTL level might require some time.  Given the time and
>> > (back-)portability constraints, it might be wiser to rush into architecture-
>> > specific countermeasures than to rush into a half-baked RTL framework.
>>
>> Let me also say that while it might be nice to commonize code introducing these
>> mitigations as late as possible to not disrupt optimization is important.  So I
>> don't see a very strong motivation in trying very hard to make this more
>> middle-endish, apart from maybe sharing helper functions where possible.
>
> That and perhaps a common option to handle the cases that are common to
> multiple backends (i.e. move some options from -m* namespace to -f*).
> I'd say the decision about the options and ABI of what we emit is more
> important than where we actually emit it, we can easily change where we do
> that over time, but not the options nor the ABI.
>

My x86 mitigations are specific to x86 processors.  I don't know if
these options are relevant to other processors.  However, it is good to
have a common option to enable mitigations, which can be built on top of
processor-specific options.  For example, -fmitigate-spectre may simply imply

-mindirect-branch=thunk -mindirect-branch-register

For kernel, they may want to use

-mindirect-branch=thunk-extern -mindirect-branch-register

instead.
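
A minimal sketch of that mapping in C, for illustration only.  The
-fmitigate-spectre name is just the one floated above, not an existing GCC
option, and expand_umbrella and its for_kernel parameter are hypothetical
names, not GCC internals:

```c
#include <string.h>

/* Hypothetical helper, not part of GCC: expand the proposed umbrella
   option into the x86 options it would imply.  Stores the implied
   options in OUT and returns how many were stored (0 if OPT is not
   the umbrella option).  */
static int
expand_umbrella (const char *opt, int for_kernel, const char *out[2])
{
  if (strcmp (opt, "-fmitigate-spectre") != 0)
    return 0;

  /* The kernel supplies its own thunks, so it wants thunk-extern.  */
  out[0] = for_kernel ? "-mindirect-branch=thunk-extern"
                      : "-mindirect-branch=thunk";
  out[1] = "-mindirect-branch-register";
  return 2;
}
```

A driver would call this once per command-line option and append whatever
it returns; other targets could plug in their own expansions behind the
same umbrella name.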

-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-08 14:36   ` Alan Modra
  2018-01-08 15:04     ` H.J. Lu
@ 2018-01-11  0:19     ` Jeff Law
  2018-01-11 10:05       ` Alan Modra
  1 sibling, 1 reply; 135+ messages in thread
From: Jeff Law @ 2018-01-11  0:19 UTC (permalink / raw)
  To: Alan Modra; +Cc: H.J. Lu, gcc-patches

On 01/08/2018 07:23 AM, Alan Modra wrote:
> On Sun, Jan 07, 2018 at 04:36:20PM -0700, Jeff Law wrote:
>> On 01/07/2018 03:58 PM, H.J. Lu wrote:
>>> This set of patches for GCC 8 mitigates variant #2 of the speculative execution
>>> vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre.
> [snip]
>> My fundamental problem with this patchkit is that it is 100% x86/x86_64
>> specific.
> 
> It's possible that x86 needs spectre variant 2 mitigation that isn't
> necessary on other modern processors like ARM and PowerPC, so let's
> not rush into general solutions designed around x86..
From what I know about variant 2 mitigation it's going to be needed on a
variety of chip families, not just the Intel architecture.

However, I'm seeing signals that other chip vendors are looking towards
approaches that don't use retpolines.  So even though I think we could
build them fairly easily for most targets out of simple primitives, it may
not be the best use of our time.

> 
> Here's a quick overview of Spectre.  For more, see
> https://spectreattack.com/spectre.pdf
> https://googleprojectzero.blogspot.com.au/2018/01/reading-privileged-memory-with-side.html
> https://developer.arm.com/-/media/Files/pdf/Cache_Speculation_Side-channels.pdf
Yup.  Already familiar with this stuff :-)

> 
> However, x86 has the additional problem of variable length
> instructions.  Gadgets might be hiding in code when executed at an
> offset from the start of the "real" instructions.  Which is why x86 is
> more at risk from this attack than other processors, and why x86 needs
> something like the posted variant 2 mitigation, slowing down all
> indirect branches.
> 
True, but largely beside the point.   I'm not aware of anyone serious
looking at mating ROP with Spectre at this point, though it is certainly
possible.  The bad guys don't need to work that hard at this time.


Jeff

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-10 13:35           ` Jakub Jelinek
  2018-01-10 13:55             ` H.J. Lu
@ 2018-01-11  1:30             ` Jeff Law
  2018-01-11 10:16               ` Richard Biener
  1 sibling, 1 reply; 135+ messages in thread
From: Jeff Law @ 2018-01-11  1:30 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener; +Cc: Eric Botcazou, GCC Patches, H.J. Lu

On 01/10/2018 06:14 AM, Jakub Jelinek wrote:
> On Wed, Jan 10, 2018 at 02:08:48PM +0100, Richard Biener wrote:
>> On Wed, Jan 10, 2018 at 11:18 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>>>> It's really just a couple of new primitives to emit a jump as a call and
>>>> one to slam in a new return address.  Given those I think you can do the
>>>> entire implementation as RTL at expansion time and you've got a damn
>>>> good shot at protecting most architectures from these kinds of attacks.
>>>
>>> I think that you're a bit optimistic here and that implementing a generic and
>>> robust framework at the RTL level might require some time.  Given the time and
>>> (back-)portability constraints, it might be wiser to rush into architecture-
>>> specific countermeasures than to rush into a half-baked RTL framework.
>>
>> Let me also say that while it might be nice to commonize code introducing these
>> mitigations as late as possible to not disrupt optimization is important.  So I
>> don't see a very strong motivation in trying very hard to make this more
>> middle-endish, apart from maybe sharing helper functions where possible.
> 
> That and perhaps a common option to handle the cases that are common to
> multiple backends (i.e. move some options from -m* namespace to -f*).
> I'd say the decision about the options and ABI of what we emit is more
> important than where we actually emit it, we can easily change where we do
> that over time, but not the options nor the ABI.
From a UI standpoint, I think the decision has already been made as LLVM
has already thrown -mretpolines into their tree.   Sigh.

So I think the one thing we ought to seriously consider is at least
reserving -mretpoline for this style of mitigation of spectre v2.  Not all
targets have to implement this style of mitigation, but if they
do, they use -mretpoline.

Jeff

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-11  0:19     ` Jeff Law
@ 2018-01-11 10:05       ` Alan Modra
  2018-01-11 20:30         ` Jeff Law
  0 siblings, 1 reply; 135+ messages in thread
From: Alan Modra @ 2018-01-11 10:05 UTC (permalink / raw)
  To: Jeff Law; +Cc: H.J. Lu, gcc-patches

On Wed, Jan 10, 2018 at 05:13:36PM -0700, Jeff Law wrote:
> On 01/08/2018 07:23 AM, Alan Modra wrote:
> > On Sun, Jan 07, 2018 at 04:36:20PM -0700, Jeff Law wrote:
> >> On 01/07/2018 03:58 PM, H.J. Lu wrote:
> >>> This set of patches for GCC 8 mitigates variant #2 of the speculative execution
> >>> vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre.
> > [snip]
> >> My fundamental problem with this patchkit is that it is 100% x86/x86_64
> >> specific.
> > 
> > It's possible that x86 needs spectre variant 2 mitigation that isn't
> > necessary on other modern processors like ARM and PowerPC, so let's
> > not rush into general solutions designed around x86..
> From what I know about variant 2 mitigation it's going to be needed on a
> variety of chip families, not just the Intel architecture.

Yes.  I was thinking that it might be possible to ignore variant 2
attacks if there were no gadgets available anywhere in the victim
address space, which is true enough but difficult to achieve.  That
led me to think that indirect branches didn't matter, until someone
pointed out that the indirect branch attack could be chained.  If you
have the first part of a gadget, the read of interesting memory,
followed by an indirect branch, that indirect branch can be spoofed
into code that uses the interesting value in a way that affects cache
state.
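
To make the chained case concrete, here is a rough C sketch of the shape
I mean.  All names (probe, first_half, second_half, benign) are made up
for illustration, and the speculative behaviour is only described in the
comments; architecturally this is ordinary, harmless code:

```c
#include <stddef.h>

/* Hypothetical probe array: one cache line (assumed 64 bytes) per
   possible byte value.  */
static unsigned char probe[256 * 64];

/* Second half of the gadget: touches the cache line selected by VALUE,
   so VALUE could later be inferred from cache timing.  */
static int
second_half (unsigned char value)
{
  return probe[(size_t) value * 64];
}

/* The target the program actually intends to call.  */
static int
benign (unsigned char value)
{
  (void) value;
  return 0;
}

/* First half of the gadget: a read of interesting memory followed by
   an indirect branch.  Architecturally the call goes to HANDLER, but a
   mistrained branch predictor may speculatively send it to code such
   as second_half, with VALUE already loaded.  */
static int
first_half (const unsigned char *interesting, int (*handler) (unsigned char))
{
  unsigned char value = *interesting;
  return handler (value);
}
```

So the two halves do not need to sit next to each other in the victim;
the indirect branch is what joins them.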

-- 
Alan Modra
Australia Development Lab, IBM

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-11  1:30             ` Jeff Law
@ 2018-01-11 10:16               ` Richard Biener
  2018-01-11 13:41                 ` H.J. Lu
  2018-01-11 20:32                 ` Jeff Law
  0 siblings, 2 replies; 135+ messages in thread
From: Richard Biener @ 2018-01-11 10:16 UTC (permalink / raw)
  To: Jeff Law; +Cc: Jakub Jelinek, Eric Botcazou, GCC Patches, H.J. Lu

On Thu, Jan 11, 2018 at 1:18 AM, Jeff Law <law@redhat.com> wrote:
> On 01/10/2018 06:14 AM, Jakub Jelinek wrote:
>> On Wed, Jan 10, 2018 at 02:08:48PM +0100, Richard Biener wrote:
>>> On Wed, Jan 10, 2018 at 11:18 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>>>>> It's really just a couple of new primitives to emit a jump as a call and
>>>>> one to slam in a new return address.  Given those I think you can do the
>>>>> entire implementation as RTL at expansion time and you've got a damn
>>>>> good shot at protecting most architectures from these kinds of attacks.
>>>>
>>>> I think that you're a bit optimistic here and that implementing a generic and
>>>> robust framework at the RTL level might require some time.  Given the time and
>>>> (back-)portability constraints, it might be wiser to rush into architecture-
>>>> specific countermeasures than to rush into a half-baked RTL framework.
>>>
>>> Let me also say that while it might be nice to commonize code introducing these
>>> mitigations as late as possible to not disrupt optimization is important.  So I
>>> don't see a very strong motivation in trying very hard to make this more
>>> middle-endish, apart from maybe sharing helper functions where possible.
>>
>> That and perhaps a common option to handle the cases that are common to
>> multiple backends (i.e. move some options from -m* namespace to -f*).
>> I'd say the decision about the options and ABI of what we emit is more
>> important than where we actually emit it, we can easily change where we do
>> that over time, but not the options nor the ABI.
> From a UI standpoint, I think the decision has already been made as LLVM
> has already thrown -mretpolines into their tree.   Sigh.

Well, given retpolines are largely kernel relevant right now we don't
need to care here.

> So I think the one thing we ought to seriously consider is at least
> reserving -mretpoline for this style of mitigation of spectre v2.  Not all
> targets have to implement this style of mitigation, but if they
> do, they use -mretpoline.

And I'd also like people not to bikeshed too much on this, given we're in
the situation of having exploitable kernels around for which we need a
cooperating compiler.  So during the time we bikeshed this (rather than
reviewing the actual patches) we have to "backport" the current
non-upstream state anyway to deliver fixed kernels to our customers.

Yes, if this were a "normal feature" we could continue discussing and
trying to design something nice and shiny.  But this isn't a normal feature.

So please - I'd also like to get this into a released compiler (aka 7.3)
as soon as possible (given an RC for 7.3 was scheduled for early this week).

Thanks,
Richard.

>
> Jeff

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-10 13:55             ` H.J. Lu
@ 2018-01-11 10:16               ` Richard Biener
  0 siblings, 0 replies; 135+ messages in thread
From: Richard Biener @ 2018-01-11 10:16 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Jakub Jelinek, Eric Botcazou, Jeff Law, GCC Patches

On Wed, Jan 10, 2018 at 2:52 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Wed, Jan 10, 2018 at 5:14 AM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Wed, Jan 10, 2018 at 02:08:48PM +0100, Richard Biener wrote:
>>> On Wed, Jan 10, 2018 at 11:18 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>>> >> It's really just a couple of new primitives to emit a jump as a call and
>>> >> one to slam in a new return address.  Given those I think you can do the
>>> >> entire implementation as RTL at expansion time and you've got a damn
>>> >> good shot at protecting most architectures from these kinds of attacks.
>>> >
>>> > I think that you're a bit optimistic here and that implementing a generic and
>>> > robust framework at the RTL level might require some time.  Given the time and
>>> > (back-)portability constraints, it might be wiser to rush into architecture-
>>> > specific countermeasures than to rush into a half-baked RTL framework.
>>>
>>> Let me also say that while it might be nice to commonize code introducing these
>>> mitigations as late as possible to not disrupt optimization is important.  So I
>>> don't see a very strong motivation in trying very hard to make this more
>>> middle-endish, apart from maybe sharing helper functions where possible.
>>
>> That and perhaps a common option to handle the cases that are common to
>> multiple backends (i.e. move some options from -m* namespace to -f*).
>> I'd say the decision about the options and ABI of what we emit is more
>> important than where we actually emit it, we can easily change where we do
>> that over time, but not the options nor the ABI.
>>
>
> My x86 mitigations are specific to x86 processors.  I don't know if
> these options are relevant to other processors.  However, it is good to
> have a common option to enable mitigations, which can be built on top of
> processor-specific options.  For example, -fmitigate-spectre may simply imply
>
> -mindirect-branch=thunk -mindirect-branch-register
>
> For kernel, they may want to use
>
> -mindirect-branch=thunk-extern -mindirect-branch-register
>
> instead.

Yes, that's a good idea (common -fFOO umbrella option mapping to target bits).

And of course targets can follow x86 in their -mindirect-branch-foo
flag namings (if the semantics match).

Richard.

> --
> H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-08 11:40   ` H.J. Lu
@ 2018-01-11 13:28     ` H.J. Lu
  0 siblings, 0 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-11 13:28 UTC (permalink / raw)
  To: Sandra Loosemore; +Cc: GCC Patches

On Mon, Jan 8, 2018 at 3:37 AM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sun, Jan 7, 2018 at 8:07 PM, Sandra Loosemore
> <sandra@codesourcery.com> wrote:
>> On 01/07/2018 03:58 PM, H.J. Lu wrote:
>>>
>>> This set of patches for GCC 8 mitigates variant #2 of the speculative
>>> execution
>>> vulnerabilities on x86 processors identified by CVE-2017-5715, aka
>>> Spectre.  They
>>> convert indirect branches to call and return thunks to avoid speculative
>>> execution
>>> via indirect call and jmp.
>>
>>
>> I have a general documentation issue with all the new command-line options
>> and attributes added by this patch set:  the documentation is very
>> implementor-speaky and doesn't explain what user-level problem they're
>> trying to solve.
>
> Do you have any suggestions?
>
>> E.g. to take just one example
>>
>>> +@item function_return("@var{choice}")
>>> +@cindex @code{function_return} function attribute, x86
>>> +On x86 targets, the @code{function_return} attribute causes the compiler
>>> +to convert function return with @var{choice}.  @samp{keep} keeps function
>>> +return unmodified.  @samp{thunk} converts function return to call and
>>> +return thunk.  @samp{thunk-inline} converts function return to inlined
>>> +call and return thunk.  @samp{thunk-extern} converts function return to
>>> +external call and return thunk provided in a separate object file.
>>
>>
>> Why would you want to mess with call and return code generation in this way?
>> The documentation doesn't say.
>>
>> For thunk-extern, is the programmer supposed to provide the thunk?  How
>> would you go about implementing the missing bit of code?  What should it do?
>> I'm compiler implementor and I wouldn't even know how to use this feature by
>> reading the manual; how would an ordinary application programmer who isn't
>> familiar with GCC internals know how to use it?

That is the best I can do.  My GCC documentation describes what my
patches generate.  The usage guidance should come from the Intel white paper.
After it is published, I will submit a GCC patch to refer to it.  It would
be very nice for the Intel white paper to recommend which compiler options
should be used.

> This option was requested by Linux kernel people.  The Linux kernel may
> choose different thunks at kernel load-time.  If a programmer doesn't know
> how to write an external thunk, he/she shouldn't use it.
>
>> If the goal here is to tell GCC to produce code that is protected against
>> the Spectre vulnerability, perhaps simplify this to adding just one option
>> that controls all the things you've given separate options and attributes
>> for?
>
> -mindirect-branch=thunk does the job.  Other options/choices are for
> fine tuning.
>
> Thanks.
>
> --
> H.J.



-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-11 10:16               ` Richard Biener
@ 2018-01-11 13:41                 ` H.J. Lu
  2018-01-12  8:07                   ` Uros Bizjak
  2018-01-11 20:32                 ` Jeff Law
  1 sibling, 1 reply; 135+ messages in thread
From: H.J. Lu @ 2018-01-11 13:41 UTC (permalink / raw)
  To: Richard Biener, Uros Bizjak
  Cc: Jeff Law, Jakub Jelinek, Eric Botcazou, GCC Patches

On Thu, Jan 11, 2018 at 2:16 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, Jan 11, 2018 at 1:18 AM, Jeff Law <law@redhat.com> wrote:
>> On 01/10/2018 06:14 AM, Jakub Jelinek wrote:
>>> On Wed, Jan 10, 2018 at 02:08:48PM +0100, Richard Biener wrote:
>>>> On Wed, Jan 10, 2018 at 11:18 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>>>>>> It's really just a couple of new primitives to emit a jump as a call and
>>>>>> one to slam in a new return address.  Given those I think you can do the
>>>>>> entire implementation as RTL at expansion time and you've got a damn
>>>>>> good shot at protecting most architectures from these kinds of attacks.
>>>>>
>>>>> I think that you're a bit optimistic here and that implementing a generic and
>>>>> robust framework at the RTL level might require some time.  Given the time and
>>>>> (back-)portability constraints, it might be wiser to rush into architecture-
>>>>> specific countermeasures than to rush into a half-baked RTL framework.
>>>>
>>>> Let me also say that while it might be nice to commonize code introducing these
>>>> mitigations as late as possible to not disrupt optimization is important.  So I
>>>> don't see a very strong motivation in trying very hard to make this more
>>>> middle-endish, apart from maybe sharing helper functions where possible.
>>>
>>> That and perhaps a common option to handle the cases that are common to
>>> multiple backends (i.e. move some options from -m* namespace to -f*).
>>> I'd say the decision about the options and ABI of what we emit is more
>>> important than where we actually emit it, we can easily change where we do
>>> that over time, but not the options nor the ABI.
>> From a UI standpoint, I think the decision has already been made as LLVM
>> has already thrown -mretpolines into their tree.   Sigh.
>
> Well, given retpolines are largely kernel relevant right now we don't
> need to care here.
>
>> So I think the one thing we ought to seriously consider is at least
>> reserving -mretpoline for this style of mitigation of spectre v2.  Not all
>> targets have to implement this style of mitigation, but if they
>> do, they use -mretpoline.
>
> And I'd also like people not to bikeshed too much on this, given we're in
> the situation of having exploitable kernels around for which we need a
> cooperating compiler.  So during the time we bikeshed this (rather than
> reviewing the actual patches) we have to "backport" the current
> non-upstream state anyway to deliver fixed kernels to our customers.
>
> Yes, if this were a "normal feature" we could continue discussing and
> trying to design something nice and shiny.  But this isn't a normal feature.
>
> So please - I'd also like to get this into a released compiler (aka 7.3)
> as soon as possible (given an RC for 7.3 was scheduled for early this week).

Hi Uros,

Can you take a look at my x86 backend changes so that they are ready
to check in once we have consensus?

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-11 10:05       ` Alan Modra
@ 2018-01-11 20:30         ` Jeff Law
  0 siblings, 0 replies; 135+ messages in thread
From: Jeff Law @ 2018-01-11 20:30 UTC (permalink / raw)
  To: Alan Modra; +Cc: H.J. Lu, gcc-patches

On 01/11/2018 03:04 AM, Alan Modra wrote:
> On Wed, Jan 10, 2018 at 05:13:36PM -0700, Jeff Law wrote:
>> On 01/08/2018 07:23 AM, Alan Modra wrote:
>>> On Sun, Jan 07, 2018 at 04:36:20PM -0700, Jeff Law wrote:
>>>> On 01/07/2018 03:58 PM, H.J. Lu wrote:
>>>>> This set of patches for GCC 8 mitigates variant #2 of the speculative execution
>>>>> vulnerabilities on x86 processors identified by CVE-2017-5715, aka Spectre.
>>> [snip]
>>>> My fundamental problem with this patchkit is that it is 100% x86/x86_64
>>>> specific.
>>>
>>> It's possible that x86 needs spectre variant 2 mitigation that isn't
>>> necessary on other modern processors like ARM and PowerPC, so let's
>>> not rush into general solutions designed around x86..
>> From what I know about variant 2 mitigation it's going to be needed on a
>> variety of chip families, not just the Intel architecture.
> 
> Yes.  I was thinking that it might be possible to ignore variant 2
> attacks if there were no gadgets available anywhere in the victim
> address space, which is true enough but difficult to achieve.  That
> led me to think that indirect branches didn't matter, until someone
> pointed out that the indirect branch attack could be chained.  If you
> have the first part of a gadget, the read of interesting memory,
> followed by an indirect branch, that indirect branch can be spoofed
> into code that uses the interesting value in a way that affects cache
> state.
Note that even though variant 2 potentially applies to many
architectures, and I believe we could do a fairly generic retpoline
implementation that would provide a high level of protection across many
targets, that may not be the best choice.

As I mentioned in a message last night, some of the processor vendors
seem to want to go a different direction.  I'll summarize those as "add
this magic instruction to the indirect call/jump sequence".  They're
typically just a single insn and fairly trivial to implement.  They may
require using a different branch instruction as well.  But again,
trivial to implement.

Given that divergence I think we really want to look closely at HJ's
code for x86/x86_64 and let the other processor vendors work towards
their own mitigations -- ultimately with everyone doing their own thing
in their target files :(

Jeff

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-11 10:16               ` Richard Biener
  2018-01-11 13:41                 ` H.J. Lu
@ 2018-01-11 20:32                 ` Jeff Law
  2018-01-11 23:45                   ` Joseph Myers
  1 sibling, 1 reply; 135+ messages in thread
From: Jeff Law @ 2018-01-11 20:32 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jakub Jelinek, Eric Botcazou, GCC Patches, H.J. Lu

On 01/11/2018 03:16 AM, Richard Biener wrote:
> On Thu, Jan 11, 2018 at 1:18 AM, Jeff Law <law@redhat.com> wrote:
>> On 01/10/2018 06:14 AM, Jakub Jelinek wrote:
>>> On Wed, Jan 10, 2018 at 02:08:48PM +0100, Richard Biener wrote:
>>>> On Wed, Jan 10, 2018 at 11:18 AM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>>>>>> It's really just a couple of new primitives to emit a jump as a call and
>>>>>> one to slam in a new return address.  Given those I think you can do the
>>>>>> entire implementation as RTL at expansion time and you've got a damn
>>>>>> good shot at protecting most architectures from these kinds of attacks.
>>>>>
>>>>> I think that you're a bit optimistic here and that implementing a generic and
>>>>> robust framework at the RTL level might require some time.  Given the time and
>>>>> (back-)portability constraints, it might be wiser to rush into architecture-
>>>>> specific countermeasures than to rush into a half-baked RTL framework.
>>>>
>>>> Let me also say that while it might be nice to commonize code introducing these
>>>> mitigations as late as possible to not disrupt optimization is important.  So I
>>>> don't see a very strong motivation in trying very hard to make this more
>>>> middle-endish, apart from maybe sharing helper functions where possible.
>>>
>>> That and perhaps a common option to handle the cases that are common to
>>> multiple backends (i.e. move some options from -m* namespace to -f*).
>>> I'd say the decision about the options and ABI of what we emit is more
>>> important than where we actually emit it, we can easily change where we do
>>> that over time, but not the options nor the ABI.
>> From a UI standpoint, I think the decision has already been made as LLVM
>> has already thrown -mretpolines into their tree.   Sigh.
> 
> Well, given retpolines are largely kernel relevant right now we don't
> need to care here.
That's still TBD as far as I can tell.  I certainly hope we don't have
to go retpolines in user space, at least not in the general case.  I'm
holding out hope that the kernel folks are going to save the day.


> 
>> So I think the one thing we ought to seriously consider is at least
>> reserving -mretpoline for this style of mitigation of spectre v2.  Not all
>> targets have to implement this style of mitigation, but if they
>> do, they use -mretpoline.
> 
> And I'd also like people not to bikeshed too much on this, given we're in
> the situation of having exploitable kernels around for which we need a
> cooperating compiler.  So during the time we bikeshed this (rather than
> reviewing the actual patches) we have to "backport" the current
> non-upstream state anyway to deliver fixed kernels to our customers.
Believe me, after dealing with stack clash, I'm fully aware of the
constraints folks dealing with mitigation are under.

Jeff

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-08 21:00   ` David Woodhouse
@ 2018-01-11 21:19     ` Florian Weimer
  2018-01-11 21:42       ` David Woodhouse
  0 siblings, 1 reply; 135+ messages in thread
From: Florian Weimer @ 2018-01-11 21:19 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Jeff Law, H.J. Lu, gcc-patches

* David Woodhouse:

> On Sun, 2018-01-07 at 16:36 -0700, Jeff Law wrote:
>> 
>> My fundamental problem with this patchkit is that it is 100% x86/x86_64
>> specific.
>> 
>> ISTM we want a target independent mechanism (ie, new standard patterns,
>> options, etc) then an x86/x86_64 implementation using that target
>> independent framework (ie, the actual implementation of those new
>> standard patterns).
>
> From the kernel point of view, I'm not too worried about GCC internal
> implementation details. What would be really useful to agree in short
> order is the command-line options that invoke this behaviour, and the
> ABI for the thunks.

Do you assume that you will eventually apply run-time patching to
thunks (in case they aren't needed)?

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-11 21:19     ` Florian Weimer
@ 2018-01-11 21:42       ` David Woodhouse
  2018-01-13 12:17         ` Florian Weimer
  0 siblings, 1 reply; 135+ messages in thread
From: David Woodhouse @ 2018-01-11 21:42 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Jeff Law, H.J. Lu, gcc-patches

On Thu, 2018-01-11 at 22:17 +0100, Florian Weimer wrote:
> * David Woodhouse:
> 
> > 
> > On Sun, 2018-01-07 at 16:36 -0700, Jeff Law wrote:
> > > 
> > > 
> > > My fundamental problem with this patchkit is that it is 100% x86/x86_64
> > > specific.
> > > 
> > > ISTM we want a target independent mechanism (ie, new standard patterns,
> > > options, etc) then an x86/x86_64 implementation using that target
> > > independent framework (ie, the actual implementation of those new
> > > standard patterns).
> > From the kernel point of view, I'm not too worried about GCC internal
> > implementation details. What would be really useful to agree in short
> > order is the command-line options that invoke this behaviour, and the
> > ABI for the thunks.
>
> Do you assume that you will eventually apply run-time patching to
> thunks (in case they aren't needed)?

"eventually"? We've been doing it for weeks. We are desperate to
release the kernel.... when can we have agreement on at *least* the
command line option and the name of the thunk? Internal implementation
details we really don't care about, but those we do.

http://git.infradead.org/users/dwmw2/linux-retpoline.git

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-11 21:49   ` Jeff Law
@ 2018-01-11 21:49     ` H.J. Lu
  2018-01-12 12:56     ` Martin Jambor
  1 sibling, 0 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-11 21:49 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

On Thu, Jan 11, 2018 at 1:42 PM, Jeff Law <law@redhat.com> wrote:
> On 01/07/2018 03:59 PM, H.J. Lu wrote:
>> Add -mindirect-branch-loop= option to control loop filler in call and
>> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>> as loop filler.  The default is 'lfence'.
>>
>> gcc/
>>
>>       * config/i386/i386-opts.h (indirect_branch_loop): New.
>>       * config/i386/i386.c (output_indirect_thunk): Support
>>       -mindirect-branch-loop=.
>>       * config/i386/i386.opt (mindirect-branch-loop=): New option.
>>       (indirect_branch_loop): New.
>>       (lfence): Likewise.
>>       (pause): Likewise.
>>       (nop): Likewise.
>>       * doc/invoke.texi: Document -mindirect-branch-loop= option.
>>
>> gcc/testsuite/
>>
>>       * gcc.target/i386/indirect-thunk-loop-1.c: New test.
>>       * gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
> I think we should drop the ability to change the filler until such time
> as we really need it.  Just pick one and go with it.  I think David
> suggested that they wanted "pause".  I'm obviously fine with that.

I will hard code it to "pause".
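
For illustration only, here is a minimal standalone sketch of what a
register thunk with a fixed "pause" filler would look like as text.  It is
not the patch's output_indirect_thunk; the helper name
sketch_emit_reg_thunk and the choice to pass the register as a string are
assumptions made for this example, and only the 64-bit form from the patch
description is shown.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not GCC code: write the text of a 64-bit register
   thunk for REG into BUF (of size LEN), always using "pause" as the loop
   filler.  Returns the value of snprintf, i.e. the number of characters
   the full text needs, excluding the terminating NUL.  */
static int
sketch_emit_reg_thunk (char *buf, size_t len, const char *reg)
{
  return snprintf (buf, len,
                   "__x86_indirect_thunk_%s:\n"
                   "\tcall\tL2\n"
                   "L1:\n"
                   "\tpause\n"
                   "\tjmp\tL1\n"
                   "L2:\n"
                   "\tmovq\t%%%s, (%%rsp)\n"
                   "\tret\n",
                   reg, reg);
}
```

Calling sketch_emit_reg_thunk (buf, sizeof buf, "rax") fills buf with the
same layout as the register thunk in the patch description, with the
filler instruction no longer selectable.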

-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-07 22:59 ` [PATCH 2/5] x86: Add -mindirect-branch-loop= H.J. Lu
  2018-01-08  8:23   ` Florian Weimer
@ 2018-01-11 21:49   ` Jeff Law
  2018-01-11 21:49     ` H.J. Lu
  2018-01-12 12:56     ` Martin Jambor
  1 sibling, 2 replies; 135+ messages in thread
From: Jeff Law @ 2018-01-11 21:49 UTC (permalink / raw)
  To: H.J. Lu, gcc-patches

On 01/07/2018 03:59 PM, H.J. Lu wrote:
> Add -mindirect-branch-loop= option to control loop filler in call and
> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
> as loop filler.  The default is 'lfence'.
> 
> gcc/
> 
> 	* config/i386/i386-opts.h (indirect_branch_loop): New.
> 	* config/i386/i386.c (output_indirect_thunk): Support
> 	-mindirect-branch-loop=.
> 	* config/i386/i386.opt (mindirect-branch-loop=): New option.
> 	(indirect_branch_loop): New.
> 	(lfence): Likewise.
> 	(pause): Likewise.
> 	(nop): Likewise.
> 	* doc/invoke.texi: Document -mindirect-branch-loop= option.
> 
> gcc/testsuite/
> 
> 	* gcc.target/i386/indirect-thunk-loop-1.c: New test.
> 	* gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
I think we should drop the ability to change the filler until such time
as we really need it.  Just pick one and go with it.  I think David
suggested that they wanted "pause".  I'm obviously fine with that.


jeff

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-07 22:59 ` [PATCH 1/5] x86: Add -mindirect-branch= H.J. Lu
  2018-01-08 10:56   ` Martin Liška
@ 2018-01-11 22:54   ` Jeff Law
  2018-01-11 23:03     ` H.J. Lu
  2018-01-11 23:09     ` Jakub Jelinek
  1 sibling, 2 replies; 135+ messages in thread
From: Jeff Law @ 2018-01-11 22:54 UTC (permalink / raw)
  To: H.J. Lu, gcc-patches

On 01/07/2018 03:59 PM, H.J. Lu wrote:
> Add -mindirect-branch= option to convert indirect call and jump to call
> and return thunks.  The default is 'keep', which keeps indirect call and
> jump unmodified.  'thunk' converts indirect call and jump to call and
> return thunk.  'thunk-inline' converts indirect call and jump to inlined
> call and return thunk.  'thunk-extern' converts indirect call and jump to
> external call and return thunk provided in a separate object file.  You
> can control this behavior for a specific function by using the function
> attribute indirect_branch.
> 
> 2 kinds of thunks are generated.  Memory thunk where the function address
> is at the top of the stack:
> 
> __x86_indirect_thunk:
> 	call L2
> L1:
> 	lfence
> 	jmp L1
> L2:
> 	lea 8(%rsp), %rsp|lea 4(%esp), %esp
> 	ret
> 
> Indirect jmp via memory, "jmp mem", is converted to
> 
> 	push memory
> 	jmp __x86_indirect_thunk
> 
> Indirect call via memory, "call mem", is converted to
> 
> 	jmp L2
> L1:
> 	push [mem]
> 	jmp __x86_indirect_thunk
> L2:
> 	call L1
> 
> Register thunk where the function address is in a register, reg:
> 
> __x86_indirect_thunk_reg:
> 	call	L2
> L1:
> 	lfence
> 	jmp	L1
> L2:
> 	movq	%reg, (%rsp)|movl    %reg, (%esp)
> 	ret
> 
> where reg is one of (r|e)ax, (r|e)dx, (r|e)cx, (r|e)bx, (r|e)si, (r|e)di,
> (r|e)bp, r8, r9, r10, r11, r12, r13, r14 and r15.
> 
> Indirect jmp via register, "jmp reg", is converted to
> 
> 	jmp __x86_indirect_thunk_reg
> 
> Indirect call via register, "call reg", is converted to
> 
> 	call __x86_indirect_thunk_reg
> 
> gcc/
> 
> 	* config/i386/i386-opts.h (indirect_branch): New.
> 	* config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise.
> 	* config/i386/i386.c (ix86_using_red_zone): Disallow red-zone
> 	with local indirect jump when converting indirect call and jump.
> 	(ix86_set_indirect_branch_type): New.
> 	(ix86_set_current_function): Call ix86_set_indirect_branch_type.
> 	(indirectlabelno): New.
> 	(indirect_thunk_needed): Likewise.
> 	(indirect_thunk_bnd_needed): Likewise.
> 	(indirect_thunks_used): Likewise.
> 	(indirect_thunks_bnd_used): Likewise.
> 	(INDIRECT_LABEL): Likewise.
> 	(indirect_thunk_name): Likewise.
> 	(output_indirect_thunk): Likewise.
> 	(output_indirect_thunk_function): Likewise.
> 	(ix86_output_indirect_branch): Likewise.
> 	(ix86_output_indirect_jmp): Likewise.
> 	(ix86_code_end): Call output_indirect_thunk_function if needed.
> 	(ix86_output_call_insn): Call ix86_output_indirect_branch if
> 	needed.
> 	(ix86_handle_fndecl_attribute): Handle indirect_branch.
> 	(ix86_attribute_table): Add indirect_branch.
> 	* config/i386/i386.h (machine_function): Add indirect_branch_type
> 	and has_local_indirect_jump.
> 	* config/i386/i386.md (indirect_jump): Set has_local_indirect_jump
> 	to true.
> 	(tablejump): Likewise.
> 	(*indirect_jump): Use ix86_output_indirect_jmp.
> 	(*tablejump_1): Likewise.
> 	(simple_return_indirect_internal): Likewise.
> 	* config/i386/i386.opt (mindirect-branch=): New option.
> 	(indirect_branch): New.
> 	(keep): Likewise.
> 	(thunk): Likewise.
> 	(thunk-inline): Likewise.
> 	(thunk-extern): Likewise.
> 	* doc/extend.texi: Document indirect_branch function attribute.
> 	* doc/invoke.texi: Document -mindirect-branch= option.
Note I'm expecting Uros to chime in.  So please do not consider this
ack'd until you hear from Uros.

At a high level, is there really that much value in having thunks in the
object file?  Why not put the full set of thunks into libgcc and just
allow selection between inline sequences and external thunks
(thunk-inline and thunk-extern)?  It's not a huge simplification, but
if there isn't a compelling reason, let's drop the in-object-file thunks.

> +
> +/* Fills in the label name that should be used for the indirect thunk.  */
> +
> +static void
> +indirect_thunk_name (char name[32], int regno, bool need_bnd_p)
Please document each argument in the function's comment.
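
As an illustration of the comment style being asked for, here is a minimal
standalone sketch.  It is not the patch's function: the name
sketch_indirect_thunk_name is made up, the register is passed as a string
rather than as a register number, and both the "_bnd" infix and the
reading of need_bnd_p as "the thunk must preserve MPX bound registers" are
assumptions.

```c
#include <stdbool.h>
#include <stdio.h>

/* Fill NAME with the label to use for an indirect thunk.

   NAME is the output buffer; it must hold at least 32 bytes.
   REG_NAME is the register that carries the branch target (for example
   "rax"), or NULL for the memory thunk, whose target is on the stack.
   NEED_BND_P is true if the thunk must preserve the bound registers
   (assumed meaning), in which case a "_bnd" infix is added.  */
static void
sketch_indirect_thunk_name (char name[32], const char *reg_name,
                            bool need_bnd_p)
{
  const char *bnd = need_bnd_p ? "_bnd" : "";

  if (reg_name)
    snprintf (name, 32, "__x86_indirect_thunk%s_%s", bnd, reg_name);
  else
    snprintf (name, 32, "__x86_indirect_thunk%s", bnd);
}
```

The point is only that every parameter is named in capitals in the comment
and its role and valid values are stated.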



> +
> +static void
> +output_indirect_thunk (bool need_bnd_p, int regno)
Needs a function comment.



> +
> +static void
> +output_indirect_thunk_function (bool need_bnd_p, int regno)
Needs a function comment.



> @@ -28119,12 +28357,182 @@ ix86_nopic_noplt_attribute_p (rtx call_op)
>    return false;
>  }
>  
> +static void
> +ix86_output_indirect_branch (rtx call_op, const char *xasm,
> +			     bool sibcall_p)
Needs a function comment.


I'd probably break this into a few smaller functions.  It's a lot of
inlined code.







> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 3f587806407..a7573c468ae 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -12313,12 +12313,13 @@
>  {
>    if (TARGET_X32)
>      operands[0] = convert_memory_address (word_mode, operands[0]);
> +  cfun->machine->has_local_indirect_jump = true;
Note this is not ideal in that it's set at expansion time and thus would
not be accurate if the RTL optimizers were able to simplify things enough
such that the indirect jump has a known target.

But I wouldn't expect that to happen much in the RTL optimizers, as
the gimple optimizers are likely much better at doing that kind of
thing.  So I won't object to doing things this way as long as they
gracefully handle this case.
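
To make "local indirect jump" concrete, here is a small sketch of source
code that contains one.  It uses the GNU C labels-as-values extension; the
function name sketch_dispatch is made up, and the claim that such a
computed goto is expanded through the indirect_jump pattern is my
understanding rather than something taken from the patch.

```c
/* Illustration only: a function with a local indirect jump.  All possible
   targets of the jump are labels inside this same function.  */
static int
sketch_dispatch (int op, int x)
{
  /* GNU C extension: &&label yields the address of a label.  */
  static void *const targets[] = { &&do_inc, &&do_dec, &&do_neg };

  if (op < 0 || op > 2)
    return x;

  goto *targets[op];    /* Indirect jump through a table of labels.  */

do_inc:
  return x + 1;
do_dec:
  return x - 1;
do_neg:
  return -x;
}
```

If op is a compile-time constant at the call site, the optimizers may
resolve the jump to a direct one, which is the case where a flag set at
expansion time would be stale.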


> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index 09aaa97c2fc..22c806206e4 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1021,3 +1021,23 @@ indirect jump.
>  mforce-indirect-call
>  Target Report Var(flag_force_indirect_call) Init(0)
>  Make all function calls indirect.
> +
> +mindirect-branch=
> +Target Report RejectNegative Joined Enum(indirect_branch) Var(ix86_indirect_branch) Init(indirect_branch_keep)
> +Convert indirect call and jump.
Convert to what?  I realize there's an enum of the choices below, but
this doesn't read well.

Do you want to mention that CET and retpolines are inherently
incompatible?  Should an attempt to use them together generate a
compile-time error?
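
A compile-time check along these lines could be sketched as follows.  This
is a standalone illustration, not GCC code: the enum, the function name
sketch_check_cet_conflict, the cet_enabled_p parameter and the message
text are all hypothetical, and which settings should actually be rejected
is left to the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the option values named in the patch description.  */
enum sketch_indirect_branch
{
  SKETCH_KEEP,
  SKETCH_THUNK,
  SKETCH_THUNK_INLINE,
  SKETCH_THUNK_EXTERN
};

/* Hypothetical check.  Return a diagnostic string if CET is enabled
   (CET_ENABLED_P) together with a BRANCH setting that converts indirect
   branches to thunks, or NULL if the combination is acceptable.  */
static const char *
sketch_check_cet_conflict (bool cet_enabled_p,
                           enum sketch_indirect_branch branch)
{
  if (cet_enabled_p && branch != SKETCH_KEEP)
    return "-mindirect-branch= thunks are not compatible with CET";
  return NULL;
}
```

In the compiler the non-NULL case would be turned into an error at option
processing time instead of being returned as a string.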

Jeff

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 3/5] x86: Add -mfunction-return=
  2018-01-07 22:59 ` [PATCH 3/5] x86: Add -mfunction-return= H.J. Lu
  2018-01-08 10:01   ` Martin Liška
@ 2018-01-11 23:00   ` Jeff Law
  2018-01-11 23:07     ` H.J. Lu
  1 sibling, 1 reply; 135+ messages in thread
From: Jeff Law @ 2018-01-11 23:00 UTC (permalink / raw)
  To: H.J. Lu, gcc-patches

On 01/07/2018 03:59 PM, H.J. Lu wrote:
> Add -mfunction-return= option to convert function return to call and
> return thunks.  The default is 'keep', which keeps function return
> unmodified.  'thunk' converts function return to call and return thunk.
> 'thunk-inline' converts function return to inlined call and return
> thunk.  'thunk-extern' converts function return to external call and
> return thunk provided in a separate object file.  You can control this
> behavior for a specific function by using the function attribute
> function_return.
> 
> Function return thunk is the same as memory thunk for -mindirect-branch=
> where the return address is at the top of the stack:
> 
> __x86_return_thunk:
> 	call L2
> L1:
> 	lfence
> 	jmp L1
> L2:
> 	lea 8(%rsp), %rsp|lea 4(%esp), %esp
> 	ret
> 
> and function return becomes
> 
> 	jmp __x86_return_thunk
> 
> -mindirect-branch= tests are updated with -mfunction-return=keep to
> avoid false test failures when -mfunction-return=lfence is added to
> RUNTESTFLAGS for "make check".
> 
> gcc/
> 
> 	* config/i386/i386-protos.h (ix86_output_function_return): New.
> 	* config/i386/i386.c (ix86_set_indirect_branch_type): Also
> 	set function_return_type.
> 	(indirect_thunk_name): Add ret_p to indicate thunk for function
> 	return.
> 	(output_indirect_thunk_function): Pass false to
> 	indirect_thunk_name.
> 	(ix86_output_indirect_branch): Likewise.
> 	(output_indirect_thunk_function): Create alias for function
> 	return thunk if regno < 0.
> 	(ix86_output_function_return): New function.
> 	(ix86_handle_fndecl_attribute): Handle function_return.
> 	(ix86_attribute_table): Add function_return.
> 	* config/i386/i386.h (machine_function): Add
> 	function_return_type.
> 	* config/i386/i386.md (simple_return_internal): Use
> 	ix86_output_function_return.
> 	(simple_return_internal_long): Likewise.
> 	* config/i386/i386.opt (mfunction-return=): New option.
> 	(indirect_branch): Mention -mfunction-return=.
> 	* doc/extend.texi: Document function_return function attribute.
> 	* doc/invoke.texi: Document -mfunction-return= option.
> 
> gcc/testsuite/
> 
> 	* gcc.target/i386/indirect-thunk-1.c (dg-options): Add
> 	-mfunction-return=keep.
> 	* gcc.target/i386/indirect-thunk-2.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-3.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-4.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-5.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-6.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-7.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-attr-1.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-attr-2.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-attr-3.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-attr-4.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-attr-5.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-attr-6.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-attr-7.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-attr-8.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-bnd-1.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-bnd-2.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-bnd-3.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-bnd-4.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-extern-1.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-extern-2.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-extern-3.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-extern-4.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-extern-5.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-extern-6.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-extern-7.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-inline-1.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-inline-2.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-inline-3.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-inline-4.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-inline-5.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-inline-6.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-inline-7.c: Likewise.
> 	* gcc.target/i386/ret-thunk-1.c: New test.
> 	* gcc.target/i386/ret-thunk-10.c: Likewise.
> 	* gcc.target/i386/ret-thunk-11.c: Likewise.
> 	* gcc.target/i386/ret-thunk-12.c: Likewise.
> 	* gcc.target/i386/ret-thunk-13.c: Likewise.
> 	* gcc.target/i386/ret-thunk-14.c: Likewise.
> 	* gcc.target/i386/ret-thunk-15.c: Likewise.
> 	* gcc.target/i386/ret-thunk-16.c: Likewise.
> 	* gcc.target/i386/ret-thunk-2.c: Likewise.
> 	* gcc.target/i386/ret-thunk-3.c: Likewise.
> 	* gcc.target/i386/ret-thunk-4.c: Likewise.
> 	* gcc.target/i386/ret-thunk-5.c: Likewise.
> 	* gcc.target/i386/ret-thunk-6.c: Likewise.
> 	* gcc.target/i386/ret-thunk-7.c: Likewise.
> 	* gcc.target/i386/ret-thunk-8.c: Likewise.
> 	* gcc.target/i386/ret-thunk-9.c: Likewise.
Same high level comments apply here.  ie, expecting Uros to chime in and
thus final approval rests with Uros.  Is there significant value in
having return thunks in the object file?

Is this something that is currently used or requested by the kernel teams?

> @@ -28539,6 +28610,43 @@ ix86_output_indirect_jmp (rtx call_op, bool ret_p)
>      return "%!jmp\t%A0";
>  }
>  
> +const char *
> +ix86_output_function_return (bool long_p)
Needs function comment.


> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index f3e43da0aed..2b62363973d 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -1026,9 +1026,13 @@ mindirect-branch=
>  Target Report RejectNegative Joined Enum(indirect_branch) Var(ix86_indirect_branch) Init(indirect_branch_keep)
>  Convert indirect call and jump.
>  
> +mfunction-return=
> +Target Report RejectNegative Joined Enum(indirect_branch) Var(ix86_function_return) Init(indirect_branch_keep)
> +Convert function return.
Again, reads poorly.


^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-11 22:54   ` Jeff Law
@ 2018-01-11 23:03     ` H.J. Lu
  2018-01-11 23:09     ` Jakub Jelinek
  1 sibling, 0 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-11 23:03 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

On Thu, Jan 11, 2018 at 2:46 PM, Jeff Law <law@redhat.com> wrote:
> On 01/07/2018 03:59 PM, H.J. Lu wrote:
>> Add -mindirect-branch= option to convert indirect call and jump to call
>> and return thunks.  The default is 'keep', which keeps indirect call and
>> jump unmodified.  'thunk' converts indirect call and jump to call and
>> return thunk.  'thunk-inline' converts indirect call and jump to inlined
>> call and return thunk.  'thunk-extern' converts indirect call and jump to
>> external call and return thunk provided in a separate object file.  You
>> can control this behavior for a specific function by using the function
>> attribute indirect_branch.
>>
>> 2 kinds of thunks are generated.  Memory thunk where the function address
>> is at the top of the stack:
>>
>> __x86_indirect_thunk:
>>       call L2
>> L1:
>>       lfence
>>       jmp L1
>> L2:
>>       lea 8(%rsp), %rsp|lea 4(%esp), %esp
>>       ret
>>
>> Indirect jmp via memory, "jmp mem", is converted to
>>
>>       push memory
>>       jmp __x86_indirect_thunk
>>
>> Indirect call via memory, "call mem", is converted to
>>
>>       jmp L2
>> L1:
>>       push [mem]
>>       jmp __x86_indirect_thunk
>> L2:
>>       call L1
>>
>> Register thunk where the function address is in a register, reg:
>>
>> __x86_indirect_thunk_reg:
>>       call    L2
>> L1:
>>       lfence
>>       jmp     L1
>> L2:
>>       movq    %reg, (%rsp)|movl    %reg, (%esp)
>>       ret
>>
>> where reg is one of (r|e)ax, (r|e)dx, (r|e)cx, (r|e)bx, (r|e)si, (r|e)di,
>> (r|e)bp, r8, r9, r10, r11, r12, r13, r14 and r15.
>>
>> Indirect jmp via register, "jmp reg", is converted to
>>
>>       jmp __x86_indirect_thunk_reg
>>
>> Indirect call via register, "call reg", is converted to
>>
>>       call __x86_indirect_thunk_reg
>>
>> gcc/
>>
>>       * config/i386/i386-opts.h (indirect_branch): New.
>>       * config/i386/i386-protos.h (ix86_output_indirect_jmp): Likewise.
>>       * config/i386/i386.c (ix86_using_red_zone): Disallow red-zone
>>       with local indirect jump when converting indirect call and jump.
>>       (ix86_set_indirect_branch_type): New.
>>       (ix86_set_current_function): Call ix86_set_indirect_branch_type.
>>       (indirectlabelno): New.
>>       (indirect_thunk_needed): Likewise.
>>       (indirect_thunk_bnd_needed): Likewise.
>>       (indirect_thunks_used): Likewise.
>>       (indirect_thunks_bnd_used): Likewise.
>>       (INDIRECT_LABEL): Likewise.
>>       (indirect_thunk_name): Likewise.
>>       (output_indirect_thunk): Likewise.
>>       (output_indirect_thunk_function): Likewise.
>>       (ix86_output_indirect_branch): Likewise.
>>       (ix86_output_indirect_jmp): Likewise.
>>       (ix86_code_end): Call output_indirect_thunk_function if needed.
>>       (ix86_output_call_insn): Call ix86_output_indirect_branch if
>>       needed.
>>       (ix86_handle_fndecl_attribute): Handle indirect_branch.
>>       (ix86_attribute_table): Add indirect_branch.
>>       * config/i386/i386.h (machine_function): Add indirect_branch_type
>>       and has_local_indirect_jump.
>>       * config/i386/i386.md (indirect_jump): Set has_local_indirect_jump
>>       to true.
>>       (tablejump): Likewise.
>>       (*indirect_jump): Use ix86_output_indirect_jmp.
>>       (*tablejump_1): Likewise.
>>       (simple_return_indirect_internal): Likewise.
>>       * config/i386/i386.opt (mindirect-branch=): New option.
>>       (indirect_branch): New.
>>       (keep): Likewise.
>>       (thunk): Likewise.
>>       (thunk-inline): Likewise.
>>       (thunk-extern): Likewise.
>>       * doc/extend.texi: Document indirect_branch function attribute.
>>       * doc/invoke.texi: Document -mindirect-branch= option.
> Note I'm expecting Uros to chime in.  So please do not consider this
> ack'd until you hear from Uros.
>
> At a high level, is there really that much value in having thunks in the
> object file?  Why not put the full set of thunks into libgcc and just
> allow selection between inline sequences and external thunks
> (thunk-inline and thunk-extern)?  It's not a huge simplification, but
> if there isn't a compelling reason, let's drop the in-object-file thunks.

I prefer to leave it in the object file in case
-mindirect-branch-loop= is needed in the future.

>> +
>> +/* Fills in the label name that should be used for the indirect thunk.  */
>> +
>> +static void
>> +indirect_thunk_name (char name[32], int regno, bool need_bnd_p)
> Please document each argument in the function's comment.

Will do.

>
>
>> +
>> +static void
>> +output_indirect_thunk (bool need_bnd_p, int regno)
> Needs a function comment.

Will do.

>
>
>> +
>> +static void
>> +output_indirect_thunk_function (bool need_bnd_p, int regno)
> Needs a function comment.
>

Will do.

>
>> @@ -28119,12 +28357,182 @@ ix86_nopic_noplt_attribute_p (rtx call_op)
>>    return false;
>>  }
>>
>> +static void
>> +ix86_output_indirect_branch (rtx call_op, const char *xasm,
>> +                          bool sibcall_p)
> Needs a function comment.
>

Will do.

> I'd probably break this into a few smaller functions.  It's a lot of
> inlined code.
>

That function has 142 lines.  Unless there is a compelling need,
I prefer to leave it as is.

>
>
>
>
>
>> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
>> index 3f587806407..a7573c468ae 100644
>> --- a/gcc/config/i386/i386.md
>> +++ b/gcc/config/i386/i386.md
>> @@ -12313,12 +12313,13 @@
>>  {
>>    if (TARGET_X32)
>>      operands[0] = convert_memory_address (word_mode, operands[0]);
>> +  cfun->machine->has_local_indirect_jump = true;
> Note this is not ideal in that it's set at expansion time and thus would
> not be accurate if the RTL optimizers were able to simplify things enough
> such that the indirect jump has a known target.
>
> But I wouldn't expect that to happen much in the RTL optimizers, as
> the gimple optimizers are likely much better at doing that kind of
> thing.  So I won't object to doing things this way as long as they
> gracefully handle this case.
>
>
>> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
>> index 09aaa97c2fc..22c806206e4 100644
>> --- a/gcc/config/i386/i386.opt
>> +++ b/gcc/config/i386/i386.opt
>> @@ -1021,3 +1021,23 @@ indirect jump.
>>  mforce-indirect-call
>>  Target Report Var(flag_force_indirect_call) Init(0)
>>  Make all function calls indirect.
>> +
>> +mindirect-branch=
>> +Target Report RejectNegative Joined Enum(indirect_branch) Var(ix86_indirect_branch) Init(indirect_branch_keep)
>> +Convert indirect call and jump.
> Convert to what?  I realize there's an enum of the choices below, but
> this doesn't read well.

I will update.

> Do you want to mention that CET and retpolines are inherently
> incompatible?

I will document it.

> Should an attempt to use them together generate a
> compile-time error?
>

A compile-time error sounds like a good idea.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 3/5] x86: Add -mfunction-return=
  2018-01-11 23:00   ` Jeff Law
@ 2018-01-11 23:07     ` H.J. Lu
  0 siblings, 0 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-11 23:07 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches

On Thu, Jan 11, 2018 at 2:54 PM, Jeff Law <law@redhat.com> wrote:
> On 01/07/2018 03:59 PM, H.J. Lu wrote:
>> Add -mfunction-return= option to convert function return to call and
>> return thunks.  The default is 'keep', which keeps function return
>> unmodified.  'thunk' converts function return to call and return thunk.
>> 'thunk-inline' converts function return to inlined call and return
>> thunk.  'thunk-extern' converts function return to external call and
>> return thunk provided in a separate object file.  You can control this
>> behavior for a specific function by using the function attribute
>> function_return.
>>
>> Function return thunk is the same as memory thunk for -mindirect-branch=
>> where the return address is at the top of the stack:
>>
>> __x86_return_thunk:
>>       call L2
>> L1:
>>       lfence
>>       jmp L1
>> L2:
>>       lea 8(%rsp), %rsp|lea 4(%esp), %esp
>>       ret
>>
>> and function return becomes
>>
>>       jmp __x86_return_thunk
>>
>> -mindirect-branch= tests are updated with -mfunction-return=keep to
>> avoid false test failures when -mfunction-return=lfence is added to
>> RUNTESTFLAGS for "make check".
>>
>> gcc/
>>
>>       * config/i386/i386-protos.h (ix86_output_function_return): New.
>>       * config/i386/i386.c (ix86_set_indirect_branch_type): Also
>>       set function_return_type.
>>       (indirect_thunk_name): Add ret_p to indicate thunk for function
>>       return.
>>       (output_indirect_thunk_function): Pass false to
>>       indirect_thunk_name.
>>       (ix86_output_indirect_branch): Likewise.
>>       (output_indirect_thunk_function): Create alias for function
>>       return thunk if regno < 0.
>>       (ix86_output_function_return): New function.
>>       (ix86_handle_fndecl_attribute): Handle function_return.
>>       (ix86_attribute_table): Add function_return.
>>       * config/i386/i386.h (machine_function): Add
>>       function_return_type.
>>       * config/i386/i386.md (simple_return_internal): Use
>>       ix86_output_function_return.
>>       (simple_return_internal_long): Likewise.
>>       * config/i386/i386.opt (mfunction-return=): New option.
>>       (indirect_branch): Mention -mfunction-return=.
>>       * doc/extend.texi: Document function_return function attribute.
>>       * doc/invoke.texi: Document -mfunction-return= option.
>>
>> gcc/testsuite/
>>
>>       * gcc.target/i386/indirect-thunk-1.c (dg-options): Add
>>       -mfunction-return=keep.
>>       * gcc.target/i386/indirect-thunk-2.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-3.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-4.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-5.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-6.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-7.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-attr-1.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-attr-2.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-attr-3.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-attr-4.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-attr-5.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-attr-6.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-attr-7.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-attr-8.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-bnd-1.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-bnd-2.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-bnd-3.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-bnd-4.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-extern-1.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-extern-2.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-extern-3.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-extern-4.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-extern-5.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-extern-6.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-extern-7.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-inline-1.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-inline-2.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-inline-3.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-inline-4.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-inline-5.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-inline-6.c: Likewise.
>>       * gcc.target/i386/indirect-thunk-inline-7.c: Likewise.
>>       * gcc.target/i386/ret-thunk-1.c: New test.
>>       * gcc.target/i386/ret-thunk-10.c: Likewise.
>>       * gcc.target/i386/ret-thunk-11.c: Likewise.
>>       * gcc.target/i386/ret-thunk-12.c: Likewise.
>>       * gcc.target/i386/ret-thunk-13.c: Likewise.
>>       * gcc.target/i386/ret-thunk-14.c: Likewise.
>>       * gcc.target/i386/ret-thunk-15.c: Likewise.
>>       * gcc.target/i386/ret-thunk-16.c: Likewise.
>>       * gcc.target/i386/ret-thunk-2.c: Likewise.
>>       * gcc.target/i386/ret-thunk-3.c: Likewise.
>>       * gcc.target/i386/ret-thunk-4.c: Likewise.
>>       * gcc.target/i386/ret-thunk-5.c: Likewise.
>>       * gcc.target/i386/ret-thunk-6.c: Likewise.
>>       * gcc.target/i386/ret-thunk-7.c: Likewise.
>>       * gcc.target/i386/ret-thunk-8.c: Likewise.
>>       * gcc.target/i386/ret-thunk-9.c: Likewise.
> Same high level comments apply here.  ie, expecting Uros to chime in and
> thus final approval rests with Uros.  Is there significant value in
> having return thunks in the object file?

To better support -mindirect-branch-loop= if it is ever needed.

> Is this something that is currently used or requested by the kernel teams?

Yes, it is requested by the kernel teams.

>> @@ -28539,6 +28610,43 @@ ix86_output_indirect_jmp (rtx call_op, bool ret_p)
>>      return "%!jmp\t%A0";
>>  }
>>
>> +const char *
>> +ix86_output_function_return (bool long_p)
> Needs function comment.

Will do.

>
>> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
>> index f3e43da0aed..2b62363973d 100644
>> --- a/gcc/config/i386/i386.opt
>> +++ b/gcc/config/i386/i386.opt
>> @@ -1026,9 +1026,13 @@ mindirect-branch=
>>  Target Report RejectNegative Joined Enum(indirect_branch) Var(ix86_indirect_branch) Init(indirect_branch_keep)
>>  Convert indirect call and jump.
>>
>> +mfunction-return=
>> +Target Report RejectNegative Joined Enum(indirect_branch) Var(ix86_function_return) Init(indirect_branch_keep)
>> +Convert function return.
> Again, reads poorly.
>

Will update.


-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-11 22:54   ` Jeff Law
  2018-01-11 23:03     ` H.J. Lu
@ 2018-01-11 23:09     ` Jakub Jelinek
  2018-01-12 17:59       ` Jeff Law
  1 sibling, 1 reply; 135+ messages in thread
From: Jakub Jelinek @ 2018-01-11 23:09 UTC (permalink / raw)
  To: Jeff Law; +Cc: H.J. Lu, gcc-patches

On Thu, Jan 11, 2018 at 03:46:51PM -0700, Jeff Law wrote:
> Note I'm expecting Uros to chime in.  So please do not consider this
> ack'd until you hear from Uros.
> 
> At a high level is there really that much value in having thunks in the
> object file?  Why not put the full set of thunks into libgcc and just
> allow selection between inline sequences and external thunks
> (thunk-inline and thunk-external)?  It's not a huge simplification, but
> if there isn't a compelling reason, let's drop the in-object-file thunks.

Not everything is linked against libgcc.a; some things are linked against just
libgcc_s.so.1, other stuff against both, some libraries against neither.
It is probably undesirable to have the thunks at non-constant offsets from
the uses, as that would need text relocations.  Thunks emitted in the object
files, hidden and comdat merged between .o files like what we have, say, for
i686 PIC thunks, seem like the best default to me, along with a way for the
kernel to override that.

	Jakub

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 4/5] x86: Add -mindirect-branch-register
  2018-01-07 22:59 ` [PATCH 4/5] x86: Add -mindirect-branch-register H.J. Lu
@ 2018-01-11 23:11   ` Jeff Law
  0 siblings, 0 replies; 135+ messages in thread
From: Jeff Law @ 2018-01-11 23:11 UTC (permalink / raw)
  To: H.J. Lu, gcc-patches

On 01/07/2018 03:59 PM, H.J. Lu wrote:
> Add -mindirect-branch-register to force indirect branch via register.
> This is implemented by disabling patterns of indirect branch via memory,
> similar to TARGET_X32.
> 
> -mindirect-branch= and -mfunction-return= tests are updated with
> -mno-indirect-branch-register to avoid false test failures when
> -mindirect-branch-register is added to RUNTESTFLAGS for "make check".
> 
> gcc/
> 
> 	* config/i386/constraints.md (Bs): Disallow memory operand for
> 	-mindirect-branch-register.
> 	(Bw): Likewise.
> 	* config/i386/predicates.md (indirect_branch_operand): Likewise.
> 	(GOT_memory_operand): Likewise.
> 	(call_insn_operand): Likewise.
> 	(sibcall_insn_operand): Likewise.
> 	(GOT32_symbol_operand): Likewise.
> 	* config/i386/i386.md (indirect_jump): Call convert_memory_address
> 	for -mindirect-branch-register.
> 	(tablejump): Likewise.
> 	(*sibcall_memory): Likewise.
> 	(*sibcall_value_memory): Likewise.
> 	Disallow peepholes of indirect call and jump via memory for
> 	-mindirect-branch-register.
> 	(*call_pop): Replace m with Bw.
> 	(*call_value_pop): Likewise.
> 	(*sibcall_pop_memory): Replace m with Bs.
> 	* config/i386/i386.opt (mindirect-branch-register): New option.
> 	* doc/invoke.texi: Document -mindirect-branch-register option.
> 
> gcc/testsuite/
> 
> 	* gcc.target/i386/indirect-thunk-1.c (dg-options): Add
> 	-mno-indirect-branch-register.
> 	* gcc.target/i386/indirect-thunk-2.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-3.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-4.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-5.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-6.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-7.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-attr-1.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-attr-2.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-attr-3.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-attr-4.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-attr-5.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-attr-6.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-attr-7.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-bnd-1.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-bnd-2.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-bnd-3.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-bnd-4.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-extern-1.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-extern-2.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-extern-3.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-extern-4.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-extern-5.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-extern-6.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-extern-7.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-inline-1.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-inline-2.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-inline-3.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-inline-4.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-inline-5.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-inline-6.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-inline-7.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-loop-1.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
> 	* gcc.target/i386/ret-thunk-10.c: Likewise.
> 	* gcc.target/i386/ret-thunk-11.c: Likewise.
> 	* gcc.target/i386/ret-thunk-12.c: Likewise.
> 	* gcc.target/i386/ret-thunk-13.c: Likewise.
> 	* gcc.target/i386/ret-thunk-14.c: Likewise.
> 	* gcc.target/i386/ret-thunk-15.c: Likewise.
> 	* gcc.target/i386/ret-thunk-9.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-register-1.c: New test.
> 	* gcc.target/i386/indirect-thunk-register-2.c: Likewise.
> 	* gcc.target/i386/indirect-thunk-register-3.c: Likewise.
No comments from me on this.  Uros has final call.

jeff

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 5/5] x86: Add 'V' register operand modifier
  2018-01-07 22:59 ` [PATCH 5/5] x86: Add 'V' register operand modifier H.J. Lu
@ 2018-01-11 23:17   ` Jeff Law
  0 siblings, 0 replies; 135+ messages in thread
From: Jeff Law @ 2018-01-11 23:17 UTC (permalink / raw)
  To: H.J. Lu, gcc-patches

On 01/07/2018 03:59 PM, H.J. Lu wrote:
> Add 'V', a special modifier which prints the name of the full integer
> register without '%'.  For
> 
> extern void (*func_p) (void);
> 
> void
> foo (void)
> {
>   asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p));
> }
> 
> it generates:
> 
> foo:
> 	movq	func_p(%rip), %rax
> 	call	__x86_indirect_thunk_rax
> 	ret
> 
> gcc/
> 
> 	* config/i386/i386.c (print_reg): Print the name of the full
> 	integer register without '%'.
> 	(ix86_print_operand): Handle 'V'.
> 	 * doc/extend.texi: Document 'V' modifier.
> 
> gcc/testsuite/
> 
> 	* gcc.target/i386/indirect-thunk-register-4.c: New test.
No comments from me.  Uros has final call here.

jeff

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-11 20:32                 ` Jeff Law
@ 2018-01-11 23:45                   ` Joseph Myers
  2018-01-12  0:46                     ` Jeff Law
  0 siblings, 1 reply; 135+ messages in thread
From: Joseph Myers @ 2018-01-11 23:45 UTC (permalink / raw)
  To: Jeff Law
  Cc: Richard Biener, Jakub Jelinek, Eric Botcazou, GCC Patches, H.J. Lu

On Thu, 11 Jan 2018, Jeff Law wrote:

> > Well, given retpolines are largely kernel relevant right now we don't
> > need to care here.
> That's still TBD as far as I can tell.  I certainly hope we don't have
> to go retpolines in user space, at least not in the general case.  I'm
> holding out hope that the kernel folks are going to save the day.

I'd presume that just about any userspace process could have sensitive 
data in its address space (e.g. cp, if it happens to be copying it at the 
time).  Is the expectation that the kernel will use IBRS/IBPB/STIBP 
globally to shield processes from branch prediction state created by other 
processes?  (As far as I can tell, microcode enabling IBRS/IBPB/STIBP is 
only available for Ivy Bridge-EX and later at present, though I can't 
locate any official Intel status information on microcode updates for 
Spectre that have been released or are planned.)

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-11 23:45                   ` Joseph Myers
@ 2018-01-12  0:46                     ` Jeff Law
  0 siblings, 0 replies; 135+ messages in thread
From: Jeff Law @ 2018-01-12  0:46 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Richard Biener, Jakub Jelinek, Eric Botcazou, GCC Patches, H.J. Lu

On 01/11/2018 04:40 PM, Joseph Myers wrote:
> On Thu, 11 Jan 2018, Jeff Law wrote:
> 
>>> Well, given retpolines are largely kernel relevant right now we don't
>>> need to care here.
>> That's still TBD as far as I can tell.  I certainly hope we don't have
>> to go retpolines in user space, at least not in the general case.  I'm
>> holding out hope that the kernel folks are going to save the day.
> 
> I'd presume that just about any userspace process could have sensitive 
> data in its address space (e.g. cp, if it happens to be copying it at the 
> time).  
Yup.


> Is the expectation that the kernel will use IBRS/IBPB/STIBP 
> globally to shield processes from branch prediction state created by other 
> processes?  (As far as I can tell, microcode enabling IBRS/IBPB/STIBP is 
> only available for Ivy Bridge-EX and later at present, though I can't 
> locate any official Intel status information on microcode updates for 
> Spectre that have been released or are planned.)
I'm not sure of all the details of how it's supposed to work (assuming
it can) nor how much may or may not be covered by NDAs.  So it's
probably best to just stick with my statement that I hope that the
kernel folks can save the day here.

jeff

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-11 13:41                 ` H.J. Lu
@ 2018-01-12  8:07                   ` Uros Bizjak
  2018-01-12 13:31                     ` H.J. Lu
  2018-01-14 16:21                     ` Uros Bizjak
  0 siblings, 2 replies; 135+ messages in thread
From: Uros Bizjak @ 2018-01-12  8:07 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Biener, Jeff Law, Jakub Jelinek, Eric Botcazou, GCC Patches

On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu <hjl.tools@gmail.com> wrote:

> Hi Uros,
>
> Can you take a look at my x86 backend changes so that they are ready
> to check in once we have consensus.

Please finish the talks about the correct approach first. Once the
consensus is reached, please post the final version of the patches for
review.

BTW: I have no detailed insight in these issues, so I'll look mostly
at the implementation details, probably early next week.

Uros.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-11 21:49   ` Jeff Law
  2018-01-11 21:49     ` H.J. Lu
@ 2018-01-12 12:56     ` Martin Jambor
  2018-01-12 14:23       ` H.J. Lu
  1 sibling, 1 reply; 135+ messages in thread
From: Martin Jambor @ 2018-01-12 12:56 UTC (permalink / raw)
  To: Nagarajan, Muthu kumar raj, Kumar, Venkataramanan, gcc-patches
  Cc: Jeff Law, H.J. Lu

Hi,

On Thu, Jan 11 2018, Jeff Law wrote:
> On 01/07/2018 03:59 PM, H.J. Lu wrote:
>> Add -mindirect-branch-loop= option to control loop filler in call and
>> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>> as loop filler.  The default is 'lfence'.
>> 
>> gcc/
>> 
>> 	* config/i386/i386-opts.h (indirect_branch_loop): New.
>> 	* config/i386/i386.c (output_indirect_thunk): Support
>> 	-mindirect-branch-loop=.
>> 	* config/i386/i386.opt (mindirect-branch-loop=): New option.
>> 	(indirect_branch_loop): New.
>> 	(lfence): Likewise.
>> 	(pause): Likewise.
>> 	(nop): Likewise.
>> 	* doc/invoke.texi: Document -mindirect-branch-loop= option.
>> 
>> gcc/testsuite/
>> 
>> 	* gcc.target/i386/indirect-thunk-loop-1.c: New test.
>> 	* gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
>> 	* gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
>> 	* gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
>> 	* gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
> I think we should drop the ability to change the filler until such time
> as we really need it.  Just pick one and go with it.  I think David
> suggested that they wanted "pause".  I'm obviously fine with that.
>

Unless I am mistaken (which is frankly quite possible, I am still not
quite up to speed on the nuances), AMD strongly prefers the lfence
variant.  OTOH, IIUC, in the kernel this will be run-time patched, so it
does not matter in the most pressing case, but we might want to have a
mechanism doing something similar for protecting userspace later on.
But perhaps it is enough to keep the option?

Muthu and/or Venkat, can you please comment?

Thank you,

Martin

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-12  8:07                   ` Uros Bizjak
@ 2018-01-12 13:31                     ` H.J. Lu
  2018-01-12 15:09                       ` Jan Hubicka
  2018-01-14 16:21                     ` Uros Bizjak
  1 sibling, 1 reply; 135+ messages in thread
From: H.J. Lu @ 2018-01-12 13:31 UTC (permalink / raw)
  To: Uros Bizjak, Jan Hubicka
  Cc: Richard Biener, Jeff Law, Jakub Jelinek, Eric Botcazou, GCC Patches

On Fri, Jan 12, 2018 at 12:01 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>> Hi Uros,
>>
>> Can you take a look at my x86 backend changes so that they are ready
>> to check in once we have consensus.
>
> Please finish the talks about the correct approach first. Once the
> consensus is reached, please post the final version of the patches for
> review.

A new set of patches is posted at

https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01041.html
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01044.html
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01045.html
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01043.html
https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01042.html

I will submit an additional patch to disallow
-mindirect-branch=/-mfunction-return=
with -mshstk.

> BTW: I have no detailed insight in these issues, so I'll look mostly
> at the implementation details, probably early next week.
>

Kernel teams are waiting for the GCC 8 upstream patches.  They
have been using my GCC 7 backports for weeks now.  Jan, can
you review my patches before Uros has time next week?

Thanks.


-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-12 12:56     ` Martin Jambor
@ 2018-01-12 14:23       ` H.J. Lu
  2018-01-12 14:49         ` Kumar, Venkataramanan
  2018-01-12 16:03         ` Jeff Law
  0 siblings, 2 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-12 14:23 UTC (permalink / raw)
  To: Martin Jambor
  Cc: Nagarajan, Muthu kumar raj, Kumar, Venkataramanan, GCC Patches, Jeff Law

On Fri, Jan 12, 2018 at 4:38 AM, Martin Jambor <mjambor@suse.cz> wrote:
> Hi,
>
> On Thu, Jan 11 2018, Jeff Law wrote:
>> On 01/07/2018 03:59 PM, H.J. Lu wrote:
>>> Add -mindirect-branch-loop= option to control loop filler in call and
>>> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
>>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>>> as loop filler.  The default is 'lfence'.
>>>
>>> gcc/
>>>
>>>      * config/i386/i386-opts.h (indirect_branch_loop): New.
>>>      * config/i386/i386.c (output_indirect_thunk): Support
>>>      -mindirect-branch-loop=.
>>>      * config/i386/i386.opt (mindirect-branch-loop=): New option.
>>>      (indirect_branch_loop): New.
>>>      (lfence): Likewise.
>>>      (pause): Likewise.
>>>      (nop): Likewise.
>>>      * doc/invoke.texi: Document -mindirect-branch-loop= option.
>>>
>>> gcc/testsuite/
>>>
>>>      * gcc.target/i386/indirect-thunk-loop-1.c: New test.
>>>      * gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
>>>      * gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
>>>      * gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
>>>      * gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
>> I think we should drop the ability to change the filler until such time
>> as we really need it.  Just pick one and go with it.  I think David
>> suggested that they wanted "pause".  I'm obviously fine with that.
>>
>
> unless I am mistaken (which is frankly quite possible, I am still not
> quite up to speed about the nuances), AMD strongly prefers the lfence
> variant.  OTOH, IIUC, in kernel this will be run-time patched but so it
> does not matter in the most pressing case and we might want to have a
> mechanism doing something similar for protecting userspace later on.
> But perhaps it is enough to keep the option?
>
> Muthu and/or Venkat, can you please comment?

If we do want it, I will submit a separate patch AFTER the current patch
set has been approved and checked into GCC 8.

-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-12 14:23       ` H.J. Lu
@ 2018-01-12 14:49         ` Kumar, Venkataramanan
  2018-01-12 15:25           ` Kumar, Venkataramanan
  2018-01-12 16:03         ` Jeff Law
  1 sibling, 1 reply; 135+ messages in thread
From: Kumar, Venkataramanan @ 2018-01-12 14:49 UTC (permalink / raw)
  To: H.J. Lu, Martin Jambor
  Cc: Nagarajan, Muthu kumar raj, GCC Patches, Jeff Law,
	Uros Bizjak (ubizjak@gmail.com),
	Jan Hubicka

Hi all,

> -----Original Message-----
> From: H.J. Lu [mailto:hjl.tools@gmail.com]
> Sent: Friday, January 12, 2018 7:36 PM
> To: Martin Jambor <mjambor@suse.cz>
> Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;
> Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>; GCC
> Patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>
> Subject: Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
> 
> On Fri, Jan 12, 2018 at 4:38 AM, Martin Jambor <mjambor@suse.cz> wrote:
> > Hi,
> >
> > On Thu, Jan 11 2018, Jeff Law wrote:
> >> On 01/07/2018 03:59 PM, H.J. Lu wrote:
> >>> Add -mindirect-branch-loop= option to control loop filler in call
> >>> and return thunks generated by -mindirect-branch=.  'lfence' uses
> "lfence"
> >>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
> >>> as loop filler.  The default is 'lfence'.
> >>>
> >>> gcc/
> >>>
> >>>      * config/i386/i386-opts.h (indirect_branch_loop): New.
> >>>      * config/i386/i386.c (output_indirect_thunk): Support
> >>>      -mindirect-branch-loop=.
> >>>      * config/i386/i386.opt (mindirect-branch-loop=): New option.
> >>>      (indirect_branch_loop): New.
> >>>      (lfence): Likewise.
> >>>      (pause): Likewise.
> >>>      (nop): Likewise.
> >>>      * doc/invoke.texi: Document -mindirect-branch-loop= option.
> >>>
> >>> gcc/testsuite/
> >>>
> >>>      * gcc.target/i386/indirect-thunk-loop-1.c: New test.
> >>>      * gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
> >>>      * gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
> >>>      * gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
> >>>      * gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
> >> I think we should drop the ability to change the filler until such
> >> time as we really need it.  Just pick one and go with it.  I think
> >> David suggested that they wanted "pause".  I'm obviously fine with that.
> >>
> >
> > unless I am mistaken (which is frankly quite possible, I am still not
> > quite up to speed about the nuances), AMD strongly prefers the lfence
> > variant.  OTOH, IIUC, in kernel this will be run-time patched but so
> > it does not matter in the most pressing case and we might want to have
> > a mechanism doing something similar for protecting userspace later on.
> > But perhaps it is enough to keep the option?
> >
> > Muthu and/or Venkat, can you please comment?
> 
> If we do want it, I will submit a separate patch AFTER the current patch set
> has been approved and checked into GCC 8.
> 

As per AMD architects, using “lfence” in “retpoline” is better than “pause” for our targets.
So please allow filler to use "lfence". 

Regards.
Venkat.
> --
> H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-12 13:31                     ` H.J. Lu
@ 2018-01-12 15:09                       ` Jan Hubicka
  2018-01-12 15:30                         ` H.J. Lu
  0 siblings, 1 reply; 135+ messages in thread
From: Jan Hubicka @ 2018-01-12 15:09 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Uros Bizjak, Richard Biener, Jeff Law, Jakub Jelinek,
	Eric Botcazou, GCC Patches

> On Fri, Jan 12, 2018 at 12:01 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> > On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> >> Hi Uros,
> >>
> >> Can you take a look at my x86 backend changes so that they are ready
> >> to check in once we have consensus.
> >
> > Please finish the talks about the correct approach first. Once the
> > consensus is reached, please post the final version of the patches for
> > review.
> 
> A new set of patches is posted at
> 
> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01041.html
> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01044.html
> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01045.html
> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01043.html
> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01042.html
> 
> I will submit an additional patch to disallow
> -mindirect-branch=/-mfunction-return=
> with -mshstk.
> 
> > BTW: I have no detailed insight in these issues, so I'll look mostly
> > at the implementation details, probably early next week.
> >
> 
> Kernel teams are waiting for the GCC 8 upstream patches.  They
> have been using my GCC 7 backports for weeks now.   Jan, can
> you review my patches before Uros has time next week?

I have already read the original series, so I can take a look at the
updated ones. Did we get some consensus on how much we want to do
in the middle-end?

Honza
> 
> Thanks.
> 
> 
> -- 
> H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-12 14:49         ` Kumar, Venkataramanan
@ 2018-01-12 15:25           ` Kumar, Venkataramanan
  2018-01-12 16:02             ` Jeff Law
  2018-01-12 18:32             ` Kumar, Venkataramanan
  0 siblings, 2 replies; 135+ messages in thread
From: Kumar, Venkataramanan @ 2018-01-12 15:25 UTC (permalink / raw)
  To: H.J. Lu, Martin Jambor
  Cc: Nagarajan, Muthu kumar raj, GCC Patches, Jeff Law,
	Uros Bizjak (ubizjak@gmail.com),
	Jan Hubicka

Hi all, 

> -----Original Message-----
> From: Kumar, Venkataramanan
> Sent: Friday, January 12, 2018 8:16 PM
> To: 'H.J. Lu' <hjl.tools@gmail.com>; Martin Jambor <mjambor@suse.cz>
> Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;
> GCC Patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>; Uros
> Bizjak (ubizjak@gmail.com) <ubizjak@gmail.com>; 'Jan Hubicka'
> <jh@suse.de>
> Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
> 
> Hi all,
> 
> > -----Original Message-----
> > From: H.J. Lu [mailto:hjl.tools@gmail.com]
> > Sent: Friday, January 12, 2018 7:36 PM
> > To: Martin Jambor <mjambor@suse.cz>
> > Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;
> > Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>; GCC
> Patches
> > <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>
> > Subject: Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
> >
> > On Fri, Jan 12, 2018 at 4:38 AM, Martin Jambor <mjambor@suse.cz> wrote:
> > > Hi,
> > >
> > > On Thu, Jan 11 2018, Jeff Law wrote:
> > >> On 01/07/2018 03:59 PM, H.J. Lu wrote:
> > >>> Add -mindirect-branch-loop= option to control loop filler in call
> > >>> and return thunks generated by -mindirect-branch=.  'lfence' uses
> > "lfence"
> > >>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
> > >>> as loop filler.  The default is 'lfence'.
> > >>>
> > >>> gcc/
> > >>>
> > >>>      * config/i386/i386-opts.h (indirect_branch_loop): New.
> > >>>      * config/i386/i386.c (output_indirect_thunk): Support
> > >>>      -mindirect-branch-loop=.
> > >>>      * config/i386/i386.opt (mindirect-branch-loop=): New option.
> > >>>      (indirect_branch_loop): New.
> > >>>      (lfence): Likewise.
> > >>>      (pause): Likewise.
> > >>>      (nop): Likewise.
> > >>>      * doc/invoke.texi: Document -mindirect-branch-loop= option.
> > >>>
> > >>> gcc/testsuite/
> > >>>
> > >>>      * gcc.target/i386/indirect-thunk-loop-1.c: New test.
> > >>>      * gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
> > >>>      * gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
> > >>>      * gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
> > >>>      * gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
> > >> I think we should drop the ability to change the filler until such
> > >> time as we really need it.  Just pick one and go with it.  I think
> > >> David suggested that they wanted "pause".  I'm obviously fine with that.
> > >>
> > >
> > > unless I am mistaken (which is frankly quite possible, I am still
> > > not quite up to speed about the nuances), AMD strongly prefers the
> > > lfence variant.  OTOH, IIUC, in kernel this will be run-time patched
> > > but so it does not matter in the most pressing case and we might
> > > want to have a mechanism doing something similar for protecting
> userspace later on.
> > > But perhaps it is enough to keep the option?
> > >
> > > Muthu and/or Venkat, can you please comment?
> >
> > If we do want it, I will submit a separate patch AFTER the current
> > patch set has been approved and checked into GCC 8.
> >
> 
> As per AMD architects, using “lfence” in “retpoline” is better than “pause” for
> our targets.
> So please allow filler to use "lfence".
> 
We also learnt that "lfence" is a dispatch-serializing instruction.  The "pause" instruction is not serializing on AMD processors and has high latency.
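
To make the filler choice concrete, here is a minimal sketch in C that
prints the 64-bit return thunk from the patch 3/5 description with a
selectable filler in the L1 loop.  It is only an illustration and not
the code in i386.c: the helper name format_return_thunk and its
interface are hypothetical, and only the thunk layout and the three
filler names (lfence, pause, nop) are taken from this thread.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not GCC code: write the assembly text of a
   64-bit return thunk into BUF, using FILLER ("lfence", "pause" or
   "nop") as the instruction inside the L1 loop.  Returns the number of
   characters written, or -1 if FILLER is not one of the three names
   or BUF is too small to hold the whole text.  */
static int
format_return_thunk (char *buf, size_t size, const char *filler)
{
  if (strcmp (filler, "lfence") != 0
      && strcmp (filler, "pause") != 0
      && strcmp (filler, "nop") != 0)
    return -1;

  /* Layout as given in the patch description: the call pushes the
     address of L1, the loop at L1 never exits, and L2 drops that
     address from the stack before returning.  */
  int n = snprintf (buf, size,
                    "__x86_return_thunk:\n"
                    "\tcall\tL2\n"
                    "L1:\n"
                    "\t%s\n"
                    "\tjmp\tL1\n"
                    "L2:\n"
                    "\tlea\t8(%%rsp), %%rsp\n"
                    "\tret\n",
                    filler);

  /* snprintf reports the length it wanted to write; treat truncation
     as an error.  */
  return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```

Calling format_return_thunk (buf, sizeof buf, "pause") gives the same
text with "pause" in place of "lfence"; nothing else in the thunk
depends on the filler, which is why the choice can be discussed
separately from the rest of the series.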

 Regards.
 Venkat.
> > --
> > H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-12 15:09                       ` Jan Hubicka
@ 2018-01-12 15:30                         ` H.J. Lu
  0 siblings, 0 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-12 15:30 UTC (permalink / raw)
  To: Jan Hubicka
  Cc: Uros Bizjak, Richard Biener, Jeff Law, Jakub Jelinek,
	Eric Botcazou, GCC Patches

On Fri, Jan 12, 2018 at 6:50 AM, Jan Hubicka <hubicka@ucw.cz> wrote:
>> On Fri, Jan 12, 2018 at 12:01 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> > On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> >
>> >> Hi Uros,
>> >>
>> >> Can you take a look at my x86 backend changes so that they are ready
>> >> to check in once we have consensus.
>> >
>> > Please finish the talks about the correct approach first. Once the
>> > consensus is reached, please post the final version of the patches for
>> > review.
>>
>> A new set of patches is posted at
>>
>> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01041.html
>> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01044.html
>> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01045.html
>> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01043.html
>> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01042.html
>>
>> I will submit an additional patch to disallow
>> -mindirect-branch=/-mfunction-return=
>> with -mshstk.
>>
>> > BTW: I have no detailed insight in these issues, so I'll look mostly
>> > at the implementation details, probably early next week.
>> >
>>
>> Kernel teams are waiting for the GCC 8 upstream patches.  They
>> have been using my GCC 7 backports for weeks now.   Jan, can
>> you review my patches before Uros has time next week?
>
> I have already read the original series, so I can take a look at the

Thanks.

> updated ones. Did we get some consensus on how much we want to do
> in middle-end?

I believe so.  Most people, including Jeff, now think that my x86-specific
approach is the way to go.

-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-12 15:25           ` Kumar, Venkataramanan
@ 2018-01-12 16:02             ` Jeff Law
  2018-01-12 18:32             ` Kumar, Venkataramanan
  1 sibling, 0 replies; 135+ messages in thread
From: Jeff Law @ 2018-01-12 16:02 UTC (permalink / raw)
  To: Kumar, Venkataramanan, H.J. Lu, Martin Jambor
  Cc: Nagarajan, Muthu kumar raj, GCC Patches,
	Uros Bizjak (ubizjak@gmail.com),
	Jan Hubicka

On 01/12/2018 08:09 AM, Kumar, Venkataramanan wrote:
> Hi all, 
> 
>> -----Original Message-----
>> From: Kumar, Venkataramanan
>> Sent: Friday, January 12, 2018 8:16 PM
>> To: 'H.J. Lu' <hjl.tools@gmail.com>; Martin Jambor <mjambor@suse.cz>
>> Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;
>> GCC Patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>; Uros
>> Bizjak (ubizjak@gmail.com) <ubizjak@gmail.com>; 'Jan Hubicka'
>> <jh@suse.de>
>> Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
>>
>> Hi all,
>>
>>> -----Original Message-----
>>> From: H.J. Lu [mailto:hjl.tools@gmail.com]
>>> Sent: Friday, January 12, 2018 7:36 PM
>>> To: Martin Jambor <mjambor@suse.cz>
>>> Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;
>>> Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>; GCC
>> Patches
>>> <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>
>>> Subject: Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
>>>
>>> On Fri, Jan 12, 2018 at 4:38 AM, Martin Jambor <mjambor@suse.cz> wrote:
>>>> Hi,
>>>>
>>>> On Thu, Jan 11 2018, Jeff Law wrote:
>>>>> On 01/07/2018 03:59 PM, H.J. Lu wrote:
>>>>>> Add -mindirect-branch-loop= option to control loop filler in call
>>>>>> and return thunks generated by -mindirect-branch=.  'lfence' uses
>>> "lfence"
>>>>>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>>>>>> as loop filler.  The default is 'lfence'.
>>>>>>
>>>>>> gcc/
>>>>>>
>>>>>>      * config/i386/i386-opts.h (indirect_branch_loop): New.
>>>>>>      * config/i386/i386.c (output_indirect_thunk): Support
>>>>>>      -mindirect-branch-loop=.
>>>>>>      * config/i386/i386.opt (mindirect-branch-loop=): New option.
>>>>>>      (indirect_branch_loop): New.
>>>>>>      (lfence): Likewise.
>>>>>>      (pause): Likewise.
>>>>>>      (nop): Likewise.
>>>>>>      * doc/invoke.texi: Document -mindirect-branch-loop= option.
>>>>>>
>>>>>> gcc/testsuite/
>>>>>>
>>>>>>      * gcc.target/i386/indirect-thunk-loop-1.c: New test.
>>>>>>      * gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
>>>>>>      * gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
>>>>>>      * gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
>>>>>>      * gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
>>>>> I think we should drop the ability to change the filler until such
>>>>> time as we really need it.  Just pick one and go with it.  I think
>>>>> David suggested that they wanted "pause".  I'm obviously fine with that.
>>>>>
>>>>
>>>> unless I am mistaken (which is frankly quite possible, I am still
>>>> not quite up to speed about the nuances), AMD strongly prefers the
>>>> lfence variant.  OTOH, IIUC, in kernel this will be run-time patched
>>>> but so it does not matter in the most pressing case and we might
>>>> want to have a mechanism doing something similar for protecting
>> userspace later on.
>>>> But perhaps it is enough to keep the option?
>>>>
>>>> Muthu and/or Venkat, can you please comment?
>>>
>>> If we do want it, I will submit a separate patch AFTER the current
>>> patch set has been approved and checked into GCC 8.
>>>
>>
>> As per AMD architects, using “lfence” in “retpoline” is better than “pause” for
>> our targets.
>> So please allow filler to use "lfence".
>>
> We also learnt that "lfence" is a dispatch serializing instruction.  The Pause instruction is not serializing on AMD processors and has high latencies.
So the concern I have, if we end up wanting different sequences for
different micro-architectures, is that you either end up having to build
multiple kernels, libraries and executables or you end up doing dynamic
patching (ifuncs, which are often used for this kind of dynamic selection,
don't really help in this particular scenario).

It's unlikely we'll see distributors building multiple kernels,
libraries and executables within a processor family.  The kernel does
have dynamic patching, but we don't have that for anything in user space
(but I think folks will be looking at that in the relatively near future).

So it's difficult to see a scenario where having the compiler make
different codegen choices for these sequences based on the
microarchitecture is all that useful.

ISTM we want to pick a sequence that is reasonably OK, then look to
dynamic patching to micro-optimize if the benefit is significant.
Ideally going forward these issues will be largely addressed at the
hardware level, with perhaps some cooperation from the kernel, and
retpolines become a mitigation technique for legacy hardware.

Thoughts Venkat?

Jeff

* Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-12 14:23       ` H.J. Lu
  2018-01-12 14:49         ` Kumar, Venkataramanan
@ 2018-01-12 16:03         ` Jeff Law
  1 sibling, 0 replies; 135+ messages in thread
From: Jeff Law @ 2018-01-12 16:03 UTC (permalink / raw)
  To: H.J. Lu, Martin Jambor
  Cc: Nagarajan, Muthu kumar raj, Kumar, Venkataramanan, GCC Patches

On 01/12/2018 07:05 AM, H.J. Lu wrote:
> On Fri, Jan 12, 2018 at 4:38 AM, Martin Jambor <mjambor@suse.cz> wrote:
>> Hi,
>>
>> On Thu, Jan 11 2018, Jeff Law wrote:
>>> On 01/07/2018 03:59 PM, H.J. Lu wrote:
>>>> Add -mindirect-branch-loop= option to control loop filler in call and
>>>> return thunks generated by -mindirect-branch=.  'lfence' uses "lfence"
>>>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>>>> as loop filler.  The default is 'lfence'.
>>>>
>>>> gcc/
>>>>
>>>>      * config/i386/i386-opts.h (indirect_branch_loop): New.
>>>>      * config/i386/i386.c (output_indirect_thunk): Support
>>>>      -mindirect-branch-loop=.
>>>>      * config/i386/i386.opt (mindirect-branch-loop=): New option.
>>>>      (indirect_branch_loop): New.
>>>>      (lfence): Likewise.
>>>>      (pause): Likewise.
>>>>      (nop): Likewise.
>>>>      * doc/invoke.texi: Document -mindirect-branch-loop= option.
>>>>
>>>> gcc/testsuite/
>>>>
>>>>      * gcc.target/i386/indirect-thunk-loop-1.c: New test.
>>>>      * gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
>>>>      * gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
>>>>      * gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
>>>>      * gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
>>> I think we should drop the ability to change the filler until such time
>>> as we really need it.  Just pick one and go with it.  I think David
>>> suggested that they wanted "pause".  I'm obviously fine with that.
>>>
>>
>> unless I am mistaken (which is frankly quite possible, I am still not
>> quite up to speed about the nuances), AMD strongly prefers the lfence
>> variant.  OTOH, IIUC, in kernel this will be run-time patched but so it
>> does not matter in the most pressing case and we might want to have a
>> mechanism doing something similar for protecting userspace later on.
>> But perhaps it is enough to keep the option?
>>
>> Muthu and/or Venkat, can you please comment?
> 
> If we do want it, I will submit a separate patch AFTER the current patch
> set has been approved and checked into GCC 8.
That would be my preference, if we need it at all.

jeff

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-11 23:09     ` Jakub Jelinek
@ 2018-01-12 17:59       ` Jeff Law
  2018-01-12 18:00         ` Jakub Jelinek
  2018-01-13  9:59         ` David Woodhouse
  0 siblings, 2 replies; 135+ messages in thread
From: Jeff Law @ 2018-01-12 17:59 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: H.J. Lu, gcc-patches

On 01/11/2018 04:07 PM, Jakub Jelinek wrote:
> On Thu, Jan 11, 2018 at 03:46:51PM -0700, Jeff Law wrote:
>> Note I'm expecting Uros to chime in.  So please do not consider this
>> ack'd until you hear from Uros.
>>
>> At a high level is there really that much value in having thunks in the
>> object file?  Why not put the full set of thunks into libgcc and just
>> allow selection between inline sequences and external thunks
>> (thunk-inline and thunk-external)?  It's not a huge simplification, but
>> if there isn't a compelling reason, let's drop the in-object-file thunks.
> 
> Not everything is linked against libgcc.a, something is linked against just
> libgcc_s.so.1, other stuff against both, some libraries against none of that.
> Probably it is undesirable to have the thunks at non-constant offsets from
> the uses, that would need text relocations.  Thunks emitted in the object
> files, hidden and comdat merged between .o files like what we have say for
> i686 PIC thunks seems like the best default to me and a way for the kernel
> to override that.
For things that don't link against libgcc, they would (of course) be
expected to provide the necessary thunks.  The kernel would be the
obvious example and that's precisely what they're going to do.

WRT text relocs, yea that sucks, but if we're going to have user space
mitigations, then we're likely going to need those relocs so that the
thunks can be patched out.  I'm actually hoping we're not going to need
user space mitigations for spectre v2 and we can avoid this problem..



I'm just not convinced there's a lot of value there, but I'm not going
to hold things up on that.  So I won't object on this basis.

jeff

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-12 17:59       ` Jeff Law
@ 2018-01-12 18:00         ` Jakub Jelinek
  2018-01-13  9:59         ` David Woodhouse
  1 sibling, 0 replies; 135+ messages in thread
From: Jakub Jelinek @ 2018-01-12 18:00 UTC (permalink / raw)
  To: Jeff Law; +Cc: H.J. Lu, gcc-patches

On Fri, Jan 12, 2018 at 10:57:08AM -0700, Jeff Law wrote:
> On 01/11/2018 04:07 PM, Jakub Jelinek wrote:
> > On Thu, Jan 11, 2018 at 03:46:51PM -0700, Jeff Law wrote:
> >> Note I'm expecting Uros to chime in.  So please do not consider this
> >> ack'd until you hear from Uros.
> >>
> >> At a high level is there really that much value in having thunks in the
> >> object file?  Why not put the full set of thunks into libgcc and just
> >> allow selection between inline sequences and external thunks
> >> (thunk-inline and thunk-external)?  It's not a huge simplification, but
> >> if there isn't a compelling reason, let's drop the in-object-file thunks.
> > 
> > Not everything is linked against libgcc.a, something is linked against just
> > libgcc_s.so.1, other stuff against both, some libraries against none of that.
> > Probably it is undesirable to have the thunks at non-constant offsets from
> > the uses, that would need text relocations.  Thunks emitted in the object
> > files, hidden and comdat merged between .o files like what we have say for
> > i686 PIC thunks seems like the best default to me and a way for the kernel
> > to override that.
> For things that don't link against libgcc, they would (of course) be
> expected to provide the necessary thunks.  The kernel would be the
> obvious example and that's precisely what they're going to do.
> 
> WRT text relocs, yea that sucks, but if we're going to have user space
> mitigations, then we're likely going to need those relocs so that the
> thunks can be patched out.  I'm actually hoping we're not going to need
> user space mitigations for spectre v2 and we can avoid this problem..

Some architectures don't allow text relocations at all, including x86_64.
So any kind of runtime patching isn't that easy, and is generally a security
hole that e.g. SELinux prevents as well.

	Jakub

* RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-12 15:25           ` Kumar, Venkataramanan
  2018-01-12 16:02             ` Jeff Law
@ 2018-01-12 18:32             ` Kumar, Venkataramanan
  2018-01-12 19:51               ` H.J. Lu
  1 sibling, 1 reply; 135+ messages in thread
From: Kumar, Venkataramanan @ 2018-01-12 18:32 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Nagarajan, Muthu kumar raj, GCC Patches, Martin Jambor, Jeff Law,
	Uros Bizjak (ubizjak@gmail.com),
	Jan Hubicka, Dharmakan, Rohit arul raj

Hi HJ, 

> -----Original Message-----
> From: Kumar, Venkataramanan
> Sent: Friday, January 12, 2018 8:39 PM
> To: 'H.J. Lu' <hjl.tools@gmail.com>; 'Martin Jambor' <mjambor@suse.cz>
> Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;
> 'GCC Patches' <gcc-patches@gcc.gnu.org>; 'Jeff Law' <law@redhat.com>;
> Uros Bizjak (ubizjak@gmail.com) <ubizjak@gmail.com>; 'Jan Hubicka'
> <jh@suse.de>
> Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
> 
> Hi all,
> 
> > -----Original Message-----
> > From: Kumar, Venkataramanan
> > Sent: Friday, January 12, 2018 8:16 PM
> > To: 'H.J. Lu' <hjl.tools@gmail.com>; Martin Jambor <mjambor@suse.cz>
> > Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;
> > GCC Patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>;
> Uros
> > Bizjak (ubizjak@gmail.com) <ubizjak@gmail.com>; 'Jan Hubicka'
> > <jh@suse.de>
> > Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
> >
> > Hi all,
> >
> > > -----Original Message-----
> > > From: H.J. Lu [mailto:hjl.tools@gmail.com]
> > > Sent: Friday, January 12, 2018 7:36 PM
> > > To: Martin Jambor <mjambor@suse.cz>
> > > Cc: Nagarajan, Muthu kumar raj
> <Muthukumarraj.Nagarajan@amd.com>;
> > > Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>; GCC
> > Patches
> > > <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>
> > > Subject: Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
> > >
> > > On Fri, Jan 12, 2018 at 4:38 AM, Martin Jambor <mjambor@suse.cz>
> wrote:
> > > > Hi,
> > > >
> > > > On Thu, Jan 11 2018, Jeff Law wrote:
> > > >> On 01/07/2018 03:59 PM, H.J. Lu wrote:
> > > >>> Add -mindirect-branch-loop= option to control loop filler in
> > > >>> call and return thunks generated by -mindirect-branch=.
> > > >>> 'lfence' uses
> > > "lfence"
> > > >>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
> > > >>> as loop filler.  The default is 'lfence'.
> > > >>>
> > > >>> gcc/
> > > >>>
> > > >>>      * config/i386/i386-opts.h (indirect_branch_loop): New.
> > > >>>      * config/i386/i386.c (output_indirect_thunk): Support
> > > >>>      -mindirect-branch-loop=.
> > > >>>      * config/i386/i386.opt (mindirect-branch-loop=): New option.
> > > >>>      (indirect_branch_loop): New.
> > > >>>      (lfence): Likewise.
> > > >>>      (pause): Likewise.
> > > >>>      (nop): Likewise.
> > > >>>      * doc/invoke.texi: Document -mindirect-branch-loop= option.
> > > >>>
> > > >>> gcc/testsuite/
> > > >>>
> > > >>>      * gcc.target/i386/indirect-thunk-loop-1.c: New test.
> > > >>>      * gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
> > > >>>      * gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
> > > >>>      * gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
> > > >>>      * gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
> > > >> I think we should drop the ability to change the filler until
> > > >> such time as we really need it.  Just pick one and go with it.  I
> > > >> think David suggested that they wanted "pause".  I'm obviously fine
> with that.
> > > >>
> > > >
> > > > unless I am mistaken (which is frankly quite possible, I am still
> > > > not quite up to speed about the nuances), AMD strongly prefers the
> > > > lfence variant.  OTOH, IIUC, in kernel this will be run-time
> > > > patched but so it does not matter in the most pressing case and we
> > > > might want to have a mechanism doing something similar for
> > > > protecting
> > userspace later on.
> > > > But perhaps it is enough to keep the option?
> > > >
> > > > Muthu and/or Venkat, can you please comment?
> > >
> > > If we do want it, I will submit a separate patch AFTER the current
> > > patch set has been approved and checked into GCC 8.
> > >
> >
> > As per AMD architects, using “lfence” in “retpoline” is better than
> > “pause” for our targets.
> > So please allow filler to use "lfence".
> >
> We also learnt that "lfence" is a dispatch serializing instruction.  The Pause
> instruction is not serializing on AMD processors and has high latencies.
> 
Any reason why Intel has chosen "pause" over "lfence" as the default loop filler for Retpoline?

>  Regards.
>  Venkat.
> > > --
> > > H.J.

* Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-12 18:32             ` Kumar, Venkataramanan
@ 2018-01-12 19:51               ` H.J. Lu
  2018-01-13  6:26                 ` Kumar, Venkataramanan
  0 siblings, 1 reply; 135+ messages in thread
From: H.J. Lu @ 2018-01-12 19:51 UTC (permalink / raw)
  To: Kumar, Venkataramanan
  Cc: Nagarajan, Muthu kumar raj, GCC Patches, Martin Jambor, Jeff Law,
	Uros Bizjak (ubizjak@gmail.com),
	Jan Hubicka, Dharmakan, Rohit arul raj

On Fri, Jan 12, 2018 at 10:27 AM, Kumar, Venkataramanan
<Venkataramanan.Kumar@amd.com> wrote:
> Hi HJ,
>
>> -----Original Message-----
>> From: Kumar, Venkataramanan
>> Sent: Friday, January 12, 2018 8:39 PM
>> To: 'H.J. Lu' <hjl.tools@gmail.com>; 'Martin Jambor' <mjambor@suse.cz>
>> Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;
>> 'GCC Patches' <gcc-patches@gcc.gnu.org>; 'Jeff Law' <law@redhat.com>;
>> Uros Bizjak (ubizjak@gmail.com) <ubizjak@gmail.com>; 'Jan Hubicka'
>> <jh@suse.de>
>> Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
>>
>> Hi all,
>>
>> > -----Original Message-----
>> > From: Kumar, Venkataramanan
>> > Sent: Friday, January 12, 2018 8:16 PM
>> > To: 'H.J. Lu' <hjl.tools@gmail.com>; Martin Jambor <mjambor@suse.cz>
>> > Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;
>> > GCC Patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>;
>> Uros
>> > Bizjak (ubizjak@gmail.com) <ubizjak@gmail.com>; 'Jan Hubicka'
>> > <jh@suse.de>
>> > Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
>> >
>> > Hi all,
>> >
>> > > -----Original Message-----
>> > > From: H.J. Lu [mailto:hjl.tools@gmail.com]
>> > > Sent: Friday, January 12, 2018 7:36 PM
>> > > To: Martin Jambor <mjambor@suse.cz>
>> > > Cc: Nagarajan, Muthu kumar raj
>> <Muthukumarraj.Nagarajan@amd.com>;
>> > > Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>; GCC
>> > Patches
>> > > <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>
>> > > Subject: Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
>> > >
>> > > On Fri, Jan 12, 2018 at 4:38 AM, Martin Jambor <mjambor@suse.cz>
>> wrote:
>> > > > Hi,
>> > > >
>> > > > On Thu, Jan 11 2018, Jeff Law wrote:
>> > > >> On 01/07/2018 03:59 PM, H.J. Lu wrote:
>> > > >>> Add -mindirect-branch-loop= option to control loop filler in
>> > > >>> call and return thunks generated by -mindirect-branch=.
>> > > >>> 'lfence' uses
>> > > "lfence"
>> > > >>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
>> > > >>> as loop filler.  The default is 'lfence'.
>> > > >>>
>> > > >>> gcc/
>> > > >>>
>> > > >>>      * config/i386/i386-opts.h (indirect_branch_loop): New.
>> > > >>>      * config/i386/i386.c (output_indirect_thunk): Support
>> > > >>>      -mindirect-branch-loop=.
>> > > >>>      * config/i386/i386.opt (mindirect-branch-loop=): New option.
>> > > >>>      (indirect_branch_loop): New.
>> > > >>>      (lfence): Likewise.
>> > > >>>      (pause): Likewise.
>> > > >>>      (nop): Likewise.
>> > > >>>      * doc/invoke.texi: Document -mindirect-branch-loop= option.
>> > > >>>
>> > > >>> gcc/testsuite/
>> > > >>>
>> > > >>>      * gcc.target/i386/indirect-thunk-loop-1.c: New test.
>> > > >>>      * gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
>> > > >>>      * gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
>> > > >>>      * gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
>> > > >>>      * gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
>> > > >> I think we should drop the ability to change the filler until
>> > > >> such time as we really need it.  Just pick one and go with it.  I
>> > > >> think David suggested that they wanted "pause".  I'm obviously fine
>> with that.
>> > > >>
>> > > >
>> > > > unless I am mistaken (which is frankly quite possible, I am still
>> > > > not quite up to speed about the nuances), AMD strongly prefers the
>> > > > lfence variant.  OTOH, IIUC, in kernel this will be run-time
>> > > > patched but so it does not matter in the most pressing case and we
>> > > > might want to have a mechanism doing something similar for
>> > > > protecting
>> > userspace later on.
>> > > > But perhaps it is enough to keep the option?
>> > > >
>> > > > Muthu and/or Venkat, can you please comment?
>> > >
>> > > If we do want it, I will submit a separate patch AFTER the current
>> > > patch set has been approved and checked into GCC 8.
>> > >
>> >
>> > As per AMD architects, using “lfence” in “retpoline” is better than
>> > “pause” for our targets.
>> > So please allow filler to use "lfence".
>> >
>> We also learnt that "lfence" is a dispatch serializing instruction.  The Pause
>> instruction is not serializing on AMD processors and has high latencies.
>>
> Any reason why Intel has chosen "pause" over "lfence" as the default loop filler for Retpoline?
>

My original patch uses "lfence".  I was asked to use "pause":

https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00969.html

-- 
H.J.

* RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-12 19:51               ` H.J. Lu
@ 2018-01-13  6:26                 ` Kumar, Venkataramanan
  2018-01-13  9:03                   ` David Woodhouse
  2018-01-13 16:36                   ` David Woodhouse
  0 siblings, 2 replies; 135+ messages in thread
From: Kumar, Venkataramanan @ 2018-01-13  6:26 UTC (permalink / raw)
  To: H.J. Lu, dwmw2, Jeff Law
  Cc: Nagarajan, Muthu kumar raj, GCC Patches, Martin Jambor,
	Uros Bizjak (ubizjak@gmail.com),
	Jan Hubicka, Dharmakan, Rohit arul raj

Hi All, 

> -----Original Message-----
> From: H.J. Lu [mailto:hjl.tools@gmail.com]
> Sent: Saturday, January 13, 2018 1:11 AM
> To: Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>
> Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;
> GCC Patches <gcc-patches@gcc.gnu.org>; Martin Jambor
> <mjambor@suse.cz>; Jeff Law <law@redhat.com>; Uros Bizjak
> (ubizjak@gmail.com) <ubizjak@gmail.com>; Jan Hubicka <jh@suse.de>;
> Dharmakan, Rohit arul raj <Rohitarulraj.Dharmakan@amd.com>
> Subject: Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
> 
> On Fri, Jan 12, 2018 at 10:27 AM, Kumar, Venkataramanan
> <Venkataramanan.Kumar@amd.com> wrote:
> > Hi HJ,
> >
> >> -----Original Message-----
> >> From: Kumar, Venkataramanan
> >> Sent: Friday, January 12, 2018 8:39 PM
> >> To: 'H.J. Lu' <hjl.tools@gmail.com>; 'Martin Jambor'
> >> <mjambor@suse.cz>
> >> Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;
> >> 'GCC Patches' <gcc-patches@gcc.gnu.org>; 'Jeff Law' <law@redhat.com>;
> >> Uros Bizjak (ubizjak@gmail.com) <ubizjak@gmail.com>; 'Jan Hubicka'
> >> <jh@suse.de>
> >> Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
> >>
> >> Hi all,
> >>
> >> > -----Original Message-----
> >> > From: Kumar, Venkataramanan
> >> > Sent: Friday, January 12, 2018 8:16 PM
> >> > To: 'H.J. Lu' <hjl.tools@gmail.com>; Martin Jambor
> >> > <mjambor@suse.cz>
> >> > Cc: Nagarajan, Muthu kumar raj
> <Muthukumarraj.Nagarajan@amd.com>;
> >> > GCC Patches <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>;
> >> Uros
> >> > Bizjak (ubizjak@gmail.com) <ubizjak@gmail.com>; 'Jan Hubicka'
> >> > <jh@suse.de>
> >> > Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
> >> >
> >> > Hi all,
> >> >
> >> > > -----Original Message-----
> >> > > From: H.J. Lu [mailto:hjl.tools@gmail.com]
> >> > > Sent: Friday, January 12, 2018 7:36 PM
> >> > > To: Martin Jambor <mjambor@suse.cz>
> >> > > Cc: Nagarajan, Muthu kumar raj
> >> <Muthukumarraj.Nagarajan@amd.com>;
> >> > > Kumar, Venkataramanan <Venkataramanan.Kumar@amd.com>; GCC
> >> > Patches
> >> > > <gcc-patches@gcc.gnu.org>; Jeff Law <law@redhat.com>
> >> > > Subject: Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
> >> > >
> >> > > On Fri, Jan 12, 2018 at 4:38 AM, Martin Jambor <mjambor@suse.cz>
> >> wrote:
> >> > > > Hi,
> >> > > >
> >> > > > On Thu, Jan 11 2018, Jeff Law wrote:
> >> > > >> On 01/07/2018 03:59 PM, H.J. Lu wrote:
> >> > > >>> Add -mindirect-branch-loop= option to control loop filler in
> >> > > >>> call and return thunks generated by -mindirect-branch=.
> >> > > >>> 'lfence' uses
> >> > > "lfence"
> >> > > >>> as loop filler.  'pause' uses "pause" as loop filler.  'nop' uses "nop"
> >> > > >>> as loop filler.  The default is 'lfence'.
> >> > > >>>
> >> > > >>> gcc/
> >> > > >>>
> >> > > >>>      * config/i386/i386-opts.h (indirect_branch_loop): New.
> >> > > >>>      * config/i386/i386.c (output_indirect_thunk): Support
> >> > > >>>      -mindirect-branch-loop=.
> >> > > >>>      * config/i386/i386.opt (mindirect-branch-loop=): New option.
> >> > > >>>      (indirect_branch_loop): New.
> >> > > >>>      (lfence): Likewise.
> >> > > >>>      (pause): Likewise.
> >> > > >>>      (nop): Likewise.
> >> > > >>>      * doc/invoke.texi: Document -mindirect-branch-loop= option.
> >> > > >>>
> >> > > >>> gcc/testsuite/
> >> > > >>>
> >> > > >>>      * gcc.target/i386/indirect-thunk-loop-1.c: New test.
> >> > > >>>      * gcc.target/i386/indirect-thunk-loop-2.c: Likewise.
> >> > > >>>      * gcc.target/i386/indirect-thunk-loop-3.c: Likewise.
> >> > > >>>      * gcc.target/i386/indirect-thunk-loop-4.c: Likewise.
> >> > > >>>      * gcc.target/i386/indirect-thunk-loop-5.c: Likewise.
> >> > > >> I think we should drop the ability to change the filler until
> >> > > >> such time as we really need it.  Just pick one and go with it.
> >> > > >> I think David suggested that they wanted "pause".  I'm
> >> > > >> obviously fine
> >> with that.
> >> > > >>
> >> > > >
> >> > > > unless I am mistaken (which is frankly quite possible, I am
> >> > > > still not quite up to speed about the nuances), AMD strongly
> >> > > > prefers the lfence variant.  OTOH, IIUC, in kernel this will be
> >> > > > run-time patched but so it does not matter in the most pressing
> >> > > > case and we might want to have a mechanism doing something
> >> > > > similar for protecting
> >> > userspace later on.
> >> > > > But perhaps it is enough to keep the option?
> >> > > >
> >> > > > Muthu and/or Venkat, can you please comment?
> >> > >
> >> > > If we do want it, I will submit a separate patch AFTER the
> >> > > current patch set has been approved and checked into GCC 8.
> >> > >
> >> >
> >> > As per AMD architects, using “lfence” in “retpoline” is better than
> >> > “pause” for our targets.
> >> > So please allow filler to use "lfence".
> >> >
> >> We also learnt that "lfence" is a dispatch serializing instruction.
> >> The Pause instruction is not serializing on AMD processors and has high
> >> latencies.
> >>
> > Any reason why Intel has chosen "pause" over "lfence" as the default loop
> filler for Retpoline?
> >
> 
> My original patch uses "lfence".  I was asked to use "pause":
> 
> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00969.html

If everyone is OK with it, my suggestion is to use "lfence" as the default loop filler for retpoline.

Please confirm.

> 
> --
> H.J.

* Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-13  6:26                 ` Kumar, Venkataramanan
@ 2018-01-13  9:03                   ` David Woodhouse
  2018-01-13 16:36                   ` David Woodhouse
  1 sibling, 0 replies; 135+ messages in thread
From: David Woodhouse @ 2018-01-13  9:03 UTC (permalink / raw)
  To: Kumar, Venkataramanan, H.J. Lu, Jeff Law, Paul Turner, asit.k.mallick
  Cc: Nagarajan, Muthu kumar raj, GCC Patches, Martin Jambor,
	Uros Bizjak (ubizjak@gmail.com),
	Jan Hubicka, Dharmakan, Rohit arul raj

On Sat, 2018-01-13 at 03:11 +0000, Kumar, Venkataramanan wrote:
> 
> > My original patch uses "lfence".  I was asked to use "pause":
> > 
> > https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00969.html
> 
> If everyone is ok, my suggestion is to use  "lfence" as the default
> loop filler for retpoline.
> 
> Please confirm.

I have had the same request for the kernel patches. I'm happy with it
but would like confirmation from Intel and from Paul Turner (whose idea
this is, and who has overseen most of the coherent analysis).

FWIW I haven't actually *changed* the kernel patch yet, awaiting that
confirmation. I understand this is a power optimisation only;
preventing the CPU from spinning in that loop when it's mispredicted a
return to it.


* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-12 17:59       ` Jeff Law
  2018-01-12 18:00         ` Jakub Jelinek
@ 2018-01-13  9:59         ` David Woodhouse
  2018-01-13 16:23           ` Jeff Law
  1 sibling, 1 reply; 135+ messages in thread
From: David Woodhouse @ 2018-01-13  9:59 UTC (permalink / raw)
  To: Jeff Law, Jakub Jelinek; +Cc: H.J. Lu, gcc-patches

On Fri, 2018-01-12 at 10:57 -0700, Jeff Law wrote:
> 
> WRT text relocs, yea that sucks, but if we're going to have user space
> mitigations, then we're likely going to need those relocs so that the
> thunks can be patched out.  I'm actually hoping we're not going to need
> user space mitigations for spectre v2 and we can avoid this problem..

As things stand with retpoline in the kernel, userspace processes
aren't protected from each other. The attack mode is complex and
probably fairly unlikely, and we need to get the new microcode support
into the kernel, with the IBPB (flush branch predictor) MSR. And for
the kernel to use it, of course.

In the meantime, it does potentially make sense for sensitive userspace
processes to be compiled this way. Especially if they're going to run
external code (like JavaScript) and attempt to sandbox it — which is
something that IBPB isn't going to solve either.

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-11 21:42       ` David Woodhouse
@ 2018-01-13 12:17         ` Florian Weimer
  2018-01-13 13:00           ` David Woodhouse
  0 siblings, 1 reply; 135+ messages in thread
From: Florian Weimer @ 2018-01-13 12:17 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Jeff Law, H.J. Lu, gcc-patches

* David Woodhouse:

>> Do you assume that you will eventually apply run-time patching to
>> thunks (in case they aren't needed)?
>
> "eventually"? We've been doing it for weeks. We are desperate to
> release the kernel.... when can we have agreement on at *least* the
> command line option and the name of the thunk? Internal implementation
> details we really don't care about, but those we do.

Thanks.  Are you content with patching just the thunk, or would you
prefer patching the control transfer to it, too?

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-13 12:17         ` Florian Weimer
@ 2018-01-13 13:00           ` David Woodhouse
  0 siblings, 0 replies; 135+ messages in thread
From: David Woodhouse @ 2018-01-13 13:00 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Jeff Law, H.J. Lu, gcc-patches

On Sat, 2018-01-13 at 13:01 +0100, Florian Weimer wrote:
> * David Woodhouse:
> 
> > 
> > > 
> > > Do you assume that you will eventually apply run-time patching to
> > > thunks (in case they aren't needed)?
> > "eventually"? We've been doing it for weeks. We are desperate to
> > release the kernel.... when can we have agreement on at *least* the
> > command line option and the name of the thunk? Internal implementation
> > details we really don't care about, but those we do.
>
> Thanks.  Are you content with patching just the thunk, or would you
> prefer patching the control transfer to it, too?

As things stand, GCC will put the target into a register and then call
the thunk.

We currently patch the thunk to turn it into a straight 'jmp *%\reg'
instead of the full retpoline (or perhaps 'lfence; jmp *%\reg' on AMD).

This means we have the direct unconditional branch instruction emitted
by GCC, directly followed by a branch-to-register. And presumably you
aren't talking about deep magic here, just turning that 'call
__x86_indirect_branch_\reg' back into a 'call *%\reg' inline in the C
code? Probably not even eliminating the load into the register in the
first place, for targets which actually came from memory?

So right now, for this specific use case, we are content with just
patching the thunk. It's not like that unconditional direct branch to
the thunk is hard for the CPU to predict. ;)
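
For anyone following along without the patches to hand, the call sites
in question are ordinary C calls through a function pointer.  The sketch
below is only an illustration; handler_fn, add_one, twice and dispatch
are made-up names, not anything from the patch set.  Built normally, the
call in dispatch is an indirect call through a register.  Built with the
-mindirect-branch=thunk-extern option proposed in this series, the
compiler would be expected to emit a direct call to a per-register thunk
that the kernel provides (and can patch).  The thunk symbol name is still
under discussion here, so none is assumed.

```c
/* Sketch only: handler_fn, add_one, twice and dispatch are hypothetical
   names used for illustration, not part of the patch set.  */

typedef int (*handler_fn) (int);

int add_one (int x) { return x + 1; }
int twice (int x) { return x * 2; }

/* The call through FN is the indirect branch being discussed.  With
   -mindirect-branch=thunk-extern (as proposed) the compiler is expected
   to load FN into a register and call an externally provided thunk;
   without the option this stays an ordinary indirect call.  The C-level
   behaviour is the same either way.  */
int
dispatch (handler_fn fn, int arg)
{
  return fn (arg);
}
```

Comparing the output of "gcc -O2 -S" with and without the option should
show the difference at the call site in dispatch.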

Now, if we're talking about the general case of wanting to runtime-
patch code emitted by GCC, there are much better use cases. Like
letting you emit 'movbe;nop;nop;nop' in appropriate cases, but telling
us where they are so we can hotpatch them into a 'mov;bswap' on CPUs
that need it. And other similar cases which are *much* more interesting
than the retpoline thing.¹
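
As a rough illustration of the movbe case, the sketch below is the sort
of C source involved: a big-endian load written as a plain load plus a
byte swap.  load_be32 is a hypothetical helper name, and the code assumes
a GCC-compatible compiler (__builtin_bswap32 and the __BYTE_ORDER__
macros).  On x86 the compiler can be expected to use a single movbe when
built with -mmovbe, and a mov followed by bswap otherwise, which is the
pair of encodings a hotpatching scheme would switch between.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load a 32-bit big-endian value from P.
   Assumes a GCC-compatible compiler.  */
uint32_t
load_be32 (const void *p)
{
  uint32_t v;

  /* Plain load in host byte order; memcpy avoids alignment and
     aliasing problems.  */
  memcpy (&v, p, sizeof v);

#if defined (__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  /* Little-endian host (e.g. x86): swap to get the big-endian value.  */
  v = __builtin_bswap32 (v);
#endif

  return v;
}
```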

If we get that generic capability, then by all means we might as well
do the retpoline thunks too in future. But we're not losing sleep over
it right now.

What we're losing sleep over right now is the fact that we haven't even
got an agreement that the command line options, and the thunk symbol
name, are OK to use. Regardless of how you *implement* those, we really
want to commit the kernel code to use it within the next 24 hours...


¹ I actually just had a *really* nasty thought about that. If I start
the C file with asm(".include movbe-hack.h") and movbe-hack.h contains
a '.macro movbe' which does the normal Linux ALTERNATIVE thing... no,
you're going to come and hurt me, aren't you... 

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-13  9:59         ` David Woodhouse
@ 2018-01-13 16:23           ` Jeff Law
  2018-01-13 16:35             ` David Woodhouse
  0 siblings, 1 reply; 135+ messages in thread
From: Jeff Law @ 2018-01-13 16:23 UTC (permalink / raw)
  To: David Woodhouse, Jakub Jelinek; +Cc: H.J. Lu, gcc-patches

On 01/13/2018 02:03 AM, David Woodhouse wrote:
> On Fri, 2018-01-12 at 10:57 -0700, Jeff Law wrote:
>>
>> WRT text relocs, yea that sucks, but if we're going to have user space
>> mitigations, then we're likely going to need those relocs so that the
>> thunks can be patched out.  I'm actually hoping we're not going to need
>> user space mitigations for spectre v2 and we can avoid this problem..
> 
> As things stand with retpoline in the kernel, userspace  processes
> aren't protected from each other. The attack mode is complex and
> probably fairly unlikely, and we need to get the new microcode support
> into the kernel, with the IBPB (flush branch predictor) MSR. And for
> the kernel to use it, of course.
Correct, but for a user<->user exploit don't you have to at some point
run through a context switch?  That seems to be a point where we should
seriously think about flushing the predictor.

That wouldn't help user space threading packages such as npth or
goroutines that multiplex on top of pthreads, but I'm happy to punt
those in the immediate term.

> 
> In the meantime, it does potentially make sense for sensitive userspace
> processes to be compiled this way. Especially if they're going to run
> external code (like JavaScript) and attempt to sandbox it — which is
> something that IBPB isn't going to solve either.
I've suspected that in the immediate term there will likely be some
sensitive packages compiled with -mretpoline to at least cut down the
attack surface while the hardware side sorts itself out.  But to totally
address the problem you have to build the entire system with -mretpoline
-- and then 18 months out we're unable to turn on something like CET
because retpolines and CET are fundamentally incompatible.  That seems
like a losing proposition.

jeff

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 1/5] x86: Add -mindirect-branch=
  2018-01-13 16:23           ` Jeff Law
@ 2018-01-13 16:35             ` David Woodhouse
  0 siblings, 0 replies; 135+ messages in thread
From: David Woodhouse @ 2018-01-13 16:35 UTC (permalink / raw)
  To: Jeff Law, Jakub Jelinek; +Cc: H.J. Lu, gcc-patches

On Sat, 2018-01-13 at 09:17 -0700, Jeff Law wrote:
> On 01/13/2018 02:03 AM, David Woodhouse wrote:
> > On Fri, 2018-01-12 at 10:57 -0700, Jeff Law wrote:
> > As things stand with retpoline in the kernel, userspace  processes
> > aren't protected from each other. The attack mode is complex and
> > probably fairly unlikely, and we need to get the new microcode support
> > into the kernel, with the IBPB (flush branch predictor) MSR. And for
> > the kernel to use it, of course.
>
> Correct, but for a user<->user exploit don't you have to at some point
> run through a context switch?  That seems to be a point where we should
> seriously think about flushing the predictor.

Yes, that is the very next thing on our TODO list. It requires the new
CPU microcode, and the kernel patches which are being polished now.

> That wouldn't help user space threading packages such as npth or
> goroutines that multiplex on top of pthreads, but I'm happy to punt
> those in the immediate term.

Agreed.

> > In the meantime, it does potentially make sense for sensitive userspace
> > processes to be compiled this way. Especially if they're going to run
> > external code (like JavaScript) and attempt to sandbox it — which is
> > something that IBPB isn't going to solve either.
>
> I've suspected that in the immediate term there will likely be some
> sensitive packages compiled with -mretpoline to at least cut down the
> attack surface while the hardware side sorts itself out.  But to totally
> address the problem you have to build the entire system with -mretpoline
> -- and then 18 months out we're unable to turn on something like CET
> because retpolines and CET are fundamentally incompatible.  That seems
> like a losing proposition.

This is one of the reasons I asked HJ for -mindirect-branch-register.
It means that we can runtime patch the thunk into a simple 'jmp *\reg',
which wasn't possible with the original ret-equivalent version (we
didn't have a clobberable register).

Any future CPU with CET is also going to have the new IBRS_ALL feature
which you can turn on and forget, and patch out your retpolines.

https://software.intel.com/sites/default/files/managed/c5/63/336996-Speculative-Execution-Side-Channel-Mitigations.pdf

Of course, that's in the kernel. For runtime patching to work in
userspace, you really do need to put it somewhere other than inline,
and patch/select it once. But I don't care about *that* discussion,
because all I care about right now is reaching agreement on the command
line option and the thunk symbol name. Did I mention that before? :)

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-13  6:26                 ` Kumar, Venkataramanan
  2018-01-13  9:03                   ` David Woodhouse
@ 2018-01-13 16:36                   ` David Woodhouse
  2018-01-13 16:46                     ` H.J. Lu
  2018-01-13 16:46                     ` Van De Ven, Arjan
  1 sibling, 2 replies; 135+ messages in thread
From: David Woodhouse @ 2018-01-13 16:36 UTC (permalink / raw)
  To: Kumar, Venkataramanan, H.J. Lu, Jeff Law, Paul Turner,
	Van De Ven, Arjan, asit.k.mallick
  Cc: Nagarajan, Muthu kumar raj, GCC Patches, Martin Jambor,
	Uros Bizjak (ubizjak@gmail.com),
	Jan Hubicka, Dharmakan, Rohit arul raj

On Sat, 2018-01-13 at 03:11 +0000, Kumar, Venkataramanan wrote:
> 
> > My original patch uses "lfence".  I was asked to use "pause":
> > 
> > https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00969.html
> 
> If everyone is ok, my suggestion is to use  "lfence" as the default
> loop filler for retpoline.
> 
> Please confirm.

I have the same response to this as I did to precisely the same request
in the kernel implementation: I'm OK with it but I'd like an explicit
approval from Intel, and from Paul Turner at Google who architected the
retpoline concept in the first place.

cf. https://marc.info/?l=linux-kernel&m=151585246530355


^ permalink raw reply	[flat|nested] 135+ messages in thread

* RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-13 16:36                   ` David Woodhouse
  2018-01-13 16:46                     ` H.J. Lu
@ 2018-01-13 16:46                     ` Van De Ven, Arjan
  2018-01-14  9:04                       ` Kumar, Venkataramanan
  1 sibling, 1 reply; 135+ messages in thread
From: Van De Ven, Arjan @ 2018-01-13 16:46 UTC (permalink / raw)
  To: David Woodhouse, Kumar, Venkataramanan, H.J. Lu, Jeff Law,
	Paul Turner, Mallick, Asit K
  Cc: Nagarajan, Muthu kumar raj, GCC Patches, Martin Jambor,
	Uros Bizjak (ubizjak@gmail.com),
	Jan Hubicka, Dharmakan, Rohit arul raj

> > If everyone is ok, my suggestion is to use  "lfence" as the default
> > loop filler for retpoline.

Can we do BOTH a pause and an lfence?
(That way, on CPUs where pause is the power stop, it works, and on CPUs where it's a fallthrough (AMD), it goes to the lfence.)



^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-13 16:36                   ` David Woodhouse
@ 2018-01-13 16:46                     ` H.J. Lu
  2018-01-13 16:46                     ` Van De Ven, Arjan
  1 sibling, 0 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-13 16:46 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Kumar, Venkataramanan, Jeff Law, Paul Turner, Van De Ven, Arjan,
	asit.k.mallick, Nagarajan, Muthu kumar raj, GCC Patches,
	Martin Jambor, Uros Bizjak (ubizjak@gmail.com),
	Jan Hubicka, Dharmakan, Rohit arul raj

On Sat, Jan 13, 2018 at 8:34 AM, David Woodhouse <dwmw2@infradead.org> wrote:
> On Sat, 2018-01-13 at 03:11 +0000, Kumar, Venkataramanan wrote:
>>
>> > My original patch uses "lfence".  I was asked to use "pause":
>> >
>> > https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00969.html
>>
>> If everyone is ok, my suggestion is to use  "lfence" as the default
>> loop filler for retpoline.
>>
>> Please confirm.
>
> I have the same response to this as I did to precisely the same request
> in the kernel implementation: I'm OK with it but I'd like an explicit
> approval from Intel, and from Paul Turner at Google who architected the
> retpoline concept in the first place.
>
> cf. https://marc.info/?l=linux-kernel&m=151585246530355
>

There is no urgency on this.  We can take another look AFTER
my current patch set is approved and checked in.


-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
  2018-01-13 16:46                     ` Van De Ven, Arjan
@ 2018-01-14  9:04                       ` Kumar, Venkataramanan
  0 siblings, 0 replies; 135+ messages in thread
From: Kumar, Venkataramanan @ 2018-01-14  9:04 UTC (permalink / raw)
  To: Van De Ven, Arjan, David Woodhouse, H.J. Lu, Jeff Law,
	Paul Turner, Mallick, Asit K
  Cc: Nagarajan, Muthu kumar raj, GCC Patches, Martin Jambor,
	Uros Bizjak (ubizjak@gmail.com),
	Jan Hubicka, Dharmakan, Rohit arul raj

Hi Arjan, 

> -----Original Message-----
> From: Van De Ven, Arjan [mailto:arjan.van.de.ven@intel.com]
> Sent: Saturday, January 13, 2018 10:16 PM
> To: David Woodhouse <dwmw2@infradead.org>; Kumar, Venkataramanan
> <Venkataramanan.Kumar@amd.com>; H.J. Lu <hjl.tools@gmail.com>; Jeff
> Law <law@redhat.com>; Paul Turner <pjt@google.com>; Mallick, Asit K
> <asit.k.mallick@intel.com>
> Cc: Nagarajan, Muthu kumar raj <Muthukumarraj.Nagarajan@amd.com>;
> GCC Patches <gcc-patches@gcc.gnu.org>; Martin Jambor
> <mjambor@suse.cz>; Uros Bizjak (ubizjak@gmail.com)
> <ubizjak@gmail.com>; Jan Hubicka <jh@suse.de>; Dharmakan, Rohit arul raj
> <Rohitarulraj.Dharmakan@amd.com>
> Subject: RE: [PATCH 2/5] x86: Add -mindirect-branch-loop=
> 
> > > If everyone is ok, my suggestion is to use  "lfence" as the default
> > > loop filler for retpoline.
> 
> can we do BOTH a pause and lfence.
> (that way on cpu's where pause is the power stop, it works, and on cpus
> where it's a fallthrough (AMD) it goes to the lfence)
> 

I checked with our architect. The concern is with having only "pause".
Using "pause" followed by "lfence" in the retpoline loop should also be fine for AMD.

Regards,
Venkat.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-12  8:07                   ` Uros Bizjak
  2018-01-12 13:31                     ` H.J. Lu
@ 2018-01-14 16:21                     ` Uros Bizjak
  2018-01-14 16:39                       ` H.J. Lu
  1 sibling, 1 reply; 135+ messages in thread
From: Uros Bizjak @ 2018-01-14 16:21 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Biener, Jeff Law, Jakub Jelinek, Eric Botcazou, GCC Patches

On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>
>> Hi Uros,
>>
>> Can you take a look at my x86 backend changes so that they are ready
>> to check in once we have consensus.
>
> Please finish the talks about the correct approach first. Once the
> consensus is reached, please post the final version of the patches for
> review.
>
> BTW: I have no detailed insight in these issues, so I'll look mostly
> at the implementation details, probably early next week.

One general remark is on the usage of -1 as an invalid register
number. We have INVALID_REGNUM definition for this, and many tests,
like:

      if (regno >= 0)

could become much more informative:

      if (regno != INVALID_REGNUM)
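
To illustrate the remark, here is a minimal standalone sketch, not code
from the patch.  DEMO_INVALID_REGNUM and demo_has_register are made-up
names, and the sentinel value is an assumption chosen to resemble an
all-ones unsigned constant.  It also shows a side benefit: if the
register number is unsigned, a "regno >= 0" test is always true, so only
the comparison against the named sentinel can tell the cases apart.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for a named "no register" sentinel.  The value
   is an assumption for this sketch, not copied from the patch.  */
#define DEMO_INVALID_REGNUM (~(unsigned int) 0)

/* Return true when REGNO names a real register.  With an unsigned
   REGNO, "regno >= 0" could never be false, so the explicit sentinel
   comparison is the only meaningful test.  */
static bool
demo_has_register (unsigned int regno)
{
  return regno != DEMO_INVALID_REGNUM;
}
```

A caller would write "if (demo_has_register (regno))" where the patch
currently tests the sign of regno.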

Uros.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 16:21                     ` Uros Bizjak
@ 2018-01-14 16:39                       ` H.J. Lu
  2018-01-14 16:41                         ` Uros Bizjak
  2018-01-14 16:45                         ` Uros Bizjak
  0 siblings, 2 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-14 16:39 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Richard Biener, Jeff Law, Jakub Jelinek, Eric Botcazou, GCC Patches

On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>
>>> Hi Uros,
>>>
>>> Can you take a look at my x86 backend changes so that they are ready
>>> to check in once we have consensus.
>>
>> Please finish the talks about the correct approach first. Once the
>> consensus is reached, please post the final version of the patches for
>> review.
>>
>> BTW: I have no detailed insight in these issues, so I'll look mostly
>> at the implementation details, probably early next week.
>
> One general remark is on the usage of -1 as an invalid register

This has been rewritten.  The checked in patch no longer does that.

> number. We have INVALID_REGNUM definition for this, and many tests,
> like:
>
>       if (regno >= 0)
>
> could become much more informative:
>
>       if (regno != INVALID_REGNUM)



-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 16:39                       ` H.J. Lu
@ 2018-01-14 16:41                         ` Uros Bizjak
  2018-01-14 16:43                           ` H.J. Lu
  2018-01-14 16:45                         ` Uros Bizjak
  1 sibling, 1 reply; 135+ messages in thread
From: Uros Bizjak @ 2018-01-14 16:41 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Biener, Jeff Law, Jakub Jelinek, Eric Botcazou, GCC Patches

On Sun, Jan 14, 2018 at 5:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>
>>>> Hi Uros,
>>>>
>>>> Can you take a look at my x86 backend changes so that they are ready
>>>> to check in once we have consensus.
>>>
>>> Please finish the talks about the correct approach first. Once the
>>> consensus is reached, please post the final version of the patches for
>>> review.
>>>
>>> BTW: I have no detailed insight in these issues, so I'll look mostly
>>> at the implementation details, probably early next week.
>>
>> One general remark is on the usage of -1 as an invalid register
>
> This has been rewritten.  The checked in patch no longer does that.

Another issue:

+static void
+indirect_thunk_name (char name[32], int regno, bool need_bnd_p)
+{
+  if (USE_HIDDEN_LINKONCE)
+    {
+      const char *bnd = need_bnd_p ? "_bnd" : "";
+      if (regno >= 0)
+        {
+          const char *reg_prefix;
+          if (LEGACY_INT_REGNO_P (regno))
+            reg_prefix = TARGET_64BIT ? "r" : "e";
+          else
+            reg_prefix = "";
+          sprintf (name, "__x86_indirect_thunk%s_%s%s",
+                   bnd, reg_prefix, reg_names[regno]);
+        }
+      else
+        sprintf (name, "__x86_indirect_thunk%s", bnd);
+    }

What is the benefit of reg_prefix? Can't we just live with e.g.:

__x86_indirect_thunk_ax

which is the true register name and is valid for 32bit and 64bit targets.
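
For comparison, the two naming schemes can be sketched outside of GCC
roughly as follows.  This is only an illustration: demo_thunk_name and
its parameters are hypothetical, and the "legacy" flag stands in for
what LEGACY_INT_REGNO_P decides in the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not GCC code.  Build a thunk symbol name for
   register REG.  When WITH_PREFIX is true, legacy registers get the
   "r" (64-bit) or "e" (32-bit) prefix as in the quoted patch; when it
   is false, the bare register name is used on both targets.  */
static void
demo_thunk_name (char *buf, size_t len, const char *reg,
                 bool legacy, bool is_64bit, bool with_prefix)
{
  const char *prefix = "";

  if (with_prefix && legacy)
    prefix = is_64bit ? "r" : "e";
  snprintf (buf, len, "__x86_indirect_thunk_%s%s", prefix, reg);
}
```

With the prefix, "ax" yields __x86_indirect_thunk_rax or
__x86_indirect_thunk_eax depending on the target; without it, both
targets share __x86_indirect_thunk_ax.  Registers such as r8 are
unaffected either way.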

Uros.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 16:41                         ` Uros Bizjak
@ 2018-01-14 16:43                           ` H.J. Lu
  2018-01-14 16:48                             ` Jakub Jelinek
  2018-01-14 16:58                             ` Uros Bizjak
  0 siblings, 2 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-14 16:43 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Richard Biener, Jeff Law, Jakub Jelinek, Eric Botcazou, GCC Patches

On Sun, Jan 14, 2018 at 8:39 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sun, Jan 14, 2018 at 5:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>> On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>
>>>>> Hi Uros,
>>>>>
>>>>> Can you take a look at my x86 backend changes so that they are ready
>>>>> to check in once we have consensus.
>>>>
>>>> Please finish the talks about the correct approach first. Once the
>>>> consensus is reached, please post the final version of the patches for
>>>> review.
>>>>
>>>> BTW: I have no detailed insight in these issues, so I'll look mostly
>>>> at the implementation details, probably early next week.
>>>
>>> One general remark is on the usage of -1 as an invalid register
>>
>> This has been rewritten.  The checked in patch no longer does that.
>
> Another issue:
>
> +static void
> +indirect_thunk_name (char name[32], int regno, bool need_bnd_p)
> +{
> +  if (USE_HIDDEN_LINKONCE)
> +    {
> +      const char *bnd = need_bnd_p ? "_bnd" : "";
> +      if (regno >= 0)
> +    {
> +      const char *reg_prefix;
> +      if (LEGACY_INT_REGNO_P (regno))
> +        reg_prefix = TARGET_64BIT ? "r" : "e";
> +      else
> +        reg_prefix = "";
> +      sprintf (name, "__x86_indirect_thunk%s_%s%s",
> +           bnd, reg_prefix, reg_names[regno]);
> +    }
> +      else
> +    sprintf (name, "__x86_indirect_thunk%s", bnd);
> +    }
>
> What is the benefit of reg_prefix? Can't we just live with e.g.:
>
> __x86_indirect_thunk_ax
>
> which is the true register name and is valid for 32bit and 64bit targets.

They are used in asm statements in kernel:

extern void (*func_p) (void);

void
foo (void)
{
  asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p));
}

it generates:

foo:
movq func_p(%rip), %rax
call __x86_indirect_thunk_rax
ret


-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 16:39                       ` H.J. Lu
  2018-01-14 16:41                         ` Uros Bizjak
@ 2018-01-14 16:45                         ` Uros Bizjak
  2018-01-16 19:57                           ` Uros Bizjak
  1 sibling, 1 reply; 135+ messages in thread
From: Uros Bizjak @ 2018-01-14 16:45 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Biener, Jeff Law, Jakub Jelinek, Eric Botcazou, GCC Patches

On Sun, Jan 14, 2018 at 5:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>
>>>> Hi Uros,
>>>>
>>>> Can you take a look at my x86 backend changes so that they are ready
>>>> to check in once we have consensus.
>>>
>>> Please finish the talks about the correct approach first. Once the
>>> consensus is reached, please post the final version of the patches for
>>> review.
>>>
>>> BTW: I have no detailed insight in these issues, so I'll look mostly
>>> at the implementation details, probably early next week.
>>
>> One general remark is on the usage of -1 as an invalid register
>
> This has been rewritten.  The checked in patch no longer does that.

I'm looking directly into current indirect_thunk_name,
output_indirect_thunk and output_indirect_thunk_function functions in
i386.c which have plenty of the mentioned checks.

Uros.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 16:43                           ` H.J. Lu
@ 2018-01-14 16:48                             ` Jakub Jelinek
  2018-01-14 17:17                               ` H.J. Lu
  2018-01-14 16:58                             ` Uros Bizjak
  1 sibling, 1 reply; 135+ messages in thread
From: Jakub Jelinek @ 2018-01-14 16:48 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Uros Bizjak, Richard Biener, Jeff Law, Eric Botcazou, GCC Patches

On Sun, Jan 14, 2018 at 08:41:54AM -0800, H.J. Lu wrote:
> They are used in asm statements in kernel:
> 
> extern void (*func_p) (void);
> 
> void
> foo (void)
> {
>   asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p));

Well, using it with just a single register class wouldn't make much sense,
then you can just use "call __x86_indirect_thunk_rax"
or "call __x86_indirect_thunk_eax" depending on __x86_64__, you wouldn't
need to extend anything.
But supposedly if you use it with "r" or "q" or similar class this will be
different.

	Jakub

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 16:43                           ` H.J. Lu
  2018-01-14 16:48                             ` Jakub Jelinek
@ 2018-01-14 16:58                             ` Uros Bizjak
  2018-01-14 16:58                               ` H.J. Lu
  1 sibling, 1 reply; 135+ messages in thread
From: Uros Bizjak @ 2018-01-14 16:58 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Biener, Jeff Law, Jakub Jelinek, Eric Botcazou, GCC Patches

On Sun, Jan 14, 2018 at 5:41 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
> On Sun, Jan 14, 2018 at 8:39 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> On Sun, Jan 14, 2018 at 5:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>> On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>> On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>>> On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>
>>>>>> Hi Uros,
>>>>>>
>>>>>> Can you take a look at my x86 backend changes so that they are ready
>>>>>> to check in once we have consensus.
>>>>>
>>>>> Please finish the talks about the correct approach first. Once the
>>>>> consensus is reached, please post the final version of the patches for
>>>>> review.
>>>>>
>>>>> BTW: I have no detailed insight in these issues, so I'll look mostly
>>>>> at the implementation details, probably early next week.
>>>>
>>>> One general remark is on the usage of -1 as an invalid register
>>>
>>> This has been rewritten.  The checked in patch no longer does that.
>>
>> Another issue:
>>
>> +static void
>> +indirect_thunk_name (char name[32], int regno, bool need_bnd_p)
>> +{
>> +  if (USE_HIDDEN_LINKONCE)
>> +    {
>> +      const char *bnd = need_bnd_p ? "_bnd" : "";
>> +      if (regno >= 0)
>> +    {
>> +      const char *reg_prefix;
>> +      if (LEGACY_INT_REGNO_P (regno))
>> +        reg_prefix = TARGET_64BIT ? "r" : "e";
>> +      else
>> +        reg_prefix = "";
>> +      sprintf (name, "__x86_indirect_thunk%s_%s%s",
>> +           bnd, reg_prefix, reg_names[regno]);
>> +    }
>> +      else
>> +    sprintf (name, "__x86_indirect_thunk%s", bnd);
>> +    }
>>
>> What is the benefit of reg_prefix? Can't we just live with e.g.:
>>
>> __x86_indirect_thunk_ax
>>
>> which is the true register name and is valid for 32bit and 64bit targets.
>
> They are used in asm statements in kernel:
>
> extern void (*func_p) (void);
>
> void
> foo (void)
> {
>   asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p));
> }
>
> it generates:
>
> foo:
> movq func_p(%rip), %rax
> call __x86_indirect_thunk_rax
> ret

Please fix %V to output reg_name instead.

Uros.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 16:58                             ` Uros Bizjak
@ 2018-01-14 16:58                               ` H.J. Lu
  0 siblings, 0 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-14 16:58 UTC (permalink / raw)
  To: Uros Bizjak, Woodhouse, David
  Cc: Richard Biener, Jeff Law, Jakub Jelinek, Eric Botcazou, GCC Patches

On Sun, Jan 14, 2018 at 8:48 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sun, Jan 14, 2018 at 5:41 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Sun, Jan 14, 2018 at 8:39 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Sun, Jan 14, 2018 at 5:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>> On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>>> On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>>>> On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Uros,
>>>>>>>
>>>>>>> Can you take a look at my x86 backend changes so that they are ready
>>>>>>> to check in once we have consensus.
>>>>>>
>>>>>> Please finish the talks about the correct approach first. Once the
>>>>>> consensus is reached, please post the final version of the patches for
>>>>>> review.
>>>>>>
>>>>>> BTW: I have no detailed insight in these issues, so I'll look mostly
>>>>>> at the implementation details, probably early next week.
>>>>>
>>>>> One general remark is on the usage of -1 as an invalid register
>>>>
>>>> This has been rewritten.  The checked in patch no longer does that.
>>>
>>> Another issue:
>>>
>>> +static void
>>> +indirect_thunk_name (char name[32], int regno, bool need_bnd_p)
>>> +{
>>> +  if (USE_HIDDEN_LINKONCE)
>>> +    {
>>> +      const char *bnd = need_bnd_p ? "_bnd" : "";
>>> +      if (regno >= 0)
>>> +    {
>>> +      const char *reg_prefix;
>>> +      if (LEGACY_INT_REGNO_P (regno))
>>> +        reg_prefix = TARGET_64BIT ? "r" : "e";
>>> +      else
>>> +        reg_prefix = "";
>>> +      sprintf (name, "__x86_indirect_thunk%s_%s%s",
>>> +           bnd, reg_prefix, reg_names[regno]);
>>> +    }
>>> +      else
>>> +    sprintf (name, "__x86_indirect_thunk%s", bnd);
>>> +    }
>>>
>>> What is the benefit of reg_prefix? Can't we just live with e.g.:
>>>
>>> __x86_indirect_thunk_ax
>>>
>>> which is the true register name and is valid for 32bit and 64bit targets.
>>
>> They are used in asm statements in kernel:
>>
>> extern void (*func_p) (void);
>>
>> void
>> foo (void)
>> {
>>   asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p));
>> }
>>
>> it generates:
>>
>> foo:
>> movq func_p(%rip), %rax
>> call __x86_indirect_thunk_rax
>> ret
>
> Please fix %V to output reg_name instead.
>

David, please comment.

-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 16:48                             ` Jakub Jelinek
@ 2018-01-14 17:17                               ` H.J. Lu
  2018-01-14 17:51                                 ` Woodhouse, David
  0 siblings, 1 reply; 135+ messages in thread
From: H.J. Lu @ 2018-01-14 17:17 UTC (permalink / raw)
  To: Jakub Jelinek, Woodhouse, David
  Cc: Uros Bizjak, Richard Biener, Jeff Law, Eric Botcazou, GCC Patches

On Sun, Jan 14, 2018 at 8:45 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Sun, Jan 14, 2018 at 08:41:54AM -0800, H.J. Lu wrote:
>> They are used in asm statements in kernel:
>>
>> extern void (*func_p) (void);
>>
>> void
>> foo (void)
>> {
>>   asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p));
>
> Well, using it with just a single register class wouldn't make much sense,
> then you can just use "call __x86_indirect_thunk_rax"
> or "call __x86_indirect_thunk_eax" depending on __x86_64__, you wouldn't
> need to extend anything.
> But supposedly if you use it with "r" or "q" or similar class this will be
> different.
>

I believe "r" is allowed.


-- 
H.J.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 17:17                               ` H.J. Lu
@ 2018-01-14 17:51                                 ` Woodhouse, David
  2018-01-14 18:52                                   ` Uros Bizjak
  0 siblings, 1 reply; 135+ messages in thread
From: Woodhouse, David @ 2018-01-14 17:51 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Jakub Jelinek, Richard Biener, Uros Bizjak, Jeff Law,
	Eric Botcazou, GCC Patches

This won't make the list; I'll send a more coherent and less HTML-afflicted version later.

The bare 'ax' naming made it painful to instantiate the external thunks for 32-bit and 64-bit code because we had to put the e/r back again inside the .irp reg ax bx... code.

We could probably have lived with that but it would be painful to change now that Linux and Xen patches with the current ABI are all lined up. I appreciate they weren't in GCC yet so we get little sympathy but these are strange times and we had to move fast.

I'd really like *not* to change it now. Having the thunk name actually include the name of the register it's using does seem nicer anyway...

On 14 Jan 2018 17:58, "H.J. Lu" <hjl.tools@gmail.com> wrote:

On Sun, Jan 14, 2018 at 8:45 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Sun, Jan 14, 2018 at 08:41:54AM -0800, H.J. Lu wrote:
>> They are used in asm statements in kernel:
>>
>> extern void (*func_p) (void);
>>
>> void
>> foo (void)
>> {
>>   asm ("call __x86_indirect_thunk_%V0" : : "a" (func_p));
>
> Well, using it with just a single register class wouldn't make much sense,
> then you can just use "call __x86_indirect_thunk_rax"
> or "call __x86_indirect_thunk_eax" depending on __x86_64__, you wouldn't
> need to extend anything.
> But supposedly if you use it with "r" or "q" or similar class this will be
> different.
>

I believe "r" is allowed.


--
H.J.




^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 17:51                                 ` Woodhouse, David
@ 2018-01-14 18:52                                   ` Uros Bizjak
  2018-01-14 20:35                                     ` Uros Bizjak
  2018-01-17 18:34                                     ` Woodhouse, David
  0 siblings, 2 replies; 135+ messages in thread
From: Uros Bizjak @ 2018-01-14 18:52 UTC (permalink / raw)
  To: Woodhouse, David
  Cc: H.J. Lu, Jakub Jelinek, Richard Biener, Jeff Law, Eric Botcazou,
	GCC Patches

On Sun, Jan 14, 2018 at 6:44 PM, Woodhouse, David <dwmw@amazon.co.uk> wrote:
> This won't make the list; I'll send a more coherent and less HTML-afflicted
> version later.
>
> The bare 'ax' naming made it painful to instantiate the external thunks for
> 32-bit and 64-bit code because we had to put the e/r back again inside the
> .irp reg ax bx... code.
>
> We could probably have lived with that but it would be painful to change now
> that Linux and Xen patches with the current ABI are all lined up. I
> appreciate they weren't in GCC yet so we get little sympathy but these are
> strange times and we had to move fast.
>
> I'd really like *not* to change it now. Having the thunk name actually
> include the name of the register it's using does seem nicer anyway...

That's unfortunate... I suspect that in the future, one will need
#ifdef __x86_64__ around eventual calls to thunks from C code because
of this decision, since thunks for the x86_64 target will have different
names than thunks for the x86_32 target. I don't know if the (single?)
case of mixing 32 and 64 bit assembly in the highly specialized part
of the kernel really warrants this decision. Future programmers will
be grateful if kernel people can reconsider their choice in the
not-yet-released ABI.
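
The kind of conditional this would require can be sketched as below.
The macro and function names are hypothetical and only illustrate the
point; on a non-x86 host the sketch simply falls into the "e" branch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macros for illustration; not taken from the kernel or
   from GCC.  The thunk symbol name differs per target, so C code that
   refers to it needs a target conditional.  */
#ifdef __x86_64__
# define DEMO_THUNK_NAME(reg) "__x86_indirect_thunk_r" #reg
#else
# define DEMO_THUNK_NAME(reg) "__x86_indirect_thunk_e" #reg
#endif

/* Return the prefixed thunk symbol name for the ax register.  */
static const char *
demo_ax_thunk_name (void)
{
  return DEMO_THUNK_NAME (ax);
}
```

With a bare __x86_indirect_thunk_ax name, the #ifdef would not be
needed.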

Uros.

^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 18:52                                   ` Uros Bizjak
@ 2018-01-14 20:35                                     ` Uros Bizjak
  2018-01-14 20:44                                       ` David Woodhouse
  2018-01-17 18:34                                     ` Woodhouse, David
  1 sibling, 1 reply; 135+ messages in thread
From: Uros Bizjak @ 2018-01-14 20:35 UTC (permalink / raw)
  To: Woodhouse, David
  Cc: H.J. Lu, Jakub Jelinek, Richard Biener, Jeff Law, Eric Botcazou,
	GCC Patches

On Sun, Jan 14, 2018 at 7:22 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sun, Jan 14, 2018 at 6:44 PM, Woodhouse, David <dwmw@amazon.co.uk> wrote:
>> This won't make the list; I'll send a more coherent and less HTML-afflicted
>> version later.
>>
>> The bare 'ax' naming made it painful to instantiate the external thunks for
>> 32-bit and 64-bit code because we had to put the e/r back again inside the
>> .irp reg ax bx... code.
>>
>> We could probably have lived with that but it would be painful to change now
>> that Linux and Xen patches with the current ABI are all lined up. I
>> appreciate they weren't in GCC yet so we get little sympathy but these are
>> strange times and we had to move fast.
>>
>> I'd really like *not* to change it now. Having the thunk name actually
>> include the name of the register it's using does seem nicer anyway...
>
> That's unfortunate... I suspect that in the future, one will need
> #ifdef __x86_64__ around eventual calls to thunks from C code because
> of this decision, since thunks for x86_64 target will have different
> names than thunks for x86_32 target. I don't know if the (single?)
> case of mixing 32 and 64 bit assembly in the highly specialized part
> of the kernel really warrants this decision. Future programmers will
> be grateful if kernel people can re-consider their choice in
> not-yet-released ABI.

A quick look through latest x86/pti update [1] shows:

+#ifdef CONFIG_X86_32
+#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_e ## reg(void);
+#else
+#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_r ## reg(void);
+INDIRECT_THUNK(8)
+INDIRECT_THUNK(9)
+INDIRECT_THUNK(10)
+INDIRECT_THUNK(11)
+INDIRECT_THUNK(12)
+INDIRECT_THUNK(13)
+INDIRECT_THUNK(14)
+INDIRECT_THUNK(15)
+#endif
+INDIRECT_THUNK(ax)
+INDIRECT_THUNK(bx)
+INDIRECT_THUNK(cx)
+INDIRECT_THUNK(dx)
+INDIRECT_THUNK(si)
+INDIRECT_THUNK(di)
+INDIRECT_THUNK(bp)
+INDIRECT_THUNK(sp)

and:

+/*
+ * Despite being an assembler file we can't just use .irp here
+ * because __KSYM_DEPS__ only uses the C preprocessor and would
+ * only see one instance of "__x86_indirect_thunk_\reg" rather
+ * than one per register with the correct names. So we do it
+ * the simple and nasty way...
+ */
+#define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg)
+#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg)
+
+GENERATE_THUNK(_ASM_AX)
+GENERATE_THUNK(_ASM_BX)
+GENERATE_THUNK(_ASM_CX)
+GENERATE_THUNK(_ASM_DX)
+GENERATE_THUNK(_ASM_SI)
+GENERATE_THUNK(_ASM_DI)
+GENERATE_THUNK(_ASM_BP)
+GENERATE_THUNK(_ASM_SP)
+#ifdef CONFIG_64BIT
+GENERATE_THUNK(r8)
+GENERATE_THUNK(r9)
+GENERATE_THUNK(r10)
+GENERATE_THUNK(r11)
+GENERATE_THUNK(r12)
+GENERATE_THUNK(r13)
+GENERATE_THUNK(r14)
+GENERATE_THUNK(r15)

I have a feeling that using e.g. __x86_indirect_thunk_ax would be more
convenient in both cases.

[1] https://www.spinics.net/lists/kernel/msg2697606.html

Uros.

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 20:35                                     ` Uros Bizjak
@ 2018-01-14 20:44                                       ` David Woodhouse
  2018-01-14 21:03                                         ` Uros Bizjak
  0 siblings, 1 reply; 135+ messages in thread
From: David Woodhouse @ 2018-01-14 20:44 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: H.J. Lu, Jakub Jelinek, Richard Biener, Jeff Law, Eric Botcazou,
	GCC Patches

On Sun, 2018-01-14 at 21:21 +0100, Uros Bizjak wrote:
> A quick look through latest x86/pti update [1] shows:
> 
> +#ifdef CONFIG_X86_32
> +#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_e ## reg(void);
> +#else
> +#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_r ## reg(void);
> +INDIRECT_THUNK(8)
> +INDIRECT_THUNK(9)
> +INDIRECT_THUNK(10)
> +INDIRECT_THUNK(11)
> +INDIRECT_THUNK(12)
> +INDIRECT_THUNK(13)
> +INDIRECT_THUNK(14)
> +INDIRECT_THUNK(15)
> +#endif
> +INDIRECT_THUNK(ax)
> +INDIRECT_THUNK(bx)
> +INDIRECT_THUNK(cx)
> +INDIRECT_THUNK(dx)
> +INDIRECT_THUNK(si)
> +INDIRECT_THUNK(di)
> +INDIRECT_THUNK(bp)
> +INDIRECT_THUNK(sp)

Yeah, that one is purely for the CONFIG_MODVERSIONS system, which I'm
hoping to fix properly by not having to have fake (and clearly
incorrect) C prototypes for the thunks which aren't actually C
functions. It's intended to go away.


> and:
> 
> +/*
> + * Despite being an assembler file we can't just use .irp here
> + * because __KSYM_DEPS__ only uses the C preprocessor and would
> + * only see one instance of "__x86_indirect_thunk_\reg" rather
> + * than one per register with the correct names. So we do it
> + * the simple and nasty way...
> + */
> +#define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg)
> +#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg)
> +
> +GENERATE_THUNK(_ASM_AX)
> +GENERATE_THUNK(_ASM_BX)
> +GENERATE_THUNK(_ASM_CX)
> +GENERATE_THUNK(_ASM_DX)
> +GENERATE_THUNK(_ASM_SI)
> +GENERATE_THUNK(_ASM_DI)
> +GENERATE_THUNK(_ASM_BP)
> +GENERATE_THUNK(_ASM_SP)
> +#ifdef CONFIG_64BIT
> +GENERATE_THUNK(r8)
> +GENERATE_THUNK(r9)
> +GENERATE_THUNK(r10)
> +GENERATE_THUNK(r11)
> +GENERATE_THUNK(r12)
> +GENERATE_THUNK(r13)
> +GENERATE_THUNK(r14)
> +GENERATE_THUNK(r15)
> 
> I have a feeling that using e.g. __x86_indirect_thunk_ax would be more
> convenient in both cases.

Likewise, the CONFIG_TRIM_UNUSED_SYMBOLS mechanism in the kernel passes
.S files through the preprocessor and looks for EXPORT_SYMBOL, so it
wasn't working well with my .irp-based implementation like the one in
Xen. So I've swapped it out for this one for now.

Again, I was hoping to clean that up and make it do something saner,
and then this could switch back too.

But sure, right now it isn't that much of a difference for me; my
implementation has changed since I made that request. I have no
fundamental technical objection to the bare 'ax' naming. We can live
with either.

It's just that we've been asking for an agreement on the basics (the
command line we use, and the thunk names) for some days now, and this
is the first time we've had this discussion, and Linus has just taken
the patches. 

That's still fine. I know we get no sympathy, and we *can* change the
Linux kernel between -rc8 and -final if we must, and change the Xen
patches too. I'd just rather not.

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 20:44                                       ` David Woodhouse
@ 2018-01-14 21:03                                         ` Uros Bizjak
  2018-01-14 21:07                                           ` David Woodhouse
  2018-01-15  8:21                                           ` David Woodhouse
  0 siblings, 2 replies; 135+ messages in thread
From: Uros Bizjak @ 2018-01-14 21:03 UTC (permalink / raw)
  To: David Woodhouse
  Cc: H.J. Lu, Jakub Jelinek, Richard Biener, Jeff Law, Eric Botcazou,
	GCC Patches

On Sun, Jan 14, 2018 at 9:34 PM, David Woodhouse <dwmw2@infradead.org> wrote:

> Likewise, the CONFIG_TRIM_UNUSED_SYMBOLS mechanism in the kernel passes
> .S files through the preprocessor and looks for EXPORT_SYMBOL, so it
> wasn't working well with my .irp-based implementation like the one in
> Xen. So I've swapped it out for this one for now.
>
> Again, I was hoping to clean that up and make it do something saner,
> and then this could switch back too.
>
> But sure, right now it isn't that much of a difference for me; my
> implementation has changed since I made that request. I have no
> fundamental technical objection to the bare 'ax' naming. We can live
> with either.
>
> It's just that we've been asking for an agreement on the basics (the
> command line we use, and the thunk names) for some days now, and this
> is the first time we've had this discussion, and Linus has just taken
> the patches.
>
> That's still fine. I know we get no sympathy, and we *can* change the
> Linux kernel between -rc8 and -final if we must, and change the Xen
> patches too. I'd just rather not.

Well, you did say that these are strange times ;)

From the user perspective, it would be more convenient to use the
thunk names that are the same for 32bit and 64bit targets. If we
ignore this fact, the difference is only a couple of lines in the
compiler source which we also can live with. But please discuss my
proposal also in the kernel community, and weigh the benefits and
drawbacks of each approach before the final decision.

Please pass the final decision to gcc community, and we'll implement it.

Uros.

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 21:03                                         ` Uros Bizjak
@ 2018-01-14 21:07                                           ` David Woodhouse
  2018-01-14 21:09                                             ` Linus Torvalds
  2018-01-15  8:21                                           ` David Woodhouse
  1 sibling, 1 reply; 135+ messages in thread
From: David Woodhouse @ 2018-01-14 21:07 UTC (permalink / raw)
  To: Uros Bizjak, torvalds, Thomas Gleixner
  Cc: H.J. Lu, Jakub Jelinek, Richard Biener, Jeff Law, Eric Botcazou,
	GCC Patches

On Sun, 2018-01-14 at 21:52 +0100, Uros Bizjak wrote:
> On Sun, Jan 14, 2018 at 9:34 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> > But sure, right now it isn't that much of a difference for me; my
> > implementation has changed since I made that request. I have no
> > fundamental technical objection to the bare 'ax' naming. We can live
> > with either.
> > 
> > It's just that we've been asking for an agreement on the basics (the
> > command line we use, and the thunk names) for some days now, and this
> > is the first time we've had this discussion, and Linus has just taken
> > the patches.
> > 
> > That's still fine. I know we get no sympathy, and we *can* change the
> > Linux kernel between -rc8 and -final if we must, and change the Xen
> > patches too. I'd just rather not.
> Well, you did say that these are strange times ;)
> 
> From the user perspective, it would be more convenient to use the
> thunk names that are the same for 32bit and 64bit targets. If we
> ignore this fact, the difference is only a couple of lines in the
> compiler source which we also can live with. But please discuss my
> proposal also in the kernel community, and weigh the benefits and
> drawbacks of each approach before the final decision.
> 
> Please pass the final decision to gcc community, and we'll implement it.

+Linus, Thomas.

Review on the GCC patches has led to a request that the thunk symbols
be changed from e.g. __x86_indirect_thunk_rax to
__x86_indirect_thunk_ax without the 'r'.

If we're going to change the thunk names, it's best to do it *right*
now before the 4.15-rc8 release.

I genuinely don't care at this point what the thunk names are. It's
just that Linus is probably preparing the -rc8 release as we speak, and
I'd want to do a new compiler build and set of tests if we make the
change. For that reason alone, I'm inclined to answer that we should
leave them as they are.

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 21:07                                           ` David Woodhouse
@ 2018-01-14 21:09                                             ` Linus Torvalds
  2018-01-14 21:16                                               ` David Woodhouse
                                                                 ` (2 more replies)
  0 siblings, 3 replies; 135+ messages in thread
From: Linus Torvalds @ 2018-01-14 21:09 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Uros Bizjak, Thomas Gleixner, H.J. Lu, Jakub Jelinek,
	Richard Biener, Jeff Law, Eric Botcazou, GCC Patches

On Sun, Jan 14, 2018 at 1:02 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> Review on the GCC patches has led to a request that the thunk symbols
> be changed from e.g. __x86_indirect_thunk_rax to
> __x86_indirect_thunk_ax without the 'r'.

Ok. I think that just makes it easier for us, since then the names are
independent of 32-vs/64, and we don't need to use the _ASM_XY names.

What about r8-r15? I'm assuming 'r' there is used?

Mind sending me a tested patch? I was indeed planning on generating
rc8, but I might as well go grocery shopping now instead, and do rc8
later in the evening.

                Linus

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 21:09                                             ` Linus Torvalds
@ 2018-01-14 21:16                                               ` David Woodhouse
  2018-01-14 21:23                                                 ` Thomas Gleixner
  2018-01-14 21:49                                               ` H.J. Lu
  2018-01-14 22:03                                               ` David Woodhouse
  2 siblings, 1 reply; 135+ messages in thread
From: David Woodhouse @ 2018-01-14 21:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Uros Bizjak, Thomas Gleixner, H.J. Lu, Jakub Jelinek,
	Richard Biener, Jeff Law, Eric Botcazou, GCC Patches

On Sun, 2018-01-14 at 13:07 -0800, Linus Torvalds wrote:
> On Sun, Jan 14, 2018 at 1:02 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> > Review on the GCC patches has led to a request that the thunk symbols
> > be changed from e.g. __x86_indirect_thunk_rax to
> > __x86_indirect_thunk_ax without the 'r'.
> 
> Ok. I think that just makes it easier for us, since then the names are
> independent of 32-vs/64, and we don't need to use the _ASM_XY names.
> 
> What about r8-r15? I'm assuming 'r' there is used?
> 
> Mind sending me a tested patch? I was indeed planning on generating
> rc8, but I might as well go grocery shopping now instead, and do rc8
> later in the evening.

I'll kick off a compiler build now...

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 21:16                                               ` David Woodhouse
@ 2018-01-14 21:23                                                 ` Thomas Gleixner
  0 siblings, 0 replies; 135+ messages in thread
From: Thomas Gleixner @ 2018-01-14 21:23 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Linus Torvalds, Uros Bizjak, H.J. Lu, Jakub Jelinek,
	Richard Biener, Jeff Law, Eric Botcazou, GCC Patches

On Sun, 14 Jan 2018, David Woodhouse wrote:

> On Sun, 2018-01-14 at 13:07 -0800, Linus Torvalds wrote:
> > On Sun, Jan 14, 2018 at 1:02 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> > > Review on the GCC patches has led to a request that the thunk symbols
> > > be changed from e.g. __x86_indirect_thunk_rax to
> > > __x86_indirect_thunk_ax without the 'r'.
> > 
> > Ok. I think that just makes it easier for us, since then the names are
> > independent of 32-vs/64, and we don't need to use the _ASM_XY names.
> > 
> > What about r8-r15? I'm assuming 'r' there is used?
> > 
> > Mind sending me a tested patch? I was indeed planning on generating
> > rc8, but I might as well go grocery shopping now instead, and do rc8
> > later in the evening.
> 
> I'll kick off a compiler build now...

Send the patch to me/LKML. I'm queueing the compile time warning removal
and then can add that one on top so Linus can pull the lot.

Thanks,

	tglx

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 21:09                                             ` Linus Torvalds
  2018-01-14 21:16                                               ` David Woodhouse
@ 2018-01-14 21:49                                               ` H.J. Lu
  2018-01-14 22:03                                               ` David Woodhouse
  2 siblings, 0 replies; 135+ messages in thread
From: H.J. Lu @ 2018-01-14 21:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Woodhouse, Uros Bizjak, Thomas Gleixner, Jakub Jelinek,
	Richard Biener, Jeff Law, Eric Botcazou, GCC Patches

On Sun, Jan 14, 2018 at 1:07 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sun, Jan 14, 2018 at 1:02 PM, David Woodhouse <dwmw2@infradead.org> wrote:
>> Review on the GCC patches has led to a request that the thunk symbols
>> be changed from e.g. __x86_indirect_thunk_rax to
>> __x86_indirect_thunk_ax without the 'r'.
>
> Ok. I think that just makes it easier for us, since then the names are
> independent of 32-vs/64, and we don't need to use the _ASM_XY names.
>
> What about r8-r15? I'm assuming 'r' there is used?

They will remain r8-r15.

> Mind sending me a tested patch? I was indeed planning on generating
> rc8, but I might as well go grocery shopping now instead, and do rc8
> later in the evening.
>
>                 Linus



-- 
H.J.

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 21:09                                             ` Linus Torvalds
  2018-01-14 21:16                                               ` David Woodhouse
  2018-01-14 21:49                                               ` H.J. Lu
@ 2018-01-14 22:03                                               ` David Woodhouse
  2018-01-14 22:09                                                 ` H.J. Lu
  2 siblings, 1 reply; 135+ messages in thread
From: David Woodhouse @ 2018-01-14 22:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Uros Bizjak, Thomas Gleixner, H.J. Lu, Jakub Jelinek,
	Richard Biener, Jeff Law, Eric Botcazou, GCC Patches

On Sun, 2018-01-14 at 13:07 -0800, Linus Torvalds wrote:
> On Sun, Jan 14, 2018 at 1:02 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> > 
> > Review on the GCC patches has led to a request that the thunk symbols
> > be changed from e.g. __x86_indirect_thunk_rax to
> > __x86_indirect_thunk_ax without the 'r'.
> Ok. I think that just makes it easier for us, since then the names are
> independent of 32-vs/64, and we don't need to use the _ASM_XY names.
> 
> What about r8-r15? I'm assuming 'r' there is used?

Ah yes, *this* is why I hated it... for 'r8' onwards that is indeed the
register names as well as the suffix of the thunk name. But for the
legacy registers I have to prepend 'e' or 'r' myself in the macro. So
it ends up looking like this:


.macro THUNK reg
	.section .text.__x86.indirect_thunk.\reg

ENTRY(__x86_indirect_thunk_\reg)
	CFI_STARTPROC
	$done = 0
	.irp xreg r8 r9 r10 r11 r12 r13 r14 r15
	.ifeqs "\reg", "\xreg"
		JMP_NOSPEC %\reg
		$done = 1
	.endif
	.endr
	.if $done != 1
		JMP_NOSPEC %__ASM_REG(\reg)
	.endif
	CFI_ENDPROC
ENDPROC(__x86_indirect_thunk_\reg)
.endm

/*
 * Despite being an assembler file we can't just use .irp here
 * because __KSYM_DEPS__ only uses the C preprocessor and would
 * only see one instance of "__x86_indirect_thunk_\reg" rather
 * than one per register with the correct names. So we do it
 * the simple and nasty way...
 */
#define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg)
#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg)

GENERATE_THUNK(ax)
GENERATE_THUNK(bx)
GENERATE_THUNK(cx)
GENERATE_THUNK(dx)
GENERATE_THUNK(si)
GENERATE_THUNK(di)
GENERATE_THUNK(bp)
#ifdef CONFIG_64BIT
GENERATE_THUNK(r8)
GENERATE_THUNK(r9)
GENERATE_THUNK(r10)
GENERATE_THUNK(r11)
GENERATE_THUNK(r12)
GENERATE_THUNK(r13)
GENERATE_THUNK(r14)
GENERATE_THUNK(r15)
#endif


And *that* was the point at which I asked HJ to just use the proper
bloody register names :)

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 22:03                                               ` David Woodhouse
@ 2018-01-14 22:09                                                 ` H.J. Lu
  2018-01-14 22:12                                                   ` David Woodhouse
  0 siblings, 1 reply; 135+ messages in thread
From: H.J. Lu @ 2018-01-14 22:09 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Linus Torvalds, Uros Bizjak, Thomas Gleixner, Jakub Jelinek,
	Richard Biener, Jeff Law, Eric Botcazou, GCC Patches

On Sun, Jan 14, 2018 at 1:58 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> On Sun, 2018-01-14 at 13:07 -0800, Linus Torvalds wrote:
>> On Sun, Jan 14, 2018 at 1:02 PM, David Woodhouse <dwmw2@infradead.org> wrote:
>> >
>> > Review on the GCC patches has led to a request that the thunk symbols
>> > be changed from e.g. __x86_indirect_thunk_rax to
>> > __x86_indirect_thunk_ax without the 'r'.
>> Ok. I think that just makes it easier for us, since then the names are
>> independent of 32-vs/64, and we don't need to use the _ASM_XY names.
>>
>> What about r8-r15? I'm assuming 'r' there is used?
>
> Ah yes, *this* is why I hated it... for 'r8' onwards that is indeed the
> register names as well as the suffix of the thunk name. But for the
> legacy registers I have to prepend 'e' or 'r' myself in the macro. So
> it ends up looking like this:
>
>
> .macro THUNK reg
>         .section .text.__x86.indirect_thunk.\reg
>
> ENTRY(__x86_indirect_thunk_\reg)
>         CFI_STARTPROC
>         $done = 0
>         .irp xreg r8 r9 r10 r11 r12 r13 r14 r15
>         .ifeqs "\reg", "\xreg"
>                 JMP_NOSPEC %\reg
>                 $done = 1
>         .endif
>         .endr
>         .if $done != 1
>                 JMP_NOSPEC %__ASM_REG(\reg)
>         .endif
>         CFI_ENDPROC
> ENDPROC(__x86_indirect_thunk_\reg)
> .endm
>
> /*
>  * Despite being an assembler file we can't just use .irp here
>  * because __KSYM_DEPS__ only uses the C preprocessor and would
>  * only see one instance of "__x86_indirect_thunk_\reg" rather
>  * than one per register with the correct names. So we do it
>  * the simple and nasty way...
>  */
> #define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg)
> #define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg)
>
> GENERATE_THUNK(ax)
> GENERATE_THUNK(bx)
> GENERATE_THUNK(cx)
> GENERATE_THUNK(dx)
> GENERATE_THUNK(si)
> GENERATE_THUNK(di)
> GENERATE_THUNK(bp)
> #ifdef CONFIG_64BIT
> GENERATE_THUNK(r8)
> GENERATE_THUNK(r9)
> GENERATE_THUNK(r10)
> GENERATE_THUNK(r11)
> GENERATE_THUNK(r12)
> GENERATE_THUNK(r13)
> GENERATE_THUNK(r14)
> GENERATE_THUNK(r15)
> #endif
>
>
> And *that* was the point at which I asked HJ to just use the proper
> bloody register names :)

Please let me know if I should make the change to ax,..., r8,..r15.

-- 
H.J.

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 22:09                                                 ` H.J. Lu
@ 2018-01-14 22:12                                                   ` David Woodhouse
  2018-01-14 22:39                                                     ` H.J. Lu
  0 siblings, 1 reply; 135+ messages in thread
From: David Woodhouse @ 2018-01-14 22:12 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Linus Torvalds, Uros Bizjak, Thomas Gleixner, Jakub Jelinek,
	Richard Biener, Jeff Law, Eric Botcazou, GCC Patches

On Sun, 2018-01-14 at 14:03 -0800, H.J. Lu wrote:
> 
> > And *that* was the point at which I asked HJ to just use the proper
> > bloody register names :)
> 
> Please let me know if I should make the change to ax,..., r8,..r15.

This is what I'm building my compiler with now, to make that change:
http://git.infradead.org/users/dwmw2/gcc-retpoline.git/shortlog/refs/heads/retpoline-regnames

At this point I'm inclined to suggest we don't make the change. I'll
finish and test it anyway. I *can* change my GENERATE_THUNK macro to
take two arguments — the suffix of the thunk name, and the register
name to use. That lets me ditch the clever but ugly loop trick.

I *did* want to get this file back to using .irp in the end, by fixing
up other kernel infrastructure to do things properly. But I can live
without that too if I must.
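
A rough sketch of what the two-argument form could look like, written as
plain C so the preprocessor expansion can be checked in isolation. It
only records the (symbol, register) pairs instead of emitting assembly.
DEMO_64BIT, LEGACY, THUNK_SYMBOL, struct thunk and thunk_reg are
hypothetical names for this illustration (DEMO_64BIT stands in for
CONFIG_64BIT); none of this is the actual kernel macro.

```c
#include <assert.h>
#include <string.h>

#define STR_(x) #x
#define STR(x)  STR_(x)

/* Hypothetical stand-in for the kernel's CONFIG_64BIT. */
#ifndef DEMO_64BIT
#define DEMO_64BIT 1
#endif

/* Prepend 'r' or 'e' to a legacy register name for the current word size. */
#if DEMO_64BIT
#define LEGACY(reg) r ## reg
#else
#define LEGACY(reg) e ## reg
#endif

/* First argument: suffix of the thunk symbol.
   Second argument: register the thunk actually jumps through.  */
#define THUNK_SYMBOL(suffix) __x86_indirect_thunk_ ## suffix
#define GENERATE_THUNK(suffix, reg) { STR(THUNK_SYMBOL(suffix)), "%" STR(reg) }

struct thunk {
    const char *symbol;  /* exported symbol name */
    const char *reg;     /* register operand, with leading '%' */
};

static const struct thunk thunks[] = {
    GENERATE_THUNK(ax, LEGACY(ax)),
    GENERATE_THUNK(bx, LEGACY(bx)),
#if DEMO_64BIT
    GENERATE_THUNK(r8, r8),
#endif
};

/* Return the register operand for SYMBOL, or NULL if there is no such thunk. */
static const char *thunk_reg(const char *symbol)
{
    for (size_t i = 0; i < sizeof thunks / sizeof thunks[0]; i++)
        if (strcmp(thunks[i].symbol, symbol) == 0)
            return thunks[i].reg;
    return NULL;
}
```

Because the symbol suffix and the register are passed separately, no
loop over r8..r15 is needed to decide whether a prefix has to be added.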

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 22:12                                                   ` David Woodhouse
@ 2018-01-14 22:39                                                     ` H.J. Lu
  2018-01-14 23:02                                                       ` David Woodhouse
  0 siblings, 1 reply; 135+ messages in thread
From: H.J. Lu @ 2018-01-14 22:39 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Linus Torvalds, Uros Bizjak, Thomas Gleixner, Jakub Jelinek,
	Richard Biener, Jeff Law, Eric Botcazou, GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1007 bytes --]

On Sun, Jan 14, 2018 at 2:09 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> On Sun, 2018-01-14 at 14:03 -0800, H.J. Lu wrote:
>>
>> > And *that* was the point at which I asked HJ to just use the proper
>> > bloody register names :)
>>
>> Please let me know if I should make the change to ax,..., r8,..r15.
>
> This is what I'm building my compiler with now, to make that change:
> http://git.infradead.org/users/dwmw2/gcc-retpoline.git/shortlog/refs/heads/retpoline-regnames
>
> At this point I'm inclined to suggest we don't make the change. I'll
> finish and test it anyway. I *can* change my GENERATE_THUNK macro to
> take two arguments — the suffix of the thunk name, and the register
> name to use. That lets me ditch the clever but ugly loop trick.
>
> I *did* want to get this file back to using .irp in the end, by fixing
> up other kernel infrastructure to do things properly. But I can live
> without that too if I must.

Please use this GCC patch instead.

-- 
H.J.

[-- Attachment #2: 0001-x86-Change-register-names-in-__x86_indirect_thunk_re.patch --]
[-- Type: text/x-patch, Size: 27208 bytes --]

From 223c07edf531eaa05c7fff564ce6b5dc48e6a49b Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Sun, 14 Jan 2018 14:04:55 -0800
Subject: [PATCH] x86: Change register names in __x86_indirect_thunk_reg

Change names of the lower 8 integer registers in __x86_indirect_thunk_reg
to ax, dx, cx, bx, si, di and bp.

gcc/

	* config/i386/i386.c (indirect_thunk_name): Don't check
	LEGACY_INT_REGNO_P.
	(print_reg): Use reg_names[regno] for 'V'.
	* doc/extend.texi: Replace "the full integer" with "the integer"
	for 'V'.

gcc/testsuite/

	* gcc.target/i386/indirect-thunk-1.c: Updated.
	* gcc.target/i386/indirect-thunk-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-extern-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-register-1.c: Likewise.
	* gcc.target/i386/indirect-thunk-register-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-register-4.c: Likewise.
	* gcc.target/i386/ret-thunk-10.c: Likewise.
	* gcc.target/i386/ret-thunk-11.c: Likewise.
	* gcc.target/i386/ret-thunk-12.c: Likewise.
	* gcc.target/i386/ret-thunk-13.c: Likewise.
	* gcc.target/i386/ret-thunk-14.c: Likewise.
	* gcc.target/i386/ret-thunk-15.c: Likewise.
	* gcc.target/i386/ret-thunk-9.c: Likewise.
---
 gcc/config/i386/i386.c                               | 20 ++++++++------------
 gcc/doc/extend.texi                                  |  2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-1.c     |  2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-2.c     |  2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-3.c     |  2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-4.c     |  2 +-
 gcc/testsuite/gcc.target/i386/indirect-thunk-7.c     |  2 +-
 .../gcc.target/i386/indirect-thunk-attr-1.c          |  2 +-
 .../gcc.target/i386/indirect-thunk-attr-2.c          |  2 +-
 .../gcc.target/i386/indirect-thunk-attr-5.c          |  2 +-
 .../gcc.target/i386/indirect-thunk-attr-6.c          |  2 +-
 .../gcc.target/i386/indirect-thunk-attr-7.c          |  2 +-
 .../gcc.target/i386/indirect-thunk-extern-1.c        |  2 +-
 .../gcc.target/i386/indirect-thunk-extern-2.c        |  2 +-
 .../gcc.target/i386/indirect-thunk-extern-3.c        |  2 +-
 .../gcc.target/i386/indirect-thunk-extern-4.c        |  2 +-
 .../gcc.target/i386/indirect-thunk-extern-7.c        |  2 +-
 .../gcc.target/i386/indirect-thunk-register-1.c      |  2 +-
 .../gcc.target/i386/indirect-thunk-register-3.c      |  2 +-
 .../gcc.target/i386/indirect-thunk-register-4.c      |  3 +--
 gcc/testsuite/gcc.target/i386/ret-thunk-10.c         |  4 ++--
 gcc/testsuite/gcc.target/i386/ret-thunk-11.c         |  4 ++--
 gcc/testsuite/gcc.target/i386/ret-thunk-12.c         |  4 ++--
 gcc/testsuite/gcc.target/i386/ret-thunk-13.c         |  2 +-
 gcc/testsuite/gcc.target/i386/ret-thunk-14.c         |  2 +-
 gcc/testsuite/gcc.target/i386/ret-thunk-15.c         |  2 +-
 gcc/testsuite/gcc.target/i386/ret-thunk-9.c          |  2 +-
 27 files changed, 37 insertions(+), 42 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 5e4f845a1bd..890bd701cd1 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10787,15 +10787,8 @@ indirect_thunk_name (char name[32], int regno, bool need_bnd_p,
     {
       const char *bnd = need_bnd_p ? "_bnd" : "";
       if (regno >= 0)
-	{
-	  const char *reg_prefix;
-	  if (LEGACY_INT_REGNO_P (regno))
-	    reg_prefix = TARGET_64BIT ? "r" : "e";
-	  else
-	    reg_prefix = "";
-	  sprintf (name, "__x86_indirect_thunk%s_%s%s",
-		   bnd, reg_prefix, reg_names[regno]);
-	}
+	sprintf (name, "__x86_indirect_thunk%s_%s",
+		 bnd, reg_names[regno]);
       else
 	{
 	  const char *ret = ret_p ? "return" : "indirect";
@@ -17693,7 +17686,7 @@ put_condition_code (enum rtx_code code, machine_mode mode, bool reverse,
    If CODE is 'h', pretend the reg is the 'high' byte register.
    If CODE is 'y', print "st(0)" instead of "st", if the reg is stack op.
    If CODE is 'd', duplicate the operand for AVX instruction.
-   If CODE is 'V', print naked full integer register name without %.
+   If CODE is 'V', print naked integer register name without %.
  */
 
 void
@@ -17759,7 +17752,10 @@ print_reg (rtx x, int code, FILE *file)
   if (code == 'V')
     {
       if (GENERAL_REGNO_P (regno))
-	msize = GET_MODE_SIZE (word_mode);
+	{
+	  fputs (reg_names[regno], file);
+	  return;
+	}
       else
 	error ("'V' modifier on non-integer register");
     }
@@ -17883,7 +17879,7 @@ print_reg (rtx x, int code, FILE *file)
    & -- print some in-use local-dynamic symbol name.
    H -- print a memory address offset by 8; used for sse high-parts
    Y -- print condition for XOP pcom* instruction.
-   V -- print naked full integer register name without %.
+   V -- print naked integer register name without %.
    + -- print a branch hint as 'cs' or 'ds' prefix
    ; -- print a semicolon (after prefixes due to bug in older gas).
    ~ -- print "i" if TARGET_AVX2, "f" otherwise.
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index dce808f1eab..ae68c8f0821 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -9292,7 +9292,7 @@ The table below shows the list of supported modifiers and their effects.
 @tab @code{2}
 @end multitable
 
-@code{V} is a special modifier which prints the name of the full integer
+@code{V} is a special modifier which prints the name of the integer
 register without @code{%}.
 
 @anchor{x86floatingpointasmoperands}
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-1.c
index 9eb9b273ade..fb56b2db3b6 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-1.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-1.c
@@ -13,7 +13,7 @@ male_indirect_jump (long offset)
 
 /* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
-/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_ax" { target x32 } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler {\tpause} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-2.c
index c63795e4127..337f455aa44 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-2.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-2.c
@@ -13,7 +13,7 @@ male_indirect_jump (long offset)
 
 /* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
-/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_ax" { target x32 } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler {\tpause} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-3.c
index 82973cda771..2e40ec71609 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-3.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-3.c
@@ -14,7 +14,7 @@ male_indirect_jump (long offset)
 
 /* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
-/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_ax" { target x32 } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler {\tpause} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-4.c
index a5f3d1cbed8..309d1f6c10b 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-4.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-4.c
@@ -14,7 +14,7 @@ male_indirect_jump (long offset)
 
 /* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
-/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_ax" { target x32 } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler {\tpause} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-7.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-7.c
index ebfb8aab937..47674395309 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-7.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-7.c
@@ -37,7 +37,7 @@ bar (int i)
 
 /* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*\.L\[0-9\]+\\(,%" { target { ! x32 } } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
-/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_ax" { target x32 } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler {\tpause} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-1.c
index a08022db8e4..e8cdc4fa05d 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-1.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-1.c
@@ -16,7 +16,7 @@ male_indirect_jump (long offset)
 
 /* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
-/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_ax" { target x32 } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler {\tpause} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-2.c
index b257c695ad1..5f333d86e18 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-2.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-2.c
@@ -14,7 +14,7 @@ male_indirect_jump (long offset)
 
 /* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
-/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_ax" { target x32 } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler {\tpause} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-5.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-5.c
index 4bb1c5f9220..fa067b0acc8 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-5.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-5.c
@@ -18,5 +18,5 @@ male_indirect_jump (long offset)
 /* { dg-final { scan-assembler-times "jmp\[ \t\]*\.LIND" 1 { target { ! x32 } } } } */
 /* { dg-final { scan-assembler-times "call\[ \t\]*\.LIND" 1 { target { ! x32 } } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
-/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_ax" { target x32 } } } */
 /* { dg-final { scan-assembler-not {\t(lfence|pause)} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-6.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-6.c
index 4e33a638862..442f97c123c 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-6.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-6.c
@@ -17,5 +17,5 @@ male_indirect_jump (long offset)
 /* { dg-final { scan-assembler-times "jmp\[ \t\]*\.LIND" 1 { target { ! x32 } } } } */
 /* { dg-final { scan-assembler-times "call\[ \t\]*\.LIND" 1 { target { ! x32 } } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
-/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_ax" { target x32 } } } */
 /* { dg-final { scan-assembler-not {\t(lfence|pause)} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-7.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-7.c
index 427ba3ddbb4..64b85fc7b96 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-7.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-attr-7.c
@@ -37,7 +37,7 @@ bar (int i)
 }
 
 /* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*\.L\[0-9\]+\\(,%" { target { ! x32 } } } } */
-/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_ax" { target x32 } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" } } */
 /* { dg-final { scan-assembler-not {\t(lfence|pause)} } } */
 /* { dg-final { scan-assembler-not "jmp\[ \t\]*\.LIND" } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-1.c
index 5c20a35ecec..cb95a09ea34 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-1.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-1.c
@@ -13,7 +13,7 @@ male_indirect_jump (long offset)
 
 /* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
-/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_ax" { target x32 } } } */
 /* { dg-final { scan-assembler-not {\t(lfence|pause)} } } */
 /* { dg-final { scan-assembler-not "jmp\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler-not "call\[ \t\]*\.LIND" } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-2.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-2.c
index b2fb6e1bcd2..b2af9e96b7e 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-2.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-2.c
@@ -13,7 +13,7 @@ male_indirect_jump (long offset)
 
 /* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?dispatch" { target { ! x32 } } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
-/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_ax" { target x32 } } } */
 /* { dg-final { scan-assembler-not {\t(lfence|pause)} } } */
 /* { dg-final { scan-assembler-not "jmp\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler-not "call\[ \t\]*\.LIND" } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-3.c
index 9c84547cd7c..a9a1a8bc177 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-3.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-3.c
@@ -16,5 +16,5 @@ male_indirect_jump (long offset)
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
 /* { dg-final { scan-assembler-times "jmp\[ \t\]*\.LIND" 1 { target { ! x32 } } } } */
 /* { dg-final { scan-assembler-times "call\[ \t\]*\.LIND" 1 { target { ! x32 } } } } */
-/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_ax" { target x32 } } } */
 /* { dg-final { scan-assembler-not {\t(lfence|pause)} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-4.c
index 457849564bb..bbf3dd24f37 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-4.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-4.c
@@ -16,5 +16,5 @@ male_indirect_jump (long offset)
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
 /* { dg-final { scan-assembler-times "jmp\[ \t\]*\.LIND" 1 { target { ! x32 } } } } */
 /* { dg-final { scan-assembler-times "call\[ \t\]*\.LIND" 1 { target { ! x32 } } } } */
-/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_ax" { target x32 } } } */
 /* { dg-final { scan-assembler-not {\t(lfence|pause)} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-7.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-7.c
index d4747ea0764..39acad399d3 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-7.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-extern-7.c
@@ -37,7 +37,7 @@ bar (int i)
 
 /* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*\.L\[0-9\]+\\(,%" { target { ! x32 } } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
-/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_ax" { target x32 } } } */
 /* { dg-final { scan-assembler-not {\t(lfence|pause)} } } */
 /* { dg-final { scan-assembler-not "jmp\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler-not "call\[ \t\]*\.LIND" } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-register-1.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-register-1.c
index 7d396a31953..0660feeed73 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-register-1.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-register-1.c
@@ -11,7 +11,7 @@ male_indirect_jump (long offset)
   dispatch(offset);
 }
 
-/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_ax" } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler "mov\[ \t\](%eax|%rax), \\((%esp|%rsp)\\)" } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-register-3.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-register-3.c
index 5320e923be2..d39e387586e 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-register-3.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-register-3.c
@@ -11,7 +11,7 @@ male_indirect_jump (long offset)
   dispatch(offset);
 }
 
-/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_(r|e)ax" } } */
+/* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk_ax" } } */
 /* { dg-final { scan-assembler-not "push(?:l|q)\[ \t\]*_?dispatch"  } } */
 /* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" } } */
 /* { dg-final { scan-assembler-not {\t(pause|pause|nop)} } } */
diff --git a/gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c b/gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c
index f0cd9b75be8..b857b1ac19b 100644
--- a/gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c
+++ b/gcc/testsuite/gcc.target/i386/indirect-thunk-register-4.c
@@ -9,5 +9,4 @@ foo (void)
   asm("call __x86_indirect_thunk_%V0" : : "a" (func_p));
 }
 
-/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_eax" { target ia32 } } } */
-/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_rax" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_ax" } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-10.c b/gcc/testsuite/gcc.target/i386/ret-thunk-10.c
index b4f9d48065d..2f373c362d0 100644
--- a/gcc/testsuite/gcc.target/i386/ret-thunk-10.c
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-10.c
@@ -18,6 +18,6 @@ foo (void)
 /* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?bar" { target { ! x32 } } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } }  } } */
 /* { dg-final { scan-assembler "__x86_indirect_thunk:" { target { ! x32 } }  } } */
-/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target { x32 } }  } } */
-/* { dg-final { scan-assembler "__x86_indirect_thunk_(r|e)ax:" { target { x32 } }  } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_ax" { target { x32 } }  } } */
+/* { dg-final { scan-assembler "__x86_indirect_thunk_ax:" { target { x32 } }  } } */
 /* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-11.c b/gcc/testsuite/gcc.target/i386/ret-thunk-11.c
index 0312577a043..11e041fda7e 100644
--- a/gcc/testsuite/gcc.target/i386/ret-thunk-11.c
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-11.c
@@ -18,6 +18,6 @@ foo (void)
 /* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?bar" { target { ! x32 } } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
 /* { dg-final { scan-assembler "__x86_indirect_thunk:" { target { ! x32 } }  } } */
-/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target { x32 } }  } } */
-/* { dg-final { scan-assembler "__x86_indirect_thunk_(r|e)ax:" { target { x32 } }  } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_ax" { target { x32 } }  } } */
+/* { dg-final { scan-assembler "__x86_indirect_thunk_ax:" { target { x32 } }  } } */
 /* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-12.c b/gcc/testsuite/gcc.target/i386/ret-thunk-12.c
index fa3181303c9..f16c1a5d0c6 100644
--- a/gcc/testsuite/gcc.target/i386/ret-thunk-12.c
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-12.c
@@ -17,6 +17,6 @@ foo (void)
 /* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
 /* { dg-final { scan-assembler "__x86_indirect_thunk:" { target { ! x32 } }  } } */
-/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target { x32 } }  } } */
-/* { dg-final { scan-assembler "__x86_indirect_thunk_(r|e)ax:" { target { x32 } }  } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_ax" { target { x32 } }  } } */
+/* { dg-final { scan-assembler "__x86_indirect_thunk_ax:" { target { x32 } }  } } */
 /* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-13.c b/gcc/testsuite/gcc.target/i386/ret-thunk-13.c
index 7a08e71c76b..b4553216f99 100644
--- a/gcc/testsuite/gcc.target/i386/ret-thunk-13.c
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-13.c
@@ -18,5 +18,5 @@ foo (void)
 /* { dg-final { scan-assembler-times "jmp\[ \t\]*\.LIND" 3 } } */
 /* { dg-final { scan-assembler-times "call\[ \t\]*\.LIND" 3 } } */
 /* { dg-final { scan-assembler-not "jmp\[ \t\]*__x86_indirect_thunk" } } */
-/* { dg-final { scan-assembler-not "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target { x32 } }  } } */
+/* { dg-final { scan-assembler-not "call\[ \t\]*__x86_indirect_thunk_ax" { target { x32 } }  } } */
 /* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-14.c b/gcc/testsuite/gcc.target/i386/ret-thunk-14.c
index dacf0c769fc..28e5434a004 100644
--- a/gcc/testsuite/gcc.target/i386/ret-thunk-14.c
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-14.c
@@ -18,5 +18,5 @@ foo (void)
 /* { dg-final { scan-assembler "call\[ \t\]*\.LIND" } } */
 /* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?bar" { target { ! x32 } } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
-/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target { x32 } }  } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_ax" { target { x32 } }  } } */
 /* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-15.c b/gcc/testsuite/gcc.target/i386/ret-thunk-15.c
index cf06a5f35c7..20fad48b790 100644
--- a/gcc/testsuite/gcc.target/i386/ret-thunk-15.c
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-15.c
@@ -18,5 +18,5 @@ foo (void)
 /* { dg-final { scan-assembler-times {\tlfence} 1 } } */
 /* { dg-final { scan-assembler "push(?:l|q)\[ \t\]*_?bar" { target { ! x32 } } } } */
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
-/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target x32 } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_ax" { target x32 } } } */
 /* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
diff --git a/gcc/testsuite/gcc.target/i386/ret-thunk-9.c b/gcc/testsuite/gcc.target/i386/ret-thunk-9.c
index 6da5ab97081..8f0375b5def 100644
--- a/gcc/testsuite/gcc.target/i386/ret-thunk-9.c
+++ b/gcc/testsuite/gcc.target/i386/ret-thunk-9.c
@@ -21,5 +21,5 @@ foo (void)
 /* { dg-final { scan-assembler "jmp\[ \t\]*__x86_indirect_thunk" { target { ! x32 } } } } */
 /* { dg-final { scan-assembler-times {\tpause} 2 { target { x32 } } } } */
 /* { dg-final { scan-assembler-times {\tlfence} 2 { target { x32 } } } } */
-/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_(r|e)ax" { target { x32 } } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*__x86_indirect_thunk_ax" { target { x32 } } } } */
 /* { dg-final { scan-assembler-not "pushq\[ \t\]%rax" { target x32 } } } */
-- 
2.14.3


^ permalink raw reply	[flat|nested] 135+ messages in thread

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 22:39                                                     ` H.J. Lu
@ 2018-01-14 23:02                                                       ` David Woodhouse
  2018-01-14 23:09                                                         ` Linus Torvalds
  0 siblings, 1 reply; 135+ messages in thread
From: David Woodhouse @ 2018-01-14 23:02 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Linus Torvalds, Uros Bizjak, Thomas Gleixner, Jakub Jelinek,
	Richard Biener, Jeff Law, Eric Botcazou, GCC Patches

On Sun, 2018-01-14 at 14:12 -0800, H.J. Lu wrote:
> Please use this GCC patch instead.

Building now; thanks.

This is the kernel patch I'll test as soon as the compiler is done.
It's slightly less horrid than the "clever" one I sent out earlier, but
does still end up needing those _ASM_AX etc. macros in *addition* to
the bare "ax" that goes in the symbol names.

I'm not convinced we want to do this, but I'll defer to Linus.
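
To make the naming difference concrete, here is a minimal C sketch of the two
schemes under discussion. It is an illustration only: demo_thunk_name() and
enum demo_scheme are hypothetical names and appear in neither the GCC patch
nor the kernel patch. The point it shows is that under the bare scheme the
symbol suffix ("ax") no longer matches the register operand ("%rax" or
"%eax"), which is why the kernel macros end up needing both.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical naming schemes, for this sketch only.  */
enum demo_scheme { DEMO_BARE, DEMO_FULL };

/* Write the thunk symbol name for a register given in its bare form
   ("ax", "bx", ...).  r8-r15 are spelled the same in both schemes, so
   they never take a prefix.  Returns the snprintf result.  */
static int
demo_thunk_name (char *buf, size_t size, const char *bare_reg,
                 int is_64bit, enum demo_scheme scheme)
{
  const char *prefix = "";

  assert (buf != NULL && bare_reg != NULL && strlen (bare_reg) > 0);
  if (scheme == DEMO_FULL && bare_reg[0] != 'r')
    prefix = is_64bit ? "r" : "e";
  return snprintf (buf, size, "__x86_indirect_thunk_%s%s", prefix, bare_reg);
}
```

With DEMO_FULL the result tracks the word size (rax on 64-bit, eax on 32-bit);
with DEMO_BARE it is the same string for both.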



From 755f50731a99b0ce0890e478e6a2d6ebd647da15 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw@amazon.co.uk>
Date: Sun, 14 Jan 2018 22:21:02 +0000
Subject: [PATCH] x86/retpoline: Switch thunk names to match final GCC patches

At the last minute, they were switched from __x86_indirect_thunk_rax to
__x86_indirect_thunk_ax without the 'r' or 'e' on the register name.

This is not entirely an improvement, IMO.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/asm-prototypes.h | 24 ++++++++++----------
 arch/x86/lib/retpoline.S              | 41 +++++++++++++++++------------------
 2 files changed, 31 insertions(+), 34 deletions(-)

diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index 0927cdc4f946..df80478fb682 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -18,19 +18,7 @@ extern void cmpxchg8b_emu(void);
 #endif
 
 #ifdef CONFIG_RETPOLINE
-#ifdef CONFIG_X86_32
-#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_e ## reg(void);
-#else
-#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_r ## reg(void);
-INDIRECT_THUNK(8)
-INDIRECT_THUNK(9)
-INDIRECT_THUNK(10)
-INDIRECT_THUNK(11)
-INDIRECT_THUNK(12)
-INDIRECT_THUNK(13)
-INDIRECT_THUNK(14)
-INDIRECT_THUNK(15)
-#endif
+#define INDIRECT_THUNK(reg) extern asmlinkage void __x86_indirect_thunk_ ## reg(void);
 INDIRECT_THUNK(ax)
 INDIRECT_THUNK(bx)
 INDIRECT_THUNK(cx)
@@ -39,4 +27,14 @@ INDIRECT_THUNK(si)
 INDIRECT_THUNK(di)
 INDIRECT_THUNK(bp)
 INDIRECT_THUNK(sp)
+#ifdef CONFIG_64BIT
+INDIRECT_THUNK(r8)
+INDIRECT_THUNK(r9)
+INDIRECT_THUNK(r10)
+INDIRECT_THUNK(r11)
+INDIRECT_THUNK(r12)
+INDIRECT_THUNK(r13)
+INDIRECT_THUNK(r14)
+INDIRECT_THUNK(r15)
+#endif /* CONFIG_64BIT */
 #endif /* CONFIG_RETPOLINE */
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index cb45c6cb465f..7da2c9035836 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -8,14 +8,14 @@
 #include <asm/export.h>
 #include <asm/nospec-branch.h>
 
-.macro THUNK reg
-	.section .text.__x86.indirect_thunk.\reg
+.macro THUNK reg suffix
+	.section .text.__x86.indirect_thunk.\suffix
 
-ENTRY(__x86_indirect_thunk_\reg)
+ENTRY(__x86_indirect_thunk_\suffix)
 	CFI_STARTPROC
 	JMP_NOSPEC %\reg
 	CFI_ENDPROC
-ENDPROC(__x86_indirect_thunk_\reg)
+ENDPROC(__x86_indirect_thunk_\suffix)
 .endm
 
 /*
@@ -26,23 +26,22 @@ ENDPROC(__x86_indirect_thunk_\reg)
  * the simple and nasty way...
  */
 #define EXPORT_THUNK(reg) EXPORT_SYMBOL(__x86_indirect_thunk_ ## reg)
-#define GENERATE_THUNK(reg) THUNK reg ; EXPORT_THUNK(reg)
+#define GENERATE_THUNK(reg, suffix) THUNK reg suffix; EXPORT_THUNK(suffix)
 
-GENERATE_THUNK(_ASM_AX)
-GENERATE_THUNK(_ASM_BX)
-GENERATE_THUNK(_ASM_CX)
-GENERATE_THUNK(_ASM_DX)
-GENERATE_THUNK(_ASM_SI)
-GENERATE_THUNK(_ASM_DI)
-GENERATE_THUNK(_ASM_BP)
-GENERATE_THUNK(_ASM_SP)
+GENERATE_THUNK(_ASM_AX, ax)
+GENERATE_THUNK(_ASM_BX, bx)
+GENERATE_THUNK(_ASM_CX, cx)
+GENERATE_THUNK(_ASM_DX, dx)
+GENERATE_THUNK(_ASM_SI, si)
+GENERATE_THUNK(_ASM_DI, di)
+GENERATE_THUNK(_ASM_BP, bp)
 #ifdef CONFIG_64BIT
-GENERATE_THUNK(r8)
-GENERATE_THUNK(r9)
-GENERATE_THUNK(r10)
-GENERATE_THUNK(r11)
-GENERATE_THUNK(r12)
-GENERATE_THUNK(r13)
-GENERATE_THUNK(r14)
-GENERATE_THUNK(r15)
+GENERATE_THUNK(r8, r8)
+GENERATE_THUNK(r9, r9)
+GENERATE_THUNK(r10, r10)
+GENERATE_THUNK(r11, r11)
+GENERATE_THUNK(r12, r12)
+GENERATE_THUNK(r13, r13)
+GENERATE_THUNK(r14, r14)
+GENERATE_THUNK(r15, r15)
 #endif
-- 
2.14.3


* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 23:02                                                       ` David Woodhouse
@ 2018-01-14 23:09                                                         ` Linus Torvalds
  2018-01-14 23:21                                                           ` David Woodhouse
  0 siblings, 1 reply; 135+ messages in thread
From: Linus Torvalds @ 2018-01-14 23:09 UTC (permalink / raw)
  To: David Woodhouse
  Cc: H.J. Lu, Uros Bizjak, Thomas Gleixner, Jakub Jelinek,
	Richard Biener, Jeff Law, Eric Botcazou, GCC Patches

On Sun, Jan 14, 2018 at 2:39 PM, David Woodhouse <dwmw2@infradead.org> wrote:
>
> I'm not convinced we want to do this, but I'll defer to Linus.

Well, I guess we have no choice, if gcc ends up using the stupid names.

And yes, apparently this just made our macros worse instead of
cleaning anything up. Oh well.

I do have one (possible) solution: just export both names. So you'd export

  __x86_indirect_thunk_ax
  __x86_indirect_thunk_rax
  ...
  __x86_indirect_thunk_8
  __x86_indirect_thunk_r8

as symbols (same code, obviously), and then

 (a) the macros would be simpler

 (b) it just happens to work with even the old gcc patch

But at this point I don't really care.

              Linus
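
The "export both names" idea above can be sketched in plain C as two table
entries that resolve to the same code. This is a rough model under stated
assumptions, not how the kernel exports symbols: demo_exports, demo_lookup()
and demo_thunk_ax_body() are hypothetical, and a real implementation would
live in assembly and go through EXPORT_SYMBOL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*demo_thunk_fn) (void);

/* Stand-in for the single thunk body shared by both names.  */
static void
demo_thunk_ax_body (void)
{
}

/* Hypothetical export table: two names, one piece of code.  */
struct demo_export
{
  const char *name;
  demo_thunk_fn fn;
};

static const struct demo_export demo_exports[] = {
  { "__x86_indirect_thunk_ax", demo_thunk_ax_body },
  { "__x86_indirect_thunk_rax", demo_thunk_ax_body },
};

/* Return the code exported under NAME, or NULL if NAME is unknown.  */
static demo_thunk_fn
demo_lookup (const char *name)
{
  assert (name != NULL);
  for (size_t i = 0; i < sizeof demo_exports / sizeof demo_exports[0]; i++)
    if (strcmp (demo_exports[i].name, name) == 0)
      return demo_exports[i].fn;
  return NULL;
}
```

A caller built against either naming scheme would then resolve to the same
thunk body, which is the property the suggestion relies on.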

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 23:09                                                         ` Linus Torvalds
@ 2018-01-14 23:21                                                           ` David Woodhouse
  2018-01-14 23:23                                                             ` Linus Torvalds
  0 siblings, 1 reply; 135+ messages in thread
From: David Woodhouse @ 2018-01-14 23:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: H.J. Lu, Uros Bizjak, Thomas Gleixner, Jakub Jelinek,
	Richard Biener, Jeff Law, Eric Botcazou, GCC Patches

On Sun, 2018-01-14 at 15:02 -0800, Linus Torvalds wrote:
> On Sun, Jan 14, 2018 at 2:39 PM, David Woodhouse <dwmw2@infradead.org
> > wrote:
> >
> > I'm not convinced we want to do this, but I'll defer to Linus.
> 
> Well, I guess we have no choice, if gcc ends up using the stupid
> names.

At this point, they'll do what we ask. See Uros's earlier message:

> ... the difference is only a couple of lines in the
> compiler source which we also can live with. But please discuss my
> proposal also in the kernel community, and weigh the benefits and
> drawbacks of each approach before the final decision.
> 
> Please pass the final decision to gcc community, and we'll implement it.

I think we should stick with what we have now, with the names of the
thunks actually being the *full* name of the register (rax, eax, etc.)
that they use.


* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 23:21                                                           ` David Woodhouse
@ 2018-01-14 23:23                                                             ` Linus Torvalds
  0 siblings, 0 replies; 135+ messages in thread
From: Linus Torvalds @ 2018-01-14 23:23 UTC (permalink / raw)
  To: David Woodhouse
  Cc: H.J. Lu, Uros Bizjak, Thomas Gleixner, Jakub Jelinek,
	Richard Biener, Jeff Law, Eric Botcazou, GCC Patches

On Sun, Jan 14, 2018 at 3:09 PM, David Woodhouse <dwmw2@infradead.org> wrote:
>
> I think we should stick with what we have now, with the names of the
> thunks actually being the *full* name of the register (rax, eax, etc.)
> that they use.

If that works for the gcc people, then yes, I agree. The mixed
"sometimes full, sometimes not" approach just seems broken.

              Linus

* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 21:03                                         ` Uros Bizjak
  2018-01-14 21:07                                           ` David Woodhouse
@ 2018-01-15  8:21                                           ` David Woodhouse
  1 sibling, 0 replies; 135+ messages in thread
From: David Woodhouse @ 2018-01-15  8:21 UTC (permalink / raw)
  To: Uros Bizjak, Linus Torvalds, Thomas Gleixner
  Cc: H.J. Lu, Jakub Jelinek, Richard Biener, Jeff Law, Eric Botcazou,
	GCC Patches

On Sun, 2018-01-14 at 21:52 +0100, Uros Bizjak wrote:
> 
> Well, you did say that these are strange times ;)
> 
> From the user perspective, it would be more convenient to use the
> thunk names that are the same for 32bit and 64bit targets. If we
> ignore this fact, the difference is only a couple of lines in the
> compiler source which we also can live with. But please discuss my
> proposal also in the kernel community, and weigh the benefits and
> drawbacks of each approach before the final decision.
> 
> Please pass the final decision to gcc community, and we'll implement
> it.

I think you watched this happen, but just to be explicitly clear:

We weighed the benefits and tested this, and we concluded that we don't
want it. Let's stick with e.g. __x86_indirect_thunk_rax please.

Thank you for being flexible.


* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 16:45                         ` Uros Bizjak
@ 2018-01-16 19:57                           ` Uros Bizjak
  0 siblings, 0 replies; 135+ messages in thread
From: Uros Bizjak @ 2018-01-16 19:57 UTC (permalink / raw)
  To: H.J. Lu
  Cc: Richard Biener, Jeff Law, Jakub Jelinek, Eric Botcazou, GCC Patches

On Sun, Jan 14, 2018 at 5:43 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> On Sun, Jan 14, 2018 at 5:35 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>> On Sun, Jan 14, 2018 at 8:19 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>> On Fri, Jan 12, 2018 at 9:01 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>>>> On Thu, Jan 11, 2018 at 2:28 PM, H.J. Lu <hjl.tools@gmail.com> wrote:
>>>>
>>>>> Hi Uros,
>>>>>
>>>>> Can you take a look at my x86 backend changes so that they are ready
>>>>> to check in once we have consensus.
>>>>
>>>> Please finish the talks about the correct approach first. Once the
>>>> consensus is reached, please post the final version of the patches for
>>>> review.
>>>>
>>>> BTW: I have no detailed insight in these issues, so I'll look mostly
>>>> at the implementation details, probably early next week.
>>>
>>> One general remark is on the usage of -1 as an invalid register
>>
>> This has been rewritten.  The checked in patch no longer does that.
>
> I'm looking directly into current indirect_thunk_name,
> output_indirect_thunk and output_indirect_thunk_function functions in
> i386.c which have plenty of the mentioned checks.

Improved with attached patch.

2018-01-16  Uros Bizjak  <ubizjak@gmail.com>

    * config/i386/i386.c (indirect_thunk_name): Declare regno
    as unsigned int.  Compare regno with INVALID_REGNUM.
    (output_indirect_thunk): Ditto.
    (output_indirect_thunk_function): Ditto.
    (ix86_code_end): Declare regno as unsigned int.  Use INVALID_REGNUM
    in the call to output_indirect_thunk_function.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.
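
For readers following along, here is a minimal sketch of why the checks had to change once regno became unsigned. It is not part of the attached patch: INVALID_REGNUM is redefined locally as a stand-in so the snippet compiles outside the GCC tree, and both helper functions are hypothetical names used only for illustration.

```c
#include <stdbool.h>

/* Stand-in for GCC's INVALID_REGNUM sentinel, defined locally so this
   sketch builds on its own (an assumption, not the GCC header).  */
#define INVALID_REGNUM (~(unsigned int) 0)

/* Hypothetical helper: true when REGNO names a real register, i.e. the
   branch target is held in a register rather than on top of the stack.  */
static bool
thunk_uses_register (unsigned int regno)
{
  /* With an unsigned regno, "regno >= 0" is always true, so the
     sentinel has to be compared explicitly.  */
  return regno != INVALID_REGNUM;
}

/* Hypothetical helper: describe where the thunk finds its target.  */
static const char *
thunk_target_kind (unsigned int regno)
{
  return thunk_uses_register (regno) ? "register" : "stack";
}
```

Passing the old -1 through an unsigned parameter converts to the same all-ones value as the sentinel, which is why the call sites in ix86_code_end can switch to INVALID_REGNUM without changing behavior.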

Uros.

[-- Attachment #2: p.diff.txt --]
[-- Type: text/plain, Size: 3215 bytes --]

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ea9c462..7f233d1 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10765,16 +10765,16 @@ static int indirect_thunks_bnd_used;
 /* Fills in the label name that should be used for the indirect thunk.  */
 
 static void
-indirect_thunk_name (char name[32], int regno, bool need_bnd_p,
-		     bool ret_p)
+indirect_thunk_name (char name[32], unsigned int regno,
+		     bool need_bnd_p, bool ret_p)
 {
-  if (regno >= 0 && ret_p)
+  if (regno != INVALID_REGNUM && ret_p)
     gcc_unreachable ();
 
   if (USE_HIDDEN_LINKONCE)
     {
       const char *bnd = need_bnd_p ? "_bnd" : "";
-      if (regno >= 0)
+      if (regno != INVALID_REGNUM)
 	{
 	  const char *reg_prefix;
 	  if (LEGACY_INT_REGNO_P (regno))
@@ -10792,7 +10792,7 @@ indirect_thunk_name (char name[32], int regno, bool need_bnd_p,
     }
   else
     {
-      if (regno >= 0)
+      if (regno != INVALID_REGNUM)
 	{
 	  if (need_bnd_p)
 	    ASM_GENERATE_INTERNAL_LABEL (name, "LITBR", regno);
@@ -10844,7 +10844,7 @@ indirect_thunk_name (char name[32], int regno, bool need_bnd_p,
  */
 
 static void
-output_indirect_thunk (bool need_bnd_p, int regno)
+output_indirect_thunk (bool need_bnd_p, unsigned int regno)
 {
   char indirectlabel1[32];
   char indirectlabel2[32];
@@ -10874,7 +10874,7 @@ output_indirect_thunk (bool need_bnd_p, int regno)
 
   ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, indirectlabel2);
 
-  if (regno >= 0)
+  if (regno != INVALID_REGNUM)
     {
       /* MOV.  */
       rtx xops[2];
@@ -10898,12 +10898,12 @@ output_indirect_thunk (bool need_bnd_p, int regno)
 }
 
 /* Output a funtion with a call and return thunk for indirect branch.
-   If BND_P is true, the BND prefix is needed.   If REGNO != -1,  the
-   function address is in REGNO.  Otherwise, the function address is
+   If BND_P is true, the BND prefix is needed.  If REGNO != INVALID_REGNUM,
+   the function address is in REGNO.  Otherwise, the function address is
    on the top of stack.  */
 
 static void
-output_indirect_thunk_function (bool need_bnd_p, int regno)
+output_indirect_thunk_function (bool need_bnd_p, unsigned int regno)
 {
   char name[32];
   tree decl;
@@ -10952,7 +10952,7 @@ output_indirect_thunk_function (bool need_bnd_p, int regno)
 	ASM_OUTPUT_LABEL (asm_out_file, name);
       }
 
-  if (regno < 0)
+  if (regno == INVALID_REGNUM)
     {
       /* Create alias for __x86.return_thunk/__x86.return_thunk_bnd.  */
       char alias[32];
@@ -11026,16 +11026,16 @@ static void
 ix86_code_end (void)
 {
   rtx xops[2];
-  int regno;
+  unsigned int regno;
 
   if (indirect_thunk_needed)
-    output_indirect_thunk_function (false, -1);
+    output_indirect_thunk_function (false, INVALID_REGNUM);
   if (indirect_thunk_bnd_needed)
-    output_indirect_thunk_function (true, -1);
+    output_indirect_thunk_function (true, INVALID_REGNUM);
 
   for (regno = FIRST_REX_INT_REG; regno <= LAST_REX_INT_REG; regno++)
     {
-      int i = regno - FIRST_REX_INT_REG + LAST_INT_REG + 1;
+      unsigned int i = regno - FIRST_REX_INT_REG + LAST_INT_REG + 1;
       if ((indirect_thunks_used & (1 << i)))
 	output_indirect_thunk_function (false, regno);
 


* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-14 18:52                                   ` Uros Bizjak
  2018-01-14 20:35                                     ` Uros Bizjak
@ 2018-01-17 18:34                                     ` Woodhouse, David
  2018-01-18  8:02                                       ` Uros Bizjak
  1 sibling, 1 reply; 135+ messages in thread
From: Woodhouse, David @ 2018-01-17 18:34 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: H.J. Lu, Jakub Jelinek, Richard Biener, Jeff Law, Eric Botcazou,
	GCC Patches

I'm not sure I understand the concern. When compiling a large project for -m32 vs. -m64, there must be a million times the compiler has to decide whether to emit "r" or "e" before a register name. HJ's patch already does this for the thunk symbol. What is the future requirement that I am not understanding, and that is so hard?

Back to real computer and will stop top-posting HTML soon, I promise!

On 14 Jan 2018 19:22, Uros Bizjak <ubizjak@gmail.com> wrote:

On Sun, Jan 14, 2018 at 6:44 PM, Woodhouse, David <dwmw@amazon.co.uk> wrote:
> This won't make the list; I'll send a more coherent and less HTML-afflicted
> version later.
>
> The bare 'ax' naming made it painful to instantiate the external thunks for
> 32-bit and 64-bit code because we had to put the e/r back again inside the
> .irp reg ax bx... code.
>
> We could probably have lived with that but it would be painful to change now
> that Linux and Xen patches with the current ABI are all lined up. I
> appreciate they weren't in GCC yet so we get little sympathy but these are
> strange times and we had to move fast.
>
> I'd really like *not* to change it now. Having the thunk name actually
> include the name of the register it's using does seem nicer anyway...

That's unfortunate... I suspect that in the future, one will need
#ifdef __x86_64__ around eventual calls to thunks from C code because
of this decision, since thunks for x86_64 target will have different
names than thunks for x86_32 target. I don't know if the (single?)
case of mixing 32 and 64 bit assembly in the highly specialized part
of the kernel really warrants this decision. Future programmers will
be grateful if kernel people can re-consider their choice in
not-yet-released ABI.

Uros.




Amazon Web Services UK Limited. Registered in England and Wales with registration number 08650665 and which has its registered office at 60 Holborn Viaduct, London EC1A 2FD, United Kingdom.


* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-17 18:34                                     ` Woodhouse, David
@ 2018-01-18  8:02                                       ` Uros Bizjak
  2018-01-18  8:18                                         ` Woodhouse, David
  0 siblings, 1 reply; 135+ messages in thread
From: Uros Bizjak @ 2018-01-18  8:02 UTC (permalink / raw)
  To: Woodhouse, David
  Cc: H.J. Lu, Jakub Jelinek, Richard Biener, Jeff Law, Eric Botcazou,
	GCC Patches

On Wed, Jan 17, 2018 at 7:29 PM, Woodhouse, David <dwmw@amazon.co.uk> wrote:
> I'm not sure I understand the concern. When compiling a large project for
> -m32 vs. -m64, there must be a million times the compiler has to decide
> whether to emit "r" or "e" before a register name. HJ's patch already does
> this for the thunk symbol. What is the future requirement that I am not
> understanding, and that is so hard?

No, the concern is not with one extra fputc in the compiler.

IIRC, these thunks are also intended to be called from C code. So,
when one compiles this code for a 64bit target, the thunk has a different
name than when the same code is compiled for a 32bit target. This puts
an extra burden on the developer, who has to use the correct thunk name
in their code. Sure, this can be solved trivially with #ifdef
__x86_64__, so the issue is minor, but I thought it had to be
mentioned before the name is set in stone.
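
A minimal sketch of the #ifdef workaround described above, assuming the thunk names keep the full register name as in __x86_indirect_thunk_rax. The macro INDIRECT_THUNK_AX and the accessor function are hypothetical names introduced here for illustration; only the thunk symbols themselves come from the thread.

```c
/* Hypothetical macro: pick the external thunk symbol that branches via
   the accumulator register, depending on the target.  Any non-x86-64
   target falls through to the 32-bit spelling in this sketch.  */
#ifdef __x86_64__
# define INDIRECT_THUNK_AX "__x86_indirect_thunk_rax"
#else
# define INDIRECT_THUNK_AX "__x86_indirect_thunk_eax"
#endif

/* Hypothetical accessor so C code can refer to the name without
   repeating the conditional at every use.  */
static const char *
indirect_thunk_ax_name (void)
{
  return INDIRECT_THUNK_AX;
}
```

The conditional lives in one place, so callers only ever see the macro; that is the sense in which the burden is minor.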

BTW: The names of the registers are ax, bx, di, si, bp, ... and this
is reflected in 32bit PIC thunk names. "e" prefix stands for
"extended", and "r" was added to be consistent with r8 ... r15. The
added pack of registers on 64bit target has different naming rules for
sub-word access, e.g. r8b, r10w, r12d.

Uros.


* Re: [PATCH 0/5] x86: CVE-2017-5715, aka Spectre
  2018-01-18  8:02                                       ` Uros Bizjak
@ 2018-01-18  8:18                                         ` Woodhouse, David
  0 siblings, 0 replies; 135+ messages in thread
From: Woodhouse, David @ 2018-01-18  8:18 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: H.J. Lu, Jakub Jelinek, Richard Biener, Jeff Law, Eric Botcazou,
	GCC Patches

[-- Attachment #1: Type: text/plain, Size: 1357 bytes --]

On Thu, 2018-01-18 at 08:52 +0100, Uros Bizjak wrote:
> This puts an extra burden on the developer, who has to use the correct
> thunk name in their code. Sure, this can be solved trivially with
> #ifdef __x86_64__, so the issue is minor, but I thought it had to be
> mentioned before the name is set in stone.

Except the developer can mostly use the inline thunk and not have to
worry their pretty little heads about it. And the kernel developers who
*have* chosen to implement their own thunk (and who requested the
thunk-extern variant in the first place) have explicitly asked for the
symbol name to be as it is.

I spent a happy few hours on Sunday night looking at this. Seriously,
in neither the thunk implementation nor the call sites (in explicit
asm) was it easier. It was horrid — just look at the patch I sent out
with the footnote "let's not do this".

In the thunk I still had the actual register name with the 'e' or the
'r' in a macro argument anyway, because I had to deal with the value
from that register. So the thunk macro ended up taking *two* arguments
because its name now didn't match what it had to call the register.

And in the call sites I was putting the target into a register again
using either 'eax' or 'rax' form, then having to branch to the thunk
using the different '_ax' form. It just hurt.
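
The two-argument problem can be sketched with C preprocessor macros. This is only an illustration under stated assumptions: the real thunks are written with assembler macros, all four macro names below are hypothetical, and the bare symbol spelling ending in _ax is assumed from the '_ax' form mentioned above.

```c
/* Full register name in the symbol: a single argument yields both the
   thunk symbol and the register operand.  */
#define THUNK_SYM_FULL(reg)        "__x86_indirect_thunk_" #reg
#define THUNK_REG_FULL(reg)        "%" #reg

/* Bare name in the symbol: the symbol is built from the bare name, but
   the register operand still needs the prefixed name, so every use
   must pass both.  */
#define THUNK_SYM_BARE(bare, full) "__x86_indirect_thunk_" #bare
#define THUNK_REG_BARE(bare, full) "%" #full

/* Hypothetical accessors showing the two call shapes side by side.  */
static const char *
full_form_symbol (void)
{
  return THUNK_SYM_FULL (rax);
}

static const char *
bare_form_symbol (void)
{
  return THUNK_SYM_BARE (ax, rax);
}
```

With the full name, the symbol suffix and the operand are derived from the same token; with the bare name they diverge, which is the mismatch described for both the thunk body and the call sites.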


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5210 bytes --]


end of thread, other threads:[~2018-01-18  8:12 UTC | newest]

Thread overview: 135+ messages
2018-01-07 22:59 [PATCH 0/5] x86: CVE-2017-5715, aka Spectre H.J. Lu
2018-01-07 22:59 ` [PATCH 4/5] x86: Add -mindirect-branch-register H.J. Lu
2018-01-11 23:11   ` Jeff Law
2018-01-07 22:59 ` [PATCH 1/5] x86: Add -mindirect-branch= H.J. Lu
2018-01-08 10:56   ` Martin Liška
2018-01-08 11:59     ` H.J. Lu
2018-01-08 12:07       ` Jakub Jelinek
2018-01-08 12:12         ` H.J. Lu
2018-01-08 17:25           ` Michael Matz
2018-01-08 17:25             ` H.J. Lu
2018-01-08 17:51               ` Woodhouse, David
2018-01-08 23:10                 ` Michael Matz
2018-01-09  0:46                   ` Woodhouse, David
2018-01-08 21:30           ` Andi Kleen
2018-01-08 21:35             ` H.J. Lu
2018-01-08 21:44               ` David Woodhouse
2018-01-08 22:36                 ` Andi Kleen
2018-01-08 23:02                   ` David Woodhouse
2018-01-09 19:05     ` Jeff Law
2018-01-09 19:11       ` H.J. Lu
2018-01-11 22:54   ` Jeff Law
2018-01-11 23:03     ` H.J. Lu
2018-01-11 23:09     ` Jakub Jelinek
2018-01-12 17:59       ` Jeff Law
2018-01-12 18:00         ` Jakub Jelinek
2018-01-13  9:59         ` David Woodhouse
2018-01-13 16:23           ` Jeff Law
2018-01-13 16:35             ` David Woodhouse
2018-01-07 22:59 ` [PATCH 5/5] x86: Add 'V' register operand modifier H.J. Lu
2018-01-11 23:17   ` Jeff Law
2018-01-07 22:59 ` [PATCH 3/5] x86: Add -mfunction-return= H.J. Lu
2018-01-08 10:01   ` Martin Liška
2018-01-08 12:00     ` H.J. Lu
2018-01-08 21:29       ` David Woodhouse
2018-01-11 23:00   ` Jeff Law
2018-01-11 23:07     ` H.J. Lu
2018-01-07 22:59 ` [PATCH 2/5] x86: Add -mindirect-branch-loop= H.J. Lu
2018-01-08  8:23   ` Florian Weimer
2018-01-08 11:27     ` H.J. Lu
2018-01-08 19:05       ` Florian Weimer
2018-01-08 19:36         ` H.J. Lu
2018-01-08 21:01     ` David Woodhouse
2018-01-08 21:10       ` H.J. Lu
2018-01-11 21:49   ` Jeff Law
2018-01-11 21:49     ` H.J. Lu
2018-01-12 12:56     ` Martin Jambor
2018-01-12 14:23       ` H.J. Lu
2018-01-12 14:49         ` Kumar, Venkataramanan
2018-01-12 15:25           ` Kumar, Venkataramanan
2018-01-12 16:02             ` Jeff Law
2018-01-12 18:32             ` Kumar, Venkataramanan
2018-01-12 19:51               ` H.J. Lu
2018-01-13  6:26                 ` Kumar, Venkataramanan
2018-01-13  9:03                   ` David Woodhouse
2018-01-13 16:36                   ` David Woodhouse
2018-01-13 16:46                     ` H.J. Lu
2018-01-13 16:46                     ` Van De Ven, Arjan
2018-01-14  9:04                       ` Kumar, Venkataramanan
2018-01-12 16:03         ` Jeff Law
2018-01-07 23:36 ` [PATCH 0/5] x86: CVE-2017-5715, aka Spectre Jeff Law
2018-01-08  0:30   ` H.J. Lu
2018-01-08 10:01     ` Martin Liška
2018-01-09 18:55       ` Jeff Law
2018-01-09 18:54     ` Jeff Law
2018-01-09 19:26       ` H.J. Lu
2018-01-10 10:44       ` Eric Botcazou
2018-01-10 12:53         ` H.J. Lu
2018-01-10 13:12         ` Richard Biener
2018-01-10 13:35           ` Jakub Jelinek
2018-01-10 13:55             ` H.J. Lu
2018-01-11 10:16               ` Richard Biener
2018-01-11  1:30             ` Jeff Law
2018-01-11 10:16               ` Richard Biener
2018-01-11 13:41                 ` H.J. Lu
2018-01-12  8:07                   ` Uros Bizjak
2018-01-12 13:31                     ` H.J. Lu
2018-01-12 15:09                       ` Jan Hubicka
2018-01-12 15:30                         ` H.J. Lu
2018-01-14 16:21                     ` Uros Bizjak
2018-01-14 16:39                       ` H.J. Lu
2018-01-14 16:41                         ` Uros Bizjak
2018-01-14 16:43                           ` H.J. Lu
2018-01-14 16:48                             ` Jakub Jelinek
2018-01-14 17:17                               ` H.J. Lu
2018-01-14 17:51                                 ` Woodhouse, David
2018-01-14 18:52                                   ` Uros Bizjak
2018-01-14 20:35                                     ` Uros Bizjak
2018-01-14 20:44                                       ` David Woodhouse
2018-01-14 21:03                                         ` Uros Bizjak
2018-01-14 21:07                                           ` David Woodhouse
2018-01-14 21:09                                             ` Linus Torvalds
2018-01-14 21:16                                               ` David Woodhouse
2018-01-14 21:23                                                 ` Thomas Gleixner
2018-01-14 21:49                                               ` H.J. Lu
2018-01-14 22:03                                               ` David Woodhouse
2018-01-14 22:09                                                 ` H.J. Lu
2018-01-14 22:12                                                   ` David Woodhouse
2018-01-14 22:39                                                     ` H.J. Lu
2018-01-14 23:02                                                       ` David Woodhouse
2018-01-14 23:09                                                         ` Linus Torvalds
2018-01-14 23:21                                                           ` David Woodhouse
2018-01-14 23:23                                                             ` Linus Torvalds
2018-01-15  8:21                                           ` David Woodhouse
2018-01-17 18:34                                     ` Woodhouse, David
2018-01-18  8:02                                       ` Uros Bizjak
2018-01-18  8:18                                         ` Woodhouse, David
2018-01-14 16:58                             ` Uros Bizjak
2018-01-14 16:58                               ` H.J. Lu
2018-01-14 16:45                         ` Uros Bizjak
2018-01-16 19:57                           ` Uros Bizjak
2018-01-11 20:32                 ` Jeff Law
2018-01-11 23:45                   ` Joseph Myers
2018-01-12  0:46                     ` Jeff Law
2018-01-08 14:36   ` Alan Modra
2018-01-08 15:04     ` H.J. Lu
2018-01-08 15:07       ` Jakub Jelinek
2018-01-08 16:19         ` H.J. Lu
2018-01-08 16:32           ` Jakub Jelinek
2018-01-08 17:14         ` Michael Matz
2018-01-11  0:19     ` Jeff Law
2018-01-11 10:05       ` Alan Modra
2018-01-11 20:30         ` Jeff Law
2018-01-08 21:00   ` David Woodhouse
2018-01-11 21:19     ` Florian Weimer
2018-01-11 21:42       ` David Woodhouse
2018-01-13 12:17         ` Florian Weimer
2018-01-13 13:00           ` David Woodhouse
2018-01-08  4:22 ` Sandra Loosemore
2018-01-08  8:00   ` Markus Trippelsdorf
2018-01-08 11:40     ` H.J. Lu
2018-01-08 11:40   ` H.J. Lu
2018-01-11 13:28     ` H.J. Lu
2018-01-08 17:40   ` Florian Weimer
2018-01-08 17:33 ` Florian Weimer
2018-01-08 20:49   ` David Woodhouse
