public inbox for gcc-patches@gcc.gnu.org
* [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
@ 2020-10-06 14:01 Qing Zhao
From: Qing Zhao @ 2020-10-06 14:01 UTC
  To: richard Sandiford, Uros Bizjak
  Cc: kees Cook, rodriguez Bahena Victor, H.J. Lu, segher Boessenkool,
	gcc-patches Kees Cook via

Hi, GCC team,

This is the third version of the patch implementing -fzero-call-used-regs.

This patch adds a new feature to GCC:

Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all] command-line option
and
zero_call_used_regs("skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all") function attribute:

   1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")

   Don't zero call-used registers upon function return. This is the default behavior.

   2. -fzero-call-used-regs=used-gpr-arg and zero_call_used_regs("used-gpr-arg")

   Zero used call-used general purpose registers that are used to pass parameters upon function return.

   3. -fzero-call-used-regs=used-arg and zero_call_used_regs("used-arg")

   Zero used call-used registers that are used to pass parameters upon function return.

   4. -fzero-call-used-regs=all-arg and zero_call_used_regs("all-arg")

   Zero all call-used registers that are used to pass parameters upon function return.

   5. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")

   Zero used call-used general purpose registers upon function return.

   6. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")

   Zero all call-used general purpose registers upon function return.

   7. -fzero-call-used-regs=used and zero_call_used_regs("used")

   Zero used call-used registers upon function return.

   8. -fzero-call-used-regs=all and zero_call_used_regs("all")

   Zero all call-used registers upon function return.

Zero call-used registers at function return to increase the program
security by either mitigating Return-Oriented Programming (ROP) or
preventing information leak through registers.

{skip}, which is the default, doesn't zero call-used registers.

{used-gpr-arg} zeros used call-used general purpose registers that
pass parameters. {used-arg} zeros used call-used registers that
pass parameters. {all-arg} zeros all call-used registers that pass
parameters. These 3 choices are used for ROP mitigation.

{used-gpr} zeros call-used general purpose registers
that are used in the function.  {all-gpr} zeros all call-used
general purpose registers.  {used} zeros call-used registers that
are used in the function.  {all} zeros all call-used registers.
These 4 choices are used for preventing information leak through
registers.
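The eight choices vary along three independent axes, which mirror the gpr_only, used_only and arg_only flags that gen_call_used_regs_seq declares in the patch. The C sketch below tabulates that decomposition; struct zero_choice and lookup_zero_choice are hypothetical illustrations inferred from the option descriptions above, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decomposition of each -fzero-call-used-regs choice into
   independent properties, inferred from the option descriptions.  */
struct zero_choice
{
  const char *name;
  bool enabled;   /* false only for "skip" */
  bool gpr_only;  /* restrict to general purpose registers */
  bool used_only; /* restrict to registers used in the function */
  bool arg_only;  /* restrict to argument-passing registers */
};

static const struct zero_choice zero_choices[] = {
  { "skip",         false, false, false, false },
  { "used-gpr-arg", true,  true,  true,  true  },
  { "used-arg",     true,  false, true,  true  },
  { "all-arg",      true,  false, false, true  },
  { "used-gpr",     true,  true,  true,  false },
  { "all-gpr",      true,  true,  false, false },
  { "used",         true,  false, true,  false },
  { "all",          true,  false, false, false },
};

/* Return the table entry for CHOICE, or NULL if CHOICE is not one of the
   eight accepted spellings (e.g. the misspelling "used-arg-gpr").  */
static const struct zero_choice *
lookup_zero_choice (const char *choice)
{
  assert (choice != NULL);
  for (size_t i = 0; i < sizeof zero_choices / sizeof zero_choices[0]; i++)
    if (strcmp (zero_choices[i].name, choice) == 0)
      return &zero_choices[i];
  return NULL;
}
```

Reading the table row by row makes the grouping in the text visible: the three "-arg" choices all set arg_only (ROP mitigation), while the other four leave it clear (information-leak prevention).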

You can control this behavior for a specific function by using the function
attribute {zero_call_used_regs}.
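A minimal usage sketch, assuming a compiler that implements the attribute with the choice names proposed here: check_pin is a hypothetical function whose pointer arguments arrive in call-used general purpose registers, so "used-gpr" asks for the call-used GPRs it touches to be cleared before it returns. Registers still live at the return, such as the one holding the result, are expected to be left alone. Compilers that do not recognize the attribute generally warn and ignore it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical example: with "used-gpr", call-used general purpose
   registers used by this function are meant to be zeroed on return,
   except those still live at the return (such as the return value).  */
__attribute__ ((zero_call_used_regs ("used-gpr")))
int
check_pin (const char *pin, const char *expected)
{
  assert (pin != NULL && expected != NULL);
  return strcmp (pin, expected) == 0;
}
```

Comparing the assembly produced by gcc -O2 -S with and without the attribute is one way to see which zeroing instructions were added before the return.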

******Tests done:
1. GCC bootstrap on x86, aarch64 and rs6000.
2. Regression test on x86, aarch64 and rs6000.
(No issues on x86 or aarch64; rs6000 fails the new middle-end test case, which is expected.)

3. CPU2017 on x86 with -O2 -fzero-call-used-regs=used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all

******Runtime performance data of CPU2017 on x86:
https://gitlab.com/x86-gcc/gcc/-/wikis/uploads/e9c5bedba6e387586364571f2eae3b8d/zero_call_used_regs_runtime_New.csv

******The major changes compared to the previous version are:

1. Add 3 new sub-options and corresponding function attributes:
  used-gpr-arg, used-arg, all-arg
  for ROP mitigation purpose;
2. Updated user manual;
3. Re-design of the implementation:

  3.1 Data-flow changes that reflect the newly added zeroing insns, so that
  later passes do not delete, move, or merge them:

  3.1.1.
  Abstract EPILOGUE_USES into a new target-independent wrapper function that
  (a) returns true if EPILOGUE_USES itself returns true and (b) returns
  true for registers that need to be zeroed on return, if the zeroing
  instructions have already been inserted.  The places that currently
  test EPILOGUE_USES now test this new wrapper function instead.

  Add this new wrapper function to df.h and df-scan.c.

  3.1.2.
  Add a new utility routine "expand_asm_reg_clobber_mem_blockage" to generate
  a volatile asm insn that clobbers all the hard registers that are zeroed.

  Emit this volatile asm at the very beginning of the zeroing sequence.

  3.2 new pass:
  Add a new pass, "pass_zero_call_used_regs", at the beginning of
  "late_compilation", before "pass_compute_alignment".

  In this new pass:
  * compute the data flow information (df_analyze ());
  * scan the exit block backward, looking for "return" insns:
    A. for each return, compute "need_zeroed_hardregs" based on the user
    request, the data flow information, and the function ABI info;
    B. pass this need_zeroed_hardregs set to the target hook
    "zero_call_used_regs" to generate the instruction sequence that zeros
    the registers;
    C. maintain the data flow information.
4. Use "lookup_attribute" to get the attribute information instead of storing
  it in "tree_decl_with_vis" in tree-core.h.
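The register-set arithmetic in items 3.1.1 and 3.2 can be modeled with plain bit masks. In this toy C sketch each bit of a uint64_t stands for one hard register; struct toy_function and every toy_* function are hypothetical stand-ins for GCC's HARD_REG_SET, EPILOGUE_USES and crtl->zeroed_reg_set, and the order in which the filters are applied is an assumption rather than the patch's exact code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not GCC code: bit N set means hard register N.  */
typedef uint64_t regset;

struct toy_function
{
  regset call_used;     /* registers the ABI lets a callee clobber */
  regset gpr;           /* general purpose registers */
  regset arg;           /* registers used to pass parameters */
  regset ever_live;     /* registers the function actually uses */
  regset live_at_ret;   /* live before the return, e.g. the return value */
  regset epilogue_uses; /* stand-in for EPILOGUE_USES */
  regset zeroed;        /* stand-in for crtl->zeroed_reg_set */
};

/* Analogue of df_epilogue_uses_p: a register counts as used at the exit
   if the target says so, or if zeroing insns for it were already emitted,
   which keeps later passes from deleting those insns as dead.  */
static bool
toy_epilogue_uses_p (const struct toy_function *fn, unsigned regno)
{
  assert (regno < 64);
  regset bit = (regset) 1 << regno;
  return (fn->epilogue_uses & bit) != 0 || (fn->zeroed & bit) != 0;
}

/* Analogue of step A: filter the call-used set by the user's request,
   then drop registers that are still live at the return.  */
static regset
toy_need_zeroed_hardregs (const struct toy_function *fn,
			  bool gpr_only, bool used_only, bool arg_only)
{
  regset need = fn->call_used;
  if (gpr_only)
    need &= fn->gpr;
  if (used_only)
    need &= fn->ever_live;
  if (arg_only)
    need &= fn->arg;
  return need & ~fn->live_at_ret;
}

/* Analogue of steps B and C: a hypothetical target that can only zero
   GPRs reports the subset it handled, and the pass records that subset
   so the data-flow exit-use set includes it.  */
static void
toy_zero_call_used_regs (struct toy_function *fn, regset need)
{
  regset zeroed = need & fn->gpr;
  fn->zeroed |= zeroed;
}
```

Recording only the registers the hook actually zeroed, rather than everything requested, matches the hook's contract of returning the set it handled.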

******The changelog:

gcc/ChangeLog:
2020-10-05  Qing Zhao  <qing.zhao@oracle.com>
	    H.J. Lu  <hjl.tools@gmail.com>

	* common.opt: Add new option -fzero-call-used-regs.
	* config/i386/i386.c (zero_call_used_regno_p): New function.
	(zero_call_used_regno_mode): Likewise.
	(zero_all_vector_registers): Likewise.
	(zero_all_st_mm_registers): Likewise.
	(ix86_zero_call_used_regs): Likewise.
	(TARGET_ZERO_CALL_USED_REGS): Define.
	* coretypes.h (enum zero_call_used_regs): New type.
	* df-scan.c (df_epilogue_uses_p): New function.
	(df_get_exit_block_use_set): Replace EPILOGUE_USES with
	df_epilogue_uses_p.
	* df.h (df_epilogue_uses_p): Declare.
	* doc/extend.texi: Document the new zero_call_used_regs attribute.
	* doc/invoke.texi: Document the new -fzero-call-used-regs option.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGS): New hook. 
	* emit-rtl.h (struct rtl_data): New field zeroed_reg_set.
	* function.c (is_live_reg_at_return): New function.
	(gen_call_used_regs_seq): Likewise.
	(rest_of_zero_call_used_regs): Likewise.
	(class pass_zero_call_used_regs): New class.
	(make_pass_zero_call_used_regs): New function.
	* optabs.c (expand_asm_reg_clobber_mem_blockage): New function.
	* optabs.h (expand_asm_reg_clobber_mem_blockage): Declare.
	* passes.def: Add new pass pass_zero_call_used_regs.
	* recog.c (valid_insn_p): New function.
	* recog.h (valid_insn_p): Declare.
	* resource.c (init_resource_info): Replace EPILOGUE_USES with
	df_epilogue_uses_p.
	* target.def (zero_call_used_regs): New hook.
	* targhooks.c (default_zero_call_used_regs): New function.
	* targhooks.h (default_zero_call_used_regs): Declare.
	* tree-pass.h (make_pass_zero_call_used_regs): Declare.

gcc/c-family/ChangeLog:

2020-10-05  Qing Zhao  <qing.zhao@oracle.com>
	    H.J. Lu  <hjl.tools@gmail.com>

	* c-attribs.c (c_common_attribute_table): Add new attribute
	zero_call_used_regs.
	(handle_zero_call_used_regs_attribute): New function.

gcc/testsuite/ChangeLog:

2020-10-05  Qing Zhao  <qing.zhao@oracle.com>
	    H.J. Lu  <hjl.tools@gmail.com>

	* c-c++-common/zero-scratch-regs-1.c: New test.
	* c-c++-common/zero-scratch-regs-2.c: New test.
	* gcc.target/i386/zero-scratch-regs-1.c: New test.
	* gcc.target/i386/zero-scratch-regs-10.c: New test.
	* gcc.target/i386/zero-scratch-regs-11.c: New test.
	* gcc.target/i386/zero-scratch-regs-12.c: New test.
	* gcc.target/i386/zero-scratch-regs-13.c: New test.
	* gcc.target/i386/zero-scratch-regs-14.c: New test.
	* gcc.target/i386/zero-scratch-regs-15.c: New test.
	* gcc.target/i386/zero-scratch-regs-16.c: New test.
	* gcc.target/i386/zero-scratch-regs-17.c: New test.
	* gcc.target/i386/zero-scratch-regs-18.c: New test.
	* gcc.target/i386/zero-scratch-regs-19.c: New test.
	* gcc.target/i386/zero-scratch-regs-2.c: New test.
	* gcc.target/i386/zero-scratch-regs-20.c: New test.
	* gcc.target/i386/zero-scratch-regs-21.c: New test.
	* gcc.target/i386/zero-scratch-regs-22.c: New test.
	* gcc.target/i386/zero-scratch-regs-23.c: New test.
	* gcc.target/i386/zero-scratch-regs-24.c: New test.
	* gcc.target/i386/zero-scratch-regs-25.c: New test.
	* gcc.target/i386/zero-scratch-regs-26.c: New test.
	* gcc.target/i386/zero-scratch-regs-3.c: New test.
	* gcc.target/i386/zero-scratch-regs-4.c: New test.
	* gcc.target/i386/zero-scratch-regs-5.c: New test.
	* gcc.target/i386/zero-scratch-regs-6.c: New test.
	* gcc.target/i386/zero-scratch-regs-7.c: New test.
	* gcc.target/i386/zero-scratch-regs-8.c: New test.
	* gcc.target/i386/zero-scratch-regs-9.c: New test.


******The patch:

---
gcc/c-family/c-attribs.c                           |  50 +++++
gcc/common.opt                                     |  32 +++
gcc/config/i386/i386.c                             | 158 +++++++++++++
gcc/coretypes.h                                    |  13 ++
gcc/df-scan.c                                      |  12 +-
gcc/df.h                                           |   1 +
gcc/doc/extend.texi                                |  24 ++
gcc/doc/invoke.texi                                |  25 ++-
gcc/doc/tm.texi                                    |  10 +
gcc/doc/tm.texi.in                                 |   2 +
gcc/emit-rtl.h                                     |   3 +
gcc/function.c                                     | 247 ++++++++++++++++++++-
gcc/optabs.c                                       |  43 ++++
gcc/optabs.h                                       |   2 +
gcc/passes.def                                     |   1 +
gcc/recog.c                                        |  15 ++
gcc/recog.h                                        |   1 +
gcc/resource.c                                     |   2 +-
gcc/target.def                                     |  13 ++
gcc/targhooks.c                                    |  35 +++
gcc/targhooks.h                                    |   1 +
gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |  15 ++
gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |  16 ++
.../gcc.target/i386/zero-scratch-regs-1.c          |  12 +
.../gcc.target/i386/zero-scratch-regs-10.c         |  21 ++
.../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++
.../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++
.../gcc.target/i386/zero-scratch-regs-13.c         |  21 ++
.../gcc.target/i386/zero-scratch-regs-14.c         |  19 ++
.../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
.../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
.../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
.../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
.../gcc.target/i386/zero-scratch-regs-19.c         |  12 +
.../gcc.target/i386/zero-scratch-regs-2.c          |  19 ++
.../gcc.target/i386/zero-scratch-regs-20.c         |  23 ++
.../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
.../gcc.target/i386/zero-scratch-regs-22.c         |  20 ++
.../gcc.target/i386/zero-scratch-regs-23.c         |  28 +++
.../gcc.target/i386/zero-scratch-regs-24.c         |  10 +
.../gcc.target/i386/zero-scratch-regs-25.c         |  10 +
.../gcc.target/i386/zero-scratch-regs-26.c         |  23 ++
.../gcc.target/i386/zero-scratch-regs-3.c          |  12 +
.../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
.../gcc.target/i386/zero-scratch-regs-5.c          |  20 ++
.../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
.../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
.../gcc.target/i386/zero-scratch-regs-8.c          |  19 ++
.../gcc.target/i386/zero-scratch-regs-9.c          |  15 ++
gcc/tree-pass.h                                    |   1 +
50 files changed, 1187 insertions(+), 6 deletions(-)
create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index c779d13..69c3886 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -138,6 +138,8 @@ static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
static tree ignore_attribute (tree *, tree, tree, int, bool *);
static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
+static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
+						  bool *);
static tree handle_argspec_attribute (tree *, tree, tree, int, bool *);
static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
@@ -437,6 +439,8 @@ const struct attribute_spec c_common_attribute_table[] =
			      ignore_attribute, NULL },
  { "no_split_stack",	      0, 0, true,  false, false, false,
			      handle_no_split_stack_attribute, NULL },
+  { "zero_call_used_regs",    1, 1, true, false, false, false,
+			      handle_zero_call_used_regs_attribute, NULL },
  /* For internal use only (marking of function arguments).
     The name contains a space to prevent its usage in source code.  */
  { "arg spec",		      1, -1, true, false, false, false,
@@ -4959,6 +4963,52 @@ handle_no_split_stack_attribute (tree *node, tree name,
  return NULL_TREE;
}

+/* Handle a "zero_call_used_regs" attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
+				      int ARG_UNUSED (flags),
+				      bool *no_add_attris)
+{
+  tree decl = *node;
+  tree id = TREE_VALUE (args);
+
+  if (TREE_CODE (decl) != FUNCTION_DECL)
+    {
+      error_at (DECL_SOURCE_LOCATION (decl),
+		"%qE attribute applies only to functions", name);
+      *no_add_attris = true;
+      return NULL_TREE;
+    }
+
+  if (TREE_CODE (id) != STRING_CST)
+    {
+      error ("attribute %qE arguments not a string", name);
+      *no_add_attris = true;
+      return NULL_TREE;
+    }
+
+  if ((strcmp (TREE_STRING_POINTER (id), "skip") != 0)
+      && (strcmp (TREE_STRING_POINTER (id), "used-gpr-arg") != 0)
+      && (strcmp (TREE_STRING_POINTER (id), "used-arg") != 0)
+      && (strcmp (TREE_STRING_POINTER (id), "all-arg") != 0)
+      && (strcmp (TREE_STRING_POINTER (id), "used-gpr") != 0)
+      && (strcmp (TREE_STRING_POINTER (id), "all-gpr") != 0)
+      && (strcmp (TREE_STRING_POINTER (id), "used") != 0)
+      && (strcmp (TREE_STRING_POINTER (id), "all") != 0))
+    {
+      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs, "
+	     "%qs, %qs, %qs, or %qs",
+	     name, "skip", "used-gpr-arg", "used-arg", "all-arg",
+	     "used-gpr", "all-gpr", "used", "all");
+      *no_add_attris = true;
+      return NULL_TREE;
+    }
+
+  return NULL_TREE;
+}
+
/* Handle a "returns_nonnull" attribute; arguments as in
   struct attribute_spec.handler.  */

diff --git a/gcc/common.opt b/gcc/common.opt
index 292c2de..50bbf9c 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3111,6 +3111,38 @@ fzero-initialized-in-bss
Common Report Var(flag_zero_initialized_in_bss) Init(1)
Put zero initialized data in the bss section.

+fzero-call-used-regs=
+Common Report RejectNegative Joined Enum(zero_call_used_regs) Var(flag_zero_call_used_regs) Init(zero_call_used_regs_unset)
+Clear call-used registers upon function return.
+
+Enum
+Name(zero_call_used_regs) Type(enum zero_call_used_regs)
+Known choices of clearing call-used registers upon function return (for use with the -fzero-call-used-regs= option):
+
+EnumValue
+Enum(zero_call_used_regs) String(skip) Value(zero_call_used_regs_skip)
+
+EnumValue
+Enum(zero_call_used_regs) String(used-gpr-arg) Value(zero_call_used_regs_used_gpr_arg)
+
+EnumValue
+Enum(zero_call_used_regs) String(used-arg) Value(zero_call_used_regs_used_arg)
+
+EnumValue
+Enum(zero_call_used_regs) String(all-arg) Value(zero_call_used_regs_all_arg)
+
+EnumValue
+Enum(zero_call_used_regs) String(used-gpr) Value(zero_call_used_regs_used_gpr)
+
+EnumValue
+Enum(zero_call_used_regs) String(all-gpr) Value(zero_call_used_regs_all_gpr)
+
+EnumValue
+Enum(zero_call_used_regs) String(used) Value(zero_call_used_regs_used)
+
+EnumValue
+Enum(zero_call_used_regs) String(all) Value(zero_call_used_regs_all)
+
g
Common Driver RejectNegative JoinedOrMissing
Generate debug information in default format.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index f684954..620114f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -3551,6 +3551,161 @@ ix86_function_value_regno_p (const unsigned int regno)
  return false;
}

+/* Check whether the register REGNO should be zeroed on X86.
+   When ALL_SSE_ZEROED is true, all SSE registers have already been
+   zeroed together, so there is no need to zero them again.
+   Stack registers (st0-st7) and mm0-mm7 alias each other and are
+   very hard to zero individually, so don't zero individual st or
+   mm registers at this time.  */
+
+static bool
+zero_call_used_regno_p (const unsigned int regno,
+			bool all_sse_zeroed)
+{
+  return GENERAL_REGNO_P (regno)
+	 || (!all_sse_zeroed && SSE_REGNO_P (regno))
+	 || MASK_REGNO_P (regno);
+}
+
+/* Return the machine_mode that is used to zero register REGNO.  */
+
+static machine_mode
+zero_call_used_regno_mode (const unsigned int regno)
+{
+  /* NB: We only need to zero the lower 32 bits for integer registers
+     and the lower 128 bits for vector registers, since the destination
+     is zero-extended to the full register width.  */
+  if (GENERAL_REGNO_P (regno))
+    return SImode;
+  else if (SSE_REGNO_P (regno))
+    return V4SFmode;
+  else
+    return HImode;
+}
+
+/* Generate an rtx to zero all vector registers together if possible,
+   otherwise, return NULL.  */
+
+static rtx
+zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
+{
+  if (!TARGET_AVX)
+    return NULL;
+
+  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
+	 || (TARGET_64BIT
+	     && (REX_SSE_REGNO_P (regno)
+		 || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
+	&& !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
+      return NULL;
+
+  return gen_avx_vzeroall ();
+}
+
+/* Generate an rtx to zero all st and mm registers together if possible,
+   otherwise, return NULL.  */
+
+static rtx
+zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
+{
+  if (!TARGET_MMX)
+    return NULL;
+
+  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+    if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
+	&& !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
+      return NULL;
+
+  return gen_mmx_emms ();
+}
+
+/* TARGET_ZERO_CALL_USED_REGS.  */
+/* Generate a sequence of instructions that zero registers specified by
+   NEED_ZEROED_HARDREGS.  Return the ZEROED_HARDREGS that are actually
+   zeroed.  */
+static HARD_REG_SET
+ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
+{
+  HARD_REG_SET zeroed_hardregs;
+  bool all_sse_zeroed = false;
+
+  /* first, let's see whether we can zero all vector registers together.  */
+  rtx zero_all_vec_insn = zero_all_vector_registers (need_zeroed_hardregs);
+  if (zero_all_vec_insn)
+    {
+      emit_insn (zero_all_vec_insn);
+      all_sse_zeroed = true;
+    }
+
+  /* then, let's see whether we can zero all st+mm registers together.  */
+  rtx zero_all_st_mm_insn = zero_all_st_mm_registers (need_zeroed_hardregs);
+  if (zero_all_st_mm_insn)
+    emit_insn (zero_all_st_mm_insn);
+
+  /* Now, generate instructions to zero all the registers.  */
+
+  CLEAR_HARD_REG_SET (zeroed_hardregs);
+  rtx zero_gpr = NULL_RTX;
+  rtx zero_vector = NULL_RTX;
+  rtx zero_mask = NULL_RTX;
+
+  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+    {
+      if (!TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
+	continue;
+      if (!zero_call_used_regno_p (regno, all_sse_zeroed))
+	continue;
+
+      SET_HARD_REG_BIT (zeroed_hardregs, regno);
+
+      rtx reg, tmp;
+      machine_mode mode = zero_call_used_regno_mode (regno);
+
+      reg = gen_rtx_REG (mode, regno);
+
+      if (mode == SImode)
+	if (zero_gpr == NULL_RTX)
+	  {
+	    zero_gpr = reg;
+	    tmp = gen_rtx_SET (reg, const0_rtx);
+	    if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())
+	      {
+		rtx clob = gen_rtx_CLOBBER (VOIDmode,
+					    gen_rtx_REG (CCmode,
+							 FLAGS_REG));
+		tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2,
+							     tmp,
+							     clob));
+	      }
+	    emit_insn (tmp);
+	  }
+	else
+	  emit_move_insn (reg, zero_gpr);
+      else if (mode == V4SFmode)
+	if (zero_vector == NULL_RTX)
+	  {
+	    zero_vector = reg;
+	    tmp = gen_rtx_SET (reg, const0_rtx);
+	    emit_insn (tmp);
+	  }
+	else
+	  emit_move_insn (reg, zero_vector);
+      else if (mode == HImode)
+	if (zero_mask == NULL_RTX)
+	  {
+	    zero_mask = reg;
+	    tmp = gen_rtx_SET (reg, const0_rtx);
+	    emit_insn (tmp);
+	  }
+	else
+	  emit_move_insn (reg, zero_mask);
+      else
+	gcc_unreachable ();
+    }
+  return zeroed_hardregs;
+}
+
/* Define how to find the value returned by a function.
   VALTYPE is the data type of the value (as a tree).
   If the precise function being called is known, FUNC is its FUNCTION_DECL;
@@ -23229,6 +23384,9 @@ ix86_run_selftests (void)
#undef TARGET_FUNCTION_VALUE_REGNO_P
#define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p

+#undef TARGET_ZERO_CALL_USED_REGS
+#define TARGET_ZERO_CALL_USED_REGS ix86_zero_call_used_regs
+
#undef TARGET_PROMOTE_FUNCTION_MODE
#define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode

diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 6b6cfcd..0ce5eb4 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -418,6 +418,19 @@ enum symbol_visibility
  VISIBILITY_INTERNAL
};

+/* Zero call-used registers type.  */
+enum zero_call_used_regs {
+  zero_call_used_regs_unset = 0,
+  zero_call_used_regs_skip,
+  zero_call_used_regs_used_gpr_arg,
+  zero_call_used_regs_used_arg,
+  zero_call_used_regs_all_arg,
+  zero_call_used_regs_used_gpr,
+  zero_call_used_regs_all_gpr,
+  zero_call_used_regs_used,
+  zero_call_used_regs_all
+};
+
/* enums used by the targetm.excess_precision hook.  */

enum flt_eval_method
diff --git a/gcc/df-scan.c b/gcc/df-scan.c
index 93b060f..630970b 100644
--- a/gcc/df-scan.c
+++ b/gcc/df-scan.c
@@ -3614,6 +3614,14 @@ df_update_entry_block_defs (void)
}


+/* Return true if REGNO is used by the epilogue.  */
+bool
+df_epilogue_uses_p (unsigned int regno)
+{
+  return (EPILOGUE_USES (regno)
+	  || TEST_HARD_REG_BIT (crtl->zeroed_reg_set, regno));
+}
+
/* Set the bit for regs that are considered being used at the exit. */

static void
@@ -3661,7 +3669,7 @@ df_get_exit_block_use_set (bitmap exit_block_uses)
     epilogue as being live at the end of the function since they
     may be referenced by our caller.  */
  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-    if (global_regs[i] || EPILOGUE_USES (i))
+    if (global_regs[i] || df_epilogue_uses_p (i))
      bitmap_set_bit (exit_block_uses, i);

  if (targetm.have_epilogue () && epilogue_completed)
@@ -3802,7 +3810,6 @@ df_hard_reg_init (void)
  initialized = true;
}

-
/* Recompute the parts of scanning that are based on regs_ever_live
   because something changed in that array.  */

@@ -3862,7 +3869,6 @@ df_regs_ever_live_p (unsigned int regno)
  return regs_ever_live[regno];
}

-
/* Set regs_ever_live[REGNO] to VALUE.  If this cause regs_ever_live
   to change, schedule that change for the next update.  */

diff --git a/gcc/df.h b/gcc/df.h
index 8b6ca8c..0f098d7 100644
--- a/gcc/df.h
+++ b/gcc/df.h
@@ -1085,6 +1085,7 @@ extern void df_update_entry_exit_and_calls (void);
extern bool df_hard_reg_used_p (unsigned int);
extern unsigned int df_hard_reg_used_count (unsigned int);
extern bool df_regs_ever_live_p (unsigned int);
+extern bool df_epilogue_uses_p (unsigned int);
extern void df_set_regs_ever_live (unsigned int, bool);
extern void df_compute_regs_ever_live (bool);
extern void df_scan_verify (void);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index c9f7299..f56f61a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -3992,6 +3992,30 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
A declaration to which @code{weakref} is attached and that is associated
with a named @code{target} must be @code{static}.

+@item zero_call_used_regs ("@var{choice}")
+@cindex @code{zero_call_used_regs} function attribute
+
+The @code{zero_call_used_regs} attribute causes the compiler to zero
+call-used registers at function return according to @var{choice}.
+This is used to increase the program security by either mitigating
+Return-Oriented Programming (ROP) or preventing information leak
+through registers.
+@samp{skip} doesn't zero call-used registers.
+
+@samp{used-gpr-arg} zeros used call-used general purpose registers that
+pass parameters.  @samp{used-arg} zeros used call-used registers that
+pass parameters.  @samp{all-arg} zeros all call-used registers that pass
+parameters.  These 3 choices are used for ROP mitigation.
+
+@samp{used-gpr} zeros call-used general purpose registers
+which are used in the function.  @samp{all-gpr} zeros all call-used
+general purpose registers.  @samp{used} zeros call-used registers which
+are used in the function.  @samp{all} zeros all call-used registers.
+These 4 choices are used for preventing information leak through
+registers.
+
+The default for the attribute is controlled by @option{-fzero-call-used-regs}.
+
@end table

@c This is the end of the target-independent attribute table
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c049932..aa04a3c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -550,7 +550,7 @@ Objective-C and Objective-C++ Dialects}.
-funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
-funsafe-math-optimizations  -funswitch-loops @gol
-fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
--fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
+-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin -fzero-call-used-regs @gol
--param @var{name}=@var{value}
-O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}

@@ -12550,6 +12550,29 @@ int foo (void)

Not all targets support this option.

+@item -fzero-call-used-regs=@var{choice}
+@opindex fzero-call-used-regs
+Zero call-used registers at function return to increase the program
+security by either mitigating Return-Oriented Programming (ROP) or
+preventing information leak through registers.
+
+@samp{skip}, which is the default, doesn't zero call-used registers.
+
+@samp{used-gpr-arg} zeros used call-used general purpose registers that
+pass parameters.  @samp{used-arg} zeros used call-used registers that
+pass parameters.  @samp{all-arg} zeros all call-used registers that pass
+parameters.  These 3 choices are used for ROP mitigation.
+
+@samp{used-gpr} zeros call-used general purpose registers
+which are used in the function.  @samp{all-gpr} zeros all call-used
+general purpose registers.  @samp{used} zeros call-used registers which
+are used in the function.  @samp{all} zeros all call-used registers.
+These 4 choices are used for preventing information leak through
+registers.
+
+You can control this behavior for a specific function by using the function
+attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
+
@item --param @var{name}=@var{value}
@opindex param
In some places, GCC uses various constants to control the amount of
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 97437e8..7ecff05 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -12053,6 +12053,16 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
is needed.
@end deftypefn

+@deftypefn {Target Hook} HARD_REG_SET TARGET_ZERO_CALL_USED_REGS (HARD_REG_SET @var{need_zeroed_hardregs})
+This target hook emits instructions to zero the registers specified
+by @var{need_zeroed_hardregs} at function return, and returns the set
+of hard registers that the hook actually zeroed.
+Define this hook if the target has more efficient instructions to
+zero call-used registers, or if the target only tries to zero a subset
+of @var{need_zeroed_hardregs}.
+If the hook is not defined, @code{default_zero_call_used_regs} is used.
+@end deftypefn
+
@deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
When optimization is disabled, this hook indicates whether or not
arguments should be allocated to stack slots.  Normally, GCC allocates
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 412e22c..a67dbea 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8111,6 +8111,8 @@ and the associated definitions of those functions.

@hook TARGET_GET_DRAP_RTX

+@hook TARGET_ZERO_CALL_USED_REGS
+
@hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS

@hook TARGET_CONST_ANCHOR
diff --git a/gcc/emit-rtl.h b/gcc/emit-rtl.h
index 92ad0dd6..2dbeace0 100644
--- a/gcc/emit-rtl.h
+++ b/gcc/emit-rtl.h
@@ -310,6 +310,9 @@ struct GTY(()) rtl_data {
     sets them.  */
  HARD_REG_SET asm_clobbers;

+  /* All hard registers that are zeroed at the return of the routine.  */
+  HARD_REG_SET zeroed_reg_set;
+
  /* The highest address seen during shorten_branches.  */
  int max_insn_address;
};
diff --git a/gcc/function.c b/gcc/function.c
index c612959..c8181bd 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
#include "emit-rtl.h"
#include "recog.h"
#include "rtl-error.h"
+#include "hard-reg-set.h"
#include "alias.h"
#include "fold-const.h"
#include "stor-layout.h"
@@ -5815,6 +5816,182 @@ make_prologue_seq (void)
  return seq;
}

+/* Return true if hard register REGNO is live just before return insn RET.  */
+static bool
+is_live_reg_at_return (unsigned int regno, rtx_insn *ret)
+{
+  basic_block bb = BLOCK_FOR_INSN (ret);
+  auto_bitmap live_out;
+  bitmap_copy (live_out, df_get_live_out (bb));
+  df_simulate_one_insn_backwards (bb, ret, live_out);
+
+  if (REGNO_REG_SET_P (live_out, regno))
+    return true;
+
+  return false;
+}
+
+/* Emit a sequence of insns to zero the call-used registers before RET.  */
+
+static void
+gen_call_used_regs_seq (rtx_insn *ret)
+{
+  bool gpr_only = true;
+  bool used_only = true;
+  bool arg_only = true;
+  enum zero_call_used_regs zero_regs_type = zero_call_used_regs_unset;
+  enum zero_call_used_regs attr_zero_regs_type
+			    = zero_call_used_regs_unset;
+  tree attr_zero_regs
+	= lookup_attribute ("zero_call_used_regs",
+			    DECL_ATTRIBUTES (cfun->decl));
+
+  /* Get the type of zero_call_used_regs from function attribute.  */
+  if (attr_zero_regs)
+    {
+      /* The TREE_VALUE of an attribute is a TREE_LIST whose TREE_VALUE
+	 is the attribute argument's value.  */
+      attr_zero_regs = TREE_VALUE (attr_zero_regs);
+      gcc_assert (TREE_CODE (attr_zero_regs) == TREE_LIST);
+      attr_zero_regs = TREE_VALUE (attr_zero_regs);
+      gcc_assert (TREE_CODE (attr_zero_regs) == STRING_CST);
+
+      if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "skip") == 0)
+	attr_zero_regs_type = zero_call_used_regs_skip;
+      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used-gpr-arg")
+		== 0)
+	attr_zero_regs_type = zero_call_used_regs_used_gpr_arg;
+      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used-arg") == 0)
+	attr_zero_regs_type = zero_call_used_regs_used_arg;
+      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "all-arg") == 0)
+	attr_zero_regs_type = zero_call_used_regs_all_arg;
+      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used-gpr") == 0)
+	attr_zero_regs_type = zero_call_used_regs_used_gpr;
+      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "all-gpr") == 0)
+	attr_zero_regs_type = zero_call_used_regs_all_gpr;
+      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used") == 0)
+	attr_zero_regs_type = zero_call_used_regs_used;
+      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "all") == 0)
+	attr_zero_regs_type = zero_call_used_regs_all;
+      else
+	gcc_unreachable ();
+    }
+
+  /* The function attribute, if any, overrides the command-line option.  */
+  if (attr_zero_regs)
+    zero_regs_type = attr_zero_regs_type;
+  else if (flag_zero_call_used_regs)
+    zero_regs_type = flag_zero_call_used_regs;
+  else
+    zero_regs_type = zero_call_used_regs_unset;
+
+  /* No need to zero call-used-regs when no user request is present.  */
+  if (zero_regs_type <= zero_call_used_regs_skip)
+    return;
+
+  /* No need to zero call-used-regs in main ().  */
+  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
+    return;
+
+  /* No need to zero call-used-regs if __builtin_eh_return is called
+     since it isn't a normal function return.  */
+  if (crtl->calls_eh_return)
+    return;
+
+  /* If gpr_only is true, only zero general-purpose call-used registers;
+     if used_only is true, only zero call-used registers that are used in
+     the current function; if arg_only is true, only zero call-used
+     registers that are used to pass arguments.  */
+  switch (zero_regs_type)
+    {
+      case zero_call_used_regs_used_arg:
+	gpr_only = false;
+	break;
+      case zero_call_used_regs_all_arg:
+	gpr_only = false;
+	used_only = false;
+	break;
+      case zero_call_used_regs_used_gpr:
+	arg_only = false;
+	break;
+      case zero_call_used_regs_all_gpr:
+	used_only = false;
+	arg_only = false;
+	break;
+      case zero_call_used_regs_used:
+	gpr_only = false;
+	arg_only = false;
+	break;
+      case zero_call_used_regs_all:
+	gpr_only = false;
+	used_only = false;
+	arg_only = false;
+	break;
+      default:
+	break;
+    }
+
+  /* For each hard register, decide whether it should be zeroed.  A
+     register is zeroed only if all of the following hold:
+       1. it is a call-used register;
+       2. it is not a fixed register;
+       3. it is not live at the return of the routine;
+       4. it is a general-purpose register, if gpr_only is true;
+       5. it is used in the routine, if used_only is true;
+       6. it is used to pass arguments, if arg_only is true.  */
+
+  HARD_REG_SET need_zeroed_hardregs;
+  CLEAR_HARD_REG_SET (need_zeroed_hardregs);
+  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+    {
+      if (!this_target_hard_regs->x_call_used_regs[regno])
+	continue;
+      if (fixed_regs[regno])
+	continue;
+      if (is_live_reg_at_return (regno, ret))
+	continue;
+      if (gpr_only
+	  && !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], regno))
+	continue;
+      if (used_only && !df_regs_ever_live_p (regno))
+	continue;
+      if (arg_only && !FUNCTION_ARG_REGNO_P (regno))
+	continue;
+
+      /* Now this is a register that we might want to zero.  */
+      SET_HARD_REG_BIT (need_zeroed_hardregs, regno);
+    }
+
+  if (hard_reg_set_empty_p (need_zeroed_hardregs))
+    return;
+
+  /* Pass the set of registers that need to be zeroed to the target hook,
+     which emits the zeroing sequence and returns the set it zeroed.  */
+  HARD_REG_SET zeroed_hardregs;
+  start_sequence ();
+  zeroed_hardregs = targetm.calls.zero_call_used_regs (need_zeroed_hardregs);
+  rtx_insn *seq = get_insns ();
+  end_sequence ();
+  if (seq)
+    {
+      /* Emit the memory blockage and register-clobbering asm volatile
+	 before the whole sequence.  */
+      start_sequence ();
+      expand_asm_reg_clobber_mem_blockage (zeroed_hardregs);
+      rtx_insn *seq_barrier = get_insns ();
+      end_sequence ();
+
+      emit_insn_before (seq_barrier, ret);
+      emit_insn_before (seq, ret);
+
+      /* Update the data flow information.  */
+      crtl->zeroed_reg_set |= zeroed_hardregs;
+      df_set_bb_dirty (EXIT_BLOCK_PTR_FOR_FN (cfun));
+    }
+  return;
+}
+
+
/* Return a sequence to be used as the epilogue for the current function,
   or NULL.  */

@@ -6486,7 +6663,75 @@ make_pass_thread_prologue_and_epilogue (gcc::context *ctxt)
{
  return new pass_thread_prologue_and_epilogue (ctxt);
}
-

+
+static unsigned int
+rest_of_zero_call_used_regs (void)
+{
+  basic_block bb;
+  rtx_insn *insn;
+
+  /* This pass needs data flow information.  */
+  df_analyze ();
+
+  /* Find every return in the routine and insert an instruction sequence
+     before it to zero the call-used registers.  */
+  FOR_EACH_BB_REVERSE_FN (bb, cfun)
+    if (bb == EXIT_BLOCK_PTR_FOR_FN (cfun)
+	|| (single_succ_p (bb)
+	    && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun)))
+      FOR_BB_INSNS_REVERSE (bb, insn)
+	if (JUMP_P (insn) && ANY_RETURN_P (JUMP_LABEL (insn)))
+	  {
+	    /* Now we can insert the instruction sequence to zero the call used
+	       registers before this insn.  */
+	    gen_call_used_regs_seq (insn);
+	    break;
+	  }
+
+  return 0;
+}
+
+namespace {
+
+const pass_data pass_data_zero_call_used_regs =
+{
+  RTL_PASS, /* type */
+  "zero_call_used_regs", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_zero_call_used_regs: public rtl_opt_pass
+{
+public:
+  pass_zero_call_used_regs (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_zero_call_used_regs, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return flag_zero_call_used_regs > zero_call_used_regs_unset;
+    }
+  virtual unsigned int execute (function *)
+    {
+      return rest_of_zero_call_used_regs ();
+    }
+
+}; // class pass_zero_call_used_regs
+
+} // anon namespace
+
+rtl_opt_pass *
+make_pass_zero_call_used_regs (gcc::context *ctxt)
+{
+  return new pass_zero_call_used_regs (ctxt);
+}

/* If CONSTRAINT is a matching constraint, then return its number.
   Otherwise, return -1.  */
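The register selection in `gen_call_used_regs_seq` (option kind to `gpr_only`/`used_only`/`arg_only`, then a per-register filter) can be exercised in isolation. This is a standalone sketch with hypothetical names: `enum zero_kind` stands in for the patch's `enum zero_call_used_regs` (its order is assumed from the `<= zero_call_used_regs_skip` check and the switch), register sets are bitmasks, and `live_before` models the single backward step that `is_live_reg_at_return` takes with `df_simulate_one_insn_backwards`. The checks for `main` and `__builtin_eh_return` are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t regmask_t;   /* Hypothetical: bit N = hard register N.  */

/* Hypothetical stand-in for the patch's enum, in the assumed order.  */
enum zero_kind
{
  ZK_UNSET, ZK_SKIP, ZK_USED_GPR_ARG, ZK_USED_ARG, ZK_ALL_ARG,
  ZK_USED_GPR, ZK_ALL_GPR, ZK_USED, ZK_ALL
};

/* Backward liveness across one insn: live before = (live after - defs)
   | uses.  */
static regmask_t
live_before (regmask_t live_after, regmask_t defs, regmask_t uses)
{
  return (live_after & ~defs) | uses;
}

/* Hypothetical description of a target and of the current function.  */
struct reg_info
{
  regmask_t call_used, fixed, gpr, arg, ever_live;
};

static regmask_t
select_regs_to_zero (enum zero_kind kind, const struct reg_info *ri,
                     regmask_t live_at_return)
{
  bool gpr_only = true, used_only = true, arg_only = true;

  if (kind <= ZK_SKIP)
    return 0;   /* No user request: zero nothing.  */
  switch (kind)
    {
    case ZK_USED_ARG: gpr_only = false; break;
    case ZK_ALL_ARG: gpr_only = false; used_only = false; break;
    case ZK_USED_GPR: arg_only = false; break;
    case ZK_ALL_GPR: used_only = false; arg_only = false; break;
    case ZK_USED: gpr_only = false; arg_only = false; break;
    case ZK_ALL: gpr_only = used_only = arg_only = false; break;
    default: break;   /* ZK_USED_GPR_ARG keeps all three restrictions.  */
    }

  /* Conditions 1-3: call-used, not fixed, not live at the return.  */
  regmask_t set = ri->call_used & ~ri->fixed & ~live_at_return;
  if (gpr_only)
    set &= ri->gpr;
  if (used_only)
    set &= ri->ever_live;
  if (arg_only)
    set &= ri->arg;
  assert ((set & ~ri->call_used) == 0);
  return set;
}
```

Excluding registers that are live at the return is what keeps the return value intact: in the model, a register used by the return insn (such as the one holding the result) ends up in `live_at_return` and is never selected.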
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 8ad7f4b..57e5c5d 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6484,6 +6484,49 @@ expand_memory_blockage (void)
    expand_asm_memory_blockage ();
}

+/* Generate asm volatile("" : : : "memory") as a memory blockage, at the
+   same time clobbering the register set specified by ZEROED_REGS.  */
+
+void
+expand_asm_reg_clobber_mem_blockage (HARD_REG_SET zeroed_regs)
+{
+  rtx asm_op, clob_mem, clob_reg;
+
+  unsigned int num_of_regs = 0;
+  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    if (TEST_HARD_REG_BIT (zeroed_regs, i))
+      num_of_regs++;
+
+  if (num_of_regs == 0)
+    return;
+
+  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
+				 rtvec_alloc (0), rtvec_alloc (0),
+				 rtvec_alloc (0), UNKNOWN_LOCATION);
+  MEM_VOLATILE_P (asm_op) = 1;
+
+  rtvec v = rtvec_alloc (num_of_regs + 2);
+
+  clob_mem = gen_rtx_SCRATCH (VOIDmode);
+  clob_mem = gen_rtx_MEM (BLKmode, clob_mem);
+  clob_mem = gen_rtx_CLOBBER (VOIDmode, clob_mem);
+
+  RTVEC_ELT (v, 0) = asm_op;
+  RTVEC_ELT (v, 1) = clob_mem;
+
+  unsigned int j = 2;
+  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    if (TEST_HARD_REG_BIT (zeroed_regs, i))
+      {
+	clob_reg = gen_rtx_CLOBBER (VOIDmode, regno_reg_rtx[i]);
+	RTVEC_ELT (v, j) = clob_reg;
+	j++;
+      }
+  gcc_assert (j == (num_of_regs + 2));
+
+  emit_insn (gen_rtx_PARALLEL (VOIDmode, v));
+}
+
/* This routine will either emit the mem_thread_fence pattern or issue a 
   sync_synchronize to generate a fence for memory model MEMMODEL.  */

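`expand_asm_reg_clobber_mem_blockage` builds a PARALLEL whose element 0 is an empty volatile asm, element 1 clobbers all memory, and the remaining elements clobber each register in ZEROED_REGS in ascending order; as its comment says, this acts like `asm volatile ("" : : : "memory")` that also clobbers those registers. The standalone sketch below only models that vector layout; `enum elt_kind`, `struct elt` and `build_blockage` are hypothetical names, not GCC API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t regmask_t;   /* Hypothetical: bit N = hard register N.  */
#define NUM_REGS 32

enum elt_kind { ELT_ASM, ELT_CLOBBER_MEM, ELT_CLOBBER_REG };
struct elt { enum elt_kind kind; int regno; };  /* regno: ELT_CLOBBER_REG only.  */

/* Return a malloc'd vector modelling the PARALLEL and store its length in
   *LEN.  Returns NULL with *LEN == 0 if ZEROED is empty (the patch returns
   early in that case) or if allocation fails.  */
static struct elt *
build_blockage (regmask_t zeroed, size_t *len)
{
  size_t n = 0;
  for (int i = 0; i < NUM_REGS; i++)
    if (zeroed & ((regmask_t) 1 << i))
      n++;

  *len = 0;
  if (n == 0)
    return NULL;

  struct elt *v = malloc ((n + 2) * sizeof *v);
  if (v == NULL)
    return NULL;
  v[0] = (struct elt) { ELT_ASM, -1 };
  v[1] = (struct elt) { ELT_CLOBBER_MEM, -1 };

  size_t j = 2;
  for (int i = 0; i < NUM_REGS; i++)
    if (zeroed & ((regmask_t) 1 << i))
      v[j++] = (struct elt) { ELT_CLOBBER_REG, i };
  assert (j == n + 2);   /* Mirrors the gcc_assert in the patch.  */

  *len = j;
  return v;
}
```

The two passes over the register set (count, then fill) follow the patch, which needs the count up front because `rtvec_alloc` takes a fixed size.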
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 0b14700..bfa10c8 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -345,6 +345,8 @@ rtx expand_atomic_store (rtx, rtx, enum memmodel, bool);
rtx expand_atomic_fetch_op (rtx, rtx, rtx, enum rtx_code, enum memmodel, 
			      bool);

+extern void expand_asm_reg_clobber_mem_blockage (HARD_REG_SET);
+
extern bool insn_operand_matches (enum insn_code icode, unsigned int opno,
				  rtx operand);
extern bool valid_multiword_target_p (rtx);
diff --git a/gcc/passes.def b/gcc/passes.def
index f865bdc..77d4676 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -492,6 +492,7 @@ along with GCC; see the file COPYING3.  If not see
      POP_INSERT_PASSES ()
      NEXT_PASS (pass_late_compilation);
      PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
+	  NEXT_PASS (pass_zero_call_used_regs);
	  NEXT_PASS (pass_compute_alignments);
	  NEXT_PASS (pass_variable_tracking);
	  NEXT_PASS (pass_free_cfg);
diff --git a/gcc/recog.c b/gcc/recog.c
index ce83b7f..472c2dc 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -923,6 +923,21 @@ validate_simplify_insn (rtx_insn *insn)
  return ((num_changes_pending () > 0) && (apply_change_group () > 0));
}


+/* Return true if INSN is recognized and meets its operand constraints.  */
+bool
+valid_insn_p (rtx_insn *insn)
+{
+  recog_memoized (insn);
+  if (INSN_CODE (insn) < 0)
+    return false;
+  extract_insn (insn);
+  /* We don't know whether the insn will be in code that is optimized
+     for size or speed, so consider all enabled alternatives.  */
+  if (!constrain_operands (1, get_enabled_alternatives (insn)))
+    return false;
+  return true;
+}
+
/* Return 1 if OP is a valid general operand for machine mode MODE.
   This is either a register reference, a memory reference,
   or a constant.  In the case of a memory reference, the address
diff --git a/gcc/recog.h b/gcc/recog.h
index ae3675f..d87456c 100644
--- a/gcc/recog.h
+++ b/gcc/recog.h
@@ -113,6 +113,7 @@ extern void validate_replace_src_group (rtx, rtx, rtx_insn *);
extern bool validate_simplify_insn (rtx_insn *insn);
extern int num_changes_pending (void);
extern bool reg_fits_class_p (const_rtx, reg_class_t, int, machine_mode);
+extern bool valid_insn_p (rtx_insn *);

extern int offsettable_memref_p (rtx);
extern int offsettable_nonstrict_memref_p (rtx);
diff --git a/gcc/resource.c b/gcc/resource.c
index 0a9d594..90cf091 100644
--- a/gcc/resource.c
+++ b/gcc/resource.c
@@ -1186,7 +1186,7 @@ init_resource_info (rtx_insn *epilogue_insn)
			       &end_of_function_needs, true);

  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-    if (global_regs[i] || EPILOGUE_USES (i))
+    if (global_regs[i] || df_epilogue_uses_p (i))
      SET_HARD_REG_BIT (end_of_function_needs.regs, i);

  /* The registers required to be live at the end of the function are
diff --git a/gcc/target.def b/gcc/target.def
index ed2da15..7d6807d 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5080,6 +5080,19 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
is needed.",
 rtx, (void), NULL)

+/* Generate instruction sequence to zero call used registers.  */
+DEFHOOK
+(zero_call_used_regs,
+ "This target hook emits instructions to zero the registers specified\n\
+by @var{need_zeroed_hardregs} at function return, and returns the\n\
+set of hard registers that the hook actually zeroed.\n\
+Define this hook if the target has more efficient instructions to\n\
+zero call-used registers, or if the target only zeroes a subset\n\
+of @var{need_zeroed_hardregs}.\n\
+If the hook is not defined, @code{default_zero_call_used_regs} is used.",
+ HARD_REG_SET, (HARD_REG_SET need_zeroed_hardregs),
+default_zero_call_used_regs)
+
/* Return true if all function parameters should be spilled to the
   stack.  */
DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 5d94fce..2318c324 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -56,6 +56,9 @@ along with GCC; see the file COPYING3.  If not see
#include "tree-ssa-alias.h"
#include "gimple-expr.h"
#include "memmodel.h"
+#include "backend.h"
+#include "emit-rtl.h"
+#include "df.h"
#include "tm_p.h"
#include "stringpool.h"
#include "tree-vrp.h"
@@ -987,6 +990,38 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
#endif
}

+/* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
+
+HARD_REG_SET
+default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
+{
+  HARD_REG_SET zeroed_hardregs;
+  gcc_assert (!hard_reg_set_empty_p (need_zeroed_hardregs));
+
+  CLEAR_HARD_REG_SET (zeroed_hardregs);
+  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
+    if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
+      {
+	rtx_insn *last_insn = get_last_insn ();
+	machine_mode mode = GET_MODE (regno_reg_rtx[regno]);
+	rtx zero = CONST0_RTX (mode);
+	rtx_insn *insn = emit_move_insn (regno_reg_rtx[regno], zero);
+	if (!valid_insn_p (insn))
+	  {
+	    static bool issued_error;
+	    if (!issued_error)
+	      {
+		issued_error = true;
+		sorry ("-fzero-call-used-regs not supported on this target");
+	      }
+	    delete_insns_since (last_insn);
+	  }
+	else
+	  SET_HARD_REG_BIT (zeroed_hardregs, regno);
+      }
+  return zeroed_hardregs;
+}
+
rtx
default_internal_arg_pointer (void)
{
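`default_zero_call_used_regs` tries one zero move per requested register, keeps only the moves the target recognizes, and issues the "not supported" diagnostic at most once per compilation. A standalone sketch of that control flow follows; `regmask_t`, `move_is_valid` (standing in for `emit_move_insn` followed by `valid_insn_p`) and the `unsupported_reports` counter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t regmask_t;   /* Hypothetical: bit N = hard register N.  */
#define NUM_REGS 32

/* Hypothetical predicate: can the target move zero into REGNO with a
   recognizable insn?  In this model, registers 24 and up cannot.  */
static bool
move_is_valid (int regno)
{
  return regno < 24;
}

static int unsupported_reports;   /* Counts diagnostics in the model.  */

static regmask_t
model_default_zero_call_used_regs (regmask_t need)
{
  static bool issued_error;
  regmask_t zeroed = 0;

  assert (need != 0);   /* The caller never passes an empty set.  */
  for (int regno = 0; regno < NUM_REGS; regno++)
    {
      regmask_t bit = (regmask_t) 1 << regno;
      if (!(need & bit))
        continue;
      if (move_is_valid (regno))
        zeroed |= bit;   /* Keep the emitted move.  */
      else
        {
          /* The real hook deletes the invalid move here and calls
             sorry () only the first time.  */
          if (!issued_error)
            {
              issued_error = true;
              unsupported_reports++;
            }
        }
    }
  return zeroed;
}
```

Returning the subset matters because the caller only clobbers, and only records in `crtl->zeroed_reg_set`, the registers the hook reports as zeroed.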
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 44ab926..e0a925f 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -160,6 +160,7 @@ extern unsigned int default_function_arg_round_boundary (machine_mode,
							 const_tree);
extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
extern rtx default_function_value (const_tree, const_tree, bool);
+extern HARD_REG_SET default_zero_call_used_regs (HARD_REG_SET);
extern rtx default_libcall_value (machine_mode, const_rtx);
extern bool default_function_value_regno_p (const unsigned int);
extern rtx default_internal_arg_pointer (void);
diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
new file mode 100644
index 0000000..f44add9
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
@@ -0,0 +1,15 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fzero-call-used-regs=all" } */
+
+volatile int result = 0;
+int
+__attribute__((noinline))
+foo (int x)
+{
+  return x;
+}
+int main()
+{
+  result = foo (2);
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
new file mode 100644
index 0000000..7c8350b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
@@ -0,0 +1,16 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+volatile int result = 0;
+int
+__attribute__((noinline))
+__attribute__ ((zero_call_used_regs("all")))
+foo (int x)
+{
+  return x;
+}
+int main()
+{
+  result = foo (2);
+  return 0;
+}
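In user code, the attribute exercised by this test can be applied conditionally so the same source still builds with compilers that lack it. This is a sketch, assuming a compiler that provides `__has_attribute`; the `ZERO_USED_GPRS` macro and the `mix_secret` function are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: request zeroing of used call-used GPRs on return
   where the attribute is available, and expand to nothing otherwise.  */
#if defined(__has_attribute)
#  if __has_attribute(zero_call_used_regs)
#    define ZERO_USED_GPRS __attribute__((zero_call_used_regs("used-gpr")))
#  endif
#endif
#ifndef ZERO_USED_GPRS
#  define ZERO_USED_GPRS   /* Fallback: no register zeroing.  */
#endif

/* Hypothetical function that handles a sensitive value.  */
ZERO_USED_GPRS static uint32_t
mix_secret (uint32_t secret, uint32_t salt)
{
  uint32_t h = secret ^ salt;
  h *= 0x9E3779B1u;
  return h ^ (h >> 16);
}
```

The attribute does not change what the function computes, only what is left in call-used registers after it returns; compiling with `-O2 -S` and reading the epilogue is one way to see the zeroing sequence, which is how the `scan-assembler` tests in this patch check it.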
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
new file mode 100644
index 0000000..9f61dc4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
new file mode 100644
index 0000000..09048e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
new file mode 100644
index 0000000..4862688
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
+
+struct S { int i; };
+__attribute__((const, noinline, noclone))
+struct S foo (int x)
+{
+  struct S s;
+  s.i = x;
+  return s;
+}
+
+int a[2048], b[2048], c[2048], d[2048];
+struct S e[2048];
+
+__attribute__((noinline, noclone)) void
+bar (void)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    {
+      e[i] = foo (i);
+      a[i+2] = a[i] + a[i+1];
+      b[10] = b[10] + i;
+      c[i] = c[2047 - i];
+      d[i] = d[i + 1];
+    }
+}
+
+int
+main ()
+{
+  int i;
+  bar ();
+  for (i = 0; i < 1024; i++)
+    if (e[i].i != i)
+      __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
new file mode 100644
index 0000000..500251b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
@@ -0,0 +1,39 @@
+/* { dg-do run { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
+
+struct S { int i; };
+__attribute__((const, noinline, noclone))
+struct S foo (int x)
+{
+  struct S s;
+  s.i = x;
+  return s;
+}
+
+int a[2048], b[2048], c[2048], d[2048];
+struct S e[2048];
+
+__attribute__((noinline, noclone)) void
+bar (void)
+{
+  int i;
+  for (i = 0; i < 1024; i++)
+    {
+      e[i] = foo (i);
+      a[i+2] = a[i] + a[i+1];
+      b[10] = b[10] + i;
+      c[i] = c[2047 - i];
+      d[i] = d[i + 1];
+    }
+}
+
+int
+main ()
+{
+  int i;
+  bar ();
+  for (i = 0; i < 1024; i++)
+    if (e[i].i != i)
+      __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
new file mode 100644
index 0000000..8b058e3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
new file mode 100644
index 0000000..d4eaaf7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
new file mode 100644
index 0000000..dd3bb90
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
new file mode 100644
index 0000000..e2274f6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all" } */
+
+extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
new file mode 100644
index 0000000..7f5d153
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used" } */
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
new file mode 100644
index 0000000..fe13d2b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
+
+float
+foo (float z, float y, float x)
+{
+  return x + y;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
new file mode 100644
index 0000000..205a532
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
+
+float
+foo (float z, float y, float x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
new file mode 100644
index 0000000..e046684
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
new file mode 100644
index 0000000..4be8ff6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
@@ -0,0 +1,23 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
+
+float
+foo (float z, float y, float x)
+{
+  return x + y;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
+/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
new file mode 100644
index 0000000..0eb34e0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
+
+__attribute__ ((zero_call_used_regs("used")))
+float
+foo (float z, float y, float x)
+{
+  return x + y;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
new file mode 100644
index 0000000..76742bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
@@ -0,0 +1,20 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler "vzeroall" } } */
+/* { dg-final { scan-assembler "emms" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
new file mode 100644
index 0000000..18a5ffb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
@@ -0,0 +1,28 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler "vzeroall" } } */
+/* { dg-final { scan-assembler "emms" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "kxorw\[ \t\]*%k0, %k0, %k0" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k1" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k2" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k3" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k4" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k5" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k6" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k7" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c
new file mode 100644
index 0000000..208633e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used-gpr-arg" } */
+
+int 
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c
new file mode 100644
index 0000000..21e82c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used-arg" } */
+
+int 
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c
new file mode 100644
index 0000000..293d2fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c
@@ -0,0 +1,23 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all-arg" } */
+
+int 
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm1" } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm2" } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm3" } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm4" } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm5" } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm6" } } */
+/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm7" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
new file mode 100644
index 0000000..de71223
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
new file mode 100644
index 0000000..ccfa441
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
new file mode 100644
index 0000000..6b46ca3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
@@ -0,0 +1,20 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+__attribute__ ((zero_call_used_regs("all-gpr")))
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
new file mode 100644
index 0000000..0680f38
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
+
+extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
+
+void
+foo (void)
+{
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
+/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
new file mode 100644
index 0000000..534defa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
new file mode 100644
index 0000000..477bb19
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
@@ -0,0 +1,19 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
new file mode 100644
index 0000000..a305a60
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
@@ -0,0 +1,15 @@
+/* { dg-do compile { target *-*-linux* } } */
+/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
+
+extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
+
+int
+foo (int x)
+{
+  return x;
+}
+
+/* { dg-final { scan-assembler-not "vzeroall" } } */
+/* { dg-final { scan-assembler-not "%xmm" } } */
+/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
+/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 62e5b69..8afe8ee 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -592,6 +592,7 @@ extern rtl_opt_pass *make_pass_gcse2 (gcc::context *ctxt);
extern rtl_opt_pass *make_pass_split_after_reload (gcc::context *ctxt);
extern rtl_opt_pass *make_pass_thread_prologue_and_epilogue (gcc::context
							     *ctxt);
+extern rtl_opt_pass *make_pass_zero_call_used_regs (gcc::context *ctxt);
extern rtl_opt_pass *make_pass_stack_adjustments (gcc::context *ctxt);
extern rtl_opt_pass *make_pass_sched_fusion (gcc::context *ctxt);
extern rtl_opt_pass *make_pass_peephole2 (gcc::context *ctxt);
-- 
1.8.3.1
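The scan-assembler patterns in the i386 tests above (for example zero-scratch-regs-8.c) expect one particular shape for the general purpose register part of the zeroing sequence: the first register to be cleared is xor'ed with itself, and every further register is copied from it with a 32-bit movl (on x86-64, writing a 32-bit register zero-extends into the full 64-bit register). The stand-alone sketch below only reproduces that textual shape so the test expectations are easy to reason about. The helper emit_gpr_zeroing is hypothetical, it assumes the xor form (which ix86_zero_call_used_regs selects when !TARGET_USE_MOV0 or when optimizing for size), and it does not model vector or mask registers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the AT&T-syntax GPR zeroing sequence for
   REGS[0..NREGS-1] into OUT, one instruction per line.  The first
   register is cleared with "xorl", the rest are copied from it with
   "movl".  Returns the number of characters written (excluding the
   terminating NUL), or -1 if OUT is too small.  */
static int
emit_gpr_zeroing (const char *const *regs, size_t nregs,
		  char *out, size_t outsz)
{
  size_t pos = 0;

  if (outsz == 0)
    return -1;
  out[0] = '\0';

  for (size_t i = 0; i < nregs; i++)
    {
      int n;
      if (i == 0)
	n = snprintf (out + pos, outsz - pos, "xorl %%%s, %%%s\n",
		      regs[0], regs[0]);
      else
	n = snprintf (out + pos, outsz - pos, "movl %%%s, %%%s\n",
		      regs[0], regs[i]);
      /* snprintf reports the length it wanted; treat truncation as an
	 error.  */
      if (n < 0 || (size_t) n >= outsz - pos)
	return -1;
      pos += (size_t) n;
    }
  return (int) pos;
}
```

In zero-scratch-regs-8.c the return value lives in %eax, so %eax is excluded and %edx becomes the register that every other GPR is copied from, which is exactly what the "xorl %edx, %edx" and "movl %edx, %ecx" patterns check.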


* Re: [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
  2020-10-06 14:01 [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all] Qing Zhao
@ 2020-10-19 13:48 ` Qing Zhao
  2020-10-19 19:30 ` Uros Bizjak
  2020-10-20 18:12 ` Richard Sandiford
  2 siblings, 0 replies; 20+ messages in thread
From: Qing Zhao @ 2020-10-19 13:48 UTC
  To: Richard Sandiford, Uros Bizjak
  Cc: gcc-patches Kees Cook via, kees Cook, segher Boessenkool,
	rodriguez Bahena Victor

Ping.

Richard and Uros,

Could you please review the patch and let me know whether it’s Okay for gcc 11?

Thanks a lot.


Qing

> On Oct 6, 2020, at 9:01 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> Hi, Gcc team,
> 
> This is the 3rd version of the implementation of patch -fzero-call-used-regs.
> 
> This patch adds a new feature to GCC:
> 
> Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all] command-line option
> and
> zero_call_used_regs("skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all") function attributes:
> 
>   1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
> 
>   Don't zero call-used registers upon function return. This is the default behavior.
> 
>   2. -fzero-call-used-regs=used-gpr-arg and zero_call_used_regs("used-gpr-arg")
> 
>   Zero used call-used general purpose registers that are used to pass parameters upon function return.
> 
>   3. -fzero-call-used-regs=used-arg and zero_call_used_regs("used-arg")
> 
>   Zero used call-used registers that are used to pass parameters upon function return.
> 
>   4. -fzero-call-used-regs=all-arg and zero_call_used_regs("all-arg")
> 
>   Zero all call-used registers that are used to pass parameters upon function return.
> 
>   5. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")
> 
>   Zero used call-used general purpose registers upon function return.
> 
>   6. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")
> 
>   Zero all call-used general purpose registers upon function return.
> 
>   7. -fzero-call-used-regs=used and zero_call_used_regs("used")
> 
>   Zero used call-used registers upon function return.
> 
>   8. -fzero-call-used-regs=all and zero_call_used_regs("all")
> 
>   Zero all call-used registers upon function return.
> 
> Zero call-used registers at function return to increase program
> security, either by mitigating Return-Oriented Programming (ROP) or
> by preventing information leaks through registers.
> 
> {skip}, which is the default, doesn't zero call-used registers.
> 
> {used-gpr-arg} zeros used call-used general purpose registers that
> pass parameters. {used-arg} zeros used call-used registers that
> pass parameters. {all-arg} zeros all call-used registers that pass
> parameters. These 3 choices are used for ROP mitigation.
> 
> {used-gpr} zeros call-used general purpose registers
> which are used in the function.  {all-gpr} zeros all
> call-used general purpose registers.  {used} zeros call-used registers which
> are used in the function.  {all} zeros all call-used registers.
> These 4 choices are used for preventing information leaks through
> registers.
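One way to read the eight choices is as combinations of three independent restrictions on the set of call-used registers: only registers the function used, only general purpose registers, and only argument-passing registers. The sketch below assumes that decomposition. The ZR_* flag names, the zr_* helpers and the bitmask register model (bit N stands for hard register N) are hypothetical and are not code from the patch; registers live at return (the return-value register) are excluded, which is the role the ChangeLog gives to is_live_reg_at_return.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding of the choices as independent restrictions.  */
enum
{
  ZR_ENABLED   = 1u << 0,	/* zero something on return */
  ZR_ONLY_USED = 1u << 1,	/* ...only registers the function used */
  ZR_ONLY_GPR  = 1u << 2,	/* ...only general purpose registers */
  ZR_ONLY_ARG  = 1u << 3	/* ...only argument-passing registers */
};

struct zr_choice
{
  const char *name;
  unsigned flags;
};

static const struct zr_choice zr_choices[] = {
  { "skip",         0 },
  { "used-gpr-arg", ZR_ENABLED | ZR_ONLY_USED | ZR_ONLY_GPR | ZR_ONLY_ARG },
  { "used-arg",     ZR_ENABLED | ZR_ONLY_USED | ZR_ONLY_ARG },
  { "all-arg",      ZR_ENABLED | ZR_ONLY_ARG },
  { "used-gpr",     ZR_ENABLED | ZR_ONLY_USED | ZR_ONLY_GPR },
  { "all-gpr",      ZR_ENABLED | ZR_ONLY_GPR },
  { "used",         ZR_ENABLED | ZR_ONLY_USED },
  { "all",          ZR_ENABLED },
};

/* Return the flags for NAME, or -1 if NAME is not a valid choice.  */
static int
zr_parse (const char *name)
{
  for (size_t i = 0; i < sizeof zr_choices / sizeof zr_choices[0]; i++)
    if (strcmp (zr_choices[i].name, name) == 0)
      return (int) zr_choices[i].flags;
  return -1;
}

/* Compute the registers to zero at one return.  */
static uint64_t
zr_need_zeroed (unsigned flags, uint64_t call_used, uint64_t live_at_return,
		uint64_t used, uint64_t gpr, uint64_t arg)
{
  if (!(flags & ZR_ENABLED))
    return 0;
  /* Never clobber a register that still carries the return value.  */
  uint64_t set = call_used & ~live_at_return;
  if (flags & ZR_ONLY_USED)
    set &= used;
  if (flags & ZR_ONLY_GPR)
    set &= gpr;
  if (flags & ZR_ONLY_ARG)
    set &= arg;
  return set;
}
```

With this encoding the "all" choices are simply the ones without ZR_ONLY_USED, and the "-arg" choices add ZR_ONLY_ARG, so each added restriction can only shrink the set of zeroed registers.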
> 
> You can control this behavior for a specific function by using the function
> attribute {zero_call_used_regs}.
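A minimal usage sketch of the per-function attribute, assuming a compiler that implements zero_call_used_regs as described above (a compiler that does not know the attribute warns and ignores it). The function names are illustrative only. The intent, as zero-scratch-regs-6.c checks for "skip", is that the attribute on a function takes precedence over the -fzero-call-used-regs= default given on the command line.

```c
/* Assumed build line:
     gcc -O2 -fzero-call-used-regs=all-gpr example.c  */

/* A small, hot helper opted out of the command-line default.  */
__attribute__ ((zero_call_used_regs ("skip")))
static int
add_one (int x)
{
  return x + 1;
}

/* A function that handles secret data asks for the call-used GPRs it
   used to be cleared on return, whatever the command-line setting.  */
__attribute__ ((zero_call_used_regs ("used-gpr")))
int
mix_secret (int key, int data)
{
  return add_one (key ^ data);
}
```

The zeroing happens only in the epilogue, so the computed results are unchanged; only the register state left behind after the return differs.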
> 
> ******Tests done:
> 1. GCC bootstrap on x86, aarch64 and rs6000.
> 2. Regression test on x86, aarch64 and rs6000.
> (x86 and aarch64 show no issues; rs6000 fails the new middle-end test case, which is expected.)
> 
> 3. CPU2017 on x86 with -O2 -fzero-call-used-regs=used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all
> 
> ******runtime performance data of CPU2017 on x86
> https://urldefense.com/v3/__https://gitlab.com/x86-gcc/gcc/-/wikis/uploads/e9c5bedba6e387586364571f2eae3b8d/zero_call_used_regs_runtime_New.csv__;!!GqivPVa7Brio!KabFawXf4waV8v6RfHHKO5ZoFbei3YVhbR9Pquv8AVICgBHpjAkuEb5mfVbaNozS$
> 
> ******The major changes compared to the previous version are:
> 
> 1. Add 3 new sub-options and corresponding function attributes:
>  used-gpr-arg, used-arg, all-arg
>  for ROP mitigation purpose;
> 2. Updated user manual;
> 3. Re-design of the implementation:
> 
>  3.1 data flow change to reflect the newly added zeroing insns, to prevent
>  these insns from being deleted, moved, or merged by later passes:
> 
>  3.1.1.
>  abstract EPILOGUE_USES into a new target-independent wrapper function that
>  (a) returns true if EPILOGUE_USES itself returns true and (b) returns
>  true for registers that need to be zeroed on return, if the zeroing
>  instructions have already been inserted.  The places that currently
>  test EPILOGUE_USES should then test this new wrapper function instead.
> 
>  Add this new wrapper function to df.h and df-scan.c.
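The point of the wrapper in 3.1.1 is that, once the zeroing insns exist, the registers they clear are treated as used at function exit, so later passes see the zeroing moves as live rather than dead. The toy model below illustrates that predicate. The names target_epilogue_uses, struct fn_state and epilogue_uses_p are hypothetical stand-ins for EPILOGUE_USES, the per-function zeroed register set and df_epilogue_uses_p, and registers are modelled as bits of a 64-bit mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the target's EPILOGUE_USES (REGNO): pretend only
   register 7 is always used by the epilogue.  */
static bool
target_epilogue_uses (unsigned regno)
{
  return regno == 7;
}

/* Hypothetical per-function state: whether the zeroing sequence has
   been emitted, and which registers it clears (bit N = register N).  */
struct fn_state
{
  bool zeroing_emitted;
  uint64_t zeroed_regs;
};

/* A register counts as used by the epilogue if the target says so, or
   if zeroing insns for it are already in place.  Reporting the zeroed
   registers as live at exit keeps dead-code elimination from deleting
   the otherwise "useless" zeroing moves.  */
static bool
epilogue_uses_p (const struct fn_state *fn, unsigned regno)
{
  if (target_epilogue_uses (regno))
    return true;
  return fn->zeroing_emitted && ((fn->zeroed_regs >> regno) & 1u) != 0;
}
```

Before the pass runs, the wrapper behaves exactly like the target macro, so passes that run earlier are unaffected.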
> 
>  3.1.2.
>  add a new utility routine "expand_asm_reg_clobber_mem_blockage" to generate
>  a volatile asm insn that clobbers all the hard registers that are zeroed.
> 
>  emit this volatile asm at the very beginning of the zeroing sequence.
> 
>  3.2 new pass:
>  add a new pass at the beginning of "late_compilation", before
>  "pass_compute_alignment", called "pass_zero_call_used_regs".
> 
>  in this new pass,
>  * compute the data flow information; (df_analyze ());
>  * scan the exit block backward to look for "return":
>    A. for each return, compute the "need_zeroed_hardregs" based on
>    the user request, the data flow information, and the function ABI info.
>    B. pass this need_zeroed_hardregs set to target hook "zero_call_used_regs"
>    to generate the instruction sequence that zeroes the regs.
>    C. Data flow maintenance. 
> 4. Use "lookup_attribute" to get the attribute information instead of storing
>  the attribute information into "tree_decl_with_vis" in tree-core.h.
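Steps A to C above describe a contract between the new pass and the target hook: the pass computes what should be zeroed at each return, the hook emits what it can and reports the registers it actually zeroed, and the pass records that result for data flow. The toy model below illustrates only that contract. All names are hypothetical; the "unsupported" range merely mimics the i386 hook declining to zero individual st/mm registers, and no instructions are emitted.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t regset;	/* bit N = hard register N */

/* Toy target hook: pretend registers 8..15 cannot be zeroed one by one,
   so they are dropped from the set reported as actually zeroed.  A real
   hook would emit the zeroing instructions here.  */
static regset
toy_zero_call_used_regs (regset need_zeroed)
{
  const regset unsupported = 0xFF00u;
  return need_zeroed & ~unsupported;
}

struct toy_function
{
  unsigned n_returns;
  const regset *need_per_return;	/* step A result, one per return */
  regset zeroed_reg_set;		/* union of what was zeroed (step C) */
};

static void
toy_zero_call_used_regs_pass (struct toy_function *fn,
			      regset (*hook) (regset))
{
  fn->zeroed_reg_set = 0;
  for (unsigned i = 0; i < fn->n_returns; i++)
    {
      regset need = fn->need_per_return[i];
      if (need == 0)
	continue;
      /* Step B: ask the target to zero the registers.  */
      regset done = hook (need);
      /* Step C: remember what was zeroed so it is treated as live at
	 exit by the data flow information.  */
      fn->zeroed_reg_set |= done;
    }
}
```

Recording the hook's return value rather than the requested set means registers the target could not zero are never reported as zeroed.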
> 
> ******The changelog:
> 
> gcc/ChangeLog: 
> 2020-10-05  Qing Zhao  <qing.zhao@oracle.com>
> 	    H.J. Lu  <hjl.tools@gmail.com>
> 
> 	* common.opt: Add new option -fzero-call-used-regs.
> 	* config/i386/i386.c (zero_call_used_regno_p): New function.
> 	(zero_call_used_regno_mode): Likewise.
> 	(zero_all_vector_registers): Likewise.
> 	(zero_all_st_mm_registers): Likewise.
> 	(ix86_zero_call_used_regs): Likewise.
> 	(TARGET_ZERO_CALL_USED_REGS): Define.
> 	* coretypes.h (enum zero_call_used_regs): New type.
> 	* df-scan.c (df_epilogue_uses_p): New function.
> 	(df_get_exit_block_use_set): Replace EPILOGUE_USES with
> 	df_epilogue_uses_p.
> 	* df.h (df_epilogue_uses_p): Declare.
> 	* doc/extend.texi: Document the new zero_call_used_regs attribute.
> 	* doc/invoke.texi: Document the new -fzero-call-used-regs option.
> 	* doc/tm.texi: Regenerate.
> 	* doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGS): New hook. 
> 	* emit-rtl.h (struct rtl_data): New field zeroed_reg_set.
> 	* function.c (is_live_reg_at_return): New function.
> 	(gen_call_used_regs_seq): Likewise.
> 	(rest_of_zero_call_used_regs): Likewise.
> 	(class pass_zero_call_used_regs): New class.
> 	(make_pass_zero_call_used_regs): New function.
> 	* optabs.c (expand_asm_reg_clobber_mem_blockage): New function.
> 	* optabs.h (expand_asm_reg_clobber_mem_blockage): Declare.
> 	* passes.def: Add new pass pass_zero_call_used_regs.
> 	* recog.c (valid_insn_p): New function.
> 	* recog.h (valid_insn_p): Declare.
> 	* resource.c (init_resource_info): Replace EPILOGUE_USES with
> 	df_epilogue_uses_p.
> 	* target.def (zero_call_used_regs): New hook.
> 	* targhooks.c (default_zero_call_used_regs): New function.
> 	* targhooks.h (default_zero_call_used_regs): Declare.
> 	* tree-pass.h (make_pass_zero_call_used_regs): Declare.
> 
> gcc/c-family/ChangeLog:
> 
> 2020-10-05  Qing Zhao  <qing.zhao@oracle.com>
> 	    H.J. Lu  <hjl.tools@gmail.com>
> 
> 	* c-attribs.c (c_common_attribute_table): Add new attribute
> 	zero_call_used_regs.
> 	(handle_zero_call_used_regs_attribute): New function.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-10-05  Qing Zhao  <qing.zhao@oracle.com>
> 	    H.J. Lu  <hjl.tools@gmail.com>
> 
> 	* c-c++-common/zero-scratch-regs-1.c: New test.
> 	* c-c++-common/zero-scratch-regs-2.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-1.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-10.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-11.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-12.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-13.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-14.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-15.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-16.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-17.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-18.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-19.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-2.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-20.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-21.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-22.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-23.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-24.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-25.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-26.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-3.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-4.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-5.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-6.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-7.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-8.c: New test.
> 	* gcc.target/i386/zero-scratch-regs-9.c: New test.
> 
> 
> ******The patch:
> 
> ---
> gcc/c-family/c-attribs.c                           |  50 +++++
> gcc/common.opt                                     |  32 +++
> gcc/config/i386/i386.c                             | 158 +++++++++++++
> gcc/coretypes.h                                    |  13 ++
> gcc/df-scan.c                                      |  12 +-
> gcc/df.h                                           |   1 +
> gcc/doc/extend.texi                                |  24 ++
> gcc/doc/invoke.texi                                |  25 ++-
> gcc/doc/tm.texi                                    |  10 +
> gcc/doc/tm.texi.in                                 |   2 +
> gcc/emit-rtl.h                                     |   3 +
> gcc/function.c                                     | 247 ++++++++++++++++++++-
> gcc/optabs.c                                       |  43 ++++
> gcc/optabs.h                                       |   2 +
> gcc/passes.def                                     |   1 +
> gcc/recog.c                                        |  15 ++
> gcc/recog.h                                        |   1 +
> gcc/resource.c                                     |   2 +-
> gcc/target.def                                     |  13 ++
> gcc/targhooks.c                                    |  35 +++
> gcc/targhooks.h                                    |   1 +
> gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |  15 ++
> gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |  16 ++
> .../gcc.target/i386/zero-scratch-regs-1.c          |  12 +
> .../gcc.target/i386/zero-scratch-regs-10.c         |  21 ++
> .../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++
> .../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++
> .../gcc.target/i386/zero-scratch-regs-13.c         |  21 ++
> .../gcc.target/i386/zero-scratch-regs-14.c         |  19 ++
> .../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
> .../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
> .../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
> .../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
> .../gcc.target/i386/zero-scratch-regs-19.c         |  12 +
> .../gcc.target/i386/zero-scratch-regs-2.c          |  19 ++
> .../gcc.target/i386/zero-scratch-regs-20.c         |  23 ++
> .../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
> .../gcc.target/i386/zero-scratch-regs-22.c         |  20 ++
> .../gcc.target/i386/zero-scratch-regs-23.c         |  28 +++
> .../gcc.target/i386/zero-scratch-regs-24.c         |  10 +
> .../gcc.target/i386/zero-scratch-regs-25.c         |  10 +
> .../gcc.target/i386/zero-scratch-regs-26.c         |  23 ++
> .../gcc.target/i386/zero-scratch-regs-3.c          |  12 +
> .../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
> .../gcc.target/i386/zero-scratch-regs-5.c          |  20 ++
> .../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
> .../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
> .../gcc.target/i386/zero-scratch-regs-8.c          |  19 ++
> .../gcc.target/i386/zero-scratch-regs-9.c          |  15 ++
> gcc/tree-pass.h                                    |   1 +
> 50 files changed, 1187 insertions(+), 6 deletions(-)
> create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> 
> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> index c779d13..69c3886 100644
> --- a/gcc/c-family/c-attribs.c
> +++ b/gcc/c-family/c-attribs.c
> @@ -138,6 +138,8 @@ static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
> static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> static tree ignore_attribute (tree *, tree, tree, int, bool *);
> static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
> +static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
> +						  bool *);
> static tree handle_argspec_attribute (tree *, tree, tree, int, bool *);
> static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
> static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
> @@ -437,6 +439,8 @@ const struct attribute_spec c_common_attribute_table[] =
> 			      ignore_attribute, NULL },
>  { "no_split_stack",	      0, 0, true,  false, false, false,
> 			      handle_no_split_stack_attribute, NULL },
> +  { "zero_call_used_regs",    1, 1, true, false, false, false,
> +			      handle_zero_call_used_regs_attribute, NULL },
>  /* For internal use only (marking of function arguments).
>     The name contains a space to prevent its usage in source code.  */
>  { "arg spec",		      1, -1, true, false, false, false,
> @@ -4959,6 +4963,52 @@ handle_no_split_stack_attribute (tree *node, tree name,
>  return NULL_TREE;
> }
> 
> +/* Handle a "zero_call_used_regs" attribute; arguments as in
> +   struct attribute_spec.handler.  */
> +
> +static tree
> +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> +				      int ARG_UNUSED (flags),
> +				      bool *no_add_attris)
> +{
> +  tree decl = *node;
> +  tree id = TREE_VALUE (args);
> +
> +  if (TREE_CODE (decl) != FUNCTION_DECL)
> +    {
> +      error_at (DECL_SOURCE_LOCATION (decl),
> +		"%qE attribute applies only to functions", name);
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  if (TREE_CODE (id) != STRING_CST)
> +    {
> +      error ("attribute %qE arguments not a string", name);
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  if ((strcmp (TREE_STRING_POINTER (id), "skip") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "used-gpr-arg") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "used-arg") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "all-arg") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "used-gpr") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "all-gpr") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "used") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "all") != 0))
> +    {
> +      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs, "
> +	     "%qs, %qs, %qs, or %qs",
> +	     name, "skip", "used-gpr-arg", "used-arg", "all-arg",
> +	     "used-gpr", "all-gpr", "used", "all");
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  return NULL_TREE;
> +}
> +
> /* Handle a "returns_nonnull" attribute; arguments as in
>   struct attribute_spec.handler.  */
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 292c2de..50bbf9c 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -3111,6 +3111,38 @@ fzero-initialized-in-bss
> Common Report Var(flag_zero_initialized_in_bss) Init(1)
> Put zero initialized data in the bss section.
> 
> +fzero-call-used-regs=
> +Common Report RejectNegative Joined Enum(zero_call_used_regs) Var(flag_zero_call_used_regs) Init(zero_call_used_regs_unset)
> +Clear call-used registers upon function return.
> +
> +Enum
> +Name(zero_call_used_regs) Type(enum zero_call_used_regs)
> +Known choices of clearing call-used registers upon function return (for use with the -fzero-call-used-regs= option):
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(skip) Value(zero_call_used_regs_skip)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(used-gpr-arg) Value(zero_call_used_regs_used_gpr_arg)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(used-arg) Value(zero_call_used_regs_used_arg)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(all-arg) Value(zero_call_used_regs_all_arg)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(used-gpr) Value(zero_call_used_regs_used_gpr)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(all-gpr) Value(zero_call_used_regs_all_gpr)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(used) Value(zero_call_used_regs_used)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(all) Value(zero_call_used_regs_all)
> +
> g
> Common Driver RejectNegative JoinedOrMissing
> Generate debug information in default format.
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index f684954..620114f 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -3551,6 +3551,161 @@ ix86_function_value_regno_p (const unsigned int regno)
>  return false;
> }
> 
> +/* Check whether the register REGNO should be zeroed on X86.
> +   When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
> +   together, so there is no need to zero them again.
> +   Stack registers (st0-st7) and mm0-mm7 are aliased with each other
> +   and are very hard to zero individually, so don't zero individual st
> +   or mm registers at this time.  */
> +
> +static bool
> +zero_call_used_regno_p (const unsigned int regno,
> +			bool all_sse_zeroed)
> +{
> +  return GENERAL_REGNO_P (regno)
> +	 || (!all_sse_zeroed && SSE_REGNO_P (regno))
> +	 || MASK_REGNO_P (regno);
> +}
> +
> +/* Return the machine_mode that is used to zero register REGNO.  */
> +
> +static machine_mode
> +zero_call_used_regno_mode (const unsigned int regno)
> +{
> +  /* NB: We only need to zero the lower 32 bits for integer registers
> +     and the lower 128 bits for vector registers since the destinations
> +     are zero-extended to the full register width.  */
> +  if (GENERAL_REGNO_P (regno))
> +    return SImode;
> +  else if (SSE_REGNO_P (regno))
> +    return V4SFmode;
> +  else
> +    return HImode;
> +}
> +
> +/* Generate an rtx to zero all vector registers together if possible,
> +   otherwise, return NULL.  */
> +
> +static rtx
> +zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  if (!TARGET_AVX)
> +    return NULL;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> +	 || (TARGET_64BIT
> +	     && (REX_SSE_REGNO_P (regno)
> +		 || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> +	&& !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +      return NULL;
> +
> +  return gen_avx_vzeroall ();
> +}
> +
> +/* Generate an rtx to zero all st and mm registers together if possible,
> +   otherwise, return NULL.  */
> +
> +static rtx
> +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  if (!TARGET_MMX)
> +    return NULL;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
> +	&& !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +      return NULL;
> +
> +  return gen_mmx_emms ();
> +}
> +
> +/* TARGET_ZERO_CALL_USED_REGS.  */
> +/* Generate a sequence of instructions that zero registers specified by
> +   NEED_ZEROED_HARDREGS.  Return the ZEROED_HARDREGS that are actually
> +   zeroed.  */
> +static HARD_REG_SET
> +ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  HARD_REG_SET zeroed_hardregs;
> +  bool all_sse_zeroed = false;
> +
> +  /* First, see whether we can zero all vector registers together.  */
> +  rtx zero_all_vec_insn = zero_all_vector_registers (need_zeroed_hardregs);
> +  if (zero_all_vec_insn)
> +    {
> +      emit_insn (zero_all_vec_insn);
> +      all_sse_zeroed = true;
> +    }
> +
> +  /* Then, see whether we can zero all st and mm registers together.  */
> +  rtx zero_all_st_mm_insn = zero_all_st_mm_registers (need_zeroed_hardregs);
> +  if (zero_all_st_mm_insn)
> +    emit_insn (zero_all_st_mm_insn);
> +
> +  /* Now, generate instructions to zero the remaining registers.  */
> +
> +  CLEAR_HARD_REG_SET (zeroed_hardregs);
> +  rtx zero_gpr = NULL_RTX;
> +  rtx zero_vector = NULL_RTX;
> +  rtx zero_mask = NULL_RTX;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    {
> +      if (!TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +	continue;
> +      if (!zero_call_used_regno_p (regno, all_sse_zeroed))
> +	continue;
> +
> +      SET_HARD_REG_BIT (zeroed_hardregs, regno);
> +
> +      rtx reg, tmp;
> +      machine_mode mode = zero_call_used_regno_mode (regno);
> +
> +      reg = gen_rtx_REG (mode, regno);
> +
> +      if (mode == SImode)
> +	{
> +	  if (zero_gpr == NULL_RTX)
> +	    {
> +	      zero_gpr = reg;
> +	      tmp = gen_rtx_SET (reg, const0_rtx);
> +	      if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())
> +		{
> +		  rtx clob = gen_rtx_CLOBBER (VOIDmode,
> +					      gen_rtx_REG (CCmode, FLAGS_REG));
> +		  tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, tmp, clob));
> +		}
> +	      emit_insn (tmp);
> +	    }
> +	  else
> +	    emit_move_insn (reg, zero_gpr);
> +	}
> +      else if (mode == V4SFmode)
> +	{
> +	  if (zero_vector == NULL_RTX)
> +	    {
> +	      zero_vector = reg;
> +	      tmp = gen_rtx_SET (reg, CONST0_RTX (mode));
> +	      emit_insn (tmp);
> +	    }
> +	  else
> +	    emit_move_insn (reg, zero_vector);
> +	}
> +      else if (mode == HImode)
> +	{
> +	  if (zero_mask == NULL_RTX)
> +	    {
> +	      zero_mask = reg;
> +	      tmp = gen_rtx_SET (reg, const0_rtx);
> +	      emit_insn (tmp);
> +	    }
> +	  else
> +	    emit_move_insn (reg, zero_mask);
> +	}
> +      else
> +	gcc_unreachable ();
> +    }
> +  return zeroed_hardregs;
> +}
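ix86_zero_call_used_regs uses one strategy per register class. It clears the first register of the class and then copies that register into every other one. The copy is a plain move, so only the first instruction needs the flags clobber that an xor implies. Below is a minimal model of that strategy outside GCC, under these assumptions:
- A bitmask stands in for HARD_REG_SET.
- A hypothetical table of nine 32-bit x86-64 GPR names is listed in the relative order the i386 backend numbers them.
- The model prints the AT&T text that the new tests scan for, instead of building RTL.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model: a few x86-64 call-used GPRs (32-bit names), in the
   relative order the i386 backend numbers them.  Bit R of a mask stands
   for gpr_names[R].  */
static const char *const gpr_names[] = {
  "eax", "edx", "ecx", "esi", "edi", "r8d", "r9d", "r10d", "r11d"
};
#define NUM_GPRS (sizeof gpr_names / sizeof gpr_names[0])

/* Fill OUT with one AT&T-syntax line per instruction and return the count.
   The first register in NEED is cleared with xor (which clobbers the
   flags); every later register is copied from it with a plain move.  */
static unsigned
emit_gpr_zeroing (unsigned need, char out[NUM_GPRS][32])
{
  int first = -1;
  unsigned n = 0;

  assert (need < (1u << NUM_GPRS));
  for (unsigned r = 0; r < NUM_GPRS; r++)
    {
      if (!(need & (1u << r)))
        continue;
      if (first < 0)
        {
          first = (int) r;
          snprintf (out[n++], 32, "xorl %%%s, %%%s",
                    gpr_names[r], gpr_names[r]);
        }
      else
        snprintf (out[n++], 32, "movl %%%s, %%%s",
                  gpr_names[first], gpr_names[r]);
    }
  return n;
}
```

With %eax live at the return, as in zero-scratch-regs-10.c, the first register zeroed is %edx. The remaining registers are then copied from it, which matches the `xorl %edx, %edx` / `movl %edx, %ecx` pattern that test expects.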
> +
> /* Define how to find the value returned by a function.
>   VALTYPE is the data type of the value (as a tree).
>   If the precise function being called is known, FUNC is its FUNCTION_DECL;
> @@ -23229,6 +23384,9 @@ ix86_run_selftests (void)
> #undef TARGET_FUNCTION_VALUE_REGNO_P
> #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p
> 
> +#undef TARGET_ZERO_CALL_USED_REGS
> +#define TARGET_ZERO_CALL_USED_REGS ix86_zero_call_used_regs
> +
> #undef TARGET_PROMOTE_FUNCTION_MODE
> #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
> 
> diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> index 6b6cfcd..0ce5eb4 100644
> --- a/gcc/coretypes.h
> +++ b/gcc/coretypes.h
> @@ -418,6 +418,19 @@ enum symbol_visibility
>  VISIBILITY_INTERNAL
> };
> 
> +/* Zero call-used registers type.  */
> +enum zero_call_used_regs {
> +  zero_call_used_regs_unset = 0,
> +  zero_call_used_regs_skip,
> +  zero_call_used_regs_used_gpr_arg,
> +  zero_call_used_regs_used_arg,
> +  zero_call_used_regs_all_arg,
> +  zero_call_used_regs_used_gpr,
> +  zero_call_used_regs_all_gpr,
> +  zero_call_used_regs_used,
> +  zero_call_used_regs_all
> +};
> +
> /* enums used by the targetm.excess_precision hook.  */
> 
> enum flt_eval_method
> diff --git a/gcc/df-scan.c b/gcc/df-scan.c
> index 93b060f..630970b 100644
> --- a/gcc/df-scan.c
> +++ b/gcc/df-scan.c
> @@ -3614,6 +3614,14 @@ df_update_entry_block_defs (void)
> }
> 
> 
> +/* Return true if REGNO is used by the epilogue.  */
> +bool
> +df_epilogue_uses_p (unsigned int regno)
> +{
> +  return (EPILOGUE_USES (regno)
> +	  || TEST_HARD_REG_BIT (crtl->zeroed_reg_set, regno));
> +}
> +
> /* Set the bit for regs that are considered being used at the exit. */
> 
> static void
> @@ -3661,7 +3669,7 @@ df_get_exit_block_use_set (bitmap exit_block_uses)
>     epilogue as being live at the end of the function since they
>     may be referenced by our caller.  */
>  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> -    if (global_regs[i] || EPILOGUE_USES (i))
> +    if (global_regs[i] || df_epilogue_uses_p (i))
>      bitmap_set_bit (exit_block_uses, i);
> 
>  if (targetm.have_epilogue () && epilogue_completed)
> @@ -3802,7 +3810,6 @@ df_hard_reg_init (void)
>  initialized = true;
> }
> 
> -
> /* Recompute the parts of scanning that are based on regs_ever_live
>   because something changed in that array.  */
> 
> @@ -3862,7 +3869,6 @@ df_regs_ever_live_p (unsigned int regno)
>  return regs_ever_live[regno];
> }
> 
> -
> /* Set regs_ever_live[REGNO] to VALUE.  If this cause regs_ever_live
>   to change, schedule that change for the next update.  */
> 
> diff --git a/gcc/df.h b/gcc/df.h
> index 8b6ca8c..0f098d7 100644
> --- a/gcc/df.h
> +++ b/gcc/df.h
> @@ -1085,6 +1085,7 @@ extern void df_update_entry_exit_and_calls (void);
> extern bool df_hard_reg_used_p (unsigned int);
> extern unsigned int df_hard_reg_used_count (unsigned int);
> extern bool df_regs_ever_live_p (unsigned int);
> +extern bool df_epilogue_uses_p (unsigned int);
> extern void df_set_regs_ever_live (unsigned int, bool);
> extern void df_compute_regs_ever_live (bool);
> extern void df_scan_verify (void);
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index c9f7299..f56f61a 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -3992,6 +3992,30 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
> A declaration to which @code{weakref} is attached and that is associated
> with a named @code{target} must be @code{static}.
> 
> +@item zero_call_used_regs ("@var{choice}")
> +@cindex @code{zero_call_used_regs} function attribute
> +
> +The @code{zero_call_used_regs} attribute causes the compiler to zero
> +call-used registers at function return according to @var{choice}.
> +This is used to increase program security by either mitigating
> +Return-Oriented Programming (ROP) or preventing information leakage
> +through registers.
> +@samp{skip} doesn't zero call-used registers.
> +
> +@samp{used-gpr-arg} zeros used call-used general purpose registers that
> +pass parameters.  @samp{used-arg} zeros used call-used registers that
> +pass parameters.  @samp{all-arg} zeros all call-used registers that pass
> +parameters.  These 3 choices are used for ROP mitigation.
> +
> +@samp{used-gpr} zeros call-used general purpose registers
> +which are used in the function.  @samp{all-gpr} zeros all
> +call-used general purpose registers.  @samp{used} zeros call-used
> +registers which are used in the function.  @samp{all} zeros all
> +call-used registers.  These 4 choices are used for preventing
> +information leakage through registers.
> +
> +The default for the attribute is controlled by @option{-fzero-call-used-regs}.
> +
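Here is a minimal sketch of how a caller might use the attribute documented above. The function mix_secret and the macro ZERO_USED_GPRS are hypothetical. The `__has_attribute` guard leaves the function unannotated on compilers that do not know the attribute. Where the attribute is present, gen_call_used_regs_seq (further down in this patch) lets it take precedence over -fzero-call-used-regs for that function.

```c
#include <assert.h>

/* Use the attribute only where the compiler reports that it knows it.  */
#if defined __has_attribute
# if __has_attribute (zero_call_used_regs)
#  define ZERO_USED_GPRS __attribute__ ((zero_call_used_regs ("used-gpr")))
# endif
#endif
#ifndef ZERO_USED_GPRS
# define ZERO_USED_GPRS /* attribute unavailable: no register zeroing */
#endif

/* Hypothetical function that handles key material: ask for the call-used
   general purpose registers it touched to be zeroed on return.  */
static unsigned
__attribute__ ((noinline)) ZERO_USED_GPRS
mix_secret (unsigned key, unsigned data)
{
  return (key ^ data) * 2654435761u;
}
```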
> @end table
> 
> @c This is the end of the target-independent attribute table
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index c049932..aa04a3c 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -550,7 +550,7 @@ Objective-C and Objective-C++ Dialects}.
> -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
> -funsafe-math-optimizations  -funswitch-loops @gol
> -fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
> --fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
> +-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin  -fzero-call-used-regs=@var{choice} @gol
> --param @var{name}=@var{value}
> -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}
> 
> @@ -12550,6 +12550,29 @@ int foo (void)
> 
> Not all targets support this option.
> 
> +@item -fzero-call-used-regs=@var{choice}
> +@opindex fzero-call-used-regs
> +Zero call-used registers at function return to increase program
> +security by either mitigating Return-Oriented Programming (ROP) or
> +preventing information leakage through registers.
> +
> +@samp{skip}, which is the default, doesn't zero call-used registers.
> +
> +@samp{used-gpr-arg} zeros used call-used general purpose registers that
> +pass parameters.  @samp{used-arg} zeros used call-used registers that
> +pass parameters.  @samp{all-arg} zeros all call-used registers that pass
> +parameters.  These 3 choices are used for ROP mitigation.
> +
> +@samp{used-gpr} zeros call-used general purpose registers
> +which are used in the function.  @samp{all-gpr} zeros all
> +call-used general purpose registers.  @samp{used} zeros call-used
> +registers which are used in the function.  @samp{all} zeros all
> +call-used registers.  These 4 choices are used for preventing
> +information leakage through registers.
> +
> +You can control this behavior for a specific function by using the function
> +attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
> +
> @item --param @var{name}=@var{value}
> @opindex param
> In some places, GCC uses various constants to control the amount of
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 97437e8..7ecff05 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -12053,6 +12053,16 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
> is needed.
> @end deftypefn
> 
> +@deftypefn {Target Hook} HARD_REG_SET TARGET_ZERO_CALL_USED_REGS (HARD_REG_SET @var{need_zeroed_hardregs})
> +This target hook emits instructions to zero the registers specified
> +by @var{need_zeroed_hardregs} at function return, and returns the
> +set of hard registers that the hook actually zeroed.
> +Define this hook if the target has more efficient instructions to
> +zero call-used registers, or if the target only tries to zero a subset
> +of @var{need_zeroed_hardregs}.
> +If the hook is not defined, @code{default_zero_call_used_regs} is used.
> +@end deftypefn
> +
> @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
> When optimization is disabled, this hook indicates whether or not
> arguments should be allocated to stack slots.  Normally, GCC allocates
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 412e22c..a67dbea 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -8111,6 +8111,8 @@ and the associated definitions of those functions.
> 
> @hook TARGET_GET_DRAP_RTX
> 
> +@hook TARGET_ZERO_CALL_USED_REGS
> +
> @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
> 
> @hook TARGET_CONST_ANCHOR
> diff --git a/gcc/emit-rtl.h b/gcc/emit-rtl.h
> index 92ad0dd6..2dbeace0 100644
> --- a/gcc/emit-rtl.h
> +++ b/gcc/emit-rtl.h
> @@ -310,6 +310,9 @@ struct GTY(()) rtl_data {
>     sets them.  */
>  HARD_REG_SET asm_clobbers;
> 
> +  /* All hard registers that are zeroed at the return of the routine.  */
> +  HARD_REG_SET zeroed_reg_set;
> +
>  /* The highest address seen during shorten_branches.  */
>  int max_insn_address;
> };
> diff --git a/gcc/function.c b/gcc/function.c
> index c612959..c8181bd 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
> #include "emit-rtl.h"
> #include "recog.h"
> #include "rtl-error.h"
> +#include "hard-reg-set.h"
> #include "alias.h"
> #include "fold-const.h"
> #include "stor-layout.h"
> @@ -5815,6 +5816,182 @@ make_prologue_seq (void)
>  return seq;
> }
> 
> +/* Return true if the hard register REGNO is live immediately before the
> +   return insn RET.  */
> +static bool
> +is_live_reg_at_return (unsigned int regno, rtx_insn *ret)
> +{
> +  basic_block bb = BLOCK_FOR_INSN (ret);
> +  auto_bitmap live_out;
> +  bitmap_copy (live_out, df_get_live_out (bb));
> +  df_simulate_one_insn_backwards (bb, ret, live_out);
> +
> +  return REGNO_REG_SET_P (live_out, regno);
> +}
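is_live_reg_at_return decides whether REGNO is live just before RET. It copies the block's live-out set and steps backwards over RET, which is the usual backward dataflow step: live_before = (live_after minus defs(RET)) union uses(RET). Below is a minimal sketch of that rule on plain bitmasks. The type and helper names are hypothetical and are not part of the df API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit N set means hard register N is in the set.  */
typedef uint64_t regmask;

/* One backward dataflow step over a single insn: registers it defines
   stop being live above it, registers it uses become live.  */
static regmask
simulate_one_insn_backwards (regmask live_after, regmask defs, regmask uses)
{
  return (live_after & ~defs) | uses;
}

/* Return true if REGNO is live immediately before the return insn, given
   the block's live-out set and the insn's own defs and uses.  */
static bool
live_before_return_p (unsigned regno, regmask live_out,
                      regmask ret_defs, regmask ret_uses)
{
  assert (regno < 64);
  regmask live = simulate_one_insn_backwards (live_out, ret_defs, ret_uses);
  return (live >> regno) & 1;
}
```

The result depends only on RET, not on REGNO. It may therefore be cheaper to compute the live-before set once per return and test each register against it. The loop in gen_call_used_regs_seq currently recomputes it for every hard register.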
> +
> +/* Emit a sequence of insns to zero the call-used registers before RET.  */
> +
> +static void
> +gen_call_used_regs_seq (rtx_insn *ret)
> +{
> +  bool gpr_only = true;
> +  bool used_only = true;
> +  bool arg_only = true;
> +  enum zero_call_used_regs zero_regs_type = zero_call_used_regs_unset;
> +  enum zero_call_used_regs attr_zero_regs_type
> +			    = zero_call_used_regs_unset;
> +  tree attr_zero_regs
> +	= lookup_attribute ("zero_call_used_regs",
> +			    DECL_ATTRIBUTES (cfun->decl));
> +
> +  /* Get the type of zero_call_used_regs from function attribute.  */
> +  if (attr_zero_regs)
> +    {
> +      /* The TREE_VALUE of an attribute is a TREE_LIST whose TREE_VALUE
> +	 is the attribute argument's value.  */
> +      attr_zero_regs = TREE_VALUE (attr_zero_regs);
> +      gcc_assert (TREE_CODE (attr_zero_regs) == TREE_LIST);
> +      attr_zero_regs = TREE_VALUE (attr_zero_regs);
> +      gcc_assert (TREE_CODE (attr_zero_regs) == STRING_CST);
> +
> +      if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "skip") == 0)
> +	attr_zero_regs_type = zero_call_used_regs_skip;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used-gpr-arg")
> +		== 0)
> +	attr_zero_regs_type = zero_call_used_regs_used_gpr_arg;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used-arg") == 0)
> +	attr_zero_regs_type = zero_call_used_regs_used_arg;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "all-arg") == 0)
> +	attr_zero_regs_type = zero_call_used_regs_all_arg;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used-gpr") == 0)
> +	attr_zero_regs_type = zero_call_used_regs_used_gpr;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "all-gpr") == 0)
> +	attr_zero_regs_type = zero_call_used_regs_all_gpr;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used") == 0)
> +	attr_zero_regs_type = zero_call_used_regs_used;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "all") == 0)
> +	attr_zero_regs_type = zero_call_used_regs_all;
> +      else
> +	gcc_unreachable ();
> +    }
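The strcmp chain above repeats the list of accepted spellings, which presumably also appears in the option's own definition. One possible alternative is a single name-to-enumerator table that the attribute code could share. The sketch below uses a hypothetical table and helper name. It returns zero_call_used_regs_unset for an unknown string, so the caller can decide how to diagnose it instead of asserting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy of the enum this patch adds to coretypes.h.  */
enum zero_call_used_regs {
  zero_call_used_regs_unset = 0,
  zero_call_used_regs_skip,
  zero_call_used_regs_used_gpr_arg,
  zero_call_used_regs_used_arg,
  zero_call_used_regs_all_arg,
  zero_call_used_regs_used_gpr,
  zero_call_used_regs_all_gpr,
  zero_call_used_regs_used,
  zero_call_used_regs_all
};

/* Hypothetical table: one entry per accepted spelling.  */
static const struct
{
  const char *name;
  enum zero_call_used_regs value;
} zero_call_used_regs_choices[] = {
  { "skip", zero_call_used_regs_skip },
  { "used-gpr-arg", zero_call_used_regs_used_gpr_arg },
  { "used-arg", zero_call_used_regs_used_arg },
  { "all-arg", zero_call_used_regs_all_arg },
  { "used-gpr", zero_call_used_regs_used_gpr },
  { "all-gpr", zero_call_used_regs_all_gpr },
  { "used", zero_call_used_regs_used },
  { "all", zero_call_used_regs_all },
};

/* Return the choice named NAME, or zero_call_used_regs_unset if NAME is
   not one of the accepted spellings.  */
static enum zero_call_used_regs
parse_zero_call_used_regs (const char *name)
{
  size_t n = sizeof zero_call_used_regs_choices
             / sizeof zero_call_used_regs_choices[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (name, zero_call_used_regs_choices[i].name) == 0)
      return zero_call_used_regs_choices[i].value;
  return zero_call_used_regs_unset;
}
```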
> +
> +  /* The function attribute, if present, overrides the command-line
> +     option.  */
> +  if (attr_zero_regs)
> +    zero_regs_type = attr_zero_regs_type;
> +  else
> +    zero_regs_type = flag_zero_call_used_regs;
> +
> +  /* No need to zero call-used-regs when no user request is present.  */
> +  if (zero_regs_type <= zero_call_used_regs_skip)
> +    return;
> +
> +  /* No need to zero call-used-regs in main ().  */
> +  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
> +    return;
> +
> +  /* No need to zero call-used-regs if __builtin_eh_return is called
> +     since it isn't a normal function return.  */
> +  if (crtl->calls_eh_return)
> +    return;
> +
> +  /* If gpr_only is true, only zero call-used-registers that are
> +     general-purpose registers; if used_only is true, only zero
> +     call-used-registers that are used in the current function.  */
> +
> +  switch (zero_regs_type)
> +    {
> +      case zero_call_used_regs_used_arg:
> +	gpr_only = false;
> +	break;
> +      case zero_call_used_regs_all_arg:
> +	gpr_only = false;
> +	used_only = false;
> +	break;
> +      case zero_call_used_regs_used_gpr:
> +	arg_only = false;
> +	break;
> +      case zero_call_used_regs_all_gpr:
> +	used_only = false;
> +	arg_only = false;
> +	break;
> +      case zero_call_used_regs_used:
> +	gpr_only = false;
> +	arg_only = false;
> +	break;
> +      case zero_call_used_regs_all:
> +	gpr_only = false;
> +	used_only = false;
> +	arg_only = false;
> +	break;
> +      default:
> +	break;
> +    }
> +
> +  /* For each of the hard registers, zero it only if:
> +     1. it is a call-used register;
> +     2. it is not a fixed register;
> +     3. it is not live at the return of the routine;
> +     4. it is a general-purpose register, if gpr_only is true;
> +     5. it is used in the routine, if used_only is true;
> +     6. it passes parameters, if arg_only is true.  */
> +
> +  HARD_REG_SET need_zeroed_hardregs;
> +  CLEAR_HARD_REG_SET (need_zeroed_hardregs);
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    {
> +      if (!this_target_hard_regs->x_call_used_regs[regno])
> +	continue;
> +      if (fixed_regs[regno])
> +	continue;
> +      if (is_live_reg_at_return (regno, ret))
> +	continue;
> +      if (gpr_only
> +	  && !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], regno))
> +	continue;
> +      if (used_only && !df_regs_ever_live_p (regno))
> +	continue;
> +      if (arg_only && !FUNCTION_ARG_REGNO_P (regno))
> +	continue;
> +
> +      /* Now this is a register that we might want to zero.  */
> +      SET_HARD_REG_BIT (need_zeroed_hardregs, regno);
> +    }
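The six conditions in the loop above amount to intersecting a few register sets. Below is a minimal sketch that uses plain 64-bit masks in place of HARD_REG_SET. The struct and function names are hypothetical. The fields stand for call_used_regs, fixed_regs, reg_class_contents[GENERAL_REGS] and FUNCTION_ARG_REGNO_P as used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit N set means hard register N is in the set.  */
typedef uint64_t regmask;

/* Per-target register sets the loop consults (names are illustrative).  */
struct target_model
{
  regmask call_used;  /* call_used_regs[] */
  regmask fixed;      /* fixed_regs[] */
  regmask general;    /* reg_class_contents[GENERAL_REGS] */
  regmask arg;        /* FUNCTION_ARG_REGNO_P */
};

/* Registers that need zeroing, given the registers ever live in the
   function and those live at the return.  */
static regmask
need_zeroed_hardregs (const struct target_model *t, regmask ever_live,
                      regmask live_at_return, bool gpr_only,
                      bool used_only, bool arg_only)
{
  regmask need = t->call_used & ~t->fixed & ~live_at_return;  /* 1-3 */
  if (gpr_only)
    need &= t->general;   /* 4 */
  if (used_only)
    need &= ever_live;    /* 5 */
  if (arg_only)
    need &= t->arg;       /* 6 */
  return need;
}
```

Only condition 3 depends on the particular return insn. The other masks are fixed for the function.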
> +
> +  if (hard_reg_set_empty_p (need_zeroed_hardregs))
> +    return;
> +
> +  /* Now we have the set of hard registers that need to be zeroed; pass it
> +     to the target to generate the zeroing sequence.  */
> +  HARD_REG_SET zeroed_hardregs;
> +  start_sequence ();
> +  zeroed_hardregs = targetm.calls.zero_call_used_regs (need_zeroed_hardregs);
> +  rtx_insn *seq = get_insns ();
> +  end_sequence ();
> +  if (seq)
> +    {
> +      /* Emit the memory blockage and register clobber asm volatile before
> +	 the whole sequence.  */
> +      start_sequence ();
> +      expand_asm_reg_clobber_mem_blockage (zeroed_hardregs);
> +      rtx_insn *seq_barrier = get_insns ();
> +      end_sequence ();
> +
> +      emit_insn_before (seq_barrier, ret);
> +      emit_insn_before (seq, ret);
> +
> +      /* Update the data flow information.  */
> +      crtl->zeroed_reg_set |= zeroed_hardregs;
> +      df_set_bb_dirty (EXIT_BLOCK_PTR_FOR_FN (cfun));
> +    }
> +  return;
> +}
> +
> +
> /* Return a sequence to be used as the epilogue for the current function,
>   or NULL.  */
> 
> @@ -6486,7 +6663,75 @@ make_pass_thread_prologue_and_epilogue (gcc::context *ctxt)
> {
>  return new pass_thread_prologue_and_epilogue (ctxt);
> }
> -
> 
> +
> +static unsigned int
> +rest_of_zero_call_used_regs (void)
> +{
> +  basic_block bb;
> +  rtx_insn *insn;
> +
> +  /* This pass needs data flow information.  */
> +  df_analyze ();
> +
> +  /* Search for all the returns in the routine, and insert an instruction
> +     sequence to zero the call-used registers before each one.  */
> +  FOR_EACH_BB_REVERSE_FN (bb, cfun)
> +    if (bb == EXIT_BLOCK_PTR_FOR_FN (cfun)
> +	|| (single_succ_p (bb)
> +	    && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun)))
> +      FOR_BB_INSNS_REVERSE (bb, insn)
> +	if (JUMP_P (insn) && ANY_RETURN_P (JUMP_LABEL (insn)))
> +	  {
> +	    /* Now we can insert the instruction sequence to zero the call used
> +	       registers before this insn.  */
> +	    gen_call_used_regs_seq (insn);
> +	    break;
> +	  }
> +
> +  return 0;
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_zero_call_used_regs =
> +{
> +  RTL_PASS, /* type */
> +  "zero_call_used_regs", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  0, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_zero_call_used_regs: public rtl_opt_pass
> +{
> +public:
> +  pass_zero_call_used_regs (gcc::context *ctxt)
> +    : rtl_opt_pass (pass_data_zero_call_used_regs, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *)
> +    {
> +      return flag_zero_call_used_regs > zero_call_used_regs_unset;
> +    }
> +  virtual unsigned int execute (function *)
> +    {
> +      return rest_of_zero_call_used_regs ();
> +    }
> +
> +}; // class pass_zero_call_used_regs
> +
> +} // anon namespace
> +
> +rtl_opt_pass *
> +make_pass_zero_call_used_regs (gcc::context *ctxt)
> +{
> +  return new pass_zero_call_used_regs (ctxt);
> +}
> 
> /* If CONSTRAINT is a matching constraint, then return its number.
>   Otherwise, return -1.  */
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index 8ad7f4b..57e5c5d 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -6484,6 +6484,49 @@ expand_memory_blockage (void)
>    expand_asm_memory_blockage ();
> }
> 
> +/* Generate asm volatile("" : : : "memory") as a memory blockage, at the
> +   same time clobbering the register set specified by ZEROED_REGS.  */
> +
> +void
> +expand_asm_reg_clobber_mem_blockage (HARD_REG_SET zeroed_regs)
> +{
> +  rtx asm_op, clob_mem, clob_reg;
> +
> +  unsigned int num_of_regs = 0;
> +  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> +    if (TEST_HARD_REG_BIT (zeroed_regs, i))
> +      num_of_regs++;
> +
> +  if (num_of_regs == 0)
> +    return;
> +
> +  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
> +				 rtvec_alloc (0), rtvec_alloc (0),
> +				 rtvec_alloc (0), UNKNOWN_LOCATION);
> +  MEM_VOLATILE_P (asm_op) = 1;
> +
> +  rtvec v = rtvec_alloc (num_of_regs + 2);
> +
> +  clob_mem = gen_rtx_SCRATCH (VOIDmode);
> +  clob_mem = gen_rtx_MEM (BLKmode, clob_mem);
> +  clob_mem = gen_rtx_CLOBBER (VOIDmode, clob_mem);
> +
> +  RTVEC_ELT (v, 0) = asm_op;
> +  RTVEC_ELT (v, 1) = clob_mem;
> +
> +  unsigned int j = 2;
> +  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> +    if (TEST_HARD_REG_BIT (zeroed_regs, i))
> +      {
> +	clob_reg = gen_rtx_CLOBBER (VOIDmode, regno_reg_rtx[i]);
> +	RTVEC_ELT (v, j) = clob_reg;
> +	j++;
> +      }
> +  gcc_assert (j == (num_of_regs + 2));
> +
> +  emit_insn (gen_rtx_PARALLEL (VOIDmode, v));
> +}
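expand_asm_reg_clobber_mem_blockage builds by hand the PARALLEL that an empty asm volatile would expand to, with a "memory" clobber plus one register clobber per zeroed register. For readers more familiar with inline asm, here is a user-level analogue, assuming x86-64 and a hypothetical zeroed set of {rdx, rcx}. It is illustrative only and is not code the patch emits.

```c
#include <assert.h>

/* Roughly what the RTL built by expand_asm_reg_clobber_mem_blockage
   corresponds to when ZEROED_REGS = {rdx, rcx} on x86-64: an empty
   volatile asm that clobbers memory and those two registers.  Elsewhere
   only the memory clobber is kept, since register names are
   target-specific.  */
static inline void
zeroing_barrier (void)
{
#if defined __GNUC__ && defined __x86_64__
  __asm__ __volatile__ ("" : : : "memory", "rdx", "rcx");
#elif defined __GNUC__
  __asm__ __volatile__ ("" : : : "memory");
#endif
}

/* Hypothetical caller: the store must be emitted before the barrier, and
   *P must be reloaded after it.  */
static int
store_then_barrier (int *p, int v)
{
  *p = v;
  zeroing_barrier ();
  return *p;
}
```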
> +
> /* This routine will either emit the mem_thread_fence pattern or issue a 
>   sync_synchronize to generate a fence for memory model MEMMODEL.  */
> 
> diff --git a/gcc/optabs.h b/gcc/optabs.h
> index 0b14700..bfa10c8 100644
> --- a/gcc/optabs.h
> +++ b/gcc/optabs.h
> @@ -345,6 +345,8 @@ rtx expand_atomic_store (rtx, rtx, enum memmodel, bool);
> rtx expand_atomic_fetch_op (rtx, rtx, rtx, enum rtx_code, enum memmodel, 
> 			      bool);
> 
> +extern void expand_asm_reg_clobber_mem_blockage (HARD_REG_SET);
> +
> extern bool insn_operand_matches (enum insn_code icode, unsigned int opno,
> 				  rtx operand);
> extern bool valid_multiword_target_p (rtx);
> diff --git a/gcc/passes.def b/gcc/passes.def
> index f865bdc..77d4676 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -492,6 +492,7 @@ along with GCC; see the file COPYING3.  If not see
>      POP_INSERT_PASSES ()
>      NEXT_PASS (pass_late_compilation);
>      PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
> +	  NEXT_PASS (pass_zero_call_used_regs);
> 	  NEXT_PASS (pass_compute_alignments);
> 	  NEXT_PASS (pass_variable_tracking);
> 	  NEXT_PASS (pass_free_cfg);
> diff --git a/gcc/recog.c b/gcc/recog.c
> index ce83b7f..472c2dc 100644
> --- a/gcc/recog.c
> +++ b/gcc/recog.c
> @@ -923,6 +923,21 @@ validate_simplify_insn (rtx_insn *insn)
>  return ((num_changes_pending () > 0) && (apply_change_group () > 0));
> }
> 
> 
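> +
> +/* Return true if INSN is recognized and its operands strictly satisfy
> +   the constraints of at least one enabled alternative.  */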
> +bool
> +valid_insn_p (rtx_insn *insn)
> +{
> +  recog_memoized (insn);
> +  if (INSN_CODE (insn) < 0)
> +    return false;
> +  extract_insn (insn);
> +  /* We don't know whether the insn will be in code that is optimized
> +     for size or speed, so consider all enabled alternatives.  */
> +  if (!constrain_operands (1, get_enabled_alternatives (insn)))
> +    return false;
> +  return true;
> +}
> +
> /* Return 1 if OP is a valid general operand for machine mode MODE.
>   This is either a register reference, a memory reference,
>   or a constant.  In the case of a memory reference, the address
> diff --git a/gcc/recog.h b/gcc/recog.h
> index ae3675f..d87456c 100644
> --- a/gcc/recog.h
> +++ b/gcc/recog.h
> @@ -113,6 +113,7 @@ extern void validate_replace_src_group (rtx, rtx, rtx_insn *);
> extern bool validate_simplify_insn (rtx_insn *insn);
> extern int num_changes_pending (void);
> extern bool reg_fits_class_p (const_rtx, reg_class_t, int, machine_mode);
> +extern bool valid_insn_p (rtx_insn *);
> 
> extern int offsettable_memref_p (rtx);
> extern int offsettable_nonstrict_memref_p (rtx);
> diff --git a/gcc/resource.c b/gcc/resource.c
> index 0a9d594..90cf091 100644
> --- a/gcc/resource.c
> +++ b/gcc/resource.c
> @@ -1186,7 +1186,7 @@ init_resource_info (rtx_insn *epilogue_insn)
> 			       &end_of_function_needs, true);
> 
>  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> -    if (global_regs[i] || EPILOGUE_USES (i))
> +    if (global_regs[i] || df_epilogue_uses_p (i))
>      SET_HARD_REG_BIT (end_of_function_needs.regs, i);
> 
>  /* The registers required to be live at the end of the function are
> diff --git a/gcc/target.def b/gcc/target.def
> index ed2da15..7d6807d 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -5080,6 +5080,19 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
> is needed.",
> rtx, (void), NULL)
> 
> +/* Generate instruction sequence to zero call used registers.  */
> +DEFHOOK
> +(zero_call_used_regs,
> + "This target hook emits instructions to zero the registers specified\n\
> +by @var{need_zeroed_hardregs} at function return, and returns the\n\
> +set of hard registers that the hook actually zeroed.\n\
> +Define this hook if the target has more efficient instructions to\n\
> +zero call-used registers, or if the target only tries to zero a subset\n\
> +of @var{need_zeroed_hardregs}.\n\
> +If the hook is not defined, @code{default_zero_call_used_regs} is used.",
> + HARD_REG_SET, (HARD_REG_SET need_zeroed_hardregs),
> +default_zero_call_used_regs)
> +
> /* Return true if all function parameters should be spilled to the
>   stack.  */
> DEFHOOK
> diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> index 5d94fce..2318c324 100644
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -56,6 +56,9 @@ along with GCC; see the file COPYING3.  If not see
> #include "tree-ssa-alias.h"
> #include "gimple-expr.h"
> #include "memmodel.h"
> +#include "backend.h"
> +#include "emit-rtl.h"
> +#include "df.h"
> #include "tm_p.h"
> #include "stringpool.h"
> #include "tree-vrp.h"
> @@ -987,6 +990,38 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
> #endif
> }
> 
> +/* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
> +
> +HARD_REG_SET
> +default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  HARD_REG_SET zeroed_hardregs;
> +  gcc_assert (!hard_reg_set_empty_p (need_zeroed_hardregs));
> +
> +  CLEAR_HARD_REG_SET (zeroed_hardregs);
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +      {
> +	rtx_insn *last_insn = get_last_insn ();
> +	machine_mode mode = GET_MODE (regno_reg_rtx[regno]);
> +	rtx zero = CONST0_RTX (mode);
> +	rtx_insn *insn = emit_move_insn (regno_reg_rtx[regno], zero);
> +	if (!valid_insn_p (insn))
> +	  {
> +	    static bool issued_error;
> +	    if (!issued_error)
> +	      {
> +		issued_error = true;
> +		sorry ("%qs is not supported on this target",
> +		       "-fzero-call-used-regs");
> +	      }
> +	    delete_insns_since (last_insn);
> +	  }
> +	else
> +	  SET_HARD_REG_BIT (zeroed_hardregs, regno);
> +      }
> +  return zeroed_hardregs;
> +}
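default_zero_call_used_regs reports only the registers for which a zeroing move was actually recognized. Its static issued_error flag limits the sorry () to once per compilation, rather than once per register or per function. Below is a minimal model of that control flow, with two stand-ins:
- A hypothetical zero_move_valid_p replaces emit_move_insn followed by valid_insn_p.
- A counter replaces the real diagnostic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t regmask;

/* Number of "sorry" diagnostics issued, kept only so the model can be
   checked.  */
static unsigned sorry_count;

/* Hypothetical stand-in for emitting (set (reg) (const 0)) and checking it
   with valid_insn_p: here only registers in MOVABLE accept a zero move.  */
static bool
zero_move_valid_p (unsigned regno, regmask movable)
{
  return (movable >> regno) & 1;
}

static regmask
model_default_zero_call_used_regs (regmask need, regmask movable)
{
  static bool issued_error;
  regmask zeroed = 0;

  assert (need != 0);
  for (unsigned regno = 0; regno < 64; regno++)
    {
      if (!((need >> regno) & 1))
        continue;
      if (zero_move_valid_p (regno, movable))
        zeroed |= (regmask) 1 << regno;
      else if (!issued_error)
        {
          /* Report the problem once; the register is simply left out.  */
          issued_error = true;
          sorry_count++;
          fputs ("sorry, unimplemented: -fzero-call-used-regs is not "
                 "supported on this target\n", stderr);
        }
    }
  return zeroed;
}
```

Registers left out of the returned set are not added to crtl->zeroed_reg_set. As a result, df_epilogue_uses_p only treats the registers that were really cleared as used by the epilogue.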
> +
> rtx
> default_internal_arg_pointer (void)
> {
> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> index 44ab926..e0a925f 100644
> --- a/gcc/targhooks.h
> +++ b/gcc/targhooks.h
> @@ -160,6 +160,7 @@ extern unsigned int default_function_arg_round_boundary (machine_mode,
> 							 const_tree);
> extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
> extern rtx default_function_value (const_tree, const_tree, bool);
> +extern HARD_REG_SET default_zero_call_used_regs (HARD_REG_SET);
> extern rtx default_libcall_value (machine_mode, const_rtx);
> extern bool default_function_value_regno_p (const unsigned int);
> extern rtx default_internal_arg_pointer (void);
> diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> new file mode 100644
> index 0000000..f44add9
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> +
> +volatile int result = 0;
> +int 
> +__attribute__((noinline))
> +foo (int x)
> +{
> +  return x;
> +}
> +int main()
> +{
> +  result = foo (2);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> new file mode 100644
> index 0000000..7c8350b
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> @@ -0,0 +1,16 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +volatile int result = 0;
> +int 
> +__attribute__((noinline))
> +__attribute__ ((zero_call_used_regs("all")))
> +foo (int x)
> +{
> +  return x;
> +}
> +int main()
> +{
> +  result = foo (2);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> new file mode 100644
> index 0000000..9f61dc4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> new file mode 100644
> index 0000000..09048e5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> new file mode 100644
> index 0000000..4862688
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> +
> +struct S { int i; };
> +__attribute__((const, noinline, noclone))
> +struct S foo (int x)
> +{
> +  struct S s;
> +  s.i = x;
> +  return s;
> +}
> +
> +int a[2048], b[2048], c[2048], d[2048];
> +struct S e[2048];
> +
> +__attribute__((noinline, noclone)) void
> +bar (void)
> +{
> +  int i;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      e[i] = foo (i);
> +      a[i+2] = a[i] + a[i+1];
> +      b[10] = b[10] + i;
> +      c[i] = c[2047 - i];
> +      d[i] = d[i + 1];
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  bar ();
> +  for (i = 0; i < 1024; i++)
> +    if (e[i].i != i)
> +      __builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> new file mode 100644
> index 0000000..500251b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +struct S { int i; };
> +__attribute__((const, noinline, noclone))
> +struct S foo (int x)
> +{
> +  struct S s;
> +  s.i = x;
> +  return s;
> +}
> +
> +int a[2048], b[2048], c[2048], d[2048];
> +struct S e[2048];
> +
> +__attribute__((noinline, noclone)) void
> +bar (void)
> +{
> +  int i;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      e[i] = foo (i);
> +      a[i+2] = a[i] + a[i+1];
> +      b[10] = b[10] + i;
> +      c[i] = c[2047 - i];
> +      d[i] = d[i + 1];
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  bar ();
> +  for (i = 0; i < 1024; i++)
> +    if (e[i].i != i)
> +      __builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> new file mode 100644
> index 0000000..8b058e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> new file mode 100644
> index 0000000..d4eaaf7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> new file mode 100644
> index 0000000..dd3bb90
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> new file mode 100644
> index 0000000..e2274f6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> new file mode 100644
> index 0000000..7f5d153
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> new file mode 100644
> index 0000000..fe13d2b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> new file mode 100644
> index 0000000..205a532
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> new file mode 100644
> index 0000000..e046684
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> new file mode 100644
> index 0000000..4be8ff6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> new file mode 100644
> index 0000000..0eb34e0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
> +
> +__attribute__ ((zero_call_used_regs("used")))
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> new file mode 100644
> index 0000000..76742bb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler "vzeroall" } } */
> +/* { dg-final { scan-assembler "emms" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> new file mode 100644
> index 0000000..18a5ffb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler "vzeroall" } } */
> +/* { dg-final { scan-assembler "emms" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kxorw\[ \t\]*%k0, %k0, %k0" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k3" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k4" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k5" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k6" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k7" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c
> new file mode 100644
> index 0000000..208633e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr-arg" } */
> +
> +int 
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c
> new file mode 100644
> index 0000000..21e82c6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-arg" } */
> +
> +int 
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c
> new file mode 100644
> index 0000000..293d2fe
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-arg" } */
> +
> +int 
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm1" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm2" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm3" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm4" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm5" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm6" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm7" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> new file mode 100644
> index 0000000..de71223
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> new file mode 100644
> index 0000000..ccfa441
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> new file mode 100644
> index 0000000..6b46ca3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +__attribute__ ((zero_call_used_regs("all-gpr")))
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> new file mode 100644
> index 0000000..0680f38
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> new file mode 100644
> index 0000000..534defa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> new file mode 100644
> index 0000000..477bb19
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> new file mode 100644
> index 0000000..a305a60
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 62e5b69..8afe8ee 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -592,6 +592,7 @@ extern rtl_opt_pass *make_pass_gcse2 (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_split_after_reload (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_thread_prologue_and_epilogue (gcc::context
> 							     *ctxt);
> +extern rtl_opt_pass *make_pass_zero_call_used_regs (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_stack_adjustments (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_sched_fusion (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_peephole2 (gcc::context *ctxt);
> -- 
> 1.8.3.1
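Several i386 scan-assembler checks above, for example in zero-scratch-regs-2.c and zero-scratch-regs-8.c, expect the same sequence. One 32-bit register is cleared with `xorl`, and that zero is then copied into every other register with `movl`. The toy C sketch below only produces assembly text of that shape so the expected pattern is easy to see. `emit_gpr_zeroing` is a hypothetical helper, not part of GCC or of this patch, and the real i386 hook emits RTL, not text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a GCC API): write AT&T-syntax text that zeroes
   the 32-bit registers named in REGS.  The first register is cleared with
   xorl and the rest receive a copy of it via movl, mirroring what the
   scan-assembler patterns in these tests look for.  Returns the number of
   characters written, or -1 if BUF is too small.  */
int
emit_gpr_zeroing (const char *const regs[], size_t nregs,
                  char *buf, size_t bufsize)
{
  size_t len = 0;

  assert (bufsize > 0);
  buf[0] = '\0';
  for (size_t i = 0; i < nregs; i++)
    {
      int n;
      if (i == 0)
        /* Zero the first register with itself.  */
        n = snprintf (buf + len, bufsize - len, "\txorl\t%%%s, %%%s\n",
                      regs[0], regs[0]);
      else
        /* Copy the zero into each remaining register.  */
        n = snprintf (buf + len, bufsize - len, "\tmovl\t%%%s, %%%s\n",
                      regs[0], regs[i]);
      if (n < 0 || (size_t) n >= bufsize - len)
        return -1;
      len += (size_t) n;
    }
  return (int) len;
}
```

For `int foo (int x)` built with `-fzero-call-used-regs=all-gpr`, %eax carries the return value. The tests therefore expect the sequence to start from %edx rather than %eax.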



* Re: [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
  2020-10-06 14:01 [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all] Qing Zhao
  2020-10-19 13:48 ` Qing Zhao
@ 2020-10-19 19:30 ` Uros Bizjak
  2020-10-20 14:00   ` Qing Zhao
  2020-10-20 18:12 ` Richard Sandiford
  2 siblings, 1 reply; 20+ messages in thread
From: Uros Bizjak @ 2020-10-19 19:30 UTC
  To: Qing Zhao
  Cc: richard Sandiford, kees Cook, rodriguez Bahena Victor, H.J. Lu,
	segher Boessenkool, gcc-patches Kees Cook via

On Tue, Oct 6, 2020 at 4:02 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
> Hi, Gcc team,
>
> This is the 3rd version of the implementation of patch -fzero-call-used-regs.
>
> This patch adds a new feature to GCC:
>
> Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all] command-line option
> and
> zero_call_used_regs("skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all") function attribute:
>
>    1. -fzero-call-used-regs=skip and zero_call_used_regs("skip")
>
>    Don't zero call-used registers upon function return. This is the default behavior.
>
>    2. -fzero-call-used-regs=used-gpr-arg and zero_call_used_regs("used-gpr-arg")
>
>    Zero used call-used general purpose registers that are used to pass parameters upon function return.
>
>    3. -fzero-call-used-regs=used-arg and zero_call_used_regs("used-arg")
>
>    Zero used call-used registers that are used to pass parameters upon function return.
>
>    4. -fzero-call-used-regs=all-arg and zero_call_used_regs("all-arg")
>
>    Zero all call-used registers that are used to pass parameters upon function return.
>
>    5. -fzero-call-used-regs=used-gpr and zero_call_used_regs("used-gpr")
>
>    Zero used call-used general purpose registers upon function return.
>
>    6. -fzero-call-used-regs=all-gpr and zero_call_used_regs("all-gpr")
>
>    Zero all call-used general purpose registers upon function return.
>
>    7. -fzero-call-used-regs=used and zero_call_used_regs("used")
>
>    Zero used call-used registers upon function return.
>
>    8. -fzero-call-used-regs=all and zero_call_used_regs("all")
>
>    Zero all call-used registers upon function return.
>
> Zero call-used registers at function return to increase program
> security by either mitigating Return-Oriented Programming (ROP) or
> preventing information leaks through registers.
>
> {skip}, which is the default, doesn't zero call-used registers.
>
> {used-gpr-arg} zeros used call-used general purpose registers that
> pass parameters. {used-arg} zeros used call-used registers that
> pass parameters. {all-arg} zeros all call-used registers that pass
> parameters. These 3 choices are used for ROP mitigation.
>
> {used-gpr} zeros call-used general purpose registers
> which are used in the function.  {all-gpr} zeros all
> call-used general purpose registers.  {used} zeros call-used registers
> which are used in the function.  {all} zeros all call-used registers.
> These 4 choices are used for preventing information leaks through
> registers.
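The eight sub-options are really combinations of three independent restrictions: only used registers, only general purpose registers, and only argument registers. The C sketch below is a toy bitmask model that makes this explicit. The register numbering, the `ZR_*` flag names and `struct toy_abi` are all hypothetical and are not taken from the patch. It also ignores details such as fixed registers, and it simply treats the return-value register as never zeroed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t regset;   /* toy model: one bit per hard register */

/* Illustrative flag bits (hypothetical names); each sub-option is a mix.  */
#define ZR_ENABLED   (1u << 0)  /* anything other than "skip" */
#define ZR_ONLY_USED (1u << 1)  /* only registers the function itself uses */
#define ZR_ONLY_GPR  (1u << 2)  /* only general purpose registers */
#define ZR_ONLY_ARG  (1u << 3)  /* only argument-passing registers */
#define ZR_INVALID   (~0u)

/* Map a sub-option name to its flags, or ZR_INVALID if unknown.  */
unsigned
zr_flags_from_name (const char *name)
{
  static const struct { const char *name; unsigned flags; } table[] = {
    { "skip",         0 },
    { "used-gpr-arg", ZR_ENABLED | ZR_ONLY_USED | ZR_ONLY_GPR | ZR_ONLY_ARG },
    { "used-arg",     ZR_ENABLED | ZR_ONLY_USED | ZR_ONLY_ARG },
    { "all-arg",      ZR_ENABLED | ZR_ONLY_ARG },
    { "used-gpr",     ZR_ENABLED | ZR_ONLY_USED | ZR_ONLY_GPR },
    { "all-gpr",      ZR_ENABLED | ZR_ONLY_GPR },
    { "used",         ZR_ENABLED | ZR_ONLY_USED },
    { "all",          ZR_ENABLED },
  };
  for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
    if (strcmp (name, table[i].name) == 0)
      return table[i].flags;
  return ZR_INVALID;
}

struct toy_abi
{
  regset call_used;  /* registers a callee may clobber */
  regset gpr;        /* general purpose registers */
  regset arg;        /* registers used to pass arguments */
  regset live_out;   /* e.g. the return-value register: never zeroed */
};

/* Registers to zero on return, given the flags and the set USED of
   registers the function actually touched.  */
regset
zr_regs_to_zero (unsigned flags, const struct toy_abi *abi, regset used)
{
  assert (flags != ZR_INVALID);
  if (!(flags & ZR_ENABLED))
    return 0;
  regset s = abi->call_used & ~abi->live_out;
  if (flags & ZR_ONLY_USED)
    s &= used;
  if (flags & ZR_ONLY_GPR)
    s &= abi->gpr;
  if (flags & ZR_ONLY_ARG)
    s &= abi->arg;
  return s;
}
```

Take a toy ABI where bit 0 is the return register, bits 1-4 are call-used GPRs, bits 5-6 are vector registers and bit 7 is callee-saved. A function that touches only bits 0 and 4 (mask 0x11) then gets bit 4 for "used-gpr" and bits 1-4 for "all-gpr". This loosely matches zero-scratch-regs-7.c and zero-scratch-regs-8.c.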
>
> You can control this behavior for a specific function by using the function
> attribute {zero_call_used_regs}.
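The tests in this patch use the function attribute in both directions. An attribute can request zeroing when the command line says "skip", and "skip" can opt a function out when the command line asks for zeroing. The C sketch below shows both uses. It assumes a GCC release that includes this feature (it shipped in GCC 11). A compiler without the attribute should normally just warn and ignore it. The function names are illustrative.

```c
#include <assert.h>

/* Request zeroing of the call-used GPRs this function touches on return,
   regardless of the -fzero-call-used-regs setting for the file.  */
__attribute__ ((zero_call_used_regs ("used-gpr")))
int
add_one (int x)
{
  return x + 1;
}

/* Opt this function out, e.g. a hot helper in a file built with
   -fzero-call-used-regs=all.  */
__attribute__ ((zero_call_used_regs ("skip")))
int
scale (int x, int k)
{
  return x * k;
}
```

The zeroing happens only in the epilogue, so results are unchanged. The difference shows up only in the generated assembly (compare `gcc -O2 -S` output with and without the attribute).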
>
> ******Tests done:
> 1. Gcc bootstrap on x86, aarch64 and rs6000.
> 2. Regression test on x86, aarch64 and rs6000.
> (x86 and aarch64 show no issues; rs6000 fails the new middle-end test case, which is expected.)
>
> 3. Cpu2017 on x86, -O2 -fzero-call-used-regs=used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all
>
> ******runtime performance data of CPU2017 on x86
> https://gitlab.com/x86-gcc/gcc/-/wikis/uploads/e9c5bedba6e387586364571f2eae3b8d/zero_call_used_regs_runtime_New.csv
>
> ******The major changes compared to the previous version are:
>
> 1. Add 3 new sub-options and corresponding function attributes:
>   used-gpr-arg, used-arg, all-arg
>   for ROP mitigation purpose;
> 2. Updated user manual;
> 3. Re-design of the implementation:
>
>   3.1 data flow changes that reflect the newly added zeroing insns, so
>   that these insns are not deleted, moved, or merged by later passes:
>
>   3.1.1.
>   abstract EPILOGUE_USES into a new target-independent wrapper function that
>   (a) returns true if EPILOGUE_USES itself returns true and (b) returns
>   true for registers that need to be zeroed on return, if the zeroing
>   instructions have already been inserted.  The places that currently
>   test EPILOGUE_USES should then test this new wrapper function instead.
>
>   Add this new wrapper function to df.h and df-scan.c.
>
>   3.1.2.
>   add a new utility routine "expand_asm_reg_clobber_mem_blockage" to generate
>   a volatile asm insn that clobbers all the hard registers that are zeroed.
>
>   emit this volatile asm at the very beginning of the zeroing sequence.
>
>   3.2 new pass:
>   add a new pass at the beginning of "late_compilation", before
>   "pass_compute_alignment", called "pass_zero_call_used_regs".
>
>   in this new pass,
>   * compute the data flow information; (df_analyze ());
>   * scan backward from the exit block to look for "return":
>     A. for each return, compute the "need_zeroed_hardregs" based on
>     the user request, and data flow information, and function ABI info.
>     B. pass this need_zeroed_hardregs set to target hook "zero_call_used_regs"
>     to generate the instruction sequence that zeroes the regs.
>     C. Data flow maintenance.
> 4. Use "lookup_attribute" to get the attribute information instead of setting
>   the attribute information into "tree_decl_with_vis" in tree-core.h.
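Items 3.1.1 and 3.2 above fit together in one way: the new pass records which registers it zeroed, and the wrapper that replaces EPILOGUE_USES reports those registers as used by the epilogue, so later passes keep the zeroing insns. The C sketch below is a toy model of that interaction, assuming a 32-register bitmask. `toy_epilogue_uses`, `toy_zeroed_reg_set`, `toy_df_epilogue_uses_p` and `toy_zero_at_return` are hypothetical stand-ins for EPILOGUE_USES, the new zeroed_reg_set field, df_epilogue_uses_p and the per-return work of pass_zero_call_used_regs. They are not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t regset;   /* toy model: one bit per hard register */

/* Stand-in for the target's EPILOGUE_USES: pretend register 31
   (say, a link register) is always used by the epilogue.  */
static bool
toy_epilogue_uses (unsigned regno)
{
  return regno == 31;
}

/* Stand-in for the per-function set of zeroed registers; it stays empty
   until the zeroing insns have been inserted.  */
static regset toy_zeroed_reg_set;

/* Wrapper in the spirit of 3.1.1: true if the epilogue uses REGNO, or if
   REGNO has already been zeroed on return.  */
bool
toy_df_epilogue_uses_p (unsigned regno)
{
  assert (regno < 32);
  return toy_epilogue_uses (regno)
         || ((toy_zeroed_reg_set >> regno) & 1u) != 0;
}

/* Per-return work in the spirit of 3.2: REQUESTED is what the user asked
   for after ABI filtering, LIVE_AT_RETURN holds e.g. the return-value
   register.  Returns the set that would be handed to the target hook.  */
regset
toy_zero_at_return (regset requested, regset live_at_return)
{
  regset need_zeroed = requested & ~live_at_return;   /* step A */
  /* Step B: the target hook would emit the zeroing insns here.  */
  toy_zeroed_reg_set |= need_zeroed;                   /* step C */
  return need_zeroed;
}
```

Before `toy_zero_at_return` runs, only register 31 counts as used by the epilogue. Afterwards, every zeroed register also counts, so a dataflow-based cleanup in this model would no longer treat those insns as dead.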
>
> ******The changelog:
>
> gcc/ChangeLog:
> 2020-10-05  Qing Zhao  <qing.zhao@oracle.com>
>     H.J. Lu  <hjl.tools@gmail.com>
>
> * common.opt: Add new option -fzero-call-used-regs
> * config/i386/i386.c (zero_call_used_regno_p): New function.
> (zero_call_used_regno_mode): Likewise.
> (zero_all_vector_registers): Likewise.
> (zero_all_st_mm_registers): Likewise.
> (ix86_zero_call_used_regs): Likewise.
> (TARGET_ZERO_CALL_USED_REGS): Define.
> * coretypes.h (enum zero_call_used_regs): New type.
> * df-scan.c (df_epilogue_uses_p): New function.
> (df_get_exit_block_use_set): Replace EPILOGUE_USES with
> df_epilogue_uses_p.
> * df.h (df_epilogue_uses_p): Declare.
> * doc/extend.texi: Document the new zero_call_used_regs attribute.
> * doc/invoke.texi: Document the new -fzero-call-used-regs option.
> * doc/tm.texi: Regenerate.
> * doc/tm.texi.in (TARGET_ZERO_CALL_USED_REGS): New hook.
> * emit-rtl.h (struct rtl_data): New field zeroed_reg_set.
> * function.c (is_live_reg_at_return): New function.
> (gen_call_used_regs_seq): Likewise.
> (rest_of_zero_call_used_regs): Likewise.
> (class pass_zero_call_used_regs): New class.
> (make_pass_zero_call_used_regs): New function.
> * optabs.c (expand_asm_reg_clobber_mem_blockage): New function.
> * optabs.h (expand_asm_reg_clobber_mem_blockage): Declare.
> * passes.def: Add new pass pass_zero_call_used_regs.
> * recog.c (valid_insn_p): New function.
> * recog.h (valid_insn_p): Declare.
> * resource.c (init_resource_info): Replace EPILOGUE_USES with
> df_epilogue_uses_p.
> * target.def (zero_call_used_regs): New hook.
> * targhooks.c (default_zero_call_used_regs): New function.
> * targhooks.h (default_zero_call_used_regs): Declare.
> * tree-pass.h (make_pass_zero_call_used_regs): Declare.
>
> gcc/c-family/ChangeLog:
>
> 2020-10-05  Qing Zhao  <qing.zhao@oracle.com>
>     H.J. Lu  <hjl.tools@gmail.com>
>
> * c-attribs.c (c_common_attribute_table): Add new attribute
> zero_call_used_regs.
> (handle_zero_call_used_regs_attribute): New function.
>
> gcc/testsuite/ChangeLog:
>
> 2020-10-05  Qing Zhao  <qing.zhao@oracle.com>
>     H.J. Lu  <hjl.tools@gmail.com>
>
> * c-c++-common/zero-scratch-regs-1.c: New test.
> * c-c++-common/zero-scratch-regs-2.c: New test.
> * gcc.target/i386/zero-scratch-regs-1.c: New test.
> * gcc.target/i386/zero-scratch-regs-10.c: New test.
> * gcc.target/i386/zero-scratch-regs-11.c: New test.
> * gcc.target/i386/zero-scratch-regs-12.c: New test.
> * gcc.target/i386/zero-scratch-regs-13.c: New test.
> * gcc.target/i386/zero-scratch-regs-14.c: New test.
> * gcc.target/i386/zero-scratch-regs-15.c: New test.
> * gcc.target/i386/zero-scratch-regs-16.c: New test.
> * gcc.target/i386/zero-scratch-regs-17.c: New test.
> * gcc.target/i386/zero-scratch-regs-18.c: New test.
> * gcc.target/i386/zero-scratch-regs-19.c: New test.
> * gcc.target/i386/zero-scratch-regs-2.c: New test.
> * gcc.target/i386/zero-scratch-regs-20.c: New test.
> * gcc.target/i386/zero-scratch-regs-21.c: New test.
> * gcc.target/i386/zero-scratch-regs-22.c: New test.
> * gcc.target/i386/zero-scratch-regs-23.c: New test.
> * gcc.target/i386/zero-scratch-regs-24.c: New test.
> * gcc.target/i386/zero-scratch-regs-25.c: New test.
> * gcc.target/i386/zero-scratch-regs-26.c: New test.
> * gcc.target/i386/zero-scratch-regs-3.c: New test.
> * gcc.target/i386/zero-scratch-regs-4.c: New test.
> * gcc.target/i386/zero-scratch-regs-5.c: New test.
> * gcc.target/i386/zero-scratch-regs-6.c: New test.
> * gcc.target/i386/zero-scratch-regs-7.c: New test.
> * gcc.target/i386/zero-scratch-regs-8.c: New test.
> * gcc.target/i386/zero-scratch-regs-9.c: New test.
>
>
> ******The patch:
>
> ---
> gcc/c-family/c-attribs.c                           |  50 +++++
> gcc/common.opt                                     |  32 +++
> gcc/config/i386/i386.c                             | 158 +++++++++++++
> gcc/coretypes.h                                    |  13 ++
> gcc/df-scan.c                                      |  12 +-
> gcc/df.h                                           |   1 +
> gcc/doc/extend.texi                                |  24 ++
> gcc/doc/invoke.texi                                |  25 ++-
> gcc/doc/tm.texi                                    |  10 +
> gcc/doc/tm.texi.in                                 |   2 +
> gcc/emit-rtl.h                                     |   3 +
> gcc/function.c                                     | 247 ++++++++++++++++++++-
> gcc/optabs.c                                       |  43 ++++
> gcc/optabs.h                                       |   2 +
> gcc/passes.def                                     |   1 +
> gcc/recog.c                                        |  15 ++
> gcc/recog.h                                        |   1 +
> gcc/resource.c                                     |   2 +-
> gcc/target.def                                     |  13 ++
> gcc/targhooks.c                                    |  35 +++
> gcc/targhooks.h                                    |   1 +
> gcc/testsuite/c-c++-common/zero-scratch-regs-1.c   |  15 ++
> gcc/testsuite/c-c++-common/zero-scratch-regs-2.c   |  16 ++
> .../gcc.target/i386/zero-scratch-regs-1.c          |  12 +
> .../gcc.target/i386/zero-scratch-regs-10.c         |  21 ++
> .../gcc.target/i386/zero-scratch-regs-11.c         |  39 ++++
> .../gcc.target/i386/zero-scratch-regs-12.c         |  39 ++++
> .../gcc.target/i386/zero-scratch-regs-13.c         |  21 ++
> .../gcc.target/i386/zero-scratch-regs-14.c         |  19 ++
> .../gcc.target/i386/zero-scratch-regs-15.c         |  14 ++
> .../gcc.target/i386/zero-scratch-regs-16.c         |  14 ++
> .../gcc.target/i386/zero-scratch-regs-17.c         |  13 ++
> .../gcc.target/i386/zero-scratch-regs-18.c         |  13 ++
> .../gcc.target/i386/zero-scratch-regs-19.c         |  12 +
> .../gcc.target/i386/zero-scratch-regs-2.c          |  19 ++
> .../gcc.target/i386/zero-scratch-regs-20.c         |  23 ++
> .../gcc.target/i386/zero-scratch-regs-21.c         |  14 ++
> .../gcc.target/i386/zero-scratch-regs-22.c         |  20 ++
> .../gcc.target/i386/zero-scratch-regs-23.c         |  28 +++
> .../gcc.target/i386/zero-scratch-regs-24.c         |  10 +
> .../gcc.target/i386/zero-scratch-regs-25.c         |  10 +
> .../gcc.target/i386/zero-scratch-regs-26.c         |  23 ++
> .../gcc.target/i386/zero-scratch-regs-3.c          |  12 +
> .../gcc.target/i386/zero-scratch-regs-4.c          |  14 ++
> .../gcc.target/i386/zero-scratch-regs-5.c          |  20 ++
> .../gcc.target/i386/zero-scratch-regs-6.c          |  14 ++
> .../gcc.target/i386/zero-scratch-regs-7.c          |  13 ++
> .../gcc.target/i386/zero-scratch-regs-8.c          |  19 ++
> .../gcc.target/i386/zero-scratch-regs-9.c          |  15 ++
> gcc/tree-pass.h                                    |   1 +
> 50 files changed, 1187 insertions(+), 6 deletions(-)
> create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> create mode 100644 gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> create mode 100644 gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
>
> diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
> index c779d13..69c3886 100644
> --- a/gcc/c-family/c-attribs.c
> +++ b/gcc/c-family/c-attribs.c
> @@ -138,6 +138,8 @@ static tree handle_target_clones_attribute (tree *, tree, tree, int, bool *);
> static tree handle_optimize_attribute (tree *, tree, tree, int, bool *);
> static tree ignore_attribute (tree *, tree, tree, int, bool *);
> static tree handle_no_split_stack_attribute (tree *, tree, tree, int, bool *);
> +static tree handle_zero_call_used_regs_attribute (tree *, tree, tree, int,
> +   bool *);
> static tree handle_argspec_attribute (tree *, tree, tree, int, bool *);
> static tree handle_fnspec_attribute (tree *, tree, tree, int, bool *);
> static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
> @@ -437,6 +439,8 @@ const struct attribute_spec c_common_attribute_table[] =
>       ignore_attribute, NULL },
>   { "no_split_stack",       0, 0, true,  false, false, false,
>       handle_no_split_stack_attribute, NULL },
> +  { "zero_call_used_regs",    1, 1, true, false, false, false,
> +       handle_zero_call_used_regs_attribute, NULL },
>   /* For internal use only (marking of function arguments).
>      The name contains a space to prevent its usage in source code.  */
>   { "arg spec",       1, -1, true, false, false, false,
> @@ -4959,6 +4963,52 @@ handle_no_split_stack_attribute (tree *node, tree name,
>   return NULL_TREE;
> }
>
> +/* Handle a "zero_call_used_regs" attribute; arguments as in
> +   struct attribute_spec.handler.  */
> +
> +static tree
> +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> +       int ARG_UNUSED (flags),
> +       bool *no_add_attris)
> +{
> +  tree decl = *node;
> +  tree id = TREE_VALUE (args);
> +
> +  if (TREE_CODE (decl) != FUNCTION_DECL)
> +    {
> +      error_at (DECL_SOURCE_LOCATION (decl),
> + "%qE attribute applies only to functions", name);
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  if (TREE_CODE (id) != STRING_CST)
> +    {
> +      error ("attribute %qE argument is not a string", name);
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  if ((strcmp (TREE_STRING_POINTER (id), "skip") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "used-gpr-arg") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "used-arg") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "all-arg") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "used-gpr") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "all-gpr") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "used") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "all") != 0))
> +    {
> +      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs, "
> +      "%qs, %qs, %qs, or %qs",
> +       name, "skip", "used-gpr-arg", "used-arg", "all-arg",
> +      "used-gpr", "all-gpr", "used", "all");
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  return NULL_TREE;
> +}
> +
> /* Handle a "returns_nonnull" attribute; arguments as in
>    struct attribute_spec.handler.  */
>
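The handler above validates the argument with eight strcmp calls, and gen_call_used_regs_seq later repeats the same comparisons to map the string to an enum value. The following is a minimal standalone sketch of a single lookup table that could serve both places. The table `zero_call_used_regs_choices` and the function `parse_zero_call_used_regs` are hypothetical names, and the enum is a local copy of the one the patch adds to coretypes.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local copy of the enum that the patch adds to coretypes.h.  */
enum zero_call_used_regs {
  zero_call_used_regs_unset = 0,
  zero_call_used_regs_skip,
  zero_call_used_regs_used_gpr_arg,
  zero_call_used_regs_used_arg,
  zero_call_used_regs_all_arg,
  zero_call_used_regs_used_gpr,
  zero_call_used_regs_all_gpr,
  zero_call_used_regs_used,
  zero_call_used_regs_all
};

/* Hypothetical table pairing each accepted spelling with its value.  */
static const struct {
  const char *name;
  enum zero_call_used_regs value;
} zero_call_used_regs_choices[] = {
  { "skip", zero_call_used_regs_skip },
  { "used-gpr-arg", zero_call_used_regs_used_gpr_arg },
  { "used-arg", zero_call_used_regs_used_arg },
  { "all-arg", zero_call_used_regs_all_arg },
  { "used-gpr", zero_call_used_regs_used_gpr },
  { "all-gpr", zero_call_used_regs_all_gpr },
  { "used", zero_call_used_regs_used },
  { "all", zero_call_used_regs_all },
};

/* Return the value for NAME, or zero_call_used_regs_unset when NAME is
   not one of the accepted spellings (the attribute handler would then
   report an error and drop the attribute).  */
static enum zero_call_used_regs
parse_zero_call_used_regs (const char *name)
{
  size_t n = sizeof zero_call_used_regs_choices
             / sizeof zero_call_used_regs_choices[0];
  assert (name != NULL);
  for (size_t i = 0; i < n; i++)
    if (strcmp (name, zero_call_used_regs_choices[i].name) == 0)
      return zero_call_used_regs_choices[i].value;
  return zero_call_used_regs_unset;
}
```

Because the value 0 (`unset`) never appears in the table, it can double as the "invalid spelling" result.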
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 292c2de..50bbf9c 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -3111,6 +3111,38 @@ fzero-initialized-in-bss
> Common Report Var(flag_zero_initialized_in_bss) Init(1)
> Put zero initialized data in the bss section.
>
> +fzero-call-used-regs=
> +Common Report RejectNegative Joined Enum(zero_call_used_regs) Var(flag_zero_call_used_regs) Init(zero_call_used_regs_unset)
> +Clear call-used registers upon function return.
> +
> +Enum
> +Name(zero_call_used_regs) Type(enum zero_call_used_regs)
> +Known choices of clearing call-used registers upon function return (for use with the -fzero-call-used-regs= option):
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(skip) Value(zero_call_used_regs_skip)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(used-gpr-arg) Value(zero_call_used_regs_used_gpr_arg)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(used-arg) Value(zero_call_used_regs_used_arg)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(all-arg) Value(zero_call_used_regs_all_arg)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(used-gpr) Value(zero_call_used_regs_used_gpr)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(all-gpr) Value(zero_call_used_regs_all_gpr)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(used) Value(zero_call_used_regs_used)
> +
> +EnumValue
> +Enum(zero_call_used_regs) String(all) Value(zero_call_used_regs_all)
> +
> g
> Common Driver RejectNegative JoinedOrMissing
> Generate debug information in default format.
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index f684954..620114f 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -3551,6 +3551,161 @@ ix86_function_value_regno_p (const unsigned int regno)
>   return false;
> }
>
> +/* Check whether the register REGNO should be zeroed on X86.
> +   When ALL_SSE_ZEROED is true, all SSE registers have already been zeroed
> +   together, so there is no need to zero them again.
> +   Stack registers (st0-st7) and mm0-mm7 are aliased with each other and
> +   are very hard to zero individually, so don't zero individual st or
> +   mm registers at this time.  */
> +
> +static bool
> +zero_call_used_regno_p (const unsigned int regno,
> + bool all_sse_zeroed)
> +{
> +  return GENERAL_REGNO_P (regno)
> +  || (!all_sse_zeroed && SSE_REGNO_P (regno))
> +  || MASK_REGNO_P (regno);
> +}
> +
> +/* Return the machine_mode that is used to zero register REGNO.  */
> +
> +static machine_mode
> +zero_call_used_regno_mode (const unsigned int regno)
> +{
> +  /* NB: We only need to zero the lower 32 bits for integer registers
> +     and the lower 128 bits for vector registers since the destinations
> +     are zero-extended to the full register width.  */
> +  if (GENERAL_REGNO_P (regno))
> +    return SImode;
> +  else if (SSE_REGNO_P (regno))
> +    return V4SFmode;
> +  else
> +    return HImode;
> +}
> +
> +/* Generate an rtx to zero all vector registers together if possible,
> +   otherwise, return NULL.  */
> +
> +static rtx
> +zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  if (!TARGET_AVX)
> +    return NULL;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> +  || (TARGET_64BIT
> +      && (REX_SSE_REGNO_P (regno)
> +  || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +      return NULL;
> +
> +  return gen_avx_vzeroall ();
> +}
> +
> +/* Generate an rtx to zero all st and mm registers together if possible,
> +   otherwise, return NULL.  */
> +
> +static rtx
> +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  if (!TARGET_MMX)
> +    return NULL;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
> + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +      return NULL;
> +
> +  return gen_mmx_emms ();

emms does not clear any register; it only loads x87FPUTagWord with
FFFFH. So, as far as register clearing is concerned, I think the
above is useless.
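The point can be made concrete with a small model. The x87 tag word holds a 2-bit tag for each of the eight physical registers, and 11b means "empty". Loading FFFFH therefore marks every register empty but leaves the data, which mm0-mm7 alias, untouched. The struct and function names below are hypothetical, and each 80-bit register is reduced to its 64-bit MMX view.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the x87/MMX state touched by EMMS.  Only the
   64-bit MMX view of each 80-bit physical register is kept.  */
struct x87_model {
  uint16_t tag_word;  /* 2 bits per physical register R0-R7; 11b = empty.  */
  uint64_t mm[8];     /* mm0-mm7, aliased with the x87 data registers.  */
};

/* EMMS: set every tag to 11b (empty); data registers are left alone.  */
static void
model_emms (struct x87_model *s)
{
  s->tag_word = 0xFFFF;
}

/* True if physical register I is tagged empty.  */
static bool
model_reg_empty (const struct x87_model *s, unsigned i)
{
  assert (i < 8);
  return ((s->tag_word >> (2 * i)) & 3u) == 3u;
}
```

After model_emms every register reports empty, yet a value left in mm3 is still there. This is why EMMS cannot serve as a register-clearing instruction.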

> +}
> +
> +/* TARGET_ZERO_CALL_USED_REGS.  */
> +/* Generate a sequence of instructions that zero registers specified by
> +   NEED_ZEROED_HARDREGS.  Return the ZEROED_HARDREGS that are actually
> +   zeroed.  */
> +static HARD_REG_SET
> +ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  HARD_REG_SET zeroed_hardregs;
> +  bool all_sse_zeroed = false;
> +
> +  /* first, let's see whether we can zero all vector registers together.  */
> +  rtx zero_all_vec_insn = zero_all_vector_registers (need_zeroed_hardregs);
> +  if (zero_all_vec_insn)
> +    {
> +      emit_insn (zero_all_vec_insn);
> +      all_sse_zeroed = true;
> +    }
> +
> +  /* then, let's see whether we can zero all st+mm registers together.  */
> +  rtx zero_all_st_mm_insn = zero_all_st_mm_registers (need_zeroed_hardregs);
> +  if (zero_all_st_mm_insn)
> +    emit_insn (zero_all_st_mm_insn);
> +
> +  /* Now, generate instructions to zero all the registers.  */
> +
> +  CLEAR_HARD_REG_SET (zeroed_hardregs);
> +  rtx zero_gpr = NULL_RTX;
> +  rtx zero_vector = NULL_RTX;
> +  rtx zero_mask = NULL_RTX;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    {
> +      if (!TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> + continue;
> +      if (!zero_call_used_regno_p (regno, all_sse_zeroed))
> + continue;
> +
> +      SET_HARD_REG_BIT (zeroed_hardregs, regno);
> +
> +      rtx reg, tmp;
> +      machine_mode mode = zero_call_used_regno_mode (regno);
> +
> +      reg = gen_rtx_REG (mode, regno);
> +
> +      if (mode == SImode)
> + if (zero_gpr == NULL_RTX)
> +   {
> +     zero_gpr = reg;
> +     tmp = gen_rtx_SET (reg, const0_rtx);
> +     if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())

There is no need to complicate things here; a peephole2 pattern already does this:

;; Attempt to always use XOR for zeroing registers (including FP modes).
(define_peephole2
  [(set (match_operand 0 "general_reg_operand")
    (match_operand 1 "const0_operand"))]

So simply load the register with 0 and leave the rest to the peephole.
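The following toy model shows the decision the peephole makes on behalf of the patch. Its condition is an assumption. It is built from the test the patch open-codes (`!TARGET_USE_MOV0 || optimize_insn_for_size_p ()`) plus a requirement that the flags register be dead, since XOR clobbers the flags. Check the real pattern in i386.md before relying on the details. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Which instruction a (set (reg) (const_int 0)) ends up as.  */
enum zero_form { ZERO_BY_MOV, ZERO_BY_XOR };

/* Hypothetical inputs standing in for the peephole's condition.  */
struct zero_ctx {
  bool target_use_mov0;    /* tuning flag TARGET_USE_MOV0  */
  bool optimize_for_size;  /* optimize_insn_for_size_p ()  */
  bool flags_dead;         /* FLAGS_REG is dead after the insn  */
};

/* XOR is shorter (2 bytes for xor %eax,%eax versus 5 for mov $0,%eax)
   but clobbers the flags, so it is only chosen when they are dead.  */
static enum zero_form
model_zeroing_peephole (struct zero_ctx c)
{
  if ((!c.target_use_mov0 || c.optimize_for_size) && c.flags_dead)
    return ZERO_BY_XOR;
  return ZERO_BY_MOV;
}
```

Under this model, the backend only needs to emit the plain set. The flags clobber that the patch builds by hand would then be added by the peephole when it rewrites the insn.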

Other than these two issues, the (relatively trivial) x86 part LGTM.

Uros.

> +       {
> + rtx clob = gen_rtx_CLOBBER (VOIDmode,
> +     gen_rtx_REG (CCmode,
> +  FLAGS_REG));
> + tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2,
> +      tmp,
> +      clob));
> +       }
> +     emit_insn (tmp);
> +   }
> + else
> +   emit_move_insn (reg, zero_gpr);
> +      else if (mode == V4SFmode)
> + if (zero_vector == NULL_RTX)
> +   {
> +     zero_vector = reg;
> +     tmp = gen_rtx_SET (reg, const0_rtx);
> +     emit_insn (tmp);
> +   }
> + else
> +   emit_move_insn (reg, zero_vector);
> +      else if (mode == HImode)
> + if (zero_mask == NULL_RTX)
> +   {
> +     zero_mask = reg;
> +     tmp = gen_rtx_SET (reg, const0_rtx);
> +     emit_insn (tmp);
> +   }
> + else
> +   emit_move_insn (reg, zero_mask);
> +      else
> + gcc_unreachable ();
> +    }
> +  return zeroed_hardregs;
> +}
> +
> /* Define how to find the value returned by a function.
>    VALTYPE is the data type of the value (as a tree).
>    If the precise function being called is known, FUNC is its FUNCTION_DECL;
> @@ -23229,6 +23384,9 @@ ix86_run_selftests (void)
> #undef TARGET_FUNCTION_VALUE_REGNO_P
> #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p
>
> +#undef TARGET_ZERO_CALL_USED_REGS
> +#define TARGET_ZERO_CALL_USED_REGS ix86_zero_call_used_regs
> +
> #undef TARGET_PROMOTE_FUNCTION_MODE
> #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
>
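The per-register loop in ix86_zero_call_used_regs zeroes the first register it meets in each class (general purpose, vector, mask). It then copies that register into the remaining members of the same class. The standalone model below captures that strategy. The register numbering and the operation encoding are hypothetical, and the vzeroall/emms shortcuts are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical numbering for this model only:
   0-15 general purpose, 16-31 vector, 32-39 mask registers.  */
#define MODEL_NREGS 40u

enum model_class { GPR_CLASS, VEC_CLASS, MASK_CLASS, NUM_CLASSES };

static enum model_class
model_class_of (unsigned regno)
{
  assert (regno < MODEL_NREGS);
  if (regno < 16)
    return GPR_CLASS;
  if (regno < 32)
    return VEC_CLASS;
  return MASK_CLASS;
}

/* One planned operation: SRC == -1 means "load zero into DST",
   otherwise "copy register SRC into DST".  */
struct zero_op {
  unsigned dst;
  int src;
};

/* Walk NEED (bit I set = register I must be zeroed) in register order.
   The first register of each class is loaded with zero and later
   registers of that class are copied from it.  Writes the plan to OPS
   (room for MODEL_NREGS entries), stores the zeroed set in *ZEROED and
   returns the number of operations.  */
static unsigned
plan_zeroing (uint64_t need, struct zero_op *ops, uint64_t *zeroed)
{
  int first[NUM_CLASSES] = { -1, -1, -1 };
  unsigned n = 0;

  *zeroed = 0;
  for (unsigned regno = 0; regno < MODEL_NREGS; regno++)
    {
      if (!((need >> regno) & 1u))
        continue;
      enum model_class c = model_class_of (regno);
      ops[n].dst = regno;
      ops[n].src = first[c];          /* -1 the first time: load zero.  */
      if (first[c] < 0)
        first[c] = (int) regno;
      n++;
      *zeroed |= (uint64_t) 1 << regno;
    }
  return n;
}
```

Each class costs a single real zeroing instruction. All other registers in the class are filled with register-to-register moves.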
> diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> index 6b6cfcd..0ce5eb4 100644
> --- a/gcc/coretypes.h
> +++ b/gcc/coretypes.h
> @@ -418,6 +418,19 @@ enum symbol_visibility
>   VISIBILITY_INTERNAL
> };
>
> +/* Zero call-used registers type.  */
> +enum zero_call_used_regs {
> +  zero_call_used_regs_unset = 0,
> +  zero_call_used_regs_skip,
> +  zero_call_used_regs_used_gpr_arg,
> +  zero_call_used_regs_used_arg,
> +  zero_call_used_regs_all_arg,
> +  zero_call_used_regs_used_gpr,
> +  zero_call_used_regs_all_gpr,
> +  zero_call_used_regs_used,
> +  zero_call_used_regs_all
> +};
> +
> /* enums used by the targetm.excess_precision hook.  */
>
> enum flt_eval_method
> diff --git a/gcc/df-scan.c b/gcc/df-scan.c
> index 93b060f..630970b 100644
> --- a/gcc/df-scan.c
> +++ b/gcc/df-scan.c
> @@ -3614,6 +3614,14 @@ df_update_entry_block_defs (void)
> }
>
>
> +/* Return true if REGNO is used by the epilogue.  */
> +bool
> +df_epilogue_uses_p (unsigned int regno)
> +{
> +    return (EPILOGUE_USES (regno)
> +     || TEST_HARD_REG_BIT (crtl->zeroed_reg_set, regno));
> +}
> +
> /* Set the bit for regs that are considered being used at the exit. */
>
> static void
> @@ -3661,7 +3669,7 @@ df_get_exit_block_use_set (bitmap exit_block_uses)
>      epilogue as being live at the end of the function since they
>      may be referenced by our caller.  */
>   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> -    if (global_regs[i] || EPILOGUE_USES (i))
> +    if (global_regs[i] || df_epilogue_uses_p (i))
>       bitmap_set_bit (exit_block_uses, i);
>
>   if (targetm.have_epilogue () && epilogue_completed)
> @@ -3802,7 +3810,6 @@ df_hard_reg_init (void)
>   initialized = true;
> }
>
> -
> /* Recompute the parts of scanning that are based on regs_ever_live
>    because something changed in that array.  */
>
> @@ -3862,7 +3869,6 @@ df_regs_ever_live_p (unsigned int regno)
>   return regs_ever_live[regno];
> }
>
> -
> /* Set regs_ever_live[REGNO] to VALUE.  If this cause regs_ever_live
>    to change, schedule that change for the next update.  */
>
> diff --git a/gcc/df.h b/gcc/df.h
> index 8b6ca8c..0f098d7 100644
> --- a/gcc/df.h
> +++ b/gcc/df.h
> @@ -1085,6 +1085,7 @@ extern void df_update_entry_exit_and_calls (void);
> extern bool df_hard_reg_used_p (unsigned int);
> extern unsigned int df_hard_reg_used_count (unsigned int);
> extern bool df_regs_ever_live_p (unsigned int);
> +extern bool df_epilogue_uses_p (unsigned int);
> extern void df_set_regs_ever_live (unsigned int, bool);
> extern void df_compute_regs_ever_live (bool);
> extern void df_scan_verify (void);
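The df-scan.c change makes every register in crtl->zeroed_reg_set count as used by the epilogue. Such a register therefore lands in the exit block's use set, so the zeroing sets are presumably not treated as dead stores by later passes. The bitmask model below mirrors the loop in df_get_exit_block_use_set. The 64-register file and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NREGS 64u

/* Hypothetical model: bit I of each mask describes hard register I.  */
struct exit_model {
  uint64_t global_regs;     /* registers declared global  */
  uint64_t epilogue_uses;   /* what EPILOGUE_USES would report  */
  uint64_t zeroed_reg_set;  /* crtl->zeroed_reg_set after the new pass  */
};

/* Model of df_epilogue_uses_p.  */
static bool
model_epilogue_uses_p (const struct exit_model *m, unsigned regno)
{
  assert (regno < MODEL_NREGS);
  return ((m->epilogue_uses | m->zeroed_reg_set) >> regno) & 1u;
}

/* Model of the exit-block loop: global or epilogue-used => live at exit.  */
static uint64_t
model_exit_block_uses (const struct exit_model *m)
{
  uint64_t uses = 0;
  for (unsigned i = 0; i < MODEL_NREGS; i++)
    if (((m->global_regs >> i) & 1u) || model_epilogue_uses_p (m, i))
      uses |= (uint64_t) 1 << i;
  return uses;
}
```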
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index c9f7299..f56f61a 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -3992,6 +3992,30 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
> A declaration to which @code{weakref} is attached and that is associated
> with a named @code{target} must be @code{static}.
>
> +@item zero_call_used_regs ("@var{choice}")
> +@cindex @code{zero_call_used_regs} function attribute
> +
> +The @code{zero_call_used_regs} attribute causes the compiler to zero
> +call-used registers at function return according to @var{choice}.
> +This is used to increase program security by either mitigating
> +Return-Oriented Programming (ROP) or preventing information leaks
> +through registers.
> +@samp{skip} doesn't zero call-used registers.
> +
> +@samp{used-gpr-arg} zeros used call-used general purpose registers that
> +pass parameters.  @samp{used-arg} zeros used call-used registers that
> +pass parameters.  @samp{all-arg} zeros all call-used registers that pass
> +parameters.  These 3 choices are used for ROP mitigation.
> +
> +@samp{used-gpr} zeros call-used general purpose registers
> +which are used in the function.  @samp{all-gpr} zeros all
> +call-used general purpose registers.  @samp{used} zeros call-used
> +registers which are used in the function.  @samp{all} zeros all
> +call-used registers.  These 4 choices are used for preventing
> +information leaks through registers.
> +
> +The default for the attribute is controlled by @option{-fzero-call-used-regs}.
> +
> @end table
>
> @c This is the end of the target-independent attribute table
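Here is a minimal usage sketch of the attribute, assuming a compiler that implements it with the spellings documented above. The guard macro `ZERO_USED_GPR` and the function are hypothetical. On compilers assumed not to have the attribute, the macro expands to nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guard: use the attribute only where it is assumed to be
   available (GCC 11 or later on x86); otherwise expand to nothing.  */
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 11 \
    && (defined (__x86_64__) || defined (__i386__))
# define ZERO_USED_GPR __attribute__ ((zero_call_used_regs ("used-gpr")))
#else
# define ZERO_USED_GPR
#endif

/* Mix a secret KEY into a checksum of BUF.  Where the attribute is
   honored, the call-used general purpose registers this function used
   are assumed to be zeroed before it returns, except the register that
   carries the return value, which is live at the return.  */
ZERO_USED_GPR unsigned
checksum_with_key (const unsigned char *buf, size_t len, unsigned key)
{
  unsigned sum = key;
  for (size_t i = 0; i < len; i++)
    sum = sum * 31u + buf[i];
  return sum ^ key;
}
```

The attribute does not change the function's result, only what is left behind in registers. Compiling with @option{-fzero-call-used-regs=used-gpr} instead would request the same behavior for every function that does not carry the attribute.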
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index c049932..aa04a3c 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -550,7 +550,7 @@ Objective-C and Objective-C++ Dialects}.
> -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
> -funsafe-math-optimizations  -funswitch-loops @gol
> -fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
> --fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
> +-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin -fzero-call-used-regs @gol
> --param @var{name}=@var{value}
> -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}
>
> @@ -12550,6 +12550,29 @@ int foo (void)
>
> Not all targets support this option.
>
> +@item -fzero-call-used-regs=@var{choice}
> +@opindex fzero-call-used-regs
> +Zero call-used registers at function return to increase program
> +security by either mitigating Return-Oriented Programming (ROP) or
> +preventing information leaks through registers.
> +
> +@samp{skip}, which is the default, doesn't zero call-used registers.
> +
> +@samp{used-gpr-arg} zeros used call-used general purpose registers that
> +pass parameters. @samp{used-arg} zeros used call-used registers that
> +pass parameters. @samp{all-arg} zeros all call-used registers that pass
> +parameters.  These 3 choices are used for ROP mitigation.
> +
> +@samp{used-gpr} zeros call-used general purpose registers
> +which are used in the function.  @samp{all-gpr} zeros all
> +call-used general purpose registers.  @samp{used} zeros call-used
> +registers which are used in the function.  @samp{all} zeros all
> +call-used registers.  These 4 choices are used for preventing
> +information leaks through registers.
> +
> +You can control this behavior for a specific function by using the function
> +attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
> +
> @item --param @var{name}=@var{value}
> @opindex param
> In some places, GCC uses various constants to control the amount of
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 97437e8..7ecff05 100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -12053,6 +12053,16 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
> is needed.
> @end deftypefn
>
> +@deftypefn {Target Hook} HARD_REG_SET TARGET_ZERO_CALL_USED_REGS (HARD_REG_SET @var{need_zeroed_hardregs})
> +This target hook emits instructions to zero the registers specified
> +by @var{need_zeroed_hardregs} at function return and returns the
> +set of hard registers that the hook actually zeroed.
> +Define this hook if the target has more efficient instructions to
> +zero call-used registers, or if the target only tries to zero a subset
> +of @var{need_zeroed_hardregs}.
> +If the hook is not defined, @code{default_zero_call_used_regs} is used.
> +@end deftypefn
> +
> @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
> When optimization is disabled, this hook indicates whether or not
> arguments should be allocated to stack slots.  Normally, GCC allocates
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 412e22c..a67dbea 100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -8111,6 +8111,8 @@ and the associated definitions of those functions.
>
> @hook TARGET_GET_DRAP_RTX
>
> +@hook TARGET_ZERO_CALL_USED_REGS
> +
> @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
>
> @hook TARGET_CONST_ANCHOR
> diff --git a/gcc/emit-rtl.h b/gcc/emit-rtl.h
> index 92ad0dd6..2dbeace0 100644
> --- a/gcc/emit-rtl.h
> +++ b/gcc/emit-rtl.h
> @@ -310,6 +310,9 @@ struct GTY(()) rtl_data {
>      sets them.  */
>   HARD_REG_SET asm_clobbers;
>
> +  /* All hard registers that are zeroed at the return of the routine.  */
> +  HARD_REG_SET zeroed_reg_set;
> +
>   /* The highest address seen during shorten_branches.  */
>   int max_insn_address;
> };
> diff --git a/gcc/function.c b/gcc/function.c
> index c612959..c8181bd 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
> #include "emit-rtl.h"
> #include "recog.h"
> #include "rtl-error.h"
> +#include "hard-reg-set.h"
> #include "alias.h"
> #include "fold-const.h"
> #include "stor-layout.h"
> @@ -5815,6 +5816,182 @@ make_prologue_seq (void)
>   return seq;
> }
>
> +/* Check whether the hard register REGNO is live before the return insn RET.  */
> +static bool
> +is_live_reg_at_return (unsigned int regno, rtx_insn * ret)
> +{
> +  basic_block bb = BLOCK_FOR_INSN (ret);
> +  auto_bitmap live_out;
> +  bitmap_copy (live_out, df_get_live_out (bb));
> +  df_simulate_one_insn_backwards (bb, ret, live_out);
> +
> +  if (REGNO_REG_SET_P (live_out, regno))
> +    return true;
> +
> +  return false;
> +}
> +
> +/* Emit a sequence of insns to zero the call-used registers before RET.  */
> +
> +static void
> +gen_call_used_regs_seq (rtx_insn *ret)
> +{
> +  bool gpr_only = true;
> +  bool used_only = true;
> +  bool arg_only = true;
> +  enum zero_call_used_regs zero_regs_type = zero_call_used_regs_unset;
> +  enum zero_call_used_regs attr_zero_regs_type
> +     = zero_call_used_regs_unset;
> +  tree attr_zero_regs
> + = lookup_attribute ("zero_call_used_regs",
> +     DECL_ATTRIBUTES (cfun->decl));
> +
> +  /* Get the type of zero_call_used_regs from function attribute.  */
> +  if (attr_zero_regs)
> +    {
> +      /* The TREE_VALUE of an attribute is a TREE_LIST whose TREE_VALUE
> +  is the attribute argument's value.  */
> +      attr_zero_regs = TREE_VALUE (attr_zero_regs);
> +      gcc_assert (TREE_CODE (attr_zero_regs) == TREE_LIST);
> +      attr_zero_regs = TREE_VALUE (attr_zero_regs);
> +      gcc_assert (TREE_CODE (attr_zero_regs) == STRING_CST);
> +
> +      if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "skip") == 0)
> + attr_zero_regs_type = zero_call_used_regs_skip;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used-gpr-arg")
> + == 0)
> + attr_zero_regs_type = zero_call_used_regs_used_gpr_arg;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used-arg") == 0)
> + attr_zero_regs_type = zero_call_used_regs_used_arg;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "all-arg") == 0)
> + attr_zero_regs_type = zero_call_used_regs_all_arg;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used-gpr") == 0)
> + attr_zero_regs_type = zero_call_used_regs_used_gpr;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "all-gpr") == 0)
> + attr_zero_regs_type = zero_call_used_regs_all_gpr;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used") == 0)
> + attr_zero_regs_type = zero_call_used_regs_used;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "all") == 0)
> + attr_zero_regs_type = zero_call_used_regs_all;
> +      else
> + gcc_assert (0);
> +    }
> +
> +  if (flag_zero_call_used_regs)
> +    if (!attr_zero_regs)
> +      zero_regs_type = flag_zero_call_used_regs;
> +    else
> +      zero_regs_type = attr_zero_regs_type;
> +  else
> +    zero_regs_type = attr_zero_regs_type;
> +
> +  /* No need to zero call-used-regs when no user request is present.  */
> +  if (zero_regs_type <= zero_call_used_regs_skip)
> +    return;
> +
> +  /* No need to zero call-used-regs in main ().  */
> +  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
> +    return;
> +
> +  /* No need to zero call-used-regs if __builtin_eh_return is called
> +     since it isn't a normal function return.  */
> +  if (crtl->calls_eh_return)
> +    return;
> +
> +  /* If gpr_only is true, only zero call-used-registers that are
> +     general-purpose registers; if used_only is true, only zero
> +     call-used-registers that are used in the current function.  */
> +
> +  switch (zero_regs_type)
> +    {
> +      case zero_call_used_regs_used_arg:
> + gpr_only = false;
> + break;
> +      case zero_call_used_regs_all_arg:
> + gpr_only = false;
> + used_only = false;
> + break;
> +      case zero_call_used_regs_used_gpr:
> + arg_only = false;
> + break;
> +      case zero_call_used_regs_all_gpr:
> + used_only = false;
> + arg_only = false;
> + break;
> +      case zero_call_used_regs_used:
> + gpr_only = false;
> + arg_only = false;
> + break;
> +      case zero_call_used_regs_all:
> + gpr_only = false;
> + used_only = false;
> + arg_only = false;
> + break;
> +      default:
> + break;
> +    }
> +
> +  /* For each of the hard registers, zero it only if:
> +     1. it is a call-used register;
> + and 2. it is not a fixed register;
> + and 3. it is not live at the return of the routine;
> + and 4. it is a general register if gpr_only is true;
> + and 5. it is used in the routine if used_only is true;
> + and 6. it is a register that passes parameters if arg_only is true;
> +   */
> +
> +  HARD_REG_SET need_zeroed_hardregs;
> +  CLEAR_HARD_REG_SET (need_zeroed_hardregs);
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    {
> +      if (!this_target_hard_regs->x_call_used_regs[regno])
> + continue;
> +      if (fixed_regs[regno])
> + continue;
> +      if (is_live_reg_at_return (regno, ret))
> + continue;
> +      if (gpr_only
> +   && !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], regno))
> + continue;
> +      if (used_only && !df_regs_ever_live_p (regno))
> + continue;
> +      if (arg_only && !FUNCTION_ARG_REGNO_P (regno))
> + continue;
> +
> +      /* Now this is a register that we might want to zero.  */
> +      SET_HARD_REG_BIT (need_zeroed_hardregs, regno);
> +    }
> +
> +  if (hard_reg_set_empty_p (need_zeroed_hardregs))
> +    return;
> +
> +  /* Now we have the hard register set that needs to be zeroed; pass it to
> +     the target to generate the zeroing sequence.  */
> +  HARD_REG_SET zeroed_hardregs;
> +  start_sequence ();
> +  zeroed_hardregs = targetm.calls.zero_call_used_regs (need_zeroed_hardregs);
> +  rtx_insn *seq = get_insns ();
> +  end_sequence ();
> +  if (seq)
> +    {
> +      /* emit the memory blockage and register clobber asm volatile before
> +  the whole sequence.  */
> +      start_sequence ();
> +      expand_asm_reg_clobber_mem_blockage (zeroed_hardregs);
> +      rtx_insn *seq_barrier = get_insns ();
> +      end_sequence ();
> +
> +      emit_insn_before (seq_barrier, ret);
> +      emit_insn_before (seq, ret);
> +
> +      /* update the data flow information.  */
> +      crtl->zeroed_reg_set |= zeroed_hardregs;
> +      df_set_bb_dirty (EXIT_BLOCK_PTR_FOR_FN (cfun));
> +    }
> +  return;
> +}
> +
> +
> /* Return a sequence to be used as the epilogue for the current function,
>    or NULL.  */
>
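gen_call_used_regs_seq does three things. It picks the choice (the attribute wins over the option), turns it into the gpr_only/used_only/arg_only flags, and filters registers with the six listed conditions. The standalone model below restates that logic. The struct and function names are hypothetical. The per-register facts stand in for call_used_regs, fixed_regs, the data-flow queries and FUNCTION_ARG_REGNO_P.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copy of the enum the patch adds to coretypes.h.  */
enum zero_call_used_regs {
  zero_call_used_regs_unset = 0,
  zero_call_used_regs_skip,
  zero_call_used_regs_used_gpr_arg,
  zero_call_used_regs_used_arg,
  zero_call_used_regs_all_arg,
  zero_call_used_regs_used_gpr,
  zero_call_used_regs_all_gpr,
  zero_call_used_regs_used,
  zero_call_used_regs_all
};

/* The attribute, when present, overrides the command-line value.  */
static enum zero_call_used_regs
effective_choice (enum zero_call_used_regs flag, bool has_attr,
                  enum zero_call_used_regs attr)
{
  return has_attr ? attr : flag;
}

struct zero_filter { bool gpr_only, used_only, arg_only; };

/* Start from the narrowest choice (used-gpr-arg) and relax.  */
static struct zero_filter
filter_for_choice (enum zero_call_used_regs choice)
{
  struct zero_filter f = { true, true, true };
  switch (choice)
    {
    case zero_call_used_regs_used_arg:
      f.gpr_only = false;
      break;
    case zero_call_used_regs_all_arg:
      f.gpr_only = f.used_only = false;
      break;
    case zero_call_used_regs_used_gpr:
      f.arg_only = false;
      break;
    case zero_call_used_regs_all_gpr:
      f.used_only = f.arg_only = false;
      break;
    case zero_call_used_regs_used:
      f.gpr_only = f.arg_only = false;
      break;
    case zero_call_used_regs_all:
      f.gpr_only = f.used_only = f.arg_only = false;
      break;
    default:
      break;
    }
  return f;
}

/* Hypothetical per-register facts the real pass queries.  */
struct reg_facts {
  bool call_used, fixed, live_at_return, is_gpr, ever_live, passes_arg;
};

/* The six conditions listed in gen_call_used_regs_seq.  */
static bool
should_zero (const struct reg_facts *r, struct zero_filter f)
{
  if (!r->call_used || r->fixed || r->live_at_return)
    return false;
  if (f.gpr_only && !r->is_gpr)
    return false;
  if (f.used_only && !r->ever_live)
    return false;
  if (f.arg_only && !r->passes_arg)
    return false;
  return true;
}
```

A register that is live at the return, such as the one holding the return value, is never zeroed, whatever the choice.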
> @@ -6486,7 +6663,75 @@ make_pass_thread_prologue_and_epilogue (gcc::context *ctxt)
> {
>   return new pass_thread_prologue_and_epilogue (ctxt);
> }
> -
>
> +
> +static unsigned int
> +rest_of_zero_call_used_regs (void)
> +{
> +  basic_block bb;
> +  rtx_insn *insn;
> +
> +  /* This pass needs data flow information.  */
> +  df_analyze ();
> +
> +  /* Search all the "return"s in the routine, and insert instruction sequence to
> +     zero the call used registers.  */
> +  FOR_EACH_BB_REVERSE_FN (bb, cfun)
> +    if (bb == EXIT_BLOCK_PTR_FOR_FN (cfun)
> + || (single_succ_p (bb)
> +     && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun)))
> +      FOR_BB_INSNS_REVERSE (bb, insn)
> + if (JUMP_P (insn) && ANY_RETURN_P (JUMP_LABEL (insn)))
> +   {
> +     /* Insert the instruction sequence that zeroes the call-used
> +        registers before this return insn.  */
> +     gen_call_used_regs_seq (insn);
> +     break;
> +   }
> +
> +  return 0;
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_zero_call_used_regs =
> +{
> +  RTL_PASS, /* type */
> +  "zero_call_used_regs", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  0, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_zero_call_used_regs: public rtl_opt_pass
> +{
> +public:
> +  pass_zero_call_used_regs (gcc::context *ctxt)
> +    : rtl_opt_pass (pass_data_zero_call_used_regs, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *)
> +    {
> +      return flag_zero_call_used_regs > zero_call_used_regs_unset;
> +    }
> +  virtual unsigned int execute (function *)
> +    {
> +      return rest_of_zero_call_used_regs ();
> +    }
> +
> +}; // class pass_zero_call_used_regs
> +
> +} // anon namespace
> +
> +rtl_opt_pass *
> +make_pass_zero_call_used_regs (gcc::context *ctxt)
> +{
> +  return new pass_zero_call_used_regs (ctxt);
> +}
>
> /* If CONSTRAINT is a matching constraint, then return its number.
>    Otherwise, return -1.  */
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index 8ad7f4b..57e5c5d 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -6484,6 +6484,49 @@ expand_memory_blockage (void)
>     expand_asm_memory_blockage ();
> }
>
> +/* Generate asm volatile("" : : : "memory") as a memory blockage that
> +   also clobbers the hard registers in ZEROED_REGS.  Emit nothing if
> +   ZEROED_REGS is empty.  */
> +
> +void
> +expand_asm_reg_clobber_mem_blockage (HARD_REG_SET zeroed_regs)
> +{
> +  rtx asm_op, clob_mem, clob_reg;
> +
> +  unsigned int num_of_regs = 0;
> +  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> +    if (TEST_HARD_REG_BIT (zeroed_regs, i))
> +      num_of_regs++;
> +
> +  if (num_of_regs == 0)
> +    return;
> +
> +  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
> +  rtvec_alloc (0), rtvec_alloc (0),
> +  rtvec_alloc (0), UNKNOWN_LOCATION);
> +  MEM_VOLATILE_P (asm_op) = 1;
> +
> +  rtvec v = rtvec_alloc (num_of_regs + 2);
> +
> +  clob_mem = gen_rtx_SCRATCH (VOIDmode);
> +  clob_mem = gen_rtx_MEM (BLKmode, clob_mem);
> +  clob_mem = gen_rtx_CLOBBER (VOIDmode, clob_mem);
> +
> +  RTVEC_ELT (v, 0) = asm_op;
> +  RTVEC_ELT (v, 1) = clob_mem;
> +
> +  unsigned int j = 2;
> +  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> +    if (TEST_HARD_REG_BIT (zeroed_regs, i))
> +      {
> + clob_reg = gen_rtx_CLOBBER (VOIDmode, regno_reg_rtx[i]);
> + RTVEC_ELT (v, j) = clob_reg;
> + j++;
> +      }
> +  gcc_assert (j == (num_of_regs + 2));
> +
> +  emit_insn (gen_rtx_PARALLEL (VOIDmode, v));
> +}
> +
> /* This routine will either emit the mem_thread_fence pattern or issue a
>    sync_synchronize to generate a fence for memory model MEMMODEL.  */
>
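expand_asm_reg_clobber_mem_blockage makes two passes over the register set. The first counts the set bits so the rtvec can be allocated at its exact size (num_of_regs + 2). The second fills it: the asm, then the memory clobber, then one clobber per register. The sketch below models that layout. The enum and struct names are hypothetical stand-ins for the RTL objects, and a 64-bit mask replaces HARD_REG_SET.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the PARALLEL elements built by the patch:
   element 0 is the empty volatile asm, element 1 is the
   (clobber (mem:BLK (scratch))) memory blockage, and each further
   element clobbers one hard register.  */
enum elt_kind { ELT_ASM, ELT_CLOBBER_MEM, ELT_CLOBBER_REG };

struct elt
{
  enum elt_kind kind;
  unsigned regno;   /* Meaningful only for ELT_CLOBBER_REG.  */
};

/* Build the element vector for ZEROED (bit N = hard register N).
   Returns NULL when ZEROED is empty (nothing emitted, as in the patch)
   or on allocation failure; otherwise stores the length in *LEN.
   The caller frees the result.  */
static struct elt *
build_clobber_vector (uint64_t zeroed, size_t *len)
{
  size_t num_of_regs = 0;
  for (unsigned i = 0; i < 64; i++)
    if (zeroed & ((uint64_t) 1 << i))
      num_of_regs++;
  if (num_of_regs == 0)
    return NULL;

  struct elt *v = malloc ((num_of_regs + 2) * sizeof *v);
  if (!v)
    return NULL;
  v[0].kind = ELT_ASM;
  v[0].regno = 0;
  v[1].kind = ELT_CLOBBER_MEM;
  v[1].regno = 0;

  size_t j = 2;
  for (unsigned i = 0; i < 64; i++)
    if (zeroed & ((uint64_t) 1 << i))
      {
        v[j].kind = ELT_CLOBBER_REG;
        v[j].regno = i;
        j++;
      }
  /* Same consistency check as the gcc_assert in the patch.  */
  assert (j == num_of_regs + 2);
  *len = j;
  return v;
}
```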
> diff --git a/gcc/optabs.h b/gcc/optabs.h
> index 0b14700..bfa10c8 100644
> --- a/gcc/optabs.h
> +++ b/gcc/optabs.h
> @@ -345,6 +345,8 @@ rtx expand_atomic_store (rtx, rtx, enum memmodel, bool);
> rtx expand_atomic_fetch_op (rtx, rtx, rtx, enum rtx_code, enum memmodel,
>       bool);
>
> +extern void expand_asm_reg_clobber_mem_blockage (HARD_REG_SET);
> +
> extern bool insn_operand_matches (enum insn_code icode, unsigned int opno,
>   rtx operand);
> extern bool valid_multiword_target_p (rtx);
> diff --git a/gcc/passes.def b/gcc/passes.def
> index f865bdc..77d4676 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -492,6 +492,7 @@ along with GCC; see the file COPYING3.  If not see
>       POP_INSERT_PASSES ()
>       NEXT_PASS (pass_late_compilation);
>       PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
> +   NEXT_PASS (pass_zero_call_used_regs);
>   NEXT_PASS (pass_compute_alignments);
>   NEXT_PASS (pass_variable_tracking);
>   NEXT_PASS (pass_free_cfg);
> diff --git a/gcc/recog.c b/gcc/recog.c
> index ce83b7f..472c2dc 100644
> --- a/gcc/recog.c
> +++ b/gcc/recog.c
> @@ -923,6 +923,21 @@ validate_simplify_insn (rtx_insn *insn)
>   return ((num_changes_pending () > 0) && (apply_change_group () > 0));
> }
>
>
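> +
> +/* Return true if INSN matches a pattern in the machine description and
> +   its operands satisfy the constraints of an enabled alternative.  */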
> +bool
> +valid_insn_p (rtx_insn *insn)
> +{
> +  recog_memoized (insn);
> +  if (INSN_CODE (insn) < 0)
> +    return false;
> +  extract_insn (insn);
> +  /* We don't know whether the insn will be in code that is optimized
> +     for size or speed, so consider all enabled alternatives.  */
> +  if (!constrain_operands (1, get_enabled_alternatives (insn)))
> +    return false;
> +  return true;
> +}
> +
> /* Return 1 if OP is a valid general operand for machine mode MODE.
>    This is either a register reference, a memory reference,
>    or a constant.  In the case of a memory reference, the address
> diff --git a/gcc/recog.h b/gcc/recog.h
> index ae3675f..d87456c 100644
> --- a/gcc/recog.h
> +++ b/gcc/recog.h
> @@ -113,6 +113,7 @@ extern void validate_replace_src_group (rtx, rtx, rtx_insn *);
> extern bool validate_simplify_insn (rtx_insn *insn);
> extern int num_changes_pending (void);
> extern bool reg_fits_class_p (const_rtx, reg_class_t, int, machine_mode);
> +extern bool valid_insn_p (rtx_insn *);
>
> extern int offsettable_memref_p (rtx);
> extern int offsettable_nonstrict_memref_p (rtx);
> diff --git a/gcc/resource.c b/gcc/resource.c
> index 0a9d594..90cf091 100644
> --- a/gcc/resource.c
> +++ b/gcc/resource.c
> @@ -1186,7 +1186,7 @@ init_resource_info (rtx_insn *epilogue_insn)
>        &end_of_function_needs, true);
>
>   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> -    if (global_regs[i] || EPILOGUE_USES (i))
> +    if (global_regs[i] || df_epilogue_uses_p (i))
>       SET_HARD_REG_BIT (end_of_function_needs.regs, i);
>
>   /* The registers required to be live at the end of the function are
> diff --git a/gcc/target.def b/gcc/target.def
> index ed2da15..7d6807d 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -5080,6 +5080,19 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
> is needed.",
>  rtx, (void), NULL)
>
> +/* Generate an instruction sequence to zero call-used registers.  */
> +DEFHOOK
> +(zero_call_used_regs,
> + "This target hook emits instructions to zero the registers specified\n\
> +by @var{need_zeroed_hardregs} at function return, and returns the set\n\
> +of hard registers that the hook actually zeroed.\n\
> +Define this hook if the target has more efficient instructions to\n\
> +zero call-used registers, or if the target zeroes only a subset\n\
> +of @var{need_zeroed_hardregs}.\n\
> +If the hook is not defined, @code{default_zero_call_used_regs} is used.",
> + HARD_REG_SET, (HARD_REG_SET need_zeroed_hardregs),
> +default_zero_call_used_regs)
> +
> /* Return true if all function parameters should be spilled to the
>    stack.  */
> DEFHOOK
> diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> index 5d94fce..2318c324 100644
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -56,6 +56,9 @@ along with GCC; see the file COPYING3.  If not see
> #include "tree-ssa-alias.h"
> #include "gimple-expr.h"
> #include "memmodel.h"
> +#include "backend.h"
> +#include "emit-rtl.h"
> +#include "df.h"
> #include "tm_p.h"
> #include "stringpool.h"
> #include "tree-vrp.h"
> @@ -987,6 +990,38 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
> #endif
> }
>
> +/* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
> +
> +HARD_REG_SET
> +default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  HARD_REG_SET zeroed_hardregs;
> +  gcc_assert (!hard_reg_set_empty_p (need_zeroed_hardregs));
> +
> +  CLEAR_HARD_REG_SET (zeroed_hardregs);
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +      {
> + rtx_insn *last_insn = get_last_insn ();
> + machine_mode mode = GET_MODE (regno_reg_rtx[regno]);
> + rtx zero = CONST0_RTX (mode);
> + rtx_insn *insn = emit_move_insn (regno_reg_rtx[regno], zero);
> + if (!valid_insn_p (insn))
> +   {
> +     static bool issued_error;
> +     if (!issued_error)
> +       {
> + issued_error = true;
> + sorry ("%<-fzero-call-used-regs%> not supported on this target");
> +       }
> +     delete_insns_since (last_insn);
> +   }
> + else
> +   SET_HARD_REG_BIT (zeroed_hardregs, regno);
> +      }
> +  return zeroed_hardregs;
> +}
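The default hook tries a plain move of zero into each requested register. It keeps the move only if the target recognizes it. Otherwise it deletes the insn, reports a one-time sorry, and leaves that register out of the returned set. The returned set can therefore be a strict subset of the request, which is why the caller clobbers and records `zeroed_hardregs` rather than `need_zeroed_hardregs`.

The sketch below models that contract under stated assumptions. The predicate `valid_zero_move_fn` is hypothetical and stands in for emit_move_insn plus valid_insn_p. `low_regs_only` is an arbitrary example target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical predicate: can the target recognize a move of zero
   into REGNO?  Stands in for emit_move_insn + valid_insn_p.  */
typedef bool (*valid_zero_move_fn) (unsigned regno);

/* Model of default_zero_call_used_regs: return the subset of NEED
   (bit N = hard register N) that was actually zeroed.  */
static uint64_t
model_default_zero_regs (uint64_t need, valid_zero_move_fn valid_zero_move)
{
  static bool issued_error;   /* Report the problem only once.  */
  uint64_t zeroed = 0;

  assert (need != 0);
  for (unsigned regno = 0; regno < 64; regno++)
    if (need & ((uint64_t) 1 << regno))
      {
        if (!valid_zero_move (regno))
          {
            /* The patch deletes the rejected insn and calls sorry ().  */
            if (!issued_error)
              {
                issued_error = true;
                fprintf (stderr, "sorry: -fzero-call-used-regs not "
                         "supported on this target\n");
              }
          }
        else
          zeroed |= (uint64_t) 1 << regno;
      }
  return zeroed;
}

/* Example target: pretend only registers 0-7 accept a zero move.  */
static bool
low_regs_only (unsigned regno)
{
  return regno < 8;
}
```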
> +
> rtx
> default_internal_arg_pointer (void)
> {
> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
> index 44ab926..e0a925f 100644
> --- a/gcc/targhooks.h
> +++ b/gcc/targhooks.h
> @@ -160,6 +160,7 @@ extern unsigned int default_function_arg_round_boundary (machine_mode,
>  const_tree);
> extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
> extern rtx default_function_value (const_tree, const_tree, bool);
> +extern HARD_REG_SET default_zero_call_used_regs (HARD_REG_SET);
> extern rtx default_libcall_value (machine_mode, const_rtx);
> extern bool default_function_value_regno_p (const unsigned int);
> extern rtx default_internal_arg_pointer (void);
> diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> new file mode 100644
> index 0000000..f44add9
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> +
> +volatile int result = 0;
> +int
> +__attribute__((noinline))
> +foo (int x)
> +{
> +  return x;
> +}
> +int main()
> +{
> +  result = foo (2);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> new file mode 100644
> index 0000000..7c8350b
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
> @@ -0,0 +1,16 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +volatile int result = 0;
> +int
> +__attribute__((noinline))
> +__attribute__ ((zero_call_used_regs("all")))
> +foo (int x)
> +{
> +  return x;
> +}
> +int main()
> +{
> +  result = foo (2);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> new file mode 100644
> index 0000000..9f61dc4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> new file mode 100644
> index 0000000..09048e5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
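These scan-assembler patterns expect a fixed shape. One register is cleared with a self-xorl, and every other register is copied from it with movl. In zero-scratch-regs-10.c the sequence starts at %edx and %eax is never listed, presumably because %eax holds the int return value. The helper below is hypothetical and not part of the patch. It renders that expected shape from a register list, which may be handy when writing or checking similar tests.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical test-writing aid: render the GPR-zeroing pattern the
   scan-assembler directives look for.  regs[0] is cleared with a
   self-xorl; the remaining registers are copied from it with movl.
   Each output line fits in 32 chars; returns the number of lines.  */
static size_t
expected_gpr_zeroing (const char *const *regs, size_t n,
                      char lines[][32], size_t max_lines)
{
  assert (n > 0 && n <= max_lines);
  snprintf (lines[0], 32, "xorl\t%%%s, %%%s", regs[0], regs[0]);
  for (size_t i = 1; i < n; i++)
    snprintf (lines[i], 32, "movl\t%%%s, %%%s", regs[0], regs[i]);
  return n;
}
```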
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> new file mode 100644
> index 0000000..4862688
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> +
> +struct S { int i; };
> +__attribute__((const, noinline, noclone))
> +struct S foo (int x)
> +{
> +  struct S s;
> +  s.i = x;
> +  return s;
> +}
> +
> +int a[2048], b[2048], c[2048], d[2048];
> +struct S e[2048];
> +
> +__attribute__((noinline, noclone)) void
> +bar (void)
> +{
> +  int i;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      e[i] = foo (i);
> +      a[i+2] = a[i] + a[i+1];
> +      b[10] = b[10] + i;
> +      c[i] = c[2047 - i];
> +      d[i] = d[i + 1];
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  bar ();
> +  for (i = 0; i < 1024; i++)
> +    if (e[i].i != i)
> +      __builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> new file mode 100644
> index 0000000..500251b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +struct S { int i; };
> +__attribute__((const, noinline, noclone))
> +struct S foo (int x)
> +{
> +  struct S s;
> +  s.i = x;
> +  return s;
> +}
> +
> +int a[2048], b[2048], c[2048], d[2048];
> +struct S e[2048];
> +
> +__attribute__((noinline, noclone)) void
> +bar (void)
> +{
> +  int i;
> +  for (i = 0; i < 1024; i++)
> +    {
> +      e[i] = foo (i);
> +      a[i+2] = a[i] + a[i+1];
> +      b[10] = b[10] + i;
> +      c[i] = c[2047 - i];
> +      d[i] = d[i + 1];
> +    }
> +}
> +
> +int
> +main ()
> +{
> +  int i;
> +  bar ();
> +  for (i = 0; i < 1024; i++)
> +    if (e[i].i != i)
> +      __builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> new file mode 100644
> index 0000000..8b058e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> new file mode 100644
> index 0000000..d4eaaf7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> new file mode 100644
> index 0000000..dd3bb90
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> new file mode 100644
> index 0000000..e2274f6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> new file mode 100644
> index 0000000..7f5d153
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> new file mode 100644
> index 0000000..fe13d2b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> new file mode 100644
> index 0000000..205a532
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> new file mode 100644
> index 0000000..e046684
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> new file mode 100644
> index 0000000..4be8ff6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
> +
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
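The movaps counts in zero-scratch-regs-13.c and -20.c follow from simple arithmetic:

- The number of copies is the number of zeroed xmm registers minus the one cleared by pxor.
- On x86-64 the float result of foo is returned in %xmm0. That register is left alone, so pxor targets %xmm1 and 16 - 1 - 1 = 14 copies remain.
- On ia32 a float is returned on the x87 stack, so all 8 xmm registers are zeroed and 8 - 1 = 7 copies are expected.

The helper below is hypothetical and just encodes that count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: number of "movaps %xmmK, %xmmN" copies expected
   when NUM_XMM registers are zeroed by one pxor followed by copies,
   skipping the xmm register that holds the return value, if any.  */
static unsigned
expected_movaps_count (unsigned num_xmm, bool return_in_xmm)
{
  unsigned zeroed = num_xmm - (return_in_xmm ? 1u : 0u);
  assert (zeroed >= 1);
  return zeroed - 1;   /* One register is cleared by pxor itself.  */
}
```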
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> new file mode 100644
> index 0000000..0eb34e0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
> +
> +__attribute__ ((zero_call_used_regs("used")))
> +float
> +foo (float z, float y, float x)
> +{
> +  return x + y;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> new file mode 100644
> index 0000000..76742bb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler "vzeroall" } } */
> +/* { dg-final { scan-assembler "emms" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> new file mode 100644
> index 0000000..18a5ffb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler "vzeroall" } } */
> +/* { dg-final { scan-assembler "emms" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kxorw\[ \t\]*%k0, %k0, %k0" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k1" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k2" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k3" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k4" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k5" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k6" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k7" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c
> new file mode 100644
> index 0000000..208633e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr-arg" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c
> new file mode 100644
> index 0000000..21e82c6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-arg" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c
> new file mode 100644
> index 0000000..293d2fe
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile { target { { *-*-linux* } && { ! ia32 } } } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-arg" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" } } */
> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm1" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm2" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm3" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm4" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm5" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm6" } } */
> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm7" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> new file mode 100644
> index 0000000..de71223
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> new file mode 100644
> index 0000000..ccfa441
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> new file mode 100644
> index 0000000..6b46ca3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +__attribute__ ((zero_call_used_regs("all-gpr")))
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> new file mode 100644
> index 0000000..0680f38
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
> +
> +void
> +foo (void)
> +{
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> new file mode 100644
> index 0000000..534defa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> new file mode 100644
> index 0000000..477bb19
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> new file mode 100644
> index 0000000..a305a60
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile { target *-*-linux* } } */
> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
> +
> +extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
> +
> +int
> +foo (int x)
> +{
> +  return x;
> +}
> +
> +/* { dg-final { scan-assembler-not "vzeroall" } } */
> +/* { dg-final { scan-assembler-not "%xmm" } } */
> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 62e5b69..8afe8ee 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -592,6 +592,7 @@ extern rtl_opt_pass *make_pass_gcse2 (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_split_after_reload (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_thread_prologue_and_epilogue (gcc::context
>      *ctxt);
> +extern rtl_opt_pass *make_pass_zero_call_used_regs (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_stack_adjustments (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_sched_fusion (gcc::context *ctxt);
> extern rtl_opt_pass *make_pass_peephole2 (gcc::context *ctxt);
> --
> 1.8.3.1


* Re: [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
  2020-10-19 19:30 ` Uros Bizjak
@ 2020-10-20 14:00   ` Qing Zhao
  2020-10-20 15:24     ` Uros Bizjak
  0 siblings, 1 reply; 20+ messages in thread
From: Qing Zhao @ 2020-10-20 14:00 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: richard Sandiford, kees Cook, rodriguez Bahena Victor, H.J. Lu,
	segher Boessenkool, gcc-patches Kees Cook via

Hi, Uros,

Thanks a lot for your comments.

> On Oct 19, 2020, at 2:30 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
>> 
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index f684954..620114f 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -3551,6 +3551,161 @@ ix86_function_value_regno_p (const unsigned int regno)
>>  return false;
>> }
>> 
>> +/* Check whether the register REGNO should be zeroed on X86.
>> +   When ALL_SSE_ZEROED is true, all SSE registers have already been
>> +   zeroed together, so there is no need to zero them again.
>> +   Stack registers (st0-st7) and mm0-mm7 are aliased with each other
>> +   and are very hard to zero individually, so don't zero individual
>> +   st or mm registers at this time.  */
>> +
>> +static bool
>> +zero_call_used_regno_p (const unsigned int regno,
>> + bool all_sse_zeroed)
>> +{
>> +  return GENERAL_REGNO_P (regno)
>> +  || (!all_sse_zeroed && SSE_REGNO_P (regno))
>> +  || MASK_REGNO_P (regno);
>> +}
>> +
>> +/* Return the machine_mode that is used to zero register REGNO.  */
>> +
>> +static machine_mode
>> +zero_call_used_regno_mode (const unsigned int regno)
>> +{
>> +  /* NB: We only need to zero the lower 32 bits for integer registers
>> +     and the lower 128 bits for vector registers since the destinations
>> +     are zero-extended to the full register width.  */
>> +  if (GENERAL_REGNO_P (regno))
>> +    return SImode;
>> +  else if (SSE_REGNO_P (regno))
>> +    return V4SFmode;
>> +  else
>> +    return HImode;
>> +}
>> +
>> +/* Generate an rtx to zero all vector registers together if possible,
>> +   otherwise, return NULL.  */
>> +
>> +static rtx
>> +zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
>> +{
>> +  if (!TARGET_AVX)
>> +    return NULL;
>> +
>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
>> +  || (TARGET_64BIT
>> +      && (REX_SSE_REGNO_P (regno)
>> +  || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
>> + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>> +      return NULL;
>> +
>> +  return gen_avx_vzeroall ();
>> +}
>> +
>> +/* Generate an rtx to zero all st and mm registers together if possible,
>> +   otherwise, return NULL.  */
>> +
>> +static rtx
>> +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
>> +{
>> +  if (!TARGET_MMX)
>> +    return NULL;
>> +
>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +    if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
>> + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>> +      return NULL;
>> +
>> +  return gen_mmx_emms ();
> 
> emms is not clearing any register, it only loads x87FPUTagWord with
> FFFFH. So I think, the above is useless, as far as register clearing
> is concerned.

Thanks for the info.

So, for mm and st registers, should we clear them, and how?
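As background for this question, the point about EMMS can be illustrated with a small software model of the x87/MMX register file. This is a sketch for discussion, not code from the patch; the struct and the model_* helpers are hypothetical. It relies on three architectural facts: mm0-mm7 alias the 64-bit significands of the physical x87 registers R0-R7, EMMS only loads FFFFh into the tag word, and FLDZ/FSTP push and pop the register stack. It also assumes the x87 stack is empty when the sequence starts. The sequence it models (eight FLDZ followed by eight FSTP %st(0)) is only one candidate answer, not something the patch implements.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the x87/MMX register file, for
   illustration only; it is not part of the patch.  */
struct x87_state
{
  uint64_t mantissa[8];  /* Significands of physical R0-R7; mmN aliases RN.  */
  uint16_t sign_exp[8];  /* Sign and exponent of each physical register.  */
  unsigned top;          /* TOP field of the FPU status word (0-7).  */
  uint16_t tag;          /* Tag word: 2 bits per physical register.  */
};

enum { TAG_VALID = 0, TAG_ZERO = 1, TAG_EMPTY = 3 };

static unsigned
get_tag (const struct x87_state *s, unsigned phys)
{
  return (s->tag >> (2 * phys)) & 3u;
}

static void
set_tag (struct x87_state *s, unsigned phys, unsigned t)
{
  s->tag = (uint16_t) ((s->tag & ~(3u << (2 * phys))) | (t << (2 * phys)));
}

/* EMMS: mark every register empty.  The register contents survive.  */
static void
model_emms (struct x87_state *s)
{
  s->tag = 0xFFFF;
}

/* FLDZ: push +0.0.  Pushing onto a non-empty slot would be a stack
   overflow, so the model requires the target slot to be empty.  */
static void
model_fldz (struct x87_state *s)
{
  unsigned new_top = (s->top + 7) & 7;  /* TOP - 1, modulo 8.  */
  assert (get_tag (s, new_top) == TAG_EMPTY);
  s->top = new_top;
  s->mantissa[new_top] = 0;
  s->sign_exp[new_top] = 0;
  set_tag (s, new_top, TAG_ZERO);
}

/* FSTP %st(0): pop and discard the top of the stack.  */
static void
model_fstp_st0 (struct x87_state *s)
{
  set_tag (s, s->top, TAG_EMPTY);
  s->top = (s->top + 1) & 7;
}

/* Candidate zeroing sequence: eight FLDZ write zero bits into every
   physical register (and hence every mm register), then eight
   FSTP %st(0) leave the stack empty again.  */
static void
model_zero_st_mm (struct x87_state *s)
{
  for (int i = 0; i < 8; i++)
    model_fldz (s);
  for (int i = 0; i < 8; i++)
    model_fstp_st0 (s);
}
```

After model_emms the old mm3 value is still in the register file; after model_zero_st_mm every significand is zero, and the tag word is back to FFFFh with TOP = 0, i.e. an empty stack.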


> 
>> +}
>> +
>> +/* TARGET_ZERO_CALL_USED_REGS.  */
>> +/* Generate a sequence of instructions that zero registers specified by
>> +   NEED_ZEROED_HARDREGS.  Return the ZEROED_HARDREGS that are actually
>> +   zeroed.  */
>> +static HARD_REG_SET
>> +ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>> +{
>> +  HARD_REG_SET zeroed_hardregs;
>> +  bool all_sse_zeroed = false;
>> +
>> +  /* first, let's see whether we can zero all vector registers together.  */
>> +  rtx zero_all_vec_insn = zero_all_vector_registers (need_zeroed_hardregs);
>> +  if (zero_all_vec_insn)
>> +    {
>> +      emit_insn (zero_all_vec_insn);
>> +      all_sse_zeroed = true;
>> +    }
>> +
>> +  /* Then, let's see whether we can zero all st+mm registers together.  */
>> +  rtx zero_all_st_mm_insn = zero_all_st_mm_registers (need_zeroed_hardregs);
>> +  if (zero_all_st_mm_insn)
>> +    emit_insn (zero_all_st_mm_insn);
>> +
>> +  /* Now, generate instructions to zero all the registers.  */
>> +
>> +  CLEAR_HARD_REG_SET (zeroed_hardregs);
>> +  rtx zero_gpr = NULL_RTX;
>> +  rtx zero_vector = NULL_RTX;
>> +  rtx zero_mask = NULL_RTX;
>> +
>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +    {
>> +      if (!TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>> + continue;
>> +      if (!zero_call_used_regno_p (regno, all_sse_zeroed))
>> + continue;
>> +
>> +      SET_HARD_REG_BIT (zeroed_hardregs, regno);
>> +
>> +      rtx reg, tmp;
>> +      machine_mode mode = zero_call_used_regno_mode (regno);
>> +
>> +      reg = gen_rtx_REG (mode, regno);
>> +
>> +      if (mode == SImode)
>> + if (zero_gpr == NULL_RTX)
>> +   {
>> +     zero_gpr = reg;
>> +     tmp = gen_rtx_SET (reg, const0_rtx);
>> +     if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())
> 
> No need to complicate here, there is a peephole2 pattern that will perform:
> 
> ;; Attempt to always use XOR for zeroing registers (including FP modes).
> (define_peephole2
>  [(set (match_operand 0 "general_reg_operand")
>    (match_operand 1 "const0_operand"))]
> 
> So, simply load a register with 0 and leave to the peephole to do its magic.

Since the new register-zeroing pass runs after the peephole2 pass, the above peephole2 pattern never gets a chance to apply:

          NEXT_PASS (pass_peephole2);   ====> peephole2 
          NEXT_PASS (pass_if_after_reload);
          NEXT_PASS (pass_regrename);
          NEXT_PASS (pass_cprop_hardreg);
          NEXT_PASS (pass_fast_rtl_dce);
          NEXT_PASS (pass_reorder_blocks);
          NEXT_PASS (pass_leaf_regs);
          NEXT_PASS (pass_split_before_sched2);
          NEXT_PASS (pass_sched2);
          NEXT_PASS (pass_stack_regs);
          PUSH_INSERT_PASSES_WITHIN (pass_stack_regs)
              NEXT_PASS (pass_split_before_regstack);
              NEXT_PASS (pass_stack_regs_run);
          POP_INSERT_PASSES ()
      POP_INSERT_PASSES ()
      NEXT_PASS (pass_late_compilation);
      PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
          NEXT_PASS (pass_zero_call_used_regs);   ====> new zero registers pass.
          NEXT_PASS (pass_compute_alignments);
          NEXT_PASS (pass_variable_tracking);

So I think the current code is still necessary. Is that right?
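To make the ordering argument concrete, here is a plain-C model of the code under discussion. It is a hypothetical sketch, not GCC code: model_zero_gprs and its USE_MOV0/OPT_SIZE parameters stand in for ix86_zero_call_used_regs, TARGET_USE_MOV0 and optimize_insn_for_size_p (). Because the pass runs after pass_peephole2, no later peephole2 run will rewrite the first zeroing insn, so the choice between xor and mov $0 has to be made in the hook itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the general-register part of
   ix86_zero_call_used_regs.  REGS lists the 32-bit register names to
   zero, in hard-register order; the AT&T assembly that would be
   emitted is written to OUT, one instruction per line.  */
static void
model_zero_gprs (const char *const *regs, int n, bool use_mov0,
                 bool opt_size, char *out, size_t out_len)
{
  assert (out_len > 0);
  out[0] = '\0';
  for (int i = 0; i < n; i++)
    {
      char line[64];
      if (i == 0)
        {
          /* XOR is shorter but clobbers the flags, which is why the
             patch wraps the SET in a PARALLEL with a FLAGS_REG clobber.  */
          if (!use_mov0 || opt_size)
            snprintf (line, sizeof line, "xorl %%%s, %%%s\n",
                      regs[0], regs[0]);
          else
            snprintf (line, sizeof line, "movl $0, %%%s\n", regs[0]);
        }
      else
        /* Every later register is copied from the first zeroed one.  */
        snprintf (line, sizeof line, "movl %%%s, %%%s\n", regs[0], regs[i]);
      assert (strlen (out) + strlen (line) < out_len);
      strcat (out, line);
    }
}
```

For edx, ecx and esi with neither flag set, it produces "xorl %edx, %edx" followed by "movl %edx, %ecx" and "movl %edx, %esi", the same shape that zero-scratch-regs-8.c scans for.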

Thanks.

Qing


> 
> Other than these two issues, the (relatively trivial) x86 part LGTM.
> 
> Uros.
> 
>> +       {
>> + rtx clob = gen_rtx_CLOBBER (VOIDmode,
>> +     gen_rtx_REG (CCmode,
>> +  FLAGS_REG));
>> + tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2,
>> +      tmp,
>> +      clob));
>> +       }
>> +     emit_insn (tmp);
>> +   }
>> + else
>> +   emit_move_insn (reg, zero_gpr);
>> +      else if (mode == V4SFmode)
>> + if (zero_vector == NULL_RTX)
>> +   {
>> +     zero_vector = reg;
>> +     tmp = gen_rtx_SET (reg, const0_rtx);
>> +     emit_insn (tmp);
>> +   }
>> + else
>> +   emit_move_insn (reg, zero_vector);
>> +      else if (mode == HImode)
>> + if (zero_mask == NULL_RTX)
>> +   {
>> +     zero_mask = reg;
>> +     tmp = gen_rtx_SET (reg, const0_rtx);
>> +     emit_insn (tmp);
>> +   }
>> + else
>> +   emit_move_insn (reg, zero_mask);
>> +      else
>> + gcc_unreachable ();
>> +    }
>> +  return zeroed_hardregs;
>> +}
>> +
>> /* Define how to find the value returned by a function.
>>   VALTYPE is the data type of the value (as a tree).
>>   If the precise function being called is known, FUNC is its FUNCTION_DECL;
>> @@ -23229,6 +23384,9 @@ ix86_run_selftests (void)
>> #undef TARGET_FUNCTION_VALUE_REGNO_P
>> #define TARGET_FUNCTION_VALUE_REGNO_P ix86_function_value_regno_p
>> 
>> +#undef TARGET_ZERO_CALL_USED_REGS
>> +#define TARGET_ZERO_CALL_USED_REGS ix86_zero_call_used_regs
>> +
>> #undef TARGET_PROMOTE_FUNCTION_MODE
>> #define TARGET_PROMOTE_FUNCTION_MODE ix86_promote_function_mode
>> 
>> diff --git a/gcc/coretypes.h b/gcc/coretypes.h
>> index 6b6cfcd..0ce5eb4 100644
>> --- a/gcc/coretypes.h
>> +++ b/gcc/coretypes.h
>> @@ -418,6 +418,19 @@ enum symbol_visibility
>>  VISIBILITY_INTERNAL
>> };
>> 
>> +/* Zero call-used registers type.  */
>> +enum zero_call_used_regs {
>> +  zero_call_used_regs_unset = 0,
>> +  zero_call_used_regs_skip,
>> +  zero_call_used_regs_used_gpr_arg,
>> +  zero_call_used_regs_used_arg,
>> +  zero_call_used_regs_all_arg,
>> +  zero_call_used_regs_used_gpr,
>> +  zero_call_used_regs_all_gpr,
>> +  zero_call_used_regs_used,
>> +  zero_call_used_regs_all
>> +};
>> +
>> /* enums used by the targetm.excess_precision hook.  */
>> 
>> enum flt_eval_method
>> diff --git a/gcc/df-scan.c b/gcc/df-scan.c
>> index 93b060f..630970b 100644
>> --- a/gcc/df-scan.c
>> +++ b/gcc/df-scan.c
>> @@ -3614,6 +3614,14 @@ df_update_entry_block_defs (void)
>> }
>> 
>> 
>> +/* Return true if REGNO is used by the epilogue.  */
>> +bool
>> +df_epilogue_uses_p (unsigned int regno)
>> +{
>> +    return (EPILOGUE_USES (regno)
>> +     || TEST_HARD_REG_BIT (crtl->zeroed_reg_set, regno));
>> +}
>> +
>> /* Set the bit for regs that are considered being used at the exit. */
>> 
>> static void
>> @@ -3661,7 +3669,7 @@ df_get_exit_block_use_set (bitmap exit_block_uses)
>>     epilogue as being live at the end of the function since they
>>     may be referenced by our caller.  */
>>  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>> -    if (global_regs[i] || EPILOGUE_USES (i))
>> +    if (global_regs[i] || df_epilogue_uses_p (i))
>>      bitmap_set_bit (exit_block_uses, i);
>> 
>>  if (targetm.have_epilogue () && epilogue_completed)
>> @@ -3802,7 +3810,6 @@ df_hard_reg_init (void)
>>  initialized = true;
>> }
>> 
>> -
>> /* Recompute the parts of scanning that are based on regs_ever_live
>>   because something changed in that array.  */
>> 
>> @@ -3862,7 +3869,6 @@ df_regs_ever_live_p (unsigned int regno)
>>  return regs_ever_live[regno];
>> }
>> 
>> -
>> /* Set regs_ever_live[REGNO] to VALUE.  If this cause regs_ever_live
>>   to change, schedule that change for the next update.  */
>> 
>> diff --git a/gcc/df.h b/gcc/df.h
>> index 8b6ca8c..0f098d7 100644
>> --- a/gcc/df.h
>> +++ b/gcc/df.h
>> @@ -1085,6 +1085,7 @@ extern void df_update_entry_exit_and_calls (void);
>> extern bool df_hard_reg_used_p (unsigned int);
>> extern unsigned int df_hard_reg_used_count (unsigned int);
>> extern bool df_regs_ever_live_p (unsigned int);
>> +extern bool df_epilogue_uses_p (unsigned int);
>> extern void df_set_regs_ever_live (unsigned int, bool);
>> extern void df_compute_regs_ever_live (bool);
>> extern void df_scan_verify (void);
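Presumably the reason for df_epilogue_uses_p is that nothing after the return reads the zeroed registers, so a liveness-based cleanup would otherwise see the zeroing insns as dead. The generic model below uses hypothetical types and names, handles straight-line code only, and is not GCC's df machinery. It shows the effect of adding registers to the exit-block use set: with only the return-value register used at exit, the zeroing sets are reported dead; once the zeroed registers are also treated as used, they survive.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical straight-line insn: DEST = SRC, or DEST = 0 when SRC is
   negative.  Registers are numbered 0-31; sets are bitmasks.  */
struct insn
{
  int dest;
  int src;
};

/* Backward liveness over a block that ends in a return.  EXIT_USES is
   the set of registers considered used at exit (the role played by
   df_get_exit_block_use_set).  Bit I of the result is set when
   INSNS[I] is dead, i.e. a DCE pass could delete it.  */
static uint32_t
model_dead_insns (const struct insn *insns, int n, uint32_t exit_uses)
{
  uint32_t live = exit_uses;
  uint32_t dead = 0;
  assert (n <= 32);
  for (int i = n - 1; i >= 0; i--)
    {
      uint32_t def = UINT32_C (1) << insns[i].dest;
      if (!(live & def))
        {
          /* A deleted insn neither defines nor uses anything.  */
          dead |= UINT32_C (1) << i;
          continue;
        }
      live &= ~def;
      if (insns[i].src >= 0)
        live |= UINT32_C (1) << insns[i].src;
    }
  return dead;
}
```

With register 0 as the return value, a block "r0 = r5; r1 = 0; r2 = r1" has its last two insns reported dead unless r1 and r2 are also in the exit use set.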
>> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>> index c9f7299..f56f61a 100644
>> --- a/gcc/doc/extend.texi
>> +++ b/gcc/doc/extend.texi
>> @@ -3992,6 +3992,30 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
>> A declaration to which @code{weakref} is attached and that is associated
>> with a named @code{target} must be @code{static}.
>> 
>> +@item zero_call_used_regs ("@var{choice}")
>> +@cindex @code{zero_call_used_regs} function attribute
>> +
>> +The @code{zero_call_used_regs} attribute causes the compiler to zero
>> +call-used registers at function return according to @var{choice}.
>> +This is used to increase program security by either mitigating
>> +Return-Oriented Programming (ROP) or preventing information leakage
>> +through registers.
>> +@samp{skip} doesn't zero call-used registers.
>> +
>> +@samp{used-gpr-arg} zeros used call-used general purpose registers that
>> +pass parameters.  @samp{used-arg} zeros used call-used registers that
>> +pass parameters.  @samp{all-arg} zeros all call-used registers that pass
>> +parameters.  These 3 choices are used for ROP mitigation.
>> +
>> +@samp{used-gpr} zeros call-used general purpose registers
>> +which are used in the function.  @samp{all-gpr} zeros all
>> +call-used general purpose registers.  @samp{used} zeros call-used
>> +registers which are used in the function.  @samp{all} zeros all
>> +call-used registers.  These 4 choices are used for preventing
>> +information leakage through registers.
>> +
>> +The default for the attribute is controlled by @option{-fzero-call-used-regs}.
>> +
>> @end table
>> 
>> @c This is the end of the target-independent attribute table
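The attribute strings and their meaning can be summarized as a table. The sketch below is a hypothetical table-driven rewrite of the strcmp chain and switch in gen_call_used_regs_seq; the enum mirrors the one added to coretypes.h, and everything else is illustrative. It also models the precedence rule in that function: an attribute, when present, overrides -fzero-call-used-regs, and both unset and skip mean no zeroing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same enumerators, in the same order, as the patch's coretypes.h.  */
enum zero_call_used_regs {
  zero_call_used_regs_unset = 0,
  zero_call_used_regs_skip,
  zero_call_used_regs_used_gpr_arg,
  zero_call_used_regs_used_arg,
  zero_call_used_regs_all_arg,
  zero_call_used_regs_used_gpr,
  zero_call_used_regs_all_gpr,
  zero_call_used_regs_used,
  zero_call_used_regs_all
};

/* Hypothetical table: each choice with the three flags that the switch
   in gen_call_used_regs_seq derives from it.  "skip" returns before
   the flags are consulted, so its row never drives any zeroing.  */
struct zero_choice
{
  const char *name;
  enum zero_call_used_regs kind;
  bool gpr_only, used_only, arg_only;
};

static const struct zero_choice zero_choices[] = {
  { "skip",         zero_call_used_regs_skip,         true,  true,  true  },
  { "used-gpr-arg", zero_call_used_regs_used_gpr_arg, true,  true,  true  },
  { "used-arg",     zero_call_used_regs_used_arg,     false, true,  true  },
  { "all-arg",      zero_call_used_regs_all_arg,      false, false, true  },
  { "used-gpr",     zero_call_used_regs_used_gpr,     true,  true,  false },
  { "all-gpr",      zero_call_used_regs_all_gpr,      true,  false, false },
  { "used",         zero_call_used_regs_used,         false, true,  false },
  { "all",          zero_call_used_regs_all,          false, false, false },
};

/* Return the table entry for NAME, or NULL for an unknown string.  */
static const struct zero_choice *
lookup_zero_choice (const char *name)
{
  for (size_t i = 0; i < sizeof zero_choices / sizeof zero_choices[0]; i++)
    if (strcmp (zero_choices[i].name, name) == 0)
      return &zero_choices[i];
  return NULL;
}

/* The function attribute (ATTR, or NULL if absent) overrides the
   command-line setting FLAG.  */
static enum zero_call_used_regs
effective_zero_choice (enum zero_call_used_regs flag, const char *attr)
{
  if (attr == NULL)
    return flag;
  const struct zero_choice *c = lookup_zero_choice (attr);
  assert (c != NULL);  /* The patch uses gcc_assert (0) here.  */
  return c ? c->kind : zero_call_used_regs_unset;
}

/* As in the early return of gen_call_used_regs_seq, this relies on
   unset and skip being the two lowest enumerators.  */
static bool
wants_zeroing (enum zero_call_used_regs kind)
{
  return kind > zero_call_used_regs_skip;
}
```

With -fzero-call-used-regs=all-gpr and zero_call_used_regs("skip") on the function, as in zero-scratch-regs-6.c, effective_zero_choice returns skip and wants_zeroing is false. Only the eight spellings in the option name are accepted, so lookup_zero_choice ("used-arg-gpr") returns NULL.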
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index c049932..aa04a3c 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -550,7 +550,7 @@ Objective-C and Objective-C++ Dialects}.
>> -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
>> -funsafe-math-optimizations  -funswitch-loops @gol
>> -fipa-ra  -fvariable-expansion-in-unroller  -fvect-cost-model  -fvpt @gol
>> --fweb  -fwhole-program  -fwpa  -fuse-linker-plugin @gol
>> +-fweb  -fwhole-program  -fwpa  -fuse-linker-plugin -fzero-call-used-regs @gol
>> --param @var{name}=@var{value}
>> -O  -O0  -O1  -O2  -O3  -Os  -Ofast  -Og}
>> 
>> @@ -12550,6 +12550,29 @@ int foo (void)
>> 
>> Not all targets support this option.
>> 
>> +@item -fzero-call-used-regs=@var{choice}
>> +@opindex fzero-call-used-regs
>> +Zero call-used registers at function return to increase program
>> +security by either mitigating Return-Oriented Programming (ROP) or
>> +preventing information leakage through registers.
>> +
>> +@samp{skip}, which is the default, doesn't zero call-used registers.
>> +
>> +@samp{used-gpr-arg} zeros used call-used general purpose registers that
>> +pass parameters. @samp{used-arg} zeros used call-used registers that
>> +pass parameters. @samp{all-arg} zeros all call-used registers that pass
>> +parameters.  These 3 choices are used for ROP mitigation.
>> +
>> +@samp{used-gpr} zeros call-used general purpose registers
>> +which are used in the function.  @samp{all-gpr} zeros all
>> +call-used general purpose registers.  @samp{used} zeros call-used
>> +registers which are used in the function.  @samp{all} zeros all
>> +call-used registers.  These 4 choices are used for preventing
>> +information leakage through registers.
>> +
>> +You can control this behavior for a specific function by using the function
>> +attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
>> +
>> @item --param @var{name}=@var{value}
>> @opindex param
>> In some places, GCC uses various constants to control the amount of
>> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> index 97437e8..7ecff05 100644
>> --- a/gcc/doc/tm.texi
>> +++ b/gcc/doc/tm.texi
>> @@ -12053,6 +12053,16 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP
>> is needed.
>> @end deftypefn
>> 
>> +@deftypefn {Target Hook} HARD_REG_SET TARGET_ZERO_CALL_USED_REGS (HARD_REG_SET @var{need_zeroed_hardregs})
>> +This target hook emits instructions to zero the registers specified
>> +by @var{need_zeroed_hardregs} at function return, and returns the set
>> +of hard registers that the hook actually zeroed.
>> +Define this hook if the target has more efficient instructions to
>> +zero call-used registers, or if the target only tries to zero a subset
>> +of @var{need_zeroed_hardregs}.
>> +If the hook is not defined, @code{default_zero_call_used_reg} will be used.
>> +@end deftypefn
>> +
>> @deftypefn {Target Hook} bool TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS (void)
>> When optimization is disabled, this hook indicates whether or not
>> arguments should be allocated to stack slots.  Normally, GCC allocates
>> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
>> index 412e22c..a67dbea 100644
>> --- a/gcc/doc/tm.texi.in
>> +++ b/gcc/doc/tm.texi.in
>> @@ -8111,6 +8111,8 @@ and the associated definitions of those functions.
>> 
>> @hook TARGET_GET_DRAP_RTX
>> 
>> +@hook TARGET_ZERO_CALL_USED_REGS
>> +
>> @hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
>> 
>> @hook TARGET_CONST_ANCHOR
>> diff --git a/gcc/emit-rtl.h b/gcc/emit-rtl.h
>> index 92ad0dd6..2dbeace0 100644
>> --- a/gcc/emit-rtl.h
>> +++ b/gcc/emit-rtl.h
>> @@ -310,6 +310,9 @@ struct GTY(()) rtl_data {
>>     sets them.  */
>>  HARD_REG_SET asm_clobbers;
>> 
>> +  /* All hard registers that are zeroed at the return of the routine.  */
>> +  HARD_REG_SET zeroed_reg_set;
>> +
>>  /* The highest address seen during shorten_branches.  */
>>  int max_insn_address;
>> };
>> diff --git a/gcc/function.c b/gcc/function.c
>> index c612959..c8181bd 100644
>> --- a/gcc/function.c
>> +++ b/gcc/function.c
>> @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
>> #include "emit-rtl.h"
>> #include "recog.h"
>> #include "rtl-error.h"
>> +#include "hard-reg-set.h"
>> #include "alias.h"
>> #include "fold-const.h"
>> #include "stor-layout.h"
>> @@ -5815,6 +5816,182 @@ make_prologue_seq (void)
>>  return seq;
>> }
>> 
>> +/* Check whether the hard register REGNO is live before the return insn RET.  */
>> +static bool
>> +is_live_reg_at_return (unsigned int regno, rtx_insn * ret)
>> +{
>> +  basic_block bb = BLOCK_FOR_INSN (ret);
>> +  auto_bitmap live_out;
>> +  bitmap_copy (live_out, df_get_live_out (bb));
>> +  df_simulate_one_insn_backwards (bb, ret, live_out);
>> +
>> +  if (REGNO_REG_SET_P (live_out, regno))
>> +    return true;
>> +
>> +  return false;
>> +}
>> +
>> +/* Emit a sequence of insns to zero the call-used-registers before RET.  */
>> +
>> +static void
>> +gen_call_used_regs_seq (rtx_insn *ret)
>> +{
>> +  bool gpr_only = true;
>> +  bool used_only = true;
>> +  bool arg_only = true;
>> +  enum zero_call_used_regs zero_regs_type = zero_call_used_regs_unset;
>> +  enum zero_call_used_regs attr_zero_regs_type
>> +     = zero_call_used_regs_unset;
>> +  tree attr_zero_regs
>> + = lookup_attribute ("zero_call_used_regs",
>> +     DECL_ATTRIBUTES (cfun->decl));
>> +
>> +  /* Get the type of zero_call_used_regs from function attribute.  */
>> +  if (attr_zero_regs)
>> +    {
>> +      /* The TREE_VALUE of an attribute is a TREE_LIST whose TREE_VALUE
>> +  is the attribute argument's value.  */
>> +      attr_zero_regs = TREE_VALUE (attr_zero_regs);
>> +      gcc_assert (TREE_CODE (attr_zero_regs) == TREE_LIST);
>> +      attr_zero_regs = TREE_VALUE (attr_zero_regs);
>> +      gcc_assert (TREE_CODE (attr_zero_regs) == STRING_CST);
>> +
>> +      if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "skip") == 0)
>> + attr_zero_regs_type = zero_call_used_regs_skip;
>> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used-gpr-arg")
>> + == 0)
>> + attr_zero_regs_type = zero_call_used_regs_used_gpr_arg;
>> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used-arg") == 0)
>> + attr_zero_regs_type = zero_call_used_regs_used_arg;
>> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "all-arg") == 0)
>> + attr_zero_regs_type = zero_call_used_regs_all_arg;
>> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used-gpr") == 0)
>> + attr_zero_regs_type = zero_call_used_regs_used_gpr;
>> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "all-gpr") == 0)
>> + attr_zero_regs_type = zero_call_used_regs_all_gpr;
>> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used") == 0)
>> + attr_zero_regs_type = zero_call_used_regs_used;
>> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "all") == 0)
>> + attr_zero_regs_type = zero_call_used_regs_all;
>> +      else
>> + gcc_assert (0);
>> +    }
>> +
>> +  if (flag_zero_call_used_regs)
>> +    if (!attr_zero_regs)
>> +      zero_regs_type = flag_zero_call_used_regs;
>> +    else
>> +      zero_regs_type = attr_zero_regs_type;
>> +  else
>> +    zero_regs_type = attr_zero_regs_type;
>> +
>> +  /* No need to zero call-used-regs when no user request is present.  */
>> +  if (zero_regs_type <= zero_call_used_regs_skip)
>> +    return;
>> +
>> +  /* No need to zero call-used-regs in main ().  */
>> +  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
>> +    return;
>> +
>> +  /* No need to zero call-used-regs if __builtin_eh_return is called
>> +     since it isn't a normal function return.  */
>> +  if (crtl->calls_eh_return)
>> +    return;
>> +
>> +  /* If gpr_only is true, only zero call-used-registers that are
>> +     general-purpose registers; if used_only is true, only zero
>> +     call-used-registers that are used in the current function.  */
>> +
>> +  switch (zero_regs_type)
>> +    {
>> +      case zero_call_used_regs_used_arg:
>> + gpr_only = false;
>> + break;
>> +      case zero_call_used_regs_all_arg:
>> + gpr_only = false;
>> + used_only = false;
>> + break;
>> +      case zero_call_used_regs_used_gpr:
>> + arg_only = false;
>> + break;
>> +      case zero_call_used_regs_all_gpr:
>> + used_only = false;
>> + arg_only = false;
>> + break;
>> +      case zero_call_used_regs_used:
>> + gpr_only = false;
>> + arg_only = false;
>> + break;
>> +      case zero_call_used_regs_all:
>> + gpr_only = false;
>> + used_only = false;
>> + arg_only = false;
>> + break;
>> +      default:
>> + break;
>> +    }
>> +
>> +  /* For each hard register, zero it only if:
>> +     1. it is a call-used register;
>> + and 2. it is not a fixed register;
>> + and 3. it is not live at the return of the routine;
>> + and 4. it is a general register if gpr_only is true;
>> + and 5. it is used in the routine if used_only is true;
>> + and 6. it is a register that passes parameters if arg_only is true.
>> +   */
>> +
>> +  HARD_REG_SET need_zeroed_hardregs;
>> +  CLEAR_HARD_REG_SET (need_zeroed_hardregs);
>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +    {
>> +      if (!this_target_hard_regs->x_call_used_regs[regno])
>> + continue;
>> +      if (fixed_regs[regno])
>> + continue;
>> +      if (is_live_reg_at_return (regno, ret))
>> + continue;
>> +      if (gpr_only
>> +   && !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], regno))
>> + continue;
>> +      if (used_only && !df_regs_ever_live_p (regno))
>> + continue;
>> +      if (arg_only && !FUNCTION_ARG_REGNO_P (regno))
>> + continue;
>> +
>> +      /* Now this is a register that we might want to zero.  */
>> +      SET_HARD_REG_BIT (need_zeroed_hardregs, regno);
>> +    }
>> +
>> +  if (hard_reg_set_empty_p (need_zeroed_hardregs))
>> +    return;
>> +
>> +  /* Now we get a hard register set that need to be zeroed, pass it to
>> +     target to generate zeroing sequence.  */
>> +  HARD_REG_SET zeroed_hardregs;
>> +  start_sequence ();
>> +  zeroed_hardregs = targetm.calls.zero_call_used_regs (need_zeroed_hardregs);
>> +  rtx_insn *seq = get_insns ();
>> +  end_sequence ();
>> +  if (seq)
>> +    {
>> +      /* emit the memory blockage and register clobber asm volatile before
>> +  the whole sequence.  */
>> +      start_sequence ();
>> +      expand_asm_reg_clobber_mem_blockage (zeroed_hardregs);
>> +      rtx_insn *seq_barrier = get_insns ();
>> +      end_sequence ();
>> +
>> +      emit_insn_before (seq_barrier, ret);
>> +      emit_insn_before (seq, ret);
>> +
>> +      /* update the data flow information.  */
>> +      crtl->zeroed_reg_set |= zeroed_hardregs;
>> +      df_set_bb_dirty (EXIT_BLOCK_PTR_FOR_FN (cfun));
>> +    }
>> +  return;
>> +}
>> +
>> +
>> /* Return a sequence to be used as the epilogue for the current function,
>>   or NULL.  */
>> 
>> @@ -6486,7 +6663,75 @@ make_pass_thread_prologue_and_epilogue (gcc::context *ctxt)
>> {
>>  return new pass_thread_prologue_and_epilogue (ctxt);
>> }
>> -
>> 
>> +
>> +static unsigned int
>> +rest_of_zero_call_used_regs (void)
>> +{
>> +  basic_block bb;
>> +  rtx_insn *insn;
>> +
>> +  /* This pass needs data flow information.  */
>> +  df_analyze ();
>> +
>> +  /* Search all the "return"s in the routine, and insert instruction sequence to
>> +     zero the call used registers.  */
>> +  FOR_EACH_BB_REVERSE_FN (bb, cfun)
>> +    if (bb == EXIT_BLOCK_PTR_FOR_FN (cfun)
>> + || (single_succ_p (bb)
>> +     && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun)))
>> +      FOR_BB_INSNS_REVERSE (bb, insn)
>> + if (JUMP_P (insn) && ANY_RETURN_P (JUMP_LABEL (insn)))
>> +   {
>> +     /* Now we can insert the instruction sequence to zero the call used
>> +        registers before this insn.  */
>> +     gen_call_used_regs_seq (insn);
>> +     break;
>> +   }
>> +
>> +  return 0;
>> +}
>> +
>> +namespace {
>> +
>> +const pass_data pass_data_zero_call_used_regs =
>> +{
>> +  RTL_PASS, /* type */
>> +  "zero_call_used_regs", /* name */
>> +  OPTGROUP_NONE, /* optinfo_flags */
>> +  TV_NONE, /* tv_id */
>> +  0, /* properties_required */
>> +  0, /* properties_provided */
>> +  0, /* properties_destroyed */
>> +  0, /* todo_flags_start */
>> +  0, /* todo_flags_finish */
>> +};
>> +
>> +class pass_zero_call_used_regs: public rtl_opt_pass
>> +{
>> +public:
>> +  pass_zero_call_used_regs (gcc::context *ctxt)
>> +    : rtl_opt_pass (pass_data_zero_call_used_regs, ctxt)
>> +  {}
>> +
>> +  /* opt_pass methods: */
>> +  virtual bool gate (function *)
>> +    {
>> +      return flag_zero_call_used_regs > zero_call_used_regs_unset;
>> +    }
>> +  virtual unsigned int execute (function *)
>> +    {
>> +      return rest_of_zero_call_used_regs ();
>> +    }
>> +
>> +}; // class pass_zero_call_used_regs
>> +
>> +} // anon namespace
>> +
>> +rtl_opt_pass *
>> +make_pass_zero_call_used_regs (gcc::context *ctxt)
>> +{
>> +  return new pass_zero_call_used_regs (ctxt);
>> +}
>> 
>> /* If CONSTRAINT is a matching constraint, then return its number.
>>   Otherwise, return -1.  */
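The six checks in gen_call_used_regs_seq can be traced by hand against the new tests. The model below replaces the GCC queries (call_used_regs, fixed_regs, GENERAL_REGS, FUNCTION_ARG_REGNO_P, df_regs_ever_live_p and is_live_reg_at_return) with a hypothetical table of per-register facts. The table assumes int foo (int x) { return x; } compiled on x86-64 to "movl %edi, %eax; ret", lists the registers in GCC's i386 hard-register order, and simplifies the argument-register test to the six SysV integer argument registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-register facts standing in for the GCC queries.  */
struct reg_info
{
  const char *name;
  bool call_used;       /* call_used_regs[regno]  */
  bool fixed;           /* fixed_regs[regno]  */
  bool general;         /* member of GENERAL_REGS  */
  bool arg;             /* FUNCTION_ARG_REGNO_P (simplified)  */
  bool ever_live;       /* df_regs_ever_live_p  */
  bool live_at_return;  /* is_live_reg_at_return  */
};

/* Assumed facts for "int foo (int x) { return x; }" on x86-64.  */
static const struct reg_info foo_regs[] = {
  /* name    used   fixed  gen   arg    ever   live_at_ret */
  { "eax",  true,  false, true, false, true,  true  },
  { "edx",  true,  false, true, true,  false, false },
  { "ecx",  true,  false, true, true,  false, false },
  { "ebx",  false, false, true, false, false, false },
  { "esi",  true,  false, true, true,  false, false },
  { "edi",  true,  false, true, true,  true,  false },
  { "ebp",  false, false, true, false, false, false },
  { "esp",  false, true,  true, false, true,  true  },
  { "r8d",  true,  false, true, true,  false, false },
  { "r9d",  true,  false, true, true,  false, false },
  { "r10d", true,  false, true, false, false, false },
  { "r11d", true,  false, true, false, false, false },
};

/* Mirror of the selection loop: bit I of the result is set when
   REGS[I] passes all six checks.  */
static uint32_t
model_need_zeroed (const struct reg_info *regs, int n,
                   bool gpr_only, bool used_only, bool arg_only)
{
  uint32_t need = 0;
  assert (n <= 32);
  for (int i = 0; i < n; i++)
    {
      const struct reg_info *r = &regs[i];
      if (!r->call_used || r->fixed || r->live_at_return)
        continue;
      if (gpr_only && !r->general)
        continue;
      if (used_only && !r->ever_live)
        continue;
      if (arg_only && !r->arg)
        continue;
      need |= UINT32_C (1) << i;
    }
  return need;
}
```

With the all-gpr flags (gpr_only set, the other two clear) the selected set is edx, ecx, esi, edi and r8d-r11d, matching zero-scratch-regs-8.c; with used-gpr only edi remains, matching zero-scratch-regs-7.c. eax is always skipped because it carries the return value.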
>> diff --git a/gcc/optabs.c b/gcc/optabs.c
>> index 8ad7f4b..57e5c5d 100644
>> --- a/gcc/optabs.c
>> +++ b/gcc/optabs.c
>> @@ -6484,6 +6484,49 @@ expand_memory_blockage (void)
>>    expand_asm_memory_blockage ();
>> }
>> 
>> +/* Generate asm volatile("" : : : "memory") as a memory blockage, at the
>> +   same time clobbering the register set specified by ZEROED_REGS.  */
>> +
>> +void
>> +expand_asm_reg_clobber_mem_blockage (HARD_REG_SET zeroed_regs)
>> +{
>> +  rtx asm_op, clob_mem, clob_reg;
>> +
>> +  unsigned int num_of_regs = 0;
>> +  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>> +    if (TEST_HARD_REG_BIT (zeroed_regs, i))
>> +      num_of_regs++;
>> +
>> +  if (num_of_regs == 0)
>> +    return;
>> +
>> +  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
>> +  rtvec_alloc (0), rtvec_alloc (0),
>> +  rtvec_alloc (0), UNKNOWN_LOCATION);
>> +  MEM_VOLATILE_P (asm_op) = 1;
>> +
>> +  rtvec v = rtvec_alloc (num_of_regs + 2);
>> +
>> +  clob_mem = gen_rtx_SCRATCH (VOIDmode);
>> +  clob_mem = gen_rtx_MEM (BLKmode, clob_mem);
>> +  clob_mem = gen_rtx_CLOBBER (VOIDmode, clob_mem);
>> +
>> +  RTVEC_ELT (v,0) = asm_op;
>> +  RTVEC_ELT (v,1) = clob_mem;
>> +
>> +  unsigned int j = 2;
>> +  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>> +    if (TEST_HARD_REG_BIT (zeroed_regs, i))
>> +      {
>> + clob_reg  = gen_rtx_CLOBBER (VOIDmode, regno_reg_rtx[i]);
>> + RTVEC_ELT (v,j) = clob_reg;
>> +  j++;
>> +      }
>> +  gcc_assert (j == (num_of_regs + 2));
>> +
>> +  emit_insn (gen_rtx_PARALLEL (VOIDmode, v));
>> +}
>> +
>> /* This routine will either emit the mem_thread_fence pattern or issue a
>>   sync_synchronize to generate a fence for memory model MEMMODEL.  */
>> 
>> diff --git a/gcc/optabs.h b/gcc/optabs.h
>> index 0b14700..bfa10c8 100644
>> --- a/gcc/optabs.h
>> +++ b/gcc/optabs.h
>> @@ -345,6 +345,8 @@ rtx expand_atomic_store (rtx, rtx, enum memmodel, bool);
>> rtx expand_atomic_fetch_op (rtx, rtx, rtx, enum rtx_code, enum memmodel,
>>      bool);
>> 
>> +extern void expand_asm_reg_clobber_mem_blockage (HARD_REG_SET);
>> +
>> extern bool insn_operand_matches (enum insn_code icode, unsigned int opno,
>>  rtx operand);
>> extern bool valid_multiword_target_p (rtx);
>> diff --git a/gcc/passes.def b/gcc/passes.def
>> index f865bdc..77d4676 100644
>> --- a/gcc/passes.def
>> +++ b/gcc/passes.def
>> @@ -492,6 +492,7 @@ along with GCC; see the file COPYING3.  If not see
>>      POP_INSERT_PASSES ()
>>      NEXT_PASS (pass_late_compilation);
>>      PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
>> +   NEXT_PASS (pass_zero_call_used_regs);
>>  NEXT_PASS (pass_compute_alignments);
>>  NEXT_PASS (pass_variable_tracking);
>>  NEXT_PASS (pass_free_cfg);
>> diff --git a/gcc/recog.c b/gcc/recog.c
>> index ce83b7f..472c2dc 100644
>> --- a/gcc/recog.c
>> +++ b/gcc/recog.c
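>> @@ -923,6 +923,23 @@ validate_simplify_insn (rtx_insn *insn)
>>  return ((num_changes_pending () > 0) && (apply_change_group () > 0));
>> }
>> 
>> 
>> +
>> +/* Return true if INSN is recognized and its operands satisfy the
>> +   constraints of at least one enabled alternative.  */
>> +bool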
>> +valid_insn_p (rtx_insn *insn)
>> +{
>> +  recog_memoized (insn);
>> +  if (INSN_CODE (insn) < 0)
>> +    return false;
>> +  extract_insn (insn);
>> +  /* We don't know whether the insn will be in code that is optimized
>> +     for size or speed, so consider all enabled alternatives.  */
>> +  if (!constrain_operands (1, get_enabled_alternatives (insn)))
>> +    return false;
>> +  return true;
>> +}
>> +
>> /* Return 1 if OP is a valid general operand for machine mode MODE.
>>   This is either a register reference, a memory reference,
>>   or a constant.  In the case of a memory reference, the address
>> diff --git a/gcc/recog.h b/gcc/recog.h
>> index ae3675f..d87456c 100644
>> --- a/gcc/recog.h
>> +++ b/gcc/recog.h
>> @@ -113,6 +113,7 @@ extern void validate_replace_src_group (rtx, rtx, rtx_insn *);
>> extern bool validate_simplify_insn (rtx_insn *insn);
>> extern int num_changes_pending (void);
>> extern bool reg_fits_class_p (const_rtx, reg_class_t, int, machine_mode);
>> +extern bool valid_insn_p (rtx_insn *);
>> 
>> extern int offsettable_memref_p (rtx);
>> extern int offsettable_nonstrict_memref_p (rtx);
>> diff --git a/gcc/resource.c b/gcc/resource.c
>> index 0a9d594..90cf091 100644
>> --- a/gcc/resource.c
>> +++ b/gcc/resource.c
>> @@ -1186,7 +1186,7 @@ init_resource_info (rtx_insn *epilogue_insn)
>>       &end_of_function_needs, true);
>> 
>>  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>> -    if (global_regs[i] || EPILOGUE_USES (i))
>> +    if (global_regs[i] || df_epilogue_uses_p (i))
>>      SET_HARD_REG_BIT (end_of_function_needs.regs, i);
>> 
>>  /* The registers required to be live at the end of the function are
>> diff --git a/gcc/target.def b/gcc/target.def
>> index ed2da15..7d6807d 100644
>> --- a/gcc/target.def
>> +++ b/gcc/target.def
>> @@ -5080,6 +5080,19 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
>> is needed.",
>> rtx, (void), NULL)
>> 
>> +/* Generate an instruction sequence to zero call-used registers.  */
>> +DEFHOOK
>> +(zero_call_used_regs,
>> + "This target hook emits instructions to zero the registers specified\n\
>> +by @var{need_zeroed_hardregs} at function return, and returns the set\n\
>> +of hard registers that the hook actually zeroed.\n\
>> +Define this hook if the target has more efficient instructions to\n\
>> +zero call-used registers, or if the target only tries to zero a subset\n\
>> +of @var{need_zeroed_hardregs}.\n\
>> +If the hook is not defined, @code{default_zero_call_used_regs} is used.",
>> + HARD_REG_SET, (HARD_REG_SET need_zeroed_hardregs),
>> +default_zero_call_used_regs)
>> +
>> /* Return true if all function parameters should be spilled to the
>>   stack.  */
>> DEFHOOK
>> diff --git a/gcc/targhooks.c b/gcc/targhooks.c
>> index 5d94fce..2318c324 100644
>> --- a/gcc/targhooks.c
>> +++ b/gcc/targhooks.c
>> @@ -56,6 +56,9 @@ along with GCC; see the file COPYING3.  If not see
>> #include "tree-ssa-alias.h"
>> #include "gimple-expr.h"
>> #include "memmodel.h"
>> +#include "backend.h"
>> +#include "emit-rtl.h"
>> +#include "df.h"
>> #include "tm_p.h"
>> #include "stringpool.h"
>> #include "tree-vrp.h"
>> @@ -987,6 +990,38 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
>> #endif
>> }
>> 
>> +/* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
>> +
>> +HARD_REG_SET
>> +default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>> +{
>> +  HARD_REG_SET zeroed_hardregs;
>> +  gcc_assert (!hard_reg_set_empty_p (need_zeroed_hardregs));
>> +
>> +  CLEAR_HARD_REG_SET (zeroed_hardregs);
>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +    if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>> +      {
>> +        rtx_insn *last_insn = get_last_insn ();
>> +        machine_mode mode = GET_MODE (regno_reg_rtx[regno]);
>> +        rtx zero = CONST0_RTX (mode);
>> +        rtx_insn *insn = emit_move_insn (regno_reg_rtx[regno], zero);
>> +        if (!valid_insn_p (insn))
>> +          {
>> +            static bool issued_error;
>> +            if (!issued_error)
>> +              {
>> +                issued_error = true;
>> +                sorry ("-fzero-call-used-regs not supported on this target");
>> +              }
>> +            delete_insns_since (last_insn);
>> +          }
>> +        else
>> +          SET_HARD_REG_BIT (zeroed_hardregs, regno);
>> +      }
>> +  return zeroed_hardregs;
>> +}
>> +
>> rtx
>> default_internal_arg_pointer (void)
>> {
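The default hook's strategy can be modelled outside GCC. For each requested register, it emits a move of zero and keeps the move only if valid_insn_p accepts it. Otherwise it deletes the move and reports the problem once via sorry. The code below is a hedged sketch rather than the hook itself: the register set is a 64-bit mask, and movable_zero_regs, zero_move_is_valid and model_default_zero_call_used_regs are hypothetical stand-ins for the target description, valid_insn_p and the hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_HARD_REGS 16

/* Hypothetical target description: bit N set means a move of zero into
   hard register N is accepted (what valid_insn_p would report).  */
static const uint64_t movable_zero_regs = 0x00ff;

static bool
zero_move_is_valid (unsigned regno)
{
  return (movable_zero_regs >> regno) & 1;
}

/* Model of default_zero_call_used_regs: try every requested register,
   keep the ones the target accepts, report unsupported registers only
   once, and return the set that was actually zeroed.  */
static uint64_t
model_default_zero_call_used_regs (uint64_t need_zeroed_hardregs)
{
  static bool issued_error;
  uint64_t zeroed_hardregs = 0;

  assert (need_zeroed_hardregs != 0);

  for (unsigned regno = 0; regno < NUM_HARD_REGS; regno++)
    {
      if (!((need_zeroed_hardregs >> regno) & 1))
        continue;
      if (zero_move_is_valid (regno))
        zeroed_hardregs |= (uint64_t) 1 << regno;
      else
        {
          /* The real hook deletes the rejected move at this point;
             the diagnostic itself is issued only once.  */
          if (!issued_error)
            {
              issued_error = true;
              fprintf (stderr, "-fzero-call-used-regs not supported on this target\n");
            }
        }
    }
  return zeroed_hardregs;
}
```

With the toy mask above, requesting 0x0f0f returns 0x000f. Registers 8 to 11 are dropped from the returned set, which matches the hook documentation's statement that it returns only the registers it actually zeroed.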
>> diff --git a/gcc/targhooks.h b/gcc/targhooks.h
>> index 44ab926..e0a925f 100644
>> --- a/gcc/targhooks.h
>> +++ b/gcc/targhooks.h
>> @@ -160,6 +160,7 @@ extern unsigned int default_function_arg_round_boundary (machine_mode,
>> const_tree);
>> extern bool hook_bool_const_rtx_commutative_p (const_rtx, int);
>> extern rtx default_function_value (const_tree, const_tree, bool);
>> +extern HARD_REG_SET default_zero_call_used_regs (HARD_REG_SET);
>> extern rtx default_libcall_value (machine_mode, const_rtx);
>> extern bool default_function_value_regno_p (const unsigned int);
>> extern rtx default_internal_arg_pointer (void);
>> diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
>> new file mode 100644
>> index 0000000..f44add9
>> --- /dev/null
>> +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-1.c
>> @@ -0,0 +1,15 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
>> +
>> +volatile int result = 0;
>> +int
>> +__attribute__((noinline))
>> +foo (int x)
>> +{
>> +  return x;
>> +}
>> +int main()
>> +{
>> +  result = foo (2);
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
>> new file mode 100644
>> index 0000000..7c8350b
>> --- /dev/null
>> +++ b/gcc/testsuite/c-c++-common/zero-scratch-regs-2.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-do run } */
>> +/* { dg-options "-O2" } */
>> +
>> +volatile int result = 0;
>> +int
>> +__attribute__((noinline))
>> +__attribute__ ((zero_call_used_regs("all")))
>> +foo (int x)
>> +{
>> +  return x;
>> +}
>> +int main()
>> +{
>> +  result = foo (2);
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
>> new file mode 100644
>> index 0000000..9f61dc4
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-1.c
>> @@ -0,0 +1,12 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
>> +
>> +void
>> +foo (void)
>> +{
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "vzeroall" } } */
>> +/* { dg-final { scan-assembler-not "%xmm" } } */
>> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
>> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
>> new file mode 100644
>> index 0000000..09048e5
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-10.c
>> @@ -0,0 +1,21 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
>> +
>> +extern int foo (int) __attribute__ ((zero_call_used_regs("all-gpr")));
>> +
>> +int
>> +foo (int x)
>> +{
>> +  return x;
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "vzeroall" } } */
>> +/* { dg-final { scan-assembler-not "%xmm" } } */
>> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
>> new file mode 100644
>> index 0000000..4862688
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-11.c
>> @@ -0,0 +1,39 @@
>> +/* { dg-do run { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
>> +
>> +struct S { int i; };
>> +__attribute__((const, noinline, noclone))
>> +struct S foo (int x)
>> +{
>> +  struct S s;
>> +  s.i = x;
>> +  return s;
>> +}
>> +
>> +int a[2048], b[2048], c[2048], d[2048];
>> +struct S e[2048];
>> +
>> +__attribute__((noinline, noclone)) void
>> +bar (void)
>> +{
>> +  int i;
>> +  for (i = 0; i < 1024; i++)
>> +    {
>> +      e[i] = foo (i);
>> +      a[i+2] = a[i] + a[i+1];
>> +      b[10] = b[10] + i;
>> +      c[i] = c[2047 - i];
>> +      d[i] = d[i + 1];
>> +    }
>> +}
>> +
>> +int
>> +main ()
>> +{
>> +  int i;
>> +  bar ();
>> +  for (i = 0; i < 1024; i++)
>> +    if (e[i].i != i)
>> +      __builtin_abort ();
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
>> new file mode 100644
>> index 0000000..500251b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-12.c
>> @@ -0,0 +1,39 @@
>> +/* { dg-do run { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
>> +
>> +struct S { int i; };
>> +__attribute__((const, noinline, noclone))
>> +struct S foo (int x)
>> +{
>> +  struct S s;
>> +  s.i = x;
>> +  return s;
>> +}
>> +
>> +int a[2048], b[2048], c[2048], d[2048];
>> +struct S e[2048];
>> +
>> +__attribute__((noinline, noclone)) void
>> +bar (void)
>> +{
>> +  int i;
>> +  for (i = 0; i < 1024; i++)
>> +    {
>> +      e[i] = foo (i);
>> +      a[i+2] = a[i] + a[i+1];
>> +      b[10] = b[10] + i;
>> +      c[i] = c[2047 - i];
>> +      d[i] = d[i + 1];
>> +    }
>> +}
>> +
>> +int
>> +main ()
>> +{
>> +  int i;
>> +  bar ();
>> +  for (i = 0; i < 1024; i++)
>> +    if (e[i].i != i)
>> +      __builtin_abort ();
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
>> new file mode 100644
>> index 0000000..8b058e3
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-13.c
>> @@ -0,0 +1,21 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
>> +
>> +void
>> +foo (void)
>> +{
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "vzeroall" } } */
>> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" } } */
>> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
>> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 15 { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
>> new file mode 100644
>> index 0000000..d4eaaf7
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-14.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
>> +
>> +void
>> +foo (void)
>> +{
>> +}
>> +
>> +/* { dg-final { scan-assembler-times "vzeroall" 1 } } */
>> +/* { dg-final { scan-assembler-not "%xmm" } } */
>> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
>> new file mode 100644
>> index 0000000..dd3bb90
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-15.c
>> @@ -0,0 +1,14 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
>> +
>> +extern void foo (void) __attribute__ ((zero_call_used_regs("used")));
>> +
>> +void
>> +foo (void)
>> +{
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "vzeroall" } } */
>> +/* { dg-final { scan-assembler-not "%xmm" } } */
>> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
>> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
>> new file mode 100644
>> index 0000000..e2274f6
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-16.c
>> @@ -0,0 +1,14 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=all" } */
>> +
>> +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
>> +
>> +void
>> +foo (void)
>> +{
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "vzeroall" } } */
>> +/* { dg-final { scan-assembler-not "%xmm" } } */
>> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
>> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
>> new file mode 100644
>> index 0000000..7f5d153
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-17.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=used" } */
>> +
>> +int
>> +foo (int x)
>> +{
>> +  return x;
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "vzeroall" } } */
>> +/* { dg-final { scan-assembler-not "%xmm" } } */
>> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
>> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
>> new file mode 100644
>> index 0000000..fe13d2b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-18.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
>> +
>> +float
>> +foo (float z, float y, float x)
>> +{
>> +  return x + y;
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "vzeroall" } } */
>> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
>> new file mode 100644
>> index 0000000..205a532
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-19.c
>> @@ -0,0 +1,12 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=used -march=corei7" } */
>> +
>> +float
>> +foo (float z, float y, float x)
>> +{
>> +  return x;
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "vzeroall" } } */
>> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm2, %xmm2" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
>> new file mode 100644
>> index 0000000..e046684
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-2.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
>> +
>> +void
>> +foo (void)
>> +{
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "vzeroall" } } */
>> +/* { dg-final { scan-assembler-not "%xmm" } } */
>> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
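The all-gpr and all expectations in these tests (for example zero-scratch-regs-2.c) look for one xorl that clears a register, followed by movl copies from that register into the remaining ones. The helper below only generates that expected text for a given register list, so the scan patterns are easier to read. It illustrates what the tests match and is not the i386 backend code; zeroing_sequence is a hypothetical name. The register cleared first apparently depends on the function: %eax here, %edx in tests where %eax holds the int return value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write into BUF the AT&T-syntax text the scan patterns look for: clear
   REGS[0] with xorl, then copy it into each remaining register with movl.
   Output is truncated (but NUL-terminated) if SIZE is too small.  */
static void
zeroing_sequence (const char *const regs[], size_t n, char *buf, size_t size)
{
  size_t used = 0;

  assert (size > 0);
  buf[0] = '\0';
  for (size_t i = 0; i < n && used < size; i++)
    {
      int len = (i == 0
                 ? snprintf (buf + used, size - used, "xorl\t%%%s, %%%s\n",
                             regs[0], regs[0])
                 : snprintf (buf + used, size - used, "movl\t%%%s, %%%s\n",
                             regs[0], regs[i]));
      if (len < 0)
        return;
      used += (size_t) len;
    }
}
```

For { "eax", "edx", "ecx" } this produces the three lines that the `xorl %eax, %eax`, `movl %eax, %edx` and `movl %eax, %ecx` patterns above match.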
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
>> new file mode 100644
>> index 0000000..4be8ff6
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-20.c
>> @@ -0,0 +1,23 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7" } */
>> +
>> +float
>> +foo (float z, float y, float x)
>> +{
>> +  return x + y;
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "vzeroall" } } */
>> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ia32 } } } } */
>> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm0, %xmm\[0-9\]+" 7 { target { ia32 } } } } */
>> +/* { dg-final { scan-assembler-times "movaps\[ \t\]*%xmm1, %xmm\[0-9\]+" 14 { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
>> new file mode 100644
>> index 0000000..0eb34e0
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-21.c
>> @@ -0,0 +1,14 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=skip -march=corei7" } */
>> +
>> +__attribute__ ((zero_call_used_regs("used")))
>> +float
>> +foo (float z, float y, float x)
>> +{
>> +  return x + y;
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "vzeroall" } } */
>> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm1, %xmm1" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm1, %xmm2" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
>> new file mode 100644
>> index 0000000..76742bb
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-22.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx" } */
>> +
>> +void
>> +foo (void)
>> +{
>> +}
>> +
>> +/* { dg-final { scan-assembler "vzeroall" } } */
>> +/* { dg-final { scan-assembler "emms" } } */
>> +/* { dg-final { scan-assembler-not "%xmm" } } */
>> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
>> new file mode 100644
>> index 0000000..18a5ffb
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-23.c
>> @@ -0,0 +1,28 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=all -march=corei7 -mavx512f" } */
>> +
>> +void
>> +foo (void)
>> +{
>> +}
>> +
>> +/* { dg-final { scan-assembler "vzeroall" } } */
>> +/* { dg-final { scan-assembler "emms" } } */
>> +/* { dg-final { scan-assembler-not "%xmm" } } */
>> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "kxorw\[ \t\]*%k0, %k0, %k0" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k1" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k2" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k3" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k4" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k5" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k6" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "kmovw\[ \t\]*%k0, %k7" { target { ! ia32 } } } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c
>> new file mode 100644
>> index 0000000..208633e
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-24.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr-arg" } */
>> +
>> +int
>> +foo (int x)
>> +{
>> +  return x;
>> +}
>> +
>> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c
>> new file mode 100644
>> index 0000000..21e82c6
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-25.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=used-arg" } */
>> +
>> +int
>> +foo (int x)
>> +{
>> +  return x;
>> +}
>> +
>> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c
>> new file mode 100644
>> index 0000000..293d2fe
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-26.c
>> @@ -0,0 +1,23 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=all-arg" } */
>> +
>> +int
>> +foo (int x)
>> +{
>> +  return x;
>> +}
>> +
>> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "pxor\[ \t\]*%xmm0, %xmm0" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm1" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm2" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm3" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm4" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm5" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm6" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movaps\[ \t\]*%xmm0, %xmm7" { target { ! ia32 } } } } */
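Read together with the option descriptions at the top of the patch, the eight values seem to act as independent filters on the call-used registers. "used" restricts to registers the function uses, "gpr" to general purpose registers, and "arg" to argument-passing registers. The sketch below encodes that reading with hypothetical bit masks; struct reg_info, modes and regs_to_zero are not GCC names. It also excludes registers live at return. That exclusion is an assumption inferred from the tests: with an int return value, zero-scratch-regs-8.c expects zeroing to start at %edx rather than %eax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-function register facts, as bit masks over hard
   register numbers.  */
struct reg_info
{
  uint64_t call_used;      /* call-clobbered registers of the ABI */
  uint64_t gpr;            /* general purpose registers */
  uint64_t arg;            /* registers that pass parameters */
  uint64_t used;           /* registers the function uses */
  uint64_t live_at_return; /* e.g. the return-value register */
};

/* Each option value expressed as three independent filters.  */
static const struct
{
  const char *name;
  int only_used, only_gpr, only_arg;
} modes[] = {
  { "used-gpr-arg", 1, 1, 1 }, { "used-arg", 1, 0, 1 },
  { "all-arg", 0, 0, 1 },      { "used-gpr", 1, 1, 0 },
  { "all-gpr", 0, 1, 0 },      { "used", 1, 0, 0 },
  { "all", 0, 0, 0 },
};

/* Return the registers MODE asks to zero for a function described by
   RI; "skip" (and any unknown value) selects nothing.  */
static uint64_t
regs_to_zero (const char *mode, const struct reg_info *ri)
{
  assert (mode != NULL && ri != NULL);
  for (size_t i = 0; i < sizeof modes / sizeof modes[0]; i++)
    if (strcmp (mode, modes[i].name) == 0)
      {
        uint64_t set = ri->call_used & ~ri->live_at_return;
        if (modes[i].only_used)
          set &= ri->used;
        if (modes[i].only_gpr)
          set &= ri->gpr;
        if (modes[i].only_arg)
          set &= ri->arg;
        return set;
      }
  return 0;
}
```

Because each qualifier narrows the set independently, "used-gpr-arg" is always a subset of "used-arg", "used-gpr" and "all-arg". This is consistent with the order in which the options are listed.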
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
>> new file mode 100644
>> index 0000000..de71223
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-3.c
>> @@ -0,0 +1,12 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
>> +
>> +void
>> +foo (void)
>> +{
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "vzeroall" } } */
>> +/* { dg-final { scan-assembler-not "%xmm" } } */
>> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
>> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
>> new file mode 100644
>> index 0000000..ccfa441
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-4.c
>> @@ -0,0 +1,14 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
>> +
>> +extern void foo (void) __attribute__ ((zero_call_used_regs("used-gpr")));
>> +
>> +void
>> +foo (void)
>> +{
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "vzeroall" } } */
>> +/* { dg-final { scan-assembler-not "%xmm" } } */
>> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
>> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
>> new file mode 100644
>> index 0000000..6b46ca3
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-5.c
>> @@ -0,0 +1,20 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
>> +
>> +__attribute__ ((zero_call_used_regs("all-gpr")))
>> +void
>> +foo (void)
>> +{
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "vzeroall" } } */
>> +/* { dg-final { scan-assembler-not "%xmm" } } */
>> +/* { dg-final { scan-assembler "xorl\[ \t\]*%eax, %eax" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edx" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %ecx" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %esi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %edi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r8d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r9d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r10d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%eax, %r11d" { target { ! ia32 } } } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
>> new file mode 100644
>> index 0000000..0680f38
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-6.c
>> @@ -0,0 +1,14 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
>> +
>> +extern void foo (void) __attribute__ ((zero_call_used_regs("skip")));
>> +
>> +void
>> +foo (void)
>> +{
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "vzeroall" } } */
>> +/* { dg-final { scan-assembler-not "%xmm" } } */
>> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" } } */
>> +/* { dg-final { scan-assembler-not "movl\[ \t\]*%" } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
>> new file mode 100644
>> index 0000000..534defa
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-7.c
>> @@ -0,0 +1,13 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=used-gpr" } */
>> +
>> +int
>> +foo (int x)
>> +{
>> +  return x;
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "vzeroall" } } */
>> +/* { dg-final { scan-assembler-not "%xmm" } } */
>> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
>> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
>> new file mode 100644
>> index 0000000..477bb19
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-8.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=all-gpr" } */
>> +
>> +int
>> +foo (int x)
>> +{
>> +  return x;
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "vzeroall" } } */
>> +/* { dg-final { scan-assembler-not "%xmm" } } */
>> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edx, %edx" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %ecx" } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %esi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %edi" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r8d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r9d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r10d" { target { ! ia32 } } } } */
>> +/* { dg-final { scan-assembler "movl\[ \t\]*%edx, %r11d" { target { ! ia32 } } } } */
>> diff --git a/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
>> new file mode 100644
>> index 0000000..a305a60
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/i386/zero-scratch-regs-9.c
>> @@ -0,0 +1,15 @@
>> +/* { dg-do compile { target *-*-linux* } } */
>> +/* { dg-options "-O2 -fzero-call-used-regs=skip" } */
>> +
>> +extern int foo (int) __attribute__ ((zero_call_used_regs("used-gpr")));
>> +
>> +int
>> +foo (int x)
>> +{
>> +  return x;
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "vzeroall" } } */
>> +/* { dg-final { scan-assembler-not "%xmm" } } */
>> +/* { dg-final { scan-assembler-not "xorl\[ \t\]*%" { target ia32 } } } */
>> +/* { dg-final { scan-assembler "xorl\[ \t\]*%edi, %edi" { target { ! ia32 } } } } */
>> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
>> index 62e5b69..8afe8ee 100644
>> --- a/gcc/tree-pass.h
>> +++ b/gcc/tree-pass.h
>> @@ -592,6 +592,7 @@ extern rtl_opt_pass *make_pass_gcse2 (gcc::context *ctxt);
>> extern rtl_opt_pass *make_pass_split_after_reload (gcc::context *ctxt);
>> extern rtl_opt_pass *make_pass_thread_prologue_and_epilogue (gcc::context
>>     *ctxt);
>> +extern rtl_opt_pass *make_pass_zero_call_used_regs (gcc::context *ctxt);
>> extern rtl_opt_pass *make_pass_stack_adjustments (gcc::context *ctxt);
>> extern rtl_opt_pass *make_pass_sched_fusion (gcc::context *ctxt);
>> extern rtl_opt_pass *make_pass_peephole2 (gcc::context *ctxt);
>> --
>> 1.8.3.1



* Re: [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
  2020-10-20 14:00   ` Qing Zhao
@ 2020-10-20 15:24     ` Uros Bizjak
  2020-10-20 20:04       ` Qing Zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Uros Bizjak @ 2020-10-20 15:24 UTC
  To: Qing Zhao
  Cc: richard Sandiford, kees Cook, rodriguez Bahena Victor, H.J. Lu,
	segher Boessenkool, gcc-patches Kees Cook via

On Tue, Oct 20, 2020 at 4:01 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
> Hi, Uros,
>
> Thanks a lot for your comments.
>
> On Oct 19, 2020, at 2:30 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>
>
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index f684954..620114f 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -3551,6 +3551,161 @@ ix86_function_value_regno_p (const unsigned int regno)
>  return false;
> }
>
> +/* Check whether the register REGNO should be zeroed on X86.
> +   When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
> +   together, no need to zero it again.
> +   Stack registers (st0-st7) and mm0-mm7 are aliased with each other.
> +   very hard to be zeroed individually, don't zero individual st or
> +   mm registgers at this time.  */
> +
> +static bool
> +zero_call_used_regno_p (const unsigned int regno,
> + bool all_sse_zeroed)
> +{
> +  return GENERAL_REGNO_P (regno)
> +  || (!all_sse_zeroed && SSE_REGNO_P (regno))
> +  || MASK_REGNO_P (regno);
> +}
> +
> +/* Return the machine_mode that is used to zero register REGNO.  */
> +
> +static machine_mode
> +zero_call_used_regno_mode (const unsigned int regno)
> +{
> +  /* NB: We only need to zero the lower 32 bits for integer registers
> +     and the lower 128 bits for vector registers since destination are
> +     zero-extended to the full register width.  */
> +  if (GENERAL_REGNO_P (regno))
> +    return SImode;
> +  else if (SSE_REGNO_P (regno))
> +    return V4SFmode;
> +  else
> +    return HImode;
> +}
> +
> +/* Generate a rtx to zero all vector registers togetehr if possible,
> +   otherwise, return NULL.  */
> +
> +static rtx
> +zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  if (!TARGET_AVX)
> +    return NULL;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> +  || (TARGET_64BIT
> +      && (REX_SSE_REGNO_P (regno)
> +  || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +      return NULL;
> +
> +  return gen_avx_vzeroall ();
> +}
> +
> +/* Generate a rtx to zero all st and mm registers togetehr if possible,
> +   otherwise, return NULL.  */
> +
> +static rtx
> +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  if (!TARGET_MMX)
> +    return NULL;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
> + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +      return NULL;
> +
> +  return gen_mmx_emms ();
>
>
> emms does not clear any register; it only loads x87FPUTagWord with
> FFFFH. So I think the above is useless as far as register clearing
> is concerned.
>
>
> Thanks for the info.
>
> So, for mm and st registers, should we clear them, and how?

I don't know.

Please note that %mm and %st share the same register file, and
touching %mm registers will block access to %st until emms is emitted.
You can't just blindly load 0 into %st registers, because the register
file can be in MMX mode and vice versa. For 32-bit targets, a function
can also return a value in %mm0.

>
>
> +}
> +
> +/* TARGET_ZERO_CALL_USED_REGS.  */
> +/* Generate a sequence of instructions that zero registers specified by
> +   NEED_ZEROED_HARDREGS.  Return the ZEROED_HARDREGS that are actually
> +   zeroed.  */
> +static HARD_REG_SET
> +ix86_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  HARD_REG_SET zeroed_hardregs;
> +  bool all_sse_zeroed = false;
> +
> +  /* first, let's see whether we can zero all vector registers together.  */
> +  rtx zero_all_vec_insn = zero_all_vector_registers (need_zeroed_hardregs);
> +  if (zero_all_vec_insn)
> +    {
> +      emit_insn (zero_all_vec_insn);
> +      all_sse_zeroed = true;
> +    }
> +
> +  /* then, let's see whether we can zero all st+mm registers togeter.  */
> +  rtx zero_all_st_mm_insn = zero_all_st_mm_registers (need_zeroed_hardregs);
> +  if (zero_all_st_mm_insn)
> +    emit_insn (zero_all_st_mm_insn);
> +
> +  /* Now, generate instructions to zero all the registers.  */
> +
> +  CLEAR_HARD_REG_SET (zeroed_hardregs);
> +  rtx zero_gpr = NULL_RTX;
> +  rtx zero_vector = NULL_RTX;
> +  rtx zero_mask = NULL_RTX;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    {
> +      if (!TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> + continue;
> +      if (!zero_call_used_regno_p (regno, all_sse_zeroed))
> + continue;
> +
> +      SET_HARD_REG_BIT (zeroed_hardregs, regno);
> +
> +      rtx reg, tmp;
> +      machine_mode mode = zero_call_used_regno_mode (regno);
> +
> +      reg = gen_rtx_REG (mode, regno);
> +
> +      if (mode == SImode)
> + if (zero_gpr == NULL_RTX)
> +   {
> +     zero_gpr = reg;
> +     tmp = gen_rtx_SET (reg, const0_rtx);
> +     if (!TARGET_USE_MOV0 || optimize_insn_for_size_p ())
>
>
> No need to complicate things here; there is a peephole2 pattern that will do this:
>
> ;; Attempt to always use XOR for zeroing registers (including FP modes).
> (define_peephole2
>  [(set (match_operand 0 "general_reg_operand")
>    (match_operand 1 "const0_operand"))]
>
> So simply load a register with 0 and leave it to the peephole to do its magic.
>
>
> Since the new register zeroing pass runs after the peephole2 pass, the above peephole optimization cannot be applied.
>
>           NEXT_PASS (pass_peephole2);   ====> peephole2
>           NEXT_PASS (pass_if_after_reload);
>           NEXT_PASS (pass_regrename);
>           NEXT_PASS (pass_cprop_hardreg);
>           NEXT_PASS (pass_fast_rtl_dce);
>           NEXT_PASS (pass_reorder_blocks);
>           NEXT_PASS (pass_leaf_regs);
>           NEXT_PASS (pass_split_before_sched2);
>           NEXT_PASS (pass_sched2);
>           NEXT_PASS (pass_stack_regs);
>           PUSH_INSERT_PASSES_WITHIN (pass_stack_regs)
>               NEXT_PASS (pass_split_before_regstack);
>               NEXT_PASS (pass_stack_regs_run);
>           POP_INSERT_PASSES ()
>       POP_INSERT_PASSES ()
>       NEXT_PASS (pass_late_compilation);
>       PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
>           NEXT_PASS (pass_zero_call_used_regs);   ====> new zero registers pass.
>           NEXT_PASS (pass_compute_alignments);
>           NEXT_PASS (pass_variable_tracking);
>
> So is the current code still necessary?

Yes, I was not aware that the pass runs after peephole2.

Uros.


* Re: [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
  2020-10-06 14:01 [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all] Qing Zhao
  2020-10-19 13:48 ` Qing Zhao
  2020-10-19 19:30 ` Uros Bizjak
@ 2020-10-20 18:12 ` Richard Sandiford
  2020-10-20 21:47   ` Qing Zhao
  2020-10-22 13:49   ` Qing Zhao
  2 siblings, 2 replies; 20+ messages in thread
From: Richard Sandiford @ 2020-10-20 18:12 UTC
  To: Qing Zhao
  Cc: Uros Bizjak, kees Cook, rodriguez Bahena Victor, H.J. Lu,
	segher Boessenkool, gcc-patches Kees Cook via

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
> @@ -4959,6 +4963,52 @@ handle_no_split_stack_attribute (tree *node, tree name,
>   return NULL_TREE;
> }
>
> +/* Handle a "zero_call_used_regs" attribute; arguments as in
> +   struct attribute_spec.handler.  */
> +
> +static tree
> +handle_zero_call_used_regs_attribute (tree *node, tree name, tree args,
> +				      int ARG_UNUSED (flags),
> +				      bool *no_add_attris)

s/attris/attrs/

> +{
> +  tree decl = *node;
> +  tree id = TREE_VALUE (args);
> +
> +  if (TREE_CODE (decl) != FUNCTION_DECL)
> +    {
> +      error_at (DECL_SOURCE_LOCATION (decl),
> +		"%qE attribute applies only to functions", name);
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  if (TREE_CODE (id) != STRING_CST)
> +    {
> +      error ("attribute %qE arguments not a string", name);
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  if ((strcmp (TREE_STRING_POINTER (id), "skip") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "used-gpr-arg") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "used-arg") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "all-arg") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "used-gpr") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "all-gpr") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "used") != 0)
> +      && (strcmp (TREE_STRING_POINTER (id), "all") != 0))

Any reason we don't support all-gpr-arg?  Seems to be the only
“missing” combination.

It would be good to have a single piece of code that parses these
arguments into a set of flags, rather than one list here and another
in gen_call_used_regs_seq.

Maybe we could do something similar to sanitizer_opts, but that
might not be necessary.

> +    {
> +      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs,"
> +	     "%qs, %qs, %qs, or %qs",
> + 	     name, "skip", "used-gpr-arg", "used-arg", "all-arg",
> +	     "used-gpr", "all-gpr", "used", "all");
> +      *no_add_attris = true;
> +      return NULL_TREE;
> +    }
> +
> +  return NULL_TREE;
> +}
> +
> /* Handle a "returns_nonnull" attribute; arguments as in
>    struct attribute_spec.handler.  */
>
> diff --git a/gcc/coretypes.h b/gcc/coretypes.h
> index 6b6cfcd..0ce5eb4 100644
> --- a/gcc/coretypes.h
> +++ b/gcc/coretypes.h
> @@ -418,6 +418,19 @@ enum symbol_visibility
>   VISIBILITY_INTERNAL
> };
>
> +/* Zero call-used registers type.  */
> +enum zero_call_used_regs {
> +  zero_call_used_regs_unset = 0,
> +  zero_call_used_regs_skip,
> +  zero_call_used_regs_used_gpr_arg,
> +  zero_call_used_regs_used_arg,
> +  zero_call_used_regs_all_arg,
> +  zero_call_used_regs_used_gpr,
> +  zero_call_used_regs_all_gpr,
> +  zero_call_used_regs_used,
> +  zero_call_used_regs_all
> +};

I think a bitmask would be easier to use:

  SKIP
  ONLY_USED
  ONLY_GPR
  ONLY_ARG

It should probably be an enum class, given that we're using C++11.
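A minimal C++11 sketch of the bitmask and of a single shared parser along the lines suggested above. All names here (`zero_regs_flags`, `has_flag`, `zero_regs_choices`, `parse_zero_regs_choice`) are hypothetical, and the `ENABLED` bit is an extra assumption, added so that "all" (no restrictions) stays distinct from "unset". The `all-gpr-arg` row illustrates how the missing combination would become a one-line addition:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical bitmask for the zero_call_used_regs choices.
// ENABLED is an assumed extra bit that separates "all" from "unset".
enum class zero_regs_flags : unsigned int
{
  UNSET = 0,
  SKIP = 1u << 0,
  ONLY_USED = 1u << 1,
  ONLY_GPR = 1u << 2,
  ONLY_ARG = 1u << 3,
  ENABLED = 1u << 4
};

constexpr zero_regs_flags
operator| (zero_regs_flags a, zero_regs_flags b)
{
  return static_cast<zero_regs_flags> (static_cast<unsigned int> (a)
                                       | static_cast<unsigned int> (b));
}

// True if any bit of FLAG is set in SET.
constexpr bool
has_flag (zero_regs_flags set, zero_regs_flags flag)
{
  return (static_cast<unsigned int> (set)
          & static_cast<unsigned int> (flag)) != 0;
}

using zrf = zero_regs_flags;

struct zero_regs_choice
{
  const char *name;
  zero_regs_flags flags;
};

// The single list of accepted spellings, which both the attribute
// handler and the pass could share.
static const zero_regs_choice zero_regs_choices[] = {
  { "skip", zrf::SKIP },
  { "used-gpr-arg", zrf::ENABLED | zrf::ONLY_USED | zrf::ONLY_GPR | zrf::ONLY_ARG },
  { "used-arg", zrf::ENABLED | zrf::ONLY_USED | zrf::ONLY_ARG },
  { "all-arg", zrf::ENABLED | zrf::ONLY_ARG },
  { "used-gpr", zrf::ENABLED | zrf::ONLY_USED | zrf::ONLY_GPR },
  { "all-gpr", zrf::ENABLED | zrf::ONLY_GPR },
  { "used", zrf::ENABLED | zrf::ONLY_USED },
  { "all", zrf::ENABLED },
  // Hypothetical: the combination the review notes is missing.
  { "all-gpr-arg", zrf::ENABLED | zrf::ONLY_GPR | zrf::ONLY_ARG },
};

// Store the flags for NAME in *OUT and return true, or return false
// if NAME is not a recognized choice.
static bool
parse_zero_regs_choice (const char *name, zero_regs_flags *out)
{
  for (const zero_regs_choice &choice : zero_regs_choices)
    if (std::strcmp (choice.name, name) == 0)
      {
        *out = choice.flags;
        return true;
      }
  return false;
}
```

With flags like these, the eight-case switch in gen_call_used_regs_seq could reduce to three tests such as `gpr_only = has_flag (flags, zrf::ONLY_GPR)`, and the attribute error message could be built from the same table instead of a hand-maintained list.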

> +/* Return true if REGNO is used by the epilogue.  */
> +bool
> +df_epilogue_uses_p (unsigned int regno)
> +{
> +    return (EPILOGUE_USES (regno)
> +	    || TEST_HARD_REG_BIT (crtl->zeroed_reg_set, regno));

Nit: the { … } body should be indented by two spaces rather than four.

> diff --git a/gcc/df.h b/gcc/df.h
> index 8b6ca8c..0f098d7 100644
> --- a/gcc/df.h
> +++ b/gcc/df.h
> @@ -1085,6 +1085,7 @@ extern void df_update_entry_exit_and_calls (void);
> extern bool df_hard_reg_used_p (unsigned int);
> extern unsigned int df_hard_reg_used_count (unsigned int);
> extern bool df_regs_ever_live_p (unsigned int);
> +extern bool df_epilogue_uses_p (unsigned int);
> extern void df_set_regs_ever_live (unsigned int, bool);
> extern void df_compute_regs_ever_live (bool);
> extern void df_scan_verify (void);
> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index c9f7299..f56f61a 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -3992,6 +3992,30 @@ performing a link with relocatable output (i.e.@: @code{ld -r}) on them.
> A declaration to which @code{weakref} is attached and that is associated
> with a named @code{target} must be @code{static}.
>
> +@item zero_call_used_regs ("@var{choice}")
> +@cindex @code{zero_call_used_regs} function attribute
> +
> +The @code{zero_call_used_regs} attribute causes the compiler to zero
> +call-used registers at function return according to @var{choice}.
> +This is used to increase the program security by either mitigating
> +Return-Oriented Programming (ROP) or preventing information leak
> +through registers.
> +@samp{skip} doesn't zero call-used registers.
> +
> +@samp{used-arg-gpr} zeros used call-used general purpose registers that

used-gpr-arg

> +pass parameters. @samp{used-arg} zeros used call-used registers that
> +pass parameters. @samp{arg} zeros all call-used registers that pass
> +parameters.  These 3 choices are used for ROP mitigation.
> +
> +@samp{used-gpr} zeros call-used general purpose registers
> +which are used in function.  @samp{all-gpr} zeros all
> +call-used registers.  @samp{used} zeros call-used registers which
> +are used in function.  @samp{all} zeros all call-used registers.
> +These 4 choices are used for preventing information leak through
> +registers.

The description for all-gpr doesn't look right.  I think it would
be easier to describe (and hopefully to follow) if we start with
the three basic choices: “skip”, “used” and “all”.  Then describe
how “used” and “all” can be modified by adding “-gpr” to limit the
clearing to general-purpose registers and “-arg” to limit the
clearing to argument registers.

We need to say what “call-used” and “used” mean in this context.
In particular, “call-used” is also known as “call-clobbered”,
“caller-saved” and “volatile”, so it would be good to list those
as alternatives.  We need to say what “used” registers are.

> @@ -12550,6 +12550,29 @@ int foo (void)
>
> Not all targets support this option.
>
> +@item -fzero-call-used-regs=@var{choice}
> +@opindex fzero-call-used-regs
> +Zero call-used registers at function return to increase the program
> +security by either mitigating Return-Oriented Programming (ROP) or
> +preventing information leak through registers.
> +
> +@samp{skip}, which is the default, doesn't zero call-used registers.
> +
> +@samp{used-gpr-arg} zeros used call-used general purpose registers that
> +pass parameters. @samp{used-arg} zeros used call-used registers that
> +pass parameters. @samp{all-arg} zeros all call-used registers that pass
> +parameters.  These 3 choices are used for ROP mitigation.
> +
> +@samp{used-gpr} zeros call-used general purpose registers
> +which are used in function.  @samp{all-gpr} zeros all
> +call-used registers.  @samp{used} zeros call-used registers which
> +are used in function.  @samp{all} zeros all call-used registers.
> +These 4 choices are used for preventing information leak through
> +registers.

Same comment here.

> @@ -310,6 +310,9 @@ struct GTY(()) rtl_data {
>      sets them.  */
>   HARD_REG_SET asm_clobbers;
>
> +  /* All hard registers that are zeroed at the return of the routine.  */
> +  HARD_REG_SET zeroed_reg_set;

How about “must_be_zero_on_return”?  “zeroed_reg_set” isn't very
specific about where the zeroing happens or is needed.  E.g. we also
zero uninitialised registers.

> +
>   /* The highest address seen during shorten_branches.  */
>   int max_insn_address;
> };
> diff --git a/gcc/function.c b/gcc/function.c
> index c612959..c8181bd 100644
> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3.  If not see
> #include "emit-rtl.h"
> #include "recog.h"
> #include "rtl-error.h"
> +#include "hard-reg-set.h"
> #include "alias.h"
> #include "fold-const.h"
> #include "stor-layout.h"
> @@ -5815,6 +5816,182 @@ make_prologue_seq (void)
>   return seq;
> }
>
> +/* Check whether the hard register REGNO is live before the return insn RET.  */
> +static bool
> +is_live_reg_at_return (unsigned int regno, rtx_insn * ret)

Nit: should be no space after “*”.

> +{
> +  basic_block bb = BLOCK_FOR_INSN (ret);
> +  auto_bitmap live_out;
> +  bitmap_copy (live_out, df_get_live_out (bb));

Sorry, I forgot that we should also do this here:

  df_simulate_initialize_backwards (bb, live_out);

But we should calculate this set once per return instruction rather
than repeating the calculation for every register.

> +  df_simulate_one_insn_backwards (bb, ret, live_out);
> +
> +  if (REGNO_REG_SET_P (live_out, regno))
> +    return true;
> +
> +  return false;
> +}
> +
> +/* Emit a sequence of insns to zero the call-used-registers before RET.  */
> +
> +static void
> +gen_call_used_regs_seq (rtx_insn *ret)
> +{
> +  bool gpr_only = true;
> +  bool used_only = true;
> +  bool arg_only = true;
> +  enum zero_call_used_regs zero_regs_type = zero_call_used_regs_unset;
> +  enum zero_call_used_regs attr_zero_regs_type
> +			    = zero_call_used_regs_unset;
> +  tree attr_zero_regs
> +	= lookup_attribute ("zero_call_used_regs",
> +			    DECL_ATTRIBUTES (cfun->decl));
> +
> +  /* Get the type of zero_call_used_regs from function attribute.  */
> +  if (attr_zero_regs)
> +    {
> +      /* The TREE_VALUE of an attribute is a TREE_LIST whose TREE_VALUE
> +	 is the attribute argument's value.  */
> +      attr_zero_regs = TREE_VALUE (attr_zero_regs);
> +      gcc_assert (TREE_CODE (attr_zero_regs) == TREE_LIST);
> +      attr_zero_regs = TREE_VALUE (attr_zero_regs);
> +      gcc_assert (TREE_CODE (attr_zero_regs) == STRING_CST);
> +
> +      if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "skip") == 0)
> +	attr_zero_regs_type = zero_call_used_regs_skip;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used-gpr-arg")
> +		== 0)
> +	attr_zero_regs_type = zero_call_used_regs_used_gpr_arg;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used-arg") == 0)
> +	attr_zero_regs_type = zero_call_used_regs_used_arg;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "all-arg") == 0)
> +	attr_zero_regs_type = zero_call_used_regs_all_arg;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used-gpr") == 0)
> +	attr_zero_regs_type = zero_call_used_regs_used_gpr;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "all-gpr") == 0)
> +	attr_zero_regs_type = zero_call_used_regs_all_gpr;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "used") == 0)
> +	attr_zero_regs_type = zero_call_used_regs_used;
> +      else if (strcmp (TREE_STRING_POINTER (attr_zero_regs), "all") == 0)
> +	attr_zero_regs_type = zero_call_used_regs_all;
> +      else
> +	gcc_assert (0);
> +    }
> +
> +  if (flag_zero_call_used_regs)
> +    if (!attr_zero_regs)
> +      zero_regs_type = flag_zero_call_used_regs;
> +    else
> +      zero_regs_type = attr_zero_regs_type;
> +  else
> +    zero_regs_type = attr_zero_regs_type;
> +
> +  /* No need to zero call-used-regs when no user request is present.  */
> +  if (zero_regs_type <= zero_call_used_regs_skip)
> +    return;
> +
> +  /* No need to zero call-used-regs in main ().  */
> +  if (MAIN_NAME_P (DECL_NAME (current_function_decl)))
> +    return;
> +
> +  /* No need to zero call-used-regs if __builtin_eh_return is called
> +     since it isn't a normal function return.  */
> +  if (crtl->calls_eh_return)
> +    return;
> +
> +  /* If gpr_only is true, only zero call-used-registers that are
> +     general-purpose registers; if used_only is true, only zero
> +     call-used-registers that are used in the current function.  */
> +
> +  switch (zero_regs_type)
> +    {
> +      case zero_call_used_regs_used_arg:
> +	gpr_only = false;
> +	break;
> +      case zero_call_used_regs_all_arg:
> +	gpr_only = false;
> +	used_only = false;
> +	break;
> +      case zero_call_used_regs_used_gpr:
> +	arg_only = false;
> +	break;
> +      case zero_call_used_regs_all_gpr:
> +	used_only = false;
> +	arg_only = false;
> +	break;
> +      case zero_call_used_regs_used:
> +	gpr_only = false;
> +	arg_only = false;
> +	break;
> +      case zero_call_used_regs_all:
> +	gpr_only = false;
> +	used_only = false;
> +	arg_only = false;
> +	break;
> +      default:
> +	break;
> +    }

Using a bitmask would also simplify this.

> +
> +  /* For each of the hard registers, check to see whether we should zero it if:
> +     1. it is a call-used-registers;
> + and 2. it is not a fixed-registers;
> + and 3. it is not live at the return of the routine;
> + and 4. it is general registor if gpr_only is true;
> + and 5. it is used in the routine if used_only is true;
> + and 6. it is a register that passes parameter if arg_only is true;
> +   */
> +
> +  HARD_REG_SET need_zeroed_hardregs;
> +  CLEAR_HARD_REG_SET (need_zeroed_hardregs);
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    {
> +      if (!this_target_hard_regs->x_call_used_regs[regno])
> +	continue;

This should use crtl->abi instead.  The set of call-used registers
can vary from function to function.
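A simplified, self-contained model of the register selection step with these review comments applied: the call-clobbered set is a per-function input (standing in for what GCC would obtain from `crtl->abi`), the live-at-return set is computed once by the caller and passed in, and the restrictions are applied as whole-set operations instead of a per-register `continue` chain. `std::bitset`, `N_REGS`, `function_regs` and `select_regs_to_zero` are illustrative assumptions, not GCC types; in GCC the same steps would operate on HARD_REG_SET, which the patch already combines with `|=`.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Illustrative model only: N_REGS stands in for FIRST_PSEUDO_REGISTER.
constexpr std::size_t N_REGS = 16;
using reg_set = std::bitset<N_REGS>;

// Per-function register facts (hypothetical stand-ins for GCC data).
struct function_regs
{
  reg_set call_clobbered; // from the function's ABI (crtl->abi in GCC)
  reg_set fixed;          // fixed_regs
  reg_set general;        // reg_class_contents[GENERAL_REGS]
  reg_set ever_live;      // df_regs_ever_live_p
  reg_set arg_regs;       // FUNCTION_ARG_REGNO_P
};

// Return the registers to zero before one return insn.  LIVE_AT_RETURN
// is computed once per return by the caller, not once per register.
reg_set
select_regs_to_zero (const function_regs &f, const reg_set &live_at_return,
                     bool only_gpr, bool only_used, bool only_arg)
{
  // Call-clobbered, not fixed, and not live at the return.
  reg_set selected = f.call_clobbered & ~f.fixed & ~live_at_return;
  if (only_gpr)
    selected &= f.general;
  if (only_used)
    selected &= f.ever_live;
  if (only_arg)
    selected &= f.arg_regs;
  return selected;
}
```

Here the three booleans would come straight from the bitmask (ONLY_GPR, ONLY_USED, ONLY_ARG), so the numbered conditions in the patch comment map one-to-one onto the set operations.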

> +      if (fixed_regs[regno])
> +	continue;
> +      if (is_live_reg_at_return (regno, ret))
> +	continue;
> +      if (gpr_only
> +	  && !TEST_HARD_REG_BIT (reg_class_contents[GENERAL_REGS], regno))
> +	continue;
> +      if (used_only && !df_regs_ever_live_p (regno))
> +	continue;
> +      if (arg_only && !FUNCTION_ARG_REGNO_P (regno))
> +	continue;
> +
> +      /* Now this is a register that we might want to zero.  */
> +      SET_HARD_REG_BIT (need_zeroed_hardregs, regno);
> +    }
> +
> +  if (hard_reg_set_empty_p (need_zeroed_hardregs))
> +    return;
> +
> +  /* Now we get a hard register set that need to be zeroed, pass it to
> +     target to generate zeroing sequence.  */
> +  HARD_REG_SET zeroed_hardregs;
> +  start_sequence ();
> +  zeroed_hardregs = targetm.calls.zero_call_used_regs (need_zeroed_hardregs);
> +  rtx_insn *seq = get_insns ();
> +  end_sequence ();
> +  if (seq)
> +    {
> +      /* emit the memory blockage and register clobber asm volatile before

Nit: “Emit”

> +	 the whole sequence.  */
> +      start_sequence ();
> +      expand_asm_reg_clobber_mem_blockage (zeroed_hardregs);
> +      rtx_insn *seq_barrier = get_insns ();
> +      end_sequence ();
> +
> +      emit_insn_before (seq_barrier, ret);
> +      emit_insn_before (seq, ret);
> +
> +      /* update the data flow information.  */

“Update”

> +      crtl->zeroed_reg_set |= zeroed_hardregs;
> +      df_set_bb_dirty (EXIT_BLOCK_PTR_FOR_FN (cfun));
> +    }
> +  return;

GCC style is to avoid returns at the end of void functions.

> +}
> +
> +
> /* Return a sequence to be used as the epilogue for the current function,
>    or NULL.  */
>
> @@ -6486,7 +6663,75 @@ make_pass_thread_prologue_and_epilogue (gcc::context *ctxt)
> {
>   return new pass_thread_prologue_and_epilogue (ctxt);
> }
> -
>
> +
> +static unsigned int
> +rest_of_zero_call_used_regs (void)
> +{
> +  basic_block bb;
> +  rtx_insn *insn;
> +
> +  /* This pass needs data flow information.  */
> +  df_analyze ();
> +
> +  /* Search all the "return"s in the routine, and insert instruction sequence to
> +     zero the call used registers.  */
> +  FOR_EACH_BB_REVERSE_FN (bb, cfun)
> +    if (bb == EXIT_BLOCK_PTR_FOR_FN (cfun)
> +	|| (single_succ_p (bb)
> +	    && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun)))
> +      FOR_BB_INSNS_REVERSE (bb, insn)
> +	if (JUMP_P (insn) && ANY_RETURN_P (JUMP_LABEL (insn)))
> +	  {
> +	    /* Now we can insert the instruction sequence to zero the call used
> +	       registers before this insn.  */
> +	    gen_call_used_regs_seq (insn);
> +	    break;
> +	  }

The exit block never has instructions, so it's only necessary to process
predecessors.  A simpler way to do that is to iterate over the edges in:

  EXIT_BLOCK_PTR_FOR_FN (cfun)->preds

You shouldn't need to use FOR_BB_INSNS_REVERSE: it should be enough
to check only BB_END (bb), since returns always end blocks.

> +
> +  return 0;
> +}
> +
> +namespace {
> +
> +const pass_data pass_data_zero_call_used_regs =
> +{
> +  RTL_PASS, /* type */
> +  "zero_call_used_regs", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  0, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_zero_call_used_regs: public rtl_opt_pass
> +{
> +public:
> +  pass_zero_call_used_regs (gcc::context *ctxt)
> +    : rtl_opt_pass (pass_data_zero_call_used_regs, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *)
> +    {
> +      return flag_zero_call_used_regs > zero_call_used_regs_unset;

I think this also needs to check the function attributes.

> +    }
> +  virtual unsigned int execute (function *)
> +    {
> +      return rest_of_zero_call_used_regs ();
> +    }
> +
> +}; // class pass_zero_call_used_regs
> +
> +} // anon namespace
> +
> +rtl_opt_pass *
> +make_pass_zero_call_used_regs (gcc::context *ctxt)
> +{
> +  return new pass_zero_call_used_regs (ctxt);
> +}
>
> /* If CONSTRAINT is a matching constraint, then return its number.
>    Otherwise, return -1.  */
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index 8ad7f4b..57e5c5d 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -6484,6 +6484,49 @@ expand_memory_blockage (void)
>     expand_asm_memory_blockage ();
> }
>
> +/* Generate asm volatile("" : : : "memory") as a memory blockage, at the
> +   same time clobbering the register set specified by ZEROED_REGS.  */
> +
> +void
> +expand_asm_reg_clobber_mem_blockage (HARD_REG_SET zeroed_regs)

Just “regs”: the interface is more general than registers that are being
zeroed.

> +{
> +  rtx asm_op, clob_mem, clob_reg;
> +
> +  unsigned int num_of_regs = 0;
> +  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> +    if (TEST_HARD_REG_BIT (zeroed_regs, i))
> +      num_of_regs++;
> +
> +  if (num_of_regs == 0)
> +    return;

For this interface, I think we should continue and just include
a memory clobber.

> +
> +  asm_op = gen_rtx_ASM_OPERANDS (VOIDmode, "", "", 0,
> +				 rtvec_alloc (0), rtvec_alloc (0),
> +				 rtvec_alloc (0), UNKNOWN_LOCATION);
> +  MEM_VOLATILE_P (asm_op) = 1;
> +
> +  rtvec v = rtvec_alloc (num_of_regs + 2);
> +
> +  clob_mem = gen_rtx_SCRATCH (VOIDmode);
> +  clob_mem = gen_rtx_MEM (BLKmode, clob_mem);
> +  clob_mem = gen_rtx_CLOBBER (VOIDmode, clob_mem);
> +
> +  RTVEC_ELT (v,0) = asm_op;
> +  RTVEC_ELT (v,1) = clob_mem;
> +
> +  unsigned int j = 2;
> +  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> +    if (TEST_HARD_REG_BIT (zeroed_regs, i))
> +      {
> +	clob_reg  = gen_rtx_CLOBBER (VOIDmode, regno_reg_rtx[i]);

Nit: should just be one space before “=”.  However…

> +	RTVEC_ELT (v,j) = clob_reg;

…IMO it would be more readable as just:

	RTVEC_ELT (v, j) = gen_rtx_CLOBBER (VOIDmode, regno_reg_rtx[i]);

> diff --git a/gcc/recog.c b/gcc/recog.c
> index ce83b7f..472c2dc 100644
> --- a/gcc/recog.c
> +++ b/gcc/recog.c
> @@ -923,6 +923,21 @@ validate_simplify_insn (rtx_insn *insn)
>   return ((num_changes_pending () > 0) && (apply_change_group () > 0));
> }
>
>
> +
> +bool
> +valid_insn_p (rtx_insn *insn)

This should have a function comment.  E.g.:

/* Check whether INSN matches a specific alternative of an .md pattern.  */

> +{
> +  recog_memoized (insn);
> +  if (INSN_CODE (insn) < 0)
> +    return false;
> +  extract_insn (insn);
> +  /* We don't know whether the insn will be in code that is optimized
> +     for size or speed, so consider all enabled alternatives.  */
> +  if (!constrain_operands (1, get_enabled_alternatives (insn)))
> +    return false;
> +  return true;
> +}
> +
> /* Return 1 if OP is a valid general operand for machine mode MODE.
>    This is either a register reference, a memory reference,
>    or a constant.  In the case of a memory reference, the address
> diff --git a/gcc/target.def b/gcc/target.def
> index ed2da15..7d6807d 100644
> --- a/gcc/target.def
> +++ b/gcc/target.def
> @@ -5080,6 +5080,19 @@ argument list due to stack realignment.  Return @code{NULL} if no DRAP\n\
> is needed.",
>  rtx, (void), NULL)
>
> +/* Generate instruction sequence to zero call used registers.  */
> +DEFHOOK
> +(zero_call_used_regs,
> + "This target hook emits instructions to zero registers specified\n\
> +by @var{need_zeroed_hardregs} at function return, at the same time\n\
> +return the hard register set that are actually zeroed by the hook\n\
> +Define this hook if the target has more effecient instructions to\n\
> +zero call-used registers, or if the target only tries to zero a subset\n\
> +of @var{need_zeroed_hardregs}.\n\
> +If the hook is not defined, the default_zero_call_used_reg will be used.",
> + HARD_REG_SET, (HARD_REG_SET need_zeroed_hardregs),

I'd suggest:

 "Emit instructions to zero the subset of @var{selected_regs} that\n\
could conceivably contain values that are useful to an attacker.\n\
Return the set of registers that were actually cleared.\n\
\n\
The default implementation uses normal move instructions to zero\n\
all the registers in @var{selected_regs}.  Define this hook if the\n\
target has more efficient ways of zeroing certain registers,\n\
or if you believe that certain registers would never contain\n\
values that are useful to an attacker."

with the parameter called “selected_regs” instead of
“need_zeroed_hardregs”.  (“need” suggests that the target
doesn't have the option of not zeroing.)

> +default_zero_call_used_regs)
> +
> /* Return true if all function parameters should be spilled to the
>    stack.  */
> DEFHOOK
> diff --git a/gcc/targhooks.c b/gcc/targhooks.c
> index 5d94fce..2318c324 100644
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -56,6 +56,9 @@ along with GCC; see the file COPYING3.  If not see
> #include "tree-ssa-alias.h"
> #include "gimple-expr.h"
> #include "memmodel.h"
> +#include "backend.h"
> +#include "emit-rtl.h"
> +#include "df.h"
> #include "tm_p.h"
> #include "stringpool.h"
> #include "tree-vrp.h"
> @@ -987,6 +990,38 @@ default_function_value_regno_p (const unsigned int regno ATTRIBUTE_UNUSED)
> #endif
> }
>
> +/* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
> +
> +HARD_REG_SET
> +default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  HARD_REG_SET zeroed_hardregs;
> +  gcc_assert (!hard_reg_set_empty_p (need_zeroed_hardregs));
> +
> +  CLEAR_HARD_REG_SET (zeroed_hardregs);
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +      {
> +	rtx_insn *last_insn = get_last_insn ();
> +	machine_mode mode = GET_MODE (regno_reg_rtx[regno]);
> +	rtx zero = CONST0_RTX (mode);
> +	rtx_insn *insn = emit_move_insn (regno_reg_rtx[regno], zero);
> +	if (!valid_insn_p (insn))
> +	  {
> +	    static bool issued_error;
> +	    if (!issued_error)
> +	      {
> +		issued_error = true;
> +		sorry ("-fzero-call-used-regs not supported on this target");

Should be "%qs not supported on this target", with the option name
as a second argument.

> +	      }
> +	    delete_insns_since (last_insn);
> +	  }
> +	else
> +	  SET_HARD_REG_BIT (zeroed_hardregs, regno);
> +      }
> +  return zeroed_hardregs;

I don't think it's worth building up a different return set.  The only
time it's different from need_zeroed_hardregs is if we emit the sorry,
which will cause compilation to fail anyway.

Sorry, I ran out of time to review the tests, but the code part
otherwise looks good.

Thanks,
Richard


* Re: [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
  2020-10-20 15:24     ` Uros Bizjak
@ 2020-10-20 20:04       ` Qing Zhao
  2020-10-21  7:18         ` Uros Bizjak
  0 siblings, 1 reply; 20+ messages in thread
From: Qing Zhao @ 2020-10-20 20:04 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: richard Sandiford, kees Cook, rodriguez Bahena Victor, H.J. Lu,
	segher Boessenkool, gcc-patches Kees Cook via



> On Oct 20, 2020, at 10:24 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
> On Tue, Oct 20, 2020 at 4:01 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> Hi, Uros,
>> 
>> Thanks a lot for your comments.
>> 
>> On Oct 19, 2020, at 2:30 PM, Uros Bizjak <ubizjak@gmail.com> wrote:
>> 
>> 
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index f684954..620114f 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -3551,6 +3551,161 @@ ix86_function_value_regno_p (const unsigned int regno)
>> return false;
>> }
>> 
>> +/* Check whether the register REGNO should be zeroed on X86.
>> +   When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
>> +   together, no need to zero it again.
>> +   Stack registers (st0-st7) and mm0-mm7 are aliased with each other.
>> +   very hard to be zeroed individually, don't zero individual st or
>> +   mm registgers at this time.  */
>> +
>> +static bool
>> +zero_call_used_regno_p (const unsigned int regno,
>> + bool all_sse_zeroed)
>> +{
>> +  return GENERAL_REGNO_P (regno)
>> +  || (!all_sse_zeroed && SSE_REGNO_P (regno))
>> +  || MASK_REGNO_P (regno);
>> +}
>> +
>> +/* Return the machine_mode that is used to zero register REGNO.  */
>> +
>> +static machine_mode
>> +zero_call_used_regno_mode (const unsigned int regno)
>> +{
>> +  /* NB: We only need to zero the lower 32 bits for integer registers
>> +     and the lower 128 bits for vector registers since destination are
>> +     zero-extended to the full register width.  */
>> +  if (GENERAL_REGNO_P (regno))
>> +    return SImode;
>> +  else if (SSE_REGNO_P (regno))
>> +    return V4SFmode;
>> +  else
>> +    return HImode;
>> +}
>> +
>> +/* Generate a rtx to zero all vector registers togetehr if possible,
>> +   otherwise, return NULL.  */
>> +
>> +static rtx
>> +zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
>> +{
>> +  if (!TARGET_AVX)
>> +    return NULL;
>> +
>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
>> +  || (TARGET_64BIT
>> +      && (REX_SSE_REGNO_P (regno)
>> +  || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
>> + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>> +      return NULL;
>> +
>> +  return gen_avx_vzeroall ();
>> +}
>> +
>> +/* Generate a rtx to zero all st and mm registers togetehr if possible,
>> +   otherwise, return NULL.  */
>> +
>> +static rtx
>> +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
>> +{
>> +  if (!TARGET_MMX)
>> +    return NULL;
>> +
>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +    if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
>> + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>> +      return NULL;
>> +
>> +  return gen_mmx_emms ();
>> 
>> 
>> emms is not clearing any register, it only loads x87FPUTagWord with
>> FFFFH. So I think, the above is useless, as far as register clearing
>> is concerned.
>> 
>> 
>> Thanks for the info.
>> 
>> So, for mm and st registers, should we clear them, and how?
> 
> I don't know.
> 
> Please note that %mm and %st share the same register file, and
> touching %mm registers will block access to %st until emms is emitted.
> You can't just blindly load 0 to %st registers, because the register
> file can be in MMX mode and vice versa. For 32-bit targets, a function
> can also return a value in %mm0.

If data flow determines that %mm0 does not hold a return value at the return, can we clear all the %st registers as follows:

emms
mov %st0, 0
mov %st1, 0
mov %st2, 0
mov %st3, 0
mov %st4, 0
mov %st5, 0
mov %st6, 0
mov %st7, 0

? 

Thanks.

Qing
> 



* Re: [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
  2020-10-20 18:12 ` Richard Sandiford
@ 2020-10-20 21:47   ` Qing Zhao
  2020-10-21 15:47     ` Richard Sandiford
  2020-10-22 13:49   ` Qing Zhao
  1 sibling, 1 reply; 20+ messages in thread
From: Qing Zhao @ 2020-10-20 21:47 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Uros Bizjak, kees Cook, rodriguez Bahena Victor, H.J. Lu,
	segher Boessenkool, gcc-patches Kees Cook via

Richard,

Thanks a lot for your comments.

> On Oct 20, 2020, at 1:12 PM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
>> 
>> +
>> +  if ((strcmp (TREE_STRING_POINTER (id), "skip") != 0)
>> +      && (strcmp (TREE_STRING_POINTER (id), "used-gpr-arg") != 0)
>> +      && (strcmp (TREE_STRING_POINTER (id), "used-arg") != 0)
>> +      && (strcmp (TREE_STRING_POINTER (id), "all-arg") != 0)
>> +      && (strcmp (TREE_STRING_POINTER (id), "used-gpr") != 0)
>> +      && (strcmp (TREE_STRING_POINTER (id), "all-gpr") != 0)
>> +      && (strcmp (TREE_STRING_POINTER (id), "used") != 0)
>> +      && (strcmp (TREE_STRING_POINTER (id), "all") != 0))
> 
> Any reason we don't support all-gpr-arg?  Seems to be the only
> “missing” combination.
Will add this one.

> 
> Would be good to have a single piece of code that parses these
> arguments into a set of flags, rather than have one list here
> and one in gen_call_used_regs_seq.
> 
> Maybe we could do something similar to sanitizer_opts, but that
> might not be necessary.

Okay, will do that.
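A minimal sketch of such a single parser, in standalone C++ rather than GCC-internal code. The ZCUR_* flag names, the extra ENABLED bit (added so that "all", which imposes no restriction, stays distinguishable from "unset") and the table layout are assumptions for illustration, not part of the patch:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical flag bits; the names are illustrative, not from the patch.
// ENABLED keeps "all" (no restrictions) distinct from "not specified" (0).
const unsigned ZCUR_SKIP      = 1u << 0;
const unsigned ZCUR_ONLY_USED = 1u << 1;
const unsigned ZCUR_ONLY_GPR  = 1u << 2;
const unsigned ZCUR_ONLY_ARG  = 1u << 3;
const unsigned ZCUR_ENABLED   = 1u << 4;

struct zcur_choice
{
  const char *name;
  unsigned flags;
};

// The only list of accepted spellings; both the command-line option
// and the attribute handler could consult it.
static const zcur_choice zcur_choices[] = {
  { "skip",         ZCUR_SKIP },
  { "used-gpr-arg", ZCUR_ENABLED | ZCUR_ONLY_USED | ZCUR_ONLY_GPR | ZCUR_ONLY_ARG },
  { "used-arg",     ZCUR_ENABLED | ZCUR_ONLY_USED | ZCUR_ONLY_ARG },
  { "used-gpr",     ZCUR_ENABLED | ZCUR_ONLY_USED | ZCUR_ONLY_GPR },
  { "used",         ZCUR_ENABLED | ZCUR_ONLY_USED },
  { "all-gpr-arg",  ZCUR_ENABLED | ZCUR_ONLY_GPR | ZCUR_ONLY_ARG },
  { "all-arg",      ZCUR_ENABLED | ZCUR_ONLY_ARG },
  { "all-gpr",      ZCUR_ENABLED | ZCUR_ONLY_GPR },
  { "all",          ZCUR_ENABLED },
};

// If NAME is a valid choice, store its flags in *FLAGS and return true;
// otherwise return false and leave *FLAGS unchanged.
bool
parse_zcur_choice (const char *name, unsigned *flags)
{
  assert (name != nullptr && flags != nullptr);
  for (const zcur_choice &c : zcur_choices)
    if (std::strcmp (c.name, name) == 0)
      {
        *flags = c.flags;
        return true;
      }
  return false;
}
```

Because the table is the single source of spellings, adding all-gpr-arg is a one-line change, and the "must be one of" diagnostic could be generated by iterating over the same table.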
> 
>> +    {
>> +      error ("attribute %qE argument must be one of %qs, %qs, %qs, %qs,"
>> +	     "%qs, %qs, %qs, or %qs",
>> + 	     name, "skip", "used-gpr-arg", "used-arg", "all-arg",
>> +	     "used-gpr", "all-gpr", "used", "all");
>> +      *no_add_attris = true;
>> +      return NULL_TREE;
>> +    }
>> +
>> +  return NULL_TREE;
>> +}
>> +
>> /* Handle a "returns_nonnull" attribute; arguments as in
>>   struct attribute_spec.handler.  */
>> 
>> diff --git a/gcc/coretypes.h b/gcc/coretypes.h
>> index 6b6cfcd..0ce5eb4 100644
>> --- a/gcc/coretypes.h
>> +++ b/gcc/coretypes.h
>> @@ -418,6 +418,19 @@ enum symbol_visibility
>>  VISIBILITY_INTERNAL
>> };
>> 
>> +/* Zero call-used registers type.  */
>> +enum zero_call_used_regs {
>> +  zero_call_used_regs_unset = 0,
>> +  zero_call_used_regs_skip,
>> +  zero_call_used_regs_used_gpr_arg,
>> +  zero_call_used_regs_used_arg,
>> +  zero_call_used_regs_all_arg,
>> +  zero_call_used_regs_used_gpr,
>> +  zero_call_used_regs_all_gpr,
>> +  zero_call_used_regs_used,
>> +  zero_call_used_regs_all
>> +};
> 
> I think a bitmask would be easier to use:
> 
>  SKIP
>  ONLY_USED
>  ONLY_GPR
>  ONLY_ARG
> 
> Should probably be a class enum given that we're C++11.

Good suggestion.
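One way the suggested class enum could look, as a sketch only: the enumerators follow Richard's list plus a hypothetical "enabled" bit, and because a scoped enum has no built-in bitwise operators, the two that are needed are defined explicitly (C++11, so each constexpr function is a single return statement):

```cpp
#include <cassert>

// Hypothetical scoped enum for the bitmask (names are illustrative).
enum class zero_regs : unsigned
{
  unset     = 0,
  skip      = 1u << 0,
  only_used = 1u << 1,
  only_gpr  = 1u << 2,
  only_arg  = 1u << 3,
  enabled   = 1u << 4
};

// Combine two flag sets.
constexpr zero_regs
operator| (zero_regs a, zero_regs b)
{
  return static_cast<zero_regs> (static_cast<unsigned> (a)
                                 | static_cast<unsigned> (b));
}

// Intersect two flag sets.
constexpr zero_regs
operator& (zero_regs a, zero_regs b)
{
  return static_cast<zero_regs> (static_cast<unsigned> (a)
                                 & static_cast<unsigned> (b));
}

// True if any bit of FLAG is set in SET.
constexpr bool
has_flag (zero_regs set, zero_regs flag)
{
  return (set & flag) != zero_regs::unset;
}

// Named choices become combinations rather than separate enumerators.
constexpr zero_regs zero_regs_used_gpr_arg
  = zero_regs::enabled | zero_regs::only_used
    | zero_regs::only_gpr | zero_regs::only_arg;
constexpr zero_regs zero_regs_all = zero_regs::enabled;
```

The scoped enum keeps the flags from mixing silently with plain integers, at the cost of these few operator definitions.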

> 
>> +pass parameters. @samp{used-arg} zeros used call-used registers that
>> +pass parameters. @samp{arg} zeros all call-used registers that pass
>> +parameters.  These 3 choices are used for ROP mitigation.
>> +
>> +@samp{used-gpr} zeros call-used general purpose registers
>> +which are used in function.  @samp{all-gpr} zeros all
>> +call-used registers.  @samp{used} zeros call-used registers which
>> +are used in function.  @samp{all} zeros all call-used registers.
>> +These 4 choices are used for preventing information leak through
>> +registers.
> 
> The description for all-gpr doesn't look right.
Oops. Will fix it.

>  I think it would
> be easier to describe (and hopefully to follow) if we start with
> the three basic choices: “skip”, “used” and “all”.  Then describe
> how “used” and “all” can be modified by adding “-gpr” to limit the
> clearing to general-purpose registers and “-arg” to limit the
> clearing to argument registers.
> 
> We need to say what “call-used” and “used” mean in this context.
> In particular, “call-used” is also known as “call-clobbered”,
> “caller-saved” and “volatile”, so it would be good to list those
> as alternatives.  We need to say what “used” registers are.

Okay.

>> 
>> +@item -fzero-call-used-regs=@var{choice}
>> +@opindex fzero-call-used-regs
>> +Zero call-used registers at function return to increase the program
>> +security by either mitigating Return-Oriented Programming (ROP) or
>> +preventing information leak through registers.
>> +
>> +@samp{skip}, which is the default, doesn't zero call-used registers.
>> +
>> +@samp{used-gpr-arg} zeros used call-used general purpose registers that
>> +pass parameters. @samp{used-arg} zeros used call-used registers that
>> +pass parameters. @samp{all-arg} zeros all call-used registers that pass
>> +parameters.  These 3 choices are used for ROP mitigation.
>> +
>> +@samp{used-gpr} zeros call-used general purpose registers
>> +which are used in function.  @samp{all-gpr} zeros all
>> +call-used registers.  @samp{used} zeros call-used registers which
>> +are used in function.  @samp{all} zeros all call-used registers.
>> +These 4 choices are used for preventing information leak through
>> +registers.
> 
> Same comment here.

Okay.

> 
>> @@ -310,6 +310,9 @@ struct GTY(()) rtl_data {
>>     sets them.  */
>>  HARD_REG_SET asm_clobbers;
>> 
>> +  /* All hard registers that are zeroed at the return of the routine.  */
>> +  HARD_REG_SET zeroed_reg_set;
> 
> How about “must_be_zero_on_return”?  “zeroed_reg_set” isn't very
> specific about where the zeroing happens or is needed.  E.g. we also
> zero uninitialised registers.
Okay.

> 
>> +{
>> +  basic_block bb = BLOCK_FOR_INSN (ret);
>> +  auto_bitmap live_out;
>> +  bitmap_copy (live_out, df_get_live_out (bb));
> 
> Sorry, forgot that here we should do:
> 
>  df_simulate_initialize_backwards (bb, live_out);
> 
> But we should calculate this set once per return instruction rather
> than repeat the calculation for every register.

Okay.
> 
>> 
>> +  /* If gpr_only is true, only zero call-used-registers that are
>> +     general-purpose registers; if used_only is true, only zero
>> +     call-used-registers that are used in the current function.  */
>> +
>> +  switch (zero_regs_type)
>> +    {
>> +      case zero_call_used_regs_used_arg:
>> +	gpr_only = false;
>> +	break;
>> +      case zero_call_used_regs_all_arg:
>> +	gpr_only = false;
>> +	used_only = false;
>> +	break;
>> +      case zero_call_used_regs_used_gpr:
>> +	arg_only = false;
>> +	break;
>> +      case zero_call_used_regs_all_gpr:
>> +	used_only = false;
>> +	arg_only = false;
>> +	break;
>> +      case zero_call_used_regs_used:
>> +	gpr_only = false;
>> +	arg_only = false;
>> +	break;
>> +      case zero_call_used_regs_all:
>> +	gpr_only = false;
>> +	used_only = false;
>> +	arg_only = false;
>> +	break;
>> +      default:
>> +	break;
>> +    }
> 
> Using a bitmask would also simplify this.
agreed.
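With flags, the switch above collapses to three independent bit tests. A sketch under a hypothetical ZCUR_* encoding (illustrative names, not from the patch), producing the same three booleans that gen_call_used_regs_seq uses:

```cpp
#include <cassert>

// Hypothetical flag bits (illustrative names).
const unsigned ZCUR_ONLY_USED = 1u << 1;
const unsigned ZCUR_ONLY_GPR  = 1u << 2;
const unsigned ZCUR_ONLY_ARG  = 1u << 3;
const unsigned ZCUR_ENABLED   = 1u << 4;

// The three restrictions that the switch statement computes.
struct zcur_restrictions
{
  bool used_only;
  bool gpr_only;
  bool arg_only;
};

// Each restriction is a single bit test, so there is no per-choice
// case list to keep in sync with the option parser.
zcur_restrictions
decode_restrictions (unsigned flags)
{
  // Only meaningful once zeroing has been requested.
  assert (flags & ZCUR_ENABLED);
  zcur_restrictions r;
  r.used_only = (flags & ZCUR_ONLY_USED) != 0;
  r.gpr_only = (flags & ZCUR_ONLY_GPR) != 0;
  r.arg_only = (flags & ZCUR_ONLY_ARG) != 0;
  return r;
}
```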

> 
>> +
>> +  /* For each of the hard registers, check to see whether we should zero it if:
>> +     1. it is a call-used-registers;
>> + and 2. it is not a fixed-registers;
>> + and 3. it is not live at the return of the routine;
>> + and 4. it is general registor if gpr_only is true;
>> + and 5. it is used in the routine if used_only is true;
>> + and 6. it is a register that passes parameter if arg_only is true;
>> +   */
>> +
>> +  HARD_REG_SET need_zeroed_hardregs;
>> +  CLEAR_HARD_REG_SET (need_zeroed_hardregs);
>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +    {
>> +      if (!this_target_hard_regs->x_call_used_regs[regno])
>> +	continue;
> 
> This should use crtl->abi instead.  The set of call-used registers
> can vary from function to function.

You mean to use:

if (!crtl->abi->clobbers_full_reg_p (regno))

?
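A toy model of the six selection criteria in the patch comment, using std::bitset in place of HARD_REG_SET. The register count, the zcur_inputs structure and its field names are hypothetical; in GCC the call-used set would come from crtl->abi as Richard suggests, and the live set would be computed once per return insn (starting from df_get_live_out plus df_simulate_initialize_backwards) and then reused for every register:

```cpp
#include <bitset>
#include <cassert>

// Toy register count; a stand-in for FIRST_PSEUDO_REGISTER.
const unsigned NUM_REGS = 16;
typedef std::bitset<NUM_REGS> regset;

// Hypothetical per-function inputs.
struct zcur_inputs
{
  regset call_used;   // clobbered by the function's ABI
  regset fixed;       // fixed registers (stack pointer etc.)
  regset gpr;         // general-purpose registers
  regset arg;         // registers used to pass arguments
  regset ever_live;   // registers used in the function
};

// Select the registers to zero before one return.  LIVE_AT_RETURN is
// computed once per return by the caller, not once per register.
regset
select_regs_to_zero (const zcur_inputs &in, const regset &live_at_return,
                     bool gpr_only, bool used_only, bool arg_only)
{
  // Criteria 1-3: call-used, not fixed, not live at the return.
  regset sel = in.call_used & ~in.fixed & ~live_at_return;
  // Criteria 4-6 apply only when the chosen option asks for them.
  if (gpr_only)
    sel &= in.gpr;
  if (used_only)
    sel &= in.ever_live;
  if (arg_only)
    sel &= in.arg;
  return sel;
}
```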


> 
>> +static unsigned int
>> +rest_of_zero_call_used_regs (void)
>> +{
>> +  basic_block bb;
>> +  rtx_insn *insn;
>> +
>> +  /* This pass needs data flow information.  */
>> +  df_analyze ();
>> +
>> +  /* Search all the "return"s in the routine, and insert instruction sequence to
>> +     zero the call used registers.  */
>> +  FOR_EACH_BB_REVERSE_FN (bb, cfun)
>> +    if (bb == EXIT_BLOCK_PTR_FOR_FN (cfun)
>> +	|| (single_succ_p (bb)
>> +	    && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun)))
>> +      FOR_BB_INSNS_REVERSE (bb, insn)
>> +	if (JUMP_P (insn) && ANY_RETURN_P (JUMP_LABEL (insn)))
>> +	  {
>> +	    /* Now we can insert the instruction sequence to zero the call used
>> +	       registers before this insn.  */
>> +	    gen_call_used_regs_seq (insn);
>> +	    break;
>> +	  }
> 
> The exit block never has instructions, so it's only necessary to process
> predecessors.  A simpler way to do that is to iterate over the edges in:
> 
>  EXIT_BLOCK_PTR_FOR_FN (cfun)->preds
> 
> You shouldn't need to use FOR_BB_INSNS_REVERSE: it should be enough
> to check only BB_END (bb), since returns always end blocks.

Something like the following?

  edge e;
  edge_iterator ei;
  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
    {
      rtx_insn *insn = BB_END (e->src);
      if (JUMP_P (insn) && ANY_RETURN_P (JUMP_LABEL (insn)))
        /* Insert the instruction sequence to zero the call-used
           registers before this return.  No "break": every
           predecessor of the exit block may end in a return.  */
        gen_call_used_regs_seq (insn);
    }
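The loop shape can be checked on a toy CFG. This is a hypothetical model (basic_block_t and insn_kind are not GCC types): it visits every predecessor of the exit block, inspects only the last insn because returns always end blocks, and keeps going after the first return, since a function may have several:

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for insns and basic blocks (not GCC types).
enum class insn_kind { other, return_jump };

struct basic_block_t
{
  std::vector<insn_kind> insns;   // a return, if any, is always last
};

// Call HANDLE_RETURN for every exit predecessor that ends in a return
// and return how many were handled.
template <typename Fn>
unsigned
for_each_return (const std::vector<const basic_block_t *> &exit_preds,
                 Fn handle_return)
{
  unsigned n = 0;
  for (const basic_block_t *bb : exit_preds)
    {
      if (bb->insns.empty ())
        continue;
      // Only the block's final insn can be a return.
      if (bb->insns.back () == insn_kind::return_jump)
        {
          handle_return (bb);
          ++n;
        }
    }
  return n;
}
```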
> 
>> +
>> +  return 0;
>> +}
>> +
>> +namespace {
>> +
>> +const pass_data pass_data_zero_call_used_regs =
>> +{
>> +  RTL_PASS, /* type */
>> +  "zero_call_used_regs", /* name */
>> +  OPTGROUP_NONE, /* optinfo_flags */
>> +  TV_NONE, /* tv_id */
>> +  0, /* properties_required */
>> +  0, /* properties_provided */
>> +  0, /* properties_destroyed */
>> +  0, /* todo_flags_start */
>> +  0, /* todo_flags_finish */
>> +};
>> +
>> +class pass_zero_call_used_regs: public rtl_opt_pass
>> +{
>> +public:
>> +  pass_zero_call_used_regs (gcc::context *ctxt)
>> +    : rtl_opt_pass (pass_data_zero_call_used_regs, ctxt)
>> +  {}
>> +
>> +  /* opt_pass methods: */
>> +  virtual bool gate (function *)
>> +    {
>> +      return flag_zero_call_used_regs > zero_call_used_regs_unset;
> 
> I think this also needs to check the function attributes.

Okay.
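A sketch of a gate that also respects the attribute. It assumes, as an illustration, that an explicit zero_call_used_regs attribute overrides -fzero-call-used-regs, and uses a hypothetical flag encoding where 0 means unset; in GCC the gate receives the function, so the attribute could be looked up on its decl:

```cpp
#include <cassert>

// Hypothetical encoding: 0 means "not specified"; SKIP is one bit.
const unsigned ZCUR_UNSET = 0;
const unsigned ZCUR_SKIP = 1u << 0;

// Effective choice for one function, assuming the attribute, when
// present, takes precedence over the command-line option.
unsigned
effective_zcur_choice (unsigned cmdline_flags, bool has_attr,
                       unsigned attr_flags)
{
  return has_attr ? attr_flags : cmdline_flags;
}

// Run the pass only if the effective choice asks for zeroing.
bool
zcur_gate (unsigned cmdline_flags, bool has_attr, unsigned attr_flags)
{
  unsigned eff = effective_zcur_choice (cmdline_flags, has_attr, attr_flags);
  return eff != ZCUR_UNSET && !(eff & ZCUR_SKIP);
}
```

Checking only the global flag would miss functions that request zeroing through the attribute while the option is left at its default.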
> 
>> +    }
>> +  virtual unsigned int execute (function *)
>> +    {
>> +      return rest_of_zero_call_used_regs ();
>> +    }
>> +
>> +}; // class pass_zero_call_used_regs
>> +
>> +} // anon namespace
>> +
>> +rtl_opt_pass *
>> +make_pass_zero_call_used_regs (gcc::context *ctxt)
>> +{
>> +  return new pass_zero_call_used_regs (ctxt);
>> +}
>> 
>> /* If CONSTRAINT is a matching constraint, then return its number.
>>   Otherwise, return -1.  */
>> diff --git a/gcc/optabs.c b/gcc/optabs.c
>> index 8ad7f4b..57e5c5d 100644
>> --- a/gcc/optabs.c
>> +++ b/gcc/optabs.c
>> @@ -6484,6 +6484,49 @@ expand_memory_blockage (void)
>>    expand_asm_memory_blockage ();
>> }
>> 
>> +/* Generate asm volatile("" : : : "memory") as a memory blockage, at the
>> +   same time clobbering the register set specified by ZEROED_REGS.  */
>> +
>> +void
>> +expand_asm_reg_clobber_mem_blockage (HARD_REG_SET zeroed_regs)
> 
> Just “regs”: the interface is more general than registers that are being
> zeroed.
Okay.

> 
>> +{
>> +  rtx asm_op, clob_mem, clob_reg;
>> +
>> +  unsigned int num_of_regs = 0;
>> +  for (unsigned int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>> +    if (TEST_HARD_REG_BIT (zeroed_regs, i))
>> +      num_of_regs++;
>> +
>> +  if (num_of_regs == 0)
>> +    return;
> 
> For this interface, I think we should continue and just include
> a memory clobber.

Right.

> 
>> +	RTVEC_ELT (v,j) = clob_reg;
> 
> …IMO it would be more readable as just:
> 
> 	RTVEC_ELT (v, j) = gen_rtx_CLOBBER (VOIDmode, regno_reg_rtx[i]);
Okay.

> 
>> diff --git a/gcc/recog.c b/gcc/recog.c
>> index ce83b7f..472c2dc 100644
>> --- a/gcc/recog.c
>> +++ b/gcc/recog.c
>> @@ -923,6 +923,21 @@ validate_simplify_insn (rtx_insn *insn)
>>  return ((num_changes_pending () > 0) && (apply_change_group () > 0));
>> }
>> 
>> 
>> +
>> +bool
>> +valid_insn_p (rtx_insn *insn)
> 
> This should have a function comment.  E.g.:
> 
> /* Check whether INSN matches a specific alternative of an .md pattern.  */
Okay.

> 
>> 
>> +/* Generate instruction sequence to zero call used registers.  */
>> +DEFHOOK
>> +(zero_call_used_regs,
>> + "This target hook emits instructions to zero registers specified\n\
>> +by @var{need_zeroed_hardregs} at function return, at the same time\n\
>> +return the hard register set that are actually zeroed by the hook\n\
>> +Define this hook if the target has more effecient instructions to\n\
>> +zero call-used registers, or if the target only tries to zero a subset\n\
>> +of @var{need_zeroed_hardregs}.\n\
>> +If the hook is not defined, the default_zero_call_used_reg will be used.",
>> + HARD_REG_SET, (HARD_REG_SET need_zeroed_hardregs),
> 
> I'd suggest:
> 
> "Emit instructions to zero the subset of @var{selected_regs} that\n\
> could conceivably contain values that are useful to an attacker.\n\
> Return the set of registers that were actually cleared.\n\
> \n\
> The default implementation uses normal move instructions to zero\n\
> all the registers in @var{selected_regs}.  Define this hook if the\n\
> target has more efficient ways of zeroing certain registers,\n\
> or if you believe that certain registers would never contain\n\
> values that are useful to an attacker."
> 
> with the parameter called “selected_regs” instead of
> “need_zeroed_hardregs”.  (“need” suggests that the target
> doesn't have the option of not zeroing.)

Okay.

> 
>> 
>> +/* The default hook for TARGET_ZERO_CALL_USED_REGS.  */
>> +
>> +HARD_REG_SET
>> +default_zero_call_used_regs (HARD_REG_SET need_zeroed_hardregs)
>> +{
>> +  HARD_REG_SET zeroed_hardregs;
>> +  gcc_assert (!hard_reg_set_empty_p (need_zeroed_hardregs));
>> +
>> +  CLEAR_HARD_REG_SET (zeroed_hardregs);
>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> +    if (TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>> +      {
>> +	rtx_insn *last_insn = get_last_insn ();
>> +	machine_mode mode = GET_MODE (regno_reg_rtx[regno]);
>> +	rtx zero = CONST0_RTX (mode);
>> +	rtx_insn *insn = emit_move_insn (regno_reg_rtx[regno], zero);
>> +	if (!valid_insn_p (insn))
>> +	  {
>> +	    static bool issued_error;
>> +	    if (!issued_error)
>> +	      {
>> +		issued_error = true;
>> +		sorry ("-fzero-call-used-regs not supported on this target");
> 
> Should be "%qs not supported on this target", with the option name
> as a second argument.

Okay.

> 
>> +	      }
>> +	    delete_insns_since (last_insn);
>> +	  }
>> +	else
>> +	  SET_HARD_REG_BIT (zeroed_hardregs, regno);
>> +      }
>> +  return zeroed_hardregs;
> 
> I don't think it's worth building up a different return set.  The only
> time it's different from need_zeroed_hardregs is if we emit the sorry,
> which will cause compilation to fail anyway.

Okay.
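A standalone model of the simplified default hook. The callback parameters are hypothetical stand-ins for emit_move_insn plus valid_insn_p and for sorry (); the real hook would issue sorry ("%qs not supported on this target", "-fzero-call-used-regs"). The model returns selected_regs unchanged and reports the failure only once:

```cpp
#include <bitset>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

const unsigned NUM_REGS = 16;   // stand-in for FIRST_PSEUDO_REGISTER
typedef std::bitset<NUM_REGS> regset;

// EMIT_ZERO tries to zero REGNO and reports whether the target accepted
// the move; DIAGS collects diagnostics in place of sorry ().
regset
default_zero_call_used_regs_model (const regset &selected_regs,
                                   const std::function<bool (unsigned)> &emit_zero,
                                   std::vector<std::string> &diags)
{
  assert (selected_regs.any ());
  // Local here for testability; the patch uses a static flag so the
  // message is issued once per compilation.
  bool issued_error = false;
  for (unsigned regno = 0; regno < NUM_REGS; regno++)
    if (selected_regs.test (regno) && !emit_zero (regno) && !issued_error)
      {
        issued_error = true;
        diags.push_back ("-fzero-call-used-regs not supported on this target");
      }
  // A failed move already produced a diagnostic that stops compilation,
  // so no separate "actually zeroed" set is built.
  return selected_regs;
}
```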

> 
> Sorry, I ran out of time to review the tests, but the code part
> otherwise looks good.

Thanks.

Qing
> 
> Thanks,
> Richard



* Re: [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
  2020-10-20 20:04       ` Qing Zhao
@ 2020-10-21  7:18         ` Uros Bizjak
  2020-10-21  8:03           ` Uros Bizjak
  0 siblings, 1 reply; 20+ messages in thread
From: Uros Bizjak @ 2020-10-21  7:18 UTC (permalink / raw)
  To: Qing Zhao
  Cc: richard Sandiford, kees Cook, rodriguez Bahena Victor, H.J. Lu,
	segher Boessenkool, gcc-patches Kees Cook via

On Tue, Oct 20, 2020 at 10:04 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:

> +/* Check whether the register REGNO should be zeroed on X86.
> +   When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
> +   together, no need to zero it again.
> +   Stack registers (st0-st7) and mm0-mm7 are aliased with each other.
> +   very hard to be zeroed individually, don't zero individual st or
> +   mm registgers at this time.  */
> +
> +static bool
> +zero_call_used_regno_p (const unsigned int regno,
> + bool all_sse_zeroed)
> +{
> +  return GENERAL_REGNO_P (regno)
> +  || (!all_sse_zeroed && SSE_REGNO_P (regno))
> +  || MASK_REGNO_P (regno);
> +}
> +
> +/* Return the machine_mode that is used to zero register REGNO.  */
> +
> +static machine_mode
> +zero_call_used_regno_mode (const unsigned int regno)
> +{
> +  /* NB: We only need to zero the lower 32 bits for integer registers
> +     and the lower 128 bits for vector registers since destination are
> +     zero-extended to the full register width.  */
> +  if (GENERAL_REGNO_P (regno))
> +    return SImode;
> +  else if (SSE_REGNO_P (regno))
> +    return V4SFmode;
> +  else
> +    return HImode;
> +}
> +
> +/* Generate a rtx to zero all vector registers togetehr if possible,
> +   otherwise, return NULL.  */
> +
> +static rtx
> +zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  if (!TARGET_AVX)
> +    return NULL;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> +  || (TARGET_64BIT
> +      && (REX_SSE_REGNO_P (regno)
> +  || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +      return NULL;
> +
> +  return gen_avx_vzeroall ();
> +}
> +
> +/* Generate a rtx to zero all st and mm registers togetehr if possible,
> +   otherwise, return NULL.  */
> +
> +static rtx
> +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  if (!TARGET_MMX)
> +    return NULL;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
> + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +      return NULL;
> +
> +  return gen_mmx_emms ();
>
>
> emms is not clearing any register, it only loads x87FPUTagWord with
> FFFFH. So I think, the above is useless, as far as register clearing
> is concerned.
>
>
> Thanks for the info.
>
> So, for mm and st registers, should we clear them, and how?
>
>
> I don't know.
>
> Please note that %mm and %st share the same register file, and
> touching %mm registers will block access to %st until emms is emitted.
> You can't just blindly load 0 to %st registers, because the register
> file can be in MMX mode and vice versa. For 32-bit targets, a function
> can also return a value in %mm0.
>
>
> If data flow determines that %mm0 does not hold a return value at the return, can we clear all the %st registers as follows:
>
> emms
> mov %st0, 0
> mov %st1, 0
> mov %st2, 0
> mov %st3, 0
> mov %st4, 0
> mov %st5, 0
> mov %st6, 0
> mov %st7, 0

The i386 ABI says:

-- q --
The CPU shall be in x87 mode upon entry to a function. Therefore,
every function that uses the MMX registers is required to issue an
emms or femms instruction after using MMX registers, before returning
or calling another function.
-- /q --

(The above requirement slightly contradicts the ABI itself, since there
are 3 MMX argument registers and an MMX return register, so the CPU
obviously can't be in x87 mode at all function boundaries.)

So, assuming that the first sentence is not deliberately vague w.r.t.
function exit, emms should not be needed. However, we are dealing with
x87 stack registers, which have their own set of peculiarities. It is
not possible to load an arbitrary stack register the way you show.  Also,
the stack should be either empty or one level deep (two in the case of a
complex value return) at function return. I think you want a series
of 8 (or 7, or 6) fldz insns, matching the stack depth, followed by the
same number of fstp insns to clear the stack and mark the stack slots empty.
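To illustrate the point above, here is a toy C++ model of the x87 register stack. It is an illustration only (hypothetical names; no exceptions, precision control or real tag encoding). It shows that emms merely retags slots as empty, that popping likewise leaves old bits in the physical register, and that pushing 8 - depth zeros with fldz and popping them with fstp overwrites every slot not holding a return value:

```cpp
#include <array>
#include <cassert>

// Eight physical registers, a TOP pointer and a per-slot "empty" tag.
struct x87_model
{
  std::array<double, 8> phys{};   // physical registers R0..R7
  std::array<bool, 8> empty;      // tag word: true = empty
  unsigned top = 0;

  x87_model () { empty.fill (true); }

  // fld: push V.  Pushing onto a non-empty slot is a stack overflow,
  // reported here by returning false.
  bool fld (double v)
  {
    top = (top + 7) % 8;
    if (!empty[top])
      return false;
    phys[top] = v;
    empty[top] = false;
    return true;
  }

  bool fldz () { return fld (0.0); }

  // fstp %st(0): store ST(0) to itself, then pop.  Popping only tags the
  // slot empty; the old value stays in the physical register.
  void fstp_st0 ()
  {
    empty[top] = true;
    top = (top + 1) % 8;
  }

  // emms: tag every slot empty without touching the contents.
  void emms () { empty.fill (true); }
};

// Zero every slot not occupied by the DEPTH live return values, then pop
// the zeros so the stack depth is unchanged.
bool
clear_x87_stack (x87_model &fpu, unsigned depth)
{
  assert (depth <= 2);
  unsigned n = 8 - depth;
  for (unsigned i = 0; i < n; i++)
    if (!fpu.fldz ())
      return false;
  for (unsigned i = 0; i < n; i++)
    fpu.fstp_st0 ();
  return true;
}
```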

Uros.


* Re: [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
  2020-10-21  7:18         ` Uros Bizjak
@ 2020-10-21  8:03           ` Uros Bizjak
  2020-10-21 14:45             ` Qing Zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Uros Bizjak @ 2020-10-21  8:03 UTC (permalink / raw)
  To: Qing Zhao
  Cc: richard Sandiford, kees Cook, rodriguez Bahena Victor, H.J. Lu,
	segher Boessenkool, gcc-patches Kees Cook via

On Wed, Oct 21, 2020 at 9:18 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Tue, Oct 20, 2020 at 10:04 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
> > +/* Check whether the register REGNO should be zeroed on X86.
> > +   When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
> > +   together, no need to zero it again.
> > +   Stack registers (st0-st7) and mm0-mm7 are aliased with each other.
> > +   very hard to be zeroed individually, don't zero individual st or
> > +   mm registgers at this time.  */
> > +
> > +static bool
> > +zero_call_used_regno_p (const unsigned int regno,
> > + bool all_sse_zeroed)
> > +{
> > +  return GENERAL_REGNO_P (regno)
> > +  || (!all_sse_zeroed && SSE_REGNO_P (regno))
> > +  || MASK_REGNO_P (regno);
> > +}
> > +
> > +/* Return the machine_mode that is used to zero register REGNO.  */
> > +
> > +static machine_mode
> > +zero_call_used_regno_mode (const unsigned int regno)
> > +{
> > +  /* NB: We only need to zero the lower 32 bits for integer registers
> > +     and the lower 128 bits for vector registers since destination are
> > +     zero-extended to the full register width.  */
> > +  if (GENERAL_REGNO_P (regno))
> > +    return SImode;
> > +  else if (SSE_REGNO_P (regno))
> > +    return V4SFmode;
> > +  else
> > +    return HImode;
> > +}
> > +
> > +/* Generate a rtx to zero all vector registers togetehr if possible,
> > +   otherwise, return NULL.  */
> > +
> > +static rtx
> > +zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
> > +{
> > +  if (!TARGET_AVX)
> > +    return NULL;
> > +
> > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> > +  || (TARGET_64BIT
> > +      && (REX_SSE_REGNO_P (regno)
> > +  || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> > + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> > +      return NULL;
> > +
> > +  return gen_avx_vzeroall ();
> > +}
> > +
> > +/* Generate a rtx to zero all st and mm registers togetehr if possible,
> > +   otherwise, return NULL.  */
> > +
> > +static rtx
> > +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
> > +{
> > +  if (!TARGET_MMX)
> > +    return NULL;
> > +
> > +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> > +    if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
> > + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> > +      return NULL;
> > +
> > +  return gen_mmx_emms ();
> >
> >
> > emms is not clearing any register, it only loads x87FPUTagWord with
> > FFFFH. So I think, the above is useless, as far as register clearing
> > is concerned.
> >
> >
> > Thanks for the info.
> >
> > So, for mm and st registers, should we clear them, and how?
> >
> >
> > I don't know.
> >
> > Please note that %mm and %st share the same register file, and
> > touching %mm registers will block access to %st until emms is emitted.
> > You can't just blindly load 0 to %st registers, because the register
> > file can be in MMX mode and vice versa. For 32-bit targets, a function
> > can also return a value in %mm0.
> >
> >
> > If data flow determines that %mm0 does not hold a return value at the return, can we clear all the %st registers as follows:
> >
> > emms
> > mov %st0, 0
> > mov %st1, 0
> > mov %st2, 0
> > mov %st3, 0
> > mov %st4, 0
> > mov %st5, 0
> > mov %st6, 0
> > mov %st7, 0
>
> The i386 ABI says:
>
> -- q --
> The CPU shall be in x87 mode upon entry to a function. Therefore,
> every function that uses the MMX registers is required to issue an
> emms or femms instruction after using MMX registers, before returning
> or calling another function.
> -- /q --
>
> (The above requirement slightly contradicts the ABI itself, since there
> are 3 MMX argument registers and an MMX return register, so the CPU
> obviously can't be in x87 mode at all function boundaries.)
>
> So, assuming that the first sentence is not deliberately vague w.r.t.
> function exit, emms should not be needed. However, we are dealing with
> x87 stack registers, which have their own set of peculiarities. It is
> not possible to load an arbitrary stack register the way you show.  Also,
> the stack should be either empty or one level deep (two in the case of a
> complex value return) at function return. I think you want a series
> of 8 (or 7, or 6) fldz insns, matching the stack depth, followed by the
> same number of fstp insns to clear the stack and mark the stack slots empty.

Something like this:

--cut here--
#include <stdio.h>

long double
__attribute__ ((noinline))
test (long double a, long double b)
{
  long double r = a + b;

  asm volatile ("fldz;                \
        fldz;                \
        fldz;                \
        fldz;                \
        fldz;                \
        fldz;                \
        fldz;                \
        fstp %%st(0);            \
        fstp %%st(0);            \
        fstp %%st(0);            \
        fstp %%st(0);            \
        fstp %%st(0);            \
        fstp %%st(0);            \
        fstp %%st(0)" : : "X"(r));
  return r;
}

int
main ()
{
  long double a = 1.1, b = 1.2;

  long double c = test (a, b);

  printf ("%Lf\n", c);

  return 0;
}
--cut here--

Uros.


* Re: [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
  2020-10-21  8:03           ` Uros Bizjak
@ 2020-10-21 14:45             ` Qing Zhao
  2020-10-21 16:09               ` Uros Bizjak
  2020-10-22 14:46               ` Qing Zhao
  0 siblings, 2 replies; 20+ messages in thread
From: Qing Zhao @ 2020-10-21 14:45 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: richard Sandiford, kees Cook, rodriguez Bahena Victor, H.J. Lu,
	segher Boessenkool, gcc-patches Kees Cook via



> On Oct 21, 2020, at 3:03 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
> On Wed, Oct 21, 2020 at 9:18 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>> 
>> On Tue, Oct 20, 2020 at 10:04 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>>> +/* Check whether the register REGNO should be zeroed on X86.
>>> +   When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
>>> +   together, no need to zero it again.
>>> +   Stack registers (st0-st7) and mm0-mm7 are aliased with each other.
>>> +   They are very hard to zero individually, so don't zero individual
>>> +   st or mm registers at this time.  */
>>> +
>>> +static bool
>>> +zero_call_used_regno_p (const unsigned int regno,
>>> + bool all_sse_zeroed)
>>> +{
>>> +  return GENERAL_REGNO_P (regno)
>>> +  || (!all_sse_zeroed && SSE_REGNO_P (regno))
>>> +  || MASK_REGNO_P (regno);
>>> +}
>>> +
>>> +/* Return the machine_mode that is used to zero register REGNO.  */
>>> +
>>> +static machine_mode
>>> +zero_call_used_regno_mode (const unsigned int regno)
>>> +{
>>> +  /* NB: We only need to zero the lower 32 bits for integer registers
>>> +     and the lower 128 bits for vector registers since the destinations
>>> +     are zero-extended to the full register width.  */
>>> +  if (GENERAL_REGNO_P (regno))
>>> +    return SImode;
>>> +  else if (SSE_REGNO_P (regno))
>>> +    return V4SFmode;
>>> +  else
>>> +    return HImode;
>>> +}
>>> +
>>> +/* Generate an rtx to zero all vector registers together if possible,
>>> +   otherwise, return NULL.  */
>>> +
>>> +static rtx
>>> +zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
>>> +{
>>> +  if (!TARGET_AVX)
>>> +    return NULL;
>>> +
>>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>> +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
>>> +  || (TARGET_64BIT
>>> +      && (REX_SSE_REGNO_P (regno)
>>> +  || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
>>> + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>>> +      return NULL;
>>> +
>>> +  return gen_avx_vzeroall ();
>>> +}
>>> +
>>> +/* Generate an rtx to zero all st and mm registers together if possible,
>>> +   otherwise, return NULL.  */
>>> +
>>> +static rtx
>>> +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
>>> +{
>>> +  if (!TARGET_MMX)
>>> +    return NULL;
>>> +
>>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>> +    if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
>>> + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
>>> +      return NULL;
>>> +
>>> +  return gen_mmx_emms ();
>>> 
>>> 
>>> emms is not clearing any register, it only loads x87FPUTagWord with
>>> FFFFH. So I think, the above is useless, as far as register clearing
>>> is concerned.
>>> 
>>> 
>>> Thanks for the info.
>>> 
>>> So, for mm and st registers, should we clear them, and how?
>>> 
>>> 
>>> I don't know.
>>> 
>>> Please note that %mm and %st share the same register file, and
>>> touching %mm registers will block access to %st until emms is emitted.
>>> You can't just blindly load 0 to %st registers, because the register
>>> file can be in MMX mode and vice versa. For 32bit targets, function
>>> can also  return a value in the %mm0.
>>> 
>>> 
>>> If data flow determine that %mm0 does not return a value at the return, can we clear all the %st as following:
>>> 
>>> emms
>>> mov %st0, 0
>>> mov %st1, 0
>>> mov %st2, 0
>>> mov %st3, 0
>>> mov %st4, 0
>>> mov %st5, 0
>>> mov %st6, 0
>>> mov %st7, 0
>> 
>> The i386 ABI says:
>> 
>> -- q --
>> The CPU shall be in x87 mode upon entry to a function. Therefore,
>> every function that uses the MMX registers is required to issue an
>> emms or femms instruction after using MMX registers, before returning
>> or calling another function.
>> -- /q --
>> 
>> (The above requirement slightly contradicts its own ABI, since we have
>> 3 MMX argument registers and MMX return register, so the CPU obviously
>> can't be in x87 mode at all function boundaries).
>> 
>> So, assuming that the first sentence is not deliberately vague w.r.t
>> function exit, emms should not be needed. However, we are dealing with
>> x87 stack registers that have their own set of peculiarities. It is
>> not possible to load a random register in the way you show.  Also,
>> stack should be either empty or one (two in case of complex value
>> return) levels deep at the function return. I think you want a series
>> of 8 or 7(6) fldz insns, followed by a series of fstp insn to clear
>> the stack and mark stack slots empty.
> 
> Something like this:
> 
> --cut here--
> long double
> __attribute__ ((noinline))
> test (long double a, long double b)
> {
>  long double r = a + b;
> 
>  asm volatile ("fldz;                \
>        fldz;                \
>        fldz;                \
>        fldz;                \
>        fldz;                \
>        fldz;                \
>        fldz;                \
>        fstp %%st(0);            \
>        fstp %%st(0);            \
>        fstp %%st(0);            \
>        fstp %%st(0);            \
>        fstp %%st(0);            \
>        fstp %%st(0);            \
>        fstp %%st(0)" : : "X"(r));
>  return r;
> }
> 
> int
> main ()
> {
>  long double a = 1.1, b = 1.2;
> 
>  long double c = test (a, b);
> 
>  printf ("%Lf\n", c);
> 
>  return 0;
> }
> --cut here--


Okay, so,

1. First compute how many st registers need to be zeroed,  num_of_zeroed_st
2. Then issue (8 - num_of_zeroed_st) fldz to push 0 to the stack to clear all the dead stack slots;
3. Then issue (8 - num_of_zeroed_st) fstp %st(0) to pop the stack and empty the stack.

Is my understanding above correct?

Another thought is:

It looks very complicated to use the st/mm register set correctly, so
I assume that this set of registers might be very hard for an attacker to use correctly.
Right?

thanks.

Qing
> 
> Uros.



* Re: [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
  2020-10-20 21:47   ` Qing Zhao
@ 2020-10-21 15:47     ` Richard Sandiford
  2020-10-21 16:52       ` Qing Zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Sandiford @ 2020-10-21 15:47 UTC (permalink / raw)
  To: Qing Zhao
  Cc: Uros Bizjak, kees Cook, rodriguez Bahena Victor, H.J. Lu,
	segher Boessenkool, gcc-patches Kees Cook via

Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>> +  /* For each of the hard registers, check to see whether we should zero it if:
>>> +     1. it is a call-used register;
>>> + and 2. it is not a fixed register;
>>> + and 3. it is not live at the return of the routine;
>>> + and 4. it is a general register if gpr_only is true;
>>> + and 5. it is used in the routine if used_only is true;
>>> + and 6. it is a register that passes parameters if arg_only is true;
>>> +   */
>>> +
>>> +  HARD_REG_SET need_zeroed_hardregs;
>>> +  CLEAR_HARD_REG_SET (need_zeroed_hardregs);
>>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>> +    {
>>> +      if (!this_target_hard_regs->x_call_used_regs[regno])
>>> +	continue;
>> 
>> This should use crtl->abi instead.  The set of call-used registers
>> can vary from function to function.
>
> You mean to use:
>
> if (!crtl->abi->clobbers_full_reg_p(regno))
>
> ?

Yeah, that's right.  (But with a space before “(regno)” :-))

>>> +static unsigned int
>>> +rest_of_zero_call_used_regs (void)
>>> +{
>>> +  basic_block bb;
>>> +  rtx_insn *insn;
>>> +
>>> +  /* This pass needs data flow information.  */
>>> +  df_analyze ();
>>> +
>>> +  /* Search all the "return"s in the routine, and insert instruction sequence to
>>> +     zero the call used registers.  */
>>> +  FOR_EACH_BB_REVERSE_FN (bb, cfun)
>>> +    if (bb == EXIT_BLOCK_PTR_FOR_FN (cfun)
>>> +	|| (single_succ_p (bb)
>>> +	    && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun)))
>>> +      FOR_BB_INSNS_REVERSE (bb, insn)
>>> +	if (JUMP_P (insn) && ANY_RETURN_P (JUMP_LABEL (insn)))
>>> +	  {
>>> +	    /* Now we can insert the instruction sequence to zero the call used
>>> +	       registers before this insn.  */
>>> +	    gen_call_used_regs_seq (insn);
>>> +	    break;
>>> +	  }
>> 
>> The exit block never has instructions, so it's only necessary to process
>> predecessors.  A simpler way to do that is to iterate over the edges in:
>> 
>>  EXIT_BLOCK_PTR_FOR_FN (cfun)->preds
>> 
>> You shouldn't need to use FOR_BB_INSNS_REVERSE: it should be enough
>> to check only BB_END (bb), since returns always end blocks.
>
> Something like the following?
>
>   FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
>     {
>      insn = BB_END (e->src);
>       if (JUMP_P (insn) && ANY_RETURN_P (JUMP_LABEL (insn)))
>         {
> 	  /* Now we can insert the instruction sequence to zero the call used
> 	       registers before this insn.  */
> 	    gen_call_used_regs_seq (insn);
> 	    break;       
>         }
>       }

With this you don't want/need the break, since it would break out
of the edge traversal (instead of the FOR_BB_INSNS_REVERSE, as above).
Also, I think the code becomes simple enough that the comment isn't
really needed.

Thanks,
Richard
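A toy model of the loop shape described above, using hypothetical types in place of GCC's basic blocks and edges: every predecessor of the exit block is visited once, only its last instruction is checked, and there is no break, so each return gets its own zeroing sequence. The check itself is still needed because a predecessor of the exit block can end in something other than a return jump (a sibling call, for example).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a basic block that is a predecessor of the exit
   block; hypothetical types, not GCC's basic_block/edge.  */
struct toy_bb
{
  int index;
  bool ends_in_return;  /* Models JUMP_P && ANY_RETURN_P on BB_END.  */
  int zeroing_seqs;     /* Zeroing sequences inserted before BB_END.  */
};

/* Stand-in for gen_call_used_regs_seq: record one insertion.  */
static void
insert_zeroing_seq (struct toy_bb *bb)
{
  bb->zeroing_seqs++;
}

/* Walk all exit-block predecessors, looking only at each block's last
   instruction.  No "break": returns always end a block, so there is no
   inner instruction walk to leave, and breaking here would skip the
   remaining predecessors.  Returns the number of sequences inserted.  */
static int
zero_before_returns (struct toy_bb *exit_preds, size_t n_preds)
{
  int inserted = 0;

  assert (exit_preds != NULL || n_preds == 0);
  for (size_t i = 0; i < n_preds; i++)
    if (exit_preds[i].ends_in_return)
      {
        insert_zeroing_seq (&exit_preds[i]);
        inserted++;
      }
  return inserted;
}
```

With a break after the first insertion, only the first return block would be handled, which is the bug pointed out above.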


* Re: [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
  2020-10-21 14:45             ` Qing Zhao
@ 2020-10-21 16:09               ` Uros Bizjak
  2020-10-21 16:51                 ` Qing Zhao
  2020-10-21 18:25                 ` Segher Boessenkool
  2020-10-22 14:46               ` Qing Zhao
  1 sibling, 2 replies; 20+ messages in thread
From: Uros Bizjak @ 2020-10-21 16:09 UTC (permalink / raw)
  To: Qing Zhao
  Cc: richard Sandiford, kees Cook, rodriguez Bahena Victor, H.J. Lu,
	segher Boessenkool, gcc-patches Kees Cook via

On Wed, Oct 21, 2020 at 4:45 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
>
>
> On Oct 21, 2020, at 3:03 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, Oct 21, 2020 at 9:18 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
>
> On Tue, Oct 20, 2020 at 10:04 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
> +/* Check whether the register REGNO should be zeroed on X86.
> +   When ALL_SSE_ZEROED is true, all SSE registers have been zeroed
> +   together, no need to zero it again.
> +   Stack registers (st0-st7) and mm0-mm7 are aliased with each other.
> +   They are very hard to zero individually, so don't zero individual
> +   st or mm registers at this time.  */
> +
> +static bool
> +zero_call_used_regno_p (const unsigned int regno,
> + bool all_sse_zeroed)
> +{
> +  return GENERAL_REGNO_P (regno)
> +  || (!all_sse_zeroed && SSE_REGNO_P (regno))
> +  || MASK_REGNO_P (regno);
> +}
> +
> +/* Return the machine_mode that is used to zero register REGNO.  */
> +
> +static machine_mode
> +zero_call_used_regno_mode (const unsigned int regno)
> +{
> +  /* NB: We only need to zero the lower 32 bits for integer registers
> +     and the lower 128 bits for vector registers since the destinations
> +     are zero-extended to the full register width.  */
> +  if (GENERAL_REGNO_P (regno))
> +    return SImode;
> +  else if (SSE_REGNO_P (regno))
> +    return V4SFmode;
> +  else
> +    return HImode;
> +}
> +
> +/* Generate an rtx to zero all vector registers together if possible,
> +   otherwise, return NULL.  */
> +
> +static rtx
> +zero_all_vector_registers (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  if (!TARGET_AVX)
> +    return NULL;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if ((IN_RANGE (regno, FIRST_SSE_REG, LAST_SSE_REG)
> +  || (TARGET_64BIT
> +      && (REX_SSE_REGNO_P (regno)
> +  || (TARGET_AVX512F && EXT_REX_SSE_REGNO_P (regno)))))
> + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +      return NULL;
> +
> +  return gen_avx_vzeroall ();
> +}
> +
> +/* Generate an rtx to zero all st and mm registers together if possible,
> +   otherwise, return NULL.  */
> +
> +static rtx
> +zero_all_st_mm_registers (HARD_REG_SET need_zeroed_hardregs)
> +{
> +  if (!TARGET_MMX)
> +    return NULL;
> +
> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> +    if ((STACK_REGNO_P (regno) || MMX_REGNO_P (regno))
> + && !TEST_HARD_REG_BIT (need_zeroed_hardregs, regno))
> +      return NULL;
> +
> +  return gen_mmx_emms ();
>
>
> emms is not clearing any register, it only loads x87FPUTagWord with
> FFFFH. So I think, the above is useless, as far as register clearing
> is concerned.
>
>
> Thanks for the info.
>
> So, for mm and st registers, should we clear them, and how?
>
>
> I don't know.
>
> Please note that %mm and %st share the same register file, and
> touching %mm registers will block access to %st until emms is emitted.
> You can't just blindly load 0 to %st registers, because the register
> file can be in MMX mode and vice versa. For 32bit targets, function
> can also  return a value in the %mm0.
>
>
> If data flow determine that %mm0 does not return a value at the return, can we clear all the %st as following:
>
> emms
> mov %st0, 0
> mov %st1, 0
> mov %st2, 0
> mov %st3, 0
> mov %st4, 0
> mov %st5, 0
> mov %st6, 0
> mov %st7, 0
>
>
> The i386 ABI says:
>
> -- q --
> The CPU shall be in x87 mode upon entry to a function. Therefore,
> every function that uses the MMX registers is required to issue an
> emms or femms instruction after using MMX registers, before returning
> or calling another function.
> -- /q --
>
> (The above requirement slightly contradicts its own ABI, since we have
> 3 MMX argument registers and MMX return register, so the CPU obviously
> can't be in x87 mode at all function boundaries).
>
> So, assuming that the first sentence is not deliberately vague w.r.t
> function exit, emms should not be needed. However, we are dealing with
> x87 stack registers that have their own set of peculiarities. It is
> not possible to load a random register in the way you show.  Also,
> stack should be either empty or one (two in case of complex value
> return) levels deep at the function return. I think you want a series
> of 8 or 7(6) fldz insns, followed by a series of fstp insn to clear
> the stack and mark stack slots empty.
>
>
> Something like this:
>
> --cut here--
> long double
> __attribute__ ((noinline))
> test (long double a, long double b)
> {
>  long double r = a + b;
>
>  asm volatile ("fldz;                \
>        fldz;                \
>        fldz;                \
>        fldz;                \
>        fldz;                \
>        fldz;                \
>        fldz;                \
>        fstp %%st(0);            \
>        fstp %%st(0);            \
>        fstp %%st(0);            \
>        fstp %%st(0);            \
>        fstp %%st(0);            \
>        fstp %%st(0);            \
>        fstp %%st(0)" : : "X"(r));
>  return r;
> }
>
> int
> main ()
> {
>  long double a = 1.1, b = 1.2;
>
>  long double c = test (a, b);
>
>  printf ("%Lf\n", c);
>
>  return 0;
> }
> --cut here--
>
>
>
> Okay, so,
>
> 1. First compute how many st registers need to be zeroed,  num_of_zeroed_st
> 2. Then issue (8 - num_of_zeroed_st) fldz to push 0 to the stack to clear all the dead stack slots;
> 3. Then issue (8 - num_of_zeroed_st) fstp %st(0) to pop the stack and empty the stack.
>
> Is the above understanding correctly?

Yes.

> Another thought is:
>
> Looks like it’s very complicate to use the st/mm register set correctly, So,
> I assume that this set of registers might be very hard to be used by the attacker correctly.
> Right?

Correct, but "very hard to be used" depends on how determined the attacker is.

Uros.
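A toy model of the three-step procedure confirmed above. It assumes the count subtracted from 8 is the number of x87 stack slots still live at the return (0, 1 or 2), matching the example where one long double result leads to 7 fldz/fstp pairs. The model tracks eight physical registers, the TOP pointer and a per-register "empty" tag (the real tag word uses two bits per register, with 11b meaning empty); double stands in for the 80-bit registers. emms is modelled too, to show that it only marks every slot empty and leaves stale values in the registers. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy x87 register file: physical registers R0-R7, 3-bit TOP, and one
   empty flag per register.  Not GCC code.  */
struct x87
{
  double r[8];
  bool empty[8];
  unsigned top;
};

/* fldz: decrement TOP, then write +0.0 into the new ST(0).  */
static void
fldz (struct x87 *f)
{
  f->top = (f->top - 1) & 7;
  assert (f->empty[f->top]);  /* Pushing onto a full slot would overflow.  */
  f->r[f->top] = 0.0;
  f->empty[f->top] = false;
}

/* fstp %st(0): store ST(0) to itself, tag it empty, increment TOP.  */
static void
fstp_st0 (struct x87 *f)
{
  f->empty[f->top] = true;
  f->top = (f->top + 1) & 7;
}

/* emms: tags every register empty; register contents are untouched.  */
static void
emms (struct x87 *f)
{
  for (int i = 0; i < 8; i++)
    f->empty[i] = true;
}

/* Zero all dead slots while preserving the LIVE topmost slots: push
   (8 - LIVE) zeros, then pop the same number.  */
static void
clear_dead_slots (struct x87 *f, int live)
{
  assert (live >= 0 && live <= 2);
  for (int i = 0; i < 8 - live; i++)
    fldz (f);
  for (int i = 0; i < 8 - live; i++)
    fstp_st0 (f);
}
```

After clear_dead_slots, TOP is back where it started, the returned value is intact, and every other physical register holds zero and is tagged empty; after emms alone, the stale values are still there.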


* Re: [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
  2020-10-21 16:09               ` Uros Bizjak
@ 2020-10-21 16:51                 ` Qing Zhao
  2020-10-21 18:25                 ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Qing Zhao @ 2020-10-21 16:51 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: richard Sandiford, kees Cook, rodriguez Bahena Victor, H.J. Lu,
	segher Boessenkool, gcc-patches Kees Cook via



> On Oct 21, 2020, at 11:09 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
> On Wed, Oct 21, 2020 at 4:45 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> 
>> 
>> -- q --
>> The CPU shall be in x87 mode upon entry to a function. Therefore,
>> every function that uses the MMX registers is required to issue an
>> emms or femms instruction after using MMX registers, before returning
>> or calling another function.
>> -- /q --
>> 
>> (The above requirement slightly contradicts its own ABI, since we have
>> 3 MMX argument registers and MMX return register, so the CPU obviously
>> can't be in x87 mode at all function boundaries).
>> 
>> So, assuming that the first sentence is not deliberately vague w.r.t
>> function exit, emms should not be needed. However, we are dealing with
>> x87 stack registers that have their own set of peculiarities. It is
>> not possible to load a random register in the way you show.  Also,
>> stack should be either empty or one (two in case of complex value
>> return) levels deep at the function return. I think you want a series
>> of 8 or 7(6) fldz insns, followed by a series of fstp insn to clear
>> the stack and mark stack slots empty.
>> 
>> 
>> Something like this:
>> 
>> --cut here--
>> long double
>> __attribute__ ((noinline))
>> test (long double a, long double b)
>> {
>> long double r = a + b;
>> 
>> asm volatile ("fldz;                \
>>       fldz;                \
>>       fldz;                \
>>       fldz;                \
>>       fldz;                \
>>       fldz;                \
>>       fldz;                \
>>       fstp %%st(0);            \
>>       fstp %%st(0);            \
>>       fstp %%st(0);            \
>>       fstp %%st(0);            \
>>       fstp %%st(0);            \
>>       fstp %%st(0);            \
>>       fstp %%st(0)" : : "X"(r));
>> return r;
>> }
>> 
>> int
>> main ()
>> {
>> long double a = 1.1, b = 1.2;
>> 
>> long double c = test (a, b);
>> 
>> printf ("%Lf\n", c);
>> 
>> return 0;
>> }
>> --cut here--
>> 
>> 
>> 
>> Okay, so,
>> 
>> 1. First compute how many st registers need to be zeroed,  num_of_zeroed_st
>> 2. Then issue (8 - num_of_zeroed_st) fldz to push 0 to the stack to clear all the dead stack slots;
>> 3. Then issue (8 - num_of_zeroed_st) fstp %st(0) to pop the stack and empty the stack.
>> 
>> Is the above understanding correctly?
> 
> Yes.
> 
>> Another thought is:
>> 
>> Looks like it’s very complicate to use the st/mm register set correctly, So,
>> I assume that this set of registers might be very hard to be used by the attacker correctly.
>> Right?
> 
> Correct, but "very hard to be used" depends on how determined the attacker is.

Okay, I see.
Then I will clear the st registers per the above algorithm you suggested.

Thanks a lot for the help.

Qing
> 
> Uros.



* Re: [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
  2020-10-21 15:47     ` Richard Sandiford
@ 2020-10-21 16:52       ` Qing Zhao
  0 siblings, 0 replies; 20+ messages in thread
From: Qing Zhao @ 2020-10-21 16:52 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Uros Bizjak, kees Cook, rodriguez Bahena Victor, H.J. Lu,
	segher Boessenkool, gcc-patches Kees Cook via

Got it.

thanks.

Qing

> On Oct 21, 2020, at 10:47 AM, Richard Sandiford <richard.sandiford@arm.com> wrote:
> 
> Qing Zhao <QING.ZHAO@ORACLE.COM> writes:
>>>> +  /* For each of the hard registers, check to see whether we should zero it if:
>>>> +     1. it is a call-used register;
>>>> + and 2. it is not a fixed register;
>>>> + and 3. it is not live at the return of the routine;
>>>> + and 4. it is a general register if gpr_only is true;
>>>> + and 5. it is used in the routine if used_only is true;
>>>> + and 6. it is a register that passes parameters if arg_only is true;
>>>> +   */
>>>> +
>>>> +  HARD_REG_SET need_zeroed_hardregs;
>>>> +  CLEAR_HARD_REG_SET (need_zeroed_hardregs);
>>>> +  for (unsigned int regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>>> +    {
>>>> +      if (!this_target_hard_regs->x_call_used_regs[regno])
>>>> +	continue;
>>> 
>>> This should use crtl->abi instead.  The set of call-used registers
>>> can vary from function to function.
>> 
>> You mean to use:
>> 
>> if (!crtl->abi->clobbers_full_reg_p(regno))
>> 
>> ?
> 
> Yeah, that's right.  (But with a space before “(regno)” :-))
> 
>>>> +static unsigned int
>>>> +rest_of_zero_call_used_regs (void)
>>>> +{
>>>> +  basic_block bb;
>>>> +  rtx_insn *insn;
>>>> +
>>>> +  /* This pass needs data flow information.  */
>>>> +  df_analyze ();
>>>> +
>>>> +  /* Search all the "return"s in the routine, and insert instruction sequence to
>>>> +     zero the call used registers.  */
>>>> +  FOR_EACH_BB_REVERSE_FN (bb, cfun)
>>>> +    if (bb == EXIT_BLOCK_PTR_FOR_FN (cfun)
>>>> +	|| (single_succ_p (bb)
>>>> +	    && single_succ (bb) == EXIT_BLOCK_PTR_FOR_FN (cfun)))
>>>> +      FOR_BB_INSNS_REVERSE (bb, insn)
>>>> +	if (JUMP_P (insn) && ANY_RETURN_P (JUMP_LABEL (insn)))
>>>> +	  {
>>>> +	    /* Now we can insert the instruction sequence to zero the call used
>>>> +	       registers before this insn.  */
>>>> +	    gen_call_used_regs_seq (insn);
>>>> +	    break;
>>>> +	  }
>>> 
>>> The exit block never has instructions, so it's only necessary to process
>>> predecessors.  A simpler way to do that is to iterate over the edges in:
>>> 
>>> EXIT_BLOCK_PTR_FOR_FN (cfun)->preds
>>> 
>>> You shouldn't need to use FOR_BB_INSNS_REVERSE: it should be enough
>>> to check only BB_END (bb), since returns always end blocks.
>> 
>> Something like the following?
>> 
>>  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
>>    {
>>     insn = BB_END (e->src);
>>      if (JUMP_P (insn) && ANY_RETURN_P (JUMP_LABEL (insn)))
>>        {
>> 	  /* Now we can insert the instruction sequence to zero the call used
>> 	       registers before this insn.  */
>> 	    gen_call_used_regs_seq (insn);
>> 	    break;       
>>        }
>>      }
> 
> With this you don't want/need the break, since it would break out
> of the edge traversal (instead of the FOR_BB_INSNS_REVERSE, as above).
> Also, I think the code becomes simple enough that the comment isn't
> really needed.
> 
> Thanks,
> Richard



* Re: [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
  2020-10-21 16:09               ` Uros Bizjak
  2020-10-21 16:51                 ` Qing Zhao
@ 2020-10-21 18:25                 ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Segher Boessenkool @ 2020-10-21 18:25 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: Qing Zhao, richard Sandiford, kees Cook, rodriguez Bahena Victor,
	H.J. Lu, gcc-patches Kees Cook via

On Wed, Oct 21, 2020 at 06:09:28PM +0200, Uros Bizjak wrote:
> On Wed, Oct 21, 2020 at 4:45 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
> > Looks like it’s very complicate to use the st/mm register set correctly, So,
> > I assume that this set of registers might be very hard to be used by the attacker correctly.
> > Right?
> 
> Correct, but "very hard to be used" depends on how determined the attacker is.

Not only that, but the attacker only needs to get it right once, not for
every function (and not even for every program for that matter).


Segher


* Re: [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
  2020-10-20 18:12 ` Richard Sandiford
  2020-10-20 21:47   ` Qing Zhao
@ 2020-10-22 13:49   ` Qing Zhao
  1 sibling, 0 replies; 20+ messages in thread
From: Qing Zhao @ 2020-10-22 13:49 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: Uros Bizjak, kees Cook, rodriguez Bahena Victor, H.J. Lu,
	segher Boessenkool, gcc-patches Kees Cook via

Hi, Richard,

Could you please check the following documentation change and let me know if you have any suggestions?

Thanks.

Qing
> 
>> +pass parameters. @samp{used-arg} zeros used call-used registers that
>> +pass parameters. @samp{arg} zeros all call-used registers that pass
>> +parameters.  These 3 choices are used for ROP mitigation.
>> +
>> +@samp{used-gpr} zeros call-used general purpose registers
>> +which are used in function.  @samp{all-gpr} zeros all
>> +call-used registers.  @samp{used} zeros call-used registers which
>> +are used in function.  @samp{all} zeros all call-used registers.
>> +These 4 choices are used for preventing information leak through
>> +registers.
> 
> The description for all-gpr doesn't look right.  I think it would
> be easier to describe (and hopefully to follow) if we start with
> the three basic choices: “skip”, “used” and “all”.  Then describe
> how “used” and “all” can be modified by adding “-gpr” to limit the
> clearing to general-purpose registers and “-arg” to limit the
> clearing to argument registers.
> 
> We need to say what “call-used” and “used” mean in this context.
> In particular, “call-used” is also known as “call-clobbered”,
> “caller-saved” and “volatile”, so it would be good to list those
> as alternatives.  We need to say what “used” registers are.

@item -fzero-call-used-regs=@var{choice}
@opindex fzero-call-used-regs
Zero call-used registers at function return to increase program
security, either by mitigating Return-Oriented Programming (ROP) attacks
or by preventing information leakage through registers.

A "call-used" register is a register whose contents are clobbered by
function calls; as a result, the caller has to save it before a call and
restore it afterward if it still needs the value.  Such registers are also
called "call-clobbered", "caller-saved", or "volatile".

To satisfy users with different security needs while keeping the
run-time overhead under control, GCC lets you choose which subset of the
call-used registers is zeroed.

@samp{skip}, which is the default, doesn't zero any call-used registers.
@samp{used} zeros only the call-used registers that are used in the
function.  A "used" register is one whose content has been set or
referenced in the function.
@samp{all} zeros all call-used registers.

In addition to these three basic choices, the set of registers can be
further limited by appending @samp{-gpr} (general-purpose registers only),
@samp{-arg} (argument registers only), or both, as follows:

@samp{used-gpr-arg} zeros used call-used general purpose registers that
pass parameters.
@samp{used-arg} zeros used call-used registers that pass parameters.
@samp{all-gpr-arg} zeros all call-used general purpose registers that pass
parameters.
@samp{all-arg} zeros all call-used registers that pass parameters.
@samp{used-gpr} zeros call-used general purpose registers which are used in the
function.
@samp{all-gpr} zeros all call-used general purpose registers.

Of these choices, @samp{used-gpr-arg}, @samp{used-arg}, @samp{all-gpr-arg},
and @samp{all-arg} are mainly intended for ROP mitigation.

You can control this behavior for a specific function by using the function
attribute @code{zero_call_used_regs}.  @xref{Function Attributes}.
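Following the suggested structure of three basic choices plus "-gpr" and "-arg" modifiers, a choice string can be decomposed mechanically into the used_only, gpr_only and arg_only flags that appear in the patch's register-selection comment. This is a hypothetical sketch of that decomposition, not GCC's actual option or attribute handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical decomposition of a -fzero-call-used-regs choice: one of
   "skip", "used" or "all", optionally followed by "-gpr", then
   optionally by "-arg".  */
struct zero_regs_choice
{
  bool enabled;    /* false for "skip" */
  bool used_only;  /* "used..." rather than "all..." */
  bool gpr_only;   /* "-gpr" modifier present */
  bool arg_only;   /* "-arg" modifier present */
};

/* Return true and fill *OUT if TEXT is a valid choice.  */
static bool
parse_zero_regs_choice (const char *text, struct zero_regs_choice *out)
{
  struct zero_regs_choice c = { true, false, false, false };

  assert (text != NULL && out != NULL);
  if (strcmp (text, "skip") == 0)
    {
      c.enabled = false;
      *out = c;
      return true;
    }

  /* Basic choice.  */
  if (strncmp (text, "used", 4) == 0)
    {
      c.used_only = true;
      text += 4;
    }
  else if (strncmp (text, "all", 3) == 0)
    text += 3;
  else
    return false;

  /* Optional modifiers, in this order only.  */
  if (strncmp (text, "-gpr", 4) == 0)
    {
      c.gpr_only = true;
      text += 4;
    }
  if (strcmp (text, "-arg") == 0)
    {
      c.arg_only = true;
      text += 4;
    }

  if (*text != '\0')
    return false;
  *out = c;
  return true;
}
```

With this reading, "used-gpr-arg" sets all three flags, "all" sets none of the restricting flags, and an out-of-order spelling such as "used-arg-gpr" is rejected. The same strings would be accepted by the function attribute, e.g. __attribute__ ((zero_call_used_regs ("used-gpr"))).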





* Re: [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
  2020-10-21 14:45             ` Qing Zhao
  2020-10-21 16:09               ` Uros Bizjak
@ 2020-10-22 14:46               ` Qing Zhao
  2020-10-22 15:34                 ` Uros Bizjak
  1 sibling, 1 reply; 20+ messages in thread
From: Qing Zhao @ 2020-10-22 14:46 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches Kees Cook via

Hi, Uros,

> On Oct 21, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>>> 
>> 
>> Something like this:
>> 
>> --cut here--
>> long double
>> __attribute__ ((noinline))
>> test (long double a, long double b)
>> {
>> long double r = a + b;
>> 
>> asm volatile ("fldz;                \
>>       fldz;                \
>>       fldz;                \
>>       fldz;                \
>>       fldz;                \
>>       fldz;                \
>>       fldz;                \
>>       fstp %%st(0);            \
>>       fstp %%st(0);            \
>>       fstp %%st(0);            \
>>       fstp %%st(0);            \
>>       fstp %%st(0);            \
>>       fstp %%st(0);            \
>>       fstp %%st(0)" : : "X"(r));
>> return r;
>> }
>> 
>> int
>> main ()
>> {
>> long double a = 1.1, b = 1.2;
>> 
>> long double c = test (a, b);
>> 
>> printf ("%Lf\n", c);
>> 
>> return 0;
>> }
>> --cut here--
> 
> 
> Okay, so,
> 
> 1. First compute how many st registers need to be zeroed,  num_of_zeroed_st
> 2. Then issue (8 - num_of_zeroed_st) fldz to push 0 to the stack to clear all the dead stack slots;
> 3. Then issue (8 - num_of_zeroed_st) fstp %st(0) to pop the stack and empty the stack.
> 

How can I generate such an asm volatile insn in the i386 backend? Is there any existing code in the i386 backend I can refer to?

thanks.

Qing


* Re: [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
  2020-10-22 14:46               ` Qing Zhao
@ 2020-10-22 15:34                 ` Uros Bizjak
  2020-10-22 16:37                   ` Qing Zhao
  0 siblings, 1 reply; 20+ messages in thread
From: Uros Bizjak @ 2020-10-22 15:34 UTC (permalink / raw)
  To: Qing Zhao; +Cc: gcc-patches Kees Cook via

On Thu, Oct 22, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>
> Hi, Uros,
>
> > On Oct 21, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >>>>
> >>
> >> Something like this:
> >>
> >> --cut here--
> >> long double
> >> __attribute__ ((noinline))
> >> test (long double a, long double b)
> >> {
> >> long double r = a + b;
> >>
> >> asm volatile ("fldz;                \
> >>       fldz;                \
> >>       fldz;                \
> >>       fldz;                \
> >>       fldz;                \
> >>       fldz;                \
> >>       fldz;                \
> >>       fstp %%st(0);            \
> >>       fstp %%st(0);            \
> >>       fstp %%st(0);            \
> >>       fstp %%st(0);            \
> >>       fstp %%st(0);            \
> >>       fstp %%st(0);            \
> >>       fstp %%st(0)" : : "X"(r));
> >> return r;
> >> }
> >>
> >> int
> >> main ()
> >> {
> >> long double a = 1.1, b = 1.2;
> >>
> >> long double c = test (a, b);
> >>
> >> printf ("%Lf\n", c);
> >>
> >> return 0;
> >> }
> >> --cut here--
> >
> >
> > Okay, so,
> >
> > 1. First compute how many st registers need to be zeroed,  num_of_zeroed_st
> > 2. Then issue (8 - num_of_zeroed_st) fldz to push 0 to the stack to clear all the dead stack slots;
> > 3. Then issue (8 - num_of_zeroed_st) fstp %st(0) to pop the stack and empty the stack.
> >
>
> How to generate such asm volatile insn at i386 backend? Is there any code in i386 backend I can refer for this ?

fldz is a plain move of zero to an XF register; fstp is generated from an
XF move of FIRST_STACK_REG to itself with a REG_DEAD note added:

#(insn 366 128 129 9 (set (reg:XF 8 st)
#        (reg:XF 8 st)) "test.c":711:14 110 {*movxf_internal}
#     (expr_list:REG_DEAD (reg:XF 8 st)
#        (nil)))

Uros.


* Re: [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all]
  2020-10-22 15:34                 ` Uros Bizjak
@ 2020-10-22 16:37                   ` Qing Zhao
  0 siblings, 0 replies; 20+ messages in thread
From: Qing Zhao @ 2020-10-22 16:37 UTC (permalink / raw)
  To: Uros Bizjak; +Cc: gcc-patches Kees Cook via



> On Oct 22, 2020, at 10:34 AM, Uros Bizjak <ubizjak@gmail.com> wrote:
> 
> On Thu, Oct 22, 2020 at 4:47 PM Qing Zhao <QING.ZHAO@oracle.com> wrote:
>> 
>> Hi, Uros,
>> 
>>> On Oct 21, 2020, at 9:45 AM, Qing Zhao via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>>>>> 
>>>> 
>>>> Something like this:
>>>> 
>>>> --cut here--
>>>> #include <stdio.h>
>>>> 
>>>> long double
>>>> __attribute__ ((noinline))
>>>> test (long double a, long double b)
>>>> {
>>>> long double r = a + b;
>>>> 
>>>> asm volatile ("fldz;                \
>>>>      fldz;                \
>>>>      fldz;                \
>>>>      fldz;                \
>>>>      fldz;                \
>>>>      fldz;                \
>>>>      fldz;                \
>>>>      fstp %%st(0);            \
>>>>      fstp %%st(0);            \
>>>>      fstp %%st(0);            \
>>>>      fstp %%st(0);            \
>>>>      fstp %%st(0);            \
>>>>      fstp %%st(0);            \
>>>>      fstp %%st(0)" : : "X"(r));
>>>> return r;
>>>> }
>>>> 
>>>> int
>>>> main ()
>>>> {
>>>> long double a = 1.1, b = 1.2;
>>>> 
>>>> long double c = test (a, b);
>>>> 
>>>> printf ("%Lf\n", c);
>>>> 
>>>> return 0;
>>>> }
>>>> --cut here--
>>> 
>>> 
>>> Okay, so,
>>> 
>>> 1. First compute how many st registers need to be zeroed,  num_of_zeroed_st
>>> 2. Then issue (8 - num_of_zeroed_st) fldz to push 0 to the stack to clear all the dead stack slots;
>>> 3. Then issue (8 - num_of_zeroed_st) fstp %st(0) to pop the stack and empty the stack.
>>> 
>> 
>> How can I generate such an asm volatile insn in the i386 backend? Is there any code in the i386 backend I can refer to for this?
> 
> fldz is a plain move of zero to an XF register; fstp is generated from an
> XF move of FIRST_STACK_REG to itself with a REG_DEAD note added:
> 
> #(insn 366 128 129 9 (set (reg:XF 8 st)
> #        (reg:XF 8 st)) "test.c":711:14 110 {*movxf_internal}
> #     (expr_list:REG_DEAD (reg:XF 8 st)
> #        (nil)))

What’s the reason to add the “REG_DEAD” note?

Qing
> 
> Uros.
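Whether the fldz/fstp sequence really preserves the return value can be checked with a toy model of the x87 register stack: eight physical registers, a TOP index, and an empty tag per register. This is a simplified sketch, not a description of the GCC implementation. It assumes the dead slots are already tagged empty when the sequence runs, and all names (struct x87_model, model_fldz, model_fstp_st0, model_clear_dead_slots) are hypothetical. The pop step models the fstp that Uros describes as a self-move of st with a REG_DEAD note.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the x87 register stack: st(i) is phys[(top + i) & 7].  */
struct x87_model
{
  long double phys[8];
  bool empty[8];
  unsigned top;
};

/* fldz: decrement TOP and load +0.0 into the new st(0).  The slot must
   be empty; on real hardware a full slot raises a stack-overflow fault.  */
static void
model_fldz (struct x87_model *m)
{
  m->top = (m->top + 7) & 7;
  assert (m->empty[m->top]);
  m->phys[m->top] = 0.0L;
  m->empty[m->top] = false;
}

/* fstp %st(0): store st(0) onto itself (no change), then pop: tag the
   slot empty and increment TOP.  The value stays in the register.  */
static void
model_fstp_st0 (struct x87_model *m)
{
  assert (!m->empty[m->top]);
  m->empty[m->top] = true;
  m->top = (m->top + 1) & 7;
}

/* The sequence from the thread: (8 - live) fldz, then (8 - live) fstp.  */
static void
model_clear_dead_slots (struct x87_model *m, unsigned live)
{
  assert (live <= 8);
  for (unsigned i = 0; i < 8 - live; i++)
    model_fldz (m);
  for (unsigned i = 0; i < 8 - live; i++)
    model_fstp_st0 (m);
}
```

Starting from a state where only st(0) is in use, model_clear_dead_slots (&m, 1) ends with TOP back where it started, st(0) unchanged, and the seven other physical registers holding +0.0 and tagged empty. Pops only retag slots without erasing their contents, which is why the fldz pushes are needed to overwrite the stale values first.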




Thread overview: 20+ messages
2020-10-06 14:01 [PATCH][middle-end][i386][version 3]Add -fzero-call-used-regs=[skip|used-gpr-arg|used-arg|all-arg|used-gpr|all-gpr|used|all] Qing Zhao
2020-10-19 13:48 ` Qing Zhao
2020-10-19 19:30 ` Uros Bizjak
2020-10-20 14:00   ` Qing Zhao
2020-10-20 15:24     ` Uros Bizjak
2020-10-20 20:04       ` Qing Zhao
2020-10-21  7:18         ` Uros Bizjak
2020-10-21  8:03           ` Uros Bizjak
2020-10-21 14:45             ` Qing Zhao
2020-10-21 16:09               ` Uros Bizjak
2020-10-21 16:51                 ` Qing Zhao
2020-10-21 18:25                 ` Segher Boessenkool
2020-10-22 14:46               ` Qing Zhao
2020-10-22 15:34                 ` Uros Bizjak
2020-10-22 16:37                   ` Qing Zhao
2020-10-20 18:12 ` Richard Sandiford
2020-10-20 21:47   ` Qing Zhao
2020-10-21 15:47     ` Richard Sandiford
2020-10-21 16:52       ` Qing Zhao
2020-10-22 13:49   ` Qing Zhao
